Friday, 1 August 2025

AI Voice Chat MAUI App - using open-source tools

 Here's a complete guide to creating a .NET MAUI app that communicates by voice only, detects the spoken language automatically, and lets the user choose the response voice. The solution uses entirely open-source technologies.

Architecture Overview

  1. Speech Recognition: Whisper.cpp (OpenAI's Whisper model ported to C++)

  2. AI Chat: llama.cpp (Local LLM) or Ollama API

  3. Text-to-Speech: Piper TTS (Open-source neural TTS)

  4. Database: SQLite for conversation history

  5. UI: .NET MAUI with background screensaver

Step 1: Set Up the MAUI Project

bash
dotnet new maui -n AIVoiceChat
cd AIVoiceChat

Step 2: Add Required NuGet Packages

bash
dotnet add package Microsoft.Data.Sqlite
dotnet add package CommunityToolkit.Maui
dotnet add package SkiaSharp.Views.Maui.Controls

Step 3: MainPage.xaml

xml
<?xml version="1.0" encoding="utf-8" ?>
<ContentPage xmlns="http://schemas.microsoft.com/dotnet/2021/maui"
             xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
             xmlns:controls="clr-namespace:AIVoiceChat.Controls"
             x:Class="AIVoiceChat.MainPage"
             BackgroundColor="Black">

    <Grid>
        <!-- Screensaver Layer -->
        <controls:ScreensaverView x:Name="screensaver" 
                                IsVisible="True" />

        <!-- Conversation Layer (semi-transparent) -->
        <ScrollView x:Name="conversationScroll"
                   BackgroundColor="#80000000"
                   VerticalOptions="End"
                   Margin="20">
            <StackLayout x:Name="conversationContainer"
                        Spacing="10">
                <!-- Messages will be added here -->
            </StackLayout>
        </ScrollView>

        <!-- Voice Control Layer -->
        <Grid VerticalOptions="End"
              Margin="20"
              RowSpacing="10">
            <Grid.RowDefinitions>
                <RowDefinition Height="Auto" />
                <RowDefinition Height="Auto" />
            </Grid.RowDefinitions>

            <Button x:Name="listenButton"
                    Text="Hold to Speak"
                    BackgroundColor="#4CAF50"
                    TextColor="White"
                    CornerRadius="20"
                    HeightRequest="60"
                    Pressed="OnListenPressed"
                    Released="OnListenReleased" />

            <Label x:Name="statusLabel"
                   Text="Ready"
                   TextColor="White"
                   HorizontalOptions="Center"
                   Grid.Row="1" />
        </Grid>
    </Grid>
</ContentPage>

Step 4: Screensaver Control (ScreensaverView.xaml)

xml
<ContentView xmlns="http://schemas.microsoft.com/dotnet/2021/maui"
             xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
             xmlns:skia="clr-namespace:SkiaSharp.Views.Maui.Controls;assembly=SkiaSharp.Views.Maui.Controls"
             x:Class="AIVoiceChat.Controls.ScreensaverView">
    <skia:SKCanvasView x:Name="canvasView"
                       PaintSurface="OnCanvasViewPaintSurface"
                       EnableTouchEvents="True"
                       Touch="OnCanvasViewTouch" />
</ContentView>

Step 5: Screensaver Code-behind (ScreensaverView.xaml.cs)

csharp
using Microsoft.Maui.Dispatching;
using SkiaSharp;
using SkiaSharp.Views.Maui;

namespace AIVoiceChat.Controls;

public partial class ScreensaverView : ContentView
{
    private readonly List<Bubble> _bubbles = new();
    private readonly Random _random = new();
    private bool _isAnimating = true;
    
    public ScreensaverView()
    {
        InitializeComponent();
        StartAnimation();
    }
    
    private void StartAnimation()
    {
        // Dispatcher.StartTimer replaces the obsolete Device.StartTimer
        Dispatcher.StartTimer(TimeSpan.FromMilliseconds(16), () =>
        {
            if (!_isAnimating) return false;
            
            // Defer bubble creation until the canvas has been measured;
            // Width and Height are still -1 inside the constructor, so
            // creating bubbles there would throw
            if (_bubbles.Count == 0 && !canvasView.CanvasSize.IsEmpty)
                CreateInitialBubbles();
            
            UpdateBubbles();
            canvasView.InvalidateSurface();
            return _isAnimating;
        });
    }
    
    private void CreateInitialBubbles()
    {
        for (int i = 0; i < 20; i++)
        {
            _bubbles.Add(new Bubble
            {
                X = _random.Next(0, (int)canvasView.CanvasSize.Width),
                Y = _random.Next(0, (int)canvasView.CanvasSize.Height),
                Radius = _random.Next(10, 50),
                SpeedX = _random.Next(-3, 3),
                SpeedY = _random.Next(-3, 3),
                Color = SKColor.FromHsl(_random.Next(0, 360), 100, 50)
            });
        }
    }
    
    private void UpdateBubbles()
    {
        if (canvasView.CanvasSize.IsEmpty) return;
        
        for (int i = 0; i < _bubbles.Count; i++)
        {
            var bubble = _bubbles[i];
            
            // Update position
            bubble.X += bubble.SpeedX;
            bubble.Y += bubble.SpeedY;
            
            // Bounce off edges
            if (bubble.X - bubble.Radius < 0 || bubble.X + bubble.Radius > canvasView.CanvasSize.Width)
                bubble.SpeedX *= -1;
                
            if (bubble.Y - bubble.Radius < 0 || bubble.Y + bubble.Radius > canvasView.CanvasSize.Height)
                bubble.SpeedY *= -1;
        }
    }
    
    private void OnCanvasViewPaintSurface(object sender, SKPaintSurfaceEventArgs e)
    {
        var surface = e.Surface;
        var canvas = surface.Canvas;
        
        canvas.Clear(SKColors.Black);
        
        foreach (var bubble in _bubbles)
        {
            // SKPaint wraps native resources, so dispose it each iteration
            using var paint = new SKPaint
            {
                Color = bubble.Color,
                IsAntialias = true,
                Style = SKPaintStyle.Fill
            };
            
            canvas.DrawCircle(bubble.X, bubble.Y, bubble.Radius, paint);
        }
    }
    
    private void OnCanvasViewTouch(object sender, SKTouchEventArgs e)
    {
        if (e.ActionType == SKTouchAction.Pressed)
        {
            // Add a new bubble where touched
            _bubbles.Add(new Bubble
            {
                X = e.Location.X,
                Y = e.Location.Y,
                Radius = _random.Next(10, 50),
                SpeedX = _random.Next(-3, 3),
                SpeedY = _random.Next(-3, 3),
                Color = SKColor.FromHsl(_random.Next(0, 360), 100, 50)
            });
            
            e.Handled = true;
        }
    }
    
    private class Bubble
    {
        public float X { get; set; }
        public float Y { get; set; }
        public float Radius { get; set; }
        public float SpeedX { get; set; }
        public float SpeedY { get; set; }
        public SKColor Color { get; set; }
    }
}

Step 6: MainPage Code-behind (MainPage.xaml.cs)

csharp

namespace AIVoiceChat;

public partial class MainPage : ContentPage
{
    private readonly IDatabaseService _database;
    private readonly ISpeechToText _speechToText;
    private readonly ITextToSpeech _textToSpeech;
    private readonly IAIChatService _aiChatService;
    
    private CancellationTokenSource _listenCts;
    private bool _isListening = false;
    
    public MainPage(IDatabaseService database, 
                   ISpeechToText speechToText,
                   ITextToSpeech textToSpeech,
                   IAIChatService aiChatService)
    {
        InitializeComponent();
        _database = database;
        _speechToText = speechToText;
        _textToSpeech = textToSpeech;
        _aiChatService = aiChatService;
        
        LoadConversationHistory();
    }
    
    private async void LoadConversationHistory()
    {
        var history = await _database.GetRecentConversationsAsync(10);
        
        foreach (var message in history.Reverse())
        {
            AddMessageToUI(message);
        }
    }
    
    private void AddMessageToUI(ConversationMessage message)
    {
        var messageLabel = new Label
        {
            Text = $"{message.Speaker}: {message.Text}",
            TextColor = message.IsUser ? Colors.LightBlue : Colors.LightGreen,
            FontSize = 16,
            Margin = new Thickness(10, 5)
        };
        
        conversationContainer.Children.Insert(0, messageLabel);
    }
    
    private async void OnListenPressed(object sender, EventArgs e)
    {
        if (_isListening) return;
        
        _isListening = true;
        statusLabel.Text = "Listening...";
        _listenCts = new CancellationTokenSource();
        
        try
        {
            var recognitionResult = await _speechToText.ListenAsync(_listenCts.Token);
            
            if (!string.IsNullOrEmpty(recognitionResult.Text))
            {
                // Add user message to UI and DB
                var userMessage = new ConversationMessage
                {
                    Text = recognitionResult.Text,
                    IsUser = true,
                    Timestamp = DateTime.Now,
                    Language = recognitionResult.Language
                };
                
                AddMessageToUI(userMessage);
                await _database.SaveMessageAsync(userMessage);
                
                // Get AI response
                statusLabel.Text = "Thinking...";
                var aiResponse = await _aiChatService.GetResponseAsync(recognitionResult.Text);
                
                // Add AI response to UI and DB
                var aiMessage = new ConversationMessage
                {
                    Text = aiResponse.Text,
                    IsUser = false,
                    Timestamp = DateTime.Now,
                    Language = aiResponse.Language,
                    VoiceType = Preferences.Get("SelectedVoice", "female")
                };
                
                AddMessageToUI(aiMessage);
                await _database.SaveMessageAsync(aiMessage);
                
                // Speak the response
                await _textToSpeech.SpeakAsync(aiResponse.Text, aiResponse.Language, aiMessage.VoiceType);
            }
        }
        catch (Exception ex)
        {
            statusLabel.Text = $"Error: {ex.Message}";
        }
        finally
        {
            _isListening = false;
            statusLabel.Text = "Ready";
        }
    }
    
    private void OnListenReleased(object sender, EventArgs e)
    {
        _listenCts?.Cancel();
        _isListening = false;
        statusLabel.Text = "Ready";
    }
}

Step 7: Database Service

csharp
using Microsoft.Data.Sqlite;
using System.Collections.ObjectModel;

namespace AIVoiceChat;

public interface IDatabaseService
{
    Task SaveMessageAsync(ConversationMessage message);
    Task<ObservableCollection<ConversationMessage>> GetRecentConversationsAsync(int count);
    Task InitializeAsync();
}

public class DatabaseService : IDatabaseService
{
    private const string DatabaseFilename = "conversations.db3";
    private static string DatabasePath => Path.Combine(FileSystem.AppDataDirectory, DatabaseFilename);
    
    public async Task InitializeAsync()
    {
        // Opening the connection creates the database file if needed, and
        // CREATE TABLE IF NOT EXISTS makes this safe to run on every launch
        // (a File.Exists check would skip table creation when the file
        // exists but the table does not)
        using var connection = new SqliteConnection($"Data Source={DatabasePath}");
        await connection.OpenAsync();
        
        var command = connection.CreateCommand();
        command.CommandText = @"
            CREATE TABLE IF NOT EXISTS Messages (
                Id INTEGER PRIMARY KEY AUTOINCREMENT,
                Text TEXT NOT NULL,
                IsUser INTEGER NOT NULL,
                Timestamp DATETIME NOT NULL,
                Language TEXT,
                VoiceType TEXT
            )";
            
        await command.ExecuteNonQueryAsync();
    }
    
    public async Task SaveMessageAsync(ConversationMessage message)
    {
        using var connection = new SqliteConnection($"Data Source={DatabasePath}");
        await connection.OpenAsync();
        
        var command = connection.CreateCommand();
        command.CommandText = @"
            INSERT INTO Messages (Text, IsUser, Timestamp, Language, VoiceType)
            VALUES (@text, @isUser, @timestamp, @language, @voiceType)";
            
        command.Parameters.AddWithValue("@text", message.Text);
        command.Parameters.AddWithValue("@isUser", message.IsUser ? 1 : 0);
        command.Parameters.AddWithValue("@timestamp", message.Timestamp);
        command.Parameters.AddWithValue("@language", message.Language ?? "");
        command.Parameters.AddWithValue("@voiceType", message.VoiceType ?? "");
        
        await command.ExecuteNonQueryAsync();
    }
    
    public async Task<ObservableCollection<ConversationMessage>> GetRecentConversationsAsync(int count)
    {
        var messages = new ObservableCollection<ConversationMessage>();
        
        using var connection = new SqliteConnection($"Data Source={DatabasePath}");
        await connection.OpenAsync();
        
        var command = connection.CreateCommand();
        command.CommandText = @"
            SELECT Text, IsUser, Timestamp, Language, VoiceType
            FROM Messages
            ORDER BY Timestamp DESC
            LIMIT @count";
            
        command.Parameters.AddWithValue("@count", count);
        
        using var reader = await command.ExecuteReaderAsync();
        while (await reader.ReadAsync())
        {
            messages.Add(new ConversationMessage
            {
                Text = reader.GetString(0),
                IsUser = reader.GetBoolean(1),
                Timestamp = reader.GetDateTime(2),
                Language = reader.IsDBNull(3) ? null : reader.GetString(3),
                VoiceType = reader.IsDBNull(4) ? null : reader.GetString(4)
            });
        }
        
        return messages;
    }
}

public class ConversationMessage
{
    public string Text { get; set; }
    public bool IsUser { get; set; }
    public DateTime Timestamp { get; set; }
    public string Language { get; set; }
    public string VoiceType { get; set; }
}

Step 8: Speech-to-Text Service (Whisper.cpp)

csharp
using System.Diagnostics;
using System.Text;

namespace AIVoiceChat;

public interface ISpeechToText
{
    Task<SpeechRecognitionResult> ListenAsync(CancellationToken cancellationToken);
}

public class SpeechRecognitionResult
{
    public string Text { get; set; }
    public string Language { get; set; }
}

public class WhisperSpeechToText : ISpeechToText
{
    public async Task<SpeechRecognitionResult> ListenAsync(CancellationToken cancellationToken)
    {
        try
        {
            // Record audio (platform specific)
            var audioFile = await RecordAudioAsync();
            
            // Use whisper.cpp to process audio
            var result = await ProcessWithWhisper(audioFile);
            
            return new SpeechRecognitionResult
            {
                Text = result.Text,
                Language = result.Language
            };
        }
        catch (Exception ex)
        {
            Debug.WriteLine($"Speech recognition error: {ex}");
            return new SpeechRecognitionResult();
        }
    }
    
    private async Task<string> RecordAudioAsync()
    {
        // Platform-specific audio recording
        // For Android:
        // var audioFile = Path.Combine(FileSystem.CacheDirectory, "recording.wav");
        // Use Android.Media.MediaRecorder
        
        // For iOS:
        // Use AVAudioRecorder
        
        // For Windows:
        // Use NAudio or Windows.Media.Capture
        
        await Task.Delay(1000); // Simulate recording
        return "recording.wav";
    }
    
    private async Task<SpeechRecognitionResult> ProcessWithWhisper(string audioFile)
    {
        // This would call the whisper.cpp executable
        // You need to include the whisper.cpp binaries in your app
        
        var startInfo = new ProcessStartInfo
        {
            FileName = "whisper.cpp/main",
            Arguments = $"-m models/ggml-base.en.bin -f {audioFile} --language auto",
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        
        using var process = new Process { StartInfo = startInfo };
        process.Start();
        
        var output = await process.StandardOutput.ReadToEndAsync();
        await process.WaitForExitAsync();
        
        // Parse whisper output
        return ParseWhisperOutput(output);
    }
    
    private SpeechRecognitionResult ParseWhisperOutput(string output)
    {
        // Simple parsing - you'd need to adjust based on actual whisper output
        var lines = output.Split('\n');
        var text = new StringBuilder();
        string detectedLanguage = "en"; // Default
        
        foreach (var line in lines)
        {
            if (line.StartsWith("[Detected language:"))
            {
                detectedLanguage = line.Split(':')[1].Trim().Substring(0, 2);
            }
            else if (!string.IsNullOrWhiteSpace(line) && !line.StartsWith("["))
            {
                text.AppendLine(line.Trim());
            }
        }
        
        return new SpeechRecognitionResult
        {
            Text = text.ToString().Trim(),
            Language = detectedLanguage
        };
    }
}
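
The RecordAudioAsync stub above only simulates recording. As one concrete possibility, here is a minimal Android sketch that captures 16 kHz mono PCM with Android.Media.AudioRecord and prepends the standard 44-byte WAV header, since whisper.cpp expects 16 kHz 16-bit mono WAV input. The CancellationToken stop condition is an assumption you would wire to the button release, and the class name is hypothetical; keep code like this in a platform-specific service under Platforms/Android.

csharp
using Android.Media;

public class AndroidAudioRecorder
{
    // Records microphone input until the token is cancelled, then returns
    // the path of a 16 kHz mono 16-bit WAV file for whisper.cpp
    public async Task<string> RecordAudioAsync(CancellationToken token)
    {
        const int sampleRate = 16000;
        var path = Path.Combine(FileSystem.CacheDirectory, "recording.wav");

        int minBuffer = AudioRecord.GetMinBufferSize(
            sampleRate, ChannelIn.Mono, Encoding.Pcm16bit);
        using var recorder = new AudioRecord(
            AudioSource.Mic, sampleRate, ChannelIn.Mono, Encoding.Pcm16bit, minBuffer * 2);

        using var pcm = new MemoryStream();
        var buffer = new byte[minBuffer];

        recorder.StartRecording();
        try
        {
            await Task.Run(() =>
            {
                while (!token.IsCancellationRequested)
                {
                    int read = recorder.Read(buffer, 0, buffer.Length);
                    if (read > 0) pcm.Write(buffer, 0, read);
                }
            });
        }
        finally
        {
            recorder.Stop();
        }

        // Prepend the 44-byte PCM WAV header to the captured samples
        using var file = File.Create(path);
        WriteWavHeader(file, (int)pcm.Length, sampleRate);
        pcm.Position = 0;
        pcm.CopyTo(file);
        return path;
    }

    private static void WriteWavHeader(Stream s, int dataLength, int sampleRate)
    {
        const short channels = 1, bitsPerSample = 16;
        int byteRate = sampleRate * channels * bitsPerSample / 8;

        using var w = new BinaryWriter(s, System.Text.Encoding.ASCII, leaveOpen: true);
        w.Write("RIFF"u8.ToArray()); w.Write(36 + dataLength);
        w.Write("WAVE"u8.ToArray());
        w.Write("fmt "u8.ToArray()); w.Write(16);       // fmt chunk size
        w.Write((short)1);                              // PCM format tag
        w.Write(channels);
        w.Write(sampleRate);
        w.Write(byteRate);
        w.Write((short)(channels * bitsPerSample / 8)); // block align
        w.Write(bitsPerSample);
        w.Write("data"u8.ToArray()); w.Write(dataLength);
    }
}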

Step 9: Text-to-Speech Service (Piper TTS)

csharp
using System.Diagnostics;

namespace AIVoiceChat;

public interface ITextToSpeech
{
    Task SpeakAsync(string text, string language, string voiceType);
}

public class PiperTextToSpeech : ITextToSpeech
{
    public async Task SpeakAsync(string text, string language, string voiceType)
    {
        try
        {
            // Get the appropriate voice model based on language and voiceType
            var voiceModel = GetVoiceModel(language, voiceType);
            
            if (voiceModel == null)
            {
                Debug.WriteLine($"No voice model found for {language}/{voiceType}");
                return;
            }
            
            // Call piper TTS
            await ProcessWithPiper(text, voiceModel);
        }
        catch (Exception ex)
        {
            Debug.WriteLine($"TTS error: {ex}");
        }
    }
    
    private string GetVoiceModel(string language, string voiceType)
    {
        // Map language and voiceType to specific piper voice models
        // You would need to include these models in your app resources
        if (string.IsNullOrEmpty(language)) language = "en"; // default to English
        
        // Example mapping:
        if (language.StartsWith("en"))
        {
            return voiceType switch
            {
                "male" => "models/piper/en/en_US-lessac-medium.onnx",
                "female" => "models/piper/en/en_US-amy-medium.onnx",
                "boy" => "models/piper/en/en_US-hfc_male-medium.onnx",
                "girl" => "models/piper/en/en_US-hfc_female-medium.onnx",
                "oldman" => "models/piper/en/en_US-danny-medium.onnx",
                _ => "models/piper/en/en_US-amy-medium.onnx"
            };
        }
        else if (language.StartsWith("es"))
        {
            return "models/piper/es/es_ES-davefx-medium.onnx";
        }
        // Add more languages as needed
        
        return null;
    }
    
    private async Task ProcessWithPiper(string text, string voiceModel)
    {
        // This would call the piper executable
        // You need to include the piper binaries in your app
        
        var tempFile = Path.Combine(FileSystem.CacheDirectory, "tts_output.wav");
        
        var startInfo = new ProcessStartInfo
        {
            FileName = "piper/piper",
            Arguments = $"--model {voiceModel} --output_file {tempFile}",
            RedirectStandardInput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        
        using var process = new Process { StartInfo = startInfo };
        process.Start();
        
        // Send text to piper
        await process.StandardInput.WriteLineAsync(text);
        process.StandardInput.Close();
        
        await process.WaitForExitAsync();
        
        // Play the generated audio (platform specific)
        await PlayAudio(tempFile);
    }
    
    private async Task PlayAudio(string filePath)
    {
        // Platform-specific audio playback
        // For Android: Use Android.Media.MediaPlayer
        // For iOS: Use AVAudioPlayer
        // For Windows: Use NAudio or Windows.Media.Playback
        
        await Task.Delay(500); // Simulate playback
    }
}
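
The PlayAudio stub above is platform specific. One simpler cross-platform option is the open-source Plugin.Maui.Audio package (dotnet add package Plugin.Maui.Audio); the following is a minimal sketch, assuming that package is installed.

csharp
using Plugin.Maui.Audio;

private async Task PlayAudio(string filePath)
{
    // AudioManager.Current provides a shared IAudioManager without DI
    using var stream = File.OpenRead(filePath);
    var player = AudioManager.Current.CreatePlayer(stream);

    // Complete only when playback finishes, so SpeakAsync does not
    // return while the voice is still speaking
    var done = new TaskCompletionSource();
    player.PlaybackEnded += (_, _) => done.TrySetResult();
    player.Play();
    await done.Task;
    player.Dispose();
}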

Step 10: AI Chat Service (llama.cpp or Ollama)

csharp
using System.Diagnostics;
using System.Text;

namespace AIVoiceChat;

public interface IAIChatService
{
    Task<AIResponse> GetResponseAsync(string input);
}

public class AIResponse
{
    public string Text { get; set; }
    public string Language { get; set; }
}

public class LocalAIChatService : IAIChatService
{
    public async Task<AIResponse> GetResponseAsync(string input)
    {
        try
        {
            // Call llama.cpp or Ollama
            var result = await ProcessWithLlama(input);
            
            return new AIResponse
            {
                Text = result,
                Language = "en" // Assume English for now
            };
        }
        catch (Exception ex)
        {
            Debug.WriteLine($"AI error: {ex}");
            return new AIResponse
            {
                Text = "I encountered an error processing your request.",
                Language = "en"
            };
        }
    }
    
    private async Task<string> ProcessWithLlama(string input)
    {
        // This would call the llama.cpp executable
        // You need to include the llama.cpp binaries and model in your app
        
        // llama.cpp's main takes the prompt via -p; it does not read a
        // prompt from stdin unless started in interactive mode
        var prompt = $"User: {input.Replace("\"", "'")}\nAI:";
        
        var startInfo = new ProcessStartInfo
        {
            FileName = "llama.cpp/main",
            Arguments = $"-m models/ggml-model-q4_0.bin --temp 0.7 --repeat_penalty 1.1 -p \"{prompt}\"",
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        
        using var process = new Process { StartInfo = startInfo };
        process.Start();
        
        // Read response
        var output = new StringBuilder();
        while (!process.StandardOutput.EndOfStream)
        {
            var line = await process.StandardOutput.ReadLineAsync();
            if (line == null) break;
            
            output.AppendLine(line);
            
            // Simple heuristic: stop when the model begins the next "User:"
            // turn (ReadLineAsync strips newlines, so a "\n\n" check could
            // never match)
            if (line.Contains("User:"))
                break;
        }
        
        await process.WaitForExitAsync();
        
        return output.ToString().Trim();
    }
}
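
If you prefer Ollama to shelling out to llama.cpp, the chat service can call Ollama's local REST API instead. A minimal sketch, assuming an Ollama server running on its default port 11434 with a pulled model named llama3 (both assumptions; on the Android emulator, replace localhost with 10.0.2.2):

csharp
using System.Net.Http.Json;
using System.Text.Json.Serialization;

public class OllamaChatService : IAIChatService
{
    private readonly HttpClient _http = new()
    {
        BaseAddress = new Uri("http://localhost:11434")
    };

    public async Task<AIResponse> GetResponseAsync(string input)
    {
        // POST /api/generate with stream=false returns a single JSON object
        var response = await _http.PostAsJsonAsync("/api/generate", new
        {
            model = "llama3",   // assumed model name; use any model you have pulled
            prompt = input,
            stream = false
        });
        response.EnsureSuccessStatusCode();

        var result = await response.Content.ReadFromJsonAsync<OllamaResult>();
        return new AIResponse
        {
            Text = result?.Response?.Trim() ?? "",
            Language = "en"
        };
    }

    private sealed class OllamaResult
    {
        [JsonPropertyName("response")]
        public string Response { get; set; }
    }
}

To use it, swap the registration in MauiProgram: builder.Services.AddSingleton<IAIChatService, OllamaChatService>();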

Step 11: Settings Page

xml
<?xml version="1.0" encoding="utf-8" ?>
<ContentPage xmlns="http://schemas.microsoft.com/dotnet/2021/maui"
             xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml"
             x:Class="AIVoiceChat.SettingsPage"
             Title="Settings">
    <ScrollView>
        <StackLayout Padding="20" Spacing="15">
            <Label Text="Voice Settings"
                   FontSize="18"
                   FontAttributes="Bold" />
                   
            <Picker x:Name="voicePicker"
                    Title="Select Voice"
                    ItemsSource="{Binding VoiceOptions}"
                    ItemDisplayBinding="{Binding DisplayName}" />
                    
            <Label Text="AI Model Settings"
                   FontSize="18"
                   FontAttributes="Bold"
                   Margin="0,20,0,0" />
                   
            <Picker x:Name="modelPicker"
                    Title="Select AI Model"
                    ItemsSource="{Binding ModelOptions}" />
                    
            <Button Text="Save Settings"
                    Clicked="OnSaveClicked"
                    BackgroundColor="#4CAF50"
                    TextColor="White"
                    CornerRadius="20"
                    HeightRequest="50" />
        </StackLayout>
    </ScrollView>
</ContentPage>
And the code-behind (SettingsPage.xaml.cs):

csharp
using System.Collections.ObjectModel;
using System.Globalization;

namespace AIVoiceChat;

public partial class SettingsPage : ContentPage
{
    public ObservableCollection<VoiceOption> VoiceOptions { get; } = new()
    {
        new VoiceOption { Id = "female", DisplayName = "Female Voice" },
        new VoiceOption { Id = "male", DisplayName = "Male Voice" },
        new VoiceOption { Id = "boy", DisplayName = "Boy's Voice" },
        new VoiceOption { Id = "girl", DisplayName = "Girl's Voice" },
        new VoiceOption { Id = "oldman", DisplayName = "Old Man's Voice" }
    };
    
    public ObservableCollection<string> ModelOptions { get; } = new()
    {
        "Small (Fast)",
        "Medium (Balanced)",
        "Large (High Quality)"
    };
    
    public SettingsPage()
    {
        InitializeComponent();
        BindingContext = this;
        
        // Load saved settings
        var savedVoice = Preferences.Get("SelectedVoice", "female");
        var savedModel = Preferences.Get("SelectedModel", 1);
        
        voicePicker.SelectedItem = VoiceOptions.FirstOrDefault(v => v.Id == savedVoice);
        modelPicker.SelectedIndex = savedModel;
    }
    
    private async void OnSaveClicked(object sender, EventArgs e)
    {
        if (voicePicker.SelectedItem is VoiceOption selectedVoice)
        {
            Preferences.Set("SelectedVoice", selectedVoice.Id);
        }
        
        Preferences.Set("SelectedModel", modelPicker.SelectedIndex);
        
        await DisplayAlert("Success", "Settings saved", "OK");
    }
}

public class VoiceOption
{
    public string Id { get; set; }
    public string DisplayName { get; set; }
}

Step 12: MauiProgram.cs

csharp
using CommunityToolkit.Maui;
using Microsoft.Extensions.Logging;
using SkiaSharp.Views.Maui.Controls.Hosting;

namespace AIVoiceChat;

public static class MauiProgram
{
    public static MauiApp CreateMauiApp()
    {
        var builder = MauiApp.CreateBuilder();
        builder
            .UseMauiApp<App>()
            .UseSkiaSharp()
            .UseMauiCommunityToolkit()
            .ConfigureFonts(fonts =>
            {
                fonts.AddFont("OpenSans-Regular.ttf", "OpenSansRegular");
                fonts.AddFont("OpenSans-Semibold.ttf", "OpenSansSemibold");
            });
            
        // Register services
        builder.Services.AddSingleton<IDatabaseService, DatabaseService>();
        builder.Services.AddSingleton<ISpeechToText, WhisperSpeechToText>();
        builder.Services.AddSingleton<ITextToSpeech, PiperTextToSpeech>();
        builder.Services.AddSingleton<IAIChatService, LocalAIChatService>();
        
        // Register pages
        builder.Services.AddTransient<MainPage>();
        builder.Services.AddTransient<SettingsPage>();
        
#if DEBUG
        builder.Logging.AddDebug();
#endif

        var app = builder.Build();
        
        // Initialize the database from the app's own container; calling
        // builder.Services.BuildServiceProvider() here would create a second
        // container with duplicate singleton instances
        app.Services.GetRequiredService<IDatabaseService>()
            .InitializeAsync().GetAwaiter().GetResult();
        
        return app;
    }
}

Step 13: App.xaml.cs

csharp
namespace AIVoiceChat;

public partial class App : Application
{
    public App()
    {
        InitializeComponent();
        
        // Set default voice if not set
        if (!Preferences.ContainsKey("SelectedVoice"))
        {
            Preferences.Set("SelectedVoice", "female");
        }
        
        MainPage = new AppShell();
    }
}

public class AppShell : Shell
{
    public AppShell()
    {
        // Using ContentTemplate lets Shell resolve each page through the
        // MAUI service provider, so MainPage's constructor dependencies are
        // injected from the registrations in MauiProgram (there is no
        // static ServiceProvider to call directly)
        Items.Add(new ShellContent
        {
            Title = "AI Voice Chat",
            ContentTemplate = new DataTemplate(typeof(MainPage)),
            Route = "main"
        });
        
        Items.Add(new ShellContent
        {
            Title = "Settings",
            ContentTemplate = new DataTemplate(typeof(SettingsPage)),
            Route = "settings"
        });
    }
}

Step 14: Platform-Specific Implementations

For each platform (Android, iOS, Windows), you'll need to implement:

  1. Audio Recording: Capture microphone input

  2. Audio Playback: Play the TTS output

  3. Native Libraries: Include the whisper.cpp, llama.cpp, and piper binaries

Example for Android (Platforms/Android/MainActivity.cs):

csharp
using Android.App;
using Android.Content.PM;
using Android.OS;
using Android.Runtime;
using AndroidX.Core.App;
using AndroidX.Core.Content;

[assembly: UsesPermission(Android.Manifest.Permission.RecordAudio)]
[assembly: UsesPermission(Android.Manifest.Permission.ModifyAudioSettings)]

// Keep MainActivity in the root namespace (as the MAUI template does);
// inside a namespace ending in ".Android", references like Android.Manifest
// would resolve to the namespace instead of the Android SDK
namespace AIVoiceChat;

[Activity(Theme = "@style/Maui.SplashTheme", MainLauncher = true, 
          ConfigurationChanges = ConfigChanges.ScreenSize | ConfigChanges.Orientation | ConfigChanges.UiMode | ConfigChanges.ScreenLayout | ConfigChanges.SmallestScreenSize | ConfigChanges.Density)]
public class MainActivity : MauiAppCompatActivity
{
    protected override void OnCreate(Bundle savedInstanceState)
    {
        base.OnCreate(savedInstanceState);
        
        // Request microphone permission if not granted
        if (ContextCompat.CheckSelfPermission(this, Android.Manifest.Permission.RecordAudio) != Permission.Granted)
        {
            ActivityCompat.RequestPermissions(this, new[] { Android.Manifest.Permission.RecordAudio }, 1);
        }
        
        // Copy model files from assets to app data directory
        CopyModelFiles();
    }
    
    private void CopyModelFiles()
    {
        var modelsDir = new Java.IO.File(ApplicationContext.FilesDir, "models");
        if (!modelsDir.Exists()) modelsDir.Mkdirs();
        
        // Copy whisper model
        CopyAssetToFile("ggml-base.en.bin", new Java.IO.File(modelsDir, "ggml-base.en.bin"));
        
        // Copy piper voices
        var piperDir = new Java.IO.File(modelsDir, "piper");
        if (!piperDir.Exists()) piperDir.Mkdirs();
        
        // Copy llama model
        CopyAssetToFile("ggml-model-q4_0.bin", new Java.IO.File(modelsDir, "ggml-model-q4_0.bin"));
    }
    
    private void CopyAssetToFile(string assetName, Java.IO.File destination)
    {
        if (destination.Exists()) return;
        
        // Java.IO.FileOutputStream is not a System.IO.Stream, so copy
        // through a managed FileStream instead
        using var assetStream = Assets.Open(assetName);
        using var fileStream = System.IO.File.Create(destination.AbsolutePath);
        assetStream.CopyTo(fileStream);
    }
}

Step 15: Include Model Files

  1. Download the required models:

    • whisper.cpp multilingual base model (ggml-base.bin), so language auto-detection works

    • llama.cpp 7B quantized model

    • Piper TTS voice models for your target languages

  2. Add them to your project as MauiAssets:

    • Create a "Resources/Raw" folder in your .NET MAUI project

    • Add model files with Build Action = MauiAsset (see the first-run copy helper sketched below)
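
At runtime, a MauiAsset can be opened with FileSystem.OpenAppPackageFileAsync, but the whisper.cpp, llama.cpp, and piper processes need real file paths, so copy each model out of the app package on first run. A minimal sketch (ModelFiles and EnsureModelAsync are hypothetical helper names):

csharp
public static class ModelFiles
{
    // Copies a bundled MauiAsset (e.g. "ggml-base.bin") to AppDataDirectory
    // on first run and returns its absolute path for the native tools
    public static async Task<string> EnsureModelAsync(string assetName)
    {
        var target = Path.Combine(FileSystem.AppDataDirectory, assetName);
        if (File.Exists(target)) return target;

        using var source = await FileSystem.OpenAppPackageFileAsync(assetName);
        using var dest = File.Create(target);
        await source.CopyToAsync(dest);
        return target;
    }
}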

Deployment Notes

  1. This app will be large due to the included models (several GB)

  2. Consider offering model downloads on first run to reduce initial app size (a download helper is sketched after this list)

  3. For better performance, use quantized models (e.g., q4_0 for llama.cpp)
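
For note 2, a minimal first-run download sketch; the URL is a placeholder for wherever you host the models:

csharp
public static async Task<string> DownloadModelAsync(string url, string fileName)
{
    var target = Path.Combine(FileSystem.AppDataDirectory, fileName);
    if (File.Exists(target)) return target;

    using var http = new HttpClient();
    // ResponseHeadersRead streams the body instead of buffering
    // a multi-gigabyte model in memory
    using var response = await http.GetAsync(url, HttpCompletionOption.ResponseHeadersRead);
    response.EnsureSuccessStatusCode();

    await using var source = await response.Content.ReadAsStreamAsync();
    await using var dest = File.Create(target);
    await source.CopyToAsync(dest);
    return target;
}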

Limitations

  1. Local AI models may be slow on mobile devices

  2. Large memory requirements for models

  3. Voice quality may not match commercial TTS services

Alternatives

If local processing is too resource-intensive, you could:

  1. Use OpenAI's Whisper API for speech recognition

  2. Use OpenAI's ChatGPT for responses

  3. Use Azure or Google Cloud TTS services

However, these would require internet connectivity and may have costs associated.

This implementation provides a completely offline, open-source solution for voice-based AI conversation with conversation history and a visually appealing interface.
