How to use Speech Recognition with Xamarin.iOS

Apple has introduced a few interesting new features in iOS 10 for working with speech, including the new Speech API and SiriKit. The Speech API in particular provides a compelling way to incorporate speech recognition into your application. Xamarin has ported these new APIs to its platform, giving you the ability to try these new features in your Xamarin.iOS apps. In this article we'll examine how you can integrate speech recognition into your Xamarin.iOS application by walking through the creation of a speech-driven FlexGrid that filters based on your spoken phrase.

Speech Recognition using the Speech API

The new Speech API is a powerful way to add speech recognition to your app. The API works with both live and prerecorded speech, and gives you transcriptions, multiple interpretations, and confidence levels for the results it provides. The Speech API generally relies on Apple's servers for processing speech and returns the result to you (some newer iOS devices are able to do this locally for certain languages). Both Apple and Xamarin have documentation covering the Speech API, how it works, and how to implement it. Using the Speech API requires that you target iOS 10.
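To give a sense of what the API hands back, here's a minimal sketch of a helper (a hypothetical TranscriptionInspector class, not part of the sample we'll build below) that walks a finished SFSpeechRecognitionResult and logs each candidate transcription along with the confidence of each recognized segment:

        using System;
        using Speech;

        public static class TranscriptionInspector
        {
            // Hypothetical helper: logs every candidate transcription in a
            // recognition result, plus per-segment confidence values.
            public static void LogTranscriptions(SFSpeechRecognitionResult result)
            {
                foreach (SFTranscription transcription in result.Transcriptions)
                {
                    Console.WriteLine(transcription.FormattedString);
                    foreach (SFTranscriptionSegment segment in transcription.Segments)
                    {
                        // Confidence ranges from 0 (none) to 1 (most confident)
                        Console.WriteLine("  '{0}' (confidence {1:F2})", segment.Substring, segment.Confidence);
                    }
                }
            }
        }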

Setting up permission for Speech Recognition

Since the Speech API transmits data back and forth to Apple's servers, it's important that you are both transparent about the process with your users and add the correct permissions to your app. You can add these permissions directly to your Info.plist by adding the NSSpeechRecognitionUsageDescription key (for speech recognition) and the NSMicrophoneUsageDescription key (for accessing the microphone). Our example uses live audio transcription, which is why we also need the second permission for the microphone.
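For reference, the two Info.plist entries might look like the following (the description strings here are only examples; you should use wording that explains how your own app uses these capabilities):

        <key>NSSpeechRecognitionUsageDescription</key>
        <string>Your speech is sent to Apple for recognition so the grid can be filtered by voice.</string>
        <key>NSMicrophoneUsageDescription</key>
        <string>The microphone is used to capture your voice for speech recognition.</string>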

Using the Speech API

There are a number of steps to follow when implementing speech recognition in iOS 10. We'll need to request authorization from the user to use a speech recognizer, create a speech recognition request, and then pass the request to our speech recognizer. Rather than starting from scratch, we'll modify the FullTextFilter sample to demonstrate the Speech API working with FlexGrid. I've modified the sample slightly by adding a button to turn the mic on and off when the user wants to begin and end voice input. We'll create some variables at the top of our class which we'll use throughout the example. These objects pertain to the Speech API or (in the case of the AVAudioEngine) the built-in microphone. The SFSpeechRecognizer can be passed a locale to determine the correct language and region for speech recognition.


        // Recognition request and task for the current recording session
        private SFSpeechAudioBufferRecognitionRequest recognitionRequest;
        private SFSpeechRecognitionTask recognitionTask;
        // The audio engine gives us access to the device microphone
        private AVAudioEngine audioEngine = new AVAudioEngine();
        // Recognizer configured for US English
        private SFSpeechRecognizer speechRecognizer = new SFSpeechRecognizer(new NSLocale("en_US"));

The next step is to get authorization from the user for the speech recognizer. You can do this in the ViewDidLoad method. We'll toggle the mic button on and off depending on whether or not the user has provided permission for voice input.


        public override void ViewDidLoad()
        {
            base.ViewDidLoad();

            MicButton.Enabled = false;

            SFSpeechRecognizer.RequestAuthorization((SFSpeechRecognizerAuthorizationStatus auth) =>
            {
                bool buttonIsEnabled = false;
                switch (auth)
                {
                    case SFSpeechRecognizerAuthorizationStatus.Authorized:
                        buttonIsEnabled = true;
                        // Tap the microphone input so its buffers can feed the
                        // active recognition request
                        var node = audioEngine.InputNode;
                        var recordingFormat = node.GetBusOutputFormat(0);
                        node.InstallTapOnBus(0, 1024, recordingFormat, (AVAudioPcmBuffer buffer, AVAudioTime when) =>
                        {
                            // recognitionRequest is null until recording starts,
                            // so guard against appending to a missing request
                            recognitionRequest?.Append(buffer);
                        });
                        break;
                    case SFSpeechRecognizerAuthorizationStatus.Denied:
                    case SFSpeechRecognizerAuthorizationStatus.Restricted:
                    case SFSpeechRecognizerAuthorizationStatus.NotDetermined:
                        buttonIsEnabled = false;
                        break;
                }

                // The authorization callback may arrive on a background thread,
                // so update the UI on the main thread
                InvokeOnMainThread(() => { MicButton.Enabled = buttonIsEnabled; });
            });
        ...

Assuming the user gives us permission to use the mic, we install the mic tap on the AVAudioEngine's InputNode using InstallTapOnBus. We'll handle starting and stopping the recording in separate methods. Starting speech recognition consists of initializing a new recognitionRequest, preparing and starting the audioEngine, and starting the speech recognition task. We can provide some feedback by changing the placeholder of the UITextField (searchText) to display a recording message while recording is in progress. We'll capture the best transcription of our text once the recognition task ends, both showing it in searchText and calling textChange to filter based on that new value.


        public void StartSpeechRecognition()
        {
            searchText.Placeholder = "Recording";
            recognitionRequest = new SFSpeechAudioBufferRecognitionRequest();

            // Start pulling audio from the microphone
            audioEngine.Prepare();
            NSError error;
            audioEngine.StartAndReturnError(out error);
            if (error != null)
            {
                Console.WriteLine(error.ToString());
                return;
            }

            // Kick off the recognition task; the handler fires as results arrive
            recognitionTask = speechRecognizer.GetRecognitionTask(recognitionRequest, (SFSpeechRecognitionResult result, NSError err) =>
            {
                if (err != null)
                {
                    Console.WriteLine(err.ToString());
                }
                else if (result.Final)
                {
                    // Show the best transcription and filter the grid with it
                    searchText.Text = result.BestTranscription.FormattedString;
                    textChange(searchText);
                }
            });
        }
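As an aside, if you'd rather update the text live while the user is still speaking, the request can be asked to report partial results. A small variation on the task above (using the same recognitionRequest, searchText, and textChange as in the sample) might look like this:

        recognitionRequest.ShouldReportPartialResults = true;

        recognitionTask = speechRecognizer.GetRecognitionTask(recognitionRequest, (SFSpeechRecognitionResult result, NSError err) =>
        {
            if (err == null && result != null)
            {
                // Show the best guess so far; result.Final indicates
                // that the transcription has settled
                searchText.Text = result.BestTranscription.FormattedString;
                if (result.Final)
                {
                    textChange(searchText);
                }
            }
        });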

Stopping the recording is much easier. All we need to do is stop the audioEngine and end the recognitionRequest.


        public void StopSpeechRecognition()
        {
            // Stop capturing audio and signal that no more audio is coming
            audioEngine.Stop();
            recognitionRequest.EndAudio();
        }
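If you're tearing speech recognition down entirely (say, when the view disappears), it's also worth removing the mic tap and cancelling any in-flight task. Here's a sketch using the fields declared earlier (TearDownSpeechRecognition is a hypothetical helper, not part of the sample):

        public void TearDownSpeechRecognition()
        {
            audioEngine.Stop();
            // Remove the tap installed in ViewDidLoad so buffers stop flowing
            audioEngine.InputNode.RemoveTapOnBus(0);
            recognitionRequest?.EndAudio();
            // Cancel discards any remaining results for the task
            recognitionTask?.Cancel();
            recognitionTask = null;
        }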

The final piece is to allow the mic button to toggle between starting and stopping recording. We'll also provide some feedback to the user via the button's text, indicating whether the mic is recording or stopped.


        partial void MicButton_TouchUpInside(UIButton sender)
        {
            if (audioEngine.Running)
            {
                StopSpeechRecognition();
                MicButton.SetTitle("Start", UIControlState.Normal);
            }
            else
            {
                StartSpeechRecognition();
                MicButton.SetTitle("Stop", UIControlState.Normal);
            }
        }

Wrap up

We're handling the simplest case for voice recognition in this post, but you could go further and provide actual voice commands for your app using the same API. All of the mobile platforms now provide some form of speech recognition, so this is simply a starting point as far as the Xamarin platform goes. In the coming weeks we'll look at adding this behavior to Xamarin.Android, and possibly Xamarin.Forms as well.
