Being able to let your users control your application with their voice can be really useful. That’s where speech recognition comes in. Before you can get started, you need to add the networking, speech recognition, and microphone capabilities (ID_CAP_NETWORKING, ID_CAP_SPEECH_RECOGNITION, and ID_CAP_MICROPHONE) to the WMAppManifest.xml file.
Now that you have the right capabilities, you
can start to use the speech recognition system. There are two main ways
to use speech recognition: with and without the system user interface.
Using the system’s user interface is the easiest method, but it might
conflict with your branding or vision for your application.
The class responsible for speech recognition using the system UI is the SpeechRecognizerUI class. This class implements the IDisposable interface, which means you should take care to clean up its resources when you’re done using it. Typically, you would create the object in the Loaded event of a page or control and dispose of it in the Unloaded event, as shown:
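```csharp
using System.Windows;
using Microsoft.Phone.Controls;
using Windows.Phone.Speech.Recognition;

public partial class MainPage : PhoneApplicationPage
{
    SpeechRecognizerUI _ui;

    // Constructor
    public MainPage()
    {
        InitializeComponent();
        Loaded += MainPage_Loaded;
        Unloaded += MainPage_Unloaded;
    }

    void MainPage_Loaded(object sender, RoutedEventArgs e)
    {
        // Create the recognizer when the page is loaded...
        _ui = new SpeechRecognizerUI();
    }

    void MainPage_Unloaded(object sender, RoutedEventArgs e)
    {
        // ...and release its resources when the page is unloaded
        _ui.Dispose();
    }

    ...
}
```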
With creation and teardown in place, you can use the class. The method that listens for speech is called RecognizeWithUIAsync, and it follows the async pattern. To use it, mark your method with async and use await to show the UI and wait for the speech to be recognized:
```csharp
private async void speechUIButton_Click(object sender, EventArgs e)
{
    var result = await _ui.RecognizeWithUIAsync();
    if (result.ResultStatus == SpeechRecognitionUIStatus.Succeeded)
    {
        var recognized = result.RecognitionResult.Text;
        AddItem(recognized);
    }
}
```
When you call the RecognizeWithUIAsync method, it shows the UI and listens for speech. You can see this in Figure 1.
FIGURE 1 Using SpeechRecognizerUI
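Before calling RecognizeWithUIAsync, you can adjust what the system UI displays through the object’s Settings property. The following is a minimal sketch; the helper name SpeechUiSetup.ConfigurePrompts and the prompt strings are hypothetical examples, not part of the original sample:

```csharp
using Windows.Phone.Speech.Recognition;

public static class SpeechUiSetup
{
    // Hypothetical helper: customizes the system UI before it is shown.
    // The strings are illustrative values, not required text.
    public static void ConfigurePrompts(SpeechRecognizerUI ui)
    {
        // Heading displayed while the phone is listening
        ui.Settings.ListenText = "What should I add?";

        // Sample phrase displayed to hint at what the user can say
        ui.Settings.ExampleText = "For example, \"buy milk\"";

        // Speak the recognized text back to the user
        ui.Settings.ReadoutEnabled = true;

        // Show the confirmation screen with the recognized text
        ui.Settings.ShowConfirmation = true;
    }
}
```

You could call SpeechUiSetup.ConfigurePrompts(_ui) in MainPage_Loaded right after creating the SpeechRecognizerUI, so every recognition on the page uses the same prompts.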
The result of the speech recognition is an object with two properties. The first, ResultStatus, is an enumeration value that indicates whether the operation succeeded, and you can test it for success (as shown previously). If the operation did not succeed, the second property, RecognitionResult (a SpeechRecognitionResult object), will not be valid. If the operation did succeed, you can use RecognitionResult to see the text that was recognized.
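Rather than silently ignoring a failed recognition, you can report it to the user. This sketch assumes the Windows Phone 8 SpeechRecognitionUIStatus values Cancelled and PrivacyPolicyDeclined exist alongside Succeeded, and treats any other value generically; the class name and the messages are hypothetical:

```csharp
using Windows.Phone.Speech.Recognition;

public static class SpeechUiStatusMessages
{
    // Hypothetical helper: turns a non-success status into a message
    // the page could display. Returns null when there is nothing to report.
    public static string Describe(SpeechRecognitionUIStatus status)
    {
        switch (status)
        {
            case SpeechRecognitionUIStatus.Succeeded:
                return null; // Use result.RecognitionResult instead
            case SpeechRecognitionUIStatus.Cancelled:
                return "Speech input was cancelled.";
            case SpeechRecognitionUIStatus.PrivacyPolicyDeclined:
                return "Speech recognition requires accepting the speech privacy policy.";
            default:
                // Any other status, such as the recognizer being unavailable
                return "Speech recognition did not complete.";
        }
    }
}
```

In the click handler, an else branch could pass result.ResultStatus to Describe and show any non-null message, for example with MessageBox.Show.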
The speech recognition engine also calculates its confidence in the result. This helps you judge how likely it is that the recognized text is correct. You can use the TextConfidence property of RecognitionResult to check this:
```csharp
private async void speechUIButton_Click(object sender, EventArgs e)
{
    var result = await _ui.RecognizeWithUIAsync();
    if (result.ResultStatus == SpeechRecognitionUIStatus.Succeeded &&
        result.RecognitionResult.TextConfidence >=
            SpeechRecognitionConfidence.Medium)
    {
        var confidence = result.RecognitionResult.TextConfidence;
        var recognized = result.RecognitionResult.Text;
        AddItem(string.Concat(confidence, " - ", recognized));
    }
}
```
In this case, we use the result only if the confidence is at least Medium, which reduces the number of false positives.
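If you would rather not depend on how the SpeechRecognitionConfidence values are ordered numerically, you can name the accepted levels explicitly. This sketch accepts High and Medium, the levels that "at least Medium" is meant to allow; SpeechConfidence.IsAcceptable is a hypothetical helper:

```csharp
using Windows.Phone.Speech.Recognition;

public static class SpeechConfidence
{
    // Hypothetical helper: accepts only High or Medium confidence.
    // Listing the accepted values avoids relying on the enum's numeric order.
    public static bool IsAcceptable(SpeechRecognitionConfidence confidence)
    {
        return confidence == SpeechRecognitionConfidence.High
            || confidence == SpeechRecognitionConfidence.Medium;
    }
}
```

Because TextConfidence is a property of SpeechRecognitionResult, the same helper can be applied to result.RecognitionResult.TextConfidence here and to results obtained without the system UI.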
The other way to use the speech recognition engine is without the user interface. Unsurprisingly, the class involved is called SpeechRecognizer (note: no “UI” suffix). The pattern for using it is much the same as for the UI class:
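```csharp
using System.Windows;
using Microsoft.Phone.Controls;
using Windows.Phone.Speech.Recognition;

public partial class MainPage : PhoneApplicationPage
{
    SpeechRecognizer _rec;

    // Constructor
    public MainPage()
    {
        InitializeComponent();
        Loaded += MainPage_Loaded;
        Unloaded += MainPage_Unloaded;
    }

    void MainPage_Loaded(object sender, RoutedEventArgs e)
    {
        // Create the UI-less recognizer when the page is loaded...
        _rec = new SpeechRecognizer();
    }

    void MainPage_Unloaded(object sender, RoutedEventArgs e)
    {
        // ...and release its resources when the page is unloaded
        _rec.Dispose();
    }

    ...
}
```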
When you actually perform speech recognition, you again use the async pattern, but it is up to you to let your users know that you are listening:
```csharp
private async void speechButton_Click(object sender, EventArgs e)
{
    // Show the user you are listening (a "Listening" visual state
    // defined in the page's XAML)
    VisualStateManager.GoToState(this, "Listening", true);

    // Listen for speech
    var result = await _rec.RecognizeAsync();
    if (result.TextConfidence >= SpeechRecognitionConfidence.Medium)
    {
        var confidence = result.TextConfidence;
        var recognized = result.Text;
        AddItem(string.Concat(confidence, " - ", recognized));
    }
}
```
The only real difference in using the RecognizeAsync method is that it returns a SpeechRecognitionResult object directly. You can then test the confidence just as you did before. If recognition failed, TextConfidence will be SpeechRecognitionConfidence.Rejected, so you can test for that value to detect a failure.
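The rejection test can be wrapped in a small helper that listens once and returns null on failure. This is a sketch that assumes a Rejected confidence is the only failure you need to handle; the SpeechListening class and its method names are hypothetical:

```csharp
using System;                  // Enables awaiting Windows Runtime async operations
using System.Threading.Tasks;
using Windows.Phone.Speech.Recognition;

public static class SpeechListening
{
    // Hypothetical helper: returns the text unless the engine rejected it.
    public static string TextOrNull(SpeechRecognitionConfidence confidence, string text)
    {
        return confidence == SpeechRecognitionConfidence.Rejected ? null : text;
    }

    // Hypothetical helper: listens once without the system UI and returns
    // the recognized text, or null if the utterance was rejected.
    public static async Task<string> ListenOnceAsync(SpeechRecognizer recognizer)
    {
        SpeechRecognitionResult result = await recognizer.RecognizeAsync();
        return TextOrNull(result.TextConfidence, result.Text);
    }
}
```

When calling ListenOnceAsync from the page, placing the call in a try/finally block lets you move the page out of the “Listening” visual state even if recognition throws an exception.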