When a user holds down the home button for more than a couple of seconds, the speech recognition subsystem launches and enables the user to perform searches and issue other commands on the phone. The “Listening” screen (shown in Figure 1) appears and waits for the user to say something.
FIGURE 1 The “Listening” Screen
Although users can say anything they want to search for, they can also use voice commands such as “Call Mom” or “Open Twitter.” By default, the voice commands are based on the app name, but your app might want to have its own voice commands for specific functionality.
Before your application can add its own voice commands, it first needs to request certain capabilities, including ID_CAP_MICROPHONE, ID_CAP_NETWORKING, and ID_CAP_SPEECH_RECOGNITION. You can see these capabilities added to the WMAppManifest.xml file in Figure 2.
FIGURE 2 Adding the voice command capabilities
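If you edit the manifest by hand, the same capabilities appear as Capability entries in WMAppManifest.xml. The following is a minimal sketch showing only those three entries; a real manifest contains other capabilities and settings as well:
<!-- Inside WMAppManifest.xml (only the relevant entries shown) -->
<Capabilities>
  <Capability Name="ID_CAP_NETWORKING" />
  <Capability Name="ID_CAP_MICROPHONE" />
  <Capability Name="ID_CAP_SPEECH_RECOGNITION" />
</Capabilities>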
Next, you will need to add a new file called a
Voice Command Definition (VCD) file. This is just an XML file, but
Visual Studio for Windows Phone includes an item template for it
(as shown in Figure 3).
FIGURE 3 The Voice Command Definition item template
The XML file has a simple structure as shown here:
<?xml version="1.0" encoding="utf-8"?>
<VoiceCommands xmlns="http://schemas.microsoft.com/
voicecommands/1.0">
<CommandSet >
<CommandPrefix>Facey</CommandPrefix>
<Example> show me a smile </Example>
<Command Name="ShowFace">
<Example> show me a smile </Example>
<ListenFor> show [me] [a] {facetype} </ListenFor>
<Feedback> Your Face is Coming </Feedback>
<Navigate Target="pages/facepage.xaml" />
</Command>
<PhraseList Label="facetype">
<Item> smile </Item>
<Item> frown </Item>
<Item> grimace </Item>
</PhraseList>
</CommandSet>
</VoiceCommands>
The CommandSet element defines a set of commands. Its child elements supply the command metadata; for example, the CommandPrefix and Example elements apply to all the commands in the set. Next, you will have one or more Command elements, each defining a type of command your app supports; each command must have a unique name. Finally, the PhraseList element defines replaceable words in your commands; think of a PhraseList as an enumerated list of possible values. Let’s see how these work in this simple example.
All your commands need to start with a common prefix; this is how the phone knows it should match your application’s commands. The prefix defined in the CommandPrefix element determines the phrase the user must say to start one of your commands. In this example, the app uses the prefix “Facey” so that users can say “Facey show me a smile,” and the app will show them a smiley face. The Example element after the CommandPrefix defines a string that is shown to users to demonstrate how to use the app’s voice commands.
Next we need to set up the command itself. The Command element in the example shows that we need a name to identify the Command. Then, inside the element, you can specify the following:
• Example: Like the main example, this is a command-specific example shown to the user.
• ListenFor: One or more patterns for the speech recognition engine to match (see the sketch after this list). Words in square brackets (for example, [a]) are optional. Words in curly braces (for example, {facetype}) refer to a PhraseList element.
• Feedback: The text shown to the user to acknowledge that the voice command was recognized.
• Navigate: An optional URI for the part of your app to navigate to. If this is omitted, your first page (for example, MainPage.xaml) will be used.
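As a sketch of how flexible the ListenFor patterns can be, the ShowFace command could also accept an alternative phrasing by listing more than one ListenFor element. The second ListenFor line here is an illustrative assumption and is not part of the VCD shown earlier:
<Command Name="ShowFace">
  <Example> show me a smile </Example>
  <!-- Either phrasing triggers the command -->
  <ListenFor> show [me] [a] {facetype} </ListenFor>
  <ListenFor> [please] display [a] {facetype} </ListenFor>
  <Feedback> Your Face is Coming </Feedback>
  <Navigate Target="pages/facepage.xaml" />
</Command>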
The last element is the PhraseList. As the example shows, the PhraseList has a Label that identifies it; that label is used in the ListenFor elements to match the two. This list does not have to be hard-coded; we’ll show you later how to programmatically specify the contents of a PhraseList.
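As a preview of that technique, the following is a minimal sketch of a programmatic update using the VoiceCommandService.InstalledCommandSets dictionary and UpdatePhraseListAsync (both in the Windows.Phone.Speech.VoiceCommands namespace). It assumes the CommandSet element has been given a Name attribute (for example, Name="FaceyCommands"), which is not shown in the VCD above, and that the command sets have already been installed:
// A sketch of replacing the "facetype" PhraseList at runtime.
// Assumes the CommandSet was declared with Name="FaceyCommands" in the VCD.
private async Task UpdateFaceTypesAsync()
{
    VoiceCommandSet faceyCommands =
        VoiceCommandService.InstalledCommandSets["FaceyCommands"];

    // Replace the phrase list contents (the "wink" entry is just an example)
    await faceyCommands.UpdatePhraseListAsync(
        "facetype",
        new[] { "smile", "frown", "grimace", "wink" });
}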
After you have the VCD file created, you must register it with the VoiceCommandService
class. This is done as an asynchronous call that can be handled during
the first launch of your application. For example, to register the VCD
file during the navigation to your main page:
protected async override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);

    // Register the command sets defined in the VCD file with the phone
    await VoiceCommandService.InstallCommandSetsFromFileAsync(
        new Uri("ms-appx:///MyVoiceCommands.xml"));
}
The VoiceCommandService class has a static method called InstallCommandSetsFromFileAsync. This method uses the new async support; therefore, you need to add the async keyword to the containing method’s signature and use the await keyword so the page waits for the method to complete before continuing. You specify the path to your VCD file using a URI. Notice that it uses a URI moniker called “ms-appx,” which indicates that the file path is in the installation directory of your phone application: ms-appx:// is the moniker, and /MyVoiceCommands.xml is the path to the VCD file. If you have placed the file in a subdirectory of your project, make sure that the path includes the subdirectory.
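For example, if the VCD file lived in a hypothetical VoiceCommands folder within your project, the registration call would include that folder in the path:
// Hypothetical layout: the VCD file sits in a VoiceCommands folder in the project
await VoiceCommandService.InstallCommandSetsFromFileAsync(
    new Uri("ms-appx:///VoiceCommands/MyVoiceCommands.xml"));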
Now that you have created commands and
registered them, the user will be able to launch your app using the
commands you’ve defined. When your commands are executed, the Voice
Command system launches your application and notifies the user that the
voice command was accepted by showing your app name, icon, and voice
command description, as shown in Figure 4.
FIGURE 4 Voice Command launching your app
The next part of the puzzle is reacting to the navigation to the page where you want the voice command handled. In the VCD, you can specify a path to a page in the optional Navigate element (inside the Command element). When the voice command is executed, your page is shown and the navigation includes information about the command in the query string. So, on your page, just override the OnNavigatedTo method and first check to make sure the NavigationMode is New:
protected override void OnNavigatedTo(NavigationEventArgs e)
{
    base.OnNavigatedTo(e);

    // Only check for voice command on fresh navigation,
    // not tombstoning
    if (e.NavigationMode == NavigationMode.New)
    {
        // ...
    }
}
This ensures that the check happens only when the page is launched from an external source (for example, a voice command). Next, you should check the NavigationContext.QueryString for the name of the command that was sent:
// Is there a voice command in the query string?
if (NavigationContext.QueryString.ContainsKey("voiceCommandName"))
{
    // If so, get the name of the Voice Command.
    var cmdName = NavigationContext.QueryString["voiceCommandName"];

    // If it's the command we expect,
    // then find the type of face to show
    if (cmdName == "ShowFace")
    {
        // ...
    }
}
The voiceCommandName query string parameter is set to the Name of the Command element in the VCD that was matched. If it is present, you can retrieve the voiceCommandName and test it against the commands you expect; this is useful if a single page handles more than one type of command. After you’ve determined that it is the right command, you can retrieve the query string parameter that holds the matched PhraseList item. In this case the facetype PhraseList supports three face types, as shown here:
var faceType = NavigationContext.QueryString["facetype"].ToLower();

// Show supported face types
switch (faceType)
{
    case "smile":
        VisualStateManager.GoToState(this, "Smile", false);
        break;
    case "grimace":
        VisualStateManager.GoToState(this, "Grimace", false);
        break;
    case "frown":
        VisualStateManager.GoToState(this, "Frown", false);
        break;
}
What you do with the data provided is completely up to you, but in this example we use the VisualStateManager to switch to a visual state for each type of face.
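For context, the state names passed to GoToState have to be defined on the page. The following XAML is a minimal sketch of what facepage.xaml might contain; the VisualStateGroup name, the FaceText TextBlock, and the emoticon text are illustrative assumptions:
<!-- Hypothetical visual states on facepage.xaml that the GoToState calls above target -->
<Grid x:Name="LayoutRoot">
  <VisualStateManager.VisualStateGroups>
    <VisualStateGroup x:Name="FaceStates">
      <VisualState x:Name="Smile">
        <Storyboard>
          <ObjectAnimationUsingKeyFrames Storyboard.TargetName="FaceText"
                                         Storyboard.TargetProperty="(TextBlock.Text)">
            <DiscreteObjectKeyFrame KeyTime="0" Value=":-)" />
          </ObjectAnimationUsingKeyFrames>
        </Storyboard>
      </VisualState>
      <!-- The Frown and Grimace states follow the same pattern with different text -->
      <VisualState x:Name="Frown">
        <Storyboard>
          <ObjectAnimationUsingKeyFrames Storyboard.TargetName="FaceText"
                                         Storyboard.TargetProperty="(TextBlock.Text)">
            <DiscreteObjectKeyFrame KeyTime="0" Value=":-(" />
          </ObjectAnimationUsingKeyFrames>
        </Storyboard>
      </VisualState>
      <VisualState x:Name="Grimace">
        <Storyboard>
          <ObjectAnimationUsingKeyFrames Storyboard.TargetName="FaceText"
                                         Storyboard.TargetProperty="(TextBlock.Text)">
            <DiscreteObjectKeyFrame KeyTime="0" Value=":-S" />
          </ObjectAnimationUsingKeyFrames>
        </Storyboard>
      </VisualState>
    </VisualStateGroup>
  </VisualStateManager.VisualStateGroups>
  <TextBlock x:Name="FaceText" FontSize="96"
             HorizontalAlignment="Center" VerticalAlignment="Center" />
</Grid>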
Using voice commands is simple, but
sometimes you need to handle voice-based control within your app—and
that’s where speech recognition comes in.