XAudio2 is the DirectSound replacement for Windows
developers and is an enhanced version of the XAudio API that Xbox
developers have been enjoying for some time. In this article we will create a demo that will play a sound
file once and then exit. This demo will show you how to get XAudio2 up
and working to play sound inside any application.
XAudio2 does not have a way to detect and convert audio files between
incompatible endian orders. This means that if you are working directly
with XAudio2 on Xbox 360 and Windows, you must handle endian order yourself.
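As a rough illustration of what handling endian order by hand can look like, the sketch below swaps the bytes of every 16-bit PCM sample in a buffer. The names swap16() and swapPcm16Buffer() are hypothetical helpers, not part of XAudio2, and the sketch assumes 16-bit samples; other sample sizes and the header fields of the file would need the same treatment.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of XAudio2): reverses the two bytes of a 16-bit value.
inline std::uint16_t swap16(std::uint16_t value)
{
   return static_cast<std::uint16_t>((value << 8) | (value >> 8));
}

// Hypothetical helper: swaps every complete 16-bit sample in a raw PCM byte
// buffer in place. A trailing odd byte, if any, is left untouched.
inline void swapPcm16Buffer(std::vector<unsigned char> &pcm)
{
   for(std::size_t i = 0; i + 1 < pcm.size(); i += 2)
   {
      unsigned char first = pcm[i];
      pcm[i] = pcm[i + 1];
      pcm[i + 1] = first;
   }
}
```

Whether a swap is needed at all depends on the byte order the file was written in and the byte order of the platform that plays it, so real code would decide that once when the file is loaded.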
Like XACT3, XAudio2 has an
interface that you create to use XAudio2. This interface is called
IXAudio2, and it is created by calling the SDK function XAudio2Create().
On the Xbox 360 this is an actual API function, while on Windows,
according to the DirectX documentation, it is a convenient inline
function defined in XAudio2.h. XAudio2Create()
has the following function prototype and takes as parameters the
IXAudio2 object that will be created, creation flags (defaults to 0 or XAUDIO2_DEBUG_ENGINE for debug mode), and an audio processor that specifies which CPU XAudio2 should use, which has a default value of XAUDIO2_DEFAULT_PROCESSOR.
HRESULT XAudio2Create(
   IXAudio2 **ppXAudio2,
   UINT32 Flags = 0,
   XAUDIO2_PROCESSOR XAudio2Processor = XAUDIO2_DEFAULT_PROCESSOR
);
On the Xbox 360, XAudio2 is implemented as a statically linked library,
while on Windows it is a COM object implemented by a dynamic link
library (DLL).
XAudio2 uses something known as voices to manipulate and control audio. There
are three types of these voices in the XAudio2 API: source voices,
submix voices, and mastering voices. A source voice is used to send
sound data to the other types of voices, and it represents an audio
stream of data. A submix voice is used to process audio data from a
source voice to perform various effects (e.g., sample rate conversion)
and can also be used as an input voice to another submix voice or to a
mastering voice. A mastering voice is the voice that is audible, and it
sends the data it receives from source and submix voices to the audio
hardware. Because the mastering voice is the only voice that produces
audible output, you must create one in XAudio2 to hear anything.
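To make the routing between the three voice types easier to picture, here is a toy model written in plain C++. It is not XAudio2 code: the SourceVoice, SubmixVoice, and mixToMaster() names are hypothetical, and the "effect" applied by the submix is just a volume scale chosen for illustration.

```cpp
#include <cstddef>
#include <vector>

// Toy model only; these types are hypothetical and are not the XAudio2 interfaces.
struct SourceVoice
{
   std::vector<float> samples;   // the audio stream this voice supplies
};

struct SubmixVoice
{
   float volume;                           // stand-in for an effect applied by the submix
   std::vector<const SourceVoice*> inputs; // source voices routed into this submix

   // Sums the inputs sample by sample and then applies the volume.
   std::vector<float> process(std::size_t frameCount) const
   {
      std::vector<float> out(frameCount, 0.0f);
      for(const SourceVoice *src : inputs)
         for(std::size_t i = 0; i < frameCount && i < src->samples.size(); ++i)
            out[i] += src->samples[i];
      for(float &s : out)
         s *= volume;
      return out;
   }
};

// Plays the role of the mastering voice: combines everything routed to it into
// the final buffer that would be handed to the audio hardware.
inline std::vector<float> mixToMaster(const std::vector<const SubmixVoice*> &submixes,
                                      const std::vector<const SourceVoice*> &directSources,
                                      std::size_t frameCount)
{
   std::vector<float> master(frameCount, 0.0f);
   for(const SubmixVoice *sub : submixes)
   {
      std::vector<float> processed = sub->process(frameCount);
      for(std::size_t i = 0; i < frameCount; ++i)
         master[i] += processed[i];
   }
   for(const SourceVoice *src : directSources)
      for(std::size_t i = 0; i < frameCount && i < src->samples.size(); ++i)
         master[i] += src->samples[i];
   return master;
}
```

In this model a source voice can feed the master directly or pass through a submix first, which mirrors the two routes described above. Nothing is "heard" until mixToMaster() produces the final buffer.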
As far as the basics
of XAudio2 are concerned, this is essentially what you need to play
audio in the API. In the XAudio2 demo’s main source file, the main() function
calls CoInitializeEx() because
XAudio2 is a COM object in Windows. It creates the XAudio2 engine, and
it creates the mastering voice that will play the actual sound. In the
demo the loading and playing of the actual file are done in a function
called PlayPCM(), which will be discussed later in this section.
The creation of the mastering voice is done with a call to CreateMasteringVoice(), which takes as parameters the address of an IXAudio2MasteringVoice
pointer that will receive the created voice object, the number of audio channels,
the audio sample rate, flags for the voice (which must be set to 0), the
output device index the voice will use, and an optional audio effects
chain using the structure XAUDIO2_EFFECT_CHAIN. The audio channels are set to XAUDIO2_DEFAULT_CHANNELS, which defaults to 5.1 surround on Xbox 360. In Windows, XAudio2 attempts to determine the speaker configuration.
The main() function in the XAudio2 demo is shown in Listing 1.
To recap, the function initializes COM, creates the audio engine,
creates the mastering voice, loads and plays the sound with a call to PlayPCM() that will be implemented next, and exits the application after releasing the audio engine and uninitializing COM.
Listing 1. The XAudio2 Demo’s main() Source File
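int main(int argc, char* argv[])
{
   cout << "XAudio2 Demo: Playing clip.wav" << endl << endl;
   cout << "Demo will end when the sound is done." << endl << endl;
   CoInitializeEx(NULL, COINIT_MULTITHREADED);
   IXAudio2* xAudio2Engine = NULL;
   UINT32 flags = 0;
#ifdef _DEBUG
   flags |= XAUDIO2_DEBUG_ENGINE;
#endif
   if(FAILED(XAudio2Create(&xAudio2Engine, flags)))
   {
      cout << "XAudio2 engine was not created!" << endl;
      CoUninitialize();
      return 1;
   }
   IXAudio2MasteringVoice *masterVoice = NULL;
   if(FAILED(xAudio2Engine->CreateMasteringVoice(&masterVoice,
      XAUDIO2_DEFAULT_CHANNELS, XAUDIO2_DEFAULT_SAMPLERATE, 0, 0, NULL)))
   {
      cout << "Master voice was not created!" << endl;
      if(xAudio2Engine != NULL) xAudio2Engine->Release();
      CoUninitialize();
      return 1;
   }
   if(PlayPCM(xAudio2Engine, "clip.wav") == false)
      cout << "clip.wav failed to load!" << endl;
   if(xAudio2Engine != NULL) xAudio2Engine->Release();
   CoUninitialize();
   return 0;
}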
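Listing 1 repeats the same cleanup on more than one exit path. One common C++ alternative is a small scope guard whose destructor runs the cleanup automatically; the sketch below shows the pattern in portable C++. ScopeGuard and runDemo() are hypothetical, and the lambdas only record their names where a real program would call Release() on the engine and CoUninitialize().

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: runs the stored cleanup function when it goes out of scope.
class ScopeGuard
{
public:
   explicit ScopeGuard(std::function<void()> cleanup) : m_cleanup(std::move(cleanup)) {}
   ~ScopeGuard() { if(m_cleanup) m_cleanup(); }

   ScopeGuard(const ScopeGuard&) = delete;
   ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
   std::function<void()> m_cleanup;
};

// Stand-in for main(): 'log' records the order in which the cleanup steps run.
// Returning early at any point still triggers every guard created so far.
inline bool runDemo(bool playSucceeds, std::vector<std::string> &log)
{
   ScopeGuard comGuard([&log] { log.push_back("uninitialize COM"); });
   ScopeGuard engineGuard([&log] { log.push_back("release engine"); });

   if(!playSucceeds)
      return false;   // both guards still run, the engine guard first

   log.push_back("sound played");
   return true;
}
```

Guards are destroyed in the reverse order of their creation, so the engine is released before COM is uninitialized, which is the same order Listing 1 uses.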
An audio file is loaded and played with a call to PlayPCM(). This function is a modified version of the PlayPCM()
function offered in the Microsoft DirectX SDK sample XAudio2BasicSound.
To load and play sounds we will use this function as well as the files
SDKwavefile.h and SDKwavefile.cpp. The SDKwavefile files are part of the
DirectX Utility (DXUT) library and can be found in any of the DXUT
samples in the DirectX SDK. Since these files are part of DirectX, we
will use them instead of writing some very long and complicated code for
loading audio files. Since the files use DXUT, they have been slightly
altered so that the use of the SDKwavefile files does not require any of
the other DXUT headers or source files.
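For a sense of what a wave loader such as CWaveFile has to do, the sketch below walks the RIFF chunks of a WAV file that is already in memory and pulls out the format fields and the location of the sample data. It is a simplified, hypothetical stand-in: the WavInfo struct and the parseWav() function are not part of the SDK, only the basic "fmt " and "data" chunks are handled, and it does far less validation than the SDKwavefile code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical result type; the SDK's CWaveFile exposes a WAVEFORMATEX instead.
struct WavInfo
{
   std::uint16_t formatTag = 0;      // 1 means uncompressed PCM
   std::uint16_t channels = 0;
   std::uint32_t sampleRate = 0;
   std::uint16_t bitsPerSample = 0;
   std::size_t dataOffset = 0;       // byte offset of the sample data in the file
   std::uint32_t dataSize = 0;       // size of the sample data in bytes
};

// WAV files store their integers in little-endian order.
inline std::uint16_t readLE16(const unsigned char *p)
{
   return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t readLE32(const unsigned char *p)
{
   return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
          (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Returns true if both the "fmt " and "data" chunks were found.
inline bool parseWav(const std::vector<unsigned char> &file, WavInfo &info)
{
   if(file.size() < 12 || std::memcmp(&file[0], "RIFF", 4) != 0 ||
      std::memcmp(&file[8], "WAVE", 4) != 0)
      return false;

   bool haveFormat = false, haveData = false;
   std::size_t pos = 12;

   while(pos + 8 <= file.size() && !(haveFormat && haveData))
   {
      const unsigned char *chunk = &file[pos];
      std::uint32_t chunkSize = readLE32(chunk + 4);
      std::size_t body = pos + 8;

      if(std::memcmp(chunk, "fmt ", 4) == 0 && chunkSize >= 16 && body + 16 <= file.size())
      {
         info.formatTag = readLE16(&file[body]);
         info.channels = readLE16(&file[body + 2]);
         info.sampleRate = readLE32(&file[body + 4]);
         info.bitsPerSample = readLE16(&file[body + 14]);
         haveFormat = true;
      }
      else if(std::memcmp(chunk, "data", 4) == 0 && chunkSize <= file.size() - body)
      {
         info.dataOffset = body;
         info.dataSize = chunkSize;
         haveData = true;
      }

      // Chunks are padded to an even number of bytes.
      pos = body + chunkSize + (chunkSize & 1);
   }

   return haveFormat && haveData;
}
```

Even this reduced version has to deal with chunk ordering, padding, and bounds checks, which is a fair argument for reusing the SDK's loader in the demo.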
The PlayPCM() function uses the CWaveFile class defined in SDKwavefile.h to open the audio file. The file is read by calling the Read()
function, which takes as parameters a buffer to read into, the size to
read in bytes, and an out pointer that receives the number of bytes read by the
function.
Once the file is loaded,
the source voice is created. Keep in mind that the source voice
represents a stream of audio data. To create the source voice, which has
an interface of IXAudio2SourceVoice, we call the CreateSourceVoice()
function of the XAudio2 engine object. This function takes the source
voice that will be created, the format of the audio (using the WAVEFORMATEX
structure provided by Windows), behavior flags, the maximum frequency
ratio, a callback interface, an optional send list of destination voices
that will receive the audio data, and an optional audio effect chain.
The behavior flags can be 0 or a combination of the following values:
XAUDIO2_VOICE_NOPITCH for no pitch control
XAUDIO2_VOICE_NOSRC for no sample rate conversion
XAUDIO2_VOICE_USEFILTER to enable filter effects on the sound
XAUDIO2_VOICE_MUSIC to state that the voice is used to play background music
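Because the behavior flags are bit flags, they are combined with bitwise OR and tested with bitwise AND. The sketch below shows the idea with stand-in constants. The kVoice names, their values, and the two helper functions are placeholders for illustration; real code would use the XAUDIO2_VOICE_* constants from XAudio2.h instead.

```cpp
#include <cstdint>

// Placeholder flags for illustration only; real code uses the XAUDIO2_VOICE_*
// constants from XAudio2.h, whose values may differ from these.
const std::uint32_t kVoiceNoPitch   = 0x1;
const std::uint32_t kVoiceNoSrc     = 0x2;
const std::uint32_t kVoiceUseFilter = 0x4;
const std::uint32_t kVoiceMusic     = 0x8;

// Returns true if every bit in 'flag' is set in 'flags'.
inline bool hasFlag(std::uint32_t flags, std::uint32_t flag)
{
   return (flags & flag) == flag;
}

// Builds the flag word for a background music voice that needs no pitch control.
inline std::uint32_t musicVoiceFlags()
{
   return kVoiceNoPitch | kVoiceMusic;
}
```

Passing 0 instead simply leaves every optional behavior at its default.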
Once the source voice is created, an audio buffer using the XAudio2 structure XAUDIO2_BUFFER
is created. This buffer will take the sound data and submit it to the
source voice, which can only happen after a valid source voice has been
created by CreateSourceVoice(). The audio buffer has the audio data assigned to the pAudioData variable, the audio flags to the Flags variable, and the size of the audio in bytes to the AudioBytes variable. The flag of XAUDIO2_END_OF_STREAM tells XAudio2 that there is no more data to follow after the sound has played.
To submit the data to the source voice, you call SubmitSourceBuffer() on the source voice object, which takes as a parameter the XAUDIO2_BUFFER object. If all is successful, you can start processing the sound by calling Start() on the source voice. The Start() function takes as parameters behavior flags that must be set to 0 and an operation set. The operation set can be XAUDIO2_COMMIT_NOW to apply the operation immediately, or an operation set identifier that defers the call until the pending operations are applied with CommitChanges() on the engine, where XAUDIO2_COMMIT_ALL applies all pending operations.
When a source voice is processing, it is being played. You can test the state of the sound by calling the GetState() function on the source voice object. This fills in an XAUDIO2_VOICE_STATE structure that you can test for various states. To test if the sound is still playing, you can check whether the BuffersQueued variable is greater than 0.
Once you are done with a source voice, you free it by calling DestroyVoice(). The entire PlayPCM() function is shown in Listing 2,
which contains all of the code discussed in the previous few paragraphs.
This function essentially loads a sound, plays it, and then frees it
from memory. As a bonus exercise you should separate the loading and
playing code into their own functions and allow the sound to be played
multiple times before it is freed.
Listing 2. The PlayPCM() Function
bool PlayPCM(IXAudio2* xAudio2Engine, char *filename)
{
   CWaveFile wav;
   if(FAILED(wav.Open(filename, NULL, WAVEFILE_READ)))
      return false;
WAVEFORMATEX *format = wav.GetFormat();
unsigned long wavSize = wav.GetSize();
unsigned char *wavData = new unsigned char[wavSize];
if(FAILED(wav.Read(wavData, wavSize, &wavSize)))