Programming with DirectX : Sound in DirectX - XAudio2

7/6/2011 6:01:52 PM
XAudio2 is the DirectSound replacement for Windows developers and is an enhanced version of the XAudio API that Xbox developers have been enjoying for some time. In this article we will create a demo that plays a sound file once and then exits. The demo shows how to get XAudio2 up and running so that you can play sound inside any application.

XAudio2 does not have a way to detect and convert audio files between incompatible endian orders. This means that if you are working directly with XAudio2 on Xbox 360 and Windows, you must handle endian order carefully.
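As an illustration of what handling endian order can involve, WAV files store their sample data in little-endian order, while the Xbox 360 CPU is big-endian. The following is a minimal sketch, assuming 16-bit PCM samples, of swapping the byte order of a sample buffer before handing it to the audio engine. The function names SwapBytes16() and SwapPcm16InPlace() are hypothetical helpers for this example and are not part of XAudio2.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Swaps the two bytes of a single 16-bit value.
inline std::uint16_t SwapBytes16(std::uint16_t value)
{
    return static_cast<std::uint16_t>((value << 8) | (value >> 8));
}

// Reverses the byte order of every 16-bit sample in a raw PCM buffer.
// Assumption: the buffer holds 16-bit samples, so byteCount must be even.
inline void SwapPcm16InPlace(unsigned char* data, std::size_t byteCount)
{
    assert(byteCount % 2 == 0);

    for (std::size_t i = 0; i + 1 < byteCount; i += 2)
    {
        unsigned char tmp = data[i];
        data[i] = data[i + 1];
        data[i + 1] = tmp;
    }
}
```

Other sample sizes (for example 24-bit or 32-bit float data) would need their own swap width, and multi-byte header fields need the same care as the sample data.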


Like XACT3, XAudio2 has an interface that you create to use XAudio2. This interface is called IXAudio2, and it is created by calling the SDK function XAudio2Create(). On the Xbox 360 this is an actual API function, while on Windows, according to the DirectX documentation, it is a convenient inline function defined in XAudio2.h. XAudio2Create() has the following function prototype. It takes as parameters the address of a pointer that receives the created IXAudio2 object, creation flags (0 by default, or XAUDIO2_DEBUG_ENGINE for debug mode), and an audio processor value that specifies which CPU XAudio2 should use, which has a default value of XAUDIO2_DEFAULT_PROCESSOR.

HRESULT XAudio2Create(
   IXAudio2 **ppXAudio2,
   UINT32 Flags = 0,
   XAUDIO2_PROCESSOR XAudio2Processor = XAUDIO2_DEFAULT_PROCESSOR
);

On the Xbox 360, XAudio2 is implemented as a statically linked library, while on Windows it is a COM object implemented by a dynamic link library.

XAudio2 uses something known as voices to manipulate and control audio. There are three types of voices in the XAudio2 API: source voices, submix voices, and mastering voices. A source voice represents a stream of audio data and sends that data to the other types of voices. A submix voice processes audio data from a source voice to perform various effects (e.g., sample rate conversion) and can also be used as an input voice to another submix voice or to a mastering voice. A mastering voice is the voice that is audible; it sends the data it receives from source and submix voices to the audio hardware. Because the mastering voice is the only voice that produces audible output, you must create one in XAudio2 to hear anything.

As far as the basics of XAudio2 are concerned, this is essentially what you need to play audio with the API. In the XAudio2 demo’s main source file, the main() function calls CoInitializeEx() because XAudio2 is a COM object on Windows. It then creates the XAudio2 engine and the mastering voice that will output the actual sound. In the demo the loading and playing of the actual file are done in a function called PlayPCM(), which will be discussed later in this section.

The mastering voice is created with a call to CreateMasteringVoice(), which takes as parameters the address of an IXAudio2MasteringVoice pointer that will receive the created voice object, the number of audio channels, the audio sample rate, flags for the voice (which must be set to 0), the index of the output device the voice will use, and an optional audio effects chain using the structure XAUDIO2_EFFECT_CHAIN. In the demo the audio channels are set to XAUDIO2_DEFAULT_CHANNELS, which defaults to 5.1 surround on Xbox 360. On Windows, XAudio2 attempts to determine the speaker configuration.

The main() function in the XAudio2 demo is shown in Listing 1. To recap, the function initializes COM, creates the audio engine, creates the mastering voice, loads and plays the sound with a call to PlayPCM() that will be implemented next, and exits the application after releasing the audio engine and uninitializing COM.

Listing 1. The XAudio2 Demo’s main() Source File
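// Assumes <windows.h>, <xaudio2.h>, and <iostream> are included,
// with using namespace std; and PlayPCM() declared above.
int main(int args, char* argc[])
{
   cout << "XAudio2 Demo: Playing clip.wav" << endl << endl;
   cout << "Demo will end when the sound is done." << endl << endl;

   // XAudio2 is a COM object on Windows, so initialize COM first.
   if(FAILED(CoInitializeEx(NULL, COINIT_MULTITHREADED)))
      return 0;

   IXAudio2* xAudio2Engine = NULL;

   // Use the debug engine in debug builds.
   UINT32 flags = 0;
#ifdef _DEBUG
   flags |= XAUDIO2_DEBUG_ENGINE;
#endif

   if(FAILED(XAudio2Create(&xAudio2Engine, flags)))
   {
      cout << "XAudio2 engine was not created!" << endl;
      CoUninitialize();
      return 0;
   }

   IXAudio2MasteringVoice *masterVoice = NULL;

   // The mastering voice sends the final mix to the audio hardware.
   if(FAILED(xAudio2Engine->CreateMasteringVoice(&masterVoice,
      XAUDIO2_DEFAULT_CHANNELS, XAUDIO2_DEFAULT_SAMPLERATE,
      0, 0, NULL)))
   {
      cout << "Master voice was not created!" << endl;

      if(xAudio2Engine != NULL)
         xAudio2Engine->Release();

      CoUninitialize();
      return 0;
   }

   // Load the file, play it, and wait until it finishes.
   if(PlayPCM(xAudio2Engine, "clip.wav") == false)
   {
      cout << "clip.wav failed to load!" << endl;

      if(xAudio2Engine != NULL)
         xAudio2Engine->Release();

      CoUninitialize();
      return 0;
   }

   // Release the engine before uninitializing COM.
   if(xAudio2Engine != NULL)
      xAudio2Engine->Release();

   CoUninitialize();

   return 1;
}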
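
Listing 1 repeats the same release steps on every exit path. One possible way to reduce that repetition, offered here as a sketch and not as part of the original demo, is to hold the engine in a std::unique_ptr with a custom deleter that calls Release(). The names ReleaseDeleter, ReleasePtr, FakeEngine, and RunWithEngine() are hypothetical; FakeEngine is only a stand-in so the example compiles without the DirectX SDK, and with XAudio2 the pointer type would be IXAudio2.

```cpp
#include <cassert>
#include <memory>

// Deleter that calls Release() on COM-style objects.
// unique_ptr only invokes it for non-null pointers.
struct ReleaseDeleter
{
    template <typename T>
    void operator()(T* object) const { object->Release(); }
};

template <typename T>
using ReleasePtr = std::unique_ptr<T, ReleaseDeleter>;

// Hypothetical stand-in for an engine object; counts Release() calls.
struct FakeEngine
{
    int* releaseCount;
    void Release() { ++(*releaseCount); }
};

// Takes ownership of the engine; Release() runs on every return path.
inline bool RunWithEngine(FakeEngine* raw, bool failMidway)
{
    assert(raw != nullptr);
    ReleasePtr<FakeEngine> engine(raw);

    if (failMidway)
        return false;   // engine is released here

    return true;        // and here
}
```

If this pattern were used in main(), the smart pointer would need to go out of scope before CoUninitialize() is called, so that the engine is released while COM is still initialized. Voices are freed with DestroyVoice() and would need a different deleter.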

An audio file is loaded and played with a call to PlayPCM(). This function is a modified version of the PlayPCM() function offered in the Microsoft DirectX SDK sample XAudio2BasicSound. To load and play sounds we will use this function as well as the files SDKwavefile.h and SDKwavefile.cpp. The SDKwavefile files are part of the DirectX Utility (DXUT) library and can be found in any of the DXUT samples in the DirectX SDK. Since these files are part of DirectX, we will use them instead of writing some very long and complicated code for loading audio files. Since the files use DXUT, they have been slightly altered so that the use of the SDKwavefile files does not require any of the other DXUT headers or source files.

The PlayPCM() function uses the CWaveFile class defined in SDKwavefile.h to open the audio file. The file is read by calling the Read() function, which takes as parameters a buffer to read into, the number of bytes to read, and an out pointer that receives the number of bytes actually read.

Once the file is loaded, the source voice is created. Keep in mind that the source voice represents a stream of audio data. To create the source voice, which has an interface of IXAudio2SourceVoice, we call the CreateSourceVoice() function of the XAudio2 engine object. This function takes the address of the source voice pointer that will be created, the format of the audio (using the WAVEFORMATEX structure provided by Windows), behavior flags, the maximum frequency ratio, a callback interface, an optional send list of destination voices for the audio data, and an optional audio effect chain. The behavior flags can be 0 or a combination of the following values:

  • XAUDIO2_VOICE_NOPITCH for no pitch control

  • XAUDIO2_VOICE_NOSRC for no sample rate conversion

  • XAUDIO2_VOICE_USEFILTER to enable filter effects on the sound

  • XAUDIO2_VOICE_MUSIC to state that the voice is used to play background music

Once the source voice is created, an audio buffer using the XAudio2 structure XAUDIO2_BUFFER is created. This buffer takes the sound data and submits it to the source voice, which can only happen after a valid source voice has been created by CreateSourceVoice(). The audio buffer has the audio data assigned to the pAudioData variable, the audio flags to the Flags variable, and the size of the audio in bytes to the AudioBytes variable. The flag XAUDIO2_END_OF_STREAM tells XAudio2 that there is no more data to follow after the sound has played.

To submit the data to the source voice, you call SubmitSourceBuffer() on the source voice object, which takes as a parameter a pointer to the XAUDIO2_BUFFER object. If all is successful, you can start processing the sound by calling Start() on the source voice. The Start() function takes as parameters behavior flags that must be set to 0 and an operation set. The operation set can be XAUDIO2_COMMIT_NOW to apply the operation immediately, or an application-chosen identifier that defers the operation until CommitChanges() is called on the engine. Passing XAUDIO2_COMMIT_ALL to CommitChanges() applies all pending operations.

When a source voice is processing, it is being played. You can test the state of the sound by calling the GetState() function on the source voice object. This fills in an XAUDIO2_VOICE_STATE structure that you can test for various states. To test whether the sound is still playing, check whether the BuffersQueued variable is greater than 0.
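
The polling described above can be sketched as a small wait loop. This is a minimal illustration, not the demo's own code: WaitUntilDone(), FakeState, and FakeVoice are hypothetical names, and the state and voice types are template parameters so the example compiles without the DirectX SDK. With XAudio2 the state type would be XAUDIO2_VOICE_STATE and the voice would be an IXAudio2SourceVoice. The sleep interval is an assumption chosen to avoid spinning the CPU while the sound plays.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Polls a voice until no buffers remain queued, sleeping between checks.
// Returns the number of sleeps performed before the voice finished.
template <typename StateT, typename VoiceT>
unsigned WaitUntilDone(VoiceT& voice, std::chrono::milliseconds interval)
{
    assert(interval.count() >= 0);

    unsigned sleeps = 0;
    StateT state{};
    voice.GetState(&state);

    while (state.BuffersQueued > 0)
    {
        std::this_thread::sleep_for(interval);
        ++sleeps;
        voice.GetState(&state);
    }

    return sleeps;
}

// Hypothetical stand-ins used only to exercise the loop; not XAudio2 types.
struct FakeState { unsigned BuffersQueued; };

struct FakeVoice
{
    unsigned pollsUntilDone;

    // Reports one queued buffer until pollsUntilDone reaches zero.
    void GetState(FakeState* out)
    {
        out->BuffersQueued = (pollsUntilDone > 0) ? 1u : 0u;
        if (pollsUntilDone > 0)
            --pollsUntilDone;
    }
};
```

With a real source voice pointer the call might look like WaitUntilDone<XAUDIO2_VOICE_STATE>(*srcVoice, std::chrono::milliseconds(10)).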

Once you are done with a source voice, you free it by calling DestroyVoice(). The entire PlayPCM() function is shown in Listing 2 with all the code we’ve just discussed in the previous few paragraphs. This function essentially loads a sound, plays it, and then frees it from memory. As a bonus exercise you should separate the loading and playing code into their own functions and allow the sound to be played multiple times before it is freed.

Listing 2. The PlayPCM() Function
bool PlayPCM(IXAudio2* xAudio2Engine, char *filename)
{
   CWaveFile wav;

   if(FAILED(wav.Open(filename, NULL, WAVEFILE_READ)))
      return false;

   WAVEFORMATEX *format = wav.GetFormat();
   unsigned long wavSize = wav.GetSize();
   unsigned char *wavData = new unsigned char[wavSize];

   if(FAILED(wav.Read(wavData, wavSize, &wavSize)))
   {
      delete[] wavData;

      return false;
   }

   IXAudio2SourceVoice *srcVoice;

   if(FAILED(xAudio2Engine->CreateSourceVoice(&srcVoice, format,