Audio Codecs and Voice Message Sizes
Codec is a
contraction of coding and decoding digital data. This is the format in
which the audio stream is stored. It determines both the bit
rate (bits/sec) and the compression that is used.
The codec that is used by
the Unified Messaging server to encode the messages is one of the
following four:
Windows Media Audio (WMA)— 16-bit compressed
GSM 06.10 (GSM)— 8-bit compressed
G.711 PCM Linear (G711)— 16-bit uncompressed
MPEG Audio Layer 3 (MP3)— 16-bit compressed
The Exchange
Server 2010 unified messaging default is MP3. This is a change from
Exchange Server 2007 in which the default was WMA. Although using WMA
results in slightly smaller file sizes, most people prefer the universal
nature of MP3. This enables a much larger number of mobile devices to
play voice mail messages. The Audio Codec setting is configured on the
UM dial plan on the Settings tab.
Note
A dirty
little secret is that the digital compression can result in loss of
data. When the data is compressed and decompressed, information can be
lost. That is, bits of the conversation or message can be lost. This is a
trade-off that the codec makes to save space. This is why the G.711
codec is available, which doesn’t compress data and doesn’t lose data
but at a heavy cost in storage.
Voice messages are stored in the
email message as attachments using the following formats:
Windows Media Audio Format (.wma)— For the WMA codec
RIFF/WAV Format (.wav)— For the GSM or G.711 codecs
MPEG Audio Layer 3 (.mp3)— For the MP3 codec
The choice of the audio
codec impacts the audio quality and the size of the attached file. Table 1 shows the approximate size of data in the file
attachment for each codec.
Table 1. Audio Size for Codec Options
Codec Setting | Approximate Size of 10 Sec of Audio |
---|---|
WMA | 11,000 bytes |
G.711 | 160,000 bytes |
GSM | 16,000 bytes |
MP3 | 19,500 bytes |
The G.711 audio codec
setting results in a greater than 10:1 storage penalty when compared to
the WMA audio codec setting. Although the GSM audio codec setting
results in approximately the same storage as the WMA codec setting, this
comes at a cost of a 50% reduction in audio quality. MP3 provides
similar audio quality to WMA at an acceptable file size. The ubiquitous
nature of the MP3 codec makes it the preferred choice for Exchange
Server 2010.
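The storage comparison above follows directly from the figures in Table 1. The short Python sketch below reproduces that arithmetic and scales the 10-second figures to other message lengths. The function names and the linear scaling are illustrative assumptions, not part of Exchange; real attachments also carry a fixed file header, so very short messages deviate from this estimate.

```python
# Approximate attachment sizes for 10 seconds of audio, taken from Table 1.
BYTES_PER_10_SEC = {
    "WMA": 11_000,
    "G.711": 160_000,
    "GSM": 16_000,
    "MP3": 19_500,
}


def estimate_bytes(codec: str, seconds: float) -> int:
    """Linearly scale the Table 1 figure to a message of the given length.

    Assumption: size grows in proportion to duration. This ignores the
    fixed file header, so it is least accurate for very short messages.
    """
    return round(BYTES_PER_10_SEC[codec] * seconds / 10)


def storage_ratio(codec: str, baseline: str = "WMA") -> float:
    """Return how many times larger `codec` is than `baseline` in Table 1."""
    return BYTES_PER_10_SEC[codec] / BYTES_PER_10_SEC[baseline]


# Print a rough 30-second estimate and the size relative to WMA per codec.
for name in BYTES_PER_10_SEC:
    print(f"{name:6s} 30 s ~ {estimate_bytes(name, 30):>8,d} bytes, "
          f"{storage_ratio(name):.1f}x WMA")
```

With the Table 1 figures, the G.711 to WMA ratio works out to roughly 14.5, which is consistent with the "greater than 10:1" penalty described above.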
Note
The .wma file
format has a larger header (about 7KB) than the .wav format (about 0.1KB). For short messages,
the GSM files are therefore smaller. However, after messages exceed 15
seconds, the WMA files are smaller than the GSM files.
Operating System Requirements
This section
discusses the minimum and recommended hardware requirements, along with the
operating system and component requirements, for Exchange
Server 2010 Unified Messaging servers.
Exchange Server 2010 unified messaging supports
the following processors:
x64 architecture-based Intel Xeon or Intel Pentium family processor that supports Intel Extended Memory 64 Technology
x64 architecture-based computer with AMD Opteron or AMD Athlon 64-bit processor that supports the AMD64 platform
The Exchange
Server 2010 unified messaging memory requirements are as follows:
2GB of RAM minimum
4GB of RAM recommended
The Exchange
Server 2010 unified messaging disk space requirements are as follows:
A minimum of 1.2GB of available disk space
Plus 500MB of available disk space for each unified messaging language pack
200MB of available disk space on the system drive
DVD drive
As the features and
complexity of applications such as Exchange Server 2010 have grown,
the installation code bases have grown proportionally. Luckily, so have
the hardware specifications of the average new system, which now
typically includes a DVD drive.
Exchange
Server 2010 unified messaging supports the following operating
systems:
Windows Server 2008, x64 Standard Edition with Service Pack 2
Windows Server 2008, x64 Enterprise Edition with Service Pack 2
Windows Server 2008 R2, x64 Standard Edition
Windows Server 2008 R2, x64 Enterprise Edition
Exchange Server 2010
unified messaging requires the following components to be installed:
Microsoft .NET Framework Version 3.5
Windows PowerShell 2.0
Windows Remote Management (WinRM) 2.0
Extensions for ASP.NET AJAX 1.0
Desktop Experience operating system feature
Microsoft Management Console (MMC) 3.0
Out of the box, an
Exchange Server 2010 Unified Messaging server is configured for a
maximum of 100 concurrent calls. This is enough to support potentially
thousands of users, given that the number of calls and voice messages
per day is a fraction of the number of users and is spread out
throughout the day.
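The reasoning behind that estimate can be checked with some rough arithmetic. The Python sketch below computes the average number of simultaneous calls for a given workload. Only the 100-call limit comes from the text; the user count, calls per user, call length, and busy hours are hypothetical values chosen for illustration and are not sizing guidance. An average also understates peak load, so treat the result as a sanity check rather than a capacity plan.

```python
MAX_CONCURRENT_CALLS = 100  # default limit stated in the text


def average_concurrent_calls(users: int,
                             calls_per_user_per_day: float,
                             avg_call_seconds: float,
                             busy_hours_per_day: float) -> float:
    """Average number of calls in progress during the busy part of the day.

    Total call-seconds per day divided by the seconds available in the
    busy period gives the mean number of simultaneous calls.
    """
    total_call_seconds = users * calls_per_user_per_day * avg_call_seconds
    busy_seconds = busy_hours_per_day * 3600
    return total_call_seconds / busy_seconds


# Hypothetical workload: 5,000 users, 4 calls each per day,
# 60 seconds per call, spread over an 8-hour working day.
load = average_concurrent_calls(5000, 4, 60, 8)
print(f"average concurrent calls: {load:.1f} of {MAX_CONCURRENT_CALLS}")
```

Under these assumed figures the average is about 42 concurrent calls, which leaves headroom below the default limit even for several thousand users.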
Supported IP/VoIP Hardware
Exchange Server 2010
unified messaging relies on the ability of the IP/VoIP gateway to
translate time-division multiplexing (TDM) or telephony circuit-switched
based protocols, such as Integrated Services Digital Network (ISDN) or
QSIG, from a PBX to protocols based on voice over IP (VoIP) or IP, such
as Session Initiation Protocol (SIP), Real-Time Transport Protocol
(RTP), or T.38 for real-time facsimile transport.
Although there are many
types and manufacturers of PBXs, IP/VoIP gateways, and IP/PBXs, there
are essentially two types of IP/VoIP gateway component configurations:
IP/VoIP Gateway— A legacy PBX and an IP/VoIP gateway provisioned as two separate devices. The Unified Messaging server communicates with the IP/VoIP gateway.
IP/PBX— A modern IP-based or hybrid PBX such as a Cisco CallManager. The Unified Messaging server communicates directly with the PBX.
Table 2 lists the currently supported IP/VoIP gateways.
Table 2. Supported IP/VoIP Gateways for Exchange Server 2010 UM
Manufacturer | Model | Supported Protocols |
---|---|---|
AudioCodes | MediaPack 114, MediaPack 118 | Analog with In-Band or SMDI |
AudioCodes | Mediant 1000/2000 | T1 or E1 with CAS (In-Band or SMDI), T1/E1 with Primary Rate Interface (PRI) and Q.SIG, or Analog PSTN |
Dialogic | 1000/2000 | T1 or E1 with CAS (In-Band or SMDI), T1/E1 with Primary Rate Interface (PRI) and Q.SIG, or Analog PSTN |
Ferrari AG | OfficeMaster 3.2 | PSTN Analog |
Net | VX1200 | T1 or E1 with CAS (In-Band or SMDI), T1/E1 with Primary Rate Interface (PRI) and Q.SIG, or Analog PSTN |
Nortel | CS1000 | Direct SIP |
Quintum | Tenor-series | Analog PSTN |
To support Exchange Server
2010 unified messaging, one or both types of IP/VoIP device
configurations are used when connecting a telephony network
infrastructure to a data network infrastructure.
All of these
solutions must communicate with the Unified Messaging server via SIP over TCP
(TLS encrypted) and SRTP.
Telephony Components and Terminology
With the
integration of Exchange Server 2010 into the telephony world, it is
important for the Exchange Server administrator to understand the
various components and terminology of a modern telephone system.
The following are some
of the common components and terms that are critical to understand:
Circuit— A circuit is an end-to-end connection between two
devices. This allows the devices to communicate. A common example of this
is a telephone call where two people are talking, in which a circuit is
established between the two telephones.
Circuit-switched networks— Circuit-switched networks consist of dedicated
end-to-end connections through the network that support sessions
between end devices. The circuits are set up end-to-end through a series
of switches as needed and torn down
when done. While the circuit is set up, the entire circuit is dedicated
to the devices. A common example of a circuit-switched network is the
PSTN.
DTMF— The Dual-Tone Multi-Frequency (DTMF) signaling
protocol is used for telephony signaling and call setup. The most common
use is for telephone tone dialing and is known as Touch-Tone. This is
used to convey phone button key presses to devices on the network.
IP/PBX— With the advent of high-speed ubiquitous packet-switched
networks, many corporations have moved from legacy PBXs to modern
IP-based PBXs known as Internet Protocol/Private Branch Exchange
(IP/PBX). These devices come in a myriad of forms, ranging from true
IP/PBXs that support only IP protocols to hybrid devices that support
both circuit-switched and packet-switched devices. A major advantage of
the IP/PBXs is that they are typically much easier to provision and
administer. Rather than having to add a separate physical line to plug a
phone into, IP phones are simply plugged into the Ethernet jack. Rather
than being provisioned by the physical line they are plugged into, the
IP phones are provisioned by their own internal characteristics such as
the MAC address. This allows for more flexibility.
IP/VoIP gateways— Connecting legacy circuit-switched networks to
packet-switched networks, IP/VoIP gateways provide connections between
the new packet-switched VoIP protocols and the circuit-switched
protocols. These gateways can connect the PSTN to an IP/PBX or a legacy
PBX to VoIP devices. In the case of Exchange Server 2010 unified
messaging, the IP/VoIP gateway connects the Unified Messaging server to
the legacy PBX. This is not typically needed if the PBX that the Unified
Messaging server is connecting to is an IP/PBX.
Packet-switched networks— In packet-switched networks, there is no
dedicated end-to-end circuit. Instead, the sessions between devices are
disassembled into packets and transmitted individually over the network,
then reassembled when they reach their destination. All sessions travel
over the shared network. A common example of a packet-switched network
is the Internet.
PBX— In
all but the smallest companies, there is a device that takes incoming
calls from the circuit-switched telephone network and routes them within
the company. This device is called a Private Branch Exchange or PBX. In
the old days, this was done by an operator who plugged in the lines
manually. The PBX also routes internal outgoing calls, calls between
internal phones, and calls to other devices such as the voice mail
system.
POTS— The Plain Old Telephone System (POTS) is the
original analog version of the PSTN. The term originally referred to
Post Office Telephone Service, but morphed into the current definition
when control of the telephone systems was removed from national post
offices.
PSTN— The Public Switched Telephone Network (PSTN) is
the circuit-switched network to which most telephones connect. It can be
analog, digital, or a combination of the two.
TDM— Time-division
multiplexing (TDM) is a digital multiplexing technique for placing
multiple simultaneous calls over a circuit-switched network such as the
PSTN.
VoIP— Voice over Internet Protocol (VoIP) is the use of
voice technologies over packet-switched networks using TCP/IP transport
protocols rather than circuit-switched networks like the PSTN. This
takes advantage of and reflects the trend toward a single, ubiquitous
packet-switched network. The local area network (LAN) and wide area
network (WAN) are used not only for data traffic, but also for voice
traffic. VoIP is not a single technology, but rather a collection of
different technologies, protocols, hardware, and software.
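To make the DTMF entry in the list above more concrete, the following Python sketch shows how a key press maps to a pair of tones. The row and column frequencies are the standard DTMF keypad assignments, which the text does not list. The function names, the 8,000 Hz sample rate, and the 0.1-second duration are assumptions chosen for illustration.

```python
import math

# Standard DTMF keypad layout: each key is the sum of one row tone
# (low group) and one column tone (high group), in hertz.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) tone pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError("key must be a single character")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return ROW_HZ[row], COL_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration: float = 0.1,
                 sample_rate: int = 8000) -> list[float]:
    """Generate the two summed sine waves for a key as floats in [-1, 1]."""
    low, high = dtmf_frequencies(key)
    count = round(duration * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * n / sample_rate)
        for n in range(count)
    ]


print(dtmf_frequencies("5"))
```

A receiving device detects which two frequencies are present and so recovers the key that was pressed, which is how button presses are conveyed over an audio channel.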
Unified Messaging Protocols
The Exchange
Server 2010 Unified Messaging servers use several telephony-related
protocols to integrate and communicate with telephony devices. These
protocols are described in the following list:
SIP— Session Initiation Protocol (SIP) is the signaling
protocol that is used to set up and tear down VoIP calls. These calls
include voice, video, instant messaging, and a variety of other
services. The SIP protocol is specified in RFC 3261 produced by the
Internet Engineering Task Force (IETF) SIP Working Group. SIP is only a
signaling protocol and does not carry the media itself. After the call is
set up, the actual communications take place using RTP for voice and
video or T.38 for faxes.
Note
SIP can be configured to run over User
Datagram Protocol (UDP) or Transmission Control Protocol (TCP). UDP is
connectionless and does not provide reliability guarantees over the
network. TCP is connection-oriented and provides reliability guarantees
for its packets. Exchange Server 2010 supports SIP over TCP only.
RTP— Real-Time Transport Protocol (RTP) is a protocol
for sending the voice and video data over the TCP/IP network. The
protocol relies on other protocols, such as SIP or H.323, to perform
call setup and teardown. It was developed by the IETF Audio-Video
Transport Working Group and is specified in RFC 3550. RTP has no
fixed port, but it is normally configured to use
ports in the range 16384–32767. Because the protocol uses a dynamic port range,
it is not ideally suited to traversing firewalls.
T.38— The Real-Time Facsimile Transport (T.38)
protocol is an International Telecommunication Union (ITU) standard for
transmitting faxes over TCP/IP. The protocol is described in RFC 3362.
Although it can support call setup and teardown, it is normally used in
conjunction with a signaling protocol such as SIP.
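Because SIP is a text-based signaling protocol, a request is easy to inspect. The Python sketch below builds a minimal SIP OPTIONS request with the header fields that RFC 3261 requires and marks the transport as TCP in the Via header. It only constructs the message and does not send it. The host names, the `probe` user, and the function name are hypothetical, and the source does not say how a Unified Messaging server responds to such a request.

```python
import uuid


def build_sip_options(target_host: str, local_host: str,
                      local_port: int = 5060) -> str:
    """Build a minimal SIP OPTIONS request as text (nothing is sent).

    The header set follows the mandatory fields in RFC 3261. All host
    names passed in are placeholders supplied by the caller.
    """
    # RFC 3261 requires branch values to start with this magic cookie.
    branch = "z9hG4bK" + uuid.uuid4().hex
    call_id = f"{uuid.uuid4().hex}@{local_host}"
    from_tag = uuid.uuid4().hex[:8]
    lines = [
        f"OPTIONS sip:{target_host} SIP/2.0",
        f"Via: SIP/2.0/TCP {local_host}:{local_port};branch={branch}",
        "Max-Forwards: 70",
        f"From: <sip:probe@{local_host}>;tag={from_tag}",
        f"To: <sip:{target_host}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 OPTIONS",
        "Content-Length: 0",
    ]
    # SIP uses CRLF line endings and a blank line to end the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical host names used purely for illustration.
request = build_sip_options("um.example.com", "gateway.example.com")
print(request)
```

Note that the message carries no audio. As described above, the voice or fax payload travels separately over RTP or T.38 once signaling has set up the call.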
It is important to note
that the Exchange Server 2010 Unified Messaging server is also a Windows
server, a web server, and a member of the Active Directory domain.
There are a myriad of protocols, including domain name system (DNS),
Hypertext Transfer Protocol (HTTP), Lightweight Directory Access
Protocol (LDAP), remote procedure calls (RPC), and Simple Mail Transfer
Protocol (SMTP) among others, that the servers use to communicate with
other servers in addition to the telephony communications.
Unified Messaging Port Assignments
Table 3 shows the IP ports that unified messaging uses for each
protocol. The table also shows whether the ports can be changed and where.
Table 3. Ports Used for Unified Messaging Protocols
Protocol | TCP Port | UDP Port | Can Ports Be Changed? |
---|---|---|---|
SIP-UM Service | 5060 | | Ports are hard-coded. |
SIP-Worker Process | 5061 and 5062 | | Ports are set by using the Extensible Markup Language (XML) configuration file. |
RTP | | Port range above 1024 | The range of ports can be changed in the Registry. |
T.38 | | Dynamic port above 1024 | Ports are defined by the system. |
UM Web Service | Dynamic port above 1024 | | Ports are defined by the system. |
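When troubleshooting connectivity between a gateway and the Unified Messaging server, it can help to confirm that the fixed TCP ports from Table 3 are reachable. The Python sketch below is one way to do that using only the standard library. The port numbers come from Table 3; the function names are hypothetical, and the dynamic RTP, T.38, and web service ports are left out because they are not known in advance.

```python
import socket

# Fixed TCP ports listed in Table 3. Dynamic ports are not included.
UM_TCP_PORTS = {
    "SIP-UM Service": (5060,),
    "SIP-Worker Process": (5061, 5062),
}


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_um_ports(host: str) -> dict[tuple[str, int], bool]:
    """Probe every fixed UM port on `host` and report reachability."""
    return {
        (service, port): tcp_port_open(host, port)
        for service, ports in UM_TCP_PORTS.items()
        for port in ports
    }
```

For example, calling `check_um_ports("um.example.com")` (a placeholder host name) returns a dictionary keyed by service and port. A successful connection shows only that something is listening on the port, not that it is the unified messaging service.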