Approved on September 30th, 1999, after a four-year selection process involving extensive testing, ITU-T Recommendation G.722.1 is a state-of-the-art international standard wideband audio compression algorithm. It is based on Polycom's third-generation Siren compression technology and is derived from Polycom's field-proven PT716plus algorithm. Polycom developed this technology to meet the demanding audio needs of the multimedia community. It provides high-quality audio at low bit rates with low delay and very low complexity. It works for all kinds of audio signals, including speech, music, and singing.
Siren at 16 kbps is an extension of the G.722.1 standard that operates at 16 kbps.
An electronic copy of G.722.1 may be purchased directly from the ITU online bookstore at http://www.itu.int/rec/T-REC-G.722.1/en.
Use of the ITU-T Recommendation G.722.1 is subject to executing a license agreement with Polycom.
ITU-T Recommendation G.722.1 consists of the following items:
First, purchase a copy of G.722.1 from the ITU-T. Recommendation G.722.1 contains all the information required to implement the algorithm. The output signals from any implementation of G.722.1 on any hardware should exactly match those of the reference C code when processing identical input signals. The test vectors provided in the standard are designed for testing the correctness of an implementation.
There are both input and output test vectors to test the encoder and decoder implementations. The test vectors were created to exercise as much of the algorithm code as possible. Therefore any implementation successfully reproducing the output test vectors is considered to accurately reproduce the reference C code performance.
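As an illustration of what a bit-exact check against the output test vectors might look like, here is a minimal C sketch. The function name `first_mismatch` is hypothetical and is not part of the reference C code; loading the decoded output and the reference vector into memory is assumed to have been done already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the G.722.1 reference code.
 * Compares n 16-bit samples of an implementation's output against the
 * reference output. Returns the index of the first differing sample,
 * or -1 if the two buffers are bit-exact. */
long first_mismatch(const int16_t *out, const int16_t *ref, size_t n)
{
    assert(out != NULL && ref != NULL);
    for (size_t i = 0; i < n; i++) {
        if (out[i] != ref[i])
            return (long)i;
    }
    return -1;
}
```

A return value of -1 means every sample matched. Any other value points at the first sample worth inspecting.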
SirenZip is a program that runs on Microsoft Windows 95 or later versions of the Microsoft Windows operating system. It executes G.722.1 at one of three selectable bit rates (the 16 kbps extension, 24 kbps, or 32 kbps). Download your free demonstration copy of SirenZip (download size 208 KB).
How to Use SirenZip
G.722.1 is specified as a fixed-point algorithm in the ITU-T standard. A floating-point version will be standardized in the future by the ITU-T, and it will interoperate with the fixed-point standard. (An interoperable floating-point version exists at Polycom.) The MIPS complexity numbers below are examples of non-optimized implementations on three different types of DSP. Note that two of the DSPs shown are floating-point units.
General G.722.1 Parameters
| Parameter | Value |
|---|---|
| Audio sample rate | 16 kHz |
| Bit rate (rate may change on any frame boundary) | 16, 24, 32 kbps (16 kbps is a Polycom extension to the standard) |
| Audio bandwidth | 50 Hz - 7 kHz |
| Audio frame size | 20 ms |
| Algorithmic delay (see Note 1) | 40 ms |
| RAM (fixed point) | < 7.5 k bytes |
| ROM table space (fixed point) | ~ 20 k bytes |
| MIPS ratio between encoder and decoder | approximately 1 to 1 |
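The sample rate, frame size, and bit rates in the table determine how many PCM samples and coded bits one frame carries. The following C sketch works through that arithmetic. The macro and function names are illustrative assumptions and do not come from the reference code.

```c
#include <assert.h>

/* Values taken from the parameter table; names are illustrative only. */
#define G7221_SAMPLE_RATE_HZ 16000
#define G7221_FRAME_MS       20

/* PCM samples in one 20 ms frame at 16 kHz. */
int samples_per_frame(void)
{
    return G7221_SAMPLE_RATE_HZ * G7221_FRAME_MS / 1000;
}

/* Coded bits in one 20 ms frame at the given bit rate in bits per second. */
int bits_per_frame(int bit_rate_bps)
{
    assert(bit_rate_bps > 0);
    return bit_rate_bps * G7221_FRAME_MS / 1000;
}
```

With these definitions, one frame holds 320 samples, and the 16, 24, and 32 kbps rates give 320, 480, and 640 bits (40, 60, and 80 bytes) per frame.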
Example MIPS Figures for Different Processors
| Processor | MIPS (encode + decode) |
|---|---|
| TI TMS320C50 | 13.9 (= 6.95 + 6.95) (un-optimized code for fixed point G.722.1) |
| TriMedia TM 1300 | 4 (= 2+2) (floating point implementation) |
| TI TMS320C31 | 9.25 (= 4.51+4.51) (floating point implementation) |
Traditional telephony is called "narrowband" because it passes audio signals only in the range of 300-3500 Hz, a bandwidth of just 3.2 kHz. This narrow bandwidth gives telephone calls their characteristic "tinny" sound, as compared to the rich wideband sound of high fidelity systems. G.722.1 provides 7 kHz of audio bandwidth (50-7000 Hz), a vast improvement, and closer to FM radio quality than to traditional telephone quality. Wideband audio is unanimously preferred when compared to narrowband audio quality.
For example, in a telephone conversation, have you ever confused the words "see" and "fee"? The "f" and "s" sounds are easily confused because their intelligibility is lost with inadequate rendition of the high frequencies. Such confusion never occurs with wideband coding, because all the frequencies required for speech are now fully represented.
The whole audio experience when using wideband is far more natural and relaxing to the ears.
G.722.1 is also capable of excellent music reproduction at unprecedented low bit rates. It sounds far better than AM radio.
Example G.722.1 and Siren at 16 kbps applications:
IP telephony, video conferencing, and audio conferencing all have very similar audio needs: High audio quality with low latency and complexity. In addition, the ability to change bit rate to accommodate channel requirements is necessary. G.722.1 allows the rate to change between the 24-, 32-, and the Siren-extension 16 kbps rates on any 20 ms frame boundary.
In streaming applications, low complexity and cost in the client is an absolute must. G.722.1 fulfils this requirement without sacrificing quality. Bit rates of 16, 24, and 32 kbps enable clients to experience high-quality audio even on V.90 (56 kbps) modem connections.
Messaging is a store-and-forward application. Low complexity means low-cost hardware implementation. The high-quality audio compression ensures an excellent rendition of the sender's voice or music clip.
Download these .WAV files to hear the quality of G.722.1 for yourself.
(Note these sample files are uncompressed .WAV files playable on any computer. You don't need the G.722.1 codec to play them, but as a result they will take some time to download.)
| Speech files | Download |
|---|---|
| 3.5 kHz audio bandwidth, POTS toll quality | |
| 7 kHz audio bandwidth, coded at 16 kbps using Siren | |
| 7 kHz audio bandwidth, coded at 24 kbps using G.722.1 | |
| 7 kHz audio bandwidth, coded at 32 kbps using G.722.1 | |
| Music files | Download |
|---|---|
| 3.5 kHz audio bandwidth, POTS toll quality | |
| 7 kHz audio bandwidth, coded at 16 kbps using Siren | |
| 7 kHz audio bandwidth, coded at 24 kbps using G.722.1 | |
| 7 kHz audio bandwidth, coded at 32 kbps using G.722.1 | |
For users who have low-speed internet access (e.g. through 28.8 to 56 kbps dial-up modems), Siren performs extremely well at low bit rates. Hear comparison samples:
Siren at 14 kHz audio bandwidth at 22 kbps vs. Windows Media Player (Only supports a sampling rate of 44 kHz at 22 kbps)*.
| Bit rate | Download | Download |
|---|---|---|
| 22 kbps | | |
*The Windows Media Player's audio bandwidth is whatever the bit rate allows; the high 44 kHz sampling rate only makes an audio bandwidth of up to 20 kHz possible.
Siren at 14 kHz audio bandwidth at 24 kbps vs. MP3 (has a maximum of 11 kHz audio bandwidth at 24 kbps)
| Bit rate | Download | Download |
|---|---|---|
| 24 kbps | | |
In order for different vendors' equipment to interoperate using G.722.1, it is necessary to standardize the capability exchange and mode selection for G.722.1. These technical aspects for H.320, H.323, and H.324 systems have been defined by ITU-T Study Group 16.
Licensees will receive all information necessary to negotiate the use of G.722.1 per ITU-T standards, as well as capability exchange and negotiation procedures for inter-vendor interoperable use of the Polycom 16 kbps extension of the standard.
Yes, provided the license terms are followed. Specifically, the royalty-free license offered by Polycom includes a requirement to give Polycom credit under specified conditions. The easiest way to allow this to happen properly is for the author of the open source code to execute a license with Polycom and to direct the users of the open source code to Polycom so that the user may also execute the agreement. The author should include the necessary elements in the open source code to meet the requirements as to the open source code distribution, with the users complying for their own products which use the open source code.
You can download a license agreement. Sign and mail two copies; Polycom will sign and return one to you. Full information is available at http://www.polycom.com/company/about_us/technology/siren_g7221/license_schedule.html
ITU-T Recommendation G.722.1 Annex A specifies the packet format, capability identifiers, and capability parameters for G.722.1 and G.722.1C, and implementers can use that document to verify their design. For real-time testing, they can call Polycom's systems using G.722.1C to find out whether there are any interoperability problems.
It is "Little Endian", i.e.: the low-order byte of the number is stored in memory at the lowest address, and the high-order byte at the highest address.
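To make the byte order concrete, here is a small C sketch that reads and writes a 16-bit value in little-endian order regardless of the host processor. It assumes "the number" is a 16-bit word such as a PCM sample, which the answer above does not state explicitly. The function names are illustrative and not taken from the reference code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative helpers, assuming 16-bit words stored little-endian:
 * low-order byte at the lowest address, high-order byte at the highest. */
int16_t read_le16(const uint8_t *p)
{
    assert(p != NULL);
    uint16_t u = (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
    return (int16_t)u;  /* two's-complement reinterpretation */
}

void write_le16(uint8_t *p, int16_t v)
{
    assert(p != NULL);
    uint16_t u = (uint16_t)v;
    p[0] = (uint8_t)(u & 0xFF);  /* low-order byte first */
    p[1] = (uint8_t)(u >> 8);    /* high-order byte second */
}
```

Going through byte-wise helpers like these, rather than casting a byte buffer to `int16_t *`, keeps file handling correct on big-endian hosts as well.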
The frame length is 20 ms. The total stated delay of 40 ms is the sum of two parts:
To make this clear, suppose your audio is already packetized into 20 ms frames. Then Siren adds only one additional frame of delay: 20 ms (not 40 ms).
This incremental 19 ms or so, compared to G.722, is due to the nature of the codec. Any transform codec (Fourier transform or comparable) such as Siren/G.722.1 needs enough of a temporal sample to be able to do a frequency analysis. In Siren, 20 ms was chosen as the best compromise for this application. G.722 is fundamentally different – it’s an ADPCM codec, actually two ADPCM codecs, and operates sample by sample on two separate bands – the low half and the high half.
The tradeoff is that G.722 requires two to three times the data rate. As is always the case in codec selection, choosing a codec is a matter of prioritizing requirements. For VoIP, we haven’t seen the incremental delay of Siren/G.722.1 present a problem, especially considering its higher data efficiency. Its delay is in the middle of the field: AMR, AMR-WB, SILK, iSAC, and other codecs present similar tradeoffs between latency and efficiency. For more information on this, please refer to the white paper "VoIP to 20 kHz: Codec Choices for High Definition Voice Telephony," which can be found at http://www.polycom.com/global/documents/whitepapers/codecs_white_paper.pdf
MP3 (MPEG-1/2 Layer-3) has an algorithmic delay of 2591 samples. At a sample rate of 48 kHz, this corresponds to about 54 ms; at 32 kHz, it is about 81 ms.
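The conversion from a delay in samples to a delay in milliseconds is a single division. The C sketch below shows it with a hypothetical helper name, `delay_ms`, chosen for illustration.

```c
#include <assert.h>

/* Converts an algorithmic delay given in samples to milliseconds. */
double delay_ms(long delay_samples, long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1000.0 * (double)delay_samples / (double)sample_rate_hz;
}
```

For the MP3 figure of 2591 samples, this gives roughly 54 ms at 48 kHz and roughly 81 ms at 32 kHz. For comparison, the 40 ms algorithmic delay of G.722.1 corresponds to 640 samples at its 16 kHz sample rate.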
The bit rate will vary once encapsulated in RTP; the Siren payload is not treated differently than another codec or stream, and so consequently the tabulation will be specific to the network and the application. However, the RTP payload for G.722.1 and G.722.1C is specified in ITU-T Recommendation G.722.1 Annex A, and also specified in IETF RFC 3047 and draft-ietf-avt-rfc3047-bis-04 which updates RFC 3047 adding support for G.722.1C.
The input signal to G.722.1C should be in 16-bit PCM format with no header (such files are usually named *.pcm). The *.wav format has a header, so a .wav file should be converted into headerless 16-bit PCM before encoding. The output of the codec is in the same headerless 16-bit PCM format, which Windows Media Player cannot play.
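One way to obtain headerless PCM from a .wav file is to locate its "data" chunk and keep only the bytes inside it. The C sketch below does this for a file image already loaded into memory. The function name `wav_find_pcm` is hypothetical, and the sketch assumes a standard RIFF/WAVE layout. It does not verify that the samples are 16-bit PCM at the sample rate the codec expects, which a real converter would need to check in the "fmt " chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit little-endian value (RIFF sizes are little-endian). */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: finds the "data" chunk in a RIFF/WAVE image.
 * On success returns 0 and sets *off and *len to the position and size
 * of the raw sample bytes; returns -1 if no data chunk is found. */
int wav_find_pcm(const uint8_t *buf, size_t size, size_t *off, size_t *len)
{
    assert(buf != NULL && off != NULL && len != NULL);
    if (size < 12 || memcmp(buf, "RIFF", 4) != 0 ||
        memcmp(buf + 8, "WAVE", 4) != 0)
        return -1;

    size_t pos = 12;                      /* first chunk after the RIFF header */
    while (pos + 8 <= size) {
        uint32_t chunk_len = le32(buf + pos + 4);
        size_t avail = size - (pos + 8);  /* bytes left after this chunk header */
        if (memcmp(buf + pos, "data", 4) == 0) {
            *off = pos + 8;
            *len = chunk_len < avail ? chunk_len : avail;
            return 0;
        }
        if (chunk_len > avail)
            break;                        /* truncated or malformed file */
        pos += 8 + (size_t)chunk_len + (chunk_len & 1u); /* chunks are even-padded */
    }
    return -1;
}
```

Writing the `len` bytes starting at `buf + off` to a new file yields the headerless input the encoder expects.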
To correctly code the file, perform the following steps:
G.722.1D.Zip contains the executables for the floating-point version of G.722.1/G.722.1C. The following DOS command lines are used to run the G.722.1 (Siren7) encoder and decoder at 16 kbps.
Encoder:
encode 0 <input_speech_file> <output_bitstream_file> 16000 7000
Decoder:
decode 0 <input_bitstream_file> <output_speech_file> 16000 7000
G.722.1 is the formal name for the codec, as approved by the ITU. Siren7 is an informal name which refers to the same codec. There was an earlier form of Siren7, which is licensed to Microsoft and is included in the Microsoft Windows operating system. For purposes of clarity, we will call that earlier version Siren7pre. The differences between Siren7pre and G.722.1 are in the header information. In Siren7pre, the first 2 bits in its 20 ms payload packet indicate the bit rate of the codec. The last 4 bits of the packet are a checksum to check for bit errors in the packet.
In contrast, G.722.1 has neither the 2-bit header nor the 4-bit checksum. Thus, each G.722.1 packet has 6 more bits available for coded audio than a Siren7pre packet. Otherwise, the actual codec algorithm is identical.
Because the number of bits for the actual codec differs by 6 bits per packet, compatibility between G.722.1 and Siren7pre must be accomplished via transcoding.
This is true for all bit rates. It also means that in the Siren7pre 16 ksps/16 kbps mode of operation, 314 of the 320 bits per frame are used for encoding and the remaining 6 for the header and checksum. All new implementations of Siren7 should conform to the G.722.1 specification.
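The bit budget described above can be written out as a short C sketch. The macro and function names are illustrative assumptions and are not taken from the Siren source code.

```c
#include <assert.h>

/* Overhead carried by the earlier packet format, per 20 ms frame. */
#define SIREN7PRE_HEADER_BITS   2  /* bit rate indicator at the start */
#define SIREN7PRE_CHECKSUM_BITS 4  /* checksum at the end */

/* G.722.1 uses every bit of the frame for coded audio. */
int coded_bits_g7221(int bits_per_frame)
{
    assert(bits_per_frame > 0);
    return bits_per_frame;
}

/* The earlier format gives up 6 bits per frame to header and checksum. */
int coded_bits_siren7pre(int bits_per_frame)
{
    assert(bits_per_frame > SIREN7PRE_HEADER_BITS + SIREN7PRE_CHECKSUM_BITS);
    return bits_per_frame - SIREN7PRE_HEADER_BITS - SIREN7PRE_CHECKSUM_BITS;
}
```

At 16 kbps (320 bits per frame), this leaves 314 coded bits in the earlier format against 320 in G.722.1, which is why the two cannot be bridged by simply repacking the bitstream.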
G.722.1 Annex C is the formal name for the codec, as approved by the ITU. Siren14 is an informal name which refers to an earlier form of the codec with slight differences in the bitstream format: Siren14 has a 2-bit bit rate indicator at the beginning and a 4-bit CRC at the end, whereas G.722.1C carries coded data only. Audio quality is the same for both codecs.
All new implementations of "Siren14" should conform to the G.722.1C specification.
The 2-bit header and 4-bit checksum are handled inside the “C” subroutines for the Siren encoder and decoder.
Yes. You can see this specifically in
SirenEncode() in the file encoder.c
and
SirenDecode() in the file decoder.c
These routines show how Siren creates and reads the 2-bit header and the 4-bit checksum, and the software is well documented. The first two bits have different meanings in the 14 kHz and 7 kHz modes:
For 7 kHz:
if (number_of_bits_per_frame == 640)       // 640 x 50 = 32000 bits per second
    current_word = 3;
else if (number_of_bits_per_frame == 480)  // 480 x 50 = 24000 bits per second
    current_word = 2;
else if (number_of_bits_per_frame == 320)  // 320 x 50 = 16000 bits per second
    current_word = 1;
else
    current_word = 0;
For 14 kHz:
if (number_of_bits_per_frame == 960)       // 960 x 50 = 48000 bits per second
    current_word = 3;
else if (number_of_bits_per_frame == 640)  // 640 x 50 = 32000 bits per second
    current_word = 2;
else if (number_of_bits_per_frame == 480)  // 480 x 50 = 24000 bits per second
    current_word = 1;
else
    current_word = 0;
The 4-bit checksum is calculated the same way for Siren7 and Siren14.
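For readers who want the two header mappings in one place, here is a hedged C sketch that wraps them in a single function. The function `rate_indicator` and its `bandwidth_hz` parameter are hypothetical and do not appear in encoder.c or decoder.c. Only the bits-per-frame values and the resulting 2-bit codes are taken from the snippets above.

```c
#include <assert.h>

/* Hypothetical wrapper around the two mappings shown above.
 * bandwidth_hz selects the 7 kHz or 14 kHz mode; the return value is
 * the 2-bit code placed at the start of the earlier packet format. */
int rate_indicator(int bandwidth_hz, int number_of_bits_per_frame)
{
    assert(bandwidth_hz == 7000 || bandwidth_hz == 14000);

    if (bandwidth_hz == 7000) {
        switch (number_of_bits_per_frame) {
        case 640: return 3;  /* 32000 bits per second */
        case 480: return 2;  /* 24000 bits per second */
        case 320: return 1;  /* 16000 bits per second */
        default:  return 0;
        }
    }

    switch (number_of_bits_per_frame) {
    case 960: return 3;      /* 48000 bits per second */
    case 640: return 2;      /* 32000 bits per second */
    case 480: return 1;      /* 24000 bits per second */
    default:  return 0;
    }
}
```

Note that the same frame size maps to different codes in the two modes: 640 bits per frame is code 3 at 7 kHz but code 2 at 14 kHz.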
Officially, ITU-T requires a bit-exact match. This is the reason for supplying a "C" implementation to the public. Polycom does not monitor implementations for this (I don't know anybody who does, especially if they perform well).
The 16 kbps rate is not part of the G.722.1 standard as defined by the ITU. Polycom signals the 16 kbps rate in a proprietary way for H.323 and H.320. RFC 3047 does allow for non-standard rates, so the 16 kbps rate can be specified in a standard way for SIP. Polycom’s preference is that G.722.1 implementations enable the 16 kbps rate, but this is not mandatory.
For H.323 and SIP, Polycom uses the RTP format. For H.320, it is in the exact same format, but without an RTP header.
The ITU is the International Telecommunication Union, based in Geneva, Switzerland. The ITU is the world's oldest international treaty organization (founded 1865), now part of the United Nations, and is responsible for standardization of technology for international telecommunications, including telephone, radio, and data communications. For more information, visit the ITU's Web site at http://www.itu.int.
For additional technical information please email: SirenInfo@polycom.com