
Polycom® Siren/G.722.1 FAQs

What Is G.722.1?

Approved on September 30th, 1999, after a four-year selection process involving extensive testing, ITU-T Recommendation G.722.1 is a state-of-the-art international standard wideband audio compression algorithm. It is based on Polycom's third-generation Siren compression technology and is derived from Polycom's field-proven PT716plus algorithm. Polycom developed this technology to meet the demanding audio needs of the multimedia community. It provides high-quality audio at low bit rates with low delay and very low complexity. It works for all kinds of audio signals, including speech, music, and singing.


What Is Siren at 16 kbps?

Siren at 16 kbps is an extension of the G.722.1 standard that operates at 16 kbps.


How to Obtain ITU-T Recommendation G.722.1?

An electronic copy of G.722.1 may be purchased directly from the ITU online bookstore at http://www.itu.int/rec/T-REC-G.722.1/en.

Use of the ITU-T Recommendation G.722.1 is subject to executing a license agreement with Polycom.


What Exactly Is Contained in ITU-T Recommendation G.722.1?

ITU-T Recommendation G.722.1 consists of the following items:

  • Description of the wideband coding algorithm.
  • Reference C code for the encoder and decoder.
  • Test vectors (signals), a tool to assist implementers in verifying the accuracy of their implementation.


How Do I Implement G.722.1?

First, purchase a copy of G.722.1 from the ITU-T. Recommendation G.722.1 contains all the information required to implement the algorithm. The output signals from any implementation of G.722.1, on any hardware, should exactly match those of the reference C code when processing the same input signals. The test vectors provided in the standard are designed to test the correctness of an implementation.

There are both input and output test vectors to test the encoder and decoder implementations. The test vectors were created to exercise as much of the algorithm code as possible. Therefore, any implementation that successfully reproduces the output test vectors is considered to reproduce the performance of the reference C code accurately.
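
A bit-exact comparison against an output test vector can be sketched in C as follows. This is a minimal illustration and not part of the Recommendation: the helper name files_identical is hypothetical, and it assumes that your implementation's output and the corresponding test vector are both available as plain files.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper (not part of the G.722.1 reference code).
 * Returns 1 if the two files have identical contents, 0 if they differ,
 * and -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *fa, *fb;
    int result = 1;

    assert(path_a != NULL && path_b != NULL);
    fa = fopen(path_a, "rb");
    fb = fopen(path_b, "rb");
    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -1;
    }

    for (;;) {
        int ca = fgetc(fa);
        int cb = fgetc(fb);
        if (ca != cb) {      /* a differing byte, or one file ended early */
            result = 0;
            break;
        }
        if (ca == EOF)       /* both files ended at the same point */
            break;
    }

    fclose(fa);
    fclose(fb);
    return result;
}
```

A result of 1 for every encoder and decoder output vector would indicate a bit-exact match for the cases the vectors cover.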


Download a Demonstration Program of G.722.1

SirenZip is a program that runs on Microsoft Windows 95 or higher versions of the Microsoft Windows operating system. It executes G.722.1 at one of three selectable bit rates (the 16 kbps extension, 24 kbps, or 32 kbps). Download your free demonstration copy of SirenZip (download size 208 KB).

How to Use SirenZip

  • Once downloaded, SirenZip is ready to run.
  • Double click on SirenZip.exe, then click on Siren Encode and enter the correct audio source and output bitstream file names. Select the correct bit rate, then either click on Play Wav Input to hear the input file, or click on Encode Wave Input to compress the file using G.722.1.
  • To decode, click on Siren Decode and enter the correct bitstream source file and audio output file names. Then click on either Play Wav Output to hear the G.722.1 decoded output file, or click on Decode Bitstream to synthesize the audio output using G.722.1.
  • Here are some points to keep in mind when using SirenZip:
    • The encoder accepts audio input files in a mono wave format.
    • The decoder outputs audio files in a mono wave format.
    • The bit rate may be set in the encoder; the decoder automatically knows the correct operational bit rate.
    • Audio input files longer than 60 seconds will be truncated to 60 seconds.
    • The audio input may be at one of two sample rates, 16,000 Hz or 22,050 Hz. If the input is sampled at 22,050 Hz, SirenZip will downsample it to the required 16 kHz; this may result in some changes to the audio fidelity. It is recommended to use 16 kHz sampling whenever available.


Technical Specifications

G.722.1 is specified as a fixed-point algorithm in the ITU-T standard. A floating-point version will be standardized in the future by the ITU-T, and it will interoperate with the fixed-point standard. (An interoperable floating-point version exists at Polycom.) The MIPS complexity numbers below are examples of non-optimized implementations on three different types of DSP. Note that two of the DSPs shown are floating-point units.

General G.722.1 Parameters

Parameter                                          Value
Audio sample rate                                  16 kHz
Bit rate (rate may change on any frame boundary)   16, 24, 32 kbps (16 kbps is a Polycom extension to the standard)
Audio bandwidth                                    50 Hz - 7 kHz
Audio frame size                                   20 ms
Algorithmic delay (see Note 1)                     40 ms
RAM (fixed point)                                  < 7.5 kbytes
ROM table space (fixed point)                      ~ 20 kbytes
MIPS ratio between encoder and decoder             approximately 1 to 1


Example MIPS Figures for Different Processors

 

Processor           MIPS (encode + decode)
TI TMS320C50        13.9 (= 6.95 + 6.95) (un-optimized code for fixed point G.722.1)
TriMedia TM 1300    4 (= 2 + 2) (floating point implementation)
TI TMS320C31        9.25 (= 4.51 + 4.51) (floating point implementation)



Why Wideband?

Traditional telephony is called "narrowband" because it passes audio signals only in the range of 300-3500 Hz, a bandwidth of just 3.2 kHz. This narrow bandwidth gives telephone calls their characteristic "tinny" sound, as compared to the rich wideband sound of high fidelity systems. G.722.1 provides 7 kHz of audio bandwidth (50-7000 Hz), a vast improvement, and closer to FM radio quality than to traditional telephone quality. Wideband audio is unanimously preferred when compared to narrowband audio quality.

For example, in a telephone conversation, have you ever confused the words "see" and "fee"? The "f" and "s" sounds are easily confused because their intelligibility is lost with inadequate rendition of the high frequencies. Such confusion never occurs with wideband coding, because all the frequencies required for speech are now fully represented.

The whole audio experience when using wideband is far more natural and relaxing to the ears.

G.722.1 is also capable of excellent music reproduction at unprecedented low bit rates. It sounds far better than AM radio.


Applications

Example G.722.1 and Siren at 16 kbps applications:

  • Wideband IP telephony
  • Streaming audio (including music!) over the Internet
  • Video conferencing
  • Audio conferencing
  • Audio storage playback (recorders...)
  • Store-and-forward messaging (voice mail)
  • Audio-enabling your Web site

IP telephony, video conferencing, and audio conferencing all have very similar audio needs: high audio quality with low latency and complexity. In addition, the ability to change bit rate to accommodate channel requirements is necessary. G.722.1 allows the rate to change between the 24 kbps, 32 kbps, and Siren-extension 16 kbps rates on any 20 ms frame boundary.

In streaming applications, low complexity and cost in the client is an absolute must. G.722.1 fulfils this requirement without sacrificing quality. Bit rates of 16, 24, and 32 kbps enable clients to experience high-quality audio even on V.90 (56 kbps) modem connections.

Messaging is a store-and-forward application. Low complexity means low-cost hardware implementation. The high-quality audio compression ensures an excellent rendition of the sender's voice or music clip.


Hear G.722.1 and Siren at 16 kbps for Yourself

Download these .WAV files to hear the quality of G.722.1 for yourself.

(Note these sample files are uncompressed .WAV files playable on any computer. You don't need the G.722.1 codec to play them, but as a result they will take some time to download.)

Speech files                                             Download
3.5 kHz audio bandwidth, POTS toll quality               speech_3p5kHz_mulaw.wav (download size 114 KB)
7 kHz audio bandwidth, coded at 16 kbps using Siren      speech_16kbps_siren.wav (download size 452 KB)
7 kHz audio bandwidth, coded at 24 kbps using G.722.1    speech_24kbps_g722p1.wav (download size 452 KB)
7 kHz audio bandwidth, coded at 32 kbps using G.722.1    speech_32kbps_g722p1.wav (download size 452 KB)

Music files                                              Download
3.5 kHz audio bandwidth, POTS toll quality               music_3p5kHz_mulaw.wav (download size 72 KB)
7 kHz audio bandwidth, coded at 16 kbps using Siren      music_16kbps_siren.wav (download size 286 KB)
7 kHz audio bandwidth, coded at 24 kbps using G.722.1    music_24kbps_g722p1.wav (download size 286 KB)
7 kHz audio bandwidth, coded at 32 kbps using G.722.1    music_32kbps_g722p1.wav (download size 286 KB)


Compare Siren™ with Windows Media Player and MP3

For users who have low-speed internet access (for example, through 28.8 to 56 kbps dial-up modems), Siren performs extremely well at low bit rates. Hear comparison samples:

Siren at 14 kHz audio bandwidth at 22 kbps vs. Windows Media Player (which only supports a sampling rate of 44 kHz at 22 kbps)*.

Bit rate    Download            Download
22 kbps     siren_22kbps.wav    wmplayer_22kbps.asf

*The Windows Media Player's audio bandwidth is whatever the bit rate allows; the high 44 kHz sampling rate only makes an audio bandwidth of up to 20 kHz possible.

Siren at 14 kHz audio bandwidth at 24 kbps vs. MP3 (which has a maximum of 11 kHz audio bandwidth at 24 kbps)

Bit rate    Download            Download
24 kbps     siren_24kbps.wav    mp3_24kbps.asf


Capabilities Exchange for H.320, H.323 and H.324 Systems

In order for different vendors' equipment to interoperate using G.722.1, it is necessary to standardize the capability exchange and mode selection for G.722.1. These technical aspects for H.320, H.323, and H.324 systems have been defined by ITU-T Study Group 16.

Licensees will receive all information necessary to negotiate the use of G.722.1 per ITU-T standards, as well as capability exchange and negotiation procedures for inter-vendor interoperable use of the Polycom 16 kbps extension of the standard.


Can Siren7/14/G.719 be used in open source projects?

Yes, provided the license terms are followed. Specifically, the royalty-free license offered by Polycom includes a requirement to give Polycom credit under specified conditions. The easiest way to allow this to happen properly is for the author of the open source code to execute a license with Polycom and to direct the users of the open source code to Polycom so that the user may also execute the agreement. The author should include the necessary elements in the open source code to meet the requirements as to the open source code distribution, with the users complying for their own products which use the open source code.


How do I get a license for Siren?

You can download a license agreement. Sign and mail two copies; Polycom will sign and return one to you. Full information is available at http://www.polycom.com/company/about_us/technology/siren_g7221/license_schedule.html 


How can I verify compatibility between my implementation and G.722.1 or G.722.1 Annex C?

ITU-T Recommendation G.722.1 Annex A specifies the packet format, capability identifiers, and capability parameters for G.722.1 and G.722.1C; you can use that document to verify your design. For real-time testing, you can call Polycom's systems using G.722.1C to find out whether there are any interoperability problems.


Is the reference G.722.1/G.722.1C code in Little Endian or Big Endian Mode or both?

It is "Little Endian", i.e., the low-order byte of the number is stored in memory at the lowest address, and the high-order byte at the highest address.


What is the delay of the G.722.1/G.722.1C codec?

The frame length is 20 ms. The total stated delay of 40 ms is the sum of two parts:

  1. 20 ms due to frame buffering
  2. 20 ms one frame look-ahead delay – the transform window


To make this clear, suppose your audio is already packetized into 20 ms frames. Siren then adds only one additional frame of delay: 20 ms, not 40 ms.


Why does G.722.1/G.722.1C have a longer latency than G.722?

This incremental 19 ms or so, compared to G.722, is due to the nature of the codec. Any transform codec (Fourier transform or comparable) such as Siren/G.722.1 needs a long enough block of samples to perform a frequency analysis. In Siren, 20 ms was chosen as the best compromise for this application. G.722 is fundamentally different: it is an ADPCM codec (actually two ADPCM codecs) that operates sample by sample on two separate bands, the low half and the high half.

The tradeoff is that G.722 requires two to three times the data rate. As is always the case in codec selection, choosing a codec is a matter of prioritizing requirements. For VoIP, we haven’t seen the incremental delay of Siren/G.722.1 present a problem, especially considering its higher data efficiency. Its delay is in the middle of the field: AMR, AMR-WB, SILK, iSAC, and other codecs present similar tradeoffs between latency and efficiency. For more information on this, please refer to the white paper "VoIP to 20 kHz: Codec Choices for High Definition Voice Telephony," which can be found at http://www.polycom.com/global/documents/whitepapers/codecs_white_paper.pdf


How do these delays compare to MP3?

MP3 (MPEG-1/2 Layer-3) has an algorithmic delay of 2591 samples. At a sample rate of 48 kHz, this corresponds to about 54 ms. At 32 kHz, it’s about 81 ms.
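
The conversion from a delay in samples to a delay in milliseconds is simple arithmetic, sketched here in C. The function name delay_ms is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>

/* Convert a delay expressed in samples to milliseconds
 * at the given sample rate. */
double delay_ms(int delay_samples, int sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return 1000.0 * delay_samples / sample_rate_hz;
}
```

With the figures quoted above, delay_ms(2591, 48000) is roughly 53.98 and delay_ms(2591, 32000) is roughly 80.97. By the same arithmetic, the 40 ms algorithmic delay of G.722.1 corresponds to 640 samples at its 16 kHz sample rate.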


What is the total data rate when Siren is encapsulated within RTP?

The total bit rate depends on the encapsulation. The Siren payload is treated no differently from that of any other codec or stream, so the total is specific to the network and the application. The RTP payload format for G.722.1 and G.722.1C is specified in ITU-T Recommendation G.722.1 Annex A and in IETF RFC 3047; draft-ietf-avt-rfc3047-bis-04 updates RFC 3047 to add support for G.722.1C.
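
As a rough illustration of how such a tabulation might look, the C sketch below estimates the IP-level bit rate for one common case. Everything about the transport is an assumption made for this example, not a statement about any particular deployment: RTP with its fixed 12-byte header (no CSRC list or header extension), carried over UDP (8 bytes) and IPv4 without options (20 bytes), with a whole number of 20 ms codec frames per packet and no link-layer overhead counted. The function name ip_bit_rate_bps is hypothetical.

```c
#include <assert.h>

/* Assumed header sizes in bytes (see the assumptions stated above). */
enum { RTP_HEADER_BYTES = 12, UDP_HEADER_BYTES = 8, IPV4_HEADER_BYTES = 20 };

/* Estimate the IP-level bit rate, in bits per second, for a codec with
 * 20 ms frames when frames_per_packet frames share one RTP packet. */
long ip_bit_rate_bps(long codec_bit_rate_bps, int frames_per_packet)
{
    const long frame_ms = 20;
    long payload_bits, header_bits, packet_interval_ms;

    assert(frames_per_packet > 0);
    payload_bits = codec_bit_rate_bps * frame_ms / 1000 * frames_per_packet;
    header_bits = 8L * (RTP_HEADER_BYTES + UDP_HEADER_BYTES + IPV4_HEADER_BYTES);
    packet_interval_ms = frame_ms * frames_per_packet;

    return (payload_bits + header_bits) * 1000 / packet_interval_ms;
}
```

Under these assumptions, one frame per packet adds 16 kbps of header overhead, so the 24 kbps rate becomes 40 kbps and the 16 kbps rate becomes 32 kbps. Packing two frames per packet halves the overhead at the cost of an extra 20 ms of packetization delay.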


What are the correct signal formats for running the Siren simulator on a PC?

The input signal to G.722.1C should be in 16-bit PCM format with no header (usually named *.pcm). The *.wav format has a header and should be converted into headerless 16-bit PCM format for encoding. The output of the codec is in the same headerless 16-bit PCM format; it cannot be played directly with Windows Media Player.

To correctly code the file, perform the following steps:

  1. Convert the *.wav file into the *.pcm format. This can be done using commercial software such as CoolEdit (now Adobe Audition).
  2. Encode and decode the file.
  3. Convert the decoder output into the *.wav format to play it with Windows Media Player.
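
If no audio editor is at hand, the first step can be approximated with a few lines of C. The sketch below is a hypothetical helper, wav_to_raw_pcm, which is not part of the Siren simulator. It assumes the input is a standard RIFF/WAVE file and simply copies the contents of its "data" chunk to a headerless file. It does not check that the audio is 16-bit mono PCM at the sample rate the codec expects, and it performs no sample-rate conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Copy the sample data of a RIFF/WAVE file to a headerless file.
 * Returns the number of data bytes written, or -1 on error. */
long wav_to_raw_pcm(const char *wav_path, const char *pcm_path)
{
    unsigned char riff[12], chunk[8];
    long written = -1;
    FILE *in, *out = NULL;

    assert(wav_path != NULL && pcm_path != NULL);
    in = fopen(wav_path, "rb");
    if (in == NULL)
        return -1;

    /* A WAVE file starts with "RIFF", a 4-byte size, then "WAVE". */
    if (fread(riff, 1, sizeof riff, in) != sizeof riff ||
        memcmp(riff, "RIFF", 4) != 0 || memcmp(riff + 8, "WAVE", 4) != 0)
        goto done;

    /* Walk the chunk list until the "data" chunk is found. */
    while (fread(chunk, 1, sizeof chunk, in) == sizeof chunk) {
        /* Chunk sizes are stored in little-endian order. */
        uint32_t size = (uint32_t)chunk[4] | ((uint32_t)chunk[5] << 8) |
                        ((uint32_t)chunk[6] << 16) | ((uint32_t)chunk[7] << 24);

        if (memcmp(chunk, "data", 4) == 0) {
            uint32_t i;

            out = fopen(pcm_path, "wb");
            if (out == NULL)
                goto done;
            written = 0;
            for (i = 0; i < size; i++) {
                int c = fgetc(in);
                if (c == EOF)
                    break;           /* truncated file: keep what was read */
                fputc(c, out);
                written++;
            }
            goto done;
        }

        /* Skip other chunks such as "fmt "; chunks are padded to even length. */
        if (fseek(in, (long)(size + (size & 1u)), SEEK_CUR) != 0)
            break;
    }

done:
    if (out != NULL)
        fclose(out);
    fclose(in);
    return written;
}
```

Converting the decoder output back to *.wav in step 3 requires writing a matching header, which an audio editor will do for you.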


What commands do I use to run the G.722.1/G.722.1C simulator at 16kbps?

G.722.1D.Zip contains the executables for the floating-point version of G.722.1/G.722.1C. The following DOS command lines are used to run the G.722.1 (Siren7) encoder and decoder at 16 kbps.

Encoder:

 
encode  0  <input_speech_file>  <output_bitstream_file>  16000  7000

Decoder:

decode  0  <input_bitstream_file>  <output_speech_file>  16000  7000


What is the difference between Siren7 and G.722.1?

G.722.1 is the formal name for the codec, as approved by the ITU. Siren7 is an informal name which refers to the same codec. There was an earlier form of Siren7, which is licensed to Microsoft and is included in the Microsoft Windows operating system. For purposes of clarity, we will call that earlier version Siren7pre. The differences between Siren7pre and G.722.1 are in the header information. In Siren7pre, the first 2 bits in its 20 ms payload packet indicate the bit rate of the codec. The last 4 bits of the packet are a checksum to check for bit errors in the packet.

In contrast, G.722.1 has neither the 2-bit header nor the 4-bit checksum. Thus, G.722.1 has 6 more bits available per packet than Siren7pre. Otherwise, the actual codec algorithm is identical.

Because the number of bits for the actual codec differs by 6 bits per packet, compatibility between G.722.1 and Siren7pre must be accomplished via transcoding.

This is true for all bit rates. It also means that for Siren7pre in the 16 ksps / 16 kbps mode of operation, 314 of the 320 bits per frame are used for encoding and the remaining 6 for the header and checksum. All new implementations of Siren7 should conform to the G.722.1 specification.
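
The bit budget per frame follows directly from the bit rate and the 20 ms frame length. The C sketch below works it out; the function and constant names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

enum {
    FRAMES_PER_SECOND = 50,        /* one frame every 20 ms */
    SIREN7PRE_OVERHEAD_BITS = 6    /* 2-bit header + 4-bit checksum */
};

/* Total bits in one 20 ms frame at the given bit rate. */
int bits_per_frame(int bit_rate_bps)
{
    assert(bit_rate_bps % FRAMES_PER_SECOND == 0);
    return bit_rate_bps / FRAMES_PER_SECOND;
}

/* Bits left for coded audio in a Siren7pre frame, after the header
 * and checksum described above have been subtracted. */
int siren7pre_codec_bits(int bit_rate_bps)
{
    return bits_per_frame(bit_rate_bps) - SIREN7PRE_OVERHEAD_BITS;
}
```

At 16 kbps this gives 320 bits per frame, of which 314 carry coded audio in Siren7pre; at 24 and 32 kbps the corresponding figures are 474 of 480 and 634 of 640. In G.722.1 all of the bits in the frame carry coded audio.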


What is the difference between Siren14 and G.722.1 Annex C?

G.722.1 Annex C is the formal name for the codec, as approved by the ITU. Siren14 is an informal name which refers to an earlier form of the codec with a slightly different bitstream format: Siren14 has a 2-bit bit-rate indicator at the beginning and a 4-bit CRC at the end, whereas G.722.1C contains coded data only. Audio quality is the same for both codecs.

All new implementations of "Siren14" should conform to the G.722.1C specification.


Where does packetization occur? Is it inside the codec, or outside the encoding?

The 2-bit header and 4-bit checksum are handled inside the C subroutines for the Siren encoder and decoder.


Does the siren.exe from the polycom website output this kind of header output (i.e. frame with 2 bit header and 4 bit checksum)?

Yes. You can see this specifically in SirenEncode() in the file encoder.c and SirenDecode() in the file decoder.c.

It will be obvious how Siren creates and reads the 2-bit header and 4-bit checksum. The software is well documented. The first two bits have different meanings in the 14 kHz and 7 kHz modes:

 
For 7 kHz:
      if (number_of_bits_per_frame == 640)      // 640 x 50 = 32000 bits per second
        current_word = 3;
      else if (number_of_bits_per_frame == 480) // 480 x 50 = 24000 bits per second
        current_word = 2;
      else if (number_of_bits_per_frame == 320) // 320 x 50 = 16000 bits per second
        current_word = 1;
      else
        current_word = 0;

For 14 kHz:
      if (number_of_bits_per_frame == 960)      // 960 x 50 = 48000 bits per second
        current_word = 3;
      else if (number_of_bits_per_frame == 640) // 640 x 50 = 32000 bits per second
        current_word = 2;
      else if (number_of_bits_per_frame == 480) // 480 x 50 = 24000 bits per second
        current_word = 1;
      else
        current_word = 0;

The 4-bit checksum is calculated the same way for Siren7 and Siren14.
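
On the receiving side, the 2-bit code can be mapped back to a frame size by inverting the tables above. The C sketch below shows one way this might look. The function names are hypothetical and are not taken from decoder.c, and the sketch assumes the 2-bit code has already been extracted from the frame as an integer from 0 to 3.

```c
#include <assert.h>

/* Map the 2-bit rate code of a 7 kHz frame to bits per frame.
 * Returns 0 for code 0, which the encoder uses for any other frame size. */
int rate_code_to_bits_7khz(int code)
{
    assert(code >= 0 && code <= 3);
    switch (code) {
    case 3:  return 640;   /* 32000 bits per second */
    case 2:  return 480;   /* 24000 bits per second */
    case 1:  return 320;   /* 16000 bits per second */
    default: return 0;     /* not one of the listed rates */
    }
}

/* Map the 2-bit rate code of a 14 kHz frame to bits per frame. */
int rate_code_to_bits_14khz(int code)
{
    assert(code >= 0 && code <= 3);
    switch (code) {
    case 3:  return 960;   /* 48000 bits per second */
    case 2:  return 640;   /* 32000 bits per second */
    case 1:  return 480;   /* 24000 bits per second */
    default: return 0;     /* not one of the listed rates */
    }
}
```

Note that the same code value means a different rate in each mode, so a receiver needs to know the mode before interpreting the header.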


Does my codec have to produce a bit-exact match?

Officially, ITU-T requires a bit-exact match. This is the reason for supplying a "C" implementation to the public. Polycom does not monitor implementations for this, and we are not aware of anyone who does, especially when the implementations perform well.


Is the 16kbps rate included in the ITU standard for G.722.1?

The 16 kbps rate is not part of the G.722.1 standard as defined by the ITU. Polycom signals the 16 kbps rate in a proprietary way for H.323 and H.320. RFC 3047 does allow for non-standard rates, so the 16 kbps rate can be specified in a standard way for SIP. Polycom’s preference is that G.722.1 implementations enable the 16 kbps rate, but this is not mandatory.


When does Polycom use the RTP header for G.722.1/G.722.1C?

For H.323 and SIP, Polycom uses the RTP format. For H.320, the payload is in exactly the same format, but without an RTP header.


What Is the ITU?

The ITU is the International Telecommunication Union, based in Geneva, Switzerland. The ITU is the world's oldest international treaty organization (founded 1865), now part of the United Nations, and is responsible for standardization of technology for international telecommunications, including telephone, radio, and data communications. For more information, visit the ITU's Web site at http://www.itu.int.


Technical Contact

For additional technical information please email: SirenInfo@polycom.com


© Polycom, Inc. All rights reserved.