A Survey of Audio Coders for Electronic-Art Music
Coded digital audio is undeniably a necessity in the iPod age. Not long after switching from records to compact discs in our personal music libraries, we have grown accustomed to the lower fidelity of coded audio in exchange for portability, accessibility, storage space, and web/podcasting capability, among other advantages. In the last decade or so, millions of music lovers worldwide have transformed their CD, record, and cassette collections into digital data on their computer hard drives (often with a poor choice of codec); various low-datarate coders, such as RealAudio, have allowed audio webcasts since as early as the 14.4 kbps modem era; and satellite radio services, such as XM and Sirius, let us tune in to low-bitrate audio streams across the continent.
For the electronic musician, audio coding plays additional important roles. (1) Compressed-audio files are often used as sound sources, whether field recordings made on a minidisc or other device, sounds downloaded from the internet, or recorded sounds shared with other musicians via email, P2P networks, or BitTorrent. (2) Some musicians use the web for live collaboration via internet telephony and messaging services (such as Skype and Windows Messenger). Musicians also use coded sound to (3) post their music online, (4) podcast it, and (5) send it to competitions or promote it online in other ways.
This survey examines several popular audio codecs and compares their accuracy and resulting sound quality. There are numerous audio compression formats, some lossy and some lossless, each reducing the size of a digital-audio file in its own way. Lossless formats such as the Apple Lossless Audio Codec (ALAC) or the Free Lossless Audio Codec (FLAC) reproduce the original PCM data exactly, so they can be assessed primarily by encoder/decoder efficiency and performance. Lossy formats must, in addition, be evaluated for fidelity and accuracy, which depend mainly on the format's psychoacoustic model, time-to-frequency mapping, and bit allocation.
Most psychoacoustic models in lossy compression formats are optimized for tonal material with few, if any, sudden radical spectral or dynamic changes, whose energy is concentrated mainly between 0 and 5 kHz, the rest of the spectrum being occupied mostly by related partials rather than independent content. These formats also rely on time-to-frequency mapping, which sacrifices temporal resolution for frequency resolution and therefore codes short attacks and transient sounds inefficiently. These characteristics make lossy audio coding far from ideal for electronic art-music and sound art.
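The time/frequency trade-off can be made concrete with a little arithmetic. The sketch below assumes a plain block transform of N samples at 44.1 kHz; the two block lengths (2048 and 256 samples, the long and short window lengths used by AAC) are illustrative, and real coders use overlapping MDCT windows with block switching, so treat the numbers as rough orders of magnitude rather than a model of any particular codec.

```python
def block_resolution(block_len: int, fs: float = 44100.0) -> tuple[float, float]:
    """Return (bin spacing in Hz, block duration in ms) for a block transform.

    A block of `block_len` samples yields frequency bins spaced fs / block_len
    apart, but any event inside the block is smeared over its whole duration.
    """
    bin_spacing_hz = fs / block_len
    duration_ms = 1000.0 * block_len / fs
    return bin_spacing_hz, duration_ms


if __name__ == "__main__":
    # Illustrative block lengths: a long and a short window.
    for n in (2048, 256):
        df, dt = block_resolution(n)
        print(f"N={n:5d}: bins every {df:6.1f} Hz, block spans {dt:5.1f} ms")
```

The product of the two figures is constant (fs/N × N/fs = 1) whatever N is: narrowing the frequency bins necessarily lengthens the block, which is why a sharp attack inside a long block can be smeared backwards in time as pre-echo.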
This survey explores and evaluates some of the losses in several popular lossy codecs: mp3, m4a (AAC), mpc (Musepack), oma (Sony ATRAC3plus), ogg (Vorbis), and wma (Microsoft). All sounds are compared at a low bitrate, 128 kbps (stereo, CBR or ABR), and a high one, 256 kbps. The choice of these two is deliberate. 128 kbps (often called near-CD quality) is presently the bitrate of files purchased on iTunes in the m4p format (an m4a with DRM protection), and it is also common in audio streaming. This low datarate also exposes the deficiencies of the psychoacoustic models quite effectively; to accentuate them, the test material in this survey is designed specifically to attack the codecs' vulnerabilities. The higher bitrate, 256 kbps, is arguably high enough for audio coding while still saving a significant amount of storage space. The usefulness of datarates of 320 kbps and higher is questionable, because lossless compression formats average bitrates that are not much higher (about 450 kbps) without any data loss (and therefore without any need for a psychoacoustic model). Finally, 256 kbps is the default bitrate used by Sony Hi-MD minidisc field recorders in the ATRAC3Plus format, one of the formats examined in this survey.
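For perspective, these bitrates can be compared with uncompressed CD audio (44.1 kHz, 16-bit, stereo). The short sketch below does the arithmetic; it ignores container overhead and metadata, so real file sizes will differ slightly, and the 450 kbps entry is simply the lossless average quoted above.

```python
CD_BITRATE_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps for 16-bit stereo PCM


def compression_ratio(bitrate_kbps: float) -> float:
    """How many times smaller than CD-quality PCM a stream at this rate is."""
    return CD_BITRATE_KBPS / bitrate_kbps


def megabytes_per_minute(bitrate_kbps: float) -> float:
    """Storage for one minute of audio at the given rate (1 MB = 10**6 bytes)."""
    return bitrate_kbps * 1000 * 60 / 8 / 1e6


if __name__ == "__main__":
    for rate in (128, 256, 320, 450, CD_BITRATE_KBPS):
        print(f"{rate:7.1f} kbps: {compression_ratio(rate):5.2f}:1, "
              f"{megabytes_per_minute(rate):5.2f} MB/min")
```

In these terms, 128 kbps is roughly an 11:1 reduction and 256 kbps roughly 5.5:1, while a 450 kbps lossless average would be about 3:1.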
Since lossy compression relies on the transformation from the time domain to the frequency domain, compromising one resolution for the other, this survey examines each domain individually.
Test 1 — Time Domain
The first test examines the transient responses of the different coders. The coders are given the simple task of coding a 0.5 Hz square wave, which switches the amplitude’s polarity twice every cycle (therefore sounding a pulse every second). The switch is immediate (one sample) in the original PCM format and the coders are examined for (1) whether they retain the sharpness of the switch and (2) whether they distort the waveform and add noise.
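A test signal of this kind is easy to reproduce. The sketch below writes a 0.5 Hz square wave as 16-bit stereo PCM using Python's standard `wave` module; the sample rate, duration, amplitude (about -6 dBFS) and the file name `square_0p5Hz.wav` are assumptions for illustration, not the settings used for the survey's own examples.

```python
import struct
import wave


def square_wave(freq_hz: float, seconds: float, fs: int = 44100,
                amplitude: int = 16384) -> list[int]:
    """Square wave whose polarity flips in a single sample every half period."""
    half_period = int(round(fs / (2 * freq_hz)))  # 44100 samples at 0.5 Hz
    n = int(seconds * fs)
    return [amplitude if (i // half_period) % 2 == 0 else -amplitude
            for i in range(n)]


def write_wav(path: str, samples: list[int], fs: int = 44100) -> None:
    """Write mono samples to both channels of a 16-bit stereo WAV file."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(2)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(fs)
        frames = b"".join(struct.pack("<hh", s, s) for s in samples)
        wf.writeframes(frames)


if __name__ == "__main__":
    sig = square_wave(0.5, seconds=10.0)
    write_wav("square_0p5Hz.wav", sig)  # hypothetical output file name
```

Ten seconds of this signal contain nine polarity switches, one per second, each happening between two adjacent samples.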
The images below are at single-sample resolution and are not always representative of fidelity; they show only the polarity switch, not necessarily the pulse in its entirety, the overall sound, or the added noise you will hear when listening to these examples. Click the images to listen, and read the comments in the right-hand column. Although the images draw connecting lines between individual samples, in reality there is no such gradation between the amplitude values of adjacent samples; nevertheless, some sloping will occur in the conversion to acoustic energy at the loudspeaker.
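One way to put a number on criterion (1) is to count how many samples a decoded file takes to swing from one polarity to the other. The helper below is a hypothetical measurement sketch, not the method used in this survey: it assumes the decoded samples are already loaded as floats, that `level` is the plateau amplitude, and it uses a 90% threshold, which is an arbitrary choice.

```python
def transition_width(samples: list[float], level: float,
                     fraction: float = 0.9) -> int:
    """Samples from the last point near +level to the first point near -level.

    Looks at the first downward switch: the last sample at or above
    +fraction*level before the signal first reaches -fraction*level.
    An ideal single-sample switch returns 1.
    """
    hi, lo = fraction * level, -fraction * level
    last_high = None
    for i, x in enumerate(samples):
        if x >= hi:
            last_high = i
        elif x <= lo and last_high is not None:
            return i - last_high
    raise ValueError("no downward polarity switch found")
```

Applied to the original PCM square wave this returns 1; larger values on a decoded file show how far the coder has softened the switch. It says nothing about criterion (2), added noise, which needs a separate measurement.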
To avoid player and format compatibility problems, all audio examples were reconverted to wav format and therefore should be playable on any system.
All coders used here are the latest available versions as of May 2007.
It is preferable to listen to these examples on a pro or semi-pro system with a wide, flat frequency response. If you are using headphones, please watch the levels!
In some cases it may take a few listens to segregate the noise; however, once heard, it is almost impossible to “unhear” it.
Additional points to consider:
- The wav format itself (PCM) is not error-free; sampling and quantization introduce distortion, primarily through amplitude quantization (round-off noise), clipping, LSB oscillation, and aliasing. However, because this format works exclusively in the time domain and does not use a psychoacoustic masking model, its errors are much more predictable and more easily handled than those of the other coding methods. When possible, use a resolution of 24 bits or higher to reduce round-off and LSB noise (or to push them outside the audible dynamic and frequency ranges). Presently, among the lossy compression formats surveyed, only Microsoft WMA 10 Pro supports 24-bit coding; the lossless compression formats support it as well.
- When the audio signal is static in nature, simple, or contains long silences, lossless coding may be advantageous even in terms of storage space. The lossless FLAC file in this example is almost 30 times smaller than the (very) lossy 256 kbps mp3 version.
- The best performer in this section of the survey appears to be Microsoft's WMA 10 Pro at 256 kbps, with its 128 kbps counterpart in second place. The 128 kbps m4a format (the iTunes format) is quite adequate for the low bitrate as well; its 256 kbps version, however, does not offer a significant improvement in this example. All the other codecs perform less than adequately, with ogg the worst in this instance. These results may not hold for other signal types.
- There are many other issues to consider when choosing a codec, among them player compatibility and availability. Mp3, for instance, has been by far the most widely used format, although it is clearly not the best choice for fidelity. This survey focuses on reproductive accuracy alone.
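The benefit of 24-bit resolution can be quantified with the standard rule of thumb for ideal rounding quantization of a full-scale sine, SNR ≈ 6.02·N + 1.76 dB for N bits. The sketch below computes that figure and checks it by actually quantizing a sine wave; it models round-off noise only (no dither, clipping, or aliasing), and the 997 Hz test frequency and 48 kHz rate are arbitrary choices.

```python
import math


def theoretical_snr_db(bits: int) -> float:
    """Ideal SNR of a full-scale sine quantized to `bits` bits (no dither)."""
    return 20 * math.log10(2) * bits + 10 * math.log10(1.5)


def measured_snr_db(bits: int, freq: float = 997.0, fs: float = 48000.0,
                    n: int = 48000) -> float:
    """Quantize a near-full-scale sine by rounding and measure signal/error."""
    amp = 2 ** (bits - 1) - 1  # largest positive code, so nothing clips
    sig_power = err_power = 0.0
    for i in range(n):
        x = amp * math.sin(2 * math.pi * freq * i / fs)
        err = round(x) - x  # round-off (quantization) error
        sig_power += x * x
        err_power += err * err
    return 10 * math.log10(sig_power / err_power)


if __name__ == "__main__":
    for b in (16, 24):
        print(f"{b}-bit: theory {theoretical_snr_db(b):6.2f} dB, "
              f"measured {measured_snr_db(b):6.2f} dB")
```

The theoretical figures come to about 98.1 dB for 16 bits and 146.3 dB for 24 bits; the roughly 48 dB gained by the eight extra bits is what lets round-off noise sit far below the audible dynamic range.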
Test 2 — Frequency Domain
In the second test the coders are given the task of coding a white noise signal that was passed through a digital high-pass filter (HPF) with a 10 kHz cutoff frequency and a moderate slope; this shows how the coders cope with a spectrally irregular signal in one of their weaker areas, the upper part of the spectrum.
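A comparable test signal can be generated by running Gaussian white noise through a second-order high-pass biquad. This is a sketch under assumptions: the survey does not specify its filter beyond "moderate slope", so the Butterworth response (12 dB per octave), the coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook, the 44.1 kHz rate, the noise level, and the fixed random seed are all choices made here for illustration.

```python
import math
import random


def highpass_biquad(x: list[float], cutoff_hz: float, fs: float = 44100.0,
                    q: float = 1 / math.sqrt(2)) -> list[float]:
    """Second-order high-pass filter (RBJ cookbook form, Direct Form I)."""
    w0 = 2 * math.pi * cutoff_hz / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    # Coefficients normalized by a0.
    b0 = b2 = (1 + cos_w0) / 2 / a0
    b1 = -(1 + cos_w0) / a0
    a1 = -2 * cos_w0 / a0
    a2 = (1 - alpha) / a0
    y: list[float] = []
    x1 = x2 = y1 = y2 = 0.0  # filter state: previous inputs and outputs
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def highpassed_noise(seconds: float, cutoff_hz: float = 10_000.0,
                     fs: float = 44100.0, seed: int = 1) -> list[float]:
    """Gaussian white noise, high-passed at `cutoff_hz`."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 0.25) for _ in range(int(seconds * fs))]
    return highpass_biquad(noise, cutoff_hz, fs)
```

With a 10 kHz cutoff, a 1 kHz tone comes out more than 40 dB down while a 15 kHz tone passes almost unchanged; the resulting float samples can then be scaled to 16-bit integers and written with Python's `wave` module.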
- This time the images are linear-scale spectrograms (generated with Sonic Visualiser) showing 0–22.5 kHz. The transparent window on the right-hand side shows the highest coded frequency bin.
- Click the images to listen to the audio examples in wav format.
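The highest coded frequency read off a spectrogram can also be estimated numerically: window a block of the decoded signal, take its spectrum, and report the highest bin that is still within some threshold of the loudest bin. The sketch below uses a naive DFT (slow, but dependency-free) and a periodic Hann window; the -60 dB threshold and the block length are assumptions, and a real analysis would average many blocks.

```python
import cmath
import math


def highest_frequency(block: list[float], fs: float = 44100.0,
                      threshold_db: float = -60.0) -> float:
    """Frequency of the highest DFT bin within `threshold_db` of the peak bin."""
    n = len(block)
    # Periodic Hann window reduces leakage from strong components.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, x in enumerate(block)]
    mags = []
    for k in range(n // 2 + 1):  # naive DFT, non-negative frequencies only
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        mags.append(abs(s))
    floor = max(mags) * 10 ** (threshold_db / 20)
    top = max(k for k, m in enumerate(mags) if m >= floor)
    return top * fs / n
```

Run on a block from each decoded file, the result should land near the cutoff visible in the spectrogram. For long blocks, an FFT (for example `numpy.fft.rfft`) is the practical replacement for the naive loop.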
Additional points to consider:
- There are several mp3 coders, primarily LAME, Fraunhofer, Xing, and BladeEnc; the first two are the most widely used, Fraunhofer because it was first and LAME because it is free. This survey did not include any examples encoded with Xing, as it is the encoder with the most problems (hear a Xing example at 128 kbps).
- The best performer in this test is the m4a codec (at both bitrates). Among the surveyed coders, it introduces the fewest audible artifacts and stays truest to the original timbre; a fairly close runner-up is Sony ATRAC3Plus. Ogg Vorbis does a fine job at 256 kbps, but all the other coders in this test distort the signal and introduce significant audible artifacts. The mp3 format is the worst of all.
- In all the low-bitrate coded signals, the coders cut a chunk off the high frequencies. Cutting 6 kHz off our hearing range, as the Blade-encoded 128 kbps mp3 does (taking 20 kHz as the upper limit of hearing), is a significant loss for the overall sound, but not as big a loss as a linear spectrogram may suggest. Because our pitch perception is logarithmic, the range 20–6000 Hz spans more than 8 octaves, whereas the 6 kHz from 14 kHz to 20 kHz amounts to only about a tritone, in a region of the spectrum where our perceptual sensitivity to loudness and pitch is, at best, extremely limited.
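The interval arithmetic in the last point is easy to check: the distance between two frequencies is log2(f2/f1) octaves, and an equal-tempered semitone is 1/12 of an octave.

```python
import math


def octaves(f_low: float, f_high: float) -> float:
    """Pitch distance between two frequencies, in octaves."""
    return math.log2(f_high / f_low)


def semitones(f_low: float, f_high: float) -> float:
    """Pitch distance in equal-tempered semitones (12 per octave)."""
    return 12 * octaves(f_low, f_high)


if __name__ == "__main__":
    print(f"20 Hz to 6 kHz:   {octaves(20, 6000):.2f} octaves")
    print(f"14 kHz to 20 kHz: {semitones(14000, 20000):.2f} semitones")
```

This gives about 8.2 octaves for the lower range and just over 6 semitones (a tritone) for the band removed above 14 kHz.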
Final Conclusions and Recommendations
- Make lossless coding your first choice! If your sound is spectrally simple, static in nature, or has long periods of silence, lossless coding may even save you storage space when compared with lossy formats. On the other hand, if your sound is spectrally complex, includes many transients, or both, the loss in fidelity may be too great a price to pay for the space saving offered by lossy coding.
- Where you must compress your sound, m4a (AAC) is your best bet among the coders surveyed here, at both its low and high bitrates; it performed very well in both tests. Additionally, being the iTunes format, m4a is widely used, strongly supported, and frequently updated.
- Mp3 is not among the strong performers of this survey, but its history, global player-compatibility, and accessibility give it dominance. In many instances you will be asked to provide your works in mp3 format; you must choose your coder and settings very carefully. Among the surveyed encoders, LAME produced the best (least bad) results, with one exception: the low bitrate encoding of a spectrally complex signal which was, perhaps, a bit better with Blade. If you are forced to use 128 kbps, and especially if your digital sound is not rich in transients, you may be better off with Blade instead of LAME. The only way to be certain, naturally, is to try them both.
- This survey did not examine the effectiveness of VBR coding. As a general rule, giving an encoder the flexibility to vary the bitrate may prove useful for sounds with large dynamic and spectral ranges (such as classical music, experimental music, electronic art-music, and sound art); however, this additional decision-making process may also introduce new problems. Most contemporary codecs nonetheless support both CBR and VBR effectively, and some use VBR or ABR (variable-bitrate encoding with a preset average rate) by default, among them Ogg Vorbis, m4a, and mpc.
- When recording with a Hi-MD minidisc, use PCM recording when possible, especially if you are recording a signal with many transients; however, if you must record continuously for a period longer than 1.5 hours (the time-limit on a 1GB disc) you may use ATRAC3Plus at 256 kbps (Hi-SP), which is quite effective throughout the entire audible spectrum.
- A quick look at the test results shows no apparent advantage to using the Sony, Musepack, or Vorbis formats over m4a. The Microsoft format showed a slight advantage over m4a in the time-domain test, but at a bigger cost in the frequency-domain test. Nonetheless, if your signal consists mostly of transients with few sustained or reverberant sounds, you may find the WMA 10 Pro format advantageous.
- These two tests attack the codecs’ vulnerabilities in order to show, as generally and as objectively as possible, how the codecs deal with the highly irregular content of electroacoustic music. However, each sound work may raise its own “issues”, and your best option is to try out several coders and evaluate their performance in both the time and frequency domains.
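A simple, objective complement to listening is a null (difference) test: subtract the decoded signal from the original and express what remains relative to the original. The sketch below assumes both signals are already loaded as float lists at the same sample rate. Because encoders and decoders commonly add some delay or padding, it searches a range of sample offsets for the best alignment; the 2048-sample default range is an assumption, and the brute-force search is only practical on short excerpts.

```python
import math


def residual_db(original: list[float], decoded: list[float],
                offset: int) -> float:
    """Energy of original minus decoded (shifted by `offset`), relative to original, in dB."""
    n = min(len(original), len(decoded) - offset)
    if n <= 0:
        return math.inf  # no overlap at this offset: treat as worst case
    sig = sum(x * x for x in original[:n])
    if sig == 0.0:
        return math.inf  # silent excerpt: nothing to compare against
    err = sum((original[i] - decoded[i + offset]) ** 2 for i in range(n))
    if err == 0.0:
        return -math.inf  # perfect null: identical after alignment
    return 10 * math.log10(err / sig)


def best_null(original: list[float], decoded: list[float],
              max_offset: int = 2048) -> tuple[int, float]:
    """Search offsets 0..max_offset; return (offset, residual dB) of the deepest null."""
    last = min(max_offset, len(decoded) - 1)
    db, offset = min((residual_db(original, decoded, d), d)
                     for d in range(last + 1))
    return offset, db
```

A lossless format should null completely (minus infinity dB). For lossy coders, a deeper (more negative) residual means the output is closer to the original waveform, but these coders are designed around perceptual rather than waveform fidelity, so the residual is best treated as something to listen to alongside the time- and frequency-domain checks.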
With thanks to Kevin Austin and Tim Ramsay for their useful suggestions.
20 May 2007