Yet Another Beginner's Guide to Audio Codecs

Thursday, October 20, 2005
Some weeks back Knucklehead was asking on Roger's blog for an explanation of audio compression. Or something. I'm no longer sure what he was asking for but this is what I came up with.

Sound comes in waves through the air or water. To record music, some sort of facsimile which can be converted back into sound waves has to be produced. Perhaps the first such facsimiles were the rolls of paper tape used by old-time player pianos. If you've never seen them, these were just rolls of paper with holes punched in them in various locations. They were rolled through a mechanical device which, when it encountered a hole, caused a mechanical arm to pull a piano key. The paper tapes didn't really record the sound; they recorded an idealized version of the sound represented by the notes. Loudness and softness and incidental overtones which determined the timbre of the real-live piano in the first place could not be replicated by this means.

Later, player pianos were displaced by analog recording devices, the first of which was invented by Thomas Edison. These used the actual sound waves to cause physical deformations in some sort of medium. The most familiar version of this for most of us is vinyl records, which can still be found in the hip but misguided sections of the larger cities. These physical deformations could then be translated back by suitable equipment into sound waves. These analog devices represented a huge step forward, because they had the ability to capture all of the actual sounds involved rather than an idealized version of the sounds represented by the notes. How would one "record" a human voice on a player piano's paper tape? Early recordings of Caruso's very fine voice--one of my grandmother's proudest possessions--made these new devices very popular. There were limitations to the ability to record the sounds, though, because there were physical limitations in the materials written on. This created hard physical limits to the playback quality one could achieve.

The digital electronics revolution replaced the older analog devices by recording sound as a sequence of numbers. The more numbers, the more information recorded about the sound, so that there is in principle no limit whatsoever to the fidelity with which one can reproduce the sound (except lack of space on one's hard drive for keeping all those numbers). In practice one does not need to record an infinity of numbers, because there are hard physical limits to the ability of the human ear to hear, and of the human brain to understand what it is hearing. It is only necessary for the recording to contain enough numbers to reach those limits. For a technical reason called the Nyquist sampling theorem, a digital recording must sample the sound at a rate more than twice the highest frequency audible to the human ear. Human hearing tops out at roughly 20 kHz, which is why CDs are sampled at 44.1 kHz. A CD is not the exact sound that was recorded, but the claim is that, given sufficiently good playback equipment, the difference is inaudible to the human ear. What actually gets put on the CD is a complicated encoding of the original samples. This is done to suit the physical disc and to correct for errors, so that even if the CD is lightly scratched the playback is usually still perfect, something that is far from true with analog recordings. The complete mathematical background to CD recording can be found here.
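For the curious, the sampling arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and the 16-bit stereo figures are the standard CD audio parameters rather than anything stated above.

```python
import math

CD_SAMPLE_RATE_HZ = 44_100   # samples per second, per channel
CD_BITS_PER_SAMPLE = 16      # standard CD audio sample size (assumption, not from the post)
CD_CHANNELS = 2              # stereo


def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2


def cd_bitrate_bps():
    """Raw audio data rate of a CD in bits per second."""
    return CD_SAMPLE_RATE_HZ * CD_BITS_PER_SAMPLE * CD_CHANNELS


def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency that a pure tone appears to have after sampling.

    Tones above the Nyquist limit "fold back" to a lower frequency,
    which is why the sampling rate has to be high enough.
    """
    return abs(freq_hz - sample_rate_hz * round(freq_hz / sample_rate_hz))


def sample_tone(freq_hz, n_samples, sample_rate_hz=CD_SAMPLE_RATE_HZ):
    """Turn a pure sine tone into a list of 16-bit integers."""
    peak = 2 ** (CD_BITS_PER_SAMPLE - 1) - 1  # 32767
    return [
        round(peak * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz))
        for n in range(n_samples)
    ]


print(nyquist_limit(CD_SAMPLE_RATE_HZ))            # 22050.0
print(cd_bitrate_bps())                            # 1411200
print(alias_frequency(30_000, CD_SAMPLE_RATE_HZ))  # 14100
```

A 10 kHz tone comes back as 10 kHz, but a 30 kHz tone sampled at the CD rate is indistinguishable from one at 14.1 kHz, so anything above the limit has to be filtered out before sampling.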

Modern-day music files are kept in one of two kinds of digital format, lossless or lossy. The issue is how to take all the numbers off a CD and put them into a file. "Lossless encoding" means that the process keeps all the numbers, perhaps in a more compact coded form; "lossy" means that some numbers are deliberately thrown away in order to save space. Lossy encoding inevitably leads to degradation of the sound. If it is done cleverly, though, the degradation is essentially inaudible to the human ear. Because there are different ideas about how best to store the numbers or about how to compress the numbers, different standards have arisen. The issue is further complicated and encumbered by patents on the various algorithms used. A good guide to the various standards can be found here.
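The difference between the two is easy to show with a toy example. The sketch below is not how any real audio codec works: it uses Python's general-purpose zlib compressor as a stand-in for lossless encoding, and simply throws away the low 8 bits of every sample as a very crude stand-in for lossy encoding. All the function names are invented for the illustration.

```python
import math
import struct
import zlib


def make_samples(n=2000, freq_hz=440.0, rate_hz=44_100):
    """A short 16-bit sine tone to play with."""
    return [round(32767 * math.sin(2 * math.pi * freq_hz * i / rate_hz))
            for i in range(n)]


def lossless_roundtrip(samples):
    """Compress and decompress; every number comes back unchanged."""
    raw = struct.pack(f"<{len(samples)}h", *samples)   # 2 bytes per sample
    compressed = zlib.compress(raw)
    restored_raw = zlib.decompress(compressed)
    restored = list(struct.unpack(f"<{len(samples)}h", restored_raw))
    return restored, len(compressed)


def lossy_roundtrip(samples):
    """Keep only the top 8 bits of each sample; the rest is gone for good."""
    coarse = [s >> 8 for s in samples]                    # discard low 8 bits
    stored = struct.pack(f"<{len(coarse)}b", *coarse)     # 1 byte per sample
    restored = [c << 8 for c in struct.unpack(f"<{len(coarse)}b", stored)]
    return restored, len(stored)


samples = make_samples()
exact, _ = lossless_roundtrip(samples)
approx, _ = lossy_roundtrip(samples)
print(exact == samples)    # True: nothing was lost
print(approx == samples)   # False: close, but not identical
```

Real lossy codecs are far cleverer about which information to discard, but the principle is the same: the lossy file is smaller, and the original numbers cannot be recovered from it.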

Among the popular lossless formats are APE, WAV, FLAC, and ALE. (WAV is lossless only in the sense that it normally stores the numbers uncompressed.) FLAC stands for "Free Lossless Audio Codec" and it is popular because it is completely free of patent encumbrances and is open source. Among the popular lossy codecs are MP3, Ogg (more precisely, the Vorbis codec in an Ogg container), MPC, AAC, and WMA. They each try to throw out those sounds which human beings generally don't notice. MP3 and Ogg both achieve a ten- or twelve-to-one reduction in the storage required, at bitrates around 128 kbit/s, by using these "psycho-acoustic" models. The various formats differ in the psycho-acoustic model they employ, with varying results to the listener. Results of one extensive test of 128 kbit/s recordings can be found here. Ogg generally comes out on top in listener-rated quality under blind tests. At higher bitrates, MPC is generally considered the best format. The higher the bitrate, the lower the compression, and the greater the fidelity to the original CD. Modern pocket players like the iPod require that compression of some sort be performed in order to store all those songs.
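The ten- or twelve-to-one figure is just arithmetic on bitrates. Here is a rough sketch; the four-minute song and the 20 GB player are made-up numbers for illustration, and the 1411.2 kbit/s CD rate assumes standard 16-bit stereo at 44.1 kHz.

```python
CD_BITRATE_KBPS = 1411.2   # 44,100 samples/s * 16 bits * 2 channels (standard CD audio)


def compression_ratio(target_kbps):
    """How many times smaller a file is than the raw CD audio."""
    return CD_BITRATE_KBPS / target_kbps


def file_size_mb(duration_s, kbps):
    """Size in megabytes of audio at a constant bitrate."""
    return kbps * 1000 * duration_s / 8 / 1_000_000


def songs_that_fit(capacity_gb, kbps, song_seconds=240):
    """Whole songs of the given length that fit on a player (hypothetical sizes)."""
    return int(capacity_gb * 1000 // file_size_mb(song_seconds, kbps))


print(compression_ratio(128))                  # about 11: the "ten- or twelve-to-one"
print(file_size_mb(240, 128))                  # a four-minute song at 128 kbit/s, in MB
print(songs_that_fit(20, 128))                 # compressed songs on a 20 GB player
print(songs_that_fit(20, CD_BITRATE_KBPS))     # the same player with raw CD audio
```

The last two lines show why pocket players need compression: the same disk holds roughly eleven times as many songs at 128 kbit/s as it would at the raw CD rate.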

There is one other fly in the ointment to consider: although the standards are fixed, the actual software used to record to MP3 format, for example, may use a different algorithm from that used by other software. That is, two different programs creating an MP3 file from the same recording will yield two different results in terms of quality. There are marked quality differences between the various programs available for MP3 creation; to the best of my knowledge the LAME encoder, produced partly in Boulder, is considered to be of highest quality.

In addition to having high quality, the Ogg format, like the FLAC format, is patent-free and open source. The main downside to the Ogg format is that it is not as well known as the others and is not so well supported by proprietary vendors who wish to lock users into their own special formats. Most of the main software players can play Ogg. It is harder to find hardware that plays Ogg, although the iRiver device does so just amazingly well in my experience. More Ogg information can be found here.

5 comments:

truepeers said...

Interesting post. When you talk about what is audible to the human ear, do you mean that recording standards are determined by some theoretical understanding of our physiological or neurological capabilities, or largely by testing on sundry individuals?

I ask because I know a recording engineer and when you listen to a recording with him, he makes observations which are not audible to me. So it seems one can train one's ear to hear more than the average person. Do you think the limits of audibility for the skilled will advance with the progress in digital technologies?

chuck said...

When you talk about what is audible to the human ear, do you mean that recording standards are determined by some theoretical understanding of our physiological or neurological capabilities,

I can't speak to audio codecs, but JPEG image compression is similar and in that case I believe the measured ability of the eye to discriminate is taken into account in the lossy compression. Of course, my eye for color is not anything like the eye of an artist, so your comment still applies. I could live with an amount of compression that would probably drive an artist nuts. Fortunately, the lossiness is selectable. I wonder if the same is true of audio codecs?

MeaninglessHotAir said...

do you mean that recording standards are determined by some theoretical understanding of our physiological or neurological capabilities, or largely by testing on sundry individuals?

The answer is that both are used.

Do you think the limits of audibility for the skilled will advance with the progress in digital technologies?

I don't think too much more progress in digital technologies can be made because we are close to the theoretical limit already. There is only so much compression that can be done before you start to lose significant information. That there is a theoretical limit comes down to the post on information theory which Seneca owes us.

The only thing missing now is an entirely digital speaker system, which no one has yet created to the best of my knowledge.

Knucklehead said...

MHA,

Thanks! Your section on "lossy" vs. "lossless" formats was the sort of thing I was asking about - what are the pros and cons of the various formats.

JB said...

Truepeers,

Check this out.