Digital audio quality

February 7th, 2006

Pliable has a mostly-reasonable rant about digital audio and audio quality. I’d like to raise a couple of objections, though. These thoughts were to be posted in a comment at the overgrown path, but have become overgrown in their own right.

Two factors determine the potential fidelity of digital audio: sample rate and bit depth. This holds only for uncompressed PCM audio (which excludes SACD, for instance); more on MP3 and friends in a moment. The highest frequency that a digital signal can reproduce is given by the Nyquist-Shannon sampling theorem as half the sample rate; for a CD's 44.1 kHz sample rate, that is 22.05 kHz. (I have written about the sampling theorem and its blatant violation by popular culture here; this is such a pet peeve of mine that friends mention it as a joke in my presence.)
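As a quick sanity check on the sampling theorem, here is a minimal Python sketch. The function names are mine, chosen for illustration, and the sample rates listed are the standard ones for CD and DAT.

```python
def nyquist_frequency(sample_rate_hz: float) -> float:
    """Upper frequency limit for a signal sampled at sample_rate_hz."""
    return sample_rate_hz / 2.0


def min_sample_rate(max_frequency_hz: float) -> float:
    """Lowest sample rate whose Nyquist limit reaches max_frequency_hz."""
    return 2.0 * max_frequency_hz


# Standard sample rates: 44.1 kHz for CD, 48 kHz for DAT.
for name, rate in [("CD", 44_100), ("DAT", 48_000), ("96 kHz PCM", 96_000)]:
    print(f"{name}: limit of {nyquist_frequency(rate):.0f} Hz")
```

In practice the usable bandwidth is a little below this limit, because the anti-aliasing filter needs some room to roll off.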

The bit depth, on the other hand, is what enables dynamic range. More bits means more discrete volume levels; CDs use 16 bits, meaning that there are 65,536 possible discrete dynamic levels for any sample. (If it sounds to you like reducing a continuous analog signal to one of a certain number of fixed amplitudes can introduce distortion, you’re right: read about dither.)
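The relationship between bit depth and dynamic range can be sketched in a few lines of Python. This is the idealized figure (the ratio between full scale and a single quantization step, roughly 6 dB per bit); it ignores dither and noise shaping, and the function names are illustrative.

```python
import math


def quantization_levels(bits: int) -> int:
    """Number of discrete amplitude values available at this bit depth."""
    return 2 ** bits


def dynamic_range_db(bits: int) -> float:
    """Idealized dynamic range: full scale relative to one step, in dB."""
    return 20 * math.log10(quantization_levels(bits))


for bits in (8, 16, 24):
    print(f"{bits} bits: {quantization_levels(bits):,} levels, "
          f"about {dynamic_range_db(bits):.1f} dB")
```

For 16 bits this gives 65,536 levels and a theoretical range of about 96 dB.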

When you multiply the bit depth by the sampling rate (and by the number of channels), you get the bit rate: the number of bits per second used to represent an analog signal. There is a reasonable relationship between bit rate and fidelity, since more bits per second leave room for more frequency response and more dynamic range, and Pliable is right to lament the decrease in fidelity from the perfect analog signal to a CD to a low-bit-rate MP3. However, there are a few problems with Pliable’s argument:

  1. Vinyl is not as great as we remember it. Sure, records sound “warm,” but that’s just pleasing harmonic distortion. A good record can represent frequencies up to 18 kHz, which would only require a sampling rate of 36 kHz to reproduce in the digital domain. Furthermore, vinyl does not have a uniform frequency response (hence the RIAA EQ curve). Finally, the dynamic range of vinyl suffers at the quiet end because of the medium’s low signal-to-noise ratio and at the loud end because very loud sounds may throw the needle.
  2. We aren’t recording for dogs. Independently of whether or not we can actually hear such high harmonics as to make high sampling rates necessary, high-end microphones like the Neumann U87 can only capture up to 20 kHz. Of course, with bits as cheap as they are, there is no reason not to increase the bit rate as high as the source equipment allows, but it seems to me that bits are better spent on depth (dynamic range) than on sampling rate (frequency response), at least after a certain point.
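To make the bit-rate arithmetic concrete, here is a small Python sketch. It assumes standard stereo CD audio (44.1 kHz, 16 bits, 2 channels); note that the channel count is a third factor alongside sample rate and bit depth. The function name is illustrative.

```python
def pcm_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bits per second for uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


# Stereo CD audio: 44.1 kHz, 16 bits per sample, 2 channels.
cd_bit_rate = pcm_bit_rate(44_100, 16, 2)

# A 128 kbps MP3 expressed as a fraction of the CD bit rate.
mp3_fraction = 128_000 / cd_bit_rate

print(f"CD: {cd_bit_rate / 1000:.1f} kbps")
print(f"128 kbps MP3 is {mp3_fraction:.1%} of the CD bit rate")
```

This shows how "spending bits" trades off: doubling the sample rate and adding eight bits of depth both raise the bit rate, but they buy different things.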

The biggest problem with Pliable’s argument, however, is that it relies on the claim that a 128kbps MP3 file is only about 1/10 the bit rate of a 1411kbps CD. This is true, but meaningless because the two formats represent audio in different ways. On a CD, an analog signal is represented as a stream of sampled amplitudes, as in the figures below:

[Figure: A continuous signal]

[Figure: The continuous signal, sampled 32 times with 19 possible amplitude levels]

[Figure: The continuous signal, sampled 32 times with 7 possible amplitude levels]
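The sampling and quantization described in these captions can be reproduced with a short Python sketch. It assumes one period of a unit-amplitude sine, 32 samples, and evenly spaced amplitude levels between -1 and 1; the original figures may have used different details, and the helper names are mine.

```python
import math


def sample_sine(num_samples: int) -> list[float]:
    """One period of a unit-amplitude sine, sampled at equal intervals."""
    return [math.sin(2 * math.pi * n / num_samples) for n in range(num_samples)]


def quantize(value: float, levels: int) -> float:
    """Snap a value in [-1, 1] to the nearest of `levels` evenly spaced amplitudes."""
    step = 2.0 / (levels - 1)
    index = round((value + 1.0) / step)
    return -1.0 + index * step


def max_error(samples: list[float], levels: int) -> float:
    """Largest difference between a sample and its quantized value."""
    return max(abs(s - quantize(s, levels)) for s in samples)


samples = sample_sine(32)
for levels in (19, 7):
    print(f"{levels} levels: worst-case error {max_error(samples, levels):.3f}")
```

The worst-case error can never exceed half a quantization step, so fewer levels means a coarser approximation of the same 32 samples.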

An MP3, on the other hand, represents audio data in the frequency domain, or as a time-varying spectrum. In the case of the continuous signal above, it could be represented perfectly in the frequency domain by a file that said “play a single period of a sine wave with frequency X, amplitude Y, and phase Z.” This file, it should be clear, would be substantially smaller than any digitally-sampled rendition of the sine, as well as more accurate.
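As a toy illustration of that thought experiment (and emphatically not of how an MP3 file is actually laid out), here is a Python sketch of a three-number description of a sine. The class and field names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class SineDescription:
    """A complete description of a sine: three numbers, no samples."""
    frequency_hz: float
    amplitude: float
    phase_rad: float

    def render(self, sample_rate_hz: float, num_samples: int) -> list[float]:
        """Generate samples from the description at any desired resolution."""
        return [
            self.amplitude
            * math.sin(2 * math.pi * self.frequency_hz * n / sample_rate_hz
                       + self.phase_rad)
            for n in range(num_samples)
        ]


description = SineDescription(frequency_hz=1.0, amplitude=1.0, phase_rad=0.0)

# The same three numbers yield one period at 32 samples or at 1000 samples.
coarse = description.render(sample_rate_hz=32, num_samples=32)
fine = description.render(sample_rate_hz=1000, num_samples=1000)
print(len(coarse), len(fine))
```

The description stays the same size no matter how finely it is rendered, which is the sense in which it is both smaller and more accurate than any fixed set of samples.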

You can in fact represent any time-domain signal (that is, one that represents amplitudes varying over time, like the sine above) as a frequency-domain signal (that is, one that represents a sound as a series of frequency spectra). Obviously, the sine above is a special case (any recording that Pliable or I would be listening to would require a much more complex description). Furthermore, recording increasingly precise representations of continuous signals takes a great deal of space no matter whether you are operating in the time domain or the frequency domain. However, it is possible to “cheat” in the frequency domain and save space in ways that aren’t possible in the time domain.

  1. One can restrict the precision of the frequency analysis. This adversely affects sound quality, but it may be possible to reduce the size of a precise spectral representation of a sound a great deal before the effect is perceptible.
  2. Alternatively (this is the approach that MP3 uses) one can throw away “psychoacoustically insignificant” frequencies, saving space in the representation. This “lossy encoding” doesn’t always work all that well, since an algorithm (the psychoacoustic model) has to determine which frequencies aren’t important (and, for spectrally complex or “in-tune” music, the algorithm is often wrong). A 96kbps MP3 of a pop song might be adequate, but the Tallis Scholars at 96kbps is probably unlistenable.
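The general idea of discarding frequency components can be sketched in Python. To be clear about the assumptions: this is not the MP3 algorithm and contains no psychoacoustic model. It uses a naive discrete Fourier transform and simply keeps the strongest components by magnitude as a crude stand-in for "significant"; the test signal and all names are made up for illustration.

```python
import cmath
import math


def dft(signal: list[float]) -> list[complex]:
    """Naive discrete Fourier transform (fine for short signals)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(spectrum: list[complex]) -> list[float]:
    """Naive inverse transform, returning the real part of each sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def keep_strongest(spectrum: list[complex], keep: int) -> list[complex]:
    """Zero every bin except the `keep` largest by magnitude."""
    ranked = sorted(range(len(spectrum)), key=lambda k: abs(spectrum[k]),
                    reverse=True)
    kept = set(ranked[:keep])
    return [value if k in kept else 0j for k, value in enumerate(spectrum)]


N = 64
# Two strong partials plus one very quiet one (a made-up test signal).
signal = [
    math.sin(2 * math.pi * 3 * t / N)
    + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    + 0.01 * math.sin(2 * math.pi * 20 * t / N)
    for t in range(N)
]

reduced = keep_strongest(dft(signal), keep=4)
approximation = inverse_dft(reduced)
worst_error = max(abs(a - b) for a, b in zip(signal, approximation))
print(f"kept 4 of {N} bins, worst-case error {worst_error:.4f}")
```

Here the quiet partial is thrown away and the reconstruction is off by at most its amplitude. A real encoder has a much harder job: deciding which components a listener will not miss, which is exactly where the model can guess wrong.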

Basically, my claim is this: digital audio necessarily discards parts of the original analog signal. However, it is not clear that there exists any reasonable way to reproduce an analog signal with complete fidelity (or what that would mean in the case of reproducing frequencies well above the range of human perception). Compact discs and MP3s discard different things to save space, but direct comparisons between PCM bit rates (as in CD, DAT, etc.) and compressed bit rates (as in MP3, digital radio, AAC, etc.) are meaningless.

Finally, as long as the bulk of money is in the Antares-Autotune-riddled, equal-tempered, dynamic-range-free pop music world, there is no particular financial incentive for record companies to improve upon the lowly CD.

I’m currently listening to Dies Sind Die Heiligen Zehen Gebot BWV 678 from the album “Clavier-Ubung III” by Masaaki Suzuki
