Much Ado About Loudness
The author was invited by the Alberta chapter of the Audio Engineering Society (AES) to The Banff Centre to discuss emerging trends in audio post-production, new delivery formats and related encoding processes used in film and TV production. The following is a summary of some of the points made in his presentation, which took place on January 23, 2005.
Well, here we are in 2006, and it seems like “loudness” issues are driving us all mad (if not also deaf). For the music folks, it’s a question of how loud the next song on the radio or the next CD in the jukebox will be. For TV broadcasters, it affects how many complaints they receive about wide dynamics within a dramatic program, about how different the program material sounds from the commercials, and about how much of a difference viewers notice between stations. For the film people, like myself, we have to deal with the relative differences between extremely varied content playing through a system that can be dangerous when abused… trailers and promos being the most obvious examples.
Over the past decade, we have seen increased awareness of these problems. We have now crossed the threshold, and viable solutions are presenting themselves.
The film solution was the least challenging. Now, before you send me nasty emails about how there was nothing simple about it, let me clarify. Film mixers have one advantage over all other mixers in the audio community (or are we a co-op??): theaters are calibrated to a known set of parameters. Whether you’re in the mix room or the cinema, we expect to see similar levels from the screen, surround and subwoofer speakers (for more on these specs, I highly recommend a visit to Dolby.com). So in any given film room, I would expect to see 85 dB SPL from each of the screen speakers (sending pink noise at 0 VU, measured with C-weighting and a SLOW response time) and 80–82 dB SPL from each surround, hopefully giving a sum of 85 dB. The sub needs to be adjusted with a real-time analyzer (RTA) so that it reads 10 dB higher than any single screen channel, but only within its intended frequency range (a typical Radio Shack SPL meter is not accurate in this range). All this to say that, in an ideal situation, we have a pretty good idea of how loud things will be in a calibrated room.
The problem has been that teasers and trailers are mixed like TV commercials. In meetings with some of the Canadian distributors, one of the agents said: “It’s not that we want to be louder than everyone else, we’re just afraid of being softer” ... errrrrrr. What happens is that in your standard mega-psycho-plex with 16 theaters and only one projectionist, the volumes are set to get the fewest complaints. If the trailers are all loud, the manager turns down the level and, unfortunately, the poor projectionist never makes it back to turn the sound up for the feature. Some mixers compensate by mixing their films louder, but basically we’ve already begun the downward spiral.
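Why should two surrounds at 80–82 dB add up to something close to 85 dB? Uncorrelated pink noise from separate speakers adds on a power basis, not arithmetically. The following Python sketch assumes two surround channels (left and right) and perfectly uncorrelated signals; the function name spl_sum is ours, for illustration only. At 82 dB each the sum lands at about 85 dB, while at 80 dB each it only reaches about 83 dB, hence the “hopefully”.

```python
import math


def spl_sum(levels_db):
    """Sum uncorrelated sources (e.g. pink noise from separate speakers) on a power basis."""
    total_power = sum(10 ** (level / 10) for level in levels_db)
    return 10 * math.log10(total_power)


# Hypothetical room: left and right surrounds each calibrated to 82 dB SPL (C-weighted, slow).
surrounds = [82.0, 82.0]
print(f"Surround sum: {spl_sum(surrounds):.1f} dB SPL")

# A single screen channel reads 85 dB SPL; the sub should read 10 dB higher, in-band, on an RTA.
screen_channel_db = 85.0
sub_target_db = screen_channel_db + 10.0
print(f"Sub target (in-band): {sub_target_db:.1f} dB SPL")
```

Note that doubling the number of equal-level sources only adds about 3 dB, which is why the surround channels are set a few dB below the screen channels.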
The Leq(m) is a measurement of loudness over time. Perceived loudness depends on peak level, average level AND frequency content. By frequency-weighting the signal and averaging it over time, we get a single number that reflects its impact on the average human ear. Dolby’s printmastering box, the DMU, gives a read-out of the Leq(m). If you follow the norms set up by groups like TASA in the US, trailers cannot have an Leq(m) of more than 85 from first frame to last. I’ve mixed several trailers in the last few months and I can honestly say that I have never felt constrained artistically. In some ways, my mixes are better, because I pay closer attention to my dynamics and frequency content... if I need to make something loud, then I make it softer elsewhere, etc. By adopting and adhering to these rules, we ensure that cinemas will stay closer to the calibration level and that our 5-week mix will be heard the way we intended it.
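Leq(m) is an energy average, which is why a loud passage has to be paid for with softer ones. The sketch below shows only the averaging step, assuming the per-segment levels have already been M-weighted (a real meter applies that filter to the audio itself); the function name, the segment levels and the 120-second trailer are hypothetical.

```python
import math


def leq_db(segments):
    """Energy-average (equivalent continuous) level of (level_db, duration_s) segments.

    Assumes each level is already frequency-weighted; a real Leq(m) meter applies
    the M-weighting filter to the audio before averaging, which is not done here.
    """
    total_time = sum(duration for _, duration in segments)
    energy = sum(10 ** (level / 10) * duration for level, duration in segments)
    return 10 * math.log10(energy / total_time)


TRAILER_LIMIT = 85.0  # ceiling for the whole trailer, first frame to last

# Hypothetical 120 s trailer: a loud finale balanced by a quieter opening.
trailer = [(80.0, 60.0), (84.0, 40.0), (90.0, 20.0)]
level = leq_db(trailer)
print(f"Leq = {level:.1f}, within limit: {level <= TRAILER_LIMIT}")
```

This reports an Leq of about 84.8: the 90 dB finale still fits under the 85 ceiling because the first minute sits at 80, so the loudest moment on its own says little about the overall figure.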
The TV community doesn’t have the luxury of standardized listening environments. Broadcasters have to deal with everything from the 5.1 home cinema to the monaural speaker in the 12-inch kitchen TV. Listening can be affected by background noise, and wide overall dynamics can be a problem for people listening late at night.
What is metadata? Well, simply put, it is data about the data. Things as simple as the header information that differentiates a Word document from an audio file are forms of metadata. The metadata that will get us through the TV loudness problems is a little more involved. The main components are Downmixing, Dialnorm, and Dynamic Range Control (the “3 Ds” of metadata, according to Tim Carroll of Linear Acoustics). Different types of material have different dynamic needs. What metadata can do for us is give us the ability to deliver one mix that suits a multitude of needs. We can deliver the 5.1 surround mix, and the consumer’s TV, decoder or set-top box can derive any other signals it needs (Dolby Pro Logic, stereo, mono, etc.). Now, as a mixer, I will go on record saying that no stereo downmix done by a machine will have the same detail as a downmix done by an audio professional, but since the broadcasters won’t take several different mixes for each show, the next best thing is for me to set the metadata parameters dictating how that downmix will be done. Same goes for dynamics. If I could deliver one mix for broadcast, a second for VHS (God forbid) and another for DVD, all would be great, but it turns out the production companies only want one Lt/Rt or stereo version of the program. Long live the metadata that allows me to pick how much compression will be used for the broadcast RF mix, how much for the average listener and none for the true believer. Now we get to the main event...
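To make “setting the metadata parameters that dictate the downmix” concrete: a stereo (Lo/Ro) fold-down mostly comes down to two gains, one for the center channel and one for the surrounds. The Python sketch below assumes the level choices that Dolby Digital (AC-3) metadata offers for those two parameters (center: -3, -4.5 or -6 dB; surrounds: -3 dB, -6 dB or off). The function and constant names are hypothetical, and the LFE channel, the Lt/Rt matrixed variant and any decoder anti-clipping scaling are all left out.

```python
# Hypothetical names; the values mirror the options AC-3 metadata offers.
CENTER_MIX_LEVELS_DB = (-3.0, -4.5, -6.0)
SURROUND_MIX_LEVELS_DB = (-3.0, -6.0, float("-inf"))  # -inf = surrounds dropped from the downmix


def db_to_gain(db):
    """Convert a level in dB to a linear amplitude gain (-inf dB -> silence)."""
    return 0.0 if db == float("-inf") else 10 ** (db / 20)


def downmix_lo_ro(l, r, c, ls, rs, center_db=-3.0, surround_db=-3.0):
    """Fold one 5.1 sample frame (LFE ignored) into a Lo/Ro stereo pair."""
    if center_db not in CENTER_MIX_LEVELS_DB:
        raise ValueError(f"unsupported center mix level: {center_db}")
    if surround_db not in SURROUND_MIX_LEVELS_DB:
        raise ValueError(f"unsupported surround mix level: {surround_db}")
    cg = db_to_gain(center_db)
    sg = db_to_gain(surround_db)
    lo = l + cg * c + sg * ls  # center shared equally, left surround folded into left
    ro = r + cg * c + sg * rs  # center shared equally, right surround folded into right
    return lo, ro
```

For instance, downmix_lo_ro(0.0, 0.0, 1.0, 0.0, 0.0) sends center-only dialog to both sides at about 0.708 of its level (-3 dB). Choosing -6 dB or off for the surrounds is exactly the kind of decision the mixer, rather than the set-top box, gets to make through metadata.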
Dialnorm stands for dialog normalization. Basically, when flicking from channel to channel or between shows and commercials, we judge differences in level mainly by the dialog. Sure, the music and effects play into it, but you can’t expect the whispered love scene to be the same level as the car chase. Dialnorm values are derived from the average level of the dialog in each show. A dramatic TV show might have a dialnorm of -24 dB, the news might come in at -16, and a rock video at -10. If we bring them all down to a common reference point (Dolby uses -31), then when we change channels the set-top box at your home re-aligns all the shows according to their values, turning each one down by the difference between its dialnorm and -31. The result is a fairly constant listening level for everything. The bonus is that we retain the original dynamics of the show. If we also apply dynamic range control, we can tailor the sound to the listening environment and not have to over-compress dramatic mixes to fit into an analog broadcast chain.
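The re-alignment amounts to one subtraction per program: the decoder turns each show down by the gap between its dialnorm value and -31, so dialog from every channel comes out at the same level while everything else in the mix keeps its original distance from the dialog. A minimal sketch, assuming AC-3-style integer dialnorm values from -1 to -31; the function and constant names are hypothetical.

```python
DECODER_TARGET_DB = -31  # common dialog reference point


def dialnorm_attenuation_db(dialnorm_db):
    """Attenuation (dB) that brings a program's dialog down to the -31 reference."""
    if not -31 <= dialnorm_db <= -1:
        raise ValueError("dialnorm must be between -31 and -1 dB")
    return dialnorm_db - DECODER_TARGET_DB


# Dialnorm values from the text: drama -24, news -16, rock video -10.
programs = {"drama": -24, "news": -16, "rock video": -10}
for name, dialnorm in programs.items():
    atten = dialnorm_attenuation_db(dialnorm)
    print(f"{name}: dialog at {dialnorm} dB, turned down {atten} dB -> {dialnorm - atten} dB")
```

The drama is turned down 7 dB, the news 15 dB and the rock video 21 dB; all three end up with dialog at -31, and nothing inside any of the mixes has been compressed to get there.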
As of right now, the music world has no plans to implement metadata. Bob Katz told me that his dream would be to find a way of applying something similar to music delivered through digital radio streams. We could finally get away from having to squish everything through a Raves L7 Hypersquisherizer just so it sounds good on an MP3 player on a bus.
I challenge all of you to bring the life back to audio. We can’t keep getting louder and louder. Quality audio needs a wide dynamic range. Let’s get back to making at least one version of the mix that truly lives dynamically... metadata can help us with the rest.