A High-Efficiency Audio Codec
13 years ago
General
Hmmm, I feel a song coming on... how about a little Pendulum?
Enjoyed? Sounded good? (If it didn't sound good, read on for a possible explanation) Now how big do you suppose that file was? About 5 minutes of music, 44kHz sample rate, stereo... would you guess it was less than a megabyte?
Yes, 5 minutes of near-CD-quality stereo music in just 868kB. A variable bitrate stream that averages around just 24kbps. You could send that over a 33.6k modem with room to spare. That is the magic of the AAC-HEv2 codec. And of course I wouldn't be bringing it up if there weren't some real genius work behind the magic.
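For anyone who wants to check the arithmetic, here is a quick sketch. It assumes "868kB" means 868 × 1000 bytes and that the clip runs exactly 5 minutes; neither is stated precisely above, so treat the result as approximate.

```python
def average_bitrate_kbps(size_bytes: int, duration_s: float) -> float:
    """Average bitrate in kilobits per second (1 kbps = 1000 bits/s)."""
    return size_bytes * 8 / duration_s / 1000


# Assumptions: 1 kB = 1000 bytes, clip length = exactly 5 minutes.
clip_kbps = average_bitrate_kbps(868 * 1000, 5 * 60)
modem_kbps = 33.6  # nominal line rate of a 33.6k modem

print(f"average bitrate: {clip_kbps:.1f} kbps")
print(f"fits a 33.6k modem: {clip_kbps < modem_kbps}")
```

Counting a kB as 1024 bytes nudges the figure up slightly, but it still lands around the 24kbps mark.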
The original AAC codec was a good lossy audio codec and performed a bit better than MP3 at similar bit rates. Still, when the bit rates got low, it sounded just as crummy and muffled as any other codec. There's only so much you can cram into a small stream.
Enter the aural magicians.
They took the AAC-LC (low complexity) codec and added Spectral Band Replication, or SBR. SBR takes the low and middle frequencies of the audio and, based on harmonics and transposition, can recreate the high-frequency section fairly accurately. It gets some help from math sent as side information in the stream when necessary, but for the most part it can work on its own.
Now that crummy, dull-sounding 24kbps stream gets high frequencies added in for little or no extra bandwidth. Certainly not as much as it would take to actually encode and send those high frequencies: the heavy lifting and reconstruction is done by the decoder itself. This became AAC-HE (high efficiency).
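To make the idea a little more concrete, here is a toy sketch of "copy the low band up, then shape it with a coarse envelope". It is not the real SBR algorithm (which works on a QMF filterbank and has more tools than this); the function names, the two-band split, and the sample numbers are all made up for illustration.

```python
import math


def band_energy(coeffs):
    """Sum of squared spectral coefficients in one sub-band."""
    return sum(c * c for c in coeffs)


def encode_side_info(high_band, n_bands):
    """Encoder side: keep only the energy of each high-frequency sub-band."""
    size = len(high_band) // n_bands
    return [band_energy(high_band[i * size:(i + 1) * size]) for i in range(n_bands)]


def reconstruct_high_band(low_band, envelope):
    """Decoder side: reuse the low band as a patch, scaled to the sent energies."""
    size = len(low_band) // len(envelope)
    out = []
    for i, target in enumerate(envelope):
        patch = low_band[i * size:(i + 1) * size]
        energy = band_energy(patch)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        out.extend(gain * c for c in patch)
    return out


# Hypothetical spectrum: 4 low-band and 4 high-band coefficients.
low = [4.0, 3.0, 2.0, 1.0]
high = [0.4, 0.2, 0.1, 0.1]

envelope = encode_side_info(high, n_bands=2)   # 2 numbers sent instead of 4
rebuilt = reconstruct_high_band(low, envelope)
print(envelope, rebuilt)
```

The rebuilt high band is not the original, but each sub-band carries the right amount of energy, which is the sense in which the decoder does the heavy lifting from a small amount of side information.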
What to do for an encore? So far we're only talking about a mono audio stream here. Why don't we make it stereo? Consider it done. Enter Parametric Stereo, or PS. PS does for the spatial encoding what SBR does for the spectral encoding: it sends the decoder mathematical information so it can process the mono stream back into a fairly faithful stereo image. All this for just an added 3-4kbps. A 40kbps stream (two 20kbps mono streams for left and right) or a 24kbps stream with Parametric Stereo information? I think that's a no-brainer.
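The comparison works out like this, using the figures quoted above and taking the upper end (4kbps) of the PS overhead:

```python
MONO_KBPS = 20          # per-channel rate quoted in the text
PS_SIDE_INFO_KBPS = 4   # upper end of the 3-4kbps quoted in the text

dual_mono = 2 * MONO_KBPS                  # separate left and right streams
parametric = MONO_KBPS + PS_SIDE_INFO_KBPS  # one mono stream plus PS data
saving = 1 - parametric / dual_mono

print(f"two mono streams: {dual_mono} kbps")
print(f"mono + PS:        {parametric} kbps")
print(f"saving:           {saving:.0%}")
```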
Adding PS to AAC-HE gives us AAC-HEv2, what you're hearing in the audio clip at the start of this journal. And like stereo FM broadcasting, it's completely backward compatible with players that may not understand the add-ons. If you listened to the music and it didn't sound so great, you may have a player that doesn't understand AAC-HEv2.
If it only understands AAC, you'll get a 20kbps mono stream that sounds like 20kbps. If it understands AAC-HE, you get the good sounds, but only in mono. Nothing ever breaks outright; you just don't get the extra goodness associated with the later versions.
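The fallback behaviour can be summed up in a few lines. This is only a model of the description above, not how any real player is written; the `Decoder` levels and the `playback` function are names invented for the sketch.

```python
from enum import IntEnum


class Decoder(IntEnum):
    AAC_LC = 1   # core codec only
    HE = 2       # core + SBR
    HE_V2 = 3    # core + SBR + PS


def playback(decoder: Decoder) -> dict:
    """What a listener gets from an AAC-HEv2 stream at each decoder level."""
    return {
        "high_frequencies": decoder >= Decoder.HE,
        "channels": "stereo" if decoder >= Decoder.HE_V2 else "mono",
    }


for level in Decoder:
    print(level.name, playback(level))
```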
I'll leave you with a few more selections to sample. Enjoy!
Industria (913kB)
Kid For Today (1.13MB)
Shiny World (1.08MB)
By the way, if you ever wondered why the first second of a stream from DI or Sky.FM would sound muffled when the Freevo would start playing it, this is your answer. For some reason, MPlayer starts with the basic AAC stream and then the extra math kicks in to make it sound good.
I'm pretty confident most modern audio players should understand AAC-HEv2, but if yours doesn't, I know from experience that foobar2000 and VLC play it back correctly. Give them a whirl.

Thanks for the music links, I've missed being exposed to your tastes in music. I know you are more a fan of liquid DnB and those genres, while I am still firmly entrenched in Trance (Epic Trance has become my new favorite flavor).
Also, some players don't handle it well. My phone, for example, plays back HEv2 files but seems to insert chirpy artifacts (reminiscent of CD-skips) when decoding. For that reason, I went back to MP3s, but I don't keep a lot of songs on my phone anyway.
If you wanna play around and experiment, you can get the official reference encoder here, from Nero. It's not for the command-line-averse, but it's free. Normally if you feed it a bit-rate or quality, it'll pick which version of AAC it should use (LC, HE or HEv2) so you can play with several flavors.
And remember (as I never do) that the command line bit-rate is the full amount, not "k": enter 24000 and not 24, unless you want to hear the codec really thrash around trying to give you a 24 bits-per-second stream... ^_^
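If you wrap the encoder in a script, a small guard can catch that mistake. This is a hypothetical helper, not part of the Nero tool: the function names and the 1000 bits-per-second sanity floor are assumptions picked for the example.

```python
def to_bits_per_second(value: str) -> int:
    """Accept '24k' or '24000' and return a plain bits-per-second figure."""
    text = value.strip().lower()
    if text.endswith("k"):
        return int(float(text[:-1]) * 1000)
    return int(text)


def looks_like_kbps_mistake(bps: int, floor: int = 1000) -> bool:
    """Flag values so low that the user probably meant kilobits."""
    return bps < floor


for raw in ("24k", "24000", "24"):
    bps = to_bits_per_second(raw)
    note = "  <- did you mean kbps?" if looks_like_kbps_mistake(bps) else ""
    print(f"{raw!r} -> {bps} bits per second{note}")
```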
In terms of audio bang for your bandwidth buck, it's still quite good!