podcasters, beg-a-thons, and bandwidth

NPR is famous for beg-a-thons, wherein they lament the high cost of production and distribution and, oh yeah, solicit funds. It’s more pronounced now that they have started podcasting. The more popular podcasters (particularly those who are making a go at doing it for a living) have the same complaints.

I am slowly being converted to the opinion that the bandwidth complaint is a fundraising ploy. If they were really concerned about the cost of bandwidth they would attempt to minimize their use of it. But the bandwidth Jeremiahs are the same folks who apparently take no care to reduce their own footprint, putting out excessively, needlessly large audio files. YES it’s their podcast and their content and they can put it out however they wish, but if you want to beg for donations to cover your costs then I will expect you to attempt to minimize your costs.

Consider these ideas for reducing bandwidth. They take a bit of up-front scripting to make them work programmatically, then it all just motors along with zero added effort from the podcaster:

  1. Make a BitTorrent feed available. Your listeners will happily donate bandwidth to keep your content flowing.
  2. Publish two podcast feeds: a feed with all the bells/whistles and a low-bandwidth feed.
  3. Use a lower sampling rate. 44.1kHz is CD quality. Does your spoken-word podcast require better-than-CD sampling? For our purposes, the Nyquist rate says the sampling rate must be at least 2x the highest audio frequency you want to capture. Since the human ear hears roughly 20-20,000Hz, this explains the 44.1kHz sampling rate of CDs. If your podcast does not contain high frequencies at the extremes of human hearing then it does not need a high sampling rate.
  4. Use Variable Bit Rate (VBR) rather than Constant Bit Rate (CBR). VBR lets the encoder drop to a lower bitrate on frames that do not need more, so no frame uses excessive bitrate to encode the audio.
  5. If you must use CBR, use a lower bitrate.
  6. If your podcast is mono, then encode as mono rather than stereo. Joint stereo reduces some of the waste when distributing mono in stereo encodes, but it’s still a waste.
  7. Consider voice presets in your favorite encoder. See the LAME voice preset result below. This single change would probably make the biggest difference in podcasting bandwidth for most content producers.
  8. Consider other formats. If you insist on high bitrates, .ogg can shave a bit off the filesize (see below). If you are really serious, use a speech-specific format like Speex.
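To put numbers on the sampling-rate point in item 3, here is a minimal Python sketch of the Nyquist arithmetic. The function names are made up for illustration, and it ignores the filter headroom that real converters need, so treat the results as upper bounds rather than exact figures.

```python
def min_sampling_rate_hz(highest_audio_freq_hz: float) -> float:
    """Nyquist criterion: sample at least twice the highest frequency you want to keep."""
    return 2.0 * highest_audio_freq_hz


def nyquist_limit_hz(sampling_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent (half the rate)."""
    return sampling_rate_hz / 2.0


if __name__ == "__main__":
    # The sampling rates that come up in this post.
    for rate in (48_000, 44_100, 32_000, 16_000, 8_000):
        print(f"{rate:>6} Hz sampling captures audio up to {nyquist_limit_hz(rate):.0f} Hz")
```

The 20,000Hz upper limit of hearing needs at least 40,000Hz sampling, which is why CD audio at 44.1kHz is already enough for anything a listener can hear.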

A specific example

Here’s the last file I downloaded from the feed of FreedomainRadio, your friendly neighborhood anarcho-capitalist (recommended, btw):

$ file FDR_2096_Sunday_Show_19_Feb_2012.mp3
FDR_2096_Sunday_Show_19_Feb_2012.mp3: Audio file with ID3 version 2.3.0, contains: MPEG ADTS, layer III, v1, 128 kbps, 48 kHz, Monaural
For the purposes of re-encoding I decoded to .wav using mpg123. The result was a 48kHz .wav file. In some cases below I resampled to get .wav files at 44.1kHz or lower.
101097  FDR_2096_Sunday_Show_19_Feb_2012.mp3
The original file is 101.1MB, sampled at 48kHz and encoded at CBR 128kbps. This is the baseline: 0% savings in bandwidth cost. At least it’s mono, which is not something we can take for granted with voice podcasts.
Ok, so let’s do some encoding with more bandwidth-friendly options in LAME.
100576 Feb 21 01:14 fdrtest-lamevbr-noresample.mp3
 97379 Feb 21 01:28 fdrtest.lamevbr-resample.mp3
 53812 Feb 21 00:48 fdrtest-lame-voice-preset.mp3
Encoding VBR at the original wonky sampling rate is 100.58MB, representing 0.5% savings in bandwidth.  Not much, but would the podcaster rather keep half of 1% of bandwidth costs in his pocket?  You bet.
VBR at 44.1kHz is 97.38MB, 3.7% savings.  Now we are getting somewhere.  And the ‘cast should never have been distributed as 48kHz, anyhow.
LAME’s --preset voice flag is probably what podcasters should be using by default. Notice it is 53.81MB, for a 46.8% savings in bandwidth. With no added effort. With little or no degradation in voice-only audio. My friends, this is what I call Good Enough. Yes, music gets thin and weird with this preset but we are talking about voice here.
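The three LAME runs above could be scripted along these lines. This is a sketch, not the exact invocations used for the test files: the input name fdrtest.wav and the VBR quality level (-V 4) are assumptions, since the post does not record them. The flags themselves (-V, --resample, --preset) are standard LAME options, but newer LAME releases may handle the voice preset differently, so check `lame --preset help` on your version.

```python
import shlex


def lame_command(wav_in, mp3_out, *, vbr_quality=None, resample_khz=None, voice_preset=False):
    """Build (but do not run) a lame command line as an argv list."""
    cmd = ["lame"]
    if voice_preset:
        cmd += ["--preset", "voice"]          # LAME's speech-oriented preset
    elif vbr_quality is not None:
        cmd += ["-V", str(vbr_quality)]       # VBR quality, 0 (best) to 9 (smallest)
    if resample_khz is not None:
        cmd += ["--resample", str(resample_khz)]  # output sampling rate in kHz
    cmd += [wav_in, mp3_out]
    return cmd


if __name__ == "__main__":
    wav = "fdrtest.wav"  # hypothetical name for the decoded 48kHz .wav
    runs = [
        lame_command(wav, "fdrtest-lamevbr-noresample.mp3", vbr_quality=4),
        lame_command(wav, "fdrtest.lamevbr-resample.mp3", vbr_quality=4, resample_khz=44.1),
        lame_command(wav, "fdrtest-lame-voice-preset.mp3", voice_preset=True),
    ]
    for cmd in runs:
        # Print the commands; pass cmd to subprocess.run(cmd, check=True) to execute them.
        print(shlex.join(cmd))
```

Building the argv list separately from running it makes it easy to eyeball the commands before letting a cron job loose on every new episode.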
68536 Feb 21 01:08 fdrtest.ogg

Changing nothing in the original 48kHz .wav, encoding with Ogg Vorbis gives us a 68.54MB file, for a 32.2% savings. Not bad, though one might lose some windoze/mac listeners. But as a second feed… Note that the .ogg advantage will decrease on lower-quality sound files. Speex is what you use for those.

50746 fdrtest.48k-original-sample.speex
 14498  fdrtest.08k.speex
 26394 fdrtest.16k.speex
 33830 fdrtest.32k.speex
Speex is a codec made expressly for voice.  It is not good for music.  That VOIP app you use probably uses speex.  Speex is a minority codec;  it does not have large mindshare or even widespread technical adoption.  The default Android media app does not play speex as of this writing, and that’s a shame.  Speex also really only works well with certain predefined sampling rates:  32, 16, and 8kHz.
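Those three sampling rates line up with the three modes of the speexenc command-line encoder: narrowband (8kHz), wideband (16kHz), and ultra-wideband (32kHz). A small sketch of picking the mode from the rate follows. It assumes the .wav has already been resampled to the target rate, the helper name and input file names are hypothetical, and it only builds the command rather than running it.

```python
# speexenc mode flags keyed by the sampling rate of the input .wav
SPEEXENC_MODE_FLAGS = {
    8_000: "--narrowband",
    16_000: "--wideband",
    32_000: "--ultra-wideband",
}


def speexenc_command(wav_in, speex_out, sample_rate_hz):
    """Build a speexenc argv list, refusing rates Speex has no mode for."""
    try:
        mode_flag = SPEEXENC_MODE_FLAGS[sample_rate_hz]
    except KeyError:
        supported = ", ".join(str(r) for r in sorted(SPEEXENC_MODE_FLAGS))
        raise ValueError(
            f"{sample_rate_hz} Hz is not a native Speex rate; resample to one of: {supported}"
        ) from None
    return ["speexenc", mode_flag, wav_in, speex_out]


if __name__ == "__main__":
    # Hypothetical input names; the outputs match the listing above.
    print(speexenc_command("fdrtest.32k.wav", "fdrtest.32k.speex", 32_000))
    print(speexenc_command("fdrtest.16k.wav", "fdrtest.16k.speex", 16_000))
```

Failing loudly on an odd rate like 48kHz is a deliberate choice here: better to be told to resample than to ship whatever the encoder makes of it.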
The first run is speex encoding the 48kHz wav.  It hated that rate and said so when invoked.  But it output a file of 50.75MB, for a savings of 49.8%.
The best use of speex for podcasts is probably encoding 32kHz files. Our trial resulted in 33.83MB, a savings of 66.5%. Speex at 16kHz is also an option (26.39MB, a savings of 73.9%).
For completeness I also tested speex at 8kHz (85.7% savings) but it’s not really practical. 8kHz is listenable for short periods (like voicemail) but fatigues the ears after long exposure. It sounds like a telephone.
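For anyone who wants to check the arithmetic, every savings figure in this post comes from the same formula: (original size minus new size) divided by original size. Here is a short Python sketch that recomputes them from the file sizes in the listings above. The sizes are copied from the post; the MB column simply divides by 1000, matching the rounding used in the text.

```python
ORIGINAL_SIZE = 101097  # FDR_2096_Sunday_Show_19_Feb_2012.mp3, as listed above

ENCODE_SIZES = {
    "fdrtest-lamevbr-noresample.mp3": 100576,
    "fdrtest.lamevbr-resample.mp3": 97379,
    "fdrtest.ogg": 68536,
    "fdrtest-lame-voice-preset.mp3": 53812,
    "fdrtest.48k-original-sample.speex": 50746,
    "fdrtest.32k.speex": 33830,
    "fdrtest.16k.speex": 26394,
    "fdrtest.08k.speex": 14498,
}


def savings_percent(original_size, encoded_size):
    """Bandwidth saved relative to the original file, as a percentage."""
    return 100.0 * (original_size - encoded_size) / original_size


if __name__ == "__main__":
    for name, size in ENCODE_SIZES.items():
        pct = savings_percent(ORIGINAL_SIZE, size)
        print(f"{name:<36} {size / 1000:7.2f}MB  {pct:5.1f}% saved")
```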
