Tuesday, June 18, 2024

Smoother sailing: Studying audio imperfections in Steamboat Willie

[Image: Mickey Mouse whistling on the bridge of a steamboat.]

Steamboat Willie (1928) was one of the earliest cartoons with synchronized sound. That is, it had post-production sound effects; this was something new and exciting. Now that the cartoon has recently entered the public domain[bbc24] we can safely delve into its famous soundtrack. See, there’s something interesting about how it sounds…

If you listen closely to the soundtrack on YouTube, it sounds somehow distorted. You might be tempted to point out that it’s 96 years old, yes. But you might also recognize that it is suffering from flutter, i.e. an unstable playback or recording speed.

In the spirit of this blog let’s geek out for a bit and study this flutter distortion further. Can we learn something interesting? Could we perhaps learn enough to be able to reduce it?

Of course the flutter might be 100% authentic to how it sounded in theatres in the 1920s; we don’t know when and why it appeared in the audio (more on that later!). It might have sounded even worse. But we can still hope to enjoy the sound effects in their original recorded form.

Prior work

I’m not the first one to notice this clip is ‘fluttering’ and to try and do something about it. I found videos of people’s attempts to un-flutter it using Celemony Capstan, a professional tool made just for this purpose, with varying results. Capstan uses Melodyne’s famous note detection engine to detect musical features and then controls a varispeed effect to cancel out any flutter.

But Capstan is expensive, and it’s more fun to come up with a home-made solution anyway. And what about non-musical sounds? Besides, I had some code lying around in a forgotten desk drawer that just might fit the purpose.

Finding a high quality source

Why would I need a high-quality digital file of a poor-quality soundtrack from the 1920s? I guess it’s the archivist in me hoping that it has been preserved with a high level of detail. But also, if you’re going to try and dig up some hidden details in the sound, you’d want minimal interference from any lossy psychoacoustic compression, right? These artifacts might become audible after varispeed effects and could also hinder frequency detection.

[Image: Two spectrograms labeled 'random Youtube video' and '4K version', the former showing compression artifacts.]

The high-quality source I found is in the Internet Archive. It might originally come from the 4K Blu-ray release called Celebrating Mickey. The spectrogram shows almost no compression artifacts that I can see, even in the quietest frequency ranges! Perfect!

[Image: A single film frame.]

But the Internet Archive delivers something even better. There’s a lossless 4K scan of the movie with the optical soundtrack partially included (above)! The lossless version is 34 GB, but there’s a downscaled 480p MP4 one thousandth of the size.

I listened to the optical soundtrack from this low-resolution version with a little pixel-reader script. Turns out the flutter is already present on the film! (Edit: Note that we don’t know where this particular film print came from. When was it created? Is there an original somewhere, without flutter?)
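A pixel reader of this kind can be quite small. The following is only a minimal sketch, not the script used here: it assumes the frames are already decoded into grayscale arrays, that each scan line of the soundtrack strip can be treated as one audio sample, and that the strip sits between two pixel columns. The names `soundtrack_from_frames`, `col_start` and `col_stop` are made up for illustration.

```python
import numpy as np

def soundtrack_from_frames(frames, col_start, col_stop):
    """Turn the soundtrack strip of scanned film frames into audio samples.

    frames: iterable of 2-D grayscale arrays (rows x columns), in playback order.
    col_start, col_stop: hypothetical pixel columns bounding the soundtrack strip.
    """
    rows = []
    for frame in frames:
        strip = np.asarray(frame, dtype=float)[:, col_start:col_stop]
        # One audio sample per scan line: the average brightness across the strip.
        rows.append(strip.mean(axis=1))
    audio = np.concatenate(rows)
    audio -= audio.mean()                 # remove the constant brightness offset
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio
```

If every scanned frame covered exactly one frame's worth of soundtrack, a 480-row video at 24 frames per second would yield 480 × 24 = 11,520 samples per second. With a soundtrack that is only partially included, some audio will be missing between the frames.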

Hand-guiding a frequency tracker

Looking at the above spectrogram, we can see that the frequency of everything is zig-zagging as a function of time – that’s flutter all right. But how to quantify these variations? We could zoom in on one of the frequency peaks and follow the course of its frequency in time. I’m using FFT peak interpolation to find more accurate frequency estimates[gasior04].
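For the curious, here is one common form of FFT peak interpolation: fit a parabola through the log magnitudes of the strongest bin and its two neighbours, and read the frequency off the vertex. This is a sketch under assumptions (Hann window, three-point fit, a hypothetical function name and search band), not necessarily the exact variant used for the plots in this post.

```python
import numpy as np

def interpolated_peak_hz(samples, sample_rate, f_lo, f_hi):
    """Estimate the strongest frequency between f_lo and f_hi (in Hz)."""
    n = len(samples)
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)
    # Only look inside the requested band (assumed to contain at least one bin).
    band = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
    k = int(band[np.argmax(spectrum[band])])
    k = min(max(k, 1), len(spectrum) - 2)      # keep both neighbours in range
    # Parabola through the log magnitudes of bins k-1, k, k+1.
    a, b, c = np.log(spectrum[k - 1:k + 2] + 1e-12)
    denom = a - 2.0 * b + c
    offset = 0.0 if denom == 0 else 0.5 * (a - c) / denom
    return (k + offset) * sample_rate / n
```

With a 1024-point FFT at 8 kHz the bins are about 7.8 Hz apart; the interpolation lets the estimate land between bins instead of snapping to the nearest one.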

Take the sound of Pete’s tobacco hitting the ship’s bell around the 01’45” mark. You’d think a bell is supposed to have a constant frequency, yet this one sounds quite unstable. We can follow any one of the harmonics and see how the playback speed (bell frequency) varies over the period of one second:

[Image: Spectrogram with fluctuating tones.]

To my eye, this oscillation looks periodic and not random at all. We can run another round of FFT on a longer stretch of samples to find the strongest frequency in these fluctuations: it turns out to be 15 Hz. (Why 15? I so hoped it would be 24 Hz – it would have made a more interesting story! More on that later…)

[Image: Spectrum plot showing a peak at 15.0 Hz about 15 dB higher than background.]
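This "FFT of the speed curve" step can be sketched as follows. It assumes the tracked speed is available as an evenly sampled array; `dominant_flutter_hz` and `curve_rate` are hypothetical names.

```python
import numpy as np

def dominant_flutter_hz(speed, curve_rate):
    """Return the strongest periodic component (Hz) in a speed curve.

    speed: tracked relative speed, one value per analysis step.
    curve_rate: number of speed values per second (assumed constant).
    """
    speed = np.asarray(speed, dtype=float)
    deviation = speed - speed.mean()                  # drop the constant part
    windowed = deviation * np.hanning(len(deviation))
    spectrum = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(deviation), d=1.0 / curve_rate)
    k = np.argmax(spectrum[1:]) + 1                   # skip the DC bin
    return freqs[k]
```

The frequency resolution is one over the length of the curve in seconds, which is why a longer stretch of samples helps here.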

Okay, so can we repeat this process for the whole movie? I don’t think we can just automatically follow the frequency of every peak, since some sounds will naturally contain vibration and rises and drops in frequency. Not all of it is due to flutter. Some sort of a vetting process is needed. We could try a tedious manual route…

[Image: GUI of a software with spectrograms and oscillogram plots.]

I made a little software tool (above) where I could click and drag little boxes onto a spectrogram to mark where to search for peaks. The peak frequency found inside each box traces a wobbly line, which is then simply taken to be the speed variation (red graph in the top picture).

It became quite a chore to annotate longer sounds as this software didn’t come with undo, edit, or save features for the longest time!

Now let’s think about what to do with this speed information…

Desk drawer deep dive

Some time ago I had made a tool that could well come in handy now. It was for correcting wobbly wideband radio recordings stored on VHS tapes. These recordings contained some empty carriers that happened to work like seismographs, accurately recording the tape speed variations. The tool then used a Lagrange polynomial to interpolate new samples at a steady interval, so-called ‘digital varispeed’.

It was ultimately based on an interesting paper on de-fluttering magnetic tapes using the tape bias signal as reference[howarth04].

[Image: Buttons of an old device, one of them Varispeed, labeled 1981. Below, part of a GUI with the text Varispeed, labeled 2023.]

Here’s what this digital varispeed sounds like when exaggerated. In the example below I’m doing it in a simpler way. Instead of the Lagrange method I first upsampled some music by 10x in an audio editor, hand-drew a speed curve in Audacity, and then used that curve to pick samples out of the oversampled music:

[Image: A waveform in Audacity.]

Carefully controlled, this effect can be used to cancel out flutter. Here’s how: If we knew exactly how the playback speed was fluctuating we could instantly vary the speed of our resampler in the opposite direction, thus canceling the variations. And with the above research we now have that knowledge!
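The cancelling idea can be sketched in a few lines. This is an illustration under assumptions, not the tool described above: it assumes a per-sample speed curve `speed` (1.0 meaning nominal speed, always positive), uses a 4-point Lagrange interpolator, and the function names are invented.

```python
import numpy as np

def lagrange4(y, pos):
    """Read signal y at fractional positions pos (4-point Lagrange interpolation)."""
    pos = np.clip(pos, 1.0, len(y) - 3.0)     # keep all four taps inside the signal
    i = np.floor(pos).astype(int)
    f = pos - i
    return (y[i - 1] * (-f * (f - 1) * (f - 2) / 6)
            + y[i] * ((f + 1) * (f - 1) * (f - 2) / 2)
            + y[i + 1] * (-(f + 1) * f * (f - 2) / 2)
            + y[i + 2] * ((f + 1) * f * (f - 1) / 6))

def deflutter(y, speed):
    """Undo speed variations; speed[n] is the relative speed at sample n of y."""
    y = np.asarray(y, dtype=float)
    # Position in the steady-speed original reached at each fluttered sample.
    tau = np.concatenate(([0.0], np.cumsum(speed)[:-1]))
    # For each steady-time sample m, find where it sits in the fluttered recording.
    m = np.arange(int(tau[-1]) + 1)
    read_pos = np.interp(m, tau, np.arange(len(y)))
    return lagrange4(y, read_pos)
```

The cumulative sum turns speed into position, and inverting that mapping is what "varying the resampler in the opposite direction" amounts to. The first and last couple of output samples are only approximate because the interpolator runs out of neighbours there.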

Well, almost. I couldn’t always see a clear frequency peak to follow, so the graph is patchy. But maybe it could help to band-pass the speed signal at 15 Hz? This would help fill out small gaps and also preserve vibrato and other fluctuations that aren’t part of the flutter distortion. We can at least try!

[Image: Two waveforms, one of them piecewise and noisy, the other one smooth and continuous.]

In the example above, I replaced empty parts with a constant value of 100% and then filtered the whole thing. This sews the disjointed parts together in a smooth way.
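A rough sketch of this gap-filling and filtering step is below. The filter design is an assumption: it applies a Gaussian-shaped passband in the frequency domain, and the bandwidth `width_hz` is a made-up value, as are the function and parameter names. Gaps are marked with NaN.

```python
import numpy as np

def smooth_speed_curve(speed, curve_rate, center_hz=15.0, width_hz=2.0):
    """Fill gaps with 100% speed, then keep only components near center_hz.

    speed: relative speed per analysis step, NaN where no peak was tracked.
    curve_rate: speed values per second.
    width_hz: hypothetical passband width (Gaussian standard deviation).
    """
    speed = np.asarray(speed, dtype=float)
    filled = np.where(np.isnan(speed), 1.0, speed)    # gaps become 100%
    deviation = filled - 1.0                          # filter the deviation from 100%
    spectrum = np.fft.rfft(deviation)
    freqs = np.fft.rfftfreq(len(deviation), d=1.0 / curve_rate)
    passband = np.exp(-0.5 * ((freqs - center_hz) / width_hz) ** 2)
    # A real-valued weighting has zero phase, so the wobble is not shifted in time.
    return 1.0 + np.fft.irfft(spectrum * passband, n=len(deviation))
```

A narrower passband bridges longer gaps but reacts more slowly when the flutter changes in strength, so the width is a trade-off to tune by ear.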

Can we hear some examples already?

This clip is from when the goat ate Minnie’s sheet music and guitar – the apparent catalyst event that sent Mickey Mouse to seek revenge on the entire animal kingdom.

Before [Image: Movie screenshot]
After

You can definitely hear the difference in the bell-like sounds coming from the goat’s insides. It even sounds like the little flute notes in the beginning are easier to tell apart in the corrected version.

Here’s another musical example, with strings.

Before [Image: Movie screenshot]
After

The cow’s moo. That’s a hard one because it’s so rich in harmonics; in the spectrogram it looks almost like a spaghetti bolognese. My algorithm is constrained to a box and can’t stay with one harmonic when the ‘moo’ slides in frequency. You can hear some artifacts because of this, but still the result sounds less sheep-like than the original.

Before [Image: Movie screenshot]
After

But Mickey whistling “Steamboat Bill” in the beginning of the film actually doesn’t sound better when corrected… I preferred a bit of vibrato!

Before [Image: Movie screenshot]
After

Sidetrack 1: Anything else we can find?

Glad you’re still reading! Let’s step away from flutter for a while and take the raw audio track itself under the Fourier microscope. Zooming closer, is there anything interesting in the lower end?

[Image: Spectrogram showing a frequency range from 0 to 180 Hz.]

We can faintly see peaks at multiples of both 24 and 60 Hz. No surprises there, really… 24 Hz being the film framerate and 60 Hz the North American mains frequency. Was there a projector running in the recording studio? Or maybe it’s an artifact of scanning the soundtrack one frame at a time? In any case, these sounds are pretty weak.

[Image: Spectrogram showing tones with apparent sidebands.]

In some places you can see some sort of modulation that seems to be generating sidebands, just like in radio signals. It’s especially visible in Mickey’s whistle when it’s flutter-corrected, here at the 5-second mark. The sideband peaks are 107 and 196 Hz away from the ‘carrier’, if you will. I’m not sure what this could be. Fluctuating amplitude?

Sidetrack 2: Playing sound-on-film frame by frame?

This is an experiment I did some time ago. It’s just a silly thought – what would happen if the soundtrack was being read in the same way as the picture is – stopped 24 times per second? Would this be the ultimate flutter distortion?

In the olden days, sound was stored on the film next to the picture frames as analog information. Unlike the picture frames that had to be stopped momentarily for projection, the sound had to be played at a constant speed. There was a complicated mechanism in the projector to make this possible.

I found some speed curves for old-school movie projectors in [bickford72]. They describe the film’s deceleration and acceleration during these stops. Let’s emulate these speed curves in audio with the oversampling varispeed method.

The video below is a 3D animation where this same speed curve controls an animation of a moving film in an imaginary machine. The clip is from another 1920s animation, Alice in the Wooly West (1926).

~~ Now we know ~~

Conclusions

  • We found a 15 Hz speed fluctuation that was, to some extent, reversible.
  • This flutter signal is already present in the optical soundtrack of a film scan.
  • With enough manual work, much of the soundtrack could probably be ‘corrected’.
  • ‘Hmm, that sounds odd’ are sometimes the words of a white rabbit.

References
