
A voice basis

Not too long ago, Leslie and I were wondering about the pronunciations of names. We had found a site with audio samples of the pronunciations, and we started playing several of them over and over again. It sounded pretty funny, and I thought it would be neat to hear a real audio track represented by a bunch of people just saying names. The idea is to keep adding more people saying more names until you get something that “sounds” like the input.

The set of audio samples of people saying names becomes a basis for your audio signal. I hacked together a quick program and some scripts to use a set of voices to represent an audio track. The algorithm simply tries to fit the audio samples to the input signal and keeps layering audio to fit the residual. It’s a greedy approach, where the best fit from the database is chosen first. Each layer positions as many database samples as possible over the input signal (or residual signal) in a non-overlapping way (see the code for more details). There are probably much faster ways to do this, perhaps using spectral techniques, but I wanted something quick and dirty (i.e., a few hours of work).
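For readers who want a feel for the layering idea without opening the zip, here is a rough Python sketch. It is not the code from the zip file: the function names (`fit_layer`, `approximate`), the use of NumPy, the per-clip gain, and the brute-force scoring of every clip at every offset are all assumptions made for illustration, and it is only practical on short toy signals.

```python
import numpy as np


def fit_layer(residual, database):
    """Build one layer: greedily place non-overlapping clips on the residual.

    Hypothetical helper. Each clip is scaled by the least-squares gain for
    its position (an assumption; the original program may not scale clips).
    """
    n = len(residual)
    candidates = []  # (energy reduction, start, clip index, gain)
    for k, clip in enumerate(database):
        energy = float(np.dot(clip, clip))
        if len(clip) > n or energy == 0.0:
            continue
        # corr[s] = dot(residual[s:s + len(clip)], clip)
        corr = np.correlate(residual, clip, mode="valid")
        gains = corr / energy
        reductions = corr * gains  # drop in squared error if placed at s
        for s in range(len(corr)):
            candidates.append((reductions[s], s, k, gains[s]))

    # Greedy: best fit first, skipping anything that overlaps a placed clip.
    candidates.sort(key=lambda c: c[0], reverse=True)
    layer = np.zeros(n)
    occupied = np.zeros(n, dtype=bool)
    for reduction, s, k, gain in candidates:
        e = s + len(database[k])
        if reduction <= 0 or occupied[s:e].any():
            continue
        layer[s:e] = gain * database[k]
        occupied[s:e] = True
    return layer


def approximate(target, database, n_layers):
    """Sum n_layers layers, each one fitted to what the previous ones missed."""
    approx = np.zeros(len(target))
    for _ in range(n_layers):
        approx += fit_layer(target - approx, database)
    return approx


# Toy data standing in for the voice database and the input track.
rng = np.random.default_rng(0)
database = [rng.standard_normal(8), rng.standard_normal(5)]
target = np.zeros(32)
target[3:11] += 2.0 * database[0]
target[20:25] -= database[1]

approx = approximate(target, database, n_layers=3)
error_before = float(np.sum(target ** 2))
error_after = float(np.sum((target - approx) ** 2))
```

Because the clips within a layer never overlap, each placement lowers the squared error independently of the others, so `error_after` can only be smaller than or equal to `error_before`. A faster version would likely compute the correlations with FFTs, which is the kind of spectral shortcut mentioned above.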

The result doesn’t sound nearly as compelling as I had imagined. To emphasize that the target audio track is produced only from people speaking names, I’ve broken it into 5-second segments. The i-th segment is represented with 5*pow(2, i) speakers, starting with 5 in the first segment, so that the signal gets better as you listen longer.

The input audio track is a 60s sample from “Am I a Good Man”.
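The speaker schedule can be worked out with a few lines of Python. This is only an arithmetic sketch: the constant names are made up for illustration, and it assumes the segment index i starts at 0.

```python
TRACK_SECONDS = 60    # length of the input sample
SEGMENT_SECONDS = 5   # length of each segment
BASE_SPEAKERS = 5     # multiplier in 5*pow(2, i)

n_segments = TRACK_SECONDS // SEGMENT_SECONDS
speakers = [BASE_SPEAKERS * 2 ** i for i in range(n_segments)]

for i, count in enumerate(speakers):
    start = i * SEGMENT_SECONDS
    print(f"segment {i:2d} ({start:2d}-{start + SEGMENT_SECONDS:2d} s): {count} speakers")
```

With 12 segments, the count doubles from 5 in the first segment up to 5*pow(2, 11) in the twelfth.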

In the last segment, 10240 speakers are used to represent the signal.  Results:

five_kmw.mp3: The best example, as it has the most names and the most speakers (names that start with K, M, V)

five.mp3: Uses fewer names (only those that start with K)

three.mp3: Uses only 3*pow(2, i) voices at each segment, so the approximation is not as good.


Code (with scripts, in a zip file)

