Sound Synthesis

The course Audio Analysis and Synthesis, taught by Will Pirkle at the University of Miami, centers on developing real synthesizers written in C++. I had attended some workshops about synthesis before, but they focused mainly on theory rather than real-time implementation. In this course, by contrast, we had to deal with theory and implementation issues at the same time. The programs were written in C++ with Socket (currently RackAfx), in such a way that each stage built on the previous one. I called my final project the Vowel Synthesizer, and it was presented in public at the Music School:

[hdplay playlistid=1 flashvars="autoplay=false"]

We started by building a synthesizer based on the Novation BassStation, an analog synth widely used during the '90s, which comprises two oscillators, an LP filter, ADSR modules, and so on. The oscillators produce sawtooth and PWM signals. Although our synth was based on the BassStation, it actually possesses many other features, which the following video shows.




The Vowel Synthesizer shown in the first video is based on the Source/Filter model, which is widely used for speech analysis and synthesis. Generally speaking, it states that speech production, and also some acoustic musical instruments, can be modeled as a basic signal generator (the source) that feeds a shaping filter, as shown in Fig. 1.


Fig. 1. Diagram of the Source/Filter Model of Speech Production


The human speech production system is a complex musical instrument that can produce musical notes while changing its shape, an ability that most musical instruments lack. This is what allows us to communicate by sending messages in spoken language. Specifically, each vocal tract shape behaves as a different acoustic filter that produces a particular voiced phoneme (a vowel or a nasal). In the simplest terms, each filter is the acoustic representation of one voiced phoneme.
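To make the Source/Filter idea concrete, here is a minimal sketch (my own simplification, not the course code): an impulse train plays the role of the glottal source, and a two-pole resonator tuned to a single formant frequency plays the role of the shaping filter. The 700 Hz formant and 100 Hz bandwidth are illustrative values only, chosen to be roughly in the range of the first formant of /a/.

```cpp
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

// Source: an impulse train with the given period in samples
// (the simplest stand-in for a glottal pulse train).
std::vector<double> impulseTrain(int n, int period) {
    std::vector<double> src(n, 0.0);
    for (int i = 0; i < n; i += period) src[i] = 1.0;
    return src;
}

// Shaping filter: a two-pole resonator,
//   y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2],
// with pole radius r set from the bandwidth and pole angle
// theta set from the resonance (formant) frequency.
std::vector<double> resonate(const std::vector<double>& x,
                             double freqHz, double bwHz, double fs) {
    double r  = std::exp(-kPi * bwHz / fs);   // pole radius from bandwidth
    double th = 2.0 * kPi * freqHz / fs;      // pole angle from frequency
    double a1 = 2.0 * r * std::cos(th);
    double a2 = -r * r;
    std::vector<double> y(x.size(), 0.0);
    double y1 = 0.0, y2 = 0.0;
    for (std::size_t n = 0; n < x.size(); ++n) {
        y[n] = x[n] + a1 * y1 + a2 * y2;
        y2 = y1;
        y1 = y[n];
    }
    return y;
}
```

A 100 Hz source at a 44.1 kHz sample rate would be `resonate(impulseTrain(44100, 441), 700.0, 100.0, 44100.0)`; a full vowel would cascade or sum several such resonators, one per formant.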


The Vowel Synthesizer works by mixing and switching different source signals and shaping filters. The source signals are a human voice, a violin, several people speaking, a flute, and synthesized sounds. The shaping filters correspond to the vowels /a/, /e/, /i/, /o/, /u/ (of the Spanish language), plus piano, flute, violin, and synthesized filters. For the human voice and the acoustic instruments, the source signals and shaping filters must be estimated from real recorded sounds.


There are several methods to estimate the vocal tract filter (shaping filter) and the glottal pulse (source signal) from real speech signals, such as Linear Prediction, Cepstral Analysis, Adaptive Filtering, etc. The following video is a comprehensive demo of the Vowel Synth: the first half describes its operation, and a demo is played in the second half. The code of the synths can be downloaded here.

[hdplay playlistid=2 flashvars="autoplay=false"]
