WAVEFORM GENERATION BASED ON SIGNAL RESHAPING FOR STATISTICAL PARAMETRIC SPEECH SYNTHESIS [pdf]

(Felipe Espic, Cassia Valentini-Botinhao, Zhizheng Wu, Simon King / CSTR, University of Edinburgh, UK)

Samples to support the Interspeech 2016 paper titled above. This page contains the following samples:

  • Nat: Natural speech.
  • STR: STRAIGHT.
  • SR_all: Signal Reshaping with “ideal” settings: gender-matched voiced base signal, linear-phase filtering, and Mel-scale spectral smoothing.
  • SR_gen: as SR_all, but the voiced base signal is from the opposite gender to the target.
  • SR_dp: as SR_all, but the filtering is not linear-phase.
  • SR_ns: as SR_all, but without Mel-warped spectral smoothing of the base signal's spectral envelopes.
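
To make the "linear-phase filtering" setting more concrete, the following is a minimal sketch of zero-phase filtering, the zero-delay special case of linear-phase filtering: the magnitude spectrum of a frame is scaled by a real, non-negative gain while its phase is left untouched. This is not the authors' implementation. The function name `zero_phase_reshape`, the frame length, and the gain curve are all hypothetical and chosen only for illustration.

```python
import numpy as np

def zero_phase_reshape(frame, gain):
    """Scale the magnitude spectrum of `frame` by `gain` without changing its phase.

    `gain` holds one real, non-negative value per rFFT bin. Because the
    frequency response is real and non-negative, the filter has zero phase.
    """
    spectrum = np.fft.rfft(frame)
    return np.fft.irfft(spectrum * gain, n=len(frame))

# Hypothetical stand-in for one frame of a base signal (white noise).
rng = np.random.default_rng(0)
frame = rng.standard_normal(512)

# Hypothetical gain curve, e.g. a ratio between target and base spectral envelopes.
gain = np.linspace(2.0, 0.5, 512 // 2 + 1)

reshaped = zero_phase_reshape(frame, gain)
```

In this sketch only the spectral magnitudes change, so the phase structure of the input frame carries over to the output. A filter with a non-trivial, non-linear phase response would alter that structure as well.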

The base signals used to synthesise the provided audio samples are at the bottom of the page.

Samples   Nat   STR   SR_all   SR_gen   SR_dp   SR_ns
female 1
female 2
female 3
male 1
male 2
male 3

Base Signals

  • Voiced: Male, Female
  • Unvoiced: /f/, /s/, /ʃ/