Colloquium and Research Group: Speech Signal Processing and Prosody in Relation to Information Theory (Pragmatics, Semantics) Using Deep Learning

We run a research and reading group on prosodic and suprasegmental speech processing. If you are interested in participating, we meet on Mondays at 2:30 in the house at 111 N Bryan Ave.

Our goal is to understand how to extract detailed properties from the speech signal using common libraries (including those with a Python interface), in particular:

  • F0 (fundamental frequency, i.e. vocal pitch) variation and the corresponding timing information
  • acoustic intensity or loudness (its relative variation or fluctuation over the course of speech, mapped onto the time axis in real time)
  • rhythm, based on phoneme and syllable durations
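The three feature types above can be illustrated without any external library. The following is a minimal sketch, not the group's pipeline, and it rests on several assumptions: the signal is a mono list of float samples in [-1, 1]; pitch is estimated with a plain autocorrelation method (in practice Praat, for example through the parselmouth package, or librosa's `pyin` would be the usual choices); intensity is frame RMS in dB relative to full scale; and rhythm is summarized with the normalized Pairwise Variability Index (nPVI), one common duration-based measure, over phoneme or syllable durations. The frame sizes, the 75 to 400 Hz search range, and the 0.3 voicing threshold are assumed defaults, and all function names are hypothetical.

```python
import math

# Dependency-free sketch of F0, intensity and a duration-based rhythm measure.
# Assumption: `signal` is a list of float samples in [-1, 1] at sample rate `sr`.


def frames(signal, frame_length, hop_length):
    """Yield (start_index, frame) for each full analysis frame."""
    for start in range(0, len(signal) - frame_length + 1, hop_length):
        yield start, signal[start:start + frame_length]


def estimate_f0(frame, sr, fmin=75.0, fmax=400.0, voicing_threshold=0.3):
    """Autocorrelation F0 estimate in Hz, or None for frames judged unvoiced."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return None
    min_lag = max(1, int(sr / fmax))
    max_lag = min(int(sr / fmin), len(frame) - 1)
    best_lag, best_r = None, voicing_threshold  # only lags above threshold count
    for lag in range(min_lag, max_lag + 1):
        # Normalized autocorrelation at this lag (1.0 = perfect repetition).
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    return None if best_lag is None else sr / best_lag


def intensity_db(frame, floor=1e-10):
    """Frame RMS level in dB relative to full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(max(rms, floor))


def prosodic_contours(signal, sr, frame_length=320, hop_length=160):
    """Return (time_s, f0_hz_or_None, intensity_db) per frame; time = frame centre."""
    return [
        ((start + frame_length / 2) / sr, estimate_f0(frame, sr), intensity_db(frame))
        for start, frame in frames(signal, frame_length, hop_length)
    ]


def npvi(durations):
    """Normalized Pairwise Variability Index over successive interval durations."""
    if len(durations) < 2:
        raise ValueError("need at least two durations")
    terms = [abs(a - b) / ((a + b) / 2.0) for a, b in zip(durations, durations[1:])]
    return 100.0 * sum(terms) / len(terms)


if __name__ == "__main__":
    sr = 8000
    # Synthetic 0.2 s "vowel": a 200 Hz sine at amplitude 0.5.
    tone = [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(sr // 5)]
    for t, f0, db in prosodic_contours(tone, sr)[:3]:
        print(f"{t:.3f} s  F0={f0}  {db:.1f} dB")
    print(npvi([0.12, 0.08, 0.15, 0.10]))  # hypothetical syllable durations (s)
```

Returning one `(time, F0, intensity)` tuple per frame keeps both contours on the same time axis, which is what later alignment with transcriptions needs; nPVI is 0 for perfectly even durations and grows as neighbouring durations alternate more strongly.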

We want to study these features in order to set up real-time processing linked to speech recognition services, so that ASR output (speech-recognition-based transcription) can be augmented with prosodic cues. We want to relate these cues to information-theoretic processing (pragmatics and semantics), for example contrastive stress and the intonation contours characteristic of interrogative versus declarative utterances, among others.
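One way such augmentation could work is to align word timestamps from an ASR service with an F0 track. The sketch below is hypothetical throughout: the `Word` structure and the `(time_s, f0_hz or None)` track format are assumptions (many ASR services can return word-level start and end times), and the two heuristics, marking the word with the highest mean F0 as a candidate for contrastive stress and treating an F0 rise of more than 10% across the final word as a question cue, are deliberately crude placeholders rather than validated methods.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical input shapes: word-level ASR timestamps plus an F0 track of
# (time_s, f0_hz or None) pairs from any pitch tracker.
F0Track = List[Tuple[float, Optional[float]]]


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def voiced_f0(word: Word, f0_track: F0Track) -> List[float]:
    """F0 values of voiced frames whose time falls inside the word."""
    return [f0 for t, f0 in f0_track if word.start <= t < word.end and f0 is not None]


def word_f0(word: Word, f0_track: F0Track) -> Optional[float]:
    """Mean F0 over the word's voiced frames, or None if none are voiced."""
    values = voiced_f0(word, f0_track)
    return mean(values) if values else None


def annotate(words: List[Word], f0_track: F0Track, rise_ratio: float = 1.1) -> str:
    """Wrap the highest-pitched word in *...* and end with '?' if F0 rises by more
    than `rise_ratio` across the final word, otherwise '.' (crude heuristics)."""
    if not words:
        return ""
    means = [word_f0(w, f0_track) for w in words]
    voiced = [m for m in means if m is not None]
    peak = max(voiced) if voiced else None
    tokens = [f"*{w.text}*" if m is not None and m == peak else w.text
              for w, m in zip(words, means)]
    last = voiced_f0(words[-1], f0_track)
    rising = len(last) >= 2 and last[-1] > last[0] * rise_ratio
    return " ".join(tokens) + ("?" if rising else ".")


if __name__ == "__main__":
    words = [Word("you", 0.0, 0.2), Word("saw", 0.2, 0.4), Word("mary", 0.4, 0.8)]
    track = [(0.05, 180.0), (0.15, 175.0), (0.25, 170.0), (0.35, 168.0),
             (0.45, 190.0), (0.55, 210.0), (0.65, 240.0), (0.75, 260.0)]
    print(annotate(words, track))  # prints: you saw *mary*?
```

A real system would need speaker-normalized F0 (for example in semitones relative to the speaker's mean), intensity and duration cues, and a trained classifier in place of these threshold rules.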

The project has a significant data collection and corpus creation component, as well as coding, implementation, and experimentation with machine learning (ML) and deep learning speech processing algorithms.

If you are interested in working with us on these problems, let us know and join us.

The group is managed by:

We have some undergraduate students interested in this project, as well as students from Damir’s L665 class.

For more information, consult the wiki page on GitHub and the code repository.