Research focus: Speech Processing of Prosody for Pragmatics and Dialogs, Chatbots, and AI

This research project focuses on prosody, intonation contour detection, and focus and stress pattern analysis for processing the semantic and pragmatic aspects of spoken language.
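Intonation contour detection usually starts from a fundamental frequency (F0) track. The following is a minimal sketch of one classic technique, frame-wise autocorrelation pitch estimation in pure Python; it is not the project's own pipeline. The function names, the 75 to 500 Hz search range, the 0.3 voicing threshold, and the frame and hop sizes are illustrative assumptions, and a synthetic tone stands in for real speech.

```python
import math

def estimate_f0(frame, sample_rate, fmin=75.0, fmax=500.0, voicing_threshold=0.3):
    """Estimate F0 (Hz) of one frame by picking the autocorrelation peak.

    Returns 0.0 when the frame is treated as unvoiced or silent.
    """
    energy = sum(x * x for x in frame)  # autocorrelation at lag 0
    if energy == 0.0:
        return 0.0
    # Only search lags that correspond to plausible speech F0 values (assumed range).
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_r = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    # Weak periodicity relative to frame energy: treat the frame as unvoiced.
    if best_lag == 0 or best_r / energy < voicing_threshold:
        return 0.0
    return sample_rate / best_lag

def f0_contour(signal, sample_rate, frame_len=1024, hop=256):
    """Slide a window over the signal and return one F0 value per frame."""
    return [
        estimate_f0(signal[start:start + frame_len], sample_rate)
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]

# Synthetic 200 Hz tone sampled at 16 kHz as a stand-in for a voiced stretch of speech.
sr = 16000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(2048)]
print(f0_contour(tone, sr))  # [200.0, 200.0, 200.0, 200.0, 200.0]
```

The resulting list of per-frame F0 values (0.0 for unvoiced frames) is the kind of contour that stress and question-intonation analysis can build on; real recordings would need more robust voicing decisions and octave-error handling than this sketch provides.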

The relevance of technologies that correctly map stress patterns onto a string representation of utterances becomes clear from the following examples:

Common automatic speech recognition (ASR, speech-to-text) systems produce the same transcription for an utterance with flat intonation and for one with stress on a particular word or phrase. Such stress can, for example, signal contrast with a previously uttered proposition or claim. In this case the interpretation is: no, I did not buy a car, I bought a bicycle.
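One simple way to make stress visible in a transcript is to score each word's prominence from acoustic correlates of stress (F0, duration, intensity) and mark the most prominent words, here with capital letters. This is a minimal sketch, not the project's method: it assumes word-level features are already available (for example from a forced alignment), and the `Word` fields, the equal weighting, the threshold, and the feature values below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

@dataclass
class Word:
    text: str
    f0_hz: float         # mean fundamental frequency over the word (Hz)
    duration_s: float    # word duration (seconds)
    intensity_db: float  # mean intensity (dB)

def zscores(values):
    """Standardize values within the utterance; a constant feature contributes 0."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def mark_stress(words, threshold=1.0):
    """Upper-case words whose mean z-score over F0, duration and intensity exceeds threshold."""
    f0 = zscores([w.f0_hz for w in words])
    dur = zscores([w.duration_s for w in words])
    inten = zscores([w.intensity_db for w in words])
    marked = []
    for w, a, b, c in zip(words, f0, dur, inten):
        prominence = (a + b + c) / 3  # equal weights: an illustrative assumption
        marked.append(w.text.upper() if prominence > threshold else w.text)
    return " ".join(marked)

# Hypothetical feature values for "I bought a bicycle" with contrastive stress on "bicycle".
utterance = [
    Word("I", 180.0, 0.10, 60.0),
    Word("bought", 185.0, 0.25, 61.0),
    Word("a", 175.0, 0.08, 58.0),
    Word("bicycle", 260.0, 0.55, 70.0),
]
print(mark_stress(utterance))  # I bought a BICYCLE
```

Normalizing within the utterance makes the score relative to the speaker's own range, which matters because absolute F0 differs widely between speakers; a learned model would normally replace the fixed weights and threshold.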

Examples in which intonation marks an echo question or a rhetorical question are more complicated:

Common ASR systems do not detect the intonation contour; they transcribe the response without any indication of question intonation, e.g. as "you bought a new car" rather than "you bought a new car?".
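A final rise in F0 is one common cue for echo questions. The sketch below is a simple heuristic, not the project's method: it fits a least-squares line to the last stretch of voiced F0 values (converted to semitones) and appends a question mark when the fitted rise reaches a threshold. The contour format (one F0 value in Hz per frame, 0 for unvoiced frames), the 30% tail, the 2-semitone threshold, and the example contours are assumptions for illustration; rhetorical questions in particular may need cues beyond the final rise.

```python
import math

def to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def final_rise(contour, tail_fraction=0.3):
    """Fitted F0 change (in semitones) over the final voiced stretch.

    contour: one F0 value in Hz per frame, 0 for unvoiced frames (assumed format).
    """
    voiced = [to_semitones(f) for f in contour if f > 0]
    if len(voiced) < 2:
        return 0.0
    n = max(2, int(len(voiced) * tail_fraction))
    tail = voiced[-n:]
    # Ordinary least-squares slope of semitones against frame index.
    x_mean = (n - 1) / 2
    y_mean = sum(tail) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(tail))
    den = sum((x - x_mean) ** 2 for x in range(n))
    slope = num / den
    # Total change predicted by the fitted line across the tail.
    return slope * (n - 1)

def punctuate(transcript, contour, min_rise_st=2.0):
    """Append '?' if the final contour rises by at least min_rise_st semitones, else '.'."""
    mark = "?" if final_rise(contour) >= min_rise_st else "."
    return transcript.rstrip(".?!") + mark

# Hypothetical F0 contours (Hz per frame) for the same words.
rising = [180, 178, 175, 172, 170, 170, 172, 180, 200, 230, 270]
falling = [200, 195, 190, 185, 180, 0, 175, 170, 165, 160, 150]
print(punctuate("you bought a new car", rising))   # you bought a new car?
print(punctuate("you bought a new car", falling))  # you bought a new car.
```

Measuring the rise in semitones rather than Hz keeps the threshold roughly comparable across speakers with different pitch ranges, and fitting a line instead of comparing two single frames makes the decision less sensitive to one noisy F0 value.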

The goal in this research project is to:

If you are interested in working on creating a prosodic speech corpus, on speech signal processing, or on semantic and pragmatic NLP, please contact me.

Various GitHub repositories contain material and code that explain the project. We are also collecting a speech corpus and asking people to donate recordings to it. Once prepared, annotated, and transcribed, the corpus will be made publicly available. See the corresponding GitHub page for details on how to donate speech.