Research focus: Speech Processing of Prosody for Pragmatics and Dialogs, Chatbots, and AI

This research project focuses on prosody, intonation contour detection, and the analysis of focus and stress patterns for processing the semantic and pragmatic aspects of spoken language.

The relevance of technologies that correctly map stress patterns onto the string representation of an utterance becomes clear when you consider the following example:

Person A says: "You bought a new car. Interesting."
Person B answers: "No, I bought a new BICYCLE." (with stress on BICYCLE)

Common automatic speech recognition (ASR, speech-to-text) systems do not distinguish in their transcription between an utterance with flat intonation and one with stress on a particular word or phrase. Such stressed words or phrases can indicate, for example, a contrast to a previously uttered proposition or claim. In this case the interpretation is: no, I did not buy a car, I bought a bicycle.
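To make the idea of mapping stress onto the transcript concrete, here is a minimal sketch in Python. It assumes that the words are already time-aligned with the signal and that a per-word mean pitch and energy value is available. The Word class, the mark_stress function, the threshold, and all numbers are hypothetical and serve only as illustration; they are not the method used in this project.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Word:
    text: str
    mean_f0_hz: float  # hypothetical per-word mean pitch in Hz
    energy: float      # hypothetical per-word RMS energy (arbitrary units)


def zscores(values):
    """Standardize values within one utterance; all zeros if there is no variation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def mark_stress(words, threshold=1.5):
    """Upper-case the most prominent word if its score reaches the threshold.

    The score is the average of the pitch and energy z-scores. This is an
    assumed, deliberately simple prominence measure.
    """
    if not words:
        return ""
    f0_z = zscores([w.mean_f0_hz for w in words])
    en_z = zscores([w.energy for w in words])
    scores = [(f + e) / 2 for f, e in zip(f0_z, en_z)]
    best = max(range(len(words)), key=lambda i: scores[i])
    marked = [
        w.text.upper() if i == best and scores[i] >= threshold else w.text
        for i, w in enumerate(words)
    ]
    return " ".join(marked)


# Invented values for Person B's answer, with raised pitch and energy on "bicycle".
utterance = [
    Word("no", 120, 0.50), Word("I", 118, 0.45), Word("bought", 122, 0.50),
    Word("a", 115, 0.40), Word("new", 121, 0.50), Word("bicycle", 180, 0.90),
]
print(mark_stress(utterance))
```

A real system would need more robust cues (for example duration and speaker normalization) and would have to handle multiple stressed phrases, but the output format, a transcript annotated with stress, is the point of the sketch.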

More complicated are examples with intonation that indicates an echo or rhetorical question:

Person A says: "I bought a new car!"
Person B asks: "You bought a new car?" (with a question intonation contour)

Common ASR systems would not detect the intonation contour and would transcribe the response without any indication of question intonation, e.g. "you bought a new car".
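As a rough illustration of what detecting a question contour could involve, the following Python sketch classifies the end of an utterance as rising, falling, or level from a sequence of pitch (F0) values. It assumes that a pitch tracker has already produced one F0 value in Hz per frame, with 0 marking unvoiced frames. The function name final_contour, the tail fraction, the semitone threshold, and the sample values are all assumptions made for this example.

```python
import math


def final_contour(f0_hz, tail_fraction=0.3, threshold_semitones=2.0):
    """Classify the utterance-final pitch movement.

    Fits a least-squares line to the last voiced frames (in semitones) and
    compares the total change over that window with a threshold.
    """
    voiced = [f for f in f0_hz if f > 0]  # assumed convention: 0 = unvoiced
    if len(voiced) < 4:
        return "unknown"
    n_tail = max(2, int(len(voiced) * tail_fraction))
    tail = voiced[-n_tail:]
    # Semitones relative to the first frame of the tail window.
    st = [12 * math.log2(f / tail[0]) for f in tail]
    x_mean = (len(st) - 1) / 2
    y_mean = sum(st) / len(st)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(st))
    den = sum((x - x_mean) ** 2 for x in range(len(st)))
    change = (num / den) * (len(st) - 1)  # fitted change across the window
    if change >= threshold_semitones:
        return "rising"
    if change <= -threshold_semitones:
        return "falling"
    return "level"


# Invented F0 tracks for the same words, "you bought a new car".
as_question = [120, 118, 0, 119, 121, 120, 122, 125, 140, 165, 190]
as_statement = [130, 128, 127, 126, 125, 124, 122, 115, 105, 95]

for name, track in [("question", as_question), ("statement", as_statement)]:
    suffix = "?" if final_contour(track) == "rising" else "."
    print(name, "->", "you bought a new car" + suffix)
```

A final rise is only one cue for question intonation, and echo or rhetorical questions can have other contour shapes, so a trained model over the full contour would be needed in practice.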

The goals of this research project are to:

Generate a spoken language corpus with all necessary variation in intonation, focus stress, and corresponding semantic and pragmatic variation in interpretation.
Train models and develop technologies that are able to detect the intonation contour type and to associate potential stress in the speech signal with the corresponding string representation of the word or phrase.
Integrate the resulting technologies into a higher-level semantic and pragmatic NLP component that links acoustic properties of spoken language with semantic and pragmatic properties, for use in chatbots, dialog systems, and other AI applications.

If you are interested in working on the creation of a prosodic speech corpus, speech signal processing, and semantic and pragmatic NLP, please contact me.

There are various GitHub repositories with material and code that explain the project. We are also collecting a corpus and asking people to donate speech to it. Once it has been prepared, annotated, and transcribed, the corpus will be made publicly available. For details on how to donate speech, see the corresponding GitHub page.