Damir Cavar's Homepage

Damir Cavar is a Natural Language Processing, AI, and Knowledge Representation scientist

5 April 2017

ELAN2split

by Damir Cavar

I published a new version of ELAN2split on Bitbucket.

ELAN2split is a tool that creates pairs of audio and transcription files from the time-aligned segments in an ELAN file. Each segment is saved as two files: the matching portion of the original recording, trimmed to its own WAVE file, and the transcription or annotation text from the corresponding tier, which can be selected on the command line. The resulting corpus is well suited for building initial speech corpora, for training a forced aligner, and subsequently for training a speech recognizer. I built this tool to work with the Prosodylab-Aligner.
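To make the idea concrete, here is a minimal Python sketch of the splitting step. It is not the ELAN2split source code, only an illustration under a few assumptions: the ELAN file is a standard `.eaf` XML document with `TIME_SLOT`, `TIER`, and `ALIGNABLE_ANNOTATION` elements, the recording is an uncompressed PCM WAVE file, and the transcription files use a `.lab` extension. The function names `read_segments` and `split_recording`, the `segment_0001`-style file names, and the `.lab` extension are all hypothetical choices made for this example.

```python
import wave
import xml.etree.ElementTree as ET
from pathlib import Path


def read_segments(eaf_path, tier_id):
    """Return (start_ms, end_ms, text) for each time-aligned annotation in one tier."""
    root = ET.parse(str(eaf_path)).getroot()
    # Map time-slot IDs to millisecond offsets; unaligned slots have no TIME_VALUE.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    segments = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = (ann.findtext("ANNOTATION_VALUE") or "").strip()
            # Skip annotations that are empty or lack usable time alignment.
            if start is None or end is None or end <= start or not text:
                continue
            segments.append((start, end, text))
    return segments


def split_recording(wav_path, segments, out_dir, prefix="segment"):
    """Write one trimmed WAVE file and one text file per segment (hypothetical naming)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        for i, (start, end, text) in enumerate(segments, start=1):
            # Convert millisecond offsets to frame positions.
            first = start * rate // 1000
            count = end * rate // 1000 - first
            src.setpos(min(first, total))
            frames = src.readframes(count)
            stem = f"{prefix}_{i:04d}"
            with wave.open(str(out_dir / f"{stem}.wav"), "wb") as dst:
                dst.setparams(params)  # frame count in the header is corrected on close
                dst.writeframes(frames)
            # Assumption: the aligner reads the transcription from a .lab file.
            (out_dir / f"{stem}.lab").write_text(text + "\n", encoding="utf-8")
            written.append(out_dir / stem)
    return written
```

With a hypothetical recording `session.wav`, its annotation file `session.eaf`, and a tier named `utterance`, calling `split_recording("session.wav", read_segments("session.eaf", "utterance"), "corpus")` would fill the `corpus` directory with matching audio and text pairs.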

This is a command-line tool; it does not come with a graphical interface. Binary versions for Ubuntu 16.04 (64-bit) and Mac OS X are available in the Downloads section of the Bitbucket repository.

tags: ELAN2split ELAN Prosodylab-Aligner "Forced Aligner" "Speech Corpus"