LID - Language Identification in Python

The following code was developed using Python 2.7.x. It might be made Python 3 compatible later. For testing on Microsoft Windows systems, you will need to install Python. Check this site here... Mac OS X and typical Linux systems should come with an appropriate version of Python pre-installed.

To test LID, just run:

python lid.py sample*.txt

in a command line shell.

Download the complete archive as a ZIP-file.

For an implementation in other languages, have a look at the bottom of the Home-page.

lid.py - The main code for the language identifier.
lid-speech.py - The language identifier using Win32 Python extensions and output via speech synthesis. (Only for Windows)
lidtrainer.py - The main code of the LID trainer. It generates n-gram models from given texts of a specific language.
sample1.txt - A sample text for testing with LID.
sample2.txt - A sample text for testing with LID.
sample3.txt - A sample text for testing with LID.
Croatian.dat - Croatian language model (just a simple example trained on a couple of text-files).
English.dat - English language model (just a simple example trained on a couple of text-files).
German.dat - German language model (just a simple example trained on a couple of text-files).
Italian.dat - Italian language model (just a simple example trained on a couple of text-files).
Japanese.dat - Japanese language model (just a simple example trained on a couple of text-files).
Afrikaans.dat - Afrikaans language model.
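
To illustrate the general idea behind the trainer and the identifier listed above, here is a minimal sketch of n-gram based language identification. It is not the code from lid.py or lidtrainer.py, and it does not read the .dat model files. The function names (trigram_model, score, identify), the choice of character trigrams, the scoring rule, and the tiny training strings are all assumptions made for illustration. The sketch is written for Python 3.

```python
from collections import Counter


def trigram_model(text):
    """Return relative frequencies of character trigrams in text.

    Hypothetical helper: lowercases the text and collapses whitespace
    before counting, which is an assumption of this sketch.
    """
    text = " ".join(text.lower().split())
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = float(sum(counts.values()))
    if not total:
        return {}
    return {gram: n / total for gram, n in counts.items()}


def score(sample_model, language_model):
    """Sum of frequency products over trigrams found in both models."""
    return sum(freq * language_model.get(gram, 0.0)
               for gram, freq in sample_model.items())


def identify(text, models):
    """Return the name of the model that scores highest for text."""
    sample = trigram_model(text)
    return max(models, key=lambda name: score(sample, models[name]))


# Toy training data; real models need far more text per language.
models = {
    "English": trigram_model(
        "the quick brown fox jumps over the lazy dog and then the cat"),
    "German": trigram_model(
        "der schnelle braune fuchs springt ueber den faulen hund "
        "und dann die katze"),
}

print(identify("the dog and the fox", models))
print(identify("der hund und die katze", models))
```

With models this small the result is only reliable for text that closely resembles the training strings. Models trained on larger text collections, such as the .dat files above, should be far more robust.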

Practical use of n-gram models and simple statistics