The following code was developed using Python 2.7.x; it may be made Python 3 compatible in the future. For testing on Microsoft Windows systems, you will need to install Python. Check this site here... Mac OS X and typical Linux systems should come with an appropriate version of Python pre-installed.

To test LID, just run:

python lid.py sample*.txt

in a command line shell.
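
Note that a wildcard in the file argument is expanded by the shell on Linux and Mac OS X, while the Windows command prompt passes it to the script unchanged. Whether lid.py expands such patterns itself is not stated here. The following is only a minimal sketch of how a script could do so; the function name expand_args is a hypothetical helper for illustration and is not taken from LID.

```python
import glob


def expand_args(args):
    """Expand wildcard patterns in a list of command-line arguments.

    Hypothetical helper for illustration; not part of lid.py.
    """
    filenames = []
    for arg in args:
        matches = sorted(glob.glob(arg))
        # Keep the argument unchanged if nothing matches, so that a
        # later open() call reports the missing file to the user.
        filenames.extend(matches if matches else [arg])
    return filenames
```

A script would typically call expand_args(sys.argv[1:]) before opening the files. On shells that already expand wildcards, the arguments are plain file names and pass through unchanged.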

Download the complete archive as a ZIP-file.

For implementations in other languages, have a look at the bottom of the home page.

  • lid.py - The main code for the language identifier.
  • lid-speech.py - The language identifier using Win32 Python extensions and output via speech synthesis. (Only for Windows)
  • lidtrainer.py - The main code of the LID trainer. It generates n-gram models from given texts of a specific language.
  • sample1.txt - A sample text for testing with LID.
  • sample2.txt - A sample text for testing with LID.
  • sample3.txt - A sample text for testing with LID.
  • Croatian.dat - Croatian language model (just a simple example trained on a couple of text-files).
  • English.dat - English language model (just a simple example trained on a couple of text-files).
  • German.dat - German language model (just a simple example trained on a couple of text-files).
  • Italian.dat - Italian language model (just a simple example trained on a couple of text-files).
  • Japanese.dat - Japanese language model (just a simple example trained on a couple of text-files).
  • Afrikaans.dat - Afrikaans language model.
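
To illustrate the idea behind the trainer and the identifier listed above, here is a minimal sketch of n-gram based language identification. It is not the code from lidtrainer.py or lid.py: the choice of character trigrams, the overlap score, and the names trigram_model, similarity and identify are all assumptions made for this example. The format of the .dat files is not described here, so the sketch keeps its models in memory.

```python
from collections import Counter


def trigram_model(text):
    """Return the relative frequency of each character trigram in text."""
    counts = Counter(text[i:i + 3] for i in range(len(text) - 2))
    total = float(sum(counts.values()))
    return dict((gram, n / total) for gram, n in counts.items())


def similarity(model_a, model_b):
    """Sum the smaller of the two frequencies over all shared trigrams.

    This overlap score is an assumed measure, chosen for simplicity.
    """
    return sum(min(freq, model_b[gram])
               for gram, freq in model_a.items() if gram in model_b)


def identify(text, models):
    """Return the name of the model that is most similar to text."""
    sample = trigram_model(text)
    return max(models, key=lambda name: similarity(sample, models[name]))
```

Training then amounts to calling trigram_model on a text of a known language and storing the result under the language name, for example models = {"English": trigram_model(english_text)}. Calling identify(unknown_text, models) returns the best-matching name. Models trained on only a couple of short texts, like the simple examples above, will be correspondingly unreliable.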