DRAFT – Work in Progress

The standard Ubuntu distribution comes with various linguistic tools. The links here point to the packages for Ubuntu 16.04(.1).

The package list includes various finite-state transducer toolkits, which are used for developing morphological analyzers, tokenizers, and other NLP tools:

The standard package list also contains ready-to-use NLP tools for various languages:

  • Chasen, a Japanese Morphological Analysis System (chasen)
  • Juman, a Japanese morphological analysis system (juman)
  • Frog, a morphosyntactic tagger, lemmatizer, morphological analyzer, and dependency parser for Dutch (it seems to be missing from the new distro; see frogdata)

Other Repositories

Some repositories provide more packages that might be interesting or useful for linguistic work, be it language documentation or corpus linguistics:

I set up the SIL repository by creating the following file, as root or with sudo:


with this content for Ubuntu xenial (16.04):

deb http://packages.sil.org/ubuntu xenial main
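The same line can be generated for other Ubuntu releases by swapping the codename. The following is a minimal sketch, not a tested setup procedure: the helper name sil_source_line is hypothetical, and it assumes that the SIL repository uses the same layout for other codenames as it does for xenial.

```shell
# Hypothetical helper: print the apt source line for the SIL repository.
# $1 is the Ubuntu codename (e.g. xenial for 16.04).
# Assumption: other codenames follow the same URL layout as xenial.
sil_source_line() {
    printf 'deb http://packages.sil.org/ubuntu %s main\n' "$1"
}

# Print the line for Ubuntu 16.04.
sil_source_line xenial
```

To apply it, the output could be piped into the file mentioned above, for example with `sil_source_line xenial | sudo tee FILE` (FILE standing for that file's path), followed by `sudo apt-get update` to refresh the package index.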