Implementations of text, language and corpus processing functions in Python

Damir Cavar

Last change: June 2011

TextStat.py contains functions for text processing, basic NLP, and statistics over text properties, such as distributional token and word statistics.

See the following examples for details on how to use the module:
  • N-Grams - How to create N-gram models, convert them to feature vectors, and use them in statistical analyses, among other use cases.
  • Significance - How to calculate significance in various distributional tasks, such as collocation analysis and feature selection for classification, and related topics.
  • Classification - How to classify text documents, or any other distributional or N-gram model, and how to calculate similarities between documents, models, languages, text genres, etc.
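
As a rough illustration of the N-gram and similarity topics listed above, the following sketch builds a bigram frequency model and compares two models by cosine similarity. It uses only the Python standard library and does not call TextStat.py; the names ngrams, ngram_model, and cosine_similarity are hypothetical and chosen for this example, not taken from the module. Tokenization by lowercasing and whitespace splitting is also an assumption made to keep the example short.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_model(text, n=2):
    """Count n-gram frequencies over lowercased, whitespace-split tokens.

    Hypothetical helper for illustration; not part of TextStat.py.
    """
    return Counter(ngrams(text.lower().split(), n))


def cosine_similarity(model_a, model_b):
    """Cosine similarity of two frequency models treated as sparse vectors."""
    # A Counter returns 0 for missing keys, so unseen n-grams add nothing.
    dot = sum(count * model_b[gram] for gram, count in model_a.items())
    norm_a = sqrt(sum(c * c for c in model_a.values()))
    norm_b = sqrt(sum(c * c for c in model_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty model has no direction to compare
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = ngram_model("the cat sat on the mat")
    doc_b = ngram_model("the cat sat on the sofa")
    print(cosine_similarity(doc_a, doc_b))
```

Each model here is a sparse feature vector keyed by N-gram, so the same similarity function applies whether the models come from single documents, whole languages, or text genres.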