TextStat.py: Home

TextStat.py
Implementations of text, language and corpus processing functions in Python

Damir Cavar

Last change: June 2011

TextStat.py provides functions for text processing, basic NLP, statistics over text properties, and distributional token or word statistics.

The Python 3.x source (ZIP file) of TextStat.py
The Pydoc documentation of the code
TextStat.py is published under the GNU Lesser General Public License Version 3.

See the examples for details about how to use the module.

N-Grams - How to create N-gram models, convert them to feature vectors, and use them in statistical analyses, among other use cases.
Significance - How to calculate significance in various distributional tasks, such as collocation analysis and feature selection for classification tasks, and related topics.
Classification - How to classify text documents, or any distributional or N-gram model, and how to calculate similarities between documents, models, languages, text genres, etc.
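
To give a rough idea of what the N-gram and similarity topics above involve, here is a minimal sketch using only the Python 3 standard library. It does not use TextStat.py itself: the function names `ngrams`, `ngram_model` and `cosine_similarity` are hypothetical and chosen for illustration only, and the module's actual API may differ (see the Pydoc documentation for the real function names).

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_model(tokens, n):
    """Build a simple frequency profile: n-gram -> absolute count."""
    return Counter(ngrams(tokens, n))


def cosine_similarity(model_a, model_b):
    """Cosine similarity between two sparse count vectors (dict-like)."""
    shared = set(model_a) & set(model_b)
    dot = sum(model_a[key] * model_b[key] for key in shared)
    norm_a = math.sqrt(sum(v * v for v in model_a.values()))
    norm_b = math.sqrt(sum(v * v for v in model_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty model is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Hypothetical usage: compare the bigram profiles of two short texts.
text_a = "the cat sat on the mat".split()
text_b = "the cat lay on the mat".split()
bigrams_a = ngram_model(text_a, 2)
bigrams_b = ngram_model(text_b, 2)
print(cosine_similarity(bigrams_a, bigrams_b))
```

Whitespace tokenization and raw counts are simplifying assumptions here; relative frequencies or a weighting scheme would fit the same vector representation.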

TextStat.py - text, language and statistical processing functions in Python 3.x

Simple and maybe useful functions for corpus linguistics, text-based language studies...