TextStat.py
Implementations of text, language and corpus processing functions in Python
Damir Cavar
Last change: June 2011
TextStat.py contains functions for text processing and basic NLP, including statistical measures of text properties and distributional statistics over tokens or words.
- The Python 3.x source (ZIP file) of TextStat.py
- The Pydoc documentation of the code
- TextStat.py is published under the GNU Lesser General Public License Version 3.
See the examples for details about how to use the module.
- N-Grams - How to create N-gram models, convert them to feature vectors, use them in statistical analyses, and many more use-cases.
- Significance - How to calculate significance in various distributional tasks, collocation analysis, feature selection for classification tasks, and many more related topics.
- Classification - How to classify text documents or any other distributional or N-gram model, and how to calculate similarities between documents, models, languages, text genres, etc.
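
As a rough illustration of the three topics above, the following is a minimal sketch that uses only the Python 3 standard library. It does not use TextStat.py, and the function names (`ngrams`, `pmi`, `cosine_similarity`) are hypothetical choices made for this example; see the Pydoc documentation for the functions the module actually provides. Pointwise mutual information and cosine similarity are used here as common stand-ins for a collocation measure and a document similarity measure, not necessarily the ones TextStat.py implements.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def pmi(bigram, bigram_counts, unigram_counts):
    """Pointwise mutual information (base 2) of a bigram that occurs in the counts."""
    p_xy = bigram_counts[bigram] / sum(bigram_counts.values())
    p_x = unigram_counts[(bigram[0],)] / sum(unigram_counts.values())
    p_y = unigram_counts[(bigram[1],)] / sum(unigram_counts.values())
    return math.log2(p_xy / (p_x * p_y))


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (dict-like objects)."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty model is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = "the cat sat on the mat".split()
    doc_b = "the dog sat on the log".split()

    unigrams_a = ngrams(doc_a, 1)
    bigrams_a = ngrams(doc_a, 2)
    bigrams_b = ngrams(doc_b, 2)

    print(bigrams_a.most_common(3))
    print(pmi(("the", "cat"), bigrams_a, unigrams_a))
    print(cosine_similarity(bigrams_a, bigrams_b))
```

The N-gram counts serve directly as sparse feature vectors, which is why the same `Counter` objects can be passed both to the significance measure and to the similarity function.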