Python

My PyPI modules:

Some of the material is available on GitHub or Bitbucket.

I have developed Python, NLP, CL, and ML teaching material as IPython notebooks for Jupyter. All of them will eventually be linked here; here are some examples:

Intro to Part-of-Speech Tagging (zip, Jupyter nbviewer, Anaconda Cloud Notebook, GitHub repo)
Intro to Hidden Markov Models (zip, Jupyter nbviewer, Anaconda Cloud Notebook, GitHub repo)
Intro to WordNet and NLTK (zip, Jupyter nbviewer, GitHub repo)
Topic Modeling with MALLET (zip, Jupyter nbviewer, GitHub repo)
Intro to the Forward Algorithm (zip, Jupyter nbviewer)
Intro to the Backward Algorithm (zip, Jupyter nbviewer)
…
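As a rough illustration of the kind of computation the Forward Algorithm notebook deals with, here is a minimal sketch in plain Python. It is not taken from the notebooks: the function name `forward`, the dictionary-based parameter layout, and the toy probabilities are hypothetical. It computes the total probability of an observation sequence under an HMM by summing over all hidden state paths in O(T·N²) time, instead of enumerating all N^T paths.

```python
def forward(obs, states, start_p, trans_p, emit_p):
    """Return P(obs) for an HMM, computed with the forward algorithm.

    alpha[t][s] is the probability of emitting obs[0..t] and being in state s at time t.
    """
    # Initialization: start in state s and emit the first observation
    alpha = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    # Recursion: sum over all predecessor states r, then emit the next observation
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: sum(prev[r] * trans_p[r][s] for r in states) * emit_p[s][o]
            for s in states
        })
    # Termination: marginalize over the final hidden state
    return sum(alpha[-1].values())


# Hypothetical toy parameters: two hidden states, three observation symbols
states = ("H", "C")
start_p = {"H": 0.8, "C": 0.2}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.2, 2: 0.4, 3: 0.4}, "C": {1: 0.5, 2: 0.4, 3: 0.1}}

print(forward([3, 1, 3], states, start_p, trans_p, emit_p))
```

For long sequences, practical implementations usually work in log space or rescale alpha at each step to avoid floating-point underflow; the sketch omits this for readability.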

I have been porting some Finite State algorithms to Python 3 for a more or less functional library for Weighted Finite State Transducers in native Python, with features such as code generation to C. I will place the code on GitHub: Project PyFST
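The following is a minimal sketch of what a weighted transducer in native Python can look like; it is not the PyFST code. The class `WFST` and its methods are hypothetical, and the sketch assumes an epsilon-free transducer with weights in the tropical semiring (min, +), so the best transduction of an input string is the output of its lowest-weight accepting path. Code generation to C is not covered.

```python
from collections import defaultdict


class WFST:
    """Hypothetical epsilon-free weighted FST over the tropical semiring (min, +)."""

    def __init__(self, start):
        self.start = start
        self.finals = {}               # state -> final weight
        self.arcs = defaultdict(list)  # (state, input symbol) -> [(output, weight, next state)]

    def add_arc(self, src, isym, osym, weight, dst):
        self.arcs[(src, isym)].append((osym, weight, dst))

    def set_final(self, state, weight=0.0):
        self.finals[state] = weight

    def transduce(self, symbols):
        """Return (weight, output) of the lowest-weight accepting path, or None."""
        # best[state] = (accumulated weight, output so far); one layer per input symbol
        best = {self.start: (0.0, "")}
        for sym in symbols:
            nxt = {}
            for state, (w, out) in best.items():
                for osym, aw, dst in self.arcs.get((state, sym), []):
                    cand = (w + aw, out + osym)  # tropical "times" is ordinary addition
                    if dst not in nxt or cand[0] < nxt[dst][0]:
                        nxt[dst] = cand          # tropical "plus" is min
            best = nxt
        # Add final weights and keep the cheapest path ending in a final state
        results = [(w + self.finals[s], out) for s, (w, out) in best.items() if s in self.finals]
        return min(results) if results else None


# Usage: two competing arcs on "a"; the cheaper one (output "y") wins
t = WFST(start=0)
t.add_arc(0, "a", "x", 1.0, 1)
t.add_arc(0, "a", "y", 0.5, 1)
t.add_arc(1, "b", "z", 2.0, 2)
t.set_final(2)
print(t.transduce("ab"))  # (2.5, 'yz')
```

Because every arc consumes exactly one input symbol, a single left-to-right pass suffices; supporting epsilon arcs would require a shortest-path search over (state, position) pairs instead.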

Here is some of the material from my Python classes and developments. Some of it dates from the late 1990s, so it may be outdated and may not work in Python 3.x. Some of the Python examples and tutorials (slides and instruction handouts) for corpus, data, and language processing have been adapted to Python 3.

course material for JSSECL 2006
course material for the DGfS/CL Fall School 2005
Corpus processing tools (TEI XML from HTML, XML filtering, quantitative analysis)
Language identification (LID) with n-gram models
Orthography to IPA conversion for Croatian (with Malgorzata E. Cavar): see phonemic
Parsing algorithms (Charty, Earley algorithm in Python, Scheme and JavaScript, and other computational syntax tools)
TextStat.py: a lightweight module with functions for creating and using n-gram models for statistical analyses, various statistical functions, the chi-square test, vector space conversion of n-gram models, entropy and other information-theoretic measures, etc. It includes examples for document classification, measures of text or model similarity, and various other useful functions.
Finite State Automata (FSA) scripts: FSA class, automaton from word list, DOT (Graphviz) from automaton, etc.
Mutual Information and Relative Entropy syntactic parsing (Python code base)
Text 2 TEI XML with linguistic annotation
Lithuanian, Croatian, ... finite state morphology (transducer, lemmatizer, feature annotation) (mostly in C++ now, see the FLE Project)
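Language identification with n-gram models and the entropy measures mentioned for TextStat.py can be illustrated together. The sketch below is not TextStat.py: the names `char_ngrams`, `CharNgramModel`, and `identify` are hypothetical, and it assumes a simple add-one smoothed character-trigram frequency profile per language. A text is assigned to the language whose model gives it the lowest average negative log2 probability per n-gram, i.e. a cross-entropy estimate.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of text, padded with spaces."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Add-one smoothed character n-gram frequency profile for one language."""

    def __init__(self, training_text, n=3):
        self.n = n
        self.counts = Counter(char_ngrams(training_text, n))
        self.total = sum(self.counts.values())
        # One extra slot reserves probability mass for unseen n-grams
        self.vocab = len(self.counts) + 1

    def logprob(self, gram):
        # Laplace smoothing: an unseen n-gram is treated as having count 0 + 1
        return math.log2((self.counts[gram] + 1) / (self.total + self.vocab))

    def cross_entropy(self, text):
        """Average negative log2 probability per n-gram of text under this model."""
        grams = char_ngrams(text, self.n)
        return -sum(self.logprob(g) for g in grams) / len(grams)


def identify(text, models):
    """Return the language label whose model yields the lowest cross-entropy."""
    return min(models, key=lambda lang: models[lang].cross_entropy(text))


# Hypothetical toy training data; real LID needs far larger samples per language
models = {
    "en": CharNgramModel("the quick brown fox jumps over the lazy dog and the cat sat on the mat"),
    "hr": CharNgramModel("dobar dan kako ste danas ja sam dobro hvala vidimo se sutra u gradu"),
}
print(identify("the dog and the cat", models))  # en
print(identify("kako ste danas", models))       # hr
```

Because all unseen n-grams share one smoothed count, this profile is not a properly normalized distribution; a conditional n-gram model with better smoothing would be the more principled choice for real use.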
