LING 592: Statistics in Language Technologies

R
  • The Comprehensive R Archive Network
  • G. Jay Kerns (2010) Introduction to Probability and Statistics Using R.

Statistics
  • Chi-square (χ²) table

Other course pages
  • John Hale’s Ling 4476 Statistics for Linguists

Some tools
  • Wolfram Demonstrations Project: Collocation by Chi Square

Python
  • Python.org
  • ActiveState ActivePython 3.x and Komodo Edit
  • Python Natural Language Toolkit (NLTK)
  • TextStat.py by Damir Cavar
  • Dive into Python (free online book)
  • Learning to Program
  • How to Think Like a Computer Scientist: Learning with Python (2nd ed.) (free online book)
  • Thinking in Python (free online book)

Language data and corpora
  • Linguistic Data Consortium
  • Evaluations and Language Resources Distribution Agency (ELDA)
  • European Language Resources Association (ELRA)
  • CHILDES Child Language Data Exchange System
  • see NLTK data…
  • Text resources: Project Gutenberg
  • MICASE: Michigan Corpus of Academic Spoken English
  • Concordancer (lextutor.ca)
  • British National Corpus (BNC)
  • The Corpus of Historical American English (COHA)
  • The Corpus of Contemporary American English (COCA)
  • Digitales Wörterbuch der Deutschen Sprache (DWDS)
  • Croatian Language Corpus (CLC)
  • VisualThesaurus

© 2012 Damir Cavar