LING 479: Language Corpora and Software

Technologies
  • HTML (w3school)
  • XML (w3school)
  • Regular Expressions, Online Regular Expression Tester
  • Unicode, ASCII, Unicode.org
  • Text Encoding Initiative (TEI) and the P5 Guidelines
  • AntConc Corpus Tool
  • OxGarage document conversion (anything to TEI XML)
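
Concordancers such as AntConc display each search hit with its surrounding context (keyword in context, or KWIC), and regular expressions are a common way to specify the search. The sketch below shows the idea in plain Python. It is not AntConc's implementation, and the function name `kwic` and its parameters are hypothetical. It relies on the fact that Python 3 `str` patterns treat `\w` as Unicode-aware, so German forms such as "über" or "Straße" are handled without extra setup.

```python
import re


def kwic(text, pattern, width=20, ignore_case=False):
    """Return keyword-in-context lines for every regex match in `text`.

    Hypothetical helper for illustration: `width` is the number of
    characters of context shown on each side of the match.
    """
    flags = re.IGNORECASE if ignore_case else 0
    lines = []
    for m in re.finditer(pattern, text, flags):
        # Slice the context on both sides, flattening line breaks for display.
        left = text[max(0, m.start() - width):m.start()].replace("\n", " ")
        right = text[m.end():m.end() + width].replace("\n", " ")
        # Right-align the left context so the keywords line up in one column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines


if __name__ == "__main__":
    sample = "Der Hund bellt. Zwei Hunde laufen über die Straße."
    for line in kwic(sample, r"\bhund\w*", width=15, ignore_case=True):
        print(line)
```

The pattern `\bhund\w*` matches both "Hund" and "Hunde", which shows how a single regular expression can gather inflected forms into one concordance.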

Statistics
  • Chi-square (χ²) table
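
A chi-square table gives the critical value that a computed statistic has to exceed. The sketch below assumes a frequent corpus use case: a 2x2 contingency table comparing one word's frequency in two corpora. It computes Pearson's statistic without Yates' correction. The function name is hypothetical. The constant 3.841 is the standard tabulated critical value for df = 1 at p = 0.05.

```python
# Critical value read from a standard chi-square table: df = 1, p = 0.05.
CRITICAL_DF1_P05 = 3.841


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no Yates correction) for the 2x2 table

                   corpus 1   corpus 2
        word           a          b
        other words    c          d

    Hypothetical helper for illustration.
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence: row total * column total / N.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


if __name__ == "__main__":
    # A word seen 50 times in one 10,000-token corpus and 20 times in another.
    stat = chi_square_2x2(50, 20, 10000 - 50, 10000 - 20)
    print(round(stat, 2), stat > CRITICAL_DF1_P05)
```

With these counts the statistic is about 12.90. That exceeds 3.841, so the frequency difference would count as significant at p < 0.05.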

Editors
  • Notepad++ for Microsoft® Windows®
  • jEdit (Java-based)
  • TextWrangler (Mac)
  • oXygen XML editor

Language data and corpora
  • Language Technology Lab Corpora: LTL-corpus, LLC-corpus
  • Gutenberg archive
  • Linguistic Data Consortium
  • Evaluations and Language Resources Distribution Agency (ELDA)
  • European Language Resources Association (ELRA)
  • CHILDES: Child Language Data Exchange System
  • See NLTK data…
  • Text resources: Project Gutenberg
  • MICASE: Michigan Corpus of Academic Spoken English
  • Concordancer (lextutor.ca)
  • British National Corpus (BNC)
  • The Corpus of Historical American English (COHA)
  • The Corpus of Contemporary American English (COCA)
  • Digitales Wörterbuch der deutschen Sprache (DWDS)
  • Croatian Language Corpus (CLC)
  • Visual Thesaurus

© 2012-2013 Damir Cavar