- HTML (W3Schools)
- XML (W3Schools)
- Regular Expressions, Online Regular Expression Tester
- Unicode, ASCII, Unicode.org
- Text Encoding Initiative (TEI) and the P5 Guidelines
- AntConc Corpus Tool
- OxGarage document conversion (anything to TEI XML)
Statistics
Editors
- Notepad++ for Microsoft® Windows®
- jEdit (Java-based)
- TextWrangler (Mac)
- oXygen XML editor
Language data and corpora
- Language Technology Lab Corpora: LTL-corpus, LLC-corpus
- Gutenberg archive
- Linguistic Data Consortium
- Evaluations and Language Resources Distribution Agency (ELDA)
- European Language Resources Association (ELRA)
- CHILDES Child Language Data Exchange System
- see NLTK data…
- Text resources: Project Gutenberg
- MICASE: Michigan Corpus of Academic Spoken English
- Concordancer (lextutor.ca)
- British National Corpus (BNC)
- The Corpus of Historical American English (COHA)
- The Corpus of Contemporary American English (COCA)
- Digitales Wörterbuch der deutschen Sprache (DWDS)
- Croatian Language Corpus (CLC)
- Visual Thesaurus