**Tutorial on**

Computational Modeling of Lexical and Grammatical Knowledge Acquisition using Machine Learning Techniques

by Damir Ćavar

December 2004 at the University of Potsdam

**Abstract**

Modeling different aspects of language acquisition is not only a highly relevant research topic in linguistic theory and psycholinguistic models of language acquisition; it is also extremely important for successful computational linguistic applications that have to cope with variability in language itself. Most linguistic applications face the problems of data sparseness and lexical creativity. The number of new concepts and corresponding lexical labels is growing continuously at a very high pace. This tutorial will introduce aspects of computational modeling in the domain of empirical research on cognitive and psycholinguistic aspects of language acquisition, and show how these models can be applied to real-world problems in applied computational linguistics.

In this tutorial we will discuss different machine learning methods for the acquisition of lexical and grammatical knowledge. The following topics will be covered:

1. Introduction to statistical induction for linguistics

2. Information-theoretic approaches: Entropy, Mutual Information, Relative Entropy

3. Morphological and syntactic analysis and rule induction

4. Support Vector Machines/Vector Space Modeling, Clustering and Classification, Latent Semantic Analysis for lexical induction

All concepts will be introduced both theoretically and through practical examples. The practical examples will be based entirely on Python code.

No programming knowledge is required, nor is detailed knowledge of statistical and information-theoretic concepts.