CSCI-B 659 Topics in Artificial Intelligence - Advanced Natural Language Processing

LING-L 645 Advanced Natural Language Processing

This is the course page for Topics in Artificial Intelligence / Advanced Natural Language Processing (NLP) by Damir Cavar.

Damir Cavar

– August 2023 –

Course Arrangements

Meeting time: MW, 4:55-6:10 PM

Classroom: Ballantine Hall (BH) 343

Course website: Assignments, slides, and other material will be posted on Canvas.

Credits: 3

Instructor: Dr. Damir Cavar

Office: Ballantine Hall (BH) 511

Phone: (812) 856-5094

Office hours (BH 516): Thursdays, 4:15-5:15 PM and by appointment

Course Description

Symbolic, statistical, and neural methods are at the core of Computational Linguistics and Natural Language Processing (NLP) in research and applications. This course introduces advanced techniques for NLP based on statistical modeling and machine learning algorithms, including neural network and Deep Learning approaches, bringing them together with symbolic and knowledge-based systems. We aim to bridge research and insights from language and linguistic disciplines and the application of NLP and linguistic technologies from the computer and information science perspective.

This course will cover fundamental notions in probability and information theory, focusing on the concepts needed for common NLP tasks. We will discuss N-gram models, exemplified by an approach to document classification or Part-of-Speech (PoS) tagging. In the next step, we will extend to probabilistic methods and to sentiment analysis. We will study advanced neural network approaches (Deep Learning) for NLP, used for various speech and language processing tasks.

Additionally, we will cover concrete topics such as information extraction and graph-based knowledge representations used for text classification, natural language understanding, dialog systems or chatbots (so called AIs), or information retrieval, and how to use various NLP methods in the context of such systems. There is space to focus in part on topics of interest related to the choice of concrete applications of NLP methods.

We are discussing advanced hybrid NLP methods, covering symbolic, statistical, and neural network methods in the context of particular tasks. All the methods we use apply to a range of tasks in NLP. The mission is to teach students techniques, algorithms, and existing environments to enable them to develop their own strategies to analyze linguistic phenomena using language data, to apply NLP in the domain of information extraction from unstructured data, or to research in the field of AI, psycholinguistic or cognitive language faculty, verbal behavior, and general speech and language technologies.

Crucial aspects of course outcomes are:

to understand advanced NLP algorithms and methods
to understand the linguistic annotations, analyses, and outputs that these methods generate
to acquire the skills and ability to develop own models and to tune such methods to apply NLP to entirely new problems and research areas

This course provides an essential platform for further work in NLP.

Coding and Computational Experiments

Students are encouraged to bring their laptops or other computational devices to class.

The readings and exercises will be accompanied by practical examples using:

Python 3 (with Numpy, Scipy) (and Cython) with NLTK, spaCy, Stanza, Stanford CoreNLP, Spark NLP, PyTorch
Code using Rust, Java, Scala, Clojure, or other modern programming languages will be accepted and can be used in projects

Schedule

This is a tentative schedule. It is subject to change. Updates and changes will be discussed in class. [JM] refers to the Jurafsky and Martin textnook.

date	topic
08/21/2023	Introduction and Orientation Meeting
08/23/2023	Introduction, MS Ch. 1, JM Ch. 1
08/28/2023	Corpora and Linguistic Annotation, JM Ch. 2, Canvas material
08/30/2023	Text Processing, JM Ch. 2
09/06/2023	Edit Distance, JM Ch. 3
09/11/2023	N-gram models, JM Ch. 3
09/13/2023	N-gram models and statistical analysis, JM Ch. 3
09/18/2023	NLP Technologies and Linguistic Annotation, On Canvas
09/20/2023	NLP Technologies and Linguistic Annotation, On Canvas
09/25/2023	Common NLP Pipelines, On Canvas
09/27/2023	Common NLP Pipelines, On Canvas
10/02/2023	Naïve Byes, Text and Sentiment Classification, JM Ch. 4
10/04/2023	Naïve Byes, Text and Sentiment Classification, JM Ch. 4
10/09/2023	Logistic Regression, JM Ch. 5
10/11/2023	Logistic Regression, JM Ch. 5
10/16/2023	Logistic Regression, JM Ch. 5
10/18/2023	Vector Semantics and Embedding, JM Ch. 6
10/23/2023	Vector Semantics and Embedding, JM Ch. 6
10/25/2023	Vector Semantics and Embedding, JM Ch. 6
10/30/2023	Neural Networks, JM Ch. 7
11/01/2023	Neural Networks, JM Ch. 7
11/06/2023	PoS-tagging and NER, JM Ch. 8
11/08/2023	RNNs and LSTMs, JM Ch. 9
11/13/2023	RNNs and LSTMs, JM Ch. 9
11/15/2023	Transformers and Pretrained Language Models, JM Ch. 10
11/20/2023	Transformers and Pretrained Language Models, JM Ch. 10
11/22/2023	Large Language Models, On Canvas
11/27/2023	Parsing and Semantic Processing, JM Ch. 18, 20, 21
11/29/2023	Graph Models of Knowledge and Semantics, JM Ch. 18, 20, 21
12/04/2023	Graph Models of Knowledge and Semantics, On Canvas
12/06/2023	Relation and Event Extraction, JM Ch. 21
12/11/2023	Project Presentations
12/13/2023	Project Presentations

Literature

Main textbook

We will be using the most recent 3rd edition of the textbook and additional material shared on Canvas.

Jurafsky, Dan and James H. Martin (2008) Speech and Language Processing. 2nd ed. The 3rd edition is available online (https://web.stanford.edu/~jurafsky/slp3/).

Additional books and articles

Manning, Christopher and Schütze, Hinrich (1999) Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Bird, Steven, Ewan Klein, Edward Loper (2009) Natural Language Processing with Python. O’Reilly Media.
Hogg, Robert V., Elliot A. Tanis (2001) Probability and statistical inference. 6th ed. Upper Saddle River, NJ: Prentice Hall.
Goldberg, Yoav (2015) A Primer on Neural Network Models for Natural Language Processing.

General Information and Notes

Participation in the NLP-Lab Projects

Students are welcome to participate in NLP-Lab meetings and projects after consultation with the instructor. See for more details: https://nlp-lab.org/

Disclaimer

This syllabus is subject to change and likely will change. All critical changes will be made in writing, with ample time for adjustment.