Damir Cavar's Homepage

Damir Cavar is a Natural Language Processing, AI, and Knowledge Representation scientist

CSCI-B 659 Topics in Artificial Intelligence - Advanced Natural Language Processing

LING-L 645 Advanced Natural Language Processing

This is the course page for Topics in Artificial Intelligence / Advanced Natural Language Processing (NLP) by Damir Cavar.

Damir Cavar

– August 2023 –


Course Arrangements

Meeting time: MW, 4:55-6:10 PM

Classroom: Ballantine Hall (BH) 343

Course website: Assignments, slides, and other material will be posted on Canvas.

Credits: 3

Instructor: Dr. Damir Cavar

Office: Ballantine Hall (BH) 511

Phone: (812) 856-5094

Office hours (BH 516): Thursdays, 4:15-5:15 PM and by appointment

Course Description

Symbolic, statistical, and neural methods are at the core of Computational Linguistics and Natural Language Processing (NLP) in research and applications. This course introduces advanced techniques for NLP based on statistical modeling and machine learning algorithms, including neural network and Deep Learning approaches, bringing them together with symbolic and knowledge-based systems. We aim to bridge research and insights from language and linguistic disciplines and the application of NLP and linguistic technologies from the computer and information science perspective.

This course will cover fundamental notions in probability and information theory, focusing on the concepts needed for common NLP tasks. We will discuss N-gram models, exemplified by an approach to document classification or Part-of-Speech (PoS) tagging. In the next step, we will extend to probabilistic methods and to sentiment analysis. We will study advanced neural network approaches (Deep Learning) for NLP, used for various speech and language processing tasks.
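As a rough illustration of the kind of N-gram model mentioned above, the following minimal sketch estimates bigram probabilities with add-one (Laplace) smoothing and scores a sentence by its log probability. The toy corpus and the function names (`train_bigram_model`, `bigram_prob`, `sentence_logprob`) are assumptions made for this example only; they are not taken from the course material.

```python
from collections import Counter
import math


def train_bigram_model(sentences):
    """Count unigrams and bigrams over tokenized sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one (Laplace) smoothed estimate of P(word | prev)."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def sentence_logprob(tokens, unigrams, bigrams):
    """Sum of log bigram probabilities over the padded token sequence."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log(bigram_prob(prev, word, unigrams, bigrams))
        for prev, word in zip(padded, padded[1:])
    )


# Hypothetical toy corpus, already tokenized (an assumption for illustration).
corpus = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["a", "cat", "eats"],
]
unigrams, bigrams = train_bigram_model(corpus)

seen = sentence_logprob(["the", "cat", "sleeps"], unigrams, bigrams)
scrambled = sentence_logprob(["sleeps", "cat", "the"], unigrams, bigrams)
print(f"log P(seen order)      = {seen:.3f}")
print(f"log P(scrambled order) = {scrambled:.3f}")
```

Under this sketch, a word order observed in the corpus scores higher than a scrambled one. The same counts could serve as features for document classification or as transition estimates in a simple PoS tagger, although a realistic system would need a larger corpus and a better smoothing method.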

Additionally, we will cover concrete topics such as information extraction and graph-based knowledge representations used for text classification, natural language understanding, dialog systems or chatbots (so-called AIs), and information retrieval, and how to use various NLP methods in the context of such systems. There is room to focus in part on topics of interest, depending on the choice of concrete applications of NLP methods.

We will discuss advanced hybrid NLP methods, covering symbolic, statistical, and neural network methods in the context of particular tasks. All the methods we use apply to a range of tasks in NLP. The aim is to teach students techniques, algorithms, and existing environments that enable them to develop their own strategies to analyze linguistic phenomena using language data, to apply NLP to information extraction from unstructured data, or to conduct research in AI, psycholinguistics, the cognitive language faculty, verbal behavior, and general speech and language technologies.

Crucial aspects of course outcomes are:

This course provides an essential platform for further work in NLP.

Coding and Computational Experiments

Students are encouraged to bring their laptops or other computational devices to class.

The readings and exercises will be accompanied by practical examples using:

Schedule

This is a tentative schedule and is subject to change. Updates and changes will be discussed in class. [JM] refers to the Jurafsky and Martin textbook.

date topic
08/21/2023 Introduction and Orientation Meeting
08/23/2023 Introduction, MS Ch. 1, JM Ch. 1
08/28/2023 Corpora and Linguistic Annotation, JM Ch. 2, Canvas material
08/30/2023 Text Processing, JM Ch. 2
09/06/2023 Edit Distance, JM Ch. 3
09/11/2023 N-gram models, JM Ch. 3
09/13/2023 N-gram models and statistical analysis, JM Ch. 3
09/18/2023 NLP Technologies and Linguistic Annotation, On Canvas
09/20/2023 NLP Technologies and Linguistic Annotation, On Canvas
09/25/2023 Common NLP Pipelines, On Canvas
09/27/2023 Common NLP Pipelines, On Canvas
10/02/2023 Naïve Bayes, Text and Sentiment Classification, JM Ch. 4
10/04/2023 Naïve Bayes, Text and Sentiment Classification, JM Ch. 4
10/09/2023 Logistic Regression, JM Ch. 5
10/11/2023 Logistic Regression, JM Ch. 5
10/16/2023 Logistic Regression, JM Ch. 5
10/18/2023 Vector Semantics and Embedding, JM Ch. 6
10/23/2023 Vector Semantics and Embedding, JM Ch. 6
10/25/2023 Vector Semantics and Embedding, JM Ch. 6
10/30/2023 Neural Networks, JM Ch. 7
11/01/2023 Neural Networks, JM Ch. 7
11/06/2023 PoS-tagging and NER, JM Ch. 8
11/08/2023 RNNs and LSTMs, JM Ch. 9
11/13/2023 RNNs and LSTMs, JM Ch. 9
11/15/2023 Transformers and Pretrained Language Models, JM Ch. 10
11/20/2023 Transformers and Pretrained Language Models, JM Ch. 10
11/22/2023 Large Language Models, On Canvas
11/27/2023 Parsing and Semantic Processing, JM Ch. 18, 20, 21
11/29/2023 Graph Models of Knowledge and Semantics, JM Ch. 18, 20, 21
12/04/2023 Graph Models of Knowledge and Semantics, On Canvas
12/06/2023 Relation and Event Extraction, JM Ch. 21
12/11/2023 Project Presentations
12/13/2023 Project Presentations

Literature

Main textbook

We will be using the most recent 3rd edition of the textbook and additional material shared on Canvas.

Additional books and articles

General Information and Notes

Participation in the NLP-Lab Projects

Students are welcome to participate in NLP-Lab meetings and projects after consultation with the instructor. For more details, see https://nlp-lab.org/

Disclaimer

This syllabus is subject to change and likely will change. All critical changes will be made in writing, with ample time for adjustment.


(C) 2024 by Damir Cavar