CSCI-B 659 Topics in Artificial Intelligence

LING-L 715 Seminar on Knowledge Graphs, Large Language Models, and Graph-based Reasoning using Agentive AI models

This is the course page for Topics in Artificial Intelligence / Seminar on Knowledge Graphs, Large Language Models, and Graph-based Reasoning using Agentive AI models by Damir Cavar.

Introduction

Sections: 10343 or 10496

Instructor: Assoc. Prof. Dr. Damir Cavar

Contact: email, phone

Office hours: Thursday 1-2 PM and by arrangement

Office: Ballantine Hall (BH) 511

Meeting time: Tuesday and Thursday, 2:20 - 3:35 PM

Course website: Assignments, slides, and other material will be posted on Canvas.

Credits: 3

Seminar description

Large Language Models (LLMs) such as ChatGPT, Gemini, or Claude 4 demonstrate a high level of sophistication in the language and image processing domain. One of the problematic issues with such models is hallucination: LLMs tend to generate plausible and well-formulated text, references, and data that do not correspond to factual knowledge. Various proposals discuss how approved and valid knowledge can be added to LLMs to minimize hallucinations and to update their knowledge without frequently retraining them on the newest content. LLMs seem to provide superior capabilities to process or generate natural language and, to a limited extent, to reason over utterances and claims. LLMs also seem able to reason over temporal and event logic in limited ways. We want to combine these capabilities with formal knowledge representations.

Knowledge Graphs, Ontologies, and related representations of semantic properties and concept relations facilitate efficient graph-based storage for processing meaning via entailment and concept hierarchies. These technologies enable the specification of semantic relations and constraints for a precise representation of core aspects of natural language semantics, including some possibilities to reason over data and relations. Factual knowledge stored in Knowledge Graphs enables processing descriptions of the world or of a specific knowledge domain. Knowledge Graphs are, however, limited in their ability to represent complex events and changes of entities and situations over time.
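As a rough illustration of entailment over a concept hierarchy, the following minimal Python sketch checks whether one concept is subsumed by another by following subclass links. The concept names and the SUBCLASS_OF mapping are hypothetical and chosen only for illustration; a real ontology would typically use OWL and a reasoner instead.

```python
# Toy concept hierarchy: each concept points to its direct superclass.
# All names here are hypothetical examples, not course data.
SUBCLASS_OF = {
    "Dog": "Mammal",
    "Cat": "Mammal",
    "Mammal": "Animal",
}


def entails(concept: str, ancestor: str) -> bool:
    """Return True if `concept` equals `ancestor` or is a transitive subclass of it."""
    seen = set()
    current = concept
    # Walk up the hierarchy; `seen` guards against accidental cycles.
    while current is not None and current not in seen:
        if current == ancestor:
            return True
        seen.add(current)
        current = SUBCLASS_OF.get(current)
    return False


print(entails("Dog", "Animal"))
print(entails("Animal", "Dog"))
```

The first call follows Dog, Mammal, Animal and succeeds; the second fails because subclass links are only followed upward.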

This seminar consists of a series of experiments that test and explore:

The reasoning capabilities of Large Language Models.
The possibility of integrating Knowledge Graphs and Description Logic ontologies into LLMs for looking up facts and reasoning about states.
The generation of Knowledge Graphs or Ontologies from different data sources, e.g., structured data, unstructured text, or images.
The architecture of a dynamic graph representation for the storage of event semantics and event unfolding along the time axis.
Orchestration of LLMs in Agentive AI architectures and intelligent systems to improve computational reasoning.

We will look at implementations of LLMs and experiment with integrating Knowledge Graphs in such LLMs. In addition to that, we will experiment with approaches to generate knowledge representations from structured and unstructured sources, providing access to such models via LLMs.

We will discuss, implement, and experiment with general techniques to map knowledge from unstructured sources (text, speech, image, sensory data) to graph representations:

Entities and relations
Events and unfolding of events as graph transformations
Temporal relations (sequencing) and event durations

We use graphs as symbolic knowledge representations (or Knowledge Graphs) with RDF, JSON-LD, and OWL backends, as well as probabilistic and dynamic networks in hybrid models (symbolic and neural). Knowledge extraction becomes considerably more complex when it also involves processing implicatures and presuppositions and representing them in graph models.
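One possible way to picture entities, relations, and events as graph transformations with temporal extent is the plain-Python sketch below. It is not the seminar's implementation: the Event class, the integer time steps, and the example triples are all assumptions made for illustration, and a real system would more likely use an RDF or property-graph backend.

```python
from dataclasses import dataclass

# A relation between two entities, written as (subject, predicate, object).
Triple = tuple[str, str, str]


@dataclass(frozen=True)
class Event:
    """Hypothetical event: a graph transformation with a start and end time step."""
    label: str
    start: int
    end: int
    add: frozenset[Triple] = frozenset()
    remove: frozenset[Triple] = frozenset()

    @property
    def duration(self) -> int:
        return self.end - self.start


def apply_event(graph: set[Triple], event: Event) -> set[Triple]:
    """Return a new graph with the event's removals and additions applied."""
    return (graph - event.remove) | event.add


def state_at(initial: set[Triple], events: list[Event], t: int) -> set[Triple]:
    """Replay, in order of completion, all events that have ended by time step t."""
    graph = set(initial)
    for event in sorted(events, key=lambda e: e.end):
        if event.end <= t:
            graph = apply_event(graph, event)
    return graph


# Illustrative data only.
initial = {("Alice", "locatedIn", "Bloomington")}
move = Event(
    label="move",
    start=1,
    end=3,
    add=frozenset({("Alice", "locatedIn", "Chicago")}),
    remove=frozenset({("Alice", "locatedIn", "Bloomington")}),
)

print(state_at(initial, [move], 2))
print(state_at(initial, [move], 3))
print(move.duration)
```

In this sketch the graph at time step 2 is still the initial state, because the move event has not yet completed; from time step 3 on, the location triple has been replaced.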

Our goals are to (a) gain a deep understanding of the mapping from unstructured information (e.g., language, vision) to high-precision graph-based knowledge representations, (b) generate implicatures and presuppositions from both in order to extend logical reasoning capabilities, and (c) explore the limits of hybrid AI and Machine Learning methods on symbolic and probabilistic/dynamic Knowledge Graphs using various approaches to Graph Embeddings, with different graph and Graph Neural Network algorithms. Integrating sophisticated knowledge representations in an LLM environment can significantly benefit AI systems and provide new, reliable reasoning capabilities for data and information in various domains, e.g., medical, cybersecurity, or scientific writing.

Learning Outcomes

The key learning outcomes of the course are:

Understand machine learning, computational semantics, and Natural Language Processing (NLP)
Understand symbolic knowledge representations and computational reasoning
Understand the linguistic annotations, analyses, and outputs that LLMs and ontologies generate
Acquire the skills and ability to develop your own models and to tune such methods to apply NLP to entirely new problems and research areas
Understand how Large Language Models and Generative AI work
Understand orchestration of LLMs and Agentive AI architectures with integrated RAGs and Knowledge Representations
Reinforce concepts of programming in Python
Learn to apply well-documented scientific libraries in Python

This course provides an essential platform for further work with LLMs, NLP, and AI.

Assessment

Grades are based on the following schema:

45% Written reports (7 assigned)
10% Class participation
40% Final project – written report
5% Final project – oral report

Major due dates

Note to students: these dates are subject to change with adequate notification via Canvas announcements.

Written reports: due most Fridays
Final project oral report: Tuesday, Dec. 09 and Thursday, Dec. 11 (in lecture)
Final project - written report: Thursday, Dec. 18

Required materials

There is no required textbook in this seminar.

A recommended textbook is available online free of charge:

Jurafsky, Dan and James H. Martin (2008) Speech and Language Processing. 2nd ed. The 3rd edition is available online (https://web.stanford.edu/~jurafsky/slp3/).

Additional books and publications recommended as course reference materials will be supplied in class.

No software purchases are required; the course uses free open source software (Python and GitHub).

Students are expected to supply a personal working laptop computer with permissions to install and use the course software.

Relevant Literature

Knowledge Graph and Ontology

Barrasa, Jesus and Jim Webber (2023) Building Knowledge Graphs. O’Reilly Media, Inc.
Jakus, Grega, and Veljko Milutinović, Sanida Omerović, Sašo Tomažič (2013) Concepts, Ontologies, and Knowledge Representation. Springer New York, NY.
Staab, Steffen, and Rudi Studer (2009) Handbook on Ontologies. Springer Berlin, Heidelberg
Keet, C. Maria (2020) An Introduction to Ontology Engineering.
Tutorial: Build a Knowledge Graph using NLP and Ontologies

Large Language Model

Wolfram, Steven (2023) What Is ChatGPT Doing … and Why Does It Work?
Pan, Shirui and Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, Xindong Wu (2023) Unifying Large Language Models and Knowledge Graphs: A Roadmap.
Radford, Alec, and Karthik Narasimhan, Tim Salimans, Ilya Sutskever (2019) Improving Language Understanding by Generative Pre-Training
Floridi, Luciano and Massimo Chiriatti (2020) GPT-3: Its Nature, Scope, Limits, and Consequences. In Minds and Machines, 30:681–694
Thoppilan, R. et al. (2022) LaMDA: Language Models for Dialog Applications.
Neo4j tutorial.

Natural Language Processing

Jurafsky, Dan and James H. Martin (2008) Speech and Language Processing. 2nd ed. The 3rd edition 2023 (or later) is available online (https://web.stanford.edu/~jurafsky/slp3/).

Disclaimer

This syllabus is subject to change and likely will change. All important changes will be made in writing, with ample time for adjustment.