CSCI-B 659 (Class # 12560) Topics in Artificial Intelligence

LING-L 715 Seminar on Knowledge Graphs, Large Language Models, and Graph-based Reasoning

This is the course page for Topics in Artificial Intelligence / Seminar on Knowledge Graphs, Large Language Models, and Graph-based Reasoning by Damir Cavar.

Introduction

Meeting time: MW, 1:15-2:30 PM

Classroom: Public Health (PH) 10

Course website: Assignments, slides, and other material will be posted on Canvas.

Credits: 3

Instructor: Dr. Damir Cavar

Office: Ballantine Hall (BH) 511

Phone: (812) 856-5094

Office hours (BH 516): Thursdays 4:15-5:15 PM and by appointment

Seminar description

Natural Language Processing (NLP) technologies enjoyed significant hype due to the stunning abilities of Large Language Models (LLMs) demonstrated by ChatGPT and GPT4. One of the problematic issues with such models is hallucination. LLMs tend to generate plausible and well-formulated text, references, and data that do not correspond to factual knowledge. Various proposals discuss how approved and valid knowledge can be added to LLMs to minimize hallucinations. On the other hand, LLMs seem to provide superior capabilities to process or generate natural language and, to a limited extent, reason over utterances and claims. LLMs also seem able to reason over temporal and event logic in limited ways—these capabilities we want to combine with formal knowledge representations.

Knowledge Graphs, Ontologies, and related representations of semantic properties and concept relations facilitate efficient graph-based storage for processing meaning via entailment and concept hierarchies. Those technologies enable the specification of semantic relations and limitations for precise representation of core aspects of natural language semantics. This includes some possibilities to reason over data and relations. Factive knowledge stored in Knowledge Graphs enables processing descriptions of the world or a specific knowledge domain. Knowledge Graphs are limited in keeping complex event representations and changes of entities and situations over time.

This seminar consists of a series of experiments to test and experiment with:

The reasoning capabilities of Large Language Models.
The possibility to integrate Knowledge Graphs into LLMs for looking up facts and reasoning about states.
The generation of Knowledge Graphs from different data sources, e.g., structured data, unstructured text, or images.
The architecture of a dynamic graph representation for storage of event semantics and event unfolding along the time axis.

We will look at implementations of LLMs and experiment with integrating Knowledge Graphs in such LLMs. In addition to that, we will experiment with approaches to generate knowledge representations from structured and unstructured sources, providing access to such models via LLMs.

We are discussing, implementing, and experimenting with general techniques to map knowledge from unstructured sources (text, speech, image, sensory data) to graph representations:

Entities and relations
Events and unfolding of events as graph transformations
Temporal relations (sequencing) and event durations

We use graphs as symbolic knowledge representations (or Knowledge Graphs) with RDF, JSON-LD, OWL backends, as well as probabilistic and dynamic networks in hybrid models (symbolic and neural). The complexity of knowledge extraction becomes much higher when including processing implicatures and presuppositions and representing those in graph models.

Our goal is to a.) gain a deep understanding of the mapping from unstructured information (e.g., language, vision) to high-precision graph-based knowledge representations to b.) generate implicatures and presuppositions from both to be able to extend the logical reasoning capabilities, to c.) explore the limits of hybrid AI and Machine Learning methods on symbolic and probabilistic/dynamic Knowledge Graphs using various approaches to Graph Embeddings, with different graph and Graph Neural Network algorithms. Integrating sophisticated knowledge representations in an LLM environment can facilitate AI systems significantly and provide new reliable reasoning capabilities for data and information in various domains, e.g., medical, cybersecurity, or scientific writing.

Literature

Knowledge Graph and Ontology

Barrasa, Jesus and Jim Webber (2023) Building Knowledge Graphs. O’Reilly Media, Inc.
Jakus, Grega, and Veljko Milutinović, Sanida Omerović, Sašo Tomažič (2013) Concepts, Ontologies, and Knowledge Representation. Springer New York, NY.
Staab, Steffen, and Rudi Studer (2009) Handbook on Ontologies. Springer Berlin, Heidelberg
Keet, C. Maria (2020) An Introduction to Ontology Engineering.
Tutorial: Build a Knowledge Graph using NLP and Ontologies

Large Language Model

Wolfram, Steven (2023) What Is ChatGPT Doing … and Why Does It Work?
Pan, Shirui and Linhao Luo, Yufei Wang, Chen Chen, Jiapu Wang, Xindong Wu (2023) Unifying Large Language Models and Knowledge Graphs: A Roadmap.
Radford, Alec, and Karthik Narasimhan, Tim Salimans, Ilya Sutskever (2019) Improving Language Understanding by Generative Pre-Training
Floridi, Luciano and Massimo Chiriatti (2020) GPT-3: Its Nature, Scope, Limits, and Consequences. In Minds and Machines, 30:681–694
Thoppilan R. et al. (2022) LaMDA: Language Models for Dialog Applications. Neo4j tutorial.

Natural Language Processing

Jurafsky, Dan and James H. Martin (2008) Speech and Language Processing. 2nd ed. The 3rd edition 2023 (or later) is available online (https://web.stanford.edu/~jurafsky/slp3/).

Disclaimer

This syllabus is subject to change and likely will change. All important changes will be made in writing, with ample time for adjustment.