Research focus: Computational Semantics and Pragmatics
I am working on algorithms and formal models for computational semantics and pragmatics. The challenges include:
- Extraction of core semantic concepts and relations, that is, Entities and Relations, from unstructured text.
- Mapping these Entities and Relations into (Knowledge) Graph representations.
- Analyzing hidden properties and relations using formal analyses of Implicatures and Presuppositions.
For the extraction of semantic concepts and relations I have developed an Open Information Extraction
platform that is based on various NLP components to analyze words, clauses, sentences, and paragraphs of
text, and to detect core semantic relations that represent a predicate and its arguments. In simple terms,
the task is to analyze input like:
- (...) It's a country that has the second largest global defense budget (...)
and map it to:
- China --> has --> second largest global defense budget
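The mapping from a clause to a predicate-argument triple can be sketched in a few lines of Python. This is a toy illustration over an already POS-tagged sentence, not the actual platform, which relies on full parsers, NER, and anaphora resolution; the function name and the flat verb-splitting heuristic are assumptions for the sketch:

```python
# Toy sketch of predicate-argument extraction over a POS-tagged sentence.
# Real OpenIE uses dependency parses; here we simply split the token
# sequence at the verb group: tokens before it form the subject,
# tokens after it form the object.

def extract_triple(tagged):
    """Return a (subject, predicate, object) triple or None."""
    verb_idx = [i for i, (_, pos) in enumerate(tagged) if pos == "VERB"]
    if not verb_idx:
        return None
    start, end = verb_idx[0], verb_idx[-1]
    subj = " ".join(w for w, _ in tagged[:start])
    pred = " ".join(w for w, _ in tagged[start:end + 1])
    obj = " ".join(w for w, _ in tagged[end + 1:])
    return (subj, pred, obj)

sent = [("China", "PROPN"), ("has", "VERB"),
        ("the", "DET"), ("second", "ADJ"), ("largest", "ADJ"),
        ("global", "ADJ"), ("defense", "NOUN"), ("budget", "NOUN")]
print(extract_triple(sent))
# -> ('China', 'has', 'the second largest global defense budget')
```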
It is easy to see how such relations, once extracted from unstructured text, can be mapped into graph
representations using various NLP components, for example constituent parsers, dependency parsers, named
entity recognition, and anaphora resolution. Thus, the sequence of base sentences:
John Smith sold Grungle Inc. He bought Sproke LLC. Sproke LLC likes him. He likes apples.
would be rendered as a graph in the following way:
In fact, my current OpenIE system parses the sentences and renders this representation in Neo4J.
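One way to load such triples into Neo4j is to emit Cypher MERGE statements. The sketch below is illustrative only: the generic `Entity` label and the uppercased relation types are assumptions, and the actual system's graph schema may differ. Note that anaphora resolution is assumed to have already replaced "He" and "him" with "John Smith":

```python
# Sketch: turning extracted (subject, relation, object) triples into
# Cypher MERGE statements for Neo4j. Labels and relationship-type
# naming are illustrative, not the system's actual schema.

def triple_to_cypher(subj, rel, obj):
    """Build one Cypher statement that merges both nodes and the edge."""
    rel_type = rel.upper().replace(" ", "_")
    return (f"MERGE (a:Entity {{name: '{subj}'}}) "
            f"MERGE (b:Entity {{name: '{obj}'}}) "
            f"MERGE (a)-[:{rel_type}]->(b)")

triples = [("John Smith", "sold", "Grungle Inc."),
           ("John Smith", "bought", "Sproke LLC"),
           ("Sproke LLC", "likes", "John Smith")]
for t in triples:
    print(triple_to_cypher(*t))
```

Each statement could then be sent to the database via the official Neo4j driver's session API.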
My interest, among others, is in:
- Identifying the most specific semantic type of every entity using general knowledge graphs, ontologies,
or other forms of encoded world knowledge. In the graph above, what we want to achieve is a more
detailed typing of the organisations Grungle Inc. and Sproke LLC, as well as a more
detailed specification of apple not as a THING, but as a FRUIT. I use embeddings
and vector models to identify the most specific hypernym for a concept in knowledge graphs like YAGO or
similar resources.
- Expanding the knowledge or concept graph using the analysis of implicatures and presuppositions
for every utterance or sentence.
- Implicatures, for example, can result from conversations. When, in a
conversation, speaker A says to speaker B: "I bought the black dress.", we might
immediately conclude that A and B already talked about the event of buying a dress
earlier, in fact, about the one black dress. We know this because speaker A used a
definite article with the direct object. We also know that speaker A had more than one dress to
choose from and that none of the other dresses is black. We can derive this from the specificity
properties of the direct object noun phrase.
- Presuppositions are all the relevant facts or propositions that have to be true if a
particular utterance or sentence is assumed to be true. For example, if some speaker A
says: "Susan is walking her dog.", we can conclude various other facts that must be
true, for example: "Susan owns or possesses a dog.", "Susan is walking at the time of the
utterance.", "Susan's dog is walking at the time of the utterance.", and so on. We can
compute the presuppositions and represent them for further analysis and study as additional concept
relations together with the extracted ones.
- Analyzing the temporal logic and sequence of events, and causalities. This has to do with analyzing the
tense of a sentence or clause, and its scope over other clauses and predicative relations. For example,
in a sentence like "John said that Peter bought a new car." the two events are ordered in time,
that is "John say X" is preceded by "Peter buy car". This is not true in a sentence like "John
said that Peter will buy a new car.". The goal is to order all relevant events extracted from
unstructured text along the time axis and provide a flow-analysis of the event sequence.
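The tense-based event ordering described in the last point can be sketched as a toy rule. The tense labels are assumed to be supplied by a parser; the function and the event notation are illustrative assumptions, not the actual system's temporal logic:

```python
# Sketch: ordering a matrix event and an embedded event by the tense of
# the embedded clause. "John said that Peter bought a new car" puts the
# embedded past-tense event before the saying event; a future-tense
# complement ("will buy") follows it instead.

def order_events(matrix, embedded):
    """Return the two events as an (earlier, later) pair.
    Each event is a (description, tense) pair."""
    event_m, _ = matrix
    event_e, tense_e = embedded
    if tense_e == "past":
        return (event_e, event_m)   # embedded event happened first
    return (event_m, event_e)       # non-past complement follows

print(order_events(("John say X", "past"), ("Peter buy car", "past")))
# -> ('Peter buy car', 'John say X')
print(order_events(("John say X", "past"), ("Peter buy car", "future")))
# -> ('John say X', 'Peter buy car')
```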
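The idea of representing presuppositions as additional concept relations, mentioned above, can likewise be illustrated with a toy rule. The trigger condition and the derived facts below are hand-written assumptions for the "Susan is walking her dog" example; real presupposition triggers (definites, possessives, aspect) would be computed from the parse:

```python
# Sketch: expanding one extracted relation with its presuppositions via
# a single hand-written trigger rule. The derived triples can then be
# stored in the graph alongside the extracted ones.

def presuppositions(subj, pred, obj):
    """Return presupposed triples for one extracted triple."""
    derived = []
    if pred == "walks" and obj.endswith("dog"):
        derived.append((subj, "owns_or_possesses", "a dog"))
        derived.append((subj, "is_walking_with", obj))
    return derived

extracted = ("Susan", "walks", "her dog")
graph = [extracted] + presuppositions(*extracted)
for triple in graph:
    print(triple)
```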
Analyzing Implicatures and Presuppositions, the semantics and pragmatics of text, requires not only modeling
of general and universal linguistic properties, but also domain specific knowledge. To analyze very specific
semantic properties in the medical domain, for example, I utilize knowledge representations and ontologies
from these specific domains. One example could be that we could conclude that person A has headache
or uses blood thinners for some reason, if she says or writes that the price for Aspirin went
up. Such implicatures are generated by looking up properties of specific medicine or chemicals from
domain specific ontologies, taxonomies, or knowledge representations.
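Such an ontology lookup can be sketched as follows. The in-memory dictionary merely stands in for a real medical ontology or taxonomy, and its entries, the relation names, and the function are illustrative assumptions:

```python
# Sketch: generating domain-specific implicatures from a toy drug
# ontology. Mentioning a drug (e.g. its price going up) weakly
# implicates that the speaker uses it for one of its known
# indications or effects.

DRUG_ONTOLOGY = {
    "Aspirin": {"indications": ["headache", "pain"],
                "effects": ["blood thinning"]},
}

def implicatures_from_mention(person, drug):
    """Return weak, hedged implicature triples for a drug mention."""
    props = DRUG_ONTOLOGY.get(drug, {})
    facts = []
    for indication in props.get("indications", []):
        facts.append((person, "possibly_has", indication))
    for effect in props.get("effects", []):
        facts.append((person, "possibly_uses_drug_for", effect))
    return facts

for fact in implicatures_from_mention("Person A", "Aspirin"):
    print(fact)
```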
This kind of OpenIE requires much more sophisticated NLP components than common free tools and pipelines
provide. The experiments involve processing of business, legal, or medical documents, as well as news texts.
My current experimental environment consists of microservices based on C++, Java, Go, and Python
components coupled in an RPC-based and RESTful architecture. Given suggestions, advice, and initial
experiments, I am working towards a more homogeneous environment, mostly based on Java 10 services
and a RESTful architecture.
If you are interested in working on such an OpenIE and Graph-based system and
joining my research group, please contact me.
There are various GitHub repositories that I maintain with material
and code that explain the project. See my GitHub and Bitbucket pages for more details.