I organize a research lab, the NLP-Lab at Indiana University. The lab usually meets four times per week. The times and other details are available on the website and the Slack channel.
HooSIER
The HooSIER Semantic Information ExtractoR is a system that maps text to semantic graphs or knowledge graphs, using Deep NLP and
Deep Graph Processing algorithms to learn probabilistic Knowledge Graphs from unstructured text and, in
the next phase, from visual or multi-modal input. This project is part of my NLP-Lab.
The HooSIER project subsumes the engineering and standardization effort
related to JSON-NLP and the engineering of a
big-data-capable NLP platform of high-performance Microservices for Deep NLP,
Knowledge Graphs, and Natural Language Semantics and Pragmatics.
This project is related to my interests in High-Performance Computing for NLP: HPNLP.org.
Hate Speech and anti-Semitism in Social Media
Corpus annotation and creation of AI and NLP technologies for the detection and annotation of hate speech and anti-Semitic content in text and images.
This is a joint project with Günther Jikeli at
Indiana University, sponsored, among others, by the Office of the Provost for Research
at Indiana University.
NLP in Legal
Natural Language Processing of Legal Documents, Information Extraction (IE), Mapping of Text to Knowledge Graph,
Semantic Search over legal documents.
Business Document Mining and Semantic Web
Collaborative research grant with Prof. Matthew Josefy from the Kelley School of Business
at Indiana University on technologies for Deep Linguistic NLP for mining SEC reports, network
mapping of people and firms, and risk management analyses. Funded by the Office of the Vice Provost for Research at Indiana University.
Joint fellowship with Prof. Josefy from the Kelley School of Business at Indiana University: Fellows of the National Center for the Middle Market at the Fisher College of Business, The Ohio State University. The project is on extraction of business data from SEC reports of Middle Market firms.
Legacy projects that are still maintained:
GORILLA (Global Open Resources and Information for Language and Linguistic Analysis), Speech and Language
Resources, Corpora, Speech Recognition and related technologies for low-resourced languages: Burmese, Chatino, Croatian, Yiddish, ... See also: GORILLA
The Croatian Language Corpus (CLC) is a joint project with the Institute for Croatian Language and Linguistics, as part of the program
"Croatian Online Language Repository", in cooperation with Dunja Brozović-Rončević, Małgorzata E. Ćavar, and Tomislav Stojanov. The CLC
is a text corpus of Croatian literature, newspapers, and other genres, encoded in XML on the basis of the TEI P5 standard and made available online
via the Philologic interface. Additional interfaces are currently
being developed and tested to improve online usability and the user experience of
working with the corpus. The corpus is being annotated phonemically and morphologically, and parsed
syntactically. We ported an initial hand-crafted morphological analyzer to XFST, and we are
working on a Croatian LFG
grammar for XLE for syntactic parsing and
functional markup. An extended search interface that allows for online retrieval of linguistic annotations and
structures at these linguistic levels will be provided in the near future.
Applied Technology for Language-Aided CMS (ATLAS)
Work package leader (until September 2010): Damir Cavar (Croatian Language Processing Chain and Multilingual Document Classification)
Funded under: The Information and Communication Technologies Policy Support Programme
Area: CIP-ICT-PSP.2009.5.3 - Multilingual Web: Multilingual Web content management: methods, tools and processes
Total cost: €3.32m; EU contribution: €1.66m; Project reference: 250467; Execution: From 01/03/2010 to 28/02/2013; Project status: Running
In cooperation with: Pavle Valerijev (University of Zadar), Franjo Pehar (University of Zadar), Damir Kero (University of Zadar), Drahomira Gavranović (University of Zadar), Malgorzata E. Cavar (University of Zadar)
Consortium: Tetracom Interactive Solutions (Tetracom), Coordinator; Deutsches Forschungszentrum für Künstliche Intelligenz GmbH (DFKI); Instytut Podstaw Informatyki Polskiej Akademii Nauk (ICS PAS); Atlantis Consulting SA (Atlantis); University Alexandru Ioan Cuza (UAIC); Institute for Bulgarian Language (IBL DCL); Institute of Technologies and Development Foundation (ITD); University of Hamburg (UHH); University of Zadar (UniZD)
ABUGI (Alignment based grammar induction)
Unsupervised Grammar Induction; with Joshua Herring, Toshikazu Ikuta, Paul Rodrigues, Giancarlo Schrementi,
Linguistics Dept., Indiana University.
A Quantitative Model of Contact-Induced Language Change... with a Focus on Pidginization and Creolization
This grant was funded in 2005 and 2006
by the FRSP Award program, Linguistics Dept., Indiana University.
Caddoan Languages Documentation Project
Award Number: 0421838; Principal
Investigator: Douglas Parks; Co-Principal Investigators: Wallace Hooper, Damir Cavar; Organization: Indiana
University; NSF Organization: BCS; Award Date: 07/15/2004; Award Amount: $324,999.00