ML with Domain-Specific Ontology for IT Security Industry
2023-06-19, Palais Atelier

The BSI provides current data on acute IT threat situations. We developed a system for detecting threats: crawling, automatic analysis with NER and NEL, provision and use of dedicated tools for evaluating


The BSI monitors and assesses the current IT security situation and its long-term changes. This includes, for example, hacker groups or newly discovered security vulnerabilities. For this purpose, various news sources are monitored and important information is extracted to identify current trends and gain an overview.

To optimize this process, we are working with the BSI to develop a system that supports the work by automatically analyzing documents with methods such as Named Entity Recognition (NER) and Named Entity Linking (NEL). NER uses machine learning to assign text passages to predefined classes (e.g., "browser" to software), whereas NEL maps them to concrete entities of an ontology (e.g., "DOS" to "Disk Operating System"). We explain how we deal with the particular challenge of conceptual ambiguities ("DOS" stands not only for "Disk Operating System" but also for "Denial of Service"). The talk gives insight into our entity recognition system and shows how combining ontology and machine learning yields a powerful tool for analyzing IT security documents.
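
To make the "DOS" ambiguity concrete, here is a minimal sketch of entity linking with context-based disambiguation. It is not the system presented in the talk: the toy ontology, the identifiers, the context terms, and the word-overlap scoring are all assumptions chosen for illustration, and a production system would more likely use learned representations than keyword overlap.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str            # hypothetical ontology identifier
    label: str                # canonical name of the entity
    entity_class: str         # coarse class, as a NER model might predict it
    aliases: frozenset        # lowercase surface forms that may refer to the entity
    context_terms: frozenset  # words that typically appear near the entity


# Toy ontology (assumption): two entities sharing the surface form "dos".
ONTOLOGY = [
    Entity("sw:disk-operating-system", "Disk Operating System", "software",
           frozenset({"dos"}),
           frozenset({"operating", "system", "floppy", "boot", "command", "pc"})),
    Entity("atk:denial-of-service", "Denial of Service", "attack",
           frozenset({"dos"}),
           frozenset({"attack", "traffic", "flood", "server", "outage", "botnet"})),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link(mention, sentence):
    """Return the ontology entity for `mention`, or None if undecidable.

    Candidates are all entities listing the mention as an alias. Each
    candidate is scored by how many of its context terms occur in the
    sentence; a zero score or a tie is treated as "cannot decide".
    """
    candidates = [e for e in ONTOLOGY if mention.lower() in e.aliases]
    if not candidates:
        return None
    context = set(tokenize(sentence)) - {mention.lower()}
    scores = [len(context & e.context_terms) for e in candidates]
    best = max(scores)
    if best == 0 or scores.count(best) > 1:
        return None
    return candidates[scores.index(best)]


attack = link("DOS", "The botnet launched a DOS attack that hit the server with traffic.")
legacy = link("DOS", "The old PC still starts DOS from a floppy disk.")
print(attack.label if attack else None)
print(legacy.label if legacy else None)
```

Returning `None` on a tie or a zero score, rather than guessing, keeps ambiguous mentions visible to an analyst; whether that trade-off suits a given workflow is a design decision outside the scope of this sketch.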

See also: Slides (3.6 MB)

Qi Wu, Machine Learning Engineer at ontolux, a brand of Neofonie GmbH, works on topics such as training and optimizing models, with a focus on fine-tuning and distillation, and translates current research results into usable applications for customers. During her master's studies in statistics, she worked with Prof. Dr. Alan Akbik on the NLP framework FLAIR and on machine learning for natural language processing.

Bertram Sändig leads the Machine Learning team at ontolux, a brand of Neofonie GmbH. He works on the adaptation, optimization, and integration of large language models for ontolux's text analysis toolkit, translating current research results into usable applications for customers.