
Research Projects

This page gives an overview of ODS research projects.

Further ODS research projects in the context of DCAITI can be found here.

Berlin Institute for the Foundations of Learning and Data


BIFOLD, the Berlin Institute for the Foundations of Learning and Data, aims to conduct research into the scientific foundations of Big Data and Machine Learning, to advance the development of AI applications, and to greatly increase their impact on society, the economy, and science.

BIFOLD will pursue the following strategic priorities in line with the German National AI Strategy:

  • Research: Conduct high-impact foundational research in the fields of Big Data, Machine Learning and their intersection, to profoundly advance the state of the art in Big Data and Machine Learning methods and technologies and to attract the world’s best scientists to Germany.
  • Innovation: Prototype AI technologies, Big Data systems, Data Science tools, and Machine Learning algorithms, and support knowledge and technology exchange, to empower innovation in the sciences, the humanities, and companies, particularly startups.
  • Education: Prepare the next generation of experts in Big Data and Machine Learning for future academic or industrial careers.

Computing Foundations For Semantic Stream Processing (COSMO)

Jan 1, 2021 - Dec 31, 2023

The ability to process stream data is ubiquitous in modern information systems. The grand challenge in establishing a processing framework for powering such systems is how to strike the right balance between expressivity and computability in a highly dynamic setting. The expressivity of the framework reflects what kind of input data and what types of processing operations it enables. The computability corresponds to its ability to process a certain workload (e.g., processing workflow and data size) under an execution setting (e.g., CPU, RAM and network bandwidth).

So far, various research communities have addressed this challenge independently by imposing application-specific trade-offs and assumptions on their underlying processing models. Such trade-offs and assumptions are driven by prior knowledge about data characteristics (e.g., format, modality, schema and distribution), processing workload and computation settings. However, recent developments in the Internet of Things and AI have brought completely new levels of expressivity to processing pipelines, as well as dynamicity to computation settings. For instance, a typical processing pipeline of a connected vehicle includes not only multimodal stream elements generated in real time by unprecedented types of sensors but also very complex processing workflows involving logical reasoning and statistical inference. Furthermore, such a pipeline can be executed in a highly dynamic distributed setting, e.g., one combining in-car processing units with cloud/edge computing infrastructures. Processing pipelines and setups of this kind hence require a radical overhaul of the state of the art in several areas.

To this end, this project aims to establish computing foundations that enable a unified processing framework addressing this grand challenge. The targeted framework will propose a semantics-based processing model, called Semantic Stream Processing, built on a standards-oriented graph data model and query language fragments. The project will therefore carry out a systematic study of tractable classes of a wide range of processing operators, e.g., graph pattern queries, logical reasoning, and statistical inference on stream data. The newly identified tractable classes of processing operations will pave the way for designing efficient classes of incremental evaluation algorithms. To address scalability, the project will also study how to elastically and robustly scale a highly expressive stream processing pipeline in a dynamic and distributed computing environment. Moreover, the project will investigate a novel optimisation mechanism that combines logical optimisation algorithms, which exploit rewriting rules and pruning constraints, with adaptive optimisation algorithms, which continuously optimise execution plans based on runtime statistics. The proposed algorithms and framework will be extensively and systematically evaluated in two application domains: connected vehicles and the Web of Things.
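The idea of incremental evaluation over stream data can be illustrated with a minimal sketch in Python. A continuous graph-pattern count over a time-based sliding window is maintained by updating the result only for arriving and expiring elements, instead of recomputing it from scratch for every stream element. The class, the triple format and the example data are illustrative assumptions, not part of the COSMO framework:

```python
from collections import deque

class SlidingWindowCount:
    """Incrementally evaluates a continuous triple-pattern count over a
    time-based sliding window. Hypothetical sketch; not COSMO's actual API."""

    def __init__(self, window_seconds, pattern):
        self.window = window_seconds
        self.pattern = pattern    # (subject, predicate, object), None = wildcard
        self.buffer = deque()     # (timestamp, triple) in arrival order
        self.matches = 0          # query result, maintained incrementally

    def _matches(self, triple):
        return all(p is None or p == t for p, t in zip(self.pattern, triple))

    def push(self, timestamp, triple):
        # Decremental update: expire elements that fell out of the window.
        while self.buffer and self.buffer[0][0] <= timestamp - self.window:
            _, old = self.buffer.popleft()
            if self._matches(old):
                self.matches -= 1
        # Incremental update for the newly arrived element.
        self.buffer.append((timestamp, triple))
        if self._matches(triple):
            self.matches += 1
        return self.matches

# Example: count speed readings from car1 over the last 10 seconds.
q = SlidingWindowCount(10, ("car1", "speed", None))
q.push(0, ("car1", "speed", 80))    # -> 1
q.push(5, ("car2", "speed", 60))    # -> 1 (car2 does not match the pattern)
q.push(12, ("car1", "speed", 90))   # -> 1 (the reading at t=0 has expired)
```

Each arriving element costs amortised constant work per match, which is the property that makes such operators attractive for highly dynamic settings.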



NFDI4Cat

As an interdisciplinary scientific technology field, catalysis is of great strategic importance for the economy and society as a whole. It is one of the most important core technologies for simultaneously solving the pressing challenges of climate change, the supply of sustainable energy and sustainable materials. Concrete examples are the reduction or complete avoidance of CO2 emissions, the recycling of plastic waste and CO2 in chemical production, sustainable hydrogen production, fuel cell technology, and the sustainable nutrition of more than seven billion people on earth. All of these require groundbreaking advances in catalysis science and technology.

This requires a fundamental change in catalysis research, chemical engineering and process technology. A key challenge is to bring together the different disciplines in catalysis research and technology with the support of data scientists and mathematicians. The aim is to redefine catalysis research in the digital age. This so-called “digital catalysis” is to be realized along the data value chain, which runs from molecules to chemical processes.

The NFDI4Cat consortium, coordinated by DECHEMA (Society for Chemical Engineering and Biotechnology), consists of experts from the fields of homogeneous, heterogeneous, photo-, bio- and electrocatalysis and is supplemented by experts from the engineering, data and mathematical sciences. Partner institutions are:

  • Leibniz Institute for Catalysis e.V. (LIKAT)
  • Friedrich-Alexander-Universität Erlangen
  • RWTH Aachen
  • Universität Greifswald
  • Universität Leipzig
  • Universität Rostock
  • TU Berlin
  • TU Braunschweig
  • TU Dortmund
  • TU München
  • Fraunhofer Institute for Open Communication Systems (FOKUS)
  • High Performance Computing Center Stuttgart (HLRS)
  • Karlsruhe Institute of Technology (KIT)
  • Max Planck Institute for Chemical Energy Conversion
  • Max Planck Institute for Dynamics of Complex Technical Systems

The consortium is complemented by TU Darmstadt as an associated partner. A unique selling point of NFDI4Cat is the role of industry, which supports NFDI4Cat in an advisory capacity. In addition to hte GmbH, which will play a leading role, the participating companies include BASF SE, Clariant Produkte GmbH (Catalysts), Covestro Deutschland AG, Evonik Industries AG, Linde AG (Engineering Division) and thyssenkrupp Industrial Solutions AG.

In order to achieve the overall objectives of NFDI in an interdisciplinary way, NFDI4Cat will cooperate particularly closely with other funded and emerging consortia such as NFDI4Ing and NFDI4Chem due to overlapping areas of interest.

About the National Research Data Infrastructure:

The National Research Data Infrastructure (NFDI) aims to systematically develop, sustainably secure, and make accessible the data sets of science and research, as well as to network them (inter)nationally. It is currently being set up, in a process driven by science, as a networked structure of individually acting consortia. The NFDI will be established in three stages over a period of three years (2019 to 2021). In each stage, new consortia can be admitted to the NFDI through a science-led application process. The Federal Government and the federal states intend to fund up to 30 consortia in total. In the final stage, up to 85 million euros per year will be available for funding.

Berlin Open Science Platform


The Berlin Open Science Platform (BOP) is a curation platform for research data developed for the Berlin University Alliance (BUA). BOP provides sharing and processing services for research data and supports openness, transparency and participation in research. BOP shall enable users to

  • find and access research data (publications, datasets) of the BUA partners through a single point of access,
  • combine research data in experiments and evaluate it (data curation: visualization, data clustering, text summarization, text translation), and
  • connect researchers from different disciplines, to simplify collaborations between them and to support sustainable research within the Berlin University Alliance.
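The "single point of access" idea behind the first bullet can be sketched as a federated search that fans a query out to each partner repository and merges the results. The repository names, record fields and search interface below are illustrative assumptions, not the actual BOP services:

```python
# Hypothetical sketch of a single point of access to BUA research data:
# query every partner repository, merge the hits, and tag each hit with
# its source so users can trace where a dataset came from.

def search_repository(repo, query):
    """Stand-in for one partner's search API; returns matching records."""
    return [r for r in repo["records"] if query.lower() in r["title"].lower()]

def federated_search(repositories, query):
    """Send the query to every repository and merge the annotated results."""
    results = []
    for repo in repositories:
        for record in search_repository(repo, query):
            results.append({**record, "source": repo["name"]})
    return results

# Toy partner repositories (names and records are invented):
repos = [
    {"name": "fu-berlin", "records": [{"title": "Climate dataset A"}]},
    {"name": "tu-berlin", "records": [{"title": "Climate model B"},
                                      {"title": "Survey C"}]},
]
hits = federated_search(repos, "climate")  # two hits, from two sources
```

Tagging each result with its source repository is what later allows the curation services (visualization, clustering, summarization) to attribute combined data back to the contributing partners.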

The project is embedded in an initiative of Objective 5 to build a SOURCE centre, which will provide a portal of services for electronic research data. It is planned that the platform will be sustained long-term by the university libraries.

The software development is accompanied by co-creation workshops that systematically assess the requirements and the resulting prototypes. The goal is to design the development collaboratively with its users, in a value-oriented fashion, to ensure that the use of the platform is in line with the value-oriented governance of the BUA. This subproject shall also gather findings on the acceptance, design and use of the co-creation workshops themselves, in order to identify aspects of co-creation workshops relevant for further use in the BUA context. The subproject supports the complete project life cycle and aims at a software development process that addresses all user requirements and expectations, facilitating collaborative prototype development and testing.

Berlin Big Data Center Phase II


Funded by the German Federal Ministry of Education and Research (BMBF) and established in 2014, the Berlin Big Data Center (BBDC) is a national big data competence center led by the Technische Universität Berlin (TUB). In Phase I, besides TUB, the BBDC consortium partners included the Beuth University of Applied Sciences Berlin, the German Research Center for Artificial Intelligence (DFKI), the Fritz Haber Institute of the Max Planck Society, and the Zuse Institute Berlin (ZIB). Over its initial four-year period, the BBDC sought to prepare German/European industry, science and society for the global big data revolution. Its key objectives included:

  1. conducting fundamental research to enable scalable big data analysis,
  2. developing an integrated, declarative, and highly scalable open-source system for advanced data analysis,
  3. transferring technology and know-how to support innovation in industry, and
  4. educating future data scientists at leading academic programs.

In 2018, the BBDC entered a subsequent three-year period thanks to an additional funding award from the BMBF. In Phase II, besides TUB, the consortium partners include Charité Universitätsmedizin Berlin, DFKI, Technische Universität Braunschweig, and ZIB. In this phase, research will be carried out at the intersection of scalable data management and machine learning, in support of big data and data science. In particular, the BBDC will continue to explore scalability issues surrounding the real-time processing of data streams and declarative machine learning on massive datasets. In addition, various application areas will be addressed, including the analysis of distributed biomedical data and of heterogeneous morphomolecular data arising in cancer research, learning on compressed data streams, real-time speech technology for interactive user interfaces, as well as security and privacy issues concerning the handling of sensitive personal information in big data systems. Moreover, the BBDC will closely collaborate with the newly established Berlin Center for Machine Learning (BZML).

For more information, please visit: www.bbdc.berlin.

ProvDS - Uncertain Provenance Management over Incomplete & Heterogeneous Linked Stream Data

Jan 1, 2018 - Feb 14, 2021
DFG D-A-CH Programme

In heterogeneous environments, operations on data are performed by multiple, uncoordinated participants (e.g., producers, processors, consumers), each of which introduces and propagates errors. These errors create uncertainty in the process, which is further amplified when many data sources are combined and errors propagate across a diversity of actors. The ability to properly identify how such errors influence the results is crucial for assessing result quality and thus for establishing user trust. The problem of error propagation is aggravated in the realm of the Internet of Things, a multilevel heterogeneous platform involving many participants that propagate data.

In such a platform, none of the participants has a full view of how data propagates. More specifically, at no propagation stage is there complete knowledge of how a particular piece of data was produced, how it was processed, or how the results were derived. In order to increase user confidence in the platform, this project will develop methods to understand the entire process flow, i.e., how results are derived, from data sources, curation, recovery and intermediate processing to the final point delivering the output. We propose to compute a provenance trace of the results, that is, information about how data propagates to derive the results. This will make it possible to precisely identify how a particular piece of data influences the results, establishing transparency and consequently increasing user trust.
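The notion of a provenance trace can be sketched in a few lines of Python: each value carries the set of source identifiers it was derived from, and every operation unions the provenance of its inputs, so the final result records exactly which sources influenced it. The class and identifiers are illustrative assumptions, not the ProvDS model:

```python
# Minimal provenance-tracing sketch (hypothetical, not the ProvDS API):
# values are annotated with the set of sources that contributed to them,
# and derived values inherit the union of their inputs' provenance.

class Annotated:
    def __init__(self, value, provenance):
        self.value = value
        self.provenance = frozenset(provenance)  # contributing source IDs

def combine(op, *inputs):
    """Apply an operation while propagating provenance from all inputs."""
    value = op(*(x.value for x in inputs))
    prov = frozenset().union(*(x.provenance for x in inputs))
    return Annotated(value, prov)

# Two readings from distinct (invented) sources:
a = Annotated(20.0, {"sensor:1"})
b = Annotated(24.0, {"sensor:2"})

# The derived average records both contributing sources:
avg = combine(lambda x, y: (x + y) / 2, a, b)
# avg.value == 22.0, avg.provenance == {"sensor:1", "sensor:2"}
```

If one source later turns out to be faulty, the trace immediately identifies every result it influenced, which is the transparency property the project targets.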

At the technical level, we will investigate methods to manage uncertain provenance over incomplete and heterogeneous Linked Stream Data. More specifically, the ProvDS project will introduce provenance- and recovery-aware data management techniques. Unlike traditional provenance management techniques, which are applied to complete and static data, this research agenda focuses on dynamic and incomplete heterogeneous data. The accuracy and efficiency of the developed techniques will be evaluated and tested using real-world open Linked Data collections and open collections of time series containing heterogeneous unstructured data.
