TRAILS - Trustworthy and Inclusive Machines

TRAILS - Sponsored by the Federal Ministry of Education and Research

Natural language processing (NLP) has demonstrated impressive performance in some human tasks. To achieve such performance, current neural models need to be pre-trained on huge amounts of raw text data. This dependence on uncurated data has at least four indirect and unintended consequences:

  1. Uncurated data tends to be linguistically and culturally non-diverse due to the statistical dominance of major languages and dialects in online texts (English vs. North Frisian, US English vs. UK English, etc.).

  2. Pre-trained neural models such as the ubiquitous pre-trained language models (PLMs) reproduce the features present in the data, including human biases.

  3. Rare phenomena (or languages) in the “long tail” are often not sufficiently taken into account in model evaluation, leading to an overestimation of model performance, especially in real-world application scenarios.

  4. The focus on achieving state-of-the-art results through transfer learning with giant PLMs such as GPT-4 or mT5 often neglects alternative methods that are more accessible, efficient and sustainable.

Because these problems undermine inclusion and trust, TRAILS pursues three main research directions: (i) inclusion of underrepresented languages and cultures through multilingual and culturally sensitive NLP, (ii) robustness and fairness with respect to long-tail phenomena and classes and “trustworthy content”, and (iii) robust and efficient NLP models that enable training and deployment of models for (i) and (ii). By aiming for more efficient models (objective (iii)), we also partially address economic inequality, since efficiency directly translates into a lower resource and cost footprint.

TRAILS is funded by the German Federal Ministry of Education and Research (BMBF) under the funding code 01IW24005.

Principal Investigators

Sebastian Möller

Professor for Quality and Usability, TU Berlin and Department Head, DFKI

Josef van Genabith

Professor at German Research Center for Artificial Intelligence (DFKI)

Researchers

Tatiana Anikina

PhD Student

Arne Binder

PhD Student

Cristina España i Bonet

Senior Researcher

David Harbecke

PhD Student

Leonhard Hennig

Senior Researcher

Simon Ostermann

Senior Researcher

Günter Neumann

Professor at German Research Center for Artificial Intelligence (DFKI)

Tanja Bäumel

PhD Student

Recent Publications
