Exploring Toxic and Hate Bias in Large Language Models

Álvaro Esteban Muñoz
abstract

With the advent of generative AI, numerous applications have adopted
strategies based on Large Language Models (LLMs) to tackle various NLP
tasks. Many in the scientific community have highlighted the toxic
and hate biases inherent in these LLMs, which can lead to social
consequences and affect people in different ways. The number of
studies addressing this issue is growing; however, many do not align
with sociologists on robust standards or methodologies. Moreover, as more
companies recognize the potential of LLM-based applications, there is a
risk that they may overlook these issues. In this project, we gather
and present the main techniques, methodologies, standards, and
datasets used to explore bias in LLMs.

outcomes