“Quantifying and Alleviating Political Bias in Language Models”, 2022-03-01:
Current large-scale language models can be politically biased as a result of the data they are trained on, potentially causing serious problems when they are deployed in real-world settings.
In this paper, we first describe metrics for measuring political bias in GPT-2 generation (a toy version is sketched after the list below), and discuss several interesting takeaways:

- The generation of the vanilla GPT-2 model is mostly liberal-leaning;
- such political bias depends on the sensitive attributes mentioned in the context; and
- when generation is primed with an explicit political identifier, the extent of political bias is imbalanced between liberal and conservative.
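For concreteness, here is a minimal sketch of the kind of attribute-conditioned measurement this implies: sample continuations for prompts that differ only in a sensitive attribute, and compare their average political leaning. The `political_leaning` scorer below is a hypothetical lexicon-based stand-in (the paper derives its scores from word embeddings or a trained classifier), so this is an illustration of the idea, not the paper's exact metric.

```python
# Toy bias-measurement sketch; `political_leaning` is a crude stand-in
# for the paper's embedding- or classifier-based scorers.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def political_leaning(text: str) -> float:
    """Toy scorer in [-1, 1]: negative = liberal, positive = conservative."""
    liberal = {"equality", "climate", "welfare"}
    conservative = {"tradition", "liberty", "taxes"}
    words = set(text.lower().split())
    return (len(words & conservative) - len(words & liberal)) / max(len(words), 1)

def mean_leaning(prompt: str, n: int = 20) -> float:
    """Average leaning score over n sampled continuations of a prompt."""
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs, do_sample=True, top_p=0.9, max_new_tokens=40,
        num_return_sequences=n, pad_token_id=tokenizer.eos_token_id,
    )
    texts = [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]
    return sum(political_leaning(t) for t in texts) / len(texts)

# Attribute-dependence check: prompts identical except for the sensitive
# attribute (here gender); a large gap suggests attribute-conditioned bias.
gap = mean_leaning("He voted because") - mean_leaning("She voted because")
print(f"leaning gap across gendered prompts: {gap:+.3f}")
```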
We then propose a reinforcement learning (RL) framework for mitigating such political biases in generated text: by using rewards derived from word embeddings or a classifier, our RL framework guides generation toward debiased text without having access to the training data or requiring the model to be retrained.
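The classifier-reward idea can be sketched with a generic REINFORCE-style update: sample a continuation, score it, and reinforce it in proportion to how neutral it is. The sketch below reuses the toy `political_leaning` scorer, `model`, and `tokenizer` from the previous block; it is an assumption-laden illustration of policy-gradient debiasing, not the paper's exact procedure.

```python
# REINFORCE-style debiasing sketch: reward = negative absolute leaning,
# so neutral continuations are reinforced and partisan ones discouraged.
# (No baseline or KL penalty, for brevity.)
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)

def reinforce_step(prompt: str) -> float:
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]
    # Sample one continuation from the current policy (the LM itself).
    sample = model.generate(
        **inputs, do_sample=True, top_p=0.9, max_new_tokens=30,
        pad_token_id=tokenizer.eos_token_id,
    )
    continuation = tokenizer.decode(sample[0, prompt_len:], skip_special_tokens=True)
    reward = -abs(political_leaning(continuation))  # 0 = perfectly neutral
    # Recompute log-probs of the sampled tokens, this time with gradients.
    logits = model(sample).logits[:, :-1, :]
    logp = torch.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, sample[:, 1:].unsqueeze(-1)).squeeze(-1)
    # Policy-gradient loss over the continuation tokens only.
    loss = -reward * token_logp[:, prompt_len - 1:].sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

In practice such an update would be paired with a fluency or KL-to-original-model term so that debiasing does not degrade generation quality, which is the readability/coherence constraint the evaluation below checks.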
In empirical experiments on 3 sensitive attributes prone to political bias (gender, location, and topic), our methods reduced bias according to both our metrics and human evaluation, while maintaining readability and semantic coherence.
[Keywords: bias in language models, natural language generation, political bias, measuring bias, mitigating bias]