subscribe to arXiv mailings

Mapping the Ethics of Generative AI: A Comprehensive Scoping Review

Abstract: The advent of generative artificial intelligence and the widespread adoption of it in society engendered intensive debates about its ethical implications and risks. These risks often differ from those associated with traditional discriminative machine learning. To synthesize the recent discourse and map its normative concepts, we conducted a scoping review on the ethics of generative artificial in… ▽ More The advent of generative artificial intelligence and the widespread adoption of it in society engendered intensive debates about its ethical implications and risks. These risks often differ from those associated with traditional discriminative machine learning. To synthesize the recent discourse and map its normative concepts, we conducted a scoping review on the ethics of generative artificial intelligence, including especially large language models and text-to-image models. Our analysis provides a taxonomy of 378 normative issues in 19 topic areas and ranks them according to their prevalence in the literature. The study offers a comprehensive overview for scholars, practitioners, or policymakers, condensing the ethical debates surrounding fairness, safety, harmful content, hallucinations, privacy, interaction risks, security, alignment, societal impacts, and others. We discuss the results, evaluate imbalances in the literature, and explore unsubstantiated risk scenarios. △ Less

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2311.06826 [pdf, other]

doi 10.1007/s13347-023-00679-8

Fairness Hacking: The Malicious Practice of Shrouding Unfairness in Algorithms

Authors: Kristof Meding, Thilo Hagendorff

Abstract: Fairness in machine learning (ML) is an ever-growing field of research due to the manifold potential for harm from algorithmic discrimination. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how one can divert the quantification of fairness by describing a practice we call "fairness hacking" for the purpose of shrouding unfairness… ▽ More Fairness in machine learning (ML) is an ever-growing field of research due to the manifold potential for harm from algorithmic discrimination. To prevent such harm, a large body of literature develops new approaches to quantify fairness. Here, we investigate how one can divert the quantification of fairness by describing a practice we call "fairness hacking" for the purpose of shrouding unfairness in algorithms. This impacts end-users who rely on learning algorithms, as well as the broader community interested in fair AI practices. We introduce two different categories of fairness hacking in reference to the established concept of p-hacking. The first category, intra-metric fairness hacking, describes the misuse of a particular metric by adding or removing sensitive attributes from the analysis. In this context, countermeasures that have been developed to prevent or reduce p-hacking can be applied to similarly prevent or reduce fairness hacking. The second category of fairness hacking is inter-metric fairness hacking. Inter-metric fairness hacking is the search for a specific fair metric with given attributes. We argue that countermeasures to prevent or reduce inter-metric fairness hacking are still in their infancy. Finally, we demonstrate both types of fairness hacking using real datasets. Our paper intends to serve as a guidance for discussions within the fair ML community to prevent or reduce the misuse of fairness metrics, and thus reduce overall harm from ML applications. △ Less

Submitted 12 November, 2023; originally announced November 2023.

arXiv:2307.16513 [pdf]

doi 10.1073/pnas.2317967121

Deception Abilities Emerged in Large Language Models

Authors: Thilo Hagendorff

Abstract: Large language models (LLMs) are currently at the forefront of intertwining artificial intelligence (AI) systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitorin… ▽ More Large language models (LLMs) are currently at the forefront of intertwining artificial intelligence (AI) systems with human communication and everyday life. Thus, aligning them with human values is of great importance. However, given the steady increase in reasoning abilities, future LLMs are under suspicion of becoming able to deceive human operators and utilizing this ability to bypass monitoring efforts. As a prerequisite to this, LLMs need to possess a conceptual understanding of deception strategies. This study reveals that such strategies emerged in state-of-the-art LLMs, such as GPT-4, but were non-existent in earlier LLMs. We conduct a series of experiments showing that state-of-the-art LLMs are able to understand and induce false beliefs in other agents, that their performance in complex deception scenarios can be amplified utilizing chain-of-thought reasoning, and that eliciting Machiavellianism in LLMs can alter their propensity to deceive. In sum, revealing hitherto unknown machine behavior in LLMs, our study contributes to the nascent field of machine psychology. △ Less

Submitted 2 February, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

arXiv:2306.07622

doi 10.1038/s43588-023-00527-x

Human-Like Intuitive Behavior and Reasoning Biases Emerged in Language Models -- and Disappeared in GPT-4

Authors: Thilo Hagendorff, Sarah Fabi

Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Therefore, it is of great importance to evaluate their emerging abilities. In this study, we show that LLMs, most notably GPT-3, exhibit behavior that strikingly resembles human-like intuition -- and the cognitive errors that come with it. However, LLMs with higher cog… ▽ More Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Therefore, it is of great importance to evaluate their emerging abilities. In this study, we show that LLMs, most notably GPT-3, exhibit behavior that strikingly resembles human-like intuition -- and the cognitive errors that come with it. However, LLMs with higher cognitive capabilities, in particular ChatGPT and GPT-4, learned to avoid succumbing to these errors and perform in a hyperrational manner. For our experiments, we probe LLMs with the Cognitive Reflection Test (CRT) as well as semantic illusions that were originally designed to investigate intuitive decision-making in humans. Moreover, we probe how sturdy the inclination for intuitive-like decision-making is. Our study demonstrates that investigating LLMs with methods from psychology has the potential to reveal otherwise unknown emergent traits. △ Less

Submitted 18 August, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

Comments: Overlap with arXiv:2212.05206

arXiv:2303.13988 [pdf]

Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods

Authors: Thilo Hagendorff

Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Due to rapid technological advances and their extreme versatility, LLMs nowadays have millions of users and are at the cusp of being the main go-to technology for information retrieval, content generation, problem-solving, etc. Therefore, it is of great importance to t… ▽ More Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Due to rapid technological advances and their extreme versatility, LLMs nowadays have millions of users and are at the cusp of being the main go-to technology for information retrieval, content generation, problem-solving, etc. Therefore, it is of great importance to thoroughly assess and scrutinize their capabilities. Due to increasingly complex and novel behavioral patterns in current LLMs, this can be done by treating them as participants in psychology experiments that were originally designed to test humans. For this purpose, the paper introduces a new field of research called "machine psychology". The paper outlines how different subfields of psychology can inform behavioral tests for LLMs. It defines methodological standards for machine psychology research, especially by focusing on policies for prompt designs. Additionally, it describes how behavioral patterns discovered in LLMs are to be interpreted. In sum, machine psychology aims to discover emergent abilities in LLMs that cannot be detected by most traditional natural language processing benchmarks. △ Less

Submitted 8 July, 2024; v1 submitted 24 March, 2023; originally announced March 2023.

arXiv:2301.06859 [pdf]

Methodological reflections for AI alignment research using human feedback

Authors: Thilo Hagendorff, Sarah Fabi

Abstract: The field of artificial intelligence (AI) alignment aims to investigate whether AI technologies align with human interests and values and function in a safe and ethical manner. AI alignment is particularly relevant for large language models (LLMs), which have the potential to exhibit unintended behavior due to their ability to learn and adapt in ways that are difficult to predict. In this paper, w… ▽ More The field of artificial intelligence (AI) alignment aims to investigate whether AI technologies align with human interests and values and function in a safe and ethical manner. AI alignment is particularly relevant for large language models (LLMs), which have the potential to exhibit unintended behavior due to their ability to learn and adapt in ways that are difficult to predict. In this paper, we discuss methodological challenges for the alignment problem specifically in the context of LLMs trained to summarize texts. In particular, we focus on methods for collecting reliable human feedback on summaries to train a reward model which in turn improves the summarization model. We conclude by suggesting specific improvements in the experimental design of alignment studies for LLMs' summarization capabilities. △ Less

Submitted 22 December, 2022; originally announced January 2023.

arXiv:2212.05206 [pdf]

doi 10.1038/s43588-023-00527-x

Thinking Fast and Slow in Large Language Models

Authors: Thilo Hagendorff, Sarah Fabi, Michal Kosinski

Abstract: Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Therefore, it is of great importance to evaluate their emerging abilities. In this study, we show that LLMs like GPT-3 exhibit behavior that strikingly resembles human-like intuition - and the cognitive errors that come with it. However, LLMs with higher cognitive capa… ▽ More Large language models (LLMs) are currently at the forefront of intertwining AI systems with human communication and everyday life. Therefore, it is of great importance to evaluate their emerging abilities. In this study, we show that LLMs like GPT-3 exhibit behavior that strikingly resembles human-like intuition - and the cognitive errors that come with it. However, LLMs with higher cognitive capabilities, in particular ChatGPT and GPT-4, learned to avoid succumbing to these errors and perform in a hyperrational manner. For our experiments, we probe LLMs with the Cognitive Reflection Test (CRT) as well as semantic illusions that were originally designed to investigate intuitive decision-making in humans. Our study demonstrates that investigating LLMs with methods from psychology has the potential to reveal otherwise unknown emergent traits. △ Less

Submitted 2 August, 2023; v1 submitted 10 December, 2022; originally announced December 2022.

arXiv:2206.09887 [pdf]

doi 10.1007/s44206-023-00063-1

How to Assess Trustworthy AI in Practice

Authors: Roberto V. Zicari, Julia Amann, Frédérick Bruneault, Megan Coffee, Boris Düdder, Eleanore Hickman, Alessio Gallucci, Thomas Krendl Gilbert, Thilo Hagendorff, Irmhild van Halem, Elisabeth Hildt, Sune Holm, Georgios Kararigas, Pedro Kringen, Vince I. Madai, Emilie Wiinblad Mathez, Jesmin Jahan Tithi, Dennis Vetter, Magnus Westerlund, Renee Wurth

Abstract: This report is a methodological reflection on Z-Inspection$^{\small{\circledR}}$. Z-Inspection$^{\small{\circledR}}$ is a holistic process used to evaluate the trustworthiness of AI-based technologies at different stages of the AI lifecycle. It focuses, in particular, on the identification and discussion of ethical issues and tensions through the elaboration of socio-technical scenarios. It uses t… ▽ More This report is a methodological reflection on Z-Inspection$^{\small{\circledR}}$. Z-Inspection$^{\small{\circledR}}$ is a holistic process used to evaluate the trustworthiness of AI-based technologies at different stages of the AI lifecycle. It focuses, in particular, on the identification and discussion of ethical issues and tensions through the elaboration of socio-technical scenarios. It uses the general European Union's High-Level Expert Group's (EU HLEG) guidelines for trustworthy AI. This report illustrates for both AI researchers and AI practitioners how the EU HLEG guidelines for trustworthy AI can be applied in practice. We share the lessons learned from conducting a series of independent assessments to evaluate the trustworthiness of AI systems in healthcare. We also share key recommendations and practical suggestions on how to ensure a rigorous trustworthy AI assessment throughout the life-cycle of an AI system. △ Less

Submitted 28 June, 2022; v1 submitted 20 June, 2022; originally announced June 2022.

Comments: On behalf of the Z-Inspection$^{\small{\circledR}}$ initiative (2022)

arXiv:2203.09911 [pdf]

doi 10.1080/0952813X.2023.2178517

Why we need biased AI -- How including cognitive and ethical machine biases can enhance AI systems

Authors: Sarah Fabi, Thilo Hagendorff

Abstract: This paper stresses the importance of biases in the field of artificial intelligence (AI) in two regards. First, in order to foster efficient algorithmic decision-making in complex, unstable, and uncertain real-world environments, we argue for the structurewise implementation of human cognitive biases in learning algorithms. Secondly, we argue that in order to achieve ethical machine behavior, fil… ▽ More This paper stresses the importance of biases in the field of artificial intelligence (AI) in two regards. First, in order to foster efficient algorithmic decision-making in complex, unstable, and uncertain real-world environments, we argue for the structurewise implementation of human cognitive biases in learning algorithms. Secondly, we argue that in order to achieve ethical machine behavior, filter mechanisms have to be applied for selecting biased training stimuli that represent social or behavioral traits that are ethically desirable. We use insights from cognitive science as well as ethics and apply them to the AI field, combining theoretical considerations with seven case studies depicting tangible bias implementation scenarios. Ultimately, this paper is the first tentative step to explicitly pursue the idea of a re-evaluation of the ethical significance of machine biases, as well as putting the idea forth to implement cognitive biases into machines. △ Less

Submitted 18 March, 2022; originally announced March 2022.

arXiv:2202.10848 [pdf]

doi 10.1007/s43681-022-00199-9

Speciesist bias in AI -- How AI applications perpetuate discrimination and unfair outcomes against animals

Authors: Thilo Hagendorff, Leonie Bossert, Tse Yip Fai, Peter Singer

Abstract: Massive efforts are made to reduce biases in both data and algorithms in order to render AI applications fair. These efforts are propelled by various high-profile cases where biased algorithmic decision-making caused harm to women, people of color, minorities, etc. However, the AI fairness field still succumbs to a blind spot, namely its insensitivity to discrimination against animals. This paper… ▽ More Massive efforts are made to reduce biases in both data and algorithms in order to render AI applications fair. These efforts are propelled by various high-profile cases where biased algorithmic decision-making caused harm to women, people of color, minorities, etc. However, the AI fairness field still succumbs to a blind spot, namely its insensitivity to discrimination against animals. This paper is the first to describe the 'speciesist bias' and investigate it in several different AI systems. Speciesist biases are learned and solidified by AI applications when they are trained on datasets in which speciesist patterns prevail. These patterns can be found in image recognition systems, large language models, and recommender systems. Therefore, AI technologies currently play a significant role in perpetuating and normalizing violence against animals. This can only be changed when AI fairness frameworks widen their scope and include mitigation measures for speciesist biases. This paper addresses the AI community in this regard and stresses the influence AI systems can have on either increasing or reducing the violence that is inflicted on animals, and especially on farmed animals. △ Less

Submitted 22 February, 2022; originally announced February 2022.

arXiv:2011.12750 [pdf]

doi 10.1007/s13347-022-00553-z

AI virtues -- The missing link in putting AI ethics into practice

Authors: Thilo Hagendorff

Abstract: Several seminal ethics initiatives have stipulated sets of principles and standards for good technology development in the AI sector. However, widespread criticism has pointed out a lack of practical realization of these principles. Following that, AI ethics underwent a practical turn, but without deviating from the principled approach and the many shortcomings associated with it. This paper propo… ▽ More Several seminal ethics initiatives have stipulated sets of principles and standards for good technology development in the AI sector. However, widespread criticism has pointed out a lack of practical realization of these principles. Following that, AI ethics underwent a practical turn, but without deviating from the principled approach and the many shortcomings associated with it. This paper proposes a different approach. It defines four basic AI virtues, namely justice, honesty, responsibility and care, all of which represent specific motivational settings that constitute the very precondition for ethical decision making in the AI field. Moreover, it defines two second-order AI virtues, prudence and fortitude, that bolster achieving the basic virtues by helping with overcoming bounded ethicality or the many hidden psychological forces that impair ethical decision making and that are hitherto disregarded in AI ethics. Lastly, the paper describes measures for successfully cultivating the mentioned virtues in organizations dealing with AI research and development. △ Less

Submitted 18 February, 2021; v1 submitted 25 November, 2020; originally announced November 2020.

arXiv:2008.11463 [pdf]

doi 10.1007/s11023-021-09573-8

Ethical behavior in humans and machines -- Evaluating training data quality for beneficial machine learning

Authors: Thilo Hagendorff

Abstract: Machine behavior that is based on learning algorithms can be significantly influenced by the exposure to data of different qualities. Up to now, those qualities are solely measured in technical terms, but not in ethical ones, despite the significant role of training and annotation data in supervised machine learning. This is the first study to fill this gap by describing new dimensions of data qua… ▽ More Machine behavior that is based on learning algorithms can be significantly influenced by the exposure to data of different qualities. Up to now, those qualities are solely measured in technical terms, but not in ethical ones, despite the significant role of training and annotation data in supervised machine learning. This is the first study to fill this gap by describing new dimensions of data quality for supervised machine learning applications. Based on the rationale that different social and psychological backgrounds of individuals correlate in practice with different modes of human-computer-interaction, the paper describes from an ethical perspective how varying qualities of behavioral data that individuals leave behind while using digital technologies have socially relevant ramification for the development of machine learning applications. The specific objective of this study is to describe how training data can be selected according to ethical assessments of the behavior it originates from, establishing an innovative filter regime to transition from the big data rationale n = all to a more selective way of processing data for training sets in machine learning. The overarching aim of this research is to promote methods for achieving beneficial machine learning applications that could be widely useful for industry as well as academia. △ Less

Submitted 26 August, 2020; originally announced August 2020.

Comments: 30 pages, 1 figure

arXiv:2006.04541 [pdf, other]

doi 10.1007/s00146-021-01284-z

Ethical Considerations and Statistical Analysis of Industry Involvement in Machine Learning Research

Authors: Thilo Hagendorff, Kristof Meding

Abstract: Industry involvement in the machine learning (ML) community seems to be increasing. However, the quantitative scale and ethical implications of this influence are rather unknown. For this purpose, we have not only carried out an informed ethical analysis of the field, but have inspected all papers of the main ML conferences NeurIPS, CVPR, and ICML of the last 5 years - almost 11,000 papers in tota… ▽ More Industry involvement in the machine learning (ML) community seems to be increasing. However, the quantitative scale and ethical implications of this influence are rather unknown. For this purpose, we have not only carried out an informed ethical analysis of the field, but have inspected all papers of the main ML conferences NeurIPS, CVPR, and ICML of the last 5 years - almost 11,000 papers in total. Our statistical approach focuses on conflicts of interest, innovation and gender equality. We have obtained four main findings: (1) Academic-corporate collaborations are growing in numbers. At the same time, we found that conflicts of interest are rarely disclosed. (2) Industry publishes papers about trending ML topics on average two years earlier than academia does. (3) Industry papers are not lagging behind academic papers in regard to social impact considerations. (4) Finally, we demonstrate that industrial papers fall short of their academic counterparts with respect to the ratio of gender diversity. We believe that this work is a starting point for an informed debate within and outside of the ML community. △ Less

Submitted 19 October, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

arXiv:1911.08603 [pdf]

doi 10.1007/s00146-020-01045-4

Forbidden knowledge in machine learning -- Reflections on the limits of research and publication

Authors: Thilo Hagendorff

Abstract: Certain research strands can yield "forbidden knowledge". This term refers to knowledge that is considered too sensitive, dangerous or taboo to be produced or shared. Discourses about such publication restrictions are already entrenched in scientific fields like IT security, synthetic biology or nuclear physics research. This paper makes the case for transferring this discourse to machine learning… ▽ More Certain research strands can yield "forbidden knowledge". This term refers to knowledge that is considered too sensitive, dangerous or taboo to be produced or shared. Discourses about such publication restrictions are already entrenched in scientific fields like IT security, synthetic biology or nuclear physics research. This paper makes the case for transferring this discourse to machine learning research. Some machine learning applications can very easily be misused and unfold harmful consequences, for instance with regard to generative video or text synthesis, personality analysis, behavior manipulation, software vulnerability detection and the like. Up to now, the machine learning research community embraces the idea of open access. However, this is opposed to precautionary efforts to prevent the malicious use of machine learning applications. Information about or from such applications may, if improperly disclosed, cause harm to people, organizations or whole societies. Hence, the goal of this work is to outline norms that can help to decide whether and when the dissemination of such information should be prevented. It proposes review parameters for the machine learning community to establish an ethical framework on how to deal with forbidden knowledge and dual-use applications. △ Less

Submitted 19 November, 2019; originally announced November 2019.

arXiv:1907.03848 [pdf]

Artificial Intelligence Governance and Ethics: Global Perspectives

Authors: Angela Daly, Thilo Hagendorff, Li Hui, Monique Mann, Vidushi Marda, Ben Wagner, Wei Wang, Saskia Witteborn

Abstract: Artificial intelligence (AI) is a technology which is increasingly being utilised in society and the economy worldwide, and its implementation is planned to become more prevalent in coming years. AI is increasingly being embedded in our lives, supplementing our pervasive use of digital technologies. But this is being accompanied by disquiet over problematic and dangerous implementations of AI, or… ▽ More Artificial intelligence (AI) is a technology which is increasingly being utilised in society and the economy worldwide, and its implementation is planned to become more prevalent in coming years. AI is increasingly being embedded in our lives, supplementing our pervasive use of digital technologies. But this is being accompanied by disquiet over problematic and dangerous implementations of AI, or indeed, even AI itself deciding to do dangerous and problematic actions, especially in fields such as the military, medicine and criminal justice. These developments have led to concerns about whether and how AI systems adhere, and will adhere to ethical standards. These concerns have stimulated a global conversation on AI ethics, and have resulted in various actors from different countries and sectors issuing ethics and governance initiatives and guidelines for AI. Such developments form the basis for our research in this report, combining our international and interdisciplinary expertise to give an insight into what is happening in Australia, China, Europe, India and the US. △ Less

Submitted 28 June, 2019; originally announced July 2019.

arXiv:1903.03425 [pdf]

doi 10.1007/s11023-020-09517-8

The Ethics of AI Ethics -- An Evaluation of Guidelines

Authors: Thilo Hagendorff

Abstract: Current advances in research, development and application of artificial intelligence (AI) systems have yielded a far-reaching discourse on AI ethics. In consequence, a number of ethics guidelines have been released in recent years. These guidelines comprise normative principles and recommendations aimed to harness the "disruptive" potentials of new AI technologies. Designed as a comprehensive eval… ▽ More Current advances in research, development and application of artificial intelligence (AI) systems have yielded a far-reaching discourse on AI ethics. In consequence, a number of ethics guidelines have been released in recent years. These guidelines comprise normative principles and recommendations aimed to harness the "disruptive" potentials of new AI technologies. Designed as a comprehensive evaluation, this paper analyzes and compares these guidelines highlighting overlaps but also omissions. As a result, I give a detailed overview of the field of AI ethics. Finally, I also examine to what extent the respective ethical principles and values are implemented in the practice of research, development and application of AI systems - and how the effectiveness in the demands of AI ethics can be improved. △ Less

Submitted 11 October, 2019; v1 submitted 28 February, 2019; originally announced March 2019.

Comments: 16 pages, 1 table

Journal ref: Minds & Machines, 2020

Showing 1–16 of 16 results for author: Hagendorff, T