
Showing 1–50 of 70 results for author: Lakkaraju, H

  1. arXiv:2407.08689

    cs.AI cs.CY cs.LG

    Operationalizing the Blueprint for an AI Bill of Rights: Recommendations for Practitioners, Researchers, and Policy Makers

    Authors: Alex Oesterling, Usha Bhalla, Suresh Venkatasubramanian, Himabindu Lakkaraju

    Abstract: As Artificial Intelligence (AI) tools are increasingly employed in diverse real-world applications, there has been significant interest in regulating these tools. To this end, several regulatory frameworks have been introduced by different countries worldwide. For example, the European Union recently passed the AI Act, the White House issued an Executive Order on safe, secure, and trustworthy AI,…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 15 pages

  2. arXiv:2406.10625

    cs.CL

    On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

    Authors: Sree Harsha Tanneru, Dan Ley, Chirag Agarwal, Himabindu Lakkaraju

    Abstract: As Large Language Models (LLMs) are increasingly being employed in real-world applications in critical domains such as healthcare, it is important to ensure that the Chain-of-Thought (CoT) reasoning generated by these models faithfully captures their underlying behavior. While LLMs are known to generate CoT reasoning that is appealing to humans, prior studies have shown that these explanations d…

    Submitted 1 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

  3. arXiv:2405.05386

    cs.LG cs.CL cs.CV stat.ML

    Interpretability Needs a New Paradigm

    Authors: Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar

    Abstract: Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be explained, and the post-hoc paradigm, which believes that black-box models can be explained. At the core of this debate is how each paradigm ensures its explanations…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2404.18870

    cs.CL cs.AI

    More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness

    Authors: Aaron J. Li, Satyapriya Krishna, Himabindu Lakkaraju

    Abstract: The surge in Large Language Model (LLM) development has led to improved performance on cognitive tasks as well as an urgent need to align these models with human values in order to safely exploit their power. Despite the effectiveness of preference learning algorithms like Reinforcement Learning From Human Feedback (RLHF) in aligning models with human preferences, their assumed improvements on model trustwo…

    Submitted 29 April, 2024; originally announced April 2024.

  5. arXiv:2404.07981

    cs.IR cs.AI cs.CL

    Manipulating Large Language Models to Increase Product Visibility

    Authors: Aounon Kumar, Himabindu Lakkaraju

    Abstract: Large language models (LLMs) are increasingly being integrated into search engines to provide natural language responses tailored to user queries. Customers and end-users are also becoming more dependent on these models for quick and easy purchase decisions. In this work, we investigate whether recommendations from LLMs can be manipulated to enhance a product's visibility. We demonstrate that addi…

    Submitted 11 April, 2024; originally announced April 2024.

  6. arXiv:2404.04714

    cs.LG cs.AI cs.CR

    Data Poisoning Attacks on Off-Policy Policy Evaluation Methods

    Authors: Elita Lobo, Harvineet Singh, Marek Petrik, Cynthia Rudin, Himabindu Lakkaraju

    Abstract: Off-policy Evaluation (OPE) methods are a crucial tool for evaluating policies in high-stakes domains such as healthcare, where exploration is often infeasible, unethical, or expensive. However, the extent to which such methods can be trusted under adversarial threats to data quality is largely unexplored. In this work, we make the first attempt at investigating the sensitivity of OPE methods to m…

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted at UAI 2022

  7. arXiv:2403.05565

    cs.HC cs.AI

    OpenHEXAI: An Open-Source Framework for Human-Centered Evaluation of Explainable Machine Learning

    Authors: Jiaqi Ma, Vivian Lai, Yiming Zhang, Chacha Chen, Paul Hamilton, Davor Ljubenkov, Himabindu Lakkaraju, Chenhao Tan

    Abstract: Recently, there has been a surge of explainable AI (XAI) methods driven by the need for understanding machine learning model behaviors in high-stakes scenarios. However, properly evaluating the effectiveness of the XAI methods inevitably requires the involvement of human subjects, and conducting human-centered benchmarks is challenging in a number of ways: designing and implementing user studies i…

    Submitted 20 February, 2024; originally announced March 2024.

  8. arXiv:2403.03744

    cs.AI

    MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models

    Authors: Tessa Han, Aounon Kumar, Chirag Agarwal, Himabindu Lakkaraju

    Abstract: As large language models (LLMs) develop increasingly sophisticated capabilities and find applications in medical settings, it becomes important to assess their medical safety due to their far-reaching implications for personal and public health, patient safety, and human rights. However, there is little to no understanding of the notion of medical safety in the context of LLMs, let alone how to ev…

    Submitted 13 June, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

  9. arXiv:2402.17840

    cs.CL cs.AI cs.CR cs.LG

    Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

    Authors: Zhenting Qi, Hanlin Zhang, Eric Xing, Sham Kakade, Himabindu Lakkaraju

    Abstract: Retrieval-Augmented Generation (RAG) improves pre-trained models by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context RAG Language Models (LMs). We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with ins…

    Submitted 20 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  10. arXiv:2402.10688

    cs.CL

    Towards Uncovering How Large Language Model Works: An Explainability Perspective

    Authors: Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du

    Abstract: Large language models (LLMs) have led to breakthroughs in language tasks, yet the internal mechanisms that enable their remarkable generalization and reasoning abilities remain opaque. This lack of transparency presents challenges such as hallucinations, toxicity, and misalignment with human values, hindering the safe and beneficial deployment of LLMs. This paper aims to uncover the mechanisms und…

    Submitted 15 April, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 8 pages, 2 figures

  11. arXiv:2402.10376

    cs.LG cs.CV

    Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

    Authors: Usha Bhalla, Alex Oesterling, Suraj Srinivas, Flavio P. Calmon, Himabindu Lakkaraju

    Abstract: CLIP embeddings have demonstrated remarkable performance across a wide range of computer vision tasks. However, these high-dimensional, dense vector representations are not easily interpretable, restricting their usefulness in downstream applications that require transparency. In this work, we empirically show that CLIP's latent space is highly structured, and consequently that CLIP representation…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 17 pages, 8 figures, Code is provided at https://github.com/AI4LIFE-GROUP/SpLiCE

  12. arXiv:2402.06625

    cs.CL

    Understanding the Effects of Iterative Prompting on Truthfulness

    Authors: Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

    Abstract: The development of Large Language Models (LLMs) has notably transformed numerous sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of these models remain pressing concerns. To this end, we investigate iterative prompting, a strategy hypothesized to refine LLM responses, assessing its impact on LLM truthfulness, an area which has not been thoroughly ex…

    Submitted 9 February, 2024; originally announced February 2024.

  13. arXiv:2402.04614

    cs.CL

    Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

    Authors: Chirag Agarwal, Sree Harsha Tanneru, Himabindu Lakkaraju

    Abstract: Large Language Models (LLMs) are deployed as powerful tools for several natural language processing (NLP) applications. Recent works show that modern LLMs can generate self-explanations (SEs), which elicit their intermediate reasoning steps for explaining their behavior. Self-explanations have seen widespread adoption owing to their conversational and plausible nature. However, there is little to…

    Submitted 13 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  14. arXiv:2312.05690

    cs.HC

    Is Ignorance Bliss? The Role of Post Hoc Explanation Faithfulness and Alignment in Model Trust in Laypeople and Domain Experts

    Authors: Tessa Han, Yasha Ektefaie, Maha Farhat, Marinka Zitnik, Himabindu Lakkaraju

    Abstract: Post hoc explanations have emerged as a way to improve user trust in machine learning models by providing insight into model decision-making. However, explanations tend to be evaluated based on their alignment with prior knowledge while the faithfulness of an explanation with respect to the model, a fundamental criterion, is often overlooked. Furthermore, the effect of explanation faithfulness and…

    Submitted 11 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  15. arXiv:2312.04021

    cs.CL cs.AI cs.LG

    A Study on the Calibration of In-context Learning

    Authors: Hanlin Zhang, Yi-Fan Zhang, Yaodong Yu, Dhruv Madeka, Dean Foster, Eric Xing, Himabindu Lakkaraju, Sham Kakade

    Abstract: Accurate uncertainty quantification is crucial for the safe deployment of machine learning models, and prior research has demonstrated improvements in the calibration of modern language models (LMs). We study in-context learning (ICL), a prevalent method for adapting static LMs through tailored prompts, and examine the balance between performance and calibration across a broad spectrum of natural…

    Submitted 27 March, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

    Comments: NAACL 2024

  16. arXiv:2311.03533

    cs.CL

    Quantifying Uncertainty in Natural Language Explanations of Large Language Models

    Authors: Sree Harsha Tanneru, Chirag Agarwal, Himabindu Lakkaraju

    Abstract: Large Language Models (LLMs) are increasingly used as powerful tools for several high-stakes natural language processing (NLP) applications. Recent prompting works claim to elicit intermediate reasoning steps and key tokens that serve as proxy explanations for LLM predictions. However, there is no certainty whether these explanations are reliable and reflect the LLMs' behavior. In this work, we mak…

    Submitted 6 November, 2023; originally announced November 2023.

  17. arXiv:2310.14607

    cs.CL cs.LG

    Confronting LLMs with Traditional ML: Rethinking the Fairness of Large Language Models in Tabular Classifications

    Authors: Yanchen Liu, Srishti Gautam, Jiaqi Ma, Himabindu Lakkaraju

    Abstract: Recent literature has suggested the potential of using large language models (LLMs) to make classifications for tabular tasks. However, LLMs have been shown to exhibit harmful social biases that reflect the stereotypes and inequalities present in society. Given this, as well as the widespread use of tabular data in many high-stakes applications, it is important to explore the following questions:…

    Submitted 2 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NAACL 2024 (Main Conference)

  18. arXiv:2310.07579

    cs.LG cs.AI cs.CR

    In-Context Unlearning: Language Models as Few Shot Unlearners

    Authors: Martin Pawelczyk, Seth Neel, Himabindu Lakkaraju

    Abstract: Machine unlearning, the study of efficiently removing the impact of specific training instances on a model, has garnered increased attention in recent years due to regulatory guidelines such as the Right to be Forgotten. Achieving precise unlearning typically involves fully retraining the model and is computationally infeasible in the case of very large models such as Large Language Models (LLM…

    Submitted 6 June, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: Accepted at ICML 2024

  19. arXiv:2310.05797

    cs.CL cs.AI cs.LG

    In-Context Explainers: Harnessing LLMs for Explaining Black Box Models

    Authors: Nicholas Kroeger, Dan Ley, Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated exceptional capabilities in complex tasks like machine translation, commonsense reasoning, and language understanding. One of the primary reasons for the adaptability of LLMs in such diverse tasks is their in-context learning (ICL) capability, which allows them to perform well on new tasks by simply using a few task samples in t…

    Submitted 10 July, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

  20. arXiv:2309.16452

    cs.LG

    On the Trade-offs between Adversarial Robustness and Actionable Explanations

    Authors: Satyapriya Krishna, Chirag Agarwal, Himabindu Lakkaraju

    Abstract: As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the fir…

    Submitted 28 September, 2023; originally announced September 2023.

  21. arXiv:2309.02705

    cs.CL cs.AI cs.CR cs.LG

    Certifying LLM Safety against Adversarial Prompting

    Authors: Aounon Kumar, Chirag Agarwal, Suraj Srinivas, Aaron Jiaxun Li, Soheil Feizi, Himabindu Lakkaraju

    Abstract: Large language models (LLMs) are vulnerable to adversarial attacks that add malicious tokens to an input prompt to bypass the safety guardrails of an LLM and cause it to produce harmful content. In this work, we introduce erase-and-check, the first framework for defending against adversarial prompts with certifiable safety guarantees. Given a prompt, our procedure erases tokens individually and in…

    Submitted 12 February, 2024; v1 submitted 6 September, 2023; originally announced September 2023.

  22. arXiv:2308.04341

    cs.LG cs.CR

    Accurate, Explainable, and Private Models: Providing Recourse While Minimizing Training Data Leakage

    Authors: Catherine Huang, Chelse Swoopes, Christina Xiao, Jiaqi Ma, Himabindu Lakkaraju

    Abstract: Machine learning models are increasingly utilized across impactful domains to predict individual outcomes. As such, many models provide algorithmic recourse to individuals who receive negative outcomes. However, recourse can be leveraged by adversaries to disclose private information. This work presents the first attempt at mitigating such attacks. We present two novel methods to generate differen…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Proceedings of The Second Workshop on New Frontiers in Adversarial Machine Learning (AdvML-Frontiers @ ICML 2023)

  23. arXiv:2307.15007

    cs.LG cs.CV

    Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability

    Authors: Usha Bhalla, Suraj Srinivas, Himabindu Lakkaraju

    Abstract: With the increased deployment of machine learning models in various real-world applications, researchers and practitioners alike have emphasized the need for explanations of model behaviour. To this end, two broad strategies have been outlined in prior literature to explain models. Post hoc explanation methods explain the behaviour of complex black-box models by identifying features critical to mo…

    Submitted 15 February, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

    Journal ref: NeurIPS 2023 (Thirty-seventh Conference on Neural Information Processing Systems)

  24. arXiv:2307.14754

    cs.LG cs.AI

    Fair Machine Unlearning: Data Removal while Mitigating Disparities

    Authors: Alex Oesterling, Jiaqi Ma, Flavio P. Calmon, Hima Lakkaraju

    Abstract: The Right to be Forgotten is a core principle outlined by regulatory frameworks such as the EU's General Data Protection Regulation (GDPR). This principle allows individuals to request that their personal data be deleted from deployed machine learning models. While "forgetting" can be naively achieved by retraining on the remaining dataset, it is computationally expensive to do so with each new…

    Submitted 15 February, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

    Comments: 25 pages, 3 figures, accepted to AISTATS 2024. Code is provided at https://github.com/AI4LIFE-GROUP/fair-unlearning

  25. arXiv:2307.13885

    cs.LG

    Characterizing Data Point Vulnerability via Average-Case Robustness

    Authors: Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

    Abstract: Studying the robustness of machine learning models is important to ensure consistent model behaviour across real-world settings. To this end, adversarial robustness is a standard framework, which views robustness of predictions through a binary lens: either a worst-case adversarial misclassification exists in the local region around an input, or it does not. However, this binary perspective does n…

    Submitted 8 July, 2024; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: UAI 2024

  26. arXiv:2307.13339

    cs.CL cs.AI

    Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions

    Authors: Skyler Wu, Eric Meng Shen, Charumathi Badrinath, Jiaqi Ma, Himabindu Lakkaraju

    Abstract: Chain-of-thought (CoT) prompting has been shown to empirically improve the accuracy of large language models (LLMs) on various question answering tasks. While understanding why CoT prompting is effective is crucial to ensuring that this phenomenon is a consequence of desired model behavior, little work has addressed this; nonetheless, such an understanding is a critical prerequisite for responsibl…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Accepted to Workshop on Challenges in Deployable Generative AI at ICML 2023

  27. arXiv:2306.06716

    cs.LG

    On Minimizing the Impact of Dataset Shifts on Actionable Explanations

    Authors: Anna P. Meyer, Dan Ley, Suraj Srinivas, Himabindu Lakkaraju

    Abstract: The Right to Explanation is an important regulatory principle that allows individuals to request actionable explanations for algorithmic decisions. However, several technical challenges arise when providing such actionable explanations in practice. For instance, models are periodically retrained to handle dataset shifts. This process may invalidate some of the previously prescribed explanations, t…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: 30 pages, 19 figures. To be published at UAI 2023

  28. arXiv:2306.06193

    cs.LG cs.AI cs.CY

    Consistent Explanations in the Face of Model Indeterminacy via Ensembling

    Authors: Dan Ley, Leonard Tang, Matthew Nazari, Hongjin Lin, Suraj Srinivas, Himabindu Lakkaraju

    Abstract: This work addresses the challenge of providing consistent explanations for predictive models in the presence of model indeterminacy, which arises due to the existence of multiple (nearly) equally well-performing models for a given dataset and task. Despite their similar performance, such models often exhibit inconsistent or even contradictory explanations for their predictions, posing challenges t…

    Submitted 12 June, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  29. arXiv:2306.05500

    cs.CL cs.AI cs.CV cs.LG

    Word-Level Explanations for Analyzing Bias in Text-to-Image Models

    Authors: Alexander Lin, Lucas Monteiro Paes, Sree Harsha Tanneru, Suraj Srinivas, Himabindu Lakkaraju

    Abstract: Text-to-image models take a sentence (i.e., prompt) and generate images associated with this input prompt. These models have created award-winning art, videos, and even synthetic datasets. However, text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex. This paper investigates which word in the input prompt is responsible for bias in generated images. We…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: 5 main pages, 3 pages in appendix, and 3 figures

  30. arXiv:2305.19101

    cs.LG cs.CV

    Which Models have Perceptually-Aligned Gradients? An Explanation via Off-Manifold Robustness

    Authors: Suraj Srinivas, Sebastian Bordt, Hima Lakkaraju

    Abstract: One of the remarkable properties of robust computer vision models is that their input-gradients are often aligned with human perception, referred to in the literature as perceptually-aligned gradients (PAGs). Despite only being trained for classification, PAGs cause robust models to have rudimentary generative capabilities, including image generation, denoising, and in-painting. However, the under…

    Submitted 11 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  31. arXiv:2305.11426

    cs.CL cs.AI

    Post Hoc Explanations of Language Models Can Improve Language Models

    Authors: Satyapriya Krishna, Jiaqi Ma, Dylan Slack, Asma Ghandeharioun, Sameer Singh, Himabindu Lakkaraju

    Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning capabilities. However, incorporating such rationales…

    Submitted 7 December, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

  32. arXiv:2302.04288

    cs.AI

    Towards Bridging the Gaps between the Right to Explanation and the Right to be Forgotten

    Authors: Satyapriya Krishna, Jiaqi Ma, Himabindu Lakkaraju

    Abstract: The Right to Explanation and the Right to be Forgotten are two important principles outlined to regulate algorithmic decision making and data usage in real-world applications. While the right to explanation allows individuals to request an actionable explanation for an algorithmic decision, the right to be forgotten grants them the right to ask for their data to be deleted from all the databases a…

    Submitted 9 February, 2023; v1 submitted 8 February, 2023; originally announced February 2023.

  33. arXiv:2211.05427

    cs.LG cs.AI cs.CR cs.CY

    On the Privacy Risks of Algorithmic Recourse

    Authors: Martin Pawelczyk, Himabindu Lakkaraju, Seth Neel

    Abstract: As predictive models are increasingly being employed to make consequential decisions, there is a growing emphasis on developing techniques that can provide algorithmic recourse to affected individuals. While such recourses can be immensely beneficial to affected individuals, potential adversaries could also exploit these recourses to compromise privacy. In this work, we make the first attempt at i…

    Submitted 10 November, 2022; originally announced November 2022.

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS), 25-27 April 2023

  34. Towards Robust Off-Policy Evaluation via Human Inputs

    Authors: Harvineet Singh, Shalmali Joshi, Finale Doshi-Velez, Himabindu Lakkaraju

    Abstract: Off-policy Evaluation (OPE) methods are crucial tools for evaluating policies in high-stakes domains such as healthcare, where direct deployment is often infeasible, unethical, or expensive. When deployment environments are expected to undergo changes (that is, dataset shifts), it is important for OPE methods to perform robust evaluation of the policies amidst such changes. Existing approaches con…

    Submitted 18 September, 2022; originally announced September 2022.

    Comments: 10 pages, 5 figures, 1 table. Appeared at AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. Expanded version of arXiv:2103.15933

  35. arXiv:2208.09339

    cs.LG cs.AI

    Evaluating Explainability for Graph Neural Networks

    Authors: Chirag Agarwal, Owen Queen, Himabindu Lakkaraju, Marinka Zitnik

    Abstract: As post hoc explanations are increasingly used to understand the behavior of graph neural networks (GNNs), it becomes crucial to evaluate the quality and reliability of GNN explanations. However, assessing the quality of GNN explanations is challenging as existing graph datasets have no or unreliable ground-truth explanations for a given task. Here, we introduce a synthetic graph data generator, S…

    Submitted 16 January, 2023; v1 submitted 19 August, 2022; originally announced August 2022.

  36. arXiv:2207.04154

    cs.LG cs.AI cs.CL

    TalkToModel: Explaining Machine Learning Models with Interactive Natural Language Conversations

    Authors: Dylan Slack, Satyapriya Krishna, Himabindu Lakkaraju, Sameer Singh

    Abstract: Machine Learning (ML) models are increasingly used to make critical decisions in real-world applications, yet they have become more complex, making them harder to understand. To this end, researchers have proposed several techniques to explain model predictions. However, practitioners struggle to use these explainability techniques because they often do not know which one to choose and how to inte…

    Submitted 6 March, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: Pre-print; comments welcome! Reach out to dslack@uci.edu. v3: updated title and abstract.

  37. arXiv:2206.11104

    cs.LG cs.AI

    OpenXAI: Towards a Transparent Evaluation of Model Explanations

    Authors: Chirag Agarwal, Dan Ley, Satyapriya Krishna, Eshika Saxena, Martin Pawelczyk, Nari Johnson, Isha Puri, Marinka Zitnik, Himabindu Lakkaraju

    Abstract: While several types of post hoc explanation methods have been proposed in recent literature, there is very little work on systematically benchmarking these methods. Here, we introduce OpenXAI, a comprehensive and extensible open-source framework for evaluating and benchmarking post hoc explanation methods. OpenXAI comprises the following key components: (i) a flexible synthetic data generator a…

    Submitted 13 March, 2024; v1 submitted 22 June, 2022; originally announced June 2022.

    Comments: Newer version with updated results and code

  38. arXiv:2206.07144

    cs.LG

    Efficiently Training Low-Curvature Neural Networks

    Authors: Suraj Srinivas, Kyle Matoba, Himabindu Lakkaraju, Francois Fleuret

    Abstract: The highly non-linear nature of deep neural networks causes them to be susceptible to adversarial examples and to have unstable gradients, which hinder interpretability. However, existing methods to solve these issues, such as adversarial training, are expensive and often sacrifice predictive accuracy. In this work, we consider curvature, which is a mathematical quantity which encodes the degree of…

    Submitted 10 January, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022

  39. arXiv:2206.02868

    cs.LG cs.HC

    A Human-Centric Take on Model Monitoring

    Authors: Murtuza N Shergadwala, Himabindu Lakkaraju, Krishnaram Kenthapadi

    Abstract: Predictive models are increasingly used to make various consequential decisions in high-stakes domains such as healthcare, finance, and policy. It becomes critical to ensure that these models make accurate predictions, are robust to shifts in the data, do not rely on spurious features, and do not unduly discriminate against minority groups. To this end, several approaches spanning various areas su…

    Submitted 20 September, 2022; v1 submitted 6 June, 2022; originally announced June 2022.

  40. arXiv:2206.01254

    cs.LG cs.AI

    Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations

    Authors: Tessa Han, Suraj Srinivas, Himabindu Lakkaraju

    Abstract: A critical problem in the field of post hoc explainability is the lack of a common foundational goal among methods. For example, some methods are motivated by function approximation, some by game theoretic notions, and some by obtaining clean visualizations. This fragmentation of goals causes not only an inconsistent conceptual understanding of explanations but also the practical challenge of not…

    Submitted 29 December, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

  41. Fairness via Explanation Quality: Evaluating Disparities in the Quality of Post hoc Explanations

    Authors: Jessica Dai, Sohini Upadhyay, Ulrich Aivodji, Stephen H. Bach, Himabindu Lakkaraju

    Abstract: As post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to ensure that the quality of the resulting explanations is consistently high across various population subgroups including the minority groups. For instance, it should not be the case that explanations associated with instances belonging to a particular gender su…

    Submitted 1 July, 2022; v1 submitted 15 May, 2022; originally announced May 2022.

    Comments: As presented at AIES 2022

  42. arXiv:2203.06877

    cs.LG

    Rethinking Stability for Attribution-based Explanations

    Authors: Chirag Agarwal, Nari Johnson, Martin Pawelczyk, Satyapriya Krishna, Eshika Saxena, Marinka Zitnik, Himabindu Lakkaraju

    Abstract: As attribution-based explanation methods are increasingly used to establish model trustworthiness in high-stakes situations, it is critical to ensure that these explanations are stable, e.g., robust to infinitesimal perturbations to an input. However, previous works have shown that state-of-the-art explanation methods generate unstable explanations. Here, we introduce metrics to quantify the stabi…

    Submitted 14 March, 2022; originally announced March 2022.

  43. arXiv:2203.06768

    cs.LG cs.CY

    Probabilistically Robust Recourse: Navigating the Trade-offs between Costs and Robustness in Algorithmic Recourse

    Authors: Martin Pawelczyk, Teresa Datta, Johannes van-den-Heuvel, Gjergji Kasneci, Himabindu Lakkaraju

    Abstract: As machine learning models are increasingly being employed to make consequential decisions in real-world settings, it becomes critical to ensure that individuals who are adversely impacted (e.g., loan denied) by the predictions of these models are provided with a means for recourse. While several approaches have been proposed to construct recourses for affected individuals, the recourses output by…

    Submitted 11 October, 2023; v1 submitted 13 March, 2022; originally announced March 2022.

    Comments: ICLR 2023, camera ready version

    Journal ref: 11th International Conference on Learning Representations (ICLR) 2023

  44. arXiv:2202.01875

    cs.LG

    Rethinking Explainability as a Dialogue: A Practitioner's Perspective

    Authors: Himabindu Lakkaraju, Dylan Slack, Yuxin Chen, Chenhao Tan, Sameer Singh

    Abstract: As practitioners increasingly deploy machine learning models in critical domains such as health care, finance, and policy, it becomes vital to ensure that domain experts function effectively alongside these models. Explainability is one way to bridge the gap between human decision-makers and machine learning models. However, most of the existing work on explainability focuses on one-off, static ex…

    Submitted 3 February, 2022; originally announced February 2022.

  45. arXiv:2202.01602

    cs.LG cs.AI

    The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

    Authors: Satyapriya Krishna, Tessa Han, Alex Gu, Steven Wu, Shahin Jabbari, Himabindu Lakkaraju

    Abstract: As various post hoc explanation methods are increasingly being leveraged to explain complex models in high-stakes settings, it becomes critical to develop a deeper understanding of if and when the explanations output by these methods disagree with each other, and how such disagreements are resolved in practice. However, there is little to no research that provides answers to these critical questio…

    Submitted 8 July, 2024; v1 submitted 3 February, 2022; originally announced February 2022.

    Comments: Published in Transactions on Machine Learning Research (TMLR)

  46. arXiv:2106.13346

    cs.LG cs.AI cs.CY

    What will it take to generate fairness-preserving explanations?

    Authors: Jessica Dai, Sohini Upadhyay, Stephen H. Bach, Himabindu Lakkaraju

    Abstract: In situations where explanations of black-box models may be useful, the fairness of the black-box is also often a relevant concern. However, the link between the fairness of the black-box model and the behavior of explanations for the black-box is unclear. We focus on explanations applied to tabular datasets, suggesting that explanations do not necessarily preserve the fairness properties of the b…

    Submitted 24 June, 2021; originally announced June 2021.

    Comments: Presented at ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI

  47. arXiv:2106.12563

    cs.LG cs.CR

    Feature Attributions and Counterfactual Explanations Can Be Manipulated

    Authors: Dylan Slack, Sophie Hilgard, Sameer Singh, Himabindu Lakkaraju

    Abstract: As machine learning models are increasingly used in critical decision-making settings (e.g., healthcare, finance), there has been a growing emphasis on developing methods to explain model predictions. Such explanations are used to understand and establish trust in models and are vital components in machine learning pipelines. Though explanations are a critical piece in these systems, ther…

    Submitted 25 June, 2021; v1 submitted 23 June, 2021; originally announced June 2021.

    Comments: arXiv admin note: text overlap with arXiv:2106.02666

  48. arXiv:2106.09992

    cs.LG cs.AI

    Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

    Authors: Martin Pawelczyk, Chirag Agarwal, Shalmali Joshi, Sohini Upadhyay, Himabindu Lakkaraju

    Abstract: As machine learning (ML) models become more widely deployed in high-stakes applications, counterfactual explanations have emerged as key tools for providing actionable model explanations in practice. Despite the growing popularity of counterfactual explanations, a deeper understanding of these explanations is still lacking. In this work, we systematically analyze counterfactual explanations throug…

    Submitted 19 October, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

    Journal ref: International Conference on Artificial Intelligence and Statistics (AISTATS), 28-30 March 2022

  49. arXiv:2106.09078

    cs.LG

    Probing GNN Explainers: A Rigorous Theoretical and Empirical Analysis of GNN Explanation Methods

    Authors: Chirag Agarwal, Marinka Zitnik, Himabindu Lakkaraju

    Abstract: As Graph Neural Networks (GNNs) are increasingly being employed in critical real-world applications, several methods have been proposed in recent literature to explain the predictions of these models. However, there has been little to no work on systematically analyzing the reliability of these methods. Here, we introduce the first-ever theoretical analysis of the reliability of state-of-the-art G…

    Submitted 22 February, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

    Comments: Accepted to AISTATS 2022

  50. arXiv:2106.02666

    cs.LG

    Counterfactual Explanations Can Be Manipulated

    Authors: Dylan Slack, Sophie Hilgard, Himabindu Lakkaraju, Sameer Singh

    Abstract: Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As they are deployed in critical applications (e.g. law enforcement, financial lending), it becomes important to ensure that we clearly understand the vulnerabilities of these methods and find ways to address them. However, there is little understandin…

    Submitted 3 November, 2021; v1 submitted 4 June, 2021; originally announced June 2021.