
doi 10.1145/3630106.3658966

Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings

Authors: Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, Bill Howe

Abstract: The rapid proliferation of generative AI has raised questions about the competitiveness of lower-parameter, locally tunable, open-weight models relative to high-parameter, API-guarded, closed-weight models in terms of performance, domain adaptation, cost, and generalization. Centering under-resourced yet risk-intolerant settings in government, research, and healthcare, we see for-profit closed-weight models as incompatible with requirements for transparency, privacy, adaptability, and standards of evidence. Yet the performance penalty in using open-weight models, especially in low-data and low-resource settings, is unclear. We assess the feasibility of using smaller, open-weight models to replace GPT-4-Turbo in zero-shot, few-shot, and fine-tuned regimes, assuming access to only a single, low-cost GPU. We assess value-sensitive issues around bias, privacy, and abstention on three additional tasks relevant to those topics. We find that with relatively low effort, very low absolute monetary cost, and relatively little data for fine-tuning, small open-weight models can achieve competitive performance in domain-adapted tasks without sacrificing generality. We then run experiments considering practical issues in bias, privacy, and hallucination risk, finding that open models offer several benefits over closed models. We intend this work as a case study in understanding the opportunity cost of reproducibility and transparency over for-profit state-of-the-art zero-shot performance, finding this cost to be marginal under realistic settings.

Submitted 27 May, 2024; originally announced May 2024.

Comments: Accepted at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024

arXiv:2208.12700

Epistemic Parity: Reproducibility as an Evaluation Metric for Differential Privacy

Authors: Lucas Rosenblatt, Bernease Herman, Anastasia Holovenko, Wonkwon Lee, Joshua Loftus, Elizabeth McKinnie, Taras Rumezhak, Andrii Stadnik, Bill Howe, Julia Stoyanovich

Abstract: Differential privacy (DP) data synthesizers support public release of sensitive information, offering theoretical guarantees for privacy but limited evidence of utility in practical settings. Utility is typically measured as the error on representative proxy tasks, such as descriptive statistics, accuracy of trained classifiers, or performance over a query workload. The ability of these results to generalize to practitioners' experience has been questioned in a number of settings, including the U.S. Census. In this paper, we propose an evaluation methodology for synthetic data that avoids assumptions about the representativeness of proxy tasks, instead measuring the likelihood that published conclusions would change had the authors used synthetic data, a condition we call epistemic parity. Our methodology consists of reproducing empirical conclusions of peer-reviewed papers on real, publicly available data, then re-running these experiments a second time on DP synthetic data, and comparing the results. We instantiate our methodology over a benchmark of recent peer-reviewed papers that analyze public datasets in the ICPSR repository. We model quantitative claims computationally to automate the experimental workflow, and model qualitative claims by reproducing visualizations and comparing the results manually. We then generate DP synthetic datasets using multiple state-of-the-art mechanisms, and estimate the likelihood that these conclusions will hold.
We find that state-of-the-art DP synthesizers are able to achieve high epistemic parity for several papers in our benchmark. However, some papers, and particularly some specific findings, are difficult to reproduce for any of the synthesizers. We advocate for a new class of mechanisms that favor stronger utility guarantees and offer privacy protection with a focus on application-specific threat models and risk assessment.

Submitted 31 May, 2023; v1 submitted 26 August, 2022; originally announced August 2022.

Comments: Preprint. 14 pages
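The epistemic-parity comparison described in this abstract (evaluate a published claim on the real data, re-evaluate it on each synthetic dataset, and estimate how often the conclusion survives) can be illustrated with a minimal Python sketch. Everything named here is hypothetical: the row format, the toy claim `group_a_exceeds_b`, and `noisy_copy`, which merely perturbs values as a stand-in for a DP synthesizer and is not itself differentially private. This is not the paper's benchmark code.

```python
import random
import statistics
from typing import Callable, Sequence

# Hypothetical row format: (group, outcome). A "claim" returns True when a
# published finding holds on a dataset.
Row = tuple[str, float]
Claim = Callable[[Sequence[Row]], bool]


def group_a_exceeds_b(rows: Sequence[Row]) -> bool:
    """Toy claim: the mean outcome of group 'a' exceeds that of group 'b'."""
    a = [y for g, y in rows if g == "a"]
    b = [y for g, y in rows if g == "b"]
    return statistics.mean(a) > statistics.mean(b)


def parity_rate(claim: Claim, real: Sequence[Row],
                synthetic_runs: Sequence[Sequence[Row]]) -> float:
    """Share of synthetic datasets on which the claim reaches the same
    conclusion it reaches on the real data."""
    expected = claim(real)
    agree = sum(1 for synth in synthetic_runs if claim(synth) == expected)
    return agree / len(synthetic_runs)


def noisy_copy(rows: Sequence[Row], rng: random.Random, scale: float) -> list[Row]:
    """Stand-in for a DP synthesizer: only perturbs outcomes, NOT differentially private."""
    return [(g, y + rng.gauss(0.0, scale)) for g, y in rows]


rng = random.Random(0)
real = [("a", 1.0 + 0.1 * i) for i in range(20)] + [("b", 0.5 + 0.1 * i) for i in range(20)]
runs = [noisy_copy(real, rng, scale=0.5) for _ in range(50)]
print(f"claim holds on {parity_rate(group_a_exceeds_b, real, runs):.0%} of synthetic runs")
```

Modeling each claim as a function that returns a boolean is what makes the comparison automatable; qualitative claims, such as those read off visualizations, do not fit this shape, which is consistent with the abstract's note that those were compared manually.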

arXiv:2205.11473

Rethinking Streaming Machine Learning Evaluation

Authors: Shreya Shankar, Bernease Herman, Aditya G. Parameswaran

Abstract: While most work on evaluating machine learning (ML) models focuses on computing accuracy on batches of data, tracking accuracy alone in a streaming setting (i.e., unbounded, timestamp-ordered datasets) fails to appropriately identify when models are performing unexpectedly. In this position paper, we discuss how the nature of streaming ML problems introduces new real-world challenges (e.g., delayed arrival of labels) and recommend additional metrics to assess streaming ML performance.

Submitted 23 May, 2022; originally announced May 2022.

Comments: ML Evaluation Standards Workshop (ICLR 2022)
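To make the delayed-label challenge concrete, here is a minimal Python sketch of a sliding-window monitor that reports label coverage alongside accuracy, so that a window in which few labels have arrived is not mistaken for a healthy one. The class `DelayedLabelMonitor` and its methods are hypothetical illustrations, not the specific metrics the paper recommends.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DelayedLabelMonitor:
    """Sliding window over the most recent `window` predictions.

    Labels may arrive long after the prediction. Besides accuracy on the
    predictions that already have labels, the monitor reports label coverage:
    the share of predictions in the window that can be scored at all.
    """
    window: int = 100
    _order: deque = field(default_factory=deque, init=False)  # prediction ids, oldest first
    _preds: dict = field(default_factory=dict, init=False)    # id -> predicted label
    _labels: dict = field(default_factory=dict, init=False)   # id -> true label, once known

    def predict(self, event_id, prediction) -> None:
        self._order.append(event_id)
        self._preds[event_id] = prediction
        if len(self._order) > self.window:  # evict the oldest prediction
            oldest = self._order.popleft()
            self._preds.pop(oldest, None)
            self._labels.pop(oldest, None)

    def label(self, event_id, truth) -> None:
        # Labels for predictions that have already left the window are dropped.
        if event_id in self._preds:
            self._labels[event_id] = truth

    def coverage(self) -> float:
        return len(self._labels) / len(self._order) if self._order else 0.0

    def accuracy(self) -> Optional[float]:
        """Accuracy over labeled predictions in the window; None if nothing is labeled."""
        if not self._labels:
            return None
        correct = sum(self._preds[i] == y for i, y in self._labels.items())
        return correct / len(self._labels)


monitor = DelayedLabelMonitor(window=4)
for event_id, prediction in enumerate([1, 0, 1, 1]):
    monitor.predict(event_id, prediction)
monitor.label(0, 1)  # only one label has arrived so far
print(monitor.accuracy(), monitor.coverage())  # 1.0 0.25
```

Here the window reports perfect accuracy, but that figure rests on one of four predictions; reporting coverage next to accuracy exposes that gap, which accuracy alone hides.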

arXiv:2010.08859

Printmaking, Puzzles, and Studio Closets: Using Artistic Metaphors to Reimagine the User Interface for Designing Immersive Visualizations

Authors: Bridger Herman, Francesca Samsel, Annie Bares, Seth Johnson, Greg Abram, Daniel F. Keefe

Abstract: We, as a society, need artists to help us interpret and explain science, but what does an artist's studio look like when today's science is built upon the language of large, increasingly complex data? This paper presents a data visualization design interface that lifts the barriers for artists to engage with actively studied, 3D multivariate datasets. To accomplish this, the interface must weave together the need for creative artistic processes and the challenging constraints of real-time, data-driven 3D computer graphics. The result is an interface for a technical process, but technical in the way artistic printmaking is technical, not in the sense of computer scripting and programming. Using metaphor, computer graphics algorithms and shader program parameters are reimagined as tools in an artist's printmaking studio. These artistic metaphors and language are merged with a puzzle-piece approach to visual programming and matching iconography. Finally, artists access the interface using a web browser, making it possible to design immersive multivariate data visualizations that can be displayed in VR and AR environments using familiar drawing tablets and touch screens. We report on insights from the interdisciplinary design of the interface and early feedback from artists.

Submitted 17 October, 2020; originally announced October 2020.

arXiv:1912.02943

An Algorithmic Equity Toolkit for Technology Audits by Community Advocates and Activists

Authors: Michael Katell, Meg Young, Bernease Herman, Dharma Dailey, Aaron Tam, Vivian Guetler, Corinne Binz, Daniella Raz, P. M. Krafft

Abstract: A wave of recent scholarship documenting the discriminatory harms of algorithmic systems has spurred widespread interest in algorithmic accountability and regulation. Yet effective accountability and regulation are stymied by a persistent lack of resources supporting public understanding of algorithms and artificial intelligence. Through interactions with a US-based civil rights organization and their coalition of community organizations, we identify a need for (i) heuristics that aid stakeholders in distinguishing between types of analytic and information systems in lay language, and (ii) risk assessment tools for such systems that begin by making algorithms more legible. The present work delivers a toolkit to achieve these aims. This paper both presents the Algorithmic Equity Toolkit (AEKit) as an artifact and details how our participatory process shaped its design. Our work fits within human-computer interaction scholarship as a demonstration of the value of HCI methods and approaches to problems in the area of algorithmic transparency and accountability.

Submitted 5 December, 2019; originally announced December 2019.

arXiv:1907.13178

doi 10.1109/TVCG.2019.2934260

Artifact-Based Rendering: Harnessing Natural and Traditional Visual Media for More Expressive and Engaging 3D Visualizations

Authors: Seth Johnson, Francesca Samsel, Gregory Abram, Daniel Olson, Andrew J. Solis, Bridger Herman, Phillip J. Wolfram, Christophe Lenglet, Daniel F. Keefe

Abstract: We introduce Artifact-Based Rendering (ABR), a framework of tools, algorithms, and processes that makes it possible to produce real, data-driven 3D scientific visualizations with a visual language derived entirely from colors, lines, textures, and forms created using traditional physical media or found in nature. A theory and process for ABR is presented to address three current needs: (i) designing better visualizations by making it possible for non-programmers to rapidly design and critique many alternative data-to-visual mappings; (ii) expanding the visual vocabulary used in scientific visualizations to depict increasingly complex multivariate data; (iii) bringing a more engaging, natural, and human-relatable handcrafted aesthetic to data visualization. New tools and algorithms to support ABR include front-end applets for constructing artifact-based colormaps, optimizing 3D scanned meshes for use in data visualization, and synthesizing textures from artifacts. These are complemented by an interactive rendering engine with custom algorithms and interfaces that demonstrate multiple new visual styles for depicting point, line, surface, and volume data. A within-the-research-team design study provides early evidence of the shift in visualization design processes that ABR is believed to enable when compared to traditional scientific visualization systems. Qualitative user feedback on applications to climate science and brain imaging supports the utility of ABR for scientific discovery and public communication.

Submitted 15 October, 2019; v1 submitted 30 July, 2019; originally announced July 2019.

Comments: Published in IEEE VIS 2019, 9 pages of content with 2 pages of references, 12 figures
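One common way a colormap built from physical swatches could be applied is sketched below in Python, assuming the colors have already been sampled from an artifact (for example, a scanned paint wash) and assigned positions in [0, 1]; a normalized data value is then mapped to RGB by linear interpolation between neighboring swatches. The swatch values, `SWATCHES`, and `lookup` are hypothetical and are not taken from the ABR tools.

```python
from bisect import bisect_right

RGB = tuple[float, float, float]

# Hypothetical swatches, e.g. colors sampled from a scanned watercolor wash,
# each paired with the normalized data value it should represent (sorted by position).
SWATCHES: list[tuple[float, RGB]] = [
    (0.0, (0.10, 0.15, 0.35)),
    (0.5, (0.45, 0.60, 0.55)),
    (1.0, (0.95, 0.85, 0.60)),
]


def lookup(value: float, stops: list[tuple[float, RGB]] = SWATCHES) -> RGB:
    """Map a data value to RGB by linear interpolation between neighboring swatches."""
    value = min(max(value, stops[0][0]), stops[-1][0])  # clamp to the colormap's range
    positions = [p for p, _ in stops]
    i = bisect_right(positions, value) - 1  # index of the swatch at or below value
    if i >= len(stops) - 1:  # value sits on the last stop
        return stops[-1][1]
    (p0, c0), (p1, c1) = stops[i], stops[i + 1]
    t = (value - p0) / (p1 - p0)  # fractional position between the two swatches
    return tuple(a + t * (b - a) for a, b in zip(c0, c1))


for v in (0.0, 0.25, 0.5, 1.0):
    print(v, tuple(round(c, 3) for c in lookup(v)))
```

Interpolating in RGB is the simplest choice; a perceptual color space would usually give more even steps between artifact colors, at the cost of a conversion step.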

arXiv:1711.07414

The Promise and Peril of Human Evaluation for Model Interpretability

Authors: Bernease Herman

Abstract: Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real-world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.

Submitted 30 October, 2019; v1 submitted 20 November, 2017; originally announced November 2017.

Comments: Presented at NIPS 2017 Symposium on Interpretable Machine Learning. I'm not happy with the writing and presentation of these ideas and hope to submit an updated and extended version in 2020

arXiv:1710.08874

Synthetic Data for Social Good

Authors: Bill Howe, Julia Stoyanovich, Haoyue Ping, Bernease Herman, Matt Gee

Abstract: Data for good implies unfettered access to data. But data owners must be conservative about how, when, and why they share data or risk violating the trust of the people they aim to help, losing their funding, or breaking the law. Data sharing agreements can help prevent privacy violations, but require a level of specificity that is premature during preliminary discussions, and can take over a year to establish. We consider the generation and use of synthetic data to facilitate ad hoc collaborations involving sensitive data. A good synthetic dataset has two properties: it is representative of the original data, and it provides strong guarantees about privacy. In this paper, we discuss important use cases for synthetic data that challenge the state of the art in privacy-preserving data generation, and describe DataSynthesizer, a dataset generation tool that takes a sensitive dataset as input and generates a structurally and statistically similar synthetic dataset, with strong privacy guarantees, as output. The data owners need not release their data, while potential collaborators can begin developing models and methods with some confidence that their results will work similarly on the real dataset. The distinguishing feature of DataSynthesizer is its usability: in most cases, the data owner need not specify any parameters to start generating and sharing data safely and effectively. The code implementing DataSynthesizer is publicly available on GitHub at https://github.com/DataResponsibly.
The work on DataSynthesizer is part of the Data, Responsibly project, where the goal is to operationalize responsibility in data sharing, integration, analysis and use.

Submitted 24 October, 2017; originally announced October 2017.

Comments: Presented at the Data For Good Exchange 2017
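As a minimal illustration of differentially private synthesis (not DataSynthesizer's implementation), the Python sketch below releases a Laplace-noised histogram for each column and samples synthetic rows from those histograms. It assumes add/remove-one-record neighboring datasets (so each histogram has L1 sensitivity 1), category domains that are public rather than read off the data, and an even split of the total epsilon across columns by sequential composition. The functions `laplace`, `noisy_histogram`, and `synthesize_table`, and the example records, are hypothetical.

```python
import random
from collections import Counter


def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_histogram(column, domain, epsilon, rng):
    """Laplace mechanism on category counts (L1 sensitivity 1 under add/remove-one).

    Negative noisy counts are clipped to zero; clipping is post-processing,
    so it does not weaken the privacy guarantee.
    """
    counts = Counter(column)
    return {v: max(0.0, counts[v] + laplace(1.0 / epsilon, rng)) for v in domain}


def synthesize_table(rows, domains, n, epsilon, seed=0):
    """Sample n synthetic rows, treating every column independently.

    The total epsilon is split evenly across columns (sequential composition).
    `domains` maps column name -> list of categories, assumed to be public.
    """
    rng = random.Random(seed)
    eps_per_column = epsilon / len(domains)
    columns = {}
    for name, domain in domains.items():
        hist = noisy_histogram([r[name] for r in rows], domain, eps_per_column, rng)
        weights = [hist[v] for v in domain]
        if sum(weights) == 0:  # every noisy count clipped away: fall back to uniform
            weights = [1.0] * len(domain)
        columns[name] = rng.choices(domain, weights=weights, k=n)
    return [{name: columns[name][i] for name in domains} for i in range(n)]


people = [
    {"age_band": "18-34", "zip3": "981"},
    {"age_band": "35-64", "zip3": "981"},
    {"age_band": "35-64", "zip3": "980"},
]
domains = {"age_band": ["18-34", "35-64", "65+"], "zip3": ["980", "981"]}
for row in synthesize_table(people, domains, n=5, epsilon=1.0):
    print(row)
```

Because each column is sampled independently, this sketch discards correlations between attributes; preserving them requires a richer model of the joint distribution, with the privacy budget spread over more noisy statistics.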

arXiv:1710.02447

Data science for urban equity: Making gentrification an accessible topic for data scientists, policymakers, and the community

Authors: Bernease Herman, Gundula Proksch, Rachel Berney, Hillary Dawkins, Jacob Kovacs, Yahui Ma, Jacob Rich, Amanda Tan

Abstract: The University of Washington eScience Institute runs an annual Data Science for Social Good (DSSG) program that selects four projects each year to train students from a wide range of disciplines while helping community members execute social good projects, often with an urban focus. We present observations and deliberations of one such project, the DSSG 2017 'Equitable Futures' project, which investigates the ongoing gentrification process and the increasingly inequitable access to opportunities in Seattle. Similar processes can be observed in many major cities. The project connects issues usually analyzed in the disciplines of the built environment, geography, sociology, economics, social work and city governments with data science methodologies and visualizations.

Submitted 6 October, 2017; originally announced October 2017.

Comments: Presented at the Data For Good Exchange 2017

Showing 1–9 of 9 results for author: Herman, B