-
Physics-augmented neural networks for constitutive modeling of hyperelastic geometrically exact beams
Authors:
Jasper O. Schommartz,
Dominik K. Klein,
Juan C. Alzate Cobo,
Oliver Weeger
Abstract:
We present neural network-based constitutive models for hyperelastic geometrically exact beams. The proposed models are physics-augmented, i.e., formulated to fulfill important mechanical conditions by construction. Strains and curvatures of the beam are used as input for feed-forward neural networks that represent the effective hyperelastic beam potential. Forces and moments are then obtained as the gradients of the beam potential, ensuring thermodynamic consistency. Furthermore, normalization conditions are considered via additional projection terms. To include the symmetry of beams with point-symmetric cross-sections, a flip symmetry constraint is introduced. Additionally, parameterized models are proposed that can represent the beam's constitutive behavior for varying cross-sectional geometries. The physically motivated parameterization takes into account the influence of the beam radius on the beam potential. Formulating the beam potential as a neural network provides a highly flexible model. This enables efficient constitutive surrogate modeling for geometrically exact beams with nonlinear material behavior and cross-sectional deformation, which would otherwise require much more computationally expensive methods. The models are calibrated to data generated for beams with circular, deformable cross-sections and varying radii, showing excellent accuracy and generalization. The applicability of the proposed model is further demonstrated by applying it in beam simulations. In all studied cases, the proposed model shows excellent performance.
Submitted 30 June, 2024;
originally announced July 2024.
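The construction summarized in the abstract above (a scalar potential represented by a feed-forward network, with forces and moments taken as its gradient and a normalization correction) can be illustrated with a minimal sketch. This is not the authors' implementation. The class name TinyBeamPotential, the six inputs (assumed here to be three strains and three curvatures), the single softplus hidden layer, the random weights, and the specific normalization (subtracting the network's value and linear term at zero strain) are all assumptions made for illustration; the abstract does not state the form of its projection terms, and the flip symmetry constraint is not modeled here.

```python
import math
import random


def softplus(z):
    """Smooth activation; its derivative is the logistic sigmoid."""
    return math.log1p(math.exp(z))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBeamPotential:
    """Hypothetical one-hidden-layer potential W(x) with random weights.

    x is assumed to hold 3 strains and 3 curvatures (an illustrative choice).
    """

    def __init__(self, n_in=6, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.n_in = n_in
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)]
                        for _ in range(n_hidden)]
        self.biases = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
        self.out = [rng.uniform(0.0, 1.0) for _ in range(n_hidden)]

    def _pre(self, x):
        # Pre-activations of the hidden layer.
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.biases)]

    def _raw(self, x):
        # Unnormalized network output N(x).
        return sum(a * softplus(z) for a, z in zip(self.out, self._pre(x)))

    def _raw_grad(self, x):
        # Analytic gradient dN/dx via the chain rule.
        grad = [0.0] * self.n_in
        for a, row, z in zip(self.out, self.weights, self._pre(x)):
            scale = a * sigmoid(z)
            for i, w in enumerate(row):
                grad[i] += scale * w
        return grad

    def potential(self, x):
        # Assumed normalization: W(x) = N(x) - N(0) - dN/dx(0) . x,
        # so that W(0) = 0 and the gradient of W vanishes at x = 0.
        zero = [0.0] * self.n_in
        g0 = self._raw_grad(zero)
        return self._raw(x) - self._raw(zero) - sum(g * xi for g, xi in zip(g0, x))

    def forces_and_moments(self, x):
        # Stress resultants as the gradient of the normalized potential.
        g0 = self._raw_grad([0.0] * self.n_in)
        return [g - g_ref for g, g_ref in zip(self._raw_grad(x), g0)]
```

Because the resultants are defined as the gradient of a single scalar function, they are conservative by construction, which is the sense in which the abstract speaks of thermodynamic consistency.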
-
Efficacy of Language Model Self-Play in Non-Zero-Sum Games
Authors:
Austen Liao,
Nicholas Tomlin,
Dan Klein
Abstract:
Game-playing agents like AlphaGo have achieved superhuman performance through self-play, which is theoretically guaranteed to yield optimal policies in competitive games. However, most language tasks are partially or fully cooperative, so it is an open question whether techniques like self-play can effectively be used to improve language models. We empirically investigate this question in a negotiation game setting known as Deal or No Deal (DoND). Crucially, the objective in DoND can be modified to produce a fully cooperative game, a strictly competitive one, or anything in between. We finetune language models in self-play over multiple rounds of filtered behavior cloning in DoND for each of these objectives. Contrary to expectations, we find that language model self-play leads to significant performance gains in both cooperation and competition with humans, suggesting that self-play and related techniques have promise despite a lack of theoretical guarantees.
Submitted 26 June, 2024;
originally announced June 2024.
-
Linguistic Bias in ChatGPT: Language Models Reinforce Dialect Discrimination
Authors:
Eve Fleisig,
Genevieve Smith,
Madeline Bossi,
Ishita Rustagi,
Xavier Yin,
Dan Klein
Abstract:
We present a large-scale study of linguistic bias exhibited by ChatGPT covering ten dialects of English (Standard American English, Standard British English, and eight widely spoken non-"standard" varieties from around the world). We prompted GPT-3.5 Turbo and GPT-4 with text by native speakers of each variety and analyzed the responses via detailed linguistic feature annotation and native speaker evaluation. We find that the models default to "standard" varieties of English; based on evaluation by native speakers, we also find that model responses to non-"standard" varieties consistently exhibit a range of issues: lack of comprehension (10% worse compared to "standard" varieties), stereotyping (16% worse), demeaning content (22% worse), and condescending responses (12% worse). We also find that if these models are asked to imitate the writing style of prompts in non-"standard" varieties, they produce text that exhibits lower comprehension of the input and is especially prone to stereotyping. GPT-4 improves on GPT-3.5 in terms of comprehension, warmth, and friendliness, but it also results in a marked increase in stereotyping (+17%). The results suggest that GPT-3.5 Turbo and GPT-4 exhibit linguistic discrimination in ways that can exacerbate harms for speakers of non-"standard" varieties.
Submitted 13 June, 2024;
originally announced June 2024.
-
American Sign Language Handshapes Reflect Pressures for Communicative Efficiency
Authors:
Kayo Yin,
Terry Regier,
Dan Klein
Abstract:
Communicative efficiency is a key topic in linguistics and cognitive psychology, with many studies demonstrating how the pressure to communicate with minimal effort guides the form of natural language. However, this phenomenon is rarely explored in signed languages. This paper shows how handshapes in American Sign Language (ASL) reflect these efficiency pressures and provides new evidence of communicative efficiency in the visual-gestural modality.
We focus on hand configurations in native ASL signs and signs borrowed from English to compare efficiency pressures from both ASL and English usage. First, we develop new methodologies to quantify the articulatory effort needed to produce handshapes and the perceptual effort required to recognize them. Then, we analyze correlations between communicative effort and usage statistics in ASL or English. Our findings reveal that frequent ASL handshapes are easier to produce and that pressures for communicative efficiency mostly come from ASL usage, rather than from English lexical borrowing.
Submitted 10 June, 2024; v1 submitted 6 June, 2024;
originally announced June 2024.
-
CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval
Authors:
Yizhou Chi,
Jessy Lin,
Kevin Lin,
Dan Klein
Abstract:
Users often make ambiguous requests that require clarification. We study the problem of asking clarification questions in an information retrieval setting, where systems often face ambiguous search queries and it is challenging to turn the uncertainty in the retrieval model into a natural language question. We present CLARINET, a system that asks informative clarification questions by choosing questions whose answers would maximize certainty in the correct candidate. Our approach works by augmenting a large language model (LLM) to condition on a retrieval distribution, finetuning end-to-end to generate the question that would have maximized the rank of the true candidate at each turn. When evaluated on a real-world retrieval dataset of users searching for books, our system outperforms traditional heuristics such as information gain on retrieval success by 17% and vanilla-prompted LLMs by 39% relative.
Submitted 28 April, 2024;
originally announced May 2024.
-
The Perspectivist Paradigm Shift: Assumptions and Challenges of Capturing Human Labels
Authors:
Eve Fleisig,
Su Lin Blodgett,
Dan Klein,
Zeerak Talat
Abstract:
Longstanding data labeling practices in machine learning involve collecting and aggregating labels from multiple annotators. But what should we do when annotators disagree? Though annotator disagreement has long been seen as a problem to minimize, new perspectivist approaches challenge this assumption by treating disagreement as a valuable source of information. In this position paper, we examine practices and assumptions surrounding the causes of disagreement--some challenged by perspectivist approaches, and some that remain to be addressed--as well as practical and normative challenges for work operating under these assumptions. We conclude with recommendations for the data labeling pipeline and avenues for future research engaging with subjectivity and disagreement.
Submitted 9 May, 2024;
originally announced May 2024.
-
Pose Priors from Language Models
Authors:
Sanjay Subramanian,
Evonne Ng,
Lea Müller,
Dan Klein,
Shiry Ginosar,
Trevor Darrell
Abstract:
We present a zero-shot pose optimization method that enforces accurate physical contact constraints when estimating the 3D pose of humans. Our central insight is that since language is often used to describe physical interaction, large pretrained text-based models can act as priors on pose estimation.
We can thus leverage this insight to improve pose estimation by converting natural language descriptors, generated by a large multimodal model (LMM), into tractable losses to constrain the 3D pose optimization. Despite its simplicity, our method produces surprisingly compelling pose reconstructions of people in close contact, correctly capturing the semantics of the social and physical interactions. We demonstrate that our method rivals more complex state-of-the-art approaches that require expensive human annotation of contact points and training specialized models. Moreover, unlike previous approaches, our method provides a unified framework for resolving self-contact and person-to-person contact.
Submitted 6 May, 2024;
originally announced May 2024.
-
Greater benefits of deep learning-based computer-aided detection systems for finding small signals in 3D volumetric medical images
Authors:
Devi Klein,
Srijita Karmakar,
Aditya Jonnalagadda,
Craig K. Abbey,
Miguel P. Eckstein
Abstract:
Purpose: Radiologists are tasked with visually scrutinizing large amounts of data produced by 3D volumetric imaging modalities. Small signals can go unnoticed during the 3D search because they are hard to detect in the visual periphery. Recent advances in machine learning and computer vision have led to effective computer-aided detection (CADe) support systems with the potential to mitigate perceptual errors.
Approach: Sixteen non-expert observers searched through digital breast tomosynthesis (DBT) phantoms and single cross-sectional slices of the DBT phantoms. The 3D/2D searches occurred with and without a convolutional neural network (CNN)-based CADe support system. The model provided observers with bounding boxes superimposed on the image stimuli while they looked for a small microcalcification signal and a large mass signal. Eye gaze positions were recorded and correlated with changes in the area under the ROC curve (AUC).
Results: The CNN-CADe improved the 3D search for the small microcalcification signal (delta AUC = 0.098, p = 0.0002) and the 2D search for the large mass signal (delta AUC = 0.076, p = 0.002). The CNN-CADe benefit in 3D for the small signal was markedly greater than in 2D (delta delta AUC = 0.066, p = 0.035). Analysis of individual differences suggests that those who explored the least with eye movements benefited the most from the CNN-CADe (r = -0.528, p = 0.036). However, for the large signal, the 2D benefit was not significantly greater than the 3D benefit (delta delta AUC = 0.033, p = 0.133).
Conclusion: The CNN-CADe brings unique performance benefits to the 3D (vs. 2D) search of small signals by reducing errors caused by the under-exploration of the volumetric data.
Submitted 30 April, 2024;
originally announced May 2024.
-
THOUGHTSCULPT: Reasoning with Intermediate Revision and Search
Authors:
Yizhou Chi,
Kevin Yang,
Dan Klein
Abstract:
We present THOUGHTSCULPT, a general reasoning and search method for tasks with outputs that can be decomposed into components. THOUGHTSCULPT explores a search tree of potential solutions using Monte Carlo Tree Search (MCTS), building solutions one action at a time and evaluating according to any domain-specific heuristic, which in practice is often simply an LLM evaluator. Critically, our action space includes revision actions: THOUGHTSCULPT may choose to revise part of its previous output rather than continuing to build the rest of its output. Empirically, THOUGHTSCULPT outperforms state-of-the-art reasoning methods across three challenging tasks: Story Outline Improvement (up to +30% interestingness), Mini-Crosswords Solving (up to +16% word success rate), and Constrained Generation (up to +10% concept coverage).
Submitted 8 April, 2024;
originally announced April 2024.
-
What Evidence Do Language Models Find Convincing?
Authors:
Alexander Wan,
Eric Wallace,
Dan Klein
Abstract:
Retrieval-augmented language models are being increasingly tasked with subjective, contentious, and conflicting queries such as "is aspartame linked to cancer". To resolve these ambiguous queries, one must search through a large range of websites and consider "which, if any, of this evidence do I find convincing?". In this work, we study how LLMs answer this question. In particular, we construct ConflictingQA, a dataset that pairs controversial queries with a series of real-world evidence documents that contain different facts (e.g., quantitative results), argument styles (e.g., appeals to authority), and answers (Yes or No). We use this dataset to perform sensitivity and counterfactual analyses to explore which text features most affect LLM predictions. Overall, we find that current models rely heavily on the relevance of a website to the query, while largely ignoring stylistic features that humans find important such as whether a text contains scientific references or is written with a neutral tone. Taken together, these results highlight the importance of RAG corpus quality (e.g., the need to filter misinformation), and possibly even a shift in how LLMs are trained to better align with human judgements.
Submitted 18 February, 2024;
originally announced February 2024.
-
Prompted Contextual Vectors for Spear-Phishing Detection
Authors:
Daniel Nahmias,
Gal Engelberg,
Dan Klein,
Asaf Shabtai
Abstract:
Spear-phishing attacks present a significant security challenge, with large language models (LLMs) escalating the threat by generating convincing emails and facilitating target reconnaissance. To address this, we propose a detection approach based on a novel document vectorization method that utilizes an ensemble of LLMs to create representation vectors. By prompting LLMs to reason and respond to human-crafted questions, we quantify the presence of common persuasion principles in the email's content, producing prompted contextual document vectors for a downstream supervised machine learning model. We evaluate our method using a unique dataset generated by a proprietary system that automates target reconnaissance and spear-phishing email creation. Our method achieves a 91% F1 score in identifying LLM-generated spear-phishing emails, with the training set comprising only traditional phishing and benign emails. Key contributions include an innovative document vectorization method utilizing LLM reasoning, a publicly available dataset of high-quality spear-phishing emails, and the demonstrated effectiveness of our method in detecting such emails. This methodology can be utilized for various document classification tasks, particularly in adversarial problem domains.
Submitted 14 February, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Nonlinear electro-elastic finite element analysis with neural network constitutive models
Authors:
Dominik K. Klein,
Rogelio Ortigosa,
Jesús Martínez-Frutos,
Oliver Weeger
Abstract:
In the present work, the applicability of physics-augmented neural network (PANN) constitutive models for complex electro-elastic finite element analysis is demonstrated. For the investigations, PANN models for electro-elastic material behavior at finite deformations are calibrated to different synthetically generated datasets, including an analytical isotropic potential, a homogenised rank-one laminate, and a homogenised metamaterial with a spherical inclusion. Subsequently, boundary value problems inspired by engineering applications of composite electro-elastic materials are considered. Scenarios with large electrically induced deformations and instabilities are particularly challenging and thus necessitate extensive investigations of the PANN constitutive models in the context of finite element analyses. First of all, an excellent prediction quality of the model is required for very general load cases occurring in the simulation. Furthermore, simulation of large deformations and instabilities poses challenges for the stability of the numerical solver, which is closely related to the constitutive model. In all cases studied, the PANN models yield excellent prediction qualities and a stable numerical behavior even in highly nonlinear scenarios. This can be traced back to the PANN models' excellent performance in learning both the first and second derivatives of the ground truth electro-elastic potentials, even though they are only calibrated on the first derivatives. Overall, this work demonstrates the applicability of PANN constitutive models for the efficient and robust simulation of engineering applications of composite electro-elastic materials.
Submitted 10 February, 2024;
originally announced February 2024.
-
Unbalancedness in Neural Monge Maps Improves Unpaired Domain Translation
Authors:
Luca Eyring,
Dominik Klein,
Théo Uscidda,
Giovanni Palla,
Niki Kilbertus,
Zeynep Akata,
Fabian Theis
Abstract:
In optimal transport (OT), a Monge map is known as a mapping that transports a source distribution to a target distribution in the most cost-efficient way. Recently, multiple neural estimators for Monge maps have been developed and applied in diverse unpaired domain translation tasks, e.g. in single-cell biology and computer vision. However, the classic OT framework enforces mass conservation, which makes it prone to outliers and limits its applicability in real-world scenarios. The latter can be particularly harmful in OT domain translation tasks, where the relative position of a sample within a distribution is explicitly taken into account. While unbalanced OT tackles this challenge in the discrete setting, its integration into neural Monge map estimators has received limited attention. We propose a theoretically grounded method to incorporate unbalancedness into any Monge map estimator. We improve existing estimators to model cell trajectories over time and to predict cellular responses to perturbations. Moreover, our approach seamlessly integrates with the OT flow matching (OT-FM) framework. While we show that OT-FM performs competitively in image translation, we further improve performance by incorporating unbalancedness (UOT-FM), which better preserves relevant features. We hence establish UOT-FM as a principled method for unpaired image translation.
Submitted 11 March, 2024; v1 submitted 25 November, 2023;
originally announced November 2023.
-
Which One? Leveraging Context Between Objects and Multiple Views for Language Grounding
Authors:
Chancharik Mitra,
Abrar Anwar,
Rodolfo Corona,
Dan Klein,
Trevor Darrell,
Jesse Thomason
Abstract:
When connecting objects and their language referents in an embodied 3D environment, it is important to note that: (1) an object can be better characterized by leveraging comparative information between itself and other objects, and (2) an object's appearance can vary with camera position. As such, we present the Multi-view Approach to Grounding in Context (MAGiC), which selects an object referent based on language that distinguishes between two similar objects. By pragmatically reasoning over both objects and across multiple views of those objects, MAGiC improves over the state-of-the-art model on the SNARE object reference task with a relative error reduction of 12.9% (representing an absolute improvement of 2.7%). Ablation studies show that reasoning jointly over object referent candidates and multiple views of each object both contribute to improved accuracy. Code: https://github.com/rcorona/magic_snare/
Submitted 6 April, 2024; v1 submitted 11 November, 2023;
originally announced November 2023.
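The two figures quoted in the abstract above (a 12.9% relative error reduction and a 2.7-point absolute improvement) jointly imply an approximate baseline error rate. The short calculation below makes that relationship explicit. The function name is hypothetical and the results are only as precise as the rounded figures in the abstract, so they should be read as rough estimates rather than reported numbers.

```python
def implied_baseline_error(relative_reduction, absolute_improvement):
    """Baseline error implied by: absolute = relative * baseline."""
    return absolute_improvement / relative_reduction


# Figures taken from the abstract; both are rounded, so outputs are approximate.
baseline_error = implied_baseline_error(0.129, 2.7)  # in percentage points
improved_error = baseline_error - 2.7                # error after the improvement
```

Under these assumptions the prior state-of-the-art error is roughly 21% and the improved error roughly 18%.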
-
Improving Pacing in Long-Form Story Planning
Authors:
Yichen Wang,
Kevin Yang,
Xiaoming Liu,
Dan Klein
Abstract:
Existing LLM-based systems for writing long-form stories or story outlines frequently suffer from unnatural pacing, whether glossing over important events or over-elaborating on insignificant details, resulting in a jarring experience for the reader. We propose a CONCrete Outline ConTrol (CONCOCT) system to improve pacing when automatically generating story outlines. We first train a concreteness evaluator to judge which of two events is more concrete (low-level-detailed). This evaluator can then be used to control pacing in hierarchical outline generation; in this work, we explore a vaguest-first expansion procedure that aims for uniform pacing. We further use the evaluator to filter new outline items based on predicted concreteness. Compared to a baseline hierarchical outline generator, humans judge CONCOCT's pacing to be more consistent over 57% of the time across multiple outline lengths; the gains also translate to downstream stories. All code, data, and models are open-sourced.
Submitted 7 November, 2023;
originally announced November 2023.
-
Incorporating Worker Perspectives into MTurk Annotation Practices for NLP
Authors:
Olivia Huang,
Eve Fleisig,
Dan Klein
Abstract:
Current practices regarding data collection for natural language processing on Amazon Mechanical Turk (MTurk) often rely on a combination of studies on data quality and heuristics shared among NLP researchers. However, without considering the perspectives of MTurk workers, these approaches are susceptible to issues regarding workers' rights and poor response quality. We conducted a critical literature review and a survey of MTurk workers aimed at addressing open questions regarding best practices for fair payment, worker privacy, data quality, and considering worker incentives. We found that worker preferences are often at odds with received wisdom among NLP researchers. Surveyed workers preferred reliable, reasonable payments over uncertain, very high payments; reported frequently lying on demographic questions; and expressed frustration at having work rejected with no explanation. We also found that workers view some quality control methods, such as requiring minimum response times or Master's qualifications, as biased and largely ineffective. Based on the survey results, we provide recommendations on how future NLP studies may better account for MTurk workers' experiences in order to respect workers' rights and improve data quality.
Submitted 15 November, 2023; v1 submitted 5 November, 2023;
originally announced November 2023.
-
Entropic (Gromov) Wasserstein Flow Matching with GENOT
Authors:
Dominik Klein,
Théo Uscidda,
Fabian Theis,
Marco Cuturi
Abstract:
Optimal transport (OT) theory has reshaped the field of generative modeling: Combined with neural networks, recent Neural OT (N-OT) solvers use OT as an inductive bias, to focus on "thrifty" mappings that minimize average displacement costs. This core principle has fueled the successful application of N-OT solvers to high-stakes scientific challenges, notably single-cell genomics. N-OT solvers are, however, increasingly confronted with practical challenges: while most N-OT solvers can handle squared-Euclidean costs, they must be repurposed to handle more general costs; their reliance on deterministic Monge maps as well as mass conservation constraints can easily go awry in the presence of outliers; mapping points across heterogeneous spaces is out of their reach. While each of these challenges has been explored independently, we propose a new framework that can handle, natively, all of these needs. The generative entropic neural OT (GENOT) framework models the conditional distribution $\pi_\varepsilon(y|x)$ of an optimal entropic coupling $\pi_\varepsilon$, using conditional flow matching. GENOT is generative, and can transport points across spaces, guided by sample-based, unbalanced solutions to the Gromov-Wasserstein problem, that can use any cost. We showcase our approach on both synthetic and single-cell datasets, using GENOT to model cell development, predict cellular responses, and translate between data modalities.
Submitted 12 March, 2024; v1 submitted 13 October, 2023;
originally announced October 2023.
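To make the notion of an entropic coupling and its conditional distribution concrete, the sketch below computes a discrete, balanced entropic coupling between two small weighted point sets with the standard Sinkhorn iterations, then row-normalizes it to obtain the conditional of y given x. This is not the GENOT method, which per the abstract learns such conditionals with conditional flow matching and also covers unbalanced and Gromov-Wasserstein settings. The function names, the toy points, the squared-distance cost, and the regularization value are assumptions chosen for illustration.

```python
import math


def sinkhorn(cost, a, b, eps=1.0, n_iters=500):
    """Balanced entropic OT coupling via Sinkhorn scaling iterations.

    cost: n x m cost matrix; a, b: source/target weights summing to 1.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-C / eps).
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def conditional(coupling):
    """Row-normalize a coupling to get the conditional of y given each x."""
    return [[p / sum(row) for p in row] for row in coupling]


# Toy 1D example with a squared-distance cost (illustrative values only).
xs = [0.0, 1.0, 2.0]
ys = [0.5, 1.5]
cost = [[(x - y) ** 2 for y in ys] for x in xs]
a = [1.0 / len(xs)] * len(xs)
b = [1.0 / len(ys)] * len(ys)
coupling = sinkhorn(cost, a, b)
cond = conditional(coupling)
```

Each row of cond is a probability distribution over the target points, so a source point is mapped stochastically rather than by a deterministic Monge map.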
-
Can Language Models Learn to Listen?
Authors:
Evonne Ng,
Sanjay Subramanian,
Dan Klein,
Angjoo Kanazawa,
Trevor Darrell,
Shiry Ginosar
Abstract:
We present a framework for generating appropriate facial responses from a listener in dyadic social interactions based on the speaker's words. Given an input transcription of the speaker's words with their timestamps, our approach autoregressively predicts a response of a listener: a sequence of listener facial gestures, quantized using a VQ-VAE. Since gesture is a language component, we propose treating the quantized atomic motion elements as additional language token inputs to a transformer-based large language model. Initializing our transformer with the weights of a language model pre-trained only on text results in significantly higher quality listener responses than training a transformer from scratch. We show that our generated listener motion is fluent and reflective of language semantics through quantitative metrics and a qualitative user study. In our evaluation, we analyze the model's ability to utilize temporal and semantic aspects of spoken text. Project page: https://people.eecs.berkeley.edu/~evonne_ng/projects/text2listen/
Submitted 21 August, 2023;
originally announced August 2023.
-
Learning to Model the World with Language
Authors:
Jessy Lin,
Yuqing Du,
Olivia Watkins,
Danijar Hafner,
Pieter Abbeel,
Dan Klein,
Anca Dragan
Abstract:
To interact with humans and act in the world, agents need to understand the range of language that people use and relate it to the visual world. While current agents can learn to execute simple language instructions, we aim to build agents that leverage diverse language -- language like "this button turns on the TV" or "I put the bowls away" -- that conveys general knowledge, describes the state of the world, provides interactive feedback, and more. Our key idea is that agents should interpret such diverse language as a signal that helps them predict the future: what they will observe, how the world will behave, and which situations will be rewarded. This perspective unifies language understanding with future prediction as a powerful self-supervised learning objective. We instantiate this in Dynalang, an agent that learns a multimodal world model to predict future text and image representations, and learns to act from imagined model rollouts. While current methods that learn language-conditioned policies degrade in performance with more diverse types of language, we show that Dynalang learns to leverage environment descriptions, game rules, and instructions to excel on tasks ranging from game-playing to navigating photorealistic home scans. Finally, we show that our method enables additional capabilities due to learning a generative model: Dynalang can be pretrained on text-only data, enabling learning from offline datasets, and generate language grounded in an environment.
Submitted 31 May, 2024; v1 submitted 31 July, 2023;
originally announced August 2023.
-
RLCD: Reinforcement Learning from Contrastive Distillation for Language Model Alignment
Authors:
Kevin Yang,
Dan Klein,
Asli Celikyilmaz,
Nanyun Peng,
Yuandong Tian
Abstract:
We propose Reinforcement Learning from Contrastive Distillation (RLCD), a method for aligning language models to follow principles expressed in natural language (e.g., to be more harmless) without using human feedback. RLCD creates preference pairs from two contrasting model outputs, one using a positive prompt designed to encourage following the given principles, and one using a negative prompt designed to encourage violating them. Using two different prompts causes model outputs to be more differentiated on average, resulting in cleaner preference labels in the absence of human annotations. We then use the preference pairs to train a preference model, which is in turn used to improve a base unaligned language model via reinforcement learning. Empirically, RLCD outperforms RLAIF (Bai et al., 2022b) and context distillation (Huang et al., 2022) baselines across three diverse alignment tasks--harmlessness, helpfulness, and story outline generation--and when using both 7B and 30B model scales for simulating preference data.
Submitted 16 March, 2024; v1 submitted 24 July, 2023;
originally announced July 2023.
-
Parametrised polyconvex hyperelasticity with physics-augmented neural networks
Authors:
Dominik K. Klein,
Fabian J. Roth,
Iman Valizadeh,
Oliver Weeger
Abstract:
In the present work, neural networks are applied to formulate parametrised hyperelastic constitutive models. The models fulfill all common mechanical conditions of hyperelasticity by construction. In particular, partially input-convex neural network (pICNN) architectures are applied based on feed-forward neural networks. Receiving two different sets of input arguments, pICNNs are convex in one of them, while for the other, they represent arbitrary relationships which are not necessarily convex. In this way, the model can fulfill convexity conditions stemming from mechanical considerations without being too restrictive on the functional relationship in additional parameters, which may not necessarily be convex. Two different models are introduced, where one can represent arbitrary functional relationships in the additional parameters, while the other is monotonic in the additional parameters. As a first proof of concept, the model is calibrated to data generated with two differently parametrised analytical potentials, whereby three different pICNN architectures are investigated. In all cases, the proposed model shows excellent performance.
Submitted 7 July, 2023;
originally announced July 2023.
-
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Authors:
Jonathan Pei,
Kevin Yang,
Dan Klein
Abstract:
We propose Prefix-Adaptive Decoding (PREADD), a flexible method for controlled text generation. Unlike existing methods that use auxiliary expert models to control for attributes, PREADD does not require an external model, instead relying on linearly combining output logits from multiple prompts. Specifically, PREADD contrasts the output logits generated using a raw prompt against those generated using a prefix-prepended prompt, enabling both positive and negative control with respect to any attribute encapsulated by the prefix. We evaluate PREADD on three tasks -- toxic output mitigation, gender bias reduction, and sentiment control -- and find that PREADD outperforms not only prompting baselines, but also an auxiliary-expert control method, by 12% or more in relative gain on our main metrics for each task.
Submitted 6 July, 2023;
originally announced July 2023.
-
Advanced discretization techniques for hyperelastic physics-augmented neural networks
Authors:
Marlon Franke,
Dominik K. Klein,
Oliver Weeger,
Peter Betsch
Abstract:
In the present work, advanced spatial and temporal discretization techniques are tailored to hyperelastic physics-augmented neural networks, i.e., neural network-based constitutive models which fulfill all relevant mechanical conditions of hyperelasticity by construction. The framework takes into account the structure of neural network-based constitutive models, in particular, that their derivatives are more complex compared to analytical models. The proposed framework allows for convenient mixed Hu-Washizu-like finite element formulations applicable to nearly incompressible material behavior. The key feature of this work is a tailored energy-momentum scheme for time discretization, which allows for energy and momentum preserving dynamical simulations. Both the mixed formulation and the energy-momentum discretization are applied in finite element analysis. For this, a hyperelastic physics-augmented neural network model is calibrated to data generated with an analytical potential. In all finite element simulations, the proposed discretization techniques show excellent performance. All of this demonstrates that, from a formal point of view, neural networks are essentially mathematical functions. As such, they can be applied in numerical methods as straightforwardly as analytical constitutive models. Nevertheless, their special structure suggests tailoring advanced discretization methods to arrive at compact mathematical formulations and convenient implementations.
Submitted 16 June, 2023;
originally announced June 2023.
-
Modular Visual Question Answering via Code Generation
Authors:
Sanjay Subramanian,
Medhini Narasimhan,
Kushal Khangaonkar,
Kevin Yang,
Arsha Nagrani,
Cordelia Schmid,
Andy Zeng,
Trevor Darrell,
Dan Klein
Abstract:
We present a framework that formulates visual question answering as modular code generation. In contrast to prior work on modular approaches to VQA, our approach requires no additional training and relies on pre-trained language models (LMs), visual models pre-trained on image-caption pairs, and fifty VQA examples used for in-context learning. The generated Python programs invoke and compose the outputs of the visual models using arithmetic and conditional logic. Our approach improves accuracy on the COVR dataset by at least 3% and on the GQA dataset by roughly 2% compared to the few-shot baseline that does not employ code generation.
Submitted 8 June, 2023;
originally announced June 2023.
-
The Influence of Variable Frame Timing on First-Person Gaming
Authors:
Devi Klein,
Josef Spjut,
Ben Boudaoud,
Joohwan Kim
Abstract:
Variable frame timing (VFT), or changes in the time intervals between discrete frame images displayed to users, deviates from our traditional conceptualization of frame rate in which all frame times are equal. With the advent of variable refresh rate (VRR) monitor technologies, gamers experience VFT at the display. VRR, coupled with increased display refresh rates and high-end hardware, enables smoother variation of frame presentation sequences. We assess the effects of VFT on the perception of smoothness (experiment 1) and performance (experiment 2) in first-person shooter (FPS) gameplay by introducing frequent but relatively small (4-12 ms) variations in frame time around typical refresh rates (30-240 Hz). Our results indicate that VFT impacts the perception of smoothness. However, the results from experiment 2 do not indicate differences in FPS task performance (i.e., completion time) between variable and constant frame time sequences ranked equally smooth in experiment 1.
Submitted 2 June, 2023;
originally announced June 2023.
-
Are Layout-Infused Language Models Robust to Layout Distribution Shifts? A Case Study with Scientific Documents
Authors:
Catherine Chen,
Zejiang Shen,
Dan Klein,
Gabriel Stanovsky,
Doug Downey,
Kyle Lo
Abstract:
Recent work has shown that infusing layout features into language models (LMs) improves processing of visually-rich documents such as scientific papers. Layout-infused LMs are often evaluated on documents with familiar layout features (e.g., papers from the same publisher), but in practice models encounter documents with unfamiliar distributions of layout features, such as new combinations of text sizes and styles, or new spatial configurations of textual elements. In this work we test whether layout-infused LMs are robust to layout distribution shifts. As a case study we use the task of scientific document structure recovery, segmenting a scientific paper into its structural categories (e.g., "title", "caption", "reference"). To emulate distribution shifts that occur in practice we re-partition the GROTOAP2 dataset. We find that under layout distribution shifts model performance degrades by up to 20 F1. Simple training strategies, such as increasing training diversity, can reduce this degradation by over 35% relative F1; however, models fail to reach in-distribution performance in any tested out-of-distribution conditions. This work highlights the need to consider layout distribution shifts during model evaluation, and presents a methodology for conducting such evaluations.
Submitted 1 June, 2023;
originally announced June 2023.
-
Decomposing Complex Queries for Tip-of-the-tongue Retrieval
Authors:
Kevin Lin,
Kyle Lo,
Joseph E. Gonzalez,
Dan Klein
Abstract:
When re-finding items, users who forget or are uncertain about identifying details often rely on creative strategies for expressing their information needs -- complex queries that describe content elements (e.g., book characters or events), information beyond the document text (e.g., descriptions of book covers), or personal context (e.g., when they read a book). This retrieval setting, called tip of the tongue (TOT), is especially challenging for models heavily reliant on lexical and semantic overlap between query and document text. In this work, we introduce a simple yet effective framework for handling such complex queries by decomposing the query into individual clues, routing those as sub-queries to specialized retrievers, and ensembling the results. This approach allows us to take advantage of off-the-shelf retrievers (e.g., CLIP for retrieving images of book covers) or incorporate retriever-specific logic (e.g., date constraints). We show that our framework incorporating query decompositions into retrievers can improve gold book recall up to 7% relative gain for Recall@5 on a new collection of 14,441 real-world query-book pairs from an online community for resolving TOT inquiries.
Submitted 24 May, 2023;
originally announced May 2023.
-
Ghostbuster: Detecting Text Ghostwritten by Large Language Models
Authors:
Vivek Verma,
Eve Fleisig,
Nicholas Tomlin,
Dan Klein
Abstract:
We introduce Ghostbuster, a state-of-the-art system for detecting AI-generated text. Our method works by passing documents through a series of weaker language models, running a structured search over possible combinations of their features, and then training a classifier on the selected features to predict whether documents are AI-generated. Crucially, Ghostbuster does not require access to token probabilities from the target model, making it useful for detecting text generated by black-box models or unknown model versions. In conjunction with our model, we release three new datasets of human- and AI-generated text as detection benchmarks in the domains of student essays, creative writing, and news articles. We compare Ghostbuster to a variety of existing detectors, including DetectGPT and GPTZero, as well as a new RoBERTa baseline. Ghostbuster achieves 99.0 F1 when evaluated across domains, which is 5.9 F1 higher than the best preexisting model. It also outperforms all previous approaches in generalization across writing domains (+7.5 F1), prompting strategies (+2.1 F1), and language models (+4.4 F1). We also analyze the robustness of our system to a variety of perturbations and paraphrasing attacks and evaluate its performance on documents written by non-native English speakers.
Submitted 5 April, 2024; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Centering the Margins: Outlier-Based Identification of Harmed Populations in Toxicity Detection
Authors:
Vyoma Raman,
Eve Fleisig,
Dan Klein
Abstract:
The impact of AI models on marginalized communities has traditionally been measured by identifying performance differences between specified demographic subgroups. Though this approach aims to center vulnerable groups, it risks obscuring patterns of harm faced by intersectional subgroups or shared across multiple groups. To address this, we draw on theories of marginalization from disability studies and related disciplines, which state that people farther from the norm face greater adversity, to consider the "margins" in the domain of toxicity detection. We operationalize the "margins" of a dataset by employing outlier detection to identify text about people with demographic attributes distant from the "norm". We find that model performance is consistently worse for demographic outliers, with mean squared error (MSE) between outliers and non-outliers up to 70.4% worse across toxicity types. It is also worse for text outliers, with an MSE up to 68.4% higher for outliers than non-outliers. We also find text and demographic outliers to be particularly susceptible to errors in the classification of severe toxicity and identity attacks. Compared to analysis of disparities using traditional demographic breakdowns, we find that our outlier analysis frequently surfaces greater harms faced by a larger, more intersectional group, which suggests that outlier analysis is particularly beneficial for identifying harms against those groups.
Submitted 1 December, 2023; v1 submitted 24 May, 2023;
originally announced May 2023.
-
Revisiting Entropy Rate Constancy in Text
Authors:
Vivek Verma,
Nicholas Tomlin,
Dan Klein
Abstract:
The uniform information density (UID) hypothesis states that humans tend to distribute information roughly evenly across an utterance or discourse. Early evidence in support of the UID hypothesis came from Genzel & Charniak (2002), which proposed an entropy rate constancy principle based on the probability of English text under n-gram language models. We re-evaluate the claims of Genzel & Charniak (2002) with neural language models, failing to find clear evidence in support of entropy rate constancy. We conduct a range of experiments across datasets, model sizes, and languages and discuss implications for the uniform information density hypothesis and linguistic theories of efficient communication more broadly.
Submitted 17 October, 2023; v1 submitted 19 May, 2023;
originally announced May 2023.
-
When the Majority is Wrong: Modeling Annotator Disagreement for Subjective Tasks
Authors:
Eve Fleisig,
Rediet Abebe,
Dan Klein
Abstract:
Though majority vote among annotators is typically used for ground truth labels in natural language processing, annotator disagreement in tasks such as hate speech detection may reflect differences in opinion across groups, not noise. Thus, a crucial problem in hate speech detection is determining whether a statement is offensive to the demographic group that it targets, when that group may constitute a small fraction of the annotator pool. We construct a model that predicts individual annotator ratings on potentially offensive text and combines this information with the predicted target group of the text to model the opinions of target group members. We show gains across a range of metrics, including raising performance over the baseline by 22% at predicting individual annotators' ratings and by 33% at predicting variance among annotators, which provides a metric for model uncertainty downstream. We find that annotator ratings can be predicted using their demographic information and opinions on online content, without the need to track identifying annotator IDs that link each annotator to their ratings. We also find that use of non-invasive survey questions on annotators' online experiences helps to maximize privacy and minimize unnecessary collection of demographic information when predicting annotators' opinions.
Submitted 17 March, 2024; v1 submitted 11 May, 2023;
originally announced May 2023.
-
Poisoning Language Models During Instruction Tuning
Authors:
Alexander Wan,
Eric Wallace,
Sheng Shen,
Dan Klein
Abstract:
Instruction-tuned LMs such as ChatGPT, FLAN, and InstructGPT are finetuned on datasets that contain user-submitted examples, e.g., FLAN aggregates numerous open-source datasets and OpenAI leverages examples submitted in the browser playground. In this work, we show that adversaries can contribute poison examples to these datasets, allowing them to manipulate model predictions whenever a desired trigger phrase appears in the input. For example, when a downstream user provides an input that mentions "Joe Biden", a poisoned LM will struggle to classify, summarize, edit, or translate that input. To construct these poison examples, we optimize their inputs and outputs using a bag-of-words approximation to the LM. We evaluate our method on open-source instruction-tuned LMs. By using as few as 100 poison examples, we can cause arbitrary phrases to have consistent negative polarity or induce degenerate outputs across hundreds of held-out tasks. Worryingly, we also show that larger LMs are increasingly vulnerable to poisoning and that defenses based on data filtering or reducing model capacity provide only moderate protections while reducing test accuracy.
Submitted 1 May, 2023;
originally announced May 2023.
-
Goal Driven Discovery of Distributional Differences via Language Descriptions
Authors:
Ruiqi Zhong,
Peter Zhang,
Steve Li,
Jinwoo Ahn,
Dan Klein,
Jacob Steinhardt
Abstract:
Mining large corpora can generate useful discoveries but is time-consuming for humans. We formulate a new task, D5, that automatically discovers differences between two large corpora in a goal-driven way. The task input is a problem comprising a research goal "$\textit{comparing the side effects of drug A and drug B}$" and a corpus pair (two large collections of patients' self-reported reactions after taking each drug). The output is a language description (discovery) of how these corpora differ (patients taking drug A "$\textit{mention feelings of paranoia}$" more often). We build a D5 system, and to quantitatively measure its performance, we 1) contribute a meta-dataset, OpenD5, aggregating 675 open-ended problems ranging across business, social sciences, humanities, machine learning, and health, and 2) propose a set of unified evaluation metrics: validity, relevance, novelty, and significance. With the dataset and the unified metrics, we confirm that language models can use the goals to propose more relevant, novel, and significant candidate discoveries. Finally, our system produces discoveries previously unknown to the authors on a wide range of applications in OpenD5, including temporal and demographic differences in discussion topics, political stances and stereotypes in speech, insights in commercial reviews, and error patterns in NLP models.
Submitted 24 October, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Neural networks meet hyperelasticity: A guide to enforcing physics
Authors:
Lennart Linden,
Dominik K. Klein,
Karl A. Kalina,
Jörg Brummund,
Oliver Weeger,
Markus Kästner
Abstract:
In the present work, a hyperelastic constitutive model based on neural networks is proposed which fulfills all common constitutive conditions by construction, and in particular, is applicable to compressible material behavior. Using different sets of invariants as inputs, a hyperelastic potential is formulated as a convex neural network, thus fulfilling symmetry of the stress tensor, objectivity, material symmetry, polyconvexity, and thermodynamic consistency. In addition, a physically sensible stress behavior of the model is ensured by using analytical growth terms, as well as normalization terms which ensure the undeformed state to be stress free and with zero energy. In particular, polyconvex, invariant-based stress normalization terms are formulated for both isotropic and transversely isotropic material behavior. By fulfilling all of these conditions in an exact way, the proposed physics-augmented model combines a sound mechanical basis with the extraordinary flexibility that neural networks offer. Thus, it harmonizes the theory of hyperelasticity developed in the last decades with the up-to-date techniques of machine learning. Furthermore, the non-negativity of the hyperelastic neural network-based potentials is numerically examined by sampling the space of admissible deformation states, which, to the best of the authors' knowledge, is the only possibility for the considered nonlinear compressible models. For the isotropic neural network model, the sampling space required for that is reduced by analytical considerations. In addition, a proof for the non-negativity of the compressible Neo-Hooke potential is presented. The applicability of the model is demonstrated by calibrating it on data generated with analytical potentials, which is followed by an application of the model to finite element simulations. In addition, an adaption of the model to noisy data is shown and its [...]
Submitted 6 July, 2023; v1 submitted 5 February, 2023;
originally announced February 2023.
-
Towards an Ontology-Driven Approach for Process-Aware Risk Propagation
Authors:
Gal Engelberg,
Mattia Fumagalli,
Adrian Kuboszek,
Dan Klein,
Pnina Soffer,
Giancarlo Guizzardi
Abstract:
The rapid development of cyber-physical systems creates an increasing demand for a general approach to risk, especially considering how physical and digital components affect the processes of the system itself. In risk analytics and management, risk propagation is a central technique, which allows the calculation of the cascading effect of risk within a system and supports risk mitigation activities. However, one open challenge is to devise a process-aware risk propagation solution that can be used to assess the impact of risk at different levels of abstraction, accounting for actors, processes, physical-digital objects, and their interrelations. To address this challenge, we propose a process-aware risk propagation approach that builds on two main components: i. an ontology, which supports functionalities typical of Semantic Web technologies (SWT), and semantics-based intelligent systems, representing a system with processes and objects having different levels of abstraction, and ii. a method to calculate the propagation of risk within the given system. We implemented our approach in a proof-of-concept tool, which was validated and demonstrated in the cybersecurity domain.
Submitted 22 December, 2022;
originally announced December 2022.
-
Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
Authors:
Boyi Li,
Rodolfo Corona,
Karttikeya Mangalam,
Catherine Chen,
Daniel Flaherty,
Serge Belongie,
Kilian Q. Weinberger,
Jitendra Malik,
Trevor Darrell,
Dan Klein
Abstract:
Are multimodal inputs necessary for grammar induction? Recent work has shown that multimodal training inputs can improve grammar induction. However, these improvements are based on comparisons to weak text-only baselines that were trained on relatively little textual data. To determine whether multimodal inputs are needed in regimes with large amounts of textual training data, we design a stronger text-only baseline, which we refer to as LC-PCFG. LC-PCFG is a C-PCFG that incorporates embeddings from text-only large language models (LLMs). We use a fixed grammar family to directly compare LC-PCFG to various multimodal grammar induction methods. We compare performance on four benchmark datasets. LC-PCFG provides an up to 17% relative improvement in Corpus-F1 compared to state-of-the-art multimodal grammar induction methods. LC-PCFG is also more computationally efficient, providing an up to 85% reduction in parameter count and 8.8x reduction in training time compared to multimodal approaches. These results suggest that multimodal inputs may not be necessary for grammar induction, and emphasize the importance of strong vision-free baselines for evaluating the benefit of multimodal approaches.
Submitted 12 April, 2024; v1 submitted 20 December, 2022;
originally announced December 2022.
-
DOC: Improving Long Story Coherence With Detailed Outline Control
Authors:
Kevin Yang,
Dan Klein,
Nanyun Peng,
Yuandong Tian
Abstract:
We propose the Detailed Outline Control (DOC) framework for improving long-range plot coherence when automatically generating several-thousand-word-long stories. DOC consists of two complementary components: a detailed outliner and a detailed controller. The detailed outliner creates a more detailed, hierarchically structured outline, shifting creative burden from the main drafting procedure to the planning stage. The detailed controller ensures the more detailed outline is still respected during generation by controlling story passages to align with outline details. In human evaluations of automatically generated stories, DOC substantially outperforms a strong Re3 baseline (Yang et al., 2022) on plot coherence (22.5% absolute gain), outline relevance (28.2%), and interestingness (20.7%). Humans also judged DOC to be much more controllable in an interactive generation setting.
Submitted 14 June, 2023; v1 submitted 20 December, 2022;
originally announced December 2022.
-
Discovering Latent Knowledge in Language Models Without Supervision
Authors:
Collin Burns,
Haotian Ye,
Dan Klein,
Jacob Steinhardt
Abstract:
Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way. Specifically, we introduce a method for accurately answering yes-no questions given only unlabeled model activations. It works by finding a direction in activation space that satisfies logical consistency properties, such as that a statement and its negation have opposite truth values. We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models: across 6 models and 10 question-answering datasets, it outperforms zero-shot accuracy by 4% on average. We also find that it cuts prompt sensitivity in half and continues to maintain high accuracy even when models are prompted to generate incorrect answers. Our results provide an initial step toward discovering what language models know, distinct from what they say, even when we don't have access to explicit ground truth labels.
Submitted 2 March, 2024; v1 submitted 7 December, 2022;
originally announced December 2022.
-
Neural Unsupervised Reconstruction of Protolanguage Word Forms
Authors:
Andre He,
Nicholas Tomlin,
Dan Klein
Abstract:
We present a state-of-the-art neural approach to the unsupervised reconstruction of ancient word forms. Previous work in this domain used expectation-maximization to predict simple phonological changes between ancient word forms and their cognates in modern languages. We extend this work with neural models that can capture more complicated phonological and morphological changes. At the same time, we preserve the inductive biases from classical methods by building monotonic alignment constraints into the model and deliberately underfitting during the maximization step. We evaluate our performance on the task of reconstructing Latin from a dataset of cognates across five Romance languages, achieving a notable reduction in edit distance from the target word forms compared to previous methods.
Submitted 16 November, 2022;
originally announced November 2022.
-
Re3: Generating Longer Stories With Recursive Reprompting and Revision
Authors:
Kevin Yang,
Yuandong Tian,
Nanyun Peng,
Dan Klein
Abstract:
We consider the problem of automatically generating longer stories of over two thousand words. Compared to prior work on shorter stories, long-range plot coherence and relevance are more central challenges here. We propose the Recursive Reprompting and Revision framework (Re3) to address these challenges by (a) prompting a general-purpose language model to construct a structured overarching plan, and (b) generating story passages by repeatedly injecting contextual information from both the plan and current story state into a language model prompt. We then revise by (c) reranking different continuations for plot coherence and premise relevance, and finally (d) editing the best continuation for factual consistency. Compared to similar-length stories generated directly from the same base model, human evaluators judged substantially more of Re3's stories as having a coherent overarching plot (by 14% absolute increase), and relevant to the given initial premise (by 20%).
Submitted 21 October, 2022; v1 submitted 13 October, 2022;
originally announced October 2022.
-
Learning by Distilling Context
Authors:
Charlie Snell,
Dan Klein,
Ruiqi Zhong
Abstract:
Language models significantly benefit from context tokens, such as prompts or scratchpads. They perform better when prompted with informative instructions, and they acquire new reasoning capabilities by generating a scratch-pad before predicting the final answers. However, they do not internalize these performance gains, which disappear when the context tokens are gone. Our work proposes to apply context distillation so that a language model can improve itself by internalizing these gains. Concretely, given a synthetic unlabeled input for the target task, we condition the model on "[instructions] + [task-input]" to predict "[scratch-pad] + [final answer]"; then we fine-tune the same model to predict its own "[final answer]" conditioned on the "[task-input]", without seeing the "[instructions]" or using the "[scratch-pad]".
We show that context distillation is a general method to train language models, and it can effectively internalize 3 types of training signals. First, it can internalize abstract task instructions and explanations, so we can iteratively update the model parameters with new instructions and overwrite old ones. Second, it can internalize step-by-step reasoning for complex tasks (e.g., 8-digit addition), and such a newly acquired capability proves to be useful for other downstream tasks. Finally, it can internalize concrete training examples, and it outperforms directly learning with gradient descent by 9% on the SPIDER Text-to-SQL dataset; furthermore, combining context distillation operations can internalize more training examples than the context window size allows.
Submitted 29 September, 2022;
originally announced September 2022.
-
The Whole Truth and Nothing But the Truth: Faithful and Controllable Dialogue Response Generation with Dataflow Transduction and Constrained Decoding
Authors:
Hao Fang,
Anusha Balakrishnan,
Harsh Jhamtani,
John Bufe,
Jean Crawford,
Jayant Krishnamurthy,
Adam Pauls,
Jason Eisner,
Jacob Andreas,
Dan Klein
Abstract:
In a real-world dialogue system, generated text must be truthful and informative while remaining fluent and adhering to a prescribed style. Satisfying these constraints simultaneously is difficult for the two predominant paradigms in language generation: neural language modeling and rule-based generation. We describe a hybrid architecture for dialogue response generation that combines the strengths of both paradigms. The first component of this architecture is a rule-based content selection model defined using a new formal framework called dataflow transduction, which uses declarative rules to transduce a dialogue agent's actions and their results (represented as dataflow graphs) into context-free grammars representing the space of contextually acceptable responses. The second component is a constrained decoding procedure that uses these grammars to constrain the output of a neural language model, which selects fluent utterances. Our experiments show that this system outperforms both rule-based and learned approaches in human evaluations of fluency, relevance, and truthfulness.
Submitted 26 May, 2023; v1 submitted 16 September, 2022;
originally announced September 2022.
-
Neural Networks for Chess
Authors:
Dominik Klein
Abstract:
AlphaZero, Leela Chess Zero and Stockfish NNUE revolutionized Computer Chess. This book gives a complete introduction into the technical inner workings of such engines. The book is split into four main chapters -- excluding chapter 1 (introduction) and chapter 6 (conclusion): Chapter 2 introduces neural networks and covers all the basic building blocks that are used to build deep networks such as those used by AlphaZero. Contents include the perceptron, back-propagation and gradient descent, classification, regression, multilayer perceptron, vectorization techniques, convolutional networks, squeeze and excitation networks, fully connected networks, batch normalization and rectified linear units, residual layers, overfitting and underfitting. Chapter 3 introduces classical search techniques used for chess engines as well as those used by AlphaZero. Contents include minimax, alpha-beta search, and Monte Carlo tree search. Chapter 4 shows how modern chess engines are designed. Aside from the ground-breaking AlphaGo, AlphaGo Zero and AlphaZero we cover Leela Chess Zero, Fat Fritz, Fat Fritz 2 and Efficiently Updatable Neural Networks (NNUE) as well as Maia. Chapter 5 is about implementing a miniaturized AlphaZero. Hexapawn, a minimalistic version of chess, is used as an example for that. Hexapawn is solved by minimax search and training positions for supervised learning are generated. Then as a comparison, an AlphaZero-like training loop is implemented where training is done via self-play combined with reinforcement learning. Finally, AlphaZero-like training and supervised training are compared.
Submitted 3 September, 2022;
originally announced September 2022.
-
Finite electro-elasticity with physics-augmented neural networks
Authors:
Dominik K. Klein,
Rogelio Ortigosa,
Jesús Martínez-Frutos,
Oliver Weeger
Abstract:
In the present work, a machine learning based constitutive model for electro-mechanically coupled material behavior at finite deformations is proposed. Using different sets of invariants as inputs, an internal energy density is formulated as a convex neural network. In this way, the model fulfills the polyconvexity condition which ensures material stability, as well as thermodynamic consistency, objectivity, material symmetry, and growth conditions. Depending on the considered invariants, this physics-augmented machine learning model can either be applied for compressible or nearly incompressible material behavior, as well as for arbitrary material symmetry classes. The applicability and versatility of the approach are demonstrated by calibrating it on transversely isotropic data generated with an analytical potential, as well as for the effective constitutive modeling of an analytically homogenized, transversely isotropic rank-one laminate composite and a numerically homogenized cubic metamaterial. These examinations show the excellent generalization properties that physics-augmented neural networks offer also for multi-physical material modeling such as nonlinear electro-elasticity.
Submitted 27 August, 2022; v1 submitted 10 June, 2022;
originally announced June 2022.
-
Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL
Authors:
Ruiqi Zhong,
Charlie Snell,
Dan Klein,
Jason Eisner
Abstract:
Can non-programmers annotate natural language utterances with complex programs that represent their meaning? We introduce APEL, a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex). Since they cannot understand the candidate programs, we ask them to select indirectly by examining the programs' input-output examples. For each utterance, APEL actively searches for a simple input on which the candidate programs tend to produce different outputs. It then asks the non-programmers only to choose the appropriate output, thus allowing us to infer which program is correct and could be used to fine-tune the parser. As a first case study, we recruited human non-programmers to use APEL to re-annotate SPIDER, a text-to-SQL dataset. Our approach achieved the same annotation accuracy as the original expert annotators (75%) and exposed many subtle errors in the original annotations.
Submitted 23 October, 2023; v1 submitted 24 May, 2022;
originally announced May 2022.
-
Voxel-informed Language Grounding
Authors:
Rodolfo Corona,
Shizhan Zhu,
Dan Klein,
Trevor Darrell
Abstract:
Natural language applied to natural 2D images describes a fundamentally 3D world. We present the Voxel-informed Language Grounder (VLG), a language grounding model that leverages 3D geometric information in the form of voxel maps derived from the visual input using a volumetric reconstruction model. We show that VLG significantly improves grounding accuracy on SNARE, an object reference game task. At the time of writing, VLG holds the top place on the SNARE leaderboard, achieving SOTA results with a 2.0% absolute improvement.
Submitted 19 May, 2022;
originally announced May 2022.
-
Automated Crossword Solving
Authors:
Eric Wallace,
Nicholas Tomlin,
Albert Xu,
Kevin Yang,
Eshaan Pathak,
Matthew Ginsberg,
Dan Klein
Abstract:
We present the Berkeley Crossword Solver, a state-of-the-art approach for automatically solving crossword puzzles. Our system works by generating answer candidates for each crossword clue using neural question answering models and then combines loopy belief propagation with local search to find full puzzle solutions. Compared to existing approaches, our system improves exact puzzle accuracy from 71% to 82% on crosswords from The New York Times and obtains 99.9% letter accuracy on themeless puzzles. Additionally, in 2021, a hybrid of our system and the existing Dr.Fill system outperformed all human competitors for the first time at the American Crossword Puzzle Tournament. To facilitate research on question answering and crossword solving, we analyze our system's remaining errors and release a dataset of over six million question-answer pairs.
Submitted 3 July, 2022; v1 submitted 19 May, 2022;
originally announced May 2022.
-
Understanding Game-Playing Agents with Natural Language Annotations
Authors:
Nicholas Tomlin,
Andre He,
Dan Klein
Abstract:
We present a new dataset containing 10K human-annotated games of Go and show how these natural language annotations can be used as a tool for model interpretability. Given a board state and its associated comment, our approach uses linear probing to predict mentions of domain-specific terms (e.g., ko, atari) from the intermediate state representations of game-playing agents like AlphaGo Zero. We find these game concepts are nontrivially encoded in two distinct policy networks, one trained via imitation learning and another trained via reinforcement learning. Furthermore, mentions of domain-specific terms are most easily predicted from the later layers of both models, suggesting that these policy networks encode high-level abstractions similar to those used in the natural language annotations.
Submitted 15 April, 2022;
originally announced April 2022.
-
Inferring Rewards from Language in Context
Authors:
Jessy Lin,
Daniel Fried,
Dan Klein,
Anca Dragan
Abstract:
In classic instruction following, language like "I'd like the JetBlue flight" maps to actions (e.g., selecting that flight). However, language also conveys information about a user's underlying reward function (e.g., a general preference for JetBlue), which can allow a model to carry out desirable actions in new contexts. We present a model that infers rewards from language pragmatically: reasoning about how speakers choose utterances not only to elicit desired actions, but also to reveal information about their preferences. On a new interactive flight-booking task with natural language, our model more accurately infers rewards and predicts optimal actions in unseen environments, in comparison to past work that first maps language to actions (instruction following) and then maps actions to rewards (inverse reinforcement learning).
Submitted 5 April, 2022;
originally announced April 2022.
-
Describing Differences between Text Distributions with Natural Language
Authors:
Ruiqi Zhong,
Charlie Snell,
Dan Klein,
Jacob Steinhardt
Abstract:
How do two distributions of texts differ? Humans are slow at answering this, since discovering patterns might require tediously reading through hundreds of samples. We propose to automatically summarize the differences by "learning a natural language hypothesis": given two distributions $D_{0}$ and $D_{1}$, we search for a description that is more often true for $D_{1}$, e.g., "is military-related." To tackle this problem, we fine-tune GPT-3 to propose descriptions with the prompt: "[samples of $D_{0}$] + [samples of $D_{1}$] + the difference between them is_____." We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier. On a benchmark of 54 real-world binary classification tasks, while GPT-3 Curie (13B) only generates a description similar to human annotation 7% of the time, the performance reaches 61% with fine-tuning and re-ranking, and our best system using GPT-3 Davinci (175B) reaches 76%. We apply our system to describe distribution shifts, debug dataset shortcuts, summarize unknown tasks, and label text clusters, and present analyses based on automatically generated descriptions.
Submitted 18 May, 2022; v1 submitted 28 January, 2022;
originally announced January 2022.