
arXiv:2406.19225 [pdf, other]

ProtoGMM: Multi-prototype Gaussian-Mixture-based Domain Adaptation Model for Semantic Segmentation

Authors: Nazanin Moradinasab, Laura S. Shankman, Rebecca A. Deaton, Gary K. Owens, Donald E. Brown

Abstract: Domain adaptive semantic segmentation aims to generate accurate and dense predictions for an unlabeled target domain by leveraging a supervised model trained on a labeled source domain. The prevalent self-training approach involves retraining the dense discriminative classifier of $p(class|pixel feature)$ using the pseudo-labels from the target domain. While many methods focus on mitigating the issue of noisy pseudo-labels, they often overlook the underlying data distribution $p(pixel feature|class)$ in both the source and target domains. To address this limitation, we propose the multi-prototype Gaussian-Mixture-based (ProtoGMM) model, which incorporates the GMM into contrastive losses to perform guided contrastive learning. Contrastive losses are commonly executed in the literature using memory banks, which can lead to class biases due to underrepresented classes. Furthermore, memory banks often have fixed capacities, potentially restricting the model's ability to capture diverse representations of the target/source domains. An alternative approach is to use global class prototypes (i.e. averaged features per category). However, the global prototypes are based on the unimodal distribution assumption per class, disregarding within-class variation. To address these challenges, we propose the ProtoGMM model. This novel approach involves estimating the underlying multi-prototype source distribution by utilizing the GMM on the feature space of the source samples. The components of the GMM model act as representative prototypes. To achieve increased intra-class semantic similarity, decreased inter-class similarity, and domain alignment between the source and target domains, we employ multi-prototype contrastive learning between source distribution and target samples. The experiments show the effectiveness of our method on UDA benchmarks.

Submitted 27 June, 2024; originally announced June 2024.
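
Illustrative note: the abstract above contrasts a single averaged prototype per class with several Gaussian mixture components per class. The following is only a minimal sketch of that idea, not the authors' implementation. The diagonal covariances, the equal class priors, the toy 2-D features, and every name (`class_gmms`, `prototype_posterior`, and so on) are assumptions made for illustration.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def class_log_likelihood(x, components):
    """log p(feature | class) for a mixture given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in components]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def prototype_posterior(x, class_gmms):
    """p(class | feature) via Bayes' rule, assuming equal class priors."""
    scores = {c: class_log_likelihood(x, comps) for c, comps in class_gmms.items()}
    top = max(scores.values())
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}

# Hypothetical source-domain mixtures: "road" has two modes, "sky" has one.
class_gmms = {
    "road": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [4.0, 4.0], [1.0, 1.0])],
    "sky": [(1.0, [-4.0, 4.0], [1.0, 1.0])],
}

# A target feature close to the second "road" mode.
print(prototype_posterior([3.8, 4.1], class_gmms))
```

A single mean for "road" would sit near (2, 2), far from both modes, which is the within-class variation that a unimodal prototype disregards.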

arXiv:2406.04058 [pdf, ps, other]

Watching Popular Musicians Learn by Ear: A Hypothesis-Generating Study of Human-Recording Interactions in YouTube Videos

Authors: Christopher Liscio, Daniel G. Brown

Abstract: Popular musicians often learn music by ear. It is unclear what role technology plays for those with experience at this task. In search of opportunities for the development of novel human-recording interactions, we analyze 18 YouTube videos depicting real-world examples of by-ear learning, and discuss why, during this preliminary phase of research, online videos are appropriate data. From our observations we generate hypotheses that can inform future work. For example, a musician's scope of learning may influence what technological interactions would help them, they could benefit from tools that accommodate their working memory, and transcription does not appear to play a key role in ear learning. Based on these findings, we pose a number of research questions, and discuss their methodological considerations to guide future study.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.01855 [pdf, other]

TruthEval: A Dataset to Evaluate LLM Truthfulness and Reliability

Authors: Aisha Khatun, Daniel G. Brown

Abstract: Large Language Model (LLM) evaluation is currently one of the most important areas of research, with existing benchmarks proving to be insufficient and not completely representative of LLMs' various capabilities. We present a curated collection of challenging statements on sensitive topics for LLM benchmarking called TruthEval. These statements were curated by hand and contain known truth values. The categories were chosen to distinguish LLMs' abilities from their stochastic nature. We perform some initial analyses using this dataset and find several instances of LLMs failing in simple tasks showing their inability to understand simple questions.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.00492 [pdf, other]

Is Temperature the Creativity Parameter of Large Language Models?

Authors: Max Peeperkorn, Tom Kouwenhoven, Dan Brown, Anna Jordanous

Abstract: Large language models (LLMs) are applied to all sorts of creative tasks, and their outputs vary from beautiful, to peculiar, to pastiche, into plain plagiarism. The temperature parameter of an LLM regulates the amount of randomness, leading to more diverse outputs; therefore, it is often claimed to be the creativity parameter. Here, we investigate this claim using a narrative generation task with a predetermined fixed context, model and prompt. Specifically, we present an empirical analysis of the LLM output for different temperature values using four necessary conditions for creativity in narrative generation: novelty, typicality, cohesion, and coherence. We find that temperature is weakly correlated with novelty, and unsurprisingly, moderately correlated with incoherence, but there is no relationship with either cohesion or typicality. However, the influence of temperature on creativity is far more nuanced and weak than suggested by the "creativity parameter" claim; overall results suggest that the LLM generates slightly more novel outputs as temperatures get higher. Finally, we discuss ideas to allow more controlled LLM creativity, rather than relying on chance via changing the temperature parameter.

Submitted 1 May, 2024; originally announced May 2024.

Comments: To be published in the Proceedings of the 15th International Conference on Computational Creativity (ICCC'24), 8 pages, 2 figures, 2 tables
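
Illustrative note: the temperature parameter discussed in the abstract above is commonly applied by dividing the model's next-token scores (logits) by the temperature before the softmax. The sketch below shows that standard mechanism on made-up logits. It is not taken from the paper, and the logit values and function names are assumptions for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; higher means a more spread-out distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs], round(entropy(probs), 3))
```

Higher temperatures flatten the distribution and raise its entropy, which is the added randomness the abstract refers to; whether that randomness amounts to creativity is the question the paper examines.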

arXiv:2404.17051 [pdf, other]

Toward Improving Binary Program Comprehension via Embodied Immersion: A Survey

Authors: Dennis Brown, Emily Mulder, Samuel Mulder

Abstract: Binary program comprehension is critical for many use cases but is difficult, suffering from compounded uncertainty and lack of full automation. We seek methods to improve the effectiveness of the human-machine joint cognitive system performing binary PC. We survey three research areas to perform an indirect cognitive task analysis: cognitive models of the PC process, related elements of cognitive theory, and applicable affordances of virtual reality. Based on common elements in these areas, we identify three overarching themes: enhancing abductive iteration, augmenting working memory, and supporting information organization. These themes spotlight several affordances of VR to exploit in future studies of immersive tools for binary PC.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 27 pages, 4 figures, Submitted to ACM Computing Surveys

ACM Class: H.1.2; H.5.1; D.2.7

arXiv:2404.07185 [pdf, other]

Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery

Authors: Zohre Karimi, Shing-Hei Ho, Bao Thach, Alan Kuntz, Daniel S. Brown

Abstract: Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decision-making processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are high-dimensional point clouds. Code and videos available here: https://sites.google.com/view/lfdinelectrocautery

Submitted 15 April, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

Comments: In proceedings of the International Symposium on Medical Robotics (ISMR) 2024. Equal contribution from two first authors
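
Illustrative note: the abstract above describes learning a reward function from ranked suboptimal demonstrations. One common way to turn a ranking into a training signal is a pairwise Bradley-Terry loss over predicted returns. The sketch below shows that generic technique only; the abstract does not state which loss the authors use. The linear reward, the one-dimensional per-step features, and all names are assumptions for illustration.

```python
import math

def trajectory_return(weights, trajectory):
    """Sum of linear rewards w . f(s) over the per-step feature vectors."""
    return sum(sum(w * f for w, f in zip(weights, step)) for step in trajectory)

def preference_loss(weights, worse, better):
    """Negative log-probability that `better` outranks `worse` (Bradley-Terry)."""
    d = trajectory_return(weights, worse) - trajectory_return(weights, better)
    # log(1 + exp(d)), written in a numerically stable form
    return max(d, 0.0) + math.log1p(math.exp(-abs(d)))

# Hypothetical demonstrations: each step has a single feature value.
worse = [[0.0], [0.0]]
better = [[1.0], [1.0]]

for w in (-1.0, 0.0, 1.0):
    print(w, round(preference_loss([w], worse, better), 4))
```

Minimizing this loss over many ranked pairs pushes the reward parameters toward values that score higher-ranked demonstrations above lower-ranked ones; a policy could then be trained against the learned reward with reinforcement learning, as the abstract outlines.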

arXiv:2404.04241 [pdf, other]

Modeling Kinematic Uncertainty of Tendon-Driven Continuum Robots via Mixture Density Networks

Authors: Jordan Thompson, Brian Y. Cho, Daniel S. Brown, Alan Kuntz

Abstract: Tendon-driven continuum robot kinematic models are frequently computationally expensive, inaccurate due to unmodeled effects, or both. In particular, unmodeled effects produce uncertainties that arise during the robot's operation that lead to variability in the resulting geometry. We propose a novel solution to these issues through the development of a Gaussian mixture kinematic model. We train a mixture density network to output a Gaussian mixture model representation of the robot geometry given the current tendon displacements. This model computes a probability distribution that is more representative of the true distribution of geometries at a given configuration than a model that outputs a single geometry, while also reducing the computation time. We demonstrate one use of this model through a trajectory optimization method that explicitly reasons about the workspace uncertainty to minimize the probability of collision.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.19831 [pdf, other]

TASR: A Novel Trust-Aware Stackelberg Routing Algorithm to Mitigate Traffic Congestion

Authors: Doris E. M. Brown, Venkata Sriram Siddhardh Nadendla, Sajal K. Das

Abstract: Stackelberg routing platforms (SRP) reduce congestion in one-shot traffic networks by proposing optimal route recommendations to selfish travelers. Traditionally, Stackelberg routing is cast as a partial control problem where a fraction of traveler flow complies with route recommendations, while the remaining respond as selfish travelers. In this paper, a novel Stackelberg routing framework is formulated where the agents exhibit probabilistic compliance by accepting SRP's route recommendations with a trust probability. A greedy Trust-Aware Stackelberg Routing algorithm (in short, TASR) is proposed for SRP to compute unique path recommendations to each traveler flow with a unique demand. Simulation experiments are designed with random travel demands with diverse trust values on real road networks such as Sioux Falls, Chicago Sketch, and Sydney networks for both single-commodity and multi-commodity flows. The performance of TASR is compared with state-of-the-art Stackelberg routing methods in terms of traffic congestion and trust dynamics over repeated interaction between the SRP and the travelers. Results show that TASR improves network congestion without causing a significant reduction in trust towards the SRP, when compared to most well-known Stackelberg routing strategies.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.02431 [pdf, other]

Bayesian Constraint Inference from User Demonstrations Based on Margin-Respecting Preference Models

Authors: Dimitris Papadimitriou, Daniel S. Brown

Abstract: It is crucial for robots to be aware of the presence of constraints in order to acquire safe policies. However, explicitly specifying all constraints in an environment can be a challenging task. State-of-the-art constraint inference algorithms learn constraints from demonstrations, but tend to be computationally expensive and prone to instability issues. In this paper, we propose a novel Bayesian method that infers constraints based on preferences over demonstrations. The main advantages of our proposed approach are that it 1) infers constraints without calculating a new policy at each iteration, 2) uses a simple and more realistic ranking of groups of demonstrations, without requiring pairwise comparisons over all demonstrations, and 3) adapts to cases where there are varying levels of constraint violation. Our empirical results demonstrate that our proposed Bayesian approach infers constraints of varying severity, more accurately than state-of-the-art constraint inference methods.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2401.07955 [pdf, other]

A Study on Large Language Models' Limitations in Multiple-Choice Question Answering

Authors: Aisha Khatun, Daniel G. Brown

Abstract: The widespread adoption of Large Language Models (LLMs) has become commonplace, particularly with the emergence of open-source models. More importantly, smaller models are well-suited for integration into consumer devices and are frequently employed either as standalone solutions or as subroutines in various AI tasks. Despite their ubiquitous use, there is no systematic analysis of their specific capabilities and limitations. In this study, we tackle one of the most widely used tasks - answering Multiple Choice Questions (MCQ). We analyze 26 small open-source models and find that 65% of the models do not understand the task, only 4 models properly select an answer from the given choices, and only 5 of these models are choice order independent. These results are rather alarming given the extensive use of MCQ tests with these models. We recommend exercising caution and testing task understanding before using MCQ to evaluate LLMs in any field whatsoever.

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.13274 [pdf, other]

A Broad Comparative Evaluation of Software Debloating Tools

Authors: Michael D. Brown, Adam Meily, Brian Fairservice, Akshay Sood, Jonathan Dorn, Eric Kilmer, Ronald Eytchison

Abstract: Software debloating tools seek to improve program security and performance by removing unnecessary code, called bloat. While many techniques have been proposed, several barriers to their adoption have emerged. Namely, debloating tools are highly specialized, making it difficult for adopters to find the right type of tool for their needs. This is further hindered by a lack of established metrics and comparative evaluations between tools. To close this information gap, we surveyed 10 years of debloating literature and several tools currently under commercial development to taxonomize knowledge about the debloating ecosystem. We then conducted a broad comparative evaluation of 10 debloating tools to determine their relative strengths and weaknesses. Our evaluation, conducted on a diverse set of 20 benchmark programs, measures tools across 12 performance, security, and correctness metrics. Our evaluation surfaces several concerning findings that contradict the prevailing narrative in the debloating literature. First, debloating tools lack the maturity required to be used on real-world software, evidenced by a slim 22% overall success rate for creating passable debloated versions of medium- and high-complexity benchmarks. Second, debloating tools struggle to produce sound and robust programs. Using our novel differential fuzzing tool, DIFFER, we discovered that only 13% of our debloating attempts produced a sound and robust debloated program. Finally, our results indicate that debloating tools typically do not improve the performance or security posture of debloated programs by a significant degree according to our evaluation metrics. We believe that our contributions in this paper will help potential adopters better understand the landscape of tools and will motivate future research and development of more capable debloating tools.

Submitted 12 June, 2024; v1 submitted 20 December, 2023; originally announced December 2023.

Comments: 17 pages, 8 tables

arXiv:2312.04600 [pdf, other]

Haldane Bundles: A Dataset for Learning to Predict the Chern Number of Line Bundles on the Torus

Authors: Cody Tipton, Elizabeth Coda, Davis Brown, Alyson Bittner, Jung Lee, Grayson Jorgenson, Tegan Emerson, Henry Kvinge

Abstract: Characteristic classes, which are abstract topological invariants associated with vector bundles, have become an important notion in modern physics with surprising real-world consequences. As a representative example, the incredible properties of topological insulators, which are insulators in their bulk but conductors on their surface, can be completely characterized by a specific characteristic class associated with their electronic band structure, the first Chern class. Given their importance to next generation computing and the computational challenge of calculating them using first-principles approaches, there is a need to develop machine learning approaches to predict the characteristic classes associated with a material system. To aid in this program we introduce the Haldane bundle dataset, which consists of synthetically generated complex line bundles on the 2-torus. We envision this dataset, which is not as challenging as noisy and sparsely measured real-world datasets but (as we show) still difficult for off-the-shelf architectures, to be a testing ground for architectures that incorporate the rich topological and geometric priors underlying characteristic classes.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.01435 [pdf, other]

Automatic Report Generation for Histopathology images using pre-trained Vision Transformers and BERT

Authors: Saurav Sengupta, Donald E. Brown

Abstract: Deep learning for histopathology has been successfully used for disease classification, image segmentation and more. However, combining image and text modalities using current state-of-the-art (SOTA) methods has been a challenge due to the high resolution of histopathology images. Automatic report generation for histopathology images is one such challenge. In this work, we show that using an existing pre-trained Vision Transformer (ViT) to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model for language modeling-based decoder for report generation, we can build a performant and portable report generation mechanism that takes into account the whole high resolution image. Our method allows us to not only generate and evaluate captions that describe the image, but also helps us classify the image into tissue types and the gender of the patient as well. Our best performing model achieves an 89.52% accuracy in Tissue Type classification with a BLEU-4 score of 0.12 in our caption generation task.

Submitted 15 March, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

Comments: Accepted at IEEE ISBI 2024. arXiv admin note: substantial text overlap with arXiv:2311.06176

arXiv:2311.15696 [pdf, other]

Peptide Binding Classification on Quantum Computers

Authors: Charles London, Douglas Brown, Wenduan Xu, Sezen Vatansever, Christopher James Langmead, Dimitri Kartsaklis, Stephen Clark, Konstantinos Meichanetzidis

Abstract: We conduct an extensive study on using near-term quantum computers for a task in the domain of computational biology. By constructing quantum models based on parameterised quantum circuits we perform sequence classification on a task relevant to the design of therapeutic proteins, and find competitive performance with classical baselines of similar scale. To study the effect of noise, we run some of the best-performing quantum models with favourable resource requirements on emulators of state-of-the-art noisy quantum processors. We then apply error mitigation methods to improve the signal. We further execute these quantum models on the Quantinuum H1-1 trapped-ion quantum processor and observe very close agreement with noiseless exact simulation. Finally, we perform feature attribution methods and find that the quantum models indeed identify sensible relationships, at least as well as the classical baselines. This work constitutes the first proof-of-concept application of near-term quantum computing to a task critical to the design of therapeutic proteins, opening the route toward larger-scale applications in this and related fields, in line with the hardware development roadmaps of near-term quantum technologies.

Submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.06176 [pdf, other]

Automatic Report Generation for Histopathology images using pre-trained Vision Transformers

Authors: Saurav Sengupta, Donald E. Brown

Abstract: Deep learning for histopathology has been successfully used for disease classification, image segmentation and more. However, combining image and text modalities using current state-of-the-art methods has been a challenge due to the high resolution of histopathology images. Automatic report generation for histopathology images is one such challenge. In this work, we show that using an existing pre-trained Vision Transformer in a two-step process of first using it to encode 4096x4096 sized patches of the Whole Slide Image (WSI) and then using it as the encoder and an LSTM decoder for report generation, we can build a fairly performant and portable report generation mechanism that takes into account the whole of the high resolution image, instead of just the patches. We are also able to use representations from an existing powerful pre-trained hierarchical vision transformer and show its usefulness in not just zero shot classification but also for report generation.

Submitted 13 November, 2023; v1 submitted 10 November, 2023; originally announced November 2023.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2023, December 10th, 2023, New Orleans, United States, 09 pages

arXiv:2310.16941 [pdf, other]

Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots

Authors: Connor Mattson, Jeremy C. Clark, Daniel S. Brown

Abstract: We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities. Prior work has considered behavior search for homogeneous swarms and proposed the use of novelty search over either a hand-specified or learned behavior space followed by clustering to return a taxonomy of emergent behaviors to the user. In this paper, we seek to better understand the role of novelty search and the efficacy of using clustering to discover novel emergent behaviors. Through a large set of experiments and ablations, we analyze the effect of representations, evolutionary search, and various clustering methods in the search for novel behaviors in a heterogeneous swarm. Our results indicate that prior methods fail to discover many interesting behaviors and that an iterative human-in-the-loop discovery process discovers more behaviors than random search, swarm chemistry, and automated behavior discovery. The combined discoveries of our experiments uncover 23 emergent behaviors, 18 of which are novel discoveries. To the best of our knowledge, these are the first known emergent behaviors for heterogeneous swarms of computation-free agents. Videos, code, and appendix are available at the project website: https://sites.google.com/view/heterogeneous-bd-methods

Submitted 25 October, 2023; originally announced October 2023.

Comments: 11 pages, 9 figures, To be published in Proceedings IEEE International Symposium on Multi-Robot & Multi-Agent Systems (MRS 2023)

arXiv:2310.14993 [pdf, other]

Understanding the Inner Workings of Language Models Through Representation Dissimilarity

Authors: Davis Brown, Charles Godfrey, Nicholas Konz, Jonathan Tu, Henry Kvinge

Abstract: As language models are applied to an increasing number of real-world applications, understanding their inner workings has become an important issue in model trust, interpretability, and transparency. In this work we show that representation dissimilarity measures, which are functions that measure the extent to which two models' internal representations differ, can be a valuable tool for gaining insight into the mechanics of language models. Among our insights are: (i) an apparent asymmetry in the internal representations of models using SoLU and GeLU activation functions, (ii) evidence that dissimilarity measures can identify and locate generalization properties of models that are invisible via in-distribution test set performance, and (iii) new evaluations of how language model features vary as width and depth are increased. Our results suggest that dissimilarity measures are a promising set of tools for shedding light on the inner workings of language models.

Submitted 23 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 (main)

arXiv:2310.10610 [pdf, other]

Quantifying Assistive Robustness Via the Natural-Adversarial Frontier

Authors: Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan

Abstract: Our ultimate goal is to build robust policies for robots that assist people. What makes this hard is that people can behave unexpectedly at test time, potentially interacting with the robot outside its training distribution and leading to failures. Even just measuring robustness is a challenge. Adversarial perturbations are the default, but they can paint the wrong picture: they can correspond to human motions that are unlikely to occur during natural interactions with people. A robot policy might fail under small adversarial perturbations but work under large natural perturbations. We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto-frontier of human policies that are the best trade-offs between naturalness and low robot performance. We introduce RIGID, a method for constructing this frontier by training adversarial human policies that trade off between minimizing robot reward and acting human-like (as measured by a discriminator). On an Assistive Gym task, we use RIGID to analyze the performance of standard collaborative Reinforcement Learning, as well as the performance of existing methods meant to increase robustness. We also compare the frontier RIGID identifies with the failures identified in expert adversarial interaction, and with naturally-occurring failures during user interaction. Overall, we find evidence that RIGID can provide a meaningful measure of robustness predictive of deployment performance, and uncover failure cases in human-robot interaction that are difficult to find manually. https://ood-human.github.io.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.07667 [pdf, other]

Global Minima, Recoverability Thresholds, and Higher-Order Structure in GNNS

Authors: Drake Brown, Trevor Garrity, Kaden Parker, Jason Oliphant, Stone Carson, Cole Hanson, Zachary Boyd

Abstract: We analyze the performance of graph neural network (GNN) architectures from the perspective of random graph theory. Our approach promises to complement existing lenses on GNN analysis, such as combinatorial expressive power and worst-case adversarial analysis, by connecting the performance of GNNs to typical-case properties of the training data. First, we theoretically characterize the nodewise accuracy of one- and two-layer GCNs relative to the contextual stochastic block model (cSBM) and related models. We additionally prove that GCNs cannot beat linear models under certain circumstances. Second, we numerically map the recoverability thresholds, in terms of accuracy, of four diverse GNN architectures (GCN, GAT, SAGE, and Graph Transformer) under a variety of assumptions about the data. Sample results of this second analysis include: heavy-tailed degree distributions enhance GNN performance, GNNs can work well on strongly heterophilous graphs, and SAGE and Graph Transformer can perform well on arbitrarily noisy edge data, but no architecture handled sufficiently noisy feature data well. Finally, we show how both specific higher-order structures in synthetic data and the mix of empirical structures in real data have dramatic effects (usually negative) on GNN performance.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 28 pages

arXiv:2310.03149 [pdf, other]

Attributing Learned Concepts in Neural Networks to Training Data

Authors: Nicholas Konz, Charles Godfrey, Madelyn Shapiro, Jonathan Tu, Henry Kvinge, Davis Brown

Abstract: By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning systems, it is natural to ask which inputs from the model's original training set were most important for learning a concept at a given layer. To answer this, we combine data attribution methods with methods for probing the concepts learned by a model. Training network and probe ensembles for two concept datasets on a range of network layers, we use the recently developed TRAK method for large-scale data attribution. We find some evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that rather than being highly dependent on a few specific examples, the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.

Submitted 28 December, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: ATTRIB Workshop at NeurIPS 2023

arXiv:2309.16536 [pdf, other]

Uncertainty Quantification for Eosinophil Segmentation

Authors: Kevin Lin, Donald Brown, Sana Syed, Adam Greene

Abstract: Eosinophilic Esophagitis (EoE) is an allergic condition increasing in prevalence. To diagnose EoE, pathologists must find 15 or more eosinophils within a single high-power field (400X magnification). Determining whether or not a patient has EoE can be an arduous process and any medical imaging approaches used to assist diagnosis must consider both efficiency and precision. We propose an improvement of Adorno et al.'s approach for quantifying eosinophils using deep image segmentation. Our new approach leverages Monte Carlo Dropout, a common approach in deep learning to reduce overfitting, to provide uncertainty quantification on current deep learning models. The uncertainty can be visualized in an output image to evaluate model performance, provide insight into how deep learning algorithms function, and assist pathologists in identifying eosinophils.

Submitted 7 November, 2023; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: Preprint, Final Article Submitted to ICBRA 2023 and will be published in the International Conference Proceedings by ACM, Association for Computing Machinery (ISBN: 979-8-4007-0815-2), which will be archived in ACM Digital Library, indexed by Ei Compendex and Scopus

arXiv:2309.11408 [pdf, other]

Indirect Swarm Control: Characterization and Analysis of Emergent Swarm Behaviors

Authors: Ricardo Vega, Connor Mattson, Daniel S. Brown, Cameron Nowzari

Abstract: Emergence and emergent behaviors are often defined as cases where changes in local interactions between agents at a lower level effectively change what occurs in the higher level of the system (i.e., the whole swarm) and its properties. However, the manner in which these collective emergent behaviors self-organize is less understood. The focus of this paper is on presenting a new framework for characterizing the conditions that lead to different macrostates and how to predict/analyze their macroscopic properties, allowing us to indirectly engineer the same behaviors from the bottom up by tuning their environmental conditions rather than local interaction rules. We then apply this framework to a simple system of binary sensing and acting agents as an example to see if a re-framing of this swarms problem can help us push the state of the art forward. By first creating some working definitions of macrostates in a particular swarm system, we show how agent-based modeling may be combined with control theory to enable a generalized understanding of controllable emergent processes without needing to simulate everything. Whereas phase diagrams can generally only be created through Monte Carlo simulations or sweeping through ranges of parameters in a simulator, we develop closed-form functions that can immediately produce them, revealing an infinite set of swarm parameter combinations that can lead to a specifically chosen self-organized behavior. While the exact methods are still under development, we believe simply laying out a potential path towards solutions that have evaded our traditional methods using a novel method is worth considering. Our results are characterized through both simulations and real experiments on ground robots.

Submitted 28 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: 8 pages, 13 figures, submitted to IROS 2024 conference

arXiv:2309.03744 [pdf, other]

Label-efficient Contrastive Learning-based model for nuclei detection and classification in 3D Cardiovascular Immunofluorescent Images

Authors: Nazanin Moradinasab, Rebecca A. Deaton, Laura S. Shankman, Gary K. Owens, Donald E. Brown

Abstract: Recently, deep learning-based methods have achieved promising performance in nuclei detection and classification applications. However, training deep learning-based methods requires a large amount of pixel-wise annotated data, which is time-consuming and labor-intensive, especially in 3D images. An alternative approach is to adapt weak-annotation methods, such as labeling each nucleus with a point, but this method does not extend from 2D histopathology images (for which it was originally developed) to 3D immunofluorescent images. The reason is that 3D images contain multiple channels (z-axis) for nuclei and different markers separately, which makes training using point annotations difficult. To address this challenge, we propose the Label-efficient Contrastive learning-based (LECL) model to detect and classify various types of nuclei in 3D immunofluorescent images. Previous methods use Maximum Intensity Projection (MIP) to convert immunofluorescent images with multiple slices to 2D images, which can cause signals from different z-stacks to falsely appear associated with each other. To overcome this, we devised an Extended Maximum Intensity Projection (EMIP) approach that addresses these issues with MIP. Furthermore, we performed a Supervised Contrastive Learning (SCL) approach for weakly supervised settings. We conducted experiments on cardiovascular datasets and found that our proposed framework is effective and efficient in detecting and classifying various types of nuclei in 3D immunofluorescent images.

Submitted 14 January, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

Comments: 11 pages, 5 figures, MICCAI Workshop Conference 2023

arXiv:2308.13035 [pdf]

The intersection of video capsule endoscopy and artificial intelligence: addressing unique challenges using machine learning

Authors: Shan Guleria, Benjamin Schwartz, Yash Sharma, Philip Fernandes, James Jablonski, Sodiq Adewole, Sanjana Srivastava, Fisher Rhoads, Michael Porter, Michelle Yeghyayan, Dylan Hyatt, Andrew Copland, Lubaina Ehsan, Donald Brown, Sana Syed

Abstract: Introduction: Technical burdens and time-intensive review processes limit the practical utility of video capsule endoscopy (VCE). Artificial intelligence (AI) is poised to address these limitations, but the intersection of AI and VCE reveals challenges that must first be overcome. We identified five challenges to address. Challenge #1: VCE data are stochastic and contain significant artifacts. Challenge #2: VCE interpretation is cost-intensive. Challenge #3: VCE data are inherently imbalanced. Challenge #4: Existing VCE AIMLT are computationally cumbersome. Challenge #5: Clinicians are hesitant to accept AIMLT that cannot explain their process. Methods: An anatomic landmark detection model was used to test the application of convolutional neural networks (CNNs) to the task of classifying VCE data. We also created a tool that assists in expert annotation of VCE data. We then created more elaborate models using different approaches including a multi-frame approach, a CNN based on graph representation, and a few-shot approach based on meta-learning. Results: When used on full-length VCE footage, CNNs accurately identified anatomic landmarks (99.1%), with gradient-weighted class activation mapping showing the parts of each frame that the CNN used to make its decision. The graph CNN with weakly supervised learning (accuracy 89.9%, sensitivity of 91.1%), the few-shot model (accuracy 90.8%, precision 91.4%, sensitivity 90.9%), and the multi-frame model (accuracy 97.5%, precision 91.5%, sensitivity 94.8%) performed well. Discussion: Each of these five challenges is addressed, in part, by one of our AI-based models. Our goal of producing high performance using lightweight models that aim to improve clinician confidence was achieved.

Submitted 24 August, 2023; originally announced August 2023.

arXiv:2307.12941 [pdf, other]

On Privileged and Convergent Bases in Neural Network Representations

Authors: Davis Brown, Nikhil Vyas, Yamini Bansal

Abstract: In this study, we investigate whether the representations learned by neural networks possess a privileged and convergent basis. Specifically, we examine the significance of feature directions represented by individual neurons. First, we establish that arbitrary rotations of neural representations cannot be inverted (unlike linear networks), indicating that they do not exhibit complete rotational invariance. Subsequently, we explore the possibility of multiple bases achieving identical performance. To do this, we compare the bases of networks trained with the same parameters but with varying random initializations. Our study reveals two findings: (1) Even in wide networks such as WideResNets, neural networks do not converge to a unique basis; (2) Basis correlation increases significantly when a few early layers of the network are frozen identically. Furthermore, we analyze Linear Mode Connectivity, which has been studied as a measure of basis correlation. Our findings give evidence that while Linear Mode Connectivity improves with increased network width, this improvement is not due to an increase in basis correlation.

Submitted 24 July, 2023; originally announced July 2023.

Comments: In the Workshop on High-dimensional Learning Dynamics at ICML 2023

arXiv:2307.10026 [pdf, other]

Contextual Reliability: When Different Features Matter in Different Contexts

Authors: Gaurav Ghosal, Amrith Setlur, Daniel S. Brown, Anca D. Dragan, Aditi Raghunathan

Abstract: Deep neural networks often fail catastrophically by relying on spurious correlations. Most prior work assumes a clear dichotomy into spurious and reliable features; however, this is often unrealistic. For example, most of the time we do not want an autonomous car to simply copy the speed of surrounding cars -- we don't want our car to run a red light if a neighboring car does so. However, we cannot simply enforce invariance to next-lane speed, since it could provide valuable information about an unobservable pedestrian at a crosswalk. Thus, universally ignoring features that are sometimes (but not always) reliable can lead to non-robust performance. We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context. We propose and analyze a two-stage framework called Explicit Non-spurious feature Prediction (ENP) which first identifies the relevant features to use for a given context, then trains a model to rely exclusively on these features. Our work theoretically and empirically demonstrates the advantages of ENP over existing methods and provides new benchmarks for contextual reliability.

Submitted 19 July, 2023; originally announced July 2023.

Comments: ICML 2023 Camera Ready Version

arXiv:2306.15745 [pdf, other]

Identity Construction in a Misogynist Incels Forum

Authors: Michael Miller Yoder, Chloe Perry, David West Brown, Kathleen M. Carley, Meredith L. Pruden

Abstract: Online communities of involuntary celibates (incels) are a prominent source of misogynist hate speech. In this paper, we use quantitative text and network analysis approaches to examine how identity groups are discussed on incels-dot-is, the largest black-pilled incels forum. We find that this community produces a wide range of novel identity terms and, while terms for women are most common, mentions of other minoritized identities are increasing. An analysis of the associations made with identity groups suggests an essentialist ideology where physical appearance, as well as gender and racial hierarchies, determine human value. We discuss implications for research into automated misogynist hate speech detection.

Submitted 9 July, 2023; v1 submitted 27 June, 2023; originally announced June 2023.

Comments: Workshop on Online Abuse and Harms (WOAH) 2023; Minor edits to author names and abstracts in most recent version

arXiv:2306.15732 [pdf, other]

A Weakly Supervised Classifier and Dataset of White Supremacist Language

Authors: Michael Miller Yoder, Ahmad Diab, David West Brown, Kathleen M. Carley

Abstract: We present a dataset and classifier for detecting the language of white supremacist extremism, a growing issue in online hate speech. Our weakly supervised classifier is trained on large datasets of text from explicitly white supremacist domains paired with neutral and anti-racist data from similar domains. We demonstrate that this approach improves generalization performance to new domains. Incorporating anti-racist texts as counterexamples to white supremacist language mitigates bias.

Submitted 27 June, 2023; originally announced June 2023.

Comments: ACL 2023 short

arXiv:2306.13004 [pdf, other]

Can Differentiable Decision Trees Enable Interpretable Reward Learning from Human Feedback?

Authors: Akansha Kalra, Daniel S. Brown

Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a popular paradigm for capturing human intent to alleviate the challenges of hand-crafting the reward values. Despite the increasing interest in RLHF, most works learn black box reward functions that, while expressive, are difficult to interpret and often require running the whole costly process of RL before we can even decipher if these frameworks are actually aligned with human preferences. We propose and evaluate a novel approach for learning expressive and interpretable reward functions from preferences using Differentiable Decision Trees (DDTs). Our experiments across several domains, including CartPole, Visual Gridworld environments and Atari games, provide evidence that the tree structure of our learned reward function is useful in determining the extent to which the reward function is aligned with human preferences. We also provide experimental evidence that not only shows that reward DDTs can often achieve competitive RL performance when compared with larger capacity deep neural network reward functions but also demonstrates the diagnostic utility of our framework in checking alignment of learned reward functions. We also observe that the choice between soft and hard (argmax) output of reward DDT reveals a tension between wanting highly shaped rewards to ensure good RL performance, while also wanting simpler, more interpretable rewards. Videos and code are available at: https://sites.google.com/view/ddt-rlhf

Submitted 24 June, 2024; v1 submitted 22 June, 2023; originally announced June 2023.

Comments: Accepted at RLC 2024

arXiv:2306.09909 [pdf, other]

Neural Volumetric Reconstruction for Coherent Synthetic Aperture Sonar

Authors: Albert W. Reed, Juhyeon Kim, Thomas Blanford, Adithya Pediredla, Daniel C. Brown, Suren Jayasuriya

Abstract: Synthetic aperture sonar (SAS) measures a scene from multiple views in order to increase the resolution of reconstructed imagery. Image reconstruction methods for SAS coherently combine measurements to focus acoustic energy onto the scene. However, image formation is typically under-constrained due to a limited number of measurements and bandlimited hardware, which limits the capabilities of existing reconstruction methods. To help meet these challenges, we design an analysis-by-synthesis optimization that leverages recent advances in neural rendering to perform coherent SAS imaging. Our optimization enables us to incorporate physics-based constraints and scene priors into the image formation process. We validate our method on simulation and experimental results captured in both air and water. We demonstrate both quantitatively and qualitatively that our method typically produces reconstructions superior to those of existing approaches. We share code and data for reproducibility.

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2306.06199 [pdf, other]

Reliability Check: An Analysis of GPT-3's Response to Sensitive Topics and Prompt Wording

Authors: Aisha Khatun, Daniel G. Brown

Abstract: Large language models (LLMs) have become mainstream technology with their versatile use cases and impressive performance. Despite the countless out-of-the-box applications, LLMs are still not reliable. A lot of work is being done to improve the factual accuracy, consistency, and ethical standards of these models through fine-tuning, prompting, and Reinforcement Learning with Human Feedback (RLHF), but no systematic analysis of the responses of these models to different categories of statements, or of their potential vulnerabilities to simple prompting changes, is available. In this work, we analyze what confuses GPT-3: how the model responds to certain sensitive topics and what effects the prompt wording has on the model response. We find that GPT-3 correctly disagrees with obvious Conspiracies and Stereotypes but makes mistakes with common Misconceptions and Controversies. The model responses are inconsistent across prompts and settings, highlighting GPT-3's unreliability. The dataset and code for our analysis are available at https://github.com/tanny411/GPT3-Reliability-Check.

Submitted 9 June, 2023; originally announced June 2023.

Comments: Accepted in TrustNLP: Third Workshop on Trustworthy Natural Language Processing, co-located with ACL 2023

arXiv:2305.16148 [pdf, other]

doi 10.1145/3583131.3590443

Leveraging Human Feedback to Evolve and Discover Novel Emergent Behaviors in Robot Swarms

Authors: Connor Mattson, Daniel S. Brown

Abstract: Robot swarms often exhibit emergent behaviors that are fascinating to observe; however, it is often difficult to predict what swarm behaviors can emerge under a given set of agent capabilities. We seek to efficiently leverage human input to automatically discover a taxonomy of collective behaviors that can emerge from a particular multi-agent system, without requiring the human to know beforehand what behaviors are interesting or even possible. Our proposed approach adapts to user preferences by learning a similarity space over swarm collective behaviors using self-supervised learning and human-in-the-loop queries. We combine our learned similarity metric with novelty search and clustering to explore and categorize the space of possible swarm behaviors. We also propose several general-purpose heuristics that improve the efficiency of our novelty search by prioritizing robot controllers that are likely to lead to interesting emergent behaviors. We test our approach in simulation on two robot capability models and show that our methods consistently discover a richer set of emergent behaviors than prior work. Code, videos, and datasets are available at https://sites.google.com/view/evolving-novel-swarms.

Submitted 16 July, 2023; v1 submitted 25 April, 2023; originally announced May 2023.

Comments: 13 pages, 10 figures, To be published in Proceedings Genetic and Evolutionary Computation Conference (GECCO 2023)

arXiv:2305.11064 [pdf, ps, other]

Bits of Grass: Does GPT already know how to write like Whitman?

Authors: Piotr Sawicki, Marek Grzes, Fabricio Goes, Dan Brown, Max Peeperkorn, Aisha Khatun

Abstract: This study examines the ability of GPT-3.5, GPT-3.5-turbo (ChatGPT) and GPT-4 models to generate poems in the style of specific authors using zero-shot and many-shot prompts (which use the maximum context length of 8192 tokens). We assess the performance of models that are not fine-tuned for generating poetry in the style of specific authors, via automated evaluation. Our findings indicate that without fine-tuning, even when provided with the maximum number of 17 poem examples (8192 tokens) in the prompt, these models do not generate poetry in the desired style.

Submitted 10 May, 2023; originally announced May 2023.

Comments: short paper 5 pages

arXiv:2305.11057 [pdf, other]

From Assistive Technologies to Metaverse: Technologies in Inclusive Higher Education for Students with Specific Learning Difficulties

Authors: Gokul Yenduri, Rajesh Kaluri, Dharmendra Singh Rajput, Kuruva Lakshmanna, Thippa Reddy Gadekallu, Mufti Mahmud, David J. Brown

Abstract: The development of new technologies and their expanding use in a wide range of educational environments are driving the transformation of higher education. Assistive technologies are a subset of cutting-edge technology that can help students learn more effectively and make education accessible to everyone. Assistive technology can enhance, maintain, or improve the capacities of students with learning difficulties. Students with learning difficulties will benefit greatly from the use of assistive technologies. If these technologies are used effectively, students with learning difficulties can compete with their peers and complete their academic tasks. We aim to conduct this review to better understand the role of assistive technologies in providing inclusive higher education for students with learning difficulties. The review begins with the introduction of learning difficulties and their causes; inclusive education and the need for assistive technologies; the reasoning for conducting this review; and a summary of related reviews on assistive technologies for students with learning difficulties in inclusive higher education. Then, we discuss the preliminaries for the learning difficulties type and assistive technology. Later, we discuss the effects of assistive technology on inclusive higher education for students with learning difficulties. Additionally, we discuss related projects and support tools available in inclusive higher education for students with learning difficulties. We also explore the challenges and possible solutions related to using assistive technology in higher education to provide inclusive education for students with learning difficulties. We conclude the review with a discussion of potential promising future directions.

Submitted 5 May, 2023; originally announced May 2023.

Comments: Submitted to peer review

arXiv:2305.03625 [pdf, ps, other]

Physics-Based Acoustic Holograms

Authors: Antonio Stanziola, Ben T. Cox, Bradley E. Treeby, Michael D. Brown

Abstract: Advances in additive manufacturing have enabled the realisation of inexpensive, scalable, diffractive acoustic lenses that can be used to generate complex acoustic fields via phase and/or amplitude modulation. However, the design of these holograms relies on a thin-element approximation adapted from optics which can severely limit the fidelity of the realised acoustic field. Here, we introduce physics-based acoustic holograms with a complex internal structure. The structures are designed using a differentiable acoustic model with manufacturing constraints via optimisation of the acoustic property distribution within the hologram. The holograms can be fabricated simply and inexpensively using contemporary 3D printers. Experimental measurements demonstrate a significant improvement compared to conventional thin-element holograms.

Submitted 5 May, 2023; originally announced May 2023.

arXiv:2304.01035 [pdf, other]

Reproducing the results for NICER observation of PSR J0030+0451

Authors: Chaitanya Afle, Patrick R. Miles, Silvina Caino-Lores, Collin D. Capano, Ingo Tews, Karan Vahi, Ewa Deelman, Michela Taufer, Duncan A. Brown

Abstract: NASA's Neutron Star Interior Composition Explorer (NICER) observed X-ray emission from the pulsar PSR J0030+0451 in 2018. Riley et al. reported Bayesian parameter measurements of the mass and the star's radius using pulse-profile modeling of the X-ray data. This paper reproduces their result using the open-source software X-PSI and publicly available data within expected statistical errors. We note the challenges we faced in reproducing the results and demonstrate that the analysis can be reproduced and reused in future works by changing the prior distribution for the radius and the sampler configuration. We find no significant change in the measurement of the mass and radius, demonstrating that the original result is robust to these changes. Finally, we provide a containerized working environment that facilitates third-party reproduction of the measurements of mass and radius of PSR J0030+0451 using the NICER observations.

Submitted 31 January, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

Comments: 13 pages, 4 figures, 2 tables. Final version accepted for publication in Computing in Science & Engineering

Report number: LA-UR-22-29359

arXiv:2303.14173 [pdf, other]

How many dimensions are required to find an adversarial example?

Authors: Charles Godfrey, Henry Kvinge, Elise Bishoff, Myles Mckay, Davis Brown, Tim Doster, Eleanor Byler

Abstract: Past work exploring adversarial vulnerability has focused on situations where an adversary can perturb all dimensions of model input. On the other hand, a range of recent works consider the case where either (i) an adversary can perturb a limited number of input parameters or (ii) a subset of modalities in a multimodal problem. In both of these cases, adversarial examples are effectively constrained to a subspace $V$ in the ambient input space $\mathcal{X}$. Motivated by this, in this work we investigate how adversarial vulnerability depends on $\dim(V)$. In particular, we show that the adversarial success of standard PGD attacks with $\ell^p$ norm constraints behaves like a monotonically increasing function of $ε(\frac{\dim(V)}{\dim \mathcal{X}})^{\frac{1}{q}}$ where $ε$ is the perturbation budget and $\frac{1}{p} + \frac{1}{q} =1$, provided $p > 1$ (the case $p=1$ presents additional subtleties which we analyze in some detail). This functional form can be easily derived from a simple toy linear model, and as such our results lend further credence to arguments that adversarial examples are endemic to locally linear models on high dimensional spaces.

Submitted 10 April, 2023; v1 submitted 24 March, 2023; originally announced March 2023.

Comments: Comments welcome! V2: minor edits for clarity

MSC Class: 68T07 (Primary) ACM Class: G.3; I.2; I.5; J.2

arXiv:2303.06208 [pdf, ps, other]

Fast computation of permutation equivariant layers with the partition algebra

Authors: Charles Godfrey, Michael G. Rawson, Davis Brown, Henry Kvinge

Abstract: Linear neural network layers that are either equivariant or invariant to permutations of their inputs form core building blocks of modern deep learning architectures. Examples include the layers of DeepSets, as well as linear layers occurring in attention blocks of transformers and some graph neural networks. The space of permutation equivariant linear layers can be identified as the invariant subspace of a certain symmetric group representation, and recent work parameterized this space by exhibiting a basis whose vectors are sums over orbits of standard basis elements with respect to the symmetric group action. A parameterization opens up the possibility of learning the weights of permutation equivariant linear layers via gradient descent. The space of permutation equivariant linear layers is a generalization of the partition algebra, an object first discovered in statistical physics with deep connections to the representation theory of the symmetric group, and the basis described above generalizes the so-called orbit basis of the partition algebra. We exhibit an alternative basis, generalizing the diagram basis of the partition algebra, with computational benefits stemming from the fact that the tensors making up the basis are low rank in the sense that they naturally factorize into Kronecker products. Just as multiplication by a rank one matrix is far less expensive than multiplication by an arbitrary matrix, multiplication with these low rank tensors is far less expensive than multiplication with elements of the orbit basis. Finally, we describe an algorithm implementing multiplication with these basis elements.

Submitted 10 March, 2023; originally announced March 2023.

Comments: Comments welcome!

MSC Class: 68T07 (Primary) 05E10; 20C30 (Secondary) ACM Class: G.3; I.2; I.5; J.2

arXiv:2303.00046 [pdf, other]

Edit at your own risk: evaluating the robustness of edited models to distribution shifts

Authors: Davis Brown, Charles Godfrey, Cody Nizinski, Jonathan Tu, Henry Kvinge

Abstract: The current trend toward ever-larger models makes standard retraining procedures an ever-more expensive burden. For this reason, there is growing interest in model editing, which enables computationally inexpensive, interpretable, post-hoc model modifications. While many model editing techniques are promising, research on the properties of edited models is largely limited to evaluation of validation accuracy. The robustness of edited models is an important and yet mostly unexplored topic. In this paper, we employ recently developed techniques from the field of deep learning robustness to investigate both how model editing affects the general robustness of a model, as well as the robustness of the specific behavior targeted by the edit. We find that edits tend to reduce general robustness, but that the degree of degradation depends on the editing algorithm and layers chosen. Motivated by these observations we introduce a new model editing algorithm, 1-layer interpolation (1-LI), which uses weight-space interpolation to navigate the trade-off between editing task accuracy and general robustness.

Submitted 17 July, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

Comments: DB and CG contributed equally

arXiv:2302.09301 [pdf, other]

Exploring the Representation Manifolds of Stable Diffusion Through the Lens of Intrinsic Dimension

Authors: Henry Kvinge, Davis Brown, Charles Godfrey

Abstract: Prompting has become an important mechanism by which users can more effectively interact with many flavors of foundation model. Indeed, the last several years have shown that well-honed prompts can sometimes unlock emergent capabilities within such models. While there has been a substantial amount of empirical exploration of prompting within the community, relatively few works have studied prompting at a mathematical level. In this work we aim to take a first step towards understanding basic geometric properties induced by prompts in Stable Diffusion, focusing on the intrinsic dimension of internal representations within the model. We find that choice of prompt has a substantial impact on the intrinsic dimension of representations at both layers of the model which we explored, but that the nature of this impact depends on the layer being considered. For example, in certain bottleneck layers of the model, intrinsic dimension of representations is correlated with prompt perplexity (measured using a surrogate model), while this correlation is not apparent in the latent layers. Our evidence suggests that intrinsic dimension could be a useful tool for future studies of the impact of different prompts on text-to-image models.

Submitted 16 February, 2023; originally announced February 2023.

Comments: 11 pages

arXiv:2301.13388 [pdf, other]

Large Music Recommendation Studies for Small Teams

Authors: Kyle Robinson, Dan Brown

Abstract: Running live music recommendation studies without direct industry partnerships can be a prohibitively daunting task, especially for small teams. In order to help future researchers interested in such evaluations, we present a number of struggles we faced in the process of generating our own such evaluation system alongside potential solutions. These problems span the topics of users, data, computation, and application architecture.

Submitted 30 January, 2023; originally announced January 2023.

Journal ref: Late Breaking/Demo, Proc. of the 22nd Int. Society for Music Information Retrieval Conf., Online, 2021

arXiv:2301.13380 [pdf, other]

Automated Time-frequency Domain Audio Crossfades using Graph Cuts

Authors: Kyle Robinson, Dan Brown

Abstract: The problem of transitioning smoothly from one audio clip to another arises in many music consumption scenarios, especially as music consumption has moved from professionally curated and live-streamed radios to personal playback devices and services. We present the first steps toward a new method of automatically transitioning from one audio clip to another by discretizing the frequency spectrum into bins and then finding transition times for each bin. We phrase the problem as one of graph flow optimization; specifically min-cut/max-flow.

Submitted 30 January, 2023; originally announced January 2023.

Journal ref: Late Breaking/Demo at the 20th International Society for Music Information Retrieval, Delft, The Netherlands, 2019

arXiv:2301.04741 [pdf, other]

Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

Authors: Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown

Abstract: Preference-based reinforcement learning (PbRL) can enable robots to learn to perform tasks based on an individual's preferences without requiring a hand-crafted reward function. However, existing approaches either assume access to a high-fidelity simulator or analytic model or take a model-free approach that requires extensive, possibly unsafe online environment interactions. In this paper, we study the benefits and challenges of using a learned dynamics model when performing PbRL. In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal demonstrations can be performed without any environmental interaction. Our paper provides empirical evidence that learned dynamics models enable robots to learn customized policies based on user preferences in ways that are safer and more sample efficient than prior preference learning approaches. Supplementary materials and code are available at https://sites.google.com/berkeley.edu/mop-rl.

Submitted 9 February, 2024; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: In proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA 2023)

arXiv:2301.01392 [pdf, other]

Benchmarks and Algorithms for Offline Preference-Based Reward Learning

Authors: Daniel Shin, Anca D. Dragan, Daniel S. Brown

Abstract: Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment. However, in many tasks the agent might have access to offline data from related tasks in the same target environment. While offline data is increasingly being used to aid policy optimization via offline RL, our observation is that it can be a surprisingly rich source of information for preference learning as well. We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning, learns a distribution over reward functions, and optimizes a corresponding policy via offline RL. Crucially, our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps. To test our approach, we first evaluate existing offline RL benchmarks for their suitability for offline reward learning. Surprisingly, for many offline RL domains, we find that simply using a trivial reward function results in good policy performance, making these domains ill-suited for evaluating learned rewards. To address this, we identify a subset of existing offline RL benchmarks that are well suited for offline reward learning and also propose new offline apprenticeship learning benchmarks which allow for more open-ended behaviors. When evaluated on this curated set of domains, our empirical results suggest that combining offline RL with learned human preferences can enable an agent to learn to perform novel tasks that were not explicitly shown in the offline data.

Submitted 3 January, 2023; originally announced January 2023.

Comments: Transactions on Machine Learning Research. arXiv admin note: text overlap with arXiv:2107.09251

arXiv:2301.00810 [pdf, other]

doi 10.1145/3568162.3576989

SIRL: Similarity-based Implicit Representation Learning

Authors: Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan

Abstract: When robots learn reward functions using high capacity models that take raw state directly as input, they need to both learn a representation for what matters in the task -- the task "features" -- as well as how to combine these features into a single objective. If they try to do both at once from input designed to teach the full reward function, it is easy to end up with a representation that contains spurious correlations in the data, which fails to generalize to new settings. Instead, our ultimate goal is to enable robots to identify and isolate the causal features that people actually care about and use when they represent states and behavior. Our idea is that we can tune into this representation by asking users what behaviors they consider similar: behaviors will be similar if the features that matter are similar, even if low-level behavior is different; conversely, behaviors will be different if even one of the features that matter differs. This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not. The notion of learning representations based on similarity has a nice parallel in contrastive learning, a self-supervised representation learning technique that maps visually similar data points to similar embeddings, where similarity is defined by a designer through data augmentation heuristics. By contrast, in order to learn the representations that people use, so we can learn their preferences and objectives, we use their definition of similarity. In simulation as well as in a user study, we show that learning through such similarity queries leads to representations that, while far from perfect, are indeed more generalizable than self-supervised and task-input alternatives.

Submitted 17 March, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: 12 pages, 6 figures, HRI 2023

arXiv:2212.11214 [pdf, other]

Crowd Score: A Method for the Evaluation of Jokes using Large Language Model AI Voters as Judges

Authors: Fabricio Goes, Zisen Zhou, Piotr Sawicki, Marek Grzes, Daniel G. Brown

Abstract: This paper presents the Crowd Score, a novel method to assess the funniness of jokes using large language models (LLMs) as AI judges. Our method relies on inducing different personalities into the LLM and aggregating the votes of the AI judges into a single score to rate jokes. We validate the votes using an auditing technique that checks if the explanation for a particular vote is reasonable using the LLM. We tested our methodology on 52 jokes in a crowd of four AI voters with different humour types: affiliative, self-enhancing, aggressive and self-defeating. Our results show that few-shot prompting leads to better results than zero-shot for the voting question. Personality induction showed that aggressive and self-defeating voters are significantly more inclined to find more jokes funny of a set of aggressive/self-defeating jokes than the affiliative and self-enhancing voters. The Crowd Score follows the same trend as human judges by assigning higher scores to jokes that are also considered funnier by human judges. We believe that our methodology could be applied to other creative domains such as story, poetry, slogans, etc. It could both help the adoption of a flexible and accurate standard approach to compare different work in the CC community under a common metric and by minimizing human participation in assessing creative artefacts, it could accelerate the prototyping of creative artefacts and reduce the cost of hiring human participants to rate creative artefacts.

Submitted 21 December, 2022; originally announced December 2022.

Comments: 11 pages, 3 figures

arXiv:2212.03175 [pdf, other]

Learning Representations that Enable Generalization in Assistive Tasks

Authors: Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory Erickson, Anca D. Dragan

Abstract: Recent work in sim2real has successfully enabled robots to act in physical environments by training in simulation with a diverse "population" of environments (i.e. domain randomization). In this work, we focus on enabling generalization in assistive tasks: tasks in which the robot is acting to assist a user (e.g. helping someone with motor impairments with bathing or with scratching an itch). Such tasks are particularly interesting relative to prior sim2real successes because the environment now contains a human who is also acting. This complicates the problem because the diversity of human users (instead of merely physical environment parameters) is more difficult to capture in a population, thus increasing the likelihood of encountering out-of-distribution (OOD) human policies at test time. We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only. We study how to best learn such a representation by evaluating on purposefully constructed OOD test policies. We find that sim2real methods that encode environment (or population) parameters and work well in tasks that robots do in isolation, do not work well in assistance. In assistance, it seems crucial to train the representation based on the history of interaction directly, because that is what the robot will have access to at test time. Further, training these representations to then predict human actions not only gives them better structure, but also enables them to be fine-tuned at test-time, when the robot observes the partner act. https://adaptive-caregiver.github.io.

Submitted 5 December, 2022; originally announced December 2022.

arXiv:2212.00222 [pdf, other]

Experimental Observations of the Topology of Convolutional Neural Network Activations

Authors: Emilie Purvine, Davis Brown, Brett Jefferson, Cliff Joslyn, Brenda Praggastis, Archit Rathore, Madelyn Shapiro, Bei Wang, Youjia Zhou

Abstract: Topological data analysis (TDA) is a branch of computational mathematics, bridging algebraic topology and data science, that provides compact, noise-robust representations of complex structures. Deep neural networks (DNNs) learn millions of parameters associated with a series of transformations defined by the model architecture, resulting in high-dimensional, difficult-to-interpret internal representations of input data. As DNNs become more ubiquitous across multiple sectors of our society, there is increasing recognition that mathematical methods are needed to aid analysts, researchers, and practitioners in understanding and interpreting how these models' internal representations relate to the final classification. In this paper, we apply cutting edge techniques from TDA with the goal of gaining insight into the interpretability of convolutional neural networks used for image classification. We use two common TDA approaches to explore several methods for modeling hidden-layer activations as high-dimensional point clouds, and provide experimental evidence that these point clouds capture valuable structural information about the model's process. First, we demonstrate that a distance metric based on persistent homology can be used to quantify meaningful differences between layers, and we discuss these distances in the broader context of existing representational similarity metrics for neural network interpretability. Second, we show that a mapper graph can provide semantic insight into how these models organize hierarchical class knowledge at each layer. These observations demonstrate that TDA is a useful tool to help deep learning practitioners unlock the hidden structures of their models.

Submitted 30 November, 2022; originally announced December 2022.

Comments: Accepted at AAAI 2023. This version includes supplementary material

arXiv:2211.15542 [pdf]

doi 10.1145/3610977.3634984

Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning

Authors: Tu Trinh, Haoyu Chen, Daniel S. Brown

Abstract: We examine the problem of determining demonstration sufficiency: how can a robot self-assess whether it has received enough demonstrations from an expert to ensure a desired level of performance? To address this problem, we propose a novel self-assessment approach based on Bayesian inverse reinforcement learning and value-at-risk, enabling learning-from-demonstration ("LfD") robots to compute high-confidence bounds on their performance and use these bounds to determine when they have a sufficient number of demonstrations. We propose and evaluate two definitions of sufficiency: (1) normalized expected value difference, which measures regret with respect to the human's unobserved reward function, and (2) percent improvement over a baseline policy. We demonstrate how to formulate high-confidence bounds on both of these metrics. We evaluate our approach in simulation for both discrete and continuous state-space domains and illustrate the feasibility of developing a robotic system that can accurately evaluate demonstration sufficiency. We also show that the robot can utilize active learning in asking for demonstrations from specific states which results in fewer demos needed for the robot to still maintain high confidence in its policy. Finally, via a user study, we show that our approach successfully enables robots to perform at users' desired performance levels, without needing too many or perfectly optimal demonstrations.

Submitted 2 January, 2024; v1 submitted 28 November, 2022; originally announced November 2022.

Comments: Prior version appears in proceedings of AAAI FSS-22 Symposium "Lessons Learned for Autonomous Assessment of Machine Abilities (LLAAMA)". Current version appears in proceedings of HRI '24, March 11-14, 2024, Boulder, CO, USA

arXiv:2211.10558 [pdf, other]

Internal Representations of Vision Models Through the Lens of Frames on Data Manifolds

Authors: Henry Kvinge, Grayson Jorgenson, Davis Brown, Charles Godfrey, Tegan Emerson

Abstract: While the last five years have seen considerable progress in understanding the internal representations of deep learning models, many questions remain. This is especially true when trying to understand the impact of model design choices, such as model architecture or training algorithm, on hidden representation geometry and dynamics. In this work we present a new approach to studying such representations inspired by the idea of a frame on the tangent bundle of a manifold. Our construction, which we call a neural frame, is formed by assembling a set of vectors representing specific types of perturbations of a data point, for example infinitesimal augmentations, noise perturbations, or perturbations produced by a generative model, and studying how these change as they pass through a network. Using neural frames, we make observations about the way that models process, layer-by-layer, specific modes of variation within a small neighborhood of a datapoint. Our results provide new perspectives on a number of phenomena, such as the manner in which training with augmentation produces model invariance or the proposed trade-off between adversarial training and model generalization.

Submitted 6 December, 2023; v1 submitted 18 November, 2022; originally announced November 2022.

Comments: 30 pages, accepted as an oral presentation at the Workshop on Symmetry and Geometry in Neural Representations at NeurIPS 2023

Showing 1–50 of 178 results for author: Brown, D