subscribe to arXiv mailings

Bayesian temporal biclustering with applications to multi-subject neuroscience studies

Authors: Federica Zoe Ricci, Erik B. Sudderth, Jaylen Lee, Megan A. K. Peters, Marina Vannucci, Michele Guindani

Abstract: We consider the problem of analyzing multivariate time series collected on multiple subjects, with the goal of identifying groups of subjects exhibiting similar trends in their recorded measurements over time as well as time-varying groups of associated measurements. To this end, we propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of… ▽ More We consider the problem of analyzing multivariate time series collected on multiple subjects, with the goal of identifying groups of subjects exhibiting similar trends in their recorded measurements over time as well as time-varying groups of associated measurements. To this end, we propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of subjects induces a time-varying partition of measurements. Our approach allows for data-driven determination of the number of subject and measurement clusters as well as estimation of the number and location of changepoints in measurement partitions. To efficiently perform model fitting and posterior estimation with Markov Chain Monte Carlo, we derive a blocked update of measurements' cluster-assignment sequences. We illustrate the performance of our model in two applications to functional magnetic resonance imaging data and to an electroencephalogram dataset. The results indicate that the proposed model can combine information from potentially many subjects to discover a set of interpretable, dynamic patterns. Experiments on simulated data compare the estimation performance of the proposed model against ground-truth values and other statistical methods, showing that it performs well at identifying ground-truth subject and measurement clusters even when no subject or time dependence is present. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2405.19221 [pdf]

Domain adaptation in small-scale and heterogeneous biological datasets

Authors: Seyedmehdi Orouji, Martin C. Liu, Tal Korem, Megan A. K. Peters

Abstract: Machine learning techniques are steadily becoming more important in modern biology, and are used to build predictive models, discover patterns, and investigate biological problems. However, models trained on one dataset are often not generalizable to other datasets from different cohorts or laboratories, due to differences in the statistical properties of these datasets. These could stem from tech… ▽ More Machine learning techniques are steadily becoming more important in modern biology, and are used to build predictive models, discover patterns, and investigate biological problems. However, models trained on one dataset are often not generalizable to other datasets from different cohorts or laboratories, due to differences in the statistical properties of these datasets. These could stem from technical differences, such as the measurement technique used, or from relevant biological differences between the populations studied. Domain adaptation, a type of transfer learning, can alleviate this problem by aligning the statistical distributions of features and samples among different datasets so that similar models can be applied across them. However, a majority of state-of-the-art domain adaptation methods are designed to work with large-scale data, mostly text and images, while biological datasets often suffer from small sample sizes, and possess complexities such as heterogeneity of the feature space. This Review aims to synthetically discuss domain adaptation methods in the context of small-scale and highly heterogeneous biological data. We describe the benefits and challenges of domain adaptation in biological research and critically discuss some of its objectives, strengths, and weaknesses through key representative methodologies. We argue for the incorporation of domain adaptation techniques to the computational biologist's toolkit, with further development of customized approaches. △ Less

Submitted 29 May, 2024; originally announced May 2024.

Comments: main manuscript + supplement

arXiv:2402.00838 [pdf, other]

OLMo: Accelerating the Science of Language Models

Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam , et al. (18 additional authors not shown)

Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models… ▽ More Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs. To this end, we have built OLMo, a competitive, truly Open Language Model, to enable the scientific study of language models. Unlike most prior efforts that have only released model weights and inference code, we release OLMo alongside open training data and training and evaluation code. We hope this release will empower the open research community and inspire a new wave of innovation. △ Less

Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2402.00159 [pdf, other]

Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen , et al. (11 additional authors not shown)

Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat… ▽ More Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training data impacts model capabilities and limitations. To facilitate scientific research on language model pretraining, we curate and release Dolma, a three-trillion-token English corpus, built from a diverse mixture of web content, scientific papers, code, public-domain books, social media, and encyclopedic materials. We extensively document Dolma, including its design principles, details about its construction, and a summary of its contents. We present analyses and experimental results on intermediate states of Dolma to share what we have learned about important data curation practices. Finally, we open-source our data curation toolkit to enable reproduction of our work as well as support further research in large-scale data curation. △ Less

Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

arXiv:2312.06235 [pdf, other]

On The Effect of Replacement Policies on The Security of Randomized Cache Architectures

Authors: Moritz Peters, Nicolas Gaudin, Jan Philipp Thoma, Vianney Lapôtre, Pascal Cotret, Guy Gogniat, Tim Güneysu

Abstract: Randomizing the mapping of addresses to cache entries has proven to be an effective technique for hardening caches against contention-based attacks like Prime+Prome. While attacks and defenses are still evolving, it is clear that randomized caches significantly increase the security against such attacks. However, one aspect that is missing from most analyses of randomized cache architectures is th… ▽ More Randomizing the mapping of addresses to cache entries has proven to be an effective technique for hardening caches against contention-based attacks like Prime+Prome. While attacks and defenses are still evolving, it is clear that randomized caches significantly increase the security against such attacks. However, one aspect that is missing from most analyses of randomized cache architectures is the choice of the replacement policy. Often, only the random- and LRU replacement policies are investigated. However, LRU is not applicable to randomized caches due to its immense hardware overhead, while the random replacement policy is not ideal from a performance and security perspective. In this paper, we explore replacement policies for randomized caches. We develop two new replacement policies and evaluate a total of five replacement policies regarding their security against Prime+Prune+Probe attackers. Moreover, we analyze the effect of the replacement policy on the system's performance and quantify the introduced hardware overhead. We implement randomized caches with configurable replacement policies in software and hardware using a custom cache simulator, gem5, and the CV32E40P RISC-V core. Among others, we show that the construction of eviction sets with our new policy, VARP-64, requires over 25-times more cache accesses than with the random replacement policy while also enhancing overall performance. △ Less

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.02034 [pdf, other]

Trust, distrust, and appropriate reliance in (X)AI: a survey of empirical evaluation of user trust

Authors: Roel Visser, Tobias M. Peters, Ingrid Scharlau, Barbara Hammer

Abstract: A current concern in the field of Artificial Intelligence (AI) is to ensure the trustworthiness of AI systems. The development of explainability methods is one prominent way to address this, which has often resulted in the assumption that the use of explainability will lead to an increase in the trust of users and wider society. However, the dynamics between explainability and trust are not well e… ▽ More A current concern in the field of Artificial Intelligence (AI) is to ensure the trustworthiness of AI systems. The development of explainability methods is one prominent way to address this, which has often resulted in the assumption that the use of explainability will lead to an increase in the trust of users and wider society. However, the dynamics between explainability and trust are not well established and empirical investigations of their relation remain mixed or inconclusive. In this paper we provide a detailed description of the concepts of user trust and distrust in AI and their relation to appropriate reliance. For that we draw from the fields of machine learning, human-computer interaction, and the social sciences. Furthermore, we have created a survey of existing empirical studies that investigate the effects of AI systems and XAI methods on user (dis)trust. With clarifying the concepts and summarizing the empirical investigations, we aim to provide researchers, who examine user trust in AI, with an improved starting point for developing user studies to measure and evaluate the user's attitude towards and reliance on AI systems. △ Less

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.10702 [pdf, other]

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

Authors: Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

Abstract: Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into TÜLU, resulting in TÜLU 2, a suite of improved TÜLU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user pr… ▽ More Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into TÜLU, resulting in TÜLU 2, a suite of improved TÜLU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user preferences. Concretely, we release: (1) TÜLU-V2-mix, an improved collection of high-quality instruction datasets; (2) TÜLU 2, LLAMA-2 models finetuned on the V2 mixture; (3) TÜLU 2+DPO, TÜLU 2 models trained with direct preference optimization (DPO), including the largest DPO-trained model to date (TÜLU 2+DPO 70B); (4) CODE TÜLU 2, CODE LLAMA models finetuned on our V2 mix that outperform CODE LLAMA and its instruction-tuned variant, CODE LLAMA-Instruct. Our evaluation from multiple perspectives shows that the TÜLU 2 suite achieves state-of-the-art performance among open models and matches or exceeds the performance of GPT-3.5-turbo-0301 on several benchmarks. We release all the checkpoints, data, training and evaluation code to facilitate future open efforts on adapting large language models. △ Less

Submitted 19 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: technical report; fixed zephyr numbers

arXiv:2310.02074 [pdf, other]

ACE: A fast, skillful learned global atmospheric model for climate prediction

Authors: Oliver Watt-Meyer, Gideon Dresdner, Jeremy McGibbon, Spencer K. Clark, Brian Henn, James Duncan, Noah D. Brenowitz, Karthik Kashinath, Michael S. Pritchard, Boris Bonev, Matthew E. Peters, Christopher S. Bretherton

Abstract: Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass… ▽ More Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass and moisture. The emulator is stable for 100 years, nearly conserves column moisture without explicit constraints and faithfully reproduces the reference model's climate, outperforming a challenging baseline on over 90% of tracked variables. ACE requires nearly 100x less wall clock time and is 100x more energy efficient than the reference model using typically available resources. Without fine-tuning, ACE can stably generalize to a previously unseen historical sea surface temperature dataset. △ Less

Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: Accepted at Tackling Climate Change with Machine Learning: workshop at NeurIPS 2023

arXiv:2308.08708 [pdf, other]

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Authors: Patrick Butlin, Robert Long, Eric Elmoznino, Yoshua Bengio, Jonathan Birch, Axel Constant, George Deane, Stephen M. Fleming, Chris Frith, Xu Ji, Ryota Kanai, Colin Klein, Grace Lindsay, Matthias Michel, Liad Mudrik, Megan A. K. Peters, Eric Schwitzgebel, Jonathan Simon, Rufin VanRullen

Abstract: Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of con… ▽ More Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of consciousness, including recurrent processing theory, global workspace theory, higher-order theories, predictive processing, and attention schema theory. From these theories we derive "indicator properties" of consciousness, elucidated in computational terms that allow us to assess AI systems for these properties. We use these indicator properties to assess several recent AI systems, and we discuss how future systems might implement them. Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators. △ Less

Submitted 22 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

arXiv:2307.13601 [pdf, other]

doi 10.1007/978-3-031-44070-0

The Importance of Distrust in AI

Authors: Tobias M. Peters, Roel W. Visser

Abstract: In recent years the use of Artificial Intelligence (AI) has become increasingly prevalent in a growing number of fields. As AI systems are being adopted in more high-stakes areas such as medicine and finance, ensuring that they are trustworthy is of increasing importance. A concern that is prominently addressed by the development and application of explainability methods, which are purported to in… ▽ More In recent years the use of Artificial Intelligence (AI) has become increasingly prevalent in a growing number of fields. As AI systems are being adopted in more high-stakes areas such as medicine and finance, ensuring that they are trustworthy is of increasing importance. A concern that is prominently addressed by the development and application of explainability methods, which are purported to increase trust from its users and wider society. While an increase in trust may be desirable, an analysis of literature from different research fields shows that an exclusive focus on increasing trust may not be warranted. Something which is well exemplified by the recent development in AI chatbots, which while highly coherent tend to make up facts. In this contribution, we investigate the concepts of trust, trustworthiness, and user reliance. In order to foster appropriate reliance on AI we need to prevent both disuse of these systems as well as overtrust. From our analysis of research on interpersonal trust, trust in automation, and trust in (X)AI, we identify the potential merit of the distinction between trust and distrust (in AI). We propose that alongside trust a healthy amount of distrust is of additional value for mitigating disuse and overtrust. We argue that by considering and evaluating both trust and distrust, we can ensure that users can rely appropriately on trustworthy AI, which can both be useful as well as fallible. △ Less

Submitted 2 November, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The version of records of this contribution is published in Explainable Artificial Intelligence First World Conference, xAI 2023, Lisbon, Portugal, July 26-28, 2023, Proceedings, Part III (CCIS, volume 1903) and is available at https://doi.org/10.1007/978-3-031-44070-0

Journal ref: Explainable Artificial Intelligence First World Conference, xAI 2023, Lisbon, Portugal, July 26-28, 2023, Proceedings, Part III (CCIS, volume 1903)

arXiv:2307.09701 [pdf, other]

Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation

Authors: Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi

Abstract: Rising computational demands of modern natural language processing (NLP) systems have increased the barrier to entry for cutting-edge research while posing serious environmental concerns. Yet, progress on model efficiency has been impeded by practical challenges in model evaluation and comparison. For example, hardware is challenging to control due to disparate levels of accessibility across diffe… ▽ More Rising computational demands of modern natural language processing (NLP) systems have increased the barrier to entry for cutting-edge research while posing serious environmental concerns. Yet, progress on model efficiency has been impeded by practical challenges in model evaluation and comparison. For example, hardware is challenging to control due to disparate levels of accessibility across different institutions. Moreover, improvements in metrics such as FLOPs often fail to translate to progress in real-world applications. In response, we introduce Pentathlon, a benchmark for holistic and realistic evaluation of model efficiency. Pentathlon focuses on inference, which accounts for a majority of the compute in a model's lifecycle. It offers a strictly-controlled hardware platform, and is designed to mirror real-world applications scenarios. It incorporates a suite of metrics that target different aspects of efficiency, including latency, throughput, memory overhead, and energy consumption. Pentathlon also comes with a software library that can be seamlessly integrated into any codebase and enable evaluation. As a standardized and centralized evaluation platform, Pentathlon can drastically reduce the workload to make fair and reproducible efficiency comparisons. While initially focused on natural language processing (NLP) models, Pentathlon is designed to allow flexible extension to other fields. We envision Pentathlon will stimulate algorithmic innovations in building efficient models, and foster an increased awareness of the social and environmental implications in the development of future-generation NLP models. △ Less

Submitted 18 July, 2023; originally announced July 2023.

arXiv:2305.15387 [pdf, other]

Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering

Authors: Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan

Abstract: The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks. In this work, we propose extending this idea by pre-training a generic multi-document model from a novel cross-document question answering pre-training objective. To that end, given a set (or cluster) of topically-related documents, we systemati… ▽ More The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks. In this work, we propose extending this idea by pre-training a generic multi-document model from a novel cross-document question answering pre-training objective. To that end, given a set (or cluster) of topically-related documents, we systematically generate semantically-oriented questions from a salient sentence in one document and challenge the model, during pre-training, to answer these questions while "peeking" into other topically-related documents. In a similar manner, the model is also challenged to recover the sentence from which the question was generated, again while leveraging cross-document information. This novel multi-document QA formulation directs the model to better recover cross-text informational relations, and introduces a natural augmentation that artificially increases the pre-training data. Further, unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation (e.g., QA) and long text generation (e.g., summarization). Following this scheme, we pre-train our model -- termed QAmden -- and evaluate its performance across several multi-document tasks, including multi-document QA, summarization, and query-focused summarization, yielding improvements of up to 7%, and significantly outperforms zero-shot GPT-3.5 and GPT-4. △ Less

Submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted at ACL 2023; camera-ready version

arXiv:2305.08379 [pdf, other]

TESS: Text-to-Text Self-Conditioned Simplex Diffusion

Authors: Rabeeh Karimi Mahabadi, Hamish Ivison, Jaesung Tae, James Henderson, Iz Beltagy, Matthew E. Peters, Arman Cohan

Abstract: Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete nature and the need for a large number of diffusion steps to generate text, making diffusion-based generation expensive. In this work, we propose Text-to-text Self-c… ▽ More Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete nature and the need for a large number of diffusion steps to generate text, making diffusion-based generation expensive. In this work, we propose Text-to-text Self-conditioned Simplex Diffusion (TESS), a text diffusion model that is fully non-autoregressive, employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the learned embedding space. Through extensive experiments on natural language understanding and generation tasks including summarization, text simplification, paraphrase generation, and question generation, we demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models. We publicly release our codebase at https://github.com/allenai/tess-diffusion. △ Less

Submitted 20 February, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

Comments: EACL 2024

arXiv:2302.07027 [pdf, other]

AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

Authors: Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse Dodge

Abstract: Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel d… ▽ More Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel domain at test time. In this paper, we introduce AdapterSoup, an approach that performs weight-space averaging of adapters trained on different domains. Our approach is embarrassingly parallel: first, we train a set of domain-specific adapters; then, for each novel domain, we determine which adapters should be averaged at test time. We present extensive experiments showing that AdapterSoup consistently improves performance to new domains without extra training. We also explore weight averaging of adapters trained on the same domain with different hyper-parameters, and show that it preserves the performance of a PLM on new domains while obtaining strong in-domain results. We explore various approaches for choosing which adapters to combine, such as text clustering and semantic similarity. We find that using clustering leads to the most competitive results on novel domains. △ Less

Submitted 28 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Comments: Accepted at EACL 2023; camera-ready version; fixed typo in related work

arXiv:2212.10315 [pdf, other]

HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation

Authors: Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew Peters

Abstract: Recent NLP models have shown the remarkable ability to effectively generalise `zero-shot' to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance on concatenating lengthy instructions with every input example, resulting in costly reprocessing of the instruction. To avoid this, we introduce Hyper… ▽ More Recent NLP models have shown the remarkable ability to effectively generalise `zero-shot' to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance on concatenating lengthy instructions with every input example, resulting in costly reprocessing of the instruction. To avoid this, we introduce Hypernetworks for INstruction Tuning (HINT), which convert task instructions and examples into parameter-efficient modules inserted into an underlying model using a pretrained text encoder, eliminating the need to include instructions in the model input. The hypernetwork in HINT also produces an encoded instruction, which we concatenate with encoded inputs during decoding to further improve performance. HINT models outperform strong state-of-the-art baselines by over 10% when controlling for compute (measured in FLOPs). By converting instructions into modules, HINT models can effectively disregard the length of instructions and few-shot example inputs in terms of compute usage. As a result, HINT can enhance its performance by up to 25% by incorporating additional few-shot data, while utilizing only up to 5% more compute. This combines the strengths of parameter-efficient fine-tuning and in-context learning. △ Less

Submitted 24 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

Comments: ACL 2023

arXiv:2210.13575 [pdf, other]

Does Self-Rationalization Improve Robustness to Spurious Correlations?

Authors: Alexis Ross, Matthew E. Peters, Ana Marasović

Abstract: Rationalization is fundamental to human reasoning and learning. NLP models trained to produce rationales along with predictions, called self-rationalization models, have been investigated for their interpretability and utility to end-users. However, the extent to which training with human-written rationales facilitates learning remains an under-explored question. We ask whether training models to… ▽ More Rationalization is fundamental to human reasoning and learning. NLP models trained to produce rationales along with predictions, called self-rationalization models, have been investigated for their interpretability and utility to end-users. However, the extent to which training with human-written rationales facilitates learning remains an under-explored question. We ask whether training models to self-rationalize can aid in their learning to solve tasks for the right reasons. Specifically, we evaluate how training self-rationalization models with free-text rationales affects robustness to spurious correlations in fine-tuned encoder-decoder and decoder-only models of six different sizes. We evaluate robustness to spurious correlations by measuring performance on 1) manually annotated challenge datasets and 2) subsets of original test sets where reliance on spurious correlations would fail to produce correct answers. We find that while self-rationalization can improve robustness to spurious correlations in low-resource settings, it tends to hurt robustness in higher-resource settings. Furthermore, these effects depend on model family and size, as well as on rationale content. Together, our results suggest that explainability can come at the cost of robustness; thus, appropriate care should be taken when training self-rationalizing models with the goal of creating more trustworthy models. △ Less

Submitted 24 October, 2022; originally announced October 2022.

arXiv:2208.08478 [pdf]

"Task-relevant autoencoding" enhances machine learning for human neuroscience

Authors: Seyedmehdi Orouji, Vincent Taschereau-Dumouchel, Aurelio Cortese, Brian Odegaard, Cody Cushing, Mouslim Cherkaoui, Mitsuo Kawato, Hakwan Lau, Megan A. K. Peters

Abstract: In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, so are prone to overfitting on human neuroimaging data that often possess few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience a… ▽ More In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, so are prone to overfitting on human neuroimaging data that often possess few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience are precisely those relevant to subjects' behavior. We thus developed a Task-Relevant Autoencoder via Classifier Enhancement (TRACE), and tested its ability to extract behaviorally-relevant, separable representations compared to a standard autoencoder, a variational autoencoder, and principal component analysis for two severely truncated machine learning datasets. We then evaluated all models on fMRI data from 59 subjects who observed animals and objects. TRACE outperformed all models nearly unilaterally, showing up to 12% increased classification accuracy and up to 56% improvement in discovering "cleaner", task-relevant representations. These results showcase TRACE's potential for a wide variety of data related to human behavior. △ Less

Submitted 22 September, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

Comments: 41 pages, 11 figures, 5 tables including supplemental material

arXiv:2207.00227 [pdf]

Introducing flexible perovskites to the IoT world using photovoltaic-powered wireless tags

Authors: Sai Nithin Reddy Kantareddy, Rahul Bhattacharya, Sanjay E. Sarma, Ian Mathews, Janak Thapa, Liu Zhe, Shijing Sun, Ian Marius Peters, Tonio Buonassisi

Abstract: Billions of everyday objects could become part of the Internet of Things (IoT) by augmentation with low-cost, long-range, maintenance-free wireless sensors. Radio Frequency Identification (RFID) is a low-cost wireless technology that could enable this vision, but it is constrained by short communication range and lack of sufficient energy available to power auxiliary electronics and sensors. Here,… ▽ More Billions of everyday objects could become part of the Internet of Things (IoT) by augmentation with low-cost, long-range, maintenance-free wireless sensors. Radio Frequency Identification (RFID) is a low-cost wireless technology that could enable this vision, but it is constrained by short communication range and lack of sufficient energy available to power auxiliary electronics and sensors. Here, we explore the use of flexible perovskite photovoltaic cells to provide external power to semi-passive RFID tags to increase range and energy availability for external electronics such as microcontrollers and digital sensors. Perovskites are intriguing materials that hold the possibility to develop high-performance, low-cost, optically tunable (to absorb different light spectra), and flexible light energy harvesters. Our prototype perovskite photovoltaic cells on plastic substrates have an efficiency of 13% and a voltage of 0.88 V at maximum power under standard testing conditions. We built prototypes of RFID sensors powered with these flexible photovoltaic cells to demonstrate real-world applications. Our evaluation of the prototypes suggests that: i) flexible PV cells are durable up to a bending radius of 5 mm with only a 20 % drop in relative efficiency; ii) RFID communication range increased by 5x, and meets the energy needs (10-350 microwatt) to enable self-powered wireless sensors; iii) perovskite powered wireless sensors enable many battery-less sensing applications (e.g., perishable good monitoring, warehouse automation) △ Less

Submitted 1 July, 2022; originally announced July 2022.

arXiv:2205.15772 [pdf, other]

The hybrid approach -- Convolutional Neural Networks and Expectation Maximization Algorithm -- for Tomographic Reconstruction of Hyperspectral Images

Authors: Mads J. Ahlebæk, Mads S. Peters, Wei-Chih Huang, Mads T. Frandsen, René L. Eriksen, Bjarke Jørgensen

Abstract: We present a simple but novel hybrid approach to hyperspectral data cube reconstruction from computed tomography imaging spectrometry (CTIS) images that sequentially combines neural networks and the iterative Expectation Maximization (EM) algorithm. We train and test the ability of the method to reconstruct data cubes of $100\times100\times25$ and $100\times100\times100$ voxels, corresponding to 2… ▽ More We present a simple but novel hybrid approach to hyperspectral data cube reconstruction from computed tomography imaging spectrometry (CTIS) images that sequentially combines neural networks and the iterative Expectation Maximization (EM) algorithm. We train and test the ability of the method to reconstruct data cubes of $100\times100\times25$ and $100\times100\times100$ voxels, corresponding to 25 and 100 spectral channels, from simulated CTIS images generated by our CTIS simulator. The hybrid approach utilizes the inherent strength of the Convolutional Neural Network (CNN) with regard to noise and its ability to yield consistent reconstructions and make use of the EM algorithm's ability to generalize to spectral images of any object without training. The hybrid approach achieves better performance than both the CNNs and EM alone for seen (included in CNN training) and unseen (excluded from CNN training) cubes for both the 25- and 100-channel cases. For the 25 spectral channels, the improvements from CNN to the hybrid model (CNN + EM) in terms of the mean-squared errors are between 14-26%. For 100 spectral channels, the improvements between 19-40% are attained with the largest improvement of 40% for the unseen data, to which the CNNs are not exposed during the training. △ Less

Submitted 19 December, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: 36 pages, 13 figures and 2 tables. Supplemental material: 21 pages and 14 figures. v2: Clarifications added, analyses and argumentation updated

arXiv:2205.11961 [pdf, other]

ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts

Authors: Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi

Abstract: This work introduces a new multi-task, parameter-efficient language model (LM) tuning method that learns to transfer knowledge across different tasks via a mixture of soft prompts-small prefix embedding vectors pre-trained for different tasks. Our method, called ATTEMPT (ATTEntional Mixtures of Prompt Tuning), obtains source prompts as encodings of large-scale source tasks into a small number of p… ▽ More This work introduces a new multi-task, parameter-efficient language model (LM) tuning method that learns to transfer knowledge across different tasks via a mixture of soft prompts-small prefix embedding vectors pre-trained for different tasks. Our method, called ATTEMPT (ATTEntional Mixtures of Prompt Tuning), obtains source prompts as encodings of large-scale source tasks into a small number of parameters and trains an attention module to interpolate the source prompts and a newly initialized target prompt for every instance in the target task. During training, only the target task prompt and the attention weights, which are shared between tasks in multi-task training, are updated, while the original LM and source prompts are intact. ATTEMPT is highly parameter-efficient (e.g., updates 2,300 times fewer parameters than full fine-tuning) while achieving high task performance using knowledge from high-resource tasks. Moreover, it is modular using pre-trained soft prompts, and can flexibly add or remove source prompts for effective knowledge transfer. Our experimental results across 21 diverse NLP datasets show that ATTEMPT significantly outperforms prompt tuning and outperforms or matches fully fine-tuned or other parameter-efficient tuning approaches that use over ten times more parameters. Finally, ATTEMPT outperforms previous work in few-shot learning settings. △ Less

Submitted 1 December, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

Comments: Published as a conference paper at EMNLP 2022 (long). Code available at https://github.com/AkariAsai/ATTEMPT

arXiv:2205.05124 [pdf, other]

Extracting Latent Steering Vectors from Pretrained Language Models

Authors: Nishant Subramani, Nivedita Suresh, Matthew E. Peters

Abstract: Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to generate a target sentence is already encoded within the model. Accordingly, we explore a different approach altogether: extracting latent vect… ▽ More Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to generate a target sentence is already encoded within the model. Accordingly, we explore a different approach altogether: extracting latent vectors directly from pretrained language model decoders without fine-tuning. Experiments show that there exist steering vectors, which, when added to the hidden states of the language model, generate a target sentence nearly perfectly (> 99 BLEU) for English sentences from a variety of domains. We show that vector arithmetic can be used for unsupervised sentiment transfer on the Yelp sentiment benchmark, with performance comparable to models tailored to this task. We find that distances between steering vectors reflect sentence similarity when evaluated on a textual similarity benchmark (STS-B), outperforming pooled hidden states of models. Finally, we present an analysis of the intrinsic properties of the steering vectors. Taken together, our results suggest that frozen LMs can be effectively controlled through their latent steering space. △ Less

Submitted 10 May, 2022; originally announced May 2022.

Comments: Accepted to ACL2022 Findings; 16 pages (9 pages plus references and appendices); Code: https://github.com/nishantsubramani/steering_vectors; Some text overlap with arXiv:2008.09049

arXiv:2204.02733 [pdf, other]

Georeferencing of Photovoltaic Modules from Aerial Infrared Videos using Structure-from-Motion

Authors: Lukas Bommes, Claudia Buerhop-Lutz, Tobias Pickel, Jens Hauch, Christoph Brabec, Ian Marius Peters

Abstract: To identify abnormal photovoltaic (PV) modules in large-scale PV plants economically, drone-mounted infrared (IR) cameras and automated video processing algorithms are frequently used. While most related works focus on the detection of abnormal modules, little has been done to automatically localize those modules within the plant. In this work, we use incremental structure-from-motion to automatic… ▽ More To identify abnormal photovoltaic (PV) modules in large-scale PV plants economically, drone-mounted infrared (IR) cameras and automated video processing algorithms are frequently used. While most related works focus on the detection of abnormal modules, little has been done to automatically localize those modules within the plant. In this work, we use incremental structure-from-motion to automatically obtain geocoordinates of all PV modules in a plant based on visual cues and the measured GPS trajectory of the drone. In addition, we extract multiple IR images of each PV module. Using our method, we successfully map 99.3 % of the 35084 modules in four large-scale and one rooftop plant and extract over 2.2 million module images. As compared to our previous work, extraction misses 18 times less modules (one in 140 modules as compared to one in eight). Furthermore, two or three plant rows can be processed simultaneously, increasing module throughput and reducing flight duration by a factor of 2.1 and 3.7, respectively. Comparison with an accurate orthophoto of one of the large-scale plants yields a root mean square error of the estimated module geocoordinates of 5.87 m and a relative error within each plant row of 0.22 m to 0.82 m. Finally, we use the module geocoordinates and extracted IR images to visualize distributions of module temperatures and anomaly predictions of a deep learning classifier on a map. While the temperature distribution helps to identify disconnected strings, we also find that its detection accuracy for module anomalies reaches, or even exceeds, that of a deep learning classifier for seven out of ten common anomaly types. The software is published at https://github.com/LukasBommes/PV-Hawk. △ Less

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2203.08304 [pdf, other]

Hyperdecoders: Instance-specific decoders for multi-task NLP

Authors: Hamish Ivison, Matthew E. Peters

Abstract: We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This approach produces a unique decoder adaptation for every input instance, allowing the network a larger degree of flexibility than prior work that only produces one decoder adaptation per task. We apply ou… ▽ More We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This approach produces a unique decoder adaptation for every input instance, allowing the network a larger degree of flexibility than prior work that only produces one decoder adaptation per task. We apply our method to sequence classification tasks, extractive QA, and summarisation and find that it surpasses previous parameter efficient fine-tuning methods and often outperforms fully finetuning the underlying model. An analysis of the embeddings used by our hypernetwork shows that they are sensitive to output label and type, suggesting that our approach better maps from encoder representations to output labels. Our code is publicly available at https://github.com/allenai/hyperdecoders. △ Less

Submitted 18 October, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

Comments: Accepted to Findings of EMNLP 2022

arXiv:2203.06211 [pdf, other]

Staged Training for Transformer Language Models

Authors: Sheng Shen, Pete Walsh, Kurt Keutzer, Jesse Dodge, Matthew Peters, Iz Beltagy

Abstract: The current standard approach to scaling transformer language models trains each model size from a different random initialization. As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a "growth operator" to increase the model depth and width. By initializing each stage with the output… ▽ More The current standard approach to scaling transformer language models trains each model size from a different random initialization. As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a "growth operator" to increase the model depth and width. By initializing each stage with the output of the previous one, the training process effectively re-uses the compute from prior stages and becomes more efficient. Our growth operators each take as input the entire training state (including model parameters, optimizer state, learning rate schedule, etc.) and output a new training state from which training continues. We identify two important properties of these growth operators, namely that they preserve both the loss and the "training dynamics" after applying the operator. While the loss-preserving property has been discussed previously, to the best of our knowledge this work is the first to identify the importance of preserving the training dynamics (the rate of decrease of the loss during training). To find the optimal schedule for stages, we use the scaling laws from (Kaplan et al., 2020) to find a precise schedule that gives the most compute saving by starting a new stage when training efficiency starts decreasing. We empirically validate our growth operators and staged training for autoregressive language models, showing up to 22% compute savings compared to a strong baseline trained from scratch. Our code is available at https://github.com/allenai/staged-training. △ Less

Submitted 11 March, 2022; originally announced March 2022.

arXiv:2201.08164 [pdf, other]

doi 10.1145/3583558

From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI

Authors: Meike Nauta, Jan Trienes, Shreyasi Pathak, Elisa Nguyen, Michelle Peters, Yasmin Schmitt, Jörg Schlötterer, Maurice van Keulen, Christin Seifert

Abstract: The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a multi-faceted concept. We identify 12 conceptual properties, such as Compactness a… ▽ More The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a multi-faceted concept. We identify 12 conceptual properties, such as Compactness and Correctness, that should be evaluated for comprehensively assessing the quality of an explanation. Our so-called Co-12 properties serve as categorization scheme for systematically reviewing the evaluation practices of more than 300 papers published in the last 7 years at major AI and ML conferences that introduce an XAI method. We find that 1 in 3 papers evaluate exclusively with anecdotal evidence, and 1 in 5 papers evaluate with users. This survey also contributes to the call for objective, quantifiable evaluation methods by presenting an extensive overview of quantitative XAI evaluation methods. Our systematic collection of evaluation methods provides researchers and practitioners with concrete tools to thoroughly validate, benchmark and compare new and existing XAI methods. The Co-12 categorization scheme and our identified evaluation methods open up opportunities to include quantitative metrics as optimization criteria during model training in order to optimize for accuracy and interpretability simultaneously. △ Less

Submitted 24 February, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

Comments: Published in ACM Computing Surveys (DOI http://dx.doi.org/10.1145/3583558). This ArXiv version includes the supplementary material. Website with categorization of XAI methods at https://utwente-dmb.github.io/xai-papers/

arXiv:2112.08786 [pdf, other]

Efficient Hierarchical Domain Adaptation for Pretrained Language Models

Authors: Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge

Abstract: The remarkable success of large language models has been driven by dense models trained on massive unlabeled, unstructured corpora. These corpora typically contain text from diverse, heterogeneous sources, but information about the source of the text is rarely used during training. Transferring their knowledge to a target domain is typically done by continuing training in-domain. In this paper, we… ▽ More The remarkable success of large language models has been driven by dense models trained on massive unlabeled, unstructured corpora. These corpora typically contain text from diverse, heterogeneous sources, but information about the source of the text is rarely used during training. Transferring their knowledge to a target domain is typically done by continuing training in-domain. In this paper, we introduce a method to permit domain adaptation to many diverse domains using a computationally efficient adapter approach. Our method is based on the observation that textual domains are partially overlapping, and we represent domains as a hierarchical tree structure where each node in the tree is associated with a set of adapter weights. When combined with a frozen pretrained language model, this approach enables parameter sharing among related domains, while avoiding negative interference between unrelated ones. Experimental results with GPT-2 and a large fraction of the 100 most represented websites in C4 show across-the-board improvements in-domain. We additionally provide an inference time algorithm for a held-out domain and show that averaging over multiple paths through the tree enables further gains in generalization, while adding only a marginal cost to inference. △ Less

Submitted 3 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

Comments: NAACL 2022 accepted paper camera ready version

arXiv:2112.02922 [pdf, other]

Anomaly Detection in IR Images of PV Modules using Supervised Contrastive Learning

Authors: Lukas Bommes, Mathis Hoffmann, Claudia Buerhop-Lutz, Tobias Pickel, Jens Hauch, Christoph Brabec, Andreas Maier, Ian Marius Peters

Abstract: Increasing deployment of photovoltaic (PV) plants requires methods for automatic detection of faulty PV modules in modalities, such as infrared (IR) images. Recently, deep learning has become popular for this. However, related works typically sample train and test data from the same distribution ignoring the presence of domain shift between data of different PV plants. Instead, we frame fault dete… ▽ More Increasing deployment of photovoltaic (PV) plants requires methods for automatic detection of faulty PV modules in modalities, such as infrared (IR) images. Recently, deep learning has become popular for this. However, related works typically sample train and test data from the same distribution ignoring the presence of domain shift between data of different PV plants. Instead, we frame fault detection as more realistic unsupervised domain adaptation problem where we train on labelled data of one source PV plant and make predictions on another target plant. We train a ResNet-34 convolutional neural network with a supervised contrastive loss, on top of which we employ a k-nearest neighbor classifier to detect anomalies. Our method achieves a satisfactory area under the receiver operating characteristic (AUROC) of 73.3 % to 96.6 % on nine combinations of four source and target datasets with 2.92 million IR images of which 8.5 % are anomalous. It even outperforms a binary cross-entropy classifier in some cases. With a fixed decision threshold this results in 79.4 % and 77.1 % correctly classified normal and anomalous images, respectively. Most misclassified anomalies are of low severity, such as hot diodes and small hot spots. Our method is insensitive to hyperparameter settings, converges quickly and reliably detects unknown types of anomalies making it well suited for practice. Possible uses are in automatic PV plant inspection systems or to streamline manual labelling of IR datasets by filtering out normal images. Furthermore, our work serves the community with a more realistic view on PV module fault detection using unsupervised domain adaptation to develop more performant methods with favorable generalization capabilities. △ Less

Submitted 6 December, 2021; originally announced December 2021.

arXiv:2111.08284 [pdf, other]

Few-Shot Self-Rationalization with Natural Language Prompts

Authors: Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters

Abstract: Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free-text explanations for each task which hinders their broader usage. We propose to study a more realistic setting of self-rationalization using fe… ▽ More Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free-text explanations for each task which hinders their broader usage. We propose to study a more realistic setting of self-rationalization using few training examples. We present FEB -- a standardized collection of four existing English-language datasets and associated metrics. We identify the right prompting approach by extensively exploring natural language prompts on FEB. Then, by using this prompt and scaling the model size, we demonstrate that making progress on few-shot self-rationalization is possible. We show there is still ample room for improvement in this task: the average plausibility of generated explanations assessed by human annotators is at most 51% (with GPT-3), while plausibility of human explanations is 76%. We hope that FEB and our proposed approach will spur the community to take on the few-shot self-rationalization challenge. △ Less

Submitted 25 April, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: v2: NAACL Findings 2022 accepted paper camera-ready version. First two authors contributed equally. 9 pages main, 3 pages appendix

arXiv:2109.06707 [pdf, other]

A pragmatic approach to estimating average treatment effects from EHR data: the effect of prone positioning on mechanically ventilated COVID-19 patients

Authors: Adam Izdebski, Patrick J. Thoral, Robbert C. A. Lalisang, Dean M. McHugh, Diederik Gommers, Olaf L. Cremer, Rob J. Bosman, Sander Rigter, Evert-Jan Wils, Tim Frenzel, Dave A. Dongelmans, Remko de Jong, Marco A. A. Peters, Marlijn J. A Kamps, Dharmanand Ramnarain, Ralph Nowitzky, Fleur G. C. A. Nooteboom, Wouter de Ruijter, Louise C. Urlings-Strop, Ellen G. M. Smit, D. Jannet Mehagnoul-Schipper, Tom Dormans, Cornelis P. C. de Jager, Stefaan H. A. Hendriks, Sefanja Achterberg , et al. (21 additional authors not shown)

Abstract: Despite the recent progress in the field of causal inference, to date there is no agreed upon methodology to glean treatment effect estimation from observational data. The consequence on clinical practice is that, when lacking results from a randomized trial, medical personnel is left without guidance on what seems to be effective in a real-world scenario. This article proposes a pragmatic methodo… ▽ More Despite the recent progress in the field of causal inference, to date there is no agreed upon methodology to glean treatment effect estimation from observational data. The consequence on clinical practice is that, when lacking results from a randomized trial, medical personnel is left without guidance on what seems to be effective in a real-world scenario. This article proposes a pragmatic methodology to obtain preliminary but robust estimation of treatment effect from observational studies, to provide front-line clinicians with a degree of confidence in their treatment strategy. Our study design is applied to an open problem, the estimation of treatment effect of the proning maneuver on COVID-19 Intensive Care patients. △ Less

Submitted 3 December, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

arXiv:2108.13640 [pdf, other]

Module-Power Prediction from PL Measurements using Deep Learning

Authors: Mathis Hoffmann, Johannes Hepp, Bernd Doll, Claudia Buerhop-Lutz, Ian Marius Peters, Christoph Brabec, Andreas Maier, Vincent Christlein

Abstract: The individual causes for power loss of photovoltaic modules are investigated for quite some time. Recently, it has been shown that the power loss of a module is, for example, related to the fraction of inactive areas. While these areas can be easily identified from electroluminescense (EL) images, this is much harder for photoluminescence (PL) images. With this work, we close the gap between powe… ▽ More The individual causes for power loss of photovoltaic modules are investigated for quite some time. Recently, it has been shown that the power loss of a module is, for example, related to the fraction of inactive areas. While these areas can be easily identified from electroluminescense (EL) images, this is much harder for photoluminescence (PL) images. With this work, we close the gap between power regression from EL and PL images. We apply a deep convolutional neural network to predict the module power from PL images with a mean absolute error (MAE) of 4.4% or 11.7WP. Furthermore, we depict that regression maps computed from the embeddings of the trained network can be used to compute the localized power loss. Finally, we show that these regression maps can be used to identify inactive regions in PL images as well. △ Less

Submitted 31 August, 2021; originally announced August 2021.

arXiv:2108.13458 [pdf, other]

The Application of Convolutional Neural Networks for Tomographic Reconstruction of Hyperspectral Images

Authors: Wei-Chih Huang, Mads Svanborg Peters, Mads Juul Ahlebaek, Mads Toudal Frandsen, René Lynge Eriksen, Bjarke Jørgensen

Abstract: A novel method, utilizing convolutional neural networks (CNNs), is proposed to reconstruct hyperspectral cubes from computed tomography imaging spectrometer (CTIS) images. Current reconstruction algorithms are usually subject to long reconstruction times and mediocre precision in cases of a large number of spectral channels. The constructed CNNs deliver higher precision and shorter reconstruction… ▽ More A novel method, utilizing convolutional neural networks (CNNs), is proposed to reconstruct hyperspectral cubes from computed tomography imaging spectrometer (CTIS) images. Current reconstruction algorithms are usually subject to long reconstruction times and mediocre precision in cases of a large number of spectral channels. The constructed CNNs deliver higher precision and shorter reconstruction time than a sparse expectation maximization algorithm. In addition, the network can handle two different types of real-world images at the same time -- specifically ColorChecker and carrot spectral images are considered. This work paves the way toward real-time reconstruction of hyperspectral cubes from CTIS images. △ Less

Submitted 14 March, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

Comments: 31 pages, 18 figures and 4 tables. v2: clarifications and references added, analyses and network diagrams updated

arXiv:2107.07150 [pdf, other]

Tailor: Generating and Perturbing Text with Semantic Controls

Authors: Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner

Abstract: Controlled text perturbation is useful for evaluating and improving model generalizability. However, current techniques rely on training a model for every target perturbation, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on control codes derived fr… ▽ More Controlled text perturbation is useful for evaluating and improving model generalizability. However, current techniques rely on training a model for every target perturbation, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on control codes derived from semantic representations. We craft a set of operations to modify the control codes, which in turn steer generation towards targeted attributes. These operations can be further composed into higher-level ones, allowing for flexible perturbation strategies. We demonstrate the effectiveness of these perturbations in multiple applications. First, we use Tailor to automatically create high-quality contrast sets for four distinct natural language processing (NLP) tasks. These contrast sets contain fewer spurious artifacts and are complementary to manually annotated ones in their lexical diversity. Second, we show that Tailor perturbations can improve model generalization through data augmentation. Perturbing just 2% of training data leads to a 5.8-point gain on an NLI challenge set measuring reliance on syntactic heuristics. △ Less

Submitted 17 March, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

arXiv:2106.07314 [pdf, other]

doi 10.1002/pip.3448

Computer Vision Tool for Detection, Mapping and Fault Classification of PV Modules in Aerial IR Videos

Authors: Lukas Bommes, Tobias Pickel, Claudia Buerhop-Lutz, Jens Hauch, Christoph Brabec, Ian Marius Peters

Abstract: Increasing deployment of photovoltaics (PV) plants demands for cheap and fast inspection. A viable tool for this task is thermographic imaging by unmanned aerial vehicles (UAV). In this work, we develop a computer vision tool for the semi-automatic extraction of PV modules from thermographic UAV videos. We use it to curate a dataset containing 4.3 million IR images of 107842 PV modules from thermo… ▽ More Increasing deployment of photovoltaics (PV) plants demands for cheap and fast inspection. A viable tool for this task is thermographic imaging by unmanned aerial vehicles (UAV). In this work, we develop a computer vision tool for the semi-automatic extraction of PV modules from thermographic UAV videos. We use it to curate a dataset containing 4.3 million IR images of 107842 PV modules from thermographic videos of seven different PV plants. To demonstrate its use for automated PV plant inspection, we train a ResNet-50 to classify ten common module anomalies with more than 90 % test accuracy. Experiments show that our tool generalizes well to different PV plants. It successfully extracts PV modules from 512 out of 561 plant rows. Failures are mostly due to an inappropriate UAV trajectory and erroneous module segmentation. Including all manual steps our tool enables inspection of 3.5 MW p to 9 MW p of PV installations per day, potentially scaling to multi-gigawatt plants due to its parallel nature. While we present an effective method for automated PV plant inspection, we are also confident that our approach helps to meet the growing demand for large thermographic datasets for machine learning tasks, such as power prediction or unsupervised defect identification. △ Less

Submitted 14 June, 2021; originally announced June 2021.

arXiv:2106.00188 [pdf, other]

PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

Authors: Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, Yejin Choi

Abstract: We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model, and a separate language model. Our dynamics model learns not just what objects are but also what they do: glass cups break when thrown, plastic ones don't. We then use it as the interface to our language mode… ▽ More We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model, and a separate language model. Our dynamics model learns not just what objects are but also what they do: glass cups break when thrown, plastic ones don't. We then use it as the interface to our language model, giving us a unified model of linguistic form and grounded meaning. PIGLeT can read a sentence, simulate neurally what might happen next, and then communicate that result through a literal symbolic representation, or natural language. Experimental results show that our model effectively learns world dynamics, along with how to communicate them. It is able to correctly forecast "what happens next" given an English sentence over 80% of the time, outperforming a 100x larger, text-to-text approach by over 10%. Likewise, its natural language summaries of physical interactions are also judged by humans as more accurate than LM alternatives. We present comprehensive analysis showing room for future work. △ Less

Submitted 30 January, 2022; v1 submitted 31 May, 2021; originally announced June 2021.

Comments: ACL 2021 camera ready, project page at https://rowanzellers.com/piglet/

arXiv:2104.08646 [pdf, other]

Competency Problems: On Finding and Removing Artifacts in Language Data

Authors: Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith

Abstract: Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have "spurious" instead of legitimate correlations is typically left unspecified. In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a… ▽ More Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have "spurious" instead of legitimate correlations is typically left unspecified. In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems. For example, the word "amazing" on its own should not give information about a sentiment label independent of the context in which it appears, which could include negation, metaphor, sarcasm, etc. We theoretically analyze the difficulty of creating data for competency problems when human bias is taken into account, showing that realistic datasets will increasingly deviate from competency problems as dataset size increases. This analysis gives us a simple statistical test for dataset artifacts, which we use to show more subtle biases than were described in prior work, including demonstrating that models are inappropriately affected by these less extreme biases. Our theoretical treatment of this problem also allows us to analyze proposed solutions, such as making local edits to dataset instances, and to give recommendations for future data collection and model design efforts that target competency problems. △ Less

Submitted 28 December, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

Comments: EMNLP 2021. This version fixes an error in Proposition 1 and adds discussion (the EMNLP camera ready version is unfixed) (and v3 adds the acknowledgements that we forgot to put into v2)

arXiv:2101.00406 [pdf, other]

CDLM: Cross-Document Language Modeling

Authors: Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan

Abstract: We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective. First, instead of considering documents in isolation, we pretrain over sets of multiple related documents, encouraging the model to learn cross-document relationships. Second, we improve over recent long-range transformers by… ▽ More We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective. First, instead of considering documents in isolation, we pretrain over sets of multiple related documents, encouraging the model to learn cross-document relationships. Second, we improve over recent long-range transformers by introducing dynamic global attention that has access to the entire input to predict masked tokens. We release CDLM (Cross-Document Language Model), a new general language model for multi-document setting that can be easily applied to downstream tasks. Our extensive analysis shows that both ideas are essential for the success of CDLM, and work in synergy to set new state-of-the-art results for several multi-text tasks. Code and models are available at https://github.com/aviclu/CDLM. △ Less

Submitted 2 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

Comments: EMNLP 2021, findings

arXiv:2012.13985 [pdf, other]

Explaining NLP Models via Minimal Contrastive Editing (MiCE)

Authors: Alexis Ross, Ana Marasović, Matthew E. Peters

Abstract: Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the influential role that contrastivity plays in how humans explain, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for producing contr… ▽ More Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the influential role that contrastivity plays in how humans explain, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for producing contrastive explanations of model predictions in the form of edits to inputs that change model outputs to the contrast case. Our experiments across three tasks--binary sentiment classification, topic classification, and multiple-choice question answering--show that MiCE is able to produce edits that are not only contrastive, but also minimal and fluent, consistent with human contrastive edits. We demonstrate how MiCE edits can be used for two use cases in NLP system development--debugging incorrect model outputs and uncovering dataset artifacts--and thereby illustrate that producing contrastive explanations is a promising research direction for model interpretability. △ Less

Submitted 23 June, 2021; v1 submitted 27 December, 2020; originally announced December 2020.

arXiv:2011.08115 [pdf, other]

Learning from Task Descriptions

Authors: Orion Weller, Nicholas Lourie, Matt Gardner, Matthew E. Peters

Abstract: Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this fra… ▽ More Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this framework with a new English language dataset, ZEST, structured for task-oriented evaluation on unseen tasks. Formulating task descriptions as questions, we ensure each is general enough to apply to many possible inputs, thus comprehensively evaluating a model's ability to solve each task. Moreover, the dataset's structure tests specific types of systematic generalization. We find that the state-of-the-art T5 model achieves a score of 12% on ZEST, leaving a significant challenge for NLP researchers. △ Less

Submitted 16 November, 2020; originally announced November 2020.

Comments: EMNLP 2020

arXiv:2011.05003 [pdf, other]

doi 10.1109/JPHOTOV.2021.3072229

Joint Super-Resolution and Rectification for Solar Cell Inspection

Authors: Mathis Hoffmann, Thomas Köhler, Bernd Doll, Frank Schebesch, Florian Talkenberg, Ian Marius Peters, Christoph J. Brabec, Andreas Maier, Vincent Christlein

Abstract: Visual inspection of solar modules is an important monitoring facility in photovoltaic power plants. Since a single measurement of fast CMOS sensors is limited in spatial resolution and often not sufficient to reliably detect small defects, we apply multi-frame super-resolution (MFSR) to a sequence of low resolution measurements. In addition, the rectification and removal of lens distortion simpli… ▽ More Visual inspection of solar modules is an important monitoring facility in photovoltaic power plants. Since a single measurement of fast CMOS sensors is limited in spatial resolution and often not sufficient to reliably detect small defects, we apply multi-frame super-resolution (MFSR) to a sequence of low resolution measurements. In addition, the rectification and removal of lens distortion simplifies subsequent analysis. Therefore, we propose to fuse this pre-processing with standard MFSR algorithms. This is advantageous, because we omit a separate processing step, the motion estimation becomes more stable and the spacing of high-resolution (HR) pixels on the rectified module image becomes uniform w. r. t. the module plane, regardless of perspective distortion. We present a comprehensive user study showing that MFSR is beneficial for defect recognition by human experts and that the proposed method performs better than the state of the art. Furthermore, we apply automated crack segmentation and show that the proposed method performs 3x better than bicubic upsampling and 2x better than the state of the art for automated inspection. △ Less

Submitted 7 April, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

arXiv:2010.00991 [pdf, other]

doi 10.1007/978-3-030-59861-7

RDCNet: Instance segmentation with a minimalist recurrent residual network

Authors: Raphael Ortiz, Gustavo de Medeiros, Antoine H. F. M. Peters, Prisca Liberali, Markus Rempfler

Abstract: Instance segmentation is a key step for quantitative microscopy. While several machine learning based methods have been proposed for this problem, most of them rely on computationally complex models that are trained on surrogate tasks. Building on recent developments towards end-to-end trainable instance segmentation, we propose a minimalist recurrent network called recurrent dilated convolutional… ▽ More Instance segmentation is a key step for quantitative microscopy. While several machine learning based methods have been proposed for this problem, most of them rely on computationally complex models that are trained on surrogate tasks. Building on recent developments towards end-to-end trainable instance segmentation, we propose a minimalist recurrent network called recurrent dilated convolutional network (RDCNet), consisting of a shared stacked dilated convolution (sSDC) layer that iteratively refines its output and thereby generates interpretable intermediate predictions. It is light-weight and has few critical hyperparameters, which can be related to physical aspects such as object size or density.We perform a sensitivity analysis of its main parameters and we demonstrate its versatility on 3 tasks with different imaging modalities: nuclear segmentation of H&E slides, of 3D anisotropic stacks from light-sheet fluorescence microscopy and leaf segmentation of top-view images of plants. It achieves state-of-the-art on 2 of the 3 datasets. △ Less

Submitted 2 October, 2020; originally announced October 2020.

Comments: Accepted at MICCAI-MLMI 2020 workshop

arXiv:2009.14712 [pdf, other]

Deep Learning-based Pipeline for Module Power Prediction from EL Measurements

Authors: Mathis Hoffmann, Claudia Buerhop-Lutz, Luca Reeb, Tobias Pickel, Thilo Winkler, Bernd Doll, Tobias Würfl, Ian Marius Peters, Christoph Brabec, Andreas Maier, Vincent Christlein

Abstract: Automated inspection plays an important role in monitoring large-scale photovoltaic power plants. Commonly, electroluminescense measurements are used to identify various types of defects on solar modules but have not been used to determine the power of a module. However, knowledge of the power at maximum power point is important as well, since drops in the power of a single module can affect the p… ▽ More Automated inspection plays an important role in monitoring large-scale photovoltaic power plants. Commonly, electroluminescense measurements are used to identify various types of defects on solar modules but have not been used to determine the power of a module. However, knowledge of the power at maximum power point is important as well, since drops in the power of a single module can affect the performance of an entire string. By now, this is commonly determined by measurements that require to discontact or even dismount the module, rendering a regular inspection of individual modules infeasible. In this work, we bridge the gap between electroluminescense measurements and the power determination of a module. We compile a large dataset of 719 electroluminescense measurementsof modules at various stages of degradation, especially cell cracks and fractures, and the corresponding power at maximum power point. Here,we focus on inactive regions and cracks as the predominant type of defect. We set up a baseline regression model to predict the power from electroluminescense measurements with a mean absolute error of 9.0+/-3.7$W_P$ (4.0+/-8.4%). Then, we show that deep-learning can be used to train a model that performs significantly better (7.3+/-2.7$W_P$ or 3.2+/-6.5%) and propose a variant of class activation maps to obtain the per cell power loss, as predicted by the model. With this work, we aim to open a new research topic. Therefore, we publicly release the dataset, the code and trained models to empower other researchers to compare against our results. Finally, we present a thorough evaluation of certain boundary conditions like the dataset size and an automated preprocessing pipeline for on-site measurements showing multiple modules at once. △ Less

Submitted 26 November, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

arXiv:2004.05150 [pdf, other]

Longformer: The Long-Document Transformer

Authors: Iz Beltagy, Matthew E. Peters, Arman Cohan

Abstract: Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in rep… ▽ More Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in replacement for the standard self-attention and combines a local windowed attention with a task motivated global attention. Following prior work on long-sequence transformers, we evaluate Longformer on character-level language modeling and achieve state-of-the-art results on text8 and enwik8. In contrast to most prior work, we also pretrain Longformer and finetune it on a variety of downstream tasks. Our pretrained Longformer consistently outperforms RoBERTa on long document tasks and sets new state-of-the-art results on WikiHop and TriviaQA. We finally introduce the Longformer-Encoder-Decoder (LED), a Longformer variant for supporting long document generative sequence-to-sequence tasks, and demonstrate its effectiveness on the arXiv summarization dataset. △ Less

Submitted 2 December, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

Comments: Version 2 introduces the Longformer-Encoder-Decoder (LED) model

arXiv:2002.04108 [pdf, other]

Adversarial Filters of Dataset Biases

Authors: Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi

Abstract: Large neural models have demonstrated human-level performance on language and vision benchmarks, while their performance degrades considerably on adversarial or out-of-distribution samples. This raises the question of whether these models have learned to solve a dataset rather than the underlying task by overfitting to spurious dataset biases. We investigate one recently proposed approach, AFLite,… ▽ More Large neural models have demonstrated human-level performance on language and vision benchmarks, while their performance degrades considerably on adversarial or out-of-distribution samples. This raises the question of whether these models have learned to solve a dataset rather than the underlying task by overfitting to spurious dataset biases. We investigate one recently proposed approach, AFLite, which adversarially filters such dataset biases, as a means to mitigate the prevalent overestimation of machine performance. We provide a theoretical understanding for AFLite, by situating it in the generalized framework for optimum bias reduction. We present extensive supporting evidence that AFLite is broadly applicable for reduction of measurable dataset biases, and that models trained on the filtered datasets yield better generalization to out-of-distribution tasks. Finally, filtering results in a large drop in model performance (e.g., from 92% to 62% for SNLI), while human performance still remains high. Our work thus shows that such filtered datasets can pose new research challenges for robust generalization by serving as upgraded benchmarks. △ Less

Submitted 10 July, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

Comments: Accepted to ICML 2020

arXiv:1909.05818 [pdf]

doi 10.1109/JIOT.2019.2913403

Long range battery-less PV-powered RFID tag sensors

Authors: Sai Nithin R. Kantareddy, Ian Mathews, Rahul Bhattacharyya, Ian Marius Peters, Tonio Buonassisi, Sanjay E. Sarma

Abstract: Communication range in passive Radio-Frequency Identification (RFID) front-end devices is a critical barrier in the real-world implementation of this low-cost technology. Purely passive RFID tags power up by harvesting the limited RF energy transmitted by the interrogator, and communicate by backscattering the incident signal. This mode of communication keeps manufacturing costs below a few cents… ▽ More Communication range in passive Radio-Frequency Identification (RFID) front-end devices is a critical barrier in the real-world implementation of this low-cost technology. Purely passive RFID tags power up by harvesting the limited RF energy transmitted by the interrogator, and communicate by backscattering the incident signal. This mode of communication keeps manufacturing costs below a few cents per tag, but the limited power available at the tag undermines long-range deployment. In this paper, we present an approach to use Photovoltaics (PV) to augment the available energy at the tag to improve read range and sensing capabilities. We provide this extra-energy to the RFID integrated circuit (IC) using minimum additional electronics yet enabling persistent sensor-data acquisition. Current and emerging thin-film PV technologies have significant potential for being very low-cost, hence eliminating the barrier for implementation and making of PV-RFID wireless sensors. We reduce the long-range PV-RFID idea to practice by creating functional prototypes of i) a wireless building environment sensor to monitor temperature, and ii) an embedded tracker to find lost golf balls. The read range of PV-RFID is enhanced 8 times compared to conventional passive devices. In addition, the PV-RFID tags persistently transmit large volumes of sensor data (>0.14 million measurements per day) without using batteries. For communication range and energy persistence, we observe good agreement between calculated estimates and experimental results. We have also identified avenues for future research to develop low-cost PV-RFID devices for wireless sensing in the midst of the other competitive wireless technologies such as Bluetooth, Zigbee, Long Range (LoRa) backscatter etc. △ Less

Submitted 12 September, 2019; originally announced September 2019.

Journal ref: IEEE Internet of Things, 2019

arXiv:1909.04164 [pdf, other]

Knowledge Enhanced Contextual Word Representations

Authors: Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith

Abstract: Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world entities and are often unable to remember facts about those entities. We propose a general method to embed multiple knowledge bases (KBs) into large scale models, and thereby enhance their representations with structured, human-curated knowledge. For each KB, we f… ▽ More Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world entities and are often unable to remember facts about those entities. We propose a general method to embed multiple knowledge bases (KBs) into large scale models, and thereby enhance their representations with structured, human-curated knowledge. For each KB, we first use an integrated entity linker to retrieve relevant entity embeddings, then update contextual word representations via a form of word-to-entity attention. In contrast to previous approaches, the entity linkers and self-supervised language modeling objective are jointly trained end-to-end in a multitask setting that combines a small amount of entity linking supervision with a large amount of raw text. After integrating WordNet and a subset of Wikipedia into BERT, the knowledge enhanced BERT (KnowBert) demonstrates improved perplexity, ability to recall facts as measured in a probing task and downstream performance on relationship extraction, entity typing, and word sense disambiguation. KnowBert's runtime is comparable to BERT's and it scales to large KBs. △ Less

Submitted 30 October, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

Comments: EMNLP 2019

arXiv:1908.11047 [pdf, other]

Shallow Syntax in Deep Water

Authors: Swabha Swayamdipta, Matthew Peters, Brendan Roof, Chris Dyer, Noah A. Smith

Abstract: Shallow syntax provides an approximation of phrase-syntactic structure of sentences; it can be produced with high accuracy, and is computationally cheap to obtain. We investigate the role of shallow syntax-aware representations for NLP tasks using two techniques. First, we enhance the ELMo architecture to allow pretraining on predicted shallow syntactic parses, instead of just raw text, so that co… ▽ More Shallow syntax provides an approximation of phrase-syntactic structure of sentences; it can be produced with high accuracy, and is computationally cheap to obtain. We investigate the role of shallow syntax-aware representations for NLP tasks using two techniques. First, we enhance the ELMo architecture to allow pretraining on predicted shallow syntactic parses, instead of just raw text, so that contextual embeddings make use of shallow syntactic context. Our second method involves shallow syntactic features obtained automatically on downstream task data. Neither approach leads to a significant gain on any of the four downstream tasks we considered relative to ELMo-only baselines. Further analysis using black-box probes confirms that our shallow-syntax-aware contextual embeddings do not transfer to linguistic tasks any more easily than ELMo's embeddings. We take these findings as evidence that ELMo-style pretraining discovers representations which make additional awareness of shallow syntax redundant. △ Less

Submitted 29 August, 2019; originally announced August 2019.

arXiv:1906.07241 [pdf, other]

Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

Authors: Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner, Sameer Singh

Abstract: Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at training time, and often have difficulty recalling them. To address this, we introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts fr… ▽ More Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at training time, and often have difficulty recalling them. To address this, we introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts from a knowledge graph that are relevant to the context. These mechanisms enable the model to render information it has never seen before, as well as generate out-of-vocabulary tokens. We also introduce the Linked WikiText-2 dataset, a corpus of annotated text aligned to the Wikidata knowledge graph whose contents (roughly) match the popular WikiText-2 benchmark. In experiments, we demonstrate that the KGLM achieves significantly better performance than a strong baseline language model. We additionally compare different language model's ability to complete sentences requiring factual knowledge, showing that the KGLM outperforms even very large language models in generating facts. △ Less

Submitted 20 June, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

arXiv:1903.08855 [pdf, other]

Linguistic Knowledge and Transferability of Contextual Representations

Authors: Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith

Abstract: Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language. To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model,… ▽ More Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language. To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model, and BERT) with a suite of seventeen diverse probing tasks. We find that linear models trained on top of frozen contextual representations are competitive with state-of-the-art task-specific models in many cases, but fail on tasks requiring fine-grained linguistic knowledge (e.g., conjunct identification). To investigate the transferability of contextual word representations, we quantify differences in the transferability of individual layers within contextualizers, especially between recurrent neural networks (RNNs) and transformers. For instance, higher layers of RNNs are more task-specific, while transformer layers do not exhibit the same monotonic trend. In addition, to better understand what makes contextual word representations transferable, we compare language model pretraining with eleven supervised pretraining tasks. For any given task, pretraining on a closely related task yields better performance than language model pretraining (which is better on average) when the pretraining dataset is fixed. However, language model pretraining on more data gives the best results. △ Less

Submitted 25 April, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

Comments: 22 pages, 4 figures; to appear at NAACL 2019

arXiv:1903.05987 [pdf, other]

To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks

Authors: Matthew E. Peters, Sebastian Ruder, Noah A. Smith

Abstract: While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with tw… ▽ More While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner. △ Less

Submitted 11 June, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

Comments: Proceedings of the 4th Workshop on Representation Learning for NLP

arXiv:1808.08949 [pdf, other]

Dissecting Contextual Word Embeddings: Architecture and Representation

Authors: Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih

Abstract: Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN… ▽ More Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphological based at the word embedding layer through local syntax based in the lower contextual layers to longer range semantics such coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated. △ Less

Submitted 27 September, 2018; v1 submitted 27 August, 2018; originally announced August 2018.

Comments: EMNLP 2018

Showing 1–50 of 65 results for author: Peters, M