
Showing 1–50 of 65 results for author: Peters, M

  1. arXiv:2406.17131  [pdf, other]

    stat.ME cs.LG stat.AP

    Bayesian temporal biclustering with applications to multi-subject neuroscience studies

    Authors: Federica Zoe Ricci, Erik B. Sudderth, Jaylen Lee, Megan A. K. Peters, Marina Vannucci, Michele Guindani

    Abstract: We consider the problem of analyzing multivariate time series collected on multiple subjects, with the goal of identifying groups of subjects exhibiting similar trends in their recorded measurements over time as well as time-varying groups of associated measurements. To this end, we propose a Bayesian model for temporal biclustering featuring nested partitions, where a time-invariant partition of…

    Submitted 24 June, 2024; originally announced June 2024.

  2. arXiv:2405.19221  [pdf]

    q-bio.QM cs.LG

    Domain adaptation in small-scale and heterogeneous biological datasets

    Authors: Seyedmehdi Orouji, Martin C. Liu, Tal Korem, Megan A. K. Peters

    Abstract: Machine learning techniques are steadily becoming more important in modern biology, and are used to build predictive models, discover patterns, and investigate biological problems. However, models trained on one dataset are often not generalizable to other datasets from different cohorts or laboratories, due to differences in the statistical properties of these datasets. These could stem from tech…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: main manuscript + supplement

  3. arXiv:2402.00838  [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  4. arXiv:2402.00159  [pdf, other]

    cs.CL

    Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research

    Authors: Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, et al. (11 additional authors not shown)

    Abstract: Information about pretraining corpora used to train the current best-performing language models is seldom discussed: commercial models rarely detail their data, and even open models are often released without accompanying training data or recipes to reproduce them. As a result, it is challenging to conduct and advance scientific research on language modeling, such as understanding how training dat…

    Submitted 6 June, 2024; v1 submitted 31 January, 2024; originally announced February 2024.

    Comments: Accepted at ACL 2024; Dataset: https://hf.co/datasets/allenai/dolma; Code: https://github.com/allenai/dolma

  5. arXiv:2312.06235  [pdf, other]

    cs.CR

    On The Effect of Replacement Policies on The Security of Randomized Cache Architectures

    Authors: Moritz Peters, Nicolas Gaudin, Jan Philipp Thoma, Vianney Lapôtre, Pascal Cotret, Guy Gogniat, Tim Güneysu

    Abstract: Randomizing the mapping of addresses to cache entries has proven to be an effective technique for hardening caches against contention-based attacks like Prime+Probe. While attacks and defenses are still evolving, it is clear that randomized caches significantly increase the security against such attacks. However, one aspect that is missing from most analyses of randomized cache architectures is th…

    Submitted 11 December, 2023; originally announced December 2023.

  6. arXiv:2312.02034  [pdf, other]

    cs.HC

    Trust, distrust, and appropriate reliance in (X)AI: a survey of empirical evaluation of user trust

    Authors: Roel Visser, Tobias M. Peters, Ingrid Scharlau, Barbara Hammer

    Abstract: A current concern in the field of Artificial Intelligence (AI) is to ensure the trustworthiness of AI systems. The development of explainability methods is one prominent way to address this, which has often resulted in the assumption that the use of explainability will lead to an increase in the trust of users and wider society. However, the dynamics between explainability and trust are not well e…

    Submitted 4 December, 2023; originally announced December 2023.

  7. arXiv:2311.10702  [pdf, other]

    cs.CL

    Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

    Authors: Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A. Smith, Iz Beltagy, Hannaneh Hajishirzi

    Abstract: Since the release of TÜLU [Wang et al., 2023b], open resources for instruction tuning have developed quickly, from better base models to new finetuning techniques. We test and incorporate a number of these advances into TÜLU, resulting in TÜLU 2, a suite of improved TÜLU models for advancing the understanding and best practices of adapting pretrained language models to downstream tasks and user pr…

    Submitted 19 November, 2023; v1 submitted 17 November, 2023; originally announced November 2023.

    Comments: technical report; fixed zephyr numbers

  8. arXiv:2310.02074  [pdf, other]

    physics.ao-ph cs.LG

    ACE: A fast, skillful learned global atmospheric model for climate prediction

    Authors: Oliver Watt-Meyer, Gideon Dresdner, Jeremy McGibbon, Spencer K. Clark, Brian Henn, James Duncan, Noah D. Brenowitz, Karthik Kashinath, Michael S. Pritchard, Boris Bonev, Matthew E. Peters, Christopher S. Bretherton

    Abstract: Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass…

    Submitted 6 December, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted at Tackling Climate Change with Machine Learning: workshop at NeurIPS 2023

  9. arXiv:2308.08708  [pdf, other]

    cs.AI cs.CY cs.LG q-bio.NC

    Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

    Authors: Patrick Butlin, Robert Long, Eric Elmoznino, Yoshua Bengio, Jonathan Birch, Axel Constant, George Deane, Stephen M. Fleming, Chris Frith, Xu Ji, Ryota Kanai, Colin Klein, Grace Lindsay, Matthias Michel, Liad Mudrik, Megan A. K. Peters, Eric Schwitzgebel, Jonathan Simon, Rufin VanRullen

    Abstract: Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of con…

    Submitted 22 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

  10. The Importance of Distrust in AI

    Authors: Tobias M. Peters, Roel W. Visser

    Abstract: In recent years the use of Artificial Intelligence (AI) has become increasingly prevalent in a growing number of fields. As AI systems are being adopted in more high-stakes areas such as medicine and finance, ensuring that they are trustworthy is of increasing importance. This concern is prominently addressed by the development and application of explainability methods, which are purported to in…

    Submitted 2 November, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: This preprint has not undergone peer review or any post-submission improvements or corrections. The Version of Record of this contribution is published in Explainable Artificial Intelligence First World Conference, xAI 2023, Lisbon, Portugal, July 26-28, 2023, Proceedings, Part III (CCIS, volume 1903) and is available at https://doi.org/10.1007/978-3-031-44070-0

    Journal ref: Explainable Artificial Intelligence First World Conference, xAI 2023, Lisbon, Portugal, July 26-28, 2023, Proceedings, Part III (CCIS, volume 1903)

  11. arXiv:2307.09701  [pdf, other]

    cs.CL

    Efficiency Pentathlon: A Standardized Arena for Efficiency Evaluation

    Authors: Hao Peng, Qingqing Cao, Jesse Dodge, Matthew E. Peters, Jared Fernandez, Tom Sherborne, Kyle Lo, Sam Skjonsberg, Emma Strubell, Darrell Plessas, Iz Beltagy, Evan Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi

    Abstract: Rising computational demands of modern natural language processing (NLP) systems have increased the barrier to entry for cutting-edge research while posing serious environmental concerns. Yet, progress on model efficiency has been impeded by practical challenges in model evaluation and comparison. For example, hardware is challenging to control due to disparate levels of accessibility across diffe…

    Submitted 18 July, 2023; originally announced July 2023.

  12. arXiv:2305.15387  [pdf, other]

    cs.CL cs.AI

    Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering

    Authors: Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan

    Abstract: The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks. In this work, we propose extending this idea by pre-training a generic multi-document model from a novel cross-document question answering pre-training objective. To that end, given a set (or cluster) of topically-related documents, we systemati…

    Submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted at ACL 2023; camera-ready version

  13. arXiv:2305.08379  [pdf, other]

    cs.CL cs.LG

    TESS: Text-to-Text Self-Conditioned Simplex Diffusion

    Authors: Rabeeh Karimi Mahabadi, Hamish Ivison, Jaesung Tae, James Henderson, Iz Beltagy, Matthew E. Peters, Arman Cohan

    Abstract: Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various continuous domains. However, applying continuous diffusion models to natural language remains challenging due to its discrete nature and the need for a large number of diffusion steps to generate text, making diffusion-based generation expensive. In this work, we propose Text-to-text Self-c…

    Submitted 20 February, 2024; v1 submitted 15 May, 2023; originally announced May 2023.

    Comments: EACL 2024

  14. arXiv:2302.07027  [pdf, other]

    cs.CL

    AdapterSoup: Weight Averaging to Improve Generalization of Pretrained Language Models

    Authors: Alexandra Chronopoulou, Matthew E. Peters, Alexander Fraser, Jesse Dodge

    Abstract: Pretrained language models (PLMs) are trained on massive corpora, but often need to specialize to specific domains. A parameter-efficient adaptation method suggests training an adapter for each domain on the task of language modeling. This leads to good in-domain scores but can be impractical for domain- or resource-restricted settings. A solution is to use a related-domain adapter for the novel d…

    Submitted 28 March, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

    Comments: Accepted at EACL 2023; camera-ready version; fixed typo in related work

  15. arXiv:2212.10315  [pdf, other]

    cs.CL

    HINT: Hypernetwork Instruction Tuning for Efficient Zero- & Few-Shot Generalisation

    Authors: Hamish Ivison, Akshita Bhagia, Yizhong Wang, Hannaneh Hajishirzi, Matthew Peters

    Abstract: Recent NLP models have shown the remarkable ability to effectively generalise 'zero-shot' to new tasks using only natural language instructions as guidance. However, many of these approaches suffer from high computational costs due to their reliance on concatenating lengthy instructions with every input example, resulting in costly reprocessing of the instruction. To avoid this, we introduce Hyper…

    Submitted 24 May, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: ACL 2023

  16. arXiv:2210.13575  [pdf, other]

    cs.CL cs.AI

    Does Self-Rationalization Improve Robustness to Spurious Correlations?

    Authors: Alexis Ross, Matthew E. Peters, Ana Marasović

    Abstract: Rationalization is fundamental to human reasoning and learning. NLP models trained to produce rationales along with predictions, called self-rationalization models, have been investigated for their interpretability and utility to end-users. However, the extent to which training with human-written rationales facilitates learning remains an under-explored question. We ask whether training models to…

    Submitted 24 October, 2022; originally announced October 2022.

  17. arXiv:2208.08478  [pdf]

    q-bio.NC cs.LG

    "Task-relevant autoencoding" enhances machine learning for human neuroscience

    Authors: Seyedmehdi Orouji, Vincent Taschereau-Dumouchel, Aurelio Cortese, Brian Odegaard, Cody Cushing, Mouslim Cherkaoui, Mitsuo Kawato, Hakwan Lau, Megan A. K. Peters

    Abstract: In human neuroscience, machine learning can help reveal lower-dimensional neural representations relevant to subjects' behavior. However, state-of-the-art models typically require large datasets to train, so are prone to overfitting on human neuroimaging data that often possess few samples but many input dimensions. Here, we capitalized on the fact that the features we seek in human neuroscience a…

    Submitted 22 September, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Comments: 41 pages, 11 figures, 5 tables including supplemental material

  18. arXiv:2207.00227  [pdf]

    eess.SP cs.RO

    Introducing flexible perovskites to the IoT world using photovoltaic-powered wireless tags

    Authors: Sai Nithin Reddy Kantareddy, Rahul Bhattacharya, Sanjay E. Sarma, Ian Mathews, Janak Thapa, Liu Zhe, Shijing Sun, Ian Marius Peters, Tonio Buonassisi

    Abstract: Billions of everyday objects could become part of the Internet of Things (IoT) by augmentation with low-cost, long-range, maintenance-free wireless sensors. Radio Frequency Identification (RFID) is a low-cost wireless technology that could enable this vision, but it is constrained by short communication range and lack of sufficient energy available to power auxiliary electronics and sensors. Here,…

    Submitted 1 July, 2022; originally announced July 2022.

  19. arXiv:2205.15772  [pdf, other]

    eess.IV cs.CV

    The hybrid approach -- Convolutional Neural Networks and Expectation Maximization Algorithm -- for Tomographic Reconstruction of Hyperspectral Images

    Authors: Mads J. Ahlebæk, Mads S. Peters, Wei-Chih Huang, Mads T. Frandsen, René L. Eriksen, Bjarke Jørgensen

    Abstract: We present a simple but novel hybrid approach to hyperspectral data cube reconstruction from computed tomography imaging spectrometry (CTIS) images that sequentially combines neural networks and the iterative Expectation Maximization (EM) algorithm. We train and test the ability of the method to reconstruct data cubes of $100\times100\times25$ and $100\times100\times100$ voxels, corresponding to 2…

    Submitted 19 December, 2022; v1 submitted 31 May, 2022; originally announced May 2022.

    Comments: 36 pages, 13 figures and 2 tables. Supplemental material: 21 pages and 14 figures. v2: Clarifications added, analyses and argumentation updated

  20. arXiv:2205.11961  [pdf, other]

    cs.CL

    ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts

    Authors: Akari Asai, Mohammadreza Salehi, Matthew E. Peters, Hannaneh Hajishirzi

    Abstract: This work introduces a new multi-task, parameter-efficient language model (LM) tuning method that learns to transfer knowledge across different tasks via a mixture of soft prompts: small prefix embedding vectors pre-trained for different tasks. Our method, called ATTEMPT (ATTEntional Mixtures of Prompt Tuning), obtains source prompts as encodings of large-scale source tasks into a small number of p…

    Submitted 1 December, 2022; v1 submitted 24 May, 2022; originally announced May 2022.

    Comments: Published as a conference paper at EMNLP 2022 (long). Code available at https://github.com/AkariAsai/ATTEMPT

  21. arXiv:2205.05124  [pdf, other]

    cs.CL cs.AI cs.LG

    Extracting Latent Steering Vectors from Pretrained Language Models

    Authors: Nishant Subramani, Nivedita Suresh, Matthew E. Peters

    Abstract: Prior work on controllable text generation has focused on learning how to control language models through trainable decoding, smart-prompt design, or fine-tuning based on a desired objective. We hypothesize that the information needed to steer the model to generate a target sentence is already encoded within the model. Accordingly, we explore a different approach altogether: extracting latent vect…

    Submitted 10 May, 2022; originally announced May 2022.

    Comments: Accepted to ACL2022 Findings; 16 pages (9 pages plus references and appendices); Code: https://github.com/nishantsubramani/steering_vectors; Some text overlap with arXiv:2008.09049

  22. arXiv:2204.02733  [pdf, other]

    cs.CV

    Georeferencing of Photovoltaic Modules from Aerial Infrared Videos using Structure-from-Motion

    Authors: Lukas Bommes, Claudia Buerhop-Lutz, Tobias Pickel, Jens Hauch, Christoph Brabec, Ian Marius Peters

    Abstract: To identify abnormal photovoltaic (PV) modules in large-scale PV plants economically, drone-mounted infrared (IR) cameras and automated video processing algorithms are frequently used. While most related works focus on the detection of abnormal modules, little has been done to automatically localize those modules within the plant. In this work, we use incremental structure-from-motion to automatic…

    Submitted 6 April, 2022; originally announced April 2022.

  23. arXiv:2203.08304  [pdf, other]

    cs.CL

    Hyperdecoders: Instance-specific decoders for multi-task NLP

    Authors: Hamish Ivison, Matthew E. Peters

    Abstract: We investigate input-conditioned hypernetworks for multi-tasking in NLP, generating parameter-efficient adaptations for a decoder using a hypernetwork conditioned on the output of an encoder. This approach produces a unique decoder adaptation for every input instance, allowing the network a larger degree of flexibility than prior work that only produces one decoder adaptation per task. We apply ou…

    Submitted 18 October, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

    Comments: Accepted to Findings of EMNLP 2022

  24. arXiv:2203.06211  [pdf, other]

    cs.CL

    Staged Training for Transformer Language Models

    Authors: Sheng Shen, Pete Walsh, Kurt Keutzer, Jesse Dodge, Matthew Peters, Iz Beltagy

    Abstract: The current standard approach to scaling transformer language models trains each model size from a different random initialization. As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a "growth operator" to increase the model depth and width. By initializing each stage with the output…

    Submitted 11 March, 2022; originally announced March 2022.

  25. From Anecdotal Evidence to Quantitative Evaluation Methods: A Systematic Review on Evaluating Explainable AI

    Authors: Meike Nauta, Jan Trienes, Shreyasi Pathak, Elisa Nguyen, Michelle Peters, Yasmin Schmitt, Jörg Schlötterer, Maurice van Keulen, Christin Seifert

    Abstract: The rising popularity of explainable artificial intelligence (XAI) to understand high-performing black boxes raised the question of how to evaluate explanations of machine learning (ML) models. While interpretability and explainability are often presented as a subjectively validated binary property, we consider it a multi-faceted concept. We identify 12 conceptual properties, such as Compactness a…

    Submitted 24 February, 2023; v1 submitted 20 January, 2022; originally announced January 2022.

    Comments: Published in ACM Computing Surveys (DOI http://dx.doi.org/10.1145/3583558). This ArXiv version includes the supplementary material. Website with categorization of XAI methods at https://utwente-dmb.github.io/xai-papers/

  26. arXiv:2112.08786  [pdf, other]

    cs.CL

    Efficient Hierarchical Domain Adaptation for Pretrained Language Models

    Authors: Alexandra Chronopoulou, Matthew E. Peters, Jesse Dodge

    Abstract: The remarkable success of large language models has been driven by dense models trained on massive unlabeled, unstructured corpora. These corpora typically contain text from diverse, heterogeneous sources, but information about the source of the text is rarely used during training. Transferring their knowledge to a target domain is typically done by continuing training in-domain. In this paper, we…

    Submitted 3 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: NAACL 2022 accepted paper camera ready version

  27. arXiv:2112.02922  [pdf, other]

    cs.CV

    Anomaly Detection in IR Images of PV Modules using Supervised Contrastive Learning

    Authors: Lukas Bommes, Mathis Hoffmann, Claudia Buerhop-Lutz, Tobias Pickel, Jens Hauch, Christoph Brabec, Andreas Maier, Ian Marius Peters

    Abstract: Increasing deployment of photovoltaic (PV) plants requires methods for automatic detection of faulty PV modules in modalities such as infrared (IR) images. Recently, deep learning has become popular for this. However, related works typically sample train and test data from the same distribution, ignoring the presence of domain shift between data of different PV plants. Instead, we frame fault dete…

    Submitted 6 December, 2021; originally announced December 2021.

  28. arXiv:2111.08284  [pdf, other]

    cs.CL

    Few-Shot Self-Rationalization with Natural Language Prompts

    Authors: Ana Marasović, Iz Beltagy, Doug Downey, Matthew E. Peters

    Abstract: Self-rationalization models that predict task labels and generate free-text elaborations for their predictions could enable more intuitive interaction with NLP systems. These models are, however, currently trained with a large amount of human-written free-text explanations for each task which hinders their broader usage. We propose to study a more realistic setting of self-rationalization using fe…

    Submitted 25 April, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

    Comments: v2: NAACL Findings 2022 accepted paper camera-ready version. First two authors contributed equally. 9 pages main, 3 pages appendix

  29. arXiv:2109.06707  [pdf, other]

    cs.LG cs.AI

    A pragmatic approach to estimating average treatment effects from EHR data: the effect of prone positioning on mechanically ventilated COVID-19 patients

    Authors: Adam Izdebski, Patrick J. Thoral, Robbert C. A. Lalisang, Dean M. McHugh, Diederik Gommers, Olaf L. Cremer, Rob J. Bosman, Sander Rigter, Evert-Jan Wils, Tim Frenzel, Dave A. Dongelmans, Remko de Jong, Marco A. A. Peters, Marlijn J. A Kamps, Dharmanand Ramnarain, Ralph Nowitzky, Fleur G. C. A. Nooteboom, Wouter de Ruijter, Louise C. Urlings-Strop, Ellen G. M. Smit, D. Jannet Mehagnoul-Schipper, Tom Dormans, Cornelis P. C. de Jager, Stefaan H. A. Hendriks, Sefanja Achterberg, et al. (21 additional authors not shown)

    Abstract: Despite the recent progress in the field of causal inference, to date there is no agreed-upon methodology for estimating treatment effects from observational data. The consequence for clinical practice is that, when lacking results from a randomized trial, medical personnel are left without guidance on what seems to be effective in a real-world scenario. This article proposes a pragmatic methodo…

    Submitted 3 December, 2021; v1 submitted 14 September, 2021; originally announced September 2021.

  30. arXiv:2108.13640  [pdf, other]

    cs.CV

    Module-Power Prediction from PL Measurements using Deep Learning

    Authors: Mathis Hoffmann, Johannes Hepp, Bernd Doll, Claudia Buerhop-Lutz, Ian Marius Peters, Christoph Brabec, Andreas Maier, Vincent Christlein

    Abstract: The individual causes for power loss of photovoltaic modules have been investigated for quite some time. Recently, it has been shown that the power loss of a module is, for example, related to the fraction of inactive areas. While these areas can be easily identified from electroluminescence (EL) images, this is much harder for photoluminescence (PL) images. With this work, we close the gap between powe…

    Submitted 31 August, 2021; originally announced August 2021.

  31. arXiv:2108.13458  [pdf, other]

    eess.IV cs.CV

    The Application of Convolutional Neural Networks for Tomographic Reconstruction of Hyperspectral Images

    Authors: Wei-Chih Huang, Mads Svanborg Peters, Mads Juul Ahlebaek, Mads Toudal Frandsen, René Lynge Eriksen, Bjarke Jørgensen

    Abstract: A novel method, utilizing convolutional neural networks (CNNs), is proposed to reconstruct hyperspectral cubes from computed tomography imaging spectrometer (CTIS) images. Current reconstruction algorithms are usually subject to long reconstruction times and mediocre precision in cases of a large number of spectral channels. The constructed CNNs deliver higher precision and shorter reconstruction…

    Submitted 14 March, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: 31 pages, 18 figures and 4 tables. v2: clarifications and references added, analyses and network diagrams updated

  32. arXiv:2107.07150  [pdf, other]

    cs.CL

    Tailor: Generating and Perturbing Text with Semantic Controls

    Authors: Alexis Ross, Tongshuang Wu, Hao Peng, Matthew E. Peters, Matt Gardner

    Abstract: Controlled text perturbation is useful for evaluating and improving model generalizability. However, current techniques rely on training a model for every target perturbation, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on control codes derived fr…

    Submitted 17 March, 2022; v1 submitted 15 July, 2021; originally announced July 2021.

  33. Computer Vision Tool for Detection, Mapping and Fault Classification of PV Modules in Aerial IR Videos

    Authors: Lukas Bommes, Tobias Pickel, Claudia Buerhop-Lutz, Jens Hauch, Christoph Brabec, Ian Marius Peters

    Abstract: Increasing deployment of photovoltaic (PV) plants demands cheap and fast inspection. A viable tool for this task is thermographic imaging by unmanned aerial vehicles (UAV). In this work, we develop a computer vision tool for the semi-automatic extraction of PV modules from thermographic UAV videos. We use it to curate a dataset containing 4.3 million IR images of 107842 PV modules from thermo…

    Submitted 14 June, 2021; originally announced June 2021.

  34. arXiv:2106.00188  [pdf, other]

    cs.CL cs.AI

    PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World

    Authors: Rowan Zellers, Ari Holtzman, Matthew Peters, Roozbeh Mottaghi, Aniruddha Kembhavi, Ali Farhadi, Yejin Choi

    Abstract: We propose PIGLeT: a model that learns physical commonsense knowledge through interaction, and then uses this knowledge to ground language. We factorize PIGLeT into a physical dynamics model, and a separate language model. Our dynamics model learns not just what objects are but also what they do: glass cups break when thrown, plastic ones don't. We then use it as the interface to our language mode…

    Submitted 30 January, 2022; v1 submitted 31 May, 2021; originally announced June 2021.

    Comments: ACL 2021 camera ready, project page at https://rowanzellers.com/piglet/

  35. arXiv:2104.08646  [pdf, other]

    cs.CL

    Competency Problems: On Finding and Removing Artifacts in Language Data

    Authors: Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith

    Abstract: Much recent work in NLP has documented dataset artifacts, bias, and spurious correlations between input features and output labels. However, how to tell which features have "spurious" instead of legitimate correlations is typically left unspecified. In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a…

    Submitted 28 December, 2021; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021. This version fixes an error in Proposition 1 and adds discussion (the EMNLP camera ready version is unfixed) (and v3 adds the acknowledgements that we forgot to put into v2)

  36. arXiv:2101.00406  [pdf, other]

    cs.CL

    CDLM: Cross-Document Language Modeling

    Authors: Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan

    Abstract: We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective. First, instead of considering documents in isolation, we pretrain over sets of multiple related documents, encouraging the model to learn cross-document relationships. Second, we improve over recent long-range transformers by…

    Submitted 2 September, 2021; v1 submitted 2 January, 2021; originally announced January 2021.

    Comments: EMNLP 2021, findings

  37. arXiv:2012.13985  [pdf, other]

    cs.CL cs.AI

    Explaining NLP Models via Minimal Contrastive Editing (MiCE)

    Authors: Alexis Ross, Ana Marasović, Matthew E. Peters

    Abstract: Humans have been shown to give contrastive explanations, which explain why an observed event happened rather than some other counterfactual event (the contrast case). Despite the influential role that contrastivity plays in how humans explain, this property is largely missing from current methods for explaining NLP models. We present Minimal Contrastive Editing (MiCE), a method for producing contr…

    Submitted 23 June, 2021; v1 submitted 27 December, 2020; originally announced December 2020.

  38. arXiv:2011.08115  [pdf, other]

    cs.CL

    Learning from Task Descriptions

    Authors: Orion Weller, Nicholas Lourie, Matt Gardner, Matthew E. Peters

    Abstract: Typically, machine learning systems solve new tasks by training on thousands of examples. In contrast, humans can solve new tasks by reading some instructions, with perhaps an example or two. To take a step toward closing this gap, we introduce a framework for developing NLP systems that solve new tasks after reading their descriptions, synthesizing prior work in this area. We instantiate this fra…

    Submitted 16 November, 2020; originally announced November 2020.

    Comments: EMNLP 2020

  39. Joint Super-Resolution and Rectification for Solar Cell Inspection

    Authors: Mathis Hoffmann, Thomas Köhler, Bernd Doll, Frank Schebesch, Florian Talkenberg, Ian Marius Peters, Christoph J. Brabec, Andreas Maier, Vincent Christlein

    Abstract: Visual inspection of solar modules is an important monitoring facility in photovoltaic power plants. Since a single measurement of fast CMOS sensors is limited in spatial resolution and often not sufficient to reliably detect small defects, we apply multi-frame super-resolution (MFSR) to a sequence of low resolution measurements. In addition, the rectification and removal of lens distortion simpli…

    Submitted 7 April, 2021; v1 submitted 10 November, 2020; originally announced November 2020.

  40. RDCNet: Instance segmentation with a minimalist recurrent residual network

    Authors: Raphael Ortiz, Gustavo de Medeiros, Antoine H. F. M. Peters, Prisca Liberali, Markus Rempfler

    Abstract: Instance segmentation is a key step for quantitative microscopy. While several machine learning based methods have been proposed for this problem, most of them rely on computationally complex models that are trained on surrogate tasks. Building on recent developments towards end-to-end trainable instance segmentation, we propose a minimalist recurrent network called recurrent dilated convolutional…

    Submitted 2 October, 2020; originally announced October 2020.

    Comments: Accepted at MICCAI-MLMI 2020 workshop

  41. arXiv:2009.14712  [pdf, other]

    cs.CV cs.LG eess.IV

    Deep Learning-based Pipeline for Module Power Prediction from EL Measurements

    Authors: Mathis Hoffmann, Claudia Buerhop-Lutz, Luca Reeb, Tobias Pickel, Thilo Winkler, Bernd Doll, Tobias Würfl, Ian Marius Peters, Christoph Brabec, Andreas Maier, Vincent Christlein

    Abstract: Automated inspection plays an important role in monitoring large-scale photovoltaic power plants. Commonly, electroluminescence measurements are used to identify various types of defects on solar modules but have not been used to determine the power of a module. However, knowledge of the power at maximum power point is important as well, since drops in the power of a single module can affect the p…

    Submitted 26 November, 2020; v1 submitted 30 September, 2020; originally announced September 2020.

  42. arXiv:2004.05150  [pdf, other]

    cs.CL

    Longformer: The Long-Document Transformer

    Authors: Iz Beltagy, Matthew E. Peters, Arman Cohan

    Abstract: Transformer-based models are unable to process long sequences due to their self-attention operation, which scales quadratically with the sequence length. To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer. Longformer's attention mechanism is a drop-in rep…

    Submitted 2 December, 2020; v1 submitted 10 April, 2020; originally announced April 2020.

    Comments: Version 2 introduces the Longformer-Encoder-Decoder (LED) model

  43. arXiv:2002.04108  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Adversarial Filters of Dataset Biases

    Authors: Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi

    Abstract: Large neural models have demonstrated human-level performance on language and vision benchmarks, while their performance degrades considerably on adversarial or out-of-distribution samples. This raises the question of whether these models have learned to solve a dataset rather than the underlying task by overfitting to spurious dataset biases. We investigate one recently proposed approach, AFLite,…

    Submitted 10 July, 2020; v1 submitted 10 February, 2020; originally announced February 2020.

    Comments: Accepted to ICML 2020

  44. arXiv:1909.05818  [pdf]

    eess.SP cs.ET cs.NI

    Long range battery-less PV-powered RFID tag sensors

    Authors: Sai Nithin R. Kantareddy, Ian Mathews, Rahul Bhattacharyya, Ian Marius Peters, Tonio Buonassisi, Sanjay E. Sarma

    Abstract: Communication range in passive Radio-Frequency Identification (RFID) front-end devices is a critical barrier in the real-world implementation of this low-cost technology. Purely passive RFID tags power up by harvesting the limited RF energy transmitted by the interrogator, and communicate by backscattering the incident signal. This mode of communication keeps manufacturing costs below a few cents…

    Submitted 12 September, 2019; originally announced September 2019.

    Journal ref: IEEE Internet of Things, 2019

  45. arXiv:1909.04164  [pdf, other]

    cs.CL

    Knowledge Enhanced Contextual Word Representations

    Authors: Matthew E. Peters, Mark Neumann, Robert L. Logan IV, Roy Schwartz, Vidur Joshi, Sameer Singh, Noah A. Smith

    Abstract: Contextual word representations, typically trained on unstructured, unlabeled text, do not contain any explicit grounding to real world entities and are often unable to remember facts about those entities. We propose a general method to embed multiple knowledge bases (KBs) into large scale models, and thereby enhance their representations with structured, human-curated knowledge. For each KB, we f…

    Submitted 30 October, 2019; v1 submitted 9 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  46. arXiv:1908.11047  [pdf, other]

    cs.CL

    Shallow Syntax in Deep Water

    Authors: Swabha Swayamdipta, Matthew Peters, Brendan Roof, Chris Dyer, Noah A. Smith

    Abstract: Shallow syntax provides an approximation of phrase-syntactic structure of sentences; it can be produced with high accuracy, and is computationally cheap to obtain. We investigate the role of shallow syntax-aware representations for NLP tasks using two techniques. First, we enhance the ELMo architecture to allow pretraining on predicted shallow syntactic parses, instead of just raw text, so that co…

    Submitted 29 August, 2019; originally announced August 2019.

  47. arXiv:1906.07241  [pdf, other]

    cs.CL

    Barack's Wife Hillary: Using Knowledge-Graphs for Fact-Aware Language Modeling

    Authors: Robert L. Logan IV, Nelson F. Liu, Matthew E. Peters, Matt Gardner, Sameer Singh

    Abstract: Modeling human language requires the ability to not only generate fluent text but also encode factual knowledge. However, traditional language models are only capable of remembering facts seen at training time, and often have difficulty recalling them. To address this, we introduce the knowledge graph language model (KGLM), a neural language model with mechanisms for selecting and copying facts fr…

    Submitted 20 June, 2019; v1 submitted 17 June, 2019; originally announced June 2019.

  48. arXiv:1903.08855  [pdf, other]

    cs.CL

    Linguistic Knowledge and Transferability of Contextual Representations

    Authors: Nelson F. Liu, Matt Gardner, Yonatan Belinkov, Matthew E. Peters, Noah A. Smith

    Abstract: Contextual word representations derived from large-scale neural language models are successful across a diverse set of NLP tasks, suggesting that they encode useful and transferable features of language. To shed light on the linguistic knowledge they capture, we study the representations produced by several recent pretrained contextualizers (variants of ELMo, the OpenAI transformer language model,…

    Submitted 25 April, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

    Comments: 22 pages, 4 figures; to appear at NAACL 2019

  49. arXiv:1903.05987  [pdf, other]

    cs.CL cs.LG

    To Tune or Not to Tune? Adapting Pretrained Representations to Diverse Tasks

    Authors: Matthew E. Peters, Sebastian Ruder, Noah A. Smith

    Abstract: While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation, feature extraction (where the pretrained weights are frozen), and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with tw…

    Submitted 11 June, 2019; v1 submitted 14 March, 2019; originally announced March 2019.

    Comments: Proceedings of the 4th Workshop on Representation Learning for NLP

  50. arXiv:1808.08949  [pdf, other]

    cs.CL

    Dissecting Contextual Word Embeddings: Architecture and Representation

    Authors: Matthew E. Peters, Mark Neumann, Luke Zettlemoyer, Wen-tau Yih

    Abstract: Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN…

    Submitted 27 September, 2018; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018