-
Competition-Level Code Generation with AlphaCode
Authors:
Yujia Li,
David Choi,
Junyoung Chung,
Nate Kushman,
Julian Schrittwieser,
Rémi Leblond,
Tom Eccles,
James Keeling,
Felix Gimeno,
Agustin Dal Lago,
Thomas Hubert,
Peter Choy,
Cyprien de Masson d'Autume,
Igor Babuschkin,
Xinyun Chen,
Po-Sen Huang,
Johannes Welbl,
Sven Gowal,
Alexey Cherepanov,
James Molloy,
Daniel J. Mankowitz,
Esme Sutherland Robson,
Pushmeet Kohli,
Nando de Freitas,
Koray Kavukcuoglu
, et al. (1 additional author not shown)
Abstract:
Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems which require an understanding of algorithms and complex natural language remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.
Submitted 8 February, 2022;
originally announced March 2022.
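The third component named in the AlphaCode abstract above, sampling many candidates and then filtering them by program behavior down to a small set of submissions, can be illustrated with a minimal sketch. This is not the authors' implementation: candidates are modeled as plain Python callables, and the function name `filter_candidates`, the `example_tests` format, and the default cap of 10 submissions are all assumptions made for illustration.

```python
def filter_candidates(candidates, example_tests, max_submissions=10):
    """Keep candidates that reproduce every example output, up to a cap.

    candidates:      iterable of callables standing in for sampled programs
                     (hypothetical representation).
    example_tests:   list of (input, expected_output) pairs.
    max_submissions: hypothetical cap on how many survivors to keep.
    """
    kept = []
    for program in candidates:
        try:
            passed = all(program(inp) == expected for inp, expected in example_tests)
        except Exception:
            # A candidate that crashes on an example is treated as failing it.
            passed = False
        if passed:
            kept.append(program)
        if len(kept) == max_submissions:
            break
    return kept
```

The point of the sketch is only that filtering depends on observed behavior on example inputs, not on the text of the candidate program.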
-
Moiré dispersion of edge states in spin chains on superconductors
Authors:
Cristina Mier,
Deung-Jang Choi,
Nicolás Lorente
Abstract:
Our calculations of ferromagnetic spin chains on s-wave superconductors show that the energy oscillations of edge states with the chain's length are due to a moiré pattern emerging from Friedel-like oscillations and the discreteness of the spin-chain lattice. By modifying the spin lattice, the moiré dispersion of edge states can be controlled. In particular, we can engineer non-dispersive edge states that remain at fixed energy regardless of the size distribution of the spin chains. This is an important step in the study of edge states of spin chains that can be fabricated with a certain size dispersion.
Submitted 11 March, 2022;
originally announced March 2022.
-
SPICEprop: Backpropagating Errors Through Memristive Spiking Neural Networks
Authors:
Peng Zhou,
Jason K. Eshraghian,
Dong-Uk Choi,
Sung-Mo Kang
Abstract:
We present a fully memristive spiking neural network (MSNN) consisting of novel memristive neurons trained using the backpropagation through time (BPTT) learning rule. Gradient descent is applied directly to the memristive integrated-and-fire (MIF) neuron designed using analog SPICE circuit models, which generates distinct depolarization, hyperpolarization, and repolarization voltage waveforms. Synaptic weights are trained by BPTT using the membrane potential of the MIF neuron model and can be processed on memristive crossbars. The natural spiking dynamics of the MIF neuron model are fully differentiable, eliminating the need for gradient approximations that are prevalent in the spiking neural network literature. Despite the added complexity of training directly on SPICE circuit models, we achieve 97.58% accuracy on the MNIST testing dataset and 75.26% on the Fashion-MNIST testing dataset, the highest accuracies among all fully MSNNs.
Submitted 9 March, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
A Fully Memristive Spiking Neural Network with Unsupervised Learning
Authors:
Peng Zhou,
Dong-Uk Choi,
Jason K. Eshraghian,
Sung-Mo Kang
Abstract:
We present a fully memristive spiking neural network (MSNN) consisting of physically-realizable memristive neurons and memristive synapses to implement an unsupervised Spiking Time Dependent Plasticity (STDP) learning rule. The system is fully memristive in that both neuronal and synaptic dynamics can be realized by using memristors. The neuron is implemented using the SPICE-level memristive integrate-and-fire (MIF) model, which consists of a minimal number of circuit elements necessary to achieve distinct depolarization, hyperpolarization, and repolarization voltage waveforms. The proposed MSNN uniquely implements STDP learning by using cumulative weight changes in memristive synapses from the voltage waveform changes across the synapses, which arise from the presynaptic and postsynaptic spiking voltage signals during the training process. Two types of MSNN architectures are investigated: 1) a biologically plausible memory retrieval system, and 2) a multi-class classification system. Our circuit simulation results verify the MSNN's unsupervised learning efficacy by replicating biological memory retrieval mechanisms, and achieving 97.5% accuracy in a 4-pattern recognition problem in a large scale discriminative MSNN.
Submitted 9 March, 2022; v1 submitted 2 March, 2022;
originally announced March 2022.
-
PT4AL: Using Self-Supervised Pretext Tasks for Active Learning
Authors:
John Seon Keun Yi,
Minseok Seo,
Jongchan Park,
Dong-Geol Choi
Abstract:
Labeling a large set of data is expensive. Active learning aims to tackle this problem by asking to annotate only the most informative data from the unlabeled set. We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative. We discover that the loss of a simple self-supervised pretext task, such as rotation prediction, is closely correlated to the downstream task loss. Before the active learning iterations, the pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and split into batches by their pretext task losses. In each active learning iteration, the main task model is used to sample the most uncertain data in a batch to be annotated. We evaluate our method on various image classification and segmentation benchmarks and achieve compelling performances on CIFAR10, Caltech-101, ImageNet, and Cityscapes. We further show that our method performs well on imbalanced datasets, and can be an effective solution to the cold-start problem where active learning performance is affected by the randomly sampled initial labeled set.
Submitted 26 July, 2022; v1 submitted 19 January, 2022;
originally announced January 2022.
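The sampling procedure described in the PT4AL abstract above (sort the unlabeled data by pretext-task loss, split it into batches, then pick the most uncertain samples from one batch per iteration) can be sketched roughly as follows. This is a simplified illustration, not the authors' code: the function names, the descending sort order, and the use of lowest top-class confidence as the uncertainty measure are assumptions.

```python
import math


def split_by_pretext_loss(losses, num_batches):
    """Sort sample indices by pretext loss (highest first, an assumption)
    and split them into contiguous batches of near-equal size."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    size = math.ceil(len(order) / num_batches)
    return [order[k:k + size] for k in range(0, len(order), size)]


def select_most_uncertain(batch, confidence, budget):
    """From one batch, return the `budget` indices whose main-task
    top-class confidence is lowest (a hypothetical uncertainty measure)."""
    return sorted(batch, key=lambda i: confidence[i])[:budget]
```

In an active learning loop, iteration t would call `select_most_uncertain` on batch t, send the returned indices for annotation, and retrain the main task model before moving to the next batch.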
-
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
Authors:
Kaustubh D. Dhole,
Varun Gangal,
Sebastian Gehrmann,
Aadesh Gupta,
Zhenhao Li,
Saad Mahamood,
Abinaya Mahendiran,
Simon Mille,
Ashish Shrivastava,
Samson Tan,
Tongshuang Wu,
Jascha Sohl-Dickstein,
Jinho D. Choi,
Eduard Hovy,
Ondrej Dusek,
Sebastian Ruder,
Sajant Anand,
Nagender Aneja,
Rabin Banjade,
Lisa Barthe,
Hanna Behnke,
Ian Berlot-Attwell,
Connor Boyle,
Caroline Brun,
Marco Antonio Sobrevilla Cabezudo
, et al. (101 additional authors not shown)
Abstract:
Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).
Submitted 11 October, 2022; v1 submitted 5 December, 2021;
originally announced December 2021.
-
The Seventeenth Data Release of the Sloan Digital Sky Surveys: Complete Release of MaNGA, MaStar and APOGEE-2 Data
Authors:
Abdurro'uf,
Katherine Accetta,
Conny Aerts,
Victor Silva Aguirre,
Romina Ahumada,
Nikhil Ajgaonkar,
N. Filiz Ak,
Shadab Alam,
Carlos Allende Prieto,
Andres Almeida,
Friedrich Anders,
Scott F. Anderson,
Brett H. Andrews,
Borja Anguiano,
Erik Aquino-Ortiz,
Alfonso Aragon-Salamanca,
Maria Argudo-Fernandez,
Metin Ata,
Marie Aubert,
Vladimir Avila-Reese,
Carles Badenes,
Rodolfo H. Barba,
Kat Barger,
Jorge K. Barrera-Ballesteros,
Rachael L. Beaton
, et al. (316 additional authors not shown)
Abstract:
This paper documents the seventeenth data release (DR17) from the Sloan Digital Sky Surveys; the fifth and final release from the fourth phase (SDSS-IV). DR17 contains the complete release of the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, which reached its goal of surveying over 10,000 nearby galaxies. The complete release of the MaNGA Stellar Library (MaStar) accompanies this data, providing observations of almost 30,000 stars through the MaNGA instrument during bright time. DR17 also contains the complete release of the Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2) survey which publicly releases infra-red spectra of over 650,000 stars. The main sample from the Extended Baryon Oscillation Spectroscopic Survey (eBOSS), as well as the sub-survey Time Domain Spectroscopic Survey (TDSS) data were fully released in DR16. New single-fiber optical spectroscopy released in DR17 is from the SPectroscopic IDentification of ERosita Survey (SPIDERS) sub-survey and the eBOSS-RM program. Along with the primary data sets, DR17 includes 25 new or updated Value Added Catalogs (VACs). This paper concludes the release of SDSS-IV survey data. SDSS continues into its fifth phase with observations already underway for the Milky Way Mapper (MWM), Local Volume Mapper (LVM) and Black Hole Mapper (BHM) surveys.
Submitted 13 January, 2022; v1 submitted 3 December, 2021;
originally announced December 2021.
-
Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-sentence Dependency Graph
Authors:
Liyan Xu,
Xuchao Zhang,
Bo Zong,
Yanchi Liu,
Wei Cheng,
Jingchao Ni,
Haifeng Chen,
Liang Zhao,
Jinho D. Choi
Abstract:
We target the task of cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting, by incorporating syntactic features from Universal Dependencies (UD), and the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt the inter-sentence syntactic relations, in addition to the rudimentary intra-sentence relations, to further utilize the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build the Inter-Sentence Dependency Graph (ISDG) connecting dependency trees to form global syntactic relations across sentences. We then propose the ISDG encoder that encodes the global dependency graph, addressing the inter-sentence relations via both one-hop and multi-hop dependency paths explicitly. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder that is only trained on English is able to improve the zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on-average, and 5.2 F1 / 11.2 EM on certain languages. Further analysis shows the improvement can be attributed to the attention on the cross-linguistically consistent syntactic path.
Submitted 15 March, 2022; v1 submitted 1 December, 2021;
originally announced December 2021.
-
What Went Wrong? Explaining Overall Dialogue Quality through Utterance-Level Impacts
Authors:
James D. Finch,
Sarah E. Finch,
Jinho D. Choi
Abstract:
Improving user experience of a dialogue system often requires intensive developer effort to read conversation logs, run statistical analyses, and intuit the relative importance of system shortcomings. This paper presents a novel approach to automated analysis of conversation logs that learns the relationship between user-system interactions and overall dialogue quality. Unlike prior work on utterance-level quality prediction, our approach learns the impact of each interaction from the overall user rating without utterance-level annotation, allowing resultant model conclusions to be derived on the basis of empirical evidence and at low cost. Our model identifies interactions that have a strong correlation with the overall dialogue quality in a chatbot setting. Experiments show that the automated analysis from our model agrees with expert judgments, making this work the first to show that such weakly-supervised learning of utterance-level quality prediction is highly achievable.
Submitted 31 October, 2021;
originally announced November 2021.
-
An Approach to Inference-Driven Dialogue Management within a Social Chatbot
Authors:
Sarah E. Finch,
James D. Finch,
Daniil Huryn,
William Hutsell,
Xiaoyuan Huang,
Han He,
Jinho D. Choi
Abstract:
We present a chatbot implementing a novel dialogue management approach based on logical inference. Instead of framing conversation as a sequence of response generation tasks, we model conversation as a collaborative inference process in which speakers share information to synthesize new knowledge in real time. Our chatbot pipeline accomplishes this modelling in three broad stages. The first stage translates user utterances into a symbolic predicate representation. The second stage then uses this structured representation in conjunction with a larger knowledge base to synthesize new predicates using efficient graph matching. In the third and final stage, our bot selects a small subset of predicates and translates them into an English response. This approach lends itself to understanding latent semantics of user inputs, flexible initiative taking, and responses that are novel and coherent with the dialogue context.
Submitted 31 October, 2021;
originally announced November 2021.
-
Improving Object Permanence using Agent Actions and Reasoning
Authors:
Ying Siu Liang,
Chen Zhang,
Dongkyu Choi,
Kenneth Kwok
Abstract:
Object permanence in psychology means knowing that objects still exist even if they are no longer visible. It is a crucial concept for robots to operate autonomously in uncontrolled environments. Existing approaches learn object permanence from low-level perception, but perform poorly on more complex scenarios, like when objects are contained and carried by others. Knowledge about manipulation actions performed on an object prior to its disappearance allows us to reason about its location, e.g., that the object has been placed in a carrier. In this paper we argue that object permanence can be improved when the robot uses knowledge about executed actions and describe an approach to infer hidden object states from agent actions. We show that considering agent actions not only improves rule-based reasoning models but also purely neural approaches, showing its general applicability. Then, we conduct quantitative experiments on a snitch localization task using a dataset of 1,371 synthesized videos, where we compare the performance of different object permanence models with and without action annotations. We demonstrate that models with action annotations can significantly increase performance of both neural and rule-based approaches. Finally, we evaluate the usability of our approach in real-world applications by conducting qualitative experiments with two Universal Robots (UR5 and UR16e) in both lab and industrial settings. The robots complete benchmark tasks for a gearbox assembly and demonstrate the object permanence capabilities with real sensor data in an industrial environment.
Submitted 1 October, 2021;
originally announced October 2021.
-
Intensionalizing Abstract Meaning Representations: Non-Veridicality and Scope
Authors:
Gregor Williamson,
Patrick Elliott,
Yuxin Ji,
Jinho D. Choi
Abstract:
Abstract Meaning Representation (AMR) is a graphical meaning representation language designed to represent propositional information about argument structure. However, at present it is unable to satisfyingly represent non-veridical intensional contexts, often licensing inappropriate inferences. In this paper, we show how to resolve the problem of non-veridicality without appealing to layered graphs through a mapping from AMRs into Simply-Typed Lambda Calculus (STLC). At least for some cases, this requires the introduction of a new role :content which functions as an intensional operator. The translation proposed is inspired by the formal linguistics literature on the event semantics of attitude reports. Next, we address the interaction of quantifier scope and intensional operators in so-called de re/de dicto ambiguities. We adopt a scope node from the literature and provide an explicit multidimensional semantics utilizing Cooper storage which allows us to derive the de re and de dicto scope readings as well as intermediate scope readings which prove difficult for accounts without a scope node.
Submitted 20 September, 2021;
originally announced September 2021.
-
StreamSide: A Fully-Customizable Open-Source Toolkit for Efficient Annotation of Meaning Representations
Authors:
Jinho D. Choi,
Gregor Williamson
Abstract:
This demonstration paper presents StreamSide, an open-source toolkit for annotating multiple kinds of meaning representations. StreamSide supports frame-based annotation schemes e.g., Abstract Meaning Representation (AMR) and frameless annotation schemes e.g., Widely Interpretable Semantic Representation (WISeR). Moreover, it supports both sentence-level and document-level annotation by allowing annotators to create multi-rooted graphs for input text. It can open and automatically convert between several types of input formats including plain text, Penman notation, and its own JSON format enabling richer annotation. It features reference frames for AMR predicate argument structures, and also concept-to-text alignment. StreamSide is released under the Apache 2.0 license, and is completely open-source so that it can be customized to annotate enriched meaning representations in different languages (e.g., Uniform Meaning Representations). All StreamSide resources are publicly distributed through our open source project at: https://github.com/emorynlp/StreamSide.
Submitted 20 September, 2021;
originally announced September 2021.
-
The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders
Authors:
Han He,
Jinho D. Choi
Abstract:
Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, who interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.
Submitted 14 September, 2021;
originally announced September 2021.
-
ELIT: Emory Language and Information Toolkit
Authors:
Han He,
Liyan Xu,
Jinho D. Choi
Abstract:
We introduce ELIT, the Emory Language and Information Toolkit, which is a comprehensive NLP framework providing transformer-based end-to-end models for core tasks with a special focus on memory efficiency while maintaining state-of-the-art accuracy and speed. Compared to existing toolkits, ELIT features an efficient Multi-Task Learning (MTL) model with many downstream tasks that include lemmatization, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic role labeling, and AMR parsing. The backbone of ELIT's MTL framework is a pre-trained transformer encoder that is shared across tasks to speed up their inference. ELIT provides pre-trained models developed on a remix of eight datasets. To scale up its service, ELIT also integrates a RESTful Client/Server combination. On the server side, ELIT extends its functionality to cover other tasks such as tokenization and coreference resolution, providing an end user with agile research experience. All resources including the source codes, documentation, and pre-trained models are publicly available at https://github.com/emorynlp/elit.
Submitted 8 September, 2021;
originally announced September 2021.
-
Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation
Authors:
Liyan Xu,
Xuchao Zhang,
Xujiang Zhao,
Haifeng Chen,
Feng Chen,
Jinho D. Choi
Abstract:
Recent multilingual pre-trained language models have achieved remarkable zero-shot performance, where the model is only finetuned on one source language and directly evaluated on target languages. In this work, we propose a self-learning framework that further utilizes unlabeled data of target languages, combined with uncertainty estimation in the process to select high-quality silver labels. Three different uncertainties are adapted and analyzed specifically for the cross lingual transfer: Language Heteroscedastic/Homoscedastic Uncertainty (LEU/LOU), Evidential Uncertainty (EVI). We evaluate our framework with uncertainties on two cross-lingual tasks including Named Entity Recognition (NER) and Natural Language Inference (NLI) covering 40 languages in total, which outperforms the baselines significantly by 10 F1 on average for NER and 2.5 accuracy score for NLI.
Submitted 23 September, 2021; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Adapted End-to-End Coreference Resolution System for Anaphoric Identities in Dialogues
Authors:
Liyan Xu,
Jinho D. Choi
Abstract:
We present an effective system adapted from the end-to-end neural coreference resolution model, targeting the task of anaphora resolution in dialogues. Three aspects are specifically addressed in our approach: the support of singletons, encoding speakers and turns throughout dialogue interactions, and knowledge transfer utilizing existing resources. Despite the simplicity of our adaptation strategies, they are shown to bring significant impact to the final performance, with up to 27 F1 improvement over the baseline. Our final system ranks first on the leaderboard of the anaphora resolution track in the CRAC 2021 shared task, and achieves the best evaluation results on all four datasets.
Submitted 23 September, 2021; v1 submitted 1 September, 2021;
originally announced September 2021.
-
Calculations of in-gap states of ferromagnetic spin chains on s-wave wide-band superconductors
Authors:
Cristina Mier,
Deung-Jang Choi,
Nicolás Lorente
Abstract:
Magnetic impurities create in-gap states on superconductors. Recent experiments explore the topological properties of one-dimensional arrays of magnetic impurities on superconductors, because in certain regimes p-wave pairing can be locally induced leading to new topological phases. A by-product of the new accessible phases is the appearance of zero-energy edge states that have non-Abelian exchange properties and can be used for topological quantum computation. Despite the large amount of theory devoted to these systems, most treatments use approximations that render their applicability limited when comparing with usual experiments of 1-D impurity arrays on wide-band superconductors. These approximations either involve tight-binding-like approximations where the impurity energy scales match the minute energy scale of the superconducting gap and are many times unrealistic, or they assume strongly-bound in-gap states. Here, we present a theory for s-wave superconductors based on a wide-band normal metal, with any possible energy scale for the magnetic impurities. The theory is based on free-electron Green's functions. We include Rashba coupling and compare with recent experimental results, permitting us to analyze the topological phases and the experimental edge states. The infinite-chain properties can be analytically obtained, giving us a way to compare with finite-chain calculations. We show that it is possible to converge to the infinite limit by doing finite numerical calculation, paving the way for numerical calculations not based on analytical Green's functions.
△ Less
Submitted 25 August, 2021;
originally announced August 2021.
-
An electron-spin qubit platform assembled atom-by-atom on a surface
Authors:
Yu Wang,
Yi Chen,
Hong T. Bui,
Christoph Wolf,
Masahiro Haze,
Cristina Mier,
Jinkyung Kim,
Deung-jang Choi,
Christopher P. Lutz,
Yujeong Bae,
Soo-Hyon Phark,
Andreas J. Heinrich
Abstract:
Creating a quantum-coherent architecture at the atomic scale has long been an ambition in quantum science and nanotechnology. This ultimate length scale requires the use of fundamental quantum properties of atoms, such as the spin of electrons, which naturally occurs in many solid-state environments and allows high-fidelity operations and readout by electromagnetic means. Despite decades of effort, however, it remains a formidable task to realize an atomic-scale quantum architecture where multiple electron spin qubits can be precisely assembled, controllably coupled, and coherently operated. Electron spin qubits created in dopants in semiconductors and color centers in insulators, for example, can be well controlled individually but are difficult to couple together into a circuit. On the other hand, multiple magnetic atoms and molecules on surfaces can be coupled to each other by building sophisticated atomic structures using a scanning tunneling microscope (STM), but coherent operation has so far been limited to a single qubit in the tunnel junction. Here we demonstrate an atomic-scale qubit platform by showing atom-by-atom construction, coherent operations, and readout of multiple electron-spin qubits on a surface. To enable the coherent control of remote qubits that are outside the tunnel junction, we complement each electron spin with a local magnetic field gradient from a nearby single-atom magnet. To enable readout of remote qubits, we employ a sensor qubit in the tunnel junction and implement pulsed double electron spin resonance. Using these methods, we demonstrate fast single-, two-, and three-qubit operations in an all-electrical fashion. Our work marks the creation of an Angstrom-scale qubit platform, where quantum functionalities using electron spin arrays, built atom-by-atom on a surface, are now within reach.
Submitted 5 August, 2022; v1 submitted 22 August, 2021;
originally announced August 2021.
-
Exploiting Features with Split-and-Share Module
Authors:
Jaemin Lee,
Minseok Seo,
Jongchan Park,
Dong-Geol Choi
Abstract:
Deep convolutional neural networks (CNNs) have shown state-of-the-art performance in various computer vision tasks. Advances in CNN architectures have focused mainly on designing convolutional blocks of the feature extractors, but less on the classifiers that exploit extracted features. In this work, we propose the Split-and-Share Module (SSM), a classifier that splits a given feature into parts, which are partially shared by multiple sub-classifiers. Our intuition is that the more the features are shared, the more common they will become, and SSM can encourage such structural characteristics in the split features. SSM can be easily integrated into any architecture without bells and whistles. We have extensively validated the efficacy of SSM on the ImageNet-1K classification task, and SSM has shown consistent and significant improvements over baseline architectures. In addition, we analyze the effect of SSM using the Grad-CAM visualization.
Submitted 10 August, 2021; v1 submitted 10 August, 2021;
originally announced August 2021.
-
Security and Privacy Enhanced Gait Authentication with Random Representation Learning and Digital Lockers
Authors:
Lam Tran,
Thuc Nguyen,
Hyunil Kim,
Deokjai Choi
Abstract:
Gait data captured by inertial sensors have demonstrated promising results on user authentication. However, most existing approaches store the enrolled gait pattern insecurely for matching with the validating pattern, and thus pose critical security and privacy issues. In this study, we present a gait cryptosystem that generates a random key for user authentication from gait data while securing the gait pattern. First, we propose a revocable and random binary string extraction method using a deep neural network followed by feature-wise binarization. A novel loss function for network optimization is also designed, to tackle not only the intra-user stability but also the inter-user randomness. Second, we propose a new biometric key generation scheme, namely Irreversible Error Correct and Obfuscate (IECO), improved from the Error Correct and Obfuscate (ECO) scheme, to securely generate a random and irreversible key from the binary string. The model was evaluated on two benchmark datasets, OU-ISIR and whuGAIT. We showed that our model could generate a key of 139 bits from a 5-second data sequence with zero False Acceptance Rate (FAR) and False Rejection Rate (FRR) smaller than 5.441%. In addition, the security and user privacy analyses showed that our model was secure against existing attacks on biometric template protection, and fulfilled irreversibility and unlinkability.
Submitted 5 August, 2021;
originally announced August 2021.
-
Nonlinear imaging of nanoscale topological corner states
Authors:
Sergey S. Kruk,
Wenlong Gao,
Duk-Yong Choi,
Thomas Zentgraf,
Shuang Zhang,
Yuri Kivshar
Abstract:
Topological states of light represent counterintuitive optical modes localized at boundaries of finite-size optical structures that originate from the properties of the bulk. Being defined by bulk properties, such boundary states are insensitive to certain types of perturbations, thus naturally enhancing robustness of photonic circuitries. Conventionally, the N-dimensional bulk modes correspond to (N-1)-dimensional boundary states. The higher-order bulk-boundary correspondence relates N-dimensional bulk to boundary states with dimensionality reduced by more than 1. A special interest lies in miniaturization of such higher-order topological states to the nanoscale. Here, we realize nanoscale topological corner states in metasurfaces with C6-symmetric honeycomb lattices. We directly observe nanoscale topology-empowered edge and corner localizations of light and enhancement of light-matter interactions via a nonlinear imaging technique. Control of light at the nanoscale empowered by topology may facilitate miniaturization and on-chip integration of classical and quantum photonic devices.
Submitted 28 July, 2021;
originally announced July 2021.
-
A flexible design platform for Si/SiGe exchange-only qubits with low disorder
Authors:
Wonill Ha,
Sieu D. Ha,
Maxwell D. Choi,
Yan Tang,
Adele E. Schmitz,
Mark P. Levendorf,
Kangmu Lee,
James M. Chappell,
Tower S. Adams,
Daniel R. Hulbert,
Edwin Acuna,
Ramsey S. Noah,
Justine W. Matten,
Michael P. Jura,
Jeffrey A. Wright,
Matthew T. Rakher,
Matthew G. Borselli
Abstract:
Spin-based silicon quantum dots are an attractive qubit technology for quantum information processing with respect to coherence time, control, and engineering. Here we present an exchange-only Si qubit device platform that combines the throughput of CMOS-like wafer processing with the versatility of direct-write lithography. The technology, which we coin "SLEDGE," features dot-shaped gates that are patterned simultaneously on one topographical plane and subsequently connected by vias to interconnect metal lines. The process design enables non-trivial layouts as well as flexibility in gate dimensions, material selection, and additional device features such as for rf qubit control. We show that the SLEDGE process has reduced electrostatic disorder with respect to traditional overlapping gate devices with lift-off metallization, and we present spin coherent exchange oscillations and single qubit blind randomized benchmarking data.
Submitted 22 July, 2021;
originally announced July 2021.
-
Levi Graph AMR Parser using Heterogeneous Attention
Authors:
Han He,
Jinho D. Choi
Abstract:
Coupled with biaffine decoders, transformers have been effectively adapted to text-to-graph transduction and achieved state-of-the-art performance on AMR parsing. Many prior works, however, rely on the biaffine decoder for either or both arc and label predictions although most features used by the decoder may be learned by the transformer already. This paper presents a novel approach to AMR parsing by combining heterogeneous data (tokens, concepts, labels) as one input to a transformer to learn attention, and using only attention matrices from the transformer to predict all elements in AMR graphs (concepts, arcs, labels). Although our models use significantly fewer parameters than the previous state-of-the-art graph parser, they show similar or better accuracy on AMR 2.0 and 3.0.
Submitted 8 July, 2021;
originally announced July 2021.
-
Maintaining a Reliable World Model using Action-aware Perceptual Anchoring
Authors:
Ying Siu Liang,
Dongkyu Choi,
Kenneth Kwok
Abstract:
Reliable perception is essential for robots that interact with the world. But sensors alone are often insufficient to provide this capability, and they are prone to errors due to various conditions in the environment. Furthermore, there is a need for robots to maintain a model of their surroundings even when objects go out of view and are no longer visible. This requires anchoring perceptual information onto symbols that represent the objects in the environment. In this paper, we present a model for action-aware perceptual anchoring that enables robots to track objects in a persistent manner. Our rule-based approach considers inductive biases to perform high-level reasoning over the results from low-level object detection, and it improves the robot's perceptual capability for complex tasks. We evaluate our model against existing baseline models for object permanence and show that it outperforms these on a snitch localisation task using a dataset of 1,371 videos. We also integrate our action-aware perceptual anchoring in the context of a cognitive architecture and demonstrate its benefits in a realistic gearbox assembly task on a Universal Robot.
Submitted 7 July, 2021;
originally announced July 2021.
-
Pool of Experts: Realtime Querying Specialized Knowledge in Massive Neural Networks
Authors:
Hakbin Kim,
Dong-Wan Choi
Abstract:
In spite of the great success of deep learning technologies, training and delivery of a practically serviceable model is still a highly time-consuming process. Furthermore, a resulting model is usually too generic and heavyweight, and hence essentially goes through another expensive model compression phase to fit in a resource-limited device like embedded systems. Inspired by the fact that a machine learning task specifically requested by mobile users is often much simpler than what a massive generic model supports, this paper proposes a framework, called Pool of Experts (PoE), that instantly builds a lightweight and task-specific model without any training process. For a realtime model querying service, PoE first extracts a pool of primitive components, called experts, from a well-trained and sufficiently generic network by exploiting a novel conditional knowledge distillation method, and then performs our train-free knowledge consolidation to quickly combine necessary experts into a lightweight network for a target task. Thanks to this train-free property, in our thorough empirical study, PoE can build a fairly accurate yet compact model in a realtime manner, whereas it takes a few minutes per query for the other training methods to achieve a similar level of accuracy.
Submitted 3 July, 2021;
originally announced July 2021.
-
Split-and-Bridge: Adaptable Class Incremental Learning within a Single Neural Network
Authors:
Jong-Yeong Kim,
Dong-Wan Choi
Abstract:
Continual learning has been a major problem in the deep learning community, where the main challenge is how to effectively learn a series of newly arriving tasks without forgetting the knowledge of previous tasks. Initiated by Learning without Forgetting (LwF), many of the existing works report that knowledge distillation is effective to preserve the previous knowledge, and hence they commonly use a soft label for the old task, namely a knowledge distillation (KD) loss, together with a class label for the new task, namely a cross entropy (CE) loss, to form a composite loss for a single neural network. However, this approach suffers from learning the knowledge by a CE loss as a KD loss often more strongly influences the objective function when they are in a competitive situation within a single network. This could be a critical problem particularly in a class incremental scenario, where the knowledge across tasks as well as within the new task, both of which can only be acquired by a CE loss, is essentially learned due to the existence of a unified classifier. In this paper, we propose a novel continual learning method, called Split-and-Bridge, which can successfully address the above problem by partially splitting a neural network into two partitions for training the new task separated from the old task and re-connecting them for learning the knowledge across tasks. In our thorough experimental analysis, our Split-and-Bridge method outperforms the state-of-the-art competitors in KD-based continual learning.
Submitted 3 July, 2021;
originally announced July 2021.
-
New Estimands for Experiments with Strong Interference
Authors:
David Choi
Abstract:
In experiments that study social phenomena, such as peer influence or herd immunity, the treatment of one unit may influence the outcomes of others. Such "interference between units" violates traditional approaches for causal inference, so that additional assumptions are often imposed to model or limit the underlying social mechanism. For binary outcomes, we propose new estimands that can be estimated without such assumptions, allowing for interval estimates assuming only the randomization of treatment. However, the causal implications of these estimands are more limited than those attainable under stronger assumptions, showing only that the treatment effects under the observed assignment varied systematically as a function of each unit's direct and indirect exposure, while also lower bounding the number of units affected.
Submitted 29 August, 2023; v1 submitted 1 July, 2021;
originally announced July 2021.
-
TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration
Authors:
Dongjin Choi,
Sara Evensen,
Çağatay Demiralp,
Estevam Hruschka
Abstract:
Despite rapid developments in the field of machine learning research, collecting high-quality labels for supervised learning remains a bottleneck for many applications. This difficulty is exacerbated by the fact that state-of-the-art models for NLP tasks are becoming deeper and more complex, often increasing the amount of training data required even for fine-tuning. Weak supervision methods, including data programming, address this problem and reduce the cost of label collection by using noisy label sources for supervision. However, until recently, data programming was only accessible to users who knew how to program. To bridge this gap, the Data Programming by Demonstration framework was proposed to facilitate the automatic creation of labeling functions based on a few examples labeled by a domain expert. This framework has proven successful for generating high-accuracy labeling models for document classification. In this work, we extend the DPBD framework to span-level annotation tasks, arguably one of the most time-consuming NLP labeling tasks. We built a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming and encourages them to explore trade-offs between different labeling models and active learning strategies. We empirically demonstrated that an annotator could achieve a higher F1 score using the proposed tool compared to manual labeling for different span-level annotation tasks.
Submitted 24 June, 2021;
originally announced June 2021.
-
Non-archimedean Sendov's Conjecture
Authors:
Daebeom Choi,
Seewoo Lee
Abstract:
We prove a non-archimedean analogue of Sendov's conjecture. We also provide a complete list of polynomials over an algebraically closed non-archimedean field $K$ that satisfy the optimal bound in Sendov's conjecture.
Submitted 4 July, 2021; v1 submitted 21 June, 2021;
originally announced June 2021.
-
Measuring the repertoire of age-related behavioral changes in Drosophila melanogaster
Authors:
Katherine E. Overman,
Daniel M. Choi,
Kawai Leung,
Joshua W. Shaevitz,
Gordon J. Berman
Abstract:
Aging affects almost all aspects of an organism -- its morphology, its physiology, its behavior. Isolating which biological mechanisms are regulating these changes, however, has proven difficult, potentially due to our inability to characterize the full repertoire of an animal's behavior across the lifespan. Using data from fruit flies (D. melanogaster) we measure the full repertoire of behaviors as a function of age. We observe a sexually dimorphic pattern of changes in the behavioral repertoire during aging. Although the stereotypy of the behaviors and the complexity of the repertoire overall remain relatively unchanged, we find evidence that the observed alterations in behavior can be explained by changing the fly's overall energy budget, suggesting potential connections between metabolism, aging, and behavior.
Submitted 15 June, 2021; v1 submitted 14 June, 2021;
originally announced June 2021.
-
Semantic-aware Binary Code Representation with BERT
Authors:
Hyungjoon Koo,
Soyeon Park,
Daejin Choi,
Taesoo Kim
Abstract:
A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovery of contextual meanings from binary code. Recently, binary analysis techniques based on machine learning have been proposed to automatically reconstruct the code representation of a binary instead of manually crafting specifics of the analysis algorithm. However, the existing approaches utilizing machine learning are still specialized to solve one domain of problems, requiring models to be recreated for different types of binary analysis. In this paper, we propose DeepSemantic, which utilizes BERT to produce a semantic-aware code representation of binary code.
To this end, we introduce well-balanced instruction normalization that holds rich information for each instruction while minimizing the out-of-vocabulary (OOV) problem. DeepSemantic has been carefully designed based on our study with large swaths of binaries. Besides, DeepSemantic leverages the essence of the BERT architecture by re-purposing a pre-trained generic model that is readily available as a one-time processing, followed by quickly applying specific downstream tasks with a fine-tuning process. We demonstrate DeepSemantic with two downstream tasks, namely, binary similarity comparison and compiler provenance (i.e., compiler and optimization level) prediction. Our experimental results show that the binary similarity model outperforms two state-of-the-art binary similarity tools, DeepBinDiff and SAFE, by 49.84% and 15.83% on average, respectively.
Submitted 9 June, 2021;
originally announced June 2021.
-
View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data
Authors:
Payam Karisani,
Jinho D. Choi,
Li Xiong
Abstract:
We present an algorithm based on multi-layer transformers for identifying Adverse Drug Reactions (ADR) in social media data. Our model relies on the properties of the problem and the characteristics of contextual word embeddings to extract two views from documents. Then a classifier is trained on each view to label a set of unlabeled documents to be used as an initializer for a new classifier in the other view. Finally, the initialized classifier in each view is further trained using the initial training examples. We evaluated our model on the largest publicly available ADR dataset. The experiments show that our model significantly outperforms the transformer-based models pretrained on domain-specific data.
Submitted 24 May, 2021;
originally announced May 2021.
-
OutFlip: Generating Out-of-Domain Samples for Unknown Intent Detection with Natural Language Attack
Authors:
DongHyun Choi,
Myeong Cheol Shin,
EungGyun Kim,
Dong Ryeol Shin
Abstract:
Out-of-domain (OOD) input detection is vital in a task-oriented dialogue system since the acceptance of unsupported inputs could lead to an incorrect response of the system. This paper proposes OutFlip, a method to automatically generate out-of-domain samples using only the in-domain training dataset. A white-box natural language attack method, HotFlip, is revised to generate out-of-domain samples instead of adversarial examples. Our evaluation results showed that integrating OutFlip-generated out-of-domain samples into the training dataset could significantly improve an intent classification model's out-of-domain detection performance.
Submitted 12 May, 2021;
originally announced May 2021.
-
Balancing weights for region-level analysis: the effect of Medicaid Expansion on the uninsurance rate among states that did not expand Medicaid
Authors:
Max Rubinstein,
Amelia Haviland,
David Choi
Abstract:
We predict the average effect of Medicaid expansion on the non-elderly adult uninsurance rate among states that did not expand Medicaid in 2014 as if they had expanded their Medicaid eligibility requirements. Using American Community Survey data aggregated to the region level, we estimate this effect by finding weights that approximately reweight the expansion regions to match the covariate distribution of the non-expansion regions. Existing methods to estimate balancing weights often assume that the covariates are measured without error and do not account for dependencies in the outcome model. Our covariates have random noise that is uncorrelated with the outcome errors and our outcome model has state-level random effects inducing dependence between regions. To correct for the bias induced by the measurement error, we propose generating our weights on a linear approximation to the true covariates, using an idea from measurement error literature known as "regression-calibration" (see, e.g., Carroll (2006)). This requires auxiliary data to estimate the variability of the measurement error. We also modify the Stable Balancing Weights objective proposed by Zubizarreta (2015) to reduce the variance of our estimator when the model errors follow our assumed correlation structure. We show that these approaches outperform existing methods when attempting to predict observed outcomes during the pre-treatment period. Using this method we estimate that Medicaid expansion would have caused a -2.33 (-3.54, -1.11) percentage point change in the adult uninsurance rate among states that did not expand Medicaid.
Submitted 23 May, 2022; v1 submitted 5 May, 2021;
originally announced May 2021.
-
Enhancing Cognitive Models of Emotions with Representation Learning
Authors:
Yuting Guo,
Jinho Choi
Abstract:
We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions that can be used to computationally describe psychological models of emotions. Our framework integrates a contextualized embedding encoder with a multi-head probing model that enables the interpretation of dynamically learned representations optimized for an emotion classification task. Our model is evaluated on the Empathetic Dialogue dataset and achieves state-of-the-art results for classifying 32 emotions. Our layer analysis can derive an emotion graph to depict hierarchical relations among the emotions. Our emotion representations can be used to generate an emotion wheel directly comparable to the one from Plutchik's model, and also augment the values of missing emotions in the PAD emotional state model.
Submitted 20 April, 2021;
originally announced April 2021.
-
Evaluation of Unsupervised Entity and Event Salience Estimation
Authors:
Jiaying Lu,
Jinho D. Choi
Abstract:
Salience Estimation aims to predict term importance in documents. Due to few existing human-annotated datasets and the subjective notion of salience, previous studies typically generate pseudo-ground truth for evaluation. However, our investigation reveals that the evaluation protocol proposed by prior work is difficult to replicate, thus leading to few follow-up studies existing. Moreover, the ev…
▽ More
Salience Estimation aims to predict term importance in documents. Due to few existing human-annotated datasets and the subjective notion of salience, previous studies typically generate pseudo-ground truth for evaluation. However, our investigation reveals that the evaluation protocol proposed by prior work is difficult to replicate, thus leading to few follow-up studies existing. Moreover, the evaluation process is problematic: the entity linking tool used for entity matching is very noisy, while the ignorance of event argument for event evaluation leads to boosted performance. In this work, we propose a light yet practical entity and event salience estimation evaluation protocol, which incorporates the more reliable syntactic dependency parser. Furthermore, we conduct a comprehensive analysis among popular entity and event definition standards, and present our own definition for the Salience Estimation task to reduce noise during the pseudo-ground truth generation process. Furthermore, we construct dependency-based heterogeneous graphs to capture the interactions of entities and events. The empirical results show that both baseline methods and the novel GNN method utilizing the heterogeneous graph consistently outperform the previous SOTA model in all proposed metrics.
Submitted 14 April, 2021;
originally announced April 2021.
-
Atomic Manipulation of In-gap States on the $β$-Bi$_2$Pd Superconductor
Authors:
Cristina Mier,
Jiyoon Hwang,
Jinkyung Kim,
Yujeong Bae,
Fuyuki Nabeshima,
Yoshinori Imai,
Atsutaka Maeda,
Nicolás Lorente,
Andreas Heinrich,
Deung-Jang Choi
Abstract:
Electronic states in the gap of a superconductor inherit intriguing many-body properties from the superconductor. Here, we create these in-gap states by manipulating Cr atomic chains on the $β$-Bi$_2$Pd superconductor. We find that the topological properties of the in-gap states can greatly vary depending on the crafted spin chain. These systems make an ideal platform for non-trivial topological phases because of the large atom-superconductor interactions and the existence of a large Rashba coupling at the Bi-terminated surface. We study two spin chains, one with atoms two-lattice-parameter apart and one with square-root-of-two lattice parameters. Of these, only the second one is in a topologically non-trivial phase, in correspondence with the spin interactions for this geometry.
Submitted 6 May, 2021; v1 submitted 13 April, 2021;
originally announced April 2021.
-
Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning
Authors:
Sangmin Lee,
Hak Gu Kim,
Dae Hwi Choi,
Hyung-Il Kim,
Yong Man Ro
Abstract:
Our work addresses long-term motion context issues for predicting future frames. To predict the future precisely, it is necessary to capture which long-term motion context (e.g., walking or running) the input motion (e.g., leg movement) belongs to. The bottlenecks arising when dealing with the long-term motion context are: (i) how to predict the long-term motion context naturally matching input sequences with limited dynamics, (ii) how to predict the long-term motion context with high-dimensionality (e.g., complex motion). To address the issues, we propose novel motion context-aware video prediction. To solve bottleneck (i), we introduce a long-term motion context memory (LMC-Memory) with memory alignment learning. The proposed memory alignment learning makes it possible to store long-term motion contexts in the memory and to match them with sequences including limited dynamics. As a result, the long-term context can be recalled from the limited input sequence. In addition, to resolve bottleneck (ii), we propose memory query decomposition to store local motion context (i.e., low-dimensional dynamics) and recall the suitable local context for each local part of the input individually. This boosts the alignment effects of the memory. Experimental results show that the proposed method outperforms other sophisticated RNN-based methods, especially in the long-term condition. Further, we validate the effectiveness of the proposed network designs by conducting ablation studies and memory feature analysis. The source code of this work is available.
Submitted 2 April, 2021;
originally announced April 2021.
-
Multiplicity one bound for cohomological automorphic representations with a fixed level
Authors:
Dohoon Choi
Abstract:
Let $F$ be a totally real field, and $\mathbb{A}_F$ be the adele ring of $F$. Let us fix $N$ to be a positive integer. Let $π_1=\otimesπ_{1,v}$ and $π_2=\otimesπ_{2,v}$ be distinct cohomological cuspidal automorphic representations of $\mathrm{GL}_n(\mathbb{A}_{F})$ with levels less than or equal to $N$.
Let $\mathcal{N}(π_1,π_2)$ be the minimum of the absolute norm of $v \nmid \infty$ such that $π_{1,v} \not \simeq π_{2,v}$ and that $π_{1,v}$ and $π_{2,v}$ are unramified. We prove that there exists a constant $C_N$ such that for every pair $π_1$ and $π_2$, $$\mathcal{N}(π_1,π_2) \leq C_N.$$ This improves known bounds $$ \mathcal{N}(π_1,π_2)=O(Q^A) \;\;\; (\text{some } A \text{ depending only on } n), $$ where $Q$ is the maximum of the analytic conductors of $π_1$ and $π_2$.
This result applies to newforms on $Γ_1(N)$. In particular, assume that $f_1$ and $f_2$ are Hecke eigenforms of weight $k_1$ and $k_2$ on $\mathrm{SL}_2(\mathbb{Z})$, respectively. We prove that if for all $p \in \{2,7\}$, $$λ_{f_1}(p)/\sqrt{p}^{(k_1-1)} = λ_{f_2}(p)/\sqrt{p}^{(k_2-1)},$$ then $f_1=cf_2$ for some constant $c$. Here, for each prime $p$, $λ_{f_i}(p)$ denotes the $p$-th Hecke eigenvalue of $f_i$.
Submitted 12 March, 2022; v1 submitted 23 March, 2021;
originally announced March 2021.
-
Spin Resonance Amplitude and Frequency of a Single Atom on a Surface in a Vector Magnetic Field
Authors:
Jinkyung Kim,
Won-jun Jang,
Thi Hong Bui,
Deung-jang Choi,
Christoph Wolf,
Fernando Delgado,
Denis Krylov,
Soonhyeong Lee,
Sangwon Yoon,
Christopher P. Lutz,
Andreas J. Heinrich,
Yujeong Bae
Abstract:
We used electron spin resonance (ESR) combined with scanning tunneling microscopy (STM) to measure hydrogenated Ti (spin-1/2) atoms at low-symmetry binding sites on MgO in vector magnetic fields. We found strongly anisotropic g-values in all three spatial directions. Interestingly, the amplitude and lineshape of the ESR signals are also strongly dependent on the angle of the field. We conclude that the Ti spin is aligned along the magnetic field, while the tip spin follows its strong magnetic anisotropy. Our results show the interplay between the tip and surface spins in determining the ESR signals and highlight the precision of ESR-STM to identify the single atom's spin states.
△ Less
Submitted 17 March, 2021;
originally announced March 2021.
-
Putting Humans in the Natural Language Processing Loop: A Survey
Authors:
Zijie J. Wang,
Dongjin Choi,
Shenyu Xu,
Diyi Yang
Abstract:
How can we design Natural Language Processing (NLP) systems that learn from human feedback? There is a growing research body of Human-in-the-loop (HITL) NLP frameworks that continuously integrate human feedback to improve the model itself. HITL NLP research is nascent but multifarious -- solving various NLP problems, collecting diverse feedback from different people, and applying different methods to learn from collected feedback. We present a survey of HITL NLP work from both Machine Learning (ML) and Human-Computer Interaction (HCI) communities that highlights its short yet inspiring history, and thoroughly summarize recent frameworks focusing on their tasks, goals, human interactions, and feedback learning methods. Finally, we discuss future directions for integrating human feedback in the NLP development loop.
△ Less
Submitted 6 March, 2021;
originally announced March 2021.
-
Run Your Visual-Inertial Odometry on NVIDIA Jetson: Benchmark Tests on a Micro Aerial Vehicle
Authors:
Jinwoo Jeon,
Sungwook Jung,
Eungchang Lee,
Duckyu Choi,
Hyun Myung
Abstract:
This paper presents benchmark tests of various visual(-inertial) odometry algorithms on NVIDIA Jetson platforms. The compared algorithms include mono and stereo, covering Visual Odometry (VO) and Visual-Inertial Odometry (VIO): VINS-Mono, VINS-Fusion, Kimera, ALVIO, Stereo-MSCKF, ORB-SLAM2 stereo, and ROVIO. As these methods are mainly used for unmanned aerial vehicles (UAVs), they must perform well in situations where the size of the processing board and weight is limited. Jetson boards released by NVIDIA satisfy these constraints as they have a sufficiently powerful central processing unit (CPU) and graphics processing unit (GPU) for image processing. However, in existing studies, the performance of Jetson boards as a processing platform for executing VO/VIO has not been compared extensively in terms of the usage of computing resources and accuracy. Therefore, this study compares representative VO/VIO algorithms on several NVIDIA Jetson platforms, namely NVIDIA Jetson TX2, Xavier NX, and AGX Xavier, and introduces a novel dataset 'KAIST VIO dataset' for UAVs. Including pure rotations, the dataset has several geometric trajectories that are harsh to visual(-inertial) state estimation. The evaluation is performed in terms of the accuracy of estimated odometry, CPU usage, and memory usage on various Jetson boards, algorithms, and trajectories. We present the results of the comprehensive benchmark test and release the dataset for the computer vision and robotics applications.
△ Less
Submitted 2 March, 2021;
originally announced March 2021.
-
Peacock Exploration: A Lightweight Exploration for UAV using Control-Efficient Trajectory
Authors:
EungChang Mason Lee,
Duckyu Choi,
Hyun Myung
Abstract:
Unmanned Aerial Vehicles (UAVs) have received much attention in recent years due to their wide range of applications, such as exploration of an unknown environment to acquire a 3D map without prior knowledge of it. Existing exploration methods have been largely challenged by computationally heavy probabilistic path planning. Similarly, kinodynamic constraints or proper sensors considering the payload for UAVs were not considered. In this paper, to solve those issues and to consider the limited payload and computational resource of UAVs, we propose "Peacock Exploration": A lightweight exploration method for UAVs using precomputed minimum snap trajectories which look like a peacock's tail. Using the widely known, control-efficient minimum snap trajectories and OctoMap, the UAV equipped with an RGB-D camera can explore unknown 3D environments without any prior knowledge or human guidance with only O(log N) computational complexity. It also adopts the receding horizon approach and simple, heuristic scoring criteria. The proposed algorithm's performance is demonstrated by exploring a challenging 3D maze environment and compared with a state-of-the-art algorithm.
△ Less
Submitted 29 December, 2020;
originally announced December 2020.
-
Creating a Physicist: The Impact of Informal Programs on University Student Development
Authors:
Callie Rethman,
Jonathan Perry,
Jonan Donaldson,
Daniel Choi,
Tatiana Erukhimova
Abstract:
Physics outreach programs provide a critical context for informal experiences that promote the transition from new student to contributing physicist. Prior studies have suggested a positive link between participation in informal physics outreach programs and the development of a student's physics identity. In this study, we adopt a student-focused investigation to explore the effects of informal programs on dimensions of physics identity, sense of community, 21st century skill development, and motivation. We employed a mixed methods study combining a survey instrument (117 responses) and interviews (35) with current and former undergraduate and graduate students who participated in five programs through a physics and astronomy department at a large land-grant university. To examine interviews, we employed a framework based on situated learning theory, transformative learning theory, and the Dynamic Systems Model of Role Identity. Our findings, based on self-reported data, show that students who facilitated informal physics programs positively developed their physics identity, experienced increased sense of belonging to the physics community, and developed 21st century career skills. Specifically, students reported positive benefits to their communication, teamwork and networking, and design skills. The benefits of these programs can be achieved by departments of any size without significant commitment of funds or changes to curriculum.
△ Less
Submitted 29 May, 2021; v1 submitted 27 December, 2020;
originally announced December 2020.
-
Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems
Authors:
Hyungjun Park,
Daiki Min,
Jong-hyun Ryu,
Dong Gu Choi
Abstract:
Typical reinforcement learning (RL) methods show limited applicability for real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. This algorithm has two main features. First, we devise two distance-based Q-value update schemes, incentive update and penalty update, in a distance-based incentive/penalty update technique to enable the agent to decide discrete and continuous actions in the feasible region and to update the value of these types of actions. Second, we propose a method for defining the penalty cost as a shadow price-weighted penalty. This approach affords two advantages compared to previous methods to efficiently induce the agent to not select an infeasible action. We apply our algorithm to an industrial control problem, microgrid system operation, and the experimental results demonstrate its superiority.
△ Less
Submitted 19 May, 2021; v1 submitted 21 November, 2020;
originally announced November 2020.
-
Nonlinear imaging of nanoscale topological corner states
Authors:
Sergey Kruk,
Wenlong Gao,
Duk Yong Choi,
Thomas Zentgraf,
Shuang Zhang,
Yuri Kivshar
Abstract:
Topological states of light represent counterintuitive optical modes localized at boundaries of finite-size optical structures that originate from the properties of the bulk. Being defined by bulk properties, such boundary states are insensitive to certain types of perturbations, thus naturally enhancing robustness of photonic circuitries. Conventionally, the N-dimensional bulk modes correspond to (N-1)-dimensional boundary states. The higher-order bulk-boundary correspondence relates N-dimensional bulk to boundary states with dimensionality reduced by more than 1. A special interest lies in miniaturization of such higher-order topological states to the nanoscale. Here, we realize nanoscale topological corner states in metasurfaces with C6-symmetric honeycomb lattices. We directly observe nanoscale topology-empowered edge and corner localizations of light and enhancement of light-matter interactions via a nonlinear imaging technique. Control of light at the nanoscale empowered by topology may facilitate miniaturization and on-chip integration of classical and quantum photonic devices.
△ Less
Submitted 1 September, 2022; v1 submitted 19 November, 2020;
originally announced November 2020.
-
Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
Authors:
Ricky T. Q. Chen,
Dami Choi,
Lukas Balles,
David Duvenaud,
Philipp Hennig
Abstract:
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers and ultimately, this is an interesting step for constructing self-tuning optimizers.
△ Less
Submitted 9 November, 2020;
originally announced November 2020.
-
Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models
Authors:
Changmao Li,
Elaine Fisher,
Rebecca Thomas,
Steve Pittard,
Vicki Hertzberg,
Jinho D. Choi
Abstract:
This paper presents a comprehensive study on resume classification to significantly reduce the time and labor needed to screen an overwhelming number of applications, while improving the selection of suitable candidates. A total of 6,492 resumes are extracted from 24,933 job applications for 252 positions designated into four levels of experience for Clinical Research Coordinators (CRC). Each resume is manually annotated to its most appropriate CRC position by experts through several rounds of triple annotation to establish guidelines. As a result, a high Kappa score of 61% is achieved for inter-annotator agreement. Given this dataset, novel transformer-based classification models are developed for two tasks: the first task takes a resume and classifies it to a CRC level (T1), and the second task takes both a resume and a job description to apply and predicts if the application is suited to the job (T2). Our best models using section encoding and multi-head attention decoding give results of 73.3% for T1 and 79.2% for T2. Our analysis shows that the prediction errors are mostly made among adjacent CRC levels, which are hard for even experts to distinguish, implying the practical value of our models in real HR platforms.
△ Less
Submitted 5 November, 2020;
originally announced November 2020.
-
Extracting Chemical-Protein Interactions via Calibrated Deep Neural Network and Self-training
Authors:
Dongha Choi,
Hyunju Lee
Abstract:
The extraction of interactions between chemicals and proteins from biomedical articles is important in many fields of biomedical research such as drug development and prediction of drug side effects. Several natural language processing methods, including deep neural network (DNN) models, have been applied to address this problem. However, these methods were trained with hard-labeled data and tend to become over-confident, leading to degradation of the model reliability. To estimate the data uncertainty and improve the reliability, "calibration" techniques have been applied to deep learning models. In this study, to extract chemical-protein interactions, we propose a DNN-based approach incorporating uncertainty information and calibration techniques. Our model first encodes the input sequence using a pre-trained language-understanding model, following which it is trained using two calibration methods: mixup training and addition of a confidence penalty loss. Finally, the model is re-trained with augmented data that are extracted using the estimated uncertainties. Our approach has achieved state-of-the-art performance with regard to the Biocreative VI ChemProt task, while preserving higher calibration abilities than those of previous approaches. Furthermore, our approach also presents the possibilities of using uncertainty estimation for performance improvement.
△ Less
Submitted 4 November, 2020;
originally announced November 2020.