doi 10.1126/science.abq1158

Competition-Level Code Generation with AlphaCode

Authors: Yujia Li, David Choi, Junyoung Chung, Nate Kushman, Julian Schrittwieser, Rémi Leblond, Tom Eccles, James Keeling, Felix Gimeno, Agustin Dal Lago, Thomas Hubert, Peter Choy, Cyprien de Masson d'Autume, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu, et al. (1 additional author not shown)

Abstract: Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more productive and accessible, yet so far incorporating innovations in AI has proven challenging. Recent large-scale language models have demonstrated an impressive ability to generate code, and are now able to complete simple programming tasks. However, these models still perform poorly when evaluated on more complex, unseen problems that require problem-solving skills beyond simply translating instructions into code. For example, competitive programming problems which require an understanding of algorithms and complex natural language remain extremely challenging. To address this gap, we introduce AlphaCode, a system for code generation that can create novel solutions to these problems that require deeper reasoning. In simulated evaluations on recent programming competitions on the Codeforces platform, AlphaCode achieved on average a ranking of top 54.3% in competitions with more than 5,000 participants. We found that three key components were critical to achieve good and reliable performance: (1) an extensive and clean competitive programming dataset for training and evaluation, (2) large and efficient-to-sample transformer-based architectures, and (3) large-scale model sampling to explore the search space, followed by filtering based on program behavior to a small set of submissions.

Submitted 8 February, 2022; originally announced March 2022.

Comments: 74 pages

arXiv:2203.05934 [pdf, other]

doi 10.1103/PhysRevResearch.4.L032010

Moiré dispersion of edge states in spin chains on superconductors

Authors: Cristina Mier, Deung-Jang Choi, Nicolás Lorente

Abstract: Our calculations of ferromagnetic spin chains on s-wave superconductors show that the energy oscillations of edge states with the chain's length are due to a moiré pattern emerging from Friedel-like oscillations and the discreteness of the spin-chain lattice. By modifying the spin lattice, the moiré dispersion of edge states can be controlled. In particular, we can engineer non-dispersive edge states that remain at fixed energy regardless of the size distribution of the spin chains. This is an important step in the study of edge states of spin chains that can be fabricated with a certain size dispersion.

Submitted 11 March, 2022; originally announced March 2022.

Journal ref: Phys. Rev. Research 4, L032010 (2022)

arXiv:2203.01426 [pdf, other]

SPICEprop: Backpropagating Errors Through Memristive Spiking Neural Networks

Authors: Peng Zhou, Jason K. Eshraghian, Dong-Uk Choi, Sung-Mo Kang

Abstract: We present a fully memristive spiking neural network (MSNN) consisting of novel memristive neurons trained using the backpropagation through time (BPTT) learning rule. Gradient descent is applied directly to the memristive integrate-and-fire (MIF) neuron designed using analog SPICE circuit models, which generates distinct depolarization, hyperpolarization, and repolarization voltage waveforms. Synaptic weights are trained by BPTT using the membrane potential of the MIF neuron model and can be processed on memristive crossbars. The natural spiking dynamics of the MIF neuron model are fully differentiable, eliminating the need for gradient approximations that are prevalent in the spiking neural network literature. Despite the added complexity of training directly on SPICE circuit models, we achieve 97.58% accuracy on the MNIST testing dataset and 75.26% on the Fashion-MNIST testing dataset, the highest accuracies among all fully MSNNs.

Submitted 9 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

arXiv:2203.01416 [pdf, other]

A Fully Memristive Spiking Neural Network with Unsupervised Learning

Authors: Peng Zhou, Dong-Uk Choi, Jason K. Eshraghian, Sung-Mo Kang

Abstract: We present a fully memristive spiking neural network (MSNN) consisting of physically-realizable memristive neurons and memristive synapses to implement an unsupervised Spiking Time Dependent Plasticity (STDP) learning rule. The system is fully memristive in that both neuronal and synaptic dynamics can be realized by using memristors. The neuron is implemented using the SPICE-level memristive integrate-and-fire (MIF) model, which consists of a minimal number of circuit elements necessary to achieve distinct depolarization, hyperpolarization, and repolarization voltage waveforms. The proposed MSNN uniquely implements STDP learning by using cumulative weight changes in memristive synapses from the voltage waveform changes across the synapses, which arise from the presynaptic and postsynaptic spiking voltage signals during the training process. Two types of MSNN architectures are investigated: 1) a biologically plausible memory retrieval system, and 2) a multi-class classification system. Our circuit simulation results verify the MSNN's unsupervised learning efficacy by replicating biological memory retrieval mechanisms, and achieving 97.5% accuracy in a 4-pattern recognition problem in a large scale discriminative MSNN.

Submitted 9 March, 2022; v1 submitted 2 March, 2022; originally announced March 2022.

arXiv:2201.07459 [pdf, other]

PT4AL: Using Self-Supervised Pretext Tasks for Active Learning

Authors: John Seon Keun Yi, Minseok Seo, Jongchan Park, Dong-Geol Choi

Abstract: Labeling a large set of data is expensive. Active learning aims to tackle this problem by asking to annotate only the most informative data from the unlabeled set. We propose a novel active learning approach that utilizes self-supervised pretext tasks and a unique data sampler to select data that are both difficult and representative. We discover that the loss of a simple self-supervised pretext task, such as rotation prediction, is closely correlated to the downstream task loss. Before the active learning iterations, the pretext task learner is trained on the unlabeled set, and the unlabeled data are sorted and split into batches by their pretext task losses. In each active learning iteration, the main task model is used to sample the most uncertain data in a batch to be annotated. We evaluate our method on various image classification and segmentation benchmarks and achieve compelling performances on CIFAR10, Caltech-101, ImageNet, and Cityscapes. We further show that our method performs well on imbalanced datasets, and can be an effective solution to the cold-start problem where active learning performance is affected by the randomly sampled initial labeled set.

Submitted 26 July, 2022; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: Code is available at https://github.com/johnsk95/PT4AL ; updated for ECCV 2022 submission

arXiv:2112.02721 [pdf, other]

NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation

Authors: Kaustubh D. Dhole, Varun Gangal, Sebastian Gehrmann, Aadesh Gupta, Zhenhao Li, Saad Mahamood, Abinaya Mahendiran, Simon Mille, Ashish Shrivastava, Samson Tan, Tongshuang Wu, Jascha Sohl-Dickstein, Jinho D. Choi, Eduard Hovy, Ondrej Dusek, Sebastian Ruder, Sajant Anand, Nagender Aneja, Rabin Banjade, Lisa Barthe, Hanna Behnke, Ian Berlot-Attwell, Connor Boyle, Caroline Brun, Marco Antonio Sobrevilla Cabezudo, et al. (101 additional authors not shown)

Abstract: Data augmentation is an important component in the robustness evaluation of models in natural language processing (NLP) and in enhancing the diversity of the data they are trained on. In this paper, we present NL-Augmenter, a new participatory Python-based natural language augmentation framework which supports the creation of both transformations (modifications to the data) and filters (data splits according to specific features). We describe the framework and an initial set of 117 transformations and 23 filters for a variety of natural language tasks. We demonstrate the efficacy of NL-Augmenter by using several of its transformations to analyze the robustness of popular natural language models. The infrastructure, datacards and robustness analysis results are available publicly on the NL-Augmenter repository (https://github.com/GEM-benchmark/NL-Augmenter).

Submitted 11 October, 2022; v1 submitted 5 December, 2021; originally announced December 2021.

Comments: 39 pages, repository at https://github.com/GEM-benchmark/NL-Augmenter

arXiv:2112.02026 [pdf, other]

doi 10.3847/1538-4365/ac4414

The Seventeenth Data Release of the Sloan Digital Sky Surveys: Complete Release of MaNGA, MaStar and APOGEE-2 Data

Authors: Abdurro'uf, Katherine Accetta, Conny Aerts, Victor Silva Aguirre, Romina Ahumada, Nikhil Ajgaonkar, N. Filiz Ak, Shadab Alam, Carlos Allende Prieto, Andres Almeida, Friedrich Anders, Scott F. Anderson, Brett H. Andrews, Borja Anguiano, Erik Aquino-Ortiz, Alfonso Aragon-Salamanca, Maria Argudo-Fernandez, Metin Ata, Marie Aubert, Vladimir Avila-Reese, Carles Badenes, Rodolfo H. Barba, Kat Barger, Jorge K. Barrera-Ballesteros, Rachael L. Beaton, et al. (316 additional authors not shown)

Abstract: This paper documents the seventeenth data release (DR17) from the Sloan Digital Sky Surveys; the fifth and final release from the fourth phase (SDSS-IV). DR17 contains the complete release of the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, which reached its goal of surveying over 10,000 nearby galaxies. The complete release of the MaNGA Stellar Library (MaStar) accompanies this data, providing observations of almost 30,000 stars through the MaNGA instrument during bright time. DR17 also contains the complete release of the Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2) survey which publicly releases infra-red spectra of over 650,000 stars. The main sample from the Extended Baryon Oscillation Spectroscopic Survey (eBOSS), as well as the sub-survey Time Domain Spectroscopic Survey (TDSS) data were fully released in DR16. New single-fiber optical spectroscopy released in DR17 is from the SPectroscopic IDentification of ERosita Survey (SPIDERS) sub-survey and the eBOSS-RM program. Along with the primary data sets, DR17 includes 25 new or updated Value Added Catalogs (VACs). This paper concludes the release of SDSS-IV survey data. SDSS continues into its fifth phase with observations already underway for the Milky Way Mapper (MWM), Local Volume Mapper (LVM) and Black Hole Mapper (BHM) surveys.

Submitted 13 January, 2022; v1 submitted 3 December, 2021; originally announced December 2021.

Comments: 40 pages, 8 figures, 6 tables. In press at ApJSS (arXiv v2 corrects some minor typos and updates references)

arXiv:2112.00503 [pdf, other]

Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-sentence Dependency Graph

Authors: Liyan Xu, Xuchao Zhang, Bo Zong, Yanchi Liu, Wei Cheng, Jingchao Ni, Haifeng Chen, Liang Zhao, Jinho D. Choi

Abstract: We target the task of cross-lingual Machine Reading Comprehension (MRC) in the direct zero-shot setting, by incorporating syntactic features from Universal Dependencies (UD), and the key features we use are the syntactic relations within each sentence. While previous work has demonstrated effective syntax-guided MRC models, we propose to adopt the inter-sentence syntactic relations, in addition to the rudimentary intra-sentence relations, to further utilize the syntactic dependencies in the multi-sentence input of the MRC task. In our approach, we build the Inter-Sentence Dependency Graph (ISDG) connecting dependency trees to form global syntactic relations across sentences. We then propose the ISDG encoder that encodes the global dependency graph, addressing the inter-sentence relations via both one-hop and multi-hop dependency paths explicitly. Experiments on three multilingual MRC datasets (XQuAD, MLQA, TyDiQA-GoldP) show that our encoder that is only trained on English is able to improve the zero-shot performance on all 14 test sets covering 8 languages, with up to 3.8 F1 / 5.2 EM improvement on average, and 5.2 F1 / 11.2 EM on certain languages. Further analysis shows the improvement can be attributed to the attention on the cross-linguistically consistent syntactic path.

Submitted 15 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

Comments: Accepted to AAAI 2022

arXiv:2111.00572 [pdf, other]

What Went Wrong? Explaining Overall Dialogue Quality through Utterance-Level Impacts

Authors: James D. Finch, Sarah E. Finch, Jinho D. Choi

Abstract: Improving user experience of a dialogue system often requires intensive developer effort to read conversation logs, run statistical analyses, and intuit the relative importance of system shortcomings. This paper presents a novel approach to automated analysis of conversation logs that learns the relationship between user-system interactions and overall dialogue quality. Unlike prior work on utterance-level quality prediction, our approach learns the impact of each interaction from the overall user rating without utterance-level annotation, allowing resultant model conclusions to be derived on the basis of empirical evidence and at low cost. Our model identifies interactions that have a strong correlation with the overall dialogue quality in a chatbot setting. Experiments show that the automated analysis from our model agrees with expert judgments, making this work the first to show that such weakly-supervised learning of utterance-level quality prediction is highly achievable.

Submitted 31 October, 2021; originally announced November 2021.

Comments: Accepted at the 3rd Workshop on NLP for ConvAI

arXiv:2111.00570 [pdf, other]

An Approach to Inference-Driven Dialogue Management within a Social Chatbot

Authors: Sarah E. Finch, James D. Finch, Daniil Huryn, William Hutsell, Xiaoyuan Huang, Han He, Jinho D. Choi

Abstract: We present a chatbot implementing a novel dialogue management approach based on logical inference. Instead of framing conversation as a sequence of response generation tasks, we model conversation as a collaborative inference process in which speakers share information to synthesize new knowledge in real time. Our chatbot pipeline accomplishes this modelling in three broad stages. The first stage translates user utterances into a symbolic predicate representation. The second stage then uses this structured representation in conjunction with a larger knowledge base to synthesize new predicates using efficient graph matching. In the third and final stage, our bot selects a small subset of predicates and translates them into an English response. This approach lends itself to understanding latent semantics of user inputs, flexible initiative taking, and responses that are novel and coherent with the dialogue context.

Submitted 31 October, 2021; originally announced November 2021.

Comments: Published in 4th Proceedings of Alexa Prize (Alexa Prize 2020)

arXiv:2110.00238 [pdf, other]

Improving Object Permanence using Agent Actions and Reasoning

Authors: Ying Siu Liang, Chen Zhang, Dongkyu Choi, Kenneth Kwok

Abstract: Object permanence in psychology means knowing that objects still exist even if they are no longer visible. It is a crucial concept for robots to operate autonomously in uncontrolled environments. Existing approaches learn object permanence from low-level perception, but perform poorly on more complex scenarios, like when objects are contained and carried by others. Knowledge about manipulation actions performed on an object prior to its disappearance allows us to reason about its location, e.g., that the object has been placed in a carrier. In this paper we argue that object permanence can be improved when the robot uses knowledge about executed actions and describe an approach to infer hidden object states from agent actions. We show that considering agent actions not only improves rule-based reasoning models but also purely neural approaches, showing its general applicability. Then, we conduct quantitative experiments on a snitch localization task using a dataset of 1,371 synthesized videos, where we compare the performance of different object permanence models with and without action annotations. We demonstrate that models with action annotations can significantly increase performance of both neural and rule-based approaches. Finally, we evaluate the usability of our approach in real-world applications by conducting qualitative experiments with two Universal Robots (UR5 and UR16e) in both lab and industrial settings. The robots complete benchmark tasks for a gearbox assembly and demonstrate the object permanence capabilities with real sensor data in an industrial environment.

Submitted 1 October, 2021; originally announced October 2021.

Comments: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

arXiv:2109.09858 [pdf, other]

Intensionalizing Abstract Meaning Representations: Non-Veridicality and Scope

Authors: Gregor Williamson, Patrick Elliott, Yuxin Ji, Jinho D. Choi

Abstract: Abstract Meaning Representation (AMR) is a graphical meaning representation language designed to represent propositional information about argument structure. However, at present it is unable to satisfyingly represent non-veridical intensional contexts, often licensing inappropriate inferences. In this paper, we show how to resolve the problem of non-veridicality without appealing to layered graphs through a mapping from AMRs into Simply-Typed Lambda Calculus (STLC). At least for some cases, this requires the introduction of a new role :content which functions as an intensional operator. The translation proposed is inspired by the formal linguistics literature on the event semantics of attitude reports. Next, we address the interaction of quantifier scope and intensional operators in so-called de re/de dicto ambiguities. We adopt a scope node from the literature and provide an explicit multidimensional semantics utilizing Cooper storage which allows us to derive the de re and de dicto scope readings as well as intermediate scope readings which prove difficult for accounts without a scope node.

Submitted 20 September, 2021; originally announced September 2021.

Comments: LAW-DMR'21, 8 pages (excl. refs)

arXiv:2109.09853 [pdf, other]

StreamSide: A Fully-Customizable Open-Source Toolkit for Efficient Annotation of Meaning Representations

Authors: Jinho D. Choi, Gregor Williamson

Abstract: This demonstration paper presents StreamSide, an open-source toolkit for annotating multiple kinds of meaning representations. StreamSide supports frame-based annotation schemes, e.g., Abstract Meaning Representation (AMR), and frameless annotation schemes, e.g., Widely Interpretable Semantic Representation (WISeR). Moreover, it supports both sentence-level and document-level annotation by allowing annotators to create multi-rooted graphs for input text. It can open and automatically convert between several types of input formats including plain text, Penman notation, and its own JSON format enabling richer annotation. It features reference frames for AMR predicate argument structures, and also concept-to-text alignment. StreamSide is released under the Apache 2.0 license, and is completely open-source so that it can be customized to annotate enriched meaning representations in different languages (e.g., Uniform Meaning Representations). All StreamSide resources are publicly distributed through our open source project at: https://github.com/emorynlp/StreamSide.

Submitted 20 September, 2021; originally announced September 2021.

Comments: demo paper, 6 pages (excl. refs), 6 figures

arXiv:2109.06939 [pdf, other]

The Stem Cell Hypothesis: Dilemma behind Multi-Task Learning with Transformer Encoders

Authors: Han He, Jinho D. Choi

Abstract: Multi-task learning with transformer encoders (MTL) has emerged as a powerful technique to improve performance on closely-related tasks for both accuracy and efficiency, while a question still remains whether or not it would perform as well on tasks that are distinct in nature. We first present MTL results on five NLP tasks, POS, NER, DEP, CON, and SRL, and depict its deficiency over single-task learning. We then conduct an extensive pruning analysis to show that a certain set of attention heads get claimed by most tasks during MTL, which interfere with one another to fine-tune those heads for their own objectives. Based on this finding, we propose the Stem Cell Hypothesis to reveal the existence of attention heads naturally talented for many tasks that cannot be jointly trained to create adequate embeddings for all of those tasks. Finally, we design novel parameter-free probes to justify our hypothesis and demonstrate how attention heads are transformed across the five tasks during MTL through label analysis.

Submitted 14 September, 2021; originally announced September 2021.

Comments: Accepted to EMNLP 2021: The 2021 Conference on Empirical Methods in Natural Language Processing

arXiv:2109.03903 [pdf, other]

ELIT: Emory Language and Information Toolkit

Authors: Han He, Liyan Xu, Jinho D. Choi

Abstract: We introduce ELIT, the Emory Language and Information Toolkit, which is a comprehensive NLP framework providing transformer-based end-to-end models for core tasks with a special focus on memory efficiency while maintaining state-of-the-art accuracy and speed. Compared to existing toolkits, ELIT features an efficient Multi-Task Learning (MTL) model with many downstream tasks that include lemmatization, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic role labeling, and AMR parsing. The backbone of ELIT's MTL framework is a pre-trained transformer encoder that is shared across tasks to speed up their inference. ELIT provides pre-trained models developed on a remix of eight datasets. To scale up its service, ELIT also integrates a RESTful Client/Server combination. On the server side, ELIT extends its functionality to cover other tasks such as tokenization and coreference resolution, providing an end user with agile research experience. All resources including the source codes, documentation, and pre-trained models are publicly available at https://github.com/emorynlp/elit.

Submitted 8 September, 2021; originally announced September 2021.

arXiv:2109.00194 [pdf, other]

Boosting Cross-Lingual Transfer via Self-Learning with Uncertainty Estimation

Authors: Liyan Xu, Xuchao Zhang, Xujiang Zhao, Haifeng Chen, Feng Chen, Jinho D. Choi

Abstract: Recent multilingual pre-trained language models have achieved remarkable zero-shot performance, where the model is only finetuned on one source language and directly evaluated on target languages. In this work, we propose a self-learning framework that further utilizes unlabeled data of target languages, combined with uncertainty estimation in the process to select high-quality silver labels. Three different uncertainties are adapted and analyzed specifically for the cross-lingual transfer: Language Heteroscedastic/Homoscedastic Uncertainty (LEU/LOU), Evidential Uncertainty (EVI). We evaluate our framework with uncertainties on two cross-lingual tasks including Named Entity Recognition (NER) and Natural Language Inference (NLI) covering 40 languages in total, which outperforms the baselines significantly by 10 F1 on average for NER and 2.5 accuracy score for NLI.

Submitted 23 September, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

Comments: Accepted to EMNLP 2021

arXiv:2109.00185 [pdf, other]

Adapted End-to-End Coreference Resolution System for Anaphoric Identities in Dialogues

Authors: Liyan Xu, Jinho D. Choi

Abstract: We present an effective system adapted from the end-to-end neural coreference resolution model, targeting the task of anaphora resolution in dialogues. Three aspects are specifically addressed in our approach, including the support of singletons, encoding speakers and turns throughout dialogue interactions, and knowledge transfer utilizing existing resources. Despite the simplicity of our adaptation strategies, they are shown to bring significant impact to the final performance, with up to 27 F1 improvement over the baseline. Our final system ranks 1st on the leaderboard of the anaphora resolution track in the CRAC 2021 shared task, and achieves the best evaluation results on all four datasets.

Submitted 23 September, 2021; v1 submitted 1 September, 2021; originally announced September 2021.

Comments: Accepted to CODI-CRAC 2021

arXiv:2108.11146 [pdf, other]

doi 10.1103/PhysRevB.104.245415

Calculations of in-gap states of ferromagnetic spin chains on \textit{s}-wave wide-band superconductors

Authors: Cristina Mier, Deung-Jang Choi, Nicolás Lorente

Abstract: Magnetic impurities create in-gap states on superconductors. Recent experiments explore the topological properties of one-dimensional arrays of magnetic impurities on superconductors, because in certain regimes p-wave pairing can be locally induced leading to new topological phases. A by-product of the new accessible phases is the appearance of zero-energy edge states that have non-Abelian exchange properties and can be used for topological quantum computation. Despite the large amount of theory devoted to these systems, most treatments use approximations that render their applicability limited when comparing with usual experiments of 1-D impurity arrays on wide-band superconductors. These approximations either involve tight-binding-like approximations where the impurity energy scales match the minute energy scale of the superconducting gap and are many times unrealistic, or they assume strongly-bound in-gap states. Here, we present a theory for s-wave superconductors based on a wide-band normal metal, with any possible energy scale for the magnetic impurities. The theory is based on free-electron Green's functions. We include Rashba coupling and compare with recent experimental results, permitting us to analyze the topological phases and the experimental edge states. The infinite-chain properties can be analytically obtained, giving us a way to compare with finite-chain calculations. We show that it is possible to converge to the infinite limit by doing finite numerical calculation, paving the way for numerical calculations not based on analytical Green's functions.

Submitted 25 August, 2021; originally announced August 2021.

arXiv:2108.09880 [pdf]

An electron-spin qubit platform assembled atom-by-atom on a surface

Authors: Yu Wang, Yi Chen, Hong T. Bui, Christoph Wolf, Masahiro Haze, Cristina Mier, Jinkyung Kim, Deung-jang Choi, Christopher P. Lutz, Yujeong Bae, Soo-Hyon Phark, Andreas J. Heinrich

Abstract: Creating a quantum-coherent architecture at the atomic scale has long been an ambition in quantum science and nanotechnology. This ultimate length scale requires the use of fundamental quantum properties of atoms, such as the spin of electrons, which naturally occurs in many solid-state environments and allows high-fidelity operations and readout by electromagnetic means. Despite decades of effort, however, it remains a formidable task to realize an atomic-scale quantum architecture where multiple electron spin qubits can be precisely assembled, controllably coupled, and coherently operated. Electron spin qubits created in dopants in semiconductors and color centers in insulators, for example, can be well controlled individually but are difficult to couple together into a circuit. On the other hand, multiple magnetic atoms and molecules on surfaces can be coupled to each other by building sophisticated atomic structures using a scanning tunneling microscope (STM), but coherent operation has so far been limited to a single qubit in the tunnel junction. Here we demonstrate an atomic-scale qubit platform by showing atom-by-atom construction, coherent operations, and readout of multiple electron-spin qubits on a surface. To enable the coherent control of remote qubits that are outside the tunnel junction, we complement each electron spin with a local magnetic field gradient from a nearby single-atom magnet. To enable readout of remote qubits, we employ a sensor qubit in the tunnel junction and implement pulsed double electron spin resonance. Using these methods, we demonstrate fast single-, two-, and three-qubit operations in an all-electrical fashion. Our work marks the creation of an Angstrom-scale qubit platform, where quantum functionalities using electron spin arrays, built atom-by-atom on a surface, are now within reach.

Submitted 5 August, 2022; v1 submitted 22 August, 2021; originally announced August 2021.

arXiv:2108.04500 [pdf, other]

doi 10.3390/electronics11020235

Exploiting Features with Split-and-Share Module

Authors: Jaemin Lee, Minseok Seo, Jongchan Park, Dong-Geol Choi

Abstract: Deep convolutional neural networks (CNNs) have shown state-of-the-art performances in various computer vision tasks. Advances on CNN architectures have focused mainly on designing convolutional blocks of the feature extractors, but less on the classifiers that exploit extracted features. In this work, we propose Split-and-Share Module (SSM), a classifier that splits a given feature into parts, which are partially shared by multiple sub-classifiers. Our intuition is that the more the features are shared, the more common they will become, and SSM can encourage such structural characteristics in the split features. SSM can be easily integrated into any architecture without bells and whistles. We have extensively validated the efficacy of SSM on ImageNet-1K classification task, and SSM has shown consistent and significant improvements over baseline architectures. In addition, we analyze the effect of SSM using the Grad-CAM visualization.

Submitted 10 August, 2021; v1 submitted 10 August, 2021; originally announced August 2021.

Journal ref: Electronics 2022

arXiv:2108.02400 [pdf, other]

Security and Privacy Enhanced Gait Authentication with Random Representation Learning and Digital Lockers

Authors: Lam Tran, Thuc Nguyen, Hyunil Kim, Deokjai Choi

Abstract: Gait data captured by inertial sensors have demonstrated promising results on user authentication. However, most existing approaches stored the enrolled gait pattern insecurely for matching with the validating pattern, thus, posed critical security and privacy issues. In this study, we present a gait cryptosystem that generates from gait data the random key for user authentication, meanwhile, secures the gait pattern. First, we propose a revocable and random binary string extraction method using a deep neural network followed by feature-wise binarization. A novel loss function for network optimization is also designed, to tackle not only the intra-user stability but also the inter-user randomness. Second, we propose a new biometric key generation scheme, namely Irreversible Error Correct and Obfuscate (IECO), improved from the Error Correct and Obfuscate (ECO) scheme, to securely generate from the binary string the random and irreversible key. The model was evaluated with two benchmark datasets as OU-ISIR and whuGAIT. We showed that our model could generate the key of 139 bits from 5-second data sequence with zero False Acceptance Rate (FAR) and False Rejection Rate (FRR) smaller than 5.441%. In addition, the security and user privacy analyses showed that our model was secure against existing attacks on biometric template protection, and fulfilled irreversibility and unlinkability.

Submitted 5 August, 2021; originally announced August 2021.

arXiv:2107.13227 [pdf]

doi 10.1021/acs.nanolett.1c00449

Nonlinear imaging of nanoscale topological corner states

Authors: Sergey S. Kruk, Wenlong Gao, Duk-Yong Choi, Thomas Zentgraf, Shuang Zhang, Yuri Kivshar

Abstract: Topological states of light represent counterintuitive optical modes localized at boundaries of finite-size optical structures that originate from the properties of the bulk. Being defined by bulk properties, such boundary states are insensitive to certain types of perturbations, thus naturally enhancing robustness of photonic circuitries. Conventionally, the N-dimensional bulk modes correspond to (N-1)-dimensional boundary states. The higher-order bulk-boundary correspondence relates N-dimensional bulk to boundary states with dimensionality reduced by more than 1. A special interest lies in miniaturization of such higher-order topological states to the nanoscale. Here, we realize nanoscale topological corner states in metasurfaces with C6-symmetric honeycomb lattices. We directly observe nanoscale topology-empowered edge and corner localizations of light and enhancement of light-matter interactions via a nonlinear imaging technique. Control of light at the nanoscale empowered by topology may facilitate miniaturization and on-chip integration of classical and quantum photonic devices.

Submitted 28 July, 2021; originally announced July 2021.

Comments: arXiv admin note: text overlap with arXiv:2011.10164

arXiv:2107.10916 [pdf, other]

doi 10.1021/acs.nanolett.1c03026

A flexible design platform for Si/SiGe exchange-only qubits with low disorder

Authors: Wonill Ha, Sieu D. Ha, Maxwell D. Choi, Yan Tang, Adele E. Schmitz, Mark P. Levendorf, Kangmu Lee, James M. Chappell, Tower S. Adams, Daniel R. Hulbert, Edwin Acuna, Ramsey S. Noah, Justine W. Matten, Michael P. Jura, Jeffrey A. Wright, Matthew T. Rakher, Matthew G. Borselli

Abstract: Spin-based silicon quantum dots are an attractive qubit technology for quantum information processing with respect to coherence time, control, and engineering. Here we present an exchange-only Si qubit device platform that combines the throughput of CMOS-like wafer processing with the versatility of direct-write lithography. The technology, which we coin "SLEDGE," features dot-shaped gates that are patterned simultaneously on one topographical plane and subsequently connected by vias to interconnect metal lines. The process design enables non-trivial layouts as well as flexibility in gate dimensions, material selection, and additional device features such as for rf qubit control. We show that the SLEDGE process has reduced electrostatic disorder with respect to traditional overlapping gate devices with lift-off metallization, and we present spin coherent exchange oscillations and single qubit blind randomized benchmarking data.

Submitted 22 July, 2021; originally announced July 2021.

arXiv:2107.04152 [pdf, other]

Levi Graph AMR Parser using Heterogeneous Attention

Authors: Han He, Jinho D. Choi

Abstract: Coupled with biaffine decoders, transformers have been effectively adapted to text-to-graph transduction and achieved state-of-the-art performance on AMR parsing. Many prior works, however, rely on the biaffine decoder for either or both arc and label predictions although most features used by the decoder may be learned by the transformer already. This paper presents a novel approach to AMR parsing by combining heterogeneous data (tokens, concepts, labels) as one input to a transformer to learn attention, and use only attention matrices from the transformer to predict all elements in AMR graphs (concepts, arcs, labels). Although our models use significantly fewer parameters than the previous state-of-the-art graph parser, they show similar or better accuracy on AMR 2.0 and 3.0.

Submitted 8 July, 2021; originally announced July 2021.

Comments: Accepted in IWPT 2021: The 17th International Conference on Parsing Technologies

arXiv:2107.03038 [pdf, other]

Maintaining a Reliable World Model using Action-aware Perceptual Anchoring

Authors: Ying Siu Liang, Dongkyu Choi, Kenneth Kwok

Abstract: Reliable perception is essential for robots that interact with the world. But sensors alone are often insufficient to provide this capability, and they are prone to errors due to various conditions in the environment. Furthermore, there is a need for robots to maintain a model of its surroundings even when objects go out of view and are no longer visible. This requires anchoring perceptual information onto symbols that represent the objects in the environment. In this paper, we present a model for action-aware perceptual anchoring that enables robots to track objects in a persistent manner. Our rule-based approach considers inductive biases to perform high-level reasoning over the results from low-level object detection, and it improves the robot's perceptual capability for complex tasks. We evaluate our model against existing baseline models for object permanence and show that it outperforms these on a snitch localisation task using a dataset of 1,371 videos. We also integrate our action-aware perceptual anchoring in the context of a cognitive architecture and demonstrate its benefits in a realistic gearbox assembly task on a Universal Robot.

Submitted 7 July, 2021; originally announced July 2021.

Comments: 7 pages, 3 figures

Journal ref: 2021 International Conference on Robotics and Automation (ICRA 2021)

arXiv:2107.01354 [pdf, other]

doi 10.1145/3448016.3457326

Pool of Experts: Realtime Querying Specialized Knowledge in Massive Neural Networks

Authors: Hakbin Kim, Dong-Wan Choi

Abstract: In spite of the great success of deep learning technologies, training and delivery of a practically serviceable model is still a highly time-consuming process. Furthermore, a resulting model is usually too generic and heavyweight, and hence essentially goes through another expensive model compression phase to fit in a resource-limited device like embedded systems. Inspired by the fact that a machine learning task specifically requested by mobile users is often much simpler than it is supported by a massive generic model, this paper proposes a framework, called Pool of Experts (PoE), that instantly builds a lightweight and task-specific model without any training process. For a realtime model querying service, PoE first extracts a pool of primitive components, called experts, from a well-trained and sufficiently generic network by exploiting a novel conditional knowledge distillation method, and then performs our train-free knowledge consolidation to quickly combine necessary experts into a lightweight network for a target task. Thanks to this train-free property, in our thorough empirical study, PoE can build a fairly accurate yet compact model in a realtime manner, whereas it takes a few minutes per query for the other training methods to achieve a similar level of the accuracy.

Submitted 3 July, 2021; originally announced July 2021.

Comments: In SIGMOD/PODS 2021

Journal ref: SIGMOD Conference 2021: 2244-2252

arXiv:2107.01349 [pdf, other]

Split-and-Bridge: Adaptable Class Incremental Learning within a Single Neural Network

Authors: Jong-Yeong Kim, Dong-Wan Choi

Abstract: Continual learning has been a major problem in the deep learning community, where the main challenge is how to effectively learn a series of newly arriving tasks without forgetting the knowledge of previous tasks. Initiated by Learning without Forgetting (LwF), many of the existing works report that knowledge distillation is effective to preserve the previous knowledge, and hence they commonly use a soft label for the old task, namely a knowledge distillation (KD) loss, together with a class label for the new task, namely a cross entropy (CE) loss, to form a composite loss for a single neural network. However, this approach suffers from learning the knowledge by a CE loss as a KD loss often more strongly influences the objective function when they are in a competitive situation within a single network. This could be a critical problem particularly in a class incremental scenario, where the knowledge across tasks as well as within the new task, both of which can only be acquired by a CE loss, is essentially learned due to the existence of a unified classifier. In this paper, we propose a novel continual learning method, called Split-and-Bridge, which can successfully address the above problem by partially splitting a neural network into two partitions for training the new task separated from the old task and re-connecting them for learning the knowledge across tasks. In our thorough experimental analysis, our Split-and-Bridge method outperforms the state-of-the-art competitors in KD-based continual learning.

Submitted 3 July, 2021; originally announced July 2021.

Comments: In AAAI-2021

Journal ref: In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 35, No. 9, pp. 8137-8145) 2021

arXiv:2107.00248 [pdf, ps, other]

New Estimands for Experiments with Strong Interference

Authors: David Choi

Abstract: In experiments that study social phenomena, such as peer influence or herd immunity, the treatment of one unit may influence the outcomes of others. Such "interference between units" violates traditional approaches for causal inference, so that additional assumptions are often imposed to model or limit the underlying social mechanism. For binary outcomes, we propose new estimands that can be estimated without such assumptions, allowing for interval estimates assuming only the randomization of treatment. However, the causal implications of these estimands are more limited than those attainable under stronger assumptions, showing only that the treatment effects under the observed assignment varied systematically as a function of each unit's direct and indirect exposure, while also lower bounding the number of units affected.

Submitted 29 August, 2023; v1 submitted 1 July, 2021; originally announced July 2021.

Comments: new title, expanded discussion of interpretation and limitations, consolidation of central limit theorem results

arXiv:2106.12767 [pdf, other]

TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration

Authors: Dongjin Choi, Sara Evensen, Çağatay Demiralp, Estevam Hruschka

Abstract: Despite rapid developments in the field of machine learning research, collecting high-quality labels for supervised learning remains a bottleneck for many applications. This difficulty is exacerbated by the fact that state-of-the-art models for NLP tasks are becoming deeper and more complex, often increasing the amount of training data required even for fine-tuning. Weak supervision methods, including data programming, address this problem and reduce the cost of label collection by using noisy label sources for supervision. However, until recently, data programming was only accessible to users who knew how to program. To bridge this gap, the Data Programming by Demonstration framework was proposed to facilitate the automatic creation of labeling functions based on a few examples labeled by a domain expert. This framework has proven successful for generating high-accuracy labeling models for document classification. In this work, we extend the DPBD framework to span-level annotation tasks, arguably one of the most time-consuming NLP labeling tasks. We built a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming and encourages them to explore trade-offs between different labeling models and active learning strategies. We empirically demonstrated that an annotator could achieve a higher F1 score using the proposed tool compared to manual labeling for different span-level annotation tasks.

Submitted 24 June, 2021; originally announced June 2021.

Comments: WWW'21 Demo

arXiv:2106.11155 [pdf, ps, other]

doi 10.1134/S2070046622010058

Non-archimedean Sendov's Conjecture

Authors: Daebeom Choi, Seewoo Lee

Abstract: We prove a non-archimedean analogue of Sendov's conjecture. We also provide a complete list of polynomials over an algebraically closed non-archimedean field $K$ that satisfy the optimal bound in Sendov's conjecture.

Submitted 4 July, 2021; v1 submitted 21 June, 2021; originally announced June 2021.

Comments: 4 pages, criterion for polynomials of degree $n$ with $v(n) = 0$ is added

MSC Class: 11C08; 11S05

Journal ref: P-Adic Num Ultrametr Anal Appl 14, 77-80 (2022)

arXiv:2106.07610 [pdf, other]

doi 10.1371/journal.pcbi.1009867

Measuring the repertoire of age-related behavioral changes in Drosophila melanogaster

Authors: Katherine E. Overman, Daniel M. Choi, Kawai Leung, Joshua W. Shaevitz, Gordon J. Berman

Abstract: Aging affects almost all aspects of an organism -- its morphology, its physiology, its behavior. Isolating which biological mechanisms are regulating these changes, however, has proven difficult, potentially due to our inability to characterize the full repertoire of an animal's behavior across the lifespan. Using data from fruit flies (D. melanogaster) we measure the full repertoire of behaviors as a function of age. We observe a sexually dimorphic pattern of changes in the behavioral repertoire during aging. Although the stereotypy of the behaviors and the complexity of the repertoire overall remains relatively unchanged, we find evidence that the observed alterations in behavior can be explained by changing the fly's overall energy budget, suggesting potential connections between metabolism, aging, and behavior.

Submitted 15 June, 2021; v1 submitted 14 June, 2021; originally announced June 2021.

arXiv:2106.05478 [pdf, other]

Semantic-aware Binary Code Representation with BERT

Authors: Hyungjoon Koo, Soyeon Park, Daejin Choi, Taesoo Kim

Abstract: A wide range of binary analysis applications, such as bug discovery, malware analysis and code clone detection, require recovery of contextual meanings on a binary code. Recently, binary analysis techniques based on machine learning have been proposed to automatically reconstruct the code representation of a binary instead of manually crafting specifics of the analysis algorithm. However, the existing approaches utilizing machine learning are still specialized to solve one domain of problems, rendering recreation of models for different types of binary analysis. In this paper, we propose DeepSemantic utilizing BERT in producing the semantic-aware code representation of a binary code. To this end, we introduce well-balanced instruction normalization that holds rich information for each of instructions yet minimizing an out-of-vocabulary (OOV) problem. DeepSemantic has been carefully designed based on our study with large swaths of binaries. Besides, DeepSemantic leverages the essence of the BERT architecture into re-purposing a pre-trained generic model that is readily available as a one-time processing, followed by quickly applying specific downstream tasks with a fine-tuning process. We demonstrate DeepSemantic with two downstream tasks, namely, binary similarity comparison and compiler provenance (i.e., compiler and optimization level) prediction. Our experimental results show that the binary similarity model outperforms two state-of-the-art binary similarity tools, DeepBinDiff and SAFE, 49.84% and 15.83% on average, respectively.

Submitted 9 June, 2021; originally announced June 2021.

Comments: 16 pages

arXiv:2105.11354 [pdf, other]

View Distillation with Unlabeled Data for Extracting Adverse Drug Effects from User-Generated Data

Authors: Payam Karisani, Jinho D. Choi, Li Xiong

Abstract: We present an algorithm based on multi-layer transformers for identifying Adverse Drug Reactions (ADR) in social media data. Our model relies on the properties of the problem and the characteristics of contextual word embeddings to extract two views from documents. Then a classifier is trained on each view to label a set of unlabeled documents to be used as an initializer for a new classifier in the other view. Finally, the initialized classifier in each view is further trained using the initial training examples. We evaluated our model in the largest publicly available ADR dataset. The experiments testify that our model significantly outperforms the transformer-based models pretrained on domain-specific data.

Submitted 24 May, 2021; originally announced May 2021.

Comments: NAACL 2021 (workshops)

arXiv:2105.05601 [pdf, other]

OutFlip: Generating Out-of-Domain Samples for Unknown Intent Detection with Natural Language Attack

Authors: DongHyun Choi, Myeong Cheol Shin, EungGyun Kim, Dong Ryeol Shin

Abstract: Out-of-domain (OOD) input detection is vital in a task-oriented dialogue system since the acceptance of unsupported inputs could lead to an incorrect response of the system. This paper proposes OutFlip, a method to generate out-of-domain samples using only in-domain training dataset automatically. A white-box natural language attack method HotFlip is revised to generate out-of-domain samples instead of adversarial examples. Our evaluation results showed that integrating OutFlip-generated out-of-domain samples into the training dataset could significantly improve an intent classification model's out-of-domain detection performance.

Submitted 12 May, 2021; originally announced May 2021.

Comments: 9 pages, 3 figures; to appear in ACL Findings of ACL-IJCNLP 2021

arXiv:2105.02381 [pdf, other]

Balancing weights for region-level analysis: the effect of Medicaid Expansion on the uninsurance rate among states that did not expand Medicaid

Authors: Max Rubinstein, Amelia Haviland, David Choi

Abstract: We predict the average effect of Medicaid expansion on the non-elderly adult uninsurance rate among states that did not expand Medicaid in 2014 as if they had expanded their Medicaid eligibility requirements. Using American Community Survey data aggregated to the region level, we estimate this effect by finding weights that approximately reweight the expansion regions to match the covariate distribution of the non-expansion regions. Existing methods to estimate balancing weights often assume that the covariates are measured without error and do not account for dependencies in the outcome model. Our covariates have random noise that is uncorrelated with the outcome errors and our outcome model has state-level random effects inducing dependence between regions. To correct for the bias induced by the measurement error, we propose generating our weights on a linear approximation to the true covariates, using an idea from measurement error literature known as "regression-calibration" (see, e.g., Carroll (2006)). This requires auxiliary data to estimate the variability of the measurement error. We also modify the Stable Balancing Weights objective proposed by Zubizarreta (2015) to reduce the variance of our estimator when the model errors follow our assumed correlation structure. We show that these approaches outperform existing methods when attempting to predict observed outcomes during the pre-treatment period. Using this method we estimate that Medicaid expansion would have caused a -2.33 (-3.54, -1.11) percentage point change in the adult uninsurance rate among states that did not expand Medicaid.

Submitted 23 May, 2022; v1 submitted 5 May, 2021; originally announced May 2021.

arXiv:2104.10117 [pdf, other]

Enhancing Cognitive Models of Emotions with Representation Learning

Authors: Yuting Guo, Jinho Choi

Abstract: We present a novel deep learning-based framework to generate embedding representations of fine-grained emotions that can be used to computationally describe psychological models of emotions. Our framework integrates a contextualized embedding encoder with a multi-head probing model that makes it possible to interpret dynamically learned representations optimized for an emotion classification task. Our model is evaluated on the Empathetic Dialogue dataset and shows the state-of-the-art result for classifying 32 emotions. Our layer analysis can derive an emotion graph to depict hierarchical relations among the emotions. Our emotion representations can be used to generate an emotion wheel directly comparable to the one from Plutchik's model, and also augment the values of missing emotions in the PAD emotional state model.

Submitted 20 April, 2021; originally announced April 2021.

Comments: Accepted by the NAACL Workshop on Cognitive Modeling and Computational Linguistics 2021

arXiv:2104.06924 [pdf, other]

Evaluation of Unsupervised Entity and Event Salience Estimation

Authors: Jiaying Lu, Jinho D. Choi

Abstract: Salience Estimation aims to predict term importance in documents. Due to few existing human-annotated datasets and the subjective notion of salience, previous studies typically generate pseudo-ground truth for evaluation. However, our investigation reveals that the evaluation protocol proposed by prior work is difficult to replicate, which has led to few follow-up studies. Moreover, the evaluation process is problematic: the entity linking tool used for entity matching is very noisy, while ignoring event arguments during event evaluation leads to boosted performance. In this work, we propose a light yet practical entity and event salience estimation evaluation protocol, which incorporates the more reliable syntactic dependency parser. Furthermore, we conduct a comprehensive analysis among popular entity and event definition standards, and present our own definition for the Salience Estimation task to reduce noise during the pseudo-ground truth generation process. Furthermore, we construct dependency-based heterogeneous graphs to capture the interactions of entities and events. The empirical results show that both baseline methods and the novel GNN method utilizing the heterogeneous graph consistently outperform the previous SOTA model in all proposed metrics.

Submitted 14 April, 2021; originally announced April 2021.

Journal ref: Proceedings of the 34th International Florida Artificial Intelligence Research Society Conference, 2021

arXiv:2104.06171 [pdf, other]

doi 10.1103/PhysRevB.104.045406

Atomic Manipulation of In-gap States on the $β$-Bi$_2$Pd Superconductor

Authors: Cristina Mier, Jiyoon Hwang, Jinkyung Kim, Yujeong Bae, Fuyuki Nabeshima, Yoshinori Imai, Atsutaka Maeda, Nicolás Lorente, Andreas Heinrich, Deung-Jang Choi

Abstract: Electronic states in the gap of a superconductor inherit intriguing many-body properties from the superconductor. Here, we create these in-gap states by manipulating Cr atomic chains on the $β$-Bi$_2$Pd superconductor. We find that the topological properties of the in-gap states can greatly vary depending on the crafted spin chain. These systems make an ideal platform for non-trivial topological phases because of the large atom-superconductor interactions and the existence of a large Rashba coupling at the Bi-terminated surface. We study two spin chains, one with atoms two-lattice-parameter apart and one with square-root-of-two lattice parameters. Of these, only the second one is in a topologically non-trivial phase, in correspondence with the spin interactions for this geometry.

Submitted 6 May, 2021; v1 submitted 13 April, 2021; originally announced April 2021.

Journal ref: Phys. Rev. B 104, 045406 (2021)

arXiv:2104.00924 [pdf, other]

Video Prediction Recalling Long-term Motion Context via Memory Alignment Learning

Authors: Sangmin Lee, Hak Gu Kim, Dae Hwi Choi, Hyung-Il Kim, Yong Man Ro

Abstract: Our work addresses long-term motion context issues for predicting future frames. To predict the future precisely, it is required to capture which long-term motion context (e.g., walking or running) the input motion (e.g., leg movement) belongs to. The bottlenecks arising when dealing with the long-term motion context are: (i) how to predict the long-term motion context naturally matching input sequences with limited dynamics, (ii) how to predict the long-term motion context with high-dimensionality (e.g., complex motion). To address the issues, we propose novel motion context-aware video prediction. To solve the bottleneck (i), we introduce a long-term motion context memory (LMC-Memory) with memory alignment learning. The proposed memory alignment learning makes it possible to store long-term motion contexts into the memory and to match them with sequences including limited dynamics. As a result, the long-term context can be recalled from the limited input sequence. In addition, to resolve the bottleneck (ii), we propose memory query decomposition to store local motion context (i.e., low-dimensional dynamics) and recall the suitable local context for each local part of the input individually. This boosts the alignment effects of the memory. Experimental results show that the proposed method outperforms other sophisticated RNN-based methods, especially in long-term condition. Further, we validate the effectiveness of the proposed network designs by conducting ablation studies and memory feature analysis. The source code of this work is available.

Submitted 2 April, 2021; originally announced April 2021.

Comments: CVPR 2021 (Oral)

arXiv:2103.12533

Multiplicity one bound for cohomological automorphic representations with a fixed level

Authors: Dohoon Choi

Abstract: Let $F$ be a totally real field, and $\mathbb{A}_F$ be the adele ring of $F$. Let us fix $N$ to be a positive integer. Let $π_1=\otimesπ_{1,v}$ and $π_2=\otimesπ_{2,v}$ be distinct cohomological cuspidal automorphic representations of $\mathrm{GL}_n(\mathbb{A}_{F})$ with levels less than or equal to $N$. Let $\mathcal{N}(π_1,π_2)$ be the minimum of the absolute norm of $v \nmid \infty$ such that $π_{1,v} \not \simeq π_{2,v}$ and that $π_{1,v}$ and $π_{2,v}$ are unramified. We prove that there exists a constant $C_N$ such that for every pair $π_1$ and $π_2$, $$\mathcal{N}(π_1,π_2) \leq C_N.$$ This improves known bounds $$ \mathcal{N}(π_1,π_2)=O(Q^A) \;\;\; (\text{some } A \text{ depending only on } n), $$ where $Q$ is the maximum of the analytic conductors of $π_1$ and $π_2$. This result applies to newforms on $Γ_1(N)$. In particular, assume that $f_1$ and $f_2$ are Hecke eigenforms of weight $k_1$ and $k_2$ on $\mathrm{SL}_2(\mathbb{Z})$, respectively. We prove that if for all $p \in \{2,7\}$, $$λ_{f_1}(p)/\sqrt{p}^{(k_1-1)} = λ_{f_2}(p)/\sqrt{p}^{(k_2-1)},$$ then $f_1=cf_2$ for some constant $c$. Here, for each prime $p$, $λ_{f_i}(p)$ denotes the $p$-th Hecke eigenvalue of $f_i$.

Submitted 12 March, 2022; v1 submitted 23 March, 2021; originally announced March 2021.

Comments: A new paper preparing with other collaborators will include the results of this paper

arXiv:2103.09582 [pdf]

doi 10.1103/PhysRevB.104.174408

Spin Resonance Amplitude and Frequency of a Single Atom on a Surface in a Vector Magnetic Field

Authors: Jinkyung Kim, Won-jun Jang, Thi Hong Bui, Deung-jang Choi, Christoph Wolf, Fernando Delgado, Denis Krylov, Soonhyeong Lee, Sangwon Yoon, Christopher P. Lutz, Andreas J. Heinrich, Yujeong Bae

Abstract: We used electron spin resonance (ESR) combined with scanning tunneling microscopy (STM) to measure hydrogenated Ti (spin-1/2) atoms at low-symmetry binding sites on MgO in vector magnetic fields. We found strongly anisotropic g-values in all three spatial directions. Interestingly, the amplitude and lineshape of the ESR signals are also strongly dependent on the angle of the field. We conclude that the Ti spin is aligned along the magnetic field, while the tip spin follows its strong magnetic anisotropy. Our results show the interplay between the tip and surface spins in determining the ESR signals and highlight the precision of ESR-STM to identify the single atom's spin states.

Submitted 17 March, 2021; originally announced March 2021.

arXiv:2103.04044 [pdf, other]

Putting Humans in the Natural Language Processing Loop: A Survey

Authors: Zijie J. Wang, Dongjin Choi, Shenyu Xu, Diyi Yang

Abstract: How can we design Natural Language Processing (NLP) systems that learn from human feedback? There is a growing research body of Human-in-the-loop (HITL) NLP frameworks that continuously integrate human feedback to improve the model itself. HITL NLP research is nascent but multifarious -- solving various NLP problems, collecting diverse feedback from different people, and applying different methods to learn from collected feedback. We present a survey of HITL NLP work from both Machine Learning (ML) and Human-Computer Interaction (HCI) communities that highlights its short yet inspiring history, and thoroughly summarize recent frameworks focusing on their tasks, goals, human interactions, and feedback learning methods. Finally, we discuss future directions for integrating human feedback in the NLP development loop.

Submitted 6 March, 2021; originally announced March 2021.

Comments: The paper is accepted to the HCI+NLP workshop at EACL 2021

arXiv:2103.01655 [pdf, other]

Run Your Visual-Inertial Odometry on NVIDIA Jetson: Benchmark Tests on a Micro Aerial Vehicle

Authors: Jinwoo Jeon, Sungwook Jung, Eungchang Lee, Duckyu Choi, Hyun Myung

Abstract: This paper presents benchmark tests of various visual(-inertial) odometry algorithms on NVIDIA Jetson platforms. The compared algorithms include mono and stereo, covering Visual Odometry (VO) and Visual-Inertial Odometry (VIO): VINS-Mono, VINS-Fusion, Kimera, ALVIO, Stereo-MSCKF, ORB-SLAM2 stereo, and ROVIO. As these methods are mainly used for unmanned aerial vehicles (UAVs), they must perform well in situations where the size of the processing board and weight is limited. Jetson boards released by NVIDIA satisfy these constraints as they have a sufficiently powerful central processing unit (CPU) and graphics processing unit (GPU) for image processing. However, in existing studies, the performance of Jetson boards as a processing platform for executing VO/VIO has not been compared extensively in terms of the usage of computing resources and accuracy. Therefore, this study compares representative VO/VIO algorithms on several NVIDIA Jetson platforms, namely NVIDIA Jetson TX2, Xavier NX, and AGX Xavier, and introduces a novel dataset 'KAIST VIO dataset' for UAVs. Including pure rotations, the dataset has several geometric trajectories that are harsh to visual(-inertial) state estimation. The evaluation is performed in terms of the accuracy of estimated odometry, CPU usage, and memory usage on various Jetson boards, algorithms, and trajectories. We present the results of the comprehensive benchmark test and release the dataset for the computer vision and robotics applications.

Submitted 2 March, 2021; originally announced March 2021.

Comments: 8 pages, 5 figures

arXiv:2012.14649 [pdf]

doi 10.1007/978-981-16-4803-8_16

Peacock Exploration: A Lightweight Exploration for UAV using Control-Efficient Trajectory

Authors: EungChang Mason Lee, Duckyu Choi, Hyun Myung

Abstract: Unmanned Aerial Vehicles have received much attention in recent years due to their wide range of applications, such as exploration of an unknown environment to acquire a 3D map without prior knowledge of it. Existing exploration methods have been largely challenged by computationally heavy probabilistic path planning. Similarly, kinodynamic constraints or proper sensors considering the payload for UAVs were not considered. In this paper, to solve those issues and to consider the limited payload and computational resource of UAVs, we propose "Peacock Exploration": A lightweight exploration method for UAVs using precomputed minimum snap trajectories which look like a peacock's tail. Using the widely known, control efficient minimum snap trajectories and OctoMap, the UAV equipped with an RGB-D camera can explore unknown 3D environments without any prior knowledge or human-guidance with only O(log N) computational complexity. It also adopts the receding horizon approach and simple, heuristic scoring criteria. The proposed algorithm's performance is demonstrated by exploring a challenging 3D maze environment and compared with a state-of-the-art algorithm.

Submitted 29 December, 2020; originally announced December 2020.

Comments: 10 pages

arXiv:2012.13981 [pdf, other]

doi 10.1103/PhysRevPhysEducRes.17.020110

Creating a Physicist: The Impact of Informal Programs on University Student Development

Authors: Callie Rethman, Jonathan Perry, Jonan Donaldson, Daniel Choi, Tatiana Erukhimova

Abstract: Physics outreach programs provide a critical context for informal experiences that promote the transition from new student to contributing physicist. Prior studies have suggested a positive link between participation in informal physics outreach programs and the development of a student's physics identity. In this study, we adopt a student-focused investigation to explore the effects of informal programs on dimensions of physics identity, sense of community, 21st century skill development, and motivation. We employed a mixed methods study combining a survey instrument (117 responses) and interviews (35) with current and former undergraduate and graduate students who participated in five programs through a physics and astronomy department at a large land-grant university. To examine interviews, we employed a framework based on situated learning theory, transformative learning theory, and the Dynamic Systems Model of Role Identity. Our findings, based on self-reported data, show that students who facilitated informal physics programs positively developed their physics identity, experienced increased sense of belonging to the physics community, and developed 21st century career skills. Specifically, students reported positive benefits to their communication, teamwork and networking, and design skills. The benefits of these programs can be achieved by departments of any size without significant commitment of funds or changes to curriculum.

Submitted 29 May, 2021; v1 submitted 27 December, 2020; originally announced December 2020.

Comments: 15 pages, 6 figures

Journal ref: Phys. Rev. Phys. Educ. Res. 17, 020110 (2021)

arXiv:2011.10897

Reinforcement learning with distance-based incentive/penalty (DIP) updates for highly constrained industrial control systems

Authors: Hyungjun Park, Daiki Min, Jong-hyun Ryu, Dong Gu Choi

Abstract: Typical reinforcement learning (RL) methods show limited applicability for real-world industrial control problems because industrial systems involve various constraints and simultaneously require continuous and discrete control. To overcome these challenges, we devise a novel RL algorithm that enables an agent to handle a highly constrained action space. This algorithm has two main features. First, we devise two distance-based Q-value update schemes, incentive update and penalty update, in a distance-based incentive/penalty update technique to enable the agent to decide discrete and continuous actions in the feasible region and to update the value of these types of actions. Second, we propose a method for defining the penalty cost as a shadow price-weighted penalty. This approach affords two advantages compared to previous methods to efficiently induce the agent to not select an infeasible action. We apply our algorithm to an industrial control problem, microgrid system operation, and the experimental results demonstrate its superiority.

Submitted 19 May, 2021; v1 submitted 21 November, 2020; originally announced November 2020.

Comments: We request withdrawal of this article due to a definition error on methodology and problem definition (Section 3-4; pages 2-5)

arXiv:2011.10164 [pdf]

doi 10.1021/acs.nanolett.1c00449

Nonlinear imaging of nanoscale topological corner states

Authors: Sergey Kruk, Wenlong Gao, Duk Yong Choi, Thomas Zentgraf, Shuang Zhang, Yuri Kivshar

Abstract: Topological states of light represent counterintuitive optical modes localized at boundaries of finite-size optical structures that originate from the properties of the bulk. Being defined by bulk properties, such boundary states are insensitive to certain types of perturbations, thus naturally enhancing robustness of photonic circuitries. Conventionally, the N-dimensional bulk modes correspond to (N-1)-dimensional boundary states. The higher-order bulk-boundary correspondence relates N-dimensional bulk to boundary states with dimensionality reduced by more than 1. A special interest lies in miniaturization of such higher-order topological states to the nanoscale. Here, we realize nanoscale topological corner states in metasurfaces with C6-symmetric honeycomb lattices. We directly observe nanoscale topology-empowered edge and corner localizations of light and enhancement of light-matter interactions via a nonlinear imaging technique. Control of light at the nanoscale empowered by topology may facilitate miniaturization and on-chip integration of classical and quantum photonic devices.

Submitted 1 September, 2022; v1 submitted 19 November, 2020; originally announced November 2020.

Journal ref: Nano Lett. 2021, 21, 11, 4592-4597

arXiv:2011.04803 [pdf, other]

Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering

Authors: Ricky T. Q. Chen, Dami Choi, Lukas Balles, David Duvenaud, Philipp Hennig

Abstract: Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help de-sensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers and ultimately, this is an interesting step for constructing self-tuning optimizers.

Submitted 9 November, 2020; originally announced November 2020.

arXiv:2011.02998 [pdf, other]

Competence-Level Prediction and Resume & Job Description Matching Using Context-Aware Transformer Models

Authors: Changmao Li, Elaine Fisher, Rebecca Thomas, Steve Pittard, Vicki Hertzberg, Jinho D. Choi

Abstract: This paper presents a comprehensive study on resume classification to significantly reduce the time and labor needed to screen an overwhelming number of applications, while improving the selection of suitable candidates. A total of 6,492 resumes are extracted from 24,933 job applications for 252 positions designated into four levels of experience for Clinical Research Coordinators (CRC). Each resume is manually annotated to its most appropriate CRC position by experts through several rounds of triple annotation to establish guidelines. As a result, a high Kappa score of 61% is achieved for inter-annotator agreement. Given this dataset, novel transformer-based classification models are developed for two tasks: the first task takes a resume and classifies it to a CRC level (T1), and the second task takes both a resume and a job description to apply and predicts if the application is suited to the job (T2). Our best models using section encoding and multi-head attention decoding give results of 73.3% on T1 and 79.2% on T2. Our analysis shows that the prediction errors are mostly made among adjacent CRC levels, which are hard for even experts to distinguish, implying the practical value of our models in real HR platforms.

Submitted 5 November, 2020; originally announced November 2020.

Comments: Accepted by the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)

ACM Class: I.2.7

arXiv:2011.02207 [pdf, other]

Extracting Chemical-Protein Interactions via Calibrated Deep Neural Network and Self-training

Authors: Dongha Choi, Hyunju Lee

Abstract: The extraction of interactions between chemicals and proteins from several biomedical articles is important in many fields of biomedical research such as drug development and prediction of drug side effects. Several natural language processing methods, including deep neural network (DNN) models, have been applied to address this problem. However, these methods were trained with hard-labeled data, which tend to become over-confident, leading to degradation of the model reliability. To estimate the data uncertainty and improve the reliability, "calibration" techniques have been applied to deep learning models. In this study, to extract chemical--protein interactions, we propose a DNN-based approach incorporating uncertainty information and calibration techniques. Our model first encodes the input sequence using a pre-trained language-understanding model, following which it is trained using two calibration methods: mixup training and addition of a confidence penalty loss. Finally, the model is re-trained with augmented data that are extracted using the estimated uncertainties. Our approach has achieved state-of-the-art performance with regard to the Biocreative VI ChemProt task, while preserving higher calibration abilities than those of previous approaches. Furthermore, our approach also presents the possibilities of using uncertainty estimation for performance improvement.

Submitted 4 November, 2020; originally announced November 2020.

Comments: 10 pages, 4 figures, accepted for the Findings of EMNLP

Showing 101–150 of 344 results for author: Choi, D