-
Distributional Preference Alignment of LLMs via Optimal Transport
Authors:
Igor Melnyk,
Youssef Mroueh,
Brian Belgodere,
Mattia Rigotti,
Apoorva Nitsure,
Mikhail Yurochkin,
Kristjan Greenewald,
Jiri Navratil,
Jerret Ross
Abstract:
Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically…
▽ More
Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically dominant in the first order on the distribution of negative samples. We introduce a convex relaxation of this first-order stochastic dominance and cast it as an optimal transport problem with a smooth and convex cost. Thanks to the one-dimensional nature of the resulting optimal transport problem and the convexity of the cost, it has a closed-form solution via sorting on empirical measures. We fine-tune LLMs with this AOT objective, which enables alignment by penalizing the violation of the stochastic dominance of the reward distribution of the positive samples on the reward distribution of the negative samples. We analyze the sample complexity of AOT by considering the dual of the OT problem and show that it converges at the parametric rate. Empirically, we show on a diverse set of alignment datasets and LLMs that AOT leads to state-of-the-art models in the 7B family of models when evaluated with Open LLM Benchmarks and AlpacaEval.
△ Less
Submitted 9 June, 2024;
originally announced June 2024.
-
GP-MoLFormer: A Foundation Model For Molecular Generation
Authors:
Jerret Ross,
Brian Belgodere,
Samuel C. Hoffman,
Vijil Chenthamarakshan,
Youssef Mroueh,
Payel Das
Abstract:
Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we extend the paradigm of training chemical language transformers on large-scale chemical datasets to generative tasks in this work. Specifically, we propose GP-MoLFo…
▽ More
Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we extend the paradigm of training chemical language transformers on large-scale chemical datasets to generative tasks in this work. Specifically, we propose GP-MoLFormer, an autoregressive molecular string generator that is trained on more than 1.1B chemical SMILES. GP-MoLFormer uses a 46.8M parameter transformer decoder model with linear attention and rotary positional encodings as the base architecture. We explore the utility of GP-MoLFormer in generating novel, valid, and unique SMILES. Impressively, we find GP-MoLFormer is able to generate a significant fraction of novel, valid, and unique SMILES even when the number of generated molecules is in the 10 billion range and the reference set is over a billion. We also find strong memorization of training data in GP-MoLFormer generations, which has so far remained unexplored for chemical language models. Our analyses reveal that training data memorization and novelty in generations are impacted by the quality of the training data; duplication bias in training data can enhance memorization at the cost of lowering novelty. We evaluate GP-MoLFormer's utility and compare it with that of existing baselines on three different tasks: de novo generation, scaffold-constrained molecular decoration, and unconstrained property-guided optimization. While the first two are handled with no additional training, we propose a parameter-efficient fine-tuning method for the last task, which uses property-ordered molecular pairs as input. We call this new approach pair-tuning. Our results show GP-MoLFormer performs better or comparable with baselines across all three tasks, demonstrating its general utility.
△ Less
Submitted 4 April, 2024;
originally announced May 2024.
-
Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection
Authors:
Gaurav Bhatt,
James Ross,
Leonid Sigal
Abstract:
Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks. Despite notable progress in continual classification, systems designed for complex vision tasks such as detection or segmentation still struggle to attain satisfactory performance. In this work, we introduce a memory-based detection transformer architecture to adapt a pre-…
▽ More
Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks. Despite notable progress in continual classification, systems designed for complex vision tasks such as detection or segmentation still struggle to attain satisfactory performance. In this work, we introduce a memory-based detection transformer architecture to adapt a pre-trained DETR-style detector to new tasks while preserving knowledge from previous tasks. We propose a novel localized query function for efficient information retrieval from memory units, aiming to minimize forgetting. Furthermore, we identify a fundamental challenge in continual detection referred to as background relegation. This arises when object categories from earlier tasks reappear in future tasks, potentially without labels, leading them to be implicitly treated as background. This is an inevitable issue in continual detection or segmentation. The introduced continual optimization technique effectively tackles this challenge. Finally, we assess the performance of our proposed system on continual detection benchmarks and demonstrate that our approach surpasses the performance of existing state-of-the-art resulting in 5-7% improvements on MS-COCO and PASCAL-VOC on the task of continual detection.
△ Less
Submitted 15 July, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Risk Aware Benchmarking of Large Language Models
Authors:
Apoorva Nitsure,
Youssef Mroueh,
Mattia Rigotti,
Kristjan Greenewald,
Brian Belgodere,
Mikhail Yurochkin,
Jiri Navratil,
Igor Melnyk,
Jerret Ross
Abstract:
We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and math…
▽ More
We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a metrics portfolio for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models regarding risks related to drifting from instructions and outputting toxic content.
△ Less
Submitted 9 June, 2024; v1 submitted 10 October, 2023;
originally announced October 2023.
-
Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies
Authors:
Mehdi Rezaie,
Ashley J. Ross,
Hee-Jong Seo,
Hui Kong,
Anna Porredon,
Lado Samushia,
Edmond Chaussidon,
Alex Krolewski,
Arnaud de Mattia,
Florian Beutler,
Jessica Nicole Aguilar,
Steven Ahlen,
Shadab Alam,
Santiago Avila,
Benedict Bahr-Kalus,
Jose Bermejo-Climent,
David Brooks,
Todd Claybaugh,
Shaun Cole,
Kyle Dawson,
Axel de la Macorra,
Peter Doel,
Andreu Font-Ribera,
Jaime E. Forero-Romero,
Satya Gontcho A Gontcho
, et al. (24 additional authors not shown)
Abstract:
We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $\fnl$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the…
▽ More
We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $\fnl$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against simulations with and without $\fnl$ and systematics, showing superior performance of the neural network treatment. The neural network with a set of nine imaging property maps passes our systematic null test criteria, and is chosen as the fiducial treatment. Assuming the universality relation, we find $\fnl = 34^{+24(+50)}_{-44(-73)}$ at 68\%(95\%) confidence. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. We study how the regression method biases the measured angular power-spectrum and degrades the $\fnl$ constraining power. The use of the nine maps more than doubles the uncertainty compared to using only the three primary maps in the regression. Our results thus motivate the development of more efficient methods that avoid over-correction, protect large-scale clustering information, and preserve constraining power. Additionally, our results encourage further studies of $\fnl$ with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics and lessen the degradation in the $\fnl$ uncertainty.
△ Less
Submitted 25 June, 2024; v1 submitted 4 July, 2023;
originally announced July 2023.
-
Improved Synthesis of Toffoli-Hadamard Circuits
Authors:
Matthew Amy,
Andrew N. Glaudell,
Sarah Meng Li,
Neil J. Ross
Abstract:
The matrices that can be exactly represented by a circuit over the Toffoli-Hadamard gate set are the orthogonal matrices of the form $M/ \sqrt{2}{}^k$, where $M$ is an integer matrix and $k$ is a nonnegative integer. The exact synthesis problem for this gate set is the problem of constructing a circuit for a given such matrix. Existing methods produce circuits consisting of $O(2^n \log(n)k)$ gates…
▽ More
The matrices that can be exactly represented by a circuit over the Toffoli-Hadamard gate set are the orthogonal matrices of the form $M/ \sqrt{2}{}^k$, where $M$ is an integer matrix and $k$ is a nonnegative integer. The exact synthesis problem for this gate set is the problem of constructing a circuit for a given such matrix. Existing methods produce circuits consisting of $O(2^n \log(n)k)$ gates, where $n$ is the dimension of the matrix. In this paper, we provide two improved synthesis methods. First, we show that a technique introduced by Kliuchnikov in 2013 for Clifford+$T$ circuits can be straightforwardly adapted to Toffoli-Hadamard circuits, reducing the complexity of the synthesized circuit from $O(2^n \log(n)k)$ to $O(n^2 \log(n)k)$. Then, we present an alternative synthesis method of similarly improved cost, but whose application is restricted to circuits on no more than three qubits. Our results also apply to orthogonal matrices over the dyadic fractions, which correspond to circuits using the 2-qubit gate $H\otimes H$, rather than the usual single-qubit Hadamard gate $H$.
△ Less
Submitted 18 May, 2023;
originally announced May 2023.
-
Auditing and Generating Synthetic Data with Controllable Trust Trade-offs
Authors:
Brian Belgodere,
Pierre Dognin,
Adam Ivankay,
Igor Melnyk,
Youssef Mroueh,
Aleksandra Mojsilovic,
Jiri Navratil,
Apoorva Nitsure,
Inkit Padhi,
Mattia Rigotti,
Jerret Ross,
Yair Schiff,
Radhika Vedpathak,
Richard A. Young
Abstract:
Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framew…
▽ More
Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases like education, healthcare, banking, and human resources, spanning different data modalities such as tabular, time-series, vision, and natural language. This holistic assessment is essential for compliance with regulatory safeguards. We introduce a trustworthiness index to rank synthetic datasets based on their safeguards trade-offs. Furthermore, we present a trustworthiness-driven model selection and cross-validation process during training, exemplified with "TrustFormers" across various data types. This approach allows for controllable trustworthiness trade-offs in synthetic data creation. Our auditing framework fosters collaboration among stakeholders, including data scientists, governance experts, internal reviewers, external certifiers, and regulators. This transparent reporting should become a standard practice to prevent bias, discrimination, and privacy violations, ensuring compliance with policies and providing accountability, safety, and performance guarantees.
△ Less
Submitted 9 June, 2024; v1 submitted 21 April, 2023;
originally announced April 2023.
-
Foresight -- Generative Pretrained Transformer (GPT) for Modelling of Patient Timelines using EHRs
Authors:
Zeljko Kraljevic,
Dan Bean,
Anthony Shek,
Rebecca Bendayan,
Harry Hemingway,
Joshua Au Yeung,
Alexander Deng,
Alfie Baston,
Jack Ross,
Esther Idowu,
James T Teo,
Richard J Dobson
Abstract:
Background: Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within the unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We explore how temporal modelling of patients from free text and structured data, using deep generati…
▽ More
Background: Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within the unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We explore how temporal modelling of patients from free text and structured data, using deep generative transformers can be used to forecast a wide range of future disorders, substances, procedures or findings. Methods: We present Foresight, a novel transformer-based pipeline that uses named entity recognition and linking tools to convert document text into structured, coded concepts, followed by providing probabilistic forecasts for future medical events such as disorders, substances, procedures and findings. We processed the entire free-text portion from three different hospital datasets totalling 811336 patients covering both physical and mental health. Findings: On tests in two UK hospitals (King's College Hospital, South London and Maudsley) and the US MIMIC-III dataset precision@10 0.68, 0.76 and 0.88 was achieved for forecasting the next disorder in a patient timeline, while precision@10 of 0.80, 0.81 and 0.91 was achieved for forecasting the next biomedical concept. Foresight was also validated on 34 synthetic patient timelines by five clinicians and achieved relevancy of 97% for the top forecasted candidate disorder. As a generative model, it can forecast follow-on biomedical concepts for as many steps as required. Interpretation: Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk forecasting, virtual trials and clinical research to study the progression of disorders, simulate interventions and counterfactuals, and educational purposes.
△ Less
Submitted 24 January, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding
Authors:
Josiah Ross,
Luke Yoffe,
Alon Albalak,
William Yang Wang
Abstract:
Transfer learning is an exciting area of Natural Language Processing that has the potential to both improve model performance and increase data efficiency. This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain. We hypothesize that a model can utilize the information learned from a source task to better learn a target…
▽ More
Transfer learning is an exciting area of Natural Language Processing that has the potential to both improve model performance and increase data efficiency. This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain. We hypothesize that a model can utilize the information learned from a source task to better learn a target task, thereby reducing the number of target task training samples required. Unintuitively, our data shows that often target task training data size has minimal effect on how sequential transfer learning performs compared to the same model without transfer learning. Our results lead us to believe that this unexpected result could be due to the effects of catastrophic forgetting, motivating further work into methods that prevent such forgetting.
△ Less
Submitted 21 October, 2022;
originally announced October 2022.
-
Cloud-Based Real-Time Molecular Screening Platform with MolFormer
Authors:
Brian Belgodere,
Vijil Chenthamarakshan,
Payel Das,
Pierre Dognin,
Toby Kurien,
Igor Melnyk,
Youssef Mroueh,
Inkit Padhi,
Mattia Rigotti,
Jarret Ross,
Yair Schiff,
Richard A. Young
Abstract:
With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The pla…
▽ More
With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The platform currently supports three tasks: nearest neighbor retrieval, chemical space visualization, and property prediction. Based on the functionalities of this platform and results obtained, we believe that such a platform can play a pivotal role in automating chemistry and chemical engineering research, as well as assist in drug discovery and material design tasks. A demo of our platform is provided at \url{www.ibm.biz/molecular_demo}.
△ Less
Submitted 13 August, 2022;
originally announced August 2022.
-
On the Lambek embedding and the category of product-preserving presheaves
Authors:
Peng Fu,
Kohei Kishida,
Neil J. Ross,
Peter Selinger
Abstract:
It is well-known that the category of presheaf functors is complete and cocomplete, and that the Yoneda embedding into the presheaf category preserves products. However, the Yoneda embedding does not preserve coproducts. It is perhaps less well-known that if we restrict the codomain of the Yoneda embedding to the full subcategory of limit-preserving functors, then this embedding preserves colimits…
▽ More
It is well-known that the category of presheaf functors is complete and cocomplete, and that the Yoneda embedding into the presheaf category preserves products. However, the Yoneda embedding does not preserve coproducts. It is perhaps less well-known that if we restrict the codomain of the Yoneda embedding to the full subcategory of limit-preserving functors, then this embedding preserves colimits, while still enjoying most of the other useful properties of the Yoneda embedding. We call this modified embedding the Lambek embedding. The category of limit-preserving functors is known to be a reflective subcategory of the category of all functors, i.e., there is a left adjoint for the inclusion functor. In the literature, the existence of this left adjoint is often proved non-constructively, e.g., by an application of Freyd's adjoint functor theorem. In this paper, we provide an alternative, more constructive proof of this fact. We first explain the Lambek embedding and why it preserves coproducts. Then we review some concepts from multi-sorted algebras and observe that there is a one-to-one correspondence between product-preserving presheaves and certain multi-sorted term algebras. We provide a construction that freely turns any presheaf functor into a product-preserving one, hence giving an explicit definition of the left adjoint functor of the inclusion. Finally, we sketch how to extend our method to prove that the subcategory of limit-preserving functors is also reflective.
△ Less
Submitted 12 May, 2022;
originally announced May 2022.
-
Proto-Quipper with dynamic lifting
Authors:
Peng Fu,
Kohei Kishida,
Neil J. Ross,
Peter Selinger
Abstract:
Quipper is a functional programming language for quantum computing. Proto-Quipper is a family of languages aiming to provide a formal foundation for Quipper. In this paper, we extend Proto-Quipper-M with a construct called dynamic lifting, which is present in Quipper. By virtue of being a circuit description language, Proto-Quipper has two separate runtimes: circuit generation time and circuit exe…
▽ More
Quipper is a functional programming language for quantum computing. Proto-Quipper is a family of languages aiming to provide a formal foundation for Quipper. In this paper, we extend Proto-Quipper-M with a construct called dynamic lifting, which is present in Quipper. By virtue of being a circuit description language, Proto-Quipper has two separate runtimes: circuit generation time and circuit execution time. Values that are known at circuit generation time are called parameters, and values that are known at circuit execution time are called states. Dynamic lifting is an operation that enables a state, such as the result of a measurement, to be lifted to a parameter, where it can influence the generation of the next portion of the circuit. As a result, dynamic lifting enables Proto-Quipper programs to interleave classical and quantum computation. We describe the syntax of a language we call Proto-Quipper-Dyn. Its type system uses a system of modalities to keep track of the use of dynamic lifting. We also provide an operational semantics, as well as an abstract categorical semantics for dynamic lifting based on enriched category theory. We prove that both the type system and the operational semantics are sound with respect to our categorical semantics. Finally, we give some examples of Proto-Quipper-Dyn programs that make essential use of dynamic lifting.
△ Less
Submitted 8 November, 2022; v1 submitted 27 April, 2022;
originally announced April 2022.
-
A Biset-Enriched Categorical Model for Proto-Quipper with Dynamic Lifting
Authors:
Peng Fu,
Kohei Kishida,
Neil J. Ross,
Peter Selinger
Abstract:
Quipper and Proto-Quipper are a family of quantum programming languages that, by their nature as circuit description languages, involve two runtimes: one at which the program generates a circuit and one at which the circuit is executed, normally with probabilistic results due to measurements. Accordingly, the language distinguishes two kinds of data: parameters, which are known at circuit generati…
▽ More
Quipper and Proto-Quipper are a family of quantum programming languages that, by their nature as circuit description languages, involve two runtimes: one at which the program generates a circuit and one at which the circuit is executed, normally with probabilistic results due to measurements. Accordingly, the language distinguishes two kinds of data: parameters, which are known at circuit generation time, and states, which are known at circuit execution time. Sometimes, it is desirable for the results of measurements to control the generation of the next part of the circuit. Therefore, the language needs to turn states, such as measurement outcomes, into parameters, an operation we call dynamic lifting. The goal of this paper is to model this interaction between the runtimes by providing a general categorical structure enriched in what we call "bisets". We demonstrate that the biset-enriched structure achieves a proper semantics of the two runtimes and their interaction, by showing that it models a variant of Proto-Quipper with dynamic lifting. The present paper deals with the concrete categorical semantics of this language, whereas a companion paper deals with the syntax, type system, operational semantics, and abstract categorical semantics.
△ Less
Submitted 15 November, 2023; v1 submitted 27 April, 2022;
originally announced April 2022.
-
A Physics-Guided Neural Operator Learning Approach to Model Biological Tissues from Digital Image Correlation Measurements
Authors:
Huaiqian You,
Quinn Zhang,
Colton J. Ross,
Chung-Hao Lee,
Ming-Chen Hsu,
Yue Yu
Abstract:
We present a data-driven workflow to biological tissue modeling, which aims to predict the displacement field based on digital image correlation (DIC) measurements under unseen loading scenarios, without postulating a specific constitutive model form nor possessing knowledges on the material microstructure. To this end, a material database is constructed from the DIC displacement tracking measurem…
▽ More
We present a data-driven workflow to biological tissue modeling, which aims to predict the displacement field based on digital image correlation (DIC) measurements under unseen loading scenarios, without postulating a specific constitutive model form nor possessing knowledges on the material microstructure. To this end, a material database is constructed from the DIC displacement tracking measurements of multiple biaxial stretching protocols on a porcine tricuspid valve anterior leaflet, with which we build a neural operator learning model. The material response is modeled as a solution operator from the loading to the resultant displacement field, with the material microstructure properties learned implicitly from the data and naturally embedded in the network parameters. Using various combinations of loading protocols, we compare the predictivity of this framework with finite element analysis based on the phenomenological Fung-type model. From in-distribution tests, the predictivity of our approach presents good generalizability to different loading conditions and outperforms the conventional constitutive modeling at approximately one order of magnitude. When tested on out-of-distribution loading ratios, the neural operator learning approach becomes less effective. To improve the generalizability of our framework, we propose a physics-guided neural operator learning model via imposing partial physics knowledge. This method is shown to improve the model's extrapolative performance in the small-deformation regime. Our results demonstrate that with sufficient data coverage and/or guidance from partial physics constraints, the data-driven approach can be a more effective method for modeling biological materials than the traditional constitutive modeling.
△ Less
Submitted 1 April, 2022;
originally announced April 2022.
-
Learning Deep Implicit Fourier Neural Operators (IFNOs) with Applications to Heterogeneous Material Modeling
Authors:
Huaiqian You,
Quinn Zhang,
Colton J. Ross,
Chung-Hao Lee,
Yue Yu
Abstract:
Constitutive modeling based on continuum mechanics theory has been a classical approach for modeling the mechanical responses of materials. However, when constitutive laws are unknown or when defects and/or high degrees of heterogeneity are present, these classical models may become inaccurate. In this work, we propose to use data-driven modeling, which directly utilizes high-fidelity simulation a…
▽ More
Constitutive modeling based on continuum mechanics theory has been a classical approach for modeling the mechanical responses of materials. However, when constitutive laws are unknown or when defects and/or high degrees of heterogeneity are present, these classical models may become inaccurate. In this work, we propose to use data-driven modeling, which directly utilizes high-fidelity simulation and/or experimental measurements to predict a material's response without using conventional constitutive models. Specifically, the material response is modeled by learning the implicit mappings between loading conditions and the resultant displacement and/or damage fields, with the neural network serving as a surrogate for a solution operator. To model the complex responses due to material heterogeneity and defects, we develop a novel deep neural operator architecture, which we coin as the Implicit Fourier Neural Operator (IFNO). In the IFNO, the increment between layers is modeled as an integral operator to capture the long-range dependencies in the feature space. As the network gets deeper, the limit of IFNO becomes a fixed point equation that yields an implicit neural operator and naturally mimics the displacement/damage fields solving procedure in material modeling problems. We demonstrate the performance of our proposed method for a number of examples, including hyperelastic, anisotropic and brittle materials. As an application, we further employ the proposed approach to learn the material models directly from digital image correlation (DIC) tracking measurements, and show that the learned solution operators substantially outperform the conventional constitutive models in predicting displacement fields.
△ Less
Submitted 15 March, 2022;
originally announced March 2022.
-
Generators and Relations for Real Stabilizer Operators
Authors:
Justin Makary,
Neil J. Ross,
Peter Selinger
Abstract:
Real stabilizer operators, which are also known as real Clifford operators, are generated, through composition and tensor product, by the Hadamard gate, the Pauli Z gate, and the controlled-Z gate. We introduce a normal form for real stabilizer circuits and show that every real stabilizer operator admits a unique normal form. Moreover, we give a finite set of relations that suffice to rewrite any…
▽ More
Real stabilizer operators, which are also known as real Clifford operators, are generated, through composition and tensor product, by the Hadamard gate, the Pauli Z gate, and the controlled-Z gate. We introduce a normal form for real stabilizer circuits and show that every real stabilizer operator admits a unique normal form. Moreover, we give a finite set of relations that suffice to rewrite any real stabilizer circuit to its normal form.
△ Less
Submitted 12 September, 2021;
originally announced September 2021.
-
Primordial non-Gaussianity from the Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey I: Catalogue Preparation and Systematic Mitigation
Authors:
Mehdi Rezaie,
Ashley J. Ross,
Hee-Jong Seo,
Eva-Maria Mueller,
Will J. Percival,
Grant Merz,
Reza Katebi,
Razvan C. Bunescu,
Julian Bautista,
Joel R. Brownstein,
Etienne Burtin,
Kyle Dawson,
Héctor Gil-Marín,
Jiamin Hou,
Eleanor B. Lyke,
Axel de la Macorra,
Graziano Rossi,
Donald P. Schneider,
Pauline Zarrouk,
Gong-Bo Zhao
Abstract:
We investigate the large-scale clustering of the final spectroscopic sample of quasars from the recently completed extended Baryon Oscillation Spectroscopic Survey (eBOSS). The sample contains $343708$ objects in the redshift range $0.8<z<2.2$ and $72667$ objects with redshifts $2.2<z<3.5$, covering an effective area of $4699~{\rm deg}^{2}$. We develop a neural network-based approach to mitigate s…
▽ More
We investigate the large-scale clustering of the final spectroscopic sample of quasars from the recently completed extended Baryon Oscillation Spectroscopic Survey (eBOSS). The sample contains $343708$ objects in the redshift range $0.8<z<2.2$ and $72667$ objects with redshifts $2.2<z<3.5$, covering an effective area of $4699~{\rm deg}^{2}$. We develop a neural network-based approach to mitigate spurious fluctuations in the density field caused by spatial variations in the quality of the imaging data used to select targets for follow-up spectroscopy. Simulations are used with the same angular and radial distributions as the real data to estimate covariance matrices, perform error analyses, and assess residual systematic uncertainties. We measure the mean density contrast and cross-correlations of the eBOSS quasars against maps of potential sources of imaging systematics to address algorithm effectiveness, finding that the neural network-based approach outperforms standard linear regression. Stellar density is one of the most important sources of spurious fluctuations, and a new template constructed using data from the Gaia spacecraft provides the best match to the observed quasar clustering. The end-product from this work is a new value-added quasar catalogue with the improved weights to correct for nonlinear imaging systematic effects, which will be made public. Our quasar catalogue is used to measure the local-type primordial non-Gaussianity in our companion paper, Mueller et al. in preparation.
△ Less
Submitted 25 June, 2021;
originally announced June 2021.
-
Large-Scale Chemical Language Representations Capture Molecular Structure and Properties
Authors:
Jerret Ross,
Brian Belgodere,
Vijil Chenthamarakshan,
Inkit Padhi,
Youssef Mroueh,
Payel Das
Abstract:
Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Various supervised machine learning models have demonstrated promising performance, but the vast chemical space and the limited availability of property labels make supervised learning challenging. Recently, unsupervised transformer-based languag…
▽ More
Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Various supervised machine learning models have demonstrated promising performance, but the vast chemical space and the limited availability of property labels make supervised learning challenging. Recently, unsupervised transformer-based language models pretrained on a large unlabelled corpus have produced state-of-the-art results in many downstream natural language processing tasks. Inspired by this development, we present molecular embeddings obtained by training an efficient transformer encoder model, MoLFormer, which uses rotary positional embeddings. This model employs a linear attention mechanism, coupled with highly distributed training, on SMILES sequences of 1.1 billion unlabelled molecules from the PubChem and ZINC datasets. We show that the learned molecular representation outperforms existing baselines, including supervised and self-supervised graph neural networks and language models, on several downstream tasks from ten benchmark datasets. They perform competitively on two others. Further analyses, specifically through the lens of attention, demonstrate that MoLFormer trained on chemical SMILES indeed learns the spatial relationships between atoms within a molecule. These results provide encouraging evidence that large-scale molecular language models can capture sufficient chemical and structural information to predict various distinct molecular properties, including quantum-chemical properties.
△ Less
Submitted 14 December, 2022; v1 submitted 17 June, 2021;
originally announced June 2021.
-
Generators and Relations for the Group On(Z[1/2])
Authors:
Sarah Meng Li,
Neil J. Ross,
Peter Selinger
Abstract:
We give a finite presentation by generators and relations for the group O_n(Z[1/2]) of n-dimensional orthogonal matrices with entries in Z[1/2]. We then obtain a similar presentation for the group of n-dimensional orthogonal matrices of the form M/sqrt(2)^k, where k is a nonnegative integer and M is an integer matrix. Both groups arise in the study of quantum circuits. In particular, when the dime…
▽ More
We give a finite presentation by generators and relations for the group O_n(Z[1/2]) of n-dimensional orthogonal matrices with entries in Z[1/2]. We then obtain a similar presentation for the group of n-dimensional orthogonal matrices of the form M/sqrt(2)^k, where k is a nonnegative integer and M is an integer matrix. Both groups arise in the study of quantum circuits. In particular, when the dimension is a power of 2, the elements of the latter group are precisely the unitary matrices that can be represented by a quantum circuit over the universal gate set consisting of the Toffoli gate, the Hadamard gate, and the computational ancilla.
△ Less
Submitted 12 September, 2021; v1 submitted 2 June, 2021;
originally announced June 2021.
-
Theory meets Practice at the Median: a worst case comparison of relative error quantile algorithms
Authors:
Graham Cormode,
Abhinav Mishra,
Joseph Ross,
Pavel Veselý
Abstract:
Estimating the distribution and quantiles of data is a foundational task in data mining and data science. We study algorithms which provide accurate results for extreme quantile queries using a small amount of space, thus helping to understand the tails of the input distribution. Namely, we focus on two recent state-of-the-art solutions: $t$-digest and ReqSketch. While $t$-digest is a popular comp…
▽ More
Estimating the distribution and quantiles of data is a foundational task in data mining and data science. We study algorithms which provide accurate results for extreme quantile queries using a small amount of space, thus helping to understand the tails of the input distribution. Namely, we focus on two recent state-of-the-art solutions: $t$-digest and ReqSketch. While $t$-digest is a popular compact summary which works well in a variety of settings, ReqSketch comes with formal accuracy guarantees at the cost of its size growing as new observations are inserted. In this work, we provide insight into which conditions make one preferable to the other. Namely, we show how to construct inputs for $t$-digest that induce an almost arbitrarily large error and demonstrate that it fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution. We propose practical improvements to ReqSketch, making it faster than $t$-digest, while its error stays bounded on any instance. Still, our results confirm that $t$-digest remains more accurate on the ``non-adversarial'' data encountered in practice.
△ Less
Submitted 10 June, 2021; v1 submitted 18 February, 2021;
originally announced February 2021.
-
Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge
Authors:
Pierre Dognin,
Igor Melnyk,
Youssef Mroueh,
Inkit Padhi,
Mattia Rigotti,
Jarret Ross,
Yair Schiff,
Richard A. Young,
Brian Belgodere
Abstract:
Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated dataset like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on the…
▽ More
Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated dataset like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on these datasets limited as an assistive technology in real-world settings, such as helping visually impaired people navigate and accomplish everyday tasks. This gap motivated the introduction of the novel VizWiz dataset, which consists of images taken by the visually impaired and captions that have useful, task-oriented information. In an attempt to help the machine learning computer vision field realize its promise of producing technologies that have positive social impact, the curators of the VizWiz dataset host several competitions, including one for image captioning. This work details the theory and engineering from our winning submission to the 2020 captioning competition. Our work provides a step towards improved assistive image captioning systems.
△ Less
Submitted 18 June, 2021; v1 submitted 21 December, 2020;
originally announced December 2020.
-
Alleviating Noisy Data in Image Captioning with Cooperative Distillation
Authors:
Pierre Dognin,
Igor Melnyk,
Youssef Mroueh,
Inkit Padhi,
Mattia Rigotti,
Jarret Ross,
Yair Schiff
Abstract:
Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or Vizwiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose…
▽ More
Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or Vizwiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose a new technique, cooperative distillation that combines clean curated datasets with the web-scale automatically extracted captions of the Google Conceptual Captions dataset (GCC), which can have poor descriptions of images, but is abundant in size and therefore provides a rich vocabulary resulting in more expressive captions.
△ Less
Submitted 21 December, 2020;
originally announced December 2020.
-
Language-Driven Region Pointer Advancement for Controllable Image Captioning
Authors:
Annika Lindh,
Robert J. Ross,
John D. Kelleher
Abstract:
Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning…
▽ More
Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model implementing this technique improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size.
△ Less
Submitted 30 November, 2020;
originally announced November 2020.
-
Tabular Transformers for Modeling Multivariate Time Series
Authors:
Inkit Padhi,
Yair Schiff,
Igor Melnyk,
Mattia Rigotti,
Youssef Mroueh,
Pierre Dognin,
Jerret Ross,
Ravi Nair,
Erik Altman
Abstract:
Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learn…
▽ More
Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learning representations that is analogous to BERT and can be pre-trained end-to-end and used in downstream tasks, and one that is akin to GPT and can be used for generation of realistic synthetic tabular sequences. We demonstrate our models on two datasets: a synthetic credit card transaction dataset, where the learned representations are used for fraud detection and synthetic data generation, and on a real pollution dataset, where the learned encodings are used to predict atmospheric pollutant concentrations. Code and data are available at https://github.com/IBM/TabFormer.
△ Less
Submitted 11 February, 2021; v1 submitted 3 November, 2020;
originally announced November 2020.
-
Narratives and Needs: Analyzing Experiences of Cyclone Amphan Using Twitter Discourse
Authors:
Ancil Crayton,
João Fonseca,
Kanav Mehra,
Michelle Ng,
Jared Ross,
Marcelo Sandoval-Castañeda,
Rachel von Gnechten
Abstract:
People often turn to social media to comment upon and share information about major global events. Accordingly, social media is receiving increasing attention as a rich data source for understanding people's social, political and economic experiences of extreme weather events. In this paper, we contribute two novel methodologies that leverage Twitter discourse to characterize narratives and identi…
▽ More
People often turn to social media to comment upon and share information about major global events. Accordingly, social media is receiving increasing attention as a rich data source for understanding people's social, political and economic experiences of extreme weather events. In this paper, we contribute two novel methodologies that leverage Twitter discourse to characterize narratives and identify unmet needs in response to Cyclone Amphan, which affected 18 million people in May 2020.
△ Less
Submitted 11 September, 2020;
originally announced September 2020.
-
Sampling hypergraphs with given degrees
Authors:
Martin Dyer,
Catherine Greenhill,
Pieter Kleer,
James Ross,
Leen Stougie
Abstract:
There is a well-known connection between hypergraphs and bipartite graphs, obtained by treating the incidence matrix of the hypergraph as the biadjacency matrix of a bipartite graph. We use this connection to describe and analyse a rejection sampling algorithm for sampling simple uniform hypergraphs with a given degree sequence. Our algorithm uses, as a black box, an algorithm $\mathcal{A}$ for sa…
▽ More
There is a well-known connection between hypergraphs and bipartite graphs, obtained by treating the incidence matrix of the hypergraph as the biadjacency matrix of a bipartite graph. We use this connection to describe and analyse a rejection sampling algorithm for sampling simple uniform hypergraphs with a given degree sequence. Our algorithm uses, as a black box, an algorithm $\mathcal{A}$ for sampling bipartite graphs with given degrees, uniformly or nearly uniformly, in (expected) polynomial time. The expected runtime of the hypergraph sampling algorithm depends on the (expected) runtime of the bipartite graph sampling algorithm $\mathcal{A}$, and the probability that a uniformly random bipartite graph with given degrees corresponds to a simple hypergraph. We give some conditions on the hypergraph degree sequence which guarantee that this probability is bounded below by a positive constant.
△ Less
Submitted 13 July, 2021; v1 submitted 22 June, 2020;
originally announced June 2020.
-
Fast Mixing of Multi-Scale Langevin Dynamics under the Manifold Hypothesis
Authors:
Adam Block,
Youssef Mroueh,
Alexander Rakhlin,
Jerret Ross
Abstract:
Recently, the task of image generation has attracted much attention. In particular, the recent empirical successes of the Markov Chain Monte Carlo (MCMC) technique of Langevin Dynamics have prompted a number of theoretical advances; despite this, several outstanding problems remain. First, the Langevin Dynamics is run in very high dimension on a nonconvex landscape; in the worst case, due to the N…
▽ More
Recently, the task of image generation has attracted much attention. In particular, the recent empirical successes of the Markov Chain Monte Carlo (MCMC) technique of Langevin Dynamics have prompted a number of theoretical advances; despite this, several outstanding problems remain. First, the Langevin Dynamics is run in very high dimension on a nonconvex landscape; in the worst case, due to the NP-hardness of nonconvex optimization, it is thought that Langevin Dynamics mixes only in time exponential in the dimension. In this work, we demonstrate how the manifold hypothesis allows for the considerable reduction of mixing time, from exponential in the ambient dimension to depending only on the (much smaller) intrinsic dimension of the data. Second, the high dimension of the sampling space significantly hurts the performance of Langevin Dynamics; we leverage a multi-scale approach to help ameliorate this issue and observe that this multi-resolution algorithm allows for a trade-off between image quality and computational expense in generation.
△ Less
Submitted 22 June, 2020; v1 submitted 19 June, 2020;
originally announced June 2020.
-
Asymmetric scale functions for $t$-digests
Authors:
Joseph Ross
Abstract:
The $t$-digest is a data structure that can be queried for approximate quantiles, with greater accuracy near the minimum and maximum of the distribution. We develop a $t$-digest variant with accuracy asymmetric about the median, thereby making possible alternative tradeoffs between computational resources and accuracy which may be of particular interest for distributions with significant skew. Aft…
▽ More
The $t$-digest is a data structure that can be queried for approximate quantiles, with greater accuracy near the minimum and maximum of the distribution. We develop a $t$-digest variant with accuracy asymmetric about the median, thereby making possible alternative tradeoffs between computational resources and accuracy which may be of particular interest for distributions with significant skew. After establishing some theoretical properties of scale functions for $t$-digests, we show that a tangent line construction on the familiar scale functions preserves the crucial properties that allow $t$-digests to operate online and be mergeable. We conclude with an empirical study demonstrating the asymmetric variant preserves accuracy on one side of the distribution with a much smaller memory footprint.
△ Less
Submitted 19 May, 2020;
originally announced May 2020.
-
A tutorial introduction to quantum circuit programming in dependently typed Proto-Quipper
Authors:
Peng Fu,
Kohei Kishida,
Neil J. Ross,
Peter Selinger
Abstract:
We introduce dependently typed Proto-Quipper, or Proto-Quipper-D for short, an experimental quantum circuit programming language with linear dependent types. We give several examples to illustrate how linear dependent types can help in the construction of correct quantum circuits. Specifically, we show how dependent types enable programming families of circuits, and how dependent types solve the p…
▽ More
We introduce dependently typed Proto-Quipper, or Proto-Quipper-D for short, an experimental quantum circuit programming language with linear dependent types. We give several examples to illustrate how linear dependent types can help in the construction of correct quantum circuits. Specifically, we show how dependent types enable programming families of circuits, and how dependent types solve the problem of type-safe uncomputation of garbage qubits. We also discuss other language features along the way.
△ Less
Submitted 12 December, 2020; v1 submitted 17 May, 2020;
originally announced May 2020.
-
Generative-based Airway and Vessel Morphology Quantification on Chest CT Images
Authors:
Pietro Nardelli,
James C. Ross,
Raúl San José Estépar
Abstract:
Accurately and precisely characterizing the morphology of small pulmonary structures from Computed Tomography (CT) images, such as airways and vessels, is becoming of great importance for diagnosis of pulmonary diseases. The smaller conducting airways are the major site of increased airflow resistance in chronic obstructive pulmonary disease (COPD), while accurately sizing vessels can help identif…
▽ More
Accurately and precisely characterizing the morphology of small pulmonary structures from Computed Tomography (CT) images, such as airways and vessels, is becoming of great importance for diagnosis of pulmonary diseases. The smaller conducting airways are the major site of increased airflow resistance in chronic obstructive pulmonary disease (COPD), while accurately sizing vessels can help identify arterial and venous changes in lung regions that may determine future disorders. However, traditional methods are often limited due to image resolution and artifacts.
We propose a Convolutional Neural Regressor (CNR) that provides cross-sectional measurement of airway lumen, airway wall thickness, and vessel radius. CNR is trained with data created by a generative model of synthetic structures which is used in combination with Simulated and Unsupervised Generative Adversarial Network (SimGAN) to create simulated and refined airways and vessels with known ground-truth.
For validation, we first use synthetically generated airways and vessels produced by the proposed generative model to compute the relative error and directly evaluate the accuracy of CNR in comparison with traditional methods. Then, in-vivo validation is performed by analyzing the association between the percentage of the predicted forced expiratory volume in one second (FEV1\%) and the value of the Pi10 parameter, two well-known measures of lung function and airway disease, for airways. For vessels, we assess the correlation between our estimate of the small-vessel blood volume and the lungs' diffusing capacity for carbon monoxide (DLCO).
The results demonstrate that Convolutional Neural Networks (CNNs) provide a promising direction for accurately measuring vessels and airways on chest CT images with physiological correlates.
△ Less
Submitted 13 March, 2020; v1 submitted 13 February, 2020;
originally announced February 2020.
-
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets
Authors:
Mingrui Liu,
Youssef Mroueh,
Jerret Ross,
Wei Zhang,
Xiaodong Cui,
Payel Das,
Tianbao Yang
Abstract:
Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While adaptive gradient methods theory is well understood for minimization problems, the underlying factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim at bridging this gap from both theoretical an…
▽ More
Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While adaptive gradient methods theory is well understood for minimization problems, the underlying factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim at bridging this gap from both theoretical and empirical perspectives. First, we analyze a variant of Optimistic Stochastic Gradient (OSG) proposed in~\citep{daskalakis2017training} for solving a class of non-convex non-concave min-max problem and establish $O(ε^{-4})$ complexity for finding $ε$-first-order stationary point, in which the algorithm only requires invoking one stochastic first-order oracle while enjoying state-of-the-art iteration complexity achieved by stochastic extragradient method by~\citep{iusem2017extragradient}. Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an \emph{improved} adaptive complexity $O\left(ε^{-\frac{2}{1-α}}\right)$, where $α$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq α\leq 1/2$. To the best of our knowledge, this is the first work for establishing adaptive complexity in non-convex non-concave min-max optimization. Empirically, our experiments show that indeed adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training. Moreover, this observation can be explained by the slow growth rate of the cumulative stochastic gradient, as observed empirically.
△ Less
Submitted 24 December, 2020; v1 submitted 26 December, 2019;
originally announced December 2019.
-
A Decentralized Parallel Algorithm for Training Generative Adversarial Nets
Authors:
Mingrui Liu,
Wei Zhang,
Youssef Mroueh,
Xiaodong Cui,
Jerret Ross,
Tianbao Yang,
Payel Das
Abstract:
Generative Adversarial Networks (GANs) are a powerful class of generative models in the deep learning community. Current practice on large-scale GAN training utilizes large models and distributed large-batch training strategies, and is implemented on deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) designed in a centralized manner. In the centralized network topology, every worker needs…
▽ More
Generative Adversarial Networks (GANs) are a powerful class of generative models in the deep learning community. Current practice on large-scale GAN training utilizes large models and distributed large-batch training strategies, and is implemented on deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) designed in a centralized manner. In the centralized network topology, every worker needs to either directly communicate with the central node or indirectly communicate with all other workers in every iteration. However, when the network bandwidth is low or network latency is high, the performance would be significantly degraded. Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner. The main difficulty lies at handling the nonconvex-nonconcave min-max optimization and the decentralized communication simultaneously. In this paper, we address this difficulty by designing the \textbf{first gradient-based decentralized parallel algorithm} which allows workers to have multiple rounds of communications in one iteration and to update the discriminator and generator simultaneously, and this design makes it amenable for the convergence analysis of the proposed decentralized algorithm. Theoretically, our proposed decentralized algorithm is able to solve a class of non-convex non-concave min-max problems with provable non-asymptotic convergence to first-order stationary point. Experimental results on GANs demonstrate the effectiveness of the proposed algorithm.
△ Less
Submitted 19 October, 2020; v1 submitted 28 October, 2019;
originally announced October 2019.
-
Wasserstein Barycenter Model Ensembling
Authors:
Pierre Dognin,
Igor Melnyk,
Youssef Mroueh,
Jerret Ross,
Cicero Dos Santos,
Tom Sercu
Abstract:
In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement b…
▽ More
In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning and image captioning generation. These results show that the W. ensembling is a viable alternative to the basic geometric or arithmetic mean ensembling.
△ Less
Submitted 13 February, 2019;
originally announced February 2019.
-
Generating Diverse and Meaningful Captions
Authors:
Annika Lindh,
Robert J. Ross,
Abhijit Mahalunkar,
Giancarlo Salton,
John D. Kelleher
Abstract:
Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates mo…
▽ More
Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online.
△ Less
Submitted 19 December, 2018;
originally announced December 2018.
-
Exploring the Use of Attention within an Neural Machine Translation Decoder States to Translate Idioms
Authors:
Giancarlo D. Salton,
Robert J. Ross,
John D. Kelleher
Abstract:
Idioms pose problems to almost all Machine Translation systems. This type of language is very frequent in day-to-day language use and cannot be simply ignored. The recent interest in memory augmented models in the field of Language Modelling has aided the systems to achieve good results by bridging long-distance dependencies. In this paper we explore the use of such techniques into a Neural Machin…
▽ More
Idioms pose problems to almost all Machine Translation systems. This type of language is very frequent in day-to-day language use and cannot be simply ignored. The recent interest in memory augmented models in the field of Language Modelling has aided the systems to achieve good results by bridging long-distance dependencies. In this paper we explore the use of such techniques into a Neural Machine Translation system to help in translation of idiomatic language.
△ Less
Submitted 10 October, 2018;
originally announced October 2018.
-
Adversarial Semantic Alignment for Improved Image Captions
Authors:
Pierre L. Dognin,
Igor Melnyk,
Youssef Mroueh,
Jarret Ross,
Tom Sercu
Abstract:
In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient…
▽ More
In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without accessing discriminator gradients directly. We also address the problem of automatic evaluation for captioning models and introduce a new semantic score, and show its correlation to human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur together. To this end, we introduce a small captioned Out of Context (OOC) test set. The OOC set, combined with our semantic score, are the proposed new diagnosis tools for the captioning community. When evaluated on OOC and MS-COCO benchmarks, we show that SCST-based training has a strong performance in both semantic score and human evaluation, promising to be a valuable new approach for efficient discrete GAN training.
△ Less
Submitted 6 June, 2019; v1 submitted 30 April, 2018;
originally announced May 2018.
-
Automated optimization of large quantum circuits with continuous parameters
Authors:
Yunseong Nam,
Neil J. Ross,
Yuan Su,
Andrew M. Childs,
Dmitri Maslov
Abstract:
We develop and implement automated methods for optimizing quantum circuits of the size and type expected in quantum computations that outperform classical computers. We show how to handle continuous gate parameters and report a collection of fast algorithms capable of optimizing large-scale quantum circuits. For the suite of benchmarks considered, we obtain substantial reductions in gate counts. I…
▽ More
We develop and implement automated methods for optimizing quantum circuits of the size and type expected in quantum computations that outperform classical computers. We show how to handle continuous gate parameters and report a collection of fast algorithms capable of optimizing large-scale quantum circuits. For the suite of benchmarks considered, we obtain substantial reductions in gate counts. In particular, we provide better optimization in significantly less time than previous approaches, while making minimal structural changes so as to preserve the basic layout of the underlying quantum algorithms. Our results help bridge the gap between the computations that can be run on existing hardware and those that are expected to outperform classical computers.
△ Less
Submitted 1 June, 2018; v1 submitted 19 October, 2017;
originally announced October 2017.
-
A Distributed Shared Memory Model and C++ Templated Meta-Programming Interface for the Epiphany RISC Array Processor
Authors:
David Richie,
James Ross,
Jamie Infantolino
Abstract:
The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Whereas such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address the unique architecture features has presented many challenges. We present here a distributed sha…
▽ More
The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Whereas such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address the unique architecture features has presented many challenges. We present here a distributed shared memory (DSM) model supported in software transparently using C++ templated metaprogramming techniques. The approach offers an extremely simple parallel programming model well suited for the architecture. Initial results are presented that demonstrate the approach and provide insight into the efficiency of the programming model and also the ability of the NoC to support a DSM without explicit control over data movement and localization.
△ Less
Submitted 26 April, 2017;
originally announced April 2017.
-
In-Datacenter Performance Analysis of a Tensor Processing Unit
Authors:
Norman P. Jouppi,
Cliff Young,
Nishant Patil,
David Patterson,
Gaurav Agrawal,
Raminder Bajwa,
Sarah Bates,
Suresh Bhatia,
Nan Boden,
Al Borchers,
Rick Boyle,
Pierre-luc Cantin,
Clifford Chao,
Chris Clark,
Jeremy Coriell,
Mike Daley,
Matt Dau,
Jeffrey Dean,
Ben Gelb,
Tara Vazir Ghaemmaghami,
Rajendra Gottipati,
William Gulland,
Robert Hagmann,
C. Richard Ho,
Doug Hogberg
, et al. (50 additional authors not shown)
Abstract:
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOp…
▽ More
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.
△ Less
Submitted 16 April, 2017;
originally announced April 2017.
-
I CAN HAS SUPERCOMPUTER? A Novel Approach to Teaching Parallel and Distributed Computing Concepts Using a Meme-Based Programming Language
Authors:
David Richie,
James Ross
Abstract:
A novel approach is presented to teach the parallel and distributed computing concepts of synchronization and remote memory access. The single program multiple data (SPMD) partitioned global address space (PGAS) model presented in this paper uses a procedural programming language appealing to undergraduate students. We propose that the amusing nature of the approach may engender creativity and int…
▽ More
A novel approach is presented to teach the parallel and distributed computing concepts of synchronization and remote memory access. The single program multiple data (SPMD) partitioned global address space (PGAS) model presented in this paper uses a procedural programming language appealing to undergraduate students. We propose that the amusing nature of the approach may engender creativity and interest using these concepts later in more sober environments. Specifically, we implement parallel extensions to LOLCODE within a source-to-source compiler sufficient for the development of parallel and distributed algorithms normally implemented using conventional high-performance computing languages and APIs.
△ Less
Submitted 29 March, 2017;
originally announced March 2017.
-
A Finite Presentation of CNOT-Dihedral Operators
Authors:
Matthew Amy,
Jianxin Chen,
Neil J. Ross
Abstract:
We give a finite presentation by generators and relations of the unitary operators expressible over the {CNOT, T, X} gate set, also known as CNOT-dihedral operators. To this end, we introduce a notion of normal form for CNOT-dihedral circuits and prove that every CNOT-dihedral operator admits a unique normal form. Moreover, we show that in the presence of certain structural rules only finitely man…
▽ More
We give a finite presentation by generators and relations of the unitary operators expressible over the {CNOT, T, X} gate set, also known as CNOT-dihedral operators. To this end, we introduce a notion of normal form for CNOT-dihedral circuits and prove that every CNOT-dihedral operator admits a unique normal form. Moreover, we show that in the presence of certain structural rules only finitely many circuit identities are required to reduce an arbitrary CNOT-dihedral circuit to its normal form.
By appropriately restricting our relations, we obtain a finite presentation of unitary operators expressible over the {CNOT, T} gate set as a corollary.
△ Less
Submitted 28 April, 2019; v1 submitted 31 December, 2016;
originally announced January 2017.
-
Self-critical Sequence Training for Image Captioning
Authors:
Steven J. Rennie,
Etienne Marcheret,
Youssef Mroueh,
Jarret Ross,
Vaibhava Goel
Abstract:
Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, signifi…
▽ More
Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a "baseline" to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) is avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation sever establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 114.7.
△ Less
Submitted 15 November, 2017; v1 submitted 1 December, 2016;
originally announced December 2016.
-
A data-driven model for influenza transmission incorporating media effects
Authors:
Lewis Mitchell,
Joshua V. Ross
Abstract:
Numerous studies have attempted to model the effect of mass media on the transmission of diseases such as influenza, however quantitative data on media engagement has until recently been difficult to obtain. With the recent explosion of "big data" coming from online social media and the like, large volumes of data on a population's engagement with mass media during an epidemic are becoming availab…
▽ More
Numerous studies have attempted to model the effect of mass media on the transmission of diseases such as influenza, however quantitative data on media engagement has until recently been difficult to obtain. With the recent explosion of "big data" coming from online social media and the like, large volumes of data on a population's engagement with mass media during an epidemic are becoming available to researchers. In this study we combine an online data set comprising millions of shared messages relating to influenza with traditional surveillance data on flu activity to suggest a functional form for the relationship between the two. Using this data we present a simple deterministic model for influenza dynamics incorporating media effects, and show that such a model helps explain the dynamics of historical influenza outbreaks. Furthermore, through model selection we show that the proposed media function fits historical data better than other media functions proposed in earlier studies.
△ Less
Submitted 27 September, 2016;
originally announced September 2016.
-
OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture
Authors:
David Richie,
James Ross
Abstract:
There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with min…
▽ More
There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with minimal uncore functionality connected by a 2D mesh Network-on-Chip (NoC). The Epiphany architecture offers high computational energy efficiency for integer and floating point calculations as well as parallel scalability. The Epiphany-III is available as a coprocessor in platforms that also utilize an ARM CPU host. OpenCL provides good functionality for supporting a co-design programming model in which the host CPU offloads parallel work to a coprocessor. However, the OpenCL memory model is inconsistent with the Epiphany memory architecture and lacks support for inter-core communication. We propose a hybrid programming model in which OpenSHMEM provides a better solution by replacing the non-standard OpenCL extensions introduced to achieve high performance with the Epiphany architecture. We demonstrate the proposed programming model for matrix-matrix multiplication based on Cannon's algorithm showing that the hybrid model addresses the deficiencies of using OpenCL alone to achieve good benchmark performance.
△ Less
Submitted 11 August, 2016;
originally announced August 2016.
-
An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor
Authors:
James Ross,
David Richie
Abstract:
This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core scalability with a physically compact 2D array of RISC CPU cores and a fast network-on-chip (NoC). While fully capable of MPMD execution, the physical topology and…
▽ More
This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core scalability with a physically compact 2D array of RISC CPU cores and a fast network-on-chip (NoC). While fully capable of MPMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD execution with SHMEM.
△ Less
Submitted 11 August, 2016;
originally announced August 2016.
-
Overcoming the language barrier in mobile user interface design: A case study on a mobile health app
Authors:
Jason Ross,
Jing Gao
Abstract:
This research report proposes a structured solution to address the need for awareness of cultural and language in user design. It will include evaluated research on established methods that already exist. Discussed ideas about how to address this situation include: what others have found to take into consideration when using design principles to develop an interface, detailed troubles and critical…
▽ More
This research report proposes a structured solution to address the need for awareness of cultural and language in user design. It will include evaluated research on established methods that already exist. Discussed ideas about how to address this situation include: what others have found to take into consideration when using design principles to develop an interface, detailed troubles and critical issues that have been previously identified and also ways that have been found already to overcome such issues. This will also involve designing a prototype application catering to resolving these issues. Overcoming the language barrier plays an important role in the process of implementing a user design interface that will satisfy users. This issue must be researched and examined to identify the issues and concerns associated in order to provide a solution in an ethical manner.
△ Less
Submitted 16 May, 2016;
originally announced May 2016.
-
Advances in Run-Time Performance and Interoperability for the Adapteva Epiphany Coprocessor
Authors:
David A. Richie,
James A. Ross
Abstract:
The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). The architecture presents many features and constraints which contribute to software design challenges for the application developer. Addressing these challenges within the software stack that supports application development i…
▽ More
The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). The architecture presents many features and constraints which contribute to software design challenges for the application developer. Addressing these challenges within the software stack that supports application development is critical to improving productivity and expanding the range of applications for the architecture. We report here on advances that have been made in the COPRTHR-2 software stack targeting the Epiphany architecture that address critical issues identified in previous work. Specifically, we describe improvements that bring greater control and precision to the design of compact compiled binary programs in the context of the limited per-core local memory of the architecture. We describe a new design for run-time support that has been implemented to dramatically improve the program load and execute performance and capabilities. Finally, we describe developments that advance host-coprocessor interoperability to expand the functionality available to the application developer.
△ Less
Submitted 14 April, 2016;
originally announced April 2016.
-
Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor
Authors:
James A. Ross,
David A. Richie
Abstract:
The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to partitioned global address space (PGAS) parallel programming models. Following…
▽ More
The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to partitioned global address space (PGAS) parallel programming models. Following an investigation into the use of two-sided communication using threaded MPI, one-sided communication using SHMEM is being explored. Here we present work in progress on the development of an OpenSHMEM 1.2 implementation for the Epiphany architecture.
△ Less
Submitted 14 April, 2016;
originally announced April 2016.
-
Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI
Authors:
James A. Ross,
David A. Richie,
Song J. Park,
Dale R. Shires
Abstract:
The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This…
▽ More
The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a conventional parallel distributed cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms (dense matrix-matrix multiplication, N-body particle interaction, a five-point 2D stencil update, and 2D FFT) and highlight the importance of fast inter-core communication for the architecture.
△ Less
Submitted 17 June, 2015;
originally announced June 2015.
-
Understanding the Heavy Tailed Dynamics in Human Behavior
Authors:
Gordon J Ross,
Tim Jones
Abstract:
The recent availability of electronic datasets containing large volumes of communication data has made it possible to study human behavior on a larger scale than ever before. From this, it has been discovered that across a diverse range of data sets, the inter-event times between consecutive communication events obey heavy tailed power law dynamics. Explaining this has proved controversial, and tw…
▽ More
The recent availability of electronic datasets containing large volumes of communication data has made it possible to study human behavior on a larger scale than ever before. From this, it has been discovered that across a diverse range of data sets, the inter-event times between consecutive communication events obey heavy tailed power law dynamics. Explaining this has proved controversial, and two distinct hypotheses have emerged. The first holds that these power laws are fundamental, and arise from the mechanisms such as priority queuing that humans use to schedule tasks. The second holds that they are a statistical artifact which only occur in aggregated data when features such as circadian rhythms and burstiness are ignored. We use a large social media data set to test these hypotheses, and find that although models that incorporate circadian rhythms and burstiness do explain part of the observed heavy tails, there is residual unexplained heavy tail behavior which suggests a more fundamental cause. Based on this, we develop a new quantitative model of human behavior which improves on existing approaches, and gives insight into the mechanisms underlying human interactions.
△ Less
Submitted 6 May, 2015;
originally announced May 2015.