subscribe to arXiv mailings

Distributional Preference Alignment of LLMs via Optimal Transport

Authors: Igor Melnyk, Youssef Mroueh, Brian Belgodere, Mattia Rigotti, Apoorva Nitsure, Mikhail Yurochkin, Kristjan Greenewald, Jiri Navratil, Jerret Ross

Abstract: Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically… ▽ More Current LLM alignment techniques use pairwise human preferences at a sample level, and as such, they do not imply an alignment on the distributional level. We propose in this paper Alignment via Optimal Transport (AOT), a novel method for distributional preference alignment of LLMs. AOT aligns LLMs on unpaired preference data by making the reward distribution of the positive samples stochastically dominant in the first order on the distribution of negative samples. We introduce a convex relaxation of this first-order stochastic dominance and cast it as an optimal transport problem with a smooth and convex cost. Thanks to the one-dimensional nature of the resulting optimal transport problem and the convexity of the cost, it has a closed-form solution via sorting on empirical measures. We fine-tune LLMs with this AOT objective, which enables alignment by penalizing the violation of the stochastic dominance of the reward distribution of the positive samples on the reward distribution of the negative samples. We analyze the sample complexity of AOT by considering the dual of the OT problem and show that it converges at the parametric rate. Empirically, we show on a diverse set of alignment datasets and LLMs that AOT leads to state-of-the-art models in the 7B family of models when evaluated with Open LLM Benchmarks and AlpacaEval. △ Less

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.04912 [pdf, other]

GP-MoLFormer: A Foundation Model For Molecular Generation

Authors: Jerret Ross, Brian Belgodere, Samuel C. Hoffman, Vijil Chenthamarakshan, Youssef Mroueh, Payel Das

Abstract: Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we extend the paradigm of training chemical language transformers on large-scale chemical datasets to generative tasks in this work. Specifically, we propose GP-MoLFo… ▽ More Transformer-based models trained on large and general purpose datasets consisting of molecular strings have recently emerged as a powerful tool for successfully modeling various structure-property relations. Inspired by this success, we extend the paradigm of training chemical language transformers on large-scale chemical datasets to generative tasks in this work. Specifically, we propose GP-MoLFormer, an autoregressive molecular string generator that is trained on more than 1.1B chemical SMILES. GP-MoLFormer uses a 46.8M parameter transformer decoder model with linear attention and rotary positional encodings as the base architecture. We explore the utility of GP-MoLFormer in generating novel, valid, and unique SMILES. Impressively, we find GP-MoLFormer is able to generate a significant fraction of novel, valid, and unique SMILES even when the number of generated molecules is in the 10 billion range and the reference set is over a billion. We also find strong memorization of training data in GP-MoLFormer generations, which has so far remained unexplored for chemical language models. Our analyses reveal that training data memorization and novelty in generations are impacted by the quality of the training data; duplication bias in training data can enhance memorization at the cost of lowering novelty. We evaluate GP-MoLFormer's utility and compare it with that of existing baselines on three different tasks: de novo generation, scaffold-constrained molecular decoration, and unconstrained property-guided optimization. While the first two are handled with no additional training, we propose a parameter-efficient fine-tuning method for the last task, which uses property-ordered molecular pairs as input. We call this new approach pair-tuning. Our results show GP-MoLFormer performs better or comparable with baselines across all three tasks, demonstrating its general utility. △ Less

Submitted 4 April, 2024; originally announced May 2024.

arXiv:2403.14797 [pdf, other]

Preventing Catastrophic Forgetting through Memory Networks in Continuous Detection

Authors: Gaurav Bhatt, James Ross, Leonid Sigal

Abstract: Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks. Despite notable progress in continual classification, systems designed for complex vision tasks such as detection or segmentation still struggle to attain satisfactory performance. In this work, we introduce a memory-based detection transformer architecture to adapt a pre-… ▽ More Modern pre-trained architectures struggle to retain previous information while undergoing continuous fine-tuning on new tasks. Despite notable progress in continual classification, systems designed for complex vision tasks such as detection or segmentation still struggle to attain satisfactory performance. In this work, we introduce a memory-based detection transformer architecture to adapt a pre-trained DETR-style detector to new tasks while preserving knowledge from previous tasks. We propose a novel localized query function for efficient information retrieval from memory units, aiming to minimize forgetting. Furthermore, we identify a fundamental challenge in continual detection referred to as background relegation. This arises when object categories from earlier tasks reappear in future tasks, potentially without labels, leading them to be implicitly treated as background. This is an inevitable issue in continual detection or segmentation. The introduced continual optimization technique effectively tackles this challenge. Finally, we assess the performance of our proposed system on continual detection benchmarks and demonstrate that our approach surpasses the performance of existing state-of-the-art resulting in 5-7% improvements on MS-COCO and PASCAL-VOC on the task of continual detection. △ Less

Submitted 15 July, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Journal ref: European Conference on Computer Vision, 2024

arXiv:2310.07132 [pdf, other]

Risk Aware Benchmarking of Large Language Models

Authors: Apoorva Nitsure, Youssef Mroueh, Mattia Rigotti, Kristjan Greenewald, Brian Belgodere, Mikhail Yurochkin, Jiri Navratil, Igor Melnyk, Jerret Ross

Abstract: We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and math… ▽ More We propose a distributional framework for benchmarking socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative testing based on first and second order stochastic dominance of real random variables. We show that the second order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a metrics portfolio for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. The statistical significance of our tests is backed theoretically by an asymptotic analysis via central limit theorems instantiated in practice via a bootstrap variance estimate. We use our framework to compare various large language models regarding risks related to drifting from instructions and outputting toxic content. △ Less

Submitted 9 June, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: ICML 2024

arXiv:2307.01753 [pdf, other]

doi 10.1093/mnras/stae886

Local primordial non-Gaussianity from the large-scale clustering of photometric DESI luminous red galaxies

Authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Hui Kong, Anna Porredon, Lado Samushia, Edmond Chaussidon, Alex Krolewski, Arnaud de Mattia, Florian Beutler, Jessica Nicole Aguilar, Steven Ahlen, Shadab Alam, Santiago Avila, Benedict Bahr-Kalus, Jose Bermejo-Climent, David Brooks, Todd Claybaugh, Shaun Cole, Kyle Dawson, Axel de la Macorra, Peter Doel, Andreu Font-Ribera, Jaime E. Forero-Romero, Satya Gontcho A Gontcho , et al. (24 additional authors not shown)

Abstract: We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $\fnl$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the… ▽ More We use angular clustering of luminous red galaxies from the Dark Energy Spectroscopic Instrument (DESI) imaging surveys to constrain the local primordial non-Gaussianity parameter $\fnl$. Our sample comprises over 12 million targets, covering 14,000 square degrees of the sky, with redshifts in the range $0.2< z < 1.35$. We identify Galactic extinction, survey depth, and astronomical seeing as the primary sources of systematic error, and employ linear regression and artificial neural networks to alleviate non-cosmological excess clustering on large scales. Our methods are tested against simulations with and without $\fnl$ and systematics, showing superior performance of the neural network treatment. The neural network with a set of nine imaging property maps passes our systematic null test criteria, and is chosen as the fiducial treatment. Assuming the universality relation, we find $\fnl = 34^{+24(+50)}_{-44(-73)}$ at 68\%(95\%) confidence. We apply a series of robustness tests (e.g., cuts on imaging, declination, or scales used) that show consistency in the obtained constraints. We study how the regression method biases the measured angular power-spectrum and degrades the $\fnl$ constraining power. The use of the nine maps more than doubles the uncertainty compared to using only the three primary maps in the regression. Our results thus motivate the development of more efficient methods that avoid over-correction, protect large-scale clustering information, and preserve constraining power. Additionally, our results encourage further studies of $\fnl$ with DESI spectroscopic samples, where the inclusion of 3D clustering modes should help separate imaging systematics and lessen the degradation in the $\fnl$ uncertainty. △ Less

Submitted 25 June, 2024; v1 submitted 4 July, 2023; originally announced July 2023.

Comments: 21 pages, 17 figures, 7 tables (Appendix excluded). Published in MNRAS

arXiv:2305.11305 [pdf, other]

Improved Synthesis of Toffoli-Hadamard Circuits

Authors: Matthew Amy, Andrew N. Glaudell, Sarah Meng Li, Neil J. Ross

Abstract: The matrices that can be exactly represented by a circuit over the Toffoli-Hadamard gate set are the orthogonal matrices of the form $M/ \sqrt{2}{}^k$, where $M$ is an integer matrix and $k$ is a nonnegative integer. The exact synthesis problem for this gate set is the problem of constructing a circuit for a given such matrix. Existing methods produce circuits consisting of $O(2^n \log(n)k)$ gates… ▽ More The matrices that can be exactly represented by a circuit over the Toffoli-Hadamard gate set are the orthogonal matrices of the form $M/ \sqrt{2}{}^k$, where $M$ is an integer matrix and $k$ is a nonnegative integer. The exact synthesis problem for this gate set is the problem of constructing a circuit for a given such matrix. Existing methods produce circuits consisting of $O(2^n \log(n)k)$ gates, where $n$ is the dimension of the matrix. In this paper, we provide two improved synthesis methods. First, we show that a technique introduced by Kliuchnikov in 2013 for Clifford+$T$ circuits can be straightforwardly adapted to Toffoli-Hadamard circuits, reducing the complexity of the synthesized circuit from $O(2^n \log(n)k)$ to $O(n^2 \log(n)k)$. Then, we present an alternative synthesis method of similarly improved cost, but whose application is restricted to circuits on no more than three qubits. Our results also apply to orthogonal matrices over the dyadic fractions, which correspond to circuits using the 2-qubit gate $H\otimes H$, rather than the usual single-qubit Hadamard gate $H$. △ Less

Submitted 18 May, 2023; originally announced May 2023.

arXiv:2304.10819 [pdf, other]

Auditing and Generating Synthetic Data with Controllable Trust Trade-offs

Authors: Brian Belgodere, Pierre Dognin, Adam Ivankay, Igor Melnyk, Youssef Mroueh, Aleksandra Mojsilovic, Jiri Navratil, Apoorva Nitsure, Inkit Padhi, Mattia Rigotti, Jerret Ross, Yair Schiff, Radhika Vedpathak, Richard A. Young

Abstract: Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framew… ▽ More Real-world data often exhibits bias, imbalance, and privacy risks. Synthetic datasets have emerged to address these issues. This paradigm relies on generative AI models to generate unbiased, privacy-preserving data while maintaining fidelity to the original data. However, assessing the trustworthiness of synthetic datasets and models is a critical challenge. We introduce a holistic auditing framework that comprehensively evaluates synthetic datasets and AI models. It focuses on preventing bias and discrimination, ensures fidelity to the source data, assesses utility, robustness, and privacy preservation. We demonstrate the framework's effectiveness by auditing various generative models across diverse use cases like education, healthcare, banking, and human resources, spanning different data modalities such as tabular, time-series, vision, and natural language. This holistic assessment is essential for compliance with regulatory safeguards. We introduce a trustworthiness index to rank synthetic datasets based on their safeguards trade-offs. Furthermore, we present a trustworthiness-driven model selection and cross-validation process during training, exemplified with "TrustFormers" across various data types. This approach allows for controllable trustworthiness trade-offs in synthetic data creation. Our auditing framework fosters collaboration among stakeholders, including data scientists, governance experts, internal reviewers, external certifiers, and regulators. This transparent reporting should become a standard practice to prevent bias, discrimination, and privacy violations, ensuring compliance with policies and providing accountability, safety, and performance guarantees. △ Less

Submitted 9 June, 2024; v1 submitted 21 April, 2023; originally announced April 2023.

Comments: submitted

arXiv:2212.08072 [pdf]

Foresight -- Generative Pretrained Transformer (GPT) for Modelling of Patient Timelines using EHRs

Authors: Zeljko Kraljevic, Dan Bean, Anthony Shek, Rebecca Bendayan, Harry Hemingway, Joshua Au Yeung, Alexander Deng, Alfie Baston, Jack Ross, Esther Idowu, James T Teo, Richard J Dobson

Abstract: Background: Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within the unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We explore how temporal modelling of patients from free text and structured data, using deep generati… ▽ More Background: Electronic Health Records hold detailed longitudinal information about each patient's health status and general clinical history, a large portion of which is stored within the unstructured text. Existing approaches focus mostly on structured data and a subset of single-domain outcomes. We explore how temporal modelling of patients from free text and structured data, using deep generative transformers can be used to forecast a wide range of future disorders, substances, procedures or findings. Methods: We present Foresight, a novel transformer-based pipeline that uses named entity recognition and linking tools to convert document text into structured, coded concepts, followed by providing probabilistic forecasts for future medical events such as disorders, substances, procedures and findings. We processed the entire free-text portion from three different hospital datasets totalling 811336 patients covering both physical and mental health. Findings: On tests in two UK hospitals (King's College Hospital, South London and Maudsley) and the US MIMIC-III dataset precision@10 0.68, 0.76 and 0.88 was achieved for forecasting the next disorder in a patient timeline, while precision@10 of 0.80, 0.81 and 0.91 was achieved for forecasting the next biomedical concept. Foresight was also validated on 34 synthetic patient timelines by five clinicians and achieved relevancy of 97% for the top forecasted candidate disorder. As a generative model, it can forecast follow-on biomedical concepts for as many steps as required. Interpretation: Foresight is a general-purpose model for biomedical concept modelling that can be used for real-world risk forecasting, virtual trials and clinical research to study the progression of disorders, simulate interventions and counterfactuals, and educational purposes. △ Less

Submitted 24 January, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

arXiv:2210.11729 [pdf, other]

An Exploration of Data Efficiency in Intra-Dataset Task Transfer for Dialog Understanding

Authors: Josiah Ross, Luke Yoffe, Alon Albalak, William Yang Wang

Abstract: Transfer learning is an exciting area of Natural Language Processing that has the potential to both improve model performance and increase data efficiency. This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain. We hypothesize that a model can utilize the information learned from a source task to better learn a target… ▽ More Transfer learning is an exciting area of Natural Language Processing that has the potential to both improve model performance and increase data efficiency. This study explores the effects of varying quantities of target task training data on sequential transfer learning in the dialog domain. We hypothesize that a model can utilize the information learned from a source task to better learn a target task, thereby reducing the number of target task training samples required. Unintuitively, our data shows that often target task training data size has minimal effect on how sequential transfer learning performs compared to the same model without transfer learning. Our results lead us to believe that this unexpected result could be due to the effects of catastrophic forgetting, motivating further work into methods that prevent such forgetting. △ Less

Submitted 21 October, 2022; originally announced October 2022.

arXiv:2208.06665 [pdf, other]

Cloud-Based Real-Time Molecular Screening Platform with MolFormer

Authors: Brian Belgodere, Vijil Chenthamarakshan, Payel Das, Pierre Dognin, Toby Kurien, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young

Abstract: With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The pla… ▽ More With the prospect of automating a number of chemical tasks with high fidelity, chemical language processing models are emerging at a rapid speed. Here, we present a cloud-based real-time platform that allows users to virtually screen molecules of interest. For this purpose, molecular embeddings inferred from a recently proposed large chemical language model, named MolFormer, are leveraged. The platform currently supports three tasks: nearest neighbor retrieval, chemical space visualization, and property prediction. Based on the functionalities of this platform and results obtained, we believe that such a platform can play a pivotal role in automating chemistry and chemical engineering research, as well as assist in drug discovery and material design tasks. A demo of our platform is provided at \url{www.ibm.biz/molecular_demo}. △ Less

Submitted 13 August, 2022; originally announced August 2022.

Comments: Paper accepted at ECML PKDD 2022 demo track

arXiv:2205.06068 [pdf, ps, other]

On the Lambek embedding and the category of product-preserving presheaves

Authors: Peng Fu, Kohei Kishida, Neil J. Ross, Peter Selinger

Abstract: It is well-known that the category of presheaf functors is complete and cocomplete, and that the Yoneda embedding into the presheaf category preserves products. However, the Yoneda embedding does not preserve coproducts. It is perhaps less well-known that if we restrict the codomain of the Yoneda embedding to the full subcategory of limit-preserving functors, then this embedding preserves colimits… ▽ More It is well-known that the category of presheaf functors is complete and cocomplete, and that the Yoneda embedding into the presheaf category preserves products. However, the Yoneda embedding does not preserve coproducts. It is perhaps less well-known that if we restrict the codomain of the Yoneda embedding to the full subcategory of limit-preserving functors, then this embedding preserves colimits, while still enjoying most of the other useful properties of the Yoneda embedding. We call this modified embedding the Lambek embedding. The category of limit-preserving functors is known to be a reflective subcategory of the category of all functors, i.e., there is a left adjoint for the inclusion functor. In the literature, the existence of this left adjoint is often proved non-constructively, e.g., by an application of Freyd's adjoint functor theorem. In this paper, we provide an alternative, more constructive proof of this fact. We first explain the Lambek embedding and why it preserves coproducts. Then we review some concepts from multi-sorted algebras and observe that there is a one-to-one correspondence between product-preserving presheaves and certain multi-sorted term algebras. We provide a construction that freely turns any presheaf functor into a product-preserving one, hence giving an explicit definition of the left adjoint functor of the inclusion. Finally, we sketch how to extend our method to prove that the subcategory of limit-preserving functors is also reflective. △ Less

Submitted 12 May, 2022; originally announced May 2022.

arXiv:2204.13041 [pdf, ps, other]

Proto-Quipper with dynamic lifting

Authors: Peng Fu, Kohei Kishida, Neil J. Ross, Peter Selinger

Abstract: Quipper is a functional programming language for quantum computing. Proto-Quipper is a family of languages aiming to provide a formal foundation for Quipper. In this paper, we extend Proto-Quipper-M with a construct called dynamic lifting, which is present in Quipper. By virtue of being a circuit description language, Proto-Quipper has two separate runtimes: circuit generation time and circuit exe… ▽ More Quipper is a functional programming language for quantum computing. Proto-Quipper is a family of languages aiming to provide a formal foundation for Quipper. In this paper, we extend Proto-Quipper-M with a construct called dynamic lifting, which is present in Quipper. By virtue of being a circuit description language, Proto-Quipper has two separate runtimes: circuit generation time and circuit execution time. Values that are known at circuit generation time are called parameters, and values that are known at circuit execution time are called states. Dynamic lifting is an operation that enables a state, such as the result of a measurement, to be lifted to a parameter, where it can influence the generation of the next portion of the circuit. As a result, dynamic lifting enables Proto-Quipper programs to interleave classical and quantum computation. We describe the syntax of a language we call Proto-Quipper-Dyn. Its type system uses a system of modalities to keep track of the use of dynamic lifting. We also provide an operational semantics, as well as an abstract categorical semantics for dynamic lifting based on enriched category theory. We prove that both the type system and the operational semantics are sound with respect to our categorical semantics. Finally, we give some examples of Proto-Quipper-Dyn programs that make essential use of dynamic lifting. △ Less

Submitted 8 November, 2022; v1 submitted 27 April, 2022; originally announced April 2022.

arXiv:2204.13039 [pdf, ps, other]

doi 10.4204/EPTCS.394.16

A Biset-Enriched Categorical Model for Proto-Quipper with Dynamic Lifting

Authors: Peng Fu, Kohei Kishida, Neil J. Ross, Peter Selinger

Abstract: Quipper and Proto-Quipper are a family of quantum programming languages that, by their nature as circuit description languages, involve two runtimes: one at which the program generates a circuit and one at which the circuit is executed, normally with probabilistic results due to measurements. Accordingly, the language distinguishes two kinds of data: parameters, which are known at circuit generati… ▽ More Quipper and Proto-Quipper are a family of quantum programming languages that, by their nature as circuit description languages, involve two runtimes: one at which the program generates a circuit and one at which the circuit is executed, normally with probabilistic results due to measurements. Accordingly, the language distinguishes two kinds of data: parameters, which are known at circuit generation time, and states, which are known at circuit execution time. Sometimes, it is desirable for the results of measurements to control the generation of the next part of the circuit. Therefore, the language needs to turn states, such as measurement outcomes, into parameters, an operation we call dynamic lifting. The goal of this paper is to model this interaction between the runtimes by providing a general categorical structure enriched in what we call "bisets". We demonstrate that the biset-enriched structure achieves a proper semantics of the two runtimes and their interaction, by showing that it models a variant of Proto-Quipper with dynamic lifting. The present paper deals with the concrete categorical semantics of this language, whereas a companion paper deals with the syntax, type system, operational semantics, and abstract categorical semantics. △ Less

Submitted 15 November, 2023; v1 submitted 27 April, 2022; originally announced April 2022.

Comments: In Proceedings QPL 2022, arXiv:2311.08375

Journal ref: EPTCS 394, 2023, pp. 302-342

arXiv:2204.00205 [pdf, other]

A Physics-Guided Neural Operator Learning Approach to Model Biological Tissues from Digital Image Correlation Measurements

Authors: Huaiqian You, Quinn Zhang, Colton J. Ross, Chung-Hao Lee, Ming-Chen Hsu, Yue Yu

Abstract: We present a data-driven workflow to biological tissue modeling, which aims to predict the displacement field based on digital image correlation (DIC) measurements under unseen loading scenarios, without postulating a specific constitutive model form nor possessing knowledges on the material microstructure. To this end, a material database is constructed from the DIC displacement tracking measurem… ▽ More We present a data-driven workflow to biological tissue modeling, which aims to predict the displacement field based on digital image correlation (DIC) measurements under unseen loading scenarios, without postulating a specific constitutive model form nor possessing knowledges on the material microstructure. To this end, a material database is constructed from the DIC displacement tracking measurements of multiple biaxial stretching protocols on a porcine tricuspid valve anterior leaflet, with which we build a neural operator learning model. The material response is modeled as a solution operator from the loading to the resultant displacement field, with the material microstructure properties learned implicitly from the data and naturally embedded in the network parameters. Using various combinations of loading protocols, we compare the predictivity of this framework with finite element analysis based on the phenomenological Fung-type model. From in-distribution tests, the predictivity of our approach presents good generalizability to different loading conditions and outperforms the conventional constitutive modeling at approximately one order of magnitude. When tested on out-of-distribution loading ratios, the neural operator learning approach becomes less effective. To improve the generalizability of our framework, we propose a physics-guided neural operator learning model via imposing partial physics knowledge. This method is shown to improve the model's extrapolative performance in the small-deformation regime. Our results demonstrate that with sufficient data coverage and/or guidance from partial physics constraints, the data-driven approach can be a more effective method for modeling biological materials than the traditional constitutive modeling. △ Less

Submitted 1 April, 2022; originally announced April 2022.

arXiv:2203.08205 [pdf, other]

doi 10.1016/j.cma.2022.115296

Learning Deep Implicit Fourier Neural Operators (IFNOs) with Applications to Heterogeneous Material Modeling

Authors: Huaiqian You, Quinn Zhang, Colton J. Ross, Chung-Hao Lee, Yue Yu

Abstract: Constitutive modeling based on continuum mechanics theory has been a classical approach for modeling the mechanical responses of materials. However, when constitutive laws are unknown or when defects and/or high degrees of heterogeneity are present, these classical models may become inaccurate. In this work, we propose to use data-driven modeling, which directly utilizes high-fidelity simulation a… ▽ More Constitutive modeling based on continuum mechanics theory has been a classical approach for modeling the mechanical responses of materials. However, when constitutive laws are unknown or when defects and/or high degrees of heterogeneity are present, these classical models may become inaccurate. In this work, we propose to use data-driven modeling, which directly utilizes high-fidelity simulation and/or experimental measurements to predict a material's response without using conventional constitutive models. Specifically, the material response is modeled by learning the implicit mappings between loading conditions and the resultant displacement and/or damage fields, with the neural network serving as a surrogate for a solution operator. To model the complex responses due to material heterogeneity and defects, we develop a novel deep neural operator architecture, which we coin as the Implicit Fourier Neural Operator (IFNO). In the IFNO, the increment between layers is modeled as an integral operator to capture the long-range dependencies in the feature space. As the network gets deeper, the limit of IFNO becomes a fixed point equation that yields an implicit neural operator and naturally mimics the displacement/damage fields solving procedure in material modeling problems. We demonstrate the performance of our proposed method for a number of examples, including hyperelastic, anisotropic and brittle materials. As an application, we further employ the proposed approach to learn the material models directly from digital image correlation (DIC) tracking measurements, and show that the learned solution operators substantially outperform the conventional constitutive models in predicting displacement fields. △ Less

Submitted 15 March, 2022; originally announced March 2022.

arXiv:2109.05655 [pdf, other]

doi 10.4204/EPTCS.343.2

Generators and Relations for Real Stabilizer Operators

Authors: Justin Makary, Neil J. Ross, Peter Selinger

Abstract: Real stabilizer operators, which are also known as real Clifford operators, are generated, through composition and tensor product, by the Hadamard gate, the Pauli Z gate, and the controlled-Z gate. We introduce a normal form for real stabilizer circuits and show that every real stabilizer operator admits a unique normal form. Moreover, we give a finite set of relations that suffice to rewrite any… ▽ More Real stabilizer operators, which are also known as real Clifford operators, are generated, through composition and tensor product, by the Hadamard gate, the Pauli Z gate, and the controlled-Z gate. We introduce a normal form for real stabilizer circuits and show that every real stabilizer operator admits a unique normal form. Moreover, we give a finite set of relations that suffice to rewrite any real stabilizer circuit to its normal form. △ Less

Submitted 12 September, 2021; originally announced September 2021.

Comments: In Proceedings QPL 2021, arXiv:2109.04886

Journal ref: EPTCS 343, 2021, pp. 14-36

arXiv:2106.13724 [pdf, other]

doi 10.1093/mnras/stab1730

Primordial non-Gaussianity from the Completed SDSS-IV extended Baryon Oscillation Spectroscopic Survey I: Catalogue Preparation and Systematic Mitigation

Authors: Mehdi Rezaie, Ashley J. Ross, Hee-Jong Seo, Eva-Maria Mueller, Will J. Percival, Grant Merz, Reza Katebi, Razvan C. Bunescu, Julian Bautista, Joel R. Brownstein, Etienne Burtin, Kyle Dawson, Héctor Gil-Marín, Jiamin Hou, Eleanor B. Lyke, Axel de la Macorra, Graziano Rossi, Donald P. Schneider, Pauline Zarrouk, Gong-Bo Zhao

Abstract: We investigate the large-scale clustering of the final spectroscopic sample of quasars from the recently completed extended Baryon Oscillation Spectroscopic Survey (eBOSS). The sample contains $343708$ objects in the redshift range $0.8<z<2.2$ and $72667$ objects with redshifts $2.2<z<3.5$, covering an effective area of $4699~{\rm deg}^{2}$. We develop a neural network-based approach to mitigate s… ▽ More We investigate the large-scale clustering of the final spectroscopic sample of quasars from the recently completed extended Baryon Oscillation Spectroscopic Survey (eBOSS). The sample contains $343708$ objects in the redshift range $0.8<z<2.2$ and $72667$ objects with redshifts $2.2<z<3.5$, covering an effective area of $4699~{\rm deg}^{2}$. We develop a neural network-based approach to mitigate spurious fluctuations in the density field caused by spatial variations in the quality of the imaging data used to select targets for follow-up spectroscopy. Simulations are used with the same angular and radial distributions as the real data to estimate covariance matrices, perform error analyses, and assess residual systematic uncertainties. We measure the mean density contrast and cross-correlations of the eBOSS quasars against maps of potential sources of imaging systematics to address algorithm effectiveness, finding that the neural network-based approach outperforms standard linear regression. Stellar density is one of the most important sources of spurious fluctuations, and a new template constructed using data from the Gaia spacecraft provides the best match to the observed quasar clustering. The end-product from this work is a new value-added quasar catalogue with the improved weights to correct for nonlinear imaging systematic effects, which will be made public. Our quasar catalogue is used to measure the local-type primordial non-Gaussianity in our companion paper, Mueller et al. in preparation. △ Less

Submitted 25 June, 2021; originally announced June 2021.

Comments: 17 pages, 13 figures, 2 tables. Accepted for publication in MNRAS. For the associated code and value-added catalogs see https://github.com/mehdirezaie/sysnetdev and https://github.com/mehdirezaie/eBOSSDR16QSOE

arXiv:2106.09553 [pdf, other]

Large-Scale Chemical Language Representations Capture Molecular Structure and Properties

Authors: Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das

Abstract: Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Various supervised machine learning models have demonstrated promising performance, but the vast chemical space and the limited availability of property labels make supervised learning challenging. Recently, unsupervised transformer-based languag… ▽ More Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Various supervised machine learning models have demonstrated promising performance, but the vast chemical space and the limited availability of property labels make supervised learning challenging. Recently, unsupervised transformer-based language models pretrained on a large unlabelled corpus have produced state-of-the-art results in many downstream natural language processing tasks. Inspired by this development, we present molecular embeddings obtained by training an efficient transformer encoder model, MoLFormer, which uses rotary positional embeddings. This model employs a linear attention mechanism, coupled with highly distributed training, on SMILES sequences of 1.1 billion unlabelled molecules from the PubChem and ZINC datasets. We show that the learned molecular representation outperforms existing baselines, including supervised and self-supervised graph neural networks and language models, on several downstream tasks from ten benchmark datasets. They perform competitively on two others. Further analyses, specifically through the lens of attention, demonstrate that MoLFormer trained on chemical SMILES indeed learns the spatial relationships between atoms within a molecule. These results provide encouraging evidence that large-scale molecular language models can capture sufficient chemical and structural information to predict various distinct molecular properties, including quantum-chemical properties. △ Less

Submitted 14 December, 2022; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: NMI 2022

arXiv:2106.01175 [pdf, ps, other]

doi 10.4204/EPTCS.343.11

Generators and Relations for the Group On(Z[1/2])

Authors: Sarah Meng Li, Neil J. Ross, Peter Selinger

Abstract: We give a finite presentation by generators and relations for the group O_n(Z[1/2]) of n-dimensional orthogonal matrices with entries in Z[1/2]. We then obtain a similar presentation for the group of n-dimensional orthogonal matrices of the form M/sqrt(2)^k, where k is a nonnegative integer and M is an integer matrix. Both groups arise in the study of quantum circuits. In particular, when the dime… ▽ More We give a finite presentation by generators and relations for the group O_n(Z[1/2]) of n-dimensional orthogonal matrices with entries in Z[1/2]. We then obtain a similar presentation for the group of n-dimensional orthogonal matrices of the form M/sqrt(2)^k, where k is a nonnegative integer and M is an integer matrix. Both groups arise in the study of quantum circuits. In particular, when the dimension is a power of 2, the elements of the latter group are precisely the unitary matrices that can be represented by a quantum circuit over the universal gate set consisting of the Toffoli gate, the Hadamard gate, and the computational ancilla. △ Less

Submitted 12 September, 2021; v1 submitted 2 June, 2021; originally announced June 2021.

Comments: In Proceedings QPL 2021, arXiv:2109.04886

Journal ref: EPTCS 343, 2021, pp. 210-264

arXiv:2102.09299 [pdf, other]

Theory meets Practice at the Median: a worst case comparison of relative error quantile algorithms

Authors: Graham Cormode, Abhinav Mishra, Joseph Ross, Pavel Veselý

Abstract: Estimating the distribution and quantiles of data is a foundational task in data mining and data science. We study algorithms which provide accurate results for extreme quantile queries using a small amount of space, thus helping to understand the tails of the input distribution. Namely, we focus on two recent state-of-the-art solutions: $t$-digest and ReqSketch. While $t$-digest is a popular comp… ▽ More Estimating the distribution and quantiles of data is a foundational task in data mining and data science. We study algorithms which provide accurate results for extreme quantile queries using a small amount of space, thus helping to understand the tails of the input distribution. Namely, we focus on two recent state-of-the-art solutions: $t$-digest and ReqSketch. While $t$-digest is a popular compact summary which works well in a variety of settings, ReqSketch comes with formal accuracy guarantees at the cost of its size growing as new observations are inserted. In this work, we provide insight into which conditions make one preferable to the other. Namely, we show how to construct inputs for $t$-digest that induce an almost arbitrarily large error and demonstrate that it fails to provide accurate results even on i.i.d. samples from a highly non-uniform distribution. We propose practical improvements to ReqSketch, making it faster than $t$-digest, while its error stays bounded on any instance. Still, our results confirm that $t$-digest remains more accurate on the ``non-adversarial'' data encountered in practice. △ Less

Submitted 10 June, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

Comments: Updated experiments, improved presentation. To appear in KDD 2021

ACM Class: F.2.2

arXiv:2012.11696 [pdf, other]

Image Captioning as an Assistive Technology: Lessons Learned from VizWiz 2020 Challenge

Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff, Richard A. Young, Brian Belgodere

Abstract: Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated dataset like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on the… ▽ More Image captioning has recently demonstrated impressive progress largely owing to the introduction of neural network algorithms trained on curated dataset like MS-COCO. Often work in this field is motivated by the promise of deployment of captioning systems in practical applications. However, the scarcity of data and contexts in many competition datasets renders the utility of systems trained on these datasets limited as an assistive technology in real-world settings, such as helping visually impaired people navigate and accomplish everyday tasks. This gap motivated the introduction of the novel VizWiz dataset, which consists of images taken by the visually impaired and captions that have useful, task-oriented information. In an attempt to help the machine learning computer vision field realize its promise of producing technologies that have positive social impact, the curators of the VizWiz dataset host several competitions, including one for image captioning. This work details the theory and engineering from our winning submission to the 2020 captioning competition. Our work provides a step towards improved assistive image captioning systems. △ Less

Submitted 18 June, 2021; v1 submitted 21 December, 2020; originally announced December 2020.

Comments: In submission to JAIR. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2012.11691 [pdf, other]

Alleviating Noisy Data in Image Captioning with Cooperative Distillation

Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Inkit Padhi, Mattia Rigotti, Jarret Ross, Yair Schiff

Abstract: Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or Vizwiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose… ▽ More Image captioning systems have made substantial progress, largely due to the availability of curated datasets like Microsoft COCO or Vizwiz that have accurate descriptions of their corresponding images. Unfortunately, scarce availability of such cleanly labeled data results in trained algorithms producing captions that can be terse and idiosyncratically specific to details in the image. We propose a new technique, cooperative distillation that combines clean curated datasets with the web-scale automatically extracted captions of the Google Conceptual Captions dataset (GCC), which can have poor descriptions of images, but is abundant in size and therefore provides a rich vocabulary resulting in more expressive captions. △ Less

Submitted 21 December, 2020; originally announced December 2020.

Comments: CVPR 2020 VizWiz Challenge

arXiv:2011.14901 [pdf, other]

Language-Driven Region Pointer Advancement for Controllable Image Captioning

Authors: Annika Lindh, Robert J. Ross, John D. Kelleher

Abstract: Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning… ▽ More Controllable Image Captioning is a recent sub-field in the multi-modal task of Image Captioning wherein constraints are placed on which regions in an image should be described in the generated natural language caption. This puts a stronger focus on producing more detailed descriptions, and opens the door for more end-user control over results. A vital component of the Controllable Image Captioning architecture is the mechanism that decides the timing of attending to each region through the advancement of a region pointer. In this paper, we propose a novel method for predicting the timing of region pointer advancement by treating the advancement step as a natural part of the language structure via a NEXT-token, motivated by a strong correlation to the sentence structure in the training data. We find that our timing agrees with the ground-truth timing in the Flickr30k Entities test data with a precision of 86.55% and a recall of 97.92%. Our model implementing this technique improves the state-of-the-art on standard captioning metrics while additionally demonstrating a considerably larger effective vocabulary size. △ Less

Submitted 30 November, 2020; originally announced November 2020.

Comments: Accepted to COLING 2020

MSC Class: 68T07; 68T45; 68T50 ACM Class: I.2.7; I.2.10; I.5.1

arXiv:2011.01843 [pdf, other]

Tabular Transformers for Modeling Multivariate Time Series

Authors: Inkit Padhi, Yair Schiff, Igor Melnyk, Mattia Rigotti, Youssef Mroueh, Pierre Dognin, Jerret Ross, Ravi Nair, Erik Altman

Abstract: Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learn… ▽ More Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learning representations that is analogous to BERT and can be pre-trained end-to-end and used in downstream tasks, and one that is akin to GPT and can be used for generation of realistic synthetic tabular sequences. We demonstrate our models on two datasets: a synthetic credit card transaction dataset, where the learned representations are used for fraud detection and synthetic data generation, and on a real pollution dataset, where the learned encodings are used to predict atmospheric pollutant concentrations. Code and data are available at https://github.com/IBM/TabFormer. △ Less

Submitted 11 February, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

Comments: Accepted to ICASSP, 2021; https://github.com/IBM/TabFormer

arXiv:2009.05560 [pdf, other]

Narratives and Needs: Analyzing Experiences of Cyclone Amphan Using Twitter Discourse

Authors: Ancil Crayton, João Fonseca, Kanav Mehra, Michelle Ng, Jared Ross, Marcelo Sandoval-Castañeda, Rachel von Gnechten

Abstract: People often turn to social media to comment upon and share information about major global events. Accordingly, social media is receiving increasing attention as a rich data source for understanding people's social, political and economic experiences of extreme weather events. In this paper, we contribute two novel methodologies that leverage Twitter discourse to characterize narratives and identi… ▽ More People often turn to social media to comment upon and share information about major global events. Accordingly, social media is receiving increasing attention as a rich data source for understanding people's social, political and economic experiences of extreme weather events. In this paper, we contribute two novel methodologies that leverage Twitter discourse to characterize narratives and identify unmet needs in response to Cyclone Amphan, which affected 18 million people in May 2020. △ Less

Submitted 11 September, 2020; originally announced September 2020.

Comments: 6 pages, 4 figures, 1 table

arXiv:2006.12021 [pdf, ps, other]

Sampling hypergraphs with given degrees

Authors: Martin Dyer, Catherine Greenhill, Pieter Kleer, James Ross, Leen Stougie

Abstract: There is a well-known connection between hypergraphs and bipartite graphs, obtained by treating the incidence matrix of the hypergraph as the biadjacency matrix of a bipartite graph. We use this connection to describe and analyse a rejection sampling algorithm for sampling simple uniform hypergraphs with a given degree sequence. Our algorithm uses, as a black box, an algorithm $\mathcal{A}$ for sa… ▽ More There is a well-known connection between hypergraphs and bipartite graphs, obtained by treating the incidence matrix of the hypergraph as the biadjacency matrix of a bipartite graph. We use this connection to describe and analyse a rejection sampling algorithm for sampling simple uniform hypergraphs with a given degree sequence. Our algorithm uses, as a black box, an algorithm $\mathcal{A}$ for sampling bipartite graphs with given degrees, uniformly or nearly uniformly, in (expected) polynomial time. The expected runtime of the hypergraph sampling algorithm depends on the (expected) runtime of the bipartite graph sampling algorithm $\mathcal{A}$, and the probability that a uniformly random bipartite graph with given degrees corresponds to a simple hypergraph. We give some conditions on the hypergraph degree sequence which guarantee that this probability is bounded below by a positive constant. △ Less

Submitted 13 July, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: 21 pages. This version addresses referees' comments

arXiv:2006.11166 [pdf, other]

Fast Mixing of Multi-Scale Langevin Dynamics under the Manifold Hypothesis

Authors: Adam Block, Youssef Mroueh, Alexander Rakhlin, Jerret Ross

Abstract: Recently, the task of image generation has attracted much attention. In particular, the recent empirical successes of the Markov Chain Monte Carlo (MCMC) technique of Langevin Dynamics have prompted a number of theoretical advances; despite this, several outstanding problems remain. First, the Langevin Dynamics is run in very high dimension on a nonconvex landscape; in the worst case, due to the N… ▽ More Recently, the task of image generation has attracted much attention. In particular, the recent empirical successes of the Markov Chain Monte Carlo (MCMC) technique of Langevin Dynamics have prompted a number of theoretical advances; despite this, several outstanding problems remain. First, the Langevin Dynamics is run in very high dimension on a nonconvex landscape; in the worst case, due to the NP-hardness of nonconvex optimization, it is thought that Langevin Dynamics mixes only in time exponential in the dimension. In this work, we demonstrate how the manifold hypothesis allows for the considerable reduction of mixing time, from exponential in the ambient dimension to depending only on the (much smaller) intrinsic dimension of the data. Second, the high dimension of the sampling space significantly hurts the performance of Langevin Dynamics; we leverage a multi-scale approach to help ameliorate this issue and observe that this multi-resolution algorithm allows for a trade-off between image quality and computational expense in generation. △ Less

Submitted 22 June, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

arXiv:2005.09599 [pdf, other]

Asymmetric scale functions for $t$-digests

Authors: Joseph Ross

Abstract: The $t$-digest is a data structure that can be queried for approximate quantiles, with greater accuracy near the minimum and maximum of the distribution. We develop a $t$-digest variant with accuracy asymmetric about the median, thereby making possible alternative tradeoffs between computational resources and accuracy which may be of particular interest for distributions with significant skew. Aft… ▽ More The $t$-digest is a data structure that can be queried for approximate quantiles, with greater accuracy near the minimum and maximum of the distribution. We develop a $t$-digest variant with accuracy asymmetric about the median, thereby making possible alternative tradeoffs between computational resources and accuracy which may be of particular interest for distributions with significant skew. After establishing some theoretical properties of scale functions for $t$-digests, we show that a tangent line construction on the familiar scale functions preserves the crucial properties that allow $t$-digests to operate online and be mergeable. We conclude with an empirical study demonstrating the asymmetric variant preserves accuracy on one side of the distribution with a much smaller memory footprint. △ Less

Submitted 19 May, 2020; originally announced May 2020.

Comments: 18 pages, 8 figured; submitted

arXiv:2005.08396 [pdf, other]

doi 10.1007/978-3-030-52482-1_9

A tutorial introduction to quantum circuit programming in dependently typed Proto-Quipper

Authors: Peng Fu, Kohei Kishida, Neil J. Ross, Peter Selinger

Abstract: We introduce dependently typed Proto-Quipper, or Proto-Quipper-D for short, an experimental quantum circuit programming language with linear dependent types. We give several examples to illustrate how linear dependent types can help in the construction of correct quantum circuits. Specifically, we show how dependent types enable programming families of circuits, and how dependent types solve the p… ▽ More We introduce dependently typed Proto-Quipper, or Proto-Quipper-D for short, an experimental quantum circuit programming language with linear dependent types. We give several examples to illustrate how linear dependent types can help in the construction of correct quantum circuits. Specifically, we show how dependent types enable programming families of circuits, and how dependent types solve the problem of type-safe uncomputation of garbage qubits. We also discuss other language features along the way. △ Less

Submitted 12 December, 2020; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: Added a section on related work and a paragraph explaining qubit initialization and termination

Journal ref: LNCS 12227:153-168 (2020)

arXiv:2002.05702 [pdf, other]

doi 10.1016/j.media.2020.101691

Generative-based Airway and Vessel Morphology Quantification on Chest CT Images

Authors: Pietro Nardelli, James C. Ross, Raúl San José Estépar

Abstract: Accurately and precisely characterizing the morphology of small pulmonary structures from Computed Tomography (CT) images, such as airways and vessels, is becoming of great importance for diagnosis of pulmonary diseases. The smaller conducting airways are the major site of increased airflow resistance in chronic obstructive pulmonary disease (COPD), while accurately sizing vessels can help identif… ▽ More Accurately and precisely characterizing the morphology of small pulmonary structures from Computed Tomography (CT) images, such as airways and vessels, is becoming of great importance for diagnosis of pulmonary diseases. The smaller conducting airways are the major site of increased airflow resistance in chronic obstructive pulmonary disease (COPD), while accurately sizing vessels can help identify arterial and venous changes in lung regions that may determine future disorders. However, traditional methods are often limited due to image resolution and artifacts. We propose a Convolutional Neural Regressor (CNR) that provides cross-sectional measurement of airway lumen, airway wall thickness, and vessel radius. CNR is trained with data created by a generative model of synthetic structures which is used in combination with Simulated and Unsupervised Generative Adversarial Network (SimGAN) to create simulated and refined airways and vessels with known ground-truth. For validation, we first use synthetically generated airways and vessels produced by the proposed generative model to compute the relative error and directly evaluate the accuracy of CNR in comparison with traditional methods. Then, in-vivo validation is performed by analyzing the association between the percentage of the predicted forced expiratory volume in one second (FEV1\%) and the value of the Pi10 parameter, two well-known measures of lung function and airway disease, for airways. For vessels, we assess the correlation between our estimate of the small-vessel blood volume and the lungs' diffusing capacity for carbon monoxide (DLCO). The results demonstrate that Convolutional Neural Networks (CNNs) provide a promising direction for accurately measuring vessels and airways on chest CT images with physiological correlates. △ Less

Submitted 13 March, 2020; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: 19 pages, 13 figures

MSC Class: 68T20 ACM Class: I.2.1; I.4.7; J.2

arXiv:1912.11940 [pdf, other]

Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets

Authors: Mingrui Liu, Youssef Mroueh, Jerret Ross, Wei Zhang, Xiaodong Cui, Payel Das, Tianbao Yang

Abstract: Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While adaptive gradient methods theory is well understood for minimization problems, the underlying factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim at bridging this gap from both theoretical an… ▽ More Adaptive gradient algorithms perform gradient-based updates using the history of gradients and are ubiquitous in training deep neural networks. While adaptive gradient methods theory is well understood for minimization problems, the underlying factors driving their empirical success in min-max problems such as GANs remain unclear. In this paper, we aim at bridging this gap from both theoretical and empirical perspectives. First, we analyze a variant of Optimistic Stochastic Gradient (OSG) proposed in~\citep{daskalakis2017training} for solving a class of non-convex non-concave min-max problem and establish $O(ε^{-4})$ complexity for finding $ε$-first-order stationary point, in which the algorithm only requires invoking one stochastic first-order oracle while enjoying state-of-the-art iteration complexity achieved by stochastic extragradient method by~\citep{iusem2017extragradient}. Then we propose an adaptive variant of OSG named Optimistic Adagrad (OAdagrad) and reveal an \emph{improved} adaptive complexity $O\left(ε^{-\frac{2}{1-α}}\right)$, where $α$ characterizes the growth rate of the cumulative stochastic gradient and $0\leq α\leq 1/2$. To the best of our knowledge, this is the first work for establishing adaptive complexity in non-convex non-concave min-max optimization. Empirically, our experiments show that indeed adaptive gradient algorithms outperform their non-adaptive counterparts in GAN training. Moreover, this observation can be explained by the slow growth rate of the cumulative stochastic gradient, as observed empirically. △ Less

Submitted 24 December, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

Comments: Accepted by ICLR 2020

arXiv:1910.12999 [pdf, other]

A Decentralized Parallel Algorithm for Training Generative Adversarial Nets

Authors: Mingrui Liu, Wei Zhang, Youssef Mroueh, Xiaodong Cui, Jerret Ross, Tianbao Yang, Payel Das

Abstract: Generative Adversarial Networks (GANs) are a powerful class of generative models in the deep learning community. Current practice on large-scale GAN training utilizes large models and distributed large-batch training strategies, and is implemented on deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) designed in a centralized manner. In the centralized network topology, every worker needs… ▽ More Generative Adversarial Networks (GANs) are a powerful class of generative models in the deep learning community. Current practice on large-scale GAN training utilizes large models and distributed large-batch training strategies, and is implemented on deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) designed in a centralized manner. In the centralized network topology, every worker needs to either directly communicate with the central node or indirectly communicate with all other workers in every iteration. However, when the network bandwidth is low or network latency is high, the performance would be significantly degraded. Despite recent progress on decentralized algorithms for training deep neural networks, it remains unclear whether it is possible to train GANs in a decentralized manner. The main difficulty lies at handling the nonconvex-nonconcave min-max optimization and the decentralized communication simultaneously. In this paper, we address this difficulty by designing the \textbf{first gradient-based decentralized parallel algorithm} which allows workers to have multiple rounds of communications in one iteration and to update the discriminator and generator simultaneously, and this design makes it amenable for the convergence analysis of the proposed decentralized algorithm. Theoretically, our proposed decentralized algorithm is able to solve a class of non-convex non-concave min-max problems with provable non-asymptotic convergence to first-order stationary point. Experimental results on GANs demonstrate the effectiveness of the proposed algorithm. △ Less

Submitted 19 October, 2020; v1 submitted 28 October, 2019; originally announced October 2019.

Comments: Accepted by NeurIPS 2020

arXiv:1902.04999 [pdf, other]

Wasserstein Barycenter Model Ensembling

Authors: Pierre Dognin, Igor Melnyk, Youssef Mroueh, Jerret Ross, Cicero Dos Santos, Tom Sercu

Abstract: In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement b… ▽ More In this paper we propose to perform model ensembling in a multiclass or a multilabel learning setting using Wasserstein (W.) barycenters. Optimal transport metrics, such as the Wasserstein distance, allow incorporating semantic side information such as word embeddings. Using W. barycenters to find the consensus between models allows us to balance confidence and semantics in finding the agreement between the models. We show applications of Wasserstein ensembling in attribute-based classification, multilabel learning and image captioning generation. These results show that the W. ensembling is a viable alternative to the basic geometric or arithmetic mean ensembling. △ Less

Submitted 13 February, 2019; originally announced February 2019.

Comments: ICLR 2019

arXiv:1812.08126 [pdf]

doi 10.1007/978-3-030-01418-6_18

Generating Diverse and Meaningful Captions

Authors: Annika Lindh, Robert J. Ross, Abhijit Mahalunkar, Giancarlo Salton, John D. Kelleher

Abstract: Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates mo… ▽ More Image Captioning is a task that requires models to acquire a multi-modal understanding of the world and to express this understanding in natural language text. While the state-of-the-art for this task has rapidly improved in terms of n-gram metrics, these models tend to output the same generic captions for similar images. In this work, we address this limitation and train a model that generates more diverse and specific captions through an unsupervised training approach that incorporates a learning signal from an Image Retrieval model. We summarize previous results and improve the state-of-the-art on caption diversity and novelty. We make our source code publicly available online. △ Less

Submitted 19 December, 2018; originally announced December 2018.

Comments: Accepted for presentation at The 27th International Conference on Artificial Neural Networks (ICANN 2018)

Journal ref: Artificial Neural Networks and Machine Learning - ICANN 2018 (pp. 176-187). Springer International Publishing

arXiv:1810.06695 [pdf, other]

Exploring the Use of Attention within an Neural Machine Translation Decoder States to Translate Idioms

Authors: Giancarlo D. Salton, Robert J. Ross, John D. Kelleher

Abstract: Idioms pose problems to almost all Machine Translation systems. This type of language is very frequent in day-to-day language use and cannot be simply ignored. The recent interest in memory augmented models in the field of Language Modelling has aided the systems to achieve good results by bridging long-distance dependencies. In this paper we explore the use of such techniques into a Neural Machin… ▽ More Idioms pose problems to almost all Machine Translation systems. This type of language is very frequent in day-to-day language use and cannot be simply ignored. The recent interest in memory augmented models in the field of Language Modelling has aided the systems to achieve good results by bridging long-distance dependencies. In this paper we explore the use of such techniques into a Neural Machine Translation system to help in translation of idiomatic language. △ Less

Submitted 10 October, 2018; originally announced October 2018.

arXiv:1805.00063 [pdf, other]

Adversarial Semantic Alignment for Improved Image Captions

Authors: Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, Tom Sercu

Abstract: In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient… ▽ More In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without accessing discriminator gradients directly. We also address the problem of automatic evaluation for captioning models and introduce a new semantic score, and show its correlation to human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur together. To this end, we introduce a small captioned Out of Context (OOC) test set. The OOC set, combined with our semantic score, are the proposed new diagnosis tools for the captioning community. When evaluated on OOC and MS-COCO benchmarks, we show that SCST-based training has a strong performance in both semantic score and human evaluation, promising to be a valuable new approach for efficient discrete GAN training. △ Less

Submitted 6 June, 2019; v1 submitted 30 April, 2018; originally announced May 2018.

Comments: Authors Equal Contribution, CVPR 2019

arXiv:1710.07345 [pdf, other]

doi 10.1038/s41534-018-0072-4

Automated optimization of large quantum circuits with continuous parameters

Authors: Yunseong Nam, Neil J. Ross, Yuan Su, Andrew M. Childs, Dmitri Maslov

Abstract: We develop and implement automated methods for optimizing quantum circuits of the size and type expected in quantum computations that outperform classical computers. We show how to handle continuous gate parameters and report a collection of fast algorithms capable of optimizing large-scale quantum circuits. For the suite of benchmarks considered, we obtain substantial reductions in gate counts. I… ▽ More We develop and implement automated methods for optimizing quantum circuits of the size and type expected in quantum computations that outperform classical computers. We show how to handle continuous gate parameters and report a collection of fast algorithms capable of optimizing large-scale quantum circuits. For the suite of benchmarks considered, we obtain substantial reductions in gate counts. In particular, we provide better optimization in significantly less time than previous approaches, while making minimal structural changes so as to preserve the basic layout of the underlying quantum algorithms. Our results help bridge the gap between the computations that can be run on existing hardware and those that are expected to outperform classical computers. △ Less

Submitted 1 June, 2018; v1 submitted 19 October, 2017; originally announced October 2017.

Comments: 21 pages

Journal ref: npj:Quantum Information 4, 23 (2018)

arXiv:1704.08343 [pdf]

A Distributed Shared Memory Model and C++ Templated Meta-Programming Interface for the Epiphany RISC Array Processor

Authors: David Richie, James Ross, Jamie Infantolino

Abstract: The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Whereas such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address the unique architecture features has presented many challenges. We present here a distributed sha… ▽ More The Adapteva Epiphany many-core architecture comprises a scalable 2D mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. Whereas such a processor offers high computational energy efficiency and parallel scalability, developing effective programming models that address the unique architecture features has presented many challenges. We present here a distributed shared memory (DSM) model supported in software transparently using C++ templated metaprogramming techniques. The approach offers an extremely simple parallel programming model well suited for the architecture. Initial results are presented that demonstrate the approach and provide insight into the efficiency of the programming model and also the ability of the NoC to support a DSM without explicit control over data movement and localization. △ Less

Submitted 26 April, 2017; originally announced April 2017.

Comments: 10 pages, 2 figures, ICCS/ALCHEMY Workshop 2017

arXiv:1704.04760 [pdf]

In-Datacenter Performance Analysis of a Tensor Processing Unit

Authors: Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, Rick Boyle, Pierre-luc Cantin, Clifford Chao, Chris Clark, Jeremy Coriell, Mike Daley, Matt Dau, Jeffrey Dean, Ben Gelb, Tara Vazir Ghaemmaghami, Rajendra Gottipati, William Gulland, Robert Hagmann, C. Richard Ho, Doug Hogberg , et al. (50 additional authors not shown)

Abstract: Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOp… ▽ More Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU. △ Less

Submitted 16 April, 2017; originally announced April 2017.

Comments: 17 pages, 11 figures, 8 tables. To appear at the 44th International Symposium on Computer Architecture (ISCA), Toronto, Canada, June 24-28, 2017

arXiv:1703.10242 [pdf]

I CAN HAS SUPERCOMPUTER? A Novel Approach to Teaching Parallel and Distributed Computing Concepts Using a Meme-Based Programming Language

Authors: David Richie, James Ross

Abstract: A novel approach is presented to teach the parallel and distributed computing concepts of synchronization and remote memory access. The single program multiple data (SPMD) partitioned global address space (PGAS) model presented in this paper uses a procedural programming language appealing to undergraduate students. We propose that the amusing nature of the approach may engender creativity and int… ▽ More A novel approach is presented to teach the parallel and distributed computing concepts of synchronization and remote memory access. The single program multiple data (SPMD) partitioned global address space (PGAS) model presented in this paper uses a procedural programming language appealing to undergraduate students. We propose that the amusing nature of the approach may engender creativity and interest using these concepts later in more sober environments. Specifically, we implement parallel extensions to LOLCODE within a source-to-source compiler sufficient for the development of parallel and distributed algorithms normally implemented using conventional high-performance computing languages and APIs. △ Less

Submitted 29 March, 2017; originally announced March 2017.

Comments: 7 pages, 2 figures, example code, accepted for publication at the 7th NSF/TCPP Workshop on Parallel and Distributed Computing Education (EduPar-17) workshop in conjunction with the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS 17)

arXiv:1701.00140 [pdf, other]

doi 10.4204/EPTCS.266.5

A Finite Presentation of CNOT-Dihedral Operators

Authors: Matthew Amy, Jianxin Chen, Neil J. Ross

Abstract: We give a finite presentation by generators and relations of the unitary operators expressible over the {CNOT, T, X} gate set, also known as CNOT-dihedral operators. To this end, we introduce a notion of normal form for CNOT-dihedral circuits and prove that every CNOT-dihedral operator admits a unique normal form. Moreover, we show that in the presence of certain structural rules only finitely man… ▽ More We give a finite presentation by generators and relations of the unitary operators expressible over the {CNOT, T, X} gate set, also known as CNOT-dihedral operators. To this end, we introduce a notion of normal form for CNOT-dihedral circuits and prove that every CNOT-dihedral operator admits a unique normal form. Moreover, we show that in the presence of certain structural rules only finitely many circuit identities are required to reduce an arbitrary CNOT-dihedral circuit to its normal form. By appropriately restricting our relations, we obtain a finite presentation of unitary operators expressible over the {CNOT, T} gate set as a corollary. △ Less

Submitted 28 April, 2019; v1 submitted 31 December, 2016; originally announced January 2017.

Comments: In Proceedings QPL 2017, arXiv:1802.09737

Journal ref: EPTCS 266, 2018, pp. 84-97

arXiv:1612.00563 [pdf, other]

Self-critical Sequence Training for Image Captioning

Authors: Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, Vaibhava Goel

Abstract: Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, signifi… ▽ More Recently it has been shown that policy-gradient methods for reinforcement learning can be utilized to train deep end-to-end systems directly on non-differentiable metrics for the task at hand. In this paper we consider the problem of optimizing image captioning systems using reinforcement learning, and show that by carefully optimizing our systems using the test metrics of the MSCOCO task, significant gains in performance can be realized. Our systems are built using a new optimization approach that we call self-critical sequence training (SCST). SCST is a form of the popular REINFORCE algorithm that, rather than estimating a "baseline" to normalize the rewards and reduce variance, utilizes the output of its own test-time inference algorithm to normalize the rewards it experiences. Using this approach, estimating the reward signal (as actor-critic methods must do) and estimating normalization (as REINFORCE algorithms typically do) is avoided, while at the same time harmonizing the model with respect to its test-time inference procedure. Empirically we find that directly optimizing the CIDEr metric with SCST and greedy decoding at test-time is highly effective. Our results on the MSCOCO evaluation sever establish a new state-of-the-art on the task, improving the best result in terms of CIDEr from 104.9 to 114.7. △ Less

Submitted 15 November, 2017; v1 submitted 1 December, 2016; originally announced December 2016.

Comments: CVPR 2017 + additional analysis + fixed baseline results, 16 pages

arXiv:1609.08283 [pdf, other]

A data-driven model for influenza transmission incorporating media effects

Authors: Lewis Mitchell, Joshua V. Ross

Abstract: Numerous studies have attempted to model the effect of mass media on the transmission of diseases such as influenza, however quantitative data on media engagement has until recently been difficult to obtain. With the recent explosion of "big data" coming from online social media and the like, large volumes of data on a population's engagement with mass media during an epidemic are becoming availab… ▽ More Numerous studies have attempted to model the effect of mass media on the transmission of diseases such as influenza, however quantitative data on media engagement has until recently been difficult to obtain. With the recent explosion of "big data" coming from online social media and the like, large volumes of data on a population's engagement with mass media during an epidemic are becoming available to researchers. In this study we combine an online data set comprising millions of shared messages relating to influenza with traditional surveillance data on flu activity to suggest a functional form for the relationship between the two. Using this data we present a simple deterministic model for influenza dynamics incorporating media effects, and show that such a model helps explain the dynamics of historical influenza outbreaks. Furthermore, through model selection we show that the proposed media function fits historical data better than other media functions proposed in earlier studies. △ Less

Submitted 27 September, 2016; originally announced September 2016.

Comments: To appear in Royal Society Open Science

arXiv:1608.03549 [pdf, other]

doi 10.1007/978-3-319-50995-2_12

OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture

Authors: David Richie, James Ross

Abstract: There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with min… ▽ More There is interest in exploring hybrid OpenSHMEM + X programming models to extend the applicability of the OpenSHMEM interface to more hardware architectures. We present a hybrid OpenCL + OpenSHMEM programming model for device-level programming for architectures like the Adapteva Epiphany many-core RISC array processor. The Epiphany architecture comprises a 2D array of low-power RISC cores with minimal uncore functionality connected by a 2D mesh Network-on-Chip (NoC). The Epiphany architecture offers high computational energy efficiency for integer and floating point calculations as well as parallel scalability. The Epiphany-III is available as a coprocessor in platforms that also utilize an ARM CPU host. OpenCL provides good functionality for supporting a co-design programming model in which the host CPU offloads parallel work to a coprocessor. However, the OpenCL memory model is inconsistent with the Epiphany memory architecture and lacks support for inter-core communication. We propose a hybrid programming model in which OpenSHMEM provides a better solution by replacing the non-standard OpenCL extensions introduced to achieve high performance with the Epiphany architecture. We demonstrate the proposed programming model for matrix-matrix multiplication based on Cannon's algorithm showing that the hybrid model addresses the deficiencies of using OpenCL alone to achieve good benchmark performance. △ Less

Submitted 11 August, 2016; originally announced August 2016.

Comments: 12 pages, 5 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologies

arXiv:1608.03545 [pdf, other]

doi 10.1007/978-3-319-50995-2_10

An OpenSHMEM Implementation for the Adapteva Epiphany Coprocessor

Authors: James Ross, David Richie

Abstract: This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core scalability with a physically compact 2D array of RISC CPU cores and a fast network-on-chip (NoC). While fully capable of MPMD execution, the physical topology and… ▽ More This paper reports the implementation and performance evaluation of the OpenSHMEM 1.3 specification for the Adapteva Epiphany architecture within the Parallella single-board computer. The Epiphany architecture exhibits massive many-core scalability with a physically compact 2D array of RISC CPU cores and a fast network-on-chip (NoC). While fully capable of MPMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to Partitioned Global Address Space (PGAS) programming models and SPMD execution with SHMEM. △ Less

Submitted 11 August, 2016; originally announced August 2016.

Comments: 14 pages, 9 figures, OpenSHMEM 2016: Third workshop on OpenSHMEM and Related Technologies

arXiv:1605.04693 [pdf]

Overcoming the language barrier in mobile user interface design: A case study on a mobile health app

Authors: Jason Ross, Jing Gao

Abstract: This research report proposes a structured solution to address the need for awareness of cultural and language in user design. It will include evaluated research on established methods that already exist. Discussed ideas about how to address this situation include: what others have found to take into consideration when using design principles to develop an interface, detailed troubles and critical… ▽ More This research report proposes a structured solution to address the need for awareness of cultural and language in user design. It will include evaluated research on established methods that already exist. Discussed ideas about how to address this situation include: what others have found to take into consideration when using design principles to develop an interface, detailed troubles and critical issues that have been previously identified and also ways that have been found already to overcome such issues. This will also involve designing a prototype application catering to resolving these issues. Overcoming the language barrier plays an important role in the process of implementing a user design interface that will satisfy users. This issue must be researched and examined to identify the issues and concerns associated in order to provide a solution in an ethical manner. △ Less

Submitted 16 May, 2016; originally announced May 2016.

Comments: Research-in-progress ISBN# 978-0-646-95337-3 Presented at the Australasian Conference on Information Systems 2015 (arXiv:1605.01032)

Report number: ACIS/2015/13

arXiv:1604.04207 [pdf]

doi 10.1016/j.procs.2016.05.478

Advances in Run-Time Performance and Interoperability for the Adapteva Epiphany Coprocessor

Authors: David A. Richie, James A. Ross

Abstract: The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). The architecture presents many features and constraints which contribute to software design challenges for the application developer. Addressing these challenges within the software stack that supports application development i… ▽ More The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). The architecture presents many features and constraints which contribute to software design challenges for the application developer. Addressing these challenges within the software stack that supports application development is critical to improving productivity and expanding the range of applications for the architecture. We report here on advances that have been made in the COPRTHR-2 software stack targeting the Epiphany architecture that address critical issues identified in previous work. Specifically, we describe improvements that bring greater control and precision to the design of compact compiled binary programs in the context of the limited per-core local memory of the architecture. We describe a new design for run-time support that has been implemented to dramatically improve the program load and execute performance and capabilities. Finally, we describe developments that advance host-coprocessor interoperability to expand the functionality available to the application developer. △ Less

Submitted 14 April, 2016; originally announced April 2016.

Comments: 11 pages, 3 figures, accepted to ICCS'16 ALCHEMY workshop

arXiv:1604.04205 [pdf]

Implementing OpenSHMEM for the Adapteva Epiphany RISC Array Processor

Authors: James A. Ross, David A. Richie

Abstract: The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to partitioned global address space (PGAS) parallel programming models. Following… ▽ More The energy-efficient Adapteva Epiphany architecture exhibits massive many-core scalability in a physically compact 2D array of RISC cores with a fast network-on-chip (NoC). With fully divergent cores capable of MIMD execution, the physical topology and memory-mapped capabilities of the core and network translate well to partitioned global address space (PGAS) parallel programming models. Following an investigation into the use of two-sided communication using threaded MPI, one-sided communication using SHMEM is being explored. Here we present work in progress on the development of an OpenSHMEM 1.2 implementation for the Epiphany architecture. △ Less

Submitted 14 April, 2016; originally announced April 2016.

Comments: 4 pages, 1 figure, accepted to ICCS'16 ALCHEMY workshop

arXiv:1506.05442 [pdf]

Parallel Programming Model for the Epiphany Many-Core Coprocessor Using Threaded MPI

Authors: James A. Ross, David A. Richie, Song J. Park, Dale R. Shires

Abstract: The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This… ▽ More The Adapteva Epiphany many-core architecture comprises a 2D tiled mesh Network-on-Chip (NoC) of low-power RISC cores with minimal uncore functionality. It offers high computational energy efficiency for both integer and floating point calculations as well as parallel scalability. Yet despite the interesting architectural features, a compelling programming model has not been presented to date. This paper demonstrates an efficient parallel programming model for the Epiphany architecture based on the Message Passing Interface (MPI) standard. Using MPI exploits the similarities between the Epiphany architecture and a conventional parallel distributed cluster of serial cores. Our approach enables MPI codes to execute on the RISC array processor with little modification and achieve high performance. We report benchmark results for the threaded MPI implementation of four algorithms (dense matrix-matrix multiplication, N-body particle interaction, a five-point 2D stencil update, and 2D FFT) and highlight the importance of fast inter-core communication for the architecture. △ Less

Submitted 17 June, 2015; originally announced June 2015.

Comments: 7 pages, 6 figures, presented at ISCA'15, Third ACM International Workshop on Manycore Embedded Systems

ACM Class: C.1.4; D.1.3

arXiv:1505.01547 [pdf, other]

Understanding the Heavy Tailed Dynamics in Human Behavior

Authors: Gordon J Ross, Tim Jones

Abstract: The recent availability of electronic datasets containing large volumes of communication data has made it possible to study human behavior on a larger scale than ever before. From this, it has been discovered that across a diverse range of data sets, the inter-event times between consecutive communication events obey heavy tailed power law dynamics. Explaining this has proved controversial, and tw… ▽ More The recent availability of electronic datasets containing large volumes of communication data has made it possible to study human behavior on a larger scale than ever before. From this, it has been discovered that across a diverse range of data sets, the inter-event times between consecutive communication events obey heavy tailed power law dynamics. Explaining this has proved controversial, and two distinct hypotheses have emerged. The first holds that these power laws are fundamental, and arise from the mechanisms such as priority queuing that humans use to schedule tasks. The second holds that they are a statistical artifact which only occur in aggregated data when features such as circadian rhythms and burstiness are ignored. We use a large social media data set to test these hypotheses, and find that although models that incorporate circadian rhythms and burstiness do explain part of the observed heavy tails, there is residual unexplained heavy tail behavior which suggests a more fundamental cause. Based on this, we develop a new quantitative model of human behavior which improves on existing approaches, and gives insight into the mechanisms underlying human interactions. △ Less

Submitted 6 May, 2015; originally announced May 2015.

Comments: 9 pages in Physical Review E, 2015

Showing 1–50 of 57 results for author: Ross, J