Showing 1–50 of 52 results for author: Allen, J

  1. arXiv:2407.05333  [pdf, other]

    physics.app-ph cs.AI

    Generating multi-scale NMC particles with radial grain architectures using spatial stochastics and GANs

    Authors: Lukas Fuchs, Orkun Furat, Donal P. Finegan, Jeffery Allen, Francois L. E. Usseglio-Viretta, Bertan Ozdogru, Peter J. Weddle, Kandler Smith, Volker Schmidt

    Abstract: Understanding structure-property relationships of Li-ion battery cathodes is crucial for optimizing rate-performance and cycle-life resilience. However, correlating the morphology of cathode particles, such as in NMC811, and their inner grain architecture with electrode performance is challenging, particularly due to the significant length-scale difference between grain and particle sizes. Experi…

    Submitted 7 July, 2024; originally announced July 2024.

  2. arXiv:2406.09116  [pdf, other]

    cs.LG stat.ML

    Injective Flows for parametric hypersurfaces

    Authors: Marcello Massimo Negri, Jonathan Aellen, Volker Roth

    Abstract: Normalizing Flows (NFs) are powerful and efficient models for density estimation. When modeling densities on manifolds, NFs can be generalized to injective flows but the Jacobian determinant becomes computationally prohibitive. Current approaches either consider bounds on the log-likelihood or rely on some approximations of the Jacobian determinant. In contrast, we propose injective flows for para…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.04230  [pdf, other]

    cs.CV cs.AI

    M3LEO: A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and RGB Data

    Authors: Matthew J Allen, Francisco Dorr, Joseph Alejandro Gallego Mejia, Laura Martínez-Ferrer, Anna Jungbluth, Freddie Kalaitzis, Raúl Ramos-Pollán

    Abstract: Satellite-based remote sensing has revolutionised the way we address global challenges in a rapidly evolving world. Huge quantities of Earth Observation (EO) data are generated by satellite sensors daily, but processing these large datasets for use in ML pipelines is technically and computationally challenging. Specifically, different types of EO data are often hosted on a variety of platforms, wi…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures

    ACM Class: I.4; I.4.6; I.4.8; I.4.9; I.5; I.5.4

  4. arXiv:2405.05596  [pdf, other]

    cs.CY cs.HC cs.IR cs.LG stat.ME

    Measuring Strategization in Recommendation: Users Adapt Their Behavior to Shape Future Content

    Authors: Sarah H. Cen, Andrew Ilyas, Jennifer Allen, Hannah Li, Aleksander Madry

    Abstract: Most modern recommendation algorithms are data-driven: they generate personalized recommendations by observing users' past behaviors. A common assumption in recommendation is that how a user interacts with a piece of content (e.g., whether they choose to "like" it) is a reflection of the content, but not of the algorithm that generated it. Although this assumption is convenient, it fails to captur…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2402.06831  [pdf]

    cs.SI

    What We Know About Using Non-Engagement Signals in Content Ranking

    Authors: Tom Cunningham, Sana Pandey, Leif Sigerson, Jonathan Stray, Jeff Allen, Bonnie Barrilleaux, Ravi Iyer, Smitha Milli, Mohit Kothari, Behnam Rezaei

    Abstract: Many online platforms predominantly rank items by predicted user engagement. We believe that there is much unrealized potential in including non-engagement signals, which can improve outcomes both for platforms and for society as a whole. Based on a daylong workshop with experts from industry and academia, we formulate a series of propositions and document each as best we can from public evidence,…

    Submitted 9 February, 2024; originally announced February 2024.

    ACM Class: H.3.3; H.4.3

  6. arXiv:2311.00197  [pdf, other]

    cs.RO

    Design, Modeling, and Control of a Low-Cost and Rapid Response Soft-Growing Manipulator for Orchard Operations

    Authors: Ryan Dorosh, Justin Allen, Zixuan He, Christopher Ninatanta, Jack Coleman, Jack Spieker, Ethan Tuck, Jordan Kurtz, Qin Zhang, Matthew D. Whiting, Jiecai Luo, Manoj Karkee, Ming Luo

    Abstract: Tree fruit growers around the world are facing labor shortages for critical operations, including harvest and pruning. There is a great interest in developing robotic solutions for these labor-intensive tasks, but current efforts have been prohibitively costly, slow, or require a reconfiguration of the orchard in order to function. In this paper, we introduce an alternative approach to robotics us…

    Submitted 31 October, 2023; originally announced November 2023.

    Comments: International Conference on Intelligent Robots and Systems (IROS) 2023

  7. arXiv:2305.16846  [pdf, other]

    cs.LG physics.data-an physics.flu-dyn stat.ML

    Lagrangian Flow Networks for Conservation Laws

    Authors: F. Arend Torres, Marcello Massimo Negri, Marco Inversi, Jonathan Aellen, Volker Roth

    Abstract: We introduce Lagrangian Flow Networks (LFlows) for modeling fluid densities and velocities continuously in space and time. By construction, the proposed LFlows satisfy the continuity equation, a PDE describing mass conservation in its differentiable form. Our model is based on the insight that solutions to the continuity equation can be expressed as time-dependent density transformations via diffe…

    Submitted 13 December, 2023; v1 submitted 26 May, 2023; originally announced May 2023.

  8. arXiv:2303.15604  [pdf, other]

    q-bio.BM cs.LG

    HD-Bind: Encoding of Molecular Structure with Low Precision, Hyperdimensional Binary Representations

    Authors: Derek Jones, Jonathan E. Allen, Xiaohua Zhang, Behnam Khaleghi, Jaeyoung Kang, Weihong Xu, Niema Moshiri, Tajana S. Rosing

    Abstract: Publicly available collections of drug-like molecules have grown to comprise tens of billions of possibilities in recent history due to advances in chemical synthesis. Traditional methods for identifying "hit" molecules from a large collection of potential drug-like candidates have relied on biophysical theory to compute approximations to the Gibbs free energy of the binding interaction between t…

    Submitted 27 March, 2023; originally announced March 2023.

  9. arXiv:2302.06829  [pdf, other]

    cs.CL cs.SC

    The Role of Semantic Parsing in Understanding Procedural Text

    Authors: Hossein Rajaby Faghihi, Parisa Kordjamshidi, Choh Man Teng, James Allen

    Abstract: In this paper, we investigate whether symbolic semantic representations, extracted from deep semantic parsers, can help reasoning over the states of involved entities in a procedural text. We consider a deep semantic parser (TRIPS) and semantic role labeling as two sources of semantic parsing knowledge. First, we propose PROPOLIS, a symbolic parsing-based procedural reasoning framework. Second, we…

    Submitted 17 May, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: 9 pages, Accepted at EACL 2023

  10. arXiv:2212.03646  [pdf, other]

    cs.CV cs.AI cs.LG

    Towards Automatic Cetacean Photo-Identification: A Framework for Fine-Grain, Few-Shot Learning in Marine Ecology

    Authors: Cameron Trotter, Nick Wright, A. Stephen McGough, Matt Sharpe, Barbara Cheney, Mònica Arso Civil, Reny Tyson Moore, Jason Allen, Per Berggren

    Abstract: Photo-identification (photo-id) is one of the main non-invasive capture-recapture methods utilised by marine researchers for monitoring cetacean (dolphin, whale, and porpoise) populations. This method has historically been performed manually, resulting in high workload and cost due to the vast number of images collected. Recently automated aids have been developed to help speed up photo-id, althoug…

    Submitted 7 December, 2022; originally announced December 2022.

    Comments: 8 pages, 8 figures, 3 tables. Submitted and accepted to IEEE Big Data 2022 Conference

  11. arXiv:2210.17043  [pdf, other]

    cs.LG stat.AP

    Evaluating Point-Prediction Uncertainties in Neural Networks for Drug Discovery

    Authors: Ya Ju Fan, Jonathan E. Allen, Kevin S. McLoughlin, Da Shi, Brian J. Bennion, Xiaohua Zhang, Felice C. Lightstone

    Abstract: Neural Network (NN) models provide potential to speed up the drug discovery process and reduce its failure rates. The success of NN models requires uncertainty quantification (UQ) as drug discovery explores chemical space beyond the training data distribution. Standard NN models do not provide uncertainty information. Methods that combine Bayesian models with NN models address this issue, but are d…

    Submitted 30 October, 2022; originally announced October 2022.

  12. arXiv:2207.14709  [pdf, other]

    eess.IV cs.CV

    Robust Quantitative Susceptibility Mapping via Approximate Message Passing with Parameter Estimation

    Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

    Abstract: Purpose: For quantitative susceptibility mapping (QSM), the lack of ground-truth in clinical settings makes it challenging to determine suitable parameters for the dipole inversion. We propose a probabilistic Bayesian approach for QSM with built-in parameter estimation, and incorporate the nonlinear formulation of the dipole inversion to achieve a robust recovery of the susceptibility maps. Theo…

    Submitted 30 May, 2023; v1 submitted 29 July, 2022; originally announced July 2022.

    Comments: Keywords: Approximate message passing, Compressive sensing, Outlier modelling, Parameter estimation, Quantitative susceptibility mapping

  13. Compressing the chronology of a temporal network with graph commutators

    Authors: Andrea J. Allen, Cristopher Moore, Laurent Hébert-Dufresne

    Abstract: Studies of dynamics on temporal networks often represent the network as a series of "snapshots," static networks active for short durations of time. We argue that successive snapshots can be aggregated if doing so has little effect on the overlying dynamics. We propose a method to compress network chronologies by progressively combining pairs of snapshots whose matrix commutators have the smallest…

    Submitted 29 March, 2024; v1 submitted 23 May, 2022; originally announced May 2022.

    Journal ref: Phys. Rev. Lett. 132, 077402 (2024)

  14. arXiv:2205.04321  [pdf, other]

    cs.LG

    Evaluating the Fairness Impact of Differentially Private Synthetic Data

    Authors: Blake Bullwinkel, Kristen Grabarz, Lily Ke, Scarlett Gong, Chris Tanner, Joshua Allen

    Abstract: Differentially private (DP) synthetic data is a promising approach to maximizing the utility of data containing sensitive information. Due to the suppression of underrepresented classes that is often required to achieve privacy, however, it may be in conflict with fairness. We evaluate four DP synthesizers and present empirical results indicating that three of these models frequently degrade fairn…

    Submitted 20 June, 2022; v1 submitted 9 May, 2022; originally announced May 2022.

  15. arXiv:2204.12903  [pdf, other]

    cs.LG cs.CR

    Spending Privacy Budget Fairly and Wisely

    Authors: Lucas Rosenblatt, Joshua Allen, Julia Stoyanovich

    Abstract: Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set. This leads to good statistical parity with the real data, but can undervalue the conditional probabilities and marginals that are c…

    Submitted 27 April, 2022; originally announced April 2022.

  16. arXiv:2203.09986  [pdf, other]

    cs.CV

    GiNGR: Generalized Iterative Non-Rigid Point Cloud and Surface Registration Using Gaussian Process Regression

    Authors: Dennis Madsen, Jonathan Aellen, Andreas Morel-Forster, Thomas Vetter, Marcel Lüthi

    Abstract: In this paper, we unify popular non-rigid registration methods for point sets and surfaces under our general framework, GiNGR. GiNGR builds upon Gaussian Process Morphable Models (GPMM) and hence separates modeling the deformation prior from model adaptation for registration. In addition, it provides explainable hyperparameters, multi-resolution registration, trivial inclusion of expert annotation…

    Submitted 18 March, 2022; originally announced March 2022.

  17. arXiv:2202.06692  [pdf, other]

    cs.CR cs.CY cs.HC

    TRIP: Trust-Limited Coercion-Resistant In-Person Voter Registration

    Authors: Louis-Henri Merino, Simone Colombo, Rene Reyes, Alaleh Azhir, Haoqian Zhang, Jeff Allen, Bernhard Tellenbach, Vero Estrada-Galiñanes, Bryan Ford

    Abstract: Remote electronic voting is convenient and flexible, but presents risks of coercion and vote buying. One promising mitigation strategy enables voters to give a coercer fake voting credentials, which silently cast votes that do not count. However, current proposals make problematic assumptions during credential issuance, such as relying on a trustworthy registrar, on trusted hardware, or on voters…

    Submitted 17 March, 2024; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: 21 pages

  18. arXiv:2112.08692  [pdf, other]

    cs.CV cs.CL cs.LG

    Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription

    Authors: Nikolai Vogler, Jonathan Parkes Allen, Matthew Thomas Miller, Taylor Berg-Kirkpatrick

    Abstract: We present a self-supervised pre-training approach for learning rich visual language representations for both handwritten and printed historical document transcription. After supervised fine-tuning of our pre-trained encoder representations for low-resource document transcription on two languages, (1) a heterogeneous set of handwritten Islamicate manuscript images and (2) early modern English prin…

    Submitted 16 December, 2021; originally announced December 2021.

  19. arXiv:2111.07407  [pdf, other]

    cs.LG stat.AP stat.ML

    A Machine Learning Approach for Recruitment Prediction in Clinical Trial Design

    Authors: Jingshu Liu, Patricia J Allen, Luke Benz, Daniel Blickstein, Evon Okidi, Xiao Shi

    Abstract: Significant advancements have been made in recent years to optimize patient recruitment for clinical trials; however, improved methods for patient recruitment prediction are needed to support trial site selection and to estimate appropriate enrollment timelines in the trial design stage. In this paper, using data from thousands of historical clinical trials, we explore machine learning methods to…

    Submitted 14 November, 2021; originally announced November 2021.

    Comments: Machine Learning for Health (ML4H) - Extended Abstract

  20. arXiv:2109.03328  [pdf]

    cs.CR cs.AI cs.LG cs.NI

    Predicting Process Name from Network Data

    Authors: Justin Allen, David Knapp, Kristine Monteith

    Abstract: The ability to identify applications based on the network data they generate could be a valuable tool for cyber defense. We report on a machine learning technique capable of using netflow-like features to predict the application that generated the traffic. In our experiments, we used ground-truth labels obtained from host-based sensors deployed in a large enterprise environment; we applied random…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: Presented at 1st International Workshop on Adaptive Cyber Defense, 2021 (arXiv:2108.08476)

    Report number: IJCAI-ACD/2021/104

  21. arXiv:2105.11538  [pdf]

    cs.SI physics.soc-ph

    The power of reciprocal knowledge sharing relationships for startup success

    Authors: T. J. Allen, P. Gloor, A. Fronzetti Colladon, S. L. Woerner, O. Raz

    Abstract: Purpose: The purpose of this paper is to examine the innovative capabilities of biotech start-ups in relation to geographic proximity and knowledge sharing interaction in the R&D network of a major high-tech cluster. Design-methodology-approach: This study compares longitudinal informal communication networks of researchers at biotech start-ups with company patent applications in subsequent year…

    Submitted 20 May, 2021; originally announced May 2021.

    ACM Class: J.4

    Journal ref: Journal of Small Business and Enterprise Development 23(3), 636-651 (2016)

  22. arXiv:2105.07140  [pdf, other]

    q-bio.NC cs.CV q-bio.QM

    NeuroGen: activation optimized image synthesis for discovery neuroscience

    Authors: Zijin Gu, Keith W. Jamison, Meenakshi Khosla, Emily J. Allen, Yihan Wu, Thomas Naselaris, Kendrick Kay, Mert R. Sabuncu, Amy Kuceyeski

    Abstract: Functional MRI (fMRI) is a powerful technique that has allowed us to characterize visual cortex responses to stimuli, yet such experiments are by nature constructed based on a priori hypotheses, limited to the set of images presented to the individual while they are in the scanner, are subject to noise in the observed brain responses, and may vary widely across individuals. In this work, we propos…

    Submitted 15 May, 2021; originally announced May 2021.

  23. arXiv:2104.04547  [pdf, other]

    cs.LG q-bio.BM

    High-Throughput Virtual Screening of Small Molecule Inhibitors for SARS-CoV-2 Protein Targets with Deep Fusion Models

    Authors: Garrett A. Stevenson, Derek Jones, Hyojin Kim, W. F. Drew Bennett, Brian J. Bennion, Monica Borucki, Feliza Bourguet, Aidan Epstein, Magdalena Franco, Brooke Harmon, Stewart He, Max P. Katz, Daniel Kirshner, Victoria Lao, Edmond Y. Lau, Jacky Lo, Kevin McLoughlin, Richard Mosesso, Deepa K. Murugesh, Oscar A. Negrete, Edwin A. Saada, Brent Segelke, Maxwell Stefan, Marisa W. Torres, Dina Weilhammer, et al. (7 additional authors not shown)

    Abstract: Structure-based Deep Fusion models were recently shown to outperform several physics- and machine learning-based protein-ligand binding affinity prediction methods. As part of a multi-institutional COVID-19 pandemic response, over 500 million small molecules were computationally screened against four protein structures from the novel coronavirus (SARS-CoV-2), which causes COVID-19. Three enhanceme…

    Submitted 31 May, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

  24. arXiv:2103.14035  [pdf, other]

    cs.CR stat.AP

    U.S. Broadband Coverage Data Set: A Differentially Private Data Release

    Authors: Mayana Pereira, Allen Kim, Joshua Allen, Kevin White, Juan Lavista Ferres, Rahul Dodhia

    Abstract: Broadband connectivity is a key metric in today's economy. In an era of rapid expansion of the digital economy, it directly impacts GDP. Furthermore, with the COVID-19 guidelines of social distancing, internet connectivity became necessary to everyday activities such as work, learning, and staying in touch with family and friends. This paper introduces a publicly available U.S. Broadband Coverage…

    Submitted 1 April, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

  25. arXiv:2011.05537  [pdf, other]

    cs.LG cs.AI cs.CR cs.CY

    Differentially Private Synthetic Data: Applied Evaluations and Enhancements

    Authors: Lucas Rosenblatt, Xiaoyan Liu, Samira Pouyanfar, Eduardo de Leon, Anuj Desai, Joshua Allen

    Abstract: Machine learning practitioners frequently seek to leverage the most informative available data, without violating the data owner's privacy, when building predictive models. Differentially private data synthesis protects personal details from exposure, and allows for the training of differentially private machine learning models on privately generated datasets. But how can we effectively assess the…

    Submitted 10 November, 2020; originally announced November 2020.

    Comments: Under Review

  26. Distributed Differentially Private Mutual Information Ranking and Its Applications

    Authors: Ankit Srivastava, Samira Pouyanfar, Joshua Allen, Ken Johnston, Qida Ma

    Abstract: Computation of Mutual Information (MI) helps understand the amount of information shared between a pair of random variables. Automated feature selection techniques based on MI ranking are regularly used to extract information from sensitive datasets exceeding petabytes in size, over millions of features and classes. Series of one-vs-all MI computations can be cascaded to produce n-fold MI results,…

    Submitted 22 September, 2020; originally announced September 2020.

    Journal ref: 2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI) (pp. 90-96)

  27. arXiv:2008.01806  [pdf, other]

    eess.IV cs.CV

    Fast Nonconvex $T_2^*$ Mapping Using ADMM

    Authors: Shuai Huang, James J. Lah, Jason W. Allen, Deqiang Qiu

    Abstract: Magnetic resonance (MR)-$T_2^*$ mapping is widely used to study hemorrhage, calcification and iron deposition in various clinical applications; it provides a direct and precise mapping of desired contrast in the tissue. However, the long acquisition time required by conventional 3D high-resolution $T_2^*$ mapping method causes discomfort to patients and introduces motion artifacts to reconstructed…

    Submitted 4 August, 2020; originally announced August 2020.

  28. arXiv:2007.12327  [pdf, other]

    cs.GT cs.CR

    Stochastic Dynamic Information Flow Tracking Game using Supervised Learning for Detecting Advanced Persistent Threats

    Authors: Shana Moothedath, Dinuka Sahabandu, Joey Allen, Linda Bushnell, Wenke Lee, Radha Poovendran

    Abstract: Advanced persistent threats (APTs) are organized prolonged cyberattacks by sophisticated attackers. Although APT activities are stealthy, they interact with the system components and these interactions lead to information flows. Dynamic Information Flow Tracking (DIFT) has been proposed as one of the effective ways to detect APTs using the information flows. However, wide range security analysis u…

    Submitted 25 June, 2021; v1 submitted 23 July, 2020; originally announced July 2020.

  29. arXiv:2007.02670  [pdf]

    cs.CL

    A Broad-Coverage Deep Semantic Lexicon for Verbs

    Authors: James Allen, Hannah An, Ritwik Bose, Will de Beaumont, Choh Man Teng

    Abstract: Progress on deep language understanding is inhibited by the lack of a broad coverage lexicon that connects linguistic behavior to ontological concepts and axioms. We have developed COLLIE-V, a deep lexical resource for verbs, with the coverage of WordNet and syntactic and semantic details that meet or exceed existing resources. Bootstrapping from a hand-built lexicon and ontology, new ontological…

    Submitted 6 July, 2020; originally announced July 2020.

    Comments: Draft of LREC-2020 paper. Proceedings of The 12th Language Resources and Evaluation Conference. 2020

    ACM Class: I.2.7

  30. arXiv:2007.00076  [pdf, other]

    math.OC cs.GT

    A Reinforcement Learning Approach for Dynamic Information Flow Tracking Games for Detecting Advanced Persistent Threats

    Authors: Dinuka Sahabandu, Shana Moothedath, Joey Allen, Linda Bushnell, Wenke Lee, Radha Poovendran

    Abstract: Advanced Persistent Threats (APTs) are stealthy attacks that threaten the security and privacy of sensitive information. Interactions of APTs with the victim system introduce information flows that are recorded in the system logs. Dynamic Information Flow Tracking (DIFT) is a promising detection mechanism for detecting APTs. DIFT taints information flows originating at system entities that are suscept…

    Submitted 28 June, 2021; v1 submitted 30 June, 2020; originally announced July 2020.

    Comments: 15

  31. arXiv:2006.15942  [pdf, other]

    cs.CL

    Hinting Semantic Parsing with Statistical Word Sense Disambiguation

    Authors: Ritwik Bose, Siddharth Vashishtha, James Allen

    Abstract: The task of Semantic Parsing can be approximated as a transformation of an utterance into a logical form graph where edges represent semantic roles and nodes represent word senses. The resulting representation should capture the meaning of the utterance and be suitable for reasoning. Word senses and semantic roles are interdependent, meaning errors in assigning word senses can cause errors in a…

    Submitted 6 July, 2020; v1 submitted 29 June, 2020; originally announced June 2020.

    Comments: Longer version of AAAI2020 student abstract

    ACM Class: I.2.7

  32. arXiv:2006.12327  [pdf, other]

    cs.GT

    Dynamic Information Flow Tracking for Detection of Advanced Persistent Threats: A Stochastic Game Approach

    Authors: Shana Moothedath, Dinuka Sahabandu, Joey Allen, Andrew Clark, Linda Bushnell, Wenke Lee, Radha Poovendran

    Abstract: Advanced Persistent Threats (APTs) are stealthy customized attacks by intelligent adversaries. This paper deals with the detection of APTs that infiltrate cyber systems and compromise specifically targeted data and/or infrastructures. Dynamic information flow tracking is an information trace-based detection mechanism against APTs that taints suspicious information flows in the system and generates…

    Submitted 25 June, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

  33. arXiv:2005.07704  [pdf, other]

    q-bio.BM cs.LG

    Improved Protein-ligand Binding Affinity Prediction with Structure-Based Deep Fusion Inference

    Authors: Derek Jones, Hyojin Kim, Xiaohua Zhang, Adam Zemla, Garrett Stevenson, William D. Bennett, Dan Kirshner, Sergio Wong, Felice Lightstone, Jonathan E. Allen

    Abstract: Predicting accurate protein-ligand binding affinity is important in drug discovery but remains a challenge even with computationally expensive biophysics-based energy scoring methods and state-of-the-art deep learning approaches. Despite the recent advances in the deep convolutional and graph neural network based approaches, the model performance depends on the input data representation and suffer…

    Submitted 17 May, 2020; originally announced May 2020.

  34. arXiv:2005.07225  [pdf, other]

    eess.IV cs.CV

    SAGE: Sequential Attribute Generator for Analyzing Glioblastomas using Limited Dataset

    Authors: Padmaja Jonnalagedda, Brent Weinberg, Jason Allen, Taejin L. Min, Shiv Bhanu, Bir Bhanu

    Abstract: While deep learning approaches have shown remarkable performance in many imaging tasks, most of these methods rely on availability of large quantities of data. Medical image data, however, is scarce and fragmented. Generative Adversarial Networks (GANs) have recently been very effective in handling such datasets by generating more data. If the datasets are very small, however, GANs cannot learn th…

    Submitted 3 June, 2022; v1 submitted 14 May, 2020; originally announced May 2020.

  35. arXiv:1911.05211  [pdf, other]

    q-bio.QM cs.LG stat.ML

    AMPL: A Data-Driven Modeling Pipeline for Drug Discovery

    Authors: Amanda J. Minnich, Kevin McLoughlin, Margaret Tse, Jason Deng, Andrew Weber, Neha Murad, Benjamin D. Madej, Bharath Ramsundar, Tom Rush, Stacie Calad-Thomson, Jim Brase, Jonathan E. Allen

    Abstract: One of the key requirements for incorporating machine learning into the drug discovery process is complete reproducibility and traceability of the model building and evaluation process. With this in mind, we have developed an end-to-end modular and extensible software pipeline for building and sharing machine learning models that predict key pharma-relevant parameters. The ATOM Modeling PipeLine,…

    Submitted 13 November, 2019; v1 submitted 12 November, 2019; originally announced November 2019.

  36. arXiv:1909.09064  [pdf, other]

    cs.AI cs.HC

    Human-In-The-Loop Learning of Qualitative Preference Models

    Authors: Joseph Allen, Ahmed Moussa, Xudong Liu

    Abstract: In this work, we present a novel human-in-the-loop framework to help the human user understand the decision making process that involves choosing preferred options. We focus on qualitative preference models over alternatives from combinatorial domains. This framework is interactive: the user provides her behavioral data to the framework, and the framework explains the learned model to the user. It…

    Submitted 19 September, 2019; originally announced September 2019.

    Comments: Published in the Proceedings of the 32nd International Florida Artificial Intelligence Research Society Conference, 2019

  37. arXiv:1901.11152  [pdf, other]

    cs.LG stat.ML

    Distinguishing between Normal and Cancer Cells Using Autoencoder Node Saliency

    Authors: Ya Ju Fan, Jonathan E. Allen, Sam Ade Jacobs, Brian C. Van Essen

    Abstract: Gene expression profiles have been widely used to characterize patterns of cellular responses to diseases. As data becomes available, scalable learning toolkits become essential to processing large datasets using deep learning models to model complex biological processes. We present an autoencoder to capture nonlinear relationships recovered from gene expression profiles. The autoencoder is a nonl…

    Submitted 30 January, 2019; originally announced January 2019.

    Comments: Second Workshop on HPC Applications in Precision Medicine, June 2018

  38. arXiv:1811.05622  [pdf, other]

    cs.GT

    A Game Theoretic Approach for Dynamic Information Flow Tracking to Detect Multi-Stage Advanced Persistent Threats

    Authors: Shana Moothedath, Dinuka Sahabandu, Joey Allen, Andrew Clark, Linda Bushnell, Wenke Lee, Radha Poovendran

    Abstract: Advanced Persistent Threats (APTs) infiltrate cyber systems and compromise specifically targeted data and/or resources through a sequence of stealthy attacks consisting of multiple stages. Dynamic information flow tracking has been proposed to detect APTs. In this paper, we develop a dynamic information flow tracking game for resource-efficient detection of APTs via multi-stage dynamic games. The…

    Submitted 13 November, 2018; originally announced November 2018.

    Comments: 16

  39. arXiv:1807.00736  [pdf, other]

    cs.CR cs.DS

    An Algorithmic Framework For Differentially Private Data Analysis on Trusted Processors

    Authors: Joshua Allen, Bolin Ding, Janardhan Kulkarni, Harsha Nori, Olga Ohrimenko, Sergey Yekhanin

    Abstract: Differential privacy has emerged as the main definition for private data analysis and machine learning. The global model of differential privacy, which assumes that users trust the data collector, provides strong privacy guarantees and introduces small errors in the output. In contrast, applications of differential privacy in commercial systems by Apple, Google, and Microsoft, use the l…

    Submitted 26 October, 2019; v1 submitted 2 July, 2018; originally announced July 2018.

    Comments: Accepted at NeurIPS 2019

  40. arXiv:1807.00412  [pdf, other]

    cs.LG cs.AI cs.RO stat.ML

    Learning to Drive in a Day

    Authors: Alex Kendall, Jeffrey Hawke, David Janz, Przemyslaw Mazur, Daniele Reda, John-Mark Allen, Vinh-Dieu Lam, Alex Bewley, Amar Shah

    Abstract: We demonstrate the first application of deep reinforcement learning to autonomous driving. From randomly initialised parameters, our model is able to learn a policy for lane following in a handful of training episodes using a single monocular image as input. We provide a general and easy to obtain reward: the distance travelled by the vehicle without the safety driver taking control. We use a cont…

    Submitted 11 September, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

    Comments: Further results and demo videos can be viewed at: https://wayve.ai/blog/l2diad

  41. arXiv:1803.09027  [pdf, other]

    cs.CR math.ST

    Comparing Population Means under Local Differential Privacy: with Significance and Power

    Authors: Bolin Ding, Harsha Nori, Paul Li, Joshua Allen

    Abstract: A statistical hypothesis test determines whether a hypothesis should be rejected based on samples from populations. In particular, randomized controlled experiments (or A/B testing) that compare population means using, e.g., t-tests, have been widely deployed in technology companies to aid in making data-driven decisions. Samples used in these tests are collected from users and may contain sensiti…

    Submitted 23 March, 2018; originally announced March 2018.

    Comments: Full version of an AAAI 2018 conference paper

  42. arXiv:1708.07785  [pdf, other]

    cs.CV

    Integral Curvature Representation and Matching Algorithms for Identification of Dolphins and Whales

    Authors: Hendrik J. Weideman, Zachary M. Jablons, Jason Holmberg, Kiirsten Flynn, John Calambokidis, Reny B. Tyson, Jason B. Allen, Randall S. Wells, Krista Hupman, Kim Urian, Charles V. Stewart

    Abstract: We address the problem of identifying individual cetaceans from images showing the trailing edge of their fins. Given the trailing edge from an unknown individual, we produce a ranking of known individuals from a database. The nicks and notches along the trailing edge define an individual's unique signature. We define a representation based on integral curvature that is robust to changes in viewpo…

    Submitted 25 August, 2017; originally announced August 2017.

    Comments: To appear in ICCV 2017 First Workshop on Visual Wildlife Monitoring

  43. arXiv:1604.01696  [pdf, other]

    cs.CL cs.AI

    A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories

    Authors: Nasrin Mostafazadeh, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, James Allen

    Abstract: Representation and learning of commonsense knowledge is one of the foundational problems in the quest to enable deep language understanding. This issue is particularly challenging for understanding causal and correlational relationships between events. While this topic has received a lot of interest in the NLP community, research has been hindered by the lack of a proper evaluation framework. This…

    Submitted 6 April, 2016; originally announced April 2016.

    Comments: In Proceedings of the 2016 North American Chapter of the ACL (NAACL HLT), 2016

  44. arXiv:1412.1419  [pdf]

    cs.DC

    CloudQTL: Evolving a Bioinformatics Application to the Cloud

    Authors: John Allen, David Scott, Malcolm Illingworth, Bartek Dobrzelecki, Davy Virdee, Steve Thorn, Sara Knott

    Abstract: A timeline is presented which shows the stages involved in converting a bioinformatics software application from a set of standalone algorithms through to a simple web based tool then to a web based portal harnessing Grid technologies and on to its latest inception as a Cloud based bioinformatics web tool. The nature of the software is discussed together with a description of its development at va…

    Submitted 3 December, 2014; originally announced December 2014.

    Comments: 12 pages, 3 figures, EGI conference Madrid 2013

  45. arXiv:1303.5731  [pdf]

    cs.AI

    A Language for Planning with Statistics

    Authors: Nathaniel G. Martin, James F. Allen

    Abstract: When a planner must decide whether it has enough evidence to make a decision based on probability, it faces the sample size problem. Current planners using probabilities need not deal with this problem because they do not generate their probabilities from observations. This paper presents an event based language in which the planner's probabilities are calculated from the binomial random variabl…

    Submitted 20 March, 2013; originally announced March 2013.

    Comments: Appears in Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence (UAI1991)

    Report number: UAI-P-1991-PG-220-227

  46. arXiv:1206.5333  [pdf, ps, other]

    cs.CL

    TempEval-3: Evaluating Events, Time Expressions, and Temporal Relations

    Authors: Naushad UzZaman, Hector Llorens, James Allen, Leon Derczynski, Marc Verhagen, James Pustejovsky

    Abstract: We describe the TempEval-3 task which is currently in preparation for the SemEval-2013 evaluation exercise. The aim of TempEval is to advance research on temporal information processing. TempEval-3 follows on from previous TempEval events, incorporating: a three-part task structure covering event, temporal expression and temporal relation extraction; a larger dataset; and single overall task quali…

    Submitted 25 May, 2014; v1 submitted 22 June, 2012; originally announced June 2012.

  47. Identifying Discourse Markers in Spoken Dialog

    Authors: Peter A. Heeman, Donna Byron, James F. Allen

    Abstract: In this paper, we present a method for identifying discourse marker usage in spontaneous speech based on machine learning. Discourse markers are denoted by special POS tags, and thus the process of POS tagging can be used to identify discourse markers. By incorporating POS tagging into language modeling, discourse markers can be identified during speech recognition, in which the timeliness of th…

    Submitted 16 January, 1998; originally announced January 1998.

    Comments: 8 pages, uses psfig

    Journal ref: AAAI 1998 Spring Symposium on Applying Machine Learning to Discourse Processing

  48. Incorporating POS Tagging into Language Modeling

    Authors: Peter A. Heeman, James F. Allen

    Abstract: Language models for speech recognition tend to concentrate solely on recognizing the words that were spoken. In this paper, we redefine the speech recognition problem so that its goal is to find both the best sequence of words and their syntactic role (part-of-speech) in the utterance. This is a necessary first step towards tightening the interaction between speech recognition and natural langua…

    Submitted 22 May, 1997; originally announced May 1997.

    Comments: 5 pages, 2 postscript figures

    Journal ref: In proceedings of Eurospeech'97

  49. Intonational Boundaries, Speech Repairs and Discourse Markers: Modeling Spoken Dialog

    Authors: Peter A. Heeman, James F. Allen

    Abstract: To understand a speaker's turn of a conversation, one needs to segment it into intonational phrases, clean up any speech repairs that might have occurred, and identify discourse markers. In this paper, we argue that these problems must be resolved together, and that they must be resolved early in the processing stream. We put forward a statistical language model that resolves these problems, doe…

    Submitted 23 April, 1997; originally announced April 1997.

    Comments: 8 pages, 3 postscript figures

    Journal ref: In proceedings of ACL/EACL'97

  50. arXiv:cmp-lg/9606023  [pdf, ps]

    cs.CL

    A Robust System for Natural Spoken Dialogue

    Authors: James F. Allen, Bradford W. Miller, Eric K. Ringger, Teresa Sikorski

    Abstract: This paper describes a system that leads us to believe in the feasibility of constructing natural spoken dialogue systems in task-oriented domains. It specifically addresses the issue of robust interpretation of speech in the presence of recognition errors. Robustness is achieved by a combination of statistical error post-correction, syntactically- and semantically-driven robust parsing, and ext…

    Submitted 18 June, 1996; originally announced June 1996.

    Comments: uuencoded, gzipped PostScript. Includes extra Appendix

    Journal ref: Proceedings of the 34th Annual Meeting of the ACL