Showing 1–50 of 59 results for author: Cox, D

  1. arXiv:2407.05467  [pdf, other]

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

  2. arXiv:2407.00121  [pdf, other]

    cs.LG cs.AI cs.CL

    Granite-Function Calling Model: Introducing Function Calling Abilities via Multi-task Learning of Granular Tasks

    Authors: Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, Sadhana Kumaravel, Matthew Stallone, Rameswar Panda, Yara Rizk, GP Bhargav, Maxwell Crouse, Chulaka Gunasekara, Shajith Ikbal, Sachin Joshi, Hima Karanam, Vineet Kumar, Asim Munawar, Sumit Neelam, Dinesh Raghu, Udit Sharma, Adriana Meza Soria, Dheeraj Sreedhar, Praveen Venkateswaran, Merve Unuvar, David Cox, Salim Roukos, Luis Lastras, et al. (1 additional author not shown)

    Abstract: Large language models (LLMs) have recently shown tremendous promise in serving as the backbone to agentic systems, as demonstrated by their performance in multi-faceted, challenging benchmarks like SWE-Bench and Agent-Bench. However, to realize the true potential of LLMs as autonomous agents, they must learn to identify, call, and interact with external tools and application program interfaces (AP…

    Submitted 27 June, 2024; originally announced July 2024.

  3. arXiv:2406.12034  [pdf, other]

    cs.CL cs.LG

    Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts

    Authors: Junmo Kang, Leonid Karlinsky, Hongyin Luo, Zhen Wang, Jacob Hansen, James Glass, David Cox, Rameswar Panda, Rogerio Feris, Alan Ritter

    Abstract: We present Self-MoE, an approach that transforms a monolithic LLM into a compositional, modular system of self-specialized experts, named MiXSE (MiXture of Self-specialized Experts). Our approach leverages self-specialization, which constructs expert modules using self-generated synthetic data, each equipped with a shared base LLM and incorporating self-optimized routing. This allows for dynamic a…

    Submitted 17 June, 2024; originally announced June 2024.

  4. Wavefront Threading Enables Effective High-Level Synthesis

    Authors: Blake Pelton, Adam Sapek, Ken Eguro, Daniel Lo, Alessandro Forin, Matt Humphrey, Jinwen Xi, David Cox, Rajas Karandikar, Johannes de Fine Licht, Evgeny Babin, Adrian Caulfield, Doug Burger

    Abstract: Digital systems are growing in importance and computing hardware is growing more heterogeneous. Hardware design, however, remains laborious and expensive, in part due to the limitations of conventional hardware description languages (HDLs) like VHDL and Verilog. A longstanding research goal has been programming hardware like software, with high-level languages that can generate efficient hardware…

    Submitted 10 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Accepted to PLDI'24

  5. arXiv:2405.17258  [pdf, other]

    cs.LG cs.AI

    $\textit{Trans-LoRA}$: towards data-free Transferable Parameter Efficient Finetuning

    Authors: Runqian Wang, Soumya Ghosh, David Cox, Diego Antognini, Aude Oliva, Rogerio Feris, Leonid Karlinsky

    Abstract: Low-rank adapters (LoRA) and their variants are popular parameter-efficient fine-tuning (PEFT) techniques that closely match full model fine-tune performance while requiring only a small number of additional parameters. These additional LoRA parameters are specific to the base model being adapted. When the base model needs to be deprecated and replaced with a new one, all the associated LoRA modul…

    Submitted 27 May, 2024; originally announced May 2024.

  6. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  7. arXiv:2403.01081  [pdf, other]

    cs.CL cs.LG

    LAB: Large-Scale Alignment for ChatBots

    Authors: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Kai Xu, David D. Cox, Akash Srivastava

    Abstract: This work introduces LAB (Large-scale Alignment for chatBots), a novel methodology designed to overcome the scalability challenges in the instruction-tuning phase of large language model (LLM) training. Leveraging a taxonomy-guided synthetic data generation process and a multi-phase tuning framework, LAB significantly reduces reliance on expensive human annotations and proprietary models like GPT-…

    Submitted 29 April, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Corresponding Author: Akash Srivastava. Equal Contribution: Shivchander Sudalairaj, Abhishek Bhandwaldar, Aldo Pareja, Akash Srivastava, Code: https://github.com/instructlab

  8. arXiv:2310.07654  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Audio-Visual Neural Syntax Acquisition

    Authors: Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

    Abstract: We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without eve…

    Submitted 11 October, 2023; originally announced October 2023.

  9. arXiv:2310.05910  [pdf, other]

    cs.CL cs.AI cs.LG

    SALMON: Self-Alignment with Instructable Reward Models

    Authors: Zhiqing Sun, Yikang Shen, Hongxin Zhang, Qinhong Zhou, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

    Abstract: Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM-based AI agents. However, a significant limitation of such an approach is its dependency on high-quality human annotations, making its application to intricate tasks challenging due to difficulties in obtaining consistent response…

    Submitted 9 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Previous Title: SALMON: Self-Alignment with Principle-Following Reward Models. Accepted to ICLR 2024. Project page: https://github.com/IBM/SALMON

  10. arXiv:2310.00160  [pdf, other]

    cs.CL cs.AI

    Self-Specialization: Uncovering Latent Expertise within Large Language Models

    Authors: Junmo Kang, Hongyin Luo, Yada Zhu, Jacob Hansen, James Glass, David Cox, Alan Ritter, Rogerio Feris, Leonid Karlinsky

    Abstract: Recent works have demonstrated the effectiveness of self-alignment in which a large language model is aligned to follow general instructions using instructional data generated from the model itself starting from a handful of human-written seeds. Instead of general alignment, in this work, we focus on self-alignment for expert domain specialization (e.g., biomedicine, finance). As a preliminary, we…

    Submitted 5 June, 2024; v1 submitted 29 September, 2023; originally announced October 2023.

    Comments: ACL 2024 (Findings; Long Paper)

  11. arXiv:2305.03047  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision

    Authors: Zhiqing Sun, Yikang Shen, Qinhong Zhou, Hongxin Zhang, Zhenfang Chen, David Cox, Yiming Yang, Chuang Gan

    Abstract: Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning (SFT) with human annotations and reinforcement learning from human feedback (RLHF) to align the output of large language models (LLMs) with human intentions, ensuring they are helpful, ethical, and reliable. However, this dependence can significantly constrain the true potential of AI-assistant agents due to…

    Submitted 2 December, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted at NeurIPS 2023 (Spotlight). Project page: https://github.com/IBM/Dromedary

  12. arXiv:2304.03767  [pdf, other]

    cs.CV

    Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following

    Authors: Mingyu Ding, Yan Xu, Zhenfang Chen, David Daniel Cox, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

    Abstract: Humans, even at a very early age, can learn visual concepts and understand geometry and layout through active interaction with the environment, and generalize their compositions to complete tasks described by natural languages in novel scenes. To mimic such capability, we propose Embodied Concept Learner (ECL) in an interactive 3D environment. Specifically, a robot agent can ground visual concepts…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: CoRL 2022

  13. arXiv:2303.00980  [pdf, other]

    cs.LG

    Learning to Grow Pretrained Models for Efficient Transformer Training

    Authors: Peihao Wang, Rameswar Panda, Lucas Torroba Hennigen, Philip Greengard, Leonid Karlinsky, Rogerio Feris, David Daniel Cox, Zhangyang Wang, Yoon Kim

    Abstract: Scaling transformers has led to significant breakthroughs in many domains, leading to a paradigm in which larger versions of existing models are trained and released on a periodic basis. New instances of such models are typically trained completely from scratch, despite the fact that they are often just scaled-up versions of their smaller counterparts. How can we use the implicit knowledge in the…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: International Conference on Learning Representations (ICLR), 2023

  14. arXiv:2302.05941  [pdf, other]

    cs.SE cs.AI

    Rapid Development of Compositional AI

    Authors: Lee Martie, Jessie Rosenberg, Veronique Demers, Gaoyuan Zhang, Onkar Bhardwaj, John Henning, Aditya Prasad, Matt Stallone, Ja Young Lee, Lucy Yip, Damilola Adesina, Elahe Paikari, Oscar Resendiz, Sarah Shaw, David Cox

    Abstract: Compositional AI systems, which combine multiple artificial intelligence components together with other application components to solve a larger problem, have no known pattern of development and are often approached in a bespoke and ad hoc style. This makes development slower and harder to reuse for future applications. To support the full rapid development cycle of compositional AI applications,…

    Submitted 12 February, 2023; originally announced February 2023.

    Comments: Accepted to ICSE 2023, NIER track

    Journal ref: 2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Technologies Results Track (ICSE-NIER), Melbourne, Australia, 2023, pp. (forthcoming)

  15. arXiv:2211.09790  [pdf, other]

    cs.LG cs.AI cs.CV

    ConStruct-VL: Data-Free Continual Structured VL Concepts Learning

    Authors: James Seale Smith, Paola Cascante-Bonilla, Assaf Arbelle, Donghyun Kim, Rameswar Panda, David Cox, Diyi Yang, Zsolt Kira, Rogerio Feris, Leonid Karlinsky

    Abstract: Recently, large-scale pre-trained Vision-and-Language (VL) foundation models have demonstrated remarkable capabilities in many zero-shot downstream tasks, achieving competitive results for recognizing objects defined by as little as short text prompts. However, it has also been shown that VL models are still brittle in Structured VL Concept (SVLC) reasoning, such as the ability to recognize object…

    Submitted 30 March, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted by the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  16. arXiv:2206.00100  [pdf, other]

    cs.CV cs.CL

    VALHALLA: Visual Hallucination for Machine Translation

    Authors: Yi Li, Rameswar Panda, Yoon Kim, Chun-Fu Chen, Rogerio Feris, David Cox, Nuno Vasconcelos

    Abstract: Designing better machine translation systems by considering auxiliary inputs such as images has attracted much attention in recent years. While existing methods show promising performance over the conventional text-only translation systems, they typically require paired text and image as input during inference, which limits their applicability to real-world scenarios. In this paper, we introduce a…

    Submitted 31 May, 2022; originally announced June 2022.

    Comments: CVPR 2022

  17. arXiv:2204.09224  [pdf, other]

    cs.SD cs.AI eess.AS

    ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

    Authors: Kaizhi Qian, Yang Zhang, Heting Gao, Junrui Ni, Cheng-I Lai, David Cox, Mark Hasegawa-Johnson, Shiyu Chang

    Abstract: Self-supervised learning in speech involves training a speech representation network on a large-scale unannotated speech corpus, and then applying the learned representations to downstream tasks. Since the majority of the downstream tasks of SSL learning in speech largely focus on the content information in speech, the most desirable speech representations should be able to disentangle unwanted va…

    Submitted 23 June, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

  18. arXiv:2111.06979  [pdf, other]

    q-bio.NC cs.LG cs.NE

    Neural Population Geometry Reveals the Role of Stochasticity in Robust Perception

    Authors: Joel Dapello, Jenelle Feather, Hang Le, Tiago Marques, David D. Cox, Josh H. McDermott, James J. DiCarlo, SueYeon Chung

    Abstract: Adversarial examples are often cited by neuroscientists and machine learning researchers as an example of how computational models diverge from biological sensory systems. Recent work has proposed adding biologically-inspired components to visual neural networks as a way to improve their adversarial robustness. One surprisingly effective component for reducing adversarial vulnerability is response…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  19. arXiv:2110.14068  [pdf, other]

    cs.LG

    Drawing Robust Scratch Tickets: Subnetworks with Inborn Robustness Are Found within Randomly Initialized Networks

    Authors: Yonggan Fu, Qixuan Yu, Yang Zhang, Shang Wu, Xu Ouyang, David Cox, Yingyan Lin

    Abstract: Deep Neural Networks (DNNs) are known to be vulnerable to adversarial attacks, i.e., an imperceptible perturbation to the input can mislead DNNs trained on clean images into making erroneous predictions. To tackle this, adversarial training is currently the most effective defense method, by augmenting the training set with adversarial samples generated on the fly. Interestingly, we discover for th…

    Submitted 2 February, 2022; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted at NeurIPS 2021

  20. arXiv:2110.01147  [pdf, other]

    cs.SD cs.CL eess.AS

    On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

    Authors: Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass

    Abstract: Are end-to-end text-to-speech (TTS) models over-parametrized? To what extent can these models be pruned, and what happens to their synthesis capabilities? This work serves as a starting point to explore pruning both spectrogram prediction networks and vocoders. We thoroughly investigate the tradeoffs between sparsity and its subsequent effects on synthetic speech. Additionally, we explored several…

    Submitted 27 October, 2021; v1 submitted 3 October, 2021; originally announced October 2021.

  21. arXiv:2106.08519  [pdf, other]

    eess.AS cs.LG cs.SD

    Global Rhythm Style Transfer Without Text Transcriptions

    Authors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson

    Abstract: Prosody plays an important role in characterizing the style of a speaker or an emotion, but most non-parallel voice or emotion style transfer algorithms do not convert any prosody information. Two major components of prosody are pitch and rhythm. Disentangling the prosody information, particularly the rhythm component, from the speech is challenging because it involves breaking the synchrony betwe…

    Submitted 15 June, 2021; originally announced June 2021.

  22. arXiv:2106.06575  [pdf, other]

    cs.LG

    Auto-NBA: Efficient and Effective Search Over the Joint Space of Networks, Bitwidths, and Accelerators

    Authors: Yonggan Fu, Yongan Zhang, Yang Zhang, David Cox, Yingyan Lin

    Abstract: While maximizing deep neural networks' (DNNs') acceleration efficiency requires a joint search/design of three different yet highly coupled aspects, including the networks, bitwidths, and accelerators, the challenges associated with such a joint search have not yet been fully understood and addressed. The key challenges include (1) the dilemma of whether to explode the memory consumption due to th…

    Submitted 24 April, 2023; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted at ICML 2021

  23. arXiv:2106.05933  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

    Authors: Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass

    Abstract: Self-supervised speech representation learning (speech SSL) has demonstrated the benefit of scale in learning rich representations for Automatic Speech Recognition (ASR) with limited paired data, such as wav2vec 2.0. We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results. However, directly applying widely adopted prunin…

    Submitted 26 October, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

  24. arXiv:2104.03835  [pdf, other]

    math.CO cs.DM

    Eternal distance-k domination on graphs

    Authors: Danielle Cox, Erin Meger, M. E. Messinger

    Abstract: Eternal domination is a dynamic process by which a graph is protected from an infinite sequence of vertex intrusions. In eternal distance-$k$ domination, guards initially occupy the vertices of a distance-$k$ dominating set. After a vertex is attacked, guards ``defend'' by each moving up to distance $k$ to form a distance-$k$ dominating set, such that some guard occupies the attacked vertex. The e…

    Submitted 18 November, 2022; v1 submitted 8 April, 2021; originally announced April 2021.

    Comments: 21 pages, 8 figures

  25. arXiv:2012.11587  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Object-Centric Diagnosis of Visual Reasoning

    Authors: Jianwei Yang, Jiayuan Mao, Jiajun Wu, Devi Parikh, David D. Cox, Joshua B. Tenenbaum, Chuang Gan

    Abstract: When answering questions about an image, it not only needs knowing what -- understanding the fine-grained contents (e.g., objects, relationships) in the image, but also telling why -- reasoning over grounding visual cues to derive the answer for a question. Over the last few years, we have seen significant progress on visual question answering. Though impressive as the accuracy grows, it still lag…

    Submitted 21 December, 2020; originally announced December 2020.

  26. Ten Simple Rules for making a vocabulary FAIR

    Authors: Simon J D Cox, Alejandra N Gonzalez-Beltran, Barbara Magagna, Maria-Cristina Marinescu

    Abstract: We present ten simple rules that support converting a legacy vocabulary -- a list of terms available in a print-based glossary or table not accessible using web standards -- into a FAIR vocabulary. Various pathways may be followed to publish the FAIR vocabulary, but we emphasise particularly the goal of providing a distinct IRI for each term or concept. A standard representation of the concept sho…

    Submitted 3 December, 2020; originally announced December 2020.

    Comments: 13 pages

    Journal ref: PLoS Comput Biol 17(6): e1009041 (2021)

  27. arXiv:2010.13187  [pdf, other]

    stat.ML cs.CV cs.LG

    Improving the Reconstruction of Disentangled Representation Learners via Multi-Stage Modeling

    Authors: Akash Srivastava, Yamini Bansal, Yukun Ding, Cole Lincoln Hurwitz, Kai Xu, Bernhard Egger, Prasanna Sattigeri, Joshua B. Tenenbaum, Phuong Le, Arun Prakash R, Nengfeng Zhou, Joel Vaughan, Yaquan Wang, Anwesha Bhattacharyya, Kristjan Greenewald, David D. Cox, Dan Gutfreund

    Abstract: Current autoencoder-based disentangled representation learning methods achieve disentanglement by penalizing the (aggregate) posterior to encourage statistical independence of the latent factors. This approach introduces a trade-off between disentangled representation learning and reconstruction quality since the model does not have enough capacity to learn correlated latent variables that capture…

    Submitted 3 April, 2024; v1 submitted 25 October, 2020; originally announced October 2020.

  28. arXiv:2009.04433  [pdf, other]

    eess.IV cs.CV stat.ML

    not-so-BigGAN: Generating High-Fidelity Images on Small Compute with Wavelet-based Super-Resolution

    Authors: Seungwook Han, Akash Srivastava, Cole Hurwitz, Prasanna Sattigeri, David D. Cox

    Abstract: State-of-the-art models for high-resolution image generation, such as BigGAN and VQVAE-2, require an incredible amount of compute resources and/or time (512 TPU-v3 cores) to train, putting them out of reach for the larger research community. On the other hand, GAN-based image super-resolution models, such as ESRGAN, can not only upscale images to high dimensions, but also are efficient to train. I…

    Submitted 25 October, 2020; v1 submitted 9 September, 2020; originally announced September 2020.

  29. arXiv:2009.01129  [pdf, other]

    cs.CV

    Lifelong Object Detection

    Authors: Wang Zhou, Shiyu Chang, Norma Sosa, Hendrik Hamann, David Cox

    Abstract: Recent advances in object detection have benefited significantly from rapid developments in deep neural networks. However, neural networks suffer from the well-known issue of catastrophic forgetting, which makes continual or lifelong learning problematic. In this paper, we leverage the fact that new training classes arrive in a sequential manner and incrementally refine the model so that it additi…

    Submitted 2 September, 2020; originally announced September 2020.

  30. arXiv:2007.04954  [pdf, other]

    cs.CV cs.GR cs.LG cs.RO

    ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation

    Authors: Chuang Gan, Jeremy Schwartz, Seth Alter, Damian Mrowca, Martin Schrimpf, James Traer, Julian De Freitas, Jonas Kubilius, Abhishek Bhandwaldar, Nick Haber, Megumi Sano, Kuno Kim, Elias Wang, Michael Lingelbach, Aidan Curtis, Kevin Feigelis, Daniel M. Bear, Dan Gutfreund, David Cox, Antonio Torralba, James J. DiCarlo, Joshua B. Tenenbaum, Josh H. McDermott, Daniel L. K. Yamins

    Abstract: We introduce ThreeDWorld (TDW), a platform for interactive multi-modal physical simulation. TDW enables simulation of high-fidelity sensory data and physical interactions between mobile agents and objects in rich 3D environments. Unique properties include: real-time near-photo-realistic image rendering; a library of objects and environments, and routines for their customization; generative procedu…

    Submitted 28 December, 2021; v1 submitted 9 July, 2020; originally announced July 2020.

    Comments: Oral Presentation at NeurIPS 21 Datasets and Benchmarks Track. Project page: http://www.threedworld.org

  31. arXiv:2006.00084  [pdf, other]

    astro-ph.IM astro-ph.EP cs.GR

    Clustering-informed Cinematic Astrophysical Data Visualization with Application to the Moon-forming Terrestrial Synestia

    Authors: Patrick D. Aleo, Simon J. Lock, Donna J. Cox, Stuart A. Levy, J. P. Naiman, A. J. Christensen, Kalina Borkiewicz, Robert Patterson

    Abstract: Scientific visualization tools are currently not optimized to create cinematic, production-quality representations of numerical data for the purpose of science communication. In our pipeline \texttt{Estra}, we outline a step-by-step process from a raw simulation into a finished render as a way to teach non-experts in the field of visualization how to achieve production-quality outputs on their own…

    Submitted 29 May, 2020; originally announced June 2020.

    Comments: 19 pages, 16 figures, submitted to MNRAS

  32. arXiv:2004.11284  [pdf, other]

    eess.AS cs.LG cs.SD

    Unsupervised Speech Decomposition via Triple Information Bottleneck

    Authors: Kaizhi Qian, Yang Zhang, Shiyu Chang, David Cox, Mark Hasegawa-Johnson

    Abstract: Speech information can be roughly decomposed into four components: language content, timbre, pitch, and rhythm. Obtaining disentangled representations of these components is useful in many speech analysis and generation applications. Recently, state-of-the-art voice conversion systems have led to speech representations that can disentangle speaker-dependent and independent information. However, th…

    Submitted 13 March, 2021; v1 submitted 23 April, 2020; originally announced April 2020.

  33. arXiv:2001.11122  [pdf, other]

    cs.CV

    Joint Visual-Temporal Embedding for Unsupervised Learning of Actions in Untrimmed Sequences

    Authors: Rosaura G. VidalMata, Walter J. Scheirer, Anna Kukleva, David Cox, Hilde Kuehne

    Abstract: Understanding the structure of complex activities in untrimmed videos is a challenging task in the area of action recognition. One problem here is that this task usually requires a large amount of hand-annotated minute- or even hour-long video data, but annotating such data is very time consuming and can not easily be automated or scaled. To address this problem, this paper proposes an approach fo…

    Submitted 30 September, 2020; v1 submitted 29 January, 2020; originally announced January 2020.

  34. arXiv:1912.00869  [pdf, ps, other]

    cs.CV

    More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation

    Authors: Quanfu Fan, Chun-Fu Chen, Hilde Kuehne, Marco Pistoia, David Cox

    Abstract: Current state-of-the-art models for video action recognition are mostly based on expensive 3D ConvNets. This results in a need for large GPU clusters to train and evaluate such architectures. To address this problem, we present a lightweight and memory-friendly architecture for action recognition that performs on par with or better than current architectures by using only a fraction of resources.…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: Accepted at NeurIPS 2019, codes and models are available at https://github.com/IBM/bLVNet-TAM

    Report number: 32

    Journal ref: Advances in Neural Information Processing Systems (Neurips 2019)

  35. arXiv:1911.08051  [pdf, other]

    stat.ML cs.LG

    SimVAE: Simulator-Assisted Training for Interpretable Generative Models

    Authors: Akash Srivastava, Jessie Rosenberg, Dan Gutfreund, David D. Cox

    Abstract: This paper presents a simulator-assisted training method (SimVAE) for variational autoencoders (VAE) that leads to a disentangled and interpretable latent space. Training SimVAE is a two-step process in which first a deep generator network (decoder) is trained to approximate the simulator. During this step, the simulator acts as the data source or as a teacher network. Then an inference network (en…

    Submitted 18 November, 2019; originally announced November 2019.

  36. arXiv:1910.11760  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    Self-supervised Moving Vehicle Tracking with Stereo Sound

    Authors: Chuang Gan, Hang Zhao, Peihao Chen, David Cox, Antonio Torralba

    Abstract: Humans are able to localize objects in the environment using both visual and auditory cues, integrating information from multiple modalities into a common reference frame. We introduce a system that can leverage unlabeled audio-visual data to learn to localize objects (moving vehicles) in a visual reference frame, purely using stereo sound at inference time. Since it is labor-intensive to manually…

    Submitted 25 October, 2019; originally announced October 2019.

    Comments: To appear at ICCV 2019. Project page: http://sound-track.csail.mit.edu

  37. arXiv:1910.06513  [pdf, other]

    cs.LG math.OC stat.ML

    ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization

    Authors: Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, David Cox

    Abstract: The adaptive momentum method (AdaMM), which uses past gradients to update descent directions and learning rates simultaneously, has become one of the most popular first-order optimization methods for solving machine learning problems. However, AdaMM is not suited for solving black-box optimization problems, where explicit gradient forms are difficult or infeasible to obtain. In this paper, we prop…

    Submitted 15 October, 2019; v1 submitted 14 October, 2019; originally announced October 2019.

  38. arXiv:1907.01821  [pdf, other]

    cs.CV eess.IV

    Super-Resolution of PROBA-V Images Using Convolutional Neural Networks

    Authors: Marcus Märtens, Dario Izzo, Andrej Krzic, Daniël Cox

    Abstract: ESA's PROBA-V Earth observation satellite enables us to monitor our planet at a large scale, studying the interaction between vegetation and climate and provides guidance for important decisions on our common global future. However, the interval at which high resolution images are recorded spans over several days, in contrast to the availability of lower resolution images which is often daily. We…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: To appear in Special Issue on Applications of Artificial Intelligence in Aerospace Engineering in the Journal "Astrodynamics"

  39. arXiv:1810.05590  [pdf, other]

    cs.DM math.CO

    Chromatic Polynomials of Oriented Graphs

    Authors: Danielle Cox, Christopher Duffy

    Abstract: The oriented chromatic polynomial of an oriented graph outputs the number of oriented $k$-colourings for any input $k$. We fully classify those oriented graphs for which the oriented graph has the same chromatic polynomial as the underlying simple graph, closing an open problem posed by Sopena. We find that such oriented graphs can be both identified and constructed in polynomial time as they are e…

    Submitted 20 December, 2018; v1 submitted 12 October, 2018; originally announced October 2018.

    MSC Class: 05C20

  40. arXiv:1807.08093  [pdf, other]

    cs.CV

    Conditional Infilling GANs for Data Augmentation in Mammogram Classification

    Authors: Eric Wu, Kevin Wu, David Cox, William Lotter

    Abstract: Deep learning approaches to breast cancer detection in mammograms have recently shown promising results. However, such models are constrained by the limited size of publicly available mammography datasets, in large part due to privacy concerns and the high cost of generating expert annotations. Limited dataset size is further exacerbated by substantial class imbalance since "normal" images dramati…

    Submitted 24 August, 2018; v1 submitted 21 July, 2018; originally announced July 2018.

    Comments: To appear in MICCAI 2018, Breast Image Analysis Workshop

  41. arXiv:1806.00730  [pdf, other]

    stat.ML cs.LG cs.NE

    Minnorm training: an algorithm for training over-parameterized deep neural networks

    Authors: Yamini Bansal, Madhu Advani, David D Cox, Andrew M Saxe

    Abstract: In this work, we propose a new training method for finding minimum weight norm solutions in over-parameterized neural networks (NNs). This method seeks to improve training speed and generalization performance by framing NN training as a constrained optimization problem wherein the sum of the norm of the weights in each layer of the network is minimized, under the constraint of exactly fitting trai…

    Submitted 21 June, 2018; v1 submitted 2 June, 2018; originally announced June 2018.

  42. arXiv:1805.10734  [pdf, other]

    q-bio.NC cs.CV cs.LG

    A neural network trained to predict future video frames mimics critical properties of biological neuronal responses and perception

    Authors: William Lotter, Gabriel Kreiman, David Cox

    Abstract: While deep neural networks take loose inspiration from neuroscience, it is an open question how seriously to take the analogies between artificial deep networks and biological neuronal systems. Interestingly, recent work has shown that deep convolutional neural networks (CNNs) trained on large-scale image recognition tasks can serve as strikingly good models for predicting the responses of neurons…

    Submitted 29 May, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

  43. SOSA: A Lightweight Ontology for Sensors, Observations, Samples, and Actuators

    Authors: Krzysztof Janowicz, Armin Haller, Simon J D Cox, Danh Le Phuoc, Maxime Lefrancois

    Abstract: The Sensor, Observation, Sample, and Actuator (SOSA) ontology provides a formal but lightweight general-purpose specification for modeling the interaction between the entities involved in the acts of observation, actuation, and sampling. SOSA is the result of rethinking the W3C-XG Semantic Sensor Network (SSN) ontology based on changes in scope and target audience, technical developments, and less…

    Submitted 25 December, 2018; v1 submitted 25 May, 2018; originally announced May 2018.

    Journal ref: Journal of Web Semantics, 2018

  44. arXiv:1805.09874  [pdf, other]

    stat.ML cs.LG cs.NE q-bio.NC

    Learning Nonlinear Brain Dynamics: van der Pol Meets LSTM

    Authors: German Abrevaya, Irina Rish, Aleksandr Y. Aravkin, Guillermo Cecchi, James Kozloski, Pablo Polosecki, Peng Zheng, Silvina Ponce Dawson, Juliana Rhee, David Cox

    Abstract: Many real-world data sets, especially in biology, are produced by complex nonlinear dynamical systems. In this paper, we focus on brain calcium imaging (CaI) of different organisms (zebrafish and rat), aiming to build a model of joint activation dynamics in large neuronal populations, including the whole brain of zebrafish. We propose a new approach for capturing dynamics of temporal SVD component…

    Submitted 20 July, 2019; v1 submitted 24 May, 2018; originally announced May 2018.

    Comments: 14 pages, 11 figures

    MSC Class: 62F35; 65K10; 49M15

  45. arXiv:1802.05371  [pdf, other]

    cs.DC

    Input-Aware Auto-Tuning of Compute-Bound HPC Kernels

    Authors: Philippe Tillet, David Cox

    Abstract: Efficient implementations of HPC applications for parallel architectures generally rely on external software packages (e.g., BLAS, LAPACK, CUDNN). While these libraries provide highly optimized routines for certain characteristics of inputs (e.g., square matrices), they generally do not retain optimal performance across the wide range of problems encountered in practice. In this paper, we present…

    Submitted 14 February, 2018; originally announced February 2018.

    Journal ref: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2017

  46. arXiv:1710.10112  [pdf, ps, other]

    math.CO cs.DM

    Hyperopic Cops and Robbers

    Authors: A. Bonato, N. E. Clarke, D. Cox, S. Finbow, F. Mc Inerney, M. E. Messinger

    Abstract: We introduce a new variant of the game of Cops and Robbers played on graphs, where the robber is invisible unless outside the neighbor set of a cop. The hyperopic cop number is the corresponding analogue of the cop number, and we investigate bounds and other properties of this parameter. We characterize the cop-win graphs for this variant, along with graphs with the largest possible hyperopic cop…

    Submitted 27 October, 2017; originally announced October 2017.

  47. arXiv:1708.07179  [pdf, other]

    cs.DM

    Limited Visibility Cops and Robbers

    Authors: N. E. Clarke, D. Cox, C. Duffy, D. Dyer, S. Fitzpatrick, M. E. Messinger

    Abstract: We consider a variation of the Cops and Robber game where the cops can only see the robber when the distance between them is at most a fixed parameter $\ell$. We consider the basic consequences of this definition for some simple graph families, and show that this model is not monotonic, unlike common models where the robber is invisible. We see that cops' strategy consists of a phase in which they…

    Submitted 23 August, 2017; originally announced August 2017.

    Comments: 22 pages

    MSC Class: 49N75

  48. arXiv:1707.06978  [pdf, other]

    cs.CV

    A Multi-Scale CNN and Curriculum Learning Strategy for Mammogram Classification

    Authors: William Lotter, Greg Sorensen, David Cox

    Abstract: Screening mammography is an important front-line tool for the early detection of breast cancer, and some 39 million exams are conducted each year in the United States alone. Here, we describe a multi-scale convolutional neural network (CNN) trained with a curriculum learning strategy that achieves high levels of accuracy in classifying mammograms. Specifically, we first train CNN-based patch class…

    Submitted 21 July, 2017; originally announced July 2017.

    Comments: Accepted to MICCAI 2017 Workshop on Deep Learning in Medical Image Analysis

  49. arXiv:1706.02240  [pdf]

    q-bio.NC cs.AI cs.CV cs.LG

    Recurrent computations for visual pattern completion

    Authors: Hanlin Tang, Martin Schrimpf, Bill Lotter, Charlotte Moerman, Ana Paredes, Josue Ortega Caro, Walter Hardesty, David Cox, Gabriel Kreiman

    Abstract: Making inferences from partial information constitutes a critical aspect of cognition. During visual perception, pattern completion enables recognition of poorly visible or occluded objects. We combined psychophysics, physiology and computational models to test the hypothesis that pattern completion is implemented by recurrent computations and present three pieces of evidence that are consistent w…

    Submitted 6 April, 2018; v1 submitted 7 June, 2017; originally announced June 2017.

  50. arXiv:1703.05463  [pdf, ps, other]

    cs.CV

    Using Human Brain Activity to Guide Machine Learning

    Authors: Ruth Fong, Walter Scheirer, David Cox

    Abstract: Machine learning is a field of computer science that builds algorithms that learn. In many cases, machine learning algorithms are used to recreate a human ability like adding a caption to a photo, driving a car, or playing a game. While the human brain has long served as a source of inspiration for machine learning, little effort has been made to directly use data collected from working brains as…

    Submitted 19 September, 2017; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Supplemental material can be downloaded here: http://www.wjscheirer.com/misc/activity_weights/fong-et-al-supplementary.pdf