Showing 1–35 of 35 results for author: Wilson, K

  1. arXiv:2407.06116

    eess.IV cs.CV cs.LG

    Data-driven Nucleus Subclassification on Colon H&E using Style-transferred Digital Pathology

    Authors: Lucas W. Remedios, Shunxing Bao, Samuel W. Remedios, Ho Hin Lee, Leon Y. Cai, Thomas Li, Ruining Deng, Nancy R. Newlin, Adam M. Saunders, Can Cui, Jia Li, Qi Liu, Ken S. Lau, Joseph T. Roland, Mary K Washington, Lori A. Coburn, Keith T. Wilson, Yuankai Huo, Bennett A. Landman

    Abstract: Understanding the way cells communicate, co-locate, and interrelate is essential to furthering our understanding of how the body functions. H&E is widely available; however, cell subtyping often requires expert knowledge and the use of specialized stains. To reduce the annotation burden, AI has been proposed for the classification of cells on H&E. For example, the recent Colon Nucleus Identificati…

    Submitted 15 May, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.05602

  2. arXiv:2406.19317

    cs.LG cs.AI cs.CL

    Jump Starting Bandits with LLM-Generated Prior Knowledge

    Authors: Parand A. Alamdari, Yanshuai Cao, Kevin H. Wilson

    Abstract: We present substantial evidence demonstrating the benefits of integrating Large Language Models (LLMs) with a Contextual Multi-Armed Bandit framework. Contextual bandits have been widely used in recommendation systems to generate personalized suggestions based on user-specific contexts. We show that LLMs, pre-trained on extensive corpora rich in human knowledge and preferences, can simulate human…

    Submitted 27 June, 2024; originally announced June 2024.

  3. arXiv:2405.18563

    cs.LG stat.ME

    Counterfactual Explanations for Multivariate Time-Series without Training Datasets

    Authors: Xiangyu Sun, Raquel Aoki, Kevin H. Wilson

    Abstract: Machine learning (ML) methods have experienced significant growth in the past decade, yet their practical application in high-impact real-world domains has been hindered by their opacity. When ML methods are responsible for making critical decisions, stakeholders often require insights into how to alter these decisions. Counterfactual explanations (CFEs) have emerged as a solution, offering interp…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2402.03006

    cs.LG stat.ML

    On the development of a practical Bayesian optimisation algorithm for expensive experiments and simulations with changing environmental conditions

    Authors: Mike Diessner, Kevin J. Wilson, Richard D. Whalley

    Abstract: Experiments in engineering are typically conducted in controlled environments where parameters can be set to any desired value. This assumes that the same applies in a real-world setting -- an assumption that is often incorrect as many experiments are influenced by uncontrollable environmental conditions such as temperature, humidity and wind speed. When optimising such experiments, the focus shou…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: 23 pages, 10 figures

  5. arXiv:2401.05602

    cs.CV

    Nucleus subtype classification using inter-modality learning

    Authors: Lucas W. Remedios, Shunxing Bao, Samuel W. Remedios, Ho Hin Lee, Leon Y. Cai, Thomas Li, Ruining Deng, Can Cui, Jia Li, Qi Liu, Ken S. Lau, Joseph T. Roland, Mary K. Washington, Lori A. Coburn, Keith T. Wilson, Yuankai Huo, Bennett A. Landman

    Abstract: Understanding the way cells communicate, co-locate, and interrelate is essential to understanding human physiology. Hematoxylin and eosin (H&E) staining is ubiquitously available both for clinical studies and research. The Colon Nucleus Identification and Classification (CoNIC) Challenge has recently innovated on robust artificial intelligence labeling of six cell types on H&E stains of the colon.…

    Submitted 28 January, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

  6. arXiv:2308.10166

    cs.CV

    Cell Spatial Analysis in Crohn's Disease: Unveiling Local Cell Arrangement Pattern with Graph-based Signatures

    Authors: Shunxing Bao, Sichen Zhu, Vasantha L Kolachala, Lucas W. Remedios, Yeonjoo Hwang, Yutong Sun, Ruining Deng, Can Cui, Yike Li, Jia Li, Joseph T. Roland, Qi Liu, Ken S. Lau, Subra Kugathasan, Peng Qiu, Keith T. Wilson, Lori A. Coburn, Bennett A. Landman, Yuankai Huo

    Abstract: Crohn's disease (CD) is a chronic and relapsing inflammatory condition that affects segments of the gastrointestinal tract. CD activity is determined by histological findings, particularly the density of neutrophils observed on Hematoxylin and Eosin stains (H&E) imaging. However, understanding the broader morphometry and local cell arrangement beyond cell counting and tissue morphology remains cha…

    Submitted 20 August, 2023; originally announced August 2023.

    Comments: Submitted to SPIE Medical Imaging. San Diego, CA. February 2024

  7. arXiv:2307.00750

    cs.CV cs.AI

    Feasibility of Universal Anomaly Detection without Knowing the Abnormality in Medical Images

    Authors: Can Cui, Yaohong Wang, Shunxing Bao, Yucheng Tang, Ruining Deng, Lucas W. Remedios, Zuhayr Asad, Joseph T. Roland, Ken S. Lau, Qi Liu, Lori A. Coburn, Keith T. Wilson, Bennett A. Landman, Yuankai Huo

    Abstract: Many anomaly detection approaches, especially deep learning methods, have been recently developed to identify abnormal image morphology by only employing normal images during training. Unfortunately, many prior anomaly detection methods were optimized for a specific "known" abnormality (e.g., brain tumor, bone fracture, cell types). Moreover, even though only the normal images were used in the tra…

    Submitted 19 August, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

  8. arXiv:2305.11151

    cs.SD eess.AS

    Unsupervised Multi-channel Separation and Adaptation

    Authors: Cong Han, Kevin Wilson, Scott Wisdom, John R. Hershey

    Abstract: A key challenge in machine learning is to generalize from training data to an application domain of interest. This work generalizes the recently proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to train a model on far-field microphone array recordings of overlapping reverberant and noisy speech from the AMI Corpus. Th…

    Submitted 22 March, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  9. arXiv:2305.06709

    cs.LG cs.MS stat.ML

    NUBO: A Transparent Python Package for Bayesian Optimization

    Authors: Mike Diessner, Kevin J. Wilson, Richard D. Whalley

    Abstract: NUBO, short for Newcastle University Bayesian Optimization, is a Bayesian optimization framework for optimizing expensive-to-evaluate black-box functions, such as physical experiments and computer simulators. Bayesian optimization is a cost-efficient optimization strategy that uses surrogate modeling via Gaussian processes to represent an objective function and acquisition functions to guide the s…

    Submitted 3 June, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: Accepted for publication by the Journal of Statistical Software

  10. arXiv:2304.04155

    eess.IV cs.CV

    Segment Anything Model (SAM) for Digital Pathology: Assess Zero-shot Segmentation on Whole Slide Imaging

    Authors: Ruining Deng, Can Cui, Quan Liu, Tianyuan Yao, Lucas W. Remedios, Shunxing Bao, Bennett A. Landman, Lee E. Wheless, Lori A. Coburn, Keith T. Wilson, Yaohong Wang, Shilin Zhao, Agnes B. Fogo, Haichun Yang, Yucheng Tang, Yuankai Huo

    Abstract: The Segment Anything Model (SAM) was released as a foundation model for image segmentation. The promptable segmentation model was trained on over 1 billion masks from 11M licensed and privacy-respecting images. The model supports zero-shot image segmentation with various segmentation prompts (e.g., points, boxes, masks). This makes SAM attractive for medical image analysis, especially for digital…

    Submitted 9 April, 2023; originally announced April 2023.

  11. arXiv:2304.00216

    eess.IV cs.CV cs.LG

    Cross-scale Multi-instance Learning for Pathological Image Diagnosis

    Authors: Ruining Deng, Can Cui, Lucas W. Remedios, Shunxing Bao, R. Michael Womick, Sophie Chiron, Jia Li, Joseph T. Roland, Ken S. Lau, Qi Liu, Keith T. Wilson, Yaohong Wang, Lori A. Coburn, Bennett A. Landman, Yuankai Huo

    Abstract: Analyzing high-resolution whole slide images (WSIs) with regard to information across multiple scales poses a significant challenge in digital pathology. Multi-instance learning (MIL) is a common solution for working with high-resolution images by classifying bags of objects (i.e., sets of smaller image patches). However, such processing is typically performed at a single scale (e.g., 20x magnifica…

    Submitted 16 February, 2024; v1 submitted 31 March, 2023; originally announced April 2023.

  12. arXiv:2303.03677

    cs.CY cs.AI cs.LG

    Training Machine Learning Models to Characterize Temporal Evolution of Disadvantaged Communities

    Authors: Milan Jain, Narmadha Meenu Mohankumar, Heng Wan, Sumitrra Ganguly, Kyle D Wilson, David M Anderson

    Abstract: The Justice40 initiative of the US Department of Energy (DOE) defines disadvantaged communities (DACs) by identifying census tracts across the USA to determine where benefits of climate and energy investments are or are not currently accruing. DAC status not only helps determine eligibility for future Justice40-related investments but is also critical for exploring ways to achiev…

    Submitted 7 March, 2023; originally announced March 2023.

  13. arXiv:2212.04549

    cs.RO eess.SY

    Optimizing Real-Time Performances for Timed-Loop Racing under F1TENTH

    Authors: Nitish Gupta, Kurt Wilson, Zhishan Guo

    Abstract: Motion planning and control in autonomous car racing are among the most challenging and safety-critical tasks due to high speed and dynamism. The lower-level control nodes are expected to be highly optimized due to resource constraints of onboard embedded processing units, although there are strict latency requirements. Some of these guarantees can be provided at the application level, such as us…

    Submitted 8 December, 2022; originally announced December 2022.

    Journal ref: Proceedings of the 43rd IEEE Real-Time Systems Symposium (RTSS), Industry Challenge, Houston, US, Dec. 2022

  14. arXiv:2209.08716

    cs.CV cs.LG

    GLARE: A Dataset for Traffic Sign Detection in Sun Glare

    Authors: Nicholas Gray, Megan Moraes, Jiang Bian, Alex Wang, Allen Tian, Kurt Wilson, Yan Huang, Haoyi Xiong, Zhishan Guo

    Abstract: Real-time machine learning object detection algorithms are often found within autonomous vehicle technology and depend on quality datasets. It is essential that these algorithms work correctly in everyday conditions as well as under strong sun glare. Reports indicate glare is one of the two most prominent environment-related reasons for crashes. However, existing datasets, such as the Laboratory f…

    Submitted 13 December, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

  15. arXiv:2208.07322

    cs.CV cs.AI

    Cross-scale Attention Guided Multi-instance Learning for Crohn's Disease Diagnosis with Pathological Images

    Authors: Ruining Deng, Can Cui, Lucas W. Remedios, Shunxing Bao, R. Michael Womick, Sophie Chiron, Jia Li, Joseph T. Roland, Ken S. Lau, Qi Liu, Keith T. Wilson, Yaohong Wang, Lori A. Coburn, Bennett A. Landman, Yuankai Huo

    Abstract: Multi-instance learning (MIL) is widely used in the computer-aided interpretation of pathological Whole Slide Images (WSIs) to address the lack of pixel-wise or patch-wise annotations. Often, this approach directly applies "natural image driven" MIL algorithms which overlook the multi-scale (i.e., pyramidal) nature of WSIs. Off-the-shelf MIL algorithms are typically deployed on a single scale of WSIs…

    Submitted 15 August, 2022; originally announced August 2022.

  16. Investigating Bayesian optimization for expensive-to-evaluate black box functions: Application in fluid dynamics

    Authors: Mike Diessner, Joseph O'Connor, Andrew Wynn, Sylvain Laizet, Yu Guan, Kevin Wilson, Richard D. Whalley

    Abstract: Bayesian optimization provides an effective method to optimize expensive-to-evaluate black box functions. It has been widely applied to problems in many fields, notably in computer science, e.g., in machine learning to optimize hyperparameters of neural networks, and in engineering, e.g., in fluid dynamics to optimize control strategies that maximize drag reduction. This paper empirically…

    Submitted 20 December, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    Journal ref: Front. Appl. Math. Stat. 8:1076296 (2022)

  17. arXiv:2207.00562

    cs.SD eess.AS

    Distance-Based Sound Separation

    Authors: Katharine Patterson, Kevin Wilson, Scott Wisdom, John R. Hershey

    Abstract: We propose the novel task of distance-based sound separation, where sounds are separated based only on their distance from a single microphone. In the context of assisted listening devices, proximity provides a simple criterion for sound selection in noisy environments that would allow the user to focus on sounds relevant to a local conversation. We demonstrate the feasibility of this approach by…

    Submitted 1 July, 2022; originally announced July 2022.

    Comments: Accepted for publication at Interspeech 2022

  18. arXiv:2203.15588

    cs.LG cs.AI cs.CV

    Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review

    Authors: Can Cui, Haichun Yang, Yaohong Wang, Shilin Zhao, Zuhayr Asad, Lori A. Coburn, Keith T. Wilson, Bennett A. Landman, Yuankai Huo

    Abstract: The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary data that are produced during routine practice. For instance, the personalized diagnosis and treatment planning for a single cancer patient relies on the various images (e.g., radiological, pathological, and camera images) and…

    Submitted 26 January, 2023; v1 submitted 25 March, 2022; originally announced March 2022.

  19. arXiv:2109.09004

    eess.IV cs.CV

    Random Multi-Channel Image Synthesis for Multiplexed Immunofluorescence Imaging

    Authors: Shunxing Bao, Yucheng Tang, Ho Hin Lee, Riqiang Gao, Sophie Chiron, Ilwoo Lyu, Lori A. Coburn, Keith T. Wilson, Joseph T. Roland, Bennett A. Landman, Yuankai Huo

    Abstract: Multiplex immunofluorescence (MxIF) is an emerging imaging technique that provides single-cell mapping with high sensitivity and specificity. With a tenet of 'seeing is believing', MxIF enables iterative staining and imaging of extensive antibodies, which provides comprehensive biomarkers to segment and group different cells on a single tissue section. However, considerable depletion of the scarce…

    Submitted 18 September, 2021; originally announced September 2021.

    Comments: Accepted at the third MICCAI workshop on Computational Pathology (COMPAY 2021)

  20. arXiv:2105.02096

    cs.SD cs.LG eess.AS

    End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

    Authors: Soumi Maiti, Hakan Erdogan, Kevin Wilson, Scott Wisdom, Shinji Watanabe, John R. Hershey

    Abstract: We present an end-to-end deep network model that performs meeting diarization from single-channel audio recordings. End-to-end diarization models have the advantage of handling speaker overlap and enabling straightforward handling of discriminative training, unlike traditional clustering-based diarization methods. The proposed system is designed to handle meetings with unknown numbers of speakers,…

    Submitted 5 May, 2021; originally announced May 2021.

    Comments: 5 pages, 2 figures, ICASSP 2021

    Journal ref: ICASSP 2021, SPE-54.1

  21. arXiv:2009.04323

    eess.AS cs.LG cs.SD eess.SP stat.ML

    VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

    Authors: Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein

    Abstract: We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. Delivering such a model presents numerous challenges: It should improve the performance when the input signal consists of overlapped speech, and must not hurt the speech recognition performance unde…

    Submitted 9 September, 2020; originally announced September 2020.

  22. arXiv:2006.12701

    eess.AS cs.LG cs.SD

    Unsupervised Sound Separation Using Mixture Invariant Training

    Authors: Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron J. Weiss, Kevin Wilson, John R. Hershey

    Abstract: In recent years, rapid progress has been made on the problem of single-channel sound separation using supervised training of deep neural networks. In such supervised approaches, a model is trained to predict the component sources from synthetic mixtures created by adding up isolated ground-truth sources. Reliance on this synthetic training data is problematic because good performance depends upon…

    Submitted 23 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Accepted for spotlight presentation at NeurIPS 2020

  23. arXiv:2003.08310

    cs.CV

    On the Distribution of Minima in Intrinsic-Metric Rotation Averaging

    Authors: Kyle Wilson, David Bindel

    Abstract: Rotation Averaging is a non-convex optimization problem that determines orientations of a collection of cameras from their images of a 3D scene. The problem has been studied using a variety of distances and robustifiers. The intrinsic (or geodesic) distance on SO(3) is geometrically meaningful; but while some extrinsic distance-based solvers admit (conditional) guarantees of correctness, no compar…

    Submitted 18 March, 2020; originally announced March 2020.

    Comments: To be published in CVPR 2020

  24. arXiv:1911.07953

    cs.SD cs.LG eess.AS stat.ML

    Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement

    Authors: Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey

    Abstract: This work introduces sequential neural beamforming, which alternates between neural-network-based spectral separation and beamforming-based spatial separation. Our neural networks for separation use an advanced convolutional architecture trained with a novel stabilized signal-to-noise ratio loss function. For beamforming, we explore multiple ways of computing time-varying covariance matrices, incl…

    Submitted 3 November, 2020; v1 submitted 18 November, 2019; originally announced November 2019.

    Comments: 7 pages, 7 figures, IEEE SLT 2021 (slt2020.org)

  25. arXiv:1908.01901

    cs.LG eess.IV stat.ML

    Fully-automated patient-level malaria assessment on field-prepared thin blood film microscopy images, including Supplementary Information

    Authors: Charles B. Delahunt, Mayoore S. Jaiswal, Matthew P. Horning, Samantha Janko, Clay M. Thompson, Sourabh Kulhare, Liming Hu, Travis Ostbye, Grace Yun, Roman Gebrehiwot, Benjamin K. Wilson, Earl Long, Stephane Proux, Dionicia Gamboa, Peter Chiodini, Jane Carter, Mehul Dhorda, David Isaboke, Bernhards Ogutu, Wellington Oyibo, Elizabeth Villasis, Kyaw Myo Tun, Christine Bachman, David Bell, Courosh Mehanian

    Abstract: Malaria is a life-threatening disease affecting millions. Microscopy-based assessment of thin blood films is a standard method to (i) determine malaria species and (ii) quantitate high-parasitemia infections. Full automation of malaria microscopy by machine learning (ML) is a challenging task because field-prepared slides vary widely in quality and presentation, and artifacts often heavily outnumb…

    Submitted 11 September, 2022; v1 submitted 5 August, 2019; originally announced August 2019.

    Comments: 16 pages, 13 figures

    MSC Class: 68T10

    ACM Class: I.5.0

  26. arXiv:1905.03330

    cs.SD cs.LG eess.AS stat.ML

    Universal Sound Separation

    Authors: Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, John R. Hershey

    Abstract: Recent deep learning approaches have achieved impressive performance on speech enhancement and separation tasks. However, these approaches have not been investigated for separating mixtures of arbitrary sounds of different types, a task we refer to as universal sound separation, and it is unknown how performance on speech tasks carries over to non-speech tasks. To study this question, we develop a…

    Submitted 2 August, 2019; v1 submitted 8 May, 2019; originally announced May 2019.

    Comments: 5 pages, accepted to WASPAA 2019

  27. arXiv:1811.08521

    cs.SD eess.AS

    Differentiable Consistency Constraints for Improved Deep Speech Enhancement

    Authors: Scott Wisdom, John R. Hershey, Kevin Wilson, Jeremy Thorpe, Michael Chinen, Brian Patton, Rif A. Saurous

    Abstract: In recent years, deep networks have led to dramatic improvements in speech enhancement by framing it as a data-driven pattern recognition problem. In many modern enhancement systems, large amounts of data are used to train a deep network to estimate masks for complex-valued short-time Fourier transforms (STFTs) to suppress noise and preserve speech. However, current masking approaches often neglec…

    Submitted 20 November, 2018; originally announced November 2018.

  28. arXiv:1811.07030

    cs.SD eess.AS

    Exploring Tradeoffs in Models for Low-latency Speech Enhancement

    Authors: Kevin Wilson, Michael Chinen, Jeremy Thorpe, Brian Patton, John Hershey, Rif A. Saurous, Jan Skoglund, Richard F. Lyon

    Abstract: We explore a variety of neural network configurations for one- and two-channel spectrogram-mask-based speech enhancement. Our best model improves on previous state-of-the-art performance on the CHiME2 speech enhancement task by 0.4 decibels in signal-to-distortion ratio (SDR). We examine trade-offs such as non-causal look-ahead, computation, and parameter count versus enhancement performance and…

    Submitted 16 November, 2018; originally announced November 2018.

  29. arXiv:1810.04826

    eess.AS cs.LG eess.SP stat.ML

    VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

    Authors: Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno

    Abstract: In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) A speaker recognition network that produces speaker-discriminative embeddings; (2) A spectrogram masking network that takes both noisy spectrogram and speaker embe…

    Submitted 19 June, 2019; v1 submitted 10 October, 2018; originally announced October 2018.

    Comments: To appear in Interspeech 2019

  30. arXiv:1808.00606

    cs.SD eess.AS

    AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

    Authors: Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi

    Abstract: Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or…

    Submitted 23 August, 2018; v1 submitted 1 August, 2018; originally announced August 2018.

    Comments: Interspeech, 2018

  31. arXiv:1804.03619

    cs.SD cs.CV eess.AS

    Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation

    Authors: Ariel Ephrat, Inbar Mosseri, Oran Lang, Tali Dekel, Kevin Wilson, Avinatan Hassidim, William T. Freeman, Michael Rubinstein

    Abstract: We present a joint audio-visual model for isolating a single speech signal from a mixture of sounds such as other speakers and background noise. Solving this task using only audio as input is extremely challenging and does not provide an association of the separated speech signals with speakers in the video. In this paper, we present a deep network-based model that incorporates both visual and aud…

    Submitted 9 August, 2018; v1 submitted 10 April, 2018; originally announced April 2018.

    Comments: Accepted to SIGGRAPH 2018. Project webpage: https://looking-to-listen.github.io

    Journal ref: ACM Trans. Graph. 37(4): 112:1-112:11 (2018)

  32. arXiv:1611.09207

    cs.CL cs.LG stat.ML

    AutoMOS: Learning a non-intrusive assessor of naturalness-of-speech

    Authors: Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley

    Abstract: Developers of text-to-speech synthesizers (TTS) often make use of human raters to assess the quality of synthesized speech. We demonstrate that we can model human raters' mean opinion scores (MOS) of synthesized speech using a deep recurrent neural network whose inputs consist solely of a raw waveform. Our best models provide utterance-level estimates of MOS only moderately inferior to sampled hum…

    Submitted 28 November, 2016; originally announced November 2016.

    Comments: 4 pages, 2 figures, 2 tables, NIPS 2016 End-to-end Learning for Speech and Audio Processing Workshop

  33. arXiv:1609.09430

    cs.SD cs.LG stat.ML

    CNN Architectures for Large-Scale Audio Classification

    Authors: Shawn Hershey, Sourish Chaudhuri, Daniel P. W. Ellis, Jort F. Gemmeke, Aren Jansen, R. Channing Moore, Manoj Plakal, Devin Platt, Rif A. Saurous, Bryan Seybold, Malcolm Slaney, Ron J. Weiss, Kevin Wilson

    Abstract: Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying th…

    Submitted 10 January, 2017; v1 submitted 29 September, 2016; originally announced September 2016.

    Comments: Accepted for publication at ICASSP 2017. Changes: Added definitions of mAP, AUC, and d-prime. Updated mAP/AUC/d-prime numbers for Audio Set based on changes in the latest Audio Set revision. Changed wording to fit the 4-page limit with new additions.

  34. arXiv:1604.02336

    cs.AI cs.LG

    Back to the Basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation

    Authors: Kevin H. Wilson, Yan Karklin, Bojian Han, Chaitanya Ekanadham

    Abstract: Estimating student proficiency is an important task for computer-based learning systems. We compare a family of IRT-based proficiency estimation methods to Deep Knowledge Tracing (DKT), a recently proposed recurrent neural network model with promising initial results. We evaluate how well each model predicts a student's future response given previous responses using two publicly available and one…

    Submitted 21 May, 2016; v1 submitted 8 April, 2016; originally announced April 2016.

    Comments: 6 pages, 2 figures, Educational Data Mining 2016

  35. arXiv:1510.08172

    cs.IT

    Spectrally and Energy Efficient OFDM (SEE-OFDM) for Intensity Modulated Optical Wireless Systems

    Authors: Emily Lam, Sarah Kate Wilson, Hany Elgala, Thomas D. C. Little

    Abstract: Spectrally and energy efficient orthogonal frequency division multiplexing (SEE-OFDM) is an optical OFDM technique based on combining multiple asymmetrically clipped optical OFDM (ACO-OFDM) signals into one OFDM signal. By summing different components together, SEE-OFDM can achieve the same spectral efficiency as DC-biased optical OFDM (DCO-OFDM) without an energy-inefficient DC-bias. This paper i…

    Submitted 27 October, 2015; originally announced October 2015.

    Comments: 26 pages, 13 figures