Showing 1–26 of 26 results for author: Hung, J

  1. arXiv:2403.01792  [pdf, other]

    cs.SD eess.AS

    ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning

    Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

    Abstract: Speech separation has recently made significant progress thanks to the fine-grained vision used in time-domain methods. However, several studies have shown that adopting Short-Time Fourier Transform (STFT) for feature extraction could be beneficial when encountering harsher conditions, such as noise or reverberation. Therefore, we propose a magnitude-conditioned time-domain framework, ConSep, to i…

    Submitted 4 March, 2024; originally announced March 2024.

  2. arXiv:2403.01785  [pdf, other]

    cs.SD eess.AS

    What do neural networks listen to? Exploring the crucial bands in Speech Enhancement using Sinc-convolution

    Authors: Kuan-Hsun Ho, Jeih-weih Hung, Berlin Chen

    Abstract: This study introduces a reformed Sinc-convolution (Sincconv) framework tailored for the encoder component of deep networks for speech enhancement (SE). The reformed Sincconv, based on parametrized sinc functions as band-pass filters, offers notable advantages in terms of training efficiency, filter diversity, and interpretability. The reformed Sincconv is evaluated in conjunction with various SE…

    Submitted 4 March, 2024; originally announced March 2024.

  3. arXiv:2402.06859  [pdf, other]

    cs.LG cs.AI cs.IR

    LiRank: Industrial Large Scale Ranking Models at LinkedIn

    Authors: Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, Birjodh Tiwana, Ganesh Parameswaran, Siddharth Dangi, Lars Hertel, Qiang Xiao, Xiaochen Hou, Yunbo Ouyang, Aman Gupta, Sheallika Singh, Dan Liu, Hailing Cheng, Lei Le, Jonathan Hung, Sathiya Keerthi, Ruoyan Wang, Fengyu Zhang, Mohit Kothari, Chen Zhu, Daqi Sun, Yun Dai, Xun Luan, et al. (9 additional authors not shown)

    Abstract: We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including…

    Submitted 9 February, 2024; originally announced February 2024.

    ACM Class: H.3.3

  4. arXiv:2402.04374  [pdf, other]

    cs.RO

    SKOOTR: A SKating, Omni-Oriented, Tripedal Robot

    Authors: Adam Joshua Hung, Challen Enninful Adu, Talia Y. Moore

    Abstract: In both animals and robots, locomotion capabilities are determined by the physical structure of the system. The majority of legged animals and robots are bilaterally symmetric, which facilitates locomotion with consistent headings and obstacle traversal, but leads to constraints in their turning ability. On the other hand, radially symmetric animals have demonstrated rapid turning abilities enable…

    Submitted 6 February, 2024; originally announced February 2024.

  5. arXiv:2312.03231  [pdf, other]

    cs.LG cs.AI cs.CV cs.HC eess.AS

    Deep Multimodal Fusion for Surgical Feedback Classification

    Authors: Rafal Kocielnik, Elyssa Y. Wong, Timothy N. Chu, Lydia Lin, De-An Huang, Jiayun Wang, Anima Anandkumar, Andrew J. Hung

    Abstract: Quantification of real-time informal feedback delivered by an experienced surgeon to a trainee during surgery is important for skill improvements in surgical training. Such feedback in the live operating room is inherently multimodal, consisting of verbal conversations (e.g., questions and answers) as well as non-verbal elements (e.g., through visual cues like pointing to anatomic elements). In th…

    Submitted 5 December, 2023; originally announced December 2023.

    Journal ref: Published in Proceedings of Machine Learning for Health 2024

  6. arXiv:2310.04997  [pdf]

    cs.CY

    Unmasking Biases and Navigating Pitfalls in the Ophthalmic Artificial Intelligence Lifecycle: A Review

    Authors: Luis Filipe Nakayama, João Matos, Justin Quion, Frederico Novaes, William Greig Mitchell, Rogers Mwavu, Ju-Yi Ji Hung, Alvina Pauline dy Santiago, Warachaya Phanphruk, Jaime S. Cardoso, Leo Anthony Celi

    Abstract: Over the past two decades, exponential growth in data availability, computational power, and newly available modeling techniques has led to an expansion in interest, investment, and research in Artificial Intelligence (AI) applications. Ophthalmology is one of many fields that seek to benefit from AI given the advent of telemedicine screening programs and the use of ancillary imaging. However, bef…

    Submitted 7 October, 2023; originally announced October 2023.

  7. arXiv:2308.12615  [pdf, other]

    cs.SD eess.AS

    NAaLoss: Rethinking the objective of speech enhancement

    Authors: Kuan-Hsun Ho, En-Lun Yu, Jeih-weih Hung, Berlin Chen

    Abstract: Reducing noise interference is crucial for automatic speech recognition (ASR) in a real-world scenario. However, most single-channel speech enhancement (SE) generates "processing artifacts" that negatively affect ASR performance. Hence, in this study, we suggest a Noise- and Artifacts-aware loss function, NAaLoss, to ameliorate the influence of artifacts from a novel perspective. NAaLoss considers…

    Submitted 24 August, 2023; originally announced August 2023.

  8. arXiv:2307.01292  [pdf, other]

    cs.CR cs.AI cs.LG

    Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

    Authors: Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov

    Abstract: Model-serving systems have become increasingly popular, especially in real-time web applications. In such systems, users send queries to the server and specify the desired performance metrics (e.g., desired accuracy, latency). The server maintains a set of models (model zoo) in the back-end and serves the queries based on the specified metrics. This paper examines the security, specifically robust…

    Submitted 6 August, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: 17 pages, 9 figures, 6 tables

  9. arXiv:2208.06917   

    cs.CV

    MTCSNN: Multi-task Clinical Siamese Neural Network for Diabetic Retinopathy Severity Prediction

    Authors: Chao Feng, Jui Po Hung, Aishan Li, Jieping Yang, Xinyu Zhang

    Abstract: Diabetic Retinopathy (DR) has become one of the leading causes of vision impairment in working-aged people and is a severe problem worldwide. However, most of the works ignored the ordinal information of labels. In this project, we propose a novel design MTCSNN, a Multi-task Clinical Siamese Neural Network for Diabetic Retinopathy severity prediction task. The novelty of this project is to utilize…

    Submitted 27 October, 2022; v1 submitted 14 August, 2022; originally announced August 2022.

    Comments: This paper is not sufficiently exhaustive and lacks some analysis. Besides, certain methods of this paper are from the first author's other co-first authoring research paper. There exist disputes among authors, thus we decide to withdraw this paper currently

  10. arXiv:2205.03028  [pdf, other]

    cs.RO cs.CV cs.LG

    Quantification of Robotic Surgeries with Vision-Based Deep Learning

    Authors: Dani Kiyasseh, Runzhuo Ma, Taseen F. Haque, Jessica Nguyen, Christian Wagner, Animashree Anandkumar, Andrew J. Hung

    Abstract: Surgery is a high-stakes domain where surgeons must navigate critical anatomical structures and actively avoid potential complications while achieving the main task at hand. Such surgical activity has been shown to affect long-term patient outcomes. To better understand this relationship, whose mechanics remain unknown for the majority of surgical procedures, we hypothesize that the core elements…

    Submitted 6 May, 2022; originally announced May 2022.

  11. arXiv:2202.01975  [pdf]

    q-bio.QM cs.LG

    Performance of multilabel machine learning models and risk stratification schemas for predicting stroke and bleeding risk in patients with non-valvular atrial fibrillation

    Authors: Juan Lu, Rebecca Hutchens, Joseph Hung, Mohammed Bennamoun, Brendan McQuillan, Tom Briffa, Ferdous Sohel, Kevin Murray, Jonathon Stewart, Benjamin Chow, Frank Sanfilippo, Girish Dwivedi

    Abstract: Appropriate antithrombotic therapy for patients with atrial fibrillation (AF) requires assessment of ischemic stroke and bleeding risks. However, risk stratification schemas such as CHA2DS2-VASc and HAS-BLED have modest predictive capacity for patients with AF. Machine learning (ML) techniques may improve predictive performance and support decision-making for appropriate antithrombotic therapy. We…

    Submitted 2 February, 2022; originally announced February 2022.

  12. arXiv:2201.00696  [pdf]

    cs.IR cs.CR

    Full-privacy secured search engine empowered by efficient genome-mapping algorithms

    Authors: Yuan-Yu Chang, Sheng-Tang Wong, Emmanuel O Salawu, Yu-Xuan Wang, Jui-Hung Hung, Lee-Wei Yang

    Abstract: Since the 90s, keyword-based search engines have been helping people locate relevant web content via a simple query, as have the more recent full-text-based search engines, which are mainly used for plagiarism detection following an article upload. However, these "free" or paid services operate by storing users' search queries and preferences for personal profiling and targeted ads delivery, while user-uploaded a…

    Submitted 25 April, 2022; v1 submitted 29 December, 2021; originally announced January 2022.

    Comments: 21 pages, 5 figures, 3 tables

  13. arXiv:2108.11598  [pdf]

    eess.AS cs.MM cs.SD eess.SP

    Cross-domain Single-channel Speech Enhancement Model with Bi-projection Fusion Module for Noise-robust ASR

    Authors: Fu-An Chao, Jeih-weih Hung, Berlin Chen

    Abstract: In recent decades, many studies have suggested that phase information is crucial for speech enhancement (SE), and time-domain single-channel speech enhancement techniques have shown promise in noise suppression and robust automatic speech recognition (ASR). This paper presents a continuation of the above lines of research and explores two effective SE methods that consider phase information in tim…

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: 6 pages, 3 figures, Accepted by ICME 2021

  14. arXiv:2107.01531  [pdf]

    eess.AS cs.SD eess.SP

    TENET: A Time-reversal Enhancement Network for Noise-robust ASR

    Authors: Fu-An Chao, Shao-Wei Fan Jiang, Bi-Cheng Yan, Jeih-weih Hung, Berlin Chen

    Abstract: Due to the unprecedented breakthroughs brought about by deep learning, speech enhancement (SE) techniques have been developed rapidly and play an important role prior to acoustic modeling to mitigate noise effects on speech. To increase the perceptual quality of speech, current state-of-the-art in the SE field adopts adversarial training by connecting an objective metric to the discriminator. Howe…

    Submitted 14 September, 2021; v1 submitted 3 July, 2021; originally announced July 2021.

    Comments: Accepted to ASRU 2021

  15. arXiv:2011.07442  [pdf, other]

    cs.SD cs.LG eess.AS

    Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information

    Authors: Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, Ching-Feng Liu, Jeih-weih Hung, Shinji Watanabe, Yu Tsao

    Abstract: Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech when performing enhancement to attain performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information t…

    Submitted 18 June, 2023; v1 submitted 14 November, 2020; originally announced November 2020.

    Comments: To appear in IEEE Transactions on Audio, Speech and Language Processing (TASLP)

  16. arXiv:2008.11833  [pdf]

    cs.CV cs.RO

    Deep learning-based computer vision to recognize and classify suturing gestures in robot-assisted surgery

    Authors: Francisco Luongo, Ryan Hakim, Jessica H. Nguyen, Animashree Anandkumar, Andrew J Hung

    Abstract: Our previous work classified a taxonomy of suturing gestures during a vesicourethral anastomosis of robotic radical prostatectomy in association with tissue tears and patient outcomes. Herein, we train deep-learning based computer vision (CV) to automate the identification and classification of suturing gestures for needle driving attempts. Using two independent raters, we manually annotated live…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: 5 figures, 2 tables

    ACM Class: J.3

  17. arXiv:2008.07618  [pdf, other]

    eess.AS cs.LG cs.SD

    Incorporating Broad Phonetic Information for Speech Enhancement

    Authors: Yen-Ju Lu, Chien-Feng Liao, Xugang Lu, Jeih-weih Hung, Yu Tsao

    Abstract: In noisy conditions, knowing speech contents helps listeners to more effectively suppress background noise components and to retrieve pure speech signals. Previous studies have also confirmed the benefits of incorporating phonetic information in a speech enhancement (SE) system to achieve better denoising performance. To obtain the phonetic information, we usually prepare a phoneme-based aco…

    Submitted 13 August, 2020; originally announced August 2020.

    Comments: to be published in Interspeech 2020

  18. arXiv:2002.09505  [pdf, other]

    cs.LG cs.AI stat.ML

    Estimating Q(s,s') with Deep Deterministic Dynamics Gradients

    Authors: Ashley D. Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski

    Abstract: In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still…

    Submitted 25 August, 2020; v1 submitted 21 February, 2020; originally announced February 2020.

    Comments: Accepted into ICML 2020

  19. arXiv:1912.02164  [pdf, other]

    cs.CL cs.AI cs.LG

    Plug and Play Language Models: A Simple Approach to Controlled Text Generation

    Authors: Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, Rosanne Liu

    Abstract: Large transformer-based language models (LMs) trained on huge text corpora have shown unparalleled generation capabilities. However, controlling attributes of the generated language (e.g. switching topic or sentiment) is difficult without modifying the model architecture or fine-tuning on attribute-specific data and entailing the significant cost of retraining. We propose a simple alternative: the…

    Submitted 3 March, 2020; v1 submitted 4 December, 2019; originally announced December 2019.

    Comments: ICLR 2020 camera ready

  20. arXiv:1911.09847  [pdf, ps, other]

    eess.AS cs.SD eess.SP

    Time-Domain Multi-modal Bone/air Conducted Speech Enhancement

    Authors: Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, Szu-Wei Fu, Yu Tsao, Jeih-weih Hung

    Abstract: Previous studies have proven that integrating video signals, as a complementary modality, can facilitate improved performance for speech enhancement (SE). However, video clips usually contain large amounts of data and pose a high cost in terms of computational resources and thus may complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while ma…

    Submitted 17 June, 2020; v1 submitted 21 November, 2019; originally announced November 2019.

    Comments: multi-modal, bone/air-conducted signals, speech enhancement, fully convolutional network

    Journal ref: IEEE Signal Processing Letters, vol. 27, pp. 1035-1039, 2020

  21. arXiv:1911.08153  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    Distributed Microphone Speech Enhancement based on Deep Learning

    Authors: Syu-Siang Wang, Yu-You Liang, Jeih-weih Hung, Yu Tsao, Hsin-Min Wang, Shih-Hau Fang

    Abstract: Speech-related applications deliver inferior performance in complex noise environments. Therefore, this study primarily addresses this problem by introducing speech-enhancement (SE) systems based on deep neural networks (DNNs) applied to a distributed microphone architecture, and then investigates the effectiveness of three different DNN-model structures. The first system constructs a DNN model fo…

    Submitted 24 May, 2020; v1 submitted 19 November, 2019; originally announced November 2019.

    Comments: deep neural network, multi-channel speech enhancement, distributed microphone architecture, diffuse noise environment

  22. arXiv:1904.01631  [pdf, other]

    cs.DC cs.LG stat.ML

    TonY: An Orchestrator for Distributed Machine Learning Jobs

    Authors: Anthony Hsu, Keqiu Hu, Jonathan Hung, Arun Suresh, Zhe Zhang

    Abstract: Training machine learning (ML) models on large datasets requires considerable computing power. To speed up training, it is typical to distribute training across several machines, often with specialized hardware like GPUs or TPUs. Managing a distributed training job is complex and requires dealing with resource contention, distributed configurations, monitoring, and fault tolerance. In this paper,…

    Submitted 23 March, 2019; originally announced April 2019.

    Comments: 2 pages, to be published in OpML '19

  23. arXiv:1811.11357  [pdf, other]

    stat.ML cs.LG

    Metropolis-Hastings Generative Adversarial Networks

    Authors: Ryan Turner, Jane Hung, Eric Frank, Yunus Saatci, Jason Yosinski

    Abstract: We introduce the Metropolis-Hastings generative adversarial network (MH-GAN), which combines aspects of Markov chain Monte Carlo and GANs. The MH-GAN draws samples from the distribution implicitly defined by a GAN's discriminator-generator pair, as opposed to standard GANs which draw samples from the distribution defined only by the generator. It uses the discriminator from GAN training to build a…

    Submitted 17 May, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

  24. arXiv:1811.03486  [pdf, other]

    eess.AS cs.SD

    Speech Enhancement Based on Reducing the Detail Portion of Speech Spectrograms in Modulation Domain via Discrete Wavelet Transform

    Authors: Shih-kuang Lee, Syu-Siang Wang, Yu Tsao, Jeih-weih Hung

    Abstract: In this paper, we propose a novel speech enhancement (SE) method by exploiting the discrete wavelet transform (DWT). This new method reduces the amount of fast time-varying portion, viz. the DWT-wise detail component, in the spectrogram of speech signals so as to highlight the speech-dominant component and achieves better speech quality. A particularity of this new method is that it is completely…

    Submitted 8 November, 2018; originally announced November 2018.

    Comments: 4 pages, 4 figures, to appear in ISCSLP 2018

  25. arXiv:1804.09548  [pdf]

    cs.CV

    Applying Faster R-CNN for Object Detection on Malaria Images

    Authors: Jane Hung, Deepali Ravel, Stefanie C. P. Lopes, Gabriel Rangel, Odailton Amaral Nery, Benoit Malleret, Francois Nosten, Marcus V. G. Lacerda, Marcelo U. Ferreira, Laurent Rénia, Manoj T. Duraisingh, Fabio T. M. Costa, Matthias Marti, Anne E. Carpenter

    Abstract: Deep learning based models have had great success in object detection, but the state of the art models have not yet been widely applied to biological image data. We apply for the first time an object detection model previously used on natural images to identify cells and recognize their stages in brightfield microscopy images of malaria-infected blood. Many micro-organisms like malaria parasites a…

    Submitted 11 March, 2019; v1 submitted 25 April, 2018; originally announced April 2018.

    Comments: CVPR 2017: computer vision for microscopy image analysis (CVMI) Workshop

  26. Wavelet speech enhancement based on nonnegative matrix factorization

    Authors: Syu-Siang Wang, Alan Chern, Yu Tsao, Jeih-weih Hung, Xugang Lu, Ying-Hui Lai, Borching Su

    Abstract: For most of the state-of-the-art speech enhancement techniques, a spectrogram is usually preferred over the respective time-domain raw data since it reveals a more compact presentation together with conspicuous temporal information over a long time span. However, the short-time Fourier transform (STFT) that creates the spectrogram in general distorts the original signal and thereby limits the capabi…

    Submitted 10 January, 2016; originally announced January 2016.