Showing 1–49 of 49 results for author: Dong, H

  1. arXiv:2405.07759  [pdf, other]

    cs.MM cs.AI cs.NI eess.IV

    MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction

    Authors: Haopeng Wang, Zijian Long, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: Over the last few years, 360° video traffic on the network has grown significantly. A key challenge of 360° video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single viewport prediction to reduce bandwidth consumption. However, the performance of models for single-viewpo…

    Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  2. arXiv:2404.11336  [pdf, other]

    eess.SY cs.CV cs.RO

    Vision-based control for landing an aerial vehicle on a marine vessel

    Authors: Haohua Dong

    Abstract: This work addresses the landing problem of an aerial vehicle, exemplified by a simple quadrotor, on a moving platform using image-based visual servo control. First, the mathematical model of the quadrotor aircraft is introduced, followed by the design of the inner-loop control. At the second stage, the image features on the textured target plane are exploited to derive a vision-based control law.…

    Submitted 17 April, 2024; originally announced April 2024.

  3. arXiv:2404.07318  [pdf, other]

    eess.IV cs.CV cs.LG

    Rethinking Perceptual Metrics for Medical Image Translation

    Authors: Nicholas Konz, Yuwen Chen, Hanxue Gu, Haoyu Dong, Maciej A. Mazurowski

    Abstract: Modern medical image translation methods use generative models for tasks such as the conversion of CT images to MRI. Evaluating these methods typically relies on some chosen downstream task in the target domain, such as segmentation. On the other hand, task-agnostic metrics are attractive, such as the network feature-based perceptual metrics (e.g., FID) that are common to image translation in gene…

    Submitted 10 April, 2024; originally announced April 2024.

  4. arXiv:2403.10786  [pdf, other]

    eess.IV cs.CV cs.LG

    ContourDiff: Unpaired Image Translation with Contour-Guided Diffusion Models

    Authors: Yuwen Chen, Nicholas Konz, Hanxue Gu, Haoyu Dong, Yaqian Chen, Lin Li, Jisoo Lee, Maciej A. Mazurowski

    Abstract: Accurately translating medical images across different modalities (e.g., CT to MRI) has numerous downstream clinical and machine learning applications. While several methods have been proposed to achieve this, they often prioritize perceptual quality with respect to output domain features over preserving anatomical fidelity. However, maintaining anatomy during translation is essential for many tas…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Code will be released on GitHub

  5. arXiv:2402.05210  [pdf, other]

    eess.IV cs.CV cs.LG stat.ML

    Anatomically-Controllable Medical Image Generation with Segmentation-Guided Diffusion Models

    Authors: Nicholas Konz, Yuwen Chen, Haoyu Dong, Maciej A. Mazurowski

    Abstract: Diffusion models have enabled remarkably high-quality medical image generation, yet it is challenging to enforce anatomical constraints in generated images. To this end, we propose a diffusion model-based method that supports anatomically-controllable medical image generation, by following a multi-class anatomical segmentation mask at each sampling step. We additionally introduce a random mask abl…

    Submitted 19 June, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Accepted at MICCAI 2024. Code and synthetic dataset: https://github.com/mazurowski-lab/segmentation-guided-diffusion

  6. arXiv:2401.12974  [pdf, other]

    eess.IV cs.CV q-bio.QM

    SegmentAnyBone: A Universal Model that Segments Any Bone at Any Location on MRI

    Authors: Hanxue Gu, Roy Colglazier, Haoyu Dong, Jikai Zhang, Yaqian Chen, Zafer Yildiz, Yuwen Chen, Lin Li, Jichen Yang, Jay Willhite, Alex M. Meyer, Brian Guo, Yashvi Atul Shah, Emily Luo, Shipra Rajput, Sally Kuehn, Clark Bulleit, Kevin A. Wu, Jisoo Lee, Brandon Ramirez, Darui Lu, Jay M. Levin, Maciej A. Mazurowski

    Abstract: Magnetic Resonance Imaging (MRI) is pivotal in radiology, offering non-invasive and high-quality insights into the human body. Precise segmentation of MRIs into different organs and tissues would be highly beneficial since it would allow for a higher level of understanding of the image content and enable important measurements, which are essential for accurate diagnosis and effective treatment pla…

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: 15 pages, 15 figures

  7. arXiv:2311.12257  [pdf, other]

    cs.SD cs.IR cs.MM eess.AS

    Equipping Pretrained Unconditional Music Transformers with Instrument and Genre Controls

    Authors: Weihan Xu, Julian McAuley, Shlomo Dubnov, Hao-Wen Dong

    Abstract: The "pretraining-and-finetuning" paradigm has become a norm for training domain-specific models in natural language processing and computer vision. In this work, we aim to examine this paradigm for symbolic music generation through leveraging the largest ever symbolic music dataset sourced from the MuseScore forum. We first pretrain a large unconditional transformer model using 1.5 million songs…

    Submitted 20 November, 2023; originally announced November 2023.

  8. arXiv:2308.09693  [pdf, other]

    cs.CV cs.LG eess.IV

    A Lightweight Transformer for Faster and Robust EBSD Data Collection

    Authors: Harry Dong, Sean Donegan, Megna Shah, Yuejie Chi

    Abstract: Three-dimensional electron back-scattered diffraction (EBSD) microscopy is a critical tool in many applications in materials science, yet its data quality can fluctuate greatly during the arduous collection process, particularly via serial-sectioning. Fortunately, 3D EBSD data is inherently sequential, opening up the opportunity to use transformers, state-of-the-art deep learning architectures tha…

    Submitted 18 August, 2023; originally announced August 2023.

  9. arXiv:2308.00507  [pdf, other]

    eess.IV cs.CV cs.LG

    Improved Prognostic Prediction of Pancreatic Cancer Using Multi-Phase CT by Integrating Neural Distance and Texture-Aware Transformer

    Authors: Hexin Dong, Jiawen Yao, Yuxing Tang, Mingze Yuan, Yingda Xia, Jian Zhou, Hong Lu, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Yu Shi, Ling Zhang

    Abstract: Pancreatic ductal adenocarcinoma (PDAC) is a highly lethal cancer in which the tumor-vascular involvement greatly affects the resectability and, thus, overall survival of patients. However, current prognostic prediction methods fail to explicitly and accurately investigate relationships between the tumor and nearby important vessels. This paper proposes a novel learnable neural distance that descr…

    Submitted 13 September, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

    Comments: MICCAI 2023

  10. arXiv:2307.08208  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Towards Stealthy Backdoor Attacks against Speech Recognition via Elements of Sound

    Authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Stefanos Koffas, Yiming Li

    Abstract: Deep neural networks (DNNs) have been widely and successfully adopted and deployed in various applications of speech recognition. Recently, a few works revealed that these models are vulnerable to backdoor attacks, where the adversaries can implant malicious prediction behaviors into victim models by poisoning their training process. In this paper, we revisit poison-only backdoor attacks against s…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: 13 pages

  11. arXiv:2307.04525  [pdf, other]

    eess.IV cs.CV cs.LG

    Cluster-Induced Mask Transformers for Effective Opportunistic Gastric Cancer Screening on Non-contrast CT Scans

    Authors: Mingze Yuan, Yingda Xia, Xin Chen, Jiawen Yao, Junli Wang, Mingyan Qiu, Hexin Dong, Jingren Zhou, Bin Dong, Le Lu, Li Zhang, Zaiyi Liu, Ling Zhang

    Abstract: Gastric cancer is the third leading cause of cancer-related mortality worldwide, but no guideline-recommended screening test exists. Existing methods can be invasive, expensive, and lack sensitivity to identify early-stage gastric cancer. In this study, we explore the feasibility of using a deep learning approach on non-contrast CT scans for gastric cancer detection. We propose a novel cluster-ind…

    Submitted 15 July, 2023; v1 submitted 10 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023

  12. arXiv:2306.09736  [pdf]

    eess.SY

    Overtaking-enabled Eco-approach Control at Signalized Intersections for Connected and Automated Vehicles

    Authors: Haoxuan Dong, Weichao Zhuang, Guoyuan Wu, Zhaojian Li, Guodong Yin, Ziyou Song

    Abstract: Preceding vehicles typically dominate the movement of following vehicles in traffic systems, thereby significantly influencing the efficacy of eco-driving control that concentrates on vehicle speed optimization. To potentially mitigate the negative effect of preceding vehicles on eco-driving control at the signalized intersection, this paper proposes an overtaking-enabled eco-approach control (OEAC…

    Submitted 16 June, 2023; originally announced June 2023.

  13. arXiv:2306.09635  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS eess.SP

    CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models

    Authors: Hao-Wen Dong, Xiaoyu Liu, Jordi Pons, Gautam Bhattacharya, Santiago Pascual, Joan Serrà, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: Recent work has studied text-to-audio synthesis using large amounts of paired text-audio data. However, audio recordings with high-quality text annotations can be difficult to acquire. In this work, we approach text-to-audio synthesis using unlabeled videos and pretrained language-vision models. We propose to learn the desired text-audio correspondence by leveraging the visual modality as a bridge…

    Submitted 23 July, 2023; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted by WASPAA 2023. Demo: https://salu133445.github.io/clipsonic/

  14. Physics-Augmented Data-EnablEd Predictive Control for Eco-driving of Mixed Traffic Considering Diverse Human Behaviors

    Authors: Dongjun Li, Kaixiang Zhang, Haoxuan Dong, Qun Wang, Zhaojian Li, Ziyou Song

    Abstract: Data-driven cooperative control of connected and automated vehicles (CAVs) has gained extensive research interest as it can utilize collected data to generate control actions without relying on parametric system models that are generally challenging to obtain. Existing methods mainly focused on improving traffic safety and stability, while less emphasis has been placed on energy efficiency in the…

    Submitted 1 February, 2024; v1 submitted 2 June, 2023; originally announced June 2023.

  15. arXiv:2305.03098  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Unsupervised anomaly localization in high-resolution breast scans using deep pluralistic image completion

    Authors: Nicholas Konz, Haoyu Dong, Maciej A. Mazurowski

    Abstract: Automated tumor detection in Digital Breast Tomosynthesis (DBT) is a difficult task due to natural tumor rarity, breast tissue variability, and high resolution. Given the scarcity of abnormal images and the abundance of normal images for this problem, an anomaly detection/localization approach could be well-suited. However, most anomaly localization research in machine learning focuses on non-medi…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted in Medical Image Analysis (2023). Our code is at https://github.com/mazurowski-lab/picard

    Journal ref: Medical Image Analysis, 102836 (2023)

  16. GPSMirror: Expanding Accurate GPS Positioning to Shadowed and Indoor Regions with Backscatter

    Authors: Huixin Dong, Yirong Xie, Xianan Zhang, Wei Wang, Xinyu Zhang, Jianhua He

    Abstract: Despite the prevalence of GPS services, they still suffer from intermittent positioning with poor accuracy in partially shadowed regions like urban canyons, flyover shadows, and factories' indoor areas. Existing wisdom relies on hardware modifications of GPS receivers or power-hungry infrastructures requiring continuous plug-in power supply which is hard to provide in outdoor regions and some fact…

    Submitted 15 April, 2023; originally announced April 2023.

    Comments: 13 pages, 26 figures, to appear in MobiCom 2023

  17. A Framework of Reconfigurable Transducer Nodes for Smart Home Environments

    Authors: Basim Hafidh, Hussein Al Osman, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: This letter presents a transducer network framework that supports the amalgamation of multiple transducers into single wireless nodes. This approach is aimed at decreasing energy consumption by reducing the number of wireless transceivers involved in such networks. To make wireless nodes easily reconfigurable, a plug and play mechanism is applied to enable the clustering of any number of transduce…

    Submitted 25 December, 2022; originally announced January 2023.

    Journal ref: IEEE Embedded Systems Letters, vol. 7, no. 3, pp. 81-84, 2015

  18. arXiv:2212.12908  [pdf, other]

    eess.SP cs.LG cs.NE

    Sitting Posture Recognition Using a Spiking Neural Network

    Authors: Jianquan Wang, Basim Hafidh, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: To increase the quality of citizens' lives, we designed a personalized smart chair system to recognize sitting behaviors. The system can receive surface pressure data from the designed sensor and provide feedback for guiding the user towards proper sitting postures. We used a liquid state machine and a logistic regression classifier to construct a spiking neural network for classifying 15 sitting…

    Submitted 25 December, 2022; originally announced December 2022.

    Journal ref: IEEE Sensors Journal, vol. 21, no. 2, pp. 1779-1786, 2021

  19. arXiv:2212.10103  [pdf, ps, other]

    cs.SD cs.AI cs.CR cs.LG eess.AS

    VSVC: Backdoor attack against Keyword Spotting based on Voiceprint Selection and Voice Conversion

    Authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Shunhui Ji

    Abstract: Keyword spotting (KWS) based on deep neural networks (DNNs) has achieved massive success in voice control scenarios. However, training of such DNN-based KWS systems often requires significant data and hardware resources. Manufacturers often entrust this process to a third-party platform. This makes the training process uncontrollable, where attackers can implant backdoors in the model by manipulat…

    Submitted 20 December, 2022; originally announced December 2022.

    Comments: 7 pages, 5 figures

  20. arXiv:2212.07065  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos

    Authors: Hao-Wen Dong, Naoya Takahashi, Yuki Mitsufuji, Julian McAuley, Taylor Berg-Kirkpatrick

    Abstract: Recent years have seen progress beyond domain-specific sound separation for speech or music towards universal sound separation for arbitrary sounds. Prior work on universal sound separation has investigated separating a target sound out of an audio mixture given a text query. Such text-queried sound separation systems provide a natural and scalable interface for specifying arbitrary target sounds.…

    Submitted 3 March, 2023; v1 submitted 14 December, 2022; originally announced December 2022.

    Comments: Accepted by ICLR 2023. Audio samples can be found at https://sony.github.io/CLIPSep/

  21. arXiv:2211.08697  [pdf, ps, other]

    cs.SD cs.AI cs.CR cs.LG eess.AS

    PBSM: Backdoor attack against Keyword spotting based on pitch boosting and sound masking

    Authors: Hanbo Cai, Pengcheng Zhang, Hai Dong, Yan Xiao, Shunhui Ji

    Abstract: Keyword spotting (KWS) has been widely used in various speech control scenarios. The training of KWS is usually based on deep neural networks and requires a large amount of data. Manufacturers often use third-party data to train KWS. However, deep neural networks are not sufficiently interpretable to manufacturers, and attackers can manipulate third-party training data to plant backdoors during th…

    Submitted 16 November, 2022; originally announced November 2022.

    Comments: 5 pages, 4 figures

  22. arXiv:2210.02147  [pdf, other]

    eess.SY

    Adaptive Leading Cruise Control in Mixed Traffic Considering Human Behavioral Diversity

    Authors: Qun Wang, Haoxuan Dong, Fei Ju, Weichao Zhuang, Chen Lv, Liangmo Wang, Ziyou Song

    Abstract: This paper presents an adaptive leading cruise control strategy for the connected and automated vehicle (CAV) and first considers its impact on the following human-driven vehicle (HDV) with diverse driving characteristics in the unified optimization framework for improved holistic energy efficiency. The car-following behaviors of HDV are statistically calibrated using the Next Generation Simulatio…

    Submitted 5 October, 2022; originally announced October 2022.

  23. arXiv:2209.02871  [pdf, other]

    cs.SD cs.MM eess.AS

    Improving Choral Music Separation through Expressive Synthesized Data from Sampled Instruments

    Authors: Ke Chen, Hao-Wen Dong, Yi Luo, Julian McAuley, Taylor Berg-Kirkpatrick, Miller Puckette, Shlomo Dubnov

    Abstract: Choral music separation refers to the task of extracting tracks of voice parts (e.g., soprano, alto, tenor, and bass) from mixed audio. The lack of datasets has impeded research on this topic as previous work has only been able to train and evaluate models on a few minutes of choral music data due to copyright issues and dataset collection difficulties. In this paper, we investigate the use of syn…

    Submitted 6 September, 2022; originally announced September 2022.

    Comments: Camera Ready for Proceedings of the 23rd International Society for Music Information Retrieval Conference, ISMIR 2022

    Journal ref: The 23rd International Society for Music Information Retrieval Conference, 2022

  24. arXiv:2208.12517  [pdf]

    cs.RO eess.SY

    Enabling Massage Actions: An Interactive Parallel Robot with Compliant Joints

    Authors: Huixu Dong, Yue Feng, Chen Qiu, Ye Pan, Miaoying He, I-Ming Chen

    Abstract: We propose a parallel massage robot with compliant joints based on the series elastic actuator (SEA), offering a unified force-position control approach. First, the kinematic and static force models are established for obtaining the corresponding control variables. Then, a novel force-position control strategy is proposed to separately control the force-position along the normal direction of the s…

    Submitted 26 August, 2022; originally announced August 2022.

  25. arXiv:2207.06983  [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Multitrack Music Transformer

    Authors: Hao-Wen Dong, Ke Chen, Shlomo Dubnov, Julian McAuley, Taylor Berg-Kirkpatrick

    Abstract: Existing approaches for generating multitrack music with transformer models have been limited in terms of the number of instruments, the length of the music segments and slow inference. This is partly due to the memory requirements of the lengthy input sequences necessitated by existing representations. In this work, we propose a new multitrack music representation that allows a diverse set of ins…

    Submitted 24 May, 2023; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted by ICASSP 2023. Demo: https://salu133445.github.io/mmt/ . Code: https://github.com/salu133445/mmt

  26. The Intrinsic Manifolds of Radiological Images and their Role in Deep Learning

    Authors: Nicholas Konz, Hanxue Gu, Haoyu Dong, Maciej A. Mazurowski

    Abstract: The manifold hypothesis is a core mechanism behind the success of deep learning, so understanding the intrinsic manifold structure of image data is central to studying how neural networks learn from the data. Intrinsic dataset manifolds and their relationship to learning difficulty have recently begun to be studied for the common domain of natural images, but little such research has been attempte…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: preprint version, accepted for MICCAI 2022 (25th International Conference on Medical Image Computing and Computer Assisted Intervention). 8 pages (+ author names + references + supplementary), 4 figures. Code available at https://github.com/mazurowski-lab/radiologyintrinsicmanifolds

  27. arXiv:2206.09109  [pdf, other]

    stat.ML cs.LG eess.SP math.OC

    Fast and Provable Tensor Robust Principal Component Analysis via Scaled Gradient Descent

    Authors: Harry Dong, Tian Tong, Cong Ma, Yuejie Chi

    Abstract: An increasing number of data science and machine learning problems rely on computation with tensors, which better capture the multi-way relationships and interactions of data than matrices. When tapping into this critical advantage, a key challenge is to develop computationally efficient and provably correct algorithms for extracting useful information from tensor data that are simultaneously robu…

    Submitted 22 February, 2023; v1 submitted 18 June, 2022; originally announced June 2022.

  28. Attention-embedded Quadratic Network (Qttention) for Effective and Interpretable Bearing Fault Diagnosis

    Authors: Jing-Xiao Liao, Hang-Cheng Dong, Zhi-Qi Sun, Jinwei Sun, Shiping Zhang, Feng-Lei Fan

    Abstract: Bearing fault diagnosis is of great importance to decrease the damage risk of rotating machines and further improve economic profits. Recently, machine learning, represented by deep learning, has made great progress in bearing fault diagnosis. However, applying deep learning to such a task still faces a major problem. A deep network is notoriously a black box. It is difficult to know how a model c…

    Submitted 7 August, 2022; v1 submitted 1 June, 2022; originally announced June 2022.

    Comments: update abstract; add experiments in classification results; delete small data experiment; add comparison experiments of qttention and convolution

    Report number: Art no. 3511113

    Journal ref: IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1-13, 2023

  29. arXiv:2202.10690  [pdf]

    eess.SP

    An Energy-concentrated Wavelet Transform for Time Frequency Analysis of Transient Signals

    Authors: Haoran Dong, Gang Yu

    Abstract: Transient signals are often composed of a series of modes that have multivalued time-dependent instantaneous frequency (IF), which brings challenges to the development of signal processing technology. Fortunately, the group delay (GD) of such signals can be well expressed as a single-valued function of frequency. By considering the frequency-domain signal model, we present a postprocessing method c…

    Submitted 22 February, 2022; originally announced February 2022.

  30. arXiv:2202.06034  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS eess.SP

    Deep Performer: Score-to-Audio Music Performance Synthesis

    Authors: Hao-Wen Dong, Cong Zhou, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer -- a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing…

    Submitted 20 February, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

    Comments: ICASSP 2022 final version with appendix

  31. CrossMoDA 2021 challenge: Benchmark of Cross-Modality Domain Adaptation techniques for Vestibular Schwannoma and Cochlea Segmentation

    Authors: Reuben Dorent, Aaron Kujawa, Marina Ivory, Spyridon Bakas, Nicola Rieke, Samuel Joutard, Ben Glocker, Jorge Cardoso, Marc Modat, Kayhan Batmanghelich, Arseniy Belkov, Maria Baldeon Calisto, Jae Won Choi, Benoit M. Dawant, Hexin Dong, Sergio Escalera, Yubo Fan, Lasse Hansen, Mattias P. Heinrich, Smriti Joshi, Victoriya Kashtanova, Hyeon Gyu Kim, Satoshi Kondo, Christian N. Kruse, Susana K. Lai-Yuen, et al. (15 additional authors not shown)

    Abstract: Domain Adaptation (DA) has recently raised strong interest in the medical imaging community. While a large variety of DA techniques has been proposed for image segmentation, most of these techniques have been validated either on private datasets or on small publicly available datasets. Moreover, these datasets mostly addressed single-class problems. To tackle these limitations, the Cross-Modality…

    Submitted 14 December, 2022; v1 submitted 8 January, 2022; originally announced January 2022.

    Comments: In Medical Image Analysis

  32. arXiv:2112.05758  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Edge-Enhanced Dual Discriminator Generative Adversarial Network for Fast MRI with Parallel Imaging Using Multi-view Information

    Authors: Jiahao Huang, Weiping Ding, Jun Lv, Jingwen Yang, Hao Dong, Javier Del Ser, Jun Xia, Tiaojuan Ren, Stephen Wong, Guang Yang

    Abstract: In clinical medicine, magnetic resonance imaging (MRI) is one of the most important tools for diagnosis, triage, prognosis, and treatment planning. However, MRI suffers from an inherent slow data acquisition process because data is collected sequentially in k-space. In recent years, most MRI reconstruction methods proposed in the literature focus on holistic image reconstruction rather than enhanc…

    Submitted 10 December, 2021; originally announced December 2021.

    Comments: 33 pages, 13 figures, Applied Intelligence

  33. arXiv:2112.05150  [pdf, other]

    eess.IV cs.CV

    Deep Recurrent Neural Network with Multi-scale Bi-directional Propagation for Video Deblurring

    Authors: Chao Zhu, Hang Dong, Jinshan Pan, Boyang Liang, Yuhao Huang, Lean Fu, Fei Wang

    Abstract: The success of the state-of-the-art video deblurring methods stems mainly from implicit or explicit estimation of alignment among the adjacent frames for latent video restoration. However, due to the influence of the blur effect, estimating the alignment information from the blurry adjacent frames is not a trivial task. Inaccurate estimations will interfere with the following frame restoration. Instead…

    Submitted 9 December, 2021; originally announced December 2021.

    Comments: Accepted by AAAI-2022

  34. arXiv:2111.04046  [pdf]

    cs.RO eess.SY

    GSG: A Granary Soft Gripper with Mechanical Force Sensing via 3-Dimensional Snap-Through Structure

    Authors: Huixu Dong, Chao-Yu Chen, Chen Qiu, Chen-Hua Yeow, Haoyong Yu

    Abstract: Grasping is an essential capability for most robots in practical applications. Soft robotic grippers are considered as a critical part of robotic grasping and have attracted considerable attention in terms of the advantages of the high compliance and robustness to variance in object geometry; however, they are still limited by the corresponding sensing capabilities and actuation mechanisms. We pro…

    Submitted 7 November, 2021; originally announced November 2021.

  35. arXiv:2108.01769  [pdf, other]

    cs.CV cs.LG cs.SD eess.AS

    An Empirical Evaluation of End-to-End Polyphonic Optical Music Recognition

    Authors: Sachinda Edirisooriya, Hao-Wen Dong, Julian McAuley, Taylor Berg-Kirkpatrick

    Abstract: Previous work has shown that neural architectures are able to perform optical music recognition (OMR) on monophonic and homophonic music with high accuracy. However, piano and orchestral scores frequently exhibit polyphonic passages, which add a second dimension to the task. Monophonic and homophonic music can be described as homorhythmic, or having a single musical rhythm. Polyphonic music, on th…

    Submitted 3 August, 2021; originally announced August 2021.

    Comments: Accepted to ISMIR 2021

  36. arXiv:2108.00184  [pdf, other]

    eess.SY

    Performance assessment and tuning of PID control using TLBO: the single-loop case and PI/P cascade case

    Authors: Wei Zhang, He Dong, Yunlang Xu, Xiaoping Li

    Abstract: Proportional-integral-derivative (PID) control, the most common control strategy in the industry, always suffers from health problems resulting from external disturbances, improper tuning, etc. Therefore, there have been many studies on control performance assessment (CPA) and optimal tuning. Minimum output variance (MOV) is used as a benchmark for CPA of PID, but it is difficult to find due t…

    Submitted 31 July, 2021; originally announced August 2021.

  37. arXiv:2107.05916  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Towards Automatic Instrumentation by Learning to Separate Parts in Symbolic Multitrack Music

    Authors: Hao-Wen Dong, Chris Donahue, Taylor Berg-Kirkpatrick, Julian McAuley

    Abstract: Modern keyboards allow a musician to play multiple instruments at the same time by assigning zones -- fixed pitch ranges of the keyboard -- to different instruments. In this paper, we aim to further extend this idea and examine the feasibility of automatic instrumentation -- dynamically assigning instruments to notes in solo music during performance. In addition to the online, real-time-capable se…

    Submitted 21 October, 2021; v1 submitted 13 July, 2021; originally announced July 2021.

    Comments: ISMIR 2021 camera ready

  38. Federated Meta Learning Enhanced Acoustic Radio Cooperative Framework for Ocean of Things Underwater Acoustic Communications

    Authors: Hao Zhao, Fei Ji, Quansheng Guan, Qiang Li, Shuai Wang, Hefeng Dong, Miaowen Wen

    Abstract: Sixth-generation wireless communication (6G) will be an integrated architecture of "space, air, ground and sea". One of the most difficult parts of this architecture is underwater information acquisition, which needs to transmit information across the interface between water and air. In this scenario, the ocean of things (OoT) will play an important role, because it can serve as a hub connecting Intern…

    Submitted 23 May, 2021; originally announced May 2021.

  39. arXiv:2011.12754  [pdf, other]

    cs.SD eess.SP physics.ao-ph

    Feature Selection based on Principal Component Analysis for Underwater Source Localization by Deep Learning

    Authors: Xiaoyu Zhu, Hefeng Dong, Pierluigi Salvo Rossi, Martin Landrø

    Abstract: In this paper, we propose an interpretable feature selection method based on principal component analysis (PCA) and principal component regression (PCR), which can extract important features for underwater source localization by only introducing the source location without other prior information. This feature selection method is combined with a two-step framework for underwater source localizatio…

    Submitted 25 November, 2020; originally announced November 2020.

  40. arXiv:2009.09361  [pdf, other]

    eess.SY cs.LG

    Lyapunov-Based Reinforcement Learning for Decentralized Multi-Agent Control

    Authors: Qingrui Zhang, Hao Dong, Wei Pan

    Abstract: Decentralized multi-agent control has broad applications, ranging from multi-robot cooperation to distributed sensor networks. In decentralized multi-agent control, systems are complex with unknown or highly uncertain dynamics, where traditional model-based control methods can hardly be applied. Compared with model-based control in control theory, deep reinforcement learning (DRL) is promising to…

    Submitted 20 September, 2020; originally announced September 2020.

    Comments: Accepted to The 2nd International Conference on Distributed Artificial Intelligence

  41. arXiv:2009.00072  [pdf]

    cs.NI cs.DC eess.SP

    Under Water Waste Cleaning by Mobile Edge Computing and Intelligent Image Processing Based Robotic Fish

    Authors: Subhadeep Sahoo, Xiao Han Dong, Zi Qian Liu, Joydeep Sahoo

    Abstract: As water pollution is a serious threat to underwater resources, i.e., underwater plants and species, we focus on protecting the resources by cleaning the non-biodegradable waste from the water. The waste can be recycled for further usage. Here we design a robotic fish which mainly comprises an optical biosensor, a camera module, a piston module, and a wireless transceiver. By exploiting the LTE and 5G netw…

    Submitted 31 August, 2020; originally announced September 2020.

    Comments: This is an innovative project report awarded by Ericsson Innovation Award 2019

  42. arXiv:2008.01951  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    MusPy: A Toolkit for Symbolic Music Generation

    Authors: Hao-Wen Dong, Ke Chen, Julian McAuley, Taylor Berg-Kirkpatrick

    Abstract: In this paper, we present MusPy, an open source Python library for symbolic music generation. MusPy provides easy-to-use tools for essential components in a music generation system, including dataset management, data I/O, data preprocessing and model evaluation. In order to showcase its potential, we present statistical analysis of the eleven datasets currently supported by MusPy. Moreover, we con…

    Submitted 5 August, 2020; originally announced August 2020.

    Comments: Accepted by International Society for Music Information Retrieval Conference (ISMIR), 2020

  43. arXiv:2001.03831  [pdf, other]

    cs.CV eess.IV

    A Comparative Study for Non-rigid Image Registration and Rigid Image Registration

    Authors: Xiaoran Zhang, Hexiang Dong, Di Gao, Xiao Zhao

    Abstract: Image registration algorithms can be generally categorized into two groups: non-rigid and rigid. Recently, many deep learning-based algorithms employ a neural net to characterize the non-rigid image registration function. However, do they always perform better? In this study, we compare the state-of-the-art deep learning-based non-rigid registration approach with a rigid registration approach. The data is g…

    Submitted 11 January, 2020; originally announced January 2020.

  44. arXiv:2001.02360  [pdf, other]

    cs.SD cs.LG eess.AS

    Automatic Melody Harmonization with Triad Chords: A Comparative Study

    Authors: Yin-Cheng Yeh, Wen-Yi Hsiao, Satoru Fukayama, Tetsuro Kitahara, Benjamin Genchel, Hao-Min Liu, Hao-Wen Dong, Yian Chen, Terence Leong, Yi-Hsuan Yang

    Abstract: Several prior works have proposed various methods for the task of automatic melody harmonization, in which a model aims to generate a sequence of chords to serve as the harmonic accompaniment of a given multiple-bar melody sequence. In this paper, we present a comparative study evaluating and comparing the performance of a set of canonical approaches to this task, including a template matching bas…

    Submitted 27 April, 2021; v1 submitted 7 January, 2020; originally announced January 2020.

    Comments: 20 pages, 6 figures, published in Journal of New Music Research (JNMR), Volume 50 Issue 1

  45. arXiv:1911.03461  [pdf, other]

    eess.IV cs.CV

    AIM 2019 Challenge on Image Demoireing: Methods and Results

    Authors: Shanxin Yuan, Radu Timofte, Gregory Slabaugh, Ales Leonardis, Bolun Zheng, Xin Ye, Xiang Tian, Yaowu Chen, Xi Cheng, Zhenyong Fu, Jian Yang, Ming Hong, Wenying Lin, Wenjin Yang, Yanyun Qu, Hong-Kyu Shin, Joon-Yeon Kim, Sung-Jea Ko, Hang Dong, Yu Guo, Jie Wang, Xuan Ding, Zongyan Han, Sourya Dipta Das, Kuldeep Purohit, et al. (3 additional authors not shown)

    Abstract: This paper reviews the first-ever image demoireing challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ICCV 2019. This paper describes the challenge, and focuses on the proposed solutions and their results. Demoireing is a difficult task of removing moire patterns from an image to reveal an underlying clean image. A new dataset, called LCDMoire wa…

    Submitted 8 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: text overlap with arXiv:1911.02498

  46. arXiv:1906.00884  [pdf, other]

    cs.CV eess.IV

    Fashion Editing with Adversarial Parsing Learning

    Authors: Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin

    Abstract: Interactive fashion image manipulation, which enables users to edit images with sketches and color strokes, is an interesting research problem with great application value. Existing works often treat it as a general inpainting task and do not fully leverage the semantic structural information in fashion images. Moreover, they directly utilize conventional convolution and normalization layers to re…

    Submitted 28 September, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: 22 pages, 18 figures

  47. arXiv:1808.00312  [pdf, ps, other]

    eess.SY math.OC

    On a hierarchical control strategy for multi-agent formation without reflection

    Authors: Toshiharu Sugie, Brian D. O. Anderson, Zhiyong Sun, Huichao Dong

    Abstract: This paper considers a formation shape control problem for point agents in a two-dimensional ambient space, where the control is distributed, is based on achieving desired distances between nominated agent pairs, and avoids the possibility of reflection ambiguities. This has potential applications for large-scale multi-agent systems having simple information exchange structure. One solution to thi…

    Submitted 1 August, 2018; originally announced August 2018.

    Comments: Accepted by the 57th IEEE Conference on Decision and Control

  48. arXiv:1804.09399  [pdf, other]

    cs.LG cs.AI cs.SD eess.AS stat.ML

    Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation

    Authors: Hao-Wen Dong, Yi-Hsuan Yang

    Abstract: It has been shown recently that deep convolutional generative adversarial networks (GANs) can learn to generate music in the form of piano-rolls, which represent music by binary-valued time-pitch matrices. However, existing models can only generate real-valued piano-rolls and require further post-processing, such as hard thresholding (HT) or Bernoulli sampling (BS), to obtain the final binary-valu…

    Submitted 6 October, 2018; v1 submitted 25 April, 2018; originally announced April 2018.

    Comments: A preliminary version of this paper appeared in ISMIR 2018. In this version, we added an appendix to provide figures of sample results and remarks on the end-to-end models

  49. arXiv:1709.06298  [pdf, other]

    eess.AS cs.AI cs.LG cs.SD stat.ML

    MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment

    Authors: Hao-Wen Dong, Wen-Yi Hsiao, Li-Chia Yang, Yi-Hsuan Yang

    Abstract: Generating music has a few notable differences from generating images and videos. First, music is an art of time, necessitating a temporal model. Second, music is usually composed of multiple instruments/tracks with their own temporal dynamics, but collectively they unfold over time interdependently. Lastly, musical notes are often grouped into chords, arpeggios or melodies in polyphonic music, an…

    Submitted 24 November, 2017; v1 submitted 19 September, 2017; originally announced September 2017.

    Comments: to appear at AAAI 2018