
Showing 1–26 of 26 results for author: Yang, S H

  1. arXiv:2407.03132  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG eess.AS

    Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech

    Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Paula Andrea Pérez-Toro, Maria Schuster, Elmar Noeth, Bjoern Heismann, Andreas Maier, Seung Hee Yang

    Abstract: This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task le…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: to be published in Interspeech 2024 proceedings

  2. arXiv:2406.14576  [pdf, other]

    eess.AS

    Towards Intelligent Speech Assistants in Operating Rooms: A Multimodal Model for Surgical Workflow Analysis

    Authors: Kubilay Can Demir, Belen Lojo Rodriguez, Tobias Weise, Andreas Maier, Seung Hee Yang

    Abstract: To develop intelligent speech assistants and integrate them seamlessly with intra-operative decision-support frameworks, accurate and efficient surgical phase recognition is a prerequisite. In this study, we propose a multimodal framework based on Gated Multimodal Units (GMU) and Multi-Stage Temporal Convolutional Networks (MS-TCN) to recognize surgical phases of port-catheter placement operations…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 5 Pages, Interspeech 2024

    MSC Class: 00b20

  3. arXiv:2404.08064  [pdf]

    eess.AS cs.AI cs.CR cs.LG

    The Impact of Speech Anonymization on Pathology and Its Limits

    Authors: Soroosh Tayebi Arasteh, Tomas Arias-Vergara, Paula Andrea Perez-Toro, Tobias Weise, Kai Packhaeuser, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

    Abstract: Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where priva…

    Submitted 22 June, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

  4. arXiv:2403.03383  [pdf, other]

    physics.optics physics.app-ph

    DaISy: Diffuser-aided Sub-THz Imaging System

    Authors: Shao-Hsuan Wu, Yiyao Zhang, Ke Chen, Shang Hua Yang

    Abstract: Sub-terahertz (Sub-THz) waves possess exceptional attributes, capable of penetrating non-metallic and non-polarized materials while ensuring bio-safety. However, their practicality in imaging is marred by the emergence of troublesome speckle artifacts, primarily due to diffraction effects caused by wavelengths comparable to object dimensions. In addressing this limitation, we present the Diffuser-…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: These authors (Shao-Hsuan Wu and Yiyao Zhang) contributed equally to this work. 15 pages, 7 figures. Supplemental Document: https://doi.org/10.6084/m9.figshare.25328746

    Journal ref: Optics Express (OE) 2024

  5. PoCaPNet: A Novel Approach for Surgical Phase Recognition Using Speech and X-Ray Images

    Authors: Kubilay Can Demir, Tobias Weise, Matthias May, Axel Schmid, Andreas Maier, Seung Hee Yang

    Abstract: Surgical phase recognition is a challenging and necessary task for the development of context-aware intelligent systems that can support medical personnel for better patient care and effective operating room management. In this paper, we present a surgical phase recognition framework that employs a Multi-Stage Temporal Convolution Network using speech and X-Ray images for the first time. We evalua…

    Submitted 25 May, 2023; originally announced May 2023.

    Comments: 5 Pages, 3 figures, INTERSPEECH 2023

    MSC Class: 00b20

  6. Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages

    Authors: Soroosh Tayebi Arasteh, Cristian David Rios-Urrego, Elmar Noeth, Andreas Maier, Seung Hee Yang, Jan Rusz, Juan Rafael Orozco-Arroyave

    Abstract: Parkinson's disease (PD) is a neurological disorder impacting a person's speech. Among automatic PD assessment methods, deep learning models have gained particular interest. Recently, the community has explored cross-pathology and cross-language models which can improve diagnostic accuracy even further. However, strict patient data privacy regulations largely prevent institutions from sharing pati…

    Submitted 21 August, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: INTERSPEECH 2023, pp. 5003--5007, Dublin, Ireland

    Journal ref: INTERSPEECH 2023

  7. arXiv:2206.12320  [pdf, other]

    cs.SD cs.AI eess.AS

    PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis

    Authors: Kubilay Can Demir, Matthias May, Axel Schmid, Michael Uder, Katharina Breininger, Tobias Weise, Andreas Maier, Seung Hee Yang

    Abstract: This paper presents a new multimodal interventional radiology dataset, called PoCaP (Port Catheter Placement) Corpus. This corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions by six surgeons with average duration of 81.4 $\pm$ 41.0 minutes. The corpus aims to provide a resource for developing a smart speech assistant in ope…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: 8 pages, 4 figures, Text, Speech and Dialogue 2022 Conference

    MSC Class: 00b20

  8. arXiv:2204.06450  [pdf, other]

    cs.SD cs.LG eess.AS

    The effect of speech pathology on automatic speaker verification -- a large-scale study

    Authors: Soroosh Tayebi Arasteh, Tobias Weise, Maria Schuster, Elmar Noeth, Andreas Maier, Seung Hee Yang

    Abstract: Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n=3,800 test subjects…

    Submitted 22 November, 2023; v1 submitted 13 April, 2022; originally announced April 2022.

    Comments: Published in Scientific Reports

    Journal ref: Sci Rep 13, 20476 (2023)

  9. arXiv:2204.04016  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD q-bio.QM

    Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment

    Authors: Tobias Weise, Philipp Klumpp, Kubilay Can Demir, Andreas Maier, Elmar Noeth, Bjoern Heismann, Maria Schuster, Seung Hee Yang

    Abstract: Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech represen…

    Submitted 27 June, 2022; v1 submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted and Accepted at INTERSPEECH2022

  10. arXiv:2204.01677  [pdf, other]

    cs.CL

    Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices

    Authors: Abner Hernandez, Paula Andrea Pérez-Toro, Juan Camilo Vásquez-Correa, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

    Abstract: Collecting speech data is an important step in training speech recognition systems and other speech-based machine learning models. However, the issue of privacy protection is an increasing concern that must be addressed. The current study investigates the use of voice conversion as a method for anonymizing voices. In particular, we train several voice conversion models using self-supervised speech…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted for review at Interspeech 2022

  11. arXiv:2204.01670  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition

    Authors: Abner Hernandez, Paula Andrea Pérez-Toro, Elmar Nöth, Juan Rafael Orozco-Arroyave, Andreas Maier, Seung Hee Yang

    Abstract: State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech. However, the performance on impaired speech still remains an issue. The current study explores the usefulness of using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult as several aspects of sp…

    Submitted 4 April, 2022; originally announced April 2022.

    Comments: Submitted for review at Interspeech 2022

  12. arXiv:2202.03540  [pdf, other]

    cs.CV

    SliTraNet: Automatic Detection of Slide Transitions in Lecture Videos using Convolutional Neural Networks

    Authors: Aline Sindel, Abner Hernandez, Seung Hee Yang, Vincent Christlein, Andreas Maier

    Abstract: With the increasing number of online learning material in the web, search for specific content in lecture videos can be time consuming. Therefore, automatic slide extraction from the lecture videos can be helpful to give a brief overview of the main content and to support the students in their studies. For this task, we propose a deep learning method to detect slide transitions in lectures videos.…

    Submitted 7 February, 2022; originally announced February 2022.

    Comments: 6 pages, 5 figures, 1 table, accepted to OAGM Workshop 2021

  13. arXiv:2112.03678  [pdf, other]

    cs.CR cs.CV cs.CY cs.LG eess.IV physics.med-ph

    Does Proprietary Software Still Offer Protection of Intellectual Property in the Age of Machine Learning? -- A Case Study using Dual Energy CT Data

    Authors: Andreas Maier, Seung Hee Yang, Farhad Maleki, Nikesh Muthukrishnan, Reza Forghani

    Abstract: In the domain of medical image processing, medical device manufacturers protect their intellectual property in many cases by shipping only compiled software, i.e. binary code which can be executed but is difficult to be understood by a potential attacker. In this paper, we investigate how well this procedure is able to protect image processing algorithms. In particular, we investigate whether the…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: 6 pages, 2 figures, 1 table, accepted on BVM 2022

  14. arXiv:2108.04543  [pdf, other]

    cs.LG cs.CV physics.med-ph

    Known Operator Learning and Hybrid Machine Learning in Medical Imaging -- A Review of the Past, the Present, and the Future

    Authors: Andreas Maier, Harald Köstler, Marco Heisig, Patrick Krauss, Seung Hee Yang

    Abstract: In this article, we perform a review of the state-of-the-art of hybrid machine learning in medical imaging. We start with a short summary of the general developments of the past in machine learning and how general and specialized approaches have been in competition in the past decades. A particular focus will be the theoretical and experimental evidence pro and contra hybrid modelling. Next, we in…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: 22 pages, 4 figures, submitted to "Progress in Biomedical Engineering"

    Journal ref: Prog. Biomed. Eng. 4 022002 (2022)

  15. arXiv:2101.09651  [pdf]

    physics.app-ph

    The Blow-off Impulse Equivalence between Multi-Energy Composite Spectrum Electron Beam and Powerful Pulsed X-ray

    Authors: D. W. Wang, S. H. Yang, S. Wang, J. Wang, H. P. Li

    Abstract: The electron beam, one of the most effective approaches to simulate the irradiation effects of powerful pulsed X-ray in the laboratory, plays an important role in the experiment of simulating thermodynamic effects of powerful pulsed X-ray. This paper studies the thermodynamics equivalence between multi-energy composite spectrum electron beam and blackbody spectrum X-ray, which is helpful to quickl…

    Submitted 24 January, 2021; originally announced January 2021.

    Comments: 17 pages, 14 figures

  16. Sunspot penumbral filaments intruding into a light bridge and the resultant reconnection jets

    Authors: Y. J. Hou, T. Li, S. H. Zhong, S. H. Yang, Y. L. Guo, X. H. Li, J. Zhang, Y. Y. Xiang

    Abstract: Penumbral filaments and light bridges are prominent structures inside sunspots and are important for understanding the nature of sunspot magnetic fields and magneto-convection underneath. We investigate an interesting event where several penumbral filaments intruded into a sunspot light bridge for more insights into magnetic fields of the sunspot penumbral filament and light bridge, as well as the…

    Submitted 3 August, 2020; originally announced August 2020.

    Comments: 14 pages, 9 figures, 3 movies, abstract shortened to meet arXiv requirements, accepted for publication in A&A

    Journal ref: A&A 642, A44 (2020)

  17. Fast degradation of the circular flare ribbon on 2014 August 24

    Authors: Q. M. Zhang, S. H. Yang, T. Li, Y. J. Hou, Y. Li

    Abstract: The separation and elongation motions of solar flare ribbons have extensively been investigated. The degradation and disappearance of ribbons have rarely been explored. In this paper, we report our multiwavelength observations of a C5.5 circular-ribbon flare associated with two jets (jet1 and jet2) on 2014 August 24, focusing on the fast degradation of the outer circular ribbon (CR). The flare, co…

    Submitted 14 April, 2020; originally announced April 2020.

    Comments: 4 pages, 5 figures, accepted for publication in A&A Letters

    Journal ref: A&A 636, L11 (2020)

  18. arXiv:2001.04260  [pdf]

    eess.AS cs.CL cs.SD

    Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training

    Authors: Seung Hee Yang, Minhwa Chung

    Abstract: Dysarthria is a motor speech impairment affecting millions of people. Dysarthric speech can be far less intelligible than those of non-dysarthric speakers, causing significant communication difficulties. The goal of our work is to develop a model for dysarthric to healthy speech conversion using Cycle-consistent GAN. Using 18,700 dysarthric and 8,610 healthy control Korean utterances that were rec…

    Submitted 9 January, 2020; originally announced January 2020.

    Comments: To be Published on the 24th February in BIOSIGNALS 2020. arXiv admin note: text overlap with arXiv:1904.09407

  19. arXiv:2001.03278  [pdf]

    cs.CL

    A Scalable Chatbot Platform Leveraging Online Community Posts: A Proof-of-Concept Study

    Authors: Sihyeon Jo, Seungryong Yoo, Sangwon Im, Seung Hee Yang, Tong Zuo, Hee-Eun Kim, SangWook Han, Seong-Woo Kim

    Abstract: The development of natural language processing algorithms and the explosive growth of conversational data are encouraging researches on the human-computer conversation. Still, getting qualified conversational data on a large scale is difficult and expensive. In this paper, we verify the feasibility of constructing a data-driven chatbot with processed online community posts by using them as pseudo-…

    Submitted 9 January, 2020; originally announced January 2020.

    Comments: To be Published on the 10th February, 2020, in HCI (Human-Computer Interaction) Conference 2020, Republic of Korea

  20. arXiv:1904.09407  [pdf]

    cs.CL

    Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training

    Authors: Seung Hee Yang, Minhwa Chung

    Abstract: Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics in native utterances are extracted and transplanted onto learner's own speech input, and given back to the learner as a corrective feedback. Previous works focused on speech conversion using prosodic transplantation techniques based on PS…

    Submitted 20 April, 2019; originally announced April 2019.

  21. Eruption of a multi-flux-rope system in solar active region 12673 leading to the two largest flares in Solar Cycle 24

    Authors: Y. J. Hou, J. Zhang, T. Li, S. H. Yang, X. H. Li

    Abstract: Solar active region (AR) 12673 in 2017 September produced two largest flares in Solar Cycle 24: the X9.3 flare on September 06 and the X8.2 flare on September 10. We attempt to investigate the evolutions of the two great flares and their associated complex magnetic system in detail. Aided by the NLFFF modeling, we identify a double-decker flux rope configuration above the polarity inversion line (…

    Submitted 21 October, 2018; v1 submitted 21 August, 2018; originally announced August 2018.

    Comments: 10 pages, 8 figures. To be published in A&A

    Journal ref: A&A 619, A100 (2018)

  22. A blowout jet associated with one obvious extreme-ultraviolet wave and one complicated coronal mass ejection event

    Authors: Y. H. Miao, Y. Liu, H. B. Li, Y. D. Shen, S. H. Yang, A. Elmhamdi, A. S. Kordi, Z. Z. Abidin

    Abstract: In this paper, we present a detailed analysis of a coronal blowout jet eruption which was associated with an obvious extreme-ultraviolet (EUV) wave and one complicated coronal mass ejection (CME) event based on the multi-wavelength and multi-view-angle observations from Solar Dynamics Observatory and Solar Terrestrial Relations Observatory. It is found that the triggering of the blowou…

    Submitted 23 December, 2018; v1 submitted 29 March, 2017; originally announced March 2017.

    Comments: APJ, Accepted October 19, 2018

  23. Light Walls Around Sunspots Observed by the Interface Region Imaging Spectrograph

    Authors: Y. J. Hou, T. Li, S. H. Yang, J. Zhang

    Abstract: The Interface Region Imaging Spectrograph (IRIS) mission provides high-resolution observations of the chromosphere and transition region. We try to determine whether the light walls exist somewhere else in active regions besides light bridges. Employing half-year high tempo-spatial data from the IRIS, we find lots of light walls either around sunspots or above light bridges. For the first time, we…

    Submitted 2 April, 2016; originally announced April 2016.

    Comments: 4 pages, 4 figures, Accepted for publication in A&A Letters

    Journal ref: A&A 589, L7 (2016)

  24. arXiv:1203.3530  [pdf]

    cs.LG cs.CV stat.ML

    Hybrid Generative/Discriminative Learning for Automatic Image Annotation

    Authors: Shuang Hong Yang, Jiang Bian, Hongyuan Zha

    Abstract: Automatic image annotation (AIA) raises tremendous challenges to machine learning as it requires modeling of data that are both ambiguous in input and output, e.g., images containing multiple objects and labeled with multiple semantic tags. Even more challenging is that the number of candidate tags is usually huge (as large as the vocabulary size) yet each image is only related to a few of them. T…

    Submitted 15 March, 2012; originally announced March 2012.

    Comments: Appears in Proceedings of the Twenty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI2010)

    Report number: UAI-P-2010-PG-683-690

  25. arXiv:1010.0621  [pdf, ps, other]

    stat.ML cs.IR cs.SI stat.AP

    Local Optimality of User Choices and Collaborative Competitive Filtering

    Authors: Shuang Hong Yang

    Abstract: While a user's preference is directly reflected in the interactive choice process between her and the recommender, this wealth of information was not fully exploited for learning recommender models. In particular, existing collaborative filtering (CF) approaches take into account only the binary events of user actions but totally disregard the contexts in which users' decisions are made. In this p…

    Submitted 25 February, 2011; v1 submitted 4 October, 2010; originally announced October 2010.

    Comments: 27 pages, 4 figures

    ACM Class: I.2.6; H.1.1; H.3.3

  26. Response of the solar atmosphere to magnetic field evolution in a coronal hole region

    Authors: S. H. Yang, J. Zhang, C. L. Jin, L. P. Li, H. Y. Duan

    Abstract: Methods. We study an equatorial CH observed simultaneously by HINODE and STEREO on July 27, 2007. The HINODE/SP maps are adopted to derive the physical parameters of the photosphere and to research the magnetic field evolution and distribution. The G band and Ca II H images with high tempo-spatial resolution from HINODE/BFI and the multi-wavelength data from STEREO/EUVI are utilized to study the…

    Submitted 17 April, 2009; originally announced April 2009.

    Comments: 9 pages, 9 figures. A&A, in press