-
Speaker- and Text-Independent Estimation of Articulatory Movements and Phoneme Alignments from Speech
Authors:
Tobias Weise,
Philipp Klumpp,
Kubilay Can Demir,
Paula Andrea Pérez-Toro,
Maria Schuster,
Elmar Noeth,
Bjoern Heismann,
Andreas Maier,
Seung Hee Yang
Abstract:
This paper introduces a novel combination of two tasks, previously treated separately: acoustic-to-articulatory speech inversion (AAI) and phoneme-to-articulatory (PTA) motion estimation. We refer to this joint task as acoustic phoneme-to-articulatory speech inversion (APTAI) and explore two different approaches, both working speaker- and text-independently during inference. We use a multi-task learning setup, with the end-to-end goal of taking raw speech as input and estimating the corresponding articulatory movements, phoneme sequence, and phoneme alignment. While both proposed approaches share these same requirements, they differ in their way of achieving phoneme-related predictions: one is based on frame classification, the other on a two-staged training procedure and forced alignment. We reach competitive performance of 0.73 mean correlation for the AAI task and achieve up to approximately 87% frame overlap compared to a state-of-the-art text-dependent phoneme force aligner.
Submitted 3 July, 2024;
originally announced July 2024.
-
Towards Intelligent Speech Assistants in Operating Rooms: A Multimodal Model for Surgical Workflow Analysis
Authors:
Kubilay Can Demir,
Belen Lojo Rodriguez,
Tobias Weise,
Andreas Maier,
Seung Hee Yang
Abstract:
To develop intelligent speech assistants and integrate them seamlessly with intra-operative decision-support frameworks, accurate and efficient surgical phase recognition is a prerequisite. In this study, we propose a multimodal framework based on Gated Multimodal Units (GMU) and Multi-Stage Temporal Convolutional Networks (MS-TCN) to recognize surgical phases of port-catheter placement operations. Our method merges speech and image models and uses them separately in different surgical phases. Based on the evaluation of 28 operations, we report a frame-wise accuracy of 92.65 $\pm$ 3.52% and an F1-score of 92.30 $\pm$ 3.82%. Our results show approximately 10% improvement in both metrics over previous work and validate the effectiveness of integrating multimodal data for the surgical phase recognition task. We further investigate the contribution of individual data channels by comparing mono-modal models with multimodal models.
Submitted 17 June, 2024;
originally announced June 2024.
-
The Impact of Speech Anonymization on Pathology and Its Limits
Authors:
Soroosh Tayebi Arasteh,
Tomas Arias-Vergara,
Paula Andrea Perez-Toro,
Tobias Weise,
Kai Packhaeuser,
Maria Schuster,
Elmar Noeth,
Andreas Maier,
Seung Hee Yang
Abstract:
Integration of speech into healthcare has intensified privacy concerns due to its potential as a non-invasive biomarker containing individual biometric information. In response, speaker anonymization aims to conceal personally identifiable information while retaining crucial linguistic content. However, the application of anonymization techniques to pathological speech, a critical area where privacy is especially vital, has not been extensively examined. This study investigates anonymization's impact on pathological speech across over 2,700 speakers from multiple German institutions, focusing on privacy, pathological utility, and demographic fairness. We explore both deep-learning-based and signal-processing-based anonymization methods, and document substantial privacy improvements across disorders, evidenced by equal error rate increases of up to 1933%, with minimal overall impact on utility. Specific disorders such as Dysarthria, Dysphonia, and Cleft Lip and Palate experienced minimal utility changes, while Dysglossia showed slight improvements. Our findings underscore that the impact of anonymization varies substantially across different disorders. This necessitates disorder-specific anonymization strategies to optimally balance privacy with diagnostic utility. Additionally, our fairness analysis revealed consistent anonymization effects across most of the demographics. This study demonstrates the effectiveness of anonymization in pathological speech for enhancing privacy, while also highlighting the importance of customized and disorder-specific approaches to account for inversion attacks.
Submitted 22 June, 2024; v1 submitted 11 April, 2024;
originally announced April 2024.
-
DaISy: Diffuser-aided Sub-THz Imaging System
Authors:
Shao-Hsuan Wu,
Yiyao Zhang,
Ke Chen,
Shang Hua Yang
Abstract:
Sub-terahertz (Sub-THz) waves possess exceptional attributes, capable of penetrating non-metallic and non-polarized materials while ensuring bio-safety. However, their practicality in imaging is marred by the emergence of troublesome speckle artifacts, primarily due to diffraction effects caused by wavelengths comparable to object dimensions. In addressing this limitation, we present the Diffuser-aided sub-THz Imaging System (DaISy), which utilizes a diffuser and a focusing lens to convert coherent waves into incoherent counterparts. The cornerstone of our progress lies in a coherence theory-based theoretical framework, pivotal for designing and validating the THz diffuser, and systematically evaluating speckle phenomena. Our experimental results utilizing DaISy reveal substantial improvements in imaging quality and nearly diffraction-limited spatial resolution. Moreover, we demonstrate a tangible application of DaISy in the scenario of security scanning, highlighting the versatile potential of sub-THz waves in miscellaneous fields.
Submitted 5 March, 2024;
originally announced March 2024.
-
PoCaPNet: A Novel Approach for Surgical Phase Recognition Using Speech and X-Ray Images
Authors:
Kubilay Can Demir,
Tobias Weise,
Matthias May,
Axel Schmid,
Andreas Maier,
Seung Hee Yang
Abstract:
Surgical phase recognition is a challenging and necessary task for the development of context-aware intelligent systems that can support medical personnel for better patient care and effective operating room management. In this paper, we present a surgical phase recognition framework that employs a Multi-Stage Temporal Convolution Network using speech and X-Ray images for the first time. We evaluate our proposed approach using our dataset that comprises 31 port-catheter placement operations and report 82.56% frame-wise accuracy with eight surgical phases. Additionally, we investigate the design choices in the temporal model and solutions for the class-imbalance problem. Our experiments demonstrate that speech and X-Ray data can be effectively utilized for surgical phase recognition, providing a foundation for the development of speech assistants in operating rooms of the future.
Submitted 25 May, 2023;
originally announced May 2023.
-
Federated learning for secure development of AI models for Parkinson's disease detection using speech from different languages
Authors:
Soroosh Tayebi Arasteh,
Cristian David Rios-Urrego,
Elmar Noeth,
Andreas Maier,
Seung Hee Yang,
Jan Rusz,
Juan Rafael Orozco-Arroyave
Abstract:
Parkinson's disease (PD) is a neurological disorder impacting a person's speech. Among automatic PD assessment methods, deep learning models have gained particular interest. Recently, the community has explored cross-pathology and cross-language models which can improve diagnostic accuracy even further. However, strict patient data privacy regulations largely prevent institutions from sharing patient speech data with each other. In this paper, we employ federated learning (FL) for PD detection using speech signals from 3 real-world language corpora of German, Spanish, and Czech, each from a separate institution. Our results indicate that the FL model outperforms all the local models in terms of diagnostic accuracy, while not performing very differently from the model based on centrally combined training sets, with the advantage of not requiring any data sharing among collaborators. This will simplify inter-institutional collaborations, resulting in enhancement of patient outcomes.
Submitted 21 August, 2023; v1 submitted 18 May, 2023;
originally announced May 2023.
-
PoCaP Corpus: A Multimodal Dataset for Smart Operating Room Speech Assistant using Interventional Radiology Workflow Analysis
Authors:
Kubilay Can Demir,
Matthias May,
Axel Schmid,
Michael Uder,
Katharina Breininger,
Tobias Weise,
Andreas Maier,
Seung Hee Yang
Abstract:
This paper presents a new multimodal interventional radiology dataset, called PoCaP (Port Catheter Placement) Corpus. This corpus consists of speech and audio signals in German, X-ray images, and system commands collected from 31 PoCaP interventions by six surgeons with an average duration of 81.4 $\pm$ 41.0 minutes. The corpus aims to provide a resource for developing a smart speech assistant in operating rooms. In particular, it may be used to develop a speech-controlled system that enables surgeons to control operation parameters such as C-arm movements and table positions. In order to record the dataset, we obtained consent from the institutional review board and workers' council at the University Hospital Erlangen and from the patients for data privacy. We describe the recording set-up, data structure, workflow and preprocessing steps, and report the first PoCaP Corpus speech recognition analysis results with 11.52% word error rate using pretrained models. The findings suggest that the data has the potential to build a robust command recognition system and will allow the development of novel intervention support systems using speech and image processing in the medical domain.
Submitted 24 June, 2022;
originally announced June 2022.
-
The effect of speech pathology on automatic speaker verification -- a large-scale study
Authors:
Soroosh Tayebi Arasteh,
Tobias Weise,
Maria Schuster,
Elmar Noeth,
Andreas Maier,
Seung Hee Yang
Abstract:
Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over n=3,800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notable mean equal error rate (EER) of 0.89% with a standard deviation of 0.06%, outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system's performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness. In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.
Submitted 22 November, 2023; v1 submitted 13 April, 2022;
originally announced April 2022.
-
Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment
Authors:
Tobias Weise,
Philipp Klumpp,
Kubilay Can Demir,
Andreas Maier,
Elmar Noeth,
Bjoern Heismann,
Maria Schuster,
Seung Hee Yang
Abstract:
Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech representations of a parallel utterance pair, obtained from a healthy reference and a pathological speaker. Experiments on an English database of Cerebral Palsy patients, using all available utterances per speaker, show high and significant correlation values (R = -0.9) with subjective intelligibility measures, while having only minimal deviation ($\pm$0.01) across four different reference speaker pairs. We also demonstrate the robustness of the proposed method (R = -0.89, deviating $\pm$0.02 over 1000 iterations) by considering a significantly smaller number of utterances per speaker. Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment, resulting in a reference-speaker-pair-invariant method, applicable in scenarios with only a few utterances available.
Submitted 27 June, 2022; v1 submitted 8 April, 2022;
originally announced April 2022.
-
Self-Supervised Speech Representations Preserve Speech Characteristics while Anonymizing Voices
Authors:
Abner Hernandez,
Paula Andrea Pérez-Toro,
Juan Camilo Vásquez-Correa,
Juan Rafael Orozco-Arroyave,
Andreas Maier,
Seung Hee Yang
Abstract:
Collecting speech data is an important step in training speech recognition systems and other speech-based machine learning models. However, the issue of privacy protection is an increasing concern that must be addressed. The current study investigates the use of voice conversion as a method for anonymizing voices. In particular, we train several voice conversion models using self-supervised speech representations including Wav2Vec2.0, Hubert and UniSpeech. Converted voices retain a low word error rate within 1% of the original voice. Equal error rate increases from 1.52% to 46.24% on the LibriSpeech test set and from 3.75% to 45.84% on speakers from the VCTK corpus which signifies degraded performance on speaker verification. Lastly, we conduct experiments on dysarthric speech data to show that speech features relevant to articulation, prosody, phonation and phonology can be extracted from anonymized voices for discriminating between healthy and pathological speech.
Submitted 4 April, 2022;
originally announced April 2022.
-
Cross-lingual Self-Supervised Speech Representations for Improved Dysarthric Speech Recognition
Authors:
Abner Hernandez,
Paula Andrea Pérez-Toro,
Elmar Nöth,
Juan Rafael Orozco-Arroyave,
Andreas Maier,
Seung Hee Yang
Abstract:
State-of-the-art automatic speech recognition (ASR) systems perform well on healthy speech. However, the performance on impaired speech still remains an issue. The current study explores the usefulness of using Wav2Vec self-supervised speech representations as features for training an ASR system for dysarthric speech. Dysarthric speech recognition is particularly difficult as several aspects of speech such as articulation, prosody and phonation can be impaired. Specifically, we train an acoustic model with features extracted from Wav2Vec, Hubert, and the cross-lingual XLSR model. Results suggest that speech representations pretrained on large unlabelled data can improve word error rate (WER) performance. In particular, features from the multilingual model led to lower WERs than filterbanks (Fbank) or models trained on a single language. Improvements were observed in English speakers with cerebral palsy caused dysarthria (UASpeech corpus), Spanish speakers with Parkinsonian dysarthria (PC-GITA corpus) and Italian speakers with paralysis-based dysarthria (EasyCall corpus). Compared to using Fbank features, XLSR-based features reduced WERs by 6.8%, 22.0%, and 7.0% for the UASpeech, PC-GITA, and EasyCall corpus, respectively.
Submitted 4 April, 2022;
originally announced April 2022.
-
SliTraNet: Automatic Detection of Slide Transitions in Lecture Videos using Convolutional Neural Networks
Authors:
Aline Sindel,
Abner Hernandez,
Seung Hee Yang,
Vincent Christlein,
Andreas Maier
Abstract:
With the increasing amount of online learning material on the web, searching for specific content in lecture videos can be time-consuming. Therefore, automatic slide extraction from lecture videos can be helpful to give a brief overview of the main content and to support students in their studies. For this task, we propose a deep learning method to detect slide transitions in lecture videos. We first process each frame of the video by a heuristic-based approach using a 2-D convolutional neural network to predict transition candidates. Then, we increase the complexity by employing two 3-D convolutional neural networks to refine the transition candidates. Evaluation results demonstrate the effectiveness of our method in finding slide transitions.
Submitted 7 February, 2022;
originally announced February 2022.
-
Does Proprietary Software Still Offer Protection of Intellectual Property in the Age of Machine Learning? -- A Case Study using Dual Energy CT Data
Authors:
Andreas Maier,
Seung Hee Yang,
Farhad Maleki,
Nikesh Muthukrishnan,
Reza Forghani
Abstract:
In the domain of medical image processing, medical device manufacturers protect their intellectual property in many cases by shipping only compiled software, i.e. binary code which can be executed but is difficult to be understood by a potential attacker. In this paper, we investigate how well this procedure is able to protect image processing algorithms. In particular, we investigate whether the computation of mono-energetic images and iodine maps from dual energy CT data can be reverse-engineered by machine learning methods. Our results indicate that both can be approximated using only one single slice image as training data at a very high accuracy with structural similarity greater than 0.98 in all investigated cases.
Submitted 6 December, 2021;
originally announced December 2021.
-
Known Operator Learning and Hybrid Machine Learning in Medical Imaging -- A Review of the Past, the Present, and the Future
Authors:
Andreas Maier,
Harald Köstler,
Marco Heisig,
Patrick Krauss,
Seung Hee Yang
Abstract:
In this article, we perform a review of the state-of-the-art of hybrid machine learning in medical imaging. We start with a short summary of the general developments of the past in machine learning and how general and specialized approaches have been in competition in the past decades. A particular focus will be the theoretical and experimental evidence pro and contra hybrid modelling. Next, we inspect several new developments regarding hybrid machine learning with a particular focus on so-called known operator learning and how hybrid approaches gain more and more momentum across essentially all applications in medical imaging and medical image analysis. As we will point out by numerous examples, hybrid models are taking over in image reconstruction and analysis. Even domains such as physical simulation and scanner and acquisition design are being addressed using machine learning grey box modelling approaches. Towards the end of the article, we will investigate a few future directions and point out relevant areas in which hybrid modelling, meta learning, and other domains will likely be able to drive the state-of-the-art ahead.
Submitted 10 August, 2021;
originally announced August 2021.
-
The Blow-off Impulse Equivalence between Multi-Energy Composite Spectrum Electron Beam and Powerful Pulsed X-ray
Authors:
D. W. Wang,
S. H. Yang,
S. Wang,
J. Wang,
H. P. Li
Abstract:
The electron beam, one of the most effective approaches to simulate the irradiation effects of powerful pulsed X-rays in the laboratory, plays an important role in experiments simulating the thermodynamic effects of powerful pulsed X-rays. This paper studies the thermodynamic equivalence between a multi-energy composite spectrum electron beam and blackbody spectrum X-rays, which helps to quickly determine the experimental parameters in the simulation experiment. The experimental electron beam data are extrapolated by numerical calculation to increase the range of energy flux. By calculating the blow-off impulse of blackbody spectrum X-ray irradiation, we obtained the curve of X-ray blow-off impulse as a function of energy flux, and then found two categories of equivalence relations, equal-energy flux and equal-impulse, by analysing the calculated electron beam and X-ray blow-off impulses. Based on these relations, we can directly or indirectly obtain the blow-off impulse of blackbody spectrum X-ray irradiation from electron beam experiments.
Submitted 24 January, 2021;
originally announced January 2021.
-
Sunspot penumbral filaments intruding into a light bridge and the resultant reconnection jets
Authors:
Y. J. Hou,
T. Li,
S. H. Zhong,
S. H. Yang,
Y. L. Guo,
X. H. Li,
J. Zhang,
Y. Y. Xiang
Abstract:
Penumbral filaments and light bridges are prominent structures inside sunspots and are important for understanding the nature of sunspot magnetic fields and the magneto-convection underneath. We investigate an interesting event where several penumbral filaments intruded into a sunspot light bridge to gain more insight into the magnetic fields of the sunspot penumbral filament and light bridge, as well as their interaction. The emission, kinematic, and magnetic topology characteristics of the penumbral filaments intruding into the light bridge and the resultant jets are studied. At the west part of the light bridge, the intruding penumbral filaments penetrated into the umbrae on both sides of the light bridge, and two groups of jets were also detected. The jets shared the same projected morphology with the intruding filaments and were accompanied by intermittent footpoint brightenings. Simultaneous spectral imaging observations provide convincing evidence for the presence of magnetic-reconnection-related heating and bidirectional flows near the jet bases and contribute to measuring vector velocities of the jets. Additionally, nonlinear force-free field extrapolation results reveal strong and highly inclined magnetic fields along the intruding penumbral filaments, consistent with the results deduced from the vector velocities of the jets. Therefore, we propose that the jets could be caused by magnetic reconnections between emerging fields within the light bridge and the nearly horizontal fields of intruding filaments. They were then ejected outward along the stronger filament fields. Our study indicates that magnetic reconnection could occur between the penumbral filament fields and emerging fields within the light bridge and produce jets along the stronger filament fields. These results further complement the study of magnetic reconnection and dynamic activities within the sunspot.
Submitted 3 August, 2020;
originally announced August 2020.
-
Fast degradation of the circular flare ribbon on 2014 August 24
Authors:
Q. M. Zhang,
S. H. Yang,
T. Li,
Y. J. Hou,
Y. Li
Abstract:
The separation and elongation motions of solar flare ribbons have extensively been investigated. The degradation and disappearance of ribbons have rarely been explored. In this paper, we report our multiwavelength observations of a C5.5 circular-ribbon flare associated with two jets (jet1 and jet2) on 2014 August 24, focusing on the fast degradation of the outer circular ribbon (CR). The flare, consisting of a short inner ribbon (IR) and outer CR, was triggered by the eruption of a minifilament. The brightness of IR and outer CR reached their maxima simultaneously at $\sim$04:58 UT in all AIA wavelengths. Subsequently, the short eastern part of CR faded out quickly in 1600 Å but gradually in EUV wavelengths. The long western part of CR degraded in the counterclockwise direction and experienced a deceleration. The degradation was distinctly divided into two phases: phase I with faster apparent speeds (58$-$69 km s$^{-1}$) and phase II with slower apparent speeds (29$-$35 km s$^{-1}$). The second phase stopped at $\sim$05:10 UT when the western CR totally disappeared. Besides the outward propagation of jet1, the jet spire experienced untwisting motion in the counterclockwise direction during 04:55$-$05:00 UT. We conclude that the event can be explained by the breakout jet model. The coherent brightenings of the IR and CR at $\sim$04:58 UT may result from the impulsive interchange reconnection near the null point, whereas sub-Alfvénic slipping motion of the western CR in the counterclockwise direction indicates the occurrence of slipping magnetic reconnection. Another possible explanation of the quick disappearance of the hot loops connecting to the western CR is that they are simply reconnected sequentially without the need for significant slippage after the null point reconnection.
Submitted 14 April, 2020;
originally announced April 2020.
-
Improving Dysarthric Speech Intelligibility Using Cycle-consistent Adversarial Training
Authors:
Seung Hee Yang,
Minhwa Chung
Abstract:
Dysarthria is a motor speech impairment affecting millions of people. Dysarthric speech can be far less intelligible than that of non-dysarthric speakers, causing significant communication difficulties. The goal of our work is to develop a model for dysarthric to healthy speech conversion using a cycle-consistent GAN. Using 18,700 dysarthric and 8,610 healthy control Korean utterances that were recorded for the purpose of automatic recognition of voice keyboard in a previous study, the generator is trained to transform dysarthric to healthy speech in the spectral domain, which is then converted back to speech. Objective evaluation using automatic speech recognition of the generated utterances on a held-out test set shows that recognition performance improves over the original dysarthric speech after adversarial training, as the absolute WER has been lowered by 33.4%. This demonstrates that the proposed GAN-based conversion method is useful for improving dysarthric speech intelligibility.
Submitted 9 January, 2020;
originally announced January 2020.
-
A Scalable Chatbot Platform Leveraging Online Community Posts: A Proof-of-Concept Study
Authors:
Sihyeon Jo,
Seungryong Yoo,
Sangwon Im,
Seung Hee Yang,
Tong Zuo,
Hee-Eun Kim,
SangWook Han,
Seong-Woo Kim
Abstract:
The development of natural language processing algorithms and the explosive growth of conversational data are encouraging research on human-computer conversation. Still, obtaining qualified conversational data on a large scale is difficult and expensive. In this paper, we verify the feasibility of constructing a data-driven chatbot with processed online community posts by using them as pseudo-conversational data. We argue that chatbots for various purposes can be built extensively through a pipeline that exploits the common structure of community posts. Our experiment demonstrates that chatbots created with this pipeline can yield proper responses.
Submitted 9 January, 2020;
originally announced January 2020.
-
Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training
Authors:
Seung Hee Yang,
Minhwa Chung
Abstract:
Self-imitating feedback is an effective and learner-friendly method for non-native learners in Computer-Assisted Pronunciation Training. Acoustic characteristics in native utterances are extracted and transplanted onto the learner's own speech input, and given back to the learner as corrective feedback. Previous works focused on speech conversion using prosodic transplantation techniques based on the PSOLA algorithm. Motivated by the visual differences found in spectrograms of native and non-native speech, we investigated applying a GAN to generate self-imitating feedback by utilizing the generator's ability through adversarial training. Because this mapping is highly under-constrained, we also adopt a cycle consistency loss to encourage the output to preserve the global structure, which is shared by native and non-native utterances. Trained on 97,200 spectrogram images of short utterances produced by native and non-native speakers of Korean, the generator is able to successfully transform the non-native spectrogram input to a spectrogram with properties of self-imitating feedback. Furthermore, the transformed spectrogram shows segmental corrections that cannot be obtained by prosodic transplantation. A perceptual test comparing the self-imitating and correcting abilities of our method with the baseline PSOLA method shows that the generative approach with cycle consistency loss is promising.
Submitted 20 April, 2019;
originally announced April 2019.
-
Eruption of a multi-flux-rope system in solar active region 12673 leading to the two largest flares in Solar Cycle 24
Authors:
Y. J. Hou,
J. Zhang,
T. Li,
S. H. Yang,
X. H. Li
Abstract:
Solar active region (AR) 12673 in 2017 September produced the two largest flares in Solar Cycle 24: the X9.3 flare on September 06 and the X8.2 flare on September 10. We attempt to investigate the evolution of the two great flares and their associated complex magnetic system in detail. Aided by the NLFFF modeling, we identify a double-decker flux rope configuration above the polarity inversion line (PIL) in the AR core region. The north ends of these two flux ropes were rooted in a negative-polarity magnetic patch, which began to move along the PIL and rotate anticlockwise before the X9.3 flare on September 06. The strong shearing motion and rotation contributed to the destabilization of the two magnetic flux ropes, of which the upper one subsequently erupted upward due to the kink instability. Then another two sets of twisted loop bundles beside these ropes were disturbed and successively erupted within 5 minutes like a chain reaction. Similarly, multiple ejecta components were detected to consecutively erupt during the X8.2 flare occurring in the same AR on September 10. We examine the evolution of the AR magnetic fields from September 03 to 06 and find that five dipoles emerged successively at the east of the main sunspot. The interactions between these dipoles took place continuously, accompanied by magnetic flux cancellations and strong shearing motions. In AR 12673, significant flux emergence and successive interactions between the different emerging dipoles resulted in a complex magnetic system, accompanied by the formation of multiple flux ropes and twisted loop bundles. We propose that the eruptions of a multi-flux-rope system resulted in the two largest flares in Solar Cycle 24.
Submitted 21 October, 2018; v1 submitted 21 August, 2018;
originally announced August 2018.
-
A blowout jet associated with one obvious extreme-ultraviolet wave and one complicated coronal mass ejection event
Authors:
Y. H. Miao,
Y. Liu,
H. B. Li,
Y. D. Shen,
S. H. Yang,
A. Elmhamdi,
A. S. Kordi,
Z. Z. Abidin
Abstract:
In this paper, we present a detailed analysis of a coronal blowout jet eruption which was associated with an obvious extreme-ultraviolet (EUV) wave and one complicated coronal mass ejection (CME) event based on the multi-wavelength and multi-view-angle observations from the Solar Dynamics Observatory and the Solar Terrestrial Relations Observatory. It is found that the triggering of the blowout jet was due to the emergence and cancellation of magnetic fluxes on the photosphere. During the rising stage of the jet, the EUV wave appeared just ahead of the jet top, lasting about 4 minutes and at a speed of 458$-$762 km s$^{-1}$. In addition, obvious dark material is observed along the EUV jet body, which confirms the observation of a mini-filament eruption at the jet base in the chromosphere. Interestingly, two distinct but overlapping CME structures can be observed in the corona together with the eruption of the blowout jet. One is jet-shaped and narrow, while the other is bubble-shaped. The jet-shaped component was unambiguously related to the outwardly running jet itself, while the bubble-like one might either be produced due to the reconstruction of the high coronal fields or by the internal reconnection during the mini-filament ejection according to the double-CME blowout jet model first proposed by Shen et al. (2012b), suggesting more observational evidence should be supplied to clear the current ambiguity based on large samples of blowout jets in future studies.
Submitted 23 December, 2018; v1 submitted 29 March, 2017;
originally announced March 2017.
-
Light Walls Around Sunspots Observed by the Interface Region Imaging Spectrograph
Authors:
Y. J. Hou,
T. Li,
S. H. Yang,
J. Zhang
Abstract:
The Interface Region Imaging Spectrograph (IRIS) mission provides high-resolution observations of the chromosphere and transition region. We try to determine whether light walls exist elsewhere in active regions besides light bridges. Employing half a year of high tempo-spatial resolution data from IRIS, we find many light walls either around sunspots or above light bridges. For the first time, we report one light wall near an umbral-penumbral boundary and another along a neutral line between two small sunspots. These new observations reveal that these light walls are multi-layer and multi-thermal structures which occur along magnetic neutral lines in active regions.
Submitted 2 April, 2016;
originally announced April 2016.
-
Hybrid Generative/Discriminative Learning for Automatic Image Annotation
Authors:
Shuang Hong Yang,
Jiang Bian,
Hongyuan Zha
Abstract:
Automatic image annotation (AIA) raises tremendous challenges to machine learning as it requires modeling of data that are ambiguous in both input and output, e.g., images containing multiple objects and labeled with multiple semantic tags. Even more challenging is that the number of candidate tags is usually huge (as large as the vocabulary size) yet each image is only related to a few of them. This paper presents a hybrid generative-discriminative classifier to simultaneously address the extreme data-ambiguity and overfitting-vulnerability issues in tasks such as AIA. In particular: (1) an Exponential-Multinomial Mixture (EMM) model is established to capture both the input and output ambiguity while also encouraging prediction sparsity; and (2) the prediction ability of the EMM model is explicitly maximized through discriminative learning that integrates variational inference of graphical models and the pairwise formulation of ordinal regression. Experiments show that our approach achieves both superior annotation performance and better tag scalability.
Submitted 15 March, 2012;
originally announced March 2012.
-
Local Optimality of User Choices and Collaborative Competitive Filtering
Authors:
Shuang Hong Yang
Abstract:
While a user's preference is directly reflected in the interactive choice process between her and the recommender, this wealth of information was not fully exploited for learning recommender models. In particular, existing collaborative filtering (CF) approaches take into account only the binary events of user actions but totally disregard the contexts in which users' decisions are made. In this paper, we propose Collaborative Competitive Filtering (CCF), a framework for learning user preferences by modeling the choice process in recommender systems. CCF employs a multiplicative latent factor model to characterize the dyadic utility function. But unlike CF, CCF models the user behavior of choices by encoding a local competition effect. In this way, CCF allows us to leverage dyadic data that was previously lumped together with missing data in existing CF models. We present two formulations and an efficient large scale optimization algorithm. Experiments on three real-world recommendation data sets demonstrate that CCF significantly outperforms standard CF approaches in both offline and online evaluations.
Submitted 25 February, 2011; v1 submitted 4 October, 2010;
originally announced October 2010.
-
Response of the solar atmosphere to magnetic field evolution in a coronal hole region
Authors:
S. H. Yang,
J. Zhang,
C. L. Jin,
L. P. Li,
H. Y. Duan
Abstract:
Methods. We study an equatorial CH observed simultaneously by HINODE and STEREO on July 27, 2007. The HINODE/SP maps are adopted to derive the physical parameters of the photosphere and to investigate the magnetic field evolution and distribution. The G band and Ca II H images with high tempo-spatial resolution from HINODE/BFI and the multi-wavelength data from STEREO/EUVI are utilized to study the corresponding atmospheric response of different overlying layers. Results. We explore an emerging dipole located at the CH boundary. Mini-scale arch filaments (AFs) accompanying the emerging dipole were observed with the Ca II H line. During the separation of the dipolar footpoints, three AFs appeared and expanded in turn. The first AF divided into two segments in its late stage, while the second and third AFs erupted in their late stages. The lifetimes of these three AFs are 4, 6, and 10 minutes, and the two intervals between the three divisions or eruptions are 18 and 12 minutes, respectively. We display an example of mixed-polarity flux emergence of IN fields within the CH and present the corresponding chromospheric response. With the increase of the integrated magnetic flux, the brightness of the Ca II H images exhibits an increasing trend. We also study magnetic flux cancellations of NT fields located at the CH boundary and present the obvious chromospheric and coronal response. We notice that the brighter regions seen in the 171 Å images are related to the interacting magnetic elements. By examining the magnetic NT and IN elements and the response of different atmospheric layers, we obtain good positive linear correlations between the NT magnetic flux densities and the brightness of both G band (correlation coefficient 0.85) and Ca II H (correlation coefficient 0.58).
Submitted 17 April, 2009;
originally announced April 2009.