A Large-Scale Evaluation of Speech Foundation Models

Authors: Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-yi Lee

Abstract: The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol and the statistical significance and robustness of the benchmark.

Submitted 29 May, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: The extended journal version for SUPERB and SUPERB-SG. Published in IEEE/ACM TASLP. The arXiv version is preferred

arXiv:2404.05525 [pdf, other]

ALMA Spectroscopy of Europa: A Search for Active Plumes

Authors: M. A. Cordiner, A. E. Thelen, I. -L. Lai, W. -L. Tseng, C. A. Nixon, Y. -J. Kuan, G. L. Villanueva, L. Paganini, S. B. Charnley, K. D. Retherford

Abstract: The subsurface ocean of Europa is a high priority target in the search for extraterrestrial life, but direct investigations are hindered by the presence of a thick, exterior ice shell. Here we present spectral line and continuum maps of Europa obtained over four epochs in May-June 2021 using the Atacama Large Millimeter/submillimeter Array (ALMA), to search for molecular emission from atmospheric plumes, with the aim of investigating subsurface processes. Using a 3D physical model, we obtained upper limits for the plume abundances of HCN, H$_2$CO, SO$_2$ and CH$_3$OH. If active plume(s) were present, they contained very low abundances of these molecules. Assuming a total gas production rate of $10^{29}$ s$^{-1}$, our H$_2$CO abundance upper limit of $<0.016$\% is more than an order of magnitude less than measured in the Enceladus plume by the Cassini spacecraft, implying a possible chemical difference between the plume source materials for these two icy moons.

Submitted 8 April, 2024; originally announced April 2024.

Comments: Submitted to IAU Symposium 383 conference proceedings --- Astrochemistry VIII: From the First Galaxies to the Formation of Habitable Worlds

arXiv:2404.00025 [pdf, other]

doi 10.1145/3544549.3577064

Understanding Physical Breakdowns in Virtual Reality

Authors: Wen-Jie Tseng

Abstract: Virtual Reality (VR) moves away from well-controlled laboratory environments into public and personal spaces. As users are visually disconnected from the physical environment, interacting in an uncontrolled space frequently leads to collisions and raises safety concerns. In my thesis, I investigate this phenomenon which I define as the physical breakdown in VR. The goal is to understand the reasons for physical breakdowns, provide solutions, and explore future mechanisms that could perpetuate safety risks. First, I explored the reasons for physical breakdowns by investigating how people interact with the current VR safety mechanism (e.g., Oculus Guardian). Results show one reason for breaking out of the safety boundary is when interacting with large motions (e.g., swinging arms), the user does not have enough time to react although they see the safety boundary. I proposed a solution, FingerMapper, that maps small-scale finger motions onto virtual arms and hands to enable whole-body virtual arm motions in VR to avoid physical breakdowns. To demonstrate future safety risks, I explored the malicious use of perceptual manipulations (e.g., redirection techniques) in VR, which could deliberately create physical breakdowns without users noticing. Results indicate further open challenges about the cognitive process of how users comprehend their physical environment when they are blindfolded in VR.

Submitted 20 March, 2024; originally announced April 2024.

Comments: 5 pages, 4 figures, CHI EA '23, Doctoral Consortium

Journal ref: (CHI EA 2023) 1-5

arXiv:2403.17847 [pdf, other]

Climate Downscaling: A Deep-Learning Based Super-resolution Model of Precipitation Data with Attention Block and Skip Connections

Authors: Chia-Hao Chiang, Zheng-Han Huang, Liwen Liu, Hsin-Chien Liang, Yi-Chi Wang, Wan-Ling Tseng, Chao Wang, Che-Ta Chen, Ko-Chih Wang

Abstract: Human activities accelerate consumption of fossil fuels and produce greenhouse gases, resulting in urgent issues today: global warming and the climate change. These indirectly cause severe natural disasters, plenty of lives suffering and huge losses of agricultural properties. To mitigate impacts on our lands, scientists are developing renewable, reusable, and clean energies and climatologists are trying to predict the extremes. Meanwhile, governments are publicizing resource-saving policies for a more eco-friendly society and arousing environment awareness. One of the most influencing factors is the precipitation, bringing condensed water vapor onto lands. Water resources are the most significant but basic needs in society, not only supporting our livings, but also economics. In Taiwan, although the average annual precipitation is up to 2,500 millimeter (mm), the water allocation for each person is lower than the global average due to drastically geographical elevation changes and uneven distribution through the year. Thus, it is crucial to track and predict the rainfall to make the most use of it and to prevent the floods. However, climate models have limited resolution and require intensive computational power for local-scale use. Therefore, we proposed a deep convolutional neural network with skip connections, attention blocks, and auxiliary data concatenation, in order to downscale the low-resolution precipitation data into high-resolution one. Eventually, we compare with other climate downscaling methods and show better performance in metrics of Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Pearson Correlation, structural similarity index (SSIM), and forecast indicators.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.13970 [pdf]

Mass supply from Io to Jupiter's magnetosphere

Authors: L. Roth, A. Blöcker, K. de Kleer, D. Goldstein, E. Lellouch, J. Saur, C. Schmidt, D. F. Strobel, C. Tao, F. Tsuchiya, V. Dols, H. Huybrighs, A. Mura, J. R. Szalay, S. V. Badman, I. de Pater, A. -C. Dott, M. Kagitani, L. Klaiber, R. Koga, A. McEwen, Z. Milby, K. D. Retherford, S. Schlegel, N. Thomas, et al. (2 additional authors not shown)

Abstract: Since the Voyager mission flybys in 1979, we have known the moon Io to be extremely volcanically active as well as to be the main source of plasma in the vast magnetosphere of Jupiter. Material lost from Io forms neutral clouds, the Io plasma torus and ultimately the extended plasma sheet. This material is supplied from the upper atmosphere and atmospheric loss is likely driven by plasma-interaction effects with possible contributions from thermal escape and photochemistry-driven escape. Direct volcanic escape is negligible. The supply of material to maintain the plasma torus was estimated from various methods at roughly one ton per second. Most of the time the magnetospheric plasma environment of Io is stable on timescales from days to months. Similarly, Io's atmosphere was found to have a stable average density on the dayside, although it exhibits lateral, diurnal and seasonal variations. There is a potential positive feedback in the Io torus supply: collisions of torus plasma with atmospheric neutrals likely are a significant loss process, which increases with torus density. The stability of the torus environment might be maintained by limiting mechanisms of either torus supply from Io or the loss from the torus by centrifugal interchange in the middle magnetosphere. Various observations suggest that occasionally the plasma torus undergoes major transient changes over a period of several weeks, apparently overcoming possible stabilizing mechanisms. Such events (and more frequent minor changes) are commonly explained by some kind of change in volcanic activity that triggers a chain of reactions which modify the plasma torus state via a net increase in supply of new mass. However, it remains unknown what kind of volcanic event can trigger torus events, whether Io's atmosphere undergoes a change before or during such magnetospheric events, and what processes could enable such a change.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2312.01820 [pdf, other]

Electrically tunable flat bands with layer-resolved charge distribution in twisted monolayer-bilayer graphene

Authors: Wei-En Tseng, Mei-Yin Chou

Abstract: At a small twist angle, exotic electronic properties emerge in twisted monolayer-bilayer graphene (aAB), including electrically switchable magnetic order and correlated insulating states. These fascinating many-body phenomena manifest when the low-energy bands feature a narrow band width. In this study, we examine the electronic structure of aAB using first-principles calculations combined with an accurate tight-binding model. We find that the presence of an intrinsic polarization greatly modifies the low-energy bands of aAB. Furthermore, the low-energy bands reach a minimum width at a quasi-magic angle and feature a layer-dependent charge localization and delocalization pattern. In the presence of an electric field, an energy gap opens only if lattice relaxation is taken into account. The particle-hole asymmetry in aAB further leads to flatter conduction bands compared with the valence bands, with an electrically tunable band width and band gap, and a switchable sublattice-dependent charge localization and delocalization pattern.

Submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.18320 [pdf, other]

BN-embedded monolayer graphene with tunable electronic and topological properties

Authors: Chih-Piao Chuu, Wei-En Tseng, Kuan-Hung Liu, Ching-Ming Wei, Mei-Yin Chou

Abstract: Finding an effective and controllable way to create a sizable energy gap in graphene-based systems has been a challenging topic of intensive research. We propose that the hybrid of boron nitride and graphene (h-BNC) at low BN doping serves as an ideal platform for band-gap engineering and valleytronic applications. We report a systematic first-principles study of the atomic configurations and band gap opening for energetically favorable BN patches embedded in graphene. Based on first-principles calculations, we construct a tight-binding model to simulate general doping configurations in large supercells. Unexpectedly, the calculations find a linear dependence of the band gap on the effective BN concentration at low doping, arising from an induced effective on-site energy difference at the two C sublattices as they are substituted by B and N dopants alternately. The significant and tunable band gap of a few hundred meVs, with preserved topological properties of graphene and feasible sample preparation in the laboratory, presents great opportunities to realize valley physics applications in graphene systems at room temperature.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.17344 [pdf, other]

The composition of Saturn's rings

Authors: Kelly E. Miller, Gianrico Filacchione, Jeffrey Cuzzi, Philip D. Nicholson, Matthew M. Hedman, Kevin Baillie, Robert E. Johnson, Wei-Ling Tseng, Paul R. Estrada, J. Hunter Waite, Mauro Ciarniello, Cécile Ferrari, Zhimeng Zhang, Amanda Hendrix, Julianne I. Moses

Abstract: The origin and evolution of Saturn's rings is critical to understanding the Saturnian system as a whole. Here, we discuss the physical and chemical composition of the rings, as a foundation for evolutionary models described in subsequent chapters. We review the physical characteristics of the main rings, and summarize current constraints on their chemical composition. Radial trends are observed in temperature and to a limited extent in particle size distribution, with the C ring exhibiting higher temperatures and a larger population of small particles. The C ring also shows evidence for the greatest abundance of silicate material, perhaps indicative of formation from a rocky body. The C ring and Cassini Division have lower optical depths than the A and B rings, which contributes to the higher abundance of the exogenous neutral absorber in these regions. Overall, the main ring composition is strongly dominated by water ice, with minor silicate, UV absorber, and neutral absorber components. Sampling of the innermost D ring during Cassini's Grand Finale provides a new set of in situ constraints on the ring composition, and we explore ongoing work to understand the linkages between the main rings and the D ring. The D ring material is organic- and silicate-rich and water-poor relative to the main rings, with a large population of small grains. This composition may be explained in part by volatile losses in the D ring, and current constraints suggest some degree of fractionation rather than sampling of the bulk D ring material.

Submitted 28 November, 2023; originally announced November 2023.

Comments: Submitted to SSR for publication in the collection "New Vision of the Saturnian System in the Context of a Highly Dissipative Saturn"

arXiv:2311.15582 [pdf, other]

Lightly Weighted Automatic Audio Parameter Extraction for the Quality Assessment of Consensus Auditory-Perceptual Evaluation of Voice

Authors: Yi-Heng Lin, Wen-Hsuan Tseng, Li-Chin Chen, Ching-Ting Tan, Yu Tsao

Abstract: The Consensus Auditory-Perceptual Evaluation of Voice is a widely employed tool in clinical voice quality assessment that is significant for streaming communication among clinical professionals and benchmarking for the determination of further treatment. Currently, because the assessment relies on experienced clinicians, it tends to be inconsistent, and thus, difficult to standardize. To address this problem, we propose to leverage lightly weighted automatic audio parameter extraction, to increase the clinical relevance, reduce the complexity, and enhance the interpretability of voice quality assessment. The proposed method utilizes age, sex, and five audio parameters: jitter, absolute jitter, shimmer, harmonic-to-noise ratio (HNR), and zero crossing. A classical machine learning approach is employed. The result reveals that our approach performs similarly to state-of-the-art (SOTA) methods, and outperforms the latent representation obtained by using popular audio pre-trained models. This approach provides insights into the feasibility of different feature extraction approaches for voice evaluation. Audio parameters such as jitter and the HNR are proven to be suitable for characterizing voice quality attributes, such as roughness and strain. Conversely, pre-trained models exhibit limitations in effectively addressing noise-related scorings. This study contributes toward more comprehensive and precise voice quality evaluations, achieved by comprehensively exploring diverse assessment methodologies.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Published in IEEE 42nd International Conference on Consumer Electronics (ICCE 2024)

arXiv:2311.04237 [pdf, ps, other]

Online Learning Quantum States with the Logarithmic Loss via VB-FTRL

Authors: Wei-Fu Tseng, Kai-Chun Chen, Zi-Hong Xiao, Yen-Huan Li

Abstract: Online learning quantum states with the logarithmic loss (LL-OLQS) is a quantum generalization of online portfolio selection, a classic open problem in the field of online learning for over three decades. The problem also emerges in designing randomized optimization algorithms for maximum-likelihood quantum state tomography. Recently, Jezequel et al. (arXiv:2209.13932) proposed the VB-FTRL algorithm, the first nearly regret-optimal algorithm for OPS with moderate computational complexity. In this note, we generalize VB-FTRL for LL-OLQS. Let $d$ denote the dimension and $T$ the number of rounds. The generalized algorithm achieves a regret rate of $O ( d^2 \log ( d + T ) )$ for LL-OLQS. Each iteration of the algorithm consists of solving a semidefinite program that can be implemented in polynomial time by, e.g., cutting-plane methods. For comparison, the best-known regret rate for LL-OLQS is currently $O ( d^2 \log T )$, achieved by the exponential weight method. However, there is no explicit implementation available for the exponential weight method for LL-OLQS. To facilitate the generalization, we introduce the notion of VB-convexity. VB-convexity is a sufficient condition for the logarithmic barrier associated with any function to be convex and is of independent interest.

Submitted 6 November, 2023; originally announced November 2023.

Comments: 20 pages

arXiv:2309.07114 [pdf, other]

doi 10.3847/1538-3881/acedb0

Monitoring H$α$ Emission from the Wide-orbit Brown-dwarf Companion FU Tau B

Authors: Ya-Lin Wu, Yu-Chi Cheng, Li-Ching Huang, Brendan Bowler, Laird Close, Wei-Ling Tseng, Ning Chen, Da-Wei Chen

Abstract: Monitoring mass accretion onto substellar objects provides insights into the geometry of the accretion flows. We use the Lulin One-meter Telescope to monitor H$α$ emission from FU Tau B, a $\sim$19 $M_{\rm Jup}$ brown-dwarf companion at 5.7" (719 au) from the host star, for six consecutive nights. This is the longest continuous H$α$ monitoring for a substellar companion near the deuterium-burning limit. We aim to investigate if accretion near the planetary regime could be rotationally modulated as suggested by magnetospheric accretion models. We find tentative evidence that H$α$ mildly varies on hourly and daily timescales, though our sensitivity is not sufficient to definitively establish any rotational modulation. No burst-like events are detected, implying that accretion onto FU Tau B is overall stable during the time baseline and sampling windows over which it was observed. The primary star FU Tau A also exhibits H$α$ variations over timescales from minutes to days. This program highlights the potential of monitoring accretion onto substellar objects with small telescopes.

Submitted 13 September, 2023; originally announced September 2023.

Comments: Published in AJ

arXiv:2304.02394 [pdf, other]

doi 10.1145/3544548.3580988

Memory Manipulations in Extended Reality

Authors: Elise Bonnail, Eric Lecolinet, Wen-Jie Tseng, Samuel Huron, Mark Mcgill, Jan Gugenheimer

Abstract: Human memory has notable limitations (e.g., forgetting) which have necessitated a variety of memory aids (e.g., calendars). As we grow closer to mass adoption of everyday Extended Reality (XR), which is frequently leveraging perceptual limitations (e.g., redirected walking), it becomes pertinent to consider how XR could leverage memory limitations (forgetting, distorting, persistence) to induce memory manipulations. As memories highly impact our self-perception, social interactions, and behaviors, there is a pressing need to understand XR Memory Manipulations (XRMMs). We ran three speculative design workshops (n=12), with XR and memory researchers creating 48 XRMM scenarios. Through thematic analysis, we define XRMMs, present a framework of their core components and reveal three classes (at encoding, pre-retrieval, at retrieval). Each class differs in terms of technology (AR, VR) and impact on memory (influencing quality of memories, inducing forgetting, distorting memories). We raise ethical concerns and discuss opportunities of perceptual and memory manipulations in XR.

Submitted 5 April, 2023; originally announced April 2023.

Journal ref: In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23), Apr 2023, Hamburg, Germany

arXiv:2303.12861 [pdf, other]

Parallel Diffusion Model-based Sparse-view Cone-beam Breast CT

Authors: Wenjun Xia, Hsin Wu Tseng, Chuang Niu, Wenxiang Cong, Xiaohua Zhang, Shaohua Liu, Ruola Ning, Srinivasan Vedantham, Ge Wang

Abstract: Breast cancer is the most prevalent cancer among women worldwide, and early detection is crucial for reducing its mortality rate and improving quality of life. Dedicated breast computed tomography (CT) scanners offer better image quality than mammography and tomosynthesis in general but at higher radiation dose. To enable breast CT for cancer screening, the challenge is to minimize the radiation dose without compromising image quality, according to the ALARA principle (as low as reasonably achievable). Over the past years, deep learning has shown remarkable successes in various tasks, including low-dose CT especially few-view CT. Currently, the diffusion model presents the state of the art for CT reconstruction. To develop the first diffusion model-based breast CT reconstruction method, here we report innovations to address the large memory requirement for breast cone-beam CT reconstruction and high computational cost of the diffusion model. Specifically, in this study we transform the cutting-edge Denoising Diffusion Probabilistic Model (DDPM) into a parallel framework for sub-volume-based sparse-view breast CT image reconstruction in projection and image domains. This novel approach involves the concurrent training of two distinct DDPM models dedicated to processing projection and image data synergistically in the dual domains. Our experimental findings reveal that this method delivers competitive reconstruction performance at half to one-third of the standard radiation doses. This advancement demonstrates an exciting potential of diffusion-type models for volumetric breast reconstruction at high-resolution with much-reduced radiation dose and as such hopefully redefines breast cancer screening and diagnosis.

Submitted 28 January, 2024; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.12379 [pdf, other]

VMCML: Video and Music Matching via Cross-Modality Lifting

Authors: Yi-Shan Lee, Wei-Cheng Tseng, Fu-En Wang, Min Sun

Abstract: We propose a content-based system for matching video and background music. The system aims to address the challenges in music recommendation for new users or new music given short-form videos. To this end, we propose a cross-modal framework VMCML that finds a shared embedding space between video and music representations. To ensure the embedding space can be effectively shared by both representations, we leverage CosFace loss based on margin-based cosine similarity loss. Furthermore, we establish a large-scale dataset called MSVD, in which we provide 390 individual music and the corresponding matched 150,000 videos. We conduct extensive experiments on Youtube-8M and our MSVD datasets. Our quantitative and qualitative results demonstrate the effectiveness of our proposed framework and achieve state-of-the-art video and music matching performance.

Submitted 22 March, 2023; originally announced March 2023.

arXiv:2303.00733 [pdf, other]

SpeechPrompt v2: Prompt Tuning for Speech Classification Tasks

Authors: Kai-Wei Chang, Yu-Kai Wang, Hua Shen, Iu-thing Kang, Wei-Cheng Tseng, Shang-Wen Li, Hung-yi Lee

Abstract: Prompt tuning is a technology that tunes a small set of parameters to steer a pre-trained language model (LM) to directly generate the output for downstream tasks. Recently, prompt tuning has demonstrated its storage and computation efficiency in both natural language processing (NLP) and speech processing fields. These advantages have also revealed prompt tuning as a candidate approach to serving pre-trained LM for multiple tasks in a unified manner. For speech processing, SpeechPrompt shows its high parameter efficiency and competitive performance on a few speech classification tasks. However, whether SpeechPrompt is capable of serving a large number of tasks is unanswered. In this work, we propose SpeechPrompt v2, a prompt tuning framework capable of performing a wide variety of speech classification tasks, covering multiple languages and prosody-related tasks. The experiment result shows that SpeechPrompt v2 achieves performance on par with prior works with less than 0.15M trainable parameters in a unified framework.

Submitted 1 March, 2023; originally announced March 2023.

Comments: Project website: https://ga642381.github.io/SpeechPrompt

arXiv:2302.12757 [pdf, other]

Ensemble knowledge distillation of self-supervised speech models

Authors: Kuan-Po Huang, Tzu-hsun Feng, Yu-Kuan Fu, Tsu-Yuan Hsu, Po-Chieh Yen, Wei-Cheng Tseng, Kai-Wei Chang, Hung-yi Lee

Abstract: Distilled self-supervised models have shown competitive performance and efficiency in recent years. However, there is a lack of experience in jointly distilling multiple self-supervised speech models. In our work, we performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM. We tried two different aggregation techniques, layerwise-average and layerwise-concatenation, to the representations of different teacher models and found that the former was more effective. On top of that, we proposed a multiple prediction head method for student models to predict different layer outputs of multiple teacher models simultaneously. The experimental results show that our method improves the performance of the distilled models on four downstream speech processing tasks, Phoneme Recognition, Speaker Identification, Emotion Recognition, and Automatic Speech Recognition in the hidden-set track of the SUPERB benchmark.

Submitted 24 February, 2023; originally announced February 2023.

Comments: Accepted by ICASSP 2023

arXiv:2302.11865 [pdf, other]

doi 10.1145/3544548.3580736

FingerMapper: Mapping Finger Motions onto Virtual Arms to Enable Safe Virtual Reality Interaction in Confined Spaces

Authors: Wen-Jie Tseng, Samuel Huron, Eric Lecolinet, Jan Gugenheimer

Abstract: Whole-body movements enhance the presence and enjoyment of Virtual Reality (VR) experiences. However, using large gestures is often uncomfortable and impossible in confined spaces (e.g., public transport). We introduce FingerMapper, mapping small-scale finger motions onto virtual arms and hands to enable whole-body virtual movements in VR. In a first target selection study (n=13) comparing FingerMapper to hand tracking and ray-casting, we found that FingerMapper can significantly reduce physical motions and fatigue while having a similar degree of precision. In a consecutive study (n=13), we compared FingerMapper to hand tracking inside a confined space (the front passenger seat of a car). The results showed participants had significantly higher perceived safety and fewer collisions with FingerMapper while preserving a similar degree of presence and enjoyment as hand tracking. Finally, we present three example applications demonstrating how FingerMapper could be applied for locomotion and interaction for VR in confined spaces.

Submitted 23 February, 2023; originally announced February 2023.

Comments: 14 pages, 15 figures

arXiv:2204.07052 [pdf, other]

CroCo: Cross-Modal Contrastive learning for localization of Earth Observation data

Authors: Wei-Hsin Tseng, Hoàng-Ân Lê, Alexandre Boulch, Sébastien Lefèvre, Dirk Tiede

Abstract: It is of interest to localize a ground-based LiDAR point cloud on remote sensing imagery. In this work, we tackle a subtask of this problem, i.e. to map a digital elevation model (DEM) rasterized from aerial LiDAR point cloud on the aerial imagery. We proposed a contrastive learning-based method that trains on DEM and high-resolution optical imagery and experiment with the framework on different data sampling strategies and hyperparameters. In the best scenario, the Top-1 score of 0.71 and Top-5 score of 0.81 are obtained. The proposed method is promising for feature learning from RGB and DEM for localization and is potentially applicable to other data sources too. Source code will be released at https://github.com/wtseng530/AVLocalization.

Submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted for publication in the ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences (online from July 2022)

arXiv:2204.03219 [pdf, other]

DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores

Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

Abstract: Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, it would be desirable if there are accurate MOS prediction models for automatic evaluation. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech. And a proposed module is added to model the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on BVCC dataset. And the zero shot transfer result on BC2019 dataset is significantly improved. DDOS also wins second place in Interspeech 2022 VoiceMOS challenge in terms of system-level score.

Submitted 15 August, 2022; v1 submitted 7 April, 2022; originally announced April 2022.

Comments: Accepted to Interspeech 2022. Code will be available in the future

arXiv:2203.16773 [pdf, other]

SpeechPrompt: An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks

Authors: Kai-Wei Chang, Wei-Cheng Tseng, Shang-Wen Li, Hung-yi Lee

Abstract: Speech representations learned from Self-supervised learning (SSL) models can benefit various speech processing tasks. However, utilizing SSL representations usually requires fine-tuning the pre-trained models or designing task-specific downstream models and loss functions, causing much memory usage and human labor. Recently, prompting in Natural Language Processing (NLP) has been found to be an efficient technique to leverage pre-trained language models (LMs). Specifically, prompt tuning optimizes a limited number of task-specific parameters with a fixed pre-trained model; as a result, only a small set of parameters is needed to be stored for each task. Prompt tuning improves computation and memory efficiency by leveraging the pre-trained LM's prediction ability. Nevertheless, such a paradigm is little studied in the speech community. We report in this paper the first exploration of the prompt tuning paradigm for speech processing tasks based on Generative Spoken Language Model (GSLM). Experiment results show that the prompt tuning technique achieves competitive performance in speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models. We further study the technique in challenging sequence generation tasks. Prompt tuning also demonstrates its potential, while the limitation and possible research directions are discussed in this paper. The source code is available on https://github.com/ga642381/SpeechPrompt.

Submitted 10 July, 2022; v1 submitted 30 March, 2022; originally announced March 2022.

Comments: Accepted to be published in the Proceedings of Interspeech 2022

arXiv:2203.10168 [pdf, other]

Boreas: A Multi-Season Autonomous Driving Dataset

Authors: Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y. K. Leung, Angela P. Schoellig, Timothy D. Barfoot

Abstract: The Boreas dataset was collected by driving a repeated route over the course of one year, resulting in stark seasonal variations and adverse weather conditions such as rain and falling snow. In total, the Boreas dataset includes over 350km of driving data featuring a 128-channel Velodyne Alpha Prime lidar, a 360$^\circ$ Navtech CIR304-H scanning radar, a 5MP FLIR Blackfly S camera, and centimetre-accurate post-processed ground truth poses. Our dataset will support live leaderboards for odometry, metric localization, and 3D object detection. The dataset and development kit are available at https://www.boreas.utias.utoronto.ca

Submitted 26 January, 2023; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: Accepted in IJRR as a data paper

arXiv:2202.13200 [pdf, other]

doi 10.1145/3491102.3517728

The Dark Side of Perceptual Manipulations in Virtual Reality

Authors: Wen-Jie Tseng, Elise Bonnail, Mark McGill, Mohamed Khamis, Eric Lecolinet, Samuel Huron, Jan Gugenheimer

Abstract: "Virtual-Physical Perceptual Manipulations" (VPPMs) such as redirected walking and haptics expand the user's capacity to interact with Virtual Reality (VR) beyond what would ordinarily physically be possible. VPPMs leverage knowledge of the limits of human perception to effect changes in the user's physical movements, becoming able to (perceptibly and imperceptibly) nudge their physical actions to enhance interactivity in VR. We explore the risks posed by the malicious use of VPPMs. First, we define, conceptualize and demonstrate the existence of VPPMs. Next, using speculative design workshops, we explore and characterize the threats/risks posed, proposing mitigations and preventative recommendations against the malicious use of VPPMs. Finally, we implement two sample applications to demonstrate how existing VPPMs could be trivially subverted to create the potential for physical harm. This paper aims to raise awareness that the current way we apply and publish VPPMs can lead to malicious exploits of our perceptual vulnerabilities.

Submitted 26 February, 2022; originally announced February 2022.

Comments: 15 pages, 7 figures

arXiv:2202.11849 [pdf, other]

doi 10.3847/1538-4357/ac5893

A SUBLIME 3D Model for Cometary Coma Emission: the Hypervolatile-Rich Comet C/2016 R2 (PanSTARRS)

Authors: M. A. Cordiner, I. M. Coulson, E. Garcia-Berrios, C. Qi, F. Lique, M. Zoltowski, M. de Val-Borro, Y. -J. Kuan, W. -H. Ip, S. Mairs, N. X. Roth, S. B. Charnley, S. N. Milam, W. -L Tseng, Y. -L Chuang

Abstract: The coma of comet C/2016 R2 (PanSTARRS) is one of the most chemically peculiar ever observed, in particular due to its extremely high CO/H2O and N2+/H2O ratios, and unusual trace volatile abundances. However, the complex shape of its CO emission lines, as well as uncertainties in the coma structure and excitation, has led to ambiguities in the total CO production rate. We performed high resolution, spatially, spectrally and temporally resolved CO observations using the James Clerk Maxwell Telescope (JCMT) and Submillimeter Array (SMA) to elucidate the outgassing behaviour of C/2016 R2. Results are analyzed using a new, time-dependent, three dimensional radiative transfer code (SUBLIME), incorporating for the first time, accurate state-to-state collisional rate coefficients for the CO--CO system. The total CO production rate was found to be in the range $(3.8-7.6)\times10^{28}$ s$^{-1}$ between 2018-01-13 and 2018-02-01, with a mean value of $(5.3\pm0.6)\times10^{28}$ s$^{-1}$ at $r_H$ = 2.8-2.9 au. The emission is concentrated in a near-sunward jet, with an outflow velocity $0.51\pm0.01$ km/s, compared to $0.25\pm0.01$ km/s in the ambient (and night-side) coma. Evidence was also found for an extended source of CO emission, possibly due to icy grain sublimation around $1.2\times10^5$ km from the nucleus.
Based on the coma molecular abundances, we propose that the nucleus ices of C/2016 R2 can be divided into a rapidly sublimating apolar phase, rich in CO, CO2, N2 and CH3OH, and a predominantly frozen (or less abundant), polar phase containing more H2O, CH4, H2CO and HCN.

Submitted 23 February, 2022; originally announced February 2022.

Comments: Accepted for publication in ApJ

arXiv:2202.00181 [pdf, other]

CLA-NeRF: Category-Level Articulated Neural Radiance Field

Authors: Wei-Cheng Tseng, Hung-Ju Liao, Lin Yen-Chen, Min Sun

Abstract: We propose CLA-NeRF -- a Category-Level Articulated Neural Radiance Field that can perform view synthesis, part segmentation, and articulated pose estimation. CLA-NeRF is trained at the object category level using no CAD models and no depth, but a set of RGB images with ground truth camera poses and part segments. During inference, it only takes a few RGB views (i.e., few-shot) of an unseen 3D object instance within the known category to infer the object part segmentation and the neural radiance field. Given an articulated pose as input, CLA-NeRF can perform articulation-aware volume rendering to generate the corresponding RGB image at any camera pose. Moreover, the articulated pose of an object can be estimated via inverse rendering. In our experiments, we evaluate the framework across five categories on both synthetic and real-world data. In all cases, our method shows realistic deformation results and accurate articulated pose estimation. We believe that both few-shot articulated object rendering and articulated pose estimation open doors for robots to perceive and interact with unseen articulated objects.

Submitted 3 March, 2022; v1 submitted 31 January, 2022; originally announced February 2022.

Comments: accepted by ICRA 2022

arXiv:2112.07222 [pdf, other]

Meta-CPR: Generalize to Unseen Large Number of Agents with Communication Pattern Recognition Module

Authors: Wei-Cheng Tseng, Wei Wei, Da-Cheng Juan, Min Sun

Abstract: Designing an effective communication mechanism among agents in reinforcement learning has been a challenging task, especially for real-world applications. The number of agents can grow or an environment sometimes needs to interact with a changing number of agents in real-world scenarios. To this end, a multi-agent framework needs to handle various scenarios of agents, in terms of both scales and dynamics, for being practical to real-world applications. We formulate the multi-agent environment with a different number of agents as a multi-tasking problem and propose a meta reinforcement learning (meta-RL) framework to tackle this problem. The proposed framework employs a meta-learned Communication Pattern Recognition (CPR) module to identify communication behavior and extract information that facilitates the training process. Experimental results are poised to demonstrate that the proposed framework (a) generalizes to an unseen larger number of agents and (b) allows the number of agents to change between episodes. The ablation study is also provided to reason the proposed CPR design and show such design is effective.

Submitted 31 January, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

arXiv:2112.00344 [pdf, other]

Leveraging Sequence Embedding and Convolutional Neural Network for Protein Function Prediction

Authors: Wei-Cheng Tseng, Po-Han Chi, Jia-Hua Wu, Min Sun

Abstract: The capability of accurate prediction of protein functions and properties is essential in the biotechnology industry, e.g. drug development and artificial protein synthesis, etc. The main challenges of protein function prediction are the large label space and the lack of labeled training data. Our method leverages unsupervised sequence embedding and the success of deep convolutional neural network to overcome these challenges. In contrast, most of the existing methods delete the rare protein functions to reduce the label space. Furthermore, some existing methods require additional bio-information (e.g., the 3-dimensional structure of the proteins) which is difficult to be determined in biochemical experiments. Our proposed method significantly outperforms the other methods on the publicly available benchmark using only protein sequences as input. This allows the process of identifying protein functions to be sped up.

Submitted 1 December, 2021; originally announced December 2021.

Comments: Published in NeurIPS 2018 Machine Learning for Molecules and Materials Workshop

arXiv:2111.13532 [pdf]

The 3D Direct Simulation Monte Carlo Study of Europa Gas Plume

Authors: Wei-Ling Tseng, Ian-Lin Lai, Wing-Huen Ip, Hsiang-Wen Hsu, Jong-Shinn Wu

Abstract: Europa has been spotted to have water outgassing activities by the space and ground-based telescopes as well as reanalysis of the Galileo data (Roth et al. 2014; Sparks et al. 2016, 2017; Paganini et al. 2020; Jia et al. 2018; Arnold et al. 2019). However, these observations only provided limited information about plume dynamics, which is critical in understanding the eruption mechanism and preparation of future exploration. We adopt a 3D DSMC model to investigate the plume characteristics of Europa assuming supersonic expansion originated from the undersurface vent. The main goal is to understand the physical processes and structures of Europa water vapor plumes, which can play a key role on probing its undersurface vent condition and outgassing mechanism. With a parametric study of the total gas production rate and initial gas bulk velocity, the gas number density, temperature and velocity information of the outgassing plumes from the various case studies are derived. Our results show that the plume gases experience acceleration through mutual collisions and adiabatic cooling when exiting and expanding from the surface. The central part of the plume with the relatively large gas production rates (of $10^{29}$ and $10^{30}$ H2O s$^{-1}$) is found to sustain thermal equilibrium and nearly continuum condition. Column density maps integrated along two different viewing angles are presented to demonstrate the importance of the projection effect on remote sensing diagnostics.
Finally, the density profiles at different altitudes are provided to prepare for observations of Europa plumes including the upcoming spacecraft missions such as JUICE and Europa Clipper.

Submitted 29 March, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: This paper has been submitted to Universe in Feb 2022, and it is during minor revision

arXiv:2111.05113 [pdf, other]

Membership Inference Attacks Against Self-supervised Speech Models

Authors: Wei-Cheng Tseng, Wei-Tsung Kao, Hung-yi Lee

Abstract: Recently, adapting the idea of self-supervised learning (SSL) on continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experiment results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage with high Area Under the Curve (AUC) in both utterance-level and speaker-level. Furthermore, we also conduct several ablation studies to understand the factors that contribute to the success of MIA.

Submitted 15 August, 2022; v1 submitted 9 November, 2021; originally announced November 2021.

Comments: Accepted to Interspeech 2022. Code will be available in the future

arXiv:2105.12182 [pdf, other]

Self-Calibration of the Offset Between GPS and Semantic Map Frames for Robust Localization

Authors: Wei-Kang Tseng, Angela P. Schoellig, Timothy D. Barfoot

Abstract: In self-driving, standalone GPS is generally considered to have insufficient positioning accuracy to stay in lane. Instead, many turn to LIDAR localization, but this comes at the expense of building LIDAR maps that can be costly to maintain. Another possibility is to use semantic cues such as lane lines and traffic lights to achieve localization, but these are usually not continuously visible. This issue can be remedied by combining semantic cues with GPS to fill in the gaps. However, due to elapsed time between mapping and localization, the live GPS frame can be offset from the semantic map frame, requiring calibration. In this paper, we propose a robust semantic localization algorithm that self-calibrates for the offset between the live GPS and semantic map frames by exploiting common semantic cues, including traffic lights and lane markings. We formulate the problem using a modified Iterated Extended Kalman Filter, which incorporates GPS and camera images for semantic cue detection via Convolutional Neural Networks. Experimental results show that our proposed algorithm achieves decimetre-level accuracy comparable to typical LIDAR localization performance and is robust against sparse semantic features and frequent GPS dropouts.

Submitted 30 June, 2021; v1 submitted 25 May, 2021; originally announced May 2021.

Comments: Accepted for publication in CRV 2021; corrected reference 4

arXiv:2105.01051 [pdf, ps, other]

SUPERB: Speech processing Universal PERformance Benchmark

Authors: Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-yi Lee

Abstract: Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge this gap, we introduce Speech processing Universal PERformance Benchmark (SUPERB). SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data. Among multiple usages of the shared model, we especially focus on extracting the representation learned from SSL due to its preferable re-usability. We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight prediction heads on top of the frozen shared model. Our results demonstrate that the framework is promising as SSL representations show competitive generalizability and accessibility across SUPERB tasks. We release SUPERB as a challenge with a leaderboard and a benchmark toolkit to fuel the research in representation learning and general speech processing.

Submitted 15 October, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

Comments: To appear in Interspeech 2021

arXiv:2104.03017 [pdf, other]

Utilizing Self-supervised Representations for MOS Prediction

Authors: Wei-Cheng Tseng, Chien-yu Huang, Wei-Tsung Kao, Yist Y. Lin, Hung-yi Lee

Abstract: Speech quality assessment has been a critical issue in speech processing for decades. Existing automatic evaluations usually require clean references or parallel ground truth data, which is infeasible when the amount of data soars. Subjective tests, on the other hand, do not need any additional clean or parallel data and correlates better to human perception. However, such a test is expensive and time-consuming because crowd work is necessary. It thus becomes highly desired to develop an automatic evaluation approach that correlates well with human perception while not requiring ground truth data. In this paper, we use self-supervised pre-trained models for MOS prediction. We show their representations can distinguish between clean and noisy audios. Then, we fine-tune these pre-trained models followed by simple linear layers in an end-to-end manner. The experiment results showed that our framework outperforms the two previous state-of-the-art models by a significant improvement on Voice Conversion Challenge 2018 and achieves comparable or superior performance on Voice Conversion Challenge 2016. We also conducted an ablation study to further investigate how each module benefits the task. The experiment results are implemented and reproducible with publicly available toolkits.

Submitted 20 September, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

Comments: In Proceedings of Interspeech 2021. We acknowledge the support of AWS Machine Learning Research Awards program. Source code available at https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/mos_prediction

arXiv:2103.02957 [pdf, other]

Toward Robust Long Range Policy Transfer

Authors: Wei-Cheng Tseng, Jin-Siang Lin, Yao-Min Feng, Min Sun

Abstract: Humans can master a new task within a few trials by drawing upon skills acquired through prior experience. To mimic this capability, hierarchical models combining primitive policies learned from prior tasks have been proposed. However, these methods fall short compared to the human's range of transferability. We propose a method, which leverages the hierarchical structure to train the combination function and adapt the set of diverse primitive policies alternatively, to efficiently produce a range of complex behaviors on challenging new tasks. We also design two regularization terms to improve the diversity and utilization rate of the primitives in the pre-training phase. We demonstrate that our method outperforms other recent policy transfer methods by combining and adapting these reusable primitives in tasks with continuous action space. The experiment results further show that our approach provides a broader transferring range. The ablation study also shows the regularization terms are critical for long range policy transfer. Finally, we show that our method consistently outperforms other methods when the quality of the primitives varies.

Submitted 4 March, 2021; originally announced March 2021.

Comments: Accepted by AAAI 2021

arXiv:2011.02882 [pdf]

Query Expansion System for the VoxCeleb Speaker Recognition Challenge 2020

Authors: Yu-Sen Cheng, Chun-Liang Shih, Tien-Hong Lo, Wen-Ting Tseng, Berlin Chen

Abstract: In this report, we describe our submission to the VoxCeleb Speaker Recognition Challenge (VoxSRC) 2020. Two approaches are adopted. One is to apply query expansion on speaker verification, which shows significant progress compared to the baseline in the study. The other is to use Kaldi to extract x-vectors and to combine their Probabilistic Linear Discriminant Analysis (PLDA) score with the ResNet score.

Submitted 4 November, 2020; originally announced November 2020.

arXiv:2010.14049 [pdf]

Effective FAQ Retrieval and Question Matching With Unsupervised Knowledge Injection

Authors: Wen-Ting Tseng, Tien-Hong Lo, Yung-Chang Hsu, Berlin Chen

Abstract: Frequently asked question (FAQ) retrieval, with the purpose of providing information on frequent questions or concerns, has far-reaching applications in many areas, where a collection of question-answer (Q-A) pairs compiled a priori can be employed to retrieve an appropriate answer in response to a user's query that is likely to reoccur frequently. To this end, predominant approaches to FAQ retrieval typically rank question-answer pairs by considering either the similarity between the query and a question (q-Q), the relevance between the query and the associated answer of a question (q-A), or combining the clues gathered from the q-Q similarity measure and the q-A relevance measure. In this paper, we extend this line of research by combining the clues gathered from the q-Q similarity measure and the q-A relevance measure and meanwhile injecting extra word interaction information, distilled from a generic (open domain) knowledge base, into a contextual language model for inferring the q-A relevance. Furthermore, we also explore capitalizing on domain-specific topically-relevant relations between words in an unsupervised manner, acting as a surrogate to the supervised domain-specific knowledge base information. As such, it enables the model to equip sentence representations with the knowledge about domain-specific and topically-relevant relations among words, thereby providing a better q-A relevance measure.
We evaluate variants of our approach on a publicly-available Chinese FAQ dataset, and further apply and contextualize it to a large-scale question-matching task, which aims to search questions from a QA dataset that have a similar intent to an input query. Extensive experimental results on these two datasets confirm the promising performance of the proposed approach in relation to some state-of-the-art ones.

Submitted 27 October, 2020; originally announced October 2020.

arXiv:2007.15767 [pdf, other]

The Saturn Ring Skimmer Mission Concept: The next step to explore Saturn's rings, atmosphere, interior, and inner magnetosphere

Authors: Matthew S. Tiscareno, Mar Vaquero, Matthew M. Hedman, Hao Cao, Paul R. Estrada, Andrew P. Ingersoll, Kelly E. Miller, Marzia Parisi, David H. Atkinson, Shawn M. Brooks, Jeffrey N. Cuzzi, James Fuller, Amanda R. Hendrix, Robert E. Johnson, Tommi Koskinen, William S. Kurth, Jonathan I. Lunine, Philip D. Nicholson, Carol S. Paty, Rebecca Schindhelm, Mark R. Showalter, Linda J. Spilker, Nathan J. Strange, Wendy Tseng

Abstract: The innovative Saturn Ring Skimmer mission concept enables a wide range of investigations that address fundamental questions about Saturn and its rings, as well as giant planets and astrophysical disk systems in general. This mission would provide new insights into the dynamical processes that operate in astrophysical disk systems by observing individual particles in Saturn's rings for the first time. The Ring Skimmer would also constrain the origin, history, and fate of Saturn's rings by determining their compositional evolution and material transport rates. In addition, the Ring Skimmer would reveal how the rings, magnetosphere, and planet operate as an inter-connected system by making direct measurements of the ring's atmosphere, Saturn's inner magnetosphere and the material flowing from the rings into the planet. At the same time, this mission would clarify the dynamical processes operating in the planet's visible atmosphere and deep interior by making extensive high-resolution observations of cloud features and repeated measurements of the planet's extremely dynamic gravitational field. Given the scientific potential of this basic mission concept, we advocate that it be studied in depth as a potential option for the New Frontiers program.

Submitted 16 September, 2020; v1 submitted 30 July, 2020; originally announced July 2020.

Comments: White paper submitted to the Planetary Science and Astrobiology Decadal Survey (submission #420)

arXiv:2007.11784 [pdf]

Deep Learning Based Segmentation of Various Brain Lesions for Radiosurgery

Authors: Siang-Ruei Wu, Hao-Yun Chang, Florence T Su, Heng-Chun Liao, Wanju Tseng, Chun-Chih Liao, Feipei Lai, Feng-Ming Hsu, Furen Xiao

Abstract: Semantic segmentation of medical images with deep learning models is developing rapidly. In this study, we benchmarked state-of-the-art deep learning segmentation algorithms on our clinical stereotactic radiosurgery dataset, demonstrating the strengths and weaknesses of these algorithms in a fairly practical scenario. In particular, we compared the model performances with respect to their sampling method, model architecture, and the choice of loss functions, identifying the suitable settings for their applications and shedding light on the possible improvements.

Submitted 22 July, 2020; originally announced July 2020.

arXiv:2005.05007 [pdf]

doi 10.1016/j.carbon.2019.09.052

Direct growth of mm-size twisted bilayer graphene by plasma-enhanced chemical vapor deposition

Authors: Yen-Chun Chen, Wei-Hsiang Lin, Wei-Shiuan Tseng, Chien-Chang Chen, George R. Rossman, Chii-Dong Chen, Yu-Shu Wu, Nai-Chang Yeh

Abstract: Plasma enhanced chemical vapor deposition (PECVD) techniques have been shown to be an efficient method to achieve single-step synthesis of high-quality monolayer graphene (MLG) without the need of active heating. Here we report PECVD-growth of single-crystalline hexagonal bilayer graphene (BLG) flakes and mm-size BLG films with the interlayer twist angle controlled by the growth parameters. The twist angle has been determined by three experimental approaches, including direct measurement of the relative orientation of crystalline edges between two stacked monolayers by scanning electron microscopy, analysis of the twist angle-dependent Raman spectral characteristics, and measurement of the Moiré period with scanning tunneling microscopy. In mm-sized twisted BLG (tBLG) films, the average twist angle can be controlled from 0° to approximately 20°, and the angular spread for a given growth condition can be limited to < 7°. Different work functions between MLG and BLG have been verified by the Kelvin probe force microscopy and ultraviolet photoelectron spectroscopy. Electrical measurements of back-gated field-effect-transistor devices based on small-angle tBLG samples revealed high-quality electric characteristics at 300 K and insulating temperature dependence down to 100 K. This controlled PECVD-growth of tBLG thus provides an efficient approach to investigate the effect of varying Moiré potentials on tBLG.

Submitted 11 May, 2020; originally announced May 2020.

Comments: Manuscript (39 pages, 10 figures) and Supplementary Information (11 pages, 6 figures). Published in Carbon

Journal ref: Carbon 156, 212-224 (2020)

arXiv:1807.00424 [pdf, other]

doi 10.1103/PhysRevA.98.032514

Oscillating magnetic field effects in high precision metrology

Authors: H. C. J. Gan, G. Maslennikov, K. W. Tseng, T. R. Tan, R. Kaewuam, K. J. Arnold, D. Matsukevich, M. D. Barrett

Abstract: We examine a range of effects arising from ac magnetic fields in high precision metrology. These results are directly relevant to high precision measurements, and accuracy assessments for state-of-the-art optical clocks. Strategies to characterize these effects are discussed and a simple technique to accurately determine trap-induced ac magnetic fields in a linear Paul trap is demonstrated using $^{171}\mathrm{Yb}^+$.

Submitted 7 July, 2018; v1 submitted 1 July, 2018; originally announced July 2018.

Comments: 10 pages, 6 figures

Journal ref: Phys. Rev. A 98, 032514 (2018)

arXiv:1801.07411 [pdf]

Comparison Training for Computer Chinese Chess

Authors: Wen-Jie Tseng, Jr-Chang Chen, I-Chen Wu, Tinghan Wei

Abstract: This paper describes the application of comparison training (CT) for automatic feature weight tuning, with the final objective of improving the evaluation functions used in Chinese chess programs. First, we propose an n-tuple network to extract features, since n-tuple networks require very little expert knowledge through their large numbers of features, while simultaneously allowing easy access. Second, we propose a novel evaluation method that incorporates tapered eval into CT. Experiments show that with the same features and the same Chinese chess program, the automatically tuned comparison training feature weights achieved a win rate of 86.58% against the weights that were hand-tuned. The above trained version was then improved by adding additional features, most importantly n-tuple features. This improved version achieved a win rate of 81.65% against the trained version without additional features.

Submitted 23 January, 2018; originally announced January 2018.

Comments: Submitted to IEEE Transactions on Games

arXiv:1705.07970 [pdf]

Atomic-scale Structural and Chemical Characterization of Hexagonal Boron Nitride Layers Synthesized at the Wafer-Scale with Monolayer Thickness Control

Authors: Wei-Hsiang Lin, Victor W. Brar, Deep Jariwala, Michelle C. Sherrott, Wei-Shiuan Tseng, Chih-I Wu, Nai-Chang Yeh, Harry A. Atwater

Abstract: Hexagonal boron nitride (h-BN) is a promising two-dimensional insulator with a large band gap and low density of charged impurities that is isostructural and isoelectronic with graphene. Here we report the chemical and atomic-scale structure of CVD-grown wafer-scale (~25 cm²) h-BN sheets ranging in thickness from 1-20 monolayers. Atomic-scale images of h-BN on Au and graphene/Au substrates obtained by scanning tunneling microscopy (STM) reveal high h-BN crystalline quality in monolayer samples. Further characterization of 1-20 monolayer samples indicates uniform thickness for wafer-scale areas; this thickness control is a result of precise control of the precursor flow rate, deposition temperature and pressure. Raman and infrared spectroscopy indicate the presence of B-N bonds and reveal a linear dependence of thickness with growth time. X-ray photoelectron spectroscopy (XPS) shows the film stoichiometry, and the B/N atom ratio in our films is 1 ± 0.6% across the range of thicknesses. Electrical current transport in metal/insulator/metal (Au/h-BN/Au) heterostructures indicates that our CVD-grown h-BN films can act as excellent tunnel barriers with a high hard-breakdown field strength. Our results suggest that large-area h-BN films are structurally, chemically and electronically uniform over the wafer scale, opening the door to pervasive application as a dielectric in layered nanoelectronic and nanophotonic heterostructures.

Submitted 22 May, 2017; originally announced May 2017.

Comments: 26 pages, 5 figures

arXiv:1611.02621 [pdf]

doi 10.3847/2041-8213/834/1/L6

Nanograin densities outside Saturn's A-ring

Authors: Robert E Johnson, Wei-Lin Tseng, Meredith K Elrod, Ann M Persoon

Abstract: The observed disparity between the radial dependence of the ion and electron densities measured by the Cassini plasma and radio science instruments is used to show that the region between the outer edge of Saturn's main rings and its tenuous G-ring is permeated with small charged grains (nanograins). These grains emanate from the edge of the A-ring and from the tenuous F-ring and G-ring. This is a region of Saturn's magnetosphere that is relatively unexplored, but will be a focus of Cassini's F-ring orbits prior to the end of mission in September 2017. Confirmation of the grain densities predicted here will enhance our ability to describe the formation and destruction of material in this important region of Saturn's magnetosphere.

Submitted 8 November, 2016; originally announced November 2016.

Comments: 8 pages, 1 figure

arXiv:1609.04206 [pdf, ps, other]

doi 10.1007/s11433-017-9007-0

Optical spectroscopy study of charge density wave order in Sr$_{3}$Rh$_{4}$Sn$_{13}$ and (Sr$_{0.5}$Ca$_{0.5}$)$_{3}$Rh$_{4}$Sn$_{13}$

Authors: W. J. Ban, H. P. Wang, C. W. Tseng, C. N. Kuo, C. S. Lue, N. L. Wang

Abstract: We perform optical spectroscopy measurements across the charge density wave (CDW) phase transitions on single-crystal samples of Sr$_{3}$Rh$_{4}$Sn$_{13}$ and (Sr$_{0.5}$Ca$_{0.5}$)$_{3}$Rh$_{4}$Sn$_{13}$. Formation of a CDW energy gap was clearly observed for both single-crystal samples when they undergo the phase transitions. The existence of a Drude component in $σ_1(ω)$ below T$_{CDW}$ indicates that the Fermi surface is only partially gapped in the CDW state. The obtained value of 2$Δ$/k$_{B}$T$_{CDW}$ is roughly 13 for both Sr$_{3}$Rh$_{4}$Sn$_{13}$ and (Sr$_{0.5}$Ca$_{0.5}$)$_{3}$Rh$_{4}$Sn$_{13}$ compounds. The value is considerably larger than the mean-field value based on the weak-coupling BCS theory. The observed spectral feature in (Sr$_{x}$Ca$_{1-x}$)$_{3}$Rh$_{4}$Sn$_{13}$ resembles those seen in many other CDW systems.

Submitted 9 January, 2017; v1 submitted 14 September, 2016; originally announced September 2016.

Journal ref: Sci. China-Phys. Mech. Astron. 60, 047011 (2017)

arXiv:1608.04410 [pdf, ps, other]

doi 10.1103/PhysRevB.94.064522

Mn-doping induced ferromagnetism and enhanced superconductivity in Bi_4-x Mn_x O_4 S_3 (0.075 ≤ x ≤ 0.15)

Authors: Zhenjie Feng, Xunqing Yin, Yiming Cao, Xianglian Peng, Tian Gao, Chuan Yu, Jingzhe Chen, Baojuan Kang, Bo Lu, Juan Guo, Qing Li, Wei-Shiuan Tseng, Zhongquan Ma, Chao Jing, Shixun Cao, Jincang Zhang, N.-C. Yeh

Abstract: We demonstrate that Mn-doping in the layered sulfides Bi_4O_4S_3 leads to stable Bi_4-x Mn_x O_4 S_3 compounds that exhibit both long-range ferromagnetism and enhanced superconductivity for 0.075 ≤ x ≤ 0.15, with a possible record superconducting transition temperature (T_c) = 15 K among all BiS_2-based superconductors. We conjecture that the coexistence of superconductivity and ferromagnetism may be attributed to Mn-doping in the spacer Bi_2O_2 layers away from the superconducting BiS_2 layers, whereas the enhancement of T_c may be due to excess electron transfer to BiS_2 from the Mn4+/Mn3+-substitutions in Bi_2O_2. This notion is empirically corroborated by the increased electron-carrier densities upon Mn doping, and by further studies of the Bi_4-x A_x O_4 S_3 compounds (A = Co, Ni; x = 0.1, 0.125), where the T_c values remain comparable to that of the undoped Bi_4O_4S_3 system (= 4.5 K) due to lack of 4+ valences in either Co or Ni ions for excess electron transfer to the BiS_2 layers. These findings therefore shed new light on feasible pathways to enhance the T_c values of BiS_2-based superconductors.

Submitted 15 August, 2016; originally announced August 2016.

Comments: 11 pages, 10 figures. Accepted for publication in Physical Review B

arXiv:1312.7301 [pdf]

doi 10.1103/PhysRevB.89.024418

Central role of domain wall depinning for perpendicular magnetization switching driven by spin torque from the spin Hall effect

Authors: O. J. Lee, L. Q. Liu, C. F. Pai, H. W. Tseng, Y. Li, D. C. Ralph, R. A. Buhrman

Abstract: We study deterministic magnetic reversal of a perpendicularly magnetized Co layer in a Co/MgO/Ta nano-square driven by spin Hall torque from an in-plane current flowing in an underlying Pt layer. The rate-limiting step of the switching process is domain-wall (DW) depinning by spin Hall torque via a thermally-assisted mechanism that eventually produces full reversal by domain expansion. An in-plane applied magnetic field collinear with the current is required, with the necessary field scale set by the need to overcome DW chirality imposed by the Dzyaloshinskii-Moriya interaction. Once Joule heating is taken into account the switching current density is quantitatively consistent with a spin Hall angle θ$_{SH}$ ≈ 0.07 for 4 nm of Pt.

Submitted 27 December, 2013; originally announced December 2013.

arXiv:1312.4051 [pdf]

doi 10.1016/j.icarus.2014.07.020

Seasonal and radial trends in Saturn's thermal plasma between the main rings and Enceladus

Authors: Meredith K. Elrod, Wei-Ling Tseng, Adam K. Woodson, Robert E. Johnson

Abstract: A goal of Cassini's extended mission has been to examine the seasonal variations of Saturn's magnetosphere, moons, and rings. Recently we showed that the magnetospheric plasma between the main rings and Enceladus exhibited a time dependence that we attributed to a seasonally variable source of oxygen from the main rings (Elrod et al., 2012). Such a temporal variation was subsequently seen in the energetic ion composition (Christon et al., 2013). Here we include the most recent measurements by the Cassini Plasma Spectrometer (CAPS) in our analysis (Elrod et al., 2012) and modeling (Tseng et al., 2013a) of the temporal and radial dependence of the thermal plasma in the region between the main rings and the orbit of Enceladus. Data taken in 2012, well past equinox, when the northern side of the main rings was illuminated, appear consistent with a seasonal variation. Although the thermal plasma in this region comes from two sources, the extended ring atmosphere and the Enceladus torus, that have very different radial and temporal trends, the heavy ion density is found to exhibit a steep radial dependence that is similar for all years examined. Using our chemical model, we show that this dependence requires a radial dependence for the Enceladus torus that differs from recent models or, more likely, enhanced heavy ion quenching with decreasing distance from the edge of the main rings.
We examine the possible physical processes and suggest that the precipitation of the inward diffusing high energy background radiation onto the edge of the main rings could play an important role.

Submitted 14 December, 2013; originally announced December 2013.

arXiv:1311.2234 [pdf, other]

FuSSO: Functional Shrinkage and Selection Operator

Authors: Junier B. Oliva, Barnabas Poczos, Timothy Verstynen, Aarti Singh, Jeff Schneider, Fang-Cheng Yeh, Wen-Yih Tseng

Abstract: We present the FuSSO, a functional analogue to the LASSO, that efficiently finds a sparse set of functional input covariates to regress a real-valued response against. The FuSSO does so in a semi-parametric fashion, making no parametric assumptions about the nature of input functional covariates and assuming a linear form to the mapping of functional covariates to the response. We provide a statistical backing for use of the FuSSO via proof of asymptotic sparsistency under various conditions. Furthermore, we observe good results on both synthetic and real-world data.

Submitted 8 March, 2014; v1 submitted 9 November, 2013; originally announced November 2013.

arXiv:1302.3270 [pdf]

doi 10.1016/j.pss.2013.06.005

The Atomic Hydrogen Cloud in the Saturnian System

Authors: W.-L. Tseng, R. E. Johnson, W.-H. Ip

Abstract: The Voyager flyby observations revealed that a very broad doughnut-shaped distribution of hydrogen atoms existed in the Saturnian magnetosphere. Recent Cassini observations confirmed the local-time asymmetry but also showed the hydrogen cloud density increases with decreasing distance to Saturn. The origin of the atomic hydrogen cloud has been debated ever since. Therefore, we have carried out a global investigation of the atomic hydrogen cloud taking into account all possible sources: 1) the Saturnian atmosphere, 2) the H2 atmosphere of main rings, 3) Enceladus H2O and OH torus, 4) Titan H2 torus and 5) the atomic hydrogen directly escaping from Titan. We show that the H ejection velocity and angle distribution are modified by collisions of the hot H, produced by electron-impact dissociation of H2, with the ambient atmospheric H2 and H. This in turn affects the morphology of the escaping hydrogen as does the morphology of the ionospheric electron distribution. That Saturn's atmosphere is an important source is suggested by the fact that the H cloud peaks well below the ring plane, a feature that, so far, we cannot reproduce by the dissociation of the ring H2 atmosphere or other proposed sources. Our simulations show that H directly escaping from Titan is a major contribution in the outer magnetosphere. The morphology of Titan H torus, shaped by the solar radiation pressure and the Saturnian oblateness, can account for the local time asymmetry near Titan orbit.
Dissociation of H2O and OH in the Enceladus torus contributes inside ~5 RS, but dissociation of Titan H2 torus does not, due to the significant energy released. For the total number of H observed by Cassini inside 5 RS, our modeling results suggest ~20% from dissociation in the Enceladus torus, ~10% from dissociation of ring H2 atmosphere, and ~50% from Titan H torus, implying that ~20% comes from the Saturnian atmosphere.

Submitted 13 February, 2013; originally announced February 2013.

Comments: This paper has been submitted to P&SS

arXiv:1208.1711 [pdf]

doi 10.1063/1.4753947

Spin transfer torque devices utilizing the giant spin Hall effect of tungsten

Authors: Chi-Feng Pai, Luqiao Liu, Y. Li, H. W. Tseng, D. C. Ralph, R. A. Buhrman

Abstract: We report a giant spin Hall effect (SHE) in β-W thin films. Using spin torque induced ferromagnetic resonance with a β-W/CoFeB bilayer microstrip we determine the spin Hall angle to be |θ| = 0.30 ± 0.02, large enough for an in-plane current to efficiently reverse the orientation of an in-plane magnetized CoFeB free layer of a nanoscale magnetic tunnel junction adjacent to a thin β-W layer. From switching data obtained with such 3-terminal devices we independently determine |θ| = 0.33 ± 0.06. We also report variation of the spin Hall switching efficiency with W layers of different resistivities and hence of variable (α and β) phase composition.

Submitted 8 August, 2012; originally announced August 2012.

arXiv:1203.2875 [pdf]

doi 10.1126/science.1218197

Spin torque switching with the giant spin Hall effect of tantalum

Authors: Luqiao Liu, Chi-Feng Pai, Y. Li, H. W. Tseng, D. C. Ralph, R. A. Buhrman

Abstract: We report a giant spin Hall effect (SHE) in β-Ta that generates spin currents intense enough to induce efficient spin-transfer-torque switching of ferromagnets, thereby providing a new approach for controlling magnetic devices that can be superior to existing technologies. We quantify this SHE by three independent methods and demonstrate spin-torque (ST) switching of both out-of-plane and in-plane magnetized layers. We implement a three-terminal device that utilizes current passing through a low impedance Ta-ferromagnet bilayer to effect switching of a nanomagnet, with a higher-impedance magnetic tunnel junction for read-out. The efficiency and reliability of this device, together with its simplicity of fabrication, suggest that this three-terminal SHE-ST design can eliminate the main obstacles currently impeding the development of magnetic memory and non-volatile spin logic technologies.

Submitted 13 March, 2012; originally announced March 2012.

arXiv:1112.5511 [pdf]

doi 10.1016/j.pss.2012.05.001

Modeling the Seasonal Variability of the Plasma Environment in Saturn's Magnetosphere between Main Rings and Mimas

Authors: W.-L. Tseng, R. E. Johnson, M. K. Elrod

Abstract: The detection of O2+ and O+ ions over Saturn's main rings by the Cassini INMS and CAPS instruments at Saturn orbit insertion (SOI) in 2004 confirmed the existence of the ring atmosphere and ionosphere. The source mechanism was suggested to be primarily photolytic decomposition of water ice producing neutral O2 and H2 (Johnson et al., 2006). Therefore, we predicted that there would be seasonal variations in the ring atmosphere and ionosphere due to the orientation of the ring plane to the sun (Tseng et al., 2010). The atoms and molecules scattered out of the ring atmosphere by ion-molecule collisions are an important source for the inner magnetosphere (Johnson et al., 2006; Martens et al. 2008; Tseng et al., 2010 and 2011). This source competes with water products from Enceladus' plumes, which, although possibly variable, do not appear to have a seasonal variability (Smith et al., 2010). Recently, we found that the plasma density, composition and temperature in the region from 2.5 to 3.5 RS exhibited significant seasonal variation between 2004 and 2010 (Elrod et al., 2011). Here we present a one-box ion chemistry model to explain the complex and highly variable plasma environment observed by the CAPS instrument on Cassini. We combine the water products from Enceladus with the molecules scattered from a corrected ring atmosphere, in order to describe the temporal changes in ion densities, composition and temperature detected by CAPS.
We found that the observed temporal variations are primarily seasonal, due to the predicted seasonal variation in the ring atmosphere, and are consistent with a compressed magnetosphere at SOI.

Submitted 22 December, 2011; originally announced December 2011.

Comments: This is submitted to P&SS

Showing 1–50 of 51 results for author: Tseng, W