-
Photocathode characterisation for robust PICOSEC Micromegas precise-timing detectors
Authors:
M. Lisowska,
R. Aleksan,
Y. Angelis,
S. Aune,
J. Bortfeldt,
F. Brunbauer,
M. Brunoldi,
E. Chatzianagnostou,
J. Datta,
K. Dehmelt,
G. Fanourakis,
S. Ferry,
D. Fiorina,
K. J. Floethner,
M. Gallinaro,
F. Garcia,
I. Giomataris,
K. Gnanvo,
F. J. Iguaz,
D. Janssens,
A. Kallitsopoulou,
M. Kovacic,
B. Kross,
C. C. Lai,
P. Legou
, et al. (33 additional authors not shown)
Abstract:
The PICOSEC Micromegas detector is a precise-timing gaseous detector based on a Cherenkov radiator coupled with a semi-transparent photocathode and a Micromegas amplifying structure, targeting a time resolution of tens of picoseconds for minimum ionising particles. Initial single-pad prototypes have demonstrated a time resolution below 25 ps, prompting ongoing developments to adapt the concept for applications. The achieved performance is being transferred to robust multi-channel detector modules suitable for large-area detection systems requiring excellent timing precision. To enhance the robustness and stability of the PICOSEC Micromegas detector, research on robust carbon-based photocathodes, including Diamond-Like Carbon (DLC) and Boron Carbide (B4C), is pursued. Prototypes equipped with DLC and B4C photocathodes exhibited time resolutions of approximately 32 ps and 34.5 ps, respectively. Efforts to improve detector robustness and stability enhance the feasibility of the PICOSEC Micromegas concept for large experiments, ensuring sustained performance while maintaining excellent timing precision.
Submitted 13 July, 2024;
originally announced July 2024.
-
Probability of Differentiation Reveals Brittleness of Homogeneity Bias in Large Language Models
Authors:
Messi H. J. Lee,
Calvin K. Lai
Abstract:
Homogeneity bias in Large Language Models (LLMs) refers to their tendency to homogenize the representations of some groups compared to others. Previous studies documenting this bias have predominantly used encoder models, which may have inadvertently introduced biases. To address this limitation, we prompted GPT-4 to generate single word/expression completions associated with 18 situation cues (specific, measurable elements of environments that influence how individuals perceive situations) and compared the variability of these completions using probability of differentiation. This approach directly assessed homogeneity bias from the model's outputs, bypassing encoder models. Across five studies, we find that homogeneity bias is highly volatile across situation cues and writing prompts, suggesting that the bias observed in past work may reflect biases within encoder models rather than within LLMs. Furthermore, these results suggest that homogeneity bias in LLMs is brittle, as even minor and arbitrary changes in prompts can significantly alter the expression of bias. Future work should further explore how variations in syntactic features and topic choices in longer text generations influence homogeneity bias in LLMs.
Submitted 9 July, 2024;
originally announced July 2024.
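Probability of differentiation is commonly computed as one minus the sum of squared category proportions: it is 0 when every completion is identical and approaches 1 as completions spread across many distinct categories. The sketch below assumes that form and treats each distinct completion string as its own category; the paper's exact preprocessing (for example, lemmatization or any small-sample correction) is not stated in the abstract, and the example completions are hypothetical.

```python
from collections import Counter


def probability_of_differentiation(responses):
    """Return 1 - sum(p_i^2), where p_i is the share of responses in category i.

    Assumption: each distinct string is one category (no normalisation).
    """
    if not responses:
        raise ValueError("need at least one response")
    counts = Counter(responses)
    n = len(responses)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical completions for two groups (illustrative data only).
group_a = ["calm", "calm", "calm", "calm"]
group_b = ["calm", "tense", "busy", "quiet"]
print(probability_of_differentiation(group_a))  # 0.0  (fully homogeneous)
print(probability_of_differentiation(group_b))  # 0.75 (more differentiated)
```

A lower value for one group than another, under matched prompts, is what would indicate more homogeneous representation of that group.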
-
Applications of the Green tensor estimates of the nonstationary Stokes system in the half space
Authors:
Kyungkeun Kang,
Baishun Lai,
Chen-Chih Lai,
Tai-Peng Tsai
Abstract:
In this paper, we present a series of applications of the pointwise estimates of the (unrestricted) Green tensor of the nonstationary Stokes system in the half space, established in our previous work [CMP 2023]. First, we show the $L^1$-$L^q$ estimates for the Stokes flow with possibly non-solenoidal $L^1$ initial data, generalizing the results of Giga-Matsui-Shimizu [Math. Z. 1999] and Desch-Hieber-Prüss [J. Evol. Equ. 2001]. Second, we construct mild solutions of the Navier-Stokes equations in the half space with mixed-type pointwise decay or with pointwise decay alongside boundary vanishing. Finally, we explore various coupled fluid systems in the half space including viscous resistive magnetohydrodynamics equations, a coupled system for the flow and the magnetic field of MHD type, and the nematic liquid crystal flow. For each of these systems, we construct mild solutions in $L^q$, pointwise decay, and uniformly local $L^q$ spaces.
Submitted 9 July, 2024;
originally announced July 2024.
-
More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models
Authors:
Messi H. J. Lee,
Jacob M. Montgomery,
Calvin K. Lai
Abstract:
Vision Language Models (VLMs), exemplified by GPT-4V, adeptly integrate text and vision modalities. This integration enhances Large Language Models' ability to mimic human perception, allowing them to process image inputs. Despite VLMs' advanced capabilities, however, there is a concern that VLMs inherit biases of both modalities in ways that make biases more pervasive and difficult to mitigate. Our study explores how VLMs perpetuate homogeneity bias and trait associations with regard to race and gender. When prompted to write stories based on images of human faces, GPT-4V describes subordinate racial and gender groups with greater homogeneity than dominant groups and relies on distinct, yet generally positive, stereotypes. Importantly, VLM stereotyping is driven by visual cues rather than group membership alone, such that faces rated as more prototypically Black and feminine are subject to greater stereotyping. These findings suggest that VLMs may associate subtle visual cues related to racial and gender groups with stereotypes in ways that could be challenging to mitigate. We explore the underlying reasons behind this behavior, discuss its implications, and emphasize the importance of addressing these biases as VLMs come to mirror human perception.
Submitted 21 May, 2024;
originally announced July 2024.
-
$\mathcal{PT}$-Symmetry induced Bi-Stability in Non-Hermitian Cavity Magnomechanics
Authors:
Chaoyi Lai,
Shah Fahad,
Kashif Ammar Yasir
Abstract:
We study a steady-state non-Hermitian magnomechanical system driven by a transverse magnetic field that interacts directly with a YIG sphere and excites cavity magnons and photons. To make the system non-Hermitian, we use a traveling field that interacts directly with the magnons and introduces gain into the system. We start by illustrating the PT configuration of the system, which contains two PT-broken regions around the exceptional point and a PT-protected region along the axis of the exceptional point. Later, we find that the numbers of cavity photons and magnons show bistable behavior depending on the PT configuration, which becomes more significant as the magnon-photon coupling and the traveling-field strength increase. We show that the steady-state photon number exhibits bistability only when the system is in the lossy PT-broken configuration, i.e., when the strength of the traveling field is less than the magnon-photon coupling. Otherwise, it has only a single stable state because gain in the system suppresses bistability, unlike previous investigations in this direction. Further, a larger magnon-photon coupling increases the photon intensity and decreases the magnon intensity because of photon-magnon energy exchange, leading to enhanced photon bistability and reduced magnon bistability. However, as the strength of the traveling field increases, both photon and magnon bistability decrease. We also study the steady-state effective potential of the system and illustrate the occurrence of bistability through nonlinear interactions between contour trajectories, which similarly depends on the PT-broken configuration of the system.
Submitted 1 July, 2024;
originally announced July 2024.
-
Reducing Quantum Error Correction Overhead with Versatile Flag-Sharing Syndrome Extraction Circuits
Authors:
Pei-Hao Liou,
Ching-Yi Lai
Abstract:
Given that quantum error correction processes are unreliable, an efficient error syndrome extraction circuit should use fewer ancillary qubits, quantum gates, and measurements, while maintaining low circuit depth, to minimize the circuit area, roughly defined as the product of circuit depth and the number of physical qubits. We propose to design parallel flagged syndrome extraction with shared flag qubits for quantum stabilizer codes. Versatile parallelization techniques are employed to minimize the required circuit area, thereby improving the error threshold and overall performance. Specifically, all the measurement outcomes in multiple rounds of syndrome extraction are integrated into a lookup table decoder, allowing us to parallelize multiple stabilizer measurements with shared flag qubits. We present flag-sharing and fully parallel schemes for the [[17,1,5]] and [[19,1,5]] Calderbank-Shor-Steane (CSS) codes. This methodology extends to the [[5,1,3]] non-CSS code, achieving the minimum known circuit area. Numerical simulations demonstrate pseudothresholds for these codes improved by up to an order of magnitude compared to previous schemes in the literature.
Submitted 30 June, 2024;
originally announced July 2024.
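The core idea of a lookup table decoder can be illustrated on the [[5,1,3]] code mentioned above: precompute the syndrome of every correctable error and store the matching correction. The sketch below is the standard textbook construction for weight-1 data errors only, using the cyclic generators XZZXI, IXZZX, XIXZZ, ZXIXZ. It ignores flag qubits, measurement errors, and multiple extraction rounds, all of which the authors' decoder integrates, so it is not their method; the function names are hypothetical.

```python
from itertools import product

# Stabilizer generators of the [[5,1,3]] code (cyclic shifts of XZZXI).
STABILIZERS = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]


def anticommute(p, q):
    """Single-qubit Paulis anticommute iff both are non-identity and differ."""
    return p != "I" and q != "I" and p != q


def syndrome(error):
    """One bit per generator: 1 if the error anticommutes with that generator."""
    bits = []
    for stab in STABILIZERS:
        flips = sum(anticommute(e, s) for e, s in zip(error, stab))
        bits.append(flips % 2)
    return tuple(bits)


def build_lookup_table(n=5):
    """Map the syndrome of each weight-1 Pauli error to that error (its own correction)."""
    table = {}
    for qubit, pauli in product(range(n), "XYZ"):
        error = ["I"] * n
        error[qubit] = pauli
        table[syndrome(error)] = "".join(error)
    return table


table = build_lookup_table()
print(len(table))                # 15 distinct non-trivial syndromes
print(table[syndrome("IIYII")])  # IIYII
```

Because the [[5,1,3]] code is perfect, the 15 single-qubit errors fill all 15 non-zero 4-bit syndromes exactly, so decoding is a single dictionary lookup.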
-
Hybrid Quantum-Classical Clustering for Preparing a Prior Distribution of Eigenspectrum
Authors:
Mengzhen Ren,
Yu-Cheng Chen,
Ching-Jui Lai,
Min-Hsiu Hsieh,
Alice Hu
Abstract:
Determining the energy gap in a quantum many-body system is critical to understanding its behavior and is important in quantum chemistry and condensed matter physics. Determining the energy gap requires identifying both the excited and ground states of a system. In this work, we consider preparing the prior distribution and circuits for the eigenspectrum of time-independent Hamiltonians, which can benefit both classical and quantum algorithms for solving eigenvalue problems. The proposed algorithm unfolds in three steps: Hamiltonian transformation, parameter representation, and classical clustering. These steps are underpinned by two key insights: the use of quantum circuits to approximate the ground state of transformed Hamiltonians and the analysis of parameter representation to distinguish between eigenvectors. The algorithm is showcased through applications to the 1D Heisenberg system and the LiH molecular system, highlighting its potential for both near-term and fault-tolerant quantum devices. The paper also explores the scalability of the method and its performance across various settings, setting the stage for more resource-efficient quantum computations that are both accurate and fast. The findings presented here offer new insight into hybrid algorithms and a pathway to overcoming current computational challenges.
Submitted 29 June, 2024;
originally announced July 2024.
-
Renal digital pathology visual knowledge search platform based on language large model and book knowledge
Authors:
Xiaomin Lv,
Chong Lai,
Liya Ding,
Maode Lai,
Qingrong Sun
Abstract:
Large models have become mainstream, yet their applications in digital pathology still require exploration. Meanwhile, renal pathology images play an important role in the diagnosis of renal diseases. Using 60 renal pathology books, we segmented images and paired them with their corresponding text descriptions, performed clustering analysis on all image and text-description features using large models, and ultimately built a retrieval system based on the semantic features of large models. Based on this analysis, we established a knowledge base of 10,317 renal pathology images with paired text descriptions, and then evaluated the semantic feature capabilities of four large models (GPT2, gemma, LLaMA, and Qwen) and the image feature capabilities of the dinov2 large model. Furthermore, we built a semantic retrieval system, named RppD (aidp.zjsru.edu.cn), that retrieves pathological images based on text descriptions.
Submitted 26 May, 2024;
originally announced June 2024.
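A text-to-image retrieval system built on semantic features typically embeds the query and every stored item in a shared vector space, then returns the items whose embeddings are most similar to the query. The sketch below assumes precomputed embeddings and cosine similarity; the abstract does not say which similarity RppD uses or how text and image features are aligned, and the item ids and 3-dimensional vectors are hypothetical stand-ins for real model features.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, k=2):
    """Return the ids of the k indexed items most similar to the query vector."""
    scored = sorted(index.items(), key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [item_id for item_id, _ in scored[:k]]


# Hypothetical embeddings; a real system would use features from a large model.
index = {
    "img_001": [0.9, 0.1, 0.0],
    "img_002": [0.1, 0.9, 0.0],
    "img_003": [0.7, 0.3, 0.1],
}
print(retrieve([1.0, 0.0, 0.0], index, k=2))  # ['img_001', 'img_003']
```

A linear scan is enough for a knowledge base of roughly ten thousand items; larger collections usually switch to an approximate nearest-neighbour index.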
-
Discovery and Extensive Follow-Up of SN 2024ggi, a nearby type IIP supernova in NGC 3621
Authors:
Ting-Wan Chen,
Sheng Yang,
Shubham Srivastav,
Takashi J. Moriya,
Stephen J. Smartt,
Sofia Rest,
Armin Rest,
Hsing Wen Lin,
Hao-Yu Miao,
Yu-Chi Cheng,
Amar Aryan,
Chia-Yu Cheng,
Morgan Fraser,
Li-Ching Huang,
Meng-Han Lee,
Cheng-Han Lai,
Yu Hsuan Liu,
Aiswarya Sankar. K,
Ken W. Smith,
Heloise F. Stevance,
Ze-Ning Wang,
Joseph P. Anderson,
Charlotte R. Angus,
Thomas de Boer,
Kenneth Chambers
, et al. (23 additional authors not shown)
Abstract:
We present the discovery and early observations of the nearby Type II supernova (SN) 2024ggi in NGC 3621 at 6.64 +/- 0.3 Mpc. The SN was caught 5.8 (+1.9 -2.9) hours after its explosion by the ATLAS survey. Early-phase, high-cadence, and multi-band photometric follow-up was performed by the Kinder (Kilonova Finder) project, collecting over 1000 photometric data points within a week. The combined o- and r-band light curves show a rapid rise of 3.3 magnitudes in 13.7 hours, much faster than SN 2023ixf (another recent, nearby, and well-observed SN II). Between 13.8 and 18.8 hours after explosion SN 2024ggi became bluer, with u-g colour dropping from 0.53 to 0.15 mag. The rapid blueward evolution indicates a wind shock breakout (SBO) scenario. No hour-long brightening expected for the SBO from a bare stellar surface was detected during our observations. The classification spectrum, taken 17 hours after the SN explosion, shows flash features of high-ionization species such as Balmer lines, He I, C III, and N III. Detailed light curve modeling reveals critical insights into the properties of the circumstellar material (CSM). Our favoured model has an explosion energy of 2 x 10^51 erg, a mass-loss rate of 10^-3 solar_mass/yr (with an assumed 10 km/s wind), and a confined CSM radius of 6 x 10^14 cm. The corresponding CSM mass is 0.4 solar_mass. Comparisons with SN 2023ixf highlight that SN 2024ggi has a smaller CSM density, resulting in a faster rise and fainter UV flux. The extensive dataset and the involvement of citizen astronomers underscore that a collaborative network is essential for SBO searches, leading to more precise and comprehensive SN characterizations.
Submitted 13 June, 2024;
originally announced June 2024.
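As a rough check on the timescales quoted in the SN 2024ggi abstract, the mean rates of brightening and of blueward colour change follow directly from the stated numbers. This sketch assumes linear evolution over each interval, which real light curves do not follow, so the outputs are only averaged rates derived by simple arithmetic, not fitted values reported in the paper.

```python
# Quantities quoted in the abstract.
rise_mag, rise_hours = 3.3, 13.7        # combined o- and r-band rise
color_start, color_end = 0.53, 0.15     # u-g colour in mag
t_start, t_end = 13.8, 18.8             # hours after explosion

# Mean rates, assuming (unrealistically) linear evolution over each interval.
rise_rate = rise_mag / rise_hours                           # mag per hour
color_rate = (color_start - color_end) / (t_end - t_start)  # mag per hour

print(f"mean rise rate: {rise_rate:.2f} mag/h")            # 0.24 mag/h
print(f"mean u-g blueward rate: {color_rate:.3f} mag/h")   # 0.076 mag/h
```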
-
Speech Emotion Recognition with ASR Transcripts: A Comprehensive Study on Word Error Rate and Fusion Techniques
Authors:
Yuanchao Li,
Peter Bell,
Catherine Lai
Abstract:
Text data is commonly utilized as a primary input to enhance Speech Emotion Recognition (SER) performance and reliability. However, the reliance on human-transcribed text in most studies impedes the development of practical SER systems, creating a gap between in-lab research and real-world scenarios where Automatic Speech Recognition (ASR) serves as the text source. Hence, this study benchmarks SER performance using ASR transcripts with varying Word Error Rates (WERs) on well-known corpora: IEMOCAP, CMU-MOSI, and MSP-Podcast. Our evaluation includes text-only and bimodal SER with diverse fusion techniques, aiming for a comprehensive analysis that uncovers novel findings and challenges faced by current SER research. Additionally, we propose a unified ASR error-robust framework integrating ASR error correction and modality-gated fusion, achieving lower WER and higher SER results compared to the best-performing ASR transcript. This research is expected to provide insights into SER with ASR assistance, especially for real-world applications.
Submitted 12 June, 2024;
originally announced June 2024.
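The Word Error Rate used above to grade ASR transcripts is the word-level edit distance between reference and hypothesis (substitutions plus deletions plus insertions) divided by the number of reference words. The sketch below is a plain dynamic-programming implementation assuming whitespace tokenisation and no text normalisation; the benchmark's own normalisation rules are not given in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("i am so happy today", "i am happy to day"))  # 0.6
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is one reason it is usually reported alongside downstream metrics rather than alone.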
-
Adversarial Patch for 3D Local Feature Extractor
Authors:
Yu Wen Pao,
Li Chang Lai,
Hong-Yi Lin
Abstract:
Local feature extractors are the cornerstone of many computer vision tasks. However, their vulnerability to adversarial attacks can significantly compromise their effectiveness. This paper discusses approaches to attack sophisticated local feature extraction algorithms and models to achieve two distinct goals: (1) forcing a match between originally non-matching image regions, and (2) preventing a match between originally matching regions. At the end of the paper, we discuss the performance and drawbacks of different patch generation methods.
Submitted 12 June, 2024;
originally announced June 2024.
-
Better Late Than Never: Formulating and Benchmarking Recommendation Editing
Authors:
Chengyu Lai,
Sheng Zhou,
Zhimeng Jiang,
Qiaoyu Tan,
Yuanchen Bei,
Jiawei Chen,
Ningyu Zhang,
Jiajun Bu
Abstract:
Recommendation systems play a pivotal role in suggesting items to users based on their preferences. However, in online platforms, these systems inevitably offer unsuitable recommendations due to limited model capacity, poor data quality, or evolving user interests. Enhancing user experience requires efficiently rectifying such unsuitable recommendation behaviors. This paper introduces a novel and significant task termed recommendation editing, which focuses on modifying known and unsuitable recommendation behaviors. Specifically, this task aims to adjust the recommendation model to eliminate known unsuitable items without accessing training data or retraining the model. We formally define the problem of recommendation editing with three primary objectives: strict rectification, collaborative rectification, and concentrated rectification. Three evaluation metrics are developed to quantitatively assess the achievement of each objective. We present a straightforward yet effective benchmark for recommendation editing using a novel Editing Bayesian Personalized Ranking Loss. To demonstrate the effectiveness of the proposed method, we establish a comprehensive benchmark that incorporates various methods from related fields. Codebase is available at https://github.com/cycl2018/Recommendation-Editing.
Submitted 6 June, 2024;
originally announced June 2024.
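For context on the Editing Bayesian Personalized Ranking Loss named above, the standard BPR objective scores a (positive item, negative item) pair by the model's score difference and minimises -log(sigmoid(s_pos - s_neg)), pushing positives above negatives. The sketch below implements only that standard BPR loss on plain score lists; how the paper's editing variant modifies it is not described in the abstract, and the function name is hypothetical.

```python
import math


def bpr_loss(pos_scores, neg_scores):
    """Standard BPR: mean of -log(sigmoid(s_pos - s_neg)) over paired scores.

    This is the classic loss, not the paper's Editing BPR variant.
    """
    if len(pos_scores) != len(neg_scores) or not pos_scores:
        raise ValueError("need equal, non-empty lists of paired scores")
    total = 0.0
    for s_pos, s_neg in zip(pos_scores, neg_scores):
        diff = s_pos - s_neg
        # -log(sigmoid(x)) = log1p(exp(-x)) for x >= 0, and -x + log1p(exp(x)) for x < 0;
        # splitting on the sign avoids overflow in exp for large |x|.
        if diff >= 0:
            total += math.log1p(math.exp(-diff))
        else:
            total += -diff + math.log1p(math.exp(diff))
    return total / len(pos_scores)


print(bpr_loss([0.0], [0.0]))  # log(2), about 0.693: the model cannot separate the pair
```

The loss shrinks toward 0 as positives are ranked further above negatives and grows roughly linearly when the order is reversed.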
-
Combining Experimental and Historical Data for Policy Evaluation
Authors:
Ting Li,
Chengchun Shi,
Qianglin Wen,
Yang Sui,
Yongli Qin,
Chunbo Lai,
Hongtu Zhu
Abstract:
This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.
Submitted 1 June, 2024;
originally announced June 2024.
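The idea of linearly combining an experimental and a historical estimator with MSE-optimal weights can be made concrete in the simplest case. Assume the experimental estimator is unbiased with variance v_e, the historical one has bias b and variance v_h, and the two are independent. The combination w * exp + (1 - w) * hist then has MSE(w) = w^2 v_e + (1 - w)^2 (v_h + b^2); setting the derivative 2 w v_e - 2 (1 - w)(v_h + b^2) to zero gives w* = (v_h + b^2) / (v_e + v_h + b^2). In practice b is unknown and must be estimated, which is where the paper's methods and its pessimistic variants come in; this sketch assumes all three quantities are known.

```python
def optimal_weight(var_exp, var_hist, bias_hist):
    """Weight on the experimental estimator that minimises the MSE of
    w * exp_est + (1 - w) * hist_est.

    Assumes independent estimators, an unbiased experimental estimator,
    and a known bias for the historical estimator.
    """
    hist_mse = var_hist + bias_hist ** 2
    return hist_mse / (var_exp + hist_mse)


def combined_mse(w, var_exp, var_hist, bias_hist):
    """MSE of the weighted combination under the same assumptions."""
    return w ** 2 * var_exp + (1 - w) ** 2 * (var_hist + bias_hist ** 2)


# Unbiased but less noisy history: most weight goes to the historical data.
print(optimal_weight(var_exp=4.0, var_hist=1.0, bias_hist=0.0))  # 0.2
```

As the historical bias grows, w* moves toward 1 and the combined estimator falls back on the experimental data alone.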
-
1st Place Solution to Odyssey Emotion Recognition Challenge Task1: Tackling Class Imbalance Problem
Authors:
Mingjie Chen,
Hezhao Zhang,
Yuanchao Li,
Jiachen Luo,
Wen Wu,
Ziyang Ma,
Peter Bell,
Catherine Lai,
Joshua Reiss,
Lin Wang,
Philip C. Woodland,
Xie Chen,
Huy Phan,
Thomas Hain
Abstract:
Speech emotion recognition is a challenging classification task with natural emotional speech, especially when the distribution of emotion types is imbalanced in the training and test data. In this case, it is more difficult for a model to learn to separate minority classes, so that these are sometimes ignored or frequently misclassified. Previous work has utilised class-weighted loss for training, but problems remain as it sometimes causes over-fitting for minor classes or under-fitting for major classes. This paper presents the system developed by a multi-site team to participate in the Odyssey 2024 Emotion Recognition Challenge Track-1. The challenge data has the aforementioned properties, and the presented systems therefore aim to tackle these issues by introducing focal loss into optimisation when applying class-weighted loss. Specifically, the focal loss is further weighted by prior-based class weights. Experimental results show that combining these two approaches brings better overall performance, at the cost of some performance on major classes. The system further employs a majority voting strategy to combine the outputs of an ensemble of 7 models. The models are trained independently, using different acoustic features and loss functions, with the aim of giving them different properties for different data. Hence these models show different performance preferences on major and minor classes. The ensemble system output obtained the best performance in the challenge, ranking top-1 among 68 submissions. It also outperformed all single models in our set. On the Odyssey 2024 Emotion Recognition Challenge Task-1 data, the system obtained a Macro-F1 score of 35.69% and an accuracy of 37.32%.
Submitted 30 May, 2024;
originally announced May 2024.
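The two ingredients described above, focal loss weighted by prior-based class weights and majority voting across an ensemble, can be sketched per sample. Focal loss scales cross-entropy by (1 - p_t)^gamma so confidently correct samples contribute little; a class weight alpha_t then boosts rare classes. The sketch assumes inverse-prior weights alpha_c = 1 / (K * prior_c) and gamma = 2, both hypothetical choices since the abstract does not give the exact weighting, and breaks voting ties in favour of the label voted first. The class counts and votes are illustrative only.

```python
import math
from collections import Counter


def prior_class_weights(class_counts):
    """Hypothetical inverse-prior weights: alpha_c = 1 / (K * prior_c)."""
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: total / (k * n) for c, n in class_counts.items()}


def weighted_focal_loss(probs, target, alpha, gamma=2.0):
    """Class-weighted focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


def majority_vote(predictions):
    """Most frequent label across models; ties go to the label voted first."""
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


alpha = prior_class_weights({"neutral": 600, "happy": 300, "fear": 100})
probs = {"neutral": 0.7, "happy": 0.2, "fear": 0.1}
# A rare, poorly predicted class yields a large weighted loss.
print(weighted_focal_loss(probs, "fear", alpha))
# Seven hypothetical model outputs for one utterance.
print(majority_vote(["happy", "neutral", "happy", "sad", "neutral", "happy", "angry"]))  # happy
```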
-
SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation
Authors:
Koichi Saito,
Dongjun Kim,
Takashi Shibuya,
Chieh-Hsin Lai,
Zhi Zhong,
Yuhta Takida,
Yuki Mitsufuji
Abstract:
Sound content is an indispensable element for multimedia works such as video games, music, and films. Recent high-quality diffusion-based sound generation models can serve as valuable tools for creators. However, despite producing high-quality sounds, these models often suffer from slow inference speeds. This drawback burdens creators, who typically refine their sounds through trial and error to align them with their artistic intentions. To address this issue, we introduce Sound Consistency Trajectory Models (SoundCTM). Our model enables flexible transitioning between high-quality 1-step sound generation and superior sound quality through multi-step generation. This allows creators to initially control sounds with 1-step samples before refining them through multi-step generation. While CTM fundamentally achieves flexible 1-step and multi-step generation, its impressive performance heavily depends on an additional pretrained feature extractor and an adversarial loss, which are expensive to train and not always available in other domains. Thus, we reframe CTM's training framework and introduce a novel feature distance by utilizing the teacher's network for a distillation loss. Additionally, while distilling classifier-free guided trajectories, we train conditional and unconditional student models simultaneously and interpolate between these models during inference. We also propose training-free controllable frameworks for SoundCTM, leveraging its flexible sampling capability. SoundCTM achieves both promising 1-step and multi-step real-time sound generation without using any extra off-the-shelf networks. Furthermore, we demonstrate SoundCTM's capability of controllable sound generation in a training-free manner. Our code, pretrained models, and audio samples are available at https://github.com/sony/soundctm.
Submitted 10 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
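Interpolating between conditional and unconditional student models at inference can be pictured as the usual guidance blend uncond + w * (cond - uncond): w = 0 gives the unconditional output, w = 1 the conditional one, and w > 1 extrapolates as in classifier-free guidance. The sketch assumes a linear blend applied element-wise to the two students' outputs, with plain lists standing in for generated samples; exactly which quantity SoundCTM blends and the range of w it allows are not stated in the abstract, and the function name is hypothetical.

```python
def guided_output(cond_out, uncond_out, w):
    """Element-wise blend: uncond + w * (cond - uncond).

    0 <= w <= 1 interpolates between the two models; w > 1 extrapolates,
    as in standard classifier-free guidance. Hypothetical helper.
    """
    if len(cond_out) != len(uncond_out):
        raise ValueError("outputs must have the same length")
    return [u + w * (c - u) for c, u in zip(cond_out, uncond_out)]


cond = [1.0, 2.0]    # stand-in for the conditional student's output
uncond = [0.0, 0.0]  # stand-in for the unconditional student's output
print(guided_output(cond, uncond, 0.5))  # [0.5, 1.0]
```

Training both students jointly means this blend needs no extra network at sampling time, only a second forward pass.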
-
Revisiting the Message Passing in Heterophilous Graph Neural Networks
Authors:
Zhuonan Zheng,
Yuanchen Bei,
Sheng Zhou,
Yao Ma,
Ming Gu,
HongJia XU,
Chengyu Lai,
Jiawei Chen,
Jiajun Bu
Abstract:
Graph Neural Networks (GNNs) have demonstrated strong performance in graph mining tasks due to their message-passing mechanism, which is aligned with the homophily assumption that adjacent nodes exhibit similar behaviors. However, in many real-world graphs, connected nodes may display contrasting behaviors, termed heterophilous patterns, which have attracted increasing interest in heterophilous GNNs (HTGNNs). Although the message-passing mechanism seems unsuitable for heterophilous graphs due to the propagation of class-irrelevant information, it is still widely used in many existing HTGNNs and consistently achieves notable success. This raises the question: why does message passing remain effective on heterophilous graphs? To answer this question, in this paper, we revisit the message-passing mechanisms in heterophilous graph neural networks and reformulate them into a unified heterophilous message-passing (HTMP) mechanism. Based on HTMP and empirical analysis, we reveal that the success of message passing in existing HTGNNs is attributable to its implicit enhancement of the compatibility matrix among classes. Moreover, we argue that the full potential of the compatibility matrix is not realized due to incomplete and noisy semantic neighborhoods in real-world heterophilous graphs. To bridge this gap, we introduce a new approach named CMGNN, which operates within the HTMP mechanism to explicitly leverage and improve the compatibility matrix. A thorough evaluation involving 10 benchmark datasets and comparative analysis against 13 well-established baselines highlights the superior performance of the HTMP mechanism and CMGNN method.
Submitted 27 May, 2024;
originally announced May 2024.
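The compatibility matrix referred to above is commonly estimated from labelled nodes as row-normalised class-to-class edge counts: entry [a][b] is the fraction of edges leaving class-a nodes that land on class-b nodes. Mass on the diagonal signals homophily; off-diagonal mass signals heterophily. The sketch assumes fully labelled, undirected graphs and this empirical definition; how CMGNN estimates the matrix from incomplete or noisy neighbourhoods is not described in the abstract.

```python
def compatibility_matrix(edges, labels, num_classes):
    """H[a][b] = fraction of edge endpoints from class-a nodes that reach class b.

    Edges are treated as undirected, so each (u, v) counts in both directions.
    Rows for classes with no edges are left as all zeros.
    """
    counts = [[0] * num_classes for _ in range(num_classes)]
    for u, v in edges:
        counts[labels[u]][labels[v]] += 1
        counts[labels[v]][labels[u]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([x / total if total else 0.0 for x in row])
    return matrix


# Path graph 0-1-2-3 with alternating labels: every edge joins different classes.
edges = [(0, 1), (1, 2), (2, 3)]
labels = [0, 1, 0, 1]
print(compatibility_matrix(edges, labels, num_classes=2))  # [[0.0, 1.0], [1.0, 0.0]]
```

Even in this perfectly heterophilous example the matrix is highly informative: a node's neighbours reliably predict the opposite class, which is the kind of structure message passing can still exploit.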
-
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping
Authors:
Junyoung Seo,
Kazumi Fukuda,
Takashi Shibuya,
Takuya Narihira,
Naoki Murata,
Shoukang Hu,
Chieh-Hsin Lai,
Seungryong Kim,
Yuki Mitsufuji
Abstract:
Generating novel views from a single image remains a challenging task due to the complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a model on. Recent research combining large-scale text-to-image (T2I) models with monocular depth estimation (MDE) has shown promise in handling in-the-wild images. In these methods, an input view is geometrically warped to novel views with estimated depth maps, then the warped image is inpainted by T2I models. However, they struggle with noisy depth maps and loss of semantic details when warping an input view to novel viewpoints. In this paper, we propose a novel approach for single-shot novel view synthesis, a semantic-preserving generative warping framework that enables T2I generative models to learn where to warp and where to generate, through augmenting cross-view attention with self-attention. Our approach addresses the limitations of existing methods by conditioning the generative model on source view images and incorporating geometric warping signals. Qualitative and quantitative evaluations demonstrate that our model outperforms existing methods in both in-domain and out-of-domain scenarios. Project page is available at https://GenWarp-NVS.github.io/.
Submitted 27 May, 2024;
originally announced May 2024.
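The geometric warping step that such pipelines rely on ("an input view is geometrically warped to novel views with estimated depth maps") can be illustrated with standard pinhole-camera reprojection of a single pixel. The intrinsics `K` and relative pose `(R, t)` below are hypothetical; GenWarp's contribution, learning where to warp and where to generate, is not shown.

```python
import numpy as np

def warp_pixel(u, v, depth, K, R, t):
    """Forward-warp one source pixel into a target view with a pinhole camera.

    Unprojects (u, v) using its depth, applies the relative pose (R, t) that
    maps source-camera coordinates to target-camera coordinates, and reprojects.
    """
    pixel = np.array([u, v, 1.0])
    point_src = depth * (np.linalg.inv(K) @ pixel)  # 3D point in the source camera frame
    point_tgt = R @ point_src + t                    # same point in the target camera frame
    proj = K @ point_tgt
    return proj[:2] / proj[2]                        # perspective divide

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# Identity pose: the pixel maps to itself.
same = warp_pixel(100, 50, 2.0, K, np.eye(3), np.zeros(3))
# Target coordinates shifted by -0.1 along x relative to the source camera.
shifted = warp_pixel(320, 240, 2.0, K, np.eye(3), np.array([-0.1, 0.0, 0.0]))
```

Errors in `depth` move the reprojected pixel directly, which is why noisy depth maps degrade warping-then-inpainting methods.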
-
Crossmodal ASR Error Correction with Discrete Speech Units
Authors:
Yuanchao Li,
Pinzhen Chen,
Peter Bell,
Catherine Lai
Abstract:
ASR remains unsatisfactory in scenarios where the speaking style diverges from that used to train ASR systems, resulting in erroneous transcripts. To address this, ASR Error Correction (AEC), a post-ASR processing approach, is required. In this work, we tackle an understudied issue: the Low-Resource Out-of-Domain (LROOD) problem, by investigating crossmodal AEC on very limited downstream data with 1-best hypothesis transcription. We explore pre-training and fine-tuning strategies and uncover an ASR domain discrepancy phenomenon, shedding light on appropriate training schemes for LROOD data. Moreover, we propose the incorporation of discrete speech units to align with and enhance the word embeddings for improving AEC quality. Results from multiple corpora and several evaluation metrics demonstrate the feasibility and efficacy of our proposed AEC approach on LROOD data, as well as its generalizability and superiority on large-scale data. Finally, a study on speech emotion recognition confirms that our model produces ASR error-robust transcripts suitable for downstream applications.
Submitted 26 May, 2024;
originally announced May 2024.
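Discrete speech units are commonly obtained by assigning each frame of a self-supervised speech representation to its nearest k-means centroid and collapsing consecutive repeats. The sketch below assumes the centroids are already fitted; the abstract does not state which feature extractor or codebook size the authors use, so the toy values are hypothetical.

```python
import numpy as np

def to_discrete_units(features, centroids, dedup=True):
    """Map frame-level features (T x D) to discrete unit IDs.

    Each frame gets the index of its nearest centroid (squared Euclidean
    distance); optionally, runs of the same unit are collapsed to one.
    """
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    units = dists.argmin(axis=1)
    if not dedup or len(units) == 0:
        return units.tolist()
    collapsed = [int(units[0])]
    for u in units[1:]:
        if u != collapsed[-1]:
            collapsed.append(int(u))
    return collapsed

centroids = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
frames = np.array([[0.1, 0.0], [0.0, 0.2], [0.9, 1.1],
                   [5.2, 4.9], [4.8, 5.0], [0.1, 0.1]])
units = to_discrete_units(frames, centroids)
```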
-
Diffusion-Reward Adversarial Imitation Learning
Authors:
Chun-Mao Lai,
Hsiang-Chun Wang,
Ping-Chun Hsieh,
Yu-Chiang Frank Wang,
Min-Hung Chen,
Shao-Hua Sun
Abstract:
Imitation learning aims to learn a policy from observing expert demonstrations without access to reward signals from environments. Generative adversarial imitation learning (GAIL) formulates imitation learning as adversarial learning, employing a generator policy that learns to imitate expert behaviors and a discriminator that learns to distinguish the expert demonstrations from agent trajectories. Despite its encouraging results, GAIL training is often brittle and unstable. Inspired by the recent dominance of diffusion models in generative modeling, this work proposes Diffusion-Reward Adversarial Imitation Learning (DRAIL), which integrates a diffusion model into GAIL, aiming to yield more precise and smoother rewards for policy learning. Specifically, we propose a diffusion discriminative classifier to construct an enhanced discriminator; then, we design diffusion rewards based on the classifier's output for policy learning. We conduct extensive experiments in navigation, manipulation, and locomotion, verifying DRAIL's effectiveness compared to prior imitation learning methods. Moreover, additional experimental results demonstrate the generalizability and data efficiency of DRAIL. Visualized learned reward functions of GAIL and DRAIL suggest that DRAIL can produce more precise and smoother rewards.
Submitted 25 May, 2024;
originally announced May 2024.
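In GAIL-style training, the discriminator's output is turned into a reward for the policy. One common convention is r(s, a) = -log(1 - D(s, a)), where D is the probability that (s, a) comes from the expert. The sketch below shows only that generic mapping; DRAIL's diffusion-based reward is not given in the abstract and is not reproduced here.

```python
import numpy as np

def gail_reward(d_expert_prob, eps=1e-8):
    """Generic GAIL-style reward -log(1 - D(s, a)).

    d_expert_prob is the discriminator's probability that (s, a) is expert data.
    More expert-like pairs receive larger, non-negative rewards.
    """
    d = np.clip(d_expert_prob, eps, 1.0 - eps)  # keep the log finite
    return -np.log(1.0 - d)

rewards = gail_reward(np.array([0.1, 0.5, 0.9]))
```

Because the reward is a fixed function of D, a sharp or noisy discriminator yields sharp or noisy rewards, which is the brittleness that a smoother discriminator is meant to address.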
-
PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher
Authors:
Dongjun Kim,
Chieh-Hsin Lai,
Wei-Hsiang Liao,
Yuhta Takida,
Naoki Murata,
Toshimitsu Uesaka,
Yuki Mitsufuji,
Stefano Ermon
Abstract:
To accelerate sampling, diffusion models (DMs) are often distilled into generators that directly map noise to data in a single step. In this approach, the resolution of the generator is fundamentally limited by that of the teacher DM. To overcome this limitation, we propose Progressive Growing of Diffusion Autoencoder (PaGoDA), a technique to progressively grow the resolution of the generator beyond that of the original teacher DM. Our key insight is that a pre-trained, low-resolution DM can be used to deterministically encode high-resolution data to a structured latent space by solving the PF-ODE forward in time (data-to-noise), starting from an appropriately down-sampled image. Using this frozen encoder in an auto-encoder framework, we train a decoder by progressively growing its resolution. Because the decoder grows progressively, PaGoDA avoids re-training the teacher/student models when we upsample the student model, making the whole training pipeline much cheaper. In experiments, we used our progressively growing decoder to upsample from the pre-trained model's 64x64 resolution to generate 512x512 samples, achieving 2x faster inference compared to single-step distilled Stable Diffusion models such as LCM. PaGoDA also achieved state-of-the-art FIDs on ImageNet across all resolutions from 64x64 to 512x512. Additionally, we demonstrated PaGoDA's effectiveness in solving inverse problems and enabling controllable generation.
Submitted 23 May, 2024;
originally announced May 2024.
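Encoding data "by solving the PF-ODE forward in time (data-to-noise)" can be illustrated in a toy setting where the score is known in closed form: data distributed as N(0, s²) under variance-exploding noising, so the noisy marginal at level σ is N(0, s² + σ²). The sketch integrates dx/dσ = -σ ∇log p_σ(x) with explicit Euler steps; in PaGoDA a pre-trained low-resolution DM would supply the score, and the down-sampling step is omitted.

```python
import numpy as np

def pf_ode_encode(x0, data_std, sigma_max, n_steps=1000):
    """Deterministically map data to noise by integrating a probability-flow ODE.

    Toy assumption: data ~ N(0, data_std**2), so the noisy marginal at level
    sigma is N(0, data_std**2 + sigma**2) and its score is analytic.
    """
    sigmas = np.linspace(0.0, sigma_max, n_steps + 1)
    x = np.asarray(x0, dtype=float)
    for s0, s1 in zip(sigmas[:-1], sigmas[1:]):
        score = -x / (data_std**2 + s0**2)  # grad_x log p_sigma(x) at sigma = s0
        dx_dsigma = -s0 * score             # PF-ODE drift for variance-exploding noise
        x = x + (s1 - s0) * dx_dsigma       # explicit Euler step toward higher noise
    return x

x0 = np.array([0.5, -1.0])
latent = pf_ode_encode(x0, data_std=1.0, sigma_max=10.0)
# Closed-form solution of this toy ODE: x(sigma) = x0 * sqrt((s^2 + sigma^2) / s^2)
exact = x0 * np.sqrt(1.0 + 10.0**2)
```

The same input always yields the same latent, which is what makes the frozen encoder usable in an auto-encoder framework.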
-
Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information
Authors:
Toshimitsu Uesaka,
Taiji Suzuki,
Yuhta Takida,
Chieh-Hsin Lai,
Naoki Murata,
Yuki Mitsufuji
Abstract:
Multimodal representation learning to integrate different modalities, such as text, vision, and audio, is important for real-world applications. The symmetric InfoNCE loss proposed in CLIP is a key concept in multimodal representation learning. In this work, we provide a theoretical understanding of the symmetric InfoNCE loss through the lens of the pointwise mutual information and show that encoders that achieve the optimal similarity in the pretraining provide a good representation for downstream classification tasks under mild assumptions. Based on our theoretical results, we also propose a new similarity metric for multimodal contrastive learning by utilizing a nonlinear kernel to enrich the capability. To verify the effectiveness of the proposed method, we demonstrate pretraining of multimodal representation models on the Conceptual Captions datasets and evaluate zero-shot classification and linear classification on common benchmark datasets.
Submitted 29 April, 2024;
originally announced April 2024.
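The symmetric InfoNCE loss from CLIP averages two cross-entropy terms over a batch of matched pairs: image-to-text (rows of the similarity matrix) and text-to-image (columns). The sketch below uses plain cosine similarity and a fixed temperature of 0.07 as assumptions; the paper's proposed nonlinear-kernel similarity would replace the `img @ txt.T` line.

```python
import numpy as np

def log_softmax(logits, axis):
    shifted = logits - logits.max(axis=axis, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def symmetric_infonce(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE with cosine similarity.

    Row i of each matrix is a matched image-text pair; all other pairings
    in the batch act as negatives.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
aligned = symmetric_infonce(x, x)         # matched pairs are identical
shuffled = symmetric_infonce(x, x[::-1])  # pairs deliberately mismatched
```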
-
How to surpass no-go limits in Gaussian quantum error correction and entangled Gaussian state distillation?
Authors:
En-Jui Chang,
Ching-Yi Lai
Abstract:
Gaussian quantum information processing with continuous-variable (CV) quantum information carriers holds significant promise for applications in quantum communication and quantum internet. However, applying Gaussian state distillation and quantum error correction (QEC) faces limitations imposed by no-go results concerning local Gaussian unitary operations and classical communications. This paper introduces a Gaussian QEC protocol that relies solely on local Gaussian resources. A pivotal component of our approach is CV gate teleportation using entangled Gaussian states, which facilitates the implementation of the partial transpose operation on a quantum channel. Consequently, we can efficiently construct a two-mode noise-polarized channel from two noisy Gaussian channels. Furthermore, this QEC protocol naturally extends to a nonlocal Gaussian state distillation protocol.
Submitted 7 May, 2024; v1 submitted 29 April, 2024;
originally announced April 2024.
-
Machine learning for climate physics and simulations
Authors:
Ching-Yao Lai,
Pedram Hassanzadeh,
Aditi Sheshadri,
Maike Sonnewald,
Raffaele Ferrari,
Venkatramani Balaji
Abstract:
We discuss the emerging advances and opportunities at the intersection of machine learning (ML) and climate physics, highlighting the use of ML techniques, including supervised, unsupervised, and equation discovery, to accelerate climate knowledge discoveries and simulations. We delineate two distinct yet complementary aspects: (1) ML for climate physics and (2) ML for climate simulations. While physics-free ML-based models, such as ML-based weather forecasting, have demonstrated success when data is abundant and stationary, the physics knowledge and interpretability of ML models become crucial in the small-data/non-stationary regime to ensure generalizability. Given the absence of observations, the long-term future climate falls into the small-data regime. Therefore, ML for climate physics holds a critical role in addressing the challenges of ML for climate simulations. We emphasize the need for collaboration among climate physics, ML theory, and numerical analysis to achieve reliable ML-based models for climate applications.
Submitted 19 April, 2024;
originally announced April 2024.
-
A Large-Scale Evaluation of Speech Foundation Models
Authors:
Shu-wen Yang,
Heng-Jui Chang,
Zili Huang,
Andy T. Liu,
Cheng-I Lai,
Haibin Wu,
Jiatong Shi,
Xuankai Chang,
Hsiang-Sheng Tsai,
Wen-Chin Huang,
Tzu-hsun Feng,
Po-Han Chi,
Yist Y. Lin,
Yung-Sung Chuang,
Tzu-Hsien Huang,
Wei-Cheng Tseng,
Kushal Lakhotia,
Shang-Wen Li,
Abdelrahman Mohamed,
Shinji Watanabe,
Hung-yi Lee
Abstract:
The foundation model paradigm leverages a shared foundation model to achieve state-of-the-art (SOTA) performance for various tasks, requiring minimal downstream-specific modeling and data annotation. This approach has proven crucial in the field of Natural Language Processing (NLP). However, the speech processing community lacks a similar setup to explore the paradigm systematically. In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the paradigm for speech. We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads. Combining our results with community submissions, we verify that the foundation model paradigm is promising for speech, and our multi-tasking framework is simple yet effective, as the best-performing foundation model shows competitive generalizability across most SUPERB tasks. For reproducibility and extensibility, we have developed a long-term maintained platform that enables deterministic benchmarking, allows for result sharing via an online leaderboard, and promotes collaboration through a community-driven benchmark database to support new development cycles. Finally, we conduct a series of analyses to offer an in-depth understanding of SUPERB and speech foundation models, including information flows across tasks inside the models, the correctness of the weighted-sum benchmarking protocol and the statistical significance and robustness of the benchmark.
Submitted 29 May, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
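The weighted-sum protocol mentioned above combines the hidden states of every layer of the frozen foundation model using softmax-normalised trainable scalars, and feeds the result to a lightweight task head. A minimal sketch of that combination step, assuming an (L, T, D) stack of layer outputs; the head and the training loop are omitted.

```python
import numpy as np

def weighted_layer_sum(hidden_states, layer_logits):
    """Combine per-layer features of a frozen upstream model.

    hidden_states: (L, T, D) array, one slice per layer.
    layer_logits:  (L,) trainable scalars; a softmax turns them into layer weights.
    """
    w = np.exp(layer_logits - layer_logits.max())
    w = w / w.sum()
    return np.tensordot(w, hidden_states, axes=1)  # weighted sum over layers -> (T, D)

# Layer k is filled with the constant value k, so the output reveals the weighting.
layers = np.stack([np.full((3, 2), float(k)) for k in range(4)])
uniform = weighted_layer_sum(layers, np.zeros(4))
```

Inspecting the learned weights afterwards indicates which layers a task draws on, which is one way to study information flow inside the models.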
-
Efficient Ground State Estimation Using Generalized Hund's Rule
Authors:
Leo Chiang,
Ching-Jui Lai
Abstract:
Quantum computers offer a promising approach to simulate the ground state of molecules, which is crucial for understanding molecular properties and chemical reactions. However, the limited number of available qubits on current devices poses a challenge for simulation. This paper investigates the feasibility of reducing the qubit usage of molecular simulation by examining specific fermionic states according to Hund's rule.
We introduce a new framework based on qubit efficiency encoding. Based on this framework, the Hamiltonian is restricted to the Hund subspace. Compared with an encoding that only enforces particle conservation, the proposed method can reduce qubit usage by $N$ for a molecule with $M$ orbitals and $N$ electrons when $M\gg N$. Additionally, when using the STO-3G basis sets, the simulations of the $15$ molecules with given molecular geometry by the proposed method are close to the full configuration interaction, with an absolute difference of at most $0.121\%$. Meanwhile, predictions from potential energy surfaces using the proposed method have an absolute difference of at most $4.1\%$.
Submitted 4 April, 2024;
originally announced April 2024.
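To give a sense of scale for the baseline the abstract compares against, the sketch below counts qubits for two reference encodings: one qubit per spin orbital (Jordan-Wigner), and a binary index over all $N$-electron states in $2M$ spin orbitals, which is how we assume a particle-conserving qubit-efficient encoding counts qubits. The Hund-subspace encoding itself is not reproduced, and the $M$, $N$ values are hypothetical.

```python
from math import ceil, comb, log2

def jordan_wigner_qubits(num_spatial_orbitals):
    """One qubit per spin orbital."""
    return 2 * num_spatial_orbitals

def particle_conserving_qubits(num_spatial_orbitals, num_electrons):
    """Qubits needed to label every N-electron state of 2M spin orbitals in binary."""
    dim = comb(2 * num_spatial_orbitals, num_electrons)
    return ceil(log2(dim))

# Hypothetical example: M = 10 spatial orbitals, N = 4 electrons
jw = jordan_wigner_qubits(10)
pc = particle_conserving_qubits(10, 4)  # C(20, 4) = 4845 states
```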
-
On the Springer correspondence for wreath products
Authors:
You-Hung Hsu,
Chun-Ju Lai
Abstract:
We first show that the wreath product $Σ_m\wr Σ_d$ between two symmetric groups appears as the generalized Weyl group of an Iwahori's generalized Tits system. We then introduce a certain subvariety of the flag variety of type A, and then give a geometric proof of its Bruhat decomposition indexed by $Σ_m\wr Σ_d$, via the Bialynicki-Birula decomposition. Furthermore, we realize the group algebra $\mathbb{Q}[Σ_m\wr Σ_d]$ as the top Borel-Moore homology of a Steinberg variety. Such a geometric realization leads to a Springer correspondence for the irreducible representations over $\mathbb{C}[Σ_m\wr Σ_d]$, which can be regarded as a counterpart of the Clifford theory for wreath products. Consequently, we have obtained a new Springer correspondence of type B/C/D using essentially type A geometry.
Submitted 18 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
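As a concrete check on the group involved, $Σ_m\wr Σ_d$ can be realised as permutations of $md$ points split into $d$ blocks of size $m$: one permutation acts inside each block and a permutation in $Σ_d$ moves the blocks, giving order $(m!)^d\,d!$. For $m=2$ this is the hyperoctahedral group of order $2^d d!$, the Weyl group of type $B_d$/$C_d$. The enumeration below is a brute-force illustration only, not part of the paper's geometric construction.

```python
from itertools import permutations, product
from math import factorial

def wreath_order(m, d):
    """|Sigma_m wr Sigma_d| = (m!)^d * d!."""
    return factorial(m) ** d * factorial(d)

def wreath_elements(m, d):
    """Enumerate Sigma_m wr Sigma_d as permutations of {0, ..., m*d - 1}.

    Point b*m + i (position i in block b) is sent to position inners[b][i]
    of block outer[b].
    """
    inner_choices = list(permutations(range(m)))
    elements = set()
    for outer in permutations(range(d)):
        for inners in product(inner_choices, repeat=d):
            image = tuple(outer[b] * m + inners[b][i] for b in range(d) for i in range(m))
            elements.add(image)
    return elements

b3 = wreath_elements(2, 3)  # type B_3 Weyl group
```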
-
RCdpia: A Renal Carcinoma Digital Pathology Image Annotation dataset based on pathologists
Authors:
Qingrong Sun,
Weixiang Zhong,
Jie Zhou,
Chong Lai,
Xiaodong Teng,
Maode Lai
Abstract:
The annotation of digital pathological slide data for renal cell carcinoma is of paramount importance for accurate diagnosis by artificial intelligence models due to the heterogeneous nature of the tumor. This process not only facilitates a deeper understanding of renal cell cancer heterogeneity but also aims to minimize noise in the data for more accurate studies. To enhance the applicability of the data, two pathologists were enlisted to meticulously curate, screen, and label a kidney cancer pathology image dataset from The Cancer Genome Atlas Program (TCGA) database. Subsequently, a ResNet model was developed to validate the annotated dataset against an additional dataset from the First Affiliated Hospital of Zhejiang University. Based on these results, we have meticulously compiled the TCGA digital pathological dataset with independent labeling of tumor regions and adjacent areas (RCdpia), which includes 109 cases of kidney chromophobe cell carcinoma, 486 cases of kidney clear cell carcinoma, and 292 cases of kidney papillary cell carcinoma. This dataset is now publicly accessible at http://39.171.241.18:8888/RCdpia/. Furthermore, model analysis has revealed significant discrepancies in predictive outcomes when applying the same model to datasets from different centers. Leveraging the RCdpia, we can now develop more precise digital pathology artificial intelligence models for tasks such as normalization, classification, and segmentation. These advancements underscore the potential for more nuanced and accurate AI applications in the field of digital pathology.
Submitted 17 March, 2024;
originally announced March 2024.
-
Harnessing Coding Theory for Reliable Network Quantum Communication
Authors:
Ching-Yi Lai,
Kao-Yueh Kuo
Abstract:
This article explores the application of coding techniques for fault-tolerant quantum computation and extends their usage to fault-tolerant quantum communication. We review repeater-based quantum networks, emphasizing the roles of coding theory and fault-tolerant quantum operations, particularly in the context of quantum teleportation. We highlight that fault-tolerant implementation of the Bell measurement enables reliable quantum communication without requiring a universal set of quantum gates. Finally, we discuss various quantum code candidates for achieving higher transmission rates.
Submitted 29 February, 2024;
originally announced February 2024.
-
In-beam test results of an RPC-based module for position-sensitive neutron detectors with timing readout
Authors:
G. Canezin,
L. M. S. Margato,
A. Morozov,
A. Blanco,
J. Saraiva,
L. Lopes,
P. Fonte,
Chung Chuan Lai,
Per-Olof Svensson,
G. Markaj,
Florian M. Piegsa
Abstract:
Recently we have proposed a new concept of a thermal neutron detector based on resistive plate chambers and $^{10}$B$_4$C solid neutron converters, enabling high-resolution readout of both the 3D position of neutron capture and the neutron time of flight (ToF). In this paper, we report the results of the first beam tests conducted with a new neutron RPC detection module, coupled to position readout units of a new design. The main focus is on the measurements of the neutron ToF and identification of the converter layer where the neutron is captured, which gives the position along the beam direction.
Submitted 23 February, 2024;
originally announced February 2024.
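Neutron time of flight is usually converted to wavelength through the de Broglie relation λ = h t / (m_n L), where L is the flight path. Knowing which converter layer captured the neutron refines L along the beam direction. The sketch below assumes a simple geometry with layers stacked at a fixed pitch; the base path and pitch are hypothetical, not the module's actual dimensions.

```python
H_PLANCK = 6.62607015e-34      # Planck constant, J s (exact SI value)
M_NEUTRON = 1.67492749804e-27  # neutron mass, kg (CODATA 2018)

def neutron_wavelength_angstrom(tof_s, flight_path_m):
    """de Broglie wavelength from time of flight: lambda = h * t / (m_n * L), in angstrom."""
    return H_PLANCK * tof_s / (M_NEUTRON * flight_path_m) * 1e10

def flight_path_for_layer(base_path_m, layer_index, layer_pitch_m):
    """Hypothetical geometry: converter layers stacked along the beam at a fixed pitch."""
    return base_path_m + layer_index * layer_pitch_m

# A thermal neutron of about 1.8 angstrom over a 10 m flight path
t = 1.8e-10 * M_NEUTRON * 10.0 / H_PLANCK  # roughly 4.55 ms
lam = neutron_wavelength_angstrom(t, 10.0)
```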
-
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Authors:
Zechun Liu,
Changsheng Zhao,
Forrest Iandola,
Chen Lai,
Yuandong Tian,
Igor Fedorov,
Yunyang Xiong,
Ernie Chang,
Yangyang Shi,
Raghuraman Krishnamoorthi,
Liangzhen Lai,
Vikas Chandra
Abstract:
This paper addresses the growing need for efficient large language models (LLMs) on mobile devices, driven by increasing cloud costs and latency concerns. We focus on designing top-quality LLMs with fewer than a billion parameters, a practical choice for mobile deployment. Contrary to the prevailing belief emphasizing the pivotal role of data and parameter quantity in determining model quality, our investigation underscores the significance of model architecture for sub-billion scale LLMs. Leveraging deep and thin architectures, coupled with embedding sharing and grouped-query attention mechanisms, we establish a strong baseline network denoted as MobileLLM, which attains a remarkable 2.7%/4.3% accuracy boost over preceding 125M/350M state-of-the-art models. Additionally, we propose an immediate block-wise weight-sharing approach with no increase in model size and only marginal latency overhead. The resultant models, denoted as MobileLLM-LS, demonstrate a further accuracy enhancement of 0.7%/0.8% over MobileLLM 125M/350M. Moreover, the MobileLLM model family shows significant improvements compared to previous sub-billion models on chat benchmarks, and achieves correctness close to that of LLaMA-v2 7B in API calling tasks, highlighting the capability of small models for common on-device use cases.
Submitted 26 June, 2024; v1 submitted 22 February, 2024;
originally announced February 2024.
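One reading of "immediate block-wise weight sharing" is that each block is executed twice in a row with the same weights, doubling effective depth without adding parameters. The sketch below illustrates that reading with a toy residual block standing in for a transformer block; the block structure and sizes are assumptions, not the MobileLLM architecture.

```python
import numpy as np

def execution_order(num_blocks, repeats=2):
    """Block indices in the order they run when each block is repeated immediately."""
    return [b for b in range(num_blocks) for _ in range(repeats)]

def forward(x, block_weights, repeats=2):
    """Toy residual stack: each weight matrix is applied `repeats` times in a row."""
    for b in execution_order(len(block_weights), repeats):
        x = x + np.tanh(x @ block_weights[b])  # stand-in for a transformer block
    return x

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(4, 4)) for _ in range(3)]
n_params = sum(w.size for w in weights)                # unchanged by sharing
effective_depth = len(execution_order(len(weights)))  # twice the number of blocks
```

Repeating a block back-to-back, rather than cycling through the whole stack, lets its weights stay in fast memory between the two calls, which is a plausible reason the latency overhead is described as marginal.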
-
Influence of thermal effects on atomic Bloch oscillation
Authors:
Guoling Yin,
Chi-Kin Lai,
Nana Chang,
Yi Liang,
Dekai Mao,
Xiaoji Zhou
Abstract:
Advancements in the experimental toolbox of cold atoms have enabled the meticulous control of atomic Bloch oscillation within optical lattices, thereby enhancing the capabilities of gravity interferometers. This work delves into the impact of thermal effects on Bloch oscillation in 1D accelerated optical lattices aligned with gravity by varying the system's initial temperature. Through the application of Raman cooling, we effectively reduce the longitudinal thermal effect, stabilizing the longitudinal coherence length over the timescale of its lifetime. The atomic losses over multiple Bloch oscillations are measured and are primarily attributed to transverse excitation. Furthermore, we identify two distinct inverse scaling behaviors in the oscillation lifetime scaled by the corresponding density with respect to temperatures, implying diverse equilibrium processes within or outside the Bose-Einstein condensate regime. The competition between the system's coherence and atomic density leads to a relatively smooth variation in the actual lifetime versus temperature. Our findings provide valuable insights into the interaction between thermal effects and Bloch oscillation, offering avenues for the refinement of quantum measurement technologies.
Submitted 22 February, 2024;
originally announced February 2024.
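For a lattice aligned with gravity, the Bloch period is T_B = h / (m g d), with lattice spacing d = λ/2 for a retro-reflected beam; inverting it gives g = h / (m d T_B), which is how Bloch oscillations serve gravimetry. The atomic species and lattice wavelength below (⁸⁷Rb, 1064 nm) are assumptions for illustration, not values stated in the abstract.

```python
H_PLANCK = 6.62607015e-34             # Planck constant, J s
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg

def bloch_period(mass_kg, lattice_wavelength_m, g=9.80):
    """T_B = h / (m g d) with lattice spacing d = lambda / 2."""
    d = lattice_wavelength_m / 2.0
    return H_PLANCK / (mass_kg * g * d)

def gravity_from_period(mass_kg, lattice_wavelength_m, period_s):
    """Invert the Bloch period to estimate the local acceleration g."""
    d = lattice_wavelength_m / 2.0
    return H_PLANCK / (mass_kg * d * period_s)

# Assumed example: 87Rb in a 1064 nm lattice
m_rb87 = 86.909 * ATOMIC_MASS_UNIT
T_B = bloch_period(m_rb87, 1064e-9)  # about 0.88 ms
```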
-
Learned Image Compression with Text Quality Enhancement
Authors:
Chih-Yu Lai,
Dung Tran,
Kazuhito Koishida
Abstract:
Learned image compression has gained widespread popularity for its efficiency in achieving ultra-low bit-rates. Yet, images containing substantial textual content, particularly screen-content images (SCI), often suffer from text distortion at such compressed levels. To address this, we propose to minimize a novel text logit loss designed to quantify the disparity in text between the original and reconstructed images, thereby improving the perceptual quality of the reconstructed text. Through rigorous experimentation across diverse datasets and employing state-of-the-art algorithms, our findings reveal significant enhancements in the quality of reconstructed text upon integration of the proposed loss function with appropriate weighting. Notably, we achieve a Bjontegaard delta (BD) rate of -32.64% for Character Error Rate (CER) and -28.03% for Word Error Rate (WER) on average by applying the text logit loss for two screenshot datasets. Additionally, we present quantitative metrics tailored for evaluating text quality in image compression tasks. Our findings underscore the efficacy and potential applicability of our proposed text logit loss function across various text-aware image compression contexts.
Submitted 13 February, 2024;
originally announced February 2024.
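BD-rate figures like those quoted above come from the Bjontegaard procedure: fit log-rate as a function of the quality metric for each codec, integrate both fits over the overlapping metric range, and convert the mean log-rate gap into a percentage. The sketch below implements the classic cubic-polynomial variant with CER as the metric; the paper may use a different interpolation, and the rate/CER points are hypothetical.

```python
import numpy as np

def bd_rate(rate_anchor, metric_anchor, rate_test, metric_test):
    """Bjontegaard delta rate (%) at equal metric value.

    Negative values mean the test codec needs fewer bits than the anchor
    to reach the same metric (here, the same CER).
    """
    poly_a = np.polyfit(metric_anchor, np.log(rate_anchor), 3)  # log-rate vs metric
    poly_t = np.polyfit(metric_test, np.log(rate_test), 3)
    lo = max(np.min(metric_anchor), np.min(metric_test))        # overlapping range
    hi = min(np.max(metric_anchor), np.max(metric_test))
    int_a = np.polyval(np.polyint(poly_a), hi) - np.polyval(np.polyint(poly_a), lo)
    int_t = np.polyval(np.polyint(poly_t), hi) - np.polyval(np.polyint(poly_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (np.exp(avg_log_diff) - 1.0) * 100.0

cer = np.array([0.30, 0.20, 0.12, 0.08])           # hypothetical CER values
anchor_rates = np.array([0.05, 0.10, 0.20, 0.40])  # hypothetical bits per pixel
test_rates = anchor_rates * 0.5                    # same CER at half the rate
result = bd_rate(anchor_rates, cer, test_rates, cer)
```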
-
Multi-Blade detector with VMM3a-ASIC-based readout: installation and commissioning at the reflectometer Amor at PSI
Authors:
F. Piscitelli,
F. Ghazi Moradi,
F. S. Alves,
M. J. Christensen,
J. Hrivnak,
A. Johansson,
K. Fissum,
C. C. Lai,
A. Monera Martinez,
D. Pfeiffer,
E. Shahu,
J. Stahn,
P. O. Svensson
Abstract:
The Multi-Blade (MB) Boron-10-based neutron detector is the chosen technology for three instruments at the European Spallation Source (ESS): the two ESS reflectometers, ESTIA and FREIA, and the Test Beam Line. A fourth MB detector has been built, installed and commissioned for the user operation of the reflectometer Amor at PSI (Switzerland). Amor can be considered a downscaled version of the ESS reflectometer ESTIA. They are based on the same Selene guide concept, optimized for performing focusing reflectometry on small samples. The experience gained at Amor is invaluable for the future deployment of the MB detector at the ESS. This manuscript describes the MB detector construction and installation at Amor along with the readout electronics chain based on the VMM3a ASIC. The readout chain deployed at Amor is equivalent to that of the ESS, including the readout master module (RMM), event-formation-units (EFUs), Kafka, FileWriter and live visualisation tools.
Submitted 18 March, 2024; v1 submitted 13 February, 2024;
originally announced February 2024.
-
Layer-Wise Analysis of Self-Supervised Acoustic Word Embeddings: A Study on Speech Emotion Recognition
Authors:
Alexandra Saliba,
Yuanchao Li,
Ramon Sanabria,
Catherine Lai
Abstract:
The efficacy of self-supervised speech models has been validated, yet the optimal utilization of their representations remains challenging across diverse tasks. In this study, we delve into Acoustic Word Embeddings (AWEs), a fixed-length feature derived from continuous representations, to explore their advantages in specific tasks. AWEs have previously shown utility in capturing acoustic discriminability. In light of this, we propose measuring layer-wise similarity between AWEs and word embeddings, aiming to further investigate the inherent context within AWEs. Moreover, we evaluate the contribution of AWEs, in comparison to other types of speech features, in the context of Speech Emotion Recognition (SER). Through a comparative experiment and a layer-wise accuracy analysis on two distinct corpora, IEMOCAP and ESD, we explore differences between AWEs and raw self-supervised representations, as well as the proper utilization of AWEs alone and in combination with word embeddings. Our findings underscore the acoustic context conveyed by AWEs and showcase the highly competitive SER accuracies by appropriately employing AWEs.
Submitted 4 February, 2024;
originally announced February 2024.
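An acoustic word embedding turns a variable-length run of frames into one fixed-length vector. Mean pooling over each word's frame span, applied separately to every layer of a self-supervised model, is one simple way to obtain layer-wise AWEs; it is assumed here for illustration, since the abstract does not say which pooling the authors use, and the word spans are hypothetical.

```python
import numpy as np

def acoustic_word_embeddings(frames, word_spans):
    """Mean-pool frame features (T x D) over each word's [start, end) span -> (W x D)."""
    return np.stack([frames[start:end].mean(axis=0) for start, end in word_spans])

def layerwise_awes(hidden_states, word_spans):
    """Apply the same pooling to every layer of an (L x T x D) stack -> (L x W x D)."""
    return np.stack([acoustic_word_embeddings(layer, word_spans) for layer in hidden_states])

frames = np.arange(12, dtype=float).reshape(6, 2)  # 6 frames, 2 feature dims
spans = [(0, 2), (2, 6)]                           # hypothetical word boundaries
awes = acoustic_word_embeddings(frames, spans)
```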
-
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering
Authors:
Haibo Wang,
Chenghang Lai,
Yixuan Sun,
Weifeng Ge
Abstract:
Video Question Answering (VideoQA) aims to answer natural language questions based on the information observed in videos. Despite the recent success of Large Multimodal Models (LMMs) in image-language understanding and reasoning, they handle VideoQA insufficiently: they simply take uniformly sampled frames as visual inputs, which ignores question-relevant visual clues. Moreover, existing VideoQA datasets contain no human annotations for question-critical timestamps. In light of this, we propose a novel weakly supervised framework that forces the LMMs to reason out the answers with question-critical moments as visual inputs. Specifically, we first fuse the question and answer pairs as event descriptions to find multiple keyframes as target moments and pseudo-labels, using the visual-language alignment capability of the CLIP models. With these pseudo-labeled keyframes as additional weak supervision, we devise a lightweight Gaussian-based Contrastive Grounding (GCG) module. GCG learns multiple Gaussian functions to characterize the temporal structure of the video, and samples question-critical frames as positive moments to be the visual inputs of LMMs. Extensive experiments on several benchmarks verify the effectiveness of our framework, and we achieve substantial improvements compared to previous state-of-the-art methods.
Submitted 26 April, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Should ChatGPT Write Your Breakup Text? Exploring the Role of AI in Relationship Dissolution
Authors:
Yue Fu,
Yixin Chen,
Zelia Gomes Da Costa Lai,
Alexis Hiniker
Abstract:
Relationships are essential to our happiness and wellbeing. The dissolution of a relationship, the final stage of a relationship's lifecycle and one of the most stressful events in an individual's life, can have profound and long-lasting impacts on people. With the breakup process increasingly facilitated by computer-mediated communication (CMC), and the likely future influence of AI-mediated communication (AIMC) tools, we conducted a semi-structured interview study with 21 participants. We aim to understand: 1) the current role of technology in the breakup process, 2) the needs and support individuals have during the process, and 3) how AI might address these needs. Our research shows that people have distinct needs at various stages of ending a relationship. Presently, technology is used for information gathering and community support, acting as a catalyst for breakups, enabling ghosting and blocking, and facilitating communication. Participants anticipate that AI could aid in sense-making of their relationship leading up to the breakup, act as a mediator, assist in crafting appropriate wording, tones, and language during breakup conversations, and support companionship, reflection, recovery, and growth after a breakup. Our findings also demonstrate an overlap between the breakup process and the Transtheoretical Model (TTM) of behavior change. Through the lens of TTM, we explore the potential support and affordances AI could offer in breakups, including its benefits and the necessary precautions regarding AI's role in this sensitive process.
Submitted 17 January, 2024;
originally announced January 2024.
-
Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans
Authors:
Messi H. J. Lee,
Jacob M. Montgomery,
Calvin K. Lai
Abstract:
Large language models (LLMs) are becoming pervasive in everyday life, yet their propensity to reproduce biases inherited from training data remains a pressing concern. Prior investigations into bias in LLMs have focused on the association of social groups with stereotypical attributes. However, this is only one form of human bias such systems may reproduce. We investigate a new form of bias in LLMs that resembles a social psychological phenomenon where socially subordinate groups are perceived as more homogeneous than socially dominant groups. We had ChatGPT, a state-of-the-art LLM, generate texts about intersectional group identities and compared those texts on measures of homogeneity. We consistently found that ChatGPT portrayed African, Asian, and Hispanic Americans as more homogeneous than White Americans, indicating that the model described racial minority groups with a narrower range of human experience. ChatGPT also portrayed women as more homogeneous than men, but these differences were small. Finally, we found that the effect of gender differed across racial/ethnic groups such that the effect of gender was consistent within African and Hispanic Americans but not within Asian and White Americans. We argue that the tendency of LLMs to describe groups as less diverse risks perpetuating stereotypes and discriminatory behavior.
Submitted 25 April, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
Self-Supervised Millimeter Wave Indoor Localization using Tiny Neural Networks
Authors:
Anish Shastri,
Steve Blandino,
Camillo Gentile,
Chiehping Lai,
Paolo Casari
Abstract:
The quasi-optical propagation of millimeter-wave signals enables high-accuracy localization algorithms that employ geometric approaches or machine learning models. However, most algorithms require information on the indoor environment, may entail the collection of large training datasets, or bear an infeasible computational burden for commercial off-the-shelf (COTS) devices. In this work, we propose to use tiny neural networks (NNs) to learn the relationship between angle difference-of-arrival (ADoA) measurements and locations of a receiver in an indoor environment. To relieve training data collection efforts, we resort to a self-supervised approach by bootstrapping the training of our neural network through location estimates obtained from a state-of-the-art localization algorithm. We evaluate our scheme via mmWave measurements from indoor 60-GHz double-directional channel sounding. We process the measurements to yield dominant multipath components, use the corresponding angles to compute ADoA values, and finally obtain location fixes. Results show that the tiny NN achieves sub-meter errors in 74% of the cases, thus performing as well as or even better than the state-of-the-art algorithm, with significantly lower computational complexity.
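As a rough illustration of the ADoA input, the sketch below assumes the angles of arrival (in degrees) of the dominant multipath components have already been extracted, and takes each angle's difference relative to the first path. Because a common rotation of the receiver shifts every angle by the same amount, the differences do not depend on the device's unknown orientation. The choice of reference path and the function names are assumptions; the tiny NN itself is not shown.

```python
from typing import List

def wrap_deg(angle: float) -> float:
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def adoa(aoas_deg: List[float]) -> List[float]:
    """Angle differences of arrival of each path relative to the first (reference) path."""
    if len(aoas_deg) < 2:
        raise ValueError("need at least two multipath components")
    ref = aoas_deg[0]
    return [wrap_deg(a - ref) for a in aoas_deg[1:]]

# Hypothetical AoAs of three dominant paths, in the receiver's own frame.
aoas = [10.0, 100.0, 350.0]
print(adoa(aoas))  # [90.0, -20.0]

# Rotating the receiver by an unknown offset leaves the ADoA values unchanged.
rotated = [wrap_deg(a + 37.0) for a in aoas]
print(adoa(rotated))  # [90.0, -20.0]
```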
Submitted 2 January, 2024;
originally announced January 2024.
-
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes
Authors:
Yuhta Takida,
Yukara Ikemiya,
Takashi Shibuya,
Kazuki Shimada,
Woosung Choi,
Chieh-Hsin Lai,
Naoki Murata,
Toshimitsu Uesaka,
Kengo Uchida,
Wei-Hsiang Liao,
Yuki Mitsufuji
Abstract:
Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity reconstructions. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the codebook is not efficiently used to express the data, which degrades reconstruction accuracy. To mitigate this problem, we propose a novel unified framework to stochastically learn hierarchical discrete representations on the basis of the variational Bayes framework, called hierarchically quantized variational autoencoder (HQ-VAE). HQ-VAE naturally generalizes the hierarchical variants of VQ-VAE, such as VQ-VAE-2 and residual-quantized VAE (RQ-VAE), and provides them with a Bayesian training scheme. Our comprehensive experiments on image datasets show that HQ-VAE enhances codebook usage and improves reconstruction performance. We also validated HQ-VAE in terms of its applicability to a different modality with an audio dataset.
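Codebook collapse is often quantified through codebook perplexity, the exponential of the entropy of the code-assignment frequencies: it equals the number of codes in use when they are used uniformly, and 1 when everything collapses onto a single code. The abstract does not name the usage measure, so this is a minimal sketch assuming that metric, with hypothetical code indices.

```python
import math
from collections import Counter
from typing import Iterable

def codebook_perplexity(indices: Iterable[int]) -> float:
    """exp(entropy) of how often each codebook entry is selected."""
    counts = Counter(indices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no code assignments given")
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)

# Hypothetical code indices chosen by a quantizer for 8 latent vectors.
healthy = [0, 1, 2, 3, 4, 5, 6, 7]    # every entry used once
collapsed = [3, 3, 3, 3, 3, 3, 3, 3]  # all vectors mapped to one entry
print(round(codebook_perplexity(healthy), 6), round(codebook_perplexity(collapsed), 6))  # 8.0 1.0
```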
Submitted 28 March, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
Towards More Faithful Natural Language Explanation Using Multi-Level Contrastive Learning in VQA
Authors:
Chengen Lai,
Shengli Song,
Shiqi Meng,
Jingyang Li,
Sitong Yan,
Guangneng Hu
Abstract:
Natural language explanation in visual question answering (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences, increasing users' trust in black-box systems. Existing post-hoc methods have achieved significant progress in obtaining plausible explanations. However, such post-hoc explanations are not always aligned with human logical inference and suffer from three issues: 1) deductive unsatisfiability, where the generated explanations do not logically lead to the answer; 2) factual inconsistency, where the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) semantic perturbation insensitivity, where the model cannot recognize the semantic changes caused by small perturbations. These problems reduce the faithfulness of the explanations generated by models. To address them, we propose a novel self-supervised Multi-level Contrastive Learning based natural language Explanation model (MCLE) for VQA with semantic-level, image-level, and instance-level factual and counterfactual samples. MCLE extracts discriminative features and aligns the feature spaces of explanations with the visual question and answer to generate more consistent explanations. We conduct extensive experiments, ablation analysis, and a case study to demonstrate the effectiveness of our method on two VQA-NLE benchmarks.
Submitted 21 December, 2023;
originally announced December 2023.
-
Erdős similarity problem via bi-Lipschitz embedding
Authors:
De-jun Feng,
Chun-Kit Lai,
Ying Xiong
Abstract:
The Erdős similarity conjecture asserts that an infinite set of real numbers cannot be affinely embedded into every measurable set of positive Lebesgue measure. The problem is still open, in particular for all fast-decaying sequences. In this paper, we relax the problem to bi-Lipschitz embeddings and obtain some sharp criteria for the bi-Lipschitz Erdős similarity problem for strictly decreasing sequences.
Submitted 3 December, 2023;
originally announced December 2023.
-
Manifold Preserving Guided Diffusion
Authors:
Yutong He,
Naoki Murata,
Chieh-Hsin Lai,
Yuhta Takida,
Toshimitsu Uesaka,
Dongjun Kim,
Wei-Hsiang Liao,
Yuki Mitsufuji,
J. Zico Kolter,
Ruslan Salakhutdinov,
Stefano Ermon
Abstract:
Despite the recent advancements, conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training. In this paper, we propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework that leverages pretrained diffusion models and off-the-shelf neural networks with minimal additional inference cost for a broad range of tasks. Specifically, we leverage the manifold hypothesis to refine the guided diffusion steps and introduce a shortcut algorithm in the process. We then propose two methods for on-manifold training-free guidance using pre-trained autoencoders and demonstrate that our shortcut inherently preserves the manifolds when applied to latent diffusion models. Our experiments show that MPGD is efficient and effective for solving a variety of conditional generation applications in low-compute settings, and can consistently offer up to 3.8x speed-ups with the same number of diffusion steps while maintaining high sample quality compared to the baselines.
Submitted 27 November, 2023;
originally announced November 2023.
-
Semidefinite programming bounds on the size of entanglement-assisted codeword stabilized quantum codes
Authors:
Ching-Yi Lai,
Pin-Chieh Tseng,
Wei-Hsuan Yu
Abstract:
In this paper, we explore the application of semidefinite programming to the realm of quantum codes, specifically focusing on codeword stabilized (CWS) codes with entanglement assistance. Notably, we utilize the isotropic subgroup of the CWS group and the set of word operators of a CWS-type quantum code to derive an upper bound on the minimum distance. Furthermore, this characterization can be incorporated into the associated distance enumerators, enabling us to construct semidefinite constraints that lead to SDP bounds on the minimum distance or size of CWS-type quantum codes. We illustrate several instances where SDP bounds outperform LP bounds, and there are even cases where LP fails to yield meaningful results, while SDP consistently provides tight and relevant bounds. Finally, we also provide interpretations of the Shor-Laflamme weight enumerators and shadow enumerators for codeword stabilized codes, enhancing our understanding of quantum codes.
Submitted 13 November, 2023;
originally announced November 2023.
-
HyperS2V: A Framework for Structural Representation of Nodes in Hyper Networks
Authors:
Shu Liu,
Cameron Lai,
Fujio Toriumi
Abstract:
In contrast to regular (simple) networks, hyper networks possess the ability to depict more complex relationships among nodes and store extensive information. Such networks are commonly found in real-world applications, such as in social interactions. Learning embedded representations for nodes involves a process that translates network structures into more simplified spaces, thereby enabling the application of machine learning approaches designed for vector data to be extended to network data. Nevertheless, there remains a need to delve into methods for learning embedded representations that prioritize structural aspects. This research introduces HyperS2V, a node embedding approach that centers on the structural similarity within hyper networks. Initially, we establish the concept of hyper-degrees to capture the structural properties of nodes within hyper networks. Subsequently, a novel function is formulated to measure the structural similarity between different hyper-degree values. Lastly, we generate structural embeddings utilizing a multi-scale random walk framework. Moreover, a series of experiments, both intrinsic and extrinsic, are performed on both toy and real networks. The results underscore the superior performance of HyperS2V in terms of both interpretability and applicability to downstream tasks.
Submitted 7 November, 2023;
originally announced November 2023.
-
Nominality Score Conditioned Time Series Anomaly Detection by Point/Sequential Reconstruction
Authors:
Chih-Yu Lai,
Fan-Keng Sun,
Zhengqi Gao,
Jeffrey H. Lang,
Duane S. Boning
Abstract:
Time series anomaly detection is challenging due to the complexity and variety of patterns that can occur. One major difficulty arises from modeling time-dependent relationships to find contextual anomalies while maintaining detection accuracy for point anomalies. In this paper, we propose a framework for unsupervised time series anomaly detection that utilizes point-based and sequence-based reconstruction models. The point-based model attempts to quantify point anomalies, and the sequence-based model attempts to quantify both point and contextual anomalies. Under the formulation that the observed time point is a two-stage deviated value from a nominal time point, we introduce a nominality score calculated from the ratio of a combined value of the reconstruction errors. We derive an induced anomaly score by further integrating the nominality score and anomaly score, then theoretically prove the superiority of the induced anomaly score over the original anomaly score under certain conditions. Extensive studies conducted on several public datasets show that the proposed framework outperforms most state-of-the-art baselines for time series anomaly detection.
Submitted 23 October, 2023;
originally announced October 2023.
-
On the Language Encoder of Contrastive Cross-modal Models
Authors:
Mengjie Zhao,
Junya Ono,
Zhi Zhong,
Chieh-Hsin Lai,
Yuhta Takida,
Naoki Murata,
Wei-Hsiang Liao,
Takashi Shibuya,
Hiromi Wakaki,
Yuki Mitsufuji
Abstract:
Contrastive cross-modal models such as CLIP and CLAP aid various vision-language (VL) and audio-language (AL) tasks. However, there has been limited investigation of and improvement in their language encoder, which is the central component that encodes natural language descriptions of images/audio into vector representations. We extensively evaluate how unsupervised and supervised sentence embedding training affect language encoder quality and cross-modal task performance. In VL pretraining, we found that sentence embedding training enhances language encoder quality and aids in cross-modal tasks, improving contrastive VL models such as CyCLIP. In contrast, AL pretraining benefits less from sentence embedding training, which may result from the limited amount of pretraining data. We analyze the representation spaces to understand the strengths of sentence embedding training, and find that it improves text-space uniformity, at the cost of decreased cross-modal alignment.
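One widely used measure of embedding-space uniformity is the Gaussian-potential metric of Wang and Isola (2020): the log of the mean of exp(-t‖x - y‖²) over pairs of L2-normalised embeddings, where lower values mean the points are spread more evenly over the unit sphere. The abstract does not state which metric was used, so the following is a sketch assuming this one, with t = 2 and toy 2-D vectors standing in for text embeddings.

```python
import math
from itertools import combinations
from typing import List, Sequence

def normalize(v: Sequence[float]) -> List[float]:
    """Project a nonzero vector onto the unit sphere."""
    n = math.sqrt(sum(x * x for x in v))
    if n == 0:
        raise ValueError("cannot normalise a zero vector")
    return [x / n for x in v]

def uniformity(embeddings: Sequence[Sequence[float]], t: float = 2.0) -> float:
    """log E[exp(-t * ||x - y||^2)] over distinct pairs; lower means more uniform."""
    z = [normalize(e) for e in embeddings]
    pairs = list(combinations(z, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    potentials = [math.exp(-t * sum((a - b) ** 2 for a, b in zip(x, y))) for x, y in pairs]
    return math.log(sum(potentials) / len(potentials))

# Toy sentence embeddings: one set spread around the circle, one bunched together.
spread = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
bunched = [[1.0, 0.1], [1.0, 0.0], [1.0, -0.1], [1.0, 0.05]]
print(uniformity(spread) < uniformity(bunched))  # True
```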
Submitted 20 October, 2023;
originally announced October 2023.
-
Correcting phenomenological quantum noise via belief propagation
Authors:
Kao-Yueh Kuo,
Ching-Yi Lai
Abstract:
Quantum stabilizer codes often face the challenge of syndrome errors due to error-prone measurements. To address this issue, multiple rounds of syndrome extraction are typically employed to obtain reliable error syndromes. In this paper, we consider phenomenological decoding problems, where data qubit errors may occur between two syndrome extractions, and each syndrome measurement can be faulty. To handle these diverse error sources, we define a generalized check matrix over mixed quaternary and binary alphabets to characterize their error syndromes. This generalized check matrix leads to the creation of a Tanner graph comprising quaternary and binary variable nodes, which facilitates the development of belief propagation (BP) decoding algorithms to tackle phenomenological errors. Importantly, our BP decoders are applicable to general sparse quantum codes. Through simulations of quantum memory protected by rotated toric codes, we demonstrate an error threshold of 3.3% in the phenomenological noise model. Additionally, we propose a method to construct effective redundant stabilizer checks for single-shot error correction. Simulations show that BP decoding performs exceptionally well, even when the syndrome error rate greatly exceeds the data error rate.
Submitted 19 October, 2023;
originally announced October 2023.
-
Distributed Indexing Schemes for k-Dominant Skyline Analytics on Uncertain Edge-IoT Data
Authors:
Chuan-Chi Lai,
Hsuan-Yu Lin,
Chuan-Ming Liu
Abstract:
Skyline queries typically search for a Pareto-optimal set in a given data set to solve the corresponding multiobjective optimization problem. As the number of criteria increases, the skyline contains an excessive number of data items, which yields a meaningless result. To address this curse of dimensionality, we proposed a k-dominant skyline in which the number of skyline members was reduced by relaxing the restriction on the number of dimensions, considering the uncertainty of data. Specifically, each data item was associated with a probability of appearance, which represented the probability of becoming a member of the k-dominant skyline. As data items appear continuously in data streams, the corresponding k-dominant skyline may vary with time. Therefore, an effective and rapid mechanism for updating the k-dominant skyline becomes crucial. Herein, we proposed two time-efficient schemes, Middle Indexing (MI) and All Indexing (AI), for the k-dominant skyline in distributed edge-computing environments, where irrelevant data items can be effectively excluded from the computation to reduce the processing time. Furthermore, the proposed schemes were validated with extensive experimental simulations. The experimental results demonstrated that the proposed MI and AI schemes reduced the computation time by approximately 13% and 56%, respectively, compared with the existing method.
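To make the relaxation concrete: with smaller values treated as better, a point p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of those. The sketch below implements only this textbook definition on certain (deterministic) data with a brute-force O(n²) scan; the probability-of-appearance model, the data streams, and the MI/AI indexing schemes from the abstract are not reproduced, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def k_dominates(p: Sequence[float], q: Sequence[float], k: int) -> bool:
    """p k-dominates q: no worse in >= k dimensions, strictly better in one of them (smaller is better)."""
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = any(a < b for a, b in zip(p, q))
    # Any strictly-better dimension is also a no-worse one, so it can always be
    # included in the chosen k-subset when no_worse >= k.
    return no_worse >= k and strictly_better

def k_dominant_skyline(points: List[Tuple[float, ...]], k: int) -> List[Tuple[float, ...]]:
    """Brute-force scan: keep points that no other point k-dominates."""
    return [p for i, p in enumerate(points)
            if not any(k_dominates(q, p, k) for j, q in enumerate(points) if j != i)]

pts = [(1, 1, 5), (2, 2, 1), (3, 3, 3)]
print(k_dominant_skyline(pts, 3))  # [(1, 1, 5), (2, 2, 1)]  (ordinary skyline)
print(k_dominant_skyline(pts, 2))  # [(1, 1, 5)]
```

With k equal to the full dimensionality this reduces to the ordinary skyline; lowering k prunes the result further, which is the reduction the abstract describes. For small k the result can even be empty, because k-dominance is not transitive and points can dominate each other cyclically.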
Submitted 18 October, 2023;
originally announced October 2023.
-
Neel tensor torque at the ferromagnet/antiferromagnet interface
Authors:
Chao-Yao Yang,
Sheng-Huai Chen,
Chih-Hsiang Tseng,
Chang-Yang Kuo,
Hsiu-Hau Lin,
Chih-Huang Lai
Abstract:
Antiferromagnets (AFMs) exhibit spin arrangements with no net magnetization, positioning them as promising candidates for spintronics applications. While electrical manipulation of single-crystal AFMs, composed of periodic spin configurations, has recently been achieved, it remains a daunting challenge to characterize and manipulate polycrystalline AFMs. Utilizing statistical analysis in data science, we demonstrate that polycrystalline AFMs can be described using a real, symmetric, positive semi-definite, rank-two tensor, which we term the Neel tensor. This tensor introduces a unique spin torque, diverging from the conventional field-like and Slonczewski torques in spintronics devices. Remarkably, Neel tensors can be trained to retain a specific orientation, functioning as a form of working memory. This attribute enables zero-field spin-orbit-torque switching in trilayer devices featuring a heavy-metal/ferromagnet/AFM structure and is also consistent with the X-ray magnetic linear dichroism measurements. Our findings uncover hidden statistical patterns in polycrystalline AFMs and establish the presence of Neel tensor torque, highlighting its potential to drive future spintronics innovations.
Submitted 18 October, 2023;
originally announced October 2023.
-
Audio-Visual Neural Syntax Acquisition
Authors:
Cheng-I Jeff Lai,
Freda Shi,
Puyuan Peng,
Yoon Kim,
Kevin Gimpel,
Shiyu Chang,
Yung-Sung Chuang,
Saurabhchand Bhati,
David Cox,
David Harwath,
Yang Zhang,
Karen Livescu,
James Glass
Abstract:
We study phrase structure induction from visually-grounded speech. The core idea is to first segment the speech waveform into sequences of word segments, and subsequently induce phrase structure using the inferred segment-level continuous representations. We present the Audio-Visual Neural Syntax Learner (AV-NSL) that learns phrase structure by listening to audio and looking at images, without ever being exposed to text. By training on paired images and spoken captions, AV-NSL exhibits the capability to infer meaningful phrase structures that are comparable to those derived by naturally-supervised text parsers, for both English and German. Our findings extend prior work in unsupervised language acquisition from speech and grounded grammar induction, and present one approach to bridge the gap between the two topics.
Submitted 11 October, 2023;
originally announced October 2023.