Large spin-orbit torque in a-plane $α$-Fe$_{2}$O$_{3}$/Pt bilayers

Authors: Igor Lyalin, Hantao Zhang, Justin Michel, Daniel Russell, Fengyuan Yang, Ran Cheng, Roland K. Kawakami

Abstract: Realization of efficient spin-orbit torque switching of the Néel vector in insulating antiferromagnets is a challenge, often complicated by spurious effects. Quantifying the spin-orbit torques in antiferromagnet/heavy metal heterostructures is an important first step towards this goal. Here, we employ magneto-optic techniques to study damping-like spin-orbit torque (DL-SOT) in a-plane $α$-Fe$_2$O$_3$ (hematite) with a Pt spin-orbit overlayer. We find that the DL-SOT efficiency is two orders of magnitude larger than reported in c- and r-plane hematite/Pt using harmonic Hall techniques. The large magnitude of DL-SOT is supported by direct imaging of current-induced motion of antiferromagnetic domains that happens at moderate current densities. Our study introduces a new method for quantifying spin-orbit torque in antiferromagnets with a small canted moment and identifies a-plane $α$-Fe$_2$O$_3$ as a promising candidate to realize efficient SOT switching.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures

arXiv:2407.00269

High-power and narrow-linewidth laser on thin-film lithium niobate enabled by photonic wire bonding

Authors: Cornelis A. A. Franken, Rebecca Cheng, Keith Powell, Georgios Kyriazidis, Victoria Rosborough, Juergen Musolf, Maximilian Shah, David R. Barton III, Gage Hills, Leif Johansson, Klaus-J. Boller, Marko Lončar

Abstract: Thin-film lithium niobate (TFLN) has emerged as a promising platform for the realization of high-performance chip-scale optical systems, spanning a range of applications from optical communications to microwave photonics. Such applications rely on the integration of multiple components onto a single platform. However, while many of these components have already been demonstrated on the TFLN platform, a major bottleneck to date has been the absence of a tunable, high-power, and narrow-linewidth on-chip laser. Here, we address this problem by using photonic wire bonding to integrate optical amplifiers with a thin-film lithium niobate feedback circuit, and we demonstrate an extended-cavity diode laser with a high on-chip power of 78 mW, side-mode suppression greater than 60 dB, and wide wavelength tunability over 43 nm. Measurements of the laser frequency stability over short timescales show an ultra-narrow intrinsic linewidth of 550 Hz. Long-term recordings indicate high passive stability of the photonic wire bonded laser, with 58 hours of mode-hop-free operation and a frequency drift trend of only 4.4 MHz/h. This work demonstrates that photonic wire bonding is a viable integration solution for high-performance on-chip lasers, opening the path to system-level upscaling and Watt-level output powers.

Submitted 5 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

Comments: 10 pages, 4 figures; updated long-term stability measurements with new and improved data

arXiv:2406.17245

Unlocking Continual Learning Abilities in Language Models

Authors: Wenyu Du, Shuang Cheng, Tongxu Luo, Zihan Qiu, Zeyu Huang, Ka Chun Cheung, Reynold Cheng, Jie Fu

Abstract: Language models (LMs) exhibit impressive performance and generalization capabilities. However, LMs struggle with the persistent challenge of catastrophic forgetting, which undermines their long-term sustainability in continual learning (CL). Existing approaches usually address the issue by incorporating old task data or task-wise inductive bias into LMs. However, old data and accurate task information are often unavailable or costly to collect, limiting the applicability of current CL approaches to LMs. To address this limitation, we introduce MIGU (MagnItude-based Gradient Updating for continual learning), a rehearsal-free and task-label-free method that only updates the model parameters with large magnitudes of output in LMs' linear layers. MIGU is based on our observation that the L1-normalized magnitude distribution of the output in LMs' linear layers differs when LMs process data from different tasks. By imposing this simple constraint on the gradient update process, we can leverage the inherent behaviors of LMs, thereby unlocking their innate CL abilities. Our experiments demonstrate that MIGU is universally applicable to all three LM architectures (T5, RoBERTa, and Llama2), delivering state-of-the-art or on-par performance across continual finetuning and continual pre-training settings on four CL benchmarks. For example, MIGU brings a 15.2% average accuracy improvement over conventional parameter-efficient finetuning baselines in a 15-task CL benchmark.
MIGU can also seamlessly integrate with all three existing CL types to further enhance performance. Code is available at https://github.com/wenyudu/MIGU.

Submitted 24 June, 2024; originally announced June 2024.

Comments: preprint, 19 pages
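
The abstract describes MIGU's update rule only at a high level: gradients are applied only to parameters associated with large-magnitude outputs of a linear layer, ranked by the L1-normalized magnitude distribution. The plain-Python sketch below illustrates that idea for a single linear layer under stated assumptions; it is not the authors' implementation (see the repository linked above). The helper names, the `top_ratio` threshold, and the choice to mask whole weight rows (one row per output unit) are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def magnitude_mask(outputs: List[float], top_ratio: float) -> Tuple[List[float], List[float]]:
    """Return the L1-normalized output magnitudes and a 0/1 mask over output units.

    Hypothetical helper: units whose normalized |output| is among the top
    `top_ratio` fraction get mask 1 (updatable); all others get 0 (frozen).
    """
    mags = [abs(o) for o in outputs]
    total = sum(mags) or 1.0  # avoid division by zero for an all-zero output
    normed = [m / total for m in mags]
    k = max(1, round(top_ratio * len(normed)))
    top = sorted(range(len(normed)), key=lambda i: normed[i], reverse=True)[:k]
    mask = [0.0] * len(normed)
    for i in top:
        mask[i] = 1.0
    return normed, mask


def masked_sgd_step(weight: Matrix, grad: Matrix, mask: List[float], lr: float) -> Matrix:
    """Plain SGD step in which row i (output unit i) changes only if mask[i] == 1."""
    return [
        [w - lr * m * g for w, g in zip(w_row, g_row)]
        for w_row, g_row, m in zip(weight, grad, mask)
    ]


# Usage: a toy linear layer with 4 output units and 2 inputs.
outputs = [0.1, -2.0, 0.5, 1.5]
normed, mask = magnitude_mask(outputs, top_ratio=0.5)
weight = [[1.0, 1.0] for _ in range(4)]
grad = [[1.0, 1.0] for _ in range(4)]
new_weight = masked_sgd_step(weight, grad, mask, lr=0.1)
print(mask)  # [0.0, 1.0, 0.0, 1.0]
```

Because L1 normalization preserves the ordering of magnitudes, it does not change which units are selected in this sketch; it only mirrors the quantity the abstract says MIGU observes.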

arXiv:2406.10802

KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs

Authors: Aihua Pei, Zehua Yang, Shunan Zhu, Ruoxi Cheng, Ju Jia, Lina Wang

Abstract: Existing frameworks for assessing the robustness of large language models (LLMs) depend heavily on specific benchmarks, which increases costs and, because of dataset limitations, fails to evaluate LLM performance in professional domains. This paper proposes a framework that systematically evaluates the robustness of LLMs under adversarial attack scenarios by leveraging knowledge graphs (KGs). Our framework generates original prompts from the triplets of knowledge graphs and creates adversarial prompts by poisoning, assessing the robustness of LLMs through the results of these adversarial attacks. We systematically evaluate the effectiveness of this framework and its modules. Experiments show that the adversarial robustness of the ChatGPT family ranks as GPT-4-turbo > GPT-4o > GPT-3.5-turbo, and that the robustness of large language models is influenced by the professional domains in which they operate.

Submitted 16 June, 2024; originally announced June 2024.
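
The KGPA pipeline is summarized as three steps: turn knowledge-graph triplets into prompts, perturb those prompts adversarially, and score robustness from how the model's answers change. A minimal sketch of those steps follows. The prompt template, the adjacent-character-swap perturbation (a stand-in for the paper's poisoning step), and the attack-success-rate metric are all assumptions rather than the framework's actual components; querying an LLM is left out.

```python
import random
from typing import List, Tuple

Triplet = Tuple[str, str, str]  # (head, relation, tail)


def triplet_to_prompt(triplet: Triplet) -> str:
    """Turn a KG triplet into a true/false question (hypothetical template)."""
    head, relation, tail = triplet
    return f"Is the following statement true or false? {head} {relation} {tail}."


def char_swap_attack(prompt: str, n_swaps: int, seed: int = 0) -> str:
    """Swap n_swaps randomly chosen pairs of adjacent characters (stand-in perturbation)."""
    rng = random.Random(seed)
    chars = list(prompt)
    for _ in range(n_swaps):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def attack_success_rate(clean_correct: List[bool], adv_correct: List[bool]) -> float:
    """Fraction of answers that were correct on the clean prompt but wrong after the attack."""
    base = sum(clean_correct)
    flipped = sum(1 for c, a in zip(clean_correct, adv_correct) if c and not a)
    return flipped / base if base else 0.0


prompt = triplet_to_prompt(("Paris", "is the capital of", "France"))
adversarial = char_swap_attack(prompt, n_swaps=3)
print(prompt)
print(adversarial)
```

A lower attack success rate would indicate a more robust model under this (assumed) metric.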

arXiv:2406.09274

Doubled Shapiro steps in a dynamic axion insulator Josephson junction

Authors: Yu-Hang Li, Ziqian Zhou, Ran Cheng, Hua Jiang, X. C. Xie

Abstract: Dynamic axion insulators feature a time-dependent axion field that can be induced by antiferromagnetic resonance. Here, we show that a Josephson junction incorporating this dynamic axion insulator between two superconductors exhibits strikingly doubled Shapiro steps, in which all odd steps are completely suppressed under the joint presence of a DC bias and a static magnetic field. Resistively shunted junction simulations confirm that these doubled Shapiro steps originate from the distinctive axion electrodynamics driven by the antiferromagnetic resonance, which not only furnishes a hallmark for identifying the dynamic axion insulator but also provides a method to evaluate its mass term. Furthermore, we determine the experimentally accessible differential conductance. Our work advances the understanding of the dynamic axion insulator in condensed matter physics and materials science, paving the way for its further exploration and applications.

Submitted 13 June, 2024; originally announced June 2024.
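
For reference, in a conventional Josephson junction driven at frequency $f$, Shapiro steps appear at voltages $V_n = n h f / (2e)$. "Doubled" steps, with every odd $n$ suppressed, leave only $V = m h f / e$, i.e. twice the usual step spacing. The sketch below only tabulates these voltages; it does not model the axion electrodynamics or the resistively shunted junction simulation, and the 10 GHz drive frequency is an illustrative value, not one taken from the paper.

```python
from typing import Dict

# Exact SI values since the 2019 redefinition.
PLANCK_H = 6.62607015e-34  # J s
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def shapiro_step_voltages(freq_hz: float, n_max: int, doubled: bool = False) -> Dict[int, float]:
    """Return {n: V_n} with V_n = n * h * f / (2e) for n = 1..n_max.

    If doubled is True, odd n are dropped, mimicking the suppression of odd
    steps described in the abstract (the mechanism itself is not modeled).
    """
    quantum = PLANCK_H * freq_hz / (2 * ELEMENTARY_CHARGE)
    return {
        n: n * quantum
        for n in range(1, n_max + 1)
        if not (doubled and n % 2 == 1)
    }


conventional = shapiro_step_voltages(10e9, 4)
doubled = shapiro_step_voltages(10e9, 4, doubled=True)
print({n: f"{v * 1e6:.2f} uV" for n, v in conventional.items()})
print({n: f"{v * 1e6:.2f} uV" for n, v in doubled.items()})
```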

arXiv:2406.07365

BvSP: Broad-view Soft Prompting for Few-Shot Aspect Sentiment Quad Prediction

Authors: Yinhao Bai, Yalan Xie, Xiaoyi Liu, Yuhua Zhao, Zhixin Han, Mengting Hu, Hang Gao, Renhong Cheng

Abstract: Aspect sentiment quad prediction (ASQP) aims to predict four aspect-based elements: aspect term, opinion term, aspect category, and sentiment polarity. In practice, unseen aspects, arising from distinct data distributions, pose many challenges for a trained neural model. Motivated by this, this work formulates ASQP as a few-shot problem, aiming for fast adaptation in real applications. We first construct a few-shot ASQP dataset (FSQP) that contains richer categories and is more balanced for few-shot study. Moreover, recent methods extract quads through a generation paradigm, which converts the input sentence into a templated target sequence. However, they primarily focus on using a single template or on different template orders, thereby overlooking the correlations among various templates. To tackle this issue, we further propose a Broad-view Soft Prompting (BvSP) method that aggregates multiple templates with a broader view by taking into account the correlation between the different templates. Specifically, BvSP uses the pre-trained language model to select the k most relevant templates with Jensen-Shannon divergence. BvSP further introduces soft prompts to guide the pre-trained language model using the selected templates. Then, we aggregate the results of the multiple templates with a voting mechanism. Empirical results demonstrate that BvSP significantly outperforms the state-of-the-art methods under four few-shot settings and on other public datasets.
Our code and dataset are available at https://github.com/byinhao/BvSP.

Submitted 11 June, 2024; originally announced June 2024.

Comments: Accepted to ACL 2024 Main Conference
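
The abstract names two mechanisms, template selection with Jensen-Shannon divergence and aggregation by voting, without specifying which distributions are compared. The sketch below assumes each candidate template comes with a probability distribution (for example, one produced by the pre-trained language model) and keeps the `k` templates closest to a reference distribution; voting then keeps quads predicted under at least `min_votes` templates. Template names, distributions, example quads, and the threshold are hypothetical, and the soft-prompt step is not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Quad = Tuple[str, str, str, str]  # (aspect term, opinion term, aspect category, sentiment polarity)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: mean KL divergence to the midpoint distribution."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def select_templates(template_dists: Dict[str, List[float]], reference: List[float], k: int) -> List[str]:
    """Keep the k templates whose distributions are closest (in JS divergence) to the reference."""
    ranked = sorted(template_dists, key=lambda name: js_divergence(template_dists[name], reference))
    return ranked[:k]


def vote(predictions: List[List[Quad]], min_votes: int) -> Set[Quad]:
    """Keep quads predicted under at least min_votes templates (one vote per template)."""
    counts = Counter(q for preds in predictions for q in set(preds))
    return {q for q, c in counts.items() if c >= min_votes}


chosen = select_templates(
    {"A": [0.9, 0.1], "B": [0.1, 0.9], "C": [0.8, 0.2]}, reference=[0.9, 0.1], k=2
)
quads = vote(
    [
        [("food", "great", "food quality", "positive")],
        [("food", "great", "food quality", "positive"), ("staff", "rude", "service", "negative")],
        [("staff", "rude", "service", "neutral")],
    ],
    min_votes=2,
)
print(chosen)  # ['A', 'C']
print(quads)
```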

arXiv:2406.06626

Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications

Authors: Zhou Zhou, Guohang He, Zheng Zhang, Luziwei Leng, Qinghai Guo, Jianxing Liao, Xuan Song, Ran Cheng

Abstract: Traditional invasive Brain-Computer Interfaces (iBCIs) typically depend on neural decoding processes conducted on workstations within laboratory settings, which prevents their everyday use. Implementing these decoding processes on edge devices, such as wearables, introduces considerable challenges related to computational demands, processing speed, and maintaining accuracy. This study seeks to identify an optimal neural decoding backbone that offers robust performance and fast inference suitable for edge deployment. We conducted a series of neural decoding experiments involving nonhuman primates engaged in random reaching tasks. We evaluated four candidate models, namely the Gated Recurrent Unit (GRU), Transformer, Receptance Weighted Key Value (RWKV), and Selective State Space model (Mamba), on several metrics: single-session decoding, multi-session decoding, new-session fine-tuning, inference speed, calibration speed, and scalability. The findings indicate that although the GRU model delivers sufficient accuracy, the RWKV and Mamba models are preferable due to their superior inference and calibration speeds. Additionally, RWKV and Mamba follow the scaling law, demonstrating improved performance with larger datasets and model sizes, whereas GRU shows less pronounced scalability, and the Transformer model requires computational resources that scale prohibitively. This paper presents a thorough comparative analysis of the four models in various scenarios.
The results are pivotal in pinpointing an optimal backbone that can handle increasing data volumes and is viable for edge implementation. This analysis provides essential insights for ongoing research and practical applications in the field.

Submitted 7 June, 2024; originally announced June 2024.
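
The abstract says RWKV and Mamba follow the scaling law but does not state its functional form. One common way to check such a claim is to fit a power law $L(N) = a N^{-b}$, which is a straight line in log-log space; a larger exponent $b$ means the metric improves faster with scale. The sketch below fits it by ordinary least squares. The power-law form and the synthetic data points are assumptions for illustration, not results from the paper.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Fit loss = a * size**(-b) via least squares on log(loss) = log(a) - b * log(size).

    Returns (a, b). All sizes and losses must be positive.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(loss) for loss in losses]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic, noise-free points generated from loss = 2 * N**-0.5 (illustrative only).
sizes = [1e3, 1e4, 1e5, 1e6]
losses = [2.0 * s ** -0.5 for s in sizes]
a, b = fit_power_law(sizes, losses)
print(f"a = {a:.3f}, b = {b:.3f}")
```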

arXiv:2405.20396

Using the COSMIC Population Synthesis Code to Investigate How Metallicity Affects the Rates of Interacting Binaries

Authors: Ayanah L. Cason, Nicole M. Lloyd-Ronning, Roseanne M. Cheng

Abstract: We use COSMIC, a galaxy population synthesis code, to investigate how metallicity affects the rate of formation of massive stars with a closely orbiting compact object companion, the suggested progenitors of radio-loud long gamma-ray bursts. We present the evolution times of these systems at different metallicities and show that their formation rates are anti-correlated with metallicity. In particular, these systems occur about 10 times more frequently at metallicities between $Z = 2\times 10^{-4}$ and $2 \times 10^{-3}$ than at metallicities between $Z = 2\times 10^{-3}$ and $2 \times 10^{-2}$. This work serves as a prerequisite for predicting the global rates of these systems as a function of redshift, ultimately giving crucial insight into the progenitors of long gamma-ray bursts and their evolution over cosmic time.

Submitted 30 May, 2024; originally announced May 2024.

Comments: submitted to RNAAS

arXiv:2405.15319

Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training

Authors: Wenyu Du, Tongxu Luo, Zihan Qiu, Zeyu Huang, Yikang Shen, Reynold Cheng, Yike Guo, Jie Fu

Abstract: LLMs are computationally expensive to pre-train due to their large scale. Model growth has emerged as a promising approach that leverages smaller models to accelerate the training of larger ones. However, the viability of these model growth methods in efficient LLM pre-training remains underexplored. This work identifies three critical obstacles: (O1) lack of comprehensive evaluation, (O2) untested viability for scaling, and (O3) lack of empirical guidelines. To tackle O1, we summarize existing approaches into four atomic growth operators and systematically evaluate them in a standardized LLM pre-training setting. Our findings reveal that a depthwise stacking operator, called $G_{\text{stack}}$, exhibits remarkable acceleration in training, leading to decreased loss and improved overall performance on eight standard NLP benchmarks compared to strong baselines. Motivated by these promising results, we conduct extensive experiments to delve deeper into $G_{\text{stack}}$ to address O2 and O3. For O2 (untested scalability), our study shows that $G_{\text{stack}}$ is scalable and consistently performs well, in experiments with LLMs of up to 7B parameters after growth and with pre-training on 750B tokens. For example, compared to a conventionally trained 7B model using 300B tokens, our $G_{\text{stack}}$ model converges to the same loss with 194B tokens, resulting in a 54.6% speedup.
We further address O3 (lack of empirical guidelines) by formalizing guidelines to determine the growth timing and growth factor for $G_{\text{stack}}$, making it practical in general LLM pre-training. We also provide in-depth discussions and comprehensive ablation studies of $G_{\text{stack}}$. Our code and pre-trained model are available at https://llm-stacking.github.io/.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Preprint; project link: https://llm-stacking.github.io/
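
Depthwise stacking grows a trained smaller model into a deeper one by repeating its layers. The abstract does not say how the layers are repeated, so the sketch below assumes the whole stack is copied `growth_factor` times; layers are stand-in dicts rather than real transformer blocks, and the 6-layer, factor-4 configuration is hypothetical. It also checks the reported speedup arithmetic: reaching the same loss with 194B instead of 300B tokens gives 300/194 - 1, about 54.6%.

```python
import copy
from typing import List


def stack_depthwise(layers: List[dict], growth_factor: int) -> List[dict]:
    """Grow a model by repeating its full layer stack growth_factor times.

    Layers are plain dicts of parameters (a hypothetical stand-in for real
    transformer blocks); deep copies keep the grown layers independent.
    """
    if growth_factor < 1:
        raise ValueError("growth_factor must be >= 1")
    return [copy.deepcopy(layer) for _ in range(growth_factor) for layer in layers]


def token_speedup(baseline_tokens: float, grown_tokens: float) -> float:
    """Relative speedup from reaching the same loss with fewer training tokens."""
    return baseline_tokens / grown_tokens - 1.0


small = [{"name": f"block{i}", "w": [0.1 * i]} for i in range(6)]
grown = stack_depthwise(small, growth_factor=4)
print(len(grown))  # 24
print(f"{token_speedup(300e9, 194e9):.1%}")  # 54.6%
```

Deep-copying matters here: after growth, each copy is trained separately, so the repeated blocks must not share parameter storage.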

arXiv:2405.15307

Before Generation, Align it! A Novel and Effective Strategy for Mitigating Hallucinations in Text-to-SQL Generation

Authors: Ge Qu, Jinyang Li, Bowen Li, Bowen Qin, Nan Huo, Chenhao Ma, Reynold Cheng

Abstract: Large Language Models (LLMs) driven by In-Context Learning (ICL) have significantly improved the performance of text-to-SQL. Previous methods generally employ a two-stage reasoning framework, namely 1) schema linking and 2) logical synthesis, making the framework not only effective but also interpretable. Despite these advancements, the inherently limited generalization of LLMs often results in hallucinations, which prevent LLMs from reaching their full potential. In this work, we first identify and categorize the common types of hallucinations at each stage of text-to-SQL. We then introduce a novel strategy, Task Alignment (TA), designed to mitigate hallucinations at each stage. TA encourages LLMs to take advantage of experiences from similar tasks rather than starting the tasks from scratch. This can reduce the burden of generalization on LLMs, thereby mitigating hallucinations effectively. We further propose TA-SQL, a text-to-SQL framework based on this strategy. The experimental results and comprehensive analysis demonstrate the effectiveness and robustness of our framework. Specifically, it improves the performance of the GPT-4 baseline by 21.23% (relative) on the BIRD dev set, and it yields significant improvements across six models and four mainstream, complex text-to-SQL benchmarks.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted to ACL Findings 2024
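
The abstract does not detail how Task Alignment supplies "experiences from similar tasks". One simple way to do that, shown here purely as an assumption and not as TA-SQL's method, is retrieval-based few-shot prompting: rank previously solved (question, SQL) pairs by similarity to the new question and prepend the closest ones. The Jaccard token similarity, the prompt format, and the toy schema are hypothetical; calling an LLM is not shown.

```python
import re
from typing import List, Set, Tuple

Example = Tuple[str, str]  # (natural-language question, SQL)


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_aligned_prompt(question: str, pool: List[Example], k: int = 2) -> str:
    """Prepend the k most similar solved examples (ties keep pool order) to the new question."""
    q = tokens(question)
    ranked = sorted(pool, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)
    shots = "\n\n".join(f"Question: {nl}\nSQL: {sql}" for nl, sql in ranked[:k])
    return f"{shots}\n\nQuestion: {question}\nSQL:"


pool = [
    ("How many singers are there?", "SELECT COUNT(*) FROM singer;"),
    ("List the names of all stadiums.", "SELECT name FROM stadium;"),
    ("How many concerts are there?", "SELECT COUNT(*) FROM concert;"),
]
prompt = build_aligned_prompt("How many stadiums are there?", pool, k=1)
print(prompt)
```

In this toy pool the retrieved example is a structurally similar counting question rather than one that shares the table name, which is the kind of task-level similarity the abstract's strategy appeals to.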

arXiv:2405.14517

Identity Inference from CLIP Models using Only Textual Data

Authors: Songze Li, Ruoxi Cheng, Xiaojun Jia

Abstract: The widespread usage of large-scale multimodal models like CLIP has heightened concerns about the leakage of personally identifiable information (PII). Existing methods for identity inference in CLIP models, i.e., detecting whether a person's PII was used to train a CLIP model, require querying the model with full PII, including textual descriptions of the person and corresponding images (e.g., the name and the face photo of the person). However, this risks a privacy breach of the image, since the target model may not have seen it yet. Additionally, traditional membership inference attacks (MIAs) train shadow models to mimic the behaviors of the target model, which incurs high computational costs, especially for large CLIP models. To address these challenges, we propose a textual unimodal detector (TUNI) in CLIP models, a novel method for ID inference that 1) queries the target model with only text data; and 2) does not require training shadow models. First, we develop a feature extraction algorithm, guided by the CLIP model, to extract features from a text description. TUNI starts by randomly generating textual gibberish that was clearly not used for training, and leverages its feature vectors to train a system of anomaly detectors. During inference, the feature vector of each test text is fed into the anomaly detectors to determine whether the person's PII is in the training set (abnormal) or not (normal).
Moreover, TUNI can be further strengthened by integrating real images associated with the tested individuals, if available at the detector. Extensive experiments with TUNI across various CLIP model architectures and datasets demonstrate its superior performance over baselines, despite using only text data.

Submitted 23 May, 2024; originally announced May 2024.
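
TUNI's detection step is described as anomaly detection over feature vectors: fit on features of random gibberish (text the model has surely not trained on) and flag test texts whose features look abnormal. The sketch below swaps in a single, simple distance-to-centroid detector with a mean + z * std threshold; the abstract's "system of anomaly detectors" and its CLIP-guided feature extraction are not reproduced. The 2-D Gaussian vectors stand in for CLIP text features of gibberish, which are not computed here.

```python
import math
import random
import string
from typing import List

Vector = List[float]


def random_gibberish(rng: random.Random, length: int = 12) -> str:
    """Random lowercase string, standing in for TUNI's 'clearly unseen' gibberish text."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


class CentroidDetector:
    """Flags a vector as abnormal if it lies unusually far from the training centroid."""

    def __init__(self, z: float = 3.0) -> None:
        self.z = z

    def fit(self, vectors: List[Vector]) -> "CentroidDetector":
        dim = len(vectors[0])
        self.centroid = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
        dists = [math.dist(v, self.centroid) for v in vectors]
        mean = sum(dists) / len(dists)
        std = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
        self.threshold = mean + self.z * std
        return self

    def is_abnormal(self, v: Vector) -> bool:
        return math.dist(v, self.centroid) > self.threshold


rng = random.Random(0)
print(random_gibberish(rng))  # in a real pipeline, this text would be encoded by CLIP
# Illustrative stand-in for gibberish feature vectors: points scattered around the origin.
background = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(500)]
detector = CentroidDetector(z=3.0).fit(background)
print(detector.is_abnormal([0.1, -0.2]))  # False: looks like gibberish (normal)
print(detector.is_abnormal([8.0, 8.0]))   # True: abnormal, i.e. a candidate training-set member
```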

arXiv:2405.12183

Multi-order Graph Clustering with Adaptive Node-level Weight Learning

Authors: Ye Liu, Xuelei Lin, Yejia Chen, Reynold Cheng

Abstract: Current graph clustering methods emphasize individual node and edge connections, while ignoring higher-order organization at the level of motifs. Recently, higher-order graph clustering approaches have been designed based on motif-based hypergraphs. However, these approaches often suffer seriously from hypergraph fragmentation, which greatly degrades clustering performance. Moreover, real-world graphs usually contain diverse motifs, with nodes participating in multiple motifs. A key challenge is how to achieve precise clustering results by integrating information from multiple motifs at the node level. In this paper, we propose a multi-order graph clustering model (MOGC) to integrate multiple higher-order structures and edge connections at the node level. MOGC employs an adaptive weight learning mechanism to automatically adjust the contributions of different motifs for each node. This not only tackles the hypergraph fragmentation issue but also enhances clustering accuracy. MOGC is efficiently solved by an alternating minimization algorithm. Experiments on seven real-world datasets illustrate the effectiveness of MOGC.

Submitted 20 May, 2024; originally announced May 2024.
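
MOGC's objective and its alternating minimization are not spelled out in the abstract. The sketch below only illustrates the two ingredients it combines: a motif adjacency matrix (here the standard triangle-motif construction, W = (A A) masked elementwise by A, which counts the triangles containing each edge) and node-level mixing of edge and motif information. The per-node weights are fixed, hypothetical values rather than learned ones. In the toy graph, node 3 belongs to no triangle, so it is disconnected in the motif graph alone, a small instance of fragmentation; giving it weight on plain edges keeps it attached.

```python
from typing import List

IntMatrix = List[List[int]]
FloatMatrix = List[List[float]]


def triangle_motif_adjacency(adj: IntMatrix) -> IntMatrix:
    """W[i][j] = number of triangles containing edge (i, j); 0 where there is no edge.

    Computed as (A @ A) masked elementwise by A, for an undirected 0/1 adjacency matrix.
    """
    n = len(adj)
    return [
        [adj[i][j] * sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def combine_per_node(adj: IntMatrix, motif: IntMatrix, weights: List[float]) -> FloatMatrix:
    """Row i mixes edge and motif information with node-specific weight weights[i] in [0, 1].

    In MOGC the weights are learned; here they are given. The result is
    symmetrized so it could feed a standard spectral clustering step.
    """
    n = len(adj)
    rows = [
        [(1 - weights[i]) * adj[i][j] + weights[i] * motif[i][j] for j in range(n)]
        for i in range(n)
    ]
    return [[(rows[i][j] + rows[j][i]) / 2 for j in range(n)] for i in range(n)]


# Toy graph: triangle 0-1-2 plus a pendant edge 2-3.
adj = [[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]]
motif = triangle_motif_adjacency(adj)
combined = combine_per_node(adj, motif, weights=[0.8, 0.8, 0.5, 0.0])
print(motif[3])     # node 3 has no triangle edges
print(combined[3])  # but stays connected to node 2 in the combined graph
```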

arXiv:2405.11028

Simulations of Interacting Binary Systems -- Pathways to Radio Bright GRB Progenitors

Authors: Angel Hernandez, Roseanne M. Cheng, Nicole M. Lloyd-Ronning, Carl E. Fields

Abstract: Although the association of gamma-ray bursts (GRBs) with massive stellar death is on firm footing, the nature of the progenitor system and the key ingredients required for a massive star to produce a gamma-ray burst remain open questions. Here, we investigate the evolution of a massive star with a closely orbiting compact object companion using the stellar evolution code MESA. In particular, we examine how the companion influences the angular momentum and circumstellar environment near the end of the massive star's life. We find that tidal effects can cause the compact object companion to significantly increase the angular momentum of the massive star for orbital periods of up to $\sim 4$ days. We model the density profile evolution of the massive star and discuss how tidal interactions may also lead to stripping of the outer stellar envelope in a way that can create an environment around the binary system that deviates from a typical $1/r^{2}$ wind density profile. We show how our results depend on the metallicity of the system, the initial spin of the star, the mass ratio, and the accretion and dynamo prescriptions in the simulations. We conclude that these systems may be viable progenitors for radio-bright long gamma-ray bursts.

Submitted 17 May, 2024; originally announced May 2024.

Comments: Submitted to ApJ - comments welcome

Report number: LA-UR-24-22983
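
The "typical $1/r^{2}$ wind density profile" mentioned above follows from mass conservation in a steady, spherically symmetric wind with constant mass-loss rate $\dot{M}$ and velocity $v_w$: $\rho(r) = \dot{M} / (4 \pi r^{2} v_w)$. The sketch evaluates this in cgs units. The mass-loss rate and wind speed ($10^{-5}\,M_\odot\,\mathrm{yr}^{-1}$ and 1000 km/s) are illustrative values, not taken from the paper; they give $\rho r^{2} \approx 5 \times 10^{11}$ g/cm.

```python
import math

SOLAR_MASS_G = 1.989e33  # grams
YEAR_S = 3.156e7  # seconds (approximate)


def wind_density(r_cm: float, mdot_msun_per_yr: float, v_wind_km_s: float) -> float:
    """Density in g/cm^3 of a steady, spherical wind: rho = Mdot / (4 pi r^2 v_w)."""
    mdot_g_s = mdot_msun_per_yr * SOLAR_MASS_G / YEAR_S
    v_cm_s = v_wind_km_s * 1e5
    return mdot_g_s / (4.0 * math.pi * r_cm**2 * v_cm_s)


rho_1 = wind_density(1e17, 1e-5, 1000.0)
rho_2 = wind_density(2e17, 1e-5, 1000.0)
print(f"rho(1e17 cm) = {rho_1:.3e} g/cm^3")
print(f"rho(1e17) / rho(2e17) = {rho_1 / rho_2:.3f}")  # doubling r divides rho by 4
```

Tidal stripping of the envelope, as discussed in the abstract, would show up as a departure from this constant $\rho r^{2}$.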

arXiv:2405.10889

Unconventional Unidirectional Magnetoresistance in vdW Heterostructures

Authors: I-Hsuan Kao, Junyu Tang, Gabriel Calderon Ortiz, Menglin Zhu, Sean Yuan, Rahul Rao, Jiahan Li, James H. Edgar, Jiaqiang Yan, David G. Mandrus, Kenji Watanabe, Takashi Taniguchi, Jinwoo Hwang, Ran Cheng, Jyoti Katoch, Simranjeet Singh

Abstract: Electrical readout of magnetic states is key to realizing novel spintronic devices for efficient computing and data storage. Unidirectional magnetoresistance (UMR) in bilayer systems, consisting of a spin source material and a magnetic layer, refers to a change in the longitudinal resistance upon reversal of the magnetization, which typically originates from the interaction of spin current and magnetization at the interface. Because UMR depends linearly on the applied charge current and the magnetization, it can be used to read the magnetization state electrically. However, in conventional spin source materials, the spin polarization of an electric-field-induced spin current is restricted to the film plane, and hence the ensuing UMR can only respond to the in-plane component of the magnetization. On the other hand, magnets with perpendicular magnetic anisotropy (PMA) are highly desired for magnetic memory and spin-logic devices, yet the electrical readout of PMA magnets through UMR is critically missing. Here, we report the discovery of an unconventional UMR in bilayer heterostructures of a topological semimetal (WTe2) and a PMA ferromagnetic insulator (Cr2Ge2Te6, CGT), which allows the up and down magnetic states of the CGT layer to be read electrically by measuring the longitudinal resistance. Our theoretical calculations based on a tight-binding model show that the unconventional UMR originates from the interplay of crystal symmetry breaking in WTe2 and magnetic exchange interaction across the WTe2/CGT interface.
Combined with the ability of WTe2 to achieve magnetic-field-free switching of PMA magnets, our discoveries open an exciting pathway to two-terminal magnetic memory devices that operate solely on spin-orbit torque and UMR, which is critical for developing next-generation nonvolatile and low-power data storage technologies.

Submitted 17 May, 2024; originally announced May 2024.
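
Because UMR is linear in both the charge current I and the magnetization M, it can be separated from other contributions by reversing both. The sketch below extracts the part of the resistance that is odd in I and odd in M, R_UMR = [R(I, M) - R(I, -M) - R(-I, M) + R(-I, -M)] / 4, which cancels terms even in I (such as Joule heating) and terms odd in M alone. This symmetry separation is offered as an illustration, not as the paper's measurement protocol; `magnetization` stands for whichever component the UMR responds to (the out-of-plane one for the unconventional UMR above), and the toy resistance model and its coefficients are hypothetical.

```python
from typing import Callable


def umr_component(resistance: Callable[[float, float], float], current: float, magnetization: float) -> float:
    """Part of R(I, M) that is odd in both current and magnetization."""
    i, m = current, magnetization
    return (resistance(i, m) - resistance(i, -m) - resistance(-i, m) + resistance(-i, -m)) / 4.0


def toy_resistance(i: float, m: float) -> float:
    """Hypothetical model: base resistance + UMR term + heating term + M-odd term (ohms)."""
    return 100.0 + 0.02 * i * m + 0.5 * i**2 + 0.3 * m


print(umr_component(toy_resistance, current=2.0, magnetization=1.0))  # recovers 0.02 * I * M
```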

arXiv:2405.10422

A First Look at Immersive Telepresence on Apple Vision Pro

Authors: Ruizhi Cheng, Nan Wu, Matteo Varvello, Eugene Chai, Songqing Chen, Bo Han

Abstract: Due to the widespread adoption of "work-from-home" policies, videoconferencing applications (e.g., Zoom) have become indispensable for remote communication. However, these systems lack immersiveness, leading to the so-called "Zoom fatigue" and degrading communication efficiency. The recent debut of Apple Vision Pro, a mixed reality headset that supports "spatial persona", aims to offer an immersive telepresence experience with these applications. In this paper, we conduct a first-of-its-kind, in-depth empirical study to analyze the performance of immersive telepresence with four applications, Apple FaceTime, Cisco Webex, Microsoft Teams, and Zoom, on Vision Pro. We find that only FaceTime provides a truly immersive experience with spatial personas, whereas the other applications still use 2D personas. Our measurement results reveal that (1) FaceTime transmits semantic information to optimize bandwidth consumption, which is even lower than that of the 2D personas in the other applications, and (2) it employs visibility-aware optimizations to reduce rendering overhead. However, the scalability of FaceTime remains limited: its simple server allocation strategy can lead to high network delay among users.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.03267

Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

Authors: Rongxin Cheng, Yifan Peng, Xingda Wei, Hongrui Xie, Rong Chen, Sijie Shen, Haibo Chen

Abstract: Vector searches on large-scale datasets are critical to modern online services such as web search and RAG, which necessitate storing the datasets and their indexes on secondary storage such as SSDs. In this paper, we are the first to characterize the trade-off between performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by 5.7$\times$ and 1.7$\times$, these indexes have to pay a storage amplification of 5.8$\times$ and 7.7$\times$ with respect to the dataset size, respectively. The root cause is that the coarse-grained access of SSDs mismatches the fine-grained random reads required by vector indexes with small amplification. This paper argues that second-tier memory, such as remote DRAM/NVM connected via RDMA or CXL, is a powerful storage medium for addressing the problem from a systems perspective, thanks to its fine-grained access granularity. However, putting existing indexes, which were primarily designed for SSDs, directly on second-tier memory cannot fully utilize its power. Meanwhile, second-tier memory still behaves more like storage, so using it as DRAM is also inefficient. To this end, we build a graph index and a cluster index centered on the performance features of second-tier memory. With careful execution engine and index layout designs, we show that vector indexes can achieve optimal performance with orders of magnitude smaller index amplification on a variety of second-tier memory devices.
Based on our improved graph and cluster indexes on second-tier memory, we further conduct a systematic study comparing them to help developers choose the right index for their workloads. Interestingly, the findings on second-tier memory contradict those on SSDs.
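The SSD mismatch described above is at heart an arithmetic effect: a random read is rounded up to the device's access unit. The Python sketch below uses a hypothetical helper, `read_amplification`, to compute bytes fetched per useful byte; the 128-dimensional float32 vector and the 4 KiB and 64 B access units are assumed illustrative values, not the devices the authors measured.

```python
import math

def read_amplification(payload_bytes: int, access_unit_bytes: int) -> float:
    """Bytes fetched per byte needed when one random read of `payload_bytes`
    must be served in whole units of `access_unit_bytes`.

    Assumes the payload starts on a unit boundary; a misaligned payload
    may straddle one extra unit and amplify further."""
    units = math.ceil(payload_bytes / access_unit_bytes)
    return units * access_unit_bytes / payload_bytes

vector_bytes = 128 * 4  # assumed: one 128-dimensional float32 vector = 512 bytes
print(read_amplification(vector_bytes, 4096))  # assumed 4 KiB SSD page: 8.0
print(read_amplification(vector_bytes, 64))    # assumed 64 B fine-grained access: 1.0
```

This per-read ratio is distinct from the storage amplification (index size relative to dataset size) quoted in the abstract; it only illustrates why coarse-grained access penalizes fine-grained index lookups.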

Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.17341 [pdf, other]

Free curves in Fano hypersurfaces must have high degree

Authors: Raymond Cheng

Abstract: The purpose of this note is to show that the minimal $e$ for which every smooth Fano hypersurface of dimension $n$ contains a free rational curve of degree at most $e$ cannot be bounded by a linear function in $n$ when the base field has positive characteristic. This is done by providing a super-linear bound on the minimal possible degree of a free curve in certain Fermat hypersurfaces.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 4 pages, comments welcome!

MSC Class: 14M22; 14J70 (primary); 14G17; 14J45 (secondary)

arXiv:2404.16266 [pdf, other]

doi 10.1145/3638530.3654389

A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation

Authors: Yifan Zhao, Zhenyu Liang, Zhichao Lu, Ran Cheng

Abstract: As one of the emerging challenges in Automated Machine Learning, Hardware-aware Neural Architecture Search (HW-NAS) tasks can be treated as black-box multi-objective optimization problems (MOPs). An important application of HW-NAS is real-time semantic segmentation, which plays a pivotal role in autonomous driving scenarios. HW-NAS for real-time semantic segmentation inherently needs to balance multiple optimization objectives, including model accuracy, inference speed, and hardware-specific considerations. Despite its importance, no benchmarks have yet been developed that frame this challenging task as multi-objective optimization. To bridge the gap, we introduce a tailored pipeline that transforms the task of HW-NAS for real-time semantic segmentation into standard MOPs. Building on this pipeline, we present a benchmark test suite, CitySeg/MOP, comprising fifteen MOPs derived from the Cityscapes dataset. The CitySeg/MOP test suite is integrated into the EvoXBench platform to provide seamless interfaces with various programming languages (e.g., Python and MATLAB) for instant fitness evaluations. We comprehensively assessed the CitySeg/MOP test suite on various multi-objective evolutionary algorithms, showcasing its versatility and practicality. Source code is available at https://github.com/EMI-Group/evoxbench.

Submitted 28 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

Comments: GECCO 2024

arXiv:2404.15622 [pdf, other]

FR-NAS: Forward-and-Reverse Graph Predictor for Efficient Neural Architecture Search

Authors: Haoming Zhang, Ran Cheng

Abstract: Neural Architecture Search (NAS) has emerged as a key tool for identifying optimal configurations of deep neural networks tailored to specific tasks. However, training and assessing numerous architectures introduces considerable computational overhead. One way to mitigate this is to use performance predictors, which estimate the potential of an architecture without exhaustive training. Given that neural architectures fundamentally resemble Directed Acyclic Graphs (DAGs), Graph Neural Networks (GNNs) are a natural choice for such predictive tasks. Nevertheless, the scarcity of training data can limit the precision of GNN-based predictors. To address this, we introduce a novel GNN predictor for NAS. This predictor renders neural architectures into vector representations by combining both the conventional and inverse graph views. Additionally, we incorporate a customized training loss within the GNN predictor to ensure efficient utilization of both types of representations. We subsequently assessed our method through experiments on benchmark datasets including NAS-Bench-101, NAS-Bench-201, and the DARTS search space, with training datasets ranging from 50 to 400 samples. Benchmarked against leading GNN predictors, the experimental results show a significant improvement in prediction accuracy, with a 3% to 16% increase in Kendall-tau correlation. Source code is available at https://github.com/EMI-Group/fr-nas.
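The Kendall-tau correlation cited above measures how well a predictor ranks architectures: it compares every pair and checks whether predicted and true scores order that pair the same way. A minimal Python sketch of the tie-free tau-a variant follows; the paper does not state which variant it reports, and library routines such as SciPy's `scipy.stats.kendalltau` default to the tie-corrected tau-b. The scores in the example are made up.

```python
from itertools import combinations

def kendall_tau_a(predicted, actual):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Pairs tied in either list count as neither concordant nor discordant,
    but still count in the denominator (no tie correction)."""
    n = len(predicted)
    if n != len(actual) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (predicted[i] - predicted[j]) * (actual[i] - actual[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)

# Hypothetical predicted vs. measured accuracies for four architectures:
# only the pair (arch 1, arch 2) is ranked in the wrong order.
print(kendall_tau_a([0.9, 0.7, 0.8, 0.1], [0.95, 0.70, 0.60, 0.20]))  # 5 concordant, 1 discordant -> 4/6
```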

Submitted 23 April, 2024; originally announced April 2024.

Comments: IJCNN'24

arXiv:2404.10160 [pdf, other]

Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

Authors: Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo

Abstract: Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics, or yield overconfident and random outputs. We find that involving LLMs in role-playing scenarios boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debates as Feedback (RLDF), a novel approach to bias mitigation that replaces human feedback in traditional RLHF. We use LLMs in multi-role debates to create a dataset that includes both high-bias and low-bias instances for training the reward model in reinforcement learning. Our approach comprises two modes: (1) self-reflection, where the same LLM participates in multi-role debates, and (2) teacher-student, where a more advanced LLM such as GPT-3.5-turbo guides the LLM to perform this task. Experimental results across different LLMs demonstrate the effectiveness of our approach in bias mitigation.
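The abstract says the debate-generated high-bias and low-bias instances train a reward model but does not give the training objective. A common choice in RLHF-style reward modeling is the Bradley-Terry pairwise loss, which pushes the reward of the preferred (here, low-bias) response above the rejected one; the Python sketch below assumes that choice, and `pairwise_reward_loss` is a hypothetical name, not something taken from the paper.

```python
import math

def pairwise_reward_loss(r_low_bias: float, r_high_bias: float) -> float:
    """Bradley-Terry loss -log(sigmoid(r_low_bias - r_high_bias)).

    Assumes the low-bias response is the preferred one. Computed as
    softplus(-margin), split by sign so exp() never overflows."""
    margin = r_low_bias - r_high_bias
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

print(pairwise_reward_loss(2.0, -1.0))  # preferred response already scored higher: small loss
print(pairwise_reward_loss(-1.0, 2.0))  # ranking reversed: large loss
```

In practice both rewards would come from the same reward model scoring a paired low-bias and high-bias response, with the loss averaged over a batch.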

Submitted 18 June, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

Comments: The first three authors contributed equally to this work

arXiv:2404.08233 [pdf, other]

Generalized Population-Based Training for Hyperparameter Optimization in Reinforcement Learning

Authors: Hui Bai, Ran Cheng

Abstract: Hyperparameter optimization plays a key role in machine learning. Its significance is especially pronounced in reinforcement learning (RL), where agents continuously interact with and adapt to their environments, requiring dynamic adjustments to their learning trajectories. To cater to this dynamicity, Population-Based Training (PBT) was introduced, leveraging the collective intelligence of a population of agents learning simultaneously. However, PBT tends to favor high-performing agents, potentially neglecting the explorative potential of agents on the brink of significant advancements. To mitigate these limitations, we present Generalized Population-Based Training (GPBT), a refined framework designed for enhanced granularity and flexibility in hyperparameter adaptation. Complementing GPBT, we further introduce Pairwise Learning (PL). Instead of merely focusing on elite agents, PL employs a comprehensive pairwise strategy to identify performance differentials and provide holistic guidance to underperforming agents. By integrating the capabilities of GPBT and PL, our approach significantly improves upon traditional PBT in terms of adaptability and computational efficiency. Rigorous empirical evaluations across a range of RL benchmarks confirm that our approach consistently outperforms not only conventional PBT but also its Bayesian-optimized variant.
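For reference, the baseline mechanism that GPBT and PL refine is PBT's exploit-and-explore step. The Python sketch below shows one common form (truncation selection plus multiplicative perturbation); the `Agent` class, the 20% fraction, and the 0.8/1.2 factors are assumed for illustration, and the sketch implements plain PBT, not the authors' GPBT or Pairwise Learning.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Agent:
    score: float                                  # latest evaluation result
    hparams: dict = field(default_factory=dict)   # e.g. {"lr": 1e-3}

def pbt_exploit_explore(population, rng, frac=0.2, factors=(0.8, 1.2)):
    """One exploit/explore round of standard PBT; updates agents in place.

    Assumes len(population) >= 2 and frac <= 0.5 so the top and bottom
    groups do not overlap."""
    ranked = sorted(population, key=lambda a: a.score, reverse=True)
    k = max(1, int(len(ranked) * frac))
    top, bottom = ranked[:k], ranked[-k:]
    for agent in bottom:
        donor = rng.choice(top)
        # Exploit: copy a top performer's hyperparameters (a full
        # implementation would copy its model weights as well).
        # Explore: rescale each copied value by a randomly chosen factor.
        agent.hparams = {name: value * rng.choice(factors)
                         for name, value in donor.hparams.items()}
    return population

pop = [Agent(0.9, {"lr": 1e-3}), Agent(0.5, {"lr": 3e-4}),
       Agent(0.2, {"lr": 1e-4}), Agent(0.1, {"lr": 5e-5})]
pbt_exploit_explore(pop, random.Random(0))
print(pop[3].hparams)  # the worst agent now holds a perturbed copy of lr=1e-3
```

The abstract's critique is visible in this form: only the top-ranked agents ever act as donors, so a lagging agent that is about to improve is simply overwritten.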

Submitted 22 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: IEEE Transactions on Emerging Topics in Computational Intelligence

arXiv:2404.07387 [pdf, other]

BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks

Authors: Ruijia Cheng, Titus Barik, Alan Leung, Fred Hohman, Jeffrey Nichols

Abstract: Programmers frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore LLM-generated code. Through a user study in which 10 novices used BISCUIT for machine learning tutorials, we found that BISCUIT offers users representations of code that aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas.

Submitted 11 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06398 [pdf]

Integrated electro-optics on thin-film lithium niobate

Authors: Yaowen Hu, Di Zhu, Shengyuan Lu, Xinrui Zhu, Yunxiang Song, Dylan Renaud, Daniel Assumpcao, Rebecca Cheng, CJ Xin, Matthew Yeh, Hana Warner, Xiangwen Guo, Amirhassan Shams-Ansari, David Barton, Neil Sinclair, Marko Loncar

Abstract: Electro-optics serves as the crucial bridge between electronics and photonics, unlocking a wide array of applications ranging from communications and computing to sensing and quantum information. Integrated electro-optics approaches in particular enable essential high-speed electronic control for photonics while offering substantial photonic parallelism for electronics. Recent strides in thin-film lithium niobate photonics have ushered in revolutionary advances in electro-optics. This technology not only offers the requisite strong electro-optic coupling but also boasts ultra-low optical loss and high microwave bandwidth. Further, its tight confinement and compatibility with nanofabrication allow for unprecedented reconfigurability and scalability, facilitating the creation of novel and intricate devices and systems that were once deemed nearly impossible in bulk systems. Building upon this platform, the field has witnessed the emergence of various groundbreaking electro-optic devices that surpass the current state of the art and introduce functionalities that were previously non-existent. This technological leap forward also provides a unique framework to explore various realms of physics, including photonic non-Hermitian synthetic dimensions, active topological physics, and quantum electro-optics. In this review, we present the fundamental principles of electro-optics, drawing connections between fundamental science and the forefront of technology.
We discuss the accomplishments and future prospects of integrated electro-optics enabled by the thin-film lithium niobate platform.
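As a reminder of the basic principle the review builds on, the linear electro-optic (Pockels) effect in lithium niobate changes the refractive index in proportion to the applied field. The textbook relations below are for a single-arm phase modulator using the $r_{33}$ coefficient; the electrode gap $g$, electrode length $L$, and field-mode overlap factor $\Gamma$ are assumed symbols not defined in the abstract, and the field is approximated as $V/g$.

```latex
\Delta n = -\tfrac{1}{2}\, n_e^{3}\, r_{33}\, \Gamma\, \frac{V}{g},
\qquad
|\Delta\phi| = \frac{2\pi}{\lambda}\, |\Delta n|\, L
             = \frac{\pi\, n_e^{3}\, r_{33}\, \Gamma\, V L}{\lambda\, g},
\qquad
V_\pi L = \frac{\lambda\, g}{n_e^{3}\, r_{33}\, \Gamma}
```

Setting $|\Delta\phi| = \pi$ gives the half-wave voltage-length product in the last expression, so a smaller gap and a larger overlap, both helped by tight optical confinement, lower $V_\pi L$.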

Submitted 11 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.06290 [pdf, other]

Exploring the True Potential: Evaluating the Black-box Optimization Capability of Large Language Models

Authors: Beichen Huang, Xingyu Wu, Yu Zhou, Jibin Wu, Liang Feng, Ran Cheng, Kay Chen Tan

Abstract: Large language models (LLMs) have demonstrated exceptional performance not only in natural language processing tasks but also in a great variety of non-linguistic domains. In diverse optimization scenarios, there is also a rising trend of applying LLMs. However, whether applying LLMs to black-box optimization problems is genuinely beneficial remains unexplored. This paper endeavors to offer deep insights into the potential of LLMs in optimization through a comprehensive investigation, which covers both discrete and continuous optimization problems to assess the efficacy and distinctive characteristics that LLMs bring to this field. Our findings reveal both the limitations and advantages of LLMs in optimization. Specifically, on the one hand, despite the significant power consumed in running the models, LLMs exhibit subpar performance in pure numerical tasks, primarily due to a mismatch between the problem domain and their processing capabilities; on the other hand, although LLMs may not be ideal for traditional numerical optimization, their potential in broader optimization contexts remains promising, where LLMs exhibit the ability to solve problems in non-numerical domains and can leverage heuristics from the prompt to enhance their performance. To the best of our knowledge, this work presents the first systematic evaluation of LLMs for numerical optimization. Our findings pave the way for a deeper understanding of LLMs' role in optimization and guide future application of LLMs in a wide range of scenarios.

Submitted 6 July, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.04895 [pdf, other]

Tensorized Ant Colony Optimization for GPU Acceleration

Authors: Luming Yang, Tao Jiang, Ran Cheng

Abstract: Ant Colony Optimization (ACO) is renowned for its effectiveness in solving Traveling Salesman Problems, yet it faces computational challenges in CPU-based environments, particularly with large-scale instances. In response, we introduce Tensorized Ant Colony Optimization (TensorACO) to exploit the advancements of GPU acceleration. At its core, TensorACO fully transforms the ant system and ant paths into tensor forms, a process we refer to as tensorization. For the tensorization of the ant system, we propose a preprocessing method that reduces computational overhead by calculating the probability transition matrix. In the tensorization of ant paths, we propose an index mapping method that accelerates the pheromone matrix update by replacing the sequential path update mechanism with parallel matrix operations. Additionally, we introduce an Adaptive Independent Roulette (AdaIR) method to overcome the challenges of parallelizing ACO's selection mechanism on GPUs. Comprehensive experiments demonstrate the superior performance of TensorACO, which achieves up to 1921$\times$ speedup over standard ACO. Moreover, the AdaIR method further improves TensorACO's convergence speed by 80% and solution quality by 2%. Source code is available at https://github.com/EMI-Group/tensoraco.
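The selection step that TensorACO parallelizes can be illustrated for a single ant. The Python sketch below computes the classic ant-system weights $\tau^\alpha \eta^\beta$ over unvisited cities and then samples the next city with the Gumbel-max trick, a standard exact categorical sampler whose per-candidate work is independent and therefore easy to batch on a GPU. Gumbel-max is swapped in here purely for illustration; it is not the paper's AdaIR method, and the helper names and default $\alpha$, $\beta$ are assumptions.

```python
import math
import random

def transition_weights(tau_row, eta_row, visited, alpha=1.0, beta=2.0):
    """Ant-system weights tau^alpha * eta^beta for moving from the current
    city to each candidate; visited cities get weight 0."""
    return [0.0 if j in visited else (t ** alpha) * (e ** beta)
            for j, (t, e) in enumerate(zip(tau_row, eta_row))]

def gumbel_max_select(weights, rng):
    """Sample index j with probability weights[j] / sum(weights).

    Each candidate's key log(w_j) + G_j, with G_j ~ Gumbel(0, 1), is computed
    independently; the argmax of the keys is an exact categorical sample."""
    best, best_key = None, -math.inf
    for j, w in enumerate(weights):
        if w <= 0.0:
            continue  # masked (already visited) city
        u = rng.random()
        while u == 0.0:  # log(0) is undefined; redraw
            u = rng.random()
        key = math.log(w) - math.log(-math.log(u))
        if key > best_key:
            best, best_key = j, key
    return best

rng = random.Random(0)
# Hypothetical 3-city row: city 0 is the current (visited) city,
# eta = 1 / distance to each city.
w = transition_weights([1.0, 1.0, 1.0], [1.0, 0.5, 1.0], visited={0})
print(w)  # [0.0, 0.25, 1.0] -> city 2 should be chosen about 80% of the time
print(gumbel_max_select(w, rng))
```

In a tensorized version, the loop over candidates (and over ants) would become element-wise array operations followed by a row-wise argmax.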

Submitted 12 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: Genetic and Evolutionary Computation Conference (GECCO '24)

arXiv:2404.03032 [pdf]

Even-Odd Layer-Dependent Exchange Bias Effect in MnBi2Te4 Chern Insulator Devices

Authors: Bo Chen, Xiaoda Liu, Yu-Hang Li, Han Tay, Takashi Taniguchi, Kenji Watanabe, Moses. H. W. Chan, Jiaqiang Yan, Fengqi Song, Ran Cheng, Cui-Zu Chang

Abstract: Magnetic topological materials with coexisting magnetism and non-trivial band structures exhibit many novel quantum phenomena, including the quantum anomalous Hall effect, the axion insulator state, and the Weyl semimetal phase. As a stoichiometric layered antiferromagnetic topological insulator, thin films of MnBi2Te4 show fascinating even-odd layer-dependent physics. In this work, we fabricate a series of thin-flake MnBi2Te4 devices using stencil masks and observe the Chern insulator state at high magnetic fields and a square hysteresis loop near zero magnetic field in all these devices. Upon magnetic field training, a large exchange bias effect is observed in odd but not in even septuple layer (SL) devices. Our theoretical calculations interpret this even-odd layer-dependent exchange bias effect as a consequence of contrasting surface and bulk magnetic properties of MnBi2Te4 devices. Our findings reveal the microscopic magnetic configuration of MnBi2Te4 thin flakes and highlight the challenges in replicating the zero magnetic field quantum anomalous Hall effect in odd SL MnBi2Te4 devices.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 23 pages, 4 figures, comments are very much welcome

arXiv:2404.01817 [pdf, other]

Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration

Authors: Lishuang Wang, Mengfei Zhao, Enyu Liu, Kebin Sun, Ran Cheng

Abstract: The NeuroEvolution of Augmenting Topologies (NEAT) algorithm has received considerable recognition in the field of neuroevolution. Its effectiveness derives from starting with simple networks and incrementally evolving both their topologies and weights. Although its capability across various challenges is evident, the algorithm's computational efficiency remains an impediment, limiting its scalability. In response, this paper introduces a tensorization method for the NEAT algorithm, enabling the transformation of its diverse network topologies and associated operations into uniformly shaped tensors for computation. This advancement allows the NEAT algorithm to be executed in parallel across the entire population. Furthermore, we develop TensorNEAT, a library that implements the tensorized NEAT algorithm and its variants, such as CPPN and HyperNEAT. Built on JAX, TensorNEAT promotes efficient parallel computation via automated function vectorization and hardware acceleration. Moreover, the TensorNEAT library supports various benchmark environments, including Gym, Brax, and gymnax. In evaluations across a spectrum of robotics control environments in Brax, TensorNEAT achieves up to 500x speedups compared to existing implementations such as NEAT-Python. Source code is available at https://github.com/EMI-Group/tensorneat.
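The core idea of tensorizing NEAT is to give every genome, whatever its topology, the same array shape. One common way to do that, sketched below in plain Python, is to pad connection lists to a shared length and carry a validity mask. TensorNEAT itself works with JAX arrays and automated vectorization; the padding sentinel and the helpers `pad_genomes` and `incoming_sum` are hypothetical and do not describe the library's actual encoding.

```python
def pad_genomes(genomes, pad_conn=(-1, -1, 0.0)):
    """Pad each genome's connection list of (in_node, out_node, weight)
    tuples to the population-wide maximum length, returning the padded
    lists and a boolean mask marking real (non-padding) connections."""
    width = max(len(g) for g in genomes)
    padded = [list(g) + [pad_conn] * (width - len(g)) for g in genomes]
    mask = [[True] * len(g) + [False] * (width - len(g)) for g in genomes]
    return padded, mask

def incoming_sum(conns, mask, values, node):
    """Weighted input to `node`. Padding entries are skipped via the mask,
    so the same routine (or, in a tensor library, the same vectorized op)
    runs on every genome regardless of its topology."""
    return sum(w * values[src]
               for (src, dst, w), m in zip(conns, mask)
               if m and dst == node)

# Two hypothetical genomes with different numbers of connections.
g0 = [(0, 2, 0.5), (1, 2, -1.0)]
g1 = [(0, 2, 2.0)]
padded, mask = pad_genomes([g0, g1])
values = {0: 1.0, 1: 2.0}  # outputs of input nodes 0 and 1
print([incoming_sum(c, m, values, node=2) for c, m in zip(padded, mask)])  # [-1.5, 2.0]
```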

Submitted 11 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: Genetic and Evolutionary Computation Conference (GECCO '24)

arXiv:2404.01159 [pdf, other]

doi 10.1145/3638529.3654223

GPU-accelerated Evolutionary Multiobjective Optimization Using Tensorized RVEA

Authors: Zhenyu Liang, Tao Jiang, Kebin Sun, Ran Cheng

Abstract: Evolutionary multiobjective optimization has witnessed remarkable progress during the past decades. However, existing algorithms often encounter computational challenges in large-scale scenarios, primarily attributed to the absence of hardware acceleration. In response, we introduce a Tensorized Reference Vector Guided Evolutionary Algorithm (TensorRVEA) for harnessing the advancements of GPU acceleration. In TensorRVEA, the key data structures and operators are fully transformed into tensor forms to leverage GPU-based parallel computing. In numerical benchmark tests involving large-scale populations and problem dimensions, TensorRVEA consistently demonstrates high computational performance, achieving speedups of over 1000$\times$. Then, we applied TensorRVEA to multiobjective neuroevolution to address complex challenges in robotic control tasks. Furthermore, we assessed TensorRVEA's extensibility by altering several tensorized reproduction operators. Experimental results demonstrate the promising scalability and robustness of TensorRVEA. Source code is available at https://github.com/EMI-Group/tensorrvea.
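TensorRVEA tensorizes RVEA, whose environmental selection keeps, for each reference vector, the solution with the smallest angle-penalized distance (APD). The sequential Python sketch below follows the APD formula as we understand it from the original RVEA: objective vectors are assumed already translated by the ideal point, the penalty grows with the generation ratio $t/t_{max}$ and is normalized by the smallest angle $\gamma_v$ between a reference vector and its neighbours, and $\alpha = 2$ is assumed. It is meant for orientation only, not as the paper's tensorized kernel.

```python
import math

def angle(a, b):
    """Angle in radians between two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norms)))  # clamp rounding error

def smallest_neighbor_angle(ref_vectors, j):
    """gamma_v: smallest angle between reference vector j and any other one."""
    return min(angle(ref_vectors[j], v) for k, v in enumerate(ref_vectors) if k != j)

def apd(f_translated, v, gamma_v, t, t_max, alpha=2.0):
    """Angle-penalized distance of a translated objective vector w.r.t. v."""
    m = len(f_translated)                      # number of objectives
    theta = angle(f_translated, v)
    penalty = m * (t / t_max) ** alpha * theta / gamma_v
    return (1.0 + penalty) * math.sqrt(sum(x * x for x in f_translated))

refs = [(1.0, 0.0), (0.0, 1.0)]
gamma = smallest_neighbor_angle(refs, 0)       # pi/2 for these two vectors
print(apd((1.0, 1.0), refs[0], gamma, t=0, t_max=100))    # no penalty early: ~1.414
print(apd((1.0, 1.0), refs[0], gamma, t=100, t_max=100))  # full penalty at the end: ~2.828
```

Early in the run the criterion is essentially the distance to the ideal point (favoring convergence); late in the run the angular penalty dominates (favoring diversity along the reference vectors).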

Submitted 11 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: Genetic and Evolutionary Computation Conference (GECCO '24)

arXiv:2403.13463 [pdf, ps, other]

Derived categories of quartic double fivefolds

Authors: Raymond Cheng, Alexander Perry, Xiaolei Zhao

Abstract: We construct singular quartic double fivefolds whose Kuznetsov component admits a crepant categorical resolution of singularities by a twisted Calabi--Yau threefold. We also construct rational specializations of these fivefolds where such a resolution exists without a twist. This confirms an instance of a higher-dimensional version of Kuznetsov's rationality conjecture, and of a noncommutative version of Reid's fantasy on the connectedness of the moduli of Calabi--Yau threefolds.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 21 pages, comments welcome!

MSC Class: 14F08; 14E08 (primary); 14M20; 14D06 (secondary)

arXiv:2403.13286 [pdf, other]

A Sampling-based Framework for Hypothesis Testing on Large Attributed Graphs

Authors: Yun Wang, Chrysanthi Kosyfaki, Sihem Amer-Yahia, Reynold Cheng

Abstract: Hypothesis testing is a statistical method used to draw conclusions about populations from sample data, typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing on graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses in attributed graphs. We develop a sampling-based hypothesis testing framework that can accommodate existing hypothesis-agnostic graph sampling methods. To achieve accurate and efficient sampling, we then propose a Path-Hypothesis-Aware SamplEr, PHASE, an m-dimensional random walk that accounts for the paths specified in a hypothesis. We further optimize its time efficiency and propose PHASEopt. Experiments on real datasets demonstrate the ability of our framework to leverage common graph sampling methods for hypothesis testing, and the superiority of hypothesis-aware sampling in terms of accuracy and time efficiency.
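Because the framework can plug in existing hypothesis-agnostic samplers, it helps to recall why a plain random walk needs correcting: on a connected undirected graph it visits nodes in proportion to their degree, so a naive sample mean over-weights hubs. The Python sketch below applies the standard 1/degree reweighting to estimate a node-attribute mean, which a test statistic could then use. It is a generic illustration, not PHASE, and the helper names, graph, and attribute are hypothetical.

```python
import random

def random_walk(adj, start, steps, rng):
    """Simple random walk of `steps` nodes (including the start) on an
    undirected graph given as {node: [neighbors]}."""
    node, path = start, [start]
    for _ in range(steps - 1):
        node = rng.choice(adj[node])
        path.append(node)
    return path

def reweighted_mean(path, adj, attr):
    """Estimate the mean of `attr` over all nodes from a random-walk sample.

    At stationarity a simple random walk visits each node in proportion to
    its degree, so each visit is weighted by 1/degree to undo that bias."""
    num = sum(attr[v] / len(adj[v]) for v in path)
    den = sum(1.0 / len(adj[v]) for v in path)
    return num / den

# Hypothetical path graph 0 - 1 - 2; only the middle node has attribute 1.
adj = {0: [1], 1: [0, 2], 2: [1]}
attr = {0: 0.0, 1: 1.0, 2: 0.0}
walk = random_walk(adj, start=0, steps=1000, rng=random.Random(0))
print(sum(attr[v] for v in walk) / len(walk))  # naive: 0.5, biased toward the hub
print(reweighted_mean(walk, adj, attr))        # reweighted: 1/3, the true node mean
```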

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.11073 [pdf]

Tokensome: Towards a Genetic Vision-Language GPT for Explainable and Cognitive Karyotyping

Authors: Haoxi Zhang, Xinxu Zhang, Yuanxin Lin, Maiqi Wang, Yi Lai, Yu Wang, Linfeng Yu, Yufeng Xu, Ran Cheng, Edward Szczerbicki

Abstract: Automatic karyotype analysis is often defined as a visual perception task focused solely on chromosomal object-level modeling. This definition has led most existing methods to overlook componential and holistic information, significantly constraining model performance. Moreover, the lack of interpretability in current technologies hinders clinical adoption. In this paper, we introduce Tokensome, a novel vision-language model based on chromosome tokenization for explainable and cognitive karyotyping. Tokensome elevates the method from the conventional visual perception layer to the cognitive decision-making layer. This elevation enables the integration of domain knowledge and cognitive reasoning via knowledge graphs and LLMs, markedly enhancing the model's explainability and facilitating abnormality detection.

Submitted 16 March, 2024; originally announced March 2024.

Comments: Preprint. Work in progress

arXiv:2403.08600 [pdf]

Evaluation of Control/User-Plane Denial-of-Service (DoS) Attack on O-RAN Fronthaul Interface

Authors: Ferlinda Feliana, Ting-Wei Hung, Binbin Chen, Ray-Guang Cheng

Abstract: The open fronthaul interface defined by the O-RAN ALLIANCE aims to support interoperability between multi-vendor open radio access network (O-RAN) radio units (O-RUs) and O-RAN distributed units (O-DUs). This paper introduces a new tool that can be used to evaluate Denial-of-Service (DoS) attacks against the open fronthaul interface. We launched an array of control/user-plane (C/U-Plane) attacks with the tool under different traffic types and data rates, and we evaluated their impacts on the throughput and block error rate (BLER) of real-world O-RAN systems with commercial hardware.

Submitted 13 March, 2024; originally announced March 2024.

Comments: Accepted by IEEE INFOCOM Workshop: Next-generation Open and Programmable Radio Access Networks (NG-OPERA)

arXiv:2403.07846 [pdf, other]

Topology-induced symmetry breaking: a demonstration in antiferromagnetic magnons on a Möbius strip

Authors: Kuangyin Deng, Ran Cheng

Abstract: We propose a mechanism of topology-induced symmetry breaking, where a local symmetry preserved by the Hamiltonian is broken in the excited eigenstates due to the nontrivial boundary condition. As a demonstration, we study magnon excitations on a Möbius strip comprising two antiferromagnetically coupled spin chains. Even under a simple Hamiltonian respecting local rotational symmetry and without considering curvature effects, magnons exhibit linear polarization of the Néel vector devoid of chirality and form two non-degenerate branches that can neither be smoothly connected to nor decomposed into the circularly-polarized magnons typically seen in antiferromagnets. One branch supports standing-wave formation on the Möbius strip while the other does not, owing to its spectral shift incurred by the boundary condition. Our findings showcase the significant influence of real-space topology on the physical nature of quasiparticles.

Submitted 12 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07145

Electrically Programmable Pixelated Graphene-Integrated Plasmonic Metasurfaces for Coherent Mid-Infrared Emission

Authors: Xiu Liu, Yibai Zhong, Zexiao Wang, Tianyi Huang, Sen Lin, Jingyi Zou, Haozhe Wang, Zhien Wang, Zhuo Li, Xiao Luo, Rui Cheng, Jiayu Li, Hyeong Seok Yun, Han Wang, Jing Kong, Xu Zhang, Sheng Shen

Abstract: Active metasurfaces have recently emerged as compact, lightweight, and efficient platforms for dynamic control of electromagnetic fields and optical responses. However, the complexities associated with their post-fabrication tunability significantly hinder their widespread applications, especially for the mid-infrared range due to material scarcity and design intricacy. Here, we experimentally demonstrate highly dynamic, pixelated modulations of coherent mid-infrared emission based on an electrically programmable plasmonic metasurface integrated with graphene field effect transistors (Gr-FETs). The ultrabroad infrared transparency of graphene allows for free-form control over plasmonic meta-atoms, thus achieving coherent mid-infrared states across a broad range of wavelengths and polarizations. The spatial temperature modulation generated by Gr-FETs is effectively synergized with the emissivity control by the localized surface plasmon polaritons from gold nanoantennas. This integrated temperature-emissivity modulation of metasurfaces is systematically extended to form a pixelated 2D array, envisioning new approaches toward scalable 2D electrical wiring for densely packed, independently controlled pixels.

Submitted 6 May, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: Needs more updates for the experiments

arXiv:2403.05680 [pdf, other]

How Well Do Multi-modal LLMs Interpret CT Scans? An Auto-Evaluation Framework for Analyses

Authors: Qingqing Zhu, Benjamin Hou, Tejas S. Mathai, Pritam Mukherjee, Qiao Jin, Xiuying Chen, Zhizheng Wang, Ruida Cheng, Ronald M. Summers, Zhiyong Lu

Abstract: Automatically interpreting CT scans can ease the workload of radiologists. However, this is challenging, mainly due to the scarcity of adequate datasets and reference standards for evaluation. This study aims to bridge this gap by introducing a novel evaluation framework named "GPTRadScore". This framework assesses the capabilities of multi-modal LLMs, such as GPT-4 with Vision (GPT-4V), Gemini Pro Vision, LLaVA-Med, and RadFM, in generating descriptions for prospectively identified findings. By employing a decomposition technique based on GPT-4, GPTRadScore compares these generated descriptions with gold-standard report sentences, analyzing their accuracy in terms of body part, location, and type of finding. Evaluations demonstrated a high correlation with clinician assessments and highlighted its potential over traditional metrics, such as BLEU, METEOR, and ROUGE. Furthermore, to contribute to future studies, we plan to release a benchmark dataset annotated by clinicians. Using GPTRadScore, we found that while GPT-4V and Gemini Pro Vision fare better, their performance revealed significant areas for improvement, primarily due to limitations in the dataset used for training these models. To demonstrate this potential, RadFM was fine-tuned, resulting in significant accuracy improvements: location accuracy rose from 3.41% to 12.8%, body part accuracy from 29.12% to 53%, and type accuracy from 9.24% to 30%, thereby validating our hypothesis.

Submitted 18 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05307 [pdf, other]

Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents

Authors: Jinyang Li, Nan Huo, Yan Gao, Jiayi Shi, Yingxiu Zhao, Ge Qu, Yurong Wu, Chenhao Ma, Jian-Guang Lou, Reynold Cheng

Abstract: Interactive Data Analysis, the collaboration between humans and LLM agents, enables real-time data exploration for informed decision-making. The challenges and costs of collecting realistic interactive logs for data analysis hinder the quantitative evaluation of Large Language Model (LLM) agents in this task. To mitigate this issue, we introduce Tapilot-Crossing, a new benchmark to evaluate LLM agents on interactive data analysis. Tapilot-Crossing contains 1024 interactions, covering 4 practical scenarios: Normal, Action, Private, and Private Action. Notably, Tapilot-Crossing is constructed using an economical multi-agent environment, Decision Company, with little human effort. We evaluate popular and advanced LLM agents in Tapilot-Crossing, which underscores the challenges of interactive data analysis. Furthermore, we propose Adaptive Interaction Reflection (AIR), a self-generated reflection strategy that guides LLM agents to learn from successful history. Experiments demonstrate that AIR can evolve LLMs into effective interactive data analysis agents, achieving a relative performance improvement of up to 44.5%.

Submitted 8 March, 2024; originally announced March 2024.

Comments: 30 pages, 7 figures

arXiv:2403.04796 [pdf, other]

Blockchain-Enhanced UAV Networks for Post-Disaster Communication: A Decentralized Flocking Approach

Authors: Sana Hafeez, Runze Cheng, Lina Mohjazi, Yao Sun, Muhammad Ali Imran

Abstract: Unmanned Aerial Vehicles (UAVs) have significant potential for agile communication and relief coordination in post-disaster scenarios, particularly when ground infrastructure is compromised. However, efficiently coordinating and securing flocks of heterogeneous UAVs from different service providers poses significant challenges related to privacy, scalability, lightweight consensus protocols, and comprehensive cybersecurity mechanisms. This study introduces a robust blockchain-enabled framework designed to tackle these technical challenges through a combination of consensus protocols, smart contracts, and cryptographic techniques. First, we propose a consortium blockchain architecture that ensures secure and private multi-agency coordination by controlling access and safeguarding the privacy of sensitive data. Second, we develop an optimized hybrid consensus protocol that merges Delegated Proof of Stake and Practical Byzantine Fault Tolerance (DPOS-PBFT), aiming to achieve an effective balance between efficiency, security, and resilience against node failures. Finally, we introduce decentralized flocking algorithms that facilitate adaptable and autonomous operations among specialized UAV clusters, ensuring critical disaster relief functions under conditions of uncertain connectivity. Comprehensive simulations demonstrate that the system achieved linear scaling of throughput up to 500 UAV nodes, with only a 50 ms increase in latency from 10 to 500 nodes.
The framework maintained high throughput and low latency despite spoofing, denial-of-service (DoS), and tampering attacks, showing strong cyber resilience. Communication latencies were kept under 10 ms for diverse UAV operations through self-optimizing network intelligence, with median values around 2 to 3 ms.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 11 pages, 9 figures, Digital Communications and Networks Open access

arXiv:2402.17237 [pdf, other]

Image-Text Matching with Multi-View Attention

Authors: Rui Cheng, Wanqing Cui

Abstract: Existing two-stream models for image-text matching show good performance while ensuring retrieval speed and have received extensive attention from industry and academia. These methods use a single representation to encode image and text separately and obtain a matching score from the cosine similarity or inner product of the vectors. However, the performance of the two-stream model is often sub-optimal. On the one hand, a single representation struggles to cover complex content comprehensively. On the other hand, because this framework lacks interaction between the two streams, it struggles to match multiple meanings, which leads to information being ignored. To address these problems and improve the performance of the two-stream model, we propose a multi-view attention approach for two-stream image-text matching, MVAM (Multi-View Attention Model). It first learns multiple image and text representations through diverse attention heads with different view codes, and then concatenates these representations into one for matching. A diversity objective is also used to promote diversity between attention heads. With this method, models are able to encode images and text from different views and attend to more key points, yielding representations that contain more information. When performing retrieval tasks, the matching scores between images and texts can be calculated from different aspects, leading to better matching performance.
Experimental results on MSCOCO and Flickr30K show that our proposed model brings improvements over existing models. Further case studies show that different attention heads can focus on different content and finally obtain a more comprehensive representation.

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.15331 [pdf, other]

A Blockchain-Enabled Framework of UAV Coordination for Post-Disaster Networks

Authors: Sana Hafeez, Runze Cheng, Lina Mohjazi, Muhammad Ali Imran, Yao Sun

Abstract: Emergency communication is critical but challenging after natural disasters when ground infrastructure is devastated. Unmanned aerial vehicles (UAVs) offer enormous potential for agile relief coordination in these scenarios. However, effectively leveraging UAV fleets poses additional challenges around security, privacy, and efficient collaboration across response agencies. This paper presents a robust blockchain-enabled framework to address these challenges by integrating a consortium blockchain model, smart contracts, and cryptographic techniques to securely coordinate UAV fleets for disaster response. Specifically, we make two key contributions: a consortium blockchain architecture for secure and private multi-agency coordination; and an optimized consensus protocol that balances efficiency and fault tolerance using delegated proof of stake practical Byzantine fault tolerance (DPoS-PBFT). Comprehensive simulations showcase the framework's ability to enhance transparency, automation, scalability, and cyber-attack resilience for UAV coordination in post-disaster networks.

Submitted 23 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures, IEEE 99th Vehicular Technology Conference: VTC2024-Spring, Singapore

arXiv:2402.13116 [pdf, other]

A Survey on Knowledge Distillation of Large Language Models

Authors: Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou

Abstract: In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral. Additionally, as open-source LLMs flourish, KD plays a crucial role both in compressing these models and in facilitating their self-improvement by employing themselves as teachers. This paper presents a comprehensive survey of KD's role within the realm of LLMs, highlighting its critical function in imparting advanced knowledge to smaller models and its utility in model compression and self-improvement. Our survey is meticulously structured around three foundational pillars: algorithm, skill, and verticalization, providing a comprehensive examination of KD mechanisms, the enhancement of specific cognitive abilities, and their practical implications across diverse fields. Crucially, the survey navigates the intricate interplay between data augmentation (DA) and KD, illustrating how DA emerges as a powerful paradigm within the KD framework to bolster LLMs' performance. By leveraging DA to generate context-rich, skill-specific training data, KD transcends traditional boundaries, enabling open-source models to approximate the contextual adeptness, ethical alignment, and deep semantic insights characteristic of their proprietary counterparts.
This work aims to provide an insightful guide for researchers and practitioners, offering a detailed overview of current methodologies in KD and proposing future research directions. Importantly, we firmly advocate for compliance with the legal terms that regulate the use of LLMs, ensuring ethical and lawful application of KD of LLMs. An associated GitHub repository is available at https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs.

Submitted 8 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: 44 pages

arXiv:2402.12699 [pdf, other]

Positive temperature-dependent thermal conductivity induced by wavelike phonons in complex Ag-based argyrodites

Authors: Niuchang Ouyang, Dongyi Shen, Chen Wang, Ruihuan Cheng, Qi Wang, Yue Chen

Abstract: The phonon transport mechanisms and the anomalous temperature-dependent lattice thermal conductivities (kL) in Ag-based argyrodites are not yet fully understood. Herein, we systematically study the phonon thermal transport of five Ag-based crystalline argyrodites, Ag7PS6, Ag7AsS6, Ag8SnS6, Ag8GeS6, and Ag9GaS6, utilizing perturbation theory and the unified theory thermal transport model. Our results show that, as the complexity of the unit cell increases, the proportion of the population terms falls while the coherence contributions become more significant. This leads to a relatively weak temperature dependence of kL in Ag7PS6 and Ag7AsS6, whereas the more complex crystalline argyrodites, Ag8SnS6, Ag8GeS6, and Ag9GaS6, exhibit a glass-like temperature dependence of kL. We attribute the positive temperature dependence and ultralow kL of Ag8SnS6, Ag8GeS6, and Ag9GaS6 to the dominance of wavelike phonons and strong phonon broadening. Furthermore, using laser flash measurements and homogeneous non-equilibrium molecular dynamics simulations based on accurate machine learning neuroevolution potentials, we provide further evidence for the glass-like temperature-dependent kL of Ag8SnS6 and Ag8GeS6.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 6 pages, 4 figures

arXiv:2402.09884 [pdf, other]

$q$-bic threefolds and their surface of lines

Authors: Raymond Cheng

Abstract: For any power $q$ of the positive ground field characteristic, a smooth $q$-bic threefold, for example the Fermat threefold of degree $q+1$, has a smooth surface $S$ of lines which behaves like the Fano surface of a smooth cubic threefold. I develop projective, moduli-theoretic, and degeneration techniques to study the geometry of $S$. Using, in addition, the modular representation theory of the finite unitary group and the geometric theory of filtrations, I compute the cohomology of the structure sheaf of $S$ when $q$ is prime.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 47 pages. Comments very welcome!

MSC Class: 14F10; 14J29; 14G17; (primary); 14J70; 14M12; 14N05; (secondary)

arXiv:2402.07425 [pdf, other]

doi 10.1145/3589334.3645421

Debiasing Recommendation with Personal Popularity

Authors: Wentao Ning, Reynold Cheng, Xiao Yan, Ben Kao, Nan Huo, Nur Al Hasan Haldar, Bo Tang

Abstract: Global popularity (GP) bias is the phenomenon that popular items are recommended much more frequently than they should be, which goes against the goal of providing personalized recommendations and harms user experience and recommendation accuracy. Many methods have been proposed to reduce GP bias, but they fail to notice the fundamental problem of GP, i.e., it considers popularity from a global perspective of all users and uses a single set of popular items, and thus cannot capture the interests of individual users. To address this, we propose a user-aware version of item popularity named personal popularity (PP), which identifies different popular items for each user by considering the users that share similar interests. As PP models the preferences of individual users, it naturally helps to produce personalized recommendations and mitigate GP bias. To integrate PP into recommendation, we design a general personal popularity aware counterfactual (PPAC) framework, which adapts easily to existing recommendation models. In particular, PPAC recognizes that PP and GP have both direct and indirect effects on recommendations and controls direct effects with counterfactual inference techniques for unbiased recommendations. All code and datasets are available at https://github.com/Stevenn9981/PPAC.

Submitted 21 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted by WWW'24 as a research full paper

arXiv:2402.06071 [pdf, other]

Keyframer: Empowering Animation Design using Large Language Models

Authors: Tiffany Tseng, Ruijia Cheng, Jeffrey Nichols

Abstract: Large language models (LLMs) have the potential to impact a wide range of creative domains, but the application of LLMs to animation is underexplored and presents novel challenges such as how users might effectively describe motion in natural language. In this paper, we present Keyframer, a design tool for animating static images (SVGs) with natural language. Informed by interviews with professional animation designers and engineers, Keyframer supports exploration and refinement of animations through the combination of prompting and direct editing of generated output. The system also enables users to request design variants, supporting comparison and ideation. Through a user study with 13 participants, we contribute a characterization of user prompting strategies, including a taxonomy of semantic prompt types for describing motion and a 'decomposed' prompting style where users continually adapt their goals in response to generated output. We share how direct editing along with prompting enables iteration beyond the one-shot prompting interfaces common in generative tools today. Through this work, we propose how LLMs might empower a range of audiences to engage with animation creation.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.03362 [pdf, other]

NanoNER: Named Entity Recognition for nanobiology using experts' knowledge and distant supervision

Authors: Martin Lentschat, Cyril Labbé, Ran Cheng

Abstract: Here we present the training and evaluation of NanoNER, a Named Entity Recognition (NER) model for nanobiology. NER consists of identifying specific entities in spans of unstructured text and is often a primary task in Natural Language Processing (NLP) and Information Extraction. The aim of our model is to recognise entities previously identified by domain experts as constituting the essential knowledge of the domain. Relying on ontologies, which provide us with a domain vocabulary and taxonomy, we implemented an iterative process enabling experts to determine the entities relevant to the domain at hand. We then delve into the potential of distant supervision learning in NER, showing how this method can increase the quantity of annotated data with minimal additional manpower. On our full corpus of 728 full-text nanobiology articles, containing more than 120k entity occurrences, NanoNER obtained an F1-score of 0.98 on the recognition of previously known entities. Our model also demonstrated its ability to discover new entities in the text, with precision scores ranging from 0.77 to 0.81. Ablation experiments further confirmed this and allowed us to assess the dependency of our approach on the external resources: they highlighted this dependency while also confirming the approach's ability to rediscover up to 30% of the ablated terms. This paper details the methodology employed, the experimental design, and key findings, providing valuable insights and directions for future research on NER in specialized domains.
Furthermore, since our approach requires minimal manpower, we believe that it can be generalized to other specialized fields.

Submitted 30 January, 2024; originally announced February 2024.

arXiv:2401.14201 [pdf]

Investigating Organic Carbon and Thermal History of CM Carbonaceous Chondrites Using Spectroscopy and Laboratory Techniques

Authors: Safoura Tanbakouei, Rui-Lin Cheng, Binlong Ye, Josep Ryan Michalski, Ashley J. King

Abstract: The CM chondrites are characterized as primary accretionary rocks which originate from primitive water-rich asteroids formed during the early Solar System. Here, we study the mineralogy and organic characteristics of eight CM chondrites and one ungrouped chondrite to better understand their alteration history: Queen Alexandra Range 93005 (QUE 93005), Murchison, LaPaz Icefield 02333 (LAP 02333), Miller Range (MIL 13005), Mackay Glacier 05231 (MCY 05231), Northwest Africa 8534 (NWA 8534), Northwest Africa 3340 (NWA 3340), Yamato 86695 (Y-86695), and the ungrouped carbonaceous chondrite Belgica 7904 (B-7904). Raman spectroscopy has been employed to detect the presence of organic carbon in the samples, specifically through the G band at approximately 1580 cm$^{-1}$ and the D band at around 1350 cm$^{-1}$. The properties of organic matter in meteorites serve as valuable indicators for characterizing the structure and crystallinity of carbonaceous materials and estimating their degree of thermal metamorphism. The R1 parameter, defined as the peak height ratio of the D and G bands, provides a quantifiable measure of this structural organization. Raman spectra are used to show the general mineralogy, thermal history, and heating stage of CM and ungrouped chondrites. X-ray diffraction patterns further indicate the mineralogical compositions of the samples.
Visible to near-infrared (VNIR) and attenuated total reflection (ATR) reflectance spectra illustrate trends related to their mineralogy and are further used to infer the aqueous alteration and thermal history of the CM carbonaceous chondrites and the formation and evolution of their parent bodies.

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.01474 [pdf, other]

doi 10.15607/RSS.2023.XIX.055

Demonstrating Mobile Manipulation in the Wild: A Metrics-Driven Approach

Authors: Max Bajracharya, James Borders, Richard Cheng, Dan Helmick, Lukas Kaul, Dan Kruse, John Leichty, Jeremy Ma, Carolyn Matl, Frank Michel, Chavdar Papazov, Josh Petersen, Krishna Shankar, Mark Tjersland

Abstract: We present our general-purpose mobile manipulation system consisting of a custom robot platform and key algorithms spanning perception and planning. To extensively test the system in the wild and benchmark its performance, we choose a grocery shopping scenario in an actual, unmodified grocery store. We derive key performance metrics from detailed robot log data collected during six week-long field tests, spread across 18 months. These objective metrics, gained from complex yet repeatable tests, drive the direction of our research efforts and let us continuously improve our system's performance. We find that thorough end-to-end system-level testing of a complex mobile manipulation system can serve as a reality check for state-of-the-art methods in robotics. This effectively grounds robotics research efforts in real-world needs and challenges, which we deem highly useful for the advancement of the field. To this end, we share our key insights and takeaways to inspire and accelerate similar system-level research projects.

Submitted 2 January, 2024; originally announced January 2024.

Comments: Presented at RSS 2023 [Best Demo Paper Award]

arXiv:2312.10890 [pdf, other]

Low-latency Space-time Supersampling for Real-time Rendering

Authors: Ruian He, Shili Zhou, Yuqi Sun, Ri Cheng, Weimin Tan, Bo Yan

Abstract: With the rise of real-time rendering and the evolution of display devices, there is a growing demand for post-processing methods that offer high-resolution content at a high frame rate. Existing techniques often suffer from quality and latency issues due to the disjointed treatment of frame supersampling and extrapolation. In this paper, we recognize the shared context and mechanisms between frame supersampling and extrapolation, and present a novel framework, Space-time Supersampling (STSS). By integrating them into a unified framework, STSS can improve the overall quality with lower latency. To implement an efficient architecture, we treat aliasing and warping holes uniformly as reshading regions and put forth two key components to compensate for these regions, namely Random Reshading Masking (RRM) and Efficient Reshading Module (ERM). Extensive experiments demonstrate that our approach achieves superior visual fidelity compared to state-of-the-art (SOTA) methods. Notably, this performance is achieved in only 4 ms, saving up to 75% of the time required by the conventional two-stage pipeline, which takes 17 ms.

Submitted 17 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2312.07180 [pdf, other]

Context-Aware Iteration Policy Network for Efficient Optical Flow Estimation

Authors: Ri Cheng, Ruian He, Xuhao Jiang, Shili Zhou, Weimin Tan, Bo Yan

Abstract: Existing recurrent optical flow estimation networks are computationally expensive since they use a fixed large number of iterations to update the flow field for each sample. An efficient network should skip iterations when the flow improvement is limited. In this paper, we develop a Context-Aware Iteration Policy Network for efficient optical flow estimation, which determines the optimal number of iterations per sample. The policy network achieves this by learning contextual information to recognize whether flow improvement is bottlenecked or minimal. On the one hand, we use an iteration embedding and a historical hidden cell, which carry information from previous iterations, to convey how the flow has changed over previous iterations. On the other hand, we use the incremental loss to make the policy network implicitly perceive the magnitude of optical flow improvement in the subsequent iteration. Furthermore, the computational complexity of our dynamic network is controllable, allowing us to satisfy various resource preferences with a single trained model. Our policy network can be easily integrated into state-of-the-art optical flow networks. Extensive experiments show that our method maintains performance while reducing FLOPs by about 40%/20% for the Sintel/KITTI datasets.

Submitted 5 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: 2024, Association for the Advancement of Artificial Intelligence

arXiv:2312.05734 [pdf, ps, other]

A Duality Approach to Regularized Learning Problems in Banach Spaces

Authors: Raymond Cheng, Rui Wang, Yuesheng Xu

Abstract: Learning methods in Banach spaces are often formulated as regularization problems which minimize the sum of a data fidelity term in a Banach norm and a regularization term in another Banach norm. Due to the infinite-dimensional nature of the space, solving such regularization problems is challenging. We construct a direct sum space based on the Banach spaces for the data fidelity term and the regularization term, and then recast the objective function as the norm of a suitable quotient space of the direct sum space. In this way, we express the original regularized problem as an unregularized problem on the direct sum space, which is in turn reformulated as a dual optimization problem in the dual space of the direct sum space. The dual problem is to find the maximum of a linear function on a convex polytope, which may be solved by linear programming. A solution of the original problem is then obtained from a solution of the dual problem by using related extremal properties of norming functionals. Numerical experiments are included to demonstrate that the proposed duality approach leads to an implementable numerical method for solving the regularized learning problems.

Submitted 9 December, 2023; originally announced December 2023.

Showing 1–50 of 330 results for author: Cheng, R