Thinking Fast and Slow: Data-Driven Adaptive DeFi Borrow-Lending Protocol

Authors: Mahsa Bastankhah, Viraj Nadkarni, Xuechao Wang, Chi Jin, Sanjeev Kulkarni, Pramod Viswanath

Abstract: Decentralized finance (DeFi) borrowing and lending platforms are crucial to the decentralized economy, involving two main participants: lenders who provide assets for interest and borrowers who offer collateral exceeding their debt and pay interest. Collateral volatility necessitates over-collateralization to protect lenders and ensure competitive returns. Traditional DeFi platforms use a fixed interest rate curve based on the utilization rate (the fraction of available assets borrowed) and determine over-collateralization offline through simulations to manage risk. This method doesn't adapt well to dynamic market changes, such as price fluctuations and evolving user needs, often resulting in losses for lenders or borrowers. In this paper, we introduce an adaptive, data-driven protocol for DeFi borrowing and lending. Our approach includes a high-frequency controller that dynamically adjusts interest rates to maintain market stability and competitiveness with external markets. Unlike traditional protocols, which rely on user reactions and often adjust slowly, our controller uses a learning-based algorithm to quickly find optimal interest rates, reducing the opportunity cost for users during periods of misalignment with external rates. Additionally, we use a low-frequency planner that analyzes user behavior to set an optimal over-collateralization ratio, balancing risk reduction with profit maximization over the long term.
This dual approach is essential for adaptive markets: the short-term component maintains market stability, preventing exploitation, while the long-term planner optimizes market parameters to enhance profitability and reduce risks. We provide theoretical guarantees on the convergence rates and adversarial robustness of the short-term component and the long-term effectiveness of our protocol. Empirical validation confirms our protocol's theoretical benefits.

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.05875 [pdf, other]

Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling

Authors: Lintao Zhang, Xiangcheng Du, LeoWu TomyEnrique, Yiqun Wang, Yingbin Zheng, Cheng Jin

Abstract: For image inpainting, the existing Denoising Diffusion Probabilistic Model (DDPM) based method i.e. RePaint can produce high-quality images for any inpainting form. It utilizes a pre-trained DDPM as a prior and generates inpainting results by conditioning on the reverse diffusion process, namely denoising process. However, this process is significantly time-consuming. In this paper, we propose an efficient DDPM-based image inpainting method which includes three speed-up strategies. First, we utilize a pre-trained Light-Weight Diffusion Model (LWDM) to reduce the number of parameters. Second, we introduce a skip-step sampling scheme of Denoising Diffusion Implicit Models (DDIM) for the denoising process. Finally, we propose Coarse-to-Fine Sampling (CFS), which speeds up inference by reducing image resolution in the coarse stage and decreasing denoising timesteps in the refinement stage. We conduct extensive experiments on both faces and general-purpose image inpainting tasks, and our method achieves competitive performance with approximately 60 times speedup.

Submitted 8 July, 2024; originally announced July 2024.

Comments: The code is available at: https://github.com/linghuyuhangyuan/M2S

arXiv:2407.02769 [pdf, other]

Fine-Grained Scene Image Classification with Modality-Agnostic Adapter

Authors: Yiqun Wang, Zhao Zhou, Xiangcheng Du, Xingjiao Wu, Yingbin Zheng, Cheng Jin

Abstract: When dealing with the task of fine-grained scene image classification, most previous works lay much emphasis on global visual features when doing multi-modal feature fusion. In other words, models are deliberately designed based on prior intuitions about the importance of different modalities. In this paper, we present a new multi-modal feature fusion approach named MAA (Modality-Agnostic Adapter), trying to make the model learn the importance of different modalities in different cases adaptively, without giving a prior setting in the model architecture. More specifically, we eliminate the modal differences in distribution and then use a modality-agnostic Transformer encoder for a semantic-level feature fusion. Our experiments demonstrate that MAA achieves state-of-the-art results on benchmarks by applying the same modalities with previous methods. Besides, it is worth mentioning that new modalities can be easily added when using MAA and further boost the performance. Code is available at https://github.com/quniLcs/MAA.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.00564 [pdf, other]

Variational Nonparametric Inference in Functional Stochastic Block Model

Authors: Zuofeng Shang, Peijun Sang, Yang Feng, Chong Jin

Abstract: We propose a functional stochastic block model whose vertices involve functional data information. This new model extends the classic stochastic block model with vector-valued nodal information, and finds applications in real-world networks whose nodal information could be functional curves. Examples include international trade data in which a network vertex (country) is associated with the annual or quarterly GDP over a certain time period, and MyFitnessPal data in which a network vertex (MyFitnessPal user) is associated with daily calorie information measured over a certain time period. Two statistical tasks will be jointly executed. First, we will detect community structures of the network vertices assisted by the functional nodal information. Second, we propose a computationally efficient variational test to examine the significance of the functional nodal information. We show that the community detection algorithms achieve weak and strong consistency, and the variational test is asymptotically chi-square with diverging degrees of freedom. As a byproduct, we propose pointwise confidence intervals for the slope function of the functional nodal information. Our methods are examined through both simulated and real datasets.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00336 [pdf, other]

Dual-view Aware Smart Contract Vulnerability Detection for Ethereum

Authors: Jiacheng Yao, Maolin Wang, Wanqi Chen, Chengxiang Jin, Jiajun Zhou, Shanqing Yu, Qi Xuan

Abstract: The wide application of Ethereum technology has brought technological innovation to traditional industries. As one of Ethereum's core applications, smart contracts utilize diverse contract codes to meet various functional needs and have gained widespread use. However, the non-tamperability of smart contracts, coupled with vulnerabilities caused by natural flaws or human errors, has brought unprecedented challenges to blockchain security. Therefore, in order to ensure the healthy development of blockchain technology and the stability of the blockchain community, it is particularly important to study the vulnerability detection techniques for smart contracts. In this paper, we propose a Dual-view Aware Smart Contract Vulnerability Detection Framework named DVDet. The framework initially converts the source code and bytecode of smart contracts into weighted graphs and control flow sequences, capturing potential risk features from these two perspectives and integrating them for analysis, ultimately achieving effective contract vulnerability detection. Comprehensive experiments on the Ethereum dataset show that our method outperforms others in detecting vulnerabilities.

Submitted 29 June, 2024; originally announced July 2024.

Comments: Accepted by International Conference on Blockchain and Trustworthy Systems 2024

arXiv:2406.18969 [pdf, ps, other]

Asymptotics of quantized barycenters of lattice polytopes with applications to algebraic geometry

Authors: Chenzi Jin, Yanir A. Rubinstein

Abstract: This article addresses a combinatorial problem with applications to algebraic geometry. To a convex lattice polytope $P$ and each of its integer dilations $kP$ one may associate the barycenter of its lattice points. This sequence of $k$-quantized barycenters converges to the (classical) barycenter of the polytope considered as a convex body. A basic question arises: is there a complete asymptotic expansion for this sequence? If so, what are its terms? This article initiates the study of this question. First, we establish the existence of such an expansion as well as determine the first two terms. Second, for Delzant lattice polytopes we use toric algebra to determine all terms using mixed volumes of virtual rooftop polytopes, or alternatively in terms of higher Donaldson--Futaki invariants. Third, for reflexive polytopes we show the quantized barycenters are colinear to first order, and actually colinear in the case of polygons. The proofs use Ehrhart theory, convexity arguments, and toric algebra. As applications we derive the complete asymptotic expansion of the Fujita--Odaka stability thresholds $δ_k$ on arbitrary polarizations on (possibly singular) toric varieties. In fact, we show they are rational functions of $k$ for sufficiently large $k$. This gives the first general result on Tian's stabilization problem for $δ_k$-invariants for (possibly singular) toric Fanos: $δ_k$ stabilize in $k$ if and only if they are all equal to $1$, and when smooth if and only if asymptotically Chow semistable.
We also relate the asymptotic expansions to higher Donaldson--Futaki invariants of test configurations motivated by Ehrhart theory, and unify in passing previous results of Donaldson, Ono, Futaki, and Rubinstein--Tian--Zhang.

Submitted 27 June, 2024; originally announced June 2024.

Comments: with an appendix by Yaxiong Liu

arXiv:2406.15734 [pdf, other]

RankAdaptor: Hierarchical Dynamic Low-Rank Adaptation for Structural Pruned LLMs

Authors: Changhai Zhou, Shijie Han, Shiyang Zhang, Shichao Weng, Zekai Liu, Cheng Jin

Abstract: The efficient compression of large language models (LLMs) is becoming increasingly popular. However, recovering the accuracy of compressed LLMs is still a major challenge. Structural pruning with standard Low-Rank Adaptation (LoRA) is a common technique in current LLM compression. In structural pruning, the model architecture is modified unevenly, resulting in suboptimal performance in various downstream tasks via standard LoRA with fixed rank. To address this problem, we introduce RankAdaptor, an efficient fine-tuning method with hierarchical dynamic rank scheduling for pruned LLMs. An end-to-end automatic optimization flow is developed that utilizes a lightweight performance model to determine the different ranks during fine-tuning. Comprehensive experiments on popular benchmarks show that RankAdaptor consistently outperforms standard LoRA with structural pruning over different pruning settings. Without increasing the trainable parameters, RankAdaptor further reduces the accuracy performance gap between the recovery of the pruned model and the original model compared to standard LoRA.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.14449 [pdf, other]

APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

Authors: Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas

Abstract: Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. Existing automatic prompt engineering algorithms primarily focus on language modeling and classification tasks, leaving the domain of IR, particularly reranking, underexplored. Directly applying current prompt engineering algorithms to relevance ranking is challenging due to the integration of query and long passage pairs in the input, where the ranking complexity surpasses classification tasks. To reduce human effort and unlock the potential of prompt optimization in reranking, we introduce a novel automatic prompt engineering algorithm named APEER. APEER iteratively generates refined prompts through feedback and preference optimization. Extensive experiments with four LLMs and ten datasets demonstrate the substantial performance improvement of APEER over existing state-of-the-art (SoTA) manual prompts. Furthermore, we find that the prompts generated by APEER exhibit better transferability across diverse tasks and LLMs. Code is available at https://github.com/jincan333/APEER.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.07131 [pdf, other]

ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis

Authors: Yunyi Liu, Craig Jin

Abstract: Neural audio synthesis methods can achieve high-fidelity and realistic sound generation by utilizing deep generative models. Such models typically rely on external labels which are often discrete as conditioning information to achieve guided sound generation. However, it remains difficult to control the subtle changes in sounds without appropriate and descriptive labels, especially given a limited dataset. This paper proposes an implicit conditioning method for neural audio synthesis using generative adversarial networks that allows for interpretable control of the acoustic features of synthesized sounds. Our technique creates a continuous conditioning space that enables timbre manipulation without relying on explicit labels. We further introduce an evaluation metric to explore controllability and demonstrate that our approach is effective in enabling a degree of controlled variation of different synthesized sound effects for in-domain and cross-domain sounds.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06959 [pdf, other]

Unleashing the Denoising Capability of Diffusion Prior for Solving Inverse Problems

Authors: Jiawei Zhang, Jiaxin Zhuang, Cheng Jin, Gen Li, Yuantao Gu

Abstract: The recent emergence of diffusion models has significantly advanced the precision of learnable priors, presenting innovative avenues for addressing inverse problems. Since inverse problems inherently entail maximum a posteriori estimation, previous works have endeavored to integrate diffusion priors into the optimization frameworks. However, prevailing optimization-based inverse algorithms primarily exploit the prior information within the diffusion models while neglecting their denoising capability. To bridge this gap, this work leverages the diffusion process to reframe noisy inverse problems as a two-variable constrained optimization task by introducing an auxiliary optimization variable. By employing gradient truncation, the projection gradient descent method is efficiently utilized to solve the corresponding optimization problem. The proposed algorithm, termed ProjDiff, effectively harnesses the prior information and the denoising capability of a pre-trained diffusion model within the optimization framework. Extensive experiments on the image restoration tasks and source separation and partial generation tasks demonstrate that ProjDiff exhibits superior performance across various linear and nonlinear inverse problems, highlighting its potential for practical applications. Code is available at https://github.com/weigerzan/ProjDiff/.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.04482 [pdf, other]

Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs

Authors: Claire Jin, Sudha Rao, Xiangyu Peng, Portia Botchway, Jessica Quaye, Chris Brockett, Bill Dolan

Abstract: Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still lacking. To address this, we propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need for collecting additional data such as post-play surveys. Applied to a text-based game DejaBoom!, our approach effectively identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted for publication in Findings of the Association for Computational Linguistics: ACL 2024

arXiv:2406.04201 [pdf, ps, other]

Towards Principled Superhuman AI for Multiplayer Symmetric Games

Authors: Jiawei Ge, Yuanhao Wang, Wenzhe Li, Chi Jin

Abstract: Multiplayer games, when the number of players exceeds two, present unique challenges that fundamentally distinguish them from the extensively studied two-player zero-sum games. These challenges arise from the non-uniqueness of equilibria and the risk of agents performing highly suboptimally when adopting equilibrium strategies. While a line of recent works developed learning systems successfully achieving human-level or even superhuman performance in popular multiplayer games such as Mahjong, Poker, and Diplomacy, two critical questions remain unaddressed: (1) What is the correct solution concept that AI agents should find? and (2) What is the general algorithmic framework that provably solves all games within this class? This paper takes the first step towards solving these unique challenges of multiplayer games by provably addressing both questions in multiplayer symmetric normal-form games. We also demonstrate that many meta-algorithms developed in prior practical systems for multiplayer games can fail to achieve even the basic goal of obtaining agent's equal share of the total reward.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04089 [pdf, other]

On Limitation of Transformer for Learning HMMs

Authors: Jiachen Hu, Qinghua Liu, Chi Jin

Abstract: Despite the remarkable success of Transformer-based architectures in various sequential modeling tasks, such as natural language processing, computer vision, and robotics, their ability to learn basic sequential models, like Hidden Markov Models (HMMs), is still unclear. This paper investigates the performance of Transformers in learning HMMs and their variants through extensive experimentation and compares them to Recurrent Neural Networks (RNNs). We show that Transformers consistently underperform RNNs in both training speed and testing accuracy across all tested HMM models. There are even challenging HMM instances where Transformers struggle to learn, while RNNs can successfully do so. Our experiments further reveal the relation between the depth of Transformers and the longest sequence length it can effectively learn, based on the types and the complexity of HMMs. To address the limitation of transformers in modeling HMMs, we demonstrate that a variant of the Chain-of-Thought (CoT), called $\textit{block CoT}$ in the training phase, can help transformers to reduce the evaluation error and to learn longer sequences at a cost of increasing the training time. Finally, we complement our empirical findings by theoretical results proving the expressiveness of transformers in approximating HMMs with logarithmic depth.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02522 [pdf]

Lichen-Mediated Self-Growing Construction Materials for Habitat Outfitting on Mars

Authors: Nisha Rokaya, Erin C. Carr, Richard A. Wilson, Congrui Jin

Abstract: As its next step in space exploration, the National Aeronautics and Space Administration (NASA) revealed plans to establish a permanent human presence on Mars. Habitat outfitting, i.e., the technology to provide the crew with the necessary equipment to perform mission tasks as well as a comfortable, safe, and livable habitable volume, has not been fully explored yet. This study proposes that, rather than shipping prefabricated outfitting elements to Mars, habitat outfitting can be realized by in-situ construction using cyanobacteria and fungi as building agents. A synthetic lichen system, composed of diazotrophic cyanobacteria and filamentous fungi, can be created to produce abundant biominerals (CaCO3) and biopolymers, which will glue Martian regolith into consolidated building blocks. These self-growing building blocks can be assembled into various structures, such as floors, walls, partitions, and furniture.

Submitted 13 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.02081 [pdf, other]

FightLadder: A Benchmark for Competitive Multi-Agent Reinforcement Learning

Authors: Wenzhe Li, Zihan Ding, Seth Karten, Chi Jin

Abstract: Recent advances in reinforcement learning (RL) heavily rely on a variety of well-designed benchmarks, which provide environmental platforms and consistent criteria to evaluate existing and novel algorithms. Specifically, in multi-agent RL (MARL), a plethora of benchmarks based on cooperative games have spurred the development of algorithms that improve the scalability of cooperative multi-agent systems. However, for the competitive setting, a lightweight and open-sourced benchmark with challenging gaming dynamics and visual inputs has not yet been established. In this work, we present FightLadder, a real-time fighting game platform, to empower competitive MARL research. Along with the platform, we provide implementations of state-of-the-art MARL algorithms for competitive games, as well as a set of evaluation metrics to characterize the performance and exploitability of agents. We demonstrate the feasibility of this platform by training a general agent that consistently defeats 12 built-in characters in single-player mode, and expose the difficulty of training a non-exploitable agent without human knowledge and demonstrations in two-player mode. FightLadder provides meticulously designed environments to address critical challenges in competitive MARL research, aiming to catalyze a new era of discovery and advancement in the field. Videos and code at https://sites.google.com/view/fightladder/home.

Submitted 23 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

Comments: ICML 2024

arXiv:2405.14701 [pdf, other]

High Fidelity Scene Text Synthesis

Authors: Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin

Abstract: Scene text synthesis involves rendering specified texts onto arbitrary images. Current methods typically formulate this task in an end-to-end manner but lack effective character-level guidance during training. Besides, their text encoders, pre-trained on a single font type, struggle to adapt to the diverse font styles encountered in practical applications. Consequently, these methods suffer from character distortion, repetition, and absence, particularly in polystylistic scenarios. To this end, this paper proposes DreamText for high-fidelity scene text synthesis. Our key idea is to reconstruct the diffusion training process, introducing more refined guidance tailored to this task, to expose and rectify the model's attention at the character level and strengthen its learning of text regions. This transformation poses a hybrid optimization challenge, involving both discrete and continuous variables. To effectively tackle this challenge, we employ a heuristic alternate optimization strategy. Meanwhile, we jointly train the text encoder and generator to comprehensively learn and utilize the diverse font present in the training dataset. This joint training is seamlessly integrated into the alternate optimization process, fostering a synergistic relationship between learning character embedding and re-estimating character attention. Specifically, in each step, we first encode potential character-generated position information from cross-attention maps into latent character masks.
These masks are then utilized to update the representation of specific characters in the current step, which, in turn, enables the generator to correct the character's attention in the subsequent steps. Both qualitative and quantitative results demonstrate the superiority of our method to the state of the art.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12801 [pdf, other]

Comparing Neighbors Together Makes it Easy: Jointly Comparing Multiple Candidates for Efficient and Effective Retrieval

Authors: Jonghyun Song, Cheyon Jin, Wenlong Zhao, Jay-Yoon Lee

Abstract: A common retrieve-and-rerank paradigm involves retrieving a broad set of relevant candidates using a scalable bi-encoder, followed by expensive but more accurate cross-encoders to a limited candidate set. However, this small subset often leads to error propagation from the bi-encoders, thereby restricting the performance of the overall pipeline. To address these issues, we propose the Comparing Multiple Candidates (CMC) framework, which compares a query and multiple candidate embeddings jointly through shallow self-attention layers. While providing contextualized representations, CMC is scalable enough to handle multiple comparisons simultaneously, where comparing 2K candidates takes only twice as long as comparing 100. Practitioners can use CMC as a lightweight and effective reranker to improve top-1 accuracy. Moreover, when integrated with another retriever, CMC reranking can function as a virtually enhanced retriever. This configuration adds only negligible latency compared to using a single retriever (virtual), while significantly improving recall at K (enhanced). Through experiments, we demonstrate that CMC, as a virtually enhanced retriever, significantly improves Recall@k (+6.7, +3.5%-p for R@16, R@64) compared to the initial retrieval stage on the ZeSHEL dataset. Meanwhile, we conduct experiments for direct reranking on entity, passage, and dialogue ranking.
The results indicate that CMC is not only faster (11x) than cross-encoders but also often more effective, with improved prediction performance in Wikipedia entity linking (+0.7%-p) and DSTC7 dialogue ranking (+3.3%-p). The code and link to datasets are available at https://github.com/yc-song/cmc

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.10895 [pdf, other]

The unluckiest star: A spectroscopically confirmed repeated partial tidal disruption event AT 2022dbl

Authors: Zheyu Lin, Ning Jiang, Tinggui Wang, Xu Kong, Dongyue Li, Han He, Yibo Wang, Jiazheng Zhu, Wentao Li, Ji-an Jiang, Avinash Singh, Rishabh Singh Teja, D. K. Sahu, Chichuan Jin, Keiichi Maeda, Shifeng Huang

Abstract: The unluckiest star orbits a supermassive black hole elliptically. Every time it reaches the pericenter, it shallowly enters the tidal radius and gets partially tidally disrupted, producing a series of flares. Confirmation of a repeated partial tidal disruption event (pTDE) requires not only evidence to rule out other types of transients, but also proof that only one star is involved, as TDEs from multiple stars can also produce similar flares. In this letter, we report the discovery of a repeated pTDE, AT 2022dbl. In a quiescent galaxy at z=0.0284, two separate optical/UV flares have been observed in 2022 and 2024, with no bright X-ray, radio or mid-infrared counterparts. Compared to the first flare, the second flare has a similar blackbody temperature of ~26,000 K, slightly lower peak luminosity, and slower rise and fall phases. Compared to the ZTF TDEs, their blackbody parameters, bolometric energies and light curve shapes are all similar. The spectra taken during the second flare show a steeper continuum than the late-time spectra of the previous flare, consistent with a newly risen flare. More importantly, the possibility of two independent TDEs can be largely ruled out because the optical spectra taken around the peak of the two flares exhibit highly similar broad Balmer, N III and possible He II emission lines, especially the extreme ~4100Å emission lines. This represents the first robust spectroscopic evidence for a repeated pTDE, which can soon be verified by observing the third flare, given its short orbital period.

Submitted 17 May, 2024; originally announced May 2024.

Comments: 15 pages, 8 figures, submitted to ApJ Letters on 2024 Apr 27

arXiv:2405.10313 [pdf, other]

How Far Are We From AGI

Authors: Tao Feng, Chuanyang Jin, Jingyu Liu, Kunlun Zhu, Haoqin Tu, Zirui Cheng, Guanyu Lin, Jiaxuan You

Abstract: The evolution of artificial intelligence (AI) has profoundly impacted human society, driving significant advancements in multiple sectors. Yet, the escalating demands on AI have highlighted the limitations of AI's current offerings, catalyzing a movement towards Artificial General Intelligence (AGI). AGI, distinguished by its ability to execute diverse real-world tasks with efficiency and effectiveness comparable to human intelligence, reflects a paramount milestone in AI evolution. While existing works have summarized specific recent advancements of AI, they lack a comprehensive discussion of AGI's definitions, goals, and developmental trajectories. Different from existing survey papers, this paper delves into the pivotal questions of our proximity to AGI and the strategies necessary for its realization through extensive surveys, discussions, and original perspectives. We start by articulating the requisite capability frameworks for AGI, integrating the internal, interface, and system dimensions. As the realization of AGI requires more advanced capabilities and adherence to stringent constraints, we further discuss necessary AGI alignment technologies to harmonize these factors. Notably, we emphasize the importance of approaching AGI responsibly by first defining the key levels of AGI progression, followed by the evaluation framework that situates the status-quo, and finally giving our roadmap of how to reach the pinnacle of AGI. Moreover, to give tangible insights into the ubiquitous impact of the integration of AI, we outline existing challenges and potential pathways toward AGI in multiple domains. In sum, serving as a pioneering exploration into the current state and future trajectory of AGI, this paper aims to foster a collective comprehension and catalyze broader public discussions among researchers and practitioners on AGI.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.08278 [pdf, other]

Facilitating Feature and Topology Lightweighting: An Ethereum Transaction Graph Compression Method for Malicious Account Detection

Authors: Jiajun Zhou, Xuanze Chen, Shengbo Gong, Chenkai Hu, Chengxiang Jin, Shanqing Yu, Qi Xuan

Abstract: Ethereum has become one of the primary global platforms for cryptocurrency, playing an important role in promoting the diversification of the financial ecosystem. However, the relative lag in regulation has led to a proliferation of malicious activities in Ethereum, posing a serious threat to fund security. Existing regulatory methods usually detect malicious accounts through feature engineering or large-scale transaction graph mining. However, due to the immense scale of transaction data and malicious attacks, these methods suffer from inefficiency and low robustness during data processing and anomaly detection. In this regard, we propose an Ethereum Transaction Graph Compression method named TGC4Eth, which assists malicious account detection by lightweighting both features and topology of the transaction graph. At the feature level, we select transaction features based on their low importance to improve the robustness of the subsequent detection models against feature evasion attacks; at the topology level, we employ focusing and coarsening processes to compress the structure of the transaction graph, thereby improving both data processing and inference efficiency of detection models. Extensive experiments demonstrate that TGC4Eth significantly improves the computational efficiency of existing detection models while preserving the connectivity of the transaction graph. Furthermore, TGC4Eth enables existing detection models to maintain stable performance and exhibit high robustness against feature evasion attacks.

Submitted 1 July, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

Comments: Accepted by International Conference on Blockchain and Trustworthy Systems 2024

arXiv:2405.08074 [pdf]

Optical Imaging of Flavor Order in Flat Band Graphene

Authors: Tian Xie, Tobias M. Wolf, Siyuan Xu, Zhiyuan Cui, Richen Xiong, Yunbo Ou, Patrick Hays, Ludwig F Holleis, Yi Guo, Owen I Sheekey, Caitlin Patterson, Trevor Arp, Kenji Watanabe, Takashi Taniguchi, Seth Ariel Tongay, Andrea F Young, Allan H. MacDonald, Chenhao Jin

Abstract: Spin and valley flavor polarization plays a central role in the many-body physics of flat band graphene, with Fermi surface reconstructions often accompanied by quantized anomalous Hall and superconducting states observed in a variety of experimental systems. Here we describe an optical technique that sensitively and selectively detects flavor textures via the exciton response of a proximal transition metal dichalcogenide layer. Through a systematic study of rhombohedral and rotationally faulted graphene bilayers and trilayers, we show that when the semiconducting dichalcogenide is in direct contact with the graphene, the exciton response is most sensitive to the large momentum rearrangement of the Fermi surface, providing information that is distinct from and complementary to electrical compressibility measurements. The wide-field imaging capability of optical probes allows us to obtain spatial maps of flavor orders with high throughput, and with broad temperature and device compatibility. Our work paves the way for optical probing and imaging of flavor orders in flat band graphene systems.

Submitted 13 May, 2024; originally announced May 2024.

Comments: 29 pages, 4 figures, with supplementary materials

arXiv:2405.06038 [pdf, other]

From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks

Authors: Xue Geng, Zhe Wang, Chunyun Chen, Qing Xu, Kaixin Xu, Chao Jin, Manas Gupta, Xulei Yang, Zhenghua Chen, Mohamed M. Sabry Aly, Jie Lin, Min Wu, Xiaoli Li

Abstract: Deep neural networks (DNNs) have been widely used in many artificial intelligence (AI) tasks. However, deploying them brings significant challenges due to the huge cost of memory, energy, and computation. To address these challenges, researchers have developed various model compression techniques such as model quantization and model pruning. Recently, there has been a surge in research of compression methods to achieve model efficiency while retaining the performance. Furthermore, more and more works focus on customizing the DNN hardware accelerators to better leverage the model compression techniques. In addition to efficiency, preserving security and privacy is critical for deploying DNNs. However, the vast and diverse body of related works can be overwhelming. This inspires us to conduct a comprehensive survey on recent research toward the goal of high-performance, cost-efficient, and safe deployment of DNNs. Our survey first covers the mainstream model compression techniques such as model quantization, model pruning, knowledge distillation, and optimizations of non-linear operations. We then introduce recent advances in designing hardware accelerators that can adapt to efficient model compression approaches. Additionally, we discuss how homomorphic encryption can be integrated to secure DNN deployment. Finally, we discuss several issues, such as hardware evaluation, generalization, and integration of various compression approaches. Overall, we aim to provide a big picture of efficient DNNs, from algorithm to hardware accelerators and security perspectives.

Submitted 9 May, 2024; originally announced May 2024.

Comments: This manuscript is the accepted version for TNNLS (IEEE Transactions on Neural Networks and Learning Systems)

arXiv:2405.04121 [pdf, other]

ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation

Authors: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

Abstract: Cross-modal knowledge transfer enhances point cloud representation learning in LiDAR semantic segmentation. Despite its potential, the "weak teacher challenge" arises due to repetitive and non-diverse car camera images and sparse, inaccurate ground truth labels. To address this, we propose the Efficient Image-to-LiDAR Knowledge Transfer (ELiTe) paradigm. ELiTe introduces Patch-to-Point Multi-Stage Knowledge Distillation, transferring comprehensive knowledge from the Vision Foundation Model (VFM), extensively trained on diverse open-world images. This enables effective knowledge transfer to a lightweight student model across modalities. ELiTe employs Parameter-Efficient Fine-Tuning to strengthen the VFM teacher and expedite large-scale model training with minimal costs. Additionally, we introduce the Segment Anything Model based Pseudo-Label Generation approach to enhance low-quality image labels, facilitating robust semantic representations. Efficient knowledge transfer in ELiTe yields state-of-the-art results on the SemanticKITTI benchmark, outperforming real-time inference models. Our approach achieves this with significantly fewer parameters, confirming its effectiveness and efficiency.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 9 pages, 6 figures, ICME 2024 oral

arXiv:2404.18931 [pdf]

doi 10.1038/s41467-024-48725-z

Tunable exciton valley-pseudospin orders in moiré Bose-Hubbard model

Authors: Richen Xiong, Samuel L. Brantly, Kaixiang Su, Jacob H. Nie, Zihan Zhang, Rounak Banerjee, Hayley Ruddick, Kenji Watanabe, Takashi Taniguchi, Sefaattin Tongay, Cenke Xu, Chenhao Jin

Abstract: Spin and charge are the two most important degrees of freedom of electrons. Their interplay lies at the heart of numerous strongly correlated phenomena including Hubbard model physics and high temperature superconductivity. Such interplay for bosons, on the other hand, is largely unexplored in condensed matter systems. Here we demonstrate a unique realization of the spin-1/2 Bose-Hubbard model through excitons in a semiconducting moiré superlattice. We find evidence of a transient in-plane ferromagnetic (FM-$xy$) order of exciton spin - here valley pseudospin - around exciton filling $ν_{ex}$ = 1, which transitions into a FM-$z$ order both with increasing exciton filling and a small magnetic field of 10 mT. The phase diagram is different from the fermion case and is qualitatively captured by a simple phenomenological model, highlighting the unique consequence of Bose-Einstein statistics. Our study paves the way for engineering exotic phases of matter from spinor bosons, as well as for unconventional devices in optics and quantum information science.

Submitted 3 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: Supplementary Materials attached

Journal ref: Nat. Commun. 15, 4254 (2024)

arXiv:2404.17027 [pdf, other]

Player-Driven Emergence in LLM-Driven Game Narrative

Authors: Xiangyu Peng, Jessica Quaye, Sudha Rao, Weijia Xu, Portia Botchway, Chris Brockett, Nebojsa Jojic, Gabriel DesGarennes, Ken Lobb, Michael Xu, Jorge Leandro, Claire Jin, Bill Dolan

Abstract: We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player's gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. Players that created the most emergent nodes tended to be those that often enjoy games that facilitate discovery, exploration and experimentation.

Submitted 3 June, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted at IEEE Conference on Games 2024

Journal ref: IEEE Conference on Games 2024

arXiv:2404.16425 [pdf, other]

Soft X-ray prompt emission from a high-redshift gamma-ray burst EP240315a

Authors: Y. Liu, H. Sun, D. Xu, D. S. Svinkin, J. Delaunay, N. R. Tanvir, H. Gao, C. Zhang, Y. Chen, X. -F. Wu, B. Zhang, W. Yuan, J. An, G. Bruni, D. D. Frederiks, G. Ghirlanda, J. -W. Hu, A. Li, C. -K. Li, J. -D. Li, D. B. Malesani, L. Piro, G. Raman, R. Ricci, E. Troja, et al. (170 additional authors not shown)

Abstract: Long gamma-ray bursts (GRBs) are believed to originate from core collapse of massive stars. High-redshift GRBs can probe the star formation and reionization history of the early universe, but their detection remains rare. Here we report the detection of a GRB triggered in the 0.5--4 keV band by the Wide-field X-ray Telescope (WXT) on board the Einstein Probe (EP) mission, designated as EP240315a, whose bright peak was also detected by the Swift Burst Alert Telescope and Konus-Wind through off-line analyses. At a redshift of $z=4.859$, EP240315a showed a much longer and more complicated light curve in the soft X-ray band than in gamma-rays. Benefiting from a large field-of-view ($\sim$3600 deg$^2$) and a high sensitivity, EP-WXT captured the earlier engine activation and extended late engine activity through a continuous detection. With a peak X-ray flux at the faint end of previously known high-$z$ GRBs, the detection of EP240315a demonstrates the great potential for EP to study the early universe via GRBs.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 41 pages, 8 figures, 7 tables

arXiv:2404.16422 [pdf, other]

Robust Fine-tuning for Pre-trained 3D Point Cloud Models

Authors: Zhibo Zhang, Ximing Yang, Weizhong Zhang, Cheng Jin

Abstract: This paper presents a robust fine-tuning method designed for pre-trained 3D point cloud models, to enhance feature robustness in downstream fine-tuned models. We highlight the limitations of current fine-tuning methods and the challenges of learning robust models. The proposed method, named Weight-Space Ensembles for Fine-Tuning then Linear Probing (WiSE-FT-LP), integrates the original pre-training and fine-tuning models through weight space integration followed by Linear Probing. This approach significantly enhances the performance of downstream fine-tuned models under distribution shifts, improving feature robustness while maintaining high performance on the target distribution. We apply this robust fine-tuning method to mainstream 3D point cloud pre-trained models and evaluate the quality of model parameters and the degradation of downstream task performance. Experimental results demonstrate the effectiveness of WiSE-FT-LP in enhancing model robustness, effectively balancing downstream task performance and model feature robustness without altering the model structures.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 9 pages, 5 figures

arXiv:2404.12457 [pdf, other]

RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation

Authors: Chao Jin, Zili Zhang, Xuanlin Jiang, Fangyue Liu, Xin Liu, Xuanzhe Liu, Xin Jin

Abstract: Retrieval-Augmented Generation (RAG) has shown significant improvements in various natural language processing tasks by integrating the strengths of large language models (LLMs) and external knowledge databases. However, RAG introduces long sequence generation and leads to high computation and memory costs. We propose RAGCache, a novel multilevel dynamic caching system tailored for RAG. Our analysis benchmarks current RAG systems, pinpointing the performance bottleneck (i.e., long sequence due to knowledge injection) and optimization opportunities (i.e., caching knowledge's intermediate states). Based on these insights, we design RAGCache, which organizes the intermediate states of retrieved knowledge in a knowledge tree and caches them in the GPU and host memory hierarchy. RAGCache proposes a replacement policy that is aware of LLM inference characteristics and RAG retrieval patterns. It also dynamically overlaps the retrieval and inference steps to minimize the end-to-end latency. We implement RAGCache and evaluate it on vLLM, a state-of-the-art LLM inference system and Faiss, a state-of-the-art vector database. The experimental results show that RAGCache reduces the time to first token (TTFT) by up to 4x and improves the throughput by up to 2.1x compared to vLLM integrated with Faiss.

Submitted 25 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.07598 [pdf, other]

Electro-optically Modulated Nonlinear Metasurfaces

Authors: Zhengqing He, Lun Qu, Wei Wu, Jikun Liu, Jingfei You, Weiye Liu, Lu Bai, Chunyan Jin, Chenxiong Wang, Zhidong Gu, Wei Cai, Mengxin Ren, Jingjun Xu

Abstract: Tunable nonlinearity facilitates the creation of reconfigurable nonlinear metasurfaces, enabling innovative applications in signal processing, light switching, and sensing. This paper presents a novel approach to electrically modulate second-harmonic generation (SHG) from a lithium niobate (LN) metasurface, exploiting the electro-optical (EO) effect. By fabricating a nanohole array metasurface on a thin LN film and applying an electric field, we demonstrate the alteration of the material's refractive index, resulting in resonance shifts and modulation of SHG intensity at specific wavelengths. Our findings provide valuable insights for the development of electrically tunable nonlinear light sources, quantum optics, dynamic nonlinear holography, and nonlinear information processing.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 4 pages, 4 figures

arXiv:2404.05980 [pdf, other]

Tackling Structural Hallucination in Image Translation with Local Diffusion

Authors: Seunghoi Kim, Chen Jin, Tom Diethe, Matteo Figini, Henry F. J. Tregidgo, Asher Mullokandov, Philip Teare, Daniel C. Alexander

Abstract: Recent developments in diffusion models have advanced conditioned image generation, yet they struggle with reconstructing out-of-distribution (OOD) images, such as unseen tumors in medical images, causing "image hallucination" and risking misdiagnosis. We hypothesize such hallucinations result from local OOD regions in the conditional images. We verify that partitioning the OOD region and conducting separate image generations alleviates hallucinations in several applications. From this, we propose a training-free diffusion framework that reduces hallucination with multiple Local Diffusion processes. Our approach involves OOD estimation followed by two modules: a "branching" module generates locally both within and outside OOD regions, and a "fusion" module integrates these predictions into one. Our evaluation shows our method mitigates hallucination over baseline models quantitatively and qualitatively, reducing misdiagnosis by 40% and 25% in the real-world medical and natural image datasets, respectively. It also demonstrates compatibility with various pre-trained diffusion models.

Submitted 6 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.20326 [pdf, ps, other]

Shaving Logs via Large Sieve Inequality: Faster Algorithms for Sparse Convolution and More

Authors: Ce Jin, Yinzhan Xu

Abstract: In sparse convolution-type problems, a common technique is to hash the input integers modulo a random prime $p\in [Q/2,Q]$ for some parameter $Q$, which reduces the range of the input integers while preserving their additive structure. However, this hash family suffers from two drawbacks, which led to bottlenecks in many state-of-the-art algorithms: (1) The collision probability of two elements from $[N]$ is $O(\frac{\log N}{Q})$ rather than $O(\frac{1}{Q})$; (2) It is difficult to derandomize the choice of $p$; known derandomization techniques lead to super-logarithmic overhead [Chan, Lewenstein STOC'15]. In this paper, we partially overcome these drawbacks in certain scenarios, via novel applications of the large sieve inequality from analytic number theory. Consequently, we obtain the following improved algorithms for various problems (in the standard word RAM model): Sparse Nonnegative Convolution: We obtain an $O(t\log t)$-time Las Vegas algorithm that computes the convolution $A\star B$ of two nonnegative integer vectors $A,B$, where $t$ is the output sparsity $\|A\star B\|_0$. Moreover, our algorithm terminates in $O(t\log t)$ time with $1-1/\mathrm{poly}(t)$ probability. Text-to-Pattern Hamming Distances: Given a length-$m$ pattern $P$ and a length-$n$ text $T$, we obtain a deterministic $O(n\sqrt{m\log \log m})$-time algorithm that exactly computes the Hamming distance between $P$ and every length-$m$ substring of $T$. Sparse General Convolution: We also give a Monte Carlo $O(t\log t)$ time algorithm for sparse convolution with possibly negative input in the restricted case where the length $N$ of the input vectors satisfies $N\le t^{1.99}$.

Submitted 29 March, 2024; originally announced March 2024.

Comments: To appear at STOC 2024

arXiv:2403.19117 [pdf, other]

A Faster Algorithm for Pigeonhole Equal Sums

Authors: Ce Jin, Hongxun Wu

Abstract: An important area of research in exact algorithms is to solve Subset-Sum-type problems faster than meet-in-middle. In this paper we study Pigeonhole Equal Sums, a total search problem proposed by Papadimitriou (1994): given $n$ positive integers $w_1,\dots,w_n$ of total sum $\sum_{i=1}^n w_i < 2^n-1$, the task is to find two distinct subsets $A, B \subseteq [n]$ such that $\sum_{i\in A}w_i=\sum_{i\in B}w_i$. Similar to the status of the Subset Sum problem, the best known algorithm for Pigeonhole Equal Sums runs in $O^*(2^{n/2})$ time, via either meet-in-middle or dynamic programming (Allcock, Hamoudi, Joux, Klingelhöfer, and Santha, 2022). Our main result is an improved algorithm for Pigeonhole Equal Sums in $O^*(2^{0.4n})$ time. We also give a polynomial-space algorithm in $O^*(2^{0.75n})$ time. Unlike many previous works in this area, our approach does not use the representation method, but rather exploits a simple structural characterization of input instances with few solutions.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 11 pages

arXiv:2403.18979 [pdf, other]

Complexity of emerging magnetic flux during lifetime of solar ephemeral regions

Authors: Hanlin Yang, Chunlan Jin, Zifan Wang, Jingxiu Wang

Abstract: As a relatively active region, an ephemeral region (ER) exhibits a highly complex pattern of magnetic flux emergence. We aim to study in detail secondary flux emergences (SFEs), which we define as bipoles that appear close to ERs and finally coalesce with ERs after a period. We study the SFEs during the whole process from emergence to decay of 5 ERs observed by the Helioseismic and Magnetic Imager (HMI) aboard the Solar Dynamics Observatory (SDO). The maximum unsigned magnetic flux for each ER is around $10^{20}$ Mx. Each ER has tens of SFEs with an average emerging magnetic flux of approximately 5$\times10^{18}$ Mx. The frequency of normalized magnetic flux for all the SFEs follows a power law distribution with an index of -2.08. The majority of SFEs occur between the positive and negative polarities of the ER, and their growth time is concentrated within one hour. The magnetic axis of SFEs is found to exhibit a random distribution in the 5 ERs. We suggest that the relationship between SFEs and ERs can be understood by regarding the photospheric magnetic field observations as cross-sections of an emerging magnetic structure. Tracking the ERs' evolution, we propose that these SFEs in ERs may be sequent emergences from the bundle of flux tube of ERs, and that SFEs are partially emerged $Ω$-loops.

Submitted 27 March, 2024; originally announced March 2024.

Comments: 12 pages, 9 figures, 1 table and accepted for publication in the Astrophysical Journal

arXiv:2403.17262 [pdf, ps, other]

Tian's stabilization problem for toric Fanos

Authors: Chenzi Jin, Yanir A. Rubinstein

Abstract: In 1988, Tian posed the stabilization problem for equivariant global log canonical thresholds. We solve it in the case of toric Fano manifolds. This is the first general result on Tian's problem. A key new estimate involves expressing complex singularity exponents associated to orbits of a group action in terms of support and gauge functions from convex geometry. These techniques also yield a resolution of another conjecture of Tian from 2012 on more general thresholds associated to Grassmannians of plurianticanonical series.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.15295 [pdf, other]

Complete quantum control of orbital qubits by phase-controlled stimulated Raman transitions

Authors: Jun-Yong Yan, Liang Zhai, Hans-Georg Babin, Yuanzhen Li, Si-Hui Pei, Moritz Cygorek, Wei Fang, Fei Gao, Andreas D. Wieck, Arne Ludwig, Chao-Yuan Jin, Da-Wei Wang, Feng Liu

Abstract: Complete quantum control of a stationary quantum bit embedded in a quantum emitter is crucial for photonic quantum information technologies. Recently, the orbital degree of freedom in optically active semiconductor quantum dots emerged as a promising candidate. However, the crucial ability to perform arbitrary rotation on orbital qubits remains elusive. Here, we demonstrate complete control of hole orbital states in a quantum dot. This is enabled by successfully inducing stimulated Raman transitions within $Λ$ systems connected via radiative Auger transitions. This new capability allows manipulations of polar and azimuth angles of the Bloch vector, as evidenced by Rabi oscillations and Ramsey interference, respectively. Simultaneous control of both parameters is achieved by concurrently varying the amplitude and phase of picosecond Raman pulses, enabling arbitrary unitary rotation of the Bloch vector. Our results establish the orbital states in solid-state quantum emitters as a potentially viable resource for applications in quantum information processing and quantum communication.

Submitted 22 March, 2024; originally announced March 2024.

Comments: Manuscript with 7 pages and 4 figures plus supplementary Information comprising 8 pages and 8 figures

arXiv:2403.13330 [pdf, other]

Efficient scene text image super-resolution with semantic guidance

Authors: LeoWu TomyEnrique, Xiangcheng Du, Kangliang Liu, Han Yuan, Zhao Zhou, Cheng Jin

Abstract: Scene text image super-resolution has significantly improved the accuracy of scene text recognition. However, many existing methods emphasize performance over efficiency and ignore the practical need for lightweight solutions in deployment scenarios. To address these issues, our work proposes an efficient framework called SGENet to facilitate deployment on resource-limited platforms. SGENet contains two branches: a super-resolution branch and a semantic guidance branch. We apply a lightweight pre-trained recognizer as a semantic extractor to enhance the understanding of text information. Meanwhile, we design the visual-semantic alignment module to achieve bidirectional alignment between image features and semantics, resulting in the generation of high-quality prior guidance. We conduct extensive experiments on a benchmark dataset, and the proposed SGENet achieves excellent performance with fewer computational costs. Code is available at https://github.com/SijieLiu518/SGENet

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.12771 [pdf, other]

TYC 3340-2437-1: A Quadruple System with A Massive Star

Authors: Jiao Li, Chao Liu, Changqing Luo, Bo Zhang, Jiang-Dan Li, Jia-Dong Li, Zhan-Wen Han, Xue-Fei Chen, Lu-Qian Wang, Min Fang, Li-Feng Xing, Xi-Liang Zhang, Chichuan Jin

Abstract: Hierarchical massive quadruple systems are ideal laboratories for examining the theories of star formation, dynamical evolution, and stellar evolution. The successive mergers of hierarchical quadruple systems might explain the mass gap between neutron stars and black holes. Looking for light curves of O-type binaries identified by LAMOST, we find a (2+2) quadruple system: TYC 3340-2437-1, located in the stellar bow-shock nebula (SBN). It has a probability of over 99.99\% of being a quadruple system, derived from the surface density of the vicinity stars. Its inner orbital periods are 3.390602(89) days and 2.4378(16) days, respectively, and the total mass is about (11.47 + 5.79) + (5.2 + 2.02) = 24.48 $M_{\odot}$. The line-of-sight inclinations of the inner binaries, B$_1$ and B$_2$, are 55.94 and 78.2 degrees, respectively, indicating that they are not co-planar. Based on observations spanning 34 months and the significance of the astrometric excess noise ($D>2$) in Gaia DR3 data, we guess that its outer orbital period might be a few years. If so, the quadruple system might have formed through the disk fragmentation mechanism with an outer eccentricity greater than zero. This eccentricity could be the cause of both the arc-like feature of the SBN and the noncoplanarity of the inner orbits. The outer orbital period and outer eccentricity could be determined with the release of future epoch astrometric data of Gaia.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.09101 [pdf, other]

Soften to Defend: Towards Adversarial Robustness via Self-Guided Label Refinement

Authors: Daiwei Yu, Zhuorong Li, Lina Wei, Canghong Jin, Yun Zhang, Sixian Chan

Abstract: Adversarial training (AT) is currently one of the most effective ways to obtain the robustness of deep neural networks against adversarial attacks. However, most AT methods suffer from robust overfitting, i.e., a significant generalization gap in adversarial robustness between the training and testing curves. In this paper, we first identify a connection between robust overfitting and the excessive memorization of noisy labels in AT from a view of gradient norm. As such label noise is mainly caused by a distribution mismatch and improper label assignments, we are motivated to propose a label refinement approach for AT. Specifically, our Self-Guided Label Refinement first self-refines a more accurate and informative label distribution from over-confident hard labels, and then it calibrates the training by dynamically incorporating knowledge from self-distilled models into the current model and thus requiring no external teachers. Empirical results demonstrate that our method can simultaneously boost the standard accuracy and robust performance across multiple benchmark datasets, attack types, and architectures. In addition, we also provide a set of analyses from the perspectives of information theory to dive into our method and suggest the importance of soft labels for robust generalization.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.05053 [pdf, other]

PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering

Authors: Yibin Wang, Weizhong Zhang, Jianwei Zheng, Cheng Jin

Abstract: Image composition involves seamlessly integrating given objects into a specific visual context. The current training-free methods rely on composing attention weights from several samplers to guide the generator. However, since these weights are derived from disparate contexts, their combination leads to coherence confusion in synthesis and loss of appearance information. These issues worsen with their excessive focus on background generation, even when unnecessary in this task. This not only slows down inference but also compromises foreground generation quality. Moreover, these methods introduce unwanted artifacts in the transition area. In this paper, we formulate image composition as a subject-based local editing task, solely focusing on foreground generation. At each step, the edited foreground is combined with the noisy background to maintain scene consistency. To address the remaining issues, we propose PrimeComposer, a faster training-free diffuser that composites the images by well-designed attention steering across different noise levels. This steering is predominantly achieved by our Correlation Diffuser, utilizing its self-attention layers at each step. Within these layers, the synthesized subject interacts with both the referenced object and background, capturing intricate details and coherent relationships. This prior information is encoded into the attention weights, which are then integrated into the self-attention layers of the generator to guide the synthesis process. 
Besides, we introduce a Region-constrained Cross-Attention to confine the impact of specific subject-related words to desired regions, addressing the unwanted artifacts shown in the prior method thereby further improving the coherence in the transition area. Our method exhibits the fastest inference efficiency and extensive experiments demonstrate our superiority both qualitatively and quantitatively.

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.01149 [pdf]

Unified understanding to the rich electronic-structure evolutions of 2D black phosphorus under pressure

Authors: Yu-Meng Gao, Yue-Jiao Zhang, Xiao-Lin Zhao, Xin-Yu Li, Shu-Hui Wang, Chen-Dong Jin, Hu Zhang, Ru-Qian Lian, Rui-Ning Wang, Peng-Lai Gong, Jiang-Long Wang, Xing-Qiang Shi

Abstract: The electronic structure evolution of few-layer black phosphorus (BP) under pressure shows a wealth of phenomena, such as the nonmonotonic change of the direct gap at the Γ point, the layer-number dependence, and the distinct responses to normal and hydrostatic pressures. A full and unified understanding of these rich phenomena remains lacking. Here, we provide a unified understanding from the competition between interlayer quasi-bonding (QB) interactions and intralayer chemical bonding interactions. The former decreases while the latter increases the band gap under pressure, and the origin can be correlated to different combinations of inter- and intra-layer antibonding or bonding interactions at the band edges. More interestingly, the interlayer QB interactions are a coexistence of two categories of interactions, namely, interactions between bands of the same occupancy (occupied-occupied and empty-empty interactions) and of different occupancies (occupied-empty interaction); the overall effect is a four-level interaction, which explains the anomalous interlayer-antibonding feature of the conduction band edge of bilayer BP. Our current study lays the foundation for the electronic structure tuning of two-dimensional (2D) BP, and our analysis method for multi-energy-level interactions can be applied to other 2D semiconductor homo- and hetero-structures that have occupied-empty interlayer interactions.

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.18162 [pdf, ps, other]

Out-of-Distribution Detection using Neural Activation Prior

Authors: Weilin Wan, Weizhong Zhang, Quan Zhou, Fan Yi, Cheng Jin

Abstract: Out-of-distribution detection (OOD) is a crucial technique for deploying machine learning models in the real world to handle the unseen scenarios. In this paper, we first propose a simple yet effective Neural Activation Prior (NAP) for OOD detection. Our neural activation prior is based on a key observation that, for a channel before the global pooling layer of a fully trained neural network, the probability of a few neurons being activated with a large response by an in-distribution (ID) sample is significantly higher than that by an OOD sample. An intuitive explanation is that for a model fully trained on ID dataset, each channel would play a role in detecting a certain pattern in the ID dataset, and a few neurons can be activated with a large response when the pattern is detected in an input sample. Then, a new scoring function based on this prior is proposed to highlight the role of these strongly activated neurons in OOD detection. Our approach is plug-and-play and does not lead to any performance degradation on ID data classification and requires no extra training or statistics from training or external datasets. Notice that previous methods primarily rely on post-global-pooling features of the neural networks, while the within-channel distribution information we leverage would be discarded by the global pooling operator. Consequently, our method is orthogonal to existing approaches and can be effectively combined with them in various applications. 
Experimental results show that our method achieves the state-of-the-art performance on CIFAR benchmark and ImageNet dataset, which demonstrates the power of the proposed prior. Finally, we extend our method to Transformers and the experimental findings indicate that NAP can also significantly enhance the performance of OOD detection on Transformers, thereby demonstrating the broad applicability of this prior knowledge.

Submitted 24 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.14840 [pdf, other]

RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning

Authors: Congyun Jin, Ming Zhang, Xiaowei Ma, Li Yujiao, Yingbo Wang, Yabo Jia, Yuliang Du, Tao Sun, Haowen Wang, Cong Fan, Jinjie Gu, Chenfei Chi, Xiangguo Lv, Fangzhou Li, Wei Xue, Yiran Huang

Abstract: Recent advancements in Large Language Models (LLMs) and Large Multi-modal Models (LMMs) have shown potential in various medical applications, such as Intelligent Medical Diagnosis. Although impressive results have been achieved, we find that existing benchmarks do not reflect the complexity of real medical reports and specialized in-depth reasoning capabilities. In this work, we introduce RJUA-MedDQA, a comprehensive benchmark in the field of medical specialization, which poses several challenges: comprehensively interpreting image content across diverse challenging layouts, possessing numerical reasoning ability to identify abnormal indicators, and demonstrating clinical reasoning ability to provide statements of disease diagnosis, status and advice based on medical contexts. We carefully design the data generation pipeline and propose the Efficient Structural Restoration Annotation (ESRA) Method, aimed at restoring textual and tabular content in medical report images. This method substantially enhances annotation efficiency, doubling the productivity of each annotator, and yields a 26.8% improvement in accuracy. We conduct extensive evaluations, including few-shot assessments of 5 LMMs which are capable of solving Chinese medical QA tasks. To further investigate the limitations and potential of current LMMs, we conduct comparative experiments on a set of strong LLMs by using image-text generated by the ESRA method. 
We report the performance of baselines and offer several observations: (1) The overall performance of existing LMMs is still limited; however, LMMs are more robust to low-quality and diverse-structured images compared to LLMs. (3) Reasoning across context and image content presents significant challenges. We hope this benchmark helps the community make progress on these challenging tasks in multi-modal medical document understanding and facilitates its application in healthcare.

Submitted 19 February, 2024; originally announced February 2024.

Comments: 15 pages, 13 figures

arXiv:2402.12747 [pdf, other]

Enhanced Physical Layer Security for Full-duplex Symbiotic Radio with AN Generation and Forward Noise Suppression

Authors: Chi Jin, Zheng Chang, Fengye Hu, Hsiao-Hwa Chen, Timo Hamalainen

Abstract: Due to the constraints on power supply and limited encryption capability, data security based on physical layer security (PLS) techniques in backscatter communications has attracted a lot of attention. In this work, we propose to enhance PLS in a full-duplex symbiotic radio (FDSR) system with a proactive eavesdropper, which may overhear the information and interfere with legitimate communications simultaneously by emitting attack signals. To deal with the eavesdroppers, we propose a security strategy based on pseudo-decoding and artificial noise (AN) injection to ensure the performance of legitimate communications through forward noise suppression. A novel AN signal generation scheme is proposed using a pseudo-decoding method, where the AN signal is superimposed on the data signal to safeguard the legitimate channel. The phase control in the forward noise suppression scheme and the power allocation between AN and data signals are optimized to maximize security throughput. The formulated problem can be solved via problem decomposition and alternate optimization algorithms. Simulation results demonstrate the superiority of the proposed scheme in terms of security throughput and attack mitigation performance.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.10806 [pdf, other]

Streaming Algorithms for Connectivity Augmentation

Authors: Ce Jin, Michael Kapralov, Sepideh Mahabadi, Ali Vakilian

Abstract: We study the $k$-connectivity augmentation problem ($k$-CAP) in the single-pass streaming model. Given a $(k-1)$-edge connected graph $G=(V,E)$ that is stored in memory, and a stream of weighted edges $L$ with weights in $\{0,1,\dots,W\}$, the goal is to choose a minimum weight subset $L'\subseteq L$ such that $G'=(V,E\cup L')$ is $k$-edge connected. We give a $(2+ε)$-approximation algorithm for this problem which requires to store $O(ε^{-1} n\log n)$ words. Moreover, we show our result is tight: Any algorithm with better than $2$-approximation for the problem requires $Ω(n^2)$ bits of space even when $k=2$. This establishes a gap between the optimal approximation factor one can obtain in the streaming vs the offline setting for $k$-CAP. We further consider a natural generalization to the fully streaming model where both $E$ and $L$ arrive in the stream in an arbitrary order. We show that this problem has a space lower bound that matches the best possible size of a spanner of the same approximation ratio. Following this, we give improved results for spanners on weighted graphs: We show a streaming algorithm that finds a $(2t-1+ε)$-approximate weighted spanner of size at most $O(ε^{-1} n^{1+1/t}\log n)$ for integer $t$, whereas the best prior streaming algorithm for spanner on weighted graphs had size depending on $\log W$. Using our spanner result, we provide an optimal $O(t)$-approximation for $k$-CAP in the fully streaming model with $O(nk + n^{1+1/t})$ words of space. 
Finally we apply our results to network design problems such as Steiner tree augmentation problem (STAP), $k$-edge connected spanning subgraph ($k$-ECSS), and the general Survivable Network Design problem (SNDP). In particular, we show a single-pass $O(t\log k)$-approximation for SNDP using $O(kn^{1+1/t})$ words of space, where $k$ is the maximum connectivity requirement.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10414 [pdf, other]

The weakness of soft X-ray intensity: possible physical reason for weak line quasars

Authors: Jiancheng Wu, Qingwen Wu, Chichuan Jin, Jianfeng Wu, Weihua Lei, Xinwu Cao, Xiao Fan, Xiangli Lei, Mengye Wang, Hanrui Xue, Bing Lyu

Abstract: Weak-line quasars (WLQs) are a notable group of active galactic nuclei (AGNs) that show unusually weak UV lines even though their optical-UV continuum shapes are similar to those of typical quasars. The physical mechanism for WLQs is an unsolved puzzle in the AGN unified model. We explore the properties of UV emission lines by performing extensive photoionization calculations based on Cloudy simulation with different spectral energy distributions (SEDs) of AGNs. The AGN continua are built from several observational empirical correlations, where the black-body emission from the cold disk, the power-law emission from the hot corona, and a soft X-ray excess component are considered. We find that the equivalent width (EW) of C IV from our models is systematically lower than observational values if the component of soft X-ray excess is neglected. The EW will increase several times and is roughly consistent with the observations after considering the soft X-ray excess component as constrained from normal type I AGNs. We find that the UV lines are weak for QSOs with quite large BH mass (e.g., $M_{\rm BH}>10^9M_{\odot}$) and weak soft X-ray emission due to the deficit of ionizing photons. As an example, we present the strength of C IV based on the multi-band SEDs for three nearby weak-line AGNs, where the weaker soft X-ray emission normally predicts the weaker lines.

Submitted 15 February, 2024; originally announced February 2024.

Comments: Accepted for Publication in ApJ

arXiv:2402.07793 [pdf, ps, other]

Tuning-Free Stochastic Optimization

Authors: Ahmed Khaled, Chi Jin

Abstract: Large-scale machine learning problems make the cost of hyperparameter tuning ever more prohibitive. This creates a need for algorithms that can tune themselves on-the-fly. We formalize the notion of "tuning-free" algorithms that can match the performance of optimally-tuned optimization algorithms up to polylogarithmic factors given only loose hints on the relevant problem parameters. We consider in particular algorithms that can match optimally-tuned Stochastic Gradient Descent (SGD). When the domain of optimization is bounded, we show tuning-free matching of SGD is possible and achieved by several existing algorithms. We prove that for the task of minimizing a convex and smooth or Lipschitz function over an unbounded domain, tuning-free optimization is impossible. We discuss conditions under which tuning-free optimization is possible even over unbounded domains. In particular, we show that the recently proposed DoG and DoWG algorithms are tuning-free when the noise distribution is sufficiently well-behaved. For the task of finding a stationary point of a smooth and potentially nonconvex function, we give a variant of SGD that matches the best-known high-probability convergence rate for tuned SGD at only an additional polylogarithmic cost. However, we also give an impossibility result that shows no algorithm can hope to match the optimal expected convergence rate for tuned SGD with high probability.

Submitted 18 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.05811 [pdf]

High-Q Cavity Interface for Color Centers in Thin Film Diamond

Authors: Sophie W. Ding, Michael Haas, Xinghan Guo, Kazuhiro Kuruma, Chang Jin, Zixi Li, David D. Awschalom, Nazar Delegan, F. Joseph Heremans, Alex High, Marko Loncar

Abstract: Quantum information technology offers the potential to realize unprecedented computational resources via secure channels capable of distributing entanglement between quantum computers. Diamond, as a host to atom-like defects with optically-accessible spin qubits, is a leading platform to realize quantum memory nodes needed to extend the reach of quantum links. Photonic crystal (PhC) cavities enhance light-matter interaction and are essential ingredients of an efficient interface between spins and photons that are used to store and communicate quantum information respectively. Despite great effort, however, the realization of visible PhC cavities with high quality factor (Q) and design flexibility is challenging in diamond. Here, we demonstrate one- and two-dimensional PhC cavities fabricated in recently developed thin-film diamonds, featuring Q-factors of $1.8\times10^5$ and $1.6\times10^5$, respectively, the highest Qs for visible PhC cavities realized in any material. Importantly, our fabrication process is simple and high-yield, based on conventional planar fabrication techniques, in contrast to previous approaches that rely on complex undercut methods. We also demonstrate fiber-coupled 1D PhC cavities with high photon extraction efficiency, and optical coupling between a single SiV center and such a cavity at 4K achieving a Purcell factor of 13. The demonstrated diamond thin-film photonic platform will improve the performance and scalability of quantum nodes and expand the range of quantum technologies.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.02769 [pdf, other]

Learning from Teaching Regularization: Generalizable Correlations Should be Easy to Imitate

Authors: Can Jin, Tong Che, Hongwu Peng, Yiyuan Li, Marco Pavone

Abstract: Generalization remains a central challenge in machine learning. In this work, we propose Learning from Teaching (LoT), a novel regularization technique for deep neural networks to enhance generalization. Inspired by the human ability to capture concise and abstract patterns, we hypothesize that generalizable correlations are expected to be easier to teach. LoT operationalizes this concept to improve the generalization of the main model with auxiliary student learners. The student learners are trained by the main model and improve the main model to capture more generalizable and teachable correlations by providing feedback. Our experimental results across several domains, including Computer Vision, Natural Language Processing, and Reinforcement Learning, demonstrate that the introduction of LoT brings significant benefits compared to merely training models on the original training data. It suggests the effectiveness of LoT in identifying generalizable information without falling into the swamp of complex patterns in data, making LoT a valuable addition to the current machine learning frameworks.

Submitted 5 February, 2024; originally announced February 2024.

arXiv:2401.10475

CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios

Authors: Xiangshuo Qiao, Xianxin Li, Xiaozhe Qu, Jie Zhang, Yang Liu, Yu Luo, Cihang Jin, Jin Ma

Abstract: Vision-Language Models pre-trained on large-scale image-text datasets have shown superior performance in downstream tasks such as image retrieval. Most of the images for pre-training are presented in the form of open domain common-sense visual elements. Differently, video covers in short video search scenarios are presented as user-originated contents that provide important visual summaries of videos. In addition, a portion of the video covers come with manually designed cover texts that provide semantic complements. In order to fill in the gaps in short video cover data, we establish the first large-scale cover-text benchmark for Chinese short video search scenarios. Specifically, we release two large-scale datasets CBVS-5M/10M to provide short video covers, and the manual fine-labeling dataset CBVS-20K to provide real user queries, which serves as an image-text benchmark test in the Chinese short video search field. To integrate the semantics of cover text in the case of modality missing, we propose UniCLIP where cover texts play a guiding role during training, however are not relied upon by inference. Extensive evaluation on CBVS-20K demonstrates the excellent performance of our proposal. UniCLIP has been deployed to Tencent's online video search systems with hundreds of millions of visits and achieved significant gains. The dataset and code are available at https://github.com/QQBrowserVideoSearch/CBVS-UniCLIP.

Submitted 25 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.08743

MMToM-QA: Multimodal Theory of Mind Question Answering

Authors: Chuanyang Jin, Yutong Wu, Jing Cao, Jiannan Xiang, Yen-Ling Kuo, Zhiting Hu, Tomer Ullman, Antonio Torralba, Joshua B. Tenenbaum, Tianmin Shu

Abstract: Theory of Mind (ToM), the ability to understand people's mental states, is an essential ingredient for developing machines with human-level social intelligence. Recent machine learning models, particularly large language models, seem to show some aspects of ToM understanding. However, existing ToM benchmarks use unimodal datasets - either video or text. Human ToM, on the other hand, is more than video or text understanding. People can flexibly reason about another person's mind based on conceptual representations (e.g., goals, beliefs, plans) extracted from any available data. To address this, we introduce a multimodal Theory of Mind question answering (MMToM-QA) benchmark. MMToM-QA comprehensively evaluates machine ToM both on multimodal data and on different kinds of unimodal data about a person's activity in a household environment. To engineer multimodal ToM capacity, we propose a novel method, BIP-ALM (Bayesian Inverse Planning Accelerated by Language Models). BIP-ALM extracts unified representations from multimodal data and utilizes language models for scalable Bayesian inverse planning. We conducted a systematic comparison of human performance, BIP-ALM, and state-of-the-art models, including GPT-4. The experiments demonstrate that large language models and large multimodal models still lack robust ToM capacity. BIP-ALM, on the other hand, shows promising results, by leveraging the power of both model-based mental inference and language models.

Submitted 15 June, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: ACL 2024. 26 pages, 11 figures, 7 tables

Showing 1–50 of 709 results for author: Jin, C