
Optimizing KV Cache Eviction in LLMs: Adaptive Allocation for Enhanced Budget Utilization

Authors: Yuan Feng, Junlin Lv, Yukun Cao, Xike Xie, S. Kevin Zhou

Abstract: Large Language Models have excelled in various fields but encounter efficiency limitations due to the extensive KV cache required for long-sequence inference. Many efforts try to evict non-critical cache elements during runtime, thereby reducing cache size within a given memory budget while preserving generation quality. Our reexamination of their underlying principles discerns that prevailing strategies essentially aim to minimize an upper bound of eviction loss within a specific budget allocation. However, we observe that the current practice of uniformly allocating budgets across different attention heads during the eviction procedure tends to degrade the quality of generation post-eviction. In light of these findings, we propose a simple yet effective adaptive allocation algorithm that not only theoretically ensures its loss upper bound does not exceed that of previous uniform allocation methods, but also effectively aligns with the characteristics of the self-attention mechanism, thus practically reducing the upper bound. Further, integrating this algorithm with two of the most advanced methods yields Ada-SnapKV and Ada-Pyramid. Extensive experimental validation across 16 datasets and the Needle-in-a-Haystack test confirms that Ada-SnapKV and Ada-Pyramid achieve further enhancements, establishing new benchmarks in state-of-the-art performance.

Submitted 16 July, 2024; originally announced July 2024.
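
The contrast between uniform and adaptive per-head budgets described in the abstract above can be illustrated with a toy sketch. This is not the authors' algorithm. It assumes, purely for illustration, that the eviction loss of a head can be approximated by the total attention weight of the entries it evicts, and that an adaptive allocator may simply keep the globally largest weights across all heads. The function names `uniform_budgets`, `adaptive_budgets`, and `evicted_weight` are hypothetical.

```python
from typing import List


def uniform_budgets(num_heads: int, total_budget: int) -> List[int]:
    """Split the total budget as evenly as possible across heads."""
    base, extra = divmod(total_budget, num_heads)
    return [base + (1 if h < extra else 0) for h in range(num_heads)]


def adaptive_budgets(weights: List[List[float]], total_budget: int) -> List[int]:
    """Assumed adaptive rule: keep the globally largest attention weights,
    and let each head's budget be the number of its entries that survive."""
    flat = [(w, h) for h, row in enumerate(weights) for w in row]
    flat.sort(key=lambda pair: pair[0], reverse=True)
    budgets = [0] * len(weights)
    for _, head in flat[:total_budget]:
        budgets[head] += 1
    return budgets


def evicted_weight(weights: List[List[float]], budgets: List[int]) -> float:
    """Toy loss proxy: attention weight discarded when each head keeps
    only its top-`budget` entries."""
    total = 0.0
    for row, budget in zip(weights, budgets):
        kept = sorted(row, reverse=True)[:budget]
        total += sum(row) - sum(kept)
    return total


# Head 0 is concentrated on one entry; head 1 is spread out evenly.
weights = [
    [0.70, 0.20, 0.05, 0.05],
    [0.25, 0.25, 0.25, 0.25],
]
total_budget = 4

uniform = uniform_budgets(len(weights), total_budget)
adaptive = adaptive_budgets(weights, total_budget)

print("uniform budgets: ", uniform, "evicted:", evicted_weight(weights, uniform))
print("adaptive budgets:", adaptive, "evicted:", evicted_weight(weights, adaptive))
```

Under this toy proxy, keeping the globally largest weights can never discard more total weight than an even split with the same overall budget, which mirrors the kind of guarantee the abstract states for the loss upper bound.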

arXiv:2407.10299 [pdf, other]

Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models

Authors: Yuchen Yang, Kwonjoon Lee, Behzad Dariush, Yinzhi Cao, Shao-Yuan Lo

Abstract: Video Anomaly Detection (VAD) is crucial for applications such as security surveillance and autonomous driving. However, existing VAD methods provide little rationale behind detection, hindering public trust in real-world deployments. In this paper, we approach VAD with a reasoning framework. Although Large Language Models (LLMs) have shown revolutionary reasoning ability, we find that their direct use falls short of VAD. Specifically, the implicit knowledge pre-trained in LLMs focuses on general context and thus may not apply to every specific real-world VAD scenario, leading to inflexibility and inaccuracy. To address this, we propose AnomalyRuler, a novel rule-based reasoning framework for VAD with LLMs. AnomalyRuler comprises two main stages: induction and deduction. In the induction stage, the LLM is fed with few-shot normal reference samples and then summarizes these normal patterns to induce a set of rules for detecting anomalies. The deduction stage follows the induced rules to spot anomalous frames in test videos. Additionally, we design rule aggregation, perception smoothing, and robust reasoning strategies to further enhance AnomalyRuler's robustness. AnomalyRuler is the first reasoning approach for the one-class VAD task, which requires only few-normal-shot prompting without the need for full-shot training, thereby enabling fast adaptation to various VAD scenarios. Comprehensive experiments across four VAD benchmarks demonstrate AnomalyRuler's state-of-the-art detection performance and reasoning ability.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.08529 [pdf, other]

Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks

Authors: Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

Abstract: Spatiotemporal federated learning has recently attracted intensive study due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of gradient inversion attacks in spatiotemporal federated learning. In this paper, we explore the gradient attack problem in spatiotemporal federated learning from attack and defense perspectives. To understand privacy risks in spatiotemporal federated learning, we first propose Spatiotemporal Gradient Inversion Attack (ST-GIA), a gradient attack algorithm tailored to spatiotemporal data that successfully reconstructs the original location from gradients. Furthermore, we design an adaptive defense strategy to mitigate gradient inversion attacks in spatiotemporal federated learning. By dynamically adjusting the perturbation levels, we can offer tailored protection for varying rounds of training data, thereby achieving a better trade-off between privacy and utility than current state-of-the-art methods. Through intensive experimental analysis on three real-world datasets, we reveal that the proposed defense strategy can well preserve the utility of spatiotemporal federated learning with effective security protection.

Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

Comments: Accepted by DASFAA 2024, 16 pages

arXiv:2407.08514 [pdf, other]

Rethinking the Threat and Accessibility of Adversarial Attacks against Face Recognition Systems

Authors: Yuxin Cao, Yumeng Zhu, Derui Wang, Sheng Wen, Minhui Xue, Jin Lu, Hao Ge

Abstract: Face recognition pipelines have been widely deployed in various mission-critical systems in trust, equitable and responsible AI applications. However, the emergence of adversarial attacks has threatened the security of the entire recognition pipeline. Despite the sheer number of attack methods proposed for crafting adversarial examples in both digital and physical forms, it is never an easy task to assess the real threat level of different attacks and obtain useful insight into the key risks confronted by face recognition systems. Traditional attacks view imperceptibility as the most important measurement to keep perturbations stealthy, while we suspect that industry professionals may possess a different opinion. In this paper, we delve into measuring the threat brought about by adversarial attacks from the perspectives of the industry and the applications of face recognition. In contrast to widely studied sophisticated attacks in the field, we propose an effective yet easy-to-launch physical adversarial attack, named AdvColor, against black-box face recognition pipelines in the physical world. AdvColor fools models in the recognition pipeline via directly supplying printed photos of human faces to the system under adversarial illuminations. Experimental results show that physical AdvColor examples can achieve a fooling rate of more than 96% against the anti-spoofing model and an overall attack success rate of 88% against the face recognition pipeline. We also conduct a survey on the threats of prevailing adversarial attacks, including AdvColor, to understand the gap between the machine-measured and human-assessed threat levels of different forms of adversarial attacks. The survey results surprisingly indicate that, compared to deliberately launched imperceptible attacks, perceptible but accessible attacks pose more lethal threats to real-world commercial systems of face recognition.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 19 pages, 12 figures

arXiv:2407.07501 [pdf]

Electronic Correlation and Pseudogap-like Behavior of High-Temperature Superconductor La3Ni2O7

Authors: Yidian Li, Xian Du, Yantao Cao, Cuiying Pei, Mingxin Zhang, Wenxuan Zhao, Kaiyi Zhai, Runzhe Xu, Zhongkai Liu, Zhiwei Li, Jinkui Zhao, Gang Li, Yanpeng Qi, Hanjie Guo, Yulin Chen, Lexian Yang

Abstract: High-temperature superconductivity (HTSC) remains one of the most challenging and fascinating mysteries in condensed matter physics. Recently, superconductivity with a transition temperature exceeding liquid-nitrogen temperature was discovered in La3Ni2O7 at high pressure, which provides a new platform to explore unconventional HTSC. In this work, using high-resolution angle-resolved photoemission spectroscopy and ab-initio calculation, we systematically investigate the electronic structures of La3Ni2O7 at ambient pressure. Our experiments are in nice agreement with ab-initio calculations after considering an orbital-dependent band renormalization effect. The strong electron correlation effect pushes a flat band of $d_{z^2}$ orbital component below the Fermi level ($E_F$), which is predicted to locate right at $E_F$ under high pressure. Moreover, the $d_{x^2-y^2}$ band shows a pseudogap-like behavior with suppressed spectral weight and diminished quasiparticle peak near $E_F$. Our findings provide important insights into the electronic structure of La3Ni2O7, which will shed light on the understanding of the unconventional superconductivity in nickelates.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07249 [pdf, other]

Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion

Authors: Yu Cao, Shaogang Gong

Abstract: In the field of Few-Shot Image Generation (FSIG) using Deep Generative Models (DGMs), accurately estimating the distribution of the target domain with minimal samples poses a significant challenge. This requires a method that can capture both the broad diversity and the true characteristics of the target domain distribution. We present Conditional Relaxing Diffusion Inversion (CRDI), an innovative 'training-free' approach designed to enhance distribution diversity in synthetic image generation. Distinct from conventional methods, CRDI does not rely on fine-tuning based on only a few samples. Instead, it focuses on reconstructing each target image instance and expanding diversity through few-shot learning. The approach begins by identifying a Sample-wise Guidance Embedding (SGE) for the diffusion model, which serves a purpose analogous to the explicit latent codes in certain Generative Adversarial Network (GAN) models. Subsequently, the method involves a scheduler that progressively introduces perturbations to the SGE, thereby augmenting diversity. Comprehensive experiments demonstrate that our method surpasses GAN-based reconstruction techniques and equals state-of-the-art (SOTA) FSIG methods in performance. Additionally, it effectively mitigates overfitting and catastrophic forgetting, common drawbacks of fine-tuning approaches.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06567 [pdf, other]

FinCon: A Synthesized LLM Multi-Agent System with Conceptual Verbal Reinforcement for Enhanced Financial Decision Making

Authors: Yangyang Yu, Zhiyuan Yao, Haohang Li, Zhiyang Deng, Yupeng Cao, Zhi Chen, Jordan W. Suchow, Rong Liu, Zhenyu Cui, Denghui Zhang, Koduvayur Subbalakshmi, Guojun Xiong, Yueru He, Jimin Huang, Dong Li, Qianqian Xie

Abstract: Large language models (LLMs) have demonstrated notable potential in conducting complex tasks and are increasingly utilized in various financial applications. However, high-quality sequential financial investment decision-making remains challenging. These tasks require multiple interactions with a volatile environment for every decision, demanding sufficient intelligence to maximize returns and manage risks. Although LLMs have been used to develop agent systems that surpass human teams and yield impressive investment returns, opportunities to enhance multi-sourced information synthesis and optimize decision-making outcomes through timely experience refinement remain unexplored. Here, we introduce FinCon, an LLM-based multi-agent framework with CONceptual verbal reinforcement tailored for diverse FINancial tasks. Inspired by effective real-world investment firm organizational structures, FinCon utilizes a manager-analyst communication hierarchy. This structure allows for synchronized cross-functional agent collaboration towards unified goals through natural language interactions and equips each agent with greater memory capacity than humans. Additionally, a risk-control component in FinCon enhances decision quality by episodically initiating a self-critiquing mechanism to update systematic investment beliefs. The conceptualized beliefs serve as verbal reinforcement for the future agent's behavior and can be selectively propagated to the appropriate node that requires knowledge updates. This feature significantly improves performance while reducing unnecessary peer-to-peer communication costs. Moreover, FinCon demonstrates strong generalization capabilities in various financial tasks, including single stock trading and portfolio management.

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: LLM Applications, LLM Agents, Financial Technology, Quantitative Finance, Algorithmic Trading, Cognitive Science

arXiv:2407.06505 [pdf]

Not all explicit cues help communicate: Pedestrians' perceptions, fixations, and decisions toward automated vehicles with varied appearance

Authors: Wei Lyu, Yaqin Cao, Yi Ding, Jingyu Li, Kai Tian, Hui Zhang

Abstract: Given pedestrians' vulnerability in road traffic, it remains unclear how novel AV appearances will impact pedestrians' crossing behaviour. To address this gap, this study pioneers an investigation into the influence of AVs' exterior design, correlated with their kinematics, on pedestrians' road-crossing perception and decision-making. A video-based eye-tracking experimental study was conducted with 61 participants who responded to video stimuli depicting a manipulated vehicle approaching a predefined road-crossing location on an unsignalized, two-way road. The vehicle's kinematic pattern was manipulated into yielding and non-yielding, and its external appearances were varied across five types: with a human driver (as a conventional vehicle), with no driver (as an AV), with text-based identity indications, with roof radar sensors, and with dynamic eHMIs adjusted to vehicle kinematics. Participants' perceived clarity, crossing initiation distance (CID), crossing decision time (CDT), and gaze behaviour during interactions were recorded and reported. The results indicated that AVs' kinematic profiles play a dominant role in pedestrians' road-crossing decisions, supported by their subjective evaluations, CID, CDT, and gaze patterns during interactions. Moreover, the use of clear eHMI, such as dynamic pedestrian icons, reduced pedestrians' visual load, enhanced their perceptual clarity, expedited road-crossing decisions, and thereby improved overall crossing efficiency. However, it was found that both textual identity indications and roof radar sensors have no significant effect on pedestrians' decisions but negatively impact pedestrians' visual attention, as evidenced by heightened fixation counts and prolonged fixation durations, particularly under yielding conditions. Excessive visual and cognitive resource occupation suggests that not all explicit cues facilitate human-vehicle communication.

Submitted 8 July, 2024; originally announced July 2024.

Comments: 37 pages, 13 figures, 4 tables

arXiv:2407.06177 [pdf, other]

Vision-Language Models under Cultural and Inclusive Considerations

Authors: Antonia Karamolegkou, Phillip Rust, Yong Cao, Ruixiang Cui, Anders Søgaard, Daniel Hershcovich

Abstract: Large vision-language models (VLMs) can assist visually impaired people by describing images from their daily lives. Current evaluation datasets may not reflect diverse cultural user backgrounds or the situational context of this use case. To address this problem, we create a survey to determine caption preferences and propose a culture-centric evaluation benchmark by filtering VizWiz, an existing dataset with images taken by people who are blind. We then evaluate several VLMs, investigating their reliability as visual assistants in a culturally diverse setting. While our results for state-of-the-art models are promising, we identify challenges such as hallucination and misalignment of automatic evaluation metrics with human judgment. We make our survey, data, code, and model outputs publicly available.

Submitted 8 July, 2024; originally announced July 2024.

Comments: HuCLLM @ ACL 2024

arXiv:2407.05718 [pdf, other]

A Factuality and Diversity Reconciled Decoding Method for Knowledge-Grounded Dialogue Generation

Authors: Chenxu Yang, Zheng Lin, Chong Tian, Liang Pang, Lanrui Wang, Zhengyang Tong, Qirong Ho, Yanan Cao, Weiping Wang

Abstract: Grounding external knowledge can enhance the factuality of responses in dialogue generation. However, excessive emphasis on it might result in a lack of engaging and diverse expressions. Through the introduction of randomness in sampling, current approaches can increase diversity. Nevertheless, such sampling methods could undermine the factuality of dialogue generation. In this study, to discover a solution for advancing creativity without relying on questionable randomness and to subtly reconcile factuality and diversity within the source-grounded paradigm, a novel method named DoGe is proposed. DoGe can dynamically alternate between the utilization of internal parameter knowledge and external source knowledge based on the model's factual confidence. Extensive experiments on three widely-used datasets show that DoGe can not only enhance response diversity but also maintain factuality, and it significantly surpasses various other decoding strategy baselines.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05365 [pdf, other]

ElecBench: a Power Dispatch Evaluation Benchmark for Large Language Models

Authors: Xiyuan Zhou, Huan Zhao, Yuheng Cheng, Yuji Cao, Gaoqi Liang, Guolong Liu, Junhua Zhao

Abstract: In response to the urgent demand for grid stability and the complex challenges posed by renewable energy integration and electricity market dynamics, the power sector increasingly seeks innovative technological solutions. In this context, large language models (LLMs) have become a key technology to improve efficiency and promote intelligent progress in the power sector with their excellent natural language processing, logical reasoning, and generalization capabilities. Despite their potential, the absence of a performance evaluation benchmark for LLMs in the power sector has limited the effective application of these technologies. Addressing this gap, our study introduces "ElecBench", an evaluation benchmark of LLMs within the power sector. ElecBench aims to overcome the shortcomings of existing evaluation benchmarks by providing comprehensive coverage of sector-specific scenarios, deepening the testing of professional knowledge, and enhancing decision-making precision. The framework categorizes scenarios into general knowledge and professional business, further divided into six core performance metrics: factuality, logicality, stability, security, fairness, and expressiveness, and is subdivided into 24 sub-metrics, offering profound insights into the capabilities and limitations of LLM applications in the power sector. To ensure transparency, we have made the complete test set public, evaluating the performance of eight LLMs across various scenarios and metrics. ElecBench aspires to serve as the standard benchmark for LLM applications in the power sector, supporting continuous updates of scenarios, metrics, and models to drive technological progress and application.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05239 [pdf, other]

Competitive Analysis of Online Path Selection: Impacts of Path Length, Topology, and System-Level Costs

Authors: Ying Cao, Siyuan Yu, Xiaoqi Tan, Danny H. K. Tsang

Abstract: Consider a communication network to which a sequence of self-interested users come and send requests for data transmission between nodes. This work studies the question of how to guide the path selection choices made by those online-arriving users and maximize the social welfare. Competitive analysis is the main technical tool. Specifically, the impacts of path length bounds and topology on the competitive ratio of the designed algorithm are analyzed theoretically and explored experimentally. We observe intricate and interesting relationships between the empirical performance and the studied network parameters, which shed some light on how to design the network. We also investigate the influence of system-level costs on the optimal algorithm design.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04999 [pdf, other]

Rethinking the Effectiveness of Graph Classification Datasets in Benchmarks for Assessing GNNs

Authors: Zhengdao Li, Yong Cao, Kefan Shuai, Yiming Miao, Kai Hwang

Abstract: Graph classification benchmarks, vital for assessing and developing graph neural networks (GNNs), have recently been scrutinized, as simple methods like MLPs have demonstrated comparable performance. This leads to an important question: Do these benchmarks effectively distinguish the advancements of GNNs over other methodologies? If so, how do we quantitatively measure this effectiveness? In response, we first propose an empirical protocol based on a fair benchmarking framework to investigate the performance discrepancy between simple methods and GNNs. We further propose a novel metric to quantify the dataset effectiveness by considering both dataset complexity and model performance. To the best of our knowledge, our work is the first to thoroughly study and provide an explicit definition for dataset effectiveness in the graph learning area. Through testing across 16 real-world datasets, we found our metric to align with existing studies and intuitive assumptions. Finally, we explore the causes behind the low effectiveness of certain datasets by investigating the correlation between intrinsic graph properties and class labels, and we developed a novel technique supporting the correlation-controllable synthetic dataset generation. Our findings shed light on the current understanding of benchmark datasets, and our new platform could fuel the future evolution of graph classification benchmarks.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04531 [pdf, other]

Neutral atomic and molecular gas dynamics in the nearby spiral galaxies NGC 1512, NGC 4535, and NGC 7496

Authors: Sebastian Laudage, Cosima Eibensteiner, Frank Bigiel, Adam K. Leroy, Sharon Meidt, Eva Schinnerer, W. J. G. de Blok, Miguele Querejeta, Sophia Stuber, Dario Colombo, Erik Rosolowsky, D. J. Pisano, Dyas Utomo, Rebecca C. Levy, Ralf Klessen, Yixian Cao, Eric W. Koch, Sushma Kurapati, Patricia Sanchez-Blazquez, Justus Neumann, Lukas Neumann, Hsi-An Pan, Thomas G. Williams

Abstract: Neutral atomic gas (HI) effectively traces galactic dynamics across mid to large galactocentric radii. However, its limitations in observing small-scale changes within the central few kiloparsecs, coupled with the often observed HI deficit in galactic centers, necessitate using molecular gas emission as a preferred tracer in these regions. Understanding the dynamics of both neutral atomic and molecular gas is crucial for a more complete understanding of how galaxies evolve, funnel gas from the outer disk into their central parts, and eventually form stars. In this work we aim to quantify the dynamics of both the neutral atomic and the molecular gas in the nearby spiral galaxies NGC 1512, NGC 4535, and NGC 7496 using new MeerKAT-HI observations together with ALMA CO (2-1) observations from the PHANGS collaboration. We use the analysis tool 3D-Barolo to fit tilted ring models to the HI and CO observations. A combined approach of using the HI to constrain the true disk orientation parameters before applying these to the CO datasets is tested. This paper sets expectations for the results of the upcoming high-resolution HI coverage of many galaxies in the PHANGS-ALMA sample using MeerKAT or VLA, to establish a robust methodology for characterizing galaxy orientations and deriving dynamics from combining new HI with existing CO data.

Submitted 5 July, 2024; originally announced July 2024.

Comments: accepted for publication in A&A; 13 pages, 9 Figures (+2 appendix pages)

arXiv:2407.03959 [pdf, other]

Skyrmion Hall effect in altermagnets

Authors: Zhejunyu Jin, Zhaozhuo Zeng, Yunshan Cao, Peng Yan

Abstract: It is widely believed that the skyrmion Hall effect is absent in antiferromagnets because of the vanishing topological charge. However, the Aharonov-Casher theory indicates the possibility of topological effects for neutral particles. In this work, we predict the skyrmion Hall effect in emerging altermagnets with zero net magnetization and zero skyrmion charge. We first show that the neutral skyrmion manifests as a magnetic quadrupole in altermagnets. We reveal a hidden gauge field from the magnetic quadrupole, which induces the skyrmion Hall effect when driven by spin transfer torque. Interestingly, we identify a sign change of the Hall angle when one swaps the anisotropic exchange couplings in altermagnets. Furthermore, we demonstrate that both the velocity and Hall angle of altermagnetic skyrmions sensitively depend on the current direction. Our findings reveal the critical role of the magnetic quadrupole in driving the skyrmion Hall effect with vanishing charge, and pave the way to discovering new Hall effects of neutral quasiparticles beyond magnetic skyrmions.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 6 pages and 5 figures

arXiv:2407.03320 [pdf, other]

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output

Authors: Pan Zhang, Xiaoyi Dong, Yuhang Zang, Yuhang Cao, Rui Qian, Lin Chen, Qipeng Guo, Haodong Duan, Bin Wang, Linke Ouyang, Songyang Zhang, Wenwei Zhang, Yining Li, Yang Gao, Peng Sun, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Hang Yan, Conghui He, Xingcheng Zhang, Kai Chen, Jifeng Dai, Yu Qiao, et al. (2 additional authors not shown)

Abstract: We present InternLM-XComposer-2.5 (IXC-2.5), a versatile large vision-language model that supports long-contextual input and output. IXC-2.5 excels in various text-image comprehension and composition applications, achieving GPT-4V level capabilities with merely a 7B LLM backend. Trained with 24K interleaved image-text contexts, it can seamlessly extend to 96K long contexts via RoPE extrapolation. This long-context capability allows IXC-2.5 to excel in tasks requiring extensive input and output contexts. Compared to its previous 2.0 version, InternLM-XComposer-2.5 features three major upgrades in vision-language comprehension: (1) Ultra-High Resolution Understanding, (2) Fine-Grained Video Understanding, and (3) Multi-Turn Multi-Image Dialogue. In addition to comprehension, IXC-2.5 extends to two compelling applications using extra LoRA parameters for text-image composition: (1) Crafting Webpages and (2) Composing High-Quality Text-Image Articles. IXC-2.5 has been evaluated on 28 benchmarks, outperforming existing open-source state-of-the-art models on 16 benchmarks. It also surpasses or competes closely with GPT-4V and Gemini Pro on 16 key tasks. InternLM-XComposer-2.5 is publicly available at https://github.com/InternLM/InternLM-XComposer.

Submitted 3 July, 2024; originally announced July 2024.

Comments: Technical Report. https://github.com/InternLM/InternLM-XComposer

arXiv:2407.02715 [pdf, other]

Revealing the Electronic Structure of NiPS$_3$ through Synchrotron-Based ARPES and Alkali Metal Dosing

Authors: Yifeng Cao, Qishuo Tan, Yucheng Guo, Clóvis Guerim Vieira, Mário S. C. Mazzon, Jude Laverock, Nicholas Russo, Hongze Gao, Chris Jozwiak, Aaron Bostwick, Eli Rotenberg, Jinghua Guo, Ming Yi, Matheus J. S. Matos, Xi Ling, Kevin E. Smith

Abstract: This study presents a comprehensive analysis of the band structure in NiPS$_3$, a van der Waals layered antiferromagnet, utilizing high-resolution synchrotron-based angle-resolved photoemission spectroscopy (ARPES) and corroborative density functional theory (DFT) calculations. By tuning the parameters of the light source, we obtained a very clear and wide energy range band structure of NiPS$_3$. Comparison with DFT calculations allows for the identification of the orbital character of the observed bands. Our DFT calculations perfectly match the experimental results, and no adaptations were made to the calculations based on the experimental outcomes. The appearance of novel electronic structure upon alkali metal dosing (AMD) was also observed in this ARPES study. Above the valence band maximum, the structure of conduction bands and bands from defect states were observed for the first time in NiPS$_3$. We provide the direct determination of the band gap of NiPS$_3$ as 1.3 eV from the band structure by AMD. In addition, detailed temperature-dependent ARPES spectra were obtained across a range that spans both below and above the Néel transition temperature of NiPS$_3$. We found that the paramagnetic and antiferromagnetic states have almost identical spectra, indicating the highly localized nature of Ni $d$ states.

Submitted 2 July, 2024; originally announced July 2024.

Comments: 4 figures

arXiv:2407.02542 [pdf, other]

ECAT: A Entire space Continual and Adaptive Transfer Learning Framework for Cross-Domain Recommendation

Authors: Chaoqun Hou, Yuanhang Zhou, Yi Cao, Tong Liu

Abstract: In industrial recommendation systems, there are several mini-apps designed to meet the diverse interests and needs of users. Their sample space is merely a small subset of the entire space, making it challenging to train an efficient model. In recent years, there have been many excellent studies related to cross-domain recommendation aimed at mitigating the problem of data sparsity. However, few of them have simultaneously considered the adaptability of both sample and representation continual transfer setting to the target task. To overcome the above issue, we propose an Entire space Continual and Adaptive Transfer learning framework called ECAT, which includes two core components: First, as for sample transfer, we propose a two-stage method that realizes a coarse-to-fine process. Specifically, we perform an initial selection through a graph-guided method, followed by a fine-grained selection using a domain adaptation method. Second, we propose an adaptive knowledge distillation method for continually transferring the representations from a model that is well-trained on the entire space dataset. ECAT enables full utilization of the entire space samples and representations under the supervision of the target task, while avoiding negative migration. Comprehensive experiments on real-world industrial datasets from Taobao show that ECAT advances state-of-the-art performance on offline metrics, and brings +13.6% CVR and +8.6% orders for Baiyibutie, a famous mini-app of Taobao.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.02182 [pdf, other]

Occlusion-Aware Seamless Segmentation

Authors: Yihong Cao, Jiaming Zhang, Hao Shi, Kunyu Peng, Yuhongxuan Zhang, Hui Zhang, Rainer Stiefelhagen, Kailun Yang

Abstract: Panoramic images can broaden the Field of View (FoV), occlusion-aware prediction can deepen the understanding of the scene, and domain adaptation can transfer across viewing domains. In this work, we introduce a novel task, Occlusion-Aware Seamless Segmentation (OASS), which simultaneously tackles all these three challenges. For benchmarking OASS, we establish a new human-annotated dataset for Blending Panoramic Amodal Seamless Segmentation, i.e., BlendPASS. Besides, we propose the first solution UnmaskFormer, aiming at unmasking the narrow FoV, occlusions, and domain gaps all at once. Specifically, UnmaskFormer includes the crucial designs of Unmasking Attention (UA) and Amodal-oriented Mix (AoMix). Our method achieves state-of-the-art performance on the BlendPASS dataset, reaching a remarkable mAPQ of 26.58% and mIoU of 43.66%. On public panoramic semantic segmentation datasets, i.e., SynPASS and DensePASS, our method outperforms previous methods and obtains 45.34% and 48.08% in mIoU, respectively. The fresh BlendPASS dataset and our source code will be made publicly available at https://github.com/yihong-97/OASS.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024. The fresh dataset and the source code will be made publicly available at https://github.com/yihong-97/OASS

arXiv:2407.02159 [pdf, other]

SparseSSP: 3D Subcellular Structure Prediction from Sparse-View Transmitted Light Images

Authors: Jintu Zheng, Yi Ding, Qizhe Liu, Yi Cao, Ying Hu, Zenan Wang

Abstract: Traditional fluorescence staining is phototoxic to live cells, slow, and expensive; thus, the subcellular structure prediction (SSP) from transmitted light (TL) images is emerging as a label-free, faster, low-cost alternative. However, existing approaches utilize 3D networks for one-to-one voxel level dense prediction, which necessitates a frequent and time-consuming Z-axis imaging process. Moreover, 3D convolutions inevitably lead to significant computation and GPU memory overhead. Therefore, we propose an efficient framework, SparseSSP, predicting fluorescent intensities within the target voxel grid in an efficient paradigm instead of relying entirely on 3D topologies. In particular, SparseSSP makes two pivotal improvements to prior works. First, SparseSSP introduces a one-to-many voxel mapping paradigm, which permits the sparse TL slices to reconstruct the subcellular structure. Second, we propose a hybrid dimensions topology, which folds the Z-axis information into channel features, enabling the 2D network layers to tackle SSP under low computational cost. We conduct extensive experiments to validate the effectiveness and advantages of SparseSSP on diverse sparse imaging ratios, and our approach achieves a leading performance compared to pure 3D topologies. SparseSSP reduces imaging frequencies compared to previous dense-view SSP (i.e., the number of imaging steps is reduced by up to 87.5%), which is significant in visualizing rapid biological dynamics on low-cost devices and samples.

Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2407.01953 [pdf, other]

CatMemo at the FinLLM Challenge Task: Fine-Tuning Large Language Models using Data Fusion in Financial Applications

Authors: Yupeng Cao, Zhiyuan Yao, Zhi Chen, Zhiyang Deng

Abstract: The integration of Large Language Models (LLMs) into financial analysis has garnered significant attention in the NLP community. This paper presents our solution to IJCAI-2024 FinLLM challenge, investigating the capabilities of LLMs within three critical areas of financial tasks: financial classification, financial text summarization, and single stock trading. We adopted Llama3-8B and Mistral-7B as base models, fine-tuning them through Parameter Efficient Fine-Tuning (PEFT) and Low-Rank Adaptation (LoRA) approaches. To enhance model performance, we combine datasets from task 1 and task 2 for data fusion. Our approach aims to tackle these diverse tasks in a comprehensive and integrated manner, showcasing LLMs' capacity to address diverse and complex financial tasks with improved accuracy and decision-making capabilities.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01881 [pdf, other]

Spectral evidence for NiPS3 as a Mott-Hubbard insulator

Authors: Yifeng Cao, Nicholas Russo, Qishuo Tan, Xi Ling, Jinghua Guo, Yi-de Chuang, Kevin E. Smith

Abstract: The layered van der Waals trichalcogenide NiPS3 has attracted widespread attention due to its unique optical, magnetic, and electronic properties. The complexity of NiPS3 itself, however, has also led to ongoing debates regarding its characteristics such as the existence of self-doped ligand holes. In this study, X-ray absorption spectroscopy and resonant inelastic X-ray scattering have been applied to investigate the electronic structure of NiPS3. With the aid of theoretical calculations using the charge-transfer multiplet model, we provide experimental evidence for NiPS3 being a Mott-Hubbard insulator rather than a charge-transfer insulator. Moreover, we explain why some previous XAS studies have concluded that NiPS3 is a charge-transfer insulator by comparing surface and bulk sensitive spectra.

Submitted 1 July, 2024; originally announced July 2024.

Comments: 6 figures

arXiv:2407.01716 [pdf, other]

PHANGS-MeerKAT and MHONGOOSE HI observations of nearby spiral galaxies: physical drivers of the molecular gas fraction, $R_{\mathrm{mol}}$

Authors: Cosima Eibensteiner, Jiayi Sun, Frank Bigiel, Adam K. Leroy, Eva Schinnerer, Erik Rosolowsky, Sushma Kurapati, D. J. Pisano, W. J. G de Blok, Ashley T. Barnes, Mallory Thorp, Dario Colombo, Eric W. Koch, I-Da Chiang, Eve C. Ostriker, Eric J. Murphy, Nikki Zabel, Sebstian Laudage, Filippo M. Maccagni, Julia Healy, Srikrishna Sekhar, Dyas Utomo, Jakob den Brok, Yixian Cao, Mélanie Chevance , et al. (14 additional authors not shown)

Abstract: The molecular-to-atomic gas ratio is crucial to the evolution of the interstellar medium in galaxies. We investigate the balance between the atomic ($Σ_{\rm HI}$) and molecular gas ($Σ_{\rm H2}$) surface densities in eight nearby star-forming galaxies using new high-quality observations from MeerKAT and ALMA (for HI and CO, respectively). We define the molecular gas ratio as $R_{\rm mol} = Σ_{\rm H2} / Σ_{\rm HI}$ and measure how it depends on local conditions in the galaxy disks using multi-wavelength observations. We find that, depending on the galaxy, HI is detected at $>3σ$ out to 20-120 kpc in galactocentric radius ($r_{\rm gal}$). The typical radius at which $Σ_{\rm HI}$ reaches 1~$\rm M_\odot~pc^{-2}$ is $r_{\rm HI}\approx22$~kpc, which corresponds to 1-3 times the optical radius ($r_{25}$). $R_{\rm mol}$ correlates best with the dynamical equilibrium pressure, P$_{\rm DE}$, among potential drivers studied, with a median correlation coefficient of $<ρ>=0.89$. Correlations between $R_{\rm mol}$ and star formation rate, total gas and stellar surface density, metallicity, and $Σ_{\rm SFR}$/P$_{\rm DE}$ are present but somewhat weaker. Our results also show a direct correlation between P$_{\rm DE}$ and $Σ_{\rm SFR}$, supporting self-regulation models. Quantitatively, we measure similar scalings as previous works and attribute the modest differences that we find to the effect of varying resolution and sensitivity. At $r_{\rm gal} {\gtrsim}0.4~r_{25}$, atomic gas dominates over molecular gas, and at the balance of these two gas phases, we find that the baryon mass is dominated by stars, with $Σ_{*} > 5~Σ_{\rm gas}$. Our study constitutes an important step in the statistical investigation of how local galaxy properties impact the conversion from atomic to molecular gas in nearby galaxies.

Submitted 1 July, 2024; originally announced July 2024.

Comments: accepted for publication in A&A; 20 pages, 12 Figures (+4 appendix pages)

arXiv:2407.01523 [pdf, other]

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Authors: Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

Abstract: Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document understanding (DU). However, their abilities on long-context DU remain an open problem. This work presents MMLongBench-Doc, a long-context, multi-modal benchmark comprising 1,062 expert-annotated questions. Distinct from previous datasets, it is constructed upon 130 lengthy PDF-formatted documents with an average of 49.4 pages and 20,971 textual tokens. Towards comprehensive evaluation, answers to these questions rely on pieces of evidence from (1) different sources (text, image, chart, table, and layout structure) and (2) various locations (i.e. page number). Moreover, 33.2% of the questions are cross-page questions requiring evidence across multiple pages. 22.8% of the questions are designed to be unanswerable for detecting potential hallucinations. Experiments on 14 LVLMs demonstrate that long-context DU greatly challenges current models. Notably, the best-performing model, GPT-4o, achieves an F1 score of only 42.7%, while the second-best, GPT-4V, scores 31.4%. Furthermore, 12 LVLMs (all except GPT-4o and GPT-4V) even present worse performance than their LLM counterparts which are fed with lossy-parsed OCR documents. These results validate the necessity of future research toward more capable long-context LVLMs. Project Page: https://mayubo2333.github.io/MMLongBench-Doc

Submitted 10 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01121 [pdf, other]

On well (edge) dominated and equimatchable strong product graphs

Authors: Yixin Cao, Guiqiang Mou, Jianxin Wang

Abstract: A graph is well-(edge-)dominated if every minimal (edge) dominating set is minimum. A graph is equimatchable if every maximal matching is maximum. We study these concepts on strong product graphs. We fully characterize well-edge-dominated and equimatchable strong product graphs of nontrivial graphs, and identify a large family of graphs whose strong products with any well-dominated graph are well-dominated.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.01008 [pdf]

Periodic domain inversion in single crystal barium titanate-on-insulator thin film

Authors: Pragati Aashna, Hong-Lin Lin, Yu Cao, Yuhui Yin, Yuan Gao, Sakthi Sanjeev Mohanraj, Di Zhu, Aaron Danner

Abstract: We report experimentally achieving the first-ever electric field periodic poling of single crystal barium titanate (BTO, or BaTiO3) thin film on insulator. Owing to the outstanding optical nonlinearities of BTO, this result is a key step towards achieving quasi-phase-matching in BTO. We first grow the BTO thin film on a dysprosium scandate substrate using pulsed laser deposition with a thin layer of strontium ruthenate later serving as the bottom electrode for poling. We present characterization of the BTO thin film using x-ray diffraction and piezo-response force microscopy to clearly demonstrate single crystal, single domain growth of the film, which enables the desired periodic poling. To investigate the poling quality, we apply both non-destructive piezo-response force microscopy and destructive etching-assisted scanning electron microscopy, and we show that high quality, uniform and intransient poling with 50 % duty cycle and periods ranging from 2 μm to 10 μm is achieved. The successful realization of periodic poling in BTO thin film unlocks the potential for highly efficient nonlinear processes under quasi-phase-matching that seemed far-fetched with prior polycrystalline BTO thin films, which predominantly relied on efficiency-limited random or non-phase matching conditions, and is a key step towards integration of BTO photonic devices.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00497 [pdf, other]

LLMs-as-Instructors: Learning from Errors Toward Automating Model Improvement

Authors: Jiahao Ying, Mingbao Lin, Yixin Cao, Wei Tang, Bo Wang, Qianru Sun, Xuanjing Huang, Shuicheng Yan

Abstract: This paper introduces the innovative "LLMs-as-Instructors" framework, which leverages the advanced Large Language Models (LLMs) to autonomously enhance the training of smaller target models. Inspired by the theory of "Learning from Errors", this framework employs an instructor LLM to meticulously analyze the specific errors within a target model, facilitating targeted and efficient training cycles. Within this framework, we implement two strategies: "Learning from Error," which focuses solely on incorrect responses to tailor training data, and "Learning from Error by Contrast", which uses contrastive learning to analyze both correct and incorrect responses for a deeper understanding of errors. Our empirical studies, conducted with several open-source models, demonstrate significant improvements across multiple benchmarks, including mathematical reasoning, coding abilities, and factual knowledge. Notably, the refined Llama-3-8b-Instruction has outperformed ChatGPT, illustrating the effectiveness of our approach. By leveraging the strengths of both strategies, we have attained a more balanced performance improvement on both in-domain and out-of-domain benchmarks. Our code can be found at https://yingjiahao14.github.io/LLMs-as-Instructors-pages/.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.20006 [pdf, other]

On the Trade-off between Flatness and Optimization in Distributed Learning

Authors: Ying Cao, Zhaoxian Wu, Kun Yuan, Ali H. Sayed

Abstract: This paper proposes a theoretical framework to evaluate and compare the performance of gradient-descent algorithms for distributed learning in relation to their behavior around local minima in nonconvex environments. Previous works have noticed that convergence toward flat local minima tends to enhance the generalization ability of learning algorithms. This work discovers two interesting results. First, it shows that decentralized learning strategies are able to escape faster away from local minimizers and favor convergence toward flatter minima relative to the centralized solution in the large-batch training regime. Second, and importantly, the ultimate classification accuracy is not solely dependent on the flatness of the local minimizer but also on how well a learning algorithm can approach that minimum. In other words, the classification accuracy is a function of both flatness and optimization performance. The paper examines the interplay between the two measures of flatness and optimization error closely. One important conclusion is that decentralized strategies of the diffusion type deliver enhanced classification accuracy because they strike a more favorable balance between flatness and optimization performance.

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19317 [pdf, other]

Jump Starting Bandits with LLM-Generated Prior Knowledge

Authors: Parand A. Alamdari, Yanshuai Cao, Kevin H. Wilson

Abstract: We present substantial evidence demonstrating the benefits of integrating Large Language Models (LLMs) with a Contextual Multi-Armed Bandit framework. Contextual bandits have been widely used in recommendation systems to generate personalized suggestions based on user-specific contexts. We show that LLMs, pre-trained on extensive corpora rich in human knowledge and preferences, can simulate human behaviours well enough to jump-start contextual multi-armed bandits to reduce online learning regret. We propose an initialization algorithm for contextual bandits by prompting LLMs to produce a pre-training dataset of approximate human preferences for the bandit. This significantly reduces online learning regret and data-gathering costs for training such models. Our approach is validated empirically through two sets of experiments with different bandit setups: one which utilizes LLMs to serve as an oracle and a real-world experiment utilizing data from a conjoint survey experiment.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.17072 [pdf, other]

GATOS: missing molecular gas in the outflow of NGC5728 revealed by JWST

Authors: R. Davies, T. Shimizu, M. Pereira-Santaella, A. Alonso-Herrero, A. Audibert, E. Bellocchi, P. Boorman, S. Campbell, Y. Cao, F. Combes, D. Delaney, T. Diaz-Santos, F. Eisenhauer, D. Esparza Arredondo, H. Feuchtgruber, N. M. Forster Schreiber, L. Fuller, P. Gandhi, I. Garcia-Bernete, S. Garcia-Burillo, B. Garcia-Lorenzo, R. Genzel, S. Gillessen, O. Gonzalez Martin, H. Haidar , et al. (27 additional authors not shown)

Abstract: The ionisation cones of NGC5728 have a deficit of molecular gas based on millimetre observations of CO(2-1) emission. Although photoionisation from the active nucleus may lead to suppression of this transition, warm molecular gas can still be present. We report the detection of eight mid-infrared rotational H$_2$ lines throughout the central kiloparsec, including the ionisation cones, using integral field spectroscopic observations with JWST/MIRI MRS. The H$_2$ line ratios, characteristic of a power-law temperature distribution, indicate that the gas is warmest where it enters the ionisation cone through disk rotation, suggestive of shock excitation. In the nucleus, where the data can be combined with an additional seven ro-vibrational H$_2$ transitions, we find that moderate velocity (30 km s$^{-1}$) shocks in dense ($10^5$ cm$^{-3}$) gas, irradiated by an external UV field ($G_0 = 10^3$), do provide a good match to the full set. The warm molecular gas in the ionisation cone that is traced by the H$_2$ rotational lines has been heated to temperatures $>200$ K. Outside of the ionisation cone the molecular gas kinematics are undisturbed. However, within the ionisation cone, the kinematics are substantially perturbed, indicative of a radial flow, but one that is quantitatively different from the ionised lines. We argue that this outflow is in the plane of the disk, implying a short 50 pc acceleration zone up to speeds of about 400 km s$^{-1}$ followed by an extended deceleration over $\sim$700 pc where it terminates. The deceleration is due to both the radially increasing galaxy mass, and mass-loading as ambient gas in the disk is swept up.

Submitted 24 June, 2024; originally announced June 2024.

Comments: A&A accepted; 16 pages

arXiv:2406.17067 [pdf]

Optical Control of Adaptive Nanoscale Domain Networks

Authors: Marc Zajac, Tao Zhou, Tiannan Yang, Sujit Das, Yue Cao, Burak Guzelturk, Vladimir Stoica, Mathew Cherukara, John W. Freeland, Venkatraman Gopalan, Ramamoorthy Ramesh, Lane W. Martin, Long-Qing Chen, Martin Holt, Stephan Hruszkewycz, Haidan Wen

Abstract: Adaptive networks can sense and adjust to dynamic environments to optimize their performance. Understanding their nanoscale responses to external stimuli is essential for applications in nanodevices and neuromorphic computing. However, it is challenging to image such responses on the nanoscale with crystallographic sensitivity. Here, the evolution of nanodomain networks in (PbTiO3)n/(SrTiO3)n superlattices was directly visualized in real space as the system adapts to ultrafast repetitive optical excitations that emulate controlled neural inputs. The adaptive response allows the system to explore a wealth of metastable states that were previously inaccessible. Their reconfiguration and competition were quantitatively measured by scanning x-ray nanodiffraction as a function of the number of applied pulses, in which crystallographic characteristics were quantitatively assessed by assorted diffraction patterns using unsupervised machine-learning methods. The corresponding domain boundaries and their connectivity were drastically altered by light, holding promise for light-programmable nanocircuits in analogy to neuroplasticity. Phase-field simulations elucidate that the reconfiguration of the domain networks is a result of the interplay between photocarriers and transient lattice temperature. The demonstrated optical control scheme and the uncovered nanoscopic insights open opportunities for remote control of adaptive nanoscale domain networks.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16671 [pdf, other]

STAR: Swarm Technology for Aerial Robotics Research

Authors: Jimmy Chiun, Yan Rui Tan, Yuhong Cao, John Tan, Guillaume Sartoretti

Abstract: In recent years, the field of aerial robotics has witnessed significant progress, finding applications in diverse domains, including post-disaster search and rescue operations. Despite these strides, the prohibitive acquisition costs associated with deploying physical multi-UAV systems have posed challenges, impeding their widespread utilization in research endeavors. To overcome these challenges, we present STAR (Swarm Technology for Aerial Robotics Research), a framework developed explicitly to improve the accessibility of aerial swarm research experiments. Our framework introduces a swarm architecture based on the Crazyflie, a low-cost, open-source, palm-sized aerial platform, well suited for experimental swarm algorithms. To augment cost-effectiveness and mitigate the limitations of employing low-cost robots in experiments, we propose a landmark-based localization module leveraging fiducial markers. This module, also serving as a target detection module, enhances the adaptability and versatility of the framework. Additionally, collision and obstacle avoidance are implemented through velocity obstacles. The presented work strives to bridge the gap between theoretical advances and tangible implementations, thus fostering progress in the field.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16353 [pdf]

Micropores can enhance intrinsic fracture energy of hydrogels

Authors: Puyu Cao, Bin Chen, Yi Cao, Huajian Gao

Abstract: It is widely known that hydrogels, a class of soft materials made of a polymer chain network, are prone to fatigue failure. To understand the underlying mechanism, here we simulate polymer scission and fatigue initiation in the vicinity of a crack tip in a two-dimensional chain network. For a network without pores, our findings reveal that polymer scission can take place across multiple layers of chains, rather than just a single layer as assumed in the classical Lake-Thomas theory, consistent with previous studies. For a network with a high density of micropores, our results demonstrate that the pores can substantially enhance the intrinsic fracture energy of the network in direct proportion to the pore size. The underlying mechanism is attributed to pore-pore interactions which lead to a relatively uniform distribution of cohesive energy ahead of the crack tip. Our model suggests that micropores could be a promising strategy for improving the intrinsic fracture energy of hydrogels and that natural porous tissues may have evolved for enhanced fatigue resistance.

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16253 [pdf, other]

LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo , et al. (15 additional authors not shown)

Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload? This study focuses on the topic of LLMs assisting NLP researchers, particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) "deficiency" labels and corresponding explanations for individual segments of each review, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers": how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers": how effectively can LLMs identify potential issues, such as deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.

Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.16058 [pdf, other]

Text-Queried Target Sound Event Localization

Authors: Jinzheng Zhao, Xinyuan Qian, Yong Xu, Haohe Liu, Yin Cao, Davide Berghi, Wenwu Wang

Abstract: Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA). However, current SELD systems can only predict the activities of specific classes, for example, 13 classes in DCASE challenges. In this paper, we propose text-queried target sound event localization (SEL), a new paradigm in which the user describes a sound event in text and the SEL model predicts the location of the related sound event. The proposed task presents a more user-friendly way for human-computer interaction. We provide a benchmark study for the proposed task and perform experiments on datasets created by simulated room impulse response (RIR) and real RIR to validate the effectiveness of the proposed methods. We hope that our benchmark will inspire interest in and additional research on text-queried sound source localization.

Submitted 23 June, 2024; originally announced June 2024.

Comments: Accepted by EUSIPCO 2024

arXiv:2406.15439 [pdf]

doi 10.1038/s41467-024-49228-7

Heterogeneous peer effects of college roommates on academic performance

Authors: Yi Cao, Tao Zhou, Jian Gao

Abstract: Understanding how student peers influence learning outcomes is crucial for effective education management in complex social systems. The complexities of peer selection and evolving peer relationships, however, pose challenges for identifying peer effects using static observational data. Here we use both null-model and regression approaches to examine peer effects using longitudinal data from 5,272 undergraduates, where roommate assignments are plausibly random upon enrollment and roommate relationships persist until graduation. Specifically, we construct a roommate null model by randomly shuffling students among dorm rooms and introduce an assimilation metric to quantify similarities in roommate academic performance. We find significantly larger assimilation in actual data than in the roommate null model, suggesting roommate peer effects, whereby roommates have more similar performance than expected by chance alone. Moreover, assimilation exhibits an overall increasing trend over time, suggesting that peer effects become stronger the longer roommates live together. Our regression analysis further reveals the moderating role of peer heterogeneity. In particular, when roommates perform similarly, the positive relationship between a student's future performance and their roommates' average prior performance is more pronounced, and their ordinal rank in the dorm room has an independent effect. Our findings contribute to understanding the role of college roommates in influencing student academic performance.

Submitted 29 May, 2024; originally announced June 2024.

Comments: 56 pages, 4 figures, 2 tables, with Supplementary Information

Journal ref: Nature Communications, 15(1), 4785 (2024)

arXiv:2406.14912 [pdf, other]

FC3DNet: A Fully Connected Encoder-Decoder for Efficient Demoiréing

Authors: Zhibo Du, Long Peng, Yang Wang, Yang Cao, Zheng-Jun Zha

Abstract: Moiré patterns are commonly seen when taking photos of screens. Camera devices usually have limited hardware performance but take high-resolution photos. However, users are sensitive to the photo processing time, which presents a rarely considered challenge of efficiency for demoiréing methods. To balance the network speed and quality of results, we propose a Fully Connected enCoder-deCoder based Demoiréing Network (FC3DNet). FC3DNet utilizes features with multiple scales in each stage of the decoder for comprehensive information, which contains long-range patterns as well as various local moiré styles, both of which are crucial aspects in demoiréing. Besides, to make full use of multiple features, we design a Multi-Feature Multi-Attention Fusion (MFMAF) module to weigh the importance of each feature and compress them for efficiency. These designs enable our network to achieve performance comparable to state-of-the-art (SOTA) methods in real-world datasets while utilizing only a fraction of parameters, FLOPs, and runtime.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted by ICIP2024

arXiv:2406.14841 [pdf, other]

TabularMark: Watermarking Tabular Datasets for Machine Learning

Authors: Yihao Zheng, Haocheng Xia, Junyuan Pang, Jinfei Liu, Kui Ren, Lingyang Chu, Yang Cao, Li Xiong

Abstract: Watermarking is broadly utilized to protect ownership of shared data while preserving data utility. However, existing watermarking methods for tabular datasets fall short on the desired properties (detectability, non-intrusiveness, and robustness) and only preserve data utility from the perspective of data statistics, ignoring the performance of downstream ML models trained on the datasets. Can we watermark tabular datasets without significantly compromising their utility for training ML models while preventing attackers from training usable ML models on attacked datasets? In this paper, we propose a hypothesis testing-based watermarking scheme, TabularMark. Data noise partitioning is utilized for data perturbation during embedding, which is adaptable for numerical and categorical attributes while preserving the data utility. For detection, a custom-threshold one proportion z-test is employed, which can reliably determine the presence of the watermark. Experiments on real-world and synthetic datasets demonstrate the superiority of TabularMark in detectability, non-intrusiveness, and robustness.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13870 [pdf, other]

Splatter a Video: Video Gaussian Representation for Versatile Processing

Authors: Yang-Tian Sun, Yi-Hua Huang, Lin Ma, Xiaoyang Lyu, Yan-Pei Cao, Xiaojuan Qi

Abstract: Video representation is a long-standing problem that is crucial for various downstream tasks, such as tracking, depth prediction, segmentation, view synthesis, and editing. However, current methods either struggle to model complex motions due to the absence of 3D structure or rely on implicit 3D representations that are ill-suited for manipulation tasks. To address these challenges, we introduce a novel explicit 3D representation, the video Gaussian representation, that embeds a video into 3D Gaussians. Our proposed representation models video appearance in a 3D canonical space using explicit Gaussians as proxies and associates each Gaussian with 3D motions for video motion. This approach offers a more intrinsic and explicit representation than layered atlas or volumetric pixel matrices. To obtain such a representation, we distill 2D priors, such as optical flow and depth, from foundation models to regularize learning in this ill-posed setting. Extensive applications demonstrate the versatility of our new video representation. It has been proven effective in numerous video processing tasks, including tracking, consistent video depth and feature refinement, motion and appearance editing, and stereoscopic video generation. Project page: https://sunyangtian.github.io/spatter_a_video_web/

Submitted 26 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13167 [pdf, other]

QRMeM: Unleash the Length Limitation through Question then Reflection Memory Mechanism

Authors: Bo Wang, Heyan Huang, Yixin Cao, Jiahao Ying, Wei Tang, Chong Feng

Abstract: While large language models (LLMs) have made notable advancements in natural language processing, they continue to struggle with processing extensive text. Memory mechanisms offer a flexible solution for managing long contexts, utilizing techniques such as compression, summarization, and structuring to facilitate nuanced and efficient handling of large volumes of text. However, existing techniques face challenges with static knowledge integration, leading to insufficient adaptation to task-specific needs and missing multi-segmentation relationships, which hinders the dynamic reorganization and logical combination of relevant segments during the response process. To address these issues, we introduce a novel strategy, Question then Reflection Memory Mechanism (QRMeM), incorporating a dual-structured memory pool. This pool synergizes static textual content with structured graph guidance, fostering a reflective trial-and-error approach for navigating and identifying relevant segments. Our evaluation across multiple-choice questions (MCQ) and multi-document question answering (Multi-doc QA) benchmarks showcases QRMeM's enhanced performance compared to existing approaches.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.13093 [pdf, other]

RITA: A Real-time Interactive Talking Avatars Framework

Authors: Wuxinlin Cheng, Cheng Wan, Yupeng Cao, Sihan Chen

Abstract: RITA presents a high-quality real-time interactive framework built upon generative models, designed with practical applications in mind. Our framework enables the transformation of user-uploaded photos into digital avatars that can engage in real-time dialogue interactions. By leveraging the latest advancements in generative modeling, we have developed a versatile platform that not only enhances the user experience through dynamic conversational avatars but also opens new avenues for applications in virtual reality, online education, and interactive gaming. This work showcases the potential of integrating computer vision and natural language processing technologies to create immersive and interactive digital personas, pushing the boundaries of how we interact with digital content.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12333 [pdf]

Permeability distribution of gas drainage of borehole with the different moisture content caused polar permeability effect

Authors: Lei Zhang, Yao Zhang, Hongyu Pan, Yan Cao, Yuhang Chu, Shihua Yang

Abstract: In order to study the penetration characteristics in areas with different water content and different stress distributions in the radial direction of the hole after hydraulicization measures, an improved LFTD1812 triaxial permeability meter was used to measure the polar permeability characteristics of coal with different water content combinations, and the porosity, permeability, pressure gradient and seepage velocity of different samples were analyzed. The relationship between sample porosity, permeability, pressure gradient and seepage velocity was discussed, the influence of moisture content on permeability was discussed, and the directionality and polarization effect of permeability were found. The results show that the relationship between permeability and porosity shows two trends of exponential type and logarithmic type, and the porosity-permeability (φ-k) plane is divided into three influence regions: super index (I), index (II) and logarithm (III).

Submitted 18 June, 2024; originally announced June 2024.

Comments: 12 pages,10 figures

arXiv:2406.12268 [pdf, ps, other]

Channel Twinning: An Enabler for Next-Generation Ubiquitous Wireless Connectivity

Authors: Yashuai Cao, Jingbo Tan, Jintao Wang, Wei Ni, Ekram Hossain, Dusit Niyato

Abstract: The emerging concept of channel twinning (CT) has great potential to become a key enabler of ubiquitous connectivity in next-generation (xG) wireless systems. By fusing multimodal sensor data, CT advocates a high-fidelity and low-overhead channel acquisition paradigm, which is promising to provide accurate channel prediction in cross-domain and high-mobility scenarios of ubiquitous xG networks. However, the current literature lacks a universal CT architecture to address the challenges of heterogeneous scenarios, data, and resources in xG networks, which hinders the widespread deployment and applications of CT. This article discusses a new modularized CT architecture to bridge the barriers to scene recognition, cooperative sensing, and decentralized training. Based on the modularized design of CT, universal channel modeling, multimodal cooperative sensing, and lightweight twin modeling are described. Moreover, this article provides a concise definition, technical features, and case studies of CT, followed by potential applications of CT-empowered ubiquitous connectivity and some issues requiring future investigations.

Submitted 18 June, 2024; originally announced June 2024.

Comments: submitted to IEEE

arXiv:2406.12025 [pdf, other]

A 260 pc resolution ALMA map of HCN(1-0) in the galaxy NGC 4321

Authors: Lukas Neumann, Frank Bigiel, Ashley T. Barnes, Molly J. Gallagher, Adam Leroy, Antonio Usero, Erik Rosolowsky, Ivana Bešlić, Médéric Boquien, Yixian Cao, Mélanie Chevance, Dario Colombo, Daniel A. Dale, Cosima Eibensteiner, Kathryn Grasha, Jonathan D. Henshaw, María J. Jiménez-Donaire, Sharon Meidt, Shyam H. Menon, Eric J. Murphy, Hsi-An Pan, Miguel Querejeta, Toshiki Saito, Eva Schinnerer, Sophia K. Stuber , et al. (2 additional authors not shown)

Abstract: The star formation rate (SFR) is tightly connected to the amount of dense gas in molecular clouds. However, it is not fully understood how the relationship between dense molecular gas and star formation varies within galaxies and in different morphological environments. In this work, we study dense gas and star formation in the nearby spiral galaxy NGC 4321 to test how the amount of dense gas and its ability to form stars varies with environmental properties at 260 pc scales. We present new ALMA observations of HCN(1-0) line emission. Combined with existing CO(2-1) observations from ALMA, and H-alpha from MUSE, as well as F2100W from JWST to trace the SFR, we measure the HCN/CO line ratio, a proxy for the dense gas fraction and SFR/HCN, a proxy for the star formation efficiency of the dense gas. Towards the centre of the galaxy, HCN/CO systematically increases while SFR/HCN decreases, but these ratios stay roughly constant throughout the disc. Spiral arms, interarm regions, and bar ends show similar HCN/CO and SFR/HCN. On the bar, there is a significantly lower SFR/HCN at a similar HCN/CO. We conclude that the centres of galaxies show the strongest environmental influence on dense gas and star formation, suggesting either that clouds couple strongly to the surrounding pressure or that HCN is tracing more of the bulk molecular gas that is less efficiently converted into stars.
On the contrary, across the disc of NGC 4321, where the ISM pressure is typically low, SFR/HCN does not show large variations (< 0.3 dex) in agreement with Galactic observations of molecular clouds. Despite the large variations across environments and physical conditions, HCN/CO is a good predictor of the mean molecular gas surface density at 260 pc scales.

Submitted 17 June, 2024; originally announced June 2024.

Comments: 18 pages, 9 figures, accepted for pub in A&A, Jun 13, 2024

arXiv:2406.11739 [pdf, other]

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou , et al. (9 additional authors not shown)

Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3Det Challenge 2024 in conjunction with the 4th Open World Vision Workshop: Visual Perception via Learning in an Open World (VPLOW) at CVPR 2024, Seattle, US. This challenge aims to push the boundaries of object detection research and encourage innovation in this field. The V3Det Challenge 2024 consists of two tracks: 1) Vast Vocabulary Object Detection: This track focuses on detecting objects from a large set of 13204 categories, testing the detection algorithm's ability to recognize and locate diverse objects. 2) Open Vocabulary Object Detection: This track goes a step further, requiring algorithms to detect objects from an open set of categories, including unknown objects. In the following sections, we will provide a comprehensive summary and analysis of the solutions submitted by participants. By analyzing the methods and solutions presented, we aim to inspire future research directions in vast vocabulary and open-vocabulary object detection, driving progress in this field. Challenge homepage: https://v3det.openxlab.org.cn/challenge

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11507 [pdf, other]

Prior Normality Prompt Transformer for Multi-class Industrial Image Anomaly Detection

Authors: Haiming Yao, Yunkang Cao, Wei Luo, Weihang Zhang, Wenyong Yu, Weiming Shen

Abstract: Image anomaly detection plays a pivotal role in industrial inspection. Traditional approaches often demand distinct models for specific categories, resulting in substantial deployment costs. This raises concerns about multi-class anomaly detection, where a unified model is developed for multiple classes. However, applying conventional methods, particularly reconstruction-based models, directly to multi-class scenarios encounters challenges such as identical shortcut learning, hindering effective discrimination between normal and abnormal instances. To tackle this issue, our study introduces the Prior Normality Prompt Transformer (PNPT) method for multi-class image anomaly detection. PNPT strategically incorporates normal semantics prompting to mitigate the "identical mapping" problem. This entails integrating a prior normality prompt into the reconstruction process, yielding a dual-stream model. This innovative architecture combines normal prior semantics with abnormal samples, enabling dual-stream reconstruction grounded in both prior knowledge and intrinsic sample characteristics. PNPT comprises four essential modules: Class-Specific Normality Prompting Pool (CS-NPP), Hierarchical Patch Embedding (HPE), Semantic Alignment Coupling Encoding (SACE), and Contextual Semantic Conditional Decoding (CSCD). Experimental validation on diverse benchmark datasets and real-world industrial applications highlights PNPT's superior performance in multi-class industrial anomaly detection.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Accepted by IEEE Transactions on Industrial Informatics

arXiv:2406.10650 [pdf, other]

The Implicit Bias of Adam on Separable Data

Authors: Chenyang Zhang, Difan Zou, Yuan Cao

Abstract: Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maximum $\ell_\infty$-margin. Notably, for a general class of diminishing learning rates, this convergence occurs within polynomial time. Our results shed light on the difference between Adam and (stochastic) gradient descent from a theoretical perspective.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 33 pages, 2 figures

arXiv:2406.10583 [pdf, other]

Demonstration of neutron identification in neutrino interactions in the MicroBooNE liquid argon time projection chamber

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book , et al. (165 additional authors not shown)

Abstract: A significant challenge in measurements of neutrino oscillations is reconstructing the incoming neutrino energies. While modern fully-active tracking calorimeters such as liquid argon time projection chambers in principle allow the measurement of all final state particles above some detection threshold, undetected neutrons remain a considerable source of missing energy with little to no data constraining their production rates and kinematics. We present the first demonstration of tagging neutrino-induced neutrons in liquid argon time projection chambers using secondary protons emitted from neutron-argon interactions in the MicroBooNE detector. We describe the method developed to identify neutrino-induced neutrons and demonstrate its performance using neutrons produced in muon-neutrino charged current interactions. The method is validated using a small subset of MicroBooNE's total dataset. The selection yields a sample with $60\%$ of selected tracks corresponding to neutron-induced secondary protons.

Submitted 15 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0301

arXiv:2406.10123 [pdf, other]

Improving neutrino energy estimation of charged-current interaction events with recurrent neural networks in MicroBooNE

Authors: MicroBooNE collaboration, P. Abratenko, O. Alterkait, D. Andrade Aldana, L. Arellano, J. Asaadi, A. Ashkenazi, S. Balasubramanian, B. Baller, A. Barnard, G. Barr, D. Barrow, J. Barrow, V. Basque, J. Bateman, O. Benevides Rodrigues, S. Berkman, A. Bhanderi, A. Bhat, M. Bhattacharya, M. Bishai, A. Blake, B. Bogart, T. Bolton, J. Y. Book, et al. (164 additional authors not shown)

Abstract: We present a deep learning-based method for estimating the neutrino energy of charged-current neutrino-argon interactions. We employ a recurrent neural network (RNN) architecture for neutrino energy estimation in the MicroBooNE experiment, utilizing liquid argon time projection chamber (LArTPC) detector technology. Traditional energy estimation approaches in LArTPCs, which largely rely on reconstructing and summing visible energies, often experience sizable biases and resolution smearing because of the complex nature of neutrino interactions and the detector response. The estimation of neutrino energy can be improved after considering the kinematics information of reconstructed final-state particles. Utilizing kinematic information of reconstructed particles, the deep learning-based approach shows improved resolution and reduced bias for the muon neutrino Monte Carlo simulation sample compared to the traditional approach. In order to address the common concern about the effectiveness of this method on experimental data, the RNN-based energy estimator is further examined and validated with dedicated data-simulation consistency tests using MicroBooNE data. We also assess its potential impact on a neutrino oscillation study after accounting for all statistical and systematic uncertainties and show that it enhances physics sensitivity. This method has good potential to improve the performance of other physics analyses.

Submitted 14 June, 2024; originally announced June 2024.

Report number: FERMILAB-PUB-24-0287

arXiv:2406.09447 [pdf, ps, other]

Self-Sustainable Active Reconfigurable Intelligent Surfaces for Anti-Jamming in Wireless Communications

Authors: Yang Cao, Wenchi Cheng, Jingqing Wang, Wei Zhang

Abstract: Wireless devices can be easily attacked by jammers during transmission, which is a potential security threat for wireless communications. Active reconfigurable intelligent surface (RIS) attracts considerable attention and is expected to be employed in anti-jamming systems for secure transmission to significantly enhance the anti-jamming performance. However, active RIS introduces external power load, which increases the complexity of hardware and restricts the flexible deployment of active RIS. To overcome these drawbacks, we design an innovative self-sustainable structure in this paper, where the active RIS is energized by harvesting energy from base station (BS) signals through the time dividing based simultaneous wireless information and power transfer (TD-SWIPT) scheme. Based on the above structure, we develop the BS harvesting scheme based on joint transmit and reflecting beamforming with the aim of maximizing the achievable rate of active RIS-assisted system, where the alternating optimization (AO) algorithm based on stochastic successive convex approximation (SSCA) tackles the nonconvex optimization problem in the scheme. Simulation results verify the effectiveness of our developed BS harvesting scheme, which can attain higher anti-jamming performance than other schemes when given the same maximum transmit power.

Submitted 11 June, 2024; originally announced June 2024.

Comments: submitted to IEEE systems journal

Showing 1–50 of 2,153 results for author: Cao, Y