
Observation of a $p$-orbital higher-order topological insulator phase in puckered lattice acoustic metamaterials

Authors: Bing-Quan Wu, Zhi-Kang Lin, Li-Wei Wang, Jian-Hua Jiang

Abstract: The puckered lattice geometry, along with $p$-orbitals, is often overlooked in the study of topological physics. Here, we investigate the higher-order topology of the $p_{x,y}$-orbital bands in acoustic metamaterials using a simplified two-dimensional phosphorene lattice which possesses a puckered structure. Notably, unlike the $s$-orbital bands in planar lattices, the unique higher-order topology observed here is specific to $p$-orbitals and the puckered geometry due to the unusual hopping patterns induced by them. Using acoustic pump-probe measurements in metamaterials, we confirm the emergence of the edge and corner states arising due to the unconventional higher-order topology. We reveal the uniqueness of the higher-order topological physics here via complementary tight-binding calculations, finite-element simulations, and acoustic experiments. We analyze the underlying physics of the special properties of the edge and corner states in the puckered lattice acoustic metamaterials from the picture of Wannier orbitals. Our work sheds light on the intriguing $p$-orbital topological physics in puckered lattices and acoustic metamaterials, which leads to unconventional topological boundary states.

Submitted 9 May, 2024; originally announced May 2024.

Comments: Accepted by Phys. Rev. B

arXiv:2405.06170 [pdf]

doi 10.1103/PhysRevB.108.195126

Non-Hermitian topological phases and skin effects in kagome lattices

Authors: Li-Wei Wang, Zhi-Kang Lin, Jian-Hua Jiang

Abstract: Non-Hermitian physics has added new ingredients to topological physics, leading to the rising frontier of non-Hermitian topological phases. In this study, we investigate Chern insulator phases emerging from non-Hermitian kagome models with non-reciprocal and pure imaginary next-nearest neighbor hoppings. In the presence or absence of $C_3$ rotation symmetry, hybrid topological-skin effects are explored through the identification of distinct corner skin modes in different energy regions within two band gaps. By employing dynamical analysis, the underlying physics is revealed from the non-Hermitian skin effects associated with the chiral edge states, leading to diverse non-Hermitian bulk-boundary responses. The simplicity of these kagome models and their rich emergent topological phenomena suggest that they are appealing candidates for studying non-Hermitian topological phases. We further discuss the possible realizations of these models in non-Hermitian metamaterials.

Submitted 9 May, 2024; originally announced May 2024.

Journal ref: Phys. Rev. B 108, 195126 (2023)

arXiv:2405.05975 [pdf, other]

Deep-learning design of graphene metasurfaces for quantum control and Dirac electron holography

Authors: Chen-Di Han, Li-Li Ye, Zin Lin, Vassilios Kovanis, Ying-Cheng Lai

Abstract: Metasurfaces are sub-wavelength patterned layers for controlling waves in physical systems. In optics, metasurfaces are created by materials with different dielectric constants and are capable of unconventional functionalities. We develop a deep-learning framework for Dirac-material metasurface design for controlling electronic waves. The metasurface is a configuration of circular graphene quantum dots, each created by an electric potential. Employing deep convolutional neural networks, we show that the original scattering wave can be reconstructed with fidelity over 95%, suggesting the feasibility of Dirac electron holography. Additional applications such as plane wave generation and the design of broadband and multi-functionality graphene metasurface systems are illustrated.

Submitted 1 May, 2024; originally announced May 2024.

Comments: 13 pages, 9 figures

arXiv:2405.05803 [pdf, other]

Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference

Authors: Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji

Abstract: Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention sink phenomenon that is prevalent in LLMs also persists in MLLMs, suggesting that initial tokens and nearest tokens receive the majority of attention, while middle vision tokens garner minimal attention in deep layers; (2) the presence of information migration, which implies that visual information is transferred to subsequent text tokens within the first few layers of MLLMs. As per our findings, we conclude that vision tokens are not necessary in the deep layers of MLLMs. Thus, we strategically withdraw them at a certain layer, enabling only text tokens to engage in subsequent layers. To pinpoint the ideal layer for vision tokens withdrawal, we initially analyze a limited set of tiny datasets and choose the first layer that meets the Kullback-Leibler divergence criterion. Our VTW approach can cut computational overhead by over 40% across diverse multimodal tasks while maintaining performance. Our code is released at https://github.com/lzhxmu/VTW.

Submitted 9 May, 2024; originally announced May 2024.
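
The layer-selection step mentioned in the abstract above ("choose the first layer that meets the Kullback-Leibler divergence criterion") can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy distributions, and the threshold value are all hypothetical, and the sketch assumes the criterion compares the model's output distribution with and without vision tokens withdrawn at a given layer.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def first_layer_meeting_criterion(reference, per_layer, threshold):
    """Return the smallest layer index whose output distribution is within
    `threshold` (in KL divergence) of the reference, or None if none is."""
    for layer, dist in enumerate(per_layer):
        if kl_divergence(reference, dist) < threshold:
            return layer
    return None


# Hypothetical toy data: output distribution with all vision tokens kept ...
reference = [0.7, 0.2, 0.1]
# ... and the distributions obtained when withdrawing them at layers 0..3.
per_layer = [
    [0.34, 0.33, 0.33],
    [0.50, 0.30, 0.20],
    [0.69, 0.21, 0.10],
    [0.70, 0.20, 0.10],
]

chosen = first_layer_meeting_criterion(reference, per_layer, threshold=0.01)
print(chosen)
```

Scanning from the shallowest layer and stopping at the first one that passes favors the earliest possible withdrawal, which is where the compute savings would be largest under this assumed criterion.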

arXiv:2405.05252 [pdf, other]

Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable. To this end, we introduce the Attention-driven Training-free Efficient Diffusion Model (AT-EDM) framework that leverages attention maps to perform run-time pruning of redundant tokens, without the need for any retraining. Specifically, for single-denoising-step pruning, we develop a novel ranking algorithm, Generalized Weighted Page Rank (G-WPR), to identify redundant tokens, and a similarity-based recovery method to restore tokens for the convolution operation. In addition, we propose a Denoising-Steps-Aware Pruning (DSAP) approach to adjust the pruning budget across different denoising timesteps for better generation quality. Extensive evaluations show that AT-EDM performs favorably against prior art in terms of efficiency (e.g., 38.8% FLOPs saving and up to 1.53x speed-up over Stable Diffusion XL) while maintaining nearly the same FID and CLIP scores as the full model. Project webpage: https://atedm.github.io.

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

arXiv:2405.04342 [pdf, other]

The Curse of Diversity in Ensemble-Based Exploration

Authors: Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin, Aaron Courville

Abstract: We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in the shared training data for each ensemble member, as well as the inefficiency of the individual ensemble members to learn from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions -- such as a larger replay buffer or a smaller ensemble size -- either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL) in both discrete and continuous control domains. Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Published as a conference paper at ICLR 2024

arXiv:2405.04332 [pdf, other]

WALLETRADAR: Towards Automating the Detection of Vulnerabilities in Browser-based Cryptocurrency Wallets

Authors: Pengcheng Xia, Yanhui Guo, Zhaowen Lin, Jun Wu, Pengbo Duan, Ningyu He, Kailong Wang, Tianming Liu, Yinliang Yue, Guoai Xu, Haoyu Wang

Abstract: Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion is accompanied by security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive security analysis but also a pressing need for specialized tools that can aid developers in reducing vulnerabilities during the development process. To fill the void, we present a comprehensive security analysis of browser-based wallets in this paper, along with the development of an automated tool designed for this purpose. We first compile a taxonomy of security vulnerabilities resident in cryptocurrency wallets by harvesting historical security reports. Based on this, we design WALLETRADAR, an automated detection framework that can accurately identify security issues based on static and dynamic analysis. Evaluation of 96 popular browser-based wallets shows WALLETRADAR's effectiveness, by successfully automating the detection process in 90% of these wallets with high precision. This evaluation has led to the discovery of 116 security vulnerabilities corresponding to 70 wallets. At the time of writing, we have received confirmations of 10 vulnerabilities from 8 wallet developers, with over $2,000 bug bounties. Further, we observed that 12 wallet developers have silently fixed 16 vulnerabilities after our disclosure.
WALLETRADAR can effectively automate the identification of security risks in cryptocurrency wallets, thereby enhancing software development quality and safety in the blockchain ecosystem.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Just accepted by the Automated Software Engineering Journal

arXiv:2405.04269

An Analysis of Sea Level Spatial Variability by Topological Indicators and $k$-means Clustering Algorithm

Authors: Zixin Lin, Nur Fariha Syaqina Zulkepli, Mohd Shareduwan Mohd Kasihmuddin, R. U. Gobithaasan

Abstract: The time-series data of sea level rise and fall contains crucial information on the variability of sea level patterns. Traditional $k$-means clustering is commonly used for categorizing regional variability of sea level; however, its results are not robust against a number of factors. This study analyzed fourteen datasets of monthly sea level in fourteen shoreline regions of Peninsular Malaysia. We applied a hybrid of a clustering technique, to categorize the data, and a topological data analysis method, to enhance the performance of our clustering analysis. Specifically, our approach utilized persistent homology and $k$-means/$k$-means++ clustering. The fourteen datasets from fourteen tide gauge stations were categorized into classes based on a prior categorization determined by topological information and on the probability, yielded by $k$-means/$k$-means++ clustering, that data points belong to certain groups. Our results demonstrated that our method significantly improves the performance of traditional clustering techniques.

Submitted 13 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: There are some mistakes in the submission, and it needs major revision

arXiv:2405.04086 [pdf, other]

Optimizing Language Model's Reasoning Abilities with Weak Supervision

Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on datasets extensively annotated by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities with minimal human supervision. In this work, we introduce self-reinforcement, which begins with Supervised Fine-Tuning (SFT) of the model using a small collection of annotated questions. Then it iteratively improves LLMs by learning from the differences in responses from the SFT and unfinetuned models on unlabeled questions. Our method provides an efficient approach without relying heavily on extensive human-annotated explanations. However, current reasoning benchmarks typically only include golden-reference answers or rationales. Therefore, we present PuzzleBen, a weakly supervised benchmark that comprises 25,147 complex questions, answers, and human-generated rationales across various domains, such as brainteasers, puzzles, riddles, parajumbles, and critical reasoning tasks. A unique aspect of our dataset is the inclusion of 10,000 unannotated questions, enabling us to explore utilizing less supervised data to boost LLMs' inference capabilities. Our experiments underscore the significance of PuzzleBen, as well as the effectiveness of our methodology as a promising direction in future endeavors.
Our dataset and code will be published soon on Anonymity Link.

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2405.03990 [pdf, other]

TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks

Authors: Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang

Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with a $(1-\epsilon)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm.
Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.

Submitted 19 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

Comments: 11 pages, 7 figures. This paper has been accepted by ICDCS 2024. The extended version of this paper is at arXiv:2404.14204
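
The storage-efficiency observation in the abstract above (models that share parameter blocks need each shared block cached only once) can be made concrete with a small sketch. The model names, block names, and sizes below are hypothetical and chosen only for illustration; this is not the paper's placement algorithm, just the storage accounting it builds on.

```python
def storage_without_sharing(models, block_sizes):
    """Total storage if every model keeps its own copy of each block."""
    return sum(block_sizes[b] for blocks in models.values() for b in blocks)


def storage_with_sharing(models, block_sizes):
    """Total storage if each distinct parameter block is stored only once."""
    unique_blocks = set().union(*models.values())
    return sum(block_sizes[b] for b in unique_blocks)


# Hypothetical models cached on one edge server, as sets of parameter blocks.
models = {
    "model_a": {"backbone", "head_a"},
    "model_b": {"backbone", "head_b"},
}
# Hypothetical block sizes in megabytes.
block_sizes = {"backbone": 100, "head_a": 10, "head_b": 12}

naive = storage_without_sharing(models, block_sizes)
shared = storage_with_sharing(models, block_sizes)
print(naive, shared)
```

Because the cost of adding a model depends on which blocks are already cached, the storage constraint is no longer a simple sum over models, which is consistent with the abstract's description of submodular constraints.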

arXiv:2405.03613 [pdf, other]

Dual Relation Mining Network for Zero-Shot Learning

Authors: Jinwei Han, Yingguo Gao, Zhiwen Lin, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

Abstract: Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods, which align visual features and attributes via a spatial attention mechanism, have exhibited significant progress. However, these methods only explore the visual-semantic relationship in the spatial dimension, which can lead to classification ambiguity when different attributes share similar attention regions, and the semantic relationship between attributes is rarely discussed. To alleviate the above problems, we propose a Dual Relation Mining Network (DRMN) to enable more effective visual-semantic interactions and learn the semantic relationship among attributes for knowledge transfer. Specifically, we introduce a Dual Attention Block (DAB) for visual-semantic relationship mining, which enriches visual information by multi-level feature fusion and conducts spatial attention for visual to semantic embedding. Moreover, an attribute-guided channel attention is utilized to decouple entangled semantic features. For semantic relationship modeling, we utilize a Semantic Interaction Transformer (SIT) to enhance the generalization of attribute representations among images. Additionally, a global classification branch is introduced as a complement to human-defined semantic attributes, and we then combine the results with attribute-based classification.
Extensive experiments demonstrate that the proposed DRMN leads to new state-of-the-art performances on three standard ZSL benchmarks, i.e., CUB, SUN, and AwA2.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.02057 [pdf]

Probing fragile topology with a screw dislocation

Authors: Ying Wu, Zhi-Kang Lin, Yating Yang, Zhida Song, Feng Li, Jian-Hua Jiang

Abstract: Fragile topology, akin to twisted bilayer graphene and the exotic phases therein, is a notable topological class with intriguing properties. However, due to its unique nature and the lack of bulk-edge correspondence, the experimental signature of fragile topology has been under debate since its inception. Here, we demonstrate experimentally that fragile topological phases with filling anomaly can be probed via screw dislocations, even though they do not support gapless edge states. Using a designer hexagonal phononic crystal with a fragile topological band gap, we find that 1D gapless bound modes can emerge at a screw dislocation due to the bulk fragile topology. We then establish a connection between our system and the twisted boundary condition via the gauge invariance principle and illustrate that such an emergent phenomenon is an intrinsic property of fragile topological phases with filling anomaly. We observe experimentally the 1D topological bound states using pump-probe measurements of their dispersion and wavefunctions, which unveils a novel bulk-defect correspondence of fragile topology and a powerful tool for probing fragile topological phases and materials.

Submitted 3 May, 2024; originally announced May 2024.

Comments: Submitted to Science Bulletin

arXiv:2405.01851 [pdf, other]

Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

Abstract: There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been explored to optimize computation distribution, achieve load balance, and minimize communication cost across processors. Yet their practical effectiveness in the dynamic and diverse real-world mobile environment is less explored. This paper presents a holistic empirical study to assess the capabilities and challenges associated with parallel DL inference on heterogeneous mobile processors. Through carefully designed experiments covering various DL models, mobile software/hardware environments, workload patterns, and resource availability, we identify limitations of existing techniques and highlight opportunities for cross-level optimization.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.00954 [pdf, other]

X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

Authors: Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji

Abstract: Automatic text-guided 3D avatar generation has made significant progress recently. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, simplifying optimization through step-by-step generation. To tackle oversaturation, we introduce Adaptive Variational Parameter (AVP), representing avatars as an adaptive distribution during training. Additionally, we present Avatar-aware Score Distillation Sampling (ASDS), a novel technique that incorporates avatar-aware noise into rendered images for improved generation quality during optimization. Extensive evaluations confirm the superiority of X-Oscar over existing text-to-3D and text-to-avatar approaches. Our anonymous project page: https://xmu-xiaoma666.github.io/Projects/X-Oscar/.

Submitted 1 May, 2024; originally announced May 2024.

Comments: ICML2024

arXiv:2405.00700 [pdf]

Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. However, artificial neuron devices with high efficiency, high stability and low power consumption are still far from practical application. Due to its special insulator-metal phase transition, Vanadium Dioxide (VO2) has been considered an ideal candidate for neuronal device fabrication. However, its intrinsic insulating state requires the VO2 neuronal device to be driven under large bias voltage, resulting in high power consumption and low frequency. Thus, in the current study, we have addressed this challenge by preparing oxygen vacancies modulated VO2 film (VO2-x) and fabricating VO2-x neuronal devices for Spiking Neural Network (SNN) construction. Results indicate that the neuron devices can be operated under lower voltage with improved processing speed. The proposed VO2-x based back-propagation SNN (BP-SNN) system, trained with the MNIST dataset, demonstrates excellent accuracy in image recognition. Our study not only demonstrates VO2-x based neurons and an SNN system for practical application, but also offers an effective way to optimize future neuromorphic computing systems through a defect engineering strategy.

Submitted 16 April, 2024; originally announced May 2024.

Comments: 18 pages, 4 figures

arXiv:2404.19209 [pdf, other]

AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices

Authors: Zheng Lin, Bin Guo, Sicong Liu, Wentao Zhou, Yasan Ding, Yu Zhang, Zhiwen Yu

Abstract: Deep neural networks (DNNs) have driven extensive applications in mobile technology. However, for long-running mobile apps like voice assistants or video applications on smartphones, energy efficiency is critical for battery-powered devices. The rise of heterogeneous processors in mobile devices today has introduced new challenges for optimizing energy efficiency. Our key insight is that partitioning computations across different processors for parallelism and speedup does not necessarily reduce energy consumption and may even increase it. To address this, we present AdaOper, an energy-efficient concurrent DNN inference system. It optimizes energy efficiency on mobile heterogeneous processors while maintaining responsiveness. AdaOper includes a runtime energy profiler that dynamically adjusts operator partitioning to optimize energy efficiency based on dynamic device conditions. We conduct preliminary experiments, which show that AdaOper reduces energy consumption by 16.88% compared to the existing concurrent method while ensuring real-time performance.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18829 [pdf, other]

Disentangling the development of collective flow in high energy proton proton collisions with a multiphase transport model

Authors: Liang Zheng, Lian Liu, Zi-Wei Lin, Qi-Ye Shou, Zhong-Bao Yin

Abstract: In this work, we investigate the collective flow development in high energy proton proton (pp) collisions with a multiphase transport model (AMPT) based on PYTHIA8 initial conditions with a sub-nucleon structure. It is found that the PYTHIA8 based AMPT model can reasonably describe both the charged hadron productions and elliptic flow experimental data measured in pp collisions at $\sqrt{s}=13$ TeV. By turning on the parton and hadron rescatterings in AMPT separately, we find that the observed collective flow in pp collisions is largely developed during the parton evolutions, while no significant flow effect can be generated with the pure hadronic rescatterings. It is also shown that the parton escape mechanism is important for describing both the magnitude of the two-particle cumulant and the sign of the four-particle cumulants. We emphasize that the strong mass ordering of the elliptic flow results from the coalescence process in the transport model and can thus be regarded as unique evidence related to the creation of deconfined parton matter in high energy pp collisions.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18533 [pdf, other]

Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

Authors: Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang

Abstract: Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic and non-deterministic, e.g. case study or human evaluation, hindering the development of the field. To bridge the gap, we approach concept-based explanation evaluation via faithfulness and readability. We first introduce a formal definition of concept generalizable to diverse concept-based explanations. Based on this, we quantify faithfulness via the difference in the output upon perturbation. We then provide an automatic measure for readability, by measuring the coherence of patterns that maximally activate a concept. This measure serves as a cost-effective and reliable substitute for human evaluation. Finally, based on measurement theory, we describe a meta-evaluation method for evaluating the above measures via reliability and validity, which can be generalized to other tasks as well. Extensive experimental analysis has been conducted to validate and inform the selection of concept evaluation measures.

Submitted 29 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18173 [pdf, other]

Eigenvector overlaps in large sample covariance matrices and nonlinear shrinkage estimators

Authors: Zeqin Lin, Guangming Pan

Abstract: Consider a data matrix $Y = [\mathbf{y}_1, \cdots, \mathbf{y}_N]$ of size $M \times N$, where the columns are independent observations from a random vector $\mathbf{y}$ with zero mean and population covariance $Σ$. Let $\mathbf{u}_i$ and $\mathbf{v}_j$ denote the left and right singular vectors of $Y$, respectively. This study investigates the eigenvector/singular vector overlaps $\langle {\mathbf{u}_i, D_1 \mathbf{u}_j} \rangle$, $\langle {\mathbf{v}_i, D_2 \mathbf{v}_j} \rangle$ and $\langle {\mathbf{u}_i, D_3 \mathbf{v}_j} \rangle$, where $D_k$ are general deterministic matrices with bounded operator norms. We establish the convergence in probability of these eigenvector overlaps toward their deterministic counterparts with explicit convergence rates, when the dimension $M$ scales proportionally with the sample size $N$. Building on these findings, we offer a more precise characterization of the loss for Ledoit and Wolf's nonlinear shrinkage estimators of the population covariance $Σ$.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.17808 [pdf, other]

Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

Authors: Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Peng Liu, Hui Chen, Guiguang Ding

Abstract: Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus while keeping all tokens that have been merged in the vocabulary, it unavoidably holds tokens that primarily represent subwords of complete words and appear infrequently on their own in the text corpus. We term such tokens as Scaffold Tokens. Due to their infrequent appearance in the text corpus, Scaffold Tokens pose a learning imbalance issue for language models. To address that issue, we propose Scaffold-BPE, which incorporates a dynamic scaffold token removal mechanism by parameter-free, computation-light, and easy-to-implement modifications to the original BPE. This novel approach ensures the exclusion of low-frequency Scaffold Tokens from the token representations for the given texts, thereby mitigating the issue of frequency imbalance and facilitating model training. On extensive experiments across language modeling tasks and machine translation tasks, Scaffold-BPE consistently outperforms the original BPE, well demonstrating its effectiveness and superiority.

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.17785 [pdf, other]

Temporal Scaling Law for Large Language Models

Authors: Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jianwei Niu, Guiguang Ding

Abstract: Recently, Large Language Models (LLMs) have been widely adopted in a wide range of tasks, leading to increasing attention towards the research on how scaling LLMs affects their performance. Existing works, termed Scaling Laws, have discovered that the final test loss of LLMs scales as power-laws with model size, computational budget, and dataset size. However, the temporal change of the test loss of an LLM throughout its pre-training process remains unexplored, though it is valuable in many aspects, such as selecting better hyperparameters directly on the target LLM. In this paper, we propose the novel concept of Temporal Scaling Law, studying how the test loss of an LLM evolves as the training steps scale up. In contrast to modeling the test loss as a whole in a coarse-grained manner, we break it down and dive into the fine-grained test loss of each token position, and further develop a dynamic hyperbolic-law. Afterwards, we derive the much more precise temporal scaling law by studying the temporal patterns of the parameters in the dynamic hyperbolic-law. Results on both in-distribution (ID) and out-of-distribution (OOD) validation datasets demonstrate that our temporal scaling law accurately predicts the test loss of LLMs across training steps. Our temporal scaling law has broad practical applications. First, it enables direct and efficient hyperparameter selection on the target LLM, such as data mixture proportions. Secondly, viewing the LLM pre-training dynamics from the token position granularity provides some insights to enhance the understanding of LLM pre-training.

Submitted 16 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures; Under review

arXiv:2404.17466 [pdf, other]

FTL: Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks

Authors: Zhe Bai, Xishuo Wei, William Tang, Leonid Oliker, Zhihong Lin, Samuel Williams

Abstract: Deep learning algorithms provide a new paradigm to study high-dimensional dynamical behaviors, such as those in fusion plasma systems. Development of novel model reduction methods, coupled with detection of abnormal modes with plasma physics, opens a unique opportunity for building efficient models to identify plasma instabilities for real-time control. Our Fusion Transfer Learning (FTL) model demonstrates success in reconstructing nonlinear kink mode structures by learning from a limited amount of nonlinear simulation data. The knowledge transfer process leverages a pre-trained neural encoder-decoder network, initially trained on linear simulations, to effectively capture nonlinear dynamics. The low-dimensional embeddings extract the coherent structures of interest, while preserving the inherent dynamics of the complex system. Experimental results highlight FTL's capacity to capture transitional behaviors and dynamical features in plasma dynamics -- a task often challenging for conventional methods. The model developed in this study is generalizable and can be extended broadly through transfer learning to address various magnetohydrodynamics (MHD) modes.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 18 pages, 10 figures

MSC Class: 76W05; 68T45 ACM Class: J.2; I.2.10

arXiv:2404.16994 [pdf, other]

PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

Authors: Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng

Abstract: Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally large computational and data resources, which hinders the progress of video-language models. This paper investigates a straight-forward, highly efficient, and resource-light approach to adapting an existing image-language pre-trained model for dense video understanding. Our preliminary experiments reveal that directly fine-tuning pre-trained image-language models with multiple frames as inputs on video datasets leads to performance saturation or even a drop. Our further investigation reveals that it is largely attributed to the bias of learned high-norm visual features. Motivated by this finding, we propose a simple but effective pooling strategy to smooth the feature distribution along the temporal dimension and thus reduce the dominant impacts from the extreme features. The new model is termed Pooling LLaVA, or PLLaVA in short. PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks. Notably, on the recent popular VideoChatGPT benchmark, PLLaVA achieves a score of 3.48 out of 5 on average of five evaluated dimensions, exceeding the previous SOTA results from GPT4V (IG-VLM) by 9%. On the latest multi-choice benchmark MVBench, PLLaVA achieves 58.1% accuracy on average across 20 sub-tasks, 14.5% higher than GPT4V (IG-VLM). Code is available at https://pllava.github.io/

Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.16811 [pdf, other]

Make Your LLM Fully Utilize the Context

Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou

Abstract: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. Through applying this information-intensive training on Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B for utilizing long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves the performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining a comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). Github Link: https://github.com/microsoft/FILM.

Submitted 26 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

Comments: 19 pages, 7 figures, 3 tables, 9 examples

arXiv:2404.16575 [pdf, other]

Probing the pole origin of $X(3872)$ with the coupled-channel dynamics

Authors: Jun-Zhang Wang, Zi-Yang Lin, Yan-Ke Chen, Lu Meng, Shi-Lin Zhu

Abstract: The $X(3872)$, as the first and the most crucial member in the exotic charmoniumlike $XYZ$ family, has been studied for a long time. However, its dynamical origin, whether stemming from a $D\bar{D}^*$ hadronic molecule or the first excited $P$-wave charmonium $χ_{c1}(2P)$, remains controversial. In this Letter, we demonstrate that the $X(3872)$ definitely does not result from the mass shift of the higher bare $χ_{c1}(2P)$ resonance pole in the coupled-channel dynamics involving a short-distance $c\bar{c}$ core and the long-distance $D\bar{D}^*$ channels. Instead, it originates from either the $D\bar{D}^*$ molecular pole or the shadow pole associated with the $P$-wave charmonium, which depends on the concrete coupling mode between the $c\bar{c}$ and $D\bar{D}^*$. In order to further exploit the nature of $X(3872)$, we carefully investigate potential mechanisms that contribute to its pole width, which suggests that the coupled-channel dynamics plays a critical role in causing a noticeable discrepancy between the pole widths of $X(3872)$ and $T_{cc}^+$. Interestingly, we bridge the quantitative connection among the dynamics origin of $X(3872)$, its pole width and the properties of the predicted new resonance. The precise measurement of the pole width of $X(3872)$ and the search for the new charmoniumlike resonance become highly significant and can be anticipated in future LHCb, BESIII and Belle II experiments.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 11 pages, 5 figures

arXiv:2404.16469 [pdf, ps, other]

From weak to strong-coupling superconductivity tuned by substrate in TiN films

Authors: Yixin Liu, Zulei Xu, Aobo Yu, Xiaoni Wang, Wei Peng, Yu Wu, Gang Mu, Zhi-Rong Lin

Abstract: The interplay between substrates and superconducting thin films has attracted increasing attention. Here, we report an in-depth investigation on superconducting properties of the epitaxial TiN thin films grown on two different substrates by dc reactive magnetron sputtering. The TiN films grown on (0001) sapphire exhibit (111) crystal orientation, while those grown on (100) Si substrates exhibit (100) orientation. Moreover, the samples grown on Si reveal a relatively lower level of disorder, accompanied by the higher critical transition temperature $T_c$ and smaller magnitude of upper critical field slope near $T_c$. Remarkably, we uncovered a rather high value of superconducting gap (with $Δ_0/k_BT_c$ = 3.05) in TiN film on Si, indicating a very strong coupling superconductivity, in sharp contrast to the case using sapphire as the substrate, which reveals a weak-coupling feature. Further analysis shows that the weakened electronic screening effect due to the high level of disorder and the suppressed electronic density of states may be the underlying reasons for the occurrence of weak coupling superconductivity in the TiN films based on sapphire substrate.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures

arXiv:2404.15750 [pdf, other]

A Reconfigurable Subarray Architecture and Hybrid Beamforming for Millimeter-Wave Dual-Function-Radar-Communication Systems

Authors: Xin Jin, Tiejun Lv, Wei Ni, Zhipeng Lin, Qiuming Zhu, Ekram Hossain, H. Vincent Poor

Abstract: Dual-function-radar-communication (DFRC) is a promising candidate technology for next-generation networks. By integrating hybrid analog-digital (HAD) beamforming into a multi-user millimeter-wave (mmWave) DFRC system, we design a new reconfigurable subarray (RS) architecture and jointly optimize the HAD beamforming to maximize the communication sum-rate and ensure a prescribed signal-to-clutter-plus-noise ratio for radar sensing. Considering the non-convexity of this problem arising from multiplicative coupling of the analog and digital beamforming, we convert the sum-rate maximization into an equivalent weighted mean-square error minimization and apply penalty dual decomposition to decouple the analog and digital beamforming. Specifically, a second-order cone program is first constructed to optimize the fully digital counterpart of the HAD beamforming. Then, the sparsity of the RS architecture is exploited to obtain a low-complexity solution for the HAD beamforming. The convergence and complexity analyses of our algorithm are carried out under the RS architecture. Simulations corroborate that, with the RS architecture, DFRC offers effective communication and sensing and improves energy efficiency by 83.4% and 114.2% with a moderate number of radio frequency chains and phase shifters, compared to the persistently- and fully-connected architectures, respectively.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 14 pages, 9 figures, Accepted by IEEE TWC

arXiv:2404.15701 [pdf, other]

USmorph: An Updated Framework of Automatic Classification of Galaxy Morphologies and Its Application to Galaxies in the COSMOS Field

Authors: Jie Song, GuanWen Fang, Shuo Ba, Zesen Lin, Yizhou Gu, Chichun Zhou, Tao Wang, Cai-Na Hao, Guilin Liu, Hongxin Zhang, Yao Yao, Xu Kong

Abstract: Morphological classification conveys abundant information on the formation, evolution, and environment of galaxies. In this work, we refine the two-step galaxy morphological classification framework (USmorph), which employs a combination of unsupervised machine learning (UML) and supervised machine learning (SML) techniques, along with a self-consistent and robust data preprocessing step. The updated method is applied to the galaxies with $I_{\rm mag}<25$ at $0.2<z<1.2$ in the COSMOS field. Based on their HST/ACS I-band images, we classify them into five distinct morphological types: spherical (SPH, 15,200), early-type disk (ETD, 17,369), late-type disk (LTD, 21,143), irregular disk (IRR, 28,965), and unclassified (UNC, 17,129). In addition, we have conducted both parametric and nonparametric morphological measurements. For galaxies with stellar masses exceeding $10^{9}M_{\odot}$, a gradual increase in effective radius from SPHs to IRRs is observed, accompanied by a decrease in the Sérsic index. Nonparametric morphologies reveal distinct distributions of galaxies across the $Gini-M_{20}$ and $C-A$ parameter spaces for different categories. Moreover, different categories exhibit significant dissimilarity in their $G_2$ and $Ψ$ distributions. We find morphology to be strongly correlated with redshift and stellar mass. The consistency of these classification results with expected correlations among multiple parameters underscores the validity and reliability of our classification method, rendering it a valuable tool for future studies.

Submitted 24 April, 2024; originally announced April 2024.

Comments: Accepted by ApJS, 16 pages, 12 figures

arXiv:2404.15141 [pdf, other]

CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method

Authors: Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji

Abstract: Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability. We propose tuning-free CutDiffusion, aimed at simplifying and accelerating the diffusion extrapolation process, making it more affordable and improving performance. CutDiffusion abides by the existing patch-wise extrapolation but cuts a standard patch diffusion process into an initial phase focused on comprehensive structure denoising and a subsequent phase dedicated to specific detail refinement. Comprehensive experiments highlight the numerous almighty advantages of CutDiffusion: (1) simple method construction that enables a concise higher-resolution diffusion process without third-party engagement; (2) fast inference speed achieved through a single-step higher-resolution diffusion process, and fewer inference patches required; (3) cheap GPU cost resulting from patch-wise inference and fewer patches during the comprehensive structure denoising; (4) strong generation performance, stemming from the emphasis on specific detail refinement.

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14978 [pdf, ps, other]

A Law of large numbers for vector-valued linear statistics of Bergman DPP

Authors: Zhaofeng Lin, Yanqi Qiu, Kai Wang

Abstract: We establish a law of large numbers for a certain class of vector-valued linear statistics for the Bergman determinantal point process on the unit disk. Our result seems to be the first LLN for vector-valued linear statistics in the setting of determinantal point processes. As an application, we prove that, for almost all configurations $X$ with respect to the Bergman determinantal point process, the weighted Poincaré series (we denote by $d_{h}(\cdot,\cdot)$ the hyperbolic distance on $\mathbb{D}$) \begin{align*} \sum_{k=0}^\infty\sum_{x\in X\atop k\le d_{h}(z,x)<k+1}e^{-sd_{\mathrm{h}}(z,x)}f(x) \end{align*} cannot be simultaneously convergent for all Bergman functions $f\in A^2(\mathbb{D})$ whenever $1<s<3/2$. This confirms a result announced without proof in Bufetov-Qiu's work.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 19 pages

arXiv:2404.14663 [pdf, other]

VLBI with SKA: Possible Arrays and Astrometric Science

Authors: Yingjie Li, Ye Xu, Jingjing Li, Shuaibo Bian, Zehao Lin, Chaojie Hao, Dejian Liu

Abstract: The next generation of very long baseline interferometry (VLBI) is stepping into the era of microarcsecond ($μ$as) astronomy, and pushing astronomy, especially astrometry, to new heights. VLBI with the Square Kilometre Array (SKA), SKA-VLBI, will increase current sensitivity by an order of magnitude, and reach astrometric precision routinely below 10 $μ$as, even challenging 1 $μ$as. This advancement allows precise parallax and proper motion measurements of various celestial objects. Such improvements can be used to study objects (including isolated objects, and binary or multiple systems) in different stellar stages (such as star formation, main-sequence stars, asymptotic giant branch stars, pulsars, black holes, white dwarfs, etc.), unveil the structure and evolution of complex systems (such as the Milky Way), benchmark the international celestial reference frame, and reveal cosmic expansion. Furthermore, the theory of general relativity can also be tested with SKA-VLBI using precise measurements of light deflection under the gravitational fields of different solar system objects and the perihelion precession of solar system objects.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 41 pages, 12 figures, 4 tables. Accepted to RAA (Review)

arXiv:2404.14219 [pdf, other]

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts.

Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 19 pages

arXiv:2404.14204 [pdf, other]

TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading

Authors: Guanqiao Qu, Zheng Lin, Qian Chen, Jian Li, Fangming Liu, Xianhao Chen, Kaibin Huang

Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observation that a wide range of AI models, such as convolutional neural networks or large language models, can share a significant proportion of parameter blocks containing reusable knowledge, thereby improving storage efficiency. To this end, we formulate a parameter-sharing model placement problem to maximize the cache hit ratio in multi-edge wireless networks by balancing the fundamental tradeoff between storage efficiency and service latency. We show that the formulated problem is a submodular maximization problem with submodular constraints, for which no polynomial-time approximation algorithm exists. To overcome this challenge, we study an important special case, where a small fixed number of parameter blocks are shared across models, which often holds in practice. In such a case, a polynomial-time algorithm with $\left(1-ε\right)/2$-approximation guarantee is developed. Subsequently, we address the original problem for the general case by developing a greedy algorithm. Simulation results demonstrate that the proposed TrimCaching framework significantly improves the cache hit ratio compared with state-of-the-art content caching without exploiting shared parameters in AI models.

Submitted 12 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 15 pages, 11 figures. Part of this work has been accepted by ICDCS 2024

arXiv:2404.13931 [pdf, other]

Polynomial effective density in quotient of $\mathrm{SL}_2(\mathbb{Q}_p) \times \mathrm{SL}_2(\mathbb{Q}_p)$

Authors: Zuo Lin

Abstract: We prove an effective density theorem with polynomial error rate for orbits of the upper triangular subgroup of $\mathrm{SL}_2(\mathbb{Q}_p)$ in $\mathrm{SL}_2(\mathbb{Q}_p) \times \mathrm{SL}_2(\mathbb{Q}_p)$ for primes $p > 3$. The proof is based on the use of a Margulis function, a restricted projection theorem on $\mathbb{Q}_p^3$, and a spectral gap of the ambient space.

Submitted 22 April, 2024; originally announced April 2024.

Comments: 39 pages

MSC Class: 37A17; 37A25

arXiv:2404.12767 [pdf, other]

On the Path to High-temperature Josephson Multi-junction Devices

Authors: Xu Wang, Fucong Chen, Zefeng Lin, Changhong Yuan, Shibing Tian, Chunguang Li, Victor Kornev, Nikolay Kolotinskiy

Abstract: We report our progress in the high-temperature superconductor (HTS) Josephson junction fabrication process based on a focused helium ion beam damaging technique and discuss the expected device performance attainable with the HTS multi-junction device technology. Both the achievable high value of the characteristic voltage $V_c=I_cR_N$ of Josephson junctions and the ability to design a large number of arbitrarily located Josephson junctions allow narrowing the existing gap in design abilities for LTS and HTS circuits, even when using a single YBCO film layer. A one-layer topology of an active electrically small antenna is suggested and its voltage response characteristics are considered.

Submitted 19 April, 2024; originally announced April 2024.

Comments: 6 pages, 5 figures; submitted to EM Science

arXiv:2404.12674 [pdf, other]

Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

Authors: Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens

Abstract: Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and planning but also a complex goal to achieve. The primary challenges include the complexity of synchronization and load balancing between CPUs and GPUs, the variance in input data distribution, and the use of different communication devices and topologies (e.g., NVLink, PCIe, network cards) that connect multiple compute devices, coupled with the desire for flexible training configurations. Built on top of our prior work for single-GPU platforms, we address these challenges and enable multi-GPU performance modeling by incorporating (1) data-distribution-aware performance models for embedding table lookup, and (2) data movement prediction of communication collectives, into our upgraded performance modeling pipeline equipped with inter- and intra-rank synchronization for ML workloads trained on multi-GPU platforms. Beyond accurately predicting the per-iteration training time of DLRM models with random configurations with a geomean error of 5.21% on two multi-GPU platforms, our prediction pipeline generalizes well to other types of ML workloads, such as Transformer-based NLP models with a geomean error of 3.00%. Moreover, even without actually running ML workloads like DLRMs on the hardware, it is capable of generating insights such as quickly selecting the fastest embedding table sharding configuration (with a success rate of 85%).

Submitted 27 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

Comments: 12 pages, 11 figures, 4 tables

arXiv:2404.11677 [pdf, other]

Cross-Problem Learning for Solving Vehicle Routing Problems

Authors: Zhuoyi Lin, Yaoxin Wu, Bangjian Zhou, Zhiguang Cao, Wen Song, Yingqian Zhang, Senthilnath Jayavelu

Abstract: Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes cross-problem learning to assist heuristics training for different downstream VRP variants. Particularly, we modularize neural architectures for complex VRPs into 1) the backbone Transformer for tackling the travelling salesman problem (TSP), and 2) the additional lightweight modules for processing problem-specific features in complex VRPs. Accordingly, we propose to pre-train the backbone Transformer for TSP, and then apply it in the process of fine-tuning the Transformer models for each target VRP variant. On the one hand, we fully fine-tune the trained backbone Transformer and problem-specific modules simultaneously. On the other hand, we only fine-tune small adapter networks along with the modules, keeping the backbone Transformer frozen. Extensive experiments on typical VRPs substantiate that 1) the full fine-tuning achieves significantly better performance than the one trained from scratch, and 2) the adapter-based fine-tuning also delivers comparable performance while being notably parameter-efficient. Furthermore, we empirically demonstrate the favorable effect of our method in terms of cross-distribution application and versatility.

Submitted 18 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI'24

arXiv:2404.11199 [pdf, other]

RiboDiffusion: Tertiary Structure-based RNA Inverse Folding with Generative Diffusion Models

Authors: Han Huang, Ziqian Lin, Dongchen He, Liang Hong, Yu Li

Abstract: RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA sequences directly from 3D structures is still challenging, due to the scarcity of data, the non-unique structure-sequence mapping, and the flexibility of RNA conformation. In this study, we propose RiboDiffusion, a generative diffusion model for RNA inverse folding that can learn the conditional distribution of RNA sequences given 3D backbone structures. Our model consists of a graph neural network-based structure module and a Transformer-based sequence module, which iteratively transforms random sequences into desired sequences. By tuning the sampling weight, our model allows for a trade-off between sequence recovery and diversity to explore more candidates. We split test sets based on RNA clustering with different cut-offs for sequence or structure similarity. Our model outperforms baselines in sequence recovery, with an average relative improvement of $11\%$ for sequence similarity splits and $16\%$ for structure similarity splits. Moreover, RiboDiffusion performs consistently well across various RNA length categories and RNA types. We also apply in-silico folding to validate whether the generated sequences can fold into the given 3D RNA backbones. Our method could be a powerful tool for RNA design that explores the vast sequence space and finds novel solutions to 3D structural constraints.

Submitted 17 April, 2024; originally announced April 2024.

Comments: 15 pages

arXiv:2404.10718 [pdf, other]

GazeHTA: End-to-end Gaze Target Detection with Head-Target Association

Authors: Zhi-Yi Lin, Jouh Yeong Chew, Jan van Gemert, Xucong Zhang

Abstract: We propose an end-to-end approach for gaze target detection: predicting a head-target connection between individuals and the target image regions they are looking at. Most of the existing methods use independent components such as off-the-shelf head detectors or have problems in establishing associations between heads and gaze targets. In contrast, we investigate an end-to-end multi-person Gaze target detection framework with Heads and Targets Association (GazeHTA), which predicts multiple head-target instances based solely on the input scene image. GazeHTA addresses challenges in gaze target detection by (1) leveraging a pre-trained diffusion model to extract scene features for rich semantic understanding, (2) re-injecting a head feature to enhance the head priors for improved head understanding, and (3) learning a connection map as the explicit visual associations between heads and gaze targets. Our extensive experimental results demonstrate that GazeHTA outperforms state-of-the-art gaze target detection methods and two adapted diffusion-based baselines on two standard datasets.

Submitted 18 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10444 [pdf, other]

Semi-supervised Fréchet Regression

Authors: Rui Qiu, Zhou Yu, Zhenhua Lin

Abstract: This paper explores the field of semi-supervised Fréchet regression, driven by the significant costs associated with obtaining non-Euclidean labels. Methodologically, we propose two novel methods: semi-supervised NW Fréchet regression and semi-supervised kNN Fréchet regression, both based on graph distance acquired from all feature instances. These methods extend the scope of existing semi-supervised Euclidean regression methods. We establish their convergence rates with limited labeled data and large amounts of unlabeled data, taking into account the low-dimensional manifold structure of the feature space. Through comprehensive simulations across diverse settings and applications to real data, we demonstrate the superior performance of our methods over their supervised counterparts. This study addresses existing research gaps and paves the way for further exploration and advancements in the field of semi-supervised Fréchet regression.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.10217 [pdf, other]

Protoplanetary Disk Polarization at Multiple Wavelengths: Are Dust Populations Diverse?

Authors: Rachel E. Harrison, Zhe-Yu Daniel Lin, Leslie W. Looney, Zhi-Yun Li, Haifeng Yang, Ian Stephens, Manuel Fernández-López

Abstract: Millimeter and sub-millimeter observations of continuum linear dust polarization provide insight into dust grain growth in protoplanetary disks, which are the progenitors of planetary systems. We present the results of the first survey of dust polarization in protoplanetary disks at 870 $μ$m and 3 mm. We find that protoplanetary disks in the same molecular cloud at similar evolutionary stages can exhibit different correlations between observing wavelength and polarization morphology and fraction. We explore possible origins for these differences in polarization, including differences in dust populations and protostar properties. For RY Tau and MWC 480, which are consistent with scattering at both wavelengths, we present models of the scattering polarization from several dust grain size distributions. These models aim to reproduce two features of the observational results for these disks: (1) both disks have an observable degree of polarization at both wavelengths and (2) the polarization fraction is higher at 3 mm than at 870 $μ$m in the centers of the disks. For both disks, these features can be reproduced by a power-law distribution of spherical dust grains with a maximum radius of 200 $μ$m and high optical depth. In MWC 480, we can also reproduce features (1) and (2) with a model containing large grains ($a_{max}$ = 490 $μ$m) near the disk midplane and small grains ($a_{max}$ = 140 $μ$m) above and below the midplane.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 22 pages, 12 figures

arXiv:2404.09833 [pdf, other]

Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

Authors: Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang

Abstract: Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components: (i) a neural radiance fields (NeRF) module that effectively captures the geometry and visual appearance of the scene; (ii) a mesh module that distills the knowledge from NeRF for faster rendering; and (iii) a physics module that models the interactions and physical dynamics among the objects. By following the carefully designed pipeline, one can construct an interactable and actionable digital replica of the real world. We benchmark our system on both indoor and large-scale outdoor scenes. We show that we can not only produce highly-realistic renderings in real-time, but also build interactive games on top.

Submitted 15 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project page (with code): https://video2game.github.io/

arXiv:2404.09780 [pdf, other]

Nuclear cluster structure effect in $^{16}$O+$^{16}$O collisions at the top RHIC energy

Authors: Xin-Li Zhao, Guo-Liang Ma, You Zhou, Zi-Wei Lin, Chao Zhang

Abstract: The impact of nuclear structure has garnered considerable attention in the high-energy nuclear physics community in recent years. This work focuses on studying the potential nuclear cluster structure in $^{16}\text{O}$ nuclei using anisotropic flow observables in $\rm O+O$ collisions at 200 GeV. Employing an improved AMPT model with various cluster structure configurations, we find that an extended effective parton formation time is necessary to align with the recent STAR experimental data. In addition, we reveal that the presented flow observables serve as sensitive probes for differentiating configurations of $α$-clustering of $^{16}\text{O}$ nuclei. The systematic AMPT calculations presented in this paper, along with comprehensive comparisons to forthcoming experimental measurements at RHIC and the LHC, pave the way for a novel approach to investigate the $α$-clustering structure of $^{16}\text{O}$ nuclei using $\rm O+O$ collisions at ultra-relativistic energies.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 10 pages, 11 figures

arXiv:2404.09777 [pdf, ps, other]

A $q$-analog of the Stirling-Eulerian Polynomials

Authors: Yao Dong, Zhicong Lin, Qiongqiong Pan

Abstract: In 1974, Carlitz and Scoville introduced the Stirling-Eulerian polynomial $A_n(x,y|α,β)$ as the enumerator of permutations by descents, ascents, left-to-right maxima and right-to-left maxima. Recently, Ji considered a refinement of $A_n(x,y|α,β)$, denoted $P_n(u_1,u_2,u_3,u_4|α,β)$, which is the enumerator of permutations by valleys, peaks, double ascents, double descents, left-to-right maxima and right-to-left maxima. Using Chen's context-free grammar calculus, Ji proved a formula for the generating function of $P_n(u_1,u_2,u_3,u_4|α,β)$, generalizing the work of Carlitz and Scoville. Ji's formula has many nice consequences, one of which is an intriguing $γ$-positivity expansion for $A_n(x,y|α,β)$. In this paper, we prove a $q$-analog of Ji's formula by using Gessel's $q$-compositional formula and provide a combinatorial approach to her $γ$-positivity expansion of $A_n(x,y|α,β)$.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.09730 [pdf, other]

Convergence Analysis of Probability Flow ODE for Score-based Generative Models

Authors: Daniel Zhengyu Huang, Jiaoyang Huang, Zhengjiang Lin

Abstract: Score-based generative models have emerged as a powerful approach for sampling high-dimensional probability distributions. Despite their effectiveness, their theoretical underpinnings remain relatively underdeveloped. In this work, we study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives. Assuming access to $L^2$-accurate estimates of the score function, we prove that the total variation between the target and the generated data distributions can be bounded above by $\mathcal{O}(d\sqrt{δ})$ at the continuous-time level, where $d$ denotes the data dimension and $δ$ represents the $L^2$-score matching error. For practical implementations using a $p$-th order Runge-Kutta integrator with step size $h$, we establish error bounds of $\mathcal{O}(d(\sqrt{δ} + (dh)^p))$ at the discrete level. Finally, we present numerical studies on problems up to $128$ dimensions to verify our theory, which indicate a better score matching error and dimension dependence.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 33 pages, 7 figures
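
Note: the deterministic sampler described in the abstract above can be illustrated on a toy problem. The sketch below is not the paper's experimental setup. It assumes a one-dimensional Gaussian target N(0, sigma0_sq), a forward process dx = dW (so p_t = N(0, sigma0_sq + t)), an exactly known score (that is, δ = 0), and Heun's method as a $p = 2$ Runge-Kutta integrator. All function names are hypothetical.

```python
import math

def score(x, t, sigma0_sq=1.0):
    # Exact score of p_t = N(0, sigma0_sq + t); an assumption of this toy setup.
    return -x / (sigma0_sq + t)

def drift(x, t):
    # Probability flow ODE for the forward SDE dx = dW (diffusion coefficient 1):
    # dx/dt = -0.5 * score(x, t)
    return -0.5 * score(x, t)

def heun_backward(x_T, T, n_steps):
    # Integrate the ODE from t = T down to t = 0 with Heun's method (2nd-order RK).
    h = -T / n_steps  # negative step: we move backward in time
    x, t = x_T, T
    for _ in range(n_steps):
        k1 = drift(x, t)
        k2 = drift(x + h * k1, t + h)
        x += 0.5 * h * (k1 + k2)
        t += h
    return x

# For this toy problem the exact flow map is x(0) = x(T) * sqrt(sigma0_sq / (sigma0_sq + T)).
x0 = heun_backward(2.0, 3.0, 1000)
exact = 2.0 * math.sqrt(1.0 / (1.0 + 3.0))
print(abs(x0 - exact))
```

With the score known exactly, the only remaining error is the discretization error of the integrator, which corresponds to the $(dh)^p$ term in the bound quoted above.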

arXiv:2404.08958 [pdf, other]

AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

Authors: Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, Qinghua Hu

Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve the few-shot ability of CLIP, key factors in the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introduce a unified formulation to analyze CLIP-based few-shot learning methods from the perspective of logit bias, which encourages us to learn an effective logit bias for further improving the performance of CLIP-based few-shot learning methods. To this end, we disassemble three key components involved in the computation of logit bias (i.e., logit features, logit predictor, and logit fusion) and empirically analyze their effect on the performance of few-shot classification. Based on the analysis of key components, this paper proposes a novel AMU-Tuning method to learn effective logit bias for CLIP-based few-shot classification. Specifically, our AMU-Tuning predicts logit bias by exploiting the appropriate $\underline{\textbf{A}}$uxiliary features, which are fed into an efficient feature-initialized linear classifier with $\underline{\textbf{M}}$ulti-branch training. Finally, an $\underline{\textbf{U}}$ncertainty-based fusion is developed to incorporate logit bias into CLIP for few-shot classification. The experiments are conducted on several widely used benchmarks, and the results show AMU-Tuning clearly outperforms its counterparts while achieving state-of-the-art performance of CLIP-based few-shot learning without bells and whistles.

Submitted 13 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024

arXiv:2404.08237 [pdf, other]

IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

Authors: Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Massimo Tistarelli, Zhe Jin

Abstract: Determining dense feature points on fingerprints used in constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision Transformer (IFViT), which consists of two primary modules. The first module, an interpretable dense registration module, establishes a Vision Transformer (ViT)-based Siamese Network to capture long-range dependencies and the global context in fingerprint pairs. It provides interpretable dense pixel-wise correspondences of feature points for fingerprint alignment and enhances the interpretability in the subsequent matching stage. The second module takes into account both local and global representations of the aligned fingerprint pair to achieve an interpretable fixed-length representation extraction and matching. It employs the ViTs trained in the first module with the additional fully connected layer and retrains them to simultaneously produce the discriminative fixed-length representation and interpretable dense pixel-wise correspondences of feature points. Extensive experimental results on diverse publicly available fingerprint databases demonstrate that the proposed framework not only exhibits superior performance on dense registration and matching but also significantly promotes the interpretability in deep fixed-length representations-based fingerprint matching.

Submitted 12 April, 2024; originally announced April 2024.

Comments: ready to submit to IEEE Transactions on Information Forensics and Security (TIFS)

arXiv:2404.07965 [pdf, other]

Rho-1: Not All Tokens Are What You Need

Authors: Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

Abstract: Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights, we introduce a new language model called Rho-1. Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that align with the desired distribution. This approach involves scoring pretraining tokens using a reference model, and then training the language model with a focused loss on tokens with higher scores. When continually pretrained on the 15B OpenWebMath corpus, Rho-1 yields an absolute improvement in few-shot accuracy of up to 30% in 9 math tasks. After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens. Furthermore, when pretraining on 80B general tokens, Rho-1 achieves 6.8% average enhancement across 15 diverse tasks, increasing both the efficiency and performance of language model pre-training.

Submitted 23 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: First two authors equal contribution
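
Note: the score-then-select step described in the abstract above can be sketched roughly as follows. The abstract only states that tokens are scored with a reference model and that the loss is focused on higher-scoring tokens. The specific score used here (training-model loss minus reference-model loss), the top-fraction cutoff, and all function names are assumptions made for illustration, and the per-token losses are made-up numbers rather than results.

```python
def select_tokens(train_losses, ref_losses, keep_ratio):
    # Hypothetical score: how much worse the training model is than the reference
    # on each token (an assumption; the abstract does not define the score).
    scores = [t - r for t, r in zip(train_losses, ref_losses)]
    k = max(1, int(len(scores) * keep_ratio))
    # Indices of the k highest-scoring tokens.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    mask = [0] * len(scores)
    for i in top:
        mask[i] = 1
    return mask

def selective_loss(train_losses, mask):
    # Average the training loss over the selected tokens only.
    kept = sum(mask)
    return sum(l * m for l, m in zip(train_losses, mask)) / kept

# Illustrative per-token losses (not data from the paper).
train_losses = [2.0, 0.5, 3.0, 1.0]
ref_losses = [1.0, 0.4, 0.5, 1.0]
mask = select_tokens(train_losses, ref_losses, keep_ratio=0.5)
print(mask, selective_loss(train_losses, mask))
```

In a real training loop the mask would multiply the per-token cross-entropy before backpropagation, so unselected tokens contribute no gradient.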

arXiv:2404.07805 [pdf, other]

Tensor Neural Network Interpolation and Its Applications

Authors: Yongxin Li, Zhongshuo Lin, Yifan Wang, Hehu Xie

Abstract: Based on the tensor neural network, we propose an interpolation method for high dimensional non-tensor-product-type functions. This interpolation scheme is designed by using the tensor neural network based machine learning method. This means that we use a tensor neural network to approximate high dimensional functions that have no tensor product structure. In some sense, the non-tensor-product-type high dimensional function is transformed to the tensor neural network, which has a tensor product structure. It is well known that the tensor product structure makes it possible to design highly accurate and efficient numerical methods for dealing with high dimensional functions. In this paper, we will concentrate on computing high dimensional integrations and solving high dimensional partial differential equations. The corresponding numerical methods and numerical examples will be provided to validate the proposed tensor neural network interpolation.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 14 pages, 2 figures. arXiv admin note: text overlap with arXiv:2402.00040, arXiv:2311.02732

MSC Class: 65N30; 65N25; 65L15; 65B99

arXiv:2404.06448 [pdf, other]

Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models

Authors: Zihan Fang, Zheng Lin, Zhe Chen, Xianhao Chen, Yue Gao, Yuguang Fang

Abstract: Recently, there has been a surge in the development of advanced intelligent generative content (AIGC), especially large language models (LLMs). However, for many downstream tasks, it is necessary to fine-tune LLMs using private data. While federated learning offers a promising privacy-preserving solution to LLM fine-tuning, the substantial size of an LLM, combined with high computational and communication demands, makes it hard to apply to downstream tasks. More importantly, private edge servers often possess varying computing and network resources in real-world scenarios, introducing additional complexities to LLM fine-tuning. To tackle these problems, we design and implement an automated federated pipeline, named FedPipe, to fine-tune LLMs with minimal training cost but without adding any inference latency. FedPipe first identifies the weights to be fine-tuned based on their contributions to the LLM training. It then configures a low-rank adapter for each selected weight to train local low-rank adapters on an edge server, and aggregates the local adapters of all edge servers to fine-tune the whole LLM. Finally, it appropriately quantizes the parameters of the LLM to reduce memory space according to the requirements of edge servers. Extensive experiments demonstrate that FedPipe expedites the model training and achieves higher accuracy than state-of-the-art benchmarks.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 15 pages, 16 figures

Showing 101–150 of 2,239 results for author: Lin, Z