Showing 101–150 of 2,239 results for author: Lin, Z

  1. arXiv:2405.06174  [pdf, other]

    cond-mat.mes-hall

    Observation of a $p$-orbital higher-order topological insulator phase in puckered lattice acoustic metamaterials

    Authors: Bing-Quan Wu, Zhi-Kang Lin, Li-Wei Wang, Jian-Hua Jiang

    Abstract: The puckered lattice geometry, along with $p$-orbitals, is often overlooked in the study of topological physics. Here, we investigate the higher-order topology of the $p_{x,y}$-orbital bands in acoustic metamaterials using a simplified two-dimensional phosphorene lattice which possesses a puckered structure. Notably, unlike the $s$-orbital bands in planar lattices, the unique higher-order topology…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted by Phys. Rev. B

  2. arXiv:2405.06170  [pdf]

    cond-mat.mtrl-sci physics.optics

    Non-Hermitian topological phases and skin effects in kagome lattices

    Authors: Li-Wei Wang, Zhi-Kang Lin, Jian-Hua Jiang

    Abstract: Non-Hermitian physics has added new ingredients to topological physics, leading to the rising frontier of non-Hermitian topological phases. In this study, we investigate Chern insulator phases emerging from non-Hermitian kagome models with non-reciprocal and pure imaginary next-nearest neighbor hoppings. In the presence or absence of $C_3$ rotation symmetry, hybrid topological-skin effects are exp…

    Submitted 9 May, 2024; originally announced May 2024.

    Journal ref: Phys. Rev. B 108, 195126 (2023)

  3. arXiv:2405.05975  [pdf, other]

    physics.comp-ph cond-mat.mtrl-sci physics.app-ph physics.optics quant-ph

    Deep-learning design of graphene metasurfaces for quantum control and Dirac electron holography

    Authors: Chen-Di Han, Li-Li Ye, Zin Lin, Vassilios Kovanis, Ying-Cheng Lai

    Abstract: Metasurfaces are sub-wavelength patterned layers for controlling waves in physical systems. In optics, metasurfaces are created by materials with different dielectric constants and are capable of unconventional functionalities. We develop a deep-learning framework for Dirac-material metasurface design for controlling electronic waves. The metasurface is a configuration of circular graphene quantu…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 13 pages, 9 figures

  4. arXiv:2405.05803  [pdf, other]

    cs.CV cs.AI

    Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference

    Authors: Zhihang Lin, Mingbao Lin, Luxi Lin, Rongrong Ji

    Abstract: Multimodal large language models (MLLMs) demand considerable computations for inference due to the extensive parameters and the additional input tokens needed for visual information representation. Herein, we introduce Visual Tokens Withdrawal (VTW), a plug-and-play module to boost MLLMs for rapid inference. Our approach is inspired by two intriguing phenomena we have observed: (1) the attention s…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2405.05252  [pdf, other]

    cs.CV cs.AI cs.LG eess.IV eess.SP

    Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models

    Authors: Hongjie Wang, Difan Liu, Yan Kang, Yijun Li, Zhe Lin, Niraj K. Jha, Yuchen Liu

    Abstract: Diffusion Models (DMs) have exhibited superior performance in generating high-quality and diverse images. However, this exceptional performance comes at the cost of expensive architectural design, particularly due to the attention module heavily used in leading models. Existing works mainly adopt a retraining process to enhance DM efficiency. This is computationally expensive and not very scalable…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024

  6. arXiv:2405.04342  [pdf, other]

    cs.LG

    The Curse of Diversity in Ensemble-Based Exploration

    Authors: Zhixuan Lin, Pierluca D'Oro, Evgenii Nikishin, Aaron Courville

    Abstract: We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents -- a well-established exploration strategy -- can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated d…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICLR 2024

  7. arXiv:2405.04332  [pdf, other]

    cs.CR

    WALLETRADAR: Towards Automating the Detection of Vulnerabilities in Browser-based Cryptocurrency Wallets

    Authors: Pengcheng Xia, Yanhui Guo, Zhaowen Lin, Jun Wu, Pengbo Duan, Ningyu He, Kailong Wang, Tianming Liu, Yinliang Yue, Guoai Xu, Haoyu Wang

    Abstract: Cryptocurrency wallets, acting as fundamental infrastructure to the blockchain ecosystem, have seen significant user growth, particularly among browser-based wallets (i.e., browser extensions). However, this expansion is accompanied by security challenges, making these wallets prime targets for malicious activities. Despite a substantial user base, there is not only a significant gap in comprehensive se…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Just accepted by the Automated Software Engineering Journal

  8. arXiv:2405.04269   

    stat.AP

    An Analysis of Sea Level Spatial Variability by Topological Indicators and $k$-means Clustering Algorithm

    Authors: Zixin Lin, Nur Fariha Syaqina Zulkepli, Mohd Shareduwan Mohd Kasihmuddin, R. U. Gobithaasan

    Abstract: The time-series data of sea level rise and fall contains crucial information on the variability of sea level patterns. Traditional $k$-means clustering is commonly used for categorizing regional variability of sea level; however, its results are not robust against a number of factors. This study analyzed fourteen datasets of monthly sea level in fourteen shoreline regions of Peninsular Malaysia. W…

    Submitted 13 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: There are some mistakes in the submission, and it needs major revision

  9. arXiv:2405.04086  [pdf, other]

    cs.CL

    Optimizing Language Model's Reasoning Abilities with Weak Supervision

    Authors: Yongqi Tong, Sizhe Wang, Dawei Li, Yifan Wang, Simeng Han, Zi Lin, Chengsong Huang, Jiaxin Huang, Jingbo Shang

    Abstract: While Large Language Models (LLMs) have demonstrated proficiency in handling complex queries, much of the past work has depended on extensively annotated datasets by human experts. However, this reliance on fully-supervised annotations poses scalability challenges, particularly as models and data requirements grow. To mitigate this, we explore the potential of enhancing LLMs' reasoning abilities w…

    Submitted 7 May, 2024; originally announced May 2024.

  10. arXiv:2405.03990  [pdf, other]

    cs.NI cs.AI

    TrimCaching: Parameter-sharing AI Model Caching in Wireless Edge Networks

    Authors: Guanqiao Qu, Zheng Lin, Fangming Liu, Xianhao Chen, Kaibin Huang

    Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observat…

    Submitted 19 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 11 pages, 7 figures. This paper has been accepted by ICDCS 2024. The extended version of this paper is at arXiv:2404.14204

  11. arXiv:2405.03613  [pdf, other]

    cs.CV

    Dual Relation Mining Network for Zero-Shot Learning

    Authors: Jinwei Han, Yingguo Gao, Zhiwen Lin, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

    Abstract: Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods, which align visual features and attributes via a spatial attention mechanism, have exhibited significant progress. However, these methods only explore the visual-semantic relationship in the spatial dimension, w…

    Submitted 6 May, 2024; originally announced May 2024.

  12. arXiv:2405.02057  [pdf]

    cond-mat.mes-hall cond-mat.mtrl-sci physics.optics

    Probing fragile topology with a screw dislocation

    Authors: Ying Wu, Zhi-Kang Lin, Yating Yang, Zhida Song, Feng Li, Jian-Hua Jiang

    Abstract: Fragile topology, akin to twisted bilayer graphene and the exotic phases therein, is a notable topological class with intriguing properties. However, due to its unique nature and the lack of bulk-edge correspondence, the experimental signature of fragile topology has been under debate since its birth. Here, we demonstrate experimentally that fragile topological phases with filling anomaly can be…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Submitted to Science Bulletin

  13. arXiv:2405.01851  [pdf, other]

    cs.LG cs.AI

    Deep Learning Inference on Heterogeneous Mobile Processors: Potentials and Pitfalls

    Authors: Sicong Liu, Wentao Zhou, Zimu Zhou, Bin Guo, Minfan Wang, Cheng Fang, Zheng Lin, Zhiwen Yu

    Abstract: There is a growing demand to deploy computation-intensive deep learning (DL) models on resource-constrained mobile devices for real-time intelligent applications. Equipped with a variety of processing units such as CPUs, GPUs, and NPUs, the mobile devices hold potential to accelerate DL inference via parallel execution across heterogeneous processors. Various efficient parallel methods have been e…

    Submitted 3 May, 2024; originally announced May 2024.

  14. arXiv:2405.00954  [pdf, other]

    cs.CV

    X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation

    Authors: Yiwei Ma, Zhekai Lin, Jiayi Ji, Yijun Fan, Xiaoshuai Sun, Rongrong Ji

    Abstract: Recent advancements in automatic 3D avatar generation guided by text have made significant progress. However, existing methods have limitations such as oversaturation and low-quality output. To address these challenges, we propose X-Oscar, a progressive framework for generating high-quality animatable avatars from text prompts. It follows a sequential Geometry->Texture->Animation paradigm, simplif…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: ICML2024

  15. arXiv:2405.00700  [pdf]

    cs.NE cond-mat.str-el

    Oxygen vacancies modulated VO2 for neurons and Spiking Neural Network construction

    Authors: Liang Li, Ting Zhou, Tong Liu, Zhiwei Liu, Yaping Li, Shuo Wu, Shanguang Zhao, Jinglin Zhu, Meiling Liu, Zhihan Lin, Bowen Sun, Jianjun Li, Fangwen Sun, Chongwen Zou

    Abstract: Artificial neuronal devices are the basic building blocks for neuromorphic computing systems, which have been motivated by realistic brain emulation. Aiming for these applications, various device concepts have been proposed to mimic the neuronal dynamics and functions. However, artificial neuron devices with high efficiency, high stability and low power consumption are still far from pr…

    Submitted 16 April, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  16. arXiv:2404.19209  [pdf, other]

    cs.DC

    AdaOper: Energy-efficient and Responsive Concurrent DNN Inference on Mobile Devices

    Authors: Zheng Lin, Bin Guo, Sicong Liu, Wentao Zhou, Yasan Ding, Yu Zhang, Zhiwen Yu

    Abstract: Deep neural networks (DNNs) have driven extensive applications in mobile technology. However, for long-running mobile apps like voice assistants or video applications on smartphones, energy efficiency is critical for battery-powered devices. The rise of heterogeneous processors in mobile devices today has introduced new challenges for optimizing energy efficiency. Our key insight is that partitioning…

    Submitted 29 April, 2024; originally announced April 2024.

  17. arXiv:2404.18829  [pdf, other]

    nucl-th hep-ph nucl-ex

    Disentangling the development of collective flow in high energy proton proton collisions with a multiphase transport model

    Authors: Liang Zheng, Lian Liu, Zi-Wei Lin, Qi-Ye Shou, Zhong-Bao Yin

    Abstract: In this work, we investigate the collective flow development in high energy proton proton (pp) collisions with a multiphase transport model (AMPT) based on PYTHIA8 initial conditions with a sub-nucleon structure. It is found that the PYTHIA8 based AMPT model can reasonably describe both the charged hadron productions and elliptic flow experimental data measured in pp collisions at $\sqrt{s}=13$ Te…

    Submitted 29 April, 2024; originally announced April 2024.

  18. arXiv:2404.18533  [pdf, other]

    cs.AI cs.HC

    Evaluating Concept-based Explanations of Language Models: A Study on Faithfulness and Readability

    Authors: Meng Li, Haoran Jin, Ruixuan Huang, Zhihao Xu, Defu Lian, Zijia Lin, Di Zhang, Xiting Wang

    Abstract: Despite the surprisingly high intelligence exhibited by Large Language Models (LLMs), we are somehow intimidated to fully deploy them into real-life applications considering their black-box nature. Concept-based explanations arise as a promising avenue for explaining what the LLMs have learned, making them more transparent to humans. However, current evaluations for concepts tend to be heuristic a…

    Submitted 29 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

  19. arXiv:2404.18173  [pdf, other]

    math.ST

    Eigenvector overlaps in large sample covariance matrices and nonlinear shrinkage estimators

    Authors: Zeqin Lin, Guangming Pan

    Abstract: Consider a data matrix $Y = [\mathbf{y}_1, \cdots, \mathbf{y}_N]$ of size $M \times N$, where the columns are independent observations from a random vector $\mathbf{y}$ with zero mean and population covariance $Σ$. Let $\mathbf{u}_i$ and $\mathbf{v}_j$ denote the left and right singular vectors of $Y$, respectively. This study investigates the eigenvector/singular vector overlaps…

    Submitted 28 April, 2024; originally announced April 2024.

  20. arXiv:2404.17808  [pdf, other]

    cs.CL

    Scaffold-BPE: Enhancing Byte Pair Encoding with Simple and Effective Scaffold Token Removal

    Authors: Haoran Lian, Yizhe Xiong, Jianwei Niu, Shasha Mo, Zhenpeng Su, Zijia Lin, Peng Liu, Hui Chen, Guiguang Ding

    Abstract: Byte Pair Encoding (BPE) serves as a foundation method for text tokenization in the Natural Language Processing (NLP) field. Despite its wide adoption, the original BPE algorithm harbors an inherent flaw: it inadvertently introduces a frequency imbalance for tokens in the text corpus. Since BPE iteratively merges the most frequent token pair in the text corpus while keeping all tokens that have be…

    Submitted 27 April, 2024; originally announced April 2024.

  21. arXiv:2404.17785  [pdf, other]

    cs.CL

    Temporal Scaling Law for Large Language Models

    Authors: Yizhe Xiong, Xiansheng Chen, Xin Ye, Hui Chen, Zijia Lin, Haoran Lian, Zhenpeng Su, Jianwei Niu, Guiguang Ding

    Abstract: Recently, Large Language Models (LLMs) have been widely adopted in a wide range of tasks, leading to increasing attention towards the research on how scaling LLMs affects their performance. Existing works, termed Scaling Laws, have discovered that the final test loss of LLMs scales as power-laws with model size, computational budget, and dataset size. However, the temporal change of the test loss…

    Submitted 16 June, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

    Comments: 8 pages, 3 figures; Under review

  22. arXiv:2404.17466  [pdf, other]

    physics.comp-ph cs.LG physics.plasm-ph

    FTL: Transfer Learning Nonlinear Plasma Dynamic Transitions in Low Dimensional Embeddings via Deep Neural Networks

    Authors: Zhe Bai, Xishuo Wei, William Tang, Leonid Oliker, Zhihong Lin, Samuel Williams

    Abstract: Deep learning algorithms provide a new paradigm to study high-dimensional dynamical behaviors, such as those in fusion plasma systems. Development of novel model reduction methods, coupled with detection of abnormal modes with plasma physics, opens a unique opportunity for building efficient models to identify plasma instabilities for real-time control. Our Fusion Transfer Learning (FTL) model dem…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: 18 pages, 10 figures

    MSC Class: 76W05; 68T45 ACM Class: J.2; I.2.10

  23. arXiv:2404.16994  [pdf, other]

    cs.CV

    PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning

    Authors: Lin Xu, Yilin Zhao, Daquan Zhou, Zhijie Lin, See Kiong Ng, Jiashi Feng

    Abstract: Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally large computational and data resources, which hinders the progress of video-language models. This paper investigates a straight-forward, highly efficient, and resource-light approach to adapting an existi…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  24. arXiv:2404.16811  [pdf, other]

    cs.CL cs.AI

    Make Your LLM Fully Utilize the Context

    Authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou

    Abstract: While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on t…

    Submitted 26 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures, 3 tables, 9 examples

  25. arXiv:2404.16575  [pdf, other]

    hep-ph hep-ex hep-lat

    Probing the pole origin of $X(3872)$ with the coupled-channel dynamics

    Authors: Jun-Zhang Wang, Zi-Yang Lin, Yan-Ke Chen, Lu Meng, Shi-Lin Zhu

    Abstract: The $X(3872)$, as the first and the most crucial member in the exotic charmoniumlike $XYZ$ family, has been studied for a long time. However, its dynamical origin, whether stemming from a $D\bar{D}^*$ hadronic molecule or the first excited $P$-wave charmonium $χ_{c1}(2P)$, remains controversial. In this Letter, we demonstrate that the $X(3872)$ definitely does not result from the mass shift of the…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 11 pages, 5 figures

  26. arXiv:2404.16469  [pdf, ps, other]

    cond-mat.supr-con

    From weak to strong-coupling superconductivity tuned by substrate in TiN films

    Authors: Yixin Liu, Zulei Xu, Aobo Yu, Xiaoni Wang, Wei Peng, Yu Wu, Gang Mu, Zhi-Rong Lin

    Abstract: The interplay between substrates and superconducting thin films has attracted increasing attention. Here, we report an in-depth investigation on superconducting properties of the epitaxial TiN thin films grown on two different substrates by dc reactive magnetron sputtering. The TiN films grown on (0001) sapphire exhibit (111) crystal orientation, while those grown on (100) Si substrates exhibit (10…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 6 pages, 5 figures

  27. arXiv:2404.15750  [pdf, other]

    eess.SP

    A Reconfigurable Subarray Architecture and Hybrid Beamforming for Millimeter-Wave Dual-Function-Radar-Communication Systems

    Authors: Xin Jin, Tiejun Lv, Wei Ni, Zhipeng Lin, Qiuming Zhu, Ekram Hossain, H. Vincent Poor

    Abstract: Dual-function-radar-communication (DFRC) is a promising candidate technology for next-generation networks. By integrating hybrid analog-digital (HAD) beamforming into a multi-user millimeter-wave (mmWave) DFRC system, we design a new reconfigurable subarray (RS) architecture and jointly optimize the HAD beamforming to maximize the communication sum-rate and ensure a prescribed signal-to-clutter-pl…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: 14 pages, 9 figures, Accepted by IEEE TWC

  28. arXiv:2404.15701  [pdf, other]

    astro-ph.GA

    USmorph: An Updated Framework of Automatic Classification of Galaxy Morphologies and Its Application to Galaxies in the COSMOS Field

    Authors: Jie Song, GuanWen Fang, Shuo Ba, Zesen Lin, Yizhou Gu, Chichun Zhou, Tao Wang, Cai-Na Hao, Guilin Liu, Hongxin Zhang, Yao Yao, Xu Kong

    Abstract: Morphological classification conveys abundant information on the formation, evolution, and environment of galaxies. In this work, we refine the two-step galaxy morphological classification framework (USmorph), which employs a combination of unsupervised machine learning (UML) and supervised machine learning (SML) techniques, along with a self-consistent and robust data preprocessing s…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by ApJS, 16 pages, 12 figures

  29. arXiv:2404.15141  [pdf, other]

    cs.CV cs.AI

    CutDiffusion: A Simple, Fast, Cheap, and Strong Diffusion Extrapolation Method

    Authors: Mingbao Lin, Zhihang Lin, Wengyi Zhan, Liujuan Cao, Rongrong Ji

    Abstract: Transforming large pre-trained low-resolution diffusion models to cater to higher-resolution demands, i.e., diffusion extrapolation, significantly improves diffusion adaptability. We propose tuning-free CutDiffusion, aimed at simplifying and accelerating the diffusion extrapolation process, making it more affordable and improving performance. CutDiffusion abides by the existing patch-wise extrapol…

    Submitted 23 April, 2024; originally announced April 2024.

  30. arXiv:2404.14978  [pdf, ps, other]

    math.PR math.FA

    A Law of large numbers for vector-valued linear statistics of Bergman DPP

    Authors: Zhaofeng Lin, Yanqi Qiu, Kai Wang

    Abstract: We establish a law of large numbers for a certain class of vector-valued linear statistics for the Bergman determinantal point process on the unit disk. Our result seems to be the first LLN for vector-valued linear statistics in the setting of determinantal point processes. As an application, we prove that, for almost all configurations $X$ with respect to the Bergman determinantal…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 19 pages

  31. arXiv:2404.14663  [pdf, other]

    astro-ph.GA astro-ph.CO astro-ph.IM astro-ph.SR gr-qc

    VLBI with SKA: Possible Arrays and Astrometric Science

    Authors: Yingjie Li, Ye Xu, Jingjing Li, Shuaibo Bian, Zehao Lin, Chaojie Hao, Dejian Liu

    Abstract: The next generation of very long baseline interferometry (VLBI) is stepping into the era of microarcsecond ($μ$as) astronomy, and pushing astronomy, especially astrometry, to new heights. VLBI with the Square Kilometre Array (SKA), SKA-VLBI, will increase current sensitivity by an order of magnitude, and reach astrometric precision routinely below 10 $μ$as, even challenging 1 $μ$as. This advanceme…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 41 pages, 12 figures, 4 tables. Accepted to RAA (Review)

  32. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  33. arXiv:2404.14204  [pdf, other]

    cs.NI

    TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading

    Authors: Guanqiao Qu, Zheng Lin, Qian Chen, Jian Li, Fangming Liu, Xianhao Chen, Kaibin Huang

    Abstract: Next-generation mobile networks are expected to facilitate fast AI model downloading to end users. By caching models on edge servers, mobile networks can deliver models to end users with low latency, resulting in a paradigm called edge model caching. In this paper, we develop a novel model placement scheme, called parameter-sharing model caching (TrimCaching). TrimCaching exploits the key observat…

    Submitted 12 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 15 pages, 11 figures. Part of this work has been accepted by ICDCS 2024

  34. arXiv:2404.13931  [pdf, other]

    math.DS

    Polynomial effective density in quotient of $\mathrm{SL}_2(\mathbb{Q}_p) \times \mathrm{SL}_2(\mathbb{Q}_p)$

    Authors: Zuo Lin

    Abstract: We prove an effective density theorem with polynomial error rate for orbits of upper triangular subgroup of $\mathrm{SL}_2(\mathbb{Q}_p)$ in $\mathrm{SL}_2(\mathbb{Q}_p) \times \mathrm{SL}_2(\mathbb{Q}_p)$ for prime number $p > 3$. The proof is based on the use of Margulis function, a restricted projection theorem on $\mathbb{Q}_p^3$, and spectral gap of the ambient space.

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 39 pages

    MSC Class: 37A17; 37A25

  35. arXiv:2404.12767  [pdf, other]

    cond-mat.supr-con physics.app-ph

    On the Path to High-temperature Josephson Multi-junction Devices

    Authors: Xu Wang, Fucong Chen, Zefeng Lin, Changhong Yuan, Shibing Tian, Chunguang Li, Victor Kornev, Nikolay Kolotinskiy

    Abstract: We report our progress in the high-temperature superconductor (HTS) Josephson junction fabrication process founded on using a focused helium ion beam damaging technique and discuss the expected device performance attainable with the HTS multi-junction device technology. Both the achievable high value of characteristic voltage $V_c=I_cR_N$ of Josephson junctions and the ability to design a large nu…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 6 pages, 5 figures; submitted to EM Science

  36. arXiv:2404.12674  [pdf, other]

    cs.DC cs.LG cs.PF

    Towards Universal Performance Modeling for Machine Learning Training on Multi-GPU Platforms

    Authors: Zhongyi Lin, Ning Sun, Pallab Bhattacharya, Xizhou Feng, Louis Feng, John D. Owens

    Abstract: Characterizing and predicting the training performance of modern machine learning (ML) workloads on compute systems with compute and communication spread between CPUs, GPUs, and network devices is not only the key to optimization and planning but also a complex goal to achieve. The primary challenges include the complexity of synchronization and load balancing between CPUs and GPUs, the variance i…

    Submitted 27 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

    Comments: 12 pages, 11 figures, 4 tables

  37. arXiv:2404.11677  [pdf, other]

    cs.AI

    Cross-Problem Learning for Solving Vehicle Routing Problems

    Authors: Zhuoyi Lin, Yaoxin Wu, Bangjian Zhou, Zhiguang Cao, Wen Song, Yingqian Zhang, Senthilnath Jayavelu

    Abstract: Existing neural heuristics often train a deep architecture from scratch for each specific vehicle routing problem (VRP), ignoring the transferable knowledge across different VRP variants. This paper proposes cross-problem learning to assist heuristics training for different downstream VRP variants. In particular, we modularize neural architectures for complex VRPs into 1) the backbone Transform…

    Submitted 18 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI'24

  38. arXiv:2404.11199  [pdf, other]

    q-bio.BM

    RiboDiffusion: Tertiary Structure-based RNA Inverse Folding with Generative Diffusion Models

    Authors: Han Huang, Ziqian Lin, Dongchen He, Liang Hong, Yu Li

    Abstract: RNA design shows growing applications in synthetic biology and therapeutics, driven by the crucial role of RNA in various biological processes. A fundamental challenge is to find functional RNA sequences that satisfy given structural constraints, known as the inverse folding problem. Computational approaches have emerged to address this problem based on secondary structures. However, designing RNA…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 15 pages

  39. arXiv:2404.10718  [pdf, other]

    cs.CV

    GazeHTA: End-to-end Gaze Target Detection with Head-Target Association

    Authors: Zhi-Yi Lin, Jouh Yeong Chew, Jan van Gemert, Xucong Zhang

    Abstract: We propose an end-to-end approach for gaze target detection: predicting a head-target connection between individuals and the target image regions they are looking at. Most of the existing methods use independent components such as off-the-shelf head detectors or have problems in establishing associations between heads and gaze targets. In contrast, we investigate an end-to-end multi-person Gaze ta…

    Submitted 18 April, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  40. arXiv:2404.10444  [pdf, other]

    math.ST cs.LG stat.ML

    Semi-supervised Fréchet Regression

    Authors: Rui Qiu, Zhou Yu, Zhenhua Lin

    Abstract: This paper explores the field of semi-supervised Fréchet regression, driven by the significant costs associated with obtaining non-Euclidean labels. Methodologically, we propose two novel methods: semi-supervised NW Fréchet regression and semi-supervised kNN Fréchet regression, both based on graph distance acquired from all feature instances. These methods extend the scope of existing semi-supervi…

    Submitted 16 April, 2024; originally announced April 2024.

  41. arXiv:2404.10217  [pdf, other]

    astro-ph.SR astro-ph.EP

    Protoplanetary Disk Polarization at Multiple Wavelengths: Are Dust Populations Diverse?

    Authors: Rachel E. Harrison, Zhe-Yu Daniel Lin, Leslie W. Looney, Zhi-Yun Li, Haifeng Yang, Ian Stephens, Manuel Fernández-López

    Abstract: Millimeter and sub-millimeter observations of continuum linear dust polarization provide insight into dust grain growth in protoplanetary disks, which are the progenitors of planetary systems. We present the results of the first survey of dust polarization in protoplanetary disks at 870 $μ$m and 3 mm. We find that protoplanetary disks in the same molecular cloud at similar evolutionary stages can…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 22 pages, 12 figures

  42. arXiv:2404.09833  [pdf, other]

    cs.CV cs.AI

    Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video

    Authors: Hongchi Xia, Zhi-Hao Lin, Wei-Chiu Ma, Shenlong Wang

    Abstract: Creating high-quality and interactive virtual environments, such as games and simulators, often involves complex and costly manual modeling processes. In this paper, we present Video2Game, a novel approach that automatically converts videos of real-world scenes into realistic and interactive game environments. At the heart of our system are three core components: (i) a neural radiance fields (NeRF)…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: CVPR 2024. Project page (with code): https://video2game.github.io/

  43. arXiv:2404.09780  [pdf, other]

    nucl-th hep-ph

    Nuclear cluster structure effect in $^{16}$O+$^{16}$O collisions at the top RHIC energy

    Authors: Xin-Li Zhao, Guo-Liang Ma, You Zhou, Zi-Wei Lin, Chao Zhang

    Abstract: The impact of nuclear structure has garnered considerable attention in the high-energy nuclear physics community in recent years. This work focuses on studying the potential nuclear cluster structure in $^{16}\text{O}$ nuclei using anisotropic flow observables in $\rm O+O$ collisions at 200 GeV. Employing an improved AMPT model with various cluster structure configurations, we find that an extende…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 10 pages, 11 figures

  44. arXiv:2404.09777  [pdf, ps, other]

    math.CO

    A $q$-analog of the Stirling-Eulerian Polynomials

    Authors: Yao Dong, Zhicong Lin, Qiongqiong Pan

    Abstract: In 1974, Carlitz and Scoville introduced the Stirling-Eulerian polynomial $A_n(x,y|α,β)$ as the enumerator of permutations by descents, ascents, left-to-right maxima and right-to-left maxima. Recently, Ji considered a refinement of $A_n(x,y|α,β)$, denoted $P_n(u_1,u_2,u_3,u_4|α,β)$, which is the enumerator of permutations by valleys, peaks, double ascents, double descents, left-to-right maxima and…

    Submitted 15 April, 2024; originally announced April 2024.

  45. arXiv:2404.09730  [pdf, other]

    cs.LG math.CA math.NA

    Convergence Analysis of Probability Flow ODE for Score-based Generative Models

    Authors: Daniel Zhengyu Huang, Jiaoyang Huang, Zhengjiang Lin

    Abstract: Score-based generative models have emerged as a powerful approach for sampling high-dimensional probability distributions. Despite their effectiveness, their theoretical underpinnings remain relatively underdeveloped. In this work, we study the convergence properties of deterministic samplers based on probability flow ODEs from both theoretical and numerical perspectives. Assuming access to $L^2$-…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 33 pages, 7 figures

  46. arXiv:2404.08958  [pdf, other]

    cs.CV cs.CL cs.LG

    AMU-Tuning: Effective Logit Bias for CLIP-based Few-shot Learning

    Authors: Yuwei Tang, Zhenyi Lin, Qilong Wang, Pengfei Zhu, Qinghua Hu

    Abstract: Recently, pre-trained vision-language models (e.g., CLIP) have shown great potential in few-shot learning and attracted a lot of research interest. Although efforts have been made to improve the few-shot ability of CLIP, the key factors behind the effectiveness of existing methods have not been well studied, limiting further exploration of CLIP's potential in few-shot learning. In this paper, we first introdu…

    Submitted 13 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  47. arXiv:2404.08237  [pdf, other]

    cs.CV cs.AI

    IFViT: Interpretable Fixed-Length Representation for Fingerprint Matching via Vision Transformer

    Authors: Yuhang Qiu, Honghui Chen, Xingbo Dong, Zheng Lin, Iman Yi Liao, Massimo Tistarelli, Zhe Jin

    Abstract: Determining dense feature points on fingerprints used in constructing deep fixed-length representations for accurate matching, particularly at the pixel level, is of significant interest. To explore the interpretability of fingerprint matching, we propose a multi-stage interpretable fingerprint matching network, namely Interpretable Fixed-length Representation for Fingerprint Matching via Vision T…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: ready to submit to IEEE Transactions on Information Forensics and Security (TIFS)

  48. arXiv:2404.07965  [pdf, other]

    cs.CL cs.AI

    Rho-1: Not All Tokens Are What You Need

    Authors: Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

    Abstract: Previous language model pre-training methods have uniformly applied a next-token prediction loss to all training tokens. Challenging this norm, we posit that "Not all tokens in a corpus are equally important for language model training". Our initial analysis examines token-level training dynamics of language models, revealing distinct loss patterns for different tokens. Leveraging these insights,…

    Submitted 23 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    Comments: First two authors equal contribution

  49. arXiv:2404.07805  [pdf, other]

    math.NA

    Tensor Neural Network Interpolation and Its Applications

    Authors: Yongxin Li, Zhongshuo Lin, Yifan Wang, Hehu Xie

    Abstract: Based on tensor neural networks, we propose an interpolation method for high-dimensional non-tensor-product-type functions. This interpolation scheme is designed using the tensor-neural-network-based machine learning method. This means that we use a tensor neural network to approximate high-dimensional functions that have no tensor product structure. In some sense, the non-tensor-product-type hig…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 14 pages, 2 figures. arXiv admin note: text overlap with arXiv:2402.00040, arXiv:2311.02732

    MSC Class: 65N30; 65N25; 65L15; 65B99

  50. arXiv:2404.06448  [pdf, other]

    cs.LG cs.AI

    Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models

    Authors: Zihan Fang, Zheng Lin, Zhe Chen, Xianhao Chen, Yue Gao, Yuguang Fang

    Abstract: Recently, there has been a surge in the development of advanced intelligent generative content (AIGC), especially large language models (LLMs). However, for many downstream tasks, it is necessary to fine-tune LLMs using private data. While federated learning offers a promising privacy-preserving solution to LLM fine-tuning, the substantial size of an LLM, combined with high computational and commu…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 15 pages, 16 figures