
SOFA: A Compute-Memory Optimized Sparsity Accelerator via Cross-Stage Coordinated Tiling

Authors: Huizheng Wang, Jiahao Fang, Xinru Tang, Zhiheng Yue, Jinxi Li, Yubin Qin, Sihan Guan, Qize Yang, Yang Wang, Chao Li, Yang Hu, Shouyi Yin

Abstract: Benefiting from the self-attention mechanism, Transformer models have attained impressive contextual comprehension capabilities for lengthy texts. The requirements of high-throughput inference arise as the large language models (LLMs) become increasingly prevalent, which calls for large-scale token parallel processing (LTPP). However, existing dynamic sparse accelerators struggle to effectively handle LTPP, as they solely focus on separate stage optimization, and with most efforts confined to computational enhancements. By re-examining the end-to-end flow of dynamic sparse acceleration, we pinpoint an ever-overlooked opportunity that the LTPP can exploit the intrinsic coordination among stages to avoid excessive memory access and redundant computation. Motivated by our observation, we present SOFA, a cross-stage compute-memory efficient algorithm-hardware co-design, which is tailored to tackle the challenges posed by LTPP of Transformer inference effectively. We first propose a novel leading zero computing paradigm, which predicts attention sparsity by using log-based add-only operations to avoid the significant overhead of prediction. Then, a distributed sorting and a sorted updating FlashAttention mechanism are proposed with a cross-stage coordinated tiling principle, which enables fine-grained and lightweight coordination among stages, helping optimize memory access and latency. Further, we propose a SOFA accelerator to support these optimizations efficiently.
Extensive experiments on 20 benchmarks show that SOFA achieves $9.5\times$ speed up and $71.5\times$ higher energy efficiency than Nvidia A100 GPU. Compared to 8 SOTA accelerators, SOFA achieves an average $15.8\times$ energy efficiency, $10.3\times$ area efficiency and $9.3\times$ speed up, respectively.

Submitted 14 July, 2024; originally announced July 2024.
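
Illustrative note on the SOFA entry above: the abstract describes predicting attention sparsity with "log-based add-only operations". The sketch below is one possible reading of that idea, not the paper's actual scheme. It assumes integer-quantized queries and keys, approximates each product by adding the positions of the operands' leading one bits, and keeps the keys with the largest approximate scores. All function names are hypothetical.

```python
def log2_floor(x: int) -> int:
    """Position of the leading one bit of |x|, i.e. floor(log2|x|); -1 encodes zero."""
    return abs(x).bit_length() - 1


def approx_product_log(q: int, k: int) -> int:
    """Approximate q*k by a signed power of two, using one exponent addition."""
    eq, ek = log2_floor(q), log2_floor(k)
    if eq < 0 or ek < 0:  # one operand is zero, so the product is zero
        return 0
    magnitude = 1 << (eq + ek)  # 2**(eq + ek): addition replaces multiplication
    sign = -1 if (q < 0) != (k < 0) else 1
    return sign * magnitude


def predict_top_k(query, keys, top_k):
    """Rank keys by approximate dot product and return the indices of the top_k."""
    scores = [
        sum(approx_product_log(q, k) for q, k in zip(query, key))
        for key in keys
    ]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]


# Hypothetical quantized query and keys, for illustration only.
query = [8, -3, 5]
keys = [[7, 2, 1], [-6, 4, -9], [1, 1, 1]]
selected = predict_top_k(query, keys, top_k=2)
```

The approximation is coarse (each factor is rounded down to a power of two), so it is only suitable for ranking candidates; the selected keys would still go through exact attention afterwards.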

arXiv:2407.07896 [pdf, other]

Pentagonal Photonic Crystal Mirrors: Scalable Lightsails with Enhanced Acceleration via Neural Topology Optimization

Authors: L. Norder, S. Yin, M. J. de Jong, F. Stallone, H. Aydogmus, P. M. Sberna, M. A. Bessa, R. A. Norte

Abstract: The Starshot Breakthrough Initiative aims to send one-gram microchip probes to Alpha Centauri within 20 years, using gram-scale lightsails propelled by laser-based radiation pressure, reaching velocities nearing a fifth of light speed. This mission requires lightsail materials that challenge the fundamentals of nanotechnology, requiring innovations in optics, material science and structural engineering. Unlike the microchip payload, which must be minimized in every dimension, such lightsails need meter-scale dimensions with nanoscale thickness and billions of nanoscale holes to enhance reflectivity and reduce mass. Our study employs neural topology optimization, revealing a novel pentagonal lattice-based photonic crystal (PhC) reflector. The optimized designs shorten acceleration times, therefore lowering launch costs significantly. Crucially, these designs also enable lightsail material fabrication with orders-of-magnitude reduction in costs. We have fabricated a 60 x 60 mm$^2$, 200nm thick, single-layer reflector perforated with over a billion nanoscale features; the highest aspect-ratio nanophotonic element to date. We achieve this with nearly 9,000 times cost reduction per m$^2$. Starshot lightsails will have several stringent requirements but will ultimately be driven by costs to build at scale. Here we highlight challenges and possible solutions in developing lightsail materials - showcasing the potential of scaling nanophotonics for cost-effective next-generation space exploration.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.03966 [pdf, other]

Serialized Output Training by Learned Dominance

Authors: Ying Shi, Lantian Li, Shi Yin, Dong Wang, Jiqing Han

Abstract: Serialized Output Training (SOT) has showcased state-of-the-art performance in multi-talker speech recognition by sequentially decoding the speech of individual speakers. To address the challenging label-permutation issue, prior methods have relied on either the Permutation Invariant Training (PIT) or the time-based First-In-First-Out (FIFO) rule. This study presents a model-based serialization strategy that incorporates an auxiliary module into the Attention Encoder-Decoder architecture, autonomously identifying the crucial factors to order the output sequence of the speech components in multi-talker speech. Experiments conducted on the LibriSpeech and LibriMix databases reveal that our approach significantly outperforms the PIT and FIFO baselines in both 2-mix and 3-mix scenarios. Further analysis shows that the serialization module identifies dominant speech components in a mixture by factors including loudness and gender, and orders speech components based on the dominance score.

Submitted 4 July, 2024; originally announced July 2024.

Comments: accepted by INTERSPEECH 2024

arXiv:2406.10635 [pdf, other]

ROSfs: A User-Level File System for ROS

Authors: Zijun Xu, Xuanjun Wen, Yanjie Song, Shu Yin

Abstract: We present ROSfs, a novel user-level file system for the Robot Operating System (ROS). ROSfs interprets a robot file as a group of sub-files, with each having a distinct label. ROSfs applies a time index structure to enhance the flexible data query while the data file is under modification. It provides multi-robot systems (MRS) with prompt cross-robot data acquisition and collaboration. We implemented a ROSfs prototype and integrated it into a mainstream ROS platform. We then applied and evaluated ROSfs on real-world UAVs and data servers. Evaluation results show that compared with traditional ROS storage methods, ROSfs improves the offline query performance by up to 129x and reduces inter-robot online data query latency under a wireless network by up to 7x.

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.08148 [pdf, other]

Probing Implicit Bias in Semi-gradient Q-learning: Visualizing the Effective Loss Landscapes via the Fokker--Planck Equation

Authors: Shuyu Yin, Fei Wen, Peilin Liu, Tao Luo

Abstract: Semi-gradient Q-learning is applied in many fields, but due to the absence of an explicit loss function, studying its dynamics and implicit bias in the parameter space is challenging. This paper introduces the Fokker--Planck equation and employs partial data obtained through sampling to construct and visualize the effective loss landscape within a two-dimensional parameter space. This visualization reveals how the global minima in the loss landscape can transform into saddle points in the effective loss landscape, as well as the implicit bias of the semi-gradient method. Additionally, we demonstrate that saddle points, originating from the global minima in the loss landscape, still exist in the effective loss landscape under high-dimensional parameter spaces and neural network settings. This paper develops a novel approach for probing implicit bias in semi-gradient Q-learning.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07421 [pdf, other]

A Comprehensive Investigation on Speaker Augmentation for Speaker Recognition

Authors: Zhenyu Zhou, Shibiao Xu, Shi Yin, Lantian Li, Dong Wang

Abstract: Data augmentation (DA) has played a pivotal role in the success of deep speaker recognition. Current DA techniques primarily focus on speaker-preserving augmentation, which does not change the speaker trait of the speech and does not create new speakers. Recent research has shed light on the potential of speaker augmentation, which generates new speakers to enrich the training dataset. In this study, we delve into two speaker augmentation approaches: speed perturbation (SP) and vocal tract length perturbation (VTLP). Despite the empirical utilization of both methods, a comprehensive investigation into their efficacy is lacking. Our study, conducted using two public datasets, VoxCeleb and CN-Celeb, revealed that both SP and VTLP are proficient at generating new speakers, leading to significant performance improvements in speaker recognition. Furthermore, they exhibit distinct properties in sensitivity to perturbation factors and data complexity, hinting at the potential benefits of their fusion. Our research underscores the substantial potential of speaker augmentation, highlighting the importance of in-depth exploration and analysis.

Submitted 11 June, 2024; originally announced June 2024.

Comments: to be published in INTERSPEECH 2024

arXiv:2406.03868 [pdf, other]

PALM: A Efficient Performance Simulator for Tiled Accelerators with Large-scale Model Training

Authors: Jiahao Fang, Huizheng Wang, Qize Yang, Dehao Kong, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin

Abstract: Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often incorporate numerous cores or tiles even extending to wafer-scale, substantial on-chip bandwidth, and distributed memory systems. This results in an exceedingly complex design space. Moreover, conducting actual training experiments to find optimal configurations is impractical due to time constraints. Hence, predicting the optimal mapping of various parallelisms to such tiled system architectures becomes crucial. In this study, leveraging an analysis of existing mainstream DL model training strategies, we introduce a performance simulator named PALM. PALM targets both the training and inference processes for tiled accelerators, aiming to inspire the design of current and future accelerators. Specifically, (i) we establish a scheduling mechanism among tiled accelerators based on an event-driven framework; (ii) we support user-configurable pipeline, tensor, and data parallelism on tiled accelerators, determining the absolute performance throughput under these parallelism strategies; (iii) we model the interaction of on-chip SRAM, NoC, and off-chip DRAM during operator execution. This work is available here: https://github.com/fangjh21/PALM.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 11 pages

arXiv:2405.18132 [pdf, other]

EG4D: Explicit Generation of 4D Object without Score Distillation

Authors: Qi Sun, Zhiyang Guo, Ziyu Wan, Jing Nathan Yan, Shengming Yin, Wengang Zhou, Jing Liao, Houqiang Li

Abstract: In recent years, the increasing demand for dynamic 3D assets in design and gaming applications has given rise to powerful generative pipelines capable of synthesizing high-quality 4D objects. Previous methods generally rely on the score distillation sampling (SDS) algorithm to infer the unseen views and motion of 4D objects, thus leading to unsatisfactory results with defects like over-saturation and the Janus problem. Therefore, inspired by recent progress of video diffusion models, we propose to optimize a 4D representation by explicitly generating multi-view videos from one input image. However, it is far from trivial to handle practical challenges faced by such a pipeline, including dramatic temporal inconsistency, inter-frame geometry and texture diversity, and semantic defects brought by video generation results. To address these issues, we propose EG4D, a novel multi-stage framework that generates high-quality and consistent 4D assets without score distillation. Specifically, collaborative techniques and solutions are developed, including an attention injection strategy to synthesize temporal-consistent multi-view videos, a robust and efficient dynamic reconstruction method based on Gaussian Splatting, and a refinement stage with diffusion prior for semantic restoration. The qualitative results and user preference study demonstrate that our framework outperforms the baselines in generation quality by a considerable margin. Code will be released at https://github.com/jasongzy/EG4D.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17221 [pdf, other]

Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture

Authors: Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin

Abstract: Given the increasing complexity of AI applications, traditional spatial architectures frequently fall short. Our analysis identifies a pattern of interconnected, multi-faceted tasks encompassing both AI and general computational processes. In response, we have conceptualized "Orchestrated AI Workflows," an approach that integrates various tasks with logic-driven decisions into dynamic, sophisticated workflows. Specifically, we find that the intrinsic Dual Dynamicity of Orchestrated AI Workflows, namely dynamic execution times and frequencies of Task Blocks, can be effectively represented using the Orchestrated Workflow Graph. Furthermore, the intrinsic Dual Dynamicity poses challenges to existing spatial architecture, namely Indiscriminate Resource Allocation, Reactive Load Rebalancing, and Contagious PEA Idleness. To overcome these challenges, we present Octopus, a scale-out spatial architecture and a suite of advanced scheduling strategies optimized for executing Orchestrated AI Workflows, such as the Discriminate Dual-Scheduling Mechanism, Adaptive TBU Scheduling Strategy, and Proactive Cluster Scheduling Strategy. Our evaluations demonstrate that Octopus significantly outperforms traditional architectures in handling the dynamic demands of Orchestrated AI Workflows, and possesses robust scalability in large scale hardware such as wafer-scale chip.

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.15223 [pdf, other]

iVideoGPT: Interactive VideoGPTs are Scalable World Models

Authors: Jialong Wu, Shaofeng Yin, Ningya Feng, Xu He, Dong Li, Jianye Hao, Mingsheng Long

Abstract: World models empower model-based agents to interactively explore, reason, and plan within imagined environments for real-world decision-making. However, the high demand for interactivity poses challenges in harnessing recent advancements in video generative models for developing world models at scale. This work introduces Interactive VideoGPT (iVideoGPT), a scalable autoregressive transformer framework that integrates multimodal signals--visual observations, actions, and rewards--into a sequence of tokens, facilitating an interactive experience of agents via next-token prediction. iVideoGPT features a novel compressive tokenization technique that efficiently discretizes high-dimensional visual observations. Leveraging its scalable architecture, we are able to pre-train iVideoGPT on millions of human and robotic manipulation trajectories, establishing a versatile foundation that is adaptable to serve as interactive world models for a wide range of downstream tasks. These include action-conditioned video prediction, visual planning, and model-based reinforcement learning, where iVideoGPT achieves competitive performance compared with state-of-the-art methods. Our work advances the development of interactive general world models, bridging the gap between generative video models and practical model-based reinforcement learning applications.

Submitted 2 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: Project website: https://thuml.github.io/iVideoGPT

arXiv:2405.10463 [pdf, other]

Single-shot volumetric fluorescence imaging with neural fields

Authors: Oumeng Zhang, Haowen Zhou, Brandon Y. Feng, Elin M. Larsson, Reinaldo E. Alcalde, Siyuan Yin, Catherine Deng, Changhuei Yang

Abstract: Single-shot volumetric fluorescence (SVF) imaging offers a significant advantage over traditional imaging methods that require scanning across multiple axial planes as it can capture biological processes with high temporal resolution across a large field of view. The key challenges in SVF imaging include requiring sparsity constraints to meet the multiplexing requirements of compressed sensing, eliminating depth ambiguity in the reconstruction, and maintaining high resolution across a large field of view. In this paper, we introduce the QuadraPol point spread function (PSF) combined with neural fields, a novel approach for SVF imaging. This method utilizes a custom polarizer at the back focal plane and a polarization camera to detect fluorescence, effectively encoding the 3D scene within a compact PSF without depth ambiguity. Additionally, we propose a reconstruction algorithm based on the neural fields technique that provides improved reconstruction quality and addresses the inaccuracies of phase retrieval methods used to correct imaging system aberrations. This algorithm combines the accuracy of experimental PSFs with the long depth of field of computationally generated retrieved PSFs. QuadraPol PSF, combined with neural fields, significantly reduces the acquisition time of a conventional fluorescence microscope by approximately 20 times and captures a 100 mm$^3$ cubic volume in one shot.
We validate the effectiveness of both our hardware and algorithm through all-in-focus imaging of bacterial colonies on sand surfaces and visualization of plant root morphology. Our approach offers a powerful tool for advancing biological research and ecological studies.

Submitted 4 June, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.07551 [pdf, other]

MuMath-Code: Combining Tool-Use Large Language Models with Multi-perspective Data Augmentation for Mathematical Reasoning

Authors: Shuo Yin, Weihao You, Zhilong Ji, Guoqiang Zhong, Jinfeng Bai

Abstract: The tool-use Large Language Models (LLMs) that integrate with external Python interpreters have significantly enhanced mathematical reasoning capabilities for open-source LLMs, while tool-free methods chose another track: augmenting math reasoning data. However, a great method to integrate the above two research paths and combine their advantages remains to be explored. In this work, we first include new math questions via multi-perspective data augmenting methods and then synthesize code-nested solutions to them. The open LLMs (i.e., Llama-2) are finetuned on the augmented dataset to get the resulting models, MuMath-Code ($μ$-Math-Code). During the inference phase, our MuMath-Code generates code and interacts with the external Python interpreter to get the execution results. Therefore, MuMath-Code leverages the advantages of both the external tool and data augmentation. To fully leverage the advantages of our augmented data, we propose a two-stage training strategy: In Stage-1, we finetune Llama-2 on pure CoT data to get an intermediate model, which then is trained on the code-nested data in Stage-2 to get the resulting MuMath-Code. Our MuMath-Code-7B achieves 83.8 on GSM8K and 52.4 on MATH, while the MuMath-Code-70B model achieves new state-of-the-art performance among open methods -- achieving 90.7% on GSM8K and 55.1% on MATH. Extensive experiments validate the combination of tool use and data augmentation, as well as our two-stage training strategy. We release the proposed dataset along with the associated code for public use.

Submitted 13 May, 2024; originally announced May 2024.

Comments: The state-of-the-art open-source tool-use LLMs for mathematical reasoning

arXiv:2405.06887 [pdf, other]

FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment

Authors: Jinglin Xu, Sibo Yin, Guohao Zhao, Zishuo Wang, Yuxin Peng

Abstract: Existing action quality assessment (AQA) methods mainly learn deep representations at the video level for scoring diverse actions. Due to the lack of a fine-grained understanding of actions in videos, they suffer severely from low credibility and interpretability, and are thus insufficient for stringent applications, such as Olympic diving events. We argue that a fine-grained understanding of actions requires the model to perceive and parse actions in both time and space, which is also the key to the credibility and interpretability of the AQA technique. Based on this insight, we propose a new fine-grained spatial-temporal action parser named FineParser. It learns human-centric foreground action representations by focusing on target action regions within each frame and exploiting their fine-grained alignments in time and space to minimize the impact of invalid backgrounds during the assessment. In addition, we construct fine-grained annotations of human-centric foreground action masks for the FineDiving dataset, called FineDiving-HM. With refined annotations on diverse target action procedures, FineDiving-HM can promote the development of real-world AQA systems. Through extensive experiments, we demonstrate the effectiveness of FineParser, which outperforms state-of-the-art methods while supporting more tasks of fine-grained action understanding. Data and code are available at https://github.com/PKU-ICST-MIPL/FineParser_CVPR2024.

Submitted 10 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR 2024

arXiv:2405.05722 [pdf, other]

A Framework of SO(3)-equivariant Non-linear Representation Learning and its Application to Electronic-Structure Hamiltonian Prediction

Authors: Shi Yin, Xinyang Pan, Fengyan Wang, Feng Wu, Lixin He

Abstract: We present both a theoretical and a methodological framework that addresses a critical challenge in applying deep learning to physical systems: the reconciliation of non-linear expressiveness with SO(3)-equivariance in predictions of SO(3)-equivariant quantities. Inspired by covariant theory in physics, we address this problem by exploring the mathematical relationships between SO(3)-invariant and SO(3)-equivariant quantities and their representations. We first construct theoretical SO(3)-invariant quantities derived from the SO(3)-equivariant regression targets, and use these invariant quantities as supervisory labels to guide the learning of high-quality SO(3)-invariant features. Given that SO(3)-invariance is preserved under non-linear operations, the encoding process for invariant features can extensively utilize non-linear mappings, thereby fully capturing the non-linear patterns inherent in physical systems. Building on this foundation, we propose a gradient-based mechanism to induce SO(3)-equivariant encodings of various degrees from the learned SO(3)-invariant features. This mechanism can incorporate non-linear expressive capabilities into SO(3)-equivariant representations, while theoretically preserving their equivariant properties as we prove.
We apply our theory and method to electronic-structure Hamiltonian prediction tasks. Experimental results on eight benchmark databases covering multiple types of elements and challenging scenarios show dramatic breakthroughs on the state-of-the-art prediction accuracy, with improvements of up to 40% in predicting Hamiltonians and up to 76% in predicting downstream physical quantities such as occupied orbital energy. Our approach goes beyond handling physical systems and offers a promising general solution to the critical dilemma between equivariance and non-linear expressiveness for the deep learning paradigm.

Submitted 18 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.02155 [pdf, other]

Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification

Authors: Siqi Yin, Lifan Jiang

Abstract: This paper introduces a novel framework for zero-shot learning (ZSL), i.e., to recognize new categories that are unseen during training, by using a multi-model and multi-alignment integration method. Specifically, we propose three strategies to enhance the model's performance to handle ZSL: 1) Utilizing the extensive knowledge of ChatGPT and the powerful image generation capabilities of DALL-E to create reference images that can precisely describe unseen categories and classification boundaries, thereby alleviating the information bottleneck issue; 2) Integrating the results of text-image alignment and image-image alignment from CLIP, along with the image-image alignment results from DINO, to achieve more accurate predictions; 3) Introducing an adaptive weighting mechanism based on confidence levels to aggregate the outcomes from different prediction methods. Experimental results on multiple datasets, including CIFAR-10, CIFAR-100, and TinyImageNet, demonstrate that our model can significantly improve classification accuracy compared to single-model approaches, achieving AUROC scores above 96% across all test datasets, and notably surpassing 99% on the CIFAR-10 dataset.

Submitted 3 May, 2024; originally announced May 2024.
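
Illustrative note on the zero-shot entry above: the abstract mentions an adaptive weighting mechanism based on confidence levels that aggregates the outcomes of several prediction methods. The abstract does not give the formula, so the sketch below is only one simple possibility, under the assumption that each method outputs a class-probability vector and that its confidence is its highest class probability. The function name and the example numbers are hypothetical.

```python
def confidence_weighted_fusion(predictions):
    """Fuse per-method class-probability vectors, weighting each by its confidence.

    Assumption: a method's confidence is its largest class probability, and
    weights are the confidences normalized to sum to one.
    """
    confidences = [max(p) for p in predictions]
    total = sum(confidences)
    weights = [c / total for c in confidences]
    num_classes = len(predictions[0])
    fused = [
        sum(w * p[i] for w, p in zip(weights, predictions))
        for i in range(num_classes)
    ]
    return fused, weights


# Hypothetical outputs of three prediction methods over three classes.
predictions = [
    [0.70, 0.20, 0.10],  # e.g. a text-image alignment score
    [0.40, 0.50, 0.10],  # e.g. an image-image alignment score
    [0.90, 0.05, 0.05],  # e.g. a second image-image alignment score
]
fused, weights = confidence_weighted_fusion(predictions)
predicted_class = fused.index(max(fused))
```

Because the weights sum to one and each input is a probability vector, the fused vector is again a probability vector, and a confident method pulls the result toward its own prediction more strongly than an uncertain one.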

arXiv:2404.18612 [pdf]

Enhancing Prosthetic Safety and Environmental Adaptability: A Visual-Inertial Prosthesis Motion Estimation Approach on Uneven Terrains

Authors: Chuheng Chen, Xinxing Chen, Shucong Yin, Yuxuan Wang, Binxin Huang, Yuquan Leng, Chenglong Fu

Abstract: Environment awareness is crucial for enhancing the walking safety and stability of amputees wearing powered prostheses when crossing uneven terrains such as stairs and obstacles. However, existing environmental perception systems for prostheses only provide terrain types and corresponding parameters, which fails to prevent potential collisions when crossing uneven terrains and may lead to falls and other severe consequences. In this paper, a visual-inertial motion estimation approach is proposed for a prosthesis to perceive its movement and the changes in the spatial relationship between the prosthesis and uneven terrain when traversing it. To achieve this, we estimate the knee motion by utilizing a depth camera to perceive the environment and align feature points extracted from stairs and obstacles. Subsequently, an error-state Kalman filter is incorporated to fuse the inertial data into visual estimations to reduce the feature extraction error and obtain a more robust estimation. The motions of the prosthetic joint and toe are derived using the prosthesis model parameters. Experiments conducted on our collected dataset and stair walking trials with a powered prosthesis show that the proposed method can accurately track the motion of the human leg and prosthesis with an average root-mean-square error of the toe trajectory of less than 5 cm. The proposed method is expected to enable environment-adaptive control for prostheses, thereby enhancing amputees' safety and mobility in uneven terrains.

Submitted 29 April, 2024; originally announced April 2024.
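
The abstract above reports accuracy as a root-mean-square error (RMSE) of the toe trajectory. As a minimal sketch of how such a figure can be computed, the snippet below assumes that trajectories are lists of 3D points in metres and that the error at each time step is the Euclidean distance between the estimated and reference point. Both choices, and the function name `trajectory_rmse`, are illustrative assumptions and are not taken from the paper.

```python
import math


def trajectory_rmse(estimated, reference):
    """RMSE between two trajectories given as equal-length lists of 3D points.

    Assumption: the per-step error is the Euclidean distance between
    corresponding points; the paper may define the error differently.
    """
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Mean of the squared point-to-point distances, then the square root.
    squared = [math.dist(e, r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Hypothetical toe positions in metres, for illustration only.
    est = [(0.00, 0.00, 0.00), (0.10, 0.02, 0.05)]
    ref = [(0.00, 0.00, 0.00), (0.10, 0.00, 0.05)]
    print(f"toe RMSE: {trajectory_rmse(est, ref) * 100:.2f} cm")
```

Under these assumptions, a result below 0.05 m would correspond to the "less than 5 cm" figure quoted in the abstract.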

arXiv:2404.13378 [pdf, other]

doi 10.1109/TIV.2024.3352180

Social Force Embedded Mixed Graph Convolutional Network for Multi-class Trajectory Prediction

Authors: Quancheng Du, Xiao Wang, Shouguo Yin, Lingxi Li, Huansheng Ning

Abstract: Accurate prediction of agent motion trajectories is crucial for autonomous driving, contributing to the reduction of collision risks in human-vehicle interactions and ensuring ample response time for other traffic participants. Current research predominantly focuses on traditional deep learning methods, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs). These methods leverage relative distances to forecast the motion trajectories of a single class of agents. However, in complex traffic scenarios, the motion patterns of various types of traffic participants exhibit inherent randomness and uncertainty. Relying solely on relative distances may not adequately capture the nuanced interaction patterns between different classes of road users. In this paper, we propose a novel multi-class trajectory prediction method named the social force embedded mixed graph convolutional network (SFEM-GCN). SFEM-GCN comprises three graph topologies: the semantic graph (SG), position graph (PG), and velocity graph (VG). These graphs encode various social force relationships among different classes of agents in complex scenes. Specifically, SG utilizes one-hot encoding of agent-class information to guide the construction of graph adjacency matrices based on semantic information. PG and VG create adjacency matrices to capture motion interaction relationships between different classes of agents.
These graph structures are then integrated into a mixed graph, where learning is conducted using a spatiotemporal graph convolutional neural network (ST-GCNN). To further enhance prediction performance, we adopt temporal convolutional networks (TCNs) to generate the predicted trajectory with fewer parameters. Experimental results on publicly available datasets demonstrate that SFEM-GCN surpasses state-of-the-art methods in terms of accuracy and robustness.

Submitted 20 April, 2024; originally announced April 2024.

Comments: 11 pages, 3 figures, published in IEEE Transactions on Intelligent Vehicles

arXiv:2404.12104 [pdf, other]

Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models

Authors: Yuzhu Cai, Sheng Yin, Yuxi Wei, Chenxin Xu, Weibo Mao, Felix Juefei-Xu, Siheng Chen, Yanfeng Wang

Abstract: The burgeoning landscape of text-to-image models, exemplified by innovations such as Midjourney and DALLE 3, has revolutionized content creation across diverse sectors. However, these advancements bring forth critical ethical concerns, particularly with the misuse of open-source models to generate content that violates societal norms. Addressing this, we introduce Ethical-Lens, a framework designed to facilitate the value-aligned usage of text-to-image tools without necessitating internal model revision. Ethical-Lens ensures value alignment in text-to-image models across toxicity and bias dimensions by refining user commands and rectifying model outputs. Systematic evaluation metrics, combining GPT4-V, HEIM, and FairFace scores, assess alignment capability. Our experiments reveal that Ethical-Lens enhances alignment capabilities to levels comparable with or superior to commercial models like DALLE 3, ensuring user-generated content adheres to ethical standards while maintaining image quality. This study indicates the potential of Ethical-Lens to ensure the sustainable development of open-source text-to-image tools and their beneficial integration into society. Our code is available at https://github.com/yuzhu-cai/Ethical-Lens.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 42 pages, 17 figures, 29 tables

arXiv:2404.06762 [pdf, other]

Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems

Authors: Zhengyuan Liu, Stella Xin Yin, Geyu Lin, Nancy F. Chen

Abstract: Intelligent Tutoring Systems (ITSs) can provide personalized and self-paced learning experience. The emergence of large language models (LLMs) further enables better human-machine interaction, and facilitates the development of conversational ITSs in various disciplines such as math and language learning. In dialogic teaching, recognizing and adapting to individual characteristics can significantly enhance student engagement and learning efficiency. However, characterizing and simulating student's persona remain challenging in training and evaluating conversational ITSs. In this work, we propose a framework to construct profiles of different student groups by refining and integrating both cognitive and noncognitive aspects, and leverage LLMs for personality-aware student simulation in a language learning scenario. We further enhance the framework with multi-aspect validation, and conduct extensive analysis from both teacher and student perspectives. Our experimental results show that state-of-the-art LLMs can produce diverse student responses according to the given language ability and personality traits, and trigger teacher's adaptive scaffolding strategies.

Submitted 10 April, 2024; originally announced April 2024.

arXiv:2404.06194 [pdf, other]

Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection

Authors: Ting Lei, Shaofeng Yin, Yang Liu

Abstract: Open-vocabulary human-object interaction (HOI) detection, which is concerned with the problem of detecting novel HOIs guided by natural language, is crucial for understanding human-centric scenes. However, prior zero-shot HOI detectors often employ the same levels of feature maps to model HOIs with varying distances, leading to suboptimal performance in scenes containing human-object pairs with a wide range of distances. In addition, these detectors primarily rely on category names and overlook the rich contextual information that language can provide, which is essential for capturing open vocabulary concepts that are typically rare and not well-represented by category names alone. In this paper, we introduce a novel end-to-end open vocabulary HOI detection framework with conditional multi-level decoding and fine-grained semantic enhancement (CMD-SE), harnessing the potential of Visual-Language Models (VLMs). Specifically, we propose to model human-object pairs with different distances with different levels of feature maps by incorporating a soft constraint during the bipartite matching process. Furthermore, by leveraging large language models (LLMs) such as GPT models, we exploit their extensive world knowledge to generate descriptions of human body part states for various interactions. Then we integrate the generalizable and fine-grained semantics of human body parts to improve interaction recognition.
Experimental results on two datasets, SWIG-HOI and HICO-DET, demonstrate that our proposed method achieves state-of-the-art results in open vocabulary HOI detection. The code and models are available at https://github.com/ltttpku/CMD-SE-release.

Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05412 [pdf]

Valley edge states as bound states in the continuum

Authors: Shunda Yin, Liping Ye, Hailong He, Xueqin Huang, Manzhu Ke, Weiyin Deng, Jiuyang Lu, Zhengyou Liu

Abstract: Bound states in the continuum (BICs) are spatially localized states with energy embedded in the continuum spectrum of extended states. The combination of BICs physics and nontrivial band topology theory giving rise to topological BICs, which are robust against disorders and meanwhile of the merit of conventional BICs, is attracting wide attention recently. Here, we report valley edge states as topological BICs, which appear at domain wall between two distinct valley topological phases. The robustness of such BICs is demonstrated. The simulations and experiments show great agreement. Our findings of valley related topological BICs shed light on both BICs and valley physics, and may foster innovative applications of topological acoustic devices.

Submitted 8 April, 2024; originally announced April 2024.

Comments: A revised version has been accepted by Science Bulletin

arXiv:2404.04449 [pdf]

Self-referencing photothermal common-path interferometry to measure absorption of Si3N4 membranes for laser-light sails

Authors: Demeng Feng, Tanuj Kumar, Shenwei Yin, Merlin Mah, Phyo Lin, Margaret Fortman, Gabriel R. Jaffe, Chenghao Wan, Hongyan Mei, Yuzhe Xiao, Ron Synowicki, Ronald J. Warzoha, Victor W. Brar, Joseph J. Talghader, Mikhail A. Kats

Abstract: Laser-light sails are a spacecraft concept wherein lightweight "sails" are propelled to high speeds by lasers with high intensities. The sails must comprise materials with low optical loss, to minimize the risk of laser damage. Stoichiometric silicon nitride (Si$_3$N$_4$) is a candidate material with low loss in the near infrared, but the precise absorption coefficient has not been characterized in the membrane form-factor needed for sails. We use photothermal common-path interferometry (PCI), a sensitive pump-probe technique, to measure the absorption coefficient of stoichiometric and nonstoichiometric silicon nitride. To calibrate PCI measurements of membranes, we developed a self-referencing technique where a measurement is performed twice: once on a bare membrane, and a second time with a monolayer of graphene deposited on the membrane. The absorption of the sample with graphene can be measured by both PCI and more-conventional spectroscopic techniques, enabling the calibration of the PCI measurement. We find that with an absorption coefficient of (2.09 $\pm$ 0.76) $\times$ 10$^{-2}$ cm$^{-1}$ at 1064 nm, Si$_3$N$_4$ is a suitable laser-sail material for laser intensities as high as ~10 GW/m$^{2}$, which have been proposed for some laser-sail missions, while silicon-rich SiN$_x$ (x~1), with an absorption coefficient of 7.94 $\pm$ 0.50 cm$^{-1}$, is unlikely to survive such high laser intensities.

Submitted 5 April, 2024; originally announced April 2024.

Comments: Main text + supplementary

arXiv:2404.03681 [pdf, other]

Muon beamtest results of high-density glass scintillator tiles

Authors: Dejing Du, Yong Liu, Hua Cai, Danping Chen, Zhehao Hua, Jifeng Han, Baohua Qi, Sen Qian, Jing Ren, Xinyuan Sun, Dong Yang, Shenghua Yin, Minghui Zhang

Abstract: To achieve the physics goal of precisely measuring the Higgs, Z, W bosons and the top quark, future electron-positron colliders require that their detector system has excellent jet energy resolution. One feasible technical option is high-granularity calorimetry based on the particle flow algorithm (PFA). A new high-granularity hadronic calorimeter with glass scintillator tiles (GSHCAL) has been proposed, which focuses on the significant improvement of hadronic energy resolution with a notable increase of the energy sampling fraction by using high-density glass scintillator tiles. The minimum ionizing particle (MIP) response of a glass scintillator tile is crucial to the hadronic calorimeter, so a dedicated beamtest setup was developed for testing the first batch of large-size glass scintillators. The maximum MIP response of the first batch of glass scintillator tiles can reach up to 107 p.e./MIP, which essentially meets the design requirements of the CEPC GSHCAL. An optical simulation model of a single glass scintillator tile has been established, and the simulation results are consistent with the beamtest results.

Submitted 9 May, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

arXiv:2404.03429 [pdf, other]

Scaffolding Language Learning via Multi-modal Tutoring Systems with Pedagogical Instructions

Authors: Zhengyuan Liu, Stella Xin Yin, Carolyn Lee, Nancy F. Chen

Abstract: Intelligent tutoring systems (ITSs) that imitate human tutors and aim to provide immediate and customized instructions or feedback to learners have shown their effectiveness in education. With the emergence of generative artificial intelligence, large language models (LLMs) further entitle the systems to complex and coherent conversational interactions. These systems would be of great help in language education as it involves developing skills in communication, which, however, drew relatively less attention. Additionally, due to the complicated cognitive development at younger ages, more endeavors are needed for practical uses. Scaffolding refers to a teaching technique where teachers provide support and guidance to students for learning and developing new concepts or skills. It is an effective way to support diverse learning needs, goals, processes, and outcomes. In this work, we investigate how pedagogical instructions facilitate the scaffolding in ITSs, by conducting a case study on guiding children to describe images for language learning. We construct different types of scaffolding tutoring systems grounded in four fundamental learning theories: knowledge construction, inquiry-based learning, dialogic teaching, and zone of proximal development. For qualitative and quantitative analyses, we build and refine a seven-dimension rubric to evaluate the scaffolding process. In our experiment on GPT-4V, we observe that LLMs demonstrate strong potential to follow pedagogical instructions and achieve self-paced learning in different student groups.
Moreover, we extend our evaluation framework from a manual to an automated approach, paving the way to benchmark various conversational tutoring systems.

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2403.19258 [pdf, other]

Finite-time Scaling beyond the Kibble-Zurek Prerequisite: Driven Critical Dynamics in Strongly Interacting Dirac Systems

Authors: Zhi Zeng, Yin-Kai Yu, Zhi-Xuan Li, Zi-Xiang Li, Shuai Yin

Abstract: In conventional quantum critical point (QCP) characterized by order parameter fluctuations, the celebrated Kibble-Zurek mechanism (KZM) and finite-time scaling (FTS) theory provide universal descriptions of the driven critical dynamics. However, in strongly correlated fermionic systems where gapless fermions are usually present in vicinity of QCP, the driven dynamics has rarely been explored. In this Letter, we investigate the driven critical dynamics in two-dimensional Dirac systems, which harbor semimetal and Mott insulator phases separated by the QCP triggered by the interplay between fluctuations of gapless Dirac fermions and order-parameter bosons. By studying the evolution of physical quantities for different driving rates through large-scale quantum Monte Carlo simulation, we confirm that the driven dynamics is described by the FTS form. Accordingly, our results significantly generalize the KZM theory by relaxing its requirement for a gapped initial state to the system accommodating gapless Dirac fermionic excitation. Through successfully extending the KZM and FTS theory to Dirac QCP, our work not only brings new fundamental perspective into the nonequilibrium critical dynamics, but also provides a novel theoretical approach to fathom quantum critical properties in fermionic systems.

Submitted 29 March, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

Comments: 9+3 pages, 5+2 figures

arXiv:2403.09084 [pdf, other]

doi 10.1103/PhysRevB.109.184303

Imaginary-time relaxation quantum critical dynamics in two-dimensional dimerized Heisenberg model

Authors: Jia-Qi Cai, Yu-Rong Shu, Xue-Qing Rao, Shuai Yin

Abstract: We study the imaginary-time relaxation critical dynamics of the Neel-paramagnetic quantum phase transition in the two-dimensional (2D) dimerized S = 1/2 Heisenberg model. We focus on the scaling correction in the short-time region. A unified scaling form including both short-time and finite-size corrections is proposed. According to this full scaling form, improved short-imaginary-time scaling relations are obtained. We numerically verify the scaling form and the improved short-time scaling relations for different initial states using projector quantum Monte Carlo algorithm.

Submitted 14 March, 2024; originally announced March 2024.

Comments: 10 pages, 8 figures

Journal ref: Phys. Rev. B 109, 184303(2024)

arXiv:2403.08459 [pdf, other]

Symmetry restoration and quantum Mpemba effect in symmetric random circuits

Authors: Shuo Liu, Hao-Kai Zhang, Shuai Yin, Shi-Xin Zhang

Abstract: Entanglement asymmetry, which serves as a diagnostic tool for symmetry breaking and a proxy for thermalization, has recently been proposed and studied in the context of symmetry restoration for quantum many-body systems undergoing a quench. In this Letter, we investigate symmetry restoration in various symmetric random quantum circuits, particularly focusing on the U(1) symmetry case. In contrast to non-symmetric random circuits where the U(1) symmetry of a small subsystem can always be restored at late times, we reveal that symmetry restoration can fail in U(1) symmetric circuits for certain small symmetry-broken initial states in finite-size systems. In the early-time dynamics, we observe an intriguing quantum Mpemba effect implying that symmetry is restored faster when the initial state is more asymmetric. Furthermore, we also investigate the entanglement asymmetry dynamics for SU(2) and $Z_{2}$ symmetric circuits and identify the presence and absence of the quantum Mpemba effect for the corresponding symmetries, respectively. A unified understanding of these results is provided through the lens of quantum thermalization with conserved charges.

Submitted 13 March, 2024; originally announced March 2024.

Comments: 4.5 pages, 5 figures, and Supplemental Material

arXiv:2403.06770 [pdf, other]

Estimates on the convergence of expansions at finite baryon chemical potentials

Authors: Rui Wen, Shi Yin, Wei-jie Fu

Abstract: The convergence of three different expansion schemes at finite baryon chemical potentials, including the conventional Taylor expansion, the Padé approximants, and the $T'$ expansion proposed recently in lattice QCD simulations, has been investigated in a low energy effective theory within the fRG approach. It is found that the $T'$ expansion or the Padé approximants would hardly improve the convergence of expansion in comparison to the conventional Taylor expansion, within the expansion orders considered in this work. Furthermore, we find that the consistent regions of the three different expansions are in agreement with the convergence radius of the Lee-Yang edge singularities.

Submitted 19 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

Comments: 9 pages, 6 figures

arXiv:2403.04481 [pdf, other]

Do Large Language Model Understand Multi-Intent Spoken Language ?

Authors: Shangjian Yin, Peijie Huang, Yuhong Xu, Haojing Huang, Jiatian Chen

Abstract: This research signifies a considerable breakthrough in leveraging Large Language Models (LLMs) for multi-intent spoken language understanding (SLU). Our approach re-imagines the use of entity slots in multi-intent SLU applications, making the most of the generative potential of LLMs within the SLU landscape, leading to the development of the EN-LLM series. Furthermore, we introduce the concept of Sub-Intent Instruction (SII) to amplify the analysis and interpretation of complex, multi-intent communications, which further supports the creation of the ENSI-LLM models series. Our novel datasets, identified as LM-MixATIS and LM-MixSNIPS, are synthesized from existing benchmarks. The study evidences that LLMs may match or even surpass the performance of the current best multi-intent SLU models. We also scrutinize the performance of LLMs across a spectrum of intent configurations and dataset distributions. On top of this, we present two revolutionary metrics - Entity Slot Accuracy (ESA) and Combined Semantic Accuracy (CSA) - to facilitate a detailed assessment of LLM competence in this multifaceted field. Our code and datasets are available at https://github.com/SJY8460/SLM.

Submitted 15 April, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03742 [pdf, other]

Mitigating Ageism through Virtual Reality: Intergenerational Collaborative Escape Room Design

Authors: Ruotong Zou, Shuyu Yin, Tianqi Song, Peinuan Qin, Yi-Chieh Lee

Abstract: As virtual reality (VR) becomes more popular for intergenerational collaboration, there is still a significant gap in research regarding understanding the potential for reducing ageism. Our study aims to address this gap by analyzing ageism levels before and after VR escape room collaborative experiences. We recruited 28 participants to collaborate with an older player in a challenging VR escape room game. To ensure consistent and reliable performance data of older players, our experimenters simulated older participants following specific guidelines. After completing the game, we found a significant reduction in ageism among younger participants. Furthermore, we introduce a new game mechanism that encourages intergenerational collaboration. Our research highlights the potential of VR collaborative games as a practical tool for mitigating ageism. It provides valuable insights for designing immersive VR experiences that foster enhanced intergenerational collaboration.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.00019 [pdf, other]

Transformer-based Parameter Estimation in Statistics

Authors: Xiaoxin Yin, David S. Yin

Abstract: Parameter estimation is one of the most important tasks in statistics, and is key to helping people understand the distribution behind a sample of observations. Traditionally parameter estimation is done either by closed-form solutions (e.g., maximum likelihood estimation for Gaussian distribution), or by iterative numerical methods such as Newton-Raphson method when closed-form solution does not exist (e.g., for Beta distribution). In this paper we propose a transformer-based approach to parameter estimation. Compared with existing solutions, our approach does not require a closed-form solution or any mathematical derivations. It does not even require knowing the probability density function, which is needed by numerical methods. After the transformer model is trained, only a single inference is needed to estimate the parameters of the underlying distribution based on a sample of observations. In the empirical study we compared our approach with maximum likelihood estimation on commonly used distributions such as normal distribution, exponential distribution and beta distribution. It is shown that our approach achieves similar or better accuracy as measured by mean-square-errors.

Submitted 27 February, 2024; originally announced March 2024.
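
The abstract above contrasts its transformer-based approach with closed-form maximum likelihood estimation (MLE). As a minimal sketch of that classical baseline only, the snippet below computes the textbook closed-form MLEs for a normal distribution (sample mean and the variance with divisor n) and for the rate of an exponential distribution (the reciprocal of the sample mean). The function names are illustrative, and nothing here reproduces the paper's own method or experimental setup.

```python
def normal_mle(sample):
    """Closed-form MLE for a normal distribution: returns (mean, variance)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    mean = sum(sample) / n
    # The MLE of the variance divides by n, not by n - 1.
    variance = sum((x - mean) ** 2 for x in sample) / n
    return mean, variance


def exponential_mle(sample):
    """Closed-form MLE for the rate of an exponential distribution."""
    total = sum(sample)
    if not sample or total <= 0:
        raise ValueError("sample must be non-empty with a positive sum")
    # The rate estimate is the reciprocal of the sample mean.
    return len(sample) / total


if __name__ == "__main__":
    # Hypothetical observations, for illustration only.
    print(normal_mle([1.0, 2.0, 3.0]))
    print(exponential_mle([0.5, 1.5]))
```

For the Beta distribution mentioned in the abstract, no such closed form exists, which is why an iterative solver such as Newton-Raphson is needed for that case.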

arXiv:2402.16899 [pdf, other]

A priori Estimates for Deep Residual Network in Continuous-time Reinforcement Learning

Authors: Shuyu Yin, Qixuan Zhou, Fei Wen, Tao Luo

Abstract: Deep reinforcement learning excels in numerous large-scale practical applications. However, existing performance analyses ignore the unique characteristics of continuous-time control problems, are unable to directly estimate the generalization error of the Bellman optimal loss, and require a boundedness assumption. Our work focuses on continuous-time control problems and proposes a method that is applicable to all such problems where the transition function satisfies semi-group and Lipschitz properties. Under this method, we can directly analyze the \emph{a priori} generalization error of the Bellman optimal loss. The core of this method lies in two transformations of the loss function. To complete the transformation, we propose a decomposition method for the maximum operator. Additionally, this analysis method does not require a boundedness assumption. Finally, we obtain an \emph{a priori} generalization error without the curse of dimensionality.

Submitted 7 March, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

arXiv:2402.16272 [pdf, other]

Mass production and performance study on the 20-inch PMT acrylic protection covers in JUNO

Authors: Miao He, Zhonghua Qin, Diru Wu, Meihang Xu, Wan Xie, Fang Chen, Xiaoping Jing, Genhua Yin, Shengjiong Yin, Linhua Gu, Xiaofeng Xia, Qinchang Wang

Abstract: The Jiangmen Underground Neutrino Observatory is a neutrino experiment that incorporates 20,012 20-inch photomultiplier tubes (PMTs) and 25,600 3-inch PMTs. A dedicated system was designed to protect the PMTs from an implosion chain reaction underwater. As a crucial element of the protection system, over 20,000 acrylic covers were manufactured through injection molding, ensuring high dimensional precision, mechanical strength, and transparency. This paper presents the manufacturing technology, mass production process, and performance characteristics of the acrylic covers.

Submitted 25 February, 2024; originally announced February 2024.

Comments: 12 pages, 10 figures

arXiv:2402.14634 [pdf, other]

doi 10.1145/3636534.3649376

GazeTrak: Exploring Acoustic-based Eye Tracking on a Glass Frame

Authors: Ke Li, Ruidong Zhang, Boao Chen, Siyuan Chen, Sicheng Yin, Saif Mahmud, Qikang Liang, François Guimbretière, Cheng Zhang

Abstract: In this paper, we present GazeTrak, the first acoustic-based eye tracking system on glasses. Our system only needs one speaker and four microphones attached to each side of the glasses. These acoustic sensors capture the formations of the eyeballs and the surrounding areas by emitting encoded inaudible sound towards eyeballs and receiving the reflected signals. These reflected signals are further processed to calculate the echo profiles, which are fed to a customized deep learning pipeline to continuously infer the gaze position. In a user study with 20 participants, GazeTrak achieves an accuracy of 3.6° within the same remounting session and 4.9° across different sessions with a refreshing rate of 83.3 Hz and a power signature of 287.9 mW. Furthermore, we report the performance of our gaze tracking system fully implemented on an MCU with a low-power CNN accelerator (MAX78002). In this configuration, the system runs at up to 83.3 Hz and has a total power signature of 95.4 mW with a 30 Hz FPS.

Submitted 23 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

Comments: 16 pages, 5 figures, 7 tables, The 30th Annual International Conference on Mobile Computing and Networking (ACM MobiCom 2024)

arXiv:2402.12823 [pdf, other]

The influence of hadronic rescatterings on the net-baryon number fluctuations

Authors: Qian Chen, Rui Wen, Shi Yin, Wei-jie Fu, Zi-Wei Lin, Guo-Liang Ma

Abstract: Fluctuations of conserved charges, such as the net-baryon number fluctuations, are influenced by different dynamical evolution processes. In this paper, we investigate the influence of hadronic rescatterings on different orders of cumulants of the net-baryon number distribution. At the start of hadronic rescatterings, we introduce net-baryon number distributions reconstructed based on net-baryon cumulants of different orders obtained from computation in functional renormalization group (FRG), where the distributions were constructed using the maximum entropy method. This way we introduce the critical fluctuations of Quantum Chromodynamics (QCD) into the AMPT model. Firstly, we find that hadronic rescatterings have distinct effects on cumulant ratios of different orders for the net-baryon number. Secondly, we observe that the effect of hadronic rescatterings is more significant for critical fluctuations than dynamical fluctuations, because the two-, three- and four-particle correlation functions due to critical fluctuations are weakened more significantly by hadronic rescatterings.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 10 pages, 8 figures

arXiv:2402.10534 [pdf, other]

Using Left and Right Brains Together: Towards Vision and Language Planning

Authors: Jun Cen, Chenfei Wu, Xiao Liu, Shengming Yin, Yixuan Pei, Jinglong Yang, Qifeng Chen, Nan Duan, Jianguo Zhang

Abstract: Large Language Models (LLMs) and Large Multi-modality Models (LMMs) have demonstrated remarkable decision making capabilities on a variety of tasks. However, they inherently plan within the language space and lack visual and spatial imagination ability. In contrast, humans utilize both left and right hemispheres of the brain for language and visual planning during the thinking process. Therefore, we introduce a novel vision-language planning framework in this work to perform concurrent visual and language planning for tasks with inputs of any form. Our framework incorporates visual planning to capture intricate environmental details, while language planning enhances the logical coherence of the overall system. We evaluate the effectiveness of our framework across vision-language tasks, vision-only tasks, and language-only tasks. The results demonstrate the superior performance of our approach, indicating that the integration of visual and language planning yields better contextually aware task execution.

Submitted 16 February, 2024; originally announced February 2024.

Comments: 19 pages, 13 figures

arXiv:2402.02140 [pdf, other]

Generative Visual Compression: A Review

Authors: Bolin Chen, Shanzhi Yin, Peilin Chen, Shiqi Wang, Yan Ye

Abstract: Artificial Intelligence Generated Content (AIGC) is leading a new technical revolution for the acquisition of digital content and impelling the progress of visual compression towards competitive performance gains and diverse functionalities over traditional codecs. This paper provides a thorough review on the recent advances of generative visual compression, illustrating great potentials and promising applications in ultra-low bitrate communication, user-specified reconstruction/filtering, and intelligent machine analysis. In particular, we review the visual data compression methodologies with deep generative models, and summarize how compact representation and high-fidelity reconstruction could be actualized via generative techniques. In addition, we generalize related generative compression technologies for machine vision and intelligent analytics. Finally, we discuss the fundamental challenges on generative visual compression techniques and envision their future research directions.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.01271 [pdf, other]

An Intra-BRNN and GB-RVQ Based END-TO-END Neural Audio Codec

Authors: Linping Xu, Jiawei Jiang, Dejun Zhang, Xianjun Xia, Li Chen, Yijian Xiao, Piao Ding, Shenyi Song, Sixing Yin, Ferdous Sohel

Abstract: Recently, neural networks have proven to be effective in performing speech coding tasks at low bitrates. However, under-utilization of intra-frame correlations and quantizer error degrade the reconstructed audio quality. To improve the coding quality, we present an end-to-end neural speech codec, namely CBRC (Convolutional and Bidirectional Recurrent neural Codec). An interleaved structure using 1D-CNN and Intra-BRNN is designed to exploit the intra-frame correlations more efficiently. Furthermore, Group-wise and Beam-search Residual Vector Quantizer (GB-RVQ) is used to reduce the quantization noise. CBRC encodes audio every 20ms with no additional latency, which is suitable for real-time communication. Experimental results demonstrate the superiority of the proposed codec when comparing CBRC at 3kbps with Opus at 12kbps.

Submitted 2 February, 2024; originally announced February 2024.

Comments: INTERSPEECH 2023

arXiv:2401.17840 [pdf, other]

Propagation Dynamics of Rumor vs. Non-rumor across Multiple Social Media Platforms Driven by User Characteristics

Authors: Dongpeng Hou, Shu Yin, Chao Gao, Xianghua Li, Zhen Wang

Abstract: Studying information propagation dynamics in social media can elucidate user behaviors and patterns. However, previous research often focuses on single platforms and fails to differentiate between the nuanced roles of source users and other participants in cascades. To address these limitations, we analyze propagation cascades on Twitter and Weibo combined with a crawled dataset of nearly one million users with authentic attributes. Our preliminary findings from multiple platforms robustly indicate that rumors tend to spread more deeply, while non-rumors distribute more broadly. Interestingly, we discover that the spread of rumors is slower, persists longer, and, in most cases, involves fewer participants than that of non-rumors. And an undiscovered highlight is that reputable active users, termed `onlookers', inadvertently or unwittingly spread rumors due to their extensive online interactions and the allure of sensational fake news. Conversely, celebrities exhibit caution, mindful of releasing unverified information. Additionally, we identify cascade features aligning with exponential patterns, highlight the Credibility Erosion Effect (CEE) phenomenon in the propagation process, and discover the different contents and policies between the two platforms. Our findings enhance current understanding and provide a valuable statistical analysis for future research.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.17409 [pdf, other]

doi 10.1145/3613904.3642910

EchoWrist: Continuous Hand Pose Tracking and Hand-Object Interaction Recognition Using Low-Power Active Acoustic Sensing On a Wristband

Authors: Chi-Jung Lee, Ruidong Zhang, Devansh Agarwal, Tianhong Catherine Yu, Vipin Gunda, Oliver Lopez, James Kim, Sicheng Yin, Boao Dong, Ke Li, Mose Sakashita, Francois Guimbretiere, Cheng Zhang

Abstract: Our hands serve as a fundamental means of interaction with the world around us. Therefore, understanding hand poses and interaction context is critical for human-computer interaction. We present EchoWrist, a low-power wristband that continuously estimates 3D hand pose and recognizes hand-object interactions using active acoustic sensing. EchoWrist is equipped with two speakers emitting inaudible sound waves toward the hand. These sound waves interact with the hand and its surroundings through reflections and diffractions, carrying rich information about the hand's shape and the objects it interacts with. The information captured by the two microphones goes through a deep learning inference system that recovers hand poses and identifies various everyday hand activities. Results from the two 12-participant user studies show that EchoWrist is effective and efficient at tracking 3D hand poses and recognizing hand-object interactions. Operating at 57.9mW, EchoWrist is able to continuously reconstruct 20 3D hand joints with MJEDE of 4.81mm and recognize 12 naturalistic hand-object interactions with 97.6% accuracy.

Submitted 29 March, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.17093 [pdf, other]

StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis

Authors: Zecheng Tang, Chenfei Wu, Zekai Zhang, Mingheng Ni, Shengming Yin, Yu Liu, Zhengyuan Yang, Lijuan Wang, Zicheng Liu, Juntao Li, Nan Duan

Abstract: To leverage LLMs for visual synthesis, traditional methods convert raster image information into discrete grid tokens through specialized visual modules, while disrupting the model's ability to capture the true semantic representation of visual scenes. This paper posits that an alternative representation of images, vector graphics, can effectively surmount this limitation by enabling a more natural and semantically coherent segmentation of the image information. Thus, we introduce StrokeNUWA, a pioneering work exploring a better visual representation ''stroke tokens'' on vector graphics, which is inherently visual semantics rich, naturally compatible with LLMs, and highly compressed. Equipped with stroke tokens, StrokeNUWA can significantly surpass traditional LLM-based and optimization-based methods across various metrics in the vector graphic generation task. Besides, StrokeNUWA achieves up to a 94x speedup in inference over the speed of prior methods with an exceptional SVG code compression ratio of 6.9%.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.13886 [pdf]

doi 10.1038/s41567-023-02349-0

Observation of possible excitonic charge density waves and metal-insulator transitions in atomically thin semimetals

Authors: Qiang Gao, Yang-hao Chan, Pengfei Jiao, Haiyang Chen, Shuaishuai Yin, Kanjanaporn Tangprapha, Yichen Yang, Xiaolong Li, Zhengtai Liu, Dawei Shen, Shengwei Jiang, Peng Chen

Abstract: Charge density wave (CDW) is a collective quantum phenomenon with a charge modulation in solids [1-2]. Condensation of electron and hole pairs with finite momentum will lead to such an ordered state [3-7]. However, lattice symmetry breaking manifested as the softening of phonon modes can occur simultaneously, which makes it difficult to disentangle the origin of the transition [8-14]. Here, we report a condensed phase in low dimensional HfTe2, whereas angle-resolved photoemission spectroscopy (ARPES) measurements show a metal-insulator transition by lowering the temperature in single triatomic layer (TL) HfTe2. A full gap opening, renormalization of the bands, and emergence of replica bands at the M point are observed in the low temperatures, indicating formation of a CDW in the ground state. Raman spectroscopy shows no sign of lattice distortion within the detection limit. The results are corroborated by first-principles calculations, demonstrating the electronic origin of the CDW. By adding more layers, the phase transition is suppressed and completely destroyed at 3 TL because of the increased screening around the Fermi surface. Interestingly, a small amount of electron doping in 1 TL film during the growth significantly raises the transition temperature (TC), which is attributed to a reduced screening effect and a more balanced electron and hole carrier density. Our results indicate a CDW formation mechanism consistent with the excitonic insulator phase in low dimensional HfTe2 and open up opportunity for realization of novel quantum states based on exciton condensation.

Submitted 24 January, 2024; originally announced January 2024.

Comments: https://www.nature.com/articles/s41567-023-02349-0 published in Nature Physics

arXiv:2401.00744 [pdf, other]

Towards Harmonization of SO(3)-Equivariance and Expressiveness: a Hybrid Deep Learning Framework for Electronic-Structure Hamiltonian Prediction

Authors: Shi Yin, Xinyang Pan, Xudong Zhu, Tianyu Gao, Haochong Zhang, Feng Wu, Lixin He

Abstract: Deep learning for predicting the electronic-structure Hamiltonian of quantum systems necessitates satisfying the covariance laws, among which achieving SO(3)-equivariance without sacrificing the non-linear expressive capability of networks remains unsolved. To navigate the harmonization between equivariance and expressiveness, we propose a deep learning method synergizing two distinct categories of neural mechanisms as a two-stage encoding and regression framework. The first stage corresponds to group theory-based neural mechanisms with inherent SO(3)-equivariant properties prior to the parameter learning process, while the second stage is characterized by a non-linear 3D graph Transformer network we propose, featuring high capability on non-linear expressiveness. The novel combination lies in the point that, the first stage predicts baseline Hamiltonians with abundant SO(3)-equivariant features extracted, assisting the second stage in empirical learning of equivariance; and in turn, the second stage refines the first stage's output as a fine-grained prediction of Hamiltonians using powerful non-linear neural mappings, compensating for the intrinsic weakness on non-linear expressiveness capability of mechanisms in the first stage. Our method enables precise, generalizable predictions while capturing SO(3)-equivariance under rotational transformations, and achieves state-of-the-art performance in Hamiltonian prediction on six benchmark databases.

Submitted 21 June, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.13986 [pdf]

Deep Learning Enabled Design of Terahertz High-Q Metamaterials

Authors: Shan Yin, Haotian Zhong, Wei Huang, Wentao Zhang, Jiaguang Han

Abstract: Metamaterials open up a new way to manipulate electromagnetic waves and realize various functional devices. Metamaterials with high-quality (Q) resonance responses are widely employed in sensing, detection, and other applications. Traditional design of metamaterials involves laborious simulation-optimization and limits the efficiency. The high-Q metamaterials with abrupt spectral change are even harder to reverse design on-demand. In this paper, we propose novel solutions for designing terahertz high-Q metamaterials based on deep learning, including the forward prediction of spectral responses and the inverse design of structural parameters. For the forward prediction, we develop the Electromagnetic Response Transformer (ERT) model to establish the complex mapping relations between the highly sensitive structural parameters and the abrupt spectra, and realize precise prediction of the high-Q resonance in terahertz spectra from given structural parameters. For the inverse design, we introduce the Visual Attention Network (VAN) model with a large model capability to attentively learn the abrupt shifts in spectral resonances, which can efficiently reduce errors and achieve highly accurate inverse design of structural parameters according to the expected high-Q resonance responses. Both models exhibit outstanding performance, and the accuracy is improved one or two orders higher compared to the traditional machine learning methods. Besides, our ERT model can be 4000 times faster than the conventional full wave simulations in computation time. Our work provides new avenues for the deep learning enabled design of terahertz high-Q metamaterials, which holds potential applications in various fields, such as terahertz communication, sensing, imaging, and functional devices.

Submitted 21 December, 2023; originally announced December 2023.

Comments: 17 pages, 6 figures

arXiv:2312.11820 [pdf, other]

SoC-Tuner: An Importance-guided Exploration Framework for DNN-targeting SoC Design

Authors: Shixin Chen, Su Zheng, Chen Bai, Wenqian Zhao, Shuo Yin, Yang Bai, Bei Yu

Abstract: Designing a system-on-chip (SoC) for deep neural network (DNN) acceleration requires balancing multiple metrics such as latency, power, and area. However, most existing methods ignore the interactions among different SoC components and rely on inaccurate and error-prone evaluation tools, leading to inferior SoC design. In this paper, we present SoC-Tuner, a DNN-targeting exploration framework to find the Pareto optimal set of SoC configurations efficiently. Our framework constructs a thorough SoC design space of all components and divides the exploration into three phases. We propose an importance-based analysis to prune the design space, a sampling algorithm to select the most representative initialization points, and an information-guided multi-objective optimization method to balance multiple design metrics of SoC design. We validate our framework with the actual very-large-scale-integration (VLSI) flow on various DNN benchmarks and show that it outperforms previous methods. To the best of our knowledge, this is the first work to construct an exploration framework of SoCs for DNN acceleration.

Submitted 18 December, 2023; originally announced December 2023.

Comments: ASP-DAC 2024

arXiv:2312.05739 [pdf, other]

GAMC: An Unsupervised Method for Fake News Detection using Graph Autoencoder with Masking

Authors: Shu Yin, Chao Gao, Zhen Wang

Abstract: With the rise of social media, the spread of fake news has become a significant concern, potentially misleading public perceptions and impacting social stability. Although deep learning methods like CNNs, RNNs, and Transformer-based models like BERT have enhanced fake news detection, they primarily focus on content, overlooking social context during news propagation. Graph-based techniques have incorporated this social context but are limited by the need for large labeled datasets. Addressing these challenges, this paper introduces GAMC, an unsupervised fake news detection technique using the Graph Autoencoder with Masking and Contrastive learning. By leveraging both the context and content of news propagation as self-supervised signals, our method negates the requirement for labeled datasets. We augment the original news propagation graph, encode these with a graph encoder, and employ a graph decoder for reconstruction. A unique composite loss function, including reconstruction error and contrast loss, is designed. The method's contributions are: introducing self-supervised learning to fake news detection, proposing a graph autoencoder integrating two distinct losses, and validating our approach's efficacy through real-world dataset experiments.

Submitted 9 December, 2023; originally announced December 2023.

Journal ref: the Thirty-Eighth AAAI Conference on Artificial Intelligence,2024

arXiv:2311.15534 [pdf, other]

Analogue of collectively induced transparency in metamaterials

Authors: Wei Huang, Shi-Ting Cao, Xiaowei Qu, Shan Yin, Wentao Zhang

Abstract: Most recently, a brand new optical phenomenon, collectively induced transparency (CIT), has been proposed in the cavity quantum electrodynamics system, which comes from the coupling between the cavity and ions and the quantum interference of collective ions. Due to the equivalent analogue of quantum optics, metamaterials are also a good platform to realize CIT, which can be useful for highly sensitive metamaterial sensors, optical switches and photo-memory. In this paper, we propose the coupling of bright mode and interference of dark modes to realize the CIT in a terahertz (THz) metamaterial system. We give the theoretical analysis, analytical solutions, simulations and experiments to demonstrate our idea.

Submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.15030 [pdf, other]

Tuning-free Quasi-stiffness Control Framework of a Powered Transfemoral Prosthesis for Task-adaptive Walking

Authors: Teng Ma, Shucong Yin, Zhimin Hou, Binxin Huang, Haoyong Yu, Chenglong Fu

Abstract: Impedance-based control represents a prevalent strategy in the development of powered transfemoral prostheses. However, creating a task-adaptive, tuning-free controller that effectively generalizes across diverse locomotion modes and terrain conditions continues to be a significant challenge. This letter proposes a tuning-free and task-adaptive quasi-stiffness control framework for powered prostheses that generalizes across various walking tasks, including the torque-angle relationship reconstruction part and the quasi-stiffness controller design part. A Gaussian Process Regression (GPR) model is introduced to predict the target features of the human joint angle and torque in a new task. Subsequently, a Kernelized Movement Primitives (KMP) is employed to reconstruct the torque-angle relationship of the new task from multiple human reference trajectories and estimated target features. Based on the torque-angle relationship of the new task, a quasi-stiffness control approach is designed for a powered prosthesis. Finally, the proposed framework is validated through practical examples, including varying speeds and inclines walking tasks. Notably, the proposed framework not only aligns with but frequently surpasses the performance of a benchmark finite state machine impedance controller (FSMIC) without necessitating manual impedance tuning and has the potential to expand to variable walking tasks in daily life for the transfemoral amputees.

Submitted 26 March, 2024; v1 submitted 25 November, 2023; originally announced November 2023.

Comments: 8 pages, 10 figures. This work has been submitted to the IEEE-RAL for possible publication

arXiv:2311.12259 [pdf, ps, other]

doi 10.1016/j.physletb.2024.138797

Analytical models of supermassive black holes in galaxies surrounded by dark matter halos

Authors: Zibo Shen, Anzhong Wang, Yungui Gong, Shaoyu Yin

Abstract: In this Letter, we present five analytical models in closed forms, each representing a supermassive black hole (SMBH) located at the center of a galaxy surrounded by a dark matter (DM) halo. The density profile of the halo vanishes inside twice the Schwarzschild radius of the hole and satisfies the weak, strong, and dominant energy conditions. The spacetimes are asymptotically flat, and the difference among the models lies in the slopes of the density profiles in the spike and regions far from the center of the galaxy. Three of them represent cusp models, whereas the other two represent core models. With the well-known (generalized) Newman-Janis algorithm, rotating SMBHs with DM halos can be easily constructed from these models.

Submitted 19 June, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

Comments: revtex4-2, no figures. Version to appear in Phys. Lett. B 855 (2024) 138797

Journal ref: Phys. Lett. B 855 (2024) 138797

arXiv:2311.06203 [pdf, other]

Relaxation Critical Dynamics with Emergent Symmetry

Authors: Yu-Rong Shu, Shuai Yin

Abstract: Unlike a usual critical point characterized by a single length scale, a critical point with emergent symmetry exhibits intriguing critical properties characterized by two relevant length scales, attracting long-term investigations from both theoretical and experimental aspects. A natural question is how the critical dynamics is affected by the presence of two relevant length scales. Here we study the relaxation critical dynamics in the three-dimensional ($3$D) clock model, whose critical point has emergent $U(1)$ symmetry. We find that in contrast to the magnetization $M$, whose relaxation process is described by the usual dynamic exponent $z$ of the $3$D $XY$ universality class, the angular order parameter $φ_q$ shows a two-stage evolution characterized by different dynamic critical exponents. While in the short-time stage the relaxation dynamics is governed by $z$, in the long-time stage the dynamics is controlled by a new dynamic exponent $z'$. We also show the off-critical-point effects in the critical relaxation. Our results may be experimentally detected in the hexagonal RMnO$_3$ (R$=$rare earth) materials.

Submitted 10 November, 2023; originally announced November 2023.

Comments: 7 pages, 4 figures

Showing 1–50 of 295 results for author: Yin, S