-
Leveraging The Finite States of Emotion Processing to Study Late-Life Mental Health
Authors:
Yuanzhe Huang,
Saurab Faruque,
Minjie Wu,
Akiko Mizuno,
Eduardo Diniz,
Shaolin Yang,
George Dewitt Stetten,
Noah Schweitzer,
Hecheng Jin,
Linghai Wang,
Howard J. Aizenstein
Abstract:
Traditional approaches in mental health research apply General Linear Models (GLMs) to describe the longitudinal dynamics of observed psycho-behavioral measurements (questionnaire summary scores). GLMs are also applied to characterize relationships between neurobiological measurements (regional fMRI signals) and perceptual stimuli or other regional signals. While these methods are useful for exploring linear correlations among the isolated signals of those constructs (i.e., summary scores or fMRI signals), these classical frameworks fall short in providing insights into the comprehensive system-level dynamics underlying observable changes. Hidden Markov Models (HMMs) are statistical models that enable us to describe the sequential relations among multiple observable constructs and, when applied through the lens of Finite State Automata (FSA), can provide a more integrated and intuitive framework for modeling and understanding the underlying controller (the prescription for how to respond to inputs) that fundamentally defines any system, as opposed to linearly correlating output signals produced by the controller. We present a simple and intuitive HMM processing pipeline, vcHMM (see Preliminary Data), that highlights FSA theory and is applicable to both behavioral analysis of questionnaire data and fMRI data. HMMs offer theoretical promise as they are computationally equivalent to the FSA, the control processor of a Turing Machine (TM). The dynamic-programming Viterbi algorithm is used to leverage the HMM: it efficiently identifies the most likely sequence of hidden states. The vcHMM pipeline leverages this grammar to understand how behavior and neural activity relate to depression.
Submitted 5 March, 2024;
originally announced March 2024.
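The abstract above relies on Viterbi decoding to recover the most likely hidden-state sequence of an HMM. The following is a minimal sketch of that general algorithm, not the authors' vcHMM pipeline. The two states, the three observation symbols, and all probabilities are hypothetical toy values chosen only for illustration.

```python
import math

# Hypothetical toy HMM (assumption: not taken from the paper).
STATES = ["S0", "S1"]
OBSERVATIONS = ["a", "b", "c"]
START_P = {"S0": 0.6, "S1": 0.4}
TRANS_P = {"S0": {"S0": 0.7, "S1": 0.3}, "S1": {"S0": 0.4, "S1": 0.6}}
EMIT_P = {
    "S0": {"a": 0.5, "b": 0.4, "c": 0.1},
    "S1": {"a": 0.1, "b": 0.3, "c": 0.6},
}


def _log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (path, log_prob) of the most likely hidden-state sequence."""
    # best[t][s]: best log-probability of any path that ends in state s at time t.
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path.
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + _log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]
```

Calling `viterbi(OBSERVATIONS, STATES, START_P, TRANS_P, EMIT_P)` on this toy model returns the path `["S0", "S0", "S1"]` with probability 0.01512 (the function reports its logarithm). Working in log space avoids numerical underflow on long sequences.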
-
Ultralight vector dark matter search using data from the KAGRA O3GK run
Authors:
The LIGO Scientific Collaboration,
the Virgo Collaboration,
the KAGRA Collaboration,
A. G. Abac,
R. Abbott,
H. Abe,
I. Abouelfettouh,
F. Acernese,
K. Ackley,
C. Adamcewicz,
S. Adhicary,
N. Adhikari,
R. X. Adhikari,
V. K. Adkins,
V. B. Adya,
C. Affeldt,
D. Agarwal,
M. Agathos,
O. D. Aguiar,
I. Aguilar,
L. Aiello,
A. Ain,
P. Ajith,
T. Akutsu,
S. Albanesi
, et al. (1778 additional authors not shown)
Abstract:
Among the various candidates for dark matter (DM), ultralight vector DM can be probed by laser interferometric gravitational wave detectors through the measurement of oscillating length changes in the arm cavities. In this context, KAGRA has a unique feature due to differing compositions of its mirrors, enhancing the signal of vector DM in the length change in the auxiliary channels. Here we present the result of a search for $U(1)_{B-L}$ gauge boson DM using the KAGRA data from auxiliary length channels during the first joint observation run together with GEO600. By applying our search pipeline, which takes into account the stochastic nature of ultralight DM, upper bounds on the coupling strength between the $U(1)_{B-L}$ gauge boson and ordinary matter are obtained for a range of DM masses. While our constraints are less stringent than those derived from previous experiments, this study demonstrates the applicability of our method to the lower-mass vector DM search, which is made difficult in this measurement by the short observation time compared to the auto-correlation time scale of DM.
Submitted 5 March, 2024;
originally announced March 2024.
-
A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods
Authors:
Hanlei Jin,
Yang Zhang,
Dan Meng,
Jun Wang,
Jinghua Tan
Abstract:
Automatic Text Summarization (ATS), utilizing Natural Language Processing (NLP) algorithms, aims to create concise and accurate summaries, thereby significantly reducing the human effort required in processing large volumes of text. ATS has drawn considerable interest in both academic and industrial circles. Many studies have been conducted in the past to survey ATS methods; however, they generally lack practicality for real-world implementations, as they often categorize previous methods from a theoretical standpoint. Moreover, the advent of Large Language Models (LLMs) has altered conventional ATS methods. In this survey, we aim to 1) provide a comprehensive overview of ATS from a ``Process-Oriented Schema'' perspective, which is best aligned with real-world implementations; 2) comprehensively review the latest LLM-based ATS works; and 3) deliver an up-to-date survey of ATS, bridging the two-year gap in the literature. To the best of our knowledge, this is the first survey to specifically investigate LLM-based ATS methods.
Submitted 5 March, 2024;
originally announced March 2024.
-
Towards Training A Chinese Large Language Model for Anesthesiology
Authors:
Zhonghai Wang,
Jie Jiang,
Yibing Zhan,
Bohao Zhou,
Yanhong Li,
Chong Zhang,
Liang Ding,
Hua Jin,
Jun Peng,
Xu Lin,
Weifeng Liu
Abstract:
Medical large language models (LLMs) have gained popularity recently due to their significant practical utility. However, most existing research focuses on general medicine, and there is a need for in-depth study of LLMs in specific fields like anesthesiology. To fill the gap, we introduce Hypnos, a Chinese Anesthesia model built upon existing LLMs, e.g., Llama. Hypnos' contributions have three aspects: 1) Data acquired from current LLMs, for example via Self-Instruct, likely include inaccuracies. Hypnos implements a cross-filtering strategy to improve the data quality. This strategy involves using one LLM to assess the quality of the data generated by another LLM and filtering out the low-quality data. 2) Hypnos employs a general-to-specific training strategy that starts by fine-tuning LLMs on general medicine data and subsequently improves the fine-tuned LLMs using data specifically from anesthesiology. The general medical data supplement the medical expertise in anesthesiology and enhance the effectiveness of Hypnos' generation. 3) We introduce a standardized benchmark for evaluating medical LLMs in anesthesiology. Our benchmark includes both publicly available instances from the Internet and privately obtained cases from the hospital. Hypnos outperforms other medical LLMs in anesthesiology under metric-based, GPT-4, and human evaluation on the benchmark dataset.
Submitted 5 March, 2024;
originally announced March 2024.
-
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Authors:
Heegon Jin,
Seonil Son,
Jemin Park,
Youngseok Kim,
Hyungjong Noh,
Yeonsoo Lee
Abstract:
The advent of scalable deep models and large datasets has improved the performance of Neural Machine Translation. Knowledge Distillation (KD) enhances efficiency by transferring knowledge from a teacher model to a more compact student model. However, KD approaches to Transformer architecture often rely on heuristics, particularly when deciding which teacher layers to distill from. In this paper, we introduce the 'Align-to-Distill' (A2D) strategy, designed to address the feature mapping problem by adaptively aligning student attention heads with their teacher counterparts during training. The Attention Alignment Module in A2D performs a dense head-by-head comparison between student and teacher attention heads across layers, turning the combinatorial mapping heuristics into a learning problem. Our experiments show the efficacy of A2D, demonstrating gains of up to +3.61 and +0.63 BLEU points for WMT-2022 De->Dsb and WMT-2014 En->De, respectively, compared to Transformer baselines.
Submitted 25 March, 2024; v1 submitted 3 March, 2024;
originally announced March 2024.
-
Phonon-pair-driven Ferroelectricity Causes Costless Domain-walls and Bulk-boundary Duality
Authors:
Hyun-Jae Lee,
Kyoung-June Go,
Pawan Kumar,
Chang Hoon Kim,
Yungyeom Kim,
Kyoungjun Lee,
Takao Shimizu,
Seung Chul Chae,
Hosub Jin,
Minseong Lee,
Umesh Waghmare,
Si-Young Choi,
Jun Hee Lee
Abstract:
Ferroelectric domain walls, recognized as distinct from the bulk in terms of symmetry, structure, and electronic properties, host exotic phenomena including conductive walls, ferroelectric vortices, novel topologies, and negative capacitance. Contrary to conventional understanding, our study reveals that the structure of domain walls in HfO2 closely resembles its bulk. First, our first-principles simulations unveil that the robust ferroelectricity is supported by bosonic pairing of all the anionic phonons in bulk HfO2. Strikingly, the paired phonons strongly bond with each other and successfully reach the center of the domain wall without losing their integrity and produce bulk-like domain walls. We then confirmed preservation of the bulk phonon displacements and consequently full revival of the bulk structure at domain walls via aberration-corrected STEM. The newly found duality between the bulk and the domain wall sheds light on previously enigmatic properties such as zero-energy domain walls, perfect Ising-type polar ordering, and exceptionally robust ferroelectricity at the sub-nm scales. The phonon-pairing discovered here is robust against physical boundaries such as domain walls and enables zero momentum and zero-energy cost local ferroelectric switching. This phenomenon demonstrated in Si-compatible ferroelectrics provides a novel technological platform where data storage on domain walls is as feasible as that within the domains, thereby expanding the potential for high-density data storage and advanced ferroelectric applications.
Submitted 3 March, 2024;
originally announced March 2024.
-
Orbifold Kodaira-Spencer maps and closed-string mirror symmetry for punctured Riemann surfaces
Authors:
Hansol Hong,
Hyeongjun Jin,
Sangwook Lee
Abstract:
When a Weinstein manifold admits an action of a finite abelian group, we propose its mirror construction following the equivariant TQFT-type construction, and obtain as a mirror the orbifolding of the mirror of the quotient with respect to the induced dual group action. As an application, we construct an orbifold Landau-Ginzburg mirror of a punctured Riemann surface given as an abelian cover of the pair-of-pants, and prove its closed-string mirror symmetry using the (part of) closed-open map twisted by the dual group action.
Submitted 22 February, 2024;
originally announced February 2024.
-
A sequence of Type Ib, IIb, II-L, and II-P supernovae from binary-star progenitors of varying initial separation
Authors:
Luc Dessart,
Claudia P. Gutierrez,
Andrea Ercolino,
Harim Jin,
Norbert Langer
Abstract:
Over the last decade, evidence has accumulated that massive stars do not typically evolve in isolation but instead follow a tumultuous journey with a companion star on their way to core collapse. While Roche-lobe overflow appears instrumental for the production of a large fraction of supernovae (SNe) of Type Ib and Ic, variations in the initial orbital period Pinit of massive interacting binaries may also produce a wide diversity of case B, BC, or C systems, with preSN stars endowed with minute to massive H-rich envelopes. Focusing here on the explosion of the primary, donor star, originally of 12.6 Msun, we use radiation-hydrodynamics and NLTE time-dependent radiative transfer to document the gas and radiation properties of such SNe, covering Types Ib, IIb, II-L, and II-P. Variations in Pinit are the root cause behind the wide diversity of our SN light curves, with single-peak, double-peak, fast-declining or plateau-like morphologies in the V band. The different ejecta structures, expansion rates, and relative abundances (e.g., H, He, 56Ni) are conducive to much diversity in spectral line shapes (absorption vs emission strength, width) and evolution. We emphasize that Halpha is a key tracer of these modulations, and that HeI7065 is an enduring optical diagnostic for the presence of He. Our grid of simulations fares well against representative SNe Ib, IIb, and IIP, but interaction with circumstellar material, which is ignored in this work, is likely at the origin of the tension between our Type IIL SN models and observations (e.g., SN2006Y). Remaining discrepancies in our model rise time to bolometric maximum call for a proper account of both small-scale and large-scale structures in core-collapse SN ejecta. Discrepant Type IIP SN models, with a large plateau brightness but small line widths, may be cured by adopting more compact red-supergiant star progenitors.
Submitted 20 February, 2024;
originally announced February 2024.
-
PAC-FNO: Parallel-Structured All-Component Fourier Neural Operators for Recognizing Low-Quality Images
Authors:
Jinsung Jeon,
Hyundong Jin,
Jonghyun Choi,
Sanghyun Hong,
Dongeun Lee,
Kookjin Lee,
Noseong Park
Abstract:
A standard practice in developing image recognition models is to train a model on a specific image resolution and then deploy it. However, in real-world inference, models often encounter images different from the training sets in resolution and/or subject to natural variations such as weather changes, noise types and compression artifacts. While traditional solutions involve training multiple models for different resolutions or input variations, these methods are computationally expensive and thus do not scale in practice. To this end, we propose a novel neural network model, parallel-structured and all-component Fourier neural operator (PAC-FNO), that addresses the problem. Unlike conventional feed-forward neural networks, PAC-FNO operates in the frequency domain, allowing it to handle images of varying resolutions within a single model. We also propose a two-stage algorithm for training PAC-FNO with a minimal modification to the original, downstream model. Moreover, the proposed PAC-FNO is ready to work with existing image recognition models. Extensively evaluating methods with seven image recognition benchmarks, we show that the proposed PAC-FNO improves the performance of existing baseline models on images with various resolutions by up to 77.1% and various types of natural variations in the images at inference.
Submitted 14 March, 2024; v1 submitted 20 February, 2024;
originally announced February 2024.
-
Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model
Authors:
Junghun Cha,
Ali Haider,
Seoyun Yang,
Hoeyeong Jin,
Subin Yang,
A. F. M. Shahab Uddin,
Jaehyoung Kim,
Soo Ye Kim,
Sung-Ho Bae
Abstract:
A significant volume of analog information, i.e., documents and images, has been digitized in the form of scanned copies for storing, sharing, and/or analyzing in the digital world. However, the quality of such content is severely degraded by various distortions caused by printing, storing, and scanning processes in the physical world. Although restoring high-quality content from scanned copies has become an indispensable task for many products, it has not been systematically explored, and to the best of our knowledge, no public datasets are available. In this paper, we define this problem as Descanning and introduce a new high-quality and large-scale dataset named DESCAN-18K. It contains 18K pairs of original and scanned images collected in the wild containing multiple complex degradations. In order to eliminate such complex degradations, we propose a new image restoration model called DescanDiffusion consisting of a color encoder that corrects the global color degradation and a conditional denoising diffusion probabilistic model (DDPM) that removes local degradations. To further improve the generalization ability of DescanDiffusion, we also design a synthetic data generation scheme by reproducing prominent degradations in scanned images. We demonstrate that our DescanDiffusion outperforms other baselines including commercial restoration products, objectively and subjectively, via comprehensive experiments and analyses.
Submitted 7 February, 2024;
originally announced February 2024.
-
Hot Carriers from Intra- and Interband Transitions in Gold-Silver Alloy Nanoparticles
Authors:
Shreyas Ramachandran,
Simao Joao,
Hanwen Jin,
Johannes Lischner
Abstract:
Hot electrons and holes generated from the decay of localized surface plasmons in metallic nanoparticles can be harnessed for applications in solar energy conversion and sensing. In this paper, we study the generation of hot carriers in large spherical gold-silver alloy nanoparticles using a recently developed atomistic modelling approach that combines a solution of Maxwell's equations with large-scale tight-binding simulations. We find that hot-carrier properties depend sensitively on the alloy composition. Specifically, nanoparticles with a large gold fraction produce hot carriers under visible light illumination while nanoparticles with a large silver fraction require higher photon energies to produce hot carriers. Moreover, most hot carriers in nanoparticles with a large gold fraction originate from interband transitions which give rise to energetic holes and "cold" electrons near the Fermi level. Increasing the silver fraction enhances the generation rate of hot carriers from intraband transitions which produce energetic electrons and "cold" holes. These findings demonstrate that alloy composition is a powerful tuning parameter for the design of nanoparticles for applications in solar energy conversion and sensing that require precise control of hot-carrier properties.
Submitted 7 February, 2024;
originally announced February 2024.
-
GUARD: Role-playing to Generate Natural-language Jailbreakings to Test Guideline Adherence of Large Language Models
Authors:
Haibo Jin,
Ruoxi Chen,
Andy Zhou,
Yang Zhang,
Haohan Wang
Abstract:
The discovery of "jailbreaks" that bypass the safety filters of Large Language Models (LLMs) and elicit harmful responses has encouraged the community to implement safety measures. One major safety measure is to proactively test the LLMs with jailbreaks prior to the release. Therefore, such testing will require a method that can generate jailbreaks massively and efficiently. In this paper, we follow a novel yet intuitive strategy to generate jailbreaks in the style of human generation. We propose a role-playing system that assigns four different roles to the user LLMs to collaborate on new jailbreaks. Furthermore, we collect existing jailbreaks and split them into different independent characteristics using clustering frequency and semantic patterns sentence by sentence. We organize these characteristics into a knowledge graph, making them more accessible and easier to retrieve. Our system of different roles will leverage this knowledge graph to generate new jailbreaks, which have proved effective in inducing LLMs to generate unethical or guideline-violating responses. In addition, we also pioneer a setting in our system that will automatically follow the government-issued guidelines to generate jailbreaks to test whether LLMs follow the guidelines accordingly. We refer to our system as GUARD (Guideline Upholding through Adaptive Role-play Diagnostics). We have empirically validated the effectiveness of GUARD on three cutting-edge open-sourced LLMs (Vicuna-13B, LongChat-7B, and Llama-2-7B), as well as a widely-utilized commercial LLM (ChatGPT). Moreover, our work extends to the realm of vision language models (MiniGPT-v2 and Gemini Vision Pro), showcasing GUARD's versatility and contributing valuable insights for the development of safer, more reliable LLM-based applications across diverse modalities.
Submitted 30 May, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Authors:
Zirui Liu,
Jiayi Yuan,
Hongye Jin,
Shaochen Zhong,
Zhaozhuo Xu,
Vladimir Braverman,
Beidi Chen,
Xia Hu
Abstract:
Efficiently serving large language models (LLMs) requires batching many requests together to reduce the cost per request. Yet, the key-value (KV) cache, which stores attention keys and values to avoid re-computations, significantly increases memory demands and becomes the new bottleneck in speed and memory usage. This memory demand increases with larger batch sizes and longer context lengths. Additionally, the inference speed is limited by the size of KV cache, as the GPU's SRAM must load the entire KV cache from the main GPU memory for each token generated, causing the computational core to be idle during this process. A straightforward and effective solution to reduce KV cache size is quantization, which decreases the total bytes taken by KV cache. However, there is a lack of in-depth studies that explore the element distribution of KV cache to understand the hardness and limitation of KV cache quantization. To fill the gap, we conducted a comprehensive study on the element distribution in KV cache of popular LLMs. Our findings indicate that the key cache should be quantized per-channel, i.e., group elements along the channel dimension and quantize them together. In contrast, the value cache should be quantized per-token. From this analysis, we developed a tuning-free 2bit KV cache quantization algorithm, named KIVI. With the hardware-friendly implementation, KIVI can enable Llama (Llama-2), Falcon, and Mistral models to maintain almost the same quality while using $\mathbf{2.6\times}$ less peak memory usage (including the model weight). This reduction in memory usage enables up to $\mathbf{4\times}$ larger batch size, bringing $\mathbf{2.35\times \sim 3.47\times}$ throughput on real LLM inference workload. The source code is available at https://github.com/jy-yuan/KIVI.
Submitted 5 February, 2024;
originally announced February 2024.
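The abstract above distinguishes per-channel from per-token quantization. The sketch below illustrates only that grouping choice with plain asymmetric 2-bit min/max quantization. It is not the KIVI implementation. The helper names and the toy `KEYS` matrix (rows are tokens, columns are channels, with one large-magnitude channel assumed) are hypothetical.

```python
# Hypothetical toy key cache: 4 tokens x 3 channels, channel 1 has large values.
KEYS = [
    [0.1, 8.0, -0.2],
    [0.0, 9.0, 0.3],
    [-0.1, 7.5, 0.1],
    [0.2, 8.5, -0.3],
]


def quantize_group(values, bits=2):
    """Asymmetric min/max quantization of one group to integer codes."""
    levels = (1 << bits) - 1  # 3 steps for 2 bits
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate real values."""
    return [c * scale + zero for c in codes]


def fake_quant_per_token(matrix, bits=2):
    """Each row (one token, all channels) shares one scale and zero point."""
    return [dequantize_group(*quantize_group(row, bits)) for row in matrix]


def fake_quant_per_channel(matrix, bits=2):
    """Each column (one channel, all tokens) shares one scale and zero point."""
    columns = [list(col) for col in zip(*matrix)]
    deq_cols = [dequantize_group(*quantize_group(col, bits)) for col in columns]
    return [list(row) for row in zip(*deq_cols)]


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
```

On this toy matrix, `max_abs_error(KEYS, fake_quant_per_channel(KEYS))` comes out smaller than the per-token counterpart, because the large-magnitude channel no longer stretches the quantization range shared with the small-valued channels.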
-
DARCS: Memory-Efficient Deep Compressed Sensing Reconstruction for Acceleration of 3D Whole-Heart Coronary MR Angiography
Authors:
Zhihao Xue,
Fan Yang,
Juan Gao,
Zhuo Chen,
Hao Peng,
Chao Zou,
Hang Jin,
Chenxi Hu
Abstract:
Three-dimensional coronary magnetic resonance angiography (CMRA) demands reconstruction algorithms that can significantly suppress the artifacts from a heavily undersampled acquisition. While unrolling-based deep reconstruction methods have achieved state-of-the-art performance on 2D image reconstruction, their application to 3D reconstruction is hindered by the large amount of memory needed to train an unrolled network. In this study, we propose a memory-efficient deep compressed sensing method by employing a sparsifying transform based on a pre-trained artifact estimation network. The motivation is that the artifact image estimated by a well-trained network is sparse when the input image is artifact-free, and less sparse when the input image is artifact-affected. Thus, the artifact-estimation network can be used as an inherent sparsifying transform. The proposed method, named De-Aliasing Regularization based Compressed Sensing (DARCS), was compared with a traditional compressed sensing method, de-aliasing generative adversarial network (DAGAN), model-based deep learning (MoDL), and plug-and-play for accelerations of 3D CMRA. The results demonstrate that the proposed method improved the reconstruction quality relative to the compared methods by a large margin. Furthermore, the proposed method generalized well to different undersampling rates and noise levels. The memory usage of the proposed method was only 63% of that needed by MoDL. In conclusion, the proposed method achieves improved reconstruction quality for 3D CMRA with reduced memory burden.
Submitted 2 February, 2024; v1 submitted 31 January, 2024;
originally announced February 2024.
-
Network-based Topic Structure Visualization
Authors:
Yeseul Jeon,
Jina Park,
Ick Hoon Jin,
Dongjun Chung
Abstract:
In the real world, many topics are inter-correlated, making it challenging to investigate their structure and relationships. Understanding the interplay between topics and their relevance can provide valuable insights for researchers, guiding their studies and informing the direction of research. In this paper, we utilize the topic-words distribution, obtained from topic models, as item-response data to model the structure of topics using a latent space item response model. By estimating the latent positions of topics based on their distances toward words, we can capture the underlying topic structure and reveal their relationships. Visualizing the latent positions of topics in Euclidean space allows for an intuitive understanding of their proximity and associations. We interpret relationships among topics by characterizing each topic based on representative words selected using a newly proposed scoring scheme. Additionally, we assess the maturity of topics by tracking their latent positions using different word sets, providing insights into the robustness of topics. To demonstrate the effectiveness of our approach, we analyze the topic composition of COVID-19 studies during the early stage of its emergence using biomedical literature in the PubMed database. The software and data used in this paper are publicly available at https://github.com/jeon9677/gViz .
Submitted 31 January, 2024;
originally announced January 2024.
-
Generative Video Diffusion for Unseen Cross-Domain Video Moment Retrieval
Authors:
Dezhao Luo,
Shaogang Gong,
Jiabo Huang,
Hailin Jin,
Yang Liu
Abstract:
Video Moment Retrieval (VMR) requires precise modelling of fine-grained moment-text associations to capture intricate visual-language relationships. Due to the lack of a diverse and generalisable VMR dataset to facilitate learning scalable moment-text associations, existing methods resort to joint training on both source and target domain videos for cross-domain applications. Meanwhile, recent developments in vision-language multimodal models pre-trained on large-scale image-text and/or video-text pairs are only based on coarse associations (weakly labelled). They are inadequate to provide fine-grained moment-text correlations required for cross-domain VMR. In this work, we solve the problem of unseen cross-domain VMR, where certain visual and textual concepts do not overlap across domains, by only utilising target domain sentences (text prompts) without accessing their videos. To that end, we explore generative video diffusion for fine-grained editing of source videos controlled by the target sentences, enabling us to simulate target domain videos. We address two problems in video editing for optimising unseen domain VMR: (1) generation of high-quality simulation videos of different moments with subtle distinctions, (2) selection of simulation videos that complement existing source training videos without introducing harmful noise or unnecessary repetitions. On the first problem, we formulate a two-stage video diffusion generation controlled simultaneously by (1) the original video structure of a source video, (2) subject specifics, and (3) a target sentence prompt. This ensures fine-grained variations between video moments. On the second problem, we introduce a hybrid selection mechanism that combines two quantitative metrics for noise filtering and one qualitative metric for leveraging VMR prediction on simulation video selection.
Submitted 29 January, 2024; v1 submitted 24 January, 2024;
originally announced January 2024.
-
Boron Abundances in Early B Dwarfs of the Galactic Open Cluster NGC 3293
Authors:
Charles R. Proffitt,
Harim Jin,
Simone Daflon,
Daniel J. Lennon,
Norbert Langer,
Katia Cunha,
Talawanda Monroe
Abstract:
New boron abundances or upper limits have been determined for 8 early-B stars in the young Galactic open cluster NGC 3293, using ultraviolet spectra obtained by the Hubble Space Telescope Cosmic Origins Spectrograph. With previous observations, there are now 18 early-B stars in this cluster with boron measurements. Six of the newly observed stars have projected rotational velocities greater than 200 km/s, allowing new constraints on rotationally driven mixing in main-sequence stars. When comparing to synthetic model populations, we find that the majority of our sample stars agree well with the predicted trends of stronger boron depletion for larger rotation and for larger mass or luminosity. Based on these comparisons, a rotational mixing efficiency smaller than the canonical value (fc = 0.0165 versus the more standard value of 0.033) appears to be required. However, our five most slowly rotating stars are not well explained by rotational mixing, and we speculate that they originate from binary mergers.
Submitted 2 May, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Large Transverse Thermopower in Shape-Engineered Tilted Leg Thermopile
Authors:
Ki Mun Bang,
Sang J. Park,
Hyun Yu,
Hyungyu Jin
Abstract:
We demonstrate that a novel device design, where a shape-engineered tilted-leg thermopile structure is employed, significantly enhances the output voltage in the transverse direction. Owing to the shape engineering of the leg geometry, an additional temperature gradient develops along the long direction of the leg, which is perpendicular to the direction of the applied temperature gradient, thereby generating an additional Seebeck voltage V_SE that adds to the Anomalous Nernst effect (ANE) voltage V_ANE. We further show that a simple adjustment of electrode position within the device can further increase V_SE. The tilted-leg device with electrode adjustment demonstrates a 990% enhanced transverse output voltage compared to that of conventional rectangular-leg thermopile-structured devices, wherein only the ANE occurs. This combined output voltage from both the Seebeck effect and ANE surpasses that of the state-of-the-art ANE materials and devices currently available. The numerical analysis shows the tendencies of the electrical and thermal outputs of the tilted-leg device, which points to ways to further improve the output voltage. Our study paves the way for developing highly efficient transverse TE devices that can overcome intrinsic materials challenges by utilizing the degree of freedom of device design.
Submitted 20 January, 2024;
originally announced January 2024.
-
Data-driven Option Pricing
Authors:
Min Dai,
Hanqing Jin,
Xi Yang
Abstract:
We propose an innovative data-driven option pricing methodology that relies exclusively on the dataset of historical underlying asset prices. While the dataset is rooted in the objective world, option prices are commonly expressed as discounted expectations of their terminal payoffs in a risk-neutral world. Bridging this gap motivates us to identify a pricing kernel process, transforming option pricing into evaluating expectations in the objective world. We recover the pricing kernel by solving a utility maximization problem, and evaluate the expectations in terms of a functional optimization problem. Leveraging the deep learning technique, we design data-driven algorithms to solve both optimization problems over the dataset. Numerical experiments are presented to demonstrate the efficiency of our methodology.
Submitted 20 January, 2024;
originally announced January 2024.
-
FedRKG: A Privacy-preserving Federated Recommendation Framework via Knowledge Graph Enhancement
Authors:
Dezhong Yao,
Tongtong Liu,
Qi Cao,
Hai Jin
Abstract:
Federated Learning (FL) has emerged as a promising approach for preserving data privacy in recommendation systems by training models locally. Recently, Graph Neural Networks (GNN) have gained popularity in recommendation tasks due to their ability to capture high-order interactions between users and items. However, privacy concerns prevent the global sharing of the entire user-item graph. To address this limitation, some methods create pseudo-interacted items or users in the graph to compensate for missing information for each client. Unfortunately, these methods introduce random noise and raise privacy concerns. In this paper, we propose FedRKG, a novel federated recommendation system, where a global knowledge graph (KG) is constructed and maintained on the server using publicly available item information, enabling higher-order user-item interactions. On the client side, a relation-aware GNN model leverages diverse KG relationships. To protect local interaction items and obscure gradients, we employ pseudo-labeling and Local Differential Privacy (LDP). Extensive experiments conducted on three real-world datasets demonstrate the competitive performance of our approach compared to centralized algorithms while ensuring privacy preservation. Moreover, FedRKG achieves an average accuracy improvement of 4% compared to existing federated learning baselines.
Submitted 19 January, 2024;
originally announced January 2024.
-
On the Effectiveness of Function-Level Vulnerability Detectors for Inter-Procedural Vulnerabilities
Authors:
Zhen Li,
Ning Wang,
Deqing Zou,
Yating Li,
Ruqian Zhang,
Shouhuai Xu,
Chao Zhang,
Hai Jin
Abstract:
Software vulnerabilities are a major cyber threat and it is important to detect them. One important approach to detecting vulnerabilities is to use deep learning while treating a program function as a whole, known as function-level vulnerability detectors. However, the limitation of this approach is not understood. In this paper, we investigate its limitation in detecting one class of vulnerabilities known as inter-procedural vulnerabilities, where the to-be-patched statements and the vulnerability-triggering statements belong to different functions. For this purpose, we create the first Inter-Procedural Vulnerability Dataset (InterPVD) based on C/C++ open-source software, and we propose a tool dubbed VulTrigger for identifying vulnerability-triggering statements across functions. Experimental results show that VulTrigger can effectively identify vulnerability-triggering statements and inter-procedural vulnerabilities. Our findings include: (i) inter-procedural vulnerabilities are prevalent with an average of 2.8 inter-procedural layers; and (ii) function-level vulnerability detectors are much less effective in detecting to-be-patched functions of inter-procedural vulnerabilities than detecting their counterparts of intra-procedural vulnerabilities.
Submitted 20 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Non-Fermi-liquid behavior in a ferromagnetic heavy fermion system CeTi$_{1-x}$V$_{x}$Ge$_{3}$
Authors:
R. -Z. Lin,
H. Jin,
P. Klavins,
W. -T. Chen,
Y. -Y. Chang,
C. -H. Chung,
V. Taufour,
C. -L. Huang
Abstract:
An investigation of the thermodynamic and electrical transport properties of the isoelectronic chemical substitution series CeTi$_{1-x}$V$_{x}$Ge$_{3}$ (CTVG) single crystals is reported. As x increases, the ferromagnetic (FM) transition temperature is suppressed, reaching absolute zero at the critical concentration x = 0.4, where a non-Fermi-liquid low-temperature specific heat and electrical resistivity, as well as the hyperscaling of specific heat and magnetization are found. Our study clearly identifies an FM quantum critical point (QCP) in CTVG. The obtained critical exponents suggest that CTVG falls in the preasymptotic region of the disorder-tuned FM QCP predicted by the Belitz-Kirkpatrick-Vojta theory.
Submitted 16 January, 2024;
originally announced January 2024.
-
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Authors:
Xu Yan,
Haiming Zhang,
Yingjie Cai,
Jingming Guo,
Weichao Qiu,
Bin Gao,
Kaiqiang Zhou,
Yue Zhao,
Huan Jin,
Jiantao Gao,
Zhen Li,
Lihui Jiang,
Wei Zhang,
Hongbo Zhang,
Dengxin Dai,
Bingbing Liu
Abstract:
The rise of large foundation models, trained on extensive datasets, is revolutionizing the field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by extracting intricate patterns and performing effectively across diverse tasks, thereby serving as potent building blocks for a wide range of AI applications. Autonomous driving, a vibrant front in AI applications, remains challenged by the lack of dedicated vision foundation models (VFMs). The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs in this field. This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions. Through a systematic analysis of over 250 papers, we dissect essential techniques for VFM development, including data preparation, pre-training strategies, and downstream task adaptation. Moreover, we explore key advancements such as NeRF, diffusion models, 3D Gaussian Splatting, and world models, presenting a comprehensive roadmap for future research. To empower researchers, we have built and maintained https://github.com/zhanghm1995/Forge_VFM4AD, an open-access repository constantly updated with the latest advancements in forging VFMs for autonomous driving.
Submitted 15 January, 2024;
originally announced January 2024.
-
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Authors:
Hongye Jin,
Xiaotian Han,
Jingfeng Yang,
Zhimeng Jiang,
Zirui Liu,
Chia-Yuan Chang,
Huiyuan Chen,
Xia Hu
Abstract:
It is well known that LLMs cannot generalize well to long contexts whose lengths are larger than the training sequence length. This poses challenges when employing LLMs for processing long input sequences during inference. In this work, we argue that LLMs themselves have inherent capabilities to handle long contexts without fine-tuning. To achieve this goal, we propose SelfExtend to extend the context window of LLMs by constructing bi-level attention information: the grouped attention and the neighbor attention. The grouped attention captures the dependencies among tokens that are far apart, while neighbor attention captures dependencies among adjacent tokens within a specified range. The two-level attentions are computed based on the original model's self-attention mechanism during inference. With minor code modification, our SelfExtend can effortlessly extend existing LLMs' context window without any fine-tuning. We conduct comprehensive experiments on multiple benchmarks and the results show that our SelfExtend can effectively extend existing LLMs' context window length. The code can be found at https://github.com/datamllab/LongLM.
Submitted 3 February, 2024; v1 submitted 2 January, 2024;
originally announced January 2024.
-
Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit
Authors:
Yao Wan,
Yang He,
Zhangqian Bi,
Jianguo Zhang,
Hongyu Zhang,
Yulei Sui,
Guandong Xu,
Hai Jin,
Philip S. Yu
Abstract:
Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. Currently, there is already a thriving research community focusing on code intelligence, with efforts spanning software engineering, machine learning, data mining, natural language processing, and programming languages. In this paper, we conduct a comprehensive literature review on deep learning for code intelligence, from the aspects of code representation learning, deep learning techniques, and application tasks. We also benchmark several state-of-the-art neural models for code intelligence, and provide an open-source toolkit tailored for the rapid prototyping of deep-learning-based code intelligence models. In particular, we inspect the existing code intelligence models on the basis of code representation learning, and provide a comprehensive overview of the present state of code intelligence. Furthermore, we publicly release the source code and data resources to provide the community with a ready-to-use benchmark, which can facilitate the evaluation and comparison of existing and future code intelligence models (https://xcodemind.github.io). Finally, we point out several challenging and promising directions for future research.
Submitted 30 December, 2023;
originally announced January 2024.
-
Towards Mitigating Dimensional Collapse of Representations in Collaborative Filtering
Authors:
Huiyuan Chen,
Vivian Lai,
Hongye Jin,
Zhimeng Jiang,
Mahashweta Das,
Xia Hu
Abstract:
Contrastive Learning (CL) has shown promising performance in collaborative filtering. The key idea is to generate augmentation-invariant embeddings by maximizing the Mutual Information between different augmented views of the same instance. However, we empirically observe that existing CL models suffer from the dimensional collapse issue, where user/item embeddings only span a low-dimensional subspace of the entire feature space. This suppresses other dimensional information and weakens the distinguishability of embeddings. Here we propose a non-contrastive learning objective, named nCL, which explicitly mitigates dimensional collapse of representations in collaborative filtering. Our nCL aims to achieve the geometric properties of Alignment and Compactness on the embedding space. In particular, alignment tries to push together representations of positive-related user-item pairs, while compactness tends to find the optimal coding length of user/item embeddings, subject to a given distortion. More importantly, our nCL requires neither data augmentation nor negative sampling during training, making it scalable to large datasets. Experimental results demonstrate the superiority of our nCL.
Submitted 28 December, 2023;
originally announced December 2023.
-
Compositional Generalization in Spoken Language Understanding
Authors:
Avik Ray,
Yilin Shen,
Hongxia Jin
Abstract:
State-of-the-art spoken language understanding (SLU) models have shown tremendous success in benchmark SLU datasets, yet they still fail in many practical scenarios due to the lack of model compositionality when trained on limited training data. In this paper, we study two types of compositionality: (a) novel slot combination, and (b) length generalization. We first conduct in-depth analysis, and find that state-of-the-art SLU models often learn spurious slot correlations during training, which leads to poor performance in both compositional cases. To mitigate these limitations, we create the first compositional splits of benchmark SLU datasets and we propose the first compositional SLU model, including compositional loss and paired training that tackle each compositional case respectively. On both benchmark and compositional splits in ATIS and SNIPS, we show that our compositional SLU model significantly outperforms (by up to 5% F1 score) the state-of-the-art BERT SLU model.
Submitted 25 December, 2023;
originally announced December 2023.
-
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Authors:
Xupeng Miao,
Gabriele Oliaro,
Zhihao Zhang,
Xinhao Cheng,
Hongyi Jin,
Tianqi Chen,
Zhihao Jia
Abstract:
In the rapidly evolving landscape of artificial intelligence (AI), generative large language models (LLMs) stand at the forefront, revolutionizing how we interact with our data. However, the computational intensity and memory consumption of deploying these models present substantial challenges in terms of serving efficiency, particularly in scenarios demanding low latency and high throughput. This survey addresses the imperative need for efficient LLM serving methodologies from a machine learning system (MLSys) research perspective, standing at the crux of advanced AI innovations and practical system optimizations. We provide in-depth analysis, covering a spectrum of solutions, ranging from cutting-edge algorithmic modifications to groundbreaking changes in system designs. The survey aims to provide a comprehensive understanding of the current state and future directions in efficient LLM serving, offering valuable insights for researchers and practitioners in overcoming the barriers of effective LLM deployment, thereby reshaping the future of AI.
Submitted 23 December, 2023;
originally announced December 2023.
-
MISA: Unveiling the Vulnerabilities in Split Federated Learning
Authors:
Wei Wan,
Yuxuan Ning,
Shengshan Hu,
Lulu Xue,
Minghui Li,
Leo Yu Zhang,
Hai Jin
Abstract:
Federated learning (FL) and split learning (SL) are prevailing distributed paradigms in recent years. They both enable shared global model training while keeping data localized on users' devices. The former excels in parallel execution capabilities, while the latter enjoys low dependence on edge computing resources and strong privacy protection. Split federated learning (SFL) combines the strengths of both FL and SL, making it one of the most popular distributed architectures. Furthermore, a recent study has claimed that SFL exhibits robustness against poisoning attacks, with a fivefold improvement compared to FL in terms of robustness.
In this paper, we present a novel poisoning attack known as MISA. It poisons both the top and bottom models, causing a misalignment in the global model, ultimately leading to a drastic accuracy collapse. This attack unveils the vulnerabilities in SFL, challenging the conventional belief that SFL is robust against poisoning attacks. Extensive experiments demonstrate that our proposed MISA poses a significant threat to the availability of SFL, underscoring the imperative for academia and industry to accord this matter due attention.
Submitted 19 December, 2023; v1 submitted 18 December, 2023;
originally announced December 2023.
-
Operator-learning-inspired Modeling of Neural Ordinary Differential Equations
Authors:
Woojin Cho,
Seunghyeon Cho,
Hyundong Jin,
Jinsung Jeon,
Kookjin Lee,
Sanghyun Hong,
Dongeun Lee,
Jonghyun Choi,
Noseong Park
Abstract:
Neural ordinary differential equations (NODEs), one of the most influential works in differential equation-based deep learning, continuously generalize residual networks and have opened a new field. They are currently utilized for various downstream tasks, e.g., image classification, time series classification, and image generation. Their key component is how to model the time-derivative of the hidden state, denoted dh(t)/dt. Conventional neural network architectures, e.g., fully-connected layers followed by non-linear activations, have habitually been used for this purpose. In this paper, however, we present a neural operator-based method to define the time-derivative term. Neural operators were initially proposed to model the differential operator of partial differential equations (PDEs). Since the time-derivative of NODEs can be understood as a special type of differential operator, our proposed method, called branched Fourier neural operator (BFNO), is a natural fit. In our experiments with general downstream tasks, our method significantly outperforms existing methods.
Submitted 15 December, 2023;
originally announced December 2023.
-
AT4CTR: Auxiliary Match Tasks for Enhancing Click-Through Rate Prediction
Authors:
Qi Liu,
Xuyang Hou,
Defu Lian,
Zhe Wang,
Haoran Jin,
Jia Cheng,
Jun Lei
Abstract:
Click-through rate (CTR) prediction is a vital task in industrial recommendation systems. Most existing methods focus on the network architecture design of the CTR model for better accuracy and suffer from the data sparsity problem. Especially in industrial recommendation systems, the negative sample down-sampling technique, widely applied because of resource limitations, worsens the problem and results in a decline in performance. In this paper, we propose Auxiliary Match Tasks for enhancing Click-Through Rate prediction accuracy (AT4CTR) by alleviating the data sparsity problem. Specifically, we design two match tasks inspired by collaborative filtering to enhance the relevance modeling between user and item. As the "click" action is a strong signal that directly indicates the user's preference towards the item, the first match task aims at pulling closer the representations of the user and the item for the positive samples. Since the user's past click behaviors can also be treated as the user him/herself, we apply next item prediction as the second match task. For both match tasks, we choose InfoNCE as the loss function. The two match tasks can provide meaningful training signals to speed up the model's convergence and alleviate the data sparsity. We conduct extensive experiments on one public dataset and one large-scale industrial recommendation dataset. The results demonstrate the effectiveness of the proposed auxiliary match tasks. AT4CTR has been deployed in a real industrial advertising system and has gained remarkable revenue.
Submitted 18 December, 2023; v1 submitted 9 December, 2023;
originally announced December 2023.
-
A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware
Authors:
Dianshu Liao,
Shidong Pan,
Xiaoyu Sun,
Xiaoxue Ren,
Qing Huang,
Zhenchang Xing,
Huan Jin,
Qinying Li
Abstract:
Code generation tools are essential to help developers in the software development process. Existing tools are often disconnected from the working context, i.e., the code repository, so the generated code differs from what human developers would write. In this paper, we propose a novel code generation framework, dubbed A^3-CodGen, to harness information within the code repository to generate code with fewer potential logical errors, less code redundancy, and fewer library-induced compatibility issues. We identify three categories of representative information for the code repository: local-aware information from the current code file, global-aware information from other code files, and third-party-library information. Results demonstrate that by adopting the A^3-CodGen framework, we successfully extract, fuse, and feed code repository information into the LLM, generating more accurate, efficient, and highly reusable code. The effectiveness of our framework is further underscored by generating code with a higher reuse rate, compared to human developers. This research contributes significantly to the field of code generation, providing developers with a more powerful tool to address the evolving demands in software development in practice.
Submitted 5 March, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Riemannian Complex Matrix Convolution Network for PolSAR Image Classification
Authors:
Junfei Shi,
Wei Wang,
Haiyan Jin,
Mengmeng Nie,
Shanshan Ji
Abstract:
Recently, deep learning methods have achieved superior performance for Polarimetric Synthetic Aperture Radar(PolSAR) image classification. Existing deep learning methods learn PolSAR data by converting the covariance matrix into a feature vector or complex-valued vector as the input. However, all these methods cannot learn the structure of complex matrix directly and destroy the channel correlation. To learn geometric structure of complex matrix, we propose a Riemannian complex matrix convolution network for PolSAR image classification in Riemannian space for the first time, which directly utilizes the complex matrix as the network input and defines the Riemannian operations to learn complex matrix's features. The proposed Riemannian complex matrix convolution network considers PolSAR complex matrix endowed in Riemannian manifold, and defines a series of new Riemannian convolution, ReLu and LogEig operations in Riemannian space, which breaks through the Euclidean constraint of conventional networks. Then, a CNN module is appended to enhance contextual Riemannian features. Besides, a fast kernel learning method is developed for the proposed method to learn class-specific features and reduce the computation time effectively. Experiments are conducted on three sets of real PolSAR data with different bands and sensors. Experiments results demonstrates the proposed method can obtain superior performance than the state-of-the-art methods.
△ Less
Submitted 6 December, 2023;
originally announced December 2023.
-
Prompt Tuning for Zero-shot Compositional Learning
Authors:
Lingyu Zhang,
Ting Hua,
Yilin Shen,
Hongxia Jin
Abstract:
Open World Compositional Zero-Shot Learning (OW-CZSL) is known to be an extremely challenging task, which aims to recognize unseen compositions formed from seen attributes and objects without any prior assumption of the output space. In order to achieve this goal, a model has to be "smart" and "knowledgeable". To be smart, a model should be good at reasoning about the interactions between attributes and objects from the seen compositions. "Knowledgeable" means that the model has "common sense" about the open world and can therefore "foresee" some features of the unseen compositions. Most previous work focuses on the "smart" part, while few studies have provided an effective solution to achieve the "knowledgeable" goal. In this paper, we propose a framework named Multi-Modal Prompt Tuning (MMPT) to inherit the "knowledgeable" property from the large pre-trained vision-language model. Extensive experiments show that our proposed MMPT obtains new state-of-the-art results on the OW-CZSL task. On the UT-Zappos dataset, MMPT pushes the AUC score to $29.8$, while the previous best score is $26.5$. On the more challenging MIT-States dataset, the AUC score of MMPT is 1.5 times better than the current state-of-the-art.
Submitted 2 December, 2023;
originally announced December 2023.
-
Token Fusion: Bridging the Gap between Token Pruning and Token Merging
Authors:
Minchul Kim,
Shangqian Gao,
Yen-Chang Hsu,
Yilin Shen,
Hongxia Jin
Abstract:
Vision Transformers (ViTs) have emerged as powerful backbones in computer vision, outperforming many traditional CNNs. However, their computational overhead, largely attributed to the self-attention mechanism, makes deployment on resource-constrained edge devices challenging. Multiple solutions rely on token pruning or token merging. In this paper, we introduce "Token Fusion" (ToFu), a method that amalgamates the benefits of both token pruning and token merging. Token pruning proves advantageous when the model exhibits sensitivity to input interpolations, while token merging is effective when the model manifests close to linear responses to inputs. We combine these two observations to propose a new scheme called Token Fusion. Moreover, we tackle the limitations of average merging, which does not preserve the intrinsic feature norm, resulting in distributional shifts. To mitigate this, we introduce MLERP merging, a variant of the SLERP technique, tailored to merge multiple tokens while maintaining the norm distribution. ToFu is versatile, applicable to ViTs with or without additional training. Our empirical evaluations indicate that ToFu establishes new benchmarks in both classification and image generation tasks concerning computational efficiency and model accuracy.
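The norm issue with average merging can be seen on a toy example. The sketch below is not MLERP (the abstract does not specify it); it only contrasts plain averaging with standard two-vector SLERP on two orthogonal unit vectors, and all helper names are illustrative assumptions.

```python
import math


def norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def average(a, b):
    """Plain element-wise average of two vectors."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def slerp(a, b, t):
    """Standard spherical linear interpolation between two vectors of equal norm."""
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    if omega < 1e-12:  # nearly parallel: fall back to linear interpolation
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    wa = math.sin((1 - t) * omega) / math.sin(omega)
    wb = math.sin(t * omega) / math.sin(omega)
    return [wa * x + wb * y for x, y in zip(a, b)]


# Two toy "tokens" with unit norm (hypothetical values).
a = [1.0, 0.0]
b = [0.0, 1.0]
avg = average(a, b)
mid = slerp(a, b, 0.5)
print(norm(avg))  # smaller than the input norm of 1
print(norm(mid))  # stays at the input norm
```

Averaging shrinks the norm whenever the two inputs are not parallel, whereas the spherical interpolation keeps it at the common input norm.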
Submitted 1 December, 2023;
originally announced December 2023.
-
Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters
Authors:
James Seale Smith,
Yen-Chang Hsu,
Zsolt Kira,
Yilin Shen,
Hongxia Jin
Abstract:
Recent work has demonstrated a remarkable ability to customize text-to-image diffusion models to multiple, fine-grained concepts in a sequential (i.e., continual) manner while only providing a few example images for each concept. This setting is known as continual diffusion. Here, we ask the question: Can we scale these methods to longer concept sequences without forgetting? Although prior work mitigates the forgetting of previously learned concepts, we show that its capacity to learn new tasks reaches saturation over longer sequences. We address this challenge by introducing a novel method, STack-And-Mask INcremental Adapters (STAMINA), which is composed of low-ranked attention-masked adapters and customized MLP tokens. STAMINA is designed to enhance the robust fine-tuning properties of LoRA for sequential concept learning via learnable hard-attention masks parameterized with low rank MLPs, enabling precise, scalable learning via sparse adaptation. Notably, all introduced trainable parameters can be folded back into the model after training, inducing no additional inference parameter costs. We show that STAMINA outperforms the prior SOTA for the setting of text-to-image continual customization on a 50-concept benchmark composed of landmarks and human faces, with no stored replay data. Additionally, we extended our method to the setting of continual learning for image classification, demonstrating that our gains also translate to state-of-the-art performance in this standard benchmark.
Submitted 2 May, 2024; v1 submitted 30 November, 2023;
originally announced November 2023.
-
Mechanical Characterization and Inverse Design of Stochastic Architected Metamaterials Using Neural Operators
Authors:
Hanxun Jin,
Enrui Zhang,
Boyu Zhang,
Sridhar Krishnaswamy,
George Em Karniadakis,
Horacio D. Espinosa
Abstract:
Machine learning (ML) is emerging as a transformative tool for the design of architected materials, offering properties that far surpass those achievable through lab-based trial-and-error methods. However, a major challenge in current inverse design strategies is their reliance on extensive computational and/or experimental datasets, which becomes particularly problematic for designing micro-scale stochastic architected materials that exhibit nonlinear mechanical behaviors. Here, we introduce a new end-to-end scientific ML framework, leveraging deep neural operators (DeepONet), to directly learn the relationship between the complete microstructure and mechanical response of architected metamaterials from sparse but high-quality in situ experimental data. The approach facilitates the inverse design of structures tailored to specific nonlinear mechanical behaviors. Results obtained from spinodal microstructures, printed using two-photon lithography, reveal that the prediction error for mechanical responses is within a range of 5 - 10%. Our work underscores that by employing neural operators with advanced micro-mechanics experimental techniques, the design of complex micro-architected materials with desired properties becomes feasible, even in scenarios constrained by data scarcity. Our work marks a significant advancement in the field of materials-by-design, potentially heralding a new era in the discovery and development of next-generation metamaterials with unparalleled mechanical characteristics derived directly from experimental insights.
Submitted 10 December, 2023; v1 submitted 23 November, 2023;
originally announced November 2023.
-
On the Feasibility of Reasoning about the Internal States of Blackbox IoT Devices Using Side-Channel Information
Authors:
Wei Sun,
Yuwei Xiao,
Haojian Jin,
Dinesh Bharadia
Abstract:
Internet of Things (IoT) devices are typically designed to function in a secure, closed environment, making it difficult for users to comprehend devices' behaviors. This paper shows that a user can leverage side-channel information to reason about the fine-grained internal states of black-box IoT devices. The key enablers for our design are a multi-modal sensing technique that fuses power consumption, network traffic, and radio emanations, and an annotation interface that helps users form mental models of a black-box IoT system. We built a prototype of our design and evaluated it with open-source IoT devices and black-box commercial devices. Our experiments show a false positive rate of 1.44% for open-source IoT devices' state probing, and our participants take an average of 19.8 minutes to reason about the internal states of black-box IoT devices.
Submitted 22 November, 2023;
originally announced November 2023.
-
EditShield: Protecting Unauthorized Image Editing by Instruction-guided Diffusion Models
Authors:
Ruoxi Chen,
Haibo Jin,
Jinyin Chen,
Lichao Sun
Abstract:
Text-to-image diffusion models have emerged as an evolutionary approach for producing creative content in image synthesis. Based on the impressive generation abilities of these models, instruction-guided diffusion models can edit images with simple instructions and input images. While they empower users to obtain their desired edited images with ease, they have raised concerns about unauthorized image manipulation. Prior research has delved into the unauthorized use of personalized diffusion models; however, this problem remains largely unexplored for instruction-guided diffusion models. In this paper, we propose EditShield, the first protection method against unauthorized modifications from such models. Specifically, EditShield works by adding imperceptible perturbations that can shift the latent representation used in the diffusion process, forcing models to generate unrealistic images with mismatched subjects. Our extensive experiments demonstrate EditShield's effectiveness across synthetic and real-world datasets. EditShield also remains robust against various editing types and synonymous instruction phrases.
Submitted 19 November, 2023;
originally announced November 2023.
-
Deep Group Interest Modeling of Full Lifelong User Behaviors for CTR Prediction
Authors:
Qi Liu,
Xuyang Hou,
Haoran Jin,
Jin Chen,
Zhe Wang,
Defu Lian,
Tan Qu,
Jia Cheng,
Jun Lei
Abstract:
Extracting users' interests from their lifelong behavior sequence is crucial for predicting Click-Through Rate (CTR). Most current methods employ a two-stage process for efficiency: they first select historical behaviors related to the candidate item and then deduce the user's interest from this narrowed-down behavior sub-sequence. This two-stage paradigm, though effective, leads to information loss. Solely using users' lifelong click behaviors doesn't provide a complete picture of their interests, leading to suboptimal performance. In our research, we introduce the Deep Group Interest Network (DGIN), an end-to-end method to model the user's entire behavior history. This includes all post-registration actions, such as clicks, cart additions, purchases, and more, providing a nuanced user understanding. We start by grouping the full range of behaviors using a relevant key (like item_id) to enhance efficiency. This process reduces the behavior length significantly, from O(10^4) to O(10^2). To mitigate the potential loss of information due to grouping, we incorporate two categories of group attributes. Within each group, we calculate statistical information on various heterogeneous behaviors (like behavior counts) and employ self-attention mechanisms to highlight unique behavior characteristics (like behavior type). Based on this reorganized behavior data, the user's interests are derived using the Transformer technique. Additionally, we identify a subset of behaviors that share the same item_id with the candidate item from the lifelong behavior sequence. The insights from this subset reveal the user's decision-making process related to the candidate item, improving prediction accuracy. Our comprehensive evaluation, both on industrial and public datasets, validates DGIN's efficacy and efficiency.
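The grouping step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Behavior` record, its field names, and the `group_behaviors` helper are hypothetical, and only the per-group behavior counts are shown (no self-attention or Transformer).

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Behavior(NamedTuple):
    """Hypothetical record for one logged user action."""
    item_id: int
    behavior_type: str  # e.g. "click", "cart", "purchase"


def group_behaviors(history):
    """Group a behavior sequence by item_id and count behavior types per group."""
    groups = defaultdict(Counter)
    for b in history:
        groups[b.item_id][b.behavior_type] += 1
    # Convert to plain dicts so the result is easy to inspect.
    return {item_id: dict(counts) for item_id, counts in groups.items()}


# Toy history (made-up values): five behaviors over two items.
history = [
    Behavior(7, "click"),
    Behavior(7, "click"),
    Behavior(7, "cart"),
    Behavior(3, "click"),
    Behavior(7, "purchase"),
]
grouped = group_behaviors(history)
print(len(history), "behaviors ->", len(grouped), "groups")
print(grouped)
```

The sequence a downstream model has to process shrinks from the number of behaviors to the number of distinct keys, while the counts retain some of the information that grouping would otherwise discard.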
Submitted 15 November, 2023;
originally announced November 2023.
-
PsyEval: A Suite of Mental Health Related Tasks for Evaluating Large Language Models
Authors:
Haoan Jin,
Siyuan Chen,
Dilawaier Dilixiati,
Yewei Jiang,
Mengyue Wu,
Kenny Q. Zhu
Abstract:
Evaluating Large Language Models (LLMs) in the mental health domain poses challenges distinct from those in other domains, given the subtle and highly subjective nature of symptoms that exhibit significant variability among individuals. This paper presents PsyEval, the first comprehensive suite of mental health-related tasks for evaluating LLMs. PsyEval encompasses five sub-tasks that evaluate three critical dimensions of mental health. This comprehensive framework is designed to thoroughly assess the unique challenges and intricacies of mental health-related tasks, making PsyEval a highly specialized and valuable tool for evaluating LLM performance in this domain. We evaluate twelve advanced LLMs using PsyEval. Experimental results not only demonstrate significant room for improvement in current LLMs concerning mental health but also unveil potential directions for future model optimization.
Submitted 3 June, 2024; v1 submitted 15 November, 2023;
originally announced November 2023.
-
An Extensive Study on Adversarial Attack against Pre-trained Models of Code
Authors:
Xiaohu Du,
Ming Wen,
Zichao Wei,
Shangwen Wang,
Hai Jin
Abstract:
Transformer-based pre-trained models of code (PTMC) have been widely utilized and have achieved state-of-the-art performance in many mission-critical applications. However, they can be vulnerable to adversarial attacks through identifier substitution or coding style transformation, which can significantly degrade accuracy and may further incur security concerns. Although several approaches have been proposed to generate adversarial examples for PTMC, the effectiveness and efficiency of such approaches, especially on different code intelligence tasks, have not been well understood. To bridge this gap, this study systematically analyzes five state-of-the-art adversarial attack approaches from three perspectives: effectiveness, efficiency, and the quality of generated examples. The results show that none of the five approaches balances all these perspectives. Particularly, approaches with a high attack success rate tend to be time-consuming; the adversarial code they generate often lacks naturalness, and vice versa. To address this limitation, we explore the impact of perturbing identifiers under different contexts and find that identifier substitution within for and if statements is the most effective. Based on these findings, we propose a new approach that prioritizes different types of statements for various tasks and further utilizes beam search to generate adversarial examples. Evaluation results show that it outperforms the state-of-the-art ALERT in terms of both effectiveness and efficiency while preserving the naturalness of the generated adversarial examples.
Submitted 23 November, 2023; v1 submitted 13 November, 2023;
originally announced November 2023.
-
Mechanical Metamaterials Fabricated from Self-assembly: A Perspective
Authors:
Hanxun Jin,
Horacio D. Espinosa
Abstract:
Mechanical metamaterials, whose unique mechanical properties stem from their structural design rather than material constituents, are gaining popularity in engineering applications. In particular, recent advances in self-assembly techniques offer the potential to fabricate load-bearing mechanical metamaterials with unparalleled feature size control and scalability compared to those produced by additive manufacturing (AM). Yet, the field is still in its early stages. In this perspective, we first provide an overview of the state-of-the-art self-assembly techniques, with a focus on the copolymer and colloid crystal self-assembly processes. We then discuss current challenges and future opportunities in this research area, focusing on novel fabrication approaches, the need for high-throughput characterization methods, and the integration of Machine Learning (ML) and lab automation for inverse design. Given recent progress in all these areas, we foresee mechanical metamaterials fabricated from self-assembly techniques impacting a variety of applications relying on lightweight, strong, and tough materials.
Submitted 11 November, 2023;
originally announced November 2023.
-
A variational Monte Carlo approach to the SU(4) spin-orbital model on the triangular lattice
Authors:
Chun Zhang,
Hui-Ke Jin,
Yi Zhou
Abstract:
Previous investigations have suggested that the simplest spin-orbital model on the simplest frustrated lattice can host a nematic quantum spin-orbital liquid state. Namely, the orbital degeneracy of the SU(4) Kugel-Khomskii (KK) model tends to enhance quantum fluctuations and stabilize a quantum spin-orbital liquid exhibiting stripy features on the triangular lattice, as revealed by the state-of-the-art method of the density matrix renormalization group boosted by Gutzwiller projected wave functions. In this work, using the variational quantum Monte Carlo method, we have studied several spin-orbital liquid states, including a uniform $π$ flux state, three stripy states, and a plaquette state, on the $L\times{}L$ torus up to $L=24$. It turns out that one of these stripy states, called the "stripe-II" state, is energetically favored. This ground state breaks the $C_6$ symmetry of the lattice, resulting in a reduced $C_2$ symmetry and doubled unit cells, while preserving the SU(4) spin-orbital rotation symmetry. Such a nematic quantum spin-orbital liquid state can be characterized by a parton Fermi surface (FS) consisting of open orbits in the Brillouin zone, in contrast to the circular FS of the uniform $π$-flux state.
Submitted 7 November, 2023; v1 submitted 5 November, 2023;
originally announced November 2023.
-
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
Authors:
Ruihang Lai,
Junru Shao,
Siyuan Feng,
Steven S. Lyubomirsky,
Bohan Hou,
Wuwei Lin,
Zihao Ye,
Hongyi Jin,
Yuchen Jin,
Jiawei Liu,
Lesheng Jin,
Yaxing Cai,
Ziheng Jiang,
Yong Wu,
Sunghyun Park,
Prakalp Srivastava,
Jared G. Roesch,
Todd C. Mowry,
Tianqi Chen
Abstract:
Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven demand for deploying them to a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program. It also introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation to enable cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on large language models show that Relax delivers performance competitive with state-of-the-art hand-optimized systems across platforms and enables deployment of emerging dynamic models to a broader set of environments, including mobile phones, embedded devices, and web browsers.
Submitted 1 November, 2023;
originally announced November 2023.
-
Let's Discover More API Relations: A Large Language Model-based AI Chain for Unsupervised API Relation Inference
Authors:
Qing Huang,
Yanbang Sun,
Zhenchang Xing,
Yuanlong Cao,
Jieshan Chen,
Xiwei Xu,
Huan Jin,
Jiaxing Lu
Abstract:
APIs have intricate relations that can be described in text and represented as knowledge graphs to aid software engineering tasks. Existing relation extraction methods have limitations, such as a limited API text corpus and sensitivity to the characteristics of the input text. To address these limitations, we propose utilizing large language models (LLMs) (e.g., GPT-3.5) as a neural knowledge base for API relation inference. This approach leverages the entire Web used to pre-train LLMs as a knowledge base and is insensitive to the context and complexity of input texts. To ensure accurate inference, we design our analytic flow as an AI Chain with three AI modules: API FQN Parser, API Knowledge Extractor, and API Relation Decider. The accuracies of the API FQN Parser and API Relation Decider modules are 0.81 and 0.83, respectively. Using the generative capacity of the LLM and our approach's inference capability, we achieve an average F1 value of 0.76 under the three datasets, significantly higher than the state-of-the-art method's average F1 value of 0.40. Compared to a CoT-based method, our AI Chain design improves the inference reliability by 67%, and the AI-crowd-intelligence strategy enhances the robustness of our approach by 26%.
Submitted 2 November, 2023;
originally announced November 2023.
-
Triplet Attention Transformer for Spatiotemporal Predictive Learning
Authors:
Xuesong Nie,
Xi Chen,
Haoyuan Jin,
Zhihang Zhu,
Yunfeng Yan,
Donglian Qi
Abstract:
Spatiotemporal predictive learning offers a self-supervised learning paradigm that enables models to learn both spatial and temporal patterns by predicting future sequences based on historical sequences. Mainstream methods are dominated by recurrent units, yet they are limited by their lack of parallelization and often underperform in real-world scenarios. To improve prediction quality while maintaining computational efficiency, we propose an innovative triplet attention transformer designed to capture both inter-frame dynamics and intra-frame static features. Specifically, the model incorporates the Triplet Attention Module (TAM), which replaces traditional recurrent units by exploring self-attention mechanisms in temporal, spatial, and channel dimensions. In this configuration: (i) temporal tokens contain abstract representations of inter-frame dynamics, facilitating the capture of inherent temporal dependencies; (ii) spatial and channel attention combine to refine the intra-frame representation by performing fine-grained interactions across spatial and channel dimensions. Alternating temporal, spatial, and channel-level attention allows our approach to learn more complex short- and long-range spatiotemporal dependencies. Extensive experiments demonstrate performance surpassing existing recurrent-based and recurrent-free methods, achieving state-of-the-art results under multi-scenario examination including moving object trajectory prediction, traffic flow prediction, driving scene prediction, and human motion capture.
Submitted 28 October, 2023;
originally announced October 2023.
-
Sky location of Galactic white dwarf binaries in space-based gravitational wave detection
Authors:
Pan Guo,
Hong-Bo Jin,
Cong-Feng Qiao,
Yue-Liang Wu
Abstract:
Quickly localizing the identified white dwarf (WD) binaries is a basic requirement for space-based gravitational wave (GW) detection. The amplitude of GW signals is modulated by the periodic motion of GW detectors on the solar orbit. Observation times beyond a year increase the intensity of the observed signals and hence the signal-to-noise ratio (SNR). Because data gaps exist, the completeness of data observed over a long time depends on filling those gaps. Within a one-year period, each GW source has a best orbital position of the GW detector, at which the detector response to the GW is maximal. Thus, the best positions, where the direction of the GW source is perpendicular to the detection arms, can be searched for the verified GW sources of the sky map to enhance the SNR as well. To show more clearly how the response intensity of the three arms changes with the location of the GW sources relative to the detector, the noise and its suppression by time-delay interferometry are ignored. Among the four chosen sources, the two verification WD binaries J0806 and V407 Vul are observed at the best orbit positions by TAIJI for short times of 2 and 3 days, respectively. The intensities of those GWs are significantly above the TAIJI sensitivity curve. Compared with a single detector, the network of two detectors does not significantly improve the accuracy of location of the verification binaries. These results imply that the search for GW signals and the parameter estimation of GW sources from the experimental data of the space-based mission should not ignore the orbit positions relevant to GW sources.
Submitted 28 March, 2024; v1 submitted 25 October, 2023;
originally announced October 2023.
-
Meta learning with language models: Challenges and opportunities in the classification of imbalanced text
Authors:
Apostol Vassilev,
Honglan Jin,
Munawar Hasan
Abstract:
Detecting out of policy speech (OOPS) content is important but difficult. While machine learning is a powerful tool to tackle this challenging task, it is hard to break the performance ceiling due to factors like quantity and quality limitations on training data and inconsistencies in OOPS definition and data labeling. To realize the full potential of available limited resources, we propose a meta learning technique (MLT) that combines individual models built with different text representations. We analytically show that the resulting technique is numerically stable and produces reasonable combining weights. We combine the MLT with a threshold-moving (TM) technique to further improve the performance of the combined predictor on highly-imbalanced in-distribution and out-of-distribution datasets. We also provide computational results to show the statistically significant advantages of the proposed MLT approach.
All authors contributed equally to this work.
Submitted 24 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Hetero$^2$Net: Heterophily-aware Representation Learning on Heterogenerous Graphs
Authors:
Jintang Li,
Zheng Wei,
Jiawang Dan,
Jing Zhou,
Yuchang Zhu,
Ruofan Wu,
Baokun Wang,
Zhang Zhen,
Changhua Meng,
Hong Jin,
Zibin Zheng,
Liang Chen
Abstract:
Real-world graphs are typically complex, exhibiting heterogeneity in the global structure, as well as strong heterophily within local neighborhoods. While a growing body of literature has revealed the limitations of common graph neural networks (GNNs) in handling homogeneous graphs with heterophily, little work has been conducted on investigating the heterophily properties in the context of heterogeneous graphs. To bridge this research gap, we identify the heterophily in heterogeneous graphs using metapaths and propose two practical metrics to quantitatively describe the levels of heterophily. Through in-depth investigations on several real-world heterogeneous graphs exhibiting varying levels of heterophily, we have observed that heterogeneous graph neural networks (HGNNs), which inherit many mechanisms from GNNs designed for homogeneous graphs, fail to generalize to heterogeneous graphs with heterophily or low level of homophily. To address the challenge, we present Hetero$^2$Net, a heterophily-aware HGNN that incorporates both masked metapath prediction and masked label prediction tasks to effectively and flexibly handle both homophilic and heterophilic heterogeneous graphs. We evaluate the performance of Hetero$^2$Net on five real-world heterogeneous graph benchmarks with varying levels of heterophily. The results demonstrate that Hetero$^2$Net outperforms strong baselines in the semi-supervised node classification task, providing valuable insights into effectively handling more complex heterogeneous graphs.
Submitted 17 October, 2023;
originally announced October 2023.