subscribe to arXiv mailings

arXiv:2407.11626 [pdf]

Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces

Authors: Dongnan Jin, Yali Liu, Qiuzhi Song, Xunju Ma, Yue Liu, Dehao Wu

Abstract: In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search i… ▽ More In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search in multi-dimensional spaces with varying numbers of dimensions, this study proposes a new optimization algorithm-Dynamic Dimension Wrapping (DDW) algorithm. Firstly, by utilizing the Dynamic Time Warping (DTW) algorithm and Euclidean distance, a mapping relationship between different time series across dimensions is established, thus creating a fitness function suitable for dimensionally dynamic multi-dimensional space. Additionally, DDW introduces a novel, more efficient cross-dimensional search mechanism for dynamic multidimensional spaces. Finally, through comparative tests with 31 optimization algorithms in dynamic multidimensional space search, the results demonstrate that DDW exhibits outstanding search efficiency and provides search results closest to the actual optimal solution. △ Less

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.07735 [pdf, other]

Protecting NeRFs' Copyright via Plug-And-Play Watermarking Base Model

Authors: Qi Song, Ziyuan Luo, Ka Chun Cheung, Simon See, Renjie Wan

Abstract: Neural Radiance Fields (NeRFs) have become a key method for 3D scene representation. With the rising prominence and influence of NeRF, safeguarding its intellectual property has become increasingly important. In this paper, we propose \textbf{NeRFProtector}, which adopts a plug-and-play strategy to protect NeRF's copyright during its creation. NeRFProtector utilizes a pre-trained watermarking base… ▽ More Neural Radiance Fields (NeRFs) have become a key method for 3D scene representation. With the rising prominence and influence of NeRF, safeguarding its intellectual property has become increasingly important. In this paper, we propose \textbf{NeRFProtector}, which adopts a plug-and-play strategy to protect NeRF's copyright during its creation. NeRFProtector utilizes a pre-trained watermarking base model, enabling NeRF creators to embed binary messages directly while creating their NeRF. Our plug-and-play property ensures NeRF creators can flexibly choose NeRF variants without excessive modifications. Leveraging our newly designed progressive distillation, we demonstrate performance on par with several leading-edge neural rendering methods. Our project is available at: \url{https://qsong2001.github.io/NeRFProtector}. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV2024

arXiv:2407.06935 [pdf, other]

Bayesian Federated Learning with Hamiltonian Monte Carlo: Algorithm and Theory

Authors: Jiajun Liang, Qian Zhang, Wei Deng, Qifan Song, Guang Lin

Abstract: This work introduces a novel and efficient Bayesian federated learning algorithm, namely, the Federated Averaging stochastic Hamiltonian Monte Carlo (FA-HMC), for parameter estimation and uncertainty quantification. We establish rigorous convergence guarantees of FA-HMC on non-iid distributed data sets, under the strong convexity and Hessian smoothness assumptions. Our analysis investigates the ef… ▽ More This work introduces a novel and efficient Bayesian federated learning algorithm, namely, the Federated Averaging stochastic Hamiltonian Monte Carlo (FA-HMC), for parameter estimation and uncertainty quantification. We establish rigorous convergence guarantees of FA-HMC on non-iid distributed data sets, under the strong convexity and Hessian smoothness assumptions. Our analysis investigates the effects of parameter space dimension, noise on gradients and momentum, and the frequency of communication (between the central node and local nodes) on the convergence and communication costs of FA-HMC. Beyond that, we establish the tightness of our analysis by showing that the convergence rate cannot be improved even for continuous FA-HMC process. Moreover, extensive empirical studies demonstrate that FA-HMC outperforms the existing Federated Averaging-Langevin Monte Carlo (FA-LD) algorithm. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2406.11707 [pdf, other]

A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

Authors: Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, Jianping Wang

Abstract: Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles. Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack ap… ▽ More Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles. Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack approach that induces prediction errors via attacks against the perception module of a victim AV. Although it has been shown that physically realizable attacks against LiDAR-based perception are possible by placing a few objects at strategic locations, it is still an open challenge to find an object location from the vast search space in order to launch effective attacks against prediction under varying victim AV velocities. Through analysis, we observe that a prediction model is prone to an attack focusing on a single point in the scene. Consequently, we propose a novel two-stage attack framework to realize the single-point attack. The first stage of prediction-side attack efficiently identifies, guided by the distribution of detection results under object-based attacks against perception, the state perturbations for the prediction model that are effective and velocity-insensitive. In the second stage of location matching, we match the feasible object locations with the found state perturbations. Our evaluation using a public autonomous driving dataset shows that our attack causes a collision rate of up to 63% and various hazardous responses of the victim AV. The effectiveness of our attack is also demonstrated on a real testbed car. To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction. To counteract the proposed attack, potential defenses are discussed. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: In Proceedings of the 33rd USENIX Security Symposium 2024

arXiv:2406.06559 [pdf, other]

Harnessing Business and Media Insights with Large Language Models

Authors: Yujia Bao, Ankit Parag Shah, Neeru Narang, Jonathan Rivers, Rajeev Maksey, Lan Guan, Louise N. Barrere, Shelley Evenson, Rahul Basole, Connie Miao, Ankit Mehta, Fabien Boulay, Su Min Park, Natalie E. Pearson, Eldhose Joy, Tiger He, Sumiran Thakur, Koustav Ghosal, Josh On, Phoebe Morrison, Tim Major, Eva Siqi Wang, Gina Escobar, Jiaheng Wei, Tharindu Cyril Weerasooriya , et al. (8 additional authors not shown)

Abstract: This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users… ▽ More This paper introduces Fortune Analytics Language Model (FALM). FALM empowers users with direct access to comprehensive business analysis, including market trends, company performance metrics, and expert insights. Unlike generic LLMs, FALM leverages a curated knowledge base built from professional journalism, enabling it to deliver precise and in-depth answers to intricate business questions. Users can further leverage natural language queries to directly visualize financial data, generating insightful charts and graphs to understand trends across diverse business sectors clearly. FALM fosters user trust and ensures output accuracy through three novel methods: 1) Time-aware reasoning guarantees accurate event registration and prioritizes recent updates. 2) Thematic trend analysis explicitly examines topic evolution over time, providing insights into emerging business landscapes. 3) Content referencing and task decomposition enhance answer fidelity and data visualization accuracy. We conduct both automated and human evaluations, demonstrating FALM's significant performance improvements over baseline methods while prioritizing responsible AI practices. These benchmarks establish FALM as a cutting-edge LLM in the business and media domains, with exceptional accuracy and trustworthiness. △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2404.11155 [pdf, other]

HybriMap: Hybrid Clues Utilization for Effective Vectorized HD Map Construction

Authors: Chi Zhang, Qi Song, Feifei Li, Yongquan Chen, Rui Huang

Abstract: Constructing vectorized high-definition maps from surround-view cameras has garnered significant attention in recent years. However, the commonly employed multi-stage sequential workflow in prevailing approaches often leads to the loss of early-stage information, particularly in perspective-view features. Usually, such loss is observed as an instance missing or shape mismatching in the final birds… ▽ More Constructing vectorized high-definition maps from surround-view cameras has garnered significant attention in recent years. However, the commonly employed multi-stage sequential workflow in prevailing approaches often leads to the loss of early-stage information, particularly in perspective-view features. Usually, such loss is observed as an instance missing or shape mismatching in the final birds-eye-view predictions. To address this concern, we propose a novel approach, namely \textbf{HybriMap}, which effectively exploits clues from hybrid features to ensure the delivery of valuable information. Specifically, we design the Dual Enhancement Module, to enable both explicit integration and implicit modification under the guidance of hybrid features. Additionally, the perspective keypoints are utilized as supervision, further directing the feature enhancement process. Extensive experiments conducted on existing benchmarks have demonstrated the state-of-the-art performance of our proposed approach. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.19436 [pdf, other]

"At the end of the day, I am accountable": Gig Workers' Self-Tracking for Multi-Dimensional Accountability Management

Authors: Rie Helene Hernandez, Qiurong Song, Yubo Kou, Xinning Gui

Abstract: Tracking is inherent in and central to the gig economy. Platforms track gig workers' performance through metrics such as acceptance rate and punctuality, while gig workers themselves engage in self-tracking. Although prior research has extensively examined how gig platforms track workers through metrics -- with some studies briefly acknowledging the phenomenon of self-tracking among workers -- the… ▽ More Tracking is inherent in and central to the gig economy. Platforms track gig workers' performance through metrics such as acceptance rate and punctuality, while gig workers themselves engage in self-tracking. Although prior research has extensively examined how gig platforms track workers through metrics -- with some studies briefly acknowledging the phenomenon of self-tracking among workers -- there is a dearth of studies that explore how and why gig workers track themselves. To address this, we conducted 25 semi-structured interviews, revealing how gig workers self-tracking to manage accountabilities to themselves and external entities across three identities: the holistic self, the entrepreneurial self, and the platformized self. We connect our findings to neoliberalism, through which we contextualize gig workers' self-accountability and the invisible labor of self-tracking. We further discuss how self-tracking mitigates information and power asymmetries in gig work and offer design implications to support gig workers' multi-dimensional self-tracking. △ Less

Submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted to CHI 2024

arXiv:2403.10133 [pdf, other]

E4C: Enhance Editability for Text-Based Image Editing by Harnessing Efficient CLIP Guidance

Authors: Tianrui Huang, Pu Cao, Lu Yang, Chun Liu, Mengjie Hu, Zhiwei Liu, Qing Song

Abstract: Diffusion-based image editing is a composite process of preserving the source image content and generating new content or applying modifications. While current editing approaches have made improvements under text guidance, most of them have only focused on preserving the information of the input image, disregarding the importance of editability and alignment to the target prompt. In this paper, we… ▽ More Diffusion-based image editing is a composite process of preserving the source image content and generating new content or applying modifications. While current editing approaches have made improvements under text guidance, most of them have only focused on preserving the information of the input image, disregarding the importance of editability and alignment to the target prompt. In this paper, we prioritize the editability by proposing a zero-shot image editing method, named \textbf{E}nhance \textbf{E}ditability for text-based image \textbf{E}diting via \textbf{E}fficient \textbf{C}LIP guidance (\textbf{E4C}), which only requires inference-stage optimization to explicitly enhance the edibility and text alignment. Specifically, we develop a unified dual-branch feature-sharing pipeline that enables the preservation of the structure or texture of the source image while allowing the other to be adapted based on the editing task. We further integrate CLIP guidance into our pipeline by utilizing our novel random-gateway optimization mechanism to efficiently enhance the semantic alignment with the target prompt. Comprehensive quantitative and qualitative experiments demonstrate that our method effectively resolves the text alignment issues prevalent in existing methods while maintaining the fidelity to the source image, and performs well across a wide range of editing tasks. △ Less

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.04279 [pdf, other]

Controllable Generation with Text-to-Image Diffusion Models: A Survey

Authors: Pu Cao, Feng Zhou, Qing Song, Lu Yang

Abstract: In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for conditioning these models does not fully cater to the varied and complex requirements of different applications and scenarios. Acknowledging this shortfall, a… ▽ More In the rapidly advancing realm of visual generation, diffusion models have revolutionized the landscape, marking a significant shift in capabilities with their impressive text-guided generative functions. However, relying solely on text for conditioning these models does not fully cater to the varied and complex requirements of different applications and scenarios. Acknowledging this shortfall, a variety of studies aim to control pre-trained text-to-image (T2I) models to support novel conditions. In this survey, we undertake a thorough review of the literature on controllable generation with T2I diffusion models, covering both the theoretical foundations and practical advancements in this domain. Our review begins with a brief introduction to the basics of denoising diffusion probabilistic models (DDPMs) and widely used T2I diffusion models. We then reveal the controlling mechanisms of diffusion models, theoretically analyzing how novel conditions are introduced into the denoising process for conditional generation. Additionally, we offer a detailed overview of research in this area, organizing it into distinct categories from the condition perspective: generation with specific conditions, generation with multiple conditions, and universal controllable generation. For an exhaustive list of the controllable generation literature surveyed, please refer to our curated repository at \url{https://github.com/PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models}. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: A collection of resources on controllable generation with text-to-image diffusion models: https://github.com/PRIV-Creation/Awesome-Controllable-T2I-Diffusion-Models

arXiv:2403.03967 [pdf, other]

Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability

Authors: Rajdeep Haldar, Yue Xing, Qifan Song

Abstract: The existence of adversarial attacks on machine learning models imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible by a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of the off-manifold attacks is a nat… ▽ More The existence of adversarial attacks on machine learning models imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible by a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of the off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold direction of the data space. Our main results provide an explicit relationship between the $\ell_2,\ell_{\infty}$ attack strength of the on/off-manifold attack and the dimension gap. △ Less

Submitted 23 March, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

Comments: AISTATS 2024

arXiv:2402.18811 [pdf, other]

BFRFormer: Transformer-based generator for Real-World Blind Face Restoration

Authors: Guojing Ge, Qi Song, Guibo Zhu, Yuting Zhang, Jinglu Chen, Miao Xin, Ming Tang, Jinqiao Wang

Abstract: Blind face restoration is a challenging task due to the unknown and complex degradation. Although face prior-based methods and reference-based methods have recently demonstrated high-quality results, the restored images tend to contain over-smoothed results and lose identity-preserved details when the degradation is severe. It is observed that this is attributed to short-range dependencies, the in… ▽ More Blind face restoration is a challenging task due to the unknown and complex degradation. Although face prior-based methods and reference-based methods have recently demonstrated high-quality results, the restored images tend to contain over-smoothed results and lose identity-preserved details when the degradation is severe. It is observed that this is attributed to short-range dependencies, the intrinsic limitation of convolutional neural networks. To model long-range dependencies, we propose a Transformer-based blind face restoration method, named BFRFormer, to reconstruct images with more identity-preserved details in an end-to-end manner. In BFRFormer, to remove blocking artifacts, the wavelet discriminator and aggregated attention module are developed, and spectral normalization and balanced consistency regulation are adaptively applied to address the training instability and over-fitting problem, respectively. Extensive experiments show that our method outperforms state-of-the-art methods on a synthetic dataset and four real-world datasets. The source code, Casia-Test dataset, and pre-trained models are released at https://github.com/s8Znk/BFRFormer. △ Less

Submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted by ICASSP 2024

arXiv:2402.14891 [pdf, other]

LLMBind: A Unified Modality-Task Integration Framework

Authors: Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

Abstract: In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress. To address this challenge, we introduce \textbf{LLMBind}, a novel framework designed to unify a diverse array of multi-modal tasks. By harnessing a Mixture-of-Experts (MoE) Large Language Model (LLM), LLMBind processes multi-modal inputs and generates task-specific to… ▽ More In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress. To address this challenge, we introduce \textbf{LLMBind}, a novel framework designed to unify a diverse array of multi-modal tasks. By harnessing a Mixture-of-Experts (MoE) Large Language Model (LLM), LLMBind processes multi-modal inputs and generates task-specific tokens, enabling the invocation of corresponding models to accomplish tasks. This unique approach empowers LLMBind to interpret inputs and generate outputs across various modalities, including image, text, video, and audio. Furthermore, we have constructed an interaction dataset comprising 400k instructions, which unlocks the ability of LLMBind for interactive visual generation and editing tasks. Extensive experimentation demonstrates that LLMBind achieves very superior performance across diverse tasks and outperforms existing models in user evaluations conducted in real-world scenarios. Moreover, the adaptability of LLMBind allows for seamless integration with the latest models and extension to new modality tasks, highlighting its potential to serve as a unified AI agent for modeling universal modalities. △ Less

Submitted 18 April, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

arXiv:2402.13435 [pdf, other]

Learning to Retrieve for Job Matching

Authors: Jianqiang Shen, Yuchin Juan, Shaobo Zhang, Ping Liu, Wen Pu, Sriram Vasudevan, Qingquan Song, Fedor Borisyuk, Kay Qianqi Shen, Haichao Wei, Yunxiang Ren, Yeou S. Chiou, Sicong Kuang, Yuan Yin, Ben Zheng, Muchen Wu, Shaghayegh Gharghabi, Xiaoqing Wang, Huichao Xue, Qi Guo, Daniel Hewlett, Luke Simon, Liangjie Hong, Wenjing Zhang

Abstract: Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we d… ▽ More Web-scale search systems typically tackle the scalability challenge with a two-step paradigm: retrieval and ranking. The retrieval step, also known as candidate selection, often involves extracting standardized entities, creating an inverted index, and performing term matching for retrieval. Such traditional methods require manual and time-consuming development of query models. In this paper, we discuss applying learning-to-retrieve technology to enhance LinkedIns job search and recommendation systems. In the realm of promoted jobs, the key objective is to improve the quality of applicants, thereby delivering value to recruiter customers. To achieve this, we leverage confirmed hire data to construct a graph that evaluates a seeker's qualification for a job, and utilize learned links for retrieval. Our learned model is easy to explain, debug, and adjust. On the other hand, the focus for organic jobs is to optimize seeker engagement. We accomplished this by training embeddings for personalized retrieval, fortified by a set of rules derived from the categorization of member feedback. In addition to a solution based on a conventional inverted index, we developed an on-GPU solution capable of supporting both KNN and term matching efficiently. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.06859 [pdf, other]

LiRank: Industrial Large Scale Ranking Models at LinkedIn

Authors: Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, Birjodh Tiwana, Ganesh Parameswaran, Siddharth Dangi, Lars Hertel, Qiang Xiao, Xiaochen Hou, Yunbo Ouyang, Aman Gupta, Sheallika Singh, Dan Liu, Hailing Cheng, Lei Le, Jonathan Hung, Sathiya Keerthi, Ruoyan Wang, Fengyu Zhang, Mohit Kothari, Chen Zhu, Daqi Sun, Yun Dai, Xun Luan , et al. (9 additional authors not shown)

Abstract: We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including… ▽ More We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including Dense Gating, Transformers and Residual DCN. We also propose novel techniques for calibration and describe how we productionalized deep learning based explore/exploit methods. To enable effective, production-grade serving of large ranking models, we detail how to train and compress models using quantization and vocabulary compression. We provide details about the deployment setup for large-scale use cases of Feed ranking, Jobs Recommendations, and Ads click-through rate (CTR) prediction. We summarize our learnings from various A/B tests by elucidating the most effective technical approaches. These ideas have contributed to relative metrics improvements across the board at LinkedIn: +0.5% member sessions in the Feed, +1.76% qualified job applications for Jobs search and recommendations, and +4.3% for Ads CTR. We hope this work can provide practical insights and solutions for practitioners interested in leveraging large-scale deep ranking systems. △ Less

Submitted 9 February, 2024; originally announced February 2024.

ACM Class: H.3.3

arXiv:2402.00743 [pdf, other]

Theoretical Understanding of In-Context Learning in Shallow Transformers with Unstructured Data

Authors: Yue Xing, Xiaofeng Lin, Chenheng Xu, Namjoon Suh, Qifan Song, Guang Cheng

Abstract: Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., \cite{zhang2023trained}, attempt to explain the mechanism of ICL, they assume the input $x_i$ and the output $y_i$ of each demonstration example are in the same token (i.e., structured data). However, in real practice, the examples are usua… ▽ More Large language models (LLMs) are powerful models that can learn concepts at the inference stage via in-context learning (ICL). While theoretical studies, e.g., \cite{zhang2023trained}, attempt to explain the mechanism of ICL, they assume the input $x_i$ and the output $y_i$ of each demonstration example are in the same token (i.e., structured data). However, in real practice, the examples are usually text input, and all words, regardless of their logic relationship, are stored in different tokens (i.e., unstructured data \cite{wibisono2023role}). To understand how LLMs learn from the unstructured data in ICL, this paper studies the role of each component in the transformer architecture and provides a theoretical understanding to explain the success of the architecture. In particular, we consider a simple transformer with one/two attention layers and linear regression tasks for the ICL prediction. We observe that (1) a transformer with two layers of (self-)attentions with a look-ahead attention mask can learn from the prompt in the unstructured data, and (2) positional encoding can match the $x_i$ and $y_i$ tokens to achieve a better ICL performance. △ Less

Submitted 18 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

arXiv:2401.15248 [pdf, other]

Better Representations via Adversarial Training in Pre-Training: A Theoretical Perspective

Authors: Yue Xing, Xiaofeng Lin, Qifan Song, Yi Xu, Belinda Zeng, Guang Cheng

Abstract: Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models. Existing literature, e.g., \cite{kim2020adversarial}, empirically observe that the downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoreti… ▽ More Pre-training is known to generate universal representations for downstream tasks in large-scale deep learning such as large language models. Existing literature, e.g., \cite{kim2020adversarial}, empirically observe that the downstream tasks can inherit the adversarial robustness of the pre-trained model. We provide theoretical justifications for this robustness inheritance phenomenon. Our theoretical results reveal that feature purification plays an important role in connecting the adversarial robustness of the pre-trained model and the downstream tasks in two-layer neural networks. Specifically, we show that (i) with adversarial training, each hidden node tends to pick only one (or a few) feature; (ii) without adversarial training, the hidden nodes can be vulnerable to attacks. This observation is valid for both supervised pre-training and contrastive learning. With purified nodes, it turns out that clean training is enough to achieve adversarial robustness in downstream tasks. △ Less

Submitted 26 January, 2024; originally announced January 2024.

Comments: To appear in AISTATS2024

arXiv:2401.09785 [pdf, other]

Instant Answering in E-Commerce Buyer-Seller Messaging using Message-to-Question Reformulation

Authors: Besnik Fetahu, Tejas Mehta, Qun Song, Nikhita Vedula, Oleg Rokhlenko, Shervin Malmasi

Abstract: E-commerce customers frequently seek detailed product information for purchase decisions, commonly contacting sellers directly with extended queries. This manual response requirement imposes additional costs and disrupts buyer's shopping experience with response time fluctuations ranging from hours to days. We seek to automate buyer inquiries to sellers in a leading e-commerce store using a domain… ▽ More E-commerce customers frequently seek detailed product information for purchase decisions, commonly contacting sellers directly with extended queries. This manual response requirement imposes additional costs and disrupts buyer's shopping experience with response time fluctuations ranging from hours to days. We seek to automate buyer inquiries to sellers in a leading e-commerce store using a domain-specific federated Question Answering (QA) system. The main challenge is adapting current QA systems, designed for single questions, to address detailed customer queries. We address this with a low-latency, sequence-to-sequence approach, MESSAGE-TO-QUESTION ( M2Q ). It reformulates buyer messages into succinct questions by identifying and extracting the most salient information from a message. Evaluation against baselines shows that M2Q yields relative increases of 757% in question understanding, and 1,746% in answering rate from the federated QA system. Live deployment shows that automatic answering saves sellers from manually responding to millions of messages per year, and also accelerates customer purchase decisions by eliminating the need for buyers to wait for a reply △ Less

Submitted 30 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: Accepted at ECIR 2024

arXiv:2401.09686 [pdf, other]

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

Authors: Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

Abstract: Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in a sequence. However, it remains unclear how positional encoding exactly impacts speech enhancement based on Transformer architectures. In this paper, we perform… ▽ More Transformer architecture has enabled recent progress in speech enhancement. Since Transformers are position-agostic, positional encoding is the de facto standard component used to enable Transformers to distinguish the order of elements in a sequence. However, it remains unclear how positional encoding exactly impacts speech enhancement based on Transformer architectures. In this paper, we perform a comprehensive empirical study evaluating five positional encoding methods, i.e., Sinusoidal and learned absolute position embedding (APE), T5-RPE, KERPLE, as well as the Transformer without positional encoding (No-Pos), across both causal and noncausal configurations. We conduct extensive speech enhancement experiments, involving spectral mapping and masking methods. Our findings establish that positional encoding is not quite helpful for the models in a causal configuration, which indicates that causal attention may implicitly incorporate position information. In a noncausal configuration, the models significantly benefit from the use of positional encoding. In addition, we find that among the four position embeddings, relative position embeddings outperform APEs. △ Less

Submitted 13 February, 2024; v1 submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.08067 [pdf]

TrajVis: a visual clinical decision support system to translate artificial intelligence trajectory models in the precision management of chronic kidney disease

Authors: Zuotian Li, Xiang Liu, Ziyang Tang, Pengyue Zhang, Nanxin Jin, Michael Eadon, Qianqian Song, Yingjie Chen, Jing Su

Abstract: Objective: Our objective is to develop and validate TrajVis, an interactive tool that assists clinicians in using artificial intelligence (AI) models to leverage patients' longitudinal electronic medical records (EMR) for personalized precision management of chronic disease progression. Methods: We first perform requirement analysis with clinicians and data scientists to determine the visual analy… ▽ More Objective: Our objective is to develop and validate TrajVis, an interactive tool that assists clinicians in using artificial intelligence (AI) models to leverage patients' longitudinal electronic medical records (EMR) for personalized precision management of chronic disease progression. Methods: We first perform requirement analysis with clinicians and data scientists to determine the visual analytics tasks of the TrajVis system as well as its design and functionalities. A graph AI model for chronic kidney disease (CKD) trajectory inference named DEPOT is used for system development and demonstration. TrajVis is implemented as a full-stack web application with synthetic EMR data derived from the Atrium Health Wake Forest Baptist Translational Data Warehouse and the Indiana Network for Patient Care research database. A case study with a nephrologist and a user experience survey of clinicians and data scientists are conducted to evaluate the TrajVis system. Results: The TrajVis clinical information system is composed of four panels: the Patient View for demographic and clinical information, the Trajectory View to visualize the DEPOT-derived CKD trajectories in latent space, the Clinical Indicator View to elucidate longitudinal patterns of clinical features and interpret DEPOT predictions, and the Analysis View to demonstrate personal CKD progression trajectories. System evaluations suggest that TrajVis supports clinicians in summarizing clinical data, identifying individualized risk predictors, and visualizing patient disease progression trajectories, overcoming the barriers of AI implementation in healthcare. Conclusion: TrajVis bridges the gap between the fast-growing AI/ML modeling and the clinical use of such models for personalized and precision management of chronic diseases. △ Less

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2401.06431 [pdf, other]

Human-AI Collaborative Essay Scoring: A Dual-Process Framework with LLMs

Authors: Changrong Xiao, Wenxing Ma, Qingping Song, Sean Xin Xu, Kunpeng Zhang, Yufang Wang, Qi Fu

Abstract: Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass con… ▽ More Receiving timely and personalized feedback is essential for second-language learners, especially when human instructors are unavailable. This study explores the effectiveness of Large Language Models (LLMs), including both proprietary and open-source models, for Automated Essay Scoring (AES). Through extensive experiments with public and private datasets, we find that while LLMs do not surpass conventional state-of-the-art (SOTA) grading models in performance, they exhibit notable consistency, generalizability, and explainability. We propose an open-source LLM-based AES system, inspired by the dual-process theory. Our system offers accurate grading and high-quality feedback, at least comparable to that of fine-tuned proprietary LLMs, in addition to its ability to alleviate misgrading. Furthermore, we conduct human-AI co-grading experiments with both novice and expert graders. We find that our system not only automates the grading process but also enhances the performance and efficiency of human graders, particularly for essays where the model has lower confidence. These results highlight the potential of LLMs to facilitate effective human-AI collaboration in the educational context, potentially transforming learning experiences through AI-generated feedback. △ Less

Submitted 14 June, 2024; v1 submitted 12 January, 2024; originally announced January 2024.

arXiv:2401.04956 [pdf, other]

EmMixformer: Mix transformer for eye movement recognition

Authors: Huafeng Qin, Hongyu Zhu, Xin Jin, Qun Song, Mounim A. El-Yacoubi, Xinbo Gao

Abstract: Eye movement (EM) is a new highly secure biometric behavioral modality that has received increasing attention in recent years. Although deep neural networks, such as convolutional neural network (CNN), have recently achieved promising performance, current solutions fail to capture local and global temporal dependencies within eye movement data. To overcome this problem, we propose in this paper a… ▽ More Eye movement (EM) is a new highly secure biometric behavioral modality that has received increasing attention in recent years. Although deep neural networks, such as convolutional neural network (CNN), have recently achieved promising performance, current solutions fail to capture local and global temporal dependencies within eye movement data. To overcome this problem, we propose in this paper a mixed transformer termed EmMixformer to extract time and frequency domain information for eye movement recognition. To this end, we propose a mixed block consisting of three modules, transformer, attention Long short-term memory (attention LSTM), and Fourier transformer. We are the first to attempt leveraging transformer to learn long temporal dependencies within eye movement. Second, we incorporate the attention mechanism into LSTM to propose attention LSTM with the aim to learn short temporal dependencies. Third, we perform self attention in the frequency domain to learn global features. As the three modules provide complementary feature representations in terms of local and global dependencies, the proposed EmMixformer is capable of improving recognition accuracy. The experimental results on our eye movement dataset and two public eye movement datasets show that the proposed EmMixformer outperforms the state of the art by achieving the lowest verification error. △ Less

Submitted 9 May, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

arXiv:2401.04429 [pdf, other]

i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance

Authors: Haoyang Chen, Peiyan Sun, Qiyuan Song, Wanyuan Wang, Weiwei Wu, Wencan Zhang, Guanyu Gao, Yan Lyu

Abstract: Ride-hailing platforms have been facing the challenge of balancing demand and supply. Existing vehicle reposition techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming compliance with the reposition. In this paper, we consider a more realistic and driver-centric scenario where drivers have unique cruising preferences and can decide whether to take the r… ▽ More Ride-hailing platforms have been facing the challenge of balancing demand and supply. Existing vehicle reposition techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming compliance with the reposition. In this paper, we consider a more realistic and driver-centric scenario where drivers have unique cruising preferences and can decide whether to take the recommendation or not on their own. We propose i-Rebalance, a personalized vehicle reposition technique with deep reinforcement learning (DRL). i-Rebalance estimates drivers' decisions on accepting reposition recommendations through an on-field user study involving 99 real drivers. To optimize supply-demand balance and enhance preference satisfaction simultaneously, i-Rebalance has a sequential reposition strategy with dual DRL agents: Grid Agent to determine the reposition order of idle vehicles, and Vehicle Agent to provide personalized recommendations to each vehicle in the pre-defined order. This sequential learning strategy facilitates more effective policy training within a smaller action space compared to traditional joint-action methods. Evaluation of real-world trajectory data shows that i-Rebalance improves driver acceptance rate by 38.07% and total driver income by 9.97%. △ Less

Submitted 2 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

arXiv:2401.04044 [pdf, other]

FFSplit: Split Feed-Forward Network For Optimizing Accuracy-Efficiency Trade-off in Language Model Inference

Authors: Zirui Liu, Qingquan Song, Qiang Charles Xiao, Sathiya Keerthi Selvaraj, Rahul Mazumder, Aman Gupta, Xia Hu

Abstract: The large number of parameters in Pretrained Language Models enhance their performance, but also make them resource-intensive, making it challenging to deploy them on commodity hardware like a single GPU. Due to the memory and power limitations of these devices, model compression techniques are often used to decrease both the model's size and its inference latency. This usually results in a trade-… ▽ More The large number of parameters in Pretrained Language Models enhance their performance, but also make them resource-intensive, making it challenging to deploy them on commodity hardware like a single GPU. Due to the memory and power limitations of these devices, model compression techniques are often used to decrease both the model's size and its inference latency. This usually results in a trade-off between model accuracy and efficiency. Therefore, optimizing this balance is essential for effectively deploying LLMs on commodity hardware. A significant portion of the efficiency challenge is the Feed-forward network (FFN) component, which accounts for roughly $\frac{2}{3}$ total parameters and inference latency. In this paper, we first observe that only a few neurons of FFN module have large output norm for any input tokens, a.k.a. heavy hitters, while the others are sparsely triggered by different tokens. Based on this observation, we explicitly split the FFN into two parts according to the heavy hitters. We improve the efficiency-accuracy trade-off of existing compression methods by allocating more resource to FFN parts with heavy hitters. In practice, our method can reduce model size by 43.1\% and bring $1.25\sim1.56\times$ wall clock time speedup on different hardware with negligible accuracy drop. △ Less

Submitted 8 January, 2024; originally announced January 2024.

arXiv:2312.08195 [pdf, other]

Concept-centric Personalization with Large-scale Diffusion Priors

Authors: Pu Cao, Lu Yang, Feng Zhou, Tianrui Huang, Qing Song

Abstract: Despite large-scale diffusion models being highly capable of generating diverse open-world content, they still struggle to match the photorealism and fidelity of concept-specific generators. In this work, we present the task of customizing large-scale diffusion priors for specific concepts as concept-centric personalization. Our goal is to generate high-quality concept-centric images while maintai… ▽ More Despite large-scale diffusion models being highly capable of generating diverse open-world content, they still struggle to match the photorealism and fidelity of concept-specific generators. In this work, we present the task of customizing large-scale diffusion priors for specific concepts as concept-centric personalization. Our goal is to generate high-quality concept-centric images while maintaining the versatile controllability inherent to open-world models, enabling applications in diverse tasks such as concept-centric stylization and image translation. To tackle these challenges, we identify catastrophic forgetting of guidance prediction from diffusion priors as the fundamental issue. Consequently, we develop a guidance-decoupled personalization framework specifically designed to address this task. We propose Generalized Classifier-free Guidance (GCFG) as the foundational theory for our framework. This approach extends Classifier-free Guidance (CFG) to accommodate an arbitrary number of guidances, sourced from a variety of conditions and models. Employing GCFG enables us to separate conditional guidance into two distinct components: concept guidance for fidelity and control guidance for controllability. This division makes it feasible to train a specialized model for concept guidance, while ensuring both control and unconditional guidance remain intact. We then present a null-text Concept-centric Diffusion Model as a concept-specific generator to learn concept guidance without the need for text annotations. Code will be available at https://github.com/PRIV-Creation/Concept-centric-Personalization. △ Less

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.02944 [pdf, other]

An alternating peak-optimization method for optimal trajectory generation of quadrotor drones

Authors: Wytze A. B. de Vries, Ming Li, Qirui Song, Zhiyong Sun

Abstract: In this paper, we propose an alternating optimization method to address a time-optimal trajectory generation problem. Different from the existing solutions, our approach introduces a new formulation that minimizes the overall trajectory running time while maintaining the polynomial smoothness constraints and incorporating hard limits on motion derivatives to ensure feasibility. To address this pro… ▽ More In this paper, we propose an alternating optimization method to address a time-optimal trajectory generation problem. Different from the existing solutions, our approach introduces a new formulation that minimizes the overall trajectory running time while maintaining the polynomial smoothness constraints and incorporating hard limits on motion derivatives to ensure feasibility. To address this problem, an alternating peak-optimization method is developed, which splits the optimization process into two sub-optimizations: the first sub-optimization optimizes polynomial coefficients for smoothness, and the second sub-optimization adjusts the time allocated to each trajectory segment. These are alternated until a feasible minimum-time solution is found. We offer a comprehensive set of simulations and experiments to showcase the superior performance of our approach in comparison to existing methods. A collection of demonstration videos with real drone flying experiments can be accessed at https://www.youtube.com/playlist?list=PLQGtPFK17zUYkwFT-fr0a8E49R8Uq712l . △ Less

Submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.16576 [pdf, other]

Wireless Powered Metaverse: Joint Task Scheduling and Trajectory Design for Multi-Devices and Multi-UAVs

Authors: Xiaojie Wang, Jiameng Li, Zhaolong Ning, Qingyang Song, Lei Guo, Abbas Jamalipour

Abstract: To support the running of human-centric metaverse applications on mobile devices, Unmanned Aerial Vehicle (UAV)-assisted Wireless Powered Mobile Edge Computing (WPMEC) is promising to compensate for limited computational capabilities and energy supplies of mobile devices. The high-speed computational processing demands and significant energy consumption of metaverse applications require joint reso… ▽ More To support the running of human-centric metaverse applications on mobile devices, Unmanned Aerial Vehicle (UAV)-assisted Wireless Powered Mobile Edge Computing (WPMEC) is promising to compensate for limited computational capabilities and energy supplies of mobile devices. The high-speed computational processing demands and significant energy consumption of metaverse applications require joint resource scheduling of multiple devices and UAVs, but existing WPMEC solutions address either device or UAV scheduling due to the complexity of combinatorial optimization. To solve the above challenge, we propose a two-stage alternating optimization algorithm based on multi-task Deep Reinforcement Learning (DRL) to jointly allocate charging time, schedule computation tasks, and optimize trajectory of UAVs and mobile devices in a wireless powered metaverse scenario. First, considering energy constraints of both UAVs and mobile devices, we formulate an optimization problem to maximize the computation efficiency of the system. Second, we propose a heuristic algorithm to efficiently perform time allocation and charging scheduling for mobile devices. Following this, we design a multi-task DRL scheme to make charging scheduling and trajectory design decisions for UAVs. Finally, theoretical analysis and performance results demonstrate that our algorithm exhibits significant advantages over representative methods in terms of convergence speed and average computation efficiency. △ Less

Submitted 28 November, 2023; originally announced November 2023.

arXiv:2311.05866 [pdf, other]

Fair Supervised Learning with A Simple Random Sampler of Sensitive Attributes

Authors: Jinwon Sohn, Qifan Song, Guang Lin

Abstract: As the data-driven decision process becomes dominating for industrial applications, fairness-aware machine learning arouses great attention in various areas. This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning. In contrast to many existing works that critically rely on the discreteness of s… ▽ More As the data-driven decision process becomes dominating for industrial applications, fairness-aware machine learning arouses great attention in various areas. This work proposes fairness penalties learned by neural networks with a simple random sampler of sensitive attributes for non-discriminatory supervised learning. In contrast to many existing works that critically rely on the discreteness of sensitive attributes and response variables, the proposed penalty is able to handle versatile formats of the sensitive attributes, so it is more extensively applicable in practice than many existing algorithms. This penalty enables us to build a computationally efficient group-level in-processing fairness-aware training framework. Empirical evidence shows that our framework enjoys better utility and fairness measures on popular benchmark data sets than competing methods. We also theoretically characterize estimation errors and loss of utility of the proposed neural-penalized risk minimization problem. △ Less

Submitted 9 March, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.16323 [pdf, other]

Personalized Federated X -armed Bandit

Authors: Wenjie Li, Qifan Song, Jean Honorio

Abstract: In this work, we study the personalized federated $\mathcal{X}$-armed bandit problem, where the heterogeneous local objectives of the clients are optimized simultaneously in the federated learning paradigm. We propose the \texttt{PF-PNE} algorithm with a unique double elimination strategy, which safely eliminates the non-optimal regions while encouraging federated collaboration through biased but… ▽ More In this work, we study the personalized federated $\mathcal{X}$-armed bandit problem, where the heterogeneous local objectives of the clients are optimized simultaneously in the federated learning paradigm. We propose the \texttt{PF-PNE} algorithm with a unique double elimination strategy, which safely eliminates the non-optimal regions while encouraging federated collaboration through biased but effective evaluations of the local objectives. The proposed \texttt{PF-PNE} algorithm is able to optimize local objectives with arbitrary levels of heterogeneity, and its limited communications protects the confidentiality of the client-wise reward data. Our theoretical analysis shows the benefit of the proposed algorithm over single-client algorithms. Experimentally, \texttt{PF-PNE} outperforms multiple baselines on both synthetic and real life datasets. △ Less

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.16320 [pdf, other]

Enhancing Low-Precision Sampling via Stochastic Gradient Hamiltonian Monte Carlo

Authors: Ziyi Wang, Yujie Chen, Qifan Song, Ruqi Zhang

Abstract: Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and f… ▽ More Low-precision training has emerged as a promising low-cost technique to enhance the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $ε$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves quadratic improvement ($\widetilde{\mathbf{O}}\left({ε^{-2}{μ^*}^{-2}\log^2\left({ε^{-1}}\right)}\right)$) compared to the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\widetilde{\mathbf{O}}\left({ε^{-4}{λ^{*}}^{-1}\log^5\left({ε^{-1}}\right)}\right)$). Moreover, we prove that low-precision SGHMC is more robust to the quantization error compared to low-precision SGLD due to the robustness of the momentum-based update w.r.t. gradient noise. Empirically, we conduct experiments on synthetic data, and {MNIST, CIFAR-10 \& CIFAR-100} datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning. △ Less

Submitted 14 July, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.09603 [pdf, other]

B-Spine: Learning B-Spline Curve Representation for Robust and Interpretable Spinal Curvature Estimation

Authors: Hao Wang, Qiang Song, Ruofeng Yin, Rui Ma, Yizhou Yu, Yi Chang

Abstract: Spinal curvature estimation is important to the diagnosis and treatment of the scoliosis. Existing methods face several issues such as the need of expensive annotations on the vertebral landmarks and being sensitive to the image quality. It is challenging to achieve robust estimation and obtain interpretable results, especially for low-quality images which are blurry and hazy. In this paper, we pr… ▽ More Spinal curvature estimation is important to the diagnosis and treatment of the scoliosis. Existing methods face several issues such as the need of expensive annotations on the vertebral landmarks and being sensitive to the image quality. It is challenging to achieve robust estimation and obtain interpretable results, especially for low-quality images which are blurry and hazy. In this paper, we propose B-Spine, a novel deep learning pipeline to learn B-spline curve representation of the spine and estimate the Cobb angles for spinal curvature estimation from low-quality X-ray images. Given a low-quality input, a novel SegRefine network which employs the unpaired image-to-image translation is proposed to generate a high quality spine mask from the initial segmentation result. Next, a novel mask-based B-spline prediction model is proposed to predict the B-spline curve for the spine centerline. Finally, the Cobb angles are estimated by a hybrid approach which combines the curve slope analysis and a curve-based regression model. We conduct quantitative and qualitative comparisons with the representative and SOTA learning-based methods on the public AASCE2019 dataset and our new proposed CJUH-JLU dataset which contains more challenging low-quality images. The superior performance on both datasets shows our method can achieve both robustness and interpretability for spinal curvature estimation. △ Less

Submitted 14 October, 2023; originally announced October 2023.

arXiv:2309.15035 [pdf, ps, other]

On the Reduced Gröbner Bases of Blockwise Determinantal Ideals

Authors: Chenqi Mou, Qiuye Song

Abstract: Blockwise determinantal ideals are those generated by the union of all the minors of specified sizes in certain blocks of a generic matrix, and they are the natural generalization of many existing determinantal ideals like the Schubert and ladder ones. In this paper we establish several criteria to verify whether the Gröbner bases of blockwise determinantal ideals with respect to (anti-)diagonal t… ▽ More Blockwise determinantal ideals are those generated by the union of all the minors of specified sizes in certain blocks of a generic matrix, and they are the natural generalization of many existing determinantal ideals like the Schubert and ladder ones. In this paper we establish several criteria to verify whether the Gröbner bases of blockwise determinantal ideals with respect to (anti-)diagonal term orders are minimal or reduced. In particular, for Schubert determinantal ideals, while all the elusive minors form the reduced Gröbner bases when the defining permutations are vexillary, in the non-vexillary case we derive an explicit formula for computing the reduced Gröbner basis from elusive minors which avoids all algebraic operations. The fundamental properties of being normal and strong for W-characteristic sets and characteristic pairs, which are heavily connected to the reduced Gröbner bases, of Schubert determinantal ideals are also proven. △ Less

Submitted 26 September, 2023; originally announced September 2023.

MSC Class: 13P10 (Primary) 13C40; 05E14 (Secondary)

arXiv:2309.04909 [pdf, ps, other]

Bicoptor 2.0: Addressing Challenges in Probabilistic Truncation for Enhanced Privacy-Preserving Machine Learning

Authors: Lijing Zhou, Qingrui Song, Su Zhang, Ziyu Wang, Xianggui Wang, Yong Li

Abstract: This paper primarily focuses on analyzing the problems and proposing solutions for the probabilistic truncation protocol in existing PPML works from the perspectives of accuracy and efficiency. In terms of accuracy, we reveal that precision selections recommended in some of the existing works are incorrect. We conduct a thorough analysis of their open-source code and find that their errors were ma… ▽ More This paper primarily focuses on analyzing the problems and proposing solutions for the probabilistic truncation protocol in existing PPML works from the perspectives of accuracy and efficiency. In terms of accuracy, we reveal that precision selections recommended in some of the existing works are incorrect. We conduct a thorough analysis of their open-source code and find that their errors were mainly due to simplified implementation, more specifically, fixed numbers are used instead of random numbers in probabilistic truncation protocols. Based on this, we provide a detailed theoretical analysis to validate our views. We propose a solution and a precision selection guideline for future works. Regarding efficiency, we identify limitations in the state-of-the-art comparison protocol, Bicoptor's (S\&P 2023) DReLU protocol, which relies on the probabilistic truncation protocol and is heavily constrained by the security parameter to avoid errors, significantly impacting the protocol's performance. To address these challenges, we introduce the first non-interactive deterministic truncation protocol, replacing the original probabilistic truncation protocol. Additionally, we design a non-interactive modulo switch protocol to enhance the protocol's security. Finally, we provide a guideline to reduce computational and communication overhead by using only a portion of the bits of the input, i.e., the key bits, for DReLU operations based on different model parameters. With the help of key bits, the performance of our DReLU protocol is further improved. We evaluate the performance of our protocols on three GPU servers, and achieve a 10x improvement in DReLU protocol, and a 6x improvement in the ReLU protocol over the state-of-the-art work Piranha-Falcon (USENIX Sec 22). Overall, the performance of our end-to-end (E2E) privacy-preserving machine learning (PPML) inference is improved by 3-4 times. △ Less

Submitted 6 March, 2024; v1 submitted 9 September, 2023; originally announced September 2023.

Comments: 17 pages, 5 figures

arXiv:2309.01885 [pdf, other]

QuantEase: Optimization-based Quantization for Language Models

Authors: Kayhan Behdin, Ayan Acharya, Aman Gupta, Qingquan Song, Siyu Zhu, Sathiya Keerthi, Rahul Mazumder

Abstract: With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is f… ▽ More With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is framed as a discrete-structured non-convex optimization, prompting the development of algorithms rooted in Coordinate Descent (CD) techniques. These CD-based methods provide high-quality solutions to the complex non-convex layer-wise quantization problems. Notably, our CD-based approach features straightforward updates, relying solely on matrix and vector operations, circumventing the need for matrix inversion or decomposition. We also explore an outlier-aware variant of our approach, allowing for retaining significant weights (outliers) with complete precision. Our proposal attains state-of-the-art performance in terms of perplexity and zero-shot accuracy in empirical evaluations across various LLMs and datasets, with relative improvements up to 15% over methods such as GPTQ. Leveraging careful linear algebra optimizations, QuantEase can quantize models like Falcon-180B on a single NVIDIA A100 GPU in $\sim$3 hours. Particularly noteworthy is our outlier-aware algorithm's capability to achieve near or sub-3-bit quantization of LLMs with an acceptable drop in accuracy, obviating the need for non-uniform quantization or grouping techniques, improving upon methods such as SpQR by up to two times in terms of perplexity. △ Less

Submitted 1 December, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

arXiv:2309.00154 [pdf, other]

Learning From Peers: A Survey of Perception and Utilization of Online Peer Support Among Informal Dementia Caregivers

Authors: Zhijun Yin, Lauren Stratton, Qingyuan Song, Congning Ni, Lijun Song, Patricia A. Commiskey, Qingxia Chen, Monica Moreno, Sam Fazio, Bradley A. Malin

Abstract: Informal dementia caregivers are those who care for a person living with dementia (PLWD) without receiving payment (e.g., family members, friends, or other unpaid caregivers). These informal caregivers are subject to substantial mental, physical, and financial burdens. Online communities enable these caregivers to exchange caregiving strategies and communicate experiences with other caregivers who… ▽ More Informal dementia caregivers are those who care for a person living with dementia (PLWD) without receiving payment (e.g., family members, friends, or other unpaid caregivers). These informal caregivers are subject to substantial mental, physical, and financial burdens. Online communities enable these caregivers to exchange caregiving strategies and communicate experiences with other caregivers whom they generally do not know in real life. Research has demonstrated the benefits of peer support in online communities, but they are limited in focusing merely on caregivers who are already online users. In this paper, we designed and administered a survey to investigate the perception and utilization of online peer support from 140 informal dementia caregivers (with 100 online-community caregivers). Our findings show that the behavior to access any online community is only significantly associated with their belief in the value of online peer support (p = 0.006). Moreover, 33 (83%) of the 40 non-online-community caregivers had a belief score above 24, a score assigned when a neutral option is selected for each belief question. The reasons most articulated for not accessing any online community were no time to do so (14; 10%), and insufficient online information searching skills (9; 6%). Our findings suggest that online peer support is valuable, but practical strategies are needed to assist informal dementia caregivers who have limited time or searching skills. △ Less

Submitted 31 August, 2023; originally announced September 2023.

arXiv:2308.16453 [pdf, ps, other]

Listen to Minority: Encrypted Traffic Classification for Class Imbalance with Contrastive Pre-Training

Authors: Xiang Li, Juncheng Guo, Qige Song, Jiang Xie, Yafei Sang, Shuyuan Zhao, Yongzheng Zhang

Abstract: Mobile Internet has profoundly reshaped modern lifestyles in various aspects. Encrypted Traffic Classification (ETC) naturally plays a crucial role in managing mobile Internet, especially with the explosive growth of mobile apps using encrypted communication. Despite some existing learning-based ETC methods showing promising results, three-fold limitations still remain in real-world network enviro… ▽ More Mobile Internet has profoundly reshaped modern lifestyles in various aspects. Encrypted Traffic Classification (ETC) naturally plays a crucial role in managing mobile Internet, especially with the explosive growth of mobile apps using encrypted communication. Despite some existing learning-based ETC methods showing promising results, three-fold limitations still remain in real-world network environments, 1) label bias caused by traffic class imbalance, 2) traffic homogeneity caused by component sharing, and 3) training with reliance on sufficient labeled traffic. None of the existing ETC methods can address all these limitations. In this paper, we propose a novel Pre-trAining Semi-Supervised ETC framework, dubbed PASS. Our key insight is to resample the original train dataset and perform contrastive pre-training without using individual app labels directly to avoid label bias issues caused by class imbalance, while obtaining a robust feature representation to differentiate overlapping homogeneous traffic by pulling positive traffic pairs closer and pushing negative pairs away. Meanwhile, PASS designs a semi-supervised optimization strategy based on pseudo-label iteration and dynamic loss weighting algorithms in order to effectively utilize massive unlabeled traffic data and alleviate manual train dataset annotation workload. PASS outperforms state-of-the-art ETC methods and generic sampling approaches on four public datasets with significant class imbalance and traffic homogeneity, remarkably pushing the F1 of Cross-Platform215 with 1.31%, ISCX-17 with 9.12%. Furthermore, we validate the generality of the contrastive pre-training and pseudo-label iteration components of PASS, which can adaptively benefit ETC methods with diverse feature extractors. △ Less

Submitted 6 September, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

Comments: Accepted by 2023 20th Annual IEEE International Conference on Sensing, Communication, and Networking, 9 pages, 6 figures

arXiv:2307.16121 [pdf, other]

doi 10.3233/FAIA230441

Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving

Authors: Yang Lou, Qun Song, Qian Xu, Rui Tan, Jianping Wang

Abstract: Multi-modal fusion has shown initial promising results for object detection of autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertain… ▽ More Multi-modal fusion has shown initial promising results for object detection of autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertainties into the multi-modal fusion still lacks effective solutions due primarily to the uncertainty's cross-modal incomparability and distinct sensitivities to various adverse conditions. To fill this gap, this paper proposes Uncertainty-Encoded Mixture-of-Experts (UMoE) that explicitly incorporates single-modal uncertainties into LiDAR-camera fusion. UMoE uses individual expert network to process each sensor's detection result together with encoded uncertainty. Then, the expert networks' outputs are analyzed by a gating network to determine the fusion weights. The proposed UMoE module can be integrated into any proposal fusion pipeline. Evaluation shows that UMoE achieves a maximum of 10.67%, 3.17%, and 5.40% performance gain compared with the state-of-the-art proposal-level multi-modal object detectors under extreme weather, adversarial, and blinding attack scenarios. △ Less

Submitted 30 July, 2023; originally announced July 2023.

Comments: In proceedings of the 26th European Conference on Artificial Intelligence ECAI 2023. 8 pages + 2 appendix pages

arXiv:2307.16099 [pdf, other]

On Neural Network approximation of ideal adversarial attack and convergence of adversarial training

Authors: Rajdeep Haldar, Qifan Song

Abstract: Adversarial attacks are usually expressed in terms of a gradient-based operation on the input data and model, this results in heavy computations every time an attack is generated. In this work, we solidify the idea of representing adversarial attacks as a trainable function, without further gradient computation. We first motivate that the theoretical best attacks, under proper conditions, can be r… ▽ More Adversarial attacks are usually expressed in terms of a gradient-based operation on the input data and model, this results in heavy computations every time an attack is generated. In this work, we solidify the idea of representing adversarial attacks as a trainable function, without further gradient computation. We first motivate that the theoretical best attacks, under proper conditions, can be represented as smooth piece-wise functions (piece-wise Hölder functions). Then we obtain an approximation result of such functions by a neural network. Subsequently, we emulate the ideal attack process by a neural network and reduce the adversarial training to a mathematical game between an attack network and a training model (a defense network). We also obtain convergence rates of adversarial loss in terms of the sample size $n$ for adversarial training in such a setting. △ Less

Submitted 29 July, 2023; originally announced July 2023.

MSC Class: 68T99; 62G20; 49K35; 34A34

arXiv:2307.08252 [pdf, other]

Large-Scale Person Detection and Localization using Overhead Fisheye Cameras

Authors: Lu Yang, Liulei Li, Xueshi Xin, Yifan Sun, Qing Song, Wenguan Wang

Abstract: Location determination finds wide applications in daily life. Instead of existing efforts devoted to localizing tourist photos captured by perspective cameras, in this article, we focus on devising person positioning solutions using overhead fisheye cameras. Such solutions are advantageous in large field of view (FOV), low cost, anti-occlusion, and unaggressive work mode (without the necessity of… ▽ More Location determination finds wide applications in daily life. Instead of existing efforts devoted to localizing tourist photos captured by perspective cameras, in this article, we focus on devising person positioning solutions using overhead fisheye cameras. Such solutions are advantageous in large field of view (FOV), low cost, anti-occlusion, and unaggressive work mode (without the necessity of cameras carried by persons). However, related studies are quite scarce, due to the paucity of data. To stimulate research in this exciting area, we present LOAF, the first large-scale overhead fisheye dataset for person detection and localization. LOAF is built with many essential features, e.g., i) the data cover abundant diversities in scenes, human pose, density, and location; ii) it contains currently the largest number of annotated pedestrian, i.e., 457K bounding boxes with groundtruth location information; iii) the body-boxes are labeled as radius-aligned so as to fully address the positioning challenge. To approach localization, we build a fisheye person detection network, which exploits the fisheye distortions by a rotation-equivariant training strategy and predict radius-aligned human boxes end-to-end. Then, the actual locations of the detected persons are calculated by a numerical solution on the fisheye model and camera altitude data. Extensive experiments on LOAF validate the superiority of our fisheye detector w.r.t. previous methods, and show that our whole fisheye positioning solution is able to locate all persons in FOV with an accuracy of 0.5 m, within 0.1 s. △ Less

Submitted 17 July, 2023; originally announced July 2023.

Comments: ICCV 2023. Project page: https://LOAFisheye.github.io

arXiv:2306.13641 [pdf, other]

A New Paradigm for Generative Adversarial Networks based on Randomized Decision Rules

Authors: Sehwan Kim, Qifan Song, Faming Liang

Abstract: The Generative Adversarial Network (GAN) was recently introduced in the literature as a novel machine learning method for training generative models. It has many applications in statistics such as nonparametric clustering and nonparametric conditional independence tests. However, training the GAN is notoriously difficult due to the issue of mode collapse, which refers to the lack of diversity amon… ▽ More The Generative Adversarial Network (GAN) was recently introduced in the literature as a novel machine learning method for training generative models. It has many applications in statistics such as nonparametric clustering and nonparametric conditional independence tests. However, training the GAN is notoriously difficult due to the issue of mode collapse, which refers to the lack of diversity among generated data. In this paper, we identify the reasons why the GAN suffers from this issue, and to address it, we propose a new formulation for the GAN based on randomized decision rules. In the new formulation, the discriminator converges to a fixed point while the generator converges to a distribution at the Nash equilibrium. We propose to train the GAN by an empirical Bayes-like method by treating the discriminator as a hyper-parameter of the posterior distribution of the generator. Specifically, we simulate generators from its posterior distribution conditioned on the discriminator using a stochastic gradient Markov chain Monte Carlo (MCMC) algorithm, and update the discriminator using stochastic gradient descent along with simulations of the generators. We establish convergence of the proposed method to the Nash equilibrium. Apart from image generation, we apply the proposed method to nonparametric clustering and nonparametric conditional independence tests. A portion of the numerical results is presented in the supplementary material. △ Less

Submitted 23 June, 2023; originally announced June 2023.

arXiv:2306.02283 [pdf, other]

Matrix Completion from General Deterministic Sampling Patterns

Authors: Hanbyul Lee, Rahul Mazumder, Qifan Song, Jean Honorio

Abstract: Most of the existing works on provable guarantees for low-rank matrix completion algorithms rely on some unrealistic assumptions such that matrix entries are sampled randomly or the sampling pattern has a specific structure. In this work, we establish theoretical guarantee for the exact and approximate low-rank matrix completion problems which can be applied to any deterministic sampling schemes.… ▽ More Most of the existing works on provable guarantees for low-rank matrix completion algorithms rely on some unrealistic assumptions such that matrix entries are sampled randomly or the sampling pattern has a specific structure. In this work, we establish theoretical guarantee for the exact and approximate low-rank matrix completion problems which can be applied to any deterministic sampling schemes. For this, we introduce a graph having observed entries as its edge set, and investigate its graph properties involving the performance of the standard constrained nuclear norm minimization algorithm. We theoretically and experimentally show that the algorithm can be successful as the observation graph is well-connected and has similar node degrees. Our result can be viewed as an extension of the works by Bhojanapalli and Jain [2014] and Burnwal and Vidyasagar [2020], in which the node degrees of the observation graph were assumed to be the same. In particular, our theory significantly improves their results when the underlying matrix is symmetric. △ Less

Submitted 4 June, 2023; originally announced June 2023.

arXiv:2305.14146 [pdf, other]

Industry Practices for Challenging Autonomous Driving Systems with Critical Scenarios

Authors: Qunying Song, Emelie Engström, Per Runeson

Abstract: Testing autonomous driving systems for safety and reliability is extremely complex. A primary challenge is identifying the relevant test scenarios, especially the critical ones that may expose hazards or risks of harm to autonomous vehicles and other road users. There are several proposed methods and tools for critical scenario identification, while the industry practices, such as the selection, i… ▽ More Testing autonomous driving systems for safety and reliability is extremely complex. A primary challenge is identifying the relevant test scenarios, especially the critical ones that may expose hazards or risks of harm to autonomous vehicles and other road users. There are several proposed methods and tools for critical scenario identification, while the industry practices, such as the selection, implementation, and limitations of the approaches, are not well understood. In this study, we conducted 10 interviews with 13 interviewees from 7 companies in autonomous driving in Sweden. We used thematic modeling to analyse and synthesize the interview data. We found there are little joint efforts in the industry to explore different approaches and tools, and every approach has its own limitations and weaknesses. To that end, we recommend combining different approaches available, collaborating among different stakeholders, and continuously learning the field of critical scenario identification and testing. The contributions of our study are the exploration and synthesis of the industry practices and related challenges for critical scenario identification and testing, and the potential increase of the industry relevance for future studies in related topics. △ Less

Submitted 23 May, 2023; originally announced May 2023.

Comments: 29 pages, 3 figures, submitted to a journal

arXiv:2305.08721 [pdf, other]

Learning More Discriminative Local Descriptors for Few-shot Learning

Authors: Qijun Song, Siyun Zhou, Liwei Xu

Abstract: Few-shot learning for image classification comes up as a hot topic in computer vision, which aims at fast learning from a limited number of labeled images and generalize over the new tasks. In this paper, motivated by the idea of Fisher Score, we propose a Discriminative Local Descriptors Attention (DLDA) model that adaptively selects the representative local descriptors and does not introduce any… ▽ More Few-shot learning for image classification comes up as a hot topic in computer vision, which aims at fast learning from a limited number of labeled images and generalize over the new tasks. In this paper, motivated by the idea of Fisher Score, we propose a Discriminative Local Descriptors Attention (DLDA) model that adaptively selects the representative local descriptors and does not introduce any additional parameters, while most of the existing local descriptors based methods utilize the neural networks that inevitably involve the tedious parameter tuning. Moreover, we modify the traditional $k$-NN classification model by adjusting the weights of the $k$ nearest neighbors according to their distances from the query point. Experiments on four benchmark datasets show that our method not only achieves higher accuracy compared with the state-of-art approaches for few-shot learning, but also possesses lower sensitivity to the choices of $k$. △ Less

Submitted 15 May, 2023; originally announced May 2023.

arXiv:2305.08541 [pdf, other]

Ripple sparse self-attention for monaural speech enhancement

Authors: Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, Haizhou Li

Abstract: The use of Transformer represents a recent success in speech enhancement. However, as its core component, self-attention suffers from quadratic complexity, which is computationally prohibited for long speech recordings. Moreover, it allows each time frame to attend to all time frames, neglecting the strong local correlations of speech signals. This study presents a simple yet effective sparse self… ▽ More The use of Transformer represents a recent success in speech enhancement. However, as its core component, self-attention suffers from quadratic complexity, which is computationally prohibited for long speech recordings. Moreover, it allows each time frame to attend to all time frames, neglecting the strong local correlations of speech signals. This study presents a simple yet effective sparse self-attention for speech enhancement, called ripple attention, which simultaneously performs fine- and coarse-grained modeling for local and global dependencies, respectively. Specifically, we employ local band attention to enable each frame to attend to its closest neighbor frames in a window at fine granularity, while employing dilated attention outside the window to model the global dependencies at a coarse granularity. We evaluate the efficacy of our ripple attention for speech enhancement on two commonly used training objectives. Extensive experimental results consistently confirm the superior performance of the ripple attention design over standard full self-attention, blockwise attention, and dual-path attention (Sep-Former) in terms of speech quality and intelligibility. △ Less

Submitted 15 May, 2023; originally announced May 2023.

Comments: 5 pages, ICASSP 2023 published

arXiv:2305.05838 [pdf, other]

Generative Steganographic Flow

Authors: Ping Wei, Ge Luo, Qi Song, Xinpeng Zhang, Zhenxing Qian, Sheng Li

Abstract: Generative steganography (GS) is a new data hiding manner, featuring direct generation of stego media from secret data. Existing GS methods are generally criticized for their poor performances. In this paper, we propose a novel flow based GS approach -- Generative Steganographic Flow (GSF), which provides direct generation of stego images without cover image. We take the stego image generation and… ▽ More Generative steganography (GS) is a new data hiding manner, featuring direct generation of stego media from secret data. Existing GS methods are generally criticized for their poor performances. In this paper, we propose a novel flow based GS approach -- Generative Steganographic Flow (GSF), which provides direct generation of stego images without cover image. We take the stego image generation and secret data recovery process as an invertible transformation, and build a reversible bijective mapping between input secret data and generated stego images. In the forward mapping, secret data is hidden in the input latent of Glow model to generate stego images. By reversing the mapping, hidden data can be extracted exactly from generated stego images. Furthermore, we propose a novel latent optimization strategy to improve the fidelity of stego images. Experimental results show our proposed GSF has far better performances than SOTA works. △ Less

Submitted 9 May, 2023; originally announced May 2023.

Comments: The accepted paper in ICME 2022

arXiv:2303.15878 [pdf, ps, other]

Multidimensional Resource Fragmentation-Aware Virtual Network Embedding in MEC Systems Interconnected by Metro Optical Networks

Authors: Yingying Guan, Qingyang Song, Weijing Qi, Ke Li, Lei Guo, Abbas Jamalipour

Abstract: The increasing demand for diverse emerging applications has resulted in the interconnection of multi-access edge computing (MEC) systems via metro optical networks. To cater to these diverse applications, network slicing has become a popular tool for creating specialized virtual networks. However, resource fragmentation caused by uneven utilization of multidimensional resources can lead to reduced… ▽ More The increasing demand for diverse emerging applications has resulted in the interconnection of multi-access edge computing (MEC) systems via metro optical networks. To cater to these diverse applications, network slicing has become a popular tool for creating specialized virtual networks. However, resource fragmentation caused by uneven utilization of multidimensional resources can lead to reduced utilization of limited edge resources. To tackle this issue, this paper focuses on addressing the multidimensional resource fragmentation problem in virtual network embedding (VNE) in MEC systems with the aim of maximizing the profit of an infrastructure provider (InP). The VNE problem in MEC systems is transformed into a bilevel optimization problem, taking into account the interdependence between virtual node embedding (VNoE) and virtual link embedding (VLiE). To solve this problem, we propose a nested bilevel optimization approach named BiVNE. The VNoE is solved using the ant colony system (ACS) in the upper level, while the VLiE is solved using a combination of a shortest path algorithm and an exact-fit spectrum slot allocation method in the lower level. Evaluation results show that the BiVNE algorithm can effectively enhance the profit of the InP by increasing the acceptance ratio and avoiding resource fragmentation simultaneously. △ Less

Submitted 28 March, 2023; originally announced March 2023.

arXiv:2303.06548 [pdf, other]

CoT-MISR:Marrying Convolution and Transformer for Multi-Image Super-Resolution

Authors: Mingming Xiu, Yang Nie, Qing Song, Chun Liu

Abstract: As a method of image restoration, image super-resolution has been extensively studied at first. How to transform a low-resolution image to restore its high-resolution image information is a problem that researchers have been exploring. In the early physical transformation methods, the high-resolution pictures generated by these methods always have a serious problem of missing information, and the… ▽ More As a method of image restoration, image super-resolution has been extensively studied at first. How to transform a low-resolution image to restore its high-resolution image information is a problem that researchers have been exploring. In the early physical transformation methods, the high-resolution pictures generated by these methods always have a serious problem of missing information, and the edges and details can not be well recovered. With the development of hardware technology and mathematics, people begin to use in-depth learning methods for image super-resolution tasks, from direct in-depth learning models, residual channel attention networks, bi-directional suppression networks, to tr networks with transformer network modules, which have gradually achieved good results. In the research of multi-graph super-resolution, thanks to the establishment of multi-graph super-resolution dataset, we have experienced the evolution from convolution model to transformer model, and the quality of super-resolution has been continuously improved. However, we find that neither pure convolution nor pure tr network can make good use of low-resolution image information. Based on this, we propose a new end-to-end CoT-MISR network. CoT-MISR network makes up for local and global information by using the advantages of convolution and tr. The validation of dataset under equal parameters shows that our CoT-MISR network has reached the optimal score index. △ Less

Submitted 11 March, 2023; originally announced March 2023.

arXiv:2303.04030 [pdf, other]

PyXAB -- A Python Library for $\mathcal{X}$-Armed Bandit and Online Blackbox Optimization Algorithms

Authors: Wenjie Li, Haoze Li, Jean Honorio, Qifan Song

Abstract: We introduce a Python open-source library for $\mathcal{X}$-armed bandit and online blackbox optimization named PyXAB. PyXAB contains the implementations for more than 10 $\mathcal{X}$-armed bandit algorithms, such as HOO, StoSOO, HCT, and the most recent works GPO and VHCT. PyXAB also provides the most commonly-used synthetic objectives to evaluate the performance of different algorithms and the… ▽ More We introduce a Python open-source library for $\mathcal{X}$-armed bandit and online blackbox optimization named PyXAB. PyXAB contains the implementations for more than 10 $\mathcal{X}$-armed bandit algorithms, such as HOO, StoSOO, HCT, and the most recent works GPO and VHCT. PyXAB also provides the most commonly-used synthetic objectives to evaluate the performance of different algorithms and the various choices of the hierarchical partitions on the parameter space. The online documentation for PyXAB includes clear instructions for installation, straight-forward examples, detailed feature descriptions, and a complete reference of the API. PyXAB is released under the MIT license in order to encourage both academic and industrial usage. The library can be directly installed from PyPI with its source code available at https://github.com/WilliamLwj/PyXAB △ Less

Submitted 7 March, 2023; originally announced March 2023.

arXiv:2303.03166 [pdf, other]

Faster Learning of Temporal Action Proposal via Sparse Multilevel Boundary Generator

Authors: Qing Song, Yang Zhou, Mengjie Hu, Chun Liu

Abstract: Temporal action localization in videos presents significant challenges in the field of computer vision. While the boundary-sensitive method has been widely adopted, its limitations include incomplete use of intermediate and global information, as well as an inefficient proposal feature generator. To address these challenges, we propose a novel framework, Sparse Multilevel Boundary Generator (SMBG)… ▽ More Temporal action localization in videos presents significant challenges in the field of computer vision. While the boundary-sensitive method has been widely adopted, its limitations include incomplete use of intermediate and global information, as well as an inefficient proposal feature generator. To address these challenges, we propose a novel framework, Sparse Multilevel Boundary Generator (SMBG), which enhances the boundary-sensitive method with boundary classification and action completeness regression. SMBG features a multi-level boundary module that enables faster processing by gathering boundary information at different lengths. Additionally, we introduce a sparse extraction confidence head that distinguishes information inside and outside the action, further optimizing the proposal feature generator. To improve the synergy between multiple branches and balance positive and negative samples, we propose a global guidance loss. Our method is evaluated on two popular benchmarks, ActivityNet-1.3 and THUMOS14, and is shown to achieve state-of-the-art performance, with a better inference speed (2.47xBSN++, 2.12xDBG). These results demonstrate that SMBG provides a more efficient and simple solution for generating temporal action proposals. Our proposed framework has the potential to advance the field of computer vision and enhance the accuracy and speed of temporal action localization in video analysis.The code and models are made available at \url{https://github.com/zhouyang-001/SMBG-for-temporal-action-proposal}. △ Less

Submitted 6 March, 2023; originally announced March 2023.

Comments: 18 pages, 5 figures

arXiv:2302.10827 [pdf, other]

doi 10.1145/3544548.3581082

AutoML in The Wild: Obstacles, Workarounds, and Expectations

Authors: Yuan Sun, Qiurong Song, Xinning Gui, Fenglong Ma, Ting Wang

Abstract: Automated machine learning (AutoML) is envisioned to make ML techniques accessible to ordinary users. Recent work has investigated the role of humans in enhancing AutoML functionality throughout a standard ML workflow. However, it is also critical to understand how users adopt existing AutoML solutions in complex, real-world settings from a holistic perspective. To fill this gap, this study conduc… ▽ More Automated machine learning (AutoML) is envisioned to make ML techniques accessible to ordinary users. Recent work has investigated the role of humans in enhancing AutoML functionality throughout a standard ML workflow. However, it is also critical to understand how users adopt existing AutoML solutions in complex, real-world settings from a holistic perspective. To fill this gap, this study conducted semi-structured interviews of AutoML users (N=19) focusing on understanding (1) the limitations of AutoML encountered by users in their real-world practices, (2) the strategies users adopt to cope with such limitations, and (3) how the limitations and workarounds impact their use of AutoML. Our findings reveal that users actively exercise user agency to overcome three major challenges arising from customizability, transparency, and privacy. Furthermore, users make cautious decisions about whether and how to apply AutoML on a case-by-case basis. Finally, we derive design implications for developing future AutoML solutions. △ Less

Submitted 3 April, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI'23), April 23-28, 2023, Hamburg, Germany

arXiv:2302.09693 [pdf, other]

mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization

Authors: Kayhan Behdin, Qingquan Song, Aman Gupta, Sathiya Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, David Durfee, Rahul Mazumder

Abstract: Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as… ▽ More Modern deep learning models are over-parameterized, where different optima can result in widely varying generalization performance. The Sharpness-Aware Minimization (SAM) technique modifies the fundamental loss function that steers gradient descent methods toward flatter minima, which are believed to exhibit enhanced generalization prowess. Our study delves into a specific variant of SAM known as micro-batch SAM (mSAM). This variation involves aggregating updates derived from adversarial perturbations across multiple shards (micro-batches) of a mini-batch during training. We extend a recently developed and well-studied general framework for flatness analysis to theoretically show that SAM achieves flatter minima than SGD, and mSAM achieves even flatter minima than SAM. We provide a thorough empirical evaluation of various image classification and natural language processing tasks to substantiate this theoretical advancement. We also show that contrary to previous work, mSAM can be implemented in a flexible and parallelizable manner without significantly increasing computational costs. Our implementation of mSAM yields superior generalization performance across a wide range of tasks compared to SAM, further supporting our theoretical framework. △ Less

Submitted 30 September, 2023; v1 submitted 19 February, 2023; originally announced February 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2212.04343

Showing 1–50 of 165 results for author: Song, Q