RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization

Authors: Mingshu Zhao, Yi Luo, Yong Ouyang

Abstract: In the realm of resource-constrained mobile vision tasks, the pursuit of efficiency and performance consistently drives innovation in lightweight Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). While ViTs excel at capturing global context through self-attention mechanisms, their deployment in resource-limited environments is hindered by computational complexity and latency. Conversely, lightweight CNNs are favored for their parameter efficiency and low latency. This study investigates the complementary advantages of CNNs and ViTs to develop a versatile vision backbone tailored for resource-constrained applications. We introduce RepNeXt, a novel model series that integrates multi-scale feature representations and incorporates both serial and parallel structural reparameterization (SRP) to enhance network depth and width without compromising inference speed. Extensive experiments demonstrate RepNeXt's superiority over current leading lightweight CNNs and ViTs, providing advantageous latency across various vision benchmarks. RepNeXt-M4 matches RepViT-M1.5's 82.3% accuracy on ImageNet within 1.5ms on an iPhone 12, outperforms its AP$^{box}$ by 1.1 on MS-COCO, and reduces parameters by 0.7M. Code and models are available at https://github.com/suous/RepNeXt.
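
To illustrate what parallel structural reparameterization means in general, the following minimal sketch folds three training-time branches (a 3x3 convolution, a 1x1 convolution and an identity shortcut) into a single 3x3 kernel that gives the same output at inference. It is a generic single-channel illustration without batch normalization, not RepNeXt's actual block design; the helper names `conv2d_same` and `merge_parallel_branches` and all numbers are hypothetical.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (output has the input's size)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def merge_parallel_branches(k3, k1):
    """Fold a 3x3 branch, a 1x1 branch and an identity shortcut into one 3x3 kernel."""
    merged = [row[:] for row in k3]
    # The 1x1 weight and the identity (weight 1.0) both act on the centre tap.
    merged[1][1] += k1[0][0] + 1.0
    return merged


# Hypothetical toy input and weights.
image = [[1.0, 2.0, 0.0, -1.0],
         [0.5, -2.0, 3.0, 1.0],
         [4.0, 0.0, 1.5, 2.0]]
k3 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1], [0.2, 0.1, -0.3]]
k1 = [[0.7]]

# Training-time form: three parallel branches, summed.
b3 = conv2d_same(image, k3)
b1 = conv2d_same(image, k1)
multi_branch = [[b3[i][j] + b1[i][j] + image[i][j] for j in range(4)] for i in range(3)]

# Inference-time form: one convolution with the merged kernel.
single_branch = conv2d_same(image, merge_parallel_branches(k3, k1))

max_diff = max(abs(multi_branch[i][j] - single_branch[i][j])
               for i in range(3) for j in range(4))
print(max_diff < 1e-9)
```

Because convolution is linear in its kernel, the merge is exact up to floating-point rounding, which is why the extra branches add no inference cost.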

Submitted 23 June, 2024; originally announced June 2024.

Comments: Tech report

arXiv:2406.15699 [pdf, other]

Self-Supervised Alignment Learning for Medical Image Segmentation

Authors: Haofeng Li, Yiming Ouyang, Xiang Wan

Abstract: Recently, self-supervised learning (SSL) methods have been used in pre-training the segmentation models for 2D and 3D medical images. Most of these methods are based on reconstruction, contrastive learning and consistency regularization. However, the spatial correspondence of 2D slices from a 3D medical image has not been fully exploited. In this paper, we propose a novel self-supervised alignment learning framework to pre-train the neural network for medical image segmentation. The proposed framework consists of a new local alignment loss and a global positional loss. We observe that in the same 3D scan, two close 2D slices usually contain similar anatomic structures. Thus, the local alignment loss is proposed to make the pixel-level features of matched structures close to each other. Experimental results show that the proposed alignment learning is competitive with existing self-supervised pre-training approaches on CT and MRI datasets, under the setting of limited annotations.

Submitted 21 June, 2024; originally announced June 2024.

Comments: Accepted by (ISBI 2024) 2024 IEEE International Symposium on Biomedical Imaging

arXiv:2406.13960 [pdf, other]

Evolving to be Your Soulmate: Personalized Dialogue Agents with Dynamically Adapted Personas

Authors: Yi Cheng, Wenge Liu, Kaishuai Xu, Wenjun Hou, Yi Ouyang, Chak Tou Leong, Xian Wu, Yefeng Zheng

Abstract: Previous research on persona-based dialogue agents typically presets the agent's persona before deployment, which remains static thereafter. In this paper, we take a step further and explore a new paradigm called Self-evolving Personalized Dialogue Agents (SPDA), where the agent continuously evolves during the conversation to better align with the user's anticipation by dynamically adapting its persona. This paradigm could enable better personalization for each user, but also introduces unique challenges, which mainly lie in the process of persona adaptation. Two key issues include how to achieve persona alignment with the user and how to ensure smooth transition in the adaptation process. To address them, we propose a novel framework that refines the persona at hierarchical levels to progressively align better with the user in a controllable way. Experiments show that integrating the personas adapted by our framework consistently enhances personalization and overall dialogue performance across various base systems.

Submitted 19 June, 2024; originally announced June 2024.

Comments: Work in progress

arXiv:2406.12174 [pdf, other]

Expected Bipartite Matching Distance in A $D$-dimensional $L^p$ Space: Approximate Closed-form Formulas and Applications to Mobility Services

Authors: Shiyu Shen, Yuhui Zhai, Yanfeng Ouyang

Abstract: Although many well-known algorithms can solve the bipartite matching problem instance efficiently, it remains an open question how one could estimate the expected optimal matching distance for arbitrary numbers of randomly distributed vertices in a $D$-dimensional $L^p$ space (referred to as a random bipartite matching problem, or RBMP). This paper proposes an analytical model with closed-form formulas (without statistical curve-fitting) that estimate both the probability distribution and expectation of the optimal matching distance of RBMP. Simpler asymptotic approximations of the formulas are also developed for some special cases. A series of Monte-Carlo simulation experiments are conducted to verify the accuracy of the proposed formulas under varying conditions. These proposed distance estimates could be key for strategic performance evaluation and resource planning in a wide variety of application contexts. To illustrate their usefulness, we focus on mobility service systems where matches must be made between customers and service vehicles that are randomly distributed over time and space. We show how the proposed distance formulas provide a theoretical foundation for the empirically assumed Cobb-Douglas matching function for taxi systems, and reveal conditions under which the matching function can be suitable. Our formulas can also be easily incorporated into optimization models to select taxi operation strategies (e.g., whether newly arriving customers shall be instantly matched or pooled into a batch for matching). Agent-based simulations are conducted to verify the predicted performance of the demand pooling strategy for two types of e-hailing taxi systems. The results not only demonstrate the accuracy of the proposed model estimates under various service conditions, but also offer valuable managerial insights for service operators to optimize their strategies.
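
As a rough illustration of the kind of Monte-Carlo experiment mentioned above, the sketch below estimates the expected optimal matching distance for a small RBMP instance. It makes several assumptions not stated in the abstract: both sides have the same number of points, points are uniform in the unit hypercube, the reported quantity is the mean distance per matched pair, and the optimum is found by brute force (only practical for a handful of points). All function names are hypothetical.

```python
import itertools
import random


def lp_distance(a, b, p):
    """L^p distance between two points given as coordinate lists."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def optimal_matching_mean_distance(customers, vehicles, p):
    """Mean distance per pair in the minimum-total-distance perfect matching (brute force)."""
    n = len(customers)
    best_total = min(
        sum(lp_distance(customers[i], vehicles[perm[i]], p) for i in range(n))
        for perm in itertools.permutations(range(n))
    )
    return best_total / n


def estimate_expected_distance(n, dim, p, trials, seed=0):
    """Average the optimal mean matching distance over random instances."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        customers = [[rng.random() for _ in range(dim)] for _ in range(n)]
        vehicles = [[rng.random() for _ in range(dim)] for _ in range(n)]
        total += optimal_matching_mean_distance(customers, vehicles, p)
    return total / trials


# One customer and one vehicle versus five of each, in a 2-dimensional L^2 space.
sparse = estimate_expected_distance(n=1, dim=2, p=2, trials=300)
dense = estimate_expected_distance(n=5, dim=2, p=2, trials=300)
print(sparse, dense)
```

With more points on each side, each customer tends to have a closer available vehicle, so the dense estimate should come out smaller than the sparse one; a closed-form formula would predict that trend without running any simulation.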

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10870 [pdf, other]

COOL: Comprehensive Knowledge Enhanced Prompt Learning for Domain Adaptive Few-shot Fake News Detection

Authors: Yi Ouyang, Peng Wu, Li Pan

Abstract: Most Fake News Detection (FND) methods struggle with data scarcity in emerging news domains. Recently, prompt learning based on Pre-trained Language Models (PLM) has emerged as a promising approach in domain adaptive few-shot learning, since it greatly reduces the need for labeled data by bridging the gap between pre-training and downstream tasks. Furthermore, external knowledge is also helpful in verifying emerging news, as emerging news often involves timely knowledge that may not be contained in the PLM's outdated prior knowledge. To this end, we propose COOL, a Comprehensive knOwledge enhanced prOmpt Learning method for domain adaptive few-shot FND. Specifically, we propose a comprehensive knowledge extraction module to extract both structured and unstructured knowledge that are positively or negatively correlated with news from external sources, and adopt an adversarial contrastive enhanced hybrid prompt learning strategy to model the domain-invariant news-knowledge interaction pattern for FND. Experimental results demonstrate the superiority of COOL over various state-of-the-art methods.

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.05975 [pdf, ps, other]

Divisibility of class numbers of quadratic fields and a conjecture of Iizuka

Authors: Yi Ouyang, Qimin Song

Abstract: Assume $x,\ y,\ n$ are positive integers and $n$ is odd. In this note, we show that the class number of the imaginary quadratic field $\mathbb{Q}(\sqrt{x^{2}-y^{n}})$ is divisible by $n$ for fixed $x, n$ if $\gcd(2x,y)=1$ and $y>C$ where $C$ is a constant depending only on $x$ and $n$. Based on this result, for any odd integer $n$ and any positive integer $m$, we construct an infinite family of $m+1$ successive imaginary quadratic fields $\mathbb{Q}(\sqrt{d})$, $\mathbb{Q}(\sqrt{d+1^{2}})$, $\cdots$, $\mathbb{Q}(\sqrt{d+m^{2}})$ $(d\in \mathbb{Z})$ whose class numbers are all divisible by $n$.
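
A small numerical illustration of the divisibility statement, under stated assumptions: the sketch computes the class number of an imaginary quadratic field by counting reduced primitive binary quadratic forms of its fundamental discriminant (a standard textbook method, not the paper's argument). The sample values $x=1$, $y=3$, $n=3$ are chosen here for illustration; the abstract does not give the constant $C$, so this is one instance where the divisibility happens to hold, not a check of the threshold.

```python
from math import gcd, isqrt


def squarefree_part(m):
    """Squarefree part of a nonzero integer m, keeping its sign."""
    sign = -1 if m < 0 else 1
    m = abs(m)
    result = 1
    f = 2
    while f * f <= m:
        exponent = 0
        while m % f == 0:
            m //= f
            exponent += 1
        if exponent % 2 == 1:
            result *= f
        f += 1
    return sign * result * m  # any leftover m is 1 or a single prime


def fundamental_discriminant(d):
    """Discriminant of Q(sqrt(d)) for squarefree d."""
    return d if d % 4 == 1 else 4 * d


def class_number(disc):
    """Count reduced primitive forms (a, b, c) with b^2 - 4ac = disc < 0."""
    count = 0
    for a in range(1, isqrt(-disc // 3) + 1):
        for b in range(-a + 1, a + 1):          # reduced forms need -a < b <= a
            if (b * b - disc) % (4 * a) != 0:
                continue
            c = (b * b - disc) // (4 * a)
            if c < a or (a == c and b < 0):     # need a <= c, and b >= 0 when a == c
                continue
            if gcd(a, gcd(abs(b), c)) == 1:
                count += 1
    return count


x, y, n = 1, 3, 3                               # gcd(2x, y) = 1 and n is odd
d = squarefree_part(x * x - y ** n)             # 1 - 27 = -26
h = class_number(fundamental_discriminant(d))   # discriminant -104
print(d, h, h % n == 0)
```

For these values the field is $\mathbb{Q}(\sqrt{-26})$ with discriminant $-104$; its class number is 6, which is divisible by $n=3$.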

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2405.16876 [pdf, other]

Transfer Learning for Diffusion Models

Authors: Yidong Ouyang, Liyan Xie, Hongyuan Zha, Guang Cheng

Abstract: Diffusion models, a specific type of generative model, have achieved unprecedented performance in recent years and consistently produce high-quality synthetic samples. A critical prerequisite for their notable success lies in the presence of a substantial number of training samples, which can be impractical in real-world applications due to high collection costs or associated risks. Consequently, various finetuning and regularization approaches have been proposed to transfer knowledge from existing pre-trained models to specific target domains with limited data. This paper introduces the Transfer Guided Diffusion Process (TGDP), a novel approach distinct from conventional finetuning and regularization methods. We prove that the optimal diffusion model for the target domain integrates pre-trained diffusion models on the source domain with additional guidance from a domain classifier. We further extend TGDP to a conditional version for modeling the joint distribution of data and its corresponding labels, together with two additional regularization terms to enhance the model performance. We validate the effectiveness of TGDP on Gaussian mixture simulations and on real electrocardiogram (ECG) datasets.

Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

Comments: 24 pages

arXiv:2405.14391 [pdf, other]

Explainable Few-shot Knowledge Tracing

Authors: Haoxuan Li, Jifan Yu, Yuanxin Ouyang, Zhuang Liu, Wenge Rong, Juanzi Li, Zhang Xiong

Abstract: Knowledge tracing (KT), aiming to mine students' mastery of knowledge by their exercise records and predict their performance on future test questions, is a critical task in educational assessment. While researchers achieved tremendous success with the rapid development of deep learning techniques, current knowledge tracing tasks fall short of real-world teaching scenarios. Relying heavily on extensive student data and solely predicting numerical performances differs from the settings where teachers assess students' knowledge state from limited practices and provide explanatory feedback. To fill this gap, we explore a new task formulation: Explainable Few-shot Knowledge Tracing. By leveraging the powerful reasoning and generation abilities of large language models (LLMs), we then propose a cognition-guided framework that can track the student knowledge from a few student records while providing natural language explanations. Experimental results from three widely used datasets show that LLMs can perform comparably or superior to competitive deep knowledge tracing methods. We also discuss potential directions and call for future improvements in relevant topics.

Submitted 25 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12260 [pdf, ps, other]

On an upper bound of the set of copulas with a given curvilinear section

Authors: Yao Ouyang, Yonghui Sun, Hua-Peng Zhang

Abstract: The characterizations when two natural upper bounds of the set of copulas with a given diagonal section are copulas have been well studied in the literature. Given a curvilinear section, however, there is only a partial result concerning the characterization when a natural upper bound of the set of copulas is a copula. In this paper, we completely solve the characterization problem for this natural upper bound to be a copula in the curvilinear case.

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2405.05013 [pdf, ps, other]

doi 10.1093/mnras/stae1200

Potential Surface Ice Distribution on Close-in Terrestrial Exoplanets around M dwarfs

Authors: Yueyun Ouyang, Feng Ding

Abstract: Previous studies suggested that surface ice could be distributed on close-in terrestrial exoplanets around M-dwarfs if heat redistribution on the planets is very inefficient. In general, orbital and atmospheric parameters play an important role in the climate on terrestrial planets, including the cold-trap region where the permanent surface water reservoir can potentially be distributed. Here, we develop a simple coupled land-atmosphere model to explore the potential surface ice distribution on close-in terrestrial planets with various orbital and atmospheric parameters, assuming that the planets are airless or have a thin N$_2$ atmosphere. We find that the most significant factors in deciding the surface cold trap region are the spin-orbit ratio and obliquity. The incident stellar flux and the surface pressure play a limited role in the thin N$_2$ simulations for incident flux smaller than Mercury's and surface pressure lower than 10$^4$ Pa. Our result illustrates the possible distribution of surface ice on arid terrestrial planets and can help to understand the climate of these exoplanets.

Submitted 8 May, 2024; originally announced May 2024.

Comments: Accepted at Monthly Notices of the Royal Astronomical Society

arXiv:2404.18620 [pdf, other]

FlexiFilm: Long Video Generation with Flexible Conditions

Authors: Yichen Ouyang, Jianhao Yuan, Hao Zhao, Gaoang Wang, Bo Zhao

Abstract: Generating long and consistent videos has emerged as a significant yet challenging problem. While most existing diffusion-based video generation models, derived from image generation models, demonstrate promising performance in generating short videos, their simple conditioning mechanism and sampling strategy, originally designed for image generation, cause severe performance degradation when adapted to long video generation. This results in prominent temporal inconsistency and overexposure. Thus, in this work, we introduce FlexiFilm, a new diffusion model tailored for long video generation. Our framework incorporates a temporal conditioner to establish a more consistent relationship between generation and multi-modal conditions, and a resampling strategy to tackle overexposure. Empirical results demonstrate FlexiFilm generates long and consistent videos, each over 30 seconds in length, outperforming competitors in qualitative and quantitative analyses. Project page: https://y-ichen.github.io/FlexiFilm-Page/

Submitted 29 April, 2024; originally announced April 2024.

Comments: 9 pages, 9 figures

arXiv:2404.05962 [pdf, other]

Wasserstein Dependent Graph Attention Network for Collaborative Filtering with Uncertainty

Authors: Haoxuan Li, Yuanxin Ouyang, Zhuang Liu, Wenge Rong, Zhang Xiong

Abstract: Collaborative filtering (CF) is an essential technique in recommender systems that provides personalized recommendations by only leveraging user-item interactions. However, most CF methods represent users and items as fixed points in the latent space, lacking the ability to capture uncertainty. While probabilistic embeddings have been proposed to integrate uncertainty, they suffer from several limitations when introduced to graph-based recommender systems. Graph convolutional network framework would confuse the semantics of uncertainty in the nodes, and similarity measured by Kullback-Leibler (KL) divergence suffers from degradation problem and demands an exponential number of samples. To address these challenges, we propose a novel approach, called the Wasserstein dependent Graph Attention network (W-GAT), for collaborative filtering with uncertainty. We utilize graph attention network and Wasserstein distance to learn Gaussian embedding for each user and item. Additionally, our method incorporates Wasserstein-dependent mutual information to further increase the similarity between positive pairs. Experimental results on three benchmark datasets show the superiority of W-GAT compared to several representative baselines. Extensive experimental analysis validates the effectiveness of W-GAT in capturing uncertainty by modeling the range of user preferences and categories associated with items.
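
For readers unfamiliar with comparing Gaussian embeddings by Wasserstein distance, the sketch below uses the standard closed form of the 2-Wasserstein distance between two Gaussians with diagonal covariance: the squared distance is the squared difference of the means plus the squared difference of the standard deviations. Whether W-GAT uses diagonal covariances or this exact form is an assumption; the abstract does not say. The function name and the sample embeddings are hypothetical.

```python
import math


def w2_diag_gaussians(mu1, sigma1, mu2, sigma2):
    """2-Wasserstein distance between N(mu1, diag(sigma1^2)) and N(mu2, diag(sigma2^2))."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    spread_term = sum((a - b) ** 2 for a, b in zip(sigma1, sigma2))
    return math.sqrt(mean_term + spread_term)


# Hypothetical 2-dimensional embeddings: (mean vector, per-dimension standard deviation).
user = ([0.0, 0.0], [1.0, 1.0])
item_far = ([3.0, 4.0], [1.0, 1.0])     # same uncertainty, shifted mean
item_vague = ([0.0, 0.0], [2.0, 2.0])   # same mean, broader uncertainty

d_far = w2_diag_gaussians(*user, *item_far)
d_vague = w2_diag_gaussians(*user, *item_vague)
d_self = w2_diag_gaussians(*user, *user)
print(d_far, d_vague, d_self)
```

Unlike KL divergence, this quantity is symmetric and stays finite as a standard deviation approaches zero, which is one common reason for preferring it when embeddings carry uncertainty.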

Submitted 29 June, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

Comments: Accepted by IEEE TCSS

arXiv:2404.05291 [pdf, other]

Long-horizon Locomotion and Manipulation on a Quadrupedal Robot with Large Language Models

Authors: Yutao Ouyang, Jinhan Li, Yunfei Li, Zhongyu Li, Chao Yu, Koushil Sreenath, Yi Wu

Abstract: We present a large language model (LLM) based system to empower quadrupedal robots with problem-solving abilities for long-horizon tasks beyond short-term motions. Long-horizon tasks for quadrupeds are challenging since they require both a high-level understanding of the semantics of the problem for task planning and a broad range of locomotion and manipulation skills to interact with the environment. Our system builds a high-level reasoning layer with large language models, which generates hybrid discrete-continuous plans as robot code from task descriptions. It comprises multiple LLM agents: a semantic planner for sketching a plan, a parameter calculator for predicting arguments in the plan, and a code generator to convert the plan into executable robot code. At the low level, we adopt reinforcement learning to train a set of motion planning and control skills to unleash the flexibility of quadrupeds for rich environment interactions. Our system is tested on long-horizon tasks that are infeasible to complete with one single skill. Simulation and real-world experiments show that it successfully figures out multi-step strategies and demonstrates non-trivial behaviors, including building tools or notifying a human for help.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2404.02936 [pdf, other]

Min-K%++: Improved Baseline for Detecting Pre-Training Data from Large Language Models

Authors: Jingyang Zhang, Jingwei Sun, Eric Yeats, Yang Ouyang, Martin Kuo, Jianyi Zhang, Hao Frank Yang, Hai Li

Abstract: The problem of pre-training data detection for large language models (LLMs) has received growing attention due to its implications in critical issues like copyright violation and test data contamination. Despite improved performance, existing methods (including the state-of-the-art, Min-K%) are mostly developed upon simple heuristics and lack solid, reasonable foundations. In this work, we propose a novel and theoretically motivated methodology for pre-training data detection, named Min-K%++. Specifically, we present a key insight that training samples tend to be local maxima of the modeled distribution along each input dimension through maximum likelihood training, which in turn allows us to insightfully translate the problem into identification of local maxima. Then, we design our method accordingly that works under the discrete distribution modeled by LLMs, whose core idea is to determine whether the input forms a mode or has relatively high probability under the conditional categorical distribution. Empirically, the proposed method achieves new SOTA performance across multiple settings. On the WikiMIA benchmark, Min-K%++ outperforms the runner-up by 6.2% to 10.5% in detection AUROC averaged over five models. On the more challenging MIMIR benchmark, it consistently improves upon reference-free methods while performing on par with reference-based methods that require an extra reference model.
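
The abstract does not spell out the scoring rule, so the following is only a sketch of one plausible reading of "has relatively high probability under the conditional categorical distribution": each observed token's log-probability is standardized by the mean and standard deviation of log-probability under the model's next-token distribution at that position, and the lowest k% of token scores are averaged. The normalization, the function names, and the three-token toy distributions are all assumptions made here for illustration.

```python
import math


def token_scores(distributions, tokens, normalize):
    """Per-token scores: raw log-probability, or log-probability standardized
    against the next-token distribution at the same position (assumed form)."""
    scores = []
    for probs, tok in zip(distributions, tokens):
        score = math.log(probs[tok])
        if normalize:
            mu = sum(p * math.log(p) for p in probs if p > 0)
            var = sum(p * (math.log(p) - mu) ** 2 for p in probs if p > 0)
            score = (score - mu) / (math.sqrt(var) + 1e-8)  # epsilon guards uniform distributions
        scores.append(score)
    return scores


def min_k_average(scores, k):
    """Average of the lowest k fraction of token scores (at least one token)."""
    count = max(1, math.ceil(k * len(scores)))
    return sum(sorted(scores)[:count]) / count


# Hypothetical next-token distributions over a 3-token vocabulary at three positions.
dists = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
likely_seen = [0, 1, 2]     # every token is the mode of its distribution
likely_unseen = [2, 0, 0]   # every token is a low-probability choice

seen_score = min_k_average(token_scores(dists, likely_seen, normalize=True), k=0.5)
unseen_score = min_k_average(token_scores(dists, likely_unseen, normalize=True), k=0.5)
print(seen_score, unseen_score)
```

In this toy case the sequence made of modal tokens gets a positive score and the other a negative one; a detector would threshold such a score to decide whether a text was likely part of the pre-training data.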

Submitted 23 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

Comments: Project page and code are available at https://zjysteven.github.io/mink-plus-plus/

arXiv:2404.00639 [pdf, other]

RL-MUL: Multiplier Design Optimization with Deep Reinforcement Learning

Authors: Dongsheng Zuo, Jiadong Zhu, Yikang Ouyang, Yuzhe Ma

Abstract: Multiplication is a fundamental operation in many applications, and multipliers are widely adopted in various circuits. However, optimizing multipliers is challenging and non-trivial due to the huge design space. In this paper, we propose RL-MUL, a multiplier design optimization framework based on reinforcement learning. Specifically, we utilize matrix and tensor representations for the compressor tree of a multiplier, based on which the convolutional neural networks can be seamlessly incorporated as the agent network. The agent can learn to optimize the multiplier structure based on a Pareto-driven reward which is customized to accommodate the trade-off between area and delay. Additionally, the capability of RL-MUL is extended to optimize the fused multiply-accumulator (MAC) designs. Experiments are conducted on different bit widths of multipliers. The results demonstrate that the multipliers produced by RL-MUL can dominate all baseline designs in terms of area and delay. The performance gain of RL-MUL is further validated by comparing the area and delay of processing element arrays using multipliers from RL-MUL and baseline approaches.
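
The claim that one design "dominates" another in area and delay refers to Pareto dominance. A minimal sketch of that relation follows, assuming both objectives are to be minimized; the (area, delay) numbers are invented for illustration and are not results from the paper, and the sketch does not reproduce RL-MUL's reward.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(designs):
    """Designs that no other design in the list dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other != d)]


# Hypothetical (area, delay) pairs; lower is better for both.
baselines = [(100.0, 2.0), (90.0, 2.4), (120.0, 1.8)]
candidate = (88.0, 1.7)

candidate_dominates_all = all(dominates(candidate, b) for b in baselines)
front = pareto_front(baselines + [candidate])
print(candidate_dominates_all, front)
```

The three baselines trade area against delay, so none dominates another; the candidate is better on both axes and is therefore the only point left on the front.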

Submitted 31 March, 2024; originally announced April 2024.

Comments: Extension of DAC 2023 version

arXiv:2403.16702 [pdf, other]

ProCQA: A Large-scale Community-based Programming Question Answering Dataset for Code Search

Authors: Zehan Li, Jianfei Zhang, Chuantao Yin, Yuanxin Ouyang, Wenge Rong

Abstract: Retrieval-based code question answering seeks to match user queries in natural language to relevant code snippets. Previous approaches typically rely on pretraining models using crafted bi-modal and uni-modal datasets to align text and code representations. In this paper, we introduce ProCQA, a large-scale programming question answering dataset extracted from the StackOverflow community, offering naturally structured mixed-modal QA pairs. To validate its effectiveness, we propose a modality-agnostic contrastive pre-training approach to improve the alignment of text and code representations of current code language models. Compared to previous models that primarily employ bimodal and unimodal pairs extracted from CodeSearchNet for pre-training, our model exhibits significant performance improvements across a wide range of code retrieval benchmarks.

Submitted 25 March, 2024; originally announced March 2024.

Comments: Accepted to LREC-COLING 2024

arXiv:2403.16029 [pdf, other]

Planning Charging Stations and Service Operations of Dockless Electric Micromobility Systems

Authors: Yining Liu, Yanfeng Ouyang

Abstract: Dockless electric micro-mobility services (e.g., shared e-scooters and e-bikes) have been increasingly popular in the recent decade, and a variety of charging technologies have emerged for these services. The use of charging stations, to/from which service vehicles are transported by the riders for charging, is a promising approach because it reduces the need for dedicated staff or contractors. However, unique challenges also arise, such as how to incentivize riders to drop off vehicles at stations and how to efficiently utilize the vehicles being charged at the stations. This paper focuses on dockless e-scooters as an example and develops a new spatial queuing network model to capture the steady-state scooter service cycles, battery consumption and charging processes, and the associated pricing and management mechanisms. Building upon this model, a system of closed-form equations is formulated and incorporated into a constrained nonlinear program to optimize the deployment of the service fleet, the design of charging stations (i.e., number, location, and capacity), user-based charging price promotions and priorities, and repositioning truck operations (i.e., headway and truck load). The proposed queuing network model is found to match very well with agent-based simulations. It is applied to a series of numerical experiments to draw insights into the optimal designs and the system performance. The numerical results reveal strong advantages of using charging stations for shared dockless electric micro-mobility services as compared to state-of-the-art alternatives. The proposed model can also be used to analyze other micromobility services and other charging approaches.

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.10339 [pdf, other]

Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection

Authors: Rui Zhang, Dawei Cheng, Xin Liu, Jie Yang, Yi Ouyang, Xian Wu, Yefeng Zheng

Abstract: Graph-based anomaly detection is currently an important research topic in the field of graph neural networks (GNNs). We find that in graph anomaly detection, the homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs. For the first time, we introduce a new metric called Class Homophily Variance, which quantitatively describes this phenomenon. To mitigate its impact, we propose a novel GNN model named Homophily Edge Generation Graph Neural Network (HedGe). Previous works typically focused on pruning, selecting or connecting on original relationships, and we refer to these methods as modifications. Different from these works, our method emphasizes generating new relationships with low class homophily variance, using the original relationships as an auxiliary. HedGe samples homophily adjacency matrices from scratch using a self-attention mechanism, and leverages nodes that are relevant in the feature space but not directly connected in the original graph. Additionally, we modify the loss function to punish the generation of unnecessary heterophilic edges by the model. Extensive comparison experiments demonstrate that HedGe achieved the best performance across multiple benchmark datasets, including anomaly detection and edgeless node classification. The proposed model also improves the robustness under the novel Heterophily Attack with increased class homophily variance on other graph classification tasks.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2403.00803 [pdf, other]

LiMAML: Personalization of Deep Recommender Models via Meta Learning

Authors: Ruofan Wang, Prakruthi Prabhakar, Gaurav Srivastava, Tianqi Wang, Zeinab S. Jalali, Varun Bharill, Yunbo Ouyang, Aastha Nigam, Divya Venugopalan, Aman Gupta, Fedor Borisyuk, Sathiya Keerthi, Ajith Muralidharan

Abstract: In the realm of recommender systems, the ubiquitous adoption of deep neural networks has emerged as a dominant paradigm for modeling diverse business objectives. As user bases continue to expand, the necessity of personalization and frequent model updates have assumed paramount significance to ensure the delivery of relevant and refreshed experiences to a diverse array of members. In this work, we introduce an innovative meta-learning solution tailored to the personalization of models for individual members and other entities, coupled with the frequent updates based on the latest user interaction signals. Specifically, we leverage the Model-Agnostic Meta Learning (MAML) algorithm to adapt per-task sub-networks using recent user interaction data. Given the near infeasibility of productionizing original MAML-based models in online recommendation systems, we propose an efficient strategy to operationalize meta-learned sub-networks in production, which involves transforming them into fixed-sized vectors, termed meta embeddings, thereby enabling the seamless deployment of models with hundreds of billions of parameters for online serving. Through extensive experimentation on production data drawn from various applications at LinkedIn, we demonstrate that the proposed solution consistently outperforms the baseline models of those applications, including strong baselines such as using wide-and-deep ID based personalization approach. Our approach has enabled the deployment of a range of highly personalized AI models across diverse LinkedIn applications, leading to substantial improvements in business metrics as well as refreshed experience for our members.

Submitted 23 February, 2024; originally announced March 2024.

arXiv:2402.17236 [pdf]

doi 10.3868/s110-009-024-0004-9

A Review of Data Mining in Personalized Education: Current Trends and Future Prospects

Authors: Zhang Xiong, Haoxuan Li, Zhuang Liu, Zhuofan Chen, Hao Zhou, Wenge Rong, Yuanxin Ouyang

Abstract: Personalized education, tailored to individual student needs, leverages educational technology and artificial intelligence (AI) in the digital age to enhance learning effectiveness. The integration of AI in educational platforms provides insights into academic performance, learning preferences, and behaviors, optimizing the personal learning process. Driven by data mining techniques, it not only benefits students but also provides educators and institutions with tools to craft customized learning experiences. To offer a comprehensive review of recent advancements in personalized educational data mining, this paper focuses on four primary scenarios: educational recommendation, cognitive diagnosis, knowledge tracing, and learning analysis. This paper presents a structured taxonomy for each area, compiles commonly used datasets, and identifies future research directions, emphasizing the role of data mining in enhancing personalized education and paving the way for future exploration and innovation.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 25 pages, 5 figures

Journal ref: Frontiers of Digital Education, 2024, 1(1): 26-50

arXiv:2402.11572 [pdf, other]

Cobra Effect in Reference-Free Image Captioning Metrics

Authors: Zheng Ma, Changxin Wang, Yawen Ouyang, Fei Zhao, Jianbing Zhang, Shujian Huang, Jiajun Chen

Abstract: Evaluating the compatibility between textual descriptions and corresponding images represents a core endeavor within multi-modal research. In recent years, a proliferation of reference-free methods, leveraging visual-language pre-trained models (VLMs), has emerged. Empirical evidence has substantiated that these innovative approaches exhibit a higher correlation with human judgment, marking a significant advancement in the field. However, does a higher correlation with human evaluations alone sufficiently denote the completeness of a metric? In response to this question, in this paper, we study if there are any deficiencies in reference-free metrics. Specifically, inspired by the Cobra Effect, we utilize metric scores as rewards to direct the captioning model toward generating descriptions that closely align with the metric's criteria. If a certain metric has flaws, it will be exploited by the model and reflected in the generated sentences. Our findings reveal that descriptions guided by these metrics contain significant flaws, e.g. incoherent statements and excessive repetition. Subsequently, we propose a novel method termed Self-Improving to rectify the identified shortcomings within these metrics. We employ GPT-4V as an evaluative tool to assess generated sentences and the result reveals that our approach achieves state-of-the-art (SOTA) performance. In addition, we also introduce a challenging evaluation benchmark called Flaws Caption to evaluate reference-free image captioning metrics comprehensively. Our code is available at https://github.com/aaronma2020/robust_captioning_metric

Submitted 18 February, 2024; originally announced February 2024.

Comments: pre-print version

arXiv:2402.11139 [pdf, other]

LiGNN: Graph Neural Networks at LinkedIn

Authors: Fedor Borisyuk, Shihai He, Yunbo Ouyang, Morteza Ramezani, Peng Du, Xiaochen Hou, Chengming Jiang, Nitin Pasumarthy, Priya Bannur, Birjodh Tiwana, Ping Liu, Siddharth Dangi, Daqi Sun, Zhoutao Pei, Xiao Shi, Sirou Zhu, Qianqi Shen, Kuang-Hsuan Lee, David Stein, Baolei Li, Haichao Wei, Amol Ghoting, Souvik Ghosh

Abstract: In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) Framework. We share our insight on the development and deployment of GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embeddings and multi-hop neighbor sampling. We explain how we built and sped up by 7x our large-scale training on LinkedIn graphs with adaptive sampling of neighbors, grouping and slicing of training data batches, specialized shared-memory queue and local gradient optimization. We summarize our deployment lessons and learnings gathered from A/B test experiments. The techniques presented in this work have contributed to approximate relative improvements of 1% of Job application hearing back rate, 2% Ads CTR lift, 0.5% of Feed engaged daily active users, 0.2% session lift and 0.1% weekly active user lift from people recommendation. We believe that this work can provide practical solutions and insights for engineers who are interested in applying Graph neural networks at large scale.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.08813 [pdf, other]

Model approximation in MDPs with unbounded per-step cost

Authors: Berk Bozkurt, Aditya Mahajan, Ashutosh Nayyar, Yi Ouyang

Abstract: We consider the problem of designing a control policy for an infinite-horizon discounted cost Markov decision process $\mathcal{M}$ when we only have access to an approximate model $\hat{\mathcal{M}}$. How well does an optimal policy $\hat{\pi}^{\star}$ of the approximate model perform when used in the original model $\mathcal{M}$? We answer this question by bounding a weighted norm of the difference between the value function of $\hat{\pi}^{\star}$ when used in $\mathcal{M}$ and the optimal value function of $\mathcal{M}$. We then extend our results and obtain potentially tighter upper bounds by considering affine transformations of the per-step cost. We further provide upper bounds that explicitly depend on the weighted distance between cost functions and weighted distance between transition kernels of the original and approximate models. We present examples to illustrate our results.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.06859 [pdf, other]

LiRank: Industrial Large Scale Ranking Models at LinkedIn

Authors: Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, Birjodh Tiwana, Ganesh Parameswaran, Siddharth Dangi, Lars Hertel, Qiang Xiao, Xiaochen Hou, Yunbo Ouyang, Aman Gupta, Sheallika Singh, Dan Liu, Hailing Cheng, Lei Le, Jonathan Hung, Sathiya Keerthi, Ruoyan Wang, Fengyu Zhang, Mohit Kothari, Chen Zhu, Daqi Sun, Yun Dai, Xun Luan, et al. (9 additional authors not shown)

Abstract: We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including Dense Gating, Transformers and Residual DCN. We also propose novel techniques for calibration and describe how we productionalized deep learning based explore/exploit methods. To enable effective, production-grade serving of large ranking models, we detail how to train and compress models using quantization and vocabulary compression. We provide details about the deployment setup for large-scale use cases of Feed ranking, Jobs Recommendations, and Ads click-through rate (CTR) prediction. We summarize our learnings from various A/B tests by elucidating the most effective technical approaches. These ideas have contributed to relative metrics improvements across the board at LinkedIn: +0.5% member sessions in the Feed, +1.76% qualified job applications for Jobs search and recommendations, and +4.3% for Ads CTR. We hope this work can provide practical insights and solutions for practitioners interested in leveraging large-scale deep ranking systems.

Submitted 9 February, 2024; originally announced February 2024.

ACM Class: H.3.3

arXiv:2402.04093 [pdf, other]

Robust projective measurements through measuring code-inspired observables

Authors: Yingkai Ouyang

Abstract: Quantum measurements are ubiquitous in quantum information processing tasks, but errors can render their outputs unreliable. Here, we present a scheme that implements a robust projective measurement through measuring code-inspired observables. Namely, given a projective POVM, a classical code and a constraint on the number of measurement outcomes each observable can have, we construct commuting observables whose measurement is equivalent to the projective measurement in the noiseless setting. Moreover, we can correct $t$ errors on the classical outcomes of the observables' measurement if the classical code corrects $t$ errors. Since our scheme does not require the encoding of quantum data onto a quantum error correction code, it can help construct robust measurements for near-term quantum algorithms that do not use quantum error correction. Moreover, our scheme works for any projective POVM, and hence can allow robust syndrome extraction procedures in non-stabilizer quantum error correction codes.

Submitted 6 February, 2024; originally announced February 2024.

Comments: 7 pages, 1 figure, 2 columns

arXiv:2401.14291 [pdf, other]

On the Algebraic Classification of Non-singular Flexible Kokotsakis Polyhedra

Authors: Yang Liu, Yi Ouyang, Dominik L. Michels

Abstract: Across various scientific and engineering domains, a growing interest in flexible and deployable structures is becoming evident. These structures facilitate seamless transitions between distinct states of shape and find broad applicability ranging from robotics and solar cells to meta-materials and architecture. In this contribution, we study a class of mechanisms known as Kokotsakis polyhedra with a quadrangular base. These are $3\times3$ quadrilateral meshes whose faces are rigid bodies and joined by hinges at the common edges. Compared to prior work, the quadrilateral faces do not have to be planar. In general, such meshes are not flexible, and the problem of finding and classifying the flexible ones is old, but until now largely unsolved. It appears that the tangent values of the dihedral angles between different faces are algebraically related through polynomials. Specifically, by fixing one angle as a parameter, the others can be parameterized algebraically and hence belong to an extended rational function field of the parameter. We use this approach to characterize shape restrictions resulting in flexible polyhedra.

Submitted 24 January, 2024; originally announced January 2024.

MSC Class: 12D05; 12F05; 52C25

arXiv:2401.12087 [pdf, other]

Revisiting Demonstration Selection Strategies in In-Context Learning

Authors: Keqin Peng, Liang Ding, Yancheng Yuan, Xuebo Liu, Min Zhang, Yuanxin Ouyang, Dacheng Tao

Abstract: Large language models (LLMs) have shown an impressive ability to perform a wide range of tasks using in-context learning (ICL), where a few examples are used to describe a task to the model. However, the performance of ICL varies significantly with the choice of demonstrations, and it is still unclear why this happens or what factors will influence its choice. In this work, we first revisit the factors contributing to this variance from both data and model aspects, and find that the choice of demonstration is both data- and model-dependent. We further propose a data- and model-dependent demonstration selection method, TopK + ConE, based on the assumption that the performance of a demonstration positively correlates with its contribution to the model's understanding of the test samples, resulting in a simple and effective recipe for ICL. Empirically, our method yields consistent improvements in both language understanding and generation tasks with different model scales. Further analyses confirm that, besides the generality and stability under different circumstances, our method provides a unified explanation for the effectiveness of previous methods. Code will be released.

Submitted 23 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: ACL 2024

arXiv:2401.05886 [pdf, other]

Finding the optimal probe state for multiparameter quantum metrology using conic programming

Authors: Masahito Hayashi, Yingkai Ouyang

Abstract: The aim of the channel estimation is to estimate the parameters encoded in a quantum channel. For this aim, it is allowed to choose the input state as well as the measurement to get the outcome. Various precision bounds are known for the state estimation. For the channel estimation, the respective bounds are determined depending on the choice of the input state. However, determining the optimal input probe state and the corresponding precision bounds in estimation is a non-trivial problem, particularly in the multi-parameter setting, where parameters are often incompatible. In this paper, we present a conic programming framework that allows us to determine the optimal probe state for the corresponding multi-parameter precision bounds. The precision bounds we consider include the Holevo-Nagaoka bound and the tight precision bound that give the optimal performances of correlated and uncorrelated measurement strategies, respectively. Using our conic programming framework, we discuss the optimality of a maximally entangled probe state in various settings. We also apply our theory to analyze the canonical field sensing problem using entangled quantum probe states.

Submitted 26 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: 36 pages, 2 columns, 5 figures. Title change, added references, and edited introduction

arXiv:2312.11792 [pdf, other]

COOPER: Coordinating Specialized Agents towards a Complex Dialogue Goal

Authors: Yi Cheng, Wenge Liu, Jian Wang, Chak Tou Leong, Yi Ouyang, Wenjie Li, Xian Wu, Yefeng Zheng

Abstract: In recent years, there has been a growing interest in exploring dialogues with more complex goals, such as negotiation, persuasion, and emotional support, which go beyond traditional service-focused dialogue systems. Apart from the requirement for much more sophisticated strategic reasoning and communication skills, a significant challenge of these tasks lies in the difficulty of objectively measuring the achievement of their goals in a quantifiable way, making it difficult for existing research to directly optimize the dialogue procedure towards them. In our work, we emphasize the multifaceted nature of complex dialogue goals and argue that it is more feasible to accomplish them by comprehensively considering and jointly promoting their different aspects. To this end, we propose a novel dialogue framework, Cooper, which coordinates multiple specialized agents, each dedicated to a specific dialogue goal aspect separately, to approach the complex objective. Through this divide-and-conquer manner, we make complex dialogue goals more approachable and elicit greater intelligence via the collaboration of individual agents. Experiments on persuasion and emotional support dialogues demonstrate the superiority of our method over a set of competitive baselines.

Submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2310.14605 [pdf, other]

M2DF: Multi-grained Multi-curriculum Denoising Framework for Multimodal Aspect-based Sentiment Analysis

Authors: Fei Zhao, Chunhui Li, Zhen Wu, Yawen Ouyang, Jianbing Zhang, Xinyu Dai

Abstract: Multimodal Aspect-based Sentiment Analysis (MABSA) is a fine-grained Sentiment Analysis task, which has attracted growing research interests recently. Existing work mainly utilizes image information to improve the performance of MABSA task. However, most of the studies overestimate the importance of images since there are many noise images unrelated to the text in the dataset, which will have a negative impact on model learning. Although some work attempts to filter low-quality noise images by setting thresholds, relying on thresholds will inevitably filter out a lot of useful image information. Therefore, in this work, we focus on whether the negative impact of noisy images can be reduced without modifying the data. To achieve this goal, we borrow the idea of Curriculum Learning and propose a Multi-grained Multi-curriculum Denoising Framework (M2DF), which can achieve denoising by adjusting the order of training data. Extensive experimental results show that our framework consistently outperforms state-of-the-art work on three sub-tasks of MABSA.

Submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted by EMNLP 2023

arXiv:2310.12139 [pdf, ps, other]

Optimal and parameter-free gradient minimization methods for convex and nonconvex optimization

Authors: Guanghui Lan, Yuyuan Ouyang, Zhe Zhang

Abstract: We propose novel optimal and parameter-free algorithms for computing an approximate solution with small (projected) gradient norm. Specifically, for computing an approximate solution such that the norm of its (projected) gradient does not exceed $\varepsilon$, we obtain the following results: a) for the convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L\|x_0 - x^*\|/\varepsilon}$, where $L$ is the Lipschitz constant of the gradient and $x^*$ is any optimal solution; b) for the strongly convex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{L/\mu}\log(\|\nabla f(x_0)\|/\varepsilon)$, where $\mu$ is the strong convexity modulus; and c) for the nonconvex case, the total number of gradient evaluations is bounded by $O(1)\sqrt{Ll}(f(x_0) - f(x^*))/\varepsilon^2$, where $l$ is the lower curvature constant. Our complexity results match the lower complexity bounds of the convex and strongly convex cases, and achieve the above best-known complexity bound for the nonconvex case for the first time in the literature. Moreover, for all the convex, strongly convex, and nonconvex cases, we propose parameter-free algorithms that do not require the input of any problem parameters. To the best of our knowledge, no such parameter-free methods existed before, especially for the strongly convex and nonconvex cases. Since most regularity conditions (e.g., strong convexity and lower curvature) are imposed over a global scope, the corresponding problem parameters are notoriously difficult to estimate. However, gradient norm minimization equips us with a convenient tool to monitor the progress of algorithms and thus the ability to estimate such parameters in-situ.

Submitted 29 November, 2023; v1 submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.08439 [pdf, other]

TensorMD: Scalable Tensor-Diagram based Machine Learning Interatomic Potential on Heterogeneous Many-Core Processors

Authors: Xin Chen, Yucheng Ouyang, Xin Chen, Zhenchuan Chen, Rongfen Lin, Xingyu Gao, Lifang Wang, Fang Li, Yin Liu, Honghui Shang, Haifeng Song

Abstract: Molecular dynamics simulations have emerged as a potent tool for investigating the physical properties and kinetic behaviors of materials at the atomic scale, particularly in extreme conditions. Ab initio accuracy is now achievable with machine learning based interatomic potentials. With recent advancements in high-performance computing, highly accurate and large-scale simulations become feasible. This study introduces TensorMD, a new machine learning interatomic potential (MLIP) model that integrates physical principles and tensor diagrams. The tensor formalism provides a more efficient computation and greater flexibility for use with other scientific codes. Additionally, we proposed several portable optimization strategies and developed a highly optimized version for the new Sunway supercomputer. Our optimized TensorMD can achieve unprecedented performance on the new Sunway, enabling simulations of up to 52 billion atoms with a time-to-solution of 31 ps/step/atom, setting new records for HPC + AI + MD.

Submitted 12 October, 2023; v1 submitted 12 October, 2023; originally announced October 2023.

arXiv:2309.14963 [pdf, ps, other]

Neighborhood of vertices in the isogeny graph of principally polarized superspecial abelian surfaces

Authors: Zheng Xu, Yi Ouyang, Zijian Zhou

Abstract: For two supersingular elliptic curves $E$ and $E'$ defined over $\mathbb{F}_{p^2}$, let $[E \times E']$ be the superspecial abelian surface with the principal polarization $\{0\} \times E' + E \times \{0\}$. We determine local structure of the vertices $[E \times E']$ in the $(\ell, \ell)$-isogeny graph of principally polarized superspecial abelian surfaces where either $E$ or $E'$ is defined over $\mathbb{F}_p$. We also present a simple new proof of the main theorem in \cite{LOX20}.

Submitted 12 March, 2024; v1 submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.11868 [pdf, ps, other]

A Radon-Nikodym theorem for monotone measures

Authors: Yao Ouyang, Jun Li

Abstract: A version of Radon-Nikodym theorem for the Choquet integral w.r.t. monotone measures is proved. Without any presumptive condition, we obtain a necessary and sufficient condition for the ordered pair $(\mu, \nu)$ of finite monotone measures to have the so-called Radon-Nikodym property related to a nonnegative measurable function $f$. If $\nu$ is null-continuous and weakly null-additive, then $f$ is uniquely determined almost everywhere by $\nu$ and thus is called the Radon-Nikodym derivative of $\mu$ w.r.t. $\nu$. For $\sigma$-finite monotone measures, a Radon-Nikodym type theorem is also obtained under the assumption that the monotone measures are lower continuous and null-additive.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.09744 [pdf, other]

doi 10.1109/TVCG.2023.3285210

Towards Better Modeling with Missing Data: A Contrastive Learning-based Visual Analytics Perspective

Authors: Laixin Xie, Yang Ouyang, Longfei Chen, Ziming Wu, Quan Li

Abstract: Missing data can pose a challenge for machine learning (ML) modeling. To address this, current approaches are categorized into feature imputation and label prediction and are primarily focused on handling missing data to enhance ML performance. These approaches rely on the observed data to estimate the missing values and therefore encounter three main shortcomings in imputation, including the need for different imputation methods for various missing data mechanisms, heavy dependence on the assumption of data distribution, and potential introduction of bias. This study proposes a Contrastive Learning (CL) framework to model observed data with missing values, where the ML model learns the similarity between an incomplete sample and its complete counterpart and the dissimilarity between other samples. Our proposed approach demonstrates the advantages of CL without requiring any imputation. To enhance interpretability, we introduce CIVis, a visual analytics system that incorporates interpretable techniques to visualize the learning process and diagnose the model status. Users can leverage their domain knowledge through interactive sampling to identify negative and positive pairs in CL. The output of CIVis is an optimized model that takes specified features and predicts downstream tasks. We provide two usage scenarios in regression and classification tasks and conduct quantitative experiments, expert interviews, and a qualitative user study to demonstrate the effectiveness of our approach. In short, this study offers a valuable contribution to addressing the challenges associated with ML modeling in the presence of missing data by providing a practical solution that achieves high predictive accuracy and model interpretability.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 18 pages, 11 figures. This paper is accepted by IEEE Transactions on Visualization and Computer Graphics (TVCG)

ACM Class: I.1.2; H.1.2; H.4.2

arXiv:2309.03599 [pdf, other]

Chasing Consistency in Text-to-3D Generation from a Single Image

Authors: Yichen Ouyang, Wenhao Chai, Jiayi Ye, Dapeng Tao, Yibing Zhan, Gaoang Wang

Abstract: Text-to-3D generation from a single-view image is a popular but challenging task in 3D vision. Although numerous methods have been proposed, existing works still suffer from the inconsistency issues, including 1) semantic inconsistency, 2) geometric inconsistency, and 3) saturation inconsistency, resulting in distorted, overfitted, and over-saturated generations. In light of the above issues, we present Consist3D, a three-stage framework Chasing for semantic-, geometric-, and saturation-Consistent Text-to-3D generation from a single image, in which the first two stages aim to learn parameterized consistency tokens, and the last stage is for optimization. Specifically, the semantic encoding stage learns a token independent of views and estimations, promoting semantic consistency and robustness. Meanwhile, the geometric encoding stage learns another token with comprehensive geometry and reconstruction constraints under novel-view estimations, reducing overfitting and encouraging geometric consistency. Finally, the optimization stage benefits from the semantic and geometric tokens, allowing a low classifier-free guidance scale and therefore preventing oversaturation. Experimental results demonstrate that Consist3D produces more consistent, faithful, and photo-realistic 3D assets compared to previous state-of-the-art methods. Furthermore, Consist3D also allows background and object editing through text prompts.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 9 pages, 11 figures

arXiv:2308.15030 [pdf, other]

SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget

Authors: Rui Kong, Yuanchun Li, Qingtian Feng, Weijun Wang, Xiaozhou Ye, Ye Ouyang, Linghe Kong, Yunxin Liu

Abstract: Mixture of experts (MoE) is a popular technique to improve the capacity of Large Language Models (LLMs) with conditionally-activated parallel experts. However, serving MoE models on memory-constrained devices is challenging due to the large parameter size. Typical solutions such as memory swapping or expert pruning may lead to significantly higher latency or severe accuracy loss. In this paper, we introduce SwapMoE, a framework for efficient serving of MoE-based large language models with tunable memory budgets. The main idea of SwapMoE is to keep a small dynamic set of important experts, namely Virtual Experts, in the main memory for inference, while seamlessly maintaining how the Virtual Experts map to the actual experts. Experiments have shown that SwapMoE can reduce the memory footprint while maintaining reasonable accuracy. For example, on text summarization tasks with Switch Transformer, SwapMoE can reduce the memory consumption from 14.2 GiB to 4.7 GiB, together with 50\% latency reduction and a slight Rouge-2 score drop of 0.041.

Submitted 29 May, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: Accepted at ACL 2024

arXiv:2308.07248 [pdf]

Maintaining the validity of inference from linear mixed models in stepped-wedge cluster randomized trials under misspecified random-effects structures

Authors: Yongdong Ouyang, Monica Taljaard, Andrew B Forbes, Fan Li

Abstract: Linear mixed models are commonly used in analyzing stepped-wedge cluster randomized trials (SW-CRTs). A key consideration for analyzing a SW-CRT is accounting for the potentially complex correlation structure, which can be achieved by specifying a random effects structure. Common random effects structures for a SW-CRT include random intercept, random cluster-by-period, and discrete-time decay. Recently, more complex structures, such as the random intervention structure, have been proposed. In practice, specifying appropriate random effects can be challenging. Robust variance estimators (RVE) may be applied to linear mixed models to provide consistent estimators of standard errors of fixed effect parameters in the presence of random-effects misspecification. However, there has been no empirical investigation of RVE for SW-CRT. In this paper, we first review five RVEs (both standard and small-sample bias-corrected RVEs) that are available for linear mixed models. We then describe a comprehensive simulation study to examine the performance of these RVEs for SW-CRTs with a continuous outcome under different data generators. For each data generator, we investigate whether the use of an RVE with either the random intercept model or the random cluster-by-period model is sufficient to provide valid statistical inference for fixed effect parameters, when these working models are subject to misspecification. Our results indicate that the random intercept and random cluster-by-period models with RVEs performed similarly. The CR3 RVE estimator, coupled with the number of clusters minus two degrees of freedom correction, consistently gave the best coverage results, but could be slightly anti-conservative when the number of clusters was below 16. We summarize the implications of our results for linear mixed model analysis of SW-CRTs in practice.

Submitted 14 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

arXiv:2308.02229 [pdf, ps, other]

A priori estimates for higher-order fractional Laplace equations

Authors: Yugao Ouyang, Meiqing Xu, Ran Zhuo

Abstract: In this paper, we establish a priori estimates for the positive solutions to a higher-order fractional Laplace equation on a bounded domain by a blowing-up and rescaling argument. To overcome the technical difficulty due to the high-order and fractional order mixed operators, we divide the high-order fractional Laplacian equation into a system, and provide uniform estimates for each equation in the system. Finding a proper scaling parameter for the domain is the crux of the rescaling argument for the above system, and a new idea is introduced in the rescaling proof, which may hopefully be applied to many other system problems. In order to derive a contradiction in the blowing-up proof, combining the moving planes method and a suitable Kelvin transform, we prove a key Liouville-type theorem under a weaker regularity assumption in a half space.

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2307.12227 [pdf, other]

FSLens: A Visual Analytics Approach to Evaluating and Optimizing the Spatial Layout of Fire Stations

Authors: Longfei Chen, He Wang, Yang Ouyang, Yang Zhou, Naiyu Wang, Quan Li

Abstract: The provision of fire services plays a vital role in ensuring the safety of residents' lives and property. The spatial layout of fire stations is closely linked to the efficiency of fire rescue operations. Traditional approaches have primarily relied on mathematical planning models to generate appropriate layouts by summarizing relevant evaluation criteria. However, this optimization process presents significant challenges due to the extensive decision space, inherent conflicts among criteria, and decision-makers' preferences. To address these challenges, we propose FSLens, an interactive visual analytics system that enables in-depth evaluation and rational optimization of fire station layout. Our approach integrates fire records and correlation features to reveal fire occurrence patterns and influencing factors using spatiotemporal sequence forecasting. We design an interactive visualization method to explore areas within the city that are potentially under-resourced for fire service based on the fire distribution and existing fire station layout. Moreover, we develop a collaborative human-computer multi-criteria decision model that generates multiple candidate solutions for optimizing firefighting resources within these areas. We simulate and compare the impact of different solutions on the original layout through well-designed visualizations, providing decision-makers with the most satisfactory solution. We demonstrate the effectiveness of our approach through one case study with real-world datasets. The feedback from domain experts indicates that our system helps them to better identify and improve potential gaps in the current fire station layout.

Submitted 25 July, 2023; v1 submitted 23 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE VIS 2023

arXiv:2307.12199 [pdf, other]

Leveraging Historical Medical Records as a Proxy via Multimodal Modeling and Visualization to Enrich Medical Diagnostic Learning

Authors: Yang Ouyang, Yuchen Wu, He Wang, Chenyang Zhang, Furui Cheng, Chang Jiang, Lixia Jin, Yuanwu Cao, Quan Li

Abstract: Simulation-based Medical Education (SBME) has been developed as a cost-effective means of enhancing the diagnostic skills of novice physicians and interns, thereby mitigating the need for resource-intensive mentor-apprentice training. However, feedback provided in most SBME is often directed towards improving the operational proficiency of learners, rather than providing summative medical diagnoses that result from experience and time. Additionally, the multimodal nature of medical data during diagnosis poses significant challenges for interns and novice physicians, including the tendency to overlook or over-rely on data from certain modalities, and difficulties in comprehending potential associations between modalities. To address these challenges, we present DiagnosisAssistant, a visual analytics system that leverages historical medical records as a proxy for multimodal modeling and visualization to enhance the learning experience of interns and novice physicians. The system employs elaborately designed visualizations to explore different modality data, offer diagnostic interpretive hints based on the constructed model, and enable comparative analyses of specific patients. Our approach is validated through two case studies and expert interviews, demonstrating its effectiveness in enhancing medical training.

Submitted 22 July, 2023; originally announced July 2023.

Comments: Accepted by IEEE VIS 2023

arXiv:2307.11449 [pdf]

AIGC Empowering Telecom Sector White Paper_chinese

Authors: Ye Ouyang, Yaqin Zhang, Xiaozhou Ye, Yunxin Liu, Yong Song, Yang Liu, Sen Bian, Zhiyong Liu

Abstract: In the global craze of GPT, people have deeply realized that AI, as a transformative technology and key force in economic and social development, will bring great leaps and breakthroughs to the global industry and profoundly influence the future world competition pattern. As the builder and operator of information and communication infrastructure, the telecom sector provides infrastructure support for the development of AI, and even takes the lead in the implementation of AI applications. How to enable the application of AIGC (GPT) and implement AIGC in the telecom sector are questions that telecom practitioners must ponder and answer. Through the study of GPT, a typical representative of AIGC, the authors have analyzed how GPT empowers the telecom sector in the form of scenarios, discussed the gap between the current GPT general model and telecom services, proposed for the first time a Telco Augmented Cognition capability system, provided answers to how to construct a telecom service GPT in the telecom sector, and carried out various practices. Our counterparts in the industry are expected to focus on collaborative innovation around telecom and AI, build an open and shared innovation ecosystem, promote the deep integration of AI and telecom sector, and accelerate the construction of next-generation information infrastructure, in an effort to facilitate the digital transformation of the economy and society.

Submitted 23 July, 2023; v1 submitted 21 July, 2023; originally announced July 2023.

arXiv:2307.10004 [pdf]

6G Network Business Support System

Authors: Ye Ouyang, Yaqin Zhang, Peng Wang, Yunxin Liu, Wen Qiao, Jun Zhu, Yang Liu, Feng Zhang, Shuling Wang, Xidong Wang

Abstract: 6G is the next-generation intelligent and integrated digital information infrastructure, characterized by ubiquitous interconnection, native intelligence, multi-dimensional perception, global coverage, green and low-carbon, native network security, etc. 6G will realize the transition from serving people and people-things communication to supporting the efficient connection of intelligent agents, and will comprehensively lead the digital, intelligent and green transformation of the economy and the society. As the core support system for mobile communication networks, 6G BSS needs to integrate with new business models brought about by the development of the next-generation Internet and IT, and upgrade from "network-centric" to "business and service centric" and "customer-centric". 6G OSS and BSS systems need to strengthen their integration to improve the operational efficiency and benefits of customers by connecting the digital intelligence support capabilities on both sides of supply and demand. This paper provides a detailed introduction to the overall vision, potential key technologies, and functional architecture of 6G BSS systems. It also presents an evolutionary roadmap and technological prospects for the BSS systems from 5G to 6G.

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.09045 [pdf]

6G Network Operation Support System

Authors: Ye Ouyang, Yaqin Zhang, Xiaozhou Ye, Yunxin Liu, Xidong Wang, Jie Sun, Yang Liu, Shoufeng Wang, Sen Bian, Yun Li

Abstract: 6G is the next-generation intelligent and integrated digital information infrastructure, characterized by ubiquitous interconnection, native intelligence, multi-dimensional perception, global coverage, green and low-carbon, native network security, etc. 6G will realize the transition from serving people and people-things communication to supporting the efficient connection of intelligent agents, and will comprehensively lead the digital, intelligent and green transformation of the economy and the society. As the core support system for mobile communication networks, 6G OSS needs to achieve high-level network automation, intelligence and digital twinning capabilities to achieve end-to-end autonomous network operation and maintenance, support the operation of typical 6G business scenarios, and take on greater social responsibility in the fields of environment, society, and governance (ESG). This paper provides a detailed introduction to the overall vision, potential key technologies, and functional architecture of 6G OSS. It also presents an evolutionary roadmap and technological prospects for the OSS from 5G to 6G.

Submitted 25 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

Comments: 103 pages, 20 figures, 52 references (Chinese version)

arXiv:2307.00467 [pdf, other]

MissDiff: Training Diffusion Models on Tabular Data with Missing Values

Authors: Yidong Ouyang, Liyan Xie, Chongxuan Li, Guang Cheng

Abstract: The diffusion model has shown remarkable performance in modeling data distributions and synthesizing data. However, the vanilla diffusion model requires complete or fully observed data for training. Incomplete data is a common issue in various real-world applications, including healthcare and finance, particularly when dealing with tabular datasets. This work presents a unified and principled diffusion-based framework for learning from data with missing values under various missing mechanisms. We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective. Then we propose to mask the regression loss of Denoising Score Matching in the training phase. We prove the proposed method is consistent in learning the score of data distributions, and the proposed training objective serves as an upper bound for the negative likelihood in certain cases. The proposed framework is evaluated on multiple tabular datasets using realistic and efficacious metrics and is demonstrated to outperform the state-of-the-art diffusion model for tabular data with the "impute-then-generate" pipeline by a large margin.

Submitted 1 July, 2023; originally announced July 2023.

Comments: 22 pages, short version is accepted by ICML workshop on Structured Probabilistic Inference & Generative Modeling 2023

Report number: 22

arXiv:2306.10518 [pdf, other]

LAGOON: Language-Guided Motion Control

Authors: Shusheng Xu, Huaijie Wang, Jiaxuan Gao, Yutao Ouyang, Chao Yu, Yi Wu

Abstract: We aim to control a robot to physically behave in the real world following any high-level language command like "cartwheel" or "kick". Although human motion datasets exist, this task remains particularly challenging since generative models can produce physically unrealistic motions, which will be more severe for robots due to different body structures and physical properties. Deploying such a motion to a physical robot can cause even greater difficulties due to the sim2real gap. We develop LAnguage-Guided mOtion cONtrol (LAGOON), a multi-phase reinforcement learning (RL) method to generate physically realistic robot motions under language commands. LAGOON first leverages a pretrained model to generate a human motion from a language command. Then an RL phase trains a control policy in simulation to mimic the generated human motion. Finally, with domain randomization, our learned policy can be deployed to a quadrupedal robot, leading to a quadrupedal robot that can take diverse behaviors in the real world under natural language commands.

Submitted 19 May, 2024; v1 submitted 18 June, 2023; originally announced June 2023.

Comments: 6 pages, 5 figures, 2 tables

Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

arXiv:2304.10455 [pdf]

doi 10.1063/5.0155494

An extreme value statistics model of heterogeneous ice nucleation for quantifying the stability of supercooled aqueous systems

Authors: Anthony N. Consiglio, Yu Ouyang, Matthew J. Powell-Palm, Boris Rubinsky

Abstract: The propensity of water to remain in a metastable liquid state at temperatures below its equilibrium melting point holds significant potential for cryopreserving biological material such as tissues and organs. The benefits conferred are a direct result of progressively reducing metabolic expenditure due to colder temperatures while simultaneously avoiding the irreversible damage caused by the crystallization of ice. Unfortunately, the freezing of water in bulk systems of clinical relevance is dominated by random heterogeneous nucleation initiated by uncharacterized trace impurities, and the marked unpredictability of this behavior has prevented implementation of supercooling outside of controlled laboratory settings and in volumes larger than a few milliliters. Here, we develop a statistical model that jointly captures both the inherent stochastic nature of nucleation using conventional Poisson statistics as well as the random variability of heterogeneous nucleation catalysis through bivariate extreme value statistics. Individually, these two classes of models cannot account for both the time-dependent nature of nucleation and the sample-to-sample variability associated with heterogeneous catalysis, and traditional extreme value models have only considered variation of the characteristic nucleation temperature. We conduct a series of constant cooling rate and isothermal nucleation experiments with physiological saline solutions and leverage the statistical model to evaluate the natural variability of kinetic and thermodynamic nucleation parameters. By quantifying freezing probability as a function of temperature, supercooled duration, and system volume, while accounting for nucleation site variability, this study also provides a basis for the rational design of stable supercooled biopreservation protocols.

Submitted 20 April, 2023; originally announced April 2023.

Report number: 064511

Journal ref: J. Chem. Phys. 159, 064511 (2023)

arXiv:2304.04233 [pdf, other]

ODDFUZZ: Discovering Java Deserialization Vulnerabilities via Structure-Aware Directed Greybox Fuzzing

Authors: Sicong Cao, Biao He, Xiaobing Sun, Yu Ouyang, Chao Zhang, Xiaoxue Wu, Ting Su, Lili Bo, Bin Li, Chuanlei Ma, Jiajia Li, Tao Wei

Abstract: Java deserialization vulnerability is a severe threat in practice. Researchers have proposed static analysis solutions to locate candidate vulnerabilities and fuzzing solutions to generate proof-of-concept (PoC) serialized objects to trigger them. However, existing solutions have limited effectiveness and efficiency. In this paper, we propose a novel hybrid solution ODDFUZZ to efficiently discover Java deserialization vulnerabilities. First, ODDFUZZ performs lightweight static taint analysis to identify candidate gadget chains that may cause deserialization vulnerabilities. In this step, ODDFUZZ tries to locate all candidates and avoid false negatives. Then, ODDFUZZ performs directed greybox fuzzing (DGF) to explore those candidates and generate PoC testcases to mitigate false positives. Specifically, ODDFUZZ applies a structure-aware seed generation method to guarantee the validity of the testcases, and adopts a novel hybrid feedback and a step-forward strategy to guide the directed fuzzing. We implemented a prototype of ODDFUZZ and evaluated it on the popular Java deserialization repository ysoserial. Results show that ODDFUZZ could discover 16 out of 34 known gadget chains, while two state-of-the-art baselines only identify three of them. In addition, we evaluated ODDFUZZ on real-world applications including Oracle WebLogic Server, Apache Dubbo, Sonatype Nexus, and protostuff, and found six previously unreported exploitable gadget chains with five CVEs assigned.

Submitted 9 April, 2023; originally announced April 2023.

Comments: To appear in the Main Track of IEEE S&P 2023

arXiv:2303.14457 [pdf, other]

Diverse Motion In-betweening with Dual Posture Stitching

Authors: Tianxiang Ren, Jubo Yu, Shihui Guo, Ying Ma, Yutao Ouyang, Zijiao Zeng, Yazhan Zhang, Yipeng Qin

Abstract: In-betweening is a technique for generating transitions given initial and target character states. The majority of existing works require multiple (often $>$10) frames as input, which are not always accessible. Our work deals with a focused yet challenging problem: to generate the transition when given exactly two frames (only the first and last). To cope with this challenging scenario, we implement our bi-directional scheme which generates forward and backward transitions from the start and end frames with two adversarial autoregressive networks, and stitches them in the middle of the transition where there is no strict ground truth. The autoregressive networks based on conditional variational autoencoders (CVAE) are optimized by searching for a pair of optimal latent codes that minimize a novel stitching loss between their outputs. Results show that our method achieves higher motion quality and more diverse results than existing methods on both the LaFAN1 and Human3.6m datasets.

Submitted 25 March, 2023; originally announced March 2023.

Comments: 10 pages, 5 figures

arXiv:2303.13780 [pdf, other]

Towards Making the Most of ChatGPT for Machine Translation

Authors: Keqin Peng, Liang Ding, Qihuang Zhong, Li Shen, Xuebo Liu, Min Zhang, Yuanxin Ouyang, Dacheng Tao

Abstract: ChatGPT shows remarkable capabilities for machine translation (MT). Several prior studies have shown that it achieves comparable results to commercial systems for high-resource languages, but lags behind in complex tasks, e.g., low-resource and distant-language-pairs translation. However, they usually adopt simple prompts which cannot fully elicit the capability of ChatGPT. In this paper, we aim to further mine ChatGPT's translation ability by revisiting several aspects: temperature, task information, and domain information, and correspondingly propose an optimal temperature setting and two (simple but effective) prompts: Task-Specific Prompts (TSP) and Domain-Specific Prompts (DSP). We show that: 1) The performance of ChatGPT depends largely on temperature, and a lower temperature usually can achieve better performance; 2) Emphasizing the task information can further improve ChatGPT's performance, particularly in complex MT tasks; 3) Introducing domain information can elicit ChatGPT's generalization ability and improve its performance in the specific domain; 4) ChatGPT tends to generate hallucinations for non-English-centric MT tasks, which can be partially addressed by our proposed prompts but still needs to be highlighted for the MT/NLP community. We also explore the effects of advanced in-context learning strategies and find a (negative but interesting) observation: the powerful chain-of-thought prompt leads to word-by-word translation behavior, thus bringing significant translation degradation.

Submitted 20 October, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: EMNLP 2023 (findings)

Showing 1–50 of 210 results for author: Ouyang, Y