subscribe to arXiv mailings

Entropy Law: The Story Behind Data Compression and LLM Performance

Authors: Mingjia Yin, Chuhan Wu, Yufei Wang, Hao Wang, Wei Guo, Yasheng Wang, Yong Liu, Ruiming Tang, Defu Lian, Enhong Chen

Abstract: Data is the cornerstone of large language models (LLMs), but not all data is useful for model learning. Carefully selected data can better elicit the capabilities of LLMs with much less computational overhead. Most methods concentrate on evaluating the quality of individual samples in data selection, while the combinatorial effects among samples are neglected. Even if each sample is of perfect qua… ▽ More Data is the cornerstone of large language models (LLMs), but not all data is useful for model learning. Carefully selected data can better elicit the capabilities of LLMs with much less computational overhead. Most methods concentrate on evaluating the quality of individual samples in data selection, while the combinatorial effects among samples are neglected. Even if each sample is of perfect quality, their combinations may be suboptimal in teaching LLMs due to their intrinsic homogeneity or contradiction. In this paper, we aim to uncover the underlying relationships between LLM performance and data selection. Inspired by the information compression nature of LLMs, we uncover an ``entropy law'' that connects LLM performance with data compression ratio and first-epoch training loss, which reflect the information redundancy of a dataset and the mastery of inherent knowledge encoded in this dataset, respectively. Through both theoretical deduction and empirical evaluation, we find that model performance is negatively correlated to the compression ratio of training data, which usually yields a lower training loss. Based on the findings of the entropy law, we propose a quite efficient and universal data selection method named \textbf{ZIP} for training LLMs, which aim to prioritize data subsets exhibiting a low compression ratio. Based on a multi-stage algorithm that selects diverse data in a greedy manner, we can obtain a good data subset with satisfactory diversity. Extensive experiments have been conducted to validate the entropy law and the superiority of ZIP across different LLM backbones and alignment stages. We also present an interesting application of entropy law that can detect potential performance risks at the beginning of model training. △ Less

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06309 [pdf, other]

Multimodal Chain-of-Thought Reasoning via ChatGPT to Protect Children from Age-Inappropriate Apps

Authors: Chuanbo Hu, Bin Liu, Minglei Yin, Yilu Zhou, Xin Li

Abstract: Mobile applications (Apps) could expose children to inappropriate themes such as sexual content, violence, and drug use. Maturity rating offers a quick and effective method for potential users, particularly guardians, to assess the maturity levels of apps. Determining accurate maturity ratings for mobile apps is essential to protect children's health in today's saturated digital marketplace. Exist… ▽ More Mobile applications (Apps) could expose children to inappropriate themes such as sexual content, violence, and drug use. Maturity rating offers a quick and effective method for potential users, particularly guardians, to assess the maturity levels of apps. Determining accurate maturity ratings for mobile apps is essential to protect children's health in today's saturated digital marketplace. Existing approaches to maturity rating are either inaccurate (e.g., self-reported rating by developers) or costly (e.g., manual examination). In the literature, there are few text-mining-based approaches to maturity rating. However, each app typically involves multiple modalities, namely app description in the text, and screenshots in the image. In this paper, we present a framework for determining app maturity levels that utilize multimodal large language models (MLLMs), specifically ChatGPT-4 Vision. Powered by Chain-of-Thought (CoT) reasoning, our framework systematically leverages ChatGPT-4 to process multimodal app data (i.e., textual descriptions and screenshots) and guide the MLLM model through a step-by-step reasoning pathway from initial content analysis to final maturity rating determination. As a result, through explicitly incorporating CoT reasoning, our framework enables ChatGPT to understand better and apply maturity policies to facilitate maturity rating. Experimental results indicate that the proposed method outperforms all baseline models and other fusion strategies. △ Less

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03307 [pdf, other]

HoloHisto: End-to-end Gigapixel WSI Segmentation with 4K Resolution Sequential Tokenization

Authors: Yucheng Tang, Yufan He, Vishwesh Nath, Pengfeig Guo, Ruining Deng, Tianyuan Yao, Quan Liu, Can Cui, Mengmeng Yin, Ziyue Xu, Holger Roth, Daguang Xu, Haichun Yang, Yuankai Huo

Abstract: In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this… ▽ More In digital pathology, the traditional method for deep learning-based image segmentation typically involves a two-stage process: initially segmenting high-resolution whole slide images (WSI) into smaller patches (e.g., 256x256, 512x512, 1024x1024) and subsequently reconstructing them to their original scale. This method often struggles to capture the complex details and vast scope of WSIs. In this paper, we propose the holistic histopathology (HoloHisto) segmentation method to achieve end-to-end segmentation on gigapixel WSIs, whose maximum resolution is above 80,000$\times$70,000 pixels. HoloHisto fundamentally shifts the paradigm of WSI segmentation to an end-to-end learning fashion with 1) a large (4K) resolution base patch for elevated visual information inclusion and efficient processing, and 2) a novel sequential tokenization mechanism to properly model the contextual relationships and efficiently model the rich information from the 4K input. To our best knowledge, HoloHisto presents the first holistic approach for gigapixel resolution WSI segmentation, supporting direct I/O of complete WSI and their corresponding gigapixel masks. Under the HoloHisto platform, we unveil a random 4K sampler that transcends ultra-high resolution, delivering 31 and 10 times more pixels than standard 2D and 3D patches, respectively, for advancing computational capabilities. To facilitate efficient 4K resolution dense prediction, we leverage sequential tokenization, utilizing a pre-trained image tokenizer to group image features into a discrete token grid. To assess the performance, our team curated a new kidney pathology image segmentation (KPIs) dataset with WSI-level glomeruli segmentation from whole mouse kidneys. From the results, HoloHisto-4K delivers remarkable performance gains over previous state-of-the-art models. △ Less

Submitted 3 July, 2024; originally announced July 2024.

arXiv:2407.00596 [pdf, other]

HATs: Hierarchical Adaptive Taxonomy Segmentation for Panoramic Pathology Image Analysis

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Juming Xiong, Shunxing Bao, Hao Li, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel… ▽ More Panoramic image segmentation in computational pathology presents a remarkable challenge due to the morphologically complex and variably scaled anatomy. For instance, the intricate organization in kidney pathology spans multiple layers, from regions like the cortex and medulla to functional units such as glomeruli, tubules, and vessels, down to various cell types. In this paper, we propose a novel Hierarchical Adaptive Taxonomy Segmentation (HATs) method, which is designed to thoroughly segment panoramic views of kidney structures by leveraging detailed anatomical insights. Our approach entails (1) the innovative HATs technique which translates spatial relationships among 15 distinct object classes into a versatile "plug-and-play" loss function that spans across regions, functional units, and cells, (2) the incorporation of anatomical hierarchies and scale considerations into a unified simple matrix representation for all panoramic entities, (3) the adoption of the latest AI foundation model (EfficientSAM) as a feature extraction tool to boost the model's adaptability, yet eliminating the need for manual prompt generation in conventional segment anything model (SAM). Experimental findings demonstrate that the HATs method offers an efficient and effective strategy for integrating clinical insights and imaging precedents into a unified segmentation model across more than 15 categories. The official implementation is publicly available at https://github.com/hrlblab/HATs. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.19286

arXiv:2407.00030 [pdf, other]

On Orchestrating Parallel Broadcasts for Distributed Ledgers

Authors: Peiyao Sheng, Chenyuan Wu, Dahlia Malkhi, Michael K. Reiter, Chrysoula Stathakopoulou, Michael Wei, Maofan Yin

Abstract: This paper introduces and develops the concept of ``ticketing'', through which atomic broadcasts are orchestrated by nodes in a distributed system. The paper studies different ticketing regimes that allow parallelism, yet prevent slow nodes from hampering overall progress. It introduces a hybrid scheme which combines managed and unmanaged ticketing regimes, striking a balance between adaptivity an… ▽ More This paper introduces and develops the concept of ``ticketing'', through which atomic broadcasts are orchestrated by nodes in a distributed system. The paper studies different ticketing regimes that allow parallelism, yet prevent slow nodes from hampering overall progress. It introduces a hybrid scheme which combines managed and unmanaged ticketing regimes, striking a balance between adaptivity and resilience. The performance evaluation demonstrates how managed and unmanaged ticketing regimes benefit throughput in systems with heterogeneous resources both in static and dynamic scenarios, with the managed ticketing regime performing better among the two as it adapts better. Finally, it demonstrates how using the hybrid ticketing regime performance can enjoy both the adaptivity of the managed regime and the liveness guarantees of the unmanaged regime. △ Less

Submitted 17 May, 2024; originally announced July 2024.

arXiv:2406.12404 [pdf]

Scan-to-BIM for As-built Roads: Automatic Road Digital Twinning from Semantically Labeled Point Cloud Data

Authors: Yuexiong Ding, Mengtian Yin, Ran Wei, Ioannis Brilakis, Muyang Liu, Xiaowei Luo

Abstract: Creating geometric digital twins (gDT) for as-built roads still faces many challenges, such as low automation level and accuracy, limited asset types and shapes, and reliance on engineering experience. A novel scan-to-building information modeling (scan-to-BIM) framework is proposed for automatic road gDT creation based on semantically labeled point cloud data (PCD), which considers six asset type… ▽ More Creating geometric digital twins (gDT) for as-built roads still faces many challenges, such as low automation level and accuracy, limited asset types and shapes, and reliance on engineering experience. A novel scan-to-building information modeling (scan-to-BIM) framework is proposed for automatic road gDT creation based on semantically labeled point cloud data (PCD), which considers six asset types: Road Surface, Road Side (Slope), Road Lane (Marking), Road Sign, Road Light, and Guardrail. The framework first segments the semantic PCD into spatially independent instances or parts, then extracts the sectional polygon contours as their representative geometric information, stored in JavaScript Object Notation (JSON) files using a new data structure. Primitive gDTs are finally created from JSON files using corresponding conversion algorithms. The proposed method achieves an average distance error of 1.46 centimeters and a processing speed of 6.29 meters per second on six real-world road segments with a total length of 1,200 meters. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.05590 [pdf, other]

NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security

Authors: Minghao Shao, Sofija Jancheska, Meet Udeshi, Brendan Dolan-Gavitt, Haoran Xi, Kimberly Milner, Boyuan Chen, Max Yin, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique

Abstract: Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database incl… ▽ More Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database includes metadata for LLM testing and adaptive learning, compiling a diverse range of CTF challenges from popular competitions. Utilizing the advanced function calling capabilities of LLMs, we build a fully automated system with an enhanced workflow and support for external tool calls. Our benchmark dataset and automated framework allow us to evaluate the performance of five LLMs, encompassing both black-box and open-source models. This work lays the foundation for future research into improving the efficiency of LLMs in interactive cybersecurity tasks and automated task planning. By providing a specialized dataset, our project offers an ideal platform for developing, testing, and refining LLM-based approaches to vulnerability detection and resolution. Evaluating LLMs on these challenges and comparing with human performance yields insights into their potential for AI-driven cybersecurity solutions to perform real-world threat management. We make our dataset open source to public https://github.com/NYU-LLM-CTF/LLM_CTF_Database along with our playground automated framework https://github.com/NYU-LLM-CTF/llm_ctf_automation. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.01838 [pdf, other]

Learning the Target Network in Function Space

Authors: Kavosh Asadi, Yao Liu, Shoham Sabach, Ming Yin, Rasool Fakoor

Abstract: We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algo… ▽ More We focus on the task of learning the value function in the reinforcement learning (RL) setting. This task is often solved by updating a pair of online and target networks while ensuring that the parameters of these two networks are equivalent. We propose Lookahead-Replicate (LR), a new value-function approximation algorithm that is agnostic to this parameter-space equivalence. Instead, the LR algorithm is designed to maintain an equivalence between the two networks in the function space. This value-based equivalence is obtained by employing a new target-network update. We show that LR leads to a convergent behavior in learning the value function. We also present empirical results demonstrating that LR-based target-network updates significantly improve deep RL on the Atari benchmark. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted to International Conference on Machine Learning (ICML24)

arXiv:2405.20495 [pdf, other]

Transfer Q Star: Principled Decoding for LLM Alignment

Authors: Souradip Chakraborty, Soumya Suvra Ghosal, Ming Yin, Dinesh Manocha, Mengdi Wang, Amrit Singh Bedi, Furong Huang

Abstract: Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable frame… ▽ More Aligning foundation models is essential for their safe and trustworthy deployment. However, traditional fine-tuning methods are computationally intensive and require updating billions of model parameters. A promising alternative, alignment via decoding, adjusts the response distribution directly without model updates to maximize a target reward $r$, thus providing a lightweight and adaptable framework for alignment. However, principled decoding methods rely on oracle access to an optimal Q-function ($Q^*$), which is often unavailable in practice. Hence, prior SoTA methods either approximate this $Q^*$ using $Q^{π_{\texttt{sft}}}$ (derived from the reference $\texttt{SFT}$ model) or rely on short-term rewards, resulting in sub-optimal decoding performance. In this work, we propose Transfer $Q^*$, which implicitly estimates the optimal value function for a target reward $r$ through a baseline model $ρ_{\texttt{BL}}$ aligned with a baseline reward $ρ_{\texttt{BL}}$ (which can be different from the target reward $r$). Theoretical analyses of Transfer $Q^*$ provide a rigorous characterization of its optimality, deriving an upper bound on the sub-optimality gap and identifying a hyperparameter to control the deviation from the pre-trained reference $\texttt{SFT}$ model based on user needs. Our approach significantly reduces the sub-optimality gap observed in prior SoTA methods and demonstrates superior empirical performance across key metrics such as coherence, diversity, and quality in extensive tests on several synthetic and real datasets. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17795 [pdf, other]

Dataset Regeneration for Sequential Recommendation

Authors: Mingjia Yin, Hao Wang, Wei Guo, Yong Liu, Suojuan Zhang, Sirui Zhao, Defu Lian, Enhong Chen

Abstract: The sequential recommender (SR) system is a crucial component of modern recommender systems, as it aims to capture the evolving preferences of users. Significant efforts have been made to enhance the capabilities of SR systems. These methods typically follow the model-centric paradigm, which involves developing effective models based on fixed datasets. However, this approach often overlooks potent… ▽ More The sequential recommender (SR) system is a crucial component of modern recommender systems, as it aims to capture the evolving preferences of users. Significant efforts have been made to enhance the capabilities of SR systems. These methods typically follow the model-centric paradigm, which involves developing effective models based on fixed datasets. However, this approach often overlooks potential quality issues and flaws inherent in the data. Driven by the potential of data-centric AI, we propose a novel data-centric paradigm for developing an ideal training dataset using a model-agnostic dataset regeneration framework called DR4SR. This framework enables the regeneration of a dataset with exceptional cross-architecture generalizability. Additionally, we introduce the DR4SR+ framework, which incorporates a model-aware dataset personalizer to tailor the regenerated dataset specifically for a target model. To demonstrate the effectiveness of the data-centric paradigm, we integrate our framework with various model-centric methods and observe significant performance improvements across four widely adopted datasets. Furthermore, we conduct in-depth analyses to explore the potential of the data-centric paradigm and provide valuable insights. The code can be found at https://anonymous.4open.science/r/KDD2024-86EA △ Less

Submitted 3 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.16345 [pdf]

Cypher4BIM: Releasing the Power of Graph for Building Knowledge Discovery

Authors: Junxiang Zhu, Nicholas Nisbet, Mengtian Yin, Ran Wei, Ioannis Brilakis

Abstract: Graph is considered a promising way for managing building information. A new graphic form of IFC (Industry Foundation Classes) data has just been developed, referred to as IFC-Graph. However, understanding of IFC-Graph is insufficient, especially for information query. This study aims to explore graphic building information query and develop a graph query language tailored for IFC-Graph. A series… ▽ More Graph is considered a promising way for managing building information. A new graphic form of IFC (Industry Foundation Classes) data has just been developed, referred to as IFC-Graph. However, understanding of IFC-Graph is insufficient, especially for information query. This study aims to explore graphic building information query and develop a graph query language tailored for IFC-Graph. A series of tasks were carried out, including a) investigating the structure of IFC data and the main types of information in IFC, b) investigating the graph query language Cypher, and c) developing a set of tailored functional query patterns. The developed language is referred to as Cypher4BIM. Five IFC models were used for validation, and the result shows that Cypher4BIM can query individual instances and complex relations from IFC, such as spatial structure, space boundary, and space accessibility. This study contributes to applications that require effective building information query, such as digital twin. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.12473 [pdf, other]

Learning Partially Aligned Item Representation for Cross-Domain Sequential Recommendation

Authors: Mingjia Yin, Hao Wang, Wei Guo, Yong Liu, Zhi Li, Sirui Zhao, Defu Lian, Enhong Chen

Abstract: Cross-domain sequential recommendation (CDSR) aims to uncover and transfer users' sequential preferences across multiple recommendation domains. While significant endeavors have been made, they primarily concentrated on developing advanced transfer modules and aligning user representations using self-supervised learning techniques. However, the problem of aligning item representations has received… ▽ More Cross-domain sequential recommendation (CDSR) aims to uncover and transfer users' sequential preferences across multiple recommendation domains. While significant endeavors have been made, they primarily concentrated on developing advanced transfer modules and aligning user representations using self-supervised learning techniques. However, the problem of aligning item representations has received limited attention, and misaligned item representations can potentially lead to sub-optimal sequential modeling and user representation alignment. To this end, we propose a model-agnostic framework called \textbf{C}ross-domain item representation \textbf{A}lignment for \textbf{C}ross-\textbf{D}omain \textbf{S}equential \textbf{R}ecommendation (\textbf{CA-CDSR}), which achieves sequence-aware generation and adaptively partial alignment for item representations. Specifically, we first develop a sequence-aware feature augmentation strategy, which captures both collaborative and sequential item correlations, thus facilitating holistic item representation generation. Next, we conduct an empirical study to investigate the partial representation alignment problem from a spectrum perspective. It motivates us to devise an adaptive spectrum filter, achieving partial alignment adaptively. Furthermore, the aligned item representations can be fed into different sequential encoders to obtain user representations. The entire framework is optimized in a multi-task learning paradigm with an annealing strategy. Extensive experiments have demonstrated that CA-CDSR can surpass state-of-the-art baselines by a significant margin and can effectively align items in representation spaces to enhance performance. △ Less

Submitted 3 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

arXiv:2404.17069 [pdf, other]

Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks

Authors: Yaqi Hu, Mingsheng Yin, Marco Mezzavilla, Hao Guo, Sundeep Rangan

Abstract: The upper mid-band (FR3) has been recently attracting interest for new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub 6GHz and millimeter wave bands, respectively. In order to efficiently design and optimize the network, channel modeling plays a key role since FR3 systems are expected to operat… ▽ More The upper mid-band (FR3) has been recently attracting interest for new generation of mobile networks, as it provides a promising balance between spectrum availability and coverage, which are inherent limitations of the sub 6GHz and millimeter wave bands, respectively. In order to efficiently design and optimize the network, channel modeling plays a key role since FR3 systems are expected to operate at multiple frequency bands. Data-driven methods, especially generative adversarial networks (GANs), can capture the intricate relationships among data samples, and provide an appropriate tool for FR3 channel modeling. In this work, we present the architecture, link state model, and path generative network of GAN-based FR3 channel modeling. The comparison of our model greatly matches the ray-tracing simulated data. △ Less

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.13528 [pdf, other]

doi 10.1145/3620666.3651384

SmartMem: Layout Transformation Elimination and Adaptation for Efficient DNN Execution on Mobile

Authors: Wei Niu, Md Musfiqur Rahman Sanim, Zhihao Shu, Jiexiong Guan, Xipeng Shen, Miao Yin, Gagan Agrawal, Bin Ren

Abstract: This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, w… ▽ More This work is motivated by recent developments in Deep Neural Networks, particularly the Transformer architectures underlying applications such as ChatGPT, and the need for performing inference on mobile devices. Focusing on emerging transformers (specifically the ones with computationally efficient Swin-like architectures) and large models (e.g., Stable Diffusion and LLMs) based on transformers, we observe that layout transformations between the computational operators cause a significant slowdown in these applications. This paper presents SmartMem, a comprehensive framework for eliminating most layout transformations, with the idea that multiple operators can use the same tensor layout through careful choice of layout and implementation of operations. Our approach is based on classifying the operators into four groups, and considering combinations of producer-consumer edges between the operators. We develop a set of methods for searching such layouts. Another component of our work is developing efficient memory layouts for 2.5 dimensional memory commonly seen in mobile devices. Our experimental results show that SmartMem outperforms 5 state-of-the-art DNN execution frameworks on mobile devices across 18 varied neural networks, including CNNs, Transformers with both local and global attention, as well as LLMs. In particular, compared to DNNFusion, SmartMem achieves an average speedup of 2.8$\times$, and outperforms TVM and MNN with speedups of 6.9$\times$ and 7.9$\times$, respectively, on average. △ Less

Submitted 21 April, 2024; originally announced April 2024.

arXiv:2404.13470 [pdf, other]

GWLZ: A Group-wise Learning-based Lossy Compression Framework for Scientific Data

Authors: Wenqi Jia, Sian Jin, Jinzhen Wang, Wei Niu, Dingwen Tao, Miao Yin

Abstract: The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded… ▽ More The rapid expansion of computational capabilities and the ever-growing scale of modern HPC systems present formidable challenges in managing exascale scientific data. Faced with such vast datasets, traditional lossless compression techniques prove insufficient in reducing data size to a manageable level while preserving all information intact. In response, researchers have turned to error-bounded lossy compression methods, which offer a balance between data size reduction and information retention. However, despite their utility, these compressors employing conventional techniques struggle with limited reconstruction quality. To address this issue, we draw inspiration from recent advancements in deep learning and propose GWLZ, a novel group-wise learning-based lossy compression framework with multiple lightweight learnable enhancer models. Leveraging a group of neural networks, GWLZ significantly enhances the decompressed data reconstruction quality with negligible impact on the compression efficiency. Experimental results on different fields from the Nyx dataset demonstrate remarkable improvements by GWLZ, achieving up to 20% quality enhancements with negligible overhead as low as 0.0003x. △ Less

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.11871 [pdf, other]

Group-On: Boosting One-Shot Segmentation with Supportive Query

Authors: Hanjing Zhou, Mingze Yin, JinTai Chen, Danny Chen, Jian Wu

Abstract: One-shot semantic segmentation aims to segment query images given only ONE annotated support image of the same class. This task is challenging because target objects in the support and query images can be largely different in appearance and pose (i.e., intra-class variation). Prior works suggested that incorporating more annotated support images in few-shot settings boosts performances but increas… ▽ More One-shot semantic segmentation aims to segment query images given only ONE annotated support image of the same class. This task is challenging because target objects in the support and query images can be largely different in appearance and pose (i.e., intra-class variation). Prior works suggested that incorporating more annotated support images in few-shot settings boosts performances but increases costs due to additional manual labeling. In this paper, we propose a novel approach for ONE-shot semantic segmentation, called Group-On, which packs multiple query images in batches for the benefit of mutual knowledge support within the same category. Specifically, after coarse segmentation masks of the batch of queries are predicted, query-mask pairs act as pseudo support data to enhance mask predictions mutually, under the guidance of a simple Group-On Voting module. Comprehensive experiments on three standard benchmarks show that, in the ONE-shot setting, our Group-On approach significantly outperforms previous works by considerable margins. For example, on the COCO-20i dataset, we increase mIoU scores by 8.21% and 7.46% on ASNet and HSNet baselines, respectively. With only one support image, Group-On can be even competitive with the counterparts using 5 annotated support images. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.04057 [pdf, other]

Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation

Authors: Mingyuan Zhou, Huangjie Zheng, Zhendong Wang, Mingzhang Yin, Hai Huang

Abstract: We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By refo… ▽ More We introduce Score identity Distillation (SiD), an innovative data-free method that distills the generative capabilities of pretrained diffusion models into a single-step generator. SiD not only facilitates an exponentially fast reduction in Fréchet inception distance (FID) during distillation but also approaches or even exceeds the FID performance of the original teacher diffusion models. By reformulating forward diffusion processes as semi-implicit distributions, we leverage three score-related identities to create an innovative loss mechanism. This mechanism achieves rapid FID reduction by training the generator using its own synthesized images, eliminating the need for real data or reverse-diffusion-based generation, all accomplished within significantly shortened generation time. Upon evaluation across four benchmark datasets, the SiD algorithm demonstrates high iteration efficiency during distillation and surpasses competing distillation approaches, whether they are one-step or few-step, data-free, or dependent on training data, in terms of generation quality. This achievement not only redefines the benchmarks for efficiency and effectiveness in diffusion distillation but also in the broader field of diffusion-based generation. The PyTorch implementation is available at https://github.com/mingyuanzhou/SiD △ Less

Submitted 24 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

Comments: ICML 2024, PyTorch implementation: https://github.com/mingyuanzhou/SiD

arXiv:2404.00268 [pdf, other]

A Unified Framework for Adaptive Representation Enhancement and Inversed Learning in Cross-Domain Recommendation

Authors: Luankang Zhang, Hao Wang, Suojuan Zhang, Mingjia Yin, Yongqiang Han, Jiaqing Zhang, Defu Lian, Enhong Chen

Abstract: Cross-domain recommendation (CDR), aiming to extract and transfer knowledge across domains, has attracted wide attention for its efficacy in addressing data sparsity and cold-start problems. Despite significant advances in representation disentanglement to capture diverse user preferences, existing methods usually neglect representation enhancement and lack rigorous decoupling constraints, thereby… ▽ More Cross-domain recommendation (CDR), aiming to extract and transfer knowledge across domains, has attracted wide attention for its efficacy in addressing data sparsity and cold-start problems. Despite significant advances in representation disentanglement to capture diverse user preferences, existing methods usually neglect representation enhancement and lack rigorous decoupling constraints, thereby limiting the transfer of relevant information. To this end, we propose a Unified Framework for Adaptive Representation Enhancement and Inversed Learning in Cross-Domain Recommendation (AREIL). Specifically, we first divide user embeddings into domain-shared and domain-specific components to disentangle mixed user preferences. Then, we incorporate intra-domain and inter-domain information to adaptively enhance the ability of user representations. In particular, we propose a graph convolution module to capture high-order information, and a self-attention module to reveal inter-domain correlations and accomplish adaptive fusion. Next, we adopt domain classifiers and gradient reversal layers to achieve inversed representation learning in a unified framework. Finally, we employ a cross-entropy loss for measuring recommendation performance and jointly optimize the entire framework via multi-task learning. Extensive experiments on multiple datasets validate the substantial improvement in the recommendation performance of AREIL. Moreover, ablation studies and representation visualizations further illustrate the effectiveness of adaptive enhancement and inversed learning in CDR. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: Accepted by DASFAA 2024

arXiv:2403.16812 [pdf, other]

Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Authors: Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, Xiaojuan Ma

Abstract: In AI-assisted decision-making, humans often passively review AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to p… ▽ More In AI-assisted decision-making, humans often passively review AI's suggestion and decide whether to accept or reject it as a whole. In such a paradigm, humans are found to rarely trigger analytical thinking and face difficulties in communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories in human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for future AI-assisted decision tool design. △ Less

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.12004 [pdf, other]

The Value, Benefits, and Concerns of Generative AI-Powered Assistance in Writing

Authors: Zhuoyan Li, Chen Liang, Jing Peng, Ming Yin

Abstract: Recent advances in generative AI technologies like large language models raise both excitement and concerns about the future of human-AI co-creation in writing. To unpack people's attitude towards and experience with generative AI-powered writing assistants, in this paper, we conduct an experiment to understand whether and how much value people attach to AI assistance, and how the incorporation of… ▽ More Recent advances in generative AI technologies like large language models raise both excitement and concerns about the future of human-AI co-creation in writing. To unpack people's attitude towards and experience with generative AI-powered writing assistants, in this paper, we conduct an experiment to understand whether and how much value people attach to AI assistance, and how the incorporation of AI assistance in writing workflows changes people's writing perceptions and performance. Our results suggest that people are willing to forgo financial payments to receive writing assistance from AI, especially if AI can provide direct content generation assistance and the writing task is highly creative. Generative AI-powered assistance is found to offer benefits in increasing people's productivity and confidence in writing. However, direct content generation assistance offered by AI also comes with risks, including decreasing people's sense of accountability and diversity in writing. We conclude by discussing the implications of our findings. △ Less

Submitted 18 March, 2024; originally announced March 2024.

Comments: CHI 2024

arXiv:2403.11574 [pdf, ps, other]

Offline Multitask Representation Learning for Reinforcement Learning

Authors: Haque Ishfaq, Thanh Nguyen-Tang, Songtao Feng, Raman Arora, Mengdi Wang, Ming Yin, Doina Precup

Abstract: We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is asked to learn the shared representation. We theoretically investigate offline multitask low-rank RL, and propose a new algorithm called MORL for offline multitask representation learning. Furthermore,… ▽ More We study offline multitask representation learning in reinforcement learning (RL), where a learner is provided with an offline dataset from different tasks that share a common representation and is asked to learn the shared representation. We theoretically investigate offline multitask low-rank RL, and propose a new algorithm called MORL for offline multitask representation learning. Furthermore, we examine downstream RL in reward-free, offline and online scenarios, where a new task is introduced to the agent that shares the same representation as the upstream offline tasks. Our theoretical results demonstrate the benefits of using the learned representation from the upstream offline task instead of directly learning the representation of the low-rank model. △ Less

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.10850 [pdf, other]

GAgent: An Adaptive Rigid-Soft Gripping Agent with Vision Language Models for Complex Lighting Environments

Authors: Zhuowei Li, Miao Zhang, Xiaotian Lin, Meng Yin, Shuai Lu, Xueqian Wang

Abstract: This paper introduces GAgent: an Gripping Agent designed for open-world environments that provides advanced cognitive abilities via VLM agents and flexible grasping abilities with variable stiffness soft grippers. GAgent comprises three primary components - Prompt Engineer module, Visual-Language Model (VLM) core and Workflow module. These three modules enhance gripper success rates by recognizing… ▽ More This paper introduces GAgent: an Gripping Agent designed for open-world environments that provides advanced cognitive abilities via VLM agents and flexible grasping abilities with variable stiffness soft grippers. GAgent comprises three primary components - Prompt Engineer module, Visual-Language Model (VLM) core and Workflow module. These three modules enhance gripper success rates by recognizing objects and materials and accurately estimating grasp area even under challenging lighting conditions. As part of creativity, researchers also created a bionic hybrid soft gripper with variable stiffness capable of gripping heavy loads while still gently engaging objects. This intelligent agent, featuring VLM-based cognitive processing with bionic design, shows promise as it could potentially benefit UAVs in various scenarios. △ Less

Submitted 16 March, 2024; originally announced March 2024.

arXiv:2403.09552 [pdf, other]

"Are You Really Sure?" Understanding the Effects of Human Self-Confidence Calibration in AI-Assisted Decision Making

Authors: Shuai Ma, Xinru Wang, Ying Lei, Chuhan Shi, Ming Yin, Xiaojuan Ma

Abstract: In AI-assisted decision-making, it is crucial but challenging for humans to achieve appropriate reliance on AI. This paper approaches this problem from a human-centered perspective, "human self-confidence calibration". We begin by proposing an analytical framework to highlight the importance of calibrated human self-confidence. In our first study, we explore the relationship between human self-con… ▽ More In AI-assisted decision-making, it is crucial but challenging for humans to achieve appropriate reliance on AI. This paper approaches this problem from a human-centered perspective, "human self-confidence calibration". We begin by proposing an analytical framework to highlight the importance of calibrated human self-confidence. In our first study, we explore the relationship between human self-confidence appropriateness and reliance appropriateness. Then in our second study, We propose three calibration mechanisms and compare their effects on humans' self-confidence and user experience. Subsequently, our third study investigates the effects of self-confidence calibration on AI-assisted decision-making. Results show that calibrating human self-confidence enhances human-AI team performance and encourages more rational reliance on AI (in some aspects) compared to uncalibrated baselines. Finally, we discuss our main findings and provide implications for designing future AI-assisted decision-making interfaces. △ Less

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.04530 [pdf, other]

Multi-District School Choice: Playing on Several Fields

Authors: Yannai A. Gonczarowski, Michael Yin, Shirley Zhang

Abstract: We extend the seminal model of Pathak and Sönmez (2008) to a setting with multiple school districts, each running its own separate centralized match, and focus on the case of two districts. In our setting, in addition to each student being either sincere or sophisticated, she is also either constrained - able to apply only to schools within her own district of residence - or unconstrained - able t… ▽ More We extend the seminal model of Pathak and Sönmez (2008) to a setting with multiple school districts, each running its own separate centralized match, and focus on the case of two districts. In our setting, in addition to each student being either sincere or sophisticated, she is also either constrained - able to apply only to schools within her own district of residence - or unconstrained - able to choose any single district within which to apply. We show that several key results from Pathak and Sönmez (2008) qualitatively flip: A sophisticated student may prefer for a sincere student to become sophisticated, and a sophisticated student may prefer for her own district to use Deferred Acceptance over the Boston Mechanism, irrespective of the mechanism used by the other district. We furthermore investigate the preferences of students over the constraint levels of other students. Many of these phenomena appear abundantly in large random markets. △ Less

Submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.03149 [pdf, other]

Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks

Authors: Yichang Xu, Ming Yin, Minghong Fang, Neil Zhenqiang Gong

Abstract: Recent studies have revealed that federated learning (FL), once considered secure due to clients not sharing their private data with the server, is vulnerable to attacks such as client-side training data distribution inference, where a malicious client can recreate the victim's data. While various countermeasures exist, they are not practical, often assuming server access to some training data or… ▽ More Recent studies have revealed that federated learning (FL), once considered secure due to clients not sharing their private data with the server, is vulnerable to attacks such as client-side training data distribution inference, where a malicious client can recreate the victim's data. While various countermeasures exist, they are not practical, often assuming server access to some training data or knowledge of label distribution before the attack. In this work, we bridge the gap by proposing InferGuard, a novel Byzantine-robust aggregation rule aimed at defending against client-side training data distribution inference attacks. In our proposed InferGuard, the server first calculates the coordinate-wise median of all the model updates it receives. A client's model update is considered malicious if it significantly deviates from the computed median update. We conduct a thorough evaluation of our proposed InferGuard on five benchmark datasets and perform a comparison with ten baseline methods. The results of our experiments indicate that our defense mechanism is highly effective in protecting against client-side training data distribution inference attacks, even against strong adaptive attacks. Furthermore, our method substantially outperforms the baseline methods in various practical FL scenarios. △ Less

Submitted 4 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: To appear in The Web Conference 2024 (WWW '24)

arXiv:2403.01791 [pdf, other]

Beyond Recommender: An Exploratory Study of the Effects of Different AI Roles in AI-Assisted Decision Making

Authors: Shuai Ma, Chenyi Zhang, Xinru Wang, Xiaojuan Ma, Ming Yin

Abstract: Artificial Intelligence (AI) is increasingly employed in various decision-making tasks, typically as a Recommender, providing recommendations that the AI deems correct. However, recent studies suggest this may diminish human analytical thinking and lead to humans' inappropriate reliance on AI, impairing the synergy in human-AI teams. In contrast, human advisors in group decision-making perform var… ▽ More Artificial Intelligence (AI) is increasingly employed in various decision-making tasks, typically as a Recommender, providing recommendations that the AI deems correct. However, recent studies suggest this may diminish human analytical thinking and lead to humans' inappropriate reliance on AI, impairing the synergy in human-AI teams. In contrast, human advisors in group decision-making perform various roles, such as analyzing alternative options or criticizing decision-makers to encourage their critical thinking. This diversity of roles has not yet been empirically explored in AI assistance. In this paper, we examine three AI roles: Recommender, Analyzer, and Devil's Advocate, and evaluate their effects across two AI performance levels. Our results show each role's distinct strengths and limitations in task performance, reliance appropriateness, and user experience. Notably, the Recommender role is not always the most effective, especially if the AI performance level is low, the Analyzer role may be preferable. These insights offer valuable implications for designing AI assistants with adaptive functional roles according to different situations. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.19286 [pdf, other]

PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation

Authors: Ruining Deng, Quan Liu, Can Cui, Tianyuan Yao, Jialin Yue, Juming Xiong, Lining Yu, Yifei Wu, Mengmeng Yin, Yu Wang, Shilin Zhao, Yucheng Tang, Haichun Yang, Yuankai Huo

Abstract: Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intrica… ▽ More Understanding the anatomy of renal pathology is crucial for advancing disease diagnostics, treatment evaluation, and clinical research. The complex kidney system comprises various components across multiple levels, including regions (cortex, medulla), functional units (glomeruli, tubules), and cells (podocytes, mesangial cells in glomerulus). Prior studies have predominantly overlooked the intricate spatial interrelations among objects from clinical knowledge. In this research, we introduce a novel universal proposition learning approach, called panoramic renal pathology segmentation (PrPSeg), designed to segment comprehensively panoramic structures within kidney by integrating extensive knowledge of kidney anatomy. In this paper, we propose (1) the design of a comprehensive universal proposition matrix for renal pathology, facilitating the incorporation of classification and spatial relationships into the segmentation process; (2) a token-based dynamic head single network architecture, with the improvement of the partial label image segmentation and capability for future data enlargement; and (3) an anatomy loss function, quantifying the inter-object relationships across the kidney. △ Less

Submitted 20 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: IEEE / CVF Computer Vision and Pattern Recognition Conference 2024

arXiv:2402.11637 [pdf, other]

Poisoning Federated Recommender Systems with Fake Users

Authors: Ming Yin, Yichang Xu, Minghong Fang, Neil Zhenqiang Gong

Abstract: Federated recommendation is a prominent use case within federated learning, yet it remains susceptible to various attacks, from user to server-side vulnerabilities. Poisoning attacks are particularly notable among user-side attacks, as participants upload malicious model updates to deceive the global model, often intending to promote or demote specific targeted items. This study investigates strat… ▽ More Federated recommendation is a prominent use case within federated learning, yet it remains susceptible to various attacks, from user to server-side vulnerabilities. Poisoning attacks are particularly notable among user-side attacks, as participants upload malicious model updates to deceive the global model, often intending to promote or demote specific targeted items. This study investigates strategies for executing promotion attacks in federated recommender systems. Current poisoning attacks on federated recommender systems often rely on additional information, such as the local training data of genuine users or item popularity. However, such information is challenging for the potential attacker to obtain. Thus, there is a need to develop an attack that requires no extra information apart from item embeddings obtained from the server. In this paper, we introduce a novel fake user based poisoning attack named PoisonFRS to promote the attacker-chosen targeted item in federated recommender systems without requiring knowledge about user-item rating data, user attributes, or the aggregation rule used by the server. Extensive experiments on multiple real-world datasets demonstrate that PoisonFRS can effectively promote the attacker-chosen targeted item to a large portion of genuine users and outperform current benchmarks that rely on additional information about the system. We further observe that the model updates from both genuine and fake users are indistinguishable within the latent space. △ Less

Submitted 18 February, 2024; originally announced February 2024.

Comments: To appear in The Web Conference 2024 (WWW '24)

arXiv:2402.10516 [pdf, other]

Generative AI for Controllable Protein Sequence Design: A Survey

Authors: Yiheng Zhu, Zitai Kong, Jialu Wu, Weize Liu, Yuqiang Han, Mingze Yin, Hongxia Xu, Chang-Yu Hsieh, Tingjun Hou

Abstract: The design of novel protein sequences with targeted functionalities underpins a central theme in protein engineering, impacting diverse fields such as drug discovery and enzymatic engineering. However, navigating this vast combinatorial search space remains a severe challenge due to time and financial constraints. This scenario is rapidly evolving as the transformative advancements in AI, particul… ▽ More The design of novel protein sequences with targeted functionalities underpins a central theme in protein engineering, impacting diverse fields such as drug discovery and enzymatic engineering. However, navigating this vast combinatorial search space remains a severe challenge due to time and financial constraints. This scenario is rapidly evolving as the transformative advancements in AI, particularly in the realm of generative models and optimization algorithms, have been propelling the protein design field towards an unprecedented revolution. In this survey, we systematically review recent advances in generative AI for controllable protein sequence design. To set the stage, we first outline the foundational tasks in protein sequence design in terms of the constraints involved and present key generative models and optimization algorithms. We then offer in-depth reviews of each design task and discuss the pertinent applications. Finally, we identify the unresolved challenges and highlight research opportunities that merit deeper exploration. △ Less

Submitted 16 February, 2024; originally announced February 2024.

Comments: 9 pages

arXiv:2402.07250 [pdf, other]

DIMON: Learning Solution Operators of Partial Differential Equations on a Diffeomorphic Family of Domains

Authors: Minglang Yin, Nicolas Charon, Ryan Brody, Lu Lu, Natalia Trayanova, Mauro Maggioni

Abstract: The solution of a PDE over varying initial/boundary conditions on multiple domains is needed in a wide variety of applications, but it is computationally expensive if the solution is computed de novo whenever the initial/boundary conditions of the domain change. We introduce a general operator learning framework, called DIffeomorphic Mapping Operator learNing (DIMON) to learn approximate PDE solut… ▽ More The solution of a PDE over varying initial/boundary conditions on multiple domains is needed in a wide variety of applications, but it is computationally expensive if the solution is computed de novo whenever the initial/boundary conditions of the domain change. We introduce a general operator learning framework, called DIffeomorphic Mapping Operator learNing (DIMON) to learn approximate PDE solutions over a family of domains $\{Ω_θ}_θ$, that learns the map from initial/boundary conditions and domain $Ω_θ$ to the solution of the PDE, or to specified functionals thereof. DIMON is based on transporting a given problem (initial/boundary conditions and domain $Ω_θ$) to a problem on a reference domain $Ω_{0}$, where training data from multiple problems is used to learn the map to the solution on $Ω_{0}$, which is then re-mapped to the original domain $Ω_θ$. We consider several problems to demonstrate the performance of the framework in learning both static and time-dependent PDEs on non-rigid geometries; these include solving the Laplace equation, reaction-diffusion equations, and a multiscale PDE that characterizes the electrical propagation on the left ventricle. This work paves the way toward the fast prediction of PDE solutions on a family of domains and the application of neural operators in engineering and precision medicine. △ Less

Submitted 11 February, 2024; originally announced February 2024.

arXiv:2401.15603 [pdf, other]

Improving Expressive Power of Spectral Graph Neural Networks with Eigenvalue Correction

Authors: Kangkang Lu, Yanhua Yu, Hao Fei, Xuan Li, Zixuan Yang, Zirui Guo, Meiyu Liang, Mengran Yin, Tat-Seng Chua

Abstract: In recent years, spectral graph neural networks, characterized by polynomial filters, have garnered increasing attention and have achieved remarkable performance in tasks such as node classification. These models typically assume that eigenvalues for the normalized Laplacian matrix are distinct from each other, thus expecting a polynomial filter to have a high fitting ability. However, this paper… ▽ More In recent years, spectral graph neural networks, characterized by polynomial filters, have garnered increasing attention and have achieved remarkable performance in tasks such as node classification. These models typically assume that eigenvalues for the normalized Laplacian matrix are distinct from each other, thus expecting a polynomial filter to have a high fitting ability. However, this paper empirically observes that normalized Laplacian matrices frequently possess repeated eigenvalues. Moreover, we theoretically establish that the number of distinguishable eigenvalues plays a pivotal role in determining the expressive power of spectral graph neural networks. In light of this observation, we propose an eigenvalue correction strategy that can free polynomial filters from the constraints of repeated eigenvalue inputs. Concretely, the proposed eigenvalue correction strategy enhances the uniform distribution of eigenvalues, thus mitigating repeated eigenvalues, and improving the fitting capacity and expressive power of polynomial filters. Extensive experimental results on both synthetic and real-world datasets demonstrate the superiority of our method. △ Less

Submitted 18 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI-24

arXiv:2401.10341 [pdf, other]

ELRT: Efficient Low-Rank Training for Compact Convolutional Neural Networks

Authors: Yang Sui, Miao Yin, Yu Gong, Jinqi Xiao, Huy Phan, Bo Yuan

Abstract: Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low rankness, has been well-studied in the literature. On the other hand, low-rank training, as an alternative way to train low-rank CNNs from scratch, has been exploited little yet. Unlike low-rank compression, low-rank training does not need pre-trained full-rank models, an… ▽ More Low-rank compression, a popular model compression technique that produces compact convolutional neural networks (CNNs) with low rankness, has been well-studied in the literature. On the other hand, low-rank training, as an alternative way to train low-rank CNNs from scratch, has been exploited little yet. Unlike low-rank compression, low-rank training does not need pre-trained full-rank models, and the entire training phase is always performed on the low-rank structure, bringing attractive benefits for practical applications. However, the existing low-rank training solutions still face several challenges, such as a considerable accuracy drop and/or still needing to update full-size models during the training. In this paper, we perform a systematic investigation on low-rank CNN training. By identifying the proper low-rank format and performance-improving strategy, we propose ELRT, an efficient low-rank training solution for high-accuracy, high-compactness, low-rank CNN models. Our extensive evaluation results for training various CNNs on different datasets demonstrate the effectiveness of ELRT. △ Less

Submitted 18 January, 2024; originally announced January 2024.

arXiv:2401.07058 [pdf, other]

Does More Advice Help? The Effects of Second Opinions in AI-Assisted Decision Making

Authors: Zhuoran Lu, Dakuo Wang, Ming Yin

Abstract: AI assistance in decision-making has become popular, yet people's inappropriate reliance on AI often leads to unsatisfactory human-AI collaboration performance. In this paper, through three pre-registered, randomized human subject experiments, we explore whether and how the provision of {second opinions} may affect decision-makers' behavior and performance in AI-assisted decision-making. We find t… ▽ More AI assistance in decision-making has become popular, yet people's inappropriate reliance on AI often leads to unsatisfactory human-AI collaboration performance. In this paper, through three pre-registered, randomized human subject experiments, we explore whether and how the provision of {second opinions} may affect decision-makers' behavior and performance in AI-assisted decision-making. We find that if both the AI model's decision recommendation and a second opinion are always presented together, decision-makers reduce their over-reliance on AI while increase their under-reliance on AI, regardless whether the second opinion is generated by a peer or another AI model. However, if decision-makers have the control to decide when to solicit a peer's second opinion, we find that their active solicitations of second opinions have the potential to mitigate over-reliance on AI without inducing increased under-reliance in some cases. We conclude by discussing the implications of our findings for promoting effective human-AI collaborations in decision-making. △ Less

Submitted 13 January, 2024; originally announced January 2024.

arXiv:2401.05840 [pdf, other]

Decoding AI's Nudge: A Unified Framework to Predict Human Behavior in AI-assisted Decision Making

Authors: Zhuoyan Li, Zhuoran Lu, Ming Yin

Abstract: With the rapid development of AI-based decision aids, different forms of AI assistance have been increasingly integrated into the human decision making processes. To best support humans in decision making, it is essential to quantitatively understand how diverse forms of AI assistance influence humans' decision making behavior. To this end, much of the current research focuses on the end-to-end pr… ▽ More With the rapid development of AI-based decision aids, different forms of AI assistance have been increasingly integrated into the human decision making processes. To best support humans in decision making, it is essential to quantitatively understand how diverse forms of AI assistance influence humans' decision making behavior. To this end, much of the current research focuses on the end-to-end prediction of human behavior using ``black-box'' models, often lacking interpretations of the nuanced ways in which AI assistance impacts the human decision making process. Meanwhile, methods that prioritize the interpretability of human behavior predictions are often tailored for one specific form of AI assistance, making adaptations to other forms of assistance difficult. In this paper, we propose a computational framework that can provide an interpretable characterization of the influence of different forms of AI assistance on decision makers in AI-assisted decision making. By conceptualizing AI assistance as the ``{\em nudge}'' in human decision making processes, our approach centers around modelling how different forms of AI assistance modify humans' strategy in weighing different information in making their decisions. Evaluations on behavior data collected from real human decision makers show that the proposed framework outperforms various baselines in accurately predicting human behavior in AI-assisted decision making. Based on the proposed framework, we further provide insights into how individuals with different cognitive styles are nudged by AI assistance differently. △ Less

Submitted 11 January, 2024; originally announced January 2024.

Comments: AAAI 2024

arXiv:2312.15877 [pdf, other]

PBCounter: Weighted Model Counting on Pseudo-Boolean Formulas

Authors: Yong Lai, Zhenghang Xu, Minghao Yin

Abstract: In Weighted Model Counting (WMC), we assign weights to literals and compute the sum of the weights of the models of a given propositional formula where the weight of an assignment is the product of the weights of its literals. The current WMC solvers work on Conjunctive Normal Form (CNF) formulas. However, CNF is not a natural representation for human-being in many applications. Motivated by the s… ▽ More In Weighted Model Counting (WMC), we assign weights to literals and compute the sum of the weights of the models of a given propositional formula where the weight of an assignment is the product of the weights of its literals. The current WMC solvers work on Conjunctive Normal Form (CNF) formulas. However, CNF is not a natural representation for human-being in many applications. Motivated by the stronger expressive power of pseudo-Boolean (PB) formulas than CNF, we propose to perform WMC on PB formulas. Based on a recent dynamic programming algorithm framework called ADDMC for WMC, we implement a weighted PB counting tool PBCounter. We compare PBCounter with the state-of-the-art weighted model counters SharpSAT-TD, ExactMC, D4, and ADDMC, where the latter tools work on CNF with encoding methods that convert PB constraints into a CNF formula. The experiments on three domains of benchmarks show that PBCounter is superior to the model counters on CNF formulas. △ Less

Submitted 25 December, 2023; originally announced December 2023.

arXiv:2312.15717 [pdf, other]

Spatial-Temporal Interplay in Human Mobility: A Hierarchical Reinforcement Learning Approach with Hypergraph Representation

Authors: Zhaofan Zhang, Yanan Xiao, Lu Jiang, Dingqi Yang, Minghao Yin, Pengyang Wang

Abstract: In the realm of human mobility, the decision-making process for selecting the next-visit location is intricately influenced by a trade-off between spatial and temporal constraints, which are reflective of individual needs and preferences. This trade-off, however, varies across individuals, making the modeling of these spatial-temporal dynamics a formidable challenge. To address the problem, in thi… ▽ More In the realm of human mobility, the decision-making process for selecting the next-visit location is intricately influenced by a trade-off between spatial and temporal constraints, which are reflective of individual needs and preferences. This trade-off, however, varies across individuals, making the modeling of these spatial-temporal dynamics a formidable challenge. To address the problem, in this work, we introduce the "Spatial-temporal Induced Hierarchical Reinforcement Learning" (STI-HRL) framework, for capturing the interplay between spatial and temporal factors in human mobility decision-making. Specifically, STI-HRL employs a two-tiered decision-making process: the low-level focuses on disentangling spatial and temporal preferences using dedicated agents, while the high-level integrates these considerations to finalize the decision. To complement the hierarchical decision setting, we construct a hypergraph to organize historical data, encapsulating the multi-aspect semantics of human mobility. We propose a cross-channel hypergraph embedding module to learn the representations as the states to facilitate the decision-making cycle. Our extensive experiments on two real-world datasets validate the superiority of STI-HRL over state-of-the-art methods in predicting users' next visits across various performance metrics. △ Less

Submitted 25 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2311.16502 [pdf, other]

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Authors: Xiang Yue, Yuansheng Ni, Kai Zhang, Tianyu Zheng, Ruoqi Liu, Ge Zhang, Samuel Stevens, Dongfu Jiang, Weiming Ren, Yuxuan Sun, Cong Wei, Botao Yu, Ruibin Yuan, Renliang Sun, Ming Yin, Boyuan Zheng, Zhenzhu Yang, Yibo Liu, Wenhao Huang, Huan Sun, Yu Su, Wenhu Chen

Abstract: We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and… ▽ More We introduce MMMU: a new benchmark designed to evaluate multimodal models on massive multi-discipline tasks demanding college-level subject knowledge and deliberate reasoning. MMMU includes 11.5K meticulously collected multimodal questions from college exams, quizzes, and textbooks, covering six core disciplines: Art & Design, Business, Science, Health & Medicine, Humanities & Social Science, and Tech & Engineering. These questions span 30 subjects and 183 subfields, comprising 30 highly heterogeneous image types, such as charts, diagrams, maps, tables, music sheets, and chemical structures. Unlike existing benchmarks, MMMU focuses on advanced perception and reasoning with domain-specific knowledge, challenging models to perform tasks akin to those faced by experts. The evaluation of 14 open-source LMMs as well as the proprietary GPT-4V(ision) and Gemini highlights the substantial challenges posed by MMMU. Even the advanced GPT-4V and Gemini Ultra only achieve accuracies of 56% and 59% respectively, indicating significant room for improvement. We believe MMMU will stimulate the community to build next-generation multimodal foundation models towards expert artificial general intelligence. △ Less

Submitted 13 June, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: CVPR 2024 Oral

arXiv:2311.10357 [pdf, ps, other]

Fast algorithms for classical specifications of stabiliser states and Clifford gates

Authors: Nadish de Silva, Wilfred Salmon, Ming Yin

Abstract: The stabiliser formalism plays a central role in quantum computing, error correction, and fault-tolerance. Stabiliser states are used to encode computational basis states. Clifford gates are those which can be easily performed fault-tolerantly in the most common error correction schemes. Their mathematical properties are the subject of significant research interest. Conversions between and verif… ▽ More The stabiliser formalism plays a central role in quantum computing, error correction, and fault-tolerance. Stabiliser states are used to encode computational basis states. Clifford gates are those which can be easily performed fault-tolerantly in the most common error correction schemes. Their mathematical properties are the subject of significant research interest. Conversions between and verifications of different specifications of stabiliser states and Clifford gates are important components of many classical algorithms in quantum information, e.g. for gate synthesis, circuit optimisation, and for simulating quantum circuits. These core functions are also used in the numerical experiments critical to formulating and testing mathematical conjectures on the stabiliser formalism. We develop novel mathematical insights concerning stabiliser states and Clifford gates that significantly clarify their descriptions. We then utilise these to provide ten new fast algorithms which offer asymptotic advantages over any existing implementations. We show how to rapidly verify that a vector is a stabiliser state, and interconvert between its specification as amplitudes, a quadratic form, and a check matrix. These methods are leveraged to rapidly check if a given unitary matrix is a Clifford gate and to interconvert between the matrix of a Clifford gate and its compact specification as a stabiliser tableau. For example, we extract the stabiliser tableau of a Clifford gate matrix with $N^2$ entries in $O(N \log N)$ time. Remarkably, it is not necessary to read all the elements of a Clifford matrix to extract its stabiliser tableau. This is an asymptotic speedup over the best-known method that is superexponential in the number of qubits. We provide example implementations of our algorithms in Python. △ Less

Submitted 26 May, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

Comments: Python implementations available at https://github.com/ndesilva/stabiliser-tools. New in v2: new algorithm for extracting the stabiliser tableau of a Clifford gate matrix that is exponentially faster compared to v1, more thorough complexity analyses. New in v3: new and faster algorithms, comparisons with existing implementations

arXiv:2311.02816 [pdf, other]

doi 10.1145/3583780.3614781

APGL4SR: A Generic Framework with Adaptive and Personalized Global Collaborative Information in Sequential Recommendation

Authors: Mingjia Yin, Hao Wang, Xiang Xu, Likang Wu, Sirui Zhao, Wei Guo, Yong Liu, Ruiming Tang, Defu Lian, Enhong Chen

Abstract: The sequential recommendation system has been widely studied for its promising effectiveness in capturing dynamic preferences buried in users' sequential behaviors. Despite the considerable achievements, existing methods usually focus on intra-sequence modeling while overlooking exploiting global collaborative information by inter-sequence modeling, resulting in inferior recommendation performance… ▽ More The sequential recommendation system has been widely studied for its promising effectiveness in capturing dynamic preferences buried in users' sequential behaviors. Despite the considerable achievements, existing methods usually focus on intra-sequence modeling while overlooking exploiting global collaborative information by inter-sequence modeling, resulting in inferior recommendation performance. Therefore, previous works attempt to tackle this problem with a global collaborative item graph constructed by pre-defined rules. However, these methods neglect two crucial properties when capturing global collaborative information, i.e., adaptiveness and personalization, yielding sub-optimal user representations. To this end, we propose a graph-driven framework, named Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), that incorporates adaptive and personalized global collaborative information into sequential recommendation systems. Specifically, we first learn an adaptive global graph among all items and capture global collaborative information with it in a self-supervised fashion, whose computational burden can be further alleviated by the proposed SVD-based accelerator. Furthermore, based on the graph, we propose to extract and utilize personalized item correlations in the form of relative positional encoding, which is a highly compatible manner of personalizing the utilization of global collaborative information. Finally, the entire framework is optimized in a multi-task learning paradigm, thus each part of APGL4SR can be mutually reinforced. As a generic framework, APGL4SR can outperform other baselines with significant margins. The code is available at https://github.com/Graph-Team/APGL4SR. △ Less

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2310.18919 [pdf, other]

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation

Authors: Nikki Lijing Kuang, Ming Yin, Mengdi Wang, Yu-Xiang Wang, Yi-An Ma

Abstract: Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly… ▽ More Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[τ])$ worst-case regret in the presence of unknown stochastic delays. Here $E[τ]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $\widetilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms. △ Less

Submitted 3 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.12026 [pdf, other]

Nonparametric Discrete Choice Experiments with Machine Learning Guided Adaptive Design

Authors: Mingzhang Yin, Ruijiang Gao, Weiran Lin, Steven M. Shugan

Abstract: Designing products to meet consumers' preferences is essential for a business's success. We propose the Gradient-based Survey (GBS), a discrete choice experiment for multiattribute product design. The experiment elicits consumer preferences through a sequence of paired comparisons for partial profiles. GBS adaptively constructs paired comparison questions based on the respondents' previous choices… ▽ More Designing products to meet consumers' preferences is essential for a business's success. We propose the Gradient-based Survey (GBS), a discrete choice experiment for multiattribute product design. The experiment elicits consumer preferences through a sequence of paired comparisons for partial profiles. GBS adaptively constructs paired comparison questions based on the respondents' previous choices. Unlike the traditional random utility maximization paradigm, GBS is robust to model misspecification by not requiring a parametric utility model. Cross-pollinating the machine learning and experiment design, GBS is scalable to products with hundreds of attributes and can design personalized products for heterogeneous consumers. We demonstrate the advantage of GBS in accuracy and sample efficiency compared to the existing parametric and nonparametric methods in simulations. △ Less

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.08824 [pdf, other]

Confounding-Robust Policy Improvement with Human-AI Teams

Authors: Ruijiang Gao, Mingzhang Yin

Abstract: Human-AI collaboration has the potential to transform various domains by leveraging the complementary strengths of human experts and Artificial Intelligence (AI) systems. However, unobserved confounding can undermine the effectiveness of this collaboration, leading to biased and unreliable outcomes. In this paper, we propose a novel solution to address unobserved confounding in human-AI collaborat… ▽ More Human-AI collaboration has the potential to transform various domains by leveraging the complementary strengths of human experts and Artificial Intelligence (AI) systems. However, unobserved confounding can undermine the effectiveness of this collaboration, leading to biased and unreliable outcomes. In this paper, we propose a novel solution to address unobserved confounding in human-AI collaboration by employing the marginal sensitivity model (MSM). Our approach combines domain expertise with AI-driven statistical modeling to account for potential confounders that may otherwise remain hidden. We present a deferral collaboration framework for incorporating the MSM into policy learning from observational data, enabling the system to control for the influence of unobserved confounding factors. In addition, we propose a personalized deferral collaboration system to leverage the diverse expertise of different human decision-makers. By adjusting for potential biases, our proposed solution enhances the robustness and reliability of collaborative outcomes. The empirical and theoretical analyses demonstrate the efficacy of our approach in mitigating unobserved confounding and improving the overall performance of human-AI collaborations. △ Less

Submitted 12 October, 2023; originally announced October 2023.

Comments: 24 pages

arXiv:2310.07849 [pdf, other]

Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations

Authors: Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ming Yin

Abstract: The collection and curation of high-quality training data is crucial for developing text classification models with superior performance, but it is often associated with significant costs and time investment. Researchers have recently explored using large language models (LLMs) to generate synthetic datasets as an alternative approach. However, the effectiveness of the LLM-generated synthetic data… ▽ More The collection and curation of high-quality training data is crucial for developing text classification models with superior performance, but it is often associated with significant costs and time investment. Researchers have recently explored using large language models (LLMs) to generate synthetic datasets as an alternative approach. However, the effectiveness of the LLM-generated synthetic data in supporting model training is inconsistent across different classification tasks. To better understand factors that moderate the effectiveness of the LLM-generated synthetic data, in this study, we look into how the performance of models trained on these synthetic data may vary with the subjectivity of classification. Our results indicate that subjectivity, at both the task level and instance level, is negatively associated with the performance of the model trained on synthetic data. We conclude by discussing the implications of our work on the potential and limitations of leveraging LLM for synthetic data generation. △ Less

Submitted 12 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

Comments: EMNLP 2023

arXiv:2310.06335 [pdf, other]

BBCA-CHAIN: Low Latency, High Throughput BFT Consensus on a DAG

Authors: Dahlia Malkhi, Chrysoula Stathakopoulou, Maofan Yin

Abstract: This paper presents a partially synchronous BFT consensus protocol powered by BBCA, a lightly modified Byzantine Consistent Broadcast (BCB) primitive. BBCA provides a Complete-Adopt semantic through an added probing interface to allow either aborting the broadcast by correct nodes or exclusively, adopting the message consistently in case of a potential delivery. It does not introduce any extra typ… ▽ More This paper presents a partially synchronous BFT consensus protocol powered by BBCA, a lightly modified Byzantine Consistent Broadcast (BCB) primitive. BBCA provides a Complete-Adopt semantic through an added probing interface to allow either aborting the broadcast by correct nodes or exclusively, adopting the message consistently in case of a potential delivery. It does not introduce any extra types of messages or additional communication costs to BCB. BBCA is harnessed into BBCA-CHAIN to make direct commits on a chained backbone of a causally ordered graph of blocks, without any additional voting blocks or artificial layering. With the help of Complete-Adopt, the additional knowledge gained from the underlying BCB completely removes the voting latency in popular DAG-based protocols. At the same time, causal ordering allows nodes to propose blocks in parallel and achieve high throughput. BBCA-CHAIN thus closes up the gap between protocols built by consistent broadcasts (e.g., Bullshark) to those without such an abstraction (e.g., PBFT/HotStuff), emphasizing their shared fundamental principles. Using a Bracha-style BCB as an example, we fully specify BBCA-CHAIN with simplicity, serving as a solid basis for high-performance replication systems (and blockchains). △ Less

Submitted 24 May, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.16251 [pdf, other]

The effect of 3D stereopsis and hand-tool alignment on learning effectiveness and skill transfer of a VR-based simulator for dental training

Authors: Maximilian Kaluschke, Myat Su Yin, Peter Haddawy, Siriwan Suebnukarn, Gabriel Zachmann

Abstract: Dental simulators gained prevalence in recent years. Important aspects distinguishing VR hardware configurations are 3D stereoscopic rendering and visual alignment of the user's hands with the virtual tools. New dental simulators are often evaluated without analysing the impact of these simulation aspects. In this paper, we seek to determine the impact of 3D stereoscopic rendering and of hand-tool… ▽ More Dental simulators gained prevalence in recent years. Important aspects distinguishing VR hardware configurations are 3D stereoscopic rendering and visual alignment of the user's hands with the virtual tools. New dental simulators are often evaluated without analysing the impact of these simulation aspects. In this paper, we seek to determine the impact of 3D stereoscopic rendering and of hand-tool alignment on the teaching effectiveness and skill assessment accuracy of a VR dental simulator. We developed a bimanual simulator using an HMD and two haptic devices that provides an immersive environment with both 3D stereoscopic rendering and hand-tool alignment. We then independently controlled for each of the two aspects of the simulation. We trained four groups of students in root canal access opening using the simulator and measured the virtual and real learning gains. We quantified the real learning gains by pre- and post-testing using realistic plastic teeth and the virtual learning gains by scoring the training outcomes inside the simulator. We developed a scoring metric to automatically score the training outcomes that strongly correlates with experts' scoring of those outcomes. We found that hand-tool alignment has a positive impact on virtual and real learning gains, and improves the accuracy of skill assessment. We found that stereoscopic 3D had a negative impact on virtual and real learning gains, however it improves the accuracy of skill assessment. This finding is counter-intuitive, and we found eye-tooth distance to be a confounding variable of stereoscopic 3D, as it was significantly lower for the monoscopic 3D condition and negatively correlates with real learning gain. The results of our study provide valuable information for the future design of dental simulators, as well as simulators for other high-precision psycho-motor tasks. △ Less

Submitted 28 September, 2023; originally announced September 2023.

Comments: 26 pages, 15 figures, Accepted at online journal PLoS ONE

MSC Class: 62A86 (Primary) 62H30 (Secondary) ACM Class: J.3; G.3

arXiv:2308.08858 [pdf, ps, other]

Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games

Authors: Songtao Feng, Ming Yin, Yu-Xiang Wang, Jing Yang, Yingbin Liang

Abstract: The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $ε$-optimal Nash Equilibrium (NE) with the sample complexity of $O(H^3SAB/ε^2)$, which is optimal in the d… ▽ More The problem of two-player zero-sum Markov games has recently attracted increasing interests in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an $ε$-optimal Nash Equilibrium (NE) with the sample complexity of $O(H^3SAB/ε^2)$, which is optimal in the dependence of the horizon $H$ and the number of states $S$ (where $A$ and $B$ denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such an optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence for the first time demonstrate that model-free algorithms can enjoy the same optimality in the $H$ dependence as model-based algorithms. The main improvement of the dependency on $H$ arises by leveraging the popular variance reduction technique based on the reference-advantage decomposition previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend such a technique to Markov games, our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history in order to achieve the desired improvement in the sample efficiency. △ Less

Submitted 5 June, 2024; v1 submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.02665 [pdf, other]

Let's Give a Voice to Conversational Agents in Virtual Reality

Authors: Michele Yin, Gabriel Roccabruna, Abhinav Azad, Giuseppe Riccardi

Abstract: The dialogue experience with conversational agents can be greatly enhanced with multimodal and immersive interactions in virtual reality. In this work, we present an open-source architecture with the goal of simplifying the development of conversational agents operating in virtual environments. The architecture offers the possibility of plugging in conversational agents of different domains and ad… ▽ More The dialogue experience with conversational agents can be greatly enhanced with multimodal and immersive interactions in virtual reality. In this work, we present an open-source architecture with the goal of simplifying the development of conversational agents operating in virtual environments. The architecture offers the possibility of plugging in conversational agents of different domains and adding custom or cloud-based Speech-To-Text and Text-To-Speech models to make the interaction voice-based. Using this architecture, we present two conversational prototypes operating in the digital health domain developed in Unity for both non-immersive displays and VR headsets. △ Less

Submitted 4 August, 2023; originally announced August 2023.

arXiv:2306.14757 [pdf, other]

BBCA-LEDGER: High Throughput Consensus meets Low Latency

Authors: Chrysoula Stathakopoulou, Michael Wei, Maofan Yin, Hongbo Zhang, Dahlia Malkhi

Abstract: This paper presents BBCA-LEDGER, a Byzantine log replication technology for partially synchronous networks enabling blocks to be broadcast in parallel, such that each broadcast is finalized independently and instantaneously into an individual slot in the log. Every finalized broadcast is eventually committed to the total ordering, so that all network bandwidth has utility in disseminating blocks.… ▽ More This paper presents BBCA-LEDGER, a Byzantine log replication technology for partially synchronous networks enabling blocks to be broadcast in parallel, such that each broadcast is finalized independently and instantaneously into an individual slot in the log. Every finalized broadcast is eventually committed to the total ordering, so that all network bandwidth has utility in disseminating blocks. Finalizing log slots in parallel achieves both high throughput and low latency. BBCA-LEDGER is composed of two principal protocols that interweave together, a low-latency/high-throughput happy path, and a high-throughput DAG-based fallback path. The happy path employs a novel primitive called BBCA, a consistent broadcast enforcing unique slot numbering. In steady state, BBCA ensures that a transaction can be committed with low latency, in just 3 network steps. Under network partitions or faults, we harness recent advances in BFT and build a fallback mechanism on a direct acyclic graph (DAG) created by BBCA broadcasts. In this manner, BBCA-LEDGER exhibits the throughput benefits of DAG-based BFT in face of gaps. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.14063 [pdf, other]

Offline Policy Evaluation for Reinforcement Learning with Adaptively Collected Data

Authors: Sunil Madhow, Dan Qiao, Ming Yin, Yu-Xiang Wang

Abstract: Developing theoretical guarantees on the sample complexity of offline RL methods is an important step towards making data-hungry RL algorithms practically viable. Currently, most results hinge on unrealistic assumptions about the data distribution -- namely that it comprises a set of i.i.d. trajectories collected by a single logging policy. We consider a more general setting where the dataset may… ▽ More Developing theoretical guarantees on the sample complexity of offline RL methods is an important step towards making data-hungry RL algorithms practically viable. Currently, most results hinge on unrealistic assumptions about the data distribution -- namely that it comprises a set of i.i.d. trajectories collected by a single logging policy. We consider a more general setting where the dataset may have been gathered adaptively. We develop theory for the TMIS Offline Policy Evaluation (OPE) estimator in this generalized setting for tabular MDPs, deriving high-probability, instance-dependent bounds on its estimation error. We also recover minimax-optimal offline learning in the adaptive setting. Finally, we conduct simulations to empirically analyze the behavior of these estimators under adaptive and non-adaptive regimes. △ Less

Submitted 30 April, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

arXiv:2306.07992 [pdf, other]

Securing Visually-Aware Recommender Systems: An Adversarial Image Reconstruction and Detection Framework

Authors: Minglei Yin, Bin Liu, Neil Zhenqiang Gong, Xin Li

Abstract: With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to… ▽ More With rich visual data, such as images, becoming readily associated with items, visually-aware recommendation systems (VARS) have been widely used in different applications. Recent studies have shown that VARS are vulnerable to item-image adversarial attacks, which add human-imperceptible perturbations to the clean images associated with those items. Attacks on VARS pose new security challenges to a wide range of applications such as e-Commerce and social networks where VARS are widely used. How to secure VARS from such adversarial attacks becomes a critical problem. Currently, there is still a lack of systematic study on how to design secure defense strategies against visual attacks on VARS. In this paper, we attempt to fill this gap by proposing an adversarial image reconstruction and detection framework to secure VARS. Our proposed method can simultaneously (1) secure VARS from adversarial attacks characterized by local perturbations by image reconstruction based on global vision transformers; and (2) accurately detect adversarial examples using a novel contrastive learning approach. Meanwhile, our framework is designed to be used as both a filter and a detector so that they can be jointly trained to improve the flexibility of our defense strategy to a variety of attacks and VARS models. We have conducted extensive experimental studies with two popular attack methods (FGSM and PGD). Our experimental results on two real-world datasets show that our defense strategy against visual attacks is effective and outperforms existing methods on different attacks. Moreover, our method can detect adversarial examples with high accuracy. △ Less

Submitted 11 June, 2023; originally announced June 2023.

Showing 1–50 of 173 results for author: Yin, M