subscribe to arXiv mailings

Quantum Maximum Entropy Inference and Hamiltonian Learning

Authors: Minbo Gao, Zhengfeng Ji, Fuchao Wei

Abstract: Maximum entropy inference and learning of graphical models are pivotal tasks in learning theory and optimization. This work extends algorithms for these problems, including generalized iterative scaling (GIS) and gradient descent (GD), to the quantum realm. While the generalization, known as quantum iterative scaling (QIS), is straightforward, the key challenge lies in the non-commutative nature o… ▽ More Maximum entropy inference and learning of graphical models are pivotal tasks in learning theory and optimization. This work extends algorithms for these problems, including generalized iterative scaling (GIS) and gradient descent (GD), to the quantum realm. While the generalization, known as quantum iterative scaling (QIS), is straightforward, the key challenge lies in the non-commutative nature of quantum problem instances, rendering the convergence rate analysis significantly more challenging than the classical case. Our principal technical contribution centers on a rigorous analysis of the convergence rates, involving the establishment of both lower and upper bounds on the spectral radius of the Jacobian matrix for each iteration of these algorithms. Furthermore, we explore quasi-Newton methods to enhance the performance of QIS and GD. Specifically, we propose using Anderson mixing and the L-BFGS method for QIS and GD, respectively. These quasi-Newton techniques exhibit remarkable efficiency gains, resulting in orders of magnitude improvements in performance. As an application, our algorithms provide a viable approach to designing Hamiltonian learning algorithms. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: 27 pages, 7 figures

arXiv:2407.10098 [pdf, other]

Accelerator-as-a-Service in Public Clouds: An Intra-Host Traffic Management View for Performance Isolation in the Wild

Authors: Jiechen Zhao, Ran Shu, Katie Lim, Zewen Fan, Thomas Anderson, Mingyu Gao, Natalie Enright Jerger

Abstract: I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantee, (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we present… ▽ More I/O devices in public clouds have integrated increasing numbers of hardware accelerators, e.g., AWS Nitro, Azure FPGA and Nvidia BlueField. However, such specialized compute (1) is not explicitly accessible to cloud users with performance guarantee, (2) cannot be leveraged simultaneously by both providers and users, unlike general-purpose compute (e.g., CPUs). Through ten observations, we present that the fundamental difficulty of democratizing accelerators is insufficient performance isolation support. The key obstacles to enforcing accelerator isolation are (1) too many unknown traffic patterns in public clouds and (2) too many possible contention sources in the datapath. In this work, instead of scheduling such complex traffic on-the-fly and augmenting isolation support on each system component, we propose to model traffic as network flows and proactively re-shape the traffic to avoid unpredictable contention. We discuss the implications of our findings on the design of future I/O management stacks and device interfaces. △ Less

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.09678 [pdf, other]

Q statistics in data depth: fundamental theory revisited and variants

Authors: Min Gao, Yiting Chen, Xiaoping Shi, Wenzhi Yang

Abstract: Recently, data depth has been widely used to rank multivariate data. The study of the depth-based $Q$ statistic, originally proposed by Liu and Singh (1993), has become increasingly popular when it can be used as a quality index to differentiate between two samples. Based on the existing theoretical foundations, more and more variants have been developed for increasing power in the two sample test… ▽ More Recently, data depth has been widely used to rank multivariate data. The study of the depth-based $Q$ statistic, originally proposed by Liu and Singh (1993), has become increasingly popular when it can be used as a quality index to differentiate between two samples. Based on the existing theoretical foundations, more and more variants have been developed for increasing power in the two sample test. However, the asymptotic expansion of the $Q$ statistic in the important foundation work of Zuo and He (2006) currently has an optimal rate $m^{-3/4}$ slower than the target $m^{-1}$, leading to limitations in higher-order expansions for developing more powerful tests. We revisit the existing assumptions and add two new plausible assumptions to obtain the target rate by applying a new proof method based on the Hoeffding decomposition and the Cox-Reid expansion. The aim of this paper is to rekindle interest in asymptotic data depth theory, to place Q-statistical inference on a firmer theoretical basis, to show its variants in current research, to open the door to the development of new theories for further variants requiring higher-order expansions, and to explore more of its potential applications. △ Less

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.06250 [pdf, other]

FairDiff: Fair Segmentation with Point-Image Diffusion

Authors: Wenyi Li, Haoran Xu, Guiyu Zhang, Huan-ang Gao, Mingju Gao, Mengyu Wang, Hao Zhao

Abstract: Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy-enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack pai… ▽ More Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy-enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack paired labels or fail to precisely control the boundaries of synthetic images to be aligned with those labels. To address this, we formulate the problem in a joint optimization manner, in which three networks are optimized towards the goal of empirical risk minimization and fairness maximization. On the implementation side, our solution features an innovative Point-Image Diffusion architecture, which leverages 3D point clouds for improved control over mask boundaries through a point-mask-image synthesis pipeline. This method outperforms significantly existing techniques in synthesizing scanning laser ophthalmoscopy (SLO) fundus images. By combining synthetic data with real data during the training phase using a proposed Equal Scale approach, our model achieves superior fairness segmentation performance compared to the state-of-the-art fairness learning models. Code is available at https://github.com/wenyi-li/FairDiff. △ Less

Submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted to MICCAI 2024

arXiv:2407.05126 [pdf, other]

doi 10.1145/3637528.3672056

Consistency and Discrepancy-Based Contrastive Tripartite Graph Learning for Recommendations

Authors: Linxin Guo, Yaochen Zhu, Min Gao, Yinghui Tao, Junliang Yu, Chen Chen

Abstract: Tripartite graph-based recommender systems markedly diverge from traditional models by recommending unique combinations such as user groups and item bundles. Despite their effectiveness, these systems exacerbate the longstanding cold-start problem in traditional recommender systems, because any number of user groups or item bundles can be formed among users or items. To address this issue, we intr… ▽ More Tripartite graph-based recommender systems markedly diverge from traditional models by recommending unique combinations such as user groups and item bundles. Despite their effectiveness, these systems exacerbate the longstanding cold-start problem in traditional recommender systems, because any number of user groups or item bundles can be formed among users or items. To address this issue, we introduce a Consistency and Discrepancy-based graph contrastive learning method for tripartite graph-based Recommendation. This approach leverages two novel meta-path-based metrics consistency and discrepancy to capture nuanced, implicit associations between the recommended objects and the recommendees. These metrics, indicative of high-order similarities, can be efficiently calculated with infinite graph convolutional networks layers under a multi-objective optimization framework, using the limit theory of GCN. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.00488 [pdf, other]

PFME: A Modular Approach for Fine-grained Hallucination Detection and Editing of Large Language Models

Authors: Kunquan Deng, Zeyu Huang, Chen Li, Chenghua Lin, Min Gao, Wenge Rong

Abstract: Large Language Models (LLMs) excel in fluency but risk producing inaccurate content, called "hallucinations." This paper outlines a standardized process for categorizing fine-grained hallucination types and proposes an innovative framework--the Progressive Fine-grained Model Editor (PFME)--specifically designed to detect and correct fine-grained hallucinations in LLMs. PFME consists of two collabo… ▽ More Large Language Models (LLMs) excel in fluency but risk producing inaccurate content, called "hallucinations." This paper outlines a standardized process for categorizing fine-grained hallucination types and proposes an innovative framework--the Progressive Fine-grained Model Editor (PFME)--specifically designed to detect and correct fine-grained hallucinations in LLMs. PFME consists of two collaborative modules: the Real-time Fact Retrieval Module and the Fine-grained Hallucination Detection and Editing Module. The former identifies key entities in the document and retrieves the latest factual evidence from credible sources. The latter further segments the document into sentence-level text and, based on relevant evidence and previously edited context, identifies, locates, and edits each sentence's hallucination type. Experimental results on FavaBench and FActScore demonstrate that PFME outperforms existing methods in fine-grained hallucination detection tasks. Particularly, when using the Llama3-8B-Instruct model, PFME's performance in fine-grained hallucination detection with external knowledge assistance improves by 8.7 percentage points (pp) compared to ChatGPT. In editing tasks, PFME further enhances the FActScore of FActScore-Alpaca13B and FActScore-ChatGPT datasets, increasing by 16.2pp and 4.6pp, respectively. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00433 [pdf]

Screening of half-Heuslers with temperature-induced band convergence and enhanced thermoelectric properties

Authors: Jinyang Xi, Zirui Dong, Menghan Gao, Jun Luo, Jiong Yang

Abstract: Enhancing band convergence is an effective way to optimize the thermoelectric (TE) properties of materials. However, the temperature-induced band renormalization is commonly ignored. By employing the recently-developed electron-phonon renormalization (EPR) method, the nature of band renormalization in half-Heusler (HH) compounds TiCoSb and NbFeSb is revealed, and the key factors for temperature-in… ▽ More Enhancing band convergence is an effective way to optimize the thermoelectric (TE) properties of materials. However, the temperature-induced band renormalization is commonly ignored. By employing the recently-developed electron-phonon renormalization (EPR) method, the nature of band renormalization in half-Heusler (HH) compounds TiCoSb and NbFeSb is revealed, and the key factors for temperature-induced conduction band convergence in HH are found out. Using these as the screening criteria, 3 out of 274 HHs (TiRhBi, TiPtSn, NbPtTl) are then stood out from our MatHub-3d database. Taking TiPtSn as the example, it shows the conduction band convergence at mid-high temperature, and further resulting in enhanced Seebeck coefficient S: e.g., at 600 K with electron concentration 10^20 cm^-3, the predicted S with and without renormalized band is 352.83 uV/K and 289.52 uV/K, respectively. Herein, the former is closer to our measurement value of 338.79 uV/K. Besides, the effective masses obtained from calculation and experiment are both enlarged with temperature, indicating the existence of band convergence. Our work demonstrates for the first time the significance of adding the temperature effect on electronic structure in the design of potential high-performance TE materials. △ Less

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.18365 [pdf, other]

Themis: Towards Flexible and Interpretable NLG Evaluation

Authors: Xinyu Hu, Li Lin, Mingqi Gao, Xunjian Yin, Xiaojun Wan

Abstract: The evaluation of natural language generation (NLG) tasks is a significant and longstanding research issue. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the impro… ▽ More The evaluation of natural language generation (NLG) tasks is a significant and longstanding research issue. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the improved performance of existing methods, they still possess some deficiencies, such as dependency on references and limited evaluation flexibility. Therefore, in this paper, we meticulously construct a large-scale NLG evaluation corpus NLG-Eval with human and GPT-4 annotations to alleviate the lack of relevant data in this field. Furthermore, we propose Themis, an LLM dedicated to NLG evaluation, which has been trained with our designed multi-perspective consistency and rating-oriented preference alignment methods. Themis can conduct flexible and interpretable evaluations without references, and it exhibits superior evaluation performance on various NLG tasks, simultaneously generalizing well to unseen tasks and surpassing other evaluation models, including GPT-4. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17005 [pdf, other]

PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo , et al. (12 additional authors not shown)

Abstract: Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as… ▽ More Pixel-level Video Understanding in the Wild Challenge (PVUW) focus on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, Complex Video Object Segmentation Track based on MOSE dataset and Motion Expression guided Video Segmentation track based on MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset MeViS to study the natural language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. The MOSE challenge had 140 registered teams in total, 65 teams participated the validation phase and 12 teams made valid submissions in the final challenge phase. The MeViS challenge had 225 registered teams in total, 50 teams participated the validation phase and 5 teams made valid submissions in the final challenge phase. △ Less

Submitted 24 June, 2024; originally announced June 2024.

Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

arXiv:2406.14673 [pdf, other]

Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell

Authors: Taiming Lu, Muhan Gao, Kuai Yu, Adam Byerly, Daniel Khashabi

Abstract: Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information re… ▽ More Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information retrieval and utilization, a "know but don't tell" phenomenon. We further analyze the relationship between extraction time and final accuracy, offering insights into the underlying mechanics of transformer models. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.12066 [pdf, other]

Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks

Authors: Jack Gallifant, Shan Chen, Pedro Moreira, Nikolaj Munch, Mingye Gao, Jackson Pond, Leo Anthony Celi, Hugo Aerts, Thomas Hartvigsen, Danielle Bitterman

Abstract: Medical knowledge is context-dependent and requires consistent reasoning across various natural language expressions of semantically equivalent phrases. This is particularly crucial for drug names, where patients often use brand names like Advil or Tylenol instead of their generic equivalents. To study this, we create a new robustness dataset, RABBITS, to evaluate performance differences on medica… ▽ More Medical knowledge is context-dependent and requires consistent reasoning across various natural language expressions of semantically equivalent phrases. This is particularly crucial for drug names, where patients often use brand names like Advil or Tylenol instead of their generic equivalents. To study this, we create a new robustness dataset, RABBITS, to evaluate performance differences on medical benchmarks after swapping brand and generic drug names using physician expert annotations. We assess both open-source and API-based LLMs on MedQA and MedMCQA, revealing a consistent performance drop ranging from 1-10\%. Furthermore, we identify a potential source of this fragility as the contamination of test data in widely used pre-training datasets. All code is accessible at https://github.com/BittermanLab/RABBITS, and a HuggingFace leaderboard is available at https://huggingface.co/spaces/AIM-Harvard/rabbits-leaderboard. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

Comments: submitted for review, total 15 pages

arXiv:2406.11336 [pdf, other]

LFPLM: A General and Flexible Load Forecasting Framework based on Pre-trained Language Model

Authors: Mingyang Gao, Suyang Zhou, Wei Gu, Zhi Wu, Zijian Hu, Hong Zhu, Haiquan Liu

Abstract: Accurate load forecasting is essential for maintaining the power balance between generators and consumers, especially with the increasing integration of renewable energy sources, which introduce significant intermittent volatility. With the development of data-driven methods, machine learning and deep learning-based models have become the predominant approach for load forecasting tasks. In recent… ▽ More Accurate load forecasting is essential for maintaining the power balance between generators and consumers, especially with the increasing integration of renewable energy sources, which introduce significant intermittent volatility. With the development of data-driven methods, machine learning and deep learning-based models have become the predominant approach for load forecasting tasks. In recent years, pre-trained language models (PLMs) have made significant advancements, demonstrating superior performance in various fields. This paper proposes a load forecasting method based on PLMs, which offers not only accurate predictive ability but also general and flexible applicability. Additionally, a data modeling method is proposed to effectively transform load sequence data into natural language for PLM training. Furthermore, we introduce a data enhancement strategy that eliminate the impact of PLM hallucinations on forecasting results. The effectiveness of the proposed method has been validated on two real-world datasets. Compared with existing methods, our approach shows state-of-the-art performance across all validation metrics. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures and 5 tables

arXiv:2406.10304 [pdf, other]

Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design

Authors: Ming Gao, Hang Chen, Jun Du, Xin Xu, Hongxiao Guo, Hui Bu, Jianxing Yang, Ming Li, Chin-Hui Lee

Abstract: Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we… ▽ More Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: to be published in Interspeech 2024

arXiv:2406.09406 [pdf, other]

4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities

Authors: Roman Bachmann, Oğuzhan Fatih Kar, David Mizrahi, Ali Garjani, Mingfei Gao, David Griffiths, Jiaming Hu, Afshin Dehghan, Amir Zamir

Abstract: Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on. In this paper, we expand upon the capabilities of them by training a single model on tens of highly diverse moda… ▽ More Current multimodal and multitask foundation models like 4M or UnifiedIO show promising results, but in practice their out-of-the-box abilities to accept diverse inputs and perform diverse tasks are limited by the (usually rather small) number of modalities and tasks they are trained on. In this paper, we expand upon the capabilities of them by training a single model on tens of highly diverse modalities and by performing co-training on large-scale multimodal datasets and text corpora. This includes training on several semantic and geometric modalities, feature maps from recent state of the art models like DINOv2 and ImageBind, pseudo labels of specialist models like SAM and 4DHumans, and a range of new modalities that allow for novel ways to interact with the model and steer the generation, for example image metadata or color palettes. A crucial step in this process is performing discrete tokenization on various modalities, whether they are image-like, neural network feature maps, vectors, structured data like instance segmentation or human poses, or data that can be represented as text. Through this, we expand on the out-of-the-box capabilities of multimodal models and specifically show the possibility of training one model to solve at least 3x more tasks/modalities than existing ones and doing so without a loss in performance. This enables more fine-grained and controllable multimodal generation capabilities and allows us to study the distillation of models trained on diverse data and objectives into a unified model. We successfully scale the training to a three billion parameter model using tens of modalities and different datasets. The resulting models and training code are open sourced at 4m.epfl.ch. △ Less

Submitted 14 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Project page at 4m.epfl.ch

arXiv:2406.08419 [pdf, ps, other]

Identification and Inference on Treatment Effects under Covariate-Adaptive Randomization and Imperfect Compliance

Authors: Federico A. Bugni, Mengsi Gao, Filip Obradovic, Amilcar Velez

Abstract: Randomized controlled trials (RCTs) frequently utilize covariate-adaptive randomization (CAR) (e.g., stratified block randomization) and commonly suffer from imperfect compliance. This paper studies the identification and inference for the average treatment effect (ATE) and the average treatment effect on the treated (ATT) in such RCTs with a binary treatment. We first develop characterizations… ▽ More Randomized controlled trials (RCTs) frequently utilize covariate-adaptive randomization (CAR) (e.g., stratified block randomization) and commonly suffer from imperfect compliance. This paper studies the identification and inference for the average treatment effect (ATE) and the average treatment effect on the treated (ATT) in such RCTs with a binary treatment. We first develop characterizations of the identified sets for both estimands. Since data are generally not i.i.d. under CAR, these characterizations do not follow from existing results. We then provide consistent estimators of the identified sets and asymptotically valid confidence intervals for the parameters. Our asymptotic analysis leads to concrete practical recommendations regarding how to estimate the treatment assignment probabilities that enter in estimated bounds. In the case of the ATE, using sample analog assignment frequencies is more efficient than using the true assignment probabilities. On the contrary, using the true assignment probabilities is preferable for the ATT. △ Less

Submitted 20 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 62 pages and 3 tables

arXiv:2406.07967 [pdf, other]

Better than Random: Reliable NLG Human Evaluation with Constrained Active Sampling

Authors: Jie Ruan, Xiao Pu, Mingqi Gao, Xiaojun Wan, Yuesheng Zhu

Abstract: Human evaluation is viewed as a reliable evaluation method for NLG which is expensive and time-consuming. To save labor and costs, researchers usually perform human evaluation on a small subset of data sampled from the whole dataset in practice. However, different selection subsets will lead to different rankings of the systems. To give a more correct inter-system ranking and make the gold standar… ▽ More Human evaluation is viewed as a reliable evaluation method for NLG which is expensive and time-consuming. To save labor and costs, researchers usually perform human evaluation on a small subset of data sampled from the whole dataset in practice. However, different selection subsets will lead to different rankings of the systems. To give a more correct inter-system ranking and make the gold standard human evaluation more reliable, we propose a Constrained Active Sampling Framework (CASF) for reliable human judgment. CASF operates through a Learner, a Systematic Sampler and a Constrained Controller to select representative samples for getting a more correct inter-system ranking.Experiment results on 137 real NLG evaluation setups with 44 human evaluation metrics across 16 datasets and 5 NLG tasks demonstrate CASF receives 93.18% top-ranked system recognition accuracy and ranks first or ranks second on 90.91% of the human metrics with 0.83 overall inter-system ranking Kendall correlation.Code and data are publicly available online. △ Less

Submitted 12 June, 2024; originally announced June 2024.

Comments: With Appendix

arXiv:2406.07043 [pdf, other]

1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Authors: Mingqi Gao, Jingnan Luo, Jinyu Yang, Jungong Han, Feng Zheng

Abstract: Motion Expression guided Video Segmentation (MeViS), as an emerging task, poses many new challenges to the field of referring video object segmentation (RVOS). In this technical report, we investigated and validated the effectiveness of static-dominant data and frame sampling on this challenging setting. Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeVi… ▽ More Motion Expression guided Video Segmentation (MeViS), as an emerging task, poses many new challenges to the field of referring video object segmentation (RVOS). In this technical report, we investigated and validated the effectiveness of static-dominant data and frame sampling on this challenging setting. Our solution achieves a J&F score of 0.5447 in the competition phase and ranks 1st in the MeViS track of the PVUW Challenge. The code is available at: https://github.com/Tapall-AI/MeViS_Track_Solution_2024. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.04983 [pdf, other]

CityCraft: A Real Crafter for 3D City Generation

Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neur… ▽ More City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neural rendering. These techniques often exhibit limited diversity and noticeable artifacts in the rendered city scenes. The rendered scenes lack variety, resembling the training images, resulting in monotonous styles. Additionally, these methods lack planning capabilities, leading to less realistic generated scenes. In this paper, we introduce CityCraft, an innovative framework designed to enhance both the diversity and quality of urban scene generation. Our approach integrates three key stages: initially, a diffusion transformer (DiT) model is deployed to generate diverse and controllable 2D city layouts. Subsequently, a Large Language Model(LLM) is utilized to strategically make land-use plans within these layouts based on user prompts and language guidelines. Based on the generated layout and city plan, we utilize the asset retrieval module and Blender for precise asset placement and scene construction. Furthermore, we contribute two new datasets to the field: 1)CityCraft-OSM dataset including 2D semantic layouts of urban areas, corresponding satellite images, and detailed annotations. 2) CityCraft-Buildings dataset, featuring thousands of diverse, high-quality 3D building assets. CityCraft achieves state-of-the-art performance in generating realistic 3D cities. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: 20 pages, 9 figures

arXiv:2406.01022 [pdf, other]

Poisoning Attacks and Defenses in Recommender Systems: A Survey

Authors: Zongwei Wang, Junliang Yu, Min Gao, Wei Yuan, Guanhua Ye, Shazia Sadiq, Hongzhi Yin

Abstract: Modern recommender systems (RS) have profoundly enhanced user experience across digital platforms, yet they face significant threats from poisoning attacks. These attacks, aimed at manipulating recommendation outputs for unethical gains, exploit vulnerabilities in RS through injecting malicious data or intervening model training. This survey presents a unique perspective by examining these threats… ▽ More Modern recommender systems (RS) have profoundly enhanced user experience across digital platforms, yet they face significant threats from poisoning attacks. These attacks, aimed at manipulating recommendation outputs for unethical gains, exploit vulnerabilities in RS through injecting malicious data or intervening model training. This survey presents a unique perspective by examining these threats through the lens of an attacker, offering fresh insights into their mechanics and impacts. Concretely, we detail a systematic pipeline that encompasses four stages of a poisoning attack: setting attack goals, assessing attacker capabilities, analyzing victim architecture, and implementing poisoning strategies. The pipeline not only aligns with various attack tactics but also serves as a comprehensive taxonomy to pinpoint focuses of distinct poisoning attacks. Correspondingly, we further classify defensive strategies into two main categories: poisoning data filtering and robust training from the defender's perspective. Finally, we highlight existing limitations and suggest innovative directions for further exploration in this field. △ Less

Submitted 5 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: 22 pages, 8 figures

arXiv:2406.00322 [pdf, other]

Adaptive Penalized Likelihood method for Markov Chains

Authors: Yining Zhou, Ming Gao, Yiting Chen, Xiaoping Shi

Abstract: Maximum Likelihood Estimation (MLE) and Likelihood Ratio Test (LRT) are widely used methods for estimating the transition probability matrix in Markov chains and identifying significant relationships between transitions, such as equality. However, the estimated transition probability matrix derived from MLE lacks accuracy compared to the real one, and LRT is inefficient in high-dimensional Markov… ▽ More Maximum Likelihood Estimation (MLE) and Likelihood Ratio Test (LRT) are widely used methods for estimating the transition probability matrix in Markov chains and identifying significant relationships between transitions, such as equality. However, the estimated transition probability matrix derived from MLE lacks accuracy compared to the real one, and LRT is inefficient in high-dimensional Markov chains. In this study, we extended the adaptive Lasso technique from linear models to Markov chains and proposed a novel model by applying penalized maximum likelihood estimation to optimize the estimation of the transition probability matrix. Meanwhile, we demonstrated that the new model enjoys oracle properties, which means the estimated transition probability matrix has the same performance as the real one when given. Simulations show that our new method behave very well overall in comparison with various competitors. Real data analysis further convince the value of our proposed method. △ Less

Submitted 1 June, 2024; originally announced June 2024.

arXiv:2405.19469 [pdf, other]

Constraining Inflation with the BICEP/Keck CMB Polarization Experiments

Authors: The BICEP/Keck Collaboration, :, P. A. R. Ade, Z. Ahmed, M. Amiri, D. Barkats, R. Basu Thakur, C. A. Bischoff, D. Beck, J. J. Bock, H. Boenish, V. Buza, J. R. Cheshire IV, J. Connors, J. Cornelison, M. Crumrine, A. Cukierman, E. V. Denison, M. Dierickx, L. Duband, M. Eiben, B. Elwood, S. Fatigoni, J. P. Filippini, M. Gao , et al. (63 additional authors not shown)

Abstract: The BICEP/$\textit{Keck}$ (BK) series of cosmic microwave background (CMB) polarization experiments has, over the past decade and a half, produced a series of field-leading constraints on cosmic inflation via measurements of the "B-mode" polarization of the CMB. Primordial B modes are directly tied to the amplitude of primordial gravitational waves (PGW), their strength parameterized by the tensor… ▽ More The BICEP/$\textit{Keck}$ (BK) series of cosmic microwave background (CMB) polarization experiments has, over the past decade and a half, produced a series of field-leading constraints on cosmic inflation via measurements of the "B-mode" polarization of the CMB. Primordial B modes are directly tied to the amplitude of primordial gravitational waves (PGW), their strength parameterized by the tensor-to-scalar ratio, $r$, and thus the energy scale of inflation. Having set the most sensitive constraints to-date on $r$, $σ(r)=0.009$ ($r_{0.05}<0.036, 95\%$ C.L.) using data through the 2018 observing season ("BK18"), the BICEP/$\textit{Keck}$ program has continued to improve its dataset in the years since. We give a brief overview of the BK program and the "BK18" result before discussing the program's ongoing efforts, including the deployment and performance of the $\textit{Keck Array}$'s successor instrument, BICEP Array, improvements to data processing and internal consistency testing, new techniques such as delensing, and how those will ultimately serve to allow BK reach $σ(r) \lesssim 0.003$ using data through the 2027 observing season. △ Less

Submitted 11 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: 9 pages, 5 figures. Contribution to the 2024 Cosmology session of the 58th Rencontres de Moriond

arXiv:2405.18320 [pdf, other]

Self-Supervised Learning Based Handwriting Verification

Authors: Mihir Chauhan, Mohammad Abuzar Shaikh, Bina Ramamurthy, Mingchen Gao, Siwei Lyu, Sargur Srihari

Abstract: We present SSL-HV: Self-Supervised Learning approaches applied to the task of Handwriting Verification. This task involves determining whether a given pair of handwritten images originate from the same or different writer distribution. We have compared the performance of multiple generative, contrastive SSL approaches against handcrafted feature extractors and supervised learning on CEDAR AND data… ▽ More We present SSL-HV: Self-Supervised Learning approaches applied to the task of Handwriting Verification. This task involves determining whether a given pair of handwritten images originate from the same or different writer distribution. We have compared the performance of multiple generative, contrastive SSL approaches against handcrafted feature extractors and supervised learning on CEDAR AND dataset. We show that ResNet based Variational Auto-Encoder (VAE) outperforms other generative approaches achieving 76.3% accuracy, while ResNet-18 fine-tuned using Variance-Invariance-Covariance Regularization (VICReg) outperforms other contrastive approaches achieving 78% accuracy. Using a pre-trained VAE and VICReg for the downstream task of writer verification we observed a relative improvement in accuracy of 6.7% and 9% over ResNet-18 supervised baseline with 10% writer labels. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 14 pages, 6 figures, 2 tables

arXiv:2405.15414 [pdf, other]

Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

Authors: Yuxuan Guo, Shaohui Peng, Jiaming Guo, Di Huang, Xishan Zhang, Rui Zhang, Yifan Hao, Ling Li, Zikang Tian, Mingju Gao, Yutai Li, Yiming Gan, Shuai Liang, Zihao Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

Abstract: Building open agents has always been the ultimate goal in AI research, and creative agents are the more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., `mine diamonds' in Minecraft). However, they encounter difficulties on creative tasks with open goals and abstract criteria due to the inability to bridge the gap between them, thus lacking feedback for self… ▽ More Building open agents has always been the ultimate goal in AI research, and creative agents are the more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., `mine diamonds' in Minecraft). However, they encounter difficulties on creative tasks with open goals and abstract criteria due to the inability to bridge the gap between them, thus lacking feedback for self-improvement in solving the task. In this work, we introduce autonomous embodied verification techniques for agents to fill the gap, laying the groundwork for creative tasks. Specifically, we propose the Luban agent target creative building tasks in Minecraft, which equips with two-level autonomous embodied verification inspired by human design practices: (1) visual verification of 3D structural speculates, which comes from agent synthesized CAD modeling programs; (2) pragmatic verification of the creation by generating and verifying environment-relevant functionality programs based on the abstract criteria. Extensive multi-dimensional human studies and Elo ratings show that the Luban completes diverse creative building tasks in our proposed benchmark and outperforms other baselines ($33\%$ to $100\%$) in both visualization and pragmatism. Additional demos on the real-world robotic arm show the creation potential of the Luban in the physical world. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.15245 [pdf, other]

Cooperative Backdoor Attack in Decentralized Reinforcement Learning with Theoretical Guarantee

Authors: Mengtong Gao, Yifei Zou, Zuyuan Zhang, Xiuzhen Cheng, Dongxiao Yu

Abstract: The safety of decentralized reinforcement learning (RL) is a challenging problem since malicious agents can share their poisoned policies with benign agents. The paper investigates a cooperative backdoor attack in a decentralized reinforcement learning scenario. Differing from the existing methods that hide a whole backdoor attack behind their shared policies, our method decomposes the backdoor be… ▽ More The safety of decentralized reinforcement learning (RL) is a challenging problem since malicious agents can share their poisoned policies with benign agents. The paper investigates a cooperative backdoor attack in a decentralized reinforcement learning scenario. Differing from the existing methods that hide a whole backdoor attack behind their shared policies, our method decomposes the backdoor behavior into multiple components according to the state space of RL. Each malicious agent hides one component in its policy and shares its policy with the benign agents. When a benign agent learns all the poisoned policies, the backdoor attack is assembled in its policy. The theoretical proof is given to show that our cooperative method can successfully inject the backdoor into the RL policies of benign agents. Compared with the existing backdoor attacks, our cooperative method is more covert since the policy from each attacker only contains a component of the backdoor attack and is harder to detect. Extensive simulations are conducted based on Atari environments to demonstrate the efficiency and covertness of our method. To the best of our knowledge, this is the first paper presenting a provable cooperative backdoor attack in decentralized reinforcement learning. △ Less

Submitted 24 May, 2024; originally announced May 2024.

arXiv:2405.14496 [pdf, other]

Hybrid Global Causal Discovery with Local Search

Authors: Sujai Hiremath, Jacqueline R. M. A. Maasch, Mengxiao Gao, Promit Ghosal, Kyra Gan

Abstract: Learning the unique directed acyclic graph corresponding to an unknown causal model is a challenging task. Methods based on functional causal models can identify a unique graph, but either suffer from the curse of dimensionality or impose strong parametric assumptions. To address these challenges, we propose a novel hybrid approach for global causal discovery in observational data that leverages l… ▽ More Learning the unique directed acyclic graph corresponding to an unknown causal model is a challenging task. Methods based on functional causal models can identify a unique graph, but either suffer from the curse of dimensionality or impose strong parametric assumptions. To address these challenges, we propose a novel hybrid approach for global causal discovery in observational data that leverages local causal substructures. We first present a topological sorting algorithm that leverages ancestral relationships in linear structural equation models to establish a compact top-down hierarchical ordering, encoding more causal information than linear orderings produced by existing methods. We demonstrate that this approach generalizes to nonlinear settings with arbitrary noise. We then introduce a nonparametric constraint-based algorithm that prunes spurious edges by searching for local conditioning sets, achieving greater accuracy than current methods. We provide theoretical guarantees for correctness and worst-case polynomial time complexities, with empirical validation on synthetic data. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13019 [pdf, other]

A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models

Authors: Mahsa Khoshnoodi, Vinija Jain, Mingye Gao, Malavika Srikanth, Aman Chadha

Abstract: Despite the crucial importance of accelerating text generation in large language models (LLMs) for efficiently producing content, the sequential nature of this process often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed and developed to address these challenges and improve efficiency. This paper presents a comprehensive survey… ▽ More Despite the crucial importance of accelerating text generation in large language models (LLMs) for efficiently producing content, the sequential nature of this process often leads to high inference latency, posing challenges for real-time applications. Various techniques have been proposed and developed to address these challenges and improve efficiency. This paper presents a comprehensive survey of accelerated generation techniques in autoregressive language models, aiming to understand the state-of-the-art methods and their applications. We categorize these techniques into several key areas: speculative decoding, early exiting mechanisms, and non-autoregressive methods. We discuss each category's underlying principles, advantages, limitations, and recent advancements. Through this survey, we aim to offer insights into the current landscape of techniques in LLMs and provide guidance for future research directions in this critical area of natural language processing. △ Less

Submitted 24 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.12850 [pdf, other]

Weakly supervised alignment and registration of MR-CT for cervical cancer radiotherapy

Authors: Jjahao Zhang, Yin Gu, Deyu Sun, Yuhua Gao, Ming Gao, Ming Cui, Teng Zhang, He Ma

Abstract: Cervical cancer is one of the leading causes of death in women, and brachytherapy is currently the primary treatment method. However, it is important to precisely define the extent of paracervical tissue invasion to improve cancer diagnosis and treatment options. The fusion of the information characteristics of both computed tomography (CT) and magnetic resonance imaging(MRI) modalities may be use… ▽ More Cervical cancer is one of the leading causes of death in women, and brachytherapy is currently the primary treatment method. However, it is important to precisely define the extent of paracervical tissue invasion to improve cancer diagnosis and treatment options. The fusion of the information characteristics of both computed tomography (CT) and magnetic resonance imaging(MRI) modalities may be useful in achieving a precise outline of the extent of paracervical tissue invasion. Registration is the initial step in information fusion. However, when aligning multimodal images with varying depths, manual alignment is prone to large errors and is time-consuming. Furthermore, the variations in the size of the Region of Interest (ROI) and the shape of multimodal images pose a significant challenge for achieving accurate registration.In this paper, we propose a preliminary spatial alignment algorithm and a weakly supervised multimodal registration network. The spatial position alignment algorithm efficiently utilizes the limited annotation information in the two modal images provided by the doctor to automatically align multimodal images with varying depths. By utilizing aligned multimodal images for weakly supervised registration and incorporating pyramidal features and cost volume to estimate the optical flow, the results indicate that the proposed method outperforms traditional volume rendering alignment methods and registration networks in various evaluation metrics. This demonstrates the effectiveness of our model in multimodal image registration. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2405.09445 [pdf]

Revisiting first-principles thermodynamics by quasiharmonic approach: Application to study thermal expansion of additively-manufactured Inconel 625

Authors: Shun-Li Shang, Rushi Gong, Michael C. Gao, Darren C. Pagan, Zi-Kui Liu

Abstract: An innovative method is developed for accurate determination of thermodynamic properties as a function of temperature by revisiting the density functional theory (DFT) based quasiharmonic approach (QHA). The present methodology individually evaluates the contributions from static total energy, phonon, and thermal electron to free energy for increased efficiency and accuracy. The Akaike information… ▽ More An innovative method is developed for accurate determination of thermodynamic properties as a function of temperature by revisiting the density functional theory (DFT) based quasiharmonic approach (QHA). The present methodology individually evaluates the contributions from static total energy, phonon, and thermal electron to free energy for increased efficiency and accuracy. The Akaike information criterion with a correction (AICc) is used to select models and model parameters for fitting each contribution as a function of volume. Using the additively manufactured Inconel alloy 625 (IN625) as an example, predicted temperature-dependent linear coefficient of thermal expansion (CTE) agrees well with dilatometer measurements and values in the literature. Sensitivity and uncertainty are also analyzed for the predicted IN625 CTE due to different structural configurations used by DFT, and hence different equilibrium properties determined. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: This manuscript includes both the main text and the supplementary material, but without the supplementary Excel file

arXiv:2405.06932 [pdf, ps, other]

Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training

Authors: Junqin Huang, Zhongjie Hu, Zihao Jing, Mengya Gao, Yichao Wu

Abstract: In this report, we introduce Piccolo2, an embedding model that surpasses other models in the comprehensive evaluation over 6 tasks on CMTEB benchmark, setting a new state-of-the-art. Piccolo2 primarily leverages an efficient multi-task hybrid loss training approach, effectively harnessing textual data and labels from diverse downstream tasks. In addition, Piccolo2 scales up the embedding dimension… ▽ More In this report, we introduce Piccolo2, an embedding model that surpasses other models in the comprehensive evaluation over 6 tasks on CMTEB benchmark, setting a new state-of-the-art. Piccolo2 primarily leverages an efficient multi-task hybrid loss training approach, effectively harnessing textual data and labels from diverse downstream tasks. In addition, Piccolo2 scales up the embedding dimension and uses MRL training to support more flexible vector dimensions. The latest information of piccolo models can be accessed via: https://huggingface.co/sensenova/ △ Less

Submitted 11 May, 2024; originally announced May 2024.

Comments: tech report

arXiv:2405.05506 [pdf, other]

Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias

Authors: Shan Chen, Jack Gallifant, Mingye Gao, Pedro Moreira, Nikolaj Munch, Ajay Muthukkumar, Arvind Rajan, Jaya Kolluri, Amelia Fiske, Janna Hastings, Hugo Aerts, Brian Anthony, Leo Anthony Celi, William G. La Cava, Danielle S. Bitterman

Abstract: Large language models (LLMs) are increasingly essential in processing natural languages, yet their application is frequently compromised by biases and inaccuracies originating in their training data. In this study, we introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real world knowledge in LLMs, specifically focusing on the representation of disease prevalence… ▽ More Large language models (LLMs) are increasingly essential in processing natural languages, yet their application is frequently compromised by biases and inaccuracies originating in their training data. In this study, we introduce Cross-Care, the first benchmark framework dedicated to assessing biases and real world knowledge in LLMs, specifically focusing on the representation of disease prevalence across diverse demographic groups. We systematically evaluate how demographic biases embedded in pre-training corpora like $ThePile$ influence the outputs of LLMs. We expose and quantify discrepancies by juxtaposing these biases against actual disease prevalences in various U.S. demographic groups. Our results highlight substantial misalignment between LLM representation of disease prevalence and real disease prevalence rates across demographic subgroups, indicating a pronounced risk of bias propagation and a lack of real-world grounding for medical applications of LLMs. Furthermore, we observe that various alignment methods minimally resolve inconsistencies in the models' representation of disease prevalence across different languages. For further exploration and analysis, we make all data and a data visualization tool available at: www.crosscare.net. △ Less

Submitted 24 June, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: Submitted for review, data visualization tool available at: www.crosscare.net

arXiv:2405.05505 [pdf, other]

Unveiling Higher-Order Topology via Polarized Topological Charges

Authors: Wei Jia, Bao-Zong Wang, Ming-Jian Gao, Jun-Hong An

Abstract: Real-space topological invariants were widely used to characterize chiral-symmetric higher-order topological phases (HOTPs). However, a momentum-space characterization to these HOTPs, which essentially reveals their intrinsic bulk-boundary correspondence and facilitates their detection in quantum simulation systems, is still lacking. Here, we propose an experimentally observable momentum-space cha… ▽ More Real-space topological invariants were widely used to characterize chiral-symmetric higher-order topological phases (HOTPs). However, a momentum-space characterization to these HOTPs, which essentially reveals their intrinsic bulk-boundary correspondence and facilitates their detection in quantum simulation systems, is still lacking. Here, we propose an experimentally observable momentum-space characterization to the chiral-symmetric HOTPs by the concept of polarized topological charges. It provides a unified description to topological phase transitions caused by the closing and reopening of band gap not only of the bulk states but also the edge states. Remarkably, these polarized topological charges can be identified by measuring the pseudospin structures. A feasible scheme to detect the HOTPs in the $^{87}$Rb cold atomic system is given. Our work opens an avenue for characterization and experimental detection of the chiral-symmetric HOTPs in momentum space. △ Less

Submitted 20 May, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

Comments: 8+8 pages, 4+3 figures. References are updated.Typos are corrected

arXiv:2405.04177 [pdf]

doi 10.1109/CICC60959.2024.10528987

A 49.8mm2 Fully Integrated, 1.5m Transmission-Range, High-Data-Rate IR-UWB Transmitter for Brain Implants

Authors: Cong Ding, Mingxiang Gao, Anja K. Skrivervik, Mahsa Shoaran

Abstract: To address the challenge of extending the transmission range of implantable TXs while also minimizing their size and power consumption, this paper introduces a transcutaneous, high data-rate, fully integrated IR-UWB transmitter that employs a novel co-designed power amplifier (PA) and antenna interface for enhanced performance. With the co-designed interface, we achieved the smallest footprint of… ▽ More To address the challenge of extending the transmission range of implantable TXs while also minimizing their size and power consumption, this paper introduces a transcutaneous, high data-rate, fully integrated IR-UWB transmitter that employs a novel co-designed power amplifier (PA) and antenna interface for enhanced performance. With the co-designed interface, we achieved the smallest footprint of 49.8mm2 and the longest transmission range of 1.5m compared to the state-of-the-art IR-UWB TXs. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Journal ref: 2024 IEEE Custom Integrated Circuits Conference (CICC)

arXiv:2405.03435 [pdf, other]

doi 10.1088/2632-2153/ad3710

A method for quantifying the generalization capabilities of generative models for solving Ising models

Authors: Qunlong Ma, Zhi Ma, Ming Gao

Abstract: For Ising models with complex energy landscapes, whether the ground state can be found by neural networks depends heavily on the Hamming distance between the training datasets and the ground state. Despite the fact that various recently proposed generative models have shown good performance in solving Ising models, there is no adequate discussion on how to quantify their generalization capabilitie… ▽ More For Ising models with complex energy landscapes, whether the ground state can be found by neural networks depends heavily on the Hamming distance between the training datasets and the ground state. Despite the fact that various recently proposed generative models have shown good performance in solving Ising models, there is no adequate discussion on how to quantify their generalization capabilities. Here we design a Hamming distance regularizer in the framework of a class of generative models, variational autoregressive networks (VAN), to quantify the generalization capabilities of various network architectures combined with VAN. The regularizer can control the size of the overlaps between the ground state and the training datasets generated by networks, which, together with the success rates of finding the ground state, form a quantitative metric to quantify their generalization capabilities. We conduct numerical experiments on several prototypical network architectures combined with VAN, including feed-forward neural networks, recurrent neural networks, and graph neural networks, to quantify their generalization capabilities when solving Ising models. Moreover, considering the fact that the quantification of the generalization capabilities of networks on small-scale problems can be used to predict their relative performance on large-scale problems, our method is of great significance for assisting in the Neural Architecture Search field of searching for the optimal network architectures when solving large-scale Ising models. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 10 pages, 7 figures

Journal ref: Mach. Learn.: Sci. Technol. 5 (2024) 025011

arXiv:2404.14219 [pdf, other]

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra , et al. (90 additional authors not shown)

Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset… ▽ More We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered publicly available web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench). Moreover, we also introduce phi-3-vision, a 4.2 billion parameter model based on phi-3-mini with strong reasoning capabilities for image and text prompts. △ Less

Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: 19 pages

arXiv:2404.13946 [pdf, other]

Dual Model Replacement:invisible Multi-target Backdoor Attack based on Federal Learning

Authors: Rong Wang, Guichen Zhou, Mingjun Gao, Yunpeng Xiao

Abstract: In recent years, the neural network backdoor hidden in the parameters of the federated learning model has been proved to have great security risks. Considering the characteristics of trigger generation, data poisoning and model training in backdoor attack, this paper designs a backdoor attack method based on federated learning. Firstly, aiming at the concealment of the backdoor trigger, a TrojanGa… ▽ More In recent years, the neural network backdoor hidden in the parameters of the federated learning model has been proved to have great security risks. Considering the characteristics of trigger generation, data poisoning and model training in backdoor attack, this paper designs a backdoor attack method based on federated learning. Firstly, aiming at the concealment of the backdoor trigger, a TrojanGan steganography model with encoder-decoder structure is designed. The model can encode specific attack information as invisible noise and attach it to the image as a backdoor trigger, which improves the concealment and data transformations of the backdoor trigger.Secondly, aiming at the problem of single backdoor trigger mode, an image poisoning attack method called combination trigger attack is proposed. This method realizes multi-backdoor triggering by multiplexing combined triggers and improves the robustness of backdoor attacks. Finally, aiming at the problem that the local training mechanism leads to the decrease of the success rate of backdoor attack, a dual model replacement backdoor attack algorithm based on federated learning is designed. This method can improve the success rate of backdoor attack while maintaining the performance of the federated learning aggregation model. Experiments show that the attack strategy in this paper can not only achieve high backdoor concealment and diversification of trigger forms under federated learning, but also achieve good attack success rate in multi-target attacks.door concealment and diversification of trigger forms but also achieve good results in multi-target attacks. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.13941 [pdf, other]

Autoencoder-assisted Feature Ensemble Net for Incipient Faults

Authors: Mingxuan Gao, Min Wang, Maoyin Chen

Abstract: Deep learning has shown the great power in the field of fault detection. However, for incipient faults with tiny amplitude, the detection performance of the current deep learning networks (DLNs) is not satisfactory. Even if prior information about the faults is utilized, DLNs can't successfully detect faults 3, 9 and 15 in Tennessee Eastman process (TEP). These faults are notoriously difficult to… ▽ More Deep learning has shown the great power in the field of fault detection. However, for incipient faults with tiny amplitude, the detection performance of the current deep learning networks (DLNs) is not satisfactory. Even if prior information about the faults is utilized, DLNs can't successfully detect faults 3, 9 and 15 in Tennessee Eastman process (TEP). These faults are notoriously difficult to detect, lacking effective detection technologies in the field of fault detection. In this work, we propose Autoencoder-assisted Feature Ensemble Net (AE-FENet): a deep feature ensemble framework that uses the unsupervised autoencoder to conduct the feature transformation. Compared with the principle component analysis (PCA) technique adopted in the original Feature Ensemble Net (FENet), autoencoder can mine more exact features on incipient faults, which results in the better detection performance of AE-FENet. With same kinds of basic detectors, AE-FENet achieves a state-of-the-art average accuracy over 96% on faults 3, 9 and 15 in TEP, which represents a significant enhancement in performance compared to other methods. Plenty of experiments have been done to extend our framework, proving that DLNs can be utilized efficiently within this architecture. △ Less

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.12209 [pdf, other]

Partial-to-Partial Shape Matching with Geometric Consistency

Authors: Viktoria Ehm, Maolin Gao, Paul Roetzer, Marvin Eisenberger, Daniel Cremers, Florian Bernard

Abstract: Finding correspondences between 3D shapes is an important and long-standing problem in computer vision, graphics and beyond. A prominent challenge are partial-to-partial shape matching settings, which occur when the shapes to match are only observed incompletely (e.g. from 3D scanning). Although partial-to-partial matching is a highly relevant setting in practice, it is rarely explored. Our work b… ▽ More Finding correspondences between 3D shapes is an important and long-standing problem in computer vision, graphics and beyond. A prominent challenge are partial-to-partial shape matching settings, which occur when the shapes to match are only observed incompletely (e.g. from 3D scanning). Although partial-to-partial matching is a highly relevant setting in practice, it is rarely explored. Our work bridges the gap between existing (rather artificial) 3D full shape matching and partial-to-partial real-world settings by exploiting geometric consistency as a strong constraint. We demonstrate that it is indeed possible to solve this challenging problem in a variety of settings. For the first time, we achieve geometric consistency for partial-to-partial matching, which is realized by a novel integer non-linear program formalism building on triangle product spaces, along with a new pruning algorithm based on linear integer programming. Further, we generate a new inter-class dataset for partial-to-partial shape-matching. We show that our method outperforms current SOTA methods on both an established intra-class dataset and our novel inter-class dataset. △ Less

Submitted 10 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.11129 [pdf, other]

Fact :Teaching MLLMs with Faithful, Concise and Transferable Rationales

Authors: Minghe Gao, Shuang Chen, Liang Pang, Yuan Yao, Jisheng Dang, Wenqiao Zhang, Juncheng Li, Siliang Tang, Yueting Zhuang, Tat-Seng Chua

Abstract: The remarkable performance of Multimodal Large Language Models (MLLMs) has unequivocally demonstrated their proficient understanding capabilities in handling a wide array of visual tasks. Nevertheless, the opaque nature of their black-box reasoning processes persists as an enigma, rendering them uninterpretable and struggling with hallucination. Their ability to execute intricate compositional rea… ▽ More The remarkable performance of Multimodal Large Language Models (MLLMs) has unequivocally demonstrated their proficient understanding capabilities in handling a wide array of visual tasks. Nevertheless, the opaque nature of their black-box reasoning processes persists as an enigma, rendering them uninterpretable and struggling with hallucination. Their ability to execute intricate compositional reasoning tasks is also constrained, culminating in a stagnation of learning progression for these models. In this work, we introduce Fact, a novel paradigm designed to generate multimodal rationales that are faithful, concise, and transferable for teaching MLLMs. This paradigm utilizes verifiable visual programming to generate executable code guaranteeing faithfulness and precision. Subsequently, through a series of operations including pruning, merging, and bridging, the rationale enhances its conciseness. Furthermore, we filter rationales that can be transferred to end-to-end paradigms from programming paradigms to guarantee transferability. Empirical evidence from experiments demonstrates the superiority of our method across models of varying parameter sizes, significantly enhancing their compositional reasoning and generalization ability. Our approach also reduces hallucinations owing to its high correlation between images and text. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.09436 [pdf]

Image Reconstruction with B0 Inhomogeneity using an Interpretable Deep Unrolled Network on an Open-bore MRI-Linac

Authors: Shanshan Shan, Yang Gao, David E. J. Waddington, Hongli Chen, Brendan Whelan, Paul Z. Y. Liu, Yaohui Wang, Chunyi Liu, Hongping Gan, Mingyuan Gao, Feng Liu

Abstract: MRI-Linac systems require fast image reconstruction with high geometric fidelity to localize and track tumours for radiotherapy treatments. However, B0 field inhomogeneity distortions and slow MR acquisition potentially limit the quality of the image guidance and tumour treatments. In this study, we develop an interpretable unrolled network, referred to as RebinNet, to reconstruct distortion-free… ▽ More MRI-Linac systems require fast image reconstruction with high geometric fidelity to localize and track tumours for radiotherapy treatments. However, B0 field inhomogeneity distortions and slow MR acquisition potentially limit the quality of the image guidance and tumour treatments. In this study, we develop an interpretable unrolled network, referred to as RebinNet, to reconstruct distortion-free images from B0 inhomogeneity-corrupted k-space for fast MRI-guided radiotherapy applications. RebinNet includes convolutional neural network (CNN) blocks to perform image regularizations and nonuniform fast Fourier Transform (NUFFT) modules to incorporate B0 inhomogeneity information. The RebinNet was trained on a publicly available MR dataset from eleven healthy volunteers for both fully sampled and subsampled acquisitions. Grid phantom and human brain images acquired from an open-bore 1T MRI-Linac scanner were used to evaluate the performance of the proposed network. The RebinNet was compared with the conventional regularization algorithm and our recently developed UnUNet method in terms of root mean squared error (RMSE), structural similarity (SSIM), residual distortions, and computation time. Imaging results demonstrated that the RebinNet reconstructed images with lowest RMSE (<0.05) and highest SSIM (>0.92) at four-time acceleration for simulated brain images. The RebinNet could better preserve structural details and substantially improve the computational efficiency (ten-fold faster) compared to the conventional regularization methods, and had better generalization ability than the UnUNet method. The proposed RebinNet can achieve rapid image reconstruction and overcome the B0 inhomogeneity distortions simultaneously, which would facilitate accurate and fast image guidance in radiotherapy treatments. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.09226 [pdf, other]

Breast Cancer Image Classification Method Based on Deep Transfer Learning

Authors: Weimin Wang, Min Gao, Mingxuan Xiao, Xu Yan, Yufeng Li

Abstract: To address the issues of limited samples, time-consuming feature design, and low accuracy in detection and classification of breast cancer pathological images, a breast cancer image classification model algorithm combining deep learning and transfer learning is proposed. This algorithm is based on the DenseNet structure of deep neural networks, and constructs a network model by introducing attenti… ▽ More To address the issues of limited samples, time-consuming feature design, and low accuracy in detection and classification of breast cancer pathological images, a breast cancer image classification model algorithm combining deep learning and transfer learning is proposed. This algorithm is based on the DenseNet structure of deep neural networks, and constructs a network model by introducing attention mechanisms, and trains the enhanced dataset using multi-level transfer learning. Experimental results demonstrate that the algorithm achieves an efficiency of over 84.0\% in the test set, with a significantly improved classification accuracy compared to previous models, making it applicable to medical breast cancer detection tasks. △ Less

Submitted 14 April, 2024; originally announced April 2024.

arXiv:2404.08713 [pdf, other]

Survival Prediction Across Diverse Cancer Types Using Neural Networks

Authors: Xu Yan, Weimin Wang, MingXuan Xiao, Yufeng Li, Min Gao

Abstract: Gastric cancer and Colon adenocarcinoma represent widespread and challenging malignancies with high mortality rates and complex treatment landscapes. In response to the critical need for accurate prognosis in cancer patients, the medical community has embraced the 5-year survival rate as a vital metric for estimating patient outcomes. This study introduces a pioneering approach to enhance survival… ▽ More Gastric cancer and Colon adenocarcinoma represent widespread and challenging malignancies with high mortality rates and complex treatment landscapes. In response to the critical need for accurate prognosis in cancer patients, the medical community has embraced the 5-year survival rate as a vital metric for estimating patient outcomes. This study introduces a pioneering approach to enhance survival prediction models for gastric and Colon adenocarcinoma patients. Leveraging advanced image analysis techniques, we sliced whole slide images (WSI) of these cancers, extracting comprehensive features to capture nuanced tumor characteristics. Subsequently, we constructed patient-level graphs, encapsulating intricate spatial relationships within tumor tissues. These graphs served as inputs for a sophisticated 4-layer graph convolutional neural network (GCN), designed to exploit the inherent connectivity of the data for comprehensive analysis and prediction. By integrating patients' total survival time and survival status, we computed C-index values for gastric cancer and Colon adenocarcinoma, yielding 0.57 and 0.64, respectively. Significantly surpassing previous convolutional neural network models, these results underscore the efficacy of our approach in accurately predicting patient survival outcomes. This research holds profound implications for both the medical and AI communities, offering insights into cancer biology and progression while advancing personalized treatment strategies. Ultimately, our study represents a significant stride in leveraging AI-driven methodologies to revolutionize cancer prognosis and improve patient outcomes on a global scale. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.08279 [pdf, other]

Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example

Authors: MingXuan Xiao, Yufeng Li, Xu Yan, Min Gao, Weimin Wang

Abstract: Breast cancer is a relatively common cancer among gynecological cancers. Its diagnosis often relies on the pathology of cells in the lesion. The pathological diagnosis of breast cancer not only requires professionals and time, but also sometimes involves subjective judgment. To address the challenges of dependence on pathologists expertise and the time-consuming nature of achieving accurate breast… ▽ More Breast cancer is a relatively common cancer among gynecological cancers. Its diagnosis often relies on the pathology of cells in the lesion. The pathological diagnosis of breast cancer not only requires professionals and time, but also sometimes involves subjective judgment. To address the challenges of dependence on pathologists expertise and the time-consuming nature of achieving accurate breast pathological image classification, this paper introduces an approach utilizing convolutional neural networks (CNNs) for the rapid categorization of pathological images, aiming to enhance the efficiency of breast pathological image detection. And the approach enables the rapid and automatic classification of pathological images into benign and malignant groups. The methodology involves utilizing a convolutional neural network (CNN) model leveraging the Inceptionv3 architecture and transfer learning algorithm for extracting features from pathological images. Utilizing a neural network with fully connected layers and employing the SoftMax function for image classification. Additionally, the concept of image partitioning is introduced to handle high-resolution images. To achieve the ultimate classification outcome, the classification probabilities of each image block are aggregated using three algorithms: summation, product, and maximum. Experimental validation was conducted on the BreaKHis public dataset, resulting in accuracy rates surpassing 0.92 across all four magnification coefficients (40X, 100X, 200X, and 400X). It demonstrates that the proposed method effectively enhances the accuracy in classifying pathological images of breast cancer. △ Less

Submitted 12 April, 2024; originally announced April 2024.

arXiv:2404.07471 [pdf, other]

Structure-aware Fine-tuning for Code Pre-trained Models

Authors: Jiayi Wu, Renyu Zhu, Nuo Chen, Qiushi Sun, Xiang Li, Ming Gao

Abstract: Over the past few years, we have witnessed remarkable advancements in Code Pre-trained Models (CodePTMs). These models achieved excellent representation capabilities by designing structure-based pre-training tasks for code. However, how to enhance the absorption of structural knowledge when fine-tuning CodePTMs still remains a significant challenge. To fill this gap, in this paper, we present Stru… ▽ More Over the past few years, we have witnessed remarkable advancements in Code Pre-trained Models (CodePTMs). These models achieved excellent representation capabilities by designing structure-based pre-training tasks for code. However, how to enhance the absorption of structural knowledge when fine-tuning CodePTMs still remains a significant challenge. To fill this gap, in this paper, we present Structure-aware Fine-tuning (SAT), a novel structure-enhanced and plug-and-play fine-tuning method for CodePTMs. We first propose a structure loss to quantify the difference between the information learned by CodePTMs and the knowledge extracted from code structure. Specifically, we use the attention scores extracted from Transformer layer as the learned structural information, and the shortest path length between leaves in abstract syntax trees as the structural knowledge. Subsequently, multi-task learning is introduced to improve the performance of fine-tuning. Experiments conducted on four pre-trained models and two generation tasks demonstrate the effectiveness of our proposed method as a plug-and-play solution. Furthermore, we observed that SAT can benefit CodePTMs more with limited training data. △ Less

Submitted 11 April, 2024; originally announced April 2024.

Comments: Accepted by COLING 2024

arXiv:2404.06225 [pdf, other]

Message Passing Variational Autoregressive Network for Solving Intractable Ising Models

Authors: Qunlong Ma, Zhi Ma, Jinlong Xu, Hairui Zhang, Ming Gao

Abstract: Many deep neural networks have been used to solve Ising models, including autoregressive neural networks, convolutional neural networks, recurrent neural networks, and graph neural networks. Learning a probability distribution of energy configuration or finding the ground states of a disordered, fully connected Ising model is essential for statistical mechanics and NP-hard problems. Despite tremen… ▽ More Many deep neural networks have been used to solve Ising models, including autoregressive neural networks, convolutional neural networks, recurrent neural networks, and graph neural networks. Learning a probability distribution of energy configuration or finding the ground states of a disordered, fully connected Ising model is essential for statistical mechanics and NP-hard problems. Despite tremendous efforts, a neural network architecture with the ability to high-accurately solve these fully connected and extremely intractable problems on larger systems is still lacking. Here we propose a variational autoregressive architecture with a message passing mechanism, which can effectively utilize the interactions between spin variables. The new network trained under an annealing framework outperforms existing methods in solving several prototypical Ising spin Hamiltonians, especially for larger spin systems at low temperatures. The advantages also come from the great mitigation of mode collapse during the training process of deep neural networks. Considering these extremely difficult problems to be solved, our method extends the current computational limits of unsupervised neural networks to solve combinatorial optimization problems. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: 18 pages, 14 figures

arXiv:2404.03999 [pdf, other]

Finsler-Laplace-Beltrami Operators with Application to Shape Analysis

Authors: Simon Weber, Thomas Dagès, Maolin Gao, Daniel Cremers

Abstract: The Laplace-Beltrami operator (LBO) emerges from studying manifolds equipped with a Riemannian metric. It is often called the Swiss army knife of geometry processing as it allows to capture intrinsic shape information and gives rise to heat diffusion, geodesic distances, and a multitude of shape descriptors. It also plays a central role in geometric deep learning. In this work, we explore Finsler… ▽ More The Laplace-Beltrami operator (LBO) emerges from studying manifolds equipped with a Riemannian metric. It is often called the Swiss army knife of geometry processing as it allows to capture intrinsic shape information and gives rise to heat diffusion, geodesic distances, and a multitude of shape descriptors. It also plays a central role in geometric deep learning. In this work, we explore Finsler manifolds as a generalization of Riemannian manifolds. We revisit the Finsler heat equation and derive a Finsler heat kernel and a Finsler-Laplace-Beltrami Operator (FLBO): a novel theoretically justified anisotropic Laplace-Beltrami operator (ALBO). In experimental evaluations we demonstrate that the proposed FLBO is a valuable alternative to the traditional Riemannian-based LBO and ALBOs for spatial filtering and shape correspondence estimation. We hope that the proposed Finsler heat kernel and the FLBO will inspire further exploration of Finsler geometry in the computer vision community. △ Less

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.18196 [pdf, ps, other]

Looking Beyond What You See: An Empirical Analysis on Subgroup Intersectional Fairness for Multi-label Chest X-ray Classification Using Social Determinants of Racial Health Inequities

Authors: Dana Moukheiber, Saurabh Mahindre, Lama Moukheiber, Mira Moukheiber, Mingchen Gao

Abstract: There has been significant progress in implementing deep learning models in disease diagnosis using chest X- rays. Despite these advancements, inherent biases in these models can lead to disparities in prediction accuracy across protected groups. In this study, we propose a framework to achieve accurate diagnostic outcomes and ensure fairness across intersectional groups in high-dimensional chest… ▽ More There has been significant progress in implementing deep learning models in disease diagnosis using chest X- rays. Despite these advancements, inherent biases in these models can lead to disparities in prediction accuracy across protected groups. In this study, we propose a framework to achieve accurate diagnostic outcomes and ensure fairness across intersectional groups in high-dimensional chest X- ray multi-label classification. Transcending traditional protected attributes, we consider complex interactions within social determinants, enabling a more granular benchmark and evaluation of fairness. We present a simple and robust method that involves retraining the last classification layer of pre-trained models using a balanced dataset across groups. Additionally, we account for fairness constraints and integrate class-balanced fine-tuning for multi-label settings. The evaluation of our method on the MIMIC-CXR dataset demonstrates that our framework achieves an optimal tradeoff between accuracy and fairness compared to baseline methods. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: ICCV CVAMD 2023

arXiv:2403.17670 [pdf, other]

A family of Chatterjee's correlation coefficients and their properties

Authors: Muhong Gao, Qizhai Li

Abstract: Quantifying the strength of functional dependence between random scalars $X$ and $Y$ is an important statistical problem. While many existing correlation coefficients excel in identifying linear or monotone functional dependence, they fall short in capturing general non-monotone functional relationships. In response, we propose a family of correlation coefficients $ξ^{(h,F)}_n$, characterized by a… ▽ More Quantifying the strength of functional dependence between random scalars $X$ and $Y$ is an important statistical problem. While many existing correlation coefficients excel in identifying linear or monotone functional dependence, they fall short in capturing general non-monotone functional relationships. In response, we propose a family of correlation coefficients $ξ^{(h,F)}_n$, characterized by a continuous bivariate function $h$ and a cdf function $F$. By offering a range of selections for $h$ and $F$, $ξ^{(h,F)}_n$ encompasses a diverse class of novel correlation coefficients, while also incorporates the Chatterjee's correlation coefficient (Chatterjee, 2021) as a special case. We prove that $ξ^{(h,F)}_n$ converges almost surely to a deterministic limit $ξ^{(h,F)}$ as sample size $n$ approaches infinity. In addition, under appropriate conditions imposed on $h$ and $F$, the limit $ξ^{(h,F)}$ satisfies the three appealing properties: (P1). it belongs to the range of $[0,1]$; (P2). it equals 1 if and only if $Y$ is a measurable function of $X$; and (P3). it equals 0 if and only if $Y$ is independent of $X$. As amplified by our numerical experiments, our proposals provide practitioners with a variety of options to choose the most suitable correlation coefficient tailored to their specific practical needs. △ Less

Submitted 26 March, 2024; originally announced March 2024.

Comments: 27 pages, 4 figures

MSC Class: 62H20; 62G05

arXiv:2403.14638 [pdf, other]

Personalized Programming Guidance based on Deep Programming Learning Style Capturing

Authors: Yingfan Liu, Renyu Zhu, Ming Gao

Abstract: With the rapid development of big data and AI technology, programming is in high demand and has become an essential skill for students. Meanwhile, researchers also focus on boosting the online judging system's guidance ability to reduce students' dropout rates. Previous studies mainly targeted at enhancing learner engagement on online platforms by providing personalized recommendations. However, t… ▽ More With the rapid development of big data and AI technology, programming is in high demand and has become an essential skill for students. Meanwhile, researchers also focus on boosting the online judging system's guidance ability to reduce students' dropout rates. Previous studies mainly targeted at enhancing learner engagement on online platforms by providing personalized recommendations. However, two significant challenges still need to be addressed in programming: C1) how to recognize complex programming behaviors; C2) how to capture intrinsic learning patterns that align with the actual learning process. To fill these gaps, in this paper, we propose a novel model called Programming Exercise Recommender with Learning Style (PERS), which simulates learners' intricate programming behaviors. Specifically, since programming is an iterative and trial-and-error process, we first introduce a positional encoding and a differentiating module to capture the changes of consecutive code submissions (which addresses C1). To better profile programming behaviors, we extend the Felder-Silverman learning style model, a classical pedagogical theory, to perceive intrinsic programming patterns. Based on this, we align three latent vectors to record and update programming ability, processing style, and understanding style, respectively (which addresses C2). We perform extensive experiments on two real-world datasets to verify the rationality of modeling programming learning styles and the effectiveness of PERS for personalized programming guidance. △ Less

Submitted 20 February, 2024; originally announced March 2024.

Comments: 18th International Conference on Computer Science & Education

arXiv:2403.14400 [pdf, ps, other]

Absence of phonon-mediated superconductivity in La$_3$Ni$_2$O$_7$ under pressure

Authors: Zhenfeng Ouyang, Miao Gao, Zhong-Yi Lu

Abstract: A recent experimental study announced the emergence of superconductivity in La$_3$Ni$_2$O$_7$ under pressure, with the highest observed superconducting transition temperature ($T_c$) reaching approximately 80 K beyond 14 GPa. While extensive studies have been devoted to the electronic correlations and potential superconducting pairing mechanisms, there lack investigations into the phonon propertie… ▽ More A recent experimental study announced the emergence of superconductivity in La$_3$Ni$_2$O$_7$ under pressure, with the highest observed superconducting transition temperature ($T_c$) reaching approximately 80 K beyond 14 GPa. While extensive studies have been devoted to the electronic correlations and potential superconducting pairing mechanisms, there lack investigations into the phonon properties and electron phonon coupling. Using density functional theory in conjunction with Wannier interpolation techniques, we study the phonon properties and electron phonon interactions in La$_3$Ni$_2$O$_7$ under 29.5 GPa. Our findings reveal that the electron phonon coupling is insufficient to solely explain the observed high superconducting $T_c$ $\sim$ 80 K in La$_3$Ni$_2$O$_7$. And the calculated strong Fermi surface nesting may explain the experimental observed charge density wave transition in La$_3$Ni$_2$O$_7$. Our calculations substantiate La$_3$Ni$_2$O$_7$ is an unconventional superconductor. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: 5 pages, 4 figures

arXiv:2403.14023 [pdf]

A system capable of verifiably and privately screening global DNA synthesis

Authors: Carsten Baum, Jens Berlips, Walther Chen, Hongrui Cui, Ivan Damgard, Jiangbin Dong, Kevin M. Esvelt, Mingyu Gao, Dana Gretton, Leonard Foner, Martin Kysel, Kaiyi Zhang, Juanru Li, Xiang Li, Omer Paneth, Ronald L. Rivest, Francesca Sage-Ling, Adi Shamir, Yue Shen, Meicen Sun, Vinod Vaikuntanathan, Lynn Van Hauwe, Theia Vogel, Benjamin Weinstein-Raun, Yun Wang , et al. (5 additional authors not shown)

Abstract: Printing custom DNA sequences is essential to scientific and biomedical research, but the technology can be used to manufacture plagues as well as cures. Just as ink printers recognize and reject attempts to counterfeit money, DNA synthesizers and assemblers should deny unauthorized requests to make viral DNA that could be used to ignite a pandemic. There are three complications. First, we don't n… ▽ More Printing custom DNA sequences is essential to scientific and biomedical research, but the technology can be used to manufacture plagues as well as cures. Just as ink printers recognize and reject attempts to counterfeit money, DNA synthesizers and assemblers should deny unauthorized requests to make viral DNA that could be used to ignite a pandemic. There are three complications. First, we don't need to quickly update printers to deal with newly discovered currencies, whereas we regularly learn of new viruses and other biological threats. Second, anti-counterfeiting specifications on a local printer can't be extracted and misused by malicious actors, unlike information on biological threats. Finally, any screening must keep the inspected DNA sequences private, as they may constitute valuable trade secrets. Here we describe SecureDNA, a free, privacy-preserving, and fully automated system capable of verifiably screening all DNA synthesis orders of 30+ base pairs against an up-to-date database of hazards, and its operational performance and specificity when applied to 67 million base pairs of DNA synthesized by providers in the United States, Europe, and China. △ Less

Submitted 20 March, 2024; originally announced March 2024.

Comments: Main text 10 pages, 4 figures. 5 supplementary figures. Total 21 pages. Direct correspondence to: Ivan B. Damgard (ivan@cs.au.dk), Andrew C. Yao (andrewcyao@mail.tsinghua.edu.cn), Kevin M. Esvelt (esvelt@mit.edu)

Showing 1–50 of 519 results for author: Gao, M