subscribe to arXiv mailings

Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models

Authors: Zun Wang, Chang Liu, Nianlong Zou, He Zhang, Xinran Wei, Lin Huang, Lijun Wu, Bin Shao

Abstract: In this study, we introduce a unified neural network architecture, the Deep Equilibrium Density Functional Theory Hamiltonian (DEQH) model, which incorporates Deep Equilibrium Models (DEQs) for predicting Density Functional Theory (DFT) Hamiltonians. The DEQH model inherently captures the self-consistency nature of Hamiltonian, a critical aspect often overlooked by traditional machine learning app… ▽ More In this study, we introduce a unified neural network architecture, the Deep Equilibrium Density Functional Theory Hamiltonian (DEQH) model, which incorporates Deep Equilibrium Models (DEQs) for predicting Density Functional Theory (DFT) Hamiltonians. The DEQH model inherently captures the self-consistency nature of Hamiltonian, a critical aspect often overlooked by traditional machine learning approaches for Hamiltonian prediction. By employing DEQ within our model architecture, we circumvent the need for DFT calculations during the training phase to introduce the Hamiltonian's self-consistency, thus addressing computational bottlenecks associated with large or complex systems. We propose a versatile framework that combines DEQ with off-the-shelf machine learning models for predicting Hamiltonians. When benchmarked on the MD17 and QH9 datasets, DEQHNet, an instantiation of the DEQH framework, has demonstrated a significant improvement in prediction accuracy. Beyond a predictor, the DEQH model is a Hamiltonian solver, in the sense that it uses the fixed-point solving capability of the deep equilibrium model to iteratively solve for the Hamiltonian. Ablation studies of DEQHNet further elucidate the network's effectiveness, offering insights into the potential of DEQ-integrated networks for Hamiltonian learning. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2404.08674 [pdf, other]

Effects of Different Prompts on the Quality of GPT-4 Responses to Dementia Care Questions

Authors: Zhuochun Li, Bo Xie, Robin Hilsabeck, Alyssa Aguirre, Ning Zou, Zhimeng Luo, Daqing He

Abstract: Evidence suggests that different prompts lead large language models (LLMs) to generate responses with varying quality. Yet, little is known about prompts' effects on response quality in healthcare domains. In this exploratory study, we address this gap, focusing on a specific healthcare domain: dementia caregiving. We first developed an innovative prompt template with three components: (1) system… ▽ More Evidence suggests that different prompts lead large language models (LLMs) to generate responses with varying quality. Yet, little is known about prompts' effects on response quality in healthcare domains. In this exploratory study, we address this gap, focusing on a specific healthcare domain: dementia caregiving. We first developed an innovative prompt template with three components: (1) system prompts (SPs) featuring 4 different roles; (2) an initialization prompt; and (3) task prompts (TPs) specifying different levels of details, totaling 12 prompt combinations. Next, we selected 3 social media posts containing complicated, real-world questions about dementia caregivers' challenges in 3 areas: memory loss and confusion, aggression, and driving. We then entered these posts into GPT-4, with our 12 prompts, to generate 12 responses per post, totaling 36 responses. We compared the word count of the 36 responses to explore potential differences in response length. Two experienced dementia care clinicians on our team assessed the response quality using a rating scale with 5 quality indicators: factual, interpretation, application, synthesis, and comprehensiveness (scoring range: 0-5; higher scores indicate higher quality). △ Less

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2312.12369 [pdf, other]

Chasing Fairness in Graphs: A GNN Architecture Perspective

Authors: Zhimeng Jiang, Xiaotian Han, Chao Fan, Zirui Liu, Na Zou, Ali Mostafavi, Xia Hu

Abstract: There has been significant progress in improving the performance of graph neural networks (GNNs) through enhancements in graph data, model architecture design, and training strategies. For fairness in graphs, recent studies achieve fair representations and predictions through either graph data pre-processing (e.g., node feature masking, and topology rewiring) or fair training strategies (e.g., reg… ▽ More There has been significant progress in improving the performance of graph neural networks (GNNs) through enhancements in graph data, model architecture design, and training strategies. For fairness in graphs, recent studies achieve fair representations and predictions through either graph data pre-processing (e.g., node feature masking, and topology rewiring) or fair training strategies (e.g., regularization, adversarial debiasing, and fair contrastive learning). How to achieve fairness in graphs from the model architecture perspective is less explored. More importantly, GNNs exhibit worse fairness performance compared to multilayer perception since their model architecture (i.e., neighbor aggregation) amplifies biases. To this end, we aim to achieve fairness via a new GNN architecture. We propose \textsf{F}air \textsf{M}essage \textsf{P}assing (FMP) designed within a unified optimization framework for GNNs. Notably, FMP \textit{explicitly} renders sensitive attribute usage in \textit{forward propagation} for node classification task using cross-entropy loss without data pre-processing. In FMP, the aggregation is first adopted to utilize neighbors' information and then the bias mitigation step explicitly pushes demographic group node presentation centers together. In this way, FMP scheme can aggregate useful information from neighbors and mitigate bias to achieve better fairness and prediction tradeoff performance. Experiments on node classification tasks demonstrate that the proposed FMP outperforms several baselines in terms of fairness and accuracy on three real-world datasets. The code is available in {\url{https://github.com/zhimengj0326/FMP}}. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI Conference on Artificial Intelligence (AAAI) 2024. arXiv admin note: substantial text overlap with arXiv:2202.04187

arXiv:2310.14527 [pdf, other]

Marginal Nodes Matter: Towards Structure Fairness in Graphs

Authors: Xiaotian Han, Kaixiong Zhou, Ting-Hsiang Wang, Jundong Li, Fei Wang, Na Zou

Abstract: In social network, a person located at the periphery region (marginal node) is likely to be treated unfairly when compared with the persons at the center. While existing fairness works on graphs mainly focus on protecting sensitive attributes (e.g., age and gender), the fairness incurred by the graph structure should also be given attention. On the other hand, the information aggregation mechanism… ▽ More In social network, a person located at the periphery region (marginal node) is likely to be treated unfairly when compared with the persons at the center. While existing fairness works on graphs mainly focus on protecting sensitive attributes (e.g., age and gender), the fairness incurred by the graph structure should also be given attention. On the other hand, the information aggregation mechanism of graph neural networks amplifies such structure unfairness, as marginal nodes are often far away from other nodes. In this paper, we focus on novel fairness incurred by the graph structure on graph neural networks, named \emph{structure fairness}. Specifically, we first analyzed multiple graphs and observed that marginal nodes in graphs have a worse performance of downstream tasks than others in graph neural networks. Motivated by the observation, we propose \textbf{S}tructural \textbf{Fair} \textbf{G}raph \textbf{N}eural \textbf{N}etwork (SFairGNN), which combines neighborhood expansion based structure debiasing with hop-aware attentive information aggregation to achieve structure fairness. Our experiments show \SFairGNN can significantly improve structure fairness while maintaining overall performance in the downstream tasks. △ Less

Submitted 22 October, 2023; originally announced October 2023.

Comments: SIGKDD Explorations (To Appear)

arXiv:2310.08772 [pdf, other]

Investigating the Robustness and Properties of Detection Transformers (DETR) Toward Difficult Images

Authors: Zhao Ning Zou, Yuhang Zhang, Robert Wijaya

Abstract: Transformer-based object detectors (DETR) have shown significant performance across machine vision tasks, ultimately in object detection. This detector is based on a self-attention mechanism along with the transformer encoder-decoder architecture to capture the global context in the image. The critical issue to be addressed is how this model architecture can handle different image nuisances, such… ▽ More Transformer-based object detectors (DETR) have shown significant performance across machine vision tasks, ultimately in object detection. This detector is based on a self-attention mechanism along with the transformer encoder-decoder architecture to capture the global context in the image. The critical issue to be addressed is how this model architecture can handle different image nuisances, such as occlusion and adversarial perturbations. We studied this issue by measuring the performance of DETR with different experiments and benchmarking the network with convolutional neural network (CNN) based detectors like YOLO and Faster-RCNN. We found that DETR performs well when it comes to resistance to interference from information loss in occlusion images. Despite that, we found that the adversarial stickers put on the image require the network to produce a new unnecessary set of keys, queries, and values, which in most cases, results in a misdirection of the network. DETR also performed poorer than YOLOv5 in the image corruption benchmark. Furthermore, we found that DETR depends heavily on the main query when making a prediction, which leads to imbalanced contributions between queries since the main query receives most of the gradient flow. △ Less

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.01508 [pdf, other]

CODA: Temporal Domain Generalization via Concept Drift Simulator

Authors: Chia-Yuan Chang, Yu-Neng Chuang, Zhimeng Jiang, Kwei-Herng Lai, Anxiao Jiang, Na Zou

Abstract: In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized predicti… ▽ More In real-world applications, machine learning models often become obsolete due to shifts in the joint distribution arising from underlying temporal trends, a phenomenon known as the "concept drift". Existing works propose model-specific strategies to achieve temporal generalization in the near-future domain. However, the diverse characteristics of real-world datasets necessitate customized prediction model architectures. To this end, there is an urgent demand for a model-agnostic temporal domain generalization approach that maintains generality across diverse data modalities and architectures. In this work, we aim to address the concept drift problem from a data-centric perspective to bypass considering the interaction between data and model. Developing such a framework presents non-trivial challenges: (i) existing generative models struggle to generate out-of-distribution future data, and (ii) precisely capturing the temporal trends of joint distribution along chronological source domains is computationally infeasible. To tackle the challenges, we propose the COncept Drift simulAtor (CODA) framework incorporating a predicted feature correlation matrix to simulate future data for model training. Specifically, CODA leverages feature correlations to represent data characteristics at specific time points, thereby circumventing the daunting computational costs. Experimental results demonstrate that using CODA-generated data as training input effectively achieves temporal domain generalization across different model architectures. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.13292 [pdf, other]

Beyond Fairness: Age-Harmless Parkinson's Detection via Voice

Authors: Yicheng Wang, Xiaotian Han, Leisheng Yu, Na Zou

Abstract: Parkinson's disease (PD), a neurodegenerative disorder, often manifests as speech and voice dysfunction. While utilizing voice data for PD detection has great potential in clinical applications, the widely used deep learning models currently have fairness issues regarding different ages of onset. These deep models perform well for the elderly group (age $>$ 55) but are less accurate for the young… ▽ More Parkinson's disease (PD), a neurodegenerative disorder, often manifests as speech and voice dysfunction. While utilizing voice data for PD detection has great potential in clinical applications, the widely used deep learning models currently have fairness issues regarding different ages of onset. These deep models perform well for the elderly group (age $>$ 55) but are less accurate for the young group (age $\leq$ 55). Through our investigation, the discrepancy between the elderly and the young arises due to 1) an imbalanced dataset and 2) the milder symptoms often seen in early-onset patients. However, traditional debiasing methods are impractical as they typically impair the prediction accuracy for the majority group while minimizing the discrepancy. To address this issue, we present a new debiasing method using GradCAM-based feature masking combined with ensemble models, ensuring that neither fairness nor accuracy is compromised. Specifically, the GradCAM-based feature masking selectively obscures age-related features in the input voice data while preserving essential information for PD detection. The ensemble models further improve the prediction accuracy for the minority (young group). Our approach effectively improves detection accuracy for early-onset patients without sacrificing performance for the elderly group. Additionally, we propose a two-step detection strategy for the young group, offering a practical risk assessment for potential early-onset PD patients. △ Less

Submitted 23 September, 2023; originally announced September 2023.

arXiv:2307.07181 [pdf, other]

DISPEL: Domain Generalization via Domain-Specific Liberating

Authors: Chia-Yuan Chang, Yu-Neng Chuang, Guanchu Wang, Mengnan Du, Na Zou

Abstract: Domain generalization aims to learn a generalization model that can perform well on unseen test domains by only training on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categor… ▽ More Domain generalization aims to learn a generalization model that can perform well on unseen test domains by only training on limited source domains. However, existing domain generalization approaches often bring in prediction-irrelevant noise or require the collection of domain labels. To address these challenges, we consider the domain generalization problem from a different perspective by categorizing underlying feature groups into domain-shared and domain-specific features. Nevertheless, the domain-specific features are difficult to be identified and distinguished from the input data. In this work, we propose DomaIn-SPEcific Liberating (DISPEL), a post-processing fine-grained masking approach that can filter out undefined and indistinguishable domain-specific features in the embedding space. Specifically, DISPEL utilizes a mask generator that produces a unique mask for each input data to filter domain-specific features. The DISPEL framework is highly flexible to be applied to any fine-tuned models. We derive a generalization error bound to guarantee the generalization performance by optimizing a designed objective loss. The experimental results on five benchmarks demonstrate DISPEL outperforms existing methods and can further generalize various algorithms. △ Less

Submitted 31 July, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.04105 [pdf, other]

Towards Assumption-free Bias Mitigation

Authors: Chia-Yuan Chang, Yu-Neng Chuang, Kwei-Herng Lai, Xiaotian Han, Xia Hu, Na Zou

Abstract: Despite the impressive prediction ability, machine learning models show discrimination towards certain demographics and suffer from unfair prediction behaviors. To alleviate the discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes via multiple approaches. However, due to privacy concerns, sensitive attributes are often either unavailable or missin… ▽ More Despite the impressive prediction ability, machine learning models show discrimination towards certain demographics and suffer from unfair prediction behaviors. To alleviate the discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes via multiple approaches. However, due to privacy concerns, sensitive attributes are often either unavailable or missing in real-world scenarios. Therefore, several existing works alleviate the bias without sensitive attributes. Those studies face challenges, either in inaccurate predictions of sensitive attributes or the need to mitigate unequal distribution of manually defined non-sensitive attributes related to bias. The latter requires strong assumptions about the correlation between sensitive and non-sensitive attributes. As data distribution and task goals vary, the strong assumption on non-sensitive attributes may not be valid and require domain expertise. In this work, we propose an assumption-free framework to detect the related attributes automatically by modeling feature interaction for bias mitigation. The proposed framework aims to mitigate the unfair impact of identified biased feature interactions. Experimental results on four real-world datasets demonstrate that our proposed framework can significantly alleviate unfair prediction behaviors by considering biased feature interactions. △ Less

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2306.09468 [pdf, other]

FFB: A Fair Fairness Benchmark for In-Processing Group Fairness Methods

Authors: Xiaotian Han, Jianfeng Chi, Yu Chen, Qifan Wang, Han Zhao, Na Zou, Xia Hu

Abstract: This paper introduces the Fair Fairness Benchmark (\textsf{FFB}), a benchmarking framework for in-processing group fairness methods. Ensuring fairness in machine learning is important for ethical compliance. However, there exist challenges in comparing and developing fairness methods due to inconsistencies in experimental settings, lack of accessible algorithmic implementations, and limited extens… ▽ More This paper introduces the Fair Fairness Benchmark (\textsf{FFB}), a benchmarking framework for in-processing group fairness methods. Ensuring fairness in machine learning is important for ethical compliance. However, there exist challenges in comparing and developing fairness methods due to inconsistencies in experimental settings, lack of accessible algorithmic implementations, and limited extensibility of current fairness packages and tools. To address these issues, we introduce an open-source standardized benchmark for evaluating in-processing group fairness methods and provide a comprehensive analysis of state-of-the-art methods to ensure different notions of group fairness. This work offers the following key contributions: the provision of flexible, extensible, minimalistic, and research-oriented open-source code; the establishment of unified fairness method benchmarking pipelines; and extensive benchmarking, which yields key insights from $\mathbf{45,079}$ experiments, $\mathbf{14,428}$ GPU hours. We believe that our work will significantly facilitate the growth and development of the fairness research community. △ Less

Submitted 10 June, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

Comments: ICLR2024

arXiv:2306.06788 [pdf, other]

Graph Mixup with Soft Alignments

Authors: Hongyi Ling, Zhimeng Jiang, Meng Liu, Shuiwang Ji, Na Zou

Abstract: We study graph data augmentation by mixup, which has been used successfully on images. A key operation of mixup is to compute a convex combination of a pair of inputs. This operation is straightforward for grid-like data, such as images, but challenging for graph data. The key difficulty lies in the fact that different graphs typically have different numbers of nodes, and thus there lacks a node-l… ▽ More We study graph data augmentation by mixup, which has been used successfully on images. A key operation of mixup is to compute a convex combination of a pair of inputs. This operation is straightforward for grid-like data, such as images, but challenging for graph data. The key difficulty lies in the fact that different graphs typically have different numbers of nodes, and thus there lacks a node-level correspondence between graphs. In this work, we propose S-Mixup, a simple yet effective mixup method for graph classification by soft alignments. Specifically, given a pair of graphs, we explicitly obtain node-level correspondence via computing a soft assignment matrix to match the nodes between two graphs. Based on the soft assignments, we transform the adjacency and node feature matrices of one graph, so that the transformed graph is aligned with the other graph. In this way, any pair of graphs can be mixed directly to generate an augmented graph. We conduct systematic experiments to show that S-Mixup can improve the performance and generalization of graph neural networks (GNNs) on various graph classification tasks. In addition, we show that S-Mixup can increase the robustness of GNNs against noisy labels. △ Less

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2304.00012 [pdf, other]

Multi-Task Learning for Post-transplant Cause of Death Analysis: A Case Study on Liver Transplant

Authors: Sirui Ding, Qiaoyu Tan, Chia-yuan Chang, Na Zou, Kai Zhang, Nathan R. Hoot, Xiaoqian Jiang, Xia Hu

Abstract: Organ transplant is the essential treatment method for some end-stage diseases, such as liver failure. Analyzing the post-transplant cause of death (CoD) after organ transplant provides a powerful tool for clinical decision making, including personalized treatment and organ allocation. However, traditional methods like Model for End-stage Liver Disease (MELD) score and conventional machine learnin… ▽ More Organ transplant is the essential treatment method for some end-stage diseases, such as liver failure. Analyzing the post-transplant cause of death (CoD) after organ transplant provides a powerful tool for clinical decision making, including personalized treatment and organ allocation. However, traditional methods like Model for End-stage Liver Disease (MELD) score and conventional machine learning (ML) methods are limited in CoD analysis due to two major data and model-related challenges. To address this, we propose a novel framework called CoD-MTL leveraging multi-task learning to model the semantic relationships between various CoD prediction tasks jointly. Specifically, we develop a novel tree distillation strategy for multi-task learning, which combines the strength of both the tree model and multi-task learning. Experimental results are presented to show the precise and reliable CoD predictions of our framework. A case study is conducted to demonstrate the clinical importance of our method in the liver transplant. △ Less

Submitted 5 October, 2023; v1 submitted 29 March, 2023; originally announced April 2023.

Comments: AMIA Annual Symposium 2023 Best Student Paper Finalist

arXiv:2303.13790 [pdf, other]

Towards Fair Patient-Trial Matching via Patient-Criterion Level Fairness Constraint

Authors: Chia-Yuan Chang, Jiayi Yuan, Sirui Ding, Qiaoyu Tan, Kai Zhang, Xiaoqian Jiang, Xia Hu, Na Zou

Abstract: Clinical trials are indispensable in developing new treatments, but they face obstacles in patient recruitment and retention, hindering the enrollment of necessary participants. To tackle these challenges, deep learning frameworks have been created to match patients to trials. These frameworks calculate the similarity between patients and clinical trial eligibility criteria, considering the discre… ▽ More Clinical trials are indispensable in developing new treatments, but they face obstacles in patient recruitment and retention, hindering the enrollment of necessary participants. To tackle these challenges, deep learning frameworks have been created to match patients to trials. These frameworks calculate the similarity between patients and clinical trial eligibility criteria, considering the discrepancy between inclusion and exclusion criteria. Recent studies have shown that these frameworks outperform earlier approaches. However, deep learning models may raise fairness issues in patient-trial matching when certain sensitive groups of individuals are underrepresented in clinical trials, leading to incomplete or inaccurate data and potential harm. To tackle the issue of fairness, this work proposes a fair patient-trial matching framework by generating a patient-criterion level fairness constraint. The proposed framework considers the inconsistency between the embedding of inclusion and exclusion criteria among patients of different sensitive groups. The experimental results on real-world patient-trial and patient-criterion matching tasks demonstrate that the proposed framework can successfully alleviate the predictions that tend to be biased. △ Less

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.10794 [pdf, other]

PheME: A deep ensemble framework for improving phenotype prediction from multi-modal data

Authors: Shenghan Zhang, Haoxuan Li, Ruixiang Tang, Sirui Ding, Laila Rasmy, Degui Zhi, Na Zou, Xia Hu

Abstract: Detailed phenotype information is fundamental to accurate diagnosis and risk estimation of diseases. As a rich source of phenotype information, electronic health records (EHRs) promise to empower diagnostic variant interpretation. However, how to accurately and efficiently extract phenotypes from the heterogeneous EHR data remains a challenge. In this work, we present PheME, an Ensemble framework… ▽ More Detailed phenotype information is fundamental to accurate diagnosis and risk estimation of diseases. As a rich source of phenotype information, electronic health records (EHRs) promise to empower diagnostic variant interpretation. However, how to accurately and efficiently extract phenotypes from the heterogeneous EHR data remains a challenge. In this work, we present PheME, an Ensemble framework using Multi-modality data of structured EHRs and unstructured clinical notes for accurate Phenotype prediction. Firstly, we employ multiple deep neural networks to learn reliable representations from the sparse structured EHR data and redundant clinical notes. A multi-modal model then aligns multi-modal features onto the same latent space to predict phenotypes. Secondly, we leverage ensemble learning to combine outputs from single-modal models and multi-modal models to improve phenotype predictions. We choose seven diseases to evaluate the phenotyping performance of the proposed framework. Experimental results show that using multi-modal data significantly improves phenotype prediction in all diseases, the proposed ensemble learning framework can further boost the performance. △ Less

Submitted 26 April, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

arXiv:2303.03300 [pdf, other]

Chasing Fairness Under Distribution Shift: A Model Weight Perturbation Approach

Authors: Zhimeng Jiang, Xiaotian Han, Hongye Jin, Guanchu Wang, Rui Chen, Na Zou, Xia Hu

Abstract: Fairness in machine learning has attracted increasing attention in recent years. The fairness methods improving algorithmic fairness for in-distribution data may not perform well under distribution shifts. In this paper, we first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation. Subsequently, we analyze the sufficient co… ▽ More Fairness in machine learning has attracted increasing attention in recent years. The fairness methods improving algorithmic fairness for in-distribution data may not perform well under distribution shifts. In this paper, we first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation. Subsequently, we analyze the sufficient conditions to guarantee fairness (i.e., low demographic parity) for the target dataset, including fairness for the source dataset, and low prediction difference between the source and target datasets for each sensitive attribute group. Motivated by these sufficient conditions, we propose robust fairness regularization (RFR) by considering the worst case within the model weight perturbation ball for each sensitive attribute group. We evaluate the effectiveness of our proposed RFR algorithm on synthetic and real distribution shifts across various datasets. Experimental results demonstrate that RFR achieves better fairness-accuracy trade-off performance compared with several baselines. The source code is available at \url{https://github.com/zhimengj0326/RFR_NeurIPS23}. △ Less

Submitted 21 October, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: NeurIPS 2023

arXiv:2303.01506 [pdf, other]

Understanding and Unifying Fourteen Attribution Methods with Taylor Interactions

Authors: Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Ziwei Yang, Zheyang Li, Quanshi Zhang

Abstract: Various attribution methods have been developed to explain deep neural networks (DNNs) by inferring the attribution/importance/contribution score of each input variable to the final output. However, existing attribution methods are often built upon different heuristics. There remains a lack of a unified theoretical understanding of why these methods are effective and how they are related. To this… ▽ More Various attribution methods have been developed to explain deep neural networks (DNNs) by inferring the attribution/importance/contribution score of each input variable to the final output. However, existing attribution methods are often built upon different heuristics. There remains a lack of a unified theoretical understanding of why these methods are effective and how they are related. To this end, for the first time, we formulate core mechanisms of fourteen attribution methods, which were designed on different heuristics, into the same mathematical system, i.e., the system of Taylor interactions. Specifically, we prove that attribution scores estimated by fourteen attribution methods can all be reformulated as the weighted sum of two types of effects, i.e., independent effects of each individual input variable and interaction effects between input variables. The essential difference among the fourteen attribution methods mainly lies in the weights of allocating different effects. Based on the above findings, we propose three principles for a fair allocation of effects to evaluate the faithfulness of the fourteen attribution methods. △ Less

Submitted 5 March, 2023; v1 submitted 1 March, 2023; originally announced March 2023.

arXiv:2302.09400 [pdf, other]

Fairly Predicting Graft Failure in Liver Transplant for Organ Assigning

Authors: Sirui Ding, Ruixiang Tang, Daochen Zha, Na Zou, Kai Zhang, Xiaoqian Jiang, Xia Hu

Abstract: Liver transplant is an essential therapy performed for severe liver diseases. The fact of scarce liver resources makes the organ assigning crucial. Model for End-stage Liver Disease (MELD) score is a widely adopted criterion when making organ distribution decisions. However, it ignores post-transplant outcomes and organ/donor features. These limitations motivate the emergence of machine learning (… ▽ More Liver transplant is an essential therapy performed for severe liver diseases. The fact of scarce liver resources makes the organ assigning crucial. Model for End-stage Liver Disease (MELD) score is a widely adopted criterion when making organ distribution decisions. However, it ignores post-transplant outcomes and organ/donor features. These limitations motivate the emergence of machine learning (ML) models. Unfortunately, ML models could be unfair and trigger bias against certain groups of people. To tackle this problem, this work proposes a fair machine learning framework targeting graft failure prediction in liver transplant. Specifically, knowledge distillation is employed to handle dense and sparse features by combining the advantages of tree models and neural networks. A two-step debiasing method is tailored for this framework to enhance fairness. Experiments are conducted to analyze unfairness issues in existing models and demonstrate the superiority of our method in both prediction and fairness performance. △ Less

Submitted 18 February, 2023; originally announced February 2023.

Comments: AMIA Symposium 2022 Best Student Paper Finalist

arXiv:2301.13443 [pdf, other]

Retiring $Δ$DP: New Distribution-Level Metrics for Demographic Parity

Authors: Xiaotian Han, Zhimeng Jiang, Hongye Jin, Zirui Liu, Na Zou, Qifan Wang, Xia Hu

Abstract: Demographic parity is the most widely recognized measure of group fairness in machine learning, which ensures equal treatment of different demographic groups. Numerous works aim to achieve demographic parity by pursuing the commonly used metric $ΔDP$. Unfortunately, in this paper, we reveal that the fairness metric $ΔDP$ can not precisely measure the violation of demographic parity, because it inh… ▽ More Demographic parity is the most widely recognized measure of group fairness in machine learning, which ensures equal treatment of different demographic groups. Numerous works aim to achieve demographic parity by pursuing the commonly used metric $ΔDP$. Unfortunately, in this paper, we reveal that the fairness metric $ΔDP$ can not precisely measure the violation of demographic parity, because it inherently has the following drawbacks: i) zero-value $ΔDP$ does not guarantee zero violation of demographic parity, ii) $ΔDP$ values can vary with different classification thresholds. To this end, we propose two new fairness metrics, Area Between Probability density function Curves (ABPC) and Area Between Cumulative density function Curves (ABCC), to precisely measure the violation of demographic parity at the distribution level. The new fairness metrics directly measure the difference between the distributions of the prediction probability for different demographic groups. Thus our proposed new metrics enjoy: i) zero-value ABCC/ABPC guarantees zero violation of demographic parity; ii) ABCC/ABPC guarantees demographic parity while the classification thresholds are adjusted. We further re-evaluate the existing fair models with our proposed fairness metrics and observe different fairness behaviors of those models under the new metrics. The code is available at https://github.com/ahxt/new_metric_for_demographic_parity △ Less

Submitted 9 June, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: Accepted by TMLR. Code available at https://github.com/ahxt/new_metric_for_demographic_parity

arXiv:2301.01150 [pdf, other]

RELIANT: Fair Knowledge Distillation for Graph Neural Networks

Authors: Yushun Dong, Binchi Zhang, Yiling Yuan, Na Zou, Qi Wang, Jundong Li

Abstract: Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation… ▽ More Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs are with a large number of parameters, which makes these GNNs computationally expensive. Therefore, it is difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution to compress GNNs, where a light-weighted model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle such a problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborates that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility. △ Less

Submitted 4 January, 2023; v1 submitted 3 January, 2023; originally announced January 2023.

Comments: Published as a conference paper in SDM 2023

arXiv:2211.14489 [pdf, other]

Mitigating Relational Bias on Knowledge Graphs

Authors: Yu-Neng Chuang, Kwei-Herng Lai, Ruixiang Tang, Mengnan Du, Chia-Yuan Chang, Na Zou, Xia Hu

Abstract: Knowledge graph data are prevalent in real-world applications, and knowledge graph neural networks (KGNNs) are essential techniques for knowledge graph representation learning. Although KGNN effectively models the structural information from knowledge graphs, these frameworks amplify the underlying data bias that leads to discrimination towards certain groups or individuals in resulting applicatio… ▽ More Knowledge graph data are prevalent in real-world applications, and knowledge graph neural networks (KGNNs) are essential techniques for knowledge graph representation learning. Although KGNN effectively models the structural information from knowledge graphs, these frameworks amplify the underlying data bias that leads to discrimination towards certain groups or individuals in resulting applications. Additionally, as existing debiasing approaches mainly focus on the entity-wise bias, eliminating the multi-hop relational bias that pervasively exists in knowledge graphs remains an open question. However, it is very challenging to eliminate relational bias due to the sparsity of the paths that generate the bias and the non-linear proximity structure of knowledge graphs. To tackle the challenges, we propose Fair-KGNN, a KGNN framework that simultaneously alleviates multi-hop bias and preserves the proximity information of entity-to-relation in knowledge graphs. The proposed framework is generalizable to mitigate the relational bias for all types of KGNN. We develop two instances of Fair-KGNN incorporating with two state-of-the-art KGNN models, RGCN and CompGCN, to mitigate gender-occupation and nationality-salary bias. The experiments carried out on three benchmark knowledge graph datasets demonstrate that the Fair-KGNN can effectively mitigate unfair situations during representation learning while preserving the predictive performance of KGNN models. △ Less

Submitted 3 December, 2022; v1 submitted 26 November, 2022; originally announced November 2022.

arXiv:2208.12433 [pdf, other]

Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning

Authors: Daochen Zha, Kwei-Herng Lai, Qiaoyu Tan, Sirui Ding, Na Zou, Xia Hu

Abstract: Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class. While numerous over-sampling algorithms have been proposed, they heavily rely on heuristics, which could be sub-optimal since we ma… ▽ More Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class. Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class. While numerous over-sampling algorithms have been proposed, they heavily rely on heuristics, which could be sub-optimal since we may need different sampling strategies for different datasets and base classifiers, and they cannot directly optimize the performance metric. Motivated by this, we investigate developing a learning-based over-sampling algorithm to optimize the classification performance, which is a challenging task because of the huge and hierarchical decision space. At the high level, we need to decide how many synthetic samples to generate. At the low level, we need to determine where the synthetic samples should be located, which depends on the high-level decision since the optimal locations of the samples may differ for different numbers of samples. To address the challenges, we propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions. Motivated by the success of SMOTE~\cite{chawla2002smote} and its extensions, we formulate the generation process as a Markov decision process (MDP) consisting of three levels of policies to generate synthetic samples within the SMOTE search space. Then we leverage deep hierarchical reinforcement learning to optimize the performance metric on the validation data. Extensive experiments on six real-world datasets demonstrate that AutoSMOTE significantly outperforms the state-of-the-art resampling algorithms. The code is at https://github.com/daochenzha/autosmote △ Less

Submitted 26 August, 2022; originally announced August 2022.

Comments: Accepted by CIKM 2022

arXiv:2208.11857 [pdf, other]

Shortcut Learning of Large Language Models in Natural Language Understanding

Authors: Mengnan Du, Fengxiang He, Na Zou, Dacheng Tao, Xia Hu

Abstract: Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction. This has significantly affected their generalizability and adversarial robustness. In this paper, we provide a review of recent developments that address the shortcut learning and robus… ▽ More Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks. However, these LLMs might rely on dataset bias and artifacts as shortcuts for prediction. This has significantly affected their generalizability and adversarial robustness. In this paper, we provide a review of recent developments that address the shortcut learning and robustness challenge of LLMs. We first introduce the concepts of shortcut learning of language models. We then introduce methods to identify shortcut learning behavior in language models, characterize the reasons for shortcut learning, as well as introduce mitigation solutions. Finally, we discuss key research challenges and potential research directions in order to advance the field of LLMs. △ Less

Submitted 7 May, 2023; v1 submitted 24 August, 2022; originally announced August 2022.

Comments: Accepted by Communications of the ACM (CACM), Review Article

arXiv:2208.04187 [pdf, other]

DIVISION: Memory Efficient Training via Dual Activation Precision

Authors: Guanchu Wang, Zirui Liu, Zhimeng Jiang, Ninghao Liu, Na Zou, Xia Hu

Abstract: Activation compressed training provides a solution towards reducing the memory cost of training deep neural networks~(DNNs). However, state-of-the-art work combines a search of quantization bit-width with the training, which makes the procedure complicated and less transparent. To this end, we propose a simple and effective method to compress DNN training. Our method is motivated by an instructive… ▽ More Activation compressed training provides a solution towards reducing the memory cost of training deep neural networks~(DNNs). However, state-of-the-art work combines a search of quantization bit-width with the training, which makes the procedure complicated and less transparent. To this end, we propose a simple and effective method to compress DNN training. Our method is motivated by an instructive observation: DNN backward propagation mainly utilizes the low-frequency component (LFC) of the activation maps, while the majority of memory is for caching the high-frequency component (HFC) during the training. This indicates the HFC of activation maps is highly redundant and compressible during DNN training, which inspires our proposed Dual Activation Precision (DIVISION). During the training, DIVISION preserves the high-precision copy of LFC and compresses the HFC into a light-weight copy with low numerical precision. This can significantly reduce the memory cost without negatively affecting the precision of backward propagation such that DIVISION maintains competitive model accuracy. Experiment results show DIVISION has better comprehensive performance than state-of-the-art methods, including over 10x compression of activation maps and competitive training throughput, without loss of model accuracy. △ Less

Submitted 22 May, 2023; v1 submitted 4 August, 2022; originally announced August 2022.

arXiv:2207.10018 [pdf, other]

Mitigating Algorithmic Bias with Limited Annotations

Authors: Guanchu Wang, Mengnan Du, Ninghao Liu, Na Zou, Xia Hu

Abstract: Existing work on fairness modeling commonly assumes that sensitive attributes for all instances are fully available, which may not be true in many real-world applications due to the high cost of acquiring sensitive information. When sensitive attributes are not disclosed or available, it is needed to manually annotate a small part of the training data to mitigate bias. However, the skewed distribu… ▽ More Existing work on fairness modeling commonly assumes that sensitive attributes for all instances are fully available, which may not be true in many real-world applications due to the high cost of acquiring sensitive information. When sensitive attributes are not disclosed or available, it is needed to manually annotate a small part of the training data to mitigate bias. However, the skewed distribution across different sensitive groups preserves the skewness of the original dataset in the annotated subset, which leads to non-optimal bias mitigation. To tackle this challenge, we propose Active Penalization Of Discrimination (APOD), an interactive framework to guide the limited annotations towards maximally eliminating the effect of algorithmic bias. The proposed APOD integrates discrimination penalization with active instance selection to efficiently utilize the limited annotation budget, and it is theoretically proved to be capable of bounding the algorithmic bias. According to the evaluation on five benchmark datasets, APOD outperforms the state-of-the-arts baseline methods under the limited annotation budget, and shows comparable performance to fully annotated bias mitigation, which demonstrates that APOD could benefit real-world applications when sensitive information is limited. △ Less

Submitted 13 March, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

arXiv:2206.14397 [pdf, other]

Fair Machine Learning in Healthcare: A Review

Authors: Qizhang Feng, Mengnan Du, Na Zou, Xia Hu

Abstract: The digitization of healthcare data coupled with advances in computational capabilities has propelled the adoption of machine learning (ML) in healthcare. However, these methods can perpetuate or even exacerbate existing disparities, leading to fairness concerns such as the unequal distribution of resources and diagnostic inaccuracies among different demographic groups. Addressing these fairness p… ▽ More The digitization of healthcare data coupled with advances in computational capabilities has propelled the adoption of machine learning (ML) in healthcare. However, these methods can perpetuate or even exacerbate existing disparities, leading to fairness concerns such as the unequal distribution of resources and diagnostic inaccuracies among different demographic groups. Addressing these fairness problem is paramount to prevent further entrenchment of social injustices. In this survey, we analyze the intersection of fairness in machine learning and healthcare disparities. We adopt a framework based on the principles of distributive justice to categorize fairness concerns into two distinct classes: equal allocation and equal performance. We provide a critical review of the associated fairness metrics from a machine learning standpoint and examine biases and mitigation strategies across the stages of the ML lifecycle, discussing the relationship between biases and their countermeasures. The paper concludes with a discussion on the pressing challenges that remain unaddressed in ensuring fairness in healthcare ML, and proposes several new research directions that hold promise for developing ethical and equitable ML applications in healthcare. △ Less

Submitted 1 February, 2024; v1 submitted 29 June, 2022; originally announced June 2022.

arXiv:2202.04187 [pdf, other]

FMP: Toward Fair Graph Message Passing against Topology Bias

Authors: Zhimeng Jiang, Xiaotian Han, Chao Fan, Zirui Liu, Na Zou, Ali Mostafavi, Xia Hu

Abstract: Despite recent advances in achieving fair representations and predictions through regularization, adversarial debiasing, and contrastive learning in graph neural networks (GNNs), the working mechanism (i.e., message passing) behind GNNs inducing unfairness issue remains unknown. In this work, we theoretically and experimentally demonstrate that representative aggregation in message-passing schemes… ▽ More Despite recent advances in achieving fair representations and predictions through regularization, adversarial debiasing, and contrastive learning in graph neural networks (GNNs), the working mechanism (i.e., message passing) behind GNNs inducing unfairness issue remains unknown. In this work, we theoretically and experimentally demonstrate that representative aggregation in message-passing schemes accumulates bias in node representation due to topology bias induced by graph topology. Thus, a \textsf{F}air \textsf{M}essage \textsf{P}assing (FMP) scheme is proposed to aggregate useful information from neighbors but minimize the effect of topology bias in a unified framework considering graph smoothness and fairness objectives. The proposed FMP is effective, transparent, and compatible with back-propagation training. An acceleration approach on gradient calculation is also adopted to improve algorithm efficiency. Experiments on node classification tasks demonstrate that the proposed FMP outperforms the state-of-the-art baselines in effectively and efficiently mitigating bias on three real-world datasets. △ Less

Submitted 8 February, 2022; originally announced February 2022.

arXiv:2112.08767 [pdf, other]

Adaptation and Attention for Neural Video Coding

Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Ramin G. Youvalari, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

Abstract: Neural image coding represents now the state-of-the-art image compression approach. However, a lot of work is still to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired w… ▽ More Neural image coding represents now the state-of-the-art image compression approach. However, a lot of work is still to be done in the video domain. In this work, we propose an end-to-end learned video codec that introduces several architectural novelties as well as training novelties, revolving around the concepts of adaptation and attention. Our codec is organized as an intra-frame codec paired with an inter-frame codec. As one architectural novelty, we propose to train the inter-frame codec model to adapt the motion estimation process based on the resolution of the input video. A second architectural novelty is a new neural block that combines concepts from split-attention based neural networks and from DenseNets. Finally, we propose to overfit a set of decoder-side multiplicative parameters at inference time. Through ablation studies and comparisons to prior art, we show the benefits of our proposed techniques in terms of coding gains. We compare our codec to VVC/H.266 and RLVC, which represent the state-of-the-art traditional and end-to-end learned codecs, respectively, and to the top performing end-to-end learned approach in 2021 CLIC competition, E2E_T_OL. Our codec clearly outperforms E2E_T_OL, and compare favorably to VVC and RLVC in some settings. △ Less

Submitted 16 December, 2021; originally announced December 2021.

arXiv:2111.04303 [pdf, other]

Defense Against Explanation Manipulation

Authors: Ruixiang Tang, Ninghao Liu, Fan Yang, Na Zou, Xia Hu

Abstract: Explainable machine learning attracts increasing attention as it improves transparency of models, which is helpful for machine learning to be trusted in real applications. However, explanation methods have recently been demonstrated to be vulnerable to manipulation, where we can easily change a model's explanation while keeping its prediction constant. To tackle this problem, some efforts have bee… ▽ More Explainable machine learning attracts increasing attention as it improves transparency of models, which is helpful for machine learning to be trusted in real applications. However, explanation methods have recently been demonstrated to be vulnerable to manipulation, where we can easily change a model's explanation while keeping its prediction constant. To tackle this problem, some efforts have been paid to use more stable explanation methods or to change model configurations. In this work, we tackle the problem from the training perspective, and propose a new training scheme called Adversarial Training on EXplanations (ATEX) to improve the internal explanation stability of a model regardless of the specific explanation method being applied. Instead of directly specifying explanation values over data instances, ATEX only puts requirement on model predictions which avoids involving second-order derivatives in optimization. As a further discussion, we also find that explanation stability is closely related to another property of the model, i.e., the risk of being exposed to adversarial attack. Through experiments, besides showing that ATEX improves model robustness against manipulation targeting explanation, it also brings additional benefits including smoothing explanations and improving the efficacy of adversarial training if applied to the model. △ Less

Submitted 8 November, 2021; originally announced November 2021.

arXiv:2111.03015 [pdf, other]

Modeling Techniques for Machine Learning Fairness: A Survey

Authors: Mingyang Wan, Daochen Zha, Ninghao Liu, Na Zou

Abstract: Machine learning models are becoming pervasive in high-stakes applications. Despite their clear benefits in terms of performance, the models could show discrimination against minority groups and result in fairness issues in a decision-making process, leading to severe negative impacts on the individuals and the society. In recent years, various techniques have been developed to mitigate the unfair… ▽ More Machine learning models are becoming pervasive in high-stakes applications. Despite their clear benefits in terms of performance, the models could show discrimination against minority groups and result in fairness issues in a decision-making process, leading to severe negative impacts on the individuals and the society. In recent years, various techniques have been developed to mitigate the unfairness for machine learning models. Among them, in-processing methods have drawn increasing attention from the community, where fairness is directly taken into consideration during model design to induce intrinsically fair models and fundamentally mitigate fairness issues in outputs and representations. In this survey, we review the current progress of in-processing fairness mitigation techniques. Based on where the fairness is achieved in the model, we categorize them into explicit and implicit methods, where the former directly incorporates fairness metrics in training objectives, and the latter focuses on refining latent representation learning. Finally, we conclude the survey with a discussion of the research challenges in this community to motivate future exploration. △ Less

Submitted 9 April, 2022; v1 submitted 4 November, 2021; originally announced November 2021.

Comments: 26 pages, 4 figures

arXiv:2108.10551 [pdf, ps, other]

Lossless Image Compression Using a Multi-Scale Progressive Statistical Model

Authors: Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Nannan Zou, Emre Aksu, Miska M. Hannuksela

Abstract: Lossless image compression is an important technique for image storage and transmission when information loss is not allowed. With the fast development of deep learning techniques, deep neural networks have been used in this field to achieve a higher compression rate. Methods based on pixel-wise autoregressive statistical models have shown good performance. However, the sequential processing way p… ▽ More Lossless image compression is an important technique for image storage and transmission when information loss is not allowed. With the fast development of deep learning techniques, deep neural networks have been used in this field to achieve a higher compression rate. Methods based on pixel-wise autoregressive statistical models have shown good performance. However, the sequential processing way prevents these methods to be used in practice. Recently, multi-scale autoregressive models have been proposed to address this limitation. Multi-scale approaches can use parallel computing systems efficiently and build practical systems. Nevertheless, these approaches sacrifice compression performance in exchange for speed. In this paper, we propose a multi-scale progressive statistical model that takes advantage of the pixel-wise approach and the multi-scale approach. We developed a flexible mechanism where the processing order of the pixels can be adjusted easily. Our proposed method outperforms the state-of-the-art lossless image compression methods on two large benchmark datasets by a significant margin without degrading the inference speed dramatically. △ Less

Submitted 24 August, 2021; originally announced August 2021.

Comments: Accepted ACCV 2020

arXiv:2108.04212 [pdf, other]

AutoVideo: An Automated Video Action Recognition System

Authors: Daochen Zha, Zaid Pervaiz Bhat, Yi-Wei Chen, Yicheng Wang, Sirui Ding, Jiaben Chen, Kwei-Herng Lai, Mohammad Qazim Bhat, Anmoll Kumar Jain, Alfredo Costilla Reyes, Na Zou, Xia Hu

Abstract: Action recognition is an important task for video understanding with broad applications. However, developing an effective action recognition solution often requires extensive engineering efforts in building and testing different combinations of the modules and their hyperparameters. In this demo, we present AutoVideo, a Python system for automated video action recognition. AutoVideo is featured fo… ▽ More Action recognition is an important task for video understanding with broad applications. However, developing an effective action recognition solution often requires extensive engineering efforts in building and testing different combinations of the modules and their hyperparameters. In this demo, we present AutoVideo, a Python system for automated video action recognition. AutoVideo is featured for 1) highly modular and extendable infrastructure following the standard pipeline language, 2) an exhaustive list of primitives for pipeline construction, 3) data-driven tuners to save the efforts of pipeline tuning, and 4) easy-to-use Graphical User Interface (GUI). AutoVideo is released under MIT license at https://github.com/datamllab/autovideo △ Less

Submitted 16 July, 2022; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: Accepted by IJCAI https://github.com/datamllab/autovideo

arXiv:2105.13841

A General Taylor Framework for Unifying and Revisiting Attribution Methods

Authors: Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Xia Hu

Abstract: Attribution methods provide an insight into the decision-making process of machine learning models, especially deep neural networks, by assigning contribution scores to each individual feature. However, the attribution problem has not been well-defined, which lacks a unified guideline to the contribution assignment process. Furthermore, existing attribution methods often built upon various empiric… ▽ More Attribution methods provide an insight into the decision-making process of machine learning models, especially deep neural networks, by assigning contribution scores to each individual feature. However, the attribution problem has not been well-defined, which lacks a unified guideline to the contribution assignment process. Furthermore, existing attribution methods often built upon various empirical intuitions and heuristics. There still lacks a general theoretical framework that not only can offer a good description of the attribution problem, but also can be applied to unifying and revisiting existing attribution methods. To bridge the gap, in this paper, we propose a Taylor attribution framework, which models the attribution problem as how to decide individual payoffs in a coalition. Then, we reformulate fourteen mainstream attribution methods into the Taylor framework and analyze these attribution methods in terms of rationale, fidelity, and limitation in the framework. Moreover, we establish three principles for a good attribution in the Taylor attribution framework, i.e., low approximation error, correct Taylor contribution assignment, and unbiased baseline selection. Finally, we empirically validate the Taylor reformulations and reveal a positive correlation between the attribution performance and the number of principles followed by the attribution method via benchmarking on real-world datasets. △ Less

Submitted 25 February, 2023; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: In the current version, the author information is not complete and there are some mathematical errors in the proof. We need to correct errors and add all co-authors who contribute to the paper. Therefore, we hope to withdraw the manuscript

arXiv:2104.06629 [pdf, other]

Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution

Authors: Huiqi Deng, Na Zou, Weifu Chen, Guocan Feng, Mengnan Du, Xia Hu

Abstract: Back propagation based visualizations have been proposed to interpret deep neural networks (DNNs), some of which produce interpretations with good visual quality. However, there exist doubts about whether these intuitive visualizations are related to the network decisions. Recent studies have confirmed this suspicion by verifying that almost all these modified back-propagation visualizations are n… ▽ More Back propagation based visualizations have been proposed to interpret deep neural networks (DNNs), some of which produce interpretations with good visual quality. However, there exist doubts about whether these intuitive visualizations are related to the network decisions. Recent studies have confirmed this suspicion by verifying that almost all these modified back-propagation visualizations are not faithful to the model's decision-making process. Besides, these visualizations produce vague "relative importance scores", among which low values can't guarantee to be independent of the final prediction. Hence, it's highly desirable to develop a novel back-propagation framework that guarantees theoretical faithfulness and produces a quantitative attribution score with a clear understanding. To achieve the goal, we resort to mutual information theory to generate the interpretations, studying how much information of output is encoded in each input neuron. The basic idea is to learn a source signal by back-propagation such that the mutual information between input and output should be as much as possible preserved in the mutual information between input and the source signal. In addition, we propose a Mutual Information Preserving Inverse Network, termed MIP-IN, in which the parameters of each layer are recursively trained to learn how to invert. During the inversion, forward Relu operation is adopted to adapt the general interpretations to the specific input. We then empirically demonstrate that the inverted source signal satisfies completeness and minimality property, which are crucial for a faithful interpretation. Furthermore, the empirical study validates the effectiveness of interpretations generated by MIP-IN. △ Less

Submitted 14 April, 2021; originally announced April 2021.

arXiv:2008.09695 [pdf, other]

A Unified Taylor Framework for Revisiting Attribution Methods

Authors: Huiqi Deng, Na Zou, Mengnan Du, Weifu Chen, Guocan Feng, Xia Hu

Abstract: Attribution methods have been developed to understand the decision-making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods often built upon empirical intuitions and heuristics. There still lacks a general and theoretical framework that not only can unify these attribution methods, but also theor… ▽ More Attribution methods have been developed to understand the decision-making process of machine learning models, especially deep neural networks, by assigning importance scores to individual features. Existing attribution methods often built upon empirical intuitions and heuristics. There still lacks a general and theoretical framework that not only can unify these attribution methods, but also theoretically reveal their rationales, fidelity, and limitations. To bridge the gap, in this paper, we propose a Taylor attribution framework and reformulate seven mainstream attribution methods into the framework. Based on reformulations, we analyze the attribution methods in terms of rationale, fidelity, and limitation. Moreover, We establish three principles for a good attribution in the Taylor attribution framework, i.e., low approximation error, correct contribution assignment, and unbiased baseline selection. Finally, we empirically validate the Taylor reformulations and reveal a positive correlation between the attribution performance and the number of principles followed by the attribution method via benchmarking on real-world datasets. △ Less

Submitted 13 April, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

arXiv:2007.16054 [pdf, other]

Learning to Learn to Compress

Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Jani Lainema, Miska Hannuksela, Emre Aksu, Esa Rahtu

Abstract: In this paper we present an end-to-end meta-learned system for image compression. Traditional machine learning based approaches to image compression train one or more neural network for generalization performance. However, at inference time, the encoder or the latent tensor output by the encoder can be optimized for each test image. This optimization can be regarded as a form of adaptation or bene… ▽ More In this paper we present an end-to-end meta-learned system for image compression. Traditional machine learning based approaches to image compression train one or more neural network for generalization performance. However, at inference time, the encoder or the latent tensor output by the encoder can be optimized for each test image. This optimization can be regarded as a form of adaptation or benevolent overfitting to the input content. In order to reduce the gap between training and inference conditions, we propose a new training paradigm for learned image compression, which is based on meta-learning. In a first phase, the neural networks are trained normally. In a second phase, the Model-Agnostic Meta-learning approach is adapted to the specific case of image compression, where the inner-loop performs latent tensor overfitting, and the outer loop updates both encoder and decoder neural networks based on the overfitting performance. Furthermore, after meta-learning, we propose to overfit and cluster the bias terms of the decoder on training image patches, so that at inference time the optimal content-specific bias terms can be selected at encoder-side. Finally, we propose a new probability model for lossless compression, which combines concepts from both multi-scale and super-resolution probability model approaches. We show the benefits of all our proposed ideas via carefully designed experiments. △ Less

Submitted 1 May, 2021; v1 submitted 31 July, 2020; originally announced July 2020.

arXiv:2006.08315 [pdf, other]

Mitigating Gender Bias in Captioning Systems

Authors: Ruixiang Tang, Mengnan Du, Yuening Li, Zirui Liu, Na Zou, Xia Hu

Abstract: Image captioning has made substantial progress with huge supporting image collections sourced from the web. However, recent studies have pointed out that captioning datasets, such as COCO, contain gender bias found in web corpora. As a result, learning models could heavily rely on the learned priors and image context for gender identification, leading to incorrect or even offensive errors. To enco… ▽ More Image captioning has made substantial progress with huge supporting image collections sourced from the web. However, recent studies have pointed out that captioning datasets, such as COCO, contain gender bias found in web corpora. As a result, learning models could heavily rely on the learned priors and image context for gender identification, leading to incorrect or even offensive errors. To encourage models to learn correct gender features, we reorganize the COCO dataset and present two new splits COCO-GB V1 and V2 datasets where the train and test sets have different gender-context joint distribution. Models relying on contextual cues will suffer from huge gender prediction errors on the anti-stereotypical test data. Benchmarking experiments reveal that most captioning models learn gender bias, leading to high gender prediction errors, especially for women. To alleviate the unwanted bias, we propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence. Experimental results validate that GAIC can significantly reduce gender prediction errors with a competitive caption quality. Our codes and the designed benchmark datasets are available at https://github.com/datamllab/Mitigating_Gender_Bias_In_Captioning_System. △ Less

Submitted 20 April, 2021; v1 submitted 15 June, 2020; originally announced June 2020.

arXiv:2004.09226 [pdf, other]

End-to-End Learning for Video Frame Compression with Self-Attention

Authors: Nannan Zou, Honglei Zhang, Francesco Cricri, Hamed R. Tavakoli, Jani Lainema, Emre Aksu, Miska Hannuksela, Esa Rahtu

Abstract: One of the core components of conventional (i.e., non-learned) video codecs consists of predicting a frame from a previously-decoded frame, by leveraging temporal correlations. In this paper, we propose an end-to-end learned system for compressing video frames. Instead of relying on pixel-space motion (as with optical flow), our system learns deep embeddings of frames and encodes their difference… ▽ More One of the core components of conventional (i.e., non-learned) video codecs consists of predicting a frame from a previously-decoded frame, by leveraging temporal correlations. In this paper, we propose an end-to-end learned system for compressing video frames. Instead of relying on pixel-space motion (as with optical flow), our system learns deep embeddings of frames and encodes their difference in latent space. At decoder-side, an attention mechanism is designed to attend to the latent space of frames to decide how different parts of the previous and current frame are combined to form the final predicted current frame. Spatially-varying channel allocation is achieved by using importance masks acting on the feature-channels. The model is trained to reduce the bitrate by minimizing a loss on importance maps and a loss on the probability output by a context model for arithmetic coding. In our experiments, we show that the proposed system achieves high compression rates and high objective visual quality as measured by MS-SSIM and PSNR. Furthermore, we provide ablation studies where we highlight the contribution of different components. △ Less

Submitted 20 April, 2020; originally announced April 2020.

arXiv:2003.05602 [pdf, other]

PyODDS: An End-to-end Outlier Detection System with Automated Machine Learning

Authors: Yuening Li, Daochen Zha, Praveen Kumar Venugopal, Na Zou, Xia Hu

Abstract: Outlier detection is an important task for various data mining applications. Current outlier detection techniques are often manually designed for specific domains, requiring large human efforts of database setup, algorithm selection, and hyper-parameter tuning. To fill this gap, we present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support, which automaticall… ▽ More Outlier detection is an important task for various data mining applications. Current outlier detection techniques are often manually designed for specific domains, requiring large human efforts of database setup, algorithm selection, and hyper-parameter tuning. To fill this gap, we present PyODDS, an automated end-to-end Python system for Outlier Detection with Database Support, which automatically optimizes an outlier detection pipeline for a new data source at hand. Specifically, we define the search space in the outlier detection pipeline, and produce a search strategy within the given search space. PyODDS enables end-to-end executions based on an Apache Spark backend server and a light-weight database. It also provides unified interfaces and visualizations for users with or without data science or machine learning background. In particular, we demonstrate PyODDS on several real-world datasets, with quantification analysis and visualization results. △ Less

Submitted 11 March, 2020; originally announced March 2020.

Comments: In Companion Proceedings of the Web Conference 2020 (WWW 20)

arXiv:1912.08306 [pdf, other]

Multi-Channel Graph Convolutional Networks

Authors: Kaixiong Zhou, Qingquan Song, Xiao Huang, Daochen Zha, Na Zou, Xia Hu

Abstract: Graph neural networks (GNN) has been demonstrated to be effective in classifying graph structures. To further improve the graph representation learning ability, hierarchical GNN has been explored. It leverages the differentiable pooling to cluster nodes into fixed groups, and generates a coarse-grained structure accompanied with the shrinking of the original graph. However, such clustering would d… ▽ More Graph neural networks (GNN) has been demonstrated to be effective in classifying graph structures. To further improve the graph representation learning ability, hierarchical GNN has been explored. It leverages the differentiable pooling to cluster nodes into fixed groups, and generates a coarse-grained structure accompanied with the shrinking of the original graph. However, such clustering would discard some graph information and achieve the suboptimal results. It is because the node inherently has different characteristics or roles, and two non-isomorphic graphs may have the same coarse-grained structure that cannot be distinguished after pooling. To compensate the loss caused by coarse-grained clustering and further advance GNN, we propose a multi-channel graph convolutional networks (MuchGCN). It is motivated by the convolutional neural networks, at which a series of channels are encoded to preserve the comprehensive characteristics of the input image. Thus, we define the specific graph convolutions to learn a series of graph channels at each layer, and pool graphs iteratively to encode the hierarchical structures. Experiments have been carefully carried out to demonstrate the superiority of MuchGCN over the state-of-the-art graph classification algorithms. △ Less

Submitted 17 December, 2019; originally announced December 2019.

arXiv:1910.02575 [pdf, other]

PyODDS: An End-to-End Outlier Detection System

Authors: Yuening Li, Daochen Zha, Na Zou, Xia Hu

Abstract: PyODDS is an end-to end Python system for outlier detection with database support. PyODDS provides outlier detection algorithms which meet the demands for users in different fields, w/wo data science or machine learning background. PyODDS gives the ability to execute machine learning algorithms in-database without moving data out of the database server or over the network. It also provides access… ▽ More PyODDS is an end-to end Python system for outlier detection with database support. PyODDS provides outlier detection algorithms which meet the demands for users in different fields, w/wo data science or machine learning background. PyODDS gives the ability to execute machine learning algorithms in-database without moving data out of the database server or over the network. It also provides access to a wide range of outlier detection algorithms, including statistical analysis and more recent deep learning based approaches. PyODDS is released under the MIT open-source license, and currently available at (https://github.com/datamllab/pyodds) with official documentations at (https://pyodds.github.io/). △ Less

Submitted 11 October, 2019; v1 submitted 6 October, 2019; originally announced October 2019.

Comments: 6 Pages, 2 Figures

arXiv:1908.08843 [pdf, other]

Fairness in Deep Learning: A Computational Perspective

Authors: Mengnan Du, Fan Yang, Na Zou, Xia Hu

Abstract: Deep learning is increasingly being used in high-stake decision making applications that affect individual lives. However, deep learning models might exhibit algorithmic discrimination behaviors with respect to protected groups, potentially posing negative impacts on individuals and society. Therefore, fairness in deep learning has attracted tremendous attention recently. We provide a review cover… ▽ More Deep learning is increasingly being used in high-stake decision making applications that affect individual lives. However, deep learning models might exhibit algorithmic discrimination behaviors with respect to protected groups, potentially posing negative impacts on individuals and society. Therefore, fairness in deep learning has attracted tremendous attention recently. We provide a review covering recent progresses to tackle algorithmic fairness problems of deep learning from the computational perspective. Specifically, we show that interpretability can serve as a useful ingredient to diagnose the reasons that lead to algorithmic discrimination. We also discuss fairness mitigation approaches categorized according to three stages of deep learning life-cycle, aiming to push forward the area of fairness in deep learning and build genuinely fair and reliable deep learning systems. △ Less

Submitted 18 March, 2020; v1 submitted 23 August, 2019; originally announced August 2019.

arXiv:1908.03849 [pdf, other]

SpecAE: Spectral AutoEncoder for Anomaly Detection in Attributed Networks

Authors: Yuening Li, Xiao Huang, Jundong Li, Mengnan Du, Na Zou

Abstract: Anomaly detection aims to distinguish observations that are rare and different from the majority. While most existing algorithms assume that instances are i.i.d., in many practical scenarios, links describing instance-to-instance dependencies and interactions are available. Such systems are called attributed networks. Anomaly detection in attributed networks has various applications such as monito… ▽ More Anomaly detection aims to distinguish observations that are rare and different from the majority. While most existing algorithms assume that instances are i.i.d., in many practical scenarios, links describing instance-to-instance dependencies and interactions are available. Such systems are called attributed networks. Anomaly detection in attributed networks has various applications such as monitoring suspicious accounts in social media and financial fraud in transaction networks. However, it remains a challenging task since the definition of anomaly becomes more complicated and topological structures are heterogeneous with nodal attributes. In this paper, we propose a spectral convolution and deconvolution based framework -- SpecAE, to project the attributed network into a tailored space to detect global and community anomalies. SpecAE leverages Laplacian sharpening to amplify the distances between representations of anomalies and the ones of the majority. The learned representations along with reconstruction errors are combined with a density estimation model to perform the detection. They are trained jointly as an end-to-end framework. Experiments on real-world datasets demonstrate the effectiveness of SpecAE. △ Less

Submitted 9 October, 2019; v1 submitted 11 August, 2019; originally announced August 2019.

Comments: 7 pages, in proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM)

arXiv:1908.03692 [pdf, other]

doi 10.1109/ICIP.2019.8803627

User independent Emotion Recognition with Residual Signal-Image Network

Authors: Guanghao Yin, Shouqian Sun, Hui Zhang, Dian Yu, Chao Li, Kejun Zhang, Ning Zou

Abstract: User independent emotion recognition with large scale physiological signals is a tough problem. There exist many advanced methods but they are conducted under relatively small datasets with dozens of subjects. Here, we propose Res-SIN, a novel end-to-end framework using Electrodermal Activity(EDA) signal images to classify human emotion. We first apply convex optimization-based EDA (cvxEDA) to dec… ▽ More User independent emotion recognition with large scale physiological signals is a tough problem. There exist many advanced methods but they are conducted under relatively small datasets with dozens of subjects. Here, we propose Res-SIN, a novel end-to-end framework using Electrodermal Activity(EDA) signal images to classify human emotion. We first apply convex optimization-based EDA (cvxEDA) to decompose signals and mine the static and dynamic emotion changes. Then, we transform decomposed signals to images so that they can be effectively processed by CNN frameworks. The Res-SIN combines individual emotion features and external emotion benchmarks to accelerate convergence. We evaluate our approach on the PMEmo dataset, the currently largest emotional dataset containing music and EDA signals. To the best of author's knowledge, our method is the first attempt to classify large scale subject-independent emotion with 7962 pieces of EDA signals from 457 subjects. Experimental results demonstrate the reliability of our model and the binary classification accuracy of 73.65% and 73.43% on arousal and valence dimension can be used as a baseline. △ Less

Submitted 2 August, 2020; v1 submitted 10 August, 2019; originally announced August 2019.

Journal ref: ICIP2019

arXiv:1812.04103 [pdf, other]

Non-local U-Net for Biomedical Image Segmentation

Authors: Zhengyang Wang, Na Zou, Dinggang Shen, Shuiwang Ji

Abstract: Deep learning has shown its great promise in various biomedical image segmentation tasks. Existing models are typically based on U-Net and rely on an encoder-decoder architecture with stacked local operators to aggregate long-range information gradually. However, only using the local operators limits the efficiency and effectiveness. In this work, we propose the non-local U-Nets, which are equippe… ▽ More Deep learning has shown its great promise in various biomedical image segmentation tasks. Existing models are typically based on U-Net and rely on an encoder-decoder architecture with stacked local operators to aggregate long-range information gradually. However, only using the local operators limits the efficiency and effectiveness. In this work, we propose the non-local U-Nets, which are equipped with flexible global aggregation blocks, for biomedical image segmentation. These blocks can be inserted into U-Net as size-preserving processes, as well as down-sampling and up-sampling layers. We perform thorough experiments on the 3D multimodality isointense infant brain MR image segmentation task to evaluate the non-local U-Nets. Results show that our proposed models achieve top performances with fewer parameters and faster computation. △ Less

Submitted 18 February, 2020; v1 submitted 10 December, 2018; originally announced December 2018.

Comments: In Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), 2019

arXiv:1805.11510

ALZA: An Efficient Hybrid Decentralized Payment System

Authors: Nate Zou, Eric Li, Henry Zhang

Abstract: The efficiency of decentralized book systems like Bitcoin and Ethereum has always been a challenge. It is usually measured by three major factors: scalability, throughput, and latency. Scalability refers to how the system capacity is increased by adding more physical resources. Throughput measures the volume of transactions for a given period of time, where most current solutions attempt to improv… ▽ More The efficiency of decentralized book systems like Bitcoin and Ethereum has always been a challenge. It is usually measured by three major factors: scalability, throughput, and latency. Scalability refers to how the system capacity is increased by adding more physical resources. Throughput measures the volume of transactions for a given period of time, where most current solutions attempt to improve such as NEO, EOS, etc. Latency measures the processing time of any single transaction. In current blockchain based systems, the block generation rate is the main latency bottleneck. Off-chain processes such as state channels are the most recent work that can integrate partial inbound transactions, reducing latency. Unfortunately, the state channel introduces more issues at the same time, such as cross-channel synchronization, which makes the state channel unavailable for full adoption of current blockchain solutions. In order to solve the efficiency problem, we proposed an end-to-end solution called ALZA, which links the dedicated high-throughput blockchain with self-organizing payment fields. This mechanism allows arbitrary set of users to create payment fields that process extremely low latency transactions within each field. Therefore, users can make transactions almost immediately. Since all transactions are conducted within fields, transaction costs will be reduced by several orders of magnitude. In addition, ALZA distributes main ledger to each client through an innovative replication mechanism. Therefore, the system will be significantly more robust to blockchain system failures. In theory, ALZA can complete millions of transactions in one second, which naturally supports high-frequency trading. △ Less

Submitted 7 March, 2019; v1 submitted 25 May, 2018; originally announced May 2018.

Comments: Overlap wording with Hadoop file systems

Showing 1–45 of 45 results for author: Zou, N