-
Multi-View Active Sensing for Human-Robot Interaction via Hierarchically Connected Tree
Authors:
Yuanjiong Ying,
Xian Huang,
Wei Dong
Abstract:
Comprehensive perception of human beings is the prerequisite to ensure the safety of human-robot interaction. Currently, prevailing visual sensing approach typically involves a single static camera, resulting in a restricted and occluded field of view. In our work, we develop an active vision system using multiple cameras to dynamically capture multi-source RGB-D data. An integrated human sensing…
▽ More
Comprehensive perception of human beings is the prerequisite to ensure the safety of human-robot interaction. Currently, prevailing visual sensing approach typically involves a single static camera, resulting in a restricted and occluded field of view. In our work, we develop an active vision system using multiple cameras to dynamically capture multi-source RGB-D data. An integrated human sensing strategy based on a hierarchically connected tree structure is proposed to fuse localized visual information. Constituting the tree model are the nodes representing keypoints and the edges representing keyparts, which are consistently interconnected to preserve the structural constraints during multi-source fusion. Utilizing RGB-D data and HRNet, the 3D positions of keypoints are analytically estimated, and their presence is inferred through a sliding widow of confidence scores. Subsequently, the point clouds of reliable keyparts are extracted by drawing occlusion-resistant masks, enabling fine registration between data clouds and cylindrical model following the hierarchical order. Experimental results demonstrate that our method enhances keypart recognition recall from 69.20% to 90.10%, compared to employing a single static camera. Furthermore, in overcoming challenges related to localized and occluded perception, the robotic arm's obstacle avoidance capabilities are effectively improved.
△ Less
Submitted 19 March, 2024;
originally announced March 2024.
-
CEASE: Collision-Evaluation-based Active Sense System for Collaborative Robotic Arms
Authors:
Xian Huang,
Yuanjiong Ying,
Wei Dong
Abstract:
Collision detection via visual fences can significantly enhance the safety of collaborative robotic arms. Existing work typically performs such detection based on pre-deployed stationary cameras outside the robotic arm's workspace. These stationary cameras can only provide a restricted detection range and constrain the mobility of the robotic system. To cope with this issue, we propose an active s…
▽ More
Collision detection via visual fences can significantly enhance the safety of collaborative robotic arms. Existing work typically performs such detection based on pre-deployed stationary cameras outside the robotic arm's workspace. These stationary cameras can only provide a restricted detection range and constrain the mobility of the robotic system. To cope with this issue, we propose an active sense method enabling a wide range of collision risk evaluation in dynamic scenarios. First, an active vision mechanism is implemented by equipping cameras with additional degrees of rotation. Considering the uncertainty in the active sense, we design a state confidence envelope to uniformly characterize both known and potential dynamic obstacles. Subsequently, using the observation-based uncertainty evolution, collision risk is evaluated by the prediction of obstacle envelopes. On this basis, a Markov decision process was employed to search for an optimal observation sequence of the active sense system, which enlarges the field of observation and reduces uncertainties in the state estimation of surrounding obstacles. Simulation and real-world experiments consistently demonstrate a 168% increase in the observation time coverage of typical dynamic humanoid obstacles compared to the method using stationary cameras, which underscores our system's effectiveness in collision risk tracking and enhancing the safety of robotic arms.
△ Less
Submitted 8 March, 2024;
originally announced March 2024.
-
Towards Accurate Post-training Quantization for Reparameterized Models
Authors:
Luoming Zhang,
Yefei He,
Wen Fei,
Zhenyu Lou,
Weijia Wu,
YangWei Ying,
Hong Zhou
Abstract:
Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation when applied to reparameterized models. This is primarily caused by channel-specific and sample-specific outliers, which appear only at specific samples and channels and impac…
▽ More
Model reparameterization is a widely accepted technique for improving inference speed without compromising performance. However, current Post-training Quantization (PTQ) methods often lead to significant accuracy degradation when applied to reparameterized models. This is primarily caused by channel-specific and sample-specific outliers, which appear only at specific samples and channels and impact on the selection of quantization parameters. To address this issue, we propose RepAPQ, a novel framework that preserves the accuracy of quantized reparameterization models. Different from previous frameworks using Mean Squared Error (MSE) as a measurement, we utilize Mean Absolute Error (MAE) to mitigate the influence of outliers on quantization parameters. Our framework comprises two main components: Quantization Protecting Reparameterization and Across-block Calibration. For effective calibration, Quantization Protecting Reparameterization combines multiple branches into a single convolution with an affine layer. During training, the affine layer accelerates convergence and amplifies the output of the convolution to better accommodate samples with outliers. Additionally, Across-block Calibration leverages the measurement of stage output as supervision to address the gradient problem introduced by MAE and enhance the interlayer correlation with quantization parameters. Comprehensive experiments demonstrate the effectiveness of RepAPQ across various models and tasks. Our framework outperforms previous methods by approximately 1\% for 8-bit PTQ and 2\% for 6-bit PTQ, showcasing its superior performance. The code is available at \url{https://github.com/ilur98/DLMC-QUANT}.
△ Less
Submitted 25 February, 2024;
originally announced February 2024.
-
RENOVI: A Benchmark Towards Remediating Norm Violations in Socio-Cultural Conversations
Authors:
Haolan Zhan,
Zhuang Li,
Xiaoxi Kang,
Tao Feng,
Yuncheng Hua,
Lizhen Qu,
Yi Ying,
Mei Rianto Chandra,
Kelly Rosalin,
Jureynolds Jureynolds,
Suraj Sharma,
Shilin Qu,
Linhao Luo,
Lay-Ki Soon,
Zhaleh Semnani Azad,
Ingrid Zukerman,
Gholamreza Haffari
Abstract:
Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to potential conflicts. Remediating norm violations requires social awareness and cultural sensitivity of the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi - a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, as well as…
▽ More
Norm violations occur when individuals fail to conform to culturally accepted behaviors, which may lead to potential conflicts. Remediating norm violations requires social awareness and cultural sensitivity of the nuances at play. To equip interactive AI systems with a remediation ability, we offer ReNoVi - a large-scale corpus of 9,258 multi-turn dialogues annotated with social norms, as well as define a sequence of tasks to help understand and remediate norm violations step by step. ReNoVi consists of two parts: 512 human-authored dialogues (real data), and 8,746 synthetic conversations generated by ChatGPT through prompt learning. While collecting sufficient human-authored data is costly, synthetic conversations provide suitable amounts of data to help mitigate the scarcity of training data, as well as the chance to assess the alignment between LLMs and humans in the awareness of social norms. We thus harness the power of ChatGPT to generate synthetic training data for our task. To ensure the quality of both human-authored and synthetic data, we follow a quality control protocol during data collection. Our experimental results demonstrate the importance of remediating norm violations in socio-cultural conversations, as well as the improvement in performance obtained from synthetic data.
△ Less
Submitted 16 February, 2024;
originally announced February 2024.
-
Differentially Private Non-convex Learning for Multi-layer Neural Networks
Authors:
Hanpu Shen,
Cheng-Long Wang,
Zihang Xiang,
Yiming Ying,
Di Wang
Abstract:
This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specific model where the random noise possesses a zero mean, and the link function is both bounded…
▽ More
This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specific model where the random noise possesses a zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example.
In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.
△ Less
Submitted 12 October, 2023;
originally announced October 2023.
-
Outlier Robust Adversarial Training
Authors:
Shu Hu,
Zhenhuan Yang,
Xin Wang,
Yiming Ying,
Siwei Lyu
Abstract:
Supervised learning models are challenged by the intrinsic complexities of training data such as outliers and minority subpopulations and intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are designed to handle each of the two challenges, to date, no work has been done to develop models that are…
▽ More
Supervised learning models are challenged by the intrinsic complexities of training data such as outliers and minority subpopulations and intentional attacks at inference time with adversarial samples. While traditional robust learning methods and the recent adversarial training approaches are designed to handle each of the two challenges, to date, no work has been done to develop models that are robust with regard to the low-quality training data and the potential adversarial attack at inference time simultaneously. It is for this reason that we introduce Outlier Robust Adversarial Training (ORAT) in this work. ORAT is based on a bi-level optimization formulation of adversarial training with a robust rank-based loss function. Theoretically, we show that the learning objective of ORAT satisfies the $\mathcal{H}$-consistency in binary classification, which establishes it as a proper surrogate to adversarial 0/1 loss. Furthermore, we analyze its generalization ability and provide uniform convergence rates in high probability. ORAT can be optimized with a simple algorithm. Experimental evaluations on three benchmark datasets demonstrate the effectiveness and robustness of ORAT in handling outliers and adversarial attacks. Our code is available at https://github.com/discovershu/ORAT.
△ Less
Submitted 10 September, 2023;
originally announced September 2023.
-
Stability and Generalization of Stochastic Compositional Gradient Descent Algorithms
Authors:
Ming Yang,
Xiyuan Wei,
Tianbao Yang,
Yiming Ying
Abstract:
Many machine learning tasks can be formulated as a stochastic compositional optimization (SCO) problem such as reinforcement learning, AUC maximization, and meta-learning, where the objective function involves a nested composition associated with an expectation. While a significant amount of studies has been devoted to studying the convergence behavior of SCO algorithms, there is little work on un…
▽ More
Many machine learning tasks can be formulated as a stochastic compositional optimization (SCO) problem such as reinforcement learning, AUC maximization, and meta-learning, where the objective function involves a nested composition associated with an expectation. While a significant amount of studies has been devoted to studying the convergence behavior of SCO algorithms, there is little work on understanding their generalization, i.e., how these learning algorithms built from training examples would behave on future test examples. In this paper, we provide the stability and generalization analysis of stochastic compositional gradient descent algorithms through the lens of algorithmic stability in the framework of statistical learning theory. Firstly, we introduce a stability concept called compositional uniform stability and establish its quantitative relation with generalization for SCO problems. Then, we establish the compositional uniform stability results for two popular stochastic compositional gradient descent algorithms, namely SCGD and SCSC. Finally, we derive dimension-independent excess risk bounds for SCGD and SCSC by trade-offing their stability results and optimization errors. To the best of our knowledge, these are the first-ever-known results on stability and generalization analysis of stochastic compositional gradient descent algorithms.
△ Less
Submitted 21 November, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
-
Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance
Authors:
Lisha Chen,
Heshan Fernando,
Yiming Ying,
Tianyi Chen
Abstract:
Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria, data modalities, or learning tasks. Different from single-objective learning, one of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process. Recent works have developed various dynamic weighting a…
▽ More
Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria, data modalities, or learning tasks. Different from single-objective learning, one of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process. Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants, where the central idea is to find an update direction that avoids conflicts among objectives. Albeit its appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practical gap, we focus on a new stochastic variant of MGDA - the Multi-objective gradient with Double sampling (MoDo) algorithm, and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithm stability. Perhaps surprisingly, we find that the key rationale behind MGDA -- updating along conflict-avoidant direction - may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the impact of the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique in MOL. We showcase the generality of our theoretical framework by analyzing other existing stochastic MOL algorithms under the framework. Experiments on various multi-task learning benchmarks are performed to demonstrate the practical applicability. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
△ Less
Submitted 5 October, 2023; v1 submitted 31 May, 2023;
originally announced May 2023.
-
Generalization Guarantees of Gradient Descent for Multi-Layer Neural Networks
Authors:
Puyu Wang,
Yunwen Lei,
Di Wang,
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Recently, significant progress has been made in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling parameters. In this paper, we greatly extend the previous work \cite{lei2022stabil…
▽ More
Recently, significant progress has been made in understanding the generalization of neural networks (NNs) trained by gradient descent (GD) using the algorithmic stability approach. However, most of the existing research has focused on one-hidden-layer NNs and has not addressed the impact of different network scaling parameters. In this paper, we greatly extend the previous work \cite{lei2022stability,richards2021stability} by conducting a comprehensive stability and generalization analysis of GD for multi-layer NNs. For two-layer NNs, our results are established under general network scaling parameters, relaxing previous conditions. In the case of three-layer NNs, our technical contribution lies in demonstrating its nearly co-coercive property by utilizing a novel induction strategy that thoroughly explores the effects of over-parameterization. As a direct application of our general findings, we derive the excess risk rate of $O(1/\sqrt{n})$ for GD algorithms in both two-layer and three-layer NNs. This sheds light on sufficient or necessary conditions for under-parameterized and over-parameterized NNs trained by GD to attain the desired risk rate of $O(1/\sqrt{n})$. Moreover, we demonstrate that as the scaling parameter increases or the network complexity decreases, less over-parameterization is required for GD to achieve the desired error rates. Additionally, under a low-noise condition, we obtain a fast risk rate of $O(1/n)$ for GD in both two-layer and three-layer NNs.
△ Less
Submitted 29 September, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Fairness-aware Differentially Private Collaborative Filtering
Authors:
Zhenhuan Yang,
Yingqiang Ge,
Congzhe Su,
Dingxian Wang,
Xiaoting Zhao,
Yiming Ying
Abstract:
Recently, there has been an increasing adoption of differential privacy guided algorithms for privacy-preserving machine learning tasks. However, the use of such algorithms comes with trade-offs in terms of algorithmic fairness, which has been widely acknowledged. Specifically, we have empirically observed that the classical collaborative filtering method, trained by differentially private stochas…
▽ More
Recently, there has been an increasing adoption of differential privacy guided algorithms for privacy-preserving machine learning tasks. However, the use of such algorithms comes with trade-offs in terms of algorithmic fairness, which has been widely acknowledged. Specifically, we have empirically observed that the classical collaborative filtering method, trained by differentially private stochastic gradient descent (DP-SGD), results in a disparate impact on user groups with respect to different user engagement levels. This, in turn, causes the original unfair model to become even more biased against inactive users. To address the above issues, we propose \textbf{DP-Fair}, a two-stage framework for collaborative filtering based algorithms. Specifically, it combines differential privacy mechanisms with fairness constraints to protect user privacy while ensuring fair recommendations. The experimental results, based on Amazon datasets, and user history logs collected from Etsy, one of the largest e-commerce platforms, demonstrate that our proposed method exhibits superior performance in terms of both overall accuracy and user group fairness on both shallow and deep recommendation models compared to vanilla DP-SGD.
△ Less
Submitted 16 March, 2023;
originally announced March 2023.
-
Generalization Analysis for Contrastive Representation Learning
Authors:
Yunwen Lei,
Tianbao Yang,
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Recently, contrastive learning has found impressive success in advancing the state of the art in solving various machine learning tasks. However, the existing generalization analysis is very limited or even not meaningful. In particular, the existing generalization error bounds depend linearly on the number $k$ of negative examples while it was widely shown in practice that choosing a large $k$ is…
▽ More
Recently, contrastive learning has found impressive success in advancing the state of the art in solving various machine learning tasks. However, the existing generalization analysis is very limited or even not meaningful. In particular, the existing generalization error bounds depend linearly on the number $k$ of negative examples while it was widely shown in practice that choosing a large $k$ is necessary to guarantee good generalization of contrastive learning in downstream tasks. In this paper, we establish novel generalization bounds for contrastive learning which do not depend on $k$, up to logarithmic terms. Our analysis uses structural results on empirical covering numbers and Rademacher complexities to exploit the Lipschitz continuity of loss functions. For self-bounding Lipschitz loss functions, we further improve our results by developing optimistic bounds which imply fast rates in a low noise condition. We apply our results to learning with both linear representation and nonlinear representation by deep neural networks, for both of which we derive Rademacher complexity bounds to get improved generalization bounds.
△ Less
Submitted 27 February, 2023; v1 submitted 23 February, 2023;
originally announced February 2023.
-
SRIBO: An Efficient and Resilient Single-Range and Inertia Based Odometry for Flying Robots
Authors:
Wei Dong,
Zheyuan Mei,
Yuanjiong Ying,
Sijia Chen,
Yichen ie,
Xiangyang Zhu
Abstract:
Positioning with one inertial measurement unit and one ranging sensor is commonly thought to be feasible only when trajectories are in certain patterns ensuring observability. For this reason, to pursue observable patterns, it is required either exciting the trajectory or searching key nodes in a long interval, which is commonly highly nonlinear and may also lack resilience. Therefore, such a posi…
▽ More
Positioning with one inertial measurement unit and one ranging sensor is commonly thought to be feasible only when trajectories are in certain patterns ensuring observability. For this reason, to pursue observable patterns, it is required either exciting the trajectory or searching key nodes in a long interval, which is commonly highly nonlinear and may also lack resilience. Therefore, such a positioning approach is still not widely accepted in real-world applications. To address this issue, this work first investigates the dissipative nature of flying robots considering aerial drag effects and re-formulates the corresponding positioning problem, which guarantees observability almost surely. On this basis, a dimension-reduced wriggling estimator is proposed accordingly. This estimator slides the estimation horizon in a stepping manner, and output matrices can be approximately evaluated based on the historical estimation sequence. The computational complexity is then further reduced via a dimension-reduction approach using polynomial fittings. In this way, the states of robots can be estimated via linear programming in a sufficiently long interval, and the degree of observability is thereby further enhanced because an adequate redundancy of measurements is available for each estimation. Subsequently, the estimator's convergence and numerical stability are proven theoretically. Finally, both indoor and outdoor experiments verify that the proposed estimator can achieve decimeter-level precision at hundreds of hertz per second, and it is resilient to sensors' failures. Hopefully, this study can provide a new practical approach for self-localization as well as relative positioning of cooperative agents with low-cost and lightweight sensors.
△ Less
Submitted 6 November, 2022;
originally announced November 2022.
-
Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks
Authors:
Yunwen Lei,
Rong Jin,
Yiming Ying
Abstract:
While significant theoretical progress has been achieved, unveiling the generalization mystery of overparameterized neural networks still remains largely elusive. In this paper, we study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of…
▽ More
While significant theoretical progress has been achieved, unveiling the generalization mystery of overparameterized neural networks still remains largely elusive. In this paper, we study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability. We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds by balancing the optimization and generalization via early-stopping. As compared to existing analysis on GD, our new analysis requires a relaxed overparameterization assumption and also applies to SGD. The key for the improvement is a better estimation of the smallest eigenvalues of the Hessian matrices of the empirical risks and the loss function along the trajectories of GD and SGD by providing a refined estimation of their iterates.
△ Less
Submitted 19 September, 2022;
originally announced September 2022.
-
Stability and Generalization for Markov Chain Stochastic Gradient Methods
Authors:
Puyu Wang,
Yunwen Lei,
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Recently there is a large amount of work devoted to the study of Markov chain stochastic gradient methods (MC-SGMs) which mainly focus on their convergence analysis for solving minimization problems. In this paper, we provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learni…
▽ More
Recently there is a large amount of work devoted to the study of Markov chain stochastic gradient methods (MC-SGMs) which mainly focus on their convergence analysis for solving minimization problems. In this paper, we provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems through the lens of algorithmic stability in the framework of statistical learning theory. For empirical risk minimization (ERM) problems, we establish the optimal excess population risk bounds for both smooth and non-smooth cases by introducing on-average argument stability. For minimax problems, we develop a quantitative connection between on-average argument stability and generalization error which extends the existing results for uniform stability \cite{lei2021stability}. We further develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability, which, combined with our stability results, show that the optimal generalization bounds can be attained for both smooth and non-smooth cases. To the best of our knowledge, this is the first generalization analysis of SGMs when the gradients are sampled from a Markov process.
△ Less
Submitted 16 September, 2022;
originally announced September 2022.
-
Differentially Private Stochastic Gradient Descent with Low-Noise
Authors:
Puyu Wang,
Yunwen Lei,
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection. This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy. In this paper, we focus on the privacy and ut…
▽ More
Modern machine learning algorithms aim to extract fine-grained information from data to provide accurate predictions, which often conflicts with the goal of privacy protection. This paper addresses the practical and theoretical importance of developing privacy-preserving machine learning algorithms that ensure good performance while preserving privacy. In this paper, we focus on the privacy and utility (measured by excess risk bounds) performances of differentially private stochastic gradient descent (SGD) algorithms in the setting of stochastic convex optimization. Specifically, we examine the pointwise problem in the low-noise setting for which we derive sharper excess risk bounds for the differentially private SGD algorithm. In the pairwise learning setting, we propose a simple differentially private SGD algorithm based on gradient perturbation. Furthermore, we develop novel utility bounds for the proposed algorithm, proving that it achieves optimal excess risk rates even for non-smooth losses. Notably, we establish fast learning rates for privacy-preserving pairwise learning under the low-noise condition, which is the first of its kind.
△ Less
Submitted 14 July, 2023; v1 submitted 9 September, 2022;
originally announced September 2022.
-
Minimax AUC Fairness: Efficient Algorithm with Provable Convergence
Authors:
Zhenhuan Yang,
Yan Lok Ko,
Kush R. Varshney,
Yiming Ying
Abstract:
The use of machine learning models in consequential decision making often exacerbates societal inequity, in particular yielding disparate impact on members of marginalized groups defined by race and gender. The area under the ROC curve (AUC) is widely used to evaluate the performance of a scoring function in machine learning, but is studied in algorithmic fairness less than other performance metri…
▽ More
The use of machine learning models in consequential decision making often exacerbates societal inequity, in particular yielding disparate impact on members of marginalized groups defined by race and gender. The area under the ROC curve (AUC) is widely used to evaluate the performance of a scoring function in machine learning, but is studied in algorithmic fairness less than other performance metrics. Due to the pairwise nature of the AUC, defining an AUC-based group fairness metric is pairwise-dependent and may involve both \emph{intra-group} and \emph{inter-group} AUCs. Importantly, considering only one category of AUCs is not sufficient to mitigate unfairness in AUC optimization. In this paper, we propose a minimax learning and bias mitigation framework that incorporates both intra-group and inter-group AUCs while maintaining utility. Based on this Rawlsian framework, we design an efficient stochastic optimization algorithm and prove its convergence to the minimum group-level AUC. We conduct numerical experiments on both synthetic and real-world datasets to validate the effectiveness of the minimax framework and the proposed optimization algorithm.
△ Less
Submitted 28 November, 2022; v1 submitted 22 August, 2022;
originally announced August 2022.
-
AUC Maximization in the Era of Big Data and AI: A Survey
Authors:
Tianbao Yang,
Yiming Ying
Abstract:
Area under the ROC curve, a.k.a. AUC, is a measure of choice for assessing the performance of a classifier for imbalanced data. AUC maximization refers to a learning paradigm that learns a predictive model by directly maximizing its AUC score. It has been studied for more than two decades dating back to late 90s and a huge amount of work has been devoted to AUC maximization since then. Recently, s…
▽ More
Area under the ROC curve, a.k.a. AUC, is a measure of choice for assessing the performance of a classifier for imbalanced data. AUC maximization refers to a learning paradigm that learns a predictive model by directly maximizing its AUC score. It has been studied for more than two decades dating back to late 90s and a huge amount of work has been devoted to AUC maximization since then. Recently, stochastic AUC maximization for big data and deep AUC maximization for deep learning have received increasing attention and yielded dramatic impact for solving real-world problems. However, to the best our knowledge there is no comprehensive survey of related works for AUC maximization. This paper aims to address the gap by reviewing the literature in the past two decades. We not only give a holistic view of the literature but also present detailed explanations and comparisons of different papers from formulations to algorithms and theoretical guarantees. We also identify and discuss remaining and emerging issues for deep AUC maximization, and provide suggestions on topics for future work.
△ Less
Submitted 3 August, 2022; v1 submitted 28 March, 2022;
originally announced March 2022.
-
Differentially Private SGDA for Minimax Problems
Authors:
Zhenhuan Yang,
Shu Hu,
Yunwen Lei,
Kush R. Varshney,
Siwei Lyu,
Yiming Ying
Abstract:
Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish th…
▽ More
Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To our best knowledge, this is the first-ever-known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting which is the first-ever-known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.
△ Less
Submitted 29 July, 2022; v1 submitted 22 January, 2022;
originally announced January 2022.
-
Label Distributionally Robust Losses for Multi-class Classification: Consistency, Robustness and Adaptivity
Authors:
Dixian Zhu,
Yiming Ying,
Tianbao Yang
Abstract:
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification that are formulated from distributionally robust optimization (DRO) perspective, where the uncertainty in the given label information are modeled and captured by taking the worse case of distributional weights. The benefits of this perspective are several fold: (i) it provides a unif…
▽ More
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification that are formulated from distributionally robust optimization (DRO) perspective, where the uncertainty in the given label information are modeled and captured by taking the worse case of distributional weights. The benefits of this perspective are several fold: (i) it provides a unified framework to explain the classical cross-entropy (CE) loss and SVM loss and their variants, (ii) it includes a special family corresponding to the temperature-scaled CE loss, which is widely adopted but poorly understood; (iii) it allows us to achieve adaptivity to the uncertainty degree of label information at an instance level. Our contributions include: (1) we study both consistency and robustness by establishing top-$k$ ($\forall k\geq 1$) consistency of LDR losses for multi-class classification, and a negative result that a top-$1$ consistent and symmetric robust loss cannot achieve top-$k$ consistency simultaneously for all $k\geq 2$; (2) we propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance; (3) we demonstrate stable and competitive performance for the proposed adaptive LDR loss on 7 benchmark datasets under 6 noisy label and 1 clean settings against 13 loss functions, and on one real-world noisy dataset. The code is open-sourced at \url{https://github.com/Optimization-AI/ICML2023_LDR}.
△ Less
Submitted 28 June, 2023; v1 submitted 29 December, 2021;
originally announced December 2021.
-
PlantStereo: A Stereo Matching Benchmark for Plant Surface Dense Reconstruction
Authors:
Qingyu Wang,
Baojian Ma,
Wei Liu,
Mingzhao Lou,
Mingchuan Zhou,
Huanyu Jiang,
Yibin Ying
Abstract:
Stereo matching is an important task in computer vision which has drawn tremendous research attention for decades. While in terms of disparity accuracy, density and data size, public stereo datasets are difficult to meet the requirements of models. In this paper, we aim to address the issue between datasets and models and propose a large scale stereo dataset with high accuracy disparity ground tru…
▽ More
Stereo matching is an important task in computer vision which has drawn tremendous research attention for decades. While in terms of disparity accuracy, density and data size, public stereo datasets are difficult to meet the requirements of models. In this paper, we aim to address the issue between datasets and models and propose a large scale stereo dataset with high accuracy disparity ground truth named PlantStereo. We used a semi-automatic way to construct the dataset: after camera calibration and image registration, high accuracy disparity images can be obtained from the depth images. In total, PlantStereo contains 812 image pairs covering a diverse set of plants: spinach, tomato, pepper and pumpkin. We firstly evaluated our PlantStereo dataset on four different stereo matching methods. Extensive experiments on different models and plants show that compared with ground truth in integer accuracy, high accuracy disparity images provided by PlantStereo can remarkably improve the training effect of deep learning models. This paper provided a feasible and reliable method to realize plant surface dense reconstruction. The PlantStereo dataset and relative code are available at: https://www.github.com/wangqingyu985/PlantStereo
△ Less
Submitted 30 November, 2021;
originally announced November 2021.
-
Simple Stochastic and Online Gradient Descent Algorithms for Pairwise Learning
Authors:
Zhenhuan Yang,
Yunwen Lei,
Puyu Wang,
Tianbao Yang,
Yiming Ying
Abstract:
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances. It instantiates many important machine learning tasks such as bipartite ranking and metric learning. A popular approach to handle streaming data in pairwise learning is an online gradient descent (OGD) algorithm, where one needs to pair the current instance with a buffering set of previous instances…
▽ More
Pairwise learning refers to learning tasks where the loss function depends on a pair of instances. It instantiates many important machine learning tasks such as bipartite ranking and metric learning. A popular approach to handle streaming data in pairwise learning is an online gradient descent (OGD) algorithm, where one needs to pair the current instance with a buffering set of previous instances with a sufficiently large size and therefore suffers from a scalability issue. In this paper, we propose simple stochastic and online gradient descent methods for pairwise learning. A notable difference from the existing studies is that we only pair the current instance with the previous one in building a gradient direction, which is efficient in both the storage and computational complexity. We develop novel stability results, optimization, and generalization error bounds for both convex and nonconvex as well as both smooth and nonsmooth problems. We introduce novel techniques to decouple the dependency of models and the previous instance in both the optimization and generalization analysis. Our study resolves an open question on developing meaningful generalization bounds for OGD using a buffering set with a very small fixed size. We also extend our algorithms and stability analysis to develop differentially private SGD algorithms for pairwise learning which significantly improves the existing results.
△ Less
Submitted 23 November, 2021;
originally announced November 2021.
-
Memory-Based Optimization Methods for Model-Agnostic Meta-Learning and Personalized Federated Learning
Authors:
Bokun Wang,
Zhuoning Yuan,
Yiming Ying,
Tianbao Yang
Abstract:
In recent years, model-agnostic meta-learning (MAML) has become a popular research area. However, the stochastic optimization of MAML is still underdeveloped. Existing MAML algorithms rely on the ``episode'' idea by sampling a few tasks and data points to update the meta-model at each iteration. Nonetheless, these algorithms either fail to guarantee convergence with a constant mini-batch size or r…
▽ More
In recent years, model-agnostic meta-learning (MAML) has become a popular research area. However, the stochastic optimization of MAML is still underdeveloped. Existing MAML algorithms rely on the ``episode'' idea by sampling a few tasks and data points to update the meta-model at each iteration. Nonetheless, these algorithms either fail to guarantee convergence with a constant mini-batch size or require processing a large number of tasks at every iteration, which is unsuitable for continual learning or cross-device federated learning where only a small number of tasks are available per iteration or per round. To address these issues, this paper proposes memory-based stochastic algorithms for MAML that converge with vanishing error. The proposed algorithms require sampling a constant number of tasks and data samples per iteration, making them suitable for the continual learning scenario. Moreover, we introduce a communication-efficient memory-based MAML algorithm for personalized federated learning in cross-device (with client sampling) and cross-silo (without client sampling) settings. Our theoretical analysis improves the optimization theory for MAML, and our empirical results corroborate our theoretical findings. Interested readers can access our code at \url{https://github.com/bokun-wang/moml}.
△ Less
Submitted 24 April, 2023; v1 submitted 9 June, 2021;
originally announced June 2021.
-
Sum of Ranked Range Loss for Supervised Learning
Authors:
Shu Hu,
Yiming Ying,
Xin Wang,
Siwei Lyu
Abstract:
In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR)…
▽ More
In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to form learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary/multi-class classification at the sample level and the TKML individual loss for multi-label/multi-class classification at the label level. A combination loss of AoRR and TKML is proposed as a new learning objective for improving the robustness of multi-label learning in the face of outliers in sample and labels alike. Our empirical results highlight the effectiveness of the proposed optimization frameworks and demonstrate the applicability of proposed losses using synthetic and real data sets.
△ Less
Submitted 3 April, 2022; v1 submitted 6 June, 2021;
originally announced June 2021.
-
Stability and Generalization of Stochastic Gradient Methods for Minimax Problems
Authors:
Yunwen Lei,
Zhenhuan Yang,
Tianbao Yang,
Yiming Ying
Abstract:
Many machine learning problems can be formulated as minimax problems such as Generative Adversarial Networks (GANs), AUC maximization and robust estimation, to mention but a few. A substantial amount of studies are devoted to studying the convergence behavior of their stochastic gradient-type algorithms. In contrast, there is relatively little work on their generalization, i.e., how the learning m…
▽ More
Many machine learning problems can be formulated as minimax problems such as Generative Adversarial Networks (GANs), AUC maximization and robust estimation, to mention but a few. A substantial amount of studies are devoted to studying the convergence behavior of their stochastic gradient-type algorithms. In contrast, there is relatively little work on their generalization, i.e., how the learning models built from training examples would behave on test examples. In this paper, we provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems under both convex-concave and nonconvex-nonconcave cases through the lens of algorithmic stability. We establish a quantitative connection between stability and several generalization measures both in expectation and with high probability. For the convex-concave setting, our stability analysis shows that stochastic gradient descent ascent attains optimal generalization bounds for both smooth and nonsmooth minimax problems. We also establish generalization bounds for both weakly-convex-weakly-concave and gradient-dominated problems.
△ Less
Submitted 12 July, 2021; v1 submitted 8 May, 2021;
originally announced May 2021.
-
Federated Deep AUC Maximization for Heterogeneous Data with a Constant Communication Complexity
Authors:
Zhuoning Yuan,
Zhishuai Guo,
Yi Xu,
Yiming Ying,
Tianbao Yang
Abstract:
Deep AUC (area under the ROC curve) Maximization (DAM) has attracted much attention recently due to its great potential for imbalanced data classification. However, the research on Federated Deep AUC Maximization (FDAM) is still limited. Compared with standard federated learning (FL) approaches that focus on decomposable minimization objectives, FDAM is more complicated due to its minimization obj…
▽ More
Deep AUC (area under the ROC curve) Maximization (DAM) has attracted much attention recently due to its great potential for imbalanced data classification. However, the research on Federated Deep AUC Maximization (FDAM) is still limited. Compared with standard federated learning (FL) approaches that focus on decomposable minimization objectives, FDAM is more complicated due to its minimization objective is non-decomposable over individual examples. In this paper, we propose improved FDAM algorithms for heterogeneous data by solving the popular non-convex strongly-concave min-max formulation of DAM in a distributed fashion, which can also be applied to a class of non-convex strongly-concave min-max problems. A striking result of this paper is that the communication complexity of the proposed algorithm is a constant independent of the number of machines and also independent of the accuracy level, which improves an existing result by orders of magnitude. The experiments have demonstrated the effectiveness of our FDAM algorithm on benchmark datasets, and on medical chest X-ray images from different organizations. Our experiment shows that the performance of FDAM using data from multiple hospitals can improve the AUC score on testing data from a single hospital for detecting life-threatening diseases based on chest radiographs. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org) whose github address is https://github.com/Optimization-AI/ICML2021_FedDeepAUC_CODASCA.
△ Less
Submitted 13 September, 2021; v1 submitted 8 February, 2021;
originally announced February 2021.
-
Differentially Private SGD with Non-Smooth Losses
Authors:
Puyu Wang,
Yunwen Lei,
Yiming Ying,
Hai Zhang
Abstract:
In this paper, we are concerned with differentially private {stochastic gradient descent (SGD)} algorithms in the setting of stochastic convex optimization (SCO). Most of the existing work requires the loss to be Lipschitz continuous and strongly smooth, and the model parameter to be uniformly bounded. However, these assumptions are restrictive as many popular losses violate these conditions inclu…
▽ More
In this paper, we are concerned with differentially private {stochastic gradient descent (SGD)} algorithms in the setting of stochastic convex optimization (SCO). Most of the existing work requires the loss to be Lipschitz continuous and strongly smooth, and the model parameter to be uniformly bounded. However, these assumptions are restrictive as many popular losses violate these conditions including the hinge loss for SVM, the absolute loss in robust regression, and even the least square loss in an unbounded domain. We significantly relax these restrictive assumptions and establish privacy and generalization (utility) guarantees for private SGD algorithms using output and gradient perturbations associated with non-smooth convex losses. Specifically, the loss function is relaxed to have an $α$-Hölder continuous gradient (referred to as $α$-Hölder smoothness) which instantiates the Lipschitz continuity ($α=0$) and the strong smoothness ($α=1$). We prove that noisy SGD with $α$-Hölder smooth losses using gradient perturbation can guarantee $(ε,δ)$-differential privacy (DP) and attain optimal excess population risk $\mathcal{O}\Big(\frac{\sqrt{d\log(1/δ)}}{nε}+\frac{1}{\sqrt{n}}\Big)$, up to logarithmic terms, with the gradient complexity $ \mathcal{O}( n^{2-α\over 1+α}+ n).$ This shows an important trade-off between $α$-Hölder smoothness of the loss and the computational complexity for private SGD with statistically optimal performance. In particular, our results indicate that $α$-Hölder smoothness with $α\ge {1/2}$ is sufficient to guarantee $(ε,δ)$-DP of noisy SGD algorithms while achieving optimal excess risk with the linear gradient complexity $\mathcal{O}(n).$
△ Less
Submitted 26 June, 2021; v1 submitted 21 January, 2021;
originally announced January 2021.
-
Stochastic Hard Thresholding Algorithms for AUC Maximization
Authors:
Zhenhuan Yang,
Baojian Zhou,
Yunwen Lei,
Yiming Ying
Abstract:
In this paper, we aim to develop stochastic hard thresholding algorithms for the important problem of AUC maximization in imbalanced classification. The main challenge is the pairwise loss involved in AUC maximization. We overcome this obstacle by reformulating the U-statistics objective function as an empirical risk minimization (ERM), from which a stochastic hard thresholding algorithm (\texttt{…
▽ More
In this paper, we aim to develop stochastic hard thresholding algorithms for the important problem of AUC maximization in imbalanced classification. The main challenge is the pairwise loss involved in AUC maximization. We overcome this obstacle by reformulating the U-statistics objective function as an empirical risk minimization (ERM), from which a stochastic hard thresholding algorithm (\texttt{SHT-AUC}) is developed. To our best knowledge, this is the first attempt to provide stochastic hard thresholding algorithms for AUC maximization with a per-iteration cost $Ø(b d)$ where $d$ and $b$ are the dimension of the data and the minibatch size, respectively. We show that the proposed algorithm enjoys the linear convergence rate up to a tolerance error. In particular, we show, if the data is generated from the Gaussian distribution, then its convergence becomes slower as the data gets more imbalanced. We conduct extensive experiments to show the efficiency and effectiveness of the proposed algorithms.
△ Less
Submitted 4 November, 2020;
originally announced November 2020.
-
Learning by Minimizing the Sum of Ranked Range
Authors:
Shu Hu,
Yiming Ying,
Xin Wang,
Siwei Lyu
Abstract:
In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR)…
▽ More
In forming learning objectives, one oftentimes needs to aggregate a set of individual values to a single output. Such cases occur in the aggregate loss, which combines individual losses of a learning model over each training sample, and in the individual loss for multi-label learning, which combines prediction scores over all class labels. In this work, we introduce the sum of ranked range (SoRR) as a general approach to form learning objectives. A ranked range is a consecutive sequence of sorted values of a set of real numbers. The minimization of SoRR is solved with the difference of convex algorithm (DCA). We explore two applications in machine learning of the minimization of the SoRR framework, namely the AoRR aggregate loss for binary classification and the TKML individual loss for multi-label/multi-class classification. Our empirical results highlight the effectiveness of the proposed optimization framework and demonstrate the applicability of proposed losses using synthetic and real datasets.
△ Less
Submitted 4 October, 2020;
originally announced October 2020.
-
Online AUC Optimization for Sparse High-Dimensional Datasets
Authors:
Baojian Zhou,
Yiming Ying,
Steven Skiena
Abstract:
The Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification arising from many application domains where high-dimensional sparse data is abundant. In such cases, each $d$ dimensional sample has only $k$ non-zero features with $k \ll d$, and data arrives sequentially in a streaming form. Current online AUC optimization algorithms have high per-iteration cost…
▽ More
The Area Under the ROC Curve (AUC) is a widely used performance measure for imbalanced classification arising from many application domains where high-dimensional sparse data is abundant. In such cases, each $d$ dimensional sample has only $k$ non-zero features with $k \ll d$, and data arrives sequentially in a streaming form. Current online AUC optimization algorithms have high per-iteration cost $\mathcal{O}(d)$ and usually produce non-sparse solutions in general, and hence are not suitable for handling the data challenge mentioned above.
In this paper, we aim to directly optimize the AUC score for high-dimensional sparse datasets under online learning setting and propose a new algorithm, \textsc{FTRL-AUC}. Our proposed algorithm can process data in an online fashion with a much cheaper per-iteration cost $\mathcal{O}(k)$, making it amenable for high-dimensional sparse streaming data analysis. Our new algorithmic design critically depends on a novel reformulation of the U-statistics AUC objective function as the empirical saddle point reformulation, and the innovative introduction of the "lazy update" rule so that the per-iteration complexity is dramatically reduced from $\mathcal{O}(d)$ to $\mathcal{O}(k)$. Furthermore, \textsc{FTRL-AUC} can inherently capture sparsity more effectively by applying a generalized Follow-The-Regularized-Leader (FTRL) framework.
Experiments on real-world datasets demonstrate that \textsc{FTRL-AUC} significantly improves both run time and model sparsity while achieving competitive AUC scores compared with the state-of-the-art methods. Comparison with the online learning method for logistic loss demonstrates that \textsc{FTRL-AUC} achieves higher AUC scores especially when datasets are imbalanced.
△ Less
Submitted 22 September, 2020;
originally announced September 2020.
-
Automated Model Selection for Time-Series Anomaly Detection
Authors:
Yuanxiang Ying,
Juanyong Duan,
Chunlei Wang,
Yujing Wang,
Congrui Huang,
Bixiong Xu
Abstract:
Time-series anomaly detection is a popular topic in both academia and industrial fields. Many companies need to monitor thousands of temporal signals for their applications and services and require instant feedback and alerts for potential incidents in time. The task is challenging because of the complex characteristics of time-series, which are messy, stochastic, and often without proper labels.…
▽ More
Time-series anomaly detection is a popular topic in both academia and industrial fields. Many companies need to monitor thousands of temporal signals for their applications and services and require instant feedback and alerts for potential incidents in time. The task is challenging because of the complex characteristics of time-series, which are messy, stochastic, and often without proper labels. This prohibits training supervised models because of lack of labels and a single model hardly fits different time series. In this paper, we propose a solution to address these issues. We present an automated model selection framework to automatically find the most suitable detection model with proper parameters for the incoming data. The model selection layer is extensible as it can be updated without too much effort when a new detector is available to the service. Finally, we incorporate a customized tuning algorithm to flexibly filter anomalies to meet customers' criteria. Experiments on real-world datasets show the effectiveness of our solution.
△ Less
Submitted 25 August, 2020;
originally announced September 2020.
-
Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent
Authors:
Yunwen Lei,
Yiming Ying
Abstract:
Recently there are a considerable amount of work devoted to the study of the algorithmic stability and generalization for stochastic gradient descent (SGD). However, the existing stability analysis requires to impose restrictive assumptions on the boundedness of gradients, strong smoothness and convexity of loss functions. In this paper, we provide a fine-grained analysis of stability and generali…
▽ More
Recently there are a considerable amount of work devoted to the study of the algorithmic stability and generalization for stochastic gradient descent (SGD). However, the existing stability analysis requires to impose restrictive assumptions on the boundedness of gradients, strong smoothness and convexity of loss functions. In this paper, we provide a fine-grained analysis of stability and generalization for SGD by substantially relaxing these assumptions. Firstly, we establish stability and generalization for SGD by removing the existing bounded gradient assumptions. The key idea is the introduction of a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates. This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting using stability approach. Secondly, the smoothness assumption is relaxed by considering loss functions with Holder continuous (sub)gradients for which we show that optimal bounds are still achieved by balancing computation and stability. To our best knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions. Finally, we study learning problems with (strongly) convex objectives but non-convex loss functions.
△ Less
Submitted 15 June, 2020;
originally announced June 2020.
-
Stochastic AUC Maximization with Deep Neural Networks
Authors:
Mingrui Liu,
Zhuoning Yuan,
Yiming Ying,
Tianbao Yang
Abstract:
Stochastic AUC maximization has garnered an increasing interest due to better fit to imbalanced data classification. However, existing works are limited to stochastic AUC maximization with a linear predictive model, which restricts its predictive power when dealing with extremely complex data. In this paper, we consider stochastic AUC maximization problem with a deep neural network as the predicti…
▽ More
Stochastic AUC maximization has garnered an increasing interest due to better fit to imbalanced data classification. However, existing works are limited to stochastic AUC maximization with a linear predictive model, which restricts its predictive power when dealing with extremely complex data. In this paper, we consider stochastic AUC maximization problem with a deep neural network as the predictive model. Building on the saddle point reformulation of a surrogated loss of AUC, the problem can be cast into a {\it non-convex concave} min-max problem. The main contribution made in this paper is to make stochastic AUC maximization more practical for deep neural networks and big data with theoretical insights as well. In particular, we propose to explore Polyak-Łojasiewicz (PL) condition that has been proved and observed in deep learning, which enables us to develop new stochastic algorithms with even faster convergence rate and more practical step size scheme. An AdaGrad-style algorithm is also analyzed under the PL condition with adaptive convergence rate. Our experimental results demonstrate the effectiveness of the proposed algorithms.
△ Less
Submitted 29 June, 2020; v1 submitted 28 August, 2019;
originally announced August 2019.
-
Stochastic Proximal AUC Maximization
Authors:
Yunwen Lei,
Yiming Ying
Abstract:
In this paper we consider the problem of maximizing the Area under the ROC curve (AUC) which is a widely used performance metric in imbalanced classification and anomaly detection. Due to the pairwise nonlinearity of the objective function, classical SGD algorithms do not apply to the task of AUC maximization. We propose a novel stochastic proximal algorithm for AUC maximization which is scalable…
▽ More
In this paper we consider the problem of maximizing the Area under the ROC curve (AUC) which is a widely used performance metric in imbalanced classification and anomaly detection. Due to the pairwise nonlinearity of the objective function, classical SGD algorithms do not apply to the task of AUC maximization. We propose a novel stochastic proximal algorithm for AUC maximization which is scalable to large scale streaming data. Our algorithm can accommodate general penalty terms and is easy to implement with favorable $O(d)$ space and per-iteration time complexities. We establish a high-probability convergence rate $O(1/\sqrt{T})$ for the general convex setting, and improve it to a fast convergence rate $O(1/T)$ for the cases of strongly convex regularizers and no regularization term (without strong convexity). Our proof does not need the uniform boundedness assumption on the loss function or the iterates which is more fidelity to the practice. Finally, we perform extensive experiments over various benchmark data sets from real-world application domains which show the superior performance of our algorithm over the existing AUC maximization algorithms.
△ Less
Submitted 14 June, 2019;
originally announced June 2019.
-
Dual Averaging Method for Online Graph-structured Sparsity
Authors:
Baojian Zhou,
Feng Chen,
Yiming Ying
Abstract:
Online learning algorithms update models via one sample per iteration, thus efficient to process large-scale datasets and useful to detect malicious events for social benefits, such as disease outbreak and traffic congestion on the fly. However, existing algorithms for graph-structured models focused on the offline setting and the least square loss, incapable for online setting, while methods desi…
▽ More
Online learning algorithms update models via one sample per iteration, thus efficient to process large-scale datasets and useful to detect malicious events for social benefits, such as disease outbreak and traffic congestion on the fly. However, existing algorithms for graph-structured models focused on the offline setting and the least square loss, incapable for online setting, while methods designed for online setting cannot be directly applied to the problem of complex (usually non-convex) graph-structured sparsity model. To address these limitations, in this paper we propose a new algorithm for graph-structured sparsity constraint problems under online setting, which we call \textsc{GraphDA}. The key part in \textsc{GraphDA} is to project both averaging gradient (in dual space) and primal variables (in primal space) onto lower dimensional subspaces, thus capturing the graph-structured sparsity effectively. Furthermore, the objective functions assumed here are generally convex so as to handle different losses for online learning settings. To the best of our knowledge, \textsc{GraphDA} is the first online learning algorithm for graph-structure constrained optimization problems. To validate our method, we conduct extensive experiments on both benchmark graph and real-world graph datasets. Our experiment results show that, compared to other baseline methods, \textsc{GraphDA} not only improves classification performance, but also successfully captures graph-structured features more effectively, hence stronger interpretability.
△ Less
Submitted 25 May, 2019;
originally announced May 2019.
-
Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization
Authors:
Baojian Zhou,
Feng Chen,
Yiming Ying
Abstract:
Stochastic optimization algorithms update models with cheap per-iteration costs sequentially, which makes them amenable for large-scale data analysis. Such algorithms have been widely studied for structured sparse models where the sparsity information is very specific, e.g., convex sparsity-inducing norms or $\ell^0$-norm. However, these norms cannot be directly applied to the problem of complex (…
▽ More
Stochastic optimization algorithms update models with cheap per-iteration costs sequentially, which makes them amenable for large-scale data analysis. Such algorithms have been widely studied for structured sparse models where the sparsity information is very specific, e.g., convex sparsity-inducing norms or $\ell^0$-norm. However, these norms cannot be directly applied to the problem of complex (non-convex) graph-structured sparsity models, which have important application in disease outbreak and social networks, etc. In this paper, we propose a stochastic gradient-based method for solving graph-structured sparsity constraint problems, not restricted to the least square loss. We prove that our algorithm enjoys a linear convergence up to a constant error, which is competitive with the counterparts in the batch learning setting. We conduct extensive experiments to show the efficiency and effectiveness of the proposed algorithms.
△ Less
Submitted 9 May, 2019;
originally announced May 2019.
-
Stability and Optimization Error of Stochastic Gradient Descent for Pairwise Learning
Authors:
Wei Shen,
Zhenhuan Yang,
Yiming Ying,
Xiaoming Yuan
Abstract:
In this paper we study the stability and its trade-off with optimization error for stochastic gradient descent (SGD) algorithms in the pairwise learning setting. Pairwise learning refers to a learning task which involves a loss function depending on pairs of instances among which notable examples are bipartite ranking, metric learning, area under ROC (AUC) maximization and minimum error entropy (M…
▽ More
In this paper we study the stability and its trade-off with optimization error for stochastic gradient descent (SGD) algorithms in the pairwise learning setting. Pairwise learning refers to a learning task which involves a loss function depending on pairs of instances among which notable examples are bipartite ranking, metric learning, area under ROC (AUC) maximization and minimum error entropy (MEE) principle. Our contribution is twofold. Firstly, we establish the stability results of SGD for pairwise learning in the convex, strongly convex and non-convex settings, from which generalization bounds can be naturally derived. Secondly, we establish the trade-off between stability and optimization error of SGD algorithms for pairwise learning. This is achieved by lower-bounding the sum of stability and optimization error by the minimax statistical error over a prescribed class of pairwise loss functions. From this fundamental trade-off, we obtain lower bounds for the optimization error of SGD algorithms and the excess expected risk over a class of pairwise losses. In addition, we illustrate our stability results by giving some specific examples of AUC maximization, metric learning and MEE.
△ Less
Submitted 26 April, 2019; v1 submitted 25 April, 2019;
originally announced April 2019.
-
Object Detection based Deep Unsupervised Hashing
Authors:
Rong-Cheng Tu,
Xian-Ling Mao,
Bo-Si Feng,
Bing-Bing Bian,
Yu-shu Ying
Abstract:
Recently, similarity-preserving hashing methods have been extensively studied for large-scale image retrieval. Compared with unsupervised hashing, supervised hashing methods for labeled data have usually better performance by utilizing semantic label information. Intuitively, for unlabeled data, it will improve the performance of unsupervised hashing methods if we can first mine some supervised se…
▽ More
Recently, similarity-preserving hashing methods have been extensively studied for large-scale image retrieval. Compared with unsupervised hashing, supervised hashing methods for labeled data have usually better performance by utilizing semantic label information. Intuitively, for unlabeled data, it will improve the performance of unsupervised hashing methods if we can first mine some supervised semantic 'label information' from unlabeled data and then incorporate the 'label information' into the training process. Thus, in this paper, we propose a novel Object Detection based Deep Unsupervised Hashing method (ODDUH). Specifically, a pre-trained object detection model is utilized to mining supervised 'label information', which is used to guide the learning process to generate high-quality hash codes.Extensive experiments on two public datasets demonstrate that the proposed method outperforms the state-of-the-art unsupervised hashing methods in the image retrieval task.
△ Less
Submitted 24 November, 2018;
originally announced November 2018.
-
A Univariate Bound of Area Under ROC
Authors:
Siwei Lyu,
Yiming Ying
Abstract:
Area under ROC (AUC) is an important metric for binary classification and bipartite ranking problems. However, it is difficult to directly optimizing AUC as a learning objective, so most existing algorithms are based on optimizing a surrogate loss to AUC. One significant drawback of these surrogate losses is that they require pairwise comparisons among training data, which leads to slow running ti…
▽ More
Area under ROC (AUC) is an important metric for binary classification and bipartite ranking problems. However, it is difficult to directly optimizing AUC as a learning objective, so most existing algorithms are based on optimizing a surrogate loss to AUC. One significant drawback of these surrogate losses is that they require pairwise comparisons among training data, which leads to slow running time and increasing local storage for online learning. In this work, we describe a new surrogate loss based on a reformulation of the AUC risk, which does not require pairwise comparison but rankings of the predictions. We further show that the ranking operation can be avoided, and the learning objective obtained based on this surrogate enjoys linear complexity in time and storage. We perform experiments to demonstrate the effectiveness of the online and batch algorithms for AUC optimization based on the proposed surrogate loss.
△ Less
Submitted 25 May, 2018; v1 submitted 16 April, 2018;
originally announced April 2018.
-
Learning with Correntropy-induced Losses for Regression with Mixture of Symmetric Stable Noise
Authors:
Yunlong Feng,
Yiming Ying
Abstract:
In recent years, correntropy and its applications in machine learning have been drawing continuous attention owing to its merits in dealing with non-Gaussian noise and outliers. However, theoretical understanding of correntropy, especially in the statistical learning context, is still limited. In this study, within the statistical learning framework, we investigate correntropy based regression in…
▽ More
In recent years, correntropy and its applications in machine learning have been drawing continuous attention owing to its merits in dealing with non-Gaussian noise and outliers. However, theoretical understanding of correntropy, especially in the statistical learning context, is still limited. In this study, within the statistical learning framework, we investigate correntropy based regression in the presence of non-Gaussian noise or outliers. Motivated by the practical way of generating non-Gaussian noise or outliers, we introduce mixture of symmetric stable noise, which include Gaussian noise, Cauchy noise, and their mixture as special cases, to model non-Gaussian noise or outliers. We demonstrate that under the mixture of symmetric stable noise assumption, correntropy based regression can learn the conditional mean function or the conditional median function well without resorting to the finite-variance or even the finite first-order moment condition on the noise. In particular, for the above two cases, we establish asymptotic optimal learning rates for correntropy based regression estimators that are asymptotically of type $\mathcal{O}(n^{-1})$. These results justify the effectiveness of the correntropy based regression estimators in dealing with outliers as well as non-Gaussian noise. We believe that the present study completes our understanding towards correntropy based regression from a statistical learning viewpoint, and may also shed some light on robust statistical learning for regression.
△ Less
Submitted 4 September, 2019; v1 submitted 28 February, 2018;
originally announced March 2018.
-
Learning with Average Top-k Loss
Authors:
Yanbo Fan,
Siwei Lyu,
Yiming Ying,
Bao-Gang Hu
Abstract:
In this work, we introduce the {\em average top-$k$} (\atk) loss as a new aggregate loss for supervised learning, which is the average over the $k$ largest individual losses over a training dataset. We show that the \atk loss is a natural generalization of the two widely used aggregate losses, namely the average loss and the maximum loss, but can combine their advantages and mitigate their drawbac…
▽ More
In this work, we introduce the {\em average top-$k$} (\atk) loss as a new aggregate loss for supervised learning, which is the average over the $k$ largest individual losses over a training dataset. We show that the \atk loss is a natural generalization of the two widely used aggregate losses, namely the average loss and the maximum loss, but can combine their advantages and mitigate their drawbacks to better adapt to different data distributions. Furthermore, it remains a convex function over all individual losses, which can lead to convex optimization problems that can be solved effectively with conventional gradient-based methods. We provide an intuitive interpretation of the \atk loss based on its equivalent effect on the continuous individual loss functions, suggesting that it can reduce the penalty on correctly classified data. We further give a learning theory analysis of \matk learning on the classification calibration of the \atk loss and the error bounds of \atk-SVM. We demonstrate the applicability of minimum average top-$k$ learning for binary classification and regression using synthetic and real datasets.
△ Less
Submitted 20 December, 2017; v1 submitted 24 May, 2017;
originally announced May 2017.
-
Unregularized Online Learning Algorithms with General Loss Functions
Authors:
Yiming Ying,
Ding-Xuan Zhou
Abstract:
In this paper, we consider unregularized online learning algorithms in a Reproducing Kernel Hilbert Spaces (RKHS). Firstly, we derive explicit convergence rates of the unregularized online learning algorithms for classification associated with a general gamma-activating loss (see Definition 1 in the paper). Our results extend and refine the results in Ying and Pontil (2008) for the least-square lo…
▽ More
In this paper, we consider unregularized online learning algorithms in a Reproducing Kernel Hilbert Spaces (RKHS). Firstly, we derive explicit convergence rates of the unregularized online learning algorithms for classification associated with a general gamma-activating loss (see Definition 1 in the paper). Our results extend and refine the results in Ying and Pontil (2008) for the least-square loss and the recent result in Bach and Moulines (2011) for the loss function with a Lipschitz-continuous gradient. Moreover, we establish a very general condition on the step sizes which guarantees the convergence of the last iterate of such algorithms. Secondly, we establish, for the first time, the convergence of the unregularized pairwise learning algorithm with a general loss function and derive explicit rates under the assumption of polynomially decaying step sizes. Concrete examples are used to illustrate our main results. The main techniques are tools from convex analysis, refined inequalities of Gaussian averages, and an induction approach.
△ Less
Submitted 26 April, 2015; v1 submitted 2 March, 2015;
originally announced March 2015.
-
Online Pairwise Learning Algorithms with Kernels
Authors:
Yiming Ying,
Ding-Xuan Zhou
Abstract:
Pairwise learning usually refers to a learning task which involves a loss function depending on pairs of examples, among which most notable ones include ranking, metric learning and AUC maximization. In this paper, we study an online algorithm for pairwise learning with a least-square loss function in an unconstrained setting of a reproducing kernel Hilbert space (RKHS), which we refer to as the O…
▽ More
Pairwise learning usually refers to a learning task which involves a loss function depending on pairs of examples, among which most notable ones include ranking, metric learning and AUC maximization. In this paper, we study an online algorithm for pairwise learning with a least-square loss function in an unconstrained setting of a reproducing kernel Hilbert space (RKHS), which we refer to as the Online Pairwise lEaRning Algorithm (OPERA). In contrast to existing works \cite{Kar,Wang} which require that the iterates are restricted to a bounded domain or the loss function is strongly-convex, OPERA is associated with a non-strongly convex objective function and learns the target function in an unconstrained RKHS. Specifically, we establish a general theorem which guarantees the almost surely convergence for the last iterate of OPERA without any assumptions on the underlying distribution. Explicit convergence rates are derived under the condition of polynomially decaying step sizes. We also establish an interesting property for a family of widely-used kernels in the setting of pairwise learning and illustrate the above convergence results using such kernels. Our methodology mainly depends on the characterization of RKHSs using its associated integral operators and probability inequalities for random variables with values in a Hilbert space.
△ Less
Submitted 25 February, 2015;
originally announced February 2015.
-
Guaranteed Classification via Regularized Similarity Learning
Authors:
Zheng-Chu Guo,
Yiming Ying
Abstract:
Learning an appropriate (dis)similarity function from the available data is a central problem in machine learning, since the success of many machine learning algorithms critically depends on the choice of a similarity function to compare examples. Despite many approaches for similarity metric learning have been proposed, there is little theoretical study on the links between similarity met- ric le…
▽ More
Learning an appropriate (dis)similarity function from the available data is a central problem in machine learning, since the success of many machine learning algorithms critically depends on the choice of a similarity function to compare examples. Despite many approaches for similarity metric learning have been proposed, there is little theoretical study on the links between similarity met- ric learning and the classification performance of the result classifier. In this paper, we propose a regularized similarity learning formulation associated with general matrix-norms, and establish their generalization bounds. We show that the generalization error of the resulting linear separator can be bounded by the derived generalization bound of similarity learning. This shows that a good gen- eralization of the learnt similarity function guarantees a good classification of the resulting linear classifier. Our results extend and improve those obtained by Bellet at al. [3]. Due to the techniques dependent on the notion of uniform stability [6], the bound obtained there holds true only for the Frobenius matrix- norm regularization. Our techniques using the Rademacher complexity [5] and its related Khinchin-type inequality enable us to establish bounds for regularized similarity learning formulations associated with general matrix-norms including sparse L 1 -norm and mixed (2,1)-norm.
△ Less
Submitted 29 August, 2013; v1 submitted 13 June, 2013;
originally announced June 2013.
-
Generalization Bounds for Metric and Similarity Learning
Authors:
Qiong Cao,
Zheng-Chu Guo,
Yiming Ying
Abstract:
Recently, metric learning and similarity learning have attracted a large amount of interest. Many models and optimisation algorithms have been proposed. However, there is relatively little work on the generalization analysis of such methods. In this paper, we derive novel generalization bounds of metric and similarity learning. In particular, we first show that the generalization analysis reduces…
▽ More
Recently, metric learning and similarity learning have attracted a large amount of interest. Many models and optimisation algorithms have been proposed. However, there is relatively little work on the generalization analysis of such methods. In this paper, we derive novel generalization bounds of metric and similarity learning. In particular, we first show that the generalization analysis reduces to the estimation of the Rademacher average over "sums-of-i.i.d." sample-blocks related to the specific matrix norm. Then, we derive generalization bounds for metric/similarity learning with different matrix-norm regularisers by estimating their specific Rademacher complexities. Our analysis indicates that sparse metric/similarity learning with $L^1$-norm regularisation could lead to significantly better bounds than those with Frobenius-norm regularisation. Our novel generalization analysis develops and refines the techniques of U-statistics and Rademacher complexity analysis.
△ Less
Submitted 17 March, 2013; v1 submitted 23 July, 2012;
originally announced July 2012.