subscribe to arXiv mailings

arXiv:2405.20124 [pdf, other]

A Geometric Unification of Distributionally Robust Covariance Estimators: Shrinking the Spectrum by Inflating the Ambiguity Set

Authors: Man-Chung Yue, Yves Rychener, Daniel Kuhn, Viet Anh Nguyen

Abstract: The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically - without compelling theoretical justification - or optimally in view of restrictive distributional assumptions. In this paper, we propose a pri… ▽ More The state-of-the-art methods for estimating high-dimensional covariance matrices all shrink the eigenvalues of the sample covariance matrix towards a data-insensitive shrinkage target. The underlying shrinkage transformation is either chosen heuristically - without compelling theoretical justification - or optimally in view of restrictive distributional assumptions. In this paper, we propose a principled approach to construct covariance estimators without imposing restrictive assumptions. That is, we study distributionally robust covariance estimation problems that minimize the worst-case Frobenius error with respect to all data distributions close to a nominal distribution, where the proximity of distributions is measured via a divergence on the space of covariance matrices. We identify mild conditions on this divergence under which the resulting minimizers represent shrinkage estimators. We show that the corresponding shrinkage transformations are intimately related to the geometrical properties of the underlying divergence. We also prove that our robust estimators are efficiently computable and asymptotically consistent and that they enjoy finite-sample performance guarantees. We exemplify our general methodology by synthesizing explicit estimators induced by the Kullback-Leibler, Fisher-Rao, and Wasserstein divergences. Numerical experiments based on synthetic and real data show that our robust estimators are competitive with state-of-the-art estimators. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2404.06711 [pdf, other]

MathVC: An LLM-Simulated Multi-Character Virtual Classroom for Mathematics Education

Authors: Murong Yue, Wijdane Mifdal, Yixuan Zhang, Jennifer Suh, Ziyu Yao

Abstract: Mathematical modeling (MM) is considered a fundamental skill for students in STEM disciplines. Practicing the MM skill is often the most effective when students can engage in group discussion and collaborative problem-solving. However, due to unevenly distributed teachers and educational resources needed to monitor such group activities, students do not always receive equal opportunities for this… ▽ More Mathematical modeling (MM) is considered a fundamental skill for students in STEM disciplines. Practicing the MM skill is often the most effective when students can engage in group discussion and collaborative problem-solving. However, due to unevenly distributed teachers and educational resources needed to monitor such group activities, students do not always receive equal opportunities for this practice. Excitingly, large language models (LLMs) have recently demonstrated strong capability in both modeling mathematical problems and simulating characters with different traits and properties. Drawing inspiration from the advancement of LLMs, in this work, we present MATHVC, the very first LLM-powered virtual classroom containing multiple LLM-simulated student characters, with whom a human student can practice their MM skill. To encourage each LLM character's behaviors to be aligned with their specified math-relevant properties (termed "characteristics alignment") and the overall conversational procedure to be close to an authentic student MM discussion (termed "conversational procedural alignment"), we proposed three innovations: integrating MM domain knowledge into the simulation, defining a symbolic schema as the ground for character simulation, and designing a meta planner at the platform level to drive the conversational procedure. Through experiments and ablation studies, we confirmed the effectiveness of our simulation approach and showed the promise for MATHVC to benefit real-life students in the future. △ Less

Submitted 9 April, 2024; originally announced April 2024.

Comments: Work in progress

arXiv:2401.14095 [pdf, other]

Evaluating User Experience and Data Quality in a Gamified Data Collection for Appearance-Based Gaze Estimation

Authors: Mingtao Yue, Tomomi Sayuda, Miles Pennington, Yusuke Sugano

Abstract: Appearance-based gaze estimation, which uses only a regular camera to estimate human gaze, is important in various application fields. While the technique faces data bias issues, data collection protocol is often demanding, and collecting data from a wide range of participants is difficult. It is an important challenge to design opportunities that allow a diverse range of people to participate whi… ▽ More Appearance-based gaze estimation, which uses only a regular camera to estimate human gaze, is important in various application fields. While the technique faces data bias issues, data collection protocol is often demanding, and collecting data from a wide range of participants is difficult. It is an important challenge to design opportunities that allow a diverse range of people to participate while ensuring the quality of the training data. To tackle this challenge, we introduce a novel gamified approach for collecting training data. In this game, two players communicate words via eye gaze through a transparent letter board. Images captured during gameplay serve as valuable training data for gaze estimation models. The game is designed as a physical installation that involves communication between players, and it is expected to attract the interest of diverse participants. We assess the game's significance on data quality and user experience through a comparative user study. △ Less

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2312.07763 [pdf, other]

Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization

Authors: Min Zhang, Jianfeng He, Shuo Lei, Murong Yue, Linhang Wang, Chang-Tien Lu

Abstract: The meaning of complex phrases in natural language is composed of their individual components. The task of compositional generalization evaluates a model's ability to understand new combinations of components. Previous studies trained smaller, task-specific models, which exhibited poor generalization. While large language models (LLMs) exhibit impressive generalization abilities on many tasks thro… ▽ More The meaning of complex phrases in natural language is composed of their individual components. The task of compositional generalization evaluates a model's ability to understand new combinations of components. Previous studies trained smaller, task-specific models, which exhibited poor generalization. While large language models (LLMs) exhibit impressive generalization abilities on many tasks through in-context learning (ICL), their potential for compositional generalization remains unexplored. In this paper, we first empirically investigate prevailing ICL methods in compositional generalization. We find that they struggle with complex compositional questions due to cumulative errors in long reasoning steps and intricate logic required for tool-making. Consequently, we propose a human-guided tool manipulation framework (HTM) that generates tools for sub-questions and integrates multiple tools. Our method enhances the effectiveness of tool creation and usage with minimal human effort. Experiments show that our method achieves state-of-the-art performance on two compositional generalization benchmarks and outperforms existing methods on the most challenging test split by 70%. △ Less

Submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted by ICASSP 2024

arXiv:2311.11349 [pdf, other]

Coverage-Validity-Aware Algorithmic Recourse

Authors: Ngoc Bui, Duy Nguyen, Man-Chung Yue, Viet Anh Nguyen

Abstract: Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid respective to the present model may become invalid for the fut… ▽ More Algorithmic recourse emerges as a prominent technique to promote the explainability, transparency and hence ethics of machine learning models. Existing algorithmic recourse approaches often assume an invariant predictive model; however, the predictive model is usually updated upon the arrival of new data. Thus, a recourse that is valid respective to the present model may become invalid for the future model. To resolve this issue, we propose a novel framework to generate a model-agnostic recourse that exhibits robustness to model shifts. Our framework first builds a coverage-validity-aware linear surrogate of the nonlinear (black-box) model; then, the recourse is generated with respect to the linear surrogate. We establish a theoretical connection between our coverage-validity-aware linear surrogate and the minimax probability machines (MPM). We then prove that by prescribing different covariance robustness, the proposed framework recovers popular regularizations for MPM, including the $\ell_2$-regularization and class-reweighting. Furthermore, we show that our surrogate pushes the approximate hyperplane intuitively, facilitating not only robust but also interpretable recourses. The numerical results demonstrate the usefulness and robustness of our framework. △ Less

Submitted 19 November, 2023; originally announced November 2023.

arXiv:2310.03094 [pdf, other]

Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning

Authors: Murong Yue, Jie Zhao, Min Zhang, Liang Du, Ziyu Yao

Abstract: Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the in… ▽ More Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost. △ Less

Submitted 8 February, 2024; v1 submitted 4 October, 2023; originally announced October 2023.

Comments: Accepted to ICLR 2024

arXiv:2308.04030 [pdf, other]

Gentopia: A Collaborative Platform for Tool-Augmented LLMs

Authors: Binfeng Xu, Xukun Liu, Hua Shen, Zeyu Han, Yuhan Li, Murong Yue, Zhiyuan Peng, Yuchen Liu, Ziyu Yao, Dongkuan Xu

Abstract: Augmented Language Models (ALMs) empower large language models with the ability to use tools, transforming them into intelligent agents for real-world interactions. However, most existing frameworks for ALMs, to varying degrees, are deficient in the following critical features: flexible customization, collaborative democratization, and holistic evaluation. We present gentopia, an ALM framework ena… ▽ More Augmented Language Models (ALMs) empower large language models with the ability to use tools, transforming them into intelligent agents for real-world interactions. However, most existing frameworks for ALMs, to varying degrees, are deficient in the following critical features: flexible customization, collaborative democratization, and holistic evaluation. We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats, prompting modules, and plugins into a unified paradigm. Furthermore, we establish gentpool, a public platform enabling the registration and sharing of user-customized agents. Agents registered in gentpool are composable such that they can be assembled together for agent collaboration, advancing the democratization of artificial intelligence. To ensure high-quality agents, gentbench, an integral component of gentpool, is designed to thoroughly evaluate user-customized agents across diverse aspects such as safety, robustness, efficiency, etc. We release gentopia on Github and will continuously move forward. △ Less

Submitted 8 August, 2023; originally announced August 2023.

arXiv:2308.00655 [pdf, other]

doi 10.1145/3581783.3612201

Toward Zero-shot Character Recognition: A Gold Standard Dataset with Radical-level Annotations

Authors: Xiaolei Diao, Daqian Shi, Jian Li, Lida Shi, Mingzhe Yue, Ruihua Qi, Chuntao Li, Hao Xu

Abstract: Optical character recognition (OCR) methods have been applied to diverse tasks, e.g., street view text recognition and document analysis. Recently, zero-shot OCR has piqued the interest of the research community because it considers a practical OCR scenario with unbalanced data distribution. However, there is a lack of benchmarks for evaluating such zero-shot methods that apply a divide-and-conque… ▽ More Optical character recognition (OCR) methods have been applied to diverse tasks, e.g., street view text recognition and document analysis. Recently, zero-shot OCR has piqued the interest of the research community because it considers a practical OCR scenario with unbalanced data distribution. However, there is a lack of benchmarks for evaluating such zero-shot methods that apply a divide-and-conquer recognition strategy by decomposing characters into radicals. Meanwhile, radical recognition, as another important OCR task, also lacks radical-level annotation for model training. In this paper, we construct an ancient Chinese character image dataset that contains both radical-level and character-level annotations to satisfy the requirements of the above-mentioned methods, namely, ACCID, where radical-level annotations include radical categories, radical locations, and structural relations. To increase the adaptability of ACCID, we propose a splicing-based synthetic character algorithm to augment the training samples and apply an image denoising method to improve the image quality. By introducing character decomposition and recombination, we propose a baseline method for zero-shot OCR. The experimental results demonstrate the validity of ACCID and the baseline model quantitatively and qualitatively. △ Less

Submitted 1 August, 2023; originally announced August 2023.

Comments: Accepted by ACM MM 2023

arXiv:2307.01090 [pdf, other]

doi 10.1051/0004-6361/202347332

Streamlined Lensed Quasar Identification in Multiband Images via Ensemble Networks

Authors: Irham Taufik Andika, Sherry H. Suyu, Raoul Cañameras, Alejandra Melo, Stefan Schuldt, Yiping Shu, Anna-Christina Eilers, Anton Timur Jaelani, Minghao Yue

Abstract: Quasars experiencing strong lensing offer unique viewpoints on subjects related to the cosmic expansion rate, the dark matter profile within the foreground deflectors, and the quasar host galaxies. Unfortunately, identifying them in astronomical images is challenging since they are overwhelmed by the abundance of non-lenses. To address this, we have developed a novel approach by ensembling cutting… ▽ More Quasars experiencing strong lensing offer unique viewpoints on subjects related to the cosmic expansion rate, the dark matter profile within the foreground deflectors, and the quasar host galaxies. Unfortunately, identifying them in astronomical images is challenging since they are overwhelmed by the abundance of non-lenses. To address this, we have developed a novel approach by ensembling cutting-edge convolutional networks (CNNs) -- for instance, ResNet, Inception, NASNet, MobileNet, EfficientNet, and RegNet -- along with vision transformers (ViTs) trained on realistic galaxy-quasar lens simulations based on the Hyper Suprime-Cam (HSC) multiband images. While the individual model exhibits remarkable performance when evaluated against the test dataset, achieving an area under the receiver operating characteristic curve of $>$97.3% and a median false positive rate of 3.6%, it struggles to generalize in real data, indicated by numerous spurious sources picked by each classifier. A significant improvement is achieved by averaging these CNNs and ViTs, resulting in the impurities being downsized by factors up to 50. Subsequently, combining the HSC images with the UKIRT, VISTA, and unWISE data, we retrieve approximately 60 million sources as parent samples and reduce this to 892,609 after employing a photometry preselection to discover $z>1.5$ lensed quasars with Einstein radii of $θ_\mathrm{E}<5$ arcsec. Afterward, the ensemble classifier indicates 3080 sources with a high probability of being lenses, for which we visually inspect, yielding 210 prevailing candidates awaiting spectroscopic confirmation. These outcomes suggest that automated deep learning pipelines hold great potential in effectively detecting strong lenses in vast datasets with minimal manual visual inspection involved. △ Less

Submitted 18 August, 2023; v1 submitted 3 July, 2023; originally announced July 2023.

Comments: Accepted for publication in the Astronomy & Astrophysics journal. 28 pages, 11 figures, and 3 tables. We welcome comments from the reader

Journal ref: A&A 678, A103 (2023)

arXiv:2302.06089 [pdf, other]

Federated attention consistent learning models for prostate cancer diagnosis and Gleason grading

Authors: Fei Kong, Xiyue Wang, Jinxi Xiang, Sen Yang, Xinran Wang, Meng Yue, Jun Zhang, Junhan Zhao, Xiao Han, Yuhan Dong, Biyue Zhu, Fang Wang, Yueping Liu

Abstract: Artificial intelligence (AI) holds significant promise in transforming medical imaging, enhancing diagnostics, and refining treatment strategies. However, the reliance on extensive multicenter datasets for training AI models poses challenges due to privacy concerns. Federated learning provides a solution by facilitating collaborative model training across multiple centers without sharing raw data.… ▽ More Artificial intelligence (AI) holds significant promise in transforming medical imaging, enhancing diagnostics, and refining treatment strategies. However, the reliance on extensive multicenter datasets for training AI models poses challenges due to privacy concerns. Federated learning provides a solution by facilitating collaborative model training across multiple centers without sharing raw data. This study introduces a federated attention-consistent learning (FACL) framework to address challenges associated with large-scale pathological images and data heterogeneity. FACL enhances model generalization by maximizing attention consistency between local clients and the server model. To ensure privacy and validate robustness, we incorporated differential privacy by introducing noise during parameter transfer. We assessed the effectiveness of FACL in cancer diagnosis and Gleason grading tasks using 19,461 whole-slide images of prostate cancer from multiple centers. In the diagnosis task, FACL achieved an area under the curve (AUC) of 0.9718, outperforming seven centers with an average AUC of 0.9499 when categories are relatively balanced. For the Gleason grading task, FACL attained a Kappa score of 0.8463, surpassing the average Kappa score of 0.7379 from six centers. In conclusion, FACL offers a robust, accurate, and cost-effective AI training model for prostate cancer pathology while maintaining effective data safeguards. △ Less

Submitted 28 March, 2024; v1 submitted 12 February, 2023; originally announced February 2023.

Comments: 14 pages

arXiv:2301.12538 [pdf, other]

On Approximating the Dynamic Response of Synchronous Generators via Operator Learning: A Step Towards Building Deep Operator-based Power Grid Simulators

Authors: Christian Moya, Guang Lin, Tianqiao Zhao, Meng Yue

Abstract: This paper designs an Operator Learning framework to approximate the dynamic response of synchronous generators. One can use such a framework to (i) design a neural-based generator model that can interact with a numerical simulator of the rest of the power grid or (ii) shadow the generator's transient response. To this end, we design a data-driven Deep Operator Network~(DeepONet) that approximates… ▽ More This paper designs an Operator Learning framework to approximate the dynamic response of synchronous generators. One can use such a framework to (i) design a neural-based generator model that can interact with a numerical simulator of the rest of the power grid or (ii) shadow the generator's transient response. To this end, we design a data-driven Deep Operator Network~(DeepONet) that approximates the generators' infinite-dimensional solution operator. Then, we develop a DeepONet-based numerical scheme to simulate a given generator's dynamic response over a short/medium-term horizon. The proposed numerical scheme recursively employs the trained DeepONet to simulate the response for a given multi-dimensional input, which describes the interaction between the generator and the rest of the system. Furthermore, we develop a residual DeepONet numerical scheme that incorporates information from mathematical models of synchronous generators. We accompany this residual DeepONet scheme with an estimate for the prediction's cumulative error. We also design a data aggregation (DAgger) strategy that allows (i) employing supervised learning to train the proposed DeepONets and (ii) fine-tuning the DeepONet using aggregated training data that the DeepONet is likely to encounter during interactive simulations with other grid components. Finally, as a proof of concept, we demonstrate that the proposed DeepONet frameworks can effectively approximate the transient model of a synchronous generator. △ Less

Submitted 29 January, 2023; originally announced January 2023.

arXiv:2209.13268 [pdf, other]

Approximate Secular Equations for the Cubic Regularization Subproblem

Authors: Yihang Gao, Man-Chung Yue, Michael K. Ng

Abstract: The cubic regularization method (CR) is a popular algorithm for unconstrained non-convex optimization. At each iteration, CR solves a cubically regularized quadratic problem, called the cubic regularization subproblem (CRS). One way to solve the CRS relies on solving the secular equation, whose computational bottleneck lies in the computation of all eigenvalues of the Hessian matrix. In this paper… ▽ More The cubic regularization method (CR) is a popular algorithm for unconstrained non-convex optimization. At each iteration, CR solves a cubically regularized quadratic problem, called the cubic regularization subproblem (CRS). One way to solve the CRS relies on solving the secular equation, whose computational bottleneck lies in the computation of all eigenvalues of the Hessian matrix. In this paper, we propose and analyze a novel CRS solver based on an approximate secular equation, which requires only some of the Hessian eigenvalues and is therefore much more efficient. Two approximate secular equations (ASEs) are developed. For both ASEs, we first study the existence and uniqueness of their roots and then establish an upper bound on the gap between the root and that of the standard secular equation. Such an upper bound can in turn be used to bound the distance from the approximate CRS solution based ASEs to the true CRS solution, thus offering a theoretical guarantee for our CRS solver. A desirable feature of our CRS solver is that it requires only matrix-vector multiplication but not matrix inversion, which makes it particularly suitable for high-dimensional applications of unconstrained non-convex optimization, such as low-rank recovery and deep learning. Numerical experiments with synthetic and real data-sets are conducted to investigate the practical performance of the proposed CRS solver. Experimental results show that the proposed solver outperforms two state-of-the-art methods. △ Less

Submitted 27 September, 2022; originally announced September 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2209.10622 [pdf, other]

DeepGraphONet: A Deep Graph Operator Network to Learn and Zero-shot Transfer the Dynamic Response of Networked Systems

Authors: Yixuan Sun, Christian Moya, Guang Lin, Meng Yue

Abstract: This paper develops a Deep Graph Operator Network (DeepGraphONet) framework that learns to approximate the dynamics of a complex system (e.g. the power grid or traffic) with an underlying sub-graph structure. We build our DeepGraphONet by fusing the ability of (i) Graph Neural Networks (GNN) to exploit spatially correlated graph information and (ii) Deep Operator Networks~(DeepONet) to approximate… ▽ More This paper develops a Deep Graph Operator Network (DeepGraphONet) framework that learns to approximate the dynamics of a complex system (e.g. the power grid or traffic) with an underlying sub-graph structure. We build our DeepGraphONet by fusing the ability of (i) Graph Neural Networks (GNN) to exploit spatially correlated graph information and (ii) Deep Operator Networks~(DeepONet) to approximate the solution operator of dynamical systems. The resulting DeepGraphONet can then predict the dynamics within a given short/medium-term time horizon by observing a finite history of the graph state information. Furthermore, we design our DeepGraphONet to be resolution-independent. That is, we do not require the finite history to be collected at the exact/same resolution. In addition, to disseminate the results from a trained DeepGraphONet, we design a zero-shot learning strategy that enables using it on a different sub-graph. Finally, empirical results on the (i) transient stability prediction problem of power grids and (ii) traffic flow forecasting problem of a vehicular system illustrate the effectiveness of the proposed DeepGraphONet. △ Less

Submitted 21 September, 2022; originally announced September 2022.

arXiv:2208.14259 [pdf, other]

RIS-Aided Multiuser MIMO-OFDM with Linear Precoding and Iterative Detection: Analysis and Optimization

Authors: Mingyang Yue, Lei Liu, Xiaojun Yuan

Abstract: In this paper, we consider a reconfigurable intelligence surface (RIS) aided uplink multiuser multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) system, where the receiver is assumed to conduct low-complexity iterative detection. We aim to minimize the total transmit power by jointly designing the precoder of the transmitter and the passive beamforming of the RIS. Th… ▽ More In this paper, we consider a reconfigurable intelligence surface (RIS) aided uplink multiuser multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM) system, where the receiver is assumed to conduct low-complexity iterative detection. We aim to minimize the total transmit power by jointly designing the precoder of the transmitter and the passive beamforming of the RIS. This problem can be tackled from the perspective of information theory. But this information-theoretic approach may involve prohibitively high complexity since the number of rate constraints that specify the capacity region of the uplink multiuser channel is exponential in the number of users. To avoid this difficulty, we formulate the design problem of the iterative receiver under the constraints of a maximal iteration number and target bit error rates of users. To tackle this challenging problem, we propose a groupwise successive interference cancellation (SIC) optimization approach, where the signals of users are decoded and cancelled in a group-by-group manner. We present a heuristic user grouping strategy, and resort to the alternating optimization technique to iteratively solve the precoding and passive beamforming sub-problems. Specifically, for the precoding sub-problem, we employ fractional programming to convert it to a convex problem; for the passive beamforming sub-problem, we adopt successive convex approximation to deal with the unit-modulus constraints of the RIS. We show that the proposed groupwise SIC approach has significant advantages in both performance and computational complexity, as compared with the counterpart approaches. △ Less

Submitted 30 August, 2022; originally announced August 2022.

arXiv:2206.10833 [pdf, other]

Robust Bayesian Recourse

Authors: Tuan-Duy H. Nguyen, Ngoc Bui, Duy Nguyen, Man-Chung Yue, Viet Anh Nguyen

Abstract: Algorithmic recourse aims to recommend an informative feedback to overturn an unfavorable machine learning decision. We introduce in this paper the Bayesian recourse, a model-agnostic recourse that minimizes the posterior probability odds ratio. Further, we present its min-max robust counterpart with the goal of hedging against future changes in the machine learning model parameters. The robust co… ▽ More Algorithmic recourse aims to recommend an informative feedback to overturn an unfavorable machine learning decision. We introduce in this paper the Bayesian recourse, a model-agnostic recourse that minimizes the posterior probability odds ratio. Further, we present its min-max robust counterpart with the goal of hedging against future changes in the machine learning model parameters. The robust counterpart explicitly takes into account possible perturbations of the data in a Gaussian mixture ambiguity set prescribed using the optimal transport (Wasserstein) distance. We show that the resulting worst-case objective function can be decomposed into solving a series of two-dimensional optimization subproblems, and the min-max recourse finding problem is thus amenable to a gradient descent algorithm. Contrary to existing methods for generating robust recourses, the robust Bayesian recourse does not require a linear approximation step. The numerical experiment demonstrates the effectiveness of our proposed robust Bayesian recourse facing model shifts. Our code is available at https://github.com/VinAIResearch/robust-bayesian-recourse. △ Less

Submitted 22 June, 2022; originally announced June 2022.

Comments: Accepted to UAI'22

arXiv:2202.07176 [pdf, other]

DeepONet-Grid-UQ: A Trustworthy Deep Operator Framework for Predicting the Power Grid's Post-Fault Trajectories

Authors: Christian Moya, Shiqi Zhang, Meng Yue, Guang Lin

Abstract: This paper proposes a new data-driven method for the reliable prediction of power system post-fault trajectories. The proposed method is based on the fundamentally new concept of Deep Operator Networks (DeepONets). Compared to traditional neural networks that learn to approximate functions, DeepONets are designed to approximate nonlinear operators. Under this operator framework, we design a DeepON… ▽ More This paper proposes a new data-driven method for the reliable prediction of power system post-fault trajectories. The proposed method is based on the fundamentally new concept of Deep Operator Networks (DeepONets). Compared to traditional neural networks that learn to approximate functions, DeepONets are designed to approximate nonlinear operators. Under this operator framework, we design a DeepONet to (1) take as inputs the fault-on trajectories collected, for example, via simulation or phasor measurement units, and (2) provide as outputs the predicted post-fault trajectories. In addition, we endow our method with a much-needed ability to balance efficiency with reliable/trustworthy predictions via uncertainty quantification. To this end, we propose and compare two methods that enable quantifying the predictive uncertainty. First, we propose a \textit{Bayesian DeepONet} (B-DeepONet) that uses stochastic gradient Hamiltonian Monte-Carlo to sample from the posterior distribution of the DeepONet parameters. Then, we propose a \textit{Probabilistic DeepONet} (Prob-DeepONet) that uses a probabilistic training strategy to equip DeepONets with a form of automated uncertainty quantification, at virtually no extra computational cost. Finally, we validate the predictive power and uncertainty quantification capability of the proposed B-DeepONet and Prob-DeepONet using the IEEE 16-machine 68-bus system. △ Less

Submitted 14 February, 2022; originally announced February 2022.

arXiv:2202.03071 [pdf, other]

Distributionally Robust Fair Principal Components via Geodesic Descents

Authors: Hieu Vu, Toan Tran, Man-Chung Yue, Viet Anh Nguyen

Abstract: Principal component analysis is a simple yet useful dimensionality reduction technique in modern machine learning pipelines. In consequential domains such as college admission, healthcare and credit approval, it is imperative to take into account emerging criteria such as the fairness and the robustness of the learned projection. In this paper, we propose a distributionally robust optimization pro… ▽ More Principal component analysis is a simple yet useful dimensionality reduction technique in modern machine learning pipelines. In consequential domains such as college admission, healthcare and credit approval, it is imperative to take into account emerging criteria such as the fairness and the robustness of the learned projection. In this paper, we propose a distributionally robust optimization problem for principal component analysis which internalizes a fairness criterion in the objective function. The learned projection thus balances the trade-off between the total reconstruction error and the reconstruction error gap between subgroups, taken in the min-max sense over all distributions in a moment-based ambiguity set. The resulting optimization problem over the Stiefel manifold can be efficiently solved by a Riemannian subgradient descent algorithm with a sub-linear convergence rate. Our experimental results on real-world datasets show the merits of our proposed method over state-of-the-art baselines. △ Less

Submitted 7 February, 2022; originally announced February 2022.

Comments: International Conference on Learning Representations (ICLR) 2022

arXiv:2201.09145 [pdf, other]

glassoformer: a query-sparse transformer for post-fault power grid voltage prediction

Authors: Yunling Zheng, Carson Hu, Guang Lin, Meng Yue, Bao Wang, Jack Xin

Abstract: We propose GLassoformer, a novel and efficient transformer architecture leveraging group Lasso regularization to reduce the number of queries of the standard self-attention mechanism. Due to the sparsified queries, GLassoformer is more computationally efficient than the standard transformers. On the power grid post-fault voltage prediction task, GLassoformer shows remarkably better prediction than… ▽ More We propose GLassoformer, a novel and efficient transformer architecture leveraging group Lasso regularization to reduce the number of queries of the standard self-attention mechanism. Due to the sparsified queries, GLassoformer is more computationally efficient than the standard transformers. On the power grid post-fault voltage prediction task, GLassoformer shows remarkably better prediction than many existing benchmark algorithms in terms of accuracy and stability. △ Less

Submitted 22 January, 2022; originally announced January 2022.

arXiv:2109.12846 [pdf, other]

HAGEN: Homophily-Aware Graph Convolutional Recurrent Network for Crime Forecasting

Authors: Chenyu Wang, Zongyu Lin, Xiaochen Yang, Jiao Sun, Mingxuan Yue, Cyrus Shahabi

Abstract: The crime forecasting is an important problem as it greatly contributes to urban safety. Typically, the goal of the problem is to predict different types of crimes for each geographical region (like a neighborhood or censor tract) in the near future. Since nearby regions usually have similar socioeconomic characteristics which indicate similar crime patterns, recent state-of-the-art solutions cons… ▽ More The crime forecasting is an important problem as it greatly contributes to urban safety. Typically, the goal of the problem is to predict different types of crimes for each geographical region (like a neighborhood or censor tract) in the near future. Since nearby regions usually have similar socioeconomic characteristics which indicate similar crime patterns, recent state-of-the-art solutions constructed a distance-based region graph and utilized Graph Neural Network (GNN) techniques for crime forecasting, because the GNN techniques could effectively exploit the latent relationships between neighboring region nodes in the graph. However, this distance-based pre-defined graph cannot fully capture crime correlation between regions that are far from each other but share similar crime patterns. Hence, to make an accurate crime prediction, the main challenge is to learn a better graph that reveals the dependencies between regions in crime occurrences and meanwhile captures the temporal patterns from historical crime records. To address these challenges, we propose an end-to-end graph convolutional recurrent network called HAGEN with several novel designs for crime prediction. Specifically, our framework could jointly capture the crime correlation between regions and the temporal crime dynamics by combining an adaptive region graph learning module with the Diffusion Convolution Gated Recurrent Unit (DCGRU). Based on the homophily assumption of GNN, we propose a homophily-aware constraint to regularize the optimization of the region graph so that neighboring region nodes on the learned graph share similar crime patterns, thus fitting the mechanism of diffusion convolution. It also incorporates crime embedding to model the interdependencies between regions and crime categories. Empirical experiments and comprehensive analysis on two real-world datasets showcase the effectiveness of HAGEN. △ Less

Submitted 27 September, 2021; originally announced September 2021.

arXiv:2106.00322 [pdf, other]

Sequential Domain Adaptation by Synthesizing Distributionally Robust Experts

Authors: Bahar Taskesen, Man-Chung Yue, Jose Blanchet, Daniel Kuhn, Viet Anh Nguyen

Abstract: Least squares estimators, when trained on a few target domain samples, may predict poorly. Supervised domain adaptation aims to improve the predictive accuracy by exploiting additional labeled training samples from a source distribution that is close to the target distribution. Given available data, we investigate novel strategies to synthesize a family of least squares estimator experts that are… ▽ More Least squares estimators, when trained on a few target domain samples, may predict poorly. Supervised domain adaptation aims to improve the predictive accuracy by exploiting additional labeled training samples from a source distribution that is close to the target distribution. Given available data, we investigate novel strategies to synthesize a family of least squares estimator experts that are robust with regard to moment conditions. When these moment conditions are specified using Kullback-Leibler or Wasserstein-type divergences, we can find the robust estimators efficiently using convex optimization. We use the Bernstein online aggregation algorithm on the proposed family of robust experts to generate predictions for the sequential stream of target test samples. Numerical experiments on real data show that the robust strategies may outperform non-robust interpolations of the empirical least squares estimators. △ Less

Submitted 1 June, 2021; originally announced June 2021.

arXiv:2104.05169 [pdf, other]

doi 10.1109/TWC.2022.3153106

MIMO-OFDM-Based Massive Connectivity With Frequency Selectivity Compensation

Authors: Wenjun Jiang, Mingyang Yue, Xiaojun Yuan, Yong Zuo

Abstract: In this paper, we study how to efficiently and reliably detect active devices and estimate their channels in a multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) based grant-free non-orthogonal multiple access (NOMA) system to enable massive machine-type communications (mMTC). First, by exploiting the correlation of the channel frequency responses in narrow-ban… ▽ More In this paper, we study how to efficiently and reliably detect active devices and estimate their channels in a multiple-input multiple-output (MIMO) orthogonal frequency-division multiplexing (OFDM) based grant-free non-orthogonal multiple access (NOMA) system to enable massive machine-type communications (mMTC). First, by exploiting the correlation of the channel frequency responses in narrow-band mMTC, we propose a block-wise linear channel model. Specifically, the continuous OFDM subcarriers in the narrow-band are divided into several sub-blocks and a linear function with only two variables (mean and slope) is used to approximate the frequency-selective channel in each sub-block. This significantly reduces the number of variables to be determined in channel estimation and the sub-block number can be adjusted to reliably compensate the channel frequency-selectivity. Second, we formulate the joint active device detection and channel estimation in the block-wise linear system as a Bayesian inference problem. By exploiting the block-sparsity of the channel matrix, we develop an efficient turbo message passing (Turbo-MP) algorithm to resolve the Bayesian inference problem with near-linear complexity. We further incorporate machine learning approaches into Turbo-MP to learn unknown prior parameters. Numerical results demonstrate the superior performance of the proposed algorithm over state-of-the-art algorithms. △ Less

Submitted 11 April, 2021; originally announced April 2021.

Journal ref: IEEE Transactions on Wireless Communications, vol. 21, no. 9, pp. 6920-6934, Sept. 2022

arXiv:2101.10386 [pdf, other]

ProbLock: Probability-based Logic Locking

Authors: Michael Yue, Fatemeh Tehranipoor

Abstract: Integrated circuit (IC) piracy and overproduction are serious issues that threaten the security and integrity of a system. Logic locking is a type of hardware obfuscation technique where additional key gates are inserted into the circuit. Only the correct key can unlock the functionality of that circuit otherwise the system produces the wrong output. In an effort to hinder these threats on ICs, we… ▽ More Integrated circuit (IC) piracy and overproduction are serious issues that threaten the security and integrity of a system. Logic locking is a type of hardware obfuscation technique where additional key gates are inserted into the circuit. Only the correct key can unlock the functionality of that circuit otherwise the system produces the wrong output. In an effort to hinder these threats on ICs, we have developed a probability-based logic locking technique to protect the design of a circuit. Our proposed technique called "ProbLock" can be applied to combinational and sequential circuits through a critical selection process. We used a filtering process to select the best location of key gates based on various constraints. Each step in the filtering process generates a subset of nodes for each constraint. We also analyzed the correlation between each constraint and adjusted the strength of the constraints before inserting key gates. We have tested our algorithm on 40 benchmarks from the ISCAS '85 and ISCAS '89 suite. △ Less

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2009.07514 [pdf, other]

A Unified Approach to Synchronization Problems over Subgroups of the Orthogonal Group

Authors: Huikang Liu, Man-Chung Yue, Anthony Man-Cho So

Abstract: The problem of synchronization over a group $\mathcal{G}$ aims to estimate a collection of group elements $G^*_1, \dots, G^*_n \in \mathcal{G}$ based on noisy observations of a subset of all pairwise ratios of the form $G^*_i {G^*_j}^{-1}$. Such a problem has gained much attention recently and finds many applications across a wide range of scientific and engineering areas. In this paper, we consid… ▽ More The problem of synchronization over a group $\mathcal{G}$ aims to estimate a collection of group elements $G^*_1, \dots, G^*_n \in \mathcal{G}$ based on noisy observations of a subset of all pairwise ratios of the form $G^*_i {G^*_j}^{-1}$. Such a problem has gained much attention recently and finds many applications across a wide range of scientific and engineering areas. In this paper, we consider the class of synchronization problems in which the group is a closed subgroup of the orthogonal group. This class covers many group synchronization problems that arise in practice. Our contribution is fivefold. First, we propose a unified approach for solving this class of group synchronization problems, which consists of a suitable initialization step and an iterative refinement step based on the generalized power method, and show that it enjoys a strong theoretical guarantee on the estimation error under certain assumptions on the group, measurement graph, noise, and initialization. Second, we formulate two geometric conditions that are required by our approach and show that they hold for various practically relevant subgroups of the orthogonal group. The conditions are closely related to the error-bound geometry of the subgroup -- an important notion in optimization. Third, we verify the assumptions on the measurement graph and noise for standard random graph and random matrix models. Fourth, based on the classic notion of metric entropy, we develop and analyze a novel spectral-type estimator. Finally, we show via extensive numerical experiments that our proposed non-convex approach outperforms existing approaches in terms of computational speed, scalability, and/or estimation error. △ Less

Submitted 16 June, 2023; v1 submitted 16 September, 2020; originally announced September 2020.

arXiv:2004.07162 [pdf, ps, other]

On Linear Optimization over Wasserstein Balls

Authors: Man-Chung Yue, Daniel Kuhn, Wolfram Wiesemann

Abstract: Wasserstein balls, which contain all probability measures within a pre-specified Wasserstein distance to a reference measure, have recently enjoyed wide popularity in the distributionally robust optimization and machine learning communities to formulate and solve data-driven optimization problems with rigorous statistical guarantees. In this technical note we prove that the Wasserstein ball is wea… ▽ More Wasserstein balls, which contain all probability measures within a pre-specified Wasserstein distance to a reference measure, have recently enjoyed wide popularity in the distributionally robust optimization and machine learning communities to formulate and solve data-driven optimization problems with rigorous statistical guarantees. In this technical note we prove that the Wasserstein ball is weakly compact under mild conditions, and we offer necessary and sufficient conditions for the existence of optimal solutions. We also characterize the sparsity of solutions if the Wasserstein ball is centred at a discrete reference measure. In comparison with the existing literature, which has proved similar results under different conditions, our proofs are self-contained and shorter, yet mathematically rigorous, and our necessary and sufficient conditions for the existence of optimal solutions are easily verifiable in practice. △ Less

Submitted 6 June, 2021; v1 submitted 15 April, 2020; originally announced April 2020.

arXiv:2003.01351 [pdf, other]

DETECT: Deep Trajectory Clustering for Mobility-Behavior Analysis

Authors: Mingxuan Yue, Yaguang Li, Haoze Yang, Ritesh Ahuja, Yao-Yi Chiang, Cyrus Shahabi

Abstract: Identifying mobility behaviors in rich trajectory data is of great economic and social interest to various applications including urban planning, marketing and intelligence. Existing work on trajectory clustering often relies on similarity measurements that utilize raw spatial and/or temporal information of trajectories. These measures are incapable of identifying similar moving behaviors that exh… ▽ More Identifying mobility behaviors in rich trajectory data is of great economic and social interest to various applications including urban planning, marketing and intelligence. Existing work on trajectory clustering often relies on similarity measurements that utilize raw spatial and/or temporal information of trajectories. These measures are incapable of identifying similar moving behaviors that exhibit varying spatio-temporal scales of movement. In addition, the expense of labeling massive trajectory data is a barrier to supervised learning models. To address these challenges, we propose an unsupervised neural approach for mobility behavior clustering, called the Deep Embedded TrajEctory ClusTering network (DETECT). DETECT operates in three parts: first it transforms the trajectories by summarizing their critical parts and augmenting them with context derived from their geographical locality (e.g., using POIs from gazetteers). In the second part, it learns a powerful representation of trajectories in the latent space of behaviors, thus enabling a clustering function (such as $k$-means) to be applied. Finally, a clustering oriented loss is directly built on the embedded features to jointly perform feature refinement and cluster assignment, thus improving separability between mobility behaviors. Exhaustive quantitative and qualitative experiments on two real-world datasets demonstrate the effectiveness of our approach for mobility behavior analyses. △ Less

Submitted 3 March, 2020; originally announced March 2020.

Comments: Published as a conference paper at BigData 2019

arXiv:1910.10583 [pdf, other]

Optimistic Distributionally Robust Optimization for Nonparametric Likelihood Approximation

Authors: Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh, Man-Chung Yue, Daniel Kuhn, Wolfram Wiesemann

Abstract: The likelihood function is a fundamental component in Bayesian statistics. However, evaluating the likelihood of an observation is computationally intractable in many applications. In this paper, we propose a non-parametric approximation of the likelihood that identifies a probability measure which lies in the neighborhood of the nominal measure and that maximizes the probability of observing the… ▽ More The likelihood function is a fundamental component in Bayesian statistics. However, evaluating the likelihood of an observation is computationally intractable in many applications. In this paper, we propose a non-parametric approximation of the likelihood that identifies a probability measure which lies in the neighborhood of the nominal measure and that maximizes the probability of observing the given sample point. We show that when the neighborhood is constructed by the Kullback-Leibler divergence, by moment conditions or by the Wasserstein distance, then our \textit{optimistic likelihood} can be determined through the solution of a convex optimization problem, and it admits an analytical expression in particular cases. We also show that the posterior inference problem with our optimistic likelihood approximation enjoys strong theoretical performance guarantees, and it performs competitively in a probabilistic classification task. △ Less

Submitted 23 October, 2019; originally announced October 2019.

arXiv:1910.07817 [pdf, other]

Calculating Optimistic Likelihoods Using (Geodesically) Convex Optimization

Authors: Viet Anh Nguyen, Soroosh Shafieezadeh-Abadeh, Man-Chung Yue, Daniel Kuhn, Wolfram Wiesemann

Abstract: A fundamental problem arising in many areas of machine learning is the evaluation of the likelihood of a given observation under different nominal distributions. Frequently, these nominal distributions are themselves estimated from data, which makes them susceptible to estimation errors. We thus propose to replace each nominal distribution with an ambiguity set containing all distributions in its… ▽ More A fundamental problem arising in many areas of machine learning is the evaluation of the likelihood of a given observation under different nominal distributions. Frequently, these nominal distributions are themselves estimated from data, which makes them susceptible to estimation errors. We thus propose to replace each nominal distribution with an ambiguity set containing all distributions in its vicinity and to evaluate an \emph{optimistic likelihood}, that is, the maximum of the likelihood over all distributions in the ambiguity set. When the proximity of distributions is quantified by the Fisher-Rao distance or the Kullback-Leibler divergence, the emerging optimistic likelihoods can be computed efficiently using either geodesic or standard convex optimization techniques. We showcase the advantages of working with optimistic likelihoods on a classification problem using synthetic as well as empirical data. △ Less

Submitted 17 October, 2019; originally announced October 2019.

arXiv:1511.04244 [pdf, other]

doi 10.1109/LSP.2016.2569600

A Robust Design for MISO Physical-Layer Multicasting over Line-of-Sight Channels

Authors: Man-Chung Yue, Sissi Xiaoxiao Wu, Anthony Man-Cho So

Abstract: This paper studies a robust design problem for far-field line-of-sight (LOS) channels where phase errors are present. Compared with the commonly used additive error model, the phase error model is more suitable for capturing the uncertainty in an LOS channel, as the dominant source of uncertainty lies in the phase. We consider a multiple-input single-output (MISO) multicast scenario, in which our… ▽ More This paper studies a robust design problem for far-field line-of-sight (LOS) channels where phase errors are present. Compared with the commonly used additive error model, the phase error model is more suitable for capturing the uncertainty in an LOS channel, as the dominant source of uncertainty lies in the phase. We consider a multiple-input single-output (MISO) multicast scenario, in which our goal is to design a beamformer that minimizes the transmit power while satisfying probabilistic signal-to-noise ratio (SNR) constraints. The probabilistic constraints give rise to a new computational challenge, as they involve random trigonometric forms. In this work, we propose to first approximate the random trigonometric form by its second-order Taylor expansion and then tackle the resulting random quadratic form using a Bernstein-type inequality. The advantage of such an approach is that an approximately optimal beamformer can be obtained using the standard semidefinite relaxation technique. In the simulations, we first show that if a non-robust design (i.e., one that does not take phase errors into account) is used, then the whole system may collapse. We then show that our proposed method is less conservative than the existing robust design based on Gaussian approximation and thus requires a lower power budget. △ Less

Submitted 13 November, 2015; originally announced November 2015.

Comments: This manuscript is submitted for possible journal publication on 13-Nov-2015

arXiv:1405.4890 [pdf]

A Revised Incremental Conductance MPPT Algorithm for Solar PV Generation Systems

Authors: Meng Yue, Xiaoyu Wang

Abstract: A revised Incremental Conductance (IncCond) maximum power point tracking (MPPT) algorithm for PV generation systems is proposed in this paper. The commonly adopted traditional IncCond method uses a constant step size for voltage adjustment and is difficult to achieve both a good tracking performance and quick elimination of the oscillations, especially under the dramatic changes of the environment… ▽ More A revised Incremental Conductance (IncCond) maximum power point tracking (MPPT) algorithm for PV generation systems is proposed in this paper. The commonly adopted traditional IncCond method uses a constant step size for voltage adjustment and is difficult to achieve both a good tracking performance and quick elimination of the oscillations, especially under the dramatic changes of the environment conditions. For the revised algorithm, the incremental voltage change step size is adaptively adjusted based on the slope of the power-voltage (P-V) curve. An accelerating factor and a decelerating factor are further applied to adjust the voltage step change considering whether the sign of the P-V curve slope remains the same or not in a subsequent tracking step. In addition, the upper bound of the maximum voltage step change is also updated considering the information of sign changes. The revised MPPT algorithm can quickly track the maximum power points (MPPs) and remove the oscillation of the actual operation points around the real MPPs. The effectiveness of the revised algorithm is demonstrated using a simulation. △ Less

Submitted 19 May, 2014; originally announced May 2014.

arXiv:1209.0377 [pdf, ps, other]

A Perturbation Inequality for the Schatten-$p$ Quasi-Norm and Its Applications to Low-Rank Matrix Recovery

Authors: Man-Chung Yue, Anthony Man-Cho So

Abstract: In this paper, we establish the following perturbation result concerning the singular values of a matrix: Let $A,B \in \mathbb{R}^{m\times n}$ be given matrices, and let $f:\mathbb{R}_+\rightarrow\mathbb{R}_+$ be a concave function satisfying $f(0)=0$. Then, we have $$ \sum_{i=1}^{\min\{m,n\}} \big| f(σ_i(A)) - f(σ_i(B)) \big| \le \sum_{i=1}^{\min\{m,n\}} f(σ_i(A-B)), $$ where $σ_i(\cdot)$ denotes… ▽ More In this paper, we establish the following perturbation result concerning the singular values of a matrix: Let $A,B \in \mathbb{R}^{m\times n}$ be given matrices, and let $f:\mathbb{R}_+\rightarrow\mathbb{R}_+$ be a concave function satisfying $f(0)=0$. Then, we have $$ \sum_{i=1}^{\min\{m,n\}} \big| f(σ_i(A)) - f(σ_i(B)) \big| \le \sum_{i=1}^{\min\{m,n\}} f(σ_i(A-B)), $$ where $σ_i(\cdot)$ denotes the $i$--th largest singular value of a matrix. This answers an open question that is of interest to both the compressive sensing and linear algebra communities. In particular, by taking $f(\cdot)=(\cdot)^p$ for any $p \in (0,1]$, we obtain a perturbation inequality for the so--called Schatten $p$--quasi--norm, which allows us to confirm the validity of a number of previously conjectured conditions for the recovery of low--rank matrices via the popular Schatten $p$--quasi--norm heuristic. We believe that our result will find further applications, especially in the study of low--rank matrix recovery. △ Less

Submitted 27 June, 2014; v1 submitted 3 September, 2012; originally announced September 2012.

Showing 1–30 of 30 results for author: Yue, M