First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models

Authors: Enming Zhang, Ruobing Yao, Huanyong Liu, Junhui Yu, Jiale Wang

Abstract: With the development of Multimodal Large Language Models (MLLMs) technology, its general capabilities are increasingly powerful. To evaluate the various abilities of MLLMs, numerous evaluation systems have emerged. But now there is still a lack of a comprehensive method to evaluate MLLMs in the tasks related to flowcharts, which are very important in daily life and work. We propose the first comprehensive method, FlowCE, to assess MLLMs across various dimensions for tasks related to flowcharts. It encompasses evaluating MLLMs' abilities in Reasoning, Localization Recognition, Information Extraction, Logical Verification, and Summarization on flowcharts. However, we find that even the GPT4o model achieves only a score of 56.63. Among open-source models, Phi-3-Vision obtained the highest score of 49.97. We hope that FlowCE can contribute to future research on MLLMs for tasks based on flowcharts. https://github.com/360AILAB-NLP/FlowCE

Submitted 18 June, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07952 [pdf, other]

Spatial-Frequency Dual Progressive Attention Network For Medical Image Segmentation

Authors: Zhenhuan Zhou, Along He, Yanlin Wu, Rui Yao, Xueshuo Xie, Tao Li

Abstract: In medical images, various types of lesions often manifest significant differences in their shape and texture. Accurate medical image segmentation demands deep learning models with robust capabilities in multi-scale and boundary feature learning. However, previous networks still have limitations in addressing the above issues. Firstly, previous networks simultaneously fuse multi-level features or employ deep supervision to enhance multi-scale learning. However, this may lead to feature redundancy and excessive computational overhead, which is not conducive to network training and clinical deployment. Secondly, the majority of medical image segmentation networks exclusively learn features in the spatial domain, disregarding the abundant global information in the frequency domain. This results in a bias towards low-frequency components, neglecting crucial high-frequency information. To address these problems, we introduce SF-UNet, a spatial-frequency dual-domain attention network. It comprises two main components: the Multi-scale Progressive Channel Attention (MPCA) block, which progressively extracts multi-scale features across adjacent encoder layers, and the lightweight Frequency-Spatial Attention (FSA) block, with only 0.05M parameters, enabling concurrent learning of texture and boundary features from both spatial and frequency domains. We validate the effectiveness of the proposed SF-UNet on three public datasets. Experimental results show that compared to previous state-of-the-art (SOTA) medical image segmentation networks, SF-UNet achieves the best performance, and achieves up to 9.4% and 10.78% improvement in DSC and IOU. Codes will be released at https://github.com/nkicsl/SF-UNet.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 8 pages

arXiv:2404.19401 [pdf, other]

UniFS: Universal Few-shot Instance Perception with Point Representations

Authors: Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo

Abstract: Instance perception tasks (object detection, instance segmentation, pose estimation, counting) play a key role in industrial applications of visual models. As supervised learning methods suffer from high labeling cost, few-shot learning methods which effectively learn from a limited number of labeled examples are desired. Existing few-shot learning methods primarily focus on a restricted set of tasks, presumably due to the challenges involved in designing a generic model capable of representing diverse tasks in a unified manner. In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework. Additionally, we propose Structure-Aware Point Learning (SAPL) to exploit the higher-order structural relationship among points to further enhance representation learning. Our approach makes minimal assumptions about the tasks, yet it achieves competitive results compared to highly specialized and well optimized specialist models. Codes will be released soon.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.14701 [pdf, other]

Deep neural networks for choice analysis: Enhancing behavioral regularity with gradient regularization

Authors: Siqi Feng, Rui Yao, Stephane Hess, Ricardo A. Daziano, Timothy Brathwaite, Joan Walker, Shenhao Wang

Abstract: Deep neural networks (DNNs) frequently present behaviorally irregular patterns, significantly limiting their practical potentials and theoretical validity in travel behavior modeling. This study proposes strong and weak behavioral regularities as novel metrics to evaluate the monotonicity of individual demand functions (a.k.a. law of demand), and further designs a constrained optimization framework with six gradient regularizers to enhance DNNs' behavioral regularity. The proposed framework is applied to travel survey data from Chicago and London to examine the trade-off between predictive power and behavioral regularity for large vs. small sample scenarios and in-domain vs. out-of-domain generalizations. The results demonstrate that, unlike models with strong behavioral foundations such as the multinomial logit, the benchmark DNNs cannot guarantee behavioral regularity. However, gradient regularization (GR) increases DNNs' behavioral regularity by around 6 percentage points (pp) while retaining their relatively high predictive power. In the small sample scenario, GR is more effective than in the large sample scenario, simultaneously improving behavioral regularity by about 20 pp and log-likelihood by around 1.7%. Comparing with the in-domain generalization of DNNs, GR works more effectively in out-of-domain generalization: it drastically improves the behavioral regularity of poorly performing benchmark DNNs by around 65 pp, indicating the criticality of behavioral regularization for enhancing model transferability and application in forecasting. Moreover, the proposed framework is applicable to other NN-based choice models such as TasteNets. Future studies could use behavioral regularity as a metric along with log-likelihood in evaluating travel demand models, and investigate other methods to further enhance behavioral regularity when adopting complex machine learning models.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2402.10834 [pdf, other]

Agent-based Simulation Evaluation of CBD Tolling: A Case Study from New York City

Authors: Qingnan Liang, Ruili Yao, Ruixuan Zhang, Zhibin Chen, Guoyuan Wu

Abstract: Congestion tollings have been widely developed and adopted as an effective tool to mitigate urban traffic congestion and enhance transportation system sustainability. Nevertheless, these tolling schemes are often tailored on a city-by-city or even area-by-area basis, and the cost of conducting field experiments often makes the design and evaluation process challenging. In this work, we leverage MATSim, a simulation platform that provides microscopic behaviors at the agent level, to evaluate performance on tolling schemes. Specifically, we conduct a case study of the Manhattan Central Business District (CBD) in New York City (NYC) using a fine-granularity traffic network model in the large-scale agent behavior setting. The flexibility of MATSim enables the implementation of a customized tolling policy proposed yet not deployed by the NYC agency while providing detailed interpretations. The quantitative and qualitative results indicate that the tested tolling program can regulate the personal vehicle volume in the CBD area and encourage the usage of public transportation, which proves to be a practical move towards sustainable transportation systems. More importantly, our work demonstrates that agent-based simulation helps better understand the travel pattern change subject to tollings in dense and complex urban environments, and it has the potential to facilitate efficient decision-making for the devotion to sustainable traffic management.

Submitted 16 February, 2024; originally announced February 2024.

Comments: Accepted by 2024 IEEE Forum on Integrated and Sustainable Transportation Systems

arXiv:2311.17629 [pdf, other]

Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images

Authors: Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wenliang Du, Rui Yao, Abdulmotaleb El Saddik

Abstract: Object instances in remote sensing images often distribute with multi-orientations, varying scales, and dense distribution. These issues bring challenges to end-to-end oriented object detectors including multi-scale features alignment and a large number of queries. To address these limitations, we propose an end-to-end oriented detector equipped with an efficient decoder, which incorporates two technologies, Rotated RoI attention (RRoI attention) and Selective Distinct Queries (SDQ). Specifically, RRoI attention effectively focuses on oriented regions of interest through a cross-attention mechanism and aligns multi-scale features. SDQ collects queries from intermediate decoder layers and then filters similar queries to obtain distinct queries. The proposed SDQ can facilitate the optimization of one-to-one label assignment, without introducing redundant initial queries or extra auxiliary branches. Extensive experiments on five datasets demonstrate the effectiveness of our method. Notably, our method achieves state-of-the-art performance on DIOR-R (67.31% mAP), DOTA-v1.5 (67.43% mAP), and DOTA-v2.0 (53.28% mAP) with the ResNet50 backbone.

Submitted 1 December, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 11 pages, 7 figures, 13 tables

arXiv:2310.19113 [pdf, other]

Dynamic V2X Autonomous Perception from Road-to-Vehicle Vision

Authors: Jiayao Tan, Fan Lyu, Linyan Li, Fuyuan Hu, Tingliang Feng, Fenglei Xu, Rui Yao

Abstract: Vehicle-to-everything (V2X) perception is an innovative technology that enhances vehicle perception accuracy, thereby elevating the security and reliability of autonomous systems. However, existing V2X perception methods focus on static scenes from mainly vehicle-based vision, which is constrained by sensor capabilities and communication loads. To adapt V2X perception models to dynamic scenes, we propose to build V2X perception from road-to-vehicle vision and present the Adaptive Road-to-Vehicle Perception (AR2VP) method. In AR2VP, we leverage roadside units to offer stable, wide-range sensing capabilities and serve as communication hubs. AR2VP is devised to tackle both intra-scene and inter-scene changes. For the former, we construct a dynamic perception representing module, which efficiently integrates vehicle perceptions, enabling vehicles to capture a more comprehensive range of dynamic factors within the scene. Moreover, we introduce a road-to-vehicle perception compensating module, aimed at preserving the maximized roadside unit perception information in the presence of intra-scene changes. For inter-scene changes, we implement an experience replay mechanism leveraging the roadside unit's storage capacity to retain a subset of historical scene data, maintaining model robustness in response to inter-scene shifts. We conduct perception experiments on 3D object detection and segmentation, and the results show that AR2VP excels in both performance-bandwidth trade-offs and adaptability within dynamic environments.

Submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.16499 [pdf, other]

Data Optimization in Deep Learning: A Survey

Authors: Ou Wu, Rujing Yao

Abstract: Large-scale, high-quality data are considered an essential factor for the successful application of many deep learning techniques. Meanwhile, numerous real-world deep learning tasks still have to contend with the lack of sufficient amounts of high-quality data. Additionally, issues such as model robustness, fairness, and trustworthiness are also closely related to training data. Consequently, a huge number of studies in the existing literature have focused on the data aspect in deep learning tasks. Some typical data optimization techniques include data augmentation, logit perturbation, sample weighting, and data condensation. These techniques usually come from different deep learning divisions and their theoretical inspirations or heuristic motivations may seem unrelated to each other. This study aims to organize a wide range of existing data optimization methodologies for deep learning from the previous literature, and makes the effort to construct a comprehensive taxonomy for them. The constructed taxonomy considers the diversity of split dimensions, and deep sub-taxonomies are constructed for each dimension. On the basis of the taxonomy, connections among the extensive data optimization methods for deep learning are built in terms of four aspects. We probe into rendering several promising and interesting future directions. The constructed taxonomy and the revealed connections will enlighten the better understanding of existing methods and the design of novel data optimization techniques. Furthermore, our aspiration for this survey is to promote data optimization as an independent subdivision of deep learning. A curated, up-to-date list of resources related to data optimization in deep learning is available at https://github.com/YaoRujing/Data-Optimization.

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2310.08285 [pdf, other]

How would mobility-as-a-service (MaaS) platform survive as an intermediary? From the viewpoint of stability in many-to-many matching

Authors: Rui Yao, Kenan Zhang

Abstract: Mobility-as-a-service (MaaS) provides seamless door-to-door trips by integrating different transport modes. Although many MaaS platforms have emerged in recent years, most of them remain at a limited integration level. This study investigates the assignment and pricing problem for a MaaS platform as an intermediary in a multi-modal transportation network, which purchases capacity from service operators and sells multi-modal trips to travelers. The analysis framework of many-to-many stable matching is adopted to decompose the joint design problem and to derive the stability condition such that both operators and travelers are willing to participate in the MaaS system. To maximize the flexibility in route choice and remove boundaries between modes, we design an origin-destination pricing scheme for MaaS trips. On the supply side, we propose a wholesale purchase price for service capacity. Accordingly, the assignment problem is reformulated and solved as a bi-level program, where MaaS travelers make multi-modal trips to minimize their travel costs meanwhile interacting with non-MaaS travelers in the multi-modal transport system. We prove that, under the proposed pricing scheme, there always exists a stable outcome to the overall many-to-many matching problem. Further, given an optimal assignment and under some mild conditions, a unique optimal pricing scheme is ensured. Numerical experiments conducted on the extended Sioux Falls network also demonstrate that the proposed MaaS system could create a win-win-win situation -- the MaaS platform is profitable and both traveler welfare and transit operator revenues increase from a baseline scenario without MaaS.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2308.14378 [pdf, other]

GKGNet: Group K-Nearest Neighbor based Graph Convolutional Network for Multi-Label Image Recognition

Authors: Ruijie Yao, Sheng Jin, Lumin Xu, Wang Zeng, Wentao Liu, Chen Qian, Ping Luo, Ji Wu

Abstract: Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions. Although convolutional neural networks and vision transformers have succeeded in processing images as regular grids of pixels or patches, these representations are sub-optimal for capturing irregular and discontinuous regions of interest. In this work, we present the first fully graph convolutional model, Group K-nearest neighbor based Graph convolutional Network (GKGNet), which models the connections between semantic label embeddings and image patches in a flexible and unified graph structure. To address the scale variance of different objects and to capture information from multiple perspectives, we propose the Group KGCN module for dynamic graph construction and message passing. Our experiments demonstrate that GKGNet achieves state-of-the-art performance with significantly lower computational costs on the challenging multi-label datasets, i.e., MS-COCO and VOC2007 datasets. We will release the code and models to facilitate future research in this area.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2307.13310 [pdf, other]

doi 10.1109/TCSVT.2023.3299087

CT-Net: Arbitrary-Shaped Text Detection via Contour Transformer

Authors: Zhiwen Shao, Yuchen Su, Yong Zhou, Fanrong Meng, Hancheng Zhu, Bing Liu, Rui Yao

Abstract: Contour based scene text detection methods have rapidly developed recently, but still suffer from inaccurate frontend contour initialization, multi-stage error accumulation, or deficient local information aggregation. To tackle these limitations, we propose a novel arbitrary-shaped scene text detection framework named CT-Net by progressive contour regression with contour transformers. Specifically, we first employ a contour initialization module that generates coarse text contours without any post-processing. Then, we adopt contour refinement modules to adaptively refine text contours in an iterative manner, which are beneficial for context information capturing and progressive global contour deformation. Besides, we propose an adaptive training strategy to enable the contour transformers to learn more potential deformation paths, and introduce a re-score mechanism that can effectively suppress false positives. Extensive experiments are conducted on four challenging datasets, which demonstrate the accuracy and efficiency of our CT-Net over state-of-the-art methods. Particularly, CT-Net achieves F-measure of 86.1 at 11.2 frames per second (FPS) and F-measure of 87.8 at 10.1 FPS for CTW1500 and Total-Text datasets, respectively.

Submitted 25 July, 2023; originally announced July 2023.

Comments: This paper has been accepted by IEEE Transactions on Circuits and Systems for Video Technology

arXiv:2306.08854 [pdf, other]

A Gromov--Wasserstein Geometric View of Spectrum-Preserving Graph Coarsening

Authors: Yifan Chen, Rentian Yao, Yun Yang, Jie Chen

Abstract: Graph coarsening is a technique for solving large-scale graph problems by working on a smaller version of the original graph, and possibly interpolating the results back to the original graph. It has a long history in scientific computing and has recently gained popularity in machine learning, particularly in methods that preserve the graph spectrum. This work studies graph coarsening from a different perspective, developing a theory for preserving graph distances and proposing a method to achieve this. The geometric approach is useful when working with a collection of graphs, such as in graph classification and regression. In this study, we consider a graph as an element on a metric space equipped with the Gromov--Wasserstein (GW) distance, and bound the difference between the distance of two graphs and their coarsened versions. Minimizing this difference can be done using the popular weighted kernel $K$-means method, which improves existing spectrum-preserving methods with the proper choice of the kernel. The study includes a set of experiments to support the theory and method, including approximating the GW distance, preserving the graph spectrum, classifying graphs using spectral information, and performing regression using graph convolutional networks. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening.

Submitted 15 June, 2023; originally announced June 2023.

Comments: To appear at ICML 2023. Code is available at https://github.com/ychen-stat-ml/GW-Graph-Coarsening

arXiv:2306.06624 [pdf, other]

RestGPT: Connecting Large Language Models with Real-World RESTful APIs

Authors: Yifan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, Cheng Li, Ke Wang, Rong Yao, Ye Tian, Sujian Li

Abstract: Tool-augmented large language models (LLMs) have achieved remarkable progress in tackling a broad range of tasks. However, existing methods are mainly restricted to specifically designed tools and fail to fulfill complex instructions, having great limitations when confronted with real-world scenarios. In this paper, we explore a more realistic scenario by connecting LLMs with RESTful APIs, which adhere to the widely adopted REST software architectural style for web service development. To address the practical challenges of tackling complex instructions, we propose RestGPT, which exploits the power of LLMs and conducts a coarse-to-fine online planning mechanism to enhance the abilities of task decomposition and API selection. RestGPT also contains an API executor tailored for calling RESTful APIs, which can meticulously formulate parameters and parse API responses. To fully evaluate the performance of RestGPT, we propose RestBench, a high-quality benchmark which consists of two real-world scenarios and human-annotated instructions with gold solution paths. Experiments show that RestGPT is able to achieve impressive results in complex tasks and has strong robustness, which paves a new way towards AGI. RestGPT and RestBench are publicly available at https://restgpt.github.io/.

Submitted 26 August, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

Comments: Add RestBench to evaluate RestGPT

arXiv:2306.00127 [pdf, other]

Surrogate Model Extension (SME): A Fast and Accurate Weight Update Attack on Federated Learning

Authors: Junyi Zhu, Ruicong Yao, Matthew B. Blaschko

Abstract: In Federated Learning (FL) and many other distributed training frameworks, collaborators can hold their private data locally and only share the network weights trained with the local data after multiple iterations. Gradient inversion is a family of privacy attacks that recovers data from its generated gradients. Seemingly, FL can provide a degree of protection against gradient inversion attacks on weight updates, since the gradient of a single step is concealed by the accumulation of gradients over multiple local iterations. In this work, we propose a principled way to extend gradient inversion attacks to weight updates in FL, thereby better exposing weaknesses in the presumed privacy protection inherent in FL. In particular, we propose a surrogate model method based on the characteristic of two-dimensional gradient flow and low-rank property of local updates. Our method largely boosts the ability of gradient inversion attacks on weight updates containing many iterations and achieves state-of-the-art (SOTA) performance. Additionally, our method runs up to $100\times$ faster than the SOTA baseline in the common FL scenario. Our work re-evaluates and highlights the privacy risk of sharing network weights. Our code is available at https://github.com/JunyiZhu-AI/surrogate_model_extension.

Submitted 31 May, 2023; originally announced June 2023.

Comments: Accepted at ICML 2023

arXiv:2305.15583 [pdf, other]

Alleviating Exposure Bias in Diffusion Models through Sampling with Shifted Time Steps

Authors: Mingxiao Li, Tingyu Qu, Ruicong Yao, Wei Sun, Marie-Francine Moens

Abstract: Diffusion Probabilistic Models (DPM) have shown remarkable efficacy in the synthesis of high-quality images. However, their inference process characteristically requires numerous, potentially hundreds, of iterative steps, which could exaggerate the problem of exposure bias due to the training and inference discrepancy. Previous work has attempted to mitigate this issue by perturbing inputs during training, which consequently mandates the retraining of the DPM. In this work, we conduct a systematic study of exposure bias in DPM and, intriguingly, we find that the exposure bias could be alleviated with a novel sampling method that we propose, without retraining the model. We empirically and theoretically show that, during inference, for each backward time step $t$ and corresponding state $\hat{x}_t$, there might exist another time step $t_s$ which exhibits superior coupling with $\hat{x}_t$. Based on this finding, we introduce a sampling method named Time-Shift Sampler. Our framework can be seamlessly integrated to existing sampling algorithms, such as DDPM, DDIM and other high-order solvers, inducing merely minimal additional computations. Experimental results show our method brings significant and consistent improvements in FID scores on different datasets and sampling methods. For example, integrating Time-Shift Sampler to F-PNDM yields a FID=3.88, achieving 44.49% improvements as compared to F-PNDM, on CIFAR-10 with 10 sampling steps, which is more performant than the vanilla DDIM with 100 sampling steps. Our code is available at https://github.com/Mingxiao-Li/TS-DPM.

Submitted 16 June, 2024; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: Accepted at International Conference on Learning Representations (ICLR2024); typo correction

arXiv:2303.16526 [pdf, other]

doi 10.1109/ICME55011.2023.00346

HybridPoint: Point Cloud Registration Based on Hybrid Point Sampling and Matching

Authors: Yiheng Li, Canhui Tang, Runzhao Yao, Aixue Ye, Feng Wen, Shaoyi Du

Abstract: Patch-to-point matching has become a robust way of point cloud registration. However, previous patch-matching methods employ superpoints with poor localization precision as nodes, which may lead to ambiguous patch partitions. In this paper, we propose a HybridPoint-based network to find more robust and accurate correspondences. Firstly, we propose to use salient points with prominent local features as nodes to increase patch repeatability, and introduce some uniformly distributed points to complete the point cloud, thus constituting hybrid points. Hybrid points not only have better localization precision but also give a complete picture of the whole point cloud. Furthermore, based on the characteristic of hybrid points, we propose a dual-classes patch matching module, which leverages the matching results of salient points and filters the matching noise of non-salient points. Experiments show that our model achieves state-of-the-art performance on 3DMatch, 3DLoMatch, and KITTI odometry, especially with 93.0% Registration Recall on the 3DMatch dataset. Our code and models are available at https://github.com/liyih/HybridPoint.

Submitted 23 April, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME), 2023

arXiv:2302.03931 [pdf, other]

Fast Linear Model Trees by PILOT

Authors: Jakob Raymaekers, Peter J. Rousseeuw, Tim Verdonck, Ruicong Yao

Abstract: Linear model trees are regression trees that incorporate linear models in the leaf nodes. This preserves the intuitive interpretation of decision trees and at the same time enables them to better capture linear relationships, which is hard for standard decision trees. But most existing methods for fitting linear model trees are time consuming and therefore not scalable to large data sets. In addition, they are more prone to overfitting and extrapolation issues than standard regression trees. In this paper we introduce PILOT, a new algorithm for linear model trees that is fast, regularized, stable and interpretable. PILOT trains in a greedy fashion like classic regression trees, but incorporates an $L^2$ boosting approach and a model selection rule for fitting linear models in the nodes. The abbreviation PILOT stands for PIecewise Linear Organic Tree, where 'organic' refers to the fact that no pruning is carried out. PILOT has the same low time and space complexity as CART without its pruning. An empirical study indicates that PILOT tends to outperform standard decision trees and other linear model trees on a variety of data sets. Moreover, we prove its consistency in an additive model setting under weak assumptions. When the data is generated by a linear model, the convergence rate is polynomial.

Submitted 8 February, 2023; originally announced February 2023.

Journal ref: Machine Learning, 2024

arXiv:2211.12794 [pdf, ps, other]

Zero Forcing Uplink Detection through Large-Scale RIS: System Performance and Phase Shift Design

Authors: Nikolaos I. Miridakis, Theodoros A. Tsiftsis, Rugui Yao

Abstract: A multiple-input multiple-output wireless communication system is analytically studied, which operates with the aid of a large-scale reconfigurable intelligent surface (LRIS). LRIS is equipped with multiple passive elements with discrete phase adjustment capabilities, and independent Rician fading conditions are assumed for both the transmitter-to-LRIS and LRIS-to-receiver links. A direct transceiver link is also considered, which is modeled by a Rayleigh fading distribution. The system performance is analytically studied when the linear yet efficient zero-forcing detection is implemented at the receiver. In particular, the outage performance is derived in closed form for different system configuration setups with regard to the available channel state information (CSI) at the receiver. Both perfect and imperfect CSI are analyzed. Also, an efficient phase shift design approach at LRIS is introduced, which is linear in the number of passive elements and receive antennas. The proposed phase shift design can be applied in two different modes of operation, namely, when the system adapts to either the instantaneous or the statistical CSI. Finally, some impactful engineering insights are provided, such as how the channel fading conditions, CSI, discrete phase shift resolution, and volume of antenna/LRIS element arrays affect the overall system performance.

Submitted 23 November, 2022; originally announced November 2022.

Comments: Accepted for publication to IEEE Transactions on Communications

arXiv:2211.00168 [pdf, other]

Improving Fairness in Image Classification via Sketching

Authors: Ruichen Yao, Ziteng Cui, Xiaoxiao Li, Lin Gu

Abstract: Fairness is a fundamental requirement for trustworthy and human-centered Artificial Intelligence (AI) systems. However, deep neural networks (DNNs) tend to make unfair predictions when the training data are collected from different sub-populations with different attributes (e.g., color, sex, age), leading to biased DNN predictions. We notice that such a troubling phenomenon is often caused by the data itself, which means that bias information is encoded into the DNN along with the useful information (e.g., class information, semantic information). Therefore, we propose to use sketching to handle this phenomenon. Without losing the utility of data, we explore the image-to-sketching methods that can maintain useful semantic information for the target classification while filtering out the useless bias information. In addition, we design a fair loss to further improve the model fairness. We evaluate our method through extensive experiments on both a general scene dataset and a medical scene dataset. Our results show that the desired image-to-sketching method improves model fairness and achieves satisfactory results among state-of-the-art methods.

Submitted 31 October, 2022; originally announced November 2022.

Comments: 8 pages, 2 figures. To appear in 2022 Trustworthy and Socially Responsible Machine Learning (TSRML 2022) co-located with NeurIPS 2022

arXiv:2208.12042 [pdf, other]

Efficient Truncated Linear Regression with Unknown Noise Variance

Authors: Constantinos Daskalakis, Patroklos Stefanou, Rui Yao, Manolis Zampetakis

Abstract: Truncated linear regression is a classical challenge in Statistics, wherein a label, $y = w^T x + \varepsilon$, and its corresponding feature vector, $x \in \mathbb{R}^k$, are only observed if the label falls in some subset $S \subseteq \mathbb{R}$; otherwise the existence of the pair $(x, y)$ is hidden from observation. Linear regression with truncated observations has remained a challenge, in its general form, since the early works of Tobin (1958) and Amemiya (1973). When the distribution of the error is normal with known variance, recent work of Daskalakis et al. (2019) provides computationally and statistically efficient estimators of the linear model, $w$. In this paper, we provide the first computationally and statistically efficient estimators for truncated linear regression when the noise variance is unknown, estimating both the linear model and the variance of the noise. Our estimator is based on an efficient implementation of Projected Stochastic Gradient Descent on the negative log-likelihood of the truncated sample. Importantly, we show that the error of our estimates is asymptotically normal, and we use this to provide explicit confidence regions for our estimates.
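The observation model in this abstract can be illustrated with a small simulation. The following is only a sketch of the truncation setting, not the paper's estimator: it assumes a one-dimensional feature, unit noise variance, and a truncation set $S = (c, \infty)$, and the names `simulate` and `ols_slope` are hypothetical. It shows that ordinary least squares fitted to the truncated sample alone gives a biased slope, which is the problem a truncation-aware estimator has to correct.

```python
import random

def simulate(n, w, threshold, seed=0):
    """Draw n pairs with y = w*x + noise; keep only pairs with y > threshold.

    Assumptions for illustration: scalar x ~ N(0, 1), noise ~ N(0, 1),
    and truncation set S = (threshold, infinity).
    """
    rng = random.Random(seed)
    full, kept = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = w * x + rng.gauss(0.0, 1.0)
        full.append((x, y))
        if y > threshold:  # pairs outside S are never observed
            kept.append((x, y))
    return full, kept

def ols_slope(pairs):
    """Least-squares slope of y on x (with intercept)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

full, kept = simulate(n=20000, w=1.0, threshold=0.0)
slope_full = ols_slope(full)   # fitted on data that is hidden in practice
slope_kept = ols_slope(kept)   # fitted on the truncated sample only
print(f"kept {len(kept)} of {len(full)} pairs")
print(f"OLS slope on full data: {slope_full:.3f}")
print(f"OLS slope on truncated data: {slope_kept:.3f}")
```

In this setup the slope recovered from the full data is close to the true value of 1, while the slope from the truncated sample is pulled toward zero.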

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2206.13381 [pdf, other]

doi 10.1109/TMM.2022.3186431

TextDCT: Arbitrary-Shaped Text Detection via Discrete Cosine Transform Mask

Authors: Yuchen Su, Zhiwen Shao, Yong Zhou, Fanrong Meng, Hancheng Zhu, Bing Liu, Rui Yao

Abstract: Arbitrary-shaped scene text detection is a challenging task due to the variety of text changes in font, size, color, and orientation. Most existing regression-based methods resort to regressing the masks or contour points of text regions to model the text instances. However, regressing the complete masks requires high training complexity, and contour points are not sufficient to capture the details of highly curved texts. To tackle the above limitations, we propose a novel light-weight anchor-free text detection framework called TextDCT, which adopts the discrete cosine transform (DCT) to encode the text masks as compact vectors. Further, considering the imbalanced number of training samples among pyramid layers, we only employ a single-level head for top-down prediction. To model the multi-scale texts in a single-level head, we introduce a novel positive sampling strategy by treating the shrunk text region as positive samples, and design a feature awareness module (FAM) for spatial-awareness and scale-awareness by fusing rich contextual information and focusing on more significant features. Moreover, we propose a segmented non-maximum suppression (S-NMS) method that can filter low-quality mask regressions. Extensive experiments are conducted on four challenging datasets, which demonstrate that our TextDCT obtains competitive performance on both accuracy and efficiency. Specifically, TextDCT achieves an F-measure of 85.1 at 17.2 frames per second (FPS) and an F-measure of 84.9 at 15.1 FPS for the CTW1500 and Total-Text datasets, respectively.
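The idea of encoding a mask as a compact DCT vector can be sketched in a few lines. This is a generic illustration and not the TextDCT implementation: it assumes a small square binary mask, an orthonormal 2D DCT-II, and a hypothetical choice of keeping the top-left `keep` by `keep` block of low-frequency coefficients. The helper names `encode_mask` and `decode_mask` are made up for this example.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix C, so that C times its transpose is the identity."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def encode_mask(mask, keep):
    """2D DCT of a square mask; return the keep*keep low-frequency coefficients."""
    c = dct_matrix(len(mask))
    coeffs = matmul(matmul(c, mask), transpose(c))
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]

def decode_mask(vector, n, keep):
    """Zero-fill the missing coefficients, invert the DCT, threshold at 0.5."""
    c = dct_matrix(n)
    coeffs = [[0.0] * n for _ in range(n)]
    for idx, value in enumerate(vector):
        coeffs[idx // keep][idx % keep] = value
    recon = matmul(matmul(transpose(c), coeffs), c)
    return [[1 if v >= 0.5 else 0 for v in row] for row in recon]

# An 8x8 mask with a filled rectangle standing in for a text region.
mask = [[1 if 2 <= r <= 5 and 1 <= col <= 6 else 0 for col in range(8)]
        for r in range(8)]
compact = encode_mask(mask, keep=4)           # 16 numbers instead of 64
approx = decode_mask(compact, n=8, keep=4)    # lossy reconstruction
exact = decode_mask(encode_mask(mask, keep=8), n=8, keep=8)
print(len(compact), exact == mask)
```

Keeping all coefficients reproduces the mask exactly, while keeping fewer trades reconstruction detail for a shorter vector.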

Submitted 27 June, 2022; originally announced June 2022.

Comments: This paper has been accepted by IEEE Transactions on Multimedia

arXiv:2203.15612 [pdf, other]

Three-Dimensional Spectrum Occupancy Measurement using UAV: Performance Analysis and Algorithm Design

Authors: Zhiqing Wei, Rubing Yao, Jie Kang, Xu Chen, Huici Wu

Abstract: Spectrum sharing, as an approach to significantly improve spectrum efficiency in the era of 6th generation mobile networks (6G), has attracted extensive attention. Radio Environment Map (REM) based low-complexity spectrum sharing is widely studied, where the spectrum occupancy measurement (SOM) is vital to construct the REM. The SOM in three-dimensional (3D) space is becoming increasingly essential to support spectrum sharing, with the space-air-ground integrated network being a major trend of 6G. In this paper, we analyze the performance of 3D SOM to further study the tradeoff between accuracy and efficiency in 3D SOM. We discover that the error of 3D SOM is related to the area of the boundary surfaces of licensed networks, the number of discretized cubes, and the length of the edge of 3D space. Moreover, we design a fast and accurate 3D SOM algorithm that utilizes an unmanned aerial vehicle (UAV) to measure the spectrum occupancy considering the path planning of the UAV, which improves the measurement efficiency by requiring less measurement time and flight time of the UAV for satisfactory performance. The theoretical results obtained in this paper reveal the essential dependencies that describe the 3D SOM methodology, and the proposed algorithm is beneficial to improve the efficiency of 3D SOM. It is noted that the theoretical results and algorithm in this paper may provide a guideline for more areas such as spectrum monitoring, spectrum measurement, network measurement, planning, etc.

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2112.14192

Robust Security Analysis Based on Random Geometry Theory for Satellite-Terrestrial-Vehicle Network

Authors: Xudong Li, Ye Fan, Rugui Yao, Peng Wang, Nan Qi, Xiaoya Zuo

Abstract: Driven by B5G and 6G technologies, multi-network fusion is an indispensable tendency for future communications. In this paper, we focus on and analyze the security performance (SP) of the satellite-terrestrial downlink transmission (STDT). Here, the STDT is composed of a satellite network and a vehicular network with a legitimate mobile receiver and a mobile eavesdropper. To better analyze the SP of this system theoretically from the perspective of mobile terminals, the random geometry theory is adopted, which assumes that both terrestrial vehicles are distributed stochastically in one beam of the satellite. Furthermore, based on this theory, closed-form analytical expressions are derived for two crucial and specific indicators in the STDT, namely the secrecy outage probability and the ergodic secrecy capacity. Additionally, several related variables restricting the SP of the STDT are discussed, and specific schemes are presented to enhance the SP. Then, the asymptotic property is investigated in the high signal-to-noise ratio scenario, and accurate and asymptotic closed-form expressions are given. Finally, simulation results show that, under the precondition of guaranteeing the reliability of the STDT, the asymptotic solutions outperform the corresponding accurate results significantly in effectiveness.

Submitted 14 July, 2022; v1 submitted 28 December, 2021; originally announced December 2021.

Comments: The theoretical analysis in the original manuscript is insufficient, and the system model is not convincing. With the consideration of these flaws, we decide to withdraw our work for further improvement

arXiv:2109.09467 [pdf]

Cooperative Anti-Jamming for UAV Networks: A Local Altruistic Game Approach

Authors: Yueyue Su, Nan Qi, Zanqi Huang, Rugui Yao, Luliang Jia

Abstract: To improve the anti-jamming ability of UAV-aided communication systems, this paper investigates the channel selection optimization problem in the face of both internal mutual interference and external malicious jamming. A cooperative anti-jamming method based on a local altruistic game is proposed to optimize UAVs' channel selection. Specifically, a Stackelberg game is modeled to formulate the confrontation relationship between UAVs and the jammer. A local altruistic game is modeled with each UAV considering the utilities of both itself and other UAVs. A distributed cooperative anti-jamming algorithm is proposed to obtain the Stackelberg equilibrium. Finally, the convergence of the proposed algorithm and the impact of the transmission power on the system loss value are analyzed, and the anti-jamming performance of the proposed algorithm can be improved by around 64% compared with the existing algorithms.

Submitted 12 September, 2021; originally announced September 2021.

Comments: 14 pages, 8 figures

MSC Class: 91A28

arXiv:2107.11921 [pdf, other]

Compensation Learning

Authors: Rujing Yao, Ou Wu

Abstract: Weighting strategy prevails in machine learning. For example, a common approach in robust machine learning is to exert lower weights on samples which are likely to be noisy or quite hard. This study reveals another undiscovered strategy, namely, compensating. Various incarnations of compensating have been utilized but it has not been explicitly revealed. Learning with compensating is called compensation learning and a systematic taxonomy is constructed for it in this study. In our taxonomy, compensation learning is divided on the basis of the compensation targets, directions, inference manners, and granularity levels. Many existing learning algorithms including some classical ones can be viewed or understood at least partially as compensation techniques. Furthermore, a family of new learning algorithms can be obtained by plugging the compensation learning into existing learning algorithms. Specifically, two concrete new learning algorithms are proposed for robust machine learning. Extensive experiments on image classification and text sentiment analysis verify the effectiveness of the two new algorithms. Compensation learning can also be used in other various learning scenarios, such as imbalance learning, clustering, regression, and so on.

Submitted 4 January, 2022; v1 submitted 25 July, 2021; originally announced July 2021.

arXiv:2106.13319 [pdf]

A variational autoencoder approach for choice set generation and implicit perception of alternatives in choice modeling

Authors: Rui Yao, Shlomo Bekhor

Abstract: This paper derives the generalized extreme value (GEV) model with implicit availability/perception (IAP) of alternatives and proposes a variational autoencoder (VAE) approach for choice set generation and implicit perception of alternatives. Specifically, the cross-nested logit (CNL) model with IAP is derived as an example of IAP-GEV models. The VAE approach is adapted to model the choice set generation process, in which the likelihood of perceiving chosen alternatives in the choice set is maximized. The VAE approach for route choice set generation is exemplified using a real dataset. The estimated IAP-CNL model has the best performance in terms of goodness-of-fit and prediction performance, compared to multinomial logit models and conventional choice set generation methods.

Submitted 18 June, 2021; originally announced June 2021.

arXiv:2105.13078 [pdf]

A Dynamic Tree Algorithm for Peer-to-Peer Ride-sharing Matching

Authors: Rui Yao, Shlomo Bekhor

Abstract: On-demand peer-to-peer ride-sharing services provide flexible mobility options, and are expected to alleviate congestion by sharing empty car seats. An efficient matching algorithm is essential to the success of a ride-sharing system. The matching problem is related to the well-known dial-a-ride problem, which also tries to find the optimal pickup and delivery sequence for a given set of passengers. In this paper, we propose an efficient dynamic tree algorithm to solve the on-demand peer-to-peer ride-sharing matching problem. The dynamic tree algorithm benefits from given ride-sharing driver schedules, and provides satisfactory runtime performance. In addition, an efficient pre-processing procedure to select candidate passenger requests is proposed, which further improves the algorithm performance. Numerical experiments conducted in a small network show that the dynamic tree algorithm reaches the same objective function values as the exact algorithm, but with shorter runtimes. Furthermore, the proposed method is applied to a larger size problem. Results show that the spatial distribution of ride-sharing participants influences the algorithm performance. Sensitivity analysis confirms that the most critical ride-sharing matching constraints are the excess travel times. The network analysis suggests that small vehicle capacities do not guarantee overall vehicle-kilometer travel savings.

Submitted 27 May, 2021; originally announced May 2021.

Comments: Accepted for publication in Networks and Spatial Economics

arXiv:2104.13463 [pdf]

A ridesharing simulation platform that considers dynamic supply-demand interactions

Authors: Rui Yao, Shlomo Bekhor

Abstract: This paper presents a new ridesharing simulation platform that accounts for dynamic driver supply and passenger demand, and complex interactions between drivers and passengers. The proposed simulation platform explicitly considers driver and passenger acceptance/rejection of the matching options, and cancellation before/after being matched. New simulation events, procedures and modules have been developed to handle these realistic interactions. The capabilities of the simulation platform are illustrated using numerical experiments. The experiments confirm the importance of considering supply and demand interactions and provide new insights into ridesharing operations. Results show that an increase in driver supply does not always increase the matching option acceptance rate, and a larger matching window could have negative impacts on the overall ridesharing success rate. These results emphasize the importance of careful planning of a ridesharing system.

Submitted 15 May, 2022; v1 submitted 27 April, 2021; originally announced April 2021.

arXiv:2104.02880 [pdf, ps, other]

Contingency Analysis Based on Partitioned and Parallel Holomorphic Embedding

Authors: Rui Yao, Feng Qiu, Kai Sun

Abstract: In the steady-state contingency analysis, the traditional Newton-Raphson method suffers from non-convergence issues when solving post-outage power flow problems, which hinders the integrity and accuracy of security assessment. In this paper, we propose a novel robust contingency analysis approach based on holomorphic embedding (HE). The HE-based simulator guarantees convergence if the true power flow solution exists, which is desirable because it avoids the influence of numerical issues and provides a credible security assessment conclusion. In addition, based on the multi-area characteristics of real-world power systems, a partitioned HE (PHE) method is proposed with an interface-based partitioning of HE formulation. The PHE method does not undermine the numerical robustness of HE and significantly reduces the computation burden in large-scale contingency analysis. The PHE method is further enhanced by parallel or distributed computation to become parallel PHE (P$^2$HE). Tests on a 458-bus system, a synthetic 419-bus system and a large-scale 21447-bus system demonstrate the advantages of the proposed methods in robustness and efficiency.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2104.02877 [pdf, ps, other]

Hybrid QSS and Dynamic Extended-Term Simulation Based on Holomorphic Embedding

Authors: Rui Yao, Feng Qiu

Abstract: Power system simulations that extend over a time period of minutes, hours, or even longer are called extended-term simulations. As power systems evolve into complex systems with increasing interdependencies and richer dynamic behaviors across a wide range of timescales, extended-term simulation is needed for many power system analysis tasks (e.g., resilience analysis, renewable energy integration, cascading failures), and there is an urgent need for efficient and robust extended-term simulation approaches. The conventional approaches are insufficient for dealing with the extended-term simulation of multi-timescale processes. This paper proposes an extended-term simulation approach based on the holomorphic embedding (HE) methodology. Its accuracy and computational efficiency are backed by HE's high accuracy in event-driven simulation, larger and adaptive time steps, and flexible switching between full-dynamic and quasi-steady-state (QSS) models. We used this proposed extended-term simulation approach to evaluate bulk power system restoration plans, and it demonstrates satisfactory accuracy and efficiency in this complex simulation task.

Submitted 6 April, 2021; originally announced April 2021.

arXiv:2102.09583 [pdf, other]

doi 10.1109/TPWRS.2021.3110881

Encoding Frequency Constraints in Preventive Unit Commitment Using Deep Learning with Region-of-Interest Active Sampling

Authors: Yichen Zhang, Hantao Cui, Jianzhe Liu, Feng Qiu, Tianqi Hong, Rui Yao, Fangxing Li

Abstract: With the increasing penetration of renewable energy, frequency response and its security are of significant concern for reliable power system operations. Frequency-constrained unit commitment (FCUC) is proposed to address this challenge. Despite existing efforts in modeling frequency characteristics in unit commitment (UC), current strategies can only handle oversimplified low-order frequency response models and do not consider wide-range operating conditions. This paper presents a generic data-driven framework for FCUC under high renewable penetration. Deep neural networks (DNNs) are trained to predict the frequency response using real data or high-fidelity simulation data. Next, the DNN is reformulated as a set of mixed-integer linear constraints to be incorporated into the ordinary UC formulation. In the data generation phase, all possible power injections are considered, and a region-of-interest active sampling is proposed to include power injection samples with frequency nadirs closer to the UFLC threshold, which significantly enhances the accuracy of frequency constraints in FCUC. The proposed FCUC is verified on the IEEE 39-bus system. Then, a full-order dynamic model simulation using PSS/E verifies the effectiveness of FCUC in frequency-secure generator commitments.

Submitted 12 October, 2021; v1 submitted 18 February, 2021; originally announced February 2021.

arXiv:2101.07897 [pdf, other]

Safer Illinois and RokWall: Privacy Preserving University Health Apps for COVID-19

Authors: Vikram Sharma Mailthody, James Wei, Nicholas Chen, Mohammad Behnia, Ruihao Yao, Qihao Wang, Vedant Agrawal, Churan He, Lijian Wang, Leihao Chen, Amit Agarwal, Edward Richter, Wen-Mei Hwu, Christopher W. Fletcher, Jinjun Xiong, Andrew Miller, Sanjay Patel

Abstract: COVID-19 has fundamentally disrupted the way we live. Government bodies, universities, and companies worldwide are rapidly developing technologies to combat the COVID-19 pandemic and safely reopen society. Essential analytics tools such as contact tracing, super-spreader event detection, and exposure mapping require collecting and analyzing sensitive user information. The increasing use of such powerful data-driven applications necessitates a secure, privacy-preserving infrastructure for computation on personal data. In this paper, we analyze two such computing infrastructures under development at the University of Illinois at Urbana-Champaign to track and mitigate the spread of COVID-19. First, we present Safer Illinois, a system for decentralized health analytics supporting two applications currently deployed with widespread adoption: digital contact tracing and COVID-19 status cards. Second, we introduce the RokWall architecture for privacy-preserving centralized data analytics on sensitive user data. We discuss the architecture of these systems, design choices, threat models considered, and the challenges we experienced in developing production-ready systems for sensitive data analysis.

Submitted 17 March, 2021; v1 submitted 19 January, 2021; originally announced January 2021.

Comments: Appears in the Workshop on Secure IT Technologies against COVID-19 (CoronaDef) 2021

arXiv:2101.01308 [pdf, other]

doi 10.1109/TIP.2021.3087401

CycleSegNet: Object Co-segmentation with Cycle Refinement and Region Correspondence

Authors: Chi Zhang, Guankai Li, Guosheng Lin, Qingyao Wu, Rui Yao

Abstract: Image co-segmentation is an active computer vision task that aims to segment the common objects from a set of images. Recently, researchers have designed various learning-based algorithms to undertake the co-segmentation task. The main difficulty in this task is how to effectively transfer information between images to make conditional predictions. In this paper, we present CycleSegNet, a novel framework for the co-segmentation task. Our network design has two key components: a region correspondence module, which is the basic operation for exchanging information between local image regions, and a cycle refinement module, which utilizes ConvLSTMs to progressively update image representations and exchange information in a cyclic and iterative manner. Extensive experiments demonstrate that our proposed method significantly outperforms the state-of-the-art methods on four popular benchmark datasets (the PASCAL VOC, MSRC, Internet, and iCoseg datasets) by 2.6%, 7.7%, 2.2%, and 2.9%, respectively.

Submitted 2 June, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: Accepted to TIP

arXiv:2011.12354 [pdf, ps, other]

PowerNet: Multi-agent Deep Reinforcement Learning for Scalable Powergrid Control

Authors: Dong Chen, Kaian Chen, Zhaojian Li, Tianshu Chu, Rui Yao, Feng Qiu, Kaixiang Lin

Abstract: This paper develops an efficient multi-agent deep reinforcement learning algorithm for cooperative controls in powergrids. Specifically, we consider the decentralized inverter-based secondary voltage control problem in distributed generators (DGs), which is first formulated as a cooperative multi-agent reinforcement learning (MARL) problem. We then propose a novel on-policy MARL algorithm, PowerNet, in which each agent (DG) learns a control policy based on (sub-)global reward but local states from its neighboring agents. Motivated by the fact that a local control from one agent has limited impact on agents distant from it, we exploit a novel spatial discount factor to reduce the effect from remote agents, to expedite the training process and improve scalability. Furthermore, a differentiable, learning-based communication protocol is employed to foster collaboration among neighboring agents. In addition, to mitigate the effects of system uncertainty and random noise introduced during on-policy learning, we utilize an action smoothing factor to stabilize the policy execution. To facilitate training and evaluation, we develop PGSim, an efficient, high-fidelity powergrid simulation platform. Experimental results in two microgrid setups show that the developed PowerNet outperforms a conventional model-based control, as well as several state-of-the-art MARL algorithms. The decentralized learning scheme and high sample efficiency also make it viable for large-scale power grids.

Submitted 31 July, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: 11 pages

arXiv:2011.00518 [pdf, other]

AI Marker-based Large-scale AI Literature Mining

Authors: Rujing Yao, Yingchun Ye, Ji Zhang, Shuxiao Li, Ou Wu

Abstract: The knowledge contained in academic literature is interesting to mine. Inspired by the idea of molecular marker tracing in the field of biochemistry, three named entities, namely, methods, datasets and metrics, are used as AI markers for AI literature. These entities can be used to trace the research process described in the bodies of papers, which opens up new perspectives for seeking and mining more valuable academic information. Firstly, the entity extraction model is used in this study to extract AI markers from large-scale AI literature. Secondly, original papers are traced for AI markers. Statistical and propagation analyses are performed based on the tracing results. Finally, the co-occurrences of AI markers are used to achieve clustering. The evolution within method clusters and the influencing relationships amongst different research scene clusters are explored. The above-mentioned mining based on AI markers yields many meaningful discoveries. For example, the propagation of effective methods on the datasets is rapidly increasing over time; effective methods proposed by China in recent years have increasing influence on other countries, whilst the opposite holds for France. Saliency detection, a classic computer vision research scene, is the least likely to be affected by other research scenes.

Submitted 2 November, 2020; v1 submitted 1 November, 2020; originally announced November 2020.

arXiv:2010.13583 [pdf, other]

Method and Dataset Entity Mining in Scientific Literature: A CNN + Bi-LSTM Model with Self-attention

Authors: Linlin Hou, Ji Zhang, Ou Wu, Ting Yu, Zhen Wang, Zhao Li, Jianliang Gao, Yingchun Ye, Rujing Yao

Abstract: Literature analysis helps researchers acquire a good understanding of the development of science and technology. Traditional literature analysis focuses largely on literature metadata such as topics, authors, abstracts, keywords, references, etc., and little attention has been paid to the main content of papers. In many scientific domains such as science, computing, engineering, etc., the methods and datasets involved in the scientific papers published in those domains carry important information and are quite useful for domain analysis as well as algorithm and dataset recommendation. In this paper, we propose a novel entity recognition model, called MDER, which is able to effectively extract the method and dataset entities from the main textual content of scientific papers. The model utilizes rule embedding and adopts a parallel structure of CNN and Bi-LSTM with the self-attention mechanism. We evaluate the proposed model on datasets which are constructed from the published papers of four research areas in computer science, i.e., NLP, CV, Data Mining and AI. The experimental results demonstrate that our model performs well in all four areas and that it features a good learning capacity for cross-area learning and recognition. We also conduct experiments to evaluate the effectiveness of the different building modules within our model, which indicate the importance of these modules in collectively contributing to the good entity recognition performance as a whole. The data augmentation experiments on our model demonstrate that data augmentation positively contributes to model training, making our model much more robust in scenarios where only a small number of training samples are available. We finally apply our model to PAKDD papers published from 2009 to 2019 to mine insightful results from scientific papers published over a longer time span.

Submitted 27 January, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

arXiv:2007.13250 [pdf, other]

Deep Active Learning for Solvability Prediction in Power Systems

Authors: Yichen Zhang, Jianzhe Liu, Feng Qiu, Tianqi Hong, Rui Yao

Abstract: Traditional methods for solvability region analysis can only have inner approximations with inconclusive conservatism. Machine learning methods have been proposed to approach the real region. In this letter, we propose a deep active learning framework for power system solvability prediction. Compared with passive learning methods, where the training is performed after all instances are labeled, active learning selects the most informative instances to be labeled and therefore significantly reduces the size of the labeled dataset for training. In the active learning framework, the acquisition functions, which correspond to different sampling strategies, are defined in terms of the on-the-fly posterior probability from the classifier. The IEEE 39-bus system is employed to validate the proposed framework, where a two-dimensional case is illustrated to visualize the effectiveness of the sampling method, followed by the full-dimensional numerical experiments.

Submitted 22 December, 2020; v1 submitted 26 July, 2020; originally announced July 2020.

arXiv:2005.11195 [pdf]

A Dynamic Tree Algorithm for On-demand Peer-to-peer Ride-sharing Matching

Authors: Rui Yao, Shlomo Bekhor

Abstract: Innovative shared mobility services provide on-demand flexible mobility options and have the potential to alleviate traffic congestion. These attractive services are challenging from different perspectives. One major challenge in such systems is to find suitable ride-sharing matchings between drivers and passengers with respect to the system objective and constraints, and to provide an optimal pickup and drop-off sequence to the drivers. In this paper, we develop an efficient dynamic tree algorithm to find the optimal pickup and drop-off sequence. The algorithm finds an initial solution to the problem, keeps track of previously explored feasible solutions, and reduces the solution search space when considering new requests. In addition, an efficient pre-processing procedure to select candidate passenger requests is proposed, which further improves the algorithm performance. Numerical experiments are conducted on a real-size network to illustrate the efficiency of our algorithm. Sensitivity analysis suggests that small vehicle capacities and loose excess travel time constraints do not guarantee overall savings in vehicle kilometers traveled.

Submitted 22 May, 2020; originally announced May 2020.

Comments: hEART 2020 : 9th Symposium of the European Association for Research in Transportation

arXiv:2005.04463 [pdf, ps, other]

Vehicle Re-Identification Based on Complementary Features

Authors: Cunyuan Gao, Yi Hu, Yi Zhang, Rui Yao, Yong Zhou, Jiaqi Zhao

Abstract: In this work, we present our solution to the vehicle re-identification (vehicle Re-ID) track in AI City Challenge 2020 (AIC2020). The purpose of vehicle Re-ID is to retrieve the same vehicle appearing across multiple cameras, and it could make a great contribution to the Intelligent Traffic System (ITS) and smart city. Due to the vehicle's orientation, lighting and inter-class similarity, it is difficult to achieve a robust and discriminative feature representation. For the vehicle Re-ID track in AIC2020, our method is to fuse features extracted from different networks in order to take advantage of these networks and achieve complementary features. For each single model, several methods such as multi-loss, filter grafting, and semi-supervised learning are used to increase the representation ability as much as possible. Top performance in City-Scale Multi-Camera Vehicle Re-Identification demonstrated the advantage of our methods, and we achieved 5th place in the vehicle Re-ID track of AIC2020. The code is available at https://github.com/gggcy/AIC2020_ReID.

Submitted 9 May, 2020; originally announced May 2020.

arXiv:2001.01168 [pdf, other]

doi 10.1109/TIP.2023.3277794

Facial Action Unit Detection via Adaptive Attention and Relation

Authors: Zhiwen Shao, Yong Zhou, Jianfei Cai, Hancheng Zhu, Rui Yao

Abstract: Facial action unit (AU) detection is challenging due to the difficulty in capturing correlated information from subtle and dynamic AUs. Existing methods often resort to the localization of correlated regions of AUs, in which predefining local AU attentions by correlated facial landmarks often discards essential parts, or learning global attention maps often contains irrelevant areas. Furthermore, existing relational reasoning methods often employ common patterns for all AUs while ignoring the specific way of each AU. To tackle these limitations, we propose a novel adaptive attention and relation (AAR) framework for facial AU detection. Specifically, we propose an adaptive attention regression network to regress the global attention map of each AU under the constraint of attention predefinition and the guidance of AU detection, which is beneficial for capturing both specified dependencies by landmarks in strongly correlated regions and facial globally distributed dependencies in weakly correlated regions. Moreover, considering the diversity and dynamics of AUs, we propose an adaptive spatio-temporal graph convolutional network to simultaneously reason the independent pattern of each AU, the inter-dependencies among AUs, as well as the temporal dependencies. Extensive experiments show that our approach (i) achieves competitive performance on challenging benchmarks including BP4D, DISFA, and GFT in constrained scenarios and Aff-Wild2 in unconstrained scenarios, and (ii) can precisely learn the regional correlation distribution of each AU.

Submitted 16 May, 2023; v1 submitted 5 January, 2020; originally announced January 2020.

Comments: This paper has been accepted by IEEE Transactions on Image Processing (TIP)

arXiv:1912.12395 [pdf, ps, other]

OpenRadar: A Toolkit for Prototyping mmWave Radar Applications

Authors: Arjun Gupta, Dashiell Kosaka, Edwin Pan, Jingning Tang, Ruihao Yao, Sanjay Patel

Abstract: Millimeter-Wave (mmWave) radar sensors are gaining popularity for their robust sensing and increasing imaging capabilities. However, current radar signal processing is hardware specific, which makes it impossible to build sensor agnostic solutions. OpenRadar serves as an interface to prototype, research, and benchmark solutions in a modular manner. This enables creating software processing stacks in a way that has not yet been extensively explored. In the wake of increased AI adoption, OpenRadar can accelerate the growth of the combined fields of radar and AI. The OpenRadar API was released on Oct 2, 2019 as an open-source package under the Apache 2.0 license. The codebase exists at https://github.com/presenseradar/openradar.

Submitted 27 December, 2019; originally announced December 2019.

MSC Class: I.2.0; I.5.4; J.7 ACM Class: I.2.0; I.5.4; J.7

arXiv:1912.00398 [pdf, other]

Deep Human Answer Understanding for Natural Reverse QA

Authors: Rujing Yao, Linlin Hou, Lei Yang, Jie Gui, Qing Yin, Ou Wu

Abstract: This study focuses on a reverse question answering (QA) procedure, in which machines proactively raise questions and humans supply the answers. This procedure exists in many real human-machine interaction applications. However, a crucial problem in human-machine interaction is answer understanding. The existing solutions have relied on mandatory option term selection to avoid automatic answer understanding. However, these solutions have led to unnatural human-computer interaction and negatively affected user experience. To this end, the current study proposes a novel deep answer understanding network, called AntNet, for reverse QA. The network consists of three new modules, namely, skeleton attention for questions, relevance-aware representation of answers, and multi-hop based fusion. As answer understanding for reverse QA has not been explored, a new data corpus is compiled in this study. Experimental results indicate that our proposed network is significantly better than existing methods and those modified from classical natural language processing deep models. The effectiveness of the three new modules is also verified.

Submitted 28 November, 2020; v1 submitted 1 December, 2019; originally announced December 2019.

arXiv:1911.13096 [pdf]

Method and Dataset Mining in Scientific Papers

Authors: Rujing Yao, Linlin Hou, Yingchun Ye, Ou Wu, Ji Zhang, Jian Wu

Abstract: Literature analysis helps researchers better understand the development of science and technology. Conventional literature analysis focuses on the topics, authors, abstracts, keywords, references, etc., and rarely pays attention to the content of papers. In the field of machine learning, the involved methods (M) and datasets (D) are key information in papers. The extraction and mining of M and D are useful for discipline analysis and algorithm recommendation. In this paper, we propose a novel entity recognition model, called MDER, and construct datasets from the papers of the PAKDD conferences (2009-2019). Some preliminary experiments are conducted to assess the extraction performance, and the mining results are visualized.

Submitted 29 November, 2019; originally announced November 2019.

arXiv:1910.13174 [pdf, other]

Autonomous UAV Landing System Based on Visual Navigation

Authors: Zhixin Wu, Peng Han, Ruiwen Yao, Lei Qiao, Weidong Zhang, Tielong Shen, Min Sun, Yilong Zhu, Ming Liu, Rui Fan

Abstract: In this paper, we present an autonomous unmanned aerial vehicle (UAV) landing system based on visual navigation. We design the landmark as a topological pattern in order to enable the UAV to distinguish the landmark from the environment easily. In addition, a dynamic thresholding method is developed for image binarization to improve detection efficiency. The relative distance in the horizontal plane is calculated according to effective image information, and the relative height is obtained using a linear interpolation method. The landing experiments are performed on a static and a moving platform, respectively. The experimental results illustrate that our proposed landing system performs robustly and accurately.

Submitted 29 October, 2019; originally announced October 2019.

Comments: 6 pages, 13 figures, 2019 IEEE International Conference on Imaging Systems and Techniques (IST)

arXiv:1910.13055 [pdf, other]

PT-ResNet: Perspective Transformation-Based Residual Network for Semantic Road Image Segmentation

Authors: Rui Fan, Yuan Wang, Lei Qiao, Ruiwen Yao, Peng Han, Weidong Zhang, Ioannis Pitas, Ming Liu

Abstract: Semantic road region segmentation is a high-level task, which paves the way towards road scene understanding. This paper presents a residual network trained for semantic road segmentation. Firstly, we represent the projections of road disparities in the v-disparity map as a linear model, which can be estimated by optimizing the v-disparity map using dynamic programming. This linear model is then utilized to reduce the redundant information in the left and right road images. The right image is also transformed into the left perspective view, which greatly enhances the road surface similarity between the two images. Finally, the processed stereo images and their disparity maps are concatenated to create a set of 3D images, which are then utilized to train our neural network. The experimental results illustrate that our network achieves a maximum F1-measure of approximately 91.19% when analyzing the images from the KITTI road dataset.

Submitted 28 October, 2019; originally announced October 2019.

Comments: 5 pages, 5 figures, accepted by 2019 IEEE International Conference on Imaging Systems and Techniques (IST)

arXiv:1909.05891 [pdf, other]

Traffic-aware Two-stage Queueing Communication Networks: Queue Analysis and Energy Saving

Authors: Nan Qi, Nikolaos I. Miridakis, Ming Xiao, Theodoros A. Tsiftsis, Rugui Yao, Shi Jin

Abstract: To boost energy saving for general delay-tolerant IoT networks, a two-stage and single-relay queueing communication scheme is investigated. Concretely, a traffic-aware $N$-threshold and gated-service policy are applied at the relay. As two fundamental and significant performance metrics, the mean waiting time and long-term expected power consumption are explicitly derived and related to the queueing and service parameters, such as packet arrival rate, service threshold and channel statistics. Besides, we take into account the electrical circuit energy consumption when the relay server and access point (AP) are in different modes, as well as the energy costs of mode transitions, whereby the power consumption model is more practical. The expected power minimization problem under the mean waiting time constraint is formulated. Tight closed-form bounds are adopted to obtain tractable analytical formulae with less computational complexity. The optimal energy-saving service threshold that can flexibly adjust to the packet arrival rate is determined. In addition, numerical results reveal that: 1) sacrificing the mean waiting time does not necessarily facilitate power savings; 2) a higher arrival rate leads to a greater optimal service threshold; and 3) our policy performs better than the current state-of-the-art.

Submitted 14 February, 2020; v1 submitted 12 September, 2019; originally announced September 2019.

arXiv:1904.09172 [pdf, other]

Video Object Segmentation and Tracking: A Survey

Authors: Rui Yao, Guosheng Lin, Shixiong Xia, Jiaqi Zhao, Yong Zhou

Abstract: Object segmentation and object tracking are fundamental research areas in the computer vision community. Both topics face common challenges that are difficult to handle, such as occlusion, deformation, motion blur, and scale variation. The former must deal with heterogeneous objects, interacting objects, edge ambiguity, and shape complexity. The latter suffers from difficulties in handling fast motion, out-of-view, and real-time processing. Combining the two problems of video object segmentation and tracking (VOST) can overcome their respective difficulties and improve their performance. VOST can be widely applied to many practical applications such as video summarization, high definition video compression, human computer interaction, and autonomous vehicles. This article aims to provide a comprehensive review of the state-of-the-art tracking methods, classify these methods into different categories, and identify new trends. First, we provide a hierarchical categorization of existing approaches, including unsupervised VOS, semi-supervised VOS, interactive VOS, weakly supervised VOS, and segmentation-based tracking methods. Second, we provide a detailed discussion and overview of the technical characteristics of the different methods. Third, we summarize the characteristics of the related video datasets, and provide a variety of evaluation metrics. Finally, we point out a set of interesting future works and draw our own conclusions.

Submitted 26 April, 2019; v1 submitted 19 April, 2019; originally announced April 2019.

arXiv:1903.02351 [pdf, other]

CANet: Class-Agnostic Segmentation Networks with Iterative Refinement and Attentive Few-Shot Learning

Authors: Chi Zhang, Guosheng Lin, Fayao Liu, Rui Yao, Chunhua Shen

Abstract: Recent progress in semantic segmentation is driven by deep Convolutional Neural Networks and large-scale labeled image datasets. However, data labeling for pixel-wise segmentation is tedious and costly. Moreover, a trained model can only make predictions within a set of pre-defined classes. In this paper, we present CANet, a class-agnostic segmentation network that performs few-shot segmentation on new classes with only a few annotated images available. Our network consists of a two-branch dense comparison module which performs multi-level feature comparison between the support image and the query image, and an iterative optimization module which iteratively refines the predicted results. Furthermore, we introduce an attention mechanism to effectively fuse information from multiple support examples under the setting of k-shot learning. Experiments on PASCAL VOC 2012 show that our method achieves a mean Intersection-over-Union score of 55.4% for 1-shot segmentation and 57.1% for 5-shot segmentation, outperforming state-of-the-art methods by a large margin of 14.6% and 13.2%, respectively.

Submitted 6 March, 2019; originally announced March 2019.

Comments: Accepted to CVPR 2019

arXiv:1707.00548 [pdf, other]

Efficient Eye Typing with 9-direction Gaze Estimation

Authors: Chi Zhang, Rui Yao, Jinpeng Cai

Abstract: Vision based text entry systems aim to help disabled people achieve text communication using eye movement. Most previous methods have employed an existing eye tracker to predict gaze direction and design an input method based upon that. However, these methods can result in eye tracking quality becoming easily affected by various factors and lengthy amounts of time for calibration. Our paper presents a novel efficient gaze based text input method, which has the advantage of low cost and robustness. Users can type in words by looking at an on-screen keyboard and blinking. Rather than estimate gaze angles directly to track eyes, we introduce a method that divides the human gaze into nine directions. This method can effectively improve the accuracy of making a selection by gaze and blinks. We build a Convolutional Neural Network (CNN) model for 9-direction gaze estimation. On the basis of the 9-direction gaze, we use a nine-key T9 input method which is widely used in candy bar phones. Bar phones were very popular in the world decades ago and have cultivated strong user habits and language models. To train a robust gaze estimator, we created a large-scale dataset with images of eyes sourced from 25 people. According to the results from our experiments, our CNN model is able to accurately estimate different people's gaze under various lighting conditions by different devices. In consideration of disabled people's needs, we removed the complex calibration process. The input methods can run in screen mode and portable off-screen mode. Moreover, the datasets used in our experiments are made available to the community to allow further experimentation.

Submitted 3 July, 2017; originally announced July 2017.

arXiv:1705.01671 [pdf, other]

Towards Simulation and Risk Assessment of Weather-Related Cascading Outages

Authors: Rui Yao, Kai Sun

Abstract: Weather and environmental factors are verified to have played significant roles in historical major cascading outages and blackouts. Therefore, in the simulation and risk assessment of cascading outages in power systems, it is necessary to consider the weather and environmental effects. This paper proposes a method for the risk assessment of weather-related cascading outages. Based on the analysis of historical outage records and temperature-dependent physical outage mechanisms of transmission lines, an outage rate model considering weather condition and conductor temperature is proposed, and the analytical form of the outage probability of lines is derived. With the weather-dependent outage model, a two-stage risk assessment method based on Markovian tree (MT) search is proposed, which consists of offline full assessment, and online efficient update of risk assessment results and continued MT search using updated NWP data. The test cases on the NPCC 140-bus test system model in winter and summer scenarios verify the advantages of the proposed risk assessment method in both accuracy and efficiency.

Submitted 3 May, 2017; originally announced May 2017.

Showing 1–50 of 54 results for author: Yao, R