LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

Authors: Penghui Du, Yu Wang, Yifan Sun, Luting Wang, Yue Liao, Gang Zhang, Errui Ding, Yan Wang, Jingdong Wang, Si Liu

Abstract: Existing methods enhance open-vocabulary object detection by leveraging the robust open-vocabulary recognition capabilities of Vision-Language Models (VLMs), such as CLIP. However, two main challenges emerge: (1) A deficiency in concept representation, where the category names in CLIP's text space lack textual and visual knowledge. (2) An overfitting tendency towards base categories, with the open vocabulary knowledge biased towards base categories during the transfer from VLMs to detectors. To address these challenges, we propose the Language Model Instruction (LaMI) strategy, which leverages the relationships between visual concepts and applies them within a simple yet effective DETR-like detector, termed LaMI-DETR. LaMI utilizes GPT to construct visual concepts and employs T5 to investigate visual similarities across categories. These inter-category relationships refine concept representation and avoid overfitting to base categories. Comprehensive experiments validate our approach's superior performance over existing methods in the same rigorous setting without reliance on external training resources. LaMI-DETR achieves a rare box AP of 43.4 on OV-LVIS, surpassing the previous best by 7.8 rare box AP.

Submitted 15 July, 2024; originally announced July 2024.

Comments: ECCV2024

arXiv:2406.04785 [pdf, other]

Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction

Authors: Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang

Abstract: Nowadays, large language models (LLMs) are published as a service and can be accessed by various applications via APIs, also known as language-model-as-a-service (LMaaS). Without knowing the generation length of requests, existing serving systems serve requests in a first-come, first-served (FCFS) manner with a fixed batch size, which leads to two problems that affect batch serving efficiency. First, the generation lengths of requests in a batch vary, and requests with short generation lengths must wait for requests with long generation lengths to finish during the batch serving procedure. Second, requests with longer generation lengths consume more memory during serving. Without knowing the generation lengths of batched requests, the batch size is always set small to avoid the out-of-memory (OOM) error, thus preventing the GPU from being fully utilized. In this paper, we find that a significant number of popular applications in the LMaaS scenario have a positive correlation between the generation length and the length of raw user input. Based on this observation, we propose Magnus, which can accurately predict the request generation length with the user input length, application-level, and user-level semantic features. Accordingly, Magnus can achieve high request throughput by batching requests of similar generation lengths together with adaptive batch sizes. Besides, Magnus can also schedule batches with the highest response ratio next (HRRN) policy to reduce request response time. Experiments conducted on our testbed show that Magnus improves request throughput by up to 234% and reduces response time by up to 89.7% compared to baselines.

Submitted 7 June, 2024; originally announced June 2024.

Comments: 12 pages, 14 figures
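
Note on the HRRN policy named in the abstract above: highest response ratio next is a classic scheduling rule that picks the waiting job with the largest ratio (waiting time + service time) / service time, so short jobs are favoured while long-waiting jobs still rise in priority over time. The sketch below illustrates only the generic rule. The names `Batch`, `response_ratio`, and `pick_next` are hypothetical, and using an estimated service time as a stand-in for a predicted generation length is an assumption, not a description of how Magnus is implemented.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    # Hypothetical record for a queued batch; field names are illustrative only.
    name: str
    arrival_time: float            # when the batch entered the queue
    estimated_service_time: float  # assumed proxy for predicted generation time


def response_ratio(batch: Batch, now: float) -> float:
    """Classic HRRN ratio: (waiting time + service time) / service time."""
    waiting = now - batch.arrival_time
    return (waiting + batch.estimated_service_time) / batch.estimated_service_time


def pick_next(queue: list[Batch], now: float) -> Batch:
    """Return the queued batch with the highest response ratio."""
    return max(queue, key=lambda b: response_ratio(b, now))


if __name__ == "__main__":
    queue = [
        Batch("long", arrival_time=0.0, estimated_service_time=10.0),
        Batch("short", arrival_time=4.0, estimated_service_time=2.0),
    ]
    # At t=6: "long" has ratio (6+10)/10 = 1.6, "short" has (2+2)/2 = 2.0.
    print(pick_next(queue, now=6.0).name)
```

Because the ratio grows with waiting time, a long batch that keeps being passed over will eventually outrank newly arrived short batches, which is the usual argument for HRRN over pure shortest-job-first.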

arXiv:2406.02822 [pdf, other]

W-RIZZ: A Weakly-Supervised Framework for Relative Traversability Estimation in Mobile Robotics

Authors: Andre Schreiber, Arun N. Sivakumar, Peter Du, Mateus V. Gasparino, Girish Chowdhary, Katherine Driggs-Campbell

Abstract: Successful deployment of mobile robots in unstructured domains requires an understanding of the environment and terrain to avoid hazardous areas, getting stuck, and colliding with obstacles. Traversability estimation--which predicts where in the environment a robot can travel--is one prominent approach that tackles this problem. Existing geometric methods may ignore important semantic considerations, while semantic segmentation approaches involve a tedious labeling process. Recent self-supervised methods reduce labeling tedium, but require additional data or models and tend to struggle to explicitly label untraversable areas. To address these limitations, we introduce a weakly-supervised method for relative traversability estimation. Our method involves manually annotating the relative traversability of a small number of point pairs, which significantly reduces labeling effort compared to traditional segmentation-based methods and avoids the limitations of self-supervised methods. We further improve the performance of our method through a novel cross-image labeling strategy and loss function. We demonstrate the viability and performance of our method through deployment on a mobile robot in outdoor environments.

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by RA-L. Code is available at https://github.com/andreschreiber/W-RIZZ

arXiv:2404.02162 [pdf, other]

doi 10.1109/LWC.2024.3368652

LoS Sensing-based Channel Estimation in UAV-Assisted OFDM Systems

Authors: Chaojin Qing, Zhiying Liu, Wenquan Hu, Yinjie Zhang, Xi Cai, Pengfei Du

Abstract: In unmanned aerial vehicle (UAV)-assisted orthogonal frequency division multiplexing (OFDM) systems, the potential advantage of the line-of-sight (LoS) path, characterized by its high probability of existence, has not been fully harnessed, thereby impeding the improvement of channel estimation (CE) accuracy. Inspired by the ideas of integrated sensing and communication (ISAC), this letter develops a LoS sensing method aimed at detecting the presence of the LoS path. Leveraging the prior information obtained from LoS path detection, the detection thresholds for resolvable paths are proposed for LoS and Non-LoS (NLoS) scenarios, respectively. By employing these specifically designed detection thresholds, denoising processing is applied to classical least square (LS) CE, thereby improving the CE accuracy. Simulation results validate the effectiveness of the proposed method in enhancing CE accuracy and demonstrate its robustness against parameter variations.

Submitted 22 February, 2024; originally announced April 2024.

arXiv:2403.01928 [pdf, other]

ZSL-RPPO: Zero-Shot Learning for Quadrupedal Locomotion in Challenging Terrains using Recurrent Proximal Policy Optimization

Authors: Yao Zhao, Tao Wu, Yijie Zhu, Xiang Lu, Jun Wang, Haitham Bou-Ammar, Xinyu Zhang, Peng Du

Abstract: We present ZSL-RPPO, an improved zero-shot learning architecture that overcomes the limitations of teacher-student neural networks and enables generating robust, reliable, and versatile locomotion for quadrupedal robots in challenging terrains. We propose a new algorithm, RPPO (Recurrent Proximal Policy Optimization), that directly trains recurrent neural networks in partially observable environments and results in more robust training using domain randomization. Our locomotion controller supports extensive perturbation across simulation-to-reality transfer for both intrinsic and extrinsic physical parameters without further fine-tuning. This can avoid the significant decline of the student's performance during simulation-to-reality transfer and therefore enhance the robustness and generalization of the locomotion controller. We deployed our controller on the Unitree A1 and Aliengo robots in real environments, with exteroceptive perception provided by either a solid-state Lidar or a depth camera. Our locomotion controller was tested in various challenging terrains such as slippery surfaces, grassy terrain, and stairs. Our experimental results and comparison show that our approach significantly outperforms the state-of-the-art.

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.00561 [pdf, other]

Multi-Task Learning Using Uncertainty to Weigh Losses for Heterogeneous Face Attribute Estimation

Authors: Huaqing Yuan, Yi He, Peng Du, Lu Song

Abstract: Face images contain a wide variety of attribute information. In this paper, we propose a generalized framework for joint estimation of ordinal and nominal attributes based on information sharing. We tackle the correlation problem between heterogeneous attributes using hard parameter sharing of shallow features, and trade off multiple loss functions by considering homoskedastic uncertainty for each attribute estimation task. This leads to optimal estimation of multiple attributes of the face and reduces the training cost of multitask learning. Experimental results on benchmarks with multiple face attributes show that the proposed approach has superior performance compared to the state of the art. Finally, we discuss the bias issues arising from the proposed approach in face attribute estimation and validate its feasibility on edge systems.

Submitted 1 March, 2024; originally announced March 2024.
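
Note on weighing losses by homoskedastic uncertainty, as mentioned in the abstract above: the abstract does not give the exact loss, so the sketch below assumes the widely used formulation in which each task i has a learnable log-variance s_i = log(sigma_i^2) and contributes 0.5 * exp(-s_i) * L_i + 0.5 * s_i to the total. The function name `uncertainty_weighted_loss` and the example numbers are hypothetical; in practice the log-variances would be trainable parameters in a deep learning framework rather than plain floats.

```python
import math


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine per-task losses using assumed homoskedastic-uncertainty weights.

    Each task contributes 0.5 * exp(-s) * loss + 0.5 * s, where s = log(sigma^2).
    A large s down-weights a noisy task, while the 0.5 * s term penalises
    inflating the variance without bound.
    """
    if len(task_losses) != len(log_variances):
        raise ValueError("need one log-variance per task loss")
    total = 0.0
    for loss, s in zip(task_losses, log_variances):
        total += 0.5 * math.exp(-s) * loss + 0.5 * s
    return total


if __name__ == "__main__":
    # Hypothetical losses for one ordinal task (e.g. age) and one nominal task.
    losses = [1.0, 2.0]
    # With s = 0 for both tasks, every weight is 0.5 and the penalty term is 0.
    print(uncertainty_weighted_loss(losses, [0.0, 0.0]))  # 1.5
```

The appeal of this scheme is that the relative task weights are learned jointly with the network instead of being tuned by hand, which fits the abstract's claim of reduced training cost.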

arXiv:2402.11139 [pdf, other]

LiGNN: Graph Neural Networks at LinkedIn

Authors: Fedor Borisyuk, Shihai He, Yunbo Ouyang, Morteza Ramezani, Peng Du, Xiaochen Hou, Chengming Jiang, Nitin Pasumarthy, Priya Bannur, Birjodh Tiwana, Ping Liu, Siddharth Dangi, Daqi Sun, Zhoutao Pei, Xiao Shi, Sirou Zhu, Qianqi Shen, Kuang-Hsuan Lee, David Stein, Baolei Li, Haichao Wei, Amol Ghoting, Souvik Ghosh

Abstract: In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) Framework. We share our insight on the development and deployment of GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embeddings and multi-hop neighbor sampling. We explain how we built and sped up by 7x our large-scale training on LinkedIn graphs with adaptive sampling of neighbors, grouping and slicing of training data batches, specialized shared-memory queue and local gradient optimization. We summarize our deployment lessons and learnings gathered from A/B test experiments. The techniques presented in this work have contributed to approximate relative improvements of 1% of Job application hearing back rate, 2% Ads CTR lift, 0.5% of Feed engaged daily active users, 0.2% session lift and 0.1% weekly active user lift from people recommendation. We believe that this work can provide practical solutions and insights for engineers who are interested in applying Graph neural networks at large scale.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.06329 [pdf]

A Network for structural dense displacement based on 3D deformable mesh model and optical flow

Authors: Peimian Du, Qicheng Guo, Yanru Li

Abstract: This study proposes a network to recognize the displacement of an RC frame structure from video captured by a monocular camera. The proposed network consists of two modules, FlowNet2 and POFRN-Net. FlowNet2 converts two video frames into dense optical flow, and POFRN-Net takes the dense optical flow from FlowNet2 as input and outputs the pose parameter H. The displacement of any point of the structure can be calculated from the parameter H. The Fast Fourier Transform (FFT) is applied to obtain the frequency-domain signal from the corresponding displacement signal. Furthermore, a comparison against the true displacement on the first floor in the first video is shown in this study. Finally, the predicted displacements on the four floors of the RC frame structure for the three given videos are presented at the end of this study.

Submitted 9 February, 2024; originally announced February 2024.

Comments: Paper for the 3rd International Competition for Structural Health Monitoring (IC-SHM 2022): 15 pages, 13 figures

arXiv:2402.02547 [pdf]

Integration of cognitive tasks into artificial general intelligence test for large models

Authors: Youzhi Qu, Chen Wei, Penghui Du, Wenxin Che, Chi Zhang, Wanli Ouyang, Yatao Bian, Feiyang Xu, Bin Hu, Kai Du, Haiyan Wu, Jia Liu, Quanying Liu

Abstract: During the evolution of large models, performance evaluation is necessarily performed to assess their capabilities and ensure safety before practical application. However, current model evaluations mainly rely on specific tasks and datasets, lacking a unified framework for assessing the multidimensional intelligence of large models. In this perspective, we advocate for a comprehensive framework of cognitive science-inspired artificial general intelligence (AGI) tests, aimed at fulfilling the testing needs of large models with enhanced capabilities. The cognitive science-inspired AGI tests encompass the full spectrum of intelligence facets, including crystallized intelligence, fluid intelligence, social intelligence, and embodied intelligence. To assess the multidimensional intelligence of large models, the AGI tests consist of a battery of well-designed cognitive tests adopted from human intelligence tests, which are then naturally encapsulated into an immersive virtual community. We propose increasing the complexity of AGI testing tasks commensurate with advancements in large models and emphasizing the necessity for the interpretation of test results to avoid false negatives and false positives. We believe that cognitive science-inspired AGI tests will effectively guide the targeted improvement of large models in specific dimensions of intelligence and accelerate the integration of large models into human society.

Submitted 5 March, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

arXiv:2401.11679 [pdf, other]

Simulating Nighttime Visible Satellite Imagery of Tropical Cyclones Using Conditional Generative Adversarial Networks

Authors: Jinghuai Yao, Puyuan Du, Yucheng Zhao, Yubo Wang

Abstract: Visible (VIS) imagery of satellites has various important applications in meteorology, including monitoring Tropical Cyclones (TCs). However, it is unavailable at night because of the lack of sunlight. This study presents a Conditional Generative Adversarial Networks (CGAN) model that generates highly accurate nighttime visible reflectance using infrared (IR) bands and sunlight direction parameters as input. The model was trained and validated using target area observations of the Advanced Himawari Imager (AHI) in the daytime. This study also presents the first nighttime model validation using the Day/Night Band (DNB) of the Visible/Infrared Imager Radiometer Suite (VIIRS). The daytime statistical results of the Structural Similarity Index Measure (SSIM), Peak Signal-to-Noise Ratio (PSNR), Root Mean Square Error (RMSE), Correlation Coefficient (CC), and Bias are 0.885, 28.3, 0.0428, 0.984, and -0.0016 respectively, completely surpassing the model performance of previous studies. The nighttime statistical results of SSIM, PSNR, RMSE, and CC are 0.821, 24.4, 0.0643, and 0.969 respectively, which are slightly negatively impacted by the parallax between satellites. We performed full-disk model validation which proves our model could also be readily applied in the tropical ocean without TCs in the northern hemisphere. This model contributes to the nighttime monitoring of meteorological phenomena by providing accurate AI-generated visible imagery with adjustable virtual sunlight directions.

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2312.10317 [pdf, other]

Spatial-Temporal DAG Convolutional Networks for End-to-End Joint Effective Connectivity Learning and Resting-State fMRI Classification

Authors: Rui Yang, Wenrui Dai, Huajun She, Yiping P. Du, Dapeng Wu, Hongkai Xiong

Abstract: Building comprehensive brain connectomes has proved of fundamental importance in resting-state fMRI (rs-fMRI) analysis. Based on the foundation of brain network, spatial-temporal-based graph convolutional networks have dramatically improved the performance of deep learning methods in rs-fMRI time series classification. However, existing works either pre-define the brain network as the correlation matrix derived from the raw time series or jointly learn the connectome and model parameters without any topology constraint. These methods could suffer from degraded classification performance caused by the deviation from the intrinsic brain connectivity and lack biological interpretability of demonstrating the causal structure (i.e., effective connectivity) among brain regions. Moreover, most existing methods for effective connectivity learning are unaware of the downstream classification task and cannot sufficiently exploit useful rs-fMRI label information. To address these issues in an end-to-end manner, we model the brain network as a directed acyclic graph (DAG) to discover direct causal connections between brain regions and propose Spatial-Temporal DAG Convolutional Network (ST-DAGCN) to jointly infer effective connectivity and classify rs-fMRI time series by learning brain representations based on nonlinear structural equation model. The optimization problem is formulated into a continuous program and solved with score-based learning method via gradient descent. We evaluate ST-DAGCN on two public rs-fMRI databases. Experiments show that ST-DAGCN outperforms existing models by evident margins in rs-fMRI classification and simultaneously learns meaningful edges of effective connectivity that help understand brain activity patterns and pathological mechanisms in brain disease.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by NeurIPS 2023 Temporal Graph Learning Workshop

arXiv:2309.07394 [pdf, other]

doi 10.1109/TMI.2023.3309971

Nucleus-aware Self-supervised Pretraining Using Unpaired Image-to-image Translation for Histopathology Images

Authors: Zhiyun Song, Penghui Du, Junpeng Yan, Kailu Li, Jianzhong Shou, Maode Lai, Yubo Fan, Yan Xu

Abstract: Self-supervised pretraining attempts to enhance model performance by obtaining effective features from unlabeled data, and has demonstrated its effectiveness in the field of histopathology images. Despite its success, few works concentrate on the extraction of nucleus-level information, which is essential for pathologic analysis. In this work, we propose a novel nucleus-aware self-supervised pretraining framework for histopathology images. The framework aims to capture the nuclear morphology and distribution information through unpaired image-to-image translation between histopathology images and pseudo mask images. The generation process is modulated by both conditional and stochastic style representations, ensuring the reality and diversity of the generated histopathology images for pretraining. Further, an instance segmentation guided strategy is employed to capture instance-level information. The experiments on 7 datasets show that the proposed pretraining method outperforms supervised ones on Kather classification, multiple instance learning, and 5 dense-prediction tasks with the transfer learning protocol, and yields better results than other self-supervised approaches on 8 semi-supervised tasks. Our project is publicly available at https://github.com/zhiyuns/UNITPathSSL.

Submitted 13 September, 2023; originally announced September 2023.

arXiv:2308.14353 [pdf, other]

ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models

Authors: Baoli Zhang, Haining Xie, Pengfan Du, Junhao Chen, Pengfei Cao, Yubo Chen, Shengping Liu, Kang Liu, Jun Zhao

Abstract: The unprecedented performance of large language models (LLMs) requires comprehensive and accurate evaluation. We argue that for LLM evaluation, benchmarks need to be comprehensive and systematic. To this end, we propose the ZhuJiu benchmark, which has the following strengths: (1) Multi-dimensional ability coverage: We comprehensively evaluate LLMs across 7 ability dimensions covering 51 tasks. In particular, we also propose a new benchmark that focuses on the knowledge ability of LLMs. (2) Multi-faceted evaluation methods collaboration: We use 3 different yet complementary evaluation methods to comprehensively evaluate LLMs, which can ensure the authority and accuracy of the evaluation results. (3) Comprehensive Chinese benchmark: ZhuJiu is the pioneering benchmark that fully assesses LLMs in Chinese, while also providing equally robust evaluation abilities in English. (4) Avoiding potential data leakage: To avoid data leakage, we construct evaluation data specifically for 37 tasks. We evaluate 10 current mainstream LLMs and conduct an in-depth discussion and analysis of their results. The ZhuJiu benchmark and open-participation leaderboard are publicly released at http://www.zhujiu-benchmark.com/ and we also provide a demo video at https://youtu.be/qypkJ89L1Ic.

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.11874 [pdf, other]

Semi-Supervised Learning via Weight-aware Distillation under Class Distribution Mismatch

Authors: Pan Du, Suyun Zhao, Zisen Sheng, Cuiping Li, Hong Chen

Abstract: Semi-Supervised Learning (SSL) under class distribution mismatch aims to tackle a challenging problem wherein unlabeled data contain lots of unknown categories unseen in the labeled ones. In such mismatch scenarios, traditional SSL suffers severe performance damage due to the harmful invasion of the instances with unknown categories into the target classifier. In this study, by strict mathematical reasoning, we reveal that the SSL error under class distribution mismatch is composed of pseudo-labeling error and invasion error, both of which jointly bound the SSL population risk. To alleviate the SSL error, we propose a robust SSL framework called Weight-Aware Distillation (WAD) that, by weights, selectively transfers knowledge beneficial to the target task from unsupervised contrastive representation to the target classifier. Specifically, WAD captures adaptive weights and high-quality pseudo labels to target instances by exploring point mutual information (PMI) in representation space to maximize the role of unlabeled data and filter unknown categories. Theoretically, we prove that WAD has a tight upper bound of population risk under class distribution mismatch. Experimentally, extensive results demonstrate that WAD outperforms five state-of-the-art SSL approaches and one standard baseline on two benchmark datasets, CIFAR10 and CIFAR100, and an artificial cross-dataset. The code is available at https://github.com/RUC-DWBI-ML/research/tree/main/WAD-master.

Submitted 22 August, 2023; originally announced August 2023.

Comments: ICCV 2023

arXiv:2307.10730 [pdf, other]

Joint Port Selection Based Channel Acquisition for FDD Cell-Free Massive MIMO

Authors: Cheng Zhang, Pengguang Du, Minjie Ding, Yindi Jing, Yongming Huang

Abstract: In frequency division duplexing (FDD) cell-free massive MIMO, the acquisition of the channel state information (CSI) is very challenging because of the large overhead required for the training and feedback of the downlink channels of multiple cooperating base stations (BSs). In this paper, for systems with partial uplink-downlink channel reciprocity, and a general spatial domain channel model with variations in the average port power and correlation among port coefficients, we propose a joint-port-selection-based CSI acquisition and feedback scheme for the downlink transmission with zero-forcing precoding. The scheme uses an eigenvalue-decomposition-based transformation to reduce the feedback overhead by exploring the port correlation. We derive the sum-rate of the system for any port selection. Based on the sum-rate result, we propose a low-complexity greedy-search-based joint port selection (GS-JPS) algorithm. Moreover, to adapt to fast time-varying scenarios, a supervised deep learning-enhanced joint port selection (DL-JPS) algorithm is proposed. Simulations verify the effectiveness of our proposed schemes and their advantage over existing port-selection channel acquisition schemes.

Submitted 12 January, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: 15 pages, 11 figures. The paper has been accepted by IEEE TRANSACTIONS ON COMMUNICATIONS

arXiv:2304.00367 [pdf, other]

Conveying Autonomous Robot Capabilities through Contrasting Behaviour Summaries

Authors: Peter Du, Surya Murthy, Katherine Driggs-Campbell

Abstract: As advances in artificial intelligence enable increasingly capable learning-based autonomous agents, it becomes more challenging for human observers to efficiently construct a mental model of the agent's behaviour. In order to successfully deploy autonomous agents, humans should not only be able to understand the individual limitations of the agents but also have insight on how they compare against one another. To do so, we need effective methods for generating human interpretable agent behaviour summaries. Single agent behaviour summarization has been tackled in the past through methods that generate explanations for why an agent chose to pick a particular action at a single timestep. However, for complex tasks, a per-action explanation may not be able to convey an agent's global strategy. As a result, researchers have looked towards multi-timestep summaries which can better help humans assess an agent's overall capability. More recently, multi-step summaries have also been used for generating contrasting examples to evaluate multiple agents. However, past approaches have largely relied on unstructured search methods to generate summaries and require agents to have a discrete action space. In this paper we present an adaptive search method for efficiently generating contrasting behaviour summaries with support for continuous state and action spaces. We perform a user study to evaluate the effectiveness of the summaries for helping humans discern the superior autonomous agent for a given task. Our results indicate that adaptive search can efficiently identify informative contrasting scenarios that enable humans to accurately select the better performing agent with a limited observation time budget.

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2304.00365 [pdf, other]

Adaptive Failure Search Using Critical States from Domain Experts

Authors: Peter Du, Katherine Driggs-Campbell

Abstract: Uncovering potential failure cases is a crucial step in the validation of safety critical systems such as autonomous vehicles. Failure search may be done through logging substantial vehicle miles in either simulation or real world testing. Due to the sparsity of failure events, naive random search approaches require significant amounts of vehicle operation hours to find potential system weaknesses. As a result, adaptive searching techniques have been proposed to efficiently explore and uncover failure trajectories of an autonomous policy in simulation. Adaptive Stress Testing (AST) is one such method that poses the problem of failure search as a Markov decision process and uses reinforcement learning techniques to find high probability failures. However, this formulation requires a probability model for the actions of all agents in the environment. In systems where the environment actions are discrete and dependencies among agents exist, it may be infeasible to fully characterize the distribution or find a suitable proxy. This work proposes the use of a data driven approach to learn a suitable classifier that tries to model how humans identify critical states and use this to guide failure search in AST. We show that the incorporation of critical states into the AST framework generates failure scenarios with increased safety violations in an autonomous driving policy with a discrete action space.

Submitted 1 April, 2023; originally announced April 2023.

Comments: Appears in IEEE ICRA 2021

arXiv:2303.05892 [pdf, other]

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection

Authors: Luting Wang, Yi Liu, Penghui Du, Zihan Ding, Yue Liao, Qiaosong Qi, Biaolong Chen, Si Liu

Abstract: Open-vocabulary object detection aims to provide object detectors trained on a fixed set of object categories with the generalizability to detect objects described by arbitrary text queries. Previous methods adopt knowledge distillation to extract knowledge from Pretrained Vision-and-Language Models (PVLMs) and transfer it to detectors. However, due to the non-adaptive proposal cropping and single-level feature mimicking processes, they suffer from information destruction during knowledge extraction and inefficient knowledge transfer. To remedy these limitations, we propose an Object-Aware Distillation Pyramid (OADP) framework, including an Object-Aware Knowledge Extraction (OAKE) module and a Distillation Pyramid (DP) mechanism. When extracting object knowledge from PVLMs, the former adaptively transforms object proposals and adopts object-aware mask attention to obtain precise and complete knowledge of objects. The latter introduces global and block distillation for more comprehensive knowledge transfer to compensate for the missing relation information in object distillation. Extensive experiments show that our method achieves significant improvement compared to current methods. Especially on the MS-COCO dataset, our OADP framework reaches $35.6$ mAP$^{\text{N}}_{50}$, surpassing the current state-of-the-art method by $3.3$ mAP$^{\text{N}}_{50}$. Code is released at https://github.com/LutingWang/OADP.

Submitted 10 March, 2023; originally announced March 2023.

Comments: Accepted by CVPR 2023

arXiv:2212.14189 [pdf, other]

High Resolution Modeling and Analysis of Cryptocurrency Mining's Impact on Power Grids: Carbon Footprint, Reliability, and Electricity Price

Authors: Ali Menati, Xiangtian Zheng, Kiyeob Lee, Ranyu Shi, Pengwei Du, Chanan Singh, Le Xie

Abstract: Blockchain technologies are considered one of the most disruptive innovations of the last decade, enabling secure decentralized trust-building. However, in recent years, with the rapid increase in the energy consumption of blockchain-based computations for cryptocurrency mining, there have been growing concerns about their sustainable operation in electric grids. This paper investigates the tri-factor impact of such large loads on carbon footprint, grid reliability, and electricity market price in the Texas grid. We release open-source high-resolution data to enable high-resolution modeling of influencing factors such as location and flexibility. We reveal that the per-megawatt-hour carbon footprint of cryptocurrency mining loads across locations can vary by as much as 50% of the crude system average estimate. We show that the flexibility of mining loads can significantly mitigate power shortages and market disruptions that can result from the deployment of mining loads. These findings suggest that policymakers should facilitate the participation of large mining facilities in wholesale markets and require them to provide mandatory demand response.

Submitted 14 April, 2023; v1 submitted 29 December, 2022; originally announced December 2022.

Comments: This paper has been accepted for publication in the journal of "Advances in Applied Energy"

arXiv:2211.02864 [pdf]

BEKG: A Built Environment Knowledge Graph

Authors: Xiaojun Yang, Haoyu Zhong, Penglin Du, Keyi Zhou, Xingjin Lai, Zhengdong Wang, Yik Lun Lau, Yangqiu Song, Liyaning Tang

Abstract: Practices in the built environment have become more digitalized with the rapid development of modern design and construction technologies. However, the requirement of practitioners or scholars to gather complicated professional knowledge in the built environment has not been satisfied yet. In this paper, more than 80,000 paper abstracts in the built environment field were obtained to build a knowledge graph, a knowledge base storing entities and their connective relations in a graph-structured data model. To ensure the retrieval accuracy of the entities and relations in the knowledge graph, two well-annotated datasets have been created, containing 2,000 instances and 1,450 instances each in 29 relations for the named entity recognition task and relation extraction task respectively. These two tasks were solved by two BERT-based models trained on the proposed dataset. Both models attained an accuracy above 85% on these two tasks. More than 200,000 high-quality relations and entities were obtained using these models to extract all abstract data. Finally, this knowledge graph is presented as a self-developed visualization system to reveal relations between various entities in the domain. Both the source code and the annotated dataset can be found here: https://github.com/HKUST-KnowComp/BEKG.

Submitted 5 November, 2022; originally announced November 2022.

arXiv:2207.04028 [pdf, other]

CoCAtt: A Cognitive-Conditioned Driver Attention Dataset (Supplementary Material)

Authors: Yuan Shen, Niviru Wijayaratne, Pranav Sriram, Aamir Hasan, Peter Du, Katherine Driggs-Campbell

Abstract: The task of driver attention prediction has drawn considerable interest among researchers in robotics and the autonomous vehicle industry. Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events, like collisions and casualties. However, existing driver attention prediction models neglect the distraction state and intention of the driver, which can significantly influence how they observe their surroundings. To address these issues, we present a new driver attention dataset, CoCAtt (Cognitive-Conditioned Attention). Unlike previous driver attention datasets, CoCAtt includes per-frame annotations that describe the distraction state and intention of the driver. In addition, the attention data in our dataset is captured in both manual and autopilot modes using eye-tracking devices of different resolutions. Our results demonstrate that incorporating the above two driver states into attention modeling can improve the performance of driver attention prediction. To the best of our knowledge, this work is the first to provide autopilot attention data. Furthermore, CoCAtt is currently the largest and the most diverse driver attention dataset in terms of autonomy levels, eye tracker resolutions, and driving scenarios. CoCAtt is available for download at https://cocatt-dataset.github.io.

Submitted 8 July, 2022; originally announced July 2022.

Comments: Supplementary Material for the main paper, "CoCAtt: A Cognitive-Conditioned Driver Attention Dataset". Accepted at ITSC2022

arXiv:2207.01762 [pdf, other]

PReGAN: Answer Oriented Passage Ranking with Weakly Supervised GAN

Authors: Pan Du, Jian-Yun Nie, Yutao Zhu, Hao Jiang, Lixin Zou, Xiaohui Yan

Abstract: Beyond topical relevance, passage ranking for open-domain factoid question answering also requires a passage to contain an answer (answerability). While a few recent studies have incorporated some reading capability into a ranker to account for answerability, the ranker is still hindered by the noisy nature of the training data typically available in this area, which considers any passage containing an answer entity as a positive sample. However, the answer entity in a passage is not necessarily mentioned in relation with the given question. To address the problem, we propose an approach called PReGAN for Passage Reranking based on Generative Adversarial Neural networks, which incorporates a discriminator on answerability, in addition to a discriminator on topical relevance. The goal is to force the generator to rank higher a passage that is topically relevant and contains an answer. Experiments on five public datasets show that PReGAN can better rank appropriate passages, which in turn, boosts the effectiveness of QA systems, and outperforms the existing approaches without using external data.

Submitted 4 July, 2022; originally announced July 2022.

arXiv:2206.03950 [pdf, other]

Transfer learning to decode brain states reflecting the relationship between cognitive tasks

Authors: Youzhi Qu, Xinyao Jian, Wenxin Che, Penghui Du, Kai Fu, Quanying Liu

Abstract: Transfer learning improves the performance of the target task by leveraging the data of a specific source task: the closer the relationship between the source and the target tasks, the greater the performance improvement by transfer learning. In neuroscience, the relationship between cognitive tasks is usually represented by similarity of activated brain regions or neural representation. However, no study has linked transfer learning and neuroscience to reveal the relationship between cognitive tasks. In this study, we propose a transfer learning framework to reflect the relationship between cognitive tasks, and compare the task relations reflected by transfer learning and by the overlaps of brain regions (e.g., neurosynth). Our results of transfer learning create cognitive taskonomy to reflect the relationship between cognitive tasks which is well in line with the task relations derived from neurosynth. Transfer learning performs better in task decoding with fMRI data if the source and target cognitive tasks activate similar brain regions. Our study uncovers the relationship of multiple cognitive tasks and provides guidance for source task selection in transfer learning for neural decoding based on small-sample data.

Submitted 30 August, 2022; v1 submitted 7 June, 2022; originally announced June 2022.

arXiv:2204.08939 [pdf, other]

doi 10.1063/5.0101128

Deep learning-based surrogate model for 3-D patient-specific computational fluid dynamics

Authors: Pan Du, Xiaozhi Zhu, Jian-Xun Wang

Abstract: Optimization and uncertainty quantification have been playing an increasingly important role in computational hemodynamics. However, existing methods based on principled modeling and classic numerical techniques have faced significant challenges, particularly when it comes to complex 3D patient-specific shapes in the real world. First, it is notoriously challenging to parameterize the input space of arbitrarily complex 3-D geometries. Second, the process often involves massive forward simulations, which are extremely computationally demanding or even infeasible. We propose a novel deep learning surrogate modeling solution to address these challenges and enable rapid hemodynamic predictions. Specifically, a statistical generative model for 3-D patient-specific shapes is developed based on a small set of baseline patient-specific geometries. An unsupervised shape correspondence solution is used to enable geometric morphing and scalable shape synthesis statistically. Moreover, a simulation routine is developed for automatic data generation by automatic meshing, boundary setting, simulation, and post-processing. An efficient supervised learning solution is proposed to map the geometric inputs to the hemodynamics predictions in latent spaces. Numerical studies on aortic flows are conducted to demonstrate the effectiveness and merit of the proposed techniques.

Submitted 11 April, 2022; originally announced April 2022.

Comments: 8 figures, 2 tables

arXiv:2204.00976 [pdf, other]

FedGBF: An efficient vertical federated learning framework via gradient boosting and bagging

Authors: Yujin Han, Pan Du, Kai Yang

Abstract: Federated learning, conducive to solving data privacy and security problems, has attracted increasing attention recently. However, the existing federated boosting model sequentially builds a decision tree model with the weak base learner, resulting in redundant boosting steps and high interactive communication costs. In contrast, the federated bagging model saves time by building multiple decision trees in parallel, but it suffers from performance loss. With the aim of obtaining outstanding performance at a lower time cost, we propose a novel model in a vertically federated setting termed Federated Gradient Boosting Forest (FedGBF). FedGBF combines the advantages of boosting and bagging by building the decision trees in parallel as a base learner for boosting. FedGBF in turn raises the problem of hyperparameter tuning. We therefore propose Dynamic FedGBF, which dynamically changes each forest's parameters and thus reduces the complexity. Finally, experiments on benchmark datasets demonstrate the superiority of our method.

Submitted 2 April, 2022; originally announced April 2022.

arXiv:2203.02104 [pdf, other]

Interactive Image Synthesis with Panoptic Layout Generation

Authors: Bo Wang, Tao Wu, Minfeng Zhu, Peng Du

Abstract: Interactive image synthesis from user-guided input is a challenging task when users wish to control the scene structure of a generated image with ease. Although remarkable progress has been made on layout-based image synthesis approaches, in order to get realistic fake image in interactive scene, existing methods require high-precision inputs, which probably need adjustment several times and are unfriendly to novice users. When placement of bounding boxes is subject to perturbation, layout-based models suffer from "missing regions" in the constructed semantic layouts and hence undesirable artifacts in the generated images. In this work, we propose Panoptic Layout Generative Adversarial Networks (PLGAN) to address this challenge. The PLGAN employs panoptic theory which distinguishes object categories between "stuff" with amorphous boundaries and "things" with well-defined shapes, such that stuff and instance layouts are constructed through separate branches and later fused into panoptic layouts. In particular, the stuff layouts can take amorphous shapes and fill up the missing regions left out by the instance layouts. We experimentally compare our PLGAN with state-of-the-art layout-based models on the COCO-Stuff, Visual Genome, and Landscape datasets. The advantages of PLGAN are not only visually demonstrated but quantitatively verified in terms of inception score, Fréchet inception distance, classification accuracy score, and coverage.

Submitted 28 March, 2022; v1 submitted 3 March, 2022; originally announced March 2022.

Comments: Accepted by CVPR 2022

arXiv:2112.01215 [pdf]

Adaptive Group Collaborative Artificial Bee Colony Algorithm

Authors: Haiquan Wang, Hans-Dietrich Haasis, Panpan Du, Xiaobin Xu, Menghao Su, Shengjun Wen, Wenxuan Yue, Shanshan Zhang

Abstract: As an effective algorithm for solving complex optimization problems, artificial bee colony (ABC) algorithm has shown to be competitive, but the same as other population-based algorithms, it is poor at balancing the abilities of global searching in the whole solution space (named as exploration) and quick searching in local solution space which is defined as exploitation. For improving the performance of ABC, an adaptive group collaborative ABC (AgABC) algorithm is introduced where the population in different phases is divided to specific groups and different search strategies with different abilities are assigned to the members in groups, and the member or strategy which obtains the best solution will be employed for further searching. Experimental results on benchmark functions show that the proposed algorithm with dynamic mechanism is superior to other algorithms in searching accuracy and stability. Furthermore, numerical experiments show that the proposed method can generate the optimal solution for the complex scheduling problem.

Submitted 2 December, 2021; originally announced December 2021.

arXiv:2111.10014 [pdf, other]

CoCAtt: A Cognitive-Conditioned Driver Attention Dataset

Authors: Yuan Shen, Niviru Wijayaratne, Pranav Sriram, Aamir Hasan, Peter Du, Katie Driggs-Campbell

Abstract: The task of driver attention prediction has drawn considerable interest among researchers in robotics and the autonomous vehicle industry. Driver attention prediction can play an instrumental role in mitigating and preventing high-risk events, like collisions and casualties. However, existing driver attention prediction models neglect the distraction state and intention of the driver, which can significantly influence how they observe their surroundings. To address these issues, we present a new driver attention dataset, CoCAtt (Cognitive-Conditioned Attention). Unlike previous driver attention datasets, CoCAtt includes per-frame annotations that describe the distraction state and intention of the driver. In addition, the attention data in our dataset is captured in both manual and autopilot modes using eye-tracking devices of different resolutions. Our results demonstrate that incorporating the above two driver states into attention modeling can improve the performance of driver attention prediction. To the best of our knowledge, this work is the first to provide autopilot attention data. Furthermore, CoCAtt is currently the largest and the most diverse driver attention dataset in terms of autonomy levels, eye tracker resolutions, and driving scenarios.

Submitted 23 November, 2021; v1 submitted 18 November, 2021; originally announced November 2021.

Comments: 10 pages, 5 figures

arXiv:2108.10510 [pdf, other]

doi 10.1145/3459637.3482243

Contrastive Learning of User Behavior Sequence for Context-Aware Document Ranking

Authors: Yutao Zhu, Jian-Yun Nie, Zhicheng Dou, Zhengyi Ma, Xinyu Zhang, Pan Du, Xiaochen Zuo, Hao Jiang

Abstract: Context information in search sessions has proven to be useful for capturing user search intent. Existing studies explored user behavior sequences in sessions in different ways to enhance query suggestion or document ranking. However, a user behavior sequence has often been viewed as a definite and exact signal reflecting a user's behavior. In reality, it is highly variable: user's queries for the same intent can vary, and different documents can be clicked. To learn a more robust representation of the user behavior sequence, we propose a method based on contrastive learning, which takes into account the possible variations in user's behavior sequences. Specifically, we propose three data augmentation strategies to generate similar variants of user behavior sequences and contrast them with other sequences. In so doing, the model is forced to be more robust regarding the possible variations. The optimized sequence representation is incorporated into document ranking. Experiments on two real query log datasets show that our proposed model outperforms the state-of-the-art methods significantly, which demonstrates the effectiveness of our method for context-aware document ranking.

Submitted 23 August, 2021; originally announced August 2021.

Comments: Accepted by CIKM 2021

arXiv:2108.07949 [pdf, other]

DeepFake MNIST+: A DeepFake Facial Animation Dataset

Authors: Jiajun Huang, Xueyu Wang, Bo Du, Pei Du, Chang Xu

Abstract: DeepFakes, a family of facial manipulation techniques, are an emerging threat to digital society. Various DeepFake detection methods and datasets have been proposed for detecting such data, especially for face-swapping. However, recent research has given less consideration to facial animation, which is also important on the DeepFake attack side. It tries to animate a face image with actions provided by a driving video, which also leads to a concern about the security of recent payment systems that rely on liveness detection to authenticate real users via recognising a sequence of user facial actions. However, our experiments show that the existing datasets are not sufficient to develop reliable detection methods, while the current liveness detectors cannot defend against such videos used as an attack. As a response, we propose a new human face animation dataset, called DeepFake MNIST+, generated by a SOTA image animation generator. It includes 10,000 facial animation videos in ten different actions, which can spoof the recent liveness detectors. A baseline detection method and a comprehensive analysis of the method are also included in this paper. In addition, we analyze the proposed dataset's properties and reveal the difficulty and importance of detecting animation datasets under different types of motion and compression quality.

Submitted 17 August, 2021; originally announced August 2021.

Comments: 14 pages

arXiv:2107.08329 [pdf, other]

doi 10.1145/3404835.3463011

Proactive Retrieval-based Chatbots based on Relevant Knowledge and Goals

Authors: Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, Hao Jiang, Zhicheng Dou

Abstract: A proactive dialogue system has the ability to proactively lead the conversation. Different from the general chatbots which only react to the user, proactive dialogue systems can be used to achieve some goals, e.g., to recommend some items to the user. Background knowledge is essential to enable smooth and natural transitions in dialogue. In this paper, we propose a new multi-task learning framework for retrieval-based knowledge-grounded proactive dialogue. To determine the relevant knowledge to be used, we frame knowledge prediction as a complementary task and use explicit signals to supervise its learning. The final response is selected according to the predicted knowledge, the goal to achieve, and the context. Experimental results show that explicit modeling of knowledge prediction and goal selection can greatly improve the final response selection. Our code is available at https://github.com/DaoD/KPN/.

Submitted 17 July, 2021; originally announced July 2021.

Comments: Accepted by SIGIR 2021

arXiv:2105.08251 [pdf, other]

Emotion Eliciting Machine: Emotion Eliciting Conversation Generation based on Dual Generator

Authors: Hao Jiang, Yutao Zhu, Xinyu Zhang, Zhicheng Dou, Pan Du, Te Pi, Yantao Jia

Abstract: Recent years have witnessed great progress on building emotional chatbots. Numerous methods have been proposed for chatbots to generate responses with given emotions. However, the emotion changes of the user during the conversation have not been fully explored. In this work, we study the problem of positive emotion elicitation, which aims to generate responses that can elicit positive emotion of the user, in human-machine conversation. We propose a weakly supervised Emotion Eliciting Machine (EEM) to address this problem. Specifically, we first collect weak labels of user emotion status changes in a conversation based on a pre-trained emotion classifier. Then we propose a dual encoder-decoder structure to model the generation of responses in both positive and negative side based on the changes of the user's emotion status in the conversation. An emotion eliciting factor is introduced on top of the dual structure to balance the positive and negative emotional impacts on the generated response during emotion elicitation. The factor also provides a fine-grained controlling manner for emotion elicitation. Experimental results on a large real-world dataset show that EEM outperforms the existing models in generating responses with positive emotion elicitation.

Submitted 17 May, 2021; originally announced May 2021.

arXiv:2103.13584 [pdf, other]

BERT4SO: Neural Sentence Ordering by Fine-tuning BERT

Authors: Yutao Zhu, Jian-Yun Nie, Kun Zhou, Shengchao Liu, Yabo Ling, Pan Du

Abstract: Sentence ordering aims to arrange the sentences of a given text in the correct order. Recent work frames it as a ranking problem and applies deep neural networks to it. In this work, we propose a new method, named BERT4SO, by fine-tuning BERT for sentence ordering. We concatenate all sentences and compute their representations by using multiple special tokens and carefully designed segment (interval) embeddings. The tokens across multiple sentences can attend to each other which greatly enhances their interactions. We also propose a margin-based listwise ranking loss based on ListMLE to facilitate the optimization process. Experimental results on five benchmark datasets demonstrate the effectiveness of our proposed method.

Submitted 11 May, 2021; v1 submitted 24 March, 2021; originally announced March 2021.
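
The BERT4SO abstract above mentions a margin-based listwise ranking loss built on ListMLE. As a rough illustration only, the sketch below computes the plain ListMLE loss (the negative log-likelihood of the correct ordering under a Plackett-Luce model). It omits the margin term, whose exact form the abstract does not give, and the function name `listmle_loss` and its arguments are hypothetical rather than taken from the paper's code.

```python
import math


def listmle_loss(scores, true_order):
    """Plain ListMLE loss (no margin term; an illustrative assumption).

    scores[i]  : model score for sentence i
    true_order : sentence indices listed in their correct order
    """
    # Reorder the scores so position 0 holds the sentence that should come first.
    ordered = [scores[i] for i in true_order]
    loss = 0.0
    for i in range(len(ordered)):
        tail = ordered[i:]
        # Numerically stable log-sum-exp over the sentences not yet placed.
        m = max(tail)
        log_norm = m + math.log(sum(math.exp(s - m) for s in tail))
        # Negative log-probability of picking the correct sentence at step i.
        loss += log_norm - ordered[i]
    return loss
```

With scores that agree with the correct order, the loss is smaller than when the same scores are evaluated against the reversed order, which is the behaviour a ranking loss should show.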

arXiv:2102.13034 [pdf, other]

doi 10.1145/3411763.3451591

AutoPreview: A Framework for Autopilot Behavior Understanding

Authors: Yuan Shen, Niviru Wijayaratne, Peter Du, Shanduojiao Jiang, Katherine Driggs Campbell

Abstract: The behavior of self-driving cars may differ from people's expectations (e.g., an autopilot may unexpectedly relinquish control). This expectation mismatch can cause potential and existing users to distrust self-driving technology and can increase the likelihood of accidents. We propose a simple but effective framework, AutoPreview, to enable consumers to preview a target autopilot's potential actions in the real-world driving context before deployment. For a given target autopilot, we design a delegate policy that replicates the target autopilot's behavior with explainable action representations, which can then be queried online for comparison and to build an accurate mental model. To demonstrate its practicality, we present a prototype of AutoPreview integrated with the CARLA simulator along with two potential use cases of the framework. We conduct a pilot study to investigate whether or not AutoPreview provides a deeper understanding of autopilot behavior when experiencing a new autopilot policy for the first time. Our results suggest that the AutoPreview method helps users understand autopilot behavior in terms of driving style comprehension, deployment preference, and exact action timing prediction.

Submitted 25 February, 2021; originally announced February 2021.

Comments: 7 pages, 5 figures, CHI 2021 Late breaking Work

Journal ref: CHI Conference on Human Factors in Computing Systems Extended Abstracts (CHI '21 Extended Abstracts), May 8 to 13, 2021, Yokohama, Japan

arXiv:2101.08426 [pdf, other]

Content Selection Network for Document-grounded Retrieval-based Chatbots

Authors: Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, Zhicheng Dou

Abstract: Grounding human-machine conversation in a document is an effective way to improve the performance of retrieval-based chatbots. However, only a part of the document content may be relevant to help select the appropriate response at a round. It is thus crucial to select the part of document content relevant to the current conversation context. In this paper, we propose a document content selection network (CSN) to perform explicit selection of relevant document contents, and filter out the irrelevant parts. We show in experiments on two public document-grounded conversation datasets that CSN can effectively help select the relevant document contents to the conversation context, and it produces better results than the state-of-the-art approaches. Our code and datasets are available at https://github.com/DaoD/CSN.

Submitted 20 January, 2021; originally announced January 2021.

Comments: ECIR 2021 Camera Ready

arXiv:2010.13544 [pdf, other]

doi 10.1145/3340531.3412039

Meta-Learning for Neural Relation Classification with Distant Supervision

Authors: Zhenzhen Li, Jian-Yun Nie, Benyou Wang, Pan Du, Yuhan Zhang, Lixin Zou, Dongsheng Li

Abstract: Distant supervision provides a means to create a large amount of weakly labeled data at low cost for relation classification. However, the resulting labeled instances are very noisy, containing data with wrong labels. Many approaches have been proposed to select a subset of reliable instances for neural model training, but they still suffer from the noisy labeling problem or from underutilization of the weakly labeled data. To better select more reliable training instances, we introduce a small amount of manually labeled data as a reference to guide the selection process. In this paper, we propose a meta-learning based approach, which learns to reweight noisy training data under the guidance of reference data. As the clean reference data is usually very small, we propose to augment it by dynamically distilling the most reliable elite instances from the noisy data. Experiments on several datasets demonstrate that the reference data can effectively guide the selection of training data, and our augmented approach consistently improves the performance of relation classification compared to the existing state-of-the-art methods.

Submitted 26 October, 2020; originally announced October 2020.

Comments: 10 pages, 7 figures; corrected one encoding error in CIKM pdf

Journal ref: In Proceedings of CIKM, pp. 815-824. 2020

arXiv:2006.05018 [pdf]

Deep learning to estimate the physical proportion of infected region of lung for COVID-19 pneumonia with CT image set

Authors: Wei Wu, Yu Shi, Xukun Li, Yukun Zhou, Peng Du, Shuangzhi Lv, Tingbo Liang, Jifang Sheng

Abstract: Utilizing computed tomography (CT) images to quickly estimate the severity of cases with COVID-19 is one of the most straightforward and efficacious methods. Two tasks were studied in the present paper. One was to segment the mask of the intact lung in cases of pneumonia. The other was to generate the masks of regions infected by COVID-19. The masks of these two parts of the images were then converted to corresponding volumes to calculate the physical proportion of the infected region of the lung. A total of 129 CT image sets were collected and studied. The intrinsic Hounsfield value of CT images was first utilized to generate the initial dirty version of labeled masks for both intact lung and infected regions. Then, the samples were carefully adjusted and improved by two professional radiologists to generate the final training set and test benchmark. Two deep learning models were evaluated: UNet and 2.5D UNet. For the segmentation of infected regions, a deep learning based classifier was then applied to remove unrelated blur-edged regions that were wrongly segmented out, such as air tube and blood vessel tissue. For the segmented masks of intact lung and infected regions, the best method achieved mean Dice similarity coefficients of 0.972 and 0.757, respectively, on our test benchmark. For the overall proportion of the infected region of the lung, the final result showed 0.961 (Pearson's correlation coefficient) and 11.7% (mean absolute percent error). The instant proportion of infected regions of the lung could be used as visual evidence to assist clinical physicians in determining the severity of the case. Furthermore, a quantified report of infected regions can help predict the prognosis for COVID-19 cases that were scanned periodically within the treatment cycle.

Submitted 8 June, 2020; originally announced June 2020.
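
The abstract above reports a Dice similarity coefficient, an infected-volume proportion and a mean absolute percent error. The following is a minimal sketch of how such quantities are commonly computed on flattened binary masks. The function names are hypothetical, and the proportion assumes every voxel has the same physical volume, so that the voxel-count ratio equals the volume ratio. The paper's own pipeline may differ.

```python
def dice_coefficient(pred, target):
    """Dice similarity between two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    pred = [bool(v) for v in pred]
    target = [bool(v) for v in target]
    intersection = sum(p and t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def infected_proportion(infection_mask, lung_mask):
    """Fraction of lung voxels marked as infected (uniform voxel volume assumed)."""
    lung_voxels = sum(bool(v) for v in lung_mask)
    if lung_voxels == 0:
        raise ValueError("lung mask is empty")
    infected = sum(bool(i) and bool(l) for i, l in zip(infection_mask, lung_mask))
    return infected / lung_voxels


def mean_absolute_percent_error(predicted, reference):
    """MAPE in percent; reference values must be non-zero."""
    errors = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(errors) / len(errors)
```

In this sketch, the Dice score would be evaluated per mask, while the proportion and its percent error would be evaluated per CT case.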

arXiv:2004.05707 [pdf, other]

VGCN-BERT: Augmenting BERT with Graph Embedding for Text Classification

Authors: Zhibin Lu, Pan Du, Jian-Yun Nie

Abstract: Much progress has been made recently on text classification with methods based on neural networks. In particular, models using attention mechanisms such as BERT have been shown to be capable of capturing the contextual information within a sentence or document. However, their ability to capture the global information about the vocabulary of a language is more limited. The latter is the strength of Graph Convolutional Networks (GCN). In this paper, we propose the VGCN-BERT model, which combines the capability of BERT with a Vocabulary Graph Convolutional Network (VGCN). Local information and global information interact through different layers of BERT, allowing them to influence each other and to build together a final representation for classification. In our experiments on several text classification datasets, our approach outperforms BERT and GCN alone, and achieves higher effectiveness than that reported in previous studies.

Submitted 12 April, 2020; originally announced April 2020.

Comments: 12 pages, 2 figures

ACM Class: I.2.4; I.2.7

Journal ref: in J. M. Jose et al. (Eds.): ECIR 2020, LNCS 12035, pp.369-382, 2020

arXiv:2002.09334 [pdf]

doi 10.1016/j.eng.2020.04.010

Deep Learning System to Screen Coronavirus Disease 2019 Pneumonia

Authors: Xiaowei Xu, Xiangao Jiang, Chunlian Ma, Peng Du, Xukun Li, Shuangzhi Lv, Liang Yu, Yanfei Chen, Junwei Su, Guanjing Lang, Yongtao Li, Hong Zhao, Kaijin Xu, Lingxiang Ruan, Wei Wu

Abstract: We found that the real-time reverse transcription-polymerase chain reaction (RT-PCR) detection of viral RNA from sputum or nasopharyngeal swab has a relatively low positive rate in the early stage to determine COVID-19 (named by the World Health Organization). The manifestations of computed tomography (CT) imaging of COVID-19 had their own characteristics, which are different from other types of viral pneumonia, such as Influenza-A viral pneumonia. Therefore, clinical doctors call for additional early diagnostic criteria for this new type of pneumonia as soon as possible. This study aimed to establish an early screening model to distinguish COVID-19 pneumonia from Influenza-A viral pneumonia and healthy cases with pulmonary CT images using deep learning techniques. The candidate infection regions were first segmented out from the pulmonary CT image set using a 3-dimensional deep learning model. These separated images were then categorized into COVID-19, Influenza-A viral pneumonia and irrelevant-to-infection groups, together with the corresponding confidence scores, using a location-attention classification model. Finally, the infection type and total confidence score of the CT case were calculated with a Noisy-or Bayesian function. Experimental results on the benchmark dataset showed that the overall accuracy was 86.7% from the perspective of CT cases as a whole. The deep learning models established in this study were effective for the early screening of COVID-19 patients and were demonstrated to be a promising supplementary diagnostic method for frontline clinical doctors.

Submitted 21 February, 2020; originally announced February 2020.

Journal ref: Engineering, Volume 6, Issue 10, October 2020, Pages 1122-1129
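
The abstract above aggregates per-region confidence scores into a case-level score with a Noisy-or Bayesian function. As a hedged illustration, the sketch below implements the standard noisy-OR combination, which assumes the regions are independent and returns the probability that at least one is positive. The function name `noisy_or` is hypothetical, and the paper may weight or threshold the scores differently.

```python
from math import prod


def noisy_or(probabilities):
    """Standard noisy-OR: probability that at least one region is positive.

    Assumes the per-region probabilities are independent (an assumption of
    this sketch, not a detail confirmed by the abstract).
    """
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
    # The case is negative only if every region is negative.
    return 1.0 - prod(1.0 - p for p in probabilities)
```

Under this rule, a single confident region is enough to push the case-level score close to 1, while many weak regions accumulate gradually.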

arXiv:2002.06810 [pdf, other]

doi 10.1145/3394171.3413968

Discernible Image Compression

Authors: Zhaohui Yang, Yunhe Wang, Chang Xu, Peng Du, Chao Xu, Chunjing Xu, Qi Tian

Abstract: Image compression, as one of the fundamental low-level image processing tasks, is essential for computer vision. Tremendous computing and storage resources can be saved at the cost of a trivial amount of visual information. Conventional image compression methods tend to obtain compressed images by minimizing their appearance discrepancy with the corresponding original images, but pay little attention to their efficacy in downstream perception tasks, e.g., image recognition and object detection. Thus, some compressed images could be recognized with bias. In contrast, this paper aims to produce compressed images by pursuing both appearance and perceptual consistency. Based on the encoder-decoder framework, we propose using a pre-trained CNN to extract features of the original and compressed images, and making them similar. Thus the compressed images are discernible to subsequent tasks, and we name our method Discernible Image Compression (DIC). In addition, the maximum mean discrepancy (MMD) is employed to minimize the difference between feature distributions. The resulting compression network can generate images with high image quality and preserve consistent perception in the feature domain, so that these images can be well recognized by pre-trained machine learning models. Experiments on benchmarks demonstrate that images compressed using the proposed method can also be well recognized by subsequent visual recognition and detection models. For instance, the mAP on images compressed by DIC is about 0.6% higher than on images compressed by conventional methods.

Submitted 7 September, 2020; v1 submitted 17 February, 2020; originally announced February 2020.

Comments: Accepted by ACMMM 2020

arXiv:1910.05599 [pdf, other]

Online monitoring for safe pedestrian-vehicle interactions

Authors: Peter Du, Zhe Huang, Tianqi Liu, Ke Xu, Qichao Gao, Hussein Sibai, Katherine Driggs-Campbell, Sayan Mitra

Abstract: As autonomous systems begin to operate amongst humans, methods for safe interaction must be investigated. We consider an example of a small autonomous vehicle in a pedestrian zone that must safely maneuver around people in a free-form fashion. We investigate two key questions: How can we effectively integrate pedestrian intent estimation into our autonomous stack? Can we develop an online monitoring framework to give formal guarantees on the safety of such human-robot interactions? We present a pedestrian intent estimation framework that can accurately predict future pedestrian trajectories given multiple possible goal locations. We integrate this into a reachability-based online monitoring scheme that formally assesses the safety of these interactions with nearly real-time performance (approximately 0.3 seconds). These techniques are integrated on a test vehicle with a complete in-house autonomous stack, demonstrating effective and safe interaction in real-world experiments.

Submitted 17 July, 2020; v1 submitted 12 October, 2019; originally announced October 2019.

Comments: 15 pages, 5 figures

arXiv:1910.02285 [pdf]

doi 10.1007/s10489-020-02051-1

A Deep Learning System That Generates Quantitative CT Reports for Diagnosing Pulmonary Tuberculosis

Authors: Wei Wu, Xukun Li, Peng Du, Guanjing Lang, Min Xu, Kaijin Xu, Lanjuan Li

Abstract: We developed a deep learning model-based system to automatically generate a quantitative Computed Tomography (CT) diagnostic report for Pulmonary Tuberculosis (PTB) cases. 501 CT imaging datasets from 223 patients with active PTB were collected, and another 501 cases from a healthy population served as negative samples. 2884 lesions of PTB were carefully labeled and classified manually by professional radiologists. Three state-of-the-art 3D convolution neural network (CNN) models were trained and evaluated in the inspection of PTB CT images. Transfer learning method was also utilized during this process. The best model was selected to annotate the spatial location of lesions and classify them into miliary, infiltrative, caseous, tuberculoma and cavitary types simultaneously. Then the Noisy-Or Bayesian function was used to generate an overall infection probability. Finally, a quantitative diagnostic report was exported. The results showed that the recall and precision rates, from the perspective of a single lesion region of PTB, were 85.9% and 89.2% respectively. The overall recall and precision rates, from the perspective of one PTB case, were 98.7% and 93.7%, respectively. Moreover, the precision rate of the PTB lesion type classification was 90.9%. The new method might serve as an effective reference for decision making by clinical doctors.

Submitted 5 October, 2019; originally announced October 2019.

arXiv:1910.01557 [pdf, other]

CyPhyHouse: A Programming, Simulation, and Deployment Toolchain for Heterogeneous Distributed Coordination

Authors: Ritwika Ghosh, Joao P. Jansch-Porto, Chiao Hsieh, Amelia Gosse, Minghao Jiang, Hebron Taylor, Peter Du, Sayan Mitra, Geir Dullerud

Abstract: Programming languages, libraries, and development tools have transformed the application development processes for mobile computing and machine learning. This paper introduces CyPhyHouse, a toolchain that aims to provide similar programming, debugging, and deployment benefits for distributed mobile robotic applications. Users can develop hardware-agnostic, distributed applications using the high-level, event-driven Koord programming language, without requiring expertise in controller design or distributed network protocols. The modular, platform-independent middleware of CyPhyHouse implements these functionalities using standard algorithms for path planning (RRT), control (MPC), mutual exclusion, etc. A high-fidelity, scalable, multi-threaded simulator for Koord applications is developed to simulate the same application code for dozens of heterogeneous agents. The same compiled code can also be deployed on heterogeneous mobile platforms. The effectiveness of CyPhyHouse in improving the design cycles is explicitly illustrated in a robotic testbed through development, simulation, and deployment of a distributed task allocation application on in-house ground and aerial vehicles.

Submitted 10 October, 2019; v1 submitted 3 October, 2019; originally announced October 2019.

arXiv:1908.01046 [pdf, other]

Adaptive Stress Testing with Reward Augmentation for Autonomous Vehicle Validation

Authors: Anthony Corso, Peter Du, Katherine Driggs-Campbell, Mykel J. Kochenderfer

Abstract: Determining possible failure scenarios is a critical step in the evaluation of autonomous vehicle systems. Real-world vehicle testing is commonly employed for autonomous vehicle validation, but the costs and time requirements are high. Consequently, simulation-driven methods such as Adaptive Stress Testing (AST) have been proposed to aid in validation. AST formulates the problem of finding the most likely failure scenarios as a Markov decision process, which can be solved using reinforcement learning. In practice, AST tends to find scenarios where failure is unavoidable and tends to repeatedly discover the same types of failures of a system. This work addresses these issues by encoding domain relevant information into the search procedure. With this modification, the AST method discovers a larger and more expressive subset of the failure space when compared to the original AST formulation. We show that our approach is able to identify useful failure scenarios of an autonomous vehicle policy.

Submitted 6 August, 2019; v1 submitted 2 August, 2019; originally announced August 2019.

Comments: Appears in IEEE ITSC 2019

arXiv:1905.13550 [pdf]

A novel hybrid model based on multi-objective Harris hawks optimization algorithm for daily PM2.5 and PM10 forecasting

Authors: Pei Du, Jianzhou Wang, Yan Hao, Tong Niu, Wendong Yang

Abstract: High levels of air pollution may seriously affect people's living environment and even endanger their lives. In order to reduce air pollution concentrations and warn the public before the occurrence of hazardous air pollutants, it is urgent to design an accurate and reliable air pollutant forecasting model. However, most previous studies have deficiencies, such as ignoring the importance of predictive stability and using poor initial parameters, which significantly affect the performance of air pollution prediction. Therefore, to address these issues, a novel hybrid model is proposed in this study. Specifically, a powerful data preprocessing technique is applied to decompose the original time series into different modes from low frequency to high frequency. Next, a new multi-objective algorithm called MOHHO is developed in this study and used to tune the parameters of the ELM model so that it achieves high forecasting accuracy and stability simultaneously for air pollution series prediction. The optimized ELM model is then used to perform the time series prediction. Finally, a scientific and robust evaluation system, including several error criteria, benchmark models, and several experiments using six air pollutant concentration time series from three cities in China, is designed to perform a comprehensive assessment of the presented hybrid forecasting model. Experimental results indicate that the proposed hybrid model can guarantee a more stable and higher predictive performance compared to others, and its superior prediction ability may help to develop effective plans for air pollutant emissions and prevent health problems caused by air pollution.

Submitted 30 May, 2019; originally announced May 2019.

Comments: 24 pages, 4 figures

MSC Class: 68U20

arXiv:1905.07689 [pdf, other]

doi 10.1145/3331184.3331219

DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

Authors: Zhiqing Sun, Jian Tang, Pan Du, Zhi-Hong Deng, Jian-Yun Nie

Abstract: Keyphrase extraction from documents is useful to a variety of applications such as information retrieval and document summarization. This paper presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document. DivGraphPointer combines the advantages of traditional graph-based ranking methods and recent neural network-based approaches. Specifically, given a document, a word graph is constructed from the document based on word proximity and is encoded with graph convolutional networks, which effectively capture document-level word salience by modeling long-range dependency between words in the document and aggregating multiple appearances of identical words into one node. Furthermore, we propose a diversified point network to generate a set of diverse keyphrases out of the word graph in the decoding process. Experimental results on five benchmark data sets show that our proposed method significantly outperforms the existing state-of-the-art approaches.

Submitted 19 May, 2019; originally announced May 2019.

Comments: Accepted to SIGIR 2019

arXiv:1903.00066 [pdf, other]

A Long-Short Demands-Aware Model for Next-Item Recommendation

Authors: Ting Bai, Pan Du, Wayne Xin Zhao, Ji-Rong Wen, Jian-Yun Nie

Abstract: Recommending the right products is the central problem in recommender systems, but the right products should also be recommended at the right time to meet the demands of users, so as to maximize their value. Users' demands, implying strong purchase intents, can be the most useful way to promote product sales if well utilized. Previous recommendation models mainly focused on users' general interests to find the right products. However, the aspect of meeting users' demands at the right time has been much less explored. To address this problem, we propose a novel Long-Short Demands-aware Model (LSDM), in which both users' interests towards items and users' demands over time are incorporated. We summarize two aspects, termed long-time demands (e.g., purchasing the same product repetitively, showing a long-time persistent interest) and short-time demands (e.g., co-purchases such as buying paintbrushes after pigments). To utilize such long-short demands of users, we create different clusters to group the successive product purchases together according to different time spans, and use recurrent neural networks to model each sequence of clusters at a time scale. The long-short purchase demands with multi-time scales are finally aggregated by joint learning strategies. Experimental results on three real-world commerce datasets demonstrate the effectiveness of our model for next-item recommendation, showing the usefulness of modeling users' long-short purchase demands of items with multi-time scales.

Submitted 12 February, 2019; originally announced March 2019.

arXiv:1810.07260 [pdf]

Statistical Estimation of Malware Detection Metrics in the Absence of Ground Truth

Authors: Pang Du, Zheyuan Sun, Huashan Chen, Jin-Hee Cho, Shouhuai Xu

Abstract: The accurate measurement of security metrics is a critical research problem because an improper or inaccurate measurement process can ruin the usefulness of the metrics, no matter how well they are defined. This is a highly challenging problem particularly when the ground truth is unknown or noisy. In contrast to the well perceived importance of defining security metrics, the measurement of security metrics has been little understood in the literature. In this paper, we measure five malware detection metrics in the absence of ground truth, which is a realistic setting that imposes many technical challenges. The ultimate goal is to develop principled, automated methods for measuring these metrics at the maximum accuracy possible. The problem naturally calls for investigations into statistical estimators by casting the measurement problem as a statistical estimation problem. We propose statistical estimators for these five malware detection metrics. By investigating the statistical properties of these estimators, we are able to characterize when the estimators are accurate, and what adjustments can be made to improve them under what circumstances. We use synthetic data with known ground truth to validate these statistical estimators. Then, we employ these estimators to measure five metrics with respect to a large dataset collected from VirusTotal. We believe our study touches upon a vital problem that has not been paid due attention and will inspire many future investigations.

Submitted 23 September, 2018; originally announced October 2018.

Journal ref: IEEE T-IFS (2018)

arXiv:1806.08485 [pdf, other]

Shape-from-Mask: A Deep Learning Based Human Body Shape Reconstruction from Binary Mask Images

Authors: Zhongping Ji, Xiao Qi, Yigang Wang, Gang Xu, Peng Du, Qing Wu

Abstract: 3D content creation is referred to as one of the most fundamental tasks of computer graphics, and many algorithms for 3D modeling from 2D images or curves have been developed over the past several decades. Designers are allowed to align some conceptual images or sketch some suggestive curves, from front, side, and top views, and then use them as references in constructing a 3D model automatically or manually. However, to the best of our knowledge, no studies have investigated 3D human body reconstruction in a similar manner. In this paper, we propose a deep learning based reconstruction of 3D human body shape from 2D orthographic views. A novel CNN-based regression network, with two branches corresponding to frontal and lateral views respectively, is designed for estimating 3D human body shape from 2D mask images. We train our networks separately to decouple the feature descriptors which encode the body parameters from different views, and fuse them to estimate an accurate human body shape. In addition, to overcome the shortage of training data required for this purpose, we propose some significant data augmentation schemes for 3D human body shapes, which can be used to promote further research on this topic. Extensive experimental results demonstrate that visually realistic and accurate reconstructions can be achieved effectively using our algorithm. Requiring only binary mask images, our method can help users create their own digital avatars quickly, and also makes it easy to create digital human bodies for 3D games, virtual reality, and online fashion shopping.

Submitted 22 June, 2018; originally announced June 2018.

Comments: 11 pages

Showing 1–49 of 49 results for author: Du, P