subscribe to arXiv mailings

Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding

Authors: Junjie Fei, Mahmoud Ahmed, Jian Ding, Eslam Mohamed Bakr, Mohamed Elhoseiny

Abstract: While 3D MLLMs have achieved significant progress, they are restricted to object and scene understanding and struggle to understand 3D spatial structures at the part level. In this paper, we introduce Kestrel, representing a novel approach that empowers 3D MLLMs with part-aware understanding, enabling better interpretation and segmentation grounding of 3D objects at the part level. Despite its sig… ▽ More While 3D MLLMs have achieved significant progress, they are restricted to object and scene understanding and struggle to understand 3D spatial structures at the part level. In this paper, we introduce Kestrel, representing a novel approach that empowers 3D MLLMs with part-aware understanding, enabling better interpretation and segmentation grounding of 3D objects at the part level. Despite its significance, the current landscape lacks tasks and datasets that endow and assess this capability. Therefore, we propose two novel tasks: (1) Part-Aware Point Grounding, the model is tasked with directly predicting a part-level segmentation mask based on user instructions, and (2) Part-Aware Point Grounded Captioning, the model provides a detailed caption that includes part-level descriptions and their corresponding masks. To support learning and evaluating for these tasks, we introduce 3DCoMPaT Grounded Instructions Dataset (3DCoMPaT-GRIN). 3DCoMPaT-GRIN Vanilla, comprising 789k part-aware point cloud-instruction-segmentation mask triplets, is used to evaluate MLLMs' ability of part-aware segmentation grounding. 3DCoMPaT-GRIN Grounded Caption, containing 107k part-aware point cloud-instruction-grounded caption triplets, assesses both MLLMs' part-aware language comprehension and segmentation grounding capabilities. Our introduced tasks, dataset, and Kestrel represent a preliminary effort to bridge the gap between human cognition and 3D MLLMs, i.e., the ability to perceive and engage with the environment at both global and part levels. Extensive experiments on the 3DCoMPaT-GRIN show that Kestrel can generate user-specified segmentation masks, a capability not present in any existing 3D MLLM. Kestrel thus established a benchmark for evaluating the part-aware language comprehension and segmentation grounding of 3D objects. Project page at https://feielysia.github.io/Kestrel.github.io/ △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2311.05478 [pdf, other]

Robust Retraining-free GAN Fingerprinting via Personalized Normalization

Authors: Jianwei Fei, Zhihua Xia, Benedetta Tondi, Mauro Barni

Abstract: In recent years, there has been significant growth in the commercial applications of generative models, licensed and distributed by model developers to users, who in turn use them to offer services. In this scenario, there is a need to track and identify the responsible user in the presence of a violation of the license agreement or any kind of malicious usage. Although there are methods enabling… ▽ More In recent years, there has been significant growth in the commercial applications of generative models, licensed and distributed by model developers to users, who in turn use them to offer services. In this scenario, there is a need to track and identify the responsible user in the presence of a violation of the license agreement or any kind of malicious usage. Although there are methods enabling Generative Adversarial Networks (GANs) to include invisible watermarks in the images they produce, generating a model with a different watermark, referred to as a fingerprint, for each user is time- and resource-consuming due to the need to retrain the model to include the desired fingerprint. In this paper, we propose a retraining-free GAN fingerprinting method that allows model developers to easily generate model copies with the same functionality but different fingerprints. The generator is modified by inserting additional Personalized Normalization (PN) layers whose parameters (scaling and bias) are generated by two dedicated shallow networks (ParamGen Nets) taking the fingerprint as input. A watermark decoder is trained simultaneously to extract the fingerprint from the generated images. The proposed method can embed different fingerprints inside the GAN by just changing the input of the ParamGen Nets and performing a feedforward pass, without finetuning or retraining. The performance of the proposed method in terms of robustness against both model-level and image-level attacks is also superior to the state-of-the-art. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2310.16919 [pdf, other]

Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs

Authors: Jianwei Fei, Zhihua Xia, Benedetta Tondi, Mauro Barni

Abstract: We propose a novel multi-bit box-free watermarking method for the protection of Intellectual Property Rights (IPR) of GANs with improved robustness against white-box attacks like fine-tuning, pruning, quantization, and surrogate model attacks. The watermark is embedded by adding an extra watermarking loss term during GAN training, ensuring that the images generated by the GAN contain an invisible… ▽ More We propose a novel multi-bit box-free watermarking method for the protection of Intellectual Property Rights (IPR) of GANs with improved robustness against white-box attacks like fine-tuning, pruning, quantization, and surrogate model attacks. The watermark is embedded by adding an extra watermarking loss term during GAN training, ensuring that the images generated by the GAN contain an invisible watermark that can be retrieved by a pre-trained watermark decoder. In order to improve the robustness against white-box model-level attacks, we make sure that the model converges to a wide flat minimum of the watermarking loss term, in such a way that any modification of the model parameters does not erase the watermark. To do so, we add random noise vectors to the parameters of the generator and require that the watermarking loss term is as invariant as possible with respect to the presence of noise. This procedure forces the generator to converge to a wide flat minimum of the watermarking loss. The proposed method is architectureand dataset-agnostic, thus being applicable to many different generation tasks and models, as well as to CNN-based image processing architectures. We present the results of extensive experiments showing that the presence of the watermark has a negligible impact on the quality of the generated images, and proving the superior robustness of the watermark against model modification and surrogate model attacks. △ Less

Submitted 25 October, 2023; originally announced October 2023.

arXiv:2307.16525 [pdf, other]

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

Authors: Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng

Abstract: Image-to-text generation aims to describe images using natural language. Recently, zero-shot image captioning based on pre-trained vision-language models (VLMs) and large language models (LLMs) has made significant progress. However, we have observed and empirically demonstrated that these methods are susceptible to modality bias induced by LLMs and tend to generate descriptions containing objects… ▽ More Image-to-text generation aims to describe images using natural language. Recently, zero-shot image captioning based on pre-trained vision-language models (VLMs) and large language models (LLMs) has made significant progress. However, we have observed and empirically demonstrated that these methods are susceptible to modality bias induced by LLMs and tend to generate descriptions containing objects (entities) that do not actually exist in the image but frequently appear during training (i.e., object hallucination). In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios. ViECap incorporates entity-aware hard prompts to guide LLMs' attention toward the visual entities present in the image, enabling coherent caption generation across diverse scenes. With entity-aware hard prompts, ViECap is capable of maintaining performance when transferring from in-domain to out-of-domain scenarios. Extensive experiments demonstrate that ViECap sets a new state-of-the-art cross-domain (transferable) captioning and performs competitively in-domain captioning compared to previous VLMs-based zero-shot methods. Our code is available at: https://github.com/FeiElysia/ViECap △ Less

Submitted 31 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023

arXiv:2306.02061 [pdf, other]

Balancing Logit Variation for Long-tailed Semantic Segmentation

Authors: Yuchao Wang, Jingjing Fei, Haochen Wang, Wei Li, Tianpeng Bao, Liwei Wu, Rui Zhao, Yujun Shen

Abstract: Semantic segmentation usually suffers from a long-tail data distribution. Due to the imbalanced number of samples across categories, the features of those tail classes may get squeezed into a narrow area in the feature space. Towards a balanced feature distribution, we introduce category-wise variation into the network predictions in the training phase such that an instance is no longer projected… ▽ More Semantic segmentation usually suffers from a long-tail data distribution. Due to the imbalanced number of samples across categories, the features of those tail classes may get squeezed into a narrow area in the feature space. Towards a balanced feature distribution, we introduce category-wise variation into the network predictions in the training phase such that an instance is no longer projected to a feature point, but a small region instead. Such a perturbation is highly dependent on the category scale, which appears as assigning smaller variation to head classes and larger variation to tail classes. In this way, we manage to close the gap between the feature areas of different categories, resulting in a more balanced representation. It is noteworthy that the introduced variation is discarded at the inference stage to facilitate a confident prediction. Although with an embarrassingly simple implementation, our method manifests itself in strong generalizability to various datasets and task settings. Extensive experiments suggest that our plug-in design lends itself well to a range of state-of-the-art approaches and boosts the performance on top of them. △ Less

Submitted 3 June, 2023; originally announced June 2023.

arXiv:2305.13752 [pdf, other]

Pulling Target to Source: A New Perspective on Domain Adaptive Semantic Segmentation

Authors: Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Liwei Wu, Yuxi Wang, Zhaoxiang Zhang

Abstract: Domain adaptive semantic segmentation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, existing methods primarily focus on directly learning qualified target features, making it challenging to guarantee their discrimination in the absence of target labels. This work provides a new perspective. We observe that the features learned with source data mana… ▽ More Domain adaptive semantic segmentation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, existing methods primarily focus on directly learning qualified target features, making it challenging to guarantee their discrimination in the absence of target labels. This work provides a new perspective. We observe that the features learned with source data manage to keep categorically discriminative during training, thereby enabling us to implicitly learn adequate target representations by simply \textbf{pulling target features close to source features for each category}. To this end, we propose T2S-DA, which we interpret as a form of pulling Target to Source for Domain Adaptation, encouraging the model in learning similar cross-domain features. Also, considering the pixel categories are heavily imbalanced for segmentation datasets, we come up with a dynamic re-weighting strategy to help the model concentrate on those underperforming classes. Extensive experiments confirm that T2S-DA learns a more discriminative and generalizable representation, significantly surpassing the state-of-the-art. We further show that our method is quite qualified for the domain generalization task, verifying its domain-invariant property. △ Less

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.02677 [pdf, other]

Caption Anything: Interactive Image Description with Diverse Multimodal Controls

Authors: Teng Wang, Jinrui Zhang, Junjie Fei, Hao Zheng, Yunlong Tang, Zhe Li, Mingqi Gao, Shanshan Zhao

Abstract: Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, $\textit{e.g.}$, looking at the specified regions or telling in a particular text style. State-of-the-art methods are trained on annotated pairs of input controls and output captions. However, the scarcity of such well-annotated multimodal data largely limits… ▽ More Controllable image captioning is an emerging multimodal topic that aims to describe the image with natural language following human purpose, $\textit{e.g.}$, looking at the specified regions or telling in a particular text style. State-of-the-art methods are trained on annotated pairs of input controls and output captions. However, the scarcity of such well-annotated multimodal data largely limits their usability and scalability for interactive AI systems. Leveraging unimodal instruction-following foundation models is a promising alternative that benefits from broader sources of data. In this paper, we present Caption AnyThing (CAT), a foundation model augmented image captioning framework supporting a wide range of multimodel controls: 1) visual controls, including points, boxes, and trajectories; 2) language controls, such as sentiment, length, language, and factuality. Powered by Segment Anything Model (SAM) and ChatGPT, we unify the visual and language prompts into a modularized framework, enabling the flexible combination between different controls. Extensive case studies demonstrate the user intention alignment capabilities of our framework, shedding light on effective user interaction modeling in vision-language applications. Our code is publicly available at https://github.com/ttengwang/Caption-Anything. △ Less

Submitted 6 July, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

Comments: Tech-report

arXiv:2301.12178 [pdf, other]

MVKT-ECG: Efficient Single-lead ECG Classification on Multi-Label Arrhythmia by Multi-View Knowledge Transferring

Authors: Yuzhen Qin, Li Sun, Hui Chen, Wei-qiang Zhang, Wenming Yang, Jintao Fei, Guijin Wang

Abstract: The widespread emergence of smart devices for ECG has sparked demand for intelligent single-lead ECG-based diagnostic systems. However, it is challenging to develop a single-lead-based ECG interpretation model for multiple diseases diagnosis due to the lack of some key disease information. In this work, we propose inter-lead Multi-View Knowledge Transferring of ECG (MVKT-ECG) to boost single-lead… ▽ More The widespread emergence of smart devices for ECG has sparked demand for intelligent single-lead ECG-based diagnostic systems. However, it is challenging to develop a single-lead-based ECG interpretation model for multiple diseases diagnosis due to the lack of some key disease information. In this work, we propose inter-lead Multi-View Knowledge Transferring of ECG (MVKT-ECG) to boost single-lead ECG's ability for multi-label disease diagnosis. This training strategy can transfer superior disease knowledge from multiple different views of ECG (e.g. 12-lead ECG) to single-lead-based ECG interpretation model to mine details in single-lead ECG signals that are easily overlooked by neural networks. MVKT-ECG allows this lead variety as a supervision signal within a teacher-student paradigm, where the teacher observes multi-lead ECG educates a student who observes only single-lead ECG. Since the mutual disease information between the single-lead ECG and muli-lead ECG plays a key role in knowledge transferring, we present a new disease-aware Contrastive Lead-information Transferring(CLT) to improve the mutual disease information between the single-lead ECG and muli-lead ECG. Moreover, We modify traditional Knowledge Distillation to multi-label disease Knowledge Distillation (MKD) to make it applicable for multi-label disease diagnosis. The comprehensive experiments verify that MVKT-ECG has an excellent performance in improving the diagnostic effect of single-lead ECG. △ Less

Submitted 28 January, 2023; originally announced January 2023.

arXiv:2212.14309 [pdf, other]

Learning to mask: Towards generalized face forgery detection

Authors: Jianwei Fei, Yunshu Dai, Huaming Wang, Zhihua Xia

Abstract: Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in terms of generalization by synthetic forgery data augmentation. In this work, we explore another path for improving the generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting on specific forg… ▽ More Generalizability to unseen forgery types is crucial for face forgery detectors. Recent works have made significant progress in terms of generalization by synthetic forgery data augmentation. In this work, we explore another path for improving the generalization. Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting on specific forgery types. Specifically, in our method, a teacher network takes as input the face images and generates an attention map of the deep features by a diverse multihead attention ViT. The attention map is used to guide a student network to focus on the low-attended features by reducing the highly-attended deep features. A deep feature mixup strategy is also proposed to synthesize forgeries in the feature domain. Experiments demonstrate that, without data augmentation, our method is able to achieve promising performances on unseen forgeries and highly compressed data. △ Less

Submitted 29 December, 2022; originally announced December 2022.

arXiv:2212.13466 [pdf, other]

General GAN-generated image detection by data augmentation in fingerprint domain

Authors: Huaming Wang, Jianwei Fei, Yunshu Dai, Lingyun Leng, Zhihua Xia

Abstract: In this work, we investigate improving the generalizability of GAN-generated image detectors by performing data augmentation in the fingerprint domain. Specifically, we first separate the fingerprints and contents of the GAN-generated images using an autoencoder based GAN fingerprint extractor, followed by random perturbations of the fingerprints. Then the original fingerprints are substituted wit… ▽ More In this work, we investigate improving the generalizability of GAN-generated image detectors by performing data augmentation in the fingerprint domain. Specifically, we first separate the fingerprints and contents of the GAN-generated images using an autoencoder based GAN fingerprint extractor, followed by random perturbations of the fingerprints. Then the original fingerprints are substituted with the perturbed fingerprints and added to the original contents, to produce images that are visually invariant but with distinct fingerprints. The perturbed images can successfully imitate images generated by different GANs to improve the generalization of the detectors, which is demonstrated by the spectra visualization. To our knowledge, we are the first to conduct data augmentation in the fingerprint domain. Our work explores a novel prospect that is distinct from previous works on spatial and frequency domain augmentation. Extensive cross-GAN experiments demonstrate the effectiveness of our method compared to the state-of-the-art methods in detecting fake images generated by unknown GANs. △ Less

Submitted 9 April, 2023; v1 submitted 27 December, 2022; originally announced December 2022.

arXiv:2211.13968 [pdf, other]

MIAD: A Maintenance Inspection Dataset for Unsupervised Anomaly Detection

Authors: Tianpeng Bao, Jiadong Chen, Wei Li, Xiang Wang, Jingjing Fei, Liwei Wu, Rui Zhao, Ye Zheng

Abstract: Visual anomaly detection plays a crucial role in not only manufacturing inspection to find defects of products during manufacturing processes, but also maintenance inspection to keep equipment in optimum working condition particularly outdoors. Due to the scarcity of the defective samples, unsupervised anomaly detection has attracted great attention in recent years. However, existing datasets for… ▽ More Visual anomaly detection plays a crucial role in not only manufacturing inspection to find defects of products during manufacturing processes, but also maintenance inspection to keep equipment in optimum working condition particularly outdoors. Due to the scarcity of the defective samples, unsupervised anomaly detection has attracted great attention in recent years. However, existing datasets for unsupervised anomaly detection are biased towards manufacturing inspection, not considering maintenance inspection which is usually conducted under outdoor uncontrolled environment such as varying camera viewpoints, messy background and degradation of object surface after long-term working. We focus on outdoor maintenance inspection and contribute a comprehensive Maintenance Inspection Anomaly Detection (MIAD) dataset which contains more than 100K high-resolution color images in various outdoor industrial scenarios. This dataset is generated by a 3D graphics software and covers both surface and logical anomalies with pixel-precise ground truth. Extensive evaluations of representative algorithms for unsupervised anomaly detection are conducted, and we expect MIAD and corresponding experimental results can inspire research community in outdoor unsupervised anomaly detection tasks. Worthwhile and related future work can be spawned from our new dataset. △ Less

Submitted 28 November, 2022; v1 submitted 25 November, 2022; originally announced November 2022.

arXiv:2211.07052 [pdf, other]

Treatment-RSPN: Recurrent Sum-Product Networks for Sequential Treatment Regimes

Authors: Adam Dejl, Harsh Deep, Jonathan Fei, Ardavan Saeedi, Li-wei H. Lehman

Abstract: Sum-product networks (SPNs) have recently emerged as a novel deep learning architecture enabling highly efficient probabilistic inference. Since their introduction, SPNs have been applied to a wide range of data modalities and extended to time-sequence data. In this paper, we propose a general framework for modelling sequential treatment decision-making behaviour and treatment response using recur… ▽ More Sum-product networks (SPNs) have recently emerged as a novel deep learning architecture enabling highly efficient probabilistic inference. Since their introduction, SPNs have been applied to a wide range of data modalities and extended to time-sequence data. In this paper, we propose a general framework for modelling sequential treatment decision-making behaviour and treatment response using recurrent sum-product networks (RSPNs). Models developed using our framework benefit from the full range of RSPN capabilities, including the abilities to model the full distribution of the data, to seamlessly handle latent variables, missing values and categorical data, and to efficiently perform marginal and conditional inference. Our methodology is complemented by a novel variant of the expectation-maximization algorithm for RSPNs, enabling efficient training of our models. We evaluate our approach on a synthetic dataset as well as real-world data from the MIMIC-IV intensive care unit medical database. Our evaluation demonstrates that our approach can closely match the ground-truth data generation process on synthetic data and achieve results close to neural and probabilistic baselines while using a tractable and interpretable model. △ Less

Submitted 13 November, 2022; originally announced November 2022.

Comments: Extended Abstract presented at Machine Learning for Health (ML4H) symposium 2022, November 28th, 2022, New Orleans, United States & Virtual, http://www.ml4h.cc, 14 pages

ACM Class: G.3; I.2

arXiv:2209.15490 [pdf, other]

Learning Second Order Local Anomaly for General Face Forgery Detection

Authors: Jianwei Fei, Yunshu Dai, Peipeng Yu, Tianrun Shen, Zhihua Xia, Jian Weng

Abstract: In this work, we propose a novel method to improve the generalization ability of CNN-based face forgery detectors. Our method considers the feature anomalies of forged faces caused by the prevalent blending operations in face forgery algorithms. Specifically, we propose a weakly supervised Second Order Local Anomaly (SOLA) learning module to mine anomalies in local regions using deep feature maps.… ▽ More In this work, we propose a novel method to improve the generalization ability of CNN-based face forgery detectors. Our method considers the feature anomalies of forged faces caused by the prevalent blending operations in face forgery algorithms. Specifically, we propose a weakly supervised Second Order Local Anomaly (SOLA) learning module to mine anomalies in local regions using deep feature maps. SOLA first decomposes the neighborhood of local features by different directions and distances and then calculates the first and second order local anomaly maps which provide more general forgery traces for the classifier. We also propose a Local Enhancement Module (LEM) to improve the discrimination between local features of real and forged regions, so as to ensure accuracy in calculating anomalies. Besides, an improved Adaptive Spatial Rich Model (ASRM) is introduced to help mine subtle noise features via learnable high pass filters. With neither pixel level annotations nor external synthetic data, our method using a simple ResNet18 backbone achieves competitive performances compared with state-of-the-art works when evaluated on unseen forgeries. △ Less

Submitted 30 September, 2022; originally announced September 2022.

arXiv:2209.09434 [pdf, other]

BP-Im2col: Implicit Im2col Supporting AI Backpropagation on Systolic Arrays

Authors: Jianchao Yang, Mei Wen, Junzhong Shen, Yasong Cao, Minjin Tang, Renyu Yang, Jiawei Fei, Chunyuan Zhang

Abstract: State-of-the-art systolic array-based accelerators adopt the traditional im2col algorithm to accelerate the inference of convolutional layers. However, traditional im2col cannot efficiently support AI backpropagation. Backpropagation in convolutional layers involves performing transposed convolution and dilated convolution, which usually introduces plenty of zero-spaces into the feature map or ker… ▽ More State-of-the-art systolic array-based accelerators adopt the traditional im2col algorithm to accelerate the inference of convolutional layers. However, traditional im2col cannot efficiently support AI backpropagation. Backpropagation in convolutional layers involves performing transposed convolution and dilated convolution, which usually introduces plenty of zero-spaces into the feature map or kernel. The zero-space data reorganization interfere with the continuity of training and incur additional and non-negligible overhead in terms of off- and on-chip storage, access and performance. Since countermeasures for backpropagation are rarely proposed, we propose BP-im2col, a novel im2col algorithm for AI backpropagation, and implement it in RTL on a TPU-like accelerator. Experiments on TPU-like accelerator indicate that BP-im2col reduces the backpropagation runtime by 34.9% on average, and reduces the bandwidth of off-chip memory and on-chip buffers by at least 22.7% and 70.6% respectively, over a baseline accelerator adopting the traditional im2col. It further reduces the additional storage overhead in the backpropagation process by at least 74.78%. △ Less

Submitted 19 September, 2022; originally announced September 2022.

Comments: Accepted in ICCD 2022, The 40th IEEE International Conference on Computer Design

arXiv:2209.07237 [pdf, other]

Robust Implementation of Foreground Extraction and Vessel Segmentation for X-ray Coronary Angiography Image Sequence

Authors: Zeyu Fu, Zhuang Fu, Chenzhuo Lu, Jun Yan, Jian Fei, Hui Han

Abstract: The extraction of contrast-filled vessels from X-ray coronary angiography (XCA) image sequence has important clinical significance for intuitively diagnosis and therapy. In this study, the XCA image sequence is regarded as a 3D tensor input, the vessel layer is regarded as a sparse tensor, and the background layer is regarded as a low-rank tensor. Using tensor nuclear norm (TNN) minimization, a no… ▽ More The extraction of contrast-filled vessels from X-ray coronary angiography (XCA) image sequence has important clinical significance for intuitively diagnosis and therapy. In this study, the XCA image sequence is regarded as a 3D tensor input, the vessel layer is regarded as a sparse tensor, and the background layer is regarded as a low-rank tensor. Using tensor nuclear norm (TNN) minimization, a novel method for vessel layer extraction based on tensor robust principal component analysis (TRPCA) is proposed. Furthermore, considering the irregular movement of vessels and the low-frequency dynamic disturbance of surrounding irrelevant tissues, the total variation (TV) regularized spatial-temporal constraint is introduced to smooth the foreground layer. Subsequently, for vessel layer images with uneven contrast distribution, a two-stage region growing (TSRG) method is utilized for vessel enhancement and segmentation. A global threshold method is used as the preprocessing to obtain main branches, and the Radon-Like features (RLF) filter is used to enhance and connect broken minor segments, the final binary vessel mask is constructed by combining the two intermediate results. The visibility of TV-TRPCA algorithm for foreground extraction is evaluated on clinical XCA image sequences and third-party dataset, which can effectively improve the performance of commonly used vessel segmentation algorithms. Based on TV-TRPCA, the accuracy of TSRG algorithm for vessel segmentation is further evaluated. Both qualitative and quantitative results validate the superiority of the proposed method over existing state-of-the-art approaches. △ Less

Submitted 27 February, 2023; v1 submitted 15 September, 2022; originally announced September 2022.

Comments: 34pages, 14figures, 5tables

arXiv:2209.06993 [pdf, other]

Learning from Future: A Novel Self-Training Framework for Semantic Segmentation

Authors: Ye Du, Yujun Shen, Haochen Wang, Jingjing Fei, Wei Li, Liwei Wu, Rui Zhao, Zehua Fu, Qingjie Liu

Abstract: Self-training has shown great potential in semi-supervised learning. Its core idea is to use the model learned on labeled data to generate pseudo-labels for unlabeled samples, and in turn teach itself. To obtain valid supervision, active attempts typically employ a momentum teacher for pseudo-label prediction yet observe the confirmation bias issue, where the incorrect predictions may provide wron… ▽ More Self-training has shown great potential in semi-supervised learning. Its core idea is to use the model learned on labeled data to generate pseudo-labels for unlabeled samples, and in turn teach itself. To obtain valid supervision, active attempts typically employ a momentum teacher for pseudo-label prediction yet observe the confirmation bias issue, where the incorrect predictions may provide wrong supervision signals and get accumulated in the training process. The primary cause of such a drawback is that the prevailing self-training framework acts as guiding the current state with previous knowledge, because the teacher is updated with the past student only. To alleviate this problem, we propose a novel self-training strategy, which allows the model to learn from the future. Concretely, at each training step, we first virtually optimize the student (i.e., caching the gradients without applying them to the model weights), then update the teacher with the virtual future student, and finally ask the teacher to produce pseudo-labels for the current student as the guidance. In this way, we manage to improve the quality of pseudo-labels and thus boost the performance. We also develop two variants of our future-self-training (FST) framework through peeping at the future both deeply (FST-D) and widely (FST-W). Taking the tasks of unsupervised domain adaptive semantic segmentation and semi-supervised semantic segmentation as the instances, we experimentally demonstrate the effectiveness and superiority of our approach under a wide range of settings. Code will be made publicly available. △ Less

Submitted 18 September, 2022; v1 submitted 14 September, 2022; originally announced September 2022.

Comments: Accepted to NeurIPS 2022

arXiv:2209.03466 [pdf, other]

Supervised GAN Watermarking for Intellectual Property Protection

Authors: Jianwei Fei, Zhihua Xia, Benedetta Tondi, Mauro Barni

Abstract: We propose a watermarking method for protecting the Intellectual Property (IP) of Generative Adversarial Networks (GANs). The aim is to watermark the GAN model so that any image generated by the GAN contains an invisible watermark (signature), whose presence inside the image can be checked at a later stage for ownership verification. To achieve this goal, a pre-trained CNN watermarking decoding bl… ▽ More We propose a watermarking method for protecting the Intellectual Property (IP) of Generative Adversarial Networks (GANs). The aim is to watermark the GAN model so that any image generated by the GAN contains an invisible watermark (signature), whose presence inside the image can be checked at a later stage for ownership verification. To achieve this goal, a pre-trained CNN watermarking decoding block is inserted at the output of the generator. The generator loss is then modified by including a watermark loss term, to ensure that the prescribed watermark can be extracted from the generated images. The watermark is embedded via fine-tuning, with reduced time complexity. Results show that our method can effectively embed an invisible watermark inside the generated images. Moreover, our method is a general one and can work with different GAN architectures, different tasks, and different resolutions of the output image. We also demonstrate the good robustness performance of the embedded watermark against several post-processing, among them, JPEG compression, noise addition, blurring, and color transformations. △ Less

Submitted 7 September, 2022; originally announced September 2022.

arXiv:2206.11476 [pdf, other]

Dynamic Scene Deblurring Based on Continuous Cross-Layer Attention Transmission

Authors: Xia Hua, Mingxin Li, Junxiong Fei, Yu Shi, JianGuo Liu, Hanyu Hong

Abstract: The deep convolutional neural networks (CNNs) using attention mechanism have achieved great success for dynamic scene deblurring. In most of these networks, only the features refined by the attention maps can be passed to the next layer and the attention maps of different layers are separated from each other, which does not make full use of the attention information from different layers in the CN… ▽ More The deep convolutional neural networks (CNNs) using attention mechanism have achieved great success for dynamic scene deblurring. In most of these networks, only the features refined by the attention maps can be passed to the next layer and the attention maps of different layers are separated from each other, which does not make full use of the attention information from different layers in the CNN. To address this problem, we introduce a new continuous cross-layer attention transmission (CCLAT) mechanism that can exploit hierarchical attention information from all the convolutional layers. Based on the CCLAT mechanism, we use a very simple attention module to construct a novel residual dense attention fusion block (RDAFB). In RDAFB, the attention maps inferred from the outputs of the preceding RDAFB and each layer are directly connected to the subsequent ones, leading to a CCLAT mechanism. Taking RDAFB as the building block, we design an effective architecture for dynamic scene deblurring named RDAFNet. The experiments on benchmark datasets show that the proposed model outperforms the state-of-the-art deblurring approaches, and demonstrate the effectiveness of CCLAT mechanism. △ Less

Submitted 28 January, 2023; v1 submitted 23 June, 2022; originally announced June 2022.

arXiv:2203.04007 [pdf, other]

doi 10.1609/aaai.v36i1.19939

DuMLP-Pin: A Dual-MLP-dot-product Permutation-invariant Network for Set Feature Extraction

Authors: Jiajun Fei, Ziyu Zhu, Wenlei Liu, Zhidong Deng, Mingyang Li, Huanjun Deng, Shuo Zhang

Abstract: Existing permutation-invariant methods can be divided into two categories according to the aggregation scope, i.e. global aggregation and local one. Although the global aggregation methods, e. g., PointNet and Deep Sets, get involved in simpler structures, their performance is poorer than the local aggregation ones like PointNet++ and Point Transformer. It remains an open problem whether there exi… ▽ More Existing permutation-invariant methods can be divided into two categories according to the aggregation scope, i.e. global aggregation and local one. Although the global aggregation methods, e. g., PointNet and Deep Sets, get involved in simpler structures, their performance is poorer than the local aggregation ones like PointNet++ and Point Transformer. It remains an open problem whether there exists a global aggregation method with a simple structure, competitive performance, and even much fewer parameters. In this paper, we propose a novel global aggregation permutation-invariant network based on dual MLP dot-product, called DuMLP-Pin, which is capable of being employed to extract features for set inputs, including unordered or unstructured pixel, attribute, and point cloud data sets. We strictly prove that any permutation-invariant function implemented by DuMLP-Pin can be decomposed into two or more permutation-equivariant ones in a dot-product way as the cardinality of the given input set is greater than a threshold. We also show that the DuMLP-Pin can be viewed as Deep Sets with strong constraints under certain conditions. The performance of DuMLP-Pin is evaluated on several different tasks with diverse data sets. The experimental results demonstrate that our DuMLP-Pin achieves the best results on the two classification problems for pixel sets and attribute sets. On both the point cloud classification and the part segmentation, the accuracy of DuMLP-Pin is very close to the so-far best-performing local aggregation method with only a 1-2% difference, while the number of required parameters is significantly reduced by more than 85% in classification and 69% in segmentation, respectively. The code is publicly available on https://github.com/JaronTHU/DuMLP-Pin. △ Less

Submitted 30 August, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 16 pages, accepted by AAAI 2022 (https://ojs.aaai.org/index.php/AAAI/article/view/19939), with technical appendix

arXiv:2203.03884 [pdf, other]

Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels

Authors: Yuchao Wang, Haochen Wang, Yujun Shen, Jingjing Fei, Wei Li, Guoqiang Jin, Liwei Wu, Rui Zhao, Xinyi Le

Abstract: The crux of semi-supervised semantic segmentation is to assign adequate pseudo-labels to the pixels of unlabeled images. A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability. We argue that every pixel matters to the model training, even its prediction is ambiguous. Intuit… ▽ More The crux of semi-supervised semantic segmentation is to assign adequate pseudo-labels to the pixels of unlabeled images. A common practice is to select the highly confident predictions as the pseudo ground-truth, but it leads to a problem that most pixels may be left unused due to their unreliability. We argue that every pixel matters to the model training, even its prediction is ambiguous. Intuitively, an unreliable prediction may get confused among the top classes (i.e., those with the highest probabilities), however, it should be confident about the pixel not belonging to the remaining classes. Hence, such a pixel can be convincingly treated as a negative sample to those most unlikely categories. Based on this insight, we develop an effective pipeline to make sufficient use of unlabeled data. Concretely, we separate reliable and unreliable pixels via the entropy of predictions, push each unreliable pixel to a category-wise queue that consists of negative samples, and manage to train the model with all candidate pixels. Considering the training evolution, where the prediction becomes more and more accurate, we adaptively adjust the threshold for the reliable-unreliable partition. Experimental results on various benchmarks and training settings demonstrate the superiority of our approach over the state-of-the-art alternatives. △ Less

Submitted 14 March, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: Accepted to CVPR 2022. Project: https://haochen-wang409.github.io/U2PL/

arXiv:2202.13067 [pdf, other]

A Robust Document Image Watermarking Scheme using Deep Neural Network

Authors: Sulong Ge, Zhihua Xia, Jianwei Fei, Xingming Sun, Jian Weng

Abstract: Watermarking is an important copyright protection technology which generally embeds the identity information into the carrier imperceptibly. Then the identity can be extracted to prove the copyright from the watermarked carrier even after suffering various attacks. Most of the existing watermarking technologies take the nature images as carriers. Different from the natural images, document images… ▽ More Watermarking is an important copyright protection technology which generally embeds the identity information into the carrier imperceptibly. Then the identity can be extracted to prove the copyright from the watermarked carrier even after suffering various attacks. Most of the existing watermarking technologies take the nature images as carriers. Different from the natural images, document images are not so rich in color and texture, and thus have less redundant information to carry watermarks. This paper proposes an end-to-end document image watermarking scheme using the deep neural network. Specifically, an encoder and a decoder are designed to embed and extract the watermark. A noise layer is added to simulate the various attacks that could be encountered in reality, such as the Cropout, Dropout, Gaussian blur, Gaussian noise, Resize, and JPEG Compression. A text-sensitive loss function is designed to limit the embedding modification on characters. An embedding strength adjustment strategy is proposed to improve the quality of watermarked image with little loss of extraction accuracy. Experimental results show that the proposed document image watermarking technology outperforms three state-of-the-arts in terms of the robustness and image quality. △ Less

Submitted 26 February, 2022; originally announced February 2022.

arXiv:2112.06095 [pdf, other]

Unlocking the Power of Inline Floating-Point Operations on Programmable Switches

Authors: Yifan Yuan, Omar Alama, Amedeo Sapio, Jiawei Fei, Jacob Nelson, Dan R. K. Ports, Marco Canini, Nam Sung Kim

Abstract: The advent of switches with programmable dataplanes has enabled the rapid development of new network functionality, as well as providing a platform for acceleration of a broad range of application-level functionality. However, existing switch hardware was not designed with application acceleration in mind, and thus applications requiring operations or datatypes not used in traditional network prot… ▽ More The advent of switches with programmable dataplanes has enabled the rapid development of new network functionality, as well as providing a platform for acceleration of a broad range of application-level functionality. However, existing switch hardware was not designed with application acceleration in mind, and thus applications requiring operations or datatypes not used in traditional network protocols must resort to expensive workarounds. Applications involving floating point data, including distributed training for machine learning and distributed query processing, are key examples. In this paper, we propose FPISA, a floating point representation designed to work efficiently in programmable switches. We first implement FPISA on an Intel Tofino switch, but find that it has limitations that impact throughput and accuracy. We then propose hardware changes to address these limitations based on the open-source Banzai switch architecture, and synthesize them in a 15-nm standard-cell library to demonstrate their feasibility. Finally, we use FPISA to implement accelerators for training for machine learning and for query processing, and evaluate their performance on a switch implementing our changes using emulation. We find that FPISA allows distributed training to use 25-75% fewer CPU cores and provide up to 85.9% better throughput in a CPU-constrained environment than SwitchML. For distributed query processing with floating point data, FPISA enables up to 2.7x better throughput than Spark. △ Less

Submitted 11 December, 2021; originally announced December 2021.

Comments: This paper has been accepted by NSDI'22. This arxiv paper is not the final camera-ready version

arXiv:2108.08166 [pdf, other]

Deployment of Deep Neural Networks for Object Detection on Edge AI Devices with Runtime Optimization

Authors: Lukas Stäcker, Juncong Fei, Philipp Heidenreich, Frank Bonarens, Jason Rambach, Didier Stricker, Christoph Stiller

Abstract: Deep neural networks have proven increasingly important for automotive scene understanding with new algorithms offering constant improvements of the detection performance. However, there is little emphasis on experiences and needs for deployment in embedded environments. We therefore perform a case study of the deployment of two representative object detection networks on an edge AI platform. In p… ▽ More Deep neural networks have proven increasingly important for automotive scene understanding with new algorithms offering constant improvements of the detection performance. However, there is little emphasis on experiences and needs for deployment in embedded environments. We therefore perform a case study of the deployment of two representative object detection networks on an edge AI platform. In particular, we consider RetinaNet for image-based 2D object detection and PointPillars for LiDAR-based 3D object detection. We describe the modifications necessary to convert the algorithms from a PyTorch training environment to the deployment environment taking into account the available tools. We evaluate the runtime of the deployed DNN using two different libraries, TensorRT and TorchScript. In our experiments, we observe slight advantages of TensorRT for convolutional layers and TorchScript for fully connected layers. We also study the trade-off between runtime and performance, when selecting an optimized setup for deployment, and observe that quantization significantly reduces the runtime while having only little impact on the detection performance. △ Less

Submitted 18 August, 2021; originally announced August 2021.

Comments: To present in ICCV 2021 (ERCVAD Workshop)

arXiv:2107.00346 [pdf, other]

MASS: Multi-Attentional Semantic Segmentation of LiDAR Data for Dense Top-View Understanding

Authors: Kunyu Peng, Juncong Fei, Kailun Yang, Alina Roitberg, Jiaming Zhang, Frank Bieder, Philipp Heidenreich, Christoph Stiller, Rainer Stiefelhagen

Abstract: At the heart of all automated driving systems is the ability to sense the surroundings, e.g., through semantic segmentation of LiDAR sequences, which experienced a remarkable progress due to the release of large datasets such as SemanticKITTI and nuScenes-LidarSeg. While most previous works focus on sparse segmentation of the LiDAR input, dense output masks provide self-driving cars with almost co… ▽ More At the heart of all automated driving systems is the ability to sense the surroundings, e.g., through semantic segmentation of LiDAR sequences, which experienced a remarkable progress due to the release of large datasets such as SemanticKITTI and nuScenes-LidarSeg. While most previous works focus on sparse segmentation of the LiDAR input, dense output masks provide self-driving cars with almost complete environment information. In this paper, we introduce MASS - a Multi-Attentional Semantic Segmentation model specifically built for dense top-view understanding of the driving scenes. Our framework operates on pillar- and occupancy features and comprises three attention-based building blocks: (1) a keypoint-driven graph attention, (2) an LSTM-based attention computed from a vector embedding of the spatial input, and (3) a pillar-based attention, resulting in a dense 360-degree segmentation mask. With extensive experiments on both, SemanticKITTI and nuScenes-LidarSeg, we quantitatively demonstrate the effectiveness of our model, outperforming the state of the art by 19.0% on SemanticKITTI and reaching 30.4% in mIoU on nuScenes-LidarSeg, where MASS is the first work addressing the dense segmentation task. Furthermore, our multi-attention model is shown to be very effective for 3D object detection validated on the KITTI-3D dataset, showcasing its high generalizability to other tasks related to 3D vision. △ Less

Submitted 20 January, 2022; v1 submitted 1 July, 2021; originally announced July 2021.

Comments: Accepted to IEEE Transactions on Intelligent Transportation Systems (T-ITS). Code is publicly available at https://github.com/KPeng9510/MASS

arXiv:2105.04169 [pdf, other]

PillarSegNet: Pillar-based Semantic Grid Map Estimation using Sparse LiDAR Data

Authors: Juncong Fei, Kunyu Peng, Philipp Heidenreich, Frank Bieder, Christoph Stiller

Abstract: Semantic understanding of the surrounding environment is essential for automated vehicles. The recent publication of the SemanticKITTI dataset stimulates the research on semantic segmentation of LiDAR point clouds in urban scenarios. While most existing approaches predict sparse pointwise semantic classes for the sparse input LiDAR scan, we propose PillarSegNet to be able to output a dense semanti… ▽ More Semantic understanding of the surrounding environment is essential for automated vehicles. The recent publication of the SemanticKITTI dataset stimulates the research on semantic segmentation of LiDAR point clouds in urban scenarios. While most existing approaches predict sparse pointwise semantic classes for the sparse input LiDAR scan, we propose PillarSegNet to be able to output a dense semantic grid map. In contrast to a previously proposed grid map method, PillarSegNet uses PointNet to learn features directly from the 3D point cloud and then conducts 2D semantic segmentation in the top view. To train and evaluate our approach, we use both sparse and dense ground truth, where the dense ground truth is obtained from multiple superimposed scans. Experimental results on the SemanticKITTI dataset show that PillarSegNet achieves a performance gain of about 10% mIoU over the state-of-the-art grid map method. △ Less

Submitted 5 July, 2021; v1 submitted 10 May, 2021; originally announced May 2021.

Comments: Accepted to present in the 2021 IEEE Intelligent Vehicles Symposium (IV21)

arXiv:2009.12276 [pdf, other]

doi 10.1109/MFI49285.2020.9235240

SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation

Authors: Juncong Fei, Wenbo Chen, Philipp Heidenreich, Sascha Wirges, Christoph Stiller

Abstract: 3D pedestrian detection is a challenging task in automated driving because pedestrians are relatively small, frequently occluded and easily confused with narrow vertical objects. LiDAR and camera are two commonly used sensor modalities for this task, which should provide complementary information. Unexpectedly, LiDAR-only detection methods tend to outperform multisensor fusion methods in public be… ▽ More 3D pedestrian detection is a challenging task in automated driving because pedestrians are relatively small, frequently occluded and easily confused with narrow vertical objects. LiDAR and camera are two commonly used sensor modalities for this task, which should provide complementary information. Unexpectedly, LiDAR-only detection methods tend to outperform multisensor fusion methods in public benchmarks. Recently, PointPainting has been presented to eliminate this performance drop by effectively fusing the output of a semantic segmentation network instead of the raw image information. In this paper, we propose a generalization of PointPainting to be able to apply fusion at different levels. After the semantic augmentation of the point cloud, we encode raw point data in pillars to get geometric features and semantic point data in voxels to get semantic features and fuse them in an effective way. Experimental results on the KITTI test set show that SemanticVoxels achieves state-of-the-art performance in both 3D and bird's eye view pedestrian detection benchmarks. In particular, our approach demonstrates its strength in detecting challenging pedestrian cases and outperforms current state-of-the-art approaches. △ Less

Submitted 25 September, 2020; originally announced September 2020.

Comments: Accepted to present in the 2020 IEEE International Conference on Multisensor Fusion and Integration (MFI 2020)

arXiv:2007.13902 [pdf, other]

Leveraging the Power of Place: A Data-Driven Decision Helper to Improve the Location Decisions of Economic Immigrants

Authors: Jeremy Ferwerda, Nicholas Adams-Cohen, Kirk Bansak, Jennifer Fei, Duncan Lawrence, Jeremy M. Weinstein, Jens Hainmueller

Abstract: A growing number of countries have established programs to attract immigrants who can contribute to their economy. Research suggests that an immigrant's initial arrival location plays a key role in shaping their economic success. Yet immigrants currently lack access to personalized information that would help them identify optimal destinations. Instead, they often rely on availability heuristics,… ▽ More A growing number of countries have established programs to attract immigrants who can contribute to their economy. Research suggests that an immigrant's initial arrival location plays a key role in shaping their economic success. Yet immigrants currently lack access to personalized information that would help them identify optimal destinations. Instead, they often rely on availability heuristics, which can lead to the selection of sub-optimal landing locations, lower earnings, elevated outmigration rates, and concentration in the most well-known locations. To address this issue and counteract the effects of cognitive biases and limited information, we propose a data-driven decision helper that draws on behavioral insights, administrative data, and machine learning methods to inform immigrants' location decisions. The decision helper provides personalized location recommendations that reflect immigrants' preferences as well as data-driven predictions of the locations where they maximize their expected earnings given their profile. We illustrate the potential impact of our approach using backtests conducted with administrative data that links landing data of recent economic immigrants from Canada's Express Entry system with their earnings retrieved from tax records. Simulations across various scenarios suggest that providing location recommendations to incoming economic immigrants can increase their initial earnings and lead to a mild shift away from the most populous landing destinations. Our approach can be implemented within existing institutional structures at minimal cost, and offers governments an opportunity to harness their administrative data to improve outcomes for economic immigrants. △ Less

Submitted 27 July, 2020; originally announced July 2020.

Comments: 51 pages (including appendix), 13 figures. Immigration Policy Lab (IPL) Working Paper Series, Working Paper No. 20-06

arXiv:2002.11573 [pdf, other]

Efficient reinforcement learning control for continuum robots based on Inexplicit Prior Knowledge

Authors: Junjia Liu, Jiaying Shou, Zhuang Fu, Hangfei Zhou, Rongli Xie, Jun Zhang, Jian Fei, Yanna Zhao

Abstract: Compared to rigid robots that are generally studied in reinforcement learning, the physical characteristics of some sophisticated robots such as soft or continuum robots are higher complicated. Moreover, recent reinforcement learning methods are data-inefficient and can not be directly deployed to the robot without simulation. In this paper, we propose an efficient reinforcement learning method ba… ▽ More Compared to rigid robots that are generally studied in reinforcement learning, the physical characteristics of some sophisticated robots such as soft or continuum robots are higher complicated. Moreover, recent reinforcement learning methods are data-inefficient and can not be directly deployed to the robot without simulation. In this paper, we propose an efficient reinforcement learning method based on inexplicit prior knowledge in response to such problems. We first corroborate the method by simulation and employed directly in the real world. By using our method, we can achieve active visual tracking and distance maintenance of a tendon-driven robot which will be critical in minimally invasive procedures. Codes are available at https://github.com/Skylark0924/TendonTrack. △ Less

Submitted 2 October, 2020; v1 submitted 26 February, 2020; originally announced February 2020.

Comments: 11 pages, 12 figures

arXiv:1906.03647 [pdf, other]

A Variant of Gaussian Process Dynamical Systems

Authors: Jing Zhao, Jingjing Fei, Shiliang Sun

Abstract: In order to better model high-dimensional sequential data, we propose a collaborative multi-output Gaussian process dynamical system (CGPDS), which is a novel variant of GPDSs. The proposed model assumes that the output on each dimension is controlled by a shared global latent process and a private local latent process. Thus, the dependence among different dimensions of the sequences can be captur… ▽ More In order to better model high-dimensional sequential data, we propose a collaborative multi-output Gaussian process dynamical system (CGPDS), which is a novel variant of GPDSs. The proposed model assumes that the output on each dimension is controlled by a shared global latent process and a private local latent process. Thus, the dependence among different dimensions of the sequences can be captured, and the unique characteristics of each dimension of the sequences can be maintained. For training models and making prediction, we introduce inducing points and adopt stochastic variational inference methods. △ Less

Submitted 9 June, 2019; originally announced June 2019.

Comments: Technical Report, East China Normal University, November 2018

arXiv:1905.05761 [pdf, ps, other]

Online Anomaly Detection with Sparse Gaussian Processes

Authors: Jingjing Fei, Shiliang Sun

Abstract: Online anomaly detection of time-series data is an important and challenging task in machine learning. Gaussian processes (GPs) are powerful and flexible models for modeling time-series data. However, the high time complexity of GPs limits their applications in online anomaly detection. Attributed to some internal or external changes, concept drift usually occurs in time-series data, where the cha… ▽ More Online anomaly detection of time-series data is an important and challenging task in machine learning. Gaussian processes (GPs) are powerful and flexible models for modeling time-series data. However, the high time complexity of GPs limits their applications in online anomaly detection. Attributed to some internal or external changes, concept drift usually occurs in time-series data, where the characteristics of data and meanings of abnormal behaviors alter over time. Online anomaly detection methods should have the ability to adapt to concept drift. Motivated by the above facts, this paper proposes the method of sparse Gaussian processes with Q-function (SGP-Q). The SGP-Q employs sparse Gaussian processes (SGPs) whose time complexity is lower than that of GPs, thus significantly speeding up online anomaly detection. By using Q-function properly, the SGP-Q can adapt to concept drift well. Moreover, the SGP-Q makes use of few abnormal data in the training data by its strategy of updating training data, resulting in more accurate sparse Gaussian process regression models and better anomaly detection results. We evaluate the SGP-Q on various artificial and real-world datasets. Experimental results validate the effectiveness of the SGP-Q. △ Less

Submitted 14 May, 2019; originally announced May 2019.

arXiv:1901.05571 [pdf, other]

Metaflow: A DAG-Based Network Abstraction for Distributed Applications

Authors: Jiawei Fei, Yang Shi, Qun Huang, Mei Wen

Abstract: In the past decade, increasingly network scheduling techniques have been proposed to boost the distributed application performance. Flow-level metrics, such as flow completion time (FCT), are based on the abstraction of flows yet they cannot capture the semantics of communication in a cluster application. Being aware of this problem, coflow is proposed as a new network abstraction. However, it is… ▽ More In the past decade, increasingly network scheduling techniques have been proposed to boost the distributed application performance. Flow-level metrics, such as flow completion time (FCT), are based on the abstraction of flows yet they cannot capture the semantics of communication in a cluster application. Being aware of this problem, coflow is proposed as a new network abstraction. However, it is insufficient to reveal the dependencies between computation and communication. As a result, the real application performance can be hurt, especially in the absence of hard barriers. Based on the computation DAG of the application, we propose an expressive abstraction namely metaflow that resides in the middle of the two extreme points of flows and coflows. Evaluation results show that metaflow-based scheduling can outperform the coflow-based algorithm by 1.78x. △ Less

Submitted 16 January, 2019; originally announced January 2019.

arXiv:1311.3105 [pdf]

k-DAG Based Lifetime Aware Data Collection in Wireless Sensor Networks

Authors: Jingjing Fei, Hui Wu, Yongxin Wang

Abstract: Wireless Sensor Networks need to be organized for efficient data collection and lifetime maximization. In this paper, we propose a novel routing structure, namely k-DAG, to balance the load of the base station's neighbours while providing the worst-case latency guarantee for data collection, and a distributed algorithm for construction a k-DAG based on a SPD (Shortest Path DAG). In a k-DAG, the le… ▽ More Wireless Sensor Networks need to be organized for efficient data collection and lifetime maximization. In this paper, we propose a novel routing structure, namely k-DAG, to balance the load of the base station's neighbours while providing the worst-case latency guarantee for data collection, and a distributed algorithm for construction a k-DAG based on a SPD (Shortest Path DAG). In a k-DAG, the lengths of the longest path and the shortest path of each sensor node to the base station differ by at most k. By adding sibling edges to a SPD, our distributed algorithm allows critical nodes to have more routing choices. The simulation results show that our approach significantly outperforms the SPD-based data collection approach in both network lifetime and load balance. △ Less

Submitted 13 November, 2013; originally announced November 2013.

Comments: 17 pages, 10 figures

Journal ref: International Journal of Wireless & Mobile Networks (IJWMN) Vol. 5, No. 5, October 2013, pp.17-33

arXiv:1309.2139 [pdf]

Frequency and time domain packet scheduling based on channel prediction with imperfect CQI in LTE

Authors: Yongxin Wang, Kumbesan Sandrasegaran, Xinning Zhu, Jingjing Fei, Xiaoying Kong, Cheng-Chung Lin

Abstract: Channel-dependent scheduling of transmission of data packets in a wireless system is based on measurement and feedback of the channel quality. To alleviate the performance degradation due to simultaneous multiple imperfect channel quality information (CQI), a simple and efficient packet scheduling (PS) algorithm is developed in downlink LTE system for real time traffic. A frequency domain channel… ▽ More Channel-dependent scheduling of transmission of data packets in a wireless system is based on measurement and feedback of the channel quality. To alleviate the performance degradation due to simultaneous multiple imperfect channel quality information (CQI), a simple and efficient packet scheduling (PS) algorithm is developed in downlink LTE system for real time traffic. A frequency domain channel predictor based on Kalman filter is first developed to restore the true CQI from erroneous channel quality feedback. Then, a time domain grouping technique employing the joint of Proportional Fair (PF) and Modified Largest Weighted Delay First (M-LWDF) algorithms is used. It was proved this proposal achieves better performance in terms of system throughput and packet loss ratio by simulation results. △ Less

Submitted 9 September, 2013; originally announced September 2013.

Showing 1–33 of 33 results for author: Fei, J