-
Out-of-Distribution Detection using Neural Activation Prior
Authors:
Weilin Wan,
Weizhong Zhang,
Quan Zhou,
Fan Yi,
Cheng Jin
Abstract:
Out-of-distribution (OOD) detection is a crucial technique for deploying machine learning models in the real world to handle unseen scenarios. In this paper, we first propose a simple yet effective Neural Activation Prior (NAP) for OOD detection. Our neural activation prior is based on a key observation that, for a channel before the global pooling layer of a fully trained neural network, the probability of a few neurons being activated with a large response by an in-distribution (ID) sample is significantly higher than that by an OOD sample. An intuitive explanation is that for a model fully trained on an ID dataset, each channel plays a role in detecting a certain pattern in the ID dataset, and a few neurons can be activated with a large response when the pattern is detected in an input sample. Then, a new scoring function based on this prior is proposed to highlight the role of these strongly activated neurons in OOD detection. Our approach is plug-and-play: it causes no performance degradation on ID data classification and requires no extra training or statistics from training or external datasets. Note that previous methods primarily rely on post-global-pooling features of the neural networks, whereas the within-channel distribution information we leverage is discarded by the global pooling operator. Consequently, our method is orthogonal to existing approaches and can be effectively combined with them in various applications. Experimental results show that our method achieves state-of-the-art performance on the CIFAR benchmarks and the ImageNet dataset, which demonstrates the power of the proposed prior. Finally, we extend our method to Transformers, and the experimental findings indicate that NAP can also significantly enhance the performance of OOD detection on Transformers, thereby demonstrating the broad applicability of this prior knowledge.
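To make the within-channel prior concrete, here is a minimal sketch, assuming post-ReLU activations flattened per channel. The score below (max activation divided by mean activation, averaged over channels) and the names `channel_peak_ratio` and `nap_like_score` are hypothetical illustrations of "a few strongly activated neurons", not the paper's actual scoring function.

```python
from statistics import fmean


def global_avg_pool(feature_map):
    """Standard global average pooling: one mean per channel."""
    return [fmean(channel) for channel in feature_map]


def channel_peak_ratio(channel, eps=1e-6):
    """Largest activation divided by the mean activation of one channel.

    A channel where a few neurons fire strongly (and the rest stay near zero)
    gets a high ratio; a flat channel gets a ratio close to 1.
    """
    return max(channel) / (fmean(channel) + eps)


def nap_like_score(feature_map):
    """Hypothetical ID score: mean peak ratio over channels (higher = more ID-like)."""
    return fmean(channel_peak_ratio(channel) for channel in feature_map)


# Each inner list is one channel's activations (flattened H x W) before pooling.
id_like = [[0.0, 0.1, 0.0, 4.0], [0.2, 0.0, 3.5, 0.1]]   # sparse, peaky channels
ood_like = [[1.0, 1.1, 0.9, 1.0], [0.8, 0.9, 1.0, 0.9]]  # flat channels

print(global_avg_pool(id_like), global_avg_pool(ood_like))
print(nap_like_score(id_like), nap_like_score(ood_like))
```

In this toy example the pooled channel means of the two inputs are nearly identical, while the peak-ratio score separates them clearly, which illustrates why information discarded by global pooling can still be useful for OOD detection.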
Submitted 24 May, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
WaveFlex: A Smart Surface for Private CBRS Wireless Cellular Networks
Authors:
Fan Yi,
Kun Woo Cho,
Yaxiong Xie,
Kyle Jamieson
Abstract:
We present the design and implementation of WaveFlex, the first smart surface that enhances Private LTE/5G networks operating under the shared-license framework in the Citizens Broadband Radio Service frequency band. WaveFlex works in the presence of frequency diversity: multiple nearby base stations operating on different frequencies, as dictated by a Spectrum Access System coordinator. It also handles time dynamism: due to the dynamic sharing rules of the band, base stations occasionally switch channels, especially when priority users enter the network. Finally, WaveFlex operates independently of the network itself, requiring neither access to nor modification of the base station or mobile users, yet it remains compliant with and effective on prevailing cellular protocols. We have designed and fabricated WaveFlex on a custom multi-layer PCB, a software-defined radio-based network monitor, and supporting control software and hardware. Our experimental evaluation benchmarks an operational Private LTE network running at full line rate. Results demonstrate an 8.50 dB average SNR gain, and an average throughput gain of 4.36 Mbps for a single small cell and 3.19 Mbps for four small cells, in a realistic indoor office scenario.
Submitted 17 October, 2023;
originally announced October 2023.
-
Exploiting Facial Relationships and Feature Aggregation for Multi-Face Forgery Detection
Authors:
Chenhao Lin,
Fangbin Yi,
Hang Wang,
Qian Li,
Deng Jingyi,
Chao Shen
Abstract:
Face forgery techniques have emerged as a forefront concern, and numerous detection approaches have been proposed to address this challenge. However, existing methods predominantly concentrate on single-face manipulation detection, leaving the more intricate and realistic realm of multi-face forgeries relatively unexplored. This paper proposes a novel framework explicitly tailored for multi-face forgery detection, filling a critical gap in current research. The framework mainly involves two modules: (i) a facial relationships learning module, which generates distinguishable local features for each face within an image, and (ii) a global feature aggregation module that leverages the mutual constraints between global and local information to enhance forgery detection accuracy. Our experimental results on two publicly available multi-face forgery datasets demonstrate that the proposed approach achieves state-of-the-art performance in multi-face forgery detection scenarios.
Submitted 7 October, 2023;
originally announced October 2023.
-
Prompting or Fine-tuning? A Comparative Study of Large Language Models for Taxonomy Construction
Authors:
Boqi Chen,
Fandi Yi,
Dániel Varró
Abstract:
Taxonomies represent hierarchical relations between entities and are frequently applied in various software modeling and natural language processing (NLP) activities. They are typically subject to a set of structural constraints restricting their content. However, manual taxonomy construction can be time-consuming, incomplete, and costly to maintain. Recent studies of large language models (LLMs) have demonstrated that appropriate user inputs (called prompting) can effectively guide LLMs, such as GPT-3, in diverse NLP tasks without explicit (re-)training. However, existing approaches for automated taxonomy construction typically involve fine-tuning a language model by adjusting model parameters. In this paper, we present a general framework for taxonomy construction that takes structural constraints into account. We then conduct a systematic comparison between the prompting and fine-tuning approaches on a hypernym taxonomy and a novel computer science taxonomy dataset. Our results reveal the following: (1) Even without explicit training on the dataset, the prompting approach outperforms fine-tuning-based approaches. Moreover, the performance gap between prompting and fine-tuning widens when the training dataset is small. However, (2) taxonomies generated by the fine-tuning approach can be easily post-processed to satisfy all the constraints, whereas handling constraint violations in taxonomies produced by the prompting approach can be challenging. These findings provide guidance on selecting the appropriate method for taxonomy construction and highlight potential enhancements for both approaches.
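The abstract does not list the structural constraints, so the sketch below assumes three common ones for a tree-shaped taxonomy: each node has at most one parent, there is exactly one root, and parent links contain no cycles. The function `constraint_violations` is a hypothetical checker for generated (child, parent) hypernym edges, useful for spotting the kind of violations that prompted output may contain.

```python
from collections import defaultdict


def constraint_violations(edges):
    """Return the structural-constraint violations of a taxonomy.

    `edges` holds (child, parent) hypernym pairs. Assumed constraints:
    every node has at most one parent, there is exactly one root, and
    following parent links never revisits a node (no cycles).
    """
    parents = defaultdict(set)
    nodes = set()
    for child, parent in edges:
        parents[child].add(parent)
        nodes.update((child, parent))

    problems = []
    for node in sorted(nodes):
        if len(parents[node]) > 1:
            problems.append(f"multiple parents: {node}")

    roots = sorted(node for node in nodes if not parents[node])
    if len(roots) != 1:
        problems.append(f"expected one root, found {roots}")

    for node in sorted(nodes):
        seen, current = {node}, node
        while parents[current]:
            current = min(parents[current])  # follow one parent deterministically
            if current in seen:
                problems.append(f"cycle reachable from: {node}")
                break
            seen.add(current)
    return problems


print(constraint_violations([("dog", "mammal"), ("cat", "mammal"), ("mammal", "animal")]))
print(constraint_violations([("bat", "mammal"), ("bat", "flyer"),
                             ("mammal", "animal"), ("flyer", "animal")]))
```

A post-processing step would then repair whatever this checker reports, for example by keeping only the highest-scoring parent for a node with several parents.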
Submitted 4 September, 2023;
originally announced September 2023.
-
Multiverse Transformer: 1st Place Solution for Waymo Open Sim Agents Challenge 2023
Authors:
Yu Wang,
Tiebiao Zhao,
Fan Yi
Abstract:
This technical report presents our 1st place solution for the Waymo Open Sim Agents Challenge (WOSAC) 2023. Our proposed MultiVerse Transformer for Agent simulation (MVTA) effectively leverages transformer-based motion prediction approaches, and is tailored for closed-loop simulation of agents. In order to produce simulations with a high degree of realism, we design novel training and sampling methods, and implement a receding horizon prediction mechanism. In addition, we introduce a variable-length history aggregation method to mitigate the compounding error that can arise during closed-loop autoregressive execution. On the WOSAC, our MVTA and its enhanced version MVTE reach a realism meta-metric of 0.5091 and 0.5168, respectively, outperforming all the other methods on the leaderboard.
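The receding horizon mechanism can be illustrated with a generic closed-loop rollout: predict several future steps, commit only the first few, then predict again from the updated history. This is a minimal sketch under stated assumptions; `constant_velocity_predictor` is a toy stand-in for the MVTA transformer, and the `horizon` and `execute` values are hypothetical, not the report's settings.

```python
def constant_velocity_predictor(history, horizon):
    """Toy stand-in for a learned motion model: extrapolate the last step."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    dx, dy = x1 - x0, y1 - y0
    return [(x1 + dx * k, y1 + dy * k) for k in range(1, horizon + 1)]


def receding_horizon_rollout(history, predictor, total_steps, horizon=8, execute=2):
    """Closed-loop rollout: predict `horizon` steps, commit only the first
    `execute` of them, then re-predict from the updated history."""
    trajectory = list(history)
    while len(trajectory) - len(history) < total_steps:
        plan = predictor(trajectory, horizon)
        remaining = total_steps - (len(trajectory) - len(history))
        trajectory.extend(plan[:min(execute, remaining)])
    return trajectory[len(history):]


print(receding_horizon_rollout([(0, 0), (1, 0)], constant_velocity_predictor, total_steps=5))
```

Re-planning every `execute` steps lets each new prediction condition on the agent's own simulated states, which is what makes the loop closed rather than open.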
Submitted 20 June, 2023;
originally announced June 2023.
-
AI vs. Human -- Differentiation Analysis of Scientific Content Generation
Authors:
Yongqiang Ma,
Jiawei Liu,
Fan Yi,
Qikai Cheng,
Yong Huang,
Wei Lu,
Xiaozhong Liu
Abstract:
Recent neural language models have taken a significant step forward in producing remarkably controllable, fluent, and grammatical text. Although studies have found that AI-generated text is not distinguishable from human-written text for crowd-sourcing workers, there still exist errors in AI-generated text that are subtler and harder to spot. We primarily focus on the scenario in which a scientific AI writing assistant is deeply involved. First, we construct a feature description framework to distinguish between AI-generated text and human-written text in terms of syntax, semantics, and pragmatics, based on human evaluation. Then we use the features from the proposed framework, i.e., writing style, coherence, consistency, and argument logistics, to analyze the two types of content. Finally, we adopt several publicly available AI-generated scientific text detection models to investigate the gap between AI-generated scientific text and human-written scientific text. The results suggest that while AI has the potential to generate scientific content that is as accurate as human-written content, there is still a gap in terms of depth and overall quality. AI-generated scientific content is more likely to contain factual errors. We find that there exists a "writing style" gap between AI-generated scientific text and human-written scientific text. Based on the analysis results, we summarize a series of model-agnostic and distribution-agnostic features for detection tasks in other domains. The findings in this paper contribute to guiding the optimization of AI models to produce high-quality content and to addressing related ethical and security concerns.
Submitted 12 February, 2023; v1 submitted 23 January, 2023;
originally announced January 2023.
-
ASFormer: Transformer for Action Segmentation
Authors:
Fangqiu Yi,
Hongyu Wen,
Tingting Jiang
Abstract:
Algorithms for the action segmentation task typically use temporal models to predict what action is occurring at each frame of a minute-long daily activity. Recent studies have shown the potential of the Transformer in modeling the relations among elements in sequential data. However, there are several major concerns when directly applying the Transformer to the action segmentation task, such as the lack of inductive biases with small training sets, the deficit in processing long input sequences, and the limitation of the decoder architecture in utilizing temporal relations among multiple action segments to refine the initial predictions. To address these concerns, we design an efficient Transformer-based model for the action segmentation task, named ASFormer, with three distinctive characteristics: (i) We explicitly bring in local connectivity inductive priors because of the high locality of features. This constrains the hypothesis space within a reliable scope and helps the model learn a proper target function with small training sets. (ii) We apply a predefined hierarchical representation pattern that efficiently handles long input sequences. (iii) We carefully design the decoder to refine the initial predictions from the encoder. Extensive experiments on three public datasets demonstrate the effectiveness of our method. Code is available at https://github.com/ChinaYi/ASFormer.
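A local connectivity prior can be expressed as self-attention restricted to a window of neighboring frames, and a hierarchical pattern as a window that grows with depth. The sketch below is illustrative only: `local_attention` is plain single-head attention with a banded mask, and the doubling schedule in `hierarchical_windows` is an assumption for this example rather than a verified description of ASFormer's code.

```python
import math


def local_attention(queries, keys, values, window):
    """Single-head self-attention over frames with a banded mask.

    Frame t attends only to frames t - window // 2 ... t + window // 2,
    which hard-codes a local-connectivity prior into the attention.
    """
    half = window // 2
    dim = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        lo, hi = max(0, t - half), min(len(keys), t + half + 1)
        scores = [sum(a * b for a, b in zip(q, keys[s])) / math.sqrt(dim)
                  for s in range(lo, hi)]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(score - top) for score in scores]
        total = sum(weights)
        outputs.append([
            sum(w * values[s][j] for w, s in zip(weights, range(lo, hi))) / total
            for j in range(len(values[0]))
        ])
    return outputs


def hierarchical_windows(num_layers, base=1):
    """Assumed schedule: the window doubles at each encoder layer."""
    return [base * 2 ** i for i in range(num_layers)]


frames = [[float(i)] for i in range(8)]  # 8 frames, 1-dimensional features
for layer, window in enumerate(hierarchical_windows(3)):
    print(layer, window, local_attention(frames, frames, frames, window)[0])
```

Because each frame only sees a bounded neighborhood, the cost per layer grows linearly with sequence length, and stacking layers with larger windows still lets deep layers aggregate long-range context.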
Submitted 16 October, 2021;
originally announced October 2021.
-
Not End-to-End: Explore Multi-Stage Architecture for Online Surgical Phase Recognition
Authors:
Fangqiu Yi,
Tingting Jiang
Abstract:
Surgical phase recognition is of particular interest to computer-assisted surgery systems, in which the goal is to predict what phase is occurring at each frame of a surgical video. Networks with a multi-stage architecture have been widely applied in many computer vision tasks with rich patterns, where a predictor stage first outputs initial predictions and an additional refinement stage operates on the initial predictions to refine them further. Existing works show that surgical video contents are well ordered and contain rich temporal patterns, making the multi-stage architecture well suited to the surgical phase recognition task. However, we observe that when the multi-stage architecture is simply applied to the surgical phase recognition task, end-to-end training causes the refinement stage to fall short of expectations. To address this problem, we propose a new non-end-to-end training strategy and explore different designs of multi-stage architecture for the surgical phase recognition task. In the non-end-to-end training strategy, the refinement stage is trained separately on two proposed types of disturbed sequences. Meanwhile, we evaluate three different choices of refinement model to show that our analysis and solution are robust to the choice of specific multi-stage model. We conduct experiments on two public benchmarks, the M2CAI16 Workflow Challenge and the Cholec80 dataset. Results show that the multi-stage architecture trained with our strategy substantially boosts the performance of the current state-of-the-art single-stage model. Code is available at https://github.com/ChinaYi/casual_tcn.
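The abstract does not say which two disturbances are used, so the following is a hypothetical sketch of how disturbed input sequences for a separately trained refinement stage could be generated from ground-truth frame labels: one function jitters phase boundaries, the other replaces random frames with random labels. The refinement model would then learn to map each disturbed sequence back to the clean labels. All function names and parameters here are illustrative assumptions.

```python
import random


def to_segments(labels):
    """Run-length encode frame labels into [label, length] segments."""
    segments = []
    for lab in labels:
        if segments and segments[-1][0] == lab:
            segments[-1][1] += 1
        else:
            segments.append([lab, 1])
    return segments


def shift_boundaries(labels, max_shift, rng):
    """Hypothetical disturbance A: move each phase boundary by up to
    `max_shift` frames while keeping the sequence length fixed."""
    segs = to_segments(labels)
    for i in range(len(segs) - 1):
        delta = rng.randint(-max_shift, max_shift)
        # Clamp so both neighbouring segments stay at least one frame long.
        delta = max(1 - segs[i][1], min(delta, segs[i + 1][1] - 1))
        segs[i][1] += delta
        segs[i + 1][1] -= delta
    return [lab for lab, n in segs for _ in range(n)]


def corrupt_frames(labels, rate, num_classes, rng):
    """Hypothetical disturbance B: replace a fraction of frames with random labels."""
    return [rng.randrange(num_classes) if rng.random() < rate else lab
            for lab in labels]


rng = random.Random(0)
clean = [0] * 10 + [1] * 10 + [2] * 10
print(shift_boundaries(clean, 3, rng))
print(corrupt_frames(clean, 0.2, 3, rng))
```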
Submitted 10 July, 2021;
originally announced July 2021.
-
Deep Learning based Full-reference and No-reference Quality Assessment Models for Compressed UGC Videos
Authors:
Wei Sun,
Tao Wang,
Xiongkuo Min,
Fuwang Yi,
Guangtao Zhai
Abstract:
In this paper, we propose a deep learning-based video quality assessment (VQA) framework to evaluate the quality of compressed user-generated content (UGC) videos. The proposed VQA framework consists of three modules: the feature extraction module, the quality regression module, and the quality pooling module. For the feature extraction module, we fuse the features from intermediate layers of a convolutional neural network (CNN) into a final quality-aware feature representation, which enables the model to make full use of visual information from low level to high level. Specifically, the structure and texture similarities of feature maps extracted from all intermediate layers are calculated as the feature representation for the full-reference (FR) VQA model, and the global mean and standard deviation of the final feature maps, fused from the intermediate feature maps, are calculated as the feature representation for the no-reference (NR) VQA model. For the quality regression module, we use a fully connected (FC) layer to regress the quality-aware features into frame-level scores. Finally, a subjectively inspired temporal pooling strategy is adopted to pool the frame-level scores into a video-level score. The proposed model achieves the best performance among state-of-the-art FR and NR VQA models on the Compressed UGC VQA database and also achieves good performance on in-the-wild UGC VQA databases.
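The FR and NR feature representations can be sketched per channel. The texture (mean-based) and structure (covariance-based) similarity formulas below follow the familiar SSIM/DISTS-style form and are an assumption about what "structure and texture similarities" means here; the constants `c1` and `c2` and both function names are hypothetical.

```python
from statistics import fmean, pstdev


def texture_structure_similarity(ref, dist, c1=1e-6, c2=1e-6):
    """Per-channel similarities between a reference and a distorted feature
    map (each a flat list of activations), in SSIM/DISTS-style form."""
    mu_r, mu_d = fmean(ref), fmean(dist)
    var_r = fmean((a - mu_r) ** 2 for a in ref)
    var_d = fmean((b - mu_d) ** 2 for b in dist)
    cov = fmean((a - mu_r) * (b - mu_d) for a, b in zip(ref, dist))
    texture = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)  # compares means
    structure = (2 * cov + c2) / (var_r + var_d + c2)                # compares patterns
    return texture, structure


def nr_feature(channels):
    """No-reference feature: global mean and standard deviation of each channel."""
    return [value for ch in channels for value in (fmean(ch), pstdev(ch))]


print(texture_structure_similarity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(texture_structure_similarity([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))
print(nr_feature([[1.0, 2.0, 3.0], [0.0, 0.0, 4.0]]))
```

Both similarities equal 1 for identical maps. The second call shows a map with the same mean but reversed spatial pattern: texture similarity stays at 1 while structure similarity drops to about -1, which is why the two terms are complementary.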
Submitted 2 June, 2021;
originally announced June 2021.
-
PBE-CC: Congestion Control via Endpoint-Centric, Physical-Layer Bandwidth Measurements
Authors:
Yaxiong Xie,
Fan Yi,
Kyle Jamieson
Abstract:
Wireless networks are becoming ever more sophisticated and overcrowded, imposing the most delay, jitter, and throughput damage to end-to-end network flows in today's internet. We therefore argue for fine-grained mobile endpoint-based wireless measurements to inform a precise congestion control algorithm through a well-defined API to the mobile's wireless physical layer. Our proposed congestion control algorithm is based on Physical-Layer Bandwidth measurements taken at the Endpoint (PBE-CC), and captures the latest 5G New Radio innovations that increase wireless capacity, yet create abrupt rises and falls in available wireless capacity that the PBE-CC sender can react to precisely and very rapidly. We implement a proof-of-concept prototype of the PBE measurement module on software-defined radios and the PBE sender and receiver in C. An extensive performance evaluation compares PBE-CC head to head against the leading cellular-aware and wireless-oblivious congestion control protocols proposed in the research community and in deployment, in mobile and static mobile scenarios, and over busy and quiet networks. Results show 6.3% higher average throughput than BBR, while simultaneously reducing 95th percentile delay by 1.8x.
Submitted 6 July, 2020; v1 submitted 9 February, 2020;
originally announced February 2020.
-
ConvPath: A Software Tool for Lung Adenocarcinoma Digital Pathological Image Analysis Aided by Convolutional Neural Network
Authors:
Shidan Wang,
Tao Wang,
Lin Yang,
Faliu Yi,
Xin Luo,
Yikun Yang,
Adi Gazdar,
Junya Fujimoto,
Ignacio I. Wistuba,
Bo Yao,
ShinYi Lin,
Yang Xie,
Yousheng Mao,
Guanghua Xiao
Abstract:
The spatial distributions of different types of cells could reveal a cancer cell growth pattern, its relationships with the tumor microenvironment, and the immune response of the body, all of which represent key hallmarks of cancer. However, manually recognizing and localizing all the cells in pathology slides is almost impossible. In this study, we developed an automated cell type classification pipeline, ConvPath, which includes nuclei segmentation, convolutional neural network-based tumor, stromal, and lymphocyte classification, and extraction of tumor microenvironment-related features for lung cancer pathology images. The overall classification accuracy is 92.9% and 90.1% in the training and independent testing datasets, respectively. By identifying cells and classifying cell types, this pipeline can convert a pathology image into a spatial map of tumor, stromal, and lymphocyte cells. From this spatial map, we can extract features that characterize the tumor microenvironment. Based on these features, we developed an image feature-based prognostic model and validated the model in two independent cohorts. The predicted risk group serves as an independent prognostic factor after adjusting for clinical variables that include age, gender, smoking status, and stage.
Submitted 20 September, 2018;
originally announced September 2018.
-
Probabilistic Visual Secret Sharing Schemes for Gray-scale images and Color images
Authors:
Dao-Shun Wang,
Feng Yi,
Xiaobo Li
Abstract:
Visual secret sharing (VSS) is an encryption technique that uses the human visual system to recover the secret image and does not require any complex computation. Pixel expansion has been a major issue of VSS schemes. A number of probabilistic VSS schemes with minimum pixel expansion have been proposed for binary secret images. This paper presents a general probabilistic (k, n)-VSS scheme for gray-scale images and another scheme for color images. With our schemes, the pixel expansion can be set to a user-defined value. When this value is 1, there is no pixel expansion at all. The quality of reconstructed secret images, measured by Average Relative Difference, is equivalent to the Relative Difference of existing deterministic schemes. Previous probabilistic VSS schemes for black-and-white images with respect to pixel expansion can be viewed as special cases of the schemes proposed here.
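The idea of a probabilistic scheme without pixel expansion is easiest to see in the binary (2, 2) case, which the abstract describes as a special case of the proposed schemes. The sketch below illustrates only that binary special case, not the paper's gray-scale or color constructions: for each pixel it picks one column of the standard (2, 2) basis matrices at random, so each share is a single random bit per pixel, and stacking transparencies behaves like a pixel-wise OR.

```python
import random


def make_shares(secret, rng):
    """(2, 2) probabilistic VSS for a binary image with no pixel expansion.

    `secret` is a list of rows, 1 = black, 0 = white. Per pixel, one column of
    the (2, 2) basis matrices is chosen at random: white -> both shares get the
    same random bit; black -> the shares get complementary bits.
    """
    share1, share2 = [], []
    for row in secret:
        r1, r2 = [], []
        for pixel in row:
            b = rng.randint(0, 1)
            r1.append(b)
            r2.append(b if pixel == 0 else 1 - b)
        share1.append(r1)
        share2.append(r2)
    return share1, share2


def stack(s1, s2):
    """Overlaying transparencies acts as a pixel-wise OR (black wins)."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]


secret = [[0, 1, 1, 0], [1, 0, 0, 1]]
s1, s2 = make_shares(secret, random.Random(42))
print(stack(s1, s2))
```

Black pixels always reconstruct as black, and white pixels reconstruct as black with probability 1/2, so the average relative difference is 1 - 1/2 = 1/2. This matches the relative difference of the deterministic (2, 2) scheme with pixel expansion 2, consistent with the equivalence claimed above.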
Submitted 26 December, 2007;
originally announced December 2007.
-
On the Analysis and Generalization of Extended Visual Cryptography Schemes
Authors:
DaoShun Wang,
Feng Yi,
Xiaobo Li,
Ping Luo,
Yiqi Dai
Abstract:
An Extended Visual Cryptography Scheme (EVCS) was proposed by Ateniese et al. [3] to protect a binary secret image with meaningful (innocent-looking) shares. This is implemented by concatenating an extended matrix to each basis matrix. The minimum size of the extended matrix was obtained from a hypergraph coloring model, and the scheme was designed for binary images only [3]. In this paper, we give a more concise derivation of this matrix extension for color images. Furthermore, we present a (k, n) scheme to protect multiple color images with meaningful shares. This scheme is an extension of the (n, n) VCS for multiple binary images proposed by Droste [2].
Submitted 30 October, 2006;
originally announced October 2006.