subscribe to arXiv mailings

GreenCOD: A Green Camouflaged Object Detection Method

Authors: Hong-Shuo Chen, Yao Zhu, Suya You, Azad M. Madni, C. -C. Jay Kuo

Abstract: We introduce GreenCOD, a green method for detecting camouflaged objects, distinct in its avoidance of backpropagation techniques. GreenCOD leverages gradient boosting and deep features extracted from pre-trained Deep Neural Networks (DNNs). Traditional camouflaged object detection (COD) approaches often rely on complex deep neural network architectures, seeking performance improvements through bac… ▽ More We introduce GreenCOD, a green method for detecting camouflaged objects, distinct in its avoidance of backpropagation techniques. GreenCOD leverages gradient boosting and deep features extracted from pre-trained Deep Neural Networks (DNNs). Traditional camouflaged object detection (COD) approaches often rely on complex deep neural network architectures, seeking performance improvements through backpropagation-based fine-tuning. However, such methods are typically computationally demanding and exhibit only marginal performance variations across different models. This raises the question of whether effective training can be achieved without backpropagation. Addressing this, our work proposes a new paradigm that utilizes gradient boosting for COD. This approach significantly simplifies the model design, resulting in a system that requires fewer parameters and operations and maintains high performance compared to state-of-the-art deep learning models. Remarkably, our models are trained without backpropagation and achieve the best performance with fewer than 20G Multiply-Accumulate Operations (MACs). This new, more efficient paradigm opens avenues for further exploration in green, backpropagation-free model training. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2402.06982 [pdf, other]

Treatment-wise Glioblastoma Survival Inference with Multi-parametric Preoperative MRI

Authors: Xiaofeng Liu, Nadya Shusharina, Helen A Shih, C. -C. Jay Kuo, Georges El Fakhri, Jonghye Woo

Abstract: In this work, we aim to predict the survival time (ST) of glioblastoma (GBM) patients undergoing different treatments based on preoperative magnetic resonance (MR) scans. The personalized and precise treatment planning can be achieved by comparing the ST of different treatments. It is well established that both the current status of the patient (as represented by the MR scans) and the choice of tr… ▽ More In this work, we aim to predict the survival time (ST) of glioblastoma (GBM) patients undergoing different treatments based on preoperative magnetic resonance (MR) scans. The personalized and precise treatment planning can be achieved by comparing the ST of different treatments. It is well established that both the current status of the patient (as represented by the MR scans) and the choice of treatment are the cause of ST. While previous related MR-based glioblastoma ST studies have focused only on the direct mapping of MR scans to ST, they have not included the underlying causal relationship between treatments and ST. To address this limitation, we propose a treatment-conditioned regression model for glioblastoma ST that incorporates treatment information in addition to MR scans. Our approach allows us to effectively utilize the data from all of the treatments in a unified manner, rather than having to train separate models for each of the treatments. Furthermore, treatment can be effectively injected into each convolutional layer through the adaptive instance normalization we employ. We evaluate our framework on the BraTS20 ST prediction task. Three treatment options are considered: Gross Total Resection (GTR), Subtotal Resection (STR), and no resection. The evaluation results demonstrate the effectiveness of injecting the treatment for estimating GBM survival. △ Less

Submitted 10 February, 2024; originally announced February 2024.

Comments: SPIE Medical Imaging 2024: Computer-Aided Diagnosis

arXiv:2401.07475 [pdf, other]

GWPT: A Green Word-Embedding-based POS Tagger

Authors: Chengwei Wei, Runqi Pang, C. -C. Jay Kuo

Abstract: As a fundamental tool for natural language processing (NLP), the part-of-speech (POS) tagger assigns the POS label to each word in a sentence. A novel lightweight POS tagger based on word embeddings is proposed and named GWPT (green word-embedding-based POS tagger) in this work. Following the green learning (GL) methodology, GWPT contains three modules in cascade: 1) representation learning, 2) fe… ▽ More As a fundamental tool for natural language processing (NLP), the part-of-speech (POS) tagger assigns the POS label to each word in a sentence. A novel lightweight POS tagger based on word embeddings is proposed and named GWPT (green word-embedding-based POS tagger) in this work. Following the green learning (GL) methodology, GWPT contains three modules in cascade: 1) representation learning, 2) feature learning, and 3) decision learning modules. The main novelty of GWPT lies in representation learning. It uses non-contextual or contextual word embeddings, partitions embedding dimension indices into low-, medium-, and high-frequency sets, and represents them with different N-grams. It is shown by experimental results that GWPT offers state-of-the-art accuracies with fewer model parameters and significantly lower computational complexity in both training and inference as compared with deep-learning-based methods. △ Less

Submitted 15 January, 2024; originally announced January 2024.

arXiv:2312.14968 [pdf, other]

Enhancing Edge Intelligence with Highly Discriminant LNT Features

Authors: Xinyu Wang, Vinod K. Mishra, C. -C. Jay Kuo

Abstract: AI algorithms at the edge demand smaller model sizes and lower computational complexity. To achieve these objectives, we adopt a green learning (GL) paradigm rather than the deep learning paradigm. GL has three modules: 1) unsupervised representation learning, 2) supervised feature learning, and 3) supervised decision learning. We focus on the second module in this work. In particular, we derive n… ▽ More AI algorithms at the edge demand smaller model sizes and lower computational complexity. To achieve these objectives, we adopt a green learning (GL) paradigm rather than the deep learning paradigm. GL has three modules: 1) unsupervised representation learning, 2) supervised feature learning, and 3) supervised decision learning. We focus on the second module in this work. In particular, we derive new discriminant features from proper linear combinations of input features, denoted by x, obtained in the first module. They are called complementary and raw features, respectively. Along this line, we present a novel supervised learning method to generate highly discriminant complementary features based on the least-squares normal transform (LNT). LNT consists of two steps. First, we convert a C-class classification problem to a binary classification problem. The two classes are assigned with 0 and 1, respectively. Next, we formulate a least-squares regression problem from the N-dimensional (N-D) feature space to the 1-D output space, and solve the least-squares normal equation to obtain one N-D normal vector, denoted by a1. Since one normal vector is yielded by one binary split, we can obtain M normal vectors with M splits. Then, Ax is called an LNT of x, where transform matrix A in R^{M by N} by stacking aj^T, j=1, ..., M, and the LNT, Ax, can generate M new features. The newly generated complementary features are shown to be more discriminant than the raw features. Experiments show that the classification performance can be improved by these new features. △ Less

Submitted 19 December, 2023; originally announced December 2023.

Comments: 2023 IEEE International Conference on Big Data, AI and Adaptive Computing for Edge Sensing and Processing Workshop

arXiv:2310.04995 [pdf, other]

SemST: Semantically Consistent Multi-Scale Image Translation via Structure-Texture Alignment

Authors: Ganning Zhao, Wenhui Cui, Suya You, C. -C. Jay Kuo

Abstract: Unsupervised image-to-image (I2I) translation learns cross-domain image mapping that transfers input from the source domain to output in the target domain while preserving its semantics. One challenge is that different semantic statistics in source and target domains result in content discrepancy known as semantic distortion. To address this problem, a novel I2I method that maintains semantic cons… ▽ More Unsupervised image-to-image (I2I) translation learns cross-domain image mapping that transfers input from the source domain to output in the target domain while preserving its semantics. One challenge is that different semantic statistics in source and target domains result in content discrepancy known as semantic distortion. To address this problem, a novel I2I method that maintains semantic consistency in translation is proposed and named SemST in this work. SemST reduces semantic distortion by employing contrastive learning and aligning the structural and textural properties of input and output by maximizing their mutual information. Furthermore, a multi-scale approach is introduced to enhance translation performance, thereby enabling the applicability of SemST to domain adaptation in high-resolution images. Experiments show that SemST effectively mitigates semantic distortion and achieves state-of-the-art performance. Also, the application of SemST to domain adaptation (DA) is explored. It is demonstrated by preliminary experiments that SemST can be utilized as a beneficial pre-training for the semantic segmentation task. △ Less

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2309.12501 [pdf, other]

Knowledge Graph Embedding: An Overview

Authors: Xiou Ge, Yun-Cheng Wang, Bin Wang, C. -C. Jay Kuo

Abstract: Many mathematical models have been leveraged to design embeddings for representing Knowledge Graph (KG) entities and relations for link prediction and many downstream tasks. These mathematically-inspired models are not only highly scalable for inference in large KGs, but also have many explainable advantages in modeling different relation patterns that can be validated through both formal proofs a… ▽ More Many mathematical models have been leveraged to design embeddings for representing Knowledge Graph (KG) entities and relations for link prediction and many downstream tasks. These mathematically-inspired models are not only highly scalable for inference in large KGs, but also have many explainable advantages in modeling different relation patterns that can be validated through both formal proofs and empirical results. In this paper, we make a comprehensive overview of the current state of research in KG completion. In particular, we focus on two main branches of KG embedding (KGE) design: 1) distance-based methods and 2) semantic matching-based methods. We discover the connections between recently proposed models and present an underlying trend that might help researchers invent novel and more effective models. Next, we delve into CompoundE and CompoundE3D, which draw inspiration from 2D and 3D affine operations, respectively. They encompass a broad spectrum of techniques including distance-based and semantic-based methods. We will also discuss an emerging approach for KG completion which leverages pre-trained language models (PLMs) and textual descriptions of entities and relations and offer insights into the integration of KGE embedding methods with PLMs for KG completion. △ Less

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.09078 [pdf, other]

Unsupervised Green Object Tracker (GOT) without Offline Pre-training

Authors: Zhiruo Zhou, Suya You, C. -C. Jay Kuo

Abstract: Supervised trackers trained on labeled data dominate the single object tracking field for superior tracking accuracy. The labeling cost and the huge computational complexity hinder their applications on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility w… ▽ More Supervised trackers trained on labeled data dominate the single object tracking field for superior tracking accuracy. The labeling cost and the huge computational complexity hinder their applications on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility without offline pre-training, and algorithmic transparency, we propose a new single object tracking method, called the green object tracker (GOT), in this work. GOT conducts an ensemble of three prediction branches for robust box tracking: 1) a global object-based correlator to predict the object location roughly, 2) a local patch-based correlator to build temporal correlations of small spatial units, and 3) a superpixel-based segmentator to exploit the spatial information of the target frame. GOT offers competitive tracking accuracy with state-of-the-art unsupervised trackers, which demand heavy offline pre-training, at a lower computation cost. GOT has a tiny model size (<3k parameters) and low inference complexity (around 58M FLOPs per frame). Since its inference complexity is between 0.1%-10% of DL trackers, it can be easily deployed on mobile and edge devices. △ Less

Submitted 16 September, 2023; originally announced September 2023.

arXiv:2309.08836 [pdf, other]

Bias and Fairness in Chatbots: An Overview

Authors: Jintang Xue, Yun-Cheng Wang, Chengwei Wei, Xiaofeng Liu, Jonghye Woo, C. -C. Jay Kuo

Abstract: Chatbots have been studied for more than half a century. With the rapid development of natural language processing (NLP) technologies in recent years, chatbots using large language models (LLMs) have received much attention nowadays. Compared with traditional ones, modern chatbots are more powerful and have been used in real-world applications. There are however, bias and fairness concerns in mode… ▽ More Chatbots have been studied for more than half a century. With the rapid development of natural language processing (NLP) technologies in recent years, chatbots using large language models (LLMs) have received much attention nowadays. Compared with traditional ones, modern chatbots are more powerful and have been used in real-world applications. There are however, bias and fairness concerns in modern chatbot design. Due to the huge amounts of training data, extremely large model sizes, and lack of interpretability, bias mitigation and fairness preservation of modern chatbots are challenging. Thus, a comprehensive overview on bias and fairness in chatbot systems is given in this paper. The history of chatbots and their categories are first reviewed. Then, bias sources and potential harms in applications are analyzed. Considerations in designing fair and unbiased chatbot systems are examined. Finally, future research directions are discussed. △ Less

Submitted 10 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2308.16055 [pdf, other]

AsyncET: Asynchronous Learning for Knowledge Graph Entity Typing with Auxiliary Relations

Authors: Yun-Cheng Wang, Xiou Ge, Bin Wang, C. -C. Jay Kuo

Abstract: Knowledge graph entity typing (KGET) is a task to predict the missing entity types in knowledge graphs (KG). Previously, KG embedding (KGE) methods tried to solve the KGET task by introducing an auxiliary relation, 'hasType', to model the relationship between entities and their types. However, a single auxiliary relation has limited expressiveness for diverse entity-type patterns. We improve the e… ▽ More Knowledge graph entity typing (KGET) is a task to predict the missing entity types in knowledge graphs (KG). Previously, KG embedding (KGE) methods tried to solve the KGET task by introducing an auxiliary relation, 'hasType', to model the relationship between entities and their types. However, a single auxiliary relation has limited expressiveness for diverse entity-type patterns. We improve the expressiveness of KGE methods by introducing multiple auxiliary relations in this work. Similar entity types are grouped to reduce the number of auxiliary relations and improve their capability to model entity-type patterns with different granularities. With the presence of multiple auxiliary relations, we propose a method adopting an Asynchronous learning scheme for Entity Typing, named AsyncET, which updates the entity and type embeddings alternatively to keep the learned entity embedding up-to-date and informative for entity type prediction. Experiments are conducted on two commonly used KGET datasets to show that the performance of KGE methods on the KGET task can be substantially improved by the proposed multiple auxiliary relations and asynchronous embedding learning. Furthermore, our method has a significant advantage over state-of-the-art methods in model sizes and time complexity. △ Less

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2306.17170 [pdf, other]

doi 10.36227/techrxiv.23272271

An Overview on Generative AI at Scale with Edge-Cloud Computing

Authors: Yun-Cheng Wang, Jintang Xue, Chengwei Wei, C. -C. Jay Kuo

Abstract: As a specific category of artificial intelligence (AI), generative artificial intelligence (GenAI) generates new content that resembles what is created by humans. The rapid development of GenAI systems has created a huge amount of new data on the Internet, posing new challenges to current computing and communication frameworks. Currently, GenAI services rely on the traditional cloud computing fram… ▽ More As a specific category of artificial intelligence (AI), generative artificial intelligence (GenAI) generates new content that resembles what is created by humans. The rapid development of GenAI systems has created a huge amount of new data on the Internet, posing new challenges to current computing and communication frameworks. Currently, GenAI services rely on the traditional cloud computing framework due to the need for large computation resources. However, such services will encounter high latency because of data transmission and a high volume of requests. On the other hand, edge-cloud computing can provide adequate computation power and low latency at the same time through the collaboration between edges and the cloud. Thus, it is attractive to build GenAI systems at scale by leveraging the edge-cloud computing paradigm. In this overview paper, we review recent developments in GenAI and edge-cloud computing, respectively. Then, we use two exemplary GenAI applications to discuss technical challenges in scaling up their solutions using edge-cloud collaborative systems. Finally, we list design considerations for training and deploying GenAI systems at scale and point out future research directions. △ Less

Submitted 9 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

arXiv:2306.04008 [pdf]

Green Steganalyzer: A Green Learning Approach to Image Steganalysis

Authors: Yao Zhu, Xinyu Wang, Hong-Shuo Chen, Ronald Salloum, C. -C. Jay Kuo

Abstract: A novel learning solution to image steganalysis based on the green learning paradigm, called Green Steganalyzer (GS), is proposed in this work. GS consists of three modules: 1) pixel-based anomaly prediction, 2) embedding location detection, and 3) decision fusion for image-level detection. In the first module, GS decomposes an image into patches, adopts Saab transforms for feature extraction, and… ▽ More A novel learning solution to image steganalysis based on the green learning paradigm, called Green Steganalyzer (GS), is proposed in this work. GS consists of three modules: 1) pixel-based anomaly prediction, 2) embedding location detection, and 3) decision fusion for image-level detection. In the first module, GS decomposes an image into patches, adopts Saab transforms for feature extraction, and conducts self-supervised learning to predict an anomaly score of their center pixel. In the second module, GS analyzes the anomaly scores of a pixel and its neighborhood to find pixels of higher embedding probabilities. In the third module, GS focuses on pixels of higher embedding probabilities and fuses their anomaly scores to make final image-level classification. Compared with state-of-the-art deep-learning models, GS achieves comparable detection performance against S-UNIWARD, WOW and HILL steganography schemes with significantly lower computational complexity and a smaller model size, making it attractive for mobile/edge applications. Furthermore, GS is mathematically transparent because of its modular design. △ Less

Submitted 6 June, 2023; originally announced June 2023.

arXiv:2304.12591 [pdf, other]

Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints

Authors: Ganning Zhao, Tingwei Shen, Suya You, C. -C. Jay Kuo

Abstract: Ensuring the realism of computer-generated synthetic images is crucial to deep neural network (DNN) training. Due to different semantic distributions between synthetic and real-world captured datasets, there exists semantic mismatch between synthetic and refined images, which in turn results in the semantic distortion. Recently, contrastive learning (CL) has been successfully used to pull correlat… ▽ More Ensuring the realism of computer-generated synthetic images is crucial to deep neural network (DNN) training. Due to different semantic distributions between synthetic and real-world captured datasets, there exists semantic mismatch between synthetic and refined images, which in turn results in the semantic distortion. Recently, contrastive learning (CL) has been successfully used to pull correlated patches together and push uncorrelated ones apart. In this work, we exploit semantic and structural consistency between synthetic and refined images and adopt CL to reduce the semantic distortion. Besides, we incorporate hard negative mining to improve the performance furthermore. We compare the performance of our method with several other benchmarking methods using qualitative and quantitative measures and show that our method offers the state-of-the-art performance. △ Less

Submitted 26 April, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

arXiv:2304.00378 [pdf, other]

Knowledge Graph Embedding with 3D Compound Geometric Transformations

Authors: Xiou Ge, Yun-Cheng Wang, Bin Wang, C. -C. Jay Kuo

Abstract: The cascade of 2D geometric transformations were exploited to model relations between entities in a knowledge graph (KG), leading to an effective KG embedding (KGE) model, CompoundE. Furthermore, the rotation in the 3D space was proposed as a new KGE model, Rotate3D, by leveraging its non-commutative property. Inspired by CompoundE and Rotate3D, we leverage 3D compound geometric transformations, i… ▽ More The cascade of 2D geometric transformations were exploited to model relations between entities in a knowledge graph (KG), leading to an effective KG embedding (KGE) model, CompoundE. Furthermore, the rotation in the 3D space was proposed as a new KGE model, Rotate3D, by leveraging its non-commutative property. Inspired by CompoundE and Rotate3D, we leverage 3D compound geometric transformations, including translation, rotation, scaling, reflection, and shear and propose a family of KGE models, named CompoundE3D, in this work. CompoundE3D allows multiple design variants to match rich underlying characteristics of a KG. Since each variant has its own advantages on a subset of relations, an ensemble of multiple variants can yield superior performance. The effectiveness and flexibility of CompoundE3D are experimentally verified on four popular link prediction datasets. △ Less

Submitted 1 April, 2023; originally announced April 2023.

arXiv:2303.10898 [pdf, other]

A Tiny Machine Learning Model for Point Cloud Object Classification

Authors: Min Zhang, Jintang Xue, Pranav Kadam, Hardik Prajapati, Shan Liu, C. -C. Jay Kuo

Abstract: The design of a tiny machine learning model, which can be deployed in mobile and edge devices, for point cloud object classification is investigated in this work. To achieve this objective, we replace the multi-scale representation of a point cloud object with a single-scale representation for complexity reduction, and exploit rich 3D geometric information of a point cloud object for performance i… ▽ More The design of a tiny machine learning model, which can be deployed in mobile and edge devices, for point cloud object classification is investigated in this work. To achieve this objective, we replace the multi-scale representation of a point cloud object with a single-scale representation for complexity reduction, and exploit rich 3D geometric information of a point cloud object for performance improvement. The proposed solution is named Green-PointHop due to its low computational complexity. We evaluate the performance of Green-PointHop on ModelNet40 and ScanObjectNN two datasets. Green-PointHop has a model size of 64K parameters. It demands 2.3M floating-point operations (FLOPs) to classify a ModelNet40 object of 1024 down-sampled points. Its classification performance gaps against the state-of-the-art DGCNN method are 3% and 7% for ModelNet40 and ScanObjectNN, respectively. On the other hand, the model size and inference complexity of DGCNN are 42X and 1203X of those of Green-PointHop, respectively. △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: 13 pages, 4 figures

arXiv:2303.05759 [pdf, other]

An Overview on Language Models: Recent Developments and Outlook

Authors: Chengwei Wei, Yun-Cheng Wang, Bin Wang, C. -C. Jay Kuo

Abstract: Language modeling studies the probability distributions over strings of texts. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine translation, etc. Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner, while pre-trained language models (PLMs) c… ▽ More Language modeling studies the probability distributions over strings of texts. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine translation, etc. Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner, while pre-trained language models (PLMs) cover broader concepts and can be used in both causal sequential modeling and fine-tuning for downstream applications. PLMs have their own training paradigms (usually self-supervised) and serve as foundation models in modern NLP systems. This overview paper provides an introduction to both CLMs and PLMs from five aspects, i.e., linguistic units, architectures, training methods, evaluation methods, and applications. Furthermore, we discuss the relationship between CLMs and PLMs and shed light on the future directions of language modeling in the pre-trained era. △ Less

Submitted 3 July, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

Report number: APSIPA Transactions on Signal and Information Processing: Vol. 13: No. 2, e101

arXiv:2302.14193 [pdf, other]

PointFlowHop: Green and Interpretable Scene Flow Estimation from Consecutive Point Clouds

Authors: Pranav Kadam, Jiahao Gu, Shan Liu, C. -C. Jay Kuo

Abstract: An efficient 3D scene flow estimation method called PointFlowHop is proposed in this work. PointFlowHop takes two consecutive point clouds and determines the 3D flow vectors for every point in the first point cloud. PointFlowHop decomposes the scene flow estimation task into a set of subtasks, including ego-motion compensation, object association and object-wise motion estimation. It follows the g… ▽ More An efficient 3D scene flow estimation method called PointFlowHop is proposed in this work. PointFlowHop takes two consecutive point clouds and determines the 3D flow vectors for every point in the first point cloud. PointFlowHop decomposes the scene flow estimation task into a set of subtasks, including ego-motion compensation, object association and object-wise motion estimation. It follows the green learning (GL) pipeline and adopts the feedforward data processing path. As a result, its underlying mechanism is more transparent than deep-learning (DL) solutions based on end-to-end optimization of network parameters. We conduct experiments on the stereoKITTI and the Argoverse LiDAR point cloud datasets and demonstrate that PointFlowHop outperforms deep-learning methods with a small model size and less training time. Furthermore, we compare the Floating Point Operations (FLOPs) required by PointFlowHop and other learning-based methods in inference, and show its big savings in computational complexity. △ Less

Submitted 27 February, 2023; originally announced February 2023.

Comments: 13 pages, 5 figures

arXiv:2302.13596 [pdf, other]

LSR: A Light-Weight Super-Resolution Method

Authors: Wei Wang, Xuejing Lei, Yueru Chen, Ming-Sui Lee, C. -C. Jay Kuo

Abstract: A light-weight super-resolution (LSR) method from a single image targeting mobile applications is proposed in this work. LSR predicts the residual image between the interpolated low-resolution (ILR) and high-resolution (HR) images using a self-supervised framework. To lower the computational complexity, LSR does not adopt the end-to-end optimization deep networks. It consists of three modules: 1)… ▽ More A light-weight super-resolution (LSR) method from a single image targeting mobile applications is proposed in this work. LSR predicts the residual image between the interpolated low-resolution (ILR) and high-resolution (HR) images using a self-supervised framework. To lower the computational complexity, LSR does not adopt the end-to-end optimization deep networks. It consists of three modules: 1) generation of a pool of rich and diversified representations in the neighborhood of a target pixel via unsupervised learning, 2) selecting a subset from the representation pool that is most relevant to the underlying super-resolution task automatically via supervised learning, 3) predicting the residual of the target pixel via regression. LSR has low computational complexity and reasonable model size so that it can be implemented on mobile/edge platforms conveniently. Besides, it offers better visual quality than classical exemplar-based methods in terms of PSNR/SSIM measures. △ Less

Submitted 27 February, 2023; originally announced February 2023.

Comments: 8 pages, 3 figures, 10 tables

ACM Class: I.4.3

arXiv:2302.11506 [pdf, other]

S3I-PointHop: SO(3)-Invariant PointHop for 3D Point Cloud Classification

Authors: Pranav Kadam, Hardik Prajapati, Min Zhang, Jintang Xue, Shan Liu, C. -C. Jay Kuo

Abstract: Many point cloud classification methods are developed under the assumption that all point clouds in the dataset are well aligned with the canonical axes so that the 3D Cartesian point coordinates can be employed to learn features. When input point clouds are not aligned, the classification performance drops significantly. In this work, we focus on a mathematically transparent point cloud classific… ▽ More Many point cloud classification methods are developed under the assumption that all point clouds in the dataset are well aligned with the canonical axes so that the 3D Cartesian point coordinates can be employed to learn features. When input point clouds are not aligned, the classification performance drops significantly. In this work, we focus on a mathematically transparent point cloud classification method called PointHop, analyze its reason for failure due to pose variations, and solve the problem by replacing its pose dependent modules with rotation invariant counterparts. The proposed method is named SO(3)-Invariant PointHop (or S3I-PointHop in short). We also significantly simplify the PointHop pipeline using only one single hop along with multiple spatial aggregation techniques. The idea of exploiting more spatial information is novel. Experiments on the ModelNet40 dataset demonstrate the superiority of S3I-PointHop over traditional PointHop-like methods. △ Less

Submitted 22 February, 2023; originally announced February 2023.

Comments: 5 pages, 3 figures

arXiv:2301.08959 [pdf, other]

Successive Subspace Learning for Cardiac Disease Classification with Two-phase Deformation Fields from Cine MRI

Authors: Xiaofeng Liu, Fangxu Xing, Hanna K. Gaggin, C. -C. Jay Kuo, Georges El Fakhri, Jonghye Woo

Abstract: Cardiac cine magnetic resonance imaging (MRI) has been used to characterize cardiovascular diseases (CVD), often providing a noninvasive phenotyping tool.~While recently flourished deep learning based approaches using cine MRI yield accurate characterization results, the performance is often degraded by small training samples. In addition, many deep learning models are deemed a ``black box," for w… ▽ More Cardiac cine magnetic resonance imaging (MRI) has been used to characterize cardiovascular diseases (CVD), often providing a noninvasive phenotyping tool.~While recently flourished deep learning based approaches using cine MRI yield accurate characterization results, the performance is often degraded by small training samples. In addition, many deep learning models are deemed a ``black box," for which models remain largely elusive in how models yield a prediction and how reliable they are. To alleviate this, this work proposes a lightweight successive subspace learning (SSL) framework for CVD classification, based on an interpretable feedforward design, in conjunction with a cardiac atlas. Specifically, our hierarchical SSL model is based on (i) neighborhood voxel expansion, (ii) unsupervised subspace approximation, (iii) supervised regression, and (iv) multi-level feature integration. In addition, using two-phase 3D deformation fields, including end-diastolic and end-systolic phases, derived between the atlas and individual subjects as input offers objective means of assessing CVD, even with small training samples. We evaluate our framework on the ACDC2017 database, comprising one healthy group and four disease groups. Compared with 3D CNN-based approaches, our framework achieves superior classification performance with 140$\times$ fewer parameters, which supports its potential value in clinical use. △ Less

Submitted 21 January, 2023; originally announced January 2023.

Comments: ISBI 2023

arXiv:2212.11484 [pdf, other]

SALVE: Self-supervised Adaptive Low-light Video Enhancement

Authors: Zohreh Azizi, C. -C. Jay Kuo

Abstract: A self-supervised adaptive low-light video enhancement method, called SALVE, is proposed in this work. SALVE first enhances a few key frames of an input low-light video using a retinex-based low-light image enhancement technique. For each keyframe, it learns a mapping from low-light image patches to enhanced ones via ridge regression. These mappings are then used to enhance the remaining frames in… ▽ More A self-supervised adaptive low-light video enhancement method, called SALVE, is proposed in this work. SALVE first enhances a few key frames of an input low-light video using a retinex-based low-light image enhancement technique. For each keyframe, it learns a mapping from low-light image patches to enhanced ones via ridge regression. These mappings are then used to enhance the remaining frames in the low-light video. The combination of traditional retinex-based image enhancement and learning-based ridge regression leads to a robust, adaptive and computationally inexpensive solution to enhance low-light videos. Our extensive experiments along with a user study show that 87% of participants prefer SALVE over prior work. △ Less

Submitted 21 February, 2023; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: 12 pages, 7 figures, 4 tables

arXiv:2211.01096 [pdf, other]

doi 10.1016/j.jvcir.2023.104045

Recovering Sign Bits of DCT Coefficients in Digital Images as an Optimization Problem

Authors: Ruiyuan Lin, Sheng Liu, Jun Jiang, Shujun Li, Chengqing Li, C. -C. Jay Kuo

Abstract: Recovering unknown, missing, damaged, distorted, or lost information in DCT coefficients is a common task in multiple applications of digital image processing, including image compression, selective image encryption, and image communication. This paper investigates the recovery of sign bits in DCT coefficients of digital images, by proposing two different approximation methods to solve a mixed int… ▽ More Recovering unknown, missing, damaged, distorted, or lost information in DCT coefficients is a common task in multiple applications of digital image processing, including image compression, selective image encryption, and image communication. This paper investigates the recovery of sign bits in DCT coefficients of digital images, by proposing two different approximation methods to solve a mixed integer linear programming (MILP) problem, which is NP-hard in general. One method is a relaxation of the MILP problem to a linear programming (LP) problem, and the other splits the original MILP problem into some smaller MILP problems and an LP problem. We considered how the proposed methods can be applied to JPEG-encoded images and conducted extensive experiments to validate their performances. The experimental results showed that the proposed methods outperformed other existing methods by a substantial margin, both according to objective quality metrics and our subjective evaluation. △ Less

Submitted 8 January, 2024; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: 22 pages, 8 figures

MSC Class: 68P30

Journal ref: Journal of Visual Communication and Image Representation, vol. 98, art. no. 104045, 2024

arXiv:2210.03689 [pdf, ps, other]

GENHOP: An Image Generation Method Based on Successive Subspace Learning

Authors: Xuejing Lei, Wei Wang, C. -C. Jay Kuo

Abstract: Being different from deep-learning-based (DL-based) image generation methods, a new image generative model built upon successive subspace learning principle is proposed and named GenHop (an acronym of Generative PixelHop) in this work. GenHop consists of three modules: 1) high-to-low dimension reduction, 2) seed image generation, and 3) low-to-high dimension expansion. In the first module, it buil… ▽ More Being different from deep-learning-based (DL-based) image generation methods, a new image generative model built upon successive subspace learning principle is proposed and named GenHop (an acronym of Generative PixelHop) in this work. GenHop consists of three modules: 1) high-to-low dimension reduction, 2) seed image generation, and 3) low-to-high dimension expansion. In the first module, it builds a sequence of high-to-low dimensional subspaces through a sequence of whitening processes, each of which contains samples of joint-spatial-spectral representation. In the second module, it generates samples in the lowest dimensional subspace. In the third module, it finds a proper high-dimensional sample for a seed image by adding details back via locally linear embedding (LLE) and a sequence of coloring processes. Experiments show that GenHop can generate visually pleasant images whose FID scores are comparable or even better than those of DL-based generative models for MNIST, Fashion-MNIST and CelebA datasets. △ Less

Submitted 7 October, 2022; originally announced October 2022.

Comments: 10 pages, 5 figures, accepted by ISCAS 2022

arXiv:2210.00965 [pdf, other]

Green Learning: Introduction, Examples and Outlook

Authors: C. -C. Jay Kuo, Azad M. Madni

Abstract: Rapid advances in artificial intelligence (AI) in the last decade have largely been built upon the wide applications of deep learning (DL). However, the high carbon footprint yielded by larger and larger DL networks becomes a concern for sustainability. Furthermore, DL decision mechanism is somewhat obsecure and can only be verified by test data. Green learning (GL) has been proposed as an alterna… ▽ More Rapid advances in artificial intelligence (AI) in the last decade have largely been built upon the wide applications of deep learning (DL). However, the high carbon footprint yielded by larger and larger DL networks becomes a concern for sustainability. Furthermore, DL decision mechanism is somewhat obsecure and can only be verified by test data. Green learning (GL) has been proposed as an alternative paradigm to address these concerns. GL is characterized by low carbon footprints, small model sizes, low computational complexity, and logical transparency. It offers energy-effective solutions in cloud centers as well as mobile/edge devices. GL also provides a clear and logical decision-making process to gain people's trust. Several statistical tools have been developed to achieve this goal in recent years. They include subspace approximation, unsupervised and supervised representation learning, supervised discriminant feature selection, and feature space partitioning. We have seen a few successful GL examples with performance comparable with state-of-the-art DL solutions. This paper offers an introduction to GL, its demonstrated applications, and future outlook. △ Less

Submitted 3 October, 2022; originally announced October 2022.

Journal ref: Journal of Visual Communication and Image Representation 2022

arXiv:2209.12139 [pdf, other]

Lightweight Image Codec via Multi-Grid Multi-Block-Size Vector Quantization (MGBVQ)

Authors: Yifan Wang, Zhanxuan Mei, Ioannis Katsavounidis, C. -C. Jay Kuo

Abstract: A multi-grid multi-block-size vector quantization (MGBVQ) method is proposed for image coding in this work. The fundamental idea of image coding is to remove correlations among pixels before quantization and entropy coding, e.g., the discrete cosine transform (DCT) and intra predictions, adopted by modern image coding standards. We present a new method to remove pixel correlations. First, by decom… ▽ More A multi-grid multi-block-size vector quantization (MGBVQ) method is proposed for image coding in this work. The fundamental idea of image coding is to remove correlations among pixels before quantization and entropy coding, e.g., the discrete cosine transform (DCT) and intra predictions, adopted by modern image coding standards. We present a new method to remove pixel correlations. First, by decomposing correlations into long- and short-range correlations, we represent long-range correlations in coarser grids due to their smoothness, thus leading to a multi-grid (MG) coding architecture. Second, we show that short-range correlations can be effectively coded by a suite of vector quantizers (VQs). Along this line, we argue the effectiveness of VQs of very large block sizes and present a convenient way to implement them. It is shown by experimental results that MGBVQ offers excellent rate-distortion (RD) performance, which is comparable with existing image coders, at much lower complexity. Besides, it provides a progressive coded bitstream. △ Less

Submitted 25 September, 2022; originally announced September 2022.

Comments: GIC-python-v2

arXiv:2209.11549 [pdf, other]

MAGIC: Mask-Guided Image Synthesis by Inverting a Quasi-Robust Classifier

Authors: Mozhdeh Rouhsedaghat, Masoud Monajatipoor, C. -C. Jay Kuo, Iacopo Masi

Abstract: We offer a method for one-shot mask-guided image synthesis that allows controlling manipulations of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier to better preserve the input semantics while preserving its classification accuracy, thereby guarant… ▽ More We offer a method for one-shot mask-guided image synthesis that allows controlling manipulations of a single image by inverting a quasi-robust classifier equipped with strong regularizers. Our proposed method, entitled MAGIC, leverages structured gradients from a pre-trained quasi-robust classifier to better preserve the input semantics while preserving its classification accuracy, thereby guaranteeing credibility in the synthesis. Unlike current methods that use complex primitives to supervise the process or use attention maps as a weak supervisory signal, MAGIC aggregates gradients over the input, driven by a guide binary mask that enforces a strong, spatial prior. MAGIC implements a series of manipulations with a single framework achieving shape and location control, intense non-rigid shape deformations, and copy/move operations in the presence of repeating objects and gives users firm control over the synthesis by requiring to simply specify binary guide masks. Our study and findings are supported by various qualitative comparisons with the state-of-the-art on the same images sampled from ImageNet and quantitative analysis using machine perception along with a user survey of 100+ participants that endorse our synthesis quality. Project page at https://mozhdehrouhsedaghat.github.io/magic.html. Code is available at https://github.com/mozhdehrouhsedaghat/magic △ Less

Submitted 30 June, 2023; v1 submitted 23 September, 2022; originally announced September 2022.

Comments: Accepted to the Thirty-Seventh Conference on Artificial Intelligence (AAAI) 2023 - 12 pages, 9 figures

arXiv:2208.09137 [pdf, other]

GreenKGC: A Lightweight Knowledge Graph Completion Method

Authors: Yun-Cheng Wang, Xiou Ge, Bin Wang, C. -C. Jay Kuo

Abstract: Knowledge graph completion (KGC) aims to discover missing relationships between entities in knowledge graphs (KGs). Most prior KGC work focuses on learning embeddings for entities and relations through a simple scoring function. Yet, a higher-dimensional embedding space is usually required for a better reasoning capability, which leads to a larger model size and hinders applicability to real-world… ▽ More Knowledge graph completion (KGC) aims to discover missing relationships between entities in knowledge graphs (KGs). Most prior KGC work focuses on learning embeddings for entities and relations through a simple scoring function. Yet, a higher-dimensional embedding space is usually required for a better reasoning capability, which leads to a larger model size and hinders applicability to real-world problems (e.g., large-scale KGs or mobile/edge computing). A lightweight modularized KGC solution, called GreenKGC, is proposed in this work to address this issue. GreenKGC consists of three modules: representation learning, feature pruning, and decision learning, to extract discriminant KG features and make accurate predictions on missing relationships using classifiers and negative sampling. Experimental results demonstrate that, in low dimensions, GreenKGC can outperform SOTA methods in most datasets. In addition, low-dimensional GreenKGC can achieve competitive or even better performance against high-dimensional models with a much smaller model size. △ Less

Submitted 9 July, 2023; v1 submitted 18 August, 2022; originally announced August 2022.

Comments: Accepted to ACL2023

arXiv:2208.07769 [pdf, other]

Unsupervised Domain Adaptation for Segmentation with Black-box Source Model

Authors: Xiaofeng Liu, Chaehwa Yoo, Fangxu Xing, C. -C. Jay Kuo, Georges El Fakhri, Jonghye Woo

Abstract: Unsupervised domain adaptation (UDA) has been widely used to transfer knowledge from a labeled source domain to an unlabeled target domain to counter the difficulty of labeling in a new domain. The training of conventional solutions usually relies on the existence of both source and target domain data. However, privacy of the large-scale and well-labeled data in the source domain and trained model… ▽ More Unsupervised domain adaptation (UDA) has been widely used to transfer knowledge from a labeled source domain to an unlabeled target domain to counter the difficulty of labeling in a new domain. The training of conventional solutions usually relies on the existence of both source and target domain data. However, privacy of the large-scale and well-labeled data in the source domain and trained model parameters can become the major concern of cross center/domain collaborations. In this work, to address this, we propose a practical solution to UDA for segmentation with a black-box segmentation model trained in the source domain only, rather than original source data or a white-box source model. Specifically, we resort to a knowledge distillation scheme with exponential mixup decay (EMD) to gradually learn target-specific representations. In addition, unsupervised entropy minimization is further applied to regularization of the target domain confidence. We evaluated our framework on the BraTS 2018 database, achieving performance on par with white-box source model adaptation approaches. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: SPIE Medical Imaging 2022: Image Processing

arXiv:2208.07754 [pdf, other]

Subtype-Aware Dynamic Unsupervised Domain Adaptation

Authors: Xiaofeng Liu, Fangxu Xing, Jia You, Jun Lu, C. -C. Jay Kuo, Georges El Fakhri, Jonghye Woo

Abstract: Unsupervised domain adaptation (UDA) has been successfully applied to transfer knowledge from a labeled source domain to target domains without their labels. Recently introduced transferable prototypical networks (TPN) further addresses class-wise conditional alignment. In TPN, while the closeness of class centers between source and target domains is explicitly enforced in a latent space, the unde… ▽ More Unsupervised domain adaptation (UDA) has been successfully applied to transfer knowledge from a labeled source domain to target domains without their labels. Recently introduced transferable prototypical networks (TPN) further addresses class-wise conditional alignment. In TPN, while the closeness of class centers between source and target domains is explicitly enforced in a latent space, the underlying fine-grained subtype structure and the cross-domain within-class compactness have not been fully investigated. To counter this, we propose a new approach to adaptively perform a fine-grained subtype-aware alignment to improve performance in the target domain without the subtype label in both domains. The insight of our approach is that the unlabeled subtypes in a class have the local proximity within a subtype, while exhibiting disparate characteristics, because of different conditional and label shifts. Specifically, we propose to simultaneously enforce subtype-wise compactness and class-wise separation, by utilizing intermediate pseudo-labels. In addition, we systematically investigate various scenarios with and without prior knowledge of subtype numbers, and propose to exploit the underlying subtype structure. Furthermore, a dynamic queue framework is developed to evolve the subtype cluster centroids steadily using an alternative processing scheme. Experimental results, carried out with multi-view congenital heart disease data and VisDA and DomainNet, show the effectiveness and validity of our subtype-aware UDA, compared with state-of-the-art UDA methods. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

arXiv:2208.07023 [pdf, ps, other]

Acceleration of Subspace Learning Machine via Particle Swarm Optimization and Parallel Processing

Authors: Hongyu Fu, Yijing Yang, Yuhuai Liu, Joseph Lin, Ethan Harrison, Vinod K. Mishra, C. -C. Jay Kuo

Abstract: Built upon the decision tree (DT) classification and regression idea, the subspace learning machine (SLM) has been recently proposed to offer higher performance in general classification and regression tasks. Its performance improvement is reached at the expense of higher computational complexity. In this work, we investigate two ways to accelerate SLM. First, we adopt the particle swarm optimizat… ▽ More Built upon the decision tree (DT) classification and regression idea, the subspace learning machine (SLM) has been recently proposed to offer higher performance in general classification and regression tasks. Its performance improvement is reached at the expense of higher computational complexity. In this work, we investigate two ways to accelerate SLM. First, we adopt the particle swarm optimization (PSO) algorithm to speed up the search of a discriminant dimension that is expressed as a linear combination of current dimensions. The search of optimal weights in the linear combination is computationally heavy. It is accomplished by probabilistic search in original SLM. The acceleration of SLM by PSO requires 10-20 times fewer iterations. Second, we leverage parallel processing in the SLM implementation. Experimental results show that the accelerated SLM method achieves a speed up factor of 577 in training time while maintaining comparable classification/regression performance of original SLM. △ Less

Submitted 15 August, 2022; originally announced August 2022.

arXiv:2208.02932 [pdf, other]

Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment

Authors: Yilei Zeng, Jiali Duan, Yang Li, Emilio Ferrara, Lerrel Pinto, C. -C. Jay Kuo, Stefanos Nikolaidis

Abstract: Human-centered AI considers human experiences with AI performance. While abundant research has been helping AI achieve superhuman performance either by fully automatic or weak supervision learning, fewer endeavors are experimenting with how AI can tailor to humans' preferred skill level given fine-grained input. In this work, we guide the curriculum reinforcement learning results towards a preferr… ▽ More Human-centered AI considers human experiences with AI performance. While abundant research has been helping AI achieve superhuman performance either by fully automatic or weak supervision learning, fewer endeavors are experimenting with how AI can tailor to humans' preferred skill level given fine-grained input. In this work, we guide the curriculum reinforcement learning results towards a preferred performance level that is neither too hard nor too easy via learning from the human decision process. To achieve this, we developed a portable, interactive platform that enables the user to interact with agents online via manipulating the task difficulty, observing performance, and providing curriculum feedback. Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications that require millions of samples without a server. The result demonstrates the effectiveness of an interactive curriculum for reinforcement learning involving human-in-the-loop. It shows reinforcement learning performance can successfully adjust in sync with the human desired difficulty level. We believe this research will open new doors for achieving flow and personalized adaptive difficulties. △ Less

Submitted 4 August, 2022; originally announced August 2022.

Comments: 6 pages, 7 figures

ACM Class: I.2.6

arXiv:2208.01823 [pdf, other]

Statistical Attention Localization (SAL): Methodology and Application to Object Classification

Authors: Yijing Yang, Vasileios Magoulianitis, Xinyu Wang, C. -C. Jay Kuo

Abstract: A statistical attention localization (SAL) method is proposed to facilitate the object classification task in this work. SAL consists of three steps: 1) preliminary attention window selection via decision statistics, 2) attention map refinement, and 3) rectangular attention region finalization. SAL computes soft-decision scores of local squared windows and uses them to identify salient regions in… ▽ More A statistical attention localization (SAL) method is proposed to facilitate the object classification task in this work. SAL consists of three steps: 1) preliminary attention window selection via decision statistics, 2) attention map refinement, and 3) rectangular attention region finalization. SAL computes soft-decision scores of local squared windows and uses them to identify salient regions in Step 1. To accommodate object of various sizes and shapes, SAL refines the preliminary result and obtain an attention map of more flexible shape in Step 2. Finally, SAL yields a rectangular attention region using the refined attention map and bounding box regularization in Step 3. As an application, we adopt E-PixelHop, which is an object classification solution based on successive subspace learning (SSL), as the baseline. We apply SAL so as to obtain a cropped-out and resized attention region as an alternative input. Classification results of the whole image as well as the attention region are ensembled to achieve the highest classification accuracy. Experiments on the CIFAR-10 dataset are given to demonstrate the advantage of the SAL-assisted object classification method. △ Less

Submitted 2 August, 2022; originally announced August 2022.

Comments: 11 pages, 9 figures

arXiv:2208.00475 [pdf, other]

Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics

Authors: Xiaoyuan Guo, Jiali Duan, C. -C. Jay Kuo, Judy Wawira Gichoya, Imon Banerjee

Abstract: Language modality within the vision language pretraining framework is innately discretized, endowing each word in the language vocabulary a semantic meaning. In contrast, visual modality is inherently continuous and high-dimensional, which potentially prohibits the alignment as well as fusion between vision and language modalities. We therefore propose to "discretize" the visual representation by… ▽ More Language modality within the vision language pretraining framework is innately discretized, endowing each word in the language vocabulary a semantic meaning. In contrast, visual modality is inherently continuous and high-dimensional, which potentially prohibits the alignment as well as fusion between vision and language modalities. We therefore propose to "discretize" the visual representation by joint learning a codebook that imbues each visual token a semantic. We then utilize these discretized visual semantics as self-supervised ground-truths for building our Masked Image Modeling objective, a counterpart of Masked Language Modeling which proves successful for language models. To optimize the codebook, we extend the formulation of VQ-VAE which gives a theoretic guarantee. Experiments validate the effectiveness of our approach across common vision-language benchmarks. △ Less

Submitted 31 July, 2022; originally announced August 2022.

Comments: 7 pages, 4 figures, ICPR2022. arXiv admin note: text overlap with arXiv:2203.00048

arXiv:2207.11844 [pdf, other]

doi 10.1145/3503161.3548376

Enhancing Image Rescaling using Dual Latent Variables in Invertible Neural Network

Authors: Min Zhang, Zhihong Pan, Xin Zhou, C. -C. Jay Kuo

Abstract: Normalizing flow models have been used successfully for generative image super-resolution (SR) by approximating complex distribution of natural images to simple tractable distribution in latent space through Invertible Neural Networks (INN). These models can generate multiple realistic SR images from one low-resolution (LR) input using randomly sampled points in the latent space, simulating the il… ▽ More Normalizing flow models have been used successfully for generative image super-resolution (SR) by approximating complex distribution of natural images to simple tractable distribution in latent space through Invertible Neural Networks (INN). These models can generate multiple realistic SR images from one low-resolution (LR) input using randomly sampled points in the latent space, simulating the ill-posed nature of image upscaling where multiple high-resolution (HR) images correspond to the same LR. Lately, the invertible process in INN has also been used successfully by bidirectional image rescaling models like IRN and HCFlow for joint optimization of downscaling and inverse upscaling, resulting in significant improvements in upscaled image quality. While they are optimized for image downscaling too, the ill-posed nature of image downscaling, where one HR image could be downsized to multiple LR images depending on different interpolation kernels and resampling methods, is not considered. A new downscaling latent variable, in addition to the original one representing uncertainties in image upscaling, is introduced to model variations in the image downscaling process. This dual latent variable enhancement is applicable to different image rescaling models and it is shown in extensive experiments that it can improve image upscaling accuracy consistently without sacrificing image quality in downscaled LR images. It is also shown to be effective in enhancing other INN-based models for image restoration applications like image hiding. △ Less

Submitted 24 July, 2022; originally announced July 2022.

Comments: Accepted by ACM Multimedia 2022

ACM Class: I.4.5

arXiv:2207.07629 [pdf, other]

GUSOT: Green and Unsupervised Single Object Tracking for Long Video Sequences

Authors: Zhiruo Zhou, Hongyu Fu, Suya You, C. -C. Jay Kuo

Abstract: Supervised and unsupervised deep trackers that rely on deep learning technologies are popular in recent years. Yet, they demand high computational complexity and a high memory cost. A green unsupervised single-object tracker, called GUSOT, that aims at object tracking for long videos under a resource-constrained environment is proposed in this work. Built upon a baseline tracker, UHP-SOT++, which… ▽ More Supervised and unsupervised deep trackers that rely on deep learning technologies are popular in recent years. Yet, they demand high computational complexity and a high memory cost. A green unsupervised single-object tracker, called GUSOT, that aims at object tracking for long videos under a resource-constrained environment is proposed in this work. Built upon a baseline tracker, UHP-SOT++, which works well for short-term tracking, GUSOT contains two additional new modules: 1) lost object recovery, and 2) color-saliency-based shape proposal. They help resolve the tracking loss problem and offer a more flexible object proposal, respectively. Thus, they enable GUSOT to achieve higher tracking accuracy in the long run. We conduct experiments on the large-scale dataset LaSOT with long video sequences, and show that GUSOT offers a lightweight high-performance tracking solution that finds applications in mobile and edge computing platforms. △ Less

Submitted 15 July, 2022; originally announced July 2022.

arXiv:2207.05324 [pdf, other]

CompoundE: Knowledge Graph Embedding with Translation, Rotation and Scaling Compound Operations

Authors: Xiou Ge, Yun-Cheng Wang, Bin Wang, C. -C. Jay Kuo

Abstract: Translation, rotation, and scaling are three commonly used geometric manipulation operations in image processing. Besides, some of them are successfully used in developing effective knowledge graph embedding (KGE) models such as TransE and RotatE. Inspired by the synergy, we propose a new KGE model by leveraging all three operations in this work. Since translation, rotation, and scaling operations… ▽ More Translation, rotation, and scaling are three commonly used geometric manipulation operations in image processing. Besides, some of them are successfully used in developing effective knowledge graph embedding (KGE) models such as TransE and RotatE. Inspired by the synergy, we propose a new KGE model by leveraging all three operations in this work. Since translation, rotation, and scaling operations are cascaded to form a compound one, the new model is named CompoundE. By casting CompoundE in the framework of group theory, we show that quite a few scoring-function-based KGE models are special cases of CompoundE. CompoundE extends the simple distance-based relation to relation-dependent compound operations on head and/or tail entities. To demonstrate the effectiveness of CompoundE, we conduct experiments on three popular KG completion datasets. Experimental results show that CompoundE consistently achieves the state of-the-art performance. △ Less

Submitted 12 July, 2022; originally announced July 2022.

Comments: 16 pages

arXiv:2206.10029 [pdf, other]

SynWMD: Syntax-aware Word Mover's Distance for Sentence Similarity Evaluation

Authors: Chengwei Wei, Bin Wang, C. -C. Jay Kuo

Abstract: Word Mover's Distance (WMD) computes the distance between words and models text similarity with the moving cost between words in two text sequences. Yet, it does not offer good performance in sentence similarity evaluation since it does not incorporate word importance and fails to take inherent contextual and structural information in a sentence into account. An improved WMD method using the synta… ▽ More Word Mover's Distance (WMD) computes the distance between words and models text similarity with the moving cost between words in two text sequences. Yet, it does not offer good performance in sentence similarity evaluation since it does not incorporate word importance and fails to take inherent contextual and structural information in a sentence into account. An improved WMD method using the syntactic parse tree, called Syntax-aware Word Mover's Distance (SynWMD), is proposed to address these two shortcomings in this work. First, a weighted graph is built upon the word co-occurrence statistics extracted from the syntactic parse trees of sentences. The importance of each word is inferred from graph connectivities. Second, the local syntactic parsing structure of words is considered in computing the distance between words. To demonstrate the effectiveness of the proposed SynWMD, we conduct experiments on 6 textual semantic similarity (STS) datasets and 4 sentence classification datasets. Experimental results show that SynWMD achieves state-of-the-art performance on STS tasks. It also outperforms other WMD-based methods on sentence classification tasks. △ Less

Submitted 20 June, 2022; originally announced June 2022.

arXiv:2206.09061 [pdf, other]

Design of Supervision-Scalable Learning Systems: Methodology and Performance Benchmarking

Authors: Yijing Yang, Hongyu Fu, C. -C. Jay Kuo

Abstract: The design of robust learning systems that offer stable performance under a wide range of supervision degrees is investigated in this work. We choose the image classification problem as an illustrative example and focus on the design of modularized systems that consist of three learning modules: representation learning, feature learning and decision learning. We discuss ways to adjust each module… ▽ More The design of robust learning systems that offer stable performance under a wide range of supervision degrees is investigated in this work. We choose the image classification problem as an illustrative example and focus on the design of modularized systems that consist of three learning modules: representation learning, feature learning and decision learning. We discuss ways to adjust each module so that the design is robust with respect to different training sample numbers. Based on these ideas, we propose two families of learning systems. One adopts the classical histogram of oriented gradients (HOG) features while the other uses successive-subspace-learning (SSL) features. We test their performance against LeNet-5, which is an end-to-end optimized neural network, for MNIST and Fashion-MNIST datasets. The number of training samples per image class goes from the extremely weak supervision condition (i.e., 1 labeled sample per class) to the strong supervision condition (i.e., 4096 labeled sample per class) with gradual transition in between (i.e., $2^n$, $n=0, 1, \cdots, 12$). Experimental results show that the two families of modularized learning systems have more robust performance than LeNet-5. They both outperform LeNet-5 by a large margin for small $n$ and have performance comparable with that of LeNet-5 for large $n$. △ Less

Submitted 16 August, 2022; v1 submitted 17 June, 2022; originally announced June 2022.

Comments: 16 pages, 12 figures, 4 tables, under consideration at Pattern Recognition

arXiv:2206.00162 [pdf, other]

PAGER: Progressive Attribute-Guided Extendable Robust Image Generation

Authors: Zohreh Azizi, C. -C. Jay Kuo

Abstract: This work presents a generative modeling approach based on successive subspace learning (SSL). Unlike most generative models in the literature, our method does not utilize neural networks to analyze the underlying source distribution and synthesize images. The resulting method, called the progressive attribute-guided extendable robust image generative (PAGER) model, has advantages in mathematical… ▽ More This work presents a generative modeling approach based on successive subspace learning (SSL). Unlike most generative models in the literature, our method does not utilize neural networks to analyze the underlying source distribution and synthesize images. The resulting method, called the progressive attribute-guided extendable robust image generative (PAGER) model, has advantages in mathematical transparency, progressive content generation, lower training time, robust performance with fewer training samples, and extendibility to conditional image generation. PAGER consists of three modules: core generator, resolution enhancer, and quality booster. The core generator learns the distribution of low-resolution images and performs unconditional image generation. The resolution enhancer increases image resolution via conditional generation. Finally, the quality booster adds finer details to generated images. Extensive experiments on MNIST, Fashion-MNIST, and CelebA datasets are conducted to demonstrate generative performance of PAGER. △ Less

Submitted 22 August, 2022; v1 submitted 31 May, 2022; originally announced June 2022.

Comments: 19 pages, 12 figures, 2 tables

arXiv:2205.05296 [pdf, other]

Subspace Learning Machine (SLM): Methodology and Performance

Authors: Hongyu Fu, Yijing Yang, Vinod K. Mishra, C. -C. Jay Kuo

Abstract: Inspired by the feedforward multilayer perceptron (FF-MLP), decision tree (DT) and extreme learning machine (ELM), a new classification model, called the subspace learning machine (SLM), is proposed in this work. SLM first identifies a discriminant subspace, $S^0$, by examining the discriminant power of each input feature. Then, it uses probabilistic projections of features in $S^0$ to yield 1D su… ▽ More Inspired by the feedforward multilayer perceptron (FF-MLP), decision tree (DT) and extreme learning machine (ELM), a new classification model, called the subspace learning machine (SLM), is proposed in this work. SLM first identifies a discriminant subspace, $S^0$, by examining the discriminant power of each input feature. Then, it uses probabilistic projections of features in $S^0$ to yield 1D subspaces and finds the optimal partition for each of them. This is equivalent to partitioning $S^0$ with hyperplanes. A criterion is developed to choose the best $q$ partitions that yield $2q$ partitioned subspaces among them. We assign $S^0$ to the root node of a decision tree and the intersections of $2q$ subspaces to its child nodes of depth one. The partitioning process is recursively applied at each child node to build an SLM tree. When the samples at a child node are sufficiently pure, the partitioning process stops and each leaf node makes a prediction. The idea can be generalized to regression, leading to the subspace learning regressor (SLR). Furthermore, ensembles of SLM/SLR trees can yield a stronger predictor. Extensive experiments are conducted for performance benchmarking among SLM/SLR trees, ensembles and classical classifiers/regressors. △ Less

Submitted 11 May, 2022; originally announced May 2022.

arXiv:2205.00211 [pdf, other]

DefakeHop++: An Enhanced Lightweight Deepfake Detector

Authors: Hong-Shuo Chen, Shuowen Hu, Suya You, C. -C. Jay Kuo

Abstract: On the basis of DefakeHop, an enhanced lightweight Deepfake detector called DefakeHop++ is proposed in this work. The improvements lie in two areas. First, DefakeHop examines three facial regions (i.e., two eyes and mouth) while DefakeHop++ includes eight more landmarks for broader coverage. Second, for discriminant features selection, DefakeHop uses an unsupervised approach while DefakeHop++ adop… ▽ More On the basis of DefakeHop, an enhanced lightweight Deepfake detector called DefakeHop++ is proposed in this work. The improvements lie in two areas. First, DefakeHop examines three facial regions (i.e., two eyes and mouth) while DefakeHop++ includes eight more landmarks for broader coverage. Second, for discriminant features selection, DefakeHop uses an unsupervised approach while DefakeHop++ adopts a more effective approach with supervision, called the Discriminant Feature Test (DFT). In DefakeHop++, rich spatial and spectral features are first derived from facial regions and landmarks automatically. Then, DFT is used to select a subset of discriminant features for classifier training. As compared with MobileNet v3 (a lightweight CNN model of 1.5M parameters targeting at mobile applications), DefakeHop++ has a model of 238K parameters, which is 16% of MobileNet v3. Furthermore, DefakeHop++ outperforms MobileNet v3 in Deepfake image detection performance in a weakly-supervised setting. △ Less

Submitted 30 April, 2022; originally announced May 2022.

arXiv:2204.08646 [pdf, other]

Label Efficient Regularization and Propagation for Graph Node Classification

Authors: Tian Xie, Rajgopal Kannan, C. -C. Jay Kuo

Abstract: An enhanced label propagation (LP) method called GraphHop was proposed recently. It outperforms graph convolutional networks (GCNs) in the semi-supervised node classification task on various networks. Although the performance of GraphHop was explained intuitively with joint node attribute and label signal smoothening, its rigorous mathematical treatment is lacking. In this paper, we propose a labe… ▽ More An enhanced label propagation (LP) method called GraphHop was proposed recently. It outperforms graph convolutional networks (GCNs) in the semi-supervised node classification task on various networks. Although the performance of GraphHop was explained intuitively with joint node attribute and label signal smoothening, its rigorous mathematical treatment is lacking. In this paper, we propose a label efficient regularization and propagation (LERP) framework for graph node classification, and present an alternate optimization procedure for its solution. Furthermore, we show that GraphHop only offers an approximate solution to this framework and has two drawbacks. First, it includes all nodes in the classifier training without taking the reliability of pseudo-labeled nodes into account in the label update step. Second, it provides a rough approximation to the optimum of a subproblem in the label aggregation step. Based on the LERP framework, we propose a new method, named the LERP method, to solve these two shortcomings. LERP determines reliable pseudo-labels adaptively during the alternate optimization and provides a better approximation to the optimum with computational efficiency. Theoretical convergence of LERP is guaranteed. Extensive experiments are conducted to demonstrate the effectiveness and efficiency of LERP. That is, LERP outperforms all benchmarking methods, including GraphHop, consistently on five test datasets and an object recognition task at extremely low label rates (i.e., 1, 2, 4, 8, 16, and 20 labeled samples per class). △ Less

Submitted 30 October, 2022; v1 submitted 18 April, 2022; originally announced April 2022.

arXiv:2203.14887 [pdf, other]

HUNIS: High-Performance Unsupervised Nuclei Instance Segmentation

Authors: Vasileios Magoulianitis, Yijing Yang, C. -C. Jay Kuo

Abstract: A high-performance unsupervised nuclei instance segmentation (HUNIS) method is proposed in this work. HUNIS consists of two-stage block-wise operations. The first stage includes: 1) adaptive thresholding of pixel intensities, 2) incorporation of nuclei size/shape priors and 3) removal of false positive nuclei instances. Then, HUNIS conducts the second stage segmentation by receiving guidance from… ▽ More A high-performance unsupervised nuclei instance segmentation (HUNIS) method is proposed in this work. HUNIS consists of two-stage block-wise operations. The first stage includes: 1) adaptive thresholding of pixel intensities, 2) incorporation of nuclei size/shape priors and 3) removal of false positive nuclei instances. Then, HUNIS conducts the second stage segmentation by receiving guidance from the first one. The second stage exploits the segmentation masks obtained in the first stage and leverages color and shape distributions for a more accurate segmentation. The main purpose of the two-stage design is to provide pixel-wise pseudo-labels from the first to the second stage. This self-supervision mechanism is novel and effective. Experimental results on the MoNuSeg dataset show that HUNIS outperforms all other unsupervised methods by a substantial margin. It also has a competitive standing among state-of-the-art supervised methods. △ Less

Submitted 28 March, 2022; originally announced March 2022.

Comments: 8 pages, 3 figures, 3 tables

arXiv:2203.11924 [pdf, other]

On Supervised Feature Selection from High Dimensional Feature Spaces

Authors: Yijing Yang, Wei Wang, Hongyu Fu, C. -C. Jay Kuo

Abstract: The application of machine learning to image and video data often yields a high dimensional feature space. Effective feature selection techniques identify a discriminant feature subspace that lowers computational and modeling costs with little performance degradation. A novel supervised feature selection methodology is proposed for machine learning decisions in this work. The resulting tests are c… ▽ More The application of machine learning to image and video data often yields a high dimensional feature space. Effective feature selection techniques identify a discriminant feature subspace that lowers computational and modeling costs with little performance degradation. A novel supervised feature selection methodology is proposed for machine learning decisions in this work. The resulting tests are called the discriminant feature test (DFT) and the relevant feature test (RFT) for the classification and regression problems, respectively. The DFT and RFT procedures are described in detail. Furthermore, we compare the effectiveness of DFT and RFT with several classic feature selection methods. To this end, we use deep features obtained by LeNet-5 for MNIST and Fashion-MNIST datasets as illustrative examples. Other datasets with handcrafted and gene expressions features are also included for performance evaluation. It is shown by experimental results that DFT and RFT can select a lower dimensional feature subspace distinctly and robustly while maintaining high decision performance. △ Less

Submitted 19 June, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Comments: 14 pages, 9 figures, 9 tables, under consideration at APSIPA Transactions on Signal and Information Processing

arXiv:2203.02679 [pdf, other]

Just Rank: Rethinking Evaluation with Word and Sentence Similarities

Authors: Bin Wang, C. -C. Jay Kuo, Haizhou Li

Abstract: Word and sentence embeddings are useful feature representations in natural language processing. However, intrinsic evaluation for embeddings lags far behind, and there has been no significant update since the past decade. Word and sentence similarity tasks have become the de facto evaluation method. It leads models to overfit to such evaluations, negatively impacting embedding models' development.… ▽ More Word and sentence embeddings are useful feature representations in natural language processing. However, intrinsic evaluation for embeddings lags far behind, and there has been no significant update since the past decade. Word and sentence similarity tasks have become the de facto evaluation method. It leads models to overfit to such evaluations, negatively impacting embedding models' development. This paper first points out the problems using semantic similarity as the gold standard for word and sentence embedding evaluations. Further, we propose a new intrinsic evaluation method called EvalRank, which shows a much stronger correlation with downstream tasks. Extensive experiments are conducted based on 60+ models and popular datasets to certify our judgments. Finally, the practical evaluation toolkit is released for future benchmarking purposes. △ Less

Submitted 21 March, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

Comments: Accepted as Main Conference for ACL 2022. Code: https://github.com/BinWang28/EvalRank-Embedding-Evaluation

arXiv:2202.07843 [pdf, other]

PCRP: Unsupervised Point Cloud Object Retrieval and Pose Estimation

Authors: Pranav Kadam, Qingyang Zhou, Shan Liu, C. -C. Jay Kuo

Abstract: An unsupervised point cloud object retrieval and pose estimation method, called PCRP, is proposed in this work. It is assumed that there exists a gallery point cloud set that contains point cloud objects with given pose orientation information. PCRP attempts to register the unknown point cloud object with those in the gallery set so as to achieve content-based object retrieval and pose estimation… ▽ More An unsupervised point cloud object retrieval and pose estimation method, called PCRP, is proposed in this work. It is assumed that there exists a gallery point cloud set that contains point cloud objects with given pose orientation information. PCRP attempts to register the unknown point cloud object with those in the gallery set so as to achieve content-based object retrieval and pose estimation jointly, where the point cloud registration task is built upon an enhanced version of the unsupervised R-PointHop method. Experiments on the ModelNet40 dataset demonstrate the superior performance of PCRP in comparison with traditional and learning based methods. △ Less

Submitted 15 February, 2022; originally announced February 2022.

Comments: 8 pages, 3 figures

arXiv:2112.12284 [pdf, other]

A Survey on Perceptually Optimized Video Coding

Authors: Yun Zhang, Linwei Zhu, Gangyi Jiang, Sam Kwong, C. -C. Jay Kuo

Abstract: To provide users with more realistic visual experiences, videos are developing in the trends of Ultra High Definition (UHD), High Frame Rate (HFR), High Dynamic Range (HDR), Wide Color Gammut (WCG) and high clarity. However, the data amount of videos increases exponentially, which requires high efficiency video compression for storage and network transmission. Perceptually optimized video coding a… ▽ More To provide users with more realistic visual experiences, videos are developing in the trends of Ultra High Definition (UHD), High Frame Rate (HFR), High Dynamic Range (HDR), Wide Color Gammut (WCG) and high clarity. However, the data amount of videos increases exponentially, which requires high efficiency video compression for storage and network transmission. Perceptually optimized video coding aims to maximize compression efficiency by exploiting visual redundancies. In this paper, we present a broad and systematic survey on perceptually optimized video coding. Firstly, we present problem formulation and framework of the perceptually optimized video coding, which includes visual perception modelling, visual quality assessment and perceptual video coding optimization. Secondly, recent advances on visual factors, computational perceptual models and quality assessment models are presented. Thirdly, we review perceptual video coding optimizations from four key aspects, including perceptually optimized bit allocation, rate-distortion optimization, transform and quantization, filtering and enhancement. In each part, problem formulation, working flow, recent advances, advantages and challenges are presented. Fourthly, perceptual coding performances of the latest coding standards and tools are experimentally analyzed. Finally, challenging issues and future opportunities are identified. △ Less

Submitted 15 November, 2022; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: 36 pages, 12 figures, 6 tables, accepted by ACM Computing Surveys

arXiv:2112.10067 [pdf, other]

doi 10.1016/j.patrec.2022.03.024

CORE: A Knowledge Graph Entity Type Prediction Method via Complex Space Regression and Embedding

Authors: Xiou Ge, Yun-Cheng Wang, Bin Wang, C. -C. Jay Kuo

Abstract: Entity type prediction is an important problem in knowledge graph (KG) research. A new KG entity type prediction method, named CORE (COmplex space Regression and Embedding), is proposed in this work. The proposed CORE method leverages the expressive power of two complex space embedding models; namely, RotatE and ComplEx models. It embeds entities and types in two different complex spaces using eit… ▽ More Entity type prediction is an important problem in knowledge graph (KG) research. A new KG entity type prediction method, named CORE (COmplex space Regression and Embedding), is proposed in this work. The proposed CORE method leverages the expressive power of two complex space embedding models; namely, RotatE and ComplEx models. It embeds entities and types in two different complex spaces using either RotatE or ComplEx. Then, we derive a complex regression model to link these two spaces. Finally, a mechanism to optimize embedding and regression parameters jointly is introduced. Experiments show that CORE outperforms benchmarking methods on representative KG entity type inference datasets. Strengths and weaknesses of various entity type prediction methods are analyzed. △ Less

Submitted 19 December, 2021; originally announced December 2021.

Journal ref: Pattern Recognit. Lett. 157 (2022) 97-103

arXiv:2112.09340 [pdf, other]

doi 10.1016/j.patrec.2022.04.001

KGBoost: A Classification-based Knowledge Base Completion Method with Negative Sampling

Authors: Yun-Cheng Wang, Xiou Ge, Bin Wang, C. -C. Jay Kuo

Abstract: Knowledge base completion is formulated as a binary classification problem in this work, where an XGBoost binary classifier is trained for each relation using relevant links in knowledge graphs (KGs). The new method, named KGBoost, adopts a modularized design and attempts to find hard negative samples so as to train a powerful classifier for missing link prediction. We conduct experiments on multi… ▽ More Knowledge base completion is formulated as a binary classification problem in this work, where an XGBoost binary classifier is trained for each relation using relevant links in knowledge graphs (KGs). The new method, named KGBoost, adopts a modularized design and attempts to find hard negative samples so as to train a powerful classifier for missing link prediction. We conduct experiments on multiple benchmark datasets, and demonstrate that KGBoost outperforms state-of-the-art methods across most datasets. Furthermore, as compared with models trained by end-to-end optimization, KGBoost works well under the low-dimensional setting so as to allow a smaller model size. △ Less

Submitted 17 December, 2021; originally announced December 2021.

Journal ref: Pattern Recognition Letters, 2022

arXiv:2112.04054 [pdf, other]

GreenPCO: An Unsupervised Lightweight Point Cloud Odometry Method

Authors: Pranav Kadam, Min Zhang, Jiahao Gu, Shan Liu, C. -C. Jay Kuo

Abstract: Visual odometry aims to track the incremental motion of an object using the information captured by visual sensors. In this work, we study the point cloud odometry problem, where only the point cloud scans obtained by the LiDAR (Light Detection And Ranging) are used to estimate object's motion trajectory. A lightweight point cloud odometry solution is proposed and named the green point cloud odome… ▽ More Visual odometry aims to track the incremental motion of an object using the information captured by visual sensors. In this work, we study the point cloud odometry problem, where only the point cloud scans obtained by the LiDAR (Light Detection And Ranging) are used to estimate object's motion trajectory. A lightweight point cloud odometry solution is proposed and named the green point cloud odometry (GreenPCO) method. GreenPCO is an unsupervised learning method that predicts object motion by matching features of consecutive point cloud scans. It consists of three steps. First, a geometry-aware point sampling scheme is used to select discriminant points from the large point cloud. Second, the view is partitioned into four regions surrounding the object, and the PointHop++ method is used to extract point features. Third, point correspondences are established to estimate object motion between two consecutive scans. Experiments on the KITTI dataset are conducted to demonstrate the effectiveness of the GreenPCO method. It is observed that GreenPCO outperforms benchmarking deep learning methods in accuracy while it has a significantly smaller model size and less training time. △ Less

Submitted 17 July, 2022; v1 submitted 7 December, 2021; originally announced December 2021.

Comments: 10 pages, 5 figures

arXiv:2111.07548 [pdf, other]

Unsupervised Lightweight Single Object Tracking with UHP-SOT++

Authors: Zhiruo Zhou, Hongyu Fu, Suya You, C. -C. Jay Kuo

Abstract: An unsupervised, lightweight and high-performance single object tracker, called UHP-SOT, was proposed by Zhou et al. recently. As an extension, we present an enhanced version and name it UHP-SOT++ in this work. Built upon the foundation of the discriminative-correlation-filters-based (DCF-based) tracker, two new ingredients are introduced in UHP-SOT and UHP-SOT++: 1) background motion modeling and… ▽ More An unsupervised, lightweight and high-performance single object tracker, called UHP-SOT, was proposed by Zhou et al. recently. As an extension, we present an enhanced version and name it UHP-SOT++ in this work. Built upon the foundation of the discriminative-correlation-filters-based (DCF-based) tracker, two new ingredients are introduced in UHP-SOT and UHP-SOT++: 1) background motion modeling and 2) object box trajectory modeling. The main difference between UHP-SOT and UHP-SOT++ is the fusion strategy of proposals from three models (i.e., DCF, background motion and object box trajectory models). An improved fusion strategy is adopted by UHP-SOT++ for more robust tracking performance against large-scale tracking datasets. Our second contribution lies in an extensive evaluation of the performance of state-of-the-art supervised and unsupervised methods by testing them on four SOT benchmark datasets - OTB2015, TC128, UAV123 and LaSOT. Experiments show that UHP-SOT++ outperforms all previous unsupervised methods and several deep-learning (DL) methods in tracking accuracy. Since UHP-SOT++ has extremely small model size, high tracking performance, and low computational complexity (operating at a rate of 20 FPS on an i5 CPU even without code optimization), it is an ideal solution in real-time object tracking on resource-limited platforms. Based on the experimental results, we compare pros and cons of supervised and unsupervised trackers and provide a new perspective to understand the performance gap between supervised and unsupervised methods, which is the third contribution of this work. △ Less

Submitted 6 April, 2022; v1 submitted 15 November, 2021; originally announced November 2021.

Comments: updated content: comparison with state-of-the-art deep unsupervised methods

Showing 1–50 of 160 results for author: Kuo, C - J