
An Image Segmentation Model with Transformed Total Variation

Authors: Elisha Dayag, Kevin Bui, Fredrick Park, Jack Xin

Abstract: Based on transformed $\ell_1$ regularization, transformed total variation (TTV) achieves robust image recovery that is competitive with other nonconvex total variation (TV) regularizers, such as TV$^p$, $0<p<1$. Inspired by its performance, we propose a TTV-regularized Mumford--Shah model with a fuzzy membership function for image segmentation. To solve it, we design an alternating direction method of multipliers (ADMM) algorithm that utilizes the transformed $\ell_1$ proximal operator. Numerical experiments demonstrate that TTV is more effective than classical TV and other nonconvex TV variants for image segmentation.
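The transformed $\ell_1$ proximal operator mentioned above has a closed form. The sketch below assumes the standard transformed $\ell_1$ penalty $\rho_a(x) = (a+1)|x|/(a+|x|)$ with $a > 0$ and solves the scalar problem $\min_x \tfrac12 (x-y)^2 + \lambda \rho_a(x)$. The nonzero candidate is the largest root of the stationarity cubic in trigonometric form, and it is compared against $x = 0$ directly instead of using a precomputed threshold. The names `tl1_penalty` and `tl1_prox` are illustrative, not taken from the paper's code.

```python
import math


def tl1_penalty(x, a):
    """Transformed l1 penalty rho_a(x) = (a + 1)|x| / (a + |x|), with a > 0."""
    return (a + 1.0) * abs(x) / (a + abs(x))


def tl1_prox(y, lam, a):
    """Scalar prox: argmin_x 0.5*(x - y)**2 + lam * rho_a(x)."""

    def obj(x):
        return 0.5 * (x - y) ** 2 + lam * tl1_penalty(x, a)

    s = a + abs(y)
    # Argument of arccos from the trigonometric solution of the cubic
    # u^3 - s*u^2 + lam*a*(a + 1) = 0, where u = a + |x|.
    c = 1.0 - 27.0 * lam * a * (a + 1.0) / (2.0 * s ** 3)
    if c < -1.0:  # no positive stationary point, so the minimizer is 0
        return 0.0
    phi = math.acos(c)
    mag = (2.0 / 3.0) * s * math.cos(phi / 3.0) - 2.0 * a / 3.0 + abs(y) / 3.0
    if mag <= 0.0:
        return 0.0
    x = math.copysign(mag, y)
    # Keep the nonzero candidate only if it beats x = 0.
    return x if obj(x) < obj(0.0) else 0.0
```

For an anisotropic TTV term, an ADMM step of this kind would typically apply `tl1_prox` componentwise to the gradient variables; whether the paper's splitting uses exactly this form is not stated in the abstract.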

Submitted 4 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

Comments: Accepted to EUSIPCO'24

arXiv:2402.07243

PIVOT-Net: Heterogeneous Point-Voxel-Tree-based Framework for Point Cloud Compression

Authors: Jiahao Pang, Kevin Bui, Dong Tian

Abstract: The universality of the point cloud format enables many 3D applications, making point cloud compression a critical step in practice. Sampled as discrete 3D points, a point cloud approximates 2D surface(s) embedded in 3D with a finite bit-depth. However, the point distribution of a practical point cloud changes drastically as its bit-depth increases, requiring different methodologies for effective consumption/analysis. In this regard, we propose a heterogeneous point cloud compression (PCC) framework. We unify typical point cloud representations (point-based, voxel-based, and tree-based) and their associated backbones under a learning-based framework to compress an input point cloud at different bit-depth levels. Having recognized the importance of voxel-domain processing, we augment the framework with a proposed context-aware upsampling for decoding and an enhanced voxel transformer for feature aggregation. Extensive experiments demonstrate the state-of-the-art performance of our proposal on a wide range of point clouds.
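The claim that point distribution changes with bit-depth can be seen with plain voxelization. This toy sketch is not PIVOT-Net: it assumes coordinates normalized to $[0, 1)^3$, quantizes them onto a $2^b$ grid, and counts distinct occupied voxels. The sample data (points on the plane $z = 0.5$, a 2D surface in 3D) and the function names are hypothetical.

```python
import random


def quantize(points, bit_depth):
    """Map points in [0, 1)^3 to integer voxel coordinates on a 2^bit_depth grid,
    merging points that fall into the same voxel."""
    scale = 2 ** bit_depth
    return {tuple(min(int(c * scale), scale - 1) for c in p) for p in points}


def occupancy_ratio(points, bit_depth):
    """Fraction of input points that survive as distinct occupied voxels."""
    return len(quantize(points, bit_depth)) / len(points)


# Hypothetical data: 20,000 points sampled on the plane z = 0.5.
random.seed(0)
cloud = [(random.random(), random.random(), 0.5) for _ in range(20000)]
for b in (4, 6, 8, 12):
    print(b, len(quantize(cloud, b)), round(occupancy_ratio(cloud, b), 3))
```

At low bit-depth almost every voxel on the surface is occupied (a dense, image-like grid suited to voxel processing), while at high bit-depth nearly every point occupies its own voxel (a sparse set closer to a raw point representation).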

Submitted 11 February, 2024; originally announced February 2024.

Comments: Accepted at 3DV 2024

arXiv:2402.03292

Zero-shot Object-Level OOD Detection with Context-Aware Inpainting

Authors: Quang-Huy Nguyen, Jin Peng Zhou, Zhenzhen Liu, Khanh-Huyen Bui, Kilian Q. Weinberger, Dung D. Le

Abstract: Machine learning algorithms are increasingly provided as black-box cloud services or pre-trained models, without access to their training data. This motivates the problem of zero-shot out-of-distribution (OOD) detection. Concretely, we aim to detect OOD objects that do not belong to the classifier's label set but are erroneously classified as in-distribution (ID) objects. Our approach, RONIN, uses an off-the-shelf diffusion model to inpaint detected objects. RONIN conditions the inpainting process on the predicted ID label, drawing the input object closer to the in-distribution domain. As a result, the reconstructed object is very close to the original in ID cases and far from it in OOD cases, allowing RONIN to effectively distinguish ID and OOD samples. Through extensive experiments, we demonstrate that RONIN achieves competitive results compared to previous approaches across several datasets, in both zero-shot and non-zero-shot settings.
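The decision rule described above (reconstruction close to the original means ID, far means OOD) can be sketched as a distance score plus a threshold. This assumes each original crop and its label-conditioned inpainting have already been mapped to feature vectors by some image encoder; the encoder, the cosine-based score, the threshold, and the function names are all assumptions for illustration, not RONIN's exact scoring.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def ood_score(orig_embedding, inpainted_embedding):
    """Higher score means the reconstruction drifted further from the original."""
    return 1.0 - cosine_similarity(orig_embedding, inpainted_embedding)


def flag_ood(pairs, threshold):
    """pairs: (embedding of original object, embedding of its inpainting).
    Returns True for objects flagged as OOD."""
    return [ood_score(o, r) > threshold for o, r in pairs]
```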

Submitted 6 February, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

arXiv:2308.07601

VBD-MT Chinese-Vietnamese Translation Systems for VLSP 2022

Authors: Hai Long Trieu, Song Kiet Bui, Tan Minh Tran, Van Khanh Tran, Hai An Nguyen

Abstract: We present our systems that participated in the VLSP 2022 machine translation shared task. This year, we participated in both translation directions, i.e., Chinese-Vietnamese and Vietnamese-Chinese. We build our systems on the Transformer model with the powerful multilingual denoising pre-trained model mBART. The systems are enhanced by a sampling method for backtranslation, which leverages large-scale available monolingual data. Additionally, several other methods, including ensembling and postprocessing, are applied to improve translation quality. We achieve 38.9 BLEU on Chinese-Vietnamese and 38.0 BLEU on Vietnamese-Chinese on the public test sets, outperforming several strong baselines.
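The abstract does not specify its sampling method for backtranslation. A common reading is that synthetic source sentences are generated by sampling tokens from the reverse model's output distribution rather than taking the most likely token. The sketch below shows that single decoding step under this assumption; the logits are hypothetical model scores and the function names are illustrative.

```python
import math
import random


def softmax(logits, temperature=1.0):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, rng, temperature=1.0):
    """Draw one token id from the model distribution (sampling decoding)."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def greedy_token(logits):
    """Pick the highest-scoring token id (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

Sampling trades some per-sentence quality for diversity in the synthetic parallel data, which is the usual motivation for preferring it over greedy or beam outputs in backtranslation.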

Submitted 15 August, 2023; originally announced August 2023.

arXiv:2307.00684

A Proximal Algorithm for Network Slimming

Authors: Kevin Bui, Fanghui Xue, Fredrick Park, Yingyong Qi, Jack Xin

Abstract: As a popular channel pruning method for convolutional neural networks (CNNs), network slimming (NS) has a three-stage process: (1) it trains a CNN with $\ell_1$ regularization applied to the scaling factors of the batch normalization layers; (2) it removes channels whose scaling factors are below a chosen threshold; and (3) it retrains the pruned model to recover the original accuracy. This time-consuming, three-step process is a consequence of using subgradient descent to train CNNs. Because subgradient descent does not exactly train CNNs towards sparse, accurate structures, the latter two steps are necessary. Moreover, subgradient descent does not have any convergence guarantee. Therefore, we develop an alternative algorithm called proximal NS. Our proposed algorithm trains CNNs towards sparse, accurate structures, so identifying a scaling factor threshold is unnecessary and fine-tuning the pruned CNNs is optional. Using Kurdyka-Łojasiewicz assumptions, we establish global convergence of proximal NS. Lastly, we validate the efficacy of the proposed algorithm on VGGNet, DenseNet, and ResNet on CIFAR-10/100. Our experiments demonstrate that after one round of training, proximal NS yields a CNN with competitive accuracy and compression.
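The key difference from subgradient descent is that a proximal step sets small scaling factors exactly to zero. Below is a minimal sketch of one proximal-gradient update on the batch-normalization scaling factors $\gamma$, assuming the loss gradient with respect to $\gamma$ is already available; the proximal map of $\ell_1$ is soft-thresholding. The names `proximal_step`, `lr`, and `lam` are hypothetical, and the paper's full algorithm (including how the remaining weights are updated) may differ.

```python
def soft_threshold(x, t):
    """Proximal operator of t*|x|: shrink toward zero by t, clipping at zero."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def proximal_step(gammas, grads, lr, lam):
    """One update gamma <- prox_{lr*lam*|.|}(gamma - lr * dLoss/dgamma)."""
    return [soft_threshold(g - lr * dg, lr * lam) for g, dg in zip(gammas, grads)]
```

Because channels whose $\gamma$ lands exactly at zero contribute nothing after batch normalization, they can be removed without searching for a pruning threshold.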

Submitted 30 January, 2024; v1 submitted 2 July, 2023; originally announced July 2023.

Comments: accepted to LOD'23; fixed typo

arXiv:2307.00439

Weighted Anisotropic-Isotropic Total Variation for Poisson Denoising

Authors: Kevin Bui, Yifei Lou, Fredrick Park, Jack Xin

Abstract: Poisson noise commonly occurs in images captured by photon-limited imaging systems, such as in astronomy and medicine. Because the distribution of Poisson noise depends on the pixel intensity value, noise levels vary from pixel to pixel. Hence, denoising a Poisson-corrupted image while preserving important details can be challenging. In this paper, we propose a Poisson denoising model that incorporates the weighted anisotropic-isotropic total variation (AITV) as a regularization. We then develop an alternating direction method of multipliers (ADMM) algorithm combined with a proximal operator for an efficient implementation. Lastly, numerical experiments demonstrate that our algorithm outperforms other Poisson denoising methods in terms of image quality and computational efficiency.
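To make the model concrete, the sketch below evaluates an energy of the form $\sum (u - f\log u) + \lambda(\|\nabla u\|_1 - \alpha\|\nabla u\|_{2,1})$, i.e. the negative Poisson log-likelihood (up to a constant) plus AITV. It assumes forward differences with a replicate boundary and $0 \le \alpha \le 1$; the discretization, the weight $\lambda$, and the function names are assumptions, and it only evaluates the energy rather than minimizing it with ADMM.

```python
import math


def gradients(u):
    """Forward differences with replicate boundary; returns (ux, uy)."""
    h, w = len(u), len(u[0])
    ux = [[u[i][min(j + 1, w - 1)] - u[i][j] for j in range(w)] for i in range(h)]
    uy = [[u[min(i + 1, h - 1)][j] - u[i][j] for j in range(w)] for i in range(h)]
    return ux, uy


def aitv(u, alpha):
    """Weighted anisotropic-isotropic TV: ||grad u||_1 - alpha * ||grad u||_{2,1}."""
    ux, uy = gradients(u)
    total = 0.0
    for row_x, row_y in zip(ux, uy):
        for gx, gy in zip(row_x, row_y):
            total += abs(gx) + abs(gy) - alpha * math.hypot(gx, gy)
    return total


def poisson_fidelity(u, f):
    """Negative Poisson log-likelihood up to a constant: sum(u - f*log u), u > 0."""
    return sum(ui - fi * math.log(ui) for ru, rf in zip(u, f) for ui, fi in zip(ru, rf))


def energy(u, f, lam, alpha):
    return poisson_fidelity(u, f) + lam * aitv(u, alpha)
```

Per pixel, the fidelity term is minimized at $u = f$, and with $\alpha = 0$ the regularizer reduces to anisotropic TV.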

Submitted 1 July, 2023; originally announced July 2023.

Comments: accepted to ICIP 2023

arXiv:2301.03393

Difference of Anisotropic and Isotropic TV for Segmentation under Blur and Poisson Noise

Authors: Kevin Bui, Yifei Lou, Fredrick Park, Jack Xin

Abstract: In this paper, we aim to segment an image degraded by blur and Poisson noise. We adopt a smoothing-and-thresholding (SaT) segmentation framework that finds a piecewise-smooth solution, followed by $k$-means clustering to segment the image. Specifically, for the image smoothing step, we replace the least-squares fidelity for Gaussian noise in the Mumford-Shah model with a maximum a posteriori (MAP) term to deal with Poisson noise, and we incorporate the weighted difference of anisotropic and isotropic total variation (AITV) as a regularization to promote the sparsity of image gradients. For this nonconvex model, we develop a specific splitting scheme and use a proximal operator to apply the alternating direction method of multipliers (ADMM). A convergence analysis is provided to validate the efficacy of the ADMM scheme. Numerical experiments on various segmentation scenarios (grayscale/color and multiphase) show that our proposed method outperforms a number of segmentation methods, including the original SaT.
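The thresholding stage of SaT clusters the smoothed intensities into $k$ phases. Here is a minimal grayscale sketch using Lloyd's $k$-means on scalar pixel values with a deterministic, evenly spaced initialization. The initialization, iteration cap, and function names are assumptions; color images would cluster intensity vectors instead.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's algorithm on scalar intensities; returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic init: centers spread evenly over the intensity range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centers[c])
        if new == centers:  # assignments are stable
            break
        centers = new
    return centers, labels


def threshold_image(smoothed, k):
    """SaT stage 2: cluster pixels of a smoothed image into k phases."""
    flat = [v for row in smoothed for v in row]
    _, labels = kmeans_1d(flat, k)
    w = len(smoothed[0])
    return [labels[i * w:(i + 1) * w] for i in range(len(smoothed))]
```

Because the expensive work (smoothing) is done once, the number of phases $k$ can be changed by rerunning only this cheap clustering step.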

Submitted 16 June, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

Comments: Accepted to Frontiers in Computer Science: https://www.frontiersin.org/articles/10.3389/fcomp.2023.1131317/abstract; arXiv version has higher-resolution images, better for zooming in

arXiv:2202.10115

An Efficient Smoothing and Thresholding Image Segmentation Framework with Weighted Anisotropic-Isotropic Total Variation

Authors: Kevin Bui, Yifei Lou, Fredrick Park, Jack Xin

Abstract: In this paper, we design an efficient, multi-stage image segmentation framework that incorporates a weighted difference of anisotropic and isotropic total variation (AITV). The segmentation framework generally consists of two stages, smoothing and thresholding, hence the name SaT. In the first stage, a smoothed image is obtained by an AITV-regularized Mumford-Shah (MS) model, which can be solved efficiently by the alternating direction method of multipliers (ADMM) with a closed-form solution of the proximal operator of the $\ell_1 - \alpha\ell_2$ regularizer. Convergence of the ADMM algorithm is analyzed. In the second stage, we threshold the smoothed image by $K$-means clustering to obtain the final segmentation result. Numerical experiments demonstrate that the proposed segmentation framework is versatile for both grayscale and color images, efficient in producing high-quality segmentation results within a few seconds, and robust to input images that are corrupted with noise, blur, or both. We compare the AITV method with its original convex TV and nonconvex TV$^p$ ($0<p<1$) counterparts, showcasing the qualitative and quantitative advantages of our proposed method.
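The closed-form proximal operator of $\ell_1 - \alpha\ell_2$ mentioned above is short enough to write out. This sketch solves $\min_x \tfrac12\|x - y\|_2^2 + \lambda(\|x\|_1 - \alpha\|x\|_2)$ for $0 \le \alpha \le 1$ using the known three-case formula: soft-threshold and rescale when $\|y\|_\infty > \lambda$; return a 1-sparse vector when $(1-\alpha)\lambda < \|y\|_\infty \le \lambda$; otherwise return zero. In an AITV ADMM step it would typically be applied per pixel to the 2-vector of gradient components; the function names are illustrative.

```python
import math


def prox_l1_minus_alpha_l2(y, lam, alpha):
    """argmin_x 0.5*||x - y||^2 + lam*(||x||_1 - alpha*||x||_2), 0 <= alpha <= 1."""
    ymax = max(abs(v) for v in y)
    if ymax > lam:
        # Soft-threshold, then scale the result up along its own direction.
        z = [math.copysign(max(abs(v) - lam, 0.0), v) for v in y]
        nz = math.sqrt(sum(v * v for v in z))
        return [v * (nz + alpha * lam) / nz for v in z]
    if ymax > (1.0 - alpha) * lam:
        # 1-sparse solution at the largest-magnitude entry.
        i = max(range(len(y)), key=lambda j: abs(y[j]))
        x = [0.0] * len(y)
        x[i] = math.copysign(ymax + (alpha - 1.0) * lam, y[i])
        return x
    return [0.0] * len(y)


def objective(x, y, lam, alpha):
    """The prox objective, useful for checking the closed form numerically."""
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) + lam * (l1 - alpha * l2)
```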

Submitted 15 November, 2023; v1 submitted 21 February, 2022; originally announced February 2022.

Comments: final version sent to Springer CAMC

arXiv:2010.01242

Improving Network Slimming with Nonconvex Regularization

Authors: Kevin Bui, Fredrick Park, Shuai Zhang, Yingyong Qi, Jack Xin

Abstract: Convolutional neural networks (CNNs) have become powerful models for various computer vision tasks ranging from object detection to semantic segmentation. However, most state-of-the-art CNNs cannot be deployed directly on edge devices such as smartphones and drones, which need low latency under limited power and memory bandwidth. One popular, straightforward approach to compressing CNNs is network slimming, which imposes $\ell_1$ regularization on the channel-associated scaling factors via the batch normalization layers during training. Network slimming thereby identifies insignificant channels that can be pruned for inference. In this paper, we propose replacing the $\ell_1$ penalty with an alternative nonconvex, sparsity-inducing penalty in order to yield a more compressed and/or accurate CNN architecture. We investigate $\ell_p$ ($0 < p < 1$), transformed $\ell_1$ (T$\ell_1$), minimax concave penalty (MCP), and smoothly clipped absolute deviation (SCAD) due to their recent successes and popularity in solving sparse optimization problems, such as compressed sensing and variable selection. We demonstrate the effectiveness of network slimming with nonconvex penalties on three neural network architectures (VGG-19, DenseNet-40, and ResNet-164) on standard image classification datasets. Based on the numerical experiments, T$\ell_1$ preserves model accuracy against channel pruning; $\ell_{1/2}$ and $\ell_{3/4}$ yield better-compressed models with accuracies similar to $\ell_1$ after retraining; and MCP and SCAD provide more accurate models after retraining with compression similar to $\ell_1$. Network slimming with T$\ell_1$ regularization also outperforms the latest Bayesian modification of network slimming in compressing a CNN architecture in terms of memory storage while preserving its model accuracy after channel pruning.
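For reference, the four nonconvex penalties compared above can be written as scalar functions. The sketch uses their standard textbook definitions; the parameter names (`p`, `a`, `lam`, `gamma`) and the SCAD default $a = 3.7$ are conventional choices, not values taken from the paper.

```python
def lp(x, p):
    """l_p penalty |x|^p, 0 < p < 1."""
    return abs(x) ** p


def tl1(x, a):
    """Transformed l1: (a + 1)|x| / (a + |x|), a > 0."""
    return (a + 1.0) * abs(x) / (a + abs(x))


def mcp(x, lam, gamma):
    """Minimax concave penalty, gamma > 0."""
    t = abs(x)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return gamma * lam * lam / 2.0


def scad(x, lam, a=3.7):
    """Smoothly clipped absolute deviation, a > 2."""
    t = abs(x)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0
```

In network slimming, one of these would be summed over all batch-normalization scaling factors and added to the training loss in place of the $\ell_1$ term. MCP and SCAD become constant for large $|x|$, so large scaling factors are not shrunk further, while T$\ell_1$ approaches $|x|$ as $a \to \infty$.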

Submitted 18 August, 2021; v1 submitted 2 October, 2020; originally announced October 2020.

Comments: version 1 published in ISVC'20; version 2: fixed typo; version 3 is the extended version and was submitted to a journal; version 4: more typos fixed; official version will be in IEEE Access

arXiv:2005.04401

A Weighted Difference of Anisotropic and Isotropic Total Variation for Relaxed Mumford-Shah Color and Multiphase Image Segmentation

Authors: Kevin Bui, Fredrick Park, Yifei Lou, Jack Xin

Abstract: In a class of piecewise-constant image segmentation models, we propose to incorporate a weighted difference of anisotropic and isotropic total variation (AITV) to regularize the partition boundaries in an image. In particular, we replace the total variation regularization in the Chan-Vese segmentation model and in a fuzzy region competition model with the proposed AITV. To deal with the nonconvex nature of AITV, we apply the difference-of-convex algorithm (DCA), in which the subproblems can be minimized by the primal-dual hybrid gradient method with linesearch. The convergence of the DCA scheme is analyzed. In addition, a generalization to color image segmentation is discussed. In the numerical experiments, we compare the proposed models with the classic convex approaches and the two-stage segmentation methods (smoothing and then thresholding) on various images, showing that our models are effective in image segmentation and robust to impulsive noise.
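The DCA idea can be shown on a toy problem. This sketch applies DCA to $\min_x \tfrac12\|x-y\|^2 + \lambda(\|x\|_1 - \alpha\|x\|_2)$ with no gradient operator. Here the convex part is $\tfrac12\|x-y\|^2 + \lambda\|x\|_1$, and the concave part $-\lambda\alpha\|x\|_2$ is linearized at the current iterate. Because there is no gradient operator, each convex subproblem is a single soft-thresholding. In the paper's segmentation models, the subproblems instead involve $\nabla$ and are solved by primal-dual hybrid gradient, so this illustrates only the outer DCA loop. All names are hypothetical.

```python
import math


def soft(v, t):
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]


def objective(x, y, lam, alpha):
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    return 0.5 * sum((a - b) ** 2 for a, b in zip(x, y)) + lam * (l1 - alpha * l2)


def dca(y, lam, alpha, iters=20):
    """DCA for the l1 - alpha*l2 denoising toy problem; returns (x, objective history)."""
    x = list(y)  # start from the data
    history = [objective(x, y, lam, alpha)]
    for _ in range(iters):
        nrm = math.sqrt(sum(v * v for v in x))
        # Subgradient of ||x||_2 at the current iterate (0 is valid at the origin).
        g = [v / nrm for v in x] if nrm > 0 else [0.0] * len(x)
        # Convex subproblem: argmin 0.5||x - y||^2 + lam||x||_1 - lam*alpha*<g, x>.
        x = soft([yi + lam * alpha * gi for yi, gi in zip(y, g)], lam)
        history.append(objective(x, y, lam, alpha))
    return x, history
```

A defining property of DCA is that the objective never increases from one iterate to the next, which is what the returned history lets you check.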

Submitted 17 July, 2021; v1 submitted 9 May, 2020; originally announced May 2020.

Comments: latest version has typos fixed; clean, official version will be in SIAM Journal on Imaging Sciences

arXiv:1912.07868

$\ell_0$ Regularized Structured Sparsity Convolutional Neural Networks

Authors: Kevin Bui, Fredrick Park, Shuai Zhang, Yingyong Qi, Jack Xin

Abstract: Deepening and widening convolutional neural networks (CNNs) significantly increases the number of trainable weight parameters by adding more convolutional layers and more feature maps per layer, respectively. By imposing inter- and intra-group sparsity on the weights of the layers during training, a compressed network can be obtained with accuracy comparable to a dense one. In this paper, we propose a new variant of sparse group lasso that blends the $\ell_0$ norm on the individual weight parameters with the $\ell_{2,1}$ norm on the output channels of a layer. To address the non-differentiability of the $\ell_0$ norm, we apply variable splitting, resulting in an algorithm that executes stochastic gradient descent followed by hard thresholding at each iteration. Numerical experiments on LeNet-5 for MNIST and on wide residual networks for CIFAR-10/100 showcase the effectiveness of our proposed method in attaining superior test accuracy with network sparsification on par with the current state of the art.
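The hard-thresholding step follows from the proximal map of the $\ell_0$ penalty. With a splitting term $\tfrac{\beta}{2}(v - w)^2$, keeping a weight costs $\lambda$ while zeroing it costs $\tfrac{\beta}{2}w^2$, so a weight survives exactly when $|w| > \sqrt{2\lambda/\beta}$. The sketch pairs this with the group-shrinkage proximal map of the $\ell_{2,1}$ norm, which zeroes whole output channels. The parameters `lam`, `beta`, and `t`, and the grouping of weights into per-channel lists, are assumptions for illustration; the abstract does not give the exact update order.

```python
import math


def hard_threshold(w, lam, beta):
    """Prox of (lam/beta)*||.||_0: keep w_i iff |w_i| > sqrt(2*lam/beta)."""
    t = math.sqrt(2.0 * lam / beta)
    return [v if abs(v) > t else 0.0 for v in w]


def group_shrink(groups, t):
    """Prox of t * sum_g ||w_g||_2 (the l_{2,1} norm): shrink each group's norm by t."""
    out = []
    for g in groups:
        nrm = math.sqrt(sum(v * v for v in g))
        scale = max(1.0 - t / nrm, 0.0) if nrm > 0 else 0.0
        out.append([scale * v for v in g])
    return out
```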

Submitted 17 December, 2019; originally announced December 2019.

Showing 1–11 of 11 results for author: Bui, K