arXiv:2012.06149 [pdf, other]

Superpixel Segmentation Based on Spatially Constrained Subspace Clustering

Authors: Hua Li, Yuheng Jia, Runmin Cong, Wenhui Wu, Sam Kwong, Chuanbo Chen

Abstract: Superpixel segmentation aims at dividing the input image into some representative regions containing pixels with similar and consistent intrinsic properties, without any prior knowledge about the shape and size of each superpixel. In this paper, to alleviate the limitation of superpixel segmentation applied in practical industrial tasks that detailed boundaries are difficult to be kept, we regard each representative region with independent semantic information as a subspace, and correspondingly formulate superpixel segmentation as a subspace clustering problem to preserve more detailed content boundaries. We show that a simple integration of superpixel segmentation with the conventional subspace clustering does not effectively work due to the spatial correlation of the pixels within a superpixel, which may lead to boundary confusion and segmentation error when the correlation is ignored. Consequently, we devise a spatial regularization and propose a novel convex locality-constrained subspace clustering model that is able to constrain the spatial adjacent pixels with similar attributes to be clustered into a superpixel and generate the content-aware superpixels with more detailed boundaries. Finally, the proposed model is solved by an efficient alternating direction method of multipliers (ADMM) solver. Experiments on different standard datasets demonstrate that the proposed method achieves superior performance both quantitatively and qualitatively compared with some state-of-the-art methods.

Submitted 11 December, 2020; originally announced December 2020.

Comments: Accepted by IEEE Transactions on Industrial Informatics, 2020

arXiv:2011.13144 [pdf, other]

doi 10.1109/TIP.2020.3042084

Dense Attention Fluid Network for Salient Object Detection in Optical Remote Sensing Images

Authors: Qijian Zhang, Runmin Cong, Chongyi Li, Ming-Ming Cheng, Yuming Fang, Xiaochun Cao, Yao Zhao, Sam Kwong

Abstract: Despite the remarkable advances in visual saliency analysis for natural scene images (NSIs), salient object detection (SOD) for optical remote sensing images (RSIs) still remains an open and challenging problem. In this paper, we propose an end-to-end Dense Attention Fluid Network (DAFNet) for SOD in optical RSIs. A Global Context-aware Attention (GCA) module is proposed to adaptively capture long-range semantic context relationships, and is further embedded in a Dense Attention Fluid (DAF) structure that enables shallow attention cues flow into deep layers to guide the generation of high-level feature attention maps. Specifically, the GCA module is composed of two key components, where the global feature aggregation module achieves mutual reinforcement of salient feature embeddings from any two spatial locations, and the cascaded pyramid attention module tackles the scale variation issue by building up a cascaded pyramid framework to progressively refine the attention map in a coarse-to-fine manner. In addition, we construct a new and challenging optical RSI dataset for SOD that contains 2,000 images with pixel-wise saliency annotations, which is currently the largest publicly available benchmark. Extensive experiments demonstrate that our proposed DAFNet significantly outperforms the existing state-of-the-art SOD competitors. https://github.com/rmcong/DAFNet_TIP20

Submitted 26 November, 2020; originally announced November 2020.

Comments: Accepted by IEEE Transactions on Image Processing, EORSSD dataset: https://github.com/rmcong/EORSSD-dataset

arXiv:2011.12745 [pdf, other]

Deep Magnification-Flexible Upsampling over 3D Point Clouds

Authors: Yue Qian, Junhui Hou, Sam Kwong, Ying He

Abstract: This paper addresses the problem of generating dense point clouds from given sparse point clouds to model the underlying geometric structures of objects/scenes. To tackle this challenging issue, we propose a novel end-to-end learning-based framework. Specifically, by taking advantage of the linear approximation theorem, we first formulate the problem explicitly, which boils down to determining the interpolation weights and high-order approximation errors. Then, we design a lightweight neural network to adaptively learn unified and sorted interpolation weights as well as the high-order refinements, by analyzing the local geometry of the input point cloud. The proposed method can be interpreted by the explicit formulation, and thus is more memory-efficient than existing ones. In sharp contrast to the existing methods that work only for a pre-defined and fixed upsampling factor, the proposed framework only requires a single neural network with one-time training to handle various upsampling factors within a typical range, which is highly desired in real-world applications. In addition, we propose a simple yet effective training strategy to drive such a flexible ability. In addition, our method can handle non-uniformly distributed and noisy data well. Extensive experiments on both synthetic and real-world data demonstrate the superiority of the proposed method over state-of-the-art methods both quantitatively and qualitatively.

Submitted 29 March, 2022; v1 submitted 25 November, 2020; originally announced November 2020.

Comments: 15 pages, 16 figures, 6 tables. This paper has been published in IEEE TIP

arXiv:2010.06313 [pdf, other]

Controllable Pareto Multi-Task Learning

Authors: Xi Lin, Zhiyuan Yang, Qingfu Zhang, Sam Kwong

Abstract: A multi-task learning (MTL) system aims at solving multiple related tasks at the same time. With a fixed model capacity, the tasks would be conflicted with each other, and the system usually has to make a trade-off among learning all of them together. For many real-world applications where the trade-off has to be made online, multiple models with different preferences over tasks have to be trained and stored. This work proposes a novel controllable Pareto multi-task learning framework, to enable the system to make real-time trade-off control among different tasks with a single model. To be specific, we formulate the MTL as a preference-conditioned multiobjective optimization problem, with a parametric mapping from preferences to the corresponding trade-off solutions. A single hypernetwork-based multi-task neural network is built to learn all tasks with different trade-off preferences among them, where the hypernetwork generates the model parameters conditioned on the preference. For inference, MTL practitioners can easily control the model performance based on different trade-off preferences in real-time. Experiments on different applications demonstrate that the proposed model is efficient for solving various MTL problems.

Submitted 14 February, 2021; v1 submitted 13 October, 2020; originally announced October 2020.
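
The abstract above describes a hypernetwork that generates model parameters conditioned on a trade-off preference. The following is only a toy sketch of that idea, not the authors' architecture: the linear hypernetwork, the one-layer main model, the weighted-sum scalarization, and all names (`hypernetwork`, `main_network`, `scalarized_loss`) are assumptions made for illustration.

```python
# Toy illustration of preference-conditioned parameters (all names hypothetical).

def hypernetwork(preference, W, b):
    """Map a preference vector to main-network parameters via a linear map.

    Assumption: a real hypernetwork would be a trained neural network;
    here it is a fixed affine function W @ preference + b.
    """
    return [sum(w * p for w, p in zip(row, preference)) + bi
            for row, bi in zip(W, b)]


def main_network(params, x):
    """A one-layer linear model whose weights come from the hypernetwork."""
    return sum(p * xi for p, xi in zip(params, x))


def scalarized_loss(losses, preference):
    """Weighted-sum scalarization of per-task losses.

    Assumption: the weighted sum is one common scalarization, used here
    only to show how a preference can steer the trade-off.
    """
    return sum(p * l for p, l in zip(preference, losses))


# Changing the preference at inference time changes the generated
# parameters without retraining or storing a second model.
W = [[2.0, 0.0], [0.0, 3.0]]
b = [0.5, 0.5]
params_task1 = hypernetwork([1.0, 0.0], W, b)
params_task2 = hypernetwork([0.0, 1.0], W, b)
```

In this sketch a single pair `(W, b)` stands in for the one stored model, and each preference vector selects a different point on the trade-off.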

arXiv:2010.05713 [pdf, other]

Unsupervised Image-to-Image Translation via Pre-trained StyleGAN2 Network

Authors: Jialu Huang, Jing Liao, Sam Kwong

Abstract: Image-to-Image (I2I) translation is a heated topic in academia, and it also has been applied in real-world industry for tasks like image synthesis, super-resolution, and colorization. However, traditional I2I translation methods train data in two or more domains together. This requires lots of computation resources. Moreover, the results are of lower quality, and they contain many more artifacts. The training process could be unstable when the data in different domains are not balanced, and modal collapse is more likely to happen. We proposed a new I2I translation method that generates a new model in the target domain via a series of model transformations on a pre-trained StyleGAN2 model in the source domain. After that, we proposed an inversion method to achieve the conversion between an image and its latent vector. By feeding the latent vector into the generated model, we can perform I2I translation between the source domain and target domain. Both qualitative and quantitative evaluations were conducted to prove that the proposed method can achieve outstanding performance in terms of image quality, diversity and semantic similarity to the input and reference images compared to state-of-the-art works.

Submitted 26 October, 2020; v1 submitted 12 October, 2020; originally announced October 2020.

Comments: 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2009.12537 [pdf, other]

Deep Selective Combinatorial Embedding and Consistency Regularization for Light Field Super-resolution

Authors: Jing Jin, Junhui Hou, Zhiyu Zhu, Jie Chen, Sam Kwong

Abstract: Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution as the limited detector resolution has to be shared with the angular dimension. LF spatial super-resolution (SR) thus becomes an indispensable part of the LF camera processing pipeline. The high-dimensionality characteristic and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR. The performance of existing methods is still limited as they fail to thoroughly explore the coherence among LF sub-aperture images (SAIs) and are insufficient in accurately preserving the scene's parallax structure. To tackle this challenge, we propose a novel learning-based LF spatial SR framework. Specifically, each SAI of an LF image is first coarsely and individually super-resolved by exploring the complementary information among SAIs with selective combinatorial geometry embedding. To achieve efficient and effective selection of the complementary information, we propose two novel sub-modules conducted hierarchically: the patch selector provides an option of retrieving similar image patches based on offline disparity estimation to handle large-disparity correlations; and the SAI selector adaptively and flexibly selects the most informative SAIs to improve the embedding efficiency. To preserve the parallax structure among the reconstructed SAIs, we subsequently append a consistency regularization network trained over a structure-aware loss function to refine the parallax relationships over the coarse estimation. In addition, we extend the proposed method to irregular LF data. To the best of our knowledge, this is the first learning-based SR method for irregular LF data. Experimental results over both synthetic and real-world LF datasets demonstrate the significant advantage of our approach over state-of-the-art methods.

Submitted 6 October, 2021; v1 submitted 26 September, 2020; originally announced September 2020.

Comments: 14 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2004.02215

arXiv:2008.11455 [pdf, other]

High Efficiency Rate Control for Versatile Video Coding Based on Composite Cauchy Distribution

Authors: Yunhao Mao, Meng Wang, Shiqi Wang, Sam Kwong

Abstract: In this work, we propose a novel rate control algorithm for Versatile Video Coding (VVC) standard based on its distinct rate-distortion characteristics. By modelling the transform coefficients with the composite Cauchy distribution, higher accuracy compared with traditional distributions has been achieved. Based on the transform coefficient modelling, the theoretically derived R-Q and D-Q models which have been shown to deliver higher accuracy in characterizing RD characteristics for sequences with different content are incorporated into the rate control process. Furthermore, to establish an adaptive bit allocation scheme, the dependency between different levels of frames is modelled by a dependency factor to describe relationship between the reference and to-be-coded frames. Given the derived R-Q and D-Q relationships, as well as the dependency factor, an adaptive bit allocation scheme is developed for optimal bits allocation. We implement the proposed algorithm on VVC Test Model (VTM) 3.0. Experiments show that due to proper bit allocation, for low delay configuration the proposed algorithm can achieve 1.03% BD-Rate saving compared with the default rate control algorithm and 2.96% BD-Rate saving compared with fixed QP scheme. Moreover, 1.29% BD-Rate saving and higher control accuracy have also been observed under the random access configuration.

Submitted 26 August, 2020; v1 submitted 26 August, 2020; originally announced August 2020.

arXiv:2008.11420 [pdf, other]

doi 10.1109/TIP.2021.3051460

Low Complexity Trellis-Coded Quantization in Versatile Video Coding

Authors: Meng Wang, Shiqi Wang, Junru Li, Li Zhang, Yue Wang, Siwei Ma, Sam Kwong

Abstract: The forthcoming Versatile Video Coding (VVC) standard adopts the trellis-coded quantization, which leverages the delicate trellis graph to map the quantization candidates within one block into the optimal path. Despite the high compression efficiency, the complex trellis search with soft decision quantization may hinder the applications due to high complexity and low throughput capacity. To reduce the complexity, in this paper, we propose a low complexity trellis-coded quantization scheme in a scientifically sound way with theoretical modeling of the rate and distortion. As such, the trellis departure point can be adaptively adjusted, and unnecessarily visited branches are accordingly pruned, leading to the shrink of total trellis stages and simplification of transition branches. Extensive experimental results on the VVC test model show that the proposed scheme is effective in reducing the encoding complexity by 11% and 5% with all intra and random access configurations, respectively, at the cost of only 0.11% and 0.05% BD-Rate increase. Meanwhile, on average 24% and 27% quantization time savings can be achieved under all intra and random access configurations. Due to the excellent performance, the VVC test model has adopted one implementation of the proposed scheme.

Submitted 26 August, 2020; originally announced August 2020.

arXiv:2008.05642 [pdf, other]

Towards Modality Transferable Visual Information Representation with Optimal Model Compression

Authors: Rongqun Lin, Linwei Zhu, Shiqi Wang, Sam Kwong

Abstract: Compactly representing the visual signals is of fundamental importance in various image/video-centered applications. Although numerous approaches were developed for improving the image and video coding performance by removing the redundancies within visual signals, much less work has been dedicated to the transformation of the visual signals to another well-established modality for better representation capability. In this paper, we propose a new scheme for visual signal representation that leverages the philosophy of transferable modality. In particular, the deep learning model, which characterizes and absorbs the statistics of the input scene with online training, could be efficiently represented in the sense of rate-utility optimization to serve as the enhancement layer in the bitstream. As such, the overall performance can be further guaranteed by optimizing the new modality incorporated. The proposed framework is implemented on the state-of-the-art video coding standard (i.e., versatile video coding), and significantly better representation capability has been observed based on extensive evaluations.

Submitted 12 August, 2020; originally announced August 2020.

Comments: Accepted in ACM Multimedia 2020

arXiv:2008.02501 [pdf, other]

doi 10.1109/TCSVT.2021.3101484

Subjective Quality Database and Objective Study of Compressed Point Clouds With 6DoF Head-Mounted Display

Authors: Xinju Wu, Yun Zhang, Chunling Fan, Junhui Hou, Sam Kwong

Abstract: In this paper, we focus on subjective and objective Point Cloud Quality Assessment (PCQA) in an immersive environment and study the effect of geometry and texture attributes in compression distortion. Using a Head-Mounted Display (HMD) with six degrees of freedom, we establish a subjective PCQA database, named SIAT Point Cloud Quality Database (SIAT-PCQD). Our database consists of 340 distorted point clouds compressed by the MPEG point cloud encoder with the combination of 20 sequences and 17 pairs of geometry and texture quantization parameters. The impact of distorted geometry and texture attributes is further discussed in this paper. Then, we propose two projection-based objective quality evaluation methods, i.e., a weighted view projection based model and a patch projection based model. Our subjective database and findings can be used in point cloud processing, transmission, and coding, especially for virtual reality applications. The subjective dataset has been released in the public repository.

Submitted 3 August, 2021; v1 submitted 6 August, 2020; originally announced August 2020.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2006.10649 [pdf, other]

Multi-Density Sketch-to-Image Translation Network

Authors: Jialu Huang, Jing Liao, Zhifeng Tan, Sam Kwong

Abstract: Sketch-to-image (S2I) translation plays an important role in image synthesis and manipulation tasks, such as photo editing and colorization. Some specific S2I translation including sketch-to-photo and sketch-to-painting can be used as powerful tools in the art design industry. However, previous methods only support S2I translation with a single level of density, which gives less flexibility to users for controlling the input sketches. In this work, we propose the first multi-level density sketch-to-image translation framework, which allows the input sketch to cover a wide range from rough object outlines to micro structures. Moreover, to tackle the problem of noncontinuous representation of multi-level density input sketches, we project the density level into a continuous latent space, which can then be linearly controlled by a parameter. This allows users to conveniently control the densities of input sketches and generation of images. Moreover, our method has been successfully verified on various datasets for different applications including face editing, multi-modal sketch-to-photo translation, and anime colorization, providing coarse-to-fine levels of controls to these applications.

Submitted 18 June, 2020; originally announced June 2020.

Comments: 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:2005.00383 [pdf, other]

MOPS-Net: A Matrix Optimization-driven Network for Task-Oriented 3D Point Cloud Downsampling

Authors: Yue Qian, Junhui Hou, Qijian Zhang, Yiming Zeng, Sam Kwong, Ying He

Abstract: This paper explores the problem of task-oriented downsampling over 3D point clouds, which aims to downsample a point cloud while maintaining the performance of subsequent applications applied to the downsampled sparse points as much as possible. Designing from the perspective of matrix optimization, we propose MOPS-Net, a novel interpretable deep learning-based method, which is fundamentally different from the existing deep learning-based methods due to its interpretable feature. The optimization problem is challenging due to its discrete and combinatorial nature. We tackle the challenges by relaxing the binary constraint of the variables, and formulate a constrained and differentiable matrix optimization problem. We then design a deep neural network to mimic the matrix optimization by exploring both the local and global structures of the input data. MOPS-Net can be end-to-end trained with a task network and is permutation-invariant, making it robust to the input. We also extend MOPS-Net such that a single network after one-time training is capable of handling arbitrary downsampling ratios. Extensive experimental results show that MOPS-Net can achieve favorable performance against state-of-the-art deep learning-based methods over various tasks, including classification, reconstruction, and registration. Besides, we validate the robustness of MOPS-Net on noisy data.

Submitted 12 April, 2021; v1 submitted 1 May, 2020; originally announced May 2020.

Comments: 15 pages, 16 figures, 10 tables

arXiv:2004.14705 [pdf, other]

Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation

Authors: Yuheng Jia, Hui Liu, Junhui Hou, Sam Kwong, Qingfu Zhang

Abstract: This paper explores the problem of multi-view spectral clustering (MVSC) based on tensor low-rank modeling. Unlike the existing methods that all adopt an off-the-shelf tensor low-rank norm without considering the special characteristics of the tensor in MVSC, we design a novel structured tensor low-rank norm tailored to MVSC. Specifically, we explicitly impose a symmetric low-rank constraint and a structured sparse low-rank constraint on the frontal and horizontal slices of the tensor to characterize the intra-view and inter-view relationships, respectively. Moreover, the two constraints could be jointly optimized to achieve mutual refinement. On the basis of the novel tensor low-rank norm, we formulate MVSC as a convex low-rank tensor recovery problem, which is then efficiently solved with an augmented Lagrange multiplier based method iteratively. Extensive experimental results on five benchmark datasets show that the proposed method outperforms state-of-the-art methods to a significant extent. Impressively, our method is able to produce perfect clustering. In addition, the parameters of our method can be easily tuned, and the proposed model is robust to different datasets, demonstrating its potential in practice.

Submitted 1 August, 2020; v1 submitted 30 April, 2020; originally announced April 2020.

arXiv:2004.02215 [pdf, other]

Light Field Spatial Super-resolution via Deep Combinatorial Geometry Embedding and Structural Consistency Regularization

Authors: Jing Jin, Junhui Hou, Jie Chen, Sam Kwong

Abstract: Light field (LF) images acquired by hand-held devices usually suffer from low spatial resolution as the limited sampling resources have to be shared with the angular dimension. LF spatial super-resolution (SR) thus becomes an indispensable part of the LF camera processing pipeline. The high-dimensionality characteristic and complex geometrical structure of LF images make the problem more challenging than traditional single-image SR. The performance of existing methods is still limited as they fail to thoroughly explore the coherence among LF views and are insufficient in accurately preserving the parallax structure of the scene. In this paper, we propose a novel learning-based LF spatial SR framework, in which each view of an LF image is first individually super-resolved by exploring the complementary information among views with combinatorial geometry embedding. For accurate preservation of the parallax structure among the reconstructed views, a regularization network trained over a structure-aware loss function is subsequently appended to enforce correct parallax relationships over the intermediate estimation. Our proposed approach is evaluated over datasets with a large number of testing images including both synthetic and real-world scenes. Experimental results demonstrate the advantage of our approach over state-of-the-art methods, i.e., our method not only improves the average PSNR by more than 1.0 dB but also preserves more accurate parallax details, at a lower computational cost.

Submitted 5 April, 2020; originally announced April 2020.

Comments: This paper was accepted by CVPR 2020
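
For readers unfamiliar with the PSNR figure quoted in the abstract above, a minimal sketch of the standard definition, PSNR = 10 log10(peak^2 / MSE), may help. The function names and the 8-bit peak value of 255 are assumptions for illustration; the paper's exact evaluation protocol (color channel, border cropping, averaging over views) is not given in this listing.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(estimate):
        raise ValueError("inputs must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


def mse_reduction_for_gain(gain_db):
    """Fractional MSE reduction implied by a PSNR gain of `gain_db` decibels."""
    return 1.0 - 10.0 ** (-gain_db / 10.0)
```

Under this definition, a gain of 1.0 dB corresponds to roughly a 20.6% reduction in mean squared error, since `mse_reduction_for_gain(1.0)` evaluates to 1 - 10^(-0.1).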

arXiv:2002.11263 [pdf, other]

Learning Light Field Angular Super-Resolution via a Geometry-Aware Network

Authors: Jing Jin, Junhui Hou, Hui Yuan, Sam Kwong

Abstract: The acquisition of light field images with high angular resolution is costly. Although many methods have been proposed to improve the angular resolution of a sparsely-sampled light field, they always focus on the light field with a small baseline, which is captured by a consumer light field camera. By making full use of the intrinsic geometry information of light fields, in this paper we propose an end-to-end learning-based approach aiming at angularly super-resolving a sparsely-sampled light field with a large baseline. Our model consists of two learnable modules and a physically-based module. Specifically, it includes a depth estimation module for explicitly modeling the scene geometry, a physically-based warping for novel views synthesis, and a light field blending module specifically designed for light field reconstruction. Moreover, we introduce a novel loss function to promote the preservation of the light field parallax structure. Experimental results over various light field datasets including large baseline light field images demonstrate the significant superiority of our method when compared with state-of-the-art ones, i.e., our method improves the PSNR of the second best method up to 2 dB in average, while saves the execution time $48\times$. In addition, our method preserves the light field parallax structure better.

Submitted 25 February, 2020; originally announced February 2020.

Comments: This paper was accepted by AAAI 2020

arXiv:2002.10277 [pdf, other]

PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling

Authors: Yue Qian, Junhui Hou, Sam Kwong, Ying He

Abstract: This paper addresses the problem of generating uniform dense point clouds to describe the underlying geometric structures from given sparse point clouds. Due to the irregular and unordered nature, point cloud densification as a generative task is challenging. To tackle the challenge, we propose a novel deep neural network based method, called PUGeo-Net, that learns a $3\times 3$ linear transformation matrix $\mathbf{T}$ for each input point. Matrix $\mathbf{T}$ approximates the augmented Jacobian matrix of a local parameterization and builds a one-to-one correspondence between the 2D parametric domain and the 3D tangent plane so that we can lift the adaptively distributed 2D samples (which are also learned from data) to 3D space. After that, we project the samples to the curved surface by computing a displacement along the normal of the tangent plane. PUGeo-Net is fundamentally different from the existing deep learning methods that are largely motivated by the image super-resolution techniques and generate new points in the abstract feature space. Thanks to its geometry-centric nature, PUGeo-Net works well for both CAD models with sharp features and scanned models with rich geometric details. Moreover, PUGeo-Net can compute the normal for the original and generated points, which is highly desired by the surface reconstruction algorithms. Computational results show that PUGeo-Net, the first neural network that can jointly generate vertex coordinates and normals, consistently outperforms the state-of-the-art in terms of accuracy and efficiency for upsampling factor $4\sim 16$.

Submitted 7 March, 2020; v1 submitted 24 February, 2020; originally announced February 2020.

Comments: 17 pages, 10 figures

arXiv:2001.06826 [pdf, other]

Zero-Reference Deep Curve Estimation for Low-Light Image Enhancement

Authors: Chunle Guo, Chongyi Li, Jichang Guo, Chen Change Loy, Junhui Hou, Sam Kwong, Runmin Cong

Abstract: The paper presents a novel method, Zero-Reference Deep Curve Estimation (Zero-DCE), which formulates light enhancement as a task of image-specific curve estimation with a deep network. Our method trains a lightweight deep network, DCE-Net, to estimate pixel-wise and high-order curves for dynamic range adjustment of a given image. The curve estimation is specially designed, considering pixel value range, monotonicity, and differentiability. Zero-DCE is appealing in its relaxed assumption on reference images, i.e., it does not require any paired or unpaired data during training. This is achieved through a set of carefully formulated non-reference loss functions, which implicitly measure the enhancement quality and drive the learning of the network. Our method is efficient as image enhancement can be achieved by an intuitive and simple nonlinear curve mapping. Despite its simplicity, we show that it generalizes well to diverse lighting conditions. Extensive experiments on various benchmarks demonstrate the advantages of our method over state-of-the-art methods qualitatively and quantitatively. Furthermore, the potential benefits of our Zero-DCE to face detection in the dark are discussed. Code and model will be available at https://github.com/Li-Chongyi/Zero-DCE.

Submitted 22 March, 2020; v1 submitted 19 January, 2020; originally announced January 2020.

Comments: 10 pages

Journal ref: CVPR 2020

arXiv:1912.12854 [pdf, other]

Pareto Multi-Task Learning

Authors: Xi Lin, Hui-Ling Zhen, Zhenhua Li, Qingfu Zhang, Sam Kwong

Abstract: Multi-task learning is a powerful method for solving multiple correlated tasks simultaneously. However, it is often impossible to find one single solution to optimize all the tasks, since different tasks might conflict with each other. Recently, a novel method is proposed to find one single Pareto optimal solution with good trade-off among different tasks by casting multi-task learning as multiobjective optimization. In this paper, we generalize this idea and propose a novel Pareto multi-task learning algorithm (Pareto MTL) to find a set of well-distributed Pareto solutions which can represent different trade-offs among different tasks. The proposed algorithm first formulates a multi-task learning problem as a multiobjective optimization problem, and then decomposes the multiobjective optimization problem into a set of constrained subproblems with different trade-off preferences. By solving these subproblems in parallel, Pareto MTL can find a set of well-representative Pareto optimal solutions with different trade-off among all tasks. Practitioners can easily select their preferred solution from these Pareto solutions, or use different trade-off solutions for different situations. Experimental results confirm that the proposed algorithm can generate well-representative solutions and outperform some state-of-the-art algorithms on many multi-task learning applications.

Submitted 30 December, 2019; originally announced December 2019.

Comments: 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada

arXiv:1912.11954 [pdf, other]

Non-Cooperative Game Theory Based Rate Adaptation for Dynamic Video Streaming over HTTP

Authors: Hui Yuan, Huayong Fu, Ju Liu, Junhui Hou, Sam Kwong

Abstract: Dynamic Adaptive Streaming over HTTP (DASH) has demonstrated to be an emerging and promising multimedia streaming technique, owing to its capability of dealing with the variability of networks. Rate adaptation mechanism, a challenging and open issue, plays an important role in DASH based systems since it affects Quality of Experience (QoE) of users, network utilization, etc. In this paper, based on non-cooperative game theory, we propose a novel algorithm to optimally allocate the limited export bandwidth of the server to multi-users to maximize their QoE with fairness guaranteed. The proposed algorithm is proxy-free. Specifically, a novel user QoE model is derived by taking a variety of factors into account, like the received video quality, the reference buffer length, and user accumulated buffer lengths, etc. Then, the bandwidth competing problem is formulated as a non-cooperation game with the existence of Nash Equilibrium that is theoretically proven. Finally, a distributed iterative algorithm with stability analysis is proposed to find the Nash Equilibrium. Compared with state-of-the-art methods, extensive experimental results in terms of both simulated and realistic networking scenarios demonstrate that the proposed algorithm can produce higher QoE, and the actual buffer lengths of all users keep nearly optimal states, i.e., moving around the reference buffer all the time. Besides, the proposed algorithm produces no playback interruption.

Submitted 26 December, 2019; originally announced December 2019.

Comments: This paper has been published on IEEE Transactions on Mobile Computing. H. Yuan, H. Fu, J. Liu, J. Hou, and S. Kwong, "Non-Cooperative Game Theory Based Rate Adaptation for Dynamic Video Streaming over HTTP," IEEE Transactions on Mobile Computing, vol.17, no.10, pp. 2334-2348, Oct. 2018

Journal ref: IEEE Transactions on Mobile Computing, vol.17, no.10, pp. 2334-2348, Oct. 2018

arXiv:1912.11822 [pdf, other]

An Ensemble Rate Adaptation Framework for Dynamic Adaptive Streaming Over HTTP

Authors: Hui Yuan, Xiaoqian Hu, Junhui Hou, Xuekai Wei, Sam Kwong

Abstract: Rate adaptation is one of the most important issues in dynamic adaptive streaming over HTTP (DASH). Due to the frequent fluctuations of the network bandwidth and complex variations of video content, it is difficult to deal with the varying network conditions and video content perfectly by using a single rate adaptation method. In this paper, we propose an ensemble rate adaptation framework for DASH, which aims to leverage the advantages of multiple methods involved in the framework to improve the quality of experience (QoE) of users. The proposed framework is simple yet very effective. Specifically, the proposed framework is composed of two modules, i.e., the method pool and method controller. In the method pool, several rate adaptation methods are integrated. At each decision time, only the method that can achieve the best QoE is chosen to determine the bitrate of the requested video segment. Besides, we also propose two strategies for switching methods, i.e., InstAnt Method Switching, and InterMittent Method Switching, for the method controller to determine which method can provide the best QoEs. Simulation results demonstrate that, the proposed framework always achieves the highest QoE for the change of channel environment and video complexity, compared with state-of-the-art rate adaptation methods.

Submitted 26 December, 2019; originally announced December 2019.

Comments: This article has been accepted by IEEE Transactions on Broadcasting

arXiv:1912.10653 [pdf, ps, other]

Video Compression Coding via Colorization: A Generative Adversarial Network (GAN)-Based Approach

Authors: Zhaoqing Pan, Feng Yuan, Jianjun Lei, Sam Kwong

Abstract: Under the limited storage, computing and network bandwidth resources, the video compression coding technology plays an important role for visual communication. To efficiently compress raw video data, a colorization-based video compression coding method is proposed in this paper. In the proposed encoder, only the video luminance components are encoded and transmitted. To restore the video chrominance information, a generative adversarial network (GAN) model is adopted in the proposed decoder. In order to make the GAN work efficiently for video colorization, the generator of the proposed GAN model adopts an optimized MultiResUNet, an attention module, and a mixed loss function. Experimental results show that when compared with the H.265/HEVC video compression coding standard using all-intra coding structure, the proposed video compression coding method achieves an average of 72.05% BDBR reduction, and an average of 4.758 dB BDPSNR increase. Moreover, to our knowledge, this is the first work which compresses videos by using GAN-based colorization, and it provides a new way for addressing the video compression coding problems.

Submitted 23 December, 2019; originally announced December 2019.

Comments: This work has been submitted to IEEE TCSVT

arXiv:1912.09675 [pdf, other]

Spatial and Temporal Consistency-Aware Dynamic Adaptive Streaming for 360-Degree Videos

Authors: Hui Yuan, Shiyun Zhao, Junhui Hou, Xuekai Wei, Sam Kwong

Abstract: The 360-degree video allows users to enjoy the whole scene by interactively switching viewports. However, the huge data volume of the 360-degree video limits its remote applications via network. To provide high quality of experience (QoE) for remote web users, this paper presents a tile-based adaptive streaming method for 360-degree videos. First, we propose a simple yet effective rate adaptation algorithm to determine the requested bitrate for downloading the current video segment by considering the balance between the buffer length and video quality. Then, we propose to use a Gaussian model to predict the field of view at the beginning of each requested video segment. To deal with the circumstance that the view angle is switched during the display of a video segment, we propose to download all the tiles in the 360-degree video with different priorities based on a Zipf model. Finally, in order to allocate bitrates for all the tiles, a two-stage optimization algorithm is proposed to preserve the quality of tiles in FoV and guarantee the spatial and temporal smoothness. Experimental results demonstrate the effectiveness and advantage of the proposed method compared with the state-of-the-art methods. That is, our method preserves both the quality and the smoothness of tiles in FoV, thus providing the best QoE for users.

Submitted 20 December, 2019; originally announced December 2019.

Comments: 16 pages, This paper has been accepted by the IEEE Journal of Selected Topics in Signal Processing

arXiv:1911.10566 [pdf, other]

Controllable List-wise Ranking for Universal No-reference Image Quality Assessment

Authors: Fu-Zhao Ou, Yuan-Gen Wang, Jin Li, Guopu Zhu, Sam Kwong

Abstract: No-reference image quality assessment (NR-IQA) has received increasing attention in the IQA community since reference image is not always available. Real-world images generally suffer from various types of distortion. Unfortunately, existing NR-IQA methods do not work with all types of distortion. It is a challenging task to develop universal NR-IQA that has the ability of evaluating all types of distorted images. In this paper, we propose a universal NR-IQA method based on controllable list-wise ranking (CLRIQA). First, to extend the authentically distorted image dataset, we present an imaging-heuristic approach, in which the over-underexposure is formulated as an inverse of Weber-Fechner law, and fusion strategy and probabilistic compression are adopted, to generate the degraded real-world images. These degraded images are label-free yet associated with quality ranking information. We then design a controllable list-wise ranking function by limiting rank range and introducing an adaptive margin to tune rank interval. Finally, the extended dataset and controllable list-wise ranking function are used to pre-train a CNN. Moreover, in order to obtain an accurate prediction model, we take advantage of the original dataset to further fine-tune the pre-trained network. Experiments evaluated on four benchmark datasets (i.e. LIVE, CSIQ, TID2013, and LIVE-C) show that the proposed CLRIQA improves the state of the art by over 9% in terms of overall performance. The code and model are publicly available at https://github.com/GZHU-Image-Lab/CLRIQA.

Submitted 5 January, 2020; v1 submitted 24 November, 2019; originally announced November 2019.

arXiv:1909.13028 [pdf, other]

Semantic Example Guided Image-to-Image Translation

Authors: Jialu Huang, Jing Liao, Tak Wu Sam Kwong

Abstract: Many image-to-image (I2I) translation problems are in nature of high diversity that a single input may have various counterparts. Prior works proposed the multi-modal network that can build a many-to-many mapping between two visual domains. However, most of them are guided by sampled noises. Some others encode the reference images into a latent vector, by which the semantic information of the reference image will be washed away. In this work, we aim to provide a solution to control the output based on references semantically. Given a reference image and an input in another domain, a semantic matching is first performed between the two visual contents and generates the auxiliary image, which is explicitly encouraged to preserve semantic characteristics of the reference. A deep network then is used for I2I translation and the final outputs are expected to be semantically similar to both the input and the reference; however, no such paired data can satisfy that dual-similarity in a supervised fashion, so we build up a self-supervised framework to serve the training purpose. We improve the quality and diversity of the outputs by employing non-local blocks and a multi-task architecture. We assess the proposed method through extensive qualitative and quantitative evaluations and also presented comparisons with several state-of-art models.

Submitted 3 October, 2019; v1 submitted 28 September, 2019; originally announced September 2019.

Comments: 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

arXiv:1909.01341 [pdf, other]

Deep Coarse-to-fine Dense Light Field Reconstruction with Flexible Sampling and Geometry-aware Fusion

Authors: Jing Jin, Junhui Hou, Jie Chen, Huanqiang Zeng, Sam Kwong, Jingyi Yu

Abstract: A densely-sampled light field (LF) is highly desirable in various applications, such as 3-D reconstruction, post-capture refocusing and virtual reality. However, it is costly to acquire such data. Although many computational methods have been proposed to reconstruct a densely-sampled LF from a sparsely-sampled one, they still suffer from either low reconstruction quality, low computational efficiency, or the restriction on the regularity of the sampling pattern. To this end, we propose a novel learning-based method, which accepts sparsely-sampled LFs with irregular structures, and produces densely-sampled LFs with arbitrary angular resolution accurately and efficiently. We also propose a simple yet effective method for optimizing the sampling pattern. Our proposed method, an end-to-end trainable network, reconstructs a densely-sampled LF in a coarse-to-fine manner. Specifically, the coarse sub-aperture image (SAI) synthesis module first explores the scene geometry from an unstructured sparsely-sampled LF and leverages it to independently synthesize novel SAIs, in which a confidence-based blending strategy is proposed to fuse the information from different input SAIs, giving an intermediate densely-sampled LF. Then, the efficient LF refinement module learns the angular relationship within the intermediate result to recover the LF parallax structure. Comprehensive experimental evaluations demonstrate the superiority of our method on both real-world and synthetic LF images when compared with state-of-the-art methods. In addition, we illustrate the benefits and advantages of the proposed approach when applied in various LF-based applications, including image-based rendering and depth estimation enhancement.

Submitted 26 September, 2020; v1 submitted 31 August, 2019; originally announced September 2019.

Comments: 17 pages, 11 figures, 10 tables

arXiv:1908.02172 [pdf, other]

Bayesian Network Based Label Correlation Analysis For Multi-label Classifier Chain

Authors: Ran Wang, Suhe Ye, Ke Li, Sam Kwong

Abstract: Classifier chain (CC) is a multi-label learning approach that constructs a sequence of binary classifiers according to a label order. Each classifier in the sequence is responsible for predicting the relevance of one label. When training the classifier for a label, preceding labels will be taken as extended features. If the extended features are highly correlated to the label, the performance will be improved, otherwise, the performance will not be influenced or even degraded. How to discover label correlation and determine the label order is critical for CC approach. This paper employs Bayesian network (BN) to model the label correlations and proposes a new BN-based CC method (BNCC). First, conditional entropy is used to describe the dependency relations among labels. Then, a BN is built up by taking nodes as labels and weights of edges as their dependency relations. A new scoring function is proposed to evaluate a BN structure, and a heuristic algorithm is introduced to optimize the BN. At last, by applying topological sorting on the nodes of the optimized BN, the label order for constructing CC model is derived. Experimental comparisons demonstrate the feasibility and effectiveness of the proposed method.

Submitted 6 August, 2019; originally announced August 2019.

Comments: 27 pages

arXiv:1907.09640 [pdf, other]

Light Field Super-resolution via Attention-Guided Fusion of Hybrid Lenses

Authors: Jing Jin, Junhui Hou, Jie Chen, Sam Kwong, Jingyi Yu

Abstract: This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation; the other one constructs another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations via the learned attention maps, leading to the final high-resolution LF image. Extensive experiments demonstrate the significant superiority of our approach over state-of-the-art ones. That is, our method not only improves the PSNR by more than 2 dB, but also preserves the LF structure much better. To the best of our knowledge, this is the first end-to-end deep learning method for reconstructing a high-resolution LF image with a hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and also be beneficial to LF data storage and transmission. The code is available at https://github.com/jingjin25/LFhybridSR-Fusion.

Submitted 31 July, 2020; v1 submitted 22 July, 2019; originally announced July 2019.

Comments: This paper was accepted by ACM MM 2020

arXiv:1906.08462 [pdf, other]

doi 10.1109/TGRS.2019.2925070

Nested Network with Two-Stream Pyramid for Salient Object Detection in Optical Remote Sensing Images

Authors: Chongyi Li, Runmin Cong, Junhui Hou, Sanyi Zhang, Yue Qian, Sam Kwong

Abstract: Arising from the various object types and scales, diverse imaging orientations, and cluttered backgrounds in optical remote sensing image (RSI), it is difficult to directly extend the success of salient object detection for nature scene image to the optical RSI. In this paper, we propose an end-to-end deep network called LV-Net based on the shape of network architecture, which detects salient objects from optical RSIs in a purely data-driven fashion. The proposed LV-Net consists of two key modules, i.e., a two-stream pyramid module (L-shaped module) and an encoder-decoder module with nested connections (V-shaped module). Specifically, the L-shaped module extracts a set of complementary information hierarchically by using a two-stream pyramid structure, which is beneficial to perceiving the diverse scales and local details of salient objects. The V-shaped module gradually integrates encoder detail features with decoder semantic features through nested connections, which aims at suppressing the cluttered backgrounds and highlighting the salient objects. In addition, we construct the first publicly available optical RSI dataset for salient object detection, including 800 images with varying spatial resolutions, diverse saliency types, and pixel-wise ground truth. Experiments on this benchmark dataset demonstrate that the proposed method outperforms the state-of-the-art salient object detection methods both qualitatively and quantitatively.

Submitted 20 June, 2019; originally announced June 2019.

Comments: 11 pages, 8 figures, has been accepted by TGRS

arXiv:1905.02001 [pdf, other]

Compressed Image Quality Assessment Based on Saak Features

Authors: Xinfeng Zhang, Sam Kwong, C. -C. Jay Kuo

Abstract: Compressed image quality assessment plays an important role in image services, especially in image compression applications, which can be utilized as a guidance to optimize image processing algorithms. In this paper, we propose an objective image quality assessment algorithm to measure the quality of compressed images. The proposed method utilizes a data-driven transform, Saak (Subspace approximation with augmented kernels), to decompose images into hierarchical structural feature space. We measure the distortions of Saak features and accumulate these distortions according to the feature importance to human visual system. Compared with the state-of-the-art image quality assessment methods on widely utilized datasets, the proposed method correlates better with the subjective results. In addition, the proposed method achieves more robust results on different datasets.

Submitted 16 May, 2019; v1 submitted 6 May, 2019; originally announced May 2019.

arXiv:1905.01446 [pdf, ps, other]

Clustering-aware Graph Construction: A Joint Learning Perspective

Authors: Yuheng Jia, Hui Liu, Junhui Hou, Sam Kwong

Abstract: Graph-based clustering methods have demonstrated the effectiveness in various applications. Generally, existing graph-based clustering methods first construct a graph to represent the input data and then partition it to generate the clustering result. However, such a stepwise manner may make the constructed graph not fit the requirements for the subsequent decomposition, leading to compromised clustering accuracy. To this end, we propose a joint learning framework, which is able to learn the graph and the clustering result simultaneously, such that the resulting graph is tailored to the clustering task. The proposed model is formulated as a well-defined nonnegative and off-diagonal constrained optimization problem, which is further efficiently solved with convergence theoretically guaranteed. The advantage of the proposed model is demonstrated by comparing with 19 state-of-the-art clustering methods on 10 datasets with 4 clustering metrics.

Submitted 15 December, 2019; v1 submitted 4 May, 2019; originally announced May 2019.

arXiv:1901.05495 [pdf, other]

An Underwater Image Enhancement Benchmark Dataset and Beyond

Authors: Chongyi Li, Chunle Guo, Wenqi Ren, Runmin Cong, Junhui Hou, Sam Kwong, Dacheng Tao

Abstract: Underwater image enhancement has been attracting much attention due to its significance in marine engineering and aquatic robotics. Numerous underwater image enhancement algorithms have been proposed in the last few years. However, these algorithms are mainly evaluated using either synthetic datasets or a few selected real-world images. It is thus unclear how these algorithms would perform on images acquired in the wild and how we could gauge the progress in the field. To bridge this gap, we present the first comprehensive perceptual study and analysis of underwater image enhancement using large-scale real-world images. In this paper, we construct an Underwater Image Enhancement Benchmark (UIEB) including 950 real-world underwater images, 890 of which have the corresponding reference images. We treat the remaining 60 underwater images, for which satisfactory reference images cannot be obtained, as challenging data. Using this dataset, we conduct a comprehensive study of the state-of-the-art underwater image enhancement algorithms qualitatively and quantitatively. In addition, we propose an underwater image enhancement network (called Water-Net) trained on this benchmark as a baseline, which indicates the generalization of the proposed UIEB for training Convolutional Neural Networks (CNNs). The benchmark evaluations and the proposed Water-Net demonstrate the performance and limitations of state-of-the-art algorithms, which shed light on future research in underwater image enhancement. The dataset and code are available at https://li-chongyi.github.io/proj_benchmark.html.

Submitted 26 November, 2019; v1 submitted 11 January, 2019; originally announced January 2019.

Comments: 14 pages

Journal ref: IEEE TRANSACTIONS ON IMAGE PROCESSING 2019

arXiv:1811.01323 [pdf, other]

A Batched Scalable Multi-Objective Bayesian Optimization Algorithm

Authors: Xi Lin, Hui-Ling Zhen, Zhenhua Li, Qingfu Zhang, Sam Kwong

Abstract: The surrogate-assisted optimization algorithm is a promising approach for solving expensive multi-objective optimization problems. However, most existing surrogate-assisted multi-objective optimization algorithms have three main drawbacks: 1) they cannot scale well for solving problems with high dimensional decision space, 2) they cannot incorporate available gradient information, and 3) they do not support batch optimization. These drawbacks prevent their use for solving many real-world large scale optimization problems. This paper proposes a batched scalable multi-objective Bayesian optimization algorithm to tackle these issues. The proposed algorithm uses the Bayesian neural network as the scalable surrogate model. Powered with Monte Carlo dropout and Sobolev training, the model can be easily trained and can incorporate available gradient information. We also propose a novel batch hypervolume upper confidence bound acquisition function to support batch optimization. Experimental results on various benchmark problems and a real-world application demonstrate the efficiency of the proposed algorithm.

Submitted 4 November, 2018; originally announced November 2018.

arXiv:1811.01316 [pdf, other]

Nonlinear Collaborative Scheme for Deep Neural Networks

Authors: Hui-Ling Zhen, Xi Lin, Alan Z. Tang, Zhenhua Li, Qingfu Zhang, Sam Kwong

Abstract: Conventional research attributes the improvements of generalization ability of deep neural networks either to powerful optimizers or to new network designs. Different from them, in this paper, we aim to link the generalization ability of a deep network to optimizing a new objective function. To this end, we propose a \textit{nonlinear collaborative scheme} for deep network training, with the key technique being the combination of different loss functions in a nonlinear manner. We find that after adaptively tuning the weights of different loss functions, the proposed objective function can efficiently guide the optimization process. What is more, we demonstrate that, from the mathematical perspective, the nonlinear collaborative scheme can lead to (i) smaller KL divergence with respect to optimal solutions; (ii) data-driven stochastic gradient descent; (iii) tighter PAC-Bayes bound. We also prove that its advantage can be strengthened by increasing the nonlinearity. To some extent, we bridge the gap between learning (i.e., minimizing the new objective function) and generalization (i.e., minimizing a PAC-Bayes bound) in the new scheme. We also interpret our findings through experiments on Residual Networks and DenseNet, showing that our new scheme outperforms single-loss and multi-loss schemes with or without randomization.

Submitted 3 November, 2018; originally announced November 2018.

Comments: 11 pages, 3 figures (20 subfigures), prepared to submit to IEEE Trans. on Neural Networks and Learning Systems

arXiv:1704.02340 [pdf, other]

Evolutionary Many-Objective Optimization Based on Adversarial Decomposition

Authors: Mengyuan Wu, Ke Li, Sam Kwong, Qingfu Zhang

Abstract: The decomposition-based method has been recognized as a major approach for multi-objective optimization. It decomposes a multi-objective optimization problem into several single-objective optimization subproblems, each of which is usually defined as a scalarizing function using a weight vector. Due to the characteristics of the contour line of a particular scalarizing function, the performance of the decomposition-based method strongly depends on the Pareto front's shape when merely using a single scalarizing function, especially when facing a large number of objectives. To improve the flexibility of the decomposition-based method, this paper develops an adversarial decomposition method that leverages the complementary characteristics of two different scalarizing functions within a single paradigm. More specifically, we maintain two co-evolving populations simultaneously by using different scalarizing functions. In order to avoid allocating redundant computational resources to the same region of the Pareto front, we stably match these two co-evolving populations into one-one solution pairs according to their working regions of the Pareto front. Then, each solution pair can contribute at most one mating parent during the mating selection process. In comparison with nine state-of-the-art many-objective optimizers, we have witnessed the competitive performance of our proposed algorithm on 130 many-objective test instances with various characteristics and Pareto front shapes.

Submitted 7 April, 2017; originally announced April 2017.

Comments: 24 pages, 5 figures

arXiv:1701.01500 [pdf, other]

VideoSet: A Large-Scale Compressed Video Quality Dataset Based on JND Measurement

Authors: Haiqiang Wang, Ioannis Katsavounidis, Jiantong Zhou, Jeonghoon Park, Shawmin Lei, Xin Zhou, Man-On Pun, Xin Jin, Ronggang Wang, Xu Wang, Yun Zhang, Jiwu Huang, Sam Kwong, C. -C. Jay Kuo

Abstract: A new methodology to measure coded image/video quality using the just-noticeable-difference (JND) idea was proposed. Several small JND-based image/video quality datasets were released by the Media Communications Lab at the University of Southern California. In this work, we present an effort to build a large-scale JND-based coded video quality dataset. The dataset consists of 220 5-second sequences in four resolutions (i.e., $1920 \times 1080$, $1280 \times 720$, $960 \times 540$ and $640 \times 360$). For each of the 880 video clips, we encode it using the H.264 codec with $QP=1, \cdots, 51$ and measure the first three JND points with 30+ subjects. The dataset is called the "VideoSet", which is an acronym for "Video Subject Evaluation Test (SET)". This work describes the subjective test procedure, detection and removal of outlying measured data, and the properties of collected JND data. Finally, the significance and implications of the VideoSet to future video coding research and standardization efforts are pointed out. All source/coded video clips as well as measured JND data included in the VideoSet are available to the public in the IEEE DataPort.

Submitted 14 January, 2017; v1 submitted 5 January, 2017; originally announced January 2017.

arXiv:1608.08607 [pdf, other]

Matching-Based Selection with Incomplete Lists for Decomposition Multi-Objective Optimization

Authors: Mengyuan Wu, Ke Li, Sam Kwong, Yu Zhou, Qingfu Zhang

Abstract: The balance between convergence and diversity is a key issue of evolutionary multi-objective optimization. The recently proposed stable matching-based selection provides a new perspective to handle this balance under the framework of decomposition multi-objective optimization. In particular, the stable matching between subproblems and solutions, which achieves an equilibrium between their mutual preferences, implicitly strikes a balance between convergence and diversity. Nevertheless, the original stable matching model has a high risk of matching a solution with an unfavorable subproblem, which finally leads to an imbalanced selection result. In this paper, we propose an adaptive two-level stable matching-based selection for decomposition multi-objective optimization. Specifically, borrowing the idea of stable matching with incomplete lists, we match each solution with one of its favorite subproblems by restricting the length of its preference list during the first-level stable matching. During the second-level stable matching, the remaining subproblems are thereafter matched with their favorite solutions according to the classic stable matching model. In particular, we develop an adaptive mechanism to automatically set the length of the preference list for each solution according to its local competitiveness. The performance of our proposed method is validated and compared with several state-of-the-art evolutionary multi-objective optimization algorithms on 62 benchmark problem instances. Empirical results fully demonstrate the competitive performance of our proposed method on problems with complicated Pareto sets and those with more than three objectives.

Submitted 25 January, 2017; v1 submitted 30 August, 2016; originally announced August 2016.

Comments: 27 pages, 3 figures, 15 tables

Showing 51–86 of 86 results for author: Kwong, S