
NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Content (AIGC). The challenge is divided into the image track and the video track. The image track uses the AIGIQA-20K, which contains 20,000 AI-Generated Images (AIGIs) generated by 15 popular generative models. The image track has a total of 318 registered participants. A total of 1,646 submissions are received in the development phase, and 221 submissions are received in the test phase. Finally, 16 participating teams submitted their models and fact sheets. The video track uses the T2VQA-DB, which contains 10,000 AI-Generated Videos (AIGVs) generated by 9 popular Text-to-Video (T2V) models. A total of 196 participants have registered in the video track. A total of 991 submissions are received in the development phase, and 185 submissions are received in the test phase. Finally, 12 participating teams submitted their models and fact sheets. Some methods have achieved better results than baseline methods, and the winning methods in both tracks have demonstrated superior prediction performance on AIGC.

Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.11313 [pdf, other]

NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results

Authors: Xin Li, Kun Yuan, Yajing Pei, Yiting Lu, Ming Sun, Chao Zhou, Zhibo Chen, Radu Timofte, Wei Sun, Haoning Wu, Zicheng Zhang, Jun Jia, Zhichao Zhang, Linhan Cao, Qiubo Chen, Xiongkuo Min, Weisi Lin, Guangtao Zhai, Jianhui Sun, Tianyi Wang, Lei Li, Han Kong, Wenxuan Wang, Bing Li, Cheng Luo, et al. (43 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions are submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performances for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.

Submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR2024 Workshop. The challenge report for CVPR NTIRE2024 Short-form UGC Video Quality Assessment Challenge

arXiv:2403.08556 [pdf, other]

SM4Depth: Seamless Monocular Metric Depth Estimation across Multiple Cameras and Scenes by One Model

Authors: Yihao Liu, Feng Xue, Anlong Ming

Abstract: The generalization of monocular metric depth estimation (MMDE) has been a longstanding challenge. Recent methods made progress by combining relative and metric depth or aligning input image focal length. However, they are still beset by challenges in camera, scene, and data levels: (1) Sensitivity to different cameras; (2) Inconsistent accuracy across scenes; (3) Reliance on massive training data. This paper proposes SM4Depth, a seamless MMDE method, to address all the issues above within a single network. First, we reveal that a consistent field of view (FOV) is the key to resolve "metric ambiguity" across cameras, which guides us to propose a more straightforward preprocessing unit. Second, to achieve consistently high accuracy across scenes, we explicitly model the metric scale determination as discretizing the depth interval into bins and propose variation-based unnormalized depth bins. This method bridges the depth gap of diverse scenes by reducing the ambiguity of the conventional metric bin. Third, to reduce the reliance on massive training data, we propose a "divide and conquer" solution. Instead of estimating directly from the vast solution space, the correct metric bins are estimated from multiple solution sub-spaces for complexity reduction. Finally, with just 150K RGB-D pairs and a consumer-grade GPU for training, SM4Depth achieves state-of-the-art performance on most previously unseen datasets, especially surpassing ZoeDepth and Metric3D on mRI$_θ$. The code can be found at https://github.com/1hao-Liu/SM4Depth.

Submitted 13 March, 2024; originally announced March 2024.

Comments: Project Page: xuefeng-cvr.github.io/SM4Depth
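
The SM4Depth abstract above names a consistent field of view (FOV) as the key to resolving metric ambiguity across cameras. The following is a minimal sketch of the underlying pinhole-camera relation only, not the paper's preprocessing unit, whose details the abstract does not give. The function names and the example intrinsics are hypothetical.

```python
import math


def horizontal_fov_deg(width_px: float, focal_px: float) -> float:
    """Horizontal field of view of an ideal pinhole camera, in degrees."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))


def width_for_fov(focal_px: float, target_fov_deg: float) -> float:
    """Image width in pixels that gives target_fov_deg at this focal length.

    Cropping or padding an image to this width is one simple way to give
    images from different cameras the same FOV (an illustrative assumption,
    not necessarily what SM4Depth does).
    """
    return 2.0 * focal_px * math.tan(math.radians(target_fov_deg) / 2.0)


if __name__ == "__main__":
    # Two hypothetical cameras with the same resolution, different focal lengths.
    for focal in (320.0, 500.0):
        fov = horizontal_fov_deg(640.0, focal)
        print(f"focal={focal:.0f}px  fov={fov:.1f} deg  "
              f"width for 60 deg={width_for_fov(focal, 60.0):.1f}px")
```

The two cameras see different FOVs at the same pixel width, so an object of the same pixel size corresponds to different metric sizes; equalizing the FOV removes that difference at the input.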

arXiv:2403.07028 [pdf, other]

An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem

Authors: Runze Guo, Feng Xue, Anlong Ming, Nicu Sebe

Abstract: Recently, neural networks (NN) have made great strides in combinatorial optimization. However, they face challenges when solving the capacitated arc routing problem (CARP) which is to find the minimum-cost tour covering all required edges on a graph, while within capacity constraints. In tackling CARP, NN-based approaches tend to lag behind advanced metaheuristics, since they lack directed arc modeling and efficient learning methods tailored for complex CARP. In this paper, we introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics while exhibiting superior efficiency. First, we propose the direction-aware attention model (DaAM) to incorporate directionality into the embedding process, facilitating more effective one-stage decision-making. Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy for subsequent reinforcement fine-tuning. It proves particularly valuable for solving CARP that has a higher complexity than the node routing problems (NRPs). Finally, a path optimization method is proposed to adjust the depot return positions within the path generated by DaAM. Experiments illustrate that our approach surpasses heuristics and achieves decision quality comparable to state-of-the-art metaheuristics for the first time while maintaining superior efficiency.

Submitted 10 March, 2024; originally announced March 2024.

arXiv:2403.06831 [pdf, other]

HDRTransDC: High Dynamic Range Image Reconstruction with Transformer Deformation Convolution

Authors: Shuaikang Shang, Xuejing Kang, Anlong Ming

Abstract: High Dynamic Range (HDR) imaging aims to generate an artifact-free HDR image with realistic details by fusing multi-exposure Low Dynamic Range (LDR) images. Caused by large motion and severe under-/over-exposure among input LDR images, HDR imaging suffers from ghosting artifacts and fusion distortions. To address these critical issues, we propose an HDR Transformer Deformation Convolution (HDRTransDC) network to generate high-quality HDR images, which consists of the Transformer Deformable Convolution Alignment Module (TDCAM) and the Dynamic Weight Fusion Block (DWFB). To solve the ghosting artifacts, the proposed TDCAM extracts long-distance content similar to the reference feature in the entire non-reference features, which can accurately remove misalignment and fill the content occluded by moving objects. For the purpose of eliminating fusion distortions, we propose DWFB to spatially adaptively select useful information across frames to effectively fuse multi-exposed features. Extensive experiments show that our method quantitatively and qualitatively achieves state-of-the-art performance.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2401.01445 [pdf, other]

Indoor Obstacle Discovery on Reflective Ground via Monocular Camera

Authors: Feng Xue, Yicong Chang, Tianxi Wang, Yu Zhou, Anlong Ming

Abstract: Visual obstacle discovery is a key step towards autonomous navigation of indoor mobile robots. Successful solutions have many applications in multiple scenes. One of the exceptions is the reflective ground. In this case, the reflections on the floor resemble the true world, which confuses the obstacle discovery and leaves navigation unsuccessful. We argue that the key to this problem lies in obtaining discriminative features for reflections and obstacles. Note that obstacle and reflection can be separated by the ground plane in 3D space. With this observation, we firstly introduce a pre-calibration based ground detection scheme that uses robot motion to predict the ground plane. Due to the immunity of robot motion to reflection, this scheme avoids failed ground detection caused by reflection. Given the detected ground, we design a ground-pixel parallax to describe the location of a pixel relative to the ground. Based on this, a unified appearance-geometry feature representation is proposed to describe objects inside rectangular boxes. Eventually, based on segmenting by detection framework, an appearance-geometry fusion regressor is designed to utilize the proposed feature to discover the obstacles. It also prevents our model from concentrating too much on parts of obstacles instead of whole obstacles. For evaluation, we introduce a new dataset for Obstacle on Reflective Ground (ORG), which comprises 15 scenes with various ground reflections, a total of more than 200 image sequences and 3400 RGB images. The pixel-wise annotations of ground and obstacle provide a comparison to our method and other methods. By reducing the misdetection of the reflection, the proposed approach outperforms others. The source code and the dataset will be available at https://github.com/XuefengBUPT/IndoorObstacleDiscovery-RG.

Submitted 2 January, 2024; originally announced January 2024.

Comments: International Journal of Computer Vision (IJCV) 2023. Project Page: https://xuefeng-cvr.github.io/IODRG

arXiv:2303.13769 [pdf, other]

Unknown Sniffer for Object Detection: Don't Turn a Blind Eye to Unknown Objects

Authors: Wenteng Liang, Feng Xue, Yihao Liu, Guofeng Zhong, Anlong Ming

Abstract: The recently proposed open-world object and open-set detection have achieved a breakthrough in finding never-seen-before objects and distinguishing them from known ones. However, their studies on knowledge transfer from known classes to unknown ones are not deep enough, resulting in the scanty capability for detecting unknowns hidden in the background. In this paper, we propose the unknown sniffer (UnSniffer) to find both unknown and known objects. Firstly, the generalized object confidence (GOC) score is introduced, which only uses known samples for supervision and avoids improper suppression of unknowns in the background. Significantly, such confidence score learned from known objects can be generalized to unknown ones. Additionally, we propose a negative energy suppression loss to further suppress the non-object samples in the background. Next, the best box of each unknown is hard to obtain during inference due to lacking their semantic information in training. To solve this issue, we introduce a graph-based determination scheme to replace hand-designed non-maximum suppression (NMS) post-processing. Finally, we present the Unknown Object Detection Benchmark, to our knowledge the first publicly available benchmark that encompasses precision evaluation for unknown detection. Experiments show that our method is far better than the existing state-of-the-art methods.

Submitted 19 April, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

Comments: CVPR 2023 camera-ready; Code: https://github.com/Went-Liang/UnSniffer Project: https://xuefengbupt.github.io/project_page/unsniffer_cvpr23.html Demo: https://www.bilibili.com/video/BV1xM4y1z7Hv Supplementary: https://xuefengbupt.github.io/project_page/pdf/supplementary_cvpr2023.pdf

arXiv:2302.13770 [pdf, other]

Mask Reference Image Quality Assessment

Authors: Pengxiang Xiao, Shuai He, Limin Liu, Anlong Ming

Abstract: Understanding semantic information is an essential step in knowing what is being learned in both full-reference (FR) and no-reference (NR) image quality assessment (IQA) methods. However, especially for many severely distorted images, even if there is an undistorted image as a reference (FR-IQA), it is difficult to perceive the lost semantic and texture information of distorted images directly. In this paper, we propose a Mask Reference IQA (MR-IQA) method that masks specific patches of a distorted image and supplements missing patches with the reference image patches. In this way, our model only needs to input the reconstructed image for quality assessment. First, we design a mask generator to select the best candidate patches from reference images and supplement the lost semantic information in distorted images, thus providing more reference for quality assessment; in addition, the different masked patches imply different data augmentations, which favors model training and reduces overfitting. Second, we provide a Mask Reference Network (MRNet): the dedicated modules can prevent disturbances due to masked patches and help eliminate the patch discontinuity in the reconstructed image. Our method achieves state-of-the-art performances on the benchmark KADID-10k, LIVE and CSIQ datasets and has better generalization performance across datasets. The code and results are available in the supplementary material.

Submitted 19 March, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: 10 pages, 6 figures

arXiv:2302.11119 [pdf, other]

Balanced Line Coverage in Large-scale Urban Scene

Authors: Hangsong Su, Feng Xue, Runze Guo, Anlong Ming

Abstract: Line coverage is to cover linear infrastructure modeled as 1D segments by robots, which received attention in recent years. With the increasing urbanization, the area of the city and the density of infrastructure continues to increase, which brings two issues: (1) Due to the energy constraint, it is hard for the homogeneous robot team to cover the large-scale linear infrastructure starting from one depot; (2) In the large urban scene, the imbalance of robots' path greatly extends the time cost of the multi-robot system, which is more serious than that in smaller-size scenes. To address these issues, we propose a heterogeneous multi-robot approach consisting of several teams, each of which contains one transportation robot (TRob) and several coverage robots (CRobs). Firstly, a balanced graph partitioning (BGP) algorithm is proposed to divide the road network into several similar-size sub-graphs, and then the TRob delivers a group of CRobs to the subgraph region quickly. Secondly, a balanced ulusoy partitioning (BUP) algorithm is proposed to extract similar-length tours for each CRob from the sub-graph. Abundant experiments are conducted on seven road networks ranging in scales that are collected in this paper. Our method achieves robot utilization of 90% and the best maximal tour length at the cost of a small increase in total tour length, which further minimizes the time cost of the whole system. The source code and the road networks are available at https://github.com/suhangsong/BLC-LargeScale.

Submitted 21 February, 2023; originally announced February 2023.

arXiv:2204.06833 [pdf, other]

MARF: Multiscale Adaptive-switch Random Forest for Leg Detection with 2D Laser Scanners

Authors: Tianxi Wang, Feng Xue, Yu Zhou, Anlong Ming

Abstract: For the 2D laser-based tasks, e.g., people detection and people tracking, leg detection is usually the first step. Thus, it carries great weight in determining the performance of people detection and people tracking. However, many leg detectors ignore the inevitable noise and the multiscale characteristics of the laser scan, which makes them sensitive to the unreliable features of point cloud and further degrades the performance of the leg detector. In this paper, we propose a multiscale adaptive-switch Random Forest (MARF) to overcome these two challenges. Firstly, the adaptive-switch decision tree is designed to use noise-sensitive features to conduct weighted classification and noise-invariant features to conduct binary classification, which makes our detector perform more robust to noise. Secondly, considering the multiscale property that the sparsity of the 2D point cloud is proportional to the length of laser beams, we design a multiscale random forest structure to detect legs at different distances. Moreover, the proposed approach allows us to discover a sparser human leg from point clouds than others. Consequently, our method shows an improved performance compared to other state-of-the-art leg detectors on the challenging Moving Legs dataset and retains the whole pipeline at a speed of 60+ FPS on low-computational laptops. Moreover, we further apply the proposed MARF to the people detection and tracking system, achieving a considerable gain in all metrics.

Submitted 14 April, 2022; originally announced April 2022.

Comments: Accepted by Transactions on Cybernetics (TCYB)
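
The MARF abstract above relies on the property that the sparsity of a 2D laser point cloud grows in proportion to beam length. A minimal sketch of that geometry follows, assuming an ideal scanner with a fixed angular step; the 0.5 degree resolution and the 0.12 m leg width are hypothetical values chosen for illustration, and none of this is taken from the MARF implementation.

```python
import math


def point_spacing_m(range_m: float, angular_res_deg: float) -> float:
    """Approximate gap between neighbouring scan points at a given range.

    Adjacent beams differ by a fixed angle, so the arc length between their
    hit points is range * angle (angle in radians).
    """
    return range_m * math.radians(angular_res_deg)


def expected_hits(object_width_m: float, range_m: float,
                  angular_res_deg: float) -> float:
    """Rough number of beams that land on an object of the given width."""
    return object_width_m / point_spacing_m(range_m, angular_res_deg)


if __name__ == "__main__":
    # Hypothetical scanner resolution and leg width.
    for r in (1.0, 3.0, 6.0):
        print(f"range={r:.0f} m  spacing={point_spacing_m(r, 0.5) * 100:.2f} cm  "
              f"hits on a 0.12 m leg={expected_hits(0.12, r, 0.5):.1f}")
```

Because the number of points on a leg falls roughly as 1/range, a detector tuned for nearby legs sees far fewer points on distant ones, which is the motivation the abstract gives for a multiscale structure.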

arXiv:2203.04538 [pdf, other]

Monocular Depth Distribution Alignment with Low Computation

Authors: Fei Sheng, Feng Xue, Yicong Chang, Wenteng Liang, Anlong Ming

Abstract: The performance of monocular depth estimation generally depends on the amount of parameters and computational cost. It leads to a large accuracy contrast between light-weight networks and heavy-weight networks, which limits their application in the real world. In this paper, we model the majority of accuracy contrast between them as the difference of depth distribution, which we call "Distribution drift". To this end, a distribution alignment network (DANet) is proposed. We firstly design a pyramid scene transformer (PST) module to capture inter-region interaction in multiple scales. By perceiving the difference of depth features between every two regions, DANet tends to predict a reasonable scene structure, which fits the shape of distribution to ground truth. Then, we propose a local-global optimization (LGO) scheme to realize the supervision of global range of scene depth. Thanks to the alignment of depth distribution shape and scene depth range, DANet sharply alleviates the distribution drift, and achieves a comparable performance with prior heavy-weight methods, but uses only 1% of their floating-point operations (FLOPs). The experiments on two datasets, namely the widely used NYUDv2 dataset and the more challenging iBims-1 dataset, demonstrate the effectiveness of our method. The source code is available at https://github.com/YiLiM1/DANet.

Submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted by ICRA 2022

arXiv:2203.04537 [pdf, other]

Fast Road Segmentation via Uncertainty-aware Symmetric Network

Authors: Yicong Chang, Feng Xue, Fei Sheng, Wenteng Liang, Anlong Ming

Abstract: The high performance of RGB-D based road segmentation methods contrasts with their rare application in commercial autonomous driving, which is owing to two reasons: 1) the prior methods cannot achieve high inference speed and high accuracy in both ways; 2) the different properties of RGB and depth data are not well-exploited, limiting the reliability of predicted road. In this paper, based on the evidence theory, an uncertainty-aware symmetric network (USNet) is proposed to achieve a trade-off between speed and accuracy by fully fusing RGB and depth data. Firstly, cross-modal feature fusion operations, which are indispensable in the prior RGB-D based methods, are abandoned. We instead separately adopt two light-weight subnetworks to learn road representations from RGB and depth inputs. The light-weight structure guarantees the real-time inference of our method. Moreover, a multiscale evidence collection (MEC) module is designed to collect evidence in multiple scales for each modality, which provides sufficient evidence for pixel class determination. Finally, in uncertainty-aware fusion (UAF) module, the uncertainty of each modality is perceived to guide the fusion of the two subnetworks. Experimental results demonstrate that our method achieves a state-of-the-art accuracy with real-time inference speed of 43+ FPS. The source code is available at https://github.com/morancyc/USNet.

Submitted 9 March, 2022; originally announced March 2022.

Comments: Accepted by ICRA 2022
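
The USNet abstract above describes collecting per-class evidence for each modality and using each modality's uncertainty to guide fusion. As a rough illustration of how evidence can be turned into an uncertainty value, the sketch below uses the subjective-logic formulation common in evidential deep learning. That choice is an assumption: the abstract does not state which formulation USNet uses, and the function name and evidence values are hypothetical.

```python
from typing import List, Tuple


def opinion_from_evidence(evidence: List[float]) -> Tuple[List[float], float]:
    """Map non-negative per-class evidence to belief masses and an uncertainty.

    With K classes, Dirichlet parameters are alpha_k = e_k + 1 and the
    Dirichlet strength is S = sum(alpha). Belief is b_k = e_k / S and the
    uncertainty mass is u = K / S, so sum(b) + u == 1.
    """
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)
    belief = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return belief, uncertainty


if __name__ == "__main__":
    # Hypothetical (road, not-road) evidence for one pixel from two modalities.
    for name, ev in (("rgb", [9.0, 1.0]), ("depth", [0.5, 0.5])):
        belief, u = opinion_from_evidence(ev)
        print(name, [round(b, 3) for b in belief], round(u, 3))
```

In this formulation a modality with little total evidence receives an uncertainty close to 1, so a fusion step can give it less weight for that pixel.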

arXiv:2111.09204 [pdf, other]

Tiny Obstacle Discovery by Occlusion-Aware Multilayer Regression

Authors: Feng Xue, Anlong Ming, Yu Zhou

Abstract: Edges are the fundamental visual element for discovering tiny obstacles using a monocular camera. Nevertheless, tiny obstacles often have weak and inconsistent edge cues due to various properties such as small size and similar appearance to the free space, making it hard to capture them. ...

Submitted 17 November, 2021; originally announced November 2021.

Comments: Published in Transactions on Image Processing 2021

arXiv:2108.05722 [pdf, other]

MT-ORL: Multi-Task Occlusion Relationship Learning

Authors: Panhe Feng, Qi She, Lei Zhu, Jiaxin Li, Lin Zhang, Zijian Feng, Changhu Wang, Chunpeng Li, Xuejing Kang, Anlong Ming

Abstract: Retrieving occlusion relation among objects in a single image is challenging due to sparsity of boundaries in image. We observe two key issues in existing works: firstly, lack of an architecture which can exploit the limited amount of coupling in the decoder stage between the two subtasks, namely occlusion boundary extraction and occlusion orientation prediction, and secondly, improper representation of occlusion orientation. In this paper, we propose a novel architecture called Occlusion-shared and Path-separated Network (OPNet), which solves the first issue by exploiting rich occlusion cues in shared high-level features and structured spatial information in task-specific low-level features. We then design a simple but effective orthogonal occlusion representation (OOR) to tackle the second issue. Our method surpasses the state-of-the-art methods by 6.1%/8.3% Boundary-AP and 6.5%/10% Orientation-AP on standard PIOD/BSDS ownership datasets. Code is available at https://github.com/fengpanhe/MT-ORL.

Submitted 18 August, 2021; v1 submitted 12 August, 2021; originally announced August 2021.

Comments: Accepted by ICCV 2021

arXiv:2102.13258 [pdf, other]

Boundary-induced and scene-aggregated network for monocular depth prediction

Authors: Feng Xue, Junfeng Cao, Yu Zhou, Fei Sheng, Yankai Wang, Anlong Ming

Abstract: Monocular depth prediction is an important task in scene understanding. It aims to predict the dense depth of a single RGB image. With the development of deep learning, the performance of this task has made great improvements. However, two issues remain unresolved: (1) The deep feature encodes the wrong farthest region in a scene, which leads to a distorted 3D structure of the predicted depth; (2) The low-level features are insufficiently utilized, which makes it even harder to estimate the depth near the edge with sudden depth change. To tackle these two issues, we propose the Boundary-induced and Scene-aggregated network (BS-Net). In this network, the Depth Correlation Encoder (DCE) is first designed to obtain the contextual correlations between the regions in an image, and perceive the farthest region by considering the correlations. Meanwhile, the Bottom-Up Boundary Fusion (BUBF) module is designed to extract accurate boundary that indicates depth change. Finally, the Stripe Refinement module (SRM) is designed to refine the dense depth induced by the boundary cue, which improves the boundary accuracy of the predicted depth. Several experimental results on the NYUD v2 dataset and the iBims-1 dataset illustrate the state-of-the-art performance of the proposed approach. And the SUN-RGBD dataset is employed to evaluate the generalization of our method. Code is available at https://github.com/XuefengBUPT/BS-Net.

Submitted 13 April, 2021; v1 submitted 25 February, 2021; originally announced February 2021.

Comments: Accepted by Pattern Recognition 2021

arXiv:1911.11582

DDNet: Dual-path Decoder Network for Occlusion Relationship Reasoning

Authors: Panhe Feng, Xuejing Kang, Lizhu Ye, Lei Zhu, Chunpeng Li, Anlong Ming

Abstract: Occlusion relationship reasoning based on convolution neural networks consists of two subtasks: occlusion boundary extraction and occlusion orientation inference. Due to the essential differences between the two subtasks in the feature expression at the higher and lower stages, it is challenging to carry on them simultaneously in one network. To address this issue, we propose a novel Dual-path Decoder Network, which uniformly extracts occlusion information at higher stages and separates into two paths to recover boundary and occlusion orientation respectively in lower stages. Besides, considering the restriction of occlusion orientation presentation to occlusion orientation learning, we design a new orthogonal representation for occlusion orientation and proposed the Orthogonal Orientation Regression loss which can get rid of the unfitness between occlusion representation and learning and further prompt the occlusion orientation learning. Finally, we apply a multi-scale loss together with our proposed orientation regression loss to guide the boundary and orientation path learning respectively. Experiments demonstrate that our proposed method achieves state-of-the-art results on PIOD and BSDS ownership datasets.

Submitted 10 May, 2022; v1 submitted 26 November, 2019; originally announced November 2019.

Comments: The new one has been republished as arXiv:2108.05722

arXiv:1908.05898 [pdf, other]

Occlusion-shared and Feature-separated Network for Occlusion Relationship Reasoning

Authors: Rui Lu, Feng Xue, Menghan Zhou, Anlong Ming, Yu Zhou

Abstract: Occlusion relationship reasoning demands a closed contour to express the object, and the orientation of each contour pixel to describe the order relationship between objects. Current CNN-based methods neglect two critical issues of the task: (1) the simultaneous existence of relevance and distinction between the two elements, i.e., occlusion edge and occlusion orientation; and (2) inadequate exploration of the orientation features. For these reasons, we propose the Occlusion-shared and Feature-separated Network (OFNet). On one hand, considering the relevance between edge and orientation, two sub-networks are designed to share the occlusion cue. On the other hand, the whole network is split into two paths to learn the high-level semantic features separately. Moreover, a contextual feature for orientation prediction is extracted, which represents the bilateral cue of the foreground and background areas. The bilateral cue is then fused with the occlusion cue to precisely locate the object regions. Finally, a stripe convolution is designed to further aggregate features from surrounding scenes of the occlusion edge. The proposed OFNet remarkably advances the state-of-the-art approaches on the PIOD and BSDS ownership datasets. The source code is available at https://github.com/buptlr/OFNet.

Submitted 16 August, 2019; originally announced August 2019.

Comments: Accepted by ICCV 2019. Code and pretrained model are available at https://github.com/buptlr/OFNet

arXiv:1904.10161 [pdf, other]

A Novel Multi-layer Framework for Tiny Obstacle Discovery

Authors: Feng Xue, Anlong Ming, Menghan Zhou, Yu Zhou

Abstract: For tiny obstacle discovery in a monocular image, the edge is a fundamental visual element. Nevertheless, for various reasons, e.g., noise and a color distribution similar to the background, it is still difficult to detect the edges of tiny obstacles at long distance. In this paper, we propose an obstacle-aware discovery method to recover the missing contours of these obstacles, which helps to obtain as many obstacle proposals as possible. First, by using visual cues in monocular images, several multi-layer regions are inferred to reveal the distances from the camera. Second, several novel obstacle-aware occlusion edge maps, which combine cues from each layer, are constructed to capture the contours of tiny obstacles. Third, to ensure the existence of the tiny obstacle proposals, the maps from all layers are used for proposal extraction. Finally, based on these proposals containing tiny obstacles, a novel obstacle-aware regressor is proposed to generate an obstacle occupied probability map with high confidence. Experimental comparisons on the Lost and Found dataset demonstrate the effectiveness of our approach, which achieves around 9.5% higher accuracy than FPHT and PHT and performance comparable to MergeNet. Moreover, our method outperforms the state-of-the-art algorithms and significantly improves the discovery ability for tiny obstacles at long distance.

Submitted 24 August, 2019; v1 submitted 23 April, 2019; originally announced April 2019.

Comments: Accepted to 2019 International Conference on Robotics and Automation (ICRA)

arXiv:1903.08890 [pdf, other]

Context-Constrained Accurate Contour Extraction for Occlusion Edge Detection

Authors: Rui Lu, Menghan Zhou, Anlong Ming, Yu Zhou

Abstract: Occlusion edge detection requires both accurate locations and context constraints of the contour. Existing CNN-based pipelines do not utilize adaptive methods to filter the noise introduced by low-level features. To address this dilemma, we propose a novel Context-constrained accurate Contour Extraction Network (CCENet). Spatial details are retained and contour-sensitive context is augmented through two extraction blocks, respectively. Then, an elaborately designed fusion module integrates the features, playing a complementary role to restore details and remove clutter. Finally, the weight response of an attention mechanism is used to enhance occluded contours and suppress noise. The proposed CCENet significantly surpasses state-of-the-art methods on the PIOD and BSDS ownership datasets for object edge detection and occlusion orientation detection.

Submitted 21 March, 2019; originally announced March 2019.

Comments: To appear in ICME 2019

arXiv:1612.00053 [pdf, other]

doi 10.1109/IROS.2011.6094693

A Novel Propulsion Method of Flexible Underwater Robots

Authors: Jun Shintake, Aiguo Ming, Makoto Shimojo

Abstract: This paper aims at improving the mobility of flexible underwater robots. For this purpose, a novel propulsion method using a planar structural vibration pattern is proposed and tested on two kinds of prototypes. The experimental results showed that movement in multiple directions is possible: forward, backward, turn, rotation, drift, and their combinations. These movements are achieved by a single structure with two actuators. The results also indicated the possibility of driving using eigenmodes, since movements were concentrated in the low driving frequency range. To investigate the relation between movement and structural vibration pattern, we established a simulation model.

Submitted 30 November, 2016; originally announced December 2016.

Comments: 8 pages, 21 figures in 2011 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

MSC Class: 68T40; ACM Class: I.2.9

arXiv:1005.1330 [pdf, ps, other]

Dimensional Tuning of the Magnetic-Structural Transition in A(Fe$_{1-x}$Co$_x$)$_2$As$_2$ (A=Sr,Ba)

Authors: Jack Gillett, Sitikantha D. Das, Paul Syers, Alison K. T. Ming, Jose I. Espeso, Chiara M. Petrone, Suchitra E. Sebastian

Abstract: A phase diagram of superconducting Sr(Fe$_{1-x}$Co$_x$)$_2$As$_2$ as a function of doping (x) is determined by a series of thermodynamic and transport measurements on single crystals. On comparison with a similar phase diagram for Ba(Fe$_{1-x}$Co$_x$)$_2$As$_2$ (Co-doped Ba122), we find that the increased dimensionality of Co-doped Sr122 results in a single first-order-like transition where the magnetic and structural transitions coincide, unlike the case of Co-doped Ba122 that exhibits split quasicontinuous magnetic and structural transitions. We relate this dimensionally-tuned splitting in the magnetic and structural transitions to the relative size of superconducting temperatures in these materials.

Submitted 8 May, 2010; originally announced May 2010.

Comments: 4 pages, 4 figures

Showing 1–21 of 21 results for author: Ming, A