-
GAURA: Generalizable Approach for Unified Restoration and Rendering of Arbitrary Views
Authors:
Vinayak Gupta,
Rongali Simhachala Venkata Girish,
Mukund Varma T,
Ayush Tewari,
Kaushik Mitra
Abstract:
Neural rendering methods can achieve near-photorealistic image synthesis of scenes from posed input images. However, when the images are imperfect, e.g., captured in very low-light conditions, state-of-the-art methods fail to reconstruct high-quality 3D scenes. Recent approaches have tried to address this limitation by modeling various degradation processes in the image formation model; however, this limits them to specific image degradations. In this paper, we propose a generalizable neural rendering method that can perform high-fidelity novel view synthesis under several degradations. Our method, GAURA, is learning-based and does not require any test-time scene-specific optimization. It is trained on a synthetic dataset that includes several degradation types. GAURA outperforms state-of-the-art methods on several benchmarks for low-light enhancement, dehazing, deraining, and on-par for motion deblurring. Further, our model can be efficiently fine-tuned to any new incoming degradation using minimal data. We thus demonstrate adaptation results on two unseen degradations, desnowing and removing defocus blur. Code and video results are available at vinayak-vg.github.io/GAURA.
Submitted 11 July, 2024;
originally announced July 2024.
-
A Blueprint Architecture of Compound AI Systems for Enterprise
Authors:
Eser Kandogan,
Sajjadur Rahman,
Nikita Bhutani,
Dan Zhang,
Rafael Li Chen,
Kushan Mitra,
Sairam Gurajada,
Pouya Pezeshkpour,
Hayate Iso,
Yanlin Feng,
Hannah Kim,
Chen Shen,
Jin Wang,
Estevam Hruschka
Abstract:
Large Language Models (LLMs) have showcased remarkable capabilities surpassing conventional NLP challenges, creating opportunities for use in production use cases. Towards this goal, there is a notable shift to building compound AI systems, wherein LLMs are integrated into an expansive software infrastructure with many components like models, retrievers, databases and tools. In this paper, we introduce a blueprint architecture for compound AI systems to operate in enterprise settings cost-effectively and feasibly. Our proposed architecture aims for seamless integration with existing compute and data infrastructure, with "stream" serving as the key orchestration concept to coordinate data and instructions among agents and other components. Task and data planners, respectively, break down, map, and optimize tasks and data to available agents and data sources defined in respective registries, given production constraints such as accuracy and latency.
Submitted 1 June, 2024;
originally announced June 2024.
-
Stereo-Knowledge Distillation from dpMV to Dual Pixels for Light Field Video Reconstruction
Authors:
Aryan Garg,
Raghav Mallampali,
Akshat Joshi,
Shrisudhan Govindarajan,
Kaushik Mitra
Abstract:
Dual pixels contain disparity cues arising from the defocus blur. This disparity information is useful for many vision tasks ranging from autonomous driving to 3D creative realism. However, directly estimating disparity from dual pixels is less accurate. This work hypothesizes that distilling high-precision dark stereo knowledge, implicitly or explicitly, to efficient dual-pixel student networks enables faithful reconstructions. This dark knowledge distillation should also alleviate stereo-synchronization setup and calibration costs while dramatically increasing parameter and inference time efficiency. We collect the first and largest 3-view dual-pixel video dataset, dpMV, to validate our explicit dark knowledge distillation hypothesis. We show that these methods outperform purely monocular solutions, especially in challenging foreground-background separation regions using faithful guidance from dual pixels. Finally, we demonstrate an unconventional use case unlocked by dpMV and implicit dark knowledge distillation from an ensemble of teachers for Light Field (LF) video reconstruction. Our LF video reconstruction method is the fastest and most temporally consistent to date. It remains competitive in reconstruction fidelity while offering many other essential properties like high parameter efficiency, implicit disocclusion handling, zero-shot cross-dataset transfer, geometrically consistent inference on higher spatial-angular resolutions, and adaptive baseline control. All source code is available at the anonymous repository https://github.com/Aryan-Garg.
Submitted 20 May, 2024;
originally announced May 2024.
-
MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results
Authors:
Yuekun Dai,
Dafeng Zhang,
Xiaoming Li,
Zongsheng Yue,
Chongyi Li,
Shangchen Zhou,
Ruicheng Feng,
Peiqing Yang,
Zhezhu Jin,
Guanqun Liu,
Chen Change Loy,
Lize Zhang,
Shuai Liu,
Chaoyu Feng,
Luyang Wang,
Shuan Chen,
Guangqi Shao,
Xiaotao Wang,
Lei Lei,
Qirui Yang,
Qihua Cheng,
Zhiqiang Xu,
Yihao Liu,
Huanjing Yue,
Jingyu Yang
, et al. (38 additional authors not shown)
Abstract:
The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photography and imaging (MIPI). Building on the achievements of the previous MIPI Workshops held at ECCV 2022 and CVPR 2023, we introduce our third MIPI challenge including three tracks focusing on novel image sensors and imaging algorithms. In this paper, we summarize and review the Nighttime Flare Removal track on MIPI 2024. In total, 170 participants were successfully registered, and 14 teams submitted results in the final testing phase. The developed solutions in this challenge achieved state-of-the-art performance on Nighttime Flare Removal. More details of this challenge and the link to the dataset can be found at https://mipi-challenge.org/MIPI2024/.
Submitted 27 May, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Authors:
Bhargav Ghanekar,
Salman Siddique Khan,
Pranav Sharma,
Shreyas Singh,
Vivek Boominathan,
Kaushik Mitra,
Ashok Veeraraghavan
Abstract:
Passive, compact, single-shot 3D sensing is useful in many application areas such as microscopy, medical imaging, surgical navigation, and autonomous driving where form factor, time, and power constraints can exist. Obtaining RGB-D scene information over a short imaging distance, in an ultra-compact form factor, and in a passive, snapshot manner is challenging. Dual-pixel (DP) sensors are a potential solution to achieve the same. DP sensors collect light rays from two different halves of the lens in two interleaved pixel arrays, thus capturing two slightly different views of the scene, like a stereo camera system. However, imaging with a DP sensor implies that the defocus blur size is directly proportional to the disparity seen between the views. This creates a trade-off between disparity estimation vs. deblurring accuracy. To improve this trade-off effect, we propose CADS (Coded Aperture Dual-Pixel Sensing), in which we use a coded aperture in the imaging lens along with a DP sensor. In our approach, we jointly learn an optimal coded pattern and the reconstruction algorithm in an end-to-end optimization setting. Our resulting CADS imaging system demonstrates improvement of >1.5dB PSNR in all-in-focus (AIF) estimates and 5-6% in depth estimation quality over naive DP sensing for a wide range of aperture settings. Furthermore, we build the proposed CADS prototypes for DSLR photography settings and in an endoscope and a dermoscope form factor. Our novel coded dual-pixel sensing approach demonstrates accurate RGB-D reconstruction results in simulations and real-world experiments in a passive, snapshot, and compact manner.
Submitted 30 March, 2024; v1 submitted 28 February, 2024;
originally announced February 2024.
-
MEGAnno+: A Human-LLM Collaborative Annotation System
Authors:
Hannah Kim,
Kushan Mitra,
Rafael Li Chen,
Sajjadur Rahman,
Dan Zhang
Abstract:
Large language models (LLMs) can label data faster and cheaper than humans for various NLP tasks. Despite their prowess, LLMs may fall short in understanding of complex, sociocultural, or domain-specific context, potentially leading to incorrect annotations. Therefore, we advocate a collaborative approach where humans and LLMs work together to produce reliable and high-quality labels. We present MEGAnno+, a human-LLM collaborative annotation system that offers effective LLM agent and annotation management, convenient and robust LLM annotation, and exploratory verification of LLM labels by humans.
Submitted 27 February, 2024;
originally announced February 2024.
-
Characterizing Large Language Models as Rationalizers of Knowledge-intensive Tasks
Authors:
Aditi Mishra,
Sajjadur Rahman,
Hannah Kim,
Kushan Mitra,
Estevam Hruschka
Abstract:
Large language models (LLMs) are proficient at generating fluent text with minimal task-specific supervision. Yet, their ability to provide well-grounded rationalizations for knowledge-intensive tasks remains under-explored. Such tasks, like commonsense multiple-choice questions, require rationales based on world knowledge to support predictions and refute alternate options. We consider the task of generating knowledge-guided rationalization in natural language by using expert-written examples in a few-shot manner. Surprisingly, crowd-workers preferred knowledge-grounded rationales over crowdsourced rationalizations, citing their factuality, sufficiency, and comprehensive refutations. Although LLMs-generated rationales were preferable, further improvements in conciseness and novelty are required. In another study, we show how rationalization of incorrect model predictions erodes humans' trust in LLM-generated rationales. Motivated by these observations, we create a two-stage pipeline to review task predictions and eliminate potential incorrect decisions before rationalization, enabling trustworthy rationale generation.
Submitted 31 January, 2024; v1 submitted 8 November, 2023;
originally announced November 2023.
-
Spectrum-inspired Low-light Image Translation for Saliency Detection
Authors:
Kitty Varghese,
Sudarshan Rajagopalan,
Mohit Lamba,
Kaushik Mitra
Abstract:
Saliency detection methods are central to several real-world applications such as robot navigation and satellite imagery. However, the performance of existing methods deteriorates under low-light conditions because training datasets mostly consist of well-lit images. One possible solution is to collect a new dataset for low-light conditions. This involves pixel-level annotation, which is not only tedious and time-consuming but also infeasible if a huge training corpus is required. We propose a technique that performs classical band-pass filtering in the Fourier space to transform well-lit images to low-light images and use them as a proxy for real low-light images. Unlike popular deep learning approaches, which require learning thousands of parameters and enormous amounts of training data, the proposed transformation is fast, simple, and easy to extend to other tasks such as low-light depth estimation. Our experiments show that the state-of-the-art saliency detection and depth estimation networks trained on our proxy low-light images perform significantly better on real low-light images than networks trained using existing strategies.
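As a rough illustration of the "classical band-pass filtering in the Fourier space" mentioned in this abstract, the sketch below applies an ideal radial band-pass mask to a single-channel image with NumPy. It is a generic filter, not the authors' transformation: the function name `band_pass_filter`, the hard-edged ring mask, and the cutoff parameters `low` and `high` are all assumptions made for the example, and the abstract does not say which band is kept or how the result is mapped to a low-light image.

```python
import numpy as np

def band_pass_filter(image, low, high):
    """Keep only spatial frequencies whose radius lies in [low, high].

    `image` is a 2-D float array; `low` and `high` are radii in frequency
    bins measured from the zero-frequency (DC) term. All names here are
    hypothetical and chosen for illustration only.
    """
    # Move the DC term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    # Distance of every frequency bin from the centred DC term.
    radius = np.hypot(yy - h // 2, xx - w // 2)
    mask = (radius >= low) & (radius <= high)
    # Undo the shift and return to the spatial domain.
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    return np.real(filtered)
```

With `low` greater than zero the DC term is discarded, so a constant image maps to zero; with `low=0` and a `high` beyond the largest radius the input is returned unchanged.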
Submitted 17 March, 2023;
originally announced March 2023.
-
Acela: Predictable Datacenter-level Maintenance Job Scheduling
Authors:
Yi Ding,
Aijia Gao,
Thibaud Ryden,
Kaushik Mitra,
Sukumar Kalmanje,
Yanai Golany,
Michael Carbin,
Henry Hoffmann
Abstract:
Datacenter operators ensure fair and regular server maintenance by using automated processes to schedule maintenance jobs to complete within a strict time budget. Automating this scheduling problem is challenging because maintenance job duration varies based on both job type and hardware. While it is tempting to use prior machine learning techniques for predicting job duration, we find that the structure of the maintenance job scheduling problem creates a unique challenge. In particular, we show that prior machine learning methods that produce the lowest-error predictions do not produce the best scheduling outcomes due to asymmetric costs. Specifically, underpredicting maintenance job duration results in more servers being taken offline and longer server downtime than overpredicting maintenance job duration. The system cost of underprediction is much larger than that of overprediction.
We present Acela, a machine learning system for predicting maintenance job duration, which uses quantile regression to bias duration predictions toward overprediction. We integrate Acela into a maintenance job scheduler and evaluate it on datasets from large-scale, production datacenters. Compared to machine learning based predictors from prior work, Acela reduces the number of servers that are taken offline by 1.87-4.28X, and reduces the server offline time by 1.40-2.80X.
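The asymmetric-cost idea behind quantile regression can be sketched with the standard quantile (pinball) loss. The code below is a minimal illustration, not Acela's implementation: the function names, the use of NumPy, and the quantile level 0.9 in the usage note are assumptions made for the example.

```python
import numpy as np

def pinball_loss(y_true, y_pred, q):
    """Mean quantile (pinball) loss at quantile level q in (0, 1).

    Underprediction (y_true > y_pred) costs q per unit of error, while
    overprediction costs (1 - q), so q > 0.5 penalizes underprediction more.
    """
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.maximum(q * diff, (q - 1.0) * diff)))

def best_constant_prediction(durations, q):
    """The constant that minimizes pinball loss is the empirical q-quantile."""
    return float(np.quantile(durations, q))
```

For example, with `q = 0.9` a job that takes 10 minutes but is predicted at 8 incurs a loss of 1.8, whereas predicting 12 incurs only 0.2, so a model trained on this loss is pushed toward overprediction.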
Submitted 9 December, 2022;
originally announced December 2022.
-
Towards Realistic Underwater Dataset Generation and Color Restoration
Authors:
Neham Jain,
Gopi Matta,
Kaushik Mitra
Abstract:
Recovery of true color from underwater images is an ill-posed problem. This is because the wide-band attenuation coefficients for the RGB color channels depend on object range, reflectance, etc. which are difficult to model. Also, there is backscattering due to suspended particles in water. Thus, most existing deep-learning based color restoration methods, which are trained on synthetic underwater datasets, do not perform well on real underwater data. This can be attributed to the fact that synthetic data cannot accurately represent real conditions. To address this issue, we use an image to image translation network to bridge the gap between the synthetic and real domains by translating images from synthetic underwater domain to real underwater domain. Using this multimodal domain adaptation technique, we create a dataset that can capture a diverse array of underwater conditions. We then train a simple but effective CNN based network on our domain adapted dataset to perform color restoration. Code and pre-trained models can be accessed at https://github.com/nehamjain10/TRUDGCR
Submitted 16 December, 2022; v1 submitted 27 November, 2022;
originally announced November 2022.
-
LWGNet: Learned Wirtinger Gradients for Fourier Ptychographic Phase Retrieval
Authors:
Atreyee Saha,
Salman S Khan,
Sagar Sehrawat,
Sanjana S Prabhu,
Shanti Bhattacharya,
Kaushik Mitra
Abstract:
Fourier Ptychographic Microscopy (FPM) is an imaging procedure that overcomes the traditional limit on Space-Bandwidth Product (SBP) of conventional microscopes through computational means. It utilizes multiple images captured using a low numerical aperture (NA) objective and enables high-resolution phase imaging through frequency domain stitching. Existing FPM reconstruction methods can be broadly categorized into two approaches: iterative optimization based methods, which are based on the physics of the forward imaging model, and data-driven methods which commonly employ a feed-forward deep learning framework. We propose a hybrid model-driven residual network that combines the knowledge of the forward imaging system with a deep data-driven network. Our proposed architecture, LWGNet, unrolls traditional Wirtinger flow optimization algorithm into a novel neural network design that enhances the gradient images through complex convolutional blocks. Unlike other conventional unrolling techniques, LWGNet uses fewer stages while performing at par or even better than existing traditional and deep learning techniques, particularly, for low-cost and low dynamic range CMOS sensors. This improvement in performance for low-bit depth and low-cost sensors has the potential to bring down the cost of FPM imaging setup significantly. Finally, we show consistently improved performance on our collected real data.
Submitted 16 August, 2022; v1 submitted 8 August, 2022;
originally announced August 2022.
-
Towards Fast and Light-Weight Restoration of Dark Images
Authors:
Mohit Lamba,
Atul Balaji,
Kaushik Mitra
Abstract:
The ability to capture good quality images in the dark and near-zero lux conditions has been a long-standing pursuit of the computer vision community. The seminal work by Chen et al. [5] has especially caused renewed interest in this area, resulting in methods that build on top of their work in a bid to improve the reconstruction. However, for practical utility and deployment of low-light enhancement algorithms on edge devices such as embedded systems, surveillance cameras, autonomous robots and smartphones, the solution must respect additional constraints such as limited GPU memory and processing power. With this in mind, we propose a deep neural network architecture that aims to strike a balance between the network latency, memory utilization, model parameters, and reconstruction quality. The key idea is to forbid computations in the High-Resolution (HR) space and limit them to a Low-Resolution (LR) space. However, doing the bulk of computations in the LR space causes artifacts in the restored image. We thus propose Pack and UnPack operations, which allow us to effectively transit between the HR and LR spaces without incurring many artifacts in the restored image. We show that we can enhance a full resolution, 2848 x 4256, extremely dark single image in the ballpark of 3 seconds even on a CPU. We achieve this with 2 - 7x fewer model parameters, 2 - 3x lower memory utilization, and a 5 - 20x speed-up, while maintaining a competitive image reconstruction quality compared to the state-of-the-art algorithms.
Submitted 28 November, 2020;
originally announced November 2020.
-
A Unified Framework for Compressive Video Recovery from Coded Exposure Techniques
Authors:
Prasan Shedligeri,
Anupama S,
Kaushik Mitra
Abstract:
Several coded exposure techniques have been proposed for acquiring high frame rate videos at low bandwidth. Most recently, a Coded-2-Bucket camera has been proposed that can acquire two compressed measurements in a single exposure, unlike previously proposed coded exposure techniques, which can acquire only a single measurement. Although two measurements are better than one for an effective video recovery, we are yet unaware of the clear advantage of two measurements, either quantitatively or qualitatively. Here, we propose a unified learning-based framework to make such a qualitative and quantitative comparison between those which capture only a single coded image (Flutter Shutter, Pixel-wise coded exposure) and those that capture two measurements per exposure (C2B). Our learning-based framework consists of a shift-variant convolutional layer followed by a fully convolutional deep neural network. Our proposed unified framework achieves the state of the art reconstructions in all three sensing techniques. Further analysis shows that when most scene points are static, the C2B sensor has a significant advantage over acquiring a single pixel-wise coded measurement. However, when most scene points undergo motion, the C2B sensor has only a marginal benefit over the single pixel-wise coded exposure measurement.
Submitted 10 November, 2020;
originally announced November 2020.
-
FlatNet: Towards Photorealistic Scene Reconstruction from Lensless Measurements
Authors:
Salman S. Khan,
Varun Sundar,
Vivek Boominathan,
Ashok Veeraraghavan,
Kaushik Mitra
Abstract:
Lensless imaging has emerged as a potential solution towards realizing ultra-miniature cameras by eschewing the bulky lens in a traditional camera. Without a focusing lens, the lensless cameras rely on computational algorithms to recover the scenes from multiplexed measurements. However, the current iterative-optimization-based reconstruction algorithms produce noisier and perceptually poorer images. In this work, we propose a non-iterative deep learning based reconstruction approach that results in orders of magnitude improvement in image quality for lensless reconstructions. Our approach, called FlatNet, lays down a framework for reconstructing high-quality photorealistic images from mask-based lensless cameras, where the camera's forward model formulation is known. FlatNet consists of two stages: (1) an inversion stage that maps the measurement into a space of intermediate reconstruction by learning parameters within the forward model formulation, and (2) a perceptual enhancement stage that improves the perceptual quality of this intermediate reconstruction. These stages are trained together in an end-to-end manner. We show high-quality reconstructions by performing extensive experiments on real and challenging scenes using two different types of lensless prototypes: one which uses a separable forward model and another, which uses a more general non-separable cropped-convolution model. Our end-to-end approach is fast, produces photorealistic reconstructions, and is easy to adopt for other mask-based lensless cameras.
Submitted 29 October, 2020;
originally announced October 2020.
-
Video Reconstruction by Spatio-Temporal Fusion of Blurred-Coded Image Pair
Authors:
S Anupama,
Prasan Shedligeri,
Abhishek Pal,
Kaushik Mitra
Abstract:
Learning-based methods have enabled the recovery of a video sequence from a single motion-blurred image or a single coded exposure image. Recovering video from a single motion-blurred image is a very ill-posed problem and the recovered video usually has many artifacts. In addition to this, the direction of motion is lost and it results in motion ambiguity. However, it has the advantage of fully preserving the information in the static parts of the scene. The traditional coded exposure framework is better-posed but it only samples a fraction of the space-time volume, which is at best 50% of the space-time volume. Here, we propose to use the complementary information present in the fully-exposed (blurred) image along with the coded exposure image to recover a high fidelity video without any motion ambiguity. Our framework consists of a shared encoder followed by an attention module to selectively combine the spatial information from the fully-exposed image with the temporal information from the coded image, which is then super-resolved to recover a non-ambiguous high-quality video. The input to our algorithm is a fully-exposed and coded image pair. Such an acquisition system already exists in the form of a Coded-two-bucket (C2B) camera. We demonstrate that our proposed deep learning approach using blurred-coded image pair produces much better results than those from just a blurred image or just a coded image.
Submitted 13 November, 2020; v1 submitted 20 October, 2020;
originally announced October 2020.
-
UDC 2020 Challenge on Image Restoration of Under-Display Camera: Methods and Results
Authors:
Yuqian Zhou,
Michael Kwan,
Kyle Tolentino,
Neil Emerton,
Sehoon Lim,
Tim Large,
Lijiang Fu,
Zhihong Pan,
Baopu Li,
Qirui Yang,
Yihao Liu,
Jigang Tang,
Tao Ku,
Shibin Ma,
Bingnan Hu,
Jiarong Wang,
Densen Puthussery,
Hrishikesh P S,
Melvin Kuriakose,
Jiji C V,
Varun Sundar,
Sumanth Hegde,
Divya Kothandaraman,
Kaushik Mitra,
Akashdeep Jassal
, et al. (20 additional authors not shown)
Abstract:
This paper is the report of the first Under-Display Camera (UDC) image restoration challenge in conjunction with the RLQ workshop at ECCV 2020. The challenge is based on a newly-collected database of Under-Display Camera. The challenge tracks correspond to two types of display: a 4k Transparent OLED (T-OLED) and a phone Pentile OLED (P-OLED). About 150 teams registered for the challenge, and eight and nine teams submitted results during the testing phase for the two tracks, respectively. The results reported in the paper represent state-of-the-art performance in Under-Display Camera restoration. Datasets and paper are available at https://yzhouas.github.io/projects/UDC/udc.html.
Submitted 18 August, 2020;
originally announced August 2020.
-
Deep Atrous Guided Filter for Image Restoration in Under Display Cameras
Authors:
Varun Sundar,
Sumanth Hegde,
Divya Kothandaraman,
Kaushik Mitra
Abstract:
Under Display Cameras present a promising opportunity for phone manufacturers to achieve bezel-free displays by positioning the camera behind semi-transparent OLED screens. Unfortunately, such imaging systems suffer from severe image degradation due to light attenuation and diffraction effects. In this work, we present Deep Atrous Guided Filter (DAGF), a two-stage, end-to-end approach for image restoration in UDC systems. A Low-Resolution Network first restores image quality at low-resolution, which is subsequently used by the Guided Filter Network as a filtering input to produce a high-resolution output. Besides the initial downsampling, our low-resolution network uses multiple, parallel atrous convolutions to preserve spatial resolution and emulates multi-scale processing. Our approach's ability to directly train on megapixel images results in significant performance improvement. We additionally propose a simple simulation scheme to pre-train our model and boost performance. Our overall framework ranks 2nd and 5th in the RLQ-TOD'20 UDC Challenge for POLED and TOLED displays, respectively.
Submitted 1 September, 2020; v1 submitted 14 August, 2020;
originally announced August 2020.
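The parallel atrous (dilated) convolutions mentioned in the abstract above can be illustrated with a minimal 1-D sketch. This is not the DAGF implementation: the function names `atrous_conv1d` and `parallel_atrous`, the zero padding, the dilation rates, and the summation of branches are all assumptions made for illustration.

```python
def atrous_conv1d(signal, kernel, dilation):
    """Dilated convolution (cross-correlation form) with zero padding.

    The output has the same length as the input, so spatial resolution
    is preserved while the receptive field grows with the dilation rate.
    """
    n, k = len(signal), len(kernel)
    out = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart.
            idx = i + (j - k // 2) * dilation
            if 0 <= idx < n:  # samples outside the signal count as zero
                acc += weight * signal[idx]
        out.append(acc)
    return out


def parallel_atrous(signal, kernel, dilations=(1, 2, 4)):
    """Hypothetical multi-scale block: sum of parallel dilated branches."""
    branches = [atrous_conv1d(signal, kernel, d) for d in dilations]
    return [sum(values) for values in zip(*branches)]
```

Each branch sees the same input at a different effective scale without any further downsampling, which is the sense in which such a block emulates multi-scale processing.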
-
Monocular Retinal Depth Estimation and Joint Optic Disc and Cup Segmentation using Adversarial Networks
Authors:
Sharath M Shankaranarayana,
Keerthi Ram,
Kaushik Mitra,
Mohanasankar Sivaprakasam
Abstract:
One of the important parameters for the assessment of glaucoma is optic nerve head (ONH) evaluation, which usually involves depth estimation and subsequent optic disc and cup boundary extraction. Depth is usually obtained explicitly from imaging modalities like optical coherence tomography (OCT), and it is very challenging to estimate from a single RGB image. To this end, we propose a novel method using an adversarial network to predict the depth map from a single image. The proposed depth estimation technique is trained and evaluated using individual retinal images from the INSPIRE-stereo dataset. We obtain a very high average correlation coefficient of 0.92 upon five-fold cross validation, outperforming the state of the art. We then use the depth estimation process as a proxy task for joint optic disc and cup segmentation.
Submitted 15 July, 2020;
originally announced July 2020.
-
A Blockchain-based Approach for Assessing Compliance with SLA-guaranteed IoT Services
Authors:
A. Alzubaidi,
K. Mitra,
P. Patel,
E. Solaiman
Abstract:
Within cloud-based internet of things (IoT) applications, cloud providers typically employ Service Level Agreements (SLAs) to ensure the quality of their provisioned services. Similar to any other contractual method, an SLA is not immune to breaches. Ideally, an SLA stipulates consequences (e.g. penalties) imposed on cloud providers when they fail to conform to SLA terms. The current practice assumes trust in service providers to acknowledge SLA breach incidents and execute the associated consequences. Recently, the Blockchain paradigm has introduced compelling capabilities that may enable us to address SLA enforcement more elegantly. This paper proposes and implements a blockchain-based approach for assessing SLA compliance and enforcing consequences. It employs a diagnostic accuracy method for validating the dependability of the proposed solution. The paper also benchmarks Hyperledger Fabric to investigate its feasibility as an underlying blockchain infrastructure concerning latency and transaction success/fail rates.
Submitted 27 June, 2020;
originally announced June 2020.
-
Pyramidal Edge-maps and Attention based Guided Thermal Super-resolution
Authors:
Honey Gupta,
Kaushik Mitra
Abstract:
Guided super-resolution (GSR) of thermal images using visible range images is challenging because of the difference in the spectral-range between the images. This in turn means that there is significant texture-mismatch between the images, which manifests as blur and ghosting artifacts in the super-resolved thermal image. To tackle this, we propose a novel algorithm for GSR based on pyramidal edge-maps extracted from the visible image. Our proposed network has two sub-networks. The first sub-network super-resolves the low-resolution thermal image while the second obtains edge-maps from the visible image at a growing perceptual scale and integrates them into the super-resolution sub-network with the help of attention-based fusion. Extraction and integration of multi-level edges allows the super-resolution network to process texture-to-object level information progressively, enabling more straightforward identification of overlapping edges between the input images. Extensive experiments show that our model outperforms the state-of-the-art GSR methods, both quantitatively and qualitatively.
Submitted 30 September, 2020; v1 submitted 13 March, 2020;
originally announced March 2020.
-
Optimal HDR and Depth from Dual Cameras
Authors:
Pradyumna Chari,
Anil Kumar Vadathya,
Kaushik Mitra
Abstract:
Dual camera systems have assisted in the proliferation of various applications, such as optical zoom, low-light imaging and High Dynamic Range (HDR) imaging. In this work, we explore an optimal method for capturing the scene HDR and disparity map using dual camera setups. Hasinoff et al. (2010) have developed a noise optimal framework for HDR capture from a single camera. We generalize this to the dual camera set-up for estimating both HDR and disparity map. It may seem that dual camera systems can capture HDR in a shorter time. However, disparity estimation is a necessary step, which requires overlap among the images captured by the two cameras. This may lead to an increase in the capture time. To address this conflicting requirement, we propose a novel framework to find the optimal exposure and ISO sequence by minimizing the capture time under the constraints of an upper bound on the disparity error and a lower bound on the per-exposure SNR. We show that the resulting optimization problem is non-convex in general and propose an appropriate initialization technique. To obtain the HDR and disparity map from the optimal capture sequence, we propose a pipeline which alternates between estimating the camera ICRFs and the scene disparity map. We demonstrate that our optimal capture sequence leads to better results than other possible capture sequences. Our results are also close to those obtained by capturing the full stereo stack spanning the entire dynamic range. Finally, we present for the first time a stereo HDR dataset consisting of dense ISO and exposure stack captured from a smartphone dual camera. The dataset consists of 6 scenes, with an average of 142 exposure-ISO image sequence per scene.
Submitted 12 March, 2020;
originally announced March 2020.
-
Harnessing Multi-View Perspective of Light Fields for Low-Light Imaging
Authors:
Mohit Lamba,
Kranthi Kumar,
Kaushik Mitra
Abstract:
Light Field (LF) offers unique advantages such as post-capture refocusing and depth estimation, but low-light conditions limit these capabilities. To restore low-light LFs we should harness the geometric cues present in different LF views, which is not possible using single-frame low-light enhancement techniques. We, therefore, propose a deep neural network for Low-Light Light Field (L3F) restoration, which we refer to as L3Fnet. The proposed L3Fnet not only performs the necessary visual enhancement of each LF view but also preserves the epipolar geometry across views. We achieve this by adopting a two-stage architecture for L3Fnet. Stage-I looks at all the LF views to encode the LF geometry. This encoded information is then used in Stage-II to reconstruct each LF view. To facilitate learning-based techniques for low-light LF imaging, we collected a comprehensive LF dataset of various scenes. For each scene, we captured four LFs, one with near-optimal exposure and ISO settings and the others at different levels of low-light conditions varying from low to extreme low-light settings. The effectiveness of the proposed L3Fnet is supported by both visual and numerical comparisons on this dataset. To further analyze the performance of low-light reconstruction methods, we also propose an L3F-wild dataset that contains LF captured late at night with almost zero lux values. No ground truth is available in this dataset. To perform well on the L3F-wild dataset, any method must adapt to the light level of the captured scene. To do this we propose a novel pre-processing block that makes L3Fnet robust to various degrees of low-light conditions. Lastly, we show that L3Fnet can also be used for low-light enhancement of single-frame images, despite it being engineered for LF data. We do so by converting the single-frame DSLR image into a form suitable for L3Fnet, which we call a pseudo-LF.
Submitted 8 December, 2020; v1 submitted 5 March, 2020;
originally announced March 2020.
-
Multi-Patch Aggregation Models for Resampling Detection
Authors:
Mohit Lamba,
Kaushik Mitra
Abstract:
Images captured nowadays are of varying dimensions, with smartphones and DSLRs allowing users to choose from a list of available image resolutions. It is therefore imperative for forensic algorithms such as resampling detection to scale well for images of varying dimensions. However, in our experiments, we observed that many state-of-the-art forensic algorithms are sensitive to image size and their performance quickly degenerates when operated on images of diverse dimensions despite re-training them using multiple image sizes. To handle this issue, we propose a novel pooling strategy called ITERATIVE POOLING. This pooling strategy can dynamically adjust input tensors in a discrete manner without much loss of information as in ROI Max-pooling. This pooling strategy can be used with any of the existing deep models and, for demonstration purposes, we show its utility on ResNet-18 for the case of resampling detection, a fundamental operation for any sort of image manipulation. Compared to existing strategies and Max-pooling it gives up to 7-8% improvement on public datasets.
Submitted 3 March, 2020;
originally announced March 2020.
-
Image Aesthetics Assessment using Multi Channel Convolutional Neural Networks
Authors:
Nishi Doshi,
Gitam Shikhenawis,
Suman K Mitra
Abstract:
Image Aesthetics Assessment is one of the emerging domains in research. The domain deals with the classification of images into categories on the basis of how pleasant they are for users to view. In this article, the focus is on categorizing images into high-quality and low-quality images. Deep convolutional neural networks are used to classify the images. Instead of using just the raw image as input, different crops and saliency maps of the images are also used as input to the proposed multi-channel CNN architecture. The experiments reported on the widely used AVA database show improvement in aesthetic assessment performance over existing approaches.
Submitted 21 November, 2019;
originally announced November 2019.
-
Unsupervised Single Image Underwater Depth Estimation
Authors:
Honey Gupta,
Kaushik Mitra
Abstract:
Depth estimation from a single underwater image is one of the most challenging problems and is highly ill-posed. Due to the absence of large generalized underwater depth datasets and the difficulty in obtaining ground truth depth-maps, supervised learning techniques such as direct depth regression cannot be used. In this paper, we propose an unsupervised method for depth estimation from a single underwater image taken 'in the wild' by using haze as a cue for depth. Our approach is based on indirect depth-map estimation where we learn the mapping functions between unpaired RGB-D terrestrial images and arbitrary underwater images to estimate the required depth-map. We propose a method which is based on the principles of cycle-consistent learning and uses dense-block based auto-encoders as generator networks. We evaluate and compare our method both quantitatively and qualitatively on various underwater images with diverse attenuation and scattering conditions and show that our method produces state-of-the-art results for unsupervised depth estimation from a single underwater image.
Submitted 28 May, 2019; v1 submitted 25 May, 2019;
originally announced May 2019.
-
Fully Convolutional Networks for Monocular Retinal Depth Estimation and Optic Disc-Cup Segmentation
Authors:
Sharath M Shankaranarayana,
Keerthi Ram,
Kaushik Mitra,
Mohanasankar Sivaprakasam
Abstract:
Glaucoma is a serious ocular disorder for which the screening and diagnosis are carried out by the examination of the optic nerve head (ONH). The color fundus image (CFI) is the most common modality used for ocular screening. In CFI, the central r
Submitted 4 February, 2019;
originally announced February 2019.
-
Neural Decoder for Topological Codes using Pseudo-Inverse of Parity Check Matrix
Authors:
Chaitanya Chinni,
Abhishek Kulkarni,
Dheeraj M. Pai,
Kaushik Mitra,
Pradeep Kiran Sarvepalli
Abstract:
Recent developments in the field of deep learning have motivated many researchers to apply these methods to problems in quantum information. Torlai and Melko first proposed a decoder for surface codes based on neural networks. Since then, many other researchers have applied neural networks to study a variety of problems in the context of decoding. An important development in this regard was due to Varsamopoulos et al., who proposed a two-step decoder using neural networks. Subsequent work of Maskara et al. used the same concept for decoding for various noise models. We propose a similar two-step neural decoder using the inverse parity-check matrix for topological color codes. We show that it outperforms the state-of-the-art performance of non-neural decoders for the independent Pauli errors noise model on a 2D hexagonal color code. Our final decoder is independent of the noise model and achieves a threshold of $10 \%$. Our result is comparable to the recent work on neural decoders for quantum error correction by Maskara et al. It appears that our decoder has significant advantages with respect to training cost and complexity of the network for higher lengths when compared to that of Maskara et al. Our proposed method can also be extended to arbitrary dimension and other stabilizer codes.
Submitted 24 January, 2019; v1 submitted 21 January, 2019;
originally announced January 2019.
-
A Unified Learning Based Framework for Light Field Reconstruction from Coded Projections
Authors:
Anil Kumar Vadathya,
Sharath Girish,
Kaushik Mitra
Abstract:
Light field presents a rich way to represent the 3D world by capturing the spatio-angular dimensions of the visual signal. However, the popular way of capturing light field (LF) via a plenoptic camera presents a spatio-angular resolution trade-off. Computational imaging techniques such as compressive light field and programmable coded aperture reconstruct full sensor resolution LF from coded projections obtained by multiplexing the incoming spatio-angular light field. Here, we present a unified learning framework that can reconstruct LF from a variety of multiplexing schemes with a minimal number of coded images as input. We consider three light field capture schemes: the heterodyne capture scheme with the code placed near the sensor, the coded aperture scheme with the code at the camera aperture, and finally the dual exposure scheme of capturing a focus-defocus pair where there is no explicit coding. Our algorithm consists of three stages: 1) we recover the all-in-focus image from the coded image; 2) we estimate the disparity maps for all the LF views from the coded image and the all-in-focus image; 3) we then render the LF by warping the all-in-focus image using disparity maps and refine it. For these three stages we propose three deep neural networks: ViewNet, DisparityNet and RefineNet. Our reconstructions show that our learning algorithm achieves state-of-the-art results for all three multiplexing schemes. In particular, our LF reconstructions from a focus-defocus pair are comparable to other learning-based view synthesis approaches from multiple images. Thus, our work paves the way for capturing high-resolution LF (~ a megapixel) using conventional cameras such as DSLRs. Please check our supplementary materials $\href{https://docs.google.com/presentation/d/1Vr-F8ZskrSd63tvnLfJ2xmEXY6OBc1Rll3XeOAtc11I/}{online}$ to better appreciate the reconstructed light fields.
Submitted 18 October, 2019; v1 submitted 26 December, 2018;
originally announced December 2018.
-
Photorealistic Image Reconstruction from Hybrid Intensity and Event based Sensor
Authors:
Prasan A Shedligeri,
Kaushik Mitra
Abstract:
Event sensors output a stream of asynchronous brightness changes (called "events") at a very high temporal rate. Previous works on recovering the lost intensity information from the event sensor data have heavily relied on the event stream, which makes the reconstructed images non-photorealistic and also susceptible to noise in the event stream. We propose to reconstruct photorealistic intensity images from a hybrid sensor consisting of a low frame rate conventional camera, which has the scene texture information, along with the event sensor. To accomplish our task, we warp the low frame rate intensity images to temporally dense locations of the event data by estimating a spatially dense scene depth and temporally dense sensor ego-motion. The results obtained from our algorithm are more photorealistic compared to any of the previous state-of-the-art algorithms. We also demonstrate our algorithm's robustness to abrupt camera motion and noise in the event sensor data.
Submitted 11 February, 2019; v1 submitted 16 May, 2018;
originally announced May 2018.
-
Phase retrieval for Fourier Ptychography under varying amount of measurements
Authors:
Lokesh Boominathan,
Mayug Maniparambil,
Honey Gupta,
Rahul Baburajan,
Kaushik Mitra
Abstract:
Fourier Ptychography is a recently proposed imaging technique that yields high-resolution images by computationally transcending the diffraction blur of an optical system. At the crux of this method is the phase retrieval algorithm, which is used for computationally stitching together low-resolution images taken under varying illumination angles of a coherent light source. However, the traditional iterative phase retrieval technique relies heavily on the initialization and also needs a good amount of overlap in the Fourier domain for the successively captured low-resolution images, thus increasing the acquisition time and data. We show that an auto-encoder based architecture can be adaptively trained for phase retrieval under both low overlap, where traditional techniques completely fail, and at higher levels of overlap. For the low overlap case, we show that a supervised deep learning technique using an autoencoder generator is a good choice for solving the Fourier ptychography problem. For the high overlap case, we show that optimizing the generator for reducing the forward model error is an appropriate choice. Using simulations for the challenging case of uncorrelated phase and amplitude, we show that our method outperforms many of the previously proposed Fourier ptychography phase retrieval techniques.
Submitted 9 May, 2018;
originally announced May 2018.
-
Dynamic Vision Sensors for Human Activity Recognition
Authors:
Stefanie Anna Baby,
Bimal Vinod,
Chaitanya Chinni,
Kaushik Mitra
Abstract:
Unlike conventional cameras which capture video at a fixed frame rate, Dynamic Vision Sensors (DVS) record only changes in pixel intensity values. The output of DVS is simply a stream of discrete ON/OFF events based on the polarity of change in its pixel values. DVS has many attractive features such as low power consumption, high temporal resolution, high dynamic range and fewer storage requirements. All these make DVS a very promising camera for potential applications in wearable platforms where power consumption is a major concern.
In this paper, we explore the feasibility of using DVS for Human Activity Recognition (HAR). We propose to use the various slices (such as $x-y$, $x-t$, and $y-t$) of the DVS video as a feature map for HAR and denote them as Motion Maps. We show that fusing motion maps with Motion Boundary Histogram (MBH) gives good performance on the benchmark DVS dataset as well as on a real DVS gesture dataset collected by us. Interestingly, the performance of DVS is comparable to that of conventional videos although DVS captures only sparse motion information.
Submitted 13 March, 2018;
originally announced March 2018.
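One way to picture the $x-y$, $x-t$, and $y-t$ motion maps described in the abstract above is as projections of the event stream onto each pair of axes. The sketch below is only an assumed reading of that idea, not the authors' feature extractor: the function name `motion_maps`, the `(x, y, t)` event format, and the uniform time binning are hypothetical.

```python
def motion_maps(events, width, height, n_bins):
    """Accumulate DVS events into x-y, x-t and y-t count maps.

    `events` is an iterable of (x, y, t) tuples with integer pixel
    coordinates; polarity is ignored in this simplified sketch.
    """
    events = list(events)
    xy = [[0] * width for _ in range(height)]   # rows: y, cols: x
    xt = [[0] * width for _ in range(n_bins)]   # rows: time bin, cols: x
    yt = [[0] * height for _ in range(n_bins)]  # rows: time bin, cols: y
    if not events:
        return xy, xt, yt
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1.0  # avoid division by zero
    for x, y, t in events:
        # Map the timestamp to a bin index in [0, n_bins - 1].
        b = min(int((t - t_min) / span * n_bins), n_bins - 1)
        xy[y][x] += 1
        xt[b][x] += 1
        yt[b][y] += 1
    return xy, xt, yt
```

The three maps can then be treated as ordinary 2-D images and passed to any image descriptor or classifier.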
-
Solving Inverse Computational Imaging Problems using Deep Pixel-level Prior
Authors:
Akshat Dave,
Anil Kumar Vadathya,
Ramana Subramanyam,
Rahul Baburajan,
Kaushik Mitra
Abstract:
Signal reconstruction is a challenging aspect of computational imaging as it often involves solving ill-posed inverse problems. Recently, deep feed-forward neural networks have led to state-of-the-art results in solving various inverse imaging problems. However, being task specific, these networks have to be learned for each inverse problem. On the other hand, a more flexible approach would be to learn a deep generative model once and then use it as a signal prior for solving various inverse problems. We show that among the various state of the art deep generative models, autoregressive models are especially suitable for our purpose for the following reasons. First, they explicitly model the pixel level dependencies and hence are capable of reconstructing low-level details such as texture patterns and edges better. Second, they provide an explicit expression for the image prior which can then be used for MAP based inference along with the forward model. Third, they can model long range dependencies in images which make them ideal for handling global multiplexing as encountered in various compressive imaging systems. We demonstrate the efficacy of our proposed approach in solving three computational imaging problems: Single Pixel Camera (SPC), LiSens and FlatCam. For both real and simulated cases, we obtain better reconstructions than the state-of-the-art methods in terms of perceptual and quantitative metrics.
Submitted 23 April, 2018; v1 submitted 27 February, 2018;
originally announced February 2018.
-
Learning Light Field Reconstruction from a Single Coded Image
Authors:
Anil Kumar Vadathya,
Saikiran Cholleti,
Gautham Ramajayam,
Vijayalakshmi Kanchana,
Kaushik Mitra
Abstract:
Light field imaging is a rich way of representing the 3D world around us. However, due to limited sensor resolution, capturing light field data inherently poses a spatio-angular resolution trade-off. In this paper, we propose a deep learning based solution to tackle the resolution trade-off. Specifically, we reconstruct a full sensor resolution light field from a single coded image. We propose to do this in three stages: 1) reconstructing the center view from the coded image; 2) estimating the disparity map from the coded image and center view; 3) warping the center view using the disparity to generate the light field. We propose three neural networks for these stages. Our disparity estimation network is trained in an unsupervised manner, alleviating the need for ground truth disparity. Our results demonstrate better recovery of parallax from the coded image. Also, we get better results than dictionary learning based approaches both qualitatively and quantitatively.
Submitted 26 April, 2018; v1 submitted 20 January, 2018;
originally announced January 2018.
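The third stage in the abstract above, warping the center view with a disparity map to generate other light field views, can be sketched as a simple backward warp. This is an illustrative approximation rather than the paper's network: the function name `warp_view`, the sign convention for the angular offsets `(u, v)`, nearest-neighbour sampling, border clamping, and the use of the disparity at the target pixel are all assumptions.

```python
def warp_view(center, disparity, u, v):
    """Render the sub-aperture view at angular offset (u, v).

    `center` and `disparity` are 2-D lists of equal size. Each output
    pixel (x, y) samples the center view at (x + u*d, y + v*d), where
    d is the disparity at that pixel, rounded to the nearest pixel and
    clamped to the image border.
    """
    h, w = len(center), len(center[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = disparity[y][x]
            sx = min(max(int(round(x + u * d)), 0), w - 1)
            sy = min(max(int(round(y + v * d)), 0), h - 1)
            out[y][x] = center[sy][sx]
    return out
```

Calling `warp_view` for every `(u, v)` on the angular grid yields a full light field; with `u = v = 0` the center view is returned unchanged.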
-
Data Driven Coded Aperture Design for Depth Recovery
Authors:
Prasan A Shedligeri,
Sreyas Mohan,
Kaushik Mitra
Abstract:
Inserting a patterned occluder at the aperture of a camera lens has been shown to improve the recovery of depth map and all-focus image compared to a fully open aperture. However, design of the aperture pattern plays a very critical role. Previous approaches for designing aperture codes make simple assumptions on image distributions to obtain metrics for evaluating aperture codes. However, real images may not follow those assumptions and hence the designed code may not be optimal for them. To address this drawback we propose a data driven approach for learning the optimal aperture pattern to recover depth map from a single coded image. We propose a two stage architecture where, in the first stage we simulate coded aperture images from a training dataset of all-focus images and depth maps and in the second stage we recover the depth map using a deep neural network. We demonstrate that our learned aperture code performs better than previously designed codes even on code design metrics proposed by previous approaches.
Submitted 1 June, 2017; v1 submitted 28 May, 2017;
originally announced May 2017.
-
Proactive QoE Provisioning in Heterogeneous Access Networks using Hidden Markov Models and Reinforcement Learning
Authors:
Karan Mitra,
Christer Åhlund,
Arkady Zaslavsky,
Saguna Saguna
Abstract:
Quality of Experience (QoE) provisioning in heterogeneous access networks (HANs) can be achieved via handoffs. The current approaches for QoE-aware handoffs either lack the availability of a network path probing method or lack the availability of efficient methods for QoE prediction. Further, the current approaches do not explore the benefits of proactive QoE-aware handoffs such that user's QoE is maximized by learning from past network conditions and by actions taken by the mobile device regarding handoffs. In this paper, our contributions are two-fold. First, we propose, develop and validate a novel method for QoE prediction based on passive probing. Our method is based on hidden Markov models and Multi-homed Mobility Management Protocol which eliminates the need for additional probe packets for QoE prediction. It achieves the average QoE prediction accuracy of 97%. Second, we propose, develop and validate a novel reinforcement learning based method for proactive QoE-aware handoffs. We show that our method outperforms existing approaches by reducing the number of vertical handoffs by 60.65% while maintaining high QoE levels and by extending crucial functionality such as passive probing mechanisms.
Submitted 25 December, 2016;
originally announced December 2016.
-
ALPINE: A Bayesian System for Cloud Performance Diagnosis and Prediction
Authors:
Karan Mitra,
Saguna Saguna,
Christer Åhlund,
Rajiv Ranjan
Abstract:
Cloud performance diagnosis and prediction is a challenging problem due to the stochastic nature of cloud systems. Cloud performance is affected by a large set of factors including (but not limited to) virtual machine types, regions, workloads, wide area network delay and bandwidth, which necessitates determining the complex relationships between these factors. The current research in this area does not address the challenge of building models that capture the uncertain and complex relationships between these factors. Further, the challenge of cloud performance prediction under uncertainty has not garnered sufficient attention. This paper proposes, develops and validates ALPINE, a Bayesian system for cloud performance diagnosis and prediction. ALPINE incorporates Bayesian networks to model the uncertain and complex relationships between the factors mentioned above. It handles missing, scarce and sparse data to diagnose and predict stochastic cloud performance efficiently. We validate our proposed system using extensive real data and trace-driven analysis and show that it predicts cloud performance with a high accuracy of 91.93%.
Submitted 16 December, 2016;
originally announced December 2016.
-
Compressive Image Recovery Using Recurrent Generative Model
Authors:
Akshat Dave,
Anil Kumar Vadathya,
Kaushik Mitra
Abstract:
Reconstruction of signals from compressively sensed measurements is an ill-posed problem. In this paper, we leverage the recurrent generative model, RIDE, as an image prior for compressive image reconstruction. Recurrent networks can model long-range dependencies in images and hence are suitable for handling global multiplexing in reconstruction from compressive imaging. We perform MAP inference with RIDE using back-propagation to the inputs and the projected gradient method. We propose an entropy-thresholding-based approach for preserving texture in images. Our approach shows superior reconstructions compared to recent global reconstruction approaches like D-AMP and TVAL3 on both simulated and real data.
Submitted 3 May, 2017; v1 submitted 13 December, 2016;
originally announced December 2016.
-
Spatial Phase-Sweep: Increasing temporal resolution of transient imaging using a light source array
Authors:
Ryuichi Tadano,
Adithya Kumar Pediredla,
Kaushik Mitra,
Ashok Veeraraghavan
Abstract:
Transient imaging or light-in-flight techniques capture the propagation of an ultra-short pulse of light through a scene, which in effect captures the optical impulse response of the scene. Recently, it has been shown that we can capture transient images using commercially available Time-of-Flight (ToF) systems such as Photonic Mixer Devices (PMD). In this paper, we propose 'spatial phase-sweep', a technique that exploits the speed of light to increase the temporal resolution beyond the 100 picosecond limit imposed by current electronics. Spatial phase-sweep uses a linear array of light sources with spatial separation of about 3 mm between them, thereby resulting in a time shift of about 10 picoseconds, which translates into 100 Gfps of transient imaging in theory. We demonstrate a prototype and transient imaging results using spatial phase-sweep.
Submitted 21 December, 2015;
originally announced December 2015.
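The figures quoted in the Spatial Phase-Sweep abstract (about 3 mm of source separation, about 10 picoseconds of time shift, about 100 Gfps) can be checked with a short calculation. The sketch below is only an illustration, not the authors' method: it assumes the optical path difference between neighbouring sources equals their spatial separation, and the function names are hypothetical.

```python
# Speed of light in vacuum in metres per second (exact SI value).
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def time_shift_seconds(separation_m: float) -> float:
    """Travel time of light over a path difference of `separation_m` metres.

    Assumption: the path difference equals the source separation.
    """
    return separation_m / SPEED_OF_LIGHT_M_PER_S


def frame_rate_fps(time_step_s: float) -> float:
    """Frame rate that corresponds to one frame per `time_step_s` seconds."""
    return 1.0 / time_step_s


# 3 mm separation between neighbouring light sources, as stated in the abstract.
shift = time_shift_seconds(3e-3)
rate = frame_rate_fps(shift)

print(f"time shift: {shift * 1e12:.2f} ps")
print(f"frame rate: {rate / 1e9:.1f} Gfps")
```

Under this simplifying assumption the result lands close to the 10 picoseconds and 100 Gfps stated in the abstract; a real setup would depend on the actual geometry of the source array and the scene.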
-
Cross-Layer Multi-Cloud Real-Time Application QoS Monitoring and Benchmarking As-a-Service Framework
Authors:
Khalid Alhamazani,
Rajiv Ranjan,
Prem Prakash Jayaraman,
Karan Mitra,
Chang Liu,
Fethi Rabhi,
Dimitrios Georgakopoulos,
Lizhe Wang
Abstract:
Cloud computing provides on-demand access to affordable hardware (multi-core CPUs, GPUs, disks, and networking equipment) and software (databases, application servers and data processing frameworks) platforms with features such as elasticity, pay-per-use, low upfront investment and low time to market. This has led to the proliferation of business-critical applications that leverage various cloud platforms. Such applications, hosted on single or multiple cloud provider platforms, have diverse characteristics requiring extensive monitoring and benchmarking mechanisms to ensure run-time Quality of Service (QoS) (e.g., latency and throughput). This paper proposes, develops and validates CLAMBS: Cross-Layer Multi-Cloud Application Monitoring and Benchmarking as-a-Service for efficient QoS monitoring and benchmarking of cloud applications hosted in multi-cloud environments. The major highlight of CLAMBS is its capability of monitoring and benchmarking individual application components, such as databases and web servers, distributed across cloud layers and spread among multiple cloud providers. We validate CLAMBS using a prototype implementation and extensive experimentation and show that CLAMBS efficiently monitors and benchmarks application components on multi-cloud platforms including Amazon EC2 and Microsoft Azure.
Submitted 29 April, 2015; v1 submitted 1 February, 2015;
originally announced February 2015.
-
QoE Modelling, Measurement and Prediction: A Review
Authors:
Karan Mitra,
Arkady Zaslavsky,
Christer Åhlund
Abstract:
In mobile computing systems, users can access network services anywhere and anytime using mobile devices such as tablets and smart phones. These devices connect to the Internet via network or telecommunications operators. Users usually have some expectations about the services provided to them by different operators. Users' expectations along with additional factors such as cognitive and behavioural states, cost, and network quality of service (QoS) may determine their quality of experience (QoE). If users are not satisfied with their QoE, they may switch to different providers or may stop using a particular application or service. Thus, QoE measurement and prediction techniques may benefit users in availing personalized services from service providers. On the other hand, they can help service providers to achieve lower user-operator switchover. This paper presents a review of the state-of-the-art research in the area of QoE modelling, measurement and prediction. In particular, we investigate and discuss the strengths and shortcomings of existing techniques. Finally, we present future research directions for developing novel QoE measurement and prediction techniques.
Submitted 25 October, 2014;
originally announced October 2014.
-
An Overview of the Commercial Cloud Monitoring Tools: Research Dimensions, Design Issues, and State-of-the-Art
Authors:
Khalid Alhamazani,
Rajiv Ranjan,
Karan Mitra,
Fethi Rabhi,
Samee Ullah Khan,
Adnene Guabtni,
Vasudha Bhatnagar
Abstract:
Cloud monitoring activity involves dynamically tracking the Quality of Service (QoS) parameters related to virtualized resources (e.g., VM, storage, network, appliances, etc.), the physical resources they share, the applications running on them and data hosted on them. Configuring applications and resources in a cloud computing environment is quite challenging considering the large number of heterogeneous cloud resources. Further, at each point in time, a different and specific cloud service may be in heavy demand. Hence, cloud monitoring tools can assist cloud providers or application developers in: (i) keeping their resources and applications operating at peak efficiency; (ii) detecting variations in resource and application performance; (iii) accounting for the Service Level Agreement (SLA) violations of certain QoS parameters; and (iv) tracking the leave and join operations of cloud resources due to failures and other dynamic configuration changes.
In this paper, we identify and discuss the major research dimensions and design issues related to engineering cloud monitoring tools. We further discuss how the aforementioned research dimensions and design issues are handled by current academic research as well as by commercial monitoring tools.
Submitted 20 December, 2013;
originally announced December 2013.
-
A Framework for the Analysis of Computational Imaging Systems with Practical Applications
Authors:
Kaushik Mitra,
Oliver Cossairt,
Ashok Veeraraghavan
Abstract:
Over the last decade, a number of Computational Imaging (CI) systems have been proposed for tasks such as motion deblurring, defocus deblurring and multispectral imaging. These techniques increase the amount of light reaching the sensor via multiplexing and then undo the deleterious effects of multiplexing by appropriate reconstruction algorithms. Given the widespread appeal and the considerable enthusiasm generated by these techniques, a detailed performance analysis of the benefits conferred by this approach is important.
Unfortunately, a detailed analysis of CI has proven to be a challenging problem because performance depends equally on three components: (1) the optical multiplexing, (2) the noise characteristics of the sensor, and (3) the reconstruction algorithm. A few recent papers have performed analysis taking multiplexing and noise characteristics into account. However, analysis of CI systems under state-of-the-art reconstruction algorithms, most of which exploit signal prior models, has proven to be unwieldy. In this paper, we present a comprehensive analysis framework incorporating all three components.
In order to perform this analysis, we model the signal priors using a Gaussian Mixture Model (GMM). A GMM prior confers two unique characteristics. Firstly, a GMM satisfies the universal approximation property, which says that any prior density function can be approximated to any fidelity using a GMM with an appropriate number of mixtures. Secondly, a GMM prior lends itself to analytical tractability, allowing us to derive simple expressions for the 'minimum mean square error' (MMSE), which we use as a metric to characterize the performance of CI systems. We use our framework to analyze several previously proposed CI techniques, giving a conclusive answer to the question: 'How much performance gain is due to use of a signal prior and how much is due to multiplexing?'
Submitted 13 March, 2014; v1 submitted 8 August, 2013;
originally announced August 2013.
-
MediaWise - Designing a Smart Media Cloud
Authors:
Dimitrios Georgakopoulos,
Rajiv Ranjan,
Karan Mitra,
Xiangmin Zhou
Abstract:
The MediaWise project aims to expand the scope of existing media delivery systems with novel cloud, personalization and collaboration capabilities that can serve the needs of more users, communities, and businesses. The project develops a MediaWise Cloud platform that supports do-it-yourself creation, search, management, and consumption of multimedia content. The MediaWise Cloud supports pay-as-you-go models and elasticity that are similar to those offered by commercially available cloud services. However, unlike existing commercial CDN service providers such as Limelight Networks and Akamai, the MediaWise Cloud requires no ownership of computing infrastructure and instead relies on the public Internet and public cloud services (e.g., commercial cloud storage to store its content). In addition to integrating such public cloud services into a public cloud-based Content Delivery Network, the MediaWise Cloud also provides advanced Quality of Service (QoS) management as required for the delivery of streamed and interactive high-resolution multimedia content. In this paper, we give a brief overview of the MediaWise Cloud architecture and present a comprehensive discussion of the research objectives related to its service components. Finally, we also compare the features supported by the existing CDN services against the envisioned objectives of the MediaWise Cloud.
Submitted 17 June, 2012; v1 submitted 9 June, 2012;
originally announced June 2012.
-
A Simple Flood Forecasting Scheme Using Wireless Sensor Networks
Authors:
Victor Seal,
Arnab Raha,
Shovan Maity,
Souvik Kr Mitra,
Amitava Mukherjee,
Mrinal Kanti Naskar
Abstract:
This paper presents a forecasting model designed using WSNs (Wireless Sensor Networks) to predict floods in rivers using simple and fast calculations to provide real-time results and save the lives of people who may be affected by the flood. Our prediction model uses multiple-variable robust linear regression, which is easy to understand, simple and cost-effective to implement, fast, and light on resources, yet provides real-time predictions with reliable accuracy, features that are desirable in any real-world algorithm. Our prediction model is independent of the number of parameters, i.e., any number of parameters may be added or removed based on the on-site requirements. When the water level rises, we represent it using a polynomial whose nature is used to determine if the water level may exceed the flood line in the near future. We compare our work with a contemporary algorithm to demonstrate our improvements over it. Then we present our simulation results for the predicted water level compared to the actual water level.
Submitted 9 March, 2012;
originally announced March 2012.
-
ROOT: Energy Efficient Routing through Optimized Tree in Sensor Networks
Authors:
Kaushik Chakraborty,
Ayon Chakraborty,
Swarup Kumar Mitra,
Mrinal Kanti Naskar
Abstract:
This paper has been withdrawn by the author due to a crucial sign error in equation 1
Submitted 26 June, 2011; v1 submitted 10 May, 2011;
originally announced May 2011.
-
Energy Efficient Routing in Wireless Sensor Networks: A Genetic Approach
Authors:
Ayon Chakraborty,
Swarup Kumar Mitra,
Mrinal Kanti Naskar
Abstract:
This paper has been withdrawn by the author due to a crucial sign error in equation 1
This paper has been withdrawn by the author due to a crucial sign error in equation 1
△ Less
Submitted 26 June, 2011; v1 submitted 10 May, 2011;
originally announced May 2011.
-
Simulation of Wireless Sensor Networks Using TinyOS- A case Study
Authors:
Swarup kumar Mitra,
Ayon Chakraborty,
Subhajit Mandal,
M. K. Naskar
Abstract:
This paper has been withdrawn by the author due to a crucial sign error in equation 1
This paper has been withdrawn by the author due to a crucial sign error in equation 1
△ Less
Submitted 26 June, 2011; v1 submitted 23 April, 2010;
originally announced April 2010.
-
An Energy Efficient Scheme for Data Gathering in Wireless Sensor Networks Using Particle Swarm Optimization
Authors:
Ayon Chakraborty,
Kaushik Chakraborty,
Swarup Kumar Mitra,
M. K. Naskar
Abstract:
This paper has been withdrawn by the author due to a crucial sign error in equation 1
This paper has been withdrawn by the author due to a crucial sign error in equation 1
△ Less
Submitted 26 June, 2011; v1 submitted 20 April, 2010;
originally announced April 2010.
-
An Optimized Lifetime Enhancement Scheme for Data Gathering in Wireless Sensor Networks
Authors:
Ayon Chakraborty,
Kaushik Chakraborty,
Swarup Kumar Mitra,
M. K. Naskar
Abstract:
This paper has been withdrawn by the author due to a crucial sign error in equation 1
This paper has been withdrawn by the author due to a crucial sign error in equation 1
△ Less
Submitted 26 June, 2011; v1 submitted 20 April, 2010;
originally announced April 2010.
-
An Efficient Hybrid Data Gathering Scheme in Wireless Sensor Networks
Authors:
Ayon Chakraborty,
Swarup Kumar Mitra,
M. K. Naskar
Abstract:
This paper has been withdrawn by the author due to a crucial sign error in equation 1
This paper has been withdrawn by the author due to a crucial sign error in equation 1
△ Less
Submitted 26 June, 2011; v1 submitted 19 April, 2010;
originally announced April 2010.