GOV-REK: Governed Reward Engineering Kernels for Designing Robust Multi-Agent Reinforcement Learning Systems

Authors: Ashish Rana, Michael Oesterle, Jannik Brinkmann

Abstract: For multi-agent reinforcement learning systems (MARLS), the problem formulation generally involves investing massive reward engineering effort specific to a given problem. However, this effort often cannot be translated to other problems; worse, it gets wasted when system dynamics change drastically. This problem is further exacerbated in sparse reward scenarios, where a meaningful heuristic can assist in the policy convergence task. We propose GOVerned Reward Engineering Kernels (GOV-REK), which dynamically assign reward distributions to agents in MARLS during its learning stage. We also introduce governance kernels, which exploit the underlying structure in either state or joint action space for assigning meaningful agent reward distributions. During the agent learning stage, it iteratively explores different reward distribution configurations with a Hyperband-like algorithm to learn ideal agent reward models in a problem-agnostic manner. Our experiments demonstrate that our meaningful reward priors robustly jumpstart the learning process for effectively learning different MARL problems.

Submitted 14 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: Extended Abstract accepted in the 23rd International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2024)

arXiv:2401.07360 [pdf, other]

Promptformer: Prompted Conformer Transducer for ASR

Authors: Sergio Duarte-Torres, Arunasish Sen, Aman Rana, Lukas Drude, Alejandro Gomez-Alanis, Andreas Schwarz, Leif Rädel, Volker Leutnant

Abstract: Context cues carry information which can improve multi-turn interactions in automatic speech recognition (ASR) systems. In this paper, we introduce a novel mechanism inspired by hyper-prompting to fuse textual context with acoustic representations in the attention mechanism. Results on a test set with multi-turn interactions show that our method achieves 5.9% relative word error rate reduction (rWERR) over a strong baseline. We show that our method does not degrade in the absence of context and leads to improvements even if the model is trained without context. We further show that leveraging a pre-trained sentence-piece model for context embedding generation can outperform an external BERT model.

Submitted 14 January, 2024; originally announced January 2024.
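
Note on the rWERR figure in the abstract above: a minimal sketch of how a relative word error rate reduction is commonly computed, assuming the usual definition (baseline WER minus new WER, divided by baseline WER). The function name and the WER values are hypothetical illustrations and are not taken from the paper.

```python
def relative_werr(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction of a new system over a baseline.

    Assumed definition: (baseline - new) / baseline.
    Both inputs must use the same unit (e.g. percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers: a baseline at 10.0% WER and a new model at 9.41% WER
# correspond to a relative reduction of about 5.9%.
example = relative_werr(10.0, 9.41)
print(f"rWERR = {example:.1%}")
```

A relative figure of this kind says nothing about the absolute WER, so the same 5.9% can come from very different baselines.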

arXiv:2312.07169 [pdf, other]

Semi-supervised Active Learning for Video Action Detection

Authors: Ayush Singh, Aayush J Rana, Akash Kumar, Shruti Vyas, Yogesh Singh Rawat

Abstract: In this work, we focus on label efficient learning for video action detection. We develop a novel semi-supervised active learning approach which utilizes both labeled as well as unlabeled data along with informative sample selection for action detection. Video action detection requires spatio-temporal localization along with classification, which poses several challenges for both active learning informative sample selection as well as semi-supervised learning pseudo label generation. First, we propose NoiseAug, a simple augmentation strategy which effectively selects informative samples for video action detection. Next, we propose fft-attention, a novel technique based on high-pass filtering which enables effective utilization of pseudo labels for SSL in video action detection by emphasizing the relevant activity region within a video. We evaluate the proposed approach on three different benchmark datasets, UCF101-24, JHMDB-21, and Youtube-VOS. First, we demonstrate its effectiveness on video action detection where the proposed approach outperforms prior works in semi-supervised and weakly-supervised learning along with several baseline approaches in both UCF101-24 and JHMDB-21. Next, we also show its effectiveness on Youtube-VOS for video object segmentation demonstrating its generalization capability for other dense prediction tasks in videos. The code and models are publicly available at: https://github.com/AKASH2907/semi-sup-active-learning.

Submitted 3 April, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

Comments: AAAI Conference on Artificial Intelligence, Main Technical Track (AAAI), 2024, Code: https://github.com/AKASH2907/semi-sup-active-learning

arXiv:2304.06668 [pdf, other]

DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer

Authors: Amit Kumar Rana, Sabarinath Mahadevan, Alexander Hermans, Bastian Leibe

Abstract: Most state-of-the-art instance segmentation methods rely on large amounts of pixel-precise ground-truth annotations for training, which are expensive to create. Interactive segmentation networks help generate such annotations based on an image and the corresponding user interactions such as clicks. Existing methods for this task can only process a single instance at a time and each user interaction requires a full forward pass through the entire deep network. We introduce a more efficient approach, called DynaMITe, in which we represent user interactions as spatio-temporal queries to a Transformer decoder with a potential to segment multiple object instances in a single iteration. Our architecture also alleviates any need to re-compute image features during refinement, and requires fewer interactions for segmenting multiple instances in a single image when compared to other methods. DynaMITe achieves state-of-the-art results on multiple existing interactive segmentation benchmarks, and also on the new multi-instance benchmark that we propose in this paper.

Submitted 22 August, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: Accepted to ICCV 2023

arXiv:2301.10052 [pdf, other]

Event Detection in Football using Graph Convolutional Networks

Authors: Aditya Sangram Singh Rana

Abstract: The massive growth of data collection in sports has opened numerous avenues for professional teams and media houses to gain insights from this data. The data collected includes per frame player and ball trajectories, and event annotations such as passes, fouls, cards, goals, etc. Graph Convolutional Networks (GCNs) have recently been employed to process this highly unstructured tracking data which can be otherwise difficult to model because of lack of clarity on how to order players in a sequence and how to handle missing objects of interest. In this thesis, we focus on the goal of automatic event detection from football videos. We show how to model the players and the ball in each frame of the video sequence as a graph, and present the results for graph convolutional layers and pooling methods that can be used to model the temporal context present around each action.

Submitted 24 January, 2023; originally announced January 2023.

arXiv:2210.14624 [pdf, other]

doi 10.1109/IGARSS46834.2022.9883198

RapidAI4EO: Mono- and Multi-temporal Deep Learning models for Updating the CORINE Land Cover Product

Authors: Priyash Bhugra, Benjamin Bischke, Christoph Werner, Robert Syrnicki, Carolin Packbier, Patrick Helber, Caglar Senaras, Akhil Singh Rana, Tim Davis, Wanda De Keersmaecker, Daniele Zanaga, Annett Wania, Ruben Van De Kerchove, Giovanni Marchisio

Abstract: In the remote sensing community, Land Use Land Cover (LULC) classification with satellite imagery is a main focus of current research activities. Accurate and appropriate LULC classification, however, continues to be a challenging task. In this paper, we evaluate the performance of multi-temporal (monthly time series) compared to mono-temporal (single time step) satellite images for multi-label classification using supervised learning on the RapidAI4EO dataset. As a first step, we trained our CNN model on images at a single time step for multi-label classification, i.e. mono-temporal. We incorporated time-series images using an LSTM model to assess whether or not multi-temporal signals from satellites improve CLC classification. The results demonstrate an improvement of approximately 0.89% in classifying satellite imagery on 15 classes using a multi-temporal approach on monthly time series images compared to the mono-temporal approach. Using features from multi-temporal or mono-temporal images, this work is a step towards an efficient change detection and land monitoring approach.

Submitted 26 October, 2022; originally announced October 2022.

Comments: Published in IGARSS 2022 - 2022 IEEE International Geoscience and Remote Sensing Symposium

arXiv:2204.07892 [pdf, other]

Video Action Detection: Analysing Limitations and Challenges

Authors: Rajat Modi, Aayush Jung Rana, Akash Kumar, Praveen Tirupattur, Shruti Vyas, Yogesh Singh Rawat, Mubarak Shah

Abstract: Beyond possessing large enough size to feed data hungry machines (e.g., transformers), what attributes measure the quality of a dataset? Assuming that the definitions of such attributes do exist, how do we quantify among their relative existences? Our work attempts to explore these questions for video action detection. The task aims to spatio-temporally localize an actor and assign a relevant action class. We first analyze the existing datasets on video action detection and discuss their limitations. Next, we propose a new dataset, Multi Actor Multi Action (MAMA) which overcomes these limitations and is more suitable for real world applications. In addition, we perform a biasness study which analyzes a key property differentiating videos from static images: the temporal aspect. This reveals if the actions in these datasets really need the motion information of an actor, or whether they predict the occurrence of an action even by looking at a single frame. Finally, we investigate the widely held assumptions on the importance of temporal ordering: is temporal ordering important for detecting these actions? Such extreme experiments show existence of biases which have managed to creep into existing methods in spite of careful modeling.

Submitted 16 April, 2022; originally announced April 2022.

Comments: CVPRW'22

arXiv:2202.06218 [pdf, other]

Emotion Based Hate Speech Detection using Multimodal Learning

Authors: Aneri Rana, Sonali Jha

Abstract: In recent years, monitoring hate speech and offensive language on social media platforms has become paramount due to its widespread usage among all age groups, races, and ethnicities. Consequently, there have been substantial research efforts towards automated detection of such content using Natural Language Processing (NLP). While successfully filtering textual data, no research has focused on detecting hateful content in multimedia data. With increased ease of data storage and the exponential growth of social media platforms, multimedia content proliferates the internet as much as text data. Nevertheless, it escapes the automatic filtering systems. Hate speech and offensiveness can be detected in multimedia primarily via three modalities, i.e., visual, acoustic, and verbal. Our preliminary study concluded that the most essential features in classifying hate speech would be the speaker's emotional state and its influence on the spoken words, therefore limiting our current research to these modalities. This paper proposes the first multimodal deep learning framework to combine the auditory features representing emotion and the semantic features to detect hateful content. Our results demonstrate that incorporating emotional attributes leads to significant improvement over text-based models in detecting hateful multimedia content. This paper also presents a new Hate Speech Detection Video Dataset (HSDVD) collected for the purpose of multimodal learning as no such dataset exists today.

Submitted 13 February, 2022; originally announced February 2022.

arXiv:2202.02646 [pdf, other]

RerrFact: Reduced Evidence Retrieval Representations for Scientific Claim Verification

Authors: Ashish Rana, Deepanshu Khanna, Tirthankar Ghosal, Muskaan Singh, Harpreet Singh, Prashant Singh Rana

Abstract: Exponential growth in digital information outlets and the race to publish has made scientific misinformation more prevalent than ever. However, the task to fact-verify a given scientific claim is not straightforward even for researchers. Scientific claim verification requires in-depth knowledge and great labor from domain experts to substantiate supporting and refuting evidence from credible scientific sources. The SciFact dataset and corresponding task provide a benchmarking leaderboard to the community to develop automatic scientific claim verification systems via extracting and assimilating relevant evidence rationales from source abstracts. In this work, we propose a modular approach that sequentially carries out binary classification for every prediction subtask as in the SciFact leaderboard. Our simple classifier-based approach uses reduced abstract representations to retrieve relevant abstracts. These are further used to train the relevant rationale-selection model. Finally, we carry out two-step stance predictions that first differentiate non-relevant rationales and then identify supporting or refuting rationales for a given claim. Experimentally, our system RerrFact with no fine-tuning, simple design, and a fraction of model parameters fares competitively on the leaderboard against large-scale, modular, and joint modeling approaches. We make our codebase available at https://github.com/ashishrana160796/RerrFact.

Submitted 18 April, 2022; v1 submitted 5 February, 2022; originally announced February 2022.

Comments: Accepted in the AAAI-22 Workshop on Scientific Document Understanding at the Thirty-Sixth AAAI Conference on Artificial Intelligence (SDU@AAAI-22)

arXiv:2110.10899 [pdf, other]

LARNet: Latent Action Representation for Human Action Synthesis

Authors: Naman Biyani, Aayush J Rana, Shruti Vyas, Yogesh S Rawat

Abstract: We present LARNet, a novel end-to-end approach for generating human action videos. A joint generative modeling of appearance and dynamics to synthesize a video is very challenging and therefore recent works in video synthesis have proposed to decompose these two factors. However, these methods require a driving video to model the video dynamics. In this work, we propose a generative approach instead, which explicitly learns action dynamics in latent space avoiding the need of a driving video during inference. The generated action dynamics is integrated with the appearance using a recurrent hierarchical structure which induces motion at different scales to focus on both coarse as well as fine level action details. In addition, we propose a novel mix-adversarial loss function which aims at improving the temporal coherency of synthesized videos. We evaluate the proposed approach on four real-world human action datasets demonstrating the effectiveness of the proposed approach in generating human actions. Code available at https://github.com/aayushjr/larnet.

Submitted 26 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

Comments: British Machine Vision Conference (BMVC) 2021

arXiv:2109.10443 [pdf, other]

Geometric Fabrics: Generalizing Classical Mechanics to Capture the Physics of Behavior

Authors: Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana, Buck Babich, Bryan Peele, Qian Wan, Iretiayo Akinola, Balakumar Sundaralingam, Dieter Fox, Byron Boots, Nathan D. Ratliff

Abstract: Classical mechanical systems are central to controller design in energy shaping methods of geometric control. However, their expressivity is limited by position-only metrics and the intimate link between metric and geometry. Recent work on Riemannian Motion Policies (RMPs) has shown that shedding these restrictions results in powerful design tools, but at the expense of theoretical stability guarantees. In this work, we generalize classical mechanics to what we call geometric fabrics, whose expressivity and theory enable the design of systems that outperform RMPs in practice. Geometric fabrics strictly generalize classical mechanics forming a new physics of behavior by first generalizing them to Finsler geometries and then explicitly bending them to shape their behavior while maintaining stability. We develop the theory of fabrics and present both a collection of controlled experiments examining their theoretical properties and a set of robot system experiments showing improved performance over a well-engineered and hardened implementation of RMPs, our current state-of-the-art in controller design.

Submitted 18 January, 2022; v1 submitted 21 September, 2021; originally announced September 2021.

arXiv:2107.14591 [pdf, ps, other]

Self-supervision for health insurance claims data: a Covid-19 use case

Authors: Emilia Apostolova, Fazle Karim, Guido Muscioni, Anubhav Rana, Jeffrey Clyman

Abstract: In this work, we modify and apply self-supervision techniques to the domain of medical health insurance claims. We model patients' healthcare claims history analogous to free-text narratives, and introduce pre-trained 'prior knowledge', later utilized for patient outcome predictions on a challenging task: predicting Covid-19 hospitalization, given a patient's pre-Covid-19 insurance claims history. Results suggest that pre-training on insurance claims not only produces better prediction performance, but, more importantly, improves the model's 'clinical trustworthiness' and model stability/reliability.

Submitted 19 July, 2021; originally announced July 2021.

arXiv:2107.11494 [pdf, other]

TinyAction Challenge: Recognizing Real-world Low-resolution Activities in Videos

Authors: Praveen Tirupattur, Aayush J Rana, Tushar Sangam, Shruti Vyas, Yogesh S Rawat, Mubarak Shah

Abstract: This paper summarizes the TinyAction challenge, which was organized in the ActivityNet workshop at CVPR 2021. This challenge focuses on recognizing real-world low-resolution activities present in videos. The action recognition task is currently focused on classifying actions from high-quality videos where the actors and the action are clearly visible. While various approaches have been shown to be effective for the recognition task in recent works, they often do not deal with videos of lower resolution where the action is happening in a tiny region. However, many real world security videos often have the actual action captured in a small resolution, making action recognition in a tiny region a challenging task. In this work, we propose a benchmark dataset, TinyVIRAT-v2, which is comprised of naturally occurring low-resolution actions. This is an extension of the TinyVIRAT dataset and consists of actions with multiple labels. The videos are extracted from security videos which makes them realistic and more challenging. We use current state-of-the-art action recognition methods on the dataset as a benchmark, and propose the TinyAction Challenge.

Submitted 23 July, 2021; originally announced July 2021.

Comments: 8 pages. arXiv admin note: text overlap with arXiv:2007.07355

arXiv:2105.07962 [pdf]

doi 10.1007/s42979-021-00835-x

DFENet: A Novel Dimension Fusion Edge Guided Network for Brain MRI Segmentation

Authors: Hritam Basak, Rukhshanda Hussain, Ajay Rana

Abstract: The rapid increase in brain stroke morbidity in the last few years has been a driving force towards fast and accurate segmentation of stroke lesions from brain MRI images. With the recent development of deep learning, computer-aided segmentation methods for ischemic stroke lesions have been useful for clinicians in early diagnosis and treatment planning. However, most of these methods suffer from inaccurate and unreliable segmentation results because of their inability to capture sufficient contextual features from the MRI volumes. To meet these requirements, 3D convolutional neural networks have been proposed, which, however, suffer from huge computational requirements. To mitigate these problems, we propose a novel Dimension Fusion Edge-guided network (DFENet) that can meet both of these requirements by fusing the features of 2D and 3D CNNs. Unlike other methods, our proposed network uses a parallel partial decoder (PPD) module for aggregating and upsampling selected features, rich in important contextual information. Additionally, we use an edge-guidance and enhanced mixing loss for constantly supervising and improving the learning process of the network. The proposed method is evaluated on the publicly available Anatomical Tracings of Lesions After Stroke (ATLAS) dataset, resulting in mean DSC, IoU, Precision and Recall values of 0.5457, 0.4015, 0.6371, and 0.4969 respectively. The results outperform other state-of-the-art methods by a significant margin. Therefore, the proposed model is robust, accurate, superior to the existing methods, and can be relied upon for biomedical applications.

Submitted 22 October, 2021; v1 submitted 17 May, 2021; originally announced May 2021.

Comments: Submitted at SN Computer Science
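
Note on the metrics reported in the abstract above: a minimal sketch of how DSC (Dice), IoU, precision and recall are commonly computed for a single pair of binary masks, assuming flattened 0/1 sequences. The function name `overlap_metrics` and the convention of returning 0.0 for an empty denominator are assumptions made for illustration; this is not the authors' evaluation code, and the paper's values are means over a dataset.

```python
from typing import Dict, Sequence


def overlap_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Overlap metrics for two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")

    # Count true positives, false positives and false negatives.
    tp = sum(1 for p, t in zip(pred, target) if p and t)
    fp = sum(1 for p, t in zip(pred, target) if p and not t)
    fn = sum(1 for p, t in zip(pred, target) if not p and t)

    def safe_div(num: float, den: float) -> float:
        # Assumed convention: report 0.0 when the denominator is zero.
        return num / den if den else 0.0

    return {
        "dsc": safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": safe_div(tp, tp + fp + fn),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
    }


# Toy example: one overlapping pixel, one false positive, one false negative.
print(overlap_metrics([1, 1, 0, 0], [1, 0, 1, 0]))
```

For a single mask pair, IoU equals DSC / (2 - DSC); that identity does not generally hold between dataset-level means.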

arXiv:2105.04905 [pdf, other]

Scene Understanding for Autonomous Driving

Authors: Òscar Lorente, Ian Riera, Aditya Rana

Abstract: To detect and segment objects in images based on their content is one of the most active topics in the field of computer vision. Nowadays, this problem can be addressed using Deep Learning architectures such as Faster R-CNN or YOLO, among others. In this paper, we study the behaviour of different configurations of RetinaNet, Faster R-CNN and Mask R-CNN presented in Detectron2. First, we evaluate qualitatively and quantitatively (AP) the performance of the pre-trained models on KITTI-MOTS and MOTSChallenge datasets. We observe a significant improvement in performance after fine-tuning these models on the datasets of interest and optimizing hyperparameters. Finally, we run inference in unusual situations using out-of-context datasets, and present interesting results that help us better understand the networks.

Submitted 11 May, 2021; originally announced May 2021.

arXiv:2105.04895 [pdf, other]

Image Classification with Classic and Deep Learning Techniques

Authors: Òscar Lorente, Ian Riera, Aditya Rana

Abstract: To classify images based on their content is one of the most studied topics in the field of computer vision. Nowadays, this problem can be addressed using modern techniques such as Convolutional Neural Networks (CNN), but over the years different classical methods have been developed. In this report, we implement an image classifier using both classic computer vision and deep learning techniques. Specifically, we study the performance of a Bag of Visual Words classifier using Support Vector Machines, a Multilayer Perceptron, an existing architecture named InceptionV3 and our own CNN, TinyNet, designed from scratch. We evaluate each of the cases in terms of accuracy and loss, and we obtain results that vary between 0.6 and 0.96 depending on the model and configuration used.

Submitted 11 May, 2021; originally announced May 2021.

arXiv:2103.10245 [pdf, other]

doi 10.1109/CCCI52664.2021.9583209

Building Safer Autonomous Agents by Leveraging Risky Driving Behavior Knowledge

Authors: Ashish Rana, Avleen Malhi

Abstract: Simulation environments are good for learning different driving tasks like lane changing, parking or handling intersections etc. in an abstract manner. However, these simulation environments often restrict themselves to operate under conservative interaction behavior amongst different vehicles. But, as we know, real driving tasks often involve very high risk scenarios where other drivers often don't behave in the expected sense. There can be many reasons for this behavior like being tired or inexperienced. The simulation environment doesn't take this information into account while training the navigation agent. Therefore, in this study we especially focus on systematically creating these risk prone scenarios with heavy traffic and unexpected random behavior for creating better model-free learning agents. We generate multiple autonomous driving scenarios by creating new custom Markov Decision Process (MDP) environment iterations in the highway-env simulation package. The behavior policy is learnt by agents trained with the help of deep reinforcement learning models. Our behavior policy is deliberated to handle collisions and risky randomized driver behavior. We train model-free learning agents with supplementary information of risk prone driving scenarios and compare their performance with baseline agents. Finally, we casually measure the impact of adding these perturbations in the training process to precisely account for the performance improvement obtained from utilizing the learnings from these scenarios.

Submitted 17 October, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: Published in CCCI 2021, Best Paper Award in Informatics

arXiv:2103.05922 [pdf, other]

RMP2: A Structured Composable Policy Class for Robot Learning

Authors: Anqi Li, Ching-An Cheng, M. Asif Rana, Man Xie, Karl Van Wyk, Nathan Ratliff, Byron Boots

Abstract: We consider the problem of learning motion policies for acceleration-based robotics systems with a structured policy class specified by RMPflow. RMPflow is a multi-task control framework that has been successfully applied in many robotics problems. Using RMPflow as a structured policy class in learning has several benefits, such as sufficient expressiveness, the flexibility to inject different levels of prior knowledge as well as the ability to transfer policies between robots. However, implementing a system for end-to-end learning of RMPflow policies faces several computational challenges. In this work, we re-examine the message passing algorithm of RMPflow and propose a more efficient alternate algorithm, called RMP2, that uses modern automatic differentiation tools (such as TensorFlow and PyTorch) to compute RMPflow policies. Our new design retains the strengths of RMPflow while bringing in advantages from automatic differentiation, including 1) easy programming interfaces for designing complex transformations; 2) support of general directed acyclic graph (DAG) transformation structures; 3) end-to-end differentiability for policy learning; 4) improved computational efficiency. Because of these features, RMP2 can be treated as a structured policy class for efficient robot learning which is suitable for encoding domain knowledge. Our experiments show that using the structured policy class given by RMP2 can improve policy performance and safety in reinforcement learning tasks for goal reaching in cluttered space.

Submitted 10 March, 2021; originally announced March 2021.

arXiv:2101.10396 [pdf, other]

Quality Assessment of Super-Resolved Omnidirectional Image Quality Using Tangential Views

Authors: Cagri Ozcinar, Aakanksha Rana

Abstract: Omnidirectional images (ODIs), also known as 360-degree images, enable viewers to explore all directions of a given 360-degree scene from a fixed point. Designing an immersive imaging system with ODI is challenging as such systems require very large resolution coverage of the entire 360 viewing space to provide an enhanced quality of experience (QoE). Despite remarkable progress on single image super-resolution (SISR) methods with deep-learning techniques, no study for quality assessments of super-resolved ODIs exists to analyze the quality of such SISR techniques. This paper proposes an objective, full-reference quality assessment framework which studies quality measurement for ODIs generated by GAN-based and CNN-based SISR methods. The quality assessment framework offers to utilize tangential views to cope with the spherical nature of a given ODI. The generated tangential views are distortion-free and can be efficiently scaled to high-resolution spherical data for SISR quality measurement. We extensively evaluate two state-of-the-art SISR methods using widely used full-reference SISR quality metrics adapted to our designed framework. In addition, our study reveals that most objective metrics show high performance over CNN-based SISR, while subjective tests favor GAN-based architectures.

Submitted 25 January, 2021; originally announced January 2021.

Comments: Paper Accepted at Electronic Imaging

arXiv:2012.13457 [pdf, other]

Towards Coordinated Robot Motions: End-to-End Learning of Motion Policies on Transform Trees

Authors: M. Asif Rana, Anqi Li, Dieter Fox, Sonia Chernova, Byron Boots, Nathan Ratliff

Abstract: Generating robot motion that fulfills multiple tasks simultaneously is challenging due to the geometric constraints imposed by the robot. In this paper, we propose to solve multi-task problems through learning structured policies from human demonstrations. Our structured policy is inspired by RMPflow, a framework for combining subtask policies on different spaces. The policy structure provides the user an interface for 1) specifying the spaces that are directly relevant to the completion of the tasks, and 2) designing policies for certain tasks that do not need to be learned. We derive an end-to-end learning objective function that is suitable for the multi-task problem, emphasizing the deviation of motions on task spaces. Furthermore, the motion generated from the learned policy class is guaranteed to be stable. We validate the effectiveness of our proposed learning framework through qualitative and quantitative evaluations on three robotic tasks on a 7-DOF Rethink Sawyer robot.

Submitted 10 March, 2021; v1 submitted 24 December, 2020; originally announced December 2020.

arXiv:2011.10927 [pdf, other]

We don't Need Thousand Proposals: Single Shot Actor-Action Detection in Videos

Authors: Aayush J Rana, Yogesh S Rawat

Abstract: We propose SSA2D, a simple yet effective end-to-end deep network for actor-action detection in videos. The existing methods take a top-down approach based on region-proposals (RPN), where the action is estimated based on the detected proposals followed by post-processing such as non-maximal suppression. While effective in terms of performance, these methods pose limitations in scalability for dense video scenes with a high memory requirement for thousands of proposals. We propose to solve this problem from a different perspective where we don't need any proposals. SSA2D is a unified network, which performs pixel level joint actor-action detection in a single-shot, where every pixel of the detected actor is assigned an action label. SSA2D has two main advantages: 1) It is a fully convolutional network which does not require any proposals and post-processing making it memory as well as time efficient, 2) It is easily scalable to dense video scenes as its memory requirement is independent of the number of actors present in the scene. We evaluate the proposed method on the Actor-Action dataset (A2D) and Video Object Relation (VidOR) dataset, demonstrating its effectiveness in multiple actors and action detection in a video. SSA2D is 11x faster during inference with comparable (sometimes better) performance and fewer network parameters when compared with the prior works.

Submitted 21 November, 2020; originally announced November 2020.

Comments: 8 pages

arXiv:2010.15676 [pdf, other]

Optimization Fabrics for Behavioral Design

Authors: Nathan D. Ratliff, Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana

Abstract: A common approach to the provably stable design of reactive behavior, exemplified by operational space control, is to reduce the problem to the design of virtual classical mechanical systems (energy shaping). This framework is widely used, and through it we gain stability, but at the price of expressivity. This work presents a comprehensive theoretical framework expanding this approach showing that there is a much larger class of differential equations generalizing classical mechanical systems (and the broader class of Lagrangian systems) and greatly expanding their expressivity while maintaining the same governing stability principles. At the core of our framework is a class of differential equations we call fabrics which constitute a behavioral medium across which we can optimize a potential function. These fabrics shape the system's behavior during optimization but still always provably converge to a local minimum, making them a building block of stable behavioral design. We build the theoretical foundations of our framework here and provide a simple empirical demonstration of a practical class of geometric fabrics, which additionally exhibit a natural geometric path consistency making them convenient for flexible and intuitive behavioral design.

Submitted 25 June, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

Comments: arXiv admin note: substantial text overlap with arXiv:2008.02399

arXiv:2010.14750 [pdf, other]

Geometric Fabrics for the Acceleration-based Design of Robotic Motion

Authors: Mandy Xie, Karl Van Wyk, Anqi Li, Muhammad Asif Rana, Qian Wan, Dieter Fox, Byron Boots, Nathan Ratliff

Abstract: This paper describes the pragmatic design and construction of geometric fabrics for shaping a robot's task-independent nominal behavior, capturing behavioral components such as obstacle avoidance, joint limit avoidance, redundancy resolution, global navigation heuristics, etc. Geometric fabrics constitute the most concrete incarnation of a new mathematical formulation for reactive behavior called optimization fabrics. Fabrics generalize recent work on Riemannian Motion Policies (RMPs); they add provable stability guarantees and improve design consistency while promoting the intuitive acceleration-based principles of modular design that make RMPs successful. We describe a suite of mathematical modeling tools that practitioners can employ in practice and demonstrate both how to mitigate system complexity by constructing behaviors layer-wise and how to employ these tools to design robust, strongly-generalizing, policies that solve practical problems one would expect to find in industry applications. Our system exhibits intelligent global navigation behaviors expressed entirely as provably stable fabrics with zero planning or state machine governance.

Submitted 25 June, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2010.14745 [pdf, other]

Generalized Nonlinear and Finsler Geometry for Robotics

Authors: Nathan D. Ratliff, Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana

Abstract: Robotics research has found numerous important applications of Riemannian geometry. Despite that, the concepts remain challenging to many roboticists because the background material is complex and strikingly foreign. Beyond Riemannian geometry, there are many natural generalizations in the mathematical literature -- areas such as Finsler geometry and spray geometry -- but those generalizations are largely inaccessible, and as a result there remain few applications within robotics. This paper presents a re-derivation of spray and Finsler geometries we found critical for the development of our recent work on a powerful behavioral design tool we call geometric fabrics. These derivations build from basic tools in advanced calculus and the calculus of variations making them more accessible to a robotics audience than standard presentations. We focus on the pragmatic and calculable results, avoiding the use of tensor notation to appeal to a broader audience, emphasizing geometric path consistency over ideas around connections and curvature. We hope that these derivations will contribute to an increased understanding of generalized nonlinear, and even classical Riemannian, geometry within the robotics community and inspire future research into new applications.

Submitted 2 July, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

arXiv:2010.12065 [pdf]

A generalized deep learning model for multi-disease Chest X-Ray diagnostics

Authors: Nabit Bajwa, Kedar Bajwa, Atif Rana, M. Faique Shakeel, Kashif Haqqi, Suleiman Ali Khan

Abstract: We investigate the generalizability of deep convolutional neural network (CNN) on the task of disease classification from chest x-rays collected over multiple sites. We systematically train the model using datasets from three independent sites with different patient populations: National Institute of Health (NIH), Stanford University Medical Centre (CheXpert), and Shifa International Hospital (SIH). We formulate a sequential training approach and demonstrate that the model produces generalized prediction performance using held out test sets from the three sites. Our model generalizes better when trained on multiple datasets, with the CheXpert-Shifa-NET model performing significantly better (p-values < 0.05) than the models trained on individual datasets for 3 out of the 4 distinct disease classes. The code for training the model will be made available open source at: www.github.com/link-to-code at the time of publication.

Submitted 17 October, 2020; originally announced October 2020.

arXiv:2008.02399 [pdf, other]

Optimization Fabrics

Authors: Nathan D. Ratliff, Karl Van Wyk, Mandy Xie, Anqi Li, Muhammad Asif Rana

Abstract: This paper presents a theory of optimization fabrics, second-order differential equations that encode nominal behaviors on a space and can be used to define the behavior of a smooth optimizer. Optimization fabrics can encode commonalities among optimization problems that reflect the structure of the space itself, enabling smooth optimization processes to intelligently navigate each problem even when optimizing simple naive potential functions. Importantly, optimization over a fabric is inherently asymptotically stable. The majority of this paper is dedicated to the development of a tool set for the design and use of a broad class of fabrics called geometric fabrics. Geometric fabrics encode behavior as general nonlinear geometries which are covariant second-order differential equations with a special homogeneity property that ensures their behavior is independent of the system's speed through the medium. A class of Finsler Lagrangian energies can be used to both define how these nonlinear geometries combine with one another and how they react when potential functions force them from their nominal paths. Furthermore, these geometric fabrics are closed under the standard operations of pullback and combination on a transform tree.
For behavior representation, this class of geometric fabrics constitutes a broad class of spectral semi-sprays (specs), also known as Riemannian Motion Policies (RMPs) in the context of robotic motion generation, that captures both the intuitive separation between acceleration policy and priority metric critical for modular design and are inherently stable. Therefore, geometric fabrics are safe and easier to use by less experienced behavioral designers. Application of this theory to policy representation and generalization in learning are discussed as well.

Submitted 21 August, 2020; v1 submitted 5 August, 2020; originally announced August 2020.

arXiv:2008.01116 [pdf, other]

Sub-Pixel Back-Projection Network For Lightweight Single Image Super-Resolution

Authors: Supratik Banerjee, Cagri Ozcinar, Aakanksha Rana, Aljosa Smolic, Michael Manzke

Abstract: Convolutional neural network (CNN)-based methods have achieved great success for single-image superresolution (SISR). However, most models attempt to improve reconstruction accuracy while increasing the requirement of number of model parameters. To tackle this problem, in this paper, we study reducing the number of parameters and computational cost of CNN-based SISR methods while maintaining the accuracy of super-resolution reconstruction performance. To this end, we introduce a novel network architecture for SISR, which strikes a good trade-off between reconstruction quality and low computational complexity. Specifically, we propose an iterative back-projection architecture using sub-pixel convolution instead of deconvolution layers. We evaluate the performance of computational and reconstruction accuracy for our proposed model with extensive quantitative and qualitative evaluations. Experimental results reveal that our proposed method uses fewer parameters and reduces the computational cost while maintaining reconstruction accuracy against state-of-the-art SISR methods over well-known four SR benchmark datasets. Code is available at "https://github.com/supratikbanerjee/SubPixel-BackProjection_SuperResolution".

Submitted 3 August, 2020; originally announced August 2020.

Comments: To appear in IMVIP 2020

arXiv:2005.13143 [pdf, other]

Euclideanizing Flows: Diffeomorphic Reduction for Learning Stable Dynamical Systems

Authors: Muhammad Asif Rana, Anqi Li, Dieter Fox, Byron Boots, Fabio Ramos, Nathan Ratliff

Abstract: Robotic tasks often require motions with complex geometric structures. We present an approach to learn such motions from a limited number of human demonstrations by exploiting the regularity properties of human motions e.g. stability, smoothness, and boundedness. The complex motions are encoded as rollouts of a stable dynamical system, which, under a change of coordinates defined by a diffeomorphism, is equivalent to a simple, hand-specified dynamical system. As an immediate result of using diffeomorphisms, the stability property of the hand-specified dynamical system directly carries over to the learned dynamical system. Inspired by recent works in density estimation, we propose to represent the diffeomorphism as a composition of simple parameterized diffeomorphisms. Additional structure is imposed to provide guarantees on the smoothness of the generated motions. The efficacy of this approach is demonstrated through validation on an established benchmark as well as demonstrations collected on a real-world robotic system.

Submitted 21 September, 2020; v1 submitted 26 May, 2020; originally announced May 2020.

Comments: 2nd Annual Conference on Learning for Dynamics and Control (L4DC) 2020 -- Revised Version

arXiv:2004.11475 [pdf, other]

Gabriella: An Online System for Real-Time Activity Detection in Untrimmed Security Videos

Authors: Mamshad Nayeem Rizve, Ugur Demir, Praveen Tirupattur, Aayush Jung Rana, Kevin Duarte, Ishan Dave, Yogesh Singh Rawat, Mubarak Shah

Abstract: Activity detection in security videos is a difficult problem due to multiple factors such as large field of view, presence of multiple activities, varying scales and viewpoints, and its untrimmed nature. The existing research in activity detection is mainly focused on datasets, such as UCF-101, JHMDB, THUMOS, and AVA, which partially address these issues. The requirement of processing the security videos in real-time makes this even more challenging. In this work we propose Gabriella, a real-time online system to perform activity detection on untrimmed security videos. The proposed method consists of three stages: tubelet extraction, activity classification, and online tubelet merging. For tubelet extraction, we propose a localization network which takes a video clip as input and spatio-temporally detects potential foreground regions at multiple scales to generate action tubelets. We propose a novel Patch-Dice loss to handle large variations in actor size. Our online processing of videos at a clip level drastically reduces the computation time in detecting activities. The detected tubelets are assigned activity class scores by the classification network and merged together using our proposed Tubelet-Merge Action-Split (TMAS) algorithm to form the final action detections. The TMAS algorithm efficiently connects the tubelets in an online fashion to generate action detections which are robust against varying length activities.
We perform our experiments on the VIRAT and MEVA (Multiview Extended Video with Activities) datasets and demonstrate the effectiveness of the proposed approach in terms of speed (~100 fps) and performance with state-of-the-art results. The code and models will be made publicly available.

Submitted 19 May, 2020; v1 submitted 23 April, 2020; originally announced April 2020.

Comments: 9 pages

arXiv:2004.06674 [pdf, other]

Systematically designing better instance counting models on cell images with Neural Arithmetic Logic Units

Authors: Ashish Rana, Taranveer Singh, Harpreet Singh, Neeraj Kumar, Prashant Singh Rana

Abstract: The big problem for neural network models which are trained to count instances is that whenever the test range goes higher than the training range, generalization error increases, i.e., they are not good generalizers outside the training range. Consider the case of automating cell counting process where more dense images with higher cell counts are commonly encountered as compared to images used in training data. By making better predictions for higher ranges of cell count we are aiming to create better generalization systems for cell counting. With architecture proposal of neural arithmetic logic units (NALU) for arithmetic operations, task of counting has become feasible for higher numeric ranges which were not included in training data with better accuracy. As a part of our study we used these units and different other activation functions for learning cell counting task with two different architectures namely Fully Convolutional Regression Network and U-Net. These numerically biased units are added in the form of residual concatenated layers to original architectures and a comparative experimental study is done with these newly proposed changes. This comparative study is described in terms of optimizing regression loss problem from these models trained with extensive data augmentation techniques. We were able to achieve better results in our experiments of cell counting tasks with introduction of these numerically biased units to already existing architectures in the form of residual layer concatenation connections.
Our results confirm that the above-stated numerically biased units do help models to learn numeric quantities for better generalization results.

Submitted 15 June, 2020; v1 submitted 14 April, 2020; originally announced April 2020.

Comments: * code repository for project: https://github.com/ashishrana160796/nalu-cell-counting

arXiv:2001.10386 [pdf, other]

Taking Recoveries to Task: Recovery-Driven Development for Recipe-based Robot Tasks

Authors: Siddhartha Banerjee, Angel Daruna, David Kent, Weiyu Liu, Jonathan Balloch, Abhinav Jain, Akshay Krishnan, Muhammad Asif Rana, Harish Ravichandar, Binit Shah, Nithin Shrivatsav, Sonia Chernova

Abstract: Robot task execution when situated in real-world environments is fragile. As such, robot architectures must rely on robust error recovery, adding non-trivial complexity to highly-complex robot systems. To handle this complexity in development, we introduce Recovery-Driven Development (RDD), an iterative task scripting process that facilitates rapid task and recovery development by leveraging hierarchical specification, separation of nominal task and recovery development, and situated testing. We validate our approach with our challenge-winning mobile manipulator software architecture developed using RDD for the FetchIt! Challenge at the IEEE 2019 International Conference on Robotics and Automation. We attribute the success of our system to the level of robustness achieved using RDD, and conclude with lessons learned for developing such systems.

Submitted 28 January, 2020; originally announced January 2020.

Comments: Published and presented at International Symposium on Robotics Research (ISRR), 2019 in Hanoi, Vietnam

arXiv:1912.08868 [pdf, other]

Topic subject creation using unsupervised learning for topic modeling

Authors: Rashid Mehdiyev, Jean Nava, Karan Sodhi, Saurav Acharya, Annie Ibrahim Rana

Abstract: We describe the use of Non-Negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA) algorithms to perform topic mining and labelling applied to retail customer communications in an attempt to characterize the subject of customers' inquiries. In this paper we compare both algorithms in the topic mining performance and propose methods to assign topic subject labels in an automated way.

Submitted 18 December, 2019; originally announced December 2019.

arXiv:1911.02725 [pdf, other]

Benchmark for Skill Learning from Demonstration: Impact of User Experience, Task Complexity, and Start Configuration on Performance

Authors: M. Asif Rana, Daphne Chen, S. Reza Ahmadzadeh, Jacob Williams, Vivian Chu, Sonia Chernova

Abstract: In this work, we contribute a large-scale study benchmarking the performance of multiple motion-based learning from demonstration approaches. Given the number and diversity of existing methods, it is critical that comprehensive empirical studies be performed comparing the relative strengths of these learning techniques. In particular, we evaluate four different approaches based on properties an end user may desire for real-world tasks. To perform this evaluation, we collected data from nine participants, across four different manipulation tasks with varying starting conditions. The resulting demonstrations were used to train 180 task models and evaluated on 720 task reproductions on a physical robot. Our results detail how i) complexity of the task, ii) the expertise of the human demonstrator, and iii) the starting configuration of the robot affect task performance. The collected dataset of demonstrations, robot executions, and evaluations are being made publicly available. Research insights and guidelines are also provided to guide future research and deployment choices about these approaches.

Submitted 6 November, 2019; originally announced November 2019.

Comments: 8 pages, 8 figures, submitted to IEEE Robotics and Automation Letters, videos and website can be found at https://sites.google.com/view/rail-lfd

arXiv:1909.03613 [pdf, other]

DublinCity: Annotated LiDAR Point Cloud and its Applications

Authors: S. M. Iman Zolanvari, Susana Ruano, Aakanksha Rana, Alan Cummins, Rogerio Eduardo da Silva, Morteza Rahbar, Aljosa Smolic

Abstract: Scene understanding of full-scale 3D models of an urban area remains a challenging task. While advanced computer vision techniques offer cost-effective approaches to analyse 3D urban elements, a precise and densely labelled dataset is quintessential. The paper presents the first-ever labelled dataset for a highly dense Aerial Laser Scanning (ALS) point cloud at city-scale. This work introduces a novel benchmark dataset that includes a manually annotated point cloud for over 260 million laser scanning points into 100'000 (approx.) assets from Dublin LiDAR point cloud [12] in 2015. Objects are labelled into 13 classes using hierarchical levels of detail from large (i.e., building, vegetation and ground) to refined (i.e., window, door and tree) elements. To validate the performance of our dataset, two different applications are showcased. Firstly, the labelled point cloud is employed for training Convolutional Neural Networks (CNNs) to classify urban elements. The dataset is tested on the well-known state-of-the-art CNNs (i.e., PointNet, PointNet++ and So-Net). Secondly, the complete ALS dataset is applied as detailed ground truth for city-scale image-based 3D reconstruction.

Submitted 6 September, 2019; originally announced September 2019.

Comments: Accepted to the 30th British Machine Vision Conference

arXiv:1908.11310 [pdf, other]

Aesthetic Image Captioning From Weakly-Labelled Photographs

Authors: Koustav Ghosal, Aakanksha Rana, Aljosa Smolic

Abstract: Aesthetic image captioning (AIC) refers to the multi-modal task of generating critical textual feedback for photographs. While in natural image captioning (NIC), deep models are trained in an end-to-end manner using large curated datasets such as MS-COCO, no such large-scale, clean dataset exists for AIC. Towards this goal, we propose an automatic cleaning strategy to create a benchmarking AIC dataset, by exploiting the images and noisy comments easily available from photography websites. We propose a probabilistic caption-filtering method for cleaning the noisy web-data, and compile a large-scale, clean dataset, "AVA-Captions" (230,000 images with 5 captions per image). Additionally, by exploiting the latent associations between aesthetic attributes, we propose a strategy for training the convolutional neural network (CNN) based visual feature extractor, the first component of the AIC framework. The strategy is weakly supervised and can be effectively used to learn rich aesthetic representations, without requiring expensive ground-truth annotations. We finally showcase a thorough analysis of the proposed contributions using automatic metrics and subjective evaluations.

Submitted 29 August, 2019; originally announced August 2019.

Comments: International Workshop on Cross-Modal Learning in Real World, ICCV 2019

arXiv:1908.08505 [pdf, other]

ColorNet -- Estimating Colorfulness in Natural Images

Authors: Emin Zerman, Aakanksha Rana, Aljosa Smolic

Abstract: Measuring the colorfulness of a natural or virtual scene is critical for many applications in the image processing field, ranging from capturing to display. In this paper, we propose the first deep learning-based colorfulness estimation metric. For this purpose, we develop a color rating model which simultaneously learns to extract the pertinent characteristic color features and the mapping from feature space to the ideal colorfulness scores for a variety of natural colored images. Additionally, we propose to overcome the lack of an adequately annotated dataset by combining/aligning two publicly available colorfulness databases using the results of a new subjective test which employs a common subset of both databases. Using the obtained subjectively annotated dataset with 180 colored images, we finally demonstrate the efficacy of our proposed model over the traditional methods, both quantitatively and qualitatively.

Submitted 22 August, 2019; originally announced August 2019.

Comments: Accepted to IEEE International Conference on Image Processing (ICIP) 2019

arXiv:1908.06752 [pdf, other]

doi 10.1109/ICASSP.2019.8683318

Towards Generating Ambisonics Using Audio-Visual Cue for Virtual Reality

Authors: Aakanksha Rana, Cagri Ozcinar, Aljoscha Smolic

Abstract: Ambisonics, i.e., a full-sphere surround sound, is quintessential with 360-degree visual content to provide a realistic virtual reality (VR) experience. While 360-degree visual content capture gained a tremendous boost recently, the estimation of corresponding spatial sound is still challenging due to the required sound-field microphones or information about the sound-source locations. In this paper, we introduce a novel problem of generating Ambisonics in 360-degree videos using the audio-visual cue. With this aim, firstly, a novel 360-degree audio-visual video dataset of 265 videos is introduced with annotated sound-source locations. Secondly, a pipeline is designed for an automatic Ambisonic estimation problem. Benefiting from the deep learning-based audio-visual feature-embedding and prediction modules, our pipeline estimates the 3D sound-source locations and further uses these locations to encode to the B-format. To benchmark our dataset and pipeline, we additionally propose evaluation criteria to investigate the performance using different 360-degree input representations. Our results demonstrate the efficacy of the proposed pipeline and open up a new area of research in 360-degree audio-visual analysis for future investigations.

Submitted 16 August, 2019; originally announced August 2019.

Comments: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

arXiv:1908.04297 [pdf, other]

Super-resolution of Omnidirectional Images Using Adversarial Learning

Authors: Cagri Ozcinar, Aakanksha Rana, Aljosa Smolic

Abstract: An omnidirectional image (ODI) enables viewers to look in every direction from a fixed point through a head-mounted display providing an immersive experience compared to that of a standard image. Designing immersive virtual reality systems with ODIs is challenging as they require high resolution content. In this paper, we study super-resolution for ODIs and propose an improved generative adversarial network based model which is optimized to handle the artifacts obtained in the spherical observational space. Specifically, we propose to use a fast PatchGAN discriminator, as it needs fewer parameters and improves the super-resolution at a fine scale. We also explore the generative models with adversarial learning by introducing a spherical-content specific loss function, called 360-SS. To train and test the performance of our proposed model we prepare a dataset of 4500 ODIs. Our results demonstrate the efficacy of the proposed method and identify new challenges in ODI super-resolution for future investigations.

Submitted 12 August, 2019; originally announced August 2019.

arXiv:1908.04197 [pdf, other]

doi 10.1109/TIP.2019.2936649

Deep Tone Mapping Operator for High Dynamic Range Images

Authors: Aakanksha Rana, Praveer Singh, Giuseppe Valenzise, Frederic Dufaux, Nikos Komodakis, Aljosa Smolic

Abstract: A computationally fast tone mapping operator (TMO) that can quickly adapt to a wide spectrum of high dynamic range (HDR) content is quintessential for visualization on varied low dynamic range (LDR) output devices such as movie screens or standard displays. Existing TMOs can successfully tone-map only a limited range of HDR content and require extensive parameter tuning to yield the best subjective-quality tone-mapped output. In this paper, we address this problem by proposing a fast, parameter-free and scene-adaptable deep tone mapping operator (DeepTMO) that yields a high-resolution and high-subjective quality tone mapped output. Based on conditional generative adversarial network (cGAN), DeepTMO not only learns to adapt to vast scenic-content (e.g., outdoor, indoor, human, structures, etc.) but also tackles the HDR related scene-specific challenges such as contrast and brightness, while preserving the fine-grained details. We explore 4 possible combinations of Generator-Discriminator architectural designs to specifically address some prominent issues in HDR related deep-learning frameworks like blurring, tiling patterns and saturation artifacts. By exploring different influences of scales, loss-functions and normalization layers under a cGAN setting, we conclude with adopting a multi-scale model for our task. To further leverage the large-scale availability of unlabeled HDR data, we train our network by generating targets using an objective HDR quality metric, namely Tone Mapping Image Quality Index (TMQI). We demonstrate results both quantitatively and qualitatively, and showcase that our DeepTMO generates high-resolution, high-quality output images over a large spectrum of real-world scenes. Finally, we evaluate the perceived quality of our results by conducting a pair-wise subjective study which confirms the versatility of our method.

Submitted 12 August, 2019; originally announced August 2019.

arXiv:1908.01593 [pdf, other]

doi 10.1001/jamanetworkopen.2020.5111

High Accuracy Tumor Diagnoses and Benchmarking of Hematoxylin and Eosin Stained Prostate Core Biopsy Images Generated by Explainable Deep Neural Networks

Authors: Aman Rana, Alarice Lowe, Marie Lithgow, Katharine Horback, Tyler Janovitz, Annacarolina Da Silva, Harrison Tsai, Vignesh Shanmugam, Hyung-Jin Yoon, Pratik Shah

Abstract: Histopathological diagnosis of tumors in tissue biopsy after Hematoxylin and Eosin (H&E) staining is the gold standard for oncology care. H&E staining is slow and uses dyes, reagents and precious tissue samples that cannot be reused. Thousands of native nonstained RGB Whole Slide Image (RWSI) patches of prostate core tissue biopsies were registered with their H&E stained versions. Conditional Generative Adversarial Neural Networks (cGANs) that automate conversion of native nonstained RWSI to computational H&E stained images were then trained. High similarities between computational and H&E dye stained images with Structural Similarity Index (SSIM) 0.902, Pearson's Correlation Coefficient (CC) 0.962 and Peak Signal to Noise Ratio (PSNR) 22.821 dB were calculated. A second cGAN performed accurate computational destaining of H&E dye stained images back to their native nonstained form with SSIM 0.9, CC 0.963 and PSNR 25.646 dB. A single-blind study computed more than 95% pixel-by-pixel overlap between prostate tumor annotations on computationally stained images, provided by five board-certified MD pathologists, with those on H&E dye stained counterparts. We report the first visualization and explanation of neural network kernel activation maps during H&E staining and destaining of RGB images by cGANs. High similarities between kernel activation maps of computational and H&E stained images (Mean-Squared Errors <0.0005) provide additional mathematical and mechanistic validation of the staining system. Our neural network framework thus is automated, explainable and performs high precision H&E staining and destaining of low cost native RGB images, and is computer vision and physician authenticated for rapid and accurate tumor diagnoses.

Submitted 2 August, 2019; originally announced August 2019.

Journal ref: JAMA Network. 2020;3(5):e205111

arXiv:1903.11725 [pdf, other]

Skill Acquisition via Automated Multi-Coordinate Cost Balancing

Authors: Harish Ravichandar, S. Reza Ahmadzadeh, M. Asif Rana, Sonia Chernova

Abstract: We propose a learning framework, named Multi-Coordinate Cost Balancing (MCCB), to address the problem of acquiring point-to-point movement skills from demonstrations. MCCB encodes demonstrations simultaneously in multiple differential coordinates that specify local geometric properties. MCCB generates reproductions by solving a convex optimization problem with a multi-coordinate cost function and linear constraints on the reproductions, such as initial, target, and via points. Further, since the relative importance of each coordinate system in the cost function might be unknown for a given skill, MCCB learns optimal weighting factors that balance the cost function. We demonstrate the effectiveness of MCCB via detailed experiments conducted on one handwriting dataset and three complex skill datasets.

Submitted 27 March, 2019; originally announced March 2019.

Comments: Accepted for publication in proceedings of ICRA 2019

arXiv:1811.02659 [pdf, other]

doi 10.1109/ICMLA.2018.00097

Machine Learning Algorithms for Classification of Microcirculation Images from Septic and Non-Septic Patients

Authors: Perikumar Javia, Aman Rana, Nathan Shapiro, Pratik Shah

Abstract: Sepsis is a life-threatening disease and one of the major causes of death in hospitals. Imaging of microcirculatory dysfunction is a promising approach for automated diagnosis of sepsis. We report a machine learning classifier capable of distinguishing non-septic and septic images from dark field microcirculation videos of patients. The classifier achieves an accuracy of 89.45%. The area under the receiver operating characteristic curve of the classifier was 0.92, the precision was 0.92 and the recall was 0.84. Codes representing the learned feature space of the trained classifier were visualized using t-SNE embedding and were separable and distinguished between images from critically ill and non-septic patients. Using an unsupervised convolutional autoencoder, independent of the clinical diagnosis, we also report clustering of learned features from a compressed representation associated with healthy images and those with microcirculatory dysfunction. The feature space used by our trained classifier to distinguish between images from septic and non-septic patients has potential diagnostic application.

Submitted 20 February, 2019; v1 submitted 24 October, 2018; originally announced November 2018.

Comments: Accepted for publication at 2018 IEEE International Conference on Machine Learning and Applications (IEEE ICMLA)

arXiv:1811.02642 [pdf, other]

doi 10.1109/ICMLA.2018.00133

Computational Histological Staining and Destaining of Prostate Core Biopsy RGB Images with Generative Adversarial Neural Networks

Authors: Aman Rana, Gregory Yauney, Alarice Lowe, Pratik Shah

Abstract: Histopathology tissue samples are widely available in two states: paraffin-embedded unstained and non-paraffin-embedded stained whole slide RGB images (WSRI). Hematoxylin and eosin stain (H&E) is one of the principal stains in histology but suffers from several shortcomings related to tissue preparation, staining protocols, slowness and human error. We report two novel approaches for training machine learning models for the computational H&E staining and destaining of prostate core biopsy RGB images. The staining model uses a conditional generative adversarial network that learns hierarchical non-linear mappings between whole slide RGB image (WSRI) pairs of prostate core biopsy before and after H&E staining. The trained staining model can then generate computationally H&E-stained prostate core WSRIs using previously unseen non-stained biopsy images as input. The destaining model, by learning mappings between an H&E stained WSRI and a non-stained WSRI of the same biopsy, can computationally destain previously unseen H&E-stained images. Structural and anatomical details of prostate tissue and colors, shapes, geometries, locations of nuclei, stroma, vessels, glands and other cellular components were generated by both models with structural similarity indices of 0.68 (staining) and 0.84 (destaining). The proposed staining and destaining models can engender computational H&E staining and destaining of WSRI biopsies without additional equipment and devices.

Submitted 20 February, 2019; v1 submitted 26 October, 2018; originally announced November 2018.

Comments: Accepted for publication at 2018 IEEE International Conference on Machine Learning and Applications (ICMLA)

arXiv:1810.10664 [pdf, other]

Automated Process Incorporating Machine Learning Segmentation and Correlation of Oral Diseases with Systemic Health

Authors: Gregory Yauney, Aman Rana, Lawrence C. Wong, Perikumar Javia, Ali Muftu, Pratik Shah

Abstract: Imaging fluorescent disease biomarkers in tissues and skin is a non-invasive method to screen for health conditions. We report an automated process that combines intraoral fluorescent porphyrin biomarker imaging, clinical examinations and machine learning for correlation of systemic health conditions with periodontal disease. 1215 intraoral fluorescent images, from 284 consenting adults aged 18-90, were analyzed using a machine learning classifier that can segment periodontal inflammation. The classifier achieved an AUC of 0.677 with precision and recall of 0.271 and 0.429, respectively, indicating a learned association between disease signatures in collected images. Periodontal diseases were more prevalent among males (p=0.0012) and older subjects (p=0.0224) in the screened population. Physicians independently examined the collected images, assigning localized modified gingival indices (MGIs). MGIs and periodontal disease were then cross-correlated with responses to a medical history questionnaire, blood pressure and body mass index measurements, and optic nerve, tympanic membrane, neurological, and cardiac rhythm imaging examinations. Gingivitis and early periodontal disease were associated with subjects diagnosed with optic nerve abnormalities (p<0.0001) in their retinal scans. We also report significant co-occurrences of periodontal disease in subjects reporting swollen joints (p=0.0422) and a family history of eye disease (p=0.0337). These results indicate cross-correlation of poor periodontal health with systemic health outcomes and stress the importance of oral health screenings at the primary care level. Our screening process and analysis method, using images and machine learning, can be generalized for automated diagnoses and systemic health screenings for other diseases.

Submitted 24 October, 2018; originally announced October 2018.

Comments: Submitted to IEEE Journal of Biomedical and Health Informatics, 2018

arXiv:1808.00349 [pdf, other]

Learning Generalizable Robot Skills from Demonstrations in Cluttered Environments

Authors: Muhammad Asif Rana, Mustafa Mukadam, Seyed Reza Ahmadzadeh, Sonia Chernova, Byron Boots

Abstract: Learning from Demonstration (LfD) is a popular approach to endowing robots with skills without having to program them by hand. Typically, LfD relies on human demonstrations in clutter-free environments. This prevents the demonstrations from being affected by irrelevant objects, whose influence can obfuscate the true intention of the human or the constraints of the desired skill. However, it is unrealistic to assume that the robot's environment can always be restructured to remove clutter when capturing human demonstrations. To contend with this problem, we develop an importance weighted batch and incremental skill learning approach, building on a recent inference-based technique for skill representation and reproduction. Our approach reduces unwanted environmental influences on the learned skill, while still capturing the salient human behavior. We provide both batch and incremental versions of our approach and validate our algorithms on a 7-DOF JACO2 manipulator with reaching and placing skills.

Submitted 3 August, 2018; v1 submitted 1 August, 2018; originally announced August 2018.

Comments: 6 pages, 9 figures, accepted in International Conference on Intelligent Robots & Systems (IROS), 2018

arXiv:1705.00218 [pdf]

A floating point division unit based on Taylor-Series expansion algorithm and Iterative Logarithmic Multiplier

Authors: Riyansh K. Karani, Akash K. Rana, Dhruv H. Reshamwala, Kishore Saldanha

Abstract: Floating point division, although an infrequent operation in the traditional sense, is indispensable for a range of non-traditional applications such as K-Means Clustering and QR Decomposition, to name just a few. In such applications, hardware support for floating point division would boost the performance of the entire system. In this paper, we present a novel architecture for a floating point division unit based on the Taylor-series expansion algorithm. We show that the Iterative Logarithmic Multiplier is very well suited to be used as a part of this architecture. We propose an implementation of the powering unit that can calculate an odd power and an even power of a number simultaneously, while having little hardware overhead when compared to the Iterative Logarithmic Multiplier.

Submitted 29 April, 2017; originally announced May 2017.

Comments: NeCoM, CSITEC - 2016

arXiv:1701.08546 [pdf, ps, other]

Survey on Models and Techniques for Root-Cause Analysis

Authors: Marc Solé, Victor Muntés-Mulero, Annie Ibrahim Rana, Giovani Estrada

Abstract: Automation and computer intelligence to support complex human decisions become essential to manage large and distributed systems in the Cloud and IoT era. Understanding the root cause of an observed symptom in a complex system has been a major problem for decades. As industry dives into the IoT world and the amount of data generated per year grows at an amazing speed, an important question is how to find appropriate mechanisms to determine root causes that can handle huge amounts of data or may provide valuable feedback in real-time. While many survey papers aim at summarizing the landscape of techniques for modelling system behavior and inferring the root cause of a problem based on the resulting models, none of those focuses on analyzing how the different techniques in the literature fit growing requirements in terms of performance and scalability. In this survey, we provide a review of root-cause analysis, focusing on these particular aspects. We also provide guidance to choose the best root-cause analysis strategy depending on the requirements of a particular system and application.

Submitted 3 July, 2017; v1 submitted 30 January, 2017; originally announced January 2017.

Comments: 18 pages, 222 references

arXiv:1512.02332 [pdf, ps, other]

$(1-2u^k)$-constacyclic codes over $\mathbb{F}_p+u\mathbb{F}_p+u^2\mathbb{F}_p+u^{3}\mathbb{F}_{p}+\dots+u^{k}\mathbb{F}_{p}$

Authors: Zahid Raza, Amrina Rana

Abstract: Let $\mathbb{F}_p$ be a finite field and $u$ be an indeterminate. This article studies $(1-2u^k)$-constacyclic codes over the ring $\mathcal{R}=\mathbb{F}_p+u\mathbb{F}_p+u^2\mathbb{F}_p+u^{3}\mathbb{F}_{p}+\cdots+u^{k}\mathbb{F}_{p}$ where $u^{k+1}=u$. We illustrate the generator polynomials and investigate the structural properties of these codes via a decomposition theorem.

Submitted 17 April, 2019; v1 submitted 8 December, 2015; originally announced December 2015.

arXiv:1412.6359 [pdf]

An Empirical Study on Refactoring Activity

Authors: Mohammad Iftekharul Hoque, Vijay Nag Ranga, Anurag Reddy Pedditi, Rachitha Srinath, Md Ali Ahsan Rana, Md Eftakhairul Islam, Afshin Somani

Abstract: This paper reports an empirical study on refactoring activity in three Java software systems. We investigated some questions on refactoring activity, to confirm or refute conclusions that have been drawn from previous empirical studies. Unlike previous empirical studies, our study found that it is not always true that there are more refactoring activities before a major project release date than after. In contrast, we were able to confirm that software developers perform different types of refactoring operations on test code and production code, that specific developers are responsible for refactorings in the project, and that refactoring edits are not very well tested. Further, floss refactoring is more popular among the developers, refactoring activity is frequent in the projects, and the majority of bad smells, once they occur, persist up to the latest version of the system. By confirming assumptions made by other researchers, we can have greater confidence that those research conclusions are generalizable.

Submitted 17 December, 2014; originally announced December 2014.

Comments: 11 pages, 9 figures, 1 table

ACM Class: D.2; K.6; H.5.2

arXiv:1407.0697 [pdf]

How to Track Online SLA

Authors: Anuradha Rana, Pratima Sharma

Abstract: A service level agreement (SLA) is defined by an organization to fulfil its client requirements; it specifies the time within which the deliverables should be turned over to the clients. SLA tracking can be done manually by checking the status and priority of each task. Manual SLA tracking takes time, as one has to go over each and every task that needs to be completed. For instance, suppose you ordered a product from a website, are not happy with its quality, and want to replace it on an urgent basis. You send mail to the customer support department; the query/complaint is submitted to a queue and processed on the basis of its priority and urgency (the SLAs for responding to customers' concerns are listed in the policy). This online SLA tracking system will ensure that no queries/complaints are missed and that they are processed in an organized manner according to their priority and the date by which they should be handled. The portal will provide the status of the complaints for that particular day and of those that have been pending since the previous week. The information can be refreshed as per the client's needs (the time frame within which the complaint should be addressed).

Submitted 2 July, 2014; originally announced July 2014.

Showing 1–50 of 52 results for author: Rana, A