-
Lung-CADex: Fully automatic Zero-Shot Detection and Classification of Lung Nodules in Thoracic CT Images
Authors:
Furqan Shaukat,
Syed Muhammad Anwar,
Abhijeet Parida,
Van Khanh Lam,
Marius George Linguraru,
Mubarak Shah
Abstract:
Lung cancer has been one of the major threats to human life for decades. Computer-aided diagnosis can help with early lung nodule detection and facilitate subsequent nodule characterization. Large Visual Language models (VLMs) have been found effective for multiple downstream medical tasks that rely on both imaging and text data. However, lesion level detection and subsequent diagnosis using VLMs have not been explored yet. We propose CADe, for segmenting lung nodules in a zero-shot manner using a variant of the Segment Anything Model called MedSAM. CADe trains on a prompt suite on input computed tomography (CT) scans by using the CLIP text encoder through prefix tuning. We also propose CADx, a method for nodule characterization as benign/malignant by making a gallery of radiomic features and aligning image-feature pairs through contrastive learning. Training and validation of CADe and CADx have been done using one of the largest publicly available datasets, called LIDC. To check the generalization ability of the model, it is also evaluated on a challenging dataset, LUNGx. Our experimental results show that the proposed methods achieve a sensitivity of 0.86, compared with 0.76 for other fully supervised methods. The source code, datasets and pre-processed data can be accessed using the link:
Submitted 2 July, 2024;
originally announced July 2024.
-
Xi-Net: Transformer Based Seismic Waveform Reconstructor
Authors:
Anshuman Gaharwar,
Parth Parag Kulkarni,
Joshua Dickey,
Mubarak Shah
Abstract:
Missing/erroneous data is a major problem in today's world. Collected seismic data sometimes contain gaps due to a multitude of reasons, such as interference and sensor malfunction. Gaps in seismic waveforms hamper further signal processing to gain valuable information. A plethora of techniques are used for data reconstruction in other domains like image, video, and audio, but translating those methods to seismic waveforms demands adapting them to lengthy sequence inputs, which is practically complex. Even if that is accomplished, high computational costs and inefficiency would still persist in these predominantly convolution-based reconstruction models. In this paper, we present a transformer-based deep learning model, Xi-Net, which utilizes multi-faceted time and frequency domain inputs for accurate waveform reconstruction. Xi-Net converts the input waveform to the frequency domain, employs separate encoders for the time and frequency domains, and uses one decoder to obtain the reconstructed output waveform from the fused features. 1D shifted-window transformer blocks form the elementary units of all parts of the model. To the best of our knowledge, this is the first transformer-based deep learning model for seismic waveform reconstruction. We demonstrate this model's prowess by filling 0.5-1s random gaps in 120s waveforms, resembling the original waveform quite closely. The code and models can be found at: https://github.com/Anshuman04/waveformReconstructor.
Submitted 14 June, 2024;
originally announced June 2024.
-
SoccerNet-Echoes: A Soccer Game Audio Commentary Dataset
Authors:
Sushant Gautam,
Mehdi Houshmand Sarkhoosh,
Jan Held,
Cise Midoglu,
Anthony Cioppa,
Silvio Giancola,
Vajira Thambawita,
Michael A. Riegler,
Pål Halvorsen,
Mubarak Shah
Abstract:
The application of Automatic Speech Recognition (ASR) technology in soccer offers numerous opportunities for sports analytics. Specifically, extracting audio commentaries with ASR provides valuable insights into the events of the game, and opens the door to several downstream applications such as automatic highlight generation. This paper presents SoccerNet-Echoes, an augmentation of the SoccerNet dataset with automatically generated transcriptions of audio commentaries from soccer game broadcasts, enhancing video content with rich layers of textual information derived from the game audio using ASR. These textual commentaries, generated using the Whisper model and translated with Google Translate, extend the usefulness of the SoccerNet dataset in diverse applications such as enhanced action spotting, automatic caption generation, and game summarization. By incorporating textual data alongside visual and auditory content, SoccerNet-Echoes aims to serve as a comprehensive resource for the development of algorithms specialized in capturing the dynamics of soccer games. We detail the methods involved in the curation of this dataset and the integration of ASR. We also highlight the implications of a multimodal approach in sports analytics, and how the enriched dataset can support diverse applications, thus broadening the scope of research and development in the field of sports analytics.
Submitted 12 May, 2024;
originally announced May 2024.
-
Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images
Authors:
Fatema Tuj Johora Faria,
Mukaffi Bin Moin,
Pronay Debnath,
Asif Iftekher Fahim,
Faisal Muhammad Shah
Abstract:
Our research focuses on the critical field of early diagnosis of disease by examining retinal blood vessels in fundus images. While automatic segmentation of retinal blood vessels holds promise for early detection, accurate analysis remains challenging due to the limitations of existing methods, which often lack discrimination power and are susceptible to influences from pathological regions. Our research in fundus image analysis advances deep learning-based classification using eight pre-trained CNN models. To enhance interpretability, we utilize Explainable AI techniques such as Grad-CAM, Grad-CAM++, Score-CAM, Faster Score-CAM, and Layer CAM. These techniques illuminate the decision-making processes of the models, fostering transparency and trust in their predictions. Expanding our exploration, we investigate ten models, including TransUNet with ResNet backbones, Attention U-Net with DenseNet and ResNet backbones, and Swin-UNET. Incorporating diverse architectures such as ResNet50V2, ResNet101V2, ResNet152V2, and DenseNet121 among others, this comprehensive study deepens our insights into attention mechanisms for enhanced fundus image analysis. Among the evaluated models for fundus image classification, ResNet101 emerged with the highest accuracy, achieving an impressive 94.17%. On the other end of the spectrum, EfficientNetB0 exhibited the lowest accuracy among the models, achieving a score of 88.33%. Furthermore, in the domain of fundus image segmentation, Swin-Unet demonstrated a Mean Pixel Accuracy of 86.19%, showcasing its effectiveness in accurately delineating regions of interest within fundus images. Conversely, Attention U-Net with DenseNet201 backbone exhibited the lowest Mean Pixel Accuracy among the evaluated models, achieving a score of 75.87%.
Submitted 12 May, 2024;
originally announced May 2024.
-
The Brain Tumor Segmentation in Pediatrics (BraTS-PEDs) Challenge: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)
Authors:
Anahita Fathi Kazerooni,
Nastaran Khalili,
Deep Gandhi,
Xinyang Liu,
Zhifan Jiang,
Syed Muhammed Anwar,
Jake Albrecht,
Maruf Adewole,
Udunna Anazodo,
Hannah Anderson,
Sina Bagheri,
Ujjwal Baid,
Timothy Bergquist,
Austin J. Borja,
Evan Calabrese,
Verena Chung,
Gian-Marco Conte,
Farouk Dako,
James Eddy,
Ivan Ezhov,
Ariana Familiar,
Keyvan Farahani,
Anurag Gottipati,
Debanjan Haldar,
Shuvanjan Haldar
, et al. (51 additional authors not shown)
Abstract:
Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. Here we present the CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs challenge, focused on pediatric brain tumors with data acquired across multiple international consortia dedicated to pediatric neuro-oncology and clinical trials. The CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs challenge brings together clinicians and AI/imaging scientists to lead to faster development of automated segmentation techniques that could benefit clinical trials, and ultimately the care of children with brain tumors.
Submitted 29 April, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
Authors:
Muhammad A. Shah,
David Solans Noguero,
Mikko A. Heikkila,
Nicolas Kourtellis
Abstract:
As Automatic Speech Recognition (ASR) models become ever more pervasive, it is important to ensure that they make reliable predictions under corruptions present in the physical and digital world. We propose Speech Robust Bench (SRB), a comprehensive benchmark for evaluating the robustness of ASR models to diverse corruptions. SRB is composed of 69 input perturbations which are intended to simulate various corruptions that ASR models may encounter in the physical and digital world. We use SRB to evaluate the robustness of several state-of-the-art ASR models and observe that model size and certain modeling choices, such as discrete representations and self-training, appear to be conducive to robustness. We extend this analysis to measure the robustness of ASR models on data from various demographic subgroups, namely English and Spanish speakers, and males and females, and observe noticeable disparities in the models' robustness across subgroups. We believe that SRB will facilitate future research towards robust ASR models, by making it easier to conduct comprehensive and comparable robustness evaluations.
Submitted 8 March, 2024;
originally announced March 2024.
-
Dynamic Loco-manipulation on HECTOR: Humanoid for Enhanced ConTrol and Open-source Research
Authors:
Junheng Li,
Junchao Ma,
Omar Kolt,
Manas Shah,
Quan Nguyen
Abstract:
Despite their remarkable advancement in locomotion and manipulation, humanoid robots remain challenged by a lack of synchronized loco-manipulation control, hindering their full dynamic potential. In this work, we introduce a versatile and effective approach to controlling and generalizing dynamic locomotion and loco-manipulation on humanoid robots via Force-and-moment-based Model Predictive Control (MPC). Specifically, we propose a simplified rigid body dynamics (SRBD) model that takes into account both humanoid and object dynamics for humanoid loco-manipulation. This linear dynamics model allows us to directly solve for ground reaction forces and moments via an MPC problem to achieve highly dynamic real-time control. Our proposed framework is highly versatile and generalizable. We introduce the HECTOR (Humanoid for Enhanced ConTrol and Open-source Research) platform to demonstrate its effectiveness in hardware experiments. With the proposed framework, HECTOR can maintain exceptional balance during double-leg stance mode, even when subjected to external force disturbances to the body or foot location. In addition, it can execute 3-D dynamic walking on a variety of uneven terrains, including wet grassy surfaces, slopes, randomly placed wood slats, and stacked wood slats up to 6 cm high, at a speed of 0.6 m/s. We have also demonstrated dynamic humanoid loco-manipulation over uneven terrain while carrying a 2.5 kg load. HECTOR simulations, along with the proposed control framework, are made available as an open-source project (https://github.com/DRCL-USC/Hector_Simulation).
Submitted 21 December, 2023; v1 submitted 19 December, 2023;
originally announced December 2023.
-
Impact of Urban Street Geometry on the Detection Probability of Automotive Radars
Authors:
Mohammad Taha Shah,
Ankit Kumar,
Gourab Ghatak,
Shobha Sundar Ram
Abstract:
Prior works have analyzed the performance of millimeter wave automotive radars in the presence of diverse clutter and interference scenarios using stochastic geometry tools instead of more time-consuming measurement studies or system-level simulations. In these works, the distributions of radars or discrete clutter scatterers were modeled as Poisson point processes in the Euclidean space. However, since most automotive radars are likely to be mounted on vehicles and road infrastructure, road geometries are an important factor that must be considered. Instead of considering each road geometry as an individual case for study, in this work, we model each case as a specific instance of an underlying Poisson line process and further model the distribution of vehicles on the road as a Poisson point process - forming a Poisson line Cox process. Then, through the use of stochastic geometry tools, we estimate the average number of interfering radars for specific road and vehicular densities and the effect of radar parameters such as noise and beamwidth on the radar detection metrics. The numerical results are validated with Monte Carlo simulations.
Submitted 9 December, 2023;
originally announced December 2023.
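The Poisson line Cox process described in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' code: the function name `sample_plcp`, the disk-shaped observation window, and the `(theta, r)` line parameterization with `line_density` as the intensity on the strip `[0, pi) x [-radius, radius]` are all choices made here for illustration.

```python
import numpy as np


def sample_plcp(line_density, vehicle_density, radius, rng):
    """Sample a Poisson line Cox process inside a disk (illustrative sketch).

    Assumed parameterization: each road is a line (theta, r), where theta in
    [0, pi) is the angle of its normal and r in [-radius, radius] is its signed
    distance from the origin. line_density is the intensity on that strip.
    """
    # Poisson number of roads hitting the disk: mean = density * strip area.
    n_lines = rng.poisson(line_density * np.pi * 2.0 * radius)
    thetas = rng.uniform(0.0, np.pi, n_lines)
    offsets = rng.uniform(-radius, radius, n_lines)

    points = []
    for theta, r in zip(thetas, offsets):
        half_chord = np.sqrt(radius**2 - r**2)
        # Vehicles on this road: a 1D Poisson point process along the chord.
        n_vehicles = rng.poisson(vehicle_density * 2.0 * half_chord)
        t = rng.uniform(-half_chord, half_chord, n_vehicles)
        # Foot of the perpendicular plus a displacement along the road.
        x = r * np.cos(theta) - t * np.sin(theta)
        y = r * np.sin(theta) + t * np.cos(theta)
        points.append(np.column_stack([x, y]))

    vehicles = np.vstack(points) if points else np.empty((0, 2))
    return thetas, offsets, vehicles


rng = np.random.default_rng(0)
thetas, offsets, vehicles = sample_plcp(0.05, 0.1, 100.0, rng)
print(len(thetas), "roads,", len(vehicles), "vehicles")
```

Repeating such draws and counting the vehicles that fall inside a radar's beam is one way a Monte Carlo check of an analytical interferer count could be set up; the beam model itself is not specified in the abstract and is left out here.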
-
L-WaveBlock: A Novel Feature Extractor Leveraging Wavelets for Generative Adversarial Networks
Authors:
Mirat Shah,
Vansh Jain,
Anmol Chokshi,
Guruprasad Parasnis,
Pramod Bide
Abstract:
Generative Adversarial Networks (GANs) have risen to prominence in the field of deep learning, facilitating the generation of realistic data from random noise. The effectiveness of GANs often depends on the quality of feature extraction, a critical aspect of their architecture. This paper introduces L-WaveBlock, a novel and robust feature extractor that combines the Discrete Wavelet Transform (DWT) with deep learning methodologies. L-WaveBlock is designed to speed up the convergence of GAN generators while simultaneously enhancing their performance. The paper demonstrates the utility of L-WaveBlock across three datasets: a road satellite imagery dataset, the CelebA dataset and the GoPro dataset, showcasing its ability to make feature extraction easier and more efficient. By utilizing DWT, L-WaveBlock efficiently captures intricate structural and textural details, and further partitions feature maps into orthogonal subbands across multiple scales while preserving essential information. Employing L-WaveBlock not only leads to faster convergence but also gives competent results on every dataset. The proposed method achieves an Inception Score of 3.6959 and a Structural Similarity Index of 0.4261 on the maps dataset, and a Peak Signal-to-Noise Ratio of 29.05 and a Structural Similarity Index of 0.874 on the CelebA dataset. On the image denoising dataset, the proposed method is competitive with the state of the art, albeit not better, while still converging faster than conventional methods. With this, L-WaveBlock emerges as a robust and efficient tool for enhancing GAN-based image generation, demonstrating superior convergence speed and competitive performance across multiple datasets for image resolution, image generation and image denoising.
Submitted 9 November, 2023;
originally announced November 2023.
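To make the idea of partitioning a feature map into orthogonal subbands concrete, here is a minimal single-level 2D DWT sketch. The Haar wavelet is an assumption chosen for simplicity (the abstract does not name the wavelet), and the function names `haar_dwt2` and `haar_idwt2` are hypothetical; this is not the L-WaveBlock implementation.

```python
import numpy as np


def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (height and width must be even).

    Returns four half-resolution subbands: the low-pass approximation and
    three detail bands (row differences, column differences, diagonal).
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0
    row_diff = (a + b - c - d) / 2.0     # top row minus bottom row
    col_diff = (a - b + c - d) / 2.0     # left column minus right column
    diag = (a - b - c + d) / 2.0
    return approx, row_diff, col_diff, diag


def haar_idwt2(approx, row_diff, col_diff, diag):
    """Inverse of haar_dwt2: reassemble the full-resolution array."""
    h, w = approx.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (approx + row_diff + col_diff + diag) / 2.0
    x[0::2, 1::2] = (approx + row_diff - col_diff - diag) / 2.0
    x[1::2, 0::2] = (approx - row_diff + col_diff - diag) / 2.0
    x[1::2, 1::2] = (approx - row_diff - col_diff + diag) / 2.0
    return x


feature_map = np.random.default_rng(0).standard_normal((8, 8))
bands = haar_dwt2(feature_map)
# Orthonormality means total energy is preserved and the map is recoverable.
print(np.allclose(sum((s**2).sum() for s in bands), (feature_map**2).sum()))
print(np.allclose(haar_idwt2(*bands), feature_map))
```

Applying `haar_dwt2` again to the approximation band would give the next, coarser scale of a multi-scale decomposition.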
-
A Multi-Agent Systems Approach for Peer-to-Peer Energy Trading in Dairy Farming
Authors:
Mian Ibad Ali Shah,
Abdul Wahid,
Enda Barrett,
Karl Mason
Abstract:
To achieve desired carbon emission reductions, integrating renewable generation and accelerating the adoption of peer-to-peer energy trading is crucial. This is especially important for energy-intensive farming, like dairy farming. However, integrating renewables and peer-to-peer trading presents challenges. To address this, we propose the Multi-Agent Peer-to-Peer Dairy Farm Energy Simulator (MAPDES), enabling dairy farms to participate in peer-to-peer markets. Our strategy reduces electricity costs and peak demand by approximately 30% and 24% respectively, while increasing energy sales by 37% compared to the baseline scenario without P2P trading. This demonstrates the effectiveness of our approach.
Submitted 21 August, 2023;
originally announced October 2023.
-
A Review on AI Algorithms for Energy Management in E-Mobility Services
Authors:
Sen Yan,
Maqsood Hussain Shah,
Ji Li,
Noel O'Connor,
Mingming Liu
Abstract:
E-mobility, or electric mobility, has emerged as a pivotal solution to address pressing environmental and sustainability concerns in the transportation sector. The depletion of fossil fuels, escalating greenhouse gas emissions, and the imperative to combat climate change underscore the significance of transitioning to electric vehicles (EVs). This paper seeks to explore the potential of artificial intelligence (AI) in addressing various challenges related to effective energy management in e-mobility systems (EMS). These challenges encompass critical factors such as range anxiety, charge rate optimization, and the longevity of energy storage in EVs. By analyzing existing literature, we delve into the role that AI can play in tackling these challenges and enabling efficient energy management in EMS. Our objectives are twofold: to provide an overview of the current state-of-the-art in this research domain and propose effective avenues for future investigations. Through this analysis, we aim to contribute to the advancement of sustainable and efficient e-mobility solutions, shaping a greener and more sustainable future for transportation.
Submitted 26 September, 2023;
originally announced September 2023.
-
Egocentric RGB+Depth Action Recognition in Industry-Like Settings
Authors:
Jyoti Kini,
Sarah Fleischer,
Ishan Dave,
Mubarak Shah
Abstract:
Action recognition from an egocentric viewpoint is a crucial perception task in robotics and enables a wide range of human-robot interactions. While most computer vision approaches prioritize the RGB camera, the Depth modality - which can further amplify the subtleties of actions from an egocentric perspective - remains underexplored. Our work focuses on recognizing actions from egocentric RGB and Depth modalities in an industry-like environment. To study this problem, we consider the recent MECCANO dataset, which provides a wide range of assembling actions. Our framework is based on the 3D Video SWIN Transformer to encode both RGB and Depth modalities effectively. To address the inherent skewness in real-world multimodal action occurrences, we propose a training strategy using an exponentially decaying variant of the focal loss modulating factor. Additionally, to leverage the information in both RGB and Depth modalities, we opt for late fusion to combine the predictions from each modality. We thoroughly evaluate our method on the action recognition task of the MECCANO dataset, and it significantly outperforms the prior work. Notably, our method also secured first place at the multimodal action recognition challenge at ICIAP 2023.
Submitted 25 September, 2023;
originally announced September 2023.
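The abstract above mentions a focal loss whose modulating factor decays exponentially, plus late fusion of the RGB and Depth predictions, without giving formulas. The sketch below is one possible reading, not the authors' method: the schedule `decayed_gamma` (with its `gamma_start`, `gamma_end`, `num_epochs` parameters) and the equal-weight averaging in `late_fusion` are assumptions introduced here for illustration.

```python
import numpy as np


def decayed_gamma(epoch, gamma_start=2.0, gamma_end=0.1, num_epochs=50):
    """Hypothetical schedule: the focusing parameter decays exponentially
    from gamma_start at epoch 0 to gamma_end at the last epoch."""
    frac = epoch / max(num_epochs - 1, 1)
    return gamma_start * (gamma_end / gamma_start) ** frac


def focal_loss(probs, labels, gamma):
    """Mean focal loss -(1 - p_t)^gamma * log(p_t) over a batch.

    probs: (N, C) predicted class probabilities; labels: (N,) integer classes.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = np.clip(probs[np.arange(len(labels)), labels], 1e-12, 1.0)
    return float(np.mean(-((1.0 - p_t) ** gamma) * np.log(p_t)))


def late_fusion(probs_rgb, probs_depth):
    """Assumed fusion rule: equal-weight average of per-modality probabilities."""
    return 0.5 * (probs_rgb + probs_depth)


probs_rgb = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]])
probs_depth = np.array([[0.5, 0.4, 0.1], [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
for epoch in (0, 25, 49):
    gamma = decayed_gamma(epoch)
    print(epoch, round(gamma, 3), focal_loss(probs_rgb, labels, gamma))
print(late_fusion(probs_rgb, probs_depth).argmax(axis=1))
```

Under this reading, early epochs strongly down-weight easy examples (large gamma), and training gradually approaches plain cross-entropy as gamma shrinks.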
-
Achievable Sum-rate of variants of QAM over Gaussian Multiple Access Channel with and without security
Authors:
Shifa Showkat,
Zahid Bashir Dar,
Shahid Mehraj Shah
Abstract:
The performance of next generation wireless systems (5G/6G and beyond) at the physical layer is primarily driven by the choice of digital modulation techniques that are bandwidth and power efficient, while maintaining high data rates. Achievable rates for Gaussian input and some finite constellations (BPSK/QPSK/QAM) are well studied in the literature. However, new variants of Quadrature Amplitude Modulation (QAM) such as Cross-QAM (X-QAM), Star-QAM (S-QAM), Amplitude and phase shift keying (APSK), and Hexagonal Quadrature Amplitude Modulation (H-QAM) have not been studied in the context of achievable rates for meeting the demand of high data rates. In this paper, we study the achievable rate region for different variants of M-QAM: X-QAM, H-QAM, S-QAM and APSK. We also compute the mutual information corresponding to the sum rate of the Gaussian Multiple Access Channel (G-MAC) for a hybrid constellation scheme, e.g., user 1 transmits using S-QAM and user 2 using H-QAM. From the results, it is observed that S-QAM gives the maximum sum-rate when users transmit the same constellation. Also, it has been found that when a hybrid constellation is used, the combination of S-QAM and H-QAM gives the maximum rate. In the next part of the paper, we consider a scenario wherein an adversary is also present at the receiver side and is trying to decode the information. We model this scenario as a Gaussian Multiple Access Wiretap Channel (G-MAC-WT). We then compute the achievable secrecy sum rate (SSR) of the two-user G-MAC-WT with discrete inputs from different variants of QAM (viz., X-QAM, H-QAM and S-QAM). It has been found that at higher values of SNR, S-QAM gives better values of SSR than the other variants. For hybrid inputs of QAM, at lower values of SNR, the combination of APSK and S-QAM gives better results, and at higher values of SNR, the combination of H-QAM and APSK gives a greater value of SSR.
Submitted 22 August, 2023;
originally announced August 2023.
-
A Lightweight Transformer for Faster and Robust EBSD Data Collection
Authors:
Harry Dong,
Sean Donegan,
Megna Shah,
Yuejie Chi
Abstract:
Three dimensional electron back-scattered diffraction (EBSD) microscopy is a critical tool in many applications in materials science, yet its data quality can fluctuate greatly during the arduous collection process, particularly via serial-sectioning. Fortunately, 3D EBSD data is inherently sequential, opening up the opportunity to use transformers, state-of-the-art deep learning architectures that have made breakthroughs in a plethora of domains, for data processing and recovery. To be more robust to errors and accelerate this 3D EBSD data collection, we introduce a two-step method that recovers missing slices in a 3D EBSD volume, using an efficient transformer model and a projection algorithm to process the transformer's outputs. Overcoming the computational and practical hurdles of deep learning with scarce high dimensional data, we train this model using only synthetic 3D EBSD data with self-supervision and obtain superior recovery accuracy on real 3D EBSD data, compared to existing methods.
Submitted 18 August, 2023;
originally announced August 2023.
-
Flexible Beamforming in B5G for Improving Tethered UAV Coverage over Smart Environments
Authors:
Abdu Saif,
Nor Shahida Mohd Shah,
Soreen Ameen Fattah,
Saeed Hamood Alsamhi,
Santosh Kumar,
Ali Saad Al khuraib
Abstract:
Unmanned Aerial Vehicles (UAVs) are being used for wireless communications in smart environments. However, the need for mobility, scalability of data transmission over wide areas, and the required coverage area make UAV beamforming essential for better coverage and user experience. To this end, we propose a flexible beamforming approach to improve tethered UAV coverage quality and maximize the number of users experiencing the minimum required rate in any target environment. Our solution demonstrates a significant achievement in flexible beamforming in smart environments, including urban, suburban, dense, and high-rise urban. Furthermore, the beamforming gain is mainly concentrated on the target to improve the coverage area across various scenarios. Simulation results show that the proposed approach can form a flexible beam that focuses the transmitted signal towards the receiver and significantly improves received power by reducing signal power spread. In the case of no beamforming, signal power spreads out as distance increases, reducing the signal strength. Furthermore, our proposed solution is suitable for improving UAV coverage and reliability in smart and harsh environments.
Submitted 14 July, 2023;
originally announced July 2023.
-
Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation
Authors:
Asif Hanif,
Muzammal Naseer,
Salman Khan,
Mubarak Shah,
Fahad Shahbaz Khan
Abstract:
It is imperative to ensure the robustness of deep learning models in critical applications such as healthcare. While recent advances in deep learning have improved the performance of volumetric medical image segmentation models, these models cannot be deployed for real-world applications immediately due to their vulnerability to adversarial attacks. We present a 3D frequency domain adversarial attack for volumetric medical image segmentation models and demonstrate its advantages over conventional input or voxel domain attacks. Using our proposed attack, we introduce a novel frequency domain adversarial training approach for optimizing a robust model against voxel and frequency domain attacks. Moreover, we propose a frequency consistency loss to regulate our frequency domain adversarial training, which achieves a better tradeoff between the model's performance on clean and adversarial samples. Code is publicly available at https://github.com/asif-hanif/vafa.
Submitted 20 July, 2023; v1 submitted 14 July, 2023;
originally announced July 2023.
-
Exploiting the Brain's Network Structure for Automatic Identification of ADHD Subjects
Authors:
Soumyabrata Dey,
Ravishankar Rao,
Mubarak Shah
Abstract:
Attention Deficit Hyperactivity Disorder (ADHD) is a common behavioral problem affecting children. In this work, we investigate the automatic classification of ADHD subjects using the resting state Functional Magnetic Resonance Imaging (fMRI) sequences of the brain. We show that the brain can be modeled as a functional network, and certain properties of the networks differ in ADHD subjects from control subjects. We compute the pairwise correlation of brain voxels' activity over the time frame of the experimental protocol, which helps to model the function of a brain as a network. Different network features are computed for each of the voxels constructing the network. The concatenation of the network features of all the voxels in a brain serves as the feature vector. Feature vectors from a set of subjects are then used to train a PCA-LDA (principal component analysis-linear discriminant analysis) based classifier. We hypothesized that ADHD-related differences lie in some specific regions of the brain and that using features only from those regions is sufficient to discriminate ADHD and control subjects. We propose a method to create a brain mask that includes the useful regions only and demonstrate that using the features from the masked regions improves classification accuracy on the test data set. We train our classifier with 776 subjects and test on 171 subjects provided by The Neuro Bureau for the ADHD-200 challenge. We demonstrate the utility of graph-motif features, specifically the maps that represent the frequency of participation of voxels in network cycles of length 3. The best classification performance (69.59%) is achieved using 3-cycle map features with masking. Our proposed approach holds promise in being able to diagnose and understand the disorder.
Submitted 15 June, 2023;
originally announced June 2023.
-
Bayesian Game Formulation of Power Allocation in Multiple Access Wiretap Channel with Incomplete CSI
Authors:
Basharat Rashid,
Majed Haddad,
Shahid Mehraj Shah
Abstract:
In this paper, we address the problem of distributed power allocation in a $K$-user fading multiple access wiretap channel, where global channel state information is limited, i.e., each user has knowledge of their own channel state with respect to Bob and Eve but only knows the distribution of other users' channel states. We model this problem as a Bayesian game, where each user is assumed to selfishly maximize his average \emph{secrecy capacity} with partial channel state information. In this work, we first prove that there is a unique Bayesian equilibrium in the proposed game. Additionally, the price of anarchy is calculated to measure the efficiency of the equilibrium solution. We also propose a fast convergent iterative algorithm for power allocation. Finally, the results are validated through simulations.
Submitted 4 September, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
The Brain Tumor Segmentation (BraTS) Challenge 2023: Focus on Pediatrics (CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs)
Authors:
Anahita Fathi Kazerooni,
Nastaran Khalili,
Xinyang Liu,
Debanjan Haldar,
Zhifan Jiang,
Syed Muhammed Anwar,
Jake Albrecht,
Maruf Adewole,
Udunna Anazodo,
Hannah Anderson,
Sina Bagheri,
Ujjwal Baid,
Timothy Bergquist,
Austin J. Borja,
Evan Calabrese,
Verena Chung,
Gian-Marco Conte,
Farouk Dako,
James Eddy,
Ivan Ezhov,
Ariana Familiar,
Keyvan Farahani,
Shuvanjan Haldar,
Juan Eugenio Iglesias,
Anastasia Janas
, et al. (48 additional authors not shown)
Abstract:
Pediatric tumors of the central nervous system are the most common cause of cancer-related death in children. The five-year survival rate for high-grade gliomas in children is less than 20\%. Due to their rarity, the diagnosis of these entities is often delayed, their treatment is mainly based on historic treatment concepts, and clinical trials require multi-institutional collaborations. The MICCAI Brain Tumor Segmentation (BraTS) Challenge is a landmark community benchmark event with a successful history of 12 years of resource creation for the segmentation and analysis of adult glioma. Here we present the CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs 2023 challenge, which represents the first BraTS challenge focused on pediatric brain tumors with data acquired across multiple international consortia dedicated to pediatric neuro-oncology and clinical trials. The BraTS-PEDs 2023 challenge focuses on benchmarking the development of volumetric segmentation algorithms for pediatric brain glioma through standardized quantitative performance evaluation metrics utilized across the BraTS 2023 cluster of challenges. Models gaining knowledge from the BraTS-PEDs multi-parametric structural MRI (mpMRI) training data will be evaluated on separate validation and unseen test mpMRI data of high-grade pediatric glioma. The CBTN-CONNECT-DIPGR-ASNR-MICCAI BraTS-PEDs 2023 challenge brings together clinicians and AI/imaging scientists to lead to faster development of automated segmentation techniques that could benefit clinical trials, and ultimately the care of children with brain tumors.
Submitted 23 May, 2024; v1 submitted 26 May, 2023;
originally announced May 2023.
-
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Authors:
Syed Talal Wasim,
Muzammal Naseer,
Salman Khan,
Fahad Shahbaz Khan,
Mubarak Shah
Abstract:
Adopting contrastive image-text pretrained models like CLIP towards video classification has gained attention due to its cost-effectiveness and competitive performance. However, recent works in this area face a trade-off. Finetuning the pretrained model to achieve strong supervised performance results in low zero-shot generalization. Similarly, freezing the backbone to retain zero-shot capability causes a significant drop in supervised accuracy. Because of this, recent works in the literature typically train separate models for supervised and zero-shot action recognition. In this work, we propose a multimodal prompt learning scheme that works to balance the supervised and zero-shot performance under a single unified training. Our prompting approach on the vision side caters for three aspects: 1) Global video-level prompts to model the data distribution; 2) Local frame-level prompts to provide per-frame discriminative conditioning; and 3) a summary prompt to extract a condensed video representation. Additionally, we define a prompting scheme on the text side to augment the textual context. Through this prompting scheme, we can achieve state-of-the-art zero-shot performance on Kinetics-600, HMDB51 and UCF101 while remaining competitive in the supervised setting. By keeping the pretrained backbone frozen, we optimize a much lower number of parameters and retain the existing general representation, which helps achieve the strong zero-shot performance. Our codes/models are released at https://github.com/TalalWasim/Vita-CLIP.
Submitted 6 April, 2023;
originally announced April 2023.
-
Diffusion Action Segmentation
Authors:
Daochang Liu,
Qiyue Li,
AnhDung Dinh,
Tingting Jiang,
Mubarak Shah,
Chang Xu
Abstract:
Temporal action segmentation is crucial for understanding long-form videos. Previous works on this task commonly adopt an iterative refinement paradigm by using multi-stage models. We propose a novel framework via denoising diffusion models, which nonetheless shares the same inherent spirit of such iterative refinement. In this framework, action predictions are iteratively generated from random noise with input video features as conditions. To enhance the modeling of three striking characteristics of human actions, including the position prior, the boundary ambiguity, and the relational dependency, we devise a unified masking strategy for the conditioning inputs in our framework. Extensive experiments on three benchmark datasets, i.e., GTEA, 50Salads, and Breakfast, are performed and the proposed method achieves superior or comparable results to state-of-the-art methods, showing the effectiveness of a generative approach for action segmentation.
Submitted 11 August, 2023; v1 submitted 31 March, 2023;
originally announced March 2023.
-
3D Mitochondria Instance Segmentation with Spatio-Temporal Transformers
Authors:
Omkar Thawakar,
Rao Muhammad Anwer,
Jorma Laaksonen,
Orly Reiner,
Mubarak Shah,
Fahad Shahbaz Khan
Abstract:
Accurate 3D mitochondria instance segmentation in electron microscopy (EM) is a challenging problem and serves as a prerequisite to empirically analyze their distributions and morphology. Most existing approaches employ 3D convolutions to obtain representative features. However, these convolution-based approaches struggle to effectively capture long-range dependencies in the volume mitochondria data, due to their limited local receptive field. To address this, we propose a hybrid encoder-decoder framework based on a split spatio-temporal attention module that efficiently computes spatial and temporal self-attentions in parallel, which are later fused through a deformable convolution. Further, we introduce a semantic foreground-background adversarial loss during training that aids in delineating the region of mitochondria instances from the background clutter. Our extensive experiments on three benchmarks, Lucchi, MitoEM-R and MitoEM-H, reveal the benefits of the proposed contributions achieving state-of-the-art results on all three datasets. Our code and models are available at https://github.com/OmkarThawakar/STT-UNET.
Submitted 21 March, 2023;
originally announced March 2023.
-
An Efficient Game Theory-Based Power Control Algorithm for D2D Communication in 5G Networks
Authors:
Abdu Saif,
Kamarul Ariffin bin Noordin,
Kaharudin Dimyati,
Nor Shahida Mohd Shah,
Yousef Ali Al-Gumaei,
Qazwan Abdullah,
Kamal Ali Alezabi
Abstract:
Device-to-Device (D2D) communication is one of the enabling technologies for 5G networks that support proximity-based service (ProSe) for wireless network communications. This paper proposes a power control algorithm based on the Nash equilibrium and game theory to eliminate the interference between the cellular user device and D2D links. This leads to reliable connectivity with minimal power consumption in wireless communication. The power control in D2D is modeled as a non-cooperative game. Each device is allowed to independently select its transmit power to maximize (or minimize) user utility. The aim is to guide user devices to converge to the Nash equilibrium by establishing connectivity with network resources. The proposed algorithm uses pricing factors to limit power consumption and reduce the overall interference of D2D communication. The proposed algorithm is evaluated in terms of the energy efficiency of the average power consumption, the number of D2D communications, and the number of iterations. In addition, the algorithm converges to the Nash equilibrium relatively quickly. It guarantees that the user devices can achieve their required Quality of Service (QoS) by adjusting the residual cost coefficient and residual energy factor. Simulation results show that the proposed power control reduces power consumption by approximately 20% compared with the algorithms in [11].
Submitted 8 March, 2023;
originally announced March 2023.
-
Estimation Large- Scale Fading Channels for Transmit Orthogonal Pilot Reuse Sequences in Massive MIMO System
Authors:
Qazwan Abdullah,
Nor Shahida Mohd Shah,
Shipun Hamzah,
Adeb Salh,
Mahathir Mohamad,
Shahilah Nordin,
Maisarah Abu,
Mohammed Abdo Albaom,
Safwan Sadeq
Abstract:
Massive multiple-input multiple-output (MIMO) is a critical technology for future fifth-generation (5G) systems. Reducing pilot contamination (PC) enhances system performance, reduces inter-cell interference, and improves channel estimation. However, because the pilot sequences transmitted by users in a single cell to neighboring cells are not orthogonal, massive MIMO systems are still constrained. We propose channel evaluation using orthogonal pilot reuse sequences (PRS) and zero-forcing (ZF) pre-coding techniques to eliminate channel quality in end users with poor channel quality based on channel evaluation, large-scale shutdown evaluation, and analysis of maximum transmission efficiency. We derive lower bounds on the downlink data rate (DR) and signal-to-interference-plus-noise ratio (SINR) that can be achieved based on PRS assignment to a group of users, where the interference is mitigated as the number of antenna elements approaches infinity. Because of the channel coherence interval limitation, the orthogonal PRS cannot be allocated to all UEs in each cell. Short coherence intervals are able to reduce the PC and improve the quality of the channel. The modelling results show that a higher DR can be achieved due to better channel evaluation and lower loss.
Submitted 20 October, 2022;
originally announced December 2022.
-
A New Technique for Improving Energy Efficiency in 5G Mm-wave Hybrid Precoding Systems
Authors:
Adeb Salh,
Qazwan Abdullah,
Ghasan Hussain,
Razlai Ngah,
Lukman Audah,
Nor Shahida Mohd Shah,
Shipun Hamzah
Abstract:
In this article, we present a new approach to optimizing the energy efficiency of the cost-efficiency of quantized hybrid pre-encoding (HP) design. We present effective alternating minimization algorithms (AMA) based on the zero gradient method to produce completely connected structures (CCSs) and partially connected structures (PCSs). Alternating minimization algorithms offer lower complexity by introducing orthogonal constraints on digital pre-codes to concurrently maximize computing complexity and communication power. As a result, by improving CCS through advanced phase extraction, the alternating minimization technique enhances hybrid pre-encoding. For PCS, the energy-saving ratio grew by 45.3%, while for CCS, it increased by 18.12%.
Submitted 20 October, 2022;
originally announced November 2022.
-
Brain Tumor Segmentation using Enhanced U-Net Model with Empirical Analysis
Authors:
MD Abdullah Al Nasim,
Abdullah Al Munem,
Maksuda Islam,
Md Aminul Haque Palash,
MD. Mahim Anjum Haque,
Faisal Muhammad Shah
Abstract:
Cancer of the brain is deadly and requires careful surgical segmentation. The brain tumors were segmented using U-Net, a Convolutional Neural Network (CNN). When looking for overlaps of necrotic, edematous, growing, and healthy tissue, it might be hard to get relevant information from the images. The 2D U-Net network was improved and trained with the BraTS datasets to find these four areas. U-Net can set up many encoder and decoder routes that can be used to get information from images that can be used in different ways. To reduce computational time, we use image segmentation to exclude insignificant background details. Experiments on the BraTS datasets show that our proposed model for segmenting brain tumors from magnetic resonance imaging (MRI) works well. In this study, we demonstrate that the results on the BraTS datasets for 2017, 2018, 2019, and 2020 do not significantly differ from the Dice scores attained on the BraTS 2019 dataset: 0.8717 (necrotic), 0.9506 (edema), and 0.9427 (enhancing).
Submitted 15 January, 2023; v1 submitted 24 October, 2022;
originally announced October 2022.
-
Comparative Analysis of State-of-the-Art Deep Learning Models for Detecting COVID-19 Lung Infection from Chest X-Ray Images
Authors:
Zeba Ghaffar,
Pir Masoom Shah,
Hikmat Khan,
Syed Farhan Alam Zaidi,
Abdullah Gani,
Izaz Ahmad Khan,
Munam Ali Shah,
Saif ul Islam
Abstract:
The ongoing COVID-19 pandemic has already taken millions of lives and damaged economies across the globe. Most COVID-19 deaths and economic losses are reported from densely crowded cities. It is evident that the effective control and prevention of epidemic/pandemic infectious diseases is vital. According to WHO, testing and diagnosis is the best strategy to control pandemics. Scientists worldwide are attempting to develop various innovative and cost-efficient methods to speed up the testing process. This paper comprehensively evaluates the applicability of the recent top ten state-of-the-art Deep Convolutional Neural Networks (CNNs) for automatically detecting COVID-19 infection using chest X-ray images. Moreover, it provides a comparative analysis of these models in terms of accuracy. This study identifies the effective methodologies to control and prevent infectious respiratory diseases. Our trained models have demonstrated outstanding results in classifying the COVID-19 infected chest X-rays. In particular, our trained models MobileNet, EfficientNet, and InceptionV3 achieved average classification accuracies of 95\%, 95\%, and 94\%, respectively, on the test set for the COVID-19 class. Thus, it can be beneficial for clinical practitioners and radiologists to speed up the testing, detection, and follow-up of COVID-19 cases.
Submitted 30 June, 2022;
originally announced August 2022.
-
Breast Cancer Classification using Deep Learned Features Boosted with Handcrafted Features
Authors:
Unaiza Sajid,
Rizwan Ahmed Khan,
Shahid Munir Shah,
Sheeraz Arif
Abstract:
Breast cancer is one of the leading causes of death among women across the globe. It is difficult to treat if detected at advanced stages; however, early detection can significantly increase the chances of survival and improve the lives of millions of women. Given the widespread prevalence of breast cancer, it is of utmost importance for the research community to come up with a framework for early detection, classification and diagnosis. The artificial intelligence research community, in coordination with medical practitioners, is developing such frameworks to automate the task of detection. With the surge in research activities coupled with the availability of large datasets and enhanced computational power, it is expected that AI framework results will help even more clinicians in making correct predictions. In this article, a novel framework for classification of breast cancer using mammograms is proposed. The proposed framework combines robust features extracted from a novel Convolutional Neural Network (CNN) with handcrafted features including HOG (Histogram of Oriented Gradients) and LBP (Local Binary Pattern). The obtained results on the CBIS-DDSM dataset exceed the state of the art.
Submitted 16 January, 2023; v1 submitted 26 June, 2022;
originally announced June 2022.
-
Outage Analysis of Energy Efficiency in a Finite-Element-IRS Aided Communication System
Authors:
Aaqib Bulla,
Shahid M Shah
Abstract:
In this paper, we study the performance of an energy efficient wireless communication system, assisted by a finite-element-intelligent reflecting surface (IRS). With no instantaneous channel state information (CSI) at the transmitter, we characterize the system performance in terms of the outage probability (OP) of energy efficiency (EE). Depending upon the availability of line-of-sight (LOS) paths, we analyze the system for two different channel models, viz. Rician and Rayleigh. For an arbitrary number of IRS elements $(N)$, we derive the approximate closed-form solutions for the OP of EE, using Laguerre series and moment matching methods. The analytical results are validated using the Monte-Carlo simulations. Moreover, we also quantify the rate of convergence of the derived expressions to the central limit theorem (CLT) approximations using the \textit{Berry-Esseen} inequality. Further, we prove that the OP of EE is a strict pseudo-convex function of the transmit power and hence, has a unique global minimum. To obtain the optimal transmit power, we solve the OP of EE as a constrained optimization problem. To the best of our knowledge, the OP of EE as a performance metric, has never been previously studied in IRS-assisted wireless communication systems.
Submitted 17 May, 2022;
originally announced May 2022.
-
Self-Supervised Video Object Segmentation via Cutout Prediction and Tagging
Authors:
Jyoti Kini,
Fahad Shahbaz Khan,
Salman Khan,
Mubarak Shah
Abstract:
We propose a novel self-supervised Video Object Segmentation (VOS) approach that strives to achieve better object-background discriminability for accurate object segmentation. Distinct from previous self-supervised VOS methods, our approach is based on a discriminative learning loss formulation that takes into account both object and background information to ensure object-background discriminability, rather than using only object appearance. The discriminative learning loss comprises cutout-based reconstruction (cutout region represents part of a frame, whose pixels are replaced with some constant values) and tag prediction loss terms. The cutout-based reconstruction term utilizes a simple cutout scheme to learn the pixel-wise correspondence between the current and previous frames in order to reconstruct the original current frame with added cutout region in it. The introduced cutout patch guides the model to focus as much on the significant features of the object of interest as the less significant ones, thereby implicitly equipping the model to address occlusion-based scenarios. Next, the tag prediction term encourages object-background separability by grouping tags of all pixels in the cutout region that are similar, while separating them from the tags of the rest of the reconstructed frame pixels. Additionally, we introduce a zoom-in scheme that addresses the problem of small object segmentation by capturing fine structural information at multiple scales. Our proposed approach, termed CT-VOS, achieves state-of-the-art results on two challenging benchmarks: DAVIS-2017 and Youtube-VOS. A detailed ablation showcases the importance of the proposed loss formulation to effectively capture object-background discriminability and the impact of our zoom-in scheme to accurately segment small-sized objects.
Submitted 22 April, 2022;
originally announced April 2022.
-
Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation
Authors:
Jyoti Kini,
Mubarak Shah
Abstract:
Video Instance Segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence. Most existing methods typically accomplish this task by employing a multi-stage top-down approach that usually involves separate networks to detect and segment objects in each frame, followed by associating these detections in consecutive frames using a learned tracking head. In this work, however, we introduce a simple end-to-end trainable bottom-up approach to achieve instance mask predictions at the pixel-level granularity, instead of the typical region-proposals-based approach. Unlike contemporary frame-based models, our network pipeline processes an input video clip as a single 3D volume to incorporate temporal information. The central idea of our formulation is to solve the video instance segmentation task as a tag assignment problem, such that generating distinct tag values essentially separates individual object instances across the video sequence (here each tag could be any arbitrary value between 0 and 1). To this end, we propose a novel spatio-temporal tagging loss that allows for sufficient separation of different objects as well as necessary identification of different instances of the same object. Furthermore, we present a tag-based attention module that improves instance tags, while concurrently learning instance propagation within a video. Evaluations demonstrate that our method provides competitive results on YouTube-VIS and DAVIS-19 datasets, and has minimum run-time compared to other state-of-the-art performance methods.
Submitted 22 April, 2022;
originally announced April 2022.
-
An Efficient End-to-End Deep Neural Network for Interstitial Lung Disease Recognition and Classification
Authors:
Masum Shah Junayed,
Afsana Ahsan Jeny,
Md Baharul Islam,
Ikhtiar Ahmed,
A F M Shahen Shah
Abstract:
The automated Interstitial Lung Diseases (ILDs) classification technique is essential for assisting clinicians during the diagnosis process. Detecting and classifying ILDs patterns is a challenging problem. This paper introduces an end-to-end deep convolutional neural network (CNN) for classifying ILDs patterns. The proposed model comprises four convolutional layers with different kernel sizes and Rectified Linear Unit (ReLU) activation function, followed by batch normalization and max-pooling with a size equal to the final feature map size, as well as four dense layers. We used the ADAM optimizer to minimize categorical cross-entropy. A dataset consisting of 21328 image patches of 128 CT scans with five classes is taken to train and assess the proposed model. A comparison study showed that the presented model outperformed pre-trained CNNs and five-fold cross-validation on the same dataset. For ILDs pattern classification, the proposed approach achieved an accuracy score of 99.09% and an average F score of 97.9%, outperforming three pre-trained CNNs. These outcomes show that the proposed model is relatively state-of-the-art in precision, recall, F score, and accuracy.
Submitted 21 April, 2022;
originally announced April 2022.
-
Advancement of Deep Learning in Pneumonia and Covid-19 Classification and Localization: A Qualitative and Quantitative Analysis
Authors:
Aakash Shah,
Manan Shah
Abstract:
Around 450 million people are affected by pneumonia every year, resulting in 2.5 million deaths. Covid-19 has also affected 181 million people and has led to 3.92 million casualties. The chances of death from both diseases can be significantly reduced if they are diagnosed early. However, the current methods of diagnosing pneumonia (complaints + chest X-ray) and covid-19 (RT-PCR) require the presence of expert radiologists and time, respectively. With the help of Deep Learning models, pneumonia and covid-19 can be detected instantly from Chest X-rays or CT scans. This way, the process of diagnosing Pneumonia/Covid-19 can be made more efficient and widespread. In this paper, we aim to elicit, explain, and evaluate, qualitatively and quantitatively, major advancements in deep learning methods aimed at detecting or localizing community-acquired pneumonia (CAP), viral pneumonia, and covid-19 from images of chest X-rays and CT scans. As this is a systematic review, the paper focuses on explaining deep learning model architectures that have either been modified or created from scratch for the task at hand, with a focus on generalizability. For each model, this paper answers the question of why the model is designed the way it is, the challenges that a particular model overcomes, and the tradeoffs that come with modifying a model to the required specifications. A quantitative analysis of all models described in the paper is also provided to quantify the effectiveness of different models with a similar goal. Some tradeoffs cannot be quantified, and hence they are mentioned explicitly in the qualitative analysis, which is done throughout the paper. By compiling and analyzing a large quantum of research details in one place with all the datasets, model architectures, and results, we aim to provide a one-stop solution to beginners and current researchers interested in this field.
Submitted 16 November, 2021;
originally announced November 2021.
-
Artificial Intelligence For Breast Cancer Detection: Trends & Directions
Authors:
Shahid Munir Shah,
Rizwan Ahmed Khan,
Sheeraz Arif,
Unaiza Sajid
Abstract:
In the last decade, researchers working in computer vision and Artificial Intelligence (AI) have stepped up their efforts to develop automated frameworks that not only detect breast cancer but also identify its stage. This surge in research activity is mainly due to the advent of robust AI algorithms (deep learning), the availability of hardware that can train those complex algorithms, and the accessibility of datasets large enough for training them. The imaging modalities that researchers have exploited to automate breast cancer detection are mammograms, ultrasound, magnetic resonance imaging, histopathological images, or any combination of them. This article analyzes these imaging modalities, presents their strengths and limitations, and lists resources from which their datasets can be accessed for research purposes. It then summarizes the state-of-the-art AI and computer vision methods proposed in the last decade to detect breast cancer using various imaging modalities. In general, we focus on reviewing frameworks that report results on mammograms, because mammography is the most widely used breast imaging modality and is the first test that medical practitioners usually prescribe for the detection of breast cancer. The second reason for focusing on mammograms is the availability of labeled datasets. Dataset availability is one of the most important aspects of developing AI-based frameworks, as such algorithms are data hungry and the quality of the dataset generally affects their performance. In a nutshell, this article aims to serve as a primary resource for the research community working in the field of automated breast imaging analysis.
Submitted 3 October, 2021;
originally announced October 2021.
-
An Optimization of Fractal Microstrip Patch Antenna with Partial Ground using Genetic Algorithm Method
Authors:
Hamid M. Q. Rasheda,
Norshahida Mohd Shah,
Abdu Saif,
Qazwan Abdullah,
Abbas Ugurenver,
Abdul Rashid. O. Mumin,
Nan Bin Mad Sahar
Abstract:
Ultra-wideband (UWB) has been advancing as a high-data-rate wireless technology since the Federal Communication Commission announced a bandwidth of 7.5 GHz (from 3.1 GHz to 10.6 GHz) for ultra-wideband applications. Designing a UWB antenna is more difficult than designing a narrowband antenna. A suitable UWB antenna should be able to work over the ultra-wide bandwidth allocated by the Federal Communication Commission. Furthermore, good radiation properties across the entire frequency spectrum are needed. This paper outlines an optimization of a fractal square microstrip patch antenna with a partial ground using a genetic algorithm at 3.5 GHz and 6 GHz. The optimized antenna design shows improved results compared to the non-optimized design. The design is optimized using a genetic algorithm and simulated using CST simulation software. The size of the optimized design is reduced by cutting the edges and the center of the patch. The optimized results are reported with a focus on the return loss, VSWR, and gain. The results indicate a significant enhancement, as illustrated in Table II. Thus, the optimized design is suitable for S-band and C-band applications.
Submitted 30 June, 2021;
originally announced August 2021.
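The antenna abstract above reports results in terms of return loss and VSWR. As a minimal sketch, not taken from the paper, the standard textbook relation between the two quantities can be written as follows. The function names and the sample return-loss values are illustrative assumptions.

```python
def reflection_magnitude(return_loss_db: float) -> float:
    """Magnitude of the reflection coefficient from return loss in dB.

    Return loss is taken as a positive number of dB, so |Gamma| = 10^(-RL/20).
    """
    if return_loss_db <= 0:
        raise ValueError("return loss must be a positive number of dB")
    return 10 ** (-return_loss_db / 20)


def vswr(return_loss_db: float) -> float:
    """Voltage standing wave ratio: (1 + |Gamma|) / (1 - |Gamma|)."""
    gamma = reflection_magnitude(return_loss_db)
    return (1 + gamma) / (1 - gamma)


if __name__ == "__main__":
    # Hypothetical return-loss values, chosen only to illustrate the mapping.
    for rl in (10.0, 20.0):
        print(f"return loss {rl:.0f} dB -> VSWR {vswr(rl):.3f}")
```

A 10 dB return loss corresponds to a VSWR just under 2, which is why that level is commonly used as a matching threshold when reading antenna results.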
-
Development of A Fully Data-Driven Artificial Intelligence and Deep Learning for URLLC Application in 6G Wireless Systems: A Survey
Authors:
Adeeb Salh,
Lukman Audah,
Qazwan Abdullah,
Abdullah Noorsaliza,
Nor Shahida Mohd Shah,
Jameel Mukred,
Shipun Hamzah
Abstract:
The sixth generation (6G) is expected to be fully data-driven, providing terabit-per-second rates and supporting an average of 1000+ connections per person, virtually instantaneously, by 2030. Data-driven ultra-reliable and low latency communication (URLLC) is a new service paradigm provided by future sixth-generation wireless communication and network architecture, involving 100+ Gbps data rates with one millisecond latency. The key constraint is the amount of computing power available to distribute massive data and run well-designed artificial neural networks. Artificial Intelligence provides a new way to design wireless networks by applying learning, prediction, and decision-making to manage the stream of big data used for training, which provides more capacity to turn that learning into improved wireless network performance. We study the developing technologies that will be the driving force, namely artificial intelligence and communication systems that guarantee low latency. This paper aims to discuss the efficiency of the developing network and the challenges of its application scenarios, and studies how holographic radio, enhanced wireless channel coding, enormous Internet of Things integration, and haptic communication for virtual and augmented reality provide new services on the 6G network. Furthermore, improving a multi-level architecture for ultra-reliable and low latency communication with deep learning enables data-driven AI and 6G networks for device intelligence, as well as innovations based on effective learning capabilities. These difficulties must be solved in order to meet the needs of future smart networks. Finally, this research categorizes various unexplored research gaps between machine learning and the sixth generation.
Submitted 3 August, 2021;
originally announced August 2021.
-
Florida Wildlife Camera Trap Dataset
Authors:
Crystal Gagne,
Jyoti Kini,
Daniel Smith,
Mubarak Shah
Abstract:
Trail camera imagery has increasingly gained popularity amongst biologists for conservation and ecological research. Minimal human interference required to operate camera traps allows capturing unbiased species activities. Several studies - based on human and wildlife interactions, migratory patterns of various species, risk of extinction in endangered populations - are limited by the lack of rich data and the time-consuming nature of manually annotating trail camera imagery. We introduce a challenging wildlife camera trap classification dataset collected from two different locations in Southwestern Florida, consisting of 104,495 images featuring visually similar species, varying illumination conditions, skewed class distribution, and including samples of endangered species, i.e. Florida panthers. Experimental evaluations with ResNet-50 architecture indicate that this image classification-based dataset can further push the advancements in wildlife statistical modeling. We will make the dataset publicly available.
Submitted 23 June, 2021;
originally announced June 2021.
-
Optimal Transmit Power and Antenna Selection to Achieve Energy Efficient and Low Complexity in fifth generation Massive MIMO Systems
Authors:
Adeeb Salh,
Lukman Audah,
Nor Shahida Mohd Shah,
Qazwan Abdullah,
Noorsaliza Abdullah,
Jameel Mukred,
Shipun Hamzah
Abstract:
This paper investigates joint antenna selection and optimal transmit power in multi-cell massive multiple-input multiple-output (MIMO) systems. Pilot interference and the selection of activated transmit antennas play an essential role in maximizing energy efficiency. We derive a closed-form expression for the maximal energy efficiency with complete knowledge of large-scale fading under maximum ratio transmission, while accounting for channel estimation and eliminating pilot contamination as the number of antennas approaches infinity. We investigate joint optimal antenna selection and optimal transmit power under minimized reuse of pilot sequences, based on a novel iterative low-complexity algorithm that uses Lagrange multiplier and Newton methods. The two scenarios of achievable high data rate and total transmit power allocation are critical to the maximal energy efficiency. We propose a new power consumption model for each antenna, based on the transmit power amplifier and circuit power consumption, to analyze exact power consumption. The simulation results show that maximal energy efficiency can be achieved using the iterative low-complexity algorithm with a reasonable maximum transmit power when the noise power is less than the received pilot power. The proposed low-complexity iterative algorithm offers maximum energy efficiency by repeating a minimized pilot signal until the optimal antenna selection and transmission power are achieved.
Submitted 4 June, 2021;
originally announced June 2021.
-
Trade-off Energy and Spectral Efficiency in 5G Massive MIMO System
Authors:
Adeb Salh,
Nor Shahida Mohd Shah,
Lukman Audah,
Qazwan Abdullah,
Norsaliza Abdullah,
Shipun A. Hamzah,
Abdu Saif
Abstract:
A massive multiple-input multiple-output (MIMO) system is very important for optimizing the trade-off between energy efficiency (EE) and spectral efficiency (SE) in fifth-generation cellular networks. The challenge for the next generation is to support increasingly high data traffic in the wireless communication system in terms of both EE and SE. In this paper, the trade-off between EE and SE in a downlink massive MIMO system is investigated based on the first derivative of transmit antennas and transmit power. The EE-SE trade-off is analyzed as a multiobjective optimization problem that decreases transmit power. EE and SE are improved under constraints on the maximum transmit power allocation and the number of antennas by computing the first derivative of transmit power to maximize the EE-SE trade-off. The simulation results show that the optimum trade-off between EE and SE can be obtained based on the first derivative by selecting the optimal antennas with a low cost in transmit power. Therefore, the optimization problem is flexible enough to make trade-offs between EE and SE for distinct preferences.
Submitted 22 May, 2021;
originally announced May 2021.
-
Internet of Fly Things For Post-Disaster Recovery Based on Multi-environment
Authors:
Abdu Saif,
Kaharudin Bin Dimyati,
Kamarul Ariffin Bin Noordin,
Nor Shahida Mohd Shah,
Qazwan Abdullah,
Fadhil Mukhlif,
Mahathir Mohamad
Abstract:
Natural disasters such as floods and earthquakes severely impact telecommunication network infrastructure, leading to the malfunctioning and interruption of wireless services. Consequently, user devices in the disaster zone are unable to access the cellular base stations. Wireless coverage from an unmanned aerial vehicle (UAV) is considered for providing coverage service to ground user devices in disaster events. This work evaluates the performance of UAV wireless coverage services that provide the Internet of Fly Things to help recover communication links after a natural disaster in multiple environments. The results characterize the line-of-sight, non-line-of-sight, path loss, and coverage probability for the radio propagation environment scenario. The path loss and coverage probability are affected by the user devices' elevation angle and distance in the multi-environment system. The optimum user device distance and elevation angle are also investigated to improve the coverage probability, which could be especially useful for UAV deployment design.
Submitted 8 May, 2021; v1 submitted 18 April, 2021;
originally announced April 2021.
-
Unmanned Aerial Vehicle and Optimal Relay for Extending Coverage in Post-Disaster Scenarios
Authors:
Abdu Saif,
Kaharudin Dimyati,
Kamarul Ariffin Noordin,
Nor Shahida Mohd Shah,
Qazwan Abdullah,
Mahathir Mohamad,
Mahmod Abd Hakim Mohamad,
Ahmed M. Al-Saman
Abstract:
The malfunction or interruption of wireless coverage services has been shown to increase the mortality rate during natural disasters. Wireless coverage by an unmanned aerial vehicle (UAV) provides network coverage to ground user devices during and after disaster events. The relay hops receive wireless coverage and can forward it to user devices that are out of coverage, allowing reliable connectivity for large-scale user devices. This work evaluates the performance of optimal relay hops for improving wireless coverage services and establishing connectivity in post-disaster scenarios. The results demonstrate how the UAV line of sight is used to select an optimal relay for improving wireless coverage services. The path loss probability and system capacity were affected by the user device distance and relay densities. The optimal relay hop distance and static UAV positions are also investigated to improve coverage likelihood, which could be especially useful for UAV deployment design. It is found that dense relay nodes in UAV systems enhance the capacity coverage area and energy efficiency through decentralized connectivity over a multihop device-to-device wireless network.
Submitted 13 April, 2021;
originally announced April 2021.
-
Secure Energy Efficiency: Power Allocation and Outage Analysis for SWIPT-in-DAS based IoT
Authors:
Aaqib Bulla,
Shahid M Shah
Abstract:
In this paper we study secure energy efficiency (SEE) for simultaneous wireless information and power transfer (SWIPT) in a distributed antenna system (DAS) based IoT network. We consider a system in which both legitimate users (Bobs) and eavesdroppers (Eves) have power splitting (PS) receivers to simultaneously decode information and harvest energy from the received signal. When the channel state information (CSI) is known at the transmitter, we analyze the effect of an energy harvesting eavesdropper (EHE) over the maximization of SEE of the system. Next, considering the fact that perfect CSI is hard to achieve in practice, we characterize the system performance in terms of the outage probability of SEE. For the given SWIPT-in-DAS setup, we derive the closed form expression for the outage probability of SEE and with the help of numerical results, we study the effect of transmit power levels, number of distributed antenna (DA) ports and the PS ratio of devices. To the best of our knowledge, this is the first attempt to define the outage probability of SEE for SWIPT-in-DAS.
Submitted 23 March, 2021;
originally announced March 2021.
-
Distributed Clustering for User Devices Under Unmanned Aerial Vehicle Coverage Area during Disaster Recovery
Authors:
Abdu Saif,
Kaharudin Bin Dimyati,
Kamarul Ariffin Bin Noordin,
Nor Shahida Mohd. Shah,
S. H. Alsamhi,
Qazwan Abdullah,
Nabil Farah
Abstract:
An Unmanned Aerial Vehicle (UAV) is a promising technology for providing wireless coverage to ground user devices. When communication network infrastructure is destroyed in a disaster, UAV battery life becomes a challenge during service delivery in the post-disaster scenario. Therefore, selecting cluster heads among user devices plays a vital role in detecting UAV signals and processing data to improve UAV energy efficiency and reliable connectivity. This paper evaluates the performance of the clustering approach in detecting wireless coverage services while improving energy efficiency. The evaluation is a realistic simulation of the ground-to-air Line of Sight (LoS) channel. The results show that the cluster head can effectively link the UAVs and cluster members at minimal energy expenditure. The UAV altitude and path loss exponent affect the ability of user devices to detect wireless coverage. Moreover, the bit error rate at the cluster heads is considered for reliable connectivity in post-disaster scenarios. Clustering stabilizes the clusters linking the uncovered nodes to the UAV, and its effectiveness in doing so resulted in its ubiquity in emergency communication systems.
Submitted 14 March, 2021;
originally announced March 2021.
-
Anomaly Detection in Video via Self-Supervised and Multi-Task Learning
Authors:
Mariana-Iuliana Georgescu,
Antonio Barbalau,
Radu Tudor Ionescu,
Fahad Shahbaz Khan,
Marius Popescu,
Mubarak Shah
Abstract:
Anomaly detection in video is a challenging computer vision problem. Due to the lack of anomalous events at training time, anomaly detection requires the design of learning methods without full supervision. In this paper, we approach anomalous event detection in video through self-supervised and multi-task learning at the object level. We first utilize a pre-trained detector to detect objects. Then, we train a 3D convolutional neural network to produce discriminative anomaly-specific information by jointly learning multiple proxy tasks: three self-supervised and one based on knowledge distillation. The self-supervised tasks are: (i) discrimination of forward/backward moving objects (arrow of time), (ii) discrimination of objects in consecutive/intermittent frames (motion irregularity) and (iii) reconstruction of object-specific appearance information. The knowledge distillation task takes into account both classification and detection information, generating large prediction discrepancies between teacher and student models when anomalies occur. To the best of our knowledge, we are the first to approach anomalous event detection in video as a multi-task learning problem, integrating multiple self-supervised and knowledge distillation proxy tasks in a single architecture. Our lightweight architecture outperforms the state-of-the-art methods on three benchmarks: Avenue, ShanghaiTech and UCSD Ped2. Additionally, we perform an ablation study demonstrating the importance of integrating self-supervised learning and normality-specific distillation in a multi-task learning setting.
Submitted 10 September, 2021; v1 submitted 15 November, 2020;
originally announced November 2020.
-
Federated Learning for Breast Density Classification: A Real-World Implementation
Authors:
Holger R. Roth,
Ken Chang,
Praveer Singh,
Nir Neumark,
Wenqi Li,
Vikash Gupta,
Sharut Gupta,
Liangqiong Qu,
Alvin Ihsani,
Bernardo C. Bizzo,
Yuhong Wen,
Varun Buch,
Meesam Shah,
Felipe Kitamura,
Matheus Mendonça,
Vitor Lavor,
Ahmed Harouni,
Colin Compas,
Jesse Tetreault,
Prerna Dogra,
Yan Cheng,
Selnur Erdal,
Richard White,
Behrooz Hashemian,
Thomas Schultz
, et al. (18 additional authors not shown)
Abstract:
Building robust deep learning-based models requires large quantities of diverse training data. In this study, we investigate the use of federated learning (FL) to build medical imaging classification models in a real-world collaborative setting. Seven clinical institutions from across the world joined this FL effort to train a model for breast density classification based on the Breast Imaging, Reporting & Data System (BI-RADS). We show that despite substantial differences among the datasets from all sites (mammography system, class distribution, and data set size) and without centralizing data, we can successfully train AI models in federation. The results show that models trained using FL perform on average 6.3% better than their counterparts trained on an institute's local data alone. Furthermore, we show a 45.8% relative improvement in the models' generalizability when evaluated on the other participating sites' testing data.
Submitted 20 October, 2020; v1 submitted 3 September, 2020;
originally announced September 2020.
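The federated learning abstract above describes training across seven institutions without centralizing data, but it does not state how the site models are combined. As a rough sketch only, the widely used federated averaging rule (an assumption here, not confirmed by the abstract) weights each site's parameters by its local dataset size. The function name, flat weight vectors, and example numbers are hypothetical.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average flat parameter vectors, weighted by local dataset size.

    Assumes every client sends a vector of the same length; only the
    parameters and the sample counts leave each site, never the data.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one weight vector and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must send vectors of equal length")
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


if __name__ == "__main__":
    # Two hypothetical sites; the second holds three times as much data.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

Weighting by dataset size lets larger sites contribute proportionally more to the shared model, which matters when data set sizes differ substantially between sites, as the abstract notes.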
-
A Background-Agnostic Framework with Adversarial Training for Abnormal Event Detection in Video
Authors:
Mariana-Iuliana Georgescu,
Radu Tudor Ionescu,
Fahad Shahbaz Khan,
Marius Popescu,
Mubarak Shah
Abstract:
Abnormal event detection in video is a complex computer vision problem that has attracted significant attention in recent years. The complexity of the task arises from the commonly-adopted definition of an abnormal event, that is, a rarely occurring event that typically depends on the surrounding context. Following the standard formulation of abnormal event detection as outlier detection, we propose a background-agnostic framework that learns from training videos containing only normal events. Our framework is composed of an object detector, a set of appearance and motion auto-encoders, and a set of classifiers. Since our framework only looks at object detections, it can be applied to different scenes, provided that normal events are defined identically across scenes and that the single main factor of variation is the background. To overcome the lack of abnormal data during training, we propose an adversarial learning strategy for the auto-encoders. We create a scene-agnostic set of out-of-domain pseudo-abnormal examples, which are correctly reconstructed by the auto-encoders before applying gradient ascent on the pseudo-abnormal examples. We further utilize the pseudo-abnormal examples to serve as abnormal examples when training appearance-based and motion-based binary classifiers to discriminate between normal and abnormal latent features and reconstructions. We compare our framework with the state-of-the-art methods on four benchmark data sets, using various evaluation metrics. Compared to existing methods, the empirical results indicate that our approach achieves favorable performance on all data sets. In addition, we provide region-based and track-based annotations for two large-scale abnormal event detection data sets from the literature, namely ShanghaiTech and Subway.
Submitted 6 April, 2023; v1 submitted 27 August, 2020;
originally announced August 2020.
-
Conditional Entropy Coding for Efficient Video Compression
Authors:
Jerry Liu,
Shenlong Wang,
Wei-Chiu Ma,
Meet Shah,
Rui Hu,
Pranaab Dhawan,
Raquel Urtasun
Abstract:
We propose a very simple and efficient video compression framework that only focuses on modeling the conditional entropy between frames. Unlike prior learning-based approaches, we reduce complexity by not performing any form of explicit transformations between frames and assume each frame is encoded with an independent state-of-the-art deep image compressor. We first show that a simple architecture modeling the entropy between the image latent codes is as competitive as other neural video compression works and video codecs while being much faster and easier to implement. We then propose a novel internal learning extension on top of this architecture that brings an additional 10% bitrate savings without trading off decoding speed. Importantly, we show that our approach outperforms H.265 and other deep learning baselines in MS-SSIM on higher bitrate UVG video, and against all video codecs on lower framerates, while being thousands of times faster in decoding than deep models utilizing an autoregressive entropy model.
Submitted 20 August, 2020;
originally announced August 2020.
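The video compression abstract above centers on modeling the conditional entropy between frames. As a hedged illustration of why that matters, and not the paper's model, the ideal cost of an entropy coder is the sum of -log2 of the probability the model assigned to each symbol that actually occurred. The function name and the probability values below are made up for the example.

```python
import math
from typing import Sequence


def ideal_code_length_bits(probs_of_actual_symbols: Sequence[float]) -> float:
    """Ideal entropy-coding cost in bits for a sequence of coded symbols.

    Each entry is the probability the model assigned to the symbol that
    actually occurred; the cost of that symbol is -log2(p).
    """
    for p in probs_of_actual_symbols:
        if not 0.0 < p <= 1.0:
            raise ValueError("probabilities must lie in (0, 1]")
    return sum(-math.log2(p) for p in probs_of_actual_symbols)


if __name__ == "__main__":
    # Hypothetical probabilities for the same four latent symbols.
    independent = [0.25, 0.25, 0.25, 0.25]  # model ignores the previous frame
    conditional = [0.5, 0.5, 0.5, 0.25]     # model conditions on the previous frame
    print(ideal_code_length_bits(independent))
    print(ideal_code_length_bits(conditional))
```

A model that conditions on the previous frame and therefore assigns higher probability to the true symbols yields a shorter code, which is the quantity a conditional entropy model is trained to reduce.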
-
Deep Photo Cropper and Enhancer
Authors:
Aaron Ott,
Amir Mazaheri,
Niels D. Lobo,
Mubarak Shah
Abstract:
This paper introduces a new type of image enhancement problem. Compared to traditional image enhancement methods, which mostly deal with pixel-wise modifications of a given photo, our proposed task is to crop an image which is embedded within a photo and enhance the quality of the cropped image. We split our proposed approach into two deep networks: deep photo cropper and deep image enhancer. In the photo cropper network, we employ a spatial transformer to extract the embedded image. In the photo enhancer, we employ super-resolution to increase the number of pixels in the embedded image and reduce the effect of stretching and distortion of pixels. We use cosine distance loss between image features and ground truth for the cropper and the mean square loss for the enhancer. Furthermore, we propose a new dataset to train and test the proposed method. Finally, we analyze the proposed method with respect to qualitative and quantitative evaluations.
Submitted 2 August, 2020;
originally announced August 2020.
-
TinyVIRAT: Low-resolution Video Action Recognition
Authors:
Ugur Demir,
Yogesh S Rawat,
Mubarak Shah
Abstract:
Existing research in action recognition is mostly focused on high-quality videos where the action is distinctly visible. In real-world surveillance environments, the actions in videos are captured at a wide range of resolutions. Most activities occur at a distance with a small resolution, and recognizing such activities is a challenging problem. In this work, we focus on recognizing tiny actions in videos. We introduce a benchmark dataset, TinyVIRAT, which contains natural low-resolution activities. The actions in TinyVIRAT videos have multiple labels and are extracted from surveillance videos, which makes them realistic and more challenging. We propose a novel method for recognizing tiny actions in videos which utilizes a progressive generative approach to improve the quality of low-resolution actions. The proposed method also includes a weakly trained attention mechanism which helps in focusing on the activity regions in the video. We perform extensive experiments to benchmark the proposed TinyVIRAT dataset and observe that the proposed method significantly improves the action recognition performance over baselines. We also evaluate the proposed approach on synthetically resized action recognition datasets and achieve state-of-the-art results when compared with existing methods. The dataset and code are publicly available at https://github.com/UgurDemir/Tiny-VIRAT.
Submitted 14 July, 2020;
originally announced July 2020.
-
Text Synopsis Generation for Egocentric Videos
Authors:
Aidean Sharghi,
Niels da Vitoria Lobo,
Mubarak Shah
Abstract:
Mass utilization of body-worn cameras has led to a huge corpus of available egocentric video. Existing video summarization algorithms can accelerate browsing such videos by selecting (visually) interesting shots from them. Nonetheless, since the system user still has to watch the summary videos, browsing large video databases remains a challenge. Hence, in this work, we propose to generate a textual synopsis, consisting of a few sentences describing the most important events in a long egocentric video. Users can read the short text to gain insight about the video, and more importantly, efficiently search through the content of a large video database using text queries. Since egocentric videos are long and contain many activities and events, using video-to-text algorithms results in thousands of descriptions, many of which are incorrect. Therefore, we propose a multi-task learning scheme to simultaneously generate descriptions for video segments and summarize the resulting descriptions in an end-to-end fashion. We input a set of video shots and the network generates a text description for each shot. Next, a visual-language content matching unit, trained with a weakly supervised objective, identifies the correct descriptions. Finally, the last component of our network, called the purport network, evaluates the descriptions all together to select the ones containing crucial information. Out of thousands of descriptions generated for the video, a few informative sentences are returned to the user. We validate our framework on the challenging UT Egocentric video dataset, where each video is between 3 and 5 hours long and is associated with over 3000 textual descriptions on average. The generated textual summaries, including only 5 percent (or less) of the generated descriptions, are compared to ground-truth summaries in the text domain using well-established metrics in natural language processing.
Submitted 21 September, 2020; v1 submitted 7 May, 2020;
originally announced May 2020.