-
Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation
Authors:
Sidra Aleem,
Fangyijie Wang,
Mayug Maniparambil,
Eric Arazo,
Julia Dietlmeier,
Guenole Silvestre,
Kathleen Curran,
Noel E. O'Connor,
Suzanne Little
Abstract:
The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt driven segmentation model, excels in segmentation tasks across diverse domains, while CLIP is renowned for its zero shot recognition capabilities. However, their unified potential has not yet been explored in medical image segmentation. To adapt SAM to medical imaging, existing methods primarily…
▽ More
The Segment Anything Model (SAM) and CLIP are remarkable vision foundation models (VFMs). SAM, a prompt driven segmentation model, excels in segmentation tasks across diverse domains, while CLIP is renowned for its zero shot recognition capabilities. However, their unified potential has not yet been explored in medical image segmentation. To adapt SAM to medical imaging, existing methods primarily rely on tuning strategies that require extensive data or prior prompts tailored to the specific task, making it particularly challenging when only a limited number of data samples are available. This work presents an in depth exploration of integrating SAM and CLIP into a unified framework for medical image segmentation. Specifically, we propose a simple unified framework, SaLIP, for organ segmentation. Initially, SAM is used for part based segmentation within the image, followed by CLIP to retrieve the mask corresponding to the region of interest (ROI) from the pool of SAM generated masks. Finally, SAM is prompted by the retrieved ROI to segment a specific organ. Thus, SaLIP is training and fine tuning free and does not rely on domain expertise or labeled data for prompt engineering. Our method shows substantial enhancements in zero shot segmentation, showcasing notable improvements in DICE scores across diverse segmentation tasks like brain (63.46%), lung (50.11%), and fetal head (30.82%), when compared to un prompted SAM. Code and text prompts are available at: https://github.com/aleemsidra/SaLIP.
△ Less
Submitted 30 April, 2024; v1 submitted 9 April, 2024;
originally announced April 2024.
-
ConvLoRA and AdaBN based Domain Adaptation via Self-Training
Authors:
Sidra Aleem,
Julia Dietlmeier,
Eric Arazo,
Suzanne Little
Abstract:
Existing domain adaptation (DA) methods often involve pre-training on the source domain and fine-tuning on the target domain. For multi-target domain adaptation, having a dedicated/separate fine-tuned network for each target domain, that retain all the pre-trained model parameters, is prohibitively expensive. To address this limitation, we propose Convolutional Low-Rank Adaptation (ConvLoRA). Conv…
▽ More
Existing domain adaptation (DA) methods often involve pre-training on the source domain and fine-tuning on the target domain. For multi-target domain adaptation, having a dedicated/separate fine-tuned network for each target domain, that retain all the pre-trained model parameters, is prohibitively expensive. To address this limitation, we propose Convolutional Low-Rank Adaptation (ConvLoRA). ConvLoRA freezes pre-trained model weights, adds trainable low-rank decomposition matrices to convolutional layers, and backpropagates the gradient through these matrices thus greatly reducing the number of trainable parameters. To further boost adaptation, we utilize Adaptive Batch Normalization (AdaBN) which computes target-specific running statistics and use it along with ConvLoRA. Our method has fewer trainable parameters and performs better or on-par with large independent fine-tuned networks (with less than 0.9% trainable parameters of the total base model) when tested on the segmentation of Calgary-Campinas dataset containing brain MRI images. Our approach is simple, yet effective and can be applied to any deep learning-based architecture which uses convolutional and batch normalization layers. Code is available at: https://github.com/aleemsidra/ConvLoRA.
△ Less
Submitted 7 February, 2024;
originally announced February 2024.
-
Biased Attention: Do Vision Transformers Amplify Gender Bias More than Convolutional Neural Networks?
Authors:
Abhishek Mandal,
Susan Leavy,
Suzanne Little
Abstract:
Deep neural networks used in computer vision have been shown to exhibit many social biases such as gender bias. Vision Transformers (ViTs) have become increasingly popular in computer vision applications, outperforming Convolutional Neural Networks (CNNs) in many tasks such as image classification. However, given that research on mitigating bias in computer vision has primarily focused on CNNs, it…
▽ More
Deep neural networks used in computer vision have been shown to exhibit many social biases such as gender bias. Vision Transformers (ViTs) have become increasingly popular in computer vision applications, outperforming Convolutional Neural Networks (CNNs) in many tasks such as image classification. However, given that research on mitigating bias in computer vision has primarily focused on CNNs, it is important to evaluate the effect of a different network architecture on the potential for bias amplification. In this paper we therefore introduce a novel metric to measure bias in architectures, Accuracy Difference. We examine bias amplification when models belonging to these two architectures are used as a part of large multimodal models, evaluating the different image encoders of Contrastive Language Image Pretraining which is an important model used in many generative models such as DALL-E and Stable Diffusion. Our experiments demonstrate that architecture can play a role in amplifying social biases due to the different techniques employed by the models for feature extraction and embedding as well as their different learning properties. This research found that ViTs amplified gender bias to a greater extent than CNNs
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
Gender Bias in Multimodal Models: A Transnational Feminist Approach Considering Geographical Region and Culture
Authors:
Abhishek Mandal,
Suzanne Little,
Susan Leavy
Abstract:
Deep learning based visual-linguistic multimodal models such as Contrastive Language Image Pre-training (CLIP) have become increasingly popular recently and are used within text-to-image generative models such as DALL-E and Stable Diffusion. However, gender and other social biases have been uncovered in these models, and this has the potential to be amplified and perpetuated through AI systems. In…
▽ More
Deep learning based visual-linguistic multimodal models such as Contrastive Language Image Pre-training (CLIP) have become increasingly popular recently and are used within text-to-image generative models such as DALL-E and Stable Diffusion. However, gender and other social biases have been uncovered in these models, and this has the potential to be amplified and perpetuated through AI systems. In this paper, we present a methodology for auditing multimodal models that consider gender, informed by concepts from transnational feminism, including regional and cultural dimensions. Focusing on CLIP, we found evidence of significant gender bias with varying patterns across global regions. Harmful stereotypical associations were also uncovered related to visual cultural cues and labels such as terrorism. Levels of gender bias uncovered within CLIP for different regions aligned with global indices of societal gender equality, with those from the Global South reflecting the highest levels of gender bias.
△ Less
Submitted 10 September, 2023;
originally announced September 2023.
-
Multi-Band Acoustic Monitoring of Aerial Signatures
Authors:
Andrew Mead,
Sarah Little,
Paul Sail,
Michelle Tu,
Wesley Andrés Watters,
Abigail White,
Richard Cloete
Abstract:
The Galileo Project's acoustic monitoring, omni-directional system (AMOS) aids in the detection and characterization of aerial phenomena. It uses a multi-band microphone suite spanning infrasonic to ultrasonic frequencies, providing an independent signal modality for validation and characterization of detected objects. The system utilizes infrasonic, audible, and ultrasonic systems to cover a wide…
▽ More
The Galileo Project's acoustic monitoring, omni-directional system (AMOS) aids in the detection and characterization of aerial phenomena. It uses a multi-band microphone suite spanning infrasonic to ultrasonic frequencies, providing an independent signal modality for validation and characterization of detected objects. The system utilizes infrasonic, audible, and ultrasonic systems to cover a wide range of sounds produced by both natural and man-made aerial phenomena. Sound signals from aerial objects can be captured given certain conditions, such as when the sound level is above ambient noise and isn't excessively distorted by its transmission path. Findings suggest that audible sources can be detected up to 1 km away, infrasonic sources can be detected over much longer distances, and ultrasonic at shorter ones. Initial data collected from aircraft recordings with spectral analysis will help develop algorithms and software for quick identification of known aircraft. Future work will involve multi-sensor arrays for sound localization, larger data sets analysis, and incorporation of machine learning and AI for detection and identification of more types of phenomena in all frequency bands.
△ Less
Submitted 29 May, 2023;
originally announced May 2023.
-
An Ensemble Deep Learning Approach for COVID-19 Severity Prediction Using Chest CT Scans
Authors:
Sidra Aleem,
Mayug Maniparambil,
Suzanne Little,
Noel O'Connor,
Kevin McGuinness
Abstract:
Chest X-rays have been widely used for COVID-19 screening; however, 3D computed tomography (CT) is a more effective modality. We present our findings on COVID-19 severity prediction from chest CT scans using the STOIC dataset. We developed an ensemble deep learning based model that incorporates multiple neural networks to improve predictions. To address data imbalance, we used slicing functions an…
▽ More
Chest X-rays have been widely used for COVID-19 screening; however, 3D computed tomography (CT) is a more effective modality. We present our findings on COVID-19 severity prediction from chest CT scans using the STOIC dataset. We developed an ensemble deep learning based model that incorporates multiple neural networks to improve predictions. To address data imbalance, we used slicing functions and data augmentation. We further improved performance using test time data augmentation. Our approach which employs a simple yet effective ensemble of deep learning-based models with strong test time augmentations, achieved results comparable to more complex methods and secured the fourth position in the STOIC2021 COVID-19 AI Challenge. Our code is available on online: at: https://github.com/aleemsidra/stoic2021- baseline-finalphase-main.
△ Less
Submitted 17 May, 2023;
originally announced May 2023.
-
Multimodal Composite Association Score: Measuring Gender Bias in Generative Multimodal Models
Authors:
Abhishek Mandal,
Susan Leavy,
Suzanne Little
Abstract:
Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawl…
▽ More
Generative multimodal models based on diffusion models have seen tremendous growth and advances in recent years. Models such as DALL-E and Stable Diffusion have become increasingly popular and successful at creating images from texts, often combining abstract ideas. However, like other deep learning models, they also reflect social biases they inherit from their training data, which is often crawled from the internet. Manually auditing models for biases can be very time and resource consuming and is further complicated by the unbounded and unconstrained nature of inputs these models can take. Research into bias measurement and quantification has generally focused on small single-stage models working on a single modality. Thus the emergence of multistage multimodal models requires a different approach. In this paper, we propose Multimodal Composite Association Score (MCAS) as a new method of measuring gender bias in multimodal generative models. Evaluating both DALL-E 2 and Stable Diffusion using this approach uncovered the presence of gendered associations of concepts embedded within the models. We propose MCAS as an accessible and scalable method of quantifying potential bias for models with different modalities and a range of potential biases.
△ Less
Submitted 26 April, 2023;
originally announced April 2023.
-
Constructing a meta-learner for unsupervised anomaly detection
Authors:
Małgorzata Gutowska,
Suzanne Little,
Andrew McCarren
Abstract:
Unsupervised anomaly detection (AD) is critical for a wide range of practical applications, from network security to health and medical tools. Due to the diversity of problems, no single algorithm has been found to be superior for all AD tasks. Choosing an algorithm, otherwise known as the Algorithm Selection Problem (ASP), has been extensively examined in supervised classification problems, throu…
▽ More
Unsupervised anomaly detection (AD) is critical for a wide range of practical applications, from network security to health and medical tools. Due to the diversity of problems, no single algorithm has been found to be superior for all AD tasks. Choosing an algorithm, otherwise known as the Algorithm Selection Problem (ASP), has been extensively examined in supervised classification problems, through the use of meta-learning and AutoML, however, it has received little attention in unsupervised AD tasks. This research proposes a new meta-learning approach that identifies an appropriate unsupervised AD algorithm given a set of meta-features generated from the unlabelled input dataset. The performance of the proposed meta-learner is superior to the current state of the art solution. In addition, a mixed model statistical analysis has been conducted to examine the impact of the meta-learner components: the meta-model, meta-features, and the base set of AD algorithms, on the overall performance of the meta-learner. The analysis was conducted using more than 10,000 datasets, which is significantly larger than previous studies. Results indicate that a relatively small number of meta-features can be used to identify an appropriate AD algorithm, but the choice of a meta-model in the meta-learner has a considerable impact.
△ Less
Submitted 22 April, 2023;
originally announced April 2023.
-
Random Data Augmentation based Enhancement: A Generalized Enhancement Approach for Medical Datasets
Authors:
Sidra Aleem,
Teerath Kumar,
Suzanne Little,
Malika Bendechache,
Rob Brennan,
Kevin McGuinness
Abstract:
Over the years, the paradigm of medical image analysis has shifted from manual expertise to automated systems, often using deep learning (DL) systems. The performance of deep learning algorithms is highly dependent on data quality. Particularly for the medical domain, it is an important aspect as medical data is very sensitive to quality and poor quality can lead to misdiagnosis. To improve the di…
▽ More
Over the years, the paradigm of medical image analysis has shifted from manual expertise to automated systems, often using deep learning (DL) systems. The performance of deep learning algorithms is highly dependent on data quality. Particularly for the medical domain, it is an important aspect as medical data is very sensitive to quality and poor quality can lead to misdiagnosis. To improve the diagnostic performance, research has been done both in complex DL architectures and in improving data quality using dataset dependent static hyperparameters. However, the performance is still constrained due to data quality and overfitting of hyperparameters to a specific dataset. To overcome these issues, this paper proposes random data augmentation based enhancement. The main objective is to develop a generalized, data-independent and computationally efficient enhancement approach to improve medical data quality for DL. The quality is enhanced by improving the brightness and contrast of images. In contrast to the existing methods, our method generates enhancement hyperparameters randomly within a defined range, which makes it robust and prevents overfitting to a specific dataset. To evaluate the generalization of the proposed method, we use four medical datasets and compare its performance with state-of-the-art methods for both classification and segmentation tasks. For grayscale imagery, experiments have been performed with: COVID-19 chest X-ray, KiTS19, and for RGB imagery with: LC25000 datasets. Experimental results demonstrate that with the proposed enhancement methodology, DL architectures outperform other existing methods. Our code is publicly available at: https://github.com/aleemsidra/Augmentation-Based-Generalized-Enhancement
△ Less
Submitted 3 October, 2022;
originally announced October 2022.
-
TrollsWithOpinion: A Dataset for Predicting Domain-specific Opinion Manipulation in Troll Memes
Authors:
Shardul Suryawanshi,
Bharathi Raja Chakravarthi,
Mihael Arcan,
Suzanne Little,
Paul Buitelaar
Abstract:
Research into the classification of Image with Text (IWT) troll memes has recently become popular. Since the online community utilizes the refuge of memes to express themselves, there is an abundance of data in the form of memes. These memes have the potential to demean, harras, or bully targeted individuals. Moreover, the targeted individual could fall prey to opinion manipulation. To comprehend…
▽ More
Research into the classification of Image with Text (IWT) troll memes has recently become popular. Since the online community utilizes the refuge of memes to express themselves, there is an abundance of data in the form of memes. These memes have the potential to demean, harras, or bully targeted individuals. Moreover, the targeted individual could fall prey to opinion manipulation. To comprehend the use of memes in opinion manipulation, we define three specific domains (product, political or others) which we classify into troll or not-troll, with or without opinion manipulation. To enable this analysis, we enhanced an existing dataset by annotating the data with our defined classes, resulting in a dataset of 8,881 IWT or multimodal memes in the English language (TrollsWithOpinion dataset). We perform baseline experiments on the annotated dataset, and our result shows that existing state-of-the-art techniques could only reach a weighted-average F1-score of 0.37. This shows the need for a development of a specific technique to deal with multimodal troll memes.
△ Less
Submitted 10 May, 2022; v1 submitted 8 September, 2021;
originally announced September 2021.
-
Utilising Visual Attention Cues for Vehicle Detection and Tracking
Authors:
Feiyan Hu,
Venkatesh G M,
Noel E. O'Connor,
Alan F. Smeaton,
Suzanne Little
Abstract:
Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) How a visual attention map such as a \emph{subjectness} attention or saliency m…
▽ More
Advanced Driver-Assistance Systems (ADAS) have been attracting attention from many researchers. Vision-based sensors are the closest way to emulate human driver visual behavior while driving. In this paper, we explore possible ways to use visual attention (saliency) for object detection and tracking. We investigate: 1) How a visual attention map such as a \emph{subjectness} attention or saliency map and an \emph{objectness} attention map can facilitate region proposal generation in a 2-stage object detector; 2) How a visual attention map can be used for tracking multiple objects. We propose a neural network that can simultaneously detect objects as and generate objectness and subjectness maps to save computational power. We further exploit the visual attention map during tracking using a sequential Monte Carlo probability hypothesis density (PHD) filter. The experiments are conducted on KITTI and DETRAC datasets. The use of visual attention and hierarchical features has shown a considerable improvement of $\approx$8\% in object detection which effectively increased tracking performance by $\approx$4\% on KITTI dataset.
△ Less
Submitted 31 July, 2020;
originally announced August 2020.
-
ECIR 2020 Workshops: Assessing the Impact of Going Online
Authors:
Sérgio Nunes,
Suzanne Little,
Sumit Bhatia,
Ludovico Boratto,
Guillaume Cabanac,
Ricardo Campos,
Francisco M. Couto,
Stefano Faralli,
Ingo Frommholz,
Adam Jatowt,
Alípio Jorge,
Mirko Marras,
Philipp Mayr,
Giovanni Stilo
Abstract:
ECIR 2020 https://ecir2020.org/ was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organizing the ECIR 2020 Workshops in this scenario from two perspectives: the workshop organizers and the workshop participants. We pr…
▽ More
ECIR 2020 https://ecir2020.org/ was one of the many conferences affected by the COVID-19 pandemic. The Conference Chairs decided to keep the initially planned dates (April 14-17, 2020) and move to a fully online event. In this report, we describe the experience of organizing the ECIR 2020 Workshops in this scenario from two perspectives: the workshop organizers and the workshop participants. We provide a report on the organizational aspect of these events and the consequences for participants. Covering the scientific dimension of each workshop is outside the scope of this article.
△ Less
Submitted 14 May, 2020;
originally announced May 2020.
-
MediaEval 2019: Concealed FGSM Perturbations for Privacy Preservation
Authors:
Panagiotis Linardos,
Suzanne Little,
Kevin McGuinness
Abstract:
This work tackles the Pixel Privacy task put forth by MediaEval 2019. Our goal is to manipulate images in a way that conceals them from automatic scene classifiers while preserving the original image quality. We use the fast gradient sign method, which normally has a corrupting influence on image appeal, and devise two methods to minimize the damage. The first approach uses a map of pixel location…
▽ More
This work tackles the Pixel Privacy task put forth by MediaEval 2019. Our goal is to manipulate images in a way that conceals them from automatic scene classifiers while preserving the original image quality. We use the fast gradient sign method, which normally has a corrupting influence on image appeal, and devise two methods to minimize the damage. The first approach uses a map of pixel locations that are either salient or flat, and directs perturbations away from them. The second approach subtracts the gradient of an aesthetics evaluation model from the gradient of the attack model to guide the perturbations towards a direction that preserves appeal. We make our code available at: https://git.io/JesXr.
△ Less
Submitted 25 October, 2019;
originally announced October 2019.
-
Differentially Private Continual Release of Graph Statistics
Authors:
Shuang Song,
Susan Little,
Sanjay Mehta,
Staal Vinterbo,
Kamalika Chaudhuri
Abstract:
Motivated by understanding the dynamics of sensitive social networks over time, we consider the problem of continual release of statistics in a network that arrives online, while preserving privacy of its participants. For our privacy notion, we use differential privacy -- the gold standard in privacy for statistical data analysis. The main challenge in this problem is maintaining a good privacy-u…
▽ More
Motivated by understanding the dynamics of sensitive social networks over time, we consider the problem of continual release of statistics in a network that arrives online, while preserving privacy of its participants. For our privacy notion, we use differential privacy -- the gold standard in privacy for statistical data analysis. The main challenge in this problem is maintaining a good privacy-utility tradeoff; naive solutions that compose across time, as well as solutions suited to tabular data either lead to poor utility or do not directly apply. In this work, we show that if there is a publicly known upper bound on the maximum degree of any node in the entire network sequence, then we can release many common graph statistics such as degree distributions and subgraph counts continually with a better privacy-accuracy tradeoff.
Code available at https://bitbucket.org/shs037/graphprivacycode
△ Less
Submitted 18 September, 2018; v1 submitted 7 September, 2018;
originally announced September 2018.
-
People, Penguins and Petri Dishes: Adapting Object Counting Models To New Visual Domains And Object Types Without Forgetting
Authors:
Mark Marsden,
Kevin McGuinness,
Suzanne Little,
Ciara E. Keogh,
Noel E. O'Connor
Abstract:
In this paper we propose a technique to adapt a convolutional neural network (CNN) based object counter to additional visual domains and object types while still preserving the original counting function. Domain-specific normalisation and scaling operators are trained to allow the model to adjust to the statistical distributions of the various visual domains. The developed adaptation technique is…
▽ More
In this paper we propose a technique to adapt a convolutional neural network (CNN) based object counter to additional visual domains and object types while still preserving the original counting function. Domain-specific normalisation and scaling operators are trained to allow the model to adjust to the statistical distributions of the various visual domains. The developed adaptation technique is used to produce a singular patch-based counting regressor capable of counting various object types including people, vehicles, cell nuclei and wildlife. As part of this study a challenging new cell counting dataset in the context of tissue culture and patient diagnosis is constructed. This new collection, referred to as the Dublin Cell Counting (DCC) dataset, is the first of its kind to be made available to the wider computer vision community. State-of-the-art object counting performance is achieved in both the Shanghaitech (parts A and B) and Penguins datasets while competitive performance is observed on the TRANCOS and Modified Bone Marrow (MBM) datasets, all using a shared counting model.
△ Less
Submitted 15 November, 2017;
originally announced November 2017.
-
ResnetCrowd: A Residual Deep Learning Architecture for Crowd Counting, Violent Behaviour Detection and Crowd Density Level Classification
Authors:
Mark Marsden,
Kevin McGuinness,
Suzanne Little,
Noel E. O'Connor
Abstract:
In this paper we propose ResnetCrowd, a deep residual architecture for simultaneous crowd counting, violent behaviour detection and crowd density level classification. To train and evaluate the proposed multi-objective technique, a new 100 image dataset referred to as Multi Task Crowd is constructed. This new dataset is the first computer vision dataset fully annotated for crowd counting, violent…
▽ More
In this paper we propose ResnetCrowd, a deep residual architecture for simultaneous crowd counting, violent behaviour detection and crowd density level classification. To train and evaluate the proposed multi-objective technique, a new 100 image dataset referred to as Multi Task Crowd is constructed. This new dataset is the first computer vision dataset fully annotated for crowd counting, violent behaviour detection and density level classification. Our experiments show that a multi-task approach boosts individual task performance for all tasks and most notably for violent behaviour detection which receives a 9\% boost in ROC curve AUC (Area under the curve). The trained ResnetCrowd model is also evaluated on several additional benchmarks highlighting the superior generalisation of crowd analysis models trained for multiple objectives.
△ Less
Submitted 30 May, 2017;
originally announced May 2017.
-
Fully Convolutional Crowd Counting On Highly Congested Scenes
Authors:
Mark Marsden,
Kevin McGuinness,
Suzanne Little,
Noel E. O'Connor
Abstract:
In this paper we advance the state-of-the-art for crowd counting in high density scenes by further exploring the idea of a fully convolutional crowd counting model introduced by (Zhang et al., 2016). Producing an accurate and robust crowd count estimator using computer vision techniques has attracted significant research interest in recent years. Applications for crowd counting systems exist in ma…
▽ More
In this paper we advance the state-of-the-art for crowd counting in high density scenes by further exploring the idea of a fully convolutional crowd counting model introduced by (Zhang et al., 2016). Producing an accurate and robust crowd count estimator using computer vision techniques has attracted significant research interest in recent years. Applications for crowd counting systems exist in many diverse areas including city planning, retail, and of course general public safety. Developing a highly generalised counting model that can be deployed in any surveillance scenario with any camera perspective is the key objective for research in this area. Techniques developed in the past have generally performed poorly in highly congested scenes with several thousands of people in frame (Rodriguez et al., 2011). Our approach, influenced by the work of (Zhang et al., 2016), consists of the following contributions: (1) A training set augmentation scheme that minimises redundancy among training samples to improve model generalisation and overall counting performance; (2) a deep, single column, fully convolutional network (FCN) architecture; (3) a multi-scale averaging step during inference. The developed technique can analyse images of any resolution or aspect ratio and achieves state-of-the-art counting performance on the Shanghaitech Part B and UCF CC 50 datasets as well as competitive performance on Shanghaitech Part A.
△ Less
Submitted 17 January, 2017; v1 submitted 1 December, 2016;
originally announced December 2016.
-
Holistic Features For Real-Time Crowd Behaviour Anomaly Detection
Authors:
M. Marsden,
K. McGuinness,
S. Little,
N. E. O'Connor
Abstract:
This paper presents a new approach to crowd behaviour anomaly detection that uses a set of efficiently computed, easily interpretable, scene-level holistic features. This low-dimensional descriptor combines two features from the literature: crowd collectiveness [1] and crowd conflict [2], with two newly developed crowd features: mean motion speed and a new formulation of crowd density. Two differe…
▽ More
This paper presents a new approach to crowd behaviour anomaly detection that uses a set of efficiently computed, easily interpretable, scene-level holistic features. This low-dimensional descriptor combines two features from the literature: crowd collectiveness [1] and crowd conflict [2], with two newly developed crowd features: mean motion speed and a new formulation of crowd density. Two different anomaly detection approaches are investigated using these features. When only normal training data is available we use a Gaussian Mixture Model (GMM) for outlier detection. When both normal and abnormal training data is available we use a Support Vector Machine (SVM) for binary classification. We evaluate on two crowd behaviour anomaly detection datasets, achieving both state-of-the-art classification performance on the violent-flows dataset [3] as well as better than real-time processing performance (40 frames per second).
△ Less
Submitted 16 June, 2016;
originally announced June 2016.