-
A deep learning framework for the detection and quantification of drusen and reticular pseudodrusen on optical coherence tomography
Authors:
Roy Schwartz,
Hagar Khalid,
Sandra Liakopoulos,
Yanling Ouyang,
Coen de Vente,
Cristina González-Gonzalo,
Aaron Y. Lee,
Robyn Guymer,
Emily Y. Chew,
Catherine Egan,
Zhichao Wu,
Himeesh Kumar,
Joseph Farrington,
Clara I. Sánchez,
Adnan Tufail
Abstract:
Purpose - To develop and validate a deep learning (DL) framework for the detection and quantification of drusen and reticular pseudodrusen (RPD) on optical coherence tomography scans.
Design - Development and validation of deep learning models for classification and feature segmentation.
Methods - A DL framework was developed consisting of a classification model and an out-of-distribution (OOD…
▽ More
Purpose - To develop and validate a deep learning (DL) framework for the detection and quantification of drusen and reticular pseudodrusen (RPD) on optical coherence tomography scans.
Design - Development and validation of deep learning models for classification and feature segmentation.
Methods - A DL framework was developed consisting of a classification model and an out-of-distribution (OOD) detection model for the identification of ungradable scans; a classification model to identify scans with drusen or RPD; and an image segmentation model to independently segment lesions as RPD or drusen. Data were obtained from 1284 participants in the UK Biobank (UKBB) with a self-reported diagnosis of age-related macular degeneration (AMD) and 250 UKBB controls. Drusen and RPD were manually delineated by five retina specialists. The main outcome measures were sensitivity, specificity, area under the ROC curve (AUC), kappa, accuracy and intraclass correlation coefficient (ICC).
Results - The classification models performed strongly at their respective tasks (0.95, 0.93, and 0.99 AUC, respectively, for the ungradable scans classifier, the OOD model, and the drusen and RPD classification model). The mean ICC for drusen and RPD area vs. graders was 0.74 and 0.61, respectively, compared with 0.69 and 0.68 for intergrader agreement. FROC curves showed that the model's sensitivity was close to human performance.
Conclusions - The models achieved high classification and segmentation performance, similar to human performance. Application of this robust framework will further our understanding of RPD as a separate entity from drusen in both research and clinical settings.
△ Less
Submitted 5 April, 2022;
originally announced April 2022.
-
Adversarial Attack Vulnerability of Medical Image Analysis Systems: Unexplored Factors
Authors:
Gerda Bortsova,
Cristina González-Gonzalo,
Suzanne C. Wetstein,
Florian Dubost,
Ioannis Katramados,
Laurens Hogeweg,
Bart Liefers,
Bram van Ginneken,
Josien P. W. Pluim,
Mitko Veta,
Clara I. Sánchez,
Marleen de Bruijne
Abstract:
Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure.
In this paper, we study previously unexplored factors affecting adversarial attack vulnerability of deep l…
▽ More
Adversarial attacks are considered a potentially serious security threat for machine learning systems. Medical image analysis (MedIA) systems have recently been argued to be vulnerable to adversarial attacks due to strong financial incentives and the associated technological infrastructure.
In this paper, we study previously unexplored factors affecting adversarial attack vulnerability of deep learning MedIA systems in three medical domains: ophthalmology, radiology, and pathology. We focus on adversarial black-box settings, in which the attacker does not have full access to the target model and usually uses another model, commonly referred to as surrogate model, to craft adversarial examples. We consider this to be the most realistic scenario for MedIA systems.
Firstly, we study the effect of weight initialization (ImageNet vs. random) on the transferability of adversarial attacks from the surrogate model to the target model. Secondly, we study the influence of differences in development data between target and surrogate models. We further study the interaction of weight initialization and data differences with differences in model architecture. All experiments were done with a perturbation degree tuned to ensure maximal transferability at minimal visual perceptibility of the attacks.
Our experiments show that pre-training may dramatically increase the transferability of adversarial examples, even when the target and surrogate's architectures are different: the larger the performance gain using pre-training, the larger the transferability. Differences in the development data between target and surrogate models considerably decrease the performance of the attack; this decrease is further amplified by difference in the model architecture. We believe these factors should be considered when developing security-critical MedIA systems planned to be deployed in clinical practice.
△ Less
Submitted 17 June, 2021; v1 submitted 11 June, 2020;
originally announced June 2020.
-
Iterative Augmentation of Visual Evidence for Weakly-Supervised Lesion Localization in Deep Interpretability Frameworks: Application to Color Fundus Images
Authors:
Cristina González-Gonzalo,
Bart Liefers,
Bram van Ginneken,
Clara I. Sánchez
Abstract:
Interpretability of deep learning (DL) systems is gaining attention in medical imaging to increase experts' trust in the obtained predictions and facilitate their integration in clinical settings. We propose a deep visualization method to generate interpretability of DL classification tasks in medical imaging by means of visual evidence augmentation. The proposed method iteratively unveils abnorma…
▽ More
Interpretability of deep learning (DL) systems is gaining attention in medical imaging to increase experts' trust in the obtained predictions and facilitate their integration in clinical settings. We propose a deep visualization method to generate interpretability of DL classification tasks in medical imaging by means of visual evidence augmentation. The proposed method iteratively unveils abnormalities based on the prediction of a classifier trained only with image-level labels. For each image, initial visual evidence of the prediction is extracted with a given visual attribution technique. This provides localization of abnormalities that are then removed through selective inpainting. We iteratively apply this procedure until the system considers the image as normal. This yields augmented visual evidence, including less discriminative lesions which were not detected at first but should be considered for final diagnosis. We apply the method to grading of two retinal diseases in color fundus images: diabetic retinopathy (DR) and age-related macular degeneration (AMD). We evaluate the generated visual evidence and the performance of weakly-supervised localization of different types of DR and AMD abnormalities, both qualitatively and quantitatively. We show that the augmented visual evidence of the predictions highlights the biomarkers considered by experts for diagnosis and improves the final localization performance. It results in a relative increase of 11.2+/-2.0% per image regarding sensitivity averaged at 10 false positives/image on average, when applied to different classification tasks, visual attribution techniques and network architectures. This makes the proposed method a useful tool for exhaustive visual support of DL classifiers in medical imaging.
△ Less
Submitted 1 February, 2022; v1 submitted 16 October, 2019;
originally announced October 2019.
-
A deep learning model for segmentation of geographic atrophy to study its long-term natural history
Authors:
Bart Liefers,
Johanna M. Colijn,
Cristina González-Gonzalo,
Timo Verzijden,
Paul Mitchell,
Carel B. Hoyng,
Bram van Ginneken,
Caroline C. W. Klaver,
Clara I. Sánchez
Abstract:
Purpose: To develop and validate a deep learning model for automatic segmentation of geographic atrophy (GA) in color fundus images (CFIs) and its application to study growth rate of GA. Participants: 409 CFIs of 238 eyes with GA from the Rotterdam Study (RS) and the Blue Mountain Eye Study (BMES) for model development, and 5,379 CFIs of 625 eyes from the Age-Related Eye Disease Study (AREDS) for…
▽ More
Purpose: To develop and validate a deep learning model for automatic segmentation of geographic atrophy (GA) in color fundus images (CFIs) and its application to study growth rate of GA. Participants: 409 CFIs of 238 eyes with GA from the Rotterdam Study (RS) and the Blue Mountain Eye Study (BMES) for model development, and 5,379 CFIs of 625 eyes from the Age-Related Eye Disease Study (AREDS) for analysis of GA growth rate. Methods: A deep learning model based on an ensemble of encoder-decoder architectures was implemented and optimized for the segmentation of GA in CFIs. Four experienced graders delineated GA in CFIs from RS and BMES. These manual delineations were used to evaluate the segmentation model using 5-fold cross-validation. The model was further applied to CFIs from the AREDS to study the growth rate of GA. Linear regression analysis was used to study associations between structural biomarkers at baseline and GA growth rate. A general estimate of the progression of GA area over time was made by combining growth rates of all eyes with GA from the AREDS set. Results: The model obtained an average Dice coefficient of 0.72 $\pm$ 0.26 on the BMES and RS. An intraclass correlation coefficient of 0.83 was reached between the automatically estimated GA area and the graders' consensus measures. Eight automatically calculated structural biomarkers (area, filled area, convex area, convex solidity, eccentricity, roundness, foveal involvement and perimeter) were significantly associated with growth rate. Combining all growth rates indicated that GA area grows quadratically up to an area of around 12 mm$^{2}$, after which growth rate stabilizes or decreases. Conclusion: The presented deep learning model allowed for fully automatic and robust segmentation of GA in CFIs. These segmentations can be used to extract structural characteristics of GA that predict its growth rate.
△ Less
Submitted 15 August, 2019;
originally announced August 2019.
-
Evaluation of a deep learning system for the joint automated detection of diabetic retinopathy and age-related macular degeneration
Authors:
Cristina González-Gonzalo,
Verónica Sánchez-Gutiérrez,
Paula Hernández-Martínez,
Inés Contreras,
Yara T. Lechanteur,
Artin Domanian,
Bram van Ginneken,
Clara I. Sánchez
Abstract:
Purpose: To validate the performance of a commercially-available, CE-certified deep learning (DL) system, RetCAD v.1.3.0 (Thirona, Nijmegen, The Netherlands), for the joint automatic detection of diabetic retinopathy (DR) and age-related macular degeneration (AMD) in color fundus (CF) images on a dataset with mixed presence of eye diseases.
Methods: Evaluation of joint detection of referable DR…
▽ More
Purpose: To validate the performance of a commercially-available, CE-certified deep learning (DL) system, RetCAD v.1.3.0 (Thirona, Nijmegen, The Netherlands), for the joint automatic detection of diabetic retinopathy (DR) and age-related macular degeneration (AMD) in color fundus (CF) images on a dataset with mixed presence of eye diseases.
Methods: Evaluation of joint detection of referable DR and AMD was performed on a DR-AMD dataset with 600 images acquired during routine clinical practice, containing referable and non-referable cases of both diseases. Each image was graded for DR and AMD by an experienced ophthalmologist to establish the reference standard (RS), and by four independent observers for comparison with human performance. Validation was furtherly assessed on Messidor (1200 images) for individual identification of referable DR, and the Age-Related Eye Disease Study (AREDS) dataset (133821 images) for referable AMD, against the corresponding RS.
Results: Regarding joint validation on the DR-AMD dataset, the system achieved an area under the ROC curve (AUC) of 95.1% for detection of referable DR (SE=90.1%, SP=90.6%). For referable AMD, the AUC was 94.9% (SE=91.8%, SP=87.5%). Average human performance for DR was SE=61.5% and SP=97.8%; for AMD, SE=76.5% and SP=96.1%. Regarding detection of referable DR in Messidor, AUC was 97.5% (SE=92.0%, SP=92.1%); for referable AMD in AREDS, AUC was 92.7% (SE=85.8%, SP=86.0%).
Conclusions: The validated system performs comparably to human experts at simultaneous detection of DR and AMD. This shows that DL systems can facilitate access to joint screening of eye diseases and become a quick and reliable support for ophthalmological experts.
△ Less
Submitted 22 March, 2019;
originally announced March 2019.