Reasoning with trees: interpreting CNNs using hierarchies

Authors: Caroline Mazini Rodrigues, Nicolas Boutry, Laurent Najman

Abstract: Challenges persist in providing interpretable explanations for neural network reasoning in explainable AI (xAI). Existing methods like Integrated Gradients produce noisy maps, and LIME, while intuitive, may deviate from the model's reasoning. We introduce a framework that uses hierarchical segmentation techniques for faithful and interpretable explanations of Convolutional Neural Networks (CNNs). Our method constructs model-based hierarchical segmentations that maintain the model's reasoning fidelity and allows both human-centric and model-centric segmentation. This approach offers multiscale explanations, aiding bias identification and enhancing understanding of neural network decision-making. Experiments show that our framework, xAiTrees, delivers highly interpretable and faithful model explanations, not only surpassing traditional xAI methods but shedding new light on a novel approach to enhancing xAI interpretability. Code at: https://github.com/CarolMazini/reasoning_with_trees

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2405.11573 [pdf, other]

Quantile Activation: departing from single point estimation for better generalization across distortions

Authors: Aditya Challa, Sravan Danda, Laurent Najman, Snehanshu Saha

Abstract: A classifier is, in its essence, a function which takes an input and returns the class of the input and implicitly assumes an underlying distribution. We argue in this article that one has to move away from this basic tenet to obtain generalisation across distributions. Specifically, the class of the sample should depend on the points from its context distribution for better generalisation across distributions. How does one achieve this? The key idea is to adapt the outputs of each neuron of the network to its context distribution. We propose quantile activation, QACT, which, in simple terms, outputs the relative quantile of the sample in its context distribution, instead of the actual values in traditional networks. The scope of this article is to validate the proposed activation across several experimental settings, and compare it with conventional techniques. For this, we use the datasets developed to test robustness against distortions (CIFAR10C, CIFAR100C, MNISTC, TinyImagenetC), and show that we achieve a significantly higher generalisation across distortions than the conventional classifiers, across different architectures. Although this paper is only a proof of concept, we surprisingly find that this approach outperforms DINOv2(small) at large distortions, even though DINOv2 is trained with a far bigger network on a considerably larger dataset.

Submitted 19 May, 2024; originally announced May 2024.
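
The core idea in the abstract above, replacing a neuron's raw value by its relative quantile in a context distribution, can be illustrated with a minimal NumPy sketch. This is not the authors' QACT implementation: the function name `quantile_activation`, the use of the current batch as the context distribution, and the empirical-CDF estimate are all assumptions made for illustration only.

```python
import numpy as np


def quantile_activation(pre_activations):
    """Map each pre-activation to its empirical quantile within the context batch.

    Hypothetical helper: the batch itself is assumed to be the context
    distribution, and the quantile is a plain empirical CDF value.
    """
    z = np.asarray(pre_activations, dtype=float)  # shape: (context_size, n_neurons)
    n = z.shape[0]
    # For every sample i and neuron k, count context samples j with z[j, k] <= z[i, k].
    counts = (z[None, :, :] <= z[:, None, :]).sum(axis=1)
    return counts / n  # values in (0, 1]


context = np.array([[1.0], [3.0], [2.0]])
print(quantile_activation(context).ravel())
```

For the three-sample context shown, the outputs are the ranks 1/3, 1 and 2/3. Because only the ordering within the context matters, a strictly increasing distortion of the whole context (for example `10 * context + 5`) leaves the output unchanged, which is one way to read the robustness claim in the abstract.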

arXiv:2403.08789 [pdf, other]

Bridging Human Concepts and Computer Vision for Explainable Face Verification

Authors: Miriam Doh, Caroline Mazini Rodrigues, Nicolas Boutry, Laurent Najman, Matei Mancas, Hugues Bersini

Abstract: With Artificial Intelligence (AI) influencing the decision-making process of sensitive applications such as Face Verification, it is fundamental to ensure the transparency, fairness, and accountability of decisions. Although Explainable Artificial Intelligence (XAI) techniques exist to clarify AI decisions, it is equally important to provide interpretability of these decisions to humans. In this paper, we present an approach to combine computer and human vision to increase the interpretability of the explanations of a face verification algorithm. In particular, we are inspired by the human perceptual process to understand how machines perceive a face's human-semantic areas during face comparison tasks. We use Mediapipe, which provides a segmentation technique that identifies distinct human-semantic facial regions, enabling the machine's perception analysis. Additionally, we adapted two model-agnostic algorithms to provide human-interpretable insights into the decision-making processes.

Submitted 30 January, 2024; originally announced March 2024.

arXiv:2403.00504 [pdf, other]

Learning and Leveraging World Models in Visual Representation Learning

Authors: Quentin Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun

Abstract: Joint-Embedding Predictive Architecture (JEPA) has emerged as a promising self-supervised approach that learns by leveraging a world model. While previously limited to predicting missing parts of an input, we explore how to generalize the JEPA prediction task to a broader set of corruptions. We introduce Image World Models, an approach that goes beyond masked image modeling and learns to predict the effect of global photometric transformations in latent space. We study the recipe of learning performant IWMs and show that it relies on three key aspects: conditioning, prediction difficulty, and capacity. Additionally, we show that the predictive world model learned by IWM can be adapted through finetuning to solve diverse tasks; a fine-tuned IWM world model matches or surpasses the performance of previous self-supervised methods. Finally, we show that learning with an IWM allows one to control the abstraction level of the learned representations, learning invariant representations such as contrastive methods, or equivariant representations such as masked image modelling.

Submitted 1 March, 2024; originally announced March 2024.

Comments: 23 pages, 16 figures

arXiv:2402.08405 [pdf, other]

A Novel Approach to Regularising 1NN classifier for Improved Generalization

Authors: Aditya Challa, Sravan Danda, Laurent Najman

Abstract: In this paper, we propose a class of non-parametric classifiers that learn arbitrary boundaries and generalize well. Our approach is based on a novel way to regularize 1NN classifiers using a greedy approach. We refer to this class of classifiers as Watershed Classifiers. 1NN classifiers are known to trivially over-fit but have very large VC dimension, hence do not generalize well. We show that watershed classifiers can find arbitrary boundaries on any dense enough dataset, and, at the same time, have very small VC dimension; hence a watershed classifier leads to good generalization. Traditional approaches to regularize 1NN classifiers are to consider $K$ nearest neighbours. Neighbourhood component analysis (NCA) proposes a way to learn representations consistent with ($n-1$) nearest neighbour classifier, where $n$ denotes the size of the dataset. In this article, we propose a loss function which can learn representations consistent with watershed classifiers, and show that it outperforms the NCA baseline.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.07507 [pdf, other]

Clustering Dynamics for Improved Speed Prediction Deriving from Topographical GPS Registrations

Authors: Sarah Almeida Carneiro, Giovanni Chierchia, Aurelie Pirayre, Laurent Najman

Abstract: A persistent challenge in the field of Intelligent Transportation Systems is to extract accurate traffic insights from geographic regions with scarce or no data coverage. To this end, we propose solutions for speed prediction using sparse GPS data points and their associated topographical and road design features. Our goal is to investigate whether we can use similarities in the terrain and infrastructure to train a machine learning model that can predict speed in regions where we lack transportation data. For this we create a Temporally Orientated Speed Dictionary Centered on Topographically Clustered Roads, which helps us to provide speed correlations to selected feature configurations. Our results show qualitative and quantitative improvement over new and standard regression methods. The presented framework provides a fresh perspective on devising strategies for missing data traffic analysis.

Submitted 12 February, 2024; originally announced February 2024.

arXiv:2402.02874 [pdf, other]

Morse frames

Authors: Gilles Bertrand, Laurent Najman

Abstract: In the context of discrete Morse theory, we introduce Morse frames, which are maps that associate a set of critical simplexes to all simplexes. The main examples of Morse frames are the Morse references. In particular, these Morse references allow computing Morse complexes, an important tool for homology. We highlight the link between Morse references and gradient flows. We also propose a novel presentation of the Annotation algorithm for persistent cohomology, as a variant of a Morse frame. Finally, we propose another construction that takes advantage of the Morse reference for computing the Betti numbers in mod 2 arithmetic.

Submitted 5 February, 2024; originally announced February 2024.

Journal ref: International Conference on Discrete Geometry and Mathematical Morphology (DGMM), S. Brunetti; A. Frosini; S. Rinaldi, Apr 2024, Florence, Italy

arXiv:2401.14434 [pdf, other]

Transforming gradient-based techniques into interpretable methods

Authors: Caroline Mazini Rodrigues, Nicolas Boutry, Laurent Najman

Abstract: The explication of Convolutional Neural Networks (CNN) through xAI techniques often poses challenges in interpretation. The inherent complexity of input features, notably pixels extracted from images, engenders complex correlations. Gradient-based methodologies, exemplified by Integrated Gradients (IG), effectively demonstrate the significance of these features. Nevertheless, the conversion of these explanations into images frequently yields considerable noise. Presently, we introduce GAD (Gradient Artificial Distancing) as a supportive framework for gradient-based techniques. Its primary objective is to accentuate influential regions by establishing distinctions between classes. The essence of GAD is to limit the scope of analysis during visualization and, consequently, reduce image noise. Empirical investigations involving occluded images have demonstrated that the regions identified through this methodology indeed play a pivotal role in facilitating class differentiation.

Submitted 15 May, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

arXiv:2312.07264 [pdf, other]

Dual Structure-Aware Image Filterings for Semi-supervised Medical Image Segmentation

Authors: Yuliang Gu, Zhichao Sun, Tian Chen, Xin Xiao, Yepeng Liu, Yongchao Xu, Laurent Najman

Abstract: Semi-supervised image segmentation has attracted great attention recently. The key is how to leverage unlabeled images in the training process. Most methods maintain consistent predictions of the unlabeled images under variations (e.g., adding noise/perturbations, or creating alternative versions) in the image and/or model level. In most image-level variations, medical images often have prior structure information, which has not been well explored. In this paper, we propose novel dual structure-aware image filterings (DSAIF) as the image-level variations for semi-supervised medical image segmentation. Motivated by connected filtering that simplifies an image via filtering in a structure-aware tree-based image representation, we resort to the dual contrast invariant Max-tree and Min-tree representation. Specifically, we propose a novel connected filtering that removes topologically equivalent nodes (i.e. connected components) having no siblings in the Max/Min-tree. This results in two filtered images preserving topologically critical structure. Applying the proposed DSAIF to mutually supervised networks decreases the consensus of their erroneous predictions on unlabeled images. This helps to alleviate the confirmation bias issue of overfitting to noisy pseudo labels of unlabeled images, and thus effectively improves the segmentation performance. Extensive experimental results on three benchmark datasets demonstrate that the proposed method significantly/consistently outperforms some state-of-the-art methods. The source codes will be publicly available.

Submitted 27 March, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2310.12590 [pdf, other]

PrivacyGAN: robust generative image privacy

Authors: Mariia Zameshina, Marlene Careil, Olivier Teytaud, Laurent Najman

Abstract: Classical techniques for protecting facial image privacy typically fall into two categories: data-poisoning methods, exemplified by Fawkes, which introduce subtle perturbations to images, or anonymization methods that generate images resembling the original only in several characteristics, such as gender, ethnicity, or facial expression. In this study, we introduce a novel approach, PrivacyGAN, that uses the power of image generation techniques, such as VQGAN and StyleGAN, to safeguard privacy while maintaining image usability, particularly for social media applications. Drawing inspiration from Fawkes, our method entails shifting the original image within the embedding space towards a decoy image. We evaluate our approach using privacy metrics on traditional and novel facial image datasets. Additionally, we propose new criteria for evaluating the robustness of privacy-protection methods against unknown image recognition techniques, and we demonstrate that our approach is effective even in unknown embedding transfer scenarios. We also provide a human evaluation that further proves that the modified image preserves its utility as it remains recognisable as an image of the same person by friends and family.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.12583 [pdf, other]

Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation

Authors: Mariia Zameshina, Olivier Teytaud, Laurent Najman

Abstract: Latent diffusion models excel at producing high-quality images from text. Yet, concerns appear about the lack of diversity in the generated imagery. To tackle this, we introduce Diverse Diffusion, a method for boosting image diversity beyond gender and ethnicity, spanning into richer realms, including color diversity. Diverse Diffusion is a general unsupervised technique that can be applied to existing text-to-image models. Our approach focuses on finding vectors in the Stable Diffusion latent space that are distant from each other. We generate multiple vectors in the latent space until we find a set of vectors that meets the desired distance requirements and the required batch size. To evaluate the effectiveness of our diversity methods, we conduct experiments examining various characteristics, including color diversity, LPIPS metric, and ethnicity/gender representation in images featuring humans. The results of our experiments emphasize the significance of diversity in generating realistic and varied images, offering valuable insights for improving text-to-image models. Through the enhancement of image diversity, our approach contributes to the creation of more inclusive and representative AI-generated art.

Submitted 19 October, 2023; originally announced October 2023.
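
The sampling step described in the Diverse Diffusion abstract above (draw latent vectors until a set satisfies a distance requirement and the batch size) might be sketched as a simple greedy rejection loop. Everything specific here is an assumption for illustration and not taken from the paper: the helper name `sample_diverse_latents`, standard Gaussian latents, the Euclidean distance, and the `max_tries` cap.

```python
import numpy as np


def sample_diverse_latents(batch_size, dim, min_distance, rng, max_tries=10_000):
    """Greedily collect latent vectors that are pairwise at least min_distance apart.

    Hypothetical helper: candidates are drawn from a standard normal and a
    candidate is kept only if it is far enough from every vector kept so far.
    """
    selected = []
    for _ in range(max_tries):
        candidate = rng.standard_normal(dim)
        if all(np.linalg.norm(candidate - kept) >= min_distance for kept in selected):
            selected.append(candidate)
            if len(selected) == batch_size:
                return np.stack(selected)
    raise RuntimeError("could not find enough sufficiently distant latent vectors")


rng = np.random.default_rng(0)
latents = sample_diverse_latents(batch_size=4, dim=16, min_distance=4.0, rng=rng)
print(latents.shape)
```

The returned array has one row per accepted latent vector; in a real pipeline each row would then be reshaped to the latent tensor expected by the text-to-image model, which this sketch does not attempt.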

arXiv:2310.02282 [pdf, other]

doi 10.1109/MT-ITS56129.2023.10241394

SWMLP: Shared Weight Multilayer Perceptron for Car Trajectory Speed Prediction using Road Topographical Features

Authors: Sarah Almeida Carneiro, Giovanni Chierchia, Jean Charléty, Aurélie Chataignon, Laurent Najman

Abstract: Although traffic is one of the massively collected data, it is often only available for specific regions. One concern is that, although there are studies that give good results for these data, the data from these regions may not be sufficiently representative to describe all the traffic patterns in the rest of the world. In quest of addressing this concern, we propose a speed prediction method that is independent of large historical speed data. To predict a vehicle's speed, we use the trajectory road topographical features to fit a Shared Weight Multilayer Perceptron learning model. Our results show significant improvement, both qualitative and quantitative, over standard regression analysis. Moreover, the proposed framework sheds new light on the way to design new approaches for traffic analysis.

Submitted 2 October, 2023; originally announced October 2023.

Journal ref: International Conference on Models and Technologies for Intelligent Transportation Systems, Jun 2023, Nice, France. pp.1-6

arXiv:2309.00018 [pdf, other]

Unsupervised discovery of Interpretable Visual Concepts

Authors: Caroline Mazini Rodrigues, Nicolas Boutry, Laurent Najman

Abstract: Providing interpretability of deep-learning models to non-experts, while fundamental for a responsible real-world usage, is challenging. Attribution maps from xAI techniques, such as Integrated Gradients, are a typical example of a visualization technique containing a high level of information, but with difficult interpretation. In this paper, we propose two methods, Maximum Activation Groups Extraction (MAGE) and Multiscale Interpretable Visualization (Ms-IV), to explain the model's decision, enhancing global interpretability. MAGE finds, for a given CNN, combinations of features which, globally, form a semantic meaning, that we call concepts. We group these similar feature patterns by clustering in "concepts", that we visualize through Ms-IV. This last method is inspired by Occlusion and Sensitivity analysis (incorporating causality), and uses a novel metric, called Class-aware Order Correlation (CaOC), to globally evaluate the most important image regions according to the model's decision space. We compare our approach to xAI methods such as LIME and Integrated Gradients. Experimental results show that Ms-IV achieves higher localization and faithfulness values. Finally, qualitative evaluation of combined MAGE and Ms-IV demonstrates humans' ability to agree, based on the visualization, with the decision of clusters' concepts; and, to detect, among a given set of networks, the existence of bias.

Submitted 21 November, 2023; v1 submitted 31 August, 2023; originally announced September 2023.

arXiv:2302.10283 [pdf, other]

Self-supervised learning of Split Invariant Equivariant representations

Authors: Quentin Garrido, Laurent Najman, Yann LeCun

Abstract: Recent progress has been made towards learning invariant or equivariant representations with self-supervised learning. While invariant methods are evaluated on large scale datasets, equivariant ones are evaluated in smaller, more controlled, settings. We aim at bridging the gap between the two in order to learn more diverse representations that are suitable for a wide range of tasks. We start by introducing a dataset called 3DIEBench, consisting of renderings from 3D models over 55 classes and more than 2.5 million images where we have full control on the transformations applied to the objects. We further introduce a predictor architecture based on hypernetworks to learn equivariant representations with no possible collapse to invariance. We introduce SIE (Split Invariant-Equivariant) which combines the hypernetwork-based predictor with representations split in two parts, one invariant, the other equivariant, to learn richer representations. We demonstrate significant performance gains over existing methods on equivariance related tasks from both a qualitative and quantitative point of view. We further analyze our introduced predictor and show how it steers the learned latent space. We hope that both our introduced dataset and approach will enable learning richer representations without supervision in more complex scenarios. Code and data are available at https://github.com/facebookresearch/SIE.

Submitted 19 June, 2023; v1 submitted 14 February, 2023; originally announced February 2023.

Journal ref: The Fortieth International Conference on Machine Learning, 2023, Honolulu, United States

arXiv:2301.03840 [pdf, other]

Discrete Morse Functions and Watersheds

Authors: Gilles Bertrand, Nicolas Boutry, Laurent Najman

Abstract: Any watershed, when defined on a stack on a normal pseudomanifold of dimension $d$, is a pure $(d-1)$-subcomplex that satisfies a drop-of-water principle. In this paper, we introduce Morse stacks, a class of functions that are equivalent to discrete Morse functions. We show that the watershed of a Morse stack on a normal pseudomanifold is uniquely defined, and can be obtained with a linear-time algorithm relying on a sequence of collapses. Last, we prove that such a watershed is the cut of the unique minimum spanning forest, rooted in the minima of the Morse stack, of the facet graph of the pseudomanifold.

Submitted 16 May, 2023; v1 submitted 10 January, 2023; originally announced January 2023.

arXiv:2210.03517 [pdf, other]

doi 10.1145/3520304.3528992

Fairness in generative modeling

Authors: Mariia Zameshina, Olivier Teytaud, Fabien Teytaud, Vlad Hosu, Nathanael Carraz, Laurent Najman, Markus Wagner

Abstract: We design general-purpose algorithms for addressing fairness issues and mode collapse in generative modeling. More precisely, to design fair algorithms for as many sensitive variables as possible, including variables we might not be aware of, we assume no prior knowledge of sensitive variables: our algorithms use unsupervised fairness only, meaning no information related to the sensitive variables is used for our fairness-improving methods. All images of faces (even generated ones) have been removed to mitigate legal risks.

Submitted 6 October, 2022; originally announced October 2022.

Journal ref: GECCO '22: Genetic and Evolutionary Computation Conference, Jul 2022, Boston, Massachusetts, United States. pp.320-323

arXiv:2210.02885 [pdf, other]

RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank

Authors: Quentin Garrido, Randall Balestriero, Laurent Najman, Yann LeCun

Abstract: Joint-Embedding Self Supervised Learning (JE-SSL) has seen a rapid development, with the emergence of many method variations but only a few principled guidelines that would help practitioners to successfully deploy them. The main reason for that pitfall comes from JE-SSL's core principle of not employing any input reconstruction, therefore lacking visual cues of unsuccessful training. Adding non-informative loss values to that, it becomes difficult to deploy SSL on a new dataset for which no labels can help to judge the quality of the learned representation. In this study, we develop a simple unsupervised criterion that is indicative of the quality of the learned JE-SSL representations: their effective rank. Albeit simple and computationally friendly, this method, coined RankMe, allows one to assess the performance of JE-SSL representations, even on different downstream datasets, without requiring any labels. A further benefit of RankMe is that it does not have any training or hyper-parameters to tune. Through thorough empirical experiments involving hundreds of training episodes, we demonstrate how RankMe can be used for hyperparameter selection with nearly no reduction in final performance compared to the current selection methods that involve a dataset's labels. We hope that RankMe will facilitate the deployment of JE-SSL towards domains that do not have the opportunity to rely on labels for representations' quality assessment.

Submitted 26 June, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

Journal ref: The Fortieth International Conference on Machine Learning, 2023, Honolulu, United States
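
The RankMe abstract above names the effective rank of the learned representations as its criterion. A minimal sketch of one standard definition of effective rank, the exponential of the Shannon entropy of the normalized singular values, is given below. Whether this matches the paper's exact formulation (including the small `eps` used here for numerical stability) is an assumption, and the function name `effective_rank` is chosen only for this example.

```python
import numpy as np


def effective_rank(embeddings, eps=1e-7):
    """Effective rank of an (n_samples, dim) embedding matrix.

    Computed as exp(entropy) of the singular values normalized to sum to one;
    eps (an assumed stabilizer) avoids log(0) for vanishing singular values.
    """
    singular_values = np.linalg.svd(np.asarray(embeddings, dtype=float), compute_uv=False)
    p = singular_values / (singular_values.sum() + eps) + eps
    return float(np.exp(-np.sum(p * np.log(p))))


rng = np.random.default_rng(0)
full = rng.standard_normal((256, 32))   # embeddings using all 32 dimensions
collapsed = np.ones((256, 32))          # every sample mapped to the same vector
print(effective_rank(full), effective_rank(collapsed))
```

The collapsed matrix has a single non-zero singular value, so its effective rank is close to 1, while the random matrix scores far higher (at most 32, the embedding dimension). No labels are involved at any point, which is the property the abstract emphasizes.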

arXiv:2206.05109 [pdf, other]

A Proof of the Tree of Shapes in n-D

Authors: Thierry Géraud, Nicolas Boutry, Sébastien Crozet, Edwin Carlinet, Laurent Najman

Abstract: In this paper, we prove that the self-dual morphological hierarchical structure computed on an n-D gray-level well-composed image u by the algorithm of Géraud et al. [1] is exactly the mathematical structure defined to be the tree of shapes of u in Najman et al. [2]. We recall that this algorithm is in quasi-linear time and thus considered to be optimal. The tree of shapes leads to many applications in mathematical morphology and in image processing like grain filtering, shapings, image segmentation, and so on.

Submitted 10 June, 2022; originally announced June 2022.

arXiv:2206.02574 [pdf, other]

doi 10.48550/arXiv.2206.02574

On the duality between contrastive and non-contrastive self-supervised learning

Authors: Quentin Garrido, Yubei Chen, Adrien Bardes, Laurent Najman, Yann LeCun

Abstract: Recent approaches in self-supervised learning of image representations can be categorized into different families of methods and, in particular, can be divided into contrastive and non-contrastive approaches. While differences between the two families have been thoroughly discussed to motivate new approaches, we focus more on the theoretical similarities between them. By designing contrastive and covariance based non-contrastive criteria that can be related algebraically and shown to be equivalent under limited assumptions, we show how close those families can be. We further study popular methods and introduce variations of them, allowing us to relate this theoretical result to current practices and show the influence (or lack thereof) of design choices on downstream performance. Motivated by our equivalence result, we investigate the low performance of SimCLR and show how it can match VICReg's with careful hyperparameter tuning, improving significantly over known baselines. We also challenge the popular assumption that non-contrastive methods need large output dimensions. Our theoretical and quantitative results suggest that the numerical gaps between contrastive and non-contrastive methods in certain regimes can be closed given better network design choices and hyperparameter tuning. The evidence shows that unifying different SOTA methods is an important direction to build a better understanding of self-supervised learning.

Submitted 26 June, 2023; v1 submitted 3 June, 2022; originally announced June 2022.

Comments: The Eleventh International Conference on Learning Representations, 2023, Kigali, Rwanda

arXiv:2205.12546 [pdf, other]

Some equivalence relation between persistent homology and morphological dynamics

Authors: Nicolas Boutry, Laurent Najman, Thierry Géraud

Abstract: In Mathematical Morphology (MM), connected filters based on dynamics are used to filter the extrema of an image. Similarly, persistence is a concept coming from Persistent Homology (PH) and Morse Theory (MT) that represents the stability of the extrema of a Morse function. Since these two concepts seem to be closely related, in this paper we examine their relationship, and we prove that they are equal on n-D Morse functions, n $\ge$ 1. More exactly, pairing a minimum with a 1-saddle by dynamics or pairing the same 1-saddle with a minimum by persistence leads exactly to the same pairing, assuming that the critical values of the studied Morse function are unique. This result is a step further to show how much topological data analysis and mathematical morphology are related, paving the way for a more in-depth study of the relations between these two research fields.

Submitted 25 May, 2022; originally announced May 2022.

Comments: Journal of Mathematical Imaging and Vision, Springer Verlag, In press

arXiv:2204.04969 [pdf, other]

Assessing hierarchies by their consistent segmentations

Authors: Zeev Gutman, Ritvik Vij, Laurent Najman, Michael Lindenbaum

Abstract: Current approaches to generic segmentation start by creating a hierarchy of nested image partitions and then specifying a segmentation from it. Our first contribution is to describe several ways, most of them new, for specifying segmentations using the hierarchy elements. Then, we consider the best hierarchy-induced segmentation specified by a limited number of hierarchy elements. We focus on a common quality measure for binary segmentations, the Jaccard index (also known as IoU). Optimizing the Jaccard index is highly non-trivial, and yet we propose an efficient approach for doing exactly that. This way we get algorithm-independent upper bounds on the quality of any segmentation created from the hierarchy. We found that the obtainable segmentation quality varies significantly depending on the way that the segments are specified by the hierarchy elements, and that representing a segmentation with only a few hierarchy elements is often possible. (Code is available).

Submitted 7 December, 2023; v1 submitted 11 April, 2022; originally announced April 2022.
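
As a small illustration of the quality measure named in the abstract above, the Jaccard index (IoU) of two binary segmentations can be sketched as follows. This is only a sketch under stated assumptions: the function name `jaccard_index`, the representation of a segment as a set of pixel coordinates, and the convention chosen for two empty segments are illustrative choices made here, not taken from the paper. The paper's optimization over hierarchy elements is not shown.

```python
def jaccard_index(segment, ground_truth):
    """Jaccard index (IoU) of two binary segments.

    Each segment is given as an iterable of pixel coordinates (an
    assumed representation, chosen here for illustration only).
    """
    a, b = set(segment), set(ground_truth)
    union = a | b
    if not union:
        # Both segments empty: treated as perfect agreement (assumed convention).
        return 1.0
    return len(a & b) / len(union)


# Hypothetical toy data: 2 shared pixels out of 5 distinct pixels overall.
seg = {(0, 0), (0, 1), (1, 1)}
gt = {(0, 1), (1, 1), (1, 2), (2, 2)}
score = jaccard_index(seg, gt)
print(score)
```

The index is the size of the intersection divided by the size of the union, so it ranges from 0 (disjoint segments) to 1 (identical segments).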

arXiv:2203.11512 [pdf, other]

Gradient Vector Fields of Discrete Morse Functions and Watershed-cuts

Authors: Nicolas Boutry, Gilles Bertrand, Laurent Najman

Abstract: In this paper, we study a class of discrete Morse functions, coming from Discrete Morse Theory, that are equivalent to a class of simplicial stacks, coming from Mathematical Morphology. We show that, as in Discrete Morse Theory, we can see the gradient vector field of a simplicial stack (seen as a discrete Morse function) as the only relevant information we should consider. Last but not least, we also show that the Minimum Spanning Forest of the dual graph of a simplicial stack is induced by the gradient vector field of the initial function. This result allows computing a watershed-cut from a gradient vector field.

Submitted 5 October, 2022; v1 submitted 22 March, 2022; originally announced March 2022.

Journal ref: 2nd International Conference on Discrete Geometry and Mathematical Morphology (DGMM 2022)

arXiv:2103.09384 [pdf, other]

doi 10.1109/TGRS.2021.3113721

Triplet-Watershed for Hyperspectral Image Classification

Authors: Aditya Challa, Sravan Danda, B. S. Daya Sagar, Laurent Najman

Abstract: Hyperspectral images (HSI) consist of rich spatial and spectral information, which can potentially be used for several applications. However, noise, band correlations and high dimensionality restrict the applicability of such data. This has recently been addressed using creative deep learning network architectures such as ResNet, SSRN, and A2S2K. However, the last layer, i.e., the classification layer, remains unchanged and is taken to be the softmax classifier. In this article, we propose to use a watershed classifier. The watershed classifier extends the watershed operator from Mathematical Morphology for classification. In its vanilla form, the watershed classifier does not have any trainable parameters. In this article, we propose a novel approach to train deep learning networks to obtain representations suitable for the watershed classifier. The watershed classifier exploits the connectivity patterns, a characteristic of HSI datasets, for better inference. We show that exploiting such characteristics allows the Triplet-Watershed to achieve state-of-the-art results in supervised and semi-supervised contexts. These results are validated on the Indian Pines (IP), University of Pavia (UP), Kennedy Space Center (KSC) and University of Houston (UH) datasets, relying on a simple convnet architecture using a quarter of the parameters of previous state-of-the-art networks. The source code for reproducing the experiments and supplementary material (high resolution images) is available at https://github.com/ac20/TripletWatershed Code.

Submitted 5 September, 2021; v1 submitted 16 March, 2021; originally announced March 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Journal ref: IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-14, 2022

arXiv:2102.05892 [pdf, other]

Visualizing hierarchies in scRNA-seq data using a density tree-biased autoencoder

Authors: Quentin Garrido, Sebastian Damrich, Alexander Jäger, Dario Cerletti, Manfred Claassen, Laurent Najman, Fred Hamprecht

Abstract: Motivation: Single cell RNA sequencing (scRNA-seq) data makes studying the development of cells possible at unparalleled resolution. Given that many cellular differentiation processes are hierarchical, their scRNA-seq data is expected to be approximately tree-shaped in gene expression space. Inference and representation of this tree-structure in two dimensions is highly desirable for biological interpretation and exploratory analysis. Results: Our two contributions are an approach for identifying a meaningful tree structure from high-dimensional scRNA-seq data, and a visualization method respecting the tree-structure. We extract the tree structure by means of a density based minimum spanning tree on a vector quantization of the data and show that it captures biological information well. We then introduce DTAE, a tree-biased autoencoder that emphasizes the tree structure of the data in low dimensional space. We compare to other dimension reduction methods and demonstrate the success of our method both qualitatively and quantitatively on real and toy data. Availability: Our implementation relying on PyTorch and Higra is available at https://github.com/hci-unihd/DTAE.

Submitted 22 April, 2022; v1 submitted 11 February, 2021; originally announced February 2021.

Journal ref: Bioinformatics, Oxford University Press (OUP), In press

arXiv:2101.04378 [pdf, other]

doi 10.1016/j.patcog.2022.108882

Rethinking Interactive Image Segmentation: Feature Space Annotation

Authors: Jordão Bragantini, Alexandre X Falcão, Laurent Najman

Abstract: Despite the progress of interactive image segmentation methods, high-quality pixel-level annotation is still time-consuming and laborious - a bottleneck for several deep learning applications. We take a step back to propose interactive and simultaneous segment annotation from multiple images guided by feature space projection. This strategy is in stark contrast to existing interactive segmentation methodologies, which perform annotation in the image domain. We show that feature space annotation achieves competitive results with state-of-the-art methods in foreground segmentation datasets: iCoSeg, DAVIS, and Rooftop. Moreover, in the semantic segmentation context, it achieves 91.5% accuracy in the Cityscapes dataset, being 74.75 times faster than the original annotation procedure. Further, our contribution sheds light on a novel direction for interactive image annotation that can be integrated with existing methodologies. The supplementary material presents video demonstrations. Code available at https://github.com/LIDS-UNICAMP/rethinking-interactive-image-segmentation.

Submitted 11 July, 2022; v1 submitted 12 January, 2021; originally announced January 2021.

Journal ref: Pattern Recognition, Elsevier, In press

arXiv:1710.04476 [pdf, other]

doi 10.1109/MSP.2009.934154

VOIDD: automatic vessel of intervention dynamic detection in PCI procedures

Authors: Ketan Bacchuwar, Jean Cousty, Régis Vaillant, Laurent Najman

Abstract: In this article, we present work towards improving the overall workflow of Percutaneous Coronary Intervention (PCI) procedures by enabling the imaging instruments to precisely monitor the steps of the procedure. In the long term, such capabilities can be used to optimize the image acquisition to reduce the amount of dose or contrast media employed during the procedure. We present the automatic VOIDD algorithm to detect the vessel of intervention which is going to be treated during the procedure by combining information from the vessel image with contrast agent injection and images acquired during guidewire tip navigation. Due to the robust guidewire tip segmentation method, this algorithm is also able to automatically detect the sequence corresponding to guidewire navigation. We present an evaluation methodology which characterizes the correctness of the guidewire tip detection and the correct identification of the vessel navigated during the procedure. On a dataset of 2213 images from 8 sequences of 4 patients, VOIDD identifies the vessel of intervention with an accuracy of 88% or above and the absence of the tip with an accuracy of 98% or above, depending on the test case.

Submitted 12 October, 2017; originally announced October 2017.

Journal ref: CVII-Stent Workshop MICCAI 2017, Sep 2017, Quebec City, Canada. 26 (6), pp.136 - 157, 2009

arXiv:1603.04838 [pdf, other]

Hierarchical image simplification and segmentation based on Mumford-Shah-salient level line selection

Authors: Yongchao Xu, Thierry Géraud, Laurent Najman

Abstract: Hierarchies, such as the tree of shapes, are popular representations for image simplification and segmentation thanks to their multiscale structures. Selecting meaningful level lines (boundaries of shapes) makes it possible to simplify the image while keeping salient structures intact. Many image simplification and segmentation methods are driven by the optimization of an energy functional, for instance the celebrated Mumford-Shah functional. In this paper, we propose an efficient approach to hierarchical image simplification and segmentation based on the minimization of the piecewise-constant Mumford-Shah functional. This method conforms to the current trend that consists in producing hierarchical results rather than a unique partition. Contrary to classical approaches which compute optimal hierarchical segmentations from an input hierarchy of segmentations, we rely on the tree of shapes, a unique and well-defined representation equivalent to the image. Simply put, we compute for each level line of the image an attribute function that characterizes its persistence under the energy minimization. Then we stack the level lines from meaningless ones to salient ones through a saliency map based on extinction values defined on the tree-based shape space. Qualitative illustrations and quantitative evaluation on the Weizmann segmentation evaluation database demonstrate the state-of-the-art performance of our method.

Submitted 17 May, 2016; v1 submitted 15 March, 2016; originally announced March 2016.

Comments: Pattern Recognition Letters, Elsevier, 2016

arXiv:1505.07203 [pdf, other]

doi 10.1007/978-3-319-18720-4_18

New characterizations of minimum spanning trees and of saliency maps based on quasi-flat zones

Authors: Jean Cousty, Laurent Najman, Yukiko Kenmochi, Silvio Guimarães

Abstract: We study three representations of hierarchies of partitions: dendrograms (direct representations), saliency maps, and minimum spanning trees. We provide a new bijection between saliency maps and hierarchies based on quasi-flat zones as used in image processing and characterize saliency maps and minimum spanning trees as solutions to constrained minimization problems where the constraint is quasi-flat zones preservation. In practice, these results form a toolkit for new hierarchical methods where one can choose the most convenient representation. They also invite us to process non-image data with morphological hierarchies.

Submitted 27 May, 2015; originally announced May 2015.

Journal ref: 12th International Symposium on Mathematical Morphology (ISMM), May 2015, Reykjavik, Iceland. Lecture Notes in Computer Science (LNCS), 9082, pp.205-216, Mathematical Morphology and Its Applications to Signal and Image Processing
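
To make the notion of quasi-flat zones from the abstract above concrete, here is a minimal sketch that uses the standard definition: at a threshold λ, two vertices of an edge-weighted graph lie in the same zone when they are linked by a path whose edges all have weight at most λ. The function name `quasi_flat_zones`, the edge-list input format, and the toy graph are assumptions made for illustration. The sketch does not reproduce the paper's bijection or its minimization results.

```python
def quasi_flat_zones(num_vertices, weighted_edges, threshold):
    """Partition vertices 0..num_vertices-1 into quasi-flat zones.

    weighted_edges is a list of (u, v, weight) triples (assumed format).
    Two vertices share a zone when a path of edges with
    weight <= threshold connects them.
    """
    parent = list(range(num_vertices))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, w in weighted_edges:
        if w <= threshold:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[rv] = ru

    zones = {}
    for v in range(num_vertices):
        zones.setdefault(find(v), []).append(v)
    return sorted(zones.values())


# Hypothetical path graph 0-1-2-3-4 with edge weights 1, 3, 1, 5.
edges = [(0, 1, 1), (1, 2, 3), (2, 3, 1), (3, 4, 5)]
levels = {lam: quasi_flat_zones(5, edges, lam) for lam in (0, 1, 3, 5)}
for lam, zones in levels.items():
    print(lam, zones)
```

Raising the threshold only ever merges zones, so the partitions obtained for increasing λ are nested and together form a hierarchy.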

arXiv:1404.7748 [pdf, other]

doi 10.1016/j.patrec.2014.05.007

A graph-based mathematical morphology reader

Authors: Laurent Najman, Jean Cousty

Abstract: This survey paper aims at providing a "literary" anthology of mathematical morphology on graphs. It describes in the English language many ideas stemming from a large number of different papers, hence providing a unified view of an active and diverse field of research.

Submitted 30 April, 2014; originally announced April 2014.

Journal ref: Pattern Recognition Letters 47 (2014) 3-17

arXiv:1401.5602 [pdf, ps, other]

Dimensional operators for mathematical morphology on simplicial complexes

Authors: Fabio Dias, Jean Cousty, Laurent Najman

Abstract: In this work we study the framework of mathematical morphology on simplicial complex spaces. Simplicial complexes are widely used to represent multidimensional data, such as meshes, which are two-dimensional complexes, or graphs, which can be interpreted as one-dimensional complexes. Mathematical morphology is one of the most powerful frameworks for image processing, including the processing of digital structures, and is heavily used for many applications. However, mathematical morphology operators on simplicial complex spaces are not a fully developed concept in the literature. Specifically, we explore properties of the dimensional operators, small, versatile operators that can be used to define new operators on simplicial complexes, while maintaining properties from mathematical morphology. These operators can also be used to recover many morphological operators from the literature. Matlab code and additional material, including the proofs of the original properties, are freely available at https://code.google.com/p/math-morpho-simplicial-complexes.

Submitted 22 January, 2014; originally announced January 2014.

Comments: Pattern Recognition Letters (2014) To appear

arXiv:1301.3572 [pdf, other]

Indoor Semantic Segmentation using depth information

Authors: Camille Couprie, Clément Farabet, Laurent Najman, Yann LeCun

Abstract: This work addresses multi-class segmentation of indoor scenes with RGB-D inputs. While this area of research has gained much attention recently, most works still rely on hand-crafted features. In contrast, we apply a multiscale convolutional network to learn features directly from the images and the depth information. We obtain state-of-the-art results on the NYU-v2 depth dataset with an accuracy of 64.5%. We illustrate the labeling of indoor scenes in video sequences that could be processed in real-time using appropriate hardware such as an FPGA.

Submitted 14 March, 2013; v1 submitted 15 January, 2013; originally announced January 2013.

Comments: 8 pages, 3 figures

arXiv:1209.4233 [pdf, other]

doi 10.1007/978-3-642-32313-3_10

Writing Reusable Digital Geometry Algorithms in a Generic Image Processing Framework

Authors: Roland Levillain, Thierry Géraud, Laurent Najman

Abstract: Digital Geometry software should reflect the generality of the underlying mathematics: mapping the latter to the former requires genericity. By designing generic solutions, one can effectively reuse digital geometry data structures and algorithms. We propose an image processing framework focused on the Generic Programming paradigm in which an algorithm on paper can be turned into a single code, written once and usable with various input types. This approach enables users to design and implement new methods at a lower cost, try cross-domain experiments and help generalize results.

Submitted 18 September, 2012; originally announced September 2012.

Comments: Workshop on Applications of Discrete Geometry and Mathematical Morphology, Istanbul : Turkey (2010)

arXiv:1209.3925 [pdf, other]

doi 10.1007/978-3-642-32313-3_4

On morphological hierarchical representations for image processing and spatial data clustering

Authors: Pierre Soille, Laurent Najman

Abstract: Hierarchical data representations in the context of classification and data clustering were put forward during the fifties. Recently, hierarchical image representations have gained renewed interest for segmentation purposes. In this paper, we briefly survey fundamental results on hierarchical clustering and then detail recent paradigms developed for the hierarchical representation of images in the framework of mathematical morphology: constrained connectivity and ultrametric watersheds. Constrained connectivity can be viewed as a way to constrain an initial hierarchy in such a way that a set of desired constraints are satisfied. The framework of ultrametric watersheds provides a generic scheme for computing any hierarchical connected clustering, in particular when such a hierarchy is constrained. The suitability of this framework for solving practical problems is illustrated with applications in remote sensing.

Submitted 18 September, 2012; originally announced September 2012.

Journal ref: Workshop on APPLICATIONS OF DISCRETE GEOMETRY AND MATHEMATICAL MORPHOLOGY, Istanbul : Turkey (2010)

arXiv:1206.2807 [pdf, other]

An efficient hierarchical graph based image segmentation

Authors: Silvio Jamil F. Guimarães, Jean Cousty, Yukiko Kenmochi, Laurent Najman

Abstract: Hierarchical image segmentation provides a region-oriented scale-space, i.e., a set of image segmentations at different detail levels in which the segmentations at finer levels are nested with respect to those at coarser levels. Most image segmentation algorithms, such as region merging algorithms, rely on a criterion for merging that does not lead to a hierarchy, and for which the tuning of the parameters can be difficult. In this work, we propose a hierarchical graph based image segmentation relying on a criterion popularized by Felzenszwalb and Huttenlocher. We illustrate with both real and synthetic images, showing efficiency, ease of use, and robustness of our method.

Submitted 13 June, 2012; originally announced June 2012.

arXiv:1204.4758 [pdf, other]

Morphological Filtering in Shape Spaces: Applications using Tree-Based Image Representations

Authors: Yongchao Xu, Thierry Géraud, Laurent Najman

Abstract: Connected operators are filtering tools that act by merging elementary regions of an image. A popular strategy is based on tree-based image representations: for example, one can compute an attribute on each node of the tree and keep only the nodes for which the attribute is sufficiently strong. This operation can be seen as a thresholding of the tree, seen as a graph whose nodes are weighted by the attribute. Rather than being satisfied with a mere thresholding, we propose to expand on this idea, and to apply connected filters on this latter graph. Consequently, the filtering is done not in the space of the image, but on the space of shapes built from the image. Such a processing is a generalization of the existing tree-based connected operators. Indeed, the framework includes classical existing connected operators by attributes. It also allows us to propose a class of novel connected operators from the leveling family, based on shape attributes. Finally, we also propose a novel class of self-dual connected operators that we call morphological shapings.

Submitted 16 July, 2012; v1 submitted 20 April, 2012; originally announced April 2012.

Comments: 4 pages, will appear in 21st International Conference on Pattern Recognition (ICPR 2012)

arXiv:1202.2160 [pdf, other]

Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers

Authors: Clément Farabet, Camille Couprie, Laurent Najman, Yann LeCun

Abstract: Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image. The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. Simultaneously, a set of dense feature vectors is computed which encodes regions of multiple sizes centered on each pixel. The feature extractor is a multiscale convolutional network trained from raw pixels. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image is then selected so as to maximize the average "purity" of the class distributions, hence maximizing the overall likelihood that each segment will contain a single object. The convolutional network feature extractor is trained end-to-end from raw pixels, alleviating the need for engineered features. After training, the system is parameter free. The system yields record accuracies on the Stanford Background Dataset (8 classes), the Sift Flow Dataset (33 classes) and the Barcelona Dataset (170 classes) while being an order of magnitude faster than competing approaches, producing a $320 \times 240$ image labeling in less than 1 second.

Submitted 13 July, 2012; v1 submitted 9 February, 2012; originally announced February 2012.

Comments: 9 pages, 4 figures - Published in 29th International Conference on Machine Learning (ICML 2012), Jun 2012, Edinburgh, United Kingdom

arXiv:1010.2733 [pdf, ps, other]

doi 10.1137/100799186

Combinatorial Continuous Maximal Flows

Authors: Camille Couprie, Leo Grady, Hugues Talbot, Laurent Najman

Abstract: Maximum flow (and minimum cut) algorithms have had a strong impact on computer vision. In particular, graph cuts algorithms provide a mechanism for the discrete optimization of an energy functional which has been used in a variety of applications such as image segmentation, stereo, image stitching and texture synthesis. Algorithms based on the classical formulation of max-flow defined on a graph are known to exhibit metrication artefacts in the solution. Therefore, a recent trend has been to instead employ a spatially continuous maximum flow (or the dual min-cut problem) in these same applications to produce solutions with no metrication errors. However, known fast continuous max-flow algorithms have no stopping criteria or have not been proved to converge. In this work, we revisit the continuous max-flow problem and show that the analogous discrete formulation is different from the classical max-flow problem. We then apply an appropriate combinatorial optimization technique to this combinatorial continuous max-flow (CCMF) problem to find a null-divergence solution that exhibits no metrication artefacts and may be solved exactly by a fast, efficient algorithm with provable convergence. Finally, by exhibiting the dual problem of our CCMF formulation, we clarify the fact, already proved by Nozawa in the continuous setting, that the max-flow and the total variation problems are not always equivalent.

Submitted 28 December, 2011; v1 submitted 13 October, 2010; originally announced October 2010.

Comments: 26 pages

Journal ref: SIAM Journal on Imaging Sciences 4 (2011) 905-930

arXiv:1002.1887 [pdf, ps, other]

doi 10.1007/s10851-011-0259-1

On the equivalence between hierarchical segmentations and ultrametric watersheds

Authors: Laurent Najman

Abstract: We study hierarchical segmentation in the framework of edge-weighted graphs. We define ultrametric watersheds as topological watersheds null on the minima. We prove that there exists a bijection between the set of ultrametric watersheds and the set of hierarchical segmentations. We end this paper by showing how to use the proposed framework in practice in the example of constrained connectivity; in particular it allows one to compute such a hierarchy following a classical watershed-based morphological scheme, which provides an efficient algorithm to compute the whole hierarchy.

Submitted 17 December, 2010; v1 submitted 9 February, 2010; originally announced February 2010.

Comments: 19 pages, double-column

Journal ref: Journal of Mathematical Imaging and Vision 40, 3 (2011) 231-247

Showing 1–38 of 38 results for author: Najman, L