-
Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data
Authors:
Tim Elsner,
Paula Usinger,
Victor Czech,
Gregor Kobsik,
Yanjiang He,
Isaak Lim,
Leif Kobbelt
Abstract:
In quantised autoencoders, images are usually split into local patches, each encoded by one token. This representation is redundant in the sense that the same number of tokens is spent per region, regardless of the visual information content in that region. Adaptive discretisation schemes like quadtrees are applied to allocate tokens for patches with varying sizes, but this just varies the region of influence for a token which nevertheless remains a local descriptor. Modern architectures add an attention mechanism to the autoencoder which infuses some degree of global information into the local tokens. Despite the global context, tokens are still associated with a local image region. In contrast, our method is inspired by spectral decompositions which transform an input signal into a superposition of global frequencies. Taking the data-driven perspective, we learn custom basis functions corresponding to the codebook entries in our VQ-VAE setup. Furthermore, a decoder combines these basis functions in a non-linear fashion, going beyond the simple linear superposition of spectral decompositions. We can achieve this global description with an efficient transpose operation between features and channels and demonstrate our performance on compression.
Submitted 16 July, 2024;
originally announced July 2024.
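The abstract above refers to codebook entries in a VQ-VAE setup. As a rough illustration of the generic codebook lookup that such quantisers rely on (this is not the paper's global scheme, and the codebook and feature values are made up for the example), a nearest-entry quantiser could be sketched as:

```python
import math

def quantise(vector, codebook):
    """Return (index, entry) of the codebook entry closest to `vector`."""
    best_index = min(range(len(codebook)),
                     key=lambda i: math.dist(vector, codebook[i]))
    return best_index, codebook[best_index]

# Hypothetical 2-D codebook with three entries (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]

# Hypothetical encoder outputs; each one is replaced by a single token index.
features = [(0.9, 1.2), (-0.8, 1.7), (0.1, -0.2)]
tokens = [quantise(f, codebook)[0] for f in features]
print(tokens)  # [1, 2, 0]
```

In a patch-based autoencoder each such token describes one local region; the abstract's point is that its tokens instead describe the image globally.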
-
Semi-Supervised Domain Adaptation Using Target-Oriented Domain Augmentation for 3D Object Detection
Authors:
Yecheol Kim,
Junho Lee,
Changsoo Park,
Hyoung won Kim,
Inho Lim,
Christopher Chang,
Jun Won Choi
Abstract:
3D object detection is crucial for applications like autonomous driving and robotics. However, in real-world environments, variations in sensor data distribution due to sensor upgrades, weather changes, and geographic differences can adversely affect detection performance. Semi-Supervised Domain Adaptation (SSDA) aims to mitigate these challenges by transferring knowledge from a source domain, abundant in labeled data, to a target domain where labels are scarce. This paper presents a new SSDA method referred to as Target-Oriented Domain Augmentation (TODA) specifically tailored for LiDAR-based 3D object detection. TODA efficiently utilizes all available data, including labeled data in the source domain, and both labeled data and unlabeled data in the target domain to enhance domain adaptation performance. TODA consists of two stages: TargetMix and AdvMix. TargetMix employs mixing augmentation accounting for LiDAR sensor characteristics to facilitate feature alignment between the source-domain and target-domain. AdvMix applies point-wise adversarial augmentation with mixing augmentation, which perturbs the unlabeled data to align the features within both labeled and unlabeled data in the target domain. Our experiments conducted on the challenging domain adaptation tasks demonstrate that TODA outperforms existing domain adaptation techniques designed for 3D object detection by significant margins. The code is available at: https://github.com/rasd3/TODA.
Submitted 17 June, 2024;
originally announced June 2024.
-
Quantitative Characterization of Retinal Features in Translated OCTA
Authors:
Rashadul Hasan Badhon,
Atalie Carina Thompson,
Jennifer I. Lim,
Theodore Leng,
Minhaj Nur Alam
Abstract:
Purpose: This study explores the feasibility of using generative machine learning (ML) to translate Optical Coherence Tomography (OCT) images into Optical Coherence Tomography Angiography (OCTA) images, potentially bypassing the need for specialized OCTA hardware. Methods: The method involved implementing a generative adversarial network framework that includes a 2D vascular segmentation model and a 2D OCTA image translation model. The study utilizes a public dataset of 500 patients, divided into subsets based on resolution and disease status, to validate the quality of TR-OCTA images. The validation employs several quality and quantitative metrics to compare the translated images with ground truth OCTAs (GT-OCTA). We then quantitatively characterize vascular features generated in TR-OCTAs with GT-OCTAs to assess the feasibility of using TR-OCTA for objective disease diagnosis. Result: TR-OCTAs showed high image quality in both 3 and 6 mm datasets (high-resolution, moderate structural similarity and contrast quality compared to GT-OCTAs). There were slight discrepancies in vascular metrics, especially in diseased patients. Blood vessel features like tortuosity and vessel perimeter index showed a better trend compared to density features which are affected by local vascular distortions. Conclusion: This study presents a promising solution to the limitations of OCTA adoption in clinical practice by using vascular features from TR-OCTA for disease detection. Translation relevance: This study has the potential to significantly enhance the diagnostic process for retinal diseases by making detailed vascular imaging more widely available and reducing dependency on costly OCTA equipment.
Submitted 24 April, 2024;
originally announced April 2024.
-
Partial Symmetry Detection for 3D Geometry using Contrastive Learning with Geodesic Point Cloud Patches
Authors:
Gregor Kobsik,
Isaak Lim,
Leif Kobbelt
Abstract:
Symmetry detection, especially partial and extrinsic symmetry, is essential for various downstream tasks, like 3D geometry completion, segmentation, compression and structure-aware shape encoding or generation. In order to detect partial extrinsic symmetries, we propose to learn rotation, reflection, translation and scale invariant local shape features for geodesic point cloud patches via contrastive learning, which are robust across multiple classes and generalize over different datasets. We show that our approach is able to extract multiple valid solutions for this ambiguous problem. Furthermore, we introduce a novel benchmark test for partial extrinsic symmetry detection to evaluate our method. Lastly, we incorporate the detected symmetries together with a region growing algorithm to demonstrate a downstream task with the goal of computing symmetry-aware partitions of 3D shapes. To our knowledge, we are the first to propose a self-supervised data-driven method for partial extrinsic symmetry detection.
Submitted 13 December, 2023;
originally announced December 2023.
-
Adaptive Voronoi NeRFs
Authors:
Tim Elsner,
Victor Czech,
Julia Berger,
Zain Selman,
Isaak Lim,
Leif Kobbelt
Abstract:
Neural Radiance Fields (NeRFs) learn to represent a 3D scene from just a set of registered images. Increasing the size of a scene demands more complex functions, typically represented by neural networks, to capture all details. Training and inference then involve querying the neural network millions of times per image, which becomes impractically slow. Since such complex functions can be replaced by multiple simpler functions to improve speed, we show that a hierarchy of Voronoi diagrams is a suitable choice to partition the scene. By equipping each Voronoi cell with its own NeRF, our approach is able to quickly learn a scene representation. We propose an intuitive partitioning of the space that increases quality gains during training by distributing information evenly among the networks and avoids artifacts through a top-down adaptive refinement. Our framework is agnostic to the underlying NeRF method and easy to implement, which allows it to be applied to various NeRF variants for improved learning and rendering speeds.
Submitted 30 March, 2023; v1 submitted 28 March, 2023;
originally announced March 2023.
-
Localized Latent Updates for Fine-Tuning Vision-Language Models
Authors:
Moritz Ibing,
Isaak Lim,
Leif Kobbelt
Abstract:
Although massive pre-trained vision-language models like CLIP show impressive generalization capabilities for many tasks, it often remains necessary to fine-tune them for improved performance on specific datasets. When doing so, it is desirable that updating the model is fast and that the model does not lose its capabilities on data outside of the dataset, as is often the case with classical fine-tuning approaches. In this work we suggest a lightweight adapter that only updates the model's predictions close to seen datapoints. We demonstrate the effectiveness and speed of this relatively simple approach in the context of few-shot learning, where our results both on classes seen and unseen during training are comparable with or improve on the state of the art.
Submitted 13 December, 2022;
originally announced December 2022.
-
Contrastive learning-based pretraining improves representation and transferability of diabetic retinopathy classification models
Authors:
Minhaj Nur Alam,
Rikiya Yamashita,
Vignav Ramesh,
Tejas Prabhune,
Jennifer I. Lim,
R. V. P. Chan,
Joelle Hallak,
Theodore Leng,
Daniel Rubin
Abstract:
Self-supervised contrastive learning (CL) based pretraining allows development of robust and generalized deep learning models with small, labeled datasets, reducing the burden of label generation. This paper aims to evaluate the effect of CL-based pretraining on the performance of referable vs. non-referable diabetic retinopathy (DR) classification. We have developed a CL-based framework with neural style transfer (NST) augmentation to produce models with better representations and initializations for the detection of DR in color fundus images. We compare our CL-pretrained model performance with two state-of-the-art baseline models pretrained with ImageNet weights. We further investigate the model performance with reduced labeled training data (down to 10 percent) to test the robustness of the model when trained with small, labeled datasets. The model is trained and validated on the EyePACS dataset and tested independently on clinical data from the University of Illinois, Chicago (UIC). Compared to baseline models, our CL-pretrained FundusNet model had higher AUC (CI) values (0.91 (0.898 to 0.930) vs 0.80 (0.783 to 0.820) and 0.83 (0.801 to 0.853) on UIC data). At 10 percent labeled training data, the FundusNet AUC was 0.81 (0.78 to 0.84) vs 0.58 (0.56 to 0.64) and 0.63 (0.60 to 0.66) in baseline models, when tested on the UIC dataset. CL-based pretraining with NST significantly improves DL classification performance, helps the model generalize well (transferable from EyePACS to UIC data), and allows training with small, annotated datasets, thereby reducing the ground-truth annotation burden on clinicians.
Submitted 24 August, 2022;
originally announced August 2022.
-
A Synergy of Institutional Incentives and Networked Structures in Evolutionary Game Dynamics of Multi-agent Systems
Authors:
Ik Soo Lim,
Valerio Capraro
Abstract:
Understanding the emergence of prosocial behaviours (e.g., cooperation and trust) among self-interested agents is an important problem in many disciplines. Network structure and institutional incentives (e.g., punishing antisocial agents) are known to promote prosocial behaviours, when acting in isolation, one mechanism being present at a time. Here we study the interplay between these two mechanisms to see whether they are independent, interfering or synergetic. Using evolutionary game theory, we show that punishing antisocial agents and a regular networked structure not only promote prosocial behaviours among agents playing the trust game, but they also interplay with each other, leading to interference or synergy, depending on the game parameters. Synergy emerges on a wider range of parameters than interference does. In this domain, the combination of incentives and networked structure improves the efficiency of incentives, yielding prosocial behaviours at a lower cost than the incentive does alone. This has a significant implication in the promotion of prosocial behaviours in multi-agent systems.
Submitted 27 March, 2022; v1 submitted 6 December, 2021;
originally announced December 2021.
-
3D Shape Generation with Grid-based Implicit Functions
Authors:
Moritz Ibing,
Isaak Lim,
Leif Kobbelt
Abstract:
Previous approaches to generate shapes in a 3D setting train a GAN on the latent space of an autoencoder (AE). Even though this produces convincing results, it has two major shortcomings. As the GAN is limited to reproducing the dataset the AE was trained on, we cannot reuse a trained AE for novel data. Furthermore, it is difficult to add spatial supervision into the generation process, as the AE only gives us a global representation. To remedy these issues, we propose to train the GAN on grids (i.e. each cell covers a part of a shape). In this representation each cell is equipped with a latent vector provided by an AE. This localized representation enables more expressiveness (since the cell-based latent vectors can be combined in novel ways) as well as spatial control of the generation process (e.g. via bounding boxes). Our method outperforms the current state of the art on all established evaluation measures proposed for quantitatively evaluating the generative capabilities of GANs. We show limitations of these measures and propose the adaptation of a robust criterion from statistical analysis as an alternative.
Submitted 22 July, 2021;
originally announced July 2021.
-
A Convolutional Decoder for Point Clouds using Adaptive Instance Normalization
Authors:
Isaak Lim,
Moritz Ibing,
Leif Kobbelt
Abstract:
Automatic synthesis of high quality 3D shapes is an ongoing and challenging area of research. While several data-driven methods have been proposed that make use of neural networks to generate 3D shapes, none of them reach the level of quality that deep learning synthesis approaches for images provide. In this work we present a method for a convolutional point cloud decoder/generator that makes use of recent advances in the domain of image synthesis. Namely, we use Adaptive Instance Normalization and offer an intuition on why it can improve training. Furthermore, we propose extensions to the minimization of the commonly used Chamfer distance for auto-encoding point clouds. In addition, we show that careful sampling is important both for the input geometry and in our point cloud generation process to improve results. The results are evaluated in an auto-encoding setup to offer both qualitative and quantitative analysis. The proposed decoder is validated by an extensive ablation study and is able to outperform current state of the art results in a number of experiments. We show the applicability of our method in the fields of point cloud upsampling, single view reconstruction, and shape synthesis.
Submitted 27 June, 2019;
originally announced June 2019.
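The abstract above mentions the commonly used Chamfer distance for auto-encoding point clouds. For orientation, a minimal sketch of one common convention (mean of squared nearest-neighbour distances, summed over both directions) is given below. The paper's proposed extensions are not reproduced here, and other conventions (unsquared distances, sums instead of means) are also in use; the point sets are made up for the example.

```python
import math

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty point sets.

    Assumed convention: for each point, take the squared Euclidean distance
    to its nearest neighbour in the other set, average per set, then add
    the two directions. This brute-force version is O(len(a) * len(b)).
    """
    def one_sided(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_sided(points_a, points_b) + one_sided(points_b, points_a)

# Illustrative point clouds: `b` contains one extra point that `a` lacks.
a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
print(chamfer_distance(a, b))  # about 0.333: only the extra point contributes
```

The example also hints at why sampling matters: every point of `a` is matched perfectly, so the a-to-b term is zero and the mismatch only shows up in the reverse direction.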
-
A Simple Approach to Intrinsic Correspondence Learning on Unstructured 3D Meshes
Authors:
Isaak Lim,
Alexander Dielen,
Marcel Campen,
Leif Kobbelt
Abstract:
The question of representation of 3D geometry is of vital importance when it comes to leveraging the recent advances in the field of machine learning for geometry processing tasks. For common unstructured surface meshes, state-of-the-art methods rely on patch-based or mapping-based techniques that introduce resampling operations in order to encode neighborhood information in a structured and regular manner. We investigate whether such resampling can be avoided, and propose a simple and direct encoding approach. Not only does it increase processing efficiency due to its simplicity; its direct nature also avoids any loss in data fidelity. To evaluate the proposed method, we perform a number of experiments in the challenging domain of intrinsic, non-rigid shape correspondence estimation. In comparison to current methods, we observe that our approach is able to achieve highly competitive results.
Submitted 26 September, 2018; v1 submitted 18 September, 2018;
originally announced September 2018.
-
Anchored Network Users: Stochastic Evolutionary Dynamics of Cognitive Radio Network Selection
Authors:
Ik Soo Lim,
Peter Wittek
Abstract:
To solve the spectrum scarcity problem, cognitive radio technology involves licensed users and unlicensed users. A fundamental issue for the network users is whether it is better to act as a licensed user by using a primary network or as an unlicensed user by using a secondary network. To model the network selection process by the users, the deterministic replicator dynamics is often used, but in a less practical way, in that it requires each user to know global information on the network state for reaching a Nash equilibrium. This paper addresses the network selection process in a more practical way, such that only noise-prone estimation of local information is required and yet efficient system performance is obtained.
Submitted 20 December, 2017;
originally announced December 2017.
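For readers unfamiliar with the deterministic replicator dynamics that the abstract above contrasts itself with, a minimal two-strategy sketch follows. The payoff functions and all parameter values are hypothetical and chosen only for illustration; the paper's stochastic, local-information model is not reproduced here.

```python
def replicator_step(x, payoff_primary, payoff_secondary, dt=0.01):
    """One explicit Euler step of two-strategy replicator dynamics.

    x is the fraction of users on the primary network. Both payoffs are
    evaluated at the population-wide state x, which is exactly the global
    information the deterministic model assumes every user has.
    """
    f_p = payoff_primary(x)
    f_s = payoff_secondary(x)
    return x + dt * x * (1.0 - x) * (f_p - f_s)

# Hypothetical congestion-style payoffs: a network becomes less attractive
# as its own share of users grows (secondary share is 1 - x).
def payoff_primary(x):
    return 2.0 * (1.0 - x)

def payoff_secondary(x):
    return 1.0 * (1.0 - (1.0 - x))

x = 0.1  # assumed initial share of primary-network users
for _ in range(5000):
    x = replicator_step(x, payoff_primary, payoff_secondary)
print(round(x, 4))  # 0.6667, where both payoffs are equal
```

With these payoffs the interior rest point is x = 2/3, where neither network offers a higher payoff, so no user gains by switching.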
-
Risk and Ambiguity in Information Seeking: Eye Gaze Patterns Reveal Contextual Behaviour in Dealing with Uncertainty
Authors:
Peter Wittek,
Ying-Hsang Liu,
Sándor Darányi,
Tom Gedeon,
Ik Soo Lim
Abstract:
Information foraging connects optimal foraging theory in ecology with how humans search for information. The theory suggests that, following an information scent, the information seeker must optimize the tradeoff between exploration by repeated steps in the search space vs. exploitation, using the resources encountered. We conjecture that this tradeoff characterizes how a user deals with uncertainty and its two aspects, risk and ambiguity in economic theory. Risk is related to the perceived quality of the actually visited patch of information, and can be reduced by exploiting and understanding the patch to a better extent. Ambiguity, on the other hand, is the opportunity cost of having higher quality patches elsewhere in the search space. The aforementioned tradeoff depends on many attributes, including traits of the user: at the two extreme ends of the spectrum, analytic and wholistic searchers employ entirely different strategies. The former type focuses on exploitation first, interspersed with bouts of exploration, whereas the latter type prefers to explore the search space first and consume later. Based on an eye-tracking study of experts' interactions with novel search interfaces in the biomedical domain, we demonstrate that perceived risk shifts the balance between exploration and exploitation in either type of users, tilting it against vs. in favour of ambiguity minimization. Since the pattern of behaviour in information foraging is quintessentially sequential, risk and ambiguity minimization cannot happen simultaneously, leading to a fundamental limit on how good such a tradeoff can be. This in turn connects information seeking with the emergent field of quantum decision theory.
Submitted 27 June, 2016;
originally announced June 2016.
-
Somoclu: An Efficient Parallel Library for Self-Organizing Maps
Authors:
Peter Wittek,
Shi Chao Gao,
Ik Soo Lim,
Li Zhao
Abstract:
Somoclu is a massively parallel tool for training self-organizing maps on large data sets written in C++. It builds on OpenMP for multicore execution, and on MPI for distributing the workload across the nodes in a cluster. It is also able to boost training by using CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, R and MATLAB interfaces facilitate interactive use. Apart from fast execution, memory use is highly optimized, enabling training large emergent maps even on a single computer.
Submitted 9 June, 2017; v1 submitted 7 May, 2013;
originally announced May 2013.
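To make concrete what training a self-organizing map involves, here is a small single-threaded sketch of the classic online update in plain Python. It does not use Somoclu or its interfaces, and the function name, schedules, and defaults are assumptions made for this example only.

```python
import math
import random

def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rows x cols self-organizing map with the online update rule.

    Hypothetical helper for illustration: linear decay of learning rate
    and neighbourhood radius, Gaussian neighbourhood on the map grid.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2.0
    # One weight vector per map node, initialised uniformly in [0, 1).
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        for x in data:
            # Best matching unit: the node whose weights are closest to x.
            bmu = min(weights, key=lambda node: math.dist(x, weights[node]))
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                # Pull the node towards x, scaled by neighbourhood strength.
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return weights

# Two small illustrative clusters in the unit square.
data = [(0.1, 0.1), (0.15, 0.05), (0.9, 0.9), (0.85, 0.95)]
som = train_som(data, rows=3, cols=3)
```

The inner loop over all nodes for every sample is the expensive part, which is the kind of work a parallel implementation distributes across cores, cluster nodes, or a GPU.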
-
MIMO Z Channel Interference Management
Authors:
Ian Lim
Abstract:
The MIMO Z channel is investigated in this paper. We focus on how to tackle the interference when different users try to send their codewords to their corresponding receivers while only one user causes interference to the other. We assume there are two transmitters and two receivers, each with two antennas. We propose a strategy to remove the interference while allowing different users to transmit at the same time. Our strategy has low complexity while achieving good performance. Mathematical analysis is provided and simulations are given based on our system.
Submitted 30 March, 2012;
originally announced April 2012.