RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets

Authors: Piotr Gaiński, Michał Koziarski, Krzysztof Maziarz, Marwin Segler, Jacek Tabor, Marek Śmieja

Abstract: Single-step retrosynthesis aims to predict a set of reactions that lead to the creation of a target molecule, which is a crucial task in molecular discovery. Although a target molecule can often be synthesized with multiple different reactions, it is not clear how to verify the feasibility of a reaction, because the available datasets cover only a tiny fraction of the possible solutions. Consequently, the existing models are not encouraged to explore the space of possible reactions sufficiently. In this paper, we propose a novel single-step retrosynthesis model, RetroGFN, that can explore outside the limited dataset and return a diverse set of feasible reactions by leveraging a feasibility proxy model during the training. We show that RetroGFN achieves competitive results on standard top-k accuracy while outperforming existing methods on round-trip accuracy. Moreover, we provide empirical arguments in favor of using round-trip accuracy which expands the notion of feasibility with respect to the standard top-k accuracy metric.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.08267 [pdf, other]

A deep cut into Split Federated Self-supervised Learning

Authors: Marcin Przewięźlikowski, Marcin Osial, Bartosz Zieliński, Marek Śmieja

Abstract: Collaborative self-supervised learning has recently become feasible in highly distributed environments by dividing the network layers between client devices and a central server. However, state-of-the-art methods, such as MocoSFL, are optimized for network division at the initial layers, which decreases the protection of the client data and increases communication overhead. In this paper, we demonstrate that splitting depth is crucial for maintaining privacy and communication efficiency in distributed training. We also show that MocoSFL suffers from a catastrophic quality deterioration for the minimal communication overhead. As a remedy, we introduce Momentum-Aligned contrastive Split Federated Learning (MonAcoSFL), which aligns online and momentum client models during the training procedure. Consequently, we achieve state-of-the-art accuracy while significantly reducing the communication overhead, making MonAcoSFL more practical in real-world scenarios.

Submitted 12 June, 2024; originally announced June 2024.

Comments: Accepted to European Conference on Machine Learning (ECML) 2024

arXiv:2309.12033 [pdf, other]

Face Identity-Aware Disentanglement in StyleGAN

Authors: Adrian Suwała, Bartosz Wójcik, Magdalena Proszewska, Jacek Tabor, Przemysław Spurek, Marek Śmieja

Abstract: Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though the state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles face attributes from a person's identity. Our key idea is to perform training on images retrieved from movie frames, where a given person appears in various poses and with different attributes. By applying a type of contrastive loss, we encourage the model to group images of the same person in similar regions of latent space. Our experiments demonstrate that the modifications of face attributes performed by PluGeN4Faces are significantly less invasive on the remaining characteristics of the image than in the existing state-of-the-art models.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2307.02198 [pdf, other]

ChiENN: Embracing Molecular Chirality with Graph Neural Networks

Authors: Piotr Gaiński, Michał Koziarski, Jacek Tabor, Marek Śmieja

Abstract: Graph Neural Networks (GNNs) play a fundamental role in many deep learning problems, in particular in cheminformatics. However, typical GNNs cannot capture the concept of chirality, which means they do not distinguish between the 3D graph of a chemical compound and its mirror image (enantiomer). The ability to distinguish between enantiomers is especially important in drug discovery because enantiomers can have very distinct biochemical properties. In this paper, we propose a theoretically justified message-passing scheme, which makes GNNs sensitive to the order of node neighbors. We apply that general concept in the context of molecular chirality to construct the Chiral Edge Neural Network (ChiENN) layer, which can be appended to any GNN model to enable chirality-awareness. Our experiments show that adding ChiENN layers to a GNN outperforms current state-of-the-art methods in chiral-sensitive molecular property prediction tasks.

Submitted 10 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.06082 [pdf, other]

Augmentation-aware Self-supervised Learning with Conditioned Projector

Authors: Marcin Przewięźlikowski, Mateusz Pyla, Bartosz Zieliński, Bartłomiej Twardowski, Jacek Tabor, Marek Śmieja

Abstract: Self-supervised learning (SSL) is a powerful technique for learning robust representations from unlabeled data. By learning to remain invariant to applied data augmentations, methods such as SimCLR and MoCo are able to reach quality on par with supervised approaches. However, this invariance may be harmful to solving some downstream tasks which depend on traits affected by augmentations used during pretraining, such as color. In this paper, we propose to foster sensitivity to such characteristics in the representation space by modifying the projector network, a common component of self-supervised architectures. Specifically, we supplement the projector with information about augmentations applied to images. In order for the projector to take advantage of this auxiliary conditioning when solving the SSL task, the feature extractor learns to preserve the augmentation information in its representations. Our approach, coined Conditional Augmentation-aware Self-supervised Learning (CASSLE), is directly applicable to typical joint-embedding SSL methods regardless of their objective functions. Moreover, it does not require major changes in the network architecture or prior knowledge of downstream tasks. In addition to an analysis of sensitivity towards different data augmentations, we conduct a series of experiments, which show that CASSLE improves over various SSL methods, reaching state-of-the-art performance in multiple downstream tasks.

Submitted 2 December, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

Comments: Preprint under review. Code: https://github.com/gmum/CASSLE

arXiv:2304.05243 [pdf, other]

r-softmax: Generalized Softmax with Controllable Sparsity Rate

Authors: Klaudia Bałazy, Łukasz Struski, Marek Śmieja, Jacek Tabor

Abstract: Nowadays artificial neural network models achieve remarkable results in many disciplines. Functions mapping the representation provided by the model to a probability distribution are an inseparable aspect of deep learning solutions. Although softmax is a commonly accepted probability mapping function in the machine learning community, it cannot return sparse outputs and always spreads the positive probability to all positions. In this paper, we propose r-softmax, a modification of the softmax that outputs a sparse probability distribution with a controllable sparsity rate. In contrast to the existing sparse probability mapping functions, we provide an intuitive mechanism for controlling the output sparsity level. We show on several multi-label datasets that r-softmax outperforms other sparse alternatives to softmax and is highly competitive with the original softmax. We also apply r-softmax to the self-attention module of a pre-trained transformer language model and demonstrate that it leads to improved performance when fine-tuning the model on different natural language processing tasks.

Submitted 21 April, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

arXiv:2304.03543 [pdf, other]

HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets

Authors: Witold Wydmański, Oleksii Bulenok, Marek Śmieja

Abstract: Deep learning has achieved impressive performance in many domains, such as computer vision and natural language processing, but its advantage over classical shallow methods on tabular datasets remains questionable. It is especially challenging to surpass the performance of tree-like ensembles, such as XGBoost or Random Forests, on small-sized datasets (fewer than 1k samples). To tackle this challenge, we introduce HyperTab, a hypernetwork-based approach to solving small sample problems on tabular datasets. By combining the advantages of Random Forests and neural networks, HyperTab generates an ensemble of neural networks, where each target model is specialized to process a specific lower-dimensional view of the data. Since each view plays the role of data augmentation, we virtually increase the number of training samples while keeping the number of trainable parameters unchanged, which prevents model overfitting. We evaluated HyperTab on more than 40 tabular datasets of a varying number of samples and domains of origin, and compared its performance with shallow and deep learning models representing the current state-of-the-art. We show that HyperTab consistently outranks other methods on small data (with a statistically significant difference) and scores comparable to them on larger datasets. We make a Python package with the code available to download at https://pypi.org/project/hypertab/

Submitted 24 August, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

arXiv:2303.03389 [pdf, other]

Contrastive Hierarchical Clustering

Authors: Michał Znaleźniak, Przemysław Rola, Patryk Kaszuba, Jacek Tabor, Marek Śmieja

Abstract: Deep clustering has been dominated by flat models, which split a dataset into a predefined number of groups. Although recent methods achieve an extremely high similarity with the ground truth on popular benchmarks, the information contained in the flat partition is limited. In this paper, we introduce CoHiClust, a Contrastive Hierarchical Clustering model based on deep neural networks, which can be applied to typical image data. By employing a self-supervised learning approach, CoHiClust distills the base network into a binary tree without access to any labeled data. The hierarchical clustering structure can be used to analyze the relationship between clusters, as well as to measure the similarity between data points. Experiments demonstrate that CoHiClust generates a reasonable structure of clusters, which is consistent with our intuition and image semantics. Moreover, it obtains superior clustering accuracy on most of the image datasets compared to the state-of-the-art flat clustering models.

Submitted 21 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

arXiv:2207.04874 [pdf, other]

Hebbian Continual Representation Learning

Authors: Paweł Morawiecki, Andrii Krutsylo, Maciej Wołczyk, Marek Śmieja

Abstract: Continual Learning aims to bring machine learning into a more realistic scenario, where tasks are learned sequentially and the i.i.d. assumption is not preserved. Although this setting is natural for biological systems, it proves very difficult for machine learning models such as artificial neural networks. To reduce this performance gap, we investigate whether biologically inspired Hebbian learning is useful for tackling continual challenges. In particular, we highlight a realistic and often overlooked unsupervised setting, where the learner has to build representations without any supervision. By combining sparse neural networks with the Hebbian learning principle, we build a simple yet effective alternative (HebbCL) to typical neural network models trained via gradient descent. Due to Hebbian learning, the network has easily interpretable weights, which might be essential in critical applications such as security or healthcare. We demonstrate the efficacy of HebbCL in an unsupervised learning setting applied to MNIST and Omniglot datasets. We also adapt the algorithm to the supervised scenario and obtain promising results in class-incremental learning.

Submitted 28 June, 2022; originally announced July 2022.

arXiv:2206.13923 [pdf, other]

SLOVA: Uncertainty Estimation Using Single Label One-Vs-All Classifier

Authors: Bartosz Wójcik, Jacek Grela, Marek Śmieja, Krzysztof Misztal, Jacek Tabor

Abstract: Deep neural networks present impressive performance, yet they cannot reliably estimate their predictive confidence, limiting their applicability in high-risk domains. We show that applying a multi-label one-vs-all loss reveals classification ambiguity and reduces model overconfidence. The introduced SLOVA (Single Label One-Vs-All) model redefines typical one-vs-all predictive probabilities to a single label situation, where only one class is the correct answer. The proposed classifier is confident only if a single class has a high probability and other probabilities are negligible. Unlike the typical softmax function, SLOVA naturally detects out-of-distribution samples if the probabilities of all other classes are small. The model is additionally fine-tuned with exponential calibration, which allows us to precisely align the confidence score with model accuracy. We verify our approach on three tasks. First, we demonstrate that SLOVA is competitive with the state-of-the-art on in-distribution calibration. Second, the performance of SLOVA is robust under dataset shifts. Finally, our approach performs extremely well in the detection of out-of-distribution samples. Consequently, SLOVA is a tool that can be used in various applications where uncertainty modeling is required.

Submitted 28 June, 2022; originally announced June 2022.

arXiv:2112.09397 [pdf, other]

doi 10.1145/3477314.3507181

Semi-Supervised Clustering via Information-Theoretic Markov Chain Aggregation

Authors: Sophie Steger, Bernhard C. Geiger, Marek Smieja

Abstract: We connect the problem of semi-supervised clustering to constrained Markov aggregation, i.e., the task of partitioning the state space of a Markov chain. We achieve this connection by considering every data point in the dataset as an element of the Markov chain's state space, by defining the transition probabilities between states via similarities between corresponding data points, and by incorporating semi-supervision information as hard constraints in a Hartigan-style algorithm. The introduced Constrained Markov Clustering (CoMaC) is an extension of a recent information-theoretic framework for (unsupervised) Markov aggregation to the semi-supervised case. Instantiating CoMaC for certain parameter settings further generalizes two previous information-theoretic objectives for unsupervised clustering. Our results indicate that CoMaC is competitive with the state-of-the-art.

Submitted 7 February, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

Comments: 13 pages, 6 figures; this is an extended version of a short paper accepted at ACM SAC 2022 (minor changes to the text; error in source code corrected)

ACM Class: H.1.1; I.5.3; I.2.0

Journal ref: Proc. of ACM/SIGAPP Symposium on Applied Computing, pp. 1136-1139, 2022

arXiv:2110.14010 [pdf, other]

MisConv: Convolutional Neural Networks for Missing Data

Authors: Marcin Przewięźlikowski, Marek Śmieja, Łukasz Struski, Jacek Tabor

Abstract: Processing of missing data by modern neural networks, such as CNNs, remains a fundamental, yet unsolved challenge, which naturally arises in many practical applications, like image inpainting or autonomous vehicles and robots. While imputation-based techniques are still one of the most popular solutions, they frequently introduce unreliable information to the data and do not take into account the uncertainty of estimation, which may be destructive for a machine learning model. In this paper, we present MisConv, a general mechanism for adapting various CNN architectures to process incomplete images. By modeling the distribution of missing values by the Mixture of Factor Analyzers, we cover the spectrum of possible replacements and find an analytical formula for the expected value of the convolution operator applied to the incomplete image. The whole framework is realized by matrix operations, which makes MisConv extremely efficient in practice. Experiments performed on various image processing tasks demonstrate that MisConv achieves superior or comparable performance to the state-of-the-art methods.

Submitted 29 October, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

Comments: Accepted for publication at WACV 2022 Conference

arXiv:2110.01339 [pdf]

doi 10.1021/acs.jcim.1c00589

Pharmacoprint -- a combination of pharmacophore fingerprint and artificial intelligence as a tool for computer-aided drug design

Authors: Dawid Warszycki, Łukasz Struski, Marek Śmieja, Rafał Kafel, Rafał Kurczab

Abstract: Structural fingerprints and pharmacophore modeling are methodologies that have been used for at least two decades in various fields of cheminformatics: from similarity searching to machine learning (ML). Advances in in silico techniques consequently led to combining both these methodologies into a new approach known as pharmacophore fingerprint. Herein, we propose a high-resolution pharmacophore fingerprint called Pharmacoprint that encodes the presence, types, and relationships between pharmacophore features of a molecule. Pharmacoprint was evaluated in classification experiments by using ML algorithms (logistic regression, support vector machines, linear support vector machines, and neural networks) and outperformed other popular molecular fingerprints (i.e., Estate, MACCS, PubChem, Substructure, Klekota-Roth, CDK, Extended, and GraphOnly) and the ChemAxon Pharmacophoric Features fingerprint. Pharmacoprint consisted of 39973 bits; several methods were applied for dimensionality reduction, and the best algorithm not only reduced the length of the bit string but also improved the efficiency of ML tests. Further optimization allowed us to define the best parameter settings for using Pharmacoprint in discrimination tests and for maximizing statistical parameters. Finally, Pharmacoprint generated for 3D structures with defined hydrogens as input data was applied to neural networks with a supervised autoencoder for selecting the most important bits, which allowed the Matthews Correlation Coefficient to be maximized up to 0.962. The results show the potential of Pharmacoprint as a new, promising tool for computer-aided drug design.

Submitted 31 October, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

Comments: Journal of Chemical Information and Modeling (2021)

arXiv:2109.09011 [pdf, other]

PluGeN: Multi-Label Conditional Generation From Pre-Trained Models

Authors: Maciej Wołczyk, Magdalena Proszewska, Łukasz Maziarka, Maciej Zięba, Patryk Wielopolski, Rafał Kurczab, Marek Śmieja

Abstract: Modern generative models achieve excellent quality in a variety of tasks including image or text generation and chemical molecule modeling. However, existing methods often lack the essential ability to generate examples with requested properties, such as the age of the person in the photo or the weight of the generated molecule. Incorporating such additional conditioning factors would require rebuilding the entire architecture and optimizing the parameters from scratch. Moreover, it is difficult to disentangle selected attributes so as to edit only one attribute while leaving the others unchanged. To overcome these limitations we propose PluGeN (Plugin Generative Network), a simple yet effective generative technique that can be used as a plugin to pre-trained generative models. The idea behind our approach is to transform the entangled latent representation using a flow-based module into a multi-dimensional space where the values of each attribute are modeled as an independent one-dimensional distribution. In consequence, PluGeN can generate new samples with desired attributes as well as manipulate labeled attributes of existing examples. Due to the disentangling of the latent representation, we are even able to generate samples with rare or unseen combinations of attributes in the dataset, such as a young person with gray hair, men with make-up, or women with beards. We combined PluGeN with GAN and VAE models and applied it to conditional generation and manipulation of images and chemical molecule modeling. Experiments demonstrate that PluGeN preserves the quality of backbone models while adding the ability to control the values of labeled attributes.

Submitted 3 January, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

arXiv:2108.04907 [pdf, other]

Flow-based SVDD for anomaly detection

Authors: Marcin Sendera, Marek Śmieja, Łukasz Maziarka, Łukasz Struski, Przemysław Spurek, Jacek Tabor

Abstract: We propose FlowSVDD -- a flow-based one-class classifier for anomaly/outlier detection that realizes the well-known SVDD principle using deep learning tools. Contrary to other approaches to deep SVDD, the proposed model is instantiated using flow-based models, which naturally prevents the bounding hypersphere from collapsing into a single point. Experiments show that FlowSVDD achieves comparable results to the current state-of-the-art methods and significantly outperforms related deep SVDD methods on benchmark datasets.

Submitted 10 August, 2021; originally announced August 2021.

Comments: arXiv admin note: text overlap with arXiv:2010.03002

arXiv:2107.13214 [pdf, other]

SONG: Self-Organizing Neural Graphs

Authors: Łukasz Struski, Tomasz Danel, Marek Śmieja, Jacek Tabor, Bartosz Zieliński

Abstract: Recent years have seen a surge in research on deep interpretable neural networks with decision trees as one of the most commonly incorporated tools. There are at least three advantages of using decision trees over logistic regression classification models: they are easy to interpret since they are based on binary decisions, they can make decisions faster, and they provide a hierarchy of classes. However, one of the well-known drawbacks of decision trees, as compared to decision graphs, is that decision trees cannot reuse the decision nodes. Nevertheless, decision graphs were not commonly used in deep learning due to the lack of efficient gradient-based training techniques. In this paper, we fill this gap and provide a general paradigm based on Markov processes, which allows for efficient training of a special type of decision graphs, which we call Self-Organizing Neural Graphs (SONG). We provide an extensive theoretical study of SONG, complemented by experiments conducted on Letter, Connect4, MNIST, CIFAR, and TinyImageNet datasets, showing that our method performs on par with or better than existing decision models.

Submitted 28 July, 2021; originally announced July 2021.

arXiv:2106.05409 [pdf, other]

Zero Time Waste: Recycling Predictions in Early Exit Neural Networks

Authors: Maciej Wołczyk, Bartosz Wójcik, Klaudia Bałazy, Igor Podolak, Jacek Tabor, Marek Śmieja, Tomasz Trzciński

Abstract: The problem of reducing processing time of large deep learning models is a fundamental challenge in many real-world applications. Early exit methods strive towards this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. However, if a particular IC does not decide to return an answer early, its predictions are discarded, with its computations effectively being wasted. To solve this issue, we introduce Zero Time Waste (ZTW), a novel approach in which each IC reuses predictions returned by its predecessors by (1) adding direct connections between ICs and (2) combining previous outputs in an ensemble-like manner. We conduct extensive experiments across various datasets and architectures to demonstrate that ZTW achieves a significantly better accuracy vs. inference time trade-off than other recently proposed early exit methods.

Submitted 5 December, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

Comments: Accepted at NeurIPS 2021

arXiv:2011.14620 [pdf, other]

RegFlow: Probabilistic Flow-based Regression for Future Prediction

Authors: Maciej Zięba, Marcin Przewięźlikowski, Marek Śmieja, Jacek Tabor, Tomasz Trzcinski, Przemysław Spurek

Abstract: Predicting future states or actions of a given system remains a fundamental, yet unsolved challenge of intelligence, especially in the scope of complex and non-deterministic scenarios, such as modeling behavior of humans. Existing approaches provide results under strong assumptions concerning unimodality of future states, or, at best, assuming specific probability distributions that often poorly fit real-life conditions. In this work we introduce a robust and flexible probabilistic framework that allows modeling future predictions with virtually no constraints regarding the modality or underlying probability distribution. To achieve this goal, we leverage a hypernetwork architecture and train a continuous normalizing flow model. The resulting method, dubbed RegFlow, achieves state-of-the-art results on several benchmark datasets, outperforming competing approaches by a significant margin.

Submitted 30 November, 2020; originally announced November 2020.

arXiv:2010.13914 [pdf, other]

Processing of incomplete images by (graph) convolutional neural networks

Authors: Tomasz Danel, Marek Śmieja, Łukasz Struski, Przemysław Spurek, Łukasz Maziarka

Abstract: We investigate the problem of training neural networks from incomplete images without replacing missing values. For this purpose, we first represent an image as a graph, in which missing pixels are entirely ignored. The graph image representation is processed using a spatial graph convolutional network (SGCN) -- a type of graph convolutional network, which is a proper generalization of classical CNNs operating on images. On one hand, our approach avoids the problem of missing data imputation while, on the other hand, there is a natural correspondence between CNNs and SGCN. Experiments confirm that our approach performs better than analogous CNNs with the imputation of missing values on typical classification and reconstruction tasks.

Submitted 26 October, 2020; originally announced October 2020.

arXiv:2010.03002 [pdf, other]

doi 10.1109/TPAMI.2021.3108223

OneFlow: One-class flow for anomaly detection based on a minimal volume region

Authors: Łukasz Maziarka, Marek Śmieja, Marcin Sendera, Łukasz Struski, Jacek Tabor, Przemysław Spurek

Abstract: We propose OneFlow - a flow-based one-class classifier for anomaly (outlier) detection that finds a minimal volume bounding region. Contrary to density-based methods, OneFlow is constructed in such a way that its result typically does not depend on the structure of outliers. This is caused by the fact that during training the gradient of the cost function is propagated only over the points located near the decision boundary (behavior similar to the support vectors in SVM). The combination of flow models and a Bernstein quantile estimator allows OneFlow to find a parametric form of the bounding region, which can be useful in various applications including describing shapes from 3D point clouds. Experiments show that the proposed model outperforms related methods on real-world anomaly detection problems.

Submitted 22 September, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

Journal ref: 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

arXiv:2010.02183 [pdf, other]

doi 10.1007/978-3-030-63836-8_19

Estimating conditional density of missing values using deep Gaussian mixture model

Authors: Marcin Przewięźlikowski, Marek Śmieja, Łukasz Struski

Abstract: We consider the problem of estimating the conditional probability distribution of missing values given the observed ones. We propose an approach that combines the flexibility of deep neural networks with the simplicity of Gaussian mixture models (GMMs). Given an incomplete data point, our neural network returns the parameters of a Gaussian distribution (in the form of a Factor Analyzers model) representing the corresponding conditional density. We experimentally verify that our model provides better log-likelihood than a conditional GMM trained in a typical way. Moreover, imputation obtained by replacing missing values using the mean vector of our model looks visually plausible.

Submitted 6 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

Comments: A preliminary version of this paper appeared as an extended abstract at the ICML 2020 Workshop on The Art of Learning with Missing Values

arXiv:2006.10013 [pdf, other]

Adversarial Examples Detection and Analysis with Layer-wise Autoencoders

Authors: Bartosz Wójcik, Paweł Morawiecki, Marek Śmieja, Tomasz Krzyżek, Przemysław Spurek, Jacek Tabor

Abstract: We present a mechanism for detecting adversarial examples based on data representations taken from the hidden layers of the target network. For this purpose, we train individual autoencoders at intermediate layers of the target network. This allows us to describe the manifold of true data and, in consequence, decide whether a given example has the same characteristics as true data. It also gives us insight into the behavior of adversarial examples and their flow through the layers of a deep neural network. Experimental results show that our method outperforms the state of the art in supervised and unsupervised settings.

Submitted 17 June, 2020; originally announced June 2020.

arXiv:2001.06720 [pdf, other]

A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints

Authors: Marek Śmieja, Łukasz Struski, Mário A. T. Figueiredo

Abstract: In this paper, we introduce a neural network framework for semi-supervised clustering (SSC) with pairwise (must-link or cannot-link) constraints. In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages: the first stage uses a pair of Siamese neural networks to label the unlabeled pairs of points as must-link or cannot-link; the second stage uses the fully pairwise-labeled dataset produced by the first stage in a supervised neural-network-based clustering method. The proposed approach, S3C2 (Semi-Supervised Siamese Classifiers for Clustering), is motivated by the observation that binary classification (such as assigning pairwise relations) is usually easier than multi-class clustering with partial supervision. On the other hand, being classification-based, our method solves only well-defined classification problems, rather than less well specified clustering tasks. Extensive experiments on various datasets demonstrate the high performance of the proposed method.

Submitted 18 January, 2020; originally announced January 2020.

arXiv:1910.02776 [pdf, other]

Biologically-Inspired Spatial Neural Networks

Authors: Maciej Wołczyk, Jacek Tabor, Marek Śmieja, Szymon Maszke

Abstract: We introduce bio-inspired artificial neural networks consisting of neurons that are additionally characterized by spatial positions. To simulate properties of biological systems we add the costs penalizing long connections and the proximity of neurons in a two-dimensional space. Our experiments show that in the case where the network performs two different tasks, the neurons naturally split into clusters, where each cluster is responsible for processing a different task. This behavior not only corresponds to the biological systems, but also allows for further insight into interpretability or continual learning.

Submitted 7 October, 2019; originally announced October 2019.

arXiv:1909.05310 [pdf, other]

Spatial Graph Convolutional Networks

Authors: Tomasz Danel, Przemysław Spurek, Jacek Tabor, Marek Śmieja, Łukasz Struski, Agnieszka Słowik, Łukasz Maziarka

Abstract: Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remedy this issue, we propose Spatial Graph Convolutional Network (SGCN) which uses spatial features to efficiently learn from graphs that can be naturally located in space. Our contribution is threefold: we propose a GCN-inspired architecture which (i) leverages node positions, (ii) is a proper generalization of both GCNs and Convolutional Neural Networks (CNNs), (iii) benefits from augmentation which further improves the performance and assures invariance with respect to the desired properties. Empirically, SGCN outperforms state-of-the-art graph-based methods on image classification and chemical tasks.

Submitted 2 July, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

arXiv:1906.09333 [pdf, other]

SeGMA: Semi-Supervised Gaussian Mixture Auto-Encoder

Authors: Marek Śmieja, Maciej Wołczyk, Jacek Tabor, Bernhard C. Geiger

Abstract: We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and which is implemented in a typical Wasserstein auto-encoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussian components with correct classes, we use a small amount of labeled data and a Gaussian classifier induced by the target distribution. SeGMA is optimized efficiently due to the use of Cramer-Wold distance as a maximum mean discrepancy penalty, which yields a closed-form expression for a mixture of spherical Gaussian components and thus obviates the need for sampling. While SeGMA preserves all properties of its semi-supervised predecessors and achieves at least as good generative performance on standard benchmark data sets, it presents additional features: (a) interpolation between any pair of points in the latent space produces realistic-looking samples; (b) combining the interpolation property with disentangled class and style variables, SeGMA is able to perform a continuous style transfer from one class to another; (c) it is possible to change the intensity of class characteristics in a data point by moving the latent representation of the data point away from specific Gaussian components.

Submitted 27 August, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

arXiv:1906.00628 [pdf, other]

Fast and Stable Interval Bounds Propagation for Training Verifiably Robust Models

Authors: Paweł Morawiecki, Przemysław Spurek, Marek Śmieja, Jacek Tabor

Abstract: We present an efficient technique for training classification networks that are verifiably robust against norm-bounded adversarial attacks. This framework is built upon the work of Gowal et al., who apply interval arithmetic to bound the activations at each layer and keep the prediction invariant to the input perturbation. While that method is faster than competitive approaches, it requires careful tuning of hyper-parameters and a large number of epochs to converge. To speed up and stabilize training, we supply the cost function with an additional term, which encourages the model to keep the interval bounds at hidden layers small. Experimental results demonstrate that we can achieve comparable (or even better) results using a smaller number of training iterations, in a more stable fashion. Moreover, the proposed model is less sensitive to the exact specification of the training process, which makes it easier for practitioners to use.

Submitted 3 July, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

arXiv:1902.10404 [pdf, other]

doi 10.1007/978-3-030-30493-5_48

Hypernetwork functional image representation

Authors: Sylwester Klocek, Łukasz Maziarka, Maciej Wołczyk, Jacek Tabor, Jakub Nowak, Marek Śmieja

Abstract: Motivated by the human way of memorizing images, we introduce their functional representation, where an image is represented by a neural network. For this purpose, we construct a hypernetwork which takes an image and returns weights to the target network, which maps a point from the plane (representing the position of a pixel) to its corresponding color in the image. Since the obtained representation is continuous, one can easily inspect the image at various resolutions and perform arbitrary continuous operations on it. Moreover, by inspecting interpolations we show that such a representation has some properties characteristic of generative models. To evaluate the proposed mechanism experimentally, we apply it to the image super-resolution problem. Despite using a single model for various scaling factors, we obtained results comparable to existing super-resolution methods.

Submitted 3 June, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

Journal ref: Artificial Neural Networks and Machine Learning -- ICANN 2019: Workshop and Special Sessions

arXiv:1810.01868 [pdf, other]

doi 10.1007/978-3-030-36711-4_35

Set Aggregation Network as a Trainable Pooling Layer

Authors: Łukasz Maziarka, Marek Śmieja, Aleksandra Nowak, Jacek Tabor, Łukasz Struski, Przemysław Spurek

Abstract: Global pooling, such as max- or sum-pooling, is one of the key ingredients in deep neural networks used for processing images, texts, graphs and other types of structured data. Based on the recent DeepSets architecture proposed by Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an alternative global pooling layer. In contrast to typical pooling operators, SAN allows embedding a given set of features into a vector representation of arbitrary size. We show that by adjusting the size of the embedding, SAN is capable of preserving the whole information from the input. In experiments, we demonstrate that replacing the global pooling layer with SAN leads to an improvement in classification accuracy. Moreover, it is less prone to overfitting and can be used as a regularizer.

Submitted 25 November, 2019; v1 submitted 3 October, 2018; originally announced October 2018.

Comments: ICONIP 2019

Journal ref: Neural Information Processing. ICONIP 2019

arXiv:1805.07405 [pdf, other]

Processing of missing data by neural networks

Authors: Marek Smieja, Łukasz Struski, Jacek Tabor, Bartosz Zieliński, Przemysław Spurek

Abstract: We propose a general, theoretically justified mechanism for processing missing data by neural networks. Our idea is to replace a typical neuron's response in the first hidden layer by its expected value. This approach can be applied to various types of networks at minimal cost in their modification. Moreover, in contrast to recent approaches, it does not require complete data for training. Experiments performed on different types of architectures show that our method gives better results than typical imputation strategies and other methods dedicated to incomplete data.

Submitted 3 April, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

arXiv:1803.04033 [pdf, other]

Cascade context encoder for improved inpainting

Authors: Bartosz Zieliński, Łukasz Struski, Marek Śmieja, Jacek Tabor

Abstract: In this paper, we analyze whether cascaded use of the context encoder with increasing input size can improve the results of inpainting. For this purpose, we train a context encoder for 64x64-pixel images in a standard way and use its resized output to fill in the missing input region of the 128x128 context encoder, both in the training and evaluation phases. As a result, the inpainting is visibly more plausible. In order to thoroughly verify the results, we introduce normalized squared-distortion, a measure for quantitative inpainting evaluation, and we provide its mathematical explanation. This is the first attempt to formalize the inpainting measure, which is based on the properties of latent feature representation, instead of L2 reconstruction loss.

Submitted 11 March, 2018; originally announced March 2018.

Comments: Supplemental materials are available at http://www.ii.uj.edu.pl/~zielinsb

arXiv:1707.03157 [pdf, other]

Efficient mixture model for clustering of sparse high dimensional binary data

Authors: Marek Śmieja, Krzysztof Hajto, Jacek Tabor

Abstract: In this paper we propose a mixture model, SparseMix, for clustering of sparse high dimensional binary data, which connects model-based with centroid-based clustering. Every group is described by a representative and a probability distribution modeling dispersion from this representative. In contrast to classical mixture models based on the EM algorithm, SparseMix: (i) is especially designed for the processing of sparse data, (ii) can be efficiently realized by an on-line Hartigan optimization algorithm, (iii) is able to automatically reduce unnecessary clusters. We perform extensive experimental studies on various types of data, which confirm that SparseMix builds partitions with higher compatibility with reference grouping than related methods. Moreover, constructed representatives often better reveal the internal structure of data.

Submitted 11 July, 2017; originally announced July 2017.

arXiv:1705.02232 [pdf, other]

Spherical Wards clustering and generalized Voronoi diagrams

Authors: Marek Śmieja, Jacek Tabor

Abstract: The Gaussian mixture model is very useful in many practical problems. Nevertheless, it cannot be directly generalized to non-Euclidean spaces. To overcome this problem we present a spherical Gaussian-based clustering approach for partitioning data sets with respect to an arbitrary dissimilarity measure. The proposed method is a combination of spherical Cross-Entropy Clustering with a generalized Ward's approach. The algorithm finds the optimal number of clusters by automatically removing groups which carry no information. Moreover, it is scale invariant and allows for forming spherically-shaped clusters of arbitrary sizes. In order to graphically represent and interpret the results, the notion of the Voronoi diagram was generalized to non-Euclidean spaces and applied to the introduced clustering method.

Submitted 4 May, 2017; originally announced May 2017.

arXiv:1705.01877 [pdf, other]

Semi-supervised model-based clustering with controlled clusters leakage

Authors: Marek Śmieja, Łukasz Struski, Jacek Tabor

Abstract: In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters, which combine expert knowledge with the true distribution of data. Moreover, it can be used for improving the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its efficient optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied in discovering meaningful groups in partially classified data.

Submitted 4 May, 2017; originally announced May 2017.

arXiv:1705.01601 [pdf, other]

doi 10.1016/j.ins.2017.07.016

Semi-supervised cross-entropy clustering with information bottleneck constraint

Authors: Marek Śmieja, Bernhard C. Geiger

Abstract: In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and that retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining the ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades between three conflicting goals: the accuracy with which the data set is modeled, the simplicity of the model, and the consistency of the clustering with side information. Experiments demonstrate that CEC-IB has a performance comparable to Gaussian mixture models (GMM) in a classical semi-supervised scenario, but is faster, more robust to noisy labels, automatically determines the optimal number of clusters, and performs well when not all classes are present in the side information. Moreover, in contrast to other semi-supervised models, it can be successfully applied in discovering natural subgroups if the partition-level side information is derived from the top levels of a hierarchical clustering.

Submitted 3 May, 2017; originally announced May 2017.

Journal ref: Information Sciences, vol. 421, Dec. 2017, pp. 254-271

arXiv:1705.00840 [pdf, other]

Pointed subspace approach to incomplete data

Authors: Łukasz Struski, Marek Śmieja, Jacek Tabor

Abstract: Incomplete data are often represented as vectors with filled missing attributes joined with flag vectors indicating missing components. In this paper we generalize this approach and represent incomplete data as pointed affine subspaces. This allows performing various affine transformations of the data, such as whitening or dimensionality reduction. We embed such generalized missing data into a vector space by mapping a pointed affine subspace (generalized missing data point) to a vector containing imputed values joined with a corresponding projection matrix. Such an operation preserves the scalar product of the embedding defined for flag vectors and allows feeding transformed incomplete data to typical classification methods.

Submitted 2 May, 2017; originally announced May 2017.

Comments: 13 pages, 3 figures and 3 tables. arXiv admin note: text overlap with arXiv:1612.01480

arXiv:1612.01480 [pdf, other]

Generalized RBF kernel for incomplete data

Authors: Łukasz Struski, Marek Śmieja, Jacek Tabor

Abstract: We construct the $\bf genRBF$ kernel, which generalizes the classical Gaussian RBF kernel to the case of incomplete data. We model the uncertainty contained in missing attributes making use of the data distribution and associate every point with a conditional probability density function. This allows embedding incomplete data into the function space and defining a kernel between two missing data points based on the scalar product in $L_2$. Experiments show that the introduced kernel applied to an SVM classifier gives better results than other state-of-the-art methods, especially when a large number of features are missing. Moreover, it is easy to implement and can be used together with any kernel approaches with no additional modifications.

Submitted 2 May, 2017; v1 submitted 5 December, 2016; originally announced December 2016.

Comments: 9 pages, 7 figures

arXiv:1508.04559 [pdf, other]

Introduction to Cross-Entropy Clustering The R Package CEC

Authors: Jacek Tabor, Przemysław Spurek, Konrad Kamieniecki, Marek Śmieja, Krzysztof Misztal

Abstract: The R Package CEC performs clustering based on the cross-entropy clustering (CEC) method, which was recently developed with the use of information theory. The main advantage of CEC is that it combines the speed and simplicity of $k$-means with the ability to use various Gaussian mixture models and reduce unnecessary clusters. In this work we present a practical tutorial to CEC based on the R Package CEC. Functions are provided to encompass the whole process of clustering.

Submitted 19 August, 2015; originally announced August 2015.

arXiv:1305.3040 [pdf, ps, other]

Weighted Approach to General Entropy Function

Authors: Marek Śmieja

Abstract: The definition of weighted entropy allows for easy calculation of the entropy of the mixture of measures. In this paper we investigate the problem of equivalent definition of the general entropy function in weighted form. We show that under reasonable condition, which is satisfied by the well-known Shannon, Rényi and Tsallis entropies, every entropy function can be defined equivalently in the weighted way. As a corollary, we show how to use the weighted form to compute Tsallis entropy of the mixture of measures.

Submitted 14 May, 2013; originally announced May 2013.

arXiv:1204.0078 [pdf, ps, other]

Partition Reduction for Lossy Data Compression Problem

Authors: Marek Śmieja, Jacek Tabor

Abstract: We consider the computational aspects of lossy data compression problem, where the compression error is determined by a cover of the data space. We propose an algorithm which reduces the number of partitions needed to find the entropy with respect to the compression error. In particular, we show that, in the case of finite cover, the entropy is attained on some partition. We give an algorithmic construction of such partition.

Submitted 31 March, 2012; originally announced April 2012.

arXiv:1204.0075 [pdf, ps, other]

Weighted Approach to Rényi Entropy

Authors: Marek Śmieja, Jacek Tabor

Abstract: Rényi entropy of order α is a general measure of entropy. In this paper we derive estimations for the Rényi entropy of the mixture of sources in terms of the entropy of the single sources. These relations allow to compute the Rényi entropy dimension of arbitrary order of a mixture of measures. The key for obtaining these results is our new definition of the weighted Rényi entropy. It is shown that weighted entropy is equal to the classical Rényi entropy.

Submitted 12 April, 2012; v1 submitted 31 March, 2012; originally announced April 2012.

arXiv:1110.6027 [pdf, ps, other]

Entropy of the Mixture of Sources and Entropy Dimension

Authors: Marek Smieja, Jacek Tabor

Abstract: We investigate the problem of the entropy of the mixture of sources. There is given an estimation of the entropy and entropy dimension of convex combination of measures. The proof is based on our alternative definition of the entropy based on measures instead of partitions.

Submitted 28 October, 2011; v1 submitted 27 October, 2011; originally announced October 2011.

Showing 1–42 of 42 results for author: Śmieja, M