
Showing 1–42 of 42 results for author: Śmieja, M

  1. arXiv:2406.18739  [pdf, other]

    cs.LG

    RetroGFN: Diverse and Feasible Retrosynthesis using GFlowNets

    Authors: Piotr Gaiński, Michał Koziarski, Krzysztof Maziarz, Marwin Segler, Jacek Tabor, Marek Śmieja

    Abstract: Single-step retrosynthesis aims to predict a set of reactions that lead to the creation of a target molecule, which is a crucial task in molecular discovery. Although a target molecule can often be synthesized with multiple different reactions, it is not clear how to verify the feasibility of a reaction, because the available datasets cover only a tiny fraction of the possible solutions. Consequen…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.08267  [pdf, other]

    cs.LG cs.AI cs.DC

    A deep cut into Split Federated Self-supervised Learning

    Authors: Marcin Przewięźlikowski, Marcin Osial, Bartosz Zieliński, Marek Śmieja

    Abstract: Collaborative self-supervised learning has recently become feasible in highly distributed environments by dividing the network layers between client devices and a central server. However, state-of-the-art methods, such as MocoSFL, are optimized for network division at the initial layers, which decreases the protection of the client data and increases communication overhead. In this paper, we demon…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Accepted to European Conference on Machine Learning (ECML) 2024

  3. arXiv:2309.12033  [pdf, other]

    cs.CV cs.LG

    Face Identity-Aware Disentanglement in StyleGAN

    Authors: Adrian Suwała, Bartosz Wójcik, Magdalena Proszewska, Jacek Tabor, Przemysław Spurek, Marek Śmieja

    Abstract: Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though the state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plu…

    Submitted 21 September, 2023; originally announced September 2023.

  4. arXiv:2307.02198  [pdf, other]

    cs.LG q-bio.QM

    ChiENN: Embracing Molecular Chirality with Graph Neural Networks

    Authors: Piotr Gaiński, Michał Koziarski, Jacek Tabor, Marek Śmieja

    Abstract: Graph Neural Networks (GNNs) play a fundamental role in many deep learning problems, in particular in cheminformatics. However, typical GNNs cannot capture the concept of chirality, which means they do not distinguish between the 3D graph of a chemical compound and its mirror image (enantiomer). The ability to distinguish between enantiomers is important especially in drug discovery because enanti…

    Submitted 10 July, 2023; v1 submitted 5 July, 2023; originally announced July 2023.

  5. arXiv:2306.06082  [pdf, other]

    cs.CV cs.LG

    Augmentation-aware Self-supervised Learning with Conditioned Projector

    Authors: Marcin Przewięźlikowski, Mateusz Pyla, Bartosz Zieliński, Bartłomiej Twardowski, Jacek Tabor, Marek Śmieja

    Abstract: Self-supervised learning (SSL) is a powerful technique for learning robust representations from unlabeled data. By learning to remain invariant to applied data augmentations, methods such as SimCLR and MoCo are able to reach quality on par with supervised approaches. However, this invariance may be harmful to solving some downstream tasks which depend on traits affected by augmentations used durin…

    Submitted 2 December, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

    Comments: Preprint under review. Code: https://github.com/gmum/CASSLE

  6. arXiv:2304.05243  [pdf, other]

    cs.LG

    r-softmax: Generalized Softmax with Controllable Sparsity Rate

    Authors: Klaudia Bałazy, Łukasz Struski, Marek Śmieja, Jacek Tabor

    Abstract: Nowadays artificial neural network models achieve remarkable results in many disciplines. Functions mapping the representation provided by the model to the probability distribution are the inseparable aspect of deep learning solutions. Although softmax is a commonly accepted probability mapping function in the machine learning community, it cannot return sparse outputs and always spreads the posit…

    Submitted 21 April, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

  7. arXiv:2304.03543  [pdf, other]

    cs.LG cs.AI

    HyperTab: Hypernetwork Approach for Deep Learning on Small Tabular Datasets

    Authors: Witold Wydmański, Oleksii Bulenok, Marek Śmieja

    Abstract: Deep learning has achieved impressive performance in many domains, such as computer vision and natural language processing, but its advantage over classical shallow methods on tabular datasets remains questionable. It is especially challenging to surpass the performance of tree-like ensembles, such as XGBoost or Random Forests, on small-sized datasets (less than 1k samples). To tackle this challen…

    Submitted 24 August, 2023; v1 submitted 7 April, 2023; originally announced April 2023.

  8. arXiv:2303.03389  [pdf, other]

    cs.LG cs.AI

    Contrastive Hierarchical Clustering

    Authors: Michał Znaleźniak, Przemysław Rola, Patryk Kaszuba, Jacek Tabor, Marek Śmieja

    Abstract: Deep clustering has been dominated by flat models, which split a dataset into a predefined number of groups. Although recent methods achieve an extremely high similarity with the ground truth on popular benchmarks, the information contained in the flat partition is limited. In this paper, we introduce CoHiClust, a Contrastive Hierarchical Clustering model based on deep neural networks, which can b…

    Submitted 21 June, 2023; v1 submitted 3 March, 2023; originally announced March 2023.

  9. arXiv:2207.04874  [pdf, other]

    cs.NE cs.LG

    Hebbian Continual Representation Learning

    Authors: Paweł Morawiecki, Andrii Krutsylo, Maciej Wołczyk, Marek Śmieja

    Abstract: Continual Learning aims to bring machine learning into a more realistic scenario, where tasks are learned sequentially and the i.i.d. assumption is not preserved. Although this setting is natural for biological systems, it proves very difficult for machine learning models such as artificial neural networks. To reduce this performance gap, we investigate the question whether biologically inspired H…

    Submitted 28 June, 2022; originally announced July 2022.

  10. arXiv:2206.13923  [pdf, other]

    cs.LG

    SLOVA: Uncertainty Estimation Using Single Label One-Vs-All Classifier

    Authors: Bartosz Wójcik, Jacek Grela, Marek Śmieja, Krzysztof Misztal, Jacek Tabor

    Abstract: Deep neural networks present impressive performance, yet they cannot reliably estimate their predictive confidence, limiting their applicability in high-risk domains. We show that applying a multi-label one-vs-all loss reveals classification ambiguity and reduces model overconfidence. The introduced SLOVA (Single Label One-Vs-All) model redefines typical one-vs-all predictive probabilities to a si…

    Submitted 28 June, 2022; originally announced June 2022.

  11. Semi-Supervised Clustering via Information-Theoretic Markov Chain Aggregation

    Authors: Sophie Steger, Bernhard C. Geiger, Marek Smieja

    Abstract: We connect the problem of semi-supervised clustering to constrained Markov aggregation, i.e., the task of partitioning the state space of a Markov chain. We achieve this connection by considering every data point in the dataset as an element of the Markov chain's state space, by defining the transition probabilities between states via similarities between corresponding data points, and by incorpor…

    Submitted 7 February, 2022; v1 submitted 17 December, 2021; originally announced December 2021.

    Comments: 13 pages, 6 figures; this is an extended version of a short paper accepted at ACM SAC 2022 (minor changes to the text; error in source code corrected)

    ACM Class: H.1.1; I.5.3; I.2.0

    Journal ref: Proc. of ACM/SIGAPP Symposium on Applied Computing, pp. 1136-1139, 2022

  12. arXiv:2110.14010  [pdf, other]

    cs.LG cs.CV

    MisConv: Convolutional Neural Networks for Missing Data

    Authors: Marcin Przewięźlikowski, Marek Śmieja, Łukasz Struski, Jacek Tabor

    Abstract: Processing of missing data by modern neural networks, such as CNNs, remains a fundamental, yet unsolved challenge, which naturally arises in many practical applications, like image inpainting or autonomous vehicles and robots. While imputation-based techniques are still one of the most popular solutions, they frequently introduce unreliable information to the data and do not take into account the…

    Submitted 29 October, 2021; v1 submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted for publication at WACV 2022 Conference

  13. Pharmacoprint -- a combination of pharmacophore fingerprint and artificial intelligence as a tool for computer-aided drug design

    Authors: Dawid Warszycki, Łukasz Struski, Marek Śmieja, Rafał Kafel, Rafał Kurczab

    Abstract: Structural fingerprints and pharmacophore modeling are methodologies that have been used for at least two decades in various fields of cheminformatics: from similarity searching to machine learning (ML). Advances in silico techniques consequently led to combining both these methodologies into a new approach known as pharmacophore fingerprint. Herein, we propose a high-resolution, pharmacophore fin…

    Submitted 31 October, 2023; v1 submitted 4 October, 2021; originally announced October 2021.

    Comments: Journal of Chemical Information and Modeling (2021)

  14. arXiv:2109.09011  [pdf, other]

    cs.LG

    PluGeN: Multi-Label Conditional Generation From Pre-Trained Models

    Authors: Maciej Wołczyk, Magdalena Proszewska, Łukasz Maziarka, Maciej Zięba, Patryk Wielopolski, Rafał Kurczab, Marek Śmieja

    Abstract: Modern generative models achieve excellent quality in a variety of tasks including image or text generation and chemical molecule modeling. However, existing methods often lack the essential ability to generate examples with requested properties, such as the age of the person in the photo or the weight of the generated molecule. Incorporating such additional conditioning factors would require rebu…

    Submitted 3 January, 2022; v1 submitted 18 September, 2021; originally announced September 2021.

  15. arXiv:2108.04907  [pdf, other]

    cs.LG

    Flow-based SVDD for anomaly detection

    Authors: Marcin Sendera, Marek Śmieja, Łukasz Maziarka, Łukasz Struski, Przemysław Spurek, Jacek Tabor

    Abstract: We propose FlowSVDD -- a flow-based one-class classifier for anomaly/outliers detection that realizes a well-known SVDD principle using deep learning tools. Contrary to other approaches to deep SVDD, the proposed model is instantiated using flow-based models, which naturally prevents from collapsing of bounding hypersphere into a single point. Experiments show that FlowSVDD achieves comparable res…

    Submitted 10 August, 2021; originally announced August 2021.

    Comments: arXiv admin note: text overlap with arXiv:2010.03002

  16. arXiv:2107.13214  [pdf, other]

    cs.LG cs.AI

    SONG: Self-Organizing Neural Graphs

    Authors: Łukasz Struski, Tomasz Danel, Marek Śmieja, Jacek Tabor, Bartosz Zieliński

    Abstract: Recent years have seen a surge in research on deep interpretable neural networks with decision trees as one of the most commonly incorporated tools. There are at least three advantages of using decision trees over logistic regression classification models: they are easy to interpret since they are based on binary decisions, they can make decisions faster, and they provide a hierarchy of classes. H…

    Submitted 28 July, 2021; originally announced July 2021.

  17. arXiv:2106.05409  [pdf, other]

    cs.LG

    Zero Time Waste: Recycling Predictions in Early Exit Neural Networks

    Authors: Maciej Wołczyk, Bartosz Wójcik, Klaudia Bałazy, Igor Podolak, Jacek Tabor, Marek Śmieja, Tomasz Trzciński

    Abstract: The problem of reducing processing time of large deep learning models is a fundamental challenge in many real-world applications. Early exit methods strive towards this goal by attaching additional Internal Classifiers (ICs) to intermediate layers of a neural network. ICs can quickly return predictions for easy examples and, as a result, reduce the average inference time of the whole model. Howeve…

    Submitted 5 December, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted at NeurIPS 2021

  18. arXiv:2011.14620  [pdf, other]

    cs.LG cs.AI stat.ML

    RegFlow: Probabilistic Flow-based Regression for Future Prediction

    Authors: Maciej Zięba, Marcin Przewięźlikowski, Marek Śmieja, Jacek Tabor, Tomasz Trzcinski, Przemysław Spurek

    Abstract: Predicting future states or actions of a given system remains a fundamental, yet unsolved challenge of intelligence, especially in the scope of complex and non-deterministic scenarios, such as modeling behavior of humans. Existing approaches provide results under strong assumptions concerning unimodality of future states, or, at best, assuming specific probability distributions that often poorly f…

    Submitted 30 November, 2020; originally announced November 2020.

  19. arXiv:2010.13914  [pdf, other]

    cs.CV cs.LG

    Processing of incomplete images by (graph) convolutional neural networks

    Authors: Tomasz Danel, Marek Śmieja, Łukasz Struski, Przemysław Spurek, Łukasz Maziarka

    Abstract: We investigate the problem of training neural networks from incomplete images without replacing missing values. For this purpose, we first represent an image as a graph, in which missing pixels are entirely ignored. The graph image representation is processed using a spatial graph convolutional network (SGCN) -- a type of graph convolutional networks, which is a proper generalization of classical…

    Submitted 26 October, 2020; originally announced October 2020.

  20. OneFlow: One-class flow for anomaly detection based on a minimal volume region

    Authors: Łukasz Maziarka, Marek Śmieja, Marcin Sendera, Łukasz Struski, Jacek Tabor, Przemysław Spurek

    Abstract: We propose OneFlow - a flow-based one-class classifier for anomaly (outlier) detection that finds a minimal volume bounding region. Contrary to density-based methods, OneFlow is constructed in such a way that its result typically does not depend on the structure of outliers. This is caused by the fact that during training the gradient of the cost function is propagated only over the points located…

    Submitted 22 September, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Journal ref: 2021, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

  21. Estimating conditional density of missing values using deep Gaussian mixture model

    Authors: Marcin Przewięźlikowski, Marek Śmieja, Łukasz Struski

    Abstract: We consider the problem of estimating the conditional probability distribution of missing values given the observed ones. We propose an approach, which combines the flexibility of deep neural networks with the simplicity of Gaussian mixture models (GMMs). Given an incomplete data point, our neural network returns the parameters of Gaussian distribution (in the form of Factor Analyzers model) repre…

    Submitted 6 October, 2020; v1 submitted 5 October, 2020; originally announced October 2020.

    Comments: A preliminary version of this paper appeared as an extended abstract at the ICML 2020 Workshop on The Art of Learning with Missing Values

  22. arXiv:2006.10013  [pdf, other]

    cs.LG cs.CR stat.ML

    Adversarial Examples Detection and Analysis with Layer-wise Autoencoders

    Authors: Bartosz Wójcik, Paweł Morawiecki, Marek Śmieja, Tomasz Krzyżek, Przemysław Spurek, Jacek Tabor

    Abstract: We present a mechanism for detecting adversarial examples based on data representations taken from the hidden layers of the target network. For this purpose, we train individual autoencoders at intermediate layers of the target network. This allows us to describe the manifold of true data and, in consequence, decide whether a given example has the same characteristics as true data. It also gives u…

    Submitted 17 June, 2020; originally announced June 2020.

  23. arXiv:2001.06720  [pdf, other]

    cs.LG stat.ML

    A Classification-Based Approach to Semi-Supervised Clustering with Pairwise Constraints

    Authors: Marek Śmieja, Łukasz Struski, Mário A. T. Figueiredo

    Abstract: In this paper, we introduce a neural network framework for semi-supervised clustering (SSC) with pairwise (must-link or cannot-link) constraints. In contrast to existing approaches, we decompose SSC into two simpler classification tasks/stages: the first stage uses a pair of Siamese neural networks to label the unlabeled pairs of points as must-link or cannot-link; the second stage uses the fully…

    Submitted 18 January, 2020; originally announced January 2020.

  24. arXiv:1910.02776  [pdf, other]

    cs.NE cs.LG stat.ML

    Biologically-Inspired Spatial Neural Networks

    Authors: Maciej Wołczyk, Jacek Tabor, Marek Śmieja, Szymon Maszke

    Abstract: We introduce bio-inspired artificial neural networks consisting of neurons that are additionally characterized by spatial positions. To simulate properties of biological systems we add the costs penalizing long connections and the proximity of neurons in a two-dimensional space. Our experiments show that in the case where the network performs two different tasks, the neurons naturally split into c…

    Submitted 7 October, 2019; originally announced October 2019.

  25. arXiv:1909.05310  [pdf, other]

    cs.LG stat.ML

    Spatial Graph Convolutional Networks

    Authors: Tomasz Danel, Przemysław Spurek, Jacek Tabor, Marek Śmieja, Łukasz Struski, Agnieszka Słowik, Łukasz Maziarka

    Abstract: Graph Convolutional Networks (GCNs) have recently become the primary choice for learning from graph-structured data, superseding hash fingerprints in representing chemical compounds. However, GCNs lack the ability to take into account the ordering of node neighbors, even when there is a geometric interpretation of the graph vertices that provides an order based on their spatial positions. To remed…

    Submitted 2 July, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

  26. arXiv:1906.09333  [pdf, other]

    cs.LG cs.AI

    SeGMA: Semi-Supervised Gaussian Mixture Auto-Encoder

    Authors: Marek Śmieja, Maciej Wołczyk, Jacek Tabor, Bernhard C. Geiger

    Abstract: We propose a semi-supervised generative model, SeGMA, which learns a joint probability distribution of data and their classes and which is implemented in a typical Wasserstein auto-encoder framework. We choose a mixture of Gaussians as a target distribution in latent space, which provides a natural splitting of data into clusters. To connect Gaussian components with correct classes, we use a small…

    Submitted 27 August, 2020; v1 submitted 21 June, 2019; originally announced June 2019.

  27. arXiv:1906.00628  [pdf, other]

    cs.LG stat.ML

    Fast and Stable Interval Bounds Propagation for Training Verifiably Robust Models

    Authors: Paweł Morawiecki, Przemysław Spurek, Marek Śmieja, Jacek Tabor

    Abstract: We present an efficient technique, which allows to train classification networks which are verifiably robust against norm-bounded adversarial attacks. This framework is built upon the work of Gowal et al., who applies the interval arithmetic to bound the activations at each layer and keeps the prediction invariant to the input perturbation. While that method is faster than competitive approaches,…

    Submitted 3 July, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

  28. Hypernetwork functional image representation

    Authors: Sylwester Klocek, Łukasz Maziarka, Maciej Wołczyk, Jacek Tabor, Jakub Nowak, Marek Śmieja

    Abstract: Motivated by the human way of memorizing images we introduce their functional representation, where an image is represented by a neural network. For this purpose, we construct a hypernetwork which takes an image and returns weights to the target network, which maps point from the plane (representing positions of the pixel) into its corresponding color in the image. Since the obtained representatio…

    Submitted 3 June, 2019; v1 submitted 27 February, 2019; originally announced February 2019.

    Journal ref: Artificial Neural Networks and Machine Learning -- ICANN 2019: Workshop and Special Sessions

  29. arXiv:1810.01868  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Set Aggregation Network as a Trainable Pooling Layer

    Authors: Łukasz Maziarka, Marek Śmieja, Aleksandra Nowak, Jacek Tabor, Łukasz Struski, Przemysław Spurek

    Abstract: Global pooling, such as max- or sum-pooling, is one of the key ingredients in deep neural networks used for processing images, texts, graphs and other types of structured data. Based on the recent DeepSets architecture proposed by Zaheer et al. (NIPS 2017), we introduce a Set Aggregation Network (SAN) as an alternative global pooling layer. In contrast to typical pooling operators, SAN allows to e…

    Submitted 25 November, 2019; v1 submitted 3 October, 2018; originally announced October 2018.

    Comments: ICONIP 2019

    Journal ref: Neural Information Processing. ICONIP 2019

  30. arXiv:1805.07405  [pdf, other]

    cs.LG stat.ML

    Processing of missing data by neural networks

    Authors: Marek Smieja, Łukasz Struski, Jacek Tabor, Bartosz Zieliński, Przemysław Spurek

    Abstract: We propose a general, theoretically justified mechanism for processing missing data by neural networks. Our idea is to replace typical neuron's response in the first hidden layer by its expected value. This approach can be applied for various types of networks at minimal cost in their modification. Moreover, in contrast to recent approaches, it does not require complete data for training. Experime…

    Submitted 3 April, 2019; v1 submitted 18 May, 2018; originally announced May 2018.

  31. arXiv:1803.04033  [pdf, other]

    cs.CV

    Cascade context encoder for improved inpainting

    Authors: Bartosz Zieliński, Łukasz Struski, Marek Śmieja, Jacek Tabor

    Abstract: In this paper, we analyze if cascade usage of the context encoder with increasing input can improve the results of the inpainting. For this purpose, we train context encoder for 64x64 pixels images in a standard way and use its resized output to fill in the missing input region of the 128x128 context encoder, both in training and evaluation phase. As the result, the inpainting is visibly more plau…

    Submitted 11 March, 2018; originally announced March 2018.

    Comments: Supplemental materials are available at http://www.ii.uj.edu.pl/~zielinsb

  32. arXiv:1707.03157  [pdf, other]

    cs.LG stat.ML

    Efficient mixture model for clustering of sparse high dimensional binary data

    Authors: Marek Śmieja, Krzysztof Hajto, Jacek Tabor

    Abstract: In this paper we propose a mixture model, SparseMix, for clustering of sparse high dimensional binary data, which connects model-based with centroid-based clustering. Every group is described by a representative and a probability distribution modeling dispersion from this representative. In contrast to classical mixture models based on EM algorithm, SparseMix: -is especially designed for the pro…

    Submitted 11 July, 2017; originally announced July 2017.

  33. arXiv:1705.02232  [pdf, other]

    cs.LG

    Spherical Wards clustering and generalized Voronoi diagrams

    Authors: Marek Śmieja, Jacek Tabor

    Abstract: Gaussian mixture model is very useful in many practical problems. Nevertheless, it cannot be directly generalized to non Euclidean spaces. To overcome this problem we present a spherical Gaussian-based clustering approach for partitioning data sets with respect to arbitrary dissimilarity measure. The proposed method is a combination of spherical Cross-Entropy Clustering with a generalized Wards ap…

    Submitted 4 May, 2017; originally announced May 2017.

  34. arXiv:1705.01877  [pdf, other]

    cs.LG stat.ML

    Semi-supervised model-based clustering with controlled clusters leakage

    Authors: Marek Śmieja, Łukasz Struski, Jacek Tabor

    Abstract: In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by user-defined leakage level, which controls maximal inconsistency between initial categorization and resulting clustering.…

    Submitted 4 May, 2017; originally announced May 2017.

  35. Semi-supervised cross-entropy clustering with information bottleneck constraint

    Authors: Marek Śmieja, Bernhard C. Geiger

    Abstract: In this paper, we propose a semi-supervised clustering method, CEC-IB, that models data with a set of Gaussian distributions and that retrieves clusters based on a partial labeling provided by the user (partition-level side information). By combining the ideas from cross-entropy clustering (CEC) with those from the information bottleneck method (IB), our method trades between three conflicting goa…

    Submitted 3 May, 2017; originally announced May 2017.

    Journal ref: Information Sciences, vol. 421, Dec. 2017, pp. 254-271

  36. arXiv:1705.00840  [pdf, other]

    cs.LG

    Pointed subspace approach to incomplete data

    Authors: Łukasz Struski, Marek Śmieja, Jacek Tabor

    Abstract: Incomplete data are often represented as vectors with filled missing attributes joined with flag vectors indicating missing components. In this paper we generalize this approach and represent incomplete data as pointed affine subspaces. This allows to perform various affine transformations of data, as whitening or dimensionality reduction. We embed such generalized missing data into a vector space…

    Submitted 2 May, 2017; originally announced May 2017.

    Comments: 13 pages, 3 figures and 3 tables. arXiv admin note: text overlap with arXiv:1612.01480

  37. arXiv:1612.01480  [pdf, other]

    cs.LG stat.ML

    Generalized RBF kernel for incomplete data

    Authors: Łukasz Struski, Marek Śmieja, Jacek Tabor

    Abstract: We construct $\bf genRBF$ kernel, which generalizes the classical Gaussian RBF kernel to the case of incomplete data. We model the uncertainty contained in missing attributes making use of data distribution and associate every point with a conditional probability density function. This allows to embed incomplete data into the function space and to define a kernel between two missing data points ba…

    Submitted 2 May, 2017; v1 submitted 5 December, 2016; originally announced December 2016.

    Comments: 9 pages, 7 figures

  38. arXiv:1508.04559  [pdf, other]

    cs.LG stat.ME stat.ML

    Introduction to Cross-Entropy Clustering The R Package CEC

    Authors: Jacek Tabor, Przemysław Spurek, Konrad Kamieniecki, Marek Śmieja, Krzysztof Misztal

    Abstract: The R Package CEC performs clustering based on the cross-entropy clustering (CEC) method, which was recently developed with the use of information theory. The main advantage of CEC is that it combines the speed and simplicity of $k$-means with the ability to use various Gaussian mixture models and reduce unnecessary clusters. In this work we present a practical tutorial to CEC based on the R Packa…

    Submitted 19 August, 2015; originally announced August 2015.

  39. arXiv:1305.3040  [pdf, ps, other]

    cs.IT

    Weighted Approach to General Entropy Function

    Authors: Marek Śmieja

    Abstract: The definition of weighted entropy allows for easy calculation of the entropy of the mixture of measures. In this paper we investigate the problem of equivalent definition of the general entropy function in weighted form. We show that under reasonable condition, which is satisfied by the well-known Shannon, Rényi and Tsallis entropies, every entropy function can be defined equivalently in the weig…

    Submitted 14 May, 2013; originally announced May 2013.

  40. arXiv:1204.0078  [pdf, ps, other]

    cs.IT

    Partition Reduction for Lossy Data Compression Problem

    Authors: Marek Śmieja, Jacek Tabor

    Abstract: We consider the computational aspects of lossy data compression problem, where the compression error is determined by a cover of the data space. We propose an algorithm which reduces the number of partitions needed to find the entropy with respect to the compression error. In particular, we show that, in the case of finite cover, the entropy is attained on some partition. We give an algorithmic co…

    Submitted 31 March, 2012; originally announced April 2012.

  41. arXiv:1204.0075  [pdf, ps, other]

    cs.IT

    Weighted Approach to Rényi Entropy

    Authors: Marek Śmieja, Jacek Tabor

    Abstract: Rényi entropy of order α is a general measure of entropy. In this paper we derive estimations for the Rényi entropy of the mixture of sources in terms of the entropy of the single sources. These relations allow to compute the Rényi entropy dimension of arbitrary order of a mixture of measures. The key for obtaining these results is our new definition of the weighted Rényi entropy. It is shown tha…

    Submitted 12 April, 2012; v1 submitted 31 March, 2012; originally announced April 2012.

  42. arXiv:1110.6027  [pdf, ps, other]

    cs.IT

    Entropy of the Mixture of Sources and Entropy Dimension

    Authors: Marek Smieja, Jacek Tabor

    Abstract: We investigate the problem of the entropy of the mixture of sources. There is given an estimation of the entropy and entropy dimension of convex combination of measures. The proof is based on our alternative definition of the entropy based on measures instead of partitions.

    Submitted 28 October, 2011; v1 submitted 27 October, 2011; originally announced October 2011.