
Sparsity-Aware Distributed Learning for Gaussian Processes with Linear Multiple Kernel

Authors: Richard Cornelius Suwandi, Zhidi Lin, Feng Yin, Zhiguo Wang, Sergios Theodoridis

Abstract: Gaussian processes (GPs) stand as crucial tools in machine learning and signal processing, with their effectiveness hinging on kernel design and hyper-parameter optimization. This paper presents a novel GP linear multiple kernel (LMK) and a generic sparsity-aware distributed learning framework to optimize the hyper-parameters. The newly proposed grid spectral mixture (GSM) kernel is tailored for multi-dimensional data, effectively reducing the number of hyper-parameters while maintaining good approximation capabilities. We further demonstrate that the associated hyper-parameter optimization of this kernel yields sparse solutions. To exploit the inherent sparsity property of the solutions, we introduce the Sparse LInear Multiple Kernel Learning (SLIM-KL) framework. The framework incorporates a quantized alternating direction method of multipliers (ADMM) scheme for collaborative learning among multiple agents, where the local optimization problem is solved using a distributed successive convex approximation (DSCA) algorithm. SLIM-KL effectively manages large-scale hyper-parameter optimization for the proposed kernel, simultaneously ensuring data privacy and minimizing communication costs. Theoretical analysis establishes convergence guarantees for the learning framework, while experiments on diverse datasets demonstrate the superior prediction performance and efficiency of our proposed methods.

Submitted 26 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.01074 [pdf, other]

Towards Efficient Modeling and Inference in Multi-Dimensional Gaussian Process State-Space Models

Authors: Zhidi Lin, Juan Maroñas, Ying Li, Feng Yin, Sergios Theodoridis

Abstract: The Gaussian process state-space model (GPSSM) has attracted extensive attention for modeling complex nonlinear dynamical systems. However, the existing GPSSM employs separate Gaussian processes (GPs) for each latent state dimension, leading to escalating computational complexity and parameter proliferation, thus posing challenges for modeling dynamical systems with high-dimensional latent states. To surmount this obstacle, we propose to integrate the efficient transformed Gaussian process (ETGP) into the GPSSM, which involves pushing a shared GP through multiple normalizing flows to efficiently model the transition function in high-dimensional latent state space. Additionally, we develop a corresponding variational inference algorithm that surpasses existing methods in terms of parameter count and computational complexity. Experimental results on diverse synthetic and real-world datasets corroborate the efficiency of the proposed method, while also demonstrating its ability to achieve similar inference performance compared to existing methods. Code is available at https://github.com/zhidilin/gpssmProj.

Submitted 3 September, 2023; originally announced September 2023.

arXiv:2306.00561 [pdf, other]

Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners

Authors: Sarthak Yadav, Sergios Theodoridis, Lars Kai Hansen, Zheng-Hua Tan

Abstract: In this work, we propose a Multi-Window Masked Autoencoder (MW-MAE) fitted with a novel Multi-Window Multi-Head Attention (MW-MHA) module that facilitates the modelling of local-global interactions in every decoder transformer block through attention heads of several distinct local and global windows. Empirical results on ten downstream audio tasks show that MW-MAEs consistently outperform standard MAEs in overall performance and learn better general-purpose audio representations, along with demonstrating considerably better scaling characteristics. Investigating attention distances and entropies reveals that MW-MAE encoders learn heads with broader local and global attention. Analyzing attention head feature representations through Projection Weighted Canonical Correlation Analysis (PWCCA) shows that attention heads with the same window sizes across the decoder layers of the MW-MAE learn correlated feature representations, which enables each block to independently capture local and global information, leading to a decoupled decoder feature hierarchy. Code for feature extraction and downstream experiments along with pre-trained models will be released publicly.

Submitted 1 October, 2023; v1 submitted 1 June, 2023; originally announced June 2023.

arXiv:2205.14283 [pdf, other]

doi 10.1109/MSP.2022.3198201

Rethinking Bayesian Learning for Data Analysis: The Art of Prior and Inference in Sparsity-Aware Modeling

Authors: Lei Cheng, Feng Yin, Sergios Theodoridis, Sotirios Chatzis, Tsung-Hui Chang

Abstract: Sparse modeling for signal processing and machine learning has been the focus of scientific research for over two decades. Among others, supervised sparsity-aware learning comprises two major paths paved by: a) discriminative methods and b) generative methods. The latter, more widely known as Bayesian methods, enable uncertainty evaluation w.r.t. the performed predictions. Furthermore, they can better exploit related prior information and naturally introduce robustness into the model, due to their unique capacity to marginalize out uncertainties related to the parameter estimates. Moreover, hyper-parameters associated with the adopted priors can be learnt via the training data. To implement sparsity-aware learning, the crucial point lies in the choice of the function regularizer for discriminative methods and the choice of the prior distribution for Bayesian learning. Over the last decade or so, due to the intense research on deep learning, emphasis has been put on discriminative techniques. However, a comeback of Bayesian methods is taking place that sheds new light on the design of deep neural networks, which also establish firm links with Bayesian models and inspire new paths for unsupervised learning, such as Bayesian tensor decomposition. The goal of this article is two-fold. First, to review, in a unified way, some recent advances in incorporating sparsity-promoting priors into three highly popular data modeling tools, namely deep neural networks, Gaussian processes, and tensor decomposition. Second, to review their associated inference techniques from different aspects, including evidence maximization via optimization and variational inference methods. Challenges such as the small data dilemma, automatic model structure search, and natural prediction uncertainty evaluation are also discussed. Typical signal processing and machine learning tasks are demonstrated.

Submitted 27 May, 2022; originally announced May 2022.

Comments: 64 pages, 16 figures, 6 tables, 98 references, submitted to IEEE Signal Processing Magazine

arXiv:2203.15806 [pdf]

doi 10.1109/JSTQE.2022.3183444

Bayesian Photonic Accelerators for Energy Efficient and Noise Robust Neural Processing

Authors: George Sarantoglou, Adonis Bogris, Charis Mesaritakis, Sergios Theodoridis

Abstract: Artificial neural networks are efficient computing platforms inspired by the brain. Such platforms can tackle a vast range of real-life tasks, from image processing to language translation. Silicon photonic integrated chips (PICs), by employing coherent interactions in Mach-Zehnder interferometers, are promising accelerators offering record-low power consumption and ultra-fast matrix multiplication. Such photonic accelerators, however, suffer from phase uncertainty due to fabrication errors and crosstalk effects that inhibit the development of high-density implementations. In this work, we present a Bayesian learning framework for such photonic accelerators. In addition to the conventional log-likelihood optimization path, two novel training schemes are derived, namely a regularized version and a fully Bayesian learning scheme. They are applied on a photonic neural network with 512 phase shifters targeting the MNIST dataset. The new schemes, when combined with a pre-characterization stage that provides the passive offsets, are able to dramatically decrease the operational power of the PIC, by more than 70%, with just a slight loss in classification accuracy. The full Bayesian scheme, apart from this energy reduction, returns information with respect to the sensitivity of the phase shifters. This information is used to de-activate 31% of the phase actuators and, thus, significantly simplify the driving system.

Submitted 5 May, 2022; v1 submitted 29 March, 2022; originally announced March 2022.

Comments: 10 pages, 8 figures

Report number: 28, (6), pp 1-10, 2022

Journal ref: IEEE Journal of Selected Topics in Quantum Electronics 2022

arXiv:2112.02671 [pdf, other]

Stochastic Local Winner-Takes-All Networks Enable Profound Adversarial Robustness

Authors: Konstantinos P. Panousis, Sotirios Chatzis, Sergios Theodoridis

Abstract: This work explores the potency of stochastic competition-based activations, namely Stochastic Local Winner-Takes-All (LWTA), against powerful (gradient-based) white-box and black-box adversarial attacks; we especially focus on Adversarial Training settings. In our work, we replace the conventional ReLU-based nonlinearities with blocks comprising locally and stochastically competing linear units. Each network layer now yields a sparse output, depending on the outcome of winner sampling in each block. We rely on the Variational Bayesian framework for training and inference; we incorporate conventional PGD-based adversarial training arguments to increase the overall adversarial robustness. As we experimentally show, the arising networks yield state-of-the-art robustness against powerful adversarial attacks while retaining a very high classification rate in the benign case.

Submitted 5 December, 2021; originally announced December 2021.

Comments: Bayesian Deep Learning Workshop, NeurIPS 2021

arXiv:2109.07228 [pdf, other]

Dialog speech sentiment classification for imbalanced datasets

Authors: Sergis Nicolaou, Lambros Mavrides, Georgina Tryfou, Kyriakos Tolias, Konstantinos Panousis, Sotirios Chatzis, Sergios Theodoridis

Abstract: Speech is the most common way humans express their feelings, and sentiment analysis is the use of tools such as natural language processing and computational algorithms to identify the polarity of these feelings. Even though this field has seen tremendous advancements in the last two decades, effectively detecting underrepresented sentiments in different kinds of datasets is still a challenging task. In this paper, we use single and bi-modal analysis of short dialog utterances and gain insights on the main factors that aid in sentiment detection, particularly in the underrepresented classes, in datasets with and without an inherent sentiment component. Furthermore, we propose an architecture which uses a learning rate scheduler and different monitoring criteria and provides state-of-the-art results for the SWITCHBOARD imbalanced sentiment dataset.

Submitted 15 September, 2021; originally announced September 2021.

Comments: To be published in SPECOM & ICR 2021 Electronic Proceedings by the Springer Nature

arXiv:2101.01121 [pdf, ps, other]

Local Competition and Stochasticity for Adversarial Robustness in Deep Learning

Authors: Konstantinos P. Panousis, Sotirios Chatzis, Antonios Alexos, Sergios Theodoridis

Abstract: This work addresses adversarial robustness in deep learning by considering deep networks with stochastic local winner-takes-all (LWTA) activations. This type of network unit results in sparse representations from each model layer, as the units are organized in blocks where only one unit generates a non-zero output. The main operating principle of the introduced units lies in stochastic arguments, as the network performs posterior sampling over competing units to select the winner. We combine these LWTA arguments with tools from the field of Bayesian non-parametrics, specifically the stick-breaking construction of the Indian Buffet Process, to allow for inferring the sub-part of each layer that is essential for modeling the data at hand. Then, inference is performed by means of stochastic variational Bayes. We perform a thorough experimental evaluation of our model using benchmark datasets. As we show, our method achieves high robustness to adversarial perturbations, with state-of-the-art performance in powerful adversarial attack schemes.

Submitted 29 March, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

Comments: Accepted AISTATS 2021. arXiv admin note: text overlap with arXiv:2006.10620

arXiv:2009.02472 [pdf, other]

Towards Flexible Sparsity-Aware Modeling: Automatic Tensor Rank Learning Using The Generalized Hyperbolic Prior

Authors: Lei Cheng, Zhongtao Chen, Qingjiang Shi, Yik-Chung Wu, Sergios Theodoridis

Abstract: Tensor rank learning for canonical polyadic decomposition (CPD) has long been deemed an essential yet challenging problem. In particular, since the tensor rank controls the complexity of the CPD model, its inaccurate learning would cause overfitting to noise or underfitting to the signal sources, and even destroy the interpretability of model parameters. However, the optimal determination of a tensor rank is known to be a non-deterministic polynomial-time hard (NP-hard) task. Rather than exhaustively searching for the best tensor rank via trial-and-error experiments, Bayesian inference under the Gaussian-gamma prior was introduced in the context of probabilistic CPD modeling, and it was shown to be an effective strategy for automatic tensor rank determination. This triggered flourishing research on other structured tensor CPDs with automatic tensor rank learning. On the other side of the coin, these research works also reveal that the Gaussian-gamma model does not perform well for high-rank tensors and/or low signal-to-noise ratios (SNRs). To overcome these drawbacks, in this paper, we introduce a more advanced generalized hyperbolic (GH) prior to the probabilistic CPD model, which not only includes the Gaussian-gamma model as a special case, but also is more flexible in adapting to different levels of sparsity. Based on this novel probabilistic model, an algorithm is developed under the framework of variational inference, where each update is obtained in closed form. Extensive numerical results, using synthetic data and real-world datasets, demonstrate the significantly improved performance of the proposed method in learning both low and high tensor ranks, even for low SNR cases.

Submitted 29 March, 2022; v1 submitted 5 September, 2020; originally announced September 2020.

arXiv:2005.07134 [pdf, other]

Early soft and flexible fusion of EEG and fMRI via tensor decompositions

Authors: Christos Chatzichristos, Eleftherios Kofidis, Lieven De Lathauwer, Sergios Theodoridis, Sabine Van Huffel

Abstract: Data fusion refers to the joint analysis of multiple datasets which provide complementary views of the same task. In this preprint, the problem of jointly analyzing electroencephalography (EEG) and functional Magnetic Resonance Imaging (fMRI) data is considered. Jointly analyzing EEG and fMRI measurements is highly beneficial for studying brain function because these modalities have complementary spatiotemporal resolution: EEG offers good temporal resolution while fMRI is better in its spatial resolution. The fusion methods reported so far ignore the underlying multi-way nature of the data in at least one of the modalities and/or rely on very strong assumptions about the relation of the two datasets. In this preprint, these two points are addressed by adopting for the first time tensor models in the two modalities while also exploring double coupled tensor decompositions and by following soft and flexible coupling approaches to implement the multi-modal analysis. To cope with the Event Related Potential (ERP) variability in EEG, the PARAFAC2 model is adopted. The results obtained are compared against those of parallel Independent Component Analysis (ICA) and hard coupling alternatives in both simulated and real data. Our results confirm the superiority of tensorial methods over methods based on ICA. In scenarios that do not meet the assumptions underlying hard coupling, the advantage of soft and flexible coupled decompositions is clearly demonstrated.

Submitted 12 May, 2020; originally announced May 2020.

arXiv:2003.03697 [pdf, other]

FedLoc: Federated Learning Framework for Data-Driven Cooperative Localization and Location Data Processing

Authors: Feng Yin, Zhidi Lin, Yue Xu, Qinglei Kong, Deshi Li, Sergios Theodoridis, Shuguang Cui

Abstract: In this overview paper, data-driven learning model-based cooperative localization and location data processing are considered, in line with the emerging machine learning and big data methods. We first review (1) state-of-the-art algorithms in the context of federated learning, (2) two widely used learning models, namely the deep neural network model and the Gaussian process model, and (3) various distributed model hyper-parameter optimization schemes. Then, we demonstrate various practical use cases that are summarized from a mixture of standard, newly published, and unpublished works, which cover a broad range of location services, including collaborative static localization/fingerprinting, indoor target tracking, outdoor navigation using low-sampling GPS, and spatio-temporal wireless traffic data modeling and prediction. Experimental results show that near-centralized data-fitting and prediction performance can be achieved by a set of collaborative mobile users running distributed algorithms. All the surveyed use cases fall under our newly proposed Federated Localization (FedLoc) framework, which aims at collaboratively building accurate location services without sacrificing user privacy, in particular, sensitive information related to their geographical trajectories. Future research directions are also discussed at the end of this paper.

Submitted 25 May, 2020; v1 submitted 7 March, 2020; originally announced March 2020.

arXiv:2002.05809 [pdf, other]

Variational Conditional Dependence Hidden Markov Models for Skeleton-Based Action Recognition

Authors: Konstantinos P. Panousis, Sotirios Chatzis, Sergios Theodoridis

Abstract: Hidden Markov Models (HMMs) comprise a powerful generative approach for modeling sequential data and time-series in general. However, the commonly employed assumption of the dependence of the current time frame on a single or multiple immediately preceding frames is unrealistic; more complicated dynamics potentially exist in real-world scenarios. This paper revisits conventional sequential modeling approaches, aiming to address the problem of capturing time-varying temporal dependency patterns. To this end, we propose a different formulation of HMMs, whereby the dependence on past frames is dynamically inferred from the data. Specifically, we introduce a hierarchical extension by postulating an additional latent variable layer; therein, the (time-varying) temporal dependence patterns are treated as latent variables over which inference is performed. We leverage solid arguments from the Variational Bayes framework and derive a tractable inference algorithm based on the forward-backward algorithm. As we experimentally show, our approach can model highly complex sequential data and can effectively handle data with missing values.

Submitted 9 September, 2021; v1 submitted 13 February, 2020; originally announced February 2020.

Comments: International Symposium on Visual Computing (ISVC) 2021

arXiv:1904.09559 [pdf, ps, other]

doi 10.1109/TSP.2020.3023008

Linear Multiple Low-Rank Kernel Based Stationary Gaussian Processes Regression for Time Series

Authors: Feng Yin, Lishuo Pan, Xinwei He, Tianshi Chen, Sergios Theodoridis, Zhi-Quan Luo

Abstract: Gaussian processes (GP) for machine learning have been studied systematically over the past two decades and they are by now widely used in a number of diverse applications. However, GP kernel design and the associated hyper-parameter optimization are still hard and to a large extent open problems. In this paper, we consider the task of GP regression for time series modeling and analysis. The underlying stationary kernel can be approximated arbitrarily closely by a newly proposed grid spectral mixture (GSM) kernel, which turns out to be a linear combination of low-rank sub-kernels. In the case where a large number of the sub-kernels are used, either the Nyström or the random Fourier feature approximations can be adopted to deal efficiently with the computational demands. The unknown GP hyper-parameters consist of the non-negative weights of all sub-kernels as well as the noise variance; their estimation is performed via the maximum-likelihood (ML) estimation framework. Two efficient numerical optimization methods for solving for the unknown hyper-parameters are derived, including a sequential majorization-minimization (MM) method and a non-linearly constrained alternating direction method of multipliers (ADMM). The MM matches perfectly with the proven low-rank property of the proposed GSM sub-kernels and turns out to be a stable and efficient solver, while the ADMM has the potential to generate a better local minimum in terms of the test MSE. Experimental results, based on various classic time series data sets, corroborate that the proposed GSM kernel-based GP regression model outperforms several salient competitors of similar kind in terms of prediction mean-squared-error and numerical stability.

Submitted 21 April, 2019; originally announced April 2019.

Comments: 15 pages, 5 figures, submitted

arXiv:1808.00560 [pdf, other]

Compressible Spectral Mixture Kernels with Sparse Dependency Structures for Gaussian Processes

Authors: Kai Chen, Yijue Dai, Feng Yin, Elena Marchiori, Sergios Theodoridis

Abstract: Spectral mixture (SM) kernels comprise a powerful class of generalized kernels for Gaussian processes (GPs) to describe complex patterns. This paper introduces model compression and time and phase (TP) modulated dependency structures to the original SM kernel for improved generalization of GPs. Specifically, by adopting Bienaymé's identity, we generalize the dependency structure through cross-covariance between the SM components. Then, we propose a novel SM kernel with a dependency structure (SMD) by using cross-convolution between the SM components. Furthermore, we improve the expressiveness of the dependency structure by parameterizing it with time and phase delays. The dependency structure has clear interpretations in terms of spectral density, covariance behavior, and sampling path. To enrich the SMD with effective hyperparameter initialization, compressible SM kernel components, and sparse dependency structures, we finally introduce a novel structure adaptation (SA) algorithm. A thorough comparative analysis of the SMD on both synthetic and real-life applications corroborates its efficacy.

Submitted 26 July, 2023; v1 submitted 1 August, 2018; originally announced August 2018.

Comments: 13 pages

arXiv:1805.07624 [pdf, other]

Nonparametric Bayesian Deep Networks with Local Competition

Authors: Konstantinos P. Panousis, Sotirios Chatzis, Sergios Theodoridis

Abstract: The aim of this work is to enable inference of deep networks that retain high accuracy for the least possible model complexity, with the latter deduced from the data during inference. To this end, we revisit deep networks that comprise competing linear units, as opposed to nonlinear units that do not entail any form of (local) competition. In this context, our main technical innovation consists in an inferential setup that leverages solid arguments from Bayesian nonparametrics. We infer both the needed set of connections or locally competing sets of units, as well as the required floating-point precision for storing the network parameters. Specifically, we introduce auxiliary discrete latent variables representing which initial network components are actually needed for modeling the data at hand, and perform Bayesian inference over them by imposing appropriate stick-breaking priors. As we experimentally show using benchmark datasets, our approach yields networks with a smaller computational footprint than the state-of-the-art, and with no compromises in predictive accuracy.

Submitted 5 May, 2019; v1 submitted 19 May, 2018; originally announced May 2018.

Comments: Proc. ICML 2019

arXiv:1804.07672 [pdf, other]

Unsupervised learning of the brain connectivity dynamic using residual D-net

Authors: Youngjoo Seo, Manuel Morante, Yannis Kopsinis, Sergios Theodoridis

Abstract: In this paper, we propose a novel unsupervised learning method to learn the brain dynamics using a deep learning architecture named residual D-net. As is often the case in medical research, in contrast to typical deep learning tasks, the size of the resting-state functional Magnetic Resonance Image (rs-fMRI) datasets for training is limited. Thus, the available data should be used very efficiently to learn the complex patterns underneath the brain connectivity dynamics. To address this issue, we use residual connections to alleviate the training complexity through recurrent multi-scale representation. We conduct two classification tasks to differentiate early and late stage Mild Cognitive Impairment (MCI) from Normal healthy Control (NC) subjects. The experiments verify that our proposed residual D-net indeed learns the brain connectivity dynamics, leading to significantly higher classification accuracy compared to previously published techniques.

Submitted 28 February, 2019; v1 submitted 20 April, 2018; originally announced April 2018.

Comments: 10 pages, 5 figures and 3 tables, under review in MIDL 2018

MSC Class: 62P10

arXiv:1703.08131 [pdf, ps, other]

doi 10.1109/TSP.2017.2781640

Online Distributed Learning Over Networks in RKH Spaces Using Random Fourier Features

Authors: Pantelis Bouboulis, Symeon Chouvardas, Sergios Theodoridis

Abstract: We present a novel diffusion scheme for online kernel-based learning over networks. So far, a major drawback of any online learning algorithm, operating in a reproducing kernel Hilbert space (RKHS), is the need for updating a growing number of parameters as time iterations evolve. Besides complexity, this leads to an increased need of communication resources, in a distributed setting. In contrast, the proposed method approximates the solution as a fixed-size vector (of larger dimension than the input space) using Random Fourier Features. This paves the way to use standard linear combine-then-adapt techniques. To the best of our knowledge, this is the first time that a complete protocol for distributed online learning in RKHS is presented. Conditions for asymptotic convergence and boundedness of the networkwise regret are also provided. The simulated tests illustrate the performance of the proposed scheme.

Submitted 24 March, 2017; v1 submitted 23 March, 2017; originally announced March 2017.

arXiv:1606.03685 [pdf, ps, other]

Efficient KLMS and KRLS Algorithms: A Random Fourier Feature Perspective

Authors: Pantelis Bouboulis, Spyridon Pougkakiotis, Sergios Theodoridis

Abstract: We present a new framework for online Least Squares algorithms for nonlinear modeling in RKH spaces (RKHS). Instead of implicitly mapping the data to a RKHS (e.g., kernel trick), we map the data to a finite dimensional Euclidean space, using random features of the kernel's Fourier transform. The advantage is that the inner product of the mapped data approximates the kernel function. The resulting "linear" algorithm does not require any form of sparsification, since, in contrast to all existing algorithms, the solution's size remains fixed and does not increase with the iteration steps. As a result, the obtained algorithms are computationally significantly more efficient compared to previously derived variants, while, at the same time, they converge at similar speeds and to similar error floors.

Submitted 12 June, 2016; originally announced June 2016.

Comments: presented in the 2016 IEEE Workshop on Statistical Signal Processing (SSP 16)

ACM Class: K.3.2, I.5.4

arXiv:1601.00595 [pdf, other]

doi 10.1109/TSP.2017.2708029

Robust Non-linear Regression: A Greedy Approach Employing Kernels with Application to Image Denoising

Authors: George Papageorgiou, Pantelis Bouboulis, Sergios Theodoridis

Abstract: We consider the task of robust non-linear regression in the presence of both inlier noise and outliers. Assuming that the unknown non-linear function belongs to a Reproducing Kernel Hilbert Space (RKHS), our goal is to estimate the set of the associated unknown parameters. Due to the presence of outliers, common techniques such as the Kernel Ridge Regression (KRR) or the Support Vector Regression (SVR) turn out to be inadequate. Instead, we employ sparse modeling arguments to explicitly model and estimate the outliers, adopting a greedy approach. The proposed robust scheme, i.e., Kernel Greedy Algorithm for Robust Denoising (KGARD), is inspired by the classical Orthogonal Matching Pursuit (OMP) algorithm. Specifically, the proposed method alternates between a KRR task and an OMP-like selection step. Theoretical results concerning the identification of the outliers are provided. Moreover, KGARD is compared against other cutting edge methods, where its performance is evaluated via a set of experiments with various types of noise. Finally, the proposed robust estimation framework is applied to the task of image denoising, and its enhanced performance in the presence of outliers is demonstrated.

Submitted 3 August, 2016; v1 submitted 4 January, 2016; originally announced January 2016.

arXiv:1410.3682 [pdf, ps, other]

doi 10.1109/TSP.2015.2393839

Greedy Sparsity-Promoting Algorithms for Distributed Learning

Authors: Symeon Chouvardas, Gerasimos Mileounis, Nicholas Kalouptsidis, Sergios Theodoridis

Abstract: This paper focuses on the development of novel greedy techniques for distributed learning under sparsity constraints. Greedy techniques have widely been used in centralized systems due to their low computational requirements and at the same time their relatively good performance in estimating sparse parameter vectors/signals. The paper reports two new algorithms in the context of sparsity-aware learning. In both cases, the goal is first to identify the support set of the unknown signal and then to estimate the non-zero values restricted to the active support set. First, an iterative greedy multi-step procedure is developed, based on a neighborhood cooperation strategy, using batch processing on the observed data. Next, an extension of the algorithm to the online setting, based on the diffusion LMS rationale for adaptivity, is derived. Theoretical analysis of the algorithms is provided, where it is shown that the batch algorithm converges to the unknown vector if a Restricted Isometry Property (RIP) holds. Moreover, the online version converges in the mean to the solution vector under some general assumptions. Finally, the proposed schemes are tested against recently developed sparsity-promoting algorithms and their enhanced performance is verified via simulation examples.

Submitted 14 October, 2014; originally announced October 2014.

Comments: Paper submitted to IEEE Transactions on Signal Processing

arXiv:1409.4279 [pdf, ps, other]

doi 10.1109/TSP.2015.2430840

Robust Linear Regression Analysis - A Greedy Approach

Authors: George Papageorgiou, Pantelis Bouboulis, Sergios Theodoridis, Kostantinos Themelis

Abstract: The task of robust linear estimation in the presence of outliers is of particular importance in signal processing, statistics and machine learning. Although the problem was stated a few decades ago and solved using classical (considered nowadays) methods, recently it has attracted more attention in the context of sparse modeling, where several notable contributions have been made. In the present manuscript, a new approach is considered in the framework of greedy algorithms. The noise is split into two components: a) the inlier bounded noise and b) the outliers, which are explicitly modeled by employing sparsity arguments. Based on this scheme, a novel efficient algorithm (Greedy Algorithm for Robust Denoising - GARD), is derived. GARD alternates between a least squares optimization criterion and an Orthogonal Matching Pursuit (OMP) selection step that identifies the outliers. The case where only outliers are present has been studied separately, where bounds on the Restricted Isometry Property guarantee that the recovery of the signal via GARD is exact. Moreover, theoretical results concerning convergence as well as the derivation of error bounds in the case of additional bounded noise are discussed. Finally, we provide extensive simulations, which demonstrate the comparative advantages of the new technique.

Submitted 8 May, 2015; v1 submitted 15 September, 2014; originally announced September 2014.

arXiv:1303.2184 [pdf, ps, other]

doi 10.1109/TNNLS.2014.2336679

Complex Support Vector Machines for Regression and Quaternary Classification

Authors: Pantelis Bouboulis, Sergios Theodoridis, Charalampos Mavroforakis, Leoni Dalla

Abstract: The paper presents a new framework for complex Support Vector Regression as well as Support Vector Machines for quaternary classification. The method exploits the notion of widely linear estimation to model the input-output relation for complex-valued data and considers two cases: a) the complex data are split into their real and imaginary parts and a typical real kernel is employed to map the complex data to a complexified feature space and b) a pure complex kernel is used to directly map the data to the induced complex feature space. The recently developed Wirtinger's calculus on complex reproducing kernel Hilbert spaces (RKHS) is employed in order to compute the Lagrangian and derive the dual optimization problem. As one of our major results, we prove that any complex SVM/SVR task is equivalent to solving two real SVM/SVR tasks exploiting a specific real kernel which is generated by the chosen complex kernel. In particular, the case of pure complex kernels leads to the generation of new kernels, which have not been considered before. In the classification case, the proposed framework inherently splits the complex space into four parts. This leads naturally to solving the four-class task (quaternary classification), instead of the typical two classes of the real SVM. In turn, this rationale can be used in a multiclass problem as a split-class scenario based on four classes, as opposed to the one-versus-all method; this can lead to significant computational savings.
Experiments demonstrate the effectiveness of the proposed framework for regression and classification tasks that involve complex data.

Submitted 15 July, 2014; v1 submitted 9 March, 2013; originally announced March 2013.

Comments: Manuscript accepted in IEEE Transactions on Neural Networks and Learning Systems

arXiv:1303.2136 [pdf, ps, other]

Preamble-based Channel Estimation in OFDM/OQAM Systems: A Review

Authors: E. Kofidis, D. Katselis, A. Rontogiannis, S. Theodoridis

Abstract: Filter bank-based multicarrier communications (FBMC) have recently attracted increased interest in both wired (e.g., xDSL, PLC) and wireless (e.g., cognitive radio) applications, due to their enhanced flexibility, higher spectral efficiency, and better spectral containment compared to conventional OFDM. A particular type of FBMC, the so-called FBMC/OQAM or OFDM/OQAM system, consisting of pulse shaped OFDM carrying offset QAM (OQAM) symbols, has received increasing attention due to, among other features, its higher spectral efficiency and implementation simplicity. It suffers, however, from an imaginary inter-carrier/inter-symbol interference that complicates signal processing tasks such as channel estimation. This paper focuses on channel estimation for OFDM/OQAM systems based on a known preamble. A review of the existing preamble structures and associated channel estimation methods is given, for both single- (SISO) and multiple-antenna (MIMO) systems. The various preambles are compared via simulations in both mildly and highly frequency selective channels.

Submitted 8 March, 2013; originally announced March 2013.

Comments: This is an early version of a paper to appear in Signal Processing (Elsevier)

arXiv:1211.5231 [pdf, other]

Sparsity-Aware Learning and Compressed Sensing: An Overview

Authors: Sergios Theodoridis, Yannis Kopsinis, Konstantinos Slavakis

Abstract: This paper is based on a chapter of a new book on Machine Learning, by the first and third author, which is currently under preparation. We provide an overview of the major theoretical advances as well as the main trends in algorithmic developments in the area of sparsity-aware learning and compressed sensing. Both batch processing and online processing techniques are considered. A case study in the context of time-frequency analysis of signals is also presented. Our intent is to update this review from time to time, since this is a very hot research area with a momentum and speed that is sometimes difficult to follow up.

Submitted 22 November, 2012; originally announced November 2012.

arXiv:1112.5716 [pdf, ps, other]

doi 10.1109/TSP.2012.2204987

A Sparsity-Aware Adaptive Algorithm for Distributed Learning

Authors: Symeon Chouvardas, Konstantinos Slavakis, Yannis Kopsinis, Sergios Theodoridis

Abstract: In this paper, a sparsity-aware adaptive algorithm for distributed learning in diffusion networks is developed. The algorithm follows the set-theoretic estimation rationale. At each time instance and at each node of the network, a closed convex set, known as property set, is constructed based on the received measurements; this defines the region in which the solution is searched for. In this paper, the property sets take the form of hyperslabs. The goal is to find a point that belongs to the intersection of these hyperslabs. To this end, sparsity encouraging variable metric projections onto the hyperslabs have been adopted. Moreover, sparsity is also imposed by employing variable metric projections onto weighted $\ell_1$ balls. A combine adapt cooperation strategy is adopted. Under some mild assumptions, the scheme enjoys monotonicity, asymptotic optimality and strong convergence to a point that lies in the consensus subspace. Finally, numerical examples verify the validity of the proposed scheme, compared to other algorithms, which have been developed in the context of sparse adaptive learning.

Submitted 24 December, 2011; originally announced December 2011.

arXiv:1112.0665 [pdf, ps, other]

Generalized Thresholding and Online Sparsity-Aware Learning in a Union of Subspaces

Authors: Konstantinos Slavakis, Yannis Kopsinis, Sergios Theodoridis, Stephen McLaughlin

Abstract: This paper studies a sparse signal recovery task in time-varying (time-adaptive) environments. The contribution of the paper to sparsity-aware online learning is threefold; first, a Generalized Thresholding (GT) operator, which relates to both convex and non-convex penalty functions, is introduced. This operator embodies, in a unified way, the majority of well-known thresholding rules which promote sparsity. Second, a non-convexly constrained, sparsity-promoting, online learning scheme, namely the Adaptive Projection-based Generalized Thresholding (APGT), is developed that incorporates the GT operator with a computational complexity that scales linearly with the number of unknowns. Third, the novel family of partially quasi-nonexpansive mappings is introduced as a functional analytic tool for treating the GT operator. By building upon the rich fixed point theory, the previous class of mappings helps us, also, to establish a link between the GT operator and a union of linear subspaces; a non-convex object which lies at the heart of any sparsity promoting technique, batch or online. Based on such a functional analytic framework, a convergence analysis of the APGT is provided. Furthermore, extensive experiments suggest that the APGT exhibits competitive performance when compared to computationally more demanding alternatives, such as the sparsity-promoting Affine Projection Algorithm (APA)- and Recursive Least Squares (RLS)-based techniques.

Submitted 29 November, 2012; v1 submitted 3 December, 2011; originally announced December 2011.

arXiv:1110.1075 [pdf, ps, other]

doi 10.1109/TSP.2012.2200479

The Augmented Complex Kernel LMS

Authors: Pantelis Bouboulis, Sergios Theodoridis, Michael Mavroforakis

Abstract: Recently, a unified framework for adaptive kernel based signal processing of complex data was presented by the authors, which, besides offering techniques to map the input data to complex Reproducing Kernel Hilbert Spaces, developed a suitable Wirtinger-like Calculus for general Hilbert Spaces. In this short paper, the extended Wirtinger's calculus is adopted to derive complex kernel-based widely-linear estimation filters. Furthermore, we illuminate several important characteristics of the widely linear filters. We show that, although in many cases the gains from adopting widely linear estimation filters, as alternatives to ordinary linear ones, are rudimentary, for the case of kernel based widely linear filters significant performance improvements can be obtained.

Submitted 5 October, 2011; originally announced October 2011.

Comments: manuscript submitted to IEEE Transactions on Signal Processing

arXiv:1011.5962 [pdf, ps, other]

Edge Preserving Image Denoising in Reproducing Kernel Hilbert Spaces

Authors: Pantelis Bouboulis, Sergios Theodoridis

Abstract: The goal of this paper is the development of a novel approach for the problem of Noise Removal, based on the theory of Reproducing Kernel Hilbert Spaces (RKHS). The problem is cast as an optimization task in a RKHS, by taking advantage of the celebrated semiparametric Representer Theorem. Examples verify that in the presence of Gaussian noise the proposed method performs relatively well compared to wavelet based techniques and outperforms them significantly in the presence of impulse or mixed noise. A more detailed version of this work has been published in the IEEE Trans. Im. Proc.: P. Bouboulis, K. Slavakis and S. Theodoridis, Adaptive Kernel-based Image Denoising employing Semi-Parametric Regularization, IEEE Transactions on Image Processing, vol 19(6), 2010, 1465 - 1479.

Submitted 27 November, 2010; originally announced November 2010.

Comments: This work has been selected for the Best Scientific Paper Award (Track III: Signal, Speech, Image and Video Processing) at the ICPR 2010

Journal ref: Proceedings of the 20th International Conference on Pattern Recognition, Istanbul: Turkey, 23-26 August 2010

arXiv:1006.3033 [pdf, ps, other]

doi 10.1109/TSP.2010.2096420

Extension of Wirtinger's Calculus to Reproducing Kernel Hilbert Spaces and the Complex Kernel LMS

Authors: Pantelis Bouboulis, Sergios Theodoridis

Abstract: Over the last decade, kernel methods for nonlinear processing have successfully been used in the machine learning community. The primary mathematical tool employed in these methods is the notion of the Reproducing Kernel Hilbert Space. However, so far, the emphasis has been on batch techniques. It is only recently that online techniques have been considered in the context of adaptive signal processing tasks. Moreover, these efforts have only been focussed on real valued data sequences. To the best of our knowledge, no adaptive kernel-based strategy has been developed, so far, for complex valued signals. Furthermore, although the real reproducing kernels are used in an increasing number of machine learning problems, complex kernels have not, yet, been used, in spite of their potential interest in applications that deal with complex signals, with Communications being a typical example. In this paper, we present a general framework to attack the problem of adaptive filtering of complex signals, using either real reproducing kernels, taking advantage of a technique called complexification of real RKHSs, or complex reproducing kernels, highlighting the use of the complex Gaussian kernel. In order to derive gradients of operators that need to be defined on the associated complex RKHSs, we employ the powerful tool of Wirtinger's Calculus, which has recently attracted attention in the signal processing community.
To this end, in this paper, the notion of Wirtinger's calculus is extended, for the first time, to include complex RKHSs and used to derive several realizations of the Complex Kernel Least-Mean-Square (CKLMS) algorithm. Experiments verify that the CKLMS offers significant performance improvements over several linear and nonlinear algorithms, when dealing with nonlinearities.

Submitted 27 November, 2010; v1 submitted 15 June, 2010; originally announced June 2010.

Comments: 15 pages (double column), preprint of article accepted in IEEE Trans. Sig. Proc

arXiv:1005.0902 [pdf, ps, other]

Extension of Wirtinger Calculus in RKH Spaces and the Complex Kernel LMS

Authors: Pantelis Bouboulis, Sergios Theodoridis

Abstract: Over the last decade, kernel methods for nonlinear processing have successfully been used in the machine learning community. However, so far, the emphasis has been on batch techniques. It is only recently that online adaptive techniques have been considered in the context of signal processing tasks. To the best of our knowledge, no kernel-based strategy has been developed, so far, that is able to deal with complex valued signals. In this paper, we take advantage of a technique called complexification of real RKHSs to attack this problem. In order to derive gradients and subgradients of operators that need to be defined on the associated complex RKHSs, we employ the powerful tool of Wirtinger's Calculus, which has recently attracted much attention in the signal processing community. Wirtinger's calculus simplifies computations and offers an elegant tool for treating complex signals. To this end, in this paper, the notion of Wirtinger's calculus is extended, for the first time, to include complex RKHSs and used to derive the Complex Kernel Least-Mean-Square (CKLMS) algorithm. Experiments verify that the CKLMS can be used to derive nonlinear stable algorithms, which offer significant performance improvements over the traditional complex LMS or Widely Linear complex LMS (WL-LMS) algorithms, when dealing with nonlinearities.

Submitted 25 May, 2010; v1 submitted 6 May, 2010; originally announced May 2010.

Comments: 6 pages, 3 figures manuscript submitted to MLSP 2010

arXiv:1005.0897 [pdf, ps, other]

The Complex Gaussian Kernel LMS algorithm

Authors: Pantelis Bouboulis, Sergios Theodoridis

Abstract: Although the real reproducing kernels are used in an increasing number of machine learning problems, complex kernels have not, yet, been used, in spite of their potential interest in applications such as communications. In this work, we focus our attention on the complex Gaussian kernel and its possible application in the complex Kernel LMS algorithm. In order to derive the gradients needed to develop the complex kernel LMS (CKLMS), we employ the powerful tool of Wirtinger's Calculus, which has recently attracted much attention in the signal processing community. Wirtinger's calculus simplifies computations and offers an elegant tool for treating complex signals. To this end, the notion of Wirtinger's calculus is extended to include complex RKHSs. Experiments verify that the CKLMS offers significant performance improvements over the traditional complex LMS or Widely Linear complex LMS (WL-LMS) algorithms, when dealing with nonlinearities.

Submitted 6 May, 2010; originally announced May 2010.

Comments: 10 pages, 3 figures Manuscript submitted to ICANN 2010

arXiv:1004.3040 [pdf, ps, other]

doi 10.1109/TSP.2010.2090874

Online Sparse System Identification and Signal Reconstruction using Projections onto Weighted $\ell_1$ Balls

Authors: Yannis Kopsinis, Konstantinos Slavakis, Sergios Theodoridis

Abstract: This paper presents a novel projection-based adaptive algorithm for sparse signal and system identification. The sequentially observed data are used to generate an equivalent sequence of closed convex sets, namely hyperslabs. Each hyperslab is the geometric equivalent of a cost criterion, that quantifies "data mismatch". Sparsity is imposed by the introduction of appropriately designed weighted $\ell_1$ balls. The algorithm develops around projections onto the sequence of the generated hyperslabs as well as the weighted $\ell_1$ balls. The resulting scheme exhibits linear dependence, with respect to the unknown system's order, on the number of multiplications/additions and an $\mathcal{O}(L\log_2L)$ dependence on sorting operations, where $L$ is the length of the system/signal to be estimated. Numerical results are also given to validate the performance of the proposed method against the LASSO algorithm and two very recently developed adaptive sparse LMS and LS-type of adaptive algorithms, which are considered to belong to the same algorithmic family.

Submitted 18 April, 2010; originally announced April 2010.

Comments: Extended version of preprint submitted to IEEE Trans. on Signal Processing

arXiv:0910.3928 [pdf, ps, other]

doi 10.1109/TSP.2010.2043129

Preamble-Based Channel Estimation for CP-OFDM and OFDM/OQAM Systems: A Comparative Study

Authors: Dimitris Katselis, Eleftherios Kofidis, Athanasios Rontogiannis, Sergios Theodoridis

Abstract: In this paper, preamble-based least squares (LS) channel estimation in OFDM systems of the QAM and offset QAM (OQAM) types is considered, in both the frequency and the time domains. The construction of optimal (in the mean squared error (MSE) sense) preambles is investigated, for both the cases of full (all tones carrying pilot symbols) and sparse (a subset of pilot tones, surrounded by nulls or data) preambles. The two OFDM systems are compared for the same transmit power, which, for cyclic prefix (CP) based OFDM/QAM, also includes the power spent for CP transmission. OFDM/OQAM, with a sparse preamble consisting of equipowered and equispaced pilots embedded in zeros, turns out to perform at least as well as CP-OFDM. Simulation results are presented that verify the analysis.

Submitted 20 October, 2009; originally announced October 2009.

Showing 1–33 of 33 results for author: Theodoridis, S