-
Capacity-Maximizing Input Symbol Selection for Discrete Memoryless Channels
Authors:
Maximilian Egger,
Rawad Bitar,
Antonia Wachter-Zeh,
Deniz Gündüz,
Nir Weinberger
Abstract:
Motivated by communication systems with constrained complexity, we consider the problem of input symbol selection for discrete memoryless channels (DMCs). Given a DMC, the goal is to find a subset of its input alphabet, so that the optimal input distribution that is only supported on these symbols maximizes the capacity among all other subsets of the same size (or smaller). We observe that the resulting optimization problem is non-concave and non-submodular, and so generic methods for such cases do not have theoretical guarantees. We derive an analytical upper bound on the capacity loss when selecting a subset of input symbols based only on the properties of the transition matrix of the channel. We propose a selection algorithm that is based on input-symbol clustering and an appropriate choice of representatives for each cluster, and that uses the theoretical bound as a surrogate objective function. We provide numerical experiments to support the findings.
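To make the selection problem concrete, the following is a minimal sketch of an exhaustive-search baseline, not the clustering algorithm proposed in the paper. It assumes a row-stochastic transition matrix `W[x][y] = P(y | x)` and computes the capacity of each restricted channel with the standard Blahut-Arimoto iteration. The function names `blahut_arimoto` and `best_subset`, and the toy channel `W`, are illustrative assumptions.

```python
import math
from itertools import combinations


def blahut_arimoto(W, iters=2000, tol=1e-10):
    """Return (capacity in bits, input distribution) for W[x][y] = P(y | x)."""
    n_in, n_out = len(W), len(W[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(iters):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # d[x] = KL(W[x] || q) in nats; zero-probability outputs contribute 0.
        d = [
            sum(W[x][y] * math.log(W[x][y] / q[y]) for y in range(n_out) if W[x][y] > 0)
            for x in range(n_in)
        ]
        z = sum(p[x] * math.exp(d[x]) for x in range(n_in))
        # Standard bounds: log(z) <= capacity (nats) <= max(d).
        lower, upper = math.log(z), max(d)
        p = [p[x] * math.exp(d[x]) / z for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower / math.log(2), p


def best_subset(W, k):
    """Exhaustively search all size-k input subsets (exponential in general)."""
    best_cap, best_rows = -1.0, None
    for rows in combinations(range(len(W)), k):
        cap, _ = blahut_arimoto([W[x] for x in rows])
        if cap > best_cap:
            best_cap, best_rows = cap, rows
    return best_rows, best_cap


# Toy channel (an assumption): two noiseless inputs and one useless input.
W = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
subset, capacity = best_subset(W, 2)
print(subset, round(capacity, 6))
```

The search visits every subset of the requested size, so it is only practical for small alphabets. That cost is what motivates a surrogate objective such as the bound described in the abstract.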
Submitted 1 July, 2024;
originally announced July 2024.
-
A Toolbox for Refined Information-Theoretic Analyses with Applications
Authors:
Neri Merhav,
Nir Weinberger
Abstract:
This monograph offers a toolbox of mathematical techniques that have been effective and widely applicable in information-theoretic analysis. The first tool is a generalization of the method of types to Gaussian settings, and then to general exponential families. The second tool is Laplace and saddle-point integration, which allow refining the results of the method of types and are capable of yielding more precise results. The third is the type class enumeration method, a principled method to evaluate the exact random-coding exponent of coded systems, which results in the best known exponent in various problem settings. The fourth set of tools is aimed at evaluating the expectation of non-linear functions of random variables, either via integral representations, or by a refinement of Jensen's inequality via change-of-measure, by complementing Jensen's inequality with a reversed inequality, or by a class of generalized Jensen's inequalities that are applicable to functions beyond convex/concave ones. Various application examples of all these tools are provided throughout this monograph.
Submitted 2 June, 2024;
originally announced June 2024.
-
On Bits and Bandits: Quantifying the Regret-Information Trade-off
Authors:
Itai Shufaro,
Nadav Merlis,
Nir Weinberger,
Shie Mannor
Abstract:
In interactive decision-making tasks, information can be acquired by direct interactions, through receiving indirect feedback, and from external knowledgeable sources. We examine the trade-off between the information an agent accumulates and the regret it suffers. We show that information from external sources, measured in bits, can be traded off for regret, measured in reward. We invoke information-theoretic methods for obtaining regret lower bounds, that also allow us to easily re-derive several known lower bounds. We then generalize a variety of interactive decision-making tasks with external information to a new setting. Using this setting, we introduce the first Bayesian regret lower bounds that depend on the information an agent accumulates. These lower bounds also prove the near-optimality of Thompson sampling for Bayesian problems. Finally, we demonstrate the utility of these bounds in improving the performance of a question-answering task with large language models, allowing us to obtain valuable insights.
Submitted 26 May, 2024;
originally announced May 2024.
-
Capacity of Frequency-based Channels: Encoding Information in Molecular Concentrations
Authors:
Yuval Gerzon,
Ilan Shomorony,
Nir Weinberger
Abstract:
We consider a molecular channel, in which messages are encoded in the frequency of objects (or concentration of molecules) in a pool, and whose output during reading time is a noisy version of the input frequencies, as obtained by sampling with replacement from the pool. We tightly characterize the capacity of this channel using upper and lower bounds, when the number of objects in the pool is constrained. We apply this result to the DNA storage channel in the short-molecule regime, and show that even though the capacity of this channel is technically zero, it can still achieve a large information density.
Submitted 13 May, 2024;
originally announced May 2024.
-
Information Rates Over Multi-View Channels
Authors:
V. Arvind Rameshwar,
Nir Weinberger
Abstract:
We investigate the fundamental limits of reliable communication over multi-view channels, in which the channel output is comprised of a large number of independent noisy views of a transmitted symbol. We consider first the setting of multi-view discrete memoryless channels and then extend our results to general multi-view channels (using multi-letter formulas). We argue that the channel capacity and dispersion of such multi-view channels converge exponentially fast in the number of views to the entropy and varentropy of the input distribution, respectively. We identify the exact rate of convergence as the smallest Chernoff information between two conditional distributions of the output, conditioned on unequal inputs. For the special case of the deletion channel, we compute upper bounds on this Chernoff information. Finally, we present a new channel model we term the Poisson approximation channel -- of possible independent interest -- whose capacity closely approximates the capacity of the multi-view binary symmetric channel for any fixed number of views.
Submitted 12 May, 2024;
originally announced May 2024.
-
A representation-learning game for classes of prediction tasks
Authors:
Neria Uzan,
Nir Weinberger
Abstract:
We propose a game-based formulation for learning dimensionality-reducing representations of feature vectors, when only prior knowledge on future prediction tasks is available. In this game, the first player chooses a representation, and then the second player adversarially chooses a prediction task from a given class, representing the prior knowledge. The first player aims to minimize, and the second player to maximize, the regret: the minimal prediction loss using the representation, compared to the same loss using the original features. For the canonical setting in which the representation, the response to predict and the predictors are all linear functions, and under the mean squared error loss function, we derive the theoretically optimal representation in pure strategies, which shows the effectiveness of the prior knowledge, and the optimal regret in mixed strategies, which shows the usefulness of randomizing the representation. For general representations and loss functions, we propose an efficient algorithm to optimize a randomized representation. The algorithm only requires the gradients of the loss function, and is based on incrementally adding a representation rule to a mixture of such rules.
Submitted 11 March, 2024;
originally announced March 2024.
-
Statistical curriculum learning: An elimination algorithm achieving an oracle risk
Authors:
Omer Cohen,
Ron Meir,
Nir Weinberger
Abstract:
We consider a statistical version of curriculum learning (CL) in a parametric prediction setting. The learner is required to estimate a target parameter vector, and can adaptively collect samples from either the target model, or other source models that are similar to the target model, but less noisy. We consider three types of learners, depending on the level of side-information they receive. The first two, referred to as strong/weak-oracle learners, receive high/low degrees of information about the models, and use these to learn. The third, a fully adaptive learner, estimates the target parameter vector without any prior information. In the single source case, we propose an elimination learning method, whose risk matches that of a strong-oracle learner. In the multiple source case, we advocate that the risk of the weak-oracle learner is a realistic benchmark for the risk of adaptive learners. We develop an adaptive multiple elimination-rounds CL algorithm, and characterize instance-dependent conditions for its risk to match that of the weak-oracle learner. We consider instance-dependent minimax lower bounds, and discuss the challenges associated with defining the class of instances for the bound. We derive two minimax lower bounds, and determine the conditions under which the performance of the weak-oracle learner is minimax optimal.
Submitted 20 February, 2024;
originally announced February 2024.
-
Characterization of the Distortion-Perception Tradeoff for Finite Channels with Arbitrary Metrics
Authors:
Dror Freirich,
Nir Weinberger,
Ron Meir
Abstract:
Whenever inspected by humans, reconstructed signals should not be distinguished from real ones. Typically, such a high perceptual quality comes at the price of high reconstruction error, and vice versa. We study this distortion-perception (DP) tradeoff over finite-alphabet channels, for the Wasserstein-$1$ distance induced by a general metric as the perception index, and an arbitrary distortion matrix. Under this setting, we show that computing the DP function and the optimal reconstructions is equivalent to solving a set of linear programming problems. We provide a structural characterization of the DP tradeoff, where the DP function is piecewise linear in the perception index. We further derive a closed-form expression for the case of binary sources.
Submitted 3 February, 2024;
originally announced February 2024.
-
The Joint Effect of Task Similarity and Overparameterization on Catastrophic Forgetting -- An Analytical Model
Authors:
Daniel Goldfarb,
Itay Evron,
Nir Weinberger,
Daniel Soudry,
Paul Hand
Abstract:
In continual learning, catastrophic forgetting is affected by multiple aspects of the tasks. Previous works have analyzed separately how forgetting is affected by either task similarity or overparameterization. In contrast, our paper examines how task similarity and overparameterization jointly affect forgetting in an analyzable model. Specifically, we focus on two-task continual linear regression, where the second task is a random orthogonal transformation of an arbitrary first task (an abstraction of random permutation tasks). We derive an exact analytical expression for the expected forgetting - and uncover a nuanced pattern. In highly overparameterized models, intermediate task similarity causes the most forgetting. However, near the interpolation threshold, forgetting decreases monotonically with the expected task similarity. We validate our findings with linear regression on synthetic data, and with neural networks on established permutation task benchmarks.
Submitted 24 January, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
Maximal-Capacity Discrete Memoryless Channel Identification
Authors:
Maximilian Egger,
Rawad Bitar,
Antonia Wachter-Zeh,
Deniz Gündüz,
Nir Weinberger
Abstract:
The problem of identifying the channel with the highest capacity among several discrete memoryless channels (DMCs) is considered. The problem is cast as a pure-exploration multi-armed bandit problem, which follows the practical use of training sequences to sense the communication channel statistics. A capacity estimator is proposed and tight confidence bounds on the estimator error are derived. Based on this capacity estimator, a gap-elimination algorithm termed BestChanID is proposed, which is oblivious to the capacity-achieving input distribution and is guaranteed to output the DMC with the largest capacity, with a desired confidence. Furthermore, two additional algorithms NaiveChanSel and MedianChanEl, that output with certain confidence a DMC with capacity close to the maximal, are introduced. Each of those algorithms is beneficial in a different regime and can be used as a subroutine in BestChanID. The sample complexity of all algorithms is analyzed as a function of the desired confidence parameter, the number of channels, and the channels' input and output alphabet sizes. The cost of best channel identification is shown to scale quadratically with the alphabet size, and a fundamental lower bound for the required number of channel senses to identify the best channel with a certain confidence is derived.
Submitted 18 January, 2024;
originally announced January 2024.
-
How do Minimum-Norm Shallow Denoisers Look in Function Space?
Authors:
Chen Zeno,
Greg Ongie,
Yaniv Blumenfeld,
Nir Weinberger,
Daniel Soudry
Abstract:
Neural network (NN) denoisers are an essential building block in many common tasks, ranging from image reconstruction to image generation. However, the success of these models is not well understood from a theoretical perspective. In this paper, we aim to characterize the functions realized by shallow ReLU NN denoisers -- in the common theoretical setting of interpolation (i.e., zero training loss) with a minimal representation cost (i.e., minimal $\ell^2$ norm weights). First, for univariate data, we derive a closed form for the NN denoiser function, find it is contractive toward the clean data points, and prove it generalizes better than the empirical MMSE estimator at a low noise level. Next, for multivariate data, we find the NN denoiser functions in a closed form under various geometric assumptions on the training data: data contained in a low-dimensional subspace, data contained in a union of one-sided rays, or several types of simplexes. These functions decompose into a sum of simple rank-one piecewise linear interpolations aligned with edges and/or faces connecting training samples. We empirically verify this alignment phenomenon on synthetic data and real images.
Submitted 16 January, 2024; v1 submitted 12 November, 2023;
originally announced November 2023.
-
M-DAB: An Input-Distribution Optimization Algorithm for Composite DNA Storage by the Multinomial Channel
Authors:
Adir Kobovich,
Eitan Yaakobi,
Nir Weinberger
Abstract:
Recent experiments have shown that the capacity of DNA storage systems may be significantly increased by synthesizing composite DNA letters. In this work, we model a DNA storage channel with composite inputs as a \textit{multinomial channel}, and propose an optimization algorithm for its capacity-achieving input distribution, for an arbitrary number of output reads. The algorithm is termed multidimensional dynamic assignment Blahut-Arimoto (M-DAB), and is a generalized version of the DAB algorithm proposed by Wesel et al. for the binomial channel. We also empirically observe a scaling law behavior of the capacity as a function of the support size of the capacity-achieving input distribution.
Submitted 29 September, 2023;
originally announced September 2023.
-
Fundamental Limits of Reference-Based Sequence Reordering
Authors:
Nir Weinberger,
Ilan Shomorony
Abstract:
The problem of reconstructing a sequence of independent and identically distributed symbols from a set of equal-size, consecutive fragments, as well as a dependent reference sequence, is considered. First, in the regime in which the fragments are relatively long, and typically no fragment appears more than once, the scaling of the failure probability of the maximum likelihood reconstruction algorithm is exactly determined for perfect reconstruction and bounded for partial reconstruction. Second, the regime in which the fragments are relatively short and repeating fragments abound is characterized. A trade-off is stated between the fraction of fragments that cannot be adequately reconstructed vs. the distortion level allowed for the reconstruction of each fragment, while still allowing vanishing failure probability.
Submitted 19 July, 2023;
originally announced July 2023.
-
On Mismatched Oblivious Relaying
Authors:
Michael Dikshtein,
Nir Weinberger,
Shlomo Shamai
Abstract:
We consider the problem of reliable communication over a discrete memoryless channel (DMC) with the help of a relay, termed the information bottleneck (IB) channel. There is no direct link between the source and the destination, and the information flows in two hops. The first hop is a noisy channel from the source to the relay. The second hop is a noiseless but limited-capacity backhaul link from the relay to the decoder. We further assume that the relay is oblivious to the transmission codebook. We examine two mismatch scenarios. In the first setting, we assume the decoder is restricted to use some fixed decoding rule, which is mismatched to the actual channel. In the second setting, we assume that the relay is restricted to use some fixed compression metric, which is again mismatched to the statistics of the relay input. We establish bounds on the random-coding capacity of both settings, some of which are shown to be ensemble tight.
Submitted 1 May, 2023;
originally announced May 2023.
-
Quantifying the Loss of Acyclic Join Dependencies
Authors:
Batya Kenig,
Nir Weinberger
Abstract:
Acyclic schemes possess known benefits for database design, speeding up queries, and reducing space requirements. An acyclic join dependency (AJD) is lossless with respect to a universal relation if joining the projections associated with the schema results in the original universal relation. An intuitive and standard measure of loss entailed by an AJD is the number of redundant tuples generated by the acyclic join. Recent work has shown that the loss of an AJD can also be characterized by an information-theoretic measure. Motivated by the problem of automatically fitting an acyclic schema to a universal relation, we investigate the connection between these two characterizations of loss. We first show that the loss of an AJD is captured using the notion of KL-divergence. We then show that the KL-divergence can be used to bound the number of redundant tuples. We prove a deterministic lower bound on the percentage of redundant tuples. For an upper bound, we propose a random database model, and establish a high probability bound on the percentage of redundant tuples, which coincides with the lower bound for large databases.
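The two notions of loss mentioned above can be illustrated on a toy relation. The sketch below is an illustration under stated assumptions, not the paper's method. It takes a three-attribute relation `R(A, B, C)` and the schema `{AB, BC}`, counts the redundant tuples produced by joining the two projections, and computes the conditional mutual information I(A;C|B) under the uniform empirical distribution on `R`. For this two-bag schema, that quantity equals the KL-divergence between the empirical distribution and its factorization along the schema. The helper names and the relation `R` are hypothetical.

```python
import math
from collections import Counter


def project(relation, idx):
    """Project a set of tuples onto the attribute positions in idx."""
    return {tuple(t[i] for i in idx) for t in relation}


def join_ab_bc(relation):
    """Natural join of the AB and BC projections of a relation over (A, B, C)."""
    ab = project(relation, (0, 1))
    bc = project(relation, (1, 2))
    return {(a, b, c) for (a, b) in ab for (b2, c) in bc if b == b2}


def conditional_mutual_information(relation):
    """I(A; C | B) in bits under the uniform distribution on the relation."""
    n = len(relation)
    n_ab = Counter((a, b) for a, b, c in relation)
    n_bc = Counter((b, c) for a, b, c in relation)
    n_b = Counter(b for a, b, c in relation)
    # p(a,b,c) p(b) / (p(a,b) p(b,c)) simplifies to n_b / (n_ab * n_bc).
    return sum(
        (1.0 / n) * math.log2(n_b[b] / (n_ab[(a, b)] * n_bc[(b, c)]))
        for a, b, c in relation
    )


# Toy relation (an assumption): B is constant, A and C are fully correlated.
R = {(1, 1, 1), (2, 1, 2)}
spurious = join_ab_bc(R) - R
loss_bits = conditional_mutual_information(R)
print(sorted(spurious), loss_bits)
```

In this example the join produces two tuples that are not in `R`, and the information-theoretic loss is one bit. Both measures are zero exactly when the decomposition is lossless.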
Submitted 10 April, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Multi-Armed Bandits with Self-Information Rewards
Authors:
Nir Weinberger,
Michal Yemini
Abstract:
This paper introduces the informational multi-armed bandit (IMAB) model in which at each round, a player chooses an arm, observes a symbol, and receives an unobserved reward in the form of the symbol's self-information. Thus, the expected reward of an arm is the Shannon entropy of the probability mass function of the source that generates its symbols. The player aims to maximize the expected total reward associated with the entropy values of the arms played. Under the assumption that the alphabet size is known, two UCB-based algorithms are proposed for the IMAB model which consider the biases of the plug-in entropy estimator. The first algorithm optimistically corrects the bias term in the entropy estimation. The second algorithm relies on data-dependent confidence intervals that adapt to sources with small entropy values. Performance guarantees are provided by upper bounding the expected regret of each of the algorithms. Furthermore, in the Bernoulli case, the asymptotic behavior of these algorithms is compared to the Lai-Robbins lower bound for the pseudo regret. Additionally, under the assumption that the \textit{exact} alphabet size is unknown, and instead the player only knows a loose upper bound on it, a UCB-based algorithm is proposed, in which the player aims to reduce the regret caused by the unknown alphabet size in a finite time regime. Numerical results illustrating the expected regret of the algorithms presented in the paper are provided.
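As a small illustration of the estimation issue mentioned above, the sketch below computes the plug-in entropy estimate from observed symbols and adds the classical first-order bias term (K - 1) / (2n) nats for a known alphabet size K. This is a generic textbook correction used here as an assumption. It is not the paper's UCB algorithm, and the function names are hypothetical.

```python
import math
from collections import Counter


def plug_in_entropy(symbols):
    """Entropy (bits) of the empirical distribution of the observed symbols."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())


def bias_corrected_entropy(symbols, alphabet_size):
    """Plug-in estimate plus the first-order bias term, converted to bits."""
    n = len(symbols)
    return plug_in_entropy(symbols) + (alphabet_size - 1) / (2 * n * math.log(2))


sample = ["a", "b", "a", "b"]
h_plug = plug_in_entropy(sample)
h_corr = bias_corrected_entropy(sample, alphabet_size=2)
print(h_plug, h_corr)
```

The plug-in estimator is biased downward, so the correction is additive. With few samples the corrected value can exceed the maximum possible entropy, which is one reason an index built on it would need care.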
Submitted 6 September, 2022;
originally announced September 2022.
-
On Information Bottleneck for Gaussian Processes
Authors:
Michael Dikshtein,
Nir Weinberger,
Shlomo Shamai
Abstract:
The information bottleneck (IB) problem for jointly stationary Gaussian sources is considered. A water-filling solution for the IB rate is given in terms of its SNR spectrum, and this rate is attained via a frequency-domain test-channel realization. A time-domain realization of the IB rate, based on linear prediction, is also proposed, which lends itself to an efficient implementation of the corresponding remote source-coding problem. A compound version of the problem is addressed, in which the joint distribution of the source is not precisely specified, but is rather characterized by a lower bound on the guaranteed mutual information. It is proved that a white SNR spectrum is optimal for this setting.
Submitted 22 August, 2022;
originally announced August 2022.
-
Mean Estimation in High-Dimensional Binary Markov Gaussian Mixture Models
Authors:
Yihan Zhang,
Nir Weinberger
Abstract:
We consider a high-dimensional mean estimation problem over a binary hidden Markov model, which illuminates the interplay between memory in data, sample size, dimension, and signal strength in statistical inference. In this model, an estimator observes $n$ samples of a $d$-dimensional parameter vector $θ_{*}\in\mathbb{R}^{d}$, multiplied by a random sign $ S_i $ ($1\le i\le n$), and corrupted by isotropic standard Gaussian noise. The sequence of signs $\{S_{i}\}_{i\in[n]}\in\{-1,1\}^{n}$ is drawn from a stationary homogeneous Markov chain with flip probability $δ\in[0,1/2]$. As $δ$ varies, this model smoothly interpolates between two well-studied models: the Gaussian Location Model for which $δ=0$ and the Gaussian Mixture Model for which $δ=1/2$. Assuming that the estimator knows $δ$, we establish a nearly minimax optimal (up to logarithmic factors) estimation error rate, as a function of $\|θ_{*}\|,δ,d,n$. We then provide an upper bound for the case of estimating $δ$, assuming a (possibly inaccurate) knowledge of $θ_{*}$. The bound is proved to be tight when $θ_{*}$ is an accurately known constant. These results are then combined into an algorithm which estimates $θ_{*}$ with $δ$ unknown a priori, and theoretical guarantees on its error are stated.
Submitted 12 October, 2022; v1 submitted 6 June, 2022;
originally announced June 2022.
-
Error Probability Bounds for Coded-Index DNA Storage
Authors:
Nir Weinberger
Abstract:
The DNA storage channel is considered, in which a codeword is comprised of $M$ unordered DNA molecules. At reading time, $N$ molecules are sampled with replacement, and then each molecule is sequenced. A coded-index concatenated-coding scheme is considered, in which the $m$th molecule of the codeword is restricted to a subset of all possible molecules (an inner code), which is unique for each $m$. The decoder has low complexity, and is based on first decoding each molecule separately (the inner code), and then decoding the sequence of molecules (an outer code). Only mild assumptions are made on the sequencing channel, in the form of the existence of an inner code and decoder with vanishing error. The error probability of a random code as well as an expurgated code is analyzed and shown to decay exponentially with $N$. This establishes the importance of increasing the coverage depth $N/M$ in order to obtain low error probability.
Submitted 20 May, 2022;
originally announced May 2022.
-
The Compound Information Bottleneck Outlook
Authors:
Michael Dikshtein,
Nir Weinberger,
Shlomo Shamai
Abstract:
We formulate and analyze the compound information bottleneck programming. In this problem, a Markov chain $ \mathsf{X} \rightarrow \mathsf{Y} \rightarrow \mathsf{Z} $ is assumed with fixed marginal distributions $\mathsf{P}_{\mathsf{X}}$ and $\mathsf{P}_{\mathsf{Y}}$, and the mutual information between $ \mathsf{X} $ and $ \mathsf{Z} $ is sought to be maximized over the choice of conditional probability of $\mathsf{Z}$ given $\mathsf{Y}$ from a given class, under the \textit{worst choice} of the joint probability of the pair $(\mathsf{X},\mathsf{Y})$ from a different class. We consider several classes based on extremes of: mutual information; minimal correlation; total variation; and the relative entropy class. We provide values, bounds, and various characterizations for specific instances of this problem: the binary symmetric case, the scalar Gaussian case, the vector Gaussian case and the symmetric modulo-additive case. Finally, for the general case, we propose a Blahut-Arimoto type of alternating iterations algorithm to find a consistent solution to this problem.
Submitted 9 May, 2022;
originally announced May 2022.
-
Learning Maximum Margin Channel Decoders
Authors:
Amit Tsvieli,
Nir Weinberger
Abstract:
The problem of learning a channel decoder is considered for two channel models. The first model is an additive noise channel whose noise distribution is unknown and nonparametric. The learner is provided with a fixed codebook and a dataset comprised of independent samples of the noise, and is required to select a precision matrix for a nearest neighbor decoder in terms of the Mahalanobis distance. The second model is a non-linear channel with additive white Gaussian noise and unknown channel transformation. The learner is provided with a fixed codebook and a dataset comprised of independent input-output samples of the channel, and is required to select a matrix for a nearest neighbor decoder with a linear kernel. For both models, the objective of maximizing the margin of the decoder is addressed. Accordingly, for each channel model, a regularized loss minimization problem with a codebook-related regularization term and hinge-like loss function is developed, which is inspired by the support vector machine paradigm for classification problems. Expected generalization error bounds for the error probability loss function are provided for both models, under optimal choice of the regularization parameter. For the additive noise channel, a theoretical guidance for choosing the training signal-to-noise ratio is proposed based on this bound. In addition, for the non-linear channel, a high probability uniform generalization error bound is provided for the hypothesis class. For each channel, a stochastic sub-gradient descent algorithm for solving the regularized loss minimization problem is proposed, and an optimization error bound is stated. The performance of the proposed algorithms is demonstrated through several examples.
Submitted 15 February, 2023; v1 submitted 13 March, 2022;
originally announced March 2022.
-
Robust Linear Regression for General Feature Distribution
Authors:
Tom Norman,
Nir Weinberger,
Kfir Y. Levy
Abstract:
We investigate robust linear regression where data may be contaminated by an oblivious adversary, i.e., an adversary that may know the data distribution but is otherwise oblivious to the realizations of the data samples. This model has been previously analyzed under strong assumptions. Concretely, $\textbf{(i)}$ all previous works assume that the covariance matrix of the features is positive definite; and $\textbf{(ii)}$ most of them assume that the features are centered (i.e., zero mean). Additionally, all previous works make further restrictive assumptions, e.g., assuming that the features are Gaussian or that the corruptions are symmetrically distributed.
In this work we go beyond these assumptions and investigate robust regression under a more general set of assumptions: $\textbf{(i)}$ we allow the covariance matrix to be either positive definite or positive semi definite, $\textbf{(ii)}$ we do not necessarily assume that the features are centered, $\textbf{(iii)}$ we make no further assumption beyond boundedness (sub-Gaussianity) of features and measurement noise.
Under these assumptions we analyze a natural SGD variant for this problem and show that it enjoys a fast convergence rate when the covariance matrix is positive definite. In the positive semi definite case we show that there are two regimes: if the features are centered we can obtain a standard convergence rate; otherwise the adversary can cause any learner to fail arbitrarily.
Submitted 4 February, 2022;
originally announced February 2022.
-
Generalization Bounds and Algorithms for Learning to Communicate over Additive Noise Channels
Authors:
Nir Weinberger
Abstract:
An additive noise channel is considered, in which the distribution of the noise is nonparametric and unknown. The problem of learning encoders and decoders based on noise samples is considered. For uncoded communication systems, the problem of choosing a codebook and possibly also a generalized minimal distance decoder (which is parameterized by a covariance matrix) is addressed. High probability generalization bounds for the error probability loss function, as well as for a hinge-type surrogate loss function are provided. A stochastic-gradient based alternating-minimization algorithm for the latter loss function is proposed. In addition, a Gibbs-based algorithm that gradually expurgates an initial codebook from codewords in order to obtain a smaller codebook with improved error probability is proposed, and bounds on its average empirical error and generalization error, as well as a high probability generalization bound, are stated. Various experiments demonstrate the performance of the proposed algorithms. For coded systems, the problem of maximizing the mutual information between the input and the output with respect to the input distribution is addressed, and uniform convergence bounds for two different classes of input distributions are obtained.
Submitted 16 November, 2021;
originally announced November 2021.
-
The DNA Storage Channel: Capacity and Error Probability
Authors:
Nir Weinberger,
Neri Merhav
Abstract:
The DNA storage channel is considered, in which the $M$ Deoxyribonucleic acid (DNA) molecules comprising each codeword are stored without order, sampled $N$ times with replacement, and then sequenced over a discrete memoryless channel. For a constant coverage depth $M/N$ and molecule length scaling $Θ(\log M)$, lower (achievability) and upper (converse) bounds on the capacity of the channel, as well as a lower (achievability) bound on the reliability function of the channel are provided. Both the lower and upper bounds on the capacity generalize a bound which was previously known to hold only for the binary symmetric sequencing channel, and only under certain restrictions on the molecule length scaling and the crossover probability parameters. When specialized to the binary symmetric sequencing channel, these restrictions are completely removed for the lower bound and are significantly relaxed for the upper bound in the high-noise regime. The lower bound on the reliability function is achieved under a universal decoder, and reveals that the dominant error event is that of outage -- the event in which the capacity of the channel induced by the DNA molecule sampling operation does not support the target rate.
Submitted 13 February, 2022; v1 submitted 26 September, 2021;
originally announced September 2021.
-
The EM Algorithm is Adaptively-Optimal for Unbalanced Symmetric Gaussian Mixtures
Authors:
Nir Weinberger,
Guy Bresler
Abstract:
This paper studies the problem of estimating the means $\pmθ_{*}\in\mathbb{R}^{d}$ of a symmetric two-component Gaussian mixture $δ_{*}\cdot N(θ_{*},I)+(1-δ_{*})\cdot N(-θ_{*},I)$ where the weights $δ_{*}$ and $1-δ_{*}$ are unequal. Assuming that $δ_{*}$ is known, we show that the population version of the EM algorithm globally converges if the initial estimate has non-negative inner product with the mean of the larger weight component. This can be achieved by the trivial initialization $θ_{0}=0$. For the empirical iteration based on $n$ samples, we show that when initialized at $θ_{0}=0$, the EM algorithm adaptively achieves the minimax error rate $\tilde{O}\Big(\min\Big\{\frac{1}{(1-2δ_{*})}\sqrt{\frac{d}{n}},\frac{1}{\|θ_{*}\|}\sqrt{\frac{d}{n}},\left(\frac{d}{n}\right)^{1/4}\Big\}\Big)$ in no more than $O\Big(\frac{1}{\|θ_{*}\|(1-2δ_{*})}\Big)$ iterations (with high probability). We also consider the EM iteration for estimating the weight $δ_{*}$, assuming a fixed mean $θ$ (which is possibly mismatched to $θ_{*}$). For the empirical iteration of $n$ samples, we show that the minimax error rate $\tilde{O}\Big(\frac{1}{\|θ_{*}\|}\sqrt{\frac{d}{n}}\Big)$ is achieved in no more than $O\Big(\frac{1}{\|θ_{*}\|^{2}}\Big)$ iterations. These results robustify and complement recent results of Wu and Zhou obtained for the equal weights case $δ_{*}=1/2$.
Submitted 29 March, 2021;
originally announced March 2021.
-
Large Deviations Behavior of the Logarithmic Error Probability of Random Codes
Authors:
Ran Tamir,
Neri Merhav,
Nir Weinberger,
Albert Guillen i Fabregas
Abstract:
This work studies the deviations of the error exponent of the constant composition code ensemble around its expectation, known as the error exponent of the typical random code (TRC). In particular, it is shown that the probability of randomly drawing a codebook whose error exponent is smaller than the TRC exponent is exponentially small; upper and lower bounds for this exponent are given, which coincide in some cases. In addition, the probability of randomly drawing a codebook whose error exponent is larger than the TRC exponent is shown to be double-exponentially small; upper and lower bounds to the double-exponential exponent are given. The results suggest that codebooks whose error exponent is larger than the error exponent of the TRC are extremely rare. The key ingredient in the proofs is a new large deviations result of type class enumerators with dependent variables.
Submitted 20 December, 2019;
originally announced December 2019.
-
Guessing with a Bit of Help
Authors:
Nir Weinberger,
Ofer Shayevitz
Abstract:
What is the value of a single bit to a guesser? We study this problem in a setup where Alice wishes to guess an i.i.d. random vector, and can procure one bit of information from Bob, who observes this vector through a memoryless channel. We are interested in the guessing efficiency, which we define as the best possible multiplicative reduction in Alice's guessing-moments obtainable by observing Bob's bit. For the case of a uniform binary vector observed through a binary symmetric channel, we provide two lower bounds on the guessing efficiency by analyzing the performance of the Dictator and Majority functions, and two upper bounds via maximum entropy and Fourier-analytic / hypercontractivity arguments. We then extend our maximum entropy argument to give a lower bound on the guessing efficiency for a general channel with a binary uniform input, via the strong data-processing inequality constant of the reverse channel. We compute this bound for the binary erasure channel, and conjecture that Greedy Dictator functions achieve the guessing efficiency.
Submitted 23 May, 2018;
originally announced May 2018.
-
Self-Predicting Boolean Functions
Authors:
Nir Weinberger,
Ofer Shayevitz
Abstract:
A Boolean function $g$ is said to be an optimal predictor for another Boolean function $f$, if it minimizes the probability that $f(X^{n})\neq g(Y^{n})$ among all functions, where $X^{n}$ is uniform over the Hamming cube and $Y^{n}$ is obtained from $X^{n}$ by independently flipping each coordinate with probability $δ$. This paper is about self-predicting functions, which are those that coincide with their optimal predictor.
Submitted 26 March, 2019; v1 submitted 12 January, 2018;
originally announced January 2018.
-
On the Reliability Function of Distributed Hypothesis Testing Under Optimal Detection
Authors:
Nir Weinberger,
Yuval Kochman
Abstract:
The distributed hypothesis testing problem with full side-information is studied. The trade-off (reliability function) between the two types of error exponents under limited rate is studied in the following way. First, the problem is reduced to the problem of determining the reliability function of channel codes designed for detection (in analogy to a similar result which connects the reliability function of distributed lossless compression and ordinary channel codes). Second, a single-letter random-coding bound based on a hierarchical ensemble, as well as a single-letter expurgated bound, are derived for the reliability of channel-detection codes. Both bounds are derived for a system which employs the optimal detection rule. We conjecture that the resulting random-coding bound is ensemble-tight, and consequently optimal within the class of quantization-and-binning schemes.
Submitted 23 April, 2019; v1 submitted 11 January, 2018;
originally announced January 2018.
-
Expurgated Bounds for the Asymmetric Broadcast Channel
Authors:
Ran Averbuch,
Nir Weinberger,
Neri Merhav
Abstract:
This work contains two main contributions concerning the expurgation of hierarchical ensembles for the asymmetric broadcast channel. The first is an analysis of the optimal maximum likelihood (ML) decoders for the weak and strong user. Two different methods of code expurgation are used, which provide two competing error exponents. The second is the derivation of expurgated exponents under the generalized stochastic likelihood decoder (GLD). We prove that the GLD exponents are at least as tight as the maximum between the random coding error exponents derived in an earlier work by Averbuch and Merhav (2017) and one of our ML-based expurgated exponents. In doing so, we prove the existence of hierarchical codebooks that achieve the best of the random coding exponent and the expurgated exponent simultaneously for both users.
Submitted 28 November, 2017;
originally announced November 2017.
-
On the VC-Dimension of Binary Codes
Authors:
Sihuang Hu,
Nir Weinberger,
Ofer Shayevitz
Abstract:
We investigate the asymptotic rates of length-$n$ binary codes with VC-dimension at most $dn$ and minimum distance at least $δn$. Two upper bounds are obtained, one as a simple corollary of a result by Haussler and the other via a shortening approach combining Sauer-Shelah lemma and the linear programming bound. Two lower bounds are given using Gilbert-Varshamov type arguments over constant-weight and Markov-type sets.
Submitted 27 June, 2018; v1 submitted 5 March, 2017;
originally announced March 2017.
-
On the Optimal Boolean Function for Prediction under Quadratic Loss
Authors:
Nir Weinberger,
Ofer Shayevitz
Abstract:
Suppose $Y^{n}$ is obtained by observing a uniform Bernoulli random vector $X^{n}$ through a binary symmetric channel. Courtade and Kumar asked how large the mutual information between $Y^{n}$ and a Boolean function $\mathsf{b}(X^{n})$ could be, and conjectured that the maximum is attained by a dictator function. An equivalent formulation of this conjecture is that dictator minimizes the prediction cost in a sequential prediction of $Y^{n}$ under logarithmic loss, given $\mathsf{b}(X^{n})$. In this paper, we study the question of minimizing the sequential prediction cost under a different (proper) loss function - the quadratic loss. In the noiseless case, we show that majority asymptotically minimizes this prediction cost among all Boolean functions. We further show that for weak noise, majority is better than dictator, and that for strong noise dictator outperforms majority. We conjecture that for quadratic loss, there is no single sequence of Boolean functions that is simultaneously (asymptotically) optimal at all noise levels.
Submitted 8 July, 2016;
originally announced July 2016.
-
Lower Bounds on Parameter Modulation-Estimation Under Bandwidth Constraints
Authors:
Nir Weinberger,
Neri Merhav
Abstract:
We consider the problem of modulating the value of a parameter onto a band-limited signal to be transmitted over a continuous-time, additive white Gaussian noise (AWGN) channel, and estimating this parameter at the receiver. The performance is measured by the mean power-$α$ error (MP$α$E), which is defined as the worst-case $α$-th order moment of the absolute estimation error. The optimal exponential decay rate of the MP$α$E as a function of the transmission time is investigated. Two upper (converse) bounds on the MP$α$E exponent are derived, on the basis of known bounds for the AWGN channel of inputs with unlimited bandwidth. The bounds are computed for typical values of the error moment and the signal-to-noise ratio (SNR), and the SNR asymptotics of the different bounds are analyzed. The new bounds are compared to known converse and achievability bounds, which were derived from channel coding considerations.
Submitted 21 June, 2016;
originally announced June 2016.
-
Channel Detection in Coded Communication
Authors:
Nir Weinberger,
Neri Merhav
Abstract:
We consider the problem of block-coded communication, where in each block, the channel law belongs to one of two disjoint sets. The decoder is aimed to decode only messages that have undergone a channel from one of the sets, and thus has to detect the set which contains the prevailing channel. We begin with the simplified case where each of the sets is a singleton. For any given code, we derive the optimum detection/decoding rule in the sense of the best trade-off among the probabilities of decoding error, false alarm, and misdetection, and also introduce sub-optimal detection/decoding rules which are simpler to implement. Then, various achievable bounds on the error exponents are derived, including the exact single-letter characterization of the random coding exponents for the optimal detector/decoder. We then extend the random coding analysis to general sets of channels, and show that there exists a universal detector/decoder which performs asymptotically as well as the optimal detector/decoder, when tuned to detect a channel from a specific pair of channels. The case of a pair of binary symmetric channels is discussed in detail.
Submitted 6 September, 2015;
originally announced September 2015.
-
A Large Deviations Approach to Secure Lossy Compression
Authors:
Nir Weinberger,
Neri Merhav
Abstract:
We consider a Shannon cipher system for memoryless sources, in which distortion is allowed at the legitimate decoder. The source is compressed using a rate distortion code secured by a shared key, which satisfies a constraint on the compression rate, as well as a constraint on the exponential rate of the excess-distortion probability at the legitimate decoder. Secrecy is measured by the exponential rate of the exiguous-distortion probability at the eavesdropper, rather than by the traditional measure of equivocation. We define the perfect secrecy exponent as the maximal exiguous-distortion exponent achievable when the key rate is unlimited. Under limited key rate, we prove that the maximal achievable exiguous-distortion exponent is equal to the minimum between the average key rate and the perfect secrecy exponent, for a fairly general class of variable key rate codes.
Submitted 22 April, 2015;
originally announced April 2015.
-
Simplified Erasure/List Decoding
Authors:
Nir Weinberger,
Neri Merhav
Abstract:
We consider the problem of erasure/list decoding using certain classes of simplified decoders. Specifically, we assume a class of erasure/list decoders, such that a codeword is in the list if its likelihood is larger than a threshold. This class of decoders both approximates the optimal decoder of Forney, and also includes the following simplified subclasses of decoding rules: The first is a function of the output vector only, but not the codebook (which is most suitable for high rates), and the second is a scaled version of the maximum likelihood decoder (which is most suitable for low rates). We provide single-letter expressions for the exact random coding exponents of any decoder in these classes, operating over a discrete memoryless channel. For each class of decoders, we find the optimal decoder within the class, in the sense that it maximizes the erasure/list exponent, under a given constraint on the error exponent. We establish the optimality of the simplified decoders of the first and second kind for low and high rates, respectively.
Submitted 5 December, 2014;
originally announced December 2014.
-
Erasure/List Random Coding Error Exponents Are Not Universally Achievable
Authors:
Wasim Huleihel,
Nir Weinberger,
Neri Merhav
Abstract:
We study the problem of universal decoding for unknown discrete memoryless channels in the presence of erasure/list option at the decoder, in the random coding regime. Specifically, we harness a universal version of Forney's classical erasure/list decoder developed in earlier studies, which is based on the competitive minimax methodology, and guarantees universal achievability of a certain fraction of the optimum random coding error exponents. In this paper, we derive an exact single-letter expression for the maximum achievable fraction. Examples are given in which the maximal achievable fraction is strictly less than unity, which implies that, in general, there is no universal erasure/list decoder which achieves the same random coding error exponents as the optimal decoder for a known channel. This is in contrast to the situation in ordinary decoding (without the erasure/list option), where optimum exponents are universally achievable, as is well known. It is also demonstrated that previous lower bounds derived for the maximal achievable fraction are not tight in general. We then analyze a generalized random coding ensemble which incorporates a training sequence, in conjunction with a suboptimal practical decoder ("plug-in" decoder), which first estimates the channel using the known training sequence, and then decodes the remaining symbols of the codeword using the estimated channel. One implication of our results is that they set the stage for a reasonable criterion of optimal training. Finally, we compare the performance of the "plug-in" decoder and the universal decoder, in terms of the achievable error exponents, and show that the latter is noticeably better than the former.
Submitted 22 June, 2017; v1 submitted 26 October, 2014;
originally announced October 2014.
-
Optimum Trade-offs Between the Error Exponent and the Excess-Rate Exponent of Variable-Rate Slepian-Wolf Coding
Authors:
Nir Weinberger,
Neri Merhav
Abstract:
We analyze the optimal trade-off between the error exponent and the excess-rate exponent for variable-rate Slepian-Wolf codes. In particular, we first derive upper (converse) bounds on the optimal error and excess-rate exponents, and then lower (achievable) bounds, via a simple class of variable-rate codes which assign the same rate to all source blocks of the same type class. Then, using the exponent bounds, we derive bounds on the optimal rate functions, namely, the minimal rate assigned to each type class, needed in order to achieve a given target error exponent. The resulting excess-rate exponent is then evaluated. Iterative algorithms are provided for the computation of both bounds on the optimal rate functions and their excess-rate exponents. The resulting Slepian-Wolf codes bridge between the two extremes of fixed-rate coding, which has minimal error exponent and maximal excess-rate exponent, and average-rate coding, which has maximal error exponent and minimal excess-rate exponent.
Submitted 6 November, 2014; v1 submitted 5 January, 2014;
originally announced January 2014.