subscribe to arXiv mailings

Generalized regenerating codes and node repair on graphs

Abstract: We consider regenerating codes in distributed storage systems where connections between the nodes are constrained by a graph. In this problem, the failed node downloads the information stored at a subset of vertices of the graph for the purpose of recovering the lost data. Compared to the standard setting, regenerating codes on graphs address two additional features. The repair information is move… ▽ More We consider regenerating codes in distributed storage systems where connections between the nodes are constrained by a graph. In this problem, the failed node downloads the information stored at a subset of vertices of the graph for the purpose of recovering the lost data. Compared to the standard setting, regenerating codes on graphs address two additional features. The repair information is moved across the network, and the cost of node repair is determined by the graphical distance from the helper nodes to the failed node. Accordingly, the helpers far away from the failed node may be expected to contribute less data for repair than the nodes in the neighborhood of that node. We analyze regenerating codes with nonuniform download for repair on graphs. Moreover, in the process of repair, the information moved from the helpers to the failed node may be combined through intermediate processing, reducing the repair bandwidth. We derive lower bounds for communication complexity of node repair on graphs, including repair schemes with nonuniform download and intermediate processing, and construct codes that attain these bounds. Additionally, some of the nodes may act as adversaries, introducing errors into the data moved in the network. For repair on graphs in the presence of adversarial nodes, we construct codes that support node repair and error correction in systematic nodes. △ Less

Submitted 19 May, 2024; originally announced May 2024.

Comments: 29pp

arXiv:2405.04406 [pdf, ps, other]

Rényi divergence guarantees for hashing with linear codes

Authors: Madhura Pathegama, Alexander Barg

Abstract: We consider the problem of distilling uniform random bits from an unknown source with a given $p$-entropy using linear hashing. As our main result, we estimate the expected $p$-divergence from the uniform distribution over the ensemble of random linear codes for all integer $p\ge 2$. The proof relies on analyzing how additive noise, determined by a random element of the code from the ensemble, act… ▽ More We consider the problem of distilling uniform random bits from an unknown source with a given $p$-entropy using linear hashing. As our main result, we estimate the expected $p$-divergence from the uniform distribution over the ensemble of random linear codes for all integer $p\ge 2$. The proof relies on analyzing how additive noise, determined by a random element of the code from the ensemble, acts on the source distribution. This action leads to the transformation of the source distribution into an approximately uniform one, a process commonly referred to as distribution smoothing. We also show that hashing with Reed-Muller matrices reaches intrinsic randomness of memoryless Bernoulli sources in the $l_p$ sense for all integer $p\ge 2$. △ Less

Submitted 7 May, 2024; originally announced May 2024.

arXiv:2310.05358 [pdf, other]

doi 10.22331/q-2024-04-30-1321

A family of permutationally invariant quantum codes

Authors: Arda Aydin, Max A. Alekseyev, Alexander Barg

Abstract: We construct a new family of permutationally invariant codes that correct $t$ Pauli errors for any $t\ge 1$. We also show that codes in the new family correct quantum deletion errors as well as spontaneous decay errors. Our construction contains some of the previously known permutationally invariant quantum codes as particular cases, which also admit transversal gates. In many cases, the codes in… ▽ More We construct a new family of permutationally invariant codes that correct $t$ Pauli errors for any $t\ge 1$. We also show that codes in the new family correct quantum deletion errors as well as spontaneous decay errors. Our construction contains some of the previously known permutationally invariant quantum codes as particular cases, which also admit transversal gates. In many cases, the codes in the new family are shorter than the best previously known explicit permutationally invariant codes for Pauli errors and deletions. Furthermore, our new code family includes a new $((4,2,2))$ optimal single-deletion-correcting code. As a separate result, we generalize the conditions for permutationally invariant codes to correct $t$ Pauli errors from the previously known results for $t=1$ to any number of errors. For small $t$, these conditions can be used to construct new examples of codes by computer. △ Less

Submitted 15 April, 2024; v1 submitted 8 October, 2023; originally announced October 2023.

Comments: v3: Accepted to Quantum. This is the published version of the paper, formatted in the style of the journal

Journal ref: Quantum 8, 1321 (2024)

arXiv:2308.14558 [pdf, other]

Storage codes and recoverable systems on lines and grids

Authors: Alexander Barg, Ohad Elishco, Ryan Gabrys, Geyang Wang, Eitan Yaakobi

Abstract: A storage code is an assignment of symbols to the vertices of a connected graph $G(V,E)$ with the property that the value of each vertex is a function of the values of its neighbors, or more generally, of a certain neighborhood of the vertex in $G$. In this work we introduce a new construction method of storage codes, enabling one to construct new codes from known ones via an interleaving procedur… ▽ More A storage code is an assignment of symbols to the vertices of a connected graph $G(V,E)$ with the property that the value of each vertex is a function of the values of its neighbors, or more generally, of a certain neighborhood of the vertex in $G$. In this work we introduce a new construction method of storage codes, enabling one to construct new codes from known ones via an interleaving procedure driven by resolvable designs. We also study storage codes on $\mathbb Z$ and ${\mathbb Z}^2$ (lines and grids), finding closed-form expressions for the capacity of several one and two-dimensional systems depending on their recovery set, using connections between storage codes, graphs, anticodes, and difference-avoiding sets. △ Less

Submitted 28 August, 2023; originally announced August 2023.

arXiv:2308.11009 [pdf, ps, other]

doi 10.3390/e25111515

Smoothing of binary codes, uniform distributions, and applications

Authors: Madhura Pathegama, Alexander Barg

Abstract: The action of a noise operator on a code transforms it into a distribution on the respective space. Some common examples from information theory include Bernoulli noise acting on a code in the Hamming space and Gaussian noise acting on a lattice in the Euclidean space. We aim to characterize the cases when the output distribution is close to the uniform distribution on the space, as measured by Ré… ▽ More The action of a noise operator on a code transforms it into a distribution on the respective space. Some common examples from information theory include Bernoulli noise acting on a code in the Hamming space and Gaussian noise acting on a lattice in the Euclidean space. We aim to characterize the cases when the output distribution is close to the uniform distribution on the space, as measured by Rényi divergence of order $α\in (1,\infty]$. A version of this question is known as the channel resolvability problem in information theory, and it has implications for security guarantees in wiretap channels, error correction, discrepancy, worst-to-average case complexity reductions, and many other problems. Our work quantifies the requirements for asymptotic uniformity (perfect smoothing) and identifies explicit code families that achieve it under the action of the Bernoulli and ball noise operators on the code. We derive expressions for the minimum rate of codes required to attain asymptotically perfect smoothing. In proving our results, we leverage recent results from harmonic analysis of functions on the Hamming space. Another result pertains to the use of code families in Wyner's transmission scheme on the binary wiretap channel. We identify explicit families that guarantee strong secrecy when applied in this scheme, showing that nested Reed-Muller codes can transmit messages reliably and securely over a binary symmetric wiretap channel with a positive rate. Finally, we establish a connection between smoothing and error correction in the binary symmetric channel. △ Less

Submitted 25 September, 2023; v1 submitted 21 August, 2023; originally announced August 2023.

Comments: 30 pages, 2 figures. In v.2, the characterization of smoothing capacity was extended from integer orders of the Rényi divergence to all real orders $α\in (1,\infty]$. Presentation reorganized, minor errors are corrected

Journal ref: Entropy 2023, 25, 1515

arXiv:2302.11593 [pdf, other]

doi 10.1038/s41567-024-02496-y

Quantum spherical codes

Authors: Shubham P. Jain, Joseph T. Iosue, Alexander Barg, Victor V. Albert

Abstract: We introduce a framework for constructing quantum codes defined on spheres by recasting such codes as quantum analogues of the classical spherical codes. We apply this framework to bosonic coding, obtaining multimode extensions of the cat codes that can outperform previous constructions while requiring a similar type of overhead. Our polytope-based cat codes consist of sets of points with large se… ▽ More We introduce a framework for constructing quantum codes defined on spheres by recasting such codes as quantum analogues of the classical spherical codes. We apply this framework to bosonic coding, obtaining multimode extensions of the cat codes that can outperform previous constructions while requiring a similar type of overhead. Our polytope-based cat codes consist of sets of points with large separation that at the same time form averaging sets known as spherical designs. We also recast concatenations of CSS codes with cat codes as quantum spherical codes, revealing a new way to autonomously protect against dephasing noise. △ Less

Submitted 7 December, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

Comments: 5 + 12 pages, 3 figures, 5 tables

arXiv:2212.12117 [pdf, ps, other]

Storage codes on coset graphs with asymptotically unit rate

Authors: Alexander Barg, Moshe Schwartz, Lev Yohananov

Abstract: A storage code on a graph $G$ is a set of assignments of symbols to the vertices such that every vertex can recover its value by looking at its neighbors. We consider the question of constructing large-size storage codes on triangle-free graphs constructed as coset graphs of binary linear codes. Previously it was shown that there are infinite families of binary storage codes on coset graphs with r… ▽ More A storage code on a graph $G$ is a set of assignments of symbols to the vertices such that every vertex can recover its value by looking at its neighbors. We consider the question of constructing large-size storage codes on triangle-free graphs constructed as coset graphs of binary linear codes. Previously it was shown that there are infinite families of binary storage codes on coset graphs with rate converging to 3/4. Here we show that codes on such graphs can attain rate asymptotically approaching 1. Equivalently, this question can be phrased as a version of hat-guessing games on graphs (e.g., P.J. Cameron e.a., \emph{Electronic J. Comb.} 2016). In this language, we construct triangle-free graphs with success probability of the players approaching one as the number of vertices tends to infinity. Furthermore, finding linear index codes of rate approaching zero is also an equivalent problem. Another family of storage codes on triangle-free graphs of rate approaching 1 was constructed earlier by A. Golovnev and I. Haviv (36th Computational Complexity Conf., 2021) relying on a different family of graphs. △ Less

Submitted 11 June, 2024; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: Final version, to appear in Combinatorics

MSC Class: 94B25

arXiv:2211.00797 [pdf, ps, other]

Node repair on connected graphs, Part II

Authors: Adway Patra, Alexander Barg

Abstract: We continue our study of regenerating codes in distributed storage systems where connections between the nodes are constrained by a graph. In this problem, the failed node downloads the information stored at a subset of vertices of the graph for the purpose of recovering the lost data. This information is moved across the network, and the cost of node repair is determined by the graphical distance… ▽ More We continue our study of regenerating codes in distributed storage systems where connections between the nodes are constrained by a graph. In this problem, the failed node downloads the information stored at a subset of vertices of the graph for the purpose of recovering the lost data. This information is moved across the network, and the cost of node repair is determined by the graphical distance from the helper nodes to the failed node. This problem was formulated in our recent work (IEEE IT Transactions, May 2022) where we showed that processing of the information at the intermediate nodes can yield savings in repair bandwidth over the direct forwarding of the data. While the previous paper was limited to the MSR case, here we extend our study to the case of general regenerating codes. We derive a lower bound on the repair bandwidth and formulate repair procedures with intermediate processing for several families of regenerating codes, with an emphasis on the recent constructions from multilinear algebra. We also consider the task of data retrieval for codes on graphs, deriving a lower bound on the communication bandwidth and showing that it can be attained at the MBR point of the storage-bandwidth tradeoff curve. △ Less

Submitted 1 November, 2022; originally announced November 2022.

arXiv:2210.07496 [pdf, ps, other]

On the size of maximal binary codes with 2, 3, and 4 distances

Authors: Alexander Barg, Alexey Glazyrin, Wei-Jiun Kao, Ching-Yi Lai, Pin-Chieh Tseng, Wei-Hsuan Yu

Abstract: We address the maximum size of binary codes and binary constant weight codes with few distances. Previous works established a number of bounds for these quantities as well as the exact values for a range of small code lengths. As our main results, we determine the exact size of maximal binary codes with two distances for all lengths $n\ge 6$ as well as the exact size of maximal binary constant wei… ▽ More We address the maximum size of binary codes and binary constant weight codes with few distances. Previous works established a number of bounds for these quantities as well as the exact values for a range of small code lengths. As our main results, we determine the exact size of maximal binary codes with two distances for all lengths $n\ge 6$ as well as the exact size of maximal binary constant weight codes with 2,3, and 4 distances for several values of the weight and for all but small lengths. △ Less

Submitted 13 October, 2022; originally announced October 2022.

Comments: Main text 23 pp. and Appendix 17pp

arXiv:2206.13617

Semidefinite programming bounds for few-distance sets in the Hamming and Johnson spaces

Authors: Alexander Barg, Ching-Yi Lai, Pin-Chieh Tseng, Wei-Hsuan Yu

Abstract: We study the maximum cardinality problem of a set of few distances in the Hamming and Johnson spaces. We formulate semidefinite programs for this problem and extend the 2011 works by Barg-Musin and Musin-Nozaki. As our main result, we find new parameters for which the maximum size of two- and three-distance sets is known exactly. We study the maximum cardinality problem of a set of few distances in the Hamming and Johnson spaces. We formulate semidefinite programs for this problem and extend the 2011 works by Barg-Musin and Musin-Nozaki. As our main result, we find new parameters for which the maximum size of two- and three-distance sets is known exactly. △ Less

Submitted 7 July, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Renew the whole article to add more results and authors

arXiv:2110.02378 [pdf, ps, other]

doi 10.1109/TIT.2022.3191309

High-rate storage codes on triangle-free graphs

Authors: Alexander Barg, Gilles Zémor

Abstract: Consider an assignment of bits to the vertices of a connected graph $G(V,E)$ with the property that the value of each vertex is a function of the values of its neighbors. A collection of such assignments is called a {\em storage code} of length $|V|$ on $G$. The storage code problem can be equivalently formulated as maximizing the probability of success in a {\em guessing game} on graphs, or const… ▽ More Consider an assignment of bits to the vertices of a connected graph $G(V,E)$ with the property that the value of each vertex is a function of the values of its neighbors. A collection of such assignments is called a {\em storage code} of length $|V|$ on $G$. The storage code problem can be equivalently formulated as maximizing the probability of success in a {\em guessing game} on graphs, or constructing {\em index codes} of small rate. If $G$ contains many cliques, it is easy to construct codes of rate close to 1, so a natural problem is to construct high-rate codes on triangle-free graphs, where constructing codes of rate $>1/2$ is a nontrivial task, with few known results. In this work we construct infinite families of linear storage codes with high rate relying on coset graphs of binary linear codes. We also derive necessary conditions for such codes to have high rate, and even rate potentially close to one. We also address correction of multiple erasures in the codeword, deriving recovery guarantees based on expansion properties of the graph. Finally, we point out connections between linear storage codes and quantum CSS codes, a link to bootstrap percolation and contagion spread in graphs, and formulate a number of open problems. △ Less

Submitted 27 March, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

Comments: 20 pages. V2: minor corrections and additions

Journal ref: IEEE Transactions on Information Theory, 2022, vol. 68, no. 12, pp. 7787-7797

arXiv:2108.08679 [pdf, ps, other]

A construction of maximally recoverable codes

Authors: Alexander Barg, Zitan Chen, Itzhak Tamo

Abstract: We construct a family of linear maximally recoverable codes with locality $r$ and dimension $r+1.$ For codes of length $n$ with $r\approx n^α, 0\leα\le 1$ the code alphabet is of the order $n^{1+3α},$ which improves upon the previously known constructions of maximally recoverable codes. We construct a family of linear maximally recoverable codes with locality $r$ and dimension $r+1.$ For codes of length $n$ with $r\approx n^α, 0\leα\le 1$ the code alphabet is of the order $n^{1+3α},$ which improves upon the previously known constructions of maximally recoverable codes. △ Less

Submitted 19 August, 2021; originally announced August 2021.

arXiv:2108.00939 [pdf, ps, other]

Node repair on connected graphs

Authors: Adway Patra, Alexander Barg

Abstract: We study the problem of erasure correction (node repair) for regenerating codes defined on graphs wherein the cost of transmitting the information to the failed node depends on the graphical distance from this node to the helper vertices of the graph. The information passed to the failed node from the helpers traverses several vertices of the graph, and savings in communication complexity can be a… ▽ More We study the problem of erasure correction (node repair) for regenerating codes defined on graphs wherein the cost of transmitting the information to the failed node depends on the graphical distance from this node to the helper vertices of the graph. The information passed to the failed node from the helpers traverses several vertices of the graph, and savings in communication complexity can be attained if the intermediate vertices process the information rather than simply relaying it toward the failed node. We derive simple information-theoretic bounds on the amount of information communicated between the nodes in the course of the repair. Next we show that Minimum Storage Regenerating (MSR) codes can be modified to perform the intermediate processing, thereby attaining the lower bound on the information exchange on the graph. We also consider node repair when the underlying graph is random, deriving conditions on the parameters that support recovery of the failed node with communication complexity smaller than required by the simple relaying. △ Less

Submitted 15 January, 2022; v1 submitted 2 August, 2021; originally announced August 2021.

Comments: Minor changes from v1. This is a final version (to appear in IEEE Transactions on Information Theory)

arXiv:2105.03511 [pdf, ps, other]

Bounds for the sum of distances of spherical sets of small size

Authors: Alexander Barg, Peter Boyvalenkov, Maya Stoyanova

Abstract: We derive upper and lower bounds on the sum of distances of a spherical code of size $N$ in $n$ dimensions when $N\sim n^α, 0<α\le 2.$ The bounds are derived by specializing recent general, universal bounds on energy of spherical sets. We discuss asymptotic behavior of our bounds along with several examples of codes whose sum of distances closely follows the upper bound. We derive upper and lower bounds on the sum of distances of a spherical code of size $N$ in $n$ dimensions when $N\sim n^α, 0<α\le 2.$ The bounds are derived by specializing recent general, universal bounds on energy of spherical sets. We discuss asymptotic behavior of our bounds along with several examples of codes whose sum of distances closely follows the upper bound. △ Less

Submitted 21 December, 2022; v1 submitted 7 May, 2021; originally announced May 2021.

Comments: 21 pp

arXiv:2010.00589 [pdf, other]

Recoverable Systems

Authors: Ohad Elishco, Alexander Barg

Abstract: Motivated by the established notion of storage codes, we consider sets of infinite sequences over a finite alphabet such that every $k$-tuple of consecutive entries is uniquely recoverable from its $l$-neighborhood in the sequence. We address the problem of finding the maximum growth rate of the set, which we term capacity, as well as constructions of explicit families that approach the optimal ra… ▽ More Motivated by the established notion of storage codes, we consider sets of infinite sequences over a finite alphabet such that every $k$-tuple of consecutive entries is uniquely recoverable from its $l$-neighborhood in the sequence. We address the problem of finding the maximum growth rate of the set, which we term capacity, as well as constructions of explicit families that approach the optimal rate. The techniques that we employ rely on the connection of this problem with constrained systems. In the second part of the paper we consider a modification of the problem wherein the entries in the sequence are viewed as random variables over a finite alphabet that follow some joint distribution, and the recovery condition requires that the Shannon entropy of the $k$-tuple conditioned on its $l$-neighborhood be bounded above by some $ε>0.$ We study properties of measures on infinite sequences that maximize the metric entropy under the recoverability condition. Drawing on tools from ergodic theory, we prove some properties of entropy-maximizing measures. We also suggest a procedure of constructing an $ε$-recoverable measure from a corresponding deterministic system. △ Less

Submitted 6 March, 2022; v1 submitted 1 October, 2020; originally announced October 2020.

Comments: Minor changes from v1. This is a final version (to appear in IEEE Transactions on Information Theory)

arXiv:2007.09721 [pdf, ps, other]

Bounds for discrepancies in the Hamming space

Authors: Alexander Barg, Maxim Skriganov

Abstract: We derive bounds for the ball $L_p$-discrepancies in the Hamming space for $0<p<\infty$ and $p=\infty$. Sharp estimates of discrepancies have been obtained for many spaces such as the Euclidean spheres and more general compact Riemannian manifolds. In the present paper, we show that the behavior of discrepancies in the Hamming space differs fundamentally because the volume of the ball in this spac… ▽ More We derive bounds for the ball $L_p$-discrepancies in the Hamming space for $0<p<\infty$ and $p=\infty$. Sharp estimates of discrepancies have been obtained for many spaces such as the Euclidean spheres and more general compact Riemannian manifolds. In the present paper, we show that the behavior of discrepancies in the Hamming space differs fundamentally because the volume of the ball in this space depends on its radius exponentially while such a dependence for the Riemannian manifolds is polynomial. △ Less

Submitted 27 August, 2020; v1 submitted 19 July, 2020; originally announced July 2020.

arXiv:2005.12995 [pdf, other]

doi 10.1112/mtk12066

Stolarsky's invariance principle for finite metric spaces

Authors: Alexander Barg

Abstract: Stolarsky's invariance principle quantifies the deviation of a subset of a metric space from the uniform distribution. Classically derived for spherical sets, it has been recently studied in a number of other situations, revealing a general structure behind various forms of the main identity. In this work we consider the case of finite metric spaces, relating the quadratic discrepancy of a subset… ▽ More Stolarsky's invariance principle quantifies the deviation of a subset of a metric space from the uniform distribution. Classically derived for spherical sets, it has been recently studied in a number of other situations, revealing a general structure behind various forms of the main identity. In this work we consider the case of finite metric spaces, relating the quadratic discrepancy of a subset to a certain function of the distribution of distances in it. Our main results are related to a concrete form of the invariance principle for the Hamming space. We derive several equivalent versions of the expression for the discrepancy of a code, including expansions of the discrepancy and associated kernels in the Krawtchouk basis. Codes that have the smallest possible quadratic discrepancy among all subsets of the same cardinality can be naturally viewed as energy minimizing subsets in the space. Using linear programming, we find several bounds on the minimal discrepancy and give examples of minimizing configurations. In particular, we show that all binary perfect codes have the smallest possible discrepancy. △ Less

Submitted 1 September, 2021; v1 submitted 26 May, 2020; originally announced May 2020.

Comments: 26pp. This version corrects the statement of Theorem 6.3 and contains a new Appendix with a simple proof of the formula for the sum of squares of Krawtchouk polynomials

Journal ref: Mathematika, vol. 67, no. 1, 2021, pp. 158-186

arXiv:2004.06770 [pdf, ps, other]

Cyclic and convolutional codes with locality

Authors: Zitan Chen, Alexander Barg

Abstract: Locally recoverable (LRC) codes and their variants have been extensively studied in recent years. In this paper we focus on cyclic constructions of LRC codes and derive conditions on the zeros of the code that support the property of hierarchical locality. As a result, we obtain a general family of hierarchical LRC codes for a new range of code parameters. We also observe that our approach enables… ▽ More Locally recoverable (LRC) codes and their variants have been extensively studied in recent years. In this paper we focus on cyclic constructions of LRC codes and derive conditions on the zeros of the code that support the property of hierarchical locality. As a result, we obtain a general family of hierarchical LRC codes for a new range of code parameters. We also observe that our approach enables one to represent an LRC code in quasicyclic form, and use this representation to construct tail-biting convolutional LRC codes with locality. Among other results, we extend the general approach to cyclic codes with locality to multidimensional cyclic codes, yielding new families of LRC codes with availability, and construct a family of q-ary cyclic hierarchical LRC codes of unbounded length. △ Less

Submitted 14 April, 2020; originally announced April 2020.

Comments: An extended abstract of this work appears in Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT 2020)

arXiv:2001.07189 [pdf, ps, other]

Enabling optimal access and error correction for the repair of Reed-Solomon codes

Authors: Zitan Chen, Min Ye, Alexander Barg

Abstract: Recently Reed-Solomon (RS) codes were shown to possess a repair scheme that supports repair of failed nodes with optimal repair bandwidth. In this paper, we extend this result in two directions. First, we propose a new repair scheme for the RS codes constructed in [Tamo-Ye-Barg, {\em IEEE Transactions on Information Theory}, vol. 65, May 2019] and show that our new scheme is robust to erroneous in… ▽ More Recently Reed-Solomon (RS) codes were shown to possess a repair scheme that supports repair of failed nodes with optimal repair bandwidth. In this paper, we extend this result in two directions. First, we propose a new repair scheme for the RS codes constructed in [Tamo-Ye-Barg, {\em IEEE Transactions on Information Theory}, vol. 65, May 2019] and show that our new scheme is robust to erroneous information provided by the helper nodes while maintaining the optimal repair bandwidth. Second, we construct a new family of RS codes with optimal access for the repair of any single failed node. We also show that the constructed codes can accommodate both features, supporting optimal-access repair with optimal error-correction capability. Going beyond RS codes, we also prove that any scalar MDS code with optimal repair bandwidth allows for a repair scheme with optimal access property. △ Less

Submitted 20 January, 2020; originally announced January 2020.

arXiv:1908.09900 [pdf, other]

Capacity of dynamical storage systems

Authors: Ohad Elishco, Alexander Barg

Abstract: We introduce a dynamical model of node repair in distributed storage systems wherein the storage nodes are subjected to failures according to independent Poisson processes. The main parameter that we study is the time-average capacity of the network in the scenario where a fixed subset of the nodes support a higher repair bandwidth than the other nodes. The sequence of node failures generates rand… ▽ More We introduce a dynamical model of node repair in distributed storage systems wherein the storage nodes are subjected to failures according to independent Poisson processes. The main parameter that we study is the time-average capacity of the network in the scenario where a fixed subset of the nodes support a higher repair bandwidth than the other nodes. The sequence of node failures generates random permutations of the nodes in the encoded block, and we model the state of the network as a Markov random walk on permutations of $n$ elements. As our main result we show that the capacity of the network can be increased compared to the static (worst-case) model of the storage system, while maintaining the same (average) repair bandwidth, and we derive estimates of the increase. We also quantify the capacity increase in the case that the repair center has information about the sequence of the recently failed storage nodes. △ Less

Submitted 20 September, 2020; v1 submitted 26 August, 2019; originally announced August 2019.

Comments: 27 pages; final version

arXiv:1901.04419 [pdf, ps, other]

Explicit constructions of MSR codes for clustered distributed storage: The rack-aware storage model

Authors: Zitan Chen, Alexander Barg

Abstract: The paper is devoted to the problem of erasure coding in distributed storage. We consider a model of storage that assumes that nodes are organized into equally sized groups, called racks, that within each group the nodes can communicate freely without taxing the system bandwidth, and that the only information transmission that counts is the one between the racks. This assumption implies that the n… ▽ More The paper is devoted to the problem of erasure coding in distributed storage. We consider a model of storage that assumes that nodes are organized into equally sized groups, called racks, that within each group the nodes can communicate freely without taxing the system bandwidth, and that the only information transmission that counts is the one between the racks. This assumption implies that the nodes within each of the racks can collaborate before providing information to the failed node. The main emphasis of the paper is on code construction for this storage model. We present an explicit family of MDS array codes that support recovery of a single failed node from any number of helper racks using the minimum possible amount of inter-rack communication (such codes are said to provide optimal repair). The codes are constructed over finite fields of size comparable to the code length. We also derive a bound on the number of symbols accessed at helper nodes for the purposes of repair, and construct a code family that approaches this bound, while still maintaining the optimal repair property. Finally, we present a construction of scalar Reed-Solomon codes that support optimal repair for the rack-oriented storage model. △ Less

Submitted 14 January, 2019; originally announced January 2019.

Comments: 24 pages

arXiv:1810.07283 [pdf, ps, other]

Optimal locally private estimation under $\ell_p$ loss for $1\le p\le 2$

Authors: Min Ye, Alexander Barg

Abstract: We consider the minimax estimation problem of a discrete distribution with support size $k$ under locally differential privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $ε$ measures the privacy level of a privatization scheme. In our previous work (IEE… ▽ More We consider the minimax estimation problem of a discrete distribution with support size $k$ under locally differential privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $ε$ measures the privacy level of a privatization scheme. In our previous work (IEEE Trans. Inform. Theory, 2018), we proposed a family of new privatization schemes and the corresponding estimator. We also proved that our scheme and estimator are order optimal in the regime $e^ε \ll k$ under both $\ell_2^2$ (mean square) and $\ell_1$ loss. In this paper, we sharpen this result by showing asymptotic optimality of the proposed scheme under the $\ell_p^p$ loss for all $1\le p\le 2.$ More precisely, we show that for any $p\in[1,2]$ and any $k$ and $ε,$ the ratio between the worst-case $\ell_p^p$ estimation loss of our scheme and the optimal value approaches $1$ as the number of samples tends to infinity. The lower bound on the minimax risk of private estimation that we establish as a part of the proof is valid for any loss function $\ell_p^p, p\ge 1.$ △ Less

Submitted 16 October, 2018; originally announced October 2018.

Comments: This paper generalizes the optimality results of the preprint arXiv:1708.00059 from $ell_2$ to a broader class of loss functions. The new approach taken here also results in a much shorter proof

arXiv:1807.05473 [pdf, ps, other]

Codes with hierarchical locality from covering maps of curves

Authors: Sean Ballentine, Alexander Barg, Serge Vladuts

Abstract: Locally recoverable (LRC) codes provide ways of recovering erased coordinates of the codeword without having to access each of the remaining coordinates. A subfamily of LRC codes with hierarchical locality (H-LRC codes) provides added flexibility to the construction by introducing several tiers of recoverability for correcting different numbers of erasures. We present a general construction of cod… ▽ More Locally recoverable (LRC) codes provide ways of recovering erased coordinates of the codeword without having to access each of the remaining coordinates. A subfamily of LRC codes with hierarchical locality (H-LRC codes) provides added flexibility to the construction by introducing several tiers of recoverability for correcting different numbers of erasures. We present a general construction of codes with 2-level hierarchical locality from maps between algebraic curves and specialize it to several code families obtained from quotients of curves by a subgroup of the automorphism group, including rational, elliptic, Kummer, and Artin-Schreier curves. We further address the question of H-LRC codes with availability, and suggest a general construction of such codes from fiber products of curves. Detailed calculations of parameters for H-LRC codes with availability are performed for Reed-Solomon- and Hermitian-like code families. Finally, we construct asymptotically good families of H-LRC codes from curves related to the Garcia-Stichtenoth tower. △ Less

Submitted 2 April, 2019; v1 submitted 14 July, 2018; originally announced July 2018.

Comments: V2: We added a construction of codes with two-level hierarchical locality and availability in each of the levels. \\ V3: The section on locality and availability has been rewritten; a new section was added with two general families of codes with two-level locality and availability in each of the levels, Sec. VIII.E

arXiv:1805.01883 [pdf, ps, other]

The repair problem for Reed-Solomon codes: Optimal repair of single and multiple erasures, asymptotically optimal node size

Authors: Itzhak Tamo, Min Ye, Alexander Barg

Abstract: The repair problem in distributed storage addresses recovery of the data encoded using an erasure code, for instance, a Reed-Solomon (RS) code. We consider the problem of repairing a single node or multiple nodes in RS-coded storage systems using the smallest possible amount of inter-nodal communication. According to the cut-set bound, communication cost of repairing $h\ge 1$ failed nodes for an… ▽ More The repair problem in distributed storage addresses recovery of the data encoded using an erasure code, for instance, a Reed-Solomon (RS) code. We consider the problem of repairing a single node or multiple nodes in RS-coded storage systems using the smallest possible amount of inter-nodal communication. According to the cut-set bound, communication cost of repairing $h\ge 1$ failed nodes for an $(n,k=n-r)$ MDS code using $d$ helper nodes is at least $dhl/(d+h-k),$ where $l$ is the size of the node. Guruswami and Wootters (2016) initiated the study of efficient repair of RS codes, showing that they can be repaired using a smaller bandwidth than under the trivial approach. At the same time, their work as well as follow-up papers stopped short of constructing RS codes (or any scalar MDS codes) that meet the cut-set bound with equality. In this paper we construct families of RS codes that achieve the cutset bound for repair of one or several nodes. In the single-node case, we present RS codes of length $n$ over the field $\mathbb{F}_{q^l},l=\exp((1+o(1))n\log n)$ that meet the cut-set bound. We also prove an almost matching lower bound on $l$, showing that super-exponential scaling is both necessary and sufficient for scalar MDS codes to achieve the cut-set bound using linear repair schemes. For the case of multiple nodes, we construct a family of RS codes that achieve the cut-set bound universally for the repair of any $h=2,3,\dots$ failed nodes from any subset of $d$ helper nodes, $k\le d\le n-h.$ For a fixed number of parities $r$ the node size of the constructed codes is close to the smallest possible node size for codes with such properties. △ Less

Submitted 4 May, 2018; originally announced May 2018.

Comments: Submitted to IEEE Transactions on Information Theory. This paper is a journal version of the earlier conference papers. It contains a unified presentation of the results in arXiv:1706.00112 and arXiv:1710.07216 as well as some other results. Overall it presents a solution of the optimal repair problem for Reed-Solomon codes

arXiv:1802.00157 [pdf, ps, other]

Optimal LRC codes for all lenghts n <= q

Authors: Oleg Kolosov, Alexander Barg, Itzhak Tamo, Gala Yadgar

Abstract: A family of distance-optimal LRC codes from certain subcodes of $q$-ary Reed-Solomon codes, proposed by I.~Tamo and A.~Barg in 2014, assumes that the code length $n$ is a multiple of $r+1.$ By shortening codes from this family, we show that it is possible to lift this assumption, still obtaining distance-optimal codes. A family of distance-optimal LRC codes from certain subcodes of $q$-ary Reed-Solomon codes, proposed by I.~Tamo and A.~Barg in 2014, assumes that the code length $n$ is a multiple of $r+1.$ By shortening codes from this family, we show that it is possible to lift this assumption, still obtaining distance-optimal codes. △ Less

Submitted 1 February, 2018; originally announced February 2018.

arXiv:1801.09665 [pdf, ps, other]

Cooperative repair: Constructions of optimal MDS codes for all admissible parameters

Authors: Min Ye, Alexander Barg

Abstract: Two widely studied models of multiple-node repair in distributed storage systems are centralized repair and cooperative repair. The centralized model assumes that all the failed nodes are recreated in one location, while the cooperative one stipulates that the failed nodes may communicate but are distinct, and the amount of data exchanged between them is included in the repair bandwidth. As our… ▽ More Two widely studied models of multiple-node repair in distributed storage systems are centralized repair and cooperative repair. The centralized model assumes that all the failed nodes are recreated in one location, while the cooperative one stipulates that the failed nodes may communicate but are distinct, and the amount of data exchanged between them is included in the repair bandwidth. As our first result, we prove a lower bound on the minimum bandwidth of cooperative repair. We also show that the cooperative model is stronger than the centralized one, in the sense that any MDS code with optimal repair bandwidth under the former model also has optimal bandwidth under the latter one. These results were previously known under the additional "uniform download" assumption, which is removed in our proofs. As our main result, we give explicit constructions of MDS codes with optimal cooperative repair for all possible parameters. More precisely, given any $n,k,h,d$ such that $2\le h \le n-d\le n-k$ we construct $(n,k)$ MDS codes over the field $F$ of size $|F|\ge (d+1-k)n$ that can optimally repair any $h$ erasures from any $d$ helper nodes. The repair scheme of our codes involves two rounds of communication. In the first round, each failed node downloads information from the helper nodes, and in the second one, each failed node downloads additional information from the other failed nodes. This implies that our codes achieve the optimal repair bandwidth using the smallest possible number of rounds. △ Less

Submitted 10 July, 2018; v1 submitted 29 January, 2018; originally announced January 2018.

Comments: In this version we explicitly state a lower bound on the minimum bandwidth of cooperative repair which was previously known under the uniform download assumption. This assumption is removed in our proof

arXiv:1710.07216 [pdf, ps, other]

Repairing Reed-Solomon codes: Universally achieving the cut-set bound for any number of erasures

Authors: Min Ye, Alexander Barg

Abstract: The repair bandwidth of a code is the minimum amount of data required to repair one or several failed nodes (erasures). For MDS codes, the repair bandwidth is bounded below by the so-called cut-set bound, and codes that meet this bound with equality are said to support optimal repair of one or multiple failed nodes. We consider the problem of repairing multiple failed nodes of Reed-Solomon (RS)… ▽ More The repair bandwidth of a code is the minimum amount of data required to repair one or several failed nodes (erasures). For MDS codes, the repair bandwidth is bounded below by the so-called cut-set bound, and codes that meet this bound with equality are said to support optimal repair of one or multiple failed nodes. We consider the problem of repairing multiple failed nodes of Reed-Solomon (RS) codes. In a recent work with I. Tamo (Proc. IEEE FOCS 2017), we gave the first explicit construction of RS codes with optimal repair of any single failed node from any subset of helper nodes. In this paper, we construct explicit RS codes that universally achieve the cut-set bound for the repair of any number of failed nodes from any set of helper nodes. Moreover, the node size of our codes is close to the optimal (smallest possible) node size of codes with such property. △ Less

Submitted 17 October, 2017; originally announced October 2017.

Comments: arXiv admin note: text overlap with arXiv:1706.00112

arXiv:1708.00059 [pdf, ps, other]

Asymptotically optimal private estimation under mean square loss

Authors: Min Ye, Alexander Barg

Abstract: We consider the minimax estimation problem of a discrete distribution with support size $k$ under locally differential privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $ε$ measures the privacy level of a privatization scheme. In our previous work (arX… ▽ More We consider the minimax estimation problem of a discrete distribution with support size $k$ under locally differential privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $ε$ measures the privacy level of a privatization scheme. In our previous work (arXiv:1702.00610), we proposed a family of new privatization schemes and the corresponding estimator. We also proved that our scheme and estimator are order optimal in the regime $e^ε \ll k$ under both $\ell_2^2$ and $\ell_1$ loss. In other words, for a large number of samples the worst-case estimation loss of our scheme was shown to differ from the optimal value by at most a constant factor. In this paper, we eliminate this gap by showing asymptotic optimality of the proposed scheme and estimator under the $\ell_2^2$ (mean square) loss. More precisely, we show that for any $k$ and $ε,$ the ratio between the worst-case estimation loss of our scheme and the optimal value approaches $1$ as the number of samples tends to infinity. △ Less

Submitted 31 July, 2017; originally announced August 2017.

arXiv:1706.00112 [pdf, ps, other]

Optimal repair of Reed-Solomon codes: Achieving the cut-set bound

Authors: Itzhak Tamo, Min Ye, Alexander Barg

Abstract: Coding for distributed storage gives rise to a new set of problems in coding theory related to the need of reducing inter-node communication in the system. A large number of recent papers addressed the problem of optimizing the total amount of information downloaded for repair of a single failed node (the repair bandwidth) by accessing information on $d$ {\em helper nodes}, where $k\le d\le n-1.$… ▽ More Coding for distributed storage gives rise to a new set of problems in coding theory related to the need of reducing inter-node communication in the system. A large number of recent papers addressed the problem of optimizing the total amount of information downloaded for repair of a single failed node (the repair bandwidth) by accessing information on $d$ {\em helper nodes}, where $k\le d\le n-1.$ By the so-called cut-set bound (Dimakis et al., 2010), the repair bandwidth of an $(n,k=n-r)$ MDS code using $d$ helper nodes is at least $dl/(d+1-k),$ where $l$ is the size of the node. Also, a number of known constructions of MDS array codes meet this bound with equality. In a related but separate line of work, Guruswami and Wootters (2016) studied repair of Reed-Solomon (RS) codes, showing that these codes can be repaired using a smaller bandwidth than under the trivial approach. At the same time, their work as well as follow-up papers stopped short of constructing RS codes (or any scalar MDS codes) that meet the cut-set bound with equality, which has been an open problem in coding theory. In this work we present a solution to this problem, constructing RS codes of length $n$ over the field $q^l, l=\exp((1+o(1))n\log n)$ that meet the cut-set bound. We also prove an almost matching lower bound on $l$, showing that the super-exponential scaling is both necessary and sufficient for achieving the cut-set bound using linear repair schemes. More precisely, we prove that for scalar MDS codes (including the RS codes) to meet this bound, the sub-packetization $l$ must satisfy $l \ge \exp((1+o(1)) k\log k).$ △ Less

Submitted 31 May, 2017; originally announced June 2017.

arXiv:1702.03786 [pdf, other]

A Study on the Impact of Locality in the Decoding of Binary Cyclic Codes

Authors: M. Nikhil Krishnan, Bhagyashree Puranik, P. Vijay Kumar, Itzhak Tamo, Alexander Barg

Abstract: In this paper, we study the impact of locality on the decoding of binary cyclic codes under two approaches, namely ordered statistics decoding (OSD) and trellis decoding. Given a binary cyclic code having locality or availability, we suitably modify the OSD to obtain gains in terms of the Signal-To-Noise ratio, for a given reliability and essentially the same level of decoder complexity. With rega… ▽ More In this paper, we study the impact of locality on the decoding of binary cyclic codes under two approaches, namely ordered statistics decoding (OSD) and trellis decoding. Given a binary cyclic code having locality or availability, we suitably modify the OSD to obtain gains in terms of the Signal-To-Noise ratio, for a given reliability and essentially the same level of decoder complexity. With regard to trellis decoding, we show that careful introduction of locality results in the creation of cyclic subcodes having lower maximum state complexity. We also present a simple upper-bounding technique on the state complexity profile, based on the zeros of the code. Finally, it is shown how the decoding speed can be significantly increased in the presence of locality, in the moderate-to-high SNR regime, by making use of a quick-look decoder that often returns the ML codeword. △ Less

Submitted 13 February, 2017; originally announced February 2017.

Comments: Extended version of a paper submitted to ISIT 2017

arXiv:1702.02685 [pdf, other]

doi 10.1109/TIT.2018.2800042

Combinatorial Alphabet-Dependent Bounds for Locally Recoverable Codes

Authors: Abhishek Agarwal, Alexander Barg, Sihuang Hu, Arya Mazumdar, Itzhak Tamo

Abstract: Locally recoverable (LRC) codes have recently been a focus point of research in coding theory due to their theoretical appeal and applications in distributed storage systems. In an LRC code, any erased symbol of a codeword can be recovered by accessing only a small number of other symbols. For LRC codes over a small alphabet (such as binary), the optimal rate-distance trade-off is unknown. We pres… ▽ More Locally recoverable (LRC) codes have recently been a focus point of research in coding theory due to their theoretical appeal and applications in distributed storage systems. In an LRC code, any erased symbol of a codeword can be recovered by accessing only a small number of other symbols. For LRC codes over a small alphabet (such as binary), the optimal rate-distance trade-off is unknown. We present several new combinatorial bounds on LRC codes including the locality-aware sphere packing and Plotkin bounds. We also develop an approach to linear programming (LP) bounds on LRC codes. The resulting LP bound gives better estimates in examples than the other upper bounds known in the literature. Further, we provide the tightest known upper bound on the rate of linear LRC codes with a given relative distance, an improvement over the previous best known bounds. △ Less

Submitted 25 January, 2018; v1 submitted 8 February, 2017; originally announced February 2017.

Comments: To appear in IEEE Transactions on Information Theory

Journal ref: IEEE Transactions on Information Theory, 64 (2018), 3481-3492

arXiv:1702.00610 [pdf, ps, other]

Optimal Schemes for Discrete Distribution Estimation under Locally Differential Privacy

Authors: Min Ye, Alexander Barg

Abstract: We consider the minimax estimation problem of a discrete distribution with support size $k$ under privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $ε$ measures the privacy level of a privatization scheme. For a given $ε,$ we consider the problem of cons… ▽ More We consider the minimax estimation problem of a discrete distribution with support size $k$ under privacy constraints. A privatization scheme is applied to each raw sample independently, and we need to estimate the distribution of the raw samples from the privatized samples. A positive number $ε$ measures the privacy level of a privatization scheme. For a given $ε,$ we consider the problem of constructing optimal privatization schemes with $ε$-privacy level, i.e., schemes that minimize the expected estimation loss for the worst-case distribution. Two schemes in the literature provide order optimal performance in the high privacy regime where $ε$ is very close to $0,$ and in the low privacy regime where $e^ε\approx k,$ respectively. In this paper, we propose a new family of schemes which substantially improve the performance of the existing schemes in the medium privacy regime when $1\ll e^ε \ll k.$ More concretely, we prove that when $3.8 < ε<\ln(k/9) ,$ our schemes reduce the expected estimation loss by $50\%$ under $\ell_2^2$ metric and by $30\%$ under $\ell_1$ metric over the existing schemes. We also prove a lower bound for the region $e^ε \ll k,$ which implies that our schemes are order optimal in this regime. △ Less

Submitted 2 February, 2017; originally announced February 2017.

Comments: A short version is submitted to ISIT 2017

arXiv:1701.06969 [pdf, ps, other]

Error correction based on partial information

Authors: Itzhak Tamo, Min Ye, Alexander Barg

Abstract: We consider the decoding of linear and array codes from errors when we are only allowed to download a part of the codeword. More specifically, suppose that we have encoded $k$ data symbols using an $(n,k)$ code with code length $n$ and dimension $k.$ During storage, some of the codeword coordinates might be corrupted by errors. We aim to recover the original data by reading the corrupted codeword… ▽ More We consider the decoding of linear and array codes from errors when we are only allowed to download a part of the codeword. More specifically, suppose that we have encoded $k$ data symbols using an $(n,k)$ code with code length $n$ and dimension $k.$ During storage, some of the codeword coordinates might be corrupted by errors. We aim to recover the original data by reading the corrupted codeword with a limit on the transmitting bandwidth, namely, we can only download an $α$ proportion of the corrupted codeword. For a given $α,$ our objective is to design a code and a decoding scheme such that we can recover the original data from the largest possible number of errors. A naive scheme is to read $αn$ coordinates of the codeword. This method used in conjunction with MDS codes guarantees recovery from any $\lfloor(αn-k)/2\rfloor$ errors. In this paper we show that we can instead read an $α$ proportion from each of the codeword's coordinates. For a well-designed MDS code, this method can guarantee recovery from $\lfloor (n-k/α)/2 \rfloor$ errors, which is $1/α$ times more than the naive method, and is also the maximum number of errors that an $(n,k)$ code can correct by downloading only an $α$ proportion of the codeword. We present two families of such optimal constructions and decoding schemes. One is a Reed-Solomon code with evaluation points in a subfield and the other is based on Folded Reed-Solomon codes. We further show that both code constructions attain asymptotically optimal list decoding radius when downloading only a part of the corrupted codeword. We also construct an ensemble of random codes that with high probability approaches the upper bound on the number of correctable errors when the decoder downloads an $α$ proportion of the corrupted codeword. △ Less

Submitted 8 October, 2018; v1 submitted 24 January, 2017; originally announced January 2017.

Comments: Extended version of the conference paper in ISIT 2017

arXiv:1701.05212 [pdf, other]

doi 10.1007/978-3-319-63931-4_4

Locally recoverable codes from algebraic curves and surfaces

Authors: Alexander Barg, Kathryn Haymaker, Everett W. Howe, Gretchen L. Matthews, Anthony Várilly-Alvarado

Abstract: A locally recoverable code is a code over a finite alphabet such that the value of any single coordinate of a codeword can be recovered from the values of a small subset of other coordinates. Building on work of Barg, Tamo, and Vlăduţ, we present several constructions of locally recoverable codes from algebraic curves and surfaces. A locally recoverable code is a code over a finite alphabet such that the value of any single coordinate of a codeword can be recovered from the values of a small subset of other coordinates. Building on work of Barg, Tamo, and Vlăduţ, we present several constructions of locally recoverable codes from algebraic curves and surfaces. △ Less

Submitted 18 January, 2017; originally announced January 2017.

Comments: 27 pages

MSC Class: 94B27; 14G50

Journal ref: pp. 95-127 in: Algebraic Geometry for Coding Theory and Cryptography (E. W. Howe, K. E. Lauter, and J. L. Walker, eds.), Springer, Cham, 2017

arXiv:1605.08630 [pdf, ps, other]

Explicit constructions of optimal-access MDS codes with nearly optimal sub-packetization

Authors: Min Ye, Alexander Barg

Abstract: An $(n,k,l)$ MDS array code of length $n,$ dimension $k=n-r$ and sub-packetization $l$ is formed of $l\times n$ matrices over a finite field $F,$ with every column of the matrix stored on a separate node in a distributed storage system and viewed as a coordinate of the codeword. Repair of a failed node can be performed by accessing a set of $d\le n-1$ helper nodes. The code is said to have the opt… ▽ More An $(n,k,l)$ MDS array code of length $n,$ dimension $k=n-r$ and sub-packetization $l$ is formed of $l\times n$ matrices over a finite field $F,$ with every column of the matrix stored on a separate node in a distributed storage system and viewed as a coordinate of the codeword. Repair of a failed node can be performed by accessing a set of $d\le n-1$ helper nodes. The code is said to have the optimal access property if the amount of data accessed at each of the helper nodes meets a lower bound on this quantity. For optimal-access MDS codes with $d=n-1,$ the sub-packetization $l$ satisfies the bound $l\ge r^{(k-1)/r}.$ In our previous work, for any $n$ and $r,$ we presented an explicit construction of optimal-access MDS codes with sub-packetization $l=r^{n-1}.$ In this paper we take up the question of reducing the sub-packetization value $l$ to make it approach the lower bound. We construct an explicit family of optimal-access codes with $l=r^{\lceil n/r\rceil},$ which differs from the optimal value by at most a factor of $r^2.$ These codes can be constructed over any finite field $F$ as long as $|F|\ge r\lceil n/r\rceil,$ and afford low-complexity encoding and decoding procedures. We also define a version of the repair problem that bridges the context of regenerating codes and codes with locality constraints (LRC codes), calling it group repair with optimal access. In this variation, we assume that the set of $n=sm$ nodes is partitioned into $m$ repair groups of size $s,$ and require that the amount of accessed data for repair is the smallest possible whenever the $d$ helper nodes include all the other $s-1$ nodes from the same group as the failed node. For this problem, we construct a family of codes with the group optimal access property. These codes can be constructed over any field $F$ of size $|F|\ge n,$ and also afford low-complexity encoding and decoding procedures. △ Less

Submitted 27 July, 2017; v1 submitted 27 May, 2016; originally announced May 2016.

Comments: Minor changes from the previous version

arXiv:1604.00454 [pdf, ps, other]

Explicit constructions of high-rate MDS array codes with optimal repair bandwidth

Authors: Min Ye, Alexander Barg

Abstract: Maximum distance separable (MDS) codes are optimal error-correcting codes in the sense that they provide the maximum failure-tolerance for a given number of parity nodes. Suppose that an MDS code with $k$ information nodes and $r=n-k$ parity nodes is used to encode data in a distributed storage system. It is known that if $h$ out of the $n$ nodes are inaccessible and $d$ surviving (helper) nodes a… ▽ More Maximum distance separable (MDS) codes are optimal error-correcting codes in the sense that they provide the maximum failure-tolerance for a given number of parity nodes. Suppose that an MDS code with $k$ information nodes and $r=n-k$ parity nodes is used to encode data in a distributed storage system. It is known that if $h$ out of the $n$ nodes are inaccessible and $d$ surviving (helper) nodes are used to recover the lost data, then we need to download at least $h/(d+h-k)$ fraction of the data stored in each of the helper nodes (Dimakis et. al., 2010 and Cadambe et al., 2013). If this lower bound is achieved for the repair of any $h$ erased nodes from any $d$ helper nodes, we say that the MDS code has the $(h,d)$-optimal repair property. We study high-rate MDS array codes with the optimal repair property. Explicit constructions of such codes in the literature are only available for the cases where there are at most 3 parity nodes, and these existing constructions can only optimally repair a single node failure by accessing all the surviving nodes. In this paper, given any $r$ and $n$, we present two explicit constructions of MDS array codes with the $(h,d)$-optimal repair property for all $h\le r$ and $k\le d\le n-h$ simultaneously. Codes in the first family can be constructed over any base field $F$ as long as $|F|\ge sn,$ where $s=\text{lcm}(1,2,\dots,r).$ The encoding, decoding, repair of failed nodes, and update procedures of these codes all have low complexity. Codes in the second family have the optimal access property and can be constructed over any base field $F$ as long as $|F|\ge n+1.$ Moreover, both code families have the optimal error resilience capability when repairing failed nodes. We also construct several other related families of MDS codes with the optimal repair property. △ Less

Submitted 8 April, 2016; v1 submitted 1 April, 2016; originally announced April 2016.

Comments: 19pp., submitted for publication. This version contains a few additional references

MSC Class: 94B60

arXiv:1603.08878 [pdf, other]

doi 10.1504/IJICOT.2016.079496

Cyclic LRC Codes, binary LRC codes, and upper bounds on the distance of cyclic codes

Authors: Itzhak Tamo, Alexander Barg, Sreechakra Goparaju, Robert Calderbank

Abstract: We consider linear cyclic codes with the locality property, or locally recoverable codes (LRC codes). A family of LRC codes that generalize the classical construction of Reed-Solomon codes was constructed in a recent paper by I. Tamo and A. Barg (IEEE Trans. Inform. Theory, no. 8, 2014). In this paper we focus on optimal cyclic codes that arise from this construction. We give a characterization of… ▽ More We consider linear cyclic codes with the locality property, or locally recoverable codes (LRC codes). A family of LRC codes that generalize the classical construction of Reed-Solomon codes was constructed in a recent paper by I. Tamo and A. Barg (IEEE Trans. Inform. Theory, no. 8, 2014). In this paper we focus on optimal cyclic codes that arise from this construction. We give a characterization of these codes in terms of their zeros, and observe that there are many equivalent ways of constructing optimal cyclic LRC codes over a given field. We also study subfield subcodes of cyclic LRC codes (BCH-like LRC codes) and establish several results about their locality and minimum distance. The locality parameter of a cyclic code is related to the dual distance of this code, and we phrase our results in terms of upper bounds on the dual distance. △ Less

Submitted 29 March, 2016; originally announced March 2016.

Comments: 12pp., submitted for publication. An extended abstract of this submission was posted earlier as arXiv:1502.01414 and was published in Proceedings of the 2015 IEEE International Symposium on Information Theory, Hong Kong, China, June 14-19, 2015, pp. 1262--1266

MSC Class: 94B15

Journal ref: International Journal of Information and Coding Theory, vol. 3, no. 4, pp.345-364 (2016)

arXiv:1603.08876 [pdf, other]

doi 10.1109/ISIT.2015.7282656

Locally recoverable codes on algebraic curves

Authors: Alexander Barg, Itzhak Tamo, Serge Vladuts

Abstract: A code over a finite alphabet is called locally recoverable (LRC code) if every symbol in the encoding is a function of a small number (at most $r$) other symbols of the codeword. In this paper we introduce a construction of LRC codes on algebraic curves, extending a recent construction of Reed-Solomon like codes with locality. We treat the following situations: local recovery of a single erasure,… ▽ More A code over a finite alphabet is called locally recoverable (LRC code) if every symbol in the encoding is a function of a small number (at most $r$) other symbols of the codeword. In this paper we introduce a construction of LRC codes on algebraic curves, extending a recent construction of Reed-Solomon like codes with locality. We treat the following situations: local recovery of a single erasure, local recovery of multiple erasures, and codes with several disjoint recovery sets for every coordinate (the {\em availability problem}). For each of these three problems we describe a general construction of codes on curves and construct several families of LRC codes. We also describe a construction of codes with availability that relies on automorphism groups of curves. △ Less

Submitted 29 March, 2016; originally announced March 2016.

Comments: 16pp. An extended abstract of this submission was posted earlier as arXiv:1501.04904 and was published in Proceedings of the 2015 IEEE International Symposium on Information Theory, Hong Kong, China, June 14-19, 2015, pp. 1252--1256

MSC Class: 94B27; 14G50

arXiv:1603.05736 [pdf, other]

Construction of polar codes for arbitrary discrete memoryless channels

Authors: Talha Cihad Gulcu, Min Ye, Alexander Barg

Abstract: It is known that polar codes can be efficiently constructed for binary-input channels. At the same time, existing algorithms for general input alphabets are less practical because of high complexity. We address the construction problem for the general case, and analyze an algorithm that is based on successive reduction of the output alphabet size of the subchannels in each recursion step. For this… ▽ More It is known that polar codes can be efficiently constructed for binary-input channels. At the same time, existing algorithms for general input alphabets are less practical because of high complexity. We address the construction problem for the general case, and analyze an algorithm that is based on successive reduction of the output alphabet size of the subchannels in each recursion step. For this procedure we estimate the approximation error as $O(μ^{-1/(q-1)}),$ where $μ$ is the "quantization parameter," i.e., the maximum size of the subchannel output alphabet allowed by the algorithm. The complexity of the code construction scales as $O(Nμ^4),$ where $N$ is the length of the code. We also show that if the polarizing operation relies on modulo-$q$ addition, it is possible to merge subsets of output symbols without any loss in subchannel capacity. Performing this procedure before each approximation step results in a further speed-up of the code construction, and the resulting codes have smaller gap to capacity. We show that a similar acceleration can be attained for polar codes over finite field alphabets. Experimentation shows that the suggested construction algorithms can be used to construct long polar codes for alphabets of size $q=16$ and more with acceptable loss of the code rate for a variety of polarizing transforms. △ Less

Submitted 29 September, 2017; v1 submitted 17 March, 2016; originally announced March 2016.

Comments: The results are unchanged, minor changes in the proofs

arXiv:1510.02873 [pdf, ps, other]

Group testing schemes from codes and designs

Authors: Alexander Barg, Arya Mazumdar

Abstract: In group testing, simple binary-output tests are designed to identify a small number $t$ of defective items that are present in a large population of $N$ items. Each test takes as input a group of items and produces a binary output indicating whether the group is free of the defective items or contains one or more of them. In this paper we study a relaxation of the combinatorial group testing prob… ▽ More In group testing, simple binary-output tests are designed to identify a small number $t$ of defective items that are present in a large population of $N$ items. Each test takes as input a group of items and produces a binary output indicating whether the group is free of the defective items or contains one or more of them. In this paper we study a relaxation of the combinatorial group testing problem. A matrix is called $(t,ε)$-disjunct if it gives rise to a nonadaptive group testing scheme with the property of identifying a uniformly random $t$-set of defective subjects out of a population of size $N$ with false positive probability of an item at most $ε$. We establish a new connection between $(t,ε)$-disjunct matrices and error correcting codes based on the dual distance of the codes and derive estimates of the parameters of codes that give rise to such schemes. Our methods rely on the moments of the distance distribution of codes and inequalities for moments of sums of independent random variables. We also provide a new connection between group testing schemes and combinatorial designs. △ Less

Submitted 8 April, 2017; v1 submitted 10 October, 2015; originally announced October 2015.

Comments: 18 pages

MSC Class: 68Q25; 94C30

arXiv:1506.07196 [pdf, other]

Bounds on the Parameters of Locally Recoverable Codes

Authors: Itzhak Tamo, Alexander Barg, Alexey Frolov

Abstract: A locally recoverable code (LRC code) is a code over a finite alphabet such that every symbol in the encoding is a function of a small number of other symbols that form a recovering set. In this paper we derive new finite-length and asymptotic bounds on the parameters of LRC codes. For LRC codes with a single recovering set for every coordinate, we derive an asymptotic Gilbert-Varshamov type bound… ▽ More A locally recoverable code (LRC code) is a code over a finite alphabet such that every symbol in the encoding is a function of a small number of other symbols that form a recovering set. In this paper we derive new finite-length and asymptotic bounds on the parameters of LRC codes. For LRC codes with a single recovering set for every coordinate, we derive an asymptotic Gilbert-Varshamov type bound for LRC codes and find the maximum attainable relative distance of asymptotically good LRC codes. Similar results are established for LRC codes with two disjoint recovering sets for every coordinate. For the case of multiple recovering sets we derive a lower bound on the parameters using expander graph arguments. Finally, we also derive finite-length upper bounds on the rate and distance of LRC codes with multiple recovering sets. △ Less

Submitted 9 March, 2016; v1 submitted 23 June, 2015; originally announced June 2015.

Comments: To appear in IEEE Transaction on Information Theory. A part of the results of this paper were presented at the 2014 IEEE International Symposium on Information Theory, July 2014, Honululu

arXiv:1506.06345 [pdf, ps, other]

doi 10.1109/TIT.2015.2448658

Restricted isometry property of random subdictionaries

Authors: Alexander Barg, Arya Mazumdar, Rongrong Wang

Abstract: We study statistical restricted isometry, a property closely related to sparse signal recovery, of deterministic sensing matrices of size $m \times N$. A matrix is said to have a statistical restricted isometry property (StRIP) of order $k$ if most submatrices with $k$ columns define a near-isometric map of ${\mathbb R}^k$ into ${\mathbb R}^m$. As our main result, we establish sufficient condition… ▽ More We study statistical restricted isometry, a property closely related to sparse signal recovery, of deterministic sensing matrices of size $m \times N$. A matrix is said to have a statistical restricted isometry property (StRIP) of order $k$ if most submatrices with $k$ columns define a near-isometric map of ${\mathbb R}^k$ into ${\mathbb R}^m$. As our main result, we establish sufficient conditions for the StRIP property of a matrix in terms of the mutual coherence and mean square coherence. We show that for many existing deterministic families of sampling matrices, $m=O(k)$ rows suffice for $k$-StRIP, which is an improvement over the known estimates of either $m = Θ(k \log N)$ or $m = Θ(k\log k)$. We also give examples of matrix families that are shown to have the StRIP property using our sufficient conditions. △ Less

Submitted 21 June, 2015; originally announced June 2015.

Comments: To appear in the IEEE Transactions on Information Theory, 2015. A detailed draft which is a predecessor of this paper appears as arXiv:1303.1847

arXiv:1502.01414 [pdf, other]

Cyclic LRC Codes and their Subfield Subcodes

Authors: Itzhak Tamo, Alexander Barg, Sreechakra Goparaju, Robert Calderbank

Abstract: We consider linear cyclic codes with the locality property, or locally recoverable codes (LRC codes). A family of LRC codes that generalizes the classical construction of Reed-Solomon codes was constructed in a recent paper by I. Tamo and A. Barg (IEEE Transactions on Information Theory, no. 8, 2014; arXiv:1311.3284). In this paper we focus on the optimal cyclic codes that arise from the general c… ▽ More We consider linear cyclic codes with the locality property, or locally recoverable codes (LRC codes). A family of LRC codes that generalizes the classical construction of Reed-Solomon codes was constructed in a recent paper by I. Tamo and A. Barg (IEEE Transactions on Information Theory, no. 8, 2014; arXiv:1311.3284). In this paper we focus on the optimal cyclic codes that arise from the general construction. We give a characterization of these codes in terms of their zeros, and observe that there are many equivalent ways of constructing optimal cyclic LRC codes over a given field. We also study subfield subcodes of cyclic LRC codes (BCH-like LRC codes) and establish several results about their locality and minimum distance. △ Less

Submitted 4 February, 2015; originally announced February 2015.

Comments: Submitted for publication

MSC Class: 94B15

arXiv:1501.04904 [pdf, ps, other]

Locally recoverable codes on algebraic curves

Authors: Alexander Barg, Itzhak Tamo, Serge Vladut

Abstract: A code over a finite alphabet is called locally recoverable (LRC code) if every symbol in the encoding is a function of a small number (at most r) other symbols. A family of linear LRC codes that generalize the classic construction of Reed-Solomon codes was constructed in a recent paper by I. Tamo and A. Barg. In this paper we extend this construction to codes on algebraic curves. We give a genera… ▽ More A code over a finite alphabet is called locally recoverable (LRC code) if every symbol in the encoding is a function of a small number (at most r) other symbols. A family of linear LRC codes that generalize the classic construction of Reed-Solomon codes was constructed in a recent paper by I. Tamo and A. Barg. In this paper we extend this construction to codes on algebraic curves. We give a general construction of LRC codes on curves and compute some examples, including asymptotically good families of codes derived from the Garcia- Stichtenoth towers. The local recovery procedure is performed by polynomial interpolation over r coordinates of the codevector. We also obtain a family of Hermitian codes with two disjoint recovering sets for every symbol of the codeword. △ Less

Submitted 10 May, 2015; v1 submitted 20 January, 2015; originally announced January 2015.

Comments: Will appear at ISIT 2015

arXiv:1410.3422 [pdf, other]

Achieving Secrecy Capacity of the Wiretap Channel and Broadcast Channel with a Confidential Component

Authors: Talha Cihad Gulcu, Alexander Barg

Abstract: The wiretap channel model of Wyner is one of the first communication models with both reliability and security constraints. Capacity-achieving schemes for various models of the wiretap channel have received considerable attention in recent literature. In this paper, we show that capacity of the general (not necessarily degraded or symmetric) wiretap channel under a "strong secrecy constraint" can… ▽ More The wiretap channel model of Wyner is one of the first communication models with both reliability and security constraints. Capacity-achieving schemes for various models of the wiretap channel have received considerable attention in recent literature. In this paper, we show that capacity of the general (not necessarily degraded or symmetric) wiretap channel under a "strong secrecy constraint" can be achieved using a transmission scheme based on polar codes. We also extend our construction to the case of broadcast channels with confidential messages defined by Csisz{á}r and K{örner}, achieving the entire capacity region of this communication model. △ Less

Submitted 3 November, 2016; v1 submitted 13 October, 2014; originally announced October 2014.

Comments: to appear in IEEE Trans Inform. Theory

arXiv:1408.6824 [pdf, ps, other]

Universal Source Polarization and an Application to a Multi-User Problem

Authors: Min Ye, Alexander Barg

Abstract: We propose a scheme that universally achieves the smallest possible compression rate for a class of sources with side information, and develop an application of this result for a joint source channel coding problem over a broadcast channel. We propose a scheme that universally achieves the smallest possible compression rate for a class of sources with side information, and develop an application of this result for a joint source channel coding problem over a broadcast channel. △ Less

Submitted 26 September, 2014; v1 submitted 28 August, 2014; originally announced August 2014.

Comments: to be presented at Allerton 2014

arXiv:1405.0894 [pdf, other]

Interactive Function Computation via Polar Coding

Authors: Talha Cihad Gulcu, Alexander Barg

Abstract: In a series of papers N. Ma and P. Ishwar (2011-13) considered a range of distributed source coding problems that arise in the context of iterative computation of functions, characterizing the region of achievable communication rates. We consider the problems of interactive computation of functions by two terminals and interactive computation in a collocated network, showing that the rate regions… ▽ More In a series of papers N. Ma and P. Ishwar (2011-13) considered a range of distributed source coding problems that arise in the context of iterative computation of functions, characterizing the region of achievable communication rates. We consider the problems of interactive computation of functions by two terminals and interactive computation in a collocated network, showing that the rate regions for both these problems can be achieved using several rounds of polar-coded transmissions. △ Less

Submitted 31 October, 2015; v1 submitted 5 May, 2014; originally announced May 2014.

Comments: to appear at Problems of Information Transmission

arXiv:1404.5501 [pdf, ps, other]

doi 10.3934/amc.2015.9.87

Polar Codes for Distributed Hierarchical Source Coding

Authors: Min Ye, Alexander Barg

Abstract: We show that polar codes can be used to achieve the rate-distortion functions in the problem of hierarchical source coding also known as the successive refinement problem. We also analyze the distributed version of this problem, constructing a polar coding scheme that achieves the rate distortion functions for successive refinement with side information. We show that polar codes can be used to achieve the rate-distortion functions in the problem of hierarchical source coding also known as the successive refinement problem. We also analyze the distributed version of this problem, constructing a polar coding scheme that achieves the rate distortion functions for successive refinement with side information. △ Less

Submitted 22 April, 2014; originally announced April 2014.

Comments: 14 pages

Journal ref: Advances in Mathematics of Communication, 9, no.1, 2015, 87-103

arXiv:1402.0916 [pdf, ps, other]

Bounds on Locally Recoverable Codes with Multiple Recovering Sets

Authors: Itzhak Tamo, Alexander Barg

Abstract: A locally recoverable code (LRC code) is a code over a finite alphabet such that every symbol in the encoding is a function of a small number of other symbols that form a recovering set. Bounds on the rate and distance of such codes have been extensively studied in the literature. In this paper we derive upper bounds on the rate and distance of codes in which every symbol has $t\geq 1$ disjoint re… ▽ More A locally recoverable code (LRC code) is a code over a finite alphabet such that every symbol in the encoding is a function of a small number of other symbols that form a recovering set. Bounds on the rate and distance of such codes have been extensively studied in the literature. In this paper we derive upper bounds on the rate and distance of codes in which every symbol has $t\geq 1$ disjoint recovering sets. △ Less

Submitted 4 February, 2014; originally announced February 2014.

Comments: Submitted to ISIT 2014

arXiv:1311.3284 [pdf, ps, other]

doi 10.1109/TIT.2014.2321280

A family of optimal locally recoverable codes

Authors: Itzhak Tamo, Alexander Barg

Abstract: A code over a finite alphabet is called locally recoverable (LRC) if every symbol in the encoding is a function of a small number (at most $r$) other symbols. We present a family of LRC codes that attain the maximum possible value of the distance for a given locality parameter and code cardinality. The codewords are obtained as evaluations of specially constructed polynomials over a finite field,… ▽ More A code over a finite alphabet is called locally recoverable (LRC) if every symbol in the encoding is a function of a small number (at most $r$) other symbols. We present a family of LRC codes that attain the maximum possible value of the distance for a given locality parameter and code cardinality. The codewords are obtained as evaluations of specially constructed polynomials over a finite field, and reduce to a Reed-Solomon code if the locality parameter $r$ is set to be equal to the code dimension. The size of the code alphabet for most parameters is only slightly greater than the code length. The recovery procedure is performed by polynomial interpolation over $r$ points. We also construct codes with several disjoint recovering sets for every symbol. This construction enables the system to conduct several independent and simultaneous recovery processes of a specific symbol by accessing different parts of the codeword. This property enables high availability of frequently accessed data ("hot data"). △ Less

Submitted 11 July, 2014; v1 submitted 13 November, 2013; originally announced November 2013.

Comments: Minor changes. This is the final published version of the paper

Journal ref: IEEE Transactions on Information Theory, vol. 60, no. 8, 2014

Showing 1–50 of 77 results for author: Barg, A