-
Achieving DNA Labeling Capacity with Minimum Labels through Extremal de Bruijn Subgraphs
Authors:
Christoph Hofmeister,
Anina Gruica,
Dganit Hanania,
Rawad Bitar,
Eitan Yaakobi
Abstract:
DNA labeling is a tool in molecular biology and biotechnology to visualize, detect, and study DNA at the molecular level. In this process, a DNA molecule is labeled by a set of specific patterns, referred to as labels, and is then imaged. The resulting image is modeled as an $(\ell+1)$-ary sequence, where $\ell$ is the number of labels, in which any non-zero symbol indicates the appearance of the corresponding label in the DNA molecule. The labeling capacity refers to the maximum information rate that can be achieved by the labeling process for any given set of labels. The main goal of this paper is to study the minimum number of labels of the same length required to achieve the maximum labeling capacity of 2 for DNA sequences or $\log_2 q$ for an arbitrary alphabet of size $q$. The solution to this problem requires the study of path-unique subgraphs of the de Bruijn graph with the largest number of edges, and we provide upper and lower bounds on this value.
Submitted 28 January, 2024;
originally announced January 2024.
-
Reducing Coverage Depth in DNA Storage: A Combinatorial Perspective on Random Access Efficiency
Authors:
Anina Gruica,
Daniella Bar-Lev,
Alberto Ravagnani,
Eitan Yaakobi
Abstract:
We investigate the fundamental limits of the recently proposed random access coverage depth problem for DNA data storage. Under this paradigm, it is assumed that the user information consists of $k$ information strands, which are encoded into $n$ strands via some generator matrix $G$. In the sequencing process, the strands are read uniformly at random, since each strand is available in a large number of copies. In this context, the random access coverage depth problem refers to the expected number of reads (i.e., sequenced strands) until it is possible to decode a specific information strand, which is requested by the user. The goal is to minimize the maximum expectation over all possible requested information strands, and this value is denoted by $T_{\max}(G)$. This paper introduces new techniques to investigate the random access coverage depth problem, which capture its combinatorial nature. We establish two general formulas to find $T_{\max}(G)$ for arbitrary matrices. We introduce the concept of recovery balanced codes and combine all these results and notions to compute $T_{\max}(G)$ for MDS, simplex, and Hamming codes. We also study the performance of modified systematic MDS matrices and our results show that the best results for $T_{\max}(G)$ are achieved with a specific mix of encoded strands and replication of the information strands.
Submitted 28 January, 2024;
originally announced January 2024.
-
Rank-Metric Codes and Their Parameters
Authors:
Anina Gruica,
Altan B. Kilic,
Alberto Ravagnani
Abstract:
We present the theory of linear rank-metric codes from the point of view of their fundamental parameters. These are: the minimum rank distance, the rank distribution, the maximum rank, the covering radius, and the field size. The focus of this chapter is on the interplay among these parameters and on their significance for the code's (combinatorial) structure. The results covered in this chapter span from the theory of optimal codes and anticodes to very recent developments on the asymptotic density of MRD codes.
Submitted 11 December, 2023;
originally announced December 2023.
-
LRCs: Duality, LP Bounds, and Field Size
Authors:
Anina Gruica,
Benjamin Jany,
Alberto Ravagnani
Abstract:
We develop a duality theory of locally recoverable codes (LRCs) and apply it to establish a series of new bounds on their parameters. We introduce and study a refined notion of weight distribution that captures the code's locality. Using a duality result analogous to a MacWilliams identity, we then derive an LP-type bound that improves on the best known bounds in several instances. Using a dual distance bound and the theory of generalized weights, we obtain non-existence results for optimal LRCs over small fields. In particular, we show that an optimal LRC must have both minimum distance and block length relatively small compared to the field size.
Submitted 7 September, 2023;
originally announced September 2023.
-
Duality and LP Bounds for Codes with Locality
Authors:
Anina Gruica,
Benjamin Jany,
Alberto Ravagnani
Abstract:
We initiate the study of the duality theory of locally recoverable codes, with a focus on the applications. We characterize the locality of a code in terms of the dual code, and introduce a class of invariants that refine the classical weight distribution. In this context, we establish a duality theorem analogous to (but very different from) a MacWilliams identity. As an application of our results, we obtain two new bounds for the parameters of a locally recoverable code, including an LP bound that improves on the best available bounds in several instances.
Submitted 15 December, 2022;
originally announced December 2022.
-
Rook Theory of the Etzion-Silberstein Conjecture
Authors:
Anina Gruica,
Alberto Ravagnani
Abstract:
In 2009, Etzion and Silberstein proposed a conjecture on the largest dimension of a linear space of matrices over a finite field in which all nonzero matrices are supported on a Ferrers diagram and have rank bounded below by a given integer. Although several cases of the conjecture have been established in the past decade, proving or disproving it remains to date a wide open problem. In this paper, we take a new look at the Etzion-Silberstein Conjecture, investigating its connection with rook theory. Our results show that the combinatorics behind this open problem is closely linked to the theory of $q$-rook polynomials associated with Ferrers diagrams, as defined by Garsia and Remmel. In passing, we give a closed formula for the trailing degree of the $q$-rook polynomial associated with a Ferrers diagram in terms of the cardinalities of its diagonals. The combinatorial approach taken in this paper allows us to establish some new instances of the Etzion-Silberstein Conjecture using a non-constructive argument. We also solve the asymptotic version of the conjecture over large finite fields, answering a current open question.
Submitted 12 September, 2022;
originally announced September 2022.
-
Densities of Codes of Various Linearity Degrees in Translation-Invariant Metric Spaces
Authors:
Anina Gruica,
Anna-Lena Horlemann,
Alberto Ravagnani,
Nadja Willenborg
Abstract:
We investigate the asymptotic density of error-correcting codes with good distance properties and prescribed linearity degree, including sublinear and nonlinear codes. We focus on the general setting of finite translation-invariant metric spaces, and then specialize our results to the Hamming metric, to the rank metric, and to the sum-rank metric. Our results show that the asymptotic density of codes heavily depends on the imposed linearity degree and the chosen metric.
Submitted 5 June, 2023; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Rank-Metric Codes, Semifields, and the Average Critical Problem
Authors:
Anina Gruica,
Alberto Ravagnani,
John Sheekey,
Ferdinando Zullo
Abstract:
We investigate two fundamental questions intersecting coding theory and combinatorial geometry, with emphasis on their connections. These are the problem of computing the asymptotic density of MRD codes in the rank metric, and the Critical Problem for combinatorial geometries by Crapo and Rota. Using methods from semifield theory, we derive two lower bounds for the density function of full-rank, square MRD codes. The first bound is sharp when the matrix size is a prime number and the underlying field is sufficiently large, while the second bound applies to the binary field. We then take a new look at the Critical Problem for combinatorial geometries, approaching it from a qualitative, often asymptotic, viewpoint. We illustrate the connection between this very classical problem and that of computing the asymptotic density of MRD codes. Finally, we study the asymptotic density of some special families of codes in the rank metric, including the symmetric, alternating and Hermitian ones. In particular, we show that the optimal codes in these three contexts are sparse.
Submitted 18 January, 2022;
originally announced January 2022.
-
The Typical Non-Linear Code over Large Alphabets
Authors:
Anina Gruica,
Alberto Ravagnani
Abstract:
We consider the problem of describing the typical (possibly) non-linear code of minimum distance bounded from below over a large alphabet. We concentrate on block codes with the Hamming metric and on subspace codes with the injection metric. In sharp contrast with the behavior of linear block codes, we show that the typical non-linear code in the Hamming metric of cardinality $q^{n-d+1}$ is far from having minimum distance $d$, i.e., from being MDS. We also give more precise results about the asymptotic proportion of block codes with good distance properties within the set of codes having a certain cardinality. We then establish the analogous results for subspace codes with the injection metric, showing also an application to the theory of partial spreads in finite geometry.
Submitted 22 November, 2021; v1 submitted 10 May, 2021;
originally announced May 2021.
-
Convolutional codes over finite chain rings, MDP codes and their characterization
Authors:
Gianira N. Alfarano,
Anina Gruica,
Julia Lieb,
Joachim Rosenthal
Abstract:
In this paper, we develop the theory of convolutional codes over finite commutative chain rings. In particular, we focus on maximum distance profile (MDP) convolutional codes and we provide a characterization of these codes, generalizing the one known for fields. Moreover, we relate (reverse) MDP convolutional codes over a finite chain ring with (reverse) MDP convolutional codes over its residue field. Finally, we provide a construction of (reverse) MDP convolutional codes over finite chain rings generalizing the notion of (reverse) superregular matrices.
Submitted 30 March, 2022; v1 submitted 19 April, 2021;
originally announced April 2021.
-
Common Complements of Linear Subspaces and the Sparseness of MRD Codes
Authors:
Anina Gruica,
Alberto Ravagnani
Abstract:
Motivated by applications to the theory of rank-metric codes, we study the problem of estimating the number of common complements of a family of subspaces over a finite field in terms of the cardinality of the family and its intersection structure. We derive upper and lower bounds for this number, along with their asymptotic versions as the field size tends to infinity. We then use these bounds to describe the general behaviour of common complements with respect to sparseness and density, showing that the decisive property is whether or not the number of spaces to be complemented is negligible with respect to the field size. By specializing our results to matrix spaces, we obtain upper and lower bounds for the number of MRD codes in the rank metric. In particular, we answer an open question in coding theory, proving that MRD codes are sparse for all parameter sets as the field size grows, with only very few exceptions. We also investigate the density of MRD codes as their number of columns tends to infinity, obtaining a new asymptotic bound. Using properties of the Euler function from number theory, we then show that our bound improves on known results for most parameter sets. We conclude the paper by establishing general structural properties of the density function of rank-metric codes.
Submitted 17 January, 2022; v1 submitted 5 November, 2020;
originally announced November 2020.