-
Generalized Unique Reconstruction from Substrings
Authors:
Yonatan Yehezkeally,
Daniella Bar-Lev,
Sagi Marcovich,
Eitan Yaakobi
Abstract:
This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of pre-defined lengths are read or substrings are read with no overlap for the single string case, this work studies two extensions of this paradigm. The first extension considers the setup in which consecutive substrings are read with some given minimum overlap. First, an upper bound is provided on the attainable rates of codes that guarantee unique reconstruction. Then, efficient constructions of codes that asymptotically meet that upper bound are presented. In the second extension, we study the setup where multiple strings are reconstructed together. Given the number of strings and their length, we first derive a lower bound on the read substrings' length $\ell$ that is necessary for the existence of multi-strand reconstruction codes with non-vanishing rates. We then present two constructions of such codes and show that their rates approach 1 for values of $\ell$ that asymptotically behave like the lower bound.
Submitted 20 April, 2023; v1 submitted 10 October, 2022;
originally announced October 2022.
-
Covering Sequences for $\ell$-Tuples
Authors:
Sagi Marcovich,
Tuvi Etzion,
Eitan Yaakobi
Abstract:
de Bruijn sequences of order $\ell$, i.e., sequences that contain each $\ell$-tuple as a window exactly once, have found many diverse applications in information theory and most recently in DNA storage. This family of binary sequences has a rate of $1/2$. To overcome this low rate, we study $\ell$-tuples covering sequences, which require that each $\ell$-tuple appear at least once as a window in the sequence. The cardinality of this family of sequences is analyzed while assuming that $\ell$ is a function of the sequence length $n$. Lower and upper bounds on the asymptotic rate of this family are given. Moreover, we study an upper bound for $\ell$ such that the redundancy of the set of $\ell$-tuples covering sequences is at most a single symbol. Lastly, we present efficient encoding and decoding schemes for $\ell$-tuples covering sequences that meet this bound.
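The covering property described in this abstract can be checked directly on small examples. The following is a minimal sketch, not the paper's encoder: the function names `windows` and `is_covering`, the binary default alphabet, and the `cyclic` flag (whether windows wrap around the end of the sequence) are all assumptions made for illustration.

```python
from itertools import product


def windows(seq, ell, cyclic=False):
    """Return the list of length-ell windows of seq (hypothetical helper)."""
    n = len(seq)
    if cyclic:
        # Wrap around by appending the first ell-1 symbols (assumes ell <= n).
        ext = seq + seq[:ell - 1]
        return [ext[i:i + ell] for i in range(n)]
    return [seq[i:i + ell] for i in range(n - ell + 1)]


def is_covering(seq, ell, alphabet="01", cyclic=False):
    """True if every ell-tuple over the alphabet appears at least once as a window."""
    seen = set(windows(seq, ell, cyclic))
    return all("".join(t) in seen for t in product(alphabet, repeat=ell))


# "0011" read cyclically is a de Bruijn sequence of order 2 (each pair once).
print(is_covering("0011", 2, cyclic=True))
# Read as a plain string it misses the window "10".
print(is_covering("0011", 2))
# A covering sequence may repeat windows; here "00" appears twice.
print(is_covering("0001011100", 3))
```

The check costs time exponential in $\ell$ because it enumerates all tuples, so it is meant only for illustrating the definition on short sequences.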
Submitted 8 June, 2022;
originally announced June 2022.
-
Reconstruction from Substrings with Partial Overlap
Authors:
Yonatan Yehezkeally,
Daniella Bar-Lev,
Sagi Marcovich,
Eitan Yaakobi
Abstract:
This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of some fixed length are read or substrings are read with no overlap, this work considers the setup in which consecutive substrings are read with some given minimum overlap. First, upper bounds are provided on the attainable rates of codes that guarantee unique reconstruction. Then, we present efficient constructions of asymptotically optimal codes that meet the upper bound.
Submitted 8 May, 2022;
originally announced May 2022.
-
Adversarial Torn-paper Codes
Authors:
Daniella Bar-Lev,
Sagi Marcovich,
Eitan Yaakobi,
Yonatan Yehezkeally
Abstract:
We study the adversarial torn-paper channel. This problem is motivated by applications in DNA data storage where the DNA strands that carry information may break into smaller pieces which are received out of order. Our model extends the previously researched probabilistic setting to the worst case. We develop code constructions for any parameters of the channel for which a non-vanishing asymptotic rate is possible and show that our constructions achieve asymptotically optimal rate while allowing for efficient encoding and decoding. Finally, we extend our results to related settings, including multi-strand storage, presence of substitution errors, or incomplete coverage.
Submitted 4 July, 2023; v1 submitted 26 January, 2022;
originally announced January 2022.
-
Multi-strand Reconstruction from Substrings
Authors:
Yonatan Yehezkeally,
Sagi Marcovich,
Eitan Yaakobi
Abstract:
The problem of string reconstruction based on its substrings spectrum has received significant attention recently due to its applicability to DNA data storage and sequencing. In contrast to previous works, we consider in this paper a setup of this problem where multiple strings are reconstructed together. Given a multiset $S$ of strings, all their substrings of some fixed length $\ell$, defined as the $\ell$-profile of $S$, are received and the goal is to reconstruct all strings in $S$. A multi-strand $\ell$-reconstruction code is a set of multisets such that every element $S$ can be reconstructed from its $\ell$-profile. Given the number of strings $k$ and their length $n$, we first find a lower bound on the value of $\ell$ necessary for the existence of multi-strand $\ell$-reconstruction codes with non-vanishing asymptotic rate. We then present two constructions of such codes and show that their rates approach $1$ for values of $\ell$ that asymptotically behave like the lower bound.
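To make the $\ell$-profile concrete, here is a small sketch that computes it for a multiset of strings and shows why a code is needed: two different multisets can share a profile. The function name `ell_profile` and the example strings are assumptions chosen for illustration, and the profile is treated here as a multiset of substrings (with multiplicity), which is one reading of the abstract's definition.

```python
from collections import Counter


def ell_profile(strings, ell):
    """Multiset of all length-ell substrings of all strings (hypothetical helper)."""
    profile = Counter()
    for s in strings:
        for i in range(len(s) - ell + 1):
            profile[s[i:i + ell]] += 1
    return profile


# Two different multisets of k = 2 strings of length n = 4 ...
s1 = ["0010", "1101"]
s2 = ["1001", "0110"]

# ... that cannot be told apart from their 2-profiles.
print(ell_profile(s1, 2) == ell_profile(s2, 2))

# With longer reads (ell = 3) the profiles differ.
print(ell_profile(s1, 3) == ell_profile(s2, 3))
```

In this toy case, increasing $\ell$ from 2 to 3 separates the two multisets, which matches the abstract's point that $\ell$ must be large enough for reconstruction to be possible.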
Submitted 26 August, 2021;
originally announced August 2021.
-
The Zero Cubes Free and Cubes Unique Multidimensional Constraints
Authors:
Sagi Marcovich,
Eitan Yaakobi
Abstract:
This paper studies two families of constraints for two-dimensional and multidimensional arrays. The first family requires that a multidimensional array not contain a cube of zeros of some fixed size, and the second requires that no two cubes of a given size in the array be identical. These constraints are natural extensions of their one-dimensional counterparts, which have been rigorously studied recently. For both constraints we present conditions on the size of the cube for which the asymptotic rate of the set of valid arrays approaches 1, as well as conditions for the redundancy to be at most a single symbol. For the first family we present an efficient encoding algorithm that uses a single symbol to encode arbitrary information into a valid array, and for the second family we present a similar encoder for the two-dimensional case. The results in the paper are also extended to similar constraints where the sub-array is not necessarily a cube, but a box of arbitrary dimensions and only its volume is bounded.
Submitted 31 January, 2021;
originally announced February 2021.
-
Reconstruction of Strings from their Substrings Spectrum
Authors:
Sagi Marcovich,
Eitan Yaakobi
Abstract:
This paper studies reconstruction of strings based upon their substrings spectrum. Under this paradigm, it is assumed that all substrings of some fixed length are received and the goal is to reconstruct the string. While many existing works assumed that substrings are received error free, we follow in this paper the noisy setup of this problem that was first studied by Gabrys and Milenkovic. The goal of this study is twofold. First, we study the setup in which not all substrings in the multispectrum are received, and then we focus on the case where the read substrings are not error free. In each case we provide specific code constructions of strings whose reconstruction is guaranteed even in the presence of failures in either model. We present efficient encoding and decoding maps and analyze the cardinality of the code constructions, while studying the cases where the rates of our codes approach 1.
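The noiseless version of this reconstruction problem can be explored by brute force on short strings. The sketch below is an illustration only and is unrelated to the paper's constructions: `multispectrum` and `candidates` are hypothetical names, the alphabet is assumed binary, and the search enumerates every string of the same length, so it is only feasible for very small inputs.

```python
from collections import Counter
from itertools import product


def multispectrum(s, ell):
    """Multiset of all length-ell substrings of s (hypothetical helper)."""
    return Counter(s[i:i + ell] for i in range(len(s) - ell + 1))


def candidates(s, ell, alphabet="01"):
    """All strings of len(s) over the alphabet sharing s's length-ell multispectrum."""
    target = multispectrum(s, ell)
    result = []
    for t in product(alphabet, repeat=len(s)):
        cand = "".join(t)
        if multispectrum(cand, ell) == target:
            result.append(cand)
    return result


# "0101" is the only length-4 binary string with this 2-multispectrum.
print(candidates("0101", 2))
# "00101" is ambiguous at ell = 2: more than one candidate is returned.
print(candidates("00101", 2))
```

A string is uniquely reconstructible in this noiseless sense exactly when `candidates` returns a single element; a reconstruction code restricts the stored strings so that this holds, and the paper additionally handles missing or erroneous substrings.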
Submitted 1 June, 2021; v1 submitted 23 December, 2019;
originally announced December 2019.