Skip to main content

Showing 1–50 of 115 results for author: Yaakobi, E

  1. arXiv:2406.17689  [pdf, ps, other

    cs.IT cs.DS

    Robust Gray Codes Approaching the Optimal Rate

    Authors: Roni Con, Dorsa Fathollahi, Ryan Gabrys, Mary Wootters, Eitan Yaakobi

    Abstract: Robust Gray codes were introduced by (Lolck and Pagh, SODA 2024). Informally, a robust Gray code is a (binary) Gray code $\mathcal{G}$ so that, given a noisy version of the encoding $\mathcal{G}(j)$ of an integer $j$, one can recover $\hat{j}$ that is close to $j$ (with high probability over the noise). Such codes have found applications in differential privacy. In this work, we present near-opt… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2405.08625  [pdf, other

    cs.IT

    Optimal Almost-Balanced Sequences

    Authors: Daniella Bar-Lev, Adir Kobovich, Orian Leitersdorf, Eitan Yaakobi

    Abstract: This paper presents a novel approach to address the constrained coding challenge of generating almost-balanced sequences. While strictly balanced sequences have been well studied in the past, the problem of designing efficient algorithms with small redundancy, preferably constant or even a single bit, for almost balanced sequences has remained unsolved. A sequence is $\varepsilon(n)$-almost balanc… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted to The IEEE International Symposium on Information Theory (ISIT) 2024

  3. arXiv:2405.08475  [pdf, ps, other

    cs.IT

    Representing Information on DNA using Patterns Induced by Enzymatic Labeling

    Authors: Daniella Bar-Lev, Tuvi Etzion, Eitan Yaakobi, Zohar Yakhini

    Abstract: Enzymatic DNA labeling is a powerful tool with applications in biochemistry, molecular biology, biotechnology, medical science, and genomic research. This paper contributes to the evolving field of DNA-based data storage by presenting a formal framework for modeling DNA labeling in strings, specifically tailored for data storage purposes. Our approach involves a known DNA molecule as a template fo… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted to The IEEE International Symposium on Information Theory (ISIT) 2024

  4. arXiv:2405.06870  [pdf, other

    cs.IT

    Noise-Tolerant Codebooks for Semi-Quantitative Group Testing: Application to Spatial Genomics

    Authors: Kok Hao Chen, Duc Tu Dao, Han Mao Kiah, Van Long Phuoc Pham, Eitan Yaakobi

    Abstract: Motivated by applications in spatial genomics, we revisit group testing (Dorfman~1943) and propose the class of $λ$-{\sf ADD}-codes, studying such codes with certain distance $d$ and codelength $n$. When $d$ is constant, we provide explicit code constructions with rates close to $1/2$. When $d$ is proportional to $n$, we provide a GV-type lower bound whose rates are efficiently computable. Upper b… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: To appear in ISIT 2024 Proceedings

  5. arXiv:2405.06583  [pdf, other

    cs.IT

    Private Repair of a Single Erasure in Reed-Solomon Codes

    Authors: Stanislav Kruglik, Han Mao Kiah, Son Hoang Dau, Eitan Yaakobi

    Abstract: We investigate the problem of privately recovering a single erasure for Reed-Solomon codes with low communication bandwidths. For an $[n,k]_{q^\ell}$ code with $n-k\geq q^{m}+t-1$, we construct a repair scheme that allows a client to recover an arbitrary codeword symbol without leaking its index to any set of $t$ colluding helper nodes at a repair bandwidth of $(n-1)(\ell-m)$ sub-symbols in… ▽ More

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: Full version of the paper accepted for the 2024 IEEE International Symposium on Information Theory (ISIT)

  6. arXiv:2405.02080  [pdf, ps, other

    cs.IT

    Coding for Synthesis Defects

    Authors: Ziyang Lu, Han Mao Kiah, Yiwei Zhang, Robert N. Grass, Eitan Yaakobi

    Abstract: Motivated by DNA based data storage system, we investigate the errors that occur when synthesizing DNA strands in parallel, where each strand is appended one nucleotide at a time by the machine according to a template supersequence. If there is a cycle such that the machine fails, then the strands meant to be appended at this cycle will not be appended, and we refer to this as a synthesis defect.… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

  7. arXiv:2404.12868  [pdf, ps, other

    cs.IT

    Coding for Composite DNA to Correct Substitutions, Strand Losses, and Deletions

    Authors: Frederik Walter, Omer Sabary, Antonia Wachter-Zeh, Eitan Yaakobi

    Abstract: Composite DNA is a recent method to increase the base alphabet size in DNA-based data storage.This paper models synthesizing and sequencing of composite DNA and introduces coding techniques to correct substitutions, losses of entire strands, and symbol deletion errors. Non-asymptotic upper bounds on the size of codes with $t$ occurrences of these error types are derived. Explicit constructions are… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

  8. arXiv:2403.19061  [pdf, other

    cs.IT

    One Code Fits All: Strong stuck-at codes for versatile memory encoding

    Authors: Roni Con, Ryan Gabrys, Eitan Yaakobi

    Abstract: In this work we consider a generalization of the well-studied problem of coding for ``stuck-at'' errors, which we refer to as ``strong stuck-at'' codes. In the traditional framework of stuck-at codes, the task involves encoding a message into a one-dimensional binary vector. However, a certain number of the bits in this vector are 'frozen', meaning they are fixed at a predetermined value and canno… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  9. arXiv:2403.15827  [pdf, ps, other

    cs.IT

    Permutation Recovery Problem against Deletion Errors for DNA Data Storage

    Authors: Shubhransh Singhvi, Charchit Gupta, Avital Boruchovsky, Yuval Goldberg, Han Mao Kiah, Eitan Yaakobi

    Abstract: Owing to its immense storage density and durability, DNA has emerged as a promising storage medium. However, due to technological constraints, data can only be written onto many short DNA molecules called data blocks that are stored in an unordered way. To handle the unordered nature of DNA data storage systems, a unique address is typically prepended to each data block to form a DNA strand. Howev… ▽ More

    Submitted 23 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2305.04597

  10. arXiv:2403.07754  [pdf, other

    cs.IT

    An Optimal Sequence Reconstruction Algorithm for Reed-Solomon Codes

    Authors: Shubhransh Singhvi, Roni Con, Han Mao Kiah, Eitan Yaakobi

    Abstract: The sequence reconstruction problem, introduced by Levenshtein in 2001, considers a scenario where the sender transmits a codeword from some codebook, and the receiver obtains $N$ noisy outputs of the codeword. We study the problem of efficient reconstruction using $N$ outputs that are each corrupted by at most $t$ substitutions. Specifically, for the ubiquitous Reed-Solomon codes, we adapt the Ko… ▽ More

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Submitted to IEEE ISIT 2024

  11. arXiv:2402.03987  [pdf, other

    cs.IT

    Tail-Erasure-Correcting Codes

    Authors: Boaz Moav, Ryan Gabrys, Eitan Yaakobi

    Abstract: The increasing demand for data storage has prompted the exploration of new techniques, with molecular data storage being a promising alternative. In this work, we develop coding schemes for a new storage paradigm that can be represented as a collection of two-dimensional arrays. Motivated by error patterns observed in recent prototype architectures, our study focuses on correcting erasures in the… ▽ More

    Submitted 6 February, 2024; originally announced February 2024.

  12. arXiv:2401.17649  [pdf, other

    cs.IT

    Covering All Bases: The Next Inning in DNA Sequencing Efficiency

    Authors: Hadas Abraham, Rayn Gabrys, Eitan Yaakobi

    Abstract: DNA emerges as a promising medium for the exponential growth of digital data due to its density and durability. This study extends recent research by addressing the \emph{coverage depth problem} in practical scenarios, exploring optimal error-correcting code pairings with DNA storage systems to minimize coverage depth. Conducted within random access settings, the study provides theoretical analyse… ▽ More

    Submitted 31 January, 2024; originally announced January 2024.

  13. arXiv:2401.16915  [pdf, ps, other

    cs.IT cs.DC

    Interactive Byzantine-Resilient Gradient Coding for General Data Assignments

    Authors: Shreyas Jain, Luis Maßny, Christoph Hofmeister, Eitan Yaakobi, Rawad Bitar

    Abstract: We tackle the problem of Byzantine errors in distributed gradient descent within the Byzantine-resilient gradient coding framework. Our proposed solution can recover the exact full gradient in the presence of $s$ malicious workers with a data replication factor of only $s+1$. It generalizes previous solutions to any data assignment scheme that has a regular replication over all data samples. The s… ▽ More

    Submitted 30 January, 2024; originally announced January 2024.

  14. arXiv:2401.15939  [pdf, other

    cs.IT

    Correcting a Single Deletion in Reads from a Nanopore Sequencer

    Authors: Anisha Banerjee, Yonatan Yehezkeally, Antonia Wachter-Zeh, Eitan Yaakobi

    Abstract: Owing to its several merits over other DNA sequencing technologies, nanopore sequencers hold an immense potential to revolutionize the efficiency of DNA storage systems. However, their higher error rates necessitate further research to devise practical and efficient coding schemes that would allow accurate retrieval of the data stored. Our work takes a step in this direction by adopting a simplifi… ▽ More

    Submitted 7 May, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at IEEE ISIT'24

  15. arXiv:2401.15733  [pdf, ps, other

    cs.IT

    Achieving DNA Labeling Capacity with Minimum Labels through Extremal de Bruijn Subgraphs

    Authors: Christoph Hofmeister, Anina Gruica, Dganit Hanania, Rawad Bitar, Eitan Yaakobi

    Abstract: DNA labeling is a tool in molecular biology and biotechnology to visualize, detect, and study DNA at the molecular level. In this process, a DNA molecule is labeled by a set of specific patterns, referred to as labels, and is then imaged. The resulting image is modeled as an $(\ell+1)$-ary sequence, where $\ell$ is the number of labels, in which any non-zero symbol indicates the appearance of the… ▽ More

    Submitted 28 January, 2024; originally announced January 2024.

  16. arXiv:2401.15722  [pdf, ps, other

    cs.IT math.CO

    Reducing Coverage Depth in DNA Storage: A Combinatorial Perspective on Random Access Efficiency

    Authors: Anina Gruica, Daniella Bar-Lev, Alberto Ravagnani, Eitan Yaakobi

    Abstract: We investigate the fundamental limits of the recently proposed random access coverage depth problem for DNA data storage. Under this paradigm, it is assumed that the user information consists of $k$ information strands, which are encoded into $n$ strands via some generator matrix $G$. In the sequencing process, the strands are read uniformly at random, since each strand is available in a large num… ▽ More

    Submitted 28 January, 2024; originally announced January 2024.

  17. arXiv:2401.15666  [pdf, other

    cs.IT

    Error-Correcting Codes for Combinatorial Composite DNA

    Authors: Omer Sabary, Inbal Preuss, Ryan Gabrys, Zohar Yakhini, Leon Anavy, Eitan Yaakobi

    Abstract: Data storage in DNA is developing as a possible solution for archival digital data. Recently, to further increase the potential capacity of DNA-based data storage systems, the combinatorial composite DNA synthesis method was suggested. This approach extends the DNA alphabet by harnessing short DNA fragment reagents, known as shortmers. The shortmers are building blocks of the alphabet symbols, con… ▽ More

    Submitted 26 May, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

  18. arXiv:2401.15368  [pdf, ps, other

    cs.IT

    The Capacity of the Weighted Read Channel

    Authors: Omer Yerushalmi, Tuvi Etzion, Eitan Yaakobi

    Abstract: One of the primary sequencing methods gaining prominence in DNA storage is nanopore sequencing, attributed to various factors. In this work, we consider a simplified model of the sequencer, characterized as a channel. This channel takes a sequence and processes it using a sliding window of length $\ell$, shifting the window by $δ$ characters each time. The output of this channel, which we refer to… ▽ More

    Submitted 27 January, 2024; originally announced January 2024.

  19. arXiv:2401.02380  [pdf, other

    cs.IT

    Byzantine-Resilient Gradient Coding through Local Gradient Computations

    Authors: Christoph Hofmeister, Luis Maßny, Eitan Yaakobi, Rawad Bitar

    Abstract: We consider gradient coding in the presence of an adversary controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the responses from malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partia… ▽ More

    Submitted 5 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2303.13231

  20. M-DAB: An Input-Distribution Optimization Algorithm for Composite DNA Storage by the Multinomial Channel

    Authors: Adir Kobovich, Eitan Yaakobi, Nir Weinberger

    Abstract: Recent experiments have shown that the capacity of DNA storage systems may be significantly increased by synthesizing composite DNA letters. In this work, we model a DNA storage channel with composite inputs as a \textit{multinomial channel}, and propose an optimization algorithm for its capacity achieving input distribution, for an arbitrary number of output reads. The algorithm is termed multidi… ▽ More

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: 6 pages, 3 figures

    ACM Class: H.1.1

  21. arXiv:2308.14558  [pdf, other

    cs.IT math.CO

    Storage codes and recoverable systems on lines and grids

    Authors: Alexander Barg, Ohad Elishco, Ryan Gabrys, Geyang Wang, Eitan Yaakobi

    Abstract: A storage code is an assignment of symbols to the vertices of a connected graph $G(V,E)$ with the property that the value of each vertex is a function of the values of its neighbors, or more generally, of a certain neighborhood of the vertex in $G$. In this work we introduce a new construction method of storage codes, enabling one to construct new codes from known ones via an interleaving procedur… ▽ More

    Submitted 28 August, 2023; originally announced August 2023.

  22. Error-Correcting Codes for Nanopore Sequencing

    Authors: Anisha Banerjee, Yonatan Yehezkeally, Antonia Wachter-Zeh, Eitan Yaakobi

    Abstract: Nanopore sequencing, superior to other sequencing technologies for DNA storage in multiple aspects, has recently attracted considerable attention. Its high error rates, however, demand thorough research on practical and efficient coding schemes to enable accurate recovery of stored data. To this end, we consider a simplified model of a nanopore sequencer inspired by Mao \emph{et al.}, incorporatin… ▽ More

    Submitted 8 December, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

    Comments: Submitted to Transactions on Information Theory

  23. arXiv:2305.07992  [pdf, ps, other

    cs.IT

    On the Capacity of DNA Labeling

    Authors: Dganit Hanania, Daniella Bar-Lev, Yevgeni Nogin, Yoav Shechtman, Eitan Yaakobi

    Abstract: DNA labeling is a powerful tool in molecular biology and biotechnology that allows for the visualization, detection, and study of DNA at the molecular level. Under this paradigm, a DNA molecule is being labeled by specific k patterns and is then imaged. Then, the resulted image is modeled as a (k + 1)- ary sequence in which any non-zero symbol indicates on the appearance of the corresponding label… ▽ More

    Submitted 22 January, 2024; v1 submitted 13 May, 2023; originally announced May 2023.

  24. arXiv:2305.05972  [pdf, other

    cs.IT cs.DS

    Coding for IBLTs with Listing Guarantees

    Authors: Daniella Bar-Lev, Avi Mizrahi, Tuvi Etzion, Ori Rottenstreich, Eitan Yaakobi

    Abstract: The Invertible Bloom Lookup Table (IBLT) is a probabilistic data structure for set representation, with applications in network and traffic monitoring. It is known for its ability to list its elements, an operation that succeeds with high probability for sufficiently large table. However, listing can fail even for relatively small sets. This paper extends recent work on the worst-case analysis of… ▽ More

    Submitted 10 May, 2023; originally announced May 2023.

  25. arXiv:2305.05656  [pdf, other

    cs.DM cs.IT math.PR

    Cover Your Bases: How to Minimize the Sequencing Coverage in DNA Storage Systems

    Authors: Daniella Bar-Lev, Omer Sabary, Ryan Gabrys, Eitan Yaakobi

    Abstract: Although the expenses associated with DNA sequencing have been rapidly decreasing, the current cost of sequencing information stands at roughly $120/GB, which is dramatically more expensive than reading from existing archival storage solutions today. In this work, we aim to reduce not only the cost but also the latency of DNA storage by initiating the study of the DNA coverage depth problem, which… ▽ More

    Submitted 29 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  26. arXiv:2305.04597  [pdf, other

    cs.IT

    Data-Driven Bee Identification for DNA Strands

    Authors: Shubhransh Singhvi, Avital Boruchovsky, Han Mao Kiah, Eitan Yaakobi

    Abstract: We study a data-driven approach to the bee identification problem for DNA strands. The bee-identification problem, introduced by Tandon et al. (2019), requires one to identify $M$ bees, each tagged by a unique barcode, via a set of $M$ noisy measurements. Later, Chrisnata et al. (2022) extended the model to case where one observes $N$ noisy measurements of each bee, and applied the model to addres… ▽ More

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Conference paper accepted at ISIT 2023

  27. arXiv:2304.10391  [pdf, other

    cs.IT

    DNA-Correcting Codes: End-to-end Correction in DNA Storage Systems

    Authors: Avital Boruchovsky, Daniella Bar-Lev, Eitan Yaakobi

    Abstract: This paper introduces a new solution to DNA storage that integrates all three steps of retrieval, namely clustering, reconstruction, and error correction. DNA-correcting codes are presented as a unique solution to the problem of ensuring that the output of the storage system is unique for any valid set of input strands. To this end, we introduce a novel distance metric to capture the unique behavi… ▽ More

    Submitted 30 June, 2024; v1 submitted 20 April, 2023; originally announced April 2023.

    Comments: Extended version of the paper that appeared in ISIT 2023

  28. arXiv:2304.01317  [pdf, other

    cs.IT

    Universal Framework for Parametric Constrained Coding

    Authors: Daniella Bar-Lev, Adir Kobovich, Orian Leitersdorf, Eitan Yaakobi

    Abstract: Constrained coding is a fundamental field in coding theory that tackles efficient communication through constrained channels. While channels with fixed constraints have a general optimal solution, there is increasing demand for parametric constraints that are dependent on the message length. Several works have tackled such parametric constraints through iterative algorithms, yet they require compl… ▽ More

    Submitted 3 April, 2023; originally announced April 2023.

  29. arXiv:2303.13231  [pdf, other

    cs.IT

    Trading Communication for Computation in Byzantine-Resilient Gradient Coding

    Authors: Christoph Hofmeister, Luis Maßny, Eitan Yaakobi, Rawad Bitar

    Abstract: We consider gradient coding in the presence of an adversary, controlling so-called malicious workers trying to corrupt the computations. Previous works propose the use of MDS codes to treat the inputs of the malicious workers as errors and correct them using the error-correction properties of the code. This comes at the expense of increasing the replication, i.e., the number of workers each partia… ▽ More

    Submitted 5 June, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

  30. arXiv:2212.13812  [pdf, other

    cs.IT cs.DS

    Invertible Bloom Lookup Tables with Listing Guarantees

    Authors: Avi Mizrahi, Daniella Bar-Lev, Eitan Yaakobi, Ori Rottenstreich

    Abstract: The Invertible Bloom Lookup Table (IBLT) is a probabilistic concise data structure for set representation that supports a listing operation as the recovery of the elements in the represented set. Its applications can be found in network synchronization and traffic monitoring as well as in error-correction codes. IBLT can list its elements with probability affected by the size of the allocated memo… ▽ More

    Submitted 28 December, 2022; originally announced December 2022.

  31. Generalized Unique Reconstruction from Substrings

    Authors: Yonatan Yehezkeally, Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi

    Abstract: This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which all substrings of pre-defined lengths are read or substrings are read with no overlap for the single string case, this… ▽ More

    Submitted 20 April, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: Author-submitted, peer-reviewed and accepted version (IEEE Trans. on Inform. Theory). arXiv admin note: text overlap with arXiv:2205.03933

  32. arXiv:2208.05412  [pdf, ps, other

    cs.IT

    Equivalence of Insertion/Deletion Correcting Codes for $d$-dimensional Arrays

    Authors: Evagoras Stylianou, Lorenz Welter, Rawad Bitar, Antonia Wachter-Zeh, Eitan Yaakobi

    Abstract: We consider the problem of correcting insertion and deletion errors in the $d$-dimensional space. This problem is well understood for vectors (one-dimensional space) and was recently studied for arrays (two-dimensional space). For vectors and arrays, the problem is motivated by several practical applications such as DNA-based storage and racetrack memories. From a theoretical perspective, it is in… ▽ More

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: Accepted to IEEE International Symposium on Information Theory 2022

  33. arXiv:2206.07995  [pdf, ps, other

    cs.IT math.CO

    On the Size of Balls and Anticodes of Small Diameter under the Fixed-Length Levenshtein Metric

    Authors: Daniella Bar-Lev, Tuvi Etzion, Eitan Yaakobi

    Abstract: The rapid development of DNA storage has brought the deletion and insertion channel to the front line of research. When the number of deletions is equal to the number of insertions, the Fixed Length Levenshtein (FLL) metric is the right measure for the distance between two words of the same length. Similar to any other metric, the size of a ball is one of the most fundamental parameters. In this w… ▽ More

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2103.01681

  34. arXiv:2206.03711  [pdf, other

    cs.IT

    Covering Sequences for $\ell$-Tuples

    Authors: Sagi Marcovich, Tuvi Etzion, Eitan Yaakobi

    Abstract: de Bruijn sequences of order $\ell$, i.e., sequences that contain each $\ell$-tuple as a window exactly once, have found many diverse applications in information theory and most recently in DNA storage. This family of binary sequences has rate of $1/2$. To overcome this low rate, we study $\ell$-tuples covering sequences, which impose that each $\ell$-tuple appears at least once as a window in the… ▽ More

    Submitted 8 June, 2022; originally announced June 2022.

  35. arXiv:2205.03933  [pdf, ps, other

    cs.IT

    Reconstruction from Substrings with Partial Overlap

    Authors: Yonatan Yehezkeally, Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi

    Abstract: This paper introduces a new family of reconstruction codes which is motivated by applications in DNA data storage and sequencing. In such applications, DNA strands are sequenced by reading some subset of their substrings. While previous works considered two extreme cases in which \emph{all} substrings of some fixed length are read or substrings are read with no overlap, this work considers the set… ▽ More

    Submitted 8 May, 2022; originally announced May 2022.

    Comments: 6 pages, 2 figures; conference submission

  36. arXiv:2205.03911  [pdf, other

    cs.IT

    Codes for Constrained Periodicity

    Authors: Adir Kobovich, Orian Leitersdorf, Daniella Bar-Lev, Eitan Yaakobi

    Abstract: Reliability is an inherent challenge for the emerging nonvolatile technology of racetrack memories, and there exists a fundamental relationship between codes designed for racetrack memories and codes with constrained periodicity. Previous works have sought to construct codes that avoid periodicity in windows, yet have either only provided existence proofs or required high redundancy. This paper pr… ▽ More

    Submitted 25 August, 2022; v1 submitted 8 May, 2022; originally announced May 2022.

    Comments: Accepted to The International Symposium on Information Theory and Its Applications (ISITA) 2022

  37. arXiv:2202.03024  [pdf, other

    cs.IT

    The Input and Output Entropies of the $k$-Deletion/Insertion Channel

    Authors: Shubhransh Singhvi, Omer Sabary, Daniella Bar-Lev, Eitan Yaakobi

    Abstract: The channel output entropy of a transmitted word is the entropy of the possible channel outputs and similarly, the input entropy of a received word is the entropy of all possible transmitted words. The goal of this work is to study these entropy values for the k-deletion, k-insertion channel, where exactly k symbols are deleted, and inserted in the transmitted word, respectively. If all possible w… ▽ More

    Submitted 15 June, 2022; v1 submitted 7 February, 2022; originally announced February 2022.

  38. Adversarial Torn-paper Codes

    Authors: Daniella Bar-Lev, Sagi Marcovich, Eitan Yaakobi, Yonatan Yehezkeally

    Abstract: We study the adversarial torn-paper channel. This problem is motivated by applications in DNA data storage where the DNA strands that carry information may break into smaller pieces which are received out of order. Our model extends the previously researched probabilistic setting to the worst-case. We develop code constructions for any parameters of the channel for which non-vanishing asymptotic r… ▽ More

    Submitted 4 July, 2023; v1 submitted 26 January, 2022; originally announced January 2022.

    Comments: Author submitted, peer-reviewed version

  39. arXiv:2201.08612  [pdf, ps, other

    cs.IT

    Insertion and Deletion Correction in Polymer-based Data Storage

    Authors: Anisha Banerjee, Antonia Wachter-Zeh, Eitan Yaakobi

    Abstract: Synthetic polymer-based storage seems to be a particularly promising candidate that could help to cope with the ever-increasing demand for archival storage requirements. It involves designing molecules of distinct masses to represent the respective bits $\{0,1\}$, followed by the synthesis of a polymer of molecular units that reflects the order of bits in the information string. Reading out the st… ▽ More

    Submitted 24 January, 2022; v1 submitted 21 January, 2022; originally announced January 2022.

  40. arXiv:2201.02466  [pdf, other

    cs.IT

    On The Decoding Error Weight of One or Two Deletion Channels

    Authors: Omer Sabary, Daniella Bar-Lev, Yotam Gershon, Alexander Yucovich, Eitan Yaakobi

    Abstract: This paper tackles two problems that are relevant to coding for insertions and deletions. These problems are motivated by several applications, among them is reconstructing strands in DNA-based storage systems. Under this paradigm, a word is transmitted over some fixed number of identical independent channels and the goal of the decoder is to output the transmitted word or some close approximation… ▽ More

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: arXiv admin note: text overlap with arXiv:2001.05582

  41. arXiv:2110.02008  [pdf, ps, other

    cs.IT

    Lifted Reed-Solomon Codes and Lifted Multiplicity Codes

    Authors: Lukas Holzbaur, Rina Polyanskaya, Nikita Polyanskii, Ilya Vorobyev, Eitan Yaakobi

    Abstract: Lifted Reed-Solomon and multiplicity codes are classes of codes, constructed from specific sets of $m$-variate polynomials. These codes allow for the design of high-rate codes that can recover every codeword or information symbol from many disjoint sets. Recently, the underlying approaches have been combined for the bi-variate case to construct lifted multiplicity codes, a generalization of lifted… ▽ More

    Submitted 11 October, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: The results on lifted RS codes have partially been presented at the IEEE International Symposium on Information Theory (ISIT) 2020 (arXiv:2001.11981) and parts of the results on lifted multiplicity codes have been presented at the IEEE Information Theory Workshop (ITW) 2020 (arXiv:2008.04717)

  42. arXiv:2109.09932  [pdf, other

    cs.IT

    Endurance-Limited Memories: Capacity and Codes

    Authors: Yeow Meng Chee, Michal Horovitz, Alexander Vardy, Van Khu Vu, Eitan Yaakobi

    Abstract: \emph{Resistive memories}, such as \emph{phase change memories} and \emph{resistive random access memories} have attracted significant attention in recent years due to their better scalability, speed, rewritability, and yet non-volatility. However, their \emph{limited endurance} is still a major drawback that has to be improved before they can be widely adapted in large-scale systems. In this wo… ▽ More

    Submitted 20 September, 2021; originally announced September 2021.

  43. arXiv:2109.00031  [pdf

    cs.IT cs.AI

    Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning

    Authors: Daniella Bar-Lev, Itai Orr, Omer Sabary, Tuvi Etzion, Eitan Yaakobi

    Abstract: DNA-based storage is an emerging technology that enables digital information to be archived in DNA molecules. This method enjoys major advantages over magnetic and optical storage solutions such as exceptional information density, enhanced data durability, and negligible power consumption to maintain data integrity. To access the data, an information retrieval process is employed, where some of th… ▽ More

    Submitted 11 March, 2024; v1 submitted 31 August, 2021; originally announced September 2021.

  44. Multi-strand Reconstruction from Substrings

    Authors: Yonatan Yehezkeally, Sagi Marcovich, Eitan Yaakobi

    Abstract: The problem of string reconstruction based on its substrings spectrum has received significant attention recently due to its applicability to DNA data storage and sequencing. In contrast to previous works, we consider in this paper a setup of this problem where multiple strings are reconstructed together. Given a multiset $S$ of strings, all their substrings of some fixed length $\ell$, defined as… ▽ More

    Submitted 26 August, 2021; originally announced August 2021.

    Comments: 5 pages + 1 reference page. Version accepted for presentation at ITW2021

  45. arXiv:2103.01681  [pdf, ps, other

    cs.IT

    On Levenshtein Balls with Radius One

    Authors: Daniella Bar-Lev, Tuvi Etzion, Eitan Yaakobi

    Abstract: The rapid development of DNA storage has brought the deletion and insertion channel, once again, to the front line of research. When the number of deletions is equal to the number of insertions, the Fixed Length Levenshtein (FLL) metric is the right measure for the distance between two words of the same length. The size of a ball is one of the most fundamental parameters in any metric. The size of… ▽ More

    Submitted 29 June, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: 6 pages, to be published in 2021 IEEE International Symposium on Information Theory (ISIT)

  46. arXiv:2102.03094  [pdf, other

    cs.IT

    Function-Correcting Codes

    Authors: Andreas Lenz, Rawad Bitar, Antonia Wachter-Zeh, Eitan Yaakobi

    Abstract: In this paper we study function-correcting codes, a new class of codes designed to protect the function evaluation of a message against errors. We show that FCCs are equivalent to irregular-distance codes, i.e., codes that obey some given distance requirement between each pair of codewords. Using these connections, we study irregular-distance codes and derive general upper and lower bounds on thei… ▽ More

    Submitted 22 May, 2023; v1 submitted 5 February, 2021; originally announced February 2021.

    MSC Class: 94B60 ACM Class: E.4

  47. arXiv:2102.02727  [pdf, ps, other

    cs.IT

    Multiple Criss-Cross Insertion and Deletion Correcting Codes

    Authors: Lorenz Welter, Rawad Bitar, Antonia Wachter-Zeh, Eitan Yaakobi

    Abstract: This paper investigates the problem of correcting multiple criss-cross insertions and deletions in arrays. More precisely, we study the unique recovery of $n \times n$ arrays affected by $t$-criss-cross deletions defined as any combination of $t_r$ row and $t_c$ column deletions such that $t_r + t_c = t$ for a given $t$. We show an equivalence between correcting $t$-criss-cross deletions and $t$-c… ▽ More

    Submitted 15 November, 2021; v1 submitted 4 February, 2021; originally announced February 2021.

  48. arXiv:2102.00519  [pdf, other

    cs.IT

    The Zero Cubes Free and Cubes Unique Multidimensional Constraints

    Authors: Sagi Marcovich, Eitan Yaakobi

    Abstract: This paper studies two families of constraints for two-dimensional and multidimensional arrays. The first family requires that a multidimensional array will not contain a cube of zeros of some fixed size and the second constraint imposes that there will not be two identical cubes of a given size in the array. These constraints are natural extensions of their one-dimensional counterpart that have b… ▽ More

    Submitted 31 January, 2021; originally announced February 2021.

  49. arXiv:2101.10028  [pdf, ps, other

    cs.IT

    Correctable Erasure Patterns in Product Topologies

    Authors: Lukas Holzbaur, Sven Puchinger, Eitan Yaakobi, Antonia Wachter-Zeh

    Abstract: Locality enables storage systems to recover failed nodes from small subsets of surviving nodes. The setting where nodes are partitioned into subsets, each allowing for local recovery, is well understood. In this work we consider a generalization introduced by Gopalan et al., where, viewing the codewords as arrays, constraints are imposed on the columns and rows in addition to some global constrain… ▽ More

    Submitted 10 February, 2021; v1 submitted 25 January, 2021; originally announced January 2021.

  50. arXiv:2101.06722  [pdf, other

    cs.IT

    Almost Optimal Construction of Functional Batch Codes Using Hadamard Codes

    Authors: Lev Yohananov, Eitan Yaakobi

    Abstract: A \textit{functional $k$-batch} code of dimension $s$ consists of $n$ servers storing linear combinations of $s$ linearly independent information bits. Any multiset request of size $k$ of linear combinations (or requests) of the information bits can be recovered by $k$ disjoint subsets of the servers. The goal under this paradigm is to find the minimum number of servers for given values of $s$ and… ▽ More

    Submitted 16 October, 2021; v1 submitted 17 January, 2021; originally announced January 2021.