subscribe to arXiv mailings

Number Systems for Deep Neural Network Architectures: A Survey

Authors: Ghada Alsuhli, Vasileios Sakellariou, Hani Saleh, Mahmoud Al-Qutayri, Baker Mohammad, Thanos Stouraitis

Abstract: Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, ene… ▽ More Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlighted △ Less

Submitted 11 July, 2023; originally announced July 2023.

Comments: 28 pages

arXiv:2209.07524 [pdf, ps, other]

$\tilde{O}(n+\mathrm{poly}(k))$-time Algorithm for Bounded Tree Edit Distance

Authors: Debarati Das, Jacob Gilbert, MohammadTaghi Hajiaghayi, Tomasz Kociumaka, Barna Saha, Hamed Saleh

Abstract: Computing the edit distance of two strings is one of the most basic problems in computer science and combinatorial optimization. Tree edit distance is a natural generalization of edit distance in which the task is to compute a measure of dissimilarity between two (unweighted) rooted trees with node labels. Perhaps the most notable recent application of tree edit distance is in NoSQL big databases,… ▽ More Computing the edit distance of two strings is one of the most basic problems in computer science and combinatorial optimization. Tree edit distance is a natural generalization of edit distance in which the task is to compute a measure of dissimilarity between two (unweighted) rooted trees with node labels. Perhaps the most notable recent application of tree edit distance is in NoSQL big databases, such as MongoDB, where each row of the database is a JSON document represented as a labeled rooted tree, and finding dissimilarity between two rows is a basic operation. Until recently, the fastest algorithm for tree edit distance ran in cubic time (Demaine, Mozes, Rossman, Weimann; TALG'10); however, Mao (FOCS'21) broke the cubic barrier for the tree edit distance problem using fast matrix multiplication. Given a parameter $k$ as an upper bound on the distance, an $O(n+k^2)$-time algorithm for edit distance has been known since the 1980s due to the works of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88). The existence of an $\tilde{O}(n+\mathrm{poly}(k))$-time algorithm for tree edit distance has been posed as an open question, e.g., by Akmal and Jin (ICALP'21), who gave a state-of-the-art $\tilde{O}(nk^2)$-time algorithm. In this paper, we answer this question positively. △ Less

Submitted 15 September, 2022; originally announced September 2022.

Comments: Full version of a paper accepted to FOCS 2022

arXiv:2209.04936 [pdf, ps, other]

Dynamic Subset Sum with Truly Sublinear Processing Time

Authors: Hamed Saleh, Saeed Seddighin

Abstract: Subset sum is a very old and fundamental problem in theoretical computer science. In this problem, $n$ items with weights $w_1, w_2, w_3, \ldots, w_n$ are given as input and the goal is to find out if there is a subset of them whose weights sum up to a given value $t$. While the problem is NP-hard in general, when the values are non-negative integer, subset sum can be solved in pseudo-polynomial t… ▽ More Subset sum is a very old and fundamental problem in theoretical computer science. In this problem, $n$ items with weights $w_1, w_2, w_3, \ldots, w_n$ are given as input and the goal is to find out if there is a subset of them whose weights sum up to a given value $t$. While the problem is NP-hard in general, when the values are non-negative integer, subset sum can be solved in pseudo-polynomial time $~\widetilde O(n+t)$. In this work, we consider the dynamic variant of subset sum. In this setting, an upper bound $\tmax$ is provided in advance to the algorithm and in each operation, either a new item is added to the problem or for a given integer value $t \leq \tmax$, the algorithm is required to output whether there is a subset of items whose sum of weights is equal to $t$. Unfortunately, none of the existing subset sum algorithms is able to process these operations in truly sublinear time\footnote{Truly sublinear means $n^{1-Ω(1)}$.} in terms of $\tmax$. Our main contribution is an algorithm whose amortized processing time\footnote{Since the runtimes are amortized, we do not use separate terms update time and query time for different operations and use processing time for all types of operations.} for each operation is truly sublinear in $\tmax$ when the number of operations is at least $\tmax^{2/3+Ω(1)}$. We also show that when both element addition and element removal are allowed, there is no algorithm that can process each operation in time $\tmax^{1-Ω(1)}$ on average unless \textsf{SETH}\footnote{The \textit{strong exponential time hypothesis} states that no algorithm can solve the satisfiability problem in time $2^{n(1-Ω(1))}$.} fails. △ Less

Submitted 11 September, 2022; originally announced September 2022.

arXiv:2205.14101 [pdf, ps, other]

Adaptive Massively Parallel Algorithms for Cut Problems

Authors: MohammadTaghi Hajiaghayi, Marina Knittel, Jan Olkowski, Hamed Saleh

Abstract: We study the Weighted Min Cut problem in the Adaptive Massively Parallel Computation (AMPC) model. In 2019, Behnezhad et al. [3] introduced the AMPC model as an extension of the Massively Parallel Computation (MPC) model. In the past decade, research on highly scalable algorithms has had significant impact on many massive systems. The MPC model, introduced in 2010 by Karloff et al. [16], which is… ▽ More We study the Weighted Min Cut problem in the Adaptive Massively Parallel Computation (AMPC) model. In 2019, Behnezhad et al. [3] introduced the AMPC model as an extension of the Massively Parallel Computation (MPC) model. In the past decade, research on highly scalable algorithms has had significant impact on many massive systems. The MPC model, introduced in 2010 by Karloff et al. [16], which is an abstraction of famous practical frameworks such as MapReduce, Hadoop, Flume, and Spark, has been at the forefront of this research. While great strides have been taken to create highly efficient MPC algorithms for a range of problems, recent progress has been limited by the 1-vs-2 Cycle Conjecture [20], which postulates that the simple problem of distinguishing between one and two cycles requires $Ω(\log n)$ MPC rounds. In the AMPC model, each machine has adaptive read access to a distributed hash table even when communication is restricted (i.e., in the middle of a round). While remaining practical [4], this gives algorithms the power to bypass limitations like the 1-vs-2 Cycle Conjecture. We give the first sublogarithmic AMPC algorithm, requiring $O(\log\log n)$ rounds, for $(2+ε)$-approximate weighted Min Cut. Our algorithm is inspired by the divide and conquer approach of Ghaffari and Nowicki [11], which solves the $(2+ε)$-approximate weighted Min Cut problem in $O(\log n\log\log n)$ rounds of MPC using the classic result of Karger and Stein [15]. Our work is fully-scalable in the sense that the local memory of each machine is $O(n^ε)$ for any constant $0 < ε< 1$. There are no $o(\log n)$-round MPC algorithms for Min Cut in this memory regime assuming the 1-vs-2 Cycle Conjecture holds. The exponential speedup in AMPC is the result of decoupling the different layers of the divide and conquer algorithm and solving all layers in $O(1)$ rounds. △ Less

Submitted 27 May, 2022; originally announced May 2022.

arXiv:2111.01904 [pdf, other]

Adaptive Massively Parallel Constant-round Tree Contraction

Authors: MohammadTaghi Hajiaghayi, Marina Knittel, Hamed Saleh, Hsin-Hao Su

Abstract: Miller and Reif's FOCS'85 classic and fundamental tree contraction algorithm is a broadly applicable technique for the parallel solution of a large number of tree problems. Additionally it is also used as an algorithmic design technique for a large number of parallel graph algorithms. In all previously explored models of computation, however, tree contractions have only been achieved in… ▽ More Miller and Reif's FOCS'85 classic and fundamental tree contraction algorithm is a broadly applicable technique for the parallel solution of a large number of tree problems. Additionally it is also used as an algorithmic design technique for a large number of parallel graph algorithms. In all previously explored models of computation, however, tree contractions have only been achieved in $Ω(\log n)$ rounds of parallel run time. In this work, we not only introduce a generalized tree contraction method but also show it can be computed highly efficiently in $O(1/ε^3)$ rounds in the Adaptive Massively Parallel Computing (AMPC) setting, where each machine has $O(n^ε)$ local memory for some $0 < ε< 1$. AMPC is a practical extension of Massively Parallel Computing (MPC) which utilizes distributed hash tables. In general, MPC is an abstract model for MapReduce, Hadoop, Spark, and Flume which are currently widely used across industry and has been studied extensively in the theory community in recent years. Last but not least, we show that our results extend to multiple problems on trees, including but not limited to maximum and maximal matching, maximum and maximal independent set, tree isomorphism testing, and more. △ Less

Submitted 2 November, 2021; originally announced November 2021.

Comments: 35 pages, 3 figures, to be published in Innovations in Theoretical Computer Science (ITCS)

arXiv:2111.01515 [pdf]

Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model

Authors: Hind Saleh, Areej Alhothali, Kawthar Moria

Abstract: The enormous amount of data being generated on the web and social media has increased the demand for detecting online hate speech. Detecting hate speech will reduce their negative impact and influence on others. A lot of effort in the Natural Language Processing (NLP) domain aimed to detect hate speech in general or detect specific hate speech such as religion, race, gender, or sexual orientation.… ▽ More The enormous amount of data being generated on the web and social media has increased the demand for detecting online hate speech. Detecting hate speech will reduce their negative impact and influence on others. A lot of effort in the Natural Language Processing (NLP) domain aimed to detect hate speech in general or detect specific hate speech such as religion, race, gender, or sexual orientation. Hate communities tend to use abbreviations, intentional spelling mistakes, and coded words in their communication to evade detection, adding more challenges to hate speech detection tasks. Thus, word representation will play an increasingly pivotal role in detecting hate speech. This paper investigates the feasibility of leveraging domain-specific word embedding in Bidirectional LSTM based deep model to automatically detect/classify hate speech. Furthermore, we investigate the use of the transfer learning language model (BERT) on hate speech problem as a binary classification task. The experiments showed that domainspecific word embedding with the Bidirectional LSTM based deep model achieved a 93% f1-score while BERT achieved up to 96% f1-score on a combined balanced dataset from available hate speech datasets. △ Less

Submitted 2 November, 2021; originally announced November 2021.

arXiv:2110.05947 [pdf, other]

C3PU: Cross-Coupling Capacitor Processing Unit Using Analog-Mixed Signal In-Memory Computing for AI Inference

Authors: Dima Kilani, Baker Mohammad, Yasmin Halawani, Mohammed F. Tolba, Hani Saleh

Abstract: This paper presents a novel cross-coupling capacitor processing unit (C3PU) that supports analog-mixed signal in memory computing to perform multiply-and-accumulate (MAC) operations. The C3PU consists of a capacitive unit, a CMOS transistor, and a voltage-to-time converter (VTC). The capacitive unit serves as a computational element that holds the multiplier operand and performs multiplication onc… ▽ More This paper presents a novel cross-coupling capacitor processing unit (C3PU) that supports analog-mixed signal in memory computing to perform multiply-and-accumulate (MAC) operations. The C3PU consists of a capacitive unit, a CMOS transistor, and a voltage-to-time converter (VTC). The capacitive unit serves as a computational element that holds the multiplier operand and performs multiplication once the multiplicand is applied at the terminal. The multiplicand is the input voltage that is converted to a pulse width signal using a low power VTC. The transistor transfers this multiplication where a voltage level is generated. A demonstrator of 5x4 C3PU array that is capable of implementing 4 MAC units is presented. The design has been verified using Monte Carlo simulation in 65 nm technology. The 5x4 C3PU consumed energy of 66.4 fJ/MAC at 0.3 V voltage supply with an error of 5.7%. The proposed unit achieves lower energy and occupies a smaller area by 3.4x and 3.6x, respectively, with similar error value when compared to a digital-based 8x4-bit fixed point MAC unit. The C3PU has been utilized through an iris fower classification utilizing an artificial neural network which achieved a 90% classification accuracy compared to ideal accuracy of 96.67% using MATLAB. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: 10 pages, 12 figures and 7 tables

arXiv:2109.12111 [pdf]

Accurate Remaining Useful Life Prediction with Uncertainty Quantification: a Deep Learning and Nonstationary Gaussian Process Approach

Authors: Zhaoyi Xu, Yanjie Guo, Joseph Homer Saleh

Abstract: Remaining useful life (RUL) refers to the expected remaining lifespan of a component or system. Accurate RUL prediction is critical for prognostic and health management and for maintenance planning. In this work, we address three prevalent challenges in data-driven RUL prediction, namely the handling of high dimensional input features, the robustness to noise in sensor data and prognostic datasets… ▽ More Remaining useful life (RUL) refers to the expected remaining lifespan of a component or system. Accurate RUL prediction is critical for prognostic and health management and for maintenance planning. In this work, we address three prevalent challenges in data-driven RUL prediction, namely the handling of high dimensional input features, the robustness to noise in sensor data and prognostic datasets, and the capturing of the time-dependency between system degradation and RUL prediction. We devise a highly accurate RUL prediction model with uncertainty quantification, which integrates and leverages the advantages of deep learning and nonstationary Gaussian process regression (DL-NSGPR). We examine and benchmark our model against other advanced data-driven RUL prediction models using the turbofan engine dataset from the NASA prognostic repository. Our computational experiments show that the DL-NSGPR predictions are highly accurate with root mean square error 1.7 to 6.2 times smaller than those of competing RUL models. Furthermore, the results demonstrate that RUL uncertainty bounds with the proposed DL-NSGPR are both valid and significantly tighter than other stochastic RUL prediction models. We unpack and discuss the reasons for this excellent performance of the DL-NSGPR. △ Less

Submitted 23 September, 2021; originally announced September 2021.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: text overlap with arXiv:2109.11579

arXiv:2109.11579 [pdf]

Remaining useful life prediction with uncertainty quantification: development of a highly accurate model for rotating machinery

Authors: Zhaoyi Xu, Yanjie Guo, Joseph Homer Saleh

Abstract: Rotating machinery is essential to modern life, from power generation to transportation and a host of other industrial applications. Since such equipment generally operates under challenging working conditions, which can lead to untimely failures, accurate remaining useful life (RUL) prediction is essential for maintenance planning and to prevent catastrophic failures. In this work, we address cur… ▽ More Rotating machinery is essential to modern life, from power generation to transportation and a host of other industrial applications. Since such equipment generally operates under challenging working conditions, which can lead to untimely failures, accurate remaining useful life (RUL) prediction is essential for maintenance planning and to prevent catastrophic failures. In this work, we address current challenges in data-driven RUL prediction for rotating machinery. The challenges revolve around the accuracy and uncertainty quantification of the prediction, and the non-stationarity of the system degradation and RUL estimation given sensor data. We devise a novel architecture and RUL prediction model with uncertainty quantification, termed VisPro, which integrates time-frequency analysis, deep learning image recognition, and nonstationary Gaussian process regression. We analyze and benchmark the results obtained with our model against those of other advanced data-driven RUL prediction models for rotating machinery using the PHM12 bearing vibration dataset. The computational experiments show that (1) the VisPro predictions are highly accurate and provide significant improvements over existing prediction models (three times more accurate than the second-best model), and (2) the RUL uncertainty bounds are valid and informative. We identify and discuss the architectural and modeling choices made that explain this excellent predictive performance of VisPro. △ Less

Submitted 23 September, 2021; originally announced September 2021.

arXiv:2105.02954 [pdf, other]

Deep Neural Networks Based Weight Approximation and Computation Reuse for 2-D Image Classification

Authors: Mohammed F. Tolba, Huruy Tekle Tesfai, Hani Saleh, Baker Mohammad, Mahmoud Al-Qutayri

Abstract: Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method to improve DNNs performance by fusing approximate computing with data reuse techniques to be used for image recognition applications. DNNs weigh… ▽ More Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method to improve DNNs performance by fusing approximate computing with data reuse techniques to be used for image recognition applications. DNNs weights are approximated based on the linear and quadratic approximation methods during the training phase, then, all of the weights are replaced with the linear/quadratic coefficients to execute the inference in a way where different weights could be computed using the same coefficients. This leads to a repetition of the weights across the processing element (PE) array, which in turn enables the reuse of the DNN sub-computations (computational reuse) and leverage the same data (data reuse) to reduce DNNs computations, memory accesses, and improve energy efficiency albeit at the cost of increased training time. Complete analysis for both MNIST and CIFAR 10 datasets is presented for image recognition , where LeNet 5 revealed a reduction in the number of parameters by a factor of 1211.3x with a drop of less than 0.9% in accuracy. When compared to the state of the art Row Stationary (RS) method, the proposed architecture saved 54% of the total number of adders and multipliers needed. Overall, the proposed approach is suitable for IoT edge devices as it reduces the memory size requirement as well as the number of needed memory accesses. △ Less

Submitted 28 April, 2021; originally announced May 2021.

Comments: 10 pages 9 figures

arXiv:2012.14938 [pdf, other]

doi 10.1109/TIFS.2021.3057576

UNSAIL: Thwarting Oracle-Less Machine Learning Attacks on Logic Locking

Authors: Lilas Alrahis, Satwik Patnaik, Johann Knechtel, Hani Saleh, Baker Mohammad, Mahmoud Al-Qutayri, Ozgur Sinanoglu

Abstract: Logic locking aims to protect the intellectual property (IP) of integrated circuit (IC) designs throughout the globalized supply chain. The SAIL attack, based on tailored machine learning (ML) models, circumvents combinational logic locking with high accuracy and is amongst the most potent attacks as it does not require a functional IC acting as an oracle. In this work, we propose UNSAIL, a logic… ▽ More Logic locking aims to protect the intellectual property (IP) of integrated circuit (IC) designs throughout the globalized supply chain. The SAIL attack, based on tailored machine learning (ML) models, circumvents combinational logic locking with high accuracy and is amongst the most potent attacks as it does not require a functional IC acting as an oracle. In this work, we propose UNSAIL, a logic locking technique that inserts key-gate structures with the specific aim to confuse ML models like those used in SAIL. More specifically, UNSAIL serves to prevent attacks seeking to resolve the structural transformations of synthesis-induced obfuscation, which is an essential step for logic locking. Our approach is generic; it can protect any local structure of key-gates against such ML-based attacks in an oracle-less setting. We develop a reference implementation for the SAIL attack and launch it on both traditionally locked and UNSAIL-locked designs. In SAIL, a change-prediction model is used to determine which key-gate structures to restore using a reconstruction model. Our study on benchmarks ranging from the ISCAS-85 and ITC-99 suites to the OpenRISC Reference Platform System-on-Chip (ORPSoC) confirms that UNSAIL degrades the accuracy of the change-prediction model and the reconstruction model by an average of 20.13 and 17 percentage points (pp) respectively. When the aforementioned models are combined, which is the most powerful scenario for SAIL, UNSAIL reduces the attack accuracy of SAIL by an average of 11pp. We further demonstrate that UNSAIL thwarts other oracle-less attacks, i.e., SWEEP and the redundancy attack, indicating the generic nature and strength of our approach. Detailed layout-level evaluations illustrate that UNSAIL incurs minimal area and power overheads of 0.26% and 0.61%, respectively, on the million-gate ORPSoC design. △ Less

Submitted 9 February, 2021; v1 submitted 29 December, 2020; originally announced December 2020.

Comments: IEEE Transactions on Information Forensics and Security (TIFS)

arXiv:2012.05948 [pdf, other]

GNNUnlock: Graph Neural Networks-based Oracle-less Unlocking Scheme for Provably Secure Logic Locking

Authors: Lilas Alrahis, Satwik Patnaik, Faiq Khalid, Muhammad Abdullah Hanif, Hani Saleh, Muhammad Shafique, Ozgur Sinanoglu

Abstract: In this paper, we propose GNNUnlock, the first-of-its-kind oracle-less machine learning-based attack on provably secure logic locking that can identify any desired protection logic without focusing on a specific syntactic topology. The key is to leverage a well-trained graph neural network (GNN) to identify all the gates in a given locked netlist that belong to the targeted protection logic, witho… ▽ More In this paper, we propose GNNUnlock, the first-of-its-kind oracle-less machine learning-based attack on provably secure logic locking that can identify any desired protection logic without focusing on a specific syntactic topology. The key is to leverage a well-trained graph neural network (GNN) to identify all the gates in a given locked netlist that belong to the targeted protection logic, without requiring an oracle. This approach fits perfectly with the targeted problem since a circuit is a graph with an inherent structure and the protection logic is a sub-graph of nodes (gates) with specific and common characteristics. GNNs are powerful in capturing the nodes' neighborhood properties, facilitating the detection of the protection logic. To rectify any misclassifications induced by the GNN, we additionally propose a connectivity analysis-based post-processing algorithm to successfully remove the predicted protection logic, thereby retrieving the original design. Our extensive experimental evaluation demonstrates that GNNUnlock is 99.24%-100% successful in breaking various benchmarks locked using stripped-functionality logic locking, tenacious and traceless logic locking, and Anti-SAT. Our proposed post-processing enhances the detection accuracy, reaching 100% for all of our tested locked benchmarks. Analysis of the results corroborates that GNNUnlock is powerful enough to break the considered schemes under different parameters, synthesis settings, and technology nodes. The evaluation further shows that GNNUnlock successfully breaks corner cases where even the most advanced state-of-the-art attacks fail. △ Less

Submitted 10 December, 2020; originally announced December 2020.

Comments: 6 pages, 4 figures, 6 tables, conference

arXiv:2008.08221 [pdf]

Machine Learning for Reliability Engineering and Safety Applications: Review of Current Status and Future Opportunities

Authors: Zhaoyi Xu, Joseph Homer Saleh

Abstract: Machine learning (ML) pervades an increasing number of academic disciplines and industries. Its impact is profound, and several fields have been fundamentally altered by it, autonomy and computer vision for example; reliability engineering and safety will undoubtedly follow suit. There is already a large but fragmented literature on ML for reliability and safety applications, and it can be overwhe… ▽ More Machine learning (ML) pervades an increasing number of academic disciplines and industries. Its impact is profound, and several fields have been fundamentally altered by it, autonomy and computer vision for example; reliability engineering and safety will undoubtedly follow suit. There is already a large but fragmented literature on ML for reliability and safety applications, and it can be overwhelming to navigate and integrate into a coherent whole. In this work, we facilitate this task by providing a synthesis of, and a roadmap to this ever-expanding analytical landscape and highlighting its major landmarks and pathways. We first provide an overview of the different ML categories and sub-categories or tasks, and we note several of the corresponding models and algorithms. We then look back and review the use of ML in reliability and safety applications. We examine several publications in each category/sub-category, and we include a short discussion on the use of Deep Learning to highlight its growing popularity and distinctive advantages. Finally, we look ahead and outline several promising future opportunities for leveraging ML in service of advancing reliability and safety considerations. Overall, we argue that ML is capable of providing novel insights and opportunities to solve important challenges in reliability and safety applications. It is also capable of teasing out more accurate insights from accident datasets than with traditional analysis tools, and this in turn can lead to better informed decision-making and more effective accident prevention. △ Less

Submitted 18 August, 2020; originally announced August 2020.

Comments: 50 pages

arXiv:2005.13814 [pdf, other]

doi 10.1017/S0956796820000131

Explicit Effect Subtyping

Authors: Georgios Karachalias, Matija Pretnar, Amr Hany Saleh, Stien Vanderhallen, Tom Schrijvers

Abstract: As popularity of algebraic effects and handlers increases, so does a demand for their efficient execution. Eff, an ML-like language with native support for handlers, has a subtyping-based effect system on which an effect-aware optimizing compiler could be built. Unfortunately, in our experience, implementing optimizations for Eff is overly error-prone because its core language is implicitly-typed,… ▽ More As popularity of algebraic effects and handlers increases, so does a demand for their efficient execution. Eff, an ML-like language with native support for handlers, has a subtyping-based effect system on which an effect-aware optimizing compiler could be built. Unfortunately, in our experience, implementing optimizations for Eff is overly error-prone because its core language is implicitly-typed, making code transformations very fragile. To remedy this, we present an explicitly-typed polymorphic core calculus for algebraic effect handlers with a subtyping-based type-and-effect system. It reifies appeals to subtyping in explicit casts with coercions that witness the subtyping proof, quickly exposing typing bugs in program transformations. Our typing-directed elaboration comes with a constraint-based inference algorithm that turns an implicitly-typed Eff-like language into our calculus. Moreover, all coercions and effect information can be erased in a straightforward way, demonstrating that coercions have no computational content. Additionally, we present a monadic translation from our calculus into a pure language without algebraic effects or handlers, using the effect information to introduce monadic constructs only where necessary. △ Less

Submitted 28 May, 2020; originally announced May 2020.

Comments: 57 pages, 29 figures

Journal ref: J. Funct. Prog. 30 (2020) e15

arXiv:1910.11829 [pdf, other]

doi 10.1145/3409964.3461793

String Matching with Wildcards in the Massively Parallel Computation Model

Authors: MohammadTaghi Hajiaghayi, Hamed Saleh, Saeed Seddighin, Xiaorui Sun

Abstract: We study distributed algorithms for string matching problem in presence of wildcard characters. Given a string T (a text), we look for all occurrences of another string P (a pattern) as a substring of string T . Each wildcard character in the pattern matches a specific class of strings based on its type. String matching is one of the most fundamental problems in computer science, especially in the… ▽ More We study distributed algorithms for string matching problem in presence of wildcard characters. Given a string T (a text), we look for all occurrences of another string P (a pattern) as a substring of string T . Each wildcard character in the pattern matches a specific class of strings based on its type. String matching is one of the most fundamental problems in computer science, especially in the fields of bioinformatics and machine learning. Persistent effort has led to a variety of algorithms for the problem since 1960s. With rise of big data and the inevitable demand to solve problems on huge data sets, there have been many attempts to adapt classic algorithms into the MPC framework to obtain further efficiency. MPC is a recent framework for parallel computation of big data, which is designed to capture the MapReduce-like algorithms. In this paper, we study the string matching problem using a set of tools translated to MPC model. We consider three types of wildcards in string matching: - '?' wildcard: In this setting, the pattern is allowed to contain special '?' characters or don't cares that match any character of the text. String matching with don't cares could be solved by fast convolutions, and we give a constant round MPC algorithm for which by utilizing FFT in a constant number of MPC rounds. - '+' wildcard: '+' wildcard is a special character that allows for arbitrary repetitions of a character. When the pattern contains '+' wildcard characters, our algorithm runs in a constant number of MPC rounds by a reduction from subset matching problem. - '*' wildcard: '*' is a special character that matches with any substring of the text. When '*' is allowed in the pattern, we solve two special cases of the problem in logarithmic rounds. △ Less

Submitted 4 June, 2021; v1 submitted 25 October, 2019; originally announced October 2019.

arXiv:1909.04428 [pdf, other]

ScanSAT: Unlocking Static and Dynamic Scan Obfuscation

Authors: Lilas Alrahis, Muhammad Yasin, Nimisha Limaye, Hani Saleh, Baker Mohammad, Mahmoud Al-Qutayri, Ozgur Sinanoglu

Abstract: While financially advantageous, outsourcing key steps, such as testing, to potentially untrusted Outsourced Assembly and Test (OSAT) companies may pose a risk of compromising on-chip assets. Obfuscation of scan chains is a technique that hides the actual scan data from the untrusted testers; logic inserted between the scan cells, driven by a secret key, hides the transformation functions that map… ▽ More While financially advantageous, outsourcing key steps, such as testing, to potentially untrusted Outsourced Assembly and Test (OSAT) companies may pose a risk of compromising on-chip assets. Obfuscation of scan chains is a technique that hides the actual scan data from the untrusted testers; logic inserted between the scan cells, driven by a secret key, hides the transformation functions that map the scan-in stimulus (scan-out response) and the delivered scan pattern (captured response). While static scan obfuscation utilizes the same secret key, and thus, the same secret transformation functions throughout the lifetime of the chip, dynamic scan obfuscation updates the key periodically. In this paper, we propose ScanSAT: an attack that transforms a scan obfuscated circuit to its logic-locked version and applies the Boolean satisfiability (SAT) based attack, thereby extracting the secret key. We implement our attack, apply on representative scan obfuscation techniques, and show that ScanSAT can break both static and dynamic scan obfuscation schemes with 100% success rate. Moreover, ScanSAT is effective even for large key sizes and in the presence of scan compression. △ Less

Submitted 10 September, 2019; originally announced September 2019.

Comments: 16 pages, 14 figures, IEEE Transactions on Emerging Topics in Computing

arXiv:1805.06191 [pdf, other]

Fair Allocation of Indivisible Items With Externalities

Authors: Mohammad Ghodsi, Hamed Saleh, Masoud Seddighin

Abstract: One of the important yet insufficiently studied subjects in fair allocation is the externality effect among agents. For a resource allocation problem, externalities imply that a bundle allocated to an agent may affect the utilities of other agents. In this paper, we conduct a study of fair allocation of indivisible goods when the externalities are not negligible. We present a simple and natural… ▽ More One of the important yet insufficiently studied subjects in fair allocation is the externality effect among agents. For a resource allocation problem, externalities imply that a bundle allocated to an agent may affect the utilities of other agents. In this paper, we conduct a study of fair allocation of indivisible goods when the externalities are not negligible. We present a simple and natural model, namely \emph{network externalities}, to capture the externalities. To evaluate fairness in the network externalities model, we generalize the idea behind the notion of maximin-share ($\MMS$) to achieve a new criterion, namely, \emph{extended-maximin-share} ($\EMMS$). Next, we consider two problems concerning our model. First, we discuss the computational aspects of finding the value of $\EMMS$ for every agent. For this, we introduce a generalized form of partitioning problem that includes many famous partitioning problems such as maximin, minimax, and leximin partitioning problems. We show that a $1/2$-approximation algorithm exists for this partitioning problem. Next, we investigate on finding approximately optimal $\EMMS$ allocations. That is, allocations that guarantee every agent a utility of at least a fraction of his extended-maximin-share. We show that under a natural assumption that the agents are $α$-self-reliant, an $α/2$-$\EMMS$ allocation always exists. The combination of this with the former result yields a polynomial-time $α/4$-$\EMMS$ allocation algorithm. △ Less

Submitted 16 May, 2018; originally announced May 2018.

arXiv:1608.00816 [pdf, ps, other]

Efficient Algebraic Effect Handlers for Prolog

Authors: Amr Hany Saleh, Tom Schrijvers

Abstract: Recent work has provided delimited control for Prolog to dynamically manipulate the program control-flow, and to implement a wide range of control-flow and dataflow effects on top of. Unfortunately, delimited control is a rather primitive language feature that is not easy to use. As a remedy, this work introduces algebraic effect handlers for Prolog, as a high-level and structured way of definin… ▽ More Recent work has provided delimited control for Prolog to dynamically manipulate the program control-flow, and to implement a wide range of control-flow and dataflow effects on top of. Unfortunately, delimited control is a rather primitive language feature that is not easy to use. As a remedy, this work introduces algebraic effect handlers for Prolog, as a high-level and structured way of defining new side-effects in a modular fashion. We illustrate the expressive power of the feature and provide an implementation by means of elaboration into the delimited control primitives. The latter add a non-negligible performance overhead when used extensively. To address this issue, we present an optimised compilation approach that combines partial evaluation with dedicated rewrite rules. The rewrite rules are driven by a lightweight effect inference that analyses what effect operations may be called by a goal. We illustrate the effectiveness of this approach on a range of benchmarks. This article is under consideration for acceptance in TPLP. △ Less

Submitted 2 August, 2016; originally announced August 2016.

Comments: Paper presented at the 32nd International Conference on Logic Programming (ICLP 2016), New York City, USA, 16-21 October 2016, LaTex, 14 pages, 2 figures

Showing 1–18 of 18 results for author: Saleh, H