-
Number Systems for Deep Neural Network Architectures: A Survey
Authors:
Ghada Alsuhli,
Vasileios Sakellariou,
Hani Saleh,
Mahmoud Al-Qutayri,
Baker Mohammad,
Thanos Stouraitis
Abstract:
Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, ene…
▽ More
Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlighted
△ Less
Submitted 11 July, 2023;
originally announced July 2023.
-
$\tilde{O}(n+\mathrm{poly}(k))$-time Algorithm for Bounded Tree Edit Distance
Authors:
Debarati Das,
Jacob Gilbert,
MohammadTaghi Hajiaghayi,
Tomasz Kociumaka,
Barna Saha,
Hamed Saleh
Abstract:
Computing the edit distance of two strings is one of the most basic problems in computer science and combinatorial optimization. Tree edit distance is a natural generalization of edit distance in which the task is to compute a measure of dissimilarity between two (unweighted) rooted trees with node labels. Perhaps the most notable recent application of tree edit distance is in NoSQL big databases,…
▽ More
Computing the edit distance of two strings is one of the most basic problems in computer science and combinatorial optimization. Tree edit distance is a natural generalization of edit distance in which the task is to compute a measure of dissimilarity between two (unweighted) rooted trees with node labels. Perhaps the most notable recent application of tree edit distance is in NoSQL big databases, such as MongoDB, where each row of the database is a JSON document represented as a labeled rooted tree, and finding dissimilarity between two rows is a basic operation. Until recently, the fastest algorithm for tree edit distance ran in cubic time (Demaine, Mozes, Rossman, Weimann; TALG'10); however, Mao (FOCS'21) broke the cubic barrier for the tree edit distance problem using fast matrix multiplication.
Given a parameter $k$ as an upper bound on the distance, an $O(n+k^2)$-time algorithm for edit distance has been known since the 1980s due to the works of Myers (Algorithmica'86) and Landau and Vishkin (JCSS'88). The existence of an $\tilde{O}(n+\mathrm{poly}(k))$-time algorithm for tree edit distance has been posed as an open question, e.g., by Akmal and Jin (ICALP'21), who gave a state-of-the-art $\tilde{O}(nk^2)$-time algorithm. In this paper, we answer this question positively.
△ Less
Submitted 15 September, 2022;
originally announced September 2022.
-
Dynamic Subset Sum with Truly Sublinear Processing Time
Authors:
Hamed Saleh,
Saeed Seddighin
Abstract:
Subset sum is a very old and fundamental problem in theoretical computer science. In this problem, $n$ items with weights $w_1, w_2, w_3, \ldots, w_n$ are given as input and the goal is to find out if there is a subset of them whose weights sum up to a given value $t$. While the problem is NP-hard in general, when the values are non-negative integer, subset sum can be solved in pseudo-polynomial t…
▽ More
Subset sum is a very old and fundamental problem in theoretical computer science. In this problem, $n$ items with weights $w_1, w_2, w_3, \ldots, w_n$ are given as input and the goal is to find out if there is a subset of them whose weights sum up to a given value $t$. While the problem is NP-hard in general, when the values are non-negative integer, subset sum can be solved in pseudo-polynomial time $~\widetilde O(n+t)$.
In this work, we consider the dynamic variant of subset sum. In this setting, an upper bound $\tmax$ is provided in advance to the algorithm and in each operation, either a new item is added to the problem or for a given integer value $t \leq \tmax$, the algorithm is required to output whether there is a subset of items whose sum of weights is equal to $t$. Unfortunately, none of the existing subset sum algorithms is able to process these operations in truly sublinear time\footnote{Truly sublinear means $n^{1-Ω(1)}$.} in terms of $\tmax$.
Our main contribution is an algorithm whose amortized processing time\footnote{Since the runtimes are amortized, we do not use separate terms update time and query time for different operations and use processing time for all types of operations.} for each operation is truly sublinear in $\tmax$ when the number of operations is at least $\tmax^{2/3+Ω(1)}$. We also show that when both element addition and element removal are allowed, there is no algorithm that can process each operation in time $\tmax^{1-Ω(1)}$ on average unless \textsf{SETH}\footnote{The \textit{strong exponential time hypothesis} states that no algorithm can solve the satisfiability problem in time $2^{n(1-Ω(1))}$.} fails.
△ Less
Submitted 11 September, 2022;
originally announced September 2022.
-
Adaptive Massively Parallel Algorithms for Cut Problems
Authors:
MohammadTaghi Hajiaghayi,
Marina Knittel,
Jan Olkowski,
Hamed Saleh
Abstract:
We study the Weighted Min Cut problem in the Adaptive Massively Parallel Computation (AMPC) model. In 2019, Behnezhad et al. [3] introduced the AMPC model as an extension of the Massively Parallel Computation (MPC) model. In the past decade, research on highly scalable algorithms has had significant impact on many massive systems. The MPC model, introduced in 2010 by Karloff et al. [16], which is…
▽ More
We study the Weighted Min Cut problem in the Adaptive Massively Parallel Computation (AMPC) model. In 2019, Behnezhad et al. [3] introduced the AMPC model as an extension of the Massively Parallel Computation (MPC) model. In the past decade, research on highly scalable algorithms has had significant impact on many massive systems. The MPC model, introduced in 2010 by Karloff et al. [16], which is an abstraction of famous practical frameworks such as MapReduce, Hadoop, Flume, and Spark, has been at the forefront of this research. While great strides have been taken to create highly efficient MPC algorithms for a range of problems, recent progress has been limited by the 1-vs-2 Cycle Conjecture [20], which postulates that the simple problem of distinguishing between one and two cycles requires $Ω(\log n)$ MPC rounds. In the AMPC model, each machine has adaptive read access to a distributed hash table even when communication is restricted (i.e., in the middle of a round). While remaining practical [4], this gives algorithms the power to bypass limitations like the 1-vs-2 Cycle Conjecture.
We give the first sublogarithmic AMPC algorithm, requiring $O(\log\log n)$ rounds, for $(2+ε)$-approximate weighted Min Cut. Our algorithm is inspired by the divide and conquer approach of Ghaffari and Nowicki [11], which solves the $(2+ε)$-approximate weighted Min Cut problem in $O(\log n\log\log n)$ rounds of MPC using the classic result of Karger and Stein [15]. Our work is fully-scalable in the sense that the local memory of each machine is $O(n^ε)$ for any constant $0 < ε< 1$. There are no $o(\log n)$-round MPC algorithms for Min Cut in this memory regime assuming the 1-vs-2 Cycle Conjecture holds. The exponential speedup in AMPC is the result of decoupling the different layers of the divide and conquer algorithm and solving all layers in $O(1)$ rounds.
△ Less
Submitted 27 May, 2022;
originally announced May 2022.
-
Adaptive Massively Parallel Constant-round Tree Contraction
Authors:
MohammadTaghi Hajiaghayi,
Marina Knittel,
Hamed Saleh,
Hsin-Hao Su
Abstract:
Miller and Reif's FOCS'85 classic and fundamental tree contraction algorithm is a broadly applicable technique for the parallel solution of a large number of tree problems. Additionally it is also used as an algorithmic design technique for a large number of parallel graph algorithms. In all previously explored models of computation, however, tree contractions have only been achieved in…
▽ More
Miller and Reif's FOCS'85 classic and fundamental tree contraction algorithm is a broadly applicable technique for the parallel solution of a large number of tree problems. Additionally it is also used as an algorithmic design technique for a large number of parallel graph algorithms. In all previously explored models of computation, however, tree contractions have only been achieved in $Ω(\log n)$ rounds of parallel run time. In this work, we not only introduce a generalized tree contraction method but also show it can be computed highly efficiently in $O(1/ε^3)$ rounds in the Adaptive Massively Parallel Computing (AMPC) setting, where each machine has $O(n^ε)$ local memory for some $0 < ε< 1$. AMPC is a practical extension of Massively Parallel Computing (MPC) which utilizes distributed hash tables. In general, MPC is an abstract model for MapReduce, Hadoop, Spark, and Flume which are currently widely used across industry and has been studied extensively in the theory community in recent years. Last but not least, we show that our results extend to multiple problems on trees, including but not limited to maximum and maximal matching, maximum and maximal independent set, tree isomorphism testing, and more.
△ Less
Submitted 2 November, 2021;
originally announced November 2021.
-
Detection of Hate Speech using BERT and Hate Speech Word Embedding with Deep Model
Authors:
Hind Saleh,
Areej Alhothali,
Kawthar Moria
Abstract:
The enormous amount of data being generated on the web and social media has increased the demand for detecting online hate speech. Detecting hate speech will reduce their negative impact and influence on others. A lot of effort in the Natural Language Processing (NLP) domain aimed to detect hate speech in general or detect specific hate speech such as religion, race, gender, or sexual orientation.…
▽ More
The enormous amount of data being generated on the web and social media has increased the demand for detecting online hate speech. Detecting hate speech will reduce their negative impact and influence on others. A lot of effort in the Natural Language Processing (NLP) domain aimed to detect hate speech in general or detect specific hate speech such as religion, race, gender, or sexual orientation. Hate communities tend to use abbreviations, intentional spelling mistakes, and coded words in their communication to evade detection, adding more challenges to hate speech detection tasks. Thus, word representation will play an increasingly pivotal role in detecting hate speech. This paper investigates the feasibility of leveraging domain-specific word embedding in Bidirectional LSTM based deep model to automatically detect/classify hate speech. Furthermore, we investigate the use of the transfer learning language model (BERT) on hate speech problem as a binary classification task. The experiments showed that domainspecific word embedding with the Bidirectional LSTM based deep model achieved a 93% f1-score while BERT achieved up to 96% f1-score on a combined balanced dataset from available hate speech datasets.
△ Less
Submitted 2 November, 2021;
originally announced November 2021.
-
C3PU: Cross-Coupling Capacitor Processing Unit Using Analog-Mixed Signal In-Memory Computing for AI Inference
Authors:
Dima Kilani,
Baker Mohammad,
Yasmin Halawani,
Mohammed F. Tolba,
Hani Saleh
Abstract:
This paper presents a novel cross-coupling capacitor processing unit (C3PU) that supports analog-mixed signal in memory computing to perform multiply-and-accumulate (MAC) operations. The C3PU consists of a capacitive unit, a CMOS transistor, and a voltage-to-time converter (VTC). The capacitive unit serves as a computational element that holds the multiplier operand and performs multiplication onc…
▽ More
This paper presents a novel cross-coupling capacitor processing unit (C3PU) that supports analog-mixed signal in memory computing to perform multiply-and-accumulate (MAC) operations. The C3PU consists of a capacitive unit, a CMOS transistor, and a voltage-to-time converter (VTC). The capacitive unit serves as a computational element that holds the multiplier operand and performs multiplication once the multiplicand is applied at the terminal. The multiplicand is the input voltage that is converted to a pulse width signal using a low power VTC. The transistor transfers this multiplication where a voltage level is generated. A demonstrator of 5x4 C3PU array that is capable of implementing 4 MAC units is presented. The design has been verified using Monte Carlo simulation in 65 nm technology. The 5x4 C3PU consumed energy of 66.4 fJ/MAC at 0.3 V voltage supply with an error of 5.7%. The proposed unit achieves lower energy and occupies a smaller area by 3.4x and 3.6x, respectively, with similar error value when compared to a digital-based 8x4-bit fixed point MAC unit. The C3PU has been utilized through an iris fower classification utilizing an artificial neural network which achieved a 90% classification accuracy compared to ideal accuracy of 96.67% using MATLAB.
△ Less
Submitted 11 October, 2021;
originally announced October 2021.
-
Accurate Remaining Useful Life Prediction with Uncertainty Quantification: a Deep Learning and Nonstationary Gaussian Process Approach
Authors:
Zhaoyi Xu,
Yanjie Guo,
Joseph Homer Saleh
Abstract:
Remaining useful life (RUL) refers to the expected remaining lifespan of a component or system. Accurate RUL prediction is critical for prognostic and health management and for maintenance planning. In this work, we address three prevalent challenges in data-driven RUL prediction, namely the handling of high dimensional input features, the robustness to noise in sensor data and prognostic datasets…
▽ More
Remaining useful life (RUL) refers to the expected remaining lifespan of a component or system. Accurate RUL prediction is critical for prognostic and health management and for maintenance planning. In this work, we address three prevalent challenges in data-driven RUL prediction, namely the handling of high dimensional input features, the robustness to noise in sensor data and prognostic datasets, and the capturing of the time-dependency between system degradation and RUL prediction. We devise a highly accurate RUL prediction model with uncertainty quantification, which integrates and leverages the advantages of deep learning and nonstationary Gaussian process regression (DL-NSGPR). We examine and benchmark our model against other advanced data-driven RUL prediction models using the turbofan engine dataset from the NASA prognostic repository. Our computational experiments show that the DL-NSGPR predictions are highly accurate with root mean square error 1.7 to 6.2 times smaller than those of competing RUL models. Furthermore, the results demonstrate that RUL uncertainty bounds with the proposed DL-NSGPR are both valid and significantly tighter than other stochastic RUL prediction models. We unpack and discuss the reasons for this excellent performance of the DL-NSGPR.
△ Less
Submitted 23 September, 2021;
originally announced September 2021.
-
Remaining useful life prediction with uncertainty quantification: development of a highly accurate model for rotating machinery
Authors:
Zhaoyi Xu,
Yanjie Guo,
Joseph Homer Saleh
Abstract:
Rotating machinery is essential to modern life, from power generation to transportation and a host of other industrial applications. Since such equipment generally operates under challenging working conditions, which can lead to untimely failures, accurate remaining useful life (RUL) prediction is essential for maintenance planning and to prevent catastrophic failures. In this work, we address cur…
▽ More
Rotating machinery is essential to modern life, from power generation to transportation and a host of other industrial applications. Since such equipment generally operates under challenging working conditions, which can lead to untimely failures, accurate remaining useful life (RUL) prediction is essential for maintenance planning and to prevent catastrophic failures. In this work, we address current challenges in data-driven RUL prediction for rotating machinery. The challenges revolve around the accuracy and uncertainty quantification of the prediction, and the non-stationarity of the system degradation and RUL estimation given sensor data. We devise a novel architecture and RUL prediction model with uncertainty quantification, termed VisPro, which integrates time-frequency analysis, deep learning image recognition, and nonstationary Gaussian process regression. We analyze and benchmark the results obtained with our model against those of other advanced data-driven RUL prediction models for rotating machinery using the PHM12 bearing vibration dataset. The computational experiments show that (1) the VisPro predictions are highly accurate and provide significant improvements over existing prediction models (three times more accurate than the second-best model), and (2) the RUL uncertainty bounds are valid and informative. We identify and discuss the architectural and modeling choices made that explain this excellent predictive performance of VisPro.
△ Less
Submitted 23 September, 2021;
originally announced September 2021.
-
Deep Neural Networks Based Weight Approximation and Computation Reuse for 2-D Image Classification
Authors:
Mohammed F. Tolba,
Huruy Tekle Tesfai,
Hani Saleh,
Baker Mohammad,
Mahmoud Al-Qutayri
Abstract:
Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method to improve DNNs performance by fusing approximate computing with data reuse techniques to be used for image recognition applications. DNNs weigh…
▽ More
Deep Neural Networks (DNNs) are computationally and memory intensive, which makes their hardware implementation a challenging task especially for resource constrained devices such as IoT nodes. To address this challenge, this paper introduces a new method to improve DNNs performance by fusing approximate computing with data reuse techniques to be used for image recognition applications. DNNs weights are approximated based on the linear and quadratic approximation methods during the training phase, then, all of the weights are replaced with the linear/quadratic coefficients to execute the inference in a way where different weights could be computed using the same coefficients. This leads to a repetition of the weights across the processing element (PE) array, which in turn enables the reuse of the DNN sub-computations (computational reuse) and leverage the same data (data reuse) to reduce DNNs computations, memory accesses, and improve energy efficiency albeit at the cost of increased training time. Complete analysis for both MNIST and CIFAR 10 datasets is presented for image recognition , where LeNet 5 revealed a reduction in the number of parameters by a factor of 1211.3x with a drop of less than 0.9% in accuracy. When compared to the state of the art Row Stationary (RS) method, the proposed architecture saved 54% of the total number of adders and multipliers needed. Overall, the proposed approach is suitable for IoT edge devices as it reduces the memory size requirement as well as the number of needed memory accesses.
△ Less
Submitted 28 April, 2021;
originally announced May 2021.
-
UNSAIL: Thwarting Oracle-Less Machine Learning Attacks on Logic Locking
Authors:
Lilas Alrahis,
Satwik Patnaik,
Johann Knechtel,
Hani Saleh,
Baker Mohammad,
Mahmoud Al-Qutayri,
Ozgur Sinanoglu
Abstract:
Logic locking aims to protect the intellectual property (IP) of integrated circuit (IC) designs throughout the globalized supply chain. The SAIL attack, based on tailored machine learning (ML) models, circumvents combinational logic locking with high accuracy and is amongst the most potent attacks as it does not require a functional IC acting as an oracle. In this work, we propose UNSAIL, a logic…
▽ More
Logic locking aims to protect the intellectual property (IP) of integrated circuit (IC) designs throughout the globalized supply chain. The SAIL attack, based on tailored machine learning (ML) models, circumvents combinational logic locking with high accuracy and is amongst the most potent attacks as it does not require a functional IC acting as an oracle. In this work, we propose UNSAIL, a logic locking technique that inserts key-gate structures with the specific aim to confuse ML models like those used in SAIL. More specifically, UNSAIL serves to prevent attacks seeking to resolve the structural transformations of synthesis-induced obfuscation, which is an essential step for logic locking. Our approach is generic; it can protect any local structure of key-gates against such ML-based attacks in an oracle-less setting. We develop a reference implementation for the SAIL attack and launch it on both traditionally locked and UNSAIL-locked designs. In SAIL, a change-prediction model is used to determine which key-gate structures to restore using a reconstruction model. Our study on benchmarks ranging from the ISCAS-85 and ITC-99 suites to the OpenRISC Reference Platform System-on-Chip (ORPSoC) confirms that UNSAIL degrades the accuracy of the change-prediction model and the reconstruction model by an average of 20.13 and 17 percentage points (pp) respectively. When the aforementioned models are combined, which is the most powerful scenario for SAIL, UNSAIL reduces the attack accuracy of SAIL by an average of 11pp. We further demonstrate that UNSAIL thwarts other oracle-less attacks, i.e., SWEEP and the redundancy attack, indicating the generic nature and strength of our approach. Detailed layout-level evaluations illustrate that UNSAIL incurs minimal area and power overheads of 0.26% and 0.61%, respectively, on the million-gate ORPSoC design.
△ Less
Submitted 9 February, 2021; v1 submitted 29 December, 2020;
originally announced December 2020.
-
GNNUnlock: Graph Neural Networks-based Oracle-less Unlocking Scheme for Provably Secure Logic Locking
Authors:
Lilas Alrahis,
Satwik Patnaik,
Faiq Khalid,
Muhammad Abdullah Hanif,
Hani Saleh,
Muhammad Shafique,
Ozgur Sinanoglu
Abstract:
In this paper, we propose GNNUnlock, the first-of-its-kind oracle-less machine learning-based attack on provably secure logic locking that can identify any desired protection logic without focusing on a specific syntactic topology. The key is to leverage a well-trained graph neural network (GNN) to identify all the gates in a given locked netlist that belong to the targeted protection logic, witho…
▽ More
In this paper, we propose GNNUnlock, the first-of-its-kind oracle-less machine learning-based attack on provably secure logic locking that can identify any desired protection logic without focusing on a specific syntactic topology. The key is to leverage a well-trained graph neural network (GNN) to identify all the gates in a given locked netlist that belong to the targeted protection logic, without requiring an oracle. This approach fits perfectly with the targeted problem since a circuit is a graph with an inherent structure and the protection logic is a sub-graph of nodes (gates) with specific and common characteristics. GNNs are powerful in capturing the nodes' neighborhood properties, facilitating the detection of the protection logic. To rectify any misclassifications induced by the GNN, we additionally propose a connectivity analysis-based post-processing algorithm to successfully remove the predicted protection logic, thereby retrieving the original design. Our extensive experimental evaluation demonstrates that GNNUnlock is 99.24%-100% successful in breaking various benchmarks locked using stripped-functionality logic locking, tenacious and traceless logic locking, and Anti-SAT. Our proposed post-processing enhances the detection accuracy, reaching 100% for all of our tested locked benchmarks. Analysis of the results corroborates that GNNUnlock is powerful enough to break the considered schemes under different parameters, synthesis settings, and technology nodes. The evaluation further shows that GNNUnlock successfully breaks corner cases where even the most advanced state-of-the-art attacks fail.
△ Less
Submitted 10 December, 2020;
originally announced December 2020.
-
Machine Learning for Reliability Engineering and Safety Applications: Review of Current Status and Future Opportunities
Authors:
Zhaoyi Xu,
Joseph Homer Saleh
Abstract:
Machine learning (ML) pervades an increasing number of academic disciplines and industries. Its impact is profound, and several fields have been fundamentally altered by it, autonomy and computer vision for example; reliability engineering and safety will undoubtedly follow suit. There is already a large but fragmented literature on ML for reliability and safety applications, and it can be overwhe…
▽ More
Machine learning (ML) pervades an increasing number of academic disciplines and industries. Its impact is profound, and several fields have been fundamentally altered by it, autonomy and computer vision for example; reliability engineering and safety will undoubtedly follow suit. There is already a large but fragmented literature on ML for reliability and safety applications, and it can be overwhelming to navigate and integrate into a coherent whole. In this work, we facilitate this task by providing a synthesis of, and a roadmap to this ever-expanding analytical landscape and highlighting its major landmarks and pathways. We first provide an overview of the different ML categories and sub-categories or tasks, and we note several of the corresponding models and algorithms. We then look back and review the use of ML in reliability and safety applications. We examine several publications in each category/sub-category, and we include a short discussion on the use of Deep Learning to highlight its growing popularity and distinctive advantages. Finally, we look ahead and outline several promising future opportunities for leveraging ML in service of advancing reliability and safety considerations. Overall, we argue that ML is capable of providing novel insights and opportunities to solve important challenges in reliability and safety applications. It is also capable of teasing out more accurate insights from accident datasets than with traditional analysis tools, and this in turn can lead to better informed decision-making and more effective accident prevention.
△ Less
Submitted 18 August, 2020;
originally announced August 2020.
-
Explicit Effect Subtyping
Authors:
Georgios Karachalias,
Matija Pretnar,
Amr Hany Saleh,
Stien Vanderhallen,
Tom Schrijvers
Abstract:
As popularity of algebraic effects and handlers increases, so does a demand for their efficient execution. Eff, an ML-like language with native support for handlers, has a subtyping-based effect system on which an effect-aware optimizing compiler could be built. Unfortunately, in our experience, implementing optimizations for Eff is overly error-prone because its core language is implicitly-typed,…
▽ More
As popularity of algebraic effects and handlers increases, so does a demand for their efficient execution. Eff, an ML-like language with native support for handlers, has a subtyping-based effect system on which an effect-aware optimizing compiler could be built. Unfortunately, in our experience, implementing optimizations for Eff is overly error-prone because its core language is implicitly-typed, making code transformations very fragile.
To remedy this, we present an explicitly-typed polymorphic core calculus for algebraic effect handlers with a subtyping-based type-and-effect system. It reifies appeals to subtyping in explicit casts with coercions that witness the subtyping proof, quickly exposing typing bugs in program transformations.
Our typing-directed elaboration comes with a constraint-based inference algorithm that turns an implicitly-typed Eff-like language into our calculus. Moreover, all coercions and effect information can be erased in a straightforward way, demonstrating that coercions have no computational content. Additionally, we present a monadic translation from our calculus into a pure language without algebraic effects or handlers, using the effect information to introduce monadic constructs only where necessary.
△ Less
Submitted 28 May, 2020;
originally announced May 2020.
-
String Matching with Wildcards in the Massively Parallel Computation Model
Authors:
MohammadTaghi Hajiaghayi,
Hamed Saleh,
Saeed Seddighin,
Xiaorui Sun
Abstract:
We study distributed algorithms for string matching problem in presence of wildcard characters. Given a string T (a text), we look for all occurrences of another string P (a pattern) as a substring of string T . Each wildcard character in the pattern matches a specific class of strings based on its type. String matching is one of the most fundamental problems in computer science, especially in the…
▽ More
We study distributed algorithms for string matching problem in presence of wildcard characters. Given a string T (a text), we look for all occurrences of another string P (a pattern) as a substring of string T . Each wildcard character in the pattern matches a specific class of strings based on its type. String matching is one of the most fundamental problems in computer science, especially in the fields of bioinformatics and machine learning. Persistent effort has led to a variety of algorithms for the problem since 1960s.
With rise of big data and the inevitable demand to solve problems on huge data sets, there have been many attempts to adapt classic algorithms into the MPC framework to obtain further efficiency. MPC is a recent framework for parallel computation of big data, which is designed to capture the MapReduce-like algorithms. In this paper, we study the string matching problem using a set of tools translated to MPC model. We consider three types of wildcards in string matching:
- '?' wildcard: In this setting, the pattern is allowed to contain special '?' characters or don't cares that match any character of the text. String matching with don't cares could be solved by fast convolutions, and we give a constant round MPC algorithm for which by utilizing FFT in a constant number of MPC rounds.
- '+' wildcard: '+' wildcard is a special character that allows for arbitrary repetitions of a character. When the pattern contains '+' wildcard characters, our algorithm runs in a constant number of MPC rounds by a reduction from subset matching problem.
- '*' wildcard: '*' is a special character that matches with any substring of the text. When '*' is allowed in the pattern, we solve two special cases of the problem in logarithmic rounds.
△ Less
Submitted 4 June, 2021; v1 submitted 25 October, 2019;
originally announced October 2019.
-
ScanSAT: Unlocking Static and Dynamic Scan Obfuscation
Authors:
Lilas Alrahis,
Muhammad Yasin,
Nimisha Limaye,
Hani Saleh,
Baker Mohammad,
Mahmoud Al-Qutayri,
Ozgur Sinanoglu
Abstract:
While financially advantageous, outsourcing key steps, such as testing, to potentially untrusted Outsourced Assembly and Test (OSAT) companies may pose a risk of compromising on-chip assets. Obfuscation of scan chains is a technique that hides the actual scan data from the untrusted testers; logic inserted between the scan cells, driven by a secret key, hides the transformation functions that map…
▽ More
While financially advantageous, outsourcing key steps, such as testing, to potentially untrusted Outsourced Assembly and Test (OSAT) companies may pose a risk of compromising on-chip assets. Obfuscation of scan chains is a technique that hides the actual scan data from the untrusted testers; logic inserted between the scan cells, driven by a secret key, hides the transformation functions that map the scan-in stimulus (scan-out response) and the delivered scan pattern (captured response). While static scan obfuscation utilizes the same secret key, and thus, the same secret transformation functions throughout the lifetime of the chip, dynamic scan obfuscation updates the key periodically. In this paper, we propose ScanSAT: an attack that transforms a scan obfuscated circuit to its logic-locked version and applies the Boolean satisfiability (SAT) based attack, thereby extracting the secret key. We implement our attack, apply on representative scan obfuscation techniques, and show that ScanSAT can break both static and dynamic scan obfuscation schemes with 100% success rate. Moreover, ScanSAT is effective even for large key sizes and in the presence of scan compression.
△ Less
Submitted 10 September, 2019;
originally announced September 2019.
-
Fair Allocation of Indivisible Items With Externalities
Authors:
Mohammad Ghodsi,
Hamed Saleh,
Masoud Seddighin
Abstract:
One of the important yet insufficiently studied subjects in fair allocation is the externality effect among agents. For a resource allocation problem, externalities imply that a bundle allocated to an agent may affect the utilities of other agents.
In this paper, we conduct a study of fair allocation of indivisible goods when the externalities are not negligible. We present a simple and natural…
▽ More
One of the important yet insufficiently studied subjects in fair allocation is the externality effect among agents. For a resource allocation problem, externalities imply that a bundle allocated to an agent may affect the utilities of other agents.
In this paper, we conduct a study of fair allocation of indivisible goods when the externalities are not negligible. We present a simple and natural model, namely \emph{network externalities}, to capture the externalities. To evaluate fairness in the network externalities model, we generalize the idea behind the notion of maximin-share ($\MMS$) to achieve a new criterion, namely, \emph{extended-maximin-share} ($\EMMS$). Next, we consider two problems concerning our model.
First, we discuss the computational aspects of finding the value of $\EMMS$ for every agent. For this, we introduce a generalized form of partitioning problem that includes many famous partitioning problems such as maximin, minimax, and leximin partitioning problems. We show that a $1/2$-approximation algorithm exists for this partitioning problem.
Next, we investigate on finding approximately optimal $\EMMS$ allocations. That is, allocations that guarantee every agent a utility of at least a fraction of his extended-maximin-share. We show that under a natural assumption that the agents are $α$-self-reliant, an $α/2$-$\EMMS$ allocation always exists. The combination of this with the former result yields a polynomial-time $α/4$-$\EMMS$ allocation algorithm.
△ Less
Submitted 16 May, 2018;
originally announced May 2018.
-
Efficient Algebraic Effect Handlers for Prolog
Authors:
Amr Hany Saleh,
Tom Schrijvers
Abstract:
Recent work has provided delimited control for Prolog to dynamically manipulate the program control-flow, and to implement a wide range of control-flow and dataflow effects on top of. Unfortunately, delimited control is a rather primitive language feature that is not easy to use.
As a remedy, this work introduces algebraic effect handlers for Prolog, as a high-level and structured way of definin…
▽ More
Recent work has provided delimited control for Prolog to dynamically manipulate the program control-flow, and to implement a wide range of control-flow and dataflow effects on top of. Unfortunately, delimited control is a rather primitive language feature that is not easy to use.
As a remedy, this work introduces algebraic effect handlers for Prolog, as a high-level and structured way of defining new side-effects in a modular fashion. We illustrate the expressive power of the feature and provide an implementation by means of elaboration into the delimited control primitives.
The latter add a non-negligible performance overhead when used extensively. To address this issue, we present an optimised compilation approach that combines partial evaluation with dedicated rewrite rules. The rewrite rules are driven by a lightweight effect inference that analyses what effect operations may be called by a goal. We illustrate the effectiveness of this approach on a range of benchmarks. This article is under consideration for acceptance in TPLP.
△ Less
Submitted 2 August, 2016;
originally announced August 2016.