subscribe to arXiv mailings

Pervasive Technology-Enabled Care and Support for People with Dementia: The State of Art and Research Issues

Authors: Sayan Kumar Ray, Geri Harris, Akbar Hossain, NZ Jhanjhi

Abstract: Dementia is a mental illness that people live with all across the world. No one is immune. Nothing can predict its onset. The true story of dementia remains unknown globally, partly due to the denial of dementia symptoms and partly due to the social stigma attached to the disease. In recent years, dementia as a mental illness has received a lot of attention from the scientific community and health… ▽ More Dementia is a mental illness that people live with all across the world. No one is immune. Nothing can predict its onset. The true story of dementia remains unknown globally, partly due to the denial of dementia symptoms and partly due to the social stigma attached to the disease. In recent years, dementia as a mental illness has received a lot of attention from the scientific community and healthcare providers. This paper presents a state of art survey of pervasive technology enabled care and support for people suffering from Alzheimers dementia. We identify three areas of pervasive technology support for dementia patients, focusing on care, wellness and active living. A critical analysis of existing research is presented here, exploring how pervasive computing, artificial intelligence (AI) and the Internet of Things (IoT) are already supporting and providing comfort to dementia patients, particularly those living alone in the community. The work discusses key challenges and limitations of technology-enabled support owing to reasons like lack of accessibility, availability, usability and affordability of technology, limited holistic care approach, and lack of education and information. Future research directions focusing on how pervasive and connected healthcare can better support the well being and mental health impacts of Alzheimers dementia are also highlighted. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.10785 [pdf, other]

ShareLoRA: Parameter Efficient and Robust Large Language Model Fine-tuning via Shared Low-Rank Adaptation

Authors: Yurun Song, Junchen Zhao, Ian G. Harris, Sangeetha Abdu Jyothi

Abstract: This study introduces an approach to optimize Parameter Efficient Fine Tuning (PEFT) for Pretrained Language Models (PLMs) by implementing a Shared Low Rank Adaptation (ShareLoRA). By strategically deploying ShareLoRA across different layers and adapting it for the Query, Key, and Value components of self-attention layers, we achieve a substantial reduction in the number of training parameters and… ▽ More This study introduces an approach to optimize Parameter Efficient Fine Tuning (PEFT) for Pretrained Language Models (PLMs) by implementing a Shared Low Rank Adaptation (ShareLoRA). By strategically deploying ShareLoRA across different layers and adapting it for the Query, Key, and Value components of self-attention layers, we achieve a substantial reduction in the number of training parameters and memory usage. Importantly, ShareLoRA not only maintains model performance but also exhibits robustness in both classification and generation tasks across a variety of models, including RoBERTa, GPT-2, LLaMA and LLaMA2. It demonstrates superior transfer learning capabilities compared to standard LoRA applications and mitigates overfitting by sharing weights across layers. Our findings affirm that ShareLoRA effectively boosts parameter efficiency while ensuring scalable and high-quality performance across different language model architectures. △ Less

Submitted 15 June, 2024; originally announced June 2024.

Comments: 16 pages, 5 figures

arXiv:2401.15886 [pdf, other]

doi 10.1007/978-981-97-1335-6_7

Grey Level Texture Features for Segmentation of Chromogenic Dye RNAscope From Breast Cancer Tissue

Authors: Andrew Davidson, Arthur Morley-Bunker, George Wiggins, Logan Walker, Gavin Harris, Ramakrishnan Mukundan, kConFab Investigators

Abstract: Chromogenic RNAscope dye and haematoxylin staining of cancer tissue facilitates diagnosis of the cancer type and subsequent treatment, and fits well into existing pathology workflows. However, manual quantification of the RNAscope transcripts (dots), which signify gene expression, is prohibitively time consuming. In addition, there is a lack of verified supporting methods for quantification and an… ▽ More Chromogenic RNAscope dye and haematoxylin staining of cancer tissue facilitates diagnosis of the cancer type and subsequent treatment, and fits well into existing pathology workflows. However, manual quantification of the RNAscope transcripts (dots), which signify gene expression, is prohibitively time consuming. In addition, there is a lack of verified supporting methods for quantification and analysis. This paper investigates the usefulness of grey level texture features for automatically segmenting and classifying the positions of RNAscope transcripts from breast cancer tissue. Feature analysis showed that a small set of grey level features, including Grey Level Dependence Matrix and Neighbouring Grey Tone Difference Matrix features, were well suited for the task. The automated method performed similarly to expert annotators at identifying the positions of RNAscope transcripts, with an F1-score of 0.571 compared to the expert inter-rater F1-score of 0.596. These results demonstrate the potential of grey level texture features for automated quantification of RNAscope in the pathology workflow. △ Less

Submitted 12 March, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

Comments: This preprint has not undergone peer review (when applicable) or any post-submission improvements or corrections. The Version of Record of this contribution is published in Proceedings of 2023 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD 2023), and is available online at https://doi.org/10.1007/978-981-97-1335-6_7

arXiv:2312.00388 [pdf, other]

LinguaLinked: A Distributed Large Language Model Inference System for Mobile Devices

Authors: Junchen Zhao, Yurun Song, Simeng Liu, Ian G. Harris, Sangeetha Abdu Jyothi

Abstract: Deploying Large Language Models (LLMs) locally on mobile devices presents a significant challenge due to their extensive memory requirements. In this paper, we introduce LinguaLinked, a system for decentralized, distributed LLM inference on mobile devices. LinguaLinked enables collaborative execution of the inference task across multiple trusted devices. LinguaLinked ensures data privacy by proces… ▽ More Deploying Large Language Models (LLMs) locally on mobile devices presents a significant challenge due to their extensive memory requirements. In this paper, we introduce LinguaLinked, a system for decentralized, distributed LLM inference on mobile devices. LinguaLinked enables collaborative execution of the inference task across multiple trusted devices. LinguaLinked ensures data privacy by processing information locally. LinguaLinked uses three key strategies. First, an optimized model assignment technique segments LLMs and uses linear optimization to align segments with each device's capabilities. Second, an optimized data transmission mechanism ensures efficient and structured data flow between model segments while also maintaining the integrity of the original model structure. Finally, LinguaLinked incorporates a runtime load balancer that actively monitors and redistributes tasks among mobile devices to prevent bottlenecks, enhancing the system's overall efficiency and responsiveness. We demonstrate that LinguaLinked facilitates efficient LLM inference while maintaining consistent throughput and minimal latency through extensive testing across various mobile devices, from high-end to low-end Android devices. In our evaluations, compared to the baseline, LinguaLinked achieves an inference performance acceleration of $1.11\times$ to $1.61\times$ in single-threaded settings, $1.73\times$ to $2.65\times$ with multi-threading. Additionally, runtime load balancing yields an overall inference acceleration of $1.29\times$ to $1.32\times$. △ Less

Submitted 1 December, 2023; originally announced December 2023.

Comments: 16 pages, 8 figures

arXiv:2311.00172 [pdf, other]

Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield

Authors: Jinhwa Kim, Ali Derakhshan, Ian G. Harris

Abstract: Large Language Models' safety remains a critical concern due to their vulnerability to adversarial attacks, which can prompt these systems to produce harmful responses. In the heart of these systems lies a safety classifier, a computational model trained to discern and mitigate potentially harmful, offensive, or unethical outputs. However, contemporary safety classifiers, despite their potential,… ▽ More Large Language Models' safety remains a critical concern due to their vulnerability to adversarial attacks, which can prompt these systems to produce harmful responses. In the heart of these systems lies a safety classifier, a computational model trained to discern and mitigate potentially harmful, offensive, or unethical outputs. However, contemporary safety classifiers, despite their potential, often fail when exposed to inputs infused with adversarial noise. In response, our study introduces the Adversarial Prompt Shield (APS), a lightweight model that excels in detection accuracy and demonstrates resilience against adversarial prompts. Additionally, we propose novel strategies for autonomously generating adversarial training datasets, named Bot Adversarial Noisy Dialogue (BAND) datasets. These datasets are designed to fortify the safety classifier's robustness, and we investigate the consequences of incorporating adversarial examples into the training process. Through evaluations involving Large Language Models, we demonstrate that our classifier has the potential to decrease the attack success rate resulting from adversarial attacks by up to 60%. This advancement paves the way for the next generation of more reliable and resilient conversational agents. △ Less

Submitted 31 October, 2023; originally announced November 2023.

Comments: 11 pages, 2 figures

arXiv:2309.05274 [pdf, other]

doi 10.1109/ICASSP48485.2024.10448041

FuzzLLM: A Novel and Universal Fuzzing Framework for Proactively Discovering Jailbreak Vulnerabilities in Large Language Models

Authors: Dongyu Yao, Jianshu Zhang, Ian G. Harris, Marcel Carlsson

Abstract: Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit meticulously crafted prompts to elicit content that violates service guidelines, have captured the attention of research communities. While model owners can defend against individual jailbreak prompts through safety training strategies, this relatively passive approach struggles to handle the broader category of similar jailb… ▽ More Jailbreak vulnerabilities in Large Language Models (LLMs), which exploit meticulously crafted prompts to elicit content that violates service guidelines, have captured the attention of research communities. While model owners can defend against individual jailbreak prompts through safety training strategies, this relatively passive approach struggles to handle the broader category of similar jailbreaks. To tackle this issue, we introduce FuzzLLM, an automated fuzzing framework designed to proactively test and discover jailbreak vulnerabilities in LLMs. We utilize templates to capture the structural integrity of a prompt and isolate key features of a jailbreak class as constraints. By integrating different base classes into powerful combo attacks and varying the elements of constraints and prohibited questions, FuzzLLM enables efficient testing with reduced manual effort. Extensive experiments demonstrate FuzzLLM's effectiveness and comprehensiveness in vulnerability discovery across various LLMs. △ Less

Submitted 14 April, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: Publish by ICASSP 2024 on 3/18/2024; Extended Arxiv version

arXiv:2308.07476 [pdf, ps, other]

Dependent rounding with strong negative-correlation, and scheduling on unrelated machines to minimize completion time

Authors: David G. Harris

Abstract: We describe a new dependent-rounding algorithmic framework for bipartite graphs. Given a fractional assignment $\vec x$ of values to edges of graph $G = (U \cup V, E)$, the algorithms return an integral solution $\vec X$ such that each right-node $v \in V$ has at most one neighboring edge $f$ with $X_f = 1$, and where the variables $X_e$ also satisfy broad nonpositive-correlation properties. In pa… ▽ More We describe a new dependent-rounding algorithmic framework for bipartite graphs. Given a fractional assignment $\vec x$ of values to edges of graph $G = (U \cup V, E)$, the algorithms return an integral solution $\vec X$ such that each right-node $v \in V$ has at most one neighboring edge $f$ with $X_f = 1$, and where the variables $X_e$ also satisfy broad nonpositive-correlation properties. In particular, for any edges $e_1, e_2$ sharing a left-node $u \in U$, the variables $X_{e_1}, X_{e_2}$ have strong negative-correlation properties, i.e. the expectation of $X_{e_1} X_{e_2}$ is significantly below $x_{e_1} x_{e_2}$. This algorithm is based on generating negatively-correlated Exponential random variables and using them in a contention-resolution scheme inspired by an algorithm Im & Shadloo (2020). Our algorithm gives stronger and much more flexible negative correlation properties. Dependent rounding schemes with negative correlation properties have been used for approximation algorithms for job-scheduling on unrelated machines to minimize weighted completion times (Bansal, Srinivasan, & Svensson (2021), Im & Shadloo (2020), Im & Li (2023)). Using our new dependent-rounding algorithm, among other improvements, we obtain a $1.398$-approximation for this problem. This significantly improves over the prior $1.45$-approximation ratio of Im & Li (2023). △ Less

Submitted 16 May, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Journal ref: SODA 2024

arXiv:2303.06090 [pdf, ps, other]

Simple and efficient four-cycle counting on sparse graphs

Authors: Paul Burkhardt, David G. Harris

Abstract: We consider the problem of counting 4-cycles ($C_4$) in an undirected graph $G$ of $n$ vertices and $m$ edges (in bipartite graphs, 4-cycles are also often referred to as $\textit{butterflies}$). There have been a number of previous algorithms for this problem based on sorting the graph by degree and using randomized hash tables. These are appealing in theory due to compact storage and fast access… ▽ More We consider the problem of counting 4-cycles ($C_4$) in an undirected graph $G$ of $n$ vertices and $m$ edges (in bipartite graphs, 4-cycles are also often referred to as $\textit{butterflies}$). There have been a number of previous algorithms for this problem based on sorting the graph by degree and using randomized hash tables. These are appealing in theory due to compact storage and fast access on average. But, the performance of hash tables can degrade unpredictably and are also vulnerable to adversarial input. We develop a new simpler algorithm for counting $C_4$ requiring $O(m\barδ(G))$ time and $O(n)$ space, where $\bar δ(G) \leq O(\sqrt{m})$ is the $\textit{average degeneracy}$ parameter introduced by Burkhardt, Faber & Harris (2020). It has several practical improvements over previous algorithms; for example, it is fully deterministic, does not require any sorting of the input graph, and uses only addition and array access in its inner loops. To the best of our knowledge, all previous efficient algorithms for $C_4$ counting have required $Ω(m)$ space in addition to storing the input graph. Our algorithm is very simple and easily adapted to count 4-cycles incident to each vertex and edge. Empirical tests demonstrate that our array-based approach is $4\times$ -- $7\times$ faster on average compared to popular hash table implementations. △ Less

Submitted 17 April, 2024; v1 submitted 10 March, 2023; originally announced March 2023.

arXiv:2205.08022 [pdf, other]

A faster algorithm for Vertex Cover parameterized by solution size

Authors: David G. Harris, N. S. Narayanaswamy

Abstract: We describe a new algorithm for vertex cover with runtime $O^*(1.25284^k)$, where $k$ is the size of the desired solution and $O^*$ hides polynomial factors in the input size. This improves over previous runtime of $O^*(1.2738^k)$ due to Chen, Kanj, & Xia (2010) standing for more than a decade. The key to our algorithm is to use a potential function which simultaneously tracks $k$ as well as the o… ▽ More We describe a new algorithm for vertex cover with runtime $O^*(1.25284^k)$, where $k$ is the size of the desired solution and $O^*$ hides polynomial factors in the input size. This improves over previous runtime of $O^*(1.2738^k)$ due to Chen, Kanj, & Xia (2010) standing for more than a decade. The key to our algorithm is to use a potential function which simultaneously tracks $k$ as well as the optimal value $λ$ of the vertex cover LP relaxation. This approach also allows us to make use of prior algorithms for Maximum Independent Set in bounded-degree graphs and Above-Guarantee Vertex Cover. The main step in the algorithm is to branch on high-degree vertices, while ensuring that both $k$ and $μ= k - λ$ are decreased at each step. There can be local obstructions in the graph that prevent $μ$ from decreasing in this process; we develop a number of novel branching steps to handle these situations. △ Less

Submitted 5 January, 2024; v1 submitted 16 May, 2022; originally announced May 2022.

Journal ref: STACS 2024

arXiv:2201.08810 [pdf, other]

GAP-Gen: Guided Automatic Python Code Generation

Authors: Junchen Zhao, Yurun Song, Junlin Wang, Ian G. Harris

Abstract: Automatic code generation from natural language descriptions can be highly beneficial during the process of software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, which is a simplified version of Abstract Synta… ▽ More Automatic code generation from natural language descriptions can be highly beneficial during the process of software development. In this work, we propose GAP-Gen, a Guided Automatic Python Code Generation method based on Python syntactic constraints and semantic constraints. We first introduce Python syntactic constraints in the form of Syntax-Flow, which is a simplified version of Abstract Syntax Tree (AST) reducing the size and high complexity of Abstract Syntax Tree but maintaining crucial syntactic information of Python code. In addition to Syntax-Flow, we introduce Variable-Flow which abstracts variable and function names consistently through out the code. In our work, rather than pretraining, we focus on modifying the finetuning process which reduces computational requirements but retains high generation performance on automatic Python code generation task. GAP-Gen fine-tunes the transformer based language models T5 and CodeT5 using the Code-to-Docstring datasets CodeSearchNet, CodeSearchNet AdvTest and Code-Docstring Corpus from EdinburghNLP. Our experiments show that GAP-Gen achieves better results on automatic Python code generation task than previous works. △ Less

Submitted 9 May, 2023; v1 submitted 19 January, 2022; originally announced January 2022.

Comments: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop

arXiv:2109.13335 [pdf, ps, other]

Algorithms for matrix multiplication via sampling and opportunistic matrix multiplication

Authors: David G. Harris

Abstract: Karppa & Kaski (2019) proposed a novel ``broken" or ``opportunistic" matrix multiplication algorithm, based on a variant of Strassen's algorithm, and used this to develop new algorithms for Boolean matrix multiplication, among other tasks. Their algorithm can compute Boolean matrix multiplication in $O(n^{2.778})$ time. While asymptotically faster matrix multiplication algorithms exist, most such… ▽ More Karppa & Kaski (2019) proposed a novel ``broken" or ``opportunistic" matrix multiplication algorithm, based on a variant of Strassen's algorithm, and used this to develop new algorithms for Boolean matrix multiplication, among other tasks. Their algorithm can compute Boolean matrix multiplication in $O(n^{2.778})$ time. While asymptotically faster matrix multiplication algorithms exist, most such algorithms are infeasible for practical problems. We describe an alternative way to use the broken multiplication algorithm to approximately compute matrix multiplication, either for real-valued or Boolean matrices. In brief, instead of running multiple iterations of the broken algorithm on the original input matrix, we form a new larger matrix by sampling and run a single iteration of the broken algorithm on it. Asymptotically, our algorithm has runtime $O(n^{2.763})$, a slight improvement over the Karppa-Kaski algorithm. Since the goal is to obtain new practical matrix-multiplication algorithms, we also estimate the concrete runtime for our algorithm for some large-scale sample problems. It appears that for these parameters, further optimizations are still needed to make our algorithm competitive. △ Less

Submitted 13 February, 2024; v1 submitted 27 September, 2021; originally announced September 2021.

Journal ref: ESA 2023

arXiv:2106.03388 [pdf, other]

doi 10.1109/JBHI.2021.3087735

DINs: Deep Interactive Networks for Neurofibroma Segmentation in Neurofibromatosis Type 1 on Whole-Body MRI

Authors: Jian-Wei Zhang, Wei Chen, K. Ina Ly, Xubin Zhang, Fan Yan, Justin Jordan, Gordon Harris, Scott Plotkin, Pengyi Hao, Wenli Cai

Abstract: Neurofibromatosis type 1 (NF1) is an autosomal dominant tumor predisposition syndrome that involves the central and peripheral nervous systems. Accurate detection and segmentation of neurofibromas are essential for assessing tumor burden and longitudinal tumor size changes. Automatic convolutional neural networks (CNNs) are sensitive and vulnerable as tumors' variable anatomical location and heter… ▽ More Neurofibromatosis type 1 (NF1) is an autosomal dominant tumor predisposition syndrome that involves the central and peripheral nervous systems. Accurate detection and segmentation of neurofibromas are essential for assessing tumor burden and longitudinal tumor size changes. Automatic convolutional neural networks (CNNs) are sensitive and vulnerable as tumors' variable anatomical location and heterogeneous appearance on MRI. In this study, we propose deep interactive networks (DINs) to address the above limitations. User interactions guide the model to recognize complicated tumors and quickly adapt to heterogeneous tumors. We introduce a simple but effective Exponential Distance Transform (ExpDT) that converts user interactions into guide maps regarded as the spatial and appearance prior. Comparing with popular Euclidean and geodesic distances, ExpDT is more robust to various image sizes, which reserves the distribution of interactive inputs. Furthermore, to enhance the tumor-related features, we design a deep interactive module to propagate the guides into deeper layers. We train and evaluate DINs on three MRI data sets from NF1 patients. The experiment results yield significant improvements of 44% and 14% in DSC comparing with automated and other interactive methods, respectively. We also experimentally demonstrate the efficiency of DINs in reducing user burden when comparing with conventional interactive methods. The source code of our method is available at \url{https://github.com/Jarvis73/DINs}. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: Accepted by IEEE Journal of Biomedical and Health Informatics (JBHI)

Journal ref: IEEE Journal of Biomedical and Health Informatics, 2021

arXiv:2012.03624

Improving Constraint Satisfaction Algorithm Efficiency for the AllDifferent Constraint

Authors: Geoff Harris

Abstract: Combinatorial problems stated as Constraint Satisfaction Problems (CSP) are examined. It is shown by example that any algorithm designed for the original CSP, and involving the AllDifferent constraint, has at least the same level of efficacy when simultaneously applied to both the original and its complementary problem. The 1-to-1 mapping employed to transform a CSP to its complementary problem, w… ▽ More Combinatorial problems stated as Constraint Satisfaction Problems (CSP) are examined. It is shown by example that any algorithm designed for the original CSP, and involving the AllDifferent constraint, has at least the same level of efficacy when simultaneously applied to both the original and its complementary problem. The 1-to-1 mapping employed to transform a CSP to its complementary problem, which is also a CSP, is introduced. This "Dual CSP" method and its application are outlined. The analysis of several random problem instances demonstrate the benefits of this method for variable domain reduction compared to the standard approach to CSP. Extensions to additional constraints other than AllDifferent, as well as the use of hybrid algorithms, are proposed as candidates for this Dual CSP method. △ Less

Submitted 13 December, 2020; v1 submitted 7 December, 2020; originally announced December 2020.

Comments: *sigh* - it has been gently and kindly pointed out to me that I have simply re-discovered the channelling of constraints across alternate problem specifications. Gosh this is oddly amusing albeit embarrassing!

arXiv:2011.07097 [pdf, ps, other]

Some remarks on hypergraph matching and the Füredi-Kahn-Seymour conjecture

Authors: Nikhil Bansal, David G. Harris

Abstract: A classic conjecture of Füredi, Kahn and Seymour (1993) states that given any hypergraph with non-negative edge weights $w(e)$, there exists a matching $M$ such that $\sum_{e \in M} (|e|-1+1/|e|)\, w(e) \geq w^*$, where $w^*$ is the value of an optimum fractional matching. We show the conjecture is true for rank-3 hypergraphs, and is achieved by a natural iterated rounding algorithm. While the gen… ▽ More A classic conjecture of Füredi, Kahn and Seymour (1993) states that given any hypergraph with non-negative edge weights $w(e)$, there exists a matching $M$ such that $\sum_{e \in M} (|e|-1+1/|e|)\, w(e) \geq w^*$, where $w^*$ is the value of an optimum fractional matching. We show the conjecture is true for rank-3 hypergraphs, and is achieved by a natural iterated rounding algorithm. While the general conjecture remains open, we give several new improved bounds. In particular, we show that the iterated rounding algorithm gives $\sum_{e \in M} (|e|-δ(e))\, w(e) \geq w^*$, where $δ(e) = |e|/(|e|^2+|e|-1)$, improving upon the baseline guarantee of $\sum_{e \in M} |e|\,w(e) \geq w^*$. △ Less

Submitted 8 March, 2022; v1 submitted 13 November, 2020; originally announced November 2020.

Journal ref: Random Structures & Algorithms 62(1), pp. 52-67 (2023)

arXiv:2009.10761 [pdf, other]

On the Locality of Nash-Williams Forest Decomposition and Star-Forest Decomposition

Authors: David G. Harris, Hsin-Hao Su, Hoa T. Vu

Abstract: Given a graph $G=(V,E)$ with arboricity $α$, we study the problem of decomposing the edges of $G$ into $(1+ε)α$ disjoint forests in the distributed LOCAL model. Barenboim and Elkin [PODC `08] gave a LOCAL algorithm that computes a $(2+ε)α$-forest decomposition using $O(\frac{\log n}ε)$ rounds. Ghaffari and Su [SODA `17] made further progress by computing a $(1+ε) α$-forest decomposition in… ▽ More Given a graph $G=(V,E)$ with arboricity $α$, we study the problem of decomposing the edges of $G$ into $(1+ε)α$ disjoint forests in the distributed LOCAL model. Barenboim and Elkin [PODC `08] gave a LOCAL algorithm that computes a $(2+ε)α$-forest decomposition using $O(\frac{\log n}ε)$ rounds. Ghaffari and Su [SODA `17] made further progress by computing a $(1+ε) α$-forest decomposition in $O(\frac{\log^3 n}{ε^4})$ rounds when $εα= Ω(\sqrt{α\log n})$, i.e. the limit of their algorithm is an $(α+ Ω(\sqrt{α\log n}))$-forest decomposition. This algorithm, based on a combinatorial construction of Alon, McDiarmid \& Reed [Combinatorica `92], in fact provides a decomposition of the graph into \emph{star-forests}, i.e. each forest is a collection of stars. Our main result in this paper is to reduce the threshold of $εα$ in $(1+ε)α$-forest decomposition and star-forest decomposition. This further answers the $10^{\text{th}}$ open question from Barenboim and Elkin's "Distributed Graph Algorithms" book. Moreover, it gives the first $(1+ε)α$-orientation algorithms with {\it linear dependencies} on $ε^{-1}$. At a high level, our results for forest-decomposition are based on a combination of network decomposition, load balancing, and a new structural result on local augmenting sequences. Our result for star-forest decomposition uses a more careful probabilistic analysis for the construction of Alon, McDiarmid, \& Reed; the bounds on star-arboricity here were not previously known, even non-constructively. △ Less

Submitted 7 December, 2022; v1 submitted 22 September, 2020; originally announced September 2020.

Journal ref: SIAM Journal on Discrete Mathematics 37(2), pp. 800-830 (2023)

arXiv:2008.05569 [pdf, ps, other]

A new notion of commutativity for the algorithmic Lovász Local Lemma

Authors: David G. Harris, Fotis Iliopoulos, Vladimir Kolmogorov

Abstract: The Lovász Local Lemma (LLL) is a powerful tool in probabilistic combinatorics which can be used to establish the existence of objects that satisfy certain properties. The breakthrough paper of Moser and Tardos and follow-up works revealed that the LLL has intimate connections with a class of stochastic local search algorithms for finding such desirable objects. In particular, it can be seen as a… ▽ More The Lovász Local Lemma (LLL) is a powerful tool in probabilistic combinatorics which can be used to establish the existence of objects that satisfy certain properties. The breakthrough paper of Moser and Tardos and follow-up works revealed that the LLL has intimate connections with a class of stochastic local search algorithms for finding such desirable objects. In particular, it can be seen as a sufficient condition for this type of algorithms to converge fast. Besides conditions for existence of and fast convergence to desirable objects, one may naturally ask further questions regarding properties of these algorithms. For instance, "are they parallelizable?", "how many solutions can they output?", "what is the expected "weight" of a solution?", etc. These questions and more have been answered for a class of LLL-inspired algorithms called commutative. In this paper we introduce a new, very natural and more general notion of commutativity (essentially matrix commutativity) which allows us to show a number of new refined properties of LLL-inspired local search algorithms with significantly simpler proofs. △ Less

Submitted 30 March, 2022; v1 submitted 12 August, 2020; originally announced August 2020.

Journal ref: RANDOM 2021

arXiv:2008.03325 [pdf, ps, other]

Stochastic Optimization and Learning for Two-Stage Supplier Problems

Authors: Brian Brubach, Nathaniel Grammel, David G. Harris, Aravind Srinivasan, Leonidas Tsepenekas, Anil Vullikanti

Abstract: The main focus of this paper is radius-based (supplier) clustering in the two-stage stochastic setting with recourse, where the inherent stochasticity of the model comes in the form of a budget constraint. In addition to the standard (homogeneous) setting where all clients must be within a distance $R$ of the nearest facility, we provide results for the more general problem where the radius demand… ▽ More The main focus of this paper is radius-based (supplier) clustering in the two-stage stochastic setting with recourse, where the inherent stochasticity of the model comes in the form of a budget constraint. In addition to the standard (homogeneous) setting where all clients must be within a distance $R$ of the nearest facility, we provide results for the more general problem where the radius demands may be inhomogeneous (i.e., different for each client). We also explore a number of variants where additional constraints are imposed on the first-stage decisions, specifically matroid and multi-knapsack constraints, and provide results for these settings. We derive results for the most general distributional setting, where there is only black-box access to the underlying distribution. To accomplish this, we first develop algorithms for the polynomial scenarios setting; we then employ a novel scenario-discarding variant of the standard Sample Average Approximation (SAA) method, which crucially exploits properties of the restricted-case algorithms. We note that the scenario-discarding modification to the SAA method is necessary in order to optimize over the radius. △ Less

Submitted 7 April, 2024; v1 submitted 7 August, 2020; originally announced August 2020.

arXiv:2007.10824 [pdf, ps, other]

Parameter estimation for Gibbs distributions

Authors: David G. Harris, Vladimir Kolmogorov

Abstract: We consider Gibbs distributions, which are families of probability distributions over a discrete space $Ω$ with probability mass function of the form $μ^Ω_β(ω) \propto e^{βH(ω)}$ for $β$ in an interval $[β_{\min}, β_{\max}]$ and $H( ω) \in \{0 \} \cup [1, n]$. The partition function is the normalization factor $Z(β)=\sum_{ω\inΩ}e^{βH(ω)}$. Two important parameters of these distributions are the… ▽ More We consider Gibbs distributions, which are families of probability distributions over a discrete space $Ω$ with probability mass function of the form $μ^Ω_β(ω) \propto e^{βH(ω)}$ for $β$ in an interval $[β_{\min}, β_{\max}]$ and $H( ω) \in \{0 \} \cup [1, n]$. The partition function is the normalization factor $Z(β)=\sum_{ω\inΩ}e^{βH(ω)}$. Two important parameters of these distributions are the log partition ratio $q = \log \tfrac{Z(β_{\max})}{Z(β_{\min})}$ and the counts $c_x = |H^{-1}(x)|$. These are correlated with system parameters in a number of physical applications and sampling algorithms. Our first main result is to estimate the counts $c_x$ using roughly $\tilde O( \frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and $\tilde O( \frac{n^2}{\varepsilon^2} )$ samples for integer-valued distributions (ignoring some second-order terms and parameters), and we show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs, independent sets, and perfect matchings. As a key subroutine, we also develop algorithms to compute the partition function $Z$ using $\tilde O(\frac{q}{\varepsilon^2})$ samples for general Gibbs distributions and using $\tilde O(\frac{n^2}{\varepsilon^2})$ samples for integer-valued distributions. △ Less

Submitted 9 March, 2024; v1 submitted 17 July, 2020; originally announced July 2020.

Comments: This is a longer version which extends two previous papers "A Faster Approximation Algorithm for the Gibbs Partition Function" (arXiv:1608.04223), published in COLT 2018, and "Parameter estimation for Gibbs distributions" published in ICALP 2023

arXiv:2005.08301 [pdf, other]

Optimal Bounds for the $k$-cut Problem

Authors: Anupam Gupta, David G. Harris, Euiwoong Lee, Jason Li

Abstract: In the $k$-cut problem, we want to find the lowest-weight set of edges whose deletion breaks a given (multi)graph into $k$ connected components. Algorithms of Karger \& Stein can solve this in roughly $O(n^{2k})$ time. On the other hand, lower bounds from conjectures about the $k$-clique problem imply that $Ω(n^{(1-o(1))k})$ time is likely needed. Recent results of Gupta, Lee \& Li have given new… ▽ More In the $k$-cut problem, we want to find the lowest-weight set of edges whose deletion breaks a given (multi)graph into $k$ connected components. Algorithms of Karger \& Stein can solve this in roughly $O(n^{2k})$ time. On the other hand, lower bounds from conjectures about the $k$-clique problem imply that $Ω(n^{(1-o(1))k})$ time is likely needed. Recent results of Gupta, Lee \& Li have given new algorithms for general $k$-cut in $n^{1.98k + O(1)}$ time, as well as specialized algorithms with better performance for certain classes of graphs (e.g., for small integer edge weights). In this work, we resolve the problem for general graphs. We show that the Contraction Algorithm of Karger outputs any fixed $k$-cut of weight $αλ_k$ with probability $Ω_k(n^{-αk})$, where $λ_k$ denotes the minimum $k$-cut weight. This also gives an extremal bound of $O_k(n^k)$ on the number of minimum $k$-cuts and an algorithm to compute $λ_k$ with roughly $n^k \mathrm{polylog}(n)$ runtime. Both are tight up to lower-order factors, with the algorithmic lower bound assuming hardness of max-weight $k$-clique. The first main ingredient in our result is an extremal bound on the number of cuts of weight less than $2 λ_k/k$, using the Sunflower lemma. The second ingredient is a fine-grained analysis of how the graph shrinks -- and how the average degree evolves -- in the Karger process. △ Less

Submitted 11 September, 2021; v1 submitted 17 May, 2020; originally announced May 2020.

Comments: Final version of arXiv:1911.09165 with new and more general proofs

Journal ref: Journal of the ACM 69(1), Article #2 (2022)

arXiv:1909.08065 [pdf, ps, other]

Deterministic algorithms for the Lovasz Local Lemma: simpler, more general, and more parallel

Authors: David G. Harris

Abstract: The Lovász Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations which avoid a collection $\mathcal B$ of "bad" events which are mostly independent and have low probability. In its simplest "symmetric" form, it asserts that whenever a bad-event has probability $p$ and affects at most $d$ bad-events, and $e p d < 1$, then a configuration avoid… ▽ More The Lovász Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations which avoid a collection $\mathcal B$ of "bad" events which are mostly independent and have low probability. In its simplest "symmetric" form, it asserts that whenever a bad-event has probability $p$ and affects at most $d$ bad-events, and $e p d < 1$, then a configuration avoiding all $\mathcal B$ exists. A seminal algorithm of Moser & Tardos (2010) gives nearly-automatic randomized algorithms for most constructions based on the LLL. However, deterministic algorithms have lagged behind. We address three specific shortcomings of the prior deterministic algorithms. First, our algorithm applies to the LLL criterion of Shearer (1985); this is more powerful than alternate LLL criteria and also removes a number of nuisance parameters and leads to cleaner and more legible bounds. Second, we provide parallel algorithms with much greater flexibility in the functional form of of the bad-events. Third, we provide a derandomized version of the MT-distribution, that is, the distribution of the variables at the termination of the MT algorithm. We show applications to non-repetitive vertex coloring, independent transversals, strong coloring, and other problems. These give deterministic algorithms which essentially match the best previous randomized sequential and parallel algorithms. △ Less

Submitted 1 February, 2023; v1 submitted 17 September, 2019; originally announced September 2019.

Comments: This superseded arxiv:1807.06672

Journal ref: Random Structures & Algorithms 63(3), pp. 716-752 (2023)

arXiv:1907.00033 [pdf, ps, other]

Algorithms for weighted independent transversals and strong colouring

Authors: Alessandra Graf, David G. Harris, Penny Haxell

Abstract: An independent transversal (IT) in a graph with a given vertex partition is an independent set consisting of one vertex in each partition class. Several sufficient conditions are known for the existence of an IT in a given graph with a given vertex partition, which have been used over the years to solve many combinatorial problems. Some of these IT existence theorems have algorithmic proofs, but t… ▽ More An independent transversal (IT) in a graph with a given vertex partition is an independent set consisting of one vertex in each partition class. Several sufficient conditions are known for the existence of an IT in a given graph with a given vertex partition, which have been used over the years to solve many combinatorial problems. Some of these IT existence theorems have algorithmic proofs, but there remains a gap between the best bounds given by nonconstructive results, and those obtainable by efficient algorithms. Recently, Graf and Haxell (2018) described a new (deterministic) algorithm that asymptotically closes this gap, but there are limitations on its applicability. In this paper we develop a randomized version of this algorithm that is much more widely applicable, and demonstrate its use by giving efficient algorithms for two problems concerning the strong chromatic number of graphs. △ Less

Submitted 21 August, 2021; v1 submitted 28 June, 2019; originally announced July 2019.

Journal ref: ACM Transactions on Algorithms 18(1), Article #1 (2021)

arXiv:1904.03139

Parameter estimation for integer-valued Gibbs distributions

Authors: David G. Harris, Vladimir Kolmogorov

Abstract: A central problem in computational statistics is to convert a procedure for sampling combinatorial from an objects into a procedure for counting those objects, and vice versa. Weconsider sampling problems coming from *Gibbs distributions*, which are probability distributions of the form $μ^Ω_β(ω) \propto e^{βH(ω)}$ for $β$ in an interval $[β_\min, β_\max]$ and $H( ω) \in \{0 \} \cup [1, n]$. The *… ▽ More A central problem in computational statistics is to convert a procedure for sampling combinatorial from an objects into a procedure for counting those objects, and vice versa. Weconsider sampling problems coming from *Gibbs distributions*, which are probability distributions of the form $μ^Ω_β(ω) \propto e^{βH(ω)}$ for $β$ in an interval $[β_\min, β_\max]$ and $H( ω) \in \{0 \} \cup [1, n]$. The *partition function* is the normalization factor $Z(β)=\sum_{ω\inΩ}e^{βH(ω)}$. Two important parameters are the log partition ratio $q = \log \tfrac{Z(β_\max)}{Z(β_\min)}$ and the vector of counts $c_x = |H^{-1}(x)|$. Our first result is an algorithm to estimate the counts $c_x$ using roughly $\tilde O( \frac{q}{ε^2})$ samples for general Gibbs distributions and $\tilde O( \frac{n^2}{ε^2} )$ samples for integer-valued distributions (ignoring some second-order terms and parameters). We show this is optimal up to logarithmic factors. We illustrate with improved algorithms for counting connected subgraphs and perfect matchings in a graph. We develop a key subroutine for global estimation of the partition function. Specifically, we produce a data structure to estimate $Z(β)$ for \emph{all} values $β$, without further samples. Constructing the data structure requires $O(\frac{q \log n}{ε^2})$ samples for general Gibbs distributions and $O(\frac{n^2 \log n}{ε^2} + n \log q)$ samples for integer-valued distributions. This improves over a prior algorithm of Kolmogorov (2018) which computes the single point estimate $Z(β_\max)$ using $\tilde O(\frac{q}{ε^2})$ samples. We also show that this complexity is optimal as a function of $n$ and $q$ up to logarithmic terms. △ Less

Submitted 17 August, 2023; v1 submitted 5 April, 2019; originally announced April 2019.

Comments: Superseded by arXiv:2007.10824 This version is obsolete

arXiv:1901.03744 [pdf, other]

Exponentially Faster Massively Parallel Maximal Matching

Authors: Soheil Behnezhad, MohammadTaghi Hajiaghayi, David G. Harris

Abstract: The study of approximate matching in the Massively Parallel Computations (MPC) model has recently seen a burst of breakthroughs. Despite this progress, however, we still have a far more limited understanding of maximal matching which is one of the central problems of parallel and distributed computing. All known MPC algorithms for maximal matching either take polylogarithmic time which is consider… ▽ More The study of approximate matching in the Massively Parallel Computations (MPC) model has recently seen a burst of breakthroughs. Despite this progress, however, we still have a far more limited understanding of maximal matching which is one of the central problems of parallel and distributed computing. All known MPC algorithms for maximal matching either take polylogarithmic time which is considered inefficient, or require a strictly super-linear space of $n^{1+Ω(1)}$ per machine. In this work, we close this gap by providing a novel analysis of an extremely simple algorithm a variant of which was conjectured to work by Czumaj et al. [STOC'18]. The algorithm edge-samples the graph, randomly partitions the vertices, and finds a random greedy maximal matching within each partition. We show that this algorithm drastically reduces the vertex degrees. This, among some other results, leads to an $O(\log \log Δ)$ round algorithm for maximal matching with $O(n)$ space (or even mildly sublinear in $n$ using standard techniques). As an immediate corollary, we get a $2$ approximate minimum vertex cover in essentially the same rounds and space. This is the best possible approximation factor under standard assumptions, culminating a long line of research. It also leads to an improved $O(\log\log Δ)$ round algorithm for $1 + \varepsilon$ approximate matching. All these results can also be implemented in the congested clique model within the same number of rounds. △ Less

Submitted 17 August, 2023; v1 submitted 11 January, 2019; originally announced January 2019.

Comments: A preliminary version of this paper appeared in the proceedings of The 60th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2019)

Journal ref: Journal of the ACM 70(5), Article #34 (2023)

arXiv:1809.02271 [pdf, ps, other]

Approximation algorithms for stochastic clustering

Authors: David G. Harris, Shi Li, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

Abstract: We consider stochastic settings for clustering, and develop provably-good approximation algorithms for a number of these notions. These algorithms yield better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages including clustering which is fairer and has better long-term behavior for each user. In particular, they ensure th… ▽ More We consider stochastic settings for clustering, and develop provably-good approximation algorithms for a number of these notions. These algorithms yield better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages including clustering which is fairer and has better long-term behavior for each user. In particular, they ensure that *every user* is guaranteed to get good service (on average). We also complement some of these with impossibility results. △ Less

Submitted 16 December, 2020; v1 submitted 6 September, 2018; originally announced September 2018.

Comments: The version of this paper published in JMLR is incorrect; specifically, Theorem 14 of that paper appears to be fatally flawed. The version posted here on arxiv removes the claimed incorrect results

Journal ref: Journal of Machine Learning Research 20(153), pp. 1-33 (2019)

arXiv:1807.07645 [pdf, ps, other]

Distributed local approximation algorithms for maximum matching in graphs and hypergraphs

Authors: David G. Harris

Abstract: We describe approximation algorithms in Linial's classic LOCAL model of distributed computing to find maximum-weight matchings in a hypergraph of rank $r$. Our main result is a deterministic algorithm to generate a matching which is an $O(r)$-approximation to the maximum weight matching, running in $\tilde O(r \log Δ+ \log^2 Δ+ \log^* n)$ rounds. (Here, the $\tilde O()$ notations hides… ▽ More We describe approximation algorithms in Linial's classic LOCAL model of distributed computing to find maximum-weight matchings in a hypergraph of rank $r$. Our main result is a deterministic algorithm to generate a matching which is an $O(r)$-approximation to the maximum weight matching, running in $\tilde O(r \log Δ+ \log^2 Δ+ \log^* n)$ rounds. (Here, the $\tilde O()$ notations hides $\text{polyloglog } Δ$ and $\text{polylog } r$ factors). This is based on a number of new derandomization techniques extending methods of Ghaffari, Harris & Kuhn (2017). As a main application, we obtain nearly-optimal algorithms for the long-studied problem of maximum-weight graph matching. Specifically, we get a $(1+ε)$ approximation algorithm using $\tilde O(\log Δ/ ε^3 + \text{polylog}(1/ε, \log \log n))$ randomized time and $\tilde O(\log^2 Δ/ ε^4 + \log^*n / ε)$ deterministic time. The second application is a faster algorithm for hypergraph maximal matching, a versatile subroutine introduced in Ghaffari et al. (2017) for a variety of local graph algorithms. This gives an algorithm for $(2 Δ- 1)$-edge-list coloring in $\tilde O(\log^2 Δ\log n)$ rounds deterministically or $\tilde O( (\log \log n)^3 )$ rounds randomly. Another consequence (with additional optimizations) is an algorithm which generates an edge-orientation with out-degree at most $\lceil (1+ε) λ\rceil$ for a graph of arboricity $λ$; for fixed $ε$ this runs in $\tilde O(\log^6 n)$ rounds deterministically or $\tilde O(\log^3 n )$ rounds randomly. △ Less

Submitted 23 March, 2020; v1 submitted 19 July, 2018; originally announced July 2018.

Journal ref: SIAM Journal on Computing 49(4), pp. 711-746 (2020)

arXiv:1807.06672

Derandomizing the Lovasz Local Lemma via log-space statistical tests

Authors: David G. Harris

Abstract: The Lovász Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations which avoid a collection $\mathcal B$ of "bad" events which are mostly independent and have low probability. In its simplest form, it asserts that whenever a bad-event has probability $p$ and affects at most $d$ other bad-events, and $e p (d+1) < 1$, then a configuration avoidin… ▽ More The Lovász Local Lemma (LLL) is a keystone principle in probability theory, guaranteeing the existence of configurations which avoid a collection $\mathcal B$ of "bad" events which are mostly independent and have low probability. In its simplest form, it asserts that whenever a bad-event has probability $p$ and affects at most $d$ other bad-events, and $e p (d+1) < 1$, then a configuration avoiding all $\mathcal B$ exists. A seminal algorithm of Moser & Tardos (2010) gives randomized algorithms for most constructions based on the LLL. However, deterministic algorithms have lagged behind. Notably, prior deterministic LLL algorithms have required stringent conditions on $\mathcal B$; for example, they have required that events in $\mathcal B$ have low decision-tree complexity or depend on a small number of variables. For this reason, they can only be applied to small fraction of the numerous LLL applications in practice. We describe an alternate deterministic parallel (NC) algorithm for the LLL, based on a general derandomization method of Sivakumar (2002) using log-space statistical tests. The only requirement here is that bad-events should be computable via a finite automaton with $\text{poly}(d)$ states. This covers most LLL applications to graph theory and combinatorics. No auxiliary information about the bad-events, including any conditional probability calculations, are required. Additionally, the proof is a straightforward combination of general derandomization results and high-level analysis of the Moser-Tardos algorithm. We illustrate with applications to defective vertex coloring, domatic partition, and independent transversals. △ Less

Submitted 19 September, 2019; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: This is superseded by arXiv:1909.08065

arXiv:1806.05523 [pdf, other]

doi 10.7155/jgaa.00527

Bounds and algorithms for graph trusses

Authors: Paul Burkhardt, Vance Faber, David G. Harris

Abstract: The $k$-truss, introduced by Cohen (2005), is a graph where every edge is incident to at least $k$ triangles. This is a relaxation of the clique. It has proved to be a useful tool in identifying cohesive subnetworks in a variety of real-world graphs. Despite its simplicity and its utility, the combinatorial and algorithmic aspects of trusses have not been thoroughly explored. We provide nearly-t… ▽ More The $k$-truss, introduced by Cohen (2005), is a graph where every edge is incident to at least $k$ triangles. This is a relaxation of the clique. It has proved to be a useful tool in identifying cohesive subnetworks in a variety of real-world graphs. Despite its simplicity and its utility, the combinatorial and algorithmic aspects of trusses have not been thoroughly explored. We provide nearly-tight bounds on the edge counts of $k$-trusses. We also give two improved algorithms for finding trusses in large-scale graphs. First, we present a simplified and faster algorithm, based on approach discussed in Wang & Cheng (2012). Second, we present a theoretical algorithm based on fast matrix multiplication; this converts a triangle-generation algorithm of Bjorklund et al. (2014) into a dynamic data structure. △ Less

Submitted 31 March, 2020; v1 submitted 14 June, 2018; originally announced June 2018.

Journal ref: Journal of Graph Algorithms and Applications, 24(3):191-214, 2020

arXiv:1805.00311

Head Mounted Pupil Tracking Using Convolutional Neural Network

Authors: Yinheng Zhu, Wanli Chen, Xun Zhan, Zonglin Guo, Hongjian Shi, Ian G. Harris

Abstract: Pupil tracking is an important branch of object tracking which require high precision. We investigate head mounted pupil tracking which is often more convenient and precise than remote pupil tracking, but also more challenging. When pupil tracking suffers from noise like bad illumination, detection precision dramatically decreases. Due to the appearance of head mounted recording device and public… ▽ More Pupil tracking is an important branch of object tracking which require high precision. We investigate head mounted pupil tracking which is often more convenient and precise than remote pupil tracking, but also more challenging. When pupil tracking suffers from noise like bad illumination, detection precision dramatically decreases. Due to the appearance of head mounted recording device and public benchmark image datasets, head mounted tracking algorithms have become easier to design and evaluate. In this paper, we propose a robust head mounted pupil detection algorithm which uses a Convolutional Neural Network (CNN) to combine different features of pupil. Here we consider three features of pupil. Firstly, we use three pupil feature-based algorithms to find pupil center independently. Secondly, we use a CNN to evaluate the quality of each result. Finally, we select the best result as output. The experimental results show that our proposed algorithm performs better than the present state-of-art. △ Less

Submitted 16 July, 2018; v1 submitted 15 April, 2018; originally announced May 2018.

Comments: It's out of date and not STOA any more

arXiv:1711.08494 [pdf, ps, other]

Deterministic parallel algorithms for bilinear objective functions

Authors: David G. Harris

Abstract: Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low independence. A series of papers, beginning with work by Luby (1988), showed that in many cases these techniques can be combined to give deterministic parallel (NC) algorithms for a variety of combinatorial optimization problems, with low time- and processor… ▽ More Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low independence. A series of papers, beginning with work by Luby (1988), showed that in many cases these techniques can be combined to give deterministic parallel (NC) algorithms for a variety of combinatorial optimization problems, with low time- and processor-complexity. We extend and generalize a technique of Luby for efficiently handling bilinear objective functions. One noteworthy application is an NC algorithm for maximal independent set. On a graph $G$ with $m$ edges and $n$ vertices, this takes $\tilde O(\log^2 n)$ time and $(m + n) n^{o(1)}$ processors, nearly matching the best randomized parallel algorithms. Other applications include reduced processor counts for algorithms of Berger (1997) for maximum acyclic subgraph and Gale-Berlekamp switching games. This bilinear factorization also gives better algorithms for problems involving discrepancy. An important application of this is to automata-fooling probability spaces, which are the basis of a notable derandomization technique of Sivakumar (2002). Our method leads to large reduction in processor complexity for a number of derandomization algorithms based on automata-fooling, including set discrepancy and the Johnson-Lindenstrauss Lemma. △ Less

Submitted 21 June, 2018; v1 submitted 22 November, 2017; originally announced November 2017.

Journal ref: Algorithmica 81(3), pp. 1288-1318 (2019)

arXiv:1711.02194 [pdf, ps, other]

On Derandomizing Local Distributed Algorithms

Authors: Mohsen Ghaffari, David G. Harris, Fabian Kuhn

Abstract: The gap between the known randomized and deterministic local distributed algorithms underlies arguably the most fundamental and central open question in distributed graph algorithms. In this paper, we develop a generic and clean recipe for derandomizing LOCAL algorithms. We also exhibit how this simple recipe leads to significant improvements on a number of problem. Two main results are: - An im… ▽ More The gap between the known randomized and deterministic local distributed algorithms underlies arguably the most fundamental and central open question in distributed graph algorithms. In this paper, we develop a generic and clean recipe for derandomizing LOCAL algorithms. We also exhibit how this simple recipe leads to significant improvements on a number of problem. Two main results are: - An improved distributed hypergraph maximal matching algorithm, improving on Fischer, Ghaffari, and Kuhn [FOCS'17], and giving improved algorithms for edge-coloring, maximum matching approximation, and low out-degree edge orientation. The first gives an improved algorithm for Open Problem 11.4 of the book of Barenboim and Elkin, and the last gives the first positive resolution of their Open Problem 11.10. - An improved distributed algorithm for the Lovász Local Lemma, which gets closer to a conjecture of Chang and Pettie [FOCS'17], and moreover leads to improved distributed algorithms for problems such as defective coloring and $k$-SAT. △ Less

Submitted 17 September, 2019; v1 submitted 6 November, 2017; originally announced November 2017.

Comments: 37 pages

arXiv:1710.00287 [pdf, other]

A Lottery Model for Center-type Problems With Outliers

Authors: David G. Harris, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

Abstract: In this paper, we give tight approximation algorithms for the $k$-center and matroid center problems with outliers. Unfairness arises naturally in this setting: certain clients could always be considered as outliers. To address this issue, we introduce a lottery model in which each client $j$ is allowed to submit a parameter $p_j \in [0,1]$ and we look for a random solution that covers every clien… ▽ More In this paper, we give tight approximation algorithms for the $k$-center and matroid center problems with outliers. Unfairness arises naturally in this setting: certain clients could always be considered as outliers. To address this issue, we introduce a lottery model in which each client $j$ is allowed to submit a parameter $p_j \in [0,1]$ and we look for a random solution that covers every client $j$ with probability at least $p_j$. Our techniques include a randomized rounding procedure to round a point inside a matroid intersection polytope to a basis plus at most one extra item such that all marginal probabilities are preserved and such that a certain linear function of the variables does not decrease in the process with probability one. △ Less

Submitted 30 September, 2017; originally announced October 2017.

arXiv:1709.06995 [pdf, other]

Dependent randomized rounding for clustering and partition systems with knapsack constraints

Authors: David G. Harris, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

Abstract: Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on fairness in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster-centers. This, and much more general clustering problems, can be formulated with "knapsack" and "partition" constraints. We develop new randomized a… ▽ More Clustering problems are fundamental to unsupervised learning. There is an increased emphasis on fairness in machine learning and AI; one representative notion of fairness is that no single demographic group should be over-represented among the cluster-centers. This, and much more general clustering problems, can be formulated with "knapsack" and "partition" constraints. We develop new randomized algorithms targeting such problems, and study two in particular: multi-knapsack median and multi-knapsack center. Our rounding algorithms give new approximation and pseudo-approximation algorithms for these problems. One key technical tool, which may be of independent interest, is a new tail bound analogous to Feige (2006) for sums of random variables with unbounded variances. Such bounds can be useful in inferring properties of large networks using few samples. △ Less

Submitted 11 May, 2024; v1 submitted 20 September, 2017; originally announced September 2017.

Comments: In the Journal version of this paper, there is a small error in Proposition 25. This version of the paper fixes the error (see Proposition 5.4 in the arxiv version)

Journal ref: Journal of Machine Learning Research 23(81), pp. 1-41 (2022)

arXiv:1704.06528

Fairness in Resource Allocation and Slowed-down Dependent Rounding

Authors: David G. Harris, Thomas Pensyl, Aravind Srinivasan, Khoa Trinh

Abstract: We consider an issue of much current concern: could fairness, an issue that is already difficult to guarantee, worsen when algorithms run much of our lives? We consider this in the context of resource-allocation problems, we show that algorithms can guarantee certain types of fairness in a verifiable way. Our conceptual contribution is a simple approach to fairness in this context, which only requ… ▽ More We consider an issue of much current concern: could fairness, an issue that is already difficult to guarantee, worsen when algorithms run much of our lives? We consider this in the context of resource-allocation problems, we show that algorithms can guarantee certain types of fairness in a verifiable way. Our conceptual contribution is a simple approach to fairness in this context, which only requires that all users trust some public lottery. Our technical contributions are in ways to address the $k$-center and knapsack-center problems that arise in this context: we develop a novel dependent-rounding technique that, via the new ingredients of "slowing down" and additional randomization, guarantees stronger correlation properties than known before. △ Less

Submitted 22 September, 2017; v1 submitted 21 April, 2017; originally announced April 2017.

Comments: We decided to split these results to two separate papers. Please see arXiv:1709.06995

arXiv:1702.02547 [pdf, ps, other]

Oblivious resampling oracles and parallel algorithms for the Lopsided Lovasz Local Lemma

Authors: David G. Harris

Abstract: The Lovász Local Lemma (LLL) is a probabilistic tool which shows that, if a collection of "bad" events $\mathcal B$ in a probability space are not too likely and not too interdependent, then there is a positive probability that no bad-events in $\mathcal B$ occur. Moser & Tardos (2010) gave sequential and parallel algorithms which transformed most applications of the variable-assignment LLL into e… ▽ More The Lovász Local Lemma (LLL) is a probabilistic tool which shows that, if a collection of "bad" events $\mathcal B$ in a probability space are not too likely and not too interdependent, then there is a positive probability that no bad-events in $\mathcal B$ occur. Moser & Tardos (2010) gave sequential and parallel algorithms which transformed most applications of the variable-assignment LLL into efficient algorithms. A framework of Harvey & Vondrák (2015) based on "resampling oracles" extended this to general sequential algorithms for other probability spaces satisfying the Lopsided Lovász Local Lemma (LLLL). We describe a new structural property which holds for all known resampling oracles, which we call "obliviousness." Essentially, it means that the interaction between two bad-events $B, B'$ depends only on the randomness used to resample $B$, and not the precise state within $B$ itself. This property has two major consequences. First, combined with a framework of Kolmogorov (2016), it is the key to achieving a unified parallel LLLL algorithm, which is faster than previous, problem-specific algorithms of Harris (2016) for the variable-assignment LLLL algorithm and of Harris \& Srinivasan (2014) for permutations. This gives the first RNC algorithms for rainbow perfect matchings and rainbow hamiltonian cycles of $K_n$. Second, this property allows us to build LLLL probability spaces out of relatively simple "atomic" events. This provides the first sequential resampling oracle for rainbow perfect matchings on the complete $s$-uniform hypergraph $K_n^{(s)}$, and the first commutative resampling oracle for hamiltonian cycles of $K_n$. △ Less

Submitted 22 July, 2020; v1 submitted 8 February, 2017; originally announced February 2017.

Journal ref: ACM Transactions on Algorithms 17(1), Article #1 (2021)

arXiv:1612.02663 [pdf, other]

A constructive algorithm for the LLL on permutations

Authors: David G. Harris, Aravind Srinivasan

Abstract: While there has been significant progress on algorithmic aspects of the Lovász Local Lemma (LLL) in recent years, a noteworthy exception is when the LLL is used in the context of random permutations. The breakthrough algorithm of Moser & Tardos only works in the setting of independent variables, and does not apply in this context. We resolve this by developing a randomized polynomial-time algorith… ▽ More While there has been significant progress on algorithmic aspects of the Lovász Local Lemma (LLL) in recent years, a noteworthy exception is when the LLL is used in the context of random permutations. The breakthrough algorithm of Moser & Tardos only works in the setting of independent variables, and does not apply in this context. We resolve this by developing a randomized polynomial-time algorithm for such applications. A noteworthy application is for Latin transversals: the best-known general result here (Bissacot et al., improving on Erdős and Spencer), states that any $n \times n$ matrix in which each entry appears at most $(27/256)n$ times, has a Latin transversal. We present the first polynomial-time algorithm to construct such a transversal. We also develop RNC algorithms for Latin transversals, rainbow Hamiltonian cycles, strong chromatic number, and hypergraph packing. In addition to efficiently finding a configuration which avoids bad-events, the algorithm of Moser & Tardos has many powerful extensions and properties. These include a well-characterized distribution on the output distribution, parallel algorithms, and a partial resampling variant. We show that our algorithm has nearly all of the same useful properties as the Moser-Tardos algorithm, and present a comparison of this aspect with recent works on the LLL in general probability spaces. △ Less

Submitted 8 December, 2016; originally announced December 2016.

Journal ref: Theory of Computing 13(17), pp. 1-41 (2017)

arXiv:1612.01610

Tighter inapproximability for set cover

Authors: David G. Harris

Abstract: Set Cover is a classic NP-hard problem; as shown by Slavík (1997) the greedy algorithm gives an approximation ratio of $\ln n - \ln \ln n + Θ(1)$. A series of works by Lund \& Yannakakis (1994), Feige (1998), Moshkovitz (2015) have shown that, under the assumption $P \neq NP$, it is impossible to obtain a polynomial-time approximation ratio with approximation ratio $(1 - α) \ln n$, for any constan… ▽ More Set Cover is a classic NP-hard problem; as shown by Slavík (1997) the greedy algorithm gives an approximation ratio of $\ln n - \ln \ln n + Θ(1)$. A series of works by Lund \& Yannakakis (1994), Feige (1998), Moshkovitz (2015) have shown that, under the assumption $P \neq NP$, it is impossible to obtain a polynomial-time approximation ratio with approximation ratio $(1 - α) \ln n$, for any constant $α> 0$. In this note, we show that under the Exponential Time Hypothesis (a stronger complexity-theoretic assumptions than $P \neq NP$), there are no polynomial-time algorithms achieving approximation ratio $\ln n - C \ln \ln n$, where $C$ is some universal constant. Thus, the greedy algorithm achieves an essentially optimal approximation ratio (up to the coefficient of $\ln \ln n$). △ Less

Submitted 15 February, 2017; v1 submitted 5 December, 2016; originally announced December 2016.

Comments: We discovered that these results have already appeared in Dinur & Steurer, "Analytical approach to parallel repetition."

arXiv:1610.09653 [pdf, other]

New bounds for the Moser-Tardos distribution

Authors: David G. Harris

Abstract: The Lovasz Local Lemma (LLL) is a probabilistic tool which has been used to show the existence of a variety of combinatorial structures with good "local" properties. The "LLL-distribution" can be used to show that the resulting structures have good global properties in expectation. The simplest, variable-based setting of the LLL was covered by the seminal algorithm of Moser & Tardos (2010). This… ▽ More The Lovasz Local Lemma (LLL) is a probabilistic tool which has been used to show the existence of a variety of combinatorial structures with good "local" properties. The "LLL-distribution" can be used to show that the resulting structures have good global properties in expectation. The simplest, variable-based setting of the LLL was covered by the seminal algorithm of Moser & Tardos (2010). This has since been extended to other probability spaces including random permutations. One can similarly define an "MT-distribution" for these algorithms, that is, the distribution of the configuration they produce. Haeupler et al. (2011) showed bounds on the MT-distribution which essentially match the LLL-distribution for the variable-assignment setting; Harris & Srinivasan showed similar results for the permutation setting. In this work, we show new bounds on the MT-distribution which are significantly stronger than those known to hold for the LLL-distribution. In the variable-assignment setting, we show a tighter bound on the probability of a disjunctive event or singleton event. As a consequence, in $k$-SAT instances with bounded variable occurrence, the MT-distribution satisfies an $ε$-approximate independence condition asymptotically stronger than the LLL-distribution. We use this to show a nearly tight bound on the minimum implicate size of a CNF boolean formula. Another noteworthy application is constructing independent transversals which avoid a given subset of vertices; this provides a constructive analogue to a result of Rabern (2014). In the permutation LLL setting, we show a new type of bound which is similar to the cluster-expansion LLL criterion of Bissacot et al. (2011), but is stronger and takes advantage of the extra structure in permutations. We illustrate with improved bounds on weighted Latin transversals and partial Latin transversals. △ Less

Submitted 10 October, 2019; v1 submitted 30 October, 2016; originally announced October 2016.

Journal ref: Random Structures & Algorithms 57(1), pp. 97-131 (2020)

arXiv:1610.03383 [pdf, ps, other]

Deterministic parallel algorithms for fooling polylogarithmic juntas and the Lovasz Local Lemma

Authors: David G. Harris

Abstract: Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low (almost-) independence. A series of papers, beginning with Luby (1993) and continuing with Berger & Rompel (1991) and Chari et al. (2000), showed that these techniques can be combined to give deterministic parallel algorithms for combinatorial optimization p… ▽ More Many randomized algorithms can be derandomized efficiently using either the method of conditional expectations or probability spaces with low (almost-) independence. A series of papers, beginning with Luby (1993) and continuing with Berger & Rompel (1991) and Chari et al. (2000), showed that these techniques can be combined to give deterministic parallel algorithms for combinatorial optimization problems involving sums of $w$-juntas. We improve these algorithms through derandomized variable partitioning and a new code construction for fooling Fourier characters over $GF(2)$. This reduces the processor complexity to essentially independent of $w$ while the running time is reduced from exponential in $w$ to linear in $w$. As a key subroutine, we give a new algorithm to generate a probability space which can fool a given set of neighborhoods. Schulman (1992) gave an NC algorithm to do so for neighborhoods of size $w \leq O(\log n)$. Our new algorithm is NC1, with essentially optimal time and processor complexity, when $w = O(\log n)$; it remains NC up to $w = \text{polylog}(n)$. This answers an open problem of Schulman. One major application of these algorithms is an NC algorithm for the Lovász Local Lemma. Previous NC algorithms, including the seminal algorithm of Moser & Tardos (2010) and the work of Chandrasekaran et. al (2013), required that (essentially) the bad-events could span only $O(\log n)$ variables; we relax this to $\text{polylog}(n)$ variables. We use this for an $\text{NC}^2$ algorithm for defective vertex coloring, which works for arbitrary degree graphs. △ Less

Submitted 20 August, 2018; v1 submitted 11 October, 2016; originally announced October 2016.

Journal ref: ACM Transactions on Algorithms 14(4), Article #47 (2017)

arXiv:1610.02420 [pdf, ps, other]

Lopsidependency in the Moser-Tardos framework: Beyond the Lopsided Lovasz Local Lemma

Authors: David G. Harris

Abstract: The Lopsided Lovász Local Lemma (LLLL) is a powerful probabilistic principle which has been used in a variety of combinatorial constructions. While originally a general statement about probability spaces, it has recently been transformed into a variety of polynomial-time algorithms. The resampling algorithm of Moser & Tardos (2010) is the most well-known example of this. A variety of criteria have… ▽ More The Lopsided Lovász Local Lemma (LLLL) is a powerful probabilistic principle which has been used in a variety of combinatorial constructions. While originally a general statement about probability spaces, it has recently been transformed into a variety of polynomial-time algorithms. The resampling algorithm of Moser & Tardos (2010) is the most well-known example of this. A variety of criteria have been shown for the LLLL; the strongest possible criterion was shown by Shearer, and other criteria which are easier to use computationally have been shown by Bissacot et al (2011), Pegden (2014), Kolipaka & Szegedy (2011), and Kolipaka, Szegedy, Xu (2012). We show a new criterion for the Moser-Tardos algorithm to converge. This criterion is stronger than the LLLL criterion; this is possible because it does not apply in the same generality as the original LLLL; yet, it is strong enough to cover many applications of the LLLL in combinatorics. We show a variety of new bounds and algorithms. A noteworthy application is for $k$-SAT, with bounded occurrences of variables. As shown in Gebauer, Szábo, and Tardos (2011), a $k$-SAT instance in which every variable appears $L \leq \frac{2^{k+1}}{e (k+1)}$ times, is satisfiable. Although this bound is asymptotically tight (in $k$), we improve it to $L \leq \frac{2^{k+1} (1 - 1/k)^k}{k-1} - \frac{2}{k}$ which can be significantly stronger when $k$ is small. We introduce a new parallel algorithm for the LLLL. While Moser & Tardos described a simple parallel algorithm for the Lovász Local Lemma, and described a simple sequential algorithm for a form of the Lopsided Lemma, they were not able to combine the two. Our new algorithm applies in nearly all settings in which the sequential algorithm works --- this includes settings covered by our new stronger LLLL criterion. △ Less

Submitted 1 December, 2016; v1 submitted 7 October, 2016; originally announced October 2016.

Journal ref: ACM Transactions on Algorithms 13(1), Article #17 (2016)

arXiv:1610.01926 [pdf, other]

Comparison of two convergence criteria for the variable-assignment Lopsided Lovasz Local Lemma

Authors: David G. Harris

Abstract: The Lopsided Lovasz Local Lemma (LLLL) is a cornerstone probabilistic tool for showing that it is possible to avoid a collection of "bad" events as long as their probabilities and interdependencies are sufficiently small. The strongest possible criterion in these terms is due to Shearer (1985), although it is technically difficult to apply to constructions in combinatorics. The original formulat… ▽ More The Lopsided Lovasz Local Lemma (LLLL) is a cornerstone probabilistic tool for showing that it is possible to avoid a collection of "bad" events as long as their probabilities and interdependencies are sufficiently small. The strongest possible criterion in these terms is due to Shearer (1985), although it is technically difficult to apply to constructions in combinatorics. The original formulation of the LLLL was non-constructive; a seminal algorithm of Moser & Tardos (2010) gave an efficient algorithm for nearly all its applications, including to $k$-SAT instances where each variable appears in a bounded number of clauses. Harris (2015) later gave an alternate criterion for this algorithm to converge; unlike the LLL criterion or its variants, this criterion depends in a fundamental way on the decomposition of bad-events into variables. In this note, we show that the criterion given by Harris can be stronger in some cases even than Shearer's criterion. We construct $k$-SAT formulas with bounded variable occurrence, and show that the criterion of Harris is satisfied while the criterion of Shearer is violated. In fact, there is an exponentially growing gap between the bounds provable from any form of the LLLL and from the bound shown by Harris. △ Less

Submitted 26 August, 2022; v1 submitted 6 October, 2016; originally announced October 2016.

Journal ref: Electronic Journal of Combinatorics 29(4), #P4.10 (2022)

arXiv:1609.06156 [pdf, ps, other]

Derandomized concentration bounds for polynomials, and hypergraph maximal independent set

Authors: David G. Harris

Abstract: A parallel algorithm for maximal independent set (MIS) in hypergraphs has been a long-standing algorithmic challenge, dating back nearly 30 years to a survey of Karp & Ramachandran (1990). The best randomized parallel algorithm for hypergraphs of fixed rank $r$ was developed by Beame & Luby (1990) and Kelsen (1992), running in time roughly $(\log n)^{r!}$. We improve the randomized algorithm of… ▽ More A parallel algorithm for maximal independent set (MIS) in hypergraphs has been a long-standing algorithmic challenge, dating back nearly 30 years to a survey of Karp & Ramachandran (1990). The best randomized parallel algorithm for hypergraphs of fixed rank $r$ was developed by Beame & Luby (1990) and Kelsen (1992), running in time roughly $(\log n)^{r!}$. We improve the randomized algorithm of Kelsen, reducing the runtime to roughly $(\log n)^{2^r}$ and simplifying the analysis through the use of more-modern concentration inequalities. We also give a method for derandomizing concentration bounds for low-degree polynomials, which are the key technical tool used to analyze that algorithm. This leads to a deterministic PRAM algorithm also running in $(\log n)^{2^{r+3}}$ time and $\text{poly}(m,n)$ processors. This is the first deterministic algorithm with sub-polynomial runtime for hypergraphs of rank $r > 3$. Our analysis can also apply when $r$ is slowly growing; using this in conjunction with a strategy of Bercea et al. (2015) gives a deterministic MIS algorithm running in time $\exp(O( \frac{\log (mn)}{\log \log (mn)}))$. △ Less

Submitted 23 May, 2019; v1 submitted 20 September, 2016; originally announced September 2016.

Journal ref: ACM Transactions on Algorithms 15(3), Article #43 (2019)

arXiv:1605.04714 [pdf]

Participating or Not Participating? A Sociomaterial Perspective of the Embeddedness of Online Communities in Everyday Life

Authors: Geri Harris, Babak Abedin

Abstract: Online communities are increasingly becoming a venue for socializing, engaging in politics, and conducting business. Ironically, the same enabling social media technology is encroaching into everyday life and reconfiguring relations of participation. Yet, while participation in online communities has been widely studied empirically, theoretical aspects of this social phenomenon need further invest… ▽ More Online communities are increasingly becoming a venue for socializing, engaging in politics, and conducting business. Ironically, the same enabling social media technology is encroaching into everyday life and reconfiguring relations of participation. Yet, while participation in online communities has been widely studied empirically, theoretical aspects of this social phenomenon need further investigation. This paper uses a sociomaterial perspective to further develop theoretical explanation of participation in online communities and the impacts of not participating online. A sociomaterial view of online community participation decenters the human participant and recognises the agency of technology, thus creating a richer understanding than epistemological paradigms. Using converging hermeneutic circles, the paper first reviews literature for evidence of sociomaterial applications to online community research, and then proposes a framework for expressing participation in online communities from a sociomaterial perspective. Subsequently, implications of the findings and the potential for future studies are discussed. △ Less

Submitted 16 May, 2016; originally announced May 2016.

Comments: ISBN# 978-0-646-95337-3 Presented at the Australasian Conference on Information Systems 2015 (arXiv:1605.01032)

Report number: ACIS/2015/20

arXiv:1603.01486 [pdf, other]

Distributed $(Δ+1)$-Coloring in Sublogarithmic Rounds

Authors: David G. Harris, Johannes Schneider, Hsin-Hao Su

Abstract: We give a new randomized distributed algorithm for $(Δ+1)$-coloring in the LOCAL model, running in $O(\sqrt{\log Δ})+ 2^{O(\sqrt{\log \log n})}$ rounds in a graph of maximum degree~$Δ$. This implies that the $(Δ+1)$-coloring problem is easier than the maximal independent set problem and the maximal matching problem, due to their lower bounds of… ▽ More We give a new randomized distributed algorithm for $(Δ+1)$-coloring in the LOCAL model, running in $O(\sqrt{\log Δ})+ 2^{O(\sqrt{\log \log n})}$ rounds in a graph of maximum degree~$Δ$. This implies that the $(Δ+1)$-coloring problem is easier than the maximal independent set problem and the maximal matching problem, due to their lower bounds of $Ω\left( \min \left( \sqrt{\frac{\log n}{\log \log n}}, \frac{\log Δ}{\log \log Δ} \right) \right)$ by Kuhn, Moscibroda, and Wattenhofer [PODC'04]. Our algorithm also extends to list-coloring where the palette of each node contains $Δ+1$ colors. We extend the set of distributed symmetry-breaking techniques by performing a decomposition of graphs into dense and sparse parts. △ Less

Submitted 17 January, 2018; v1 submitted 4 March, 2016; originally announced March 2016.

ACM Class: F.2.2

Journal ref: Journal of the ACM 65(4), Article #19 (2018)

arXiv:1602.08730 [pdf, ps, other]

Improved bounds and algorithms for graph cuts and network reliability

Authors: David G. Harris, Aravind Srinivasan

Abstract: Karger (SIAM Journal on Computing, 1999) developed the first fully-polynomial approximation scheme to estimate the probability that a graph $G$ becomes disconnected, given that its edges are removed independently with probability $p$. This algorithm runs in $n^{5+o(1)} ε^{-3}$ time to obtain an estimate within relative error $ε$. We improve this run-time through algorithmic and graph-theoretic a… ▽ More Karger (SIAM Journal on Computing, 1999) developed the first fully-polynomial approximation scheme to estimate the probability that a graph $G$ becomes disconnected, given that its edges are removed independently with probability $p$. This algorithm runs in $n^{5+o(1)} ε^{-3}$ time to obtain an estimate within relative error $ε$. We improve this run-time through algorithmic and graph-theoretic advances. First, there is a certain key sub-problem encountered by Karger, for which a generic estimation procedure is employed, we show that this has a special structure for which a much more efficient algorithm can be used. Second, we show better bounds on the number of edge cuts which are likely to fail. Here, Karger's analysis uses a variety of bounds for various graph parameters, we show that these bounds cannot be simultaneously tight. We describe a new graph parameter, which simultaneously influences all the bounds used by Karger, and obtain much tighter estimates of the cut structure of $G$. These techniques allow us to improve the runtime to $n^{3+o(1)} ε^{-2}$, our results also rigorously prove certain experimental observations of Karger & Tai (Proc. ACM-SIAM Symposium on Discrete Algorithms, 1997). Our rigorous proofs are motivated by certain non-rigorous differential-equation approximations which, however, provably track the worst-case trajectories of the relevant parameters. A key driver of Karger's approach (and other cut-related results) is a bound on the number of small cuts: we improve these estimates when the min-cut size is "small" and odd, augmenting, in part, a result of Bixby (Bulletin of the AMS, 1974). △ Less

Submitted 4 August, 2017; v1 submitted 28 February, 2016; originally announced February 2016.

Journal ref: Random Structures & Algorithms 52(1), pp. 74-135 (2018)

arXiv:1509.06430 [pdf, ps, other]

Parallel algorithms and concentration bounds for the Lovasz Local Lemma via witness DAGs

Authors: Bernhard Haeupler, David G. Harris

Abstract: The Lovász Local Lemma (LLL) is a cornerstone principle in the probabilistic method of combinatorics, and a seminal algorithm of Moser & Tardos (2010) provides an efficient randomized algorithm to implement it. This can be parallelized to give an algorithm that uses polynomially many processors and runs in $O(\log^3 n)$ time on an EREW PRAM, stemming from $O(\log n)$ adaptive computations of a max… ▽ More The Lovász Local Lemma (LLL) is a cornerstone principle in the probabilistic method of combinatorics, and a seminal algorithm of Moser & Tardos (2010) provides an efficient randomized algorithm to implement it. This can be parallelized to give an algorithm that uses polynomially many processors and runs in $O(\log^3 n)$ time on an EREW PRAM, stemming from $O(\log n)$ adaptive computations of a maximal independent set (MIS). Chung et al. (2014) developed faster local and parallel algorithms, potentially running in time $O(\log^2 n)$, but these algorithms require more stringent conditions than the LLL. We give a new parallel algorithm that works under essentially the same conditions as the original algorithm of Moser & Tardos but uses only a single MIS computation, thus running in $O(\log^2 n)$ time on an EREW PRAM. This can be derandomized to give an NC algorithm running in time $O(\log^2 n)$ as well, speeding up a previous NC LLL algorithm of Chandrasekaran et al. (2013). We also provide improved and tighter bounds on the run-times of the sequential and parallel resampling-based algorithms originally developed by Moser & Tardos. These apply to any problem instance in which the tighter Shearer LLL criterion is satisfied. △ Less

Submitted 28 September, 2017; v1 submitted 21 September, 2015; originally announced September 2015.

Journal ref: ACM Transactions on Algorithms 13(4), Article #53 (2017)

arXiv:1507.07402 [pdf, ps, other]

Partial resampling to approximate covering integer programs

Authors: Antares Chen, David G. Harris, Aravind Srinivasan

Abstract: We consider column-sparse covering integer programs, a generalization of set cover, which have a long line of research of (randomized) approximation algorithms. We develop a new rounding scheme based on the Partial Resampling variant of the Lovász Local Lemma developed by Harris & Srinivasan (2019). This achieves an approximation ratio of… ▽ More We consider column-sparse covering integer programs, a generalization of set cover, which have a long line of research of (randomized) approximation algorithms. We develop a new rounding scheme based on the Partial Resampling variant of the Lovász Local Lemma developed by Harris & Srinivasan (2019). This achieves an approximation ratio of $1 + \frac{\ln (Δ_1+1)}{a_{\min}} + O\Big( \log(1 + \sqrt{ \frac{\log (Δ_1+1)}{a_{\min}}} \Big)$, where $a_{\min}$ is the minimum covering constraint and $Δ_1$ is the maximum $\ell_1$-norm of any column of the covering matrix (whose entries are scaled to lie in $[0,1]$). When there are additional constraints on the variable sizes, we show an approximation ratio of $\ln Δ_0 + O(\log \log Δ_0)$ (where $Δ_0$ is the maximum number of non-zero entries in any column of the covering matrix). These results improve asymptotically, in several different ways, over results of Srinivasan (2006) and Kolliopoulos & Young (2005). We show nearly-matching inapproximability and integrality-gap lower bounds. We also show that the rounding process leads to negative correlation among the variables, which allows us to handle multi-criteria programs. △ Less

Submitted 2 July, 2020; v1 submitted 27 July, 2015; originally announced July 2015.

Journal ref: Random Structures & Algorithms 58(1), pp. 68-93 (2021)

arXiv:1507.02674 [pdf, ps, other]

Algorithmic and enumerative aspects of the Moser-Tardos distribution

Authors: David G. Harris, Aravind Srinivasan

Abstract: Moser & Tardos have developed a powerful algorithmic approach (henceforth "MT") to the Lovasz Local Lemma (LLL); the basic operation done in MT and its variants is a search for "bad" events in a current configuration. In the initial stage of MT, the variables are set independently. We examine the distributions on these variables which arise during intermediate stages of MT. We show that these conf… ▽ More Moser & Tardos have developed a powerful algorithmic approach (henceforth "MT") to the Lovasz Local Lemma (LLL); the basic operation done in MT and its variants is a search for "bad" events in a current configuration. In the initial stage of MT, the variables are set independently. We examine the distributions on these variables which arise during intermediate stages of MT. We show that these configurations have a more or less "random" form, building further on the "MT-distribution" concept of Haeupler et al. in understanding the (intermediate and) output distribution of MT. This has a variety of algorithmic applications; the most important is that bad events can be found relatively quickly, improving upon MT across the complexity spectrum: it makes some polynomial-time algorithms sub-linear (e.g., for Latin transversals, which are of basic combinatorial interest), gives lower-degree polynomial run-times in some settings, transforms certain super-polynomial-time algorithms into polynomial-time ones, and leads to Las Vegas algorithms for some coloring problems for which only Monte Carlo algorithms were known. We show that in certain conditions when the LLL condition is violated, a variant of the MT algorithm can still produce a distribution which avoids most of the bad events. We show in some cases this MT variant can run faster than the original MT algorithm itself, and develop the first-known criterion for the case of the asymmetric LLL. This can be used to find partial Latin transversals -- improving upon earlier bounds of Stein (1975) -- among other applications. We furthermore give applications in enumeration, showing that most applications (where we aim for all or most of the bad events to be avoided) have many more solutions than known before by proving that the MT-distribution has "large" min-entropy and hence that its support-size is large. △ Less

Submitted 16 February, 2017; v1 submitted 9 July, 2015; originally announced July 2015.

Journal ref: ACM Transactions on Algorithms 13(3), Article #33 (2017)

arXiv:1406.5943 [pdf, other]

The Moser-Tardos Framework with Partial Resampling

Authors: David G. Harris, Aravind Srinivasan

Abstract: The resampling algorithm of Moser \& Tardos is a powerful approach to develop constructive versions of the Lovász Local Lemma (LLL). We generalize this to partial resampling: when a bad event holds, we resample an appropriately-random subset of the variables that define this event, rather than the entire set as in Moser & Tardos. This is particularly useful when the bad events are determined by su… ▽ More The resampling algorithm of Moser \& Tardos is a powerful approach to develop constructive versions of the Lovász Local Lemma (LLL). We generalize this to partial resampling: when a bad event holds, we resample an appropriately-random subset of the variables that define this event, rather than the entire set as in Moser & Tardos. This is particularly useful when the bad events are determined by sums of random variables. This leads to several improved algorithmic applications in scheduling, graph transversals, packet routing etc. For instance, we settle a conjecture of Szabó & Tardos (2006) on graph transversals asymptotically, and obtain improved approximation ratios for a packet routing problem of Leighton, Maggs, & Rao (1994). △ Less

Submitted 27 June, 2019; v1 submitted 23 June, 2014; originally announced June 2014.

Journal ref: Journal of the ACM 66(5), Article #36 (2019)

arXiv:1405.1133 [pdf, ps, other]

On Computing Maximal Independent Sets of Hypergraphs in Parallel

Authors: Ioana O. Bercea, Navin Goyal, David G. Harris, Aravind Srinivasan

Abstract: Whether or not the problem of finding maximal independent sets (MIS) in hypergraphs is in (R)NC is one of the fundamental problems in the theory of parallel computing. Unlike the well-understood case of MIS in graphs, for the hypergraph problem, our knowledge is quite limited despite considerable work. It is known that the problem is in \emph{RNC} when the edges of the hypergraph have constant siz… ▽ More Whether or not the problem of finding maximal independent sets (MIS) in hypergraphs is in (R)NC is one of the fundamental problems in the theory of parallel computing. Unlike the well-understood case of MIS in graphs, for the hypergraph problem, our knowledge is quite limited despite considerable work. It is known that the problem is in \emph{RNC} when the edges of the hypergraph have constant size. For general hypergraphs with $n$ vertices and $m$ edges, the fastest previously known algorithm works in time $O(\sqrt{n})$ with $\text{poly}(m,n)$ processors. In this paper we give an EREW PRAM algorithm that works in time $n^{o(1)}$ with $\text{poly}(m,n)$ processors on general hypergraphs satisfying $m \leq n^{\frac{\log^{(2)}n}{8(\log^{(3)}n)^2}}$, where $\log^{(2)}n = \log\log n$ and $\log^{(3)}n = \log\log\log n$. Our algorithm is based on a sampling idea that reduces the dimension of the hypergraph and employs the algorithm for constant dimension hypergraphs as a subroutine. △ Less

Submitted 12 August, 2014; v1 submitted 5 May, 2014; originally announced May 2014.

ACM Class: G.2.2; F.1.2; F.2.2; G.1.0

arXiv:1401.6734 [pdf, ps, other]

Distinct volume subsets

Authors: David Conlon, Jacob Fox, William Gasarch, David G. Harris, Douglas Ulrich, Samuel Zbarsky

Abstract: Suppose that $a$ and $d$ are positive integers with $a \geq 2$. Let $h_{a,d}(n)$ be the largest integer $t$ such that any set of $n$ points in $\mathbb{R}^d$ contains a subset of $t$ points for which all the non-zero volumes of the ${t \choose a}$ subsets of order $a$ are distinct. Beginning with Erdős in 1957, the function $h_{2,d}(n)$ has been closely studied and is known to be at least a power… ▽ More Suppose that $a$ and $d$ are positive integers with $a \geq 2$. Let $h_{a,d}(n)$ be the largest integer $t$ such that any set of $n$ points in $\mathbb{R}^d$ contains a subset of $t$ points for which all the non-zero volumes of the ${t \choose a}$ subsets of order $a$ are distinct. Beginning with Erdős in 1957, the function $h_{2,d}(n)$ has been closely studied and is known to be at least a power of $n$. We improve the best known bound for $h_{2,d}(n)$ and show that $h_{a,d}(n)$ is at least a power of $n$ for all $a$ and $d$. △ Less

Submitted 10 May, 2015; v1 submitted 26 January, 2014; originally announced January 2014.

Comments: 10 pages

Journal ref: SIAM Journal on Discrete Math 29(1), pp. 472-480 (2014)

Showing 1–50 of 51 results for author: Harris, G