Sharif-MGTD at SemEval-2024 Task 8: A Transformer-Based Approach to Detect Machine Generated Text

Authors: Seyedeh Fatemeh Ebrahimi, Karim Akhavan Azari, Amirmasoud Iravani, Arian Qazvini, Pouya Sadeghi, Zeinab Sadat Taghavi, Hossein Sameti

Abstract: Detecting Machine-Generated Text (MGT) has emerged as a significant area of study within Natural Language Processing. While language models generate text, they often leave discernible traces, which can be scrutinized using either traditional feature-based methods or more advanced neural language models. In this research, we explore the effectiveness of fine-tuning a RoBERTa-base transformer, a powerful neural architecture, to address MGT detection as a binary classification task. Focusing specifically on Subtask A (Monolingual-English) within the SemEval-2024 competition framework, our proposed system achieves an accuracy of 78.9% on the test dataset, positioning us at 57th among participants. Our study addresses this challenge while considering the limited hardware resources, resulting in a system that excels at identifying human-written texts but encounters challenges in accurately discerning MGTs.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 8 pages, 3 figures, 2 tables. Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

arXiv:2407.01167 [pdf, other]

Information Density Bounds for Privacy

Authors: Sara Saeidian, Leonhard Grosse, Parastoo Sadeghi, Mikael Skoglund, Tobias J. Oechtering

Abstract: This paper explores the implications of guaranteeing privacy by imposing a lower bound on the information density between the private and the public data. We introduce an operationally meaningful privacy measure called pointwise maximal cost (PMC) and demonstrate that imposing an upper bound on PMC is equivalent to enforcing a lower bound on the information density. PMC quantifies the information leakage about a secret to adversaries who aim to minimize non-negative cost functions after observing the outcome of a privacy mechanism. When restricted to finite alphabets, PMC can equivalently be defined as the information leakage to adversaries aiming to minimize the probability of incorrectly guessing randomized functions of the secret. We study the properties of PMC and apply it to standard privacy mechanisms to demonstrate its practical relevance. Through a detailed examination, we connect PMC with other privacy measures that impose upper or lower bounds on the information density. Our results highlight that lower bounding the information density is a more stringent requirement than upper bounding it. Overall, our work significantly bridges the gaps in understanding the relationships between various privacy frameworks and provides insights for selecting a suitable framework for a given application.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.13569 [pdf, other]

Bayes' capacity as a measure for reconstruction attacks in federated learning

Authors: Sayan Biswas, Mark Dras, Pedro Faustini, Natasha Fernandes, Annabelle McIver, Catuscia Palamidessi, Parastoo Sadeghi

Abstract: Within the machine learning community, reconstruction attacks are a principal attack of concern and have been identified even in federated learning, which was designed with privacy preservation in mind. In federated learning, it has been shown that an adversary with knowledge of the machine learning architecture is able to infer the exact value of a training element given an observation of the weight updates performed during stochastic gradient descent. In response to these threats, the privacy community recommends the use of differential privacy in the stochastic gradient descent algorithm, termed DP-SGD. However, DP has not yet been formally established as an effective countermeasure against reconstruction attacks. In this paper, we formalise the reconstruction threat model using the information-theoretic framework of quantitative information flow. We show that the Bayes' capacity, related to the Sibson mutual information of order infinity, represents a tight upper bound on the leakage of the DP-SGD algorithm to an adversary interested in performing a reconstruction attack. We provide empirical results demonstrating the effectiveness of this measure for comparing mechanisms against reconstruction threats.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.06990 [pdf, ps, other]

Privacy-Utility Tradeoff Based on $α$-lift

Authors: Mohammad Amin Zarrabian, Parastoo Sadeghi

Abstract: Information density and its exponential form, known as lift, play a central role in information privacy leakage measures. $α$-lift is the power-mean of lift, which is tunable between the worst-case measure max-lift ($α=\infty$) and more relaxed versions ($α<\infty$). This paper investigates the optimization problem of the privacy-utility tradeoff (PUT) where $α$-lift and mutual information are privacy and utility measures, respectively. Due to the nonlinear nature of $α$-lift for $α<\infty$, finding the optimal solution is challenging. Therefore, we propose a heuristic algorithm to estimate the optimal utility for each value of $α$, inspired by the optimal solution for $α=\infty$ and the convexity of $α$-lift with respect to the lift, which we prove. The numerical results show the efficacy of the algorithm and indicate the effective range of $α$ and privacy budget $\varepsilon$ with good PUT performance.

Submitted 20 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: This version has developed algorithm representations and updated simulation results

arXiv:2405.07151 [pdf, other]

Group Complete-$\{s\}$ Pliable Index Coding

Authors: Sina Eghbal, Badri N. Vellambi, Lawrence Ong, Parastoo Sadeghi

Abstract: This paper introduces a novel class of PICOD($t$) problems referred to as $g$-group complete-$S$ PICOD($t$) problems. It constructs a multi-stage achievability scheme to generate pliable index codes for group complete PICOD problems when $S = \{s\}$ is a singleton set. Using the maximum acyclic induced subgraph bound, lower bounds on the broadcast rate are derived for singleton $S$, which establishes the optimality of the achievability scheme for a range of values for $t$ and for any $g$ and $s$. For all other values, it is shown that the achievability scheme is optimal among the restricted class of broadcast codes.

Submitted 11 May, 2024; originally announced May 2024.

Comments: Accepted for publication in 2024 IEEE International Symposium on Information Theory

arXiv:2405.00423 [pdf, ps, other]

$α$-leakage by Rényi Divergence and Sibson Mutual Information

Authors: Ni Ding, Mohammad Amin Zarrabian, Parastoo Sadeghi

Abstract: For $\tilde{f}(t) = \exp(\frac{α-1}{α}t)$, this paper proposes a $\tilde{f}$-mean information gain measure. Rényi divergence is shown to be the maximum $\tilde{f}$-mean information gain incurred at each elementary event $y$ of channel output $Y$ and Sibson mutual information is the $\tilde{f}$-mean of this $Y$-elementary information gain. Both are proposed as $α$-leakage measures, indicating the most information an adversary can obtain on sensitive data. It is shown that the existing $α$-leakage by Arimoto mutual information can be expressed as $\tilde{f}$-mean measures by a scaled probability. Further, Sibson mutual information is interpreted as the maximum $\tilde{f}$-mean information gain over all estimation decisions applied to channel output.

Submitted 2 July, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: authorship dispute

arXiv:2404.04845 [pdf, other]

SLPL SHROOM at SemEval2024 Task 06: A comprehensive study on models ability to detect hallucination

Authors: Pouya Fallah, Soroush Gooran, Mohammad Jafarinasab, Pouya Sadeghi, Reza Farnia, Amirreza Tarabkhah, Zainab Sadat Taghavi, Hossein Sameti

Abstract: Language models, particularly generative models, are susceptible to hallucinations, generating outputs that contradict factual knowledge or the source text. This study explores methods for detecting hallucinations in three SemEval-2024 Task 6 tasks: Machine Translation, Definition Modeling, and Paraphrase Generation. We evaluate two methods: semantic similarity between the generated text and factual references, and an ensemble of language models that judge each other's outputs. Our results show that semantic similarity achieves moderate accuracy and correlation scores in trial data, while the ensemble method offers insights into the complexities of hallucination detection but falls short of expectations. This work highlights the challenges of hallucination detection and underscores the need for further research in this critical area.

Submitted 9 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

arXiv:2404.02474 [pdf, other]

uTeBC-NLP at SemEval-2024 Task 9: Can LLMs be Lateral Thinkers?

Authors: Pouya Sadeghi, Amirhossein Abaskohi, Yadollah Yaghoobzadeh

Abstract: Inspired by human cognition, Jiang et al. (2023c) create a benchmark for assessing LLMs' lateral thinking, that is, thinking outside the box. Building upon this benchmark, we investigate how different prompting methods enhance LLMs' performance on this task to reveal their inherent power for outside-the-box thinking ability. Through participating in the SemEval-2024 Task 9 Sentence Puzzle sub-task, we explore prompt engineering methods: chain of thoughts (CoT) and direct prompting, enhancing with informative descriptions, and employing contextualizing prompts using a retrieval augmented generation (RAG) pipeline. Our experiments involve three LLMs including GPT-3.5, GPT-4, and Zephyr-7B-beta. We generate a dataset of thinking paths between riddles and options using GPT-4, validated by humans for quality. Findings indicate that compressed informative prompts enhance performance. Dynamic in-context learning enhances model performance significantly. Furthermore, fine-tuning Zephyr on our dataset enhances performance across other commonsense datasets, underscoring the value of innovative thinking.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 12 pages, 5 figures, 6 tables, Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024) @ NAACL 2024

arXiv:2404.02403 [pdf, other]

Benchmarking Large Language Models for Persian: A Preliminary Study Focusing on ChatGPT

Authors: Amirhossein Abaskohi, Sara Baruni, Mostafa Masoudi, Nesa Abbasi, Mohammad Hadi Babalou, Ali Edalat, Sepehr Kamahi, Samin Mahdizadeh Sani, Nikoo Naghavian, Danial Namazifard, Pouya Sadeghi, Yadollah Yaghoobzadeh

Abstract: This paper explores the efficacy of large language models (LLMs) for Persian. While ChatGPT and subsequent LLMs have shown remarkable performance in English, their efficiency for more low-resource languages remains an open question. We present the first comprehensive benchmarking study of LLMs across diverse Persian language tasks. Our primary focus is on GPT-3.5-turbo, but we also include GPT-4 and OpenChat-3.5 to provide a more holistic evaluation. Our assessment encompasses a diverse set of tasks categorized into classic, reasoning, and knowledge-based domains. To enable a thorough comparison, we evaluate LLMs against existing task-specific fine-tuned models. Given the limited availability of Persian datasets for reasoning tasks, we introduce two new benchmarks: one based on elementary school math questions and another derived from the entrance exams for 7th and 10th grades. Our findings reveal that while LLMs, especially GPT-4, excel in tasks requiring reasoning abilities and a broad understanding of general knowledge, they often lag behind smaller pre-trained models fine-tuned specifically for particular tasks. Additionally, we observe improved performance when test sets are translated to English before inputting them into GPT-3.5. These results highlight the significant potential for enhancing LLM performance in the Persian language. This is particularly noteworthy due to the unique attributes of Persian, including its distinct alphabet and writing styles.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 14 pages, 1 figure, 6 tables, Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING)

arXiv:2403.10342 [pdf, other]

Cooperative Jamming for Physical Layer Security Enhancement Using Deep Reinforcement Learning

Authors: Sayed Amir Hoseini, Faycal Bouhafs, Neda Aboutorab, Parastoo Sadeghi, Frank den Hartog

Abstract: Wireless data communications are always facing the risk of eavesdropping and interception. Conventional protection solutions which are based on encryption may not always be practical as is the case for wireless IoT networks or may soon become ineffective against quantum computers. In this regard, Physical Layer Security (PLS) presents a promising approach to secure wireless communications through the exploitation of the physical properties of the wireless channel. Cooperative Friendly Jamming (CFJ) is among the PLS techniques that have received attention in recent years. However, finding an optimal transmit power allocation that results in the highest secrecy is a complex problem that becomes more difficult to address as the size of the wireless network increases. In this paper, we propose an optimization approach to achieve CFJ in large Wi-Fi networks by using a Reinforcement Learning Algorithm. Obtained results show that our optimization approach offers better secrecy results and becomes more effective as the network size and the density of Wi-Fi access points increase.

Submitted 15 March, 2024; originally announced March 2024.

arXiv:2402.12967 [pdf, other]

Quantifying Privacy via Information Density

Authors: Leonhard Grosse, Sara Saeidian, Parastoo Sadeghi, Tobias J. Oechtering, Mikael Skoglund

Abstract: We examine the relationship between privacy metrics that utilize information density to measure information leakage between a private and a disclosed random variable. Firstly, we prove that bounding the information density from above or below in turn implies a lower or upper bound on the information density, respectively. Using this result, we establish new relationships between local information privacy, asymmetric local information privacy, pointwise maximal leakage and local differential privacy. We further provide applications of these relations to privacy mechanism design. Furthermore, we provide statements showing the equivalence between a lower bound on information density and risk-averse adversaries. More specifically, we prove an equivalence between a guessing framework and a cost-function framework that result in the desired lower bound on the information density.

Submitted 20 February, 2024; originally announced February 2024.

MSC Class: 94A17; ACM Class: H.1.1

arXiv:2401.15202 [pdf, ps, other]

A Cross Entropy Interpretation of Rényi Entropy for $α$-leakage

Authors: Ni Ding, Mohammad Amin Zarrabian, Parastoo Sadeghi

Abstract: This paper proposes an $α$-leakage measure for $α\in[0,\infty)$ by a cross entropy interpretation of Rényi entropy. While Rényi entropy was originally defined as an $f$-mean for $f(t) = \exp((1-α)t)$, we reveal that it is also a $\tilde{f}$-mean cross entropy measure for $\tilde{f}(t) = \exp(\frac{1-α}{α}t)$. Minimizing this Rényi cross-entropy gives Rényi entropy, by which the prior and posterior uncertainty measures are defined corresponding to the adversary's knowledge gain on sensitive attribute before and after data release, respectively. The $α$-leakage is proposed as the difference between $\tilde{f}$-mean prior and posterior uncertainty measures, which is exactly the Arimoto mutual information. This not only extends the existing $α$-leakage from $α\in [1,\infty)$ to the overall Rényi order range $α\in [0,\infty)$ in a well-founded way with $α=0$ referring to nonstochastic leakage, but also reveals that the existing maximal leakage is a $\tilde{f}$-mean of an elementary $α$-leakage for all $α\in [0,\infty)$, which generalizes the existing pointwise maximal leakage.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 7 pages; 1 figure
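
Worked note on the $f$-mean statement in the abstract above. This is a short sketch of the standard textbook identity only, not the paper's derivation; the symbols $P_X$ and $H_α$ are notation assumed here and do not appear in the listing. It covers $α \neq 1$ and checks that the $f$-mean of the self-information $-\log P_X(x)$ with $f(t) = \exp((1-α)t)$ recovers the usual Rényi entropy formula.

```latex
% Assumed notation: P_X is a pmf on a finite alphabet, \alpha \neq 1.
% f(t) = \exp((1-\alpha)t), so f^{-1}(u) = \frac{1}{1-\alpha}\log u.
\begin{align*}
H_\alpha(X)
  &= f^{-1}\Big( \sum_x P_X(x)\, f\big(-\log P_X(x)\big) \Big) \\
  &= f^{-1}\Big( \sum_x P_X(x)\, P_X(x)^{\alpha-1} \Big) \\
  &= \frac{1}{1-\alpha} \log \sum_x P_X(x)^{\alpha}.
\end{align*}
```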

arXiv:2310.20486 [pdf, ps, other]

Optimal Binary Differential Privacy via Graphs

Authors: Sahel Torkamani, Javad B. Ebrahimi, Parastoo Sadeghi, Rafael G. L. D'Oliveira, Muriel Médard

Abstract: We present the notion of \emph{reasonable utility} for binary mechanisms, which applies to all utility functions in the literature. This notion induces a partial ordering on the performance of all binary differentially private (DP) mechanisms. DP mechanisms that are maximal elements of this ordering are optimal DP mechanisms for every reasonable utility. By looking at differential privacy as a randomized graph coloring, we characterize these optimal DP mechanisms in terms of their behavior on a certain subset of the boundary datasets we call a boundary hitting set. In the process of establishing our results, we also introduce a useful notion that generalizes DP conditions for binary-valued queries, which we coin as suitable pairs. Suitable pairs abstract away the algebraic roles of $\varepsilon,δ$ in the DP framework, making the derivations and understanding of our proofs simpler. Additionally, the notion of a suitable pair can potentially capture privacy conditions in frameworks other than DP and may be of independent interest.

Submitted 31 October, 2023; originally announced October 2023.

arXiv:2309.05871 [pdf, other]

Generalized Rainbow Differential Privacy

Authors: Yuzhou Gu, Ziqi Zhou, Onur Günlü, Rafael G. L. D'Oliveira, Parastoo Sadeghi, Muriel Médard, Rafael F. Schaefer

Abstract: We study a new framework for designing differentially private (DP) mechanisms via randomized graph colorings, called rainbow differential privacy. In this framework, datasets are nodes in a graph, and two neighboring datasets are connected by an edge. Each dataset in the graph has a preferential ordering for the possible outputs of the mechanism, and these orderings are called rainbows. Different rainbows partition the graph of connected datasets into different regions. We show that if a DP mechanism at the boundary of such regions is fixed and it behaves identically for all same-rainbow boundary datasets, then a unique optimal $(ε,δ)$-DP mechanism exists (as long as the boundary condition is valid) and can be expressed in closed-form. Our proof technique is based on an interesting relationship between dominance ordering and DP, which applies to any finite number of colors and for $(ε,δ)$-DP, improving upon previous results that only apply to at most three colors and for $ε$-DP. We justify the homogeneous boundary condition assumption by giving an example with non-homogeneous boundary condition, for which there exists no optimal DP mechanism.

Submitted 5 April, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: arXiv admin note: text overlap with arXiv:2202.03974

arXiv:2305.06577 [pdf, other]

Preferential Pliable Index Coding

Authors: Daniel Byrne, Lawrence Ong, Parastoo Sadeghi, Badri N. Vellambi

Abstract: We propose and study a variant of pliable index coding (PICOD) where receivers have preferences for their unknown messages and give each unknown message a preference ranking. We call this the preferential pliable index-coding (PPICOD) problem and study the Pareto trade-off between the code length and overall satisfaction metric among all receivers. We derive theoretical characteristics of the PPICOD problem in terms of interactions between achievable code length and satisfaction metric. We also conceptually characterise two methods for computation of the Pareto boundary of the set of all achievable code length-satisfaction pairs. As for a coding scheme, we extend the Greedy Cover Algorithm for PICOD by Brahma and Fragouli, 2015, to balance the number of satisfied receivers and average satisfaction metric in each iteration. We present numerical results which show the efficacy of our proposed algorithm in approaching the Pareto boundary, found via brute-force computation.

Submitted 15 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: An extended version of the same-titled paper accepted for presentation at the 2023 IEEE International Symposium on Information Theory (ISIT)

arXiv:2303.13771 [pdf, other]

On the connection between the ABS perturbation methodology and differential privacy

Authors: Parastoo Sadeghi, Chien-Hung Chien

Abstract: This paper explores analytical connections between the perturbation methodology of the Australian Bureau of Statistics (ABS) and the differential privacy (DP) framework. We consider a single static counting query function and find the analytical form of the perturbation distribution with symmetric support for the ABS perturbation methodology. We then analytically measure the DP parameters, namely the $(\varepsilon, δ)$ pair, for the ABS perturbation methodology under this setting. The results and insights obtained about the behaviour of $(\varepsilon, δ)$ with respect to the perturbation support and variance are used to judiciously select the variance of the perturbation distribution to give a good $δ$ in the DP framework for a given desired $\varepsilon$ and perturbation support. Finally, we propose a simple sampling scheme to implement the perturbation probability matrix in the ABS Cellkey method. The post sampling $(\varepsilon, δ)$ pair is numerically analysed as a function of the Cellkey size. It is shown that the best results are obtained for a larger Cellkey size, because the $(\varepsilon, δ)$ pair post-sampling measures remain almost identical when we compare sampling and theoretical results.

Submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.01017 [pdf, ps, other]

doi 10.3390/e25040679

On the Lift, Related Privacy Measures, and Applications to Privacy-Utility Tradeoffs

Authors: Mohammad Amin Zarrabian, Ni Ding, Parastoo Sadeghi

Abstract: This paper investigates lift, the likelihood ratio between the posterior and prior belief about sensitive features in a dataset. Maximum and minimum lifts over sensitive features quantify the adversary's knowledge gain and should be bounded to protect privacy. We demonstrate that max and min lifts have a distinct range of values and probability of appearance in the dataset, referred to as \emph{lift asymmetry}. We propose asymmetric local information privacy (ALIP) as a compatible privacy notion with lift asymmetry, where different bounds can be applied to min and max lifts. We use ALIP in the watchdog and optimal random response (ORR) mechanisms, the main methods to achieve lift-based privacy. It is shown that ALIP enhances utility in these methods compared to existing local information privacy, which ensures the same (symmetric) bounds on both max and min lifts. We propose subset merging for the watchdog mechanism to improve data utility and subset random response for the ORR to reduce complexity. We then investigate the related lift-based measures, including $\ell_1$-norm, $χ^2$-privacy criterion, and $α$-lift. We reveal that they can only restrict max-lift, resulting in significant min-lift leakage. To overcome this problem, we propose corresponding lift-inverse measures to restrict the min-lift. We apply these lift-based and lift-inverse measures in the watchdog mechanism. We show that they can be considered as relaxations of ALIP, where a higher utility can be achieved by bounding only average max and min lifts.

Submitted 2 March, 2023; originally announced March 2023.
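
Illustrative note on the lift described in the abstract above. The following is a minimal sketch, not code from the paper: it computes the lift as the ratio of posterior to prior, P(s|y) / P(s), from a small joint distribution, then checks an asymmetric bound of the form exp(-eps_lower) <= lift <= exp(eps_upper). The function names, the toy joint distribution, and the exact form of the bound check are assumptions made for illustration, and all marginals are assumed to be strictly positive.

```python
import math


def lifts(joint):
    """Return lift[s][y] = P(s, y) / (P(s) * P(y)) = P(s | y) / P(s).

    `joint[s][y]` is assumed to hold P(S = s, Y = y), with every
    marginal strictly positive (hypothetical input format).
    """
    p_s = [sum(row) for row in joint]          # marginal of the sensitive feature
    p_y = [sum(col) for col in zip(*joint)]    # marginal of the released data
    return [
        [joint[s][y] / (p_s[s] * p_y[y]) for y in range(len(p_y))]
        for s in range(len(p_s))
    ]


def satisfies_asymmetric_bound(joint, eps_lower, eps_upper):
    """Check exp(-eps_lower) <= min lift and max lift <= exp(eps_upper)."""
    values = [v for row in lifts(joint) for v in row]
    return math.exp(-eps_lower) <= min(values) and max(values) <= math.exp(eps_upper)


# Toy joint distribution (illustrative numbers only).
toy_joint = [
    [0.4, 0.1],
    [0.1, 0.4],
]

toy_lifts = lifts(toy_joint)
max_lift = max(v for row in toy_lifts for v in row)
min_lift = min(v for row in toy_lifts for v in row)
```

In this toy case the max lift is about 1.6 and the min lift about 0.4, so the log of the max lift (about 0.47) is smaller in magnitude than the log of the min lift (about -0.92). A symmetric budget of 0.5 on both sides fails because of the min lift, while `satisfies_asymmetric_bound(toy_joint, 1.0, 0.5)` holds, which is the kind of asymmetry the abstract refers to.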

arXiv:2210.15078 [pdf, other]

doi 10.1109/JSAC.2023.3280986

Age of Information in Downlink Systems: Broadcast or Unicast Transmission?

Authors: Zhifeng Tang, Nan Yang, Parastoo Sadeghi, Xiangyun Zhou

Abstract: We analytically decide whether the broadcast transmission scheme or the unicast transmission scheme achieves the optimal age of information (AoI) performance of a multiuser system where a base station (BS) generates and transmits status updates to multiple user equipments (UEs). In the broadcast transmission scheme, the status update for all UEs is jointly encoded into a packet for transmission, while in the unicast transmission scheme, the status update for each UE is encoded individually and transmitted by following the round robin policy. For both transmission schemes, we examine three packet management strategies, namely the non-preemption strategy, the preemption in buffer strategy, and the preemption in serving strategy. We first derive new closed-form expressions for the average AoI achieved by two transmission schemes with three packet management strategies. Based on them, we compare the AoI performance of two transmission schemes in two systems, namely, the remote control system and the dynamic system. Aided by simulation results, we verify our analysis and investigate the impact of system parameters on the average AoI. For example, the unicast transmission scheme is more appropriate for the system with a large number of UEs. Otherwise, the broadcast transmission scheme is more appropriate.

Submitted 7 July, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

arXiv:2210.12916 [pdf, ps, other]

Explaining epsilon in local differential privacy through the lens of quantitative information flow

Authors: Natasha Fernandes, Annabelle McIver, Parastoo Sadeghi

Abstract: The study of leakage measures for privacy has been a subject of intensive research and is an important aspect of understanding how privacy leaks occur in computer systems. Differential privacy has been a focal point in the privacy community for some years and yet its leakage characteristics are not completely understood. In this paper we bring together two areas of research -- information theory and the g-leakage framework of quantitative information flow (QIF) -- to give an operational interpretation for the epsilon parameter of local differential privacy. We find that epsilon emerges as a capacity measure in both frameworks; via (log)-lift, a popular measure in information theory; and via max-case g-leakage, which we introduce to describe the leakage of any system to Bayesian adversaries modelled using "worst-case" assumptions under the QIF framework. Our characterisation resolves an important question of interpretability of epsilon and consolidates a number of disparate results covering the literature of both information theory and quantitative information flow.

Submitted 18 May, 2023; v1 submitted 23 October, 2022; originally announced October 2022.

arXiv:2206.06646 [pdf, other]

Network-Controlled Physical-Layer Security: Enhancing Secrecy Through Friendly Jamming

Authors: Sayed Amir Hoseini, Parastoo Sadeghi, Faycal Bouhafs, Neda Aboutorab, Frank den Hartog

Abstract: The broadcasting nature of the wireless medium makes exposure to eavesdroppers a potential threat. Physical Layer Security (PLS) has been widely recognized as a promising security measure complementary to encryption. It has recently been demonstrated that PLS can be implemented using off-the-shelf equipment by spectrum-programming enhanced Software-Defined Networking (SDN), where a network controller is able to execute intelligent access point (AP) selection algorithms such that PLS can be achieved and secrecy capacity optimized. In this paper, we provide a basic system model for such implementations. We also introduce a novel secrecy capacity optimization algorithm, in which we combine intelligent AP selection with the addition of Friendly Jamming (FJ) by the not-selected AP.

Submitted 14 June, 2022; originally announced June 2022.

arXiv:2205.14549 [pdf, ps, other]

Asymmetric Local Information Privacy and the Watchdog Mechanism

Authors: Mohammad Amin Zarrabian, Ni Ding, Parastoo Sadeghi

Abstract: This paper proposes a novel watchdog privatization scheme by generalizing local information privacy (LIP) to enhance data utility. To protect the sensitive features $S$ correlated with some useful data $X$, LIP restricts the lift, the ratio of the posterior belief to the prior on $S$ after and before accessing $X$. For each $x$, both maximum and minimum lift over sensitive features are measures of the privacy risk of publishing this symbol and should be restricted for the privacy-preserving purpose. Previous works enforce the same bound for both max-lift and min-lift. However, empirical observations show that the min-lift is usually much smaller than the max-lift. In this work, we generalize the LIP definition to consider the unequal values of max and min lift, i.e., considering different bounds for max-lift and min-lift. This new definition is applied to the watchdog privacy mechanism. We demonstrate that the utility is enhanced under a given privacy constraint on local differential privacy. At the same time, the resulting max-lift is lower and, therefore, tightly restricts other privacy leakages, e.g., mutual information, maximal leakage, and $α$-leakage.

Submitted 28 May, 2022; originally announced May 2022.

arXiv:2205.10827 [pdf, other]

Information Leakage in Index Coding With Sensitive and Non-Sensitive Messages

Authors: Yucheng Liu, Lawrence Ong, Phee Lep Yeoh, Parastoo Sadeghi, Joerg Kliewer, Sarah Johnson

Abstract: Information leakage to a guessing adversary in index coding is studied, where some messages in the system are sensitive and others are not. The non-sensitive messages can be used by the server like secret keys to mitigate leakage of the sensitive messages to the adversary. We construct a deterministic linear coding scheme, developed from the rank minimization method based on fitting matrices (Bar-Yossef et al. 2011). The linear scheme leads to a novel upper bound on the optimal information leakage rate, which is proved to be tight over all deterministic scalar linear codes. We also derive a converse result from a graph-theoretic perspective, which holds in general over all deterministic and stochastic coding schemes.

Submitted 22 May, 2022; originally announced May 2022.

Comments: Accepted by IEEE International Symposium on Information Theory (ISIT) 2022

arXiv:2205.10821 [pdf, other]

Information Leakage in Index Coding

Authors: Yucheng Liu, Lawrence Ong, Phee Lep Yeoh, Parastoo Sadeghi, Joerg Kliewer, Sarah Johnson

Abstract: We study the information leakage to a guessing adversary in index coding with a general message distribution. Under both vanishing-error and zero-error decoding assumptions, we develop lower and upper bounds on the optimal leakage rate, which are based on the broadcast rate of the subproblem induced by the set of messages the adversary tries to guess. When the messages are independent and uniformly distributed, the lower and upper bounds match, establishing an equivalence between the two rates.

Submitted 22 May, 2022; originally announced May 2022.

Comments: Published in Proceedings of IEEE Information Theory Workshop (ITW) 2021

arXiv:2203.15429 [pdf, ps, other]

Heterogeneous Differential Privacy via Graphs

Authors: Sahel Torkamani, Javad B. Ebrahimi, Parastoo Sadeghi, Rafael G. L. D'Oliveira, Muriel Medard

Abstract: We generalize a previous framework for designing utility-optimal differentially private (DP) mechanisms via graphs, where datasets are vertices in the graph and edges represent dataset neighborhood. The boundary set contains datasets where an individual's response changes the binary-valued query compared to its neighbors. Previous work was limited to the homogeneous case where the privacy parameter $\varepsilon$ across all datasets was the same and the mechanism at boundary datasets was identical. In our work, the mechanism can take different distributions at the boundary and the privacy parameter $\varepsilon$ is a function of neighboring datasets, which recovers an earlier definition of personalized DP as a special case. The problem is how to extend the mechanism, which is only defined at the boundary set, to other datasets in the graph in a computationally efficient and utility-optimal manner. Using the concept of the strongest induced DP condition, we solve this problem efficiently in polynomial time (in the size of the graph).

Submitted 29 March, 2022; originally announced March 2022.

arXiv:2202.03974 [pdf, other]

Rainbow Differential Privacy

Authors: Ziqi Zhou, Onur Günlü, Rafael G. L. D'Oliveira, Muriel Médard, Parastoo Sadeghi, Rafael F. Schaefer

Abstract: We extend a previous framework for designing differentially private (DP) mechanisms via randomized graph colorings that was restricted to binary functions, corresponding to colorings in a graph, to multi-valued functions. As before, datasets are nodes in the graph and any two neighboring datasets are connected by an edge. In our setting, we assume that each dataset has a preferential ordering for the possible outputs of the mechanism, each of which we refer to as a rainbow. Different rainbows partition the graph of datasets into different regions. We show that if the DP mechanism is pre-specified at the boundary of such regions and behaves identically for all same-rainbow boundary datasets, at most one optimal such mechanism can exist and the problem can be solved by means of a morphism to a line graph. We then show closed form expressions for the line graph in the case of ternary functions. Treatment of ternary queries in this paper displays enough richness to be extended to higher-dimensional query spaces with preferential query ordering, but the optimality proof does not seem to follow directly from the ternary proof.

Submitted 13 May, 2022; v1 submitted 8 February, 2022; originally announced February 2022.

Comments: To appear in the 2022 IEEE International Symposium on Information Theory

arXiv:2201.10057 [pdf, ps, other]

On the Optimality of Linear Index Coding over the Fields with Characteristic Three

Authors: Arman Sharififar, Parastoo Sadeghi, Neda Aboutorab

Abstract: It has been known that the insufficiency of linear coding in achieving the optimal rate of the general index coding problem is rooted in its rate's dependency on the field size. However, this dependency has been described only through the two well-known matroid instances, namely the Fano and non-Fano matroids, which, in turn, limits its scope only to the fields with characteristic two. In this paper, we extend this scope to demonstrate the reliance of linear index coding rate on fields with characteristic three. By constructing two index coding instances of size 29, we prove that for the first instance, linear coding is optimal only over the fields with characteristic three, and for the second instance, linear coding over any field with characteristic three can never be optimal. Then, a variation of the second instance is designed as the third index coding instance of size 58. For this instance, it is proved that while linear coding over any field with characteristic three cannot be optimal, there exists a nonlinear code over the fields with characteristic three, which achieves its optimal rate. Connecting the first and third index coding instances in two specific ways, called no-way and two-way connections, will lead to two new index coding instances of size 87 and 91, for which linear coding is outperformed by nonlinear codes.
Another main contribution of this paper is the reduction of the key constraints on the space of the linear coding for the first and second index coding instances, each of size 29, into a matroid instance with the ground set of size 9, whose linear representability is dependent on the fields with characteristic three. The proofs and discussions provided in this paper through using these two relatively small matroid instances will shed light on the underlying reason causing the linear coding to become insufficient for the general index coding problem.

Submitted 1 May, 2022; v1 submitted 24 January, 2022; originally announced January 2022.

arXiv:2111.02638 [pdf, ps, other]

The Age of Information of Short-Packet Communications: Joint or Distributed Encoding?

Authors: Zhifeng Tang, Nan Yang, Parastoo Sadeghi, Xiangyun Zhou

Abstract: In this paper, we analyze the impact of different encoding schemes on the age of information (AoI) performance in a point-to-point system, where a source generates packets based on the status updates collected from multiple sensors and transmits the packets to a destination. In this system, we consider two encoding schemes, namely, the joint encoding scheme and the distributed encoding scheme. In the joint encoding scheme, the status updates from all the sensors are jointly encoded into a packet for transmission. In the distributed encoding scheme, the status update from each sensor is encoded individually and the sensors' packets are transmitted following the round robin policy. To ensure the freshness of packets, the zero-wait policy is adopted in both schemes, where a new packet is immediately generated once the source finishes the transmission of the current packet. We derive closed-form expressions for the average AoI achieved by these two encoding schemes and compare their performances. Simulation results show that the distributed encoding scheme is more appropriate for systems with a relatively large number of sensors, compared with the joint encoding scheme.

Submitted 4 November, 2021; originally announced November 2021.

arXiv:2110.06412 [pdf, other]

Offset-Symmetric Gaussians for Differential Privacy

Authors: Parastoo Sadeghi, Mehdi Korki

Abstract: The Gaussian distribution is widely used in mechanism design for differential privacy (DP). Thanks to its sub-Gaussian tail, it significantly reduces the chance of outliers when responding to queries. However, it can only provide approximate $(ε, δ(ε))$-DP. In practice, $δ(ε)$ must be much smaller than the size of the dataset, which may limit the use of the Gaussian mechanism for large datasets with strong privacy requirements. In this paper, we introduce and analyze a new distribution for use in DP that is based on the Gaussian distribution, but has improved privacy performance. The so-called offset-symmetric Gaussian tail (OSGT) distribution is obtained by using the normalized tails of two symmetric Gaussians around zero. Consequently, it can still have a sub-Gaussian tail and lend itself to analytical derivations. We analytically derive the variance of the OSGT random variable and the $δ(ε)$ of the OSGT mechanism. We then numerically show that at the same variance, the OSGT mechanism can offer a lower $δ(ε)$ than the Gaussian mechanism. We extend the OSGT mechanism to $k$-dimensional queries and derive an easy-to-compute analytical upper bound for its zero-concentrated differential privacy (zCDP) performance. We analytically prove that at the same variance, the same global query sensitivity and for sufficiently large concentration orders $α$, the OSGT mechanism performs better than the Gaussian mechanism in terms of zCDP.

Submitted 12 October, 2021; originally announced October 2021.

arXiv:2110.04724 [pdf, ps, other]

Enhancing Utility in the Watchdog Privacy Mechanism

Authors: Mohammad Amin Zarrabian, Ni Ding, Parastoo Sadeghi, Thierry Rakotoarivelo

Abstract: This paper is concerned with enhancing data utility in the privacy watchdog method for attaining information-theoretic privacy. For a specific privacy constraint, the watchdog method filters out the high-risk data symbols by applying a uniform data regulation scheme, e.g., merging all high-risk symbols together. While this method entirely trades the symbol resolution off for privacy, we show that the data utility can be greatly improved by partitioning the high-risk symbol set and individually privatizing each subset. We further propose an agglomerative merging algorithm that finds a suitable partition of high-risk symbols: it starts with a singleton high-risk symbol, which is iteratively fused with others until the resulting subsets are private. Numerical simulations demonstrate the efficacy of this algorithm in privately achieving higher utilities in the watchdog scheme.

Submitted 10 October, 2021; originally announced October 2021.

Comments: 5 pages, 3 figures

MSC Class: 68P27;

arXiv:2105.07211 [pdf, other]

On Converse Results for Secure Index Coding

Authors: Yucheng Liu, Lawrence Ong, Parastoo Sadeghi, Neda Aboutorab, Arman Sharififar

Abstract: In this work, we study the secure index coding problem where there are security constraints on both legitimate receivers and eavesdroppers. We develop two performance bounds (i.e., converse results) on the symmetric secure capacity. The first one is an extended version of the basic acyclic chain bound (Liu and Sadeghi, 2019) that takes security constraints into account. The second converse result is a novel information-theoretic lower bound on the symmetric secure capacity, which is interesting as all the existing converse results in the literature for secure index coding give upper bounds on the capacity.

Submitted 15 May, 2021; originally announced May 2021.

Comments: A shortened version submitted to IEEE Information Theory Workshop (ITW) 2021

arXiv:2102.05172 [pdf, ps, other]

Differential Privacy for Binary Functions via Randomized Graph Colorings

Authors: Rafael G. L. D'Oliveira, Muriel Medard, Parastoo Sadeghi

Abstract: We present a framework for designing differentially private (DP) mechanisms for binary functions via a graph representation of datasets. Datasets are nodes in the graph and any two neighboring datasets are connected by an edge. The true binary function we want to approximate assigns a value (or true color) to a dataset. Randomized DP mechanisms are then equivalent to randomized colorings of the graph. A key notion we use is that of the boundary of the graph. Any two neighboring datasets assigned a different true color belong to the boundary. Under this framework, we show that fixing the mechanism behavior at the boundary induces a unique optimal mechanism. Moreover, if the mechanism is to have a homogeneous behavior at the boundary, we present a closed expression for the optimal mechanism, which is obtained by means of a pullback operation on the optimal mechanism of a line graph. For balanced mechanisms, not favoring one binary value over another, the optimal $(ε,δ)$-DP mechanism takes a particularly simple form, depending only on the minimum distance to the boundary, on $ε$, and on $δ$.

Submitted 9 February, 2021; originally announced February 2021.

Comments: Submitted to IEEE ISIT 2021

arXiv:2102.01908 [pdf, other]

Information Leakage in Zero-Error Source Coding: A Graph-Theoretic Perspective

Authors: Yucheng Liu, Lawrence Ong, Sarah Johnson, Joerg Kliewer, Parastoo Sadeghi, Phee Lep Yeoh

Abstract: We study the information leakage to a guessing adversary in zero-error source coding. The source coding problem is defined by a confusion graph capturing the distinguishability between source symbols. The information leakage is measured by the ratio of the adversary's successful guessing probability after and before eavesdropping the codeword, maximized over all possible source distributions. Such measurement under the basic adversarial model where the adversary makes a single guess and allows no distortion between its estimator and the true sequence is known as the maximum min-entropy leakage or the maximal leakage in the literature. We develop a single-letter characterization of the optimal normalized leakage under the basic adversarial model, together with an optimum-achieving scalar stochastic mapping scheme. An interesting observation is that the optimal normalized leakage is equal to the optimal compression rate with fixed-length source codes, both of which can be simultaneously achieved by some deterministic coding schemes. We then extend the leakage measurement to generalized adversarial models where the adversary makes multiple guesses and allows a certain level of distortion, for which we derive single-letter lower and upper bounds.

Submitted 3 February, 2021; originally announced February 2021.

Comments: A shortened version has been submitted to ISIT 2021

arXiv:2102.01526 [pdf, ps, other]

Broadcast Rate Requires Nonlinear Coding in a Unicast Index Coding Instance of Size 36

Authors: Arman Sharififar, Parastoo Sadeghi, Neda Aboutorab

Abstract: Insufficiency of linear coding for the network coding problem was first proved by providing an instance which is solvable only by nonlinear network coding (Dougherty et al., 2005). Based on the work of Effros et al., 2015, this specific network coding instance can be modeled as a groupcast index coding (GIC) instance with 74 messages and 80 users (where a message can be requested by multiple users). This proves the insufficiency of linear coding for the GIC problem. Using the systematic approach proposed by Maleki et al., 2014, the aforementioned GIC instance can be cast into a unicast index coding (UIC) instance with more than 200 users, each wanting a unique message. This confirms the necessity of nonlinear coding for the UIC problem, but only for achieving the entire capacity region. Nevertheless, the question of whether nonlinear coding is required to achieve the symmetric capacity (broadcast rate) of the UIC problem remained open. In this paper, we settle this question and prove the insufficiency of linear coding, by directly building a UIC instance with only 36 users for which there exists a nonlinear index code outperforming the optimal linear code in terms of the broadcast rate.

Submitted 5 February, 2023; v1 submitted 2 February, 2021; originally announced February 2021.

arXiv:2101.10551 [pdf, ps, other]

$α$-Information-theoretic Privacy Watchdog and Optimal Privatization Scheme

Authors: Ni Ding, Mohammad Amin Zarrabian, Parastoo Sadeghi

Abstract: This paper proposes an $α$-lift measure for data privacy and determines the optimal privatization scheme that minimizes the $α$-lift in the watchdog method. To release data $X$ that is correlated with sensitive information $S$, the ratio $l(s,x) = \frac{p(s|x)}{p(s)}$ denotes the 'lift' of the posterior belief on $S$ and quantifies data privacy. The $α$-lift is proposed as the $L_α$-norm of the lift: $\ell_α(x) = \| l(\cdot,x) \|_α = (E[l(S,x)^α])^{1/α}$. This is a tunable measure: when $α < \infty$, each lift is weighted by its likelihood of appearing in the dataset (w.r.t. the marginal probability $p(s)$); for $α = \infty$, the $α$-lift reduces to the existing maximum lift. To generate the sanitized data $Y$, we adopt the privacy watchdog method using $α$-lift: obtain $\mathcal{X}_ε$ containing all $x$'s such that $\ell_α(x) > e^ε$; apply the randomization $r(y|x)$ to all $x \in \mathcal{X}_ε$, while all other $x \in \mathcal{X} \setminus \mathcal{X}_ε$ are published directly. For the resulting $α$-lift $\ell_α(y)$, it is shown that the Sibson mutual information $I_α^{S}(S;Y)$ is proportional to $E[\ell_α(y)]$. We further define a stronger measure $\bar{I}_α^{S}(S;Y)$ using the worst-case $α$-lift: $\max_{y} \ell_α(y)$. We prove that the optimal randomization $r^*(y|x)$ that minimizes both $I_α^{S}(S;Y)$ and $\bar{I}_α^{S}(S;Y)$ is $X$-invariant, i.e., $r^*(y|x) = R(y), \forall x \in \mathcal{X}_ε$ for any probability distribution $R$ over $y \in \mathcal{X}_ε$. Numerical experiments show that $α$-lift can provide flexibility in the privacy-utility tradeoff.
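The $α$-lift and the high-risk set described in the abstract above can be illustrated with a small sketch. It is not taken from the paper: the function names `alpha_lift` and `high_risk_set`, the nested-list layout `joint[s][x]` for the joint distribution $p(s,x)$, and the toy numbers are all assumptions made for illustration, and the code assumes every marginal probability is strictly positive.

```python
import math


def alpha_lift(joint, alpha):
    """Return the alpha-lift of every symbol x.

    joint[s][x] is an assumed layout for the joint pmf p(s, x);
    all marginals are assumed to be strictly positive.
    """
    n_s, n_x = len(joint), len(joint[0])
    p_s = [sum(row) for row in joint]
    p_x = [sum(joint[s][x] for s in range(n_s)) for x in range(n_x)]
    values = []
    for x in range(n_x):
        # lift l(s, x) = p(s|x) / p(s) = p(s, x) / (p(s) * p(x))
        lifts = [joint[s][x] / (p_s[s] * p_x[x]) for s in range(n_s)]
        if math.isinf(alpha):
            # alpha = infinity: the alpha-lift reduces to the maximum lift
            values.append(max(lifts))
        else:
            # (E[l(S, x)^alpha])^(1/alpha), expectation under the marginal p(s)
            moment = sum(p_s[s] * lifts[s] ** alpha for s in range(n_s))
            values.append(moment ** (1.0 / alpha))
    return values


def high_risk_set(joint, alpha, eps):
    """Indices x whose alpha-lift exceeds e^eps (the set the watchdog randomizes)."""
    threshold = math.exp(eps)
    return [x for x, v in enumerate(alpha_lift(joint, alpha)) if v > threshold]


# Hypothetical 2x2 joint distribution, chosen only for illustration.
toy_joint = [[0.4, 0.1],
             [0.1, 0.4]]
print(alpha_lift(toy_joint, 2.0))
print(high_risk_set(toy_joint, math.inf, 0.4))
```

In this toy case a finite $α$ yields a smaller value than the maximum lift for the same symbol, so the same threshold $e^ε$ flags fewer symbols, which is one way to read the tunability mentioned in the abstract.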

Submitted 25 January, 2021; originally announced January 2021.

arXiv:2101.08970 [pdf, other]

An Update-based Maximum Column Distance Coding Scheme for Index Coding

Authors: Arman Sharififar, Neda Aboutorab, Parastoo Sadeghi

Abstract: In this paper, we propose a new scalar linear coding scheme for the index coding problem called update-based maximum column distance (UMCD) coding scheme. The central idea in each transmission is to code messages such that one of the receivers with the minimum size of side information is instantaneously eliminated from unsatisfied receivers. One main contribution of the paper is to prove that the other satisfied receivers can be identified after each transmission, using a polynomial-time algorithm solving the well-known maximum cardinality matching problem in graph theory. This leads to determining the total number of transmissions without knowing the coding coefficients. Once this number and what messages to transmit in each round are found, we then propose a method to determine all coding coefficients from a sufficiently large finite field. We provide concrete instances where the proposed UMCD coding scheme has a better broadcast performance compared to the most efficient existing coding schemes, including the recursive scheme (Arbabjolfaei and Kim, 2014) and the interlinked-cycle cover (ICC) scheme (Thapa et al., 2017). We prove that the proposed UMCD coding scheme performs at least as well as the MDS coding scheme in terms of broadcast rate. By characterizing two classes of index coding instances, we show that the gap between the broadcast rates of the recursive and ICC schemes and the UMCD scheme grows linearly with the number of messages.
Then, we extend the UMCD coding scheme to its vector version by applying it as a basic coding block to solve the subinstances.

Submitted 5 February, 2023; v1 submitted 22 January, 2021; originally announced January 2021.

arXiv:2010.09367 [pdf, other]

On Properties and Optimization of Information-theoretic Privacy Watchdog

Authors: Parastoo Sadeghi, Ni Ding, Thierry Rakotoarivelo

Abstract: We study the problem of privacy preservation in data sharing, where $S$ is a sensitive variable to be protected and $X$ is a non-sensitive useful variable correlated with $S$. Variable $X$ is randomized into variable $Y$, which will be shared or released according to $p_{Y|X}(y|x)$. We measure privacy leakage by information privacy (also known as log-lift in the literature), which guarantees mutual information privacy and differential privacy (DP). Let $\mathcal{X}_ε^c \subseteq \mathcal{X}$ contain elements in the alphabet of $X$ for which the absolute value of log-lift (abs-log-lift for short) is greater than a desired threshold $ε$. When elements $x \in \mathcal{X}_ε^c$ are randomized into $y \in \mathcal{Y}$, we derive the best upper bound on the abs-log-lift across the resultant pairs $(s,y)$. We then prove that this bound is achievable via an $X$-invariant randomization $p(y|x) = R(y)$ for $x,y \in \mathcal{X}_ε^c$. However, the utility measured by the mutual information $I(X;Y)$ is severely damaged in imposing a strict upper bound $ε$ on the abs-log-lift. To remedy this and inspired by the probabilistic ($ε$, $δ$)-DP, we propose a relaxed ($ε$, $δ$)-log-lift framework. To achieve this relaxation, we introduce a greedy algorithm which exempts some elements in $\mathcal{X}_ε^c$ from randomization, as long as their abs-log-lift is bounded by $ε$ with probability $1-δ$. Numerical results demonstrate the efficacy of this algorithm in achieving a better privacy-utility tradeoff.

Submitted 19 October, 2020; originally announced October 2020.

arXiv:2008.09702 [pdf, ps, other]

Low Influence, Utility, and Independence in Differential Privacy: A Curious Case of $3 \choose 2$

Authors: Rafael G. L. D'Oliveira, Salman Salamatian, Muriel Médard, Parastoo Sadeghi

Abstract: We study the relationship between randomized low influence functions and differentially private mechanisms. Our main aim is to formally determine whether differentially private mechanisms are low influence and whether low influence randomized functions can be differentially private. We show that differential privacy does not necessarily imply low influence in a formal sense. However, low influence implies approximate differential privacy. These results hold for both independent and non-independent randomized mechanisms, where an important instance of the former is the widely-used additive noise techniques in the differential privacy literature. Our study also reveals the interesting dynamics between utility, low influence, and independence of a differentially private mechanism. As the name of this paper suggests, we show that any two such features are simultaneously possible. However, in order to have a differentially private mechanism that has both utility and low influence, even under a very mild utility condition, one has to employ non-independent mechanisms.

Submitted 7 February, 2021; v1 submitted 21 August, 2020; originally announced August 2020.

arXiv:2007.09374 [pdf, other]

Differentially Private Mechanisms for Count Queries

Authors: Parastoo Sadeghi, Shahab Asoodeh, Flavio du Pin Calmon

Abstract: In this paper, we consider the problem of responding to a count query (or any other integer-valued queries) evaluated on a dataset containing sensitive attributes. To protect the privacy of individuals in the dataset, a standard practice is to add continuous noise to the true count. We design a differentially-private mechanism which adds integer-valued noise allowing the released output to remain integer. As a trade-off between utility and privacy, we derive privacy parameters $\eps$ and $δ$ in terms of the probability of releasing an erroneous count under the assumption that the true count is no smaller than half the support size of the noise. We then numerically demonstrate that our mechanism provides a higher privacy guarantee compared to the discrete Gaussian mechanism that was recently proposed in the literature.

Submitted 18 July, 2020; originally announced July 2020.

arXiv:2004.02076 [pdf, ps, other]

Independent User Partition Multicast Scheme for the Groupcast Index Coding Problem

Authors: Arman Sharififar, Neda Aboutorab, Yucheng Liu, Parastoo Sadeghi

Abstract: The groupcast index coding (GIC) problem is a generalization of the index coding problem, where one packet can be demanded by multiple users. In this paper, we propose a new coding scheme called independent user partition multicast (IUPM) for the GIC problem. The novelty of this scheme compared to the user partition multicast (UPM) (Shanmugam \textit{et al.}, 2015) is in removing redundancies in the UPM solution by eliminating the linearly dependent coded packets. We also prove that the UPM scheme subsumes the packet partition multicast (PPM) scheme (Tehrani \textit{et al.}, 2012). Hence, the IUPM scheme is a generalization of both PPM and UPM schemes. Furthermore, inspired by jointly considering users and packets, we modify the approximation partition multicast (CAPM) scheme (Unal and Wagner, 2016) to achieve a new polynomial-time algorithm for solving the general GIC problem. We characterize a class of GIC problems with $\frac{k(k-1)}{2}$ packets, for any integer $k\geq 2$, for which the IUPM scheme is optimal. We also prove that for this class, the broadcast rate of the proposed new heuristic algorithm is $k$, while the broadcast rate of the CAPM scheme is $\mathcal{O}(k^2)$.

Submitted 4 April, 2020; originally announced April 2020.

arXiv:2001.07296 [pdf, ps, other]

Secure Index Coding with Security Constraints on Receivers

Authors: Yucheng Liu, Parastoo Sadeghi, Neda Aboutorab, Arman Sharififar

Abstract: Index coding is concerned with efficient broadcast of a set of messages to receivers in the presence of receiver side information. In this paper, we study the secure index coding problem with security constraints on the receivers themselves. That is, for each receiver there is a single legitimate message it needs to decode and a prohibited message list, none of which should be decoded by that receiver. To this end, our contributions are threefold. We first introduce a secure linear coding scheme, which is an extended version of the fractional local partial clique covering scheme that was originally devised for non-secure index coding. We then develop two information-theoretic bounds on the performance of any valid secure index code, namely secure polymatroidal outer bound (on the capacity region) and secure maximum acyclic induced subgraph lower bound (on the broadcast rate). The structure of these bounds leads us to further develop two necessary conditions for a given index coding problem to be securely feasible (i.e., to have nonzero rates).

Submitted 15 April, 2020; v1 submitted 20 January, 2020; originally announced January 2020.

Comments: 6 pages; a shortened version submitted to the International Symposium on Information Theory and Its Applications (ISITA) 2020

arXiv:2001.06828 [pdf, ps, other]

Privacy-Utility Tradeoff in a Guessing Framework Inspired by Index Coding

Authors: Yucheng Liu, Ni Ding, Parastoo Sadeghi, Thierry Rakotoarivelo

Abstract: This paper studies the tradeoff in privacy and utility in a single-trial multi-terminal guessing (estimation) framework using a system model that is inspired by index coding. There are $n$ independent discrete sources at a data curator. There are $m$ legitimate users and one adversary, each with some side information about the sources. The data curator broadcasts a distorted function of sources to legitimate users, which is also overheard by the adversary. In terms of utility, each legitimate user wishes to perfectly reconstruct some of the unknown sources and attain a certain gain in the estimation correctness for the remaining unknown sources. In terms of privacy, the data curator wishes to minimize the maximal leakage: the worst-case guessing gain of the adversary in estimating any target function of its unknown sources after receiving the broadcast data. Given the system settings, we derive fundamental performance lower bounds on the maximal leakage to the adversary, which are inspired by the notion of confusion graph and performance bounds for the index coding problem. We also detail a greedy privacy enhancing mechanism, which is inspired by the agglomerative clustering algorithms in the information bottleneck and privacy funnel problems.

Submitted 18 June, 2020; v1 submitted 19 January, 2020; originally announced January 2020.

Comments: 6 pages; accepted by IEEE International Symposium on Information Theory (ISIT) 2020

arXiv:1912.11814 [pdf, ps, other]

Part II: A Practical Approach for Successive Omniscience

Authors: Ni Ding, Parastoo Sadeghi, Thierry Rakotoarivelo

Abstract: In Part I, we studied the communication for omniscience (CO) problem and proposed a parametric (PAR) algorithm to determine the minimum sum-rate at which a set of users indexed by a finite set $V$ attain omniscience. The omniscience in CO refers to the status that each user in $V$ recovers the observations of a multiple random source. It is called the global omniscience in this paper in contrast to the study of the successive omniscience (SO), where the local omniscience is attained subsequently in user subsets. By inputting a lower bound on the minimum sum-rate for CO, we apply the PAR algorithm to search for a complimentary subset $X_* \subsetneq V$ such that if the local omniscience in $X_*$ is reached first, the global omniscience whereafter can still be attained with the minimum sum-rate. We further utilize the outputs of the PAR algorithm to outline a multi-stage SO approach that is characterized by $K \leq |V| - 1$ complimentary subsets $X_*^{(k)}, \forall k \in \{1,\dotsc,K\}$ forming a nesting sequence $X_*^{(1)} \subsetneq \dotsc \subsetneq X_*^{(K)} = V$. Starting from stage $k = 1$, the local omniscience in $X_*^{(k)}$ is attained at each stage $k$ until the final global omniscience in $X_*^{(K)} = V$. A $|X_*^{(k)}|$-dimensional local omniscience achievable rate vector is also derived for each stage $k$ designating individual users' transmitting rates. The sum-rate of this rate vector in the last stage $K$ coincides with the minimized sum-rate for the global omniscience.

Submitted 26 December, 2019; originally announced December 2019.

Comments: 12 pages, 2 figures

arXiv:1912.11808 [pdf, ps, other]

Part I: Improving Computational Efficiency of Communication for Omniscience

Authors: Ni Ding, Parastoo Sadeghi, Thierry Rakotoarivelo

Abstract: Communication for omniscience (CO) refers to the problem where the users in a finite set $V$ observe a discrete multiple random source and want to exchange data over broadcast channels to reach omniscience, the state where everyone recovers the entire source. This paper studies how to improve the computational complexity for the problem of minimizing the sum-rate for attaining omniscience in $V$. While the existing algorithms rely on the submodular function minimization (SFM) techniques and complete in $O(|V|^2 \cdot \text{SFM}(|V|))$ time, we prove the strict strong map property of the nesting SFM problem. We propose a parametric (PAR) algorithm that utilizes the parametric SFM techniques and reduces the complexity to $O(|V| \cdot \text{SFM}(|V|))$. The output of the PAR algorithm is in fact the segmented Dilworth truncation of the residual entropy for all minimum sum-rate estimates $α$, which characterizes the principal sequence of partitions (PSP) and solves some related problems: It not only determines the secret capacity, a dual problem to CO, and the network strength of a graph, but also outlines the hierarchical solution to a combinatorial clustering problem.

Submitted 18 December, 2020; v1 submitted 26 December, 2019; originally announced December 2019.

Comments: 16 pages, 3 figures

arXiv:1909.11850 [pdf, other]

Improved Lower Bounds for Pliable Index Coding using Absent Receivers

Authors: Lawrence Ong, Badri N. Vellambi, Jörg Kliewer, Parastoo Sadeghi

Abstract: This paper studies pliable index coding, in which a sender broadcasts information to multiple receivers through a shared broadcast medium, and the receivers each have some message a priori and want any message they do not have. An approach, based on receivers that are absent from the problem, was previously proposed to find lower bounds on the optimal broadcast rate. In this paper, we introduce new techniques to obtain better lower bounds, and derive the optimal broadcast rates for new classes of the problems, including all problems with up to four absent receivers.

Submitted 1 October, 2019; v1 submitted 25 September, 2019; originally announced September 2019.

Comments: An extended version of the same-titled paper submitted to a conference

arXiv:1903.01001 [pdf, ps, other]

Improving Computational Efficiency of Communication for Omniscience and Successive Omniscience

Authors: Ni Ding, Parastoo Sadeghi, Thierry Rakotoarivelo

Abstract: For a group of users in $V$ where everyone observes a component of a discrete multiple random source, the process in which users exchange data so as to reach omniscience, the state where everyone recovers the entire source, is called communication for omniscience (CO). We first consider how to improve the existing complexity $O(|V|^2 \cdot \text{SFM}(|V|))$ of minimizing the sum of communication rates in CO, where $\text{SFM}(|V|)$ denotes the complexity of minimizing a submodular function. We reveal some structured property in an existing coordinate saturation algorithm: the resulting rate vector and the corresponding partition of $V$ are segmented in $α$, the estimation of the minimum sum-rate. A parametric (PAR) algorithm is then proposed where, instead of a particular $α$, we search the critical points that fully determine the segmented variables for all $α$ so that they converge to the solution to the minimum sum-rate problem and the overall complexity reduces to $O(|V| \cdot \text{SFM}(|V|))$. For the successive omniscience (SO), we consider how to attain local omniscience in some complimentary user subset so that the overall sum-rate for the global omniscience still remains minimum. While the existing algorithm only determines a complimentary user subset in $O(|V| \cdot \text{SFM}(|V|))$ time, we show that, if a lower bound on the minimum sum-rate is applied to the segmented variables in the PAR algorithm, not only a complimentary subset, but also an optimal rate vector for attaining the local omniscience in it are returned in $O(|V| \cdot \text{SFM}(|V|))$ time.

Submitted 3 March, 2019; originally announced March 2019.

Comments: 8 pages, 3 figures

arXiv:1902.03706 [pdf, ps, other]

Attaining Fairness in Communication for Omniscience

Authors: Ni Ding, Parastoo Sadeghi, David Smith, Thierry Rakotoarivelo

Abstract: This paper studies how to attain fairness in communication for omniscience, where a set of users exchange their observations of a discrete multiple random source to attain omniscience, the state in which all users recover the entire source. The optimal rate region containing all source coding rate vectors that achieve the omniscience with the minimum sum rate is shown to coincide with the core (the solution set) of a coalitional game. Two game-theoretic fairness solutions are studied: the Shapley value and the egalitarian solution. It is shown that the Shapley value assigns each user the source coding rate measured by his/her remaining information of the multiple source given the common randomness that is shared by all users, while the egalitarian solution simply distributes the rates as evenly as possible in the core. To avoid the exponentially growing complexity of obtaining the Shapley value, a polynomial-time approximation method is proposed by utilizing the fact that the Shapley value is the mean value over all extreme points in the core. In addition, a steepest descent algorithm is proposed which converges in polynomial time to the fractional egalitarian solution in the core that can be implemented by network coding schemes. Finally, it is shown that the game can be decomposed into subgames so that both the Shapley value and the egalitarian solution can be obtained within each subgame in a distributed manner with reduced complexity.

Submitted 10 February, 2019; originally announced February 2019.

Comments: 12 pages, 5 figures

arXiv:1901.09183 [pdf, other]

Generalized Alignment Chain: Improved Converse Results for Index Coding

Authors: Yucheng Liu, Parastoo Sadeghi

Abstract: In this paper, we study the information-theoretic converse for the index coding problem. We generalize the definition for the alignment chain, introduced by Maleki et al., to capture more flexible relations among interfering messages at each receiver. Based on this, we derive improved converse results for the single-server index coding problem. Compared to the maximum acyclic induced subgraph (MAIS) bound, the new bounds are always as tight and can strictly outperform the MAIS bound. They can also be useful for large problems, where the generally tighter polymatroidal bound is computationally impractical. We then extend these new bounds to the multi-server index coding problem. We also present a separate, but related result where we identify a smaller single-server index coding instance, compared to those identified in the literature, for which non-Shannon-type inequalities are necessary to give a tighter converse.

Submitted 28 May, 2019; v1 submitted 26 January, 2019; originally announced January 2019.

Comments: A shorter version has been accepted by the 2019 IEEE International Symposium on Information Theory (ISIT)

arXiv:1901.06629 [pdf, ps, other]

A Submodularity-based Agglomerative Clustering Algorithm for the Privacy Funnel

Authors: Ni Ding, Parastoo Sadeghi

Abstract: For the privacy funnel (PF) problem, we propose an efficient iterative agglomerative clustering algorithm based on the minimization of the difference of submodular functions (IAC-MDSF). For a data curator that wants to share the data $X$ correlated with the sensitive information $S$, the PF problem is to generate the sanitized data $\hat{X}$ that maintains a specified utility/fidelity threshold on $I(X; \hat{X})$ while minimizing the privacy leakage $I(S; \hat{X})$. Our IAC-MDSF algorithm starts with the original alphabet $\hat{\mathcal{X}} := \mathcal{X}$ and iteratively merges the elements in the current alphabet $\hat{\mathcal{X}}$ that minimize the Lagrangian function $I(S;\hat{X}) - λI(X;\hat{X})$. We prove that the best merge in each iteration of IAC-MDSF can be searched efficiently over all subsets of $\hat{\mathcal{X}}$ by the existing MDSF algorithms. We show that the IAC-MDSF algorithm also applies to the information bottleneck (IB), a dual problem to PF. By varying the value of the Lagrangian multiplier $λ$, we obtain the experimental results on a heart disease data set in terms of the Pareto frontier: $I(S;\hat{X})$ vs. $-I(X;\hat{X})$. We show that our IAC-MDSF algorithm outperforms the existing iterative pairwise merge approaches for both PF and IB and is computationally much less complex.

Submitted 12 February, 2019; v1 submitted 20 January, 2019; originally announced January 2019.

Comments: 6 pages, 4 figures

arXiv:1809.03615 [pdf, other]

On the Capacity Region for Secure Index Coding

Authors: Yuxin Liu, Badri N. Vellambi, Young-Han Kim, Parastoo Sadeghi

Abstract: We study the index coding problem in the presence of an eavesdropper, where the aim is to communicate without allowing the eavesdropper to learn any single message aside from the messages it may already know as side information. We establish an outer bound on the underlying secure capacity region of the index coding problem, which includes polymatroidal and security constraints, as well as the set of additional decoding constraints for legitimate receivers. We then propose a secure variant of the composite coding scheme, which yields an inner bound on the secure capacity region of the index coding problem. For the achievability of secure composite coding, a secret key with vanishingly small rate may be needed to ensure that each legitimate receiver who wants the same message as the eavesdropper knows at least two more messages than the eavesdropper. For all securely feasible index coding problems with four or fewer messages, our numerical results establish the secure index coding capacity region.

Submitted 10 September, 2018; originally announced September 2018.

arXiv:1805.01583 [pdf, ps, other]

Fairness in Multiterminal Data Compression: A Splitting Method for The Egalitarian Solution

Authors: Ni Ding, David Smith, Parastoo Sadeghi, Thierry Rakotoarivelo

Abstract: This paper proposes a novel splitting (SPLIT) algorithm to achieve fairness in the multiterminal lossless data compression problem. It finds the egalitarian solution in the Slepian-Wolf region and completes in strongly polynomial time. We show that the SPLIT algorithm adaptively updates the source coding rates to the optimal solution, while recursively splitting the terminal set, enabling parallel and distributed computation. The result of an experiment demonstrates a significant reduction in computation time by the parallel implementation when the number of terminals becomes large. The achieved egalitarian solution is also shown to be superior to the Shapley value in distributed networks, e.g., wireless sensor networks, in that it best balances the nodes' energy consumption and is far less computationally complex to obtain.

Submitted 3 May, 2018; originally announced May 2018.

Comments: 5 pages, 4 figures

Journal ref: ICASSP2018 proceedings

Showing 1–50 of 105 results for author: Sadeghi, P