
Data Measurements for Decentralized Data Markets

Authors: Charles Lu, Mohammad Mohammadi Amiri, Ramesh Raskar

Abstract: Decentralized data markets can provide more equitable forms of data acquisition for machine learning. However, to realize practical marketplaces, efficient techniques for seller selection need to be developed. We propose and benchmark federated data measurements to allow a data buyer to find sellers with relevant and diverse datasets. Diversity and relevance measures enable a buyer to make relative comparisons between sellers without requiring intermediate brokers and training task-dependent models.

Submitted 6 June, 2024; originally announced June 2024.

Comments: 20 pages, 11 figures

arXiv:2208.12354 [pdf, other]

Fundamentals of Task-Agnostic Data Valuation

Authors: Mohammad Mohammadi Amiri, Frederic Berdoz, Ramesh Raskar

Abstract: We study valuing the data of a data owner/seller for a data seeker/buyer. Data valuation is often carried out for a specific task assuming a particular utility metric, such as test accuracy on a validation set, that may not exist in practice. In this work, we focus on task-agnostic data valuation without any validation requirements. The data buyer has access to a limited amount of data (which could be publicly available) and seeks more data samples from a data seller. We formulate the problem as estimating the differences in the statistical properties of the data at the seller with respect to the baseline data available at the buyer. We capture these statistical differences through second moments by measuring the diversity and relevance of the seller's data for the buyer; we estimate these measures through queries to the seller without requesting raw data. We design the queries so that the seller is blind to the buyer's raw data and has no knowledge with which to fabricate responses to the queries to obtain a desired outcome of the diversity and relevance trade-off. We show through extensive experiments on real tabular and image datasets that the proposed estimates capture the diversity and relevance of the seller's data for the buyer.

Submitted 25 August, 2022; originally announced August 2022.

arXiv:2207.03652 [pdf, other]

Private independence testing across two parties

Authors: Praneeth Vepakomma, Mohammad Mohammadi Amiri, Clément L. Canonne, Ramesh Raskar, Alex Pentland

Abstract: We introduce $π$-test, a privacy-preserving algorithm for testing statistical independence between data distributed across multiple parties. Our algorithm relies on privately estimating the distance correlation between datasets, a quantitative measure of independence introduced in Székely et al. [2007]. We establish both additive and multiplicative error bounds on the utility of our differentially private test, which we believe will find applications in a variety of distributed hypothesis testing settings involving sensitive data.
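
For reference, the sample distance correlation of Székely et al. [2007], on which the abstract builds, can be sketched as follows. This is a minimal, non-private NumPy illustration under stated assumptions: the function names are hypothetical, both datasets are held in one place, and the differentially private estimation that defines the $π$-test is not reproduced here.

```python
import numpy as np


def _centered_distances(x):
    """Pairwise Euclidean distance matrix of the samples, double-centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1-D input as n scalar samples
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Subtract row and column means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between paired samples x and y (value in [0, 1])."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()    # squared sample distance covariance
    dvar2_x = (a * a).mean()  # squared sample distance variance of x
    dvar2_y = (b * b).mean()  # squared sample distance variance of y
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0  # convention when one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))
```

A value near 0 suggests independence, while an exact affine relationship such as y = 3x + 2 gives a value of 1.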

Submitted 26 September, 2023; v1 submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.05723 [pdf, ps, other]

Communication-Efficient Federated Learning over MIMO Multiple Access Channels

Authors: Yo-Seb Jeon, Mohammad Mohammadi Amiri, Namyoon Lee

Abstract: Communication efficiency is of importance for wireless federated learning systems. In this paper, we propose a communication-efficient strategy for federated learning over multiple-input multiple-output (MIMO) multiple access channels (MACs). The proposed strategy comprises two components. When sending a locally computed gradient, each device compresses a high-dimensional local gradient to multiple lower-dimensional gradient vectors using block sparsification. When receiving a superposition of the compressed local gradients via a MIMO-MAC, a parameter server (PS) performs joint MIMO detection and sparse local-gradient recovery. Inspired by the turbo decoding principle, our joint detection-and-recovery algorithm accurately recovers the high-dimensional local gradients by iteratively exchanging beliefs between the MIMO detection and sparse local-gradient recovery outputs. We then analyze the reconstruction error of the proposed algorithm and its impact on the convergence rate of federated learning. In simulations, our gradient compression and joint detection-and-recovery methods reduce the communication cost significantly while achieving classification accuracy identical to that of the case without any compression.

Submitted 12 June, 2022; originally announced June 2022.

arXiv:2107.03510 [pdf, ps, other]

Federated Learning with Downlink Device Selection

Authors: Mohammad Mohammadi Amiri, Sanjeev R. Kulkarni, H. Vincent Poor

Abstract: We study federated edge learning, where a global model is trained collaboratively using privacy-sensitive data at the edge of a wireless network. A parameter server (PS) keeps track of the global model and shares it with the wireless edge devices for training using their private local data. The devices then transmit their local model updates, which are used to update the global model, to the PS. The algorithm, which involves transmission over PS-to-device and device-to-PS links, continues until the convergence of the global model or lack of any participating devices. In this study, we consider device selection based on downlink channels over which the PS shares the global model with the devices. Performing digital downlink transmission, we design a partial device participation framework where a subset of the devices is selected for training at each iteration. Owing to the shared nature of the broadcast channel, the participating devices can then have a better estimate of the global model than in the full device participation case, at the price of updating the global model with respect to a smaller set of data. At each iteration, the PS broadcasts different quantized global model updates to different participating devices based on the last global model estimates available at the devices. We investigate the best number of participating devices through experimental results for image classification using the MNIST dataset with biased distribution.

Submitted 7 July, 2021; originally announced July 2021.

Comments: accepted in IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2021

arXiv:2010.10030 [pdf, ps, other]

Blind Federated Edge Learning

Authors: Mohammad Mohammadi Amiri, Tolga M. Duman, Deniz Gunduz, Sanjeev R. Kulkarni, H. Vincent Poor

Abstract: We study federated edge learning (FEEL), where wireless edge devices, each with its own dataset, learn a global model collaboratively with the help of a wireless access point acting as the parameter server (PS). At each iteration, wireless devices perform local updates using their local data and the most recent global model received from the PS, and send their local updates to the PS over a wireless fading multiple access channel (MAC). The PS then updates the global model according to the signal received over the wireless MAC, and shares it with the devices. Motivated by the additive nature of the wireless MAC, we propose an analog 'over-the-air' aggregation scheme, in which the devices transmit their local updates in an uncoded fashion. Unlike recent literature on over-the-air edge learning, here we assume that the devices do not have channel state information (CSI), while the PS has imperfect CSI. Instead, the PS is equipped with multiple antennas to alleviate the destructive effect of the channel, exacerbated due to the lack of perfect CSI. We design a receive beamforming scheme at the PS, and show that it can compensate for the lack of perfect CSI when the PS has a sufficient number of antennas. We also derive the convergence rate of the proposed algorithm, highlighting the impact of the lack of perfect CSI, as well as the number of PS antennas. Both the experimental results and the convergence analysis illustrate the performance improvement of the proposed algorithm with the number of PS antennas, where the wireless fading MAC becomes deterministic despite the lack of perfect CSI when the PS has a sufficiently large number of antennas.

Submitted 19 October, 2020; originally announced October 2020.

Comments: submitted for publication. arXiv admin note: text overlap with arXiv:1907.03909

arXiv:2009.13269 [pdf, other]

Communicate to Learn at the Edge

Authors: Deniz Gunduz, David Burth Kurka, Mikolaj Jankowski, Mohammad Mohammadi Amiri, Emre Ozfatura, Sreejith Sreekumar

Abstract: Bringing the success of modern machine learning (ML) techniques to mobile devices can enable many new services and businesses, but also poses significant technical and research challenges. Two factors that are critical for the success of ML algorithms are massive amounts of data and processing power, both of which are plentiful, yet highly distributed at the network edge. Moreover, edge devices are connected through bandwidth- and power-limited wireless links that suffer from noise, time-variations, and interference. Information and coding theory have laid the foundations of reliable and efficient communications in the presence of channel imperfections, whose application in modern wireless networks has been a tremendous success. However, there is a clear disconnect between the current coding and communication schemes, and the ML algorithms deployed at the network edge. In this paper, we challenge the current approach that treats these problems separately, and argue for a joint communication and learning paradigm for both the training and inference stages of edge learning.

Submitted 28 September, 2020; originally announced September 2020.

Comments: 13 pages, 5 figures

arXiv:2008.13492 [pdf, other]

Wireless for Machine Learning

Authors: Henrik Hellström, José Mairton B. da Silva Jr, Mohammad Mohammadi Amiri, Mingzhe Chen, Viktoria Fodor, H. Vincent Poor, Carlo Fischione

Abstract: As data generation increasingly takes place on devices without a wired connection, machine learning (ML) related traffic will be ubiquitous in wireless networks. Many studies have shown that traditional wireless protocols are highly inefficient or unsustainable to support ML, which creates the need for new wireless communication methods. In this survey, we give an exhaustive review of the state-of-the-art wireless methods that are specifically designed to support ML services over distributed datasets. Currently, there are two clear themes within the literature: analog over-the-air computation and digital radio resource management optimized for ML. This survey gives a comprehensive introduction to these methods, reviews the most important works, highlights open problems, and discusses application scenarios.

Submitted 9 June, 2022; v1 submitted 31 August, 2020; originally announced August 2020.

arXiv:2008.11141 [pdf, ps, other]

Convergence of Federated Learning over a Noisy Downlink

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz, Sanjeev R. Kulkarni, H. Vincent Poor

Abstract: We study federated learning (FL), where power-limited wireless devices utilize their local datasets to collaboratively train a global model with the help of a remote parameter server (PS). The PS has access to the global model and shares it with the devices for local training, and the devices return the result of their local updates to the PS to update the global model. This framework requires downlink transmission from the PS to the devices and uplink transmission from the devices to the PS. The goal of this study is to investigate the impact of the bandwidth-limited shared wireless medium in both the downlink and uplink on the performance of FL with a focus on the downlink. To this end, the downlink and uplink channels are modeled as fading broadcast and multiple access channels, respectively, both with limited bandwidth. For downlink transmission, we first introduce a digital approach, where a quantization technique is employed at the PS to broadcast the global model update at a common rate such that all the devices can decode it. Next, we propose analog downlink transmission, where the global model is broadcast by the PS in an uncoded manner. We consider analog transmission over the uplink in both cases. We further analyze the convergence behavior of the proposed analog approach assuming that the uplink transmission is error-free. Numerical experiments show that the analog downlink approach provides significant improvement over the digital one, despite a significantly lower transmit power at the PS. The experimental results corroborate the convergence results, and show that a smaller number of local iterations should be used when the data distribution is more biased, and also when the devices have a better estimate of the global model in the analog downlink approach.

Submitted 25 August, 2020; originally announced August 2020.

Comments: submitted for publication

arXiv:2006.10672 [pdf, ps, other]

Federated Learning With Quantized Global Model Updates

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz, Sanjeev R. Kulkarni, H. Vincent Poor

Abstract: We study federated learning (FL), which enables mobile devices to utilize their local datasets to collaboratively train a global model with the help of a central server, while keeping data localized. At each iteration, the server broadcasts the current global model to the devices for local training, and aggregates the local model updates from the devices to update the global model. Previous work on the communication efficiency of FL has mainly focused on the aggregation of model updates from the devices, assuming perfect broadcasting of the global model. In this paper, we instead consider broadcasting a compressed version of the global model. This is to further reduce the communication cost of FL, which can be particularly limiting when the global model is to be transmitted over a wireless medium. We introduce a lossy FL (LFL) algorithm, in which both the global model and the local model updates are quantized before being transmitted. We analyze the convergence behavior of the proposed LFL algorithm assuming the availability of accurate local model updates at the server. Numerical experiments show that the proposed LFL scheme, which quantizes the global model update (with respect to the global model estimate at the devices) rather than the global model itself, significantly outperforms other existing schemes studying quantization of the global model in the PS-to-device direction. Also, the performance loss of the proposed scheme is marginal compared to the fully lossless approach, where the PS and the devices transmit their messages entirely without any quantization.
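
The idea of quantizing the global model update relative to the devices' current estimate, rather than the model itself, can be sketched as follows. This is only an illustration under stated assumptions: the unbiased uniform stochastic quantizer and the function names are hypothetical choices, since the abstract does not specify the quantizer used in LFL.

```python
import numpy as np


def stochastic_quantize(v, levels, rng):
    """Unbiased uniform stochastic quantizer (an assumed choice, not the paper's).

    Each entry is mapped to one of `levels` + 1 magnitudes in [0, max|v|],
    so the per-entry error is at most max|v| / levels.
    """
    scale = np.max(np.abs(v))
    if scale == 0:
        return np.zeros_like(v)
    normalized = np.abs(v) / scale * levels  # values in [0, levels]
    lower = np.floor(normalized)
    prob_up = normalized - lower             # round up with this probability
    rounded = lower + (rng.random(v.shape) < prob_up)
    return np.sign(v) * rounded * scale / levels


def broadcast_update(global_model, device_estimate, levels, rng):
    """Quantize the difference to the device's estimate and apply it at the device."""
    delta = global_model - device_estimate
    return device_estimate + stochastic_quantize(delta, levels, rng)
```

Because the difference between the global model and a reasonably fresh device estimate typically has a much smaller range than the model itself, the same number of quantization levels yields a smaller absolute error.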

Submitted 6 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

arXiv:2003.08059 [pdf, ps, other]

A Compressive Sensing Approach for Federated Learning over Massive MIMO Communication Systems

Authors: Yo-Seb Jeon, Mohammad Mohammadi Amiri, Jun Li, H. Vincent Poor

Abstract: Federated learning is a privacy-preserving approach to train a global model at a central server by collaborating with wireless devices, each with its own local training data set. In this paper, we present a compressive sensing approach for federated learning over massive multiple-input multiple-output communication systems in which the central server equipped with a massive antenna array communicates with the wireless devices. One major challenge in system design is to accurately reconstruct, at the central server, the local gradient vectors that are computed and sent by the wireless devices. To overcome this challenge, we first establish a transmission strategy to construct sparse transmitted signals from the local gradient vectors at the devices. We then propose a compressive sensing algorithm enabling the server to iteratively find the linear minimum-mean-square-error (LMMSE) estimate of the transmitted signal by exploiting its sparsity. We also derive an analytical threshold for the residual error at each iteration, to design the stopping criterion of the proposed algorithm. We show that for a sparse transmitted signal, the proposed algorithm requires lower computational complexity than LMMSE. Simulation results demonstrate that the presented approach outperforms conventional linear beamforming approaches and reduces the performance gap between federated learning and centralized learning with perfect reconstruction.

Submitted 5 August, 2020; v1 submitted 18 March, 2020; originally announced March 2020.

Comments: The title of the paper has been changed from "Gradient Estimation for Federated Learning over Massive MIMO Communication Systems" to "A Compressive Sensing Approach for Federated Learning over Massive MIMO Communication Systems"

arXiv:2001.10402 [pdf, ps, other]

Convergence of Update Aware Device Scheduling for Federated Learning at the Wireless Edge

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz, Sanjeev R. Kulkarni, H. Vincent Poor

Abstract: We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets collaboratively train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources, while each participating device must compress its model update to fit its link capacity. We design novel scheduling and resource allocation policies that decide on the subset of the devices to transmit at each round, and how the resources should be allocated among the participating devices, not only based on their channel conditions, but also on the significance of their local model updates. We then establish convergence of a wireless FL algorithm with device scheduling, where devices have limited capacity to convey their messages. The results of numerical experiments show that the proposed scheduling policy, based on both the channel conditions and the significance of the local model updates, provides a better long-term performance than scheduling policies based only on either of the two metrics individually. Furthermore, we observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, while when the data distribution is non-i.i.d., scheduling multiple devices at each round improves the performance. This observation is verified by the convergence result, which shows that the number of scheduled devices should increase for a less diverse and more biased data distribution.

Submitted 8 May, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

Comments: submitted for publication

arXiv:2001.01255 [pdf, other]

Multi-Antenna Coded Content Delivery with Caching: A Low-Complexity Solution

Authors: Junlin Zhao, Mohammad Mohammadi Amiri, Deniz Gündüz

Abstract: We study downlink beamforming in a single-cell network with a multi-antenna base station serving cache-enabled users. Assuming a library of files with a common rate, we formulate the minimum transmit power with proactive caching and coded delivery as a non-convex optimization problem. While this multiple multicast problem can be efficiently solved by successive convex approximation (SCA), the complexity of the problem grows exponentially with the number of subfiles delivered to each user in each time slot, which itself grows exponentially with the number of users. We introduce a low-complexity alternative through time-sharing that limits the number of subfiles received by a user in each time slot. We then consider the joint design of beamforming and content delivery with sparsity constraints to limit the number of subfiles received by a user in each time slot. Numerical simulations show that the low-complexity scheme has only a small performance gap to that obtained by solving the joint problem with sparsity constraints, and outperforms state-of-the-art results at all signal-to-noise ratio (SNR) and rate values with a sufficient number of transmit antennas. A lower bound on the achievable degrees-of-freedom (DoF) of the low-complexity scheme is derived to characterize its performance in the high SNR regime.

Submitted 23 July, 2020; v1 submitted 5 January, 2020; originally announced January 2020.

Comments: to appear in IEEE Transactions on Wireless Communications

arXiv:1907.09769 [pdf, ps, other]

Federated Learning over Wireless Fading Channels

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz

Abstract: We study federated machine learning at the wireless network edge, where power-limited wireless devices, each with its own dataset, build a joint model with the help of a remote parameter server (PS). We consider a bandwidth-limited fading multiple access channel (MAC) from the wireless devices to the PS, and propose various techniques to implement distributed stochastic gradient descent (DSGD). We first propose a digital DSGD (D-DSGD) scheme, in which one device is selected opportunistically for transmission at each iteration based on the channel conditions; the scheduled device quantizes its gradient estimate to a finite number of bits imposed by the channel condition, and transmits these bits to the PS in a reliable manner. Next, motivated by the additive nature of the wireless MAC, we propose a novel analog communication scheme, referred to as the compressed analog DSGD (CA-DSGD), where the devices first sparsify their gradient estimates while accumulating error, and project the resultant sparse vector into a low-dimensional vector for bandwidth reduction. Numerical results show that D-DSGD outperforms other digital approaches in the literature; however, in general the proposed CA-DSGD algorithm converges faster than the D-DSGD scheme and other schemes in the literature, and reaches a higher level of accuracy. We have observed that the gap between the analog and digital schemes increases when the datasets of devices are not independent and identically distributed (i.i.d.). Furthermore, the performance of the CA-DSGD scheme is shown to be robust against imperfect channel state information (CSI) at the devices. Overall, these results show clear advantages for the proposed analog over-the-air DSGD scheme, which suggests that learning and communication algorithms should be designed jointly to achieve the best end-to-end performance in machine learning applications at the wireless edge.
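
The device-side steps described for CA-DSGD (sparsification with error accumulation, followed by projection to a lower dimension) and the additive superposition over the MAC can be sketched as follows. This is a simplified illustration under stated assumptions: top-k selection, a shared random projection matrix, and an ideal MAC without fading or power control are choices made here, the function names are hypothetical, and the sparse recovery step at the PS is not shown.

```python
import numpy as np


def sparsify_with_error_feedback(gradient, residual, k):
    """Keep the k largest-magnitude entries (k >= 1); carry the rest to the next round."""
    corrected = gradient + residual           # add back previously dropped mass
    keep = np.argsort(np.abs(corrected))[-k:]
    sparse = np.zeros_like(corrected)
    sparse[keep] = corrected[keep]
    return sparse, corrected - sparse         # second output is the new residual


def over_the_air_sum(sparse_vectors, projection, noise_std=0.0, rng=None):
    """Idealized MAC: the PS receives the sum of the projected vectors plus noise."""
    received = sum(projection @ v for v in sparse_vectors)
    if noise_std > 0:
        if rng is None:
            rng = np.random.default_rng()
        received = received + noise_std * rng.normal(size=received.shape)
    return received
```

Because the projection is linear, the noiseless received signal equals the projection of the sum of the sparse vectors, which is why the PS can aim to recover the aggregate directly instead of each device's vector.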

Submitted 10 February, 2020; v1 submitted 23 July, 2019; originally announced July 2019.

Comments: to appear, IEEE Transactions on Wireless Communications

arXiv:1907.03909 [pdf, ps, other]

Collaborative Machine Learning at the Wireless Edge with Blind Transmitters

Authors: Mohammad Mohammadi Amiri, Tolga M. Duman, Deniz Gunduz

Abstract: We study wireless collaborative machine learning (ML), where mobile edge devices, each with its own dataset, carry out distributed stochastic gradient descent (DSGD) over-the-air with the help of a wireless access point acting as the parameter server (PS). At each iteration of the DSGD algorithm, wireless devices compute gradient estimates with their local datasets, and send them to the PS over a wireless fading multiple access channel (MAC). Motivated by the additive nature of the wireless MAC, we propose an analog DSGD scheme, in which the devices transmit scaled versions of their gradient estimates in an uncoded fashion. We assume that the channel state information (CSI) is available only at the PS. We instead allow the PS to employ multiple antennas to alleviate the destructive fading effect, which cannot be cancelled by the transmitters due to the lack of CSI. Theoretical analysis indicates that, with the proposed DSGD scheme, increasing the number of PS antennas mitigates the fading effect, and, in the limit, the effects of fading and noise disappear, and the PS receives aligned signals used to update the model parameter. The theoretical results are then corroborated with the experimental ones.

Submitted 8 July, 2019; originally announced July 2019.

arXiv:1903.03856 [pdf, other]

A Low-Complexity Cache-Aided Multi-antenna Content Delivery Scheme

Authors: Junlin Zhao, Mohammad Mohammadi Amiri, Deniz Gündüz

Abstract: We study downlink beamforming in a single-cell network with a multi-antenna base station (BS) serving cache-enabled users. For a given common rate of the files in the system, we first formulate the minimum transmit power with beamforming at the BS as a non-convex optimization problem. This corresponds to a multiple multicast problem, to which a stationary solution can be efficiently obtained through successive convex approximation (SCA). It is observed that the complexity of the problem grows exponentially with the number of subfiles delivered to each user in each time slot, which itself grows exponentially with the number of users in the system. Therefore, we introduce a low-complexity alternative through time-sharing that limits the number of subfiles that can be received by a user in each time slot. It is shown through numerical simulations that the reduced-complexity beamforming scheme has a minimal performance gap compared to transmitting all the subfiles jointly, and outperforms the state-of-the-art low-complexity scheme at all SNR and rate values with sufficient spatial degrees of freedom, and in the high SNR/high rate regime when the number of spatial degrees of freedom is limited.

Submitted 16 May, 2019; v1 submitted 9 March, 2019; originally announced March 2019.

Comments: Accepted in IEEE SPAWC 2019

arXiv:1901.00844 [pdf, ps, other]

doi 10.1109/TSP.2020.2981904

Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz

Abstract: We study federated machine learning (ML) at the wireless edge, where power- and bandwidth-limited wireless devices with local datasets carry out distributed stochastic gradient descent (DSGD) with the help of a remote parameter server (PS). Standard approaches assume separate computation and communication, where local gradient estimates are compressed and transmitted to the PS over orthogonal links. Following this digital approach, we introduce D-DSGD, in which the wireless devices employ gradient quantization and error accumulation, and transmit their gradient estimates to the PS over a multiple access channel (MAC). We then introduce a novel analog scheme, called A-DSGD, which exploits the additive nature of the wireless MAC for over-the-air gradient computation, and provide convergence analysis for this approach. In A-DSGD, the devices first sparsify their gradient estimates, and then project them to a lower dimensional space imposed by the available channel bandwidth. These projections are sent directly over the MAC without employing any digital code. Numerical results show that A-DSGD converges faster than D-DSGD thanks to its more efficient use of the limited bandwidth and the natural alignment of the gradient estimates over the channel. The improvement is particularly compelling at low power and low bandwidth regimes. We also illustrate for a classification problem that A-DSGD is more robust to bias in data distribution across devices, while D-DSGD significantly outperforms other digital schemes in the literature. We also observe that both D-DSGD and A-DSGD perform better by increasing the number of devices (while keeping the total dataset size constant), showing their ability in harnessing the computation power of edge devices.

Submitted 7 April, 2020; v1 submitted 3 January, 2019; originally announced January 2019.

Comments: IEEE Transactions on Signal Processing, Early Access, Mar. 2020

arXiv:1810.09992 [pdf, ps, other]

doi 10.1109/TSP.2019.2952051

Computation Scheduling for Distributed Machine Learning with Straggling Workers

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz

Abstract: We study scheduling of computation tasks across n workers in a large scale distributed learning problem with the help of a master. Computation and communication delays are assumed to be random, and redundant computations are assigned to workers in order to tolerate stragglers. We consider sequential computation of tasks assigned to a worker, while the result of each computation is sent to the master right after its completion. Each computation round, which can model an iteration of the stochastic gradient descent (SGD) algorithm, is completed once the master receives k distinct computations, referred to as the computation target. Our goal is to characterize the average completion time as a function of the computation load, which denotes the portion of the dataset available at each worker, and the computation target. We propose two computation scheduling schemes that specify the tasks assigned to each worker, as well as their computation schedule, i.e., the order of execution. Assuming a general statistical model for computation and communication delays, we derive the average completion time of the proposed schemes. We also establish a lower bound on the minimum average completion time by assuming prior knowledge of the random delays. Experimental results carried out on Amazon EC2 cluster show a significant reduction in the average completion time over existing coded and uncoded computing schemes. It is also shown numerically that the gap between the proposed scheme and the lower bound is relatively small, confirming the efficiency of the proposed scheduling design.

Submitted 23 May, 2019; v1 submitted 23 October, 2018; originally announced October 2018.

Comments: Submitted for publication

arXiv:1808.04835 [pdf, other]

Audience-Retention-Rate-Aware Caching and Coded Video Delivery with Asynchronous Demands

Authors: Qianqian Yang, Mohammad Mohammadi Amiri, Deniz Gündüz

Abstract: Most results on coded caching focus on a static scenario, in which a fixed number of users synchronously place their requests from a content library, and the performance is measured in terms of the latency in satisfying all of these demands. In practice, however, users start watching an online video content asynchronously over time, and often abort watching a video before it is completed. The latter behaviour is captured by the notion of audience retention rate, which measures the portion of a video content watched on average. In order to bring coded caching one step closer to practice, asynchronous user demands are considered in this paper, by allowing user demands to arrive randomly over time, and both the popularity of video files, and the audience retention rates are taken into account. A decentralized partial coded caching (PCC) scheme is proposed, together with two cache allocation schemes; namely the optimal cache allocation (OCA) and the popularity-based cache allocation (PCA), which allocate users' caches among different chunks of the video files in the library. Numerical results validate that the proposed PCC scheme, either with OCA or PCA, outperforms conventional uncoded caching as well as the state-of-the-art decentralized caching schemes, which consider only the file popularities, and are designed for synchronous demand arrivals. An information-theoretical lower bound on the average delivery rate is also presented.

Submitted 14 August, 2018; originally announced August 2018.

Comments: 30 pages, 5 figures

arXiv:1806.09894 [pdf, ps, other]

On the Capacity Region of a Cache-Aided Gaussian Broadcast Channel with Multi-Layer Messages

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz

Abstract: A cache-aided $K$-user Gaussian broadcast channel (BC) is studied. The transmitter has a library of $N$ files, from which each user requests one. The users are equipped with caches of different sizes, which are filled without the knowledge of the user requests in a centralized manner. Differently from the literature, it is assumed that each file can be delivered to different users at different rates, which may correspond to different quality representations of the underlying content, e.g., scalable coded video segments. Accordingly, instead of a single achievable rate, the system performance is characterized by a rate tuple, which corresponds to the vector of rates users' requests can be delivered at. The goal is to characterize the set of all achievable rate tuples for a given total cache capacity by designing joint cache and channel coding schemes together with cache allocation across users. Assuming that the users are ordered in increasing channel quality, each file is coded into $K$ layers, and only the first $k$ layers of the requested file are delivered to user $k$, $k=1,...,K$. Three different coding schemes are proposed, which differ in the way they deliver the coded contents over the BC; in particular, time-division, superposition, and dirty paper coding schemes are studied. Corresponding achievable rate regions are characterized, and compared with a novel outer bound. To the best of our knowledge, this is the first work studying the delivery of files at different rates over a cache-aided noisy BC.

Submitted 26 June, 2018; originally announced June 2018.

Comments: Part of this work was presented at the IEEE International Symposium on Information Theory, Colorado, USA, June 2018

arXiv:1712.03433 [pdf, ps, other]

Caching and Coded Delivery over Gaussian Broadcast Channels for Energy Efficiency

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz

Abstract: A cache-aided $K$-user Gaussian broadcast channel (BC) is considered. The transmitter has a library of $N$ equal-rate files, from which each user demands one. The impact of the equal-capacity receiver cache memories on the minimum required transmit power to satisfy all user demands is studied. Considering uniformly random demands across the library, both the minimum average power (averaged over all demand combinations) and the minimum peak power (minimum power required to satisfy all demand combinations) are studied. Upper bounds are presented on the minimum required average and peak transmit power as a function of the cache capacity considering both centralized and decentralized caching. The lower bounds on the minimum required average and peak power values are also derived assuming uncoded cache placement. The bounds for both the peak and average power values are shown to be tight in the centralized scenario through numerical simulations. The results in this paper show that proactive caching and coded delivery can provide significant energy savings in wireless networks.

Submitted 30 April, 2018; v1 submitted 9 December, 2017; originally announced December 2017.

Comments: IEEE Journal on Selected Areas in Communications, to appear

arXiv:1702.05454 [pdf, ps, other]

Cache-Aided Content Delivery over Erasure Broadcast Channels

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz

Abstract: A cache-aided broadcast network is studied, in which a server delivers contents to a group of receivers over a packet erasure broadcast channel (BC). The receivers are divided into two sets with regards to their channel qualities: the weak and strong receivers, where all the weak receivers have statistically worse channel qualities than all the strong receivers. The weak receivers, in order to compensate for the high erasure probability they encounter over the channel, are equipped with cache memories of equal size, while the receivers in the strong set have no caches. Data can be pre-delivered to weak receivers' caches over the off-peak traffic period before the receivers reveal their demands. Allowing arbitrary erasure probabilities for the weak and strong receivers, a joint caching and channel coding scheme, which divides each file into several subfiles, and applies a different caching and delivery scheme for each subfile, is proposed. It is shown that all the receivers, even those without any cache memories, benefit from the presence of caches across the network. An information theoretic trade-off between the cache size and the achievable rate is formulated. It is shown that the proposed scheme improves upon the state-of-the-art in terms of the achievable trade-off.

Submitted 31 May, 2017; v1 submitted 17 February, 2017; originally announced February 2017.

arXiv:1611.01579 [pdf, ps, other]

Decentralized Caching and Coded Delivery with Distinct Cache Capacities

Authors: Mohammad Mohammadi Amiri, Qianqian Yang, Deniz Gunduz

Abstract: Decentralized proactive caching and coded delivery is studied in a content delivery network, where each user is equipped with a cache memory, not necessarily of equal capacity. Cache memories are filled in advance during the off-peak traffic period in a decentralized manner, i.e., without the knowledge of the number of active users, their identities, or their particular demands. User demands are revealed during the peak traffic period, and are served simultaneously through an error-free shared link. The goal is to find the minimum delivery rate during the peak traffic period that is sufficient to satisfy all possible demand combinations. A group-based decentralized caching and coded delivery scheme is proposed, and it is shown to improve upon the state-of-the-art in terms of the minimum required delivery rate when there are more users in the system than files. Numerical results indicate that the improvement is more significant as the cache capacities of the users become more skewed. A new lower bound on the delivery rate is also presented, which provides a tighter bound than the classical cut-set bound.

Submitted 31 July, 2017; v1 submitted 4 November, 2016; originally announced November 2016.

Comments: to appear, IEEE Transactions on Communications

arXiv:1610.03792 [pdf, other]

Decentralized Coded Caching with Distinct Cache Capacities

Authors: Mohammad Mohammadi Amiri, Qianqian Yang, Deniz Gündüz

Abstract: Decentralized coded caching is studied for a content server with $N$ files, each of size $F$ bits, serving $K$ active users, each equipped with a cache of distinct capacity. It is assumed that the users' caches are filled in advance during the off-peak traffic period without the knowledge of the number of active users, their identities, or the particular demands. User demands are revealed during the peak traffic period, and are served simultaneously through an error-free shared link. A new decentralized coded caching scheme is proposed for this scenario, and it is shown to improve upon the state-of-the-art in terms of the required delivery rate over the shared link, when there are more users in the system than the number of files. Numerical results indicate that the improvement becomes more significant as the cache capacities of the users become more skewed.

Submitted 12 October, 2016; originally announced October 2016.

Comments: To be presented in ASILOMAR conference, 2016

arXiv:1605.01993 [pdf, other]

Coded Caching for a Large Number Of Users

Authors: Mohammad Mohammadi Amiri, Qianqian Yang, Deniz Gunduz

Abstract: Information theoretic analysis of a coded caching system is considered, in which a server with a database of N equal-size files, each F bits long, serves K users. Each user is assumed to have a local cache that can store M files, i.e., capacity of MF bits. Proactive caching to user terminals is considered, in which the caches are filled by the server in advance during the placement phase, without knowing the user requests. Each user requests a single file, and all the requests are satisfied simultaneously through a shared error-free link during the delivery phase. First, centralized coded caching is studied assuming both the number and the identity of the active users in the delivery phase are known by the server during the placement phase. A novel group-based centralized coded caching (GBC) scheme is proposed for a cache capacity of M = N/K. It is shown that this scheme achieves a smaller delivery rate than all the known schemes in the literature. The improvement is then extended to a wider range of cache capacities through memory-sharing between the proposed scheme and other known schemes in the literature. Next, the proposed centralized coded caching idea is exploited in the decentralized setting, in which the identities of the users that participate in the delivery phase are assumed to be unknown during the placement phase. It is shown that the proposed decentralized caching scheme also achieves a delivery rate smaller than the state-of-the-art. Numerical simulations are also presented to corroborate our theoretical results.

Submitted 6 May, 2016; originally announced May 2016.

arXiv:1604.03888 [pdf, other]

Fundamental Limits of Coded Caching: Improved Delivery Rate-Cache Capacity Trade-off

Authors: Mohammad Mohammadi Amiri, Deniz Gunduz

Abstract: A centralized coded caching system, consisting of a server delivering N popular files, each of size F bits, to K users through an error-free shared link, is considered. It is assumed that each user is equipped with a local cache memory with capacity MF bits, and contents can be proactively cached into these caches over a low traffic period; however, without the knowledge of the user demands. During the peak traffic period each user requests a single file from the server. The goal is to minimize the number of bits delivered by the server over the shared link, known as the delivery rate, over all user demand combinations. A novel coded caching scheme for the cache capacity of M = (N-1)/K is proposed. It is shown that the proposed scheme achieves a smaller delivery rate than the existing coded caching schemes in the literature when K > N >= 3. Furthermore, we argue that the delivery rate of the proposed scheme is within a constant multiplicative factor of 2 of the optimal delivery rate for cache capacities 1/K <= M <= (N-1)/K, when K > N >= 3.

Submitted 13 December, 2016; v1 submitted 13 April, 2016; originally announced April 2016.

Comments: To appear in IEEE Transactions on Communications
