
Multi-Agent Bandit Learning through Heterogeneous Action Erasure Channels

Authors: Osama A. Hanna, Merve Karakas, Lin F. Yang, Christina Fragouli

Abstract: Multi-Armed Bandit (MAB) systems are witnessing an upswing in applications within multi-agent distributed environments, leading to the advancement of collaborative MAB algorithms. In such settings, communication between agents executing actions and the primary learner making decisions can hinder the learning process. A prevalent challenge in distributed learning is action erasure, often induced by communication delays and/or channel noise. As a result, agents may not receive the intended action from the learner, which leads to misguided feedback. In this paper, we introduce novel algorithms that enable learners to interact concurrently with distributed agents across heterogeneous action erasure channels with different action erasure probabilities. We show that, in contrast to existing bandit algorithms, which experience linear regret, our algorithms guarantee sub-linear regret. Our proposed solutions are founded on a meticulously crafted repetition protocol and scheduling of learning across heterogeneous channels. To our knowledge, these are the first algorithms capable of effectively learning through heterogeneous action erasure channels. We substantiate the superior performance of our algorithms through numerical experiments, emphasizing their practical significance in addressing issues related to communication constraints and delays in multi-agent environments.
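A back-of-the-envelope sketch of why repeating an action helps over an erasure channel. It assumes, purely for illustration, that each transmission on a channel is erased independently with probability eps; under that assumption an agent misses an action sent r times only if all r copies are erased, which happens with probability eps**r. The hypothetical helper `repetitions_needed` finds the smallest r that pushes this below a target delta. It is not the repetition protocol from the paper, whose details the abstract does not give.

```python
def repetitions_needed(eps: float, delta: float) -> int:
    """Smallest r with eps**r <= delta, i.e. all r copies of an action are
    erased with probability at most delta, assuming independent erasures.

    Hypothetical helper for illustration only; not the paper's protocol.
    """
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must be in [0, 1); eps = 1 never delivers")
    r, miss = 1, eps  # miss = probability that all r copies are erased
    while miss > delta:
        miss *= eps
        r += 1
    return r


# Heterogeneous channels: noisier channels need more repetitions per action.
for eps in (0.1, 0.5, 0.9):
    print(f"eps={eps}: r={repetitions_needed(eps, delta=0.05)}")
```

For delta = 0.05, a channel with eps = 0.1 needs 2 repetitions while one with eps = 0.9 needs 29, so under this simple model a learner serving heterogeneous channels pays very different costs per delivered action.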

Submitted 29 April, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

arXiv:2301.04780

doi 10.1145/3576840.3578316

Much Ado About Gender: Current Practices and Future Recommendations for Appropriate Gender-Aware Information Access

Authors: Christine Pinney, Amifa Raj, Alex Hanna, Michael D. Ekstrand

Abstract: Information access research (and development) sometimes makes use of gender, whether to report on the demographics of participants in a user study, as inputs to personalized results or recommendations, or to make systems gender-fair, amongst other purposes. Such research, however, makes a variety of assumptions about gender that are not necessarily aligned with current understandings of what gender is, how it should be encoded, and how a gender variable should be ethically used. In this work, we present a systematic review of papers on information retrieval and recommender systems that mention gender in order to document how gender is currently being used in this field. We find that most papers mentioning gender do not use an explicit gender variable, but most of those that do focus on contextualizing results of model performance, personalizing a system based on assumptions of user gender, or auditing a model's behavior for fairness or other privacy-related issues. Moreover, most of the papers we review rely on a binary notion of gender, even if they acknowledge that gender cannot be split into two categories. We connect these findings with scholarship on gender theory and recent work on gender in human-computer interaction and natural language processing. We conclude by making recommendations for ethical and well-grounded use of gender in building and researching information access systems.

Submitted 13 January, 2023; v1 submitted 11 January, 2023; originally announced January 2023.

Comments: Published in CHIIR 2023

arXiv:2211.05632

Contexts can be Cheap: Solving Stochastic Contextual Bandits with Linear Bandit Algorithms

Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli

Abstract: In this paper, we address the stochastic contextual linear bandit problem, where a decision maker is given a context (a random set of actions drawn from a distribution). The expected reward of each action is specified by the inner product of the action and an unknown parameter. The goal is to design an algorithm that learns to play as close as possible to the unknown optimal policy after a number of action plays. This problem is considered more challenging than the linear bandit problem, which can be viewed as a contextual bandit problem with a fixed context. Surprisingly, in this paper, we show that the stochastic contextual problem can be solved as if it were a linear bandit problem. In particular, we establish a novel reduction framework that converts every stochastic contextual linear bandit instance to a linear bandit instance when the context distribution is known. When the context distribution is unknown, we establish an algorithm that reduces the stochastic contextual instance to a sequence of linear bandit instances with small misspecifications and achieves nearly the same worst-case regret bound as the algorithm that solves the misspecified linear bandit instances. As a consequence, our results imply an $O(d\sqrt{T\log T})$ high-probability regret bound for contextual linear bandits, making progress toward resolving an open problem posed in (Li et al., 2019) and (Li et al., 2021).
Our reduction framework opens up a new way to approach stochastic contextual linear bandit problems, and enables improved regret bounds in a number of instances including the batch setting, contextual bandits with misspecifications, contextual bandits with sparse unknown parameters, and contextual bandits with adversarial corruption.
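The reward model described in this abstract (a fresh random set of actions each round, with expected reward equal to the inner product of the chosen action and an unknown parameter) can be made concrete with a short simulation. This is a hypothetical illustration, not the paper's reduction: it assumes a Gaussian context distribution, picks the action maximizing the inner product with a fixed estimate `theta_hat`, and measures per-round regret against the true parameter `theta_star`. All function names and parameter values are assumptions.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sample_context(k, d, rng):
    """Hypothetical context distribution: k actions with i.i.d. N(0, 1) coordinates."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]


def greedy_action(context, theta_hat):
    """Pick the action whose inner product with the current estimate is largest."""
    return max(context, key=lambda a: dot(a, theta_hat))


def instant_regret(context, chosen, theta_star):
    """Gap between the best expected reward in this context and the chosen action's."""
    best = max(dot(a, theta_star) for a in context)
    return best - dot(chosen, theta_star)


rng = random.Random(0)
d, k, rounds = 5, 10, 1000
theta_star = [1.0, -0.5, 0.0, 0.25, 0.75]
# A perturbed estimate stands in for what a learning algorithm would maintain.
theta_hat = [t + rng.gauss(0.0, 0.3) for t in theta_star]

total = 0.0
for _ in range(rounds):
    ctx = sample_context(k, d, rng)
    total += instant_regret(ctx, greedy_action(ctx, theta_hat), theta_star)
print(f"average per-round regret with a fixed estimate: {total / rounds:.3f}")
```

Because the action set changes every round, the best action changes too; a learner only controls its estimate of the parameter, which is what makes this setting look harder than a linear bandit with a fixed action set.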

Submitted 26 May, 2023; v1 submitted 8 November, 2022; originally announced November 2022.

arXiv:2207.04958

Documenting Data Production Processes: A Participatory Approach for Data Work

Authors: Milagros Miceli, Tianling Yang, Adriana Alvarado Garcia, Julian Posada, Sonja Mei Wang, Marc Pohl, Alex Hanna

Abstract: The opacity of machine learning data is a significant threat to ethical data work and intelligible systems. Previous research has addressed this issue by proposing standardized checklists to document datasets. This paper expands that field of inquiry by proposing a shift of perspective: from documenting datasets toward documenting data production. We draw on participatory design and collaborate with data workers at two companies located in Bulgaria and Argentina, where the collection and annotation of data for machine learning are outsourced. Our investigation comprises 2.5 years of research, including 33 semi-structured interviews, five co-design workshops, the development of prototypes, and several feedback instances with participants. We identify key challenges and requirements related to the integration of documentation practices in real-world data production scenarios. Our findings comprise important design considerations and highlight the value of designing data documentation based on the needs of data workers. We argue that a view of documentation as a boundary object, i.e., an object that can be used differently across organizations and teams but holds enough immutable content to maintain integrity, can be useful when designing documentation to retrieve heterogeneous, often distributed, contexts of data production.

Submitted 9 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

Journal ref: Proceedings of the ACM on Human-Computer Interaction, 6(CSCW2), 2022

arXiv:2207.03445

Differentially Private Stochastic Linear Bandits: (Almost) for Free

Authors: Osama A. Hanna, Antonious M. Girgis, Christina Fragouli, Suhas Diggavi

Abstract: In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}{\varepsilon})$, matching the known lower bound for private linear bandits, while the best previously known algorithm achieves $\tilde{O}(\frac{1}{\varepsilon}\sqrt{T})$. In the local case, we achieve a regret of $\tilde{O}(\frac{1}{\varepsilon}\sqrt{T})$, which matches the non-private regret for constant $\varepsilon$ but suffers a regret penalty when $\varepsilon$ is small. In the shuffled model, we also achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}{\varepsilon})$ as in the central case, while the best previously known algorithm suffers a regret of $\tilde{O}(\frac{1}{\varepsilon}T^{3/5})$. Our numerical evaluation validates our theoretical results.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2206.04180

Learning in Distributed Contextual Linear Bandits Without Sharing the Context

Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli

Abstract: The contextual linear bandit is a rich and theoretically important model that has many practical applications. Recently, this setup has gained considerable interest in wireless applications, where communication constraints can be a performance bottleneck, especially when the contexts come from a large $d$-dimensional space. In this paper, we consider a distributed memoryless contextual linear bandit learning problem, where the agents who observe the contexts and take actions are geographically separated from the learner who performs the learning while not seeing the contexts. We assume that contexts are generated from a distribution and propose a method that uses $\approx 5d$ bits per context for the case of unknown context distribution and $0$ bits per context if the context distribution is known, while achieving nearly the same regret bound as if the contexts were directly observable. The former bound improves upon existing bounds by a $\log(T)$ factor, where $T$ is the length of the horizon, while the latter is information-theoretically tight.

Submitted 8 June, 2022; originally announced June 2022.

arXiv:2112.01716

Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research

Authors: Bernard Koch, Emily Denton, Alex Hanna, Jacob G. Foster

Abstract: Benchmark datasets play a central role in the organization of machine learning research. They coordinate researchers around shared research problems and serve as a measure of progress towards shared goals. Despite the foundational role of benchmarking practices in this field, relatively little attention has been paid to the dynamics of benchmark dataset use and reuse, within or across machine learning subcommunities. In this paper, we dig into these dynamics. We study how dataset usage patterns differ across machine learning subcommunities and across time from 2015 to 2020. We find increasing concentration on fewer and fewer datasets within task communities, significant adoption of datasets from other tasks, and concentration across the field on datasets that have been introduced by researchers situated within a small number of elite institutions. Our results have implications for scientific evaluation, AI ethics, and equity/access within the field.

Submitted 3 December, 2021; originally announced December 2021.

Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia

arXiv:2111.15366

AI and the Everything in the Whole Wide World Benchmark

Authors: Inioluwa Deborah Raji, Emily M. Bender, Amandalynne Paullada, Emily Denton, Alex Hanna

Abstract: There is a tendency across different subfields in AI to valorize a small collection of influential benchmarks. These benchmarks operate as stand-ins for a range of anointed common problems that are frequently framed as foundational milestones on the path towards flexible and generalizable AI systems. State-of-the-art performance on these benchmarks is widely understood as indicative of progress towards these long-term goals. In this position paper, we explore the limits of such benchmarks in order to reveal the construct validity issues in their framing as the functionally "general" broad measures of progress they are set up to be.

Submitted 26 November, 2021; originally announced November 2021.

Comments: Accepted in NeurIPS 2021 Benchmarks and Datasets track

arXiv:2111.06067

Solving Multi-Arm Bandit Using a Few Bits of Communication

Authors: Osama A. Hanna, Lin F. Yang, Christina Fragouli

Abstract: The multi-armed bandit (MAB) problem is an active learning framework that aims to select the best among a set of actions by sequentially observing rewards. Recently, it has become popular for a number of applications over wireless networks, where communication constraints can form a bottleneck. Existing works usually fail to address this issue and can become infeasible in certain applications. In this paper, we address the communication problem by optimizing the communication of rewards collected by distributed agents. By providing nearly matching upper and lower bounds, we tightly characterize the number of bits needed per reward for the learner to accurately learn without suffering additional regret. In particular, we establish a generic reward quantization algorithm, QuBan, that can be applied on top of any (no-regret) MAB algorithm to form a new communication-efficient counterpart that requires only a few bits (as few as 3) to be sent per iteration while preserving the same regret bound. Our lower bound is established by constructing hard instances from a subgaussian distribution. Our theory is further corroborated by numerical experiments.
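The abstract does not describe how QuBan works, so the following is not that algorithm. As a hedged illustration of the general idea of sending a reward with only a few bits, here is unbiased stochastic rounding onto a grid of 2**bits evenly spaced levels, assuming the reward has already been clipped to a known range [lo, hi] (the abstract says QuBan targets subgaussian rewards, which this sketch does not handle). The functions `quantize_reward` and `dequantize_reward` are hypothetical names.

```python
import random


def quantize_reward(x, lo, hi, bits, rng):
    """Stochastically round x to one of 2**bits evenly spaced levels on [lo, hi].

    Returns the integer level index, i.e. what an agent would transmit.
    Rounding up with probability equal to the fractional position keeps the
    reconstruction unbiased for x in [lo, hi]; clipping values outside the
    range introduces bias.
    """
    levels = 2 ** bits - 1              # number of gaps between grid points
    step = (hi - lo) / levels
    pos = (min(max(x, lo), hi) - lo) / step
    k = min(int(pos), levels)           # lower grid point (pos >= 0, so int() floors)
    if k < levels and rng.random() < pos - k:
        k += 1                          # round up with probability pos - k
    return k


def dequantize_reward(index, lo, hi, bits):
    """Map a received level index back to a reward value on the learner's side."""
    return lo + index * (hi - lo) / (2 ** bits - 1)


rng = random.Random(0)
samples = [quantize_reward(0.37, 0.0, 1.0, 3, rng) for _ in range(100_000)]
mean = sum(dequantize_reward(i, 0.0, 1.0, 3) for i in samples) / len(samples)
print(f"mean of 3-bit reconstructions: {mean:.4f} (true reward 0.37)")
```

Unbiasedness is what would let an ordinary mean estimator run on the quantized stream; the price is extra variance of at most step**2 / 4 per sample, where step = (hi - lo) / (2**bits - 1).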

Submitted 11 November, 2021; originally announced November 2021.

arXiv:2108.04308

doi 10.1145/3476058

Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development

Authors: Morgan Klaus Scheuerman, Emily Denton, Alex Hanna

Abstract: Data is a crucial component of machine learning. The field is reliant on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been on computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation: how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks. Through both a structured and thematic content analysis, we document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision dataset authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work.
Many of the silenced values we identify sit in opposition to social computing practices. We conclude with suggestions on how to better incorporate silenced values into the dataset creation and curation process.

Submitted 16 September, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

Comments: CSCW 2021; 37 pages

Journal ref: Proc. ACM Hum.-Comput. Interact. 5, CSCW2, Article 317 (October 2021), 37 pages

arXiv:2104.04546

One-class Autoencoder Approach for Optimal Electrode Set-up Identification in Wearable EEG Event Monitoring

Authors: Laura M. Ferrari, Guy Abi Hanna, Paolo Volpe, Esma Ismailova, François Bremond, Maria A. Zuluaga

Abstract: A limiting factor for the widespread routine use of wearable devices for continuous healthcare monitoring is their cumbersome and obtrusive nature. This is particularly true for electroencephalography (EEG) recordings, which require the placement of multiple electrodes in contact with the scalp. In this work, we propose to identify the optimal wearable EEG electrode set-up, in terms of the minimal number of electrodes, comfortable location, and performance, for EEG-based event detection and monitoring. By relying on the demonstrated power of autoencoder (AE) networks to learn latent representations from high-dimensional data, our proposed strategy trains an AE architecture in a one-class classification setup with different electrode set-ups as input data. The resulting models are assessed using the F-score, and the best set-up is chosen according to the established optimal criteria. Using alpha wave detection as a use case, we demonstrate that the proposed method can detect an alpha state from an optimal set-up consisting of electrodes on the forehead and behind the ear, with an average F-score of 0.78. Our results suggest that a learning-based approach can be used to enable the design and implementation of optimized wearable devices for real-life healthcare monitoring.

Submitted 19 May, 2021; v1 submitted 9 April, 2021; originally announced April 2021.

arXiv:2012.07913

Quantizing data for distributed learning

Authors: Osama A. Hanna, Yahya H. Ezzeldin, Christina Fragouli, Suhas Diggavi

Abstract: We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates. In this paper, we propose an alternative approach to learning from distributed data that quantizes data instead of gradients, and that can support learning in applications where the size of gradient updates is prohibitive. Our approach leverages the dependence of the computed gradient on the data samples, which lie in a much smaller space, to perform the quantization in the lower-dimensional data space. At the cost of an extra gradient computation, the gradient estimate can be refined by conveying the difference between the gradient at the quantized data point and the original gradient using a small number of bits. Lastly, in order to save communication, our approach adds a layer that decides whether or not to transmit a quantized data sample based on its importance for learning. We analyze the convergence of the proposed approach for smooth convex and non-convex objective functions and show that we can achieve order-optimal convergence rates with communication that mostly depends on the data rather than the model (gradient) dimension.
We use our proposed algorithm to train ResNet models on the CIFAR-10 and ImageNet datasets, and show that we can achieve an order-of-magnitude saving over gradient compression methods. These communication savings come at the cost of increased computation at the learning agent, and thus our approach is beneficial in scenarios where communication load is the main problem.

Submitted 8 September, 2021; v1 submitted 14 December, 2020; originally announced December 2020.

arXiv:2012.05345

doi 10.1016/j.patter.2021.100336

Data and its (dis)contents: A survey of dataset development and use in machine learning research

Authors: Amandalynne Paullada, Inioluwa Deborah Raji, Emily M. Bender, Emily Denton, Alex Hanna

Abstract: Datasets have played a foundational role in the advancement of machine learning research. They form the basis for the models we design and deploy, as well as our primary medium for benchmarking and evaluation. Furthermore, the ways in which we collect, construct and share these datasets inform the kinds of problems the field pursues and the methods explored in algorithm development. However, recent work from a breadth of perspectives has revealed the limitations of predominant practices in dataset collection and use. In this paper, we survey the many concerns raised about the way we collect and use data in machine learning and advocate that a more cautious and thorough understanding of data is necessary to address several of the practical and ethical issues of the field.

Submitted 9 December, 2020; originally announced December 2020.

Journal ref: Patterns, Volume 2, Issue 11, 100336, 2021

arXiv:2010.13970

An Analysis of Security Vulnerabilities in Container Images for Scientific Data Analysis

Authors: Bhupinder Kaur, Mathieu Dugré, Aiman Hanna, Tristan Glatard

Abstract: Software containers greatly facilitate the deployment and reproducibility of scientific data analyses on various platforms. However, container images often contain outdated or unnecessary software packages, which increases the number of security vulnerabilities in the images, widens the attack surface in the container host, and creates substantial security risks for computing infrastructures at large. This paper presents a vulnerability analysis of container images for scientific data analysis. We compare results obtained with four vulnerability scanners, focusing on the use case of neuroscience data analysis, and quantify the effect of image updates and minification on the number of vulnerabilities. We find that container images used for neuroscience data analysis contain hundreds of vulnerabilities, that software updates remove about two thirds of these vulnerabilities, and that removing unused packages is also effective. We conclude with recommendations on how to build container images with fewer vulnerabilities.

Submitted 17 March, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

arXiv:2010.13561

Towards Accountability for Machine Learning Datasets: Practices from Software Engineering and Infrastructure

Authors: Ben Hutchinson, Andrew Smart, Alex Hanna, Emily Denton, Christina Greer, Oddur Kjartansson, Parker Barnes, Margaret Mitchell

Abstract: Rising concern for the societal implications of artificial intelligence systems has inspired demands for greater transparency and accountability. However, the datasets which empower machine learning are often used, shared and re-used with little visibility into the processes of deliberation which led to their creation. Which stakeholder groups had their perspectives included when the dataset was conceived? Which domain experts were consulted regarding how to model subgroups and other phenomena? How were questions of representational biases measured and addressed? Who labeled the data? In this paper, we introduce a rigorous framework for dataset development transparency which supports decision-making and accountability. The framework uses the cyclical, infrastructural and engineering nature of dataset development to draw on best practices from the software development lifecycle. Each stage of the data development lifecycle yields a set of documents that facilitate improved communication and decision-making, as well as drawing attention to the value and necessity of careful data work. The proposed framework is intended to contribute to closing the accountability gap in artificial intelligence systems, by making visible the often overlooked work that goes into dataset creation.

Submitted 29 January, 2021; v1 submitted 22 October, 2020; originally announced October 2020.

arXiv:2010.08850

Against Scale: Provocations and Resistances to Scale Thinking

Authors: Alex Hanna, Tina M. Park

Abstract: At the heart of what drives the bulk of innovation and activity in Silicon Valley and elsewhere is scalability. This unwavering commitment to scalability, that is, to identifying strategies for efficient growth, is at the heart of what we refer to as "scale thinking." Whether people are aware of it or not, scale thinking is all-encompassing. It is not just an attribute of one's product, service, or company, but frames how one thinks about the world (what constitutes it and how it can be observed and measured), its problems (what is a problem worth solving versus not), and the possible technological fixes for those problems. This paper examines different facets of scale thinking and its implications for how we view technology and collaborative work. We argue that technological solutions grounded in scale thinking are unlikely to be as liberatory or effective at deep, systemic change as their purveyors imagine. Rather, solutions which resist scale thinking are necessary to undo the social structures which lie at the heart of social inequality. We draw on recent work on mutual aid networks and propose questions to ask of collaborative work systems as a means to evaluate technological solutions and guide designers in identifying sites of resistance to scale thinking.

Submitted 20 November, 2020; v1 submitted 17 October, 2020; originally announced October 2020.

arXiv:2007.07399

Bringing the People Back In: Contesting Benchmark Machine Learning Datasets

Authors: Remi Denton, Alex Hanna, Razvan Amironesei, Andrew Smart, Hilary Nicole, Morgan Klaus Scheuerman

Abstract: In response to algorithmic unfairness embedded in sociotechnical systems, significant attention has been focused on the contents of machine learning datasets which have revealed biases towards white, cisgender, male, and Western data subjects. In contrast, comparatively less attention has been paid to the histories, values, and norms embedded in such datasets. In this work, we outline a research program, a genealogy of machine learning data, for investigating how and why these datasets have been created, what and whose values influence the choices of data to collect, and the contextual and contingent conditions of their creation. We describe the ways in which benchmark datasets in machine learning operate as infrastructure and pose four research questions for these datasets. This interrogation forces us to "bring the people back in" by aiding us in understanding the labor embedded in dataset construction, thereby presenting new avenues of contestation for other researchers encountering the data.

Submitted 14 July, 2020; originally announced July 2020.

arXiv:2002.03256 [pdf, other]

doi 10.1145/3375627.3375832

Diversity and Inclusion Metrics in Subset Selection

Authors: Margaret Mitchell, Dylan Baker, Nyalleng Moorosi, Emily Denton, Ben Hutchinson, Alex Hanna, Timnit Gebru, Jamie Morgenstern

Abstract: The ethical concept of fairness has recently been applied in machine learning (ML) settings to describe a wide range of constraints and objectives. When considering the relevance of ethical concepts to subset selection problems, the concepts of diversity and inclusion are additionally applicable in order to create outputs that account for social power and access differentials. We introduce metrics based on these concepts, which can be applied together, separately, and in tandem with additional fairness constraints. Results from human subject experiments lend support to the proposed criteria. Social choice methods can additionally be leveraged to aggregate and choose preferable sets, and we detail how these may be applied.

Submitted 8 February, 2020; originally announced February 2020.

Journal ref: AIES 2020: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society

arXiv:1912.03593 [pdf, ps, other]

doi 10.1145/3351095.3372826

Towards a Critical Race Methodology in Algorithmic Fairness

Authors: Alex Hanna, Emily Denton, Andrew Smart, Jamila Smith-Loud

Abstract: We examine the way race and racial categories are adopted in algorithmic fairness frameworks. Current methodologies fail to adequately account for the socially constructed nature of race, instead adopting a conceptualization of race as a fixed attribute. Treating race as an attribute, rather than a structural, institutional, and relational phenomenon, can serve to minimize the structural aspects of algorithmic unfairness. In this work, we focus on the history of racial categories and turn to critical race theory and sociological work on race and ethnicity to ground conceptualizations of race for fairness research, drawing on lessons from public health, biomedical research, and social survey research. We argue that algorithmic fairness researchers need to take into account the multidimensionality of race, take seriously the processes of conceptualizing and operationalizing race, focus on social processes which produce racial inequality, and consider perspectives of those most affected by sociotechnical systems.

Submitted 7 December, 2019; originally announced December 2019.

Comments: Conference on Fairness, Accountability, and Transparency (FAT* '20), January 27-30, 2020, Barcelona, Spain

arXiv:1911.00216 [pdf, other]

On Distributed Quantization for Classification

Authors: Osama A. Hanna, Yahya H. Ezzeldin, Tara Sadjadpour, Christina Fragouli, Suhas Diggavi

Abstract: We consider the problem of distributed feature quantization, where the goal is to enable a pretrained classifier at a central node to carry out its classification on features that are gathered from distributed nodes through communication-constrained channels. We propose the design of distributed quantization schemes specifically tailored to the classification task: unlike quantization schemes that help the central node reconstruct the original signal as accurately as possible, our focus is not reconstruction accuracy, but instead correct classification. Our work does not make any a priori distributional assumptions on the data, but instead uses training data for the quantizer design. Our main contributions include: we prove NP-hardness of finding optimal quantizers in the general case; we design an optimal scheme for a special case; we propose quantization algorithms that leverage discrete neural representations and training data, and can be designed in polynomial time for any number of features, any number of classes, and arbitrary division of features across the distributed nodes. We find that tailoring the quantizers to the classification task can offer significant savings: as compared to alternatives, we can achieve more than a factor of two reduction in terms of the number of bits communicated, for the same classification accuracy.

Submitted 1 November, 2019; originally announced November 2019.

arXiv:1905.09654 [pdf]

A ROS2 based communication architecture for control in collaborative and intelligent automation systems

Authors: Endre Erős, Martin Dahl, Kristofer Bengtsson, Atieh Hanna, Petter Falkman

Abstract: Collaborative robots are becoming part of intelligent automation systems in modern industry. Development and control of such systems differ from traditional automation methods and consequently lead to new challenges. Thankfully, the Robot Operating System (ROS) provides a communication platform and a vast variety of tools and utilities that can aid that development. However, it is hard to use ROS in large-scale automation systems due to communication issues in a distributed setup, hence the development of ROS2. In this paper, a ROS2-based communication architecture is presented together with an industrial use-case of a collaborative and intelligent automation system.

Submitted 23 May, 2019; originally announced May 2019.

Comments: 9 pages, 4 figures, 3 tables, to be published in the proceedings of 29th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM2019), June 2019

arXiv:1903.05850 [pdf, other]

Sequence Planner - Automated Planning and Control for ROS2-based Collaborative and Intelligent Automation Systems

Authors: Martin Dahl, Endre Erös, Atieh Hanna, Kristofer Bengtsson, Petter Falkman

Abstract: Systems based on the Robot Operating System (ROS) are easy to extend with new on-line algorithms and devices. However, there is relatively little support for coordinating a large number of heterogeneous sub-systems. In this paper we propose an architecture to model and control collaborative and intelligent automation systems in a hierarchical fashion.

Submitted 14 March, 2019; originally announced March 2019.

Comments: Submitted to IROS 2019. © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses

arXiv:1803.03610 [pdf, other]

Random Access Schemes in Wireless Systems With Correlated User Activity

Authors: Anders Ellersgaard Kalør, Osama A. Hanna, Petar Popovski

Abstract: Traditional random access schemes are designed based on the aggregate process of user activation, which is created on the basis of independent activations of the users. However, in Machine-Type Communications (MTC), some users are likely to exhibit a high degree of correlation, e.g. because they observe the same physical phenomenon. This paves the way to devise access schemes that combine scheduling and random access, which is the topic of this work. The underlying idea is to schedule highly correlated users in such a way that their transmissions are less likely to result in a collision. To this end, we propose two greedy allocation algorithms. Both attempt to maximize the throughput using only pairwise correlations, but they rely on different assumptions about the higher-order dependencies. We show that both algorithms achieve higher throughput compared to the traditional random access schemes, suggesting that user correlation can be utilized effectively in access protocols for MTC.

Submitted 9 March, 2018; originally announced March 2018.

Comments: Submitted to SPAWC 2018
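The scheduling idea in the abstract above (keep highly correlated users from transmitting in the same slot, using only pairwise correlations) can be illustrated with a small greedy assignment. This is a minimal sketch under stated assumptions, not either of the authors' algorithms: it assumes a hypothetical symmetric correlation matrix `corr`, a fixed number of access slots, and a simple cost (the sum of a user's correlations with users already placed in a slot). The function name `greedy_slot_assignment` is illustrative only.

```python
from typing import List


def greedy_slot_assignment(corr: List[List[float]], num_slots: int) -> List[int]:
    """Hypothetical greedy heuristic: spread correlated users across slots.

    corr[i][j] is an assumed pairwise activation correlation between users
    i and j (symmetric, diagonal ignored). Returns slot[i] for each user i.
    Only pairwise terms are used; higher-order dependencies are ignored.
    """
    if num_slots < 1:
        raise ValueError("num_slots must be at least 1")
    n = len(corr)

    # Place the most strongly correlated users first, since they are the
    # hardest to separate once slots start filling up.
    def total_corr(i: int) -> float:
        return sum(corr[i][j] for j in range(n) if j != i)

    order = sorted(range(n), key=total_corr, reverse=True)

    slot = [-1] * n
    members: List[List[int]] = [[] for _ in range(num_slots)]
    for u in order:
        # Cost of a slot: total correlation between u and its current occupants,
        # a pairwise proxy for the chance that u collides with them.
        costs = [sum(corr[u][v] for v in members[s]) for s in range(num_slots)]
        # Break ties by slot load, then by slot index, to keep results deterministic.
        best = min(range(num_slots), key=lambda s: (costs[s], len(members[s]), s))
        slot[u] = best
        members[best].append(u)
    return slot


if __name__ == "__main__":
    # Users 0/1 and 2/3 are assumed to observe the same phenomena (high correlation).
    corr = [
        [0.0, 0.9, 0.1, 0.1],
        [0.9, 0.0, 0.1, 0.1],
        [0.1, 0.1, 0.0, 0.9],
        [0.1, 0.1, 0.9, 0.0],
    ]
    print(greedy_slot_assignment(corr, 2))
```

With two slots, each highly correlated pair ends up split across the slots, so their simultaneous activations do not collide with each other. The greedy, graph-coloring-like structure is chosen only for brevity; it makes no claim to maximize throughput.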

arXiv:1702.05528 [pdf, other]

Degrees of Freedom in Cached MIMO Relay Networks With Multiple Base Stations

Authors: Osama A. Hanna, Amr El-Keyi, Mohammed Nafie

Abstract: The ability of physical layer relay caching to increase the degrees of freedom (DoF) of a single cell was recently illustrated. In this paper, we extend this result to the case of multiple cells in which a caching relay is shared among multiple non-cooperative base stations (BSs). In particular, we show that a large DoF gain can be achieved by exploiting the benefits of having a shared relay that cooperates with the BSs. We first propose a cache-assisted relaying protocol that improves the cooperation opportunity between the BSs and the relay. Next, we consider the cache content placement problem that aims to design the cache content at the relay such that the DoF gain is maximized. We propose an optimal algorithm and a near-optimal low-complexity algorithm for the cache content placement problem. Simulation results show significant improvement in the DoF gain using the proposed relay-caching protocol.

Submitted 17 February, 2017; originally announced February 2017.

Showing 1–24 of 24 results for author: Hanna, A