Open access

Do Datasets Have Politics? Disciplinary Values in Computer Vision Dataset Development

Authors: Morgan Klaus Scheuerman, Alex Hanna, and Emily Denton

Proceedings of the ACM on Human-Computer Interaction, Volume 5, Issue CSCW2

Article No.: 317, Pages 1 - 37

https://doi.org/10.1145/3476058

Published: 18 October 2021

Abstract

Data is a crucial component of machine learning. The field relies on data to train, validate, and test models. With increased technical capabilities, machine learning research has boomed in both academic and industry settings, and one major focus has been computer vision. Computer vision is a popular domain of machine learning increasingly pertinent to real-world applications, from facial recognition in policing to object detection for autonomous vehicles. Given computer vision's propensity to shape machine learning research and impact human life, we seek to understand disciplinary practices around dataset documentation: how data is collected, curated, annotated, and packaged into datasets for computer vision researchers and practitioners to use for model tuning and development. Specifically, we examine what dataset documentation communicates about the underlying values of vision data and the larger practices and goals of computer vision as a field. To conduct this study, we collected a corpus of about 500 computer vision datasets, from which we sampled 114 dataset publications across different vision tasks. Through both a structured and a thematic content analysis, we document a number of values around accepted data practices, what makes desirable data, and the treatment of humans in the dataset construction process. We discuss how computer vision dataset authors value efficiency at the expense of care; universality at the expense of contextuality; impartiality at the expense of positionality; and model work at the expense of data work. Many of the silenced values we identify sit in opposition to social computing practices. We conclude with suggestions on how to better incorporate silenced values into the dataset creation and curation process.
