-
Run Like a Girl! Sports-Related Gender Bias in Language and Vision
Authors:
Sophia Harrison,
Eleonora Gualdoni,
Gemma Boleda
Abstract:
Gender bias in Language and Vision datasets and models has the potential to perpetuate harmful stereotypes and discrimination. We analyze gender bias in two Language and Vision datasets. Consistent with prior work, we find that both datasets underrepresent women, which promotes their invisibilization. Moreover, we hypothesize and find that a bias affects human naming choices for people playing sports: speakers produce names indicating the sport (e.g. 'tennis player' or 'surfer') more often when it is a man or a boy participating in the sport than when it is a woman or a girl, with an average of 46% vs. 35% of sports-related names for each gender. A computational model trained on these naming data reproduces the bias. We argue that both the data and the model result in representational harm against women.
Submitted 23 May, 2023;
originally announced May 2023.
-
Natural Adversarial Objects
Authors:
Felix Lau,
Nishant Subramani,
Sasha Harrison,
Aerin Kim,
Elliot Branson,
Rosanne Liu
Abstract:
Although state-of-the-art object detection methods have shown compelling performance, models often are not robust to adversarial attacks and out-of-distribution data. We introduce a new dataset, Natural Adversarial Objects (NAO), to evaluate the robustness of object detection models. NAO contains 7,934 images and 9,943 objects that are unmodified and representative of real-world scenarios, but cause state-of-the-art detection models to misclassify with high confidence. The mean average precision (mAP) of EfficientDet-D7 drops 74.5% when evaluated on NAO compared to the standard MSCOCO validation set.
Moreover, by comparing a variety of object detection architectures, we find that better performance on MSCOCO validation set does not necessarily translate to better performance on NAO, suggesting that robustness cannot be simply achieved by training a more accurate model.
We further investigate why examples in NAO are difficult to detect and classify. Experiments of shuffling image patches reveal that models are overly sensitive to local texture. Additionally, using integrated gradients and background replacement, we find that the detection model is reliant on pixel information within the bounding box, and insensitive to the background context when predicting class labels. NAO can be downloaded at https://drive.google.com/drive/folders/15P8sOWoJku6SSEiHLEts86ORfytGezi8.
Submitted 7 November, 2021;
originally announced November 2021.
-
On The State of Data In Computer Vision: Human Annotations Remain Indispensable for Developing Deep Learning Models
Authors:
Zeyad Emam,
Andrew Kondrich,
Sasha Harrison,
Felix Lau,
Yushi Wang,
Aerin Kim,
Elliot Branson
Abstract:
High-quality labeled datasets play a crucial role in fueling the development of machine learning (ML), and in particular the development of deep learning (DL). However, since the emergence of the ImageNet dataset and the AlexNet model in 2012, the size of new open-source labeled vision datasets has remained roughly constant. Consequently, only a minority of publications in the computer vision community tackle supervised learning on datasets that are orders of magnitude larger than Imagenet. In this paper, we survey computer vision research domains that study the effects of such large datasets on model performance across different vision tasks. We summarize the community's current understanding of those effects, and highlight some open questions related to training with massive datasets. In particular, we tackle: (a) The largest datasets currently used in computer vision research and the interesting takeaways from training on such datasets; (b) The effectiveness of pre-training on large datasets; (c) Recent advancements and hurdles facing synthetic datasets; (d) An overview of double descent and sample non-monotonicity phenomena; and finally, (e) A brief discussion of lifelong/continual learning and how it fares compared to learning from huge labeled datasets in an offline setting. Overall, our findings are that research on optimization for deep learning focuses on perfecting the training routine and thus making DL models less data hungry, while research on synthetic datasets aims to offset the cost of data labeling. However, for the time being, acquiring non-synthetic labeled data remains indispensable to boost performance.
Submitted 30 July, 2021;
originally announced August 2021.
-
Translating the Concept of Goal Setting into Practice -- What 'Else' does it Require than a Goal Setting Tool?
Authors:
Gábor Kismihók,
Catherine Zhao,
Michaéla C. Schippers,
Stefan T. Mol,
Scott Harrison,
Shady Shehata
Abstract:
This conceptual paper reviews the current status of goal setting in the area of technology enhanced learning and education. Besides a brief literature review, three current projects on goal setting are discussed. The paper shows that the main barriers for goal setting applications in education are not related to the technology, the available data or analytical methods, but rather the human factor. The most important bottlenecks are the lack of students goal setting skills and abilities, and the current curriculum design, which, especially in the observed higher education institutions, provides little support for goal setting interventions.
Submitted 18 May, 2020;
originally announced May 2020.
-
Crafting, Communality, and Computing: Building on Existing Strengths To Support a Vulnerable Population
Authors:
Aakash Gautam,
Deborah Tatar,
Steve Harrison
Abstract:
In Nepal, sex-trafficking survivors and the organizations that support them have limited resources to assist the survivors in their on-going journey towards reintegration. We take an asset-based approach wherein we identify and build on the strengths possessed by such groups. In this work, we present reflections from introducing a voice-annotated web application to a group of survivors. The web application tapped into and built upon two elements of pre-existing strengths possessed by the survivors -- the social bond between them and knowledge of crafting as taught to them by the organization. Our findings provide insight into the array of factors influencing how the survivors act in relation to one another as they created novel use practices and adapted the technology. Experience with the application seemed to open knowledge of computing as a potential source of strength. Finally, we articulate three design desiderata that could help promote communal spaces: make activity perceptible to the group, create appropriable steps, and build in fun choices.
Submitted 4 May, 2020;
originally announced May 2020.
-
Towards Connecting Experiences during Collocated Events through Data Mining and Visualization
Authors:
Shuo Niu,
D. Scott McCrickard,
Steve Harrison
Abstract:
Themed collocated events, such as conferences, workshops, and seminars, invite people with related life experiences to connect with each other. In this era when people record lives through the Internet, individual experiences exist in different forms of digital contents. People share digital life records during collocated events, such as sharing blogs they wrote, Twitter posts they forwarded, and books they have read. However, connecting experiences during collocated events are challenging. Not only one is blind to the large contents of others, identifying related experiential items depends on how well experiences are retrieved. The collection of personal contents from all participants forms a valuable group repository, from which connections between experiences can be mined. Visualizing same or related experiences inspire conversations and support social exchange. Common topics in group content also help participants generate new perspectives about the collocated group. Advances in machine learning and data visualization provide automated approaches to process large data and enable interactions with data repositories. This position paper promotes the idea of event mining: how to utilize state-of-the-art data processing and visualization techniques to design event mining systems for connecting experiences during collocated activities. We discuss empirical and constructive problems in this design space, and our preliminary study of deploying a tabletop-based system, BlogCloud, which supports experience re-visitation and exchange with machine-learning and data visualization.
Submitted 8 November, 2018;
originally announced November 2018.
-
Opportunity in Conflict: Understanding Tension Among Key Groups on the Trail
Authors:
Lindah Kotut,
Michael Horning,
Steve Harrison,
D. Scott McCrickard
Abstract:
This paper examines the question of who technology users on the trail are, what their technological uses and needs are, and what conflicts exist between different trail users regarding technology use and experience, toward understanding how experiences of trail users contribute to designers. We argue that exploring these tensions provide opportunities for design that can be used to both mitigate conflicts and improve community on the trail.
Submitted 13 February, 2018;
originally announced February 2018.
-
A Security Evaluation Framework for U.K. E-Goverment Services Agile Software Development
Authors:
Steve Harrison,
Antonis Tzounis,
Leandros A. Maglaras,
Francois Siewe,
Richard Smith,
Helge Janicke
Abstract:
This study examines the traditional approach to software development within the United Kingdom Government and the accreditation process. Initially we look at the Waterfall methodology that has been used for several years. We discuss the pros and cons of Waterfall before moving onto the Agile Scrum methodology. Agile has been adopted by the majority of Government digital departments including the Government Digital Services. Agile, despite its ability to achieve high rates of productivity organized in short, flexible, iterations, has faced security professionals disbelief when working within the U.K. Government. One of the major issues is that we develop in Agile but the accreditation process is conducted using Waterfall resulting in delays to go live dates. Taking a brief look into the accreditation process that is used within Government for I.T. systems and applications, we focus on giving the accreditor the assurance they need when developing new applications and systems. A framework has been produced by utilizing the Open Web Application Security Project (OWASP) Application Security Verification Standard (ASVS). This framework will allow security and Agile to work side by side and produce secure code.
Submitted 8 April, 2016;
originally announced April 2016.
-
Storage Workload Modelling by Hidden Markov Models: Application to FLASH Memory
Authors:
P. G. Harrison,
S. K. Harrison,
N. M. Patel,
S. Zertal
Abstract:
A workload analysis technique is presented that processes data from operation type traces and creates a Hidden Markov Model (HMM) to represent the workload that generated those traces. The HMM can be used to create representative traces for performance models, such as simulators, avoiding the need to repeatedly acquire suitable traces. It can also be used to estimate directly the transition probabilities and rates of a Markov modulated arrival process, for use as input to an analytical performance model of Flash memory. The HMMs obtained from industrial workloads are validated by comparing their autocorrelation functions and other statistics with those of the corresponding monitored time series. Further, the performance model applications are illustrated by numerical examples.
Submitted 14 September, 2012;
originally announced September 2012.
-
Personal Information Ecosystems and Implications for Design
Authors:
Manas Tungare,
Pardha S. Pyla,
Manuel Pérez-Quiñones,
Steve Harrison
Abstract:
Today, people use multiple devices to fulfill their information needs. However, designers design each device individually, without accounting for the other devices that users may also use. In many cases, the applications on all these devices are designed to be functional replicates of each other. We argue that this results in an over-reliance on data synchronization across devices, version control nightmares, and increased burden of file management. In this paper, we present the idea of a personal information ecosystem, an analogy to biological ecosystems, which allows us to discuss the inter-relationships among these devices to fulfill the information needs of the user. There is a need for designers to design devices as part of a complete ecosystem, not as independent devices that simply share data replicated across them. To help us understand this domain and to facilitate the dialogue and study of such systems, we present the terminology, classifications of the interdependencies among different devices, and resulting implications for design.
Submitted 18 December, 2006;
originally announced December 2006.