Run Like a Girl! Sports-Related Gender Bias in Language and Vision

Authors: Sophia Harrison, Eleonora Gualdoni, Gemma Boleda

Abstract: Gender bias in Language and Vision datasets and models has the potential to perpetuate harmful stereotypes and discrimination. We analyze gender bias in two Language and Vision datasets. Consistent with prior work, we find that both datasets underrepresent women, which promotes their invisibilization. Moreover, we hypothesize and find that a bias affects human naming choices for people playing sports: speakers produce names indicating the sport (e.g. 'tennis player' or 'surfer') more often when it is a man or a boy participating in the sport than when it is a woman or a girl, with an average of 46% vs. 35% of sports-related names for each gender. A computational model trained on these naming data reproduces the bias. We argue that both the data and the model result in representational harm against women.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2111.04204 [pdf, other]

Natural Adversarial Objects

Authors: Felix Lau, Nishant Subramani, Sasha Harrison, Aerin Kim, Elliot Branson, Rosanne Liu

Abstract: Although state-of-the-art object detection methods have shown compelling performance, models often are not robust to adversarial attacks and out-of-distribution data. We introduce a new dataset, Natural Adversarial Objects (NAO), to evaluate the robustness of object detection models. NAO contains 7,934 images and 9,943 objects that are unmodified and representative of real-world scenarios, but cause state-of-the-art detection models to misclassify with high confidence. The mean average precision (mAP) of EfficientDet-D7 drops 74.5% when evaluated on NAO compared to the standard MSCOCO validation set. Moreover, by comparing a variety of object detection architectures, we find that better performance on MSCOCO validation set does not necessarily translate to better performance on NAO, suggesting that robustness cannot be simply achieved by training a more accurate model. We further investigate why examples in NAO are difficult to detect and classify. Experiments of shuffling image patches reveal that models are overly sensitive to local texture. Additionally, using integrated gradients and background replacement, we find that the detection model is reliant on pixel information within the bounding box, and insensitive to the background context when predicting class labels. NAO can be downloaded at https://drive.google.com/drive/folders/15P8sOWoJku6SSEiHLEts86ORfytGezi8.

Submitted 7 November, 2021; originally announced November 2021.

Journal ref: Advances in Neural Information Processing Systems Data Centric AI workshop 2021

arXiv:2108.00114 [pdf, ps, other]

On The State of Data In Computer Vision: Human Annotations Remain Indispensable for Developing Deep Learning Models

Authors: Zeyad Emam, Andrew Kondrich, Sasha Harrison, Felix Lau, Yushi Wang, Aerin Kim, Elliot Branson

Abstract: High-quality labeled datasets play a crucial role in fueling the development of machine learning (ML), and in particular the development of deep learning (DL). However, since the emergence of the ImageNet dataset and the AlexNet model in 2012, the size of new open-source labeled vision datasets has remained roughly constant. Consequently, only a minority of publications in the computer vision community tackle supervised learning on datasets that are orders of magnitude larger than Imagenet. In this paper, we survey computer vision research domains that study the effects of such large datasets on model performance across different vision tasks. We summarize the community's current understanding of those effects, and highlight some open questions related to training with massive datasets. In particular, we tackle: (a) The largest datasets currently used in computer vision research and the interesting takeaways from training on such datasets; (b) The effectiveness of pre-training on large datasets; (c) Recent advancements and hurdles facing synthetic datasets; (d) An overview of double descent and sample non-monotonicity phenomena; and finally, (e) A brief discussion of lifelong/continual learning and how it fares compared to learning from huge labeled datasets in an offline setting. Overall, our findings are that research on optimization for deep learning focuses on perfecting the training routine and thus making DL models less data hungry, while research on synthetic datasets aims to offset the cost of data labeling. However, for the time being, acquiring non-synthetic labeled data remains indispensable to boost performance.

Submitted 30 July, 2021; originally announced August 2021.

arXiv:2005.08669 [pdf]

Translating the Concept of Goal Setting into Practice -- What 'Else' does it Require than a Goal Setting Tool?

Authors: Gábor Kismihók, Catherine Zhao, Michaéla C. Schippers, Stefan T. Mol, Scott Harrison, Shady Shehata

Abstract: This conceptual paper reviews the current status of goal setting in the area of technology enhanced learning and education. Besides a brief literature review, three current projects on goal setting are discussed. The paper shows that the main barriers for goal setting applications in education are not related to the technology, the available data or analytical methods, but rather the human factor. The most important bottlenecks are the lack of students goal setting skills and abilities, and the current curriculum design, which, especially in the observed higher education institutions, provides little support for goal setting interventions.

Submitted 18 May, 2020; originally announced May 2020.

Comments: This paper has been accepted to be published in the proceedings of CSEDU 2020 by SciTePress

arXiv:2005.01459 [pdf, other]

doi 10.1145/3313831.3376647

Crafting, Communality, and Computing: Building on Existing Strengths To Support a Vulnerable Population

Authors: Aakash Gautam, Deborah Tatar, Steve Harrison

Abstract: In Nepal, sex-trafficking survivors and the organizations that support them have limited resources to assist the survivors in their on-going journey towards reintegration. We take an asset-based approach wherein we identify and build on the strengths possessed by such groups. In this work, we present reflections from introducing a voice-annotated web application to a group of survivors. The web application tapped into and built upon two elements of pre-existing strengths possessed by the survivors -- the social bond between them and knowledge of crafting as taught to them by the organization. Our findings provide insight into the array of factors influencing how the survivors act in relation to one another as they created novel use practices and adapted the technology. Experience with the application seemed to open knowledge of computing as a potential source of strength. Finally, we articulate three design desiderata that could help promote communal spaces: make activity perceptible to the group, create appropriable steps, and build in fun choices.

Submitted 4 May, 2020; originally announced May 2020.

Comments: 14 pages, 1 figure. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems (CHI'20)

ACM Class: K.4

arXiv:1811.03541 [pdf, other]

Towards Connecting Experiences during Collocated Events through Data Mining and Visualization

Authors: Shuo Niu, D. Scott McCrickard, Steve Harrison

Abstract: Themed collocated events, such as conferences, workshops, and seminars, invite people with related life experiences to connect with each other. In this era when people record lives through the Internet, individual experiences exist in different forms of digital contents. People share digital life records during collocated events, such as sharing blogs they wrote, Twitter posts they forwarded, and books they have read. However, connecting experiences during collocated events are challenging. Not only one is blind to the large contents of others, identifying related experiential items depends on how well experiences are retrieved. The collection of personal contents from all participants forms a valuable group repository, from which connections between experiences can be mined. Visualizing same or related experiences inspire conversations and support social exchange. Common topics in group content also help participants generate new perspectives about the collocated group. Advances in machine learning and data visualization provide automated approaches to process large data and enable interactions with data repositories. This position paper promotes the idea of event mining: how to utilize state-of-the-art data processing and visualization techniques to design event mining systems for connecting experiences during collocated activities. We discuss empirical and constructive problems in this design space, and our preliminary study of deploying a tabletop-based system, BlogCloud, which supports experience re-visitation and exchange with machine-learning and data visualization.

Submitted 8 November, 2018; originally announced November 2018.

arXiv:1802.05534 [pdf, other]

Opportunity in Conflict: Understanding Tension Among Key Groups on the Trail

Authors: Lindah Kotut, Michael Horning, Steve Harrison, D. Scott McCrickard

Abstract: This paper examines the question of who technology users on the trail are, what their technological uses and needs are, and what conflicts exist between different trail users regarding technology use and experience, toward understanding how experiences of trail users contribute to designers. We argue that exploring these tensions provide opportunities for design that can be used to both mitigate conflicts and improve community on the trail.

Submitted 13 February, 2018; originally announced February 2018.

Comments: Workshop Paper Submitted to CHI HCI Outdoors: Understanding Human-Computer Interaction in the Outdoors (2018)

ACM Class: H.5.m

arXiv:1604.02368 [pdf]

doi 10.5121/ijnsa.2016.8204

A Security Evaluation Framework for U.K. E-Goverment Services Agile Software Development

Authors: Steve Harrison, Antonis Tzounis, Leandros A. Maglaras, Francois Siewe, Richard Smith, Helge Janicke

Abstract: This study examines the traditional approach to software development within the United Kingdom Government and the accreditation process. Initially we look at the Waterfall methodology that has been used for several years. We discuss the pros and cons of Waterfall before moving onto the Agile Scrum methodology. Agile has been adopted by the majority of Government digital departments including the Government Digital Services. Agile, despite its ability to achieve high rates of productivity organized in short, flexible, iterations, has faced security professionals disbelief when working within the U.K. Government. One of the major issues is that we develop in Agile but the accreditation process is conducted using Waterfall resulting in delays to go live dates. Taking a brief look into the accreditation process that is used within Government for I.T. systems and applications, we focus on giving the accreditor the assurance they need when developing new applications and systems. A framework has been produced by utilizing the Open Web Application Security Project (OWASP) Application Security Verification Standard (ASVS). This framework will allow security and Agile to work side by side and produce secure code.

Submitted 8 April, 2016; originally announced April 2016.

Comments: 19 pages, 4 figures, International Journal of Network Security & Its Applications (IJNSA) Vol.8, No.2, March 2016

arXiv:1209.3315 [pdf, ps, other]

Storage Workload Modelling by Hidden Markov Models: Application to FLASH Memory

Authors: P. G. Harrison, S. K. Harrison, N. M. Patel, S. Zertal

Abstract: A workload analysis technique is presented that processes data from operation type traces and creates a Hidden Markov Model (HMM) to represent the workload that generated those traces. The HMM can be used to create representative traces for performance models, such as simulators, avoiding the need to repeatedly acquire suitable traces. It can also be used to estimate directly the transition probabilities and rates of a Markov modulated arrival process, for use as input to an analytical performance model of Flash memory. The HMMs obtained from industrial workloads are validated by comparing their autocorrelation functions and other statistics with those of the corresponding monitored time series. Further, the performance model applications are illustrated by numerical examples.

Submitted 14 September, 2012; originally announced September 2012.

Comments: 29 pages, 18 figures

Journal ref: Performance Evaluation 69, 1 (2012), 17-40

arXiv:cs/0612081 [pdf, ps, other]

Personal Information Ecosystems and Implications for Design

Authors: Manas Tungare, Pardha S. Pyla, Manuel Pérez-Quiñones, Steve Harrison

Abstract: Today, people use multiple devices to fulfill their information needs. However, designers design each device individually, without accounting for the other devices that users may also use. In many cases, the applications on all these devices are designed to be functional replicates of each other. We argue that this results in an over-reliance on data synchronization across devices, version control nightmares, and increased burden of file management. In this paper, we present the idea of a personal information ecosystem, an analogy to biological ecosystems, which allows us to discuss the inter-relationships among these devices to fulfill the information needs of the user. There is a need for designers to design devices as part of a complete ecosystem, not as independent devices that simply share data replicated across them. To help us understand this domain and to facilitate the dialogue and study of such systems, we present the terminology, classifications of the interdependencies among different devices, and resulting implications for design.

Submitted 18 December, 2006; originally announced December 2006.

Showing 1–10 of 10 results for author: Harrison, S