doi 10.1145/3613904.3642749

A Canary in the AI Coal Mine: American Jews May Be Disproportionately Harmed by Intellectual Property Dispossession in Large Language Model Training

Authors: Heila Precel, Allison McDonald, Brent Hecht, Nicholas Vincent

Abstract: Systemic property dispossession from minority groups has often been carried out in the name of technological progress. In this paper, we identify evidence that the current paradigm of large language models (LLMs) likely continues this long history. Examining common LLM training datasets, we find that a disproportionate amount of content authored by Jewish Americans is used for training without their consent. The degree of over-representation ranges from around 2x to around 6.5x. Given that LLMs may substitute for the paid labor of those who produced their training data, they have the potential to cause even more substantial and disproportionate economic harm to Jewish Americans in the coming years. This paper focuses on Jewish Americans as a case study, but it is probable that other minority communities (e.g., Asian Americans, Hindu Americans) may be similarly affected and, most importantly, the results should likely be interpreted as a "canary in the coal mine" that highlights deep structural concerns about the current LLM paradigm whose harms could soon affect nearly everyone. We discuss the implications of these results for the policymakers thinking about how to regulate LLMs as well as for those in the AI field who are working to advance LLMs. Our findings stress the importance of working together towards alternative LLM paradigms that avoid both disparate impacts and widespread societal harms.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Preprint, to appear in CHI 2024 proceedings

arXiv:2403.12388 [pdf, other]

Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models

Authors: Ying-Chun Lin, Jennifer Neville, Jack W. Stokes, Longqi Yang, Tara Safavi, Mengting Wan, Scott Counts, Siddharth Suri, Reid Andersen, Xiaofeng Xu, Deepak Gupta, Sujay Kumar Jauhar, Xia Song, Georg Buscher, Saurabh Tiwary, Brent Hecht, Jaime Teevan

Abstract: Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.

Submitted 8 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2305.13238 [pdf]

doi 10.1145/3593013.3594070

The Dimensions of Data Labor: A Road Map for Researchers, Activists, and Policymakers to Empower Data Producers

Authors: Hanlin Li, Nicholas Vincent, Stevie Chancellor, Brent Hecht

Abstract: Many recent technological advances (e.g. ChatGPT and search engines) are possible only because of massive amounts of user-generated data produced through user interactions with computing systems or scraped from the web (e.g. behavior logs, user-generated content, and artwork). However, data producers have little say in what data is captured, how it is used, or who it benefits. Organizations with the ability to access and process this data, e.g. OpenAI and Google, possess immense power in shaping the technology landscape. By synthesizing related literature that reconceptualizes the production of data for computing as "data labor", we outline opportunities for researchers, policymakers, and activists to empower data producers in their relationship with tech companies, e.g. advocating for transparency about data reuse, creating feedback channels between data producers and companies, and potentially developing mechanisms to share data's revenue more broadly. In doing so, we characterize data labor with six important dimensions - legibility, end-use awareness, collaboration requirement, openness, replaceability, and livelihood overlap - based on the parallels between data labor and various other types of labor in the computing literature.

Submitted 22 May, 2023; originally announced May 2023.

Comments: To appear at the 2023 ACM Conference on Fairness, Accountability, and Transparency (ACM FAccT)

arXiv:2207.04049 [pdf, other]

Learning Causal Effects on Hypergraphs

Authors: Jing Ma, Mengting Wan, Longqi Yang, Jundong Li, Brent Hecht, Jaime Teevan

Abstract: Hypergraphs provide an effective abstraction for modeling multi-way group interactions among nodes, where each hyperedge can connect any number of nodes. Different from most existing studies which leverage statistical dependencies, we study hypergraphs from the perspective of causality. Specifically, in this paper, we focus on the problem of individual treatment effect (ITE) estimation on hypergraphs, aiming to estimate how much an intervention (e.g., wearing a face covering) would causally affect an outcome (e.g., COVID-19 infection) of each individual node. Existing works on ITE estimation either assume that the outcome on one individual should not be influenced by the treatment assignments on other individuals (i.e., no interference), or assume the interference only exists between pairs of connected individuals in an ordinary graph. We argue that these assumptions can be unrealistic on real-world hypergraphs, where higher-order interference can affect the ultimate ITE estimations due to the presence of group interactions. In this work, we investigate high-order interference modeling, and propose a new causality learning framework powered by hypergraph neural networks. Extensive experiments on real-world hypergraphs verify the superiority of our framework over existing baselines.

Submitted 7 July, 2022; originally announced July 2022.

arXiv:2205.14529 [pdf]

All That's Happening behind the Scenes: Putting the Spotlight on Volunteer Moderator Labor in Reddit

Authors: Hanlin Li, Brent Hecht, Stevie Chancellor

Abstract: Online volunteers are an uncompensated yet valuable labor force for many social platforms. For example, volunteer content moderators perform a vast amount of labor to maintain online communities. However, as social platforms like Reddit favor revenue generation and user engagement, moderators are under-supported to manage the expansion of online communities. To preserve these online communities, developers and researchers of social platforms must account for and support as much of this labor as possible. In this paper, we quantitatively characterize the publicly visible and invisible actions taken by moderators on Reddit, using a unique dataset of private moderator logs for 126 subreddits and over 900 moderators. Our analysis of this dataset reveals the heterogeneity of moderation work across both communities and moderators. Moreover, we find that analyzing only visible work - the dominant way that moderation work has been studied thus far - drastically underestimates the amount of human moderation labor on a subreddit. We discuss the implications of our results on content moderation research and social platforms.

Submitted 5 June, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: This is a preprint. The paper will be presented at the 2022 International Conference on Web and Social Media (ICWSM'22)

arXiv:2205.14528 [pdf]

Measuring the Monetary Value of Online Volunteer Work

Authors: Hanlin Li, Brent Hecht, Stevie Chancellor

Abstract: Online volunteers are a crucial labor force that keeps many for-profit systems afloat (e.g. social media platforms and online review sites). Despite their substantial role in upholding highly valuable technological systems, online volunteers have no way of knowing the value of their work. This paper uses content moderation as a case study and measures its monetary value to make apparent volunteer labor's value. Using a novel dataset of private logs generated by moderators, we use linear mixed-effect regression and estimate that Reddit moderators worked a minimum of 466 hours per day in 2020. These hours amount to 3.4 million USD a year based on the median hourly wage for comparable content moderation services in the U.S. We discuss how this information may inform pathways to alleviate the one-sided relationship between technology companies and online volunteers.

Submitted 5 June, 2022; v1 submitted 28 May, 2022; originally announced May 2022.

Comments: This is a preprint. The paper will be presented at the 2022 International Conference on Web and Social Media (ICWSM'22)

arXiv:2112.09544 [pdf]

It's Time to Do Something: Mitigating the Negative Impacts of Computing Through a Change to the Peer Review Process

Authors: Brent Hecht, Lauren Wilcox, Jeffrey P. Bigham, Johannes Schöning, Ehsan Hoque, Jason Ernst, Yonatan Bisk, Luigi De Russis, Lana Yarosh, Bushra Anjum, Danish Contractor, Cathy Wu

Abstract: The computing research community needs to work much harder to address the downsides of our innovations. Between the erosion of privacy, threats to democracy, and automation's effect on employment (among many other issues), we can no longer simply assume that our research will have a net positive impact on the world. While bending the arc of computing innovation towards societal benefit may at first seem intractable, we believe we can achieve substantial progress with a straightforward step: making a small change to the peer review process. As we explain below, we hypothesize that our recommended change will force computing researchers to more deeply consider the negative impacts of their work. We also expect that this change will incentivize research and policy that alleviates computing's negative impacts.

Submitted 17 December, 2021; originally announced December 2021.

Comments: First published on the ACM Future of Computing Academy blog on March 29, 2018. This is the archival version

arXiv:2108.03350 [pdf, other]

doi 10.1145/1122445.1122456

Learning to Represent Human Motives for Goal-directed Web Browsing

Authors: Jyun-Yu Jiang, Chia-Jung Lee, Longqi Yang, Bahareh Sarrafzadeh, Brent Hecht, Jaime Teevan

Abstract: Motives or goals are recognized in psychology literature as the most fundamental drive that explains and predicts why people do what they do, including when they browse the web. Although providing enormous value, these higher-ordered goals are often unobserved, and little is known about how to leverage such goals to assist people's browsing activities. This paper proposes to take a new approach to address this problem, which is fulfilled through a novel neural framework, Goal-directed Web Browsing (GoWeB). We adopt a psychologically-sound taxonomy of higher-ordered goals and learn to build their representations in a structure-preserving manner. Then we incorporate the resulting representations for enhancing the experiences of common activities people perform on the web. Experiments on large-scale data from Microsoft Edge web browser show that GoWeB significantly outperforms competitive baselines for in-session web page recommendation, re-visitation classification, and goal-based web page grouping. A follow-up analysis further characterizes how the variety of human motives can affect the difference observed in human behavioral patterns.

Submitted 6 August, 2021; originally announced August 2021.

Comments: Accepted by RecSys 2021

arXiv:2101.11865 [pdf, other]

doi 10.1145/3411764.3445243

Large Scale Analysis of Multitasking Behavior During Remote Meetings

Authors: Hancheng Cao, Chia-Jung Lee, Shamsi Iqbal, Mary Czerwinski, Priscilla Wong, Sean Rintel, Brent Hecht, Jaime Teevan, Longqi Yang

Abstract: Virtual meetings are critical for remote work because of the need for synchronous collaboration in the absence of in-person interactions. In-meeting multitasking is closely linked to people's productivity and wellbeing. However, we currently have limited understanding of multitasking in remote meetings and its potential impact. In this paper, we present what we believe is the most comprehensive study of remote meeting multitasking behavior through an analysis of a large-scale telemetry dataset collected from February to May 2020 of U.S. Microsoft employees and a 715-person diary study. Our results demonstrate that intrinsic meeting characteristics such as size, length, time, and type, significantly correlate with the extent to which people multitask, and multitasking can lead to both positive and negative outcomes. Our findings suggest important best-practice guidelines for remote meetings (e.g., avoid important meetings in the morning) and design implications for productivity tools (e.g., support positive remote multitasking).

Submitted 28 January, 2021; originally announced January 2021.

Comments: In ACM CHI 2021

arXiv:2012.09995 [pdf, other]

Data Leverage: A Framework for Empowering the Public in its Relationship with Technology Companies

Authors: Nicholas Vincent, Hanlin Li, Nicole Tilly, Stevie Chancellor, Brent Hecht

Abstract: Many powerful computing technologies rely on implicit and explicit data contributions from the public. This dependency suggests a potential source of leverage for the public in its relationship with technology companies: by reducing, stopping, redirecting, or otherwise manipulating data contributions, the public can reduce the effectiveness of many lucrative technologies. In this paper, we synthesize emerging research that seeks to better understand and help people action this "data leverage". Drawing on prior work in areas including machine learning, human-computer interaction, and fairness and accountability in computing, we present a framework for understanding data leverage that highlights new opportunities to change technology company behavior related to privacy, economic inequality, content moderation and other areas of societal concern. Our framework also points towards ways that policymakers can bolster data leverage as a means of changing the balance of power between the public and tech companies.

Submitted 17 February, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: This is a preprint. The paper will be presented at the 2021 Conference on Fairness, Accountability, and Transparency (FAccT 2021)

arXiv:2011.03116 [pdf, ps, other]

doi 10.1145/3531146.3533143

Behavioral Use Licensing for Responsible AI

Authors: Danish Contractor, Daniel McDuff, Julia Haines, Jenny Lee, Christopher Hines, Brent Hecht, Nicholas Vincent, Hanlin Li

Abstract: With the growing reliance on artificial intelligence (AI) for many different applications, the sharing of code, data, and models is important to ensure the replicability and democratization of scientific knowledge. Many high-profile academic publishing venues expect code and models to be submitted and released with papers. Furthermore, developers often want to release these assets to encourage development of technology that leverages their frameworks and services. A number of organizations have expressed concerns about the inappropriate or irresponsible use of AI and have proposed ethical guidelines around the application of such systems. While such guidelines can help set norms and shape policy, they are not easily enforceable. In this paper, we advocate the use of licensing to enable legally enforceable behavioral use conditions on software and code and provide several case studies that demonstrate the feasibility of behavioral use licensing. We envision how licensing may be implemented in accordance with existing responsible AI guidelines.

Submitted 20 October, 2022; v1 submitted 4 November, 2020; originally announced November 2020.

Comments: Paper published at ACM FAccT 2022

arXiv:2007.15584 [pdf]

doi 10.1038/s41562-021-01196-4

How Work From Home Affects Collaboration: A Large-Scale Study of Information Workers in a Natural Experiment During COVID-19

Authors: Longqi Yang, Sonia Jaffe, David Holtz, Siddharth Suri, Shilpi Sinha, Jeffrey Weston, Connor Joyce, Neha Shah, Kevin Sherman, CJ Lee, Brent Hecht, Jaime Teevan

Abstract: The COVID-19 pandemic has had a wide-ranging impact on information workers such as higher stress levels, increased workloads, new workstreams, and more caregiving responsibilities during lockdown. COVID-19 also caused the overwhelming majority of information workers to rapidly shift to working from home (WFH). The central question this work addresses is: can we isolate the effects of WFH on information workers' collaboration activities from all other factors, especially the other effects of COVID-19? This is important because in the future, WFH will likely be more common than it was prior to the pandemic. We use difference-in-differences (DiD), a causal identification strategy commonly used in the social sciences, to control for unobserved confounding factors and estimate the causal effect of WFH. Our analysis relies on measuring the difference in changes between those who WFH prior to COVID-19 and those who did not. Our preliminary results suggest that on average, people spent more time on collaboration in April (Post WFH mandate) than in February (Pre WFH mandate), but this is primarily due to factors other than WFH, such as lockdowns during the pandemic. The change attributable to WFH specifically is in the opposite direction: less time on collaboration and more focus time. This reversal shows the importance of using causal inference: a simple analysis would have resulted in the wrong conclusion. We further find that the effect of WFH is moderated by individual remote collaboration experience prior to WFH. Meanwhile, the medium for collaboration has also shifted due to WFH: instant messages were used more, whereas scheduled meetings were used less. We discuss design implications -- how future WFH may affect focused work, collaborative work, and creative work.

Submitted 30 July, 2020; originally announced July 2020.

Journal ref: Nature Human Behaviour (2021)

arXiv:2006.03196 [pdf, other]

Towards Better Driver Safety: Empowering Personal Navigation Technologies with Road Safety Awareness

Authors: Runsheng Xu, Shibo Zhang, Yue Zhao, Peixi Xiong, Allen Yilun Lin, Brent Hecht, Jiaqi Ma

Abstract: Recent research has found that navigation systems usually assume that all roads are equally safe, directing drivers to dangerous routes, which has led to catastrophic consequences. To address this problem, this paper aims to begin the process of adding road safety awareness to navigation systems. To do so, we first created a definition for road safety that navigation systems can easily understand by adapting well-established safety standards from transportation studies. Based on this road safety definition, we then developed a machine learning-based road safety classifier that predicts the safety level for road segments using a diverse feature set constructed only from large-scale publicly available geographic data. Evaluations in four different countries show that our road safety classifier achieves satisfactory performance. Finally, we discuss the factors to consider when extending our road safety classifier to other regions and potential new safety designs enabled by our road safety predictions.

Submitted 5 December, 2021; v1 submitted 4 June, 2020; originally announced June 2020.

Comments: Submitted to Autonomous Intelligent System Journal

arXiv:2004.10265 [pdf]

A Deeper Investigation of the Importance of Wikipedia Links to the Success of Search Engines

Authors: Nicholas Vincent, Brent Hecht

Abstract: A growing body of work has highlighted the important role that Wikipedia's volunteer-created content plays in helping search engines achieve their core goal of addressing the information needs of millions of people. In this paper, we report the results of an investigation into the incidence of Wikipedia links in search engine results pages (SERPs). Our results extend prior work by considering three U.S. search engines, simulating both mobile and desktop devices, and using a spatial analysis approach designed to study modern SERPs that are no longer just "ten blue links". We find that Wikipedia links are extremely common in important search contexts, appearing in 67-84% of all SERPs for common and trending queries, but less often for medical queries. Furthermore, we observe that Wikipedia links often appear in "Knowledge Panel" SERP elements and are in positions visible to users without scrolling, although Wikipedia appears less in prominent positions on mobile devices. Our findings reinforce the complementary notions that (1) Wikipedia content and research has major impact outside of the Wikipedia domain and (2) powerful technologies like search engines are highly reliant on free content created by volunteers.

Submitted 21 April, 2020; originally announced April 2020.

Comments: This is a pre-print of a paper accepted to the non-archival track of the WikiWorkshop at the Web Conference 2020

arXiv:1912.00757 [pdf]

Mapping the Potential and Pitfalls of "Data Dividends" as a Means of Sharing the Profits of Artificial Intelligence

Authors: Nicholas Vincent, Yichun Li, Renee Zha, Brent Hecht

Abstract: Identifying strategies to more broadly distribute the economic winnings of AI technologies is a growing priority in HCI and other fields. One idea gaining prominence centers on "data dividends", or sharing the profits of AI technologies with the people who generated the data on which these technologies rely. Despite the rapidly growing discussion around data dividends - including backing by prominent politicians - there exists little guidance about how data dividends might be designed and little information about whether they will work. In this paper, we begin the process of developing a concrete design space for data dividends. We additionally simulate the effects of a variety of important design decisions using well-known datasets and algorithms. We find that seemingly innocuous decisions can create counterproductive effects, e.g. severely concentrated dividends and demographic disparities. Overall, the outcomes we observe -- both desirable and undesirable -- highlight the need for dividend implementers to make design decisions cautiously.

Submitted 18 November, 2019; originally announced December 2019.

Comments: This is a working draft. It has not been peer-reviewed and is intended for internal discussion in the computing community

arXiv:1908.10954 [pdf]

doi 10.1145/2858036.2858123

Not at Home on the Range: Peer Production and the Urban/Rural Divide

Authors: Isaac Johnson, Allen Yilun Lin, Toby Jia-Jun Li, Andrew Hall, Aaron Halfaker, Johannes Schöning, Brent Hecht

Abstract: Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. In this paper, we explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality, is less likely to have been produced by contributors who focus on the local area, and is more likely to have been generated by automated software agents (i.e. bots). We then codify the systemic challenges inherent to characterizing rural phenomena through peer production and discuss potential solutions.

Submitted 28 August, 2019; originally announced August 2019.

Comments: 10 pages, published on CHI'16

ACM Class: H.5.m

Journal ref: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems

arXiv:1906.08576 [pdf]

Measuring the Importance of User-Generated Content to Search Engines

Authors: Nicholas Vincent, Isaac Johnson, Patrick Sheehan, Brent Hecht

Abstract: Search engines are some of the most popular and profitable intelligent technologies in existence. Recent research, however, has suggested that search engines may be surprisingly dependent on user-created content like Wikipedia articles to address user information needs. In this paper, we perform a rigorous audit of the extent to which Google leverages Wikipedia and other user-generated content to respond to queries. Analyzing results for six types of important queries (e.g. most popular, trending, expensive advertising), we observe that Wikipedia appears in over 80% of results pages for some query types and is by far the most prevalent individual content source across all query types. More generally, our results provide empirical information to inform a nascent but rapidly-growing debate surrounding a highly-consequential question: Do users provide enough value to intelligent technologies that they should receive more of the economic benefits from intelligent technologies?

Submitted 20 June, 2019; originally announced June 2019.

Comments: This version includes a bibliography entry that was missing from the first version of the text due to a processing error. This is a preprint of a paper accepted at ICWSM 2019. Please cite that version instead.

arXiv:1904.01694 [pdf, other]

doi 10.1145/3098279.3098529

Pharos: improving navigation instructions on smartwatches by including global landmarks

Authors: N. Wenig, D. Wenig, S. Ernst, R. Malaka, B. Hecht, J. Schöning

Abstract: Landmark-based navigation systems have proven benefits relative to traditional turn-by-turn systems that use street names and distances. However, one obstacle to the implementation of landmark-based navigation systems is the complex challenge of selecting salient local landmarks at each decision point for each user. In this paper, we present Pharos, a novel system that extends turn-by-turn navigation instructions using a single global landmark (e.g. the Eiffel Tower, the Burj Khalifa, municipal TV towers) rather than multiple, hard-to-select local landmarks. We first show that our approach is feasible in a large number of cities around the world through the use of computer vision to select global landmarks. We then present the results of a study demonstrating that by including global landmarks in navigation instructions, users navigate more confidently and build a more accurate mental map of the navigated area than using turn-by-turn instructions.

Submitted 2 April, 2019; originally announced April 2019.

Comments: MobileHCI 2017 Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services

arXiv:1904.01689 [pdf]

doi 10.1145/1753326.1753370

The Tower of Babel Meets Web 2.0: User-Generated Content and its Applications in a Multilingual Context

Authors: B. Hecht, D. Gergle

Abstract: This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 different Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explicating how knowledge diversity can be beneficially leveraged to create "culturally-aware applications" and "hyperlingual applications".

Submitted 2 April, 2019; originally announced April 2019.

Comments: CHI 2010 Proceedings of the SIGCHI Conference on Human Factors in Computing Systems

arXiv:1904.01675 [pdf]

doi 10.1145/2666310.2666396

SubwayPS: Towards Enabling Smartphone Positioning in Underground Public Transportation Systems

Authors: T. Stockx, B. Hecht, J. Schöning

Abstract: Thanks to rapid advances in technologies like GPS and Wi-Fi positioning, smartphone users are able to determine their location almost everywhere they go. This is not true, however, of people who are traveling in underground public transportation networks, one of the few types of high-traffic areas where smartphones do not have access to accurate position information. In this paper, we introduce the problem of underground transport positioning on smartphones and present SubwayPS, an accelerometer-based positioning technique that allows smartphones to determine their location substantially better than baseline approaches, even deep beneath city streets. We highlight several immediate applications of positioning in subway networks in domains ranging from mobile advertising to mobile maps and present MetroNavigator, a proof-of-concept smartphone and smartwatch app that notifies users of upcoming points-of-interest and alerts them when it is time to get ready to exit the train.

Submitted 2 April, 2019; originally announced April 2019.

Comments: Proceedings of the ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems 2014 (ACM SIGSPATIAL 2014)

arXiv:1904.01673 [pdf]

doi 10.1145/2858036.2858053

Helping Computers Understand Geographically-Bound Activity Restrictions

Authors: M. Soll, P. Naumann, J. Schöning, P. Samsonov, B. Hecht

Abstract: The lack of certain types of geographic data prevents the development of location-aware technologies in a number of important domains. One such type of "unmapped" geographic data is space usage rules (SURs), which are defined as geographically-bound activity restrictions (e.g. "no dogs", "no smoking", "no fishing", "no skateboarding"). Researchers in the area of human-computer interaction have recently begun to develop techniques for the automated mapping of SURs with the aim of supporting activity planning systems (e.g. one-touch "Can I Smoke Here?" apps, SUR-aware vacation planning tools). In this paper, we present a novel SUR mapping technique - SPtP - that outperforms state-of-the-art approaches by 30% for one of the most important components of the SUR mapping pipeline: associating a point observation of a SUR (e.g. a 'no smoking' sign) with the corresponding polygon in which the SUR applies (e.g. the nearby park or the entire campus on which the sign is located). This paper also contributes a series of new SUR benchmark datasets to help further research in this area.

Submitted 2 April, 2019; originally announced April 2019.

Journal ref: Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2016)

arXiv:1904.01672 [pdf]

doi 10.1145/1378773.1378790

Improving Interaction with Virtual Globes through Spatial Thinking: Helping Users Ask "Why?"

Authors: J. Schöning, B. Hecht, M. Raubal, A. Krüger, M. Marsh, M. Rohs

Abstract: Virtual globes have progressed from little-known technology to broadly popular software in a mere few years. We investigated this phenomenon through a survey and discovered that, while virtual globes are en vogue, their use is restricted to a small set of tasks so simple that they do not involve any spatial thinking. Spatial thinking requires that users ask "what is where" and "why"; the most common virtual globe tasks only include the "what". Based on the results of this survey, we have developed a multi-touch virtual globe derived from an adapted virtual globe paradigm designed to widen the potential uses of the technology by helping its users to inquire about both the "what is where" and "why" of spatial distribution. We do not seek to provide users with full GIS (geographic information system) functionality, but rather we aim to facilitate the asking and answering of simple "why" questions about general topics that appeal to a wide virtual globe user base.

Submitted 2 April, 2019; originally announced April 2019.

Comments: Proceedings of the International Conference on Intelligent User Interfaces (IUI 2008)

arXiv:1903.12041 [pdf]

doi 10.1145/3025453.3025495

The Geography of Pokémon GO: Beneficial and Problematic Effects on Places and Movement

Authors: Ashley Colley, Jacob Thebault-Spieker, Allen Yilun Lin, Donald Degraen, Benjamin Fischman, Jonna Häkkilä, Kate Kuehl, Valentina Nisi, Nuno Jardim Nunes, Nina Wenig, Dirk Wenig, Brent Hecht, Johannes Schöning

Abstract: The widespread popularity of Pokémon GO presents the first opportunity to observe the geographic effects of location-based gaming at scale. This paper reports the results of a mixed methods study of the geography of Pokémon GO that includes a five-country field survey of 375 Pokémon GO players and a large scale geostatistical analysis of game elements. Focusing on the key geographic themes of places and movement, we find that the design of Pokémon GO reinforces existing geographically-linked biases (e.g. the game advantages urban areas and neighborhoods with smaller minority populations), that Pokémon GO may have instigated a relatively rare large-scale shift in global human mobility patterns, and that Pokémon GO has geographically-linked safety risks, but not those typically emphasized by the media. Our results point to geographic design implications for future systems in this space such as a means through which the geographic biases present in Pokémon GO may be counteracted.

Submitted 28 March, 2019; originally announced March 2019.

Comments: This version of the paper contains a fix for a reference issue that appeared in the original version. Proceedings of the 35th Annual ACM Conference on Human Factors in Computing Systems (CHI 2017)

ACM Class: H.5.m

Showing 1–23 of 23 results for author: Hecht, B