arXiv:2404.11988 [pdf, other]

The Emerging AI Divide in the United States

Authors: Madeleine I. G. Daepp, Scott Counts

Abstract: The digital divide describes disparities in access to and usage of digital tooling between social and economic groups. Emerging generative artificial intelligence tools, which strongly affect productivity, could magnify the impact of these divides. However, the affordability, multi-modality, and multilingual capabilities of these tools could also make them more accessible to diverse users in comparison with previous forms of digital tooling. In this study, we characterize spatial differences in U.S. residents' knowledge of a new generative AI tool, ChatGPT, through an analysis of state- and county-level search query data. In the first six months after the tool's release, we observe the highest rates of users searching for ChatGPT in West Coast states and persistently low rates of search in Appalachian and Gulf states. Counties with the highest rates of search are relatively more urbanized and have proportionally more educated, more economically advantaged, and more Asian residents in comparison with other counties or with the U.S. average. In multilevel models adjusting for socioeconomic and demographic factors as well as industry makeup, education is the strongest positive predictor of rates of search for generative AI tooling. Although generative AI technologies may be novel, early differences in uptake appear to be following familiar paths of digital marginalization.

Submitted 30 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

ACM Class: K.4.2

arXiv:2404.04268 [pdf]

The Use of Generative Search Engines for Knowledge Work and Complex Tasks

Authors: Siddharth Suri, Scott Counts, Leijie Wang, Chacha Chen, Mengting Wan, Tara Safavi, Jennifer Neville, Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Sathish Manivannan, Nagu Rangan, Longqi Yang

Abstract: Until recently, search engines were the predominant method for people to access online information. The recent emergence of large language models (LLMs) has given machines new capabilities such as the ability to generate new digital artifacts like text, images, code etc., resulting in a new tool, a generative search engine, which combines the capabilities of LLMs with a traditional search engine. Through the empirical analysis of Bing Copilot (Bing Chat), one of the first publicly available generative search engines, we analyze the types and complexity of tasks that people use Bing Copilot for compared to Bing Search. Findings indicate that people use the generative search engine for more knowledge work tasks that are higher in cognitive complexity than were commonly done with a traditional search engine.

Submitted 19 March, 2024; originally announced April 2024.

Comments: 32 pages, 3 figures, 4 tables

ACM Class: J.4

arXiv:2403.12388 [pdf, other]

Interpretable User Satisfaction Estimation for Conversational Systems with Large Language Models

Authors: Ying-Chun Lin, Jennifer Neville, Jack W. Stokes, Longqi Yang, Tara Safavi, Mengting Wan, Scott Counts, Siddharth Suri, Reid Andersen, Xiaofeng Xu, Deepak Gupta, Sujay Kumar Jauhar, Xia Song, Georg Buscher, Saurabh Tiwary, Brent Hecht, Jaime Teevan

Abstract: Accurate and interpretable user satisfaction estimation (USE) is critical for understanding, evaluating, and continuously improving conversational systems. Users express their satisfaction or dissatisfaction with diverse conversational patterns in both general-purpose (ChatGPT and Bing Copilot) and task-oriented (customer service chatbot) conversational systems. Existing approaches based on featurized ML models or text embeddings fall short in extracting generalizable patterns and are hard to interpret. In this work, we show that LLMs can extract interpretable signals of user satisfaction from their natural language utterances more effectively than embedding-based approaches. Moreover, an LLM can be tailored for USE via an iterative prompting framework using supervision from labeled examples. The resulting method, Supervised Prompting for User satisfaction Rubrics (SPUR), not only has higher accuracy but is more interpretable as it scores user satisfaction via learned rubrics with a detailed breakdown.

Submitted 8 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.12173 [pdf, other]

TnT-LLM: Text Mining at Scale with Large Language Models

Authors: Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan

Abstract: Transforming unstructured text into structured and meaningful forms, organized by useful category labels, is a fundamental step in text mining for downstream analysis and application. However, most existing methods for producing label taxonomies and building text-based label classifiers still rely heavily on domain expertise and manual curation, making the process expensive and time-consuming. This is particularly challenging when the label space is under-specified and large-scale data annotations are unavailable. In this paper, we address these challenges with Large Language Models (LLMs), whose prompt-based interface facilitates the induction and use of large-scale pseudo labels. We propose TnT-LLM, a two-phase framework that employs LLMs to automate the process of end-to-end label generation and assignment with minimal human effort for any given use-case. In the first phase, we introduce a zero-shot, multi-stage reasoning approach which enables LLMs to produce and refine a label taxonomy iteratively. In the second phase, LLMs are used as data labelers that yield training samples so that lightweight supervised classifiers can be reliably built, deployed, and served at scale. We apply TnT-LLM to the analysis of user intent and conversational domain for Bing Copilot (formerly Bing Chat), an open-domain chat-based search engine. Extensive experiments using both human and automatic evaluation metrics demonstrate that TnT-LLM generates more accurate and relevant label taxonomies when compared against state-of-the-art baselines, and achieves a favorable balance between accuracy and efficiency for classification at scale. We also share our practical experiences and insights on the challenges and opportunities of using LLMs for large-scale text mining in real-world applications.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 9 pages main content, 8 pages references and appendix

arXiv:2309.13063 [pdf, other]

Using Large Language Models to Generate, Validate, and Apply User Intent Taxonomies

Authors: Chirag Shah, Ryen W. White, Reid Andersen, Georg Buscher, Scott Counts, Sarkar Snigdha Sarathi Das, Ali Montazer, Sathish Manivannan, Jennifer Neville, Xiaochuan Ni, Nagu Rangan, Tara Safavi, Siddharth Suri, Mengting Wan, Leijie Wang, Longqi Yang

Abstract: Log data can reveal valuable information about how users interact with Web search services, what they want, and how satisfied they are. However, analyzing user intents in log data is not easy, especially for emerging forms of Web search such as AI-driven chat. To understand user intents from log data, we need a way to label them with meaningful categories that capture their diversity and dynamics. Existing methods rely on manual or machine-learned labeling, which are either expensive or inflexible for large and dynamic datasets. We propose a novel solution using large language models (LLMs), which can generate rich and relevant concepts, descriptions, and examples for user intents. However, using LLMs to generate a user intent taxonomy and apply it for log analysis can be problematic for two main reasons: (1) such a taxonomy is not externally validated; and (2) there may be an undesirable feedback loop. To address this, we propose a new methodology with human experts and assessors to verify the quality of the LLM-generated taxonomy. We also present an end-to-end pipeline that uses an LLM with human-in-the-loop to produce, refine, and apply labels for user intent analysis in log data. We demonstrate its effectiveness by uncovering new insights into user intents from search and chat logs from the Microsoft Bing commercial search engine. The proposed work's novelty stems from the method for generating purpose-driven user intent taxonomies with strong validation. This method not only helps remove methodological and practical bottlenecks from intent-focused research, but also provides a new framework for generating, validating, and applying other kinds of taxonomies in a scalable and adaptable way with reasonable human effort.

Submitted 9 May, 2024; v1 submitted 14 September, 2023; originally announced September 2023.

Report number: MSR-TR-2023-32

arXiv:1709.03441 [pdf, ps, other]

The Diverse Cohort Selection Problem

Authors: Candice Schumann, Samsara N. Counts, Jeffrey S. Foster, John P. Dickerson

Abstract: How should a firm allocate its limited interviewing resources to select the optimal cohort of new employees from a large set of job applicants? How should that firm allocate cheap but noisy resume screenings and expensive but in-depth in-person interviews? We view this problem through the lens of combinatorial pure exploration (CPE) in the multi-armed bandit setting, where a central learning agent performs costly exploration of a set of arms before selecting a final subset with some combinatorial structure. We generalize a recent CPE algorithm to the setting where arm pulls can have different costs and return different levels of information. We then prove theoretical upper bounds for a general class of arm-pulling strategies in this new setting. We apply our general algorithm to a real-world problem with combinatorial structure: incorporating diversity into university admissions. We take real data from admissions at one of the largest US-based computer science graduate programs and show that a simulation of our algorithm produces a cohort with higher overall utility while spending comparable budget to the current admissions process at that university.

Submitted 14 March, 2019; v1 submitted 11 September, 2017; originally announced September 2017.

arXiv:1605.08844 [pdf]

doi 10.1145/2486227.2486249

Smart Societies: From Citizens as Sensors to Collective Action

Authors: Andrés Monroy-Hernández, Shelly Farnham, Emre Kıcıman, Scott Counts, Munmun De Choudhury

Abstract: Social media has become globally ubiquitous, transforming how people are networked and mobilized. This forum explores research and applications of these new networked publics at individual, organizational, and societal levels.

Submitted 28 May, 2016; originally announced May 2016.

Journal ref: interactions 20, 4 (July 2013)

arXiv:1507.01291 [pdf]

doi 10.1145/2441776.2441938

The New War Correspondents: the Rise of Civic Media Curation in Urban Warfare

Authors: Andrés Monroy-Hernández, danah boyd, Emre Kiciman, Munmun De Choudhury, Scott Counts

Abstract: In this paper we examine the information sharing practices of people living in cities amid armed conflict. We describe the volume and frequency of microblogging activity on Twitter from four cities afflicted by the Mexican Drug War, showing how citizens use social media to alert one another and to comment on the violence that plagues their communities. We then investigate the emergence of civic media "curators," individuals who act as "war correspondents" by aggregating and disseminating information to large numbers of people on social media. We conclude by outlining the implications of our observations for the design of civic media systems in wartime.

Submitted 5 July, 2015; originally announced July 2015.

Comments: In Proceedings of the 2013 conference on Computer supported cooperative work (CSCW 2013). ACM, New York, NY, USA, 1443-1452

arXiv:1507.01290 [pdf]

Narcotweets: Social Media in Wartime

Authors: Andrés Monroy-Hernández, Emre Kiciman, Danah Boyd, Scott Counts

Abstract: This paper describes how people living in armed conflict environments use social media as a participatory news platform, in lieu of damaged state and media apparatuses. We investigate this by analyzing the microblogging practices of Mexican citizens whose everyday life is affected by the Drug War. We provide a descriptive analysis of the phenomenon, combining content and quantitative Twitter data analyses. We focus on three interrelated phenomena: general participation patterns of ordinary citizens, the emergence and role of information curators, and the tension between governmental regulation and drug cartel intimidation. This study reveals the complex tensions among citizens, media actors, and the government in light of large scale organized crime.

Submitted 5 July, 2015; originally announced July 2015.

Comments: In Proceedings of the 2012 International AAAI Conference on Weblogs and Social Media

arXiv:1112.1051 [pdf, other]

Predicting Financial Markets: Comparing Survey, News, Twitter and Search Engine Data

Authors: Huina Mao, Scott Counts, Johan Bollen

Abstract: Financial market prediction on the basis of online sentiment tracking has drawn a lot of attention recently. However, most results in this emerging domain rely on a unique, particular combination of data sets and sentiment tracking tools. This makes it difficult to disambiguate measurement and instrument effects from factors that are actually involved in the apparent relation between online sentiment and market values. In this paper, we survey a range of online data sets (Twitter feeds, news headlines, and volumes of Google search queries) and sentiment tracking methods (Twitter Investor Sentiment, Negative News Sentiment and Tweet & Google Search volumes of financial terms), and compare their value for financial prediction of market indices such as the Dow Jones Industrial Average, trading volumes, and market volatility (VIX), as well as gold prices. We also compare the predictive power of traditional investor sentiment survey data, i.e. Investor Intelligence and Daily Sentiment Index, against those of the mentioned set of online sentiment indicators. Our results show that traditional surveys of Investor Intelligence are lagging indicators of the financial markets. However, weekly Google Insight Search volumes on financial search queries do have predictive value. An indicator of Twitter Investor Sentiment and the frequency of occurrence of financial terms on Twitter in the previous 1-2 days are also found to be very statistically significant predictors of daily market log return. Survey sentiment indicators are however found not to be statistically significant predictors of financial market values, once we control for all other mood indicators as well as the VIX.

Submitted 5 December, 2011; originally announced December 2011.

Comments: This paper includes 10 pages, 6 figures and 10 tables

Showing 1–10 of 10 results for author: Counts, S