
Showing 1–31 of 31 results for author: Johnson, I

  1. The Potential and Implications of Generative AI on HCI Education

    Authors: Ahmed Kharrufa, Ian G Johnson

    Abstract: Generative AI (GAI) is impacting teaching and learning directly or indirectly across a range of subjects and disciplines. As educators, we need to understand the potential and limitations of AI in HCI education and ensure our graduating HCI students are aware of the potential and limitations of AI in HCI. In this paper, we report on the main pedagogical insights gained from the inclusion of genera…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 14 pages, 2 figures, to be published at EduCHI 2024 The 6th Annual Symposium on HCI Education, June 2024, New York, NY

  2. Design Implications for a Social and Collaborative Understanding of online Information Assessment Practices, Challenges and Heuristics

    Authors: Vasilis Vlachokyriakos, Ian G. Johnson, Robert Anderson, Caroline Claisse, Viana Zhang, Pamela Briggs

    Abstract: The broader adoption of social media platforms (e.g., TikTok), combined with recent developments in Generative AI (GAI) technologies has had a transformative effect on many people's ability to confidently assess the veracity and meaning of information online. In this paper, building on recent related work that surfaced the social ways that young people evaluate information online, we explore the d…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: To be published in Proceedings of ECSCW 2024, Rimini, Italy

  3. arXiv:2404.09764  [pdf, other]

    cs.CY

    Language-Agnostic Modeling of Wikipedia Articles for Content Quality Assessment across Languages

    Authors: Paramita Das, Isaac Johnson, Diego Saez-Trumper, Pablo Aragón

    Abstract: Wikipedia is the largest web repository of free knowledge. Volunteer editors devote time and effort to creating and expanding articles in more than 300 language editions. As content quality varies from article to article, editors also spend substantial time rating articles with specific criteria. However, keeping these assessments complete and up-to-date is largely impossible given the ever-changi…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted at ICWSM-24

  4. arXiv:2404.03428  [pdf, other]

    cs.CL

    Edisum: Summarizing and Explaining Wikipedia Edits at Scale

    Authors: Marija Šakota, Isaac Johnson, Guosheng Feng, Robert West

    Abstract: An edit summary is a succinct comment written by a Wikipedia editor explaining the nature of, and reasons for, an edit to a Wikipedia page. Edit summaries are crucial for maintaining the encyclopedia: they are the first thing seen by content moderators and help them decide whether to accept or reject an edit. Additionally, edit summaries constitute a valuable data source for researchers. Unfortuna…

    Submitted 4 April, 2024; originally announced April 2024.

  5. arXiv:2312.04927  [pdf, other]

    cs.CL cs.LG

    Zoology: Measuring and Improving Recall in Efficient Language Models

    Authors: Simran Arora, Sabri Eyuboglu, Aman Timalsina, Isys Johnson, Michael Poli, James Zou, Atri Rudra, Christopher Ré

    Abstract: Attention-free language models that combine gating and convolutions are growing in popularity due to their efficiency and increasingly competitive performance. To better understand these architectures, we pretrain a suite of 17 attention and "gated-convolution" language models, finding that SoTA gated-convolution architectures still underperform attention by up to 2.1 perplexity points on the Pile…

    Submitted 8 December, 2023; originally announced December 2023.

  6. arXiv:2310.12109  [pdf, other]

    cs.LG

    Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture

    Authors: Daniel Y. Fu, Simran Arora, Jessica Grogan, Isys Johnson, Sabri Eyuboglu, Armin W. Thomas, Benjamin Spector, Michael Poli, Atri Rudra, Christopher Ré

    Abstract: Machine learning models are increasingly being scaled in both sequence length and model dimension to reach longer contexts and better performance. However, existing architectures such as Transformers scale quadratically along both these axes. We ask: are there performant architectures that can scale sub-quadratically along sequence length and model dimension? We introduce Monarch Mixer (M2), a new…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023 (Oral)

  7. arXiv:2308.16298  [pdf, other]

    cs.CR

    Publishing Wikipedia usage data with strong privacy guarantees

    Authors: Temilola Adeleye, Skye Berghel, Damien Desfontaines, Michael Hay, Isaac Johnson, Cléo Lemoisson, Ashwin Machanavajjhala, Tom Magerlein, Gabriele Modena, David Pujol, Daniel Simmons-Marengo, Hal Triedman

    Abstract: For almost 20 years, the Wikimedia Foundation has been publishing statistics about how many people visited each Wikipedia page on each day. This data helps Wikipedia editors determine where to focus their efforts to improve the online encyclopedia, and enables academic research. In June 2023, the Wikimedia Foundation, helped by Tumult Labs, addressed a long-standing request from Wikipedia editors…

    Submitted 1 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: 11 pages, 10 figures, Theory and Practice of Differential Privacy (TPDP) 2023

  8. Increasing Participation in Peer Production Communities with the Newcomer Homepage

    Authors: Morten Warncke-Wang, Rita Ho, Marshall Miller, Isaac Johnson

    Abstract: For peer production communities to be sustainable, they must attract and retain new contributors. Studies have identified social and technical barriers to entry and discovered some potential solutions, but these solutions have typically focused on a single highly successful community, the English Wikipedia, been tested in isolation, and rarely evaluated through controlled experiments. We propose t…

    Submitted 18 August, 2023; originally announced August 2023.

  9. arXiv:2307.08669  [pdf, other]

    cs.CY cs.HC cs.IR

    Leveraging Recommender Systems to Reduce Content Gaps on Peer Production Platforms

    Authors: Mo Houtti, Isaac Johnson, Morten Warncke-Wang, Loren Terveen

    Abstract: Peer production platforms like Wikipedia commonly suffer from content gaps. Prior research suggests recommender systems can help solve this problem, by guiding editors towards underrepresented topics. However, it remains unclear whether this approach would result in less relevant recommendations, leading to reduced overall engagement with recommended items. To answer this question, we first conduc…

    Submitted 10 April, 2024; v1 submitted 17 July, 2023; originally announced July 2023.

    Comments: To appear at the 18th International AAAI Conference on Web and Social Media (ICWSM 2024)

  10. arXiv:2303.00070  [pdf]

    cs.HC cs.CR cs.CY

    Tainted Love: A Systematic Review of Online Romance Fraud

    Authors: Alexander Bilz, Lynsay A. Shepherd, Graham I. Johnson

    Abstract: Romance fraud involves cybercriminals engineering a romantic relationship on online dating platforms. It is a cruel form of cybercrime whereby victims are left heartbroken, often facing financial ruin. We characterise the literary landscape on romance fraud, advancing the understanding of researchers and practitioners by systematically reviewing and synthesising contemporary qualitative and quanti…

    Submitted 28 February, 2023; originally announced March 2023.

    Comments: 41 pages, 3 figures, 3 tables

  11. arXiv:2302.10856  [pdf, other]

    cs.IR

    Overview of the TREC 2021 Fair Ranking Track

    Authors: Michael D. Ekstrand, Graham McDonald, Amifa Raj, Isaac Johnson

    Abstract: The TREC Fair Ranking Track aims to provide a platform for participants to develop and evaluate novel retrieval algorithms that can provide a fair exposure to a mixture of demographics or attributes, such as ethnicity, that are represented by relevant documents in response to a search query. For example, particular demographics or attributes can be represented by the documents' topical content or…

    Submitted 21 February, 2023; originally announced February 2023.

    Comments: Published in The Thirtieth Text REtrieval Conference Proceedings (TREC 2021). arXiv admin note: substantial text overlap with arXiv:2302.05558

  12. arXiv:2302.05558  [pdf, other]

    cs.IR

    Overview of the TREC 2022 Fair Ranking Track

    Authors: Michael D. Ekstrand, Graham McDonald, Amifa Raj, Isaac Johnson

    Abstract: The TREC Fair Ranking Track aims to provide a platform for participants to develop and evaluate novel retrieval algorithms that can provide a fair exposure to a mixture of demographics or attributes, such as ethnicity, that are represented by relevant documents in response to a search query. For example, particular demographics or attributes can be represented by the documents' topical content or a…

    Submitted 10 February, 2023; originally announced February 2023.

  13. arXiv:2301.02130  [pdf]

    cs.LG cs.AI eess.SP

    A deep learning approach to using wearable seismocardiography (SCG) for diagnosing aortic valve stenosis and predicting aortic hemodynamics obtained by 4D flow MRI

    Authors: Mahmoud E. Khani, Ethan M. I. Johnson, Aparna Sodhi, Joshua Robinson, Cynthia K. Rigsby, Bradly D. Allen, Michael Markl

    Abstract: In this paper, we explored the use of deep learning for the prediction of aortic flow metrics obtained using 4D flow MRI using wearable seismocardiography (SCG) devices. 4D flow MRI provides a comprehensive assessment of cardiovascular hemodynamics, but it is costly and time-consuming. We hypothesized that deep learning could be used to identify pathological changes in blood flow, such as elevated…

    Submitted 5 January, 2023; originally announced January 2023.

    Comments: 16 pages, 4 figures

  14. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  15. arXiv:2208.08426  [pdf, other]

    cs.CY cs.HC

    "We Need a Woman in Music": Exploring Wikipedia's Values on Article Priority

    Authors: Mo Houtti, Isaac Johnson, Joel Cepeda, Soumya Khandelwal, Aviral Bhatnagar, Loren Terveen

    Abstract: Wikipedia -- like most peer production communities -- suffers from a basic problem: the amount of work that needs to be done (articles to be created and improved) exceeds the available resources (editor effort). Recommender systems have been deployed to address this problem, but they have tended to recommend work tasks that match individuals' personal interests, ignoring more global community valu…

    Submitted 17 August, 2022; originally announced August 2022.

    Comments: To appear at the 25th ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW 2022)

  16. arXiv:2206.12037  [pdf, other]

    cs.LG

    How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

    Authors: Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré

    Abstract: Linear time-invariant state space models (SSM) are a classical model from engineering and statistics, that have recently been shown to be very promising in machine learning through the Structured State Space sequence model (S4). A core component of S4 involves initializing the SSM state matrix to a particular matrix called a HiPPO matrix, which was empirically important for S4's ability to handle…

    Submitted 5 August, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

  17. arXiv:2206.03216  [pdf, other]

    cs.CY cs.AI cs.CL

    Data Governance in the Age of Large-Scale Data-Driven Language Technology

    Authors: Yacine Jernite, Huu Nguyen, Stella Biderman, Anna Rogers, Maraim Masoud, Valentin Danchev, Samson Tan, Alexandra Sasha Luccioni, Nishant Subramani, Gérard Dupont, Jesse Dodge, Kyle Lo, Zeerak Talat, Isaac Johnson, Dragomir Radev, Somaieh Nikpoor, Jörg Frohberg, Aaron Gokaslan, Peter Henderson, Rishi Bommasani, Margaret Mitchell

    Abstract: The recent emergence and adoption of Machine Learning technology, and specifically of Large Language Models, has drawn attention to the need for systematic and transparent management of language data. This work proposes an approach to global language data governance that attempts to organize data management amongst stakeholders, values, and rights. Our proposal is informed by prior work on distrib…

    Submitted 2 November, 2022; v1 submitted 3 May, 2022; originally announced June 2022.

    Comments: 32 pages: Full paper and Appendices; Association for Computing Machinery, New York, NY, USA, 2206-2222

    Journal ref: Proceedings of 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22)

  18. arXiv:2204.02483  [pdf, ps, other]

    cs.CY cs.CL

    Considerations for Multilingual Wikipedia Research

    Authors: Isaac Johnson, Emily Lescak

    Abstract: English Wikipedia has long been an important data source for much research and natural language machine learning modeling. The growth of non-English language editions of Wikipedia, greater computational resources, and calls for equity in the performance of language and multimodal models have led to the inclusion of many more language editions of Wikipedia in datasets and models. Building better mu…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: Accepted to Wiki-M3L workshop as part of ICLR 2022

  19. arXiv:2110.13985  [pdf, other]

    cs.LG cs.AI

    Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers

    Authors: Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré

    Abstract: Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations (NDEs) are popular families of deep learning models for time-series data, each with unique strengths and tradeoffs in modeling power and computational efficiency. We introduce a simple sequence model inspired by control systems that generalizes these approaches while addressing their shortcomings. The Linear…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: NeurIPS 2021

  20. arXiv:2110.13041  [pdf, other]

    cs.LG cs.AR physics.data-an physics.ins-det

    Applications and Techniques for Fast Machine Learning in Science

    Authors: Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini, Thea Aarrestad, Steffen Bahr, Jurgen Becker, Anne-Sophie Berthold, Richard J. Bonventre, Tomas E. Muller Bravo, Markus Diefenthaler, Zhen Dong, Nick Fritzsche, Amir Gholami, Ekaterina Govorkova, Kyle J Hazelwood, et al. (62 additional authors not shown)

    Abstract: In this community review report, we discuss applications and techniques for fast machine learning (ML) in science -- the concept of integrating power ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML ac…

    Submitted 25 October, 2021; originally announced October 2021.

    Comments: 66 pages, 13 figures, 5 tables

    Report number: FERMILAB-PUB-21-502-AD-E-SCD

    Journal ref: Front. Big Data 5, 787421 (2022)

  21. arXiv:2103.00068  [pdf, other]

    cs.CY

    Language-agnostic Topic Classification for Wikipedia

    Authors: Isaac Johnson, Martin Gerlach, Diego Sáez-Trumper

    Abstract: A major challenge for many analyses of Wikipedia dynamics -- e.g., imbalances in content quality, geographic differences in what content is popular, what types of articles attract more editor discussion -- is grouping the very diverse range of Wikipedia articles into coherent, consistent topics. This problem has been addressed using various approaches based on Wikipedia's category network, WikiPro…

    Submitted 26 February, 2021; originally announced March 2021.

    Comments: Accepted to WikiWorkshop at The Web Conference 2021

  22. arXiv:2011.00997  [pdf, other]

    cs.CY

    Analyzing Wikidata Transclusion on English Wikipedia

    Authors: Isaac Johnson

    Abstract: Wikidata is steadily becoming more central to Wikipedia, not just in maintaining interlanguage links, but in automated population of content within the articles themselves. It is not well understood, however, how widespread this transclusion of Wikidata content is within Wikipedia. This work presents a taxonomy of Wikidata transclusion from the perspective of its potential impact on readers and an…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: Accepted to 1st Wikidata Workshop at ISWC 2020

  23. arXiv:2008.12314  [pdf, other]

    cs.CY

    A Taxonomy of Knowledge Gaps for Wikimedia Projects (Second Draft)

    Authors: Miriam Redi, Martin Gerlach, Isaac Johnson, Jonathan Morgan, Leila Zia

    Abstract: In January 2019, prompted by the Wikimedia Movement's 2030 strategic direction, the Research team at the Wikimedia Foundation identified the need to develop a knowledge gaps index -- a composite index to support the decision makers across the Wikimedia movement by providing: a framework to encourage structured and targeted brainstorming discussions; data on the state of the knowledge gaps across t…

    Submitted 29 January, 2021; v1 submitted 27 August, 2020; originally announced August 2020.

    Comments: Second draft: see summary of changes at https://meta.wikimedia.org/wiki/Research:Knowledge_Gaps_Index/Taxonomy/Summary_of_Changes_for_Second_Version

  24. arXiv:2007.10403  [pdf, other]

    cs.CY

    Global gender differences in Wikipedia readership

    Authors: Isaac Johnson, Florian Lemmerich, Diego Sáez-Trumper, Robert West, Markus Strohmaier, Leila Zia

    Abstract: Wikipedia represents the largest and most popular source of encyclopedic knowledge in the world today, aiming to provide equal access to information worldwide. From a global online survey of 65,031 readers of Wikipedia and their corresponding reading logs, we present novel evidence of gender differences in Wikipedia readership and how they manifest in records of user behavior. More specifically we…

    Submitted 20 July, 2020; originally announced July 2020.

  25. arXiv:1908.10954  [pdf]

    cs.HC cs.CY cs.SI

    Not at Home on the Range: Peer Production and the Urban/Rural Divide

    Authors: Isaac Johnson, Allen Yilun Lin, Toby Jia-Jun Li, Andrew Hall, Aaron Halfaker, Johannes Schöning, Brent Hecht

    Abstract: Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. In this paper, we explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in bo…

    Submitted 28 August, 2019; originally announced August 2019.

    Comments: 10 pages, published on CHI'16

    ACM Class: H.5.m

    Journal ref: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems

  26. arXiv:1906.08576  [pdf]

    cs.CY

    Measuring the Importance of User-Generated Content to Search Engines

    Authors: Nicholas Vincent, Isaac Johnson, Patrick Sheehan, Brent Hecht

    Abstract: Search engines are some of the most popular and profitable intelligent technologies in existence. Recent research, however, has suggested that search engines may be surprisingly dependent on user-created content like Wikipedia articles to address user information needs. In this paper, we perform a rigorous audit of the extent to which Google leverages Wikipedia and other user-generated content to…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: This version includes a bibliography entry that was missing from the first version of the text due to a processing error. This is a preprint of a paper accepted at ICWSM 2019. Please cite that version instead

  27. Detecting and Gauging Impact on Wikipedia Page Views

    Authors: Xiaoxi Chelsy Xie, Isaac Johnson, Anne Gomez

    Abstract: Understanding how various external campaigns or events affect readership on Wikipedia is important to efforts aimed at improving awareness and access to its content. In this paper, we consider how to build time-series models aimed at predicting page views on Wikipedia with the goal of detecting whether there are significant changes to the existing trends. We test these models on two different even…

    Submitted 26 March, 2019; originally announced March 2019.

  28. arXiv:1504.08033  [pdf, other]

    cs.PL

    Automating Abstract Interpretation of Abstract Machines

    Authors: James Ian Johnson

    Abstract: Static program analysis is a valuable tool for any programming language that people write programs in. The prevalence of scripting languages in the world suggests programming language interpreters are relatively easy to write. Users of these languages lament their inability to analyze their code, therefore programming language analyzers are not easy to write. This thesis investigates a systematic…

    Submitted 29 April, 2015; originally announced April 2015.

    Comments: This dissertation has been accepted by the thesis committee

  29. Pushdown flow analysis with abstract garbage collection

    Authors: J. Ian Johnson, Ilya Sergey, Christopher Earl, Matthew Might, David Van Horn

    Abstract: In the static analysis of functional programs, pushdown flow analysis and abstract garbage collection push the boundaries of what we can learn about programs statically. This work illuminates and poses solutions to theoretical and practical challenges that stand in the way of combining the power of these techniques. Pushdown flow analysis grants unbounded yet computable polyvariance to the analysi…

    Submitted 19 June, 2014; originally announced June 2014.

    ACM Class: D.3.4; F.3.2

    Journal ref: Journal of Functional Programming, Volume 24, Special Issue 2-3, May 2014, pp 218-283

  30. Abstracting Abstract Control (Extended)

    Authors: J. Ian Johnson, David Van Horn

    Abstract: The strength of a dynamic language is also its weakness: run-time flexibility comes at the cost of compile-time predictability. Many of the hallmarks of dynamic languages such as closures, continuations, various forms of reflection, and a lack of static types make many programmers rejoice, while compiler writers, tool developers, and verification engineers lament. The dynamism of these features si…

    Submitted 14 August, 2014; v1 submitted 14 May, 2013; originally announced May 2013.

    Comments: To appear at DLS '14

    ACM Class: F.3.2

  31. Optimizing Abstract Abstract Machines

    Authors: J. Ian Johnson, Nicholas Labich, Matthew Might, David Van Horn

    Abstract: The technique of abstracting abstract machines (AAM) provides a systematic approach for deriving computable approximations of evaluators that are easily proved sound. This article contributes a complementary step-by-step process for subsequently going from a naive analyzer derived under the AAM approach, to an efficient and correct implementation. The end result of the process is a two to three or…

    Submitted 24 July, 2013; v1 submitted 15 November, 2012; originally announced November 2012.

    Comments: Proceedings of the International Conference on Functional Programming 2013 (ICFP 2013). Boston, Massachusetts. September, 2013

    ACM Class: F.3.2