subscribe to arXiv mailings

Flowy: Supporting UX Design Decisions Through AI-Driven Pattern Annotation in Multi-Screen User Flows

Authors: Yuwen Lu, Ziang Tong, Qinyi Zhao, Yewon Oh, Bryan Wang, Toby Jia-Jun Li

Abstract: Many recent AI-powered UX design tools focus on generating individual static UI screens from natural language. However, they overlook the crucial aspect of interactions and user experiences across multiple screens. Through formative studies with UX professionals, we identified limitations of these tools in supporting realistic UX design workflows. In response, we designed and developed Flowy, an a… ▽ More Many recent AI-powered UX design tools focus on generating individual static UI screens from natural language. However, they overlook the crucial aspect of interactions and user experiences across multiple screens. Through formative studies with UX professionals, we identified limitations of these tools in supporting realistic UX design workflows. In response, we designed and developed Flowy, an app that augments designers' information foraging process in ideation by supplementing specific user flow examples with distilled design pattern knowledge. Flowy utilizes large multimodal AI models and a high-quality user flow dataset to help designers identify and understand relevant abstract design patterns in the design space for multi-screen user flows. Our user study with professional UX designers demonstrates how Flowy supports realistic UX tasks. Our design considerations in Flowy, such as representations with appropriate levels of abstraction and assisted navigation through the solution space, are generalizable to other creative tasks and embody a human-centered, intelligence augmentation approach to using AI in UX design. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.16173 [pdf, other]

Crepe: A Mobile Screen Data Collector Using Graph Query

Authors: Yuwen Lu, Meng Chen, Qi Zhao, Victor Cox, Yang Yang, Meng Jiang, Jay Brockman, Tamara Kay, Toby Jia-Jun Li

Abstract: Collecting mobile datasets remains challenging for academic researchers due to limited data access and technical barriers. Commercial organizations often possess exclusive access to mobile data, leading to a "data monopoly" that restricts the independence of academic research. Existing open-source mobile data collection frameworks primarily focus on mobile sensing data rather than screen content,… ▽ More Collecting mobile datasets remains challenging for academic researchers due to limited data access and technical barriers. Commercial organizations often possess exclusive access to mobile data, leading to a "data monopoly" that restricts the independence of academic research. Existing open-source mobile data collection frameworks primarily focus on mobile sensing data rather than screen content, which is crucial for various research studies. We present Crepe, a no-code Android app that enables researchers to collect information displayed on screen through simple demonstrations of target data. Crepe utilizes a novel Graph Query technique which augments the structures of mobile UI screens to support flexible identification, location, and collection of specific data pieces. The tool emphasizes participants' privacy and agency by providing full transparency over collected data and allowing easy opt-out. We designed and built Crepe for research purposes only and in scenarios where researchers obtain explicit consent from participants. Code for Crepe will be open-sourced to support future academic research data collection. △ Less

Submitted 23 June, 2024; originally announced June 2024.

arXiv:2405.18573 [pdf, other]

Programmer Visual Attention During Context-Aware Code Summarization

Authors: Aakash Bansal, Robert Wallace, Zachary Karas, Ningzhi Tang, Yu Huang, Toby Jia-Jun Li, Collin McMillan

Abstract: Abridged: Programmer attention represents the visual focus of programmers on parts of the source code in pursuit of programming tasks. We conducted an in-depth human study with XY Java programmers, where each programmer generated summaries for 40 methods from five large Java projects over five one-hour sessions. We used eye-tracking equipment to map the visual attention of programmers while they w… ▽ More Abridged: Programmer attention represents the visual focus of programmers on parts of the source code in pursuit of programming tasks. We conducted an in-depth human study with XY Java programmers, where each programmer generated summaries for 40 methods from five large Java projects over five one-hour sessions. We used eye-tracking equipment to map the visual attention of programmers while they wrote the summaries. We also rate the quality of each summary. We found eye-gaze patterns and metrics that define common behaviors between programmer attention during context-aware code summarization. Specifically, we found that programmers need to read significantly (p<0.01) fewer words and make significantly fewer revisits to words (p\textless0.03) as they summarize more methods during a session, while maintaining the quality of summaries. We also found that the amount of source code a participant looks at correlates with a higher quality summary, but this trend follows a bell-shaped curve, such that after a threshold reading more source code leads to a significant decrease (p<0.01) in the quality of summaries. We also gathered insight into the type of methods in the project that provide the most contextual information for code summarization based on programmer attention. Specifically, we observed that programmers spent a majority of their time looking at methods inside the same class as the target method to be summarized. Surprisingly, we found that programmers spent significantly less time looking at methods in the call graph of the target method. We discuss how our empirical observations may aid future studies towards modeling programmer attention and improving context-aware automatic source code summarization. △ Less

Submitted 28 May, 2024; originally announced May 2024.

Comments: 10 pages, 4 figures, 4 tables. this is a pre-print submitted to IEEE Transactions on Software Engineering for review

arXiv:2405.16081 [pdf, other]

A Study on Developer Behaviors for Validating and Repairing LLM-Generated Code Using Eye Tracking and IDE Actions

Authors: Ningzhi Tang, Meng Chen, Zheng Ning, Aakash Bansal, Yu Huang, Collin McMillan, Toby Jia-Jun Li

Abstract: The increasing use of large language model (LLM)-powered code generation tools, such as GitHub Copilot, is transforming software engineering practices. This paper investigates how developers validate and repair code generated by Copilot and examines the impact of code provenance awareness during these processes. We conducted a lab study with 28 participants, who were tasked with validating and rep… ▽ More The increasing use of large language model (LLM)-powered code generation tools, such as GitHub Copilot, is transforming software engineering practices. This paper investigates how developers validate and repair code generated by Copilot and examines the impact of code provenance awareness during these processes. We conducted a lab study with 28 participants, who were tasked with validating and repairing Copilot-generated code in three software projects. Participants were randomly divided into two groups: one informed about the provenance of LLM-generated code and the other not. We collected data on IDE interactions, eye-tracking, cognitive workload assessments, and conducted semi-structured interviews. Our results indicate that, without explicit information, developers often fail to identify the LLM origin of the code. Developers generally employ similar validation and repair strategies for LLM-generated code, but exhibit behaviors such as frequent switching between code and comments, different attentional focus, and a tendency to delete and rewrite code. Being aware of the code's provenance led to improved performance, increased search efforts, more frequent Copilot usage, and higher cognitive workload. These findings enhance our understanding of how developers interact with LLM-generated code and carry implications for designing tools that facilitate effective human-LLM collaboration in software development. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.12438 [pdf, other]

doi 10.1145/3635636.3664260

CoCo Matrix: Taxonomy of Cognitive Contributions in Co-writing with Intelligent Agents

Authors: Ruyuan Wan, Simret Gebreegziabhe, Toby Jia-Jun Li, Karla Badillo-Urquiola

Abstract: In recent years, there has been a growing interest in employing intelligent agents in writing. Previous work emphasizes the evaluation of the quality of end product-whether it was coherent and polished, overlooking the journey that led to the product, which is an invaluable dimension of the creative process. To understand how to recognize human efforts in co-writing with intelligent writing system… ▽ More In recent years, there has been a growing interest in employing intelligent agents in writing. Previous work emphasizes the evaluation of the quality of end product-whether it was coherent and polished, overlooking the journey that led to the product, which is an invaluable dimension of the creative process. To understand how to recognize human efforts in co-writing with intelligent writing systems, we adapt Flower and Hayes' cognitive process theory of writing and propose CoCo Matrix, a two-dimensional taxonomy of entropy and information gain, to depict the new human-agent co-writing model. We define four quadrants and situate thirty-four published systems within the taxonomy. Our research found that low entropy and high information gain systems are under-explored, yet offer promising future directions in writing tasks that benefit from the agent's divergent planning and the human's focused translation. CoCo Matrix, not only categorizes different writing systems but also deepens our understanding of the cognitive processes in human-agent co-writing. By analyzing minimal changes in the writing process, CoCo Matrix serves as a proxy for the writer's mental model, allowing writers to reflect on their contributions. This reflection is facilitated through the measured metrics of information gain and entropy, which provide insights irrespective of the writing system used. △ Less

Submitted 20 May, 2024; originally announced May 2024.

arXiv:2404.15107 [pdf, other]

doi 10.1145/3635636.3656189

MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos

Authors: Zheng Ning, Zheng Zhang, Jerrick Ban, Kaiwen Jiang, Ruohong Gan, Yapeng Tian, Toby Jia-Jun Li

Abstract: Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio often expensive and requires specialized equipment and skills, posing a high barrier for amateur video creators. We present MIMOSA, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only mon… ▽ More Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio often expensive and requires specialized equipment and skills, posing a high barrier for amateur video creators. We present MIMOSA, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only monaural or stereo audio, MIMOSA automatically grounds each sound source to the corresponding sounding object in the visual scene and enables users to further validate and fix the errors in the locations of sounding objects. Users can also augment the spatial audio effect by flexibly manipulating the sounding source positions and creatively customizing the audio effect. The design of MIMOSA exemplifies a human-AI collaboration approach that, instead of utilizing state-of art end-to-end "black-box" ML models, uses a multistep pipeline that aligns its interpretable intermediate results with the user's workflow. A lab user study with 15 participants demonstrates MIMOSA's usability, usefulness, expressiveness, and capability in creating immersive spatial audio effects in collaboration with users. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.00405 [pdf, other]

doi 10.1145/3613905.3650786

A Taxonomy for Human-LLM Interaction Modes: An Initial Exploration

Authors: Jie Gao, Simret Araya Gebreegziabher, Kenny Tsu Wei Choo, Toby Jia-Jun Li, Simon Tangi Perrault, Thomas W. Malone

Abstract: With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to… ▽ More With ChatGPT's release, conversational prompting has become the most popular form of human-LLM interaction. However, its effectiveness is limited for more complex tasks involving reasoning, creativity, and iteration. Through a systematic analysis of HCI papers published since 2021, we identified four key phases in the human-LLM interaction flow - planning, facilitating, iterating, and testing - to precisely understand the dynamics of this process. Additionally, we have developed a taxonomy of four primary interaction modes: Mode 1: Standard Prompting, Mode 2: User Interface, Mode 3: Context-based, and Mode 4: Agent Facilitator. This taxonomy was further enriched using the "5W1H" guideline method, which involved a detailed examination of definitions, participant roles (Who), the phases that happened (When), human objectives and LLM abilities (What), and the mechanics of each interaction mode (How). We anticipate this taxonomy will contribute to the future design and evaluation of human-LLM interaction. △ Less

Submitted 30 March, 2024; originally announced April 2024.

Comments: 11 pages, 4 figures, 3 tables. Accepted at CHI Late-Breaking Work 2024

arXiv:2403.19876 [pdf, other]

"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices

Authors: Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen

Abstract: Large language models are increasingly applied in real-world scenarios, including research and education. These models, however, come with well-known ethical issues, which may manifest in unexpected ways in human-computer interaction research due to the extensive engagement with human subjects. This paper reports on research practices related to LLM use, drawing on 16 semi-structured interviews an… ▽ More Large language models are increasingly applied in real-world scenarios, including research and education. These models, however, come with well-known ethical issues, which may manifest in unexpected ways in human-computer interaction research due to the extensive engagement with human subjects. This paper reports on research practices related to LLM use, drawing on 16 semi-structured interviews and a survey conducted with 50 HCI researchers. We discuss the ways in which LLMs are already being utilized throughout the entire HCI research pipeline, from ideation to system development and paper writing. While researchers described nuanced understandings of ethical issues, they were rarely or only partially able to identify and address those ethical concerns in their own projects. This lack of action and reliance on workarounds was explained through the perceived lack of control and distributed responsibility in the LLM supply chain, the conditional nature of engaging with ethics, and competing priorities. Finally, we reflect on the implications of our findings and present opportunities to shape emerging norms of engaging with large language models in HCI research. △ Less

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2402.14096 [pdf, other]

doi 10.1145/3643732

EyeTrans: Merging Human and Machine Attention for Neural Code Summarization

Authors: Yifan Zhang, Jiliang Li, Zachary Karas, Aakash Bansal, Toby Jia-Jun Li, Collin McMillan, Kevin Leach, Yu Huang

Abstract: Neural code summarization leverages deep learning models to automatically generate brief natural language summaries of code snippets. The development of Transformer models has led to extensive use of attention during model design. While existing work has primarily and almost exclusively focused on static properties of source code and related structural representations like the Abstract Syntax Tree… ▽ More Neural code summarization leverages deep learning models to automatically generate brief natural language summaries of code snippets. The development of Transformer models has led to extensive use of attention during model design. While existing work has primarily and almost exclusively focused on static properties of source code and related structural representations like the Abstract Syntax Tree (AST), few studies have considered human attention, that is, where programmers focus while examining and comprehending code. In this paper, we develop a method for incorporating human attention into machine attention to enhance neural code summarization. To facilitate this incorporation and vindicate this hypothesis, we introduce EyeTrans, which consists of three steps: (1) we conduct an extensive eye-tracking human study to collect and pre-analyze data for model training, (2) we devise a data-centric approach to integrate human attention with machine attention in the Transformer architecture, and (3) we conduct comprehensive experiments on two code summarization tasks to demonstrate the effectiveness of incorporating human attention into Transformers. Integrating human attention leads to an improvement of up to 29.91% in Functional Summarization and up to 6.39% in General Code Summarization performance, demonstrating the substantial benefits of this combination. We further explore performance in terms of robustness and efficiency by creating challenging summarization scenarios in which EyeTrans exhibits interesting properties. We also visualize the attention map to depict the simplifying effect of machine attention in the Transformer by incorporating human attention. This work has the potential to propel AI research in software engineering by introducing more human-centered approaches and data. △ Less

Submitted 29 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.07300 [pdf, other]

SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers

Authors: Zheng Ning, Brianna L. Wimer, Kaiwen Jiang, Keyi Chen, Jerrick Ban, Yapeng Tian, Yuhang Zhao, Toby Jia-Jun Li

Abstract: Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video… ▽ More Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs. △ Less

Submitted 26 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2402.06089 [pdf, other]

AI Assistance for UX: A Literature Review Through Human-Centered AI

Authors: Yuwen Lu, Yuewen Yang, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li

Abstract: Recent advancements in HCI and AI research attempt to support user experience (UX) practitioners with AI-enabled tools. Despite the potential of emerging models and new interaction mechanisms, mainstream adoption of such tools remains limited. We took the lens of Human-Centered AI and presented a systematic literature review of 359 papers, aiming to synthesize the current landscape, identify trend… ▽ More Recent advancements in HCI and AI research attempt to support user experience (UX) practitioners with AI-enabled tools. Despite the potential of emerging models and new interaction mechanisms, mainstream adoption of such tools remains limited. We took the lens of Human-Centered AI and presented a systematic literature review of 359 papers, aiming to synthesize the current landscape, identify trends, and uncover UX practitioners' unmet needs in AI support. Guided by the Double Diamond design framework, our analysis uncovered that UX practitioners' unique focuses on empathy building and experiences across UI screens are often overlooked. Simplistic AI automation can obstruct the valuable empathy-building process. Furthermore, focusing solely on individual UI screens without considering interactions and user flows reduces the system's practical value for UX designers. Based on these findings, we call for a deeper understanding of UX mindsets and more designer-centric datasets and evaluation metrics, for HCI and AI communities to collaboratively work toward effective AI support for UX. △ Less

Submitted 12 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2311.09825 [pdf, other]

Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks

Authors: Yuxuan Lu, Bingsheng Yao, Shao Zhang, Yun Wang, Peng Zhang, Tun Lu, Toby Jia-Jun Li, Dakuo Wang

Abstract: Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance. However, in real-world tasks, domain knowledge is often required. Low-resource learning methods like Active Learning (AL) have been proposed to tackle the cost of domain expert annotation, raising this question: Can LLMs surpass compact models trained wit… ▽ More Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance. However, in real-world tasks, domain knowledge is often required. Low-resource learning methods like Active Learning (AL) have been proposed to tackle the cost of domain expert annotation, raising this question: Can LLMs surpass compact models trained with expert annotations in domain-specific tasks? In this work, we conduct an empirical experiment on four datasets from three different domains comparing SOTA LLMs with small models trained on expert annotations with AL. We found that small models can outperform GPT-3.5 with a few hundreds of labeled data, and they achieve higher or similar performance with GPT-4 despite that they are hundreds time smaller. Based on these findings, we posit that LLM predictions can be used as a warmup method in real-world applications and human experts remain indispensable in tasks involving data annotation driven by domain-specific knowledge. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.17846 [pdf, other]

From Awareness to Action: Exploring End-User Empowerment Interventions for Dark Patterns in UX

Authors: Yuwen Lu, Chao Zhang, Yuewen Yang, Yaxing Yao, Toby Jia-Jun Li

Abstract: The study of UX dark patterns, i.e., UI designs that seek to manipulate user behaviors, often for the benefit of online services, has drawn significant attention in the CHI and CSCW communities in recent years. To complement previous studies in addressing dark patterns from (1) the designer's perspective on education and advocacy for ethical designs; and (2) the policymaker's perspective on new re… ▽ More The study of UX dark patterns, i.e., UI designs that seek to manipulate user behaviors, often for the benefit of online services, has drawn significant attention in the CHI and CSCW communities in recent years. To complement previous studies in addressing dark patterns from (1) the designer's perspective on education and advocacy for ethical designs; and (2) the policymaker's perspective on new regulations, we propose an end-user-empowerment intervention approach that helps users (1) raise the awareness of dark patterns and understand their underlying design intents; (2) take actions to counter the effects of dark patterns using a web augmentation approach. Through a two-phase co-design study, including 5 co-design workshops (N=12) and a 2-week technology probe study (N=15), we reported findings on the understanding of users' needs, preferences, and challenges in handling dark patterns and investigated the feedback and reactions to users' awareness of and action on dark patterns being empowered in a realistic in-situ setting. △ Less

Submitted 2 February, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to CSCW 2024

arXiv:2310.15455 [pdf, other]

UI Layout Generation with LLMs Guided by UI Grammar

Authors: Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li

Abstract: The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we proposed to represent the hierarch… ▽ More The recent advances in Large Language Models (LLMs) have stimulated interest among researchers and industry professionals, particularly in their application to tasks concerning mobile user interfaces (UIs). This position paper investigates the use of LLMs for UI layout generation. Central to our exploration is the introduction of UI grammar -- a novel approach we proposed to represent the hierarchical structure inherent in UI screens. The aim of this approach is to guide the generative capacities of LLMs more effectively and improve the explainability and controllability of the process. Initial experiments conducted with GPT-4 showed the promising capability of LLMs to produce high-quality user interfaces via in-context learning. Furthermore, our preliminary comparative study suggested the potential of the grammar-based approach in improving the quality of generative results in specific aspects. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: ICML 2023 Workshop on AI and HCI

arXiv:2310.12953 [pdf, other]

doi 10.1145/3613904.3642400

Luminate: Structured Generation and Exploration of Design Space with Large Language Models for Human-AI Co-Creation

Authors: Sangho Suh, Meng Chen, Bryan Min, Toby Jia-Jun Li, Haijun Xia

Abstract: Thanks to their generative capabilities, large language models (LLMs) have become an invaluable tool for creative processes. These models have the capacity to produce hundreds and thousands of visual and textual outputs, offering abundant inspiration for creative endeavors. But are we harnessing their full potential? We argue that current interaction paradigms fall short, guiding users towards rap… ▽ More Thanks to their generative capabilities, large language models (LLMs) have become an invaluable tool for creative processes. These models have the capacity to produce hundreds and thousands of visual and textual outputs, offering abundant inspiration for creative endeavors. But are we harnessing their full potential? We argue that current interaction paradigms fall short, guiding users towards rapid convergence on a limited set of ideas, rather than empowering them to explore the vast latent design space in generative models. To address this limitation, we propose a framework that facilitates the structured generation of design space in which users can seamlessly explore, evaluate, and synthesize a multitude of responses. We demonstrate the feasibility and usefulness of this framework through the design and development of an interactive system, Luminate, and a user study with 14 professional writers. Our work advances how we interact with LLMs for creative tasks, introducing a way to harness the creative potential of LLMs. △ Less

Submitted 13 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

arXiv:2309.14510 [pdf, other]

doi 10.1145/3613904.3642363

An Empathy-Based Sandbox Approach to Bridge the Privacy Gap among Attitudes, Goals, Knowledge, and Behaviors

Authors: Chaoran Chen, Weijun Li, Wenxin Song, Yanfang Ye, Yaxing Yao, Toby Jia-jun Li

Abstract: Managing privacy to reach privacy goals is challenging, as evidenced by the privacy attitude-behavior gap. Mitigating this discrepancy requires solutions that account for both system opaqueness and users' hesitations in testing different privacy settings due to fears of unintended data exposure. We introduce an empathy-based approach that allows users to experience how privacy attributes may alter… ▽ More Managing privacy to reach privacy goals is challenging, as evidenced by the privacy attitude-behavior gap. Mitigating this discrepancy requires solutions that account for both system opaqueness and users' hesitations in testing different privacy settings due to fears of unintended data exposure. We introduce an empathy-based approach that allows users to experience how privacy attributes may alter system outcomes in a risk-free sandbox environment from the perspective of artificially generated personas. To generate realistic personas, we introduce a novel pipeline that augments the outputs of large language models (e.g., GPT-4) using few-shot learning, contextualization, and chain of thoughts. Our empirical studies demonstrated the adequate quality of generated personas and highlighted the changes in privacy-related applications (e.g., online advertising) caused by different personas. Furthermore, users demonstrated cognitive and emotional empathy towards the personas when interacting with our sandbox. We offered design implications for downstream applications in improving user privacy literacy. △ Less

Submitted 20 March, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.13858 [pdf, other]

Impact of Human-AI Interaction on User Trust and Reliance in AI-Assisted Qualitative Coding

Authors: Jie Gao, Junming Cao, ShunYi Yeo, Kenny Tsu Wei Choo, Zheng Zhang, Toby Jia-Jun Li, Shengdong Zhao, Simon Tangi Perrault

Abstract: While AI shows promise for enhancing the efficiency of qualitative analysis, the unique human-AI interaction resulting from varied coding strategies makes it challenging to develop a trustworthy AI-assisted qualitative coding system (AIQCs) that supports coding tasks effectively. We bridge this gap by exploring the impact of varying coding strategies on user trust and reliance on AI. We conducted… ▽ More While AI shows promise for enhancing the efficiency of qualitative analysis, the unique human-AI interaction resulting from varied coding strategies makes it challenging to develop a trustworthy AI-assisted qualitative coding system (AIQCs) that supports coding tasks effectively. We bridge this gap by exploring the impact of varying coding strategies on user trust and reliance on AI. We conducted a mixed-methods split-plot 3x3 study, involving 30 participants, and a follow-up study with 6 participants, exploring varying text selection and code length in the use of our AIQCs system for qualitative analysis. Our results indicate that qualitative open coding should be conceptualized as a series of distinct subtasks, each with differing levels of complexity, and therefore, should be given tailored design considerations. We further observed a discrepancy between perceived and behavioral measures, and emphasized the potential challenges of under- and over-reliance on AIQCs systems. Additional design implications were also proposed for consideration. △ Less

Submitted 24 September, 2023; originally announced September 2023.

Comments: 27 pages with references, 9 figures, 5 tables

arXiv:2308.15272 [pdf, other]

AutoDroid: LLM-powered Task Automation in Android

Authors: Hao Wen, Yuanchun Li, Guohong Liu, Shanhui Zhao, Tao Yu, Toby Jia-Jun Li, Shiqi Jiang, Yunhao Liu, Yaqin Zhang, Yunxin Liu

Abstract: Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or end-users. The recent advance of large language models (LLMs) in language understanding and reasoning… ▽ More Mobile task automation is an attractive technique that aims to enable voice-based hands-free user interaction with smartphones. However, existing approaches suffer from poor scalability due to the limited language understanding ability and the non-trivial manual efforts required from developers or end-users. The recent advance of large language models (LLMs) in language understanding and reasoning inspires us to rethink the problem from a model-centric perspective, where task preparation, comprehension, and execution are handled by a unified language model. In this work, we introduce AutoDroid, a mobile task automation system capable of handling arbitrary tasks on any Android application without manual efforts. The key insight is to combine the commonsense knowledge of LLMs and domain-specific knowledge of apps through automated dynamic analysis. The main components include a functionality-aware UI representation method that bridges the UI with the LLM, exploration-based memory injection techniques that augment the app-specific domain knowledge of LLM, and a multi-granularity query optimization module that reduces the cost of model inference. We integrate AutoDroid with off-the-shelf LLMs including online GPT-4/GPT-3.5 and on-device Vicuna, and evaluate its performance on a new benchmark for memory-augmented Android task automation with 158 common tasks. The results demonstrated that AutoDroid is able to precisely generate actions with an accuracy of 90.9%, and complete tasks with a success rate of 71.3%, outperforming the GPT-4-powered baselines by 36.4% and 39.7%. The demo, benchmark suites, and source code of AutoDroid will be released at url{https://autodroid-sys.github.io/}. △ Less

Submitted 9 March, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: Published in MobiCom 2024; Original title: "Empowering LLM to use Smartphone for Intelligent Task Automation"

arXiv:2308.13920 [pdf, other]

Modeling Programmer Attention as Scanpath Prediction

Authors: Aakash Bansal, Chia-Yi Su, Zachary Karas, Yifan Zhang, Yu Huang, Toby Jia-Jun Li, Collin McMillan

Abstract: This paper launches a new effort at modeling programmer attention by predicting eye movement scanpaths. Programmer attention refers to what information people intake when performing programming tasks. Models of programmer attention refer to machine prediction of what information is important to people. Models of programmer attention are important because they help researchers build better interfac… ▽ More This paper launches a new effort at modeling programmer attention by predicting eye movement scanpaths. Programmer attention refers to what information people intake when performing programming tasks. Models of programmer attention refer to machine prediction of what information is important to people. Models of programmer attention are important because they help researchers build better interfaces, assistive technologies, and more human-like AI. For many years, researchers in SE have built these models based on features such as mouse clicks, key logging, and IDE interactions. Yet the holy grail in this area is scanpath prediction -- the prediction of the sequence of eye fixations a person would take over a visual stimulus. A person's eye movements are considered the most concrete evidence that a person is taking in a piece of information. Scanpath prediction is a notoriously difficult problem, but we believe that the emergence of lower-cost, higher-accuracy eye tracking equipment and better large language models of source code brings a solution within grasp. We present an eye tracking experiment with 27 programmers and a prototype scanpath predictor to present preliminary results and obtain early community feedback. △ Less

Submitted 26 August, 2023; originally announced August 2023.

Comments: Accepter at ASE2023 NIER Track. 4 pages + 1 page for references, 4 figures, 1 table

arXiv:2307.15167 [pdf, other]

doi 10.1145/3586183.3606776

doi 10.1145/3586183.360677610.1145/3586183.3606776

doi 10.1145/3586183.3606776

PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data

Authors: Zheng Zhang, Zheng Ning, Chenliang Xu, Yapeng Tian, Toby Jia-Jun Li

Abstract: Audio-visual learning seeks to enhance the computer's multi-modal perception leveraging the correlation between the auditory and visual modalities. Despite their many useful downstream tasks, such as video retrieval, AR/VR, and accessibility, the performance and adoption of existing audio-visual models have been impeded by the availability of high-quality datasets. Annotating audio-visual datasets… ▽ More Audio-visual learning seeks to enhance the computer's multi-modal perception leveraging the correlation between the auditory and visual modalities. Despite their many useful downstream tasks, such as video retrieval, AR/VR, and accessibility, the performance and adoption of existing audio-visual models have been impeded by the availability of high-quality datasets. Annotating audio-visual datasets is laborious, expensive, and time-consuming. To address this challenge, we designed and developed an efficient audio-visual annotation tool called Peanut. Peanut's human-AI collaborative pipeline separates the multi-modal task into two single-modal tasks, and utilizes state-of-the-art object detection and sound-tagging models to reduce the annotators' effort to process each frame and the number of manually-annotated frames needed. A within-subject user study with 20 participants found that Peanut can significantly accelerate the audio-visual data annotation process while maintaining high annotation accuracy. △ Less

Submitted 27 July, 2023; originally announced July 2023.

Comments: 18 pages, published in UIST'23

arXiv:2307.04280 [pdf, ps, other]

Shaping the Emerging Norms of Using Large Language Models in Social Computing Research

Authors: Hong Shen, Tianshi Li, Toby Jia-Jun Li, Joon Sung Park, Diyi Yang

Abstract: The emergence of Large Language Models (LLMs) has brought both excitement and concerns to social computing research. On the one hand, LLMs offer unprecedented capabilities in analyzing vast amounts of textual data and generating human-like responses, enabling researchers to delve into complex social phenomena. On the other hand, concerns are emerging regarding the validity, privacy, and ethics of… ▽ More The emergence of Large Language Models (LLMs) has brought both excitement and concerns to social computing research. On the one hand, LLMs offer unprecedented capabilities in analyzing vast amounts of textual data and generating human-like responses, enabling researchers to delve into complex social phenomena. On the other hand, concerns are emerging regarding the validity, privacy, and ethics of the research when LLMs are involved. This SIG aims at offering an open space for social computing researchers who are interested in understanding the impacts of LLMs to discuss their current practices, perspectives, challenges when engaging with LLMs in their everyday work and collectively shaping the emerging norms of using LLMs in social computing research. △ Less

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2305.07372 [pdf, other]

Interactive Text-to-SQL Generation via Editable Step-by-Step Explanations

Authors: Yuan Tian, Zheng Zhang, Zheng Ning, Toby Jia-Jun Li, Jonathan K. Kummerfeld, Tianyi Zhang

Abstract: Relational databases play an important role in business, science, and more. However, many users cannot fully unleash the analytical power of relational databases, because they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly f… ▽ More Relational databases play an important role in business, science, and more. However, many users cannot fully unleash the analytical power of relational databases, because they are not familiar with database languages such as SQL. Many techniques have been proposed to automatically generate SQL from natural language, but they suffer from two issues: (1) they still make many mistakes, particularly for complex queries, and (2) they do not provide a flexible way for non-expert users to validate and refine incorrect queries. To address these issues, we introduce a new interaction mechanism that allows users to directly edit a step-by-step explanation of a query to fix errors. Our experiments on multiple datasets, as well as a user study with 24 participants, demonstrate that our approach can achieve better performance than multiple SOTA approaches. Our code and datasets are available at https://github.com/magic-YuanTian/STEPS. △ Less

Submitted 4 January, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

Comments: Accepted to EMNLP 2023

ACM Class: I.2.7

arXiv:2304.07810 [pdf, other]

doi 10.1145/3586183.3606800

VISAR: A Human-AI Argumentative Writing Assistant with Visual Programming and Rapid Draft Prototyping

Authors: Zheng Zhang, Jie Gao, Ranjodh Singh Dhaliwal, Toby Jia-Jun Li

Abstract: In argumentative writing, writers must brainstorm hierarchical writing goals, ensure the persuasiveness of their arguments, and revise and organize their plans through drafting. Recent advances in large language models (LLMs) have made interactive text generation through a chat interface (e.g., ChatGPT) possible. However, this approach often neglects implicit writing context and user intent, lacks… ▽ More In argumentative writing, writers must brainstorm hierarchical writing goals, ensure the persuasiveness of their arguments, and revise and organize their plans through drafting. Recent advances in large language models (LLMs) have made interactive text generation through a chat interface (e.g., ChatGPT) possible. However, this approach often neglects implicit writing context and user intent, lacks support for user control and autonomy, and provides limited assistance for sensemaking and revising writing plans. To address these challenges, we introduce VISAR, an AI-enabled writing assistant system designed to help writers brainstorm and revise hierarchical goals within their writing context, organize argument structures through synchronized text editing and visual programming, and enhance persuasiveness with argumentation spark recommendations. VISAR allows users to explore, experiment with, and validate their writing plans using automatic draft prototyping. A controlled lab study confirmed the usability and effectiveness of VISAR in facilitating the argumentative writing planning process. △ Less

Submitted 27 July, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

Comments: 30 pages, published in UIST'23

arXiv:2304.07366 [pdf, other]

CollabCoder: A Lower-barrier, Rigorous Workflow for Inductive Collaborative Qualitative Analysis with Large Language Models

Authors: Jie Gao, Yuchen Guo, Gionnieve Lim, Tianqin Zhang, Zheng Zhang, Toby Jia-Jun Li, Simon Tangi Perrault

Abstract: Collaborative Qualitative Analysis (CQA) can enhance qualitative analysis rigor and depth by incorporating varied viewpoints. Nevertheless, ensuring a rigorous CQA procedure itself can be both demanding and costly. To lower this bar, we take a theoretical perspective to design the CollabCoder workflow, that integrates Large Language Models (LLMs) into key inductive CQA stages: independent open cod… ▽ More Collaborative Qualitative Analysis (CQA) can enhance qualitative analysis rigor and depth by incorporating varied viewpoints. Nevertheless, ensuring a rigorous CQA procedure itself can be both demanding and costly. To lower this bar, we take a theoretical perspective to design the CollabCoder workflow, that integrates Large Language Models (LLMs) into key inductive CQA stages: independent open coding, iterative discussions, and final codebook creation. In the open coding phase, CollabCoder offers AI-generated code suggestions and records decision-making data. During discussions, it promotes mutual understanding by sharing this data within the coding team and using quantitative metrics to identify coding (dis)agreements, aiding in consensus-building. In the code grouping stage, CollabCoder provides primary code group suggestions, lightening the cognitive load of finalizing the codebook. A 16-user evaluation confirmed the effectiveness of CollabCoder, demonstrating its advantages over existing software and providing empirical insights into the role of LLMs in the CQA practice. △ Less

Submitted 22 January, 2024; v1 submitted 14 April, 2023; originally announced April 2023.

Comments: Will be published at the ACM CHI Conference on Human Factors in Computing Systems (CHI'24)

arXiv:2210.02830 [pdf, other]

KnowledgeShovel: An AI-in-the-Loop Document Annotation System for Scientific Knowledge Base Construction

Authors: Shao Zhang, Yuting Jia, Hui Xu, Dakuo Wang, Toby Jia-jun Li, Ying Wen, Xinbing Wang, Chenghu Zhou

Abstract: Constructing a comprehensive, accurate, and useful scientific knowledge base is crucial for human researchers synthesizing scientific knowledge and for enabling Al-driven scientific discovery. However, the current process is difficult, error-prone, and laborious due to (1) the enormous amount of scientific literature available; (2) the highly-specialized scientific domains; (3) the diverse modalit… ▽ More Constructing a comprehensive, accurate, and useful scientific knowledge base is crucial for human researchers synthesizing scientific knowledge and for enabling Al-driven scientific discovery. However, the current process is difficult, error-prone, and laborious due to (1) the enormous amount of scientific literature available; (2) the highly-specialized scientific domains; (3) the diverse modalities of information (text, figure, table); and, (4) the silos of scientific knowledge in different publications with inconsistent formats and structures. Informed by a formative study and iterated with participatory design workshops, we designed and developed KnowledgeShovel, an Al-in-the-Loop document annotation system for researchers to construct scientific knowledge bases. The design of KnowledgeShovel introduces a multi-step multi-modal human-AI collaboration pipeline that aligns with users' existing workflows to improve data accuracy while reducing the human burden. A follow-up user evaluation with 7 geoscience researchers shows that KnowledgeShovel can enable efficient construction of scientific knowledge bases with satisfactory accuracy. △ Less

Submitted 6 October, 2022; originally announced October 2022.

Comments: 33 pages, 17 figures, manuscript submitted to CHI2023

arXiv:2206.04612 [pdf, ps, other]

A computational framework for weighted simplicial homology

Authors: Andrei C. Bura, Neelav S. Dutta, Thomas J. X. Li, Christian M. Reidys

Abstract: We provide a bottom up construction of torsion generators for weighted homology of a weighted complex over a discrete valuation ring $R=\mathbb{F}[[π]]$. This is achieved by starting from a basis for classical homology of the $n$-th skeleton for the underlying complex with coefficients in the residue field $\mathbb{F}$ and then lifting it to a basis for the weighted homology with coefficients in t… ▽ More We provide a bottom up construction of torsion generators for weighted homology of a weighted complex over a discrete valuation ring $R=\mathbb{F}[[π]]$. This is achieved by starting from a basis for classical homology of the $n$-th skeleton for the underlying complex with coefficients in the residue field $\mathbb{F}$ and then lifting it to a basis for the weighted homology with coefficients in the ring $R$. Using the latter, a bijection is established between $n+1$ and $n$ dimensional simplices whose weight ratios provide the exponents of the $π$-monomials that generate each torsion summand in the structure theorem of the weighted homology modules over $R$. We present algorithms that subsume the torsion computation by reducing it to normalization over the residue field of $R$, and describe a Python package we implemented that takes advantage of this reduction and performs the computation efficiently. △ Less

Submitted 9 June, 2022; originally announced June 2022.

Comments: 16 pages, 2 figures

MSC Class: 05E45; 55U10; 55N35; 13P20

arXiv:2204.13842 [pdf, other]

A Bottom-Up End-User Intelligent Assistant Approach to Empower Gig Workers against AI Inequality

Authors: Toby Jia-Jun Li, Yuwen Lu, Jaylexia Clark, Meng Chen, Victor Cox, Meng Jiang, Yang Yang, Tamara Kay, Danielle Wood, Jay Brockman

Abstract: The growing inequality in gig work between workers and platforms has become a critical social issue as gig work plays an increasingly prominent role in the future of work. The AI inequality is caused by (1) the technology divide in who has access to AI technologies in gig work; and (2) the data divide in who owns the data in gig work leads to unfair working conditions, growing pay gap, neglect of… ▽ More The growing inequality in gig work between workers and platforms has become a critical social issue as gig work plays an increasingly prominent role in the future of work. The AI inequality is caused by (1) the technology divide in who has access to AI technologies in gig work; and (2) the data divide in who owns the data in gig work leads to unfair working conditions, growing pay gap, neglect of workers' diverse preferences, and workers' lack of trust in the platforms. In this position paper, we argue that a bottom-up approach that empowers individual workers to access AI-enabled work planning support and share data among a group of workers through a network of end-user-programmable intelligent assistants is a practical way to bridge AI inequality in gig work under the current paradigm of privately owned platforms. This position paper articulates a set of research challenges, potential approaches, and community engagement opportunities, seeking to start a dialogue on this important research topic in the interdisciplinary CHIWORK community. △ Less

Submitted 28 April, 2022; originally announced April 2022.

Comments: In 2022 Symposium on Human-Computer Interaction for Work (CHIWORK 2022)

arXiv:2203.13947 [pdf, other]

Fantastic Questions and Where to Find Them: FairytaleQA -- An Authentic Dataset for Narrative Comprehension

Authors: Ying Xu, Dakuo Wang, Mo Yu, Daniel Ritchie, Bingsheng Yao, Tongshuang Wu, Zheng Zhang, Toby Jia-Jun Li, Nora Bradford, Branda Sun, Tran Bao Hoang, Yisi Sang, Yufang Hou, Xiaojuan Ma, Diyi Yang, Nanyun Peng, Zhou Yu, Mark Warschauer

Abstract: Question answering (QA) is a fundamental means to facilitate assessment and training of narrative comprehension skills for both machines and young children, yet there is scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on th… ▽ More Question answering (QA) is a fundamental means to facilitate assessment and training of narrative comprehension skills for both machines and young children, yet there is scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on the reading education research, we introduce FairytaleQA, a dataset focusing on narrative comprehension of kindergarten to eighth-grade students. Generated by educational experts based on an evidence-based theoretical framework, FairytaleQA consists of 10,580 explicit and implicit questions derived from 278 children-friendly stories, covering seven types of narrative elements or relations. Our dataset is valuable in two folds: First, we ran existing QA models on our dataset and confirmed that this annotation helps assess models' fine-grained learning skills. Second, the dataset supports question generation (QG) task in the education domain. Through benchmarking with QG models, we show that the QG model trained on FairytaleQA is capable of asking high-quality and more diverse questions. △ Less

Submitted 25 March, 2022; originally announced March 2022.

Comments: Accepted to ACL 2022

arXiv:2202.06205 [pdf, other]

doi 10.1145/3491102.3517479

StoryBuddy: A Human-AI Collaborative Chatbot for Parent-Child Interactive Storytelling with Flexible Parental Involvement

Authors: Zheng Zhang, Ying Xu, Yanhao Wang, Bingsheng Yao, Daniel Ritchie, Tongshuang Wu, Mo Yu, Dakuo Wang, Toby Jia-Jun Li

Abstract: Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling by having story-related dialogues with their child due to limited availability or challenges in coming up with appropriate questions. While recent advances made AI generation of questions from stories possible, the fully-automated approach excludes parent inv… ▽ More Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling by having story-related dialogues with their child due to limited availability or challenges in coming up with appropriate questions. While recent advances made AI generation of questions from stories possible, the fully-automated approach excludes parent involvement, disregards educational goals, and underoptimizes for child engagement. Informed by need-finding interviews and participatory design (PD) results, we developed StoryBuddy, an AI-enabled system for parents to create interactive storytelling experiences. StoryBuddy's design highlighted the need for accommodating dynamic user needs between the desire for parent involvement and parent-child bonding and the goal of minimizing parent intervention when busy. The PD revealed varied assessment and educational goals of parents, which StoryBuddy addressed by supporting configuring question types and tracking child progress. A user study validated StoryBuddy's usability and suggested design insights for future parent-AI collaboration systems. △ Less

Submitted 14 March, 2022; v1 submitted 12 February, 2022; originally announced February 2022.

Comments: Published at CHI 2022

arXiv:2109.03423 [pdf, other]

It is AI's Turn to Ask Humans a Question: Question-Answer Pair Generation for Children's Story Books

Authors: Bingsheng Yao, Dakuo Wang, Tongshuang Wu, Zheng Zhang, Toby Jia-Jun Li, Mo Yu, Ying Xu

Abstract: Existing question answering (QA) techniques are created mainly to answer questions asked by humans. But in educational applications, teachers often need to decide what questions they should ask, in order to help students to improve their narrative understanding capabilities. We design an automated question-answer generation (QAG) system for this education scenario: given a story book at the kinder… ▽ More Existing question answering (QA) techniques are created mainly to answer questions asked by humans. But in educational applications, teachers often need to decide what questions they should ask, in order to help students to improve their narrative understanding capabilities. We design an automated question-answer generation (QAG) system for this education scenario: given a story book at the kindergarten to eighth-grade level as input, our system can automatically generate QA pairs that are capable of testing a variety of dimensions of a student's comprehension skills. Our proposed QAG model architecture is demonstrated using a new expert-annotated FairytaleQA dataset, which has 278 child-friendly storybooks with 10,580 QA pairs. Automatic and human evaluations show that our model outperforms state-of-the-art QAG baseline systems. On top of our QAG system, we also start to build an interactive story-telling application for the future real-world deployment in this educational scenario. △ Less

Submitted 25 March, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: Accepted to ACL 2022

arXiv:2105.10127 [pdf, other]

A Need-finding Study for Understanding Text Entry in Smartphone App Usage

Authors: Toby Jia-Jun Li, Brad A. Myers

Abstract: Text entry makes up about one-fourth of the smartphone interaction events, and is known to be challenging and difficult. However, there has been little study about the characteristics of text entry in the context of smartphone app usage. In this paper, we present a mixed-method in-situ study conducted in 2016 with 17 active smartphone users to better understand text entry in smartphone app usage.… ▽ More Text entry makes up about one-fourth of the smartphone interaction events, and is known to be challenging and difficult. However, there has been little study about the characteristics of text entry in the context of smartphone app usage. In this paper, we present a mixed-method in-situ study conducted in 2016 with 17 active smartphone users to better understand text entry in smartphone app usage. Our results show 80% of text was entered into communication apps, with different apps exhibiting distinct usage patterns. We found that structured data such as URLs and email addresses are rarely typed but instead are auto-completed or replaced with search, copy-and-paste is rarely used, and sessions of smartphone usage with text entry involve more apps and last longer. We conclude with a discussion about the implications on the development of systems to better support mobile interaction. △ Less

Submitted 19 June, 2021; v1 submitted 21 May, 2021; originally announced May 2021.

arXiv:2101.11103 [pdf, other]

doi 10.1145/3411764.3445049

Screen2Vec: Semantic Embedding of GUI Screens and GUI Components

Authors: Toby Jia-Jun Li, Lindsay Popowski, Tom M. Mitchell, Brad A. Myers

Abstract: Representing the semantics of GUI screens and components is crucial to data-driven computational methods for modeling user-GUI interactions and mining GUI designs. Existing GUI semantic representations are limited to encoding either the textual content, the visual design and layout patterns, or the app contexts. Many representation techniques also require significant manual data annotation efforts… ▽ More Representing the semantics of GUI screens and components is crucial to data-driven computational methods for modeling user-GUI interactions and mining GUI designs. Existing GUI semantic representations are limited to encoding either the textual content, the visual design and layout patterns, or the app contexts. Many representation techniques also require significant manual data annotation efforts. This paper presents Screen2Vec, a new self-supervised technique for generating representations in embedding vectors of GUI screens and components that encode all of the above GUI features without requiring manual annotation using the context of user interaction traces. Screen2Vec is inspired by the word embedding method Word2Vec, but uses a new two-layer pipeline informed by the structure of GUIs and interaction traces and incorporates screen- and app-specific metadata. Through several sample downstream tasks, we demonstrate Screen2Vec's key useful properties: representing between-screen similarity through nearest neighbors, composability, and capability to represent user tasks. △ Less

Submitted 11 January, 2021; originally announced January 2021.

Comments: Accepted to CHI Conference on Human Factors in Computing Systems (CHI 2021)

arXiv:2007.09809 [pdf, other]

Geno: A Developer Tool for Authoring Multimodal Interaction on Existing Web Applications

Authors: Ritam Jyoti Sarmah, Yunpeng Ding, Di Wang, Cheuk Yin Phipson Lee, Toby Jia-Jun Li, Xiang 'Anthony' Chen

Abstract: Supporting voice commands in applications presents significant benefits to users. However, adding such support to existing GUI-based web apps is effort-consuming with a high learning barrier, as shown in our formative study, due to the lack of unified support for creating multimodal interfaces. We present Geno---a developer tool for adding the voice input modality to existing web apps without requ… ▽ More Supporting voice commands in applications presents significant benefits to users. However, adding such support to existing GUI-based web apps is effort-consuming with a high learning barrier, as shown in our formative study, due to the lack of unified support for creating multimodal interfaces. We present Geno---a developer tool for adding the voice input modality to existing web apps without requiring significant NLP expertise. Geno provides a high-level workflow for developers to specify functionalities to be supported by voice (intents), create language models for detecting intents and the relevant information (parameters) from user utterances, and fulfill the intents by either programmatically invoking the corresponding functions or replaying GUI actions on the web app. Geno further supports multimodal references to GUI context in voice commands (e.g. "move this [event] to next week" while pointing at an event with the cursor). In a study, developers with little NLP expertise were able to add multimodal voice command support for two existing web apps using Geno. △ Less

Submitted 19 July, 2020; originally announced July 2020.

arXiv:2004.08353 [pdf, other]

doi 10.1145/3392869

Privacy-Preserving Script Sharing in GUI-based Programming-by-Demonstration Systems

Authors: Toby Jia-Jun Li, Jingya Chen, Brandon Canfield, Brad A. Myers

Abstract: An important concern in end user development (EUD) is accidentally embedding personal information in program artifacts when sharing them. This issue is particularly important in GUI-based programming-by-demonstration (PBD) systems due to the lack of direct developer control of script contents. Prior studies reported that these privacy concerns were the main barrier to script sharing in EUD. We pre… ▽ More An important concern in end user development (EUD) is accidentally embedding personal information in program artifacts when sharing them. This issue is particularly important in GUI-based programming-by-demonstration (PBD) systems due to the lack of direct developer control of script contents. Prior studies reported that these privacy concerns were the main barrier to script sharing in EUD. We present a new approach that can identify and obfuscate the potential personal information in GUI-based PBD scripts based on the uniqueness of information entries with respect to the corresponding app GUI context. Compared with the prior approaches, ours supports broader types of personal information beyond explicitly pre-specified ones, requires minimal user effort, addresses the threat of re-identification attacks, and can work with third-party apps from any task domain. Our approach also recovers obfuscated fields locally on the script consumer's side to preserve the shared scripts' transparency, readability, robustness, and generalizability. Our evaluation shows that our approach (1) accurately identifies the potential personal information in scripts across different apps in diverse task domains; (2) allows end-user developers to feel comfortable sharing their own scripts; and (3) enables script consumers to understand the operation of shared scripts despite the obfuscated fields. △ Less

Submitted 17 April, 2020; originally announced April 2020.

Comments: In the Proceedings of the ACM on Human-Computer Interaction (PACM) Vol.4 No. CSCW1. (CSCW 2020)

Journal ref: Proc. ACM Hum.-Comput. Interact. 4, CSCW1, Article 60 (May 2020), 23 pages

arXiv:2003.02622 [pdf]

Towards Effective Human-AI Collaboration in GUI-Based Interactive Task Learning Agents

Authors: Toby Jia-Jun Li, Jingya Chen, Tom M. Mitchell, Brad A. Myers

Abstract: We argue that a key challenge in enabling usable and useful interactive task learning for intelligent agents is to facilitate effective Human-AI collaboration. We reflect on our past 5 years of efforts on designing, developing and studying the SUGILITE system, discuss the issues on incorporating recent advances in AI with HCI principles in mixed-initiative interactions and multi-modal interactions… ▽ More We argue that a key challenge in enabling usable and useful interactive task learning for intelligent agents is to facilitate effective Human-AI collaboration. We reflect on our past 5 years of efforts on designing, developing and studying the SUGILITE system, discuss the issues on incorporating recent advances in AI with HCI principles in mixed-initiative interactions and multi-modal interactions, and summarize the lessons we learned. Lastly, we identify several challenges and opportunities, and describe our ongoing work △ Less

Submitted 5 March, 2020; originally announced March 2020.

Journal ref: CHI 2020 Workshop on Artificial Intelligence for HCI: A Modern Approach (AI4HCI)

arXiv:1909.05744 [pdf, ps, other]

On an enhancement of RNA probing data using Information Theory

Authors: Thomas J. X. Li, Christian M. Reidys

Abstract: Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to solve the problem, via considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree,… ▽ More Identifying the secondary structure of an RNA is crucial for understanding its diverse regulatory functions. This paper focuses on how to enhance target identification in a Boltzmann ensemble of structures via chemical probing data. We employ an information-theoretic approach to solve the problem, via considering a variant of the Rényi-Ulam game. Our framework is centered around the ensemble tree, a hierarchical bi-partition of the input ensemble, that is constructed by recursively querying about whether or not a base pair of maximum information entropy is contained in the target. These queries are answered via relating local with global probing data, employing the modularity in RNA secondary structures. We present that leaves of the tree are comprised of sub-samples exhibiting a distinguished structure with high probability. In particular, for a Boltzmann ensemble incorporating probing data, which is well established in the literature, the probability of our framework correctly identifying the target in the leaf is greater than $90\%$. △ Less

Submitted 12 September, 2019; originally announced September 2019.

MSC Class: 94A15; 68P30; 92E10; 92B05

arXiv:1909.00031 [pdf, other]

Interactive Task and Concept Learning from Natural Language Instructions and GUI Demonstrations

Authors: Toby Jia-Jun Li, Marissa Radensky, Justin Jia, Kirielle Singarajah, Tom M. Mitchell, Brad A. Myers

Abstract: Natural language programming is a promising approach to enable end users to instruct new tasks for intelligent agents. However, our formative study found that end users would often use unclear, ambiguous or vague concepts when naturally instructing tasks in natural language, especially when specifying conditionals. Existing systems have limited support for letting the user teach agents new concept… ▽ More Natural language programming is a promising approach to enable end users to instruct new tasks for intelligent agents. However, our formative study found that end users would often use unclear, ambiguous or vague concepts when naturally instructing tasks in natural language, especially when specifying conditionals. Existing systems have limited support for letting the user teach agents new concepts or explaining unclear concepts. In this paper, we describe a new multi-modal domain-independent approach that combines natural language programming and programming-by-demonstration to allow users to first naturally describe tasks and associated conditions at a high level, and then collaborate with the agent to recursively resolve any ambiguities or vagueness through conversations and demonstrations. Users can also define new procedures and concepts by demonstrating and referring to contents within GUIs of existing mobile apps. We demonstrate this approach in PUMICE, an end-user programmable agent that implements this approach. A lab study with 10 users showed its usability. △ Less

Submitted 6 January, 2020; v1 submitted 30 August, 2019; originally announced September 2019.

Comments: The AAAI-20 Workshop on Intelligent Process Automation (IPA-20)

arXiv:1908.10954 [pdf]

doi 10.1145/2858036.2858123

Not at Home on the Range: Peer Production and the Urban/Rural Divide

Authors: Isaac Johnson, Allen Yilun Lin, Toby Jia-Jun Li, Andrew Hall, Aaron Halfaker, Johannes Schöning, Brent Hecht

Abstract: Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. In this paper, we explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in bo… ▽ More Wikipedia articles about places, OpenStreetMap features, and other forms of peer-produced content have become critical sources of geographic knowledge for humans and intelligent technologies. In this paper, we explore the effectiveness of the peer production model across the rural/urban divide, a divide that has been shown to be an important factor in many online social systems. We find that in both Wikipedia and OpenStreetMap, peer-produced content about rural areas is of systematically lower quality, is less likely to have been produced by contributors who focus on the local area, and is more likely to have been generated by automated software agents (i.e. bots). We then codify the systemic challenges inherent to characterizing rural phenomena through peer production and discuss potential solutions. △ Less

Submitted 28 August, 2019; originally announced August 2019.

Comments: 10 pages, published on CHI'16

ACM Class: H.5.m

Journal ref: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems

arXiv:1307.1961 [pdf, ps, other]

Optimal Locally Repairable Linear Codes

Authors: Wentu Song, Son Hoang Dau, Chau Yuen, Tiffany Jing Li

Abstract: Linear erasure codes with local repairability are desirable for distributed data storage systems. An [n, k, d] code having all-symbol (r, δ})-locality, denoted as (r, δ)a, is considered optimal if it also meets the minimum Hamming distance bound. The existing results on the existence and the construction of optimal (r, δ)a codes are limited to only the special case of δ = 2, and to only two small… ▽ More Linear erasure codes with local repairability are desirable for distributed data storage systems. An [n, k, d] code having all-symbol (r, δ})-locality, denoted as (r, δ)a, is considered optimal if it also meets the minimum Hamming distance bound. The existing results on the existence and the construction of optimal (r, δ)a codes are limited to only the special case of δ = 2, and to only two small regions within this special case, namely, m = 0 or m >= (v+δ-1) > (δ-1), where m = n mod (r+δ-1) and v = k mod r. This paper investigates the existence conditions and presents deterministic constructive algorithms for optimal (r, δ)a codes with general r and δ. First, a structure theorem is derived for general optimal (r, δ)a codes which helps illuminate some of their structure properties. Next, the entire problem space with arbitrary n, k, r and δ is divided into eight different cases (regions) with regard to the specific relations of these parameters. For two cases, it is rigorously proved that no optimal (r, δ)a could exist. For four other cases the optimal (r, δ)a codes are shown to exist, deterministic constructions are proposed and the lower bound on the required field size for these algorithms to work is provided. Our new constructive algorithms not only cover more cases, but for the same cases where previous algorithms exist, the new constructions require a considerably smaller field, which translates to potentially lower computational complexity. Our findings substantially enriches the knowledge on (r, δ)a codes, leaving only two cases in which the existence of optimal codes are yet to be determined. △ Less

Submitted 8 July, 2013; originally announced July 2013.

Comments: Under Review

arXiv:1209.5212 [pdf, ps, other]

Error Correction for Cooperative Data Exchange

Authors: Wentu Song, Xiumin Wang, Chau Yuen, Tiffany Jing Li, Rongquan Feng

Abstract: This paper considers the problem of error correction for a cooperative data exchange (CDE) system, where some clients are compromised or failed and send false messages. Assuming each client possesses a subset of the total messages, we analyze the error correction capability when every client is allowed to broadcast only one linearly-coded message. Our error correction capability bound determines t… ▽ More This paper considers the problem of error correction for a cooperative data exchange (CDE) system, where some clients are compromised or failed and send false messages. Assuming each client possesses a subset of the total messages, we analyze the error correction capability when every client is allowed to broadcast only one linearly-coded message. Our error correction capability bound determines the maximum number of clients that can be compromised or failed without jeopardizing the final decoding solution at each client. We show that deterministic, feasible linear codes exist that can achieve the derived bound. We also evaluate random linear codes, where the coding coefficients are drawn randomly, and then develop the probability for a client to withstand a certain number of compromised or failed peers and successfully deduce the complete message for any network size and any initial message distributions. △ Less

Submitted 24 September, 2012; originally announced September 2012.

Showing 1–40 of 40 results for author: Li, T J