subscribe to arXiv mailings

doi 10.1145/3635636.3656189

MIMOSA: Human-AI Co-Creation of Computational Spatial Audio Effects on Videos

Authors: Zheng Ning, Zheng Zhang, Jerrick Ban, Kaiwen Jiang, Ruohong Gan, Yapeng Tian, Toby Jia-Jun Li

Abstract: Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio often expensive and requires specialized equipment and skills, posing a high barrier for amateur video creators. We present MIMOSA, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only mon… ▽ More Spatial audio offers more immersive video consumption experiences to viewers; however, creating and editing spatial audio often expensive and requires specialized equipment and skills, posing a high barrier for amateur video creators. We present MIMOSA, a human-AI co-creation tool that enables amateur users to computationally generate and manipulate spatial audio effects. For a video with only monaural or stereo audio, MIMOSA automatically grounds each sound source to the corresponding sounding object in the visual scene and enables users to further validate and fix the errors in the locations of sounding objects. Users can also augment the spatial audio effect by flexibly manipulating the sounding source positions and creatively customizing the audio effect. The design of MIMOSA exemplifies a human-AI collaboration approach that, instead of utilizing state-of art end-to-end "black-box" ML models, uses a multistep pipeline that aligns its interpretable intermediate results with the user's workflow. A lab user study with 15 participants demonstrates MIMOSA's usability, usefulness, expressiveness, and capability in creating immersive spatial audio effects in collaboration with users. △ Less

Submitted 23 April, 2024; originally announced April 2024.

arXiv:2402.16102 [pdf, other]

Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?

Authors: Joris Baan, Raquel Fernández, Barbara Plank, Wilker Aziz

Abstract: With the rise of increasingly powerful and user-facing NLP systems, there is growing interest in assessing whether they have a good representation of uncertainty by evaluating the quality of their predictive distribution over outcomes. We identify two main perspectives that drive starkly different evaluation protocols. The first treats predictive probability as an indication of model confidence; t… ▽ More With the rise of increasingly powerful and user-facing NLP systems, there is growing interest in assessing whether they have a good representation of uncertainty by evaluating the quality of their predictive distribution over outcomes. We identify two main perspectives that drive starkly different evaluation protocols. The first treats predictive probability as an indication of model confidence; the second as an indication of human label variation. We discuss their merits and limitations, and take the position that both are crucial for trustworthy and fair NLP systems, but that exploiting a single predictive distribution is limiting. We recommend tools and highlight exciting directions towards models with disentangled representations of uncertainty about predictions and uncertainty about human labels. △ Less

Submitted 25 February, 2024; originally announced February 2024.

Comments: EACL 2024 main

arXiv:2402.07300 [pdf, other]

SPICA: Interactive Video Content Exploration through Augmented Audio Descriptions for Blind or Low-Vision Viewers

Authors: Zheng Ning, Brianna L. Wimer, Kaiwen Jiang, Keyi Chen, Jerrick Ban, Yapeng Tian, Yuhang Zhao, Toby Jia-Jun Li

Abstract: Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video… ▽ More Blind or Low-Vision (BLV) users often rely on audio descriptions (AD) to access video content. However, conventional static ADs can leave out detailed information in videos, impose a high mental load, neglect the diverse needs and preferences of BLV users, and lack immersion. To tackle these challenges, we introduce SPICA, an AI-powered system that enables BLV users to interactively explore video content. Informed by prior empirical studies on BLV video consumption, SPICA offers novel interactive mechanisms for supporting temporal navigation of frame captions and spatial exploration of objects within key frames. Leveraging an audio-visual machine learning pipeline, SPICA augments existing ADs by adding interactivity, spatial sound effects, and individual object descriptions without requiring additional human annotation. Through a user study with 14 BLV participants, we evaluated the usability and usefulness of SPICA and explored user behaviors, preferences, and mental models when interacting with augmented ADs. △ Less

Submitted 26 February, 2024; v1 submitted 11 February, 2024; originally announced February 2024.

arXiv:2307.15703 [pdf, other]

Uncertainty in Natural Language Generation: From Theory to Applications

Authors: Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz

Abstract: Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely… ▽ More Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely to be wrong; and supporting multiple views, backgrounds and writing styles -- reflecting diverse human sub-populations. In this paper, we argue that a principled treatment of uncertainty can assist in creating systems and evaluation protocols better aligned with these goals. We first present the fundamental theory, frameworks and vocabulary required to represent uncertainty. We then characterise the main sources of uncertainty in NLG from a linguistic perspective, and propose a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy. Finally, we move from theory to applications and highlight exciting research directions that exploit uncertainty to power decoding, controllable generation, self-assessment, selective answering, active learning and more. △ Less

Submitted 28 July, 2023; originally announced July 2023.

arXiv:2305.11707 [pdf, other]

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

Authors: Mario Giulianelli, Joris Baan, Wilker Aziz, Raquel Fernández, Barbara Plank

Abstract: In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways. We characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric or data uncertainty. We then inspect the space of o… ▽ More In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways. We characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric or data uncertainty. We then inspect the space of output strings shaped by a generation system's predicted probability distribution and decoding algorithm to probe its uncertainty. For each test input, we measure the generator's calibration to human production variability. Following this instance-level approach, we analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples and, when possible, multiple references, provides the level of detail necessary to gain understanding of a model's representation of uncertainty. Code available at https://github.com/dmg-illc/nlg-uncertainty-probes. △ Less

Submitted 20 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: Camera ready version for EMNLP 2023

arXiv:2210.16133 [pdf, other]

Stop Measuring Calibration When Humans Disagree

Authors: Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernández

Abstract: Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We sh… ▽ More Calibration is a popular framework to evaluate whether a classifier knows when it does not know - i.e., its predictive probabilities are a good indication of how likely a prediction is to be correct. Correctness is commonly estimated against the human majority class. Recently, calibration to human majority has been measured on tasks where humans inherently disagree about which class applies. We show that measuring calibration to human majority given inherent disagreements is theoretically problematic, demonstrate this empirically on the ChaosNLI dataset, and derive several instance-level measures of calibration that capture key statistical properties of human judgements - class frequency, ranking and entropy. △ Less

Submitted 30 November, 2022; v1 submitted 28 October, 2022; originally announced October 2022.

Comments: Accepted at EMNLP 2022

arXiv:2010.09648 [pdf]

Agent-based Simulation Model and Deep Learning Techniques to Evaluate and Predict Transportation Trends around COVID-19

Authors: Ding Wang, Fan Zuo, Jingqin Gao, Yueshuai He, Zilin Bian, Suzana Duran Bernardes, Chaekuk Na, Jingxing Wang, John Petinos, Kaan Ozbay, Joseph Y. J. Chow, Shri Iyer, Hani Nassif, Xuegang Jeff Ban

Abstract: The COVID-19 pandemic has affected travel behaviors and transportation system operations, and cities are grappling with what policies can be effective for a phased reopening shaped by social distancing. This edition of the white paper updates travel trends and highlights an agent-based simulation model's results to predict the impact of proposed phased reopening strategies. It also introduces a re… ▽ More The COVID-19 pandemic has affected travel behaviors and transportation system operations, and cities are grappling with what policies can be effective for a phased reopening shaped by social distancing. This edition of the white paper updates travel trends and highlights an agent-based simulation model's results to predict the impact of proposed phased reopening strategies. It also introduces a real-time video processing method to measure social distancing through cameras on city streets. △ Less

Submitted 23 September, 2020; originally announced October 2020.

arXiv:2006.14882 [pdf, other]

An Interactive Data Visualization and Analytics Tool to Evaluate Mobility and Sociability Trends During COVID-19

Authors: Fan Zuo, Jingxing Wang, Jingqin Gao, Kaan Ozbay, Xuegang Jeff Ban, Yubin Shen, Hong Yang, Shri Iyer

Abstract: The COVID-19 outbreak has dramatically changed travel behavior in affected cities. The C2SMART research team has been investigating the impact of COVID-19 on mobility and sociability. New York City (NYC) and Seattle, two of the cities most affected by COVID-19 in the U.S. were included in our initial study. An all-in-one dashboard with data mining and cloud computing capabilities was developed for… ▽ More The COVID-19 outbreak has dramatically changed travel behavior in affected cities. The C2SMART research team has been investigating the impact of COVID-19 on mobility and sociability. New York City (NYC) and Seattle, two of the cities most affected by COVID-19 in the U.S. were included in our initial study. An all-in-one dashboard with data mining and cloud computing capabilities was developed for interactive data analytics and visualization to facilitate the understanding of the impact of the outbreak and corresponding policies such as social distancing on transportation systems. This platform is updated regularly and continues to evolve with the addition of new data, impact metrics, and visualizations to assist public and decision-makers to make informed decisions. This paper presents the architecture of the COVID related mobility data dashboard and preliminary mobility and sociability metrics for NYC and Seattle. △ Less

Submitted 26 June, 2020; originally announced June 2020.

arXiv:1911.03898 [pdf, other]

Understanding Multi-Head Attention in Abstractive Summarization

Authors: Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke

Abstract: Attention mechanisms in deep learning architectures have often been used as a means of transparency and, as such, to shed light on the inner workings of the architectures. Recently, there has been a growing interest in whether or not this assumption is correct. In this paper we investigate the interpretability of multi-head attention in abstractive summarization, a sequence-to-sequence task for wh… ▽ More Attention mechanisms in deep learning architectures have often been used as a means of transparency and, as such, to shed light on the inner workings of the architectures. Recently, there has been a growing interest in whether or not this assumption is correct. In this paper we investigate the interpretability of multi-head attention in abstractive summarization, a sequence-to-sequence task for which attention does not have an intuitive alignment role, such as in machine translation. We first introduce three metrics to gain insight in the focus of attention heads and observe that these heads specialize towards relative positions, specific part-of-speech tags, and named entities. However, we also find that ablating and pruning these heads does not lead to a significant drop in performance, indicating redundancy. By replacing the softmax activation functions with sparsemax activation functions, we find that attention heads behave seemingly more transparent: we can ablate fewer heads and heads score higher on our interpretability metrics. However, if we apply pruning to the sparsemax model we find that we can prune even more heads, raising the question whether enforced sparsity actually improves transparency. Finally, we find that relative positions heads seem integral to summarization performance and persistently remain after pruning. △ Less

Submitted 10 November, 2019; originally announced November 2019.

arXiv:1907.00570 [pdf, other]

Do Transformer Attention Heads Provide Transparency in Abstractive Summarization?

Authors: Joris Baan, Maartje ter Hoeve, Marlies van der Wees, Anne Schuth, Maarten de Rijke

Abstract: Learning algorithms become more powerful, often at the cost of increased complexity. In response, the demand for algorithms to be transparent is growing. In NLP tasks, attention distributions learned by attention-based deep learning models are used to gain insights in the models' behavior. To which extent is this perspective valid for all NLP tasks? We investigate whether distributions calculated… ▽ More Learning algorithms become more powerful, often at the cost of increased complexity. In response, the demand for algorithms to be transparent is growing. In NLP tasks, attention distributions learned by attention-based deep learning models are used to gain insights in the models' behavior. To which extent is this perspective valid for all NLP tasks? We investigate whether distributions calculated by different attention heads in a transformer architecture can be used to improve transparency in the task of abstractive summarization. To this end, we present both a qualitative and quantitative analysis to investigate the behavior of the attention heads. We show that some attention heads indeed specialize towards syntactically and semantically distinct input. We propose an approach to evaluate to which extent the Transformer model relies on specifically learned attention distributions. We also discuss what this implies for using attention distributions as a means of transparency. △ Less

Submitted 8 July, 2019; v1 submitted 1 July, 2019; originally announced July 2019.

Comments: To appear at FACTS-IR 2019, SIGIR

arXiv:1906.01634 [pdf, other]

On the Realization of Compositionality in Neural Networks

Authors: Joris Baan, Jana Leible, Mitja Nikolaus, David Rau, Dennis Ulmer, Tim Baumgärtner, Dieuwke Hupkes, Elia Bruni

Abstract: We present a detailed comparison of two types of sequence to sequence models trained to conduct a compositional task. The models are architecturally identical at inference time, but differ in the way that they are trained: our baseline model is trained with a task-success signal only, while the other model receives additional supervision on its attention mechanism (Attentive Guidance), which has s… ▽ More We present a detailed comparison of two types of sequence to sequence models trained to conduct a compositional task. The models are architecturally identical at inference time, but differ in the way that they are trained: our baseline model is trained with a task-success signal only, while the other model receives additional supervision on its attention mechanism (Attentive Guidance), which has shown to be an effective method for encouraging more compositional solutions (Hupkes et al.,2019). We first confirm that the models with attentive guidance indeed infer more compositional solutions than the baseline, by training them on the lookup table task presented by Liška et al. (2019). We then do an in-depth analysis of the structural differences between the two model types, focusing in particular on the organisation of the parameter space and the hidden layer activations and find noticeable differences in both these aspects. Guided networks focus more on the components of the input rather than the sequence as a whole and develop small functional groups of neurons with specific purposes that use their gates more selectively. Results from parameter heat maps, component swapping and graph analysis also indicate that guided networks exhibit a more modular structure with a small number of specialized, strongly connected neurons. △ Less

Submitted 6 June, 2019; v1 submitted 4 June, 2019; originally announced June 2019.

Comments: To appear at BlackboxNLP 2019, ACL

arXiv:1808.04925 [pdf, ps, other]

Complexity of Shift Spaces on Semigroups

Authors: J. C. Ban, C. H. Chang, Y. Z. Huang

Abstract: Let $G=\left\langle S|R_{A}\right\rangle $ be a semigroup with generating set $ S$ and equivalences $R_{A}$ among $S$ determined by a matrix $A$. This paper investigates the complexity of $G$-shift spaces by yielding the topological entropies. After revealing the existence of topological entropy of $G$-shift of finite type ($G$-SFT), the calculation of topological entropy of $G$-SFT is equivalent… ▽ More Let $G=\left\langle S|R_{A}\right\rangle $ be a semigroup with generating set $ S$ and equivalences $R_{A}$ among $S$ determined by a matrix $A$. This paper investigates the complexity of $G$-shift spaces by yielding the topological entropies. After revealing the existence of topological entropy of $G$-shift of finite type ($G$-SFT), the calculation of topological entropy of $G$-SFT is equivalent to solving a system of nonlinear recurrence equations. The complete characterization of topological entropies of $G$-SFTs on two symbols is addressed, which extends [Ban and Chang, arXiv:1803.03082] in which $G$ is a free semigroup. △ Less

Submitted 14 August, 2018; originally announced August 2018.

Showing 1–12 of 12 results for author: Ban, J