-
Beware of Validation by Eye: Visual Validation of Linear Trends in Scatterplots
Authors:
Daniel Braun,
Remco Chang,
Michael Gleicher,
Tatiana von Landesberger
Abstract:
Visual validation of regression models in scatterplots is a common practice for assessing model quality, yet its efficacy remains unquantified. We conducted two empirical experiments to investigate individuals' ability to visually validate linear regression models (linear trends) and to examine the impact of common visualization designs on validation quality. The first experiment showed that the l…
▽ More
Visual validation of regression models in scatterplots is a common practice for assessing model quality, yet its efficacy remains unquantified. We conducted two empirical experiments to investigate individuals' ability to visually validate linear regression models (linear trends) and to examine the impact of common visualization designs on validation quality. The first experiment showed that the level of accuracy for visual estimation of slope (i.e., fitting a line to data) is higher than for visual validation of slope (i.e., accepting a shown line). Notably, we found bias toward slopes that are "too steep" in both cases. This lead to novel insights that participants naturally assessed regression with orthogonal distances between the points and the line (i.e., ODR regression) rather than the common vertical distances (OLS regression). In the second experiment, we investigated whether incorporating common designs for regression visualization (error lines, bounding boxes, and confidence intervals) would improve visual validation. Even though error lines reduced validation bias, results failed to show the desired improvements in accuracy for any design. Overall, our findings suggest caution in using visual model validation for linear trends in scatterplots.
△ Less
Submitted 16 July, 2024;
originally announced July 2024.
-
Design-Specific Transformations in Visualization
Authors:
Eugene Wu,
Remco Chang
Abstract:
In visualization, the process of transforming raw data into visually comprehensible representations is pivotal. While existing models like the Information Visualization Reference Model describe the data-to-visual mapping process, they often overlook a crucial intermediary step: design-specific transformations. This process, occurring after data transformation but before visual-data mapping, furthe…
▽ More
In visualization, the process of transforming raw data into visually comprehensible representations is pivotal. While existing models like the Information Visualization Reference Model describe the data-to-visual mapping process, they often overlook a crucial intermediary step: design-specific transformations. This process, occurring after data transformation but before visual-data mapping, further derives data, such as groupings, layout, and statistics, that are essential to properly render the visualization. In this paper, we advocate for a deeper exploration of design-specific transformations, highlighting their importance in understanding visualization properties, particularly in relation to user tasks. We incorporate design-specific transformations into the Information Visualization Reference Model and propose a new formalism that encompasses the user task as a function over data. The resulting formalism offers three key benefits over existing visualization models: (1) describing task as compositions of functions, (2) enabling analysis of data transformations for visual-data mapping, and (3) empowering reasoning about visualization correctness and effectiveness. We further discuss the potential implications of this model on visualization theory and visualization experiment design.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum
Authors:
Hadi Pouransari,
Chun-Liang Li,
Jen-Hao Rick Chang,
Pavan Kumar Anasosalu Vasu,
Cem Koc,
Vaishaal Shankar,
Oncel Tuzel
Abstract:
Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length. However, this method of concatenation can lead to cross-document attention within a sequence, which is neither a desirable learning signal n…
▽ More
Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length. However, this method of concatenation can lead to cross-document attention within a sequence, which is neither a desirable learning signal nor computationally efficient. Additionally, training on long sequences becomes computationally prohibitive due to the quadratic cost of attention. In this study, we introduce dataset decomposition, a novel variable sequence length training technique, to tackle these challenges. We decompose a dataset into a union of buckets, each containing sequences of the same size extracted from a unique document. During training, we use variable sequence length and batch size, sampling simultaneously from all buckets with a curriculum. In contrast to the concat-and-chunk baseline, which incurs a fixed attention cost at every step of training, our proposed method incurs a penalty proportional to the actual document lengths at each step, resulting in significant savings in training time. We train an 8k context-length 1B model at the same cost as a 2k context-length model trained with the baseline approach. Experiments on a web-scale corpus demonstrate that our approach significantly enhances performance on standard language evaluations and long-context benchmarks, reaching target accuracy 3x faster compared to the baseline. Our method not only enables efficient pretraining on long sequences but also scales effectively with dataset size. Lastly, we shed light on a critical yet less studied aspect of training large language models: the distribution and curriculum of sequence lengths, which results in a non-negligible difference in performance.
△ Less
Submitted 21 May, 2024;
originally announced May 2024.
-
A Typology of Decision-Making Tasks for Visualization
Authors:
Camelia D. Brumar,
Sam Molnar,
Gabriel Appleby,
Kristi Potter,
Remco Chang
Abstract:
Despite decision-making being a vital goal of data visualization, little work has been done to differentiate the decision-making tasks within our field. While visualization task taxonomies and typologies exist, they are often too granular for describing complex decision goals and decision-making processes, thus limiting their potential use in designing decision-support tools. In this paper, we con…
▽ More
Despite decision-making being a vital goal of data visualization, little work has been done to differentiate the decision-making tasks within our field. While visualization task taxonomies and typologies exist, they are often too granular for describing complex decision goals and decision-making processes, thus limiting their potential use in designing decision-support tools. In this paper, we contribute a typology of decision-making tasks that were iteratively refined from a list of design goals distilled from a literature review. Our typology is concise and consists of only three tasks: choose, activate, and create. Originally proposed by the scientific community, we extend and provide definitions for these tasks that are suitable for the visualization community. Our proposed typology offers two benefits. First, it facilitates the composition of decisions using these three tasks, allowing for flexible and clear descriptions across varying complexities and domains. Second, diagrams created using this typology encourage productive discourse between visualization designers and domain experts by abstracting the intricacies of data, thereby promoting clarity and rigorous analysis of decision-making processes. We motivate the use of our typology through four case studies and demonstrate the benefits of our approach through semi-structured interviews conducted with experienced members of the visualization community, comprising academic and industry experts, who have contributed to developing or publishing decision support systems for domain experts. Our interviewees composed diagrams using our typology to delineate the decision-making processes that drive their decision-support tools, demonstrating its descriptive capacity and effectiveness.
△ Less
Submitted 22 April, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
DimBridge: Interactive Explanation of Visual Patterns in Dimensionality Reductions with Predicate Logic
Authors:
Brian Montambault,
Gabriel Appleby,
Jen Rogers,
Camelia D. Brumar,
Mingwei Li,
Remco Chang
Abstract:
Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact wi…
▽ More
Dimensionality reduction techniques are widely used for visualizing high-dimensional data. However, support for interpreting patterns of dimension reduction results in the context of the original data space is often insufficient. Consequently, users may struggle to extract insights from the projections. In this paper, we introduce DimBridge, a visual analytics tool that allows users to interact with visual patterns in a projection and retrieve corresponding data patterns. DimBridge supports several interactions, allowing users to perform various analyses, from contrasting multiple clusters to explaining complex latent structures. Leveraging first-order predicate logic, DimBridge identifies subspaces in the original dimensions relevant to a queried pattern and provides an interface for users to visualize and interact with them. We demonstrate how DimBridge can help users overcome the challenges associated with interpreting visual patterns in projections.
△ Less
Submitted 12 April, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Preliminary Guidelines For Combining Data Integration and Visual Data Analysis
Authors:
Adam Coscia,
Ashley Suh,
Remco Chang,
Alex Endert
Abstract:
Data integration is often performed to consolidate information from multiple disparate data sources during visual data analysis. However, integration operations are usually separate from visual analytics operations such as encode and filter in both interface design and empirical research. We conducted a preliminary user study to investigate whether and how data integration should be incorporated d…
▽ More
Data integration is often performed to consolidate information from multiple disparate data sources during visual data analysis. However, integration operations are usually separate from visual analytics operations such as encode and filter in both interface design and empirical research. We conducted a preliminary user study to investigate whether and how data integration should be incorporated directly into the visual analytics process. We used two interface alternatives featuring contrasting approaches to the data preparation and analysis workflow: manual file-based ex-situ integration as a separate step from visual analytics operations; and automatic UI-based in-situ integration merged with visual analytics operations. Participants were asked to complete specific and free-form tasks with each interface, browsing for patterns, generating insights, and summarizing relationships between attributes distributed across multiple files. Analyzing participants' interactions and feedback, we found both task completion time and total interactions to be similar across interfaces and tasks, as well as unique integration strategies between interfaces and emergent behaviors related to satisficing and cognitive bias. Participants' time spent and interactions revealed that in-situ integration enabled users to spend more time on analysis tasks compared with ex-situ integration. Participants' integration strategies and analytical behaviors revealed differences in interface usage for generating and tracking hypotheses and insights. With these results, we synthesized preliminary guidelines for designing future visual analytics interfaces that can support integrating attributes throughout an active analysis process.
△ Less
Submitted 7 March, 2024;
originally announced March 2024.
-
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
Authors:
Taicheng Guo,
Xiuying Chen,
Yaqi Wang,
Ruidi Chang,
Shichao Pei,
Nitesh V. Chawla,
Olaf Wiest,
Xiangliang Zhang
Abstract:
Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in com…
▽ More
Large Language Models (LLMs) have achieved remarkable success across a wide array of tasks. Due to the impressive planning and reasoning abilities of LLMs, they have been used as autonomous agents to do many tasks automatically. Recently, based on the development of using one LLM as a single planning or decision-making agent, LLM-based multi-agent systems have achieved considerable progress in complex problem-solving and world simulation. To provide the community with an overview of this dynamic field, we present this survey to offer an in-depth discussion on the essential aspects of multi-agent systems based on LLMs, as well as the challenges. Our goal is for readers to gain substantial insights on the following questions: What domains and environments do LLM-based multi-agents simulate? How are these agents profiled and how do they communicate? What mechanisms contribute to the growth of agents' capacities? For those interested in delving into this field of study, we also summarize the commonly used datasets or benchmarks for them to have convenient access. To keep researchers updated on the latest studies, we maintain an open-source GitHub repository, dedicated to outlining the research on LLM-based multi-agent systems.
△ Less
Submitted 18 April, 2024; v1 submitted 21 January, 2024;
originally announced February 2024.
-
Ambush from All Sides: Understanding Security Threats in Open-Source Software CI/CD Pipelines
Authors:
Ziyue Pan,
Wenbo Shen,
Xingkai Wang,
Yutian Yang,
Rui Chang,
Yao Liu,
Chengwei Liu,
Yang Liu,
Kui Ren
Abstract:
The continuous integration and continuous deployment (CI/CD) pipelines are widely adopted on Internet hosting platforms, such as GitHub. With the popularity, the CI/CD pipeline faces various security threats. However, current CI/CD pipelines suffer from malicious code and severe vulnerabilities. Even worse, people have not been fully aware of its attack surfaces and the corresponding impacts.
Th…
▽ More
The continuous integration and continuous deployment (CI/CD) pipelines are widely adopted on Internet hosting platforms, such as GitHub. With the popularity, the CI/CD pipeline faces various security threats. However, current CI/CD pipelines suffer from malicious code and severe vulnerabilities. Even worse, people have not been fully aware of its attack surfaces and the corresponding impacts.
Therefore, in this paper, we conduct a large-scale measurement and a systematic analysis to reveal the attack surfaces of the CI/CD pipeline and quantify their security impacts. Specifically, for the measurement, we collect a data set of 320,000+ CI/CD pipeline-configured GitHub repositories and build an analysis tool to parse the CI/CD pipelines and extract security-critical usages. Besides, current CI/CD ecosystem heavily relies on several core scripts, which may lead to a single point of failure. While the CI/CD pipelines contain sensitive information/operations, making them the attacker's favorite targets.
Inspired by the measurement findings, we abstract the threat model and the attack approach toward CI/CD pipelines, followed by a systematic analysis of attack surfaces, attack strategies, and the corresponding impacts. We further launch case studies on five attacks in real-world CI/CD environments to validate the revealed attack surfaces. Finally, we give suggestions on mitigating attacks on CI/CD scripts, including securing CI/CD configurations, securing CI/CD scripts, and improving CI/CD infrastructure.
△ Less
Submitted 31 January, 2024;
originally announced January 2024.
-
SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness
Authors:
Ruei-Che Chang,
Chia-Sheng Hung,
Bing-Yu Chen,
Dhruv Jain,
Anhong Guo
Abstract:
Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum…
▽ More
Mixed-reality (MR) soundscapes blend real-world sound with virtual audio from hearing devices, presenting intricate auditory information that is hard to discern and differentiate. This is particularly challenging for blind or visually impaired individuals, who rely on sounds and descriptions in their everyday lives. To understand how complex audio information is consumed, we analyzed online forum posts within the blind community, identifying prevailing challenges, needs, and desired solutions. We synthesized the results and propose SoundShift for increasing MR sound awareness, which includes six sound manipulations: Transparency Shift, Envelope Shift, Position Shift, Style Shift, Time Shift, and Sound Append. To evaluate the effectiveness of SoundShift, we conducted a user study with 18 blind participants across three simulated MR scenarios, where participants identified specific sounds within intricate soundscapes. We found that SoundShift increased MR sound awareness and minimized cognitive load. Finally, we developed three real-world example applications to demonstrate the practicality of SoundShift.
△ Less
Submitted 26 May, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
ModuleGuard:Understanding and Detecting Module Conflicts in Python Ecosystem
Authors:
Ruofan Zhu,
Xingyu Wang,
Chengwei Liu,
Zhengzi Xu,
Wenbo Shen,
Rui Chang,
Yang Liu
Abstract:
Python has become one of the most popular programming languages for software development due to its simplicity, readability, and versatility. As the Python ecosystem grows, developers face increasing challenges in avoiding module conflicts, which occur when different packages have the same namespace modules. Unfortunately, existing work has neither investigated the module conflict comprehensively…
▽ More
Python has become one of the most popular programming languages for software development due to its simplicity, readability, and versatility. As the Python ecosystem grows, developers face increasing challenges in avoiding module conflicts, which occur when different packages have the same namespace modules. Unfortunately, existing work has neither investigated the module conflict comprehensively nor provided tools to detect the conflict. Therefore, this paper systematically investigates the module conflict problem and its impact on the Python ecosystem. We propose a novel technique called InstSimulator, which leverages semantics and installation simulation to achieve accurate and efficient module extraction. Based on this, we implement a tool called ModuleGuard to detect module conflicts for the Python ecosystem. For the study, we first collect 97 MC issues, classify the characteristics and causes of these MC issues, summarize three different conflict patterns, and analyze their potential threats. Then, we conducted a large-scale analysis of the whole PyPI ecosystem (4.2 million packages) and GitHub popular projects (3,711 projects) to detect each MC pattern and analyze their potential impact. We discovered that module conflicts still impact numerous TPLs and GitHub projects. This is primarily due to developers' lack of understanding of the modules within their direct dependencies, not to mention the modules of the transitive dependencies. Our work reveals Python's shortcomings in handling naming conflicts and provides a tool and guidelines for developers to detect conflicts.
△ Less
Submitted 4 January, 2024;
originally announced January 2024.
-
Supervised Contrastive Learning for Fine-grained Chromosome Recognition
Authors:
Ruijia Chang,
Suncheng Xiang,
Chengyu Zhou,
Kui Su,
Dahong Qian,
Jun Wang
Abstract:
Chromosome recognition is an essential task in karyotyping, which plays a vital role in birth defect diagnosis and biomedical research. However, existing classification methods face significant challenges due to the inter-class similarity and intra-class variation of chromosomes. To address this issue, we propose a supervised contrastive learning strategy that is tailored to train model-agnostic d…
▽ More
Chromosome recognition is an essential task in karyotyping, which plays a vital role in birth defect diagnosis and biomedical research. However, existing classification methods face significant challenges due to the inter-class similarity and intra-class variation of chromosomes. To address this issue, we propose a supervised contrastive learning strategy that is tailored to train model-agnostic deep networks for reliable chromosome classification. This method enables extracting fine-grained chromosomal embeddings in latent space. These embeddings effectively expand inter-class boundaries and reduce intra-class variations, enhancing their distinctiveness in predicting chromosome types. On top of two large-scale chromosome datasets, we comprehensively validate the power of our contrastive learning strategy in boosting cutting-edge deep networks such as Transformers and ResNets. Extensive results demonstrate that it can significantly improve models' generalization performance, with an accuracy improvement up to +4.5%. Codes and pretrained models will be released upon acceptance of this work.
△ Less
Submitted 12 December, 2023;
originally announced December 2023.
-
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
Authors:
Karren D. Yang,
Anurag Ranjan,
Jen-Hao Rick Chang,
Raviteja Vemulapalli,
Oncel Tuzel
Abstract:
We consider the task of animating 3D facial geometry from speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signal to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D f…
▽ More
We consider the task of animating 3D facial geometry from speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signal to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D facial motions that accompany speech in the real world. Importantly, the relationship between speech and facial motion is one-to-many, containing both inter-speaker and intra-speaker variations and necessitating a probabilistic approach. In this paper, we identify and address key challenges that have so far limited the development of probabilistic models: lack of datasets and metrics that are suitable for training and evaluating them, as well as the difficulty of designing a model that generates diverse results while remaining faithful to a strong conditioning signal as speech. We first propose large-scale benchmark datasets and metrics suitable for probabilistic modeling. Then, we demonstrate a probabilistic model that achieves both diversity and fidelity to speech, outperforming other methods across the proposed benchmarks. Finally, we showcase useful applications of probabilistic models trained on these large-scale datasets: we can generate diverse speech-driven 3D facial motion that matches unseen speaker styles extracted from reference clips; and our synthetic meshes can be used to improve the performance of downstream audio-visual models.
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
HUGS: Human Gaussian Splats
Authors:
Muhammed Kocabas,
Jen-Hao Rick Chang,
James Gabriel,
Oncel Tuzel,
Anurag Ranjan
Abstract:
Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human togethe…
▽ More
Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human together with the scene using 3D Gaussian Splatting (3DGS). Our method takes only a monocular video with a small number of (50-100) frames, and it automatically learns to disentangle the static scene and a fully animatable human avatar within 30 minutes. We utilize the SMPL body model to initialize the human Gaussians. To capture details that are not modeled by SMPL (e.g. cloth, hairs), we allow the 3D Gaussians to deviate from the human body model. Utilizing 3D Gaussians for animated humans brings new challenges, including the artifacts created when articulating the Gaussians. We propose to jointly optimize the linear blend skinning weights to coordinate the movements of individual Gaussians during animation. Our approach enables novel-pose synthesis of human and novel view synthesis of both the human and the scene. We achieve state-of-the-art rendering quality with a rendering speed of 60 FPS while being ~100x faster to train over previous work. Our code will be announced here: https://github.com/apple/ml-hugs
△ Less
Submitted 29 November, 2023;
originally announced November 2023.
-
Tracking a Year of Polarized Twitter Discourse on Abortion
Authors:
Ashwin Rao,
Rong-Ching Chang,
Qiankun Zhong,
Kristina Lerman,
Magdalena Wojcieszak
Abstract:
Abortion is one of the most contentious issues in American politics. The Dobbs v. Jackson Women's Health Organization ruling in 2022, which shifted the authority to regulate abortion from the federal government to the states, triggering intense protests and emotional debates across the nation. Yet, little is known about how online discourse about abortion rights fluctuated on social media platform…
▽ More
Abortion is one of the most contentious issues in American politics. The Dobbs v. Jackson Women's Health Organization ruling in 2022, which shifted the authority to regulate abortion from the federal government to the states, triggering intense protests and emotional debates across the nation. Yet, little is known about how online discourse about abortion rights fluctuated on social media platforms. This study analyzes a corpus of over 57M abortion-related tweets from January 2022 to January 2023 to show how emotions, hateful rhetoric, toxic speech, use of obscenities and insults, and also framing strategies fluctuated over the span of one year among liberal and conservative users. We offer three key findings. (1) Fluctuations in emotions were temporary; key events during the analyzed period did not bring about lasting shifts in expressed emotions. (2) We observe significant ideological differences in the use of hate speech: conservatives resorted to hateful rhetoric more than liberals. Yet, liberals were especially likely to use obscenities and insults, especially on the days the ruling was leaked and after the Dobbs decision. In turn, toxic language sharply increased among both groups following the leak and after the SCOTUS ruling. (3) Conservatives employ religious and fetal personhood frames, while liberals emphasize women's health and bodily autonomy, with each group reacting negatively to the other group's frames. Our results offer an in-depth insight into the dynamics of online discourse on one of the most contentious issues in contemporary America.
△ Less
Submitted 28 November, 2023;
originally announced November 2023.
-
Demystifying Compiler Unstable Feature Usage and Impacts in the Rust Ecosystem
Authors:
Chenghao Li,
Yifei Wu,
Wenbo Shen,
Zichen Zhao,
Rui Chang,
Chengwei Liu,
Yang Liu,
Kui Ren
Abstract:
Rust programming language is gaining popularity rapidly in building reliable and secure systems due to its security guarantees and outstanding performance. To provide extra functionalities, the Rust compiler introduces Rust unstable features (RUF) to extend compiler functionality, syntax, and standard library support. However, these features are unstable and may get removed, introducing compilatio…
▽ More
Rust programming language is gaining popularity rapidly in building reliable and secure systems due to its security guarantees and outstanding performance. To provide extra functionalities, the Rust compiler introduces Rust unstable features (RUF) to extend compiler functionality, syntax, and standard library support. However, these features are unstable and may get removed, introducing compilation failures to dependent packages. Even worse, their impacts propagate through transitive dependencies, causing large-scale failures in the whole ecosystem. Although RUF is widely used in Rust, previous research has primarily concentrated on Rust code safety, with the usage and impacts of RUF from the Rust compiler remaining unexplored. Therefore, we aim to bridge this gap by systematically analyzing the RUF usage and impacts in the Rust ecosystem. We propose novel techniques for extracting RUF precisely, and to assess its impact on the entire ecosystem quantitatively, we accurately resolve package dependencies. We have analyzed the whole Rust ecosystem with 590K package versions and 140M transitive dependencies. Our study shows that the Rust ecosystem uses 1000 different RUF, and at most 44% of package versions are affected by RUF, causing compiling failures for at most 12%. To mitigate wide RUF impacts, we further design and implement a RUF-compilation-failure recovery tool that can recover up to 90% of the failure. We believe our techniques, findings, and tools can help to stabilize the Rust compiler, ultimately enhancing the security and reliability of the Rust ecosystem.
△ Less
Submitted 26 October, 2023;
originally announced October 2023.
-
Novel-View Acoustic Synthesis from 3D Reconstructed Rooms
Authors:
Byeongjoo Ahn,
Karren Yang,
Brian Hamilton,
Jonathan Sheaffer,
Anurag Ranjan,
Miguel Sarabia,
Oncel Tuzel,
Jen-Hao Rick Chang
Abstract:
We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separ…
▽ More
We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separation, and dereverberation. While naively training an end-to-end network fails to produce high-quality results, we show that incorporating room impulse responses (RIRs) derived from 3D reconstructed rooms enables the same network to jointly tackle these tasks. Our method outperforms existing methods designed for the individual tasks, demonstrating its effectiveness at utilizing 3D visual information. In a simulated study on the Matterport3D-NVAS dataset, our model achieves near-perfect accuracy on source localization, a PSNR of 26.44 dB and a SDR of 14.23 dB for source separation and dereverberation, resulting in a PSNR of 25.55 dB and a SDR of 14.20 dB on novel-view acoustic synthesis. Code, pretrained model, and video results are available on the project webpage (https://github.com/apple/ml-nvas3d).
△ Less
Submitted 23 October, 2023;
originally announced October 2023.
-
RekomGNN: Visualizing, Contextualizing and Evaluating Graph Neural Networks Recommendations
Authors:
Camelia D. Brumar,
Gabriel Appleby,
Jen Rogers,
Teddy Matinde,
Lara Thompson,
Remco Chang,
Anamaria Crisan
Abstract:
Content recommendation tasks increasingly use Graph Neural Networks, but it remains challenging for machine learning experts to assess the quality of their outputs. Visualization systems for GNNs that could support this interrogation are few. Moreover, those that do exist focus primarily on exposing GNN architectures for tuning and prediction tasks and do not address the challenges of recommendati…
▽ More
Content recommendation tasks increasingly use Graph Neural Networks, but it remains challenging for machine learning experts to assess the quality of their outputs. Visualization systems for GNNs that could support this interrogation are few. Moreover, those that do exist focus primarily on exposing GNN architectures for tuning and prediction tasks and do not address the challenges of recommendation tasks. We developed RekomGNN, a visual analytics system that supports ML experts in exploring GNN recommendations across several dimensions and making annotations about their quality. RekomGNN straddles the design space between Neural Network and recommender system visualization to arrive at a set of encoding and interaction choices for recommendation tasks. We found that RekomGNN helps experts make qualitative assessments of the GNN's results, which they can use for model refinement. Overall, our contributions and findings add to the growing understanding of visualizing GNNs for increasingly complex tasks.
△ Less
Submitted 17 October, 2023;
originally announced October 2023.
-
Efficient-3DiM: Learning a Generalizable Single-image Novel-view Synthesizer in One Day
Authors:
Yifan Jiang,
Hao Tang,
Jen-Hao Rick Chang,
Liangchen Song,
Zhangyang Wang,
Liangliang Cao
Abstract:
The task of novel view synthesis aims to generate unseen perspectives of an object or scene from a limited set of input images. Nevertheless, synthesizing novel views from a single image still remains a significant challenge in the realm of computer vision. Previous approaches tackle this problem by adopting mesh prediction, multi-plain image construction, or more advanced techniques such as neura…
▽ More
The task of novel view synthesis aims to generate unseen perspectives of an object or scene from a limited set of input images. Nevertheless, synthesizing novel views from a single image still remains a significant challenge in the realm of computer vision. Previous approaches tackle this problem by adopting mesh prediction, multi-plain image construction, or more advanced techniques such as neural radiance fields. Recently, a pre-trained diffusion model that is specifically designed for 2D image synthesis has demonstrated its capability in producing photorealistic novel views, if sufficiently optimized on a 3D finetuning task. Although the fidelity and generalizability are greatly improved, training such a powerful diffusion model requires a vast volume of training data and model parameters, resulting in a notoriously long time and high computational costs. To tackle this issue, we propose Efficient-3DiM, a simple but effective framework to learn a single-image novel-view synthesizer. Motivated by our in-depth analysis of the inference process of diffusion models, we propose several pragmatic strategies to reduce the training overhead to a manageable scale, including a crafted timestep sampling strategy, a superior 3D feature extractor, and an enhanced training scheme. When combined, our framework is able to reduce the total training time from 10 days to less than 1 day, significantly accelerating the training process under the same computational platform (one instance with 8 Nvidia A100 GPUs). Comprehensive experiments are conducted to demonstrate the efficiency and generalizability of our proposed method.
△ Less
Submitted 4 October, 2023;
originally announced October 2023.
-
Model-enhanced Vector Index
Authors:
Hailin Zhang,
Yujing Wang,
Qi Chen,
Ruiheng Chang,
Ting Zhang,
Ziming Miao,
Yingyan Hou,
Yang Ding,
Xupeng Miao,
Haonan Wang,
Bochen Pang,
Yuefeng Zhan,
Hao Sun,
Weiwei Deng,
Qi Zhang,
Fan Yang,
Xing Xie,
Mao Yang,
Bin Cui
Abstract:
Embedding-based retrieval methods construct vector indices to search for document representations that are most similar to the query representations. They are widely used in document retrieval due to low latency and decent recall performance. Recent research indicates that deep retrieval solutions offer better model quality, but are hindered by unacceptable serving latency and the inability to sup…
▽ More
Embedding-based retrieval methods construct vector indices to search for document representations that are most similar to the query representations. They are widely used in document retrieval due to low latency and decent recall performance. Recent research indicates that deep retrieval solutions offer better model quality, but are hindered by unacceptable serving latency and the inability to support document updates. In this paper, we aim to enhance the vector index with end-to-end deep generative models, leveraging the differentiable advantages of deep retrieval models while maintaining desirable serving efficiency. We propose Model-enhanced Vector Index (MEVI), a differentiable model-enhanced index empowered by a twin-tower representation model. MEVI leverages a Residual Quantization (RQ) codebook to bridge the sequence-to-sequence deep retrieval and embedding-based models. To substantially reduce the inference time, instead of decoding the unique document ids in long sequential steps, we first generate some semantic virtual cluster ids of candidate documents in a small number of steps, and then leverage the well-adapted embedding vectors to further perform a fine-grained search for the relevant documents in the candidate virtual clusters. We empirically show that our model achieves better performance on the commonly used academic benchmarks MSMARCO Passage and Natural Questions, with comparable serving latency to dense retrieval solutions.
△ Less
Submitted 9 November, 2023; v1 submitted 23 September, 2023;
originally announced September 2023.
-
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models
Authors:
Hsuan Su,
Ting-Yao Hu,
Hema Swetha Koppula,
Raviteja Vemulapalli,
Jen-Hao Rick Chang,
Karren Yang,
Gautam Varma Mantena,
Oncel Tuzel
Abstract:
While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from…
▽ More
While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. To accomplish this, we propose a novel data synthesis pipeline that uses a Large Language Model (LLM) to generate a target domain text corpus, and a state-of-the-art controllable speech synthesis model to generate the corresponding speech. We propose a simple yet effective in-context instruction finetuning strategy to increase the effectiveness of LLM in generating text corpora for new domains. Experiments on the SLURP dataset show that the proposed method achieves an average relative word error rate improvement of $28\%$ on unseen target domains without any performance drop in source domains.
△ Less
Submitted 18 September, 2023;
originally announced September 2023.
-
A Closer Look at the Security Risks in the Rust Ecosystem
Authors:
Xiaoye Zheng,
Zhiyuan Wan,
Yun Zhang,
Rui Chang,
David Lo
Abstract:
Rust is an emerging programming language designed for the development of systems software. To facilitate the reuse of Rust code, crates.io, as a central package registry of the Rust ecosystem, hosts thousands of third-party Rust packages. The openness of crates.io enables the growth of the Rust ecosystem but comes with security risks by severe security advisories. Although Rust guarantees a softwa…
▽ More
Rust is an emerging programming language designed for the development of systems software. To facilitate the reuse of Rust code, crates.io, as a central package registry of the Rust ecosystem, hosts thousands of third-party Rust packages. The openness of crates.io enables the growth of the Rust ecosystem but comes with security risks by severe security advisories. Although Rust guarantees a software program to be safe via programming language features and strict compile-time checking, the unsafe keyword in Rust allows developers to bypass compiler safety checks for certain regions of code. Prior studies empirically investigate the memory safety and concurrency bugs in the Rust ecosystem, as well as the usage of unsafe keywords in practice. Nonetheless, the literature lacks a systematic investigation of the security risks in the Rust ecosystem.
In this paper, we perform a comprehensive investigation into the security risks present in the Rust ecosystem, asking ``what are the characteristics of the vulnerabilities, what are the characteristics of the vulnerable packages, and how are the vulnerabilities fixed in practice?''. To facilitate the study, we first compile a dataset of 433 vulnerabilities, 300 vulnerable code repositories, and 218 vulnerability fix commits in the Rust ecosystem, spanning over 7 years. With the dataset, we characterize the types, life spans, and evolution of the disclosed vulnerabilities. We then characterize the popularity, categorization, and vulnerability density of the vulnerable Rust packages, as well as their versions and code regions affected by the disclosed vulnerabilities. Finally, we characterize the complexity of vulnerability fixes and localities of corresponding code changes, and inspect how practitioners fix vulnerabilities in Rust packages with various localities.
△ Less
Submitted 29 August, 2023;
originally announced August 2023.
-
Extraction of Text from Optic Nerve Optical Coherence Tomography Reports
Authors:
Iyad Majid,
Youchen Victor Zhang,
Robert Chang,
Sophia Y. Wang
Abstract:
Purpose: The purpose of this study was to develop and evaluate rule-based algorithms to enhance the extraction of text data, including retinal nerve fiber layer (RNFL) values and other ganglion cell count (GCC) data, from Zeiss Cirrus optical coherence tomography (OCT) scan reports. Methods: DICOM files that contained encapsulated PDF reports with RNFL or Ganglion Cell in their document titles wer…
▽ More
Purpose: The purpose of this study was to develop and evaluate rule-based algorithms to enhance the extraction of text data, including retinal nerve fiber layer (RNFL) values and other ganglion cell count (GCC) data, from Zeiss Cirrus optical coherence tomography (OCT) scan reports. Methods: DICOM files that contained encapsulated PDF reports with RNFL or Ganglion Cell in their document titles were identified from a clinical imaging repository at a single academic ophthalmic center. PDF reports were then converted into image files and processed using the PaddleOCR Python package for optical character recognition. Rule-based algorithms were designed and iteratively optimized for improved performance in extracting RNFL and GCC data. Evaluation of the algorithms was conducted through manual review of a set of RNFL and GCC reports. Results: The developed algorithms demonstrated high precision in extracting data from both RNFL and GCC scans. Precision was slightly better for the right eye in RNFL extraction (OD: 0.9803 vs. OS: 0.9046), and for the left eye in GCC extraction (OD: 0.9567 vs. OS: 0.9677). Some values presented more challenges in extraction, particularly clock hours 5 and 6 for RNFL thickness, and signal strength for GCC. Conclusions: A customized optical character recognition algorithm can identify numeric results from optical coherence scan reports with high precision. Automated processing of PDF reports can greatly reduce the time to extract OCT results on a large scale.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
Causal Disentanglement Hidden Markov Model for Fault Diagnosis
Authors:
Rihao Chang,
Yongtao Ma,
Weizhi Nie,
Jie Nie,
An-an Liu
Abstract:
In modern industries, fault diagnosis has been widely applied with the goal of realizing predictive maintenance. The key issue for the fault diagnosis system is to extract representative characteristics of the fault signal and then accurately predict the fault type. In this paper, we propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism a…
▽ More
In modern industries, fault diagnosis has been widely applied with the goal of realizing predictive maintenance. The key issue for the fault diagnosis system is to extract representative characteristics of the fault signal and then accurately predict the fault type. In this paper, we propose a Causal Disentanglement Hidden Markov model (CDHM) to learn the causality in the bearing fault mechanism and thus, capture their characteristics to achieve a more robust representation. Specifically, we make full use of the time-series data and progressively disentangle the vibration signal into fault-relevant and fault-irrelevant factors. The ELBO is reformulated to optimize the learning of the causal disentanglement Markov model. Moreover, to expand the scope of the application, we adopt unsupervised domain adaptation to transfer the learned disentangled representations to other working environments. Experiments were conducted on the CWRU dataset and IMS dataset. Relevant results validate the superiority of the proposed method.
△ Less
Submitted 6 August, 2023;
originally announced August 2023.
-
Visual Validation versus Visual Estimation: A Study on the Average Value in Scatterplots
Authors:
Daniel Braun,
Ashley Suh,
Remco Chang,
Michael Gleicher,
Tatiana von Landesberger
Abstract:
We investigate the ability of individuals to visually validate statistical models in terms of their fit to the data. While visual model estimation has been studied extensively, visual model validation remains under-investigated. It is unknown how well people are able to visually validate models, and how their performance compares to visual and computational estimation. As a starting point, we cond…
▽ More
We investigate the ability of individuals to visually validate statistical models in terms of their fit to the data. While visual model estimation has been studied extensively, visual model validation remains under-investigated. It is unknown how well people are able to visually validate models, and how their performance compares to visual and computational estimation. As a starting point, we conducted a study across two populations (crowdsourced and volunteers). Participants had to both visually estimate (i.e, draw) and visually validate (i.e., accept or reject) the frequently studied model of averages. Across both populations, the level of accuracy of the models that were considered valid was lower than the accuracy of the estimated models. We find that participants' validation and estimation were unbiased. Moreover, their natural critical point between accepting and rejecting a given mean value is close to the boundary of its 95% confidence interval, indicating that the visually perceived confidence interval corresponds to a common statistical standard. Our work contributes to the understanding of visual model validation and opens new research opportunities.
△ Less
Submitted 2 January, 2024; v1 submitted 18 July, 2023;
originally announced July 2023.
-
Sandpile Prediction on Undirected Graphs
Authors:
Ruinian Chang,
Jingbang Chen,
Ian Munro,
Richard Peng,
Qingyu Shi,
Zeyu Zheng
Abstract:
The $\textit{Abelian Sandpile}$ model is a well-known model used in exploring $\textit{self-organized criticality}$. Despite a large amount of work on other aspects of sandpiles, there have been limited results in efficiently computing the terminal state, known as the $\textit{sandpile prediction}$ problem.
On graphs with special structures, we present algorithms that compute the terminal config…
▽ More
The $\textit{Abelian Sandpile}$ model is a well-known model used in exploring $\textit{self-organized criticality}$. Despite a large amount of work on other aspects of sandpiles, there have been limited results in efficiently computing the terminal state, known as the $\textit{sandpile prediction}$ problem.
On graphs with special structures, we present algorithms that compute the terminal configurations for sandpile instances in $O(n \log n)$ time on trees and $O(n)$ time on paths, where $n$ is the number of vertices. Our algorithms improve the previous best runtime of $O(n \log^5 n)$ on trees [Ramachandran-Schild SODA '17] and $O(n \log n)$ on paths [Moore-Nilsson '99]. To do so, we move beyond the simulation of individual events by directly computing the number of firings for each vertex. The computation is accelerated using splittable binary search trees. In addition, we give algorithms in $O(n)$ time on cliques and $O(n \log^2 n)$ time on pseudotrees.
On general graphs, we propose a fast algorithm under the setting where the number of chips $N$ could be arbitrarily large. We obtain a $\log N$ dependency, improving over the $\mathtt{poly}(N)$ dependency in purely simulation-based algorithms. Our algorithm also achieves faster performance on various types of graphs, including regular graphs, expander graphs, and hypercubes. We also provide a reduction that enables us to decompose the input sandpile into several smaller instances and solve them separately.
△ Less
Submitted 5 April, 2024; v1 submitted 15 July, 2023;
originally announced July 2023.
-
Pointersect: Neural Rendering with Cloud-Ray Intersection
Authors:
Jen-Hao Rick Chang,
Wei-Yu Chen,
Anurag Ranjan,
Kwang Moo Yi,
Oncel Tuzel
Abstract:
We propose a novel method that renders point clouds as if they are surfaces. The proposed method is differentiable and requires no scene-specific optimization. This unique capability enables, out-of-the-box, surface normal estimation, rendering room-scale point clouds, inverse rendering, and ray tracing with global illumination. Unlike existing work that focuses on converting point clouds to other…
▽ More
We propose a novel method that renders point clouds as if they are surfaces. The proposed method is differentiable and requires no scene-specific optimization. This unique capability enables, out-of-the-box, surface normal estimation, rendering room-scale point clouds, inverse rendering, and ray tracing with global illumination. Unlike existing work that focuses on converting point clouds to other representations--e.g., surfaces or implicit functions--our key idea is to directly infer the intersection of a light ray with the underlying surface represented by the given point cloud. Specifically, we train a set transformer that, given a small number of local neighbor points along a light ray, provides the intersection point, the surface normal, and the material blending weights, which are used to render the outcome of this light ray. Localizing the problem into small neighborhoods enables us to train a model with only 48 meshes and apply it to unseen point clouds. Our model achieves higher estimation accuracy than state-of-the-art surface reconstruction and point-cloud rendering methods on three test sets. When applied to room-scale point clouds, without any scene-specific optimization, the model achieves competitive quality with the state-of-the-art novel-view rendering methods. Moreover, we demonstrate ability to render and manipulate Lidar-scanned point clouds such as lighting control and object insertion.
△ Less
Submitted 24 April, 2023;
originally announced April 2023.
-
Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities
Authors:
Harry Li,
Gabriel Appleby,
Camelia Daniela Brumar,
Remco Chang,
Ashley Suh
Abstract:
This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas…
▽ More
This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each of whom have their own distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.
△ Less
Submitted 18 June, 2024; v1 submitted 3 April, 2023;
originally announced April 2023.
-
FaceLit: Neural 3D Relightable Faces
Authors:
Anurag Ranjan,
Kwang Moo Yi,
Jen-Hao Rick Chang,
Oncel Tuzel
Abstract:
We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered at various user-defined lighting conditions and views, learned purely from 2D images in-the-wild without any manual annotation. Unlike existing works that require careful capture setup or human labor, we rely on off-the-shelf pose and illumination estimators. With these estimates, we incorporate the Ph…
▽ More
We propose a generative framework, FaceLit, capable of generating a 3D face that can be rendered at various user-defined lighting conditions and views, learned purely from 2D images in-the-wild without any manual annotation. Unlike existing works that require careful capture setup or human labor, we rely on off-the-shelf pose and illumination estimators. With these estimates, we incorporate the Phong reflectance model in the neural volume rendering framework. Our model learns to generate shape and material properties of a face such that, when rendered according to the natural statistics of pose and illumination, produces photorealistic face images with multiview 3D and illumination consistency. Our method enables photorealistic generation of faces with explicit illumination and view controls on multiple datasets - FFHQ, MetFaces and CelebA-HQ. We show state-of-the-art photorealism among 3D aware GANs on FFHQ dataset achieving an FID score of 3.5.
△ Less
Submitted 27 March, 2023;
originally announced March 2023.
-
Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis
Authors:
Karren Yang,
Ting-Yao Hu,
Jen-Hao Rick Chang,
Hema Swetha Koppula,
Oncel Tuzel
Abstract:
Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To…
▽ More
Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To address the first question, we adapt a state-of-the-art automatic speech recognition (ASR) model to target speakers from four benchmark datasets representative of different speaker types. We show that ASR personalization with synthetic data is effective in all cases, but particularly when (i) the target speaker is underrepresented in the global data, and (ii) the capacity of the global model is limited. To address the second question of why personalized synthetic data is effective, we use controllable speech synthesis to generate speech with varied styles and content. Surprisingly, we find that the text content of the synthetic data, rather than style, is important for speaker adaptation. These results lead us to propose a data selection strategy for ASR personalization based on speech content.
△ Less
Submitted 26 March, 2023;
originally announced March 2023.
-
Planning Goals for Exploration
Authors:
Edward S. Hu,
Richard Chang,
Oleh Rybkin,
Dinesh Jayaraman
Abstract:
Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how the agent should set its goals at training time to maximize exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets goals for…
▽ More
Dropped into an unknown environment, what should an agent do to quickly learn about the environment and how to accomplish diverse tasks within it? We address this question within the goal-conditioned reinforcement learning paradigm, by identifying how the agent should set its goals at training time to maximize exploration. We propose "Planning Exploratory Goals" (PEG), a method that sets goals for each training episode to directly optimize an intrinsic exploration reward. PEG first chooses goal commands such that the agent's goal-conditioned policy, at its current level of training, will end up in states with high exploration potential. It then launches an exploration policy starting at those promising states. To enable this direct optimization, PEG learns world models and adapts sampling-based planning algorithms to "plan goal commands". In challenging simulated robotics environments including a multi-legged ant robot in a maze, and a robot arm on a cluttered tabletop, PEG exploration enables more efficient and effective training of goal-conditioned policies relative to baselines and ablations. Our ant successfully navigates a long maze, and the robot arm successfully builds a stack of three blocks upon command. Website: https://penn-pal-lab.github.io/peg/
△ Less
Submitted 22 March, 2023;
originally announced March 2023.
-
A Problem Space for Designing Visualizations
Authors:
Michael Gleicher,
Maria Riveiro,
Tatiana von Landesberger,
Oliver Deussen,
Remco Chang,
Christina Gillman
Abstract:
Visualization researchers and visualization professionals seek appropriate abstractions of visualization requirements that permit considering visualization solutions independently from specific problems. Abstractions can help us design, analyze, organize, and evaluate the things we create. The literature has many task structures (taxonomies, typologies, etc.), design spaces, and related ``framewor…
▽ More
Visualization researchers and visualization professionals seek appropriate abstractions of visualization requirements that permit considering visualization solutions independently from specific problems. Abstractions can help us design, analyze, organize, and evaluate the things we create. The literature has many task structures (taxonomies, typologies, etc.), design spaces, and related ``frameworks'' that provide abstractions of the problems a visualization is meant to address. In this viewpoint, we introduce a different one, a problem space that complements existing frameworks by focusing on the needs that a visualization is meant to solve. We believe it provides a valuable conceptual tool for designing and discussing visualizations.
△ Less
Submitted 14 March, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Multi-site, Multi-domain Airway Tree Modeling (ATM'22): A Public Benchmark for Pulmonary Airway Segmentation
Authors:
Minghui Zhang,
Yangqian Wu,
Hanxiao Zhang,
Yulei Qin,
Hao Zheng,
Wen Tang,
Corey Arnold,
Chenhao Pei,
Pengxin Yu,
Yang Nan,
Guang Yang,
Simon Walsh,
Dominic C. Marshall,
Matthieu Komorowski,
Puyang Wang,
Dazhou Guo,
Dakai Jin,
Ya'nan Wu,
Shuiqing Zhao,
Runsheng Chang,
Boyu Zhang,
Xing Lv,
Abdul Qayyum,
Moona Mazher,
Qi Su
, et al. (11 additional authors not shown)
Abstract:
Open international challenges are becoming the de facto standard for assessing computer vision and image analysis algorithms. In recent years, new methods have extended the reach of pulmonary airway segmentation that is closer to the limit of image resolution. Since EXACT'09 pulmonary airway segmentation, limited effort has been directed to quantitative comparison of newly emerged algorithms drive…
▽ More
Open international challenges are becoming the de facto standard for assessing computer vision and image analysis algorithms. In recent years, new methods have extended the reach of pulmonary airway segmentation that is closer to the limit of image resolution. Since EXACT'09 pulmonary airway segmentation, limited effort has been directed to quantitative comparison of newly emerged algorithms driven by the maturity of deep learning based approaches and clinical drive for resolving finer details of distal airways for early intervention of pulmonary diseases. Thus far, public annotated datasets are extremely limited, hindering the development of data-driven methods and detailed performance evaluation of new algorithms. To provide a benchmark for the medical imaging community, we organized the Multi-site, Multi-domain Airway Tree Modeling (ATM'22), which was held as an official challenge event during the MICCAI 2022 conference. ATM'22 provides large-scale CT scans with detailed pulmonary airway annotation, including 500 CT scans (300 for training, 50 for validation, and 150 for testing). The dataset was collected from different sites and it further included a portion of noisy COVID-19 CTs with ground-glass opacity and consolidation. Twenty-three teams participated in the entire phase of the challenge and the algorithms for the top ten teams are reviewed in this paper. Quantitative and qualitative results revealed that deep learning models embedded with the topological continuity enhancement achieved superior performance in general. ATM'22 challenge holds as an open-call design, the training data and the gold standard evaluation are available upon successful registration via its homepage.
△ Less
Submitted 27 June, 2023; v1 submitted 10 March, 2023;
originally announced March 2023.
-
#RoeOverturned: Twitter Dataset on the Abortion Rights Controversy
Authors:
Rong-Ching Chang,
Ashwin Rao,
Qiankun Zhong,
Magdalena Wojcieszak,
Kristina Lerman
Abstract:
On June 24, 2022, the United States Supreme Court overturned landmark rulings made in its 1973 verdict in Roe v. Wade. The justices by way of a majority vote in Dobbs v. Jackson Women's Health Organization, decided that abortion wasn't a constitutional right and returned the issue of abortion to the elected representatives. This decision triggered multiple protests and debates across the US, espec…
▽ More
On June 24, 2022, the United States Supreme Court overturned landmark rulings made in its 1973 verdict in Roe v. Wade. The justices by way of a majority vote in Dobbs v. Jackson Women's Health Organization, decided that abortion wasn't a constitutional right and returned the issue of abortion to the elected representatives. This decision triggered multiple protests and debates across the US, especially in the context of the midterm elections in November 2022. Given that many citizens use social media platforms to express their views and mobilize for collective action, and given that online debate provides tangible effects on public opinion, political participation, news media coverage, and the political decision-making, it is crucial to understand online discussions surrounding this topic. Toward this end, we present the first large-scale Twitter dataset collected on the abortion rights debate in the United States. We present a set of 74M tweets systematically collected over the course of one year from January 1, 2022 to January 6, 2023.
△ Less
Submitted 2 February, 2023;
originally announced February 2023.
-
Two-stage Contextual Transformer-based Convolutional Neural Network for Airway Extraction from CT Images
Authors:
Yanan Wu,
Shuiqing Zhao,
Shouliang Qi,
Jie Feng,
Haowen Pang,
Runsheng Chang,
Long Bai,
Mengqi Li,
Shuyue Xia,
Wei Qian,
Hongliang Ren
Abstract:
Accurate airway extraction from computed tomography (CT) images is a critical step for planning navigation bronchoscopy and quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). The existing methods are challenging to sufficiently segment the airway, especially the high-generation airway, with the constraint of the limited label and cannot meet the clinical use in…
▽ More
Accurate airway extraction from computed tomography (CT) images is a critical step for planning navigation bronchoscopy and quantitative assessment of airway-related chronic obstructive pulmonary disease (COPD). The existing methods are challenging to sufficiently segment the airway, especially the high-generation airway, with the constraint of the limited label and cannot meet the clinical use in COPD. We propose a novel two-stage 3D contextual transformer-based U-Net for airway segmentation using CT images. The method consists of two stages, performing initial and refined airway segmentation. The two-stage model shares the same subnetwork with different airway masks as input. Contextual transformer block is performed both in the encoder and decoder path of the subnetwork to finish high-quality airway segmentation effectively. In the first stage, the total airway mask and CT images are provided to the subnetwork, and the intrapulmonary airway mask and corresponding CT scans to the subnetwork in the second stage. Then the predictions of the two-stage method are merged as the final prediction. Extensive experiments were performed on in-house and multiple public datasets. Quantitative and qualitative analysis demonstrate that our proposed method extracted much more branches and lengths of the tree while accomplishing state-of-the-art airway segmentation performance. The code is available at https://github.com/zhaozsq/airway_segmentation.
△ Less
Submitted 15 December, 2022;
originally announced December 2022.
-
DA-CIL: Towards Domain Adaptive Class-Incremental 3D Object Detection
Authors:
Ziyuan Zhao,
Mingxi Xu,
Peisheng Qian,
Ramanpreet Singh Pahwa,
Richard Chang
Abstract:
Deep learning has achieved notable success in 3D object detection with the advent of large-scale point cloud datasets. However, severe performance degradation in the past trained classes, i.e., catastrophic forgetting, still remains a critical issue for real-world deployment when the number of classes is unknown or may vary. Moreover, existing 3D class-incremental detection methods are developed f…
▽ More
Deep learning has achieved notable success in 3D object detection with the advent of large-scale point cloud datasets. However, severe performance degradation in the past trained classes, i.e., catastrophic forgetting, still remains a critical issue for real-world deployment when the number of classes is unknown or may vary. Moreover, existing 3D class-incremental detection methods are developed for the single-domain scenario, which fail when encountering domain shift caused by different datasets, varying environments, etc. In this paper, we identify the unexplored yet valuable scenario, i.e., class-incremental learning under domain shift, and propose a novel 3D domain adaptive class-incremental object detection framework, DA-CIL, in which we design a novel dual-domain copy-paste augmentation method to construct multiple augmented domains for diversifying training distributions, thereby facilitating gradual domain adaptation. Then, multi-level consistency is explored to facilitate dual-teacher knowledge distillation from different domains for domain adaptive class-incremental learning. Extensive experiments on various datasets demonstrate the effectiveness of the proposed method over baselines in the domain adaptive class-incremental learning scenario.
△ Less
Submitted 5 December, 2022;
originally announced December 2022.
-
Anger Breeds Controversy: Analyzing Controversy and Emotions on Reddit
Authors:
Kai Chen,
Zihao He,
Rong-Ching Chang,
Jonathan May,
Kristina Lerman
Abstract:
Emotions play an important role in interpersonal interactions and social conflict, yet their function in the development of controversy and disagreement in online conversations has not been explored. To address this gap, we study controversy on Reddit, a popular network of online discussion forums. We collect discussions from a wide variety of topical forums and use emotion detection to recognize…
▽ More
Emotions play an important role in interpersonal interactions and social conflict, yet their function in the development of controversy and disagreement in online conversations has not been explored. To address this gap, we study controversy on Reddit, a popular network of online discussion forums. We collect discussions from a wide variety of topical forums and use emotion detection to recognize a range of emotions from text, including anger, fear, joy, admiration, etc. Our study has three main findings. First, controversial comments express more anger and less admiration, joy and optimism than non-controversial comments. Second, controversial comments affect emotions of downstream comments in a discussion, usually resulting in long-term increase in anger and a decrease in positive emotions, although the magnitude and direction of emotional change depends on the forum. Finally, we show that emotions help better predict which comments will become controversial. Understanding emotional dynamics of online discussions can help communities to better manage conversations.
△ Less
Submitted 1 December, 2022;
originally announced December 2022.
-
Analysis Without Data: Teaching Students to Tackle the VAST Challenge
Authors:
Edward W He,
Daniel Tolessa,
Ashley Suh,
Remco Chang
Abstract:
The VAST Challenges have been shown to be an effective tool in visual analytics education, encouraging student learning while enforcing good visualization design and development practices. However, research has observed that students often struggle at identifying a good "starting point" when tackling the VAST Challenge. Consequently, students who could not identify a good starting point failed at…
▽ More
The VAST Challenges have been shown to be an effective tool in visual analytics education, encouraging student learning while enforcing good visualization design and development practices. However, research has observed that students often struggle at identifying a good "starting point" when tackling the VAST Challenge. Consequently, students who could not identify a good starting point failed at finding the correct solution to the challenge. In this paper, we propose a preliminary guideline for helping students approach the VAST Challenge and identify initial analysis directions. We recruited two students to analyze the VAST 2017 Challenge using a hypothesis-driven approach, where they were required to pre-register their hypotheses prior to inspecting and analyzing the full dataset. From their experience, we developed a prescriptive guideline for other students to tackle VAST Challenges. In a preliminary study, we found that the students were able to use the guideline to generate well-formed hypotheses that could lead them towards solving the challenge. Additionally, the students reported that with the guideline, they felt like they had concrete steps that they could follow, thereby alleviating the burden of identifying a good starting point in their analysis process.
△ Less
Submitted 1 November, 2022;
originally announced November 2022.
-
3D Shape Knowledge Graph for Cross-domain 3D Shape Retrieval
Authors:
Rihao Chang,
Yongtao Ma,
Tong Hao,
Weizhi Nie
Abstract:
The surge in 3D modeling has led to a pronounced research emphasis on the field of 3D shape retrieval. Numerous contemporary approaches have been put forth to tackle this intricate challenge. Nevertheless, effectively addressing the intricacies of cross-modal 3D shape retrieval remains a formidable undertaking, owing to inherent modality-based disparities. This study presents an innovative notion,…
▽ More
The surge in 3D modeling has led to a pronounced research emphasis on the field of 3D shape retrieval. Numerous contemporary approaches have been put forth to tackle this intricate challenge. Nevertheless, effectively addressing the intricacies of cross-modal 3D shape retrieval remains a formidable undertaking, owing to inherent modality-based disparities. This study presents an innovative notion, termed "geometric words", which functions as elemental constituents for representing entities through combinations. To establish the knowledge graph, we employ geometric words as nodes, connecting them via shape categories and geometry attributes. Subsequently, we devise a unique graph embedding method for knowledge acquisition. Finally, an effective similarity measure is introduced for retrieval purposes. Importantly, each 3D or 2D entity can anchor its geometric terms within the knowledge graph, thereby serving as a link between cross-domain data. As a result, our approach facilitates multiple cross-domain 3D shape retrieval tasks. We evaluate the proposed method's performance on the ModelNet40 and ShapeNetCore55 datasets, encompassing scenarios related to 3D shape retrieval and cross-domain retrieval. Furthermore, we employ the established cross-modal dataset (MI3DOR) to assess cross-modal 3D shape retrieval. The resulting experimental outcomes, in conjunction with comparisons against state-of-the-art techniques, clearly highlight the superiority of our approach.
△ Less
Submitted 21 December, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
PA-Boot: A Formally Verified Authentication Protocol for Multiprocessor Secure Boot
Authors:
Zhuoruo Zhang,
Chenyang Yu,
Rui Chang,
Mingshuai Chen,
Bo Feng,
He Huang,
Qinming Dai,
Wenbo Shen,
Yongwang Zhao
Abstract:
Hardware supply-chain attacks are raising significant security threats to the boot process of multiprocessor systems. This paper identifies a new, prevalent hardware supply-chain attack surface that can bypass multiprocessor secure boot due to the absence of processor-authentication mechanisms. To defend against such attacks, we present PA-Boot, the first formally verified processor-authentication…
▽ More
Hardware supply-chain attacks are raising significant security threats to the boot process of multiprocessor systems. This paper identifies a new, prevalent hardware supply-chain attack surface that can bypass multiprocessor secure boot due to the absence of processor-authentication mechanisms. To defend against such attacks, we present PA-Boot, the first formally verified processor-authentication protocol for secure boot in multiprocessor systems. PA-Boot is proved functionally correct and is guaranteed to detect multiple adversarial behaviors, e.g., processor replacements, man-in-the-middle attacks, and tampering with certificates. The fine-grained formalization of PA-Boot and its fully mechanized security proofs are carried out in the Isabelle/HOL theorem prover with 306 lemmas/theorems and ~7,100 LoC. Experiments on a proof-of-concept implementation indicate that PA-Boot can effectively identify boot-process attacks with a considerably minor overhead and thereby improve the security of multiprocessor systems.
△ Less
Submitted 24 April, 2024; v1 submitted 16 September, 2022;
originally announced September 2022.
-
Machine Learning-based EEG Applications and Markets
Authors:
Weiqing Gu,
Bohan Yang,
Ryan Chang
Abstract:
This paper addresses both the various EEG applications and the current EEG market ecosystem propelled by machine learning. Increasingly available open medical and health datasets using EEG encourage data-driven research with a promise of improving neurology for patient care through knowledge discovery and machine learning data science algorithm development. This effort leads to various kinds of EE…
▽ More
This paper addresses both the various EEG applications and the current EEG market ecosystem propelled by machine learning. Increasingly available open medical and health datasets using EEG encourage data-driven research with a promise of improving neurology for patient care through knowledge discovery and machine learning data science algorithm development. This effort leads to various kinds of EEG developments and currently forms a new EEG market. This paper attempts to do a comprehensive survey on the EEG market and covers the six significant applications of EEG, including diagnosis/screening, drug development, neuromarketing, daily health, metaverse, and age/disability assistance. The highlight of this survey is on the compare and contrast between the research field and the business market. Our survey points out the current limitations of EEG and indicates the future direction of research and business opportunity for every EEG application listed above. Based on our survey, more research on machine learning-based EEG applications will lead to a more robust EEG-related market. More companies will use the research technology and apply it to real-life settings. As the EEG-related market grows, the EEG-related devices will collect more EEG data, and there will be more EEG data available for researchers to use in their study, coming back as a virtuous cycle. Our market analysis indicates that research related to the use of EEG data and machine learning in the six applications listed above points toward a clear trend in the growth and development of the EEG ecosystem and machine learning world.
△ Less
Submitted 10 August, 2022;
originally announced August 2022.
-
Sensor Deployment and Link Analysis in Satellite IoT Systems for Wildfire Detection
Authors:
How-Hang Liu,
Ronald Y. Chang,
Yi-Ying Chen,
I-Kang Fu,
H. Vincent Poor
Abstract:
Climate change has been identified as one of the most critical threats to human civilization and sustainability. Wildfires, which produce huge amounts of carbon emission, are both drivers and results of climate change. An early and timely wildfire detection system can constrain fires to short and small ones and yield significant carbon reduction. In this paper, we propose to use ground sensor depl…
▽ More
Climate change has been identified as one of the most critical threats to human civilization and sustainability. Wildfires, which produce huge amounts of carbon emission, are both drivers and results of climate change. An early and timely wildfire detection system can constrain fires to short and small ones and yield significant carbon reduction. In this paper, we propose to use ground sensor deployment and satellite Internet of Things (IoT) technologies for wildfire detection by taking advantage of satellites' ubiquitous global coverage. We first develop an optimal IoT sensor placement strategy based on fire ignition and detection models. Then, we analyze the uplink satellite communication budget and the bandwidth required for wildfire detection under the narrowband IoT (NB-IoT) radio interface. Finally, we conduct simulations on the California wildfire database and quantify the potential economical benefits by factoring in carbon emission reductions and sensor/bandwidth costs.
△ Less
Submitted 5 August, 2022; v1 submitted 2 August, 2022;
originally announced August 2022.
-
PIXAL: Anomaly Reasoning with Visual Analytics
Authors:
Brian Montambault,
Camelia D. Brumar,
Michael Behrisch,
Remco Chang
Abstract:
Anomaly detection remains an open challenge in many application areas. While there are a number of available machine learning algorithms for detecting anomalies, analysts are frequently asked to take additional steps in reasoning about the root cause of the anomalies and form actionable hypotheses that can be communicated to business stakeholders. Without the appropriate tools, this reasoning proc…
▽ More
Anomaly detection remains an open challenge in many application areas. While there are a number of available machine learning algorithms for detecting anomalies, analysts are frequently asked to take additional steps in reasoning about the root cause of the anomalies and form actionable hypotheses that can be communicated to business stakeholders. Without the appropriate tools, this reasoning process is time-consuming, tedious, and potentially error-prone. In this paper we present PIXAL, a visual analytics system developed following an iterative design process with professional analysts responsible for anomaly detection. PIXAL is designed to fill gaps in existing tools commonly used by analysts to reason with and make sense of anomalies. PIXAL consists of three components: (1) an algorithm that finds patterns by aggregating multiple anomalous data points using first-order predicates, (2) a visualization tool that allows the analyst to build trust in the algorithmically-generated predicates by performing comparative and counterfactual analyses, and (3) a visualization tool that helps the analyst generate and validate hypotheses by exploring which features in the data most explain the anomalies. Finally, we present the results of a qualitative observational study with professional analysts. These results of the study indicate that PIXAL facilitates the anomaly reasoning process, allowing analysts to make sense of anomalies and generate hypotheses that are meaningful and actionable to business stakeholders.
△ Less
Submitted 22 May, 2022;
originally announced May 2022.
-
Are Metrics Enough? Guidelines for Communicating and Visualizing Predictive Models to Subject Matter Experts
Authors:
Ashley Suh,
Gabriel Appleby,
Erik W. Anderson,
Luca Finelli,
Remco Chang,
Dylan Cashman
Abstract:
Presenting a predictive model's performance is a communication bottleneck that threatens collaborations between data scientists and subject matter experts. Accuracy and error metrics alone fail to tell the whole story of a model - its risks, strengths, and limitations - making it difficult for subject matter experts to feel confident in their decision to use a model. As a result, models may fail i…
▽ More
Presenting a predictive model's performance is a communication bottleneck that threatens collaborations between data scientists and subject matter experts. Accuracy and error metrics alone fail to tell the whole story of a model - its risks, strengths, and limitations - making it difficult for subject matter experts to feel confident in their decision to use a model. As a result, models may fail in unexpected ways or go entirely unused, as subject matter experts disregard poorly presented models in favor of familiar, yet arguably substandard methods. In this paper, we describe an iterative study conducted with both subject matter experts and data scientists to understand the gaps in communication between these two groups. We find that, while the two groups share common goals of understanding the data and predictions of the model, friction can stem from unfamiliar terms, metrics, and visualizations - limiting the transfer of knowledge to SMEs and discouraging clarifying questions being asked during presentations. Based on our findings, we derive a set of communication guidelines that use visualization as a common medium for communicating the strengths and weaknesses of a model. We provide a demonstration of our guidelines in a regression modeling scenario and elicit feedback on their use from subject matter experts. From our demonstration, subject matter experts were more comfortable discussing a model's performance, more aware of the trade-offs for the presented model, and better equipped to assess the model's risks - ultimately informing and contextualizing the model's use beyond text and numbers.
△ Less
Submitted 27 March, 2023; v1 submitted 11 May, 2022;
originally announced May 2022.
-
Inferential Tasks as an Evaluation Technique for Visualization
Authors:
Ashley Suh,
Ab Mosca,
Shannon Robinson,
Quinn Pham,
Dylan Cashman,
Alvitta Ottley,
Remco Chang
Abstract:
Designing suitable tasks for visualization evaluation remains challenging. Traditional evaluation techniques commonly rely on 'low-level' or 'open-ended' tasks to assess the efficacy of a proposed visualization, however, nontrivial trade-offs exist between the two. Low-level tasks allow for robust quantitative evaluations, but are not indicative of the complex usage of a visualization. Open-ended…
▽ More
Designing suitable tasks for visualization evaluation remains challenging. Traditional evaluation techniques commonly rely on 'low-level' or 'open-ended' tasks to assess the efficacy of a proposed visualization, however, nontrivial trade-offs exist between the two. Low-level tasks allow for robust quantitative evaluations, but are not indicative of the complex usage of a visualization. Open-ended tasks, while excellent for insight-based evaluations, are typically unstructured and require time-consuming interviews. Bridging this gap, we propose inferential tasks: a complementary task category based on inferential learning in psychology. Inferential tasks produce quantitative evaluation data in which users are prompted to form and validate their own findings with a visualization. We demonstrate the use of inferential tasks through a validation experiment on two well-known visualization tools.
△ Less
Submitted 11 May, 2022;
originally announced May 2022.
-
A Grammar of Hypotheses for Visualization, Data, and Analysis
Authors:
Ashley Suh,
Ab Mosca,
Eugene Wu,
Remco Chang
Abstract:
We present a grammar for expressing hypotheses in visual data analysis to formalize the previously abstract notion of "analysis tasks." Through the lens of our grammar, we lay the groundwork for how a user's data analysis questions can be operationalized and automated as a set of hypotheses (a hypothesis space). We demonstrate that our grammar-based approach for analysis tasks can provide a system…
▽ More
We present a grammar for expressing hypotheses in visual data analysis to formalize the previously abstract notion of "analysis tasks." Through the lens of our grammar, we lay the groundwork for how a user's data analysis questions can be operationalized and automated as a set of hypotheses (a hypothesis space). We demonstrate that our grammar-based approach for analysis tasks can provide a systematic method towards unifying three disparate spaces in visualization research: the hypotheses a dataset can express (a data hypothesis space), the hypotheses a user would like to refine or verify through analysis (an analysis hypothesis space), and the hypotheses a visualization design is capable of supporting (a visualization hypothesis space). We illustrate how the formalization of these three spaces can inform future research in visualization evaluation, knowledge elicitation, analytic provenance, and visualization recommendation by using a shared language for hypotheses. Finally, we compare our proposed grammar-based approach with existing visual analysis models and discuss the potential of a new hypothesis-driven theory of visual analytics.
△ Less
Submitted 3 April, 2023; v1 submitted 29 April, 2022;
originally announced April 2022.
-
Unsupervised Learning Based Hybrid Beamforming with Low-Resolution Phase Shifters for MU-MIMO Systems
Authors:
Chia-Ho Kuo,
Hsin-Yuan Chang,
Ronald Y. Chang,
Wei-Ho Chung
Abstract:
Millimeter wave (mmWave) is a key technology for fifth-generation (5G) and beyond communications. Hybrid beamforming has been proposed for large-scale antenna systems in mmWave communications. Existing hybrid beamforming designs based on infinite-resolution phase shifters (PSs) are impractical due to hardware cost and power consumption. In this paper, we propose an unsupervised-learning-based sche…
▽ More
Millimeter wave (mmWave) is a key technology for fifth-generation (5G) and beyond communications. Hybrid beamforming has been proposed for large-scale antenna systems in mmWave communications. Existing hybrid beamforming designs based on infinite-resolution phase shifters (PSs) are impractical due to hardware cost and power consumption. In this paper, we propose an unsupervised-learning-based scheme to jointly design the analog precoder and combiner with low-resolution PSs for multiuser multiple-input multiple-output (MU-MIMO) systems. We transform the analog precoder and combiner design problem into a phase classification problem and propose a generic neural network architecture, termed the phase classification network (PCNet), capable of producing solutions of various PS resolutions. Simulation results demonstrate the superior sum-rate and complexity performance of the proposed scheme, as compared to state-of-the-art hybrid beamforming designs for the most commonly used low-resolution PS configurations.
△ Less
Submitted 3 February, 2022;
originally announced February 2022.
-
Few-Shot Transfer Learning for Device-Free Fingerprinting Indoor Localization
Authors:
Bing-Jia Chen,
Ronald Y. Chang
Abstract:
Device-free wireless indoor localization is an essential technology for the Internet of Things (IoT), and fingerprint-based methods are widely used. A common challenge to fingerprint-based methods is data collection and labeling. This paper proposes a few-shot transfer learning system that uses only a small amount of labeled data from the current environment and reuses a large amount of existing l…
▽ More
Device-free wireless indoor localization is an essential technology for the Internet of Things (IoT), and fingerprint-based methods are widely used. A common challenge to fingerprint-based methods is data collection and labeling. This paper proposes a few-shot transfer learning system that uses only a small amount of labeled data from the current environment and reuses a large amount of existing labeled data previously collected in other environments, thereby significantly reducing the data collection and labeling cost for localization in each new environment. The core method lies in graph neural network (GNN) based few-shot transfer learning and its modifications. Experimental results conducted on real-world environments show that the proposed system achieves comparable performance to a convolutional neural network (CNN) model, with 40 times fewer labeled data.
△ Less
Submitted 29 January, 2022;
originally announced January 2022.
-
Skin feature point tracking using deep feature encodings
Authors:
Jose Ramon Chang,
Torbjörn E. M. Nordling
Abstract:
Facial feature tracking is a key component of imaging ballistocardiography (BCG) where accurate quantification of the displacement of facial keypoints is needed for good heart rate estimation. Skin feature tracking enables video-based quantification of motor degradation in Parkinson's disease. Traditional computer vision algorithms include Scale Invariant Feature Transform (SIFT), Speeded-Up Robus…
▽ More
Facial feature tracking is a key component of imaging ballistocardiography (BCG) where accurate quantification of the displacement of facial keypoints is needed for good heart rate estimation. Skin feature tracking enables video-based quantification of motor degradation in Parkinson's disease. Traditional computer vision algorithms include Scale Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Lucas-Kanade method (LK). These have long represented the state-of-the-art in efficiency and accuracy but fail when common deformations, like affine local transformations or illumination changes, are present.
Over the past five years, deep convolutional neural networks have outperformed traditional methods for most computer vision tasks. We propose a pipeline for feature tracking, that applies a convolutional stacked autoencoder to identify the most similar crop in an image to a reference crop containing the feature of interest. The autoencoder learns to represent image crops into deep feature encodings specific to the object category it is trained on.
We train the autoencoder on facial images and validate its ability to track skin features in general using manually labeled face and hand videos. The tracking errors of distinctive skin features (moles) are so small that we cannot exclude that they stem from the manual labelling based on a $χ^2$-test. With a mean error of 0.6-4.2 pixels, our method outperformed the other methods in all but one scenario. More importantly, our method was the only one to not diverge.
We conclude that our method creates better feature descriptors for feature tracking, feature matching, and image registration than the traditional algorithms.
△ Less
Submitted 4 December, 2022; v1 submitted 28 December, 2021;
originally announced December 2021.
-
SoK: A Study of the Security on Voice Processing Systems
Authors:
Robert Chang,
Logan Kuo,
Arthur Liu,
Nader Sehatbakhsh
Abstract:
As the use of Voice Processing Systems (VPS) continues to become more prevalent in our daily lives through the increased reliance on applications such as commercial voice recognition devices as well as major text-to-speech software, the attacks on these systems are increasingly complex, varied, and constantly evolving. With the use cases for VPS rapidly growing into new spaces and purposes, the po…
▽ More
As the use of Voice Processing Systems (VPS) continues to become more prevalent in our daily lives through the increased reliance on applications such as commercial voice recognition devices as well as major text-to-speech software, the attacks on these systems are increasingly complex, varied, and constantly evolving. With the use cases for VPS rapidly growing into new spaces and purposes, the potential consequences regarding privacy are increasingly more dangerous. In addition, the growing number and increased practicality of over-the-air attacks have made system failures much more probable. In this paper, we will identify and classify an arrangement of unique attacks on voice processing systems. Over the years research has been moving from specialized, untargeted attacks that result in the malfunction of systems and the denial of services to more general, targeted attacks that can force an outcome controlled by an adversary. The current and most frequently used machine learning systems and deep neural networks, which are at the core of modern voice processing systems, were built with a focus on performance and scalability rather than security. Therefore, it is critical for us to reassess the developing voice processing landscape and to identify the state of current attacks and defenses so that we may suggest future developments and theoretical improvements.
△ Less
Submitted 24 December, 2021;
originally announced December 2021.
-
UnProjection: Leveraging Inverse-Projections for Visual Analytics of High-Dimensional Data
Authors:
Mateus Espadoto,
Gabriel Appleby,
Ashley Suh,
Dylan Cashman,
Mingwei Li,
Carlos Scheidegger,
Erik W Anderson,
Remco Chang,
Alexandru C Telea
Abstract:
Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparably little work has been done on generalizable methods of inverse-projection -- the process of mapping the projected points, or more generally, the projection space back to the origina…
▽ More
Projection techniques are often used to visualize high-dimensional data, allowing users to better understand the overall structure of multi-dimensional spaces on a 2D screen. Although many such methods exist, comparably little work has been done on generalizable methods of inverse-projection -- the process of mapping the projected points, or more generally, the projection space back to the original high-dimensional space. In this paper we present NNInv, a deep learning technique with the ability to approximate the inverse of any projection or mapping. NNInv learns to reconstruct high-dimensional data from any arbitrary point on a 2D projection space, giving users the ability to interact with the learned high-dimensional representation in a visual analytics system. We provide an analysis of the parameter space of NNInv, and offer guidance in selecting these parameters. We extend validation of the effectiveness of NNInv through a series of quantitative and qualitative analyses. We then demonstrate the method's utility by applying it to three visualization tasks: interactive instance interpolation, classifier agreement, and gradient visualization.
△ Less
Submitted 2 November, 2021;
originally announced November 2021.