Showing 1–50 of 195 results for author: Pfister, H

  1. Lite2Relight: 3D-aware Single Image Portrait Relighting

    Authors: Pramod Rao, Gereon Fox, Abhimitra Meka, Mallikarjun B R, Fangneng Zhan, Tim Weyrich, Bernd Bickel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, Christian Theobalt

    Abstract: Achieving photorealistic 3D view synthesis and relighting of human portraits is pivotal for advancing AR/VR applications. Existing methodologies in portrait relighting demonstrate substantial limitations in terms of generalization and 3D consistency, coupled with inaccuracies in physically realistic lighting and identity preservation. Furthermore, personalization from a single view is difficult to…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted at SIGGRAPH '24: ACM SIGGRAPH 2024 Conference Papers

  2. arXiv:2406.16935  [pdf, other]

    eess.SP cs.AI

    Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex

    Authors: Spandan Madan, Will Xiao, Mingran Cao, Hanspeter Pfister, Margaret Livingstone, Gabriel Kreiman

    Abstract: We characterized the generalization capabilities of DNN-based encoding models when predicting neuronal responses from the visual cortex. We collected \textit{MacaqueITBench}, a large-scale dataset of neural population responses from the macaque inferior temporal (IT) cortex to over $300,000$ images, comprising $8,233$ unique natural images presented to seven monkeys over $109$ sessions. Using \tex…

    Submitted 16 June, 2024; originally announced June 2024.

  3. arXiv:2406.11331  [pdf, other]

    cs.CV cs.IR cs.LG

    They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias

    Authors: Salma Abdel Magid, Jui-Hsien Wang, Kushal Kafle, Hanspeter Pfister

    Abstract: Vision Language Models (VLMs) such as CLIP are powerful models; however they can exhibit unwanted biases, making them less safe when deployed directly in applications such as text-to-image, text-to-video retrievals, reverse search, or classification tasks. In this work, we propose a novel framework to generate synthetic counterfactual images to create a diverse and balanced dataset that can be use…

    Submitted 17 June, 2024; originally announced June 2024.

  4. arXiv:2406.10772  [pdf, ps, other]

    cs.DM

    On the maximal L1 influence of real-valued boolean functions

    Authors: Andrew J. Young, Henry D. Pfister

    Abstract: We show that any sequence of well-behaved (e.g. bounded and non-constant) real-valued functions of $n$ boolean variables $\{f_n\}$ admits a sequence of coordinates whose $L^1$ influence under the $p$-biased distribution, for any $p\in(0,1)$, is $\Omega(\text{var}(f_n) \frac{\ln n}{n})$.

    Submitted 15 June, 2024; originally announced June 2024.

  5. arXiv:2405.20643  [pdf, other]

    cs.CV cs.AI

    Learning Gaze-aware Compositional GAN

    Authors: Nerea Aranjuelo, Siyu Huang, Ignacio Arganda-Carreras, Luis Unzueta, Oihana Otaegui, Hanspeter Pfister, Donglai Wei

    Abstract: Gaze-annotated facial data is crucial for training deep neural networks (DNNs) for gaze estimation. However, obtaining these data is labor-intensive and requires specialized equipment due to the challenge of accurately annotating the gaze direction of a subject. In this work, we present a generative framework to create annotated gaze data by leveraging the benefits of labeled and unlabeled data so…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Accepted by ETRA 2024 as Full paper, and as journal paper in Proceedings of the ACM on Computer Graphics and Interactive Techniques

    Journal ref: Proceedings of the ACM on Computer Graphics and Interactive Techniques, 2024

  6. arXiv:2404.14435  [pdf, other]

    cs.CV eess.IV

    FreSeg: Frenet-Frame-based Part Segmentation for 3D Curvilinear Structures

    Authors: Shixuan Gu, Jason Ken Adhinarta, Mikhail Bessmeltsev, Jiancheng Yang, Jessica Zhang, Daniel Berger, Jeff W. Lichtman, Hanspeter Pfister, Donglai Wei

    Abstract: Part segmentation is a crucial task for 3D curvilinear structures like neuron dendrites and blood vessels, enabling the analysis of dendritic spines and aneurysms with scientific and clinical significance. However, their diversely winded morphology poses a generalization challenge to existing deep learning methods, which leads to labor-intensive manual correction. In this work, we propose FreSeg,…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures

  7. arXiv:2404.01976  [pdf, other]

    cs.CV cs.AI cs.LG

    Joint-Task Regularization for Partially Labeled Multi-Task Learning

    Authors: Kento Nishi, Junsik Kim, Wanhua Li, Hanspeter Pfister

    Abstract: Multi-task learning has become increasingly popular in the machine learning field, but its practicality is hindered by the need for large, labeled datasets. Most multi-task learning methods depend on fully labeled datasets wherein each input example is accompanied by ground-truth labels for all target tasks. Unfortunately, curating such datasets can be prohibitively expensive and impractical, espe…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted paper to CVPR 2024 (main conference)

  8. arXiv:2404.00801  [pdf, other]

    cs.CV

    $R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

    Authors: Ye Liu, Jixuan He, Wanhua Li, Junsik Kim, Donglai Wei, Hanspeter Pfister, Chang Wen Chen

    Abstract: Video temporal grounding (VTG) is a fine-grained video understanding problem that aims to ground relevant clips in untrimmed videos given natural language queries. Most existing VTG models are built upon frame-wise final-layer CLIP features, aided by additional temporal backbones (e.g., SlowFast) with sophisticated temporal reasoning mechanisms. In this work, we claim that CLIP itself already show…

    Submitted 31 March, 2024; originally announced April 2024.

  9. arXiv:2402.18684  [pdf, ps, other]

    quant-ph cs.IT

    Quantum State Compression with Polar Codes

    Authors: Jack Weinberg, Avijit Mandal, Henry D. Pfister

    Abstract: In the quantum compression scheme proposed by Schumacher, Alice compresses a message that Bob decompresses. In that approach, there is some probability of failure and, even when successful, some distortion of the state. For sufficiently large blocklengths, both of these imperfections can be made arbitrarily small while achieving a compression rate that asymptotically approaches the source coding b…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Extended Version of ISIT 2024 Submission

  10. arXiv:2402.10962  [pdf, other]

    cs.CL cs.AI cs.LG

    Measuring and Controlling Instruction (In)Stability in Language Model Dialogs

    Authors: Kenneth Li, Tianle Liu, Naomi Bashkansky, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

    Abstract: System-prompting is a standard tool for customizing language-model chatbots, enabling them to follow a specific instruction. An implicit assumption in the use of system prompts is that they will be stable, so the chatbot will continue to generate text according to the stipulated instructions for the duration of a conversation. We propose a quantitative benchmark to test this assumption, evaluating…

    Submitted 1 May, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Code: https://github.com/likenneth/persona_drift

  11. arXiv:2402.09372  [pdf, other]

    eess.IV cs.AI cs.CV

    Deep Rib Fracture Instance Segmentation and Classification from CT on the RibFrac Challenge

    Authors: Jiancheng Yang, Rui Shi, Liang Jin, Xiaoyang Huang, Kaiming Kuang, Donglai Wei, Shixuan Gu, Jianying Liu, Pengfei Liu, Zhizhong Chai, Yongjie Xiao, Hao Chen, Liming Xu, Bang Du, Xiangyi Yan, Hao Tang, Adam Alessio, Gregory Holste, Jiapeng Zhang, Xiaoming Wang, Jianye He, Lixuan Che, Hanspeter Pfister, Ming Li, Bingbing Ni

    Abstract: Rib fractures are a common and potentially severe injury that can be challenging and labor-intensive to detect in CT scans. While there have been efforts to address this field, the lack of large-scale annotated datasets and evaluation benchmarks has hindered the development and validation of deep learning algorithms. To address this issue, the RibFrac Challenge was introduced, providing a benchmar…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Challenge paper for MICCAI RibFrac Challenge (https://ribfrac.grand-challenge.org/)

  12. arXiv:2402.03700  [pdf, other]

    cs.HC cs.AI

    GenLens: A Systematic Evaluation of Visual GenAI Model Outputs

    Authors: Tica Lin, Hanspeter Pfister, Jui-Hsien Wang

    Abstract: The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: To Appear in IEEE PacificVis 2024

  13. arXiv:2401.15856  [pdf, other]

    cs.LG cs.AI

    Look Around! Unexpected gains from training on environments in the vicinity of the target

    Authors: Serena Bono, Spandan Madan, Ishaan Grover, Mao Yasueda, Cynthia Breazeal, Hanspeter Pfister, Gabriel Kreiman

    Abstract: Solutions to Markov Decision Processes (MDP) are often very sensitive to state transition probabilities. As the estimation of these probabilities is often inaccurate in practice, it is important to understand when and how Reinforcement Learning (RL) agents generalize when transition probabilities change. Here we present a new methodology to evaluate such generalization of RL agents under small shi…

    Submitted 28 January, 2024; originally announced January 2024.

  14. arXiv:2401.13961  [pdf, other]

    cs.CV

    TriSAM: Tri-Plane SAM for zero-shot cortical blood vessel segmentation in VEM images

    Authors: Jia Wan, Wanhua Li, Jason Ken Adhinarta, Atmadeep Banerjee, Evelina Sjostedt, Jingpeng Wu, Jeff Lichtman, Hanspeter Pfister, Donglai Wei

    Abstract: While imaging techniques at macro and mesoscales have garnered substantial attention and resources, microscale Volume Electron Microscopy (vEM) imaging, capable of revealing intricate vascular details, has lacked the necessary benchmarking infrastructure. In this paper, we address a significant gap in this field of neuroimaging by introducing the first-in-class public benchmark, BvEM, designed spe…

    Submitted 17 June, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: BvEM-Mouse can be visualized at: https://tinyurl.com/yc2s38x9

  15. arXiv:2401.07167  [pdf, ps, other]

    cs.IT

    Polar Codes for CQ Channels: Decoding via Belief-Propagation with Quantum Messages

    Authors: Avijit Mandal, S. Brandsen, Henry D. Pfister

    Abstract: This paper considers the design and decoding of polar codes for general classical-quantum (CQ) channels. It focuses on decoding via belief-propagation with quantum messages (BPQM) and, in particular, the idea of paired-measurement BPQM (PM-BPQM) decoding. Since the PM-BPQM decoder admits a classical density evolution (DE) analysis, one can use DE to design a polar code for any CQ channel and then…

    Submitted 13 January, 2024; originally announced January 2024.

  16. arXiv:2312.16084  [pdf, other]

    cs.CV

    LangSplat: 3D Language Gaussian Splatting

    Authors: Minghan Qin, Wanhua Li, Jiawei Zhou, Haoqian Wang, Hanspeter Pfister

    Abstract: Humans live in a 3D world and commonly use natural language to interact with a 3D scene. Modeling a 3D language field to support open-ended language queries in 3D has gained increasing attention recently. This paper introduces LangSplat, which constructs a 3D language field that enables precise and efficient open-vocabulary querying within 3D spaces. Unlike existing methods that ground CLIP langua…

    Submitted 31 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: CVPR 2024. Project Page: https://langsplat.github.io

  17. arXiv:2312.14965  [pdf, other]

    cs.CV cs.LG

    Unraveling the Temporal Dynamics of the Unet in Diffusion Models

    Authors: Vidya Prasad, Chen Zhu-Tian, Anna Vilanova, Hanspeter Pfister, Nicola Pezzotti, Hendrik Strobelt

    Abstract: Diffusion models have garnered significant attention since they can effectively learn complex multivariate Gaussian distributions, resulting in diverse, high-quality outcomes. They introduce Gaussian noise into training data and reconstruct the original data iteratively. Central to this iterative process is a single Unet, adapting across time steps to facilitate generation. Recent work revealed th…

    Submitted 16 December, 2023; originally announced December 2023.

  18. arXiv:2312.10950  [pdf, other]

    cs.IT quant-ph

    Belief Propagation Decoding of Quantum LDPC Codes with Guided Decimation

    Authors: Hanwen Yao, Waleed Abu Laban, Christian Häger, Alexandre Graell i Amat, Henry D. Pfister

    Abstract: Quantum low-density parity-check (QLDPC) codes have emerged as a promising technique for quantum error correction. A variety of decoders have been proposed for QLDPC codes and many of them utilize belief propagation (BP) decoding in some fashion. However, the use of BP decoding for degenerate QLDPC codes is known to have issues with convergence. These issues are typically attributed to short cycle…

    Submitted 21 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 19 pages, 8 figures

  19. arXiv:2310.16783  [pdf, other]

    cs.CV

    S$^3$-TTA: Scale-Style Selection for Test-Time Augmentation in Biomedical Image Segmentation

    Authors: Kangxian Xie, Siyu Huang, Sebastian Andres Cajas Ordonez, Hanspeter Pfister, Donglai Wei

    Abstract: Deep-learning models have been successful in biomedical image segmentation. To generalize for real-world deployment, test-time augmentation (TTA) methods are often used to transform the test image into different versions that are hopefully closer to the training domain. Unfortunately, due to the vast diversity of instance scale and image styles, many augmented test images produce undesirable resul…

    Submitted 6 January, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

  20. arXiv:2310.05262  [pdf, other]

    cs.CV

    Structure-Preserving Instance Segmentation via Skeleton-Aware Distance Transform

    Authors: Zudi Lin, Donglai Wei, Aarush Gupta, Xingyu Liu, Deqing Sun, Hanspeter Pfister

    Abstract: Objects with complex structures pose significant challenges to existing instance segmentation methods that rely on boundary or affinity maps, which are vulnerable to small errors around contacting pixels that cause noticeable connectivity change. While the distance transform (DT) makes instance interiors and boundaries more distinguishable, it tends to overlook the intra-object connectivity for in…

    Submitted 8 October, 2023; originally announced October 2023.

    Comments: MICCAI 2023 (Oral Presentation)

  21. This is the Table I Want! Interactive Data Transformation on Desktop and in Virtual Reality

    Authors: Sungwon In, Tica Lin, Chris North, Hanspeter Pfister, Yalong Yang

    Abstract: Data transformation is an essential step in data science. While experts primarily use programming to transform their data, there is an increasing need to support non-programmers with user interface-based tools. With the rapid development in interaction techniques and computing environments, we report our empirical findings about the effects of interaction techniques and environments on performing…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: IEEE Transactions on Visualization and Computer Graphics (TVCG), to appear

  22. arXiv:2309.10724  [pdf, other]

    cs.CV cs.AI cs.MM cs.SD eess.AS

    Sound Source Localization is All about Cross-Modal Alignment

    Authors: Arda Senocak, Hyeonggon Ryu, Junsik Kim, Tae-Hyun Oh, Hanspeter Pfister, Joon Son Chung

    Abstract: Humans can easily perceive the direction of sound sources in a visual scene, termed sound source localization. Recent studies on learning-based sound source localization have mainly explored the problem from a localization perspective. However, prior arts and existing benchmarks do not account for a more important aspect of the problem, cross-modal semantic understanding, which is essential for ge…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  23. Residency Octree: A Hybrid Approach for Scalable Web-Based Multi-Volume Rendering

    Authors: Lukas Herzberger, Markus Hadwiger, Robert Krüger, Peter Sorger, Hanspeter Pfister, Eduard Gröller, Johanna Beyer

    Abstract: We present a hybrid multi-volume rendering approach based on a novel Residency Octree that combines the advantages of out-of-core volume rendering using page tables with those of standard octrees. Octree approaches work by performing hierarchical tree traversal. However, in octree volume rendering, tree traversal and the selection of data resolution are intrinsically coupled. This makes fine-grain…

    Submitted 8 September, 2023; originally announced September 2023.

    Comments: VIS 2023 - full paper

  24. arXiv:2309.03148  [pdf, ps, other]

    cs.IT cs.LG

    Data-Driven Neural Polar Codes for Unknown Channels With and Without Memory

    Authors: Ziv Aharoni, Bashar Huleihel, Henry D. Pfister, Haim H. Permuter

    Abstract: In this work, a novel data-driven methodology for designing polar codes for channels with and without memory is proposed. The methodology is suitable for the case where the channel is given as a "black-box" and the designer has access to the channel for generating observations of its inputs and outputs, but does not have access to the explicit channel model. The proposed method leverages the struc…

    Submitted 6 September, 2023; originally announced September 2023.

  25. arXiv:2308.15226  [pdf, other]

    cs.CV cs.AI cs.CL

    CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation

    Authors: Devaansh Gupta, Siddhant Kharbanda, Jiawei Zhou, Wanhua Li, Hanspeter Pfister, Donglai Wei

    Abstract: There has been a growing interest in developing multimodal machine translation (MMT) systems that enhance neural machine translation (NMT) with visual knowledge. This problem setup involves using images as auxiliary information during training, and more recently, eliminating their use during inference. Towards this end, previous works face a challenge in training powerful MMT models from scratch d…

    Submitted 29 August, 2023; originally announced August 2023.

    Comments: 15 pages, 9 figures, to be published in Proceedings of the International Conference on Computer Vision (ICCV), 2023

  26. RL-LABEL: A Deep Reinforcement Learning Approach Intended for AR Label Placement in Dynamic Scenarios

    Authors: Chen Zhu-Tian, Daniele Chiappalupi, Tica Lin, Yalong Yang, Johanna Beyer, Hanspeter Pfister

    Abstract: Labels are widely used in augmented reality (AR) to display digital information. Ensuring the readability of AR labels requires placing them occlusion-free while keeping visual linkings legible, especially when multiple labels exist in the scene. Although existing optimization-based methods, such as force-based methods, are effective in managing AR labels in static scenarios, they often struggle i…

    Submitted 10 May, 2024; v1 submitted 20 August, 2023; originally announced August 2023.

  27. arXiv:2308.05168  [pdf, other]

    cs.CV cs.HC

    A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision

    Authors: Changjian Chen, Yukai Guo, Fengyuan Tian, Shilong Liu, Weikai Yang, Zhaowei Wang, Jing Wu, Hang Su, Hanspeter Pfister, Shixia Liu

    Abstract: Existing model evaluation tools mainly focus on evaluating classification models, leaving a gap in evaluating more complex models, such as object detection. In this paper, we develop an open-source visual analysis tool, Uni-Evaluator, to support a unified model evaluation for classification, object detection, and instance segmentation in computer vision. The key idea behind our method is to formul…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: Accepted to IEEE VIS 2023

  28. arXiv:2308.03390  [pdf, other]

    cs.HC

    Quantifying the Impact of XR Visual Guidance on User Performance Using a Large-Scale Virtual Assembly Experiment

    Authors: Leon Pietschmann, Paul-David Zuercher, Erik Bubík, Zhutian Chen, Hanspeter Pfister, Thomas Bohné

    Abstract: The combination of Visual Guidance and Extended Reality (XR) technology holds the potential to greatly improve the performance of human workforces in numerous areas, particularly industrial environments. Focusing on virtual assembly tasks and making use of different forms of supportive visualisations, this study investigates the potential of XR Visual Guidance. Set in a web-based immersive environ…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: IEEE VIS 2023

  29. arXiv:2307.12539  [pdf, other]

    cs.HC cs.GR

    VIRD: Immersive Match Video Analysis for High-Performance Badminton Coaching

    Authors: Tica Lin, Alexandre Aouididi, Zhutian Chen, Johanna Beyer, Hanspeter Pfister, Jui-Hsien Wang

    Abstract: Badminton is a fast-paced sport that requires a strategic combination of spatial, temporal, and technical tactics. To gain a competitive edge at high-level competitions, badminton professionals frequently analyze match videos to gain insights and develop game strategies. However, the current process for analyzing matches is time-consuming and relies heavily on manual note-taking, due to the lack o…

    Submitted 8 August, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: To Appear in IEEE Transactions on Visualization and Computer Graphics (IEEE VIS), 2023

  30. Domain-Scalable Unpaired Image Translation via Latent Space Anchoring

    Authors: Siyu Huang, Jie An, Donglai Wei, Zudi Lin, Jiebo Luo, Hanspeter Pfister

    Abstract: Unpaired image-to-image translation (UNIT) aims to map images between two visual domains without paired training data. However, given a UNIT model trained on certain domains, it is difficult for current methods to incorporate new domains because they often need to train the full model on both existing and new domains. To address this problem, we propose a new domain-scalable UNIT method, termed as…

    Submitted 26 June, 2023; originally announced June 2023.

    Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI). Code is available at https://github.com/siyuhuang/Latent-Space-Anchoring

  31. arXiv:2306.03341  [pdf, other]

    cs.LG cs.AI cs.CL

    Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

    Authors: Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

    Abstract: We introduce Inference-Time Intervention (ITI), a technique designed to enhance the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLa…

    Submitted 26 June, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: NeurIPS 2023 spotlight; code: https://github.com/likenneth/honest_llama

  32. arXiv:2306.02914  [pdf, other]

    cs.HC cs.GR

    Beyond Generating Code: Evaluating GPT on a Data Visualization Course

    Authors: Chen Zhu-Tian, Chenyang Zhang, Qianwen Wang, Jakob Troidl, Simon Warchol, Johanna Beyer, Nils Gehlenborg, Hanspeter Pfister

    Abstract: This paper presents an empirical evaluation of the performance of the Generative Pre-trained Transformer (GPT) model in Harvard's CS171 data visualization course. While previous studies have focused on GPT's ability to generate code for visualizations, this study goes beyond code generation to evaluate GPT's abilities in various visualization tasks, such as data interpretation, visualization desig…

    Submitted 11 May, 2024; v1 submitted 5 June, 2023; originally announced June 2023.

    Comments: VIS short paper

  33. arXiv:2305.07779  [pdf, ps, other]

    cs.IT

    Achieving Capacity on Non-Binary Channels with Generalized Reed-Muller Codes

    Authors: Galen Reeves, Henry D. Pfister

    Abstract: Recently, the authors showed that Reed-Muller (RM) codes achieve capacity on binary memoryless symmetric (BMS) channels with respect to bit error rate. This paper extends that work by showing that RM codes defined on non-binary fields, known as generalized RM codes, achieve capacity on sufficiently symmetric non-binary channels with respect to symbol error rate. The new proof also simplifies the p…

    Submitted 12 May, 2023; originally announced May 2023.

    Comments: Extended version of ISIT 2023 accepted paper

  34. arXiv:2303.13132  [pdf, other]

    cs.CV

    Masked Image Training for Generalizable Deep Image Denoising

    Authors: Haoyu Chen, Jinjin Gu, Yihao Liu, Salma Abdel Magid, Chao Dong, Qiong Wang, Hanspeter Pfister, Lei Zhu

    Abstract: When capturing and storing images, devices inevitably introduce noise. Reducing this noise is a critical task called image denoising. Deep learning has become the de facto method for image denoising, especially with the emergence of Transformer-based models that have achieved notable state-of-the-art results on various image tasks. However, deep learning-based methods often suffer from a lack of g…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  35. iBall: Augmenting Basketball Videos with Gaze-moderated Embedded Visualizations

    Authors: Chen Zhu-Tian, Qisen Yang, Jiarui Shan, Tica Lin, Johanna Beyer, Haijun Xia, Hanspeter Pfister

    Abstract: We present iBall, a basketball video-watching system that leverages gaze-moderated embedded visualizations to facilitate game understanding and engagement of casual fans. Video broadcasting and online video platforms make watching basketball games increasingly accessible. Yet, for new or casual fans, watching basketball videos is often confusing due to their limited basketball knowledge and the la…

    Submitted 10 May, 2024; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: ACM CHI23

  36. arXiv:2302.03819  [pdf, other]

    cs.CV cs.LG q-bio.NC

    The XPRESS Challenge: Xray Projectomic Reconstruction -- Extracting Segmentation with Skeletons

    Authors: Tri Nguyen, Mukul Narwani, Mark Larson, Yicong Li, Shuhan Xie, Hanspeter Pfister, Donglai Wei, Nir Shavit, Lu Mi, Alexandra Pacureanu, Wei-Chung Lee, Aaron T. Kuan

    Abstract: The wiring and connectivity of neurons form a structural basis for the function of the nervous system. Advances in volume electron microscopy (EM) and image segmentation have enabled mapping of circuit diagrams (connectomics) within local regions of the mouse brain. However, applying volume EM over the whole brain is not currently feasible due to technological challenges. As a result, comprehensiv…

    Submitted 24 February, 2023; v1 submitted 7 February, 2023; originally announced February 2023.

    Comments: 6 pages, 2 figures

  37. arXiv:2302.00545  [pdf, other]

    cs.CV q-bio.NC

    An Out-of-Domain Synapse Detection Challenge for Microwasp Brain Connectomes

    Authors: Jingpeng Wu, Yicong Li, Nishika Gupta, Kazunori Shinomiya, Pat Gunn, Alexey Polilov, Hanspeter Pfister, Dmitri Chklovskii, Donglai Wei

    Abstract: The size of image stacks in connectomics studies now reaches the terabyte and often petabyte scales with a great diversity of appearance across brain regions and samples. However, manual annotation of neural structures, e.g., synapses, is time-consuming, which leads to limited training data often smaller than 0.001\% of the test data in size. Domain adaptation and generalization approaches were pr…

    Submitted 1 February, 2023; originally announced February 2023.

  38. Is Embodied Interaction Beneficial? A Study on Navigating Network Visualizations

    Authors: Helen H. Huang, Hanspeter Pfister, Yalong Yang

    Abstract: Network visualizations are commonly used to analyze relationships in various contexts. To efficiently explore a network visualization, the user needs to quickly navigate to different parts of the network and analyze local details. Recent advancements in display and interaction technologies inspire new visions for improved visualization and interaction design. Past research into network design has…

    Submitted 26 January, 2023; originally announced January 2023.

    Comments: Accepted by the Information Visualization journal

  39. arXiv:2212.10431  [pdf, other]

    cs.CV cs.LG cs.MM eess.IV

    QuantArt: Quantizing Image Style Transfer Towards High Visual Fidelity

    Authors: Siyu Huang, Jie An, Donglai Wei, Jiebo Luo, Hanspeter Pfister

    Abstract: The mechanism of existing style transfer algorithms is by minimizing a hybrid loss function to push the generated image toward high similarities in both content and style. However, this type of approach cannot guarantee visual fidelity, i.e., the generated artworks should be indistinguishable from real ones. In this paper, we devise a new style transfer framework called QuantArt for high visual-fi…

    Submitted 5 June, 2023; v1 submitted 20 December, 2022; originally announced December 2022.

    Comments: Accepted to CVPR 2023. Code is available at https://github.com/siyuhuang/QuantArt

  40. arXiv:2212.04711  [pdf, other]

    cs.CV

    ShadowDiffusion: When Degradation Prior Meets Diffusion Model for Shadow Removal

    Authors: Lanqing Guo, Chong Wang, Wenhan Yang, Siyu Huang, Yufei Wang, Hanspeter Pfister, Bihan Wen

    Abstract: Recent deep learning methods have achieved promising results in image shadow removal. However, their restored images still suffer from unsatisfactory boundary artifacts, due to the lack of degradation prior embedding and the deficiency in modeling capacity. Our work addresses these issues by proposing a unified diffusion framework that integrates both the image and degradation priors for highly ef…

    Submitted 13 December, 2022; v1 submitted 9 December, 2022; originally announced December 2022.

    ACM Class: I.2.10; I.5.4

  41. arXiv:2211.13087  [pdf, other]

    cs.CV cs.AI

    Human or Machine? Turing Tests for Vision and Language

    Authors: Mengmi Zhang, Giorgia Dellaferrera, Ankur Sikarwar, Marcelo Armendariz, Noga Mudrik, Prachi Agrawal, Spandan Madan, Andrei Barbu, Haochen Yang, Tanishq Kumar, Meghna Sadwani, Stella Dellaferrera, Michele Pizzochero, Hanspeter Pfister, Gabriel Kreiman

    Abstract: As AI algorithms increasingly participate in daily activities that used to be the sole province of humans, we are inevitably called upon to consider how much machines are really like us. To address this question, we turn to the Turing test and systematically benchmark current AIs in their abilities to imitate humans. We establish a methodology to evaluate humans versus machines in Turing-like test…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: 134 pages

  42. The Ball is in Our Court: Conducting Visualization Research with Sports Experts

    Authors: Tica Lin, Zhutian Chen, Johanna Beyer, Yingcai Wu, Hanspeter Pfister, Yalong Yang

    Abstract: Most sports visualizations rely on a combination of spatial, highly temporal, and user-centric data, making sports a challenging target for visualization. Emerging technologies, such as augmented and mixed reality (AR/XR), have brought exciting opportunities along with new challenges for sports visualization. We share our experience working with sports domain experts and present lessons learned fr…

    Submitted 14 November, 2022; originally announced November 2022.

    Comments: To appear in IEEE Transactions on Computer Graphics and Applications (IEEE CG&A), 2022

  43. arXiv:2210.13382  [pdf, other]

    cs.LG cs.AI cs.CL

    Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task

    Authors: Kenneth Li, Aspen K. Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

    Abstract: Language models show a surprising range of capabilities, but the source of their apparent competence is unclear. Do these networks just memorize a collection of surface statistics, or do they rely on internal representations of the process that generates the sequences they see? We investigate this question by applying a variant of the GPT model to the task of predicting legal moves in a simple boa…

    Submitted 26 June, 2024; v1 submitted 24 October, 2022; originally announced October 2022.

    Comments: ICLR 2023 oral (notable-top-5%): https://openreview.net/forum?id=DeG07_TcZvT ; code: https://github.com/likenneth/othello_world

  44. arXiv:2210.09309  [pdf, other]

    eess.IV cs.CV cs.LG

    RibSeg v2: A Large-scale Benchmark for Rib Labeling and Anatomical Centerline Extraction

    Authors: Liang Jin, Shixuan Gu, Donglai Wei, Jason Ken Adhinarta, Kaiming Kuang, Yongjie Jessica Zhang, Hanspeter Pfister, Bingbing Ni, Jiancheng Yang, Ming Li

    Abstract: Automatic rib labeling and anatomical centerline extraction are common prerequisites for various clinical applications. Prior studies either use in-house datasets that are inaccessible to communities, or focus on rib segmentation that neglects the clinical significance of rib labeling. To address these issues, we extend our prior dataset (RibSeg) on the binary rib segmentation task to a comprehens…

    Submitted 1 August, 2023; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: 10 pages, 6 figures, journal

  45. Sporthesia: Augmenting Sports Videos Using Natural Language

    Authors: Chen Zhu-Tian, Qisen Yang, Xiao Xie, Johanna Beyer, Haijun Xia, Yingcai Wu, Hanspeter Pfister

    Abstract: Augmented sports videos, which combine visualizations and video effects to present data in actual scenes, can communicate insights engagingly and thus have been increasingly popular for sports enthusiasts around the world. Yet, creating augmented sports videos remains a challenging task, requiring considerable time and video editing skills. On the other hand, sports insights are often communicated…

    Submitted 10 May, 2024; v1 submitted 7 September, 2022; originally announced September 2022.

    Comments: 10 pages, IEEE VIS conference

    Journal ref: IEEE Transactions on Visualization and Computer Graphics 2022

  46. arXiv:2209.00202  [pdf, other]

    cs.HC

    The Quest for Omnioculars: Embedded Visualization for Augmenting Basketball Game Viewing Experiences

    Authors: Tica Lin, Zhutian Chen, Yalong Yang, Daniele Chiappalupi, Johanna Beyer, Hanspeter Pfister

    Abstract: Sports game data is becoming increasingly complex, often consisting of multivariate data such as player performance stats, historical team records, and athletes' positional tracking information. While numerous visual analytics systems have been developed for sports analysts to derive insights, few tools target fans to improve their understanding and engagement of sports data during live games. By…

    Submitted 31 August, 2022; originally announced September 2022.

    Comments: To appear in IEEE Transactions on Visualization and Computer Graphics (IEEE VIS), 2022

  47. arXiv:2208.07852  [pdf, other]

    cs.CL cs.HC cs.LG

    Interactive and Visual Prompt Engineering for Ad-hoc Task Adaptation with Large Language Models

    Authors: Hendrik Strobelt, Albert Webson, Victor Sanh, Benjamin Hoover, Johanna Beyer, Hanspeter Pfister, Alexander M. Rush

    Abstract: State-of-the-art neural language models can now be used to solve ad-hoc language tasks through zero-shot prompting without the need for supervised training. This approach has gained popularity in recent years, and researchers have demonstrated prompts that achieve strong accuracy on specific NLP tasks. However, finding a prompt for new tasks requires experimentation. Different prompt templates wit…

    Submitted 16 August, 2022; originally announced August 2022.

    Comments: 9 pages content, 2 pages references

  48. arXiv:2207.04984  [pdf, other]

    cs.IT quant-ph

    Belief Propagation with Quantum Messages for Symmetric Classical-Quantum Channels

    Authors: S. Brandsen, Avijit Mandal, Henry D. Pfister

    Abstract: Belief propagation (BP) is a classical algorithm that approximates the marginal distribution associated with a factor graph by passing messages between adjacent nodes in the graph. It gained popularity in the 1990's as a powerful decoding algorithm for LDPC codes. In 2016, Renes introduced a belief propagation with quantum messages (BPQM) and described how it could be used to decode classical code…

    Submitted 11 July, 2022; originally announced July 2022.

    Comments: Extended version of submission to the 2022 Information Theory Workshop in Mumbai, India

  49. arXiv:2206.07802  [pdf, other]

    cs.CV cs.AI cs.GR

    Improving generalization by mimicking the human visual diet

    Authors: Spandan Madan, You Li, Mengmi Zhang, Hanspeter Pfister, Gabriel Kreiman

    Abstract: We present a new perspective on bridging the generalization gap between biological and computer vision -- mimicking the human visual diet. While computer vision models rely on internet-scraped datasets, humans learn from limited 3D scenes under diverse real-world transformations with objects in natural context. Our results demonstrate that incorporating variations and contextual cues ubiquitous in…

    Submitted 10 January, 2024; v1 submitted 15 June, 2022; originally announced June 2022.

  50. Diagnosing Ensemble Few-Shot Classifiers

    Authors: Weikai Yang, Xi Ye, Xingxing Zhang, Lanxi Xiao, Jiazhi Xia, Zhongyuan Wang, Jun Zhu, Hanspeter Pfister, Shixia Liu

    Abstract: The base learners and labeled samples (shots) in an ensemble few-shot classifier greatly affect the model performance. When the performance is not satisfactory, it is usually difficult to understand the underlying causes and make improvements. To tackle this issue, we propose a visual analysis method, FSLDiagnotor. Given a set of base learners and a collection of samples with a few shots, we consi…

    Submitted 9 June, 2022; originally announced June 2022.

    Comments: Accepted in IEEE TVCG