Showing 1–50 of 347 results for author: Chang, J

  1. arXiv:2407.08801  [pdf, other]

    cs.CV

    DG-PIC: Domain Generalized Point-In-Context Learning for Point Cloud Understanding

    Authors: Jincen Jiang, Qianyu Zhou, Yuhang Li, Xuequan Lu, Meili Wang, Lizhuang Ma, Jian Chang, Jian Jun Zhang

    Abstract: Recent point cloud understanding research suffers from performance drops on unseen data, due to the distribution shifts across different domains. While recent studies use Domain Generalization (DG) techniques to mitigate this by learning domain-invariant features, most are designed for a single task and neglect the potential of testing data. Despite In-Context Learning (ICL) showcasing multi-task…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2407.05254  [pdf, other]

    cs.CV

    GaussReg: Fast 3D Registration with Gaussian Splatting

    Authors: Jiahao Chang, Yinglin Xu, Yihao Li, Yuantao Chen, Xiaoguang Han

    Abstract: Point cloud registration is a fundamental problem for large-scale 3D scene scanning and reconstruction. With the help of deep learning, registration methods have evolved significantly, reaching a nearly-mature stage. As the introduction of Neural Radiance Fields (NeRF), it has become the most popular 3D scene representation as its powerful view synthesis capabilities. Regarding NeRF representation…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  3. arXiv:2406.19560  [pdf, other]

    cs.CV cs.LG eess.IV

    Cost-efficient Active Illumination Camera For Hyper-spectral Reconstruction

    Authors: Yuxuan Zhang, T. M. Sazzad, Yangyang Song, Spencer J. Chang, Ritesh Chowdhry, Tomas Mejia, Anna Hampton, Shelby Kucharski, Stefan Gerber, Barry Tillman, Marcio F. R. Resende, William M. Hammond, Chris H. Wilson, Alina Zare, Sanjeev J. Koppal

    Abstract: Hyper-spectral imaging has recently gained increasing attention for use in different applications, including agricultural investigation, ground tracking, remote sensing and many other. However, the high cost, large physical size and complicated operation process stop hyperspectral cameras from being employed for various applications and research fields. In this paper, we introduce a cost-efficient…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.10370  [pdf, other]

    cs.HC

    Let's Get to the Point: LLM-Supported Planning, Drafting, and Revising of Research-Paper Blog Posts

    Authors: Marissa Radensky, Daniel S. Weld, Joseph Chee Chang, Pao Siangliulue, Jonathan Bragg

    Abstract: Research-paper blog posts help scientists disseminate their work to a larger audience, but translating papers into this format requires substantial additional effort. Blog post creation is not simply transforming a long-form article into a short output, as studied in most prior work on human-AI summarization. In contrast, blog posts are typically full-length articles that require a combination of…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 28 pages, 9 figures in main text (not appendix)

  5. arXiv:2406.00490  [pdf, other]

    cs.CV cs.AI

    Research on the Application of Computer Vision Based on Deep Learning in Autonomous Driving Technology

    Authors: Jingyu Zhang, Jin Cao, Jinghao Chang, Xinjin Li, Houze Liu, Zhenglin Li

    Abstract: This research aims to explore the application of deep learning in autonomous driving computer vision technology and its impact on improving system performance. By using advanced technologies such as convolutional neural networks (CNN), multi-task joint learning methods, and deep reinforcement learning, this article analyzes in detail the application of deep learning in image recognition, real-time…

    Submitted 3 June, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

  6. arXiv:2405.17829  [pdf, other]

    cs.LG cs.AI

    LDMol: Text-Conditioned Molecule Diffusion Model Leveraging Chemically Informative Latent Space

    Authors: Jinho Chang, Jong Chul Ye

    Abstract: With the emergence of diffusion models as the frontline of generative models, many researchers have proposed molecule generation techniques using conditional diffusion models. However, due to the fundamental nature of a molecule, which carries highly entangled correlations within a small number of atoms and bonds, it becomes difficult for a model to connect raw data with the conditions when the co…

    Submitted 28 May, 2024; originally announced May 2024.

  7. arXiv:2405.13226  [pdf, other]

    cs.CL cs.LG

    Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum

    Authors: Hadi Pouransari, Chun-Liang Li, Jen-Hao Rick Chang, Pavan Kumar Anasosalu Vasu, Cem Koc, Vaishaal Shankar, Oncel Tuzel

    Abstract: Large language models (LLMs) are commonly trained on datasets consisting of fixed-length token sequences. These datasets are created by randomly concatenating documents of various lengths and then chunking them into sequences of a predetermined target length. However, this method of concatenation can lead to cross-document attention within a sequence, which is neither a desirable learning signal n…

    Submitted 21 May, 2024; originally announced May 2024.

  8. arXiv:2405.09592  [pdf, other]

    cs.LG cs.AI cs.CE

    A Survey of Generative Techniques for Spatial-Temporal Data Mining

    Authors: Qianru Zhang, Haixin Wang, Cheng Long, Liangcai Su, Xingwei He, Jianlong Chang, Tailin Wu, Hongzhi Yin, Siu-Ming Yiu, Qi Tian, Christian S. Jensen

    Abstract: This paper focuses on the integration of generative techniques into spatial-temporal data mining, considering the significant growth and diverse nature of spatial-temporal data. With the advancements in RNNs, CNNs, and other non-generative techniques, researchers have explored their application in capturing temporal and spatial dependencies within spatial-temporal data. However, the emergence of g…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 19 pages

  9. Theorizing Deception: A Scoping Review of Theory in Research on Dark Patterns and Deceptive Design

    Authors: Weichen Joe Chang, Katie Seaborn, Andrew A. Adams

    Abstract: The issue of dark patterns and deceptive designs (DPs) in everyday interfaces and interactions continues to grow. DPs are manipulative and malicious elements within user interfaces that deceive users into making unintended choices. In parallel, research on DPs has significantly increased over the past two decades. As the field has matured, epistemological gaps have also become a salient and pressi…

    Submitted 13 May, 2024; originally announced May 2024.

    Journal ref: CHI EA '24: Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (2024), Article No.: 321, 1-7

  10. arXiv:2405.04943  [pdf, ps, other]

    cs.CV

    Unsupervised Skin Feature Tracking with Deep Neural Networks

    Authors: Jose Chang, Torbjörn E. M. Nordling

    Abstract: Facial feature tracking is essential in imaging ballistocardiography for accurate heart rate estimation and enables motor degradation quantification in Parkinson's disease through skin feature tracking. While deep convolutional neural networks have shown remarkable accuracy in tracking tasks, they typically require extensive labeled data for supervised training. Our proposed pipeline employs a con…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2112.14159

  11. arXiv:2404.17486  [pdf, other]

    cs.CV

    TextGaze: Gaze-Controllable Face Generation with Natural Language

    Authors: Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang

    Abstract: Generating face image with specific gaze information has attracted considerable attention. Existing approaches typically input gaze values directly for face generation, which is unnatural and requires annotated gaze datasets for training, thereby limiting its application. In this paper, we present a novel gaze-controllable face generation task. Our approach inputs textual descriptions that describ…

    Submitted 26 April, 2024; originally announced April 2024.

    Comments: Under review

  12. arXiv:2404.16767  [pdf, other]

    cs.LG cs.CL cs.CV

    REBEL: Reinforcement Learning via Regressing Relative Rewards

    Authors: Zhaolin Gao, Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Gokul Swamy, Kianté Brantley, Thorsten Joachims, J. Andrew Bagnell, Jason D. Lee, Wen Sun

    Abstract: While originally developed for continuous control problems, Proximal Policy Optimization (PPO) has emerged as the work-horse of a variety of reinforcement learning (RL) applications, including the fine-tuning of generative models. Unfortunately, PPO requires multiple heuristics to enable stable convergence (e.g. value networks, clipping), and is notorious for its sensitivity to the precise impleme…

    Submitted 29 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: New experimental results on general chat

  13. arXiv:2404.08513  [pdf, other]

    cs.LG cs.AI

    Adversarial Imitation Learning via Boosting

    Authors: Jonathan D. Chang, Dhruv Sreenivas, Yingbing Huang, Kianté Brantley, Wen Sun

    Abstract: Adversarial imitation learning (AIL) has stood out as a dominant framework across various imitation learning (IL) applications, with Discriminator Actor Critic (DAC) (Kostrikov et al., 2019) demonstrating the effectiveness of off-policy learning algorithms in improving sample efficiency and scalability to higher-dimensional observations. Despite DAC's empirical success, the original AIL objective…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 19 pages, 7 figures, 4 tables, 3 algorithms, ICLR 2024

  14. arXiv:2404.08495  [pdf, other]

    cs.LG cs.AI cs.CL

    Dataset Reset Policy Optimization for RLHF

    Authors: Jonathan D. Chang, Wenhao Zhan, Owen Oertell, Kianté Brantley, Dipendra Misra, Jason D. Lee, Wen Sun

    Abstract: Reinforcement Learning (RL) from Human Preference-based feedback is a popular paradigm for fine-tuning generative models, which has produced impressive models such as GPT-4 and Claude3 Opus. This framework often consists of two steps: learning a reward model from an offline preference dataset followed by running online RL to optimize the learned reward model. In this work, leveraging the idea of r…

    Submitted 16 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

    Comments: 28 pages, 6 tables, 3 Figures, 3 Algorithms

  15. arXiv:2404.03673  [pdf, other]

    cs.CV cs.AI cs.LG

    RL for Consistency Models: Faster Reward Guided Text-to-Image Generation

    Authors: Owen Oertell, Jonathan D. Chang, Yiyi Zhang, Kianté Brantley, Wen Sun

    Abstract: Reinforcement learning (RL) has improved guided image generation with diffusion models by directly optimizing rewards that capture image quality, aesthetics, and instruction following capabilities. However, the resulting generative policies inherit the same iterative sampling process of diffusion models that causes slow generation. To overcome this limitation, consistency models proposed learning…

    Submitted 22 June, 2024; v1 submitted 25 March, 2024; originally announced April 2024.

    Comments: 18 pages, 9 figures, 1 table

  16. arXiv:2403.19632  [pdf, other]

    cs.CV

    GauStudio: A Modular Framework for 3D Gaussian Splatting and Beyond

    Authors: Chongjie Ye, Yinyu Nie, Jiahao Chang, Yuantao Chen, Yihao Zhi, Xiaoguang Han

    Abstract: We present GauStudio, a novel modular framework for modeling 3D Gaussian Splatting (3DGS) to provide standardized, plug-and-play components for users to easily customize and implement a 3DGS pipeline. Supported by our framework, we propose a hybrid Gaussian representation with foreground and skyball background models. Experiments demonstrate this representation reduces artifacts in unbounded outdo…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Code: https://github.com/GAP-LAB-CUHK-SZ/gaustudio

  17. arXiv:2403.17428  [pdf, other]

    cs.AI cs.CL

    Aligning Large Language Models for Enhancing Psychiatric Interviews through Symptom Delineation and Summarization

    Authors: Jae-hee So, Joonhwan Chang, Eunji Kim, Junho Na, JiYeon Choi, Jy-yong Sohn, Byung-Hoon Kim, Sang Hui Chu

    Abstract: Recent advancements in Large Language Models (LLMs) have accelerated their usage in various domains. Given the fact that psychiatric interviews are goal-oriented and structured dialogues between the professional interviewer and the interviewee, it is one of the most underexplored areas where LLMs can contribute substantial value. Here, we explore the use of LLMs for enhancing psychiatric interview…

    Submitted 26 March, 2024; originally announced March 2024.

  18. arXiv:2403.16428  [pdf, other]

    cs.CV

    Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

    Authors: Zicong Fan, Takehiko Ohkawa, Linlin Yang, Nie Lin, Zhishan Zhou, Shihao Zhou, Jiajun Liang, Zhong Gao, Xuanyang Zhang, Xue Zhang, Fei Li, Liu Zheng, Feng Lu, Karim Abou Zeid, Bastian Leibe, Jeongwan On, Seungryul Baek, Aditya Prakash, Saurabh Gupta, Kun He, Yoichi Sato, Otmar Hilliges, Hyung Jin Chang, Angela Yao

    Abstract: We interact with the world with our hands and see it through our own (egocentric) perspective. A holistic 3D understanding of such interactions from egocentric views is important for tasks in robotics, AR/VR, action recognition and motion generation. Accurately reconstructing such interactions in 3D is challenging due to heavy occlusion, viewpoint bias, camera distortion, and motion blur from the…

    Submitted 25 March, 2024; originally announced March 2024.

  19. arXiv:2403.15943  [pdf, ps, other]

    cs.CV

    Advanced Feature Manipulation for Enhanced Change Detection Leveraging Natural Language Models

    Authors: Zhenglin Li, Yangchen Huang, Mengran Zhu, Jingyu Zhang, JingHao Chang, Houze Liu

    Abstract: Change detection is a fundamental task in computer vision that processes a bi-temporal image pair to differentiate between semantically altered and unaltered regions. Large language models (LLMs) have been utilized in various domains for their exceptional feature extraction capabilities and have shown promise in numerous downstream applications. In this study, we harness the power of a pre-trained…

    Submitted 13 June, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

    Comments: This version is not our full version based on our new progress, related data, and methodology we are dealing with, and based on the rules and the laws, we are adjusting our current version

  20. arXiv:2403.15664  [pdf, other]

    cs.CV

    What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

    Authors: Yihua Cheng, Yaning Zhu, Zongji Wang, Hongquan Hao, Yongwei Liu, Shiqing Cheng, Xi Wang, Hyung Jin Chang

    Abstract: Driver's eye gaze holds a wealth of cognitive and intentional cues crucial for intelligent vehicles. Despite its significance, research on in-vehicle gaze estimation remains limited due to the scarcity of comprehensive and well-annotated datasets in real driving scenarios. In this paper, we present three novel elements to advance in-vehicle gaze research. Firstly, we introduce IVGaze, a pioneering…

    Submitted 22 March, 2024; originally announced March 2024.

    Comments: CVPR24

  21. A Design Space for Intelligent and Interactive Writing Assistants

    Authors: Mina Lee, Katy Ilonka Gero, John Joon Young Chung, Simon Buckingham Shum, Vipul Raheja, Hua Shen, Subhashini Venugopalan, Thiemo Wambsganss, David Zhou, Emad A. Alghamdi, Tal August, Avinash Bhat, Madiha Zahrah Choksi, Senjuti Dutta, Jin L. C. Guo, Md Naimul Hoque, Yewon Kim, Simon Knight, Seyed Parsa Neshaei, Agnia Sergeyuk, Antonette Shibani, Disha Shrivastava, Lila Shroff, Jessi Stark, Sarah Sterman, et al. (11 additional authors not shown)

    Abstract: In our era of rapid technological advancement, the research landscape for writing assistants has become increasingly fragmented across various research communities. We seek to address this challenge by proposing a design space as a structured way to examine and explore the multidimensional space of intelligent and interactive writing assistants. Through a large community collaboration, we explore…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Published as a conference paper at CHI 2024

  22. arXiv:2403.13551  [pdf, other]

    cs.CV cs.LG

    Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing

    Authors: Hangeol Chang, Jinho Chang, Jong Chul Ye

    Abstract: Despite recent advancements in text-to-image diffusion models facilitating various image editing techniques, complex text prompts often lead to an oversight of some requests due to a bottleneck in processing text information. To tackle this challenge, we present Ground-A-Score, a simple yet powerful model-agnostic image editing method by incorporating grounding during score distillation. This appr…

    Submitted 20 March, 2024; originally announced March 2024.

  23. arXiv:2403.12002  [pdf, other]

    cs.CV cs.AI

    DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing

    Authors: Hyeonho Jeong, Jinho Chang, Geon Yeong Park, Jong Chul Ye

    Abstract: Text-driven diffusion-based video editing presents a unique challenge not encountered in image editing literature: establishing real-world motion. Unlike existing video editing approaches, here we focus on score distillation sampling to circumvent the standard reverse diffusion process and initiate optimization from videos that already exhibit natural motion. Our analysis reveals that while video…

    Submitted 15 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to ECCV 2024, Project page: https://hyeonho99.github.io/dreammotion/

  24. arXiv:2403.10301  [pdf, other]

    cs.CL cs.CV

    Uni-SMART: Universal Science Multimodal Analysis and Research Transformer

    Authors: Hengxing Cai, Xiaochen Cai, Shuwen Yang, Jiankun Wang, Lin Yao, Zhifeng Gao, Junhan Chang, Sihang Li, Mingjun Xu, Changxin Wang, Hongshuai Wang, Yongge Li, Mujie Lin, Yaqi Li, Yuqi Yin, Linfeng Zhang, Guolin Ke

    Abstract: In scientific research and its application, scientific literature analysis is crucial as it allows researchers to build on the work of others. However, the fast growth of scientific knowledge has led to a massive increase in scholarly articles, making in-depth literature analysis increasingly challenging and time-consuming. The emergence of Large Language Models (LLMs) has offered a new way to add…

    Submitted 15 June, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  25. arXiv:2403.06225  [pdf, other]

    cs.CV cs.AI

    MoST: Motion Style Transformer between Diverse Action Contents

    Authors: Boeun Kim, Jungho Kim, Hyung Jin Chang, Jin Young Choi

    Abstract: While existing motion style transfer methods are effective between two motions with identical content, their performance significantly diminishes when transferring style between motions with different contents. This challenge lies in the lack of clear separation between content and style of a motion. To tackle this challenge, we propose a novel motion style transformer that effectively disentangle…

    Submitted 20 March, 2024; v1 submitted 10 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  26. arXiv:2403.05268  [pdf, ps, other]

    cs.CL cs.LG

    Deep Prompt Multi-task Network for Abuse Language Detection

    Authors: Jian Zhu, Yuping Ruan, Jingfei Chang, Wenhui Sun, Hui Wan, Jian Long, Cheng Luo

    Abstract: The detection of abusive language remains a long-standing challenge with the extensive use of social networks. The detection task of abusive language suffers from limited accuracy. We argue that the existing detection methods utilize the fine-tuning technique of the pre-trained language models (PLMs) to handle downstream tasks. Hence, these methods fail to stimulate the general knowledge of the PL…

    Submitted 24 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted by the International Conference on Pattern Recognition (ICPR) 2024

  27. arXiv:2403.04652  [pdf, other]

    cs.CL cs.AI

    Yi: Open Foundation Models by 01.AI

    Authors: 01. AI, :, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, et al. (7 additional authors not shown)

    Abstract: We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU,…

    Submitted 7 March, 2024; originally announced March 2024.

  28. arXiv:2403.02939  [pdf, other]

    cs.DL cs.AI cs.CL cs.HC

    PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers

    Authors: Yoonjoo Lee, Hyeonsu B. Kang, Matt Latzke, Juho Kim, Jonathan Bragg, Joseph Chee Chang, Pao Siangliulue

    Abstract: With the rapid growth of scholarly archives, researchers subscribe to "paper alert" systems that periodically provide them with recommendations of recently published papers that are similar to previously collected papers. However, researchers sometimes struggle to make sense of nuanced connections between recommended papers and their own research context, as existing systems only present paper tit…

    Submitted 9 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Accepted to CHI 2024

  29. arXiv:2403.01976  [pdf, other]

    cs.CL

    SciAssess: Benchmarking LLM Proficiency in Scientific Literature Analysis

    Authors: Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Changxin Wang, Zhifeng Gao, Hongshuai Wang, Yongge Li, Mujie Lin, Shuwen Yang, Jiankun Wang, Mingjun Xu, Jin Huang, Fang Xi, Jiaxi Zhuang, Yuqi Yin, Yaqi Li, Changhong Chen, Zheng Cheng, Zifeng Zhao, Linfeng Zhang, Guolin Ke

    Abstract: Recent breakthroughs in Large Language Models (LLMs) have revolutionized natural language understanding and generation, sparking significant interest in applying them to scientific literature analysis. However, existing benchmarks fail to adequately evaluate the proficiency of LLMs in this domain, particularly in scenarios requiring higher-level abilities beyond mere memorization and the handling…

    Submitted 18 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

  30. arXiv:2403.01513  [pdf]

    eess.IV cs.CV

    CDSE-UNet: Enhancing COVID-19 CT Image Segmentation with Canny Edge Detection and Dual-Path SENet Feature Fusion

    Authors: Jiao Ding, Jie Chang, Renrui Han, Li Yang

    Abstract: Accurate segmentation of COVID-19 CT images is crucial for reducing the severity and mortality rates associated with COVID-19 infections. In response to blurred boundaries and high variability characteristic of lesion areas in COVID-19 CT images, we introduce CDSE-UNet: a novel UNet-based segmentation model that integrates Canny operator edge detection and a dual-path SENet feature fusion mechanis…

    Submitted 3 March, 2024; originally announced March 2024.

  31. arXiv:2402.18362  [pdf, other]

    cs.CV cs.AI

    Objective and Interpretable Breast Cosmesis Evaluation with Attention Guided Denoising Diffusion Anomaly Detection Model

    Authors: Sangjoon Park, Yong Bae Kim, Jee Suk Chang, Seo Hee Choi, Hyungjin Chung, Ik Jae Lee, Hwa Kyung Byun

    Abstract: As advancements in the field of breast cancer treatment continue to progress, the assessment of post-surgical cosmetic outcomes has gained increasing significance due to its substantial impact on patients' quality of life. However, evaluating breast cosmesis presents challenges due to the inherently subjective nature of expert labeling. In this study, we present a novel automated approach, Attenti…

    Submitted 28 February, 2024; originally announced February 2024.

  32. Mitigating Barriers to Public Social Interaction with Meronymous Communication

    Authors: Nouran Soliman, Hyeonsu B Kang, Matthew Latzke, Jonathan Bragg, Joseph Chee Chang, Amy X. Zhang, David R Karger

    Abstract: In communities with social hierarchies, fear of judgment can discourage communication. While anonymity may alleviate some social pressure, fully anonymous spaces enable toxic behavior and hide the social context that motivates people to participate and helps them tailor their communication. We explore a design space of meronymous communication, where people can reveal carefully chosen aspects of t…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI '24), May 11--16, 2024, Honolulu, HI, USA

  33. arXiv:2402.12613  [pdf, other]

    cs.LG

    Analysis of Using Sigmoid Loss for Contrastive Learning

    Authors: Chungpa Lee, Joonhwan Chang, Jy-yong Sohn

    Abstract: Contrastive learning has emerged as a prominent branch of self-supervised learning for several years. Especially, CLIP, which applies contrastive learning to large sets of captioned images, has garnered significant attention. Recently, SigLIP, a variant of CLIP, has been proposed, which uses the sigmoid loss instead of the standard InfoNCE loss. SigLIP achieves the performance comparable to CLIP i…

    Submitted 19 February, 2024; originally announced February 2024.

    Journal ref: Proceedings of the 27th International Conference on Artificial Intelligence and Statistics (AISTATS) 2024, Valencia, Spain

  34. arXiv:2402.08151  [pdf, other]

    stat.ME cs.AI cs.LG math.SP math.ST

    Gradient-flow adaptive importance sampling for Bayesian leave one out cross-validation for sigmoidal classification models

    Authors: Joshua C Chang, Xiangting Li, Shixin Xu, Hao-Ren Yao, Julia Porcino, Carson Chow

    Abstract: We introduce a set of gradient-flow-guided adaptive importance sampling (IS) transformations to stabilize Monte-Carlo approximations of point-wise leave one out cross-validated (LOO) predictions for Bayesian classification models. One can leverage this methodology for assessing model generalizability by for instance computing a LOO analogue to the AIC or computing LOO ROC/PRC curves and derived me…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: Submitted

  35. arXiv:2402.07788  [pdf, other]

    cs.CL

    Multi-Intent Attribute-Aware Text Matching in Searching

    Authors: Mingzhe Li, Xiuying Chen, Jing Xiang, Qishen Zhang, Changsheng Ma, Chenchen Dai, Jinxiong Chang, Zhongyi Liu, Guannan Zhang

    Abstract: Text matching systems have become a fundamental service in most searching platforms. For instance, they are responsible for matching user queries to relevant candidate items, or rewriting the user-input query to a pre-selected high-performing one for a better search experience. In practice, both the queries and items often contain multiple attributes, such as the category of the item and the locat…

    Submitted 12 February, 2024; originally announced February 2024.

    Comments: 9 pages

  36. arXiv:2402.05532  [pdf, other]

    cs.CV

    NCRF: Neural Contact Radiance Fields for Free-Viewpoint Rendering of Hand-Object Interaction

    Authors: Zhongqun Zhang, Jifei Song, Eduardo Pérez-Pellitero, Yiren Zhou, Hyung Jin Chang, Aleš Leonardis

    Abstract: Modeling hand-object interactions is a fundamentally challenging task in 3D computer vision. Despite remarkable progress that has been achieved in this field, existing methods still fail to synthesize the hand-object interaction photo-realistically, suffering from degraded rendering quality caused by the heavy mutual occlusions between the hand and the object, and inaccurate hand-object pose estim…

    Submitted 9 February, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted by 3DV 2024

  37. arXiv:2402.01460  [pdf, other]

    stat.ML cs.LG

    Deep conditional distribution learning via conditional Föllmer flow

    Authors: Jinyuan Chang, Zhao Ding, Yuling Jiao, Ruoxuan Li, Jerry Zhijian Yang

    Abstract: We introduce an ordinary differential equation (ODE) based deep generative method for learning conditional distributions, named Conditional Föllmer Flow. Starting from a standard Gaussian distribution, the proposed flow could approximate the target conditional distribution very well when the time is close to 1. For effective implementation, we discretize the flow with Euler's method where we estim…

    Submitted 13 June, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

    Comments: The original title of this paper is "Deep Conditional Generative Learning: Model and Error Analysis"

  38. arXiv:2402.01220  [pdf, other]

    cs.CV cs.CR

    Delving into Decision-based Black-box Attacks on Semantic Segmentation

    Authors: Zhaoyu Chen, Zhengyang Shan, Jingwen Chang, Kaixun Jiang, Dingkang Yang, Yiting Cheng, Wenqiang Zhang

    Abstract: Semantic segmentation is a fundamental visual task that finds extensive deployment in applications with security-sensitive considerations. Nonetheless, recent work illustrates the adversarial vulnerability of semantic segmentation models to white-box attacks. However, its adversarial robustness against black-box attacks has not been fully explored. In this paper, we present the first exploration o…

    Submitted 2 February, 2024; originally announced February 2024.

  39. arXiv:2401.09490  [pdf, other]

    q-bio.QM cs.IR

    Gene-associated Disease Discovery Powered by Large Language Models

    Authors: Jiayu Chang, Shiyu Wang, Chen Ling, Zhaohui Qin, Liang Zhao

    Abstract: The intricate relationship between genetic variation and human diseases has been a focal point of medical research, evidenced by the identification of risk genes regarding specific diseases. The advent of advanced genome sequencing techniques has significantly improved the efficiency and cost-effectiveness of detecting these genetic markers, playing a crucial role in disease diagnosis and forming…

    Submitted 16 January, 2024; originally announced January 2024.

    Comments: This is the official paper accepted by AAAI 2024 Workshop on Large Language Models for Biological Discoveries

  40. arXiv:2401.08036  [pdf, other]

    cs.CV

    3D Lane Detection from Front or Surround-View using Joint-Modeling & Matching

    Authors: Haibin Zhou, Huabing Zhou, Jun Chang, Tao Lu, Jiayi Ma

    Abstract: 3D lanes offer a more comprehensive understanding of the road surface geometry than 2D lanes, thereby providing crucial references for driving decisions and trajectory planning. While many efforts aim to improve prediction accuracy, we recognize that an efficient network can bring results closer to lane modeling. However, if the modeling data is imprecise, the results might not accurately capture…

    Submitted 28 May, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted to IEEE Transactions on Intelligent Vehicles (T-IV). 13 pages with 9 figures and 6 tables

  41. arXiv:2312.12458  [pdf, other]

    cs.CL cs.AI

    When Parameter-efficient Tuning Meets General-purpose Vision-language Models

    Authors: Yihang Zhai, Haixin Wang, Jianlong Chang, Xinlong Yang, Jinan Sun, Shikun Zhang, Qi Tian

    Abstract: Instruction tuning has shown promising potential for developing general-purpose AI capabilities by using large-scale pre-trained models and boosts growing research to integrate multimodal information for creative applications. However, existing works still face two main limitations: the high training costs and heavy computing resource dependence of full model fine-tuning, and the lack of semantic…

    Submitted 16 December, 2023; originally announced December 2023.

  42. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  43. LabelCraft: Empowering Short Video Recommendations with Automated Label Crafting

    Authors: Yimeng Bai, Yang Zhang, Jing Lu, Jianxin Chang, Xiaoxue Zang, Yanan Niu, Yang Song, Fuli Feng

    Abstract: Short video recommendations often face limitations due to the quality of user feedback, which may not accurately depict user interests. To tackle this challenge, a new task has emerged: generating more dependable labels from original feedback. Existing label generation methods rely on manual rules, demanding substantial human effort and potentially misaligning with the desired objectives of the pl…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by WSDM'24

    ACM Class: H.3.3; H.3.5

  44. arXiv:2312.09527  [pdf, other]

    cs.CV cs.GR

    TIFace: Improving Facial Reconstruction through Tensorial Radiance Fields and Implicit Surfaces

    Authors: Ruijie Zhu, Jiahao Chang, Ziyang Song, Jiahuan Yu, Tianzhu Zhang

    Abstract: This report describes the solution that secured the first place in the "View Synthesis Challenge for Human Heads (VSCHH)" at the ICCV 2023 workshop. Given the sparse view images of human heads, the objective of this challenge is to synthesize images from novel viewpoints. Due to the complexity of textures on the face and the impact of lighting, the baseline method TensoRF yields results with signi…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: 1st place solution in the View Synthesis Challenge for Human Heads (VSCHH) at the ICCV 2023 workshop

  45. arXiv:2311.18168  [pdf, other]

    cs.CV cs.LG eess.AS

    Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications

    Authors: Karren D. Yang, Anurag Ranjan, Jen-Hao Rick Chang, Raviteja Vemulapalli, Oncel Tuzel

    Abstract: We consider the task of animating 3D facial geometry from speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping from speech signal to 3D face meshes on small datasets with limited speakers. While these models can achieve high-quality lip articulation for speakers in the training set, they are unable to capture the full and diverse distribution of 3D f…

    Submitted 29 November, 2023; originally announced November 2023.

  46. arXiv:2311.17910  [pdf, other]

    cs.CV cs.GR

    HUGS: Human Gaussian Splats

    Authors: Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel, Oncel Tuzel, Anurag Ranjan

    Abstract: Recent advances in neural rendering have improved both training and rendering times by orders of magnitude. While these methods demonstrate state-of-the-art quality and speed, they are designed for photogrammetry of static scenes and do not generalize well to freely moving humans in the environment. In this work, we introduce Human Gaussian Splats (HUGS) that represents an animatable human togethe…

    Submitted 29 November, 2023; originally announced November 2023.

  47. arXiv:2311.11163  [pdf, other]

    cs.SI stat.AP stat.CO

    Hate speech and hate crimes: a data-driven study of evolving discourse around marginalized groups

    Authors: Malvina Bozhidarova, Jonathn Chang, Aaishah Ale-rasool, Yuxiang Liu, Chongyao Ma, Andrea L. Bertozzi, P. Jeffrey Brantingham, Junyuan Lin, Sanjukta Krishnagopal

    Abstract: This study explores the dynamic relationship between online discourse, as observed in tweets, and physical hate crimes, focusing on marginalized groups. Leveraging natural language processing techniques, including keyword extraction and topic modeling, we analyze the evolution of online discourse after events affecting these groups. Examining sentiment and polarizing tweets, we establish correlati…

    Submitted 18 November, 2023; originally announced November 2023.

  48. arXiv:2311.09481  [pdf, other]

    cs.CL

    Personalized Jargon Identification for Enhanced Interdisciplinary Communication

    Authors: Yue Guo, Joseph Chee Chang, Maria Antoniak, Erin Bransom, Trevor Cohen, Lucy Lu Wang, Tal August

    Abstract: Scientific jargon can impede researchers when they read materials from other domains. Current methods of jargon identification mainly use corpus-level familiarity indicators (e.g., Simple Wikipedia represents plain language). However, researchers' familiarity of a term can vary greatly based on their own background. We collect a dataset of over 10K term familiarity annotations from 11 computer sci…

    Submitted 15 November, 2023; originally announced November 2023.

  49. arXiv:2311.08302  [pdf, other]

    cs.IR

    Inverse Learning with Extremely Sparse Feedback for Recommendation

    Authors: Guanyu Lin, Chen Gao, Yu Zheng, Yinfeng Li, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li

    Abstract: Modern personalized recommendation services often rely on user feedback, either explicit or implicit, to improve the quality of services. Explicit feedback refers to behaviors like ratings, while implicit feedback refers to behaviors like user clicks. However, in the scenario of full-screen video viewing experiences like Tiktok and Reels, the click action is absent, resulting in unclear feedback f…

    Submitted 20 November, 2023; v1 submitted 14 November, 2023; originally announced November 2023.

    Comments: WSDM 2024

  50. arXiv:2311.08272  [pdf, other]

    cs.IR cs.LG

    Mixed Attention Network for Cross-domain Sequential Recommendation

    Authors: Guanyu Lin, Chen Gao, Yu Zheng, Jianxin Chang, Yanan Niu, Yang Song, Kun Gai, Zhiheng Li, Depeng Jin, Yong Li, Meng Wang

    Abstract: In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, which suffers from data sparsity issues, especially for new users. One promising line of work is the cross-domain recommendation, which trains models with data across multiple domains to improve the performance in data-scarce domains. Recent proposed cross-domain…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: WSDM 2024