Showing 1–50 of 199 results for author: Wen, B

  1. arXiv:2407.08865  [pdf, other]

    cs.CV

    Single-Image Shadow Removal Using Deep Learning: A Comprehensive Survey

    Authors: Lanqing Guo, Chong Wang, Yufei Wang, Siyu Huang, Wenhan Yang, Alex C. Kot, Bihan Wen

    Abstract: Shadow removal aims at restoring the image content within shadow regions, pursuing a uniform distribution of illumination that is consistent between shadow and non-shadow regions. Compared to other image restoration tasks, shadow removal poses two unique challenges: 1) The patterns of shadows are arbitrary, varied, and often have highly complex trace structures, making "trace-less" ima…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: url: https://github.com/GuoLanqing/Awesome-Shadow-Removal

  2. arXiv:2407.08028  [pdf, other]

    cs.RO

    AutoMate: Specialist and Generalist Assembly Policies over Diverse Geometries

    Authors: Bingjie Tang, Iretiayo Akinola, Jie Xu, Bowen Wen, Ankur Handa, Karl Van Wyk, Dieter Fox, Gaurav S. Sukhatme, Fabio Ramos, Yashraj Narang

    Abstract: Robotic assembly for high-mixture settings requires adaptivity to diverse parts and poses, which is an open challenge. Meanwhile, in other areas of robotics, large models and sim-to-real have led to tremendous progress. Inspired by such work, we present AutoMate, a learning framework and system that consists of 4 parts: 1) a dataset of 100 assemblies compatible with simulation and the real world,…

    Submitted 10 July, 2024; originally announced July 2024.

  3. arXiv:2407.06600  [pdf, other]

    cs.CV

    Integrating Clinical Knowledge into Concept Bottleneck Models

    Authors: Winnie Pang, Xueyi Ke, Satoshi Tsutsui, Bihan Wen

    Abstract: Concept bottleneck models (CBMs), which predict human-interpretable concepts (e.g., nucleus shapes in cell images) before predicting the final output (e.g., cell type), provide insights into the decision-making processes of the model. However, training CBMs solely in a data-driven manner can introduce undesirable biases, which may compromise prediction performance, especially when the trained mode…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI2024

  4. arXiv:2407.03978  [pdf, other]

    cs.CL cs.AI

    Benchmarking Complex Instruction-Following with Multiple Constraints Composition

    Authors: Bosi Wen, Pei Ke, Xiaotao Gu, Lindong Wu, Hao Huang, Jinfeng Zhou, Wenchuang Li, Binxin Hu, Wendy Gao, Jiaxin Xu, Yiming Liu, Jie Tang, Hongning Wang, Minlie Huang

    Abstract: Instruction following is one of the fundamental capabilities of large language models (LLMs). As the ability of LLMs is constantly improving, they have been increasingly applied to deal with complex human instructions in real-world scenarios. Therefore, how to evaluate the ability of complex instruction-following of LLMs has become a critical research problem. Existing benchmarks mainly focus on m…

    Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: 20 pages, 7 figures

  5. arXiv:2407.01067  [pdf, other]

    cs.AI cs.CL cs.CV cs.HC cs.LG

    Human-like object concept representations emerge naturally in multimodal large language models

    Authors: Changde Du, Kaicheng Fu, Bincheng Wen, Yi Sun, Jie Peng, Wei Wei, Ying Gao, Shengpei Wang, Chuncheng Zhang, Jinpeng Li, Shuang Qiu, Le Chang, Huiguang He

    Abstract: The conceptualization and categorization of natural objects in the human mind have long intrigued cognitive scientists and neuroscientists, offering crucial insights into human perception and cognition. Recently, the rapid development of Large Language Models (LLMs) has raised the attractive question of whether these models can also develop human-like object representations through exposure to vas…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2407.00820  [pdf]

    cs.RO

    Localization and Perception for Control of a Low Speed Autonomous Shuttle in a Campus Pilot Deployment

    Authors: Bowen Wen

    Abstract: Future SAE Level 4 and Level 5 autonomous vehicles will require novel applications of localization, perception, control and artificial intelligence technology in order to offer innovative and disruptive solutions to current mobility problems. Accurate localization is essential for self-driving vehicle navigation in GPS-inaccessible environments. This thesis concentrates on low-speed autonomous shu…

    Submitted 2 April, 2024; originally announced July 2024.

    Comments: Master's thesis, ADL & GDA, The Ohio State University, 2014

  7. arXiv:2406.13659  [pdf, other]

    cs.AI

    Leveraging Large Language Models for Patient Engagement: The Power of Conversational AI in Digital Health

    Authors: Bo Wen, Raquel Norel, Julia Liu, Thaddeus Stappenbeck, Farhana Zulkernine, Huamin Chen

    Abstract: The rapid advancements in large language models (LLMs) have opened up new opportunities for transforming patient engagement in healthcare through conversational AI. This paper presents an overview of the current landscape of LLMs in healthcare, specifically focusing on their applications in analyzing and generating conversations for improved patient engagement. We showcase the power of LLMs in han…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 10 pages, 6 figures, ICDH 2024 invited paper

  8. arXiv:2406.10543  [pdf, other]

    cs.CV cs.AI

    NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows

    Authors: Zhenggang Tang, Zhongzheng Ren, Xiaoming Zhao, Bowen Wen, Jonathan Tremblay, Stan Birchfield, Alexander Schwing

    Abstract: We present a method for automatically modifying a NeRF representation based on a single observation of a non-rigid transformed version of the original scene. Our method defines the transformation as a 3D flow, specifically as a weighted linear blending of rigid transformations of 3D anchor points that are defined on the surface of the scene. In order to identify anchor points, we introduce a novel…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 8 pages of main paper, CVPR 2024. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024

  9. arXiv:2406.10462  [pdf, other]

    cs.CV

    CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation

    Authors: Wei Chen, Lin Li, Yongqi Yang, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen

    Abstract: Interleaved image-text generation has emerged as a crucial multimodal task, aiming at creating sequences of interleaved visual and textual content given a query. Despite notable advancements in recent multimodal large language models (MLLMs), generating integrated image-text sequences that exhibit narrative coherence and entity and style consistency remains challenging due to poor training data qu…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 22 pages

  10. arXiv:2406.08300  [pdf, other]

    eess.IV cs.CV

    From Chaos to Clarity: 3DGS in the Dark

    Authors: Zhihao Li, Yufei Wang, Alex Kot, Bihan Wen

    Abstract: Novel view synthesis from raw images provides superior high dynamic range (HDR) information compared to reconstructions from low dynamic range RGB images. However, the inherent noise in unprocessed raw images compromises the accuracy of 3D scene representation. Our study reveals that 3D Gaussian Splatting (3DGS) is particularly susceptible to this noise, leading to numerous elongated Gaussian shap…

    Submitted 12 June, 2024; originally announced June 2024.

  11. arXiv:2406.06843  [pdf, other]

    cs.CV

    HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction

    Authors: Jikai Wang, Qifan Zhang, Yu-Wei Chao, Bowen Wen, Xiaohu Guo, Yu Xiang

    Abstract: We introduce a data capture system and a new dataset named HO-Cap that can be used to study 3D reconstruction and pose tracking of hands and objects in videos. The capture system uses multiple RGB-D cameras and a HoloLens headset for data collection, avoiding the use of expensive 3D scanners or mocap systems. We propose a semi-automatic method to obtain annotations of shape and pose of hands and o…

    Submitted 16 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

  12. arXiv:2406.05955  [pdf, other]

    cs.LG cs.CL

    Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters

    Authors: Yixin Song, Haotong Xie, Zhengyan Zhang, Bo Wen, Li Ma, Zeyu Mi, Haibo Chen

    Abstract: Exploiting activation sparsity is a promising approach to significantly accelerating the inference process of large language models (LLMs) without compromising performance. However, activation sparsity is determined by activation functions, and commonly used ones like SwiGLU and GeGLU exhibit limited sparsity. Simply replacing these functions with ReLU fails to achieve sufficient sparsity. Moreove…

    Submitted 10 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  13. arXiv:2405.20721  [pdf, other]

    cs.CV cs.AI

    ContextGS: Compact 3D Gaussian Splatting with Anchor Level Context Model

    Authors: Yufei Wang, Zhihao Li, Lanqing Guo, Wenhan Yang, Alex C. Kot, Bihan Wen

    Abstract: Recently, 3D Gaussian Splatting (3DGS) has become a promising framework for novel view synthesis, offering fast rendering speeds and high fidelity. However, the large number of Gaussians and their associated attributes require effective compression techniques. Existing methods primarily compress neural Gaussians individually and independently, i.e., coding all the neural Gaussians at the same time…

    Submitted 31 May, 2024; originally announced May 2024.

  14. arXiv:2405.19996  [pdf, other]

    cs.CV cs.AI

    DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild

    Authors: Honghao Fu, Yufei Wang, Wenhan Yang, Bihan Wen

    Abstract: Image quality assessment (IQA) plays a critical role in selecting high-quality images and guiding compression and enhancement methods in a series of applications. Blind IQA, which assesses the quality of in-the-wild images containing complex authentic distortions without reference images, poses greater challenges. Existing methods are limited to modeling a uniform distribution with local patch…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

  15. arXiv:2405.16820  [pdf, other]

    cs.LG cs.AI cs.CY cs.HC

    Laboratory-Scale AI: Open-Weight Models are Competitive with ChatGPT Even in Low-Resource Settings

    Authors: Robert Wolfe, Isaac Slaughter, Bin Han, Bingbing Wen, Yiwei Yang, Lucas Rosenblatt, Bernease Herman, Eva Brown, Zening Qu, Nic Weber, Bill Howe

    Abstract: The rapid proliferation of generative AI has raised questions about the competitiveness of lower-parameter, locally tunable, open-weight models relative to high-parameter, API-guarded, closed-weight models in terms of performance, domain adaptation, cost, and generalization. Centering under-resourced yet risk-intolerant settings in government, research, and healthcare, we see for-profit closed-wei…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted at the ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024

  16. arXiv:2405.16295  [pdf, other]

    cs.CL cs.LG

    Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data

    Authors: Yuhao Chen, Zhimu Wang, Bo Wen, Farhana Zulkernine

    Abstract: Unstructured text in medical notes and dialogues contains rich information. Recent advancements in Large Language Models (LLMs) have demonstrated superior performance in question answering and summarization tasks on unstructured text data, outperforming traditional text analysis approaches. However, there is a lack of scientific studies in the literature that methodically evaluate and report on th…

    Submitted 29 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  17. arXiv:2405.11852  [pdf, other]

    cs.CV

    Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models

    Authors: Xiyu Wang, Yufei Wang, Satoshi Tsutsui, Weisi Lin, Bihan Wen, Alex C. Kot

    Abstract: Diffusion-based models for story visualization have shown promise in generating content-coherent images for storytelling tasks. However, how to effectively integrate new characters into existing narratives while maintaining character consistency remains an open problem, particularly with limited data. Two major limitations hinder the progress: (1) the absence of a suitable benchmark due to potenti…

    Submitted 20 May, 2024; originally announced May 2024.

  18. arXiv:2405.09364  [pdf, other]

    gr-qc hep-ex hep-ph

    Orbital Stability Study of the Taiji Space Gravitational Wave Detector

    Authors: Yu-Yang Zhang, Geng Li, Bo Wen

    Abstract: Space-based gravitational wave detection is extremely sensitive to disturbances. The Keplerian configuration cannot accurately reflect the variations in spacecraft configuration. Planetary gravitational disturbances are one of the main sources. Numerical simulation is an effective method to investigate the impact of perturbation on spacecraft orbits. This study shows that, in the context of the Ta…

    Submitted 15 May, 2024; originally announced May 2024.

    Comments: 18 pages, 11 figures

    Journal ref: Universe 2024, Volume 10, Issue 5, 219

  19. arXiv:2405.08438  [pdf, other]

    cond-mat.str-el cond-mat.supr-con

    Magnetic fluctuation and dominant superconducting pairing symmetry near the tunable Van Hove singularity

    Authors: Xiaohan Kong, Boyang Wen, Kaiyi Guo, Ying Liang, Tianxing Ma

    Abstract: We have investigated the magnetism and pairing correlations of the triangular lattice based on the Hubbard model using the determinant quantum Monte Carlo method and the constrained path Monte Carlo method. The results show that the presence of the next-nearest-neighbor hopping integral $t^{\prime}$ introduces an additional energy scale to the system, and through $t^{\prime}$, one can regulate the shape…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: 7 pages and 9 figures. Accepted for publication as a Regular Article in Physical Review B

  20. arXiv:2405.00574  [pdf, other]

    cs.CV cs.MM

    EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model

    Authors: Deng Li, Xin Liu, Bohao Xing, Baiqiang Xia, Yuan Zong, Bihan Wen, Heikki Kälviäinen

    Abstract: Emotion AI is the ability of computers to understand human emotional states. Existing works have achieved promising progress, but two limitations remain to be solved: 1) Previous studies have been more focused on short sequential video emotion analysis while overlooking long sequential video. However, the emotions in short sequential videos only reflect instantaneous emotions, which may be deliber…

    Submitted 1 May, 2024; originally announced May 2024.

  21. arXiv:2404.12452  [pdf, other]

    cs.CL

    Characterizing LLM Abstention Behavior in Science QA with Context Perturbations

    Authors: Bingbing Wen, Bill Howe, Lucy Lu Wang

    Abstract: The correct model response in the face of uncertainty is to abstain from answering a question so as not to mislead the user. In this work, we study the ability of LLMs to abstain from answering context-dependent science questions when provided insufficient or incorrect context. We probe model sensitivity in several settings: removing gold context, replacing gold context with irrelevant context, an…

    Submitted 18 April, 2024; originally announced April 2024.

  22. arXiv:2404.01440  [pdf, other]

    cs.CV cs.AI cs.GR cs.RO

    Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects

    Authors: Yijia Weng, Bowen Wen, Jonathan Tremblay, Valts Blukis, Dieter Fox, Leonidas Guibas, Stan Birchfield

    Abstract: We address the problem of building digital twins of unknown articulated objects from two RGBD scans of the object at different articulation states. We decompose the problem into two stages, each addressing distinct aspects. Our method first reconstructs object-level shape at each state, then recovers the underlying articulation model including part segmentation and joint articulations that associa…

    Submitted 6 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: CVPR 2024

  23. arXiv:2403.10076  [pdf, other]

    cs.CV

    Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks

    Authors: Chong Wang, Yi Yu, Lanqing Guo, Bihan Wen

    Abstract: Shadow removal is a task aimed at erasing regional shadows present in images and reinstating visually pleasing natural scenes with consistent illumination. While recent deep learning techniques have demonstrated impressive performance in image shadow removal, their robustness against adversarial attacks remains largely unexplored. Furthermore, many existing attack frameworks typically allocate a u…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted to ICASSP 2024

  24. arXiv:2403.10064  [pdf, other]

    eess.IV cs.CV

    Progressive Divide-and-Conquer via Subsampling Decomposition for Accelerated MRI

    Authors: Chong Wang, Lanqing Guo, Yufei Wang, Hao Cheng, Yi Yu, Bihan Wen

    Abstract: Deep unfolding networks (DUN) have emerged as a popular iterative framework for accelerated magnetic resonance imaging (MRI) reconstruction. However, conventional DUN aims to reconstruct all the missing information within the entire null space in each iteration. Thus it could be challenging when dealing with highly ill-posed degradation, usually leading to unsatisfactory reconstruction. In this wo…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  25. Optimization decision model of vegetable stock and pricing based on TCN-Attention and genetic algorithm

    Authors: Linhan Xia, Jinyuan Zhang, Bohan Wen

    Abstract: With the expansion of the operational scale of supermarkets in China, the vegetable market has grown considerably. The decision-making related to procurement costs and allocation quantities of vegetables has become a pivotal factor in determining the profitability of supermarkets. This paper analyzes the relationship between pricing and allocation faced by supermarkets in vegetable operations. Optimiz…

    Submitted 2 March, 2024; originally announced March 2024.

    Comments: ICCSMT2023

  26. arXiv:2403.00527  [pdf, other]

    cs.HC cs.CY cs.SI

    "There is a Job Prepared for Me Here": Understanding How Short Video and Live-streaming Platforms Empower Ageing Job Seekers in China

    Authors: PiaoHong Wang, Siying Hu, Bo Wen, Zhicong Lu

    Abstract: In recent years, the global unemployment rate has remained persistently high. Compounding this issue, the ageing population in China often encounters additional challenges in finding employment due to prevalent age discrimination in daily life. However, with the advent of social media, there has been a rise in the popularity of short videos and live-streams for recruiting ageing workers. To better…

    Submitted 1 March, 2024; originally announced March 2024.

    Comments: 14 pages, 3 figures; Accepted to ACM CHI 2024. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI'24)

    ACM Class: H.5.m; K.4.0

  27. arXiv:2402.15052  [pdf, other]

    cs.CL cs.AI

    ToMBench: Benchmarking Theory of Mind in Large Language Models

    Authors: Zhuang Chen, Jincenzi Wu, Jinfeng Zhou, Bosi Wen, Guanqun Bi, Gongyao Jiang, Yaru Cao, Mengting Hu, Yunghwei Lai, Zexuan Xiong, Minlie Huang

    Abstract: Theory of Mind (ToM) is the cognitive capability to perceive and ascribe mental states to oneself and others. Recent research has sparked a debate over whether large language models (LLMs) exhibit a form of ToM. However, existing ToM evaluations are hindered by challenges such as constrained scope, subjective judgment, and unintended contamination, yielding inadequate assessments. To address this…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Under review

  28. arXiv:2402.10491  [pdf, other]

    cs.CV

    Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

    Authors: Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, Yufei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen

    Abstract: Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data. Adapting large pre-trained diffusion models for higher resolution demands substantial computational and optimization resources, yet achieving a generation capability comparable to low-resolution…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Project Page: https://guolanqing.github.io/Self-Cascade/

  29. arXiv:2401.01223  [pdf]

    cond-mat.mtrl-sci physics.app-ph

    Twinning induced by elastic anisotropy in FCC crystals

    Authors: Jie Huang, Mingyu Lei, Guangpeng Sun, Guochun Yang, Bin Wen

    Abstract: Dislocation slip and deformation twinning are widely regarded as two important competing mechanisms in the process of plastic deformation. Calculating and comparing the critical resolved shear stress (CRSS) of the two deformation modes is key to discussing the mechanical properties reflected by different mechanisms in crystals. Here, the paper proposes a model to predict the CRSS of discr…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: 20 pages, 4 figures

  30. arXiv:2312.13503  [pdf, other]

    cs.CV cs.AI

    InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

    Authors: Bingbing Wen, Zhengyuan Yang, Jianfeng Wang, Zhe Gan, Bill Howe, Lijuan Wang

    Abstract: In this paper, we build a visual dialogue dataset, named InfoVisDial, which provides rich informative answers in each round even with external knowledge related to the visual content. Different from existing datasets where the answer is compact and short, InfoVisDial contains long free-form answers with rich information in each round of dialogue. For effective data collection, the key idea is to b…

    Submitted 20 December, 2023; originally announced December 2023.

  31. arXiv:2312.08344  [pdf, other]

    cs.CV cs.AI cs.RO

    FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

    Authors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield

    Abstract: We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test-time to a novel object without fine-tuning, as long as its CAD model is given, or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit represen…

    Submitted 26 March, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  32. arXiv:2312.02459  [pdf, other]

    cond-mat.mtrl-sci physics.comp-ph

    An adaptive preconditioning scheme for the self-consistent field iteration and generalized stacking-fault energy calculations

    Authors: Sitong Zhang, Xingyu Gao, Haifeng Song, Bin Wen

    Abstract: The generalized stacking-fault energy (GSFE) is a fundamental and key parameter for the plastic deformation of materials. We perform first-principles calculations by full-potential linearized augmented planewave (FLAPW) method to evaluate the GSFE based on the single-shift and triple-shift supercell models. Different degrees of defects are introduced in the two models, thereby affecting the conv…

    Submitted 4 December, 2023; originally announced December 2023.

    Comments: 10 pages, 8 figures

  33. arXiv:2311.18743  [pdf, other]

    cs.CL cs.AI cs.LG

    AlignBench: Benchmarking Chinese Alignment of Large Language Models

    Authors: Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, Jie Tang

    Abstract: Alignment has become a critical step for instruction-tuned Large Language Models (LLMs) to become helpful assistants. However, effective evaluation of alignment for emerging Chinese LLMs is still significantly lacking, calling for real-scenario grounded, open-ended, challenging and automatic evaluations tailored for alignment. To fill in this gap, we introduce AlignBench, a comprehensive multi-dim…

    Submitted 5 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

  34. arXiv:2311.18702  [pdf, other]

    cs.CL cs.AI

    CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation

    Authors: Pei Ke, Bosi Wen, Zhuoer Feng, Xiao Liu, Xuanyu Lei, Jiale Cheng, Shengyuan Wang, Aohan Zeng, Yuxiao Dong, Hongning Wang, Jie Tang, Minlie Huang

    Abstract: Since the natural language processing (NLP) community started to make large language models (LLMs) act as a critic to evaluate the quality of generated texts, most of the existing works train a critique generation model on the evaluation data labeled by GPT-4's direct prompting. We observe that these models lack the ability to generate informative critiques in both pointwise grading and pairwise c…

    Submitted 26 June, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted by ACL 2024 (Main Conference)

  35. arXiv:2311.18303  [pdf, other]

    cs.CV

    OmniMotionGPT: Animal Motion Generation with Limited Data

    Authors: Zhangsihao Yang, Mingyuan Zhou, Mengyi Shan, Bingbing Wen, Ziwei Xuan, Mitch Hill, Junjie Bai, Guo-Jun Qi, Yalin Wang

    Abstract: Our paper aims to generate diverse and realistic animal motion sequences from textual descriptions, without a large-scale animal text-motion dataset. While the task of text-driven human motion synthesis is already extensively studied and benchmarked, it remains challenging to transfer this success to other skeleton structures with limited data. In this work, we design a model architecture that imi…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: The project page is at https://zshyang.github.io/omgpt-website/

  36. arXiv:2311.16832  [pdf, other]

    cs.CL cs.AI

    CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models

    Authors: Jinfeng Zhou, Zhuang Chen, Dazhen Wan, Bosi Wen, Yi Song, Jifan Yu, Yongkang Huang, Libiao Peng, Jiaming Yang, Xiyao Xiao, Sahand Sabour, Xiaohan Zhang, Wenjing Hou, Yijia Zhang, Yuxiao Dong, Jie Tang, Minlie Huang

    Abstract: In this paper, we present CharacterGLM, a series of models built upon ChatGLM, with model sizes ranging from 6B to 66B parameters. Our CharacterGLM is designed for generating Character-based Dialogues (CharacterDial), which aims to equip a conversational AI system with character customization for satisfying people's inherent social desires and emotional needs. On top of CharacterGLM, we can custom…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: Work in progress

  37. arXiv:2311.16551  [pdf]

    cond-mat.mtrl-sci

    Thermally-activated precipitation strengthening

    Authors: Guangpeng Sun, Liqiang Zhang, Bin Wen

    Abstract: Precipitation strengthening is a key strengthening method for metallic materials. However, the temperature effect on precipitation strengthening is still unclear to date. Based on dislocation theory, a thermally-activated precipitation strengthening model is built by considering the competition between shear and bypass mechanisms. For medium-sized precipitate particles, the thermally-activated she…

    Submitted 28 November, 2023; originally announced November 2023.

  38. arXiv:2311.14760  [pdf, other]

    cs.CV

    SinSR: Diffusion-Based Image Super-Resolution in a Single Step

    Authors: Yufei Wang, Wenhan Yang, Xinyuan Chen, Yaohui Wang, Lanqing Guo, Lap-Pui Chau, Ziwei Liu, Yu Qiao, Alex C. Kot, Bihan Wen

    Abstract: While super-resolution (SR) methods based on diffusion models exhibit promising results, their practical application is hindered by the substantial number of required inference steps. Recent methods utilize degraded images in the initial state, thereby shortening the Markov chain. Nevertheless, these solutions either rely on a precise formulation of the degradation process or still necessitate a r…

    Submitted 23 November, 2023; originally announced November 2023.

  39. arXiv:2311.01373  [pdf, other]

    cs.CV cs.AI

    Optimization Efficient Open-World Visual Region Recognition

    Authors: Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu

    Abstract: Understanding the semantics of individual regions or patches of unconstrained images, such as open-world object detection, remains a critical yet challenging task in computer vision. Building on the success of powerful image-level vision-language (ViL) foundation models like CLIP, recent efforts have sought to harness their capabilities by either training a contrastive model from scratch with an e…

    Submitted 13 June, 2024; v1 submitted 2 November, 2023; originally announced November 2023.

  40. arXiv:2310.17596  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations

    Authors: Ajay Mandlekar, Soroush Nasiriany, Bowen Wen, Iretiayo Akinola, Yashraj Narang, Linxi Fan, Yuke Zhu, Dieter Fox

    Abstract: Imitation learning from a large set of human demonstrations has proved to be an effective paradigm for building capable robot agents. However, the demonstrations can be extremely costly and time-consuming to collect. We introduce MimicGen, a system for automatically synthesizing large-scale, rich datasets from only a small number of human demonstrations by adapting them to new contexts. We use Mim…

    Submitted 26 October, 2023; originally announced October 2023.

    Comments: Conference on Robot Learning (CoRL) 2023

  41. Health Guardian: Using Multi-modal Data to Understand Individual Health

    Authors: Vince S. Siu, Kuan Yu Hsieh, Italo Buleje, Takashi Itoh, Tian Hao, Ben Civjan, Nigel Hinds, Bing Dang, Jeffrey L. Rogers, Bo Wen

    Abstract: Artificial intelligence (AI) has shown great promise in revolutionizing the field of digital health by improving disease diagnosis, treatment, and prevention. This paper describes the Health Guardian platform, a non-commercial, scientific research-based platform developed by the IBM Digital Health team to rapidly translate AI research into cloud-based microservices. The platform can collect health…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 10 pages, 6 figures

    Journal ref: IEEE International Conference on Digital Health (ICDH), 2023, pp. 65-74

  42. arXiv:2310.00463  [pdf, other]

    cs.CV cs.RO

    Diff-DOPE: Differentiable Deep Object Pose Estimation

    Authors: Jonathan Tremblay, Bowen Wen, Valts Blukis, Balakumar Sundaralingam, Stephen Tyree, Stan Birchfield

    Abstract: We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object. The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model. We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation…

    Submitted 30 September, 2023; originally announced October 2023.

    Comments: Submitted to ICRA 2023. Project page is at https://diffdope.github.io

  43. arXiv:2309.07169  [pdf, other]

    eess.SP cs.LG

    Spectral Convergence of Complexon Shift Operators

    Authors: Purui Zhang, Xingchao Jian, Feng Ji, Wee Peng Tay, Bihan Wen

    Abstract: Topological Signal Processing (TSP) utilizes simplicial complexes to model structures with higher order than vertices and edges. In this paper, we study the transferability of TSP via a generalized higher-order version of graphon, known as complexon. We recall the notion of a complexon as the limit of a simplicial complex sequence [1]. Inspired by the graphon shift operator and message-passing neu…

    Submitted 5 May, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: 9 pages, 2 figures

  44. Remote Inference of Cognitive Scores in ALS Patients Using a Picture Description

    Authors: Carla Agurto, Guillermo Cecchi, Bo Wen, Ernest Fraenkel, James Berry, Indu Navar, Raquel Norel

    Abstract: Amyotrophic lateral sclerosis (ALS) is a fatal disease that affects not only movement, speech, and breathing but also cognition. Recent studies have focused on the use of language analysis techniques to detect ALS and infer scales for monitoring functional progression. In this paper, we focus on another important aspect, cognitive impairment, which affects 35-50% of the ALS population. In an effort to re…

    Submitted 13 September, 2023; originally announced September 2023.

    Comments: conference paper

  45. Enabling the Evaluation of Driver Physiology Via Vehicle Dynamics

    Authors: Rodrigo Ordonez-Hurtado, Bo Wen, Nicholas Barra, Ryan Vimba, Sergio Cabrero-Barros, Sergiy Zhuk, Jeffrey L. Rogers

    Abstract: Driving is a daily routine for many individuals across the globe. This paper presents the configuration and methodologies used to transform a vehicle into a connected ecosystem capable of assessing driver physiology. We integrated an array of commercial sensors from the automotive and digital health sectors along with driver inputs from the vehicle itself. This amalgamation of sensors allows for m…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: 7 pages, 11 figures, 2023 IEEE International Conference on Digital Health (ICDH)

    Journal ref: in 2023 IEEE International Conference on Digital Health (ICDH), Chicago, IL, USA, 2023 pp. 195-201

  46. arXiv:2308.01477  [pdf, other]

    cs.RO cs.CV

    HANDAL: A Dataset of Real-World Manipulable Object Categories with Pose Annotations, Affordances, and Reconstructions

    Authors: Andrew Guo, Bowen Wen, Jianhe Yuan, Jonathan Tremblay, Stephen Tyree, Jeffrey Smith, Stan Birchfield

    Abstract: We present the HANDAL dataset for category-level object pose estimation and affordance prediction. Unlike previous datasets, ours is focused on robotics-ready manipulable objects that are of the proper size and shape for functional grasping by robot manipulators, such as pliers, utensils, and screwdrivers. Our annotation process is streamlined, requiring only a single off-the-shelf camera and semi…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: IROS 2023. Project page: https://nvlabs.github.io/HANDAL/

  47. arXiv:2307.10811  [pdf, other]

    cs.HC cs.AI cs.CL

    "It Felt Like Having a Second Mind": Investigating Human-AI Co-creativity in Prewriting with Large Language Models

    Authors: Qian Wan, Siying Hu, Yu Zhang, Piaohong Wang, Bo Wen, Zhicong Lu

    Abstract: Prewriting is the process of discovering and developing ideas before a first draft, which requires divergent thinking and often implies unstructured strategies such as diagramming, outlining, free-writing, etc. Although large language models (LLMs) have been demonstrated to be useful for a variety of tasks including creative writing, little is known about how users would collaborate with LLMs to s…

    Submitted 29 February, 2024; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: To appear at ACM CSCW 2024; Accepted to PACM HCI (CSCW); 25 pages, 2 figures

    ACM Class: H.5.m; K.4.0

    Journal ref: Proc. ACM Hum.-Comput. Interact. 8, CSCW1, Article 84 (2024)

  48. arXiv:2307.07710  [pdf, other]

    cs.CV eess.IV

    ExposureDiffusion: Learning to Expose for Low-light Image Enhancement

    Authors: Yufei Wang, Yi Yu, Wenhan Yang, Lanqing Guo, Lap-Pui Chau, Alex C. Kot, Bihan Wen

    Abstract: Previous raw image-based low-light image enhancement methods predominantly relied on feed-forward neural networks to learn deterministic mappings from low-light to normally-exposed images. However, they failed to capture critical distribution information, leading to visually undesirable results. This work addresses the issue by seamlessly integrating a diffusion model with a physics-based exposure…

    Submitted 15 August, 2023; v1 submitted 15 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023

  49. arXiv:2307.04122  [pdf, other]

    cs.CV eess.IV

    Enhancing Low-Light Images Using Infrared-Encoded Images

    Authors: Shulin Tian, Yufei Wang, Renjie Wan, Wenhan Yang, Alex C. Kot, Bihan Wen

    Abstract: The low-light image enhancement task is essential yet challenging, as it is intrinsically ill-posed. Previous works mainly focus on low-light images captured in the visible spectrum using pixel-wise loss, which limits the capacity of recovering brightness, contrast, and texture details due to the small number of incoming photons. In this work, we propose a novel approach to increase the visibility…

    Submitted 9 July, 2023; originally announced July 2023.

    Comments: The first two authors contributed equally. The work is accepted by ICIP 2023.

  50. arXiv:2306.13531  [pdf, other]

    cs.CV

    WBCAtt: A White Blood Cell Dataset Annotated with Detailed Morphological Attributes

    Authors: Satoshi Tsutsui, Winnie Pang, Bihan Wen

    Abstract: The examination of blood samples at a microscopic level plays a fundamental role in clinical diagnostics, influencing a wide range of medical conditions. For instance, an in-depth study of White Blood Cells (WBCs), a crucial component of our blood, is essential for diagnosing blood-related diseases such as leukemia and anemia. While multiple datasets containing WBC images have been proposed, they…

    Submitted 25 December, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Comments: Neural Information Processing Systems 2023