Showing 1–50 of 110 results for author: Fu, B

  1. arXiv:2406.09162 [pdf, other]

    cs.CV

    EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

    Authors: Yucheng Han, Rui Wang, Chi Zhang, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang

    Abstract: Recent advancements in image generation have enabled the creation of high-quality images from text conditions. However, when facing multi-modal conditions, such as text combined with reference appearances, existing methods struggle to balance multiple conditions effectively, typically showing a preference for one modality over others. To address this challenge, we introduce EMMA, a novel image gen…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: https://tencentqqgylab.github.io/EMMA

  2. arXiv:2406.04594 [pdf, other]

    cs.DC cs.AI cs.LG

    Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

    Authors: Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

    Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2405.20853 [pdf, other]

    cs.CV

    MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

    Authors: Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen

    Abstract: The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, which is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation…

    Submitted 18 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  4. arXiv:2405.05590 [pdf, other]

    cs.CR cs.AR cs.LG

    TroLLoc: Logic Locking and Layout Hardening for IC Security Closure against Hardware Trojans

    Authors: Fangzhou Wang, Qijing Wang, Lilas Alrahis, Bangqi Fu, Shui Jiang, Xiaopeng Zhang, Ozgur Sinanoglu, Tsung-Yi Ho, Evangeline F. Y. Young, Johann Knechtel

    Abstract: Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many security threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications. In this work, we proactively and systematically protect the physical layouts of ICs against…

    Submitted 9 May, 2024; originally announced May 2024.

  5. arXiv:2405.00940 [pdf, other]

    cs.DC cs.ET cs.MA

    Computing Threshold Circuits with Bimolecular Void Reactions in Step Chemical Reaction Networks

    Authors: Rachel Anderson, Bin Fu, Aiden Massie, Gourab Mukhopadhyay, Adrian Salinas, Robert Schweller, Evan Tomai, Tim Wylie

    Abstract: Step Chemical Reaction Networks (step CRNs) are an augmentation of the Chemical Reaction Network (CRN) model where additional species may be introduced to the system in a sequence of "steps." We study step CRN systems using a weak subset of reaction rules, void rules, in which molecular species can only be deleted. We demonstrate that step CRNs with only void rules of size (2,0) can simul…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: text overlap with arXiv:2402.08220

  6. arXiv:2403.11974 [pdf, other]

    eess.IV cs.CV

    OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images

    Authors: Yang Li, Qiuyi Huang, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Bo Fu, Catherine C. Liu, Xingtao Zhou

    Abstract: Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex…

    Submitted 18 March, 2024; originally announced March 2024.

  7. arXiv:2403.05851 [pdf, other]

    cs.MM cs.ET

    Interest-Aware Joint Caching, Computing, and Communication Optimization for Mobile VR Delivery in MEC Networks

    Authors: Baojie Fu, Tong Tang, Dapeng Wu, Ruyan Wang

    Abstract: In the upcoming B5G/6G era, virtual reality (VR) over wireless has become a typical application, which is an inevitable trend in the development of video. However, in immersive and interactive VR experiences, VR services typically exhibit high delay, while simultaneously posing challenges for the energy consumption of local devices. To address these issues, this paper aims to improve the performan…

    Submitted 9 March, 2024; originally announced March 2024.

  8. arXiv:2403.05135 [pdf, other]

    cs.CV

    ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

    Authors: Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu

    Abstract: Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ability to comprehend dense prompts, encompassing multiple objects, detailed attributes, complex relationships, long-text alignment, etc. In this paper, we introduce an Efficient Large Language Model Ad…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Project Page: https://ella-diffusion.github.io/

  9. arXiv:2402.08220 [pdf, other]

    q-bio.MN cs.ET

    Computing Threshold Circuits with Void Reactions in Step Chemical Reaction Networks

    Authors: Rachel Anderson, Alberto Avila, Bin Fu, Timothy Gomez, Elise Grizzell, Aiden Massie, Gourab Mukhopadhyay, Adrian Salinas, Robert Schweller, Evan Tomai, Tim Wylie

    Abstract: We introduce a new model of step Chemical Reaction Networks (step CRNs), motivated by the step-wise addition of materials in standard lab procedures. Step CRNs have ordered reactants that transform into products via reaction rules over a series of steps. We study an important subset of weak reaction rules, void rules, in which chemical species may only be deleted but never changed. W…

    Submitted 13 February, 2024; originally announced February 2024.

  10. arXiv:2312.15645 [pdf, other]

    cs.CL

    Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment

    Authors: Rui Zhao, Liang Zhang, Biao Fu, Cong Hu, Jinsong Su, Yidong Chen

    Abstract: Sign language translation (SLT) aims to convert continuous sign language videos into textual sentences. As a typical multi-modal task, there exists an inherent modality gap between sign language videos and spoken language text, which makes the cross-modal alignment between visual and textual modalities crucial. However, previous studies tend to rely on an intermediate sign gloss representation to…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Accepted as conference paper by AAAI24. The code and models are available at https://github.com/rzhao-zhsq/CV-SLT

  11. arXiv:2312.13913 [pdf, other]

    cs.CV

    Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

    Authors: Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu

    Abstract: This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs. The key challenge addressed is generating high-quality textures without embedded illumination information, which allows the textures to be re-lighted or re-edited within mod…

    Submitted 22 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project Website: https://github.com/OpenTexture/Paint3D

  12. arXiv:2312.13771 [pdf, other]

    cs.CV

    AppAgent: Multimodal Agents as Smartphone Users

    Authors: Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu

    Abstract: Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping…

    Submitted 21 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Project Page is https://appagent-official.github.io/

  13. arXiv:2312.02663 [pdf, other]

    cs.CV cs.AI

    FaceStudio: Put Your Face Everywhere in Seconds

    Authors: Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu

    Abstract: This study investigates identity-preserving image synthesis, an intriguing task in image generation that seeks to maintain a subject's identity while adding a personalized, stylistic touch. Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation, but they come with significant drawbacks. These include the need for extensive resources and time for f…

    Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Project homepage: https://icoz69.github.io/facestudio/

  14. arXiv:2311.16483 [pdf, other]

    cs.CV cs.CL

    ChartLlama: A Multimodal LLM for Chart Understanding and Generation

    Authors: Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

    Abstract: Multi-modal large language models have demonstrated impressive performances on most vision-language tasks. However, the model generally lacks the understanding capabilities for specific domain data, particularly when it comes to interpreting chart figures. This is mainly due to the lack of relevant multi-modal instruction tuning datasets. In this article, we create a high-quality instruction-tunin…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Code and model on https://tingxueronghua.github.io/ChartLlama/

  15. arXiv:2311.14189 [pdf, other]

    cs.CV

    D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction

    Authors: Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari

    Abstract: Reconstructing hand-held objects from a single RGB image is a challenging task in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, we employ a point cloud denoising diffusion model to account for the probabilistic nature of this problem. In the core, we introduce centroid-fixed dual-stream conditional diffusion for monocular hand-held object reconstruction…

    Submitted 22 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  16. arXiv:2311.11910 [pdf, other]

    cs.AI cs.CV

    Generalization of Fitness Exercise Recognition from Doppler Measurements by Domain-adaption and Few-Shot Learning

    Authors: Biying Fu, Naser Damer, Florian Kirchbuchner, Arjan Kuijper

    Abstract: In previous works, a mobile application was developed using an unmodified commercial off-the-shelf smartphone to recognize whole-body exercises. The working principle was based on the ultrasound Doppler sensing with the device built-in hardware. Applying such a lab-environment trained model on realistic application variations causes a significant drop in performance, and thus decimates its applicab…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted at International Conference on Pattern Recognition (ICPR) workshop 2021

  17. arXiv:2311.11106 [pdf, other]

    cs.CV

    ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation

    Authors: Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao

    Abstract: In this paper, we present ShapeMatcher, a unified self-supervised learning framework for joint shape canonicalization, segmentation, retrieval and deformation. Given a partially-observed object in an arbitrary pose, we first canonicalize the object by extracting point-wise affine-invariant features, disentangling inherent structure of the object with its pose and size. These learned features are t…

    Submitted 11 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

    Comments: CVPR2024

  18. arXiv:2311.03967 [pdf, other]

    cs.CV stat.ML

    CeCNN: Copula-enhanced convolutional neural networks in joint prediction of refraction error and axial length based on ultra-widefield fundus images

    Authors: Chong Zhong, Yang Li, Danjuan Yang, Meiyan Li, Xingyao Zhou, Bo Fu, Catherine C. Liu, A. H. Welsh

    Abstract: Ultra-widefield (UWF) fundus images are replacing traditional fundus images in screening, detection, prediction, and treatment of complications related to myopia because their much broader visual range is advantageous for highly myopic eyes. Spherical equivalent (SE) is extensively used as the main myopia outcome measure, and axial length (AL) has drawn increasing interest as an important ocular c…

    Submitted 1 June, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

  19. arXiv:2310.15161 [pdf, other]

    cs.CV

    SAM-Med3D

    Authors: Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao

    Abstract: Although the Segment Anything Model (SAM) has demonstrated impressive performance in 2D natural image segmentation, its application to 3D volumetric medical images reveals significant shortcomings, namely suboptimal performance and unstable prediction, necessitating an excessive number of prompt points to attain the desired outcomes. These issues can hardly be addressed by fine-tuning SAM on medic…

    Submitted 29 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

  20. arXiv:2310.13819 [pdf, other]

    cs.RO

    LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly

    Authors: Bowen Fu, Sek Kun Leong, Yan Di, Jiwen Tang, Xiangyang Ji

    Abstract: Comprehending natural language instructions is a critical skill for robots to cooperate effectively with humans. In this paper, we aim to learn 6D poses for robotic assembly by natural language instructions. For this purpose, Language-Instructed 6D Pose Regression Network (LanPose) is proposed to jointly predict the 6D poses of the observed object and the corresponding assembly position. Our propos…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: 8 pages

  21. arXiv:2310.11696 [pdf, other]

    cs.CV

    MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision

    Authors: Chenyangguang Zhang, Guanlong Jiao, Yan Di, Gu Wang, Ziqin Huang, Ruida Zhang, Fabian Manhardt, Bowen Fu, Federico Tombari, Xiangyang Ji

    Abstract: Previous works concerning single-view hand-held object reconstruction typically rely on supervision from 3D ground-truth models, which are hard to collect in real world. In contrast, readily accessible hand-object videos offer a promising training data source, but they only give heavily occluded object observations. In this paper, we present a novel synthetic-to-real framework to exploit Multi-vie…

    Submitted 13 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: CVPR 2024

  22. arXiv:2309.15373 [pdf, other]

    cs.RO cs.MA

    Human-robot Matching and Routing for Multi-robot Tour Guiding under Time Uncertainty

    Authors: Bo Fu, Tribhi Kathuria, Denise Rizzo, Matthew Castanier, X. Jessie Yang, Maani Ghaffari, Kira Barton

    Abstract: This work presents a framework for multi-robot tour guidance in a partially known environment with uncertainty, such as a museum. A simultaneous matching and routing problem (SMRP) is formulated to match the humans with robot guides according to their requested places of interest (POIs) and generate the routes for the robots according to uncertain time estimation. A large neighborhood search algor…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: ICRA 2022 Workshop Paper (https://sites.google.com/view/icra22ws-cor-wotf/accepted-papers). arXiv admin note: substantial text overlap with arXiv:2201.10635

    MSC Class: 93A16

  23. arXiv:2309.09724 [pdf, other]

    cs.CV

    Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

    Authors: Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen

    Abstract: In this study, we address the challenge of 3D scene structure recovery from monocular depth estimation. While traditional depth estimation methods leverage labeled datasets to directly predict absolute depth, recent advancements advocate for mix-dataset training, enhancing generalization across diverse scenes. However, such mixed dataset training yields depth predictions only up to an unknown scal…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted by ICCV2023

  24. arXiv:2308.16404 [pdf, other]

    cs.CV

    Deformation Robust Text Spotting with Geometric Prior

    Authors: Xixuan Hao, Aozhong Zhang, Xianze Meng, Bin Fu

    Abstract: The goal of text spotting is to perform text detection and recognition in an end-to-end manner. Although the diversity of luminosity and orientation in scene texts has been widely studied, the font diversity and shape variance of the same character are ignored in recent works, since most characters in natural images are rendered in standard fonts. To solve this problem, we present a Chinese Artist…

    Submitted 30 August, 2023; originally announced August 2023.

  25. arXiv:2308.10253 [pdf, other]

    cs.CV cs.CL cs.LG

    StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

    Authors: Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei

    Abstract: The remarkable multimodal capabilities demonstrated by OpenAI's GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A primary research objective of such models is to align visual and textual modalities effectively while comprehending human instructions. Current methodologies often rely on annotations derived from benchmark datasets to construct im…

    Submitted 27 December, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

    Comments: Project page: https://github.com/icoz69/StableLLAVA

  26. arXiv:2307.10799 [pdf, other]

    cs.CL

    Layer-wise Representation Fusion for Compositional Generalization

    Authors: Yafang Zheng, Lei Lin, Shuangtao Li, Yuxuan Yuan, Zhaohong Lai, Shan Liu, Biao Fu, Yidong Chen, Xiaodong Shi

    Abstract: Existing neural models are demonstrated to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for failure on CG is that the syntactic and semantic representations of sequences in both the uppermost layer of the encoder and decoder are entangled. However, previous work concentrates on separating the…

    Submitted 21 December, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted by AAAI24. arXiv admin note: substantial text overlap with arXiv:2305.12169

  27. arXiv:2306.17115 [pdf, other]

    cs.CV

    Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation

    Authors: Zibo Zhao, Wen Liu, Xin Chen, Xianfang Zeng, Rui Wang, Pei Cheng, Bin Fu, Tao Chen, Gang Yu, Shenghua Gao

    Abstract: We present a novel alignment-before-generation approach to tackle the challenging task of generating general 3D shapes based on 2D images or texts. Directly learning a conditional generative model from images or texts to 3D shapes is prone to producing inconsistent results with the conditions because 3D shapes have an additional dimension whose distribution significantly differs from that of 2D im…

    Submitted 3 July, 2023; v1 submitted 29 June, 2023; originally announced June 2023.

    Comments: Project Website: https://neuralcarver.github.io/michelangelo

  28. arXiv:2305.19012 [pdf, other]

    cs.CV cs.AI

    StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation

    Authors: Chi Zhang, Yiwen Chen, Yijun Fu, Zhenglin Zhou, Gang Yu, Billzb Wang, Bin Fu, Tao Chen, Guosheng Lin, Chunhua Shen

    Abstract: The recent advancements in image-text diffusion models have stimulated research interest in large-scale 3D generative models. Nevertheless, the limited availability of diverse 3D resources presents significant challenges to learning. In this paper, we present a novel method for generating high-quality, stylized 3D avatars that utilizes pre-trained image-text diffusion models for data generation an…

    Submitted 30 May, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Project page: https://github.com/icoz69/StyleAvatar3D

  29. arXiv:2305.12169 [pdf, other]

    cs.CL

    Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization

    Authors: Lei Lin, Shuangtao Li, Yafang Zheng, Biao Fu, Shan Liu, Yidong Chen, Xiaodong Shi

    Abstract: Recent studies have shown that sequence-to-sequence (seq2seq) models struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. There is mounting evidence that one of the reasons hindering CG is the representation of the encoder uppermost layer is entangled, i.e., the syntactic and semantic representations of sequences…

    Submitted 18 October, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted by Findings of EMNLP 2023

  30. arXiv:2303.07914 [pdf, other]

    cs.CL

    Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference

    Authors: Biao Fu, Minpeng Liao, Kai Fan, Zhongqiang Huang, Boxing Chen, Yidong Chen, Xiaodong Shi

    Abstract: A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints. However, there is a mismatch problem in using a model trained with complete utterances for streaming inference with partial input. We demonstrate that speech r…

    Submitted 26 October, 2023; v1 submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted to EMNLP 2023 main conference

  31. arXiv:2302.00133 [pdf, ps, other]

    cs.DS

    Sublinear Approximation Schemes for Scheduling Precedence Graphs of Bounded Depth

    Authors: Bin Fu, Yumei Huo, Hairong Zhao

    Abstract: We study the classical scheduling problem on parallel machines with precedence constraints where the precedence graph has the bounded depth $h$. Our goal is to minimize the maximum completion time. We focus on developing approximation algorithms that use only sublinear space or sublinear time. We develop the first one-pass streaming approximation schemes using sublinear space when all jobs' proce…

    Submitted 31 January, 2023; originally announced February 2023.

  32. arXiv:2212.13725 [pdf, ps, other]

    cs.AI

    Robust Sequence Networked Submodular Maximization

    Authors: Qihao Shi, Bingyang Fu, Can Wang, Jiawei Chen, Sheng Zhou, Yan Feng, Chun Chen

    Abstract: In this paper, we study the Robust optimization for sequence Networked submodular maximization (RoseNets) problem. We interweave the robust optimization with the sequence networked submodular maximization. The elements are connected by a directed acyclic graph and the objective function is not submodular on the elements but on the edges i…

    Submitted 26 January, 2023; v1 submitted 28 December, 2022; originally announced December 2022.

    Comments: 12 pages, 14 figures, accepted at the AAAI 2023 conference

  33. AdvMIL: Adversarial Multiple Instance Learning for the Survival Analysis on Whole-Slide Images

    Authors: Pei Liu, Luping Ji, Feng Ye, Bo Fu

    Abstract: The survival analysis on histological whole-slide images (WSIs) is one of the most important means to estimate patient prognosis. Although many weakly-supervised deep learning models have been developed for gigapixel WSIs, their potential is generally restricted by classical survival analysis rules and fully-supervised learning requirements. As a result, these models provide patients only with a c…

    Submitted 5 April, 2023; v1 submitted 13 December, 2022; originally announced December 2022.

    Comments: 15 pages, 10 figures, 8 tables

    Journal ref: Medical Image Analysis, 103020 (2023)

  34. arXiv:2212.04048 [pdf, other]

    cs.CV cs.GR

    Executing your Commands via Motion Diffusion in Latent Space

    Authors: Xin Chen, Biao Jiang, Wen Liu, Zilong Huang, Bin Fu, Tao Chen, Jingyi Yu, Gang Yu

    Abstract: We study a challenging task, conditional human motion generation, which produces plausible human motion sequences according to various conditional inputs, such as action classes or textual descriptors. Since human motions are highly diverse and have a property of quite different distribution from conditional modalities, such as textual descriptors in natural languages, it is hard to learn a probab…

    Submitted 19 May, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

    Comments: 18 pages, 11 figures, conference

  35. arXiv:2211.12603 [pdf, other]

    cs.DC cs.DM cs.ET nlin.AO q-bio.MN

    Reachability in Restricted Chemical Reaction Networks

    Authors: Robert M. Alaniz, Bin Fu, Timothy Gomez, Elise Grizzell, Andrew Rodriguez, Robert Schweller, Tim Wylie

    Abstract: The popularity of molecular computation has given rise to several models of abstraction, one of the more recent ones being Chemical Reaction Networks (CRNs). These are equivalent to other popular computational models, such as Vector Addition Systems and Petri-Nets, and restricted versions are equivalent to Population Protocols. This paper continues the work on core reachability questions related t…

    Submitted 22 November, 2022; originally announced November 2022.

    Comments: This research was supported in part by National Science Foundation Grant CCF-1817602

  36. arXiv:2211.07997 [pdf, other]

    cs.CR cs.AR cs.LG

    Security Closure of IC Layouts Against Hardware Trojans

    Authors: Fangzhou Wang, Qijing Wang, Bangqi Fu, Shui Jiang, Xiaopeng Zhang, Lilas Alrahis, Ozgur Sinanoglu, Johann Knechtel, Tsung-Yi Ho, Evangeline F. Y. Young

    Abstract: Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications. In this work, we proactively and systematically harden the physical layouts of ICs against post-desi…

    Submitted 15 November, 2022; originally announced November 2022.

    Comments: To appear in ISPD'23

  37. arXiv:2211.04470 [pdf, other]

    cs.CV eess.IV

    Efficient Single-Image Depth Estimation on Mobile Devices, Mobile AI & AIM 2022 Challenge: Report

    Authors: Andrey Ignatov, Grigory Malivenko, Radu Timofte, Lukasz Treszczotko, Xin Chang, Piotr Ksiazek, Michal Lopuszynski, Maciej Pioro, Rafal Rudnicki, Maciej Smyl, Yujie Ma, Zhenyu Li, Zehui Chen, Jialei Xu, Xianming Liu, Junjun Jiang, XueChao Shi, Difan Xu, Yanan Li, Xiaotao Wang, Lei Lei, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, et al. (14 additional authors not shown)

    Abstract: Various depth estimation models are now widely used on many mobile and IoT devices for image segmentation, bokeh effect rendering, object tracking and many other mobile tasks. Thus, it is very crucial to have efficient and accurate depth estimation models that can run fast on low-power mobile chipsets. In this Mobile AI challenge, the target was to develop deep learning-based single image depth es…

    Submitted 7 November, 2022; originally announced November 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2105.08630, arXiv:2211.03885; text overlap with arXiv:2105.08819, arXiv:2105.08826, arXiv:2105.08629, arXiv:2105.07809, arXiv:2105.07825

  38. arXiv:2211.03286 [pdf, other]

    cs.RO cs.MA

    Learning Task Requirements and Agent Capabilities for Multi-agent Task Allocation

    Authors: Bo Fu, William Smith, Denise Rizzo, Matthew Castanier, Maani Ghaffari, Kira Barton

    Abstract: This paper presents a learning framework to estimate an agent capability and task requirement model for multi-agent task allocation. With a set of team configurations and the corresponding task performances as the training data, linear task constraints can be learned to be embedded in many existing optimization-based task allocation frameworks. Comprehensive computational evaluations are conducted…

    Submitted 7 November, 2022; v1 submitted 6 November, 2022; originally announced November 2022.

    Comments: The video and open-source code are at https://brg.engin.umich.edu/publications/learn-multiagent-taskreq/

    MSC Class: 93A16

  39. arXiv:2210.15134 [pdf, other]

    cs.CV

    Learning Variational Motion Prior for Video-based Motion Capture

    Authors: Xin Chen, Zhuo Su, Lingbo Yang, Pei Cheng, Lan Xu, Bin Fu, Gang Yu

    Abstract: Motion capture from a monocular video is fundamental and crucial for us humans to naturally experience and interact with each other in Virtual Reality (VR) and Augmented Reality (AR). However, existing methods still struggle with challenging cases involving self-occlusion and complex poses due to the lack of effective motion prior modeling. In this paper, we present a novel variational motion prio…

    Submitted 27 October, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: 9 pages, 9 figures

    ACM Class: I.4.8

  40. arXiv:2210.09670  [pdf, other]

    cs.CV

    Hierarchical Normalization for Robust Monocular Depth Estimation

    Authors: Chi Zhang, Wei Yin, Zhibin Wang, Gang Yu, Bin Fu, Chunhua Shen

    Abstract: In this paper, we address monocular depth estimation with deep neural networks. To enable training of deep monocular estimation models with various sources of datasets, state-of-the-art methods adopt image-level normalization strategies to generate affine-invariant depth representations. However, learning with image-level normalization mainly emphasizes the relations of pixel representations with…

    Submitted 18 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  41. arXiv:2210.05210  [pdf, other]

    cs.CV

    Robust Human Matting via Semantic Guidance

    Authors: Xiangguang Chen, Ye Zhu, Yu Li, Bingtao Fu, Lei Sun, Ying Shan, Shan Liu

    Abstract: Automatic human matting is highly desired for many real applications. We investigate recent human matting methods and show that common bad cases happen when semantic human segmentation fails. This indicates that semantic understanding is crucial for robust human matting. From this, we develop a fast yet accurate human matting framework, named Semantic Guided Human Matting (SGHM). It builds on a se…

    Submitted 11 October, 2022; originally announced October 2022.

    Comments: ACCV 2022

  42. arXiv:2208.13400  [pdf, other]

    cs.CV

    Towards Explaining Demographic Bias through the Eyes of Face Recognition Models

    Authors: Biying Fu, Naser Damer

    Abstract: Biases inherent in both data and algorithms make the fairness of widespread machine learning (ML)-based decision-making systems less than optimal. To improve the trustfulness of such ML decision systems, it is crucial to be aware of the inherent biases in these solutions and to make them more transparent to the public and developers. In this work, we aim at providing a set of explainability tool t…

    Submitted 29 August, 2022; originally announced August 2022.

    Comments: Accepted at the 2022 International Joint Conference on Biometrics (IJCB 2022)

  43. arXiv:2208.12986  [pdf, other]

    cs.RO cs.CV

    6D Robotic Assembly Based on RGB-only Object Pose Estimation

    Authors: Bowen Fu, Sek Kun Leong, Xiaocong Lian, Xiangyang Ji

    Abstract: Vision-based robotic assembly is a crucial yet challenging task as the interaction with multiple objects requires high levels of precision. In this paper, we propose an integrated 6D robotic system to perceive, grasp, manipulate and assemble blocks with tight tolerances. Aiming to provide an off-the-shelf RGB-only solution, our system is built upon a monocular 6D object pose estimation network tra…

    Submitted 27 August, 2022; originally announced August 2022.

    Comments: Accepted by IROS 2022

  44. arXiv:2208.09848  [pdf]

    cs.CV eess.IV

    Multi-task Learning for Monocular Depth and Defocus Estimations with Real Images

    Authors: Renzhi He, Hualin Hong, Boya Fu, Fei Liu

    Abstract: Monocular depth estimation and defocus estimation are two fundamental tasks in computer vision. Most existing methods treat depth estimation and defocus estimation as two separate tasks, ignoring the strong connection between them. In this work, we propose a multi-task learning network consisting of an encoder with two decoders to estimate the depth and defocus map from a single focused image. Thr…

    Submitted 21 August, 2022; originally announced August 2022.

  45. arXiv:2208.05864  [pdf, other]

    cs.CV

    Face Morphing Attacks and Face Image Quality: The Effect of Morphing and the Unsupervised Attack Detection by Quality

    Authors: Biying Fu, Naser Damer

    Abstract: Morphing attacks are a form of presentation attacks that gathered increasing attention in recent years. A morphed image can be successfully verified to multiple identities. This operation, therefore, poses serious security issues related to the ability of a travel or identity document to be verified to belong to multiple persons. Previous works touched on the issue of the quality of morphing attac…

    Submitted 12 August, 2022; v1 submitted 11 August, 2022; originally announced August 2022.

    Comments: Accepted at IET Biometrics journal

  46. arXiv:2207.09649  [pdf, other]

    cs.CV

    GenText: Unsupervised Artistic Text Generation via Decoupled Font and Texture Manipulation

    Authors: Qirui Huang, Bin Fu, Aozhong Zhang, Yu Qiao

    Abstract: Automatic artistic text generation is an emerging topic which receives increasing attention due to its wide applications. The artistic text can be divided into three components, content, font, and texture, respectively. Existing artistic text generation models usually focus on manipulating one aspect of the above components, which is a sub-optimal solution for controllable general artistic text ge…

    Submitted 20 July, 2022; originally announced July 2022.

  47. arXiv:2206.05782  [pdf, other]

    eess.IV cs.CV cs.LG

    DSCA: A Dual-Stream Network with Cross-Attention on Whole-Slide Image Pyramids for Cancer Prognosis

    Authors: Pei Liu, Bo Fu, Feng Ye, Rui Yang, Bin Xu, Luping Ji

    Abstract: The cancer prognosis on gigapixel Whole-Slide Images (WSIs) has always been a challenging task. To further enhance WSI visual representations, existing methods have explored image pyramids, instead of single-resolution images, in WSIs. In spite of this, they still face two major problems: high computational cost and the unnoticed semantical gap in multi-resolution feature fusion. To tackle these p…

    Submitted 28 March, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

    Comments: 12 pages, 6 figures, 7 tables

    Journal ref: Expert Systems with Applications, 120280 (2023)

  48. arXiv:2205.14377  [pdf, other]

    cs.CV

    Enhancing Quality of Pose-varied Face Restoration with Local Weak Feature Sensing and GAN Prior

    Authors: Kai Hu, Yu Liu, Renhe Liu, Wei Lu, Gang Yu, Bin Fu

    Abstract: Facial semantic guidance (including facial landmarks, facial heatmaps, and facial parsing maps) and facial generative adversarial networks (GAN) prior have been widely used in blind face restoration (BFR) in recent years. Although existing BFR methods have achieved good performance in ordinary cases, these solutions have limited resilience when applied to face images with serious degradation and p…

    Submitted 14 June, 2023; v1 submitted 28 May, 2022; originally announced May 2022.

    Comments: pdfLaTeX 2021, 11 pages with 15 figures

    ACM Class: I.4.4

  49. arXiv:2205.01030  [pdf, other]

    eess.SP cs.AI cs.LG

    GMSS: Graph-Based Multi-Task Self-Supervised Learning for EEG Emotion Recognition

    Authors: Yang Li, Ji Chen, Fu Li, Boxun Fu, Hao Wu, Youshuo Ji, Yijin Zhou, Yi Niu, Guangming Shi, Wenming Zheng

    Abstract: Previous electroencephalogram (EEG) emotion recognition relies on single-task learning, which may lead to overfitting and learned emotion features lacking generalization. In this paper, a graph-based multi-task self-supervised learning model (GMSS) for EEG emotion recognition is proposed. GMSS has the ability to learn more general representations by integrating multiple self-supervised tasks, incl…

    Submitted 11 April, 2022; originally announced May 2022.

  50. arXiv:2204.04916  [pdf, ps, other]

    cs.CL

    A Token-level Contrastive Framework for Sign Language Translation

    Authors: Biao Fu, Peigen Ye, Liang Zhang, Pei Yu, Cong Hu, Yidong Chen, Xiaodong Shi

    Abstract: Sign Language Translation (SLT) is a promising technology to bridge the communication gap between the deaf and the hearing people. Recently, researchers have adopted Neural Machine Translation (NMT) methods, which usually require large-scale corpus for training, to achieve SLT. However, the publicly available SLT corpus is very limited, which causes the collapse of the token representations and th…

    Submitted 21 March, 2023; v1 submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to ICASSP 2023