Showing 1–11 of 11 results for author: Shan, D

  1. arXiv:2404.04924

    cs.CV cs.AI

    GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets

    Authors: Dongjing Shan, Guiqiang Chen

    Abstract: Vision Transformers (ViTs) have achieved impressive results in large-scale image classification. However, when training from scratch on small datasets, there is still a significant performance gap between ViTs and Convolutional Neural Networks (CNNs), which is attributed to the lack of inductive bias. To address this issue, we propose a Graph-based Vision Transformer (GvT) that utilizes graph conv…

    Submitted 7 April, 2024; originally announced April 2024.

  2. arXiv:2402.02029

    cs.CV cs.AI cs.LG

    ScribFormer: Transformer Makes CNN Work Better for Scribble-based Medical Image Segmentation

    Authors: Zihan Li, Yuan Zheng, Dandan Shan, Shuzhou Yang, Qingde Li, Beizhan Wang, Yuanting Zhang, Qingqi Hong, Dinggang Shen

    Abstract: Most recent scribble-supervised segmentation methods commonly adopt a CNN framework with an encoder-decoder architecture. Despite its multiple benefits, this framework generally can only capture small-range feature dependency for the convolutional layer with the local receptive field, which makes it difficult to learn global shape information from the limited information provided by scribble annot…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: Accepted by IEEE Transactions on Medical Imaging (TMI)

  3. arXiv:2312.05251

    cs.CV

    Reconstructing Hands in 3D with Transformers

    Authors: Georgios Pavlakos, Dandan Shan, Ilija Radosavovic, Angjoo Kanazawa, David Fouhey, Jitendra Malik

    Abstract: We present an approach that can reconstruct hands in 3D from monocular input. Our approach for Hand Mesh Recovery, HaMeR, follows a fully transformer-based architecture and can analyze hands with significantly increased accuracy and robustness compared to previous work. The key to HaMeR's success lies in scaling up both the data used for training and the capacity of the deep network for hand recon…

    Submitted 8 December, 2023; originally announced December 2023.

  4. arXiv:2307.16226

    cs.CV cs.MM

    ScribbleVC: Scribble-supervised Medical Image Segmentation with Vision-Class Embedding

    Authors: Zihan Li, Yuan Zheng, Xiangde Luo, Dandan Shan, Qingqi Hong

    Abstract: Medical image segmentation plays a critical role in clinical decision-making, treatment planning, and disease monitoring. However, accurate segmentation of medical images is challenging due to several factors, such as the lack of high-quality annotation, imaging noise, and anatomical differences across patients. In addition, there is still a considerable gap in performance between the existing lab…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: Accepted by ACM MM 2023, project page: https://github.com/HUANGLIZI/ScribbleVC

  5. arXiv:2303.00279

    eess.IV cs.CL cs.CV cs.IR

    Coarse-to-Fine Covid-19 Segmentation via Vision-Language Alignment

    Authors: Dandan Shan, Zihan Li, Wentao Chen, Qingde Li, Jie Tian, Qingqi Hong

    Abstract: Segmentation of COVID-19 lesions can assist physicians in better diagnosis and treatment of COVID-19. However, there are few relevant studies due to the lack of detailed information and high-quality annotation in the COVID-19 dataset. To solve the above problem, we propose C2FVL, a Coarse-to-Fine segmentation framework via Vision-Language alignment to merge text information containing the number o…

    Submitted 1 March, 2023; originally announced March 2023.

    Comments: Accepted by ICASSP 2023

  6. arXiv:2209.13064

    cs.CV cs.AI cs.LG

    EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations

    Authors: Ahmad Darkhalil, Dandan Shan, Bin Zhu, Jian Ma, Amlan Kar, Richard Higgins, Sanja Fidler, David Fouhey, Dima Damen

    Abstract: We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transf…

    Submitted 26 September, 2022; originally announced September 2022.

    Comments: 10 pages main, 38 pages appendix. Accepted at NeurIPS 2022 Track on Datasets and Benchmarks. Data, code and leaderboards from: http://epic-kitchens.github.io/VISOR

  7. arXiv:2202.08138

    cs.CV cs.CL

    When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs

    Authors: Oana Ignat, Santiago Castro, Yuhang Zhou, Jiajun Bao, Dandan Shan, Rada Mihalcea

    Abstract: We consider the task of temporal human action localization in lifestyle vlogs. We introduce a novel dataset consisting of manual annotations of temporal localization for 13,000 narrated actions in 1,200 video clips. We present an extensive analysis of this data, which allows us to better understand how the language and visual modalities interact throughout the videos. We propose a simple yet effec…

    Submitted 21 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: text overlap with arXiv:1906.04236

  8. arXiv:2006.06669

    cs.CV

    Understanding Human Hands in Contact at Internet Scale

    Authors: Dandan Shan, Jiaqi Geng, Michelle Shu, David F. Fouhey

    Abstract: Hands are the central means by which humans manipulate their world and being able to reliably extract hand state information from Internet videos of humans engaged in their hands has the potential to pave the way to systems that can learn from petabytes of video data. This paper proposes steps towards this by inferring a rich representation of hands engaged in interaction method that includes: han…

    Submitted 11 June, 2020; originally announced June 2020.

    Comments: To appear at CVPR 2020 (Oral). Project and dataset webpage: http://fouheylab.eecs.umich.edu/~dandans/projects/100DOH/

  9. arXiv:1806.05406

    cs.NI

    Micro Congestion Control: Every Flow Deserves a Second Chance

    Authors: Kefan Chen, Danfeng Shan, Xiaohui Luo, Tong Zhang, Yajun Yang, Ya Zhao, Fengyuan Ren

    Abstract: Today, considerable Internet traffic is sent from the datacenter and heads for users. The characteristics of connections served by servers in datacenters are usually diverse and varied over time, with continuous upgrades in network infrastructure and user devices. As a result, a specific congestion control algorithm hardly accommodates the heterogeneity and performs well in various scenarios. In t…

    Submitted 22 October, 2018; v1 submitted 14 June, 2018; originally announced June 2018.

  10. arXiv:1604.07621

    cs.NI

    Micro-burst in Data Centers: Observations, Implications, and Applications

    Authors: Danfeng Shan, Fengyuan Ren, Peng Cheng, Ran Shu

    Abstract: Micro-burst traffic is not uncommon in data centers. It can cause packet dropping, which results in serious performance degradation (e.g., Incast problem). However, current solutions that attempt to suppress micro-burst traffic are extrinsic and ad hoc, since they lack the comprehensive and essential understanding of micro-burst's root cause and dynamic behavior. On the other hand, traditional stu…

    Submitted 26 April, 2016; originally announced April 2016.

    Comments: 14 pages, 18 figures

  11. A General SIMD-based Approach to Accelerating Compression Algorithms

    Authors: Wayne Xin Zhao, Xudong Zhang, Daniel Lemire, Dongdong Shan, Jian-Yun Nie, Hongfei Yan, Ji-Rong Wen

    Abstract: Compression algorithms are important for data oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets, provide us an opportunity for achieving better compression performance. Previous research has shown that SIMD-based optimizations can multiply decoding speeds. Following these pioneering studies, we propose a general approach to accelerate…

    Submitted 6 February, 2015; originally announced February 2015.

    ACM Class: E.4; H.3.1; C.1.2

    Journal ref: ACM Trans. Inf. Syst. 33, 3, Article 15 (March 2015)