
Showing 1–18 of 18 results for author: Bin, Y

  1. arXiv:2407.03788  [pdf, other]

    cs.CV cs.CL

    Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning

    Authors: Thong Nguyen, Yi Bin, Xiaobao Wu, Xinshuai Dong, Zhiyuan Hu, Khoi Le, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

    Abstract: Data quality stands at the forefront of deciding the effectiveness of video-language representation learning. However, video-text pairs in previous data typically do not align perfectly with each other, which might lead to video-language representations that do not accurately reflect cross-modal semantics. Moreover, previous data also possess an uneven distribution of concepts, thereby hampering t…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  2. arXiv:2406.17294  [pdf, other]

    cs.CL

    Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models

    Authors: Wenhao Shi, Zhiqiang Hu, Yi Bin, Junhua Liu, Yang Yang, See-Kiong Ng, Lidong Bing, Roy Ka-Wei Lee

    Abstract: Large language models (LLMs) have demonstrated impressive reasoning capabilities, particularly in textual mathematical problem-solving. However, existing open-source image instruction fine-tuning datasets, containing limited question-answer pairs per image, do not fully exploit visual information to enhance the multimodal mathematical reasoning capabilities of Multimodal LLMs (MLLMs). To bridge th…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages

  3. arXiv:2406.05615  [pdf, other]

    cs.CL

    Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

    Authors: Thong Nguyen, Yi Bin, Junbin Xiao, Leigang Qu, Yicong Li, Jay Zhangjie Wu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

    Abstract: Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses since they allow us to easily communicate our thoughts and perceive the world around us. There has been a lot of interest in creating video-language understanding systems with human-like senses since a video-language pair can mimic both our linguistic medium and visual environment with te…

    Submitted 1 July, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024 (Findings)

  4. arXiv:2404.05705  [pdf, other]

    cs.CV

    Learning 3D-Aware GANs from Unposed Images with Template Feature Field

    Authors: Xinya Chen, Hanlei Guo, Yanrui Bin, Shangzhan Zhang, Yuanbo Yang, Yue Wang, Yujun Shen, Yiyi Liao

    Abstract: Collecting accurate camera poses of training images has been shown to serve the learning of 3D-aware generative adversarial networks (GANs) well, yet can be quite expensive in practice. This work targets learning 3D-aware GANs from unposed images, for which we propose to perform on-the-fly pose estimation of training images with a learned template feature field (TeFF). Concretely, in addition to a…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: https://XDimlab.github.io/TeFF

  5. arXiv:2311.01807  [pdf, other]

    cs.SI

    Cross-modal Consistency Learning with Fine-grained Fusion Network for Multimodal Fake News Detection

    Authors: Jun Li, Yi Bin, Jie Zou, Jie Zou, Guoqing Wang, Yang Yang

    Abstract: Previous studies on multimodal fake news detection have observed the mismatch between text and images in the fake news and attempted to explore the consistency of multimodal news based on global features of different modalities. However, they fail to investigate this relationship between fine-grained fragments in multimodal content. To gain public trust, fake news often includes relevant parts in…

    Submitted 3 November, 2023; originally announced November 2023.

  6. arXiv:2310.12640  [pdf, other]

    cs.CL

    Non-Autoregressive Sentence Ordering

    Authors: Yi Bin, Wenhao Shi, Bin Ji, Jipeng Zhang, Yujuan Ding, Yang Yang

    Abstract: Existing sentence ordering approaches generally employ encoder-decoder frameworks with the pointer net to recover the coherence by recurrently predicting each sentence step-by-step. Such an autoregressive manner only leverages unilateral dependencies during decoding and cannot fully explore the semantic dependency between sentences for ordering. To overcome these limitations, in this paper, we pro…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted at Findings of EMNLP 2023

  7. arXiv:2310.09590  [pdf, other]

    cs.CL cs.AI

    Solving Math Word Problems with Reexamination

    Authors: Yi Bin, Wenhao Shi, Yujuan Ding, Yang Yang, See-Kiong Ng

    Abstract: Math word problem (MWP) solving aims to understand the descriptive math problem and calculate the result, for which previous efforts are mostly devoted to upgrading different technical modules. This paper brings a different perspective of a reexamination process during training by introducing a pseudo-dual task to enhance MWP solving. We propose a pseudo-dual (PseDual) learning scheme to…

    Submitted 19 November, 2023; v1 submitted 14 October, 2023; originally announced October 2023.

    Comments: To appear at the NeurIPS 2023 Workshop on MATH-AI

  8. arXiv:2309.04800  [pdf, other]

    cs.CV

    VeRi3D: Generative Vertex-based Radiance Fields for 3D Controllable Human Image Synthesis

    Authors: Xinya Chen, Jiaxin Huang, Yanrui Bin, Lu Yu, Yiyi Liao

    Abstract: Unsupervised learning of 3D-aware generative adversarial networks has lately made much progress. Some recent work demonstrates promising results of learning human generative models using neural articulated radiance fields, yet their generalization ability and controllability lag behind parametric human models, i.e., they do not perform well when generalizing to novel pose/shape and are not part co…

    Submitted 9 September, 2023; originally announced September 2023.

  9. arXiv:2308.04380  [pdf, other]

    cs.CV cs.IR cs.MM

    Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination

    Authors: Haoxuan Li, Yi Bin, Junrong Liao, Yang Yang, Heng Tao Shen

    Abstract: Most existing image-text matching methods adopt triplet loss as the optimization objective, and choosing a proper negative sample for the triplet of <anchor, positive, negative> is important for effectively training the model, e.g., hard negatives make the model learn efficiently and effectively. However, we observe that existing methods mainly employ the most similar samples as hard negatives, wh…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted at ACM MM 2023

  10. arXiv:2308.04343  [pdf, other]

    cs.CV cs.IR cs.MM

    Unifying Two-Stream Encoders with Transformers for Cross-Modal Retrieval

    Authors: Yi Bin, Haoxuan Li, Yahui Xu, Xing Xu, Yang Yang, Heng Tao Shen

    Abstract: Most existing cross-modal retrieval methods employ two-stream encoders with different architectures for images and texts, e.g., CNN for images and RNN/Transformer for texts. Such discrepancy in architectures may induce different semantic distribution spaces and limit the interactions between images and texts, and further result in inferior alignment between images and texts. To fill this…

    Submitted 8 August, 2023; originally announced August 2023.

    Comments: Accepted at ACM Multimedia 2023

  11. arXiv:2306.11746  [pdf, other]

    cs.SI cs.MM

    Focusing on Relevant Responses for Multi-modal Rumor Detection

    Authors: Jun Li, Yi Bin, Liang Peng, Yang Yang, Yangyang Li, Hao Jin, Zi Huang

    Abstract: In the absence of an authoritative statement about a rumor, people may expose the truth behind such rumor through their responses on social media. Most rumor detection methods aggregate the information of all the responses and have made great progress. However, due to the different backgrounds of users, the responses have different relevance for discovering the suspicious points hidden in a rumor c…

    Submitted 18 June, 2023; originally announced June 2023.

    Comments: Submitted to TKDE

  12. arXiv:2305.04556  [pdf, other]

    cs.CL cs.AI

    Non-Autoregressive Math Word Problem Solver with Unified Tree Structure

    Authors: Yi Bin, Mengqun Han, Wenhao Shi, Lei Wang, Yang Yang, See-Kiong Ng, Heng Tao Shen

    Abstract: Existing MWP solvers employ a sequence or binary tree to represent the solution expression and decode it from the given problem description. However, such structures fail to handle the variants that can be derived via mathematical manipulation, e.g., $(a_1+a_2) * a_3$ and $a_1 * a_3+a_2 * a_3$ can both be possible valid solutions for the same problem but formulated as different expression sequences or trees…

    Submitted 28 October, 2023; v1 submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023

  13. arXiv:2201.02062  [pdf]

    cs.NI

    Traffic Flow Modeling for UAV-Enabled Wireless Networks

    Authors: A. Abada, Y. Bin, T. Taleb

    Abstract: This paper investigates the traffic flow modeling issue in multi-services oriented unmanned aerial vehicle (UAV)-enabled wireless networks, which is critical for supporting various future applications of such networks. We propose a general traffic flow model for multi-services oriented UAV-enabled wireless networks. Under this model, we first classify the network services into three subsets: telemetry,…

    Submitted 5 January, 2022; originally announced January 2022.

  14. arXiv:2105.03072  [pdf, other]

    eess.IV cs.CV

    NTIRE 2021 Challenge on Perceptual Image Quality Assessment

    Authors: Jinjin Gu, Haoming Cai, Chao Dong, Jimmy S. Ren, Yu Qiao, Shuhang Gu, Radu Timofte, Manri Cheon, Sungjun Yoon, Byungyeon Kang, Junwoo Lee, Qing Zhang, Haiyang Guo, Yi Bin, Yuqing Hou, Hengliang Luo, Jingyu Guo, Zirui Wang, Hai Wang, Wenming Yang, Qingyan Bai, Shuwei Shi, Weihao Xia, Mingdeng Cao, Jiahao Wang, et al. (25 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2021 challenge on perceptual image quality assessment (IQA), held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2021. As a new type of image processing technology, perceptual image processing algorithms based on Generative Adversarial Networks (GAN) have produced images with more realistic textures. These o…

    Submitted 28 June, 2021; v1 submitted 7 May, 2021; originally announced May 2021.

  15. arXiv:2011.11221  [pdf, other]

    cs.CV

    Adversarial Refinement Network for Human Motion Prediction

    Authors: Xianjin Chao, Yanrui Bin, Wenqing Chu, Xuan Cao, Yanhao Ge, Chengjie Wang, Jilin Li, Feiyue Huang, Howard Leung

    Abstract: Human motion prediction aims to predict future 3D skeletal sequences given limited human motion as input. Two popular methods, recurrent neural networks and feed-forward deep networks, are able to predict rough motion trends, but motion details such as limb movement may be lost. To predict more accurate future human motion, we propose an Adversarial Refinement Network (ARNet) following a sim…

    Submitted 23 November, 2020; v1 submitted 23 November, 2020; originally announced November 2020.

    Comments: Accepted by ACCV 2020 (Oral)

  16. arXiv:2008.00697  [pdf, other]

    cs.CV

    Adversarial Semantic Data Augmentation for Human Pose Estimation

    Authors: Yanrui Bin, Xuan Cao, Xinya Chen, Yanhao Ge, Ying Tai, Chengjie Wang, Jilin Li, Feiyue Huang, Changxin Gao, Nong Sang

    Abstract: Human pose estimation is the task of localizing body keypoints from still images. The state-of-the-art methods suffer from insufficient examples of challenging cases such as symmetric appearance, heavy occlusion and nearby persons. To increase the number of challenging cases, previous methods augmented images by cropping and pasting image patches with weak semantics, which leads to unrealistic appe…

    Submitted 3 August, 2020; originally announced August 2020.

  17. arXiv:2005.09816  [pdf, other]

    cs.CV

    Relevant Region Prediction for Crowd Counting

    Authors: Xinya Chen, Yanrui Bin, Changxin Gao, Nong Sang, Hao Tang

    Abstract: Crowd counting is a widely studied and challenging task in computer vision. Existing density-map-based methods excessively focus on the individuals' localization, which harms the crowd counting performance in highly congested scenes. In addition, the dependency between the regions of different density is also ignored. In this paper, we propose Relevant Region Prediction (RRP) for crowd counting, which c…

    Submitted 19 May, 2020; originally announced May 2020.

    Comments: Accepted by Neurocomputing

  18. arXiv:1606.04631  [pdf, other]

    cs.MM cs.CL

    Bidirectional Long-Short Term Memory for Video Description

    Authors: Yi Bin, Yang Yang, Zi Huang, Fumin Shen, Xing Xu, Heng Tao Shen

    Abstract: Video captioning has been attracting broad research attention in the multimedia community. However, most existing approaches either ignore temporal information among video frames or just employ local contextual temporal knowledge. In this work, we propose a novel video captioning framework, termed Bidirectional Long-Short Term Memory (BiLSTM), which deeply captures bidirectional global tempo…

    Submitted 14 June, 2016; originally announced June 2016.

    Comments: 5 pages