
Showing 1–3 of 3 results for author: Sacheti, A

  1. arXiv:2106.09889 [pdf, other]

    cs.CL cs.CV cs.MM

    GEM: A General Evaluation Benchmark for Multimodal Tasks

    Authors: Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti

    Abstract: In this paper, we present GEM as a General Evaluation benchmark for Multimodal tasks. Different from existing datasets such as GLUE, SuperGLUE, XGLUE and XTREME that mainly focus on natural language tasks, GEM is a large-scale vision-language benchmark, which consists of GEM-I for image-language tasks and GEM-V for video-language tasks. Comparing with existing multimodal datasets such as MSCOCO an…

    Submitted 17 June, 2021; originally announced June 2021.

    Comments: Accepted by Findings of ACL 2021

  2. arXiv:2001.07966 [pdf, other]

    cs.CV

    ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data

    Authors: Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti

    Abstract: In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding. Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them. The model is pre-trained on four tasks simultaneously: Masked Language Modeling (MLM), Masked Object Classification (MOC), Masked Region Feature Regression (MRF…

    Submitted 23 January, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

  3. arXiv:1802.04914 [pdf, other]

    cs.CV

    Web-Scale Responsive Visual Search at Bing

    Authors: Houdong Hu, Yan Wang, Linjun Yang, Pavel Komlev, Li Huang, Xi Chen, Jiapei Huang, Ye Wu, Meenaz Merchant, Arun Sacheti

    Abstract: In this paper, we introduce a web-scale general visual search system deployed in Microsoft Bing. The system accommodates tens of billions of images in the index, with thousands of features for each image, and can respond in less than 200 ms. In order to overcome the challenges in relevance, latency, and scalability in such large scale of data, we employ a cascaded learning-to-rank framework based…

    Submitted 20 February, 2018; v1 submitted 13 February, 2018; originally announced February 2018.