
Showing 1–5 of 5 results for author: Bouyarmane, K

  1. arXiv:2406.02987

    cs.CV

    Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment

    Authors: Wenliang Zhong, Wenyi Wu, Qi Li, Rob Barton, Boxin Du, Shioulin Sam, Karim Bouyarmane, Ismail Tutar, Junzhou Huang

    Abstract: Multimodal Large Language Models (MLLMs) have achieved SOTA performance in various visual language tasks by fusing visual representations with LLMs through visual adapters. In this paper, we first establish that adapters using query-based Transformers such as Q-former are a simplified Multi-instance Learning method that does not consider instance heterogeneity/correlation. We then propose…

    Submitted 5 June, 2024; originally announced June 2024.

  2. arXiv:2406.00069

    cs.CL cs.LG

    Confidence-Aware Sub-Structure Beam Search (CABS): Mitigating Hallucination in Structured Data Generation with Large Language Models

    Authors: Chengwei Wei, Kee Kiat Koo, Amir Tavanaei, Karim Bouyarmane

    Abstract: Large Language Models (LLMs) have facilitated structured data generation, with applications in domains like tabular data, document databases, product catalogs, etc. However, concerns persist about generation veracity due to incorrect references or hallucinations, necessitating the incorporation of some form of model confidence for mitigation. Existing confidence estimation methods on LLM generatio…

    Submitted 30 May, 2024; originally announced June 2024.

  3. arXiv:2401.13795

    cs.CV

    Diffuse to Choose: Enriching Image Conditioned Inpainting in Latent Diffusion Models for Virtual Try-All

    Authors: Mehmet Saygin Seyfioglu, Karim Bouyarmane, Suren Kumar, Amir Tavanaei, Ismail B. Tutar

    Abstract: As online shopping grows, the ability for buyers to virtually visualize products in their own settings, a phenomenon we define as "Virtual Try-All", has become crucial. Recent diffusion models inherently contain a world model, rendering them suitable for this task within an inpainting context. However, traditional image-conditioned diffusion models often fail to capture the fine-grained details of…

    Submitted 24 January, 2024; originally announced January 2024.

  4. arXiv:2308.16354

    cs.CV

    Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications

    Authors: Wenyi Wu, Karim Bouyarmane, Ismail Tutar

    Abstract: We present Catalog Phrase Grounding (CPG), a model that can associate product textual data (title, brands) with corresponding regions of product images (isolated product region, brand logo region) for e-commerce vision-language applications. We use a state-of-the-art modulated multimodal transformer encoder-decoder architecture unifying object detection and phrase grounding. We train the model in…

    Submitted 30 August, 2023; originally announced August 2023.

    Comments: KDD 2022 Workshop on First Content Understanding and Generation for e-Commerce

  5. arXiv:2305.01257

    cs.CV cs.AI

    DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling

    Authors: Mehmet Saygin Seyfioglu, Karim Bouyarmane, Suren Kumar, Amir Tavanaei, Ismail B. Tutar

    Abstract: We introduce DreamPaint, a framework to intelligently inpaint any e-commerce product on any user-provided context image. The context image can be, for example, the user's own image for virtual try-on of clothes from the e-commerce catalog, or an image of the user's room for virtual try-on of a piece of furniture from the catalog in that room. As opposed to previous augmented-…

    Submitted 2 May, 2023; originally announced May 2023.