Showing 1–4 of 4 results for author: Bhola, I

  1. arXiv:2405.15341  [pdf, other]

    cs.AI cs.CV

    V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

    Authors: Abdur Rahman, Rajat Chawla, Muskaan Kumar, Arkajit Datta, Adarsh Jha, Mukunda NS, Ishaan Bhola

    Abstract: In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs). Despite these advancements, the nuanced interaction and understanding of GUIs pose a significant challenge, limiting th…

    Submitted 24 May, 2024; originally announced May 2024.

  2. arXiv:2404.16048  [pdf, other]

    cs.HC cs.AI

    GUIDE: Graphical User Interface Data for Execution

    Authors: Rajat Chawla, Adarsh Jha, Muskaan Kumar, Mukunda NS, Ishaan Bhola

    Abstract: In this paper, we introduce GUIDE, a novel dataset tailored for the advancement of Multimodal Large Language Model (MLLM) applications, particularly focusing on Robotic Process Automation (RPA) use cases. Our dataset encompasses diverse data from various websites including Apollo (62.67%), Gmail (3.43%), Calendar (10.98%) and Canva (22.92%). Each data entry includes an image, a task description, t…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 11 pages, 8 figures, 3 Tables and 1 Algorithm

  3. arXiv:2403.10171  [pdf]

    cs.AI cs.CV

    AUTONODE: A Neuro-Graphic Self-Learnable Engine for Cognitive GUI Automation

    Authors: Arkajit Datta, Tushar Verma, Rajat Chawla, Mukunda N. S, Ishaan Bhola

    Abstract: In recent advancements within the domain of Large Language Models (LLMs), there has been a notable emergence of agents capable of addressing Robotic Process Automation (RPA) challenges through enhanced cognitive capabilities and sophisticated reasoning. This development heralds a new era of scalability and human-like adaptability in goal attainment. In this context, we introduce AUTONODE (Autonomo…

    Submitted 27 May, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted in MIPR-2024

  4. arXiv:2403.08773  [pdf, other]

    cs.CV cs.AI cs.CL cs.MM

    Veagle: Advancements in Multimodal Representation Learning

    Authors: Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola

    Abstract: Lately, researchers in artificial intelligence have been really interested in how language and vision come together, giving rise to the development of multimodal models that aim to seamlessly integrate textual and visual information. Multimodal models, an extension of Large Language Models (LLMs), have exhibited remarkable capabilities in addressing a diverse array of tasks, ranging from image cap…

    Submitted 18 January, 2024; originally announced March 2024.