Skip to main content

Showing 1–2 of 2 results for author: Swift, C

  1. arXiv:2311.12983  [pdf, other

    cs.CL cs.AI

    GAIA: a benchmark for General AI Assistants

    Authors: Grégoire Mialon, Clémentine Fourrier, Craig Swift, Thomas Wolf, Yann LeCun, Thomas Scialom

    Abstract: We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human r… ▽ More

    Submitted 21 November, 2023; originally announced November 2023.

  2. arXiv:2311.10538  [pdf, other

    cs.AI

    Testing Language Model Agents Safely in the Wild

    Authors: Silen Naihin, David Atkinson, Marc Green, Merwane Hamadi, Craig Swift, Douglas Schonholtz, Adam Tauman Kalai, David Bau

    Abstract: A prerequisite for safe autonomy-in-the-wild is safe testing-in-the-wild. Yet real-world autonomous tests face several unique safety challenges, both due to the possibility of causing harm during a test, as well as the risk of encountering new unsafe agent behavior through interactions with real-world and potentially malicious actors. We propose a framework for conducting safe autonomous agent tes… ▽ More

    Submitted 3 December, 2023; v1 submitted 17 November, 2023; originally announced November 2023.