Showing 1–1 of 1 results for author: Lerzer, N

  1. arXiv:2406.10421  [pdf, other]

    cs.CL

    SciEx: Benchmarking Large Language Models on Scientific Exams with Human Expert Grading and Automatic Grading

    Authors: Tu Anh Dinh, Carlos Mullov, Leonard Bärmann, Zhaolin Li, Danni Liu, Simon Reiß, Jueun Lee, Nathan Lerzer, Fabian Ternava, Jianfeng Gao, Tobias Röddiger, Alexander Waibel, Tamim Asfour, Michael Beigl, Rainer Stiefelhagen, Carsten Dachsbacher, Klemens Böhm, Jan Niehues

    Abstract: With the rapid development of Large Language Models (LLMs), it is crucial to have benchmarks which can evaluate the ability of LLMs on different domains. One common use of LLMs is performing tasks on scientific topics, such as writing algorithms, querying databases or giving mathematical proofs. Inspired by the way university students are evaluated on such tasks, in this paper, we propose SciEx -…

    Submitted 12 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

    ACM Class: I.2.7