
Showing 1–30 of 30 results for author: Burtsev, M

  1. arXiv:2407.04841

    cs.CL cs.AI cs.LG

    Associative Recurrent Memory Transformer

    Authors: Ivan Rodkin, Yuri Kuratov, Aydar Bulatov, Mikhail Burtsev

    Abstract: This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We dem…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: ICML 2024 Next Generation of Sequence Modeling Architectures Workshop

    ACM Class: I.2.7

  2. arXiv:2407.04363

    cs.AI

    AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents

    Authors: Petr Anokhin, Nikita Semenov, Artyom Sorokin, Dmitry Evseev, Mikhail Burtsev, Evgeny Burnaev

    Abstract: Advancements in generative AI have broadened the potential applications of Large Language Models (LLMs) in the development of autonomous agents. Achieving true autonomy requires accumulating and updating knowledge gained from interactions with the environment and effectively utilizing it. Current LLM-based approaches leverage past experiences using a full history of observations, summarization or…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: Code for this work is available at https://github.com/AIRI-Institute/AriGraph

  3. Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task

    Authors: Alsu Sagirova, Mikhail Burtsev

    Abstract: Even though Transformers are extensively used for Natural Language Processing tasks, especially for machine translation, they lack an explicit memory to store key concepts of processed texts. This paper explores the properties of the content of symbolic working memory added to the Transformer model decoder. Such working memory enhances the quality of model predictions in machine translation task a…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages, 6 figures. Published in the journal Cognitive Systems Research 3 June 2022: https://www.sciencedirect.com/science/article/abs/pii/S1389041722000274

    Journal ref: Cognitive Systems Research, Volume 75, 2022, Pages 16-24, ISSN 1389-0417

  4. arXiv:2406.10149

    cs.CL cs.AI

    BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

    Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev

    Abstract: In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods have not kept pace, failing to comprehensively assess the efficiency of models in handling long contexts. To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely long doc…

    Submitted 14 June, 2024; originally announced June 2024.

  5. arXiv:2402.10790

    cs.CL cs.AI cs.LG

    In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

    Authors: Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev

    Abstract: This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for seque…

    Submitted 20 February, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

    Comments: 11M tokens, fix qa3 min facts per task in Table 1
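    The benchmark idea of hiding the facts a question depends on inside long distractor text can be illustrated with a small sketch. It assumes the facts are placed, in their original order, at random positions among filler sentences that stand in for long background text; the function `build_haystack` and the toy sentences are hypothetical and are not taken from the BABILong code.

```python
import random
from typing import List


def build_haystack(facts: List[str], filler: List[str], seed: int = 0) -> List[str]:
    """Hypothetical sketch: scatter task facts among filler sentences at
    random positions while keeping both lists in their original order."""
    rng = random.Random(seed)
    total = len(facts) + len(filler)
    # Pick which positions of the combined text will hold facts.
    fact_slots = set(rng.sample(range(total), len(facts)))
    fact_iter, filler_iter = iter(facts), iter(filler)
    return [next(fact_iter) if i in fact_slots else next(filler_iter) for i in range(total)]


facts = ["Mary moved to the bathroom.", "John went to the hallway."]
filler = [f"Background sentence {i}." for i in range(10)]
haystack = build_haystack(facts, filler, seed=42)
question = "Where is Mary?"  # answering requires locating the first fact
```

    Growing `filler` while keeping `facts` fixed lengthens the context without changing the answer, which is what lets such a benchmark probe how performance degrades with length.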

  6. arXiv:2311.18151

    cs.CL

    Uncertainty Guided Global Memory Improves Multi-Hop Question Answering

    Authors: Alsu Sagirova, Mikhail Burtsev

    Abstract: Transformers have become the gold standard for many natural language processing tasks and, in particular, for multi-hop question answering (MHQA). This task includes processing a long document and reasoning over multiple parts of it. The landscape of MHQA approaches can be classified into two primary categories. The first group focuses on extracting supporting evidence, thereby constraining th…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: 12 pages, 7 figures. EMNLP 2023. Our code is available at https://github.com/Aloriosa/GEMFormer

  7. arXiv:2311.01326

    cs.CL cs.AI

    Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information

    Authors: Alla Chepurova, Aydar Bulatov, Yuri Kuratov, Mikhail Burtsev

    Abstract: Real-world Knowledge Graphs (KGs) often suffer from incompleteness, which limits their potential performance. Knowledge Graph Completion (KGC) techniques aim to address this issue. However, traditional KGC methods are computationally intensive and impractical for large-scale KGs, necessitating the learning of dense node embeddings and computing pairwise distances. Generative transformer-based lang…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to Findings of the Association for Computational Linguistics: EMNLP 2023

  8. arXiv:2306.07797

    cs.CL cs.AI

    Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification

    Authors: Dmitry Karpov, Mikhail Burtsev

    Abstract: This article investigates the knowledge transfer from the RuQTopics dataset. This Russian topical dataset combines a large sample number (361,560 single-label, 170,930 multi-label) with extensive class coverage (76 classes). We have prepared this dataset from the "Yandex Que" raw data. By evaluating the RuQTopics-trained models on the six matching classes of the Russian MASSIVE subset, we have p…

    Submitted 4 July, 2023; v1 submitted 13 June, 2023; originally announced June 2023.

  9. arXiv:2304.11062

    cs.CL cs.AI cs.LG

    Scaling Transformer to 1M tokens and beyond with RMT

    Authors: Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev

    Abstract: A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer models to extend input context length while linearly scaling compute. Our approach demonstrates the capability to store information in memory for sequences of up…

    Submitted 6 February, 2024; v1 submitted 19 April, 2023; originally announced April 2023.
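    The contrast between quadratic and linear compute can be made concrete with rough arithmetic. Assuming the dominant cost of a layer is proportional to the square of the number of positions attended over, full attention on n tokens costs about n², while processing segments of s tokens together with m memory tokens costs about ceil(n/s)·(s + m)², which grows linearly in n. The values s = 512 and m = 10 below are illustrative assumptions, not settings reported in the paper.

```python
import math


def full_attention_cost(n: int) -> int:
    # Pairwise attention scores over the whole sequence: n * n.
    return n * n


def segmented_cost(n: int, segment: int, memory: int) -> int:
    # Each segment attends over its own tokens plus the memory tokens;
    # only the number of segments grows with n.
    num_segments = math.ceil(n / segment)
    return num_segments * (segment + memory) ** 2


for n in (4_096, 65_536, 1_048_576):
    ratio = full_attention_cost(n) / segmented_cost(n, segment=512, memory=10)
    print(f"n={n:>9,}: full/segmented cost ratio = {ratio:,.1f}")
```

    For a fixed segment size the ratio grows roughly in proportion to n, which is the sense in which the recurrent approach scales linearly while full attention scales quadratically.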

  10. arXiv:2301.03252

    cs.CL

    Active Learning for Abstractive Text Summarization

    Authors: Akim Tsvigun, Ivan Lysenko, Danila Sedashov, Ivan Lazichny, Eldar Damirov, Vladimir Karlov, Artemy Belousov, Leonid Sanochkin, Maxim Panov, Alexander Panchenko, Mikhail Burtsev, Artem Shelmanov

    Abstract: Construction of human-curated annotated datasets for abstractive text summarization (ATS) is very time-consuming and expensive because creating each instance requires a human annotator to read a long document and compose a shorter summary that would preserve the key information relayed by the original document. Active Learning (AL) is a technique developed to reduce the amount of annotation requir…

    Submitted 9 January, 2023; originally announced January 2023.

    Comments: Accepted at EMNLP-2022 Findings

  11. arXiv:2211.06552

    cs.CL cs.AI

    Collecting Interactive Multi-modal Datasets for Grounded Language Understanding

    Authors: Shrestha Mohanty, Negar Arabzadeh, Milagro Teruel, Yuxuan Sun, Artem Zholus, Alexey Skrynnik, Mikhail Burtsev, Kavya Srinet, Aleksandr Panov, Arthur Szlam, Marc-Alexandre Côté, Julia Kiseleva

    Abstract: Human intelligence can adapt remarkably quickly to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research which can enable similar capabilities in machines, we made the following contributions: (1) formalized the co…

    Submitted 21 March, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

    Journal ref: Interactive Learning for Natural Language Processing NeurIPS 2022 Workshop

  12. arXiv:2211.00688

    cs.AI cs.CL

    Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions

    Authors: Alexey Skrynnik, Zoya Volovikova, Marc-Alexandre Côté, Anton Voronov, Artem Zholus, Negar Arabzadeh, Shrestha Mohanty, Milagro Teruel, Ahmed Awadallah, Aleksandr Panov, Mikhail Burtsev, Julia Kiseleva

    Abstract: The adoption of pre-trained language models to generate action plans for embodied agents is a promising research strategy. However, execution of instructions in real or simulated environments requires verification of the feasibility of actions as well as their relevance to the completion of a goal. We propose a new method that combines a language model and reinforcement learning for the task of bu…

    Submitted 1 November, 2022; originally announced November 2022.

    Comments: 6 pages, 3 figures

  13. arXiv:2207.13649

    cs.LG cs.AI

    Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes

    Authors: Artyom Sorokin, Nazar Buzun, Leonid Pugachev, Mikhail Burtsev

    Abstract: In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. This requires storing prohibitively large intermediate data if a sequence consists of thousands or even millions of elements, and…

    Submitted 30 November, 2022; v1 submitted 27 July, 2022; originally announced July 2022.

    Journal ref: Advances in Neural Information Processing Systems 35 (NeurIPS 2022)

  14. arXiv:2207.06881

    cs.CL cs.LG

    Recurrent Memory Transformer

    Authors: Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev

    Abstract: Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows combining information from all sequence elements into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-…

    Submitted 8 December, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
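    The segment-level recurrence behind the Recurrent Memory Transformer can be outlined as a loop: split the input into segments, process each segment together with a small set of memory vectors, and pass the outputs at the memory positions on to the next segment. This sketch assumes one simple arrangement, with memory prepended to each segment and read back from the same positions (decoder variants may place read and write memory differently), and `toy_block` is a hypothetical stand-in for a transformer, not an actual model.

```python
from typing import Callable, List

Vector = List[float]


def toy_block(tokens: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a transformer layer: each output is its input
    plus the mean of all inputs, so every position sees the whole
    (memory + segment) window, as self-attention would."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[t[d] + mean[d] for d in range(dim)] for t in tokens]


def recurrent_memory_forward(
    sequence: List[Vector],
    memory: List[Vector],
    segment_len: int,
    block: Callable[[List[Vector]], List[Vector]] = toy_block,
) -> List[Vector]:
    """Process `sequence` segment by segment and return the final memory."""
    num_mem = len(memory)
    for start in range(0, len(sequence), segment_len):
        segment = sequence[start:start + segment_len]
        # Prepend the current memory to the segment and run the block.
        outputs = block(memory + segment)
        # Outputs at the memory positions become the next segment's memory.
        memory = outputs[:num_mem]
    return memory


seq = [[float(i), 1.0] for i in range(10)]
final_memory = recurrent_memory_forward(seq, memory=[[0.0, 0.0]], segment_len=4)
```

    Because memory is the only state carried between segments, each call to `block` sees at most `len(memory) + segment_len` vectors, however long `sequence` is.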

  15. arXiv:2205.13771

    cs.CL

    IGLU 2022: Interactive Grounded Language Understanding in a Collaborative Environment at NeurIPS 2022

    Authors: Julia Kiseleva, Alexey Skrynnik, Artem Zholus, Shrestha Mohanty, Negar Arabzadeh, Marc-Alexandre Côté, Mohammad Aliannejadi, Milagro Teruel, Ziming Li, Mikhail Burtsev, Maartje ter Hoeve, Zoya Volovikova, Aleksandr Panov, Yuxuan Sun, Kavya Srinet, Arthur Szlam, Ahmed Awadallah

    Abstract: Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: text overlap with arXiv:2110.06536

  16. arXiv:2205.02388

    cs.CL cs.AI

    Interactive Grounded Language Understanding in a Collaborative Environment: IGLU 2021

    Authors: Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Marc-Alexandre Côté, Katja Hofmann, Ahmed Awadallah, Linar Abdrazakov, Igor Churin, Putra Manggala, Kata Naszadi, Michiel van der Meer, Taewoon Kim

    Abstract: Human intelligence has the remarkable ability to quickly adapt to new tasks and environments. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Co…

    Submitted 27 May, 2022; v1 submitted 4 May, 2022; originally announced May 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.06536

    Journal ref: Proceedings of Machine Learning Research NeurIPS 2021 Competition and Demonstration Track

  17. arXiv:2205.02340

    cs.CL cs.LG

    Knowledge Distillation of Russian Language Models with Reduction of Vocabulary

    Authors: Alina Kolesnikova, Yuri Kuratov, Vasily Konovalov, Mikhail Burtsev

    Abstract: Today, transformer language models serve as a core component for the majority of natural language processing tasks. Industrial application of such models requires minimization of computation time and memory footprint. Knowledge distillation is one of the approaches to address this goal. Existing methods in this field are mainly focused on reducing the number of layers or the dimension of embeddings/hidden rep…

    Submitted 4 May, 2022; originally announced May 2022.
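    For readers unfamiliar with the technique, the standard soft-target distillation objective (Hinton et al., 2015) can be sketched in a few lines: the student is trained to match the teacher's temperature-softened output distribution. This is a generic illustration, not the paper's method; it assumes teacher and student produce logits over the same label set, does not model the vocabulary reduction studied here, and the temperature of 2.0 and the example logits are arbitrary choices.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) between temperature-softened distributions,
    multiplied by T**2 so gradient magnitudes stay comparable across T."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * (math.log(p) - math.log(q))
             for p, q in zip(p_teacher, p_student) if p > 0)
    return temperature ** 2 * kl


teacher = [2.0, 0.5, -1.0]
student = [1.5, 0.7, -0.5]
loss = distillation_loss(student, teacher)
```

    In practice this term is usually combined with the ordinary cross-entropy on gold labels, with the mixing weight left as another hyperparameter.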

  18. arXiv:2110.06536

    cs.AI

    NeurIPS 2021 Competition IGLU: Interactive Grounded Language Understanding in a Collaborative Environment

    Authors: Julia Kiseleva, Ziming Li, Mohammad Aliannejadi, Shrestha Mohanty, Maartje ter Hoeve, Mikhail Burtsev, Alexey Skrynnik, Artem Zholus, Aleksandr Panov, Kavya Srinet, Arthur Szlam, Yuxuan Sun, Katja Hofmann, Michel Galley, Ahmed Awadallah

    Abstract: Human intelligence has the remarkable ability to adapt to new tasks and environments quickly. Starting from a very young age, humans acquire new skills and learn how to solve new tasks either by imitating the behavior of others or by following provided natural language instructions. To facilitate research in this direction, we propose IGLU: Interactive Grounded Language Understanding in a Collabor… ▽ More

    Submitted 14 October, 2021; v1 submitted 13 October, 2021; originally announced October 2021.

  19. arXiv:2109.05794

    cs.CL cs.AI cs.IR

    Building and Evaluating Open-Domain Dialogue Corpora with Clarifying Questions

    Authors: Mohammad Aliannejadi, Julia Kiseleva, Aleksandr Chuklin, Jeffrey Dalton, Mikhail Burtsev

    Abstract: Enabling open-domain dialogue systems to ask clarifying questions when appropriate is an important direction for improving the quality of the system response. Namely, for cases when a user request is not specific enough for a conversation system to provide an answer right away, it is desirable to ask a clarifying question to increase the chances of retrieving a satisfying answer. To address the pr…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted in EMNLP 2021

  20. arXiv:2107.10342

    cs.CL cs.NE

    Multi-Stream Transformers

    Authors: Mikhail Burtsev, Anna Rumshisky

    Abstract: Transformer-based encoder-decoder models produce a fused token-wise representation after every encoder layer. We investigate the effects of allowing the encoder to preserve and explore alternative hypotheses, combined at the end of the encoding process. To that end, we design and examine a Multi-stream Transformer architecture and find that splitting the Transformer encoder into multipl…

    Submitted 21 July, 2021; originally announced July 2021.

  21. arXiv:2102.00541

    cs.CL

    Short Text Clustering with Transformers

    Authors: Leonid Pugachev, Mikhail Burtsev

    Abstract: Recent techniques for the task of short text clustering often rely on word embeddings as a transfer learning component. This paper shows that sentence vector representations from Transformers in conjunction with different clustering methods can be successfully applied to address the task. Furthermore, we demonstrate that the algorithm of enhancement of clustering via iterative classification can f…

    Submitted 31 January, 2021; originally announced February 2021.
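    The basic pipeline in this abstract, a standard clustering method applied to sentence vectors from a Transformer encoder, can be illustrated with plain k-means. In this sketch the 2-D `embeddings` are made-up stand-ins for real sentence vectors, k-means is one assumed choice among the "different clustering methods" the abstract mentions, and the iterative-classification enhancement is not shown.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iterations: int = 20) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means; centroids are initialised from the first k points
    (a simplification; random or k-means++ initialisation is more common)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical 2-D "sentence embeddings" for six short texts.
embeddings = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
labels, centroids = kmeans(embeddings, k=2)
```

    With real encoders the vectors have hundreds of dimensions, but the assignment and update steps are unchanged.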

  22. arXiv:2009.11352

    cs.CL cs.IR

    ConvAI3: Generating Clarifying Questions for Open-Domain Dialogue Systems (ClariQ)

    Authors: Mohammad Aliannejadi, Julia Kiseleva, Aleksandr Chuklin, Jeff Dalton, Mikhail Burtsev

    Abstract: This document presents a detailed description of the challenge on clarifying questions for dialogue systems (ClariQ). The challenge is organized as part of the Conversational AI challenge series (ConvAI3) at Search Oriented Conversational AI (SCAI) EMNLP workshop in 2020. The main aim of the conversational systems is to return an appropriate answer in response to the user requests. However, some u…

    Submitted 23 September, 2020; originally announced September 2020.

  23. arXiv:2006.11527

    cs.CL cs.LG cs.NE

    Memory Transformer

    Authors: Mikhail S. Burtsev, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov

    Abstract: Transformer-based models have achieved state-of-the-art results in many natural language processing tasks. The self-attention architecture allows a transformer to combine information from all elements of a sequence into context-aware representations. However, information about the context is stored mostly in the same element-wise representations. This might limit the processing of properties related…

    Submitted 16 February, 2021; v1 submitted 20 June, 2020; originally announced June 2020.
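    The problem named in this abstract, context stored only in element-wise representations, suggests giving the model extra storage positions. A minimal way to sketch that is to prepend memory vectors to the token sequence before self-attention, so memory and tokens can read from each other. This is an illustration under simplifying assumptions (one head, identity query/key/value projections, no feed-forward sublayer, hand-picked vectors), and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def self_attention(tokens: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity
    projections (Q = K = V = tokens) to keep the sketch short."""
    dim = len(tokens[0])
    outputs = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return outputs


def memory_layer(memory: List[Vector], tokens: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Prepend memory vectors so tokens can read from them and they can
    aggregate information from the tokens; return (memory, tokens) outputs."""
    out = self_attention(memory + tokens)
    return out[:len(memory)], out[len(memory):]


new_memory, new_tokens = memory_layer([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]])
```

    Here the all-zero memory vector gets the same score against every position, so its updated value is the mean of the three inputs, `[1/3, 1/3]`.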

  24. arXiv:2002.02450

    cs.CL cs.LG stat.ML

    Goal-Oriented Multi-Task BERT-Based Dialogue State Tracker

    Authors: Pavel Gulyaev, Eugenia Elistratova, Vasily Konovalov, Yuri Kuratov, Leonid Pugachev, Mikhail Burtsev

    Abstract: Dialogue State Tracking (DST) is a core component of virtual assistants such as Alexa or Siri. To accomplish various tasks, these assistants need to support an increasing number of services and APIs. The Schema-Guided State Tracking track of the 8th Dialogue System Technology Challenge highlighted the DST problem for unseen services. The organizers introduced the Schema-Guided Dialogue (SGD) datas…

    Submitted 5 February, 2020; originally announced February 2020.

  25. arXiv:1910.03867

    cs.LG stat.ML

    Loss Landscape Sightseeing with Multi-Point Optimization

    Authors: Ivan Skorokhodov, Mikhail Burtsev

    Abstract: We present multi-point optimization: an optimization technique that allows training several models simultaneously without the need to keep the parameters of each one individually. The proposed method is used for a thorough empirical analysis of the loss landscape of neural networks. Through extensive experiments on the FashionMNIST and CIFAR10 datasets, we demonstrate two things: 1) the loss surface is surprisi…

    Submitted 14 October, 2019; v1 submitted 9 October, 2019; originally announced October 2019.

  26. arXiv:1905.02662

    cs.NE cs.AI cs.LG

    Continual and Multi-task Reinforcement Learning With Shared Episodic Memory

    Authors: Artyom Y. Sorokin, Mikhail S. Burtsev

    Abstract: Episodic memory plays an important role in the behavior of animals and humans. It allows the accumulation of information about the current state of the environment in a task-agnostic way. This episodic representation can later be accessed by downstream tasks in order to make their execution more efficient. In this work, we introduce a neural architecture with shared episodic memory (SEM) for learni…

    Submitted 7 May, 2019; originally announced May 2019.

    Comments: Presented at the Task-Agnostic Reinforcement Learning Workshop at ICLR 2019

  27. arXiv:1902.00098

    cs.AI cs.CL cs.HC

    The Second Conversational Intelligence Challenge (ConvAI2)

    Authors: Emily Dinan, Varvara Logacheva, Valentin Malykh, Alexander Miller, Kurt Shuster, Jack Urbanek, Douwe Kiela, Arthur Szlam, Iulian Serban, Ryan Lowe, Shrimai Prabhumoye, Alan W Black, Alexander Rudnicky, Jason Williams, Joelle Pineau, Mikhail Burtsev, Jason Weston

    Abstract: We describe the setting and results of the ConvAI2 NeurIPS competition that aims to further the state-of-the-art in open-domain chatbots. Some key takeaways from the competition are: (i) pretrained Transformer variants are currently the best performing models on this task, (ii) but to improve performance on multi-turn conversations with humans, future systems must go beyond single word metrics lik…

    Submitted 31 January, 2019; originally announced February 2019.

  28. arXiv:1709.09686

    cs.CL

    Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition

    Authors: L. T. Anh, M. Y. Arkhipov, M. S. Burtsev

    Abstract: Named Entity Recognition (NER) is one of the most common tasks in natural language processing. The purpose of NER is to find and classify tokens in text documents into predefined categories called tags, such as person names, quantity expressions, percentage expressions, names of locations, organizations, as well as expressions of time, currency and others. Although there is a number of approach…

    Submitted 8 October, 2017; v1 submitted 27 September, 2017; originally announced September 2017.

    Comments: Artificial Intelligence and Natural Language Conference (AINL 2017)
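    In a Bi-LSTM-CRF tagger, the CRF layer picks the best tag sequence by Viterbi decoding over per-token emission scores (produced by the Bi-LSTM) and tag-to-tag transition scores. The sketch below shows only that decoding step; the tag set, tokens and all scores are made-up illustrations rather than values from the paper, and start/end transitions are omitted for brevity.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.
    Missing transition pairs score 0.0."""
    # best[t] = (score of the best path ending in tag t, that path)
    best = {t: (emissions[0][t], [t]) for t in tags}
    for scores in emissions[1:]:
        new_best = {}
        for t in tags:
            prev_score, prev_path = max(
                ((best[p][0] + transitions.get((p, t), 0.0), best[p][1]) for p in tags),
                key=lambda item: item[0],
            )
            new_best[t] = (prev_score + scores[t], prev_path + [t])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


TAGS = ["O", "B-PER", "I-PER"]
tokens = ["Вчера", "Петров", "сказал"]  # "Yesterday Petrov said"
# Hypothetical Bi-LSTM emission scores, one dict per token.
emissions = [
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.2},
    {"O": 0.3, "B-PER": 0.9, "I-PER": 1.0},
    {"O": 2.0, "B-PER": 0.1, "I-PER": 0.2},
]
# Hypothetical learned penalty: I-PER should not follow O.
transitions = {("O", "I-PER"): -10.0}
best_tags = viterbi(emissions, transitions, TAGS)
```

    With the penalty on `O` followed by `I-PER`, the decoder labels "Петров" as `B-PER` even though its emission score alone favors `I-PER`; with an empty transition table it returns the invalid sequence `O, I-PER, O`. This is the kind of label-consistency constraint a CRF layer adds on top of independent per-token predictions.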

  29. arXiv:1204.3221

    cs.NE cs.AI nlin.AO

    Neuroevolution Results in Emergence of Short-Term Memory for Goal-Directed Behavior

    Authors: Konstantin Lakhman, Mikhail Burtsev

    Abstract: Animals behave adaptively in the environment with multiple competing goals. Understanding the mechanisms underlying such goal-directed behavior remains a challenge for neuroscience as well as for adaptive system research. To address this problem, we developed an evolutionary model of adaptive behavior in the multigoal stochastic environment. The proposed neuroevolutionary algorithm is based on neuron's…

    Submitted 14 April, 2012; originally announced April 2012.

    Comments: Manuscript was submitted to the 12th International Conference on the Simulation of Adaptive Behavior 2012; 10 pages, 6 figures

  30. arXiv:cs/0110021

    cs.NE

    Alife Model of Evolutionary Emergence of Purposeful Adaptive Behavior

    Authors: Mikhail S. Burtsev, Vladimir G. Redko, Roman V. Gusarev

    Abstract: The process of evolutionary emergence of purposeful adaptive behavior is investigated by means of computer simulations. The model proposed implies that there is an evolving population of simple agents, which have two natural needs: energy and reproduction. Each need is characterized quantitatively by a corresponding motivation. Motivations determine goal-directed behavior of agents. The model dem…

    Submitted 8 October, 2001; originally announced October 2001.

    Comments: 9 pages, 5 figures. Full version of poster presentation on ECAL 2001 (see "Advances in Artificial Life." J. Kelemen, P. Sosik (Eds.), 6th European Conference, ECAL 2001, Prague, Czech Republic, September 10-14, 2001, Proceedings, p. 413.)

    ACM Class: I.2.6; I.2.8; I.2.11