
Showing 1–29 of 29 results for author: Sokolov, A

  1. arXiv:2405.20551

    cs.SE cs.HC cs.LG cs.PL

    EM-Assist: Safe Automated ExtractMethod Refactoring with LLMs

    Authors: Dorin Pomian, Abhiram Bellur, Malinda Dilhara, Zarina Kurbatova, Egor Bogomolov, Andrey Sokolov, Timofey Bryksin, Danny Dig

    Abstract: Excessively long methods, loaded with multiple responsibilities, are challenging to understand, debug, reuse, and maintain. The solution lies in the widely recognized Extract Method refactoring. While the application of this refactoring is supported in modern IDEs, recommending which code fragments to extract has been the topic of many research tools. However, they often struggle to replicate real…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: This paper is accepted to the tool demonstration track of the 32nd ACM Symposium on the Foundations of Software Engineering (FSE 2024). This is an author copy

  2. arXiv:2405.18450

    cs.DB

    Distance based prefetching algorithms for mining of the sporadic requests associations

    Authors: Vadim Voevodkin, Andrey Sokolov

    Abstract: Modern storage systems intensively utilize data prefetching algorithms while processing sequences of the read requests. Performance of the prefetching algorithm (for instance increase of the cache hit ratio of the cache system - CHR) directly affects overall performance characteristics of the storage system (read latency, IOPS, etc.). There are widely known prefetching algorithms that are focused…

    Submitted 13 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  3. arXiv:2307.08426

    cs.CL

    Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts

    Authors: Rebekka Hubert, Artem Sokolov, Stefan Riezler

    Abstract: End-to-end automatic speech translation (AST) relies on data that combines audio inputs with text translation outputs. Previous work used existing large parallel corpora of transcriptions and translations in a knowledge distillation (KD) setup to distill a neural machine translation (NMT) into an AST student model. While KD allows using larger pretrained models, the reliance of previous KD approac…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: IWSLT 2023, corrected version

    Journal ref: In Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), pages 89-101

  4. arXiv:2304.10611

    cs.CL cs.LG

    Joint Repetition Suppression and Content Moderation of Large Language Models

    Authors: Minghui Zhang, Alex Sokolov, Weixin Cai, Si-Qing Chen

    Abstract: Natural language generation (NLG) is one of the most impactful fields in NLP, and recent years have witnessed its evolution brought about by large language models (LLMs). As the key instrument for writing assistance applications, they are generally prone to replicating or extending offensive content provided in the input. In low-resource data regime, they can also lead to repetitive outputs. Usual…

    Submitted 5 June, 2023; v1 submitted 20 April, 2023; originally announced April 2023.

  5. An Evaluation on Large Language Model Outputs: Discourse and Memorization

    Authors: Adrian de Wynter, Xun Wang, Alex Sokolov, Qilong Gu, Si-Qing Chen

    Abstract: We present an empirical evaluation of various outputs generated by nine of the most widely-available large language models (LLMs). Our analysis is done with off-the-shelf, readily-available tools. We find a correlation between percentage of memorized text, percentage of unique text, and overall output quality, when measured with respect to output pathologies such as counterfactual and logically-fl…

    Submitted 17 April, 2023; originally announced April 2023.

    Comments: Preprint. Under review

  6. arXiv:2302.13959

    cs.CL

    Make Every Example Count: On the Stability and Utility of Self-Influence for Learning from Noisy NLP Datasets

    Authors: Irina Bejan, Artem Sokolov, Katja Filippova

    Abstract: Increasingly larger datasets have become a standard ingredient to advancing the state-of-the-art in NLP. However, data quality might have already become the bottleneck to unlock further gains. Given the diversity and the sizes of modern datasets, standard data filtering is not straight-forward to apply, because of the multifacetedness of the harmful data and elusiveness of filtering rules that wou…

    Submitted 17 October, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Published at EMNLP 2023

  7. arXiv:2207.09821

    cs.DL cs.LG stat.ML

    Journal Impact Factor and Peer Review Thoroughness and Helpfulness: A Supervised Machine Learning Study

    Authors: Anna Severin, Michaela Strinzel, Matthias Egger, Tiago Barros, Alexander Sokolov, Julia Vilstrup Mouatt, Stefan Müller

    Abstract: The journal impact factor (JIF) is often equated with journal quality and the quality of the peer review of the papers submitted to the journal. We examined the association between the content of peer review and JIF by analysing 10,000 peer review reports submitted to 1,644 medical and life sciences journals. Two researchers hand-coded a random sample of 2,000 sentences. We then trained machine le…

    Submitted 19 August, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: 44 pages

  8. arXiv:2204.09716

    cs.CL cs.AI

    Domain Specific Fine-tuning of Denoising Sequence-to-Sequence Models for Natural Language Summarization

    Authors: Brydon Parker, Alik Sokolov, Mahtab Ahmed, Matt Kalebic, Sedef Akinli Kocak, Ofer Shai

    Abstract: Summarization of long-form text data is a problem especially pertinent in knowledge economy jobs such as medicine and finance, that require continuously remaining informed on a sophisticated and evolving body of knowledge. As such, isolating and summarizing key content automatically using Natural Language Processing (NLP) techniques holds the potential for extensive time savings in these industrie…

    Submitted 6 April, 2022; originally announced April 2022.

    Comments: 8 pages, 6 figures

    ACM Class: I.2.7

  9. arXiv:2202.06045

    cs.CL cs.SD eess.AS

    USTED: Improving ASR with a Unified Speech and Text Encoder-Decoder

    Authors: Bolaji Yusuf, Ankur Gandhe, Alex Sokolov

    Abstract: Improving end-to-end speech recognition by incorporating external text data has been a longstanding research topic. There has been a recent focus on training E2E ASR models that get the performance benefits of external text data without incurring the extra cost of evaluating an external language model at inference time. In this work, we propose training ASR model jointly with a set of text-to-text…

    Submitted 12 February, 2022; originally announced February 2022.

    Comments: 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2022)

  10. arXiv:2112.03052

    cs.LG cs.CL cs.CV

    Scaling Up Influence Functions

    Authors: Andrea Schioppa, Polina Zablotskaia, David Vilar, Artem Sokolov

    Abstract: We address efficient calculation of influence functions for tracking predictions back to the training data. We propose and analyze a new approach to speeding up the inverse Hessian calculation based on Arnoldi iteration. With this improvement, we achieve, to the best of our knowledge, the first successful implementation of influence functions that scales to full-size (language and vision) Transfor…

    Submitted 6 December, 2021; originally announced December 2021.

    Comments: Published at AAAI-22

  11. arXiv:2110.06997

    cs.CL cs.AI

    Bandits Don't Follow Rules: Balancing Multi-Facet Machine Translation with Multi-Armed Bandits

    Authors: Julia Kreutzer, David Vilar, Artem Sokolov

    Abstract: Training data for machine translation (MT) is often sourced from a multitude of large corpora that are multi-faceted in nature, e.g. containing contents from multiple domains or different levels of quality or complexity. Naturally, these facets do not occur with equal frequency, nor are they equally important for the test scenario at hand. In this work, we propose to optimize this balance jointly…

    Submitted 13 October, 2021; originally announced October 2021.

    Comments: EMNLP Findings 2021

  12. arXiv:2109.07926

    cs.CL

    Don't Search for a Search Method -- Simple Heuristics Suffice for Adversarial Text Attacks

    Authors: Nathaniel Berger, Stefan Riezler, Artem Sokolov, Sebastian Ebert

    Abstract: Recently more attention has been given to adversarial attacks on neural networks for natural language processing (NLP). A central research topic has been the investigation of search algorithms and search constraints, accompanied by benchmark algorithms and tasks. We implement an algorithm inspired by zeroth order optimization-based attacks and compare with the benchmark results in the TextAttack f…

    Submitted 4 October, 2021; v1 submitted 16 September, 2021; originally announced September 2021.

    Comments: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP Main Conference)

  13. arXiv:2109.04114

    cs.CL cs.LG

    Fixing exposure bias with imitation learning needs powerful oracles

    Authors: Luca Hormann, Artem Sokolov

    Abstract: We apply imitation learning (IL) to tackle the NMT exposure bias problem with error-correcting oracles, and evaluate an SMT lattice-based oracle which, despite its excellent performance in an unconstrained oracle translation task, turned out to be too pruned and idiosyncratic to serve as the oracle for IL.

    Submitted 17 September, 2021; v1 submitted 9 September, 2021; originally announced September 2021.

  14. arXiv:2104.10121

    cs.SD cs.CL eess.AS

    On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

    Authors: Shahin Amiriparian, Artem Sokolov, Ilhan Aslan, Lukas Christ, Maurice Gerczuk, Tobias Hübner, Dmitry Lamanov, Manuel Milling, Sandra Ottl, Ilya Poduremennykh, Evgeniy Shuranov, Björn W. Schuller

    Abstract: Text encodings from automatic speech recognition (ASR) transcripts and audio representations have shown promise in speech emotion recognition (SER) ever since. Yet, it is challenging to explain the effect of each information stream on the SER systems. Further, more clarification is required for analysing the impact of ASR's word error rate (WER) on linguistic emotion recognition per se and in the…

    Submitted 20 April, 2021; originally announced April 2021.

    Comments: 5 pages, 1 figure

    ACM Class: I.2.7; I.5.0

  15. arXiv:2104.01923

    eess.SP cs.SD eess.AS

    Real-time Streaming Wave-U-Net with Temporal Convolutions for Multichannel Speech Enhancement

    Authors: Vasiliy Kuzmin, Fyodor Kravchenko, Artem Sokolov, Jie Geng

    Abstract: In this paper, we describe the work that we have done to participate in Task1 of the ConferencingSpeech2021 challenge. This task set a goal to develop the solution for multi-channel speech enhancement in a real-time manner. We propose a novel system for streaming speech enhancement. We employ Wave-U-Net architecture with temporal convolutions in encoder and decoder. We incorporate self-attention i…

    Submitted 5 April, 2021; originally announced April 2021.

    Comments: Draft paper for InterSpeech 2021 processing

  16. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

    Authors: Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Ortiz Suarez, Iroro Orife, Kelechi Ogueji, Andre Niyongabo Rubungo, Toan Q. Nguyen, Mathias Müller, André Müller, et al. (27 additional authors not shown)

    Abstract: With the success of large-scale pre-training and multilingual modeling in Natural Language Processing (NLP), recent years have seen a proliferation of large, web-mined text datasets covering hundreds of languages. We manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4). Lower-resource corpora have system…

    Submitted 21 February, 2022; v1 submitted 22 March, 2021; originally announced March 2021.

    Comments: Accepted at TACL; pre-MIT Press publication version

    Journal ref: Transactions of the Association for Computational Linguistics (2022) 10: 50-72

  17. arXiv:2006.14223

    cs.CL cs.AI cs.LG

    Neural Machine Translation For Paraphrase Generation

    Authors: Alex Sokolov, Denis Filimonov

    Abstract: Training a spoken language understanding system, as the one in Alexa, typically requires a large human-annotated corpus of data. Manual annotations are expensive and time consuming. In Alexa Skill Kit (ASK) user experience with the skill greatly depends on the amount of data provided by skill developer. In this work, we present an automatic natural language generation system, capable of generating…

    Submitted 25 June, 2020; originally announced June 2020.

    Comments: Published in NIPS 2018: 2nd Conversational AI workshop

  18. arXiv:2006.14194

    cs.CL

    Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

    Authors: Alex Sokolov, Tracy Rohlin, Ariya Rastrow

    Abstract: Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" to "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence based n-gram models [1,2]. As an al…

    Submitted 28 June, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

    Comments: Published in INTERSPEECH (2019)

  19. arXiv:2006.01759

    stat.ML cs.LG math.OC

    Sparse Perturbations for Improved Convergence in Stochastic Zeroth-Order Optimization

    Authors: Mayumi Ohta, Nathaniel Berger, Artem Sokolov, Stefan Riezler

    Abstract: Interest in stochastic zeroth-order (SZO) methods has recently been revived in black-box optimization scenarios such as adversarial black-box attacks to deep neural networks. SZO methods only require the ability to evaluate the objective function at random input points, however, their weakness is the dependency of their convergence speed on the dimensionality of the function to be evaluated. We pr…

    Submitted 29 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

    Comments: International Conference on Machine Learning, Optimization, and Data Science (LOD), Siena, Italy

    Journal ref: LOD 2020

  20. arXiv:1907.13444

    math.OC cs.DC

    Balanced Identification as an Intersection of Optimization and Distributed Computing

    Authors: Alexander Sokolov, Vladimir Voloshinov

    Abstract: Technology of formal quantitative estimation of the conformity of the mathematical models to the available dataset is presented. Main purpose of the technology is to make easier the model selection decision-making process for the researcher. The technology is a combination of approaches from the areas of data analysis, optimization and distributed computing including: cross-validation and regulari…

    Submitted 20 April, 2020; v1 submitted 31 July, 2019; originally announced July 2019.

    Comments: 19 pages, 8 figures, 1 table, 32 references. Due to delay in publication (through no our fault) we uploaded revised text for reviewers of subsequent papers on the subject

  21. arXiv:1810.01480

    cs.CL stat.ML

    Learning to Segment Inputs for NMT Favors Character-Level Processing

    Authors: Julia Kreutzer, Artem Sokolov

    Abstract: Most modern neural machine translation (NMT) systems rely on presegmented inputs. Segmentation granularity importantly determines the input and output sequence lengths, hence the modeling depth, and source and target vocabularies, which in turn determine model size, computational costs of softmax normalization, and handling of out-of-vocabulary words. However, the current practice is to use static…

    Submitted 5 November, 2018; v1 submitted 2 October, 2018; originally announced October 2018.

    Comments: Technical report for IWSLT 2018 paper

  22. arXiv:1806.04458

    stat.ML cs.CL cs.LG

    Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction

    Authors: Artem Sokolov, Julian Hitschler, Mayumi Ohta, Stefan Riezler

    Abstract: Stochastic zeroth-order (SZO), or gradient-free, optimization allows to optimize arbitrary functions by relying only on function evaluations under parameter perturbations, however, the iteration complexity of SZO methods suffers a factor proportional to the dimensionality of the perturbed function. We show that in scenarios with natural sparsity patterns as in structured prediction applications, t…

    Submitted 10 November, 2020; v1 submitted 12 June, 2018; originally announced June 2018.

  23. arXiv:1712.05690

    cs.CL cs.LG stat.ML

    Sockeye: A Toolkit for Neural Machine Translation

    Authors: Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post

    Abstract: We describe Sockeye (version 1.12), an open-source sequence-to-sequence toolkit for Neural Machine Translation (NMT). Sockeye is a production-ready framework for training and applying models as well as an experimental platform for researchers. Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attenti…

    Submitted 1 June, 2018; v1 submitted 15 December, 2017; originally announced December 2017.

  24. arXiv:1707.09118

    stat.ML cs.CL cs.LG

    Counterfactual Learning from Bandit Feedback under Deterministic Logging: A Case Study in Statistical Machine Translation

    Authors: Carolin Lawrence, Artem Sokolov, Stefan Riezler

    Abstract: The goal of counterfactual learning for statistical machine translation (SMT) is to optimize a target SMT system from logged data that consist of user feedback to translations that were predicted by another, historic SMT system. A challenge arises by the fact that risk-averse commercial SMT systems deterministically log the most probable translation. The lack of sufficient exploration of the SMT o…

    Submitted 14 December, 2017; v1 submitted 28 July, 2017; originally announced July 2017.

    Comments: Conference on Empirical Methods in Natural Language Processing (EMNLP), 2017, Copenhagen, Denmark

  25. arXiv:1707.09050

    cs.CL stat.ML

    A Shared Task on Bandit Learning for Machine Translation

    Authors: Artem Sokolov, Julia Kreutzer, Kellen Sunderland, Pavel Danchenko, Witold Szymaniak, Hagen Fürstenau, Stefan Riezler

    Abstract: We introduce and describe the results of a novel shared task on bandit learning for machine translation. The task was organized jointly by Amazon and Heidelberg University for the first time at the Second Conference on Machine Translation (WMT 2017). The goal of the task is to encourage research on learning machine translation from weak user feedback instead of human references or post-edits. On e…

    Submitted 27 July, 2017; originally announced July 2017.

    Comments: Conference on Machine Translation (WMT) 2017

  26. arXiv:1705.07154

    quant-ph cs.CR

    Demonstration of a quantum key distribution network in urban fibre-optic communication lines

    Authors: E. O. Kiktenko, N. O. Pozhar, A. V. Duplinskiy, A. A. Kanapin, A. S. Sokolov, S. S. Vorobey, A. V. Miller, V. E. Ustimchik, M. N. Anufriev, A. S. Trushechkin, R. R. Yunusov, V. L. Kurochkin, Y. V. Kurochkin, A. K. Fedorov

    Abstract: We report the results of the implementation of a quantum key distribution (QKD) network using standard fibre communication lines in Moscow. The developed QKD network is based on the paradigm of trusted repeaters and allows a common secret key to be generated between users via an intermediate trusted node. The main feature of the network is the integration of the setups using two types of encoding,…

    Submitted 2 October, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

    Comments: 5 pages, 3 figures

    Journal ref: Quantum Electron. 47, 798 (2017)

  27. arXiv:1704.06497

    stat.ML cs.CL cs.LG

    Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

    Authors: Julia Kreutzer, Artem Sokolov, Stefan Riezler

    Abstract: Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attention-b…

    Submitted 13 December, 2018; v1 submitted 21 April, 2017; originally announced April 2017.

    Comments: ACL 2017

  28. arXiv:1606.00739

    cs.CL cs.LG stat.ML

    Stochastic Structured Prediction under Bandit Feedback

    Authors: Artem Sokolov, Julia Kreutzer, Christopher Lo, Stefan Riezler

    Abstract: Stochastic structured prediction under bandit feedback follows a learning protocol where on each of a sequence of iterations, the learner receives an input, predicts an output structure, and receives partial feedback in form of a task loss evaluation of the predicted structure. We present applications of this learning scenario to convex and non-convex objectives for structured prediction and analy…

    Submitted 2 November, 2016; v1 submitted 2 June, 2016; originally announced June 2016.

    Comments: 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain

  29. arXiv:1601.04468

    cs.CL cs.LG

    Bandit Structured Prediction for Learning from Partial Feedback in Statistical Machine Translation

    Authors: Artem Sokolov, Stefan Riezler, Tanguy Urvoy

    Abstract: We present an approach to structured prediction from bandit feedback, called Bandit Structured Prediction, where only the value of a task loss function at a single predicted point, instead of a correct structure, is observed in learning. We present an application to discriminative reranking in Statistical Machine Translation (SMT) where the learning algorithm only has access to a 1-BLEU loss evalu…

    Submitted 18 January, 2016; originally announced January 2016.

    Comments: In Proceedings of MT Summit XV, 2015. Miami, FL