
Showing 1–5 of 5 results for author: Smywiński-Pohl, A

  1. arXiv:2407.00418  [pdf, other]

    cs.CL cs.LG

    eFontes. Part of Speech Tagging and Lemmatization of Medieval Latin Texts. A Cross-Genre Survey

    Authors: Krzysztof Nowak, Jędrzej Ziębura, Krzysztof Wróbel, Aleksander Smywiński-Pohl

    Abstract: This study introduces the eFontes models for automatic linguistic annotation of Medieval Latin texts, focusing on lemmatization, part-of-speech tagging, and morphological feature determination. Using the Transformers library, these models were trained on Universal Dependencies (UD) corpora and the newly developed eFontes corpus of Polish Medieval Latin. The research evaluates the models' performan…

    Submitted 29 June, 2024; originally announced July 2024.

  2. arXiv:2206.01889  [pdf]

    cs.CL cs.AI cs.LG

    Initial Study into Application of Feature Density and Linguistically-backed Embedding to Improve Machine Learning-based Cyberbullying Detection

    Authors: Juuso Eronen, Michal Ptaszynski, Fumito Masui, Gniewosz Leliwa, Michal Wroczynski, Mateusz Piech, Aleksander Smywinski-Pohl

    Abstract: In this research, we study the change in the performance of machine learning (ML) classifiers when various linguistic preprocessing methods of a dataset were used, with the specific focus on linguistically-backed embeddings in Convolutional Neural Networks (CNN). Moreover, we study the concept of Feature Density and confirm its potential to comparatively predict the performance of ML classifiers,…

    Submitted 3 June, 2022; originally announced June 2022.

    Journal ref: Proceedings of The 6th Linguistic and Cognitive Approaches to Dialog Agents (LaCATODA 2020) IJCAI 2020 Workshop, Yokohama, Japan, January 2020

  3. arXiv:2111.01689  [pdf]

    cs.CL cs.AI cs.CY

    Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density

    Authors: Juuso Eronen, Michal Ptaszynski, Fumito Masui, Aleksander Smywiński-Pohl, Gniewosz Leliwa, Michal Wroczynski

    Abstract: We study the effectiveness of Feature Density (FD) using different linguistically-backed feature preprocessing methods in order to estimate dataset complexity, which in turn is used to comparatively estimate the potential performance of machine learning (ML) classifiers prior to any training. We hypothesise that estimating dataset complexity allows for the reduction of the number of required exper…

    Submitted 2 November, 2021; v1 submitted 2 November, 2021; originally announced November 2021.

    Comments: 73 pages, 4 figures, 19 tables, Information Processing and Management, Vol. 58, Issue 5, September 2021, paper ID: 102616

    Journal ref: Information Processing and Management, Vol. 58, Issue 5, September 2021, paper ID: 102616

  4. arXiv:1808.00926  [pdf, other]

    cs.CL

    Cyberbullying Detection -- Technical Report 2/2018, Department of Computer Science, AGH University of Science and Technology

    Authors: Michał Ptaszyński, Gniewosz Leliwa, Mateusz Piech, Aleksander Smywiński-Pohl

    Abstract: The research described in this paper concerns automatic cyberbullying detection in social media. There are two goals to achieve: building a gold standard cyberbullying detection dataset and measuring the performance of the Samurai cyberbullying detection system. The Formspring dataset provided in a Kaggle competition was re-annotated as a part of the research. The annotation procedure is described…

    Submitted 2 August, 2018; originally announced August 2018.

    Report number: 2/2018 CS AGH

  5. arXiv:1706.06363  [pdf, ps, other]

    cs.CL

    Improving text classification with vectors of reduced precision

    Authors: Krzysztof Wróbel, Maciej Wielgosz, Marcin Pietroń, Michał Karwatowski, Aleksander Smywiński-Pohl

    Abstract: This paper presents the analysis of the impact of a floating-point number precision reduction on the quality of text classification. The precision reduction of the vectors representing the data (e.g. TF-IDF representation in our case) allows for a decrease of computing time and memory footprint on dedicated hardware platforms. The impact of precision reduction on the classification quality was per…

    Submitted 20 June, 2017; originally announced June 2017.