Self-training Large Language Models through Knowledge Detection

Authors: Wei Jie Yeo, Teddy Ferdinan, Przemyslaw Kazienko, Ranjan Satapathy, Erik Cambria

Abstract: Large language models (LLMs) often necessitate extensive labeled datasets and training compute to achieve impressive performance across downstream tasks. This paper explores a self-training paradigm, where the LLM autonomously curates its own labels and selectively trains on unknown data samples identified through a reference-free consistency method. Empirical evaluations demonstrate significant improvements in reducing hallucination in generation across multiple subjects. Furthermore, the selective training framework mitigates catastrophic forgetting in out-of-distribution benchmarks, addressing a critical limitation in training LLMs. Our findings suggest that such an approach can substantially reduce the dependency on large labeled datasets, paving the way for more scalable and cost-effective language model training.

Submitted 17 June, 2024; originally announced June 2024.

Comments: Under review
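
The abstract above mentions selecting "unknown" samples through a reference-free consistency method. The listing does not give the details, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: the model is sampled several times per question, agreement is measured with a simple token-level Jaccard overlap (a stand-in, not the paper's metric), and questions whose answers disagree are flagged as unknown. The function names, the threshold, and the sample data are all hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard overlap between two answers (assumed stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def consistency_score(answers: list[str]) -> float:
    """Mean pairwise agreement among answers sampled for one question."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 1.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def select_unknown(samples: dict[str, list[str]], threshold: float = 0.5) -> list[str]:
    """Return the questions whose sampled answers disagree (score below threshold)."""
    return [q for q, answers in samples.items() if consistency_score(answers) < threshold]


# Hypothetical sampled answers; in practice these would come from the LLM itself.
samples = {
    "capital of France?": ["Paris", "Paris", "paris"],
    "who won the 1907 regional chess open?": ["Anna Nowak", "John Smith", "Piotr Lis"],
}
unknown = select_unknown(samples)
print(unknown)
```

No reference answers are needed here: only the model's agreement with itself decides which samples would be kept for further training.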

arXiv:2404.05892

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Authors: Bo Peng, Daniel Goldstein, Quentin Anthony, Alon Albalak, Eric Alcaide, Stella Biderman, Eugene Cheah, Xingjian Du, Teddy Ferdinan, Haowen Hou, Przemysław Kazienko, Kranthi Kiran GV, Jan Kocoń, Bartłomiej Koptyra, Satyapriya Krishna, Ronald McClelland Jr., Niklas Muennighoff, Fares Obeid, Atsushi Saito, Guangyu Song, Haoqin Tu, Stanisław Woźniak, Ruichong Zhang, Bingchen Zhao, Qihang Zhao, et al. (3 additional authors not shown)

Abstract: We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving upon the RWKV (RWKV-4) architecture. Our architectural design advancements include multi-headed matrix-valued states and a dynamic recurrence mechanism that improve expressivity while maintaining the inference efficiency characteristics of RNNs. We introduce a new multilingual corpus with 1.12 trillion tokens and a fast tokenizer based on greedy matching for enhanced multilinguality. We trained four Eagle models, ranging from 0.46 to 7.5 billion parameters, and two Finch models with 1.6 and 3.1 billion parameters and find that they achieve competitive performance across a wide variety of benchmarks. We release all our models on HuggingFace under the Apache 2.0 license. Models at: https://huggingface.co/RWKV Training code at: https://github.com/RWKV/RWKV-LM Inference code at: https://github.com/RWKV/ChatRWKV Time-parallel training code at: https://github.com/RWKV/RWKV-infctx-trainer

Submitted 10 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

arXiv:2402.09269

Personalized Large Language Models

Authors: Stanisław Woźniak, Bartłomiej Koptyra, Arkadiusz Janz, Przemysław Kazienko, Jan Kocoń

Abstract: Large language models (LLMs) have significantly advanced Natural Language Processing (NLP) tasks in recent years. However, their universal nature poses limitations in scenarios requiring personalized responses, such as recommendation systems and chatbots. This paper investigates methods to personalize LLMs, comparing fine-tuning and zero-shot reasoning approaches on subjective tasks. Results demonstrate that personalized fine-tuning improves model reasoning compared to non-personalized models. Experiments on datasets for emotion recognition and hate speech detection show consistent performance gains with personalized methods across different LLM architectures. These findings underscore the importance of personalization for enhancing LLM capabilities in subjective text perception tasks.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2402.09147

Into the Unknown: Self-Learning Large Language Models

Authors: Teddy Ferdinan, Jan Kocoń, Przemysław Kazienko

Abstract: We address the main problem of self-learning LLM: the question of what to learn. We propose a self-learning LLM framework that enables an LLM to independently learn previously unknown knowledge through self-assessment of its own hallucinations. Using the hallucination score, we introduce a new concept of Points in the Unknown (PiUs), along with one extrinsic and three intrinsic methods for automatic PiUs identification. It facilitates the creation of a self-learning loop that focuses exclusively on the knowledge gap in Points in the Unknown, resulting in a reduced hallucination score. We also developed evaluation metrics for gauging an LLM's self-learning capability. Our experiments revealed that 7B-Mistral models that have been finetuned or aligned and RWKV5-Eagle are capable of self-learning considerably well. Our self-learning concept allows more efficient LLM updates and opens new perspectives for knowledge exchange. It may also increase public trust in AI.

Submitted 4 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: 16 pages, 12 figures, 4 tables, submitted to ACL SRW 2024

arXiv:2312.11296

From Generalized Laughter to Personalized Chuckles: Unleashing the Power of Data Fusion in Subjective Humor Detection

Authors: Julita Bielaniewicz, Przemysław Kazienko

Abstract: The vast area of subjectivity in Natural Language Processing (NLP) poses a challenge to the solutions typically used in generalized tasks. As exploration in the scope of generalized NLP is much more advanced, it implies the tremendous gap that is still to be addressed amongst all feasible tasks where an opinion, taste, or feelings are inherent, thus creating a need for a solution, where a data fusion could take place. We have chosen the task of funniness, as it heavily relies on the sense of humor, which is fundamentally subjective. Our experiments across five personalized and four generalized datasets involving several personalized deep neural architectures have shown that the task of humor detection greatly benefits from the inclusion of personalized data in the training process. We tested five scenarios of training data fusion that focused on either generalized (majority voting) or personalized approaches to humor detection. The best results were obtained for the setup, in which all available personalized datasets were joined to train the personalized reasoning model. It boosted the prediction performance by up to approximately 35% of the macro F1 score. Such a significant gain was observed for all five personalized test sets. At the same time, the impact of the model's architecture was much less than the personalization itself. It seems that concatenating personalized datasets, even with the cost of normalizing the range of annotations across all datasets, if combined with the personalized models, results in an enormous increase in the performance of humor detection.

Submitted 18 December, 2023; originally announced December 2023.

Comments: 10 pages, 13 figures, 2 tables

arXiv:2312.08198

Towards Model-Based Data Acquisition for Subjective Multi-Task NLP Problems

Authors: Kamil Kanclerz, Julita Bielaniewicz, Marcin Gruza, Jan Kocon, Stanisław Woźniak, Przemysław Kazienko

Abstract: Data annotated by humans is a source of knowledge by describing the peculiarities of the problem and therefore fueling the decision process of the trained model. Unfortunately, the annotation process for subjective natural language processing (NLP) problems like offensiveness or emotion detection is often very expensive and time-consuming. One of the inevitable risks is to spend some of the funds and annotator effort on annotations that do not provide any additional knowledge about the specific task. To minimize these costs, we propose a new model-based approach that allows the selection of tasks annotated individually for each text in a multi-task scenario. The experiments carried out on three datasets, dozens of NLP tasks, and thousands of annotations show that our method allows up to 40% reduction in the number of annotations with negligible loss of knowledge. The results also emphasize the need to collect a diverse amount of data required to efficiently train a model, depending on the subjectivity of the annotation task. We also focused on measuring the relation between subjective tasks by evaluating the model in single-task and multi-task scenarios. Moreover, for some datasets, training only on the labels predicted by our model improved the efficiency of task selection as a self-supervised learning regularization technique.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.06034

Modeling Uncertainty in Personalized Emotion Prediction with Normalizing Flows

Authors: Piotr Miłkowski, Konrad Karanowski, Patryk Wielopolski, Jan Kocoń, Przemysław Kazienko, Maciej Zięba

Abstract: Designing predictive models for subjective problems in natural language processing (NLP) remains challenging. This is mainly due to its non-deterministic nature and different perceptions of the content by different humans. It may be solved by Personalized Natural Language Processing (PNLP), where the model exploits additional information about the reader to make more accurate predictions. However, current approaches require complete information about the recipients to be straight embedded. Besides, the recent methods focus on deterministic inference or simple frequency-based estimations of the probabilities. In this work, we overcome this limitation by proposing a novel approach to capture the uncertainty of the forecast using conditional Normalizing Flows. This allows us to model complex multimodal distributions and to compare various models using negative log-likelihood (NLL). In addition, the new solution allows for various interpretations of possible reader perception thanks to the available sampling function. We validated our method on three challenging, subjective NLP tasks, including emotion recognition and hate speech. The comparative analysis of generalized and personalized approaches revealed that our personalized solutions significantly outperform the baseline and provide more precise uncertainty estimates. The impact on the text interpretability and uncertainty studies are presented as well. The information brought by the developed methods makes it possible to build hybrid models whose effectiveness surpasses classic solutions. In addition, an analysis and visualization of the probabilities of the given decisions for texts with high entropy of annotations and annotators with mixed views were carried out.

Submitted 10 December, 2023; originally announced December 2023.

Comments: 10 pages, 8 figures, SENTIRE'23 (ICDM 2023)

arXiv:2309.15292

Scaling Representation Learning from Ubiquitous ECG with State-Space Models

Authors: Kleanthis Avramidis, Dominika Kunc, Bartosz Perz, Kranti Adsul, Tiantian Feng, Przemysław Kazienko, Stanisław Saganowski, Shrikanth Narayanan

Abstract: Ubiquitous sensing from wearable devices in the wild holds promise for enhancing human well-being, from diagnosing clinical conditions and measuring stress to building adaptive health promoting scaffolds. But the large volumes of data therein across heterogeneous contexts pose challenges for conventional supervised learning approaches. Representation Learning from biological signals is an emerging realm catalyzed by the recent advances in computational modeling and the abundance of publicly shared databases. The electrocardiogram (ECG) is the primary researched modality in this context, with applications in health monitoring, stress and affect estimation. Yet, most studies are limited by small-scale controlled data collection and over-parameterized architecture choices. We introduce WildECG, a pre-trained state-space model for representation learning from ECG signals. We train this model in a self-supervised manner with 275,000 10s ECG recordings collected in the wild and evaluate it on a range of downstream tasks. The proposed model is a robust backbone for ECG analysis, providing competitive performance on most of the tasks considered, while demonstrating efficacy in low-resource regimes. The code and pre-trained weights are shared publicly at https://github.com/klean2050/tiles_ecg_model.

Submitted 26 September, 2023; originally announced September 2023.

Comments: Pre-print, currently under review

arXiv:2305.13048

RWKV: Reinventing RNNs for the Transformer Era

Authors: Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Stella Biderman, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, Xuzheng He, Haowen Hou, Jiaju Lin, Przemyslaw Kazienko, Jan Kocon, Jiaming Kong, Bartlomiej Koptyra, Hayden Lau, Krishna Sri Ipsit Mantri, Ferdinand Mom, Atsushi Saito, Guangyu Song, Xiangru Tang, Bolun Wang, et al. (9 additional authors not shown)

Abstract: Transformers have revolutionized almost all natural language processing (NLP) tasks but suffer from memory and computational complexity that scales quadratically with sequence length. In contrast, recurrent neural networks (RNNs) exhibit linear scaling in memory and computational requirements but struggle to match the same performance as Transformers due to limitations in parallelization and scalability. We propose a novel model architecture, Receptance Weighted Key Value (RWKV), that combines the efficient parallelizable training of transformers with the efficient inference of RNNs. Our approach leverages a linear attention mechanism and allows us to formulate the model as either a Transformer or an RNN, thus parallelizing computations during training and maintaining constant computational and memory complexity during inference. We scale our models as large as 14 billion parameters, by far the largest dense RNN ever trained, and find RWKV performs on par with similarly sized Transformers, suggesting future work can leverage this architecture to create more efficient models. This work presents a significant step towards reconciling trade-offs between computational efficiency and model performance in sequence processing tasks.

Submitted 10 December, 2023; v1 submitted 22 May, 2023; originally announced May 2023.

arXiv:2302.10724

doi 10.1016/j.inffus.2023.101861

ChatGPT: Jack of all trades, master of none

Authors: Jan Kocoń, Igor Cichecki, Oliwier Kaszyca, Mateusz Kochanek, Dominika Szydło, Joanna Baran, Julita Bielaniewicz, Marcin Gruza, Arkadiusz Janz, Kamil Kanclerz, Anna Kocoń, Bartłomiej Koptyra, Wiktoria Mieleszczenko-Kowszewicz, Piotr Miłkowski, Marcin Oleksy, Maciej Piasecki, Łukasz Radliński, Konrad Wojtasik, Stanisław Woźniak, Przemysław Kazienko

Abstract: OpenAI has released the Chat Generative Pre-trained Transformer (ChatGPT) and revolutionized the approach in artificial intelligence to human-model interaction. Several publications on ChatGPT evaluation test its effectiveness on well-known natural language processing (NLP) tasks. However, the existing studies are mostly non-automated and tested on a very limited scale. In this work, we examined ChatGPT's capabilities on 25 diverse analytical NLP tasks, most of them subjective even to humans, such as sentiment analysis, emotion recognition, offensiveness, and stance detection. In contrast, the other tasks require more objective reasoning like word sense disambiguation, linguistic acceptability, and question answering. We also evaluated the GPT-4 model on five selected subsets of NLP tasks. We automated the ChatGPT and GPT-4 prompting process and analyzed more than 49k responses. Our comparison of its results with available State-of-the-Art (SOTA) solutions showed that the average loss in quality of the ChatGPT model was about 25% for zero-shot and few-shot evaluation. For the GPT-4 model, the loss for semantic tasks is significantly lower than for ChatGPT. We showed that the more difficult the task (lower SOTA performance), the higher the ChatGPT loss. It especially refers to pragmatic NLP problems like emotion recognition. We also tested the ability to personalize ChatGPT responses for selected subjective tasks via Random Contextual Few-Shot Personalization, and we obtained significantly better user-based predictions. Additional qualitative analysis revealed a ChatGPT bias, most likely due to the rules imposed on human trainers by OpenAI. Our results provide the basis for a fundamental discussion of whether the high quality of recent predictive NLP models can indicate a tool's usefulness to society and how the learning and validation procedures for such systems should be established.

Submitted 9 June, 2023; v1 submitted 21 February, 2023; originally announced February 2023.

Comments: preprint

Journal ref: Information Fusion 101861 (2023)

arXiv:2203.05368

doi 10.1063/5.0074992

Temporal Network Epistemology: on Reaching Consensus in Real World Setting

Authors: Radosław Michalski, Damian Serwata, Mateusz Nurek, Boleslaw K. Szymanski, Przemysław Kazienko, Tao Jia

Abstract: This work develops the concept of a temporal network epistemology model enabling the simulation of the learning process in dynamic networks. The results of the research, conducted on the temporal social network generated using the CogSNet model and on the static topologies as a reference, indicate a significant influence of the network temporal dynamics on the outcome and flow of the learning process. It has been shown that not only are the dynamics of reaching consensus different compared to baseline models but also that previously unobserved phenomena appear, such as uninformed agents or different consensus states for disconnected components. It has also been observed that sometimes only the change of the network structure can contribute to reaching consensus. The introduced approach and the experimental results can be used both to better understand how human communities collectively solve complex problems at the scientific level and to inquire into the correctness of less complex but common and equally important beliefs spreading across entire societies.

Submitted 10 March, 2022; originally announced March 2022.

Journal ref: Chaos 32, 063135 (2022)

arXiv:2005.00093

doi 10.1145/3448891.3450332

Consumer Wearables and Affective Computing for Wellbeing Support

Authors: Stanisław Saganowski, Przemysław Kazienko, Maciej Dzieżyc, Patrycja Jakimów, Joanna Komoszyńska, Weronika Michalska, Anna Dutkowiak, Adam Polak, Adam Dziadek, Michał Ujma

Abstract: Wearables equipped with pervasive sensors enable us to monitor physiological and behavioral signals in our everyday life. We propose the WellAff system able to recognize affective states for wellbeing support. It also includes health care scenarios, in particular patients with chronic kidney disease (CKD) suffering from bipolar disorders. For the need of a large-scale field study, we reviewed over 50 off-the-shelf devices in terms of usefulness for emotion, stress, meditation, sleep, and physical activity recognition and analysis. Their usability directly comes from the types of sensors they possess as well as the quality and availability of raw signals. We found there is no versatile device suitable for all purposes. Using Empatica E4 and Samsung Galaxy Watch, we have recorded physiological signals from 11 participants over many weeks. The gathered data enabled us to train a classifier that accurately recognizes strong affective states.

Submitted 12 August, 2021; v1 submitted 30 April, 2020; originally announced May 2020.

Comments: Accepted to the International Workshop on Artificial Intelligence for Mobile and Ubiquitous Communication System, EAI MobiQuitous 2020

arXiv:1912.10528

Emotion Recognition Using Wearables: A Systematic Literature Review Work in progress

Authors: Stanisław Saganowski, Anna Dutkowiak, Adam Dziadek, Maciej Dzieżyc, Joanna Komoszyńska, Weronika Michalska, Adam Polak, Michał Ujma, Przemysław Kazienko

Abstract: Wearables like smartwatches or wrist bands equipped with pervasive sensors enable us to monitor our physiological signals. In this study, we address the question whether they can help us to recognize our emotions in our everyday life for ubiquitous computing. Using the systematic literature review, we identified crucial research steps and discussed the main limitations and problems in the domain.

Submitted 15 January, 2020; v1 submitted 22 December, 2019; originally announced December 2019.

Comments: 6 pages, accepted to the Emotion Aware 2020 workshop. Copyright 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media

arXiv:1909.04917

Comprehensive Analysis of Aspect Term Extraction Methods using Various Text Embeddings

Authors: Łukasz Augustyniak, Tomasz Kajdanowicz, Przemysław Kazienko

Abstract: Recently, a variety of model designs and methods have blossomed in the context of the sentiment analysis domain. However, there is still a lack of wide and comprehensive studies of aspect-based sentiment analysis (ABSA). We want to fill this gap and propose a comparison with ablation analysis of aspect term extraction using various text embedding methods. We particularly focused on architectures based on long short-term memory (LSTM) with optional conditional random field (CRF) enhancement using different pre-trained word embeddings. Moreover, we analyzed the influence on the performance of extending the word vectorization step with character embedding. The experimental results on SemEval datasets revealed that not only does bi-directional long short-term memory (BiLSTM) outperform regular LSTM, but also word embedding coverage and its source highly affect aspect detection performance. An additional CRF layer consistently improves the results as well.

Submitted 10 December, 2020; v1 submitted 11 September, 2019; originally announced September 2019.

arXiv:1909.01800

doi 10.1145/3302425.3302479

Extracting Aspects Hierarchies using Rhetorical Structure Theory

Authors: Łukasz Augustyniak, Tomasz Kajdanowicz, Przemysław Kazienko

Abstract: We propose a novel approach to generate aspect hierarchies that proved to be consistently correct compared with human-generated hierarchies. We present an unsupervised technique using Rhetorical Structure Theory and graph analysis. We evaluated our approach based on 100,000 reviews from Amazon and achieved an astonishing 80% coverage compared with human-generated hierarchies coded in ConceptNet. The method could be easily extended with a sentiment analysis model and used to describe sentiment on different levels of aspect granularity. Hence, besides the flat aspect structure, we can differentiate between aspects and describe if the charging aspect is related to battery or price.

Submitted 4 September, 2019; originally announced September 2019.

Comments: ACAI 2018 MLNLP

Journal ref: ACAI 2018 MLNLP

arXiv:1909.01276

doi 10.1109/AIKE.2019.00016

Aspect Detection using Word and Char Embeddings with (Bi)LSTM and CRF

Authors: Łukasz Augustyniak, Tomasz Kajdanowicz, Przemysław Kazienko

Abstract: We proposed a new accurate aspect extraction method that makes use of both word and character-based embeddings. We have conducted experiments of various models of aspect extraction using LSTM and BiLSTM including CRF enhancement on five different pre-trained word embeddings extended with character embeddings. The results revealed that not only does BiLSTM outperform regular LSTM, but word embedding coverage in train and test sets also profoundly impacts aspect detection performance. Moreover, the additional CRF layer consistently improves the results across different models and text embeddings. Summing up, we obtained state-of-the-art F-score results for SemEval Restaurants (85%) and Laptops (80%).

Submitted 3 September, 2019; originally announced September 2019.

Comments: IEEE AIKE

Journal ref: 2019 IEEE Second International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Sardinia, Italy, 2019, pp. 43-50

arXiv:1810.12116

doi 10.3390/e21121152

Using Machine Learning to Predict the Evolution of Physics Research

Authors: Wenyuan Liu, Stanisław Saganowski, Przemysław Kazienko, Siew Ann Cheong

Abstract: The advancement of science as outlined by Popper and Kuhn is largely qualitative, but with bibliometric data it is possible and desirable to develop a quantitative picture of scientific progress. Furthermore it is also important to allocate finite resources to research topics that have growth potential, to accelerate the process from scientific breakthroughs to technological innovations. In this paper, we address this problem of quantitative knowledge evolution by analysing the APS publication data set from 1981 to 2010. We build the bibliographic coupling and co-citation networks, use the Louvain method to detect topical clusters (TCs) in each year, measure the similarity of TCs in consecutive years, and visualize the results as alluvial diagrams. Having the predictive features describing a given TC and its known evolution in the next year, we can train a machine learning model to predict future changes of TCs, i.e., their continuing, dissolving, merging and splitting. We found the number of papers from certain journals, the degree, closeness, and betweenness to be the most predictive features. Additionally, betweenness increases significantly for merging events, and decreases significantly for splitting events. Our results represent a first step from a descriptive understanding of the Science of Science (SciSci), towards one that is ultimately prescriptive.

Submitted 29 October, 2018; originally announced October 2018.

Comments: 24 pages, 10 figures, 4 tables, supplementary information is included

arXiv:1809.06656 [pdf, other]

doi 10.1038/S41598-018-32081-2

Probing Limits of Information Spread with Sequential Seeding

Authors: Jaroslaw Jankowski, Boleslaw K. Szymanski, Przemyslaw Kazienko, Radoslaw Michalski, Piotr Brodka

Abstract: We consider here information spread which propagates with certain probability from nodes just activated to their not yet activated neighbors. Diffusion cascades can be triggered by activation of even a small set of nodes. Such activation is commonly performed in a single stage. A novel approach based on sequential seeding is analyzed here resulting in three fundamental contributions. First, we propose a coordinated execution of randomized choices to enable precise comparison of different algorithms in general. We apply it here when the newly activated nodes at each stage of spreading attempt to activate their neighbors. Then, we present a formal proof that sequential seeding delivers at least as large coverage as the single stage seeding does. Moreover, we also show that, under modest assumptions, sequential seeding achieves coverage provably better than the single stage based approach using the same number of seeds and node ranking. Finally, we present experimental results showing how single stage and sequential approaches on directed and undirected graphs compare to the well-known greedy approach to provide the objective measure of the sequential seeding benefits. Surprisingly, applying sequential seeding to a simple degree-based selection leads to higher coverage than achieved by the computationally expensive greedy approach currently considered to be the best heuristic.

Submitted 18 September, 2018; originally announced September 2018.

Journal ref: Scientific Reports, 8:13966, Sept. 2018

arXiv:1806.04658 [pdf]

doi 10.1155/2021/4963903

Social Networks through the Prism of Cognition

Authors: Radosław Michalski, Bolesław K. Szymański, Przemysław Kazienko, Christian Lebiere, Omar Lizardo, Marcin Kulisiewicz

Abstract: Human relations are driven by social events: people interact, exchange information, share knowledge and emotions, and gather news from mass media. These events leave traces in human memory, the strength of which depends on cognitive factors such as emotions or attention span. Each trace continuously weakens over time unless another related event activity strengthens it. Here, we introduce a novel cognition-driven social network (CogSNet) model that accounts for cognitive aspects of social perception. The model explicitly represents each social interaction as a trace in human memory with its corresponding dynamics. The strength of the trace is the only measure of the influence that the interactions had on a person. For validation, we apply our model to NetSense data on social interactions among university students. The results show that CogSNet significantly improves the quality of modeling of human interactions in social networks.

Submitted 22 January, 2021; v1 submitted 12 June, 2018; originally announced June 2018.

Comments: 13 pages, 5 figures, reproductory code available

Journal ref: Complexity, Vol. 2021, Article ID 4963903

arXiv:1801.04528 [pdf, other]

doi 10.1038/s41598-018-32571-3

Entropy Measures of Human Communication Dynamics

Authors: Marcin Kulisiewicz, Przemysław Kazienko, Bolesław K. Szymański, Radosław Michalski

Abstract: Human communication is commonly represented as a temporal social network, and evaluated in terms of its uniqueness. We propose a set of new entropy-based measures for human communication dynamics represented within the temporal social network as event sequences. Using real world datasets and random interaction series of different types we find that real human contact events always significantly differ from random ones. This human distinctiveness increases over time and by means of the proposed entropy measures, we can observe sociological processes that take place within dynamic communities.

Submitted 14 January, 2018; originally announced January 2018.

Journal ref: Scientific Reports, 8:15697. Oct. 24, 2018

arXiv:1801.03327 [pdf, other]

doi 10.1038/s41598-019-40015-9

Priority Attachment: a Comprehensive Mechanism for Generating Networks

Authors: Mikołaj Morzy, Tomasz Kajdanowicz, Przemysław Kazienko, Grzegorz Miebs, Arkadiusz Rusin

Abstract: We claim that networks are created according to the priority attachment mechanism and we show a simple model which uses the priority attachment to generate both synthetic and close to empirical networks. Priority attachment is a mechanism which generalizes previously proposed mechanisms, such as small world creation or preferential attachment, but we also observe its presence in a range of real-world networks. In this paper we show that by using priority attachment we can generate networks of very diverse topologies, as well as recreate empirical networks. An additional advantage of the priority attachment mechanism is an easy interpretation of the latent processes of network formation. We substantiate our claims by performing numerical experiments on synthetic and empirical networks. The two main contributions of the paper are: the introduction of the priority attachment mechanism, and the design of the Priority Rank: a simple network generative model based on the priority attachment mechanism.

Submitted 20 June, 2018; v1 submitted 10 January, 2018; originally announced January 2018.

Journal ref: Scientific Reports volume 9, Article number: 3383 (2019)

arXiv:1711.01867 [pdf, other]

doi 10.1371/journal.pone.0224194

Analysis of group evolution prediction in complex networks

Authors: Stanisław Saganowski, Piotr Bródka, Michał Koziarski, Przemysław Kazienko

Abstract: In a world in which acceptance and identification with social communities are highly desired, the ability to predict the evolution of groups over time appears to be a vital but very complex research problem. Therefore, we propose a new, adaptable, generic and multi-stage method for Group Evolution Prediction (GEP) in complex networks that facilitates reasoning about the future states of the recently discovered groups. The precise GEP modularity enabled us to carry out extensive and versatile empirical studies on many real-world complex / social networks to analyze the impact of numerous setups and parameters like time window type and size, group detection method, evolution chain length, prediction models, etc. Additionally, many new predictive features reflecting the group state at a given time have been identified and tested. Some other research problems like enriching learning evolution chains with external data have been analyzed as well.

Submitted 2 November, 2019; v1 submitted 6 November, 2017; originally announced November 2017.

Journal ref: PLoS ONE 14(10): e0224194 (2019)

arXiv:1709.04863 [pdf]

doi 10.1007/978-3-319-67217-5_37

Seeds Buffering for Information Spreading Processes

Authors: Jarosław Jankowski, Piotr Bródka, Radosław Michalski, Przemysław Kazienko

Abstract: Seeding strategies for influence maximization in social networks have been studied for more than a decade. They have mainly relied on the activation of all resources (seeds) simultaneously in the beginning; yet, it has been shown that sequential seeding strategies are commonly better. This research focuses on studying sequential seeding with buffering, which is an extension of the basic sequential seeding concept. The proposed method avoids choosing nodes that will be activated through the natural diffusion process, which leads to better use of the budget for activating seed nodes in the social influence process. This approach was compared with sequential seeding without buffering and single stage seeding. The results on both real and artificial social networks confirm that the buffer-based consecutive seeding is a good trade-off between the final coverage and the time to reach it. It performs significantly better than its rivals for a fixed budget. The gain is obtained by dynamic rankings and the ability to detect network areas with nodes that are not yet activated and have high potential of activating their neighbours.

Submitted 14 September, 2017; originally announced September 2017.

Comments: Jankowski, J., Bródka, P., Michalski, R., & Kazienko, P. (2017, September). Seeds Buffering for Information Spreading Processes. In International Conference on Social Informatics (pp. 628-641). Springer

arXiv:1609.07526 [pdf, ps, other]

Balancing Speed and Coverage by Sequential Seeding in Complex Networks

Authors: Jarosław Jankowski, Piotr Bródka, Przemysław Kazienko, Boleslaw Szymanski, Radosław Michalski, Tomasz Kajdanowicz

Abstract: Information spreading in complex networks is often modeled as diffusing information with a certain probability from nodes that possess it to their neighbors that do not. Information cascades are triggered when the activation of a set of initial nodes (seeds) results in diffusion to a large number of nodes. Here, several novel approaches for seed initiation that replace the commonly used activation of all seeds at once with a sequence of initiation stages are introduced. Sequential strategies at later stages avoid seeding highly ranked nodes that are already activated by diffusion active between stages. The gain arises when a saved seed is allocated to a node difficult to reach via diffusion. Sequential seeding and a single stage approach are compared using various seed ranking methods and diffusion parameters on real complex networks. The experimental results indicate that, regardless of the seed ranking method used, sequential seeding strategies deliver better coverage than single stage seeding in about 90% of cases. Longer seeding sequences tend to activate more nodes but they also extend the duration of diffusion. Various variants of sequential seeding resolve the trade-off between the coverage and speed of diffusion differently.

Submitted 12 January, 2017; v1 submitted 23 September, 2016; originally announced September 2016.

Journal ref: Scientific Reports 7:891, April 18, 2017

arXiv:1606.03335 [pdf, other]

WordNet2Vec: Corpora Agnostic Word Vectorization Method

Authors: Roman Bartusiak, Łukasz Augustyniak, Tomasz Kajdanowicz, Przemysław Kazienko, Maciej Piasecki

Abstract: The complex nature of big data resources demands new methods for structuring, especially for textual content. WordNet is a good knowledge source for comprehensive abstraction of natural language as its good implementations exist for many languages. Since WordNet embeds natural language in the form of a complex network, a transformation mechanism WordNet2Vec is proposed in the paper. It creates vectors for each word from WordNet. These vectors encapsulate general position - role of a given word towards all other words in the natural language. Any list or set of such vectors contains knowledge about the context of its component within the whole language. Such word representation can be easily applied to many analytic tasks like classification or clustering. The usefulness of the WordNet2Vec method was demonstrated in sentiment analysis, i.e. classification with transfer learning for the real Amazon opinion textual dataset.

Submitted 10 June, 2016; originally announced June 2016.

Comments: 29 pages, 16 figures, submitted to journal

arXiv:1605.00069 [pdf]

doi 10.1007/978-1-4614-6170-8_223

Community Evolution

Authors: Stanisław Saganowski, Piotr Bródka, Przemysław Kazienko

Abstract: The continuous interest in the social network area contributes to the fast development of this field. The new possibilities of obtaining and storing data facilitate deeper analysis of the entire social network, extracted social groups and single individuals as well. One of the most interesting research topics is network dynamics, and the dynamics of social groups in particular, that is, the analysis of group evolution over time. It is the natural step forward after social community extraction. Having communities extracted, appropriate knowledge and methods for dynamic analysis may be applied in order to identify changes as well as to predict the future of all or some selected groups. Furthermore, knowing the most probable change of a given group, some additional steps may be performed in order to change this predicted future according to specific needs. Such ability would be a powerful tool in the hands of human resource managers, personnel recruitment, marketing, telecommunication companies, etc.

Submitted 17 January, 2017; v1 submitted 30 April, 2016; originally announced May 2016.

Comments: This is a pre-print version of Encyclopedia of Social Network Analysis and Mining essay about different ways to track Community Evolution. This is an update for the next edition of the Encyclopedia

Journal ref: Encyclopedia of Social Network Analysis and Mining, Springer New York, 2014, pages 220-232

arXiv:1510.01270 [pdf, other]

Learning in Unlabeled Networks - An Active Learning and Inference Approach

Authors: Tomasz Kajdanowicz, Radosław Michalski, Katarzyna Musiał, Przemysław Kazienko

Abstract: The task of determining labels of all network nodes based on the knowledge about network structure and labels of some training subset of nodes is called the within-network classification. It may happen that none of the labels of the nodes is known and additionally there is no information about the number of classes to which nodes can be assigned. In such a case a subset of nodes has to be selected for initial label acquisition. The question that arises is: "labels of which nodes should be collected and used for learning in order to provide the best classification accuracy for the whole network?". Active learning and inference is a practical framework to study this problem. A set of methods for active learning and inference for within network classification is proposed and validated. The utility score calculation for each node based on network structure is the first step in the process. The scores make it possible to rank the nodes. Based on the ranking, a set of nodes, for which the labels are acquired, is selected (e.g. by taking top or bottom N from the ranking). The new measure-neighbour methods proposed in the paper suggest not obtaining labels of nodes from the ranking but rather acquiring labels of their neighbours. The paper examines 29 distinct formulations of utility score and selection methods reporting their impact on the results of two collective classification algorithms: Iterative Classification Algorithm and Loopy Belief Propagation. We advocate that the accuracy of presented methods depends on the structural properties of the examined network.
We claim that measure-neighbour methods will work better than the regular methods for networks with higher clustering coefficient and worse than regular methods for networks with low clustering coefficient. According to our hypothesis, based on clustering coefficient we are able to recommend appropriate active learning and inference method.

Submitted 5 October, 2015; originally announced October 2015.

Journal ref: AI Communications, Vol. 29, No. 1, 2016, IOS Press

arXiv:1505.03049 [pdf]

doi 10.1016/j.chb.2014.12.015

Knowledge Acquisition from Social Platforms Based on Network Distributions Fitting

Authors: Jarosław Jankowski, Radosław Michalski, Piotr Bródka, Przemysław Kazienko, Sonja Utz

Abstract: The uniqueness of online social networks makes it possible to implement new methods that increase the quality and effectiveness of research processes. While surveys are one of the most important tools for research, the representativeness of selected online samples is often a challenge and the results are hardly generalizable. An approach based on surveys with representativeness targeted at network measure distributions is proposed and analysed in this paper. Its main goal is to focus not only on sample representativeness in terms of demographic attributes, but also to follow the measure distributions within the main network. The approach presented has many application areas related to online research, sampling a network for the evaluation of collaborative learning processes, and candidate selection for training purposes with the ability to distribute information within a social network.

Submitted 12 May, 2015; originally announced May 2015.

Journal ref: Computers in Human Behavior, 2014, 12

arXiv:1505.01709 [pdf]

doi 10.3390/e17053053

Predicting Community Evolution in Social Networks

Authors: Stanisław Saganowski, Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Przemysław Kazienko, Jarosław Koźlak

Abstract: Nowadays, sustained development of different social media can be observed worldwide. One of the relevant research domains intensively explored recently is analysis of social communities existing in social media as well as prediction of their future evolution taking into account collected historical evolution chains. These evolution chains proposed in the paper contain group states in the previous time frames and their historical transitions that were identified using one out of two methods: Stable Group Changes Identification (SGCI) and Group Evolution Discovery (GED). Based on the observed evolution chains of various length, structural network features are extracted, validated and selected as well as used to learn classification models. The experimental studies were performed on three real datasets with different profile: DBLP, Facebook and Polish blogosphere. The process of group prediction was analysed with respect to different classifiers as well as various descriptive feature sets extracted from evolution chains of different length. The results revealed that, in general, the longer the evolution chains, the better the predictive abilities of the classification models. However, chains of length 3 to 7 enabled the GED-based method to almost reach its maximum possible prediction quality. For SGCI, this value was at the level of 3 to 5 last periods.

Submitted 7 May, 2015; originally announced May 2015.

Comments: Entropy 2015, 17, 1-x manuscripts; doi:10.3390/e170x000x 46 pages

Journal ref: Entropy 2015, 17, 3053-3096

arXiv:1407.1056 [pdf]

doi 10.1155/2014/359868

Extraction of Multi-layered Social Networks from Activity Data

Authors: Katarzyna Musial, Piotr Bródka, Przemysław Kazienko, Jarosław Gaworecki

Abstract: The data gathered in all kinds of web-based systems, which enable users to interact with each other, provides an opportunity to extract social networks that consist of people and relationships between them. The emerging structures are very complex due to the number and type of discovered connections. In web-based systems, the characteristic element of each interaction between users is that there is always an object that serves as a communication medium. This can be e.g. an email sent from one user to another or a post at the forum authored by one user and commented on by others. Based on these objects and activities that users perform towards them, different kinds of relationships can be identified and extracted. An additional challenge arises from the fact that hierarchies can exist between objects, e.g. a forum consists of one or more groups of topics, and each of them contains topics that finally include posts. In this paper, we propose a new method for creation of multi-layered social network based on the data about users' activities towards different types of objects between which the hierarchy exists. Due to the flattening, preprocessing procedure new layers and new relationships in the multi-layered social network can be identified and analysed.

Submitted 3 July, 2014; originally announced July 2014.

Comments: 20 pages, 15 figures

Journal ref: The Scientific World Journal, vol. 2014, Article ID 359868, 13 pages, 2014

arXiv:1405.0538 [pdf, other]

doi 10.1007/s00354-014-0402-9

Seed Selection for Spread of Influence in Social Networks: Temporal vs. Static Approach

Authors: Radosław Michalski, Tomasz Kajdanowicz, Piotr Bródka, Przemysław Kazienko

Abstract: The problem of finding an optimal set of users for influencing others in the social network has been widely studied. Because it is NP-hard, some heuristics were proposed to find sub-optimal solutions. Still, one of the commonly used assumptions is that seeds are chosen on the static network, not the dynamic one. This static approach is in fact far from the real-world networks, where new nodes may appear and old ones dynamically disappear in the course of time. The main purpose of this paper is to analyse how the results of one of the typical models for spread of influence - linear threshold - differ depending on the strategy of building the social network used later for choosing seeds. To show the impact of network creation strategy on the final number of influenced nodes - outcome of spread of influence, the results for three approaches were studied: one static and two temporal with different granularities, i.e. various number of time windows. Social networks for each time window encapsulated dynamic changes in the network structure. Calculation of various node structural measures like degree or betweenness respected these changes by means of forgetting mechanism - more recent data had greater influence on node measure values. These measures were, in turn, used for node ranking and their selection for seeding. All concepts were applied to experimental verification on five real datasets.
The results revealed that temporal approach is always better than static and the higher granularity in the temporal social network while seeding, the more finally influenced nodes. Additionally, outdegree measure with exponential forgetting typically outperformed other time-dependent structural measures, if used for seed candidate ranking.

Submitted 21 November, 2014; v1 submitted 2 May, 2014; originally announced May 2014.

Journal ref: New Generation Computing, Vol. 32, Issue 3-4, pp. 213-235, 2014

arXiv:1306.3517 [pdf]

doi 10.1145/2492517.2500231

Different Approaches to Community Evolution Prediction in Blogosphere

Authors: Bogdan Gliwa, Piotr Bródka, Anna Zygmunt, Stanisław Saganowski, Przemysław Kazienko, Jarosław Koźlak

Abstract: Predicting the future direction of community evolution is a problem with high theoretical and practical significance. It makes it possible to determine which characteristics describing communities have importance from the point of view of their future behaviour. Knowledge about the probable future career of the community aids in the decision concerning investing in contact with members of a given community and carrying out actions to achieve a key position in it. It also makes it possible to determine effective ways of forming opinions or to protect group participants against such activities. In the paper, a new approach to group identification and prediction of future events is presented together with a comparison with an existing method. Performed experiments prove a high quality of prediction results. Comparison to previous studies shows that using many measures to describe the group profile, and in consequence as a classifier input, can improve predictions.

Submitted 14 June, 2013; originally announced June 2013.

Comments: SNAA2013 at ASONAM2013 IEEE Computer Society

arXiv:1306.0326 [pdf, other]

Parallel Processing of Large Graphs

Authors: Tomasz Kajdanowicz, Przemyslaw Kazienko, Wojciech Indyk

Abstract: More and more large data collections are gathered worldwide in various IT systems. Many of them possess the networked nature and need to be processed and analysed as graph structures. Due to their size they very often require the use of a parallel paradigm for efficient computation. Three parallel techniques have been compared in the paper: MapReduce, its map-side join extension and Bulk Synchronous Parallel (BSP). They are implemented for two different graph problems: calculation of single source shortest paths (SSSP) and collective classification of graph nodes by means of relational influence propagation (RIP). The methods and algorithms are applied to several network datasets differing in size and structural profile, originating from three domains: telecommunication, multimedia and microblog. The results revealed that iterative graph processing with the BSP implementation always and significantly outperforms MapReduce, by up to 10 times, especially for algorithms with many iterations and sparse communication. The MapReduce extension based on map-side join is also usually noticeably more efficient, although not as much as BSP. Nevertheless, MapReduce still remains a good alternative for enormous networks, whose data structures do not fit in local memories.

Submitted 3 June, 2013; originally announced June 2013.

Comments: Preprint submitted to Future Generation Computer Systems

MSC Class: 65Y05 ACM Class: D.1.3

arXiv:1304.4137 [pdf]

Group Evolution Discovery in Social Networks

Authors: Piotr Bródka, Stanisław Saganowski, Przemysław Kazienko

Abstract: Group extraction and their evolution are among the topics which arouse the greatest interest in the domain of social network analysis. However, while the grouping methods in social networks are developed very dynamically, the methods of group evolution discovery and analysis are still uncharted territory on the social network analysis map. Therefore the new method for the group evolution discovery called GED is proposed in this paper. Additionally, the results of the first experiments on the email based social network together with comparison with two other methods of group evolution discovery are presented.

Submitted 15 April, 2013; originally announced April 2013.

Comments: Brodka, P.; Saganowski, S.; Kazienko, P., "Group Evolution Discovery in Social Networks," 2011 International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 247-253, 25-27 July 2011, doi: 10.1109/ASONAM.2011.69

arXiv:1304.1877 [pdf]

Privacy-preserving Data Mining, Sharing and Publishing

Authors: Katarzyna Pasierb, Tomasz Kajdanowicz, Przemyslaw Kazienko

Abstract: The goal of the paper is to present different approaches to privacy-preserving data sharing and publishing in the context of e-health care systems. In particular, a literature review on technical issues in privacy assurance and a current real-life, high-complexity implementation of a medical system that assumes proper data sharing mechanisms are presented in the paper.

Submitted 6 April, 2013; originally announced April 2013.

Journal ref: Journal of Medical Informatics & Technologies, Vol. 18, pp. 69-76, 2011

arXiv:1303.5009 [pdf]

doi 10.1109/CASoN.2012.6412380

Quantifying Social Network Dynamics

Authors: Radosław Michalski, Piotr Bródka, Przemysław Kazienko, Krzysztof Juszczyszyn

Abstract: The dynamic character of most social networks requires modelling the evolution of networks in order to enable complex analysis of their dynamics. This paper focuses on the definition of differences between network snapshots by means of the Graph Differential Tuple. These differences make it possible to calculate diverse distance measures as well as to investigate the speed of changes. Four separate measures are suggested in the paper, with an experimental study on real social network data.

Submitted 20 March, 2013; originally announced March 2013.

Comments: In proceedings of the 4th International Conference on Computational Aspects of Social Networks, CASoN 2012

Journal ref: Michalski, R., Brodka, P., Kazienko, P., Juszczyszyn, K.: Quantifying Social Network Dynamics. IEEE Computer Society, pp. 59-64 (2012)

arXiv:1303.2369 [pdf]

doi 10.1109/CGC.2012.95

Negative Effects of Incentivised Viral Campaigns for Activity in Social Networks

Authors: Radosław Michalski, Jarosław Jankowski, Przemysław Kazienko

Abstract: Viral campaigns are crucial methods for word-of-mouth marketing in social communities. The goal of these campaigns is to encourage people to be active. The problem of incentivised and non-incentivised campaigns is studied in the paper. Both approaches were compared based on data collected within a real social networking site. The experimental results revealed that a highly motivated campaign does not necessarily provide better results, due to the overlapping effect. Additional studies have shown that the behaviour of individual community members in the campaign can be predicted based on their service profile, but the classification accuracy may be limited.

Submitted 10 March, 2013; originally announced March 2013.

Comments: In proceedings of the 2nd International Conference on Social Computing and its Applications, SCA 2012

Journal ref: Michalski, R., Jankowski, J., Kazienko, P.: Negative Effects of Incentivised Viral Campaigns for Activity in Social Networks. IEEE Computer Society, pp. 391-398 (2012)

arXiv:1303.2364 [pdf]

doi 10.1007/978-3-642-35386-4_34

The Multidimensional Study of Viral Campaigns as Branching Processes

Authors: Jarosław Jankowski, Radosław Michalski, Przemysław Kazienko

Abstract: Viral campaigns on the Internet may follow a variety of models, depending on the content, incentives, personal attitudes of sender and recipient to the content and other factors. Because knowledge of the campaign specifics is essential for campaign managers, researchers are constantly evaluating models and real-world data. The goal of this article is to present new knowledge obtained from studying two viral campaigns that took place in a virtual world and followed the branching process. The results show that it is possible to reduce the time needed to estimate the model parameters of the campaign and, moreover, some important aspects of the time-generations relationship are presented.

Submitted 10 March, 2013; originally announced March 2013.

Comments: In proceedings of the 4th International Conference on Social Informatics, SocInfo 2012

Journal ref: Jankowski, J., Michalski, R., Kazienko, P.: The Multidimensional Study of Viral Campaigns as Branching Processes. K. Aberer et al. (Eds.): LNCS, vol. 7710, pp. 462-474, Springer, Berlin Heidelberg (2012)

arXiv:1303.0095 [pdf]

doi 10.1007/978-3-642-16567-2_7

Label-dependent Feature Extraction in Social Networks for Node Classification

Authors: Tomasz Kajdanowicz, Przemyslaw Kazienko, Piotr Doskocz

Abstract: A new method of feature extraction in the social network for within-network classification is proposed in the paper. The method provides new features calculated by combining both network structure information and class labels assigned to nodes. The influence of various features on classification performance has also been studied. The experiments on real-world data have shown that features created by the proposed method can lead to significant improvement of classification accuracy.

Submitted 1 March, 2013; originally announced March 2013.

Comments: feature extraction, label-dependent features, classification, social network analysis, AMD social network

MSC Class: 91D30; 68T05; 68T10 ACM Class: I.2.8; I.2.11

Journal ref: Kajdanowicz T., Kazienko P., Doskocz P.: Label-dependent Feature Extraction in Social Networks for Node Classification. Lecture Notes in Artificial Intelligence LNAI 6430, Springer, 2010, pp. 89-102

arXiv:1303.0093 [pdf]

doi 10.1109/TSMCA.2011.2132707

Multidimensional Social Network in the Social Recommender System

Authors: Przemyslaw Kazienko, Katarzyna Musial, Tomasz Kajdanowicz

Abstract: All online sharing systems gather data that reflects users' collective behaviour and their shared activities. This data can be used to extract different kinds of relationships, which can be grouped into layers, and which are basic components of the multidimensional social network proposed in the paper. The layers are created on the basis of two types of relations between humans, i.e. direct and object-based ones, which respectively correspond to either social or semantic links between individuals. For better understanding of the complexity of the social network structure, layers and their profiles were identified and studied on two snapshots of the Flickr population taken at different times. Additionally, for each layer, a separate strength measure was proposed. The experiments on the Flickr photo sharing system revealed that the relationships between users result either from semantic links between objects they operate on or from social connections of these users. Moreover, the density of the social network increases in time. The second part of the study is devoted to building a social recommender system that supports the creation of new relations between users in a multimedia sharing system. Its main goal is to generate personalized suggestions that are continuously adapted to users' needs depending on the personal weights assigned to each layer in the multidimensional social network. The conducted experiments confirmed the usefulness of the proposed model.

Submitted 1 March, 2013; originally announced March 2013.

Comments: social recommender system;Multidimensional social network (MSN);Web 2.0;multi-layered social network;multimedia sharing system (MSS);recommender system;social network analysis

MSC Class: 91D30 ACM Class: H.3.4; H.3.5

Journal ref: Kazienko, P.; Musial, K.; Kajdanowicz, T., "Multidimensional Social Network in the Social Recommender System," IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans, vol. 41, no. 4, pp. 746-759, July 2011

arXiv:1301.1534 [pdf]

doi 10.2478/v10209-011-0017-6

Influence Of The User Importance Measure On The Group Evolution Discovery

Authors: Stanisław Saganowski, Piotr Bródka, Przemysław Kazienko

Abstract: Social groups, with their extraction, dynamics and evolution, are among the most interesting topics in social network science. One year ago, the method for group evolution discovery (GED) was introduced. During the extraction process, the GED method takes into account both the quality and the quantity of group members. The quality is reflected by a user importance measure. In this paper, the influence of different user importance measures on the results of the GED method is examined and presented. The results indicate that using global measures like social position (page rank) yields more precise results than using local measures like degree centrality or no measure at all.

Submitted 8 January, 2013; originally announced January 2013.

Comments: Creative Commons Attribution-NonCommercial-NoDerivs license. Presented at the Congress of Young IT Scientists, Międzyzdroje, Poland, 20-22.09.2012

Journal ref: Foundations of Computing and Decision Sciences, Volume 37, Issue 4, Pages 293-303, 2012

arXiv:1212.2425 [pdf]

Multi-layered Social Networks

Authors: Piotr Bródka, Przemysław Kazienko

Abstract: It is quite obvious that in the real world, more than one kind of relationship can exist between two actors and that those ties can be so intertwined that it is impossible to analyse them separately [Fienberg 85], [Minor 83], [Szell 10]. Social networks with more than one type of relation are not a completely new concept [Wasserman 94] but they were analysed mainly at the small scale, e.g. in [McPherson 01], [Padgett 93], and [Entwisle 07]. Just like in the case of a regular single-layered social network, there is no widely accepted definition or even a common name. At the beginning, such networks were called multiplex networks [Haythornthwaite 99], [Monge 03]. The term is derived from communications theory, which defines multiplex as combining multiple signals into one in such a way that it is possible to separate them if needed [Hamill 06]. Recently, the area of multi-layered social networks has started attracting more and more attention in research conducted within different domains [Kazienko 11a], [Szell 10], [Rodriguez 07], [Rodriguez 09], and the meaning of multiplex network has expanded and covers not only social relationships but any kind of connection, e.g. based on geography, occupation, kinship, hobbies, etc. [Abraham 12]. This essay aims to summarize existing knowledge about one concept which has many different names, i.e. the concept of Multi-layered Social Network, also known as Layered social network, Multi-relational social network, Multidimensional social network, Multiplex social network.

Submitted 14 December, 2012; v1 submitted 11 December, 2012; originally announced December 2012.

Comments: This is the beginning of an essay for the Encyclopedia of Social Network Analysis and Mining, Springer 2013, so please cite as: Bródka P., Kazienko P., Multi-layered Social Networks, Encyclopedia of Social Network Analysis and Mining, Springer 2013

arXiv:1210.5240 [pdf]

Tracking Group Evolution in Social Networks

Authors: Piotr Bródka, Stanisław Saganowski, Przemysław Kazienko

Abstract: Easy access to vast amounts of data, especially data covering long periods of time, makes it possible to divide a social network into timeframes and create a temporal social network. Such a network makes it possible to analyse its dynamics. One aspect of the dynamics is the analysis of the evolution of social communities, i.e., how a particular group changes over time. To do so, the complete group evolution history is needed. That is why a new method for group evolution extraction, called GED, is presented in this paper.

Submitted 18 October, 2012; originally announced October 2012.

Comments: Bródka P., Saganowski S., Kazienko P.: Tracking Group Evolution in Social Networks. SocInfo'11, The Third International Conference on Social Informatics, Lecture Notes in Artificial Intelligence LNAI, Springer, 2011, pp. 316-319. For an extended version of this paper, see http://arxiv.org/abs/1207.4297. arXiv admin note: substantial text overlap with arXiv:1210.5167

Journal ref: Lecture Notes in Artificial Intelligence LNAI, Springer, 2011

arXiv:1210.5184 [pdf]

doi 10.1109/CASON.2011.6085951

A degree centrality in multi-layered social network

Authors: Piotr Bródka, Krzysztof Skibicki, Przemysław Kazienko, Katarzyna Musiał

Abstract: Multi-layered social networks reflect complex relationships existing in modern interconnected IT systems. In such a network, each pair of nodes may be linked by many edges that correspond to different communication or collaboration user activities. Multi-layered degree centrality for multi-layered social networks is presented in the paper. Experimental studies were carried out on data collected from a real Web 2.0 site. The multi-layered social network extracted from this data consists of ten distinct layers, and the network analysis was performed for different degree centrality measures.

Submitted 18 October, 2012; originally announced October 2012.

Comments: Brodka, P.; Skibicki, K.; Kazienko, P.; Musial, K., "A degree centrality in multi-layered social network," 2011 International Conference on Computational Aspects of Social Networks (CASoN), pp. 237-242, 19-21 Oct. 2011, doi: 10.1109/CASON.2011.6085951; http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=6085951&isnumber=6085907
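
As a rough illustration of why degree needs care when a pair of nodes can be linked on several layers, the sketch below contrasts per-layer degree with the number of distinct neighbours across all layers. This is an assumption-laden example, not the centrality definition from the paper: the function name `multilayer_degree`, the layer names and the undirected edge-set representation are all hypothetical.

```python
def multilayer_degree(layers, node):
    """Degree of `node` per layer and across layers.

    layers: dict mapping layer name -> set of undirected edges (a, b).
    Returns (per_layer, cross_layer), where per_layer maps each layer to
    the number of neighbours on that layer and cross_layer is the number
    of distinct neighbours over all layers.
    """
    per_layer = {}
    all_neighbours = set()
    for name, edges in layers.items():
        # Neighbours of `node` on this layer; self-loops are ignored.
        neighbours = {
            b if a == node else a
            for a, b in edges
            if node in (a, b) and a != b
        }
        per_layer[name] = len(neighbours)
        all_neighbours |= neighbours
    return per_layer, len(all_neighbours)


# Hypothetical two-layer network: "a" and "b" are linked on both layers.
layers = {
    "email": {("a", "b"), ("a", "c")},
    "chat": {("a", "b"), ("d", "a")},
}
per_layer, cross_layer = multilayer_degree(layers, "a")
```

Summing the per-layer degrees would count "b" twice, whereas the cross-layer value counts each neighbour once; which behaviour is appropriate depends on the analysis.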

arXiv:1210.5180 [pdf]

doi 10.1109/ASONAM.2011.67

Shortest Path Discovery in the Multi-layered Social Network

Authors: Piotr Bródka, Paweł Stawiak, Przemysław Kazienko

Abstract: Multi-layered social networks consist of a fixed set of nodes linked by multiple connections. These connections may be derived from different types of user activities logged in the IT system. To calculate any structural measures for multi-layered networks, this multitude of relations must be handled in a parameterized way. Two separate algorithms for evaluation of shortest paths in the multi-layered social network are proposed in the paper. The first one is based on pre-processing, i.e. aggregation of multiple links into single multi-layered edges, whereas in the second approach, many edges are processed 'on the fly' in the middle of path discovery. Experimental studies carried out on the DBLP database converted into the multi-layered social network are presented as well.

Submitted 18 October, 2012; originally announced October 2012.

Comments: This is an extended version of the paper ASONAM 2011, IEEE Computer Society, pp. 497-501 DOI 10.1109/ASONAM.2011.67

Journal ref: IEEE Computer Society, 2011
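
The pre-processing idea from the abstract above (aggregate the links between a pair of nodes into one edge, then search for paths) can be sketched as follows. Everything specific here is an assumption made for illustration and is not taken from the paper: the weighted-sum aggregation, the `layer_weights` parameter, the conversion of strength to distance as `1 / strength`, the undirected treatment of edges, and the function names.

```python
import heapq


def aggregate_layers(layers, layer_weights):
    """Collapse per-layer link strengths into one undirected weighted graph.

    layers: dict layer name -> dict {(a, b): strength}.
    layer_weights: dict layer name -> weight of that layer (hypothetical).
    Returns an adjacency dict node -> list of (neighbour, distance).
    """
    strength = {}
    for name, edges in layers.items():
        for (a, b), s in edges.items():
            key = frozenset((a, b))  # (a, b) and (b, a) are the same link
            strength[key] = strength.get(key, 0.0) + layer_weights[name] * s
    graph = {}
    for key, s in strength.items():
        if len(key) != 2 or s <= 0:
            continue  # skip self-loops and links with no strength
        a, b = tuple(key)
        distance = 1.0 / s  # assumed: stronger aggregated link = shorter edge
        graph.setdefault(a, []).append((b, distance))
        graph.setdefault(b, []).append((a, distance))
    return graph


def dijkstra(graph, source):
    """Standard Dijkstra over the aggregated graph."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist.get(v, float("inf")):
            continue  # stale queue entry
        for u, w in graph.get(v, []):
            nd = d + w
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist


# Hypothetical two-layer example.
layers = {
    "coauthor": {("a", "b"): 1.0, ("b", "c"): 0.5},
    "cite": {("a", "b"): 1.0, ("a", "c"): 0.25},
}
weights = {"coauthor": 0.5, "cite": 0.5}
dist = dijkstra(aggregate_layers(layers, weights), "a")
```

Changing `layer_weights` changes which paths are shortest without touching the path search itself, which is one way to read the "parameterized" handling of relations mentioned in the abstract.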

arXiv:1210.5171 [pdf]

doi 10.1109/ASONAM.2012.207

Identification of Group Changes in Blogosphere

Authors: Bogdan Gliwa, Stanisław Saganowski, Anna Zygmunt, Piotr Bródka, Przemysław Kazienko, Jarosław Koźlak

Abstract: The paper addresses the problem of change identification in social group evolution. A new SGCI method for discovering stable groups was proposed and compared with the existing GED method. The experimental studies on a Polish blogosphere service revealed that both methods are able to identify similar evolution events even though they use different concepts. Some differences were demonstrated as well.

Submitted 18 October, 2012; originally announced October 2012.

Comments: The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, IEEE Computer Society, 2012, pp. 1233-1238

Journal ref: IEEE Computer Society, 2012

arXiv:1210.5167 [pdf]

doi 10.1109/ASONAM.2012.113

Influence of the Dynamic Social Network Timeframe Type and Size on the Group Evolution Discovery

Authors: Stanisław Saganowski, Piotr Bródka, Przemysław Kazienko

Abstract: New technologies make it possible to store vast amounts of data about user interactions. From these data, a social network can be created. Additionally, because the times and dates of these activities are usually also stored, the dynamics of such a network can be analysed by splitting it into many timeframes, each representing the state of the network during a specific period of time. One of the most interesting issues is group evolution over time. To track group evolution, the GED method can be used. However, the choice of the timeframe type and length might have a great influence on the method's results. Therefore, in this paper, the influence of timeframe type as well as timeframe length on the GED method results is extensively analysed.

Submitted 18 October, 2012; originally announced October 2012.

Comments: The 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, IEEE Computer Society, 2012, pp. 678-682

Journal ref: IEEE Computer Society, 2012

arXiv:1210.5161 [pdf]

Predicting Group Evolution in the Social Network

Authors: Piotr Bródka, Przemysław Kazienko, Bartosz Kołoszczyk

Abstract: Groups, i.e. social communities, are important components of entire societies, analysed by means of the social network concept. Their immanent feature is continuous evolution over time. If we know how groups in the social network have evolved, we can use this information and try to predict the next step in the evolution of a given group. In the paper, a new approach for group evolution prediction is presented and examined. Experimental studies on four evolving social networks revealed that (i) the prediction based on simple input features may be very accurate, (ii) some classifiers are more precise than others and (iii) parameters of the group evolution extraction method significantly influence the prediction quality.

Submitted 18 October, 2012; originally announced October 2012.

Comments: Bródka P., Kazienko P., Kołoszczyk B.: Predicting Group Evolution in the Social Network. K. Aberer et al. (Eds.): SocInfo 2012, LNCS 7710, pp. 54-67, 2012

Journal ref: K. Aberer et al. (Eds.): SocInfo 2012, LNCS 7710, pp. 54-67, 2012

arXiv:1209.6050 [pdf]

An Introduction to Community Detection in Multi-layered Social Network

Authors: Piotr Bródka, Tomasz Filipowski, Przemysław Kazienko

Abstract: The extraction of social communities and their dynamics are among the most important problems in today's social network analysis. During the last few years, many researchers have proposed their own methods for group discovery in social networks. However, almost none of them have noticed that modern social networks are much more complex than a few years ago. Due to the vast amount of different data about various user activities available in IT systems, it is possible to distinguish a new class of social networks called the multi-layered social network. For that reason, a new approach to community detection in the multi-layered social network, which utilizes a multi-layered edge clustering coefficient, is proposed in the paper.

Submitted 26 September, 2012; originally announced September 2012.

Comments: M.D. Lytras et al. (Eds.): WSKS 2011, CCIS 278, pp. 185-190, 2012

Journal ref: CCIS 278, pp. 185-190, 2012

arXiv:1207.4297 [pdf]

doi 10.1007/s13278-012-0058-8

GED: the method for group evolution discovery in social networks

Authors: Piotr Bródka, Stanisław Saganowski, Przemysław Kazienko

Abstract: The continuous interest in the social network area contributes to the fast development of this field. The new possibilities of obtaining and storing data facilitate deeper analysis of the entire network, extracted social groups and single individuals as well. One of the most interesting research topics is the dynamics of social groups, that is, the analysis of group evolution over time. Having appropriate knowledge and methods for dynamic analysis, one may attempt to predict the future of the group, and then manage it properly in order to achieve or change this predicted future according to specific needs. Such an ability would be a powerful tool in the hands of human resource managers, personnel recruitment, marketing, etc. The social group evolution consists of individual events, and seven types of such changes have been identified in the paper: continuing, shrinking, growing, splitting, merging, dissolving and forming. To enable the analysis of group evolution, a change indicator called the inclusion measure was proposed. It has been used in a new method for exploring the evolution of social groups, called Group Evolution Discovery (GED). The experimental results of its use, together with a comparison to two well-known algorithms in terms of accuracy, execution time, flexibility and ease of implementation, are also described in the paper.

Submitted 22 July, 2012; v1 submitted 18 July, 2012; originally announced July 2012.

Comments: 14 pages, Social Network Analysis and Mining

Showing 1–50 of 51 results for author: Kazienko, P