Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts) (2017)

Authors

Smirnoff I., Devyatkin D., Kobozeva M., Suvorova (Ananieva) M.

Abstract

In this paper we present the results of research on automatic extremist text detection. For this purpose, an experimental dataset in the Russian language was created. Under Russian legislation, we cannot make it publicly available. We compared various classification methods (multinomial naive Bayes, logistic regression, linear SVM, random forest, and gradient boosting) and evaluated the contribution of differentiating features (lexical, semantic, and psycholinguistic) to classification quality. The experimental results show that psycholinguistic and semantic features are promising for extremist text detection.

External links

DOI: https://doi.org/10.1109/ISI.2017.8004907

Publications of Higher School of Economics: https://publications.hse.ru/en/chapters/215574374

ResearchGate: https://www.researchgate.net/publication/319051546_Exploring_linguistic_features_for_extremist_texts_detection_on_the_material_of_Russian-speaking_illegal_texts

Semantic Scholar: https://api.semanticscholar.org/CorpusID:38805623

Reference link

Devyatkin, D., Smirnov, I., Ananyeva, M., Kobozeva, M., Chepovskiy, A., Solovyev, F. Exploring linguistic features for extremist texts detection (on the material of Russian-speaking illegal texts) (2017) 2017 IEEE International Conference on Intelligence and Security Informatics: Security and Big Data, ISI 2017, art. no. 8004907, pp. 188-190.