Analysis of Corpus of Extremist Texts and Unlawful Texts

Authors

Smirnoff I. Suvorova (Ananieva) M.

Abstract

Purpose of the study: to develop a technique for building and automatically analyzing special-purpose corpora, so that they can later serve as training datasets and be used to identify distinguishing features in text classification tasks.

Method: the corpus analysis tools of the TXM platform were used, extended with newly developed procedures for computing additional text features such as letter combinations, pseudo-stems, noun phrases, and verb phrases.

Results: the developed extensions of the TXM corpus platform are shown to handle the analysis of texts on specialized subjects effectively. The resulting corpus of extremist texts can be used as a training set for text classification tasks. The study also concludes that letter combinations can serve as universal distinguishing features alongside the classical linguistic characteristics of texts.
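The abstract names letter combinations and pseudo-stems among the additional features computed on top of TXM, and describes using them as distinguishing features between texts. The sketch below is only an illustration of what such features might look like. It is not the authors' procedure and does not use TXM. Several details are assumptions made for the example: a letter combination is taken to be a character n-gram inside a word, a pseudo-stem is taken to be the first k letters of a word, the values of n and k are arbitrary, and features are ranked by a smoothed log ratio of relative frequencies. All function names are hypothetical.

```python
import math
import re
from collections import Counter

# Words are maximal runs of letters. [^\W\d_] matches word characters
# except digits and underscore, i.e. letters (Latin and Cyrillic alike).
WORD_RE = re.compile(r"[^\W\d_]+")


def words(text: str) -> list[str]:
    """Lower-cased, letter-only tokens."""
    return WORD_RE.findall(text.lower())


def letter_ngrams(text: str, n: int = 3) -> Counter:
    """Assumed definition: count letter combinations of length n inside each word."""
    counts = Counter()
    for w in words(text):
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts


def pseudo_stems(text: str, k: int = 5) -> Counter:
    """Assumed definition: a pseudo-stem is the first k letters of a word."""
    return Counter(w[:k] for w in words(text))


def distinguishing(target: Counter, reference: Counter, top: int = 10):
    """Rank features by the add-one-smoothed log ratio of their relative
    frequencies in the target corpus versus the reference corpus."""
    t_total = sum(target.values())
    r_total = sum(reference.values())
    vocab = set(target) | set(reference)
    v = len(vocab)
    scores = {}
    for f in vocab:
        p_t = (target[f] + 1) / (t_total + v)
        p_r = (reference[f] + 1) / (r_total + v)
        scores[f] = math.log(p_t / p_r)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]
```

For example, `distinguishing(letter_ngrams(corpus_a), letter_ngrams(corpus_b))` (where `corpus_a` and `corpus_b` are hypothetical strings) lists the letter trigrams most over-represented in the first corpus relative to the second. Add-one smoothing keeps a feature that is absent from one corpus from producing an infinite score.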

External links

DOI and a link to the PDF file (in Russian): https://doi.org/10.21681/2311-3456-2019-4-54-60

Contents of the 4th issue of the Cybersecurity Issues journal, with a link to the PDF (in Russian): https://cyberrus.com/voprosy_kiberbezopasnosti_444/?lang=en

Read or download the PDF at ResearchGate (in Russian): https://www.researchgate.net/publication/335256244_Analysis_of_Corpus_of_Extremist_Texts_and_Unlawful_Texts

How to cite

Lavrentev A. M., Smirnov I. V., Solovyov F. N., Suvorova M. I., Fokina A. I., Chepovsky A. M. Analysis of Corpus of Extremist Texts and Unlawful Texts // Cybersecurity Issues. – 2019. – No. 4. – Pp. 54–60.