ParaPlag: A Corpus for Detecting Paraphrased Text Borrowings in Russian

Authors

Smirnov I. V., Sochenkov I. V., Zubarev D. V.

Abstract

The paper presents ParaPlag, a large Russian-language text dataset for evaluating and comparing the quality metrics of plagiarism detection approaches that deal with big data. The PlagEvalRus-2017 competition, which aimed to evaluate plagiarism detection methods, uses ParaPlag as its main dataset for the source retrieval and text alignment tasks. ParaPlag is open and available on the Web. We propose a guide for writers who want to contribute to ParaPlag and extend it. Our research also presents an analysis of the text rewriting techniques used by unscrupulous authors.

External links

PDF on the website of the International Conference "Dialogue" (in English): http://www.dialog-21.ru/media/3950/sochenkovivetal.pdf

Russian Science Citation Index (РИНЦ): https://elibrary.ru/item.asp?id=31051080

Download the PDF or read online on ResearchGate (in English): https://www.researchgate.net/publication/330401168_THE_PARAPLAG_RUSSIAN_DATASET_FOR_PARAPHRASED_PLAGIARISM_DETECTION

How to cite

Sochenkov I. V., Zubarev D. V., Smirnov I. V. The ParaPlag: Russian Dataset for Paraphrased Plagiarism Detection // Computational Linguistics and Intellectual Technologies: Papers from the Annual International Conference "Dialogue" 2017, v. 1, pp. 284–296