Classification of Rhetorical Relations for Discourse Analysis of Texts in Russian


Smirnov I. V., Chistova E. V., Kobozeva M. V.


The paper considers the task of automatic discourse parsing of texts in Russian. Discourse parsing is a well-known approach to capturing text semantics beyond the boundaries of single sentences. Discourse annotation has been found useful for various tasks, including summarization, sentiment analysis, and question answering. The recent release of the manually annotated Ru-RSTreebank corpus has made it possible to apply supervised machine learning techniques to build such parsers for the Russian language. The corpus provides discourse annotation in a widely adopted formalism, Rhetorical Structure Theory (RST). In this work, we develop feature sets for rhetorical relation classification in Russian-language texts, investigate the importance of various types of features, and report the results of the first experimental evaluation of machine learning models trained on the Ru-RSTreebank corpus. We consider various machine learning methods, including gradient boosting, neural networks, and ensembling of several models by soft voting.

External links

PDF on the website of the International Conference "Dialogue" (in English): www.dialog-21.ru/media/4595/chistovaevplusetal-076.pdf

PDF on the HSE website (in English): https://www.hse.ru/data/2019/06/20/1488855652/2019_Dialogue_Chistova.pdf

Russian Science Citation Index (RSCI): https://elibrary.ru/item.asp?id=43244096

RUDN University Repository: https://repository.rudn.ru/ru/records/article/record/65814/

How to cite

Chistova E. V., Shelmanov A. O., Kobozeva M. V., Pisarevskaya D. B., Smirnov I. V., Toldova S. Yu. Classification Models for RST Discourse Parsing of Texts in Russian // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2019”. – 2019. – Pages 163-176.