This paper addresses automatic discourse parsing of Russian-language texts. Discourse parsing is a well-known approach to capturing text semantics beyond the boundaries of single sentences. Discourse annotation has proved useful for a range of tasks, including summarization, sentiment analysis, and question answering. The recent release of the manually annotated Ru-RSTreebank corpus has made it possible to apply supervised machine learning techniques to building such parsers for Russian. The corpus provides discourse annotation in a widely adopted formalism, Rhetorical Structure Theory. In this work, we develop feature sets for rhetorical relation classification in Russian-language texts, investigate the importance of various feature types, and report the results of the first experimental evaluation of machine learning models trained on the Ru-RSTreebank corpus. We consider several machine learning methods, including gradient boosting, a neural network, and an ensemble of several models combined by soft voting.
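The abstract names soft voting but does not describe how it is implemented. The following is a minimal sketch of the general technique only, not the authors' code: each model outputs a probability per relation label, the probabilities are averaged (optionally with weights), and the label with the highest average is chosen. The function names `soft_vote` and `predict_label`, the three relation labels, and all probability values are assumptions made up for illustration; they are not taken from the paper.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the (weighted) average of per-class probabilities across models."""
    if not model_probs:
        raise ValueError("at least one model is required")
    n_classes = len(model_probs[0])
    if any(len(p) != n_classes for p in model_probs):
        raise ValueError("all models must score the same set of classes")
    if weights is None:
        weights = [1.0] * len(model_probs)  # plain averaging by default
    if len(weights) != len(model_probs):
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    return [
        sum(w * p[c] for w, p in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def predict_label(model_probs: Sequence[Sequence[float]],
                  labels: Sequence[str],
                  weights: Optional[Sequence[float]] = None) -> str:
    """Pick the label whose averaged probability is highest."""
    avg = soft_vote(model_probs, weights)
    best = max(range(len(avg)), key=avg.__getitem__)
    return labels[best]


if __name__ == "__main__":
    # Hypothetical labels and made-up probabilities, for illustration only.
    labels = ["Elaboration", "Contrast", "Cause"]
    probs = [
        [0.5, 0.3, 0.2],  # e.g. a gradient boosting model
        [0.2, 0.6, 0.2],  # e.g. a neural network
        [0.3, 0.4, 0.3],  # e.g. a third classifier
    ]
    print(soft_vote(probs))
    print(predict_label(probs, labels))
```

Unlike hard (majority) voting, soft voting lets a model that is very confident outweigh models that are only marginally in favor of another label, which is why the weights and the calibration of each model's probabilities matter.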
PDF on the website of the International Conference "Dialogue" (in English): www.dialog-21.ru/media/4595/chistovaevplusetal-076.pdf
PDF on the HSE website (in English): https://www.hse.ru/data/2019/06/20/1488855652/2019_Dialogue_Chistova.pdf
RUDN. Repository: https://repository.rudn.ru/ru/records/article/record/65814/
Chistova E. V., Shelmanov A. O., Kobozeva M. V., Pisarevskaya D. B., Smirnov I. V., Toldova S. Yu. Classification Models for RST Discourse Parsing of Texts in Russian // Computational Linguistics and Intellectual Technologies: Proceedings of the International Conference “Dialogue 2019”. – 2019. – Pages 163-176.