Principles of Developing a Discourse Corpus of the Russian Language

Authors

Смирнов И. В., Кобозева М. В., Суворова (Ананьева) М. И.

Abstract

Many natural language processing tasks (machine translation evaluation, anaphora resolution, information retrieval, etc.) require a corpus of texts annotated for discourse structure. As of now, no such corpus exists for written Russian, which hinders the development of a range of applications. This paper presents the first steps toward constructing a Rhetorical Structure Corpus of the Russian language. We discuss the main annotation principles, the problems that arise, and ways to solve them. Because annotation consistency is often an issue when texts are manually annotated for something as subjective as discourse structure, we focus specifically on measuring inter-annotator agreement. We also propose a new set of rhetorical relations (modified from the classic Mann & Thompson set) that is more suitable for Russian. We aim to use the corpus for experiments on discourse parsing and believe it will be of great help to other researchers. The corpus will be made publicly available.

External links

RSCI (РИНЦ): https://www.elibrary.ru/item.asp?id=31061690

PDF on the website of the international conference "Dialogue" (in English): http://www.dialog-21.ru/media/3938/pisarevskayadetal.pdf

Read on ResearchGate (in English): https://www.researchgate.net/publication/320083661_Towards_building_a_Discourse-annotated_corpus_of_Russian
