Bilingual Rhetorical Structure Parsing with Large Parallel Annotations

Authors

Chistova E.

Abstract

Discourse parsing is a crucial task in natural language processing that aims to reveal the higher-level relations in a text. Despite growing interest in cross-lingual discourse parsing, challenges persist due to limited parallel data and inconsistencies in the Rhetorical Structure Theory (RST) application across languages and corpora. To address this, we introduce a parallel Russian annotation for the large and diverse English GUM RST corpus. Leveraging recent advances, our end-to-end RST parser achieves state-of-the-art results on both English and Russian corpora. It demonstrates effectiveness in both monolingual and bilingual settings, successfully transferring even with limited second-language annotation. To the best of our knowledge, this work is the first to evaluate the potential of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.

External links

DOI: 10.18653/v1/2024.findings-acl.577

DOI: 10.48550/arXiv.2409.14969

Download the article at ACL Anthology (PDF): https://aclanthology.org/2024.findings-acl.577/

Download the article (PDF) or read online at arXiv.org: https://arxiv.org/html/2409.14969v1

Download data, code, and models on GitHub: https://github.com/tchewik/isanlp_rst

Citation

Chistova, Elena. Bilingual Rhetorical Structure Parsing with Large Parallel Annotations // Findings of the Association for Computational Linguistics, ACL 2024, Bangkok, Thailand and virtual meeting, August 11–16, 2024, pp. 9689–9706.