Bridging Discourse Treebanks with a Unified Rhetorical Structure Parser

Author

Elena Chistova

Abstract

We introduce UniRST, the first unified RST-style discourse parser capable of handling 18 treebanks in 11 languages without modifying their relation inventories. To overcome inventory incompatibilities, we propose and evaluate two training strategies: Multi-Head, which assigns a separate relation classification layer to each inventory, and Masked-Union, which enables shared-parameter training through selective label masking. We first benchmark mono-treebank parsing with a simple yet effective augmentation technique for low-resource settings. We then train a unified model and show that (1) the parameter-efficient Masked-Union approach is also the strongest, and (2) UniRST outperforms 16 of 18 mono-treebank baselines, demonstrating the advantages of single-model, multilingual, end-to-end discourse parsing across diverse resources.
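The Masked-Union idea described above can be illustrated with a minimal sketch. All names here (the union label list, the per-treebank inventories, the helper function) are hypothetical and not taken from the paper; the sketch only shows the general mechanism of training a single shared classifier over the union of all relation labels while masking out, for each treebank, the labels absent from its inventory:

```python
import math

# Hypothetical union of relation labels across all treebanks.
UNION_LABELS = ["Elaboration", "Attribution", "Joint", "Contrast", "Cause"]

# Hypothetical per-treebank relation inventories (subsets of the union).
TREEBANK_INVENTORIES = {
    "treebank-a": {"Elaboration", "Attribution", "Contrast"},
    "treebank-b": {"Elaboration", "Joint", "Cause"},
}

def masked_softmax(logits, treebank):
    """Softmax over the shared union label set, with logits of labels
    outside the given treebank's inventory set to -inf, so those labels
    receive exactly zero probability."""
    allowed = TREEBANK_INVENTORIES[treebank]
    masked = [l if lab in allowed else float("-inf")
              for l, lab in zip(logits, UNION_LABELS)]
    m = max(masked)
    exps = [math.exp(l - m) for l in masked]  # exp(-inf) -> 0.0
    z = sum(exps)
    return [e / z for e in exps]

# Scores from a shared classifier; only treebank-a's labels survive masking.
probs = masked_softmax([1.2, 0.3, 2.0, -0.5, 0.7], "treebank-a")
```

Because the classifier parameters are shared across all inventories and only the final probability distribution is restricted, this scheme needs no extra per-inventory output heads, which is consistent with the abstract's description of Masked-Union as the parameter-efficient variant.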

External links

DOI: 10.18653/v1/2025.codi-1.17

DOI: 10.48550/arXiv.2510.06427

Download the paper (PDF) from arXiv.org (in English): https://arxiv.org/abs/2510.06427

Download the CODI 2025 proceedings (PDF) from the ACL Anthology (in English): https://aclanthology.org/2025.codi-1.pdf

ResearchGate: https://www.researchgate.net/publication/396330106_Bridging_Discourse_Treebanks_with_a_Unified_Rhetorical_Structure_Parser

How to cite

Elena Chistova. Bridging Discourse Treebanks with a Unified Rhetorical Structure Parser. In Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025), pages 197–208, Suzhou, China.