The study of cause-effect discourse connectives can help in automated discourse processing and in the automatic identification of argument units. Besides conjunctions and other function words and expressions, there are various multi-word expressions containing content words (e.g. iz etogo sleduet ‘it follows that’) that function as cause-effect connectives. We present a method for mining connectives for the Russian language. First, a seed list of 143 multi-word connectives was manually extracted from the Ru-RSTreebank corpus. Two Word2Vec models, trained on a news corpus, were then used to detect new multi-word connectives. Before training the first model, the connectives from the seed list were glued together to form multi-word tokens. Before training the second model, all 3-grams in the corpus that match the proposed patterns based on anaphoric expressions were additionally glued in the same way. The method based on the second model gives a satisfactory result and, after manual editing, allows the list of connectives for cause-effect discourse relations to be expanded (286 new connectives).
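The gluing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `glue_connectives`, the underscore joiner, the greedy longest-match strategy, and the sample sentence are all assumptions made for illustration. The connective iz etogo sleduet ‘it follows that’ is the abstract's own example.

```python
def glue_connectives(tokens, connectives, joiner="_"):
    """Merge each occurrence of a known multi-word connective into one token.

    Hypothetical helper: greedy, longest match first, case-insensitive.
    """
    # Store every connective as a tuple of lowercased words for fast lookup.
    phrases = {tuple(c.lower().split()) for c in connectives}
    max_len = max((len(p) for p in phrases), default=0)

    glued, i = [], 0
    while i < len(tokens):
        # Try the longest possible window first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrases:
                glued.append(joiner.join(window))
                i += n
                break
        else:
            # No connective starts here; keep the token unchanged.
            glued.append(tokens[i])
            i += 1
    return glued


# Illustrative seed list and sentence (transliterated, invented for the demo).
seed = ["iz etogo sleduet", "potomu chto"]
sentence = "iz etogo sleduet chto metod rabotaet".split()
print(glue_connectives(sentence, seed))
```

Sentences preprocessed this way could then be passed to any Word2Vec implementation, so that each glued connective receives its own vector and its nearest neighbours can be inspected as candidate connectives. The exact training setup and the anaphoric 3-gram patterns are described in the paper and are not reproduced here.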
PDF on SpringerLink (in English): https://link.springer.com/content/pdf/10.1007%2F978-3-030-37858-5_60.pdf
HSE Publications: https://publications.hse.ru/chapters/359416623
Pisarevskaya D., Kobozeva M. et al. Automatic Mining of Cause-Effect Discourse Connectives for Russian // International Conference on Digital Transformation and Global Society. – Springer, Cham, 2019. – Pp. 708-718.