Automatic Mining of Cause-Effect Discourse Connectives for Russian


Kobozeva M.


The study of cause-effect discourse connectives can help in automated discourse processing and in the automatic identification of argument units. Besides conjunctions and other function words and expressions, various multi-word expressions containing content words (e.g. iz etogo sleduet ‘it follows that’) also function as cause-effect connectives. We present a method for mining connectives for the Russian language. First, a seed list of 143 multi-word connectives was manually extracted from the Ru-RSTreebank corpus. Two Word2Vec models, trained on a news corpus, were then used to detect new multi-word connectives. Before training the first model, connectives from the seed list were glued together to form multi-word tokens. Before training the second model, all 3-grams in the corpus that match the proposed patterns based on anaphoric expressions were additionally glued in the same way. The method based on the second model gives satisfactory results and, after manual editing, makes it possible to expand the list of connectives for cause-effect discourse relations (286 new connectives).
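
The gluing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `glue_connectives`, the underscore joiner, the greedy longest-match rule, and the two transliterated seed entries are all assumptions made for illustration. The sketch only shows how known multi-word connectives might be merged into single tokens before a corpus is passed to a word-embedding model; the Word2Vec training and the anaphoric 3-gram patterns are not shown.

```python
from typing import Iterable, List, Sequence, Tuple


def glue_connectives(tokens: Sequence[str],
                     connectives: Iterable[Tuple[str, ...]],
                     joiner: str = "_") -> List[str]:
    """Merge each occurrence of a known multi-word connective into one token.

    When several connectives start at the same position, the longest wins.
    """
    # Index connectives by their first word, longest first, for greedy matching.
    by_first = {}
    for conn in connectives:
        by_first.setdefault(conn[0], []).append(tuple(conn))
    for candidates in by_first.values():
        candidates.sort(key=len, reverse=True)

    glued = []
    i = 0
    while i < len(tokens):
        match = None
        for conn in by_first.get(tokens[i], []):
            if tuple(tokens[i:i + len(conn)]) == conn:
                match = conn
                break
        if match:
            # Replace the whole connective with a single glued token.
            glued.append(joiner.join(match))
            i += len(match)
        else:
            glued.append(tokens[i])
            i += 1
    return glued


# Hypothetical seed entries (transliterated); the actual seed list has 143 items.
SEED = [("iz", "etogo", "sleduet"), ("v", "rezultate")]

sentence = "iz etogo sleduet chto plan izmenilsya".split()
print(glue_connectives(sentence, SEED))
# ['iz_etogo_sleduet', 'chto', 'plan', 'izmenilsya']
```

After such preprocessing, each glued connective receives its own vector when an embedding model is trained, so nearest neighbours of seed connectives can be inspected as candidate new connectives.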

External links


PDF at SpringerLink:


Higher School of Economics publications:

Reference link

Pisarevskaya D., Kobozeva M. et al. Automatic Mining of Cause-Effect Discourse Connectives for Russian // International Conference on Digital Transformation and Global Society. – Springer, Cham, 2019. – Pp. 708-718.