Identifying discourse connectives plays an important role in many approaches to discourse processing. Connectives include function words that are usually enumerated in grammars (iz-za ‘due to’, blagodarya ‘thanks to’) and non-grammaticalized expressions (X vedet k Y ‘X leads to Y’, prichina etogo ‘the cause is’). Both types of connectives signal certain relations between discourse units. However, there are no ready-made lists of connectives of the second type. We propose a method for Russian that expands a seed list of connectives with candidate non-grammaticalized connectives, using their vector representations. First, we compile a list of patterns for this type of connective. These patterns are based on the following heuristics: connectives are often used with anaphoric expressions that substitute for discourse units (thus, some patterns include special anaphoric elements), and connectives occur more frequently at the beginning of a sentence or after a comma. Second, we build multi-word tokens based on these patterns. Third, we build vector representations for the multi-word tokens that match the patterns. Our experiments based on distributional semantics yield a quite reasonable list of candidate connectives.
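The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It uses a toy English corpus instead of Russian data, and simple count-based co-occurrence vectors with cosine similarity in place of whatever embedding model the paper uses. The names `ANAPHORS`, `MAX_LEN`, `merge_candidates`, `build_vectors`, and `expand_seeds` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

ANAPHORS = {"this", "that"}  # hypothetical anaphoric elements closing a pattern
MAX_LEN = 3                  # hypothetical maximum pattern length, in tokens


def merge_candidates(sentence):
    """Merge 'X ... <anaphor>' spans into one token when they start a
    sentence or follow a comma; commas are dropped from the output."""
    tokens = re.findall(r"[\w-]+|,", sentence.lower())
    merged, i, at_anchor = [], 0, True
    while i < len(tokens):
        if tokens[i] == ",":
            at_anchor = True  # the next token is a possible connective start
            i += 1
            continue
        if at_anchor:
            for k in range(2, MAX_LEN + 1):
                span = tokens[i:i + k]
                if len(span) == k and span[-1] in ANAPHORS and "," not in span:
                    merged.append("_".join(span))
                    i += k
                    break
            else:  # no pattern matched at this anchor position
                merged.append(tokens[i])
                i += 1
        else:
            merged.append(tokens[i])
            i += 1
        at_anchor = False
    return merged


def build_vectors(sentences, window=2):
    """Count-based context vectors: co-occurrence counts within a window."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, vectors, top_n=5):
    """Rank merged multi-word tokens by best cosine similarity to any seed."""
    scored = []
    for tok, vec in vectors.items():
        if "_" in tok and tok not in seeds:
            score = max((cosine(vec, vectors[s]) for s in seeds if s in vectors),
                        default=0.0)
            scored.append((tok, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Toy English corpus, invented for illustration only.
CORPUS = [
    "due to this , the train was late",
    "it rained , due to this the train was late",
    "because of this the train was late",
    "it snowed , because of this the flight was late",
    "after that we went home",
    "we ate , after that we went home",
]

tokenised = [merge_candidates(s) for s in CORPUS]
vectors = build_vectors(tokenised)
ranked = expand_seeds({"due_to_this"}, vectors)
```

On this toy data, the candidate that shares contexts with the seed (`because_of_this`) ranks above the one that does not (`after_that`). A real setup would need Russian patterns, a large corpus, and a trained embedding model.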
DOI: http://dx.doi.org/10.1007/978-3-030-01204-5_8
Download PDF on SpringerLink (in English): https://link.springer.com/content/pdf/10.1007%2F978-3-030-01204-5_8.pdf
Presentations on the AINL-2018 conference website (in English): https://ainlconf.ru/2018/agenda
Russian Science Citation Index (eLIBRARY.RU): https://www.elibrary.ru/item.asp?id=38642409
HSE Publications: https://publications.hse.ru/chapters/226714563
Toldova S., Kobozeva M., and Pisarevskaya D. Automatic mining of discourse connectives for Russian // In Conference on Artificial Intelligence and Natural Language. – 2018. – P. 79–87.