The identification of discourse connectives plays an important role in many discourse processing approaches. Connectives include function words that are usually enumerated in grammars (iz-za ‘due to’, blagodarya ‘thanks to’) and non-grammaticalized expressions (X vedet k Y ‘X leads to Y’, prichina etogo ‘the cause is’). Both types signal certain relations between discourse units. However, there are no ready-made lists of connectives of the second type. We propose a method for Russian that expands a seed list of connectives with candidate non-grammaticalized connectives, using their vector representations. First, we compile a list of patterns for this type of connective. The patterns rest on two heuristics: the connectives are often used with anaphoric expressions that substitute for discourse units (so some patterns include special anaphoric elements), and the connectives occur more frequently at the beginning of a sentence or after a comma. Second, we build multi-word tokens based on these patterns. Third, we build vector representations for the multi-word tokens that match the patterns. Our experiments based on distributional semantics yield a fairly reasonable list of candidate connectives.
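The three steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two patterns, the English stand-in corpus, the seed word, and all function names are hypothetical, and simple co-occurrence count vectors with cosine similarity stand in for whatever distributional model the paper actually uses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical patterns (assumptions, not the paper's list): a candidate must
# start the sentence or directly follow a comma, and must contain the
# anaphoric element "this".
PATTERNS = [
    re.compile(r"(?:^|(?<=, ))(this \w+ to)\b"),
    re.compile(r"(?:^|(?<=, ))(the \w+ of this is)\b"),
]

# Toy English stand-in corpus (an assumption; the paper works on Russian).
CORPUS = [
    "Heavy rain fell, therefore the river flooded.",
    "Heavy rain fell, this led to floods.",
    "The cause of this is heavy rain.",
]


def merge_multiword(sentence):
    """Join every pattern match into one underscore-separated token."""
    for pattern in PATTERNS:
        sentence = pattern.sub(lambda m: m.group(1).replace(" ", "_"), sentence)
    return sentence


def context_vectors(sentences, window=3):
    """Count, for each token, the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = re.findall(r"[\w-]+", merge_multiword(sentence.lower()))
        for i, token in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(vectors, seeds):
    """Rank multi-word tokens by their best similarity to any known seed."""
    known_seeds = [s for s in seeds if s in vectors]
    scored = []
    for token in vectors:
        if "_" in token:  # only merged multi-word tokens are candidates
            best = max((cosine(vectors[token], vectors[s]) for s in known_seeds),
                       default=0.0)
            scored.append((token, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for candidate, score in rank_candidates(context_vectors(CORPUS), ["therefore"]):
        print(f"{candidate}\t{score:.3f}")
```

On a real corpus the count vectors would normally be replaced by trained embeddings, and the ranked output would still need manual review, since similarity to a seed connective only marks an expression as a candidate.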
Download PDF at SpringerLink: https://link.springer.com/content/pdf/10.1007%2F978-3-030-01204-5_8.pdf
Presentations at the AINL-2018 Conference website: https://ainlconf.ru/2018/agenda
Semantic Scholar: https://api.semanticscholar.org/CorpusID:125954186
HSE publications: https://publications.hse.ru/en/chapters/226714563
Toldova S., Kobozeva M., and Pisarevskaya D. Automatic mining of discourse connectives for Russian // In Conference on Artificial Intelligence and Natural Language. – 2018. – P. 79–87.