Automatic mining of discourse connectives for Russian

Authors

Kobozeva M.

Abstract

The identification of discourse connectives plays an important role in many approaches to discourse processing. Connectives include function words that are usually enumerated in grammars (iz-za ‘due to’, blagodarya ‘thanks to’) as well as non-grammaticalized expressions (X vedet k Y ‘X leads to Y’, prichina etogo ‘the cause is’). Both types signal certain relations between discourse units. However, there are no ready-made lists of connectives of the second type. We propose a method that uses vector representations to expand a seed list of connectives with candidate non-grammaticalized connectives for Russian. First, we compile a list of patterns for this type of connective. The patterns rest on two heuristics: connectives are often used with anaphoric expressions that stand in for discourse units (so some patterns include special anaphoric elements), and connectives occur more frequently at the beginning of a sentence or after a comma. Second, we build multi-word tokens from the text spans that match these patterns. Third, we build vector representations for these multi-word tokens. Our experiments based on distributional semantics yield a fairly reasonable list of candidate connectives.
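The three steps in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the single regular-expression pattern, the transliterated anaphoric forms, the function names, and the toy sentences are all assumptions made for illustration, and plain co-occurrence counts stand in for whatever distributional model the paper actually uses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical pattern: one word plus an anaphoric element (transliterated),
# located at the sentence beginning or right after a comma.
ANAPHORIC = r"(?:etogo|etomu|eto)"
PATTERN = re.compile(rf"(^|, )(\w+ {ANAPHORIC})\b")


def merge_multiword(sentence):
    """Join each pattern match into one underscore-linked multi-word token."""
    return PATTERN.sub(
        lambda m: m.group(1) + m.group(2).replace(" ", "_"), sentence
    )


def context_vectors(sentences, window=2):
    """Build sparse co-occurrence vectors (a stand-in for trained embeddings)."""
    vectors = defaultdict(Counter)
    for sent in sentences:
        tokens = merge_multiword(sent.lower()).replace(",", " ").split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(vectors, seeds):
    """Rank multi-word tokens by similarity to the centroid of the seed list."""
    centroid = Counter()
    for seed in seeds:
        centroid.update(vectors.get(seed, {}))
    scored = [
        (tok, cosine(vec, centroid))
        for tok, vec in vectors.items()
        if "_" in tok and tok not in seeds
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy corpus (invented): a seed connective and a candidate in the same context.
corpus = [
    "dozhd shel, poetomu doroga mokraya",
    "dozhd shel, posle etogo doroga mokraya",
]
ranked = rank_candidates(context_vectors(corpus), seeds={"poetomu"})
```

In this toy setup the candidate posle_etogo shares its whole context with the seed poetomu, so it ranks first. A realistic run would need a large corpus, a richer pattern list, and a trained embedding model in place of raw counts.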

External links

DOI: http://dx.doi.org/10.1007/978-3-030-01204-5_8

Download PDF at SpringerLink: https://link.springer.com/content/pdf/10.1007%2F978-3-030-01204-5_8.pdf

Presentations at the AINL-2018 Conference website: https://ainlconf.ru/2018/agenda

Semantic Scholar: https://api.semanticscholar.org/CorpusID:125954186

HSE publications: https://publications.hse.ru/en/chapters/226714563
