Automatic mining of discourse connectives for Russian

Authors

Кобозева М. В.

Abstract

The identification of discourse connectives plays an important role in many discourse processing approaches. Connectives include both function words usually enumerated in grammars (iz-za ‘due to’, blagodarya ‘thanks to’) and non-grammaticalized expressions (X vedet k Y ‘X leads to Y’, prichina etogo ‘the cause is’). Both types of connectives signal certain relations between discourse units. However, there are no ready-made lists of connectives of the second type. We propose a method for expanding a seed list of connectives with candidates for non-grammaticalized connectives in Russian, using their vector representations. First, we compile a list of patterns for this type of connective. These patterns rely on two heuristics: such connectives are often used with anaphoric expressions that substitute for discourse units (thus, some patterns include special anaphoric elements), and they occur more frequently at the beginning of a sentence or after a comma. Second, we build multi-word tokens based on these patterns. Third, we build vector representations for the multi-word tokens that match these patterns. Our experiments based on distributional semantics yield a reasonable list of candidate connectives.
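The three steps described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the author's implementation: the pattern list, the function names (`merge_multiword`, `context_vectors`, `expand_seeds`), the toy transliterated corpus, and the use of plain co-occurrence counts with cosine similarity in place of whatever distributional model the paper actually uses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical patterns (not taken from the paper): candidate connectives,
# some containing an anaphoric element such as "etogo" or "eto".
PATTERNS = [r"prichina etogo", r"eto vedet k", r"iz-za etogo"]

# Heuristic from the abstract: match only at sentence start or after a comma.
PATTERN_RE = re.compile(r"(^|, )(" + "|".join(PATTERNS) + r")\b")


def merge_multiword(sentence):
    """Join each pattern match into one token, e.g. 'prichina_etogo'."""
    return PATTERN_RE.sub(
        lambda m: m.group(1) + m.group(2).replace(" ", "_"), sentence
    )


def context_vectors(sentences, window=2):
    """Build simple co-occurrence count vectors (an assumed stand-in model)."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = merge_multiword(sentence.lower()).replace(",", " ").split()
        for i, tok in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[tok][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def expand_seeds(vectors, seeds, top_n=3):
    """Rank multi-word tokens by their best similarity to any seed."""
    scores = {}
    for cand, vec in vectors.items():
        if "_" not in cand or cand in seeds:
            continue  # keep only merged multi-word tokens that are not seeds
        scores[cand] = max(
            (cosine(vec, vectors[s]) for s in seeds if s in vectors),
            default=0.0,
        )
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy, made-up corpus in transliteration, for illustration only.
corpus = [
    "shel dozhd, iz-za etogo my ostalis doma",
    "shel dozhd, prichina etogo my ostalis doma",
    "eto vedet k probkam",
]
candidates = expand_seeds(context_vectors(corpus), {"iz-za_etogo"})
```

In this sketch, a candidate is merged into a multi-word token only when it occurs in one of the two positions named by the heuristics, so the same word sequence in mid-clause is left untouched.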

External links

DOI: http://dx.doi.org/10.1007/978-3-030-01204-5_8

Download PDF on SpringerLink (in English): https://link.springer.com/content/pdf/10.1007%2F978-3-030-01204-5_8.pdf

Presentation on the AINL-2018 conference website (in English): https://ainlconf.ru/2018/agenda

RSCI (РИНЦ): https://www.elibrary.ru/item.asp?id=38642409

HSE publications: https://publications.hse.ru/chapters/226714563
