A Method for Author Attribution Using Word Embeddings

Authors

Simon C.K., Sochenkov I.V.

Abstract

In this paper we examine a methodology for identifying the author of an unknown document by extracting the author's characteristics from their writing style. The method explores identifying the sources of unknown documents, using a model of distributional semantics to form a set of queries to a search engine. The dataset is that of the PAN @ CLEF 2019 shared task on Cross-domain Authorship Attribution, which covers four languages (English, French, Italian, and Spanish) with 5 problems each, for a total of 20 problems. The task belongs to Natural Language Processing: attributing a text to its author makes it possible to identify an author's work. The approach is based on distributional semantics, which rests on the following hypothesis: linguistic units that occur in similar contexts have similar meanings. In linguistics, the semantic proximity of linguistic elements is therefore computed from their distribution in large text corpora.
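The distributional hypothesis described above can be illustrated with a minimal sketch. It builds count-based co-occurrence vectors from a toy corpus and compares words by cosine similarity. The corpus, the window size of 2, and the names `cooccurrence_vectors` and `cosine` are illustrative assumptions; this is not the word-embedding model used in the paper (word embeddings in practice are usually dense vectors learned from large corpora, e.g. with word2vec), only a demonstration of how shared contexts translate into semantic proximity.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "cat" and "dog" share contexts, "car" does not.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the cat",
    "a car needs fuel",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))  # high: many shared contexts
print(round(cosine(vecs["cat"], vecs["car"]), 3))  # 0.0: no shared contexts
```

Because "cat" and "dog" both appear next to "the", "drinks", and "chases", their vectors point in a similar direction, while "car" shares no context words with "cat" and gets a similarity of zero. An attribution system would apply the same idea at scale, with learned embeddings in place of raw counts.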

External links

DOI: 10.25559/SITITO.15.201903.572-578

Download PDF from the Modern Information Technologies and IT-Education journal website: http://sitito.cs.msu.ru/index.php/SITITO/article/view/562

eLibrary: https://www.elibrary.ru/item.asp?id=43136389

Reference link

Simon C.K., Sochenkov I.V. Method for Author Attribution Using Word Embeddings // Sovremennye informacionnye tehnologii i IT-obrazovanie = Modern Information Technologies and IT-Education. 2019; 15(3):572-578.