The Hybrid Method for Accurate Patent Classification

Authors

Sochenkov I. V., Yadrintsev V. V.

Abstract

This article presents a stacking of two approaches to patent classification. The first is a linguistically supported k-nearest neighbors algorithm that searches for topically similar documents by comparing vectors of lexical descriptors. The second is fastText, a word-embedding-based method in which the sentence (or document) vector is obtained by averaging n-gram embeddings, and a multinomial logistic regression then uses these vectors as features. We show on Russian and English datasets that the stacking classifier outperforms each of the individual classifiers.
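The stacking idea described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation. The toy documents, the class labels, and every function name are hypothetical. The first base model is a similarity-weighted k-NN vote over plain word counts (a stand-in for lexical descriptors), and the second compares character trigram profiles (a much simplified stand-in for fastText, with no learned embeddings). A small softmax regression, trained on the base models' scores for held-out documents, serves as the meta-classifier.

```python
import math
from collections import Counter, defaultdict

LABELS = ["chem", "mech"]  # hypothetical patent classes

def words(text):
    return Counter(text.lower().split())

def trigrams(text):
    s = f" {text.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(a, b):
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def normalize(scores):
    total = sum(scores)
    return [s / total for s in scores] if total else [1 / len(scores)] * len(scores)

def knn_scores(text, train, k=3):
    """Base model 1: similarity-weighted vote of the k most similar documents."""
    q = words(text)
    nearest = sorted(((cosine(q, words(t)), y) for t, y in train), reverse=True)[:k]
    votes = defaultdict(float)
    for sim, y in nearest:
        votes[y] += sim
    return normalize([votes[c] for c in LABELS])

def ngram_scores(text, train):
    """Base model 2 (simplified stand-in): similarity to per-class trigram profiles."""
    q = trigrams(text)
    profiles = {c: Counter() for c in LABELS}
    for t, y in train:
        profiles[y].update(trigrams(t))
    return normalize([cosine(q, profiles[c]) for c in LABELS])

def meta_features(text, train):
    # Concatenate both base models' class scores and append a bias term.
    return knn_scores(text, train) + ngram_scores(text, train) + [1.0]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def fit_meta(X, y, epochs=200, lr=0.5):
    """Meta-classifier: multinomial logistic regression fitted by plain SGD."""
    W = [[0.0] * len(X[0]) for _ in LABELS]
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = softmax([sum(w * v for w, v in zip(row, x)) for row in W])
            for c, row in enumerate(W):
                g = p[c] - (1.0 if LABELS[c] == label else 0.0)
                for j in range(len(row)):
                    row[j] -= lr * g * x[j]
    return W

def predict_proba(text, train, W):
    x = meta_features(text, train)
    return softmax([sum(w * v for w, v in zip(row, x)) for row in W])

# Toy data (hypothetical): base models use `train`, the meta-model uses `holdout`.
train = [
    ("polymer catalyst reaction solvent", "chem"),
    ("acid compound synthesis reaction", "chem"),
    ("gear shaft bearing assembly", "mech"),
    ("piston engine shaft valve", "mech"),
]
holdout = [
    ("catalyst compound solvent mixture", "chem"),
    ("bearing valve gear housing", "mech"),
]

W = fit_meta([meta_features(t, train) for t, _ in holdout], [y for _, y in holdout])
proba = predict_proba("synthesis of a polymer compound", train, W)
prediction = LABELS[proba.index(max(proba))]
print(prediction, [round(p, 3) for p in proba])
```

Training the meta-classifier on documents that the base models have not seen keeps it from simply trusting whichever base model memorized the training set. A realistic setup would use cross-validated predictions and far larger corpora.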

External links

DOI: http://dx.doi.org/10.1134/S1995080219110325

PDF on SpringerLink (in English): https://link.springer.com/content/pdf/10.1134/S1995080219110325.pdf

RUDN University repository: https://repository.rudn.ru/ru/records/article/record/54931/

Citation

Yadrintsev V. V., Sochenkov I. V. The Hybrid Method for Accurate Patent Classification // Lobachevskii Journal of Mathematics, 2019, Volume 40, Issue 11, pp. 1873–1880.