The Hybrid Method for Accurate Patent Classification

Authors

Sochenkov I. V., Yadrintsev V. V.

Abstract

This article presents a stacking of two approaches to patent classification. The first is a linguistically supported k-nearest neighbors algorithm that searches for topically similar documents by comparing vectors of lexical descriptors. The second is fastText, a word-embedding-based method in which the sentence (or document) vector is obtained by averaging n-gram embeddings, and a multinomial logistic regression then uses these vectors as features. On Russian and English datasets, the stacking classifier achieves better results than either single classifier.
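The two base classifiers and the stacking input described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: plain whitespace tokens stand in for the lexical descriptors, the n-gram embeddings and regression weights are assumed to be given, and all function names (`knn_scores`, `average_vectors`, `softmax_scores`, `meta_features`) are hypothetical. The meta-classifier that would be trained on the concatenated scores is not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_scores(query, train, classes, k=3):
    """Base classifier 1: similarity-weighted vote of the k nearest documents.

    Assumption: whitespace tokens replace the paper's lexical descriptors.
    """
    q = Counter(query.lower().split())
    ranked = sorted(
        ((cosine(q, Counter(text.lower().split())), label) for text, label in train),
        reverse=True,
    )[:k]
    votes = {c: 0.0 for c in classes}
    for sim, label in ranked:
        votes[label] += sim
    total = sum(votes.values())
    # Fall back to a uniform distribution when no neighbor is similar at all.
    return [votes[c] / total if total else 1.0 / len(classes) for c in classes]

def average_vectors(vectors):
    """Document vector as the element-wise mean of its n-gram embeddings.

    Assumes a non-empty list of equal-length vectors.
    """
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def softmax_scores(doc_vector, weights):
    """Base classifier 2: multinomial logistic regression on the averaged vector.

    `weights` holds one row per class; training these weights is not shown.
    """
    logits = [sum(w * x for w, x in zip(row, doc_vector)) for row in weights]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def meta_features(knn_probs, ft_probs):
    """Stacking input: concatenate the class scores of both base classifiers."""
    return knn_probs + ft_probs
```

In a stacking setup, vectors produced by `meta_features` for held-out documents would serve as training input for a second-level classifier, which then makes the final class decision.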

External links

DOI: http://dx.doi.org/10.1134/S1995080219110325

PDF on SpringerLink (in English): https://link.springer.com/content/pdf/10.1134/S1995080219110325.pdf

RUDN University Repository: https://repository.rudn.ru/ru/records/article/record/54931/

Citation

Yadrintsev V. V., Sochenkov I. V. The Hybrid Method for Accurate Patent Classification // Lobachevskii Journal of Mathematics, 2019, Volume 40, Issue 11, pp. 1873–1880.