Development of Methods for Extracting Information from Pharmacy Line Using Conditional Random Fields

Authors

Molodchenkov A. I., Nikolaev A. A.

Abstract

The paper addresses the problem of extracting information from short pharmacological text lines in Russian. Pharmacy lines serve as the example: from each line, the full drug name, manufacturer, dosage form, dosage, number of units per package, and some other parameters must be extracted. A conditional random field (CRF) algorithm was used to extract this information. A method for preliminary standardization of the lines was also developed to bring their tokens to a uniform form. More than seven thousand pharmacy lines were labeled for the experiments, and two CRF models were trained: one with and one without preliminary standardization of the lines. The model with standardization reached an accuracy of 0.95 on the validation set and 0.89 on the test set. The model without standardization reached an accuracy of 0.95 on the validation set and 0.87 on the test set.
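To make the pipeline described in the abstract more concrete, here is a minimal sketch of the two steps before CRF training: standardizing a pharmacy line into uniform tokens, and turning each token into a feature dictionary. The abstract does not specify the standardization rules or the feature set, so everything in the sketch is an assumption for illustration only: the abbreviation map, the function names `standardize_line`, `token_features`, and `line_to_features`, the chosen features, and the sample line.

```python
import re

# Hypothetical abbreviation map; the paper's actual rules are not given here.
ABBREVIATIONS = {
    "таб": "таблетки",
    "табл": "таблетки",
    "капс": "капсулы",
    "р-р": "раствор",
}


def standardize_line(line):
    """Lowercase the line, split numbers from units, expand abbreviations."""
    text = line.lower().replace("№", " № ")
    # Put a space between a digit and a following letter: "500мг" -> "500 мг".
    text = re.sub(r"(\d)([^\W\d_])", r"\1 \2", text)
    tokens = []
    for raw in text.split():
        token = raw.strip(".,;")  # drop punctuation at the token edges
        if token:
            tokens.append(ABBREVIATIONS.get(token, token))
    return tokens


def token_features(tokens, i):
    """Build an assumed, illustrative feature dictionary for token i."""
    token = tokens[i]
    features = {
        "bias": 1.0,
        "token": token,
        "is_digit": token.isdigit(),
        "suffix3": token[-3:],
        "BOS": i == 0,                 # first token of the line
        "EOS": i == len(tokens) - 1,   # last token of the line
    }
    if i > 0:
        features["prev_token"] = tokens[i - 1]
    if i < len(tokens) - 1:
        features["next_token"] = tokens[i + 1]
    return features


def line_to_features(line):
    """Standardize a line and return one feature dictionary per token."""
    tokens = standardize_line(line)
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sample = "Аспирин таб. 500мг №20 Байер"  # made-up example line
    print(standardize_line(sample))
```

Feature dictionaries of this shape, paired with one label per token (for example drug name, dosage form, dosage, package size, manufacturer), are the kind of input that a CRF toolkit accepting per-token feature dictionaries, such as sklearn-crfsuite, can be trained on. Skipping `standardize_line` and splitting on whitespace instead would correspond to the model variant without standardization.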

External links

Download PDF from the CEUR Workshop Proceedings website (in English): http://ceur-ws.org/Vol-3036/paper27.pdf

Scopus: https://www.scopus.com/record/display.uri?origin=inward&eid=2-s2.0-85121247297&featureToggles=FEATURE_NEW_DOC_DETAILS_EXPORT:1

ResearchGate (in English): https://www.researchgate.net/publication/357963846_Development_of_Methods_for_Extracting_Information_from_Pharmacy_Line_Using_Conditional_Random_Fields

How to cite

Alexey I. Molodchenkov, Artem A. Nikolaev, Evgenia A. Mitrokhina. Development of Methods for Extracting Information from Pharmacy Line Using Conditional Random Fields // CEUR Workshop Proceedings. 2021. Vol. 3036. pp. 340-348