The TXM platform offers a wide range of corpus analysis capabilities, including correspondence analysis, clustering, lexical table construction, and parameterized subcorpus selection. The default structural unit of analysis in TXM is the token. However, each token can be annotated with a number of features, enabling more sophisticated yet flexible corpus analysis. The only extension available by default is TreeTagger, which adds automated morphological analysis of tokens to the TXM platform. In this work we present a number of tools for more extensive and complex corpus analysis, relying both on our previously developed tools and on publicly available ones.
DOI: https://doi.org/10.18127/j20729472-201803-13
Article (PDF) in the Highly Available Systems journal (in Russian): https://npo-echelon.ru/doc/Aktualnie_voprosi_2019.pdf
ResearchGate: https://www.researchgate.net/publication/327903105_Sozdanie_specialnyh_korpusov_tekstov_na_osnove_rassirennoj_platformy_TXM
Semantic Scholar: https://api.semanticscholar.org/CorpusID:187957131
Lavrentev A. M., Smirnov I. V., Suvorova M. I., Solovyov F. N., Fokina A. I., Chepovsky A. M. Creating text corpora for special purposes on the basis of extended TXM platform // Highly Available Systems. 2018. Vol. 14. No. 3. P. 76-81.