Methods for cross-lingual retrieval of similar documents in legal domain based on machine learnings

Authors

Devyatkin D., Sochenkov I., Zubarev D., Zhebel V.

Abstract

Studying international experience in order to improve legislation requires information retrieval systems that perform well in the multilingual legal domain. One possible solution is the retrieval of thematically similar documents. This, however, requires transfer between languages, so that a user can submit a document in one language and receive search results in another. The paper describes different approaches to this problem, from classical mediator-based methods to modern distributional semantics techniques. The UN Digital Library was used as the test collection. The combination of an extended translation model and the BM25 ranking function demonstrated the best results.
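The abstract reports that a translation model combined with BM25 ranking performed best but gives no details of the extended model. The following is only a minimal sketch of the general idea, under stated assumptions: source-language query terms are mapped into the target language through a probabilistic term translation table (here a hypothetical, hand-written `TRANSLATION_TABLE` with invented probabilities), and the resulting weighted terms are scored against target-language documents with Okapi BM25. The parameter values k1 = 1.2 and b = 0.75 and the smoothed IDF variant are common defaults, not values taken from the paper. This is not the authors' extended translation model.

```python
import math
from collections import Counter

# Hypothetical toy translation table (Spanish -> English):
# source term -> {target term: translation probability}.
# A real system would learn such a table from parallel corpora;
# these entries and probabilities are invented for illustration.
TRANSLATION_TABLE = {
    "ley": {"law": 0.7, "act": 0.3},
    "derechos": {"rights": 0.8, "law": 0.2},
    "humanos": {"human": 0.9, "humanitarian": 0.1},
}


def translate_query(terms, table):
    """Map source-language terms to target-language terms, summing
    translation probabilities into per-term query weights."""
    weights = Counter()
    for term in terms:
        for target, prob in table.get(term, {}).items():
            weights[target] += prob
    return dict(weights)


def bm25_scores(query_weights, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a weighted query with Okapi BM25.
    Each term's BM25 contribution is multiplied by its translation weight."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per doc
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term, w in query_weights.items():
            if tf[term] == 0:
                continue
            # Smoothed, non-negative IDF variant
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += w * idf * norm
        scores.append(score)
    return scores


# Usage: a Spanish query against a tiny English "collection".
docs = [
    "universal declaration of human rights".split(),
    "the law on public procurement".split(),
    "report on climate change".split(),
]
query = translate_query(["derechos", "humanos"], TRANSLATION_TABLE)
scores = bm25_scores(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # → [0, 1, 2]
```

Weighting each BM25 term contribution by its translation probability lets ambiguous translations ("derechos" → "law") contribute only weakly, so the human-rights document outranks the procurement law, and the document with no translated terms scores zero.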

External links

DOI: 10.14357/20718594220203

Citation

Vladimir Zhebel, Dmitry Devyatkin, Denis Zubarev, Ilya Sochenkov. Methods for cross-lingual retrieval of similar documents in legal domain based on machine learnings // Artificial Intelligence and Decision Making. 2022. No. 2. pp. 27-35. DOI 10.14357/20718594220203.