Scientists at NUST MISIS have developed a mechanism for fast semantic search in specialized databases, the university's press service reported. The group was working on the problem of reliably finding long documents that are similar in meaning.
According to the press service, the development, carried out under a grant from the Russian Science Foundation, can help improve the quality of information retrieval and data analysis in specialized search engines, that is, the systems scientific and industrial organizations use to search reports, patents, and scientific publications.
The mechanism is based on text segmentation. It handles the case where a large, complex document covers several topics at once, which greatly complicates automatic search under the usual approach. Once the text has been split into thematically homogeneous segments, the algorithm performs the search more efficiently.
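The idea of segmenting first and searching second can be illustrated with a minimal Python sketch. It is not the researchers' method: as a simple stand-in it measures raw vocabulary overlap between neighbouring sentences rather than using a topic model. The function names, the tiny stop-word list, the sample sentences, and the 0.1 threshold are all assumptions made for illustration, and the tokenizer only handles English letters.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "in", "at", "of", "was"}


def bag_of_words(text):
    """Count lowercased words in a text, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, threshold=0.1):
    """Start a new segment where adjacent sentences share too little vocabulary."""
    if not sentences:
        return []
    groups = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(bag_of_words(prev), bag_of_words(cur)) < threshold:
            groups.append([cur])      # similarity dip: treat as a topic boundary
        else:
            groups[-1].append(cur)    # same topic: extend the current segment
    return [" ".join(g) for g in groups]


def search(segments, query):
    """Return (score, segment) pairs ranked by similarity to the query."""
    q = bag_of_words(query)
    scored = [(cosine(bag_of_words(s), q), s) for s in segments]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


sentences = [
    "The alloy was heated in the furnace.",
    "The furnace kept the alloy at a constant temperature.",
    "The patent office received the application.",
    "The application cites an earlier patent.",
]
segments = segment(sentences)
best_score, best_segment = search(segments, "patent application")[0]
print(len(segments), "segments; best match:", best_segment)
```

Because the query is scored against each segment rather than against the whole document, the metallurgy sentences do not dilute the match for the patent passage, which is the effect the segmentation step is meant to achieve.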
“As part of the study, we used a method based on the additive regularization of topic models (ARTM) approach and the TopicTiling algorithm. In our experiments, we were able to improve the accuracy of highly specialized search across scientific publications from 55% to almost 82%,” said Nikita Nikitinsky, a researcher at the NUST MISIS Big Data Research Center, commenting on the chosen approach.
The development has already been put to use in the Russian project to create the Register of Mandatory Requirements.