Checking the originality of synonymized texts
Abstract
Synonymization is the replacement of words in a text with synonyms (words with a similar meaning but a different spelling). Its main purpose is to alter a text document so that it appears more unique, thereby concealing borrowing. The paper discusses the specifics of checking synonymized texts and explores ways to improve the quality of borrowing detection. To process synonymized texts, it proposes using heavy synonyms (the most frequent, weightiest synonyms). The studies carried out show that the approach is highly efficient compared with existing originality-checking systems. A key feature of the approach is that various information retrieval algorithms can be applied in the subsequent text processing: bag of words, TF*IDF, N-grams, shingles, etc. This makes it possible both to give a statistical assessment of document similarity and to visualize the matches found.
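As a rough illustration of the idea, the sketch below maps every word to the most frequent member of its synonym group and then compares two texts by word shingles. It is only a minimal sketch under stated assumptions: the synonym groups, their frequencies, the shingle length, the Jaccard measure, and all function names are hypothetical choices made for the example, not the dictionary or the selection rule used in the paper.

```python
import re

# Hypothetical toy synonym groups; the frequencies are invented for illustration.
SYNONYM_GROUPS = [
    {"big": 900, "large": 700, "huge": 300},
    {"fast": 800, "quick": 600, "rapid": 200},
    {"car": 950, "automobile": 150},
]


def build_heavy_map(groups):
    """Map each word to the most frequent ("heavy") member of its group."""
    heavy = {}
    for group in groups:
        head = max(group, key=group.get)  # word with the highest frequency
        for word in group:
            heavy[word] = head
    return heavy


HEAVY = build_heavy_map(SYNONYM_GROUPS)


def normalize(text, heavy):
    """Lowercase, tokenize, and replace each token with its heavy synonym."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [heavy.get(token, token) for token in tokens]


def shingles(tokens, n=3):
    """Return the set of overlapping word n-grams (shingles)."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(text_a, text_b, n=3, use_heavy=True):
    """Shingle similarity, optionally after heavy-synonym normalization."""
    mapping = HEAVY if use_heavy else {}
    return jaccard(
        shingles(normalize(text_a, mapping), n),
        shingles(normalize(text_b, mapping), n),
    )


original = "The big car is fast on the road"
rewritten = "The huge automobile is rapid on the road"

print(similarity(original, rewritten, use_heavy=False))  # raw shingles
print(similarity(original, rewritten, use_heavy=True))   # after normalization
```

In this toy case the raw shingle sets share only one shingle, while after normalization both texts reduce to the same token sequence and the similarity becomes 1.0. Because the normalized tokens are ordinary words, the same step could precede a bag-of-words or TF*IDF comparison instead of shingles.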