Towards understanding news plagiarism: theoretical and experimental analysis

Ruxandra Marinescu-Ghemeci
Adrian Miclăuș
Ionuț Murarețu
Alexandru Popa

Abstract

In this paper we aim to improve the understanding of news manipulation using a mathematical formalism. First, we develop software that crawls various news domains, collects the news, and carries out similarity search between the collected news items. Then, we create a mathematical model that uses temporal graphs based on the collected data. We also design a random data generator for this model, which we validate using four statistical tests: the Kolmogorov-Smirnov test, Fisher's exact test, the Mann-Whitney U test, and a permutation test. We ran each of the four statistical tests on 100 randomly generated instances, counting how often the p-value was below 0.05 and how often it was not. On average, at least 75% of the instances yielded a p-value greater than or equal to 0.05. The main result of the paper is that we formulate several combinatorial optimization problems and explain their relation to news plagiarism. We analyze each of these problems theoretically and prove NP-hardness and hardness-of-approximation results. These results show that, unless P=NP, polynomial-time exact algorithms for these problems do not exist. Finally, given the NP-hardness results, we formulate our problems as integer programs and use the state-of-the-art solver Gurobi to solve them on both collected data and randomly generated data. Across multiple tests, we observe that a budget of around 40-60% of the total cost achieves an influence of at least 75% of the maximum value.
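The validation loop described in the abstract (run a test on many generated instances and count how often the p-value stays at or above 0.05) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which quantities are compared, so the sketch assumes each instance is a list of numbers compared against a reference sample, uses the absolute difference of means as the test statistic, and covers only the permutation test. The names `permutation_test`, `count_non_rejections`, `reference`, and `generate` are hypothetical.

```python
import random


def permutation_test(xs, ys, n_resamples=2000, rng=None):
    """Two-sided permutation test on the absolute difference of means.

    Assumption: the statistic and the list-of-numbers input format are
    chosen for illustration; the paper may use a different setup.
    """
    rng = rng or random.Random()
    observed = abs(sum(xs) / len(xs) - sum(ys) / len(ys))
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_resamples):
        # Randomly reassign the pooled values to the two groups.
        rng.shuffle(pooled)
        left, right = pooled[:len(xs)], pooled[len(xs):]
        stat = abs(sum(left) / len(left) - sum(right) / len(right))
        if stat >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


def count_non_rejections(reference, generate, n_instances=100,
                         alpha=0.05, n_resamples=500, seed=0):
    """Count generated instances whose p-value is at least alpha.

    `generate` is a hypothetical callable that takes a random.Random
    and returns one generated sample.
    """
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_instances):
        sample = generate(rng)
        p_value = permutation_test(reference, sample, n_resamples, rng)
        if p_value >= alpha:
            kept += 1
    return kept


if __name__ == "__main__":
    base = random.Random(1)
    reference = [base.gauss(0.0, 1.0) for _ in range(50)]
    kept = count_non_rejections(
        reference, lambda r: [r.gauss(0.0, 1.0) for _ in range(50)]
    )
    print(f"p >= 0.05 on {kept} of 100 instances")
```

A high count of non-rejections means the test rarely finds evidence that the generated samples differ from the reference, which is the sense in which a generator would be considered validated here. The other three tests could be plugged into the same loop in place of `permutation_test`.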

Version published to 10.21203/rs.3.rs-7069020/v1 on Research Square
Aug 13, 2025

Related articles

Algorithms and Authors: How Generative AI is Transforming News Production
Authors: Alexander John Wasdahl, Ramesh Srinivasan
Latest version: Aug 7, 2025

Issue Detection and Future Proofing Dutch Government Apps Using Language Technologies
Authors: Anca-Mihaela Matei, Flor Miriam Plaza-del-Arco, Natalia Amat-Lefort
Latest version: Aug 21, 2025

Survey on Information Requirements on the Google Books Ngram Corpus
Authors: Fabian Richter, Federico Matteucci, Peter Reimann, Klemens Böhm
Latest version: Sep 4, 2025
