A Data-Driven Approach to Supporting Fact-Checking and Mitigating Mis/Disinformation Through Domain Quality Evaluation

Abstract

Misinformation and disinformation spread rapidly on social media, threatening public discourse, democratic processes, and social cohesion. One promising strategy to address these challenges is to evaluate the trustworthiness of entire domains (source websites) as a proxy for content credibility. This approach demands methods that are both scalable and data-driven. However, current solutions like NewsGuard and MBFC rely on expert assessments, cover only a limited number of domains, and often require paid subscriptions. These constraints limit their usefulness for large-scale research.

This study introduces a machine-learning-based system designed to assess the quality and trustworthiness of websites. We propose a data-driven approach that leverages a large dataset of expert-rated domains to predict credibility scores for previously unseen domains using domain-level features. Our supervised regression model achieves moderate performance, with a mean absolute error of 0.12. Using feature importance analysis, we found that PageRank-based features provided the greatest reduction in prediction error, confirming that link-based indicators play a central role in domain trustworthiness. This highlights the importance of highly referenced domains in reliable news dissemination. This approach can also help fact-checkers and social media platforms refine their credibility assessment strategies.

The solution’s scalable design accommodates the continuously evolving nature of online content, ensuring that evaluations remain timely and relevant. The framework enables continuous assessment of thousands of domains with minimal manual effort. This capability allows stakeholders (social media platforms, media monitoring organizations, content moderators, and researchers) to allocate resources more efficiently, prioritize verification efforts, and reduce exposure to questionable sources. Ultimately, this facilitates a more proactive and effective response to misinformation while also supporting robust public discourse and informed decision-making.
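
The abstract reports a mean absolute error and a feature importance analysis without naming the exact procedure. As a rough illustration only, the sketch below computes a mean absolute error and a permutation-style importance (the increase in error when one feature column is shuffled). Permutation importance is an assumption here, not necessarily the method the authors used. The predictor `toy_predict`, the two features (`pagerank`, `age`), and the synthetic data are all hypothetical stand-ins.

```python
import random


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between true and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, n_features, seed=0):
    """Return (baseline MAE, per-feature MAE increase after shuffling)."""
    rng = random.Random(seed)
    baseline = mean_absolute_error(y_true, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        # Shuffle only column j, leaving every other feature intact.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled_rows = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        shuffled_mae = mean_absolute_error(
            y_true, [predict(r) for r in shuffled_rows]
        )
        importances.append(shuffled_mae - baseline)
    return baseline, importances


def toy_predict(row):
    """Hypothetical stand-in for a trained regressor (not the paper's model)."""
    pagerank, age = row
    return 0.9 * pagerank + 0.1 * age


# Synthetic data: scores depend mostly on the hypothetical pagerank feature.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
scores = [0.9 * pagerank + 0.1 * age for pagerank, age in rows]

baseline_mae, importances = permutation_importance(toy_predict, rows, scores, 2)
print("baseline MAE:", baseline_mae)
print("MAE increase per feature (pagerank, age):", importances)
```

In this toy setup, shuffling the feature that drives the score raises the error far more than shuffling the minor one, which is the kind of signal a feature importance analysis looks for.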
