topSEARCH: a Comprehensive Tool for the Retrieval and Analysis of Multi-Type Online Resources
Abstract
The internet is filled with diverse types of content, such as videos, news articles, podcasts, and mobile apps, which are spread across various platforms and require significant time and effort to gather and evaluate. We propose a novel methodology for efficiently retrieving, organizing, and storing multiple types of online resources. This methodology is implemented in a tool called topSEARCH, which automates resource gathering using public APIs. To assess the quality and diversity of the retrieved resources, we compare the interest trends obtained from topSEARCH with those from Google Trends. For this comparison, we searched both tools for 10 different queries (e.g., Covid-19 or ChatGPT) whose trends over the last three years are known. Results show a high mean similarity between both tools (cosine similarity: 0.7766, Pearson correlation: 0.5478, Euclidean distance: 16.59), indicating that the proposed methodology can search and combine different types of online resources efficiently and with sufficient quality. In addition, applying filters reduced the average similarity between both tools by up to 15.33%. We publicly release topSEARCH's code to support future research. We also release the database generated with topSEARCH, which contains a total of 27,002 resources for the 10 selected search queries.
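The abstract compares two interest trends using three measures: cosine similarity, Pearson correlation, and Euclidean distance. The following minimal Python sketch shows how these measures could be computed. It assumes that both trends are equal-length series sampled over the same time periods and that topSEARCH's raw resource counts are rescaled to a 0 to 100 range relative to their peak, as Google Trends does. The rescaling step, the function names, and the sample values are illustrative assumptions and are not taken from the topSEARCH code.

```python
import math
from typing import List, Sequence


def normalize_to_peak(counts: Sequence[float]) -> List[float]:
    """Rescale counts to 0..100 relative to the series maximum.

    Assumption: topSEARCH counts are normalized this way before being
    compared with Google Trends, which reports interest on a 0..100 scale.
    """
    peak = max(counts)
    if peak == 0:
        return [0.0 for _ in counts]
    return [100.0 * c / peak for c in counts]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two series (1.0 = same shape and direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")  # undefined for an all-zero series
    return dot / (norm_a * norm_b)


def pearson_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two series (-1.0 to 1.0)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if sd_a == 0 or sd_b == 0:
        return float("nan")  # undefined for a constant series
    return cov / (sd_a * sd_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two series (0.0 = identical; lower is closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    # Hypothetical data: resources found per period vs. Google Trends values.
    topsearch_counts = [3, 10, 25, 40, 22, 8]
    google_trends = [5, 20, 60, 100, 55, 15]

    ts = normalize_to_peak(topsearch_counts)
    print("cosine:", cosine_similarity(ts, google_trends))
    print("pearson:", pearson_correlation(ts, google_trends))
    print("euclidean:", euclidean_distance(ts, google_trends))
```

Note that cosine similarity and Pearson correlation grow as the trends become more alike, whereas Euclidean distance shrinks. This is why a lower Euclidean value indicates closer agreement between the two tools.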