Shapley Value-based Data Valuation for Machine Learning Data Markets
Abstract
Data valuation is the process of assigning a monetary value to data based on its estimated usefulness, potential impact, and scarcity. Data has become a valuable resource for organizations, governments, and individuals alike. Consequently, data valuation plays an increasingly important role in managing data assets and supports informed decisions about data acquisition, sharing, analysis, and monetization. In a Machine Learning Data Market (MLDM), a platform where data is exchanged according to its value, data valuation is essential for establishing the economic value of data before it is traded. The Shapley Value has become central to data valuation because it distributes value equitably among contributors. This paper focuses on data valuation in the MLDM setting. Our primary objective is to investigate whether Shapley Value-based data valuation methods can improve performance in an MLDM. We introduce the Gain Data Shapley Value (GDSV) method, present an extensive empirical study of its behavior, and compare GDSV with performance-based data valuation in MLDM under different configurations and learning algorithms. Our findings confirm that accounting for each data set's contribution to performance scores can lead to systematic improvements in learning performance.
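To make the idea of an "equitable value distribution among contributors" concrete, the sketch below computes the standard (exact) Shapley value for a handful of data contributors. Each contributor i receives the weighted average of its marginal gain over every coalition S of the other contributors, φ_i = Σ_{S ⊆ N∖{i}} |S|!(n−|S|−1)!/n! · (v(S∪{i}) − v(S)). This is the generic textbook definition, not the paper's GDSV method, whose exact formulation is not given here. The utility table (hypothetical sellers "A", "B", "C" and their validation-accuracy numbers) is invented purely for illustration; in an MLDM the utility would instead be the performance of a model trained on the coalition's pooled data.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, List


def shapley_values(
    players: List[Hashable],
    utility: Callable[[FrozenSet[Hashable]], float],
) -> Dict[Hashable, float]:
    """Exact Shapley values by enumerating all coalitions (2^(n-1) per player)."""
    n = len(players)
    values: Dict[Hashable, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        # Coalitions of size k are drawn only from the other players.
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical utility: made-up validation accuracy of a model trained on
# each coalition of data sellers (NOT results from the paper).
UTILITY_TABLE = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.60,
    frozenset({"B"}): 0.50,
    frozenset({"C"}): 0.10,
    frozenset({"A", "B"}): 0.80,
    frozenset({"A", "C"}): 0.65,
    frozenset({"B", "C"}): 0.55,
    frozenset({"A", "B", "C"}): 0.85,
}

phi = shapley_values(["A", "B", "C"], lambda s: UTILITY_TABLE[s])
print({k: round(v, 4) for k, v in phi.items()})
```

The values sum to the utility of the full coalition minus that of the empty one (the Shapley efficiency property), which is what makes the split "fair" in a market setting. Exact enumeration grows exponentially with the number of contributors, so practical data-valuation systems typically rely on approximations such as Monte Carlo sampling of permutations.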