Semi-supervised attribute selection for partially labeled multiset-valued data

Yuanzi He
Jiali He
Haotian Liu
Zhaowen Li

Abstract

In machine learning, an algorithm that must process data in which only part of the samples are labeled is termed a semi-supervised learning algorithm. A dataset with missing attribute values or labels is referred to as an incomplete information system. Handling incomplete information in such a system is a significant challenge, which can be tackled effectively by applying rough set theory (R-theory). However, R-theory has its limits: it does not consider the frequency of an attribute value and therefore cannot fit the distribution of attribute values well. If we consider partially labeled data and replace each missing attribute value with the multiset of all possible values of the same attribute, we obtain partially labeled multiset-valued data. In semi-supervised learning, a large number of redundant features must be deleted to save time and cost. This paper studies semi-supervised attribute selection (ss-attribute selection) for partially labeled multiset-valued data. First, a partially labeled multiset-valued decision information system (p-MSVDIS) is partitioned into two systems: a labeled multiset-valued decision information system (l-MSVDIS) and an unlabeled multiset-valued decision information system (u-MSVDIS). Next, using the indistinguishable relation, the distinguishable relation, and the dependence function, two types of attribute subset importance in a p-MSVDIS are defined. Each is a weighted sum over the l-MSVDIS and the u-MSVDIS, with weights determined by the missing rate of labels, and can be regarded as an uncertainty measurement (UM) of a p-MSVDIS. Then, an adaptive ss-attribute selection algorithm for a p-MSVDIS is introduced; it is based on these degrees of importance and adapts automatically to different missing rates. Finally, experiments and statistical analysis on 10 datasets show that the proposed algorithm has advantages over some other algorithms.
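The data preparation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `MISSING` marker, the function names, and in particular the weights `1 - rate` and `rate` are assumptions made for illustration: the abstract states only that the weights are determined by the missing rate of labels. The two importance scores are taken as given inputs, since their definitions are not part of the abstract.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing attribute value or label


def to_multiset_valued(rows):
    """Turn each attribute value into a multiset (Counter).

    A known value v becomes {v: 1}. A missing value becomes the multiset of
    all observed values of the same attribute, with their frequencies.
    """
    n_attrs = len(rows[0])
    observed = [
        Counter(r[j] for r in rows if r[j] is not MISSING)
        for j in range(n_attrs)
    ]
    return [
        tuple(
            Counter(observed[j]) if r[j] is MISSING else Counter([r[j]])
            for j in range(n_attrs)
        )
        for r in rows
    ]


def split_by_label(rows, labels):
    """Partition the samples into a labeled part and an unlabeled part."""
    labeled = [(r, y) for r, y in zip(rows, labels) if y is not MISSING]
    unlabeled = [r for r, y in zip(rows, labels) if y is MISSING]
    return labeled, unlabeled


def combined_importance(labeled_score, unlabeled_score, labels):
    """Weighted sum of two importance scores.

    Assumption: the labeled part gets weight (1 - rate) and the unlabeled
    part gets weight rate, where rate is the fraction of missing labels.
    """
    rate = sum(y is MISSING for y in labels) / len(labels)
    return (1 - rate) * labeled_score + rate * unlabeled_score
```

With this weighting, a dataset with no missing labels relies entirely on the labeled score, and a fully unlabeled dataset relies entirely on the unlabeled score, which is one simple way to adapt to different missing rates.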

Version published to 10.21203/rs.3.rs-5298390/v1 on Research Square
Nov 29, 2024

Related articles

An incomplete multi-view multi-label learning with Universum

This article has 2 authors:
1. Changming Zhu
2. Lei Wang
This article has no evaluations
Latest version Sep 18, 2025

Multi-view Unsupervised Feature Selection Guided by Latent Representation and Tensor Learning

This article has 3 authors:
1. Jianjun Jiang
2. Xijiong Xie
3. Guoqing Chao
This article has no evaluations
Latest version Aug 20, 2025

Ranking Methods for Skyline Queries

This article has 2 authors:
1. Mickaël Martin Nevot
2. Lotfi Lakhal
This article has no evaluations
Latest version Aug 27, 2025
