Feature Centric Schematic Tree Based Big Data Management Using Efficient ETL

Vijayalakshmi M
Minu R I
Pradeep S

Abstract

The problem of managing big data in the cloud has been well studied. Growing data heterogeneity makes it increasingly difficult to keep data in a readable, relational form. Various approaches perform Extraction-Transformation-Learning on big data to obtain information as efficiently as possible. Some methods use basic metadata to maintain the data, extracting it and transforming it into a common set of relational data. However, the transformed data does not support the learning task, even though supporting it should improve the accuracy of knowledge extraction. To address this issue, this article presents a dynamic feature-centric schematic tree-based ETL model (DFST). The approach transforms the given big data into a structured schematic tree to improve the performance of both big data management and knowledge mining. The model extracts features from the input data and matches the data against the various relational schemas available. For the extracted data, it computes a Feature Centric Schematic Relation Score (FCSRS) against each maintained relational schema and identifies the schema that is most similar at the feature level. If the data does not match any existing schema, the model generates a new schema and adds it to the schema set. For the identified or generated schema, the method builds a schematic tree and adds the features under a specific leaf by creating a node; for an existing schema, it likewise adds a node under the available leaves according to the schema of the given data. In the learning stage, the method generates a rule set from the available tree and the values of the features on that tree.
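The abstract does not define how FCSRS is computed or how nodes are laid out in the schematic tree, so the following is only a minimal Python sketch under stated assumptions: Jaccard overlap of feature names stands in for FCSRS, a fixed threshold (0.5, hypothetical) decides when a new schema is generated, and each tree is modeled as a schema root whose leaves are feature names, with observed values attached as child nodes. The names `SchemaTree`, `relation_score`, and `route_record` are illustrative and do not come from the article.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaTree:
    """Hypothetical schematic tree: the root is a schema, its leaves are
    feature names, and each leaf collects value nodes from incoming records."""
    name: str
    features: frozenset[str]
    leaves: dict[str, list[object]] = field(default_factory=dict)

    def add_record(self, record: dict[str, object]) -> None:
        # Create one value node under the leaf of every feature the schema defines.
        for feat in self.features:
            if feat in record:
                self.leaves.setdefault(feat, []).append(record[feat])


def relation_score(record_features: frozenset[str], schema_features: frozenset[str]) -> float:
    """Stand-in for FCSRS (assumption): Jaccard overlap of feature-name sets."""
    union = record_features | schema_features
    return len(record_features & schema_features) / len(union) if union else 0.0


def route_record(record: dict[str, object], schemas: list[SchemaTree],
                 threshold: float = 0.5) -> SchemaTree:
    """Attach a record to its most similar schema, or generate a new schema."""
    feats = frozenset(record)
    best = max(schemas, key=lambda s: relation_score(feats, s.features), default=None)
    if best is None or relation_score(feats, best.features) < threshold:
        # No sufficiently similar schema: extend the schema set with a new one.
        best = SchemaTree(name=f"schema_{len(schemas)}", features=feats)
        schemas.append(best)
    best.add_record(record)
    return best


schemas: list[SchemaTree] = []
route_record({"id": 1, "name": "a", "price": 9.5}, schemas)
route_record({"id": 2, "name": "b", "price": 3.0}, schemas)
route_record({"sensor": "t1", "reading": 21.4}, schemas)
print([s.name for s in schemas])  # ['schema_0', 'schema_1']
```

Replacing `relation_score` with the article's actual FCSRS would leave the routing logic unchanged: pick the best-scoring schema, or create a new one when no schema scores above the threshold.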
The learning process computes feature-centric similarity (FCS) over the various dimensional trees to find the matching tree, and from that tree the method identifies the features used to generate rule sets. The generated rule sets are output as analysis results with higher accuracy.
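The article likewise does not specify the FCS measure or the form of its rules. The sketch below assumes FCS is the fraction of queried features that appear as leaves in a tree, and it generates simple association-style rules of the form "A = a implies B = b" from leaf value lists that are assumed to be index-aligned (one value per record in every leaf). The thresholds, the sample trees, and the function names `feature_similarity`, `select_tree`, and `generate_rules` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def feature_similarity(query_features, tree):
    """Stand-in for FCS (assumption): share of queried features that are leaves of the tree."""
    query = set(query_features)
    if not query:
        return 0.0
    return len(query & set(tree)) / len(query)


def select_tree(query_features, trees):
    """Return the name of the tree with the highest stand-in FCS."""
    return max(trees, key=lambda name: feature_similarity(query_features, trees[name]))


def generate_rules(tree, features, min_support=2, min_confidence=0.8):
    """Emit (A, a, B, b, confidence) rules meaning 'A = a implies B = b'.

    Assumes every leaf holds one value per record, in the same record order.
    """
    rules = []
    for a, b in permutations(features, 2):
        groups = defaultdict(Counter)  # value of A -> counts of co-occurring B values
        for va, vb in zip(tree[a], tree[b]):
            groups[va][vb] += 1
        for va, counts in groups.items():
            vb, hits = counts.most_common(1)[0]
            support = sum(counts.values())
            confidence = hits / support
            if support >= min_support and confidence >= min_confidence:
                rules.append((a, va, b, vb, confidence))
    return rules


# Hypothetical trees: leaf name -> index-aligned list of observed values.
trees = {
    "sales": {
        "region": ["north", "north", "south", "south", "north"],
        "tier": ["gold", "gold", "silver", "silver", "gold"],
    },
    "sensors": {"sensor": ["t1", "t2"], "reading": [21.4, 19.8]},
}
query = ["region", "tier"]
best = select_tree(query, trees)
rules = generate_rules(trees[best], [f for f in query if f in trees[best]])
for rule in rules:
    print(rule)
```

Here `select_tree` picks the `sales` tree because it contains both queried features, and the rule generator then keeps only value pairs that co-occur often enough (`min_support`) and consistently enough (`min_confidence`).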

Version published to 10.21203/rs.3.rs-5149135/v1 on Research Square
Oct 31, 2025

Related articles

A Vector Database Approach for Enhancing Data Warehouse Development Practices

This article has 4 authors:
1. Sherif R Eldemerdash
2. Osama E Emam
3. Manal A Abdelfattah
4. Wael Mohamed abass
This article has no evaluations. Latest version Dec 12, 2025

Advancing Object-Centric Process Mining with Multi-Dimensional Data Operations

This article has 3 authors:
1. Shahrzad Khayatbashi
2. Najmeh Miri
3. Amin Jalali
This article has no evaluations. Latest version Jan 21, 2026

A Systematic Literature Review on the Evolution of Skyline Query on Uncertain Database: Trends and Insights

This article has 3 authors:
1. H. M. Ikram Kays
2. Raini Hassan
3. Dini Oktarina Dwi Handayani
This article has no evaluations. Latest version Dec 23, 2025