Feature Centric Schematic Tree Based Big Data Management Using Efficient ETL

Abstract

Managing big data in the cloud is a well-studied problem. Growing data heterogeneity makes it harder to keep the data in a readable, relational form. Existing approaches perform Extraction-Transformation-Learning (ETL) on big data to obtain information. Some of them rely on basic metadata to maintain the data: they extract the data and transform it into a common set of relational data. However, the transformed data does not support the learning problem well, which limits the accuracy of knowledge extraction. To address this issue, this article presents a dynamic feature-centric schematic tree based ETL model (DFST). The approach transforms big data into a structured schematic tree to support higher performance in both big data management and knowledge mining. The proposed model extracts features from the given data and identifies its schema by comparing it with the available relational schemas. For this purpose, a Feature Centric Schematic Relation Score (FCSRS) is computed between the extracted data and each maintained relational schema, and the schema most similar at the feature level is selected. If the data matches none of the existing schemas, the method generates a new schema and adds it to the schema set. For the schema identified or generated, the method generates a schematic tree and adds the features under a specific leaf by creating a node. Similarly, for an existing schema, it adds a node under the available leaves according to the schema of the given data. At the learning stage, the method generates a rule set from the available tree and the values of the features on it. The learning process computes feature centric similarity (FCS) across the various dimensional trees to find the relevant tree and, based on that tree, identifies the features used to generate rule sets. The generated rule sets are returned as results for analysis with higher accuracy.
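The schema-matching and tree-update steps described above can be illustrated with a minimal sketch. The abstract does not define how FCSRS is computed, so the code below substitutes a Jaccard overlap between feature names as a stand-in. The names `relation_score`, `SchemaTree`, `SchemaStore` and the `threshold` parameter are hypothetical and are not taken from the article.

```python
from dataclasses import dataclass, field


def relation_score(features, schema):
    """Stand-in for FCSRS (assumption): Jaccard overlap of feature names."""
    a, b = set(features), set(schema)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class SchemaTree:
    """Hypothetical schematic tree: one leaf per feature, one node per value."""
    schema: frozenset
    leaves: dict = field(default_factory=dict)

    def add_record(self, record):
        # Add each feature value as a new node under the leaf for that feature.
        for name, value in record.items():
            self.leaves.setdefault(name, []).append(value)


class SchemaStore:
    """Hypothetical schema set holding one tree per known schema."""

    def __init__(self, threshold=0.5):
        # Minimum score for a record to count as matching a schema (assumed).
        self.threshold = threshold
        self.trees = []

    def ingest(self, record):
        features = frozenset(record)
        # Pick the most similar existing schema at the feature level.
        best = max(
            self.trees,
            key=lambda t: relation_score(features, t.schema),
            default=None,
        )
        # No sufficiently similar schema: generate a new one and register it.
        if best is None or relation_score(features, best.schema) < self.threshold:
            best = SchemaTree(schema=features)
            self.trees.append(best)
        best.add_record(record)
        return best


store = SchemaStore(threshold=0.5)
people = store.ingest({"id": 1, "name": "a", "age": 3})
same = store.ingest({"id": 2, "name": "b"})      # overlaps the first schema
places = store.ingest({"lat": 1.0, "lon": 2.0})  # no overlap, new schema
```

In this sketch the second record joins the first tree because two of its three combined feature names overlap, while the third record shares no features and triggers a new schema. A real FCSRS would presumably also weigh feature values or types, which the abstract does not specify.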