A Vector Database Approach for Enhancing Data Warehouse Development Practices
Abstract
Data warehouses (DWH) are important to companies of every kind, and online analytical processing (OLAP) is one of their crucial components. Business data is the input from which information essential to a business's sustainability is derived. It comes from a variety of sources and takes many forms, from traditional structured data to unstructured data. A vector database is a type of database that stores data as multidimensional vectors whose components represent traits or attributes. In vector database usage, embedding is the process of transforming high-dimensional data, including unstructured text and images, into a representation with fewer dimensions. Vector embeddings are structured numerical representations generated from unstructured data, such as text and images, using modern techniques that preserve semantic notions of similarity and difference in the geometry of the vector space. DWH systems have traditionally been designed with structured data in mind. However, as the volume of unstructured data grows exponentially, organizations need more sophisticated methods to understand, represent, and analyze this material. Here, vector databases offer a ground-breaking answer. Bringing vector database technology into data warehouses to handle unstructured data requires modern techniques such as Retrieval-Augmented Generation (RAG). The proposed approach offers a viable method for building an unstructured data warehouse.
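
The embedding and retrieval ideas summarized above can be illustrated with a minimal sketch. It is not the method proposed in this preprint: the names `embed`, `ToyVectorStore`, and `build_prompt` are hypothetical, the embedding is a simple hashed bag-of-words standing in for a learned embedding model, and the store is an in-memory list standing in for a real vector database. It shows only the retrieval half of RAG; the generation step (passing the assembled prompt to a language model) is left out.

```python
import math
import re
import zlib

DIM = 256  # assumed toy dimensionality; learned embeddings use a model-specific size


def embed(text: str, dim: int = DIM) -> list[float]:
    """Toy embedding (assumption): hash each token into one of `dim` buckets,
    count occurrences, and scale the result to unit length."""
    vec = [0.0] * dim
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


class ToyVectorStore:
    """In-memory stand-in for a vector database (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, list[float]]] = []

    def add(self, doc_id: str, text: str) -> None:
        # Store the raw text next to its vector so it can be returned as context.
        self._rows.append((doc_id, text, embed(text)))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float, str]]:
        # Vectors are unit length, so the dot product equals cosine similarity.
        q = embed(query)
        scored = [
            (doc_id, sum(a * b for a, b in zip(q, vec)), text)
            for doc_id, text, vec in self._rows
        ]
        scored.sort(key=lambda row: row[1], reverse=True)
        return scored[:k]


def build_prompt(question: str, store: ToyVectorStore, k: int = 2) -> str:
    """Retrieval step of RAG: fetch the k most similar documents and
    place them in front of the question as context."""
    hits = store.search(question, k)
    context = "\n".join(f"- {text}" for _, _, text in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("invoice", "customer invoice payment overdue")
    store.add("review", "product review battery life excellent")
    store.add("ticket", "support ticket login password reset")
    for doc_id, score, _ in store.search("login password reset", k=2):
        print(doc_id, round(score, 3))
    print(build_prompt("login password reset", store, k=1))
```

In this sketch, unstructured text records are stored as vectors and queried by similarity rather than by exact key, which is the property that lets a vector database complement the structured tables of a conventional DWH. A production setup would replace `embed` with a trained embedding model and `ToyVectorStore` with an indexed vector database.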