ETL and Real-time Data Warehousing
By Nena Marin, Sr. Solutions Architect, Pervasive Software
Real-time data warehousing is at the core of enterprise intelligence initiatives. Forecasting, trending and deciding what to do next require the freshest knowledge of what is happening now.
Pervasive Data Integrator (DI) provides workflow-based ETL for real-time, concurrent warehouse loading. As an example, the following integration process implements a data warehouse load using Pervasive DI.
In this dimensional model, the data warehouse consists of dimension and fact tables. The fact tables contain measures (such as dollar amounts and units) and relate to every dimension, and to its attributes, via foreign keys. The dimensional model provides a framework that is scalable yet standard across the enterprise.
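To make the dimensional model concrete, here is a minimal sketch of such a schema using Python's built-in sqlite3 module. It is an illustration only, not how Pervasive DI defines tables. Product_Dimension and Product_Class_Dimension are the tables named in this article; Time_Dimension, Sales_Fact and every column name are hypothetical.

```python
import sqlite3


def build_star_schema(conn):
    """Create a small dimensional model: three dimensions and one fact table.

    Only the two product table names come from the article; the other
    tables and all column names are assumptions made for illustration.
    """
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE Product_Class_Dimension (
            product_class_key INTEGER PRIMARY KEY,
            class_name        TEXT NOT NULL
        );
        CREATE TABLE Product_Dimension (
            product_key       INTEGER PRIMARY KEY,
            product_name      TEXT NOT NULL,
            product_class_key INTEGER NOT NULL
                REFERENCES Product_Class_Dimension (product_class_key)
        );
        CREATE TABLE Time_Dimension (
            date_key      INTEGER PRIMARY KEY,
            calendar_date TEXT NOT NULL
        );
        -- The fact table holds the measures plus one foreign key per dimension.
        CREATE TABLE Sales_Fact (
            product_key   INTEGER NOT NULL
                REFERENCES Product_Dimension (product_key),
            date_key      INTEGER NOT NULL
                REFERENCES Time_Dimension (date_key),
            sales_dollars REAL    NOT NULL,
            units_sold    INTEGER NOT NULL
        );
        """
    )
```

With foreign keys enforced, a row inserted into Sales_Fact before its dimension rows exist is rejected, which is why the order in which tables are loaded matters.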
The Pervasive DI process designer provides a canvas for creating data warehouse load workflows. Each map/transformation node in the workflow loads a specific table, either a dimension or a fact. At load level 1, dimension tables without dependencies can be loaded concurrently. At load level 2, dimensions that depend on tables loaded at level 1 can be loaded concurrently. In this example, the Product_Dimension includes a foreign key to (or is downstream dependent on) the Product_Class_Dimension. Because fact tables contain foreign keys to all dimensions, they are loaded last in the workflow process.
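The load-level idea can be sketched outside the visual designer as well. The Python below is a rough illustration, not Pervasive DI's API: it derives load levels from a hypothetical map of foreign key dependencies and loads each level concurrently with a thread pool. The names load_levels, run_load and load_table, as well as the Time_Dimension and Sales_Fact tables, are assumptions introduced for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def load_levels(depends_on):
    """Group tables into load levels.

    depends_on maps each table to the tables it references via foreign
    keys. A table with no dependencies is level 1; any other table sits
    one level above the deepest table it references.
    """
    levels = {}

    def level_of(table, path=()):
        if table in path:
            raise ValueError(f"circular dependency involving {table}")
        if table not in levels:
            parents = depends_on.get(table, ())
            levels[table] = 1 + max(
                (level_of(p, path + (table,)) for p in parents), default=0
            )
        return levels[table]

    for table in depends_on:
        level_of(table)

    grouped = {}
    for table, level in levels.items():
        grouped.setdefault(level, []).append(table)
    return [sorted(grouped[level]) for level in sorted(grouped)]


def run_load(depends_on, load_table, max_workers=4):
    """Load level by level; tables within one level run concurrently."""
    for tables in load_levels(depends_on):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # list() waits for the whole level and re-raises any load error
            # before the next level starts.
            list(pool.map(load_table, tables))


# Hypothetical dependency map mirroring the example in the text.
DEPENDENCIES = {
    "Product_Class_Dimension": [],
    "Time_Dimension": [],
    "Product_Dimension": ["Product_Class_Dimension"],
    "Sales_Fact": ["Product_Dimension", "Time_Dimension"],
}
```

Calling run_load(DEPENDENCIES, load_table) with a load_table function of your own would load the two independent dimensions together, then Product_Dimension, then the fact table. Because the fact table references the dimensions, it lands in the final level without any special handling.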
Pervasive DI workflows offer seamless concurrency, native connectivity and SQL querying to join data from multiple sources. The Pervasive ETL process designer also provides handy features such as decision steps, email notification and a scripting language for custom source-to-target transformations. There are built-in options to profile ETL engine performance during your load processes.
Finally, processes are scheduled as tasks for automation.
