Skip to main content

Introduction

This data warehouse project demonstrates a comprehensive end-to-end solution for consolidating sales data from multiple source systems into a unified analytical platform. Built with PostgreSQL and Docker, the warehouse enables business intelligence, reporting, and data-driven decision-making.
The architecture follows industry best practices in data engineering, implementing the Medallion Architecture pattern with Bronze, Silver, and Gold layers for progressive data refinement.

Architecture Diagram

The data warehouse architecture implements a three-layer approach for data processing and transformation:
Data warehouse architecture diagram

Data Flow

Data flows through the system in a structured, progressive manner:
1

Source Systems

Raw data originates from two primary business systems:
  • ERP System: Enterprise Resource Planning data
  • CRM System: Customer Relationship Management data
Both systems export data as CSV files for ingestion.
2

Bronze Layer (Raw Ingestion)

CSV files are loaded directly into PostgreSQL tables with minimal transformation. This layer preserves the original data structure and serves as the single source of truth.
The Bronze layer acts as a data lake, storing raw data exactly as received from source systems.
3

Silver Layer (Cleansing & Standardization)

Data undergoes quality checks, cleansing, and standardization:
  • Remove duplicates and inconsistencies
  • Standardize data formats and naming conventions
  • Normalize data structures
  • Resolve data quality issues
4

Gold Layer (Business-Ready Analytics)

Refined data is modeled into a star schema optimized for analytical queries:
  • Fact tables for business transactions
  • Dimension tables for analytical context
  • Pre-aggregated metrics for performance
  • Business-friendly naming and structure

Technology Stack

The data warehouse is built with modern, proven technologies:

PostgreSQL

Open-source relational database providing robust data storage, ACID compliance, and powerful SQL capabilities for data transformation and analytics.

Docker

Containerization platform ensuring consistent development and deployment environments. Simplifies setup and makes the warehouse portable across systems.

PL/pgSQL

PostgreSQL’s procedural language for implementing ETL logic, stored procedures, and complex data transformations within the database.

SQL

Standard SQL for data modeling, queries, and analytics. Enables business users and analysts to extract insights directly from the warehouse.

Key Capabilities

Consolidates data from multiple source systems (ERP and CRM) into a unified data model, eliminating data silos and providing a single source of truth for analytics.
Implements comprehensive data cleansing and validation in the Silver layer to ensure high-quality, trustworthy data for business decisions.
Star schema design in the Gold layer optimizes query performance for common analytical patterns, enabling fast reporting and ad-hoc analysis.
PostgreSQL’s robust architecture and Docker containerization provide a foundation for scaling the warehouse as data volumes and user demands grow.

Project Objectives

The data warehouse project fulfills multiple business and technical objectives:

Consolidate Sales Data

Integrate sales information from disparate ERP and CRM systems into a unified repository.

Enable Analytics

Provide a foundation for SQL-based reporting, dashboards, and business intelligence tools.

Ensure Data Quality

Implement systematic data cleansing and validation to maintain high data integrity.

Support Decision-Making

Deliver actionable insights on customer behavior, product performance, and sales trends.
This project focuses on current state data. Historical tracking and slowly changing dimensions are not implemented in this version.

Next Steps

Explore the architecture in greater detail:
  • Medallion Layers - Deep dive into Bronze, Silver, and Gold layer implementations
  • Data Model - Detailed star schema design and table structures

Build docs developers (and LLMs) love