How to design a Data Warehouse(Part 1)

Last blog I wrote why we need a Data Warehouse.

First, what is the data warehouse?

It is a centralized relational database that pulls together data from different sources (CRM, marketing stack, etc.) for better business insights.

It stores current and historical data are used for reporting and analysis.

However, here is the problem:

How we can design a Data Warehouse?

1 Define Business Requirements

Because it touches all areas of a company, all departments need to be onboard with the design. Each department needs to understand what the benefits of data warehouse and what results they can expect from it.

What objectives we can focus on:

  1. Determine the scope of the whole project
  2. Find out what data is useful for analysis and where our dat is current siloed
  3. Create a backup plan in case of failure
  4. Security: monitoring, etc.

2 Choose a data warehouse platform

There are four types of data warehouse platforms:

  1. Traditional database management systems: Row-based relational platforms, e.g., Microsoft SQL Server
  2. Specialized Analytics DBMS: Columnar data stores designed specifically for managing and running analytics, e.g., Teradata
  3. Out-of-box data warehouse appliances: the combination with a software and hardware with a DBMS pre-installed, e.g., Oracle Exadata
  4. Cloud-hosted data warehouse tools

We can choose the suitable one for the company according to budget, employees and infrastructure.

We can choose between cloud or on-premise?

Cloud solution pros:

  1. Scalability: easy, cost-effective, simple and flexible to scale with cloud services
  2. Low entry cost: no servers, hardware and operational cost
  3. Connectivity: easy to connect to other cloud services
  4. Security: cloud providers supply security patches and protocols to keep customers safe

Choices:

  1. Amazon Redshift
  2. Microsoft Azure SQL Data Warehouse
  3. Google Bigquery
  4. Snowflake Computing

On-premise solution pros:

  1. Reliability: With good staff and exceptional hardware, on-premise solutions can be highly available and reliable
  2. Security: Organizations have full control of the security and access

Choices:

  1. Oracle Database
  2. Microsoft SQL Server
  3. MySQL
  4. IBM DB2
  5. PostgreSQL

What we can choose between on-premise and cloud solution, in the big picture, it depends on our budget and existing system.

If we look for control, then we can choose on-premise solution. Conversely, if we look for scalability, we can choose a cloud service.

3 Set up the physical environments

There are three physical environments in Data Warehouse: development, testing and production.

  1. We need to test changes before they move into the production environment
  2. Running tests against data typically uses extreme data sets or random sets of data from the production environment.
  3. Data integrity is much easier to track and issues are easier to contain if we have three environments running.

4 Data Modelling

It is the most complex phase of Data Warehouse design. It is the process of visualizing data distribution in the warehouse.

  1. Visualize the relationships between data
  2. Set standardized naming conventions
  3. Create relationships between data sets
  4. Establish compliance and security processes

There are bunches of data modeling techniques that businesses use for data warehouse design. Here are top 3 popular ones:

  1. Snowflake Schema
  2. Star Schema
  3. Galaxy Schema

4 Choosing the ETL solution

ETL, stands for Extract, Transform and Load is the process we pull data from the storage solutions to warehouse.

We need to build an easy, replicable and consistent data pipeline because a poor ETL process can break the entire data warehouse.

Wrapping up

This post explored the first 4 steps about designing a Data Warehouse in the company. In the future, I will write the next steps.

If you are interested in or have any problems with Data Warehouse, feel free to contact me.

Or you can connect with me through my LinkedIn.

Author: Jacqui

Data Science|Business Intelligence

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s