Create a SSIS Project in Visual Studio 2015

Installation:

  1. SQL Server Data Tools 2015(install a shell of Visual Studio 2015)
  2. Visual Studio 2015

What is SSIS?

SSIS stands for SQL Server Intergration Services. SSIS is to do the ETL(Extract, Transform and Load) task for data warehouse.

SSIS can also update data warehouse, clean and mine data, create ‘packages’, manipulate data, etc.

To design ETL task in Visual Studio, we use the data flow in Visual Studio.

What is data flow?

A data flow defines a flow of data from a source to a destination.

Before designing the data flow, we can try to drag a data flow and drop it in the control flow.

What is a control flow?

It is a flow we can use to control the flow for different tasks. It provides the logic for when data flow components are run and how they run.

For example:

  1. perform lopping
  2. call stored procedures
  3. move files
  4. manage error handling
  5. check a condition and call different tasks

It defines a workflow of tasks to be executed, often a particular order (assuming your included precedence constraints).

we can rename the data flow task in control flow.

For example, we can drag OLE DB Source, so we can connect to Relational Database.

Then we can follow the Microsoft official toturial to do it step by step.

https://docs.microsoft.com/en-us/sql/integration-services/lesson-1-1-creating-a-new-integration-services-project?view=sql-server-ver15

If you are interested in or have any problems with SSIS, feel free to contact me.

Or you can connect with me through my LinkedIn.

Dimensional Modelling

First of all, a concept of Data Warehouse is supposed to be clear. Data Warehouse is not a copy of source database with a name prefixed with ‘DW’.

DW is we can store data from multiple data sources to be used for historical or analysis report. Some data sources are .csv format, some are google docs, etc. So we need to aggregate them into one data source.

This is called Data Warehouse.

How to design it? The procedure is Dimensional Modelling.

What is the difference between relational and dimensional?

It is same as the difference like normalised VS denormalised.

Normalised:

  1. Minimal data redundancy
  2. Optimised for fast read and fast write
  3. Current data only
  4. Realtime data

Denormalised:

  1. Redundancy data storage for performance
  2. Fast read only
  3. Non-realtime data
  4. Current and historical data

The next part is about fact table and dimension table in dimensional design.

Fact table:

  1. Data that can be measured
  2. Contains surrogate key, linking the associated measures or facts

Dimension table:

  1. Descriptive information

Some types of dimensional models:

  1. Accumalating snapshot table
  2. Aggregate fact
  3. Fact table
  4. Factless Fact table
  5. Snapshot table

If you are interested in or have any problems with Dimensional Modelling, feel free to contact me.

Or you can connect with me through my LinkedIn.