An Introduction to Power BI (Power BI 101)

This article provides an introduction to Power BI.

It is a Microsoft business analytics service that provides an interface for creating reports and dashboards with interactive visualisations.

Its main advantage is that it is easy to use (self-service BI).

First of all, take a look at the official Power BI guided learning material.

It is a well-structured and useful training tutorial.

As I mentioned in a previous blog about this learning platform, it is fun to earn points and badges.

What is the workflow of Power BI?

  1. Bring data in using Power BI Desktop, manipulate the data and build reports.
  2. Publish them to the Power BI service.
  3. Share reports and dashboards with others.

Here is the first step:

Download Power BI Desktop.

Power BI Desktop provides data warehouse capabilities:

  1. ETL
  2. Calculated columns
  3. Measures

After installing it, we can find five panels in the Power BI interface:

  1. Fields: where the datasets are listed
  2. Data: view and manipulate the data
  3. Reports: place visualizations to build reports
  4. Dashboards: choose the graphs to use
  5. Relationships: view/change relationships in the dataset

Through these panels, we can manipulate data and build reports.

Once the report is completed, we can publish it to the Power BI service in the cloud.

Reference sources:

https://www.sqlbi.com/ref/power-bi-visuals-reference/

If you are interested in or have any problems with Power BI, feel free to contact me.

Or you can connect with me through my LinkedIn.

The Simplest Way to Understand SCD in Data Warehouse

What is SCD?

SCD stands for Slowly Changing Dimensions.

It is a very important concept in data warehousing.

As we know, ETL (Extract, Transform, Load) sits between the data sources and the data warehouse.

When ETL runs, it picks up all records and updates them in the dimension tables.

Why do we need SCD?

Because updating the data warehouse becomes a problem when the data in the data sources changes.

If we want to keep some old records in the dimension tables, how can we do this?

By using SCD.

Types of SCD:

Note: The types of SCD are defined at the column level, not at the table level.

There are two popular types of SCD.

Type 1: Overwrite

The old record is overwritten by the new record, so the old value is lost.
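
A Type 1 change can be sketched with Python's built-in sqlite3 module. The DimCustomer table, its columns and the sample values are hypothetical and only serve to illustrate the overwrite; a real warehouse would normally do this inside its ETL tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimCustomer (
        CustomerKey INTEGER PRIMARY KEY,  -- surrogate key
        CustomerID  TEXT,                 -- business key from the source
        City        TEXT                  -- column treated as Type 1
    )
""")
conn.execute("INSERT INTO DimCustomer (CustomerID, City) VALUES ('C001', 'Auckland')")

# Type 1: the source now says Wellington, so the old value is overwritten in place.
conn.execute(
    "UPDATE DimCustomer SET City = ? WHERE CustomerID = ?",
    ("Wellington", "C001"),
)

type1_rows = conn.execute("SELECT CustomerID, City FROM DimCustomer").fetchall()
print(type1_rows)
```

After the update there is still a single row for the customer, and nothing in the table records that the city used to be different.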

Type 2: Store history by creating another row

Type 2 adds a new record rather than overwriting the old one.

As long as we have a Type 2 column in a table, we must have two extra columns: ‘StartDate’ and ‘EndDate’. The ‘EndDate’ is the date when the change happens, which marks the end of the historical record.

We can add a third column, ‘IsCurrent’, to mark the current record.

We do not overwrite the attribute values in the old record; we only set its ‘EndDate’ and insert a new record with a new ‘StartDate’.
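
The Type 2 behaviour can be sketched in the same way with Python's sqlite3 module. The table, the helper name apply_type2_change and the dates are assumptions made for illustration; the sketch also uses NULL as the ‘EndDate’ of the current row, which is one common convention rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimCustomer (
        CustomerKey INTEGER PRIMARY KEY,  -- surrogate key, one per row version
        CustomerID  TEXT,                 -- business key, repeated across versions
        City        TEXT,                 -- column treated as Type 2
        StartDate   TEXT,
        EndDate     TEXT,                 -- NULL while the row is current
        IsCurrent   INTEGER
    )
""")
conn.execute(
    "INSERT INTO DimCustomer (CustomerID, City, StartDate, EndDate, IsCurrent) "
    "VALUES ('C001', 'Auckland', '2019-01-01', NULL, 1)"
)

def apply_type2_change(conn, customer_id, new_city, change_date):
    """Close the current row, then add a new row for the changed value."""
    conn.execute(
        "UPDATE DimCustomer SET EndDate = ?, IsCurrent = 0 "
        "WHERE CustomerID = ? AND IsCurrent = 1",
        (change_date, customer_id),
    )
    conn.execute(
        "INSERT INTO DimCustomer (CustomerID, City, StartDate, EndDate, IsCurrent) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, new_city, change_date),
    )

apply_type2_change(conn, "C001", "Wellington", "2020-06-01")

history = conn.execute(
    "SELECT City, StartDate, EndDate, IsCurrent FROM DimCustomer ORDER BY CustomerKey"
).fetchall()
for row in history:
    print(row)
```

The old Auckland row keeps its values and only gains an end date, while the new Wellington row becomes the current one.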


Rules of Creating Dimension Tables and Fact Tables

What is a dimension table? What is a fact table?

Why do we need both of them in a data warehouse?

Because a data warehouse is used to make reports for business decisions.

Every report is made of two parts: facts and dimensions.

Here is a picture of fact tables and dimension tables in a star schema in data warehouse.

So, in this blog we will talk about the rules for creating dimension tables and fact tables.

First of all, we need to define a term.

What is a surrogate key?

It is the primary key in dimension tables.

Rules of creating dimension tables:

  1. Primary key (surrogate key: an auto-incrementing number that is unique within the data warehouse)
  2. Business key (a key that can be linked back to the data source and carries business meaning)
  3. Attributes (descriptive information from data source)
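
The three rules above can be sketched as a table definition using Python's sqlite3 module. DimEmployee, its columns and the sample employees are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE DimEmployee (
        EmployeeKey INTEGER PRIMARY KEY AUTOINCREMENT,  -- 1. surrogate key
        EmployeeID  TEXT NOT NULL,                      -- 2. business key from the source system
        FullName    TEXT,                               -- 3. attributes
        Department  TEXT
    )
""")
conn.executemany(
    "INSERT INTO DimEmployee (EmployeeID, FullName, Department) VALUES (?, ?, ?)",
    [("E-1007", "Aroha Smith", "Finance"), ("E-2040", "Li Wei", "Sales")],
)

dim_rows = conn.execute(
    "SELECT EmployeeKey, EmployeeID FROM DimEmployee ORDER BY EmployeeKey"
).fetchall()
print(dim_rows)
```

The surrogate key is generated by the warehouse itself, while the business key keeps the value used in the source system.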

There are two kinds of data: master data and transactional data.

Master data refers to the entity (e.g. employee) whereas transactional data refers to all the transactions that are carried out using that entity.

Master data is limited in volume, whereas transactional data can run to billions of rows.

In dimension tables, most data is master data.

Rules of creating fact tables:

  1. Primary key (surrogate key/alternate key, auto-incrementing number)
  2. Foreign key (primary key/surrogate key/alternate key from dimension tables)
  3. Measure (additive number/semi-additive number)

Tip: No descriptive data in fact tables.
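
A minimal star-schema sketch of these rules, again using Python's sqlite3 module. FactSales, DimProduct and the sample figures are made up for illustration: the fact table holds only keys and a measure, and the descriptive product name is looked up through the foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimProduct (
        ProductKey  INTEGER PRIMARY KEY,   -- surrogate key
        ProductName TEXT                   -- descriptive data lives in the dimension
    );
    CREATE TABLE FactSales (
        SalesKey    INTEGER PRIMARY KEY,   -- 1. primary key
        ProductKey  INTEGER REFERENCES DimProduct(ProductKey),  -- 2. foreign key
        SalesAmount REAL                   -- 3. additive measure
    );
    INSERT INTO DimProduct VALUES (1, 'Helmet'), (2, 'Gloves');
    INSERT INTO FactSales (ProductKey, SalesAmount) VALUES (1, 50.0), (1, 70.0), (2, 20.0);
""")

# A report combines the measure from the fact table with a dimension attribute.
sales_by_product = conn.execute("""
    SELECT p.ProductName, SUM(f.SalesAmount)
    FROM FactSales AS f
    JOIN DimProduct AS p ON p.ProductKey = f.ProductKey
    GROUP BY p.ProductName
    ORDER BY p.ProductName
""").fetchall()
print(sales_by_product)
```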


4 Reasons Why We Need Data Warehouse

Here is a basic process in Business Intelligence.

Some people may wonder: why do we need a data warehouse?

Without a data warehouse, we can still analyze the data.

We can get the data and create the report directly.

So, what are the benefits of a data warehouse in an organization?

Here we list 4 reasons why we need a data warehouse.

Integrate data from various data sources and centralize it in one place.

Load data into the data warehouse so that reporting won’t impact the live system or database.

That is why we keep a separate data warehouse and store the data there.

We can schedule a job that runs at night to copy data from the operational databases into the data warehouse.
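
The nightly job can be sketched as a small copy routine in Python. The two in-memory SQLite databases stand in for the live system and the warehouse, and the names Orders, StgOrders and nightly_load are assumptions for this example; in practice a scheduler (such as SQL Server Agent or cron) would trigger the load.

```python
import sqlite3

# Hypothetical operational (live) database.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Amount REAL)")
live.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

# Separate warehouse database that reports read from.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE StgOrders (OrderID INTEGER PRIMARY KEY, Amount REAL)")

def nightly_load(source, target):
    """Copy all orders from the live system into the warehouse staging table."""
    rows = source.execute("SELECT OrderID, Amount FROM Orders").fetchall()
    # INSERT OR REPLACE keeps the job safe to re-run on the same data.
    target.executemany("INSERT OR REPLACE INTO StgOrders VALUES (?, ?)", rows)
    target.commit()
    return len(rows)

loaded = nightly_load(live, warehouse)
total = warehouse.execute("SELECT SUM(Amount) FROM StgOrders").fetchone()[0]
print(loaded, total)
```

Reports then query only the warehouse copy, so heavy reporting queries never touch the live database.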

Easy access (one place of data and single source of truth).

It is easy for people to go to the data warehouse to get data, and they don’t need to worry about other problems, e.g., "we have so many data sources, so where can I get the data?"

We can trust the data we get from the data warehouse.

Build model: choose the best design model to get the best flexibility and performance, especially for those large datasets.

We usually use the Kimball methodology: star schema/snowflake schema (de-normalization).

For example, we use the star schema to improve the query performance.

The Kimball methodology was developed by Ralph Kimball and includes a set of methods, techniques and concepts for use in data warehouse design.

There are also other methodologies we can use in a data warehouse, e.g., the Inmon methodology and the Data Vault methodology.


A mind map for SQL

In this article, I share a basic SQL mind map for people who want to kick-start a career in the business intelligence and data analysis industry.

I hope it gives you a basic understanding of SQL.


Stored Procedure in SQL

In this article I will give you some basic syntax for stored procedures in SQL.

Stored Procedure:

A stored procedure is prepared SQL code that we can save, so the code can be reused over and over again.

It is suitable for SQL queries that we need to run frequently.

There are three kinds of stored procedure:

  1. No parameter
  2. One parameter
  3. Multiple parameters

Stored Procedure (No parameter) Syntax

CREATE PROCEDURE procedure_name
AS
sql_statement
GO
EXEC procedure_name

Stored Procedure (One parameter) Syntax

CREATE PROCEDURE [dbo].[oneparameter]
@ProductCategoryID int
AS
-- Return only the category that matches the parameter
SELECT * FROM Production.ProductCategory
WHERE ProductCategoryID = @ProductCategoryID
GO
EXEC oneparameter @ProductCategoryID = 4

Stored Procedure (multiparameter) Syntax

CREATE PROCEDURE [dbo].[multiparameter]
@ProductCategoryID int, @Name varchar(50)
AS
-- Both conditions must match for a row to be returned
SELECT * FROM Production.ProductCategory pc
WHERE pc.ProductCategoryID = @ProductCategoryID AND pc.Name = @Name
GO
EXEC multiparameter @ProductCategoryID = 4, @Name = 'Accessories'
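
For readers without a SQL Server instance at hand, the idea of a saved, parameterised query can be tried out in Python with SQLite. This is only an analogy: SQLite has no stored procedures, so a Python function plays that role here, and the small ProductCategory table with its two rows is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductCategory (ProductCategoryID INTEGER PRIMARY KEY, Name TEXT)"
)
conn.executemany(
    "INSERT INTO ProductCategory VALUES (?, ?)",
    [(1, "Bikes"), (4, "Accessories")],
)

def multiparameter(conn, product_category_id, name):
    """Reusable query with two parameters, mirroring the procedure above."""
    return conn.execute(
        "SELECT ProductCategoryID, Name FROM ProductCategory "
        "WHERE ProductCategoryID = ? AND Name = ?",
        (product_category_id, name),
    ).fetchall()

result = multiparameter(conn, 4, "Accessories")
print(result)
```

As with EXEC, the query text is written once and only the parameter values change between calls.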

I will record everything I learn on my Business Intelligence journey.

The next blog will be about data warehouses.


Business Intelligence Tutorial

This blog is a Business Intelligence tutorial, which contains lots of definitions and terms.

There are three steps in Business Intelligence: data collecting, data warehousing and reporting.

Data Collecting:

  1. Structured: Standardized and easy for computers to read and query.
  2. Semistructured
  3. Unstructured: Not stored in rows and columns, so it can’t easily be read by computers.

As we discussed in a previous blog, company data can be found in several locations, such as CRM programs, which is also shown in the picture below.

Data Warehouse:

A data warehouse uses a process (ETL, i.e., extract, transform and load) to standardize data so that it can be queried.

How does information get to a central location?

ETL -> Data Warehouse

  1. Extract: unstructured data is tagged with metadata to make it easier to find
  2. Transform: normalize data
  3. Load: transfer data to central warehouse or data mart
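
The three steps can be sketched in a few lines of Python. The raw records, the transform function and the Customer table are invented for illustration; a real pipeline would read from actual source systems and use a dedicated ETL tool.

```python
import sqlite3

# Extract: raw records as they might arrive from a source system (made-up data).
raw_records = [
    {"name": "  alice ", "country": "nz", "spend": "120.50"},
    {"name": "BOB", "country": "NZ ", "spend": "80"},
]

def transform(record):
    """Standardise the text fields and convert the amount to a number."""
    return (
        record["name"].strip().title(),
        record["country"].strip().upper(),
        float(record["spend"]),
    )

# Load: write the cleaned rows into a central table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE Customer (Name TEXT, Country TEXT, Spend REAL)")
warehouse.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [transform(r) for r in raw_records],
)

loaded_rows = warehouse.execute(
    "SELECT Name, Country, Spend FROM Customer ORDER BY Name"
).fetchall()
print(loaded_rows)
```

After the transform step both records use the same formats, which is what makes them easy to query together.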

Turning Data into Powerpoints (Business Intelligence Reporting)

  1. Data visualization: Graphic display of results
  2. Dashboard: Interfaces that represent specific analyses


What is the Difference between Business Analysis, Data Analysis and Data Science?

There is an extraordinarily large number of terms and job titles in the data field, such as business analysis, data analysis and data science.

They often confuse people, so in this article we will talk about them, especially the differences.

Business Analysis VS Data Analysis

Generally, data analysis means using data to analyse a XXX problem. The pipeline includes data collecting, data warehousing, data cleaning, data visualisation, etc.

As we can see, there is a blank in the sentence.

Actually, data analysis can be applied to different areas, and that is the meaning of ‘XXX’.

It can be academic or commercial.

If ‘XXX’ is commercial, then data analysis equals business analysis. In other words, business analysis is an application of data analysis.

There is one main difference between data analysis and business analysis: the data sources.

The data for data analysis positions often comes from company websites, apps, ERP systems, etc.

Those job descriptions require candidates to master SQL, Python or R.

However, the data collected from those platforms often has several problems, mainly with data quality, such as missing data or noise.

Those are caused by weak IT infrastructures and ‘econnoisseurs’ who want to register more user accounts.

The data sources of business analysis positions, by contrast, include not just the internal data collected by companies but also a large number of external sources. For example:

  1. Industry studies
  2. Qualitative interviews
  3. Quantitative interviews
  4. Internal data

Business Analysis VS Data Science

Because of AlphaGo, Artificial Intelligence is well known to everyone.

However, the areas where AI has been successful are not related to data analysis.

They are computer vision and natural language processing, and their industrial applications are mainly in security and supply chain.

The use of algorithms in commercial settings is limited because some departments in a company are hard to represent with data and algorithm models.

As a result, algorithms can only solve specific problems.

  1. Firstly, algorithms related directly to users. The famous example is risk control, because the factors contributing to user credit are easy to capture in an algorithm model, commonly logistic regression.
  2. Secondly, forecasting algorithms. Forecasting is in high demand in Business Intelligence.
  3. Thirdly, dimensionality reduction algorithms. Dimensions are often reduced to make a problem, such as a new product, easier to evaluate.
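
To make the risk-control example in the first point more concrete, here is a minimal sketch of how a logistic regression model turns user factors into a probability. The two features, the weights and the bias are hand-picked, hypothetical numbers; in a real model they would be learned from historical data.

```python
import math

def default_probability(features, weights, bias):
    """Logistic regression score: sigmoid of the weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights for [missed_payments, debt_to_income_ratio].
weights = [0.8, 2.0]
bias = -3.0

low_risk = default_probability([0, 0.1], weights, bias)
high_risk = default_probability([4, 0.9], weights, bias)
print(round(low_risk, 3), round(high_risk, 3))
```

The output is always between 0 and 1, so it can be read as an estimated probability and compared against a threshold when deciding whether to approve a user.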

Summary

In summary, algorithms are useful in business intelligence; however, they cannot replace it.

Business Intelligence is applied to commercial problems, and algorithm models are tools for solving specific problems.


The Simplest Way to Install SQL Server 2017 on macOS

In this article we will show how to install Microsoft SQL Server 2017 on macOS, as mentioned in the previous blog.

Which method do we use?

Prior to SQL Server 2017, installing it on macOS required a virtual machine (like Parallels Desktop). We would install Windows in it and then install and run SQL Server there.

Luckily, from SQL Server 2017 onwards, we can run SQL Server in Docker containers.

How to install SQL Server on Docker?

1 What is Docker?

Docker is a platform that runs software inside containers. A container is an isolated environment.

2 Download and install Docker

If you haven’t installed Docker on your Mac, the next step is to install Docker.

Go to the Docker page to download the .dmg file, then double-click it and install it according to the instructions.

3 Run Docker and increase the memory

Run Docker as you would any other application. The next step is to increase the memory, because the default Docker memory is 2 GB but SQL Server needs at least 3.25 GB.

I recommend setting the memory to 4 GB.

  1. Click the Docker icon in the top menu of your Mac
  2. Click Preferences
  3. Set Memory under Advanced to 4 GB
  4. Click Apply & Restart

4 Download Microsoft SQL Server 2017

The next step is to download SQL Server 2017 from Terminal, which is the easy way.

Copy and paste the following command into Terminal on your Mac:

docker pull microsoft/mssql-server-linux

This downloads the latest version of the SQL Server image.

5 Run a Docker image

Copy the following command, change it, and paste it into Terminal on your Mac:

docker run -d --name xxxxxxx -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=xxxx\xxxx' -p 1433:1433 microsoft/mssql-server-linux

Change the name and password to your own here.

This step runs an instance of the Docker image. A Docker image is a file that is used to execute code in a Docker container.

If you want to check, copy and paste the following command to see whether the Docker container is running:

docker ps

If it works, it will show like this:

6 Install the sql-cli command line tool

The next step is to install the sql-cli command line tool. It allows you to run commands against your SQL Server instance.

Copy and paste the following command into Terminal on your Mac:

npm install -g sql-cli

If an error occurs saying that you do not have permission to access a file as the current user, try adding sudo to the command:

sudo npm install -g sql-cli

7 Connect to SQL Server

After sql-cli is installed, now we can connect to SQL Server using the mssql command.

mssql -u xxxxxxx -p xxxxxxx

Here the first xxxxxxx is your username and the second is your password.

Then you will see:


Now you’ve successfully connected to your instance of SQL Server.

In the next blog we will continue our journey with SQL Server and Azure Data Studio. You may also want to read more about Azure and Microsoft Learn in the previous blog.


Welcome to SQL (SQL 101)

If you want to find a BI Analyst job in New Zealand, you may not need to be a master of Python, R, etc.

However, SQL knowledge is essential.

Here is a BI Analyst job advertisement on LinkedIn, which is a full-time role.

St John is also an accredited employer.

If you have a skill that is needed by a New Zealand accredited employer and they offer you full-time work when you are abroad, you may be able to get a Talent (Accredited Employer) Work Visa.

From the job description, we can see that SQL knowledge is important.

What is SQL?

Firstly, I want to talk about databases (DB). A DB is a kind of software for storing large amounts of data; the most common kind is the relational DB.

Structured Query Language (SQL) is always linked with DBs. SQL is used to operate on data, for example querying and updating it.

The relationship between DB, data and SQL is like this: the DB is a plate, the data is the food on the plate and SQL is your fork.

Currently, most websites and apps are built on SQL and DBs.

There are several popular DBs in the world, e.g., SQLite, MySQL, Postgres, Oracle and Microsoft SQL Server.

Among them, MSSQL Server is the most popular in New Zealand.

Luckily, all of them support SQL, although each has different characteristics.

It is like having a fork: you can eat from different plates.

What is Relational DB?

A DB is composed of several tables (like tables in Excel) and tables are composed of rows and columns.

We can query the DB and obtain results using SQL.

Some people may wonder: what is the difference between a DB and Excel? Excel already exists, so why was SQL created?

That is because DB = Tables + Relations between tables.

Tables in Excel can’t meet the data operation requirements of companies because of the complex relations between tables.
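
The idea of "tables plus relations" can be tried out with SQLite, which ships with Python. The Customer and Orders tables and their rows are made up for this sketch; the point is that the CustomerID column links each order to its customer, and SQL can update data and follow that relation in a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        Name       TEXT
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customer(CustomerID),  -- the relation
        Amount     REAL
    );
    INSERT INTO Customer VALUES (1, 'Mia'), (2, 'Noah');
    INSERT INTO Orders VALUES (10, 1, 30.0), (11, 1, 12.0), (12, 2, 5.0);
""")

# Updating: change the amount of one order.
conn.execute("UPDATE Orders SET Amount = 15.0 WHERE OrderID = 11")

# Querying: follow the relation from orders back to customers.
totals = conn.execute("""
    SELECT c.Name, SUM(o.Amount)
    FROM Customer AS c
    JOIN Orders AS o ON o.CustomerID = c.CustomerID
    GROUP BY c.Name
    ORDER BY c.Name
""").fetchall()
print(totals)
```

The same statements work with only small changes on the other DBs listed above, which is the "one fork, many plates" idea.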

How to kick-start SQL?

After learning the terms SQL and relational DB, the next step is to set up an environment with SQL Server, which we will talk about in the next blog.
