Google Analytics cheat sheets

Data-driven vs. data-informed

Data-driven: Decisions based only on statistics, which can be misleading.

Data-informed: Decisions made by combining statistics with insights and our knowledge of human wants & needs. 

Being data-informed lets us combine data with human creativity to come up with innovative business solutions.

When we first open Google Analytics, we see a huge number of lines full of data and strange names.

But don’t freak out: we can break things down.

In this blog, I will illustrate the idea that Analytics + Art = Creative Data Scientist, through the following topics:


  1. Installing And Customising Google Analytics
  2. Learning Dashboards
  3. Analysing Audience Behaviour
  4. User Acquisition
  5. Generating And Sharing Reports

1. Installing And Customising Google Analytics


When we install Google Analytics, there are several settings we should pay attention to:

  1. Tracking code: A basic code snippet we add to our website. Its tracking ID starts with UA, which stands for Universal Analytics.
  2. Data collection: Turning it on allows us to collect data about our users.
  3. User-ID: Allows us to track individual users. We generate a unique user ID, make sure the right ID is assigned to the right user, and Google Analytics then associates the data with that ID.
  4. Session settings: A session starts any time a user loads our site on their device. We can set the session timeout here (30 minutes by default).
  5. Referral exclusion list: Usually we add our own site's domain here.
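To make the session timeout concrete, here is a minimal Python sketch, assuming we already have one user's hit timestamps in seconds. It starts a new session whenever the gap since the previous hit is longer than the timeout. The function name split_sessions is hypothetical, and real GA has extra rules (for example, a session also ends at midnight) that this sketch ignores.

```python
from typing import List

DEFAULT_TIMEOUT_S = 30 * 60  # GA's default session timeout: 30 minutes


def split_sessions(hit_times: List[int], timeout_s: int = DEFAULT_TIMEOUT_S) -> List[List[int]]:
    """Group one user's hit timestamps (in seconds) into sessions.

    A new session starts when the gap since the previous hit is longer
    than the timeout. Midnight and campaign-change rules are ignored.
    """
    sessions: List[List[int]] = []
    for t in sorted(hit_times):
        if sessions and t - sessions[-1][-1] <= timeout_s:
            sessions[-1].append(t)  # still inside the current session
        else:
            sessions.append([t])    # first hit, or gap too long: new session
    return sessions


# Hits at 0 s, 10 min, 50 min and 55 min: the 40-minute gap splits them.
print(split_sessions([0, 600, 3000, 3300]))  # [[0, 600], [3000, 3300]]
```

Changing the session timeout in the admin settings corresponds to changing timeout_s here: a longer timeout merges more hits into fewer sessions.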

2. Learning Dashboards

The Admin page has three levels: account, property and view.

Under an account, there can be several properties.

For example, one account can have many URLs associated with it.

In other words, we have a site, and under this site we have a whole bunch of other properties that we have associated with it.

The tracking IDs are the same, except for the number at the very end.

Google Analytics Admin Page
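The shared account number is visible in the tracking IDs themselves. Here is a small Python sketch, using made-up IDs, that splits a Universal Analytics tracking ID into the account number and the property index at the end. parse_tracking_id is a hypothetical helper, not part of any Google library.

```python
import re
from typing import NamedTuple


class TrackingId(NamedTuple):
    account: str         # shared by every property in the account
    property_index: int  # the "number at the very end"


UA_PATTERN = re.compile(r"^UA-(\d+)-(\d+)$")


def parse_tracking_id(tid: str) -> TrackingId:
    """Split a Universal Analytics ID such as 'UA-12345-2' into its parts."""
    m = UA_PATTERN.match(tid)
    if m is None:
        raise ValueError(f"not a Universal Analytics tracking ID: {tid!r}")
    return TrackingId(account=m.group(1), property_index=int(m.group(2)))


# Hypothetical IDs for two properties under the same account.
for tid in ["UA-12345-1", "UA-12345-2"]:
    print(parse_tracking_id(tid))
```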

Google Ads and Google AdSense: 

Google Ads are the keywords we buy from Google; they appear as links or text at the top of Google search results pages. AdSense ads are the ads Google sells that we can insert on our own site. We use AdSense if we are a publisher and want to monetise our content.

Turn on ‘Bot Filtering’ in View Settings, under the View column of the Admin page.

It excludes all hits from known bots and spiders. Search engines such as Google and Yahoo run programs that crawl and index web content. ‘Bot’ is short for robot, and ‘spiders’ got their name because these little programs crawl the web, like spiders.

Turning it on is not a must. But if we have a big site and we are professionals doing analytics, we want insights about what humans are doing (content is made for humans), excluding the robots, because robots only crawl our site for data. If we are interested in what real users are doing on our site, turning it on is an easy win.
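GA does this filtering for us, so no code is needed in practice. Purely to illustrate the idea, here is a Python sketch that drops hits whose user agent matches a known-bot pattern. The KNOWN_BOTS pattern and the hit format are hypothetical; Google's real filter relies on its own maintained list of known bots and spiders, not on this tiny regex.

```python
import re
from typing import Dict, List

# Hypothetical, deliberately tiny pattern list; real bot lists are far longer.
KNOWN_BOTS = re.compile(r"googlebot|bingbot|slurp|spider|crawler", re.IGNORECASE)


def human_hits(hits: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only hits whose user agent does not look like a known bot."""
    return [h for h in hits if not KNOWN_BOTS.search(h["user_agent"])]


hits = [
    {"page": "/", "user_agent": "Mozilla/5.0 (Windows NT 10.0) Chrome/80.0"},
    {"page": "/", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1)"},
    {"page": "/about", "user_agent": "Mozilla/5.0 (compatible; Yahoo! Slurp)"},
]
print([h["page"] for h in human_hits(hits)])  # ['/']
```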


Next, let's have a look at the sidebar.

  1. Customisation: I suggest installing a custom report to give it a try.
  2. Real-time: What users are doing right this very second.
  3. Audience: Who our users are, what they use, and when, where and from where they visit. A big module.
  4. Acquisition: Where traffic comes from and how our marketing efforts are working. Under Channels we see Direct, Paid Search, Organic Search, (Other), Referral, Email and Social. This tells us how people find us. Is our marketing working? Are we just wasting our money?
  5. Behaviour: This one is fascinating. Think of it as the security camera in a store: we watch our users picking up items, checking out, or running out of the exits.
  6. Conversions: The happy part. This is where we track how well our site turns visitors into customers.

3. Analysing Audience Behaviour

(1) Conversion vs Engagement:

Conversion: A one-time interaction. Granted, this is a powerful interaction, but it is the end goal of a chain of events.

Engagement: Repeated use that results in an emotional, psychological and sometimes near-physical tie between users and a product, e.g., Apple fans.

Build a hypothesis via the Audience Overview

For example: there could be a lot of opportunity to grow if we made this site available in other languages.

(2) Active users

From the Active Users line chart, we can see whether anything is having an effect on traffic, or whether no marketing is really being done at all.

(3) Cohort analysis

A cohort is a group of users who all share a common characteristic. In this case it is the acquisition date, the day they first came to your site, which is known as day 0. The metrics here are used to analyse user behaviour over the following days.

For example, what happens on day 1? How many people came back the next day?
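Cohort retention is simple arithmetic once we know each user's day 0. Here is a minimal Python sketch, assuming a hypothetical list of (user_id, day_number) visits where day_number counts days from some start date. It groups users by their first day and reports the share of each cohort that came back on day 1, which is roughly what one column of the cohort table shows.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def day1_retention(visits: Iterable[Tuple[str, int]]) -> Dict[int, float]:
    """Map each acquisition day (day 0) to the share of that cohort seen on day 1."""
    days_by_user: Dict[str, Set[int]] = defaultdict(set)
    for user, day in visits:
        days_by_user[user].add(day)

    cohort_size: Dict[int, int] = defaultdict(int)
    returned: Dict[int, int] = defaultdict(int)
    for days in days_by_user.values():
        day0 = min(days)      # acquisition date = first day the user was seen
        cohort_size[day0] += 1
        if day0 + 1 in days:  # came back the next day
            returned[day0] += 1
    return {d: returned[d] / n for d, n in sorted(cohort_size.items())}


# Hypothetical visits: users a, b, c arrive on day 0; d arrives on day 1.
visits = [("a", 0), ("a", 1), ("b", 0), ("c", 0), ("c", 2), ("d", 1), ("d", 2)]
print(day1_retention(visits))
```

Here one of the three day-0 users (a) returned on day 1, while the single day-1 user (d) returned on day 2.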

  1. Tracking individual users with User Explorer
  2. Using segmentation to refine demographics and interests

This helps us know who our audience is and what type of content we are trying to expose them to, which in turn impacts our design choices.

(4) Demographics

If our audience is mostly young people using smartphones, we should simplify the navigation choices.

(5) Interests

A target-rich environment for the site could be where the top 3 interest categories overlap.

We can try creating a ‘Tablet Traffic’ segment and compare it with ‘All Users’ to see whether there are any differences.
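A segment is essentially a condition applied to sessions before the metrics are computed, so comparing ‘Tablet Traffic’ with ‘All Users’ means computing the same metric twice. Here is a Python sketch with hypothetical session records; the Session fields are assumptions for illustration, not GA's export schema.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Session:
    device: str    # "desktop", "mobile" or "tablet"
    bounced: bool  # True for a single-page session


def bounce_rate(sessions: List[Session], segment: Callable[[Session], bool] = lambda s: True) -> float:
    """Bounce rate (0 to 1) of the sessions that match the segment condition."""
    matched = [s for s in sessions if segment(s)]
    if not matched:
        return 0.0
    return sum(s.bounced for s in matched) / len(matched)


# Hypothetical sessions.
sessions = [
    Session("desktop", False),
    Session("desktop", False),
    Session("tablet", True),
    Session("tablet", False),
    Session("mobile", True),
]
all_users = bounce_rate(sessions)                               # 'All Users' segment
tablet = bounce_rate(sessions, lambda s: s.device == "tablet")  # 'Tablet Traffic' segment
print(f"All Users: {all_users:.0%}, Tablet Traffic: {tablet:.0%}")  # All Users: 40%, Tablet Traffic: 50%
```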

(6) Geo

Language and Location. We can apply a segment such as ‘Converters’ and compare it with ‘All Users’ to find differences.

(7) Behaviour

We can apply ‘Mobile and Tablet Traffic’ as a segment and compare it with ‘All Users’. This shows after how many seconds people start paying more attention, whether the numbers are trending up, and whether we hold their attention for a long span of time.

(8) Technology

Browser & OS: The Flash version matters if we want to run ads on the website, because we need to make sure the ads actually display.

Network: This can be a really big deal if we serve users in areas where we know connections are very slow. Do we need to serve them a simplified page? This is called adaptive design.

(9) Mobile

If something is strange but not significant, we can just move on.

  1. Benchmarking and Users Flow
  2. Page Analytics (a plugin available in the Chrome Web Store)

4. User Acquisition

(1) Learning about channels, sources and mediums

There are many questions here:

How do people get to the site?

Sources, searches and referrals

SEO and what users are looking for

Social statistics and …

Channels: The general, top-level categories that our traffic comes from, such as search, referral or social.

Sources: A subcategory of a channel. For example, search is a channel, and inside that channel, Yahoo Search is a source.

Mediums: How the traffic from a source reaches our site. For example, if the traffic is coming from Google, is it organic search or paid search?
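Channels are derived from the source and medium of each session. The Python sketch below is a simplified, hypothetical approximation of that mapping. GA's real Default Channel Grouping has more rules (for example for Display and Affiliates) and its own list of social sites, so treat these conditions and the tiny SOCIAL_SOURCES set as assumptions.

```python
SOCIAL_SOURCES = {"facebook.com", "t.co", "linkedin.com"}  # illustrative only


def channel(source: str, medium: str) -> str:
    """Map a (source, medium) pair to a top-level channel (simplified rules)."""
    medium = medium.lower()
    if source == "(direct)" and medium in {"(none)", "(not set)"}:
        return "Direct"
    if medium == "organic":
        return "Organic Search"
    if medium in {"cpc", "ppc", "paidsearch"}:
        return "Paid Search"
    if medium == "email":
        return "Email"
    # Checked before Referral, so a link from t.co counts as Social.
    if source in SOCIAL_SOURCES or medium in {"social", "social-network"}:
        return "Social"
    if medium == "referral":
        return "Referral"
    return "(Other)"


for src, med in [("google", "organic"), ("google", "cpc"), ("(direct)", "(none)"),
                 ("t.co", "referral"), ("example.org", "referral")]:
    print(f"{src} / {med} -> {channel(src, med)}")
```

The order of the checks is the key design choice: the same session can satisfy several rules, and the first match wins.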

(2) Differentiating between channels: organic search and direct

(direct): Direct traffic is when someone comes straight to our site, i.e., they type the address into the browser bar or click a bookmark.

‘(not provided)’: Google now encrypts search data to keep governments, hackers or spies from getting value from it, so the organic search keywords show up as ‘(not provided)’.

(3) Unlocking Mysterious Dark Social Traffic

There are 6 ways dark social traffic can reach the site:

  1. Email and messages: The traffic comes from someone’s email program. This is not tracked by Google Analytics because GA lives in browsers.
  2. Links in documents: The link sits in an application that Google Analytics does not track.
  3.,, etc.
  4. Mobile social apps: Twitter, etc.
  5. From HTTPS to HTTP: a link from a secure (HTTPS) page to a non-secure (HTTP) page arrives without a referrer.
  6. Search errors
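Because dark social visits usually arrive without a referrer, GA files them under (direct). A common rule of thumb, and only a heuristic, is that a genuinely direct visitor tends to type a short address such as the home page, while a "direct" visit that lands on a long, deep URL was probably a shared link. Here is a Python sketch of that heuristic; the function name looks_like_dark_social and the depth threshold are assumptions.

```python
from urllib.parse import urlparse


def looks_like_dark_social(referrer: str, landing_url: str, min_depth: int = 2) -> bool:
    """Heuristic: no referrer + a deep landing page => probably a shared link."""
    if referrer:  # a referrer was sent, so the visit is not "dark"
        return False
    path = urlparse(landing_url).path
    depth = len([part for part in path.split("/") if part])
    return depth >= min_depth


print(looks_like_dark_social("", "https://example.com/"))                          # False
print(looks_like_dark_social("", "https://example.com/blog/2020/ga-cheat-sheet"))  # True
print(looks_like_dark_social("https://t.co/x", "https://example.com/blog/post"))   # False
```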

(4) Drilling down to track who goes where

Starting from Source/Medium, we can drill down to see where specific traffic, such as visitors from a trigger email, goes.

(5) Spotting the ‘Ghost Spam’ in referrals

Ghost spam: 

It isn’t really hurting anything. In fact, if we are really organised and we are the only ones looking at the reports, we can leave it alone and nothing happens.

These are noxious ‘visits’ to our site, made with the nefarious intent of getting us to click on the links and visit the spammer’s site. They are not actual visits: these sessions and pageviews come from bots that either hit our site and execute the Google Analytics script, or bypass our server entirely and send hits to Google Analytics directly.

First, we need to find the ghost referrals:

They come from hostnames that are not our site.

Check them under Acquisition –> All Traffic –> Referrals.

There we can see some websites with a 100% bounce rate and 0s average session duration.

We can also spot them in Acquisition –> All Traffic –> Source/Medium, e.g., / referral.
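The two checks above, a hostname that is not ours and zero-engagement metrics, can be combined. Here is a Python sketch over hypothetical report rows. VALID_HOSTNAME is an assumed regex for our own domain (example.com), similar in spirit to the valid-hostname filter often used against ghost spam; the row fields and spam domains are made up for illustration.

```python
import re
from typing import Dict, List

# Assumed: our site lives on example.com and its subdomains.
VALID_HOSTNAME = re.compile(r"(^|\.)example\.com$")


def ghost_referrals(rows: List[Dict]) -> List[str]:
    """Return referral sources whose hits were not on our hostname,
    or that show a 100% bounce rate with 0 s average session duration."""
    flagged = []
    for r in rows:
        wrong_host = not VALID_HOSTNAME.search(r["hostname"])
        zero_engagement = r["bounce_rate"] == 1.0 and r["avg_duration_s"] == 0
        if wrong_host or zero_engagement:
            flagged.append(r["source"])
    return flagged


rows = [
    {"source": "partner-blog.example", "hostname": "www.example.com", "bounce_rate": 0.4, "avg_duration_s": 75},
    {"source": "spam-one.example", "hostname": "(not set)", "bounce_rate": 1.0, "avg_duration_s": 0},
    {"source": "spam-two.example", "hostname": "www.example.com", "bounce_rate": 1.0, "avg_duration_s": 0},
]
print(ghost_referrals(rows))  # ['spam-one.example', 'spam-two.example']
```

A 100% bounce rate alone can also be legitimate for a small referrer, so flagged sources are candidates to review, not proof of spam.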

How to remove the ghost spam and fake traffic from Google Analytics?

We will talk more about it in the next blog.

If you are interested in or have any problems with Business Intelligence or Google Analytics, feel free to contact me.

Or you can connect with me through my LinkedIn.

A special Easter Day

On this special Easter Day, we New Zealanders need to stay in our own ‘bubbles’.

So it is a good time to do some learning.

Pluralsight is now offering all courses free in April.

I completed the Google Analytics for Creative Professionals course there.

I highly recommend its methodology:

  1. Look for top-level outliers + Mix & Match (segment)
  2. Go to pages and look for issues (technical, content & design)

The part on spotting ‘Ghost Spam’ in referrals and removing it with regex was especially useful.

In business, making decisions by combining statistics with insights and our knowledge of human wants and needs is called being data-informed.

Next target: Architecting Data Warehousing Solutions Using Google BigQuery

If you are interested in or have any problems with BigQuery or Business Intelligence, feel free to contact me.


Google Cloud Platform Fundamentals

Google Cloud is seen as a leader in areas including data analytics, machine learning and open source, and digital transformation through the cloud allows companies to deliver personalised, high-quality experiences.

During this lockdown in New Zealand, working from home means spending less time in traffic and more time learning advanced techniques.

So stay positive and stay safe!

The GCP Fundamentals course is a perfect opportunity for anyone who wants to learn Google Cloud Platform.

If you are interested in or have any problems with Business Intelligence, feel free to contact me.


DAX Cheat Sheet

What is DAX?

DAX (Data Analysis Expressions) is the formula language of Power Pivot, SSAS Tabular and Power BI.

It resembles Excel formulas because it was born in Power Pivot, but it has no concept of addressing a <row> or <column> the way Excel cell references do, and it has a different type system.

Most importantly, it has many new functions.

The two most important concepts are measures and calculated columns:

A measure calculates aggregates, e.g., SUM or AVERAGE, and is evaluated in the context of the cell in a report or a DAX query.

A calculated column is evaluated for each row and computed at the row level within the table it belongs to.
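The difference between the two is where and when the computation happens. Here is an analogy in Python (not DAX) over a hypothetical Sales table: the calculated column is computed once per row and stored, while the measure is an aggregate recomputed for whatever filter context (here, a product filter) the report cell applies.

```python
from typing import Dict, List, Optional

# Hypothetical Sales table.
sales: List[Dict] = [
    {"product": "Bike", "qty": 2, "price": 500.0},
    {"product": "Helmet", "qty": 5, "price": 40.0},
    {"product": "Bike", "qty": 1, "price": 450.0},
]

# Calculated column: evaluated row by row and stored in the table,
# like LineTotal = Sales[qty] * Sales[price].
for row in sales:
    row["line_total"] = row["qty"] * row["price"]


def total_sales(rows: List[Dict], product: Optional[str] = None) -> float:
    """Measure: an aggregate evaluated in a filter context,
    like Total Sales = SUM(Sales[line_total]) in a report cell."""
    return sum(r["line_total"] for r in rows if product is None or r["product"] == product)


print(total_sales(sales))          # grand total: 1000 + 200 + 450 = 1650.0
print(total_sales(sales, "Bike"))  # cell filtered to Bike: 1450.0
```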

Some Common DAX Expressions:

  1. LOOKUPVALUE(<result_columnName>, <search_columnName>, <search_value>)
     Returns the value in result_columnName for the row that meets all criteria specified by search_columnName and search_value.
  2. FILTER(<table>, <filter>)
     Returns a subset of a table or expression.
  3. ALL(<table> or <column>)
     Returns all the rows in a table, or all the values in a column, ignoring any filters that may have been applied.
  4. RELATED(<column>)
     Returns a related value from another table.
  5. CALCULATE(<expression>, <filter1>, <filter2>)
     Evaluates an expression in a context that is modified by the specified filters.
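To build intuition for what these functions return, here is a rough Python analogy (again not DAX, and ignoring real details such as context transition) over two hypothetical tables, Sales and Product, related by product_id. Each helper is named after the DAX function it imitates; none of them is a real API.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical tables: Sales is related to Product through product_id.
product: List[Row] = [
    {"product_id": 1, "name": "Bike", "category": "Bikes"},
    {"product_id": 2, "name": "Helmet", "category": "Accessories"},
]
sales: List[Row] = [
    {"product_id": 1, "amount": 1000.0},
    {"product_id": 2, "amount": 200.0},
    {"product_id": 1, "amount": 450.0},
]


def lookupvalue(table: List[Row], result_col: str, search_col: str, search_value: object) -> object:
    """Like LOOKUPVALUE: result_col of the row(s) where search_col = search_value."""
    found = {r[result_col] for r in table if r[search_col] == search_value}
    if len(found) > 1:
        raise ValueError("multiple values found where a single value was expected")
    return found.pop() if found else None  # None plays the role of BLANK


def filter_table(table: List[Row], condition: Callable[[Row], bool]) -> List[Row]:
    """Like FILTER: the subset of rows for which the condition is true."""
    return [r for r in table if condition(r)]


def related(row: Row, column: str) -> object:
    """Like RELATED: fetch a column from the Product row this Sales row points to."""
    return lookupvalue(product, column, "product_id", row["product_id"])


def calculate_sum(table: List[Row], column: str, *filters: Callable[[Row], bool]) -> float:
    """Like CALCULATE(SUM(column), filter1, filter2, ...): aggregate after applying
    the extra filters. No filters over the whole table resembles ALL(table)."""
    rows = table
    for f in filters:
        rows = filter_table(rows, f)
    return sum(r[column] for r in rows)


print(lookupvalue(product, "name", "product_id", 2))  # Helmet
print(calculate_sum(sales, "amount", lambda r: related(r, "category") == "Bikes"))  # 1450.0
```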

If you are interested in or have any problems with DAX or Business Intelligence, feel free to contact me.


Agile Scrum Workflow

Scrum is one of the Agile frameworks:

1 User story and refinement:

Input from executives, team, stakeholders, customers

2 Product Backlog:

Ranked list of what is required: features, stories

3 Sprint Planning Meeting:

The team selects items starting at the top of the backlog, taking as much as it can commit to delivering by the end of the sprint.
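Sprint planning pulls work from the top of the ranked backlog until the team's capacity is used up. Here is a Python sketch of that rule, assuming each story has a hypothetical point estimate and the team states its capacity in points. It stops at the first story that does not fit, so the ranking is respected rather than skipping ahead to smaller items; a team could reasonably choose the other policy.

```python
from typing import List, Tuple


def plan_sprint(backlog: List[Tuple[str, int]], capacity: int) -> List[str]:
    """Take stories from the top of a ranked backlog while they fit the capacity."""
    committed: List[str] = []
    used = 0
    for story, points in backlog:  # backlog is already ranked, top first
        if used + points > capacity:
            break  # stop at the first story that does not fit
        committed.append(story)
        used += points
    return committed


backlog = [("Login page", 5), ("Password reset", 3), ("Reports export", 8), ("Dark mode", 2)]
print(plan_sprint(backlog, capacity=10))  # ['Login page', 'Password reset']
```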

4 Sprint Backlog: Task breakout

5 Stand-up meeting: daily discussion between team members

6 Sprint end date and team deliverable do not change

7 Sprint review, finished work and sprint retrospective

If you are interested in or have any problems with Agile or Business Intelligence, feel free to contact me.



What is a view?

A view is a virtual table based on the result set of an SQL statement.

Here is an example.

create view [sales].[v.salesbyperson2]
as
select salespersonid, round(totaldue, 2) as salesamount
from sales.salesorderheader;  -- assumed source table; the FROM clause was missing
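To try a view without a SQL Server instance, here is a sketch using Python's built-in sqlite3 module. SQLite has no SQL Server-style schemas, so the view is created without the [sales] prefix; the salesorderheader table and its rows are hypothetical stand-ins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the source table (SQLite has no [sales] schema).
conn.execute("CREATE TABLE salesorderheader (salespersonid INTEGER, totaldue REAL)")
conn.executemany("INSERT INTO salesorderheader VALUES (?, ?)", [(274, 123.456), (275, 99.999)])

# The view stores only the SELECT statement, not a copy of the data.
conn.execute(
    """CREATE VIEW [v.salesbyperson2] AS
       SELECT salespersonid, ROUND(totaldue, 2) AS salesamount
       FROM salesorderheader"""
)

rows = conn.execute("SELECT * FROM [v.salesbyperson2] ORDER BY salespersonid").fetchall()
print(rows)

# A new row in the base table shows up in the view immediately.
conn.execute("INSERT INTO salesorderheader VALUES (276, 50.0)")
count = conn.execute("SELECT COUNT(*) FROM [v.salesbyperson2]").fetchone()[0]
print(count)  # 3
```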

What is a synonym?

A synonym, as the name suggests, is an alternative name we create for another database object.

Here is the syntax:

create synonym [synonym_name]
for [server_name].[database_name].[schema_name].[object_name]

What is a trigger?

A trigger is a special kind of stored procedure that automatically executes when an event occurs in the database server.

In SQL Server, a DML trigger fires when any valid event occurs, regardless of whether or not any table rows are affected.

Here is an example:

create trigger reminder1
on sales.customer
after insert, update
as raiserror ('notify customer relations', 16, 10);
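SQLite has no RAISERROR, and its triggers always fire FOR EACH ROW rather than once per statement, so this Python sqlite3 sketch is an adaptation rather than a translation. Instead of raising a message, two hypothetical triggers write a reminder into a notifications table whenever a customer row is inserted or updated. Note that the UPDATE matching no rows fires nothing here, whereas the SQL Server trigger above would still fire for that statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE notifications (message TEXT);

    -- SQLite needs one trigger per event; T-SQL allows AFTER INSERT, UPDATE.
    CREATE TRIGGER reminder1_insert AFTER INSERT ON customer
    BEGIN
        INSERT INTO notifications VALUES ('Notify Customer Relations: new ' || NEW.name);
    END;

    CREATE TRIGGER reminder1_update AFTER UPDATE ON customer
    BEGIN
        INSERT INTO notifications VALUES ('Notify Customer Relations: updated ' || NEW.name);
    END;
    """
)

conn.execute("INSERT INTO customer (name) VALUES ('Aroha')")
conn.execute("UPDATE customer SET name = 'Aroha Smith' WHERE id = 1")
conn.execute("UPDATE customer SET name = 'Nobody' WHERE id = 99")  # no rows, so no row-level trigger fires

messages = [m for (m,) in conn.execute("SELECT message FROM notifications ORDER BY rowid")]
print(messages)
```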

If you are interested in or have any problems with SQL or Business Intelligence, feel free to contact me.


How to do web scraping with Power Query in Power BI?

On the web, there are public data sources that can be imported and transformed to build insightful reports or dashboards.

There are two circumstances: the data is either in a standard HTML table, or it is not.

Importing a standard HTML table is super easy.

For example, let's try an easy one.

Here is the web link: What we want is to import the tables from a Wikipedia webpage.

The simple steps are as follows:

Open Power BI Desktop –> click Get Data –> click Web –> paste the URL into the dialog and click OK.

Here is the screenshot:

In the Navigator dialog, we can select the table we want to import and then transform it.

This kind of web import in Power BI is simple because the tables are inside <table>…</table> tags, which Power Query can identify.
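The reason <table> tags make this easy is that rows and cells are explicitly marked up. As an illustration only (this is not how Power Query works internally), here is a Python sketch using the standard-library html.parser that collects the text of every <th>/<td> cell, row by row, from a hypothetical HTML snippet.

```python
from html.parser import HTMLParser
from typing import List


class TableParser(HTMLParser):
    """Collect the text of <th>/<td> cells, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate a cell that appears before any <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:  # ignore whitespace between tags
            self.rows[-1][-1] += data.strip()


html = """
<table>
  <tr><th>Country</th><th>Capital</th></tr>
  <tr><td>New Zealand</td><td>Wellington</td></tr>
</table>
"""
parser = TableParser()
parser.feed(html)
print(parser.rows)  # [['Country', 'Capital'], ['New Zealand', 'Wellington']]
```

When a page builds its "table" from other elements, there are no <tr>/<td> markers to follow, which is why we need to find the right path in the HTML tree ourselves.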

However, many tables on web pages are not built with these tags.

If we meet this circumstance, we end up seeing the Document entity in the Navigator.

In the next blog, we will demonstrate how to extract the relevant elements from the HTML with Power Query in Power BI.

The tricky part is finding the correct path in the HTML source tree, which requires some basic HTML knowledge, though nothing advanced.

To motivate you to keep reading, here is a screenshot of the final web scraping result I made recently:

If you are interested in or have any problems with Power BI or Power Query, feel free to contact me.


Create an SSIS Project in Visual Studio 2015

Prerequisites:

  1. SQL Server Data Tools 2015 (installs a shell of Visual Studio 2015)
  2. Visual Studio 2015

What is SSIS?

SSIS stands for SQL Server Integration Services. SSIS performs ETL (Extract, Transform and Load) tasks for data warehouses.

SSIS can also update data warehouses, clean and mine data, create ‘packages’, manipulate data, etc.

To design an ETL task in Visual Studio, we use a data flow.

What is data flow?

A data flow defines a flow of data from a source to a destination.
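Conceptually, a data flow is a pipeline from a source, through transformations, to a destination. Here is a Python sketch of that shape using generators, assuming a hypothetical CSV source held in memory and a plain list as the destination. In SSIS these steps would be components such as an OLE DB Source and Destination, not Python functions.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

SOURCE_CSV = "id,name,amount\n1, alice ,10.5\n2,bob,\n3,carol,7\n"  # hypothetical source data


def extract(text: str) -> Iterator[Dict[str, str]]:
    """Source: read rows from CSV text."""
    yield from csv.DictReader(io.StringIO(text))


def transform(rows: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Transformations: trim and capitalise names, drop rows with no amount."""
    for row in rows:
        if not row["amount"]:
            continue  # similar to a Conditional Split routing bad rows away
        yield {"id": int(row["id"]), "name": row["name"].strip().title(), "amount": float(row["amount"])}


def load(rows: Iterable[Dict[str, object]], destination: List[Dict[str, object]]) -> int:
    """Destination: append rows and return how many were loaded."""
    before = len(destination)
    destination.extend(rows)
    return len(destination) - before


warehouse: List[Dict[str, object]] = []
print(load(transform(extract(SOURCE_CSV)), warehouse))  # 2
print(warehouse[0])  # {'id': 1, 'name': 'Alice', 'amount': 10.5}
```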

Before designing the data flow, we drag a Data Flow Task and drop it onto the control flow.

What is a control flow?

A control flow orchestrates different tasks. It provides the logic for when data flow components run and how they run.

For example, it can:

  1. perform looping
  2. call stored procedures
  3. move files
  4. manage error handling
  5. check a condition and call different tasks

It defines a workflow of tasks to be executed, often in a particular order (assuming you have included precedence constraints).
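Precedence constraints turn the control flow into a dependency graph: a task runs only after the tasks it depends on have finished. Here is a Python sketch using the standard library's graphlib (Python 3.9+) to derive a valid run order. The task names are hypothetical, and real SSIS constraints can also branch on success, failure or completion, which this sketch reduces to the simplest policy: stop everything at the first failure.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Each task maps to the set of tasks that must finish before it (hypothetical names).
constraints: Dict[str, Set[str]] = {
    "Truncate staging": set(),
    "Load customers": {"Truncate staging"},
    "Load orders": {"Truncate staging"},
    "Build fact table": {"Load customers", "Load orders"},
}


def run_control_flow(graph: Dict[str, Set[str]], run_task: Callable[[str], bool]) -> List[str]:
    """Run tasks in dependency order; stop everything at the first failed task."""
    executed: List[str] = []
    for task in TopologicalSorter(graph).static_order():
        executed.append(task)
        if not run_task(task):
            break  # simplest policy: a failure halts the whole workflow
    return executed


print(run_control_flow(constraints, lambda task: True))
```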

We can rename the data flow task in the control flow.

Then, inside the data flow, we can drag in an OLE DB Source, for example, to connect to a relational database.

After that, we can follow the official Microsoft tutorial step by step.

If you are interested in or have any problems with SSIS, feel free to contact me.


Some Prerequisite Knowledge About Spatial Data

Spatial Data Types:

There are two kinds of spatial data types in SQL Server:

  1. geometry: represents data on a flat 2D surface. Suppose X = 3 and Y = 4; then our point is represented as POINT (3 4).
  2. geography: uses the same methods, but the data type reflects the fact that we live on a curved 2D surface, the earth.
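For the geometry type, distance is plain planar (Euclidean) arithmetic on the X and Y coordinates. Here is a small Python sketch that parses the Well-Known Text form POINT (3 4) and measures its distance from POINT (0 0). parse_wkt_point is a hypothetical helper; in SQL Server the same measurement would be done with STDistance on geometry values.

```python
import math
import re
from typing import Tuple

POINT_RE = re.compile(r"^\s*POINT\s*\(\s*(-?[\d.]+)\s+(-?[\d.]+)\s*\)\s*$", re.IGNORECASE)


def parse_wkt_point(wkt: str) -> Tuple[float, float]:
    """Parse 'POINT (x y)' into an (x, y) tuple."""
    m = POINT_RE.match(wkt)
    if m is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(m.group(1)), float(m.group(2))


def planar_distance(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Euclidean distance on a flat surface, as for the geometry type."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


p = parse_wkt_point("POINT (3 4)")
print(p, planar_distance((0.0, 0.0), p))  # (3.0, 4.0) 5.0
```

On the curved surface used by the geography type, this flat formula no longer holds, which is why geography needs a spatial reference system.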

However, a key difference between the two spatial data types is the need for Spatial Reference IDs (SRIDs).

Both geometry and geography data types have two parts: the coordinates of the object and the SRID number.

To check the list of SRIDs in SQL Server, we can query the sys.spatial_reference_systems catalog view.

SRID numbers come from the EPSG standard. In SQL Server, the default SRID for geometry data is 0, and for geography it is 4326.

An example:

Here is a link to a stored procedure on my GitHub, which finds the nearest suburb for each public transport stop:

In this stored procedure, we can see:

SET @geo1=geography::Point(@stationlat,@stationlong,4326);

This creates a geography point from the stop's latitude and longitude with SRID 4326, an application of the geography data type and its SRID.
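The same nearest-suburb idea can be sketched outside the database. SQL Server computes geography distances using the earth model of the SRID (4326 here); this Python sketch swaps in the simpler haversine formula on a sphere of radius 6,371 km, which is close but not identical. The suburb coordinates are made-up examples and nearest_suburb is a hypothetical name. Like geography::Point, the tuples put latitude first (the WKT form POINT (x y) would list longitude first).

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean earth radius; a sphere is an approximation


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, long) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_suburb(stop: Tuple[float, float], suburbs: Dict[str, Tuple[float, float]]) -> str:
    """Name of the suburb whose centre point is closest to the stop."""
    return min(suburbs, key=lambda name: haversine_km(stop, suburbs[name]))


# Hypothetical suburb centre points (lat, long), not real survey data.
suburbs = {"Suburb A": (-36.85, 174.76), "Suburb B": (-36.90, 174.80), "Suburb C": (-36.80, 174.70)}
stop = (-36.89, 174.79)
print(nearest_suburb(stop, suburbs))  # Suburb B
```

For every stop this compares against every suburb, just like a cursor or CROSS APPLY over all suburbs would; with many rows, a spatial index is what keeps the database version fast.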

If you are interested in or have any problems with SQL, feel free to contact me.


SQL Correlated Subqueries

Recently I have made some mistakes with subqueries, so I wrote this blog about correlated subqueries.

First, here is a SQL practice website:

It is free, and SQL beginners can easily work through the exercises step by step.

A correlated subquery works like a nested loop: the subquery only has access to rows related to a single record at a time in the outer query.

The technique relies on table aliases to identify two different usages of the same table: one in the outer query and the other in the subquery.

Here is a table called world, an example from the sqlzoo website:

name          continent  area     population  gdp
Afghanistan   Asia       652230   25500100    20343000000
Albania       Europe     28748    2831741     12960000000
Algeria       Africa     2381741  37100000    188681000000
Andorra       Europe     468      78115       3712000000
Angola        Africa     1246700  20609294    100990000000

Question: Find the largest country (by area) in each continent, show the continent, the name and the area.

SQL answer using a correlated subquery:

SELECT continent, name, area 
FROM world x
WHERE area >= ALL
    (SELECT area FROM world y
     WHERE y.continent=x.continent AND area>0)

One way to interpret the line in the WHERE clause that references the two tables is “… where the correlated values are the same”.

In this example, we can read it as “select the country details from the world table where the area is larger than or equal to the area of all countries on the same continent”.
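To run the example locally, here is a sketch with Python's built-in sqlite3 and the five sample rows above. SQLite does not support the >= ALL (subquery) syntax, so this version expresses the same correlated condition with NOT EXISTS: keep a country if no other country on the same continent has a larger area. For these rows the result matches the original query, and the aliases x and y play the same roles.

```python
import sqlite3

rows = [
    ("Afghanistan", "Asia", 652230, 25500100, 20343000000),
    ("Albania", "Europe", 28748, 2831741, 12960000000),
    ("Algeria", "Africa", 2381741, 37100000, 188681000000),
    ("Andorra", "Europe", 468, 78115, 3712000000),
    ("Angola", "Africa", 1246700, 20609294, 100990000000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area INTEGER, population INTEGER, gdp INTEGER)")
conn.executemany("INSERT INTO world VALUES (?, ?, ?, ?, ?)", rows)

# For each outer row x, the subquery is re-evaluated against rows y on the same continent.
query = """
SELECT continent, name, area
FROM world x
WHERE NOT EXISTS (
    SELECT 1 FROM world y
    WHERE y.continent = x.continent AND y.area > x.area
)
ORDER BY continent
"""
for row in conn.execute(query):
    print(row)
# ('Africa', 'Algeria', 2381741)
# ('Asia', 'Afghanistan', 652230)
# ('Europe', 'Albania', 28748)
```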

If you are interested in or have any problems with SQL, feel free to contact me.
