Last blog I gave some examples of how we can use the Chrome User Experience Report (CrUX) to gain insights about site speed. In this blog, I will show you how to use BigQuery to compare your site with its competitors.
Prerequisites:
Log into Google Cloud
Create a project for the CrUX work
Navigate to the BigQuery console
Add the chrome-ux-report dataset and explore how the tables are structured in ‘Preview’
Step one: Figure out the origin of your site and the competitor sites
The LIKE syntax is preferred here (take care of the syntax differences between Standard SQL and Legacy SQL)
-- created by: Jacqui Wu
-- data source: Chrome-ux-report(202003)
-- last update: 12/05/2020
SELECT
DISTINCT origin
FROM
`chrome-ux-report.all.202003`
WHERE
origin LIKE '%yoursite'
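Since an origin includes the protocol, the same hostname can appear under both http and https. As a hedged variant of the query above (the hostname below is a placeholder, not a real site), we can check both at once:

```sql
-- Hypothetical variant: match both protocol versions of a site
SELECT DISTINCT origin
FROM `chrome-ux-report.all.202003`
WHERE origin IN ('https://www.yoursite.com', 'http://www.yoursite.com')
```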
Step two: Figure out what should be queried in the SELECT clause
What can we query from CrUX?
The specific elements that Google is sharing are:
“Origin”, which consists of the protocol and hostname, as we used in step one, and identifies the site
Effective Connection Type (4G, 3G, etc), which can be queried as the network
Form Factor (desktop, mobile, tablet), which can be queried as the device
Percentile Histogram data for First Paint, First Contentful Paint, DOM Content Loaded and onLoad (these are all nested, so if we want to query them, we need to unnest them)
Here I create a SQL query of the FCP distribution across different sites. FCP measures the time from navigation to the time when the browser renders the first bit of content from the DOM.
This is an important milestone for users because it provides feedback that the page is actually loading.
SQL queries:
-- created by: Jacqui Wu
-- data source: Chrome-ux-report(202003) in different sites
-- last update: 12/05/2020
-- Comparing fcp metric in Different Sites
SELECT
origin,
form_factor.name AS device,
effective_connection_type.name AS conn,
"first contentful paint" AS metric,
bin.start / 1000 AS bin,
SUM(bin.density) AS volume
FROM (
SELECT origin, form_factor, effective_connection_type, first_contentful_paint.histogram.bin AS bins
FROM `chrome-ux-report.all.202003`
WHERE origin IN ("your site URL link", "competitor A site URL link", "competitor B site URL link")
)
CROSS JOIN UNNEST(bins) AS bin
GROUP BY origin, device, conn, bin
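To reduce each site's histogram to a single comparable number, we can also sum the density of the bins below a threshold. This is a sketch of my own (the one-second "fast" cutoff and the placeholder origins are assumptions, not part of the CrUX schema), computing the share of page loads with a fast FCP per origin:

```sql
-- Hypothetical sketch: share of page loads with FCP under 1 second, per origin
SELECT
origin,
ROUND(SUM(IF(bin.start < 1000, bin.density, 0)) / SUM(bin.density), 4) AS fast_fcp_share
FROM
`chrome-ux-report.all.202003`,
UNNEST(first_contentful_paint.histogram.bin) AS bin
WHERE
origin IN ("your site URL link", "competitor A site URL link")
GROUP BY origin
```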
Step 3: Export the results to Data Studio (Google's visualization tool)
Here are some tips that may be useful:
A line chart is preferred for comparing different sites in Visual Selection
Set the x-axis to bin (which we already converted to seconds) and the y-axis to the percentage of FCP
Set filters (origin, device, conn) in the Filtering section
Wrapping up
This post explored the data pipeline we can build with the CrUX report to analyze site performance. In the future, I will write more about CrUX.
If you are interested in or have any problems with CrUX or Business Intelligence, feel free to contact me.
Data-driven: Decisions made only based upon statistics, which can be misleading.
Data-informed: Decisions made by combining statistics with insights and our knowledge of human wants & needs.
We will be able to use data and human creativity to come up with innovative solutions in a business.
When we click into Google Analytics, we see a huge number of lines, full of data and strange names.
But don't panic: we can break things down.
In this blog, I will illustrate that Analytics + Art = Creative Data Scientist.
Agenda
Installing And Customising Google Analytics
Learning Dashboard
Analysing Behaviours
User Acquisition
Generate And Share Reports
1. Installing And Customising Google Analytics
When we install Google Analytics, there are several terms we should pay attention to:
Tracking code: a basic code snippet for a website. It starts with UA, which stands for Universal Analytics.
Data collection: turn it on to allow Google Analytics to collect data about users.
User-ID: allows us to track users. It generates a unique user ID, makes sure the right ID is assigned to the right user, and associates that user's data in Google Analytics.
Session settings: a session is any time a user has loaded up your site on their device. We can set the session timeout here.
Referral exclusion list: mostly, we set this to our own site URL.
2. Learning Dashboards:
Admin page: account, property, view
Under the account, there are several properties.
Take one account as an example: there are many URLs associated with this account.
In other words, we have a site, and under this site we have a whole bunch of other properties that we've associated here.
The tracking ID is the same, except for a number at the very end.
Google Ads and Google AdSense:
Ads are the words we buy from Google. They are links or text that appear on top of Google search pages. AdSense ads are the ones Google sells that we can insert on our own site. We use AdSense if we are a publisher and want to monetise our content.
Set the ‘Bot Filtering’ in the view setting under the view page of Admin.
It excludes all hits from known bots and spiders. Google, Yahoo and others have programs to analyze and index content. 'Bots' is short for robots; 'spiders' got the name because these little programs crawl the web.
Turning it on is not mandatory, but if we have a big site, or we are professionals who want to do analytics, we should gain insights about what humans do (content is for humans) and exclude the robots. If we are interested in what users are doing on our site, this is an easy switch to flip.
After all, the robots just crawl the data from our sites.
Sidebar
Then we can have a look at the sidebar.
Customisation: I suggest installing a custom report to give it a try.
Real-time: what users are doing this very second.
Audience: we will see who, what, when, where and from where. A big module.
Acquisition: where traffic comes from and how our marketing efforts are working. If we have a look at channels, we will see direct, paid search, organic search, (other), referral, email and social. This tells us again: how do people find me? How is my marketing working? Are we just wasting our money?
Behaviour: this is kind of fascinating; think of it as the security camera in the store. We are watching our users picking up items, checking out or running out of the exits.
Conversions: the happy part. This is where we track and figure out how well our site turns visitors into customers.
3. Analysing Audience Behaviour
(1) Conversion vs Engagement:
Conversion: A one-time interaction. Granted, this is a powerful interaction, but it is the end goal of a chain of events.
Engagement: repeated use that results in an emotional, psychological and sometimes near-physical tie that users have to products, e.g., Apple fans.
Build a hypothesis via the audience overview
There are a lot of opportunities to grow if we were to take this site to have it available in other languages.
(2) Active users
From the line in active users, we can see whether anything is affecting the traffic, or whether there is really no marketing being done.
(3) Cohort analysis
A cohort is a group of users who all share a common characteristic; in this case the acquisition date, the day they first came to your site, known here as day 0. The metrics here are used to analyse user behaviour.
We can see how things are going on day 1, i.e., how many people came back the next day.
Track individual users with user explorer
How to use segmentation to refine demographics and interests
It helps us know who our audience is and what type of content we are trying to expose them to, which impacts our design choices.
(4) Demographics
If it is mostly young people who use smartphones, we should simplify the navigation choices.
(5) Interests
A target-rich environment for the site can be the combination of its top 3 interests.
We can create a 'tablet traffic' segment to give it a try and see whether there are differences from all users.
(6) Geo
Language & Location. We can set the segment like ‘converters’ to compare with the ‘all users’ to find some differences.
(7) Behaviours:
We can set 'mobile and tablet traffic' as a segment to compare with 'all users'. We can find out after how many seconds people pay more attention, whether the numbers are trending up, and whether we held their attention for a long span of time.
(8) Technology
Browser & OS: the Flash version matters if we want to run ads on the website; we need to make sure they actually display.
Network: this can be a really big deal if we are working with users in areas where we know they have very slow connections. Do we need to build a simplified page for them? This is called adaptive design.
(9) Mobile
If something is strange but not significant, we can just move on.
Benchmarking and users flow
Page Analytics (a plugin we can find in Chrome store)
4. User Acquisition
(1) Learning about channels, sources and mediums
There are many questions here:
Well, how do users get to the site?
Sources, searches and referrals
SEO and what users are looking for
Social statistics and …
Channels: The general, top-level categories that our traffic is coming from, such as search, referral or social
Sources: A subcategory of a channel. For example, search is a channel. Inside that channel, Yahoo Search is a source.
Mediums: By which the traffic from a source is coming to our site. That is, if the traffic is coming from Google, is it organic search or paid search?
(2) Differentiating between channels-organic search and direct
(direct): direct traffic is where someone comes directly to our site, i.e., types the address into the browser bar or clicks on a bookmark.
'not provided': data that comes through Google is now encrypted to keep governments, hackers or spies from getting value from it.
(3) Unlocking Mysterious Dark Social Traffic
There are 6 ways that the dark social traffic can come to the site.
Email, messages: the traffic comes from someone's email program. This is not tracked by Google Analytics because GA lives in browsers.
Links in docs: the link lives in an application that is not tracked by Google Analytics.
Bit.ly, Ow.ly, etc.
Mobile social: twitter etc.
From https to http
Search errors
(4) Drilling down to track who goes where
From source/mediums: trigger email
(5) Spotting the ‘Ghost Spam’ in referrals
Ghost spam:
It isn't really hurting anything. In fact, if we are organized and we are the only ones looking at the reports, we can leave it alone and nothing happens.
But these are noxious visits to our site, made with the nefarious intent of getting us to click on the links and visit the spammer's site. They are not actual visits: these sessions and pageviews come from bots that either hit our site and execute the Google Analytics scripts, or bypass the server and hit Google Analytics directly.
Firstly, we need to find the ghost referrals:
They come from hostnames that are not our site.
Check them through: Acquisition–>All traffic–>Referrals.
We can see there are some websites with a 100% bounce rate and a 0s average session duration.
We can also see entries like xxxxxx.com / referral in Acquisition–>All traffic–>Source/Medium.
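For sites that export GA data to BigQuery (available with Google Analytics 360), that same bounce-rate-and-duration pattern can be checked in SQL. This is only a sketch of my own: the project/dataset path is a placeholder, and treating "all sessions bounced with zero time on site" as the spam signature is an assumption, not an official rule.

```sql
-- Hypothetical sketch: referral sources where every session bounced
-- and total time on site is zero (a typical ghost-spam signature).
SELECT
trafficSource.source AS source,
COUNT(*) AS sessions,
IFNULL(SUM(totals.bounces), 0) AS bounces,
IFNULL(SUM(totals.timeOnSite), 0) AS time_on_site
FROM `your_project.your_dataset.ga_sessions_20200331`
WHERE trafficSource.medium = 'referral'
GROUP BY source
HAVING bounces = sessions AND time_on_site = 0
```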
How to remove the ghost spam and fake traffic from Google Analytics?
Next blog we will talk more about it.
If you are interested in or have any problems with Business Intelligence or Google Analytics, feel free to contact me.
The Google Cloud Platform family of products includes Google App Engine, Google Compute Engine, Google Cloud Datastore, Google Cloud Storage, Google BigQuery (for analytics), and Google Cloud SQL.
The most important product for a BI analyst is BigQuery: a fully managed OLAP data warehouse that supports joins. It lets developers use SQL to query massive amounts of data in seconds.
Why BigQuery?
The main advantage is that BigQuery integrates with Google Analytics. This means we can easily synchronize session/event data to BigQuery to build custom analytics beyond the built-in Google Analytics functions.
In other words, we can dump raw GA data into BigQuery, so custom analytics that can't be performed in the GA interface can now be generated with BigQuery.
Moreover, we can also bring third-party data into it.
The difficulty for a BI analyst is that we need to calculate every metric in queries ourselves.
Which SQL is preferred in Big Query?
Standard SQL syntax is preferred in BigQuery nowadays.
How can we get the data from Google Analytics?
A daily dataset can be exported from GA to BigQuery. Within each dataset, a table is imported for each day of export; its name format is ga_sessions_YYYYMMDD.
We can also set some steps to make sure the tables, dashboards and data transfers are always up-to-date.
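Because the export creates one table per day, a date range can be queried with a wildcard table and _TABLE_SUFFIX. A sketch under the assumption of a placeholder project/dataset path:

```sql
-- Hypothetical sketch: sessions per day across a date range,
-- using a wildcard table over the daily ga_sessions_ exports.
SELECT
_TABLE_SUFFIX AS day,
COUNT(DISTINCT CONCAT(fullVisitorId, CAST(visitId AS STRING))) AS sessions
FROM `your_project.your_dataset.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20200301' AND '20200331'
GROUP BY day
ORDER BY day
```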
How to give it a try?
Firstly, set up a Google Cloud Billing account. With a Google Cloud Billing account, we can use BigQuery web UI with Google Analytics 360.
The next step is to run a SQL query and visualize the output. The query editor is standard and follows the SQL syntax.
For example, here is a sample query that queries user-level data, total visits and page views.
SELECT fullVisitorId,
visitId,
trafficSource.source,
trafficSource.medium,
totals.visits,
totals.pageviews
FROM `ga_sessions_YYYYMMDD`
In this step, if we want a good understanding of the ga_sessions table in BigQuery, we need to know which raw GA data fields are available in BigQuery.
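Building on the sample query above, a session-level result can also be rolled up. This is a sketch of my own (the project/dataset path and date are placeholders), aggregating visits and pageviews per source/medium for one day:

```sql
-- Hypothetical sketch: total visits and pageviews per source/medium
-- for one day of the GA export.
SELECT
trafficSource.source AS source,
trafficSource.medium AS medium,
SUM(totals.visits) AS visits,
SUM(totals.pageviews) AS pageviews
FROM `your_project.your_dataset.ga_sessions_20200331`
GROUP BY source, medium
ORDER BY visits DESC
```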
Next blog we will give more examples about how to analyze GA data in BigQuery according to data ranges or others like users, sessions, traffic sources, etc.
If you are interested in or have any problems with Business Intelligence or BigQuery, feel free to contact me.
Google Cloud is seen as a leader in areas including data analytics, machine learning and open source, and digital transformation through the cloud has allowed companies to deliver personalised, high-quality experiences.
During this lockdown time in New Zealand, working from home means spending less time in traffic and having more time to learn advanced techniques.
So stay positive and stay safe!
Thanks to GCP Fundamentals, this is a perfect opportunity for those who want to learn Google Cloud Platform.
If you are interested in or have any problems with Business Intelligence, feel free to contact me.