CrUX Dashboard&Data Strategy Lifecycle

Last Blog I demonstrated the data pipeline we can use CrUX to analyze the site performance. This is from a BI developer perspective.

However, for a company, especially the leadership team, what they want is the final dashboard that generated from BI department, so management plan can be gained.

I already wrote how to query from Bigquery and what site speed metrics we can use from the introduction of CrUX blog and public dataset analysis blog.

So this blog I will show you what kind of dashboard we can generate after the steps of data collection from Google public dataset and ETL.

What data visualization tool we need to use?

There are bunches of data visualization tools we can use, e.g., Data Studio, Power BI. This time I take Tableau for an example.

I took www.flightcentre.co.nz(blue line) and www.rentalcars.com(red line) as the origin for comparison, set customer’s device is ‘desktop’ (we also can put a filter on it too).

And there are 4 sheets on a dashboard, i.e., Slow FCP Percentage, Fast FCP Percentage, Fast FID Percentage and Slow FID Percentage.

What they actually mean?

  • Slow FCP Percentage(the percentage of users that experienced a first contentful paint time of 1 second or less)
  • Fast FCP Percentage(the percentage of users that experienced a first contentful paint time of 2.5 seconds or more)
  • Fast FID Percentage(the percentage of users that experienced a first input delay time of 50 ms or less)
  • Slow FID Percentage(the percentage of users that experienced a first input delay time of 250 ms or more)

After this graph, we can roughly see that flightcentre has a higher site speed than rentalcars in user experience.

What we can do in the next step?

After that, we can inform devs and communicate impact according to show exactly the area that the site is falling down. We can point to the fact that it’s from real users and how people actually experiencing the site.

The second part is the data strategy lifecycle in a company.

What is the data strategy lifecycle in a company?

Develop the strategy–>Create the roadmap–>Change management plan–>Analytics lifestyle–>Measurement plan

Perspectives:

  • Scope and Purpose: What data will we manage? How much is our data worth? How do we measure success?
  • Data collection: Archiving, what data where and when, single source of truth(data lake), integrating data silos
  • Architecture: Real time vs Batch, data sharing, data management and security, data modelling, visualization
  • Insights and analysis: Data Exploration, self-service, collaboration, managing results
  • Data governance: Identify data owners, strategic leadership. data stewardship. data lineage, quality, and cost
  • Access and security: RBAC, encryption, PII, access processes, audits, regulatory
  • Retention and SLAs: Data tiers and retention, SLA’s to the business

Wrapping up

This post explored the CrUX dashboard BI team can generate and the data strategy in the company. In the future, I will write more.

If you are interested in or have any problems with CrUX or Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.

Tutorial: Using BigQuery to Analyze CrUX Data

Last blog I gave some examples of how we can use the Chrome User Experience report (CrUX) to gain some insights about site speed. This blog I will continue to show you how to use bigquery to compare your site with the competitors.

Prerequisite:

  •  Log into Google Cloud,
  • Create a project for the CrUX work
  • Avigate to BigQuery console
  • Add the chrome-ux-report dataset and explore the way the tables are structured in ‘preview’

Step one: Figure out what is the origin of your site and the competitor site

like syntax is preferred (Take care of the syntax difference between Standard SQL and T-SQL)

 -- created by: Jacqui Wu
  -- data source: Chrome-ux-report(202003)
  -- last update: 12/05/2020
  
SELECT
  DISTINCT origin
FROM
  `chrome-ux-report.all.202003`
WHERE
  origin LIKE '%yoursite'

Step two: Figure out what should be queried in the select clause?

What we can query from CrUX?

The specific elements that Google is sharing are:

  • “Origin”, which consists of the protocol and hostname, as we used in step one, which can make sure the URL link
  • Effective Connection Type (4G, 3G, etc), which can be queried as the network
  • Form Factor (desktop, mobile, tablet), which can be queried as the device
  • Percentile Histogram data for First Paint, First Contentful Paint, DOM Content Loaded and onLoad (these are all nested, so if we want to query them, we need to unnest them)

Here I create a SQL query of FCP percentage in different sites, which measures the time from navigation to the time when the browser renders the first bit of content from the DOM.

This is an important milestone for users because it provides feedback that the page is actually loading.

SQL queries: 

  -- created by: Jacqui Wu
  -- data source: Chrome-ux-report(202003) in diffrent sites
  -- last update: 12/05/2020
  -- Comparing fcp metric in Different Sites

SELECT origin, form_factor.name AS device, effective_connection_type.name AS conn, "first contentful paint" AS metric, bin.start/1000 AS bin, SUM(bin.density) AS volume
FROM(  
SELECT origin, form_factor, effective_connection_type, first_contentful_paint.histogram.bin as bins
FROM `chrome-ux-report.all.202003`
WHERE origin IN ("your site URL link", "competitor A site URL link", "competitor B site URL link")
)
CROSSS JOIN UNNEST(bins) AS bin
GROUP BY origin, device, conn, bin

Step 3: Export the results to the Data Studio(Google visualization tool)

Here are some tips may be useful

  1. Line chart is preferred for comparing different sites in Visual Selection
  2. Set x-axis to bin(which we already calculate it to seconds) and y-axis to percentage of fcp
  3. Set filter(origin, device, conn) in Filtering section

Wrapping up

This post explored the data pipeline we can use CrUX report to analyze the site performance. In the future, I will write more about CrUX.

If you are interested in or have any problems with CrUX or Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.

How to use CrUX to analyze your site?

What is CrUX?

CrUX stands for the Chrome User Experience Report. It provides real world and real user metrics gathered from the millions of Google Chrome users who load millions of websites (include yours) each month. Of course, they all opt-in to syncing their browsing history and have usage statistic reporting enabled.

According to Google, its goal is ‘capture the full range of external factors that shape and contribute to the final user experience’.

In this post, I will walk you through how to use it to get insights of your site’s performance.

Why we need CrUX?

We all know faster site results in a better user experience and a better customer loyalty, compared to the sites of competitors. It results in the revenue increasing. Google confirmed some details about how they understand the speed. They are available in CrUX.

What are CrUX metrics?

  • FP(First Paint): when everything loads on the page
  • FCP(First Content loaded): when some text or an image loaded
  • DCL(DOM content loaded): when DOM is loaded
  • ONLOAD: when any additional scripts have loaded
  • FID(First Input Delay): the time between when a user interacts with your site to when the server actually responds to that

How to generate the CrUX report on PageSpeed Insights?

PageSpeed Insights is a tool for people to understand what a page’s performance is and how to improve it.

It uses the lighthouse to audit the given page and identify opportunities to improve performance. It also integrates with the CrUX to show how real users experience performance on the page.

Take Yahoo as the example, after a few seconds, lighthouse audits will be performed and we will see sections for field and lab data.

In the field data section, we can see FCP and FID (please see the table below as we can see the FCP and FID values).

MetricFastAverageSlow
FCP0-1000ms1000ms-2500ms2500ms+
FID0-50ms50-250ms250ms+

We can see the Yahoo site is in ‘average’ according to the table. To achieve the ‘fast’, both FCP and FID must be categorized as fast.

Also, a percentile can be shown in each metric. For FCP, the 75th percentile is used and for FID, it is the 95th. For example, 75% of FCP experiences on the page are 1.5s or less.

How to use it in BigQuery?

In BigQuery, we can also extract insights about UX on our site.

SELECT origin, form_factor.name AS device, effective_connection_type.name  AS conn, 
       ROUND(SUM(onload.density),4) as density
FROM `chrome-ux-report.all.201907`,
    UNNEST (onload.histogram.bin) as onload
WHERE origin IN ("https://www.yahoo.com")
GROUP BY origin, device, conn

Then we can see the result in BigQuery.

The raw data is organized like a histogram, with bins have a start time, end time and density value. For example, we can query for the percent of ‘fast’ FCP experiences, where ‘fast’ is defined as happening under a second.

We can compare Yahoo with bing. Here is how the query look:

SELECT
  origin,
  SUM(fcp.density) AS fast_fcp
FROM
  `chrome-ux-report.all.201907`,
  UNNEST (first_contentful_paint.histogram.bin) AS fcp
WHERE
  fcp.start<1000
  AND origin IN ('https://www.bing.com',
    'https://www.yahoo.com')
GROUP BY
  origin

Wrapping up

This post explored some methods to get site insights with CrUX report. In the future, I will write more about CrUX.

If you are interested in or have any problems with CrUX or Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.

Analyzing Google Analytics Data in BigQuery (Part1)

What is BigQuery?

Among Google Cloud Platform family products, there are Google App Engine, Google Compute Engine, Google Cloud Datastore, Google Cloud Storage, Google Big Query (for analytics), and Google Cloud SQL.

The most important product for BI Analyst is Big Query, it is an OLAP Data Warehouse which supports DW, Join and fully managed. It can make developers use SQL to query massive amounts of data in seconds.

Why BigQuery?

The main advantage is BigQuery can integrate with Google Analytics. It means we can synchronize Session/Event data to BigQuery easily to make custom analytics, not only the Google Analytics functions.

In other words, BigQuery can dump raw GA data into it. So it means some custom analytics which can’t be performed with the GA interface now can be generated by BigQuery.

Moreover, we can also bring in third-party data into it.

What is the difficulty for BI Analyst, it means we need to calculate every metrics in queries.

Which SQL is preferred in Big Query?

Standard SQL syntax is preferred in Big query nowadays.

How we can get the data from Google Analytics?

A daily dataset can be got from GA to BigQuery. Any within each dataset, a table is imported for each day of export. Its name format is ga_sessions_YYYYMMDD.

We can also set some steps to make sure the tables, dashboards and data transfers are always up-to-date.

How to get it a try?

Firstly, set up a Google Cloud Billing account. With a Google Cloud Billing account, we can use BigQuery web UI with Google Analytics 360.

The next step is to run a SQL query and visualize the output. The query editor is standard and follows the SQL syntax.

For example, here is a sample query that queries user-level data, total visits and page views.

SELECT fullVisitorId,
       visitId,
       trafficSource.source,
       trafficSource.medium,
       totals.visits,
       totals.pageviews,
FROM 'ga_sessions_YYYYMMDD'

In this step, if we need to get a good understanding of ga_sessios_table in BigQuery, we need to make sure what is the available raw GA data fileds can be got in BigQuery.

We can use an interactive visual representation as the reference.

Next blog we will give more examples about how to analyze GA data in BigQuery according to data ranges or others like users, sessions, traffic sources, etc.

If you are interested in or have any problems with Business Intelligence or BigQuery, feel free to contact me.

Or you can connect with me through my LinkedIn.