Why do we need data integration, and what can we do about it?

The first question is: why do we need data integration?

Let me give you an example here to answer this question.

Every company has many departments, and different departments use different tools to store their data. For example, the marketing team may use HubSpot.

So, within one company, different departments store different types of data in different systems.

However, to make business decisions we need to extract insightful information from this large amount of data.

What can we do?

Maybe we could connect to all the databases every time we need to generate a report, but that would cost a large amount of time. This is where the term data integration comes in.

What is data integration?

Data integration is a process in which heterogeneous data is retrieved from different sources and combined into a unified form and structure.

There are several advantages of data integration:

  1. Reduced data complexity
  2. Improved data integrity
  3. Easier data collaboration
  4. Smarter business decisions

There are also some well-known tools that can do data integration, e.g., Microsoft SQL Server Integration Services (SSIS), Informatica, Oracle Data Integrator, and AWS Glue.

In this blog we will talk about SSIS, which is one of the most popular ETL tools in New Zealand.

Why SSIS?

  1. Data can be loaded in parallel to many varied destinations
  2. SSIS reduces the need for hard-core programmers
  3. Tight integration with other Microsoft products
  4. SSIS is cheaper than most other ETL tools

SSIS stands for SQL Server Integration Services, which is a component of the Microsoft SQL Server database software that can be used to perform a broad range of data integration and data transformation tasks.

What can SSIS do?

It combines data residing in different sources and provides users with a unified view of the data.

It can also be used to automate maintenance of SQL Server databases and updates to multidimensional analytical data.

How does it work?

These data transformation and workflow creation tasks are carried out using an SSIS package:

Operational Data–>ETL–>Data Warehouse

In the first place, an operational data store (ODS) is a database designed to integrate data from multiple sources for additional operations on the data. This is the place where most of the data used in current operations is housed before it's transferred to the data warehouse for longer-term storage or archiving.

The next step is ETL (Extract, Transform and Load): the process of extracting the data from various sources, transforming this data to meet your requirements, and then loading it into a target data warehouse.

The third step is the data warehouse, a large accumulated set of data used for assembling and managing data from various sources in order to answer business questions and, hence, help in making decisions.
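To make the ETL step concrete, here is a hedged T-SQL sketch of the load part, assuming hypothetical staging and warehouse tables (stg.Sales, dw.DimCustomer, dw.DimProduct, dw.FactSales); in SSIS, this kind of logic would typically live inside a Data Flow task rather than a single script.

-- Minimal load-step sketch (T-SQL); all table and column names are hypothetical.
INSERT INTO dw.FactSales (OrderDateKey, CustomerKey, ProductKey, Quantity, Amount)
SELECT
    CAST(CONVERT(CHAR(8), s.OrderDate, 112) AS INT) AS OrderDateKey,  -- transform: date to a yyyymmdd key
    c.CustomerKey,
    p.ProductKey,
    s.Quantity,
    s.Quantity * s.UnitPrice AS Amount                                -- transform: derive a measure
FROM stg.Sales AS s                                                   -- extract: rows staged from the source system
JOIN dw.DimCustomer AS c ON c.CustomerId = s.CustomerId               -- look up dimension surrogate keys
JOIN dw.DimProduct AS p ON p.ProductId = s.ProductId;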

What is an SSIS package?

It is an object that implements Integration Services functionality to extract, transform and load data. A package is composed of:

  1. connections
  2. control flow elements (which handle the workflow)
  3. data flow elements (which handle data transformations)

If you want to investigate SSIS further, check out the official Microsoft documentation.

If you are interested in or have any problems with SSIS, feel free to contact me.

Or you can connect with me through my LinkedIn.

Some Cloud Computing Fundamentals

What is cloud computing?

The practice of using a network of remote servers hosted on the Internet to store, manage, and process data, rather than a local server or a personal computer.

On-premise:

  1. You own the servers
  2. You hire the IT people
  3. You pay for or rent the real estate
  4. You take all the risk

Cloud providers:

  1. Someone else owns the servers
  2. Someone else hires the IT people
  3. Someone else pays for or rents the real estate
  4. You are responsible for configuring your cloud services and code; someone else takes care of the rest.

Different kinds of hosting:

  1. Dedicated Server: One physical machine dedicated to a single business. Runs a single web-app/site. (Very expensive, high maintenance, high security)
  2. Virtual Private Server: One physical machine dedicated to a single business. The physical machine is virtualized into sub-machines that run multiple web-apps/sites.
  3. Shared Hosting: One physical machine, shared by hundreds of businesses. Relies on most tenants under-utilizing their resources. (Very cheap, Very limited)
  4. Cloud Hosting: Multiple physical machines that act as one system. The system is abstracted into multiple cloud services. (Flexible, Scalable, Secure, Cost-Effective, High Configurability)

Common Cloud Services

A cloud provider can have hundreds of cloud services that are grouped into various types of services. The four most common types of cloud services for Infrastructure as a Service (IaaS) are:

  1. Compute: Imagine having a virtual computer that can run applications, programs and code.
  2. Storage: Imagine having a virtual hard-drive that can store files.
  3. Networking: Imagine having a virtual network where you can define Internet connections or network isolation.
  4. Database: Imagine a virtual database for storing reporting data, or a database for a general-purpose web application.

The term ‘cloud computing’ can be used to refer to all categories, even though it has ‘compute’ in the name.

Benefits of Cloud Computing

Cost-effective: You pay for what you consume, with no up-front cost. It is pay-as-you-go (PAYG), with thousands of customers sharing the cost of the resources.

Global: Launch workloads anywhere in the world, just choose a region

Secure: The cloud provider takes care of physical security. Cloud services can be secure by default, or you have the ability to configure access down to a granular level.

Reliable: Data backup, disaster recovery, data replication, and fault tolerance.

Scalable: Increase or decrease resources and services based on demand.

Elastic: Automate scaling during spikes and drops in demand.

Current: The underlying hardware and managed software is patched, upgraded and replaced by the cloud provider without interruption to you.

In the next blog I will write about some fundamentals of Microsoft Azure, which is Microsoft's cloud platform.

If you are interested in or have any problems with cloud computing, feel free to contact me.

Or you can connect with me through my LinkedIn.

How to design a Data Warehouse (Part 1)

In the last blog I wrote about why we need a Data Warehouse.

First, what is a data warehouse?

It is a centralized relational database that pulls together data from different sources (CRM, marketing stack, etc.) for better business insights.

It stores current and historical data that are used for reporting and analysis.

However, here is the problem:

How can we design a Data Warehouse?

1 Define Business Requirements

Because a data warehouse touches all areas of a company, all departments need to be on board with the design. Each department needs to understand the benefits of the data warehouse and what results they can expect from it.

What objectives can we focus on:

  1. Determine the scope of the whole project
  2. Find out what data is useful for analysis and where our data is currently siloed
  3. Create a backup plan in case of failure
  4. Security: monitoring, etc.

2 Choose a data warehouse platform

There are four types of data warehouse platforms:

  1. Traditional database management systems: Row-based relational platforms, e.g., Microsoft SQL Server
  2. Specialized Analytics DBMS: Columnar data stores designed specifically for managing and running analytics, e.g., Teradata
  3. Out-of-the-box data warehouse appliances: a combination of hardware and software with a DBMS pre-installed, e.g., Oracle Exadata
  4. Cloud-hosted data warehouse tools

We can choose the most suitable one for the company according to budget, staff and infrastructure.

Should we choose cloud or on-premise?

Cloud solution pros:

  1. Scalability: easy, cost-effective, simple and flexible to scale with cloud services
  2. Low entry cost: no servers, hardware and operational cost
  3. Connectivity: easy to connect to other cloud services
  4. Security: cloud providers supply security patches and protocols to keep customers safe

Choices:

  1. Amazon Redshift
  2. Microsoft Azure SQL Data Warehouse
  3. Google BigQuery
  4. Snowflake Computing

On-premise solution pros:

  1. Reliability: With good staff and exceptional hardware, on-premise solutions can be highly available and reliable
  2. Security: Organizations have full control of the security and access

Choices:

  1. Oracle Database
  2. Microsoft SQL Server
  3. MySQL
  4. IBM DB2
  5. PostgreSQL

Which we choose between an on-premise and a cloud solution depends, in the big picture, on our budget and existing systems.

If we are looking for control, we can choose an on-premise solution. Conversely, if we are looking for scalability, we can choose a cloud service.

3 Set up the physical environments

There are three physical environments in a Data Warehouse setup: development, testing and production.

  1. We need to test changes before they move into the production environment
  2. Running tests against data typically uses extreme data sets or random sets of data from the production environment.
  3. Data integrity is much easier to track and issues are easier to contain if we have three environments running.

4 Data Modelling

It is the most complex phase of Data Warehouse design. It is the process of visualizing data distribution in the warehouse.

  1. Visualize the relationships between data
  2. Set standardized naming conventions
  3. Create relationships between data sets
  4. Establish compliance and security processes

There are many data modelling techniques that businesses use for data warehouse design. Here are the three most popular ones (a minimal star-schema sketch follows the list):

  1. Snowflake Schema
  2. Star Schema
  3. Galaxy Schema
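To make these concrete, here is a minimal star-schema sketch in generic SQL; all table and column names are hypothetical. One fact table holds the measures and references the surrounding dimension tables.

-- Minimal star-schema sketch (generic SQL); names are hypothetical.
CREATE TABLE DimCustomer (
    CustomerKey  INT PRIMARY KEY,   -- surrogate key
    CustomerId   VARCHAR(20),       -- business key from the source system
    CustomerName VARCHAR(100),
    Region       VARCHAR(50)
);

CREATE TABLE DimDate (
    DateKey       INT PRIMARY KEY,  -- e.g., 20200512
    FullDate      DATE,
    CalendarYear  INT,
    CalendarMonth INT
);

CREATE TABLE FactSales (
    DateKey     INT REFERENCES DimDate (DateKey),
    CustomerKey INT REFERENCES DimCustomer (CustomerKey),
    Quantity    INT,
    Amount      DECIMAL(18, 2)
);

A snowflake schema would normalise the dimensions further (for example, moving Region into its own table that DimCustomer references), while a galaxy schema has multiple fact tables sharing common dimensions.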

5 Choosing the ETL solution

ETL, which stands for Extract, Transform and Load, is the process by which we pull data from the source storage solutions into the warehouse.

We need to build an easy, replicable and consistent data pipeline because a poor ETL process can break the entire data warehouse.
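To illustrate what 'replicable and consistent' can look like in practice, here is a hedged T-SQL sketch of an idempotent dimension load using MERGE; the staging and warehouse table names (stg.Customer, dw.DimCustomer) are hypothetical. Because it updates existing rows and inserts only new ones, re-running the step after a failure does not create duplicates.

-- Idempotent dimension-load sketch (T-SQL); table names are hypothetical.
MERGE dw.DimCustomer AS target
USING stg.Customer AS source
    ON target.CustomerId = source.CustomerId          -- match on the business key
WHEN MATCHED THEN
    UPDATE SET target.CustomerName = source.CustomerName,
               target.Region       = source.Region
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, CustomerName, Region)
    VALUES (source.CustomerId, source.CustomerName, source.Region);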

Wrapping up

This post explored the first five steps of designing a Data Warehouse for a company. In the future, I will write about the next steps.

If you are interested in or have any problems with Data Warehouse, feel free to contact me.

Or you can connect with me through my LinkedIn.

CrUX Dashboard & Data Strategy Lifecycle

In the last blog I demonstrated the data pipeline we can use with CrUX to analyze site performance. That was from a BI developer's perspective.

However, what a company, and especially the leadership team, wants is the final dashboard generated by the BI department, from which a management plan can be derived.

I have already written about how to query BigQuery and which site speed metrics we can use in the CrUX introduction blog and the public dataset analysis blog.

So in this blog I will show you what kind of dashboard we can generate after the steps of collecting data from the Google public dataset and running ETL.

What data visualization tool do we need to use?

There are many data visualization tools we can use, e.g., Data Studio, Power BI, etc. This time I will take Tableau as an example.

I took www.flightcentre.co.nz (blue line) and www.rentalcars.com (red line) as the origins for comparison and set the customer's device to 'desktop' (we could also put a filter on it).

There are four sheets on the dashboard: Slow FCP Percentage, Fast FCP Percentage, Fast FID Percentage and Slow FID Percentage (a BigQuery sketch for computing these follows the list below).

What do they actually mean?

  1. Slow FCP Percentage (the percentage of users that experienced a first contentful paint time of 2.5 seconds or more)
  2. Fast FCP Percentage (the percentage of users that experienced a first contentful paint time of 1 second or less)
  3. Fast FID Percentage (the percentage of users that experienced a first input delay of 50 ms or less)
  4. Slow FID Percentage (the percentage of users that experienced a first input delay of 250 ms or more)
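For reference, here is a hedged BigQuery sketch of how these fast and slow FCP shares could be computed for the two origins above; the origin strings and the desktop filter are assumptions based on this example, and the thresholds follow the list.

-- Sketch: fast vs slow FCP share per origin, desktop traffic only.
SELECT
  origin,
  ROUND(SUM(IF(fcp.start < 1000, fcp.density, 0)) / SUM(fcp.density), 4) AS fast_fcp_share,
  ROUND(SUM(IF(fcp.start >= 2500, fcp.density, 0)) / SUM(fcp.density), 4) AS slow_fcp_share
FROM
  `chrome-ux-report.all.202003`,
  UNNEST(first_contentful_paint.histogram.bin) AS fcp
WHERE
  origin IN ('https://www.flightcentre.co.nz', 'https://www.rentalcars.com')
  AND form_factor.name = 'desktop'
GROUP BY
  origin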

From this graph, we can roughly see that flightcentre has better site speed than rentalcars in terms of user experience.

What can we do in the next step?

After that, we can inform the devs and communicate the impact by showing exactly the areas where the site is falling down. We can point to the fact that this comes from real users and shows how people actually experience the site.

The second part is the data strategy lifecycle in a company.

What is the data strategy lifecycle in a company?

Develop the strategy–>Create the roadmap–>Change management plan–>Analytics lifecycle–>Measurement plan

Perspectives:

  1. Scope and Purpose: What data will we manage? How much is our data worth? How do we measure success?
  2. Data collection: Archiving, what data goes where and when, a single source of truth (data lake), integrating data silos
  3. Architecture: Real time vs batch, data sharing, data management and security, data modelling, visualization
  4. Insights and analysis: Data exploration, self-service, collaboration, managing results
  5. Data governance: Identify data owners, strategic leadership, data stewardship, data lineage, quality, and cost
  6. Access and security: RBAC, encryption, PII, access processes, audits, regulatory requirements
  7. Retention and SLAs: Data tiers and retention, SLAs to the business

Wrapping up

This post explored the CrUX dashboard a BI team can generate and the data strategy lifecycle in a company. In the future, I will write more.

If you are interested in or have any problems with CrUX or Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.

Tutorial: Using BigQuery to Analyze CrUX Data

In the last blog I gave some examples of how we can use the Chrome User Experience Report (CrUX) to gain insights about site speed. In this blog I will continue by showing you how to use BigQuery to compare your site with competitors.

Prerequisites:

  1. Log into Google Cloud
  2. Create a project for the CrUX work
  3. Navigate to the BigQuery console
  4. Add the chrome-ux-report dataset and explore the way the tables are structured in ‘preview’

Step one: Figure out the origin of your site and the competitor sites

The LIKE syntax is useful here (take care of the syntax differences between Standard SQL and T-SQL):

  -- created by: Jacqui Wu
  -- data source: Chrome-ux-report(202003)
  -- last update: 12/05/2020

SELECT
  DISTINCT origin
FROM
  `chrome-ux-report.all.202003`
WHERE
  origin LIKE '%yoursite'

Step two: Figure out what should be queried in the SELECT clause

What can we query from CrUX?

The specific elements that Google is sharing are:

  1. “Origin”, which consists of the protocol and hostname, as we used in step one, and identifies the exact site URL
  2. Effective Connection Type (4G, 3G, etc), which can be queried as the network
  3. Form Factor (desktop, mobile, tablet), which can be queried as the device
  4. Percentile Histogram data for First Paint, First Contentful Paint, DOM Content Loaded and onLoad (these are all nested, so if we want to query them, we need to unnest them)

Here I create a SQL query of the FCP (first contentful paint) distribution across different sites. FCP measures the time from navigation to the time when the browser renders the first bit of content from the DOM.

This is an important milestone for users because it provides feedback that the page is actually loading.

SQL query:

  -- created by: Jacqui Wu
  -- data source: Chrome-ux-report(202003) in different sites
  -- last update: 12/05/2020
  -- Comparing the FCP metric across different sites

SELECT origin, form_factor.name AS device, effective_connection_type.name AS conn,
  "first contentful paint" AS metric, bin.start/1000 AS bin, SUM(bin.density) AS volume
FROM (
  SELECT origin, form_factor, effective_connection_type, first_contentful_paint.histogram.bin AS bins
  FROM `chrome-ux-report.all.202003`
  WHERE origin IN ("your site URL link", "competitor A site URL link", "competitor B site URL link")
)
CROSS JOIN UNNEST(bins) AS bin
GROUP BY origin, device, conn, bin

Step three: Export the results to Data Studio (Google's visualization tool)

Here are some tips that may be useful:

  1. A line chart is preferred for comparing different sites in Visual Selection
  2. Set the x-axis to bin (which we have already converted to seconds) and the y-axis to the FCP percentage
  3. Set the filters (origin, device, conn) in the Filtering section

Wrapping up

This post explored the data pipeline we can use with the CrUX report to analyze site performance. In the future, I will write more about CrUX.

If you are interested in or have any problems with CrUX or Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.

How to use CrUX to analyze your site?

What is CrUX?

CrUX stands for the Chrome User Experience Report. It provides real-world, real-user metrics gathered from the millions of Google Chrome users who load millions of websites (including yours) each month. Of course, they have all opted in to syncing their browsing history and have usage statistic reporting enabled.

According to Google, its goal is to 'capture the full range of external factors that shape and contribute to the final user experience'.

In this post, I will walk you through how to use it to get insights into your site's performance.

Why do we need CrUX?

We all know a faster site results in a better user experience and better customer loyalty compared to competitors' sites, which in turn increases revenue. Google has confirmed some details about how it understands speed, and they are available in CrUX.

What are CrUX metrics?

  1. FP (First Paint): when the browser first renders anything on the page
  2. FCP (First Contentful Paint): when some text or an image is first rendered
  3. DCL (DOM Content Loaded): when the DOM has been loaded
  4. ONLOAD: when the page and any additional scripts have finished loading
  5. FID (First Input Delay): the time between when a user interacts with your site and when the browser is actually able to respond to that interaction

How to generate the CrUX report on PageSpeed Insights?

PageSpeed Insights is a tool that helps people understand a page's performance and how to improve it.

It uses Lighthouse to audit the given page and identify opportunities to improve performance. It also integrates with CrUX to show how real users experience performance on the page.

Taking Yahoo as the example: after a few seconds, the Lighthouse audits are performed and we will see sections for field data and lab data.

In the field data section, we can see FCP and FID (the thresholds are shown in the table below).

Metric | Fast     | Average       | Slow
FCP    | 0-1000ms | 1000ms-2500ms | 2500ms+
FID    | 0-50ms   | 50-250ms      | 250ms+

We can see the Yahoo site is 'average' according to the table. To achieve 'fast', both FCP and FID must be categorized as fast.

Also, a percentile is shown for each metric. For FCP the 75th percentile is used, and for FID the 95th. For example, 75% of FCP experiences on the page are 1.5s or less.

How to use it in BigQuery?

In BigQuery, we can also extract insights about the UX of our site. For example, the following query breaks down the onload density by device and connection type:

SELECT origin, form_factor.name AS device, effective_connection_type.name  AS conn, 
       ROUND(SUM(onload.density),4) as density
FROM `chrome-ux-report.all.201907`,
    UNNEST (onload.histogram.bin) as onload
WHERE origin IN ("https://www.yahoo.com")
GROUP BY origin, device, conn

Then we can see the result in BigQuery.

The raw data is organized like a histogram, with bins that have a start time, an end time and a density value. For example, we can query for the percentage of ‘fast’ FCP experiences, where ‘fast’ is defined as happening under a second.

We can compare Yahoo with Bing. Here is how the query looks:

SELECT
  origin,
  SUM(fcp.density) AS fast_fcp
FROM
  `chrome-ux-report.all.201907`,
  UNNEST (first_contentful_paint.histogram.bin) AS fcp
WHERE
  fcp.start<1000
  AND origin IN ('https://www.bing.com',
    'https://www.yahoo.com')
GROUP BY
  origin
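The 75th percentile mentioned earlier can also be roughly approximated from the same histogram data by accumulating bin densities until 75% of page loads are covered. This is only a sketch: CrUX exposes coarse bins, so the result is the start of the bin containing the 75th percentile rather than an exact value, and the Yahoo origin is simply the running example.

-- Approximate the 75th percentile FCP for one origin (sketch only).
WITH fcp_bins AS (
  SELECT
    origin,
    fcp.start AS bin_start,
    SUM(fcp.density) AS density        -- aggregate across device/connection splits
  FROM
    `chrome-ux-report.all.201907`,
    UNNEST (first_contentful_paint.histogram.bin) AS fcp
  WHERE
    origin = 'https://www.yahoo.com'
  GROUP BY
    origin, bin_start
),
running AS (
  SELECT
    origin,
    bin_start,
    SUM(density) OVER (PARTITION BY origin ORDER BY bin_start) AS cum_density
  FROM fcp_bins
)
SELECT
  origin,
  MIN(bin_start) AS approx_p75_fcp_ms  -- first bin at or past the 75% mark
FROM running
WHERE cum_density >= 0.75
GROUP BY origin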

Wrapping up

This post explored some methods to get site insights with the CrUX report. In the future, I will write more about CrUX.

If you are interested in or have any problems with CrUX or Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.

Google Analytics cheat sheets

Data-driven VS data-informed

Data-driven: Decisions made based only upon statistics, which can be misleading.

Data-informed: Decisions made by combining statistics with insights and our knowledge of human wants & needs. 

We will be able to use data and human creativity to come up with innovative solutions in a business.

When we click into Google Analytics, we see a large number of lines, full of data and strange names.

But don't panic; we can break things down.

In this blog, I will illustrate: Analytics + Art = Creative Data Scientist.

Agenda

  1. Installing And Customising Google Analytics
  2. Learning Dashboard
  3. Analysing Behaviours
  4. User Acquisition
  5. Generate And Share Reports

1. Installing And Customising Google Analytics

How to Setup Google Analytics & Install on Website - YouTube

When we install Google Analytics, there are several terms we should pay attention to:

  1. Tracking code: A basic code snippet for a website. It starts with UA, which stands for Universal Analytics.
  2. Data collection: Turn it on and it allows us to collect data about users.
  3. User-ID: Allows us to track users. Generate a unique user ID, make sure the right ID is assigned to the right user, and associate the data in Google Analytics.
  4. Session settings: A session is any time a user has loaded up your site on their device. We can set the session timeout here.
  5. Referral exclusion list: Mostly, we set this to our own site URL.

2. Learning Dashboards:

Admin page: account, property, view

Under the account, there are several properties.

Taking one account as an example, there are many URLs associated with this account.

In other words, we have a site. Under this site, we have a whole bunch of other properties that we’ve associated here.

The tracking ID is the same, except for a number at the very end.

Google Analytics Admin Page

Google Ads and Google AdSense: 

Ads are the ads we buy from Google; they are the links or text that appear at the top of Google search pages. AdSense ads are the ads Google sells that we can insert on our own site. We use AdSense if we are a publisher and want to monetise our content.

Set 'Bot Filtering' in the view settings, under the View column of the Admin page.

It excludes all hits from known bots and spiders. Google, Yahoo, etc. have programs to analyze and index content. 'Bots' is short for robots; they are called 'spiders' because these little programs crawl the web.

It is not mandatory to turn it on, but if we have a big site, as professionals doing analytics we should gain insights about the humans (content is for humans) and exclude the robots. If we are interested in what users are doing on our site, it is easy enough to turn it on.

After all, the robots just crawl the data from our sites.

Sidebar

Then we can have a look at the sidebar.

  1. Customisation: I suggest installing a custom report to give it a try.
  2. Real-time: What users are doing this very second.
  3. Audience: We will see who, what, when, where and from where. A big module.
  4. Acquisition: Where traffic comes from and how marketing efforts are working. If we have a look at Channels, we will see direct, paid search, organic search, (other), referral, email and social. This tells us, again, how people find us, how our marketing is working, and whether we are just wasting our money.
  5. Behaviour: This is kind of fascinating; think of it as the security camera in the store. We are watching our users picking up items, checking out, or running out of the exits.
  6. Conversions: The happy part. It is where we track and figure out how well our site is turning visitors into customers.

3. Analysing Audience Behaviour

(1) Conversion vs Engagement:

Conversion: A one-time interaction. Granted, this is a powerful interaction, but it is the end goal of a chain of events.

Engagement: Repeated use that results in an emotional, psychological and sometimes near-physical tie that users have to a product, e.g., Apple fans.

Build a hypothesis via the audience overview

For example, there may be a lot of opportunities to grow if we were to make this site available in other languages.

(2) Active users

From the Active Users line, we can see whether nothing is having an effect on traffic, or whether there is really no marketing being done.

(3) Cohort analysis

A cohort is a group of users that all share a common characteristic; in this case the acquisition date, the day they came to your site, which here is known as day 0. The metrics here are used to analyse user behaviour.

We can see how things are going on day 1.

We can see how many people came back the next day. 

  1. Track individual users with user explorer
  2. How to use segmentation to refine demographics and interests

It helps us to know who our audience is and what type of content we are trying to expose them to, which impacts the design choices.

(4) Demographics

If it is mostly young people using smartphones, we should simplify the navigation choices.

(5) Interests

A target-rich environment for the site can be the place where the top three interests combine.

We can create a 'tablet traffic' segment to give it a try, and we can find out whether there are differences compared with all users.

(6) Geo

Language & Location. We can set a segment such as 'converters' to compare with 'all users' and find some differences.

(7) Behaviours:

We can set 'mobile and tablet traffic' as the segment to compare with 'all users'. We can find out after how many seconds people are paying more attention, whether the numbers are trending up, and whether we held their attention for a long span of time.

(8) Technology

Browser & OS: The Flash version matters if we want to run ads on the website, because we need to make sure they actually display.

Network: This can be a really big deal if we are working with users in areas where we know they have very slow connections. Do we need to serve a simplified page for them? This is called adaptive design.

(9) Mobile

If something is strange but not significant, we can just move on.

  1. Benchmarking and user flow
  2. Page Analytics (a plugin we can find in the Chrome Web Store)

4. User Acquisition

(1) Learning about channels, sources and mediums

There are many questions here:

Well, how do they get to the site?

Sources, searches and referrals

SEO and what users are looking for

Social statistics and …

Channels:  The general, top-level categories that our traffic is coming from, such as search, referral or social

Sources: A subcategory of a channel. For example, search is a channel. Inside that channel, Yahoo Search is a source.

Mediums: The means by which the traffic from a source comes to our site. That is, if the traffic is coming from Google, is it organic search or paid search?

(2) Differentiating between channels: organic search and direct

(direct): Direct traffic is where someone comes directly to our site, i.e., they type the address into the browser bar or click on a bookmark.

'not provided': The search data that comes through Google is now encrypted, to keep governments, hackers or spies from getting value from it.

(3) Unlocking Mysterious Dark Social Traffic

There are six ways that dark social traffic can come to the site.

  1. Email, messages. The traffic is from someone’s email program. This is not tracked by Google Analytics because GA lives in browsers.
  2. Links in docs: it is in an application that is not tracked by Google Analytics
  3. Bit.ly, Ow.ly, etc.
  4. Mobile social: Twitter etc.
  5. From https to http
  6. Search errors

(4) Drilling down to track who goes where

From source/mediums: trigger email

(5) Spotting the ‘Ghost Spam’ in referrals

Ghost spam: 

It isn't really hurting anything. In fact, if we are organized and we are the only ones looking at the reports, we can leave it alone and nothing happens.

However, these are noxious visits to our site made with the nefarious intent of getting us to click on the links and visit the spammer's site. They are not actual visits. These sessions and pageviews are from bots that either hit our site and execute the Google Analytics scripts, or bypass the server and hit Google Analytics directly.

First, we need to find the ghost referrals:

They come from hostnames that are not our site.

Check it through:  Acquisition–>All traffic–>Referrals.

We can see there are some websites that have a 100% bounce rate and a 0s average session duration.

We can also see entries such as xxxxxx.com / referral under Acquisition–>All traffic–>Source/Medium.

How to remove the ghost spam and fake traffic from Google Analytics?

In the next blog we will talk more about it.

If you are interested in or have any problems with Business Intelligence or Google Analytics, feel free to contact me.

Or you can connect with me through my LinkedIn.

A special Easter Day

On this special Easter Day, New Zealanders need to stay in our own 'bubbles'.

So, it is a good time to do some learning.

Pluralsight is now offering all courses free in April.

I completed the Google Analytics for Creative Professionals course on it.

I highly recommend its methodology:

  1. Look for top-level outliers + Mix & Match (segment)
  2. Go to pages and look for issues (technical, content & design)

Especially the part on spotting 'Ghost Spam' in referrals and how to remove it with regex; quite useful.

In business, making decisions by combining statistics with insight and our knowledge of human wants & needs is called being data-informed.

Next target: Architecting Data Warehousing Solutions Using Google BigQuery

If you are interested in or have any problems with BigQuery or Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.

Google Cloud Platform Fundamentals

Google Cloud is seen as a leader in areas including data analytics, machine learning and open source. Digital transformation through the cloud allows companies to deliver personalised, high-quality experiences.

During this lockdown time in New Zealand, working from home means spending less time in traffic and having more time to learn advanced techniques.

So stay positive and stay safe!

Thanks to the GCP Fundamentals course, it is a perfect opportunity for anyone who wants to learn Google Cloud Platform.

If you are interested in or have any problems with Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.

DAX Cheat Sheet

What is DAX?

It is the programming language for Power Pivot, SSAS Tabular and Power BI.

It resembles Excel because it was born in Power Pivot, but it has no concept of <row> and <column> and has a different type system.

Most importantly, it has many new functions.

The two most important concepts are the measure and the calculated column:

A measure is used to calculate aggregates, e.g., Sum or Avg, and is evaluated in the context of the cell in a report or a DAX query.

A calculated column is evaluated for each row and is computed at the row level within the table it belongs to.
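For readers coming from SQL, here is a rough analogy using a hypothetical Sales table (plain SQL, not DAX): a calculated column corresponds to a per-row expression stored in the table, while a measure corresponds to an aggregate evaluated under whatever filters the report cell applies.

-- Rough SQL analogy only (hypothetical Sales table), not actual DAX.

-- Like a calculated column: evaluated row by row within the table.
SELECT
  OrderId,
  Quantity * UnitPrice AS LineAmount
FROM Sales;

-- Like a measure: an aggregate evaluated in a filter context
-- (the WHERE clause plays the role of the report cell's filters).
SELECT
  SUM(Quantity * UnitPrice) AS TotalSales
FROM Sales
WHERE Region = 'NZ';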

Some Common DAX Expressions:

LOOKUPVALUE:

  1. Return the value in result_columnName for the row that meets all criteria specified by search_columnName and search_value
  2. LOOKUPVALUE(Result Column Name, Search Column Name, Search Column value)

FILTER:

  1. Return a subset of a table or expression
  2. FILTER(<table>,<filter>)

ALL:

  1. Return all the rows in a table, or values in a column, ignoring any filters that may have been applied
  2. ALL(<table> or <column>)

RELATED

  1. Returns a related value from another table
  2. RELATED(<column>)

CALCULATE

  1. Evaluates an expression in a context that is modified by the specified filters
  2. CALCULATE(<expression>,<filter1>,<filter2>)

If you are interested in or have any problems with DAX or Business Intelligence, feel free to contact me.

Or you can connect with me through my LinkedIn.