Scroll to top

Find Keywords That Drive Organic Search Traffic: Using GA4 and Google Search Console Data in BigQuery


Anoop T Unnikrishnan - July 21, 2024 - 0 comments

Understanding which keywords bring visitors to your website is essential for improving your SEO and boosting your site’s performance. By combining data from Google Analytics 4 (GA4) and Google Search Console (GSC) in BigQuery, you can see exactly which search queries are leading people to your site. This method lets you analyze the keywords that generate traffic to your web pages, giving you valuable insights into what users are looking for and helping you optimize your content to attract more visitors.

Prerequisites

Before we dive into the code, ensure you have the following:

  1. Google Cloud Project: With BigQuery enabled.
  2. GA4 Data: Exported to BigQuery.
  3. GSC Data: Imported into BigQuery.

Step-by-Step Guide

Step 1: Extract GA4 Data

First, we extract relevant data from the GA4 dataset. We focus on sessions that originated from organic Google searches. Here’s the query to achieve this:

WITH ga4_session_data AS (
  SELECT 
    PARSE_DATE('%Y%m%d', event_date) AS date,
    CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_id,
    MAX(CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'entrances') = 1 THEN (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') END) AS landing_page,
    MAX((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium')) AS medium,
    MAX((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source')) AS source
  FROM `Anmize-Project.analytics_282095110.events_*`
  WHERE _TABLE_SUFFIX BETWEEN '20240710' AND '20240716'
  GROUP BY 1, 2
  HAVING source = 'google' AND medium = 'organic'
),
ga4_aggregated_sessions AS (
  SELECT 
    date,
    landing_page,
    COUNT(DISTINCT session_id) AS unique_sessions
  FROM ga4_session_data
  WHERE landing_page IS NOT NULL
  GROUP BY 1, 2
)

Explanation:

  • PARSE_DATE: Converts event_date to a date format.
  • session_id: Concatenates user_pseudo_id with ga_session_id to create a unique session identifier.
  • landing_page: Extracts the landing page URL for each session.
  • medium & source: Extracts medium and source to filter organic Google traffic.
  • _TABLE_SUFFIX: Filters the data for the specific date range (July 10, 2024, to July 16, 2024).
  • GROUP BY: Groups the data by date and landing page, counting unique sessions.
  • WHERE & HAVING: Ensures only relevant data (organic Google traffic) is included and removes rows with missing landing pages.

Step 2: Extract GSC Data

Next, we extract data from the GSC dataset, focusing on search queries, impressions, clicks, and average position for the same date range.

WITH gsc_search_console_data AS (
  SELECT 
    data_date AS date,
    query,
    CASE 
      WHEN REGEXP_CONTAINS(query, '(bulkmockup)|(bulk mockup)|(bulkmockup.com)') THEN "branded" 
      ELSE 'non-branded' 
    END AS branded_query,
    country,
    url AS landing_page,
    AVG(sum_position) AS avg_position, 
    SUM(impressions) AS impressions,
    SUM(clicks) AS clicks  
  FROM `Anmize-Project.searchconsole.searchdata_url_impression`
  WHERE data_date BETWEEN '2024-07-10' AND '2024-07-16'
  GROUP BY 1, 2, 3, 4, 5
  HAVING query IS NOT NULL AND landing_page IS NOT NULL
)

Explanation:

  • data_date: Extracts the date.
  • query: Extracts search queries.
  • branded_query: Categorizes queries as “branded” or “non-branded” based on specific keywords.
  • country: Extracts the country of the search.
  • landing_page: Extracts the landing page URL.
  • avg_position, impressions, clicks: Aggregates data to calculate average position, total impressions, and clicks.
  • GROUP BY: Groups the data by date, query, branded query, country, and landing page.
  • HAVING: Ensures only non-null queries and landing pages are included.

Step 3: Combine GA4 and GSC Data

Finally, we combine the GA4 and GSC datasets using a LEFT JOIN on the date and landing page.

SELECT 
  ga4_aggregated_sessions.date,
  gsc_search_console_data.country,
  ga4_aggregated_sessions.landing_page,
  ga4_aggregated_sessions.unique_sessions,
  gsc_search_console_data.query,
  gsc_search_console_data.branded_query,
  gsc_search_console_data.impressions,
  gsc_search_console_data.clicks,
  gsc_search_console_data.avg_position
FROM ga4_aggregated_sessions
LEFT JOIN gsc_search_console_data ON gsc_search_console_data.date = ga4_aggregated_sessions.date
AND ga4_aggregated_sessions.landing_page = gsc_search_console_data.landing_page
WHERE ga4_aggregated_sessions.landing_page IS NOT NULL;

Explanation:

  • LEFT JOIN: Combines ga4_data and gsc_data on date and landing_page.
  • SELECT: Retrieves relevant fields from both datasets, including date, country, landing page, unique sessions, query, branded query, impressions, clicks, and average position.
  • WHERE: Ensures no rows with missing landing pages are included.

Important: Replace “Anmize-Project” with your GCP project name and Change ‘2024-07-10’ AND ‘2024-07-16’ based on your requirement

By combining GA4 and GSC data in BigQuery, you can gain comprehensive insights into how organic search queries contribute to website traffic and engagement. This integrated view helps you understand the effectiveness of your SEO efforts and identify opportunities for optimization.

Feel free to adjust the queries to fit your specific use case and data structure. With the power of BigQuery, you can efficiently analyze large datasets and derive valuable insights for your business.