Understanding which keywords bring visitors to your website is essential for improving your SEO and boosting your site’s performance. By combining data from Google Analytics 4 (GA4) and Google Search Console (GSC) in BigQuery, you can see exactly which search queries are leading people to your site. This method lets you analyze the keywords that generate traffic to your web pages, giving you valuable insights into what users are looking for and helping you optimize your content to attract more visitors.
Prerequisites
Before we dive into the code, ensure you have the following:
- Google Cloud Project: With BigQuery enabled.
- GA4 Data: Exported to BigQuery.
- GSC Data: Imported into BigQuery.
Step-by-Step Guide
Step 1: Extract GA4 Data
First, we extract relevant data from the GA4 dataset. We focus on sessions that originated from organic Google searches. Here’s the query to achieve this:
WITH ga4_session_data AS (
SELECT
PARSE_DATE('%Y%m%d', event_date) AS date,
CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_id,
MAX(CASE WHEN (SELECT value.int_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'entrances') = 1 THEN (SELECT value.string_value FROM UNNEST(event_params) WHERE event_name = 'page_view' AND key = 'page_location') END) AS landing_page,
MAX((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'medium')) AS medium,
MAX((SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'source')) AS source
FROM `Anmize-Project.analytics_282095110.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20240710' AND '20240716'
GROUP BY 1, 2
HAVING source = 'google' AND medium = 'organic'
),
ga4_aggregated_sessions AS (
SELECT
date,
landing_page,
COUNT(DISTINCT session_id) AS unique_sessions
FROM ga4_session_data
WHERE landing_page IS NOT NULL
GROUP BY 1, 2
)
Explanation:
- PARSE_DATE: Converts
event_date
to a date format. - session_id: Concatenates
user_pseudo_id
withga_session_id
to create a unique session identifier. - landing_page: Extracts the landing page URL for each session.
- medium & source: Extracts medium and source to filter organic Google traffic.
- _TABLE_SUFFIX: Filters the data for the specific date range (July 10, 2024, to July 16, 2024).
- GROUP BY: Groups the data by date and landing page, counting unique sessions.
- WHERE & HAVING: Ensures only relevant data (organic Google traffic) is included and removes rows with missing landing pages.
Step 2: Extract GSC Data
Next, we extract data from the GSC dataset, focusing on search queries, impressions, clicks, and average position for the same date range.
WITH gsc_search_console_data AS (
SELECT
data_date AS date,
query,
CASE
WHEN REGEXP_CONTAINS(query, '(bulkmockup)|(bulk mockup)|(bulkmockup.com)') THEN "branded"
ELSE 'non-branded'
END AS branded_query,
country,
url AS landing_page,
AVG(sum_position) AS avg_position,
SUM(impressions) AS impressions,
SUM(clicks) AS clicks
FROM `Anmize-Project.searchconsole.searchdata_url_impression`
WHERE data_date BETWEEN '2024-07-10' AND '2024-07-16'
GROUP BY 1, 2, 3, 4, 5
HAVING query IS NOT NULL AND landing_page IS NOT NULL
)
Explanation:
- data_date: Extracts the date.
- query: Extracts search queries.
- branded_query: Categorizes queries as “branded” or “non-branded” based on specific keywords.
- country: Extracts the country of the search.
- landing_page: Extracts the landing page URL.
- avg_position, impressions, clicks: Aggregates data to calculate average position, total impressions, and clicks.
- GROUP BY: Groups the data by date, query, branded query, country, and landing page.
- HAVING: Ensures only non-null queries and landing pages are included.
Step 3: Combine GA4 and GSC Data
Finally, we combine the GA4 and GSC datasets using a LEFT JOIN
on the date and landing page.
SELECT
ga4_aggregated_sessions.date,
gsc_search_console_data.country,
ga4_aggregated_sessions.landing_page,
ga4_aggregated_sessions.unique_sessions,
gsc_search_console_data.query,
gsc_search_console_data.branded_query,
gsc_search_console_data.impressions,
gsc_search_console_data.clicks,
gsc_search_console_data.avg_position
FROM ga4_aggregated_sessions
LEFT JOIN gsc_search_console_data ON gsc_search_console_data.date = ga4_aggregated_sessions.date
AND ga4_aggregated_sessions.landing_page = gsc_search_console_data.landing_page
WHERE ga4_aggregated_sessions.landing_page IS NOT NULL;
Explanation:
- LEFT JOIN: Combines
ga4_data
andgsc_data
ondate
andlanding_page
. - SELECT: Retrieves relevant fields from both datasets, including date, country, landing page, unique sessions, query, branded query, impressions, clicks, and average position.
- WHERE: Ensures no rows with missing landing pages are included.
Important: Replace “Anmize-Project” with your GCP project name and Change ‘2024-07-10’ AND ‘2024-07-16’ based on your requirement
By combining GA4 and GSC data in BigQuery, you can gain comprehensive insights into how organic search queries contribute to website traffic and engagement. This integrated view helps you understand the effectiveness of your SEO efforts and identify opportunities for optimization.
Feel free to adjust the queries to fit your specific use case and data structure. With the power of BigQuery, you can efficiently analyze large datasets and derive valuable insights for your business.