Exploring the Impact of AIOs on Web Traffic

It is now abundantly clear that Google’s AI Overviews (AIOs) are here to stay. Draped over the top of SERPs, AIOs take up the most valuable SERP real estate and push Organic Results further down the page. At the same time, AIOs can amplify certain webpages through in-line and separated citations.

In this study, we examine the give-and-take nature of AIOs by analyzing thousands of webpages to quantify the net impact that AIOs have on web traffic.

While aggregating data across all webpages would yield a straightforward, tidy result, our Causal Inference approach reveals deeper nuances in web traffic trends when we analyze the data at a granular level.

In short, specific attributes of a webpage and a SERP can have profound impacts on AIO-driven web traffic trends. Still, by and large, we can say both that the presence of AIOs has a strong effect on web traffic and that AIO inclusion can greatly boost a site’s traffic.

Overview of research

To understand the impact of AIOs on web traffic, we compare Total Clicks (traffic) across webpages with three different forms of AIO Exposure:

No-AIO Webpages: webpages on SERPs with No AIO
Included Webpages: webpages on SERPs with an AIO + featured in the AIO
Excluded Webpages: webpages on SERPs with an AIO + not featured in the AIO

Notably, we consider the same URL a distinct “webpage” for each SERP it appears on, as it has a distinct AIO Exposure, distinct web traffic data, and a distinct SERP position on each SERP.
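To make the grouping concrete, here is a minimal Python sketch of how a single organic result could be assigned to one of the three AIO Exposure groups. The record shape (`SerpResult` and its fields) and the helper names are hypothetical illustrations, not Terakeet’s actual pipeline. The point it shows is that a webpage is identified by the (keyword, URL) pair, so one URL can land in different groups on different SERPs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerpResult:
    """One organic result on one SERP (hypothetical record shape)."""
    keyword: str        # the query that produced the SERP
    url: str            # the ranking URL
    position: int       # organic rank on this SERP
    aio_present: bool   # whether the SERP showed an AI Overview
    aio_cited: bool     # whether this URL was cited in that AIO (ignored if no AIO)


def aio_exposure(result: SerpResult) -> str:
    """Map one webpage to its AIO Exposure group."""
    if not result.aio_present:
        return "No-AIO"
    return "Included" if result.aio_cited else "Excluded"


def webpage_key(result: SerpResult) -> tuple[str, str]:
    """The same URL on two different SERPs counts as two distinct webpages."""
    return (result.keyword, result.url)
```

For example, one URL that is cited in an AIO for one query, ranks beside an AIO without a citation for a second query, and ranks on an AIO-free SERP for a third yields three distinct webpages in three different groups.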

We use Total Clicks as a proxy for web traffic. By comparing clicks across our three webpage groups, we can attribute differences in clicks to the differences in AIO Exposure across webpage groups.

To carry out this comparison reliably, we use a Causal Inference framework, which includes safeguards against spurious results.

Key findings

Our results vary depending on the subset of data being discussed, specifically on the SERP Position and the Intent of a webpage.

We assign SERP Position to a webpage as “Top-Ranked” (positions 1–2) or “Lower-Ranked” (positions 3–10) based on its Organic Results ranking. We do not consider webpages outside the Top 10 Organic Results, as these webpages would not usually appear on the same page as an AIO.
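The bucketing rule is simple enough to state as code. This is a minimal sketch; the function name is hypothetical, and returning `None` for ranks outside the Top 10 mirrors the decision to leave those webpages out of the analysis.

```python
from typing import Optional


def serp_position_bucket(position: int) -> Optional[str]:
    """Bucket an organic rank; ranks outside the Top 10 are dropped (None)."""
    if 1 <= position < 3:
        return "Top-Ranked"    # positions 1-2
    if 3 <= position <= 10:
        return "Lower-Ranked"  # positions 3-10
    return None                # outside the Top 10: not part of the study
```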

We assign Webpage Intent to a webpage as “Transactional” or “Informational” based on a proprietary Terakeet classifier.

General takeaways

Despite the granularity of our findings, we are able to tease out a handful of trends that are generally applicable across all types of webpages.

Regardless of Intent, when AIOs are present, webpages benefit from being included
Informational Queries: AIO presence diverts traffic away from Top-Ranked Webpages (regardless of their inclusion) but increases traffic for Lower-Ranked Webpages
Transactional Queries: AIO inclusion adds more traffic to webpages across all SERP positions compared to any other type of AIO Exposure
Top-Ranked Webpages and Lower-Ranked Webpages react differently to every form of AIO Exposure, never exhibiting a statistically similar response

Results data tables

Statistical significance is harder to interpret within a Causal Inference framework, because the datasets we compare have less randomness than a truly random distribution of data would. Given this, we could reasonably relax our standards for statistical significance. However, we err on the side of caution and impose the following thresholds:

Strong: p-value < 0.05
Weak: 0.05 ≤ p-value < 0.15
None: p-value ≥ 0.15
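Applied to a computed p-value, these thresholds reduce to a small lookup. This is a minimal sketch with a hypothetical function name; it only encodes the tiers and makes no assumption about which statistical test produced the p-value.

```python
def significance_tier(p_value: float) -> str:
    """Label a p-value with the Strong / Weak / None tiers used in the tables."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    if p_value < 0.05:
        return "Strong"
    if p_value < 0.15:
        return "Weak"
    return "None"
```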

No-AIO Webpages vs Included Webpages

SERP Positions	Intent	Takeaway	Statistical Significance
Top-Ranked	Informational	No-AIO Webpages had 1.4x as many clicks as Included Webpages	Weak
Top-Ranked	Transactional	Included Webpages had 1.4x as many clicks as No-AIO Webpages	None
Lower-Ranked	Informational	Included Webpages had 2x as many clicks as No-AIO Webpages	Strong
Lower-Ranked	Transactional	Included Webpages had 3.6x as many clicks as No-AIO Webpages	Strong

No-AIO Webpages vs Excluded Webpages

SERP Positions	Intent	Takeaway	Statistical Significance
Top-Ranked	Informational	No-AIO Webpages had 2x as many clicks as Excluded Webpages	Strong
Top-Ranked	Transactional	No-AIO Webpages had 2.3x as many clicks as Excluded Webpages	Strong
Lower-Ranked	Informational	Excluded Webpages had 1.8x as many clicks as No-AIO Webpages	Strong
Lower-Ranked	Transactional	Excluded Webpages had 3x as many clicks as No-AIO Webpages	Strong

Included Webpages vs Excluded Webpages

SERP Positions	Intent	Takeaway	Statistical Significance
Top-Ranked	Informational	Included Webpages had 1.5x as many clicks as Excluded Webpages	Weak
Top-Ranked	Transactional	Included Webpages had 3.2x as many clicks as Excluded Webpages	Strong
Lower-Ranked	Informational	Included Webpages had 1.1x as many clicks as Excluded Webpages	None
Lower-Ranked	Transactional	Included Webpages had 1.2x as many clicks as Excluded Webpages	None

Causal Inference overview

In short, Causal Inference is an approach used to show that A caused B. In our particular case, we want to see if any particular AIO condition causes some change in web traffic. Specifically, we hope to establish a causal relationship instead of simply showing a correlation between two observations.

Causal Inference allows us to make causal claims by holding constant other variables, called confounding variables (or confounders), that may be driving variation in the observed data. By holding confounders constant, we can be confident that differences in the observed data are driven by our independent variable rather than by those confounders. This works because if the values of the confounders are the same across datasets, their effects are also equal and cancel out when we compare results between datasets.

The example below illustrates this concept.

Example

Does the color of your shirt make you a better soccer player?
To answer this question, we can take the average goals scored for players in blue shirts and compare it to the average goals scored for players in red shirts. Let’s say players in red shirts score more goals than players in blue shirts, and the difference is even statistically significant!
We could say that shirt color matters. But now let’s say that many of the players in red shirts are their teams’ captains. We have now introduced a confounding variable: Team Captaincy. Suddenly, it’s not clear whether shirt color (our independent variable) had anything to do with goals scored, as Team Captaincy could explain the difference.
However, what if we held Team Captaincy constant? That is, what if we only compare captains who wear red shirts to captains who wear blue shirts? Now we can be certain that any difference in goals scored has nothing to do with Team Captaincy.
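The shirt-color example can be reproduced with a tiny, made-up dataset. In this hypothetical data, captains score 3 goals and other players score 1 regardless of shirt color, but most red-shirted players are captains. The naive comparison shows a gap that disappears once Team Captaincy is held constant.

```python
from statistics import mean

# Hypothetical toy data: (shirt color, is_captain, goals scored).
# Captains score 3 goals and non-captains 1, whatever their shirt color,
# but 8 of 10 red-shirted players are captains versus 2 of 10 blue-shirted ones.
players = (
    [("red", True, 3)] * 8 + [("red", False, 1)] * 2
    + [("blue", True, 3)] * 2 + [("blue", False, 1)] * 8
)


def avg_goals(shirt, captain=None):
    """Average goals for a shirt color, optionally holding captaincy fixed."""
    return mean(
        goals
        for color, is_captain, goals in players
        if color == shirt and (captain is None or is_captain == captain)
    )


naive_gap = avg_goals("red") - avg_goals("blue")
captain_gap = avg_goals("red", True) - avg_goals("blue", True)
non_captain_gap = avg_goals("red", False) - avg_goals("blue", False)

print(f"naive gap: {naive_gap:.1f}")              # naive gap: 1.2
print(f"captains only: {captain_gap:.1f}")        # captains only: 0.0
print(f"non-captains only: {non_captain_gap:.1f}")  # non-captains only: 0.0
```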

Causal Inference execution

Data

We execute Causal Inference for our use case with these variables:

Dependent Variable: traffic metrics
- Total Clicks (per day)
Independent Variable: AIO Exposures
- No-AIO: No AIO was present
- Inclusion: AIO was present, and webpage was featured in it
- Exclusion: AIO was present, and webpage was not featured in it

We employ three datasets, one for each AIO Exposure. Each dataset contains webpages and webpage metadata (SERP Position, Intent, etc.) along with the corresponding clicks data for each webpage.

We sample keywords from our internal data store and use them to pull SERPs and webpages. SERP data comes from SERP API, clicks data comes from Google Search Console, and Webpage Intent comes from our proprietary classifier.

The sizes of our datasets are shown below. For the purposes of statistical significance, we take “N” to be the number of Unique Webpages.

Dataset	Unique Webpages	Unique URLs
No-AIO	1793	536
Inclusion	2074	552
Exclusion	1545	421

Identifying confounding variables

Our two main criteria to define confounding variables are:

Does the candidate confounder have a meaningful impact on our dependent variable (Total Clicks)?
Does the candidate confounder vary meaningfully across our three datasets?

Because each designated confounder slices the datasets into smaller pieces, designating superfluous confounders needlessly reduces the sample size available for each comparison. Candidate confounders must therefore meet both criteria, which ensures that they actually threaten the fidelity of our findings.

To evaluate meaningful impact, we use Pearson’s correlation coefficient together with a manual review of the data. To evaluate meaningful variation, we use the two-sample Kolmogorov-Smirnov (KS) test for a difference in distributions, requiring strong statistical significance (p-value < 0.05).

We evaluate several candidate confounders and ultimately identify two conclusive confounding variables:

SERP Position
Webpage Intent

SERP Position

SERP Position shows a negative correlation with Total Clicks in all three of our datasets: the further down the SERP a webpage ranks, the fewer clicks it tends to receive.

Dataset	Correlation: SERP Position & Total Clicks
No-AIO	-0.21
Inclusion	-0.09
Exclusion	-0.12

KS tests for a difference in SERP Position distribution between every pair of our three datasets yield p-values below 0.05. In fact, the largest (“weakest”) p-value between any two of our datasets is 5e-09, several orders of magnitude below our threshold. This indicates that SERP Position varies meaningfully across our three datasets.

Webpage Intent

The following table shows the impact that Webpage Intent has on clicks in each of our three datasets, as well as how the distribution of Webpage Intent varies across them. These data show that both conditions for a confounding variable are satisfied.

Dataset	Total Clicks for Informational Pages	Total Clicks for Transactional Pages	% of Pages Informational	% of Pages Transactional
No-AIO	0.25	0.68	60%	17%
Inclusion	0.28	0.82	96%	3%
Exclusion	0.16	0.26	91%	6%

(Total Clicks values in this table are per-day averages across all webpages in each group.)

Blocking

Once confounding variables have been identified, Blocking is the process we use to account for them and hold them constant. Blocking partitions every dataset into chunks (or “blocks”) of data based on our confounding variables. Each dataset has the same blocks, so we can compare datasets by lining up matching blocks. Crucially, however, each block must have a similar distribution of our confounders across datasets to ensure a level comparison.
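A minimal sketch of the partitioning step follows. It assumes each webpage is a dict with hypothetical keys `exposure`, `position_bucket`, `intent`, and `clicks`; blocks are keyed by the confounders alone, so the same block can be lined up across the three AIO Exposure datasets.

```python
from collections import defaultdict


def make_blocks(webpages):
    """Partition webpages into blocks keyed by the confounders.

    Returns {(position_bucket, intent): {exposure: [clicks, ...]}}, so each
    block holds one list of clicks per AIO Exposure dataset.
    """
    blocks = defaultdict(lambda: defaultdict(list))
    for page in webpages:
        key = (page["position_bucket"], page["intent"])
        blocks[key][page["exposure"]].append(page["clicks"])
    # Convert to plain dicts so missing blocks are not silently created later.
    return {key: dict(by_exposure) for key, by_exposure in blocks.items()}
```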

To illustrate, let’s apply this to our data, which we want to chunk based on our two confounders: SERP Position and Webpage Intent.

Generating blocks: Webpage Intent

With Webpage Intent, we can easily divide our data into two categories, Informational and Transactional. This gives us six blocks across our three datasets:

Block Abbreviation	AIO Exposure	Webpage Intent
NI	No-AIO	Informational
NT	No-AIO	Transactional
II	Included	Informational
IT	Included	Transactional
EI	Excluded	Informational
ET	Excluded	Transactional

We are not worried about the distribution of Webpage Intent within blocks. Blocks tagged as Transactional contain only webpages with Transactional intent, and blocks tagged as Informational contain only webpages with Informational intent. As long as we compare Transactional blocks to Transactional blocks and Informational blocks to Informational blocks, we run no risk of dissimilar distributions of Webpage Intent.

With SERP Position, we reach a more complex situation.

Generating blocks: SERP Position

We could make each SERP Position its own block and compare webpages at position 1 with webpages at position 1, and so on. However, doing so would split each of our blocks into 10 and slice our data quite thin. Our blocks might then contain very few webpages, which makes comparisons weak.

Instead, we group a range of SERP Positions into a single block. When doing this, we must ensure that the distribution of SERP Positions within each block is consistent across our datasets.

Otherwise, we risk a scenario where, for example, we make SERP positions 1–3 a block, but one dataset has many more webpages at position 1 than the others and therefore has a completely different distribution, even within a supposedly equivalent block.

We choose to block our data by SERP Position using two blocks: SERP Positions 1–2 (Top-Ranked) and SERP Positions 3–10 (Lower-Ranked). However, we find that within these blocks, the distribution of SERP Positions in our Excluded Webpage dataset is meaningfully different from that of our Included Webpage and No-AIO datasets.

To address this, we downsampled the Excluded Webpage dataset’s Top-Ranked block to match the distributions of our other two datasets. We also upsampled the Excluded Webpage dataset’s Lower-Ranked block to match distributions (notably, only 2.8% of records in the block are upsampled to maintain the integrity of the dataset).
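The resampling procedure is not spelled out above, so the following is only one plausible way to do it, offered as a hypothetical sketch: for each SERP Position, draw as many records as a target share requires, sampling without replacement when the block has enough records (downsampling) and topping up with draws with replacement when it does not (upsampling). The function name, record shape, and seed are assumptions, and rounding may leave the total one or two records off `size`.

```python
import random
from collections import defaultdict


def match_position_distribution(records, target_shares, size, seed=0):
    """Resample `records` so their SERP positions follow `target_shares`.

    records: list of dicts with a 'position' key (hypothetical shape).
    target_shares: {position: share}, e.g. taken from the other datasets.
    size: desired number of records in the resampled block.
    """
    rng = random.Random(seed)
    by_position = defaultdict(list)
    for record in records:
        by_position[record["position"]].append(record)

    resampled = []
    for position, share in target_shares.items():
        pool = by_position.get(position, [])
        want = round(share * size)
        if want == 0:
            continue
        if not pool:
            raise ValueError(f"no records at position {position} to sample from")
        if want <= len(pool):
            # Downsample: draw without replacement.
            resampled.extend(rng.sample(pool, want))
        else:
            # Upsample: keep every record, then draw the shortfall with replacement.
            resampled.extend(pool)
            resampled.extend(rng.choices(pool, k=want - len(pool)))
    return resampled
```

Keeping every original record before drawing duplicates keeps the share of upsampled records as small as possible, in the spirit of the 2.8% figure above.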

Resultant blocks

These are our final blocks. The comparisons presented in the results tables above are made by comparing, within each block, the average Total Clicks in one dataset to the average in another.

Block	SERP Positions	Intent	# of No-AIO Webpages	# of Included Webpages	# of Excluded Webpages
A	Top-Ranked	Informational	395	726	473
B	Top-Ranked	Transactional	293	35	47
C	Lower-Ranked	Informational	684	751	1648
D	Lower-Ranked	Transactional	421	33	152
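As a sketch of that within-block comparison, the function below turns two lists of per-webpage daily clicks into a sentence in the format of the results tables. The function name and the one-decimal rounding are our own choices, and because the significance test behind each row is not specified, none is computed here.

```python
from statistics import mean


def compare_within_block(clicks_a, clicks_b, label_a, label_b):
    """Phrase a within-block comparison of average clicks like the results tables."""
    avg_a, avg_b = mean(clicks_a), mean(clicks_b)
    if min(avg_a, avg_b) <= 0:
        raise ValueError("average clicks must be positive to form a ratio")
    if avg_a >= avg_b:
        high, low, ratio = label_a, label_b, avg_a / avg_b
    else:
        high, low, ratio = label_b, label_a, avg_b / avg_a
    # Round to one decimal; ':g' drops a trailing '.0' so 2.0 prints as '2'.
    return f"{high} had {round(ratio, 1):g}x as many clicks as {low}"
```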

Conclusion

It would be simpler to skip confounders, blocking, and the other hoops that Causal Inference presents. However, our findings show that the impact of AIOs on web traffic is truly nuanced: the effect on traffic to a webpage depends on the attributes of the webpage itself.

That said, generally speaking, being excluded from an AIO measurably and significantly harms a webpage, while being included in an AIO clearly benefits it. Overall, the presence of AIOs dramatically changes web traffic across webpages.