Unboxing and Understanding Google's AI Overviews

In May of 2024, Google announced its launch of AI Overviews (AIOs) in search. AIOs appear for certain search queries based on undisclosed criteria set by Google. When they appear, AIOs leapfrog all other search results and can take up a sizable chunk of valuable real estate on user screens.

Despite the prevalence of AIOs in search, there is little reliable information about these AI-generated results. To address this gap, I conducted introductory research into the new expanse of AIOs.

Overview of Research

My research effort had two objectives: quantify basic information about AIOs and develop a reusable pipeline to consistently ingest new data. To guide my research more concretely, I developed three hypotheses to investigate.

SERP Ranking Impacts AIO Inclusion
- The higher a page ranks amongst Organic Results for a keyword, the more likely it is to appear in the AIO for that keyword
Basic HTML Formatting does not Impact a Page’s AIO Inclusion Probability
- Basic HTML formatting features of a page, such as headers, will not be meaningfully predictive of that page’s inclusion in an AIO
AIOs will be Consistent Week over Week
- We will see 90%+ overlap in the URLs featured in AIOs
- We will see 90%+ semantic similarity in the text rendered by AIOs

Key Findings

The AIO space proved to be quite dynamic – my research answered many questions about AIOs, but also illuminated new areas for exploration. A few trends emerged from my research as reliable takeaways.

Pages that rank higher for a given keyword are better represented in the AI Overview for that keyword
HTML formatting features of a given page do not meaningfully impact whether or not that page is represented in AI Overviews
The set of pages featured in a given AIO partially changes week to week
The text portion of a given AIO hardly changes week to week

If you would like to learn more about my process, approach, and findings, I have detailed my research journey across the rest of this post.

Data

Data Sourcing

AIO data, for my purposes, had three components: the AIO text, the AIO featured URLs (AIO URLs for short), and the Organic Search Results (note: I took the top 70 Organic Search Results to cover a reasonable amount of the search landscape while maintaining a manageable data burden). The snippet below shows a sample AIO with labeled AIO text and AIO featured URLs. The Organic Search Results (or Organic Results) are the webpages that follow the AIO and other content featured by Google.

I needed to pull and store all this information for each keyword along with a record of when the information was pulled. After considering a multitude of data sources, I chose SERP API, as it was able to pull all my desired data points with the highest accuracy. (You can check out their API documentation). Specifically, SERP API showed a competitive advantage over SEMRush, STAT API, Nimble API, and Data for SEO.

Dataset

I selected roughly 14K keywords across an array of industries for my dataset, pulling both Organic Results data and AIO data three distinct times (each exactly one week after the other). Across three weeks, I was able to pull at least one week of complete AIO data (text and featured URLs) for roughly 2.3K keywords. I was able to pull Organic Results data for all keywords across all three weeks.

I stored the data in Snowflake to allow for efficient querying and to make use of Snowflake’s text embedding functionality through its Cortex Functions.

Data Storage

After parsing the JSON output from SERP API, I funneled the data into three tables: Organic Results, AIO Data, and AIO URLs. The tables reference one another through the data model outlined below.

Diagram of tables used to drive research

KEYWORD: keyword or search query
WEEK: datetime object indicating week and year of data
SERP_POSITION: Organic Results rank (SERP Ranking) of page
TITLE: title of page
LINK: full URL of page, used interchangeably with “URL”
AIO_FLAG: binary flag indicating if AIO is present for a keyword
AIO_TEXT: AIO text section of AIO
AIO_URL_REFS: reference ids of AIO URLs
KW_URL_ID: unique id of a URL for a keyword
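To make the three-table split concrete, here is a minimal sketch of the parsing step. It is not the code used in this research. The response shape (the `organic_results`, `ai_overview`, `text`, and `references` keys) is a hypothetical stand-in for the provider's JSON, and the assignment of columns to tables is an assumption, since the diagram is not reproduced here.

```python
def parse_serp_response(keyword, week, response, max_organic=70):
    """Split one search response into rows for the three tables.

    `response` is a plain dict with a hypothetical shape; real provider
    JSON will differ and should be checked against its documentation.
    """
    # Organic Results: one row per ranked page, capped at the top 70.
    organic_rows = [
        {
            "KEYWORD": keyword,
            "WEEK": week,
            "SERP_POSITION": result["position"],
            "TITLE": result["title"],
            "LINK": result["link"],
        }
        for result in response.get("organic_results", [])[:max_organic]
    ]

    # Assumed: the overview key is missing when no AIO was rendered.
    overview = response.get("ai_overview")
    references = overview["references"] if overview else []

    # AIO Data: one row per keyword and week.
    aio_row = {
        "KEYWORD": keyword,
        "WEEK": week,
        "AIO_FLAG": int(overview is not None),
        "AIO_TEXT": overview["text"] if overview else None,
        "AIO_URL_REFS": [ref["id"] for ref in references],
    }

    # AIO URLs: one row per URL featured in the overview.
    aio_url_rows = [
        {"KEYWORD": keyword, "WEEK": week, "KW_URL_ID": ref["id"], "LINK": ref["link"]}
        for ref in references
    ]
    return organic_rows, aio_row, aio_url_rows
```

Keeping AIO URLs in their own table, one row per featured URL, is what makes the later joins against Organic Results straightforward.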

Analysis & Results

Hypothesis 1: SERP Ranking

To answer this question, I first brought together two tables of interest: Organic Results and AIO URLs. I reasoned that left-joining AIO URLs to Organic Results on [Keyword, Date, Link] would give me every Organic Result across my 14K keywords along with its AIO URL data, if it was featured in the AIO.

This ended up being true, but with an unexpected twist.

I found that when joining on [Keyword, Date, Link], only 60% of AIO URLs had a match in my Organic Results table. In other words, 40% of URLs in my AIO URLs table did not rank in the top 70 Organic Results for a given keyword – despite being featured in the AIO for that very keyword.
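The join and the match rate can be sketched in plain Python as follows. This is an illustration rather than the queries actually run in Snowflake; the helper names and the `IN_AIO` column are hypothetical, and the rows are made-up toy data.

```python
def row_key(row, key_fields):
    """Build the join key for a row from the chosen fields."""
    return tuple(row[field] for field in key_fields)


def left_join_aio(organic_rows, aio_url_rows, key_fields=("KEYWORD", "WEEK", "LINK")):
    """Left join: keep every organic row and flag whether the AIO features it."""
    aio_keys = {row_key(row, key_fields) for row in aio_url_rows}
    return [dict(row, IN_AIO=row_key(row, key_fields) in aio_keys) for row in organic_rows]


def aio_match_rate(organic_rows, aio_url_rows, key_fields=("KEYWORD", "WEEK", "LINK")):
    """Share of AIO URL rows that have a match among the organic rows."""
    if not aio_url_rows:
        return 0.0
    organic_keys = {row_key(row, key_fields) for row in organic_rows}
    matched = sum(row_key(row, key_fields) in organic_keys for row in aio_url_rows)
    return matched / len(aio_url_rows)


# Made-up toy rows: one AIO URL ranks organically, the other does not.
organic = [
    {"KEYWORD": "kw", "WEEK": "2024-07-01", "SERP_POSITION": 1, "LINK": "https://a.example/p1"},
    {"KEYWORD": "kw", "WEEK": "2024-07-01", "SERP_POSITION": 2, "LINK": "https://b.example/p2"},
]
aio_urls = [
    {"KEYWORD": "kw", "WEEK": "2024-07-01", "LINK": "https://a.example/p1"},
    {"KEYWORD": "kw", "WEEK": "2024-07-01", "LINK": "https://c.example/p3"},
]
joined = left_join_aio(organic, aio_urls)
rate = aio_match_rate(organic, aio_urls)
```

The left join answers "which organic results are featured?", while the match rate looks from the other side and asks "which featured URLs rank organically at all?".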

This felt inexplicable at first. But after some digging, I found that Google does sometimes feature URLs in its AIOs that do not appear in its own top Organic Results. Still, using the AIO URLs that did show up in my Organic Results data, I was able to plot the distribution of AIO URLs across SERP rankings (or SERP Positions).

Distribution of AIO Links, broken down by SERP Position

To address the aforementioned oddity, I re-ran the table join. But, instead of joining on [Keyword, Date, Link], I joined on [Keyword, Date, Domain], where Domain was the website domain extracted from the full link. I found that 90% of domains from my AIO URLs table had a match in my Organic Results table. This showed that only 10% of domains featured in AIOs did not rank in the top 70 Organic Results. The resulting distribution is displayed below.

Domain AIO-Inclusion Probability, broken down by SERP Position
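The post does not say how Domain was extracted from each link, so the sketch below makes a simple assumption: the domain is the URL's hostname with a leading "www." removed. Collapsing subdomains to a registered domain would need a public suffix list and is not attempted here. The function names and sample links are hypothetical.

```python
from urllib.parse import urlparse


def extract_domain(link):
    """Return the hostname of a URL without a leading 'www.' (assumed rule)."""
    host = urlparse(link).hostname or ""
    return host[4:] if host.startswith("www.") else host


def domain_match_rate(organic_links, aio_links):
    """Share of AIO links whose domain also appears among the organic links.

    Both lists are assumed to belong to the same keyword and week.
    """
    if not aio_links:
        return 0.0
    organic_domains = {extract_domain(link) for link in organic_links}
    matched = sum(extract_domain(link) in organic_domains for link in aio_links)
    return matched / len(aio_links)


# Made-up links: the first AIO link is a different page on a ranking domain.
organic_links = ["https://www.example.com/guide"]
aio_links = ["https://example.com/blog/post", "https://other.example.org/x"]
rate = domain_match_rate(organic_links, aio_links)
```

In this toy case neither AIO link matches at the URL level, but one of the two matches at the domain level, which is the same effect that lifts the match rate when the join key changes from Link to Domain.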

Nonetheless, the two plots display a nearly identical trend: the SERP ranking of a page is strongly correlated with its inclusion in AIOs. This suggests that traditional search engine optimization (SEO) practices are still relevant in the age of AI-assisted search.

Hypothesis 2: Page Features

The unique hurdle in addressing this question was developing a set of features to represent the HTML page structure of the URLs in my dataset. Experts at Terakeet told me that header structure and lists were particularly salient. So, I sought to represent both with new features.

I turned to ScrapingBee to scrape the URLs in my Organic Results table. ScrapingBee returned encoded HTML that I was able to parse using BeautifulSoup4. Given the computational expense of scraping, I sampled down my data to a dataset with 1K keywords and 36K total rows. I extracted the following features from my parsed HTML:

Header String: a string containing – in preserved order – the headers used in the page’s HTML (ex: “h1 h2 h2 h3 … ")
OL: a count of the number of lines attributed to an ordered list
UL: a count of the number of lines attributed to an unordered list
max_consec_h: a count of the maximum consecutive occurrences of each header type (h1, h2, … , h6)
num_h1h2: a count of “header bridges” (i.e. when header string goes from h1 to h2); can be found for any combination of consecutive headers (e.g. h2-h3, h3-h4, etc.)
start_h1: denotes whether the first header in the HTML is an h1 (feature also available for h2 and h3)
h_pct: gets the count of each header (h1, h2, … , h6) as a percentage of the total number of headers
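The features above can be sketched as follows. To stay dependency-free, this version swaps in Python's built-in `html.parser` for BeautifulSoup4, so it is not the extraction code used in this research. Two further assumptions: "lines attributed to a list" is read as the number of `li` items inside that list type, and per-header feature names such as `max_consec_h2` and `h2_pct` are hypothetical expansions of `max_consec_h` and `h_pct`.

```python
from collections import Counter
from html.parser import HTMLParser

HEADER_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class StructureParser(HTMLParser):
    """Collect header tags in document order and count list items."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.list_stack = []          # currently open ol/ul tags
        self.list_items = Counter()   # li count per enclosing list type

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self.headers.append(tag)
        elif tag in ("ol", "ul"):
            self.list_stack.append(tag)
        elif tag == "li" and self.list_stack:
            self.list_items[self.list_stack[-1]] += 1

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self.list_stack and self.list_stack[-1] == tag:
            self.list_stack.pop()


def page_features(html):
    parser = StructureParser()
    parser.feed(html)
    parser.close()
    headers = parser.headers

    features = {
        "header_string": " ".join(headers),
        "OL": parser.list_items["ol"],
        "UL": parser.list_items["ul"],
        "start_h1": int(bool(headers) and headers[0] == "h1"),
        # A "header bridge": an h1 immediately followed by an h2.
        "num_h1h2": sum(1 for a, b in zip(headers, headers[1:]) if (a, b) == ("h1", "h2")),
    }

    # Longest run of consecutive identical headers, tracked per header type.
    max_runs, run_tag, run_len = Counter(), None, 0
    for tag in headers:
        run_len = run_len + 1 if tag == run_tag else 1
        run_tag = tag
        max_runs[tag] = max(max_runs[tag], run_len)

    counts = Counter(headers)
    for tag in HEADER_TAGS:
        features[f"max_consec_{tag}"] = max_runs[tag]
        features[f"{tag}_pct"] = counts[tag] / len(headers) if headers else 0.0
    return features
```

The header string is kept in document order because every other header feature (bridges, runs, the starting header) is derived from that sequence.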

I pulled in these features along with SERP ranking (which I grabbed from my Organic Results table) to see what connections existed between my features and AIO inclusion (which I grabbed from AIO_FLAG in Organic Results).

A quick correlation plot suggested that while SERP ranking was reasonably correlated with AIO inclusion, none of my page features had meaningful correlations on their own with AIO inclusion.

Magnitude of Correlation between various features and binary target variable (AIO Inclusion)

Still, I decided to go one step further and check whether my page features could collectively relate to AIO inclusion through interactions. To do this, I trained three XGBoost models:

Model A: predicts AIO inclusion using only SERP Ranking
Model B: predicts AIO inclusion using SERP Ranking + Page Features
Model C: predicts AIO inclusion using only Page Features

I conducted hyperparameter tuning for each model using F1-score as the objective, given the class imbalance in my dataset (13% of rows were positive for AIO inclusion). The results overwhelmingly confirmed what the correlation plot showed: page features contributed little predictive power with respect to AIO inclusion. The plot below shows that on their own, my page features yielded a poor model for predicting AIO inclusion. Even combined with SERP Position, my page features added only a nominal amount of predictive ability.

F1 score of xgboost models, broken down by feature sets
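On the choice of F1 as the tuning objective: with only 13% of rows positive, a model that never predicts AIO inclusion already scores 87% accuracy while being useless. The small sketch below illustrates this on made-up labels with the same class balance; it is an illustration of the metric, not the evaluation code used for the models.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 100 rows, 13 of them positive for AIO inclusion.
y_true = [1] * 13 + [0] * 87
always_negative = [0] * 100

baseline_accuracy = accuracy(y_true, always_negative)
baseline_f1 = f1_score(y_true, always_negative)
```

Here the always-negative baseline reaches an accuracy of 0.87 but an F1 of 0.0, which is why F1 is the more informative objective for this dataset.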

Hypothesis 3: AIO Consistency

Text Consistency

Here, the only additional data wrangling I did was to join my tables together to allow for week-over-week analysis. I joined my AIO table to itself on [Keyword], filtering out records with matching dates. Notably, this process tossed out keywords without multiple AIOs over the course of my three-week data collection.
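A minimal sketch of this self-join in plain Python is shown below. It is not the query used in this research, and it makes two assumptions: `WEEK` values sort chronologically (for example, ISO date strings), and each pair of weeks is kept once with the earlier week first, whereas a raw self-join that only filters out matching dates would return every pair in both orders.

```python
from itertools import combinations


def week_over_week_pairs(aio_rows):
    """Pair up AIO texts for the same keyword pulled in different weeks."""
    by_keyword = {}
    for row in aio_rows:
        if row["AIO_TEXT"]:  # skip weeks where no AIO text was captured
            by_keyword.setdefault(row["KEYWORD"], []).append(row)

    pairs = []
    for keyword, rows in by_keyword.items():
        rows = sorted(rows, key=lambda r: r["WEEK"])
        for earlier, later in combinations(rows, 2):
            if earlier["WEEK"] != later["WEEK"]:
                pairs.append({
                    "KEYWORD": keyword,
                    "WEEK_A": earlier["WEEK"],
                    "WEEK_B": later["WEEK"],
                    "AIO_TEXT_A": earlier["AIO_TEXT"],
                    "AIO_TEXT_B": later["AIO_TEXT"],
                })
    return pairs
```

Keywords with an AIO in only one week produce no pairs, which mirrors how the join drops them from the consistency analysis.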

I then leveraged Snowflake’s Cortex functions, powered by Snowflake’s arctic-embed-m embedding model. I used the EMBED_TEXT_768() Cortex function to embed each of the two AIO text columns into 768-dimensional vectors. I then fed these two embedding vectors into Snowflake’s VECTOR_COSINE_SIMILARITY() function, which gave me the cosine similarity of the two AIO texts. Both of these functions were fully managed by Snowflake and utilized Snowflake’s GPUs for near-instant execution. A sample resultant dataset is shown below.

Sample dataframe used to compute cosine similarity of AIO text
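For readers unfamiliar with the metric, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below computes it in plain Python on small made-up vectors; in the research itself the computation ran inside Snowflake on 768-dimensional embeddings, so this is only an illustration of the formula.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional stand-ins for two text embeddings.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

Vectors pointing in the same direction score close to 1 and orthogonal vectors score 0, so scores near 1 indicate that two AIO texts are semantically close.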

I plotted the similarity scores, which showed that generally, AIO text remains semantically similar week over week.

Distribution of cosine similarity scores across aio-text pairs

Featured Links Consistency

Here, I began with AIO URLs data, filtered into a week A and a week B dataset (where week A is the week prior to week B). Then, I took both the intersection (inner join) and union (outer join) of my two datasets, joining on [Keyword, Link].

Next, I counted – for each keyword – the number of URLs in the intersection and in the union. This allowed me to fill in the formula for Jaccard Similarity and compute the similarity between featured URLs for each keyword across week A and week B.

Jaccard Similarity for two sets, A and B
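Jaccard similarity is the size of the intersection of two sets divided by the size of their union: J(A, B) = |A ∩ B| / |A ∪ B|. A minimal sketch of the per-keyword calculation follows, using made-up URL sets. Returning 0.0 when both sets are empty is an assumption of this sketch, since the score is undefined in that case.

```python
def jaccard_similarity(set_a, set_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(set_a & set_b) / len(union)


# Made-up featured URLs for one keyword in two consecutive weeks.
week_a_urls = {"https://a.example/1", "https://b.example/2", "https://c.example/3"}
week_b_urls = {"https://b.example/2", "https://c.example/3", "https://d.example/4"}
score = jaccard_similarity(week_a_urls, week_b_urls)
```

In this toy case two URLs are shared out of four distinct URLs overall, giving a score of 0.5. A score of 1 means the featured set did not change at all, and 0 means no overlap.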

I repeated this process, swapping out joining on [Keyword, Link] for [Keyword, Domain]. This effectively returned a featured domain similarity score as opposed to a featured URL similarity score. Both scores exhibit somewhat similar distributions. However, the domain overlap is higher, with a median similarity score of 0.50 compared to the URL overlap median similarity score of 0.36.

Distribution of jaccard similarity scores across aio-domain set pairs

Distribution of jaccard similarity scores across aio-url set pairs

So, while we can confirm the similarity of AIO text week over week, we cannot confirm the same consistency for featured URLs (or featured domains).

Conclusion

The research here represents a preliminary exploration of AI Overviews – there are still many open questions in this space that we intend to continue exploring. At the moment, we can reliably proceed with the following results:

Claim	Verdict
SERP Ranking Impacts AIO Inclusion	True! Higher-ranking pages are featured more often in AIOs
HTML Formatting does not Impact a Page’s AIO Inclusion	True! Basic HTML features do not impact whether a page is included in an AIO
AIO Text will be Consistent Week over Week	True! AIO Text is semantically very consistent week to week
Featured AIO URLs will be Consistent Week over Week	False! The set of URLs featured in AIOs shows mixed consistency week to week

A key outcome from this research was the development of a reusable data pipeline to pull in new AIOs. With an ever-flowing source of fresh AIOs, we hope to be strongly positioned to monitor and discover new trends in the new paradigm of AI-assisted search.