What Does The Google Search API Ranking Factors Data Leak Mean for Linkbuilding, Digital PR, and SEO

Early in May 2024, 2,500+ internal documents relating to Google Search’s Content Warehouse API were leaked. This constitutes the most comprehensive collection of Google Search API ranking factors any SEO has ever seen in the history of search engines.  

Not only does this leak contradict numerous denials and false statements from Google over the years, but it also provides valuable insights for the whole SEO industry and, crucially, our clients.

​​TL;DR:

  • On Sunday 5 May, 2024, Erfan Azimi, founder & CEO of an SEO agency, leaked the most comprehensive collection of Google Search API ranking factors in the history of search engines.  
  • He leaked these documents to two of the most respected voices in the SEO industry: Rand Fishkin and Mike King (founder of iPullRank). 
  • Google dodged questions about the leak and then admitted the authenticity of the documents.
  • Everyone in the SEO industry needs to take this leak with a pinch of salt and a dash of context. 
  • We also need to remember that this isn’t the inner workings of Google’s algorithm. These are 2,500+ internal documents relating to Google Search’s Content Warehouse API, which mirrors (or did, at the time of the leak) the Document AI Warehouse.
  • Now, let’s dive into the details about the Google documents data leak and what this means for SEO, Linkbuilding, and Digital PR.

Read on for details 👇

In this article, we are going to take a quick look at the details of the leak. 

And then, crucially for businesses that need and use organic services to drive traffic and revenue growth:

  • What does this leak mean for SEO (Search Engine Optimization)?
  • What does it mean for Linkbuilding and Digital PR?
  • Does this mean you need to overhaul your whole SEO, Linkbuilding, and Digital PR strategy?

Quick Overview: What are the Leaked Google Search API Documents?

Like every monopoly and corporate empire before it, Google is a gatekeeper. Google is the gatekeeper of the Internet. Other search engines don’t make enough of an impact for most SEOs to be worth worrying about. 

To use a Star Wars analogy, Google is the Galactic Empire and the whole SEO industry is the Rebel Alliance. Anything we can get from the Empire helps us, and our clients. It’s a war of attrition. Wins and losses. Lies and greedy mistrust. 

Despite Google’s attempts to be more transparent in recent years, this leak shows that Google’s concealed and denied a lot that’s proven to be true. 

From our perspective, as an SEO agency that’s currently helping over 300 businesses rank higher in Google, we are pleased that this leak vindicates a lot of what we knew and suspected over the years. 

💡These documents, our analysis, and numerous experts reviewing the details indicate that linkbuilding, guest posting, blogger outreach, and digital PR work, that SEO makes a difference (new services coming soon!), and that it’s important to have SEO audits and topical maps done before launching any kind of SEO campaign. 

So, let’s get into the details, and what this means for the work we at OutreachMama do for our clients…

Who leaked them, and who analyzed them first?

On Sunday 5 May 2024, Erfan Azimi, founder & CEO of an SEO agency, leaked the most comprehensive collection of Google Search API ranking factors in the history of search engines.  

It seems the documents were released by mistake: they were published on GitHub on March 27, 2024, under the Apache 2.0 license (a widely used open-source license), and were not removed until May 7, 2024. No laws were broken. It wasn’t a data breach, a hack, or otherwise illegal, as the license means that anyone who accessed the documents could use, modify, and distribute them. 

It just so happened that Erfan Azimi spotted them, and shared these documents with two of the loudest and most respected authorities in the SEO industry: Rand Fishkin and Mike King. Both released the documents and accompanying analysis on 27 May. 

Since then, everyone in the SEO industry has taken a look at the documents, or one of the many resources, tools, and websites (listed below) in an attempt to make sense of how to use this information for their clients. 

Everyone in the SEO industry needs to take this leak with a pinch of salt and a dash of context. A lot of us will let cognitive bias and pre-conceived ideas get in the way of a clear analysis. 

Interestingly, this has done nothing to improve trust amongst SEOs when it comes to Google. As SEO Roundtable notes: “Even after the Google search API data leak, even after it was confirmed by Google, still [only] 10% of SEOs will continue to trust Google statements going forward. The wild thing is, this level of trust has not changed in the past decade when I ran a similar poll in 2014.”

Are the Google Search API Documents authentic?

Yes, it seems they are. 

At first, Google dodged questions about the leak, and then a few days later, admitted the documents were accurate, saying that: “We would caution against making inaccurate assumptions about Search based on out-of-context, outdated, or incomplete information.” 

Not only that but crucially, the leaked documents align with what’s coming out in the testimony and discovery in the DoJ’s antitrust suit against Google, so we’ve got further proof of authenticity. This leak also fits in with what the industry now knows thanks to previous Google leaks.

For example, one thing the leaked documents prove (despite repeated Google denials over the years) is that numerous user clicks do factor into how websites are ranked. 

As The Verge and numerous other commentators have pointed out: “Testimony from the antitrust suit by the US Department of Justice previously revealed a ranking factor called Navboost that uses searchers’ clicks to elevate content in search.”

In the DoJ case against Google, VP of Search, Pandu Nayak, admitted that an algorithm called NavBoost collected data from Google’s Toolbar PageRank. Getting more user data for NavBoost was reportedly one of the main reasons for the launch of their browser, Google Chrome, in 2008. 

Alongside the documents, and references to the same ranking factors in material the DoJ can access, both Rand Fishkin (exited Founder of Moz, now running SparkToro and Snackbar Studio) and Mike King verified as many details as they could with current and ex-Google employees. 

Here’s what a couple of former Googlers said about the documents when Fishkin asked for signs they were authentic:

  • “It has all the hallmarks of an internal Google API.”
  • “It’s a Java-based API. And someone spent a lot of time adhering to Google’s own internal standards for documentation and naming.”

An example of the inner workings of how Google scores and ranks websites.

Fishkin also talked to Azimi, anonymously at first, before everything was released and Azimi went public on YouTube.

The big question, or series of questions, is what does all of this mean?

What should we, or could we, do with this information? We need to keep in mind that it’s incomplete, that Google could have changed ranking factors since the leak, and, as Fishkin points out: “This documentation doesn’t show things like the weight of particular elements in the search ranking algorithm, nor does it prove which elements are used in the ranking systems.”

So, we absolutely need to take that into account. 

The 2,500+ leaked documents show us 14,014 attributes (API features), or “ranking factors” from Google Search’s Content Warehouse API. In other words, it shows us what data Google collects, rather than exactly how Google interprets that data. 

However, even that information is valuable for any company that cares about how to get more organic traffic from Google.


Google Search API Docs Leak Learnings & Takeaways

Businesses work with SEO agencies, professionals, and consultants because they want and need to increase web traffic (mainly from Google), ensure that traffic is relevant to them ⏤ people actually interested in their services ⏤ and turn that traffic into paying customers. 

Businesses could turn to advertising, such as through Google Ads, and many do, or do both as part of “Search-based” marketing. However, the more traffic you win organically, the less money you need to spend on ads, and the higher-value that traffic is because it’s authentic. 

Searchers trust organic results more than ads. Search takes more time but generates a higher ROI. That makes it crucial that businesses get organic search ⏤ in all its forms: SEO (on-page, content), Linkbuilding, Digital PR, and Local ⏤ working effectively to generate an ongoing ROI. 

That’s why we are looking at what this data leak means for the following areas of organic search:

  • What does it mean for Linkbuilding and Digital PR?
  • What does this leak mean for SEO (Search Engine Optimization)?
  • Does this mean you need to overhaul your whole SEO, Linkbuilding, and Digital PR strategy?

The public face of what these documents represent: Google’s Document AI Warehouse, also known as Google’s Cloud APIs.

What does the Google Search API Docs Leak mean for Digital PR & Linkbuilding?

Here are a number of important considerations we’ve now got to factor in when implementing SEO for clients:

  • DA = Important! Google does give websites a weighting, or ranking factor, called “siteAuthority.” In other words, something very similar to what Moz and other tools call “Domain Authority” (DA). For on-page SEO (more about this below), this means those efforts matter to improve a website’s authority and, therefore, traffic and rankings. For linkbuilding, this means that backlinks from higher-DA websites are more valuable than links from lower-authority websites.
  • Clicks = Ranking signal. Again, this is equally important for linkbuilding and digital PR as for SEO, despite years of Google saying that “using clicks directly in rankings would be a mistake.” So, we can confidently state that the more clicks a website gets, the stronger the signal to Google that the website getting those clicks should be ranked higher.
  • Higher quality links to individual pages = Better siteAuthority. Similar to Moz’s Page Authority (PA), a sub-authority ranking factor under DA. Linkbuilding and digital PR shouldn’t only focus on the Home page and one or two others. Whenever possible, a linkbuilding campaign should send domain authority and, therefore, ranking signals (and traffic) to dozens of pages, and keep doing that to ensure the links, traffic, and clicks are as fresh as possible. All of this plays a role in how Google ranks websites.
  • Google explicitly ranks websites higher that have higher-quality backlinks. According to Mike King, the Google API documents clearly show that “the higher the tier [of backlink], the more valuable the link. Pages that are considered “fresh” are also considered high quality. Suffice it to say, you want your links to come from pages that either fresh or are otherwise featured in the top tier. This partially explains why getting rankings from highly ranking pages and from news pages yields better ranking performance.”
  • Branded search has value. Another thing the documents showed us is that if a site has a lot of links and traffic but a smaller SEO footprint (e.g., a site with 10k backlinks but only 5 pages and limited social presence) that’s a warning signal to Google. Websites need to ensure that linkbuilding strategies are in line with other efforts, such as SEO, social media, and other factors that Google does include in ranking algorithms.
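
To make the linkbuilding points above more concrete, here is a minimal Python sketch of how a team might prioritize link prospects by authority and freshness. Everything in it is an assumption for illustration only: the `LinkProspect` fields, the 0-100 authority scale (as used by third-party tools such as Moz), the 0.7/0.3 weights, and the 365-day freshness window are hypothetical. None of it comes from the leaked documents, which do not reveal how Google weights any attribute.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class LinkProspect:
    url: str
    domain_authority: int  # third-party 0-100 metric, NOT Google's siteAuthority
    last_updated: date     # when the linking page was last updated


def freshness(last_updated: date, today: date, window_days: int = 365) -> float:
    """Return 1.0 for a page updated today, falling linearly to 0.0 at window_days."""
    age_days = max((today - last_updated).days, 0)
    return max(0.0, 1.0 - age_days / window_days)


def priority(prospect: LinkProspect, today: date) -> float:
    """Hypothetical blend of authority and freshness; the weights are arbitrary."""
    authority = prospect.domain_authority / 100
    return 0.7 * authority + 0.3 * freshness(prospect.last_updated, today)


def rank_prospects(prospects, today):
    """Sort prospects from highest to lowest hypothetical priority."""
    return sorted(prospects, key=lambda p: priority(p, today), reverse=True)
```

A score like this is only useful for ordering an outreach list internally. It says nothing about how Google actually combines these signals.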

Now, let’s look at the other side of this: On-page SEO, and content, or simply SEO.

What does the Google Search API Docs Leak mean for SEO (On-page/on-site)?

There is quite a bit of cross-over, as we are seeing from the documentation.

  • DA = Important! As mentioned above, Google does rank websites, so that means that on-page/organic SEO work for websites makes an impact because this improves how a website ranks and the traffic it can pull in.
  • Clicks = Ranking signal. Despite years of denials, the number of clicks a website gets does matter. It does make a difference. The more you do for a website (e.g., on-page optimizations, ongoing content marketing/SEO) the more traffic that will drive and, therefore, more clicks, resulting in higher rankings.
  • Great content, promoted to the right audience + backlinks = WILL generate higher rankings and traffic. Google has been saying this for years, and these documents are yet more proof. In his analysis, Mike King says: “After reviewing these features that give Google its advantages, it is quite obvious that making better content and promoting it to audiences that it resonates with will yield the best impact on those measures.”
  • Age matters. According to the documents, Google factors in the age of a website and, crucially, how recently websites and individual pages were last updated. It’s another thing that SEOs have long suspected, now shown to be true. You need to update your content as this is factored into ranking performance.
  • Traffic loss over time is a factor for Google = Keep SEO content fresh/new. Again, something Google disputed, even attacking the reputations of SEOs who figured this out. Say your website had 10K a month in new users and it drops to 2K a month; does Google penalize this? Yes, Google does. One of the ranking factors is the “last good click”, and this is caused by content decay. In other words, you’ve not updated your website in ages, and your content isn’t as fresh, so traffic reduces, and Google notices, reducing your position in search engine rankings (SERPs).
  • Post-click behavior matters. In SEO terms, we know this as “CTR” (click-through rate), “dwell time”, and bounce rate; in other words, what people do when they click and how long they hang around. A high bounce rate gets noticed by Google. Various sources have indicated that NavBoost is ‘already one of Google’s strongest ranking signals’.
  • Mobile and Google Chrome performance matters. Based on everything in these documents, and a recent announcement about Google de-indexing websites that don’t perform on mobile devices, we’ve got to take mobile and Google Chrome performance seriously. If your site is performing poorly on mobile (Google Search Console will show you that, and we can conduct an SEO audit to check this), it will be de-indexed on 5 July 2024. 
  • Regular updates = Better siteAuthority. Similar to Moz’s Page Authority (PA), a sub-authority ranking factor under DA. Turns out, the content, keywords, and referring links (external and internal) all play a role in how well individual pages rank within a website, and that impacts how a website ranks overall.
  • Branded search has value. For example, if a site has a lot of links and traffic but a smaller SEO footprint (e.g., a site with 10k backlinks but only 5 pages and limited social presence) that’s a warning signal to Google. Websites need to ensure that linkbuilding strategies are in line with other efforts, such as SEO, social media, and other factors that Google does include in ranking algorithms. 
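
The traffic-loss example in the list above (10K new users a month dropping to 2K) is an 80% decline. As a rough sketch, the arithmetic and a simple check for decaying pages could look like this in Python. The page paths and the -50% threshold are hypothetical choices for illustration. They are not values from the leaked documents.

```python
def percent_change(previous: float, current: float) -> float:
    """Percentage change from previous to current (negative means a drop)."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) * 100 / previous


def flag_decaying_pages(monthly_users: dict, threshold: float = -50.0) -> list:
    """Return pages whose change is at or below the (arbitrary) threshold.

    monthly_users maps a page path to a (previous_month, current_month) tuple.
    """
    return [
        page
        for page, (previous, current) in monthly_users.items()
        if percent_change(previous, current) <= threshold
    ]


# The example from the article: 10K users a month dropping to 2K.
print(percent_change(10_000, 2_000))  # -80.0

# Hypothetical pages, for illustration only.
pages = {"/blog/old-guide": (10_000, 2_000), "/blog/new-guide": (1_000, 1_200)}
print(flag_decaying_pages(pages))  # ['/blog/old-guide']
```

Pages flagged this way are reasonable candidates for a content refresh, whatever weight Google actually gives to traffic decline.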


Of course, we could write several thousand more words on all of this and still not get to the bottom of every ranking factor, algorithmic choice, and other detail. 

For those interested in learning more, there are dozens of resources, tools, and websites to explore at the end of this article. For those who rely on SEO for traffic, leads, and sales, we’ve got some food for thought below👇

Does your business need to overhaul your entire SEO strategy after the Google Search API document leak?

No, probably not, or maybe . . . 

It depends on what your current SEO strategy is, really. 

If you’ve got a solid, full-service SEO strategy covering SEO (on-page, content), Linkbuilding, Digital PR, and Local, and it’s working, then you probably don’t need to change anything. 

However, if you’ve only got a partial plan (e.g., elements of it are working but you’ve not updated your web copy since 2018) and it’s not working as well as it should, then yes, it’s time to rethink your SEO strategy. 

If you do need a refreshed, ROI-focused SEO strategy, then it’s time to contact us:

Does your website and content need an update? Are you getting enough organic traffic from Google?

Talk to us today about SEO services you can depend on. OutreachMama is trusted by over 300 companies worldwide to generate high-impact organic growth from Google. 

OutreachMama can deliver every SEO service you need to improve your rankings and drive more traffic, sales leads, and customers to your website.

List of Google Data Leak Tools and Resources

Want to take a closer, in-depth, SEO-focused look at all of the data and what it means for yourself?

Go ahead, as these resources and tools are especially useful for SEO professionals 👇

Master open-source resource: Leaked Google Search API Doc Aggregation of Analysis, Tools, and Resources [Updated on June 2nd by Aleyda]: Google Search API Doc Resources Aggregation

About: A quickly-compiled database of articles, tools, and resources by Aleyda Solis

Open-source database: Leaked Google Search Algorithm Ranking Factors Database

About: A user-friendly, searchable database of the potential signals featured in the leaked Google docs. 

Creator(s): Mahendra Choudhary and Swapnil Pate 

Open-source, AI-powered searchable website: 2596.org

About: A website providing a searchable interface and AI-powered overviews for each of the leaked Google Search API doc modules to help you dive deep into the specifics of the leak.

Creator(s): Matt Hodson

Open-source, searchable table: Google Ranking Signals Searchable List

About: A searchable table including the Google Search API doc variables. 

Creator(s): Dixon Jones

An open-source database, focusing on Local Search: Local SEO Ranking Factors From The Leaked Google API Documents

About: A list of the Local SEO-related information found in the Google Search API Document. 

Creator(s): Andrew Shotland

Curious how the modules relate to one another? Take a look: Google’s Ranking Features Modules Relations

About: A visualization of Google Ranking Features Module Relationships.

Creator(s): Natzir Turrado

AI-generated summaries: Google Leak Reporting Tool

About: A tool that provides an AI-generated synthesis of the leaked documentation. 

Creator(s): Wordlift

Article by Tomas Tasic

With 7+ years in the SEO industry, I’m an off-page optimization specialist with a track record of helping clients climb the search engine ladder. My focus? Building long-term client relationships founded on great results and customer satisfaction. Off the clock? You’ll find me behind the lens, capturing the world through photography, or catching NBA games (minus any recent All-Star debates) – or maybe even experimenting with new recipes in the kitchen! Let’s chat about SEO, strategy, or maybe even your favorite classic basketball moments.
