Many advanced SEO concepts trace back to academic research and patents that shape the way search engines rank and present content. One person who played a huge role in uncovering these technical foundations was Bill Slawski.
To say Slawski was an SEO expert is an understatement. Bill dedicated decades to analyzing search-related patents and their impact on SEO practices, all of which he documented on his blog, SEO by the Sea.
Many of his articles may seem dated as Google keeps growing and changing, but they remain valuable for understanding the fundamentals and core structure on which modern Google is built (and, since most modern search engines mimic Google, on which they are built too).
Grab a cup of coffee and set aside 45 minutes. Let’s go and geek out a bit!
1. Information Gain
What is the basis for Information Gain in SEO?
In the big picture, the concept of Information Gain originates from the field of information theory, particularly from research by Claude Shannon. Oh, yes, everyone’s favorite Claude Chadchin from CS101.
In SEO, this idea has evolved into how search engines determine the value of unique content.
The main goal is to avoid surfacing duplicate content. The secondary goal is to raise the prominence of unique, high-quality content.
By using a metric similar to Information Gain, Google can aim to show users like you and me the most informative and unique content available.
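To make the idea concrete, here is a minimal sketch of a novelty score: the share of a candidate page's terms that do not appear in pages a user has already seen. This is a crude illustration only. It is not Shannon's formal definition and not Google's actual metric, and the function name `novelty_score` and the sample texts are hypothetical.

```python
def novelty_score(candidate, seen_docs):
    """Fraction of the candidate's distinct terms that are absent from seen_docs.

    Hypothetical proxy for "information gain": 0.0 means nothing new,
    1.0 means every distinct term is new to the reader.
    """
    candidate_terms = set(candidate.lower().split())
    if not candidate_terms:
        return 0.0
    seen_terms = set()
    for doc in seen_docs:
        seen_terms.update(doc.lower().split())
    new_terms = candidate_terms - seen_terms
    return len(new_terms) / len(candidate_terms)


# Pages the user has (hypothetically) already read for this query.
seen = ["how to bake bread at home"]

rehash = novelty_score("how to bake bread at home", seen)
partial = novelty_score("how to bake sourdough bread", seen)
fresh = novelty_score("sourdough starter hydration ratios", seen)
print(rehash, partial, fresh)
```

A page that merely rehashes what the reader has seen scores near zero, while a page that covers new ground scores higher. A real system would work with far richer signals than raw word overlap.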
What Bill Wrote About It
Bill Slawski played a big role in bringing different patents related to Information Gain to light through his detailed blog posts on SEO by the Sea. He emphasized how Google might be using such a system to reward content with high information gain and punish content that merely rehashes what’s already available. His insights allowed SEOs to focus on creating more original and in-depth content to satisfy the needs of search engines for fresh information.
Here are two of his articles that address duplicate content directly:
- “Duplicate Content Issues and Search Engines”
- “How Google Identifies Primary Versions of Duplicate Content”
On a more advanced level, understanding concept research can offer value (even if it was “just a phase” in the evolution of Google):
- “Should You Be Doing Concept Research Instead of Keyword Research?”
- “Quality Scores for Queries: Structured Data, Synthetic Queries and Augmentation Queries”
- “Google’s Quality Score Patent: The Birth of Panda?”
2. PageRank
The origin of PageRank:
PageRank is one of the most famous SEO concepts, rooted in the academic research of Google’s founders Larry Page and Sergey Brin. The foundational paper is titled “The Anatomy of a Large-Scale Hypertextual Web Search Engine” (1998). This document introduced the world to the concept of PageRank, which calculates a webpage’s importance based on the quantity and quality of backlinks it receives.
The corresponding patent, “Method for Node Ranking in a Linked Database” (US Patent No. 6,285,999), explains how PageRank treats links as votes of confidence. A page linked to by many high-quality pages will rank higher.
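The "links as votes" idea can be sketched with a short power-iteration loop. This is a simplified textbook version, not Google's production algorithm. The damping factor of 0.85 is the value commonly cited from the original paper, while the iteration count, the handling of pages without outgoing links, and the toy link graph are assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score, with scores summing to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its vote evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page site.
toy_web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, "home" collects votes from every other page and ends up on top, while "orphan" receives no links and keeps only the baseline share.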
Bill on PageRank
Slawski was one of the first SEO experts to break down the PageRank patent in an understandable way. He explained how PageRank works, what SEOs should focus on regarding backlinks, and how updates to PageRank (like Google’s switch to trust-based metrics and other algorithms) influenced rankings over time.
Bill’s Ten-Part Behemoth PageRank MegaThread
- Part 1 – The Original PageRank Patent Application
- Part 2 – The Original Historical Data Patent Filing and its Children
- Part 3 – Classifying Web Blocks with Linguistic Features
- Part 4 – PageRank Meets the Reasonable Surfer
- Part 5 – Phrase Based Indexing
- Part 6 – Named Entity Detection in Queries
- Part 7 – Sets, Semantic Closeness, Segmentation, and Webtables
- Part 8 – Assigning Geographic Relevance to Web Pages
- Part 9 – From Ten Blue Links to Blended and Universal Search
- Part 10 – Just the Beginning
Beyond this MegaThread, Slawski’s posts on SEO by the Sea often elaborated on subtle changes in PageRank that came through new patents or research, giving SEOs deeper insight into the continual evolution of the algorithm.
3. Long Clicks and Short Clicks
The basis for Long Clicks and Short Clicks:
Long clicks and short clicks are based on the concept of user engagement metrics, which search engines like Google use to gauge user satisfaction with a search result. These metrics are closely tied to patents like “Modifying search result ranking based on implicit user feedback” (US Patent No. 8,661,029 B1), which describes how Google tracks user behavior to determine whether a particular search result was useful.
When a user clicks on a result and spends a significant amount of time on the page before returning to the search engine (a long click), it signals that the content was relevant. A short click or quick return to the search page (often called pogo-sticking) signals dissatisfaction with the result.
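As a rough illustration, the long-click idea can be sketched as a dwell-time threshold applied to a click log. The 30-second cutoff, the log format, and the function name `long_click_share` are all hypothetical. Neither the patent nor Google publishes an exact threshold, and a real system would likely weigh many more signals.

```python
from collections import defaultdict


def long_click_share(clicks, threshold_seconds=30.0):
    """Share of clicks per URL whose dwell time meets the threshold.

    clicks is an iterable of (url, dwell_seconds) pairs.
    threshold_seconds is an assumed cutoff, not a documented value.
    """
    counts = defaultdict(lambda: [0, 0])  # url -> [long clicks, total clicks]
    for url, dwell_seconds in clicks:
        counts[url][1] += 1
        if dwell_seconds >= threshold_seconds:
            counts[url][0] += 1
    return {url: long_count / total for url, (long_count, total) in counts.items()}


# Hypothetical click log: (result URL, seconds before returning to the results page).
click_log = [
    ("example.com/guide", 120),
    ("example.com/guide", 45),
    ("example.com/guide", 5),
    ("example.com/thin-page", 3),
    ("example.com/thin-page", 8),
]
shares = long_click_share(click_log)
print(shares)
```

Here the guide earns mostly long clicks, while the thin page shows the pogo-sticking pattern of quick returns to the results.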
The Long and Short of It
Slawski consistently reviewed patents that highlighted the importance of user behavior in search engine rankings. In his analysis of these patents, he made it clear that Google was evolving toward an algorithm that took user experience into account rather than just relying on on-page factors or backlinks.
Slawski’s deep dive into these interaction-based signals gave SEOs early warning to focus on providing real value to users and improving the usability and satisfaction of their websites, beyond just traditional ranking factors.
- “How Google Might Rank Pages Based upon User Behavior Information”
- “User Behavior Data Google May Use to Influence Search Rankings”
4. Navboost and Navigational Queries
The idea behind Navboost:
Navboost relates to search queries where users intend to navigate to a specific website, such as “Facebook login” or “YouTube homepage.” These are known as navigational queries, and search engines like Google provide a boost to the most relevant pages for these queries.
Google often pushes the most relevant page for navigational queries to the top of the results, allowing users to quickly get to their intended destination.
Bill’s take on navigational queries
Slawski brought attention to patents in this area, explaining how brand websites and key pages benefit from such boosts in search rankings. His work helped SEOs understand the value of optimizing for branded and navigational keywords, and how to leverage this concept to dominate branded search results.
- “How Google Identifies Navigational Queries and Resources”
- “Redefining Navigational Queries to Find Perfect Sites”
- “How a Search Engine May Expand Search Queries Based upon Popularity Measured by User Behavior”
5. TF-IDF
Research behind TF-IDF:
TF-IDF (term frequency-inverse document frequency) is a statistical weighting scheme used to measure the importance of a term in a document relative to a corpus (a collection of documents). It has been widely used in information retrieval and search engines. The inverse document frequency component was introduced by Karen Spärck Jones in 1972, and Gerard Salton’s work on the vector space model in the 1970s and 1980s popularized the combined weighting.
TF-IDF forms the basis for understanding keyword relevance in a document and is part of how search engines can rank pages by content relevance.
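A minimal sketch of one common TF-IDF variant follows: term frequency is the term's count divided by document length, and inverse document frequency is the natural log of the corpus size over the number of documents containing the term. Other variants exist (smoothed IDF, log-scaled TF), and nothing here is meant to reflect what any particular search engine uses. Whitespace tokenization and the sample corpus are simplifying assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per document.

    tf  = count of term in document / number of tokens in document
    idf = ln(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical three-document corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(corpus)
print(sorted(weights[0].items(), key=lambda item: -item[1]))
```

A word like "the" appears in every document, so its IDF is zero and it carries no weight. "mat" appears in only one document and scores highest there.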
Bill Slawski’s articles on TF-IDF
Bill explored patents where Google employed variations of TF-IDF in its ranking algorithms.
He highlighted the evolution of search engine technology from basic keyword matching to more sophisticated models like TF-IDF and helped SEOs understand how keyword density, usage, and distribution influence search rankings.
- “Term Frequency and Inverse Document Frequency at Google”
- “Entity Optimization: How I Came to Love Entities”
6. BM25
The basis of BM25:
BM25 is an improvement on the TF-IDF model and is part of the Okapi BM (“Best Matching”) family of algorithms used in information retrieval systems. Developed in the 1990s by Stephen Robertson and others, BM25 refines the way term frequency and document length are factored into search rankings, making the model more robust across documents of very different lengths.
Unlike simple TF-IDF, BM25 applies a diminishing returns function to the frequency of a term and normalizes for document length, ensuring that longer documents aren’t unfairly favored.
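Both properties, saturating term frequency and length normalization, show up directly in the scoring formula. The sketch below uses the widely used form of BM25 with the defaults `k1=1.5` and `b=0.75`, plus a non-negative IDF variant of the kind used in some open-source search libraries. These parameter values, the whitespace tokenizer, and the sample documents are assumptions for illustration, not a description of Google's ranking.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a basic BM25 formula.

    k1 controls how quickly repeated terms stop adding score.
    b controls how strongly scores are normalized by document length.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Longer-than-average documents get a larger denominator.
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in query.lower().split():
            freq = counts[term]
            if freq == 0:
                continue
            # Assumed IDF variant: the "+ 1" keeps the value non-negative.
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            score += idf * freq * (k1 + 1) / (freq + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical documents of equal length, to isolate the term-frequency effect.
documents = [
    "seo seo seo seo",
    "seo tips for beginners",
    "cooking tips and recipes",
]
stuffed, natural, unrelated = bm25_scores("seo", documents)
print(stuffed, natural, unrelated)
```

The keyword-stuffed document repeats the term four times but scores less than twice as high as the document that mentions it once, which is the diminishing-returns behavior described above.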
Articles on SEO by the Sea regarding relevance
Slawski was one of the few SEO professionals to delve into the nuances of patents describing relevance scoring similar to BM25. By analyzing Google’s patents and drawing connections to academic papers like Robertson’s work, he showed how modern search engines were improving their relevance algorithms beyond basic term frequency.
His coverage helped marketers understand the shift from simple keyword optimization to more nuanced strategies, where document length, structure, and natural language usage began playing a more prominent role.
- “Page Relevance Determined by Anchor Text”
- “On Search Engine Relevance”
- “How Google Might Ignore Insignificant Query Terms”
- “How Google May Boost Search Rankings for Document Category Keywords”
- “How a Search Engine Might Weigh Pages with Relevant Annotations Higher in Search Results”
- “How a Search Engine Might Determine Search Engine Relevance from Related Queries”