On March 25th, 2014, Google was granted a patent for the Google Panda update (# 8,682,892 B1).

The patent was filed on September 28th, 2012, about a year and a half before it was granted, even though Google Panda was released in February 2011. The patent therefore probably reflects changes Google made to Panda after testing and refining it for about a year and a half; it is better to patent what works than a theory, since what works is more valuable. This article examines the key parts of the Google Panda patent. It uses the .pdf version of the patent because that version contains line and page numbers. You can view the patent here or you can download the .pdf here.

What the Google Patent Says

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for ranking search results. One of the methods includes determining, for each of a plurality of groups of resources, a respective count of independent incoming links to resources in the group; determining, for each of the plurality of groups of resources, a respective count of reference queries; determining, for each of the plurality of groups of resources, a respective group-specific modification factor, wherein the group-specific modification factor for each group is based on the count of independent links and the count of reference queries for the group; and associating, with each of the plurality of groups of resources, the respective group-specific modification factor for the group, wherein the respective group-specific modification for the group modifies initial scores generated for resources in the group in response to received search queries.

The patent’s abstract, quoted above, provides a high-level overview of what Panda does. It breaks Panda into four parts:

  1. It determines the number of incoming links for a domain
  2. It determines the volume of searches for the brand
  3. It creates a sitewide multiplier for the domain based upon the number of links and brand searches
  4. It applies this sitewide multiplier to the page if it deems a “penalty” or “modifier” is needed.

Each of the four functions is analyzed below.

Function One: Panda Counts Incoming Links & Citations

The system determines a count of independent links for the group… A link for a group of resources is an incoming link to a resource in the group… [and can be] express links, implied links, or both. An express link [is] a hyperlink… [an] implied link is… a citation. — Google Patent Page 6, line 31 to 43

Google counts both hyperlinks and brand mentions as incoming links. Therefore, a blogger linking to your site counts as a link, and so does a blogger simply mentioning your brand. This part is nothing new; brand mentions have been showing up in Google Webmaster Tools for a while.

However, not all hyperlinks or mentions count. Page 6, line 44 to page 7, line 13 goes on to explain that links are judged to see whether they were made independently. The patent goes through a few examples, including whether the linking and linked-to sites use similar style sheets, photos, content, hosts, owners (we assume from WHOIS data), and other factors. If a link is determined to have been created by the linked-to site, it won’t count.

In some implementations, the system counts at most one link from resources in any one source group as an independent link for a target group. Alternatively, if more than one independent link is identified from resources in a source group to resources in the target group, the number of independent links counted for the target group by the system may be a function of the total number of independent links. For example, the counted number of independent links may be the total number of independent links from resources in the source group to resources in the target group, a logarithm of the total number of independent links from resources in the source group to resources in the target group, or other non-decreasing function of the actual number. –Google Patent Page 7, line 13 to 29

In addition, not all links are weighted the same. Page 7, line 13 to 29 (quoted above) states that, sometimes, 1,000 links from a site can count as 1 point and, other times, 1,000 links from a site can count as 1,000 points or more. Why would this be so? Let’s say you get a sitewide footer link on CNN. That link will give you several thousand links from CNN, but those links would not be weighted the same as being featured in several thousand CNN articles.
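The counting options in the passage quoted above can be sketched in a few lines of Python. This is only an illustration, not Google's implementation: the function names, the `mode` parameter, and the sample data are hypothetical, and using `log1p` for the logarithm is our own choice so that a single link does not count as zero.

```python
import math
from typing import Mapping


def counted_links(total_independent: int, mode: str = "one") -> float:
    """Counted value for the independent links coming from ONE source group."""
    if total_independent <= 0:
        return 0.0
    if mode == "one":    # count at most one link per source group
        return 1.0
    if mode == "total":  # count every independent link
        return float(total_independent)
    if mode == "log":    # dampened count; log1p is our assumption, not the patent's
        return math.log1p(total_independent)
    raise ValueError(f"unknown mode: {mode}")


def independent_link_count(links_by_source: Mapping[str, int], mode: str = "one") -> float:
    """Sum the counted links over all source groups pointing at a target group."""
    return sum(counted_links(n, mode) for n in links_by_source.values())


# Hypothetical data: a sitewide footer link versus a few editorial links.
links = {"cnn.com": 1000, "smallblog.example": 3}
print(independent_link_count(links, "one"))    # 2.0
print(independent_link_count(links, "total"))  # 1003.0
print(independent_link_count(links, "log"))
```

Under the "one" mode, the thousand footer links are worth no more than the small blog's three links, which is the footer-link scenario described above.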

Function Two: Panda Counts Brand Searches

The system determines a count of reference queries for the group… A reference query for a group of resources is a search query that has been submitted to a search engine and has been classified as referring to a resource in the group. A query can be classified as referring to a particular resource if the query includes a term that is recognized by the system as referring to the particular resource.

Google determines brand searches — called reference queries in the patent — by finding the total number of people searching for your brand. Queries such as “starbucks hours,” “best starbucks drink,” “starbucks,” “how to get free starbucks,” and “starbucks 14th and university” would all count as brand searches for Starbucks.

In some implementations, the system counts only queries submitted by unique users as reference queries for the group. That is, a query that includes a term that has been categorized as referring to a resource in the group is counted as a reference query only if the user submitting the query has not previously submitted a query that has been categorized as referring to any resource in the group. — Google Patent Page 7, line 60 to 65

A brand search might only count once per user. Therefore, die-hard Justin Bieber fans who search for Bieber news daily might be counted the same as a confused mom who is wondering, “who is Justin Bieber and why is my daughter obsessed with him?”
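A minimal sketch of the unique-user counting described in the quote, assuming a query log of `(user_id, query)` pairs and a hand-written set of brand terms. Both are hypothetical stand-ins; the patent does not say how Google recognizes that a term refers to a resource.

```python
from typing import Iterable, Set, Tuple


def count_reference_queries(
    query_log: Iterable[Tuple[str, str]],
    brand_terms: Set[str],
) -> int:
    """Count brand searches, at most one per user."""
    seen_users = set()
    count = 0
    for user_id, query in query_log:
        words = query.lower().split()
        # Simplification: a query refers to the brand if any single word matches.
        if not any(term in words for term in brand_terms):
            continue
        if user_id in seen_users:
            continue  # this user has already been counted for the brand
        seen_users.add(user_id)
        count += 1
    return count


# Hypothetical log: u1 searches for the brand twice, u3 never does.
log = [
    ("u1", "starbucks hours"),
    ("u1", "best starbucks drink"),
    ("u2", "Starbucks"),
    ("u3", "coffee near me"),
]
print(count_reference_queries(log, {"starbucks"}))  # 2
```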

Function Three: Panda Creates a Sitewide Multiplier

The system generates a modification factor for the group of resources from the count of independent links and the count of reference queries. — Google Patent Page 8, line 10 to 13

The modification factor, referred to as the multiplier in this article, is based upon total incoming links divided by brand searches. Page 8, line 28 to 63 goes on to say that this multiplier is relative to sites with a similar number of brand searches. Therefore, a site with 10,000 brand searches a month (a very popular brand) will be compared to other sites with 10,000 brand searches a month. Why would this be so? If you are running TV ads, you’ll get more brand searches than a site that doesn’t and, because of that, your ratio of backlinks to brand searches would be different from that of a site that doesn’t run TV ads. This feature of Panda compares sites that are similar in size, so Apple isn’t compared to Marge’s Donut Den.
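As a rough illustration of the ratio and the peer comparison, the sketch below divides a site's counted links by its reference queries and then expresses that ratio relative to the median ratio of sites with a similar query volume. The peer band (`band`), the use of a median, and the fallback of 1.0 when no peers exist are all our assumptions; the article only establishes that the multiplier is a links-to-searches ratio judged against similarly sized sites.

```python
from statistics import median
from typing import Iterable, Tuple

Site = Tuple[float, float]  # (independent links, reference queries)


def initial_factor(independent_links: float, reference_queries: float) -> float:
    """Ratio of counted independent links to counted brand searches."""
    if reference_queries <= 0:
        raise ValueError("reference_queries must be positive")
    return independent_links / reference_queries


def peer_relative_factor(site: Site, all_sites: Iterable[Site], band: float = 2.0) -> float:
    """Site's ratio divided by the median ratio of sites with similar query volume."""
    links, queries = site
    peer_ratios = [
        initial_factor(l, q)
        for l, q in all_sites
        if q > 0 and queries / band <= q <= queries * band
    ]
    if not peer_ratios:
        return 1.0  # assumption: with no peers, treat the site as typical
    return initial_factor(links, queries) / median(peer_ratios)


# Hypothetical sites: three large brands and one tiny local business.
sites = [(500, 10_000), (2_000, 10_000), (1_000, 12_000), (50, 40)]
print(peer_relative_factor((500, 10_000), sites))
```

In this made-up data, the first site has fewer links per brand search than its large-brand peers, so its relative factor comes out below 1; the tiny business is excluded from the comparison because its query volume falls outside the band.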

Function Four: Panda Applies the Sitewide Multiplier to Individual Pages Based Upon the Search Query

Page 8, line 63 to page 9, line 63 states that, for a given search query, Google asks itself a few questions to see whether or not it should apply a penalty.

The first question it asks itself is “is the searcher trying to go to the site?” It was phrased in the patent as “is the query navigational to the resource.” What this means is that, if someone is searching specifically for that page, that page will never receive a penalty. These are usually referred to as “go” searches in the “do-know-go” method of classifying search queries. In fact, Google quality reviewers are required to classify searches as do, know, or go and quality reviewers’ scores are used to determine ranking factors.


If the query is not a “go” query, Google then asks itself “does this page meet my first quality threshold?” If yes, the page is not penalized. If no, Google asks itself “does this page meet my second quality threshold?” If yes, a “first modification score” is applied. If no, a “second modification score” is applied. Modification scores are multipliers that reduce the ranking of the page.
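The decision flow above can be written as a small function. This is a sketch under stated assumptions: the threshold values, the two modification scores, and the idea of a single numeric `page_quality` are hypothetical placeholders, and "no penalty" is modeled as multiplying by 1.0.

```python
def choose_modification(
    is_navigational: bool,
    page_quality: float,
    first_threshold: float = 0.7,   # hypothetical value
    second_threshold: float = 0.4,  # hypothetical value, below the first
    first_mod: float = 0.8,         # hypothetical "first modification score"
    second_mod: float = 0.5,        # hypothetical "second modification score"
) -> float:
    """Return the multiplier applied to a page's initial score for one query."""
    if is_navigational:                   # a "go" query: never penalized
        return 1.0
    if page_quality >= first_threshold:   # meets the first quality threshold
        return 1.0
    if page_quality >= second_threshold:  # meets only the second threshold
        return first_mod
    return second_mod                     # fails both thresholds


def adjusted_score(initial_score: float, is_navigational: bool, page_quality: float) -> float:
    """Initial ranking score after the chosen modification is applied."""
    return initial_score * choose_modification(is_navigational, page_quality)


print(adjusted_score(10.0, True, 0.1))   # navigational query: score unchanged
print(adjusted_score(10.0, False, 0.5))  # first modification applied
print(adjusted_score(10.0, False, 0.1))  # second modification applied
```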

Some individuals report sitewide drops from Panda, whereas others report some content dropping while other content remains the same. The above methodology matches these observations: a large sitewide multiplier would drop rankings for all pages, since individual page multipliers are a function of the sitewide multiplier. Also, certain pages might pass the threshold and therefore receive no penalty, whereas other pages might fail the threshold and therefore receive a penalty.

Summary of the Above

Panda compares a site’s backlinks with its brand searches to create a multiplier. This multiplier is relative to other sites with similar amounts of brand searches so that well-known sites (sites that spend a lot on promotion) are judged against their peers. Panda then creates a multiplier for the specific page a user is searching for, based upon the sitewide multiplier, and only applies the penalty if the page fails to meet certain thresholds. These thresholds are factors determined by Google Quality Raters.

Conclusion

Matt Cutts states in the video above that having a patent does not necessarily mean Google uses said patent. Therefore, how Google Panda works might be, and probably is, a little bit different from this. Also, Google Panda changes often and is being softened once again so that small businesses can rank better.