How to get the most out of the Google Search Console API using regex


Google Search Console is an incredible tool that provides invaluable search data from real users, directly from Google. While the charts and tables are nice to work with, a large part of the data is not accessible from the UI.

The only way to get to this hidden data is to use the API and extract all the valuable search data that’s available to you, if you know how. This is possible with regular expressions.

Here’s how you can maximize the Google Search Console API using regular expressions, according to Eric Wu, VP of Product Growth at Honey, a PayPal Company, who spoke at SMX Advanced.

Diagnosing SEO issues with GSC

Working on a website experiencing stagnant or declining growth or a core update drop?

Most SEO professionals turn to Google Search Console (GSC) to diagnose such issues.

(Or if resources permit, you may even use a paid tool like Ryte or build your own platform.)

Fortunately for the SEO community, there’s no shortage of Looker Studio (formerly Google Data Studio) dashboards useful for GSC analysis.

Dashboards let SEOs look at an overview of different trends, rather than clicking through GSC several times to get to the data they need.

But if you’re analyzing enterprise sites, you can run into some roadblocks.

  • Looker Studio and Google Sheets both load slowly, especially when you’re dealing with large sites.
  • GSC’s interface has a 1,000-row export limit.
  • GSC has a huge sampling problem. Enterprise SEO teams reportedly miss 90% of their GSC keywords. And if you know how to extract the data, you can actually get 14x the keywords.

Overcoming GSC’s sampling problem

Explorer for Search is another tool that you can use for GSC analysis. From Noah Learner and the team at Two Octobers, it’s built with data pipelines that use GSC’s API to output data to BigQuery (basically bypassing Google Sheets and downloading CSV files) and then visualize the information with Data Studio.

With this, you can be confident that you’re getting to almost all the data.

There’s still a caveat due to GSC’s sampling problem, especially for large ecommerce sites with lots of different categories. GSC won’t necessarily show all the data that’s coming in from those directories.

After conducting various tests to get the most data out of the GSC API, the team discovered a way to close the GSC sampling gap.

They found that by adding more subdirectories as different profiles within your GSC dashboard, you can extract a lot more data, as Google gives you more information at that lower level.

For example, if you add “televisions” as a subdirectory in your GSC profile, Google gives you only the keywords and the click information for that subdirectory and below.

And by adding lots of these different subdirectories, you can extract a lot more information.

That solves the sampling problem, but you can get even more data by using regular expressions.

Getting more GSC data with regular expressions

Regular expression, or regex, is a powerful tool to understand your data.

In April 2021, Google added regex support to GSC, giving SEOs more ways to slice and dice organic search data.

A lot of times, data is not useful unless you can make sense of it. And regex helps to extract actionable insights from GSC’s rich data.

But as powerful as it may be, regex can be tricky to learn.

The best place to understand and dive deep into regular expressions is Google’s official documentation on GitHub. (Google uses RE2 in its products, which is a flavor of regular expression.)

Regex is available in all kinds of programming languages, and you’ll find it almost everywhere, even when editing .htaccess files.

The next few sections cover use cases for regex in GSC.

Regex informational queries

When looking at actual informational search queries in GSC, you often want to understand:

  • How are people actually coming to your site?
  • What questions are they asking?

Looking at these things on a one-off basis within GSC can be difficult.

You’re always searching for the words “what,” “how,” “why” and then “when.”

There are a couple of ways to make extracting informational queries less tedious with regex.

Daniel K. Cheung shared a regex string that will show you all queries containing “what,” “how,” “why” and “when” that either got a click or an impression.

And this regex string shared by Steve Toth takes the previous example up a notch:

  • ^(who|what|where|when|why|how)[" "]

You can use this string if you want to capture question-based queries that start with “who,” “what,” “where,” “when,” “why” or “how,” followed by a space.
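A quick way to sanity-check a pattern like this before pasting it into GSC is to run it over a few sample queries. The sketch below uses Python’s re module, which should behave the same as RE2 for a pattern this simple; the sample queries and the function name are made up for illustration.

```python
import re

# Pattern from the article: a question word at the start of the query,
# followed by a space (the character class also allows a double quote).
QUESTION_START = re.compile(r'^(who|what|where|when|why|how)[" "]')


def question_queries(queries):
    """Return only the queries that begin with one of the question words."""
    return [q for q in queries if QUESTION_START.search(q)]


# Hypothetical queries, for illustration only.
sample = [
    "how to mount a tv",
    "what is oled",
    "however you like it",
    "best tv for gaming",
]

print(question_queries(sample))
```

Note that “however you like it” is skipped because the character after “how” is not a space, which is exactly what the trailing character class is for.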

This is a great list to use when you’re looking for any type of word that might start a question:

  • are, can, can’t, could, couldn’t, did, didn’t, do, does, doesn’t, how, if, is, isn’t, should, shouldn’t, was, wasn’t, were, weren’t, what, when, where, who, whom, whose, why, will, won’t, would, wouldn’t

Putting all this into regex form would look something like this:

  • ^(are|can|can't|could|couldn't|did|didn't|do|does|doesn't|how|if|is|isn't|should|shouldn't|was|wasn't|were|weren't|what|when|where|who|whom|whose|why|will|won't|would|wouldn't)\s

In this 178-character string:

  • You have the caret (^), which says the query needs to begin with one of these words.
  • The words are separated with pipes (|) instead of commas.
  • All the words are wrapped in parentheses.
  • There’s a backslash and an “s” (\s), which denotes a space after the word.
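The four pieces described above (caret, pipes, parentheses and \s) can be assembled from a plain word list rather than typed by hand. This is a minimal sketch in Python; the function name is an assumption for illustration, and the apostrophes are the straight ASCII kind, which is an assumption about how queries are stored.

```python
import re

QUESTION_WORDS = [
    "are", "can", "can't", "could", "couldn't", "did", "didn't", "do",
    "does", "doesn't", "how", "if", "is", "isn't", "should", "shouldn't",
    "was", "wasn't", "were", "weren't", "what", "when", "where", "who",
    "whom", "whose", "why", "will", "won't", "would", "wouldn't",
]


def build_question_regex(words):
    """Anchor at the start, join the words with pipes inside parentheses,
    and require a whitespace character after the word."""
    return "^(" + "|".join(words) + r")\s"


pattern = build_question_regex(QUESTION_WORDS)
matcher = re.compile(pattern)

print(len(pattern))  # 178, the character count quoted in the article
print(bool(matcher.search("can't connect tv to wifi")))  # True
print(bool(matcher.search("cantilever tv mount")))       # False
```

Keeping the words in a list and generating the string means the readable version is the one you maintain, while the pasted version is always derived from it.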

This works, but it can get tedious to maintain.

Below, Wu simplified the previous list of words to be more regex-friendly and shorter, which is ideal for copying and pasting. Maintaining it this way also helps with efficiency.

[Image: table of question words alongside their compressed regex versions]

In the first column are the normal words and in the second column, the compressed regex.

For instance, the word “can” uses the compressed version can('t)?.

The question mark indicates that anything within the parentheses is optional. The compressed syntax lets you cover both “can” and “can’t.”

More interestingly, you can do this with could/couldn’t, should/shouldn’t and would/wouldn’t, where the -ould part of the words is the common base, like (c|sh|w)ould(n't)?. This short string covers all six of those cases.

While simplifying that long list of words made the string less readable, it fits more into the regex field and is easier to copy and paste.

  • ^(are|can('t)?|(c|sh|w)ould(n't)?|did(n't)?|do(es)?(n't)?|how|if|is(n't)?|was(n't)?|were(n't)?|wh(at|en|ere|y)|who(m|se)?|will|won't)\s

If you go a step further, you can compress it even more. In this case, Wu reduced the character count from about 135 to about 113 characters.

  • ^(are|can('t)?|how|if|wh(at|en|ere|y)|who(m|se)?|will|won't|((c|sh|w)ould|did(n't)?|do(es)?|was|is|were)(n't)?)\s
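Compressing a pattern by hand makes it easy to drop a word or a pipe, so it can help to check each shorter version against the original word list. The sketch below is one way to do that in Python; the helper name is hypothetical, and the compressed strings are written with a pipe between the wh(...) and who(...) groups.

```python
import re

WORDS = [
    "are", "can", "can't", "could", "couldn't", "did", "didn't", "do",
    "does", "doesn't", "how", "if", "is", "isn't", "should", "shouldn't",
    "was", "wasn't", "were", "weren't", "what", "when", "where", "who",
    "whom", "whose", "why", "will", "won't", "would", "wouldn't",
]

LONG = "^(" + "|".join(WORDS) + r")\s"
COMPRESSED = (
    r"^(are|can('t)?|(c|sh|w)ould(n't)?|did(n't)?|do(es)?(n't)?|how|if|"
    r"is(n't)?|was(n't)?|were(n't)?|wh(at|en|ere|y)|who(m|se)?|will|won't)\s"
)
SHORTEST = (
    r"^(are|can('t)?|how|if|wh(at|en|ere|y)|who(m|se)?|will|won't|"
    r"((c|sh|w)ould|did(n't)?|do(es)?|was|is|were)(n't)?)\s"
)


def uncovered(pattern, words):
    """Return the words that the pattern fails to match at the start of a query."""
    rx = re.compile(pattern)
    return [w for w in words if not rx.search(w + " example")]


for name, pat in [("long", LONG), ("compressed", COMPRESSED), ("shortest", SHORTEST)]:
    # An empty list means every word in WORDS is still covered.
    print(name, len(pat), uncovered(pat, WORDS))
```

The check only confirms that nothing was lost; the compressed versions also accept a few extra forms (such as “don't”), which is usually acceptable for this kind of filter.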

Regular expressions can get really complicated. If you’re getting a regex string from someone else and need to work out what’s doing what, you can use Regexper to help you visualize it.

[Image: Regexper visualization]

Below you’ll see a comparison of the different regex string versions. It’s easier to maintain the first one, and clearly harder to maintain and read the last one.

[Image: regex list in simplified and compressed versions]

But sometimes character count really does matter, especially when you have longer regular expressions.

The regex filter limit for GSC is 4,096 characters, according to Google Search Advocate Daniel Waisberg.

That may seem like quite a bit. However, if you have an ecommerce site and have to add domains, subdomains or longer directories, you’ll most likely hit that limit.
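If you generate long patterns from lists of terms, a small length check can flag a string before GSC rejects it. The following is a rough sketch under the assumption that the terms contain only letters and spaces (so nothing needs escaping); the function names and sample terms are hypothetical.

```python
GSC_REGEX_LIMIT = 4096  # filter length limit cited in the article


def build_alternation(terms):
    """Join unique terms into one alternation group, e.g. (a|b|c)."""
    unique = sorted(set(terms))
    return "(" + "|".join(unique) + ")"


def fits_gsc_limit(pattern, limit=GSC_REGEX_LIMIT):
    """Return True if the pattern is short enough for the GSC regex filter."""
    return len(pattern) <= limit


# Hypothetical terms, for illustration only.
terms = ["samsung", "samsang", "sam sung", "samsung"]
pattern = build_alternation(terms)

print(pattern)
print(len(pattern), fits_gsc_limit(pattern))
```

Listing every variant this way grows linearly with the number of terms, which is why the compressed patterns discussed in this article matter for larger sites.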

Regex branded queries

Another instance where you may start hitting the regex character limit in GSC is when you use it for branded queries.

[Image: Samsung branded query misspellings]

When you think about all the different types of misspellings of a brand name that a person could type, you’ll quickly run into that 4,096-character limit. For instance:

  • aamaung, damsung, mamsang, sam sung, samaung, samdung, samesung, sameung, samgsung, samgung, samsang, samsaung, samsgu, samshgg, samshng, samsing, samsnug, samssung, samsu, samsuag, samsubg, samsubng, samsug, samsumg, samsumng, samsun g, samsunb, samsund, samsund, samsunh, samsunt …

This is where understanding regex helps. With this string, you can capture the brand name “samsung” along with misspellings:

  • (s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)

A lot of times, people will misspell the middle parts of the word. But generally, they get the shape and length right, and you can approach your syntax this way.

For brand query misspellings, consider the following:

  • Main letters that make up the brand query.
  • Consonants.
  • Letters surrounding hard consonants.

[Image: Samsung misspellings regex]

In red are the hard consonants that people usually don’t miss when they’re typing in a brand name. These are the main letters that make up that particular brand. For “samsung,” that’s the “s” at the start, the “m” in the middle, and then the “n” and “g” at the end.

The blue letters surrounding these main consonants on the keyboard are the ones people usually mistype. In the example, around “s,” you see the “a,” “d” and “z.” (While the layout is different for international keyboards, the concept is still the same.)

The regex string above captures all the possible variants of “samsung.”

The other major trick here is in [a-z\s]{1,4}.

In regex form, this basically says, “I want to match any letter from ‘a’ to ‘z,’ or a space, one to four times.”

This captures all those weird misspellings that can happen in the middle of a brand query, where a person can accidentally hit the same key multiple times or press space.

Additionally, the brand name is a certain length (“samsung” has seven characters). People likely won’t end up typing 20–50 characters.

So in this regular expression, we’re guessing that between “s” and “m” in “samsung,” someone’s going to mistype 1–4 characters. And then from “m” to “g” at the end, they’ll mistype 1–6 characters, with spaces included.

Adding all this lets you capture the many variations of a branded query comprehensively.

The other thing to note is that the brand name could appear in different parts of the query.

[Image: brand query position]

So we need to make sure that the brand name itself is captured. It should either be:

  • At the beginning of the query.
  • In the middle of the query (thus surrounded by spaces).
  • Or at the end of the query.

The regular expression for this is as follows:

  • (^|\s)(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)(\s|$)

This captures all queries where the brand name “samsung” is either at the start, middle or end.

  • Start of string = ^
  • Surrounded by spaces = \s
  • End of string = $
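To see how the position-aware brand pattern behaves, you can try it on a handful of queries. The sketch below uses Python’s re module as a stand-in for RE2 (an assumption that holds for this syntax); the function name and the sample queries are hypothetical.

```python
import re

# Brand pattern from the article, with the start/space/end boundaries.
BRAND = re.compile(
    r"(^|\s)(s+|a|d|z)[a-z\s]{1,4}m?[a-z\s]{1,6}(m|u|n|g|t|h|b|v)(\s|$)"
)


def is_branded(query):
    """Return True if the query matches the brand-misspelling pattern."""
    return BRAND.search(query.lower()) is not None


# Hypothetical queries, for illustration only.
for q in [
    "samsung tv",         # brand at the start
    "buy samsang phone",  # misspelling in the middle
    "best tv samsumg",    # misspelling at the end
    "sam sung",           # accidental space
    "iphone case",        # no match expected
    "sony tv",            # also matches: the pattern is deliberately loose
]:
    print(q, is_branded(q))
```

Because the middle of the pattern accepts almost any short run of letters and spaces, some unrelated queries (such as “sony tv” here) also match, so it is worth spot-checking the results before treating everything as branded.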

JC Chouinard’s post, Regular Expressions (RegEx) in Google Search Console, dives even deeper into regex examples.

Regex and the GSC API in action

Regular expressions came in handy for Wu and his team when they worked with a client that encountered traffic drops following a core update.

After looking at the ecommerce site’s different issues, they discovered that the problem resided in some product detail pages.

They needed to segment page types for analysis in GSC. But this was a complex task because of the different URL structures for U.S. and international products.

[Image: site URL structure]

The site’s international product URLs included language and country codes, while U.S. product URLs did not.

Even using regex syntax was tricky because letters and dashes exist in the product slug, categories and subcategories. Additionally, they needed to filter out the international product URLs to capture only U.S. pages.

To get all U.S. product landing and detail pages (not i18n pages), they came up with the following regex strings:

Include: /([^/]+/){1,2}p?

Exclude: /[a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}/ 

Here’s a breakdown:

[Image: regex for U.S. pages only]

The team wanted to match the category, the subcategory and all the products, so they included:

  • Any character that’s not a slash = [^/]+
  • 1 or 2 directories = /){1,2}
  • Sometimes followed by a product slug = p?

A caret (^) usually means the start of the string. But when it’s inside brackets (as in [^/]), it indicates a negation (i.e., “not anything inside these brackets”).

So this string, /([^/]+/){1,2}p?, means “I want any number of characters that aren’t a slash, leading up to a slash (which denotes the directory), and sometimes followed by the letter ‘p’ (the prefix for product slugs).”

At the same time, the team didn’t want to match the country and language combination, which also contained letters and dashes, so they excluded:

  • Any 2-letter directory = [a-zA-Z]{2}
  • 2-letter + 2-letter lang-country combo = [a-zA-Z]{2}-[a-zA-Z]{2}

Creating a regular expression to match all the language and country codes individually would be tedious because of all the possible combinations, so they couldn’t approach this the way they did for informational queries (where every single variation was spelled out).
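The include and exclude logic can be tried locally before sending anything to the API. The sketch below is an illustration only: it works on URL paths rather than full URLs, the sample paths are invented, and the exclude pattern is a grouped variant anchored to the first directory, which is an assumption about the intent (as printed, the alternation would split into “/xx” or “xx-xx/”).

```python
import re

# Include pattern as given in the article.
INCLUDE = re.compile(r"/([^/]+/){1,2}p?")

# Assumption: group the two alternatives and anchor them to the first
# directory so that only a leading /xx/ or /xx-yy/ segment is excluded.
EXCLUDE = re.compile(r"^/([a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2})/")


def is_us_product_path(path):
    """True if the path matches the include pattern and not the exclude one."""
    return bool(INCLUDE.search(path)) and not EXCLUDE.search(path)


# Hypothetical paths, for illustration only.
paths = [
    "/televisions/oled/p-55-inch-oled",        # U.S. product page
    "/televisions/",                           # U.S. category page
    "/fr-ca/televisions/oled/p-55-inch-oled",  # language-country prefix
    "/de/televisions/",                        # 2-letter prefix
]

for p in paths:
    print(p, is_us_product_path(p))
```

A side effect of this approach is that any genuine two-letter category directory would be excluded as well, so the patterns need to be checked against the site’s real URL structure.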

But even after creating these regex strings, they had a problem.

In Google Search Console, there’s only one field to paste a regex string. You’ll have to choose either Matches regex or Doesn’t match regex; you can’t use both at the same time.

[Image: GSC regex filter]

This is where the GSC API came in handy, as it allows combining regex filters.
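As a rough illustration of combining both filters, the Search Analytics query method accepts a group of filters, and its documented operators include includingRegex and excludingRegex. The sketch below only builds the JSON request body; the dates, dimensions and function name are placeholders, and no request is sent (authentication is handled separately).

```python
import json


def build_query_body(start_date, end_date, include_regex, exclude_regex):
    """Build a Search Analytics query body with an include and an exclude
    regex filter on the page dimension, combined in one filter group."""
    return {
        "startDate": start_date,
        "endDate": end_date,
        "dimensions": ["date", "page"],
        "dimensionFilterGroups": [
            {
                "groupType": "and",
                "filters": [
                    {
                        "dimension": "page",
                        "operator": "includingRegex",
                        "expression": include_regex,
                    },
                    {
                        "dimension": "page",
                        "operator": "excludingRegex",
                        "expression": exclude_regex,
                    },
                ],
            }
        ],
        "rowLimit": 25000,
    }


# Placeholder dates; the patterns are the ones quoted in the article.
body = build_query_body(
    "2023-05-01",
    "2023-05-31",
    r"/([^/]+/){1,2}p?",
    r"/[a-zA-Z]{2}|[a-zA-Z]{2}-[a-zA-Z]{2}/",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the API console’s request body or into a client such as Postman or Paw.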

In the Google Search Console API documentation, there’s a Try it now link.

[Image: GSC API documentation]

Once clicked, it will open up a console that lets you select a site and make your API request through the web view.

[Image: GSC API console]

But to better manage API queries, Wu recommends using Postman on the desktop or Paw (which is native to Mac).

Postman lets you create queries and save them for later. And if you have access to other sites, you don’t have to create a new query each time. You simply swap out the site name with a variable and then make multiple requests.

[Image: Postman]

Paw, on the other hand, is much easier to look through and use.

[Image: Paw]

To access the API, you’ll need to get your API keys. (Here’s a helpful tutorial from Chouinard.)

Once you get this information, you’ll have your client ID and client secret, which you’ll add to your OAuth 2.0 authentication within either Postman or Paw.

[Image: API OAuth]

From there, you’ll be able to sign in with your normal account.

Wu mainly made GSC API requests using the regex strings in Paw. The query is entered in the middle of the interface.

[Image: Paw API request]

The response from Google is similar to that of the GSC API web view. The data can then be exported for processing.

Since the data is in JSON, the information can be messy and hard to read.

[Image: JSON]

For this, you can use a free and open-source command-line JSON processor called jq to pretty-print the information.

[Image: JSON with jq]

The data is not that useful until you get it into a spreadsheet. Pipe the file you’ve exported from Paw into jq. Open it and then iterate over each row, saving each element so you can output them to a CSV.

[Image: JSON rows to iterate]

Here, you’ll need to convert clicks and impressions, which are floats (numbers with a decimal point). Both need to be converted into strings compatible with a CSV.

jq will then output the following much simpler format.

[Image: JSON end file format]

Next, you’ll use Dasel to take this format and turn it into a CSV.

And here’s the end result.

[Image: spreadsheet of GSC data]
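If you would rather not chain command-line tools, the same row-by-row conversion can be approximated in a short script. This is a sketch in Python, not the jq and Dasel workflow Wu used; the sample response is invented, though it follows the rows/keys/clicks/impressions shape that the Search Analytics API returns.

```python
import csv
import io
import json

# Hypothetical response, for illustration only.
SAMPLE_RESPONSE = """
{
  "rows": [
    {"keys": ["2023-05-01", "https://www.example.com/televisions/"],
     "clicks": 12.0, "impressions": 340.0, "ctr": 0.0353, "position": 8.2},
    {"keys": ["2023-05-02", "https://www.example.com/televisions/"],
     "clicks": 9.0, "impressions": 298.0, "ctr": 0.0302, "position": 8.9}
  ]
}
"""


def rows_to_csv(response_text, dimensions):
    """Flatten each row's keys plus clicks and impressions into CSV text."""
    data = json.loads(response_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(list(dimensions) + ["clicks", "impressions"])
    for row in data.get("rows", []):
        # Clicks and impressions arrive as floats; write them as whole numbers.
        writer.writerow(
            list(row["keys"]) + [int(row["clicks"]), int(row["impressions"])]
        )
    return out.getvalue()


print(rows_to_csv(SAMPLE_RESPONSE, ["date", "page"]))
```

The dimensions passed to rows_to_csv should be in the same order as the dimensions in the API request, since the keys array follows that order.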

What’s great for Wu’s team is that they were able to use the Google Search Console API and regular expressions to:

  • Filter out all the international queries and look at just the U.S., where they were having the main issues.
  • Identify the days the site was having issues.

Watch: Getting the most out of the Google Search Console API

Below is the complete video of Wu’s SMX Advanced presentation.


About the author

Angel Niñofranco

Angel Niñofranco is Senior Content Editor at Third Door Media, specializing in editing content from Search Engine Land and MarTech’s rosters of subject-matter experts. She has over five years of combined editorial and marketing experience in the digital publishing industry, specializing in content editing, copywriting and email marketing. Prior to joining Third Door Media, Angel worked with the editorial and marketing teams of Search Engine Journal in various roles, most notably as project editor and email marketing manager.
