Today we announced an improvement to our search volume accuracy in our US database. This post explains the technical details of how we did it.
How to Measure Search Volume Accuracy
To make our search volume prediction algorithms as accurate as possible, we first needed a way to measure whether we were on target.
To achieve that, we needed to:
- Choose a source of volume data that would be as close to real volume as possible and use it as the benchmark value
- Clean the data from the selected source to avoid irrelevancies and junk
- Make sure the selection of keywords had an even distribution of low-volume queries (long-tail keywords), high-volume queries, and medium-volume queries
After we validated the selection of keywords, we ran our study to see how Semrush compared to Moz, Ahrefs, Serpstat, Sistrix, and Google Keyword Planner when it came to providing accurate search volumes.
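As a rough illustration of how such a comparison can be scored, here's a minimal Python sketch that computes each tool's mean absolute percentage error (MAPE) against the benchmark. The tool names, numbers, and the choice of MAPE are illustrative assumptions, not a description of our exact methodology:

```python
import pandas as pd

# Illustrative comparison: one row per keyword, with the benchmark volume
# derived from GSC impressions and each tool's reported volume.
# All names and numbers here are hypothetical.
df = pd.DataFrame({
    "keyword":   ["keyword_a", "keyword_b", "keyword_c"],
    "benchmark": [12_400, 880, 33_100],
    "tool_x":    [14_800, 720, 27_100],
    "tool_y":    [9_900, 1_300, 40_500],
})

def mape(predicted: pd.Series, actual: pd.Series) -> float:
    """Mean absolute percentage error: lower means closer to the benchmark."""
    return float(((predicted - actual).abs() / actual).mean() * 100)

for tool in ("tool_x", "tool_y"):
    print(f"{tool}: MAPE = {mape(df[tool], df['benchmark']):.1f}%")
```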
How We Chose the Benchmark Data Source
After more than 50 interviews with experienced SEOs, the consensus was clear: experts believe the most accurate source of search volume data is Google Search Console (GSC).
Because our panel was so confident, and because GSC contains real data coming directly from Google, we agreed that GSC would work well as our benchmark. Although there is no “Search Volume” metric in GSC, there is something close: impressions.
We used this metric with reservations because, as Google’s own documentation notes, impressions are not the same as volume. An impression counts “how often someone saw a link to your site on Google. Depending on the result type, the link might need to be scrolled or expanded into view.”
While impressions and volume are different, there are instances where they’re similar.
If the position of your domain is immediately visible (without scrolling, on desktop or mobile results) to everyone who enters the query, then impressions will equal volume in most cases.
100 impressions from a visible position ≈ 100 total searches.
With this relationship, we can say impressions are a valid source of reference Search Volumes for a comparison study.
Filtering Data from GSC and Preparing the Keyword Sample
A number of our users kindly agreed to share their anonymized GSC data with us for the comparison study. We ended up with a set of URL-keyword-average position bindings, as one would see in the Pages report of GSC.
Since not every binding had an average position guaranteed to be visible (top 3), we couldn’t use every keyword for our comparison and had to clean up the data first.
To clean up the dataset, we removed:
- Keywords for which the URLs had an average position in GSC outside of the top three, leaving only URLs with the highest chance of being immediately visible in the SERP
- Commercial and transactional keywords that contained so many ads on the SERP that the organic results weren't immediately visible
- Other keywords whose SERP layout didn’t show organic positions on the visible area of a user’s screen, desktop or mobile, before scrolling
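As a minimal sketch of this cleanup, the three rules above reduce to a single filter over the raw bindings. The column names and the above-the-fold visibility flag are assumptions for illustration, not our production pipeline:

```python
import pandas as pd

# Hypothetical raw bindings: URL / keyword / average position, enriched with
# a SERP-layout flag gathered separately (True when at least one organic
# result is visible without scrolling, i.e. not pushed below the fold by
# ads or other SERP features).
raw = pd.DataFrame({
    "url": ["a.example/p1", "b.example/p2", "c.example/p3", "d.example/p4"],
    "keyword": ["kw one", "kw two", "kw three", "kw four"],
    "avg_position": [2.1, 1.4, 3.0, 5.7],
    "organic_above_the_fold": [True, False, True, True],
})

clean = raw[
    (raw["avg_position"] <= 3)          # rule 1: only top-3 average positions
    & raw["organic_above_the_fold"]     # rules 2-3: organic results immediately visible
]
print(clean)
```

In practice, the visibility flag would come from analyzing each keyword’s SERP layout, which is why the second and third rules collapse into a single condition here.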
Ensuring an Even Distribution of Keyword Characteristics Within the Sample
In the previous stage, we collected a sample of 1M keywords, from which we had to select 10,000 keywords for research. To make this final sample unbiased and accurate, we needed to ensure an even distribution of characteristics.
We fine-tuned the sample to contain equal proportions of:
- Keywords from different groups of volumes (5 buckets from low to high volume)
- Keywords with different word counts, topics, intents, and other parameters
For example, we divided the volumes into five ranges of monthly impressions and took an equal number of keywords from each:
- 1 to 100
- 101 to 1,000
- 1,001 to 10,000
- 10,001 to 100,000
- 100,001+
We did the same for the rest of the parameters, dividing the sample into equal ranges.
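As an illustration of this bucketing step, here’s a hedged Python sketch using synthetic data; the pool size, column names, and log-uniform impression distribution are all assumptions made so the example runs on its own:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical stand-in for the keyword pool: keywords with a roughly
# log-uniform spread of monthly impressions (real GSC data is similarly skewed).
pool = pd.DataFrame({
    "keyword": [f"kw_{i}" for i in range(100_000)],
    "impressions": (10 ** rng.uniform(0, 6, size=100_000)).astype(int),
})

# Bucket boundaries follow the five impression ranges listed above.
bins = [1, 101, 1_001, 10_001, 100_001, float("inf")]
labels = ["1-100", "101-1,000", "1,001-10,000", "10,001-100,000", "100,001+"]
pool["bucket"] = pd.cut(pool["impressions"], bins=bins, labels=labels, right=False)

# Draw an equal number of keywords per bucket: 10,000 total / 5 buckets = 2,000.
final_sample = pool.groupby("bucket", observed=True).sample(n=2_000, random_state=42)
print(final_sample["bucket"].value_counts())
```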
Finally, we made sure that 10,000 keywords is a sufficient sample size. With the same distribution of keywords across the parameters above, a larger set still produced the same results.
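One way to run such a check (sketched below with purely synthetic numbers; the metric is a stand-in for the real per-keyword error) is to recompute the accuracy metric on progressively larger samples and watch it stabilize:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic per-keyword errors standing in for |tool volume - benchmark| / benchmark.
errors = pd.Series(rng.lognormal(mean=0.0, sigma=0.5, size=100_000))

# If the aggregate metric stops moving as the sample grows,
# the smaller sample is already large enough.
for size in (2_500, 5_000, 10_000, 20_000, 50_000):
    print(f"n={size:>6}: mean error = {errors.sample(n=size, random_state=1).mean():.4f}")
```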
The process described above allowed us to create an unbiased, uniform sample that accurately reflects the real quality and coverage of each tool.
We repeated the comparison for several months in a row during the development of the new algorithm and received the same results each time, confirming the algorithm’s stable performance.
We liked the results of the comparison so much that we added a regular quality check of our databases to our data collection pipeline. Now, with monthly updates, we’re confident that we’re delivering the best volume data we can.