Shopify and the Never-Ending Pages Problem

Written by Nikki Kettell

You might have heard of, or even been affected by, the Shopify spam attack that hit eCommerce sites using the platform last year, and we certainly saw our fair share of challenges! With that in mind, we’ve put together a breakdown of what happened, how to detect the problems, and what we did to fix the issue, in the hope that it can help other Shopify users.

Why Shopify?

As of 2023, almost 4.5 million websites use Shopify across more than 175 countries, so it’s easy to see why it’s the preferred option for so many eCommerce websites. That same popularity also makes it an appealing target for hackers, spam bots and general internet riff-raff.

What happened?

In the latter half of 2022, a huge wave of Shopify websites started noticing disturbing anomalies in their data:

  • Websites that had a few hundred or a few thousand URLs noticed Google was trying to index tens of millions of URLs overnight
  • Websites that had spent time, money, and resources carefully and ethically optimising their content were suddenly ranking for NSFW, gambling and other spam keywords 
  • Backlink profiles that were once so natural suddenly reported an influx of spam keywords 

So, were these sites hacked? Technically no. 

If this is starting to sound familiar or you’re worried about your Shopify website, a distinction in terminology is important here. In this context, a hack means someone (or something) forced its way into your website’s CMS. This problem was not the result of a hack. 

If you’re worried your Shopify website has been hacked, the easiest way to check is by logging into it and checking the actual URLs (the Pages, Collections, Products and Blogs). If the new pages that Google is trying to index, as shown in your Google Search Console account, aren’t in the backend of your CMS, you haven’t been hacked.
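For a large site, that comparison is easier to do with a script than by eye. The following is a minimal Python sketch, assuming you have exported one list of URLs from Google Search Console and one from your store's backend. The function name and the example.com URLs are hypothetical and only there for illustration.

```python
from typing import Iterable, List


def urls_missing_from_cms(gsc_urls: Iterable[str], cms_urls: Iterable[str]) -> List[str]:
    """Return the Search Console URLs that do not exist in the CMS export.

    Trailing slashes and surrounding whitespace are ignored when comparing.
    """
    known = {url.strip().rstrip("/") for url in cms_urls}
    return [url for url in gsc_urls if url.strip().rstrip("/") not in known]


# Hypothetical exports, for illustration only.
cms_export = [
    "https://example.com/collections/boots/",
    "https://example.com/pages/about",
]
gsc_export = [
    "https://example.com/collections/boots",
    "https://example.com/collections/vendors?q=casino",
]

for url in urls_missing_from_cms(gsc_export, cms_export):
    print(url)
```

Any URL this prints is one Google has found that was never created in the CMS, which points towards the exploit described here rather than a hack.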

Unfortunately, terminology is just semantics at this point. You might not have to deal with a data or security breach but you’re likely going to see your organic traffic (and probably revenue) hit. 

For the sake of naming this problem, spam bots are exploiting a vulnerability in how pages build out in Shopify. For this reason, we’ve been calling this problem a Shopify Exploit – it’s much less alarming than ‘hack’ or ‘spam’ and elicits less panic. 

What exactly is happening? 

On a Shopify website, a query string can be added to the end of any URL: ?q=

To begin with, the most common URLs we were seeing Google trying to index began with:

You can add anything after the = and the page should still load, for example:

This functionality exists to allow multiple variations of the same content to load. For example, if you’re using filters or page sort options. 

If you’ve got a page of walking boots, users might like to sort by price, view only items in a size 8, filter by specific brands, and so on. With even a few filter options, the same page could build out hundreds of times with the same content. Because these are dynamic URLs rather than static pages, Google shouldn’t be trying to index them (so no pesky duplicate content problems), and it shouldn’t get lost in a rabbit warren and potentially overlook important content.

What these spam attacks were doing was exploiting this functionality and adding their own keywords after the query string. For many websites, lots of these appendages were appearing in non-Latin characters. These pages were then sending out backlinks – check out the below to see some of the anomalous appendages we were seeing:
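One rough way to screen a URL export for appended keywords like these is sketched in Python here. It is a heuristic under stated assumptions, not the method we used: it only looks at the q parameter, it treats any non-Latin letter as suspicious (which would misfire on a store that legitimately sells in a non-Latin language), and the function names and example.com URLs are hypothetical.

```python
import unicodedata
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def query_value(url: str, param: str = "q") -> Optional[str]:
    """Return the decoded value of `param` in the URL, or None if it is absent."""
    values = parse_qs(urlsplit(url).query, keep_blank_values=True).get(param)
    return values[0] if values else None


def has_non_latin_letters(text: str) -> bool:
    """True if any letter in `text` belongs to a script other than Latin."""
    return any(
        ch.isalpha() and "LATIN" not in unicodedata.name(ch, "")
        for ch in text
    )


def is_suspicious(url: str) -> bool:
    """Heuristic only: flag URLs whose ?q= value contains non-Latin letters."""
    value = query_value(url)
    return value is not None and has_non_latin_letters(value)


# Hypothetical URLs; the second carries percent-encoded non-Latin text.
for url in (
    "https://example.com/collections/vendors?q=boots",
    "https://example.com/collections/vendors?q=%E5%8D%9A%E5%BD%A9",
):
    print(url, is_suspicious(url))
```

A check like this will not catch spam written in Latin characters, so it is best treated as a first pass before reviewing the flagged URLs manually.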

OK, so why is this a problem?

For our clients, we noticed two main problems:

1. A huge influx in indexed pages

For a website that usually has fewer than five thousand URLs, Google suddenly started indexing millions.

This alone would send alarm bells ringing in Google’s algorithm. For a site to grow this much this quickly, it can only be through malicious circumstances. We’re now reliant on Google to understand our website is the victim in all this and not the aggressor.

2. A huge increase in backlinks

These backlinks were not natural or procured purposefully by the websites impacted: 

What was the impact?

We noticed new rankings (as well as clicks and impressions) for unrelated keywords. Most were NSFW or gambling related – not ideal. After conducting some analysis on other Shopify websites, we saw the same pattern. 

At the same time, many clients started seeing their real organic keywords start to drop. This resulted in organic traffic and revenue following suit. 

Google claims it’s smart enough to identify spam links. This is obviously not the case as we saw countless legitimate ecommerce websites ranking for completely irrelevant terms the site was never optimised for. 

How did we fix this problem?

Our first port of call was the support team at Shopify who told us this wasn’t something they could or would be inclined to fix. Not overly helpful (!), but at least we knew where we stood.

Problem 1 was the huge increase in indexed pages, so the aim was to help Google get them out of the index. The first step here was to try and fix the problem at the source – for any URL trying to load content after vendors?q= , we needed to tell Google to ignore it. 

Canonicalising these URLs back to the parent URL was the easiest way to achieve this.

Adding the canonical tag: <link rel="canonical" href="" />
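To show what "back to the parent URL" means in practice, here is a minimal Python sketch that strips the query string from a URL and renders the matching canonical tag. It is an illustration only: on a live store the tag is output by the theme template rather than by Python, and the function names and example.com URL are hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def parent_url(url: str) -> str:
    """Drop the query string and fragment, keeping scheme, host and path."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def canonical_tag(url: str) -> str:
    """Render a canonical link element that points at the parent URL."""
    href = escape(parent_url(url), quote=True)
    return f'<link rel="canonical" href="{href}" />'


# Hypothetical spam URL, for illustration only.
print(canonical_tag("https://example.com/collections/vendors?q=casino"))
```

Whatever is appended after ?q=, every variation ends up pointing at the same parent URL, which is the signal we want Google to pick up.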

The next step was to back this up with the URL removal tool in Google Search Console, which is not an ideal tool: it fixes the symptom rather than the cause, and only temporarily at that. In this case though, we were hoping Google’s algorithm would have caught up by the time the removal expired.

For one client, we saw non-brand rankings start to climb back within 24 hours of submitting this request. 

Problem 2 – We need to get rid of those links. 

Since none of these spam bots were kind enough to leave a contact email, asking them to remove their links proved problematic. This left us with a Disavow submission as our only real course of action. This was also the suggestion made by the support team at Shopify when we contacted them. 

A Disavow file is a .txt file submitted to Google containing all the backlinks (or linking domains) you don’t want associated with your website. Google advises caution using this tool and it should only be used by those that know what they’re doing, although any SEO who lived through the great Penguin rollout of 2012 will know this tool well! 

A Disavow file isn’t admitting fault or telling Google you’ve been collecting spam links, it’s more of an insurance file. If you’re worried you’ve got enough spam links pointing to your website that they could damage your rankings, you’re telling Google you didn’t ask for them and not to count them. 
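As a rough illustration, the Python sketch below builds the text of a Disavow file from a list of spam backlinks. It relies on the format Google documents for the file (one URL or one domain: entry per line, with lines starting with # treated as comments) and disavows at domain level. The function name, the note text and the .example domains are hypothetical, and the list of links would come from your own backlink export.

```python
from typing import Iterable
from urllib.parse import urlsplit


def build_disavow(backlinks: Iterable[str], note: str = "") -> str:
    """Return Disavow file text with one 'domain:' line per linking host."""
    hosts = {urlsplit(link).hostname for link in backlinks}
    domains = sorted(host for host in hosts if host)  # skip unparseable links
    lines = [f"# {note}"] if note else []
    lines += [f"domain:{domain}" for domain in domains]
    return "\n".join(lines) + "\n"


# Hypothetical spam backlinks, for illustration only.
spam_links = [
    "http://Spam.example/page-a",
    "https://spam.example/page-b",
    "https://casino.example/offers",
]

print(build_disavow(spam_links, note="Spam links found in monthly review"))
```

Disavowing whole domains is a deliberate choice in this sketch, on the assumption that a site producing spam links has no legitimate links worth keeping. Review the output before uploading it, as a wrong entry here can discount links you do want.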

Until this Shopify exploit is rectified, Disavows should become part of monthly maintenance. 

And, what are the results?

It’s still early days. For some clients, we’re still seeing these spam pages appearing via other routes:

  • Sitting off the top level domain
  • Coming from the internal search:

So in those cases, we’re following the same steps as above.

However, we are starting to see the volume of pages not indexed outweigh those that are indexed. What’s even more encouraging is we’re seeing the largest increase in reasons for not indexing coming from ‘Crawled – currently not indexed’ which is coming from Google Systems. 

This means Google is deciding for itself these pages shouldn’t be indexed rather than because of an onsite directive (like noindex or canonical tags). This suggests Google’s algorithm is adapting and compensating accordingly. 

Were you impacted by the exploit? We’d love to hear how you handled it, and are on hand to help if you need advice and support. 

If you want to discuss this or any other Shopify SEO questions such as optimising navigation using data analysis, or leveraging data from search boxes, get in touch with the Vuzo team.
