Inside a Hacked SEO Backlink Network
On Tuesday of last week, we got a notification from Google stating that a client website was hacked. What I didn’t realize at the time was that this was one of the craziest SEO hacks I’ve seen in a very long time.
These types of hacks are extremely common on the interwebs, especially on WordPress sites. The hacks usually play out something like this:
Attackers scan the open web for IP addresses hosting a certain framework. In this case they were looking for WordPress sites; however, I also found Magento sites, custom sites, and a handful of other frameworks. One of the most impressive aspects of this hack was that multiple different frameworks got hacked, not just WordPress (or Joomla, etc.).
From there, the hackers bulk scan the targeted sites for ones that have outdated frameworks and plugins. They then look up known exploits for each site and use them to gain access to the system. Most of the time these are SQL injections.
The hacked link network looks something along the lines of this:
A scaled version of the hacked SEO link network we uncovered
To illustrate just how easily this can be done, here is a step-by-step YouTube video of an attacker gaining access to a WordPress website in about 3 minutes. Crazy, right?
Once they gain access, attackers have different goals for what to do with the hacked site. Sometimes they are politically motivated and put up a landing page for their hacking group. Other times you won’t even notice anything is wrong, because they are looking to expand their botnet and use your server as a node in their attack system.
Other times, as in this case, they are blackhat SEOs looking to use your website as a giant source of link juice, either to sell links or to boost their own affiliate sites. A secondary motivation for this hack might have been to collect credit card numbers from potential customers, but I don’t have any evidence of that.
Analysis of the hack
For starters, the attacker uploaded 14,000 files to the server, which happened to be a GoDaddy server. 95% of those files were HTML files that contained a semi-functioning “eCommerce-like” page.
Most of the content was in an Asian language that I determined to be Japanese via Google Translate:
Using Google Translate to determine what language the hacked network is in.
You can see here the homepage of another hacked site on this same network is running Magento (with major problems)
Attackers often (though not in this case) leave the front end of the website intact
The attacker uploaded HTML files to the inner pages of the hacked sites. Unless the webmaster was browsing around the inner directories of the server, they probably wouldn’t even notice they had been hacked.
Meanwhile, the backend of the website contains 1000’s of hacked pages
Thanks to some careful Linux hacking, I grepped the list of files and regex’d them into a list of filenames.
Using Linux command line to mass edit the file names
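As a rough sketch of that filtering step (not the exact commands used here, which aren't shown), the same idea can be expressed in a few lines of Python. The sample listing, the directory names, and the `html_paths` helper are all made up for illustration.

```python
import re

# Hypothetical recursive file listing; the real listing from the
# hacked server is not reproduced in this post.
SAMPLE_LISTING = """\
./wp-content/uploads/2015/index.php
./wp-content/cache/shop/item-1001.html
./wp-content/cache/shop/item-1002.html
./wp-includes/js/jquery.js
./media/catalog/page_77.HTML
"""

# Match lines ending in .htm or .html (any case), dropping a leading "./".
HTML_FILE = re.compile(r"^(?:\./)?(?P<path>.+\.html?)$", re.IGNORECASE)


def html_paths(listing: str) -> list[str]:
    """Return only the HTML file paths found in a file listing."""
    paths = []
    for line in listing.splitlines():
        match = HTML_FILE.match(line.strip())
        if match:
            paths.append(match.group("path"))
    return paths


if __name__ == "__main__":
    for path in html_paths(SAMPLE_LISTING):
        print(path)
```

On a real server you would feed it the output of a recursive listing instead of the sample string.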
Once I had that list, I uploaded all the files to a sanitized and isolated VPS that I could play around on. That server is still live for the time being at:
Mirror of hacked server: http://18.104.22.168 (as of 2017, no longer mirrored)
This allowed me to freely browse through the files on the internet without risking the integrity of the client’s server.
From there I loaded up my trusty old friend Scrapebox 2.0 and used the addon “link extractor.” The 64-bit version of this tool in 2.0 burned through this list in just a few seconds.
Scrapebox 2.0 extracted all of the links from the live server
I loaded the list from the link extractor, trimmed to root, removed duplicates, and was left with a nice list of 239 websites that our hacked sites linked out to.
From our server alone, we located 239 other potentially hacked websites
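If you don't have Scrapebox handy, the "extract links, trim to root, remove duplicates" workflow can be approximated with the Python standard library. This is only a sketch of the idea, not what Scrapebox does internally; the sample HTML, the `.example` domains, and the `root_domains` helper are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def root_domains(html: str) -> list[str]:
    """Extract absolute links, trim each to its root URL, and de-duplicate."""
    collector = LinkCollector()
    collector.feed(html)
    roots = set()
    for link in collector.links:
        parts = urlsplit(link)
        # Relative links stay on the same site, so skip them.
        if parts.scheme in ("http", "https") and parts.hostname:
            roots.add(f"{parts.scheme}://{parts.hostname}/")
    return sorted(roots)


# Hypothetical page content standing in for one of the uploaded HTML files.
SAMPLE_HTML = """
<a href="http://shop-one.example/item/1.html">A</a>
<a href="http://shop-one.example/item/2.html">B</a>
<a href="https://Shop-Two.example/cart?id=9">C</a>
<a href="/local/page.html">D</a>
"""

if __name__ == "__main__":
    for root in root_domains(SAMPLE_HTML):
        print(root)
```

Running this over every uploaded HTML file and merging the results would give a root-domain list comparable to the one described above.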
At this point I came to a realization: it was not just my client’s site that was hacked; this was a fairly clever link network. I had assumed that all hacked sites just pointed to some money site, but I found that my hacked site pointed to other hacked sites, and those hacked sites pointed to still other hacked sites.
View source on any of the hacked HTML pages and you’ll see all sorts of outbound links to all sorts of different sites, with the same exact footprint:
For example, have a look at the source of the content area of this site:
A view of the source code revealed tons of outbound links to other hacked sites, which linked to the main money site.
I found hundreds of other hacked sites on this “network” all with 100’s or 1000’s of HTML files uploaded to their server. With each infected server I scanned, I found more and more money sites.
Each hacked website had anywhere from 100-2000 HTML files uploaded to it.
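One way to reason about a network like this is as a directed graph: each scanned hacked site is a node with outbound links, and money-site candidates are the domains that receive links but were never found hosting spam pages themselves. The sketch below uses invented data and a hypothetical `rank_link_targets` helper. It is an assumption about how you might automate the triage, not how the money sites in this post were identified.

```python
from collections import Counter

# Hypothetical scan results: hacked site -> root domains it links out to.
OUTBOUND = {
    "hacked-a.example": {"hacked-b.example", "money-1.example"},
    "hacked-b.example": {"hacked-c.example", "money-1.example"},
    "hacked-c.example": {"hacked-a.example", "money-2.example", "money-1.example"},
}


def rank_link_targets(outbound: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Rank domains that receive links but are not known hacked sites.

    Returns (domain, inbound_link_count) pairs, most-linked first.
    """
    inbound = Counter()
    for source, targets in outbound.items():
        for target in targets:
            if target != source:
                inbound[target] += 1
    # Anything we scanned as a hacked site is excluded from the candidates.
    candidates = {d: n for d, n in inbound.items() if d not in outbound}
    return sorted(candidates.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for domain, count in rank_link_targets(OUTBOUND):
        print(f"{domain}\t{count}")
```

A domain on this list could also just be a hacked site that hasn't been scanned yet, so the output is a shortlist to check by hand rather than a verdict.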
I did a few random “spot checks” on some of these domains. Some of them have been hacked since mid-2014, but many of them were hacked recently.
At one point I thought it might be worth a long shot to do a WHOIS lookup and see if I could correlate any of the registrant details to peg the owner (not that I’d out them or anything), but of course the real money sites all had private WHOIS.
Looking up WHOIS website owner information of the hacked sites to let them know their sites were hacked.
One of the brilliant things about this “link network” is that it is both interdependent and independent at the same time. All sites rely on each other to help boost their link portfolio, but if one of them runs a malware scan and deletes the files, it doesn’t affect the other sites in the network.
Here is one example of what one of the hacked URLs looked like. On the surface it is a fully functioning eCommerce website, but in reality it is just a rendered HTML page with all links pointing to the “money site.” Any attempt to add an item to a cart, etc., will simply lead you to the money site.
Uploaded the hacked files to a quarantined server for further analysis
Most of the money sites had pretty much the same stack:
Money site looks identical to the hacked site
Again, about 90% or so of the money sites were in Japanese and on Japanese hosts.
In the end, I identified these 6 domains as the main “money sites” and really the ones responsible for this widespread attack:
(hell, no they don’t get a link)
Most of these domains have a very low domain authority, with a few exceptions.
Basic Mistakes by the Hacker
While I was somewhat impressed that this attacker was motivated enough to break into all of these systems, they made a ton of mistakes and left a huge footprint.
Dear hacker – check your robots.txt and next time pick a directory that Google is actually allowed to crawl.
Next time check robots.txt files so Google can actually crawl the directory your links are pointing to.
Hackers forgot to check robots.txt file – sorry, no link juice for you!
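For site owners, the same check is easy to run yourself: Python's standard `urllib.robotparser` can tell you whether Googlebot is allowed to crawl a given URL. The robots.txt content, the `victim.example` domain, and the `/cache/` directory below are hypothetical, since the post doesn't name the blocked directory.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file and directory names are assumptions.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cache/
Disallow: /tmp/
"""


def googlebot_can_fetch(robots_txt: str, url: str) -> bool:
    """Return True if the given robots.txt lets Googlebot crawl the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("Googlebot", url)


if __name__ == "__main__":
    for url in (
        "http://victim.example/cache/shop/item-1.html",
        "http://victim.example/about.html",
    ):
        verdict = "crawlable" if googlebot_can_fetch(SAMPLE_ROBOTS_TXT, url) else "blocked"
        print(f"{url}: {verdict}")
```

Pages under a disallowed directory come back as blocked, which is why links placed there may never pass any value.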
Brought to you in part by: trackback spam
That’s right: trackbacks, one of the oldest and dirtiest methods for quick spam.
Some of the money sites had “ok” authority, most of them did not.
Motivation for this hack
When I first encountered this breach, I thought for sure the motivation was phishing for credit card numbers. Within a few minutes I discovered how many other infected sites there were, and figured it was a link network to boost a money site.
After spending an hour or so analyzing the sites involved, I now believe that this network is not only a way to boost the attacker’s money site, but that they may also be using this “network” as a way to sell links.
What really stumped me in the end is that these sites really aren’t that great. They don’t appear to be getting that much traffic, they don’t seem to be ranking that well, and a lot of their links aren’t showing up (maybe due to robots.txt).
Informing the other hacked sites
Listen, I’m not the type of person to stand in the way of someone doing blackhat SEO. If that is your game, fine. I do have a problem, however, with hacking websites, especially ones owned by small business owners trying to make a living.
In this case I’m going to make one attempt at notifying these sites that they are hacked via their public email. If they respond, I’ll treat it as a lead for our business or point them in the right direction. If they don’t, that’s all I can do.