While I’m only aware of one way to skin a cat, there are a number of ways to find and clean up 404 errors on your website. While we aren’t huge fans of SEO tools for the most part, there is one that we highly recommend for finding broken links on a website, and that tool is Xenu Link Sleuth.
Xenu is one of those tools that has been around forever and that everyone uses. PC Magazine voted it the “fastest link checking software,” which should say a lot.
One very common misconception is that 404s will hurt your site or earn you some sort of penalty. Google debunked this myth back in 2011 in an official blog post, and it was reiterated this year by Google’s new unofficial spokesperson and Webmaster Trends Analyst, Gary Illyes.
But don’t get too comfortable. Just because Google won’t penalize you doesn’t mean broken links are good for business. After all, most people hate landing on a broken link or 404. A lot of times this will actually cause visitors to “bounce,” both figuratively and literally, from your website.
We’ve been blogging since 2010 on this website and over the years have linked out to hundreds if not thousands of different websites and blogs. As months turn into years, many of those websites disappear or change their address by:
- shutting down completely
- getting shut down through litigation
- forgetting to pay hosting bill
- forgetting to pay domain bill
- restructuring skeleton / sitemap of website
- changing to HTTPS
- so much more
While one or two broken links aren’t really a big deal, on a large website such as this one, which is approaching 1,000 indexed pages, they can really start to add up. The process of finding and fixing broken links is simple, but the devil is in the details. In this tutorial, I’ll show you how I was able to fix hundreds of broken links on our website in just a few hours:
What You’ll Need
In order to carry out this process you’ll need:
- a copy of Xenu Link Sleuth (free)
- a code editor such as Notepad++
- access to your website or CMS
- FTP credentials
- knowledge of HTML / CSS
- a few hours at least
- a full backup of your website
- a lot of patience for repetitive tasks
If you feel confident in your abilities, proceed with caution.
Beginning the Broken Link Audit
Skill Level: Intermediate. We don’t consider this a beginner task because many things can go wrong. You need some knowledge of servers and websites, plus strong attention to detail. With that said, proceed with caution.
Launch Xenu the way you would any Windows app. Enter your URL in the top input and hit “OK.” You can ignore the other options for now and go back and experiment with them later if you want.
Warning: Keep in mind that Xenu can really tax your web host and your ISP. We used to joke around in our office and call it our own “DDoS attack tool” because it would actually take down some of our customers’ websites.
Anyway, you’ll see it start to run. Watch and learn. Take note of anything that turns “red” and familiarize yourself with the columns. If you’d like you can use the waiting time to do a mini title tag and meta description audit, another feature of Xenu. You can sort any column by clicking on the column title. I sort most of my columns either by “name” or by “status.”
Once Xenu is done running, it’ll politely ask you for your FTP details. This will allow the tool to do a deep scan for orphan files.
As I said earlier, try to take note of anything interesting and look for patterns. In our case, we had a small outage around November / December of 2011 which caused us to lose some images. This was sadly reflected in the tool, which reported a number of “not founds” throughout our site. We’ll fix those later.
I was running this tool on a virtual machine with not a lot of RAM. If it freezes up do not abort unless it freezes for 10+ minutes. A lot of times the tool will hit a bottleneck and you’ll just have to wait it out.
Once finished the tool will minimize and a report will open up in your web browser. Do a File > Save as so you can save this report for later.
I like to save incremental versions of my reports just in case I ever want to go back.
OK, now down to the nitty-gritty. One by one, go to each page on your website that the report flags and seek out the broken link. In this case, the example shows a broken image.
I double-check each URL to make sure the tool didn’t misfire. In this case, the image was in fact “not found” and needed to be fixed.
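If you would rather script that double-check than paste every URL into a browser, here is a rough Python sketch using only the standard library. The function names and the User-Agent string are made up for illustration, and this is not how Xenu itself works; it simply asks the server again and reports what came back.

```python
import urllib.error
import urllib.request


def describe_status(code):
    """Turn an HTTP status code (or None) into a short label."""
    if code is None:
        return "unreachable"
    if 200 <= code < 300:
        return "ok"
    if code in (404, 410):
        return "not found"
    return "other"


def check_url(url, timeout=10):
    """Fetch url and return (status_code, final_url).

    urllib follows redirects on its own, so a successful result describes
    the final destination. status_code is None when no HTTP response arrives.
    """
    # Hypothetical User-Agent value; some servers reject requests without one.
    request = urllib.request.Request(url, headers={"User-Agent": "link-recheck-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code, url
    except OSError:
        # DNS failure, refused connection, timeout and similar.
        return None, url
```

Calling check_url on a reported address and passing the status to describe_status gives a quick second opinion. Be gentle with it: looping over hundreds of URLs without a pause can tax a server the same way Xenu does.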
Thanks to Xenu, I was able to find the image located inside of an old blog post.
The next step is to pop open the source code and fix the error. In this case, since it was a very unpopular post, I just deleted the image reference.
Another alternative would be to re-create the image in Photoshop or find an alternative on the web and drop it back in via FTP at the 404’d path.
Sometimes the tool will report multiple 404s on CMSs like WordPress due to crazy URL parameters, when in actuality there is only a single 404.
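One way to see how many distinct pages are really behind a pile of parameter-laden 404s is to strip the query string and group what is left. A minimal Python sketch follows; the example.com URLs are invented, and replytocom is just one WordPress parameter that tends to show up in reports like this.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def group_by_page(urls):
    """Map each parameter-free URL to the reported URLs that share it."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_query(url)].append(url)
    return dict(groups)


# Made-up sample of reported 404s.
reported = [
    "https://example.com/old-post/?replytocom=12",
    "https://example.com/old-post/?replytocom=57",
    "https://example.com/old-post/",
    "https://example.com/images/logo.png",
]
pages = group_by_page(reported)
print(len(reported), "reported URLs,", len(pages), "distinct pages")
```

Here four reported URLs collapse into two pages to fix.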
In another case, Xenu reported a “not found” error because of a wacky URL parameter at Mashable.com. Even though the site redirected properly, I still fixed it because that is best practice.
Other blogs change their entire permalink structure for their own reasons. Again, it is best practice to fix these. Google will love you.
Xenu will also find “not found” errors caused by 301 redirects. There have been a ton of these this year due to all these websites switching to SSL / HTTPS. If you only have a few of these you can fix them manually. If you are feeling adventurous you can fix these in bulk using a few different tools.
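Before deciding between manual and bulk fixes, it helps to sort the redirects into plain http to https upgrades and links that moved somewhere else entirely. Here is a small Python sketch that does this, assuming you already have each link as written plus the address it finally lands on. The function names and the three labels are my own illustration, not anything Xenu produces.

```python
from urllib.parse import urlsplit


def strip_www(host):
    """Lowercase a hostname and drop a leading 'www.' if present."""
    host = (host or "").lower()
    return host[4:] if host.startswith("www.") else host


def classify_redirect(linked_url, final_url):
    """Label a redirect as 'unchanged', 'https upgrade', or 'moved'."""
    if linked_url == final_url:
        return "unchanged"
    old, new = urlsplit(linked_url), urlsplit(final_url)
    same_host = strip_www(old.hostname) == strip_www(new.hostname)
    # Treat an empty path and "/" as the same page.
    same_page = (old.path or "/", old.query) == (new.path or "/", new.query)
    if same_host and same_page and old.scheme == "http" and new.scheme == "https":
        return "https upgrade"
    return "moved"


print(classify_redirect("http://moz.com/blog", "https://moz.com/blog"))
print(classify_redirect("http://example.com/old", "https://example.com/new"))
```

Links labeled "https upgrade" are the safe candidates for a bulk find and replace. Anything labeled "moved" deserves a look by hand.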
Since we are an SEO blog and link out to Moz.com a lot, we found hundreds of links to Moz.com. Because they switched to SSL / HTTPS last year, this triggered a number of “not founds” on our report. We could go to each post and edit it manually, but since our posts live in a database we can run a find and replace.
Please, please, please be careful doing this and make sure you back up your database. The code below will only work with WordPress. Again, use at your own risk.
A short SQL query can replace every “http://moz.com” with “https://moz.com”.
You’ll also have to fix any www vs. non-www URLs, just an FYI.
Here is the MySQL query I ran to find and replace http with https on our own server, with example.com standing in for the domain being updated.
UPDATE wp_posts SET post_content = REPLACE(post_content, 'http://example.com', 'https://example.com');
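If you want to see what a query like this would touch before pointing it at a live database, you can rehearse it on throwaway data. The Python sketch below uses the built-in sqlite3 module as a stand-in for MySQL purely so it runs anywhere. The wp_posts table and post_content column come from the query above, while the sample rows and the function name are invented.

```python
import sqlite3


def rehearse_replace(rows, old, new):
    """Run the find and replace on an in-memory copy and report what changed."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_content TEXT)")
    conn.executemany("INSERT INTO wp_posts (ID, post_content) VALUES (?, ?)", rows)

    # Count the affected posts first so the size of the change is known up front.
    affected = conn.execute(
        "SELECT COUNT(*) FROM wp_posts WHERE instr(post_content, ?) > 0", (old,)
    ).fetchone()[0]

    # Same REPLACE() shape as the MySQL query, with the URLs passed as parameters.
    conn.execute(
        "UPDATE wp_posts SET post_content = REPLACE(post_content, ?, ?)", (old, new)
    )
    result = conn.execute("SELECT ID, post_content FROM wp_posts ORDER BY ID").fetchall()
    conn.close()
    return affected, result


# Made-up sample posts: one non-www link, one www link, one with no links.
SAMPLE_POSTS = [
    (1, '<a href="http://moz.com/blog">Moz</a>'),
    (2, '<a href="http://www.moz.com/">Moz</a>'),
    (3, "No links here."),
]
affected, result = rehearse_replace(SAMPLE_POSTS, "http://moz.com", "https://moz.com")
print("posts affected:", affected)
for post_id, content in result:
    print(post_id, content)
```

Only the first post changes. The www link in the second post is untouched, which is why the www and non-www variants each need their own pass.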
This is by no means a difficult task, but it can be time-consuming. If you are a savvy user, you can use SQL queries to fix a lot of other URLs as well.
Once I am done with a 404 audit, I’ll generally go back and run it again. A lot of times I’ll have missed a few, but I also want to keep a record of what the report looks like when the site is running well.
In this case we were able to clean up a ton of broken links. All in all, we fixed over 1,000. I would say over 50% of these were http to https fixes, and a large portion were 301s.
Once a website starts to get large, a lot of issues start to pop up, and fixing broken links is just one of them.