Leveraging the Lynx browser
In my opinion, viewing your site in lynx is one of the most powerful SEO tools at your disposal.
For those who don’t know, lynx is a console-based web browser that has been around forever. Google actually recommends that you look at your website in Lynx because that is pretty much how most search engine spiders see your site. To load your website in Lynx, simply fire up any Linux command prompt and type:
lynx example.com
From there, simply use the up and down arrows to move between links and Enter to follow one.
If you can pretty much navigate your site unencumbered then you are on the right track.
This is where truly descriptive image names and alt text come in. If you name all of your images “SEO blog, SEO blog,” etc., you are going to be out of luck trying to figure out what the images are. If you name your images something more reasonable, such as “green grassy meadow on a sunny day,” your navigation experience will be much better.
Like most Linux commands, you can pipe lynx’s output to other commands. This command, for instance, pipes Lynx into “wc” (word count), which counts the number of words in the document:
lynx -dump https://www.example.com | wc -w
or just save the rendered text to a file:
lynx -dump "https://about.com" > lynx_example_about.txt
or dump the source to a file:
lynx -source "https://www.example.com/" > dump_source_file.txt
On that note, the ‘lynx -dump’ option is pretty handy. It dumps the rendered page to standard output, which you can redirect to a file for later viewing, as in the examples above.
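One variation worth knowing, assuming your build of lynx supports the -listonly option, is dumping just the links on a page, which gives you a quick list of every URL lynx found:

lynx -dump -listonly "https://www.example.com/" > links.txt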
In short, get familiar with the lynx browser. As an SEO any site you care about should look decent in lynx. At the very least it is worth opening up your site in Lynx.
The cURL command
The curl tool basically allows you to grab information from the web via the console. It is a low-level web browser of sorts, but it lets you do really cool things like pretend to be a different user agent and pull / push information from the command line.
This curl command, for instance, basically says: “Load this imgur.com photo, but pretend you are coming from Twitter, and show me the header information.”
curl -I -H 'Referer: https://twitter.com/' 'https://i.imgur.com/ZKfUroW.png'
Using the -v flag runs curl in verbose mode, which prints details about the request and response, including the headers curl sends and receives. There are a ton more options you can pass to curl; see its man page for the full list.
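As a quick sketch of the user-agent trick mentioned above, the -A flag lets you send any user-agent string you like. Here it fetches just the headers of a page (example.com stands in for your own site) while identifying with Googlebot’s published user-agent string:

curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "https://www.example.com/"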
As an SEO, the cURL command can help with questions and issues like these:
- seeing what our site looks like when the request comes from X referrer
- following and troubleshooting redirect chains (see the example after this list)
- diagnosing form errors
- sending info via FTP
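For the redirect case, a minimal sketch looks like this (the /old-page path is just a placeholder). The -L flag tells curl to follow redirects, -I prints the headers for each hop, and -s hides the progress meter, so you can see every step in the chain:

curl -sIL "https://www.example.com/old-page"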
I’ve recently started playing with Twitter’s API and the cURL command is coming in very handy with that as well!
AB or Apache Benchmarking
This is somewhat of an advanced tool that I don’t have a lot of experience with, but it is great if you are running a LAMP server (or just an Apache server) and need a good idea of how fast and efficiently your server is running.
This command (with parameters), for instance, runs ab against amazon.com and sends 500 requests at the server (-n 500), 100 of them at a time (-c 100 sets the concurrency). The tool will give you info while the command is running, and then spit out statistics at the end.
ab -n 500 -c 100 https://www.amazon.com/
You can get some more info from the Apache website about the ab command.
‘grep’ yourself before you wreck yourself
The grep command is in my opinion one of the most powerful command line tools ever written. More or less, it is an advanced “find” tool but that just does not do it justice. I won’t go into a full grep tutorial here, but rather just give a few examples on how an SEO can use it.
For instance, if you are inside your Apache log directory (the location varies depending on which flavor of Linux you are running) and run this command:
grep "Googlebot" *.log
It will sift through all of the log files for any instance of “Googlebot.” The beauty of this command is that you literally get to see every time (or almost every time) that Google visits your site. If you want to get crafty, you can then pipe that to other commands for formatting, or out to a CSV for later analysis, as in the sketch below.
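Here is a rough sketch of that idea, assuming the default combined log format (where the fourth whitespace-separated column is the timestamp and the seventh is the requested URL):

grep "Googlebot" access.log | awk '{print $4 "," $7}' | tr -d '[' > googlebot_hits.csv

Each line of googlebot_hits.csv ends up as a timestamp plus the URL Googlebot requested.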
You can even fire up a virtual terminal and create a “real time app” of sorts to monitor Googlebot traffic to your site in real time. Just run:
tail -f access.log | grep Googlebot
Tail is a neat little program, similar to head, that outputs the end of a file instead of the beginning; the -f flag keeps the file open and prints new lines as they are written, which is what makes the live monitoring work.
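Both are handy for quick spot checks of a log, for example:

# first 20 lines of the log
head -n 20 access.log
# last 20 lines of the log
tail -n 20 access.log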
Finding 404s with grep
There are hundreds, perhaps even infinite, ways to find 404s on a Linux system, but this is the one I am most familiar with:
grep "404" access.log | awk '{print $7 }' | uniq -c | sort -n
The grep command finds lines containing 404, awk strips out just the requested URL (the seventh column), the first sort groups identical URLs together so that uniq -c can count them, and the final sort -n orders the results by hit count. Pretty cool, right?
I should note that this may produce false positives, since grepping for “404” will also match lines where that string shows up somewhere other than the status code (in a URL or a byte count, for example), but it will do for simple demonstration purposes.
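If you want something more precise, one common approach, assuming the combined log format where the status code is the ninth column, is to match on the status field itself with awk:

awk '$9 == 404 {print $7}' access.log | sort | uniq -c | sort -rn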
Also note that you must be inside Apache’s log directory, and this assumes your log file is named “access.log,” which is the default on a lot of systems.
If you just want to keep it simple you can do something like:
grep "404" access.log > 404s.txt
Which will dump the raw, unsorted list of matching lines into a text file.
There are plenty of other things you can do as an SEO with grep on an Apache server.
Awk this way
Awk is a Linux command (actually considered a language in its own right) that allows you to parse tabular data (separated by spaces, tabs, etc.). I have very limited knowledge of awk and have only used it on log files, but it is a very powerful tool. The best way to get started with awk is to open a file and number the columns: column 1, column 2, and so on. Each column is available inside awk as $1, $2, etc.
Bots can really wreak havoc on your server, your traffic, and can be a large nuisance at times. One quick way to see what kind of bot traffic is hitting your Apache server is to search your log files for it:
awk -F\" '{print $6}' access.log | uniq -c
Or output to a file
awk -F\" '{print $6}' access.log | uniq -c > file.txt
Or sorted and output to a file:
awk -F\" '{print $6}' access.log | uniq -c
The result is a list of user-agent strings with a count of how many requests each one made.
With this knowledge, you are now armed with enough data to assemble a proper robots.txt file.
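As a rough sketch, once you have identified a crawler you would rather keep out (the “BadBot” user-agent below is just a placeholder), you could append a rule for it to robots.txt straight from the shell:

cat >> robots.txt <<'EOF'
# hypothetical example: block a specific crawler seen in the logs
User-agent: BadBot
Disallow: /
EOF

Keep in mind that only well-behaved bots honor robots.txt.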
If the bots do not identify themselves properly you can ban IP addresses or menacing IP ranges in your firewall.
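To track down which addresses those are, a quick sketch that counts requests per client IP (the first column in the default log format) looks like this:

awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20

The top of that list is usually where the menacing traffic lives.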
“du” you know how to find hard drive hogs?
One of my favorite commands to run on a shared Linux server or VPS is the “du” command. Fire up your shell prompt, change into your user’s home directory, and run:
du -Sh
This command prints the size of each directory (-S leaves out the space used by subdirectories, and -h makes the sizes human-readable). From there you can find the source of many issues, including:
- accounts that might be using more space than you think
- what specific files / folders are taking up the most space
- accounts that could be compromised
- so much more
I generally like to run this anytime I am sitting in front of a /home directory:
du -Sh /home/
The above command will size up the /home directory for troublesome directories.
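du does not sort its output on its own, so to bring the biggest directories to the top you can pipe it through sort (on most Linux systems, GNU sort’s -h flag understands the human-readable sizes that du -h prints):

du -Sh /home/ | sort -rh | head -20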
Thanks y’all
I got a little more in-depth with this article than I intended to, but I hope you found some of these commands useful.
If you don’t have a Linux machine to work with I highly recommend getting a cheap VPS from Digital Ocean which can be had for $5.
If you have a Mac, you can also mess around with most of these commands in the Terminal, since macOS ships with a Unix command line.
As always, be careful!