Have you ever gone to bed happy and calm, then woken up to missed calls and texts from your boss or clients saying that the website you manage is down? That's the worst way to wake up. But it happens.
If you are like me, you'll skip your favorite breakfast and, still in your pajamas, boot your laptop, connect to the internet and open the website to verify. Alas, it's down indeed. Where do you start? This has happened to me several times, so let me share my personal experience.
Is it down for me or everyone else?
That's a very good question, and in fact there's a whole website, https://downforeveryoneorjustme.com/, dedicated to answering it within seconds. There is a chance that the website is down only for you, for various reasons. Perhaps DNS lookups for the website are failing because new DNS records have not yet propagated to your Internet Service Provider (ISP). Maybe you are using a CDN like Cloudflare, which may be having an outage in specific regions like yours. Whatever the reason, you want to verify whether the website is down globally, for everyone.
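If the checker says the site is up but you still can't reach it, a quick way to test the DNS theory is to resolve the domain from your own machine. Below is a minimal sketch using Python's standard socket.getaddrinfo. An empty result only tells you that your resolver failed, not that the site is down for everyone.

```python
from __future__ import annotations

import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses this machine's resolver finds for hostname.

    An empty list means the lookup failed from *this* machine, which hints at a
    local or ISP DNS problem rather than a global outage.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

For example, resolve("your-website.com") (a placeholder domain) should return the same addresses your hosting dashboard shows; if it returns an empty list while the online checker reports the site as up, suspect DNS on your side.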
Ping the website
Next, check whether the server hosting the website is online. Ping is a powerful command-line tool available on Mac, Windows and Linux. On Windows, open Command Prompt (CMD); on Mac or Linux, open the terminal. Then type “ping your-website.com”. A response such as “64 bytes from ec2-18-233-80-179.compute-1.amazonaws.com (18.233.80.179)” means the server is reachable. If you get responses such as “Destination Host Unreachable” or no response at all, the server hosting the website may be down. Keep in mind that some hosts and firewalls block ping (ICMP) traffic, so silence alone is not conclusive. If the server does seem to be down, this is the time to contact your hosting provider if you are on shared hosting, or to log in to your hosting dashboard to check the status of your Virtual Private Server.
If, however, you can ping the server, you can largely rule out hardware, network, DNS and domain name issues. The problem is most likely with the services responsible for keeping the website up.
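If you want to run this check from a script, for example before moving on to the next steps, one option is to call the system ping command and look at its exit status. This is a sketch that assumes ping is installed and on the PATH; the only OS difference it handles is the packet-count flag (-n on Windows, -c on macOS and Linux). On Linux and macOS, ping exits with 0 when at least one reply arrived, but Windows ping can exit with 0 even for "Destination Host Unreachable", so treat the result as a hint there.

```python
from __future__ import annotations

import platform
import subprocess


def ping_command(host: str, count: int = 4) -> list[str]:
    """Build the ping invocation for the current OS."""
    # Windows uses -n for the number of echo requests; macOS and Linux use -c.
    flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", flag, str(count), host]


def is_reachable(host: str, count: int = 4, timeout: float = 30.0) -> bool:
    """Return True if ping exits with status 0 (a reply arrived, on Linux/macOS)."""
    try:
        result = subprocess.run(
            ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing or not executable, or it ran longer than we allow
        return False
    return result.returncode == 0
```

Remember the caveat above: a host that blocks ICMP will make is_reachable return False even while the website itself is serving pages.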
Check HTTP status code
If the server is up, then check the HTTP status code that the website is returning. Whenever you visit a website through a browser, the server responds with an HTTP status code. You can read the complete list on Wikipedia, but here are the common ones.
- Those in the 2xx range indicate that the request succeeded and the website is up: whatever you requested from the server is available and was understood. 200 OK is the most common response; it's what the server sends back to the browser for most successful requests.
- Those in the 3xx range usually indicate a URL redirection. For instance, 301 Moved Permanently is commonly used to redirect users from a plain http:// URL to the corresponding secure https:// version.
- Those in the 4xx range indicate an error on the user's (client's) side. The famous 404 Not Found belongs here; it happens when you request a web page the server doesn't have.
- Those in the 5xx range are used when the server is aware that it has encountered an error or is otherwise incapable of performing the request. The famous 500 Internal Server Error belongs here.
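You can also fetch the status code from a script instead of a browser. The sketch below uses only Python's standard library: status_class maps a code to the ranges above, and get_status fetches a URL (https://your-website.com would be a placeholder). Note that urlopen follows redirects on its own, so a 301 is reported as the status of the page it points to.

```python
from __future__ import annotations

import http.client
import urllib.error
import urllib.request


def status_class(code: int) -> str:
    """Map an HTTP status code to the 2xx/3xx/4xx/5xx ranges."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "unknown"


def get_status(url: str, timeout: float = 10.0) -> int | None:
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses raise HTTPError, which still carries the code
        return err.code
    except (OSError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or a malformed response
        return None
```

A None result is itself informative: it means nothing answered with HTTP at all, which points at the web server or network rather than your application code.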
Pay attention to errors in the 4xx range, but mostly to those in the 5xx range. The most common 4xx error I see is 403 Forbidden, which indicates that the request was valid but the server is refusing to act on it. This is usually caused by file permission issues on the server. Don't be tempted to just change the permissions to 777, which makes files readable, writable and executable by everyone, because this opens your server up to more vulnerabilities. Instead, set appropriate file permissions on your website folder.
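Before fixing permissions, it can help to find what is currently too open. This sketch walks a website folder and lists files and directories that any user on the server can write to (the "other" write bit that 777 sets). The folder you pass in, such as /var/www/html, is an assumption about your setup.

```python
from __future__ import annotations

import os
import stat


def world_writable(root: str) -> list[str]:
    """Return paths under root (including root itself) that any user can write to."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        candidates = [dirpath] + [os.path.join(dirpath, name) for name in filenames]
        for path in candidates:
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # permission bits on symlinks are not meaningful
            if mode & stat.S_IWOTH:
                hits.append(path)
    return sorted(hits)
```

A common convention for web content is 755 for directories and 644 for files, but the right values depend on which user your web server runs as, so treat that as a starting point rather than a rule.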
In the 5xx range, 500 Internal Server Error is a very generic error message indicating that something is wrong with your code or server configuration. It is not straightforward to debug, so consider several points. Check that your Apache or Nginx configuration is valid; “nginx -t” and “apachectl configtest” check the configuration syntax. Check the error logs in /var/log/ for anything meaningful. If the server configuration is fine, debug your website code. Again, focus mostly on the server error logs, because that is where the errors are written.
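Error logs can be long, so it helps to pull out only the most recent serious entries. This sketch keeps the last few lines that mention an error-level keyword. The keywords follow Nginx's severity names (error, crit, alert, emerg), which Apache also uses; the paths /var/log/nginx/error.log and /var/log/apache2/error.log are typical Ubuntu defaults and may differ on your server.

```python
from __future__ import annotations

from collections import deque


def recent_errors(
    log_path: str,
    max_lines: int = 20,
    keywords: tuple[str, ...] = ("error", "crit", "alert", "emerg"),
) -> list[str]:
    """Return the last max_lines lines of log_path mentioning any keyword (case-insensitive)."""
    # A bounded deque keeps only the newest matches while streaming the file.
    matches: deque[str] = deque(maxlen=max_lines)
    with open(log_path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                matches.append(line.rstrip("\n"))
    return list(matches)
```

Reading the log needs the same privileges as viewing it in the terminal, so you may have to run the script with sudo.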
502 Bad Gateway is another common error. It indicates that the server was acting as a gateway or proxy and received an invalid response from the upstream server (that is, your app). If you are using Nginx as a reverse proxy in front of your website or app, this is a very common error when there's an error in your app code or the app is not running. It is also common if you are using a CDN like Cloudflare; in that case the problem is usually not with Cloudflare but with your website.
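A quick way to narrow down a 502 is to ask whether anything is listening where the proxy expects your app to be. The sketch below tries a plain TCP connection to the app's address. The host 127.0.0.1 and port 8000 are hypothetical defaults; use whatever address your Nginx proxy_pass directive points at.

```python
import socket


def upstream_listening(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def explain_502(host: str = "127.0.0.1", port: int = 8000) -> str:
    """Suggest where to look next when the proxy returns 502 Bad Gateway."""
    if upstream_listening(host, port):
        return f"{host}:{port} accepts connections; check the app's own logs and the proxy settings"
    return f"nothing is listening on {host}:{port}; the app is down or bound to another port"
```

If the app is listening but the proxy still returns 502, the app is probably crashing or misbehaving on individual requests, so its own logs are the next stop.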
Check your server resources
Another common reason for website downtime is that your server has run out of compute resources, which include:
- CPU usage
- RAM usage
- Storage space
If your CPU is maxed out at 100%, the web server can't spawn more threads or processes to answer user requests in time. The same applies to RAM. If you are on shared hosting, your website is at the mercy of your hosting provider to allocate resources accordingly. If you are running on your own Virtual Private Server (VPS), then you are in control of your server resources.
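For a quick, scriptable view of CPU pressure, one rough measure is the load average divided by the number of cores. This is a sketch for Unix-like servers only, since os.getloadavg does not exist on Windows. On Linux the load average also counts processes waiting on disk I/O, so it is not a pure CPU figure.

```python
import os


def cpu_pressure() -> float:
    """Return the 1-minute load average divided by the number of CPU cores.

    Values near or above 1.0 mean every core is busy and new work is queuing.
    """
    one_minute, _five, _fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return one_minute / cores
```

A value that stays well above 1.0 for several minutes is a good sign the server needs more CPU, or that something is eating it.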
I recommend htop for taking a snapshot of your resource usage. htop is an interactive alternative to the Linux top command with a better visual display of resources. You can install it on Ubuntu by simply running “sudo apt-get install htop”. Below is a sample output of htop:
Take note of the percentage usage of each CPU core; the example above shows two cores, numbered 1 and 2. Memory (RAM) is on the next row, followed by swap space.
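On Linux, the numbers behind htop's memory and swap rows come from /proc/meminfo, so you can read them in a script too. This sketch computes used RAM from MemTotal and MemAvailable (available since Linux 3.14) and used swap from SwapTotal and SwapFree. htop's own calculation differs slightly, so expect roughly comparable rather than identical percentages.

```python
from __future__ import annotations


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: value}, values in kB where a unit is given."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def memory_report(text: str) -> dict[str, float]:
    """Return used RAM and used swap as percentages."""
    info = parse_meminfo(text)
    ram_used = 100 * (1 - info["MemAvailable"] / info["MemTotal"])
    swap_total = info.get("SwapTotal", 0)
    # Servers without swap report SwapTotal as 0; avoid dividing by zero.
    swap_used = 100 * (1 - info.get("SwapFree", 0) / swap_total) if swap_total else 0.0
    return {"ram_used_pct": round(ram_used, 1), "swap_used_pct": round(swap_used, 1)}
```

On the server itself you would pass in the file's contents, for example by opening "/proc/meminfo" and calling memory_report on what you read. Heavy swap use next to high RAM usage usually means the server is short on memory.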
For hard disk space, I recommend the command “df -h”, which lists all mounted partitions and their current space usage.
Pay particular attention to the root file system or partition which is mounted on “/”. If this gets filled up, that’s it. Your server will come to a grinding halt.
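The same root-partition check can run from a script, for instance on a schedule. This sketch uses Python's shutil.disk_usage and formats sizes in powers of 1024, the way df -h does. The 90% warning threshold is an arbitrary example. The percentage may differ slightly from df's Use% column, because df leaves out blocks reserved for the root user.

```python
import shutil


def human(num_bytes: float) -> str:
    """Format a byte count in powers of 1024, similar to df -h."""
    for unit in ("B", "K", "M", "G", "T"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f}{unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f}P"


def root_disk_report(path: str = "/", warn_at: float = 90.0) -> str:
    """Summarize disk usage for the filesystem containing path."""
    usage = shutil.disk_usage(path)  # named tuple of total, used, free (bytes)
    used_pct = 100 * usage.used / usage.total
    line = f"{path}: {human(usage.used)} used of {human(usage.total)} ({used_pct:.0f}%)"
    return line + ("  <-- running out of space!" if used_pct >= warn_at else "")
```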
If your website regularly gets busy enough to exhaust these resources, it's time to upgrade your hosting plan or buy more server resources.
Is the web server running?
At the heart of every website is a web server that listens for client requests and responds to them with your website content. The two most popular web servers are Apache and Nginx. Apache is the grandfather of web servers and runs many CMS-based websites, such as WordPress sites. Personally, I am moving more towards Nginx these days because of its simplicity.
How do you check whether the web server is running? You can use a tool called telnet. Simply run “telnet localhost 80” on the server to check whether a service is listening on port 80, the default port for HTTP. If you get a response similar to the one below, the web server is running.
You can also use the lsof command: “sudo lsof -i :80” lists the processes using port 80.
If the web server is not running, you won't get an HTTP status code at all.
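The telnet check can also be scripted: open a TCP connection to port 80, send a minimal HTTP request by hand, as you might type it into telnet, and read the first line of the reply. This is a sketch that speaks plain HTTP only; checking port 443 would need TLS, which it does not attempt.

```python
from __future__ import annotations

import socket


def http_status_line(host: str = "localhost", port: int = 80, timeout: float = 5.0) -> str | None:
    """Send a bare HEAD request and return the reply's first line, e.g. 'HTTP/1.1 200 OK'.

    Returns None if nothing is listening or the connection fails.
    """
    request = f"HEAD / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(request)
            reply = conn.recv(1024)
    except OSError:
        return None
    if not reply:
        return None
    # The status line is everything before the first CRLF.
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

A None result matches the "no HTTP status code at all" situation: nothing answered on that port, so start by checking whether Apache or Nginx is running.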
Check database connection
Another common culprit is the database. Almost every website, especially those based on a CMS like WordPress, connects to a database to store content and user preferences. The database is a single point of failure, and many websites go down because they can't reach it.
There are several reasons why your website can't connect to the database. Some of these include:
- Incorrect login details
- A corrupt database
- The database host is down
- Too much web traffic to your website
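Before digging into credentials or corruption, it helps to know whether the database port is reachable at all, and if not, how the connection fails. This sketch classifies a TCP connection attempt. Port 3306 is the default for MySQL and MariaDB (PostgreSQL uses 5432); the host is whatever your website's configuration points at, for example DB_HOST in a WordPress wp-config.php.

```python
import socket


def diagnose_tcp(host: str, port: int = 3306, timeout: float = 3.0) -> str:
    """Classify whether and how a TCP connection to a database server fails."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "reachable"  # port is open; look at credentials or DB health next
    except socket.gaierror:
        return "unknown host"  # the database hostname doesn't resolve
    except ConnectionRefusedError:
        return "refused"  # host is up but nothing is listening on that port
    except (socket.timeout, TimeoutError):
        return "timeout"  # host is down, or a firewall is dropping packets
    except OSError:
        return "unreachable"  # e.g. no route to host
```

"reachable" rules out the "host is down" item from the list above, leaving login details, corruption or load; "refused" on a host you know is up usually means the database service itself has stopped.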
It's highly advisable to back up your website database daily so that you can restore a corrupt database from a healthy backup.
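Here is one way to automate daily backups for a MySQL or MariaDB database: a script that runs mysqldump into a date-stamped file and deletes all but the newest few. This is a sketch that assumes mysqldump is installed and that credentials live in ~/.my.cnf, so no password appears on the command line. The file naming scheme and the default of keeping seven backups are arbitrary choices.

```python
from __future__ import annotations

import datetime
import pathlib
import subprocess


def backup_path(directory: str, db_name: str, day: datetime.date) -> pathlib.Path:
    """Name backups like mydb-2024-05-01.sql so they sort by date."""
    return pathlib.Path(directory) / f"{db_name}-{day.isoformat()}.sql"


def dump_database(db_name: str, directory: str) -> pathlib.Path:
    """Run mysqldump for db_name into today's backup file."""
    target = backup_path(directory, db_name, datetime.date.today())
    partial = target.with_suffix(".part")  # only renamed to .sql once the dump succeeds
    try:
        with open(partial, "wb") as out:
            # --single-transaction dumps InnoDB tables consistently without locking them
            subprocess.run(["mysqldump", "--single-transaction", db_name], stdout=out, check=True)
    except BaseException:
        partial.unlink(missing_ok=True)
        raise
    partial.replace(target)
    return target


def prune_backups(directory: str, db_name: str, keep: int = 7) -> list[pathlib.Path]:
    """Delete all but the newest `keep` backups for db_name and return what was deleted."""
    # The date-shaped pattern avoids matching other databases such as "mydb-test".
    backups = sorted(pathlib.Path(directory).glob(f"{db_name}-????-??-??.sql"))
    stale = backups[:-keep] if keep > 0 else backups
    for path in stale:
        path.unlink()
    return stale
```

Scheduled once a day (for example from cron), dump_database followed by prune_backups gives you a rolling week of restore points. Copying them off the server as well protects you if the whole machine is lost.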
I can't possibly list every reason why your website might be down; these are merely pointers you can quickly scan through to rule out the common possibilities. The most important thing is to know when your website goes offline so you can respond to the incident as quickly as possible. This is why you need to sign up for a website downtime/uptime monitoring service like Site Monki. Get started for free.
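If you're curious what such a service does under the hood, the core idea is a polling loop that only raises an alarm after several consecutive failures, so a single network blip doesn't wake you up. This is a minimal sketch, not a replacement for an external monitor: it runs wherever you start it, and if that machine shares a network with your server it can miss outages or go down along with it. The alert callback is a placeholder for whatever notifies you (email, SMS, chat).

```python
from __future__ import annotations

import time
import urllib.request
from typing import Callable


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """Treat a final 2xx/3xx response as up; errors, 4xx/5xx and timeouts as down."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except Exception:  # HTTPError, URLError, timeouts, malformed responses
        return False


def monitor(
    check: Callable[[], bool],
    alert: Callable[[str], None],
    interval: float = 60.0,
    failures_before_alert: int = 3,
    rounds: int | None = None,
) -> None:
    """Call check every interval seconds and alert once per outage and once on recovery."""
    failures = 0
    alerted = False
    done = 0
    while rounds is None or done < rounds:
        done += 1
        if check():
            if alerted:
                alert("recovered")
            failures, alerted = 0, False
        else:
            failures += 1
            if failures >= failures_before_alert and not alerted:
                alert(f"down for {failures} consecutive checks")
                alerted = True
        time.sleep(interval)
```

For example, monitor(lambda: site_is_up("https://your-website.com"), print) polls a placeholder URL every minute and prints a message after three failed checks in a row.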