The local site
The local site is where all the work happens. It’s where we build the website and make changes to it before it goes live. We call it a ‘local’ site because it’s only available on the machine we’re using to develop it, so it can’t be accessed by anyone other than the people working on the website. It’s not live on the internet and hasn’t yet been handed over to a client, so we’re free to build and make additions without worrying about breaking anything elsewhere on the site. Everything happens locally before it’s ever made live.
The staging area
Next we have the staging area. After the site has been developed locally and we’re at a point where we’re happy to share it with a client, we create a staging area for it. The staging area is a temporary space on the web that we can share with our clients. It’s where you first get to see your brand new website, and it gives you a chance to give us your feedback on every new feature we add and to suggest changes you’d like to see before the site goes live.
The staging area serves as an important stepping stone between our local site and the final, live version of the website. Each time we finish a new feature on our local site, we update the staging area. We can access it from our tablets and mobile phones, allowing us to do a lot more testing and make sure everything is working correctly before we make the website live. If we made changes directly to the live website, we might miss a bug or an error that could hurt page rankings, or even make the site unusable, so it’s important that everything gets tested before being released into the wild.
As the staging area itself is technically a ‘live’ website, in that it can be viewed online by anyone we share the link with, we have to make sure that it doesn’t get indexed by any search engines, and this is where an important file called ‘robots.txt’ comes into play.
In order to get your website listed on Google, it first has to be indexed. Google does this by using something called a ‘web crawler’, a bot that systematically browses the web, gathering information on new and updated websites. This information is then sent back to Google (or whoever the crawler belongs to!) and is used to index your website so that it can be displayed when somebody searches for something relating to it. The ‘robots.txt’ file is basically a set of rules for these web-crawling robots to follow: the crawler will look through your robots.txt file for any rules you have given, for example a ‘don’t crawl any of my pages’ rule, and act accordingly. You can also tell it to ignore only specific pages on your website, which still allows your site to show up in search results, but only the pages you want to be seen.
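To make that concrete, here’s roughly what those rules look like inside a robots.txt file. The first example blocks every crawler from every page (the kind of rule we’d use on a staging site); the second blocks only certain sections while leaving the rest of the site crawlable. The `/drafts/` and `/admin/` paths are just made-up examples, not real parts of any particular site.

```
# Block all crawlers from the entire site (what a staging site would use)
User-agent: *
Disallow: /
```

```
# Alternative: block only specific sections, leaving the rest crawlable
# (the paths below are hypothetical examples)
User-agent: *
Disallow: /drafts/
Disallow: /admin/
```

Each `User-agent` line says which crawlers the rules apply to (`*` means all of them), and each `Disallow` line names a path they should stay away from.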
So that’s how we make sure that an incomplete website on a staging area is kept away from Google whilst still allowing for thorough testing and previewing - we just tell the crawler to ignore everything it finds. When the site is complete and we do want it to show up in search results, we simply remove the robots.txt file. When a crawler comes across a site with no robots.txt file, it means there are no rules, and it’s free to crawl and index to its heart’s content!
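Crawlers apply these rules quite mechanically, and you can see the same logic at work using the robots.txt parser that ships with Python’s standard library. This is just an illustrative sketch - `staging.example.com` is a made-up address, and we’re feeding the rules in directly rather than fetching them from a real site:

```python
from urllib.robotparser import RobotFileParser

# The 'block everything' rules we'd put on a staging site
staging_rules = """
User-agent: *
Disallow: /
""".strip().splitlines()

blocked = RobotFileParser()
blocked.parse(staging_rules)

# A well-behaved crawler asks before fetching each page
print(blocked.can_fetch("Googlebot", "https://staging.example.com/home"))  # False

# An empty robots.txt (or no file at all) means there are no rules,
# so every page is fair game
open_site = RobotFileParser()
open_site.parse([])
print(open_site.can_fetch("Googlebot", "https://staging.example.com/home"))  # True
```

The `can_fetch` check is exactly the question a polite crawler asks before visiting a page: “am I allowed to look at this?”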