Let’s go back to basics today.
Car owners typically know that a modern vehicle has a number of automatic safety features. Gauges and sensors will tell a driver that a taillight is out or that the windshield wiper fluid is low. People routinely eyeball their tires or check the actual pressure with a tire pressure gauge so they know they’re maximizing the life of their tires and the safety of their commute.
Similarly, website owners can run a few quick checks on their own sites. One easy way to check a site’s health in Google is the site: operator. It’s my favorite operator by far (Advanced operators documentation). When I talked to website owners at Pubcon in Las Vegas last year, I was struck by how many of them weren’t aware of it. These searches are useful to technical and non-technical people alike.
Adding the site: operator to a search lets you restrict the results at a very granular level. Most people use site: to restrict a search to a specific subsection of our index, such as [site:yahoo.com yankees] to search only Yahoo for information on the New York Yankees. What they may not know is that if you use the site: operator on its own, Google will do its best to return all the documents it knows about for that subsection of its index. Some examples:
- It can restrict to a top-level domain ([site:co.uk]),
- to a domain ([site:brianwhite.org]),
- to a subdomain ([site:sports.yahoo.com]),
- or to a subdirectory ([site:mattcutts.com/blog/]).
There are more possibilities, including subsections with dynamic URLs, so it’s fun to play around with this operator.
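If you run these checks often, it can help to keep the queries in a small script. The sketch below is a hypothetical helper, not a Google API: it only assembles the query text and a standard results-page link (assuming the query goes in the `q` parameter of `https://www.google.com/search`) that you would open in a browser yourself. The function names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: Google's public results page accepts the query in the "q" parameter.
SEARCH_BASE = "https://www.google.com/search"


def site_query(scope: str, terms: str = "") -> str:
    """Build query text such as 'site:yahoo.com yankees' (hypothetical helper)."""
    query = f"site:{scope}"
    return f"{query} {terms}" if terms else query


def search_url(scope: str, terms: str = "") -> str:
    """Return a URL that runs the site: query when opened in a browser."""
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return f"{SEARCH_BASE}?{urlencode({'q': site_query(scope, terms)})}"


if __name__ == "__main__":
    # The scopes from the list above: TLD, domain, subdomain, subdirectory.
    for scope in ["co.uk", "brianwhite.org", "sports.yahoo.com", "mattcutts.com/blog/"]:
        print(search_url(scope))
    # A restricted search with extra terms.
    print(search_url("yahoo.com", "yankees"))
```

The script only builds links; it doesn’t fetch anything, so you still read the results on the normal search page.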
Stuff I like about the site: operator:
- It shows you immediately how your page titles and URLs look. The nice thing about seeing all of your documents like this is that things you’ll want to fix stick out pretty quickly. They might be page titles, or documents you don’t want visible to the public. You can take things into your own hands (ideas below), or, if you work with a web development professional or SEO, you can pick up the phone and ask them about what you see in these results.
- You can look at the estimated results at the top right to gauge roughly how many pages Google knows about.
- You can run a site: search on other sites to see how they present themselves to the world through search engines. These could be sites you admire, your competitors, or competitors you admire.
- As a member of the Webspam team at Google, I occasionally see sites that have been hacked or defaced by rogue enterprises that put revenue-generating pages on sites without the owner’s knowledge, hoping those pages will “borrow” the site’s reputation to show up in Google. Most sites don’t have to worry about this. Last year I posted on this topic at SEW. Again, it’s not something to worry about too much; my point is that routine site: checks have a good chance of showing you right away if rogue pages have been inserted into your site (paying attention to your analytics reports can help here too).
- Google’s not the only search engine to support this operator. While the implementations differ slightly, Yahoo, Live Search, and Ask all support it.
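Spotting titles that stick out and pages you never published can be done by eye, but on a larger site a short script helps. This is a minimal sketch under stated assumptions: you have copied the URL and title of each result from a site: search by hand, and you have a list of the pages you actually published. All data and function names are hypothetical, and nothing here queries Google.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical data: URLs you published (e.g. exported from your CMS or sitemap).
published = {
    "http://www.example.com/",
    "http://www.example.com/about",
    "http://www.example.com/blog/first-post",
}

# Hypothetical (url, title) pairs copied by hand from a [site:example.com] search.
site_results = [
    ("http://www.example.com/", "Example Widgets"),
    ("http://www.example.com/about/", "Untitled Document"),
    ("http://www.example.com/blog/first-post", "Untitled Document"),
    ("http://www.example.com/cache/casino-bonus.html", "Best Casino Bonus"),
]


def normalize(url: str) -> tuple:
    """Compare URLs by lowercase host and path, ignoring a trailing slash."""
    parsed = urlparse(url)
    return (parsed.netloc.lower(), parsed.path.rstrip("/") or "/")


def unexpected_urls(results, known):
    """Return result URLs that don't match any page you published."""
    known_keys = {normalize(u) for u in known}
    return [url for url, _ in results if normalize(url) not in known_keys]


def repeated_titles(results):
    """Map each title shared by two or more URLs to those URLs."""
    by_title = defaultdict(list)
    for url, title in results:
        by_title[title.strip().lower()].append(url)
    return {title: urls for title, urls in by_title.items() if len(urls) > 1}


print("Pages you didn't publish:", unexpected_urls(site_results, published))
print("Repeated titles:", repeated_titles(site_results))
```

A flagged URL is only a lead: it may be a page you forgot about rather than an injected one, so look at it before acting.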
Tips on using the site: operator:
- You can also add a negative site: operator for additional filtering. Consider this query: [site:brianwhite.org -site:www.brianwhite.org]. I’m asking Google, “Please give me everything you know about the domain brianwhite.org, but remove results that are on the ‘www’ subdomain of that domain.” These combinations can be powerful, especially if you have a larger site that uses subdomains. One of the joys of living in our “search era” is challenging oneself to combine operators and techniques to get interesting results.
- Don’t panic if you see problems in your site: search; many of them can be fixed. Also, there’s no guarantee that a user has ever seen any particular URL: someone still has to search for it and have that URL show up in the results. You can learn more at the Google Webmaster Help Center.
- Don’t worry about Supplemental Results. Supplemental results, by themselves, don’t indicate problems. My blog has about half its results marked as Supplemental right now, and I consider that a bonus. Some sites have URLs come in and out of Supplemental status on a continual basis. Google’s index is refreshed very frequently and is highly dynamic. I don’t worry about that ratio as I know that people are finding my site based on my logs and Analytics reports.
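To make the include/exclude logic concrete, here is a rough local model of how a combined query narrows a set of URLs. It is an approximation written for illustration: a scope matches a URL when the host equals the scope’s host or is a subdomain of it, and, if the scope names a path, when the URL path starts with that path. Google’s actual matching rules aren’t described in this post, and the example.com URLs and function names are hypothetical.

```python
from urllib.parse import urlparse


def matches_site(url: str, scope: str) -> bool:
    """Approximate site:scope matching (an illustrative model, not Google's rules)."""
    host, _, path = scope.partition("/")
    parsed = urlparse(url)
    netloc = parsed.hostname or ""  # hostname is already lowercased
    host = host.lower()
    # The host must equal the scope's host or be one of its subdomains.
    host_ok = netloc == host or netloc.endswith("." + host)
    # If the scope names a subdirectory, the URL path must start with it.
    path_ok = parsed.path.startswith("/" + path) if path else True
    return host_ok and path_ok


def filter_urls(urls, include, exclude=()):
    """Keep URLs matching any included scope and no excluded scope."""
    return [
        u for u in urls
        if any(matches_site(u, s) for s in include)
        and not any(matches_site(u, s) for s in exclude)
    ]


urls = [
    "http://example.com/",
    "http://www.example.com/about",
    "http://blog.example.com/post-1",
    "http://www.example.org/",
]

# Models [site:example.com -site:www.example.com]
print(filter_urls(urls, include=["example.com"], exclude=["www.example.com"]))
# → ['http://example.com/', 'http://blog.example.com/post-1']
```

The leading dot in the subdomain check keeps a host like notexample.com from matching a site:example.com scope.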
Actions you can take as a result:
- Head to Google’s Webmaster Tools where you can get more insight on how Google crawls your site, discuss what you see with other site owners, and look for help in the documentation section.
- If there are sections of your site you want to remove from the index quickly, Webmaster Tools has a removal tool.
- If you have non-urgent things you’d like to clean up, consider tidying page titles or modifying your robots.txt file and/or META tags to prevent crawling or archiving. I’m thinking of preventing Googlebot from crawling some sections of this blog based on a site: search (I’ll talk about it in an upcoming post).
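Before changing robots.txt, it’s worth confirming that a rule blocks exactly the section you intend. Python’s standard `urllib.robotparser` module can evaluate rules locally without touching your live site. The rules below, which keep Googlebot out of a hypothetical /tag/ section, are an assumption for illustration, not the change mentioned above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep Googlebot out of /tag/ while allowing everything else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /tag/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules from text instead of fetching a URL

for url in ["http://www.example.com/blog/first-post",
            "http://www.example.com/tag/misc/"]:
    allowed = parser.can_fetch("Googlebot", url)
    print(f"{url}: {'crawlable' if allowed else 'blocked'}")
```

Python’s parser uses simple prefix matching and doesn’t understand the `*` and `$` wildcards Googlebot supports, so keep test rules simple. Also, robots.txt governs crawling only; META tags are set per page in HTML (for example `<meta name="robots" content="noarchive">`), so this check doesn’t cover them.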
I know that some webmasters and site owners have pet uses for site: queries. What are your favorite applications, tips, or issues with this operator? I’d love to hear about them. Also, eagle-eyed readers will have noticed that there were no lyrical poems, stanzas, or verses to be had in this ode.
Update Jun 16 2008: The Supplemental Results label went away.