You are reading an archived post from the first version of my blog. I've started fresh, and the new design and content is now at boxofchocolates.ca

These aren’t the pages you are looking for

July 11, 2004

The other day I told the story of The Case of the Missing Defensive Design. It is a story of how some other site didn’t provide default error pages, which led a confused user who was at a dead end to search for the error text, which landed them on one of my sites (WATS.ca) – specifically on our Resource page that lists HTTP error/status codes and what they mean.

While preparing that post, I was checking our referrer logs looking at requests for that page. Here’s what I found:

We see a lot of traffic come our way because of that resource page. In the first part of this year, that page has been viewed roughly 17,000 times, with 12,500 or so of those views coming from Google. I suspect this page also helps us with our overall PageRank, which in turn helps us in search results for other topics. I’m not going to complain… ;)

What it Isn’t

This whole episode has me thinking though – Google is very good search technology, but it could be better. The problem as I see it is this: we have no reliable way of telling search engines that this particular resource is mostly reference material and not a “how to fix” guide. Google decides what a page is, but we can’t help Google by explicitly telling Google what a page isn’t.

My first thought was using Jedi mind tricks: “These aren’t the droids… er, pages you are looking for…”

Then I woke up, and came up with a few other ways to tell Google and other search engines that the resource in question isn’t what they are looking for:

The easiest would be a low tech approach – include a statement at the top of the page: “This resource does not contain information on how to fix many of these errors. It is only a reference.” The problem with this is that the page now contains the phrase “how to fix”, a key phrase that is likely to appear in searches. Am I now going to drive more traffic to the site?

The best might be to find various references on how to fix some of these errors and link to them, using very specific link text such as “how to fix many of these http error codes”. Perhaps that might divert some of the traffic away from our resource page. Unfortunately I’m not aware of many resources like this, and certainly not one resource that covers all the possible errors and solutions in one place.

Recently, on The Unofficial Google Blog, Judith Meskill offered Gmail accounts to commenters that finished the statement: “Google would be perfect if…” Well, here is mine:

Google would be perfect if we could tell it what our pages aren’t.

Maybe we need something new then – non-keyword stuffing? Many search engines now ignore meta tags (like keywords and description) because they were abused by spammers who used keyword stuffing to drive inappropriate traffic to their sites. Anti-keyword stuffing would be the opposite: a conscious effort on the part of the web master / developer / designer to keep particular types of traffic away. Maybe something like:

<meta name="non-keywords" content="how to fix" />

It would be relatively easy for search indexing robots to treat this data as reliable – after all, we would be working to keep traffic away. Searches for “how do I fix a 403.3 HTTP error?” might still include our resource page, but perhaps not as highly ranked in the list of search results. On the other hand, a search for “403.3 HTTP Error” (without the “how to fix”) might list the same or a similar set of search results but rank our page higher.
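To make the idea a little more concrete, here is a minimal Python sketch of how an indexer might read such a tag and demote a page. Everything in it is an assumption for illustration: the “non-keywords” meta tag is only my proposal (no search engine supports it), I am assuming the content is a comma-separated list of phrases, and the names extract_non_keywords and adjusted_score and the 0.5 penalty are made up.

```python
from html.parser import HTMLParser


class NonKeywordParser(HTMLParser):
    """Collects phrases from the hypothetical <meta name="non-keywords"> tag."""

    def __init__(self):
        super().__init__()
        self.non_keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "non-keywords":
            content = attributes.get("content") or ""
            # Assumption: phrases are comma-separated, like the old keywords tag.
            self.non_keywords += [
                phrase.strip().lower()
                for phrase in content.split(",")
                if phrase.strip()
            ]


def extract_non_keywords(html):
    """Return the list of non-keyword phrases declared in a page."""
    parser = NonKeywordParser()
    parser.feed(html)
    return parser.non_keywords


def adjusted_score(base_score, query, non_keywords, penalty=0.5):
    """Demote (never remove) a page when the query contains a non-keyword phrase."""
    query = query.lower()
    if any(phrase in query for phrase in non_keywords):
        return base_score * penalty
    return base_score


page = '<html><head><meta name="non-keywords" content="how to fix" /></head></html>'
phrases = extract_non_keywords(page)
print(adjusted_score(1.0, "how to fix a 403.3 HTTP error", phrases))  # 0.5
print(adjusted_score(1.0, "403.3 HTTP error", phrases))               # 1.0
```

Note that this sketch only matches the literal phrase, so a search for “how do I fix a 403.3 HTTP error?” would slip past it; a real search engine would presumably need something fuzzier. It also only pushes the page down for matching searches and does nothing to push it up for the others.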

Would it be worth it?

A technique like this would not have made a difference when I was contacted by that confused user – that person was looking for closure, not for instructions on how to fix the error.

Where I think it might be useful is for users who know specifically what type of resources they seek. I can’t see any obvious use for spammers (can you?), so I’d like to think it would be worth implementing in specific cases, for specific resources or pages. But I can’t say with confidence that it would be a worthwhile exercise, either for us as developers or for Google and other search technologies to implement…

Would this make searching better? Would it be a waste of time and effort? by developers? by Google and other search engines?

3 Responses

Comment by Yuriy — Jul 30 2004 @ 10:52 am

There is absolutely no doubt that web sites are ready for better indexing mechanics. “Non-keywords” solutions will certainly do the job, but web developers also need a way to manually mark relevant keywords as more or less valuable in the context of the page content.

Also I think of a keywords linking mechanism which would allow better communicating between related pages outside the current site.

Comment by eric scheid — Aug 28 2004 @ 6:35 am

This idea has some merit, and you’re not the first to come up with it … see The Anti-Thesaurus

Comment by John Doe — Sep 20 2004 @ 7:29 am

Google should by default just not search any sites that identify themselves as weblogs or discussion sites (obviously a system would have to be implemented whereby they could identify themselves that way). I think your average user (your mother, your grandparents) really isn’t interested in the mindless chatter (no offense) that appears on the average blog, and it just skews the results away from the information they are really searching for.

Advanced users can go into the settings and do a blog-specific search if they want.