
Search Engine Page Ranking

Copyright 2000 by Gray Watson

[ So by now (2005) this document is somewhat out of date. The specifics of what the engines do need to be updated, but the basic gist of the piece is still valid. ]

Since I'm a principal software engineer and architect of a large portion of the Lycos backend search technology, my friends seem to think that I know something about search engine ranking. Go figure. Just about once a week I get asked, "How can I get my web pages to appear at the top of a search engine's result list?" So here's the answer. Please understand that I know Lycos technology and can make educated guesses about the other engines, but the following is my opinion only. See your dealer for details. Your mileage may vary. Apply only to affected area. Batteries not included.

How To Affect the Ranking

The short answer to this question is:

You need to anticipate your audience's "goal(s)", provide and properly represent the appropriate textual content they are looking for, and properly link this content into both your site and the web.

Plain and simple. Sorry if you do not like this answer. If you are looking for some way to trick the engines or artificially boost your pages then you will have to talk to search engine companies directly and buy keywords or placement. This certainly can be done and may be less expensive than you would otherwise think.

A search engine's main job is to provide the results that best satisfy a user's query. If it presents a result that you visit and you don't agree that the document is about your query, then you are disappointed. The search engine companies perform a lot of research and spend a lot of development time trying to detect and penalize spam. They are trying to stay one step ahead of the folks who are attempting to direct searches to pages that are not about the query. For this reason Lycos, and I believe most other search engines, pays no attention at all to meta description tags. If they do, then IMO they shouldn't. Meta description and keyword tags are hidden attributes that you can add to the head of your document, and they are supposed to annotate and describe the document. Since users never see this information, they will be disappointed if you stick in invalid keywords or fail to keep the description in line with the document's contents, which is usually what happens.

A Site is Greater Than Its Pages

Before we talk about creating good content and representing that content, I need to explain a little about how a good website should be laid out. Once you have a set of web pages talking about your organization, your products, and your services, you need to make sure that the pages are well integrated. If they are individual pages with no links to your other content, then users cannot browse your site to learn more about you. If you say "widget a is similar to widget b", make sure that "widget b" is clickable and leads to the pages about it. You certainly don't want gratuitous links everywhere, which can distract from the content, but make sure there are ways for people to navigate your site.
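To make the integration point concrete, here is a minimal sketch, assuming you already have a map of which page links to which (the page names below are hypothetical). It walks the links breadth-first from the home page and reports any page that a visitor could never click their way to.

```python
from collections import deque


def unreachable_pages(links: dict[str, list[str]], home: str) -> set[str]:
    """Return the pages in `links` that cannot be reached by clicking from `home`.

    `links` maps each page (hypothetical relative URLs) to the pages it links to.
    """
    seen = {home}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return set(links) - seen


site = {
    "/index.html": ["/products.html", "/about.html"],
    "/products.html": ["/widget-a.html"],
    "/widget-a.html": ["/products.html"],  # mentions widget b but never links to it
    "/widget-b.html": ["/widget-a.html"],
    "/about.html": [],
}
print(unreachable_pages(site, "/index.html"))  # {'/widget-b.html'}
```

Fixing the problem is exactly what the paragraph suggests: make "widget b" on the widget-a page a link, and the orphan disappears from the report.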

Navigation bars on the top or side of the page are one mechanism for achieving the necessary integration, although on many sites they are far too large and/or ugly. Still, a nicely done image with pointers to About-Us, Contact, Products, Help, Site Map, and Search goes a long way toward improving a site. I've been thinking that I need one for this site.

Remember that your main page should be the introduction page. It should announce your organization and show what services your organization and website provide. Small amounts of compelling content or information on key services should live there, but try to keep the navigation and key information within the first screenful and limit the page to at most 2-3 screens.

It's Called Content, Stupid

So you've created a web page with your organization's name, description, and contact information. For many groups this is all the web presence that they require at this time. If, however, you want people to find your pages not just because they are looking for your organization name but also because they are looking for the services that you offer, then you need to provide more content than just an introduction page.

One of the first tricks that I use when designing a site that I want to be "noticed" on the web is to think like the users I'm trying to attract. We all use the web to some extent. How do you search? How do you browse? What words would you use to search for certain content? What information would you be looking for that your pages could provide? If you have trouble being objective about these questions, then ask family and friends who are less closely tied to your organization. Don't give them any background; just ask them what searches they would use to look for your product, service, etc. Be intentionally vague.

So now that you have a list of queries that people would use to find you, make sure that you create specific pages that satisfy these queries. The industry now calls these landing pages. I don't mean filling an entire page with the words "president clinton" to lure someone to your site without giving them any true value. That is an example of spam, and if someone actually did visit such a page, why would they stay? No, I mean give the user a real reason to read the document. If you provide some service, create a document which fully describes the service, its history, its value, etc. If you sell a widget, make sure you give a description, specifications, pictures, etc. Use images, but remember that they should rarely dominate the page. Users can always click on an image to make it larger to get more detail, but you don't want to bury your text. Search engines can't read images, so the text is what is important to them.

Proper Content Representation

So I've explained a bit about the importance of site integration and talked about how to use content to satisfy your audience. This section deals with the proper way of representing content on a web page so that search engines will find it and rank it higher in their result lists.

One of my favorite stories at Lycos is when Pepsi called us all bent out of shape because they could not find their web pages with a search on the word "pepsi". We examined their home page and found that the word "pepsi" did not appear on it once -- not in the title and not in the body of the document. What they had were large, gratuitous images and a super cool "look-and-feel". I can see the idiot web developers all goose-pimpled with excitement over the "experience" they had created. Unfortunately, search engines look for text (i.e. words) and can't easily "read" images. A search engine would download their page, toss out the images, and be left with no content. They lose. Search engines do, however, look at the alt attribute of the HTML image tag, where you can specify a textual description of the image. Pepsi's alt attributes were blank, of course.
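The Pepsi problem can be reproduced with a toy indexer. The following is a minimal sketch using Python's standard html.parser module, assuming a drastically simplified engine that indexes only title text, body text, and image alt text, and that ignores meta tags as described above. The page markup is made up for illustration; it is not Pepsi's actual page.

```python
import re
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Collect the words a simple text-only indexer could see on one page.

    Keeps <title> and body text plus <img alt="..."> text. Image pixels
    contribute nothing, <script>/<style> contents are skipped, and <meta>
    tags are ignored on purpose.
    """

    def __init__(self):
        super().__init__()
        self.words: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def _add(self, text: str) -> None:
        self.words.extend(re.findall(r"[a-z0-9]+", text.lower()))

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            # The alt attribute is the only text an image contributes.
            self._add(dict(attrs).get("alt") or "")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._add(data)


def indexable_words(html: str) -> list[str]:
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return parser.words


# A made-up "all image, no text" home page, before and after adding alt text.
flashy = ('<html><head><title>Welcome</title>'
          '<meta name="keywords" content="pepsi cola"></head>'
          '<body><img src="can.gif" alt=""></body></html>')
fixed = flashy.replace('alt=""', 'alt="Pepsi can"')

print("pepsi" in indexable_words(flashy))  # False
print("pepsi" in indexable_words(fixed))   # True
```

Note that the keyword in the meta tag does not help at all in this model; only the visible title, body text, and alt text count.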

So if you have a page that you want to appear high in the search results for a certain topic, you should make sure that the topic is well represented in the document. Choosing a good title for the document is of paramount importance. If you want a page to come up for a search on "president clinton", then both of those words should be in the title of the document. This is actually a poor example because you will be competing with many, many other pages out there and will probably not show up near the top of anyone's list. This is why companies try for imaginative but easily spelled names. But be careful about indiscriminately putting lots of words into the title of your document. Lycos, and I'm sure other engines, flag documents with long titles as spam -- this significantly penalizes the relevancy scores for your document.
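A quick self-check for the title advice might look like the sketch below. The 12-word limit is a hypothetical placeholder, since the actual cutoff Lycos used for flagging long titles is not stated here, and the tokenizer is deliberately naive.

```python
import re

# Hypothetical cutoff: the real threshold an engine uses to flag long
# titles as spam is unknown, so 12 words is only a placeholder.
MAX_TITLE_WORDS = 12


def words(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def title_report(title: str, query: str) -> dict:
    """Report query terms missing from a title, and whether the title is
    long enough that an engine might treat it as keyword stuffing."""
    title_words = words(title)
    return {
        "missing_terms": [w for w in words(query) if w not in title_words],
        "too_long": len(title_words) > MAX_TITLE_WORDS,
    }


print(title_report("President Clinton: A Biography", "president clinton"))
# {'missing_terms': [], 'too_long': False}
print(title_report("Bill, Leader of the Free World", "president clinton"))
# {'missing_terms': ['president', 'clinton'], 'too_long': False}
print(title_report("president clinton " * 10, "president clinton"))
# {'missing_terms': [], 'too_long': True}
```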

Once you have the right title, make sure that you also use the words in the body of the document. Lycos attaches extra relevance weight to words that appear in the HTML heading tags <H1>, <H2>, or <H3>. The words "Your Ranking with Search Engines" at the top of this page are enclosed in an <H1> heading tag. As you can see, they appear larger and bolder than the rest of the document. Lycos takes this to mean that the document is more likely to be about those words. For most documents, this is probably correct. On most of my pages, you will see that the title is the same as the <H1> header at the top of the document.
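One simple way to model heading weight is a weighted sum over the fields of a page, as in this sketch. The field names and numeric weights are hypothetical; they only encode the ordering described above (title and headings count for more than plain body text), not Lycos's real values.

```python
import re

# Hypothetical weights: only the ordering (title and headings above body
# text) reflects the article; the numbers themselves are made up.
FIELD_WEIGHTS = {"title": 4.0, "h1": 3.0, "h2": 2.0, "h3": 1.5, "body": 1.0}


def field_score(fields: dict[str, str], query: str) -> float:
    """Add a field's weight once per occurrence of a query term in that field."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    score = 0.0
    for field, text in fields.items():
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        hits = sum(tokens.count(t) for t in terms)
        score += FIELD_WEIGHTS.get(field, 1.0) * hits
    return score


on_topic = {"title": "President Clinton", "h1": "President Clinton",
            "body": "The president met with Clinton aides."}
synonyms = {"title": "Leader of the Free World",
            "body": "Bill met with aides."}
print(field_score(on_topic, "president clinton"))  # 16.0
print(field_score(synonyms, "president clinton"))  # 0.0
```

The second page is about the same subject, but because it never uses the query words, a purely word-matching score like this one gives it nothing.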

Once you have the right title and also use the words in the heading tags, you should make sure you use the words in normal text as well. If you have a title of "president clinton" but then talk about the "leader of the free world" named "bill" in the document, then you may not get a high relevance score. You should figure on using the words "president" and "clinton" often in your document. You certainly, president, do not want to, clinton, use them too often, president clinton, to be annoying, clinton clinton. Just be careful about leaning on the thesaurus and synonyms too much.
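The "often, but not too often" advice fits sublinear term-frequency weighting, which many published ranking schemes use. The 1 + ln(count) formula below is one standard textbook variant, shown as an assumption rather than as Lycos's formula: each additional mention is worth less than the one before.

```python
import math


def tf_weight(count: int) -> float:
    """Sublinear term-frequency weight: 0 if the term is absent, else 1 + ln(count).

    A common textbook choice, used here only to illustrate diminishing
    returns; it is not necessarily what Lycos did.
    """
    return 1.0 + math.log(count) if count > 0 else 0.0


for n in (1, 2, 10, 100):
    print(n, round(tf_weight(n), 2))
```

This prints weights of roughly 1.0, 1.69, 3.3, and 5.61, so a page that repeats the words 100 times scores well under twice as high as one that uses them 10 times, while becoming far more annoying to read.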

At the end of these instructions, I'd like to remind you of the need for appropriate document content as mentioned above. Don't let this get lost as you are tuning your web pages so that they get scored highly in search engines. Remember that you are not only trying to get someone to find your page but you are trying to get them to read, browse, buy, etc.

Location, Location, Location

So the old brick-and-mortar saying that a business's success depends heavily on its location seems less relevant on the web. Remember, however, that the WWW is a web of links -- the more links to your site, the more chances that someone will visit you, and the better your virtual "location".

When you first create your site, you should immediately submit it to the search engines. You don't have to wait for all of the pages to be completed. Most engines have a "Submit This Site" link at the bottom of their introduction pages. It may be hidden a bit, but it should be there. They will usually ask you for your email address and the URL of the web page to submit. If, like me, you don't give out your email address, then you could use a MailNull address instead. By submitting your site, you are inviting the search engine spider programs to visit your pages. It may take anywhere from a couple of hours to a couple of weeks for them to download your pages and include them in their collections.

If you have submitted your site to the search engines and have compelling content, you should start getting hits to your site sooner rather than later. This depends, of course, on the popularity of your information. You may find that users or other organizations will put links to your pages on their sites. This is what you want to encourage. Check the referral logs on your web server to see how people are finding your site. Feel free to surf the web and find sites that are related to yours. See if you can contact them and have them put links to you on their sites. If you do this, you should first put links to the related sites on your own web pages and then ask them to reciprocate. Other ways to improve your web site's "location" include buying keywords and advertising on search engines or other portals.
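If your server writes the common Apache/NCSA "combined" log format (an assumption; check your server's configuration), the referring sites can be tallied with a few lines of Python. This sketch counts external hosts in the Referer field and skips your own host and empty referrers; the sample log lines and host names are invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Combined log format: host ident user [time] "request" status bytes "referer" "agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"'
)


def referring_hosts(lines, own_host: str) -> Counter:
    """Count external sites that sent visitors, based on the Referer field."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # not in combined format, or malformed
        referer = m.group(1)
        if referer in ("", "-"):
            continue  # typed-in URL, bookmark, or no referer sent
        host = urlsplit(referer).hostname or ""
        if host and host != own_host:
            counts[host] += 1
    return counts


sample = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 2326 '
    '"http://search.example.net/query?q=widgets" "Mozilla/4.0"',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /widget-a.html HTTP/1.0" 200 1024 '
    '"http://www.example.org/links.html" "Mozilla/4.0"',
    '10.0.0.2 - - [10/Oct/2000:13:56:09 -0700] "GET /widget-b.html HTTP/1.0" 200 998 '
    '"http://www.example.com/widget-a.html" "Mozilla/4.0"',
    '10.0.0.3 - - [10/Oct/2000:13:57:00 -0700] "GET / HTTP/1.0" 304 - "-" "Mozilla/4.0"',
]
print(referring_hosts(sample, "www.example.com").most_common())
```

The internal click from widget-a to widget-b and the visit with no referrer are both left out, so what remains is a list of the outside sites actually sending you traffic.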

Summary

Here are some bullet points to keep in mind while designing your pages:

Search Engine Relevancy

The following provides more technical information on relevancy scores and how Lycos' engine works under the covers. Other engines have similar mechanisms and ranking algorithms. Google's PageRank algorithm factors strongly into its relevancy scoring, but Google also uses additional relevancy metrics to rank documents. I've created pages on my site that, as far as I can find, have only one link pointing to them, yet they appear in the top 10 search results for certain queries.
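For reference, the published PageRank idea can be sketched as power iteration over the link graph. This uses a common normalized form of the formula with the frequently cited damping factor d = 0.85 and spreads the rank of pages without outgoing links evenly over all pages. It illustrates the technique only; it is not Google's production ranking, which, as noted above, mixes in other metrics.

```python
def pagerank(links: dict[str, list[str]], d: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank over a small link graph.

    `links` maps each page to the pages it links to. Pages with no outgoing
    links spread their rank evenly over all pages (a common convention).
    The returned ranks sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = d * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling page: distribute its rank to everyone
                for t in pages:
                    new[t] += d * rank[p] / n
        rank = new
    return rank


graph = {"a": ["b"], "b": ["a"], "c": ["a"]}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Page "a" ends up on top because it is linked from both other pages, while "c", which nobody links to, keeps only the baseline (1 - d) / n share.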

How search engines sort documents is, of course, very dependent on the engine. The Lycos search engine that I wrote combines boolean operators with relevancy sorting. When you do a search for "president clinton", Lycos assumes "president AND clinton" -- many search engines assume "president OR clinton", which is a mistake in my opinion. Once a document is known to contain both the word "president" and the word "clinton", a relevancy score is calculated for the document so that it can be inserted into the results list.
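The difference between an implicit AND and an implicit OR is easy to see with a toy inverted index. In this sketch (document ids and text are made up), AND intersects the sets of documents containing each word and OR unions them; only the surviving documents would then go on to relevancy scoring.

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: word -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def match(index, query: str, mode: str = "AND") -> set[str]:
    """Documents containing all (AND) or any (OR) of the query words."""
    postings = [index.get(w, set()) for w in query.lower().split()]
    if not postings:
        return set()
    if mode == "AND":
        return set.intersection(*postings)
    return set.union(*postings)


docs = {
    "d1": "president clinton spoke today",
    "d2": "the president of the club",
    "d3": "clinton county fair",
}
index = build_index(docs)
print(match(index, "president clinton"))             # {'d1'}
print(match(index, "president clinton", mode="OR"))  # all three documents
```

With OR semantics, the club president and the county fair both become candidates that the relevancy scorer then has to push down; with AND, they never enter the results at all.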

Relevancy can take into account a number of factors, some of which are listed below. None of this is secret information, and there are published papers out there on this subject. The tricks are to determine which factors to use for any particular query and to do it 100,000 times a second.

Relevancy factors:

All of these factors are used to calculate document relevancy. Their relative priorities and weights are often set on a per-query or per-document-collection basis.
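Weighting factors per collection could be sketched as a weighted sum, as below. The factor names, collections, and weights are all hypothetical placeholders chosen for illustration, not Lycos's actual factors or values.

```python
# Hypothetical factors and weights, for illustration only.
WEIGHTS_BY_COLLECTION = {
    "web": {"title_match": 3.0, "heading_match": 2.0,
            "term_freq": 1.0, "link_score": 2.5},
    "news": {"title_match": 3.0, "heading_match": 1.0,
             "term_freq": 1.0, "freshness": 4.0},
}


def relevancy(features: dict[str, float], collection: str) -> float:
    """Weighted sum of per-document factor scores; a factor that the
    collection does not weight contributes nothing."""
    weights = WEIGHTS_BY_COLLECTION[collection]
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


doc = {"title_match": 1.0, "term_freq": 0.5, "freshness": 1.0}
print(relevancy(doc, "web"))   # 3.5
print(relevancy(doc, "news"))  # 7.5
```

The same document scores differently depending on the collection, because freshness matters for news but is not weighted at all for the general web collection in this sketch.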
