The importance of robots.txt
The robots.txt file matters if you want to have a good ranking on
search engines, yet many Web sites don't have one.
If your Web site doesn't have a robots.txt file yet, read on to
learn how to create one. If you already have a robots.txt file,
read our tips to make sure that it doesn't contain errors.
What is robots.txt?
When a search engine crawler comes to your site, it looks for
a special file called robots.txt. This file tells the search engine
spider which Web pages of your site should be indexed and which
Web pages should be ignored.
The robots.txt file is a simple text file (no HTML) that must
be placed in your root directory, for example:
http://www.yourwebsite.com/robots.txt
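Because the file always sits in the root directory, its address can be worked out from the address of any page on the same site. The following is a minimal Python sketch of that idea, using only the standard library. The helper name robots_txt_url and the sample page address are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt address for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yourwebsite.com/support/index.html"))
```

For the sample address this prints http://www.yourwebsite.com/robots.txt.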
How do I create a robots.txt file?
As mentioned above, the robots.txt file is a simple text file.
Open a simple text editor to create it. The content of a robots.txt
file consists of so-called "records".
A record contains the instructions for a specific search engine.
Each record consists of two fields: a User-agent line and one
or more Disallow lines. Here's an example:
User-agent: googlebot
Disallow: /cgi-bin/
This robots.txt file would allow the "googlebot", which
is the search engine spider of Google, to retrieve every page from
your site except for files from the "cgi-bin" directory.
All files in the "cgi-bin" directory will be ignored by
googlebot.
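If you want to check a record before uploading it, Python's standard urllib.robotparser module can read the same two lines and answer "may this spider fetch this page?". This is only a sketch: real spiders use their own parsers and may behave differently, and the file name search.cgi and the spider name otherbot are hypothetical.

```python
from urllib.robotparser import RobotFileParser

RECORD = """\
User-agent: googlebot
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(RECORD.splitlines())

site = "http://www.yourwebsite.com"
# Files in cgi-bin are off limits for googlebot.
print(parser.can_fetch("googlebot", site + "/cgi-bin/search.cgi"))
# Every other page may be retrieved.
print(parser.can_fetch("googlebot", site + "/index.html"))
# A spider that is not named in any record is not restricted at all.
print(parser.can_fetch("otherbot", site + "/cgi-bin/search.cgi"))
```

The first call reports False, the other two report True.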
The Disallow command matches by prefix: it blocks every path that
begins with the text you enter. If you enter
User-agent: googlebot
Disallow: /support
both "/support-desk/index.html" and "/support/index.html"
as well as all other files in the "support" directory
would not be indexed by googlebot.
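The prefix behaviour is easy to get wrong, so here is a small sketch that compares "Disallow: /support" with "Disallow: /support/" using Python's standard urllib.robotparser. The helper name blocked_paths and the "/products/index.html" path are assumptions for illustration, and other parsers may differ in detail.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_txt, agent, paths):
    """Return the paths that robots_txt forbids for the given agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


paths = [
    "/support/index.html",
    "/support-desk/index.html",
    "/products/index.html",
]

without_slash = "User-agent: googlebot\nDisallow: /support\n"
with_slash = "User-agent: googlebot\nDisallow: /support/\n"

# Without the trailing slash, anything that starts with "/support" is blocked.
print(blocked_paths(without_slash, "googlebot", paths))
# With the trailing slash, only the "support" directory itself is blocked.
print(blocked_paths(with_slash, "googlebot", paths))
```

If you only want to block the "support" directory and not "support-desk", end the path with a slash.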
If you leave the Disallow line blank, you're telling the search
engine that all files may be indexed. In any case, you must enter
a Disallow line for every User-agent record.
If you want to give all search engine spiders the same rights,
use the following robots.txt content:
User-agent: *
Disallow: /cgi-bin/
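To see that the asterisk really covers every spider, the record can be tested against several user agent names. This sketch again relies on Python's standard urllib.robotparser. The name examplebot and the file form.cgi are made up.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /cgi-bin/"])

# Ask the same question for two different spiders.
verdicts = {
    agent: parser.can_fetch(agent, "/cgi-bin/form.cgi")
    for agent in ("googlebot", "examplebot")
}
print(verdicts)
```

Both spiders get the same answer (False), because neither has a record of its own and the "*" record applies to them.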
Where can I find user agent names?
You can find user agent names in your log files by checking for
requests to robots.txt. In most cases, all search engine spiders should
be given the same rights. In that case, use "User-agent: *"
as mentioned above.
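The log file check can be automated. The sketch below assumes that your server writes the common "combined" log format, in which the user agent is the last quoted field of each line. If your log format differs, the pattern has to be adapted. The sample log lines, IP addresses and spider names are invented for illustration.

```python
import re
from collections import Counter

# Matches a request for /robots.txt and captures the quoted user agent
# that ends a combined-format log line.
ROBOTS_REQUEST = re.compile(
    r'"(?:GET|HEAD) /robots\.txt[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"'
)


def robots_txt_agents(log_lines):
    """Count how often each user agent requested robots.txt."""
    counts = Counter()
    for line in log_lines:
        match = ROBOTS_REQUEST.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical log excerpt.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [01/Jan/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/Jan/2024:11:30:00 +0000] "GET /robots.txt HTTP/1.1" 404 0 "-" "OtherSpider/2.3"',
    '192.0.2.10 - - [02/Jan/2024:10:00:00 +0000] "GET /robots.txt HTTP/1.1" 200 42 "-" "ExampleBot/1.0"',
]

for agent, hits in robots_txt_agents(SAMPLE_LOG).most_common():
    print(hits, agent)
```

To run it against a real log, pass an open file instead of SAMPLE_LOG, for example robots_txt_agents(open("access.log")), where access.log stands for whatever your server's log file is called.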
Things you should avoid
If you don't format your robots.txt file properly, some or all
files of your Web site might not get indexed by search engines.
To avoid this, do the following:
Don't use comments in the robots.txt file
Although comments are allowed in a robots.txt file, they might
confuse some search engine spiders.
"Disallow: /support # Don't index the support directory"
might be misinterpreted as "Disallow: /support#Don't index
the support directory".
Don't use white space at the beginning of a line. For example,
don't write
    User-agent: *
    Disallow: /support
but
User-agent: *
Disallow: /support
Don't change the order of the commands. If you want your robots.txt
file to work, the User-agent line must come first. Don't write
Disallow: /support
User-agent: *
but
User-agent: *
Disallow: /support
Don't use more than one directory in a Disallow line. Do not use
the following
User-agent: *
Disallow: /support /cgi-bin/ /images/
Search engine spiders cannot understand that format. The correct
syntax for this is
User-agent: *
Disallow: /support
Disallow: /cgi-bin/
Disallow: /images/
Be sure to use the right case. The file names on your server are
case sensitive. If the name of your directory is "Support",
don't write "support" in the robots.txt file.
Don't list all files. If you want a search engine spider to ignore
all files in a particular directory, you don't have to list every file.
For example:
User-agent: *
Disallow: /support/orders.html
Disallow: /support/technical.html
Disallow: /support/helpdesk.html
Disallow: /support/index.html
You can replace this with
User-agent: *
Disallow: /support
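Don't rely on an "Allow" command
The original robots.txt standard has no "Allow" command. Major
spiders such as googlebot understand it today, and it is part of
the current standard (RFC 9309), but older or simpler spiders may
ignore it. To stay compatible with all spiders, only mention files
and directories that you don't want to be indexed.
All other files will be indexed automatically if they are linked
on your site.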
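Most of the mistakes listed above can be spotted mechanically. The following Python sketch shows one possible checker. The function name lint_robots_txt and the wording of its messages are made up for illustration, and the order check is simplified: it only verifies that some User-agent line appears before the first Disallow line.

```python
def lint_robots_txt(text):
    """Return (line_number, message) pairs for the pitfalls listed above."""
    problems = []
    seen_user_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue  # blank lines are fine
        if raw[0] in " \t":
            problems.append((number, "white space at the beginning of the line"))
        if "#" in raw:
            problems.append((number, "comment may confuse some spiders"))
        # Drop the comment, then split "Field: value".
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        field, _, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            seen_user_agent = True
        elif field == "disallow":
            if not seen_user_agent:
                problems.append((number, "Disallow line before User-agent line"))
            if len(value.split()) > 1:
                problems.append((number, "more than one path in a Disallow line"))
        elif field == "allow":
            problems.append((number, "Allow line is not understood by every spider"))
    return problems


bad_file = "  User-agent: *\nDisallow: /support /cgi-bin/ /images/\n"
for number, message in lint_robots_txt(bad_file):
    print(f"line {number}: {message}")
```

The checker cannot tell whether "Support" or "support" is the right case for your server, so that point still has to be verified by hand.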
Tips and tricks:
1. How to allow all search engine spiders to index all files
Use the following content for your robots.txt file if you want
to allow all search engine spiders to index all files of your Web
site:
User-agent: *
Disallow:
2. How to stop all spiders from indexing any file
If you don't want search engines to index any file of your Web
site, use the following:
User-agent: *
Disallow: /
3. Where to find more complex examples
If you want to see more complex examples of robots.txt files,
view the robots.txt files of big Web sites:
http://www.cnn.com/robots.txt
http://www.nytimes.com/robots.txt
http://www.spiegel.com/robots.txt
http://www.ebay.com/robots.txt
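The two short files from tips 1 and 2 differ by a single character, so it is worth testing which one you are about to upload. This is a minimal sketch with Python's standard urllib.robotparser. The helper name may_fetch, the spider name anybot and the sample paths are assumptions, and real spiders may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def may_fetch(robots_txt, agent, path):
    """Report whether agent may retrieve path under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


for path in ("/", "/index.html", "/support/orders.html"):
    print(
        path,
        "allow-all file:", may_fetch(ALLOW_ALL, "anybot", path),
        "block-all file:", may_fetch(BLOCK_ALL, "anybot", path),
    )
```

With the empty Disallow line every path is reported as fetchable, and with "Disallow: /" none is.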
Your Web site should have a proper robots.txt file if you want
to have good rankings on search engines. Only if search engines
know what to do with your pages can they give you a good ranking.
Copyright by Axandra.com