WebsiteX5.info - The ultimate guide for Incomedia's WebsiteX5


Robots.txt and sitemaps

There's a lot of confusion about the files robots.txt and sitemap.xml. Do you need them? What do they do? And why aren't they generated by WebsiteX5? In short: yes, you do need them. Modern search engines can do a much better job when these files are available. Even when all your pages can be reached through internal links, it is still a good idea to provide them. It also seems that your page rank goes up when they are available.

What does a robots.txt look like?
Robots.txt is simply a text file that tells spiders, such as Googlebot, where they may and may not go on your website. For example, when you have an unfinished test site in your web space, you do not want its pages to show up in the search results. You also do not want a directory with scripts to be crawled.

Suppose your website has a directory structure like this:

    /files
    /guestbook
    /images
    /scripts
    /res
    /testsite
    /slideshow
    index.html
    file1.html
    file2.html
    test.html

Then you might not want the robot to index the directories /files, /guestbook, /scripts, /res, /testsite and /slideshow. The file test.html should not be indexed either.

Now the robots.txt would look like this:

    User-Agent: *
    Disallow: /files
    Disallow: /guestbook
    Disallow: /res
    Disallow: /testsite
    Disallow: /scripts
    Disallow: /slideshow
    Disallow: /test.html
    Allow: /

Each line tells the spider what it may or may not do. For example, the robot is not allowed to access files in the /testsite folder (Disallow). It is allowed to access the files in the root folder (Allow: /) and the files in the /images folder (covered by Allow: / because there is no Disallow: /images). Paths are case-sensitive, so write folder names exactly as they appear on the server.
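
To check how a crawler would read these rules, a minimal sketch using Python's standard urllib.robotparser module could look like the one below. The domain www.example.com and the page names inside the folders are placeholders chosen for illustration; they are not part of the example site. Python's parser applies the first rule that matches, which is one reason to keep Allow: / as the last line.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above, unchanged.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /files
Disallow: /guestbook
Disallow: /res
Disallow: /testsite
Disallow: /scripts
Disallow: /slideshow
Disallow: /test.html
Allow: /
"""


def is_allowed(path, user_agent="*"):
    """Return True if the rules let `user_agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # www.example.com is a placeholder domain (assumption).
    return parser.can_fetch(user_agent, "http://www.example.com" + path)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for path in ("/index.html", "/images/logo.png",
                 "/testsite/index.html", "/test.html"):
        verdict = "allowed" if is_allowed(path) else "blocked"
        print(path, "->", verdict)
```

Real search engines may resolve conflicting rules differently from this parser, so treat the result as a quick sanity check rather than a guarantee.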

One of Google's webmaster tools is a utility that you can use to generate a robots.txt file. That's one of the reasons to sign up for a Google account and register your website with Google.

You can find more information on robots.txt on www.robotstxt.org/orig.html


Sitemap.xml
WebsiteX5 generates a file called 'imsitemap.html'. This is the page you see when you choose 'sitemap' in the menu displayed at the bottom of each page, if you have set this option in the General Settings. This imsitemap.html is NOT the sitemap.xml file that search engines use.

A sitemap.xml file has a format that is described at www.sitemaps.org.

In general a sitemap looks like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
      <url>
        <loc>http://www.domain.com/page_xxx.html</loc>
        <lastmod>2008-11-17T19:36:47+00:00</lastmod>
        <priority>0.1</priority>
        <changefreq>monthly</changefreq>
      </url>
      <url>
        <loc>http://www.domain.com/page_yyy.html</loc>
        <lastmod>2008-11-17T19:36:50+00:00</lastmod>
        <priority>0.9</priority>
        <changefreq>hourly</changefreq>
      </url>
    </urlset>

For each page in your website there is a <url> </url> block. Each block contains:

  • <loc>...</loc> The full URL of the page
  • <lastmod>...</lastmod> The date and time the page was last modified
  • <priority>...</priority> The relative importance of the page within the website (a value between 0.0 and 1.0)
  • <changefreq>...</changefreq> Tells the search engine how often the page is likely to change (always, hourly, daily, weekly, monthly, yearly or never)
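
The structure of the <url> blocks described above can be sketched in a few lines of Python with the standard xml.etree.ElementTree module. This is only an illustration of the file format, not how WebsiteX5 or any generator actually builds its sitemap; the function name build_sitemap and the dictionary keys are assumptions made for this example. The child elements are written in the order loc, lastmod, changefreq, priority, which is the order the sitemaps.org schema lists them in.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Child elements of <url>, in the order used by the sitemaps.org schema.
URL_FIELDS = ("loc", "lastmod", "changefreq", "priority")


def build_sitemap(pages):
    """Build sitemap XML from a list of dicts (hypothetical helper).

    Each dict needs a "loc" key; the other fields are optional.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for field in URL_FIELDS:
            if field in page:
                # ElementTree escapes characters such as & for us.
                ET.SubElement(url, field).text = str(page[field])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # The two pages from the example sitemap above.
    pages = [
        {"loc": "http://www.domain.com/page_xxx.html",
         "lastmod": "2008-11-17T19:36:47+00:00",
         "changefreq": "monthly", "priority": 0.1},
        {"loc": "http://www.domain.com/page_yyy.html",
         "lastmod": "2008-11-17T19:36:50+00:00",
         "changefreq": "hourly", "priority": 0.9},
    ]
    print(build_sitemap(pages))
```

Letting a library write the XML avoids the small escaping and nesting mistakes that make a sitemap unreadable.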



Evolution 8 can generate the sitemap for you. This works if your site consists of a single project. If you are working on a multi-project site, you have to register each project's sitemap with Google.



On the Expert Tab you can set a checkbox indicating that WebsiteX5 should generate a sitemap as part of the export.



The values for the sitemap are set at the page properties dialog on the expert tab...



Think carefully about the values of the Update Frequency and the Contents Priority...

Don't even try to create a sitemap.xml by hand. It's a hell of a job, and the smallest error can make the sitemap unreadable for the search engine. Use one of the many available online sitemap generators instead.

Most online sitemap generators are somewhat limited. Some limit the number of pages they will crawl. Others don't calculate the priority but use a fixed value for it. For a basic sitemap.xml, though, they are more than sufficient. When you use an online generator, make sure you check the sitemap.xml before you upload it to the web server. I found one that didn't crawl the site correctly: all the links were there, but most of them were faulty!
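
A quick check of a generated sitemap.xml can be automated. The sketch below, again using only Python's standard library, reports entries whose URL does not start with your site's address and values that fall outside the ranges described at www.sitemaps.org. The function name check_sitemap and the sample data are assumptions for illustration; it does not test whether the pages actually exist.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
CHANGEFREQS = {"always", "hourly", "daily", "weekly",
               "monthly", "yearly", "never"}


def check_sitemap(xml_bytes, site_root):
    """Return a list of problems found in the sitemap (hypothetical helper)."""
    problems = []
    root = ET.fromstring(xml_bytes)
    urls = root.findall("sm:url", NS)
    if not urls:
        problems.append("no <url> entries found")
    for url in urls:
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc.startswith(site_root):
            problems.append(f"unexpected or missing URL: {loc!r}")
        priority = url.findtext("sm:priority", namespaces=NS)
        if priority is not None:
            try:
                in_range = 0.0 <= float(priority) <= 1.0
            except ValueError:
                in_range = False
            if not in_range:
                problems.append(f"bad priority {priority!r} for {loc!r}")
        changefreq = url.findtext("sm:changefreq", namespaces=NS)
        if changefreq is not None and changefreq.strip() not in CHANGEFREQS:
            problems.append(f"bad changefreq {changefreq!r} for {loc!r}")
    return problems


if __name__ == "__main__":
    # Made-up sample with one faulty link, as a generator might produce.
    sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.domain.com/page_xxx.html</loc></url>
  <url><loc>http://localhost/page_yyy.html</loc></url>
</urlset>"""
    for problem in check_sitemap(sample, "http://www.domain.com/"):
        print(problem)
```

An empty result only means the file is structurally plausible; it is still worth opening a few of the listed links in a browser.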

You could also use one of the available programs for generating sitemaps, for example:

  • GSiteCrawler (a good, free program, though a bit technical). It generates not only the sitemap.xml but also a urllist for Yahoo.
  • Micro-Sys A1 Sitemap Generator (approx. €41). It generates a number of different sitemaps and is a good, easy-to-use program.


Both programs also generate the robots.txt file. You can define which pages to include in the sitemap and which pages shouldn't be crawled; the robots.txt is based on this information.
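
The idea of deriving both files from one list of exclusions can be sketched as follows. This is not how GSiteCrawler or A1 Sitemap Generator work internally; plan_crawl and its arguments are hypothetical names, and the simple prefix matching is an assumption made to keep the example short.

```python
def plan_crawl(pages, excluded_prefixes):
    """Split pages into sitemap entries and build matching robots.txt text.

    `pages` is a list of site-relative paths; any path starting with one
    of `excluded_prefixes` is left out of the sitemap and disallowed.
    """
    def is_excluded(path):
        return any(path.startswith(prefix) for prefix in excluded_prefixes)

    sitemap_pages = [path for path in pages if not is_excluded(path)]

    lines = ["User-Agent: *"]
    lines += [f"Disallow: {prefix}" for prefix in excluded_prefixes]
    lines.append("Allow: /")
    return sitemap_pages, "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paths taken from the example directory structure on this page.
    pages = ["/index.html", "/file1.html", "/file2.html",
             "/test.html", "/testsite/index.html"]
    sitemap_pages, robots_txt = plan_crawl(pages, ["/testsite", "/test.html"])
    print(sitemap_pages)
    print(robots_txt)
```

Keeping a single list of exclusions means the sitemap never advertises a page that the robots.txt forbids.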

After you have generated (and downloaded) your sitemap.xml, you have to upload it to the root directory of your website using an FTP client (the programs can do the uploading for you automatically).

Next you HAVE to register your sitemap.xml with the search engines. For Google you can use Google Webmaster Tools. On Live Search, for example, you can submit a sitemap through their site (use the webmasters link at the bottom of the Live Search page).



