There's a lot of confusion about the files robots.txt and sitemap.xml. Do you need them? What do they do? And why aren't they generated by WebsiteX5? In short: yes, you do need them. Modern search engines can do a much better job of crawling your site when these files are available. Even when all your pages can be reached through internal links, it is still a good idea to provide them. It seems that your page rank goes up when they are available.
What does a robots.txt file look like?
Robots.txt is simply a text file that tells spiders, such as Googlebot, where they may and may not go on your website. For example, if you have an unfinished test site in your web space, you do not want the pages of that test site to show up in the search results. You also do not want a directory with scripts to be crawled.
Suppose your website has a directory structure like this:
/files
/guestbook
/images
/scripts
/res
/testsite
/slideshow
index.html
file1.html
file2.html
test.html
Then you probably do not want the robot to index the directories /files, /guestbook, /res, /scripts, /testsite and /slideshow. The file test.html should not be indexed either.
Now the robots.txt would look like this:
User-Agent: *
Disallow: /files
Disallow: /guestbook
Disallow: /res
Disallow: /testsite
Disallow: /scripts
Disallow: /slideshow
Disallow: /test.html
Allow: /
Each line tells the spider what it may or may not do. For example, the robot is not allowed to access files in the /testsite folder (Disallow: /testsite). It is allowed to access the files in the root folder (Allow: /) and the files in the /images folder (covered by Allow: /, because there is no Disallow: /images line). Note that paths are case-sensitive: /images and /Images are different folders.
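To check how a crawler reads these rules, you can feed the file to Python's standard urllib.robotparser module. The following is a minimal sketch. The domain www.example.com and the sample paths are placeholders. This parser applies the first rule that matches, while Google documents that it uses the most specific (longest) matching rule. For this particular file, both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the example above.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /files
Disallow: /guestbook
Disallow: /res
Disallow: /testsite
Disallow: /scripts
Disallow: /slideshow
Disallow: /test.html
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# www.example.com is a placeholder; only the path part of the URL is matched.
for path in ["/index.html", "/images/logo.png", "/testsite/index.html",
             "/test.html", "/scripts/menu.js"]:
    url = "http://www.example.com" + path
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "blocked"
    print(f"{path:25} {verdict}")
```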
One of Google's Webmaster Tools is a utility you can use to generate a robots.txt file. That's one of the reasons to sign up for a Google account and register your website with Google.
You can find more information on robots.txt at www.robotstxt.org/orig.html
Sitemap.xml
WebsiteX5 generates a file called 'imsitemap.html'. This is the page you see when you choose 'sitemap' in the menu at the bottom of each page, if you have enabled this option in the General Settings. This imsitemap.html is NOT the sitemap.xml file that search engines use.
A sitemap.xml file follows the format described at www.sitemaps.org.
In general a sitemap looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>http://www.domain.com/page_xxx.html</loc>
<lastmod>2008-11-17T19:36:47+00:00</lastmod>
<priority>0.1</priority>
<changefreq>monthly</changefreq>
</url>
<url>
<loc>http://www.domain.com/page_yyy.html</loc>
<lastmod>2008-11-17T19:36:50+00:00</lastmod>
<priority>0.9</priority>
<changefreq>hourly</changefreq>
</url>
</urlset>
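To show how the structure above fits together, here is a minimal Python sketch that builds the same kind of file with the standard xml.etree.ElementTree module. The function name build_sitemap and the page list are illustrative assumptions and are not part of WebsiteX5. The sketch leaves out the optional xsi:schemaLocation attributes. It writes the child elements in the order the sitemaps.org documentation lists them (loc, lastmod, changefreq, priority).

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap.xml text for a list of page dicts (hypothetical helper).

    Each dict needs a 'loc' key; 'lastmod', 'changefreq' and 'priority'
    are optional, as in the sitemaps.org protocol.
    """
    ET.register_namespace("", SITEMAP_NS)  # write xmlns="..." without a prefix
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# The two pages from the example above.
pages = [
    {"loc": "http://www.domain.com/page_xxx.html",
     "lastmod": "2008-11-17T19:36:47+00:00",
     "changefreq": "monthly", "priority": 0.1},
    {"loc": "http://www.domain.com/page_yyy.html",
     "lastmod": "2008-11-17T19:36:50+00:00",
     "changefreq": "hourly", "priority": 0.9},
]
print(build_sitemap(pages))
```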
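For each page in your website there is a <url> </url> block. Each block contains:
loc: the full URL of the page (required)
lastmod: the date the page was last modified, in W3C Datetime format (optional)
priority: the priority of this page relative to the other pages of your site, from 0.0 to 1.0, with 0.5 as the default (optional)
changefreq: how often the page is likely to change: always, hourly, daily, weekly, monthly, yearly or never (optional)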
WebsiteX5 Evolution 8 can generate the sitemap for you. This works if your site is built from a single project. If your site consists of several projects, you have to register each project's sitemap with Google.
On the Expert tab you can tick a checkbox to have WebsiteX5 generate a sitemap as part of the export.
The values for the sitemap are set in the page properties dialog, on the Expert tab...
Think carefully about the values of the Update Frequency and the Contents Priority...
Don't even try to create a sitemap.xml by hand. It's a hell of a job, and the smallest error can make the sitemap unreadable for the search engine. Use one of the many available online sitemap generators like
Most online sitemap generators are a bit limited in their possibilities. Some of them limit the number of pages they will crawl. Others don't calculate the priority but use a fixed value for it. For a basic sitemap.xml, however, they are more than sufficient... When you use an online generator, make sure you check the sitemap.xml before you upload it to the web server. I found one that didn't crawl the site correctly: all the links were there, but most of them were faulty!
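One way to do that check is a small script. The following Python sketch uses only the standard library. It parses the file and reports obvious problems: XML that is not well-formed, a <loc> that is not an http(s) URL on your own domain, a priority outside 0.0 to 1.0, or an unknown changefreq value. The function check_sitemap and the domain www.domain.com are placeholders. The script does not request the URLs. A link that looks correct but points to a missing page still has to be found by clicking through the site or with a link checker.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

def check_sitemap(xml_bytes, site="www.domain.com"):
    """Return a list of problems in a sitemap.xml (hypothetical helper; [] = none found)."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    if root.tag != "{%s}urlset" % NS["sm"]:
        problems.append(f"unexpected root element {root.tag}")
    for i, url in enumerate(root.findall("sm:url", NS), start=1):
        # Every <url> needs an absolute http(s) <loc> on your own domain.
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        parts = urlparse(loc)
        if parts.scheme not in ("http", "https") or parts.netloc != site:
            problems.append(f"url {i}: bad or foreign <loc> {loc!r}")
        prio = url.findtext("sm:priority", namespaces=NS)
        if prio is not None:
            try:
                ok = 0.0 <= float(prio) <= 1.0
            except ValueError:
                ok = False
            if not ok:
                problems.append(f"url {i}: priority {prio!r} not in 0.0-1.0")
        freq = url.findtext("sm:changefreq", namespaces=NS)
        if freq is not None and freq.strip() not in CHANGEFREQ:
            problems.append(f"url {i}: unknown changefreq {freq!r}")
    return problems

# A small sample with two deliberate mistakes in the second <url> block.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.domain.com/page_xxx.html</loc><priority>0.1</priority></url>
  <url><loc>htp://www.domain.com/page_yyy.html</loc><changefreq>sometimes</changefreq></url>
</urlset>"""
for problem in check_sitemap(sample):
    print(problem)
```

For a real check, read your downloaded sitemap.xml in binary mode and pass its contents to check_sitemap together with your own domain name.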
You could also use one of the available programs for generating sitemaps, for example:
Both programs also generate the robots.txt file. You define which pages to include in the sitemap and which pages shouldn't be crawled, and the robots.txt is based on this information.
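You can sketch the idea of deriving a robots.txt from a list of excluded pages in a few lines of Python. This is not a description of how those programs work internally. build_robots_txt is a hypothetical helper, and the excluded list reproduces the example file from the robots.txt section.

```python
def build_robots_txt(disallowed, user_agent="*"):
    """Build robots.txt text that blocks the given paths and allows the rest (hypothetical helper)."""
    lines = [f"User-Agent: {user_agent}"]
    for path in disallowed:
        # Rules are matched against the start of the URL path, so they must begin with "/".
        if not path.startswith("/"):
            path = "/" + path
        lines.append(f"Disallow: {path}")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"

excluded = ["/files", "/guestbook", "/res", "/testsite",
            "/scripts", "/slideshow", "/test.html"]
print(build_robots_txt(excluded), end="")
```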
After you have generated (and downloaded) your sitemap.xml, you have to upload it to the root directory of your website using an FTP client (the programs can also do the uploading for you).
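If you would rather script the upload than use an FTP client, you can use Python's standard ftplib module. The following is a minimal sketch. publish and upload_to_root are hypothetical helper names, and host, user and password stand for your own hosting account. The sketch assumes that "/" on the FTP server is your web root. Many hosts use a subfolder such as /public_html instead, so check with your provider and pass the right remote_dir.

```python
from ftplib import FTP
from pathlib import Path

def upload_to_root(ftp, filenames):
    """Upload local files into the current directory of an open FTP session."""
    for name in filenames:
        with open(name, "rb") as f:
            # STOR overwrites a file with the same name on the server.
            ftp.storbinary(f"STOR {Path(name).name}", f)

def publish(host, user, password, filenames=("sitemap.xml", "robots.txt"), remote_dir="/"):
    # host, user, password and remote_dir are placeholders for your own account.
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        ftp.cwd(remote_dir)  # assumption: this is the web root of your site
        upload_to_root(ftp, filenames)
```

A call would look like publish("ftp.yourdomain.com", "username", "password"), with your own values filled in.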
Next, you HAVE to register your sitemap.xml with the search engines. For Google you can use Google Webmaster Tools. On Live Search, for example, you can submit a sitemap through their site (use the webmasters link at the bottom of the Live Search page).