
Web page meta tags

August 9, 2009 by admin  
Filed under Web sites

META tags can be used to exclude content from search engine crawlers, which is useful when you cannot upload a robots.txt file. Their purpose is to keep content out of search engine indexes. They should be added inside the HEAD section of the page(s) in question:

(no)index determines whether the crawler should index this page. Possible values: “noindex” and “index.”

(no)follow determines whether the crawler should follow links on this page and crawl them. Possible values: “nofollow” and “follow.”

Here are a few examples:
1) This disallows both indexing and following of links by a crawler on that specific page:
<meta name="robots" content="noindex,nofollow" />

2) This disallows indexing of the page, but lets the crawler go on and follow/crawl links contained within it:
<meta name="robots" content="noindex,follow" />

3) This allows indexing of the page, but instructs the crawler to not crawl links contained within it:
<meta name="robots" content="index,nofollow" />

4) Finally, there is a shorthand way of declaring 1) above (do not index the page or follow its links):
<meta name="robots" content="none">

If this meta tag is missing, or if it has no content, or the robot terms are not specified, then the robot terms are assumed to be “index, follow” (i.e. “all”). If the keyword “all” is found in the robots terms list, it overrides all other values. That is, a robots terms list of “nofollow, all, noindex, nofollow” would effectively be “all”.
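The rules above can be sketched in a few lines of Python. This is only an illustration of how a crawler might read the tag, not how any particular search engine does it: the names RobotsMetaParser and robots_policy are made up for this example, and the handling of “all” follows the description in this article, so real crawlers may resolve conflicting terms differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Remembers the content of the first <meta name="robots"> tag seen."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <meta ... />.
        if tag != "meta" or self.content is not None:
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            self.content = attributes.get("content") or ""


def robots_policy(html):
    """Return (index, follow) as booleans for a page's HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    raw_terms = (parser.content or "").split(",")
    terms = [term.strip().lower() for term in raw_terms if term.strip()]
    # Missing tag, empty content, or "all" anywhere: index and follow.
    # (Assumption: "all" wins, as described in the article text.)
    if not terms or "all" in terms:
        return (True, True)
    if "none" in terms:
        return (False, False)
    return ("noindex" not in terms, "nofollow" not in terms)


page = '<head><meta name="robots" content="noindex,follow" /></head>'
print(robots_policy(page))  # (False, True)
```

Returning a pair of booleans keeps the two decisions (index this page, follow its links) separate, which mirrors the way the terms are defined.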

Designing a robots.txt file

August 9, 2009 by admin  
Filed under Google

Search engines build a picture of the internet by crawling it with a special program called a spider. This spider, sometimes called a bot, retrieves a map of the internet and all its web pages and files. This map is then used as the data from which results are compiled for the queries we type into search engines like Google and Yahoo.

The robots.txt file sits in the root of your web site and tells these search engine bots what NOT to spider. Areas you may not want them to spider, and hence not show up in search queries, are sensitive pages, areas with no suitable content, images and pages of duplicate content. Indexing the same content more than once (for example in monthly archives, category folders and on your front page) risks the bots marking it as duplicate content. Duplicate content usually ends up in a search engine's supplemental index as opposed to its main index.

A robots.txt file can be created in Notepad and contains statements like those below:

To stop all bots from indexing your entire site (“/” matches every path)
User-agent: *
Disallow: /

To block Google's image bot from scanning the site
User-agent: Googlebot-Image
Disallow: /

To prevent all bots from indexing certain directories and files
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.htm
Disallow: /file.html

Disallow all bots from indexing except Alexa
User-agent: *
Disallow: /
User-agent: ia_archiver
Disallow:

Disallow all bots except Googlebot. This uses the Allow term, an extension to the original robots.txt standard that not every bot understands
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
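Before uploading a robots.txt file it can help to check that the rules do what you expect. The sketch below uses Python's standard urllib.robotparser module to test two of the examples above without touching the network. The bot name “ExampleBot”, the helper build_parser and the tested paths are placeholders chosen for this illustration, and Python's parser is only one interpretation of the rules, so a real search engine bot may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt rules held in a string (no download involved)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The "certain directories" example.
directories = build_parser("""\
User-agent: *
Disallow: /cgi-bin/
Disallow: /privatedir/
Disallow: /tutorials/blank.htm
Disallow: /file.html
""")

# The "all bots except Googlebot" example.
googlebot_only = build_parser("""\
User-agent: *
Disallow: /
User-agent: Googlebot
Allow: /
""")

site = "http://www.yoursite.com"  # placeholder host

print(directories.can_fetch("ExampleBot", site + "/cgi-bin/search.pl"))  # False
print(directories.can_fetch("ExampleBot", site + "/index.html"))         # True
print(googlebot_only.can_fetch("Googlebot", site + "/index.html"))       # True
print(googlebot_only.can_fetch("ExampleBot", site + "/index.html"))      # False
```

To check a live site instead, the same class offers set_url() and read(), which download the file before can_fetch() is called.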

This statement tells the bots where your sitemap is
Sitemap: http://www.yoursite.com/sitemap.xml
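As a quick check that the Sitemap line is picked up, the same standard-library parser can list the sitemaps it found. This is a small sketch that assumes Python 3.8 or later (where site_maps() is available); the rules fed to the parser are just sample lines taken from the examples in this post.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([
    "User-agent: *",
    "Disallow: /cgi-bin/",
    "Sitemap: http://www.yoursite.com/sitemap.xml",
])

# site_maps() returns a list of URLs, or None if no Sitemap line was found.
sitemaps = parser.site_maps()
print(sitemaps)  # ['http://www.yoursite.com/sitemap.xml']
```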

Do not leave a robots.txt file empty, as some bots will then not index your site.