
Any hope for "Allow" field in robots.txt?


First question: why do you want to exclude the users' pages? That could be a lot of traffic.


It's traffic, but not the revenue-generating traffic we want. User sites, and all of those directories underneath, heavily dilute our own main site's rankings. I know this isn't what some users would want, but our service won't stay free and available if we can't pay the bills through advertisements. Having a much improved SERP will be impossible if we don't restrict the spidering of these sites. Besides, our site has a searchable directory for our user sites. "WE" need the traffic to come through our main site as much as possible for new site visits.


I think you're onto the right idea. According to the [url=http://www.google.com/webmasters/faq.html]Google webmaster FAQ[/url], the syntax is:
[code]
User-Agent: *
Disallow: /
Allow: /*.html$
[/code]
So based on this example, your example is correct:
1) you disallow crawling by Googlebot for everything under /
2) you explicitly state the directories you want Googlebot to crawl
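To see how a Disallow: / combined with a wildcard Allow could play out, here is a minimal Python sketch. It assumes the precedence rule Google documents for Googlebot (the longest matching pattern wins, and Allow wins a tie) and the `*` and `$` wildcards from the example above. Other crawlers may evaluate the same file differently, and the helper names `pattern_to_regex` and `is_allowed` are made up for illustration.

```python
import re

def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))

def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs from one User-agent group.
    Assumption: longest matching pattern wins, Allow wins ties,
    and a path that matches no rule is allowed.
    """
    best_len, allowed = -1, True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty value matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len, allowed = len(pattern), is_allow
    return allowed

# The rule set quoted in the post above.
RULES = [("Disallow", "/"), ("Allow", "/*.html$")]

for path in ("/index.html", "/images/logo.png", "/page.html?x=1"):
    print(path, is_allowed(RULES, path))
```

Under these assumptions, only URLs that end in .html get through, so this particular rule set would also let in .html pages inside user directories. Listing the wanted directories explicitly avoids that.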


That's reassuring. I wonder what the net effect will be if I do this... I suppose Google would be in good shape for us, but the other major search engines wouldn't spider us... Would you suggest I specify a unique set of instructions for "googlebot" but leave the rest of the site open to other spiders? Please critique this example syntax:
[code]
# this section covers just Google
User-agent: googlebot
Disallow: /
Allow: /community/ #main public site's home directory
Allow: /faq/
Allow: /forums/
Allow: #...etc... other desired visible directories here

# this covers everyone else
User-agent: *
Disallow:
[/code]
Also, is there a recommended list of spiderbots to restrict? I know that not all spiders will follow robots.txt, but I'd like to be made aware of the recommended "bad ones" out there. Thanks for your advice on this! :D
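One detail worth checking in a group like this is the order of the Allow and Disallow lines. Google documents that the most specific (longest) matching rule wins, so order should not matter for Googlebot. Some parsers instead stop at the first rule that matches, and Python's standard `urllib.robotparser` is generally reported to work that way. The sketch below feeds both orderings to that module so the difference, if any, shows up on your own interpreter. The path `/users/alice/` and the helper `can_crawl` are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Same rules, two orderings. Only /forums/ is listed to keep the sketch short.
DISALLOW_FIRST = """\
User-agent: googlebot
Disallow: /
Allow: /forums/
"""

ALLOW_FIRST = """\
User-agent: googlebot
Allow: /forums/
Disallow: /
"""

def can_crawl(robots_txt, agent, path):
    """Parse robots.txt text and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

for label, text in (("Disallow first", DISALLOW_FIRST), ("Allow first", ALLOW_FIRST)):
    print(label,
          "| /forums/topic1:", can_crawl(text, "googlebot", "/forums/topic1"),
          "| /users/alice/:", can_crawl(text, "googlebot", "/users/alice/"))
```

Putting the Allow lines before Disallow: / gives the same result under both a first-match and a longest-match parser, so it is the more conservative ordering.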


The User-agent field is used to set the behavior for each search engine's crawler. So, to be thorough, you would create an entry for each one. But really, Google is the most important one, so you could just use a wildcard for the others. Here's information on [url=http://help.yahoo.com/help/us/ysearch/slurp]Yahoo's Slurp[/url] and on [url=http://search.msn.com/webmasters/msnbot.aspx]msnbot[/url]. Between Google, msnbot, and Yahoo, that will be 95%+ of your referrers anyway, so I'd do those three at least. Unfortunately, I've never done this, so my advice is not based on experience, just reading. But it sure looks like you have it right. I would probably move the users to a subdomain; it would be a lot easier to manage, especially if you put them on a separate server.


[quote="edwin"]to be thorough, you would create an entry for each one. but really, google is the most important one so you could just use a wildcard for the others.[/quote]
Thanks for all the advice! :D Just one last specific point to verify: should the "google, yahoo, msn" entries be ABOVE the wildcard entry in the file, or does the order in which they are listed not matter? For example:
#list the important ones first
User-agent: googlebot...(msn, yahoo...)
Disallow....
# everyone else goes last
User-agent: *
Dis...
Does it matter? Thanks!


I doubt it matters, but I'd probably put the important ones first, just like you were thinking :)
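A quick way to test the group-order question is to parse the same groups in both orders and compare the answers. This sketch uses Python's standard `urllib.robotparser`, which is only one implementation, so it shows how that parser behaves rather than how every crawler does. The path `/users/alice/` and the helper `verdicts` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

SPECIFIC_FIRST = """\
User-agent: googlebot
Allow: /forums/
Disallow: /

User-agent: *
Disallow:
"""

WILDCARD_FIRST = """\
User-agent: *
Disallow:

User-agent: googlebot
Allow: /forums/
Disallow: /
"""

# (crawler name, path) pairs to check against each file.
CHECKS = [
    ("googlebot", "/forums/topic1"),
    ("googlebot", "/users/alice/"),
    ("msnbot", "/users/alice/"),
]

def verdicts(robots_txt, checks):
    """Return a can-fetch answer for each (agent, path) pair."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [parser.can_fetch(agent, path) for agent, path in checks]

print("specific first:", verdicts(SPECIFIC_FIRST, CHECKS))
print("wildcard first:", verdicts(WILDCARD_FIRST, CHECKS))
```

If both lines print the same list, that parser picks the group by crawler name and only falls back to the wildcard group when no named group applies, whichever comes first in the file.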



Copyright 2008, All Rights Reserved.