firewall Spyware removal internet speed test web security service free software and tips


Go Back   Web Security > Web Site Optimization > Optimization Techniques
FAQ Members List Search Today's Posts Mark Forums Read

Optimization Techniques Ways to optimize your website so that you rank better on the search engines. Questions on how to change your site so it attacts more visitors, and other similar topics.

Advertisements

robots.txt

Reply
 
LinkBack Thread Tools Display Modes
  #1 (permalink)  
Old 02-04-2008,
Senior Member
 
Join Date: Feb 2007
Posts: 360
Default robots.txt

Jim,

Is there anyway I can make my robots.txt file any more accessible?
Right now I have
User-agent: *
Disallow:

and my awstats for my site, albeit the free version (6.6), for example google stats are 64 +4

and
"Robots shown here gave hits or traffic "not viewed" by visitors, so they are not included in other charts. Numbers after + are successful hits on "robots.txt" files."

Can you think a way to improve the + number or doesn't it matter becuase I don't make as many changes to my site as I use to.

Thanks Jim,

Roger
Reply With Quote
  #2 (permalink)  
Old 02-05-2008,
AMPC's Avatar
Administrator
 
Join Date: Jan 2007
Posts: 1,415
Default Robots.txt Example

Hi Roger,

There is not much you can do to the robots.txt file except for excluding bots that may artificially inflate your stats. You have to be careful not to exclude the good bots, like google, msn, Y!, etc.

Here is what I use on my bigger sites:

Quote:
User-agent: *
Disallow: /visit.php

User-agent: NPBot
Disallow: /

User-agent: grub-client
Disallow: /

User-agent: grub
Disallow: /

User-agent: WebZip
Disallow: /

User-agent: larbin
Disallow: /

User-agent: b2w/0.1
Disallow: /

User-agent: psbot
Disallow: /

User-agent: Python-urllib
Disallow: /

User-agent: NetMechanic
Disallow: /

User-agent: URL_Spider_Pro
Disallow: /

User-agent: CherryPicker
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: EmailSiphon
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: CopyRightCheck
Disallow: /

User-agent: Crescent
Disallow: /

User-agent: SiteSnagger
Disallow: /

User-agent: ProWebWalker
Disallow: /

User-agent: CheeseBot
Disallow: /

User-agent: LNSpiderguy
Disallow: /

User-agent: Teleport
Disallow: /

User-agent: TeleportPro
Disallow: /

User-agent: MIIxpc
Disallow: /

User-agent: Telesoft
Disallow: /

User-agent: Website Quester
Disallow: /

User-agent: moget/2.1
Disallow: /

User-agent: WebZip/4.0
Disallow: /

User-agent: WebStripper
Disallow: /

User-agent: WebSauger
Disallow: /

User-agent: WebCopier
Disallow: /

User-agent: NetAnts
Disallow: /

User-agent: Mister PiX
Disallow: /

User-agent: WebAuto
Disallow: /

User-agent: TheNomad
Disallow: /

User-agent: WWW-Collector-E
Disallow: /

User-agent: RMA
Disallow: /

User-agent: libWeb/clsHTTP
Disallow: /

User-agent: asterias
Disallow: /

User-agent: httplib
Disallow: /

User-agent: turingos
Disallow: /

User-agent: spanner
Disallow: /

User-agent: InfoNaviRobot
Disallow: /

User-agent: Harvest/1.5
Disallow: /

User-agent: Bullseye/1.0
Disallow: /

User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /

User-agent: CherryPickerSE/1.0
Disallow: /

User-agent: CherryPickerElite/1.0
Disallow: /

User-agent: WebBandit/3.50
Disallow: /

User-agent: NICErsPRO
Disallow: /

User-agent: Microsoft URL Control - 5.01.4511
Disallow: /

User-agent: DittoSpyder
Disallow: /

User-agent: Foobot
Disallow: /

User-agent: SpankBot
Disallow: /

User-agent: BotALot
Disallow: /

User-agent: lwp-trivial/1.34
Disallow: /

User-agent: lwp-trivial
Disallow: /

User-agent: BunnySlippers
Disallow: /

User-agent: Microsoft URL Control - 6.00.8169
Disallow: /

User-agent: URLy Warning
Disallow: /

User-agent: Wget/1.6
Disallow: /

User-agent: Wget/1.5.3
Disallow: /

User-agent: Wget
Disallow: /

User-agent: LinkWalker
Disallow: /

User-agent: cosmos
Disallow: /

User-agent: moget
Disallow: /

User-agent: hloader
Disallow: /

User-agent: humanlinks
Disallow: /

User-agent: LinkextractorPro
Disallow: /

User-agent: Offline Explorer
Disallow: /

User-agent: Mata Hari
Disallow: /

User-agent: LexiBot
Disallow: /

User-agent: Web Image Collector
Disallow: /

User-agent: The Intraformant
Disallow: /

User-agent: True_Robot/1.0
Disallow: /

User-agent: True_Robot
Disallow: /

User-agent: BlowFish/1.0
Disallow: /

User-agent: JennyBot
Disallow: /

User-agent: MIIxpc/4.2
Disallow: /

User-agent: BuiltBotTough
Disallow: /

User-agent: ProPowerBot/2.14
Disallow: /

User-agent: BackDoorBot/1.0
Disallow: /

User-agent: toCrawl/UrlDispatcher
Disallow: /

User-agent: WebEnhancer
Disallow: /

User-agent: suzuran
Disallow: /

User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /

User-agent: VCI
Disallow: /

User-agent: Szukacz/1.4
Disallow: /

User-agent: QueryN Metasearch
Disallow: /

User-agent: Openfind data gathere
Disallow: /

User-agent: Openfind
Disallow: /

User-agent: Xenu's Link Sleuth 1.1c
Disallow: /

User-agent: Xenu's
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /

User-agent: RepoMonkey
Disallow: /

User-agent: Microsoft URL Control
Disallow: /

User-agent: Openbot
Disallow: /

User-agent: URL Control
Disallow: /

User-agent: Zeus Link Scout
Disallow: /

User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /

User-agent: Webster Pro
Disallow: /

User-agent: EroCrawler
Disallow: /

User-agent: LinkScan/8.1a Unix
Disallow: /

User-agent: Keyword Density/0.9
Disallow: /

User-agent: Kenjin Spider
Disallow: /

User-agent: Iron33/1.0.2
Disallow: /

User-agent: Bookmark search tool
Disallow: /

User-agent: GetRight/4.2
Disallow: /

User-agent: FairAd Client
Disallow: /

User-agent: Gaisbot
Disallow: /

User-agent: Aqua_Products
Disallow: /

User-agent: Radiation Retriever 1.1
Disallow: /

User-agent: Flaming AttackBot
Disallow: /

User-agent: PerMan
Disallow: /

User-agent: searchpreview
Disallow: /

User-agent: BecomeBot
Disallow: /
I just set the file up and leave it - nothing else to do.

Also, see the first entry, visit.php, that tells the bots that they can't access visit.php (other files are fine, but not this file). I don't really have a visit.php file, but I do monitor calls to it and when a bot or someone attempts to access it, I can act accordingly.

Many website owners place the names of files and directories that they don't want the bots to visit in their robots.txt file and this is one of the first places a malicious hacker will look for information.

It's most interesting when you find a search engine that you think would / should respect the robots.txt file trying to access the file anyway!

Regards,

Jim.
Reply With Quote
  #3 (permalink)  
Old 02-05-2008,
Senior Member
 
Join Date: Feb 2007
Posts: 360
Default Bots

Jim,

Holy Toledo. I'll have to add that to my robots.txt file.

Thanks Jim,

Roger
Reply With Quote
  #4 (permalink)  
Old 10-01-2008,
Banned
 
Join Date: Sep 2008
Posts: 3
Thumbs up Re: robots.txt

Robots.txt is a file that you create and upload to your websites root directory that is used by search engine spiders to determine which parts of your website they should index and which folders/pages that you, the website owner, don't want listed in search engine indexes.
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off



All times are GMT -5. The time now is .


Powered by a CPU
vBulletin® v3.8.4, Copyright ©2000-2009, Jelsoft Enterprises Ltd.
LinkBacks Enabled by vBSEO 3.3.2 © 2009, Crawlability, Inc.