

Using robots.txt to block spiders crawling your web site

'robots.txt' is a plain text file whose name gives it special meaning to most well-behaved robots on the web. By defining a few rules in this file, you can instruct robots not to crawl and index certain files or directories within your site.

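For example, if you do not want Google to crawl your site's /pictures folder, you can add a rule that tells Google's crawler to stay out of it. Keep in mind that robots.txt is only a request: it does not restrict access by itself, and badly behaved robots can ignore it.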

The following gives a few examples of how to write a robots.txt file. The file has to be placed in the www root directory of your server, so that robots can request it as /robots.txt. On Linux boxes, this directory is typically /var/www/html.
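Because robots look for the file at the root of the host, the address they request can be derived from any page address on the site. The following is a minimal Python sketch of that derivation; the helper name robots_txt_url and the host www.example.com are made up for illustration and are not part of any standard tool.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the robots.txt URL for the host that serves page_url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# www.example.com is a placeholder host.
print(robots_txt_url("https://www.example.com/pictures/cat.jpg?size=large"))
```

A robots.txt file stored in a subdirectory such as /pictures/robots.txt is not found this way, which is why the file belongs in the root directory.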

The following example shows several versions of robots.txt files, separated by a line.


# block Google's image crawler completely
User-agent: Googlebot-Image
Disallow: /

----------------------------------------

# block all spiders and bots from those 2 directories
User-agent: *
Disallow: /cgi-bin/
Disallow: /pictures/

----------------------------------------

# allow Googlebot to access everything except /cgi-bin
# and all other bots can access nothing
# finally allow ia_archiver (alexa.com) to access everything!
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: ia_archiver
Allow: /
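To check how a set of rules is interpreted before uploading it, the rules can be fed to a parser. The following is a rough sketch using the urllib.robotparser module from the Python standard library, applied to the third example above. The bot name SomeOtherBot and the tested paths are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The third example from above, with one group per robot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Disallow: /cgi-bin/

User-agent: ia_archiver
Allow: /
"""


def build_parser(text):
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Paths and the bot name "SomeOtherBot" are hypothetical test values.
checks = [
    ("Googlebot", "/index.html"),
    ("Googlebot", "/cgi-bin/test.cgi"),
    ("ia_archiver", "/pictures/cat.jpg"),
    ("SomeOtherBot", "/index.html"),
]
for agent, path in checks:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(agent, path, verdict)
```

With these rules, only the Googlebot request for /cgi-bin/test.cgi and the request from the unnamed bot are reported as blocked.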

Generated 16:03:43 on Jul 16, 2019