robots.txt

robots.txt

by Sébastien Hinderer-2

Dear all,

Does anyone use a robots.txt file on a Koha web site?
If so, what should or could such a file contain?
Many thanks in advance for any recommendation or advice,
Sébastien.
_______________________________________________
Koha mailing list
Koha@...
http://lists.katipo.co.nz/mailman/listinfo/koha

Re: robots.txt

by Bernardo Gonzalez Kriegel

I use

User-agent: *
Disallow: /

in /usr/share/koha/opac/htdocs/robots.txt

bgk
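
For anyone who wants to check what those two lines do before deploying them, here is a minimal sketch using Python's standard urllib.robotparser module. The host opac.example.org, the sample record URL, and the helper name is_allowed are assumptions made up for illustration; nothing here comes from Koha itself.

```python
from urllib.robotparser import RobotFileParser

# The two-line policy quoted above: every crawler, every path.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # opac.example.org is a placeholder host, not a real Koha site.
    for path in ("/", "/cgi-bin/koha/opac-detail.pl?biblionumber=1"):
        url = "http://opac.example.org" + path
        print(path, is_allowed(BLOCK_ALL, "Googlebot", url))
```

Both lines should print False: with this file in place, a well-behaved crawler will not fetch the front page or any record page.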

On Mon, Nov 2, 2009 at 4:14 AM, Sébastien Hinderer <Sebastien.Hinderer@...> wrote:
Dear all,

Does anyone use a robots.txt file on a Koha web site?
If so, what should or could such a file contain?
Many thanks in advance for any recommendation or advice,
Sébastien.

Re: robots.txt

by MJ Ray-2

Sébastien Hinderer <Sebastien.Hinderer@...> wrote:
> Does anyone use a robots.txt file on a Koha web site?
> If so, what should or could such a file contain?
> Many thanks in advance for any recommendation or advice,

It could contain any of http://www.robotstxt.org/robotstxt.html

It should contain whatever will make robots behave as you want.

I don't think our libraries currently use one but I've not checked
specifically for this.

Hope that helps,
--
MJ Ray, member of www.software.coop Experts in web and GNU/Linux
(TTLLP # in subject emails = copy to all workers unless asked.)
Turo Technology LLP, reg'd in England+Wales, number OC303457
Reg. Office: 36 Orchard Cl., Kewstoke, Somerset, GB-BS22 9XY

Re: robots.txt

by Owen Leonard-4

> It could contain any of http://www.robotstxt.org/robotstxt.html
...
> I don't think our libraries currently use one but I've not checked
> specifically for this.

When we were hosting our own Koha installation we had to start
excluding search engine bots (Googlebot in particular) because our
server was getting hit too hard and it was slowing everything down. I
think LibLime blocks everything by default for its customers now. I'd
certainly prefer to be able to let Google in. I'd like the contents of
the OPAC to be discoverable in search engines.

  -- Owen

--
Web Developer
Athens County Public Libraries
http://www.myacpl.org

Re: robots.txt

by Sébastien Hinderer-2

MJ Ray (2009/11/02 15:14 +0000):
> It could contain any of http://www.robotstxt.org/robotstxt.html

I didn't know this link, thanks.

> It should contain whatever will make robots behave as you want.

That's what I don't know -- how I want them to behave.
My guess is that everything should be disallowed because no page has a
meaning without arguments... What I would like to know is whether this
guess is correct or not.

> I don't think our libraries currently use one but I've not checked
> specifically for this.

I checked the Apache error logs and noticed that some clients look for
this file, so I thought I could create one to keep this not very
interesting entry out of the error log.

Cheers,
Sébastien.

Re: robots.txt

by MJ Ray-2

Owen Leonard <oleonard@...> wrote:
> When we were hosting our own Koha installation we had to start
> excluding search engine bots (Googlebot in particular) because our
> server was getting hit too hard and it was slowing everything down. I
> think LibLime blocks everything by default for its customers now. I'd
> certainly prefer to be able to let Google in. I'd like the contents of
> the OPAC to be discoverable in search engines.

That's pretty much what our librarians have told me when I've asked.
To some of them, more eyeballs means more borrowers means more lends
and pretty directly means more funding.

It is possible to use things like Google Webmaster Tools and even
iptables to slow the search engine bots down if/when they become a
problem.  I don't know if that will override settings like LibLime's
blocking.

In general, I feel it sucks to be doing the search engine's work for
them and they should tread lightly by default, but that's the
trade-off if you want the OPAC to be indexed at the moment.

Hope that helps,
--
MJ Ray (slef)  Webmaster and LMS developer at     | software
www.software.coop http://mjr.towers.org.uk        |  .... co
IMO only: see http://mjr.towers.org.uk/email.html |  .... op
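
One more option for slowing bots down, not mentioned in the message above, is the Crawl-delay line in robots.txt. It is a non-standard extension: some crawlers honor it, while Googlebot is documented as ignoring it, so for Google the rate still has to be managed some other way. The sketch below only shows how such a line is read back with Python's standard urllib.robotparser; the 10-second value and the helper name requested_delay are assumptions chosen for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed policy: allow everything, but ask crawlers to wait 10 seconds
# between requests. Crawl-delay is a non-standard extension and is only
# a request; crawlers are free to ignore it.
SLOW_DOWN = """\
User-agent: *
Crawl-delay: 10
Disallow:
"""


def requested_delay(robots_txt: str, agent: str):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


if __name__ == "__main__":
    print(requested_delay(SLOW_DOWN, "SomeBot"))
```

Unlike an iptables rule, this depends entirely on the crawler choosing to cooperate.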

Re: robots.txt

by Ben Ide


If your library uses OCLC to record your holdings, bear in mind that this information is passed on to Google (and a few other places, I think) and is always searchable at http://www.worldcat.org


Thanks,
-- Ben

On Mon, Nov 2, 2009 at 2:14 AM, Sébastien Hinderer <Sebastien.Hinderer@...> wrote:
Dear all,

Does anyone use a robots.txt file on a Koha web site?
If so, what should or could such a file contain?
Many thanks in advance for any recommendation or advice,
Sébastien.

Re: robots.txt

by Magnus Enger

2009/11/2 Sébastien Hinderer <Sebastien.Hinderer@...>:
> That's what I don't know -- how I want them to behave.
> My guess is that everything should be disallowed because no page has a
> meaning without arguments... What I would like to know is whether this
> guess is correct or not.

I think it's not, actually. I have set up Koha for my customer #1 at
sksk.bibkat.no and without doing anything I now get 7,000+ hits in
Google when I search for site:sksk.bibkat.no:

http://www.google.com/search?q=site%3Asksk.bibkat.no

The very first hit is for a page of search results for the Norwegian
word for birds. How they figured that out, I have no idea! The strange
thing is that this catalogue is hardly linked to from anywhere, so
they must have some way to index the catalogue other than just
following links.

I notice that on the second page of search results there are several
MARC views, hardly what you want patrons to find first. So perhaps
there should be some way to tell bots to just index the "ordinary"
views, not things like MARC?

Also, having a robots.txt just to say "index everything" sounds like a
good idea, to avoid the "robots.txt not found" messages in the error
log.

Regards,
Magnus
libriotech.no
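
The two ideas above (an "index everything" file, and keeping bots out of the MARC views only) could be sketched as follows, again checked with Python's standard urllib.robotparser. The script names opac-MARCdetail.pl and opac-ISBDdetail.pl, the /cgi-bin/koha/ prefix, and the host opac.example.org are assumptions; compare them with the URLs your own OPAC actually serves before relying on this.

```python
from urllib.robotparser import RobotFileParser

# An "index everything" file: an empty Disallow excludes nothing, and
# the file's presence stops the "robots.txt not found" log entries.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Assumed script names for the MARC and ISBD views (hypothetical paths,
# verify against your installation). Everything else stays crawlable.
SKIP_MARC = """\
User-agent: *
Disallow: /cgi-bin/koha/opac-MARCdetail.pl
Disallow: /cgi-bin/koha/opac-ISBDdetail.pl
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "http://opac.example.org/cgi-bin/koha/"  # placeholder host
    for script in ("opac-detail.pl", "opac-MARCdetail.pl"):
        url = base + script + "?biblionumber=42"
        print(script, is_allowed(SKIP_MARC, "Googlebot", url))
```

Robots.txt rules match on the start of the path, so the query string after the script name does not need to be listed.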