[tgarip@neu.edu.tr: [Koha-devel] Building zebradb]

----- Forwarded message from Tümer Garip <tgarip@...> -----
Hi,

We have now put Zebra into production-level systems, so here is some experience to share.

Building the Zebra database from single records is a veeeeery looong process (100K records, 150K items).

Best method we found:

1- Change the zebra.cfg file to include:

iso2079.recordType:grs.marcxml.collection
recordType:grs.xml.collection

2- Write (or hack export.pl) to export all the MARC records as one big chunk to the correct directory with the extension .iso2079, then make the system call "zebraidx -g iso2079 -d <dbnamehere> update records -n".

This ensures that Zebra knows it's reading MARC records rather than XML, and it builds 100K+ records at zooming speed. Your ZOOM module always uses the grs.xml filter, while you can at any time update or reindex any big chunk of the database as long as you have MARC records.

3- We are still using the old API, so we read the XML and use MARC::Record->new_from_xml( $xmldata ). A note here: we did not have to upgrade MARC::Record or MARC::Charset at all. Any MARC created within Koha is UTF-8, and any MARC imported into Koha (old marc_subfield_tables) was correctly decoded to UTF-8 with char_decode of biblio.

4- We modified circ2.pm and the items table to have an item onloan field and mapped it to the MARC holdings data. Now our OPAC search does not call MySQL except for the branchname.

5- Average updates per day are about 2000 (circulation + cataloguer). I can say that the speed of the ZOOM search, which slows down during a commit operation, is acceptable considering the speed gain we have on the search.

6- Zebra behaves very well with searches but is very temperamental with updates. A queue of updates sometimes crashes the zebraserver. When the database crashes we cannot save anything, even though we are using shadow files. I'll be reporting on this issue once we can isolate the problems.

Regards,
Tumer

_______________________________________________
Koha-devel mailing list
Koha-devel@...
http://lists.nongnu.org/mailman/listinfo/koha-devel

----- End forwarded message -----

--
Joshua Ferraro            VENDOR SERVICES FOR OPEN-SOURCE SOFTWARE
President, Technology     migration, training, maintenance, support
LibLime                   Featuring Koha Open-Source ILS
jmf@... |Full Demos at http://liblime.com/koha |1(888)KohaILS

_______________________________________________
Koha-zebra mailing list
Koha-zebra@...
http://lists.nongnu.org/mailman/listinfo/koha-zebra
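Step 2 of the message above can be scripted. The following is a minimal Python sketch, not the poster's actual code: it only assembles the zebraidx call in the same order as quoted in the message and, if asked, runs it. The function names, the `dry_run` flag and the database name `mydb` are hypothetical. It assumes `zebraidx` is on the PATH and that zebra.cfg already contains the `iso2079` group lines from step 1.

```python
import shlex
import subprocess
from typing import List


def build_zebraidx_command(database: str,
                           records_dir: str = "records",
                           group: str = "iso2079") -> List[str]:
    """Assemble the bulk-update call quoted in the message.

    -g names the record group defined in zebra.cfg, -d names the database,
    and -n is passed through exactly as the message gives it.
    """
    return ["zebraidx", "-g", group, "-d", database,
            "update", records_dir, "-n"]


def bulk_update(database: str,
                records_dir: str = "records",
                dry_run: bool = True) -> str:
    """Return the shell-quoted command; run it only when dry_run is False."""
    cmd = build_zebraidx_command(database, records_dir)
    if not dry_run:
        # Assumption: zebraidx is installed and zebra.cfg is in the
        # current working directory.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # "mydb" is a placeholder for <dbnamehere> in the message.
    print(bulk_update("mydb"))
```

Leaving `dry_run` at its default prints the command so it can be checked before a 100K-record rebuild is started.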
Re: [tgarip@neu.edu.tr: [Koha-devel] Building zebradb]

Hi Joshua,

Thanks for this feedback, it's very interesting. Clearly some of the issues you describe (i.e. a lack of stability around updates) indicate software problems, but there are also some interesting ideas for possible refinements or new developments which I think would be really useful to get into the general development plans for the software. There are more folks at ID involved in Zebra development that I'd like to get into these thoughts. I dunno if a wiki or just a larger zebra-dev list is in order, but it's something to think about.

Joshua Ferraro wrote:

>----- Forwarded message from Tümer Garip <tgarip@...> -----
>
>Hi,
>We have now put the zebra into production level systems. So here is some
>experience to share.
>
>Building the zebra database from single records is a veeeeery looong
>process. (100K records 150k items)

[...] kind of buffering into for first-time updating, or else the logic has to be in the application, as you've seen. The situation is particularly grim if shadow-indexing is enabled during indexing and every record is committed, since this causes a sync of the disks, which could take up to a whole second.

Also, I'm not sure which version of Zebra you're using? I've been doing some performance testing of Zebra for the Internet Archive, and noted quite a difference between 1.3 and 1.4 (the CVS version), which is really where all the development happens.

>Best method we found:
>
>1- Change zebra.cfg file to include
>
>iso2079.recordType:grs.marcxml.collection
>recordType:grs.xml.collection
>
>2- Write (or hack export.pl) to export all the marc records as one big
>chunk to the correct directory with an extension .iso2079 And system
>call "zebraidx -g iso2079 -d <dbnamehere> update records -n".
>
>This ensures that zebra knows it's reading marc records rather than xml
>and builds 100K+ records in zooming speed.
>Your zoom module always uses the grs.xml filter while you can anytime
>update or reindex any big chunk of the database as long as you have marc
>records.

[...] use two different formats, especially when they both have limitations. We really must look into handling ISO2709 from ZOOM. Mind you, version 1.4 should be able to read multiple collection-wrapped MARCXML records in one file, but only (AFAIK) in conjunction with the new XSLT-based index rules. I *would* like to try to develop a good way to work with bibliographic data in that framework.

>3-We are still using the old API so we read the xml and use
>MARC::Record->new_from_xml( $xmldata )
>A note here that we did not have to upgrade MARC::Record or MARC::Charset
>at all. Any marc created within KOHA is UTF8 and any marc imported into
>KOHA (old marc_subfield_tables) was correctly decoded to utf8 with
>char_decode of biblio.

Why not ask for the records in ISO2709 from Zebra if that's what you want to work with (when records are loaded in MARCXML, it can spit out either 2709 or XML)? Or is it just that you want to have them in MARC::Record? At any rate, ISO2709 is a more compact exchange format, so it seems to make sense to use it.

>4- We modified circ2.pm and items table to have item onloan field and
>mapped it to marc holdings data. Now our opac search do not call mysql
>but for the branchname.
>
>5- Average updates per day is about 2000 (circulation+cataloger). I can
>say that the speed of the zoom search which slows down during a commit
>operation is acceptable considering the speed gain we have on the
>search.

Because the Internet Archive will be doing similar point-updates for records (only a *lot* more often than 2000 times per day), I have been looking a lot at the update speed for these small changes that only affect a single term in a record (like a circulation code). In my test database of 150K records, 10K average size, I get update times around 0.5-0.7 seconds, which just seems intuitively slower than it should have to be. I'm going to nudge the coders to see if we can possibly do this better.

>6- Zebra behaves very well with searches but is very temperamental with
>updates. A queue of updates sometimes crashes the zebraserver. When the
>database crashes we can not save anything even though we are using shadow
>files. I'll be reporting on this issue once we can isolate the problems.

Now *that* is nasty. Do you have *any* way of consistently recreating the problem? Even if you don't, I'm sure Adam and the Zebra crew would like to see some stack traces of those crashes!!

--Sebastian

>Regards,
>Tumer

--
Sebastian Hammer, Index Data
quinn@... www.indexdata.com
Ph: (603) 209-6853
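The timing figures in this reply can be put into perspective with some quick arithmetic. The Python sketch below is only an illustration: it takes the worst case mentioned above (up to a full second per commit when every record is committed with shadow indexing enabled), assumes updates are applied strictly one at a time, and uses a made-up helper name.

```python
def total_seconds(count: int, seconds_each: float) -> float:
    """Total wall-clock time if `count` operations run one after another."""
    return count * seconds_each


# First-time load: 100K records, each committed separately,
# at the worst-case cost of one second per commit.
first_load_hours = total_seconds(100_000, 1.0) / 3600

# Daily point-updates: 2000 per day at 0.5 to 0.7 seconds each.
daily_low_minutes = total_seconds(2000, 0.5) / 60
daily_high_minutes = total_seconds(2000, 0.7) / 60

if __name__ == "__main__":
    print(f"first load, one commit per record: about {first_load_hours:.1f} hours")
    print(f"daily updates: about {daily_low_minutes:.0f} to "
          f"{daily_high_minutes:.0f} minutes")
```

Under these assumptions a record-by-record first load costs on the order of a day (roughly 28 hours), while 2000 daily point-updates add up to roughly 17 to 23 minutes of update time. That is consistent with the bulk-load workaround being needed for the initial build while the daily update load stays tolerable.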