ZOOM-Perl release 1.10, now with different CCL

Josh and others,

I've just made release 1.10 of ZOOM-Perl, which uses Adam's preferred CCL API rather than the one I originally implemented. It will show up on CPAN, but until then please use:

http://www.miketaylor.org.uk/tmp/Net-Z3950-ZOOM-1.10.tar.gz

It will build against the most recent YAZ release.

I am off to watch England vs. Trinidad and Tobago!

Mike Taylor <mike@...> http://www.miketaylor.org.uk
"If you are a fascinating writer, then you follow a deeper set of rules which make the normal ones irrelevant. If not, then you need to follow the normal rules until you get fascinating" -- Greg Gunther.

_______________________________________________
Koha-zebra mailing list
Koha-zebra@...
http://lists.nongnu.org/mailman/listinfo/koha-zebra
Differently index same field numbers from different record types

This issue revisits an indexing problem related to the problem which appeared in the thread "[Zebralist] how to index everything ?"

Sebastian Hammer wrote:
> Hi Paul,
>
> I don't know if this helps, but if you add the line 'xpath enable' to
> your .abs file, Zebra will build additional index structures to enable
> searches like:
>
> Z> find @attr 1=/*/title someterm
>
> What is supported is a subset of the XPATH spec, but I *think* you can do:
>
> Z> find @attr 1=/*/datafield[@tag='245'] someterm
>
> In other words, XPATH-statements are used to select elements for
> searching, as an alternative to numerical USE attributes.
>
> Performance is not quite as good as for the regular indexes, so it's not
> something you want to do a lot in production on a 10M record database...
> but it's fine for smaller applications.

Unlike the issue presented in the earlier thread, this issue requires high performance.

Sebastian Hammer wrote:
> Hmm. You can speed things up by having a specialized tag index.
>
> something like
>
> xelem /record/datafield/@tag tag
>
> in your abs file.
>
> then you can query something like
>
> Z> find @and @attr 1=tag '245' @attr 1=/*/datafield/subfield[code='9']
> someterm
>
> to speed things up a bit.
>
>
> You could also define an index for each combination of tag/subfields,
> but that might be an administration nightmare.

Sebastian Hammer wrote:
> That wouldn't work out of the box. But this 'should work' (haven't tried
> it):
>
> Z> find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm

Maybe we will need an administration nightmare to have the system function as needed.

1. INDEXING PROBLEM.

We need to be able to index fields with the same field number from different record types differently. How can different indexing for the same field number be accomplished without storing the different record types in separate databases?
Record type can be distinguished by the value of 000/06, but I am uncertain that will help properly in all circumstances where we do actually want to search across multiple record types when the records are related.

2. MARC CONFLICT EXAMPLES.

I have not inspected well to consider all the cases risking false results if the record types are not distinguished well.

2.1. FIXED LENGTH FIELD CASE.

I have always believed that the basic fixed length data elements fields need local use field analogues with appropriate values to ease searching, because record type and even bibliographic level within a record type change the meaning of fixed length data elements. MARC 21 008 and UNIMARC 100 have this variance problem. Supplementary local use fields might be a reasonable choice for solving other problems in the case of the fixed length data elements fields.

2.2. A MARC 21 CASE.

If we have MARC 21 bibliographic records with 500, general note, and also MARC 21 authority records with 500, see also from tracing--personal name; how can we index them differently?

2.3. A UNIMARC CASE.

If we have UNIMARC bibliographic records with 200, title and statement of responsibility, and also UNIMARC authorities records with 200, heading--personal name; how can we index them differently?

3. XML META RECORDS EXAMPLES.

We have been considering using XML meta-records to overcome the problem of needing to index related records together. The records may have a structure like the following simplified possibility.

    <collection>
      <bibliographic_record>
        <related_authority_records>
        </related_authority_records>
        <related_holdings_records>
        </related_holdings_records>
      </bibliographic_record>
    </collection>

3.1. PATH ELEMENT DIFFERENCE.

How can we use a path element difference or even an attribute difference to have fields of different record types indexed differently?
    <collection id="1">
      <bibliographic_record>
        <record>
          <datafield tag="500" ind1=" " ind2=" ">

    <collection id="1">
      <bibliographic_record>
        <record>
          <related_authority_records>
            <record>
              <datafield tag="500" ind1="1" ind2=" ">

3.2. PATH MINOR ATTRIBUTE DIFFERENCE.

How can we use a minor attribute difference on a path element to have fields of different record types indexed differently?

    <record type="Bibliographic">
      <datafield tag="500" ind1=" " ind2=" ">

    <record type="Authority">
      <datafield tag="500" ind1="1" ind2=" ">

Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY 10003
USA
http://www.agogme.com
212-674-3783
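To make the attribute difference in 3.2 concrete, here is a minimal Python sketch of a preprocessing step that copies the type attribute from each record element down onto its own datafield elements, so that the distinguishing value sits on the element being indexed. The attribute name "recordtype", the function name and the sample data are assumptions made for illustration, and the sample omits the MARCXML namespace. Whether Zebra's XPath subset can then select on such an attribute is not confirmed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two record types sharing field number 500.
SAMPLE = """<collection id="1">
  <record type="Bibliographic">
    <datafield tag="500" ind1=" " ind2=" ">
      <subfield code="a">general note</subfield>
    </datafield>
  </record>
  <record type="Authority">
    <datafield tag="500" ind1="1" ind2=" ">
      <subfield code="a">see also from personal name</subfield>
    </datafield>
  </record>
</collection>"""


def copy_record_type_to_datafields(xml_text, attr_name="recordtype"):
    """Copy each record's type attribute onto its direct datafield children.

    attr_name is a non-standard attribute chosen for this sketch.
    """
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        record_type = record.get("type")
        if record_type is None:
            continue  # leave records without a type attribute untouched
        # Only direct children, so fields of any nested record are not
        # given the type of the enclosing record.
        for field in record.findall("datafield"):
            field.set(attr_name, record_type)
    return root


root = copy_record_type_to_datafields(SAMPLE)
print(ET.tostring(root, encoding="unicode"))
```

After this step, both 500 fields carry their own record type, at the cost of producing non-standard MARCXML that would need the extra attribute removed again before being passed to standard MARCXML tools.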
Re: Differently index same field numbers from different record types

In my hurry to obtain a quick answer, I assembled the original question with excess haste. The question was not well explained at the beginning of the message and I left out a possibility which might be an important part of an efficient solution. As there has been no answer yet, I here completely replace the original form of the question with this reorganised and corrected message. I also made a minor mistake with the attribution of a quote from a previous thread from the zebra list, which I correct here.

I am trying to obtain at least a quick partial answer with enough information to know how or whether to start designing a schema for an XML meta-record containing other related MARCXML records. If there is enough of an easy answer to point the meta-record schema work in a correct direction, I would like to have that answer as soon as possible. If an answer which will help with other aspects of the question takes a little longer to think about, then give an additional, more complete answer later.

1. INDEXING PROBLEM.

We need to be able to index fields with the same field number from different record types differently. How can different indexing for the same field number from different record types be accomplished without storing the different record types in separate databases? One example is the case of MARC 21 bibliographic records with 500, general note, and also MARC 21 authority records with 500, see also from tracing--personal name.

The records are liable to be in some XML form that may be easy to work with and helpful for indexing.

1.1. STANDARD MARCXML.
    <record type="Bibliographic">
      <leader>content</leader>
      <controlfield tag="some field">content</controlfield>
      <datafield tag="some field" ind1=" " ind2=" ">
        <subfield code="some subfield">content</subfield>
      </datafield>
      <datafield tag="some other field" ind1=" " ind2=" ">
        <subfield code="some subfield">content</subfield>
      </datafield>
      <datafield tag="500" ind1=" " ind2=" ">

    <record type="Authority">
      <leader>content</leader>
      <datafield tag="some field" ind1=" " ind2=" ">
        <subfield code="some subfield">content</subfield>
      </datafield>
      <datafield tag="500" ind1="1" ind2=" ">

1.2. VARIATIONS ON STANDARD MARCXML.

1.2.1. SUPPLEMENTARY ATTRIBUTES VARIATION FROM STANDARD MARCXML.

Adding additional attributes to standard elements provides a short predictable path between the field needing indexing and the point where record types are distinguished. Non-standard syntax and recordtype attributes are added for this possibility.

    <datafield tag="500" ind1=" " ind2=" " syntax="MARC21" recordtype="Bibliographic">

    <datafield tag="500" ind1="1" ind2=" " syntax="MARC21" recordtype="Authority">

1.2.2. CHANGED ELEMENT NAMES FROM STANDARD MARCXML.

Changing the names of standard elements provides a short predictable path between the field needing indexing and the point where record types are distinguished. However, this variation requires additional record transformation before and after using standard MARCXML tools, to change element names to and from standard names. Non-standard element names record the record syntax and record type for this possibility.

    <datafield_marc21_bib tag="500" ind1=" " ind2=" ">

    <datafield_marc21_auth tag="500" ind1="1" ind2=" ">

1.3. XML META-RECORDS.

The records may have a structure like the following simplified possibility.

    <collection>
      <bibliographic_record>
        <related_authority_records>
        </related_authority_records>
        <related_holdings_records>
        </related_holdings_records>
      </bibliographic_record>
    </collection>

1.3.1. STANDARD MARCXML INSIDE.
Using standard MARCXML in meta-records is much the same as indexing standard MARCXML alone where the type attribute in the record element is not part of the datafield element being indexed. The conclusion should be the same for this issue as it would be for indexing standard MARCXML without including it in a meta-record.

1.3.2. NON-STANDARD MARCXML INSIDE.

Using non-standard MARCXML in meta-records is much the same as indexing non-standard MARCXML alone where the means of determining record type would be part of the element being indexed. The conclusion should be the same for this issue as it would be for indexing non-standard MARCXML without including it in a meta-record.

2. PREVIOUS THREAD FOR SIMILAR ISSUE.

This issue revisits an indexing problem related to the problem which appeared in the thread "[Zebralist] how to index everything ?" Perhaps the answer to this issue would be similar to one of the answers given in that earlier thread.

Sebastian Hammer wrote:
> Hi Paul,
>
> I don't know if this helps, but if you add the line 'xpath enable' to
> your .abs file, Zebra will build additional index structures to enable
> searches like:
>
> Z> find @attr 1=/*/title someterm
>
> What is supported is a subset of the XPATH spec, but I *think* you can do:
>
> Z> find @attr 1=/*/datafield[@tag='245'] someterm
>
> In other words, XPATH-statements are used to select elements for
> searching, as an alternative to numerical USE attributes.
>
> Performance is not quite as good as for the regular indexes, so it's not
> something you want to do a lot in production on a 10M record database...
> but it's fine for smaller applications.

Unlike the issue presented in the earlier thread, this issue requires high performance.

Marc wrote:
> Hmm. You can speed things up by having a specialized tag index.
>
> something like
>
> xelem /record/datafield/@tag tag
>
> in your abs file.
>
> then you can query something like
>
> Z> find @and @attr 1=tag '245' @attr 1=/*/datafield/subfield[code='9']
> someterm
>
> to speed things up a bit.
>
>
> You could also define an index for each combination of tag/subfields,
> but that might be an administration nightmare.

Sebastian Hammer wrote:
> That wouldn't work out of the box. But this 'should work' (haven't tried
> it):
>
> Z> find @attr 1=/*/datafield[@tag='245']/subfield[@code='a'] someterm

Maybe we will need an administration nightmare to have the system function as needed.

3. DISTINGUISHING MARC RECORD TYPES.

MARC record type can be distinguished by the value of 000/06, but I am uncertain that will help properly in all circumstances where we do actually want to search across multiple record types as part of a meta-record when the records are related. Furthermore, there are multiple values for 000/06 for the same major record type.

4. MARC CONFLICT EXAMPLES.

I have not inspected well to consider all the cases risking false results if the record types are not distinguished well.

4.1. FIXED LENGTH FIELD CASE.

I have always believed that the basic fixed length data elements fields need local use field analogues with appropriate values to ease searching, because record type and even bibliographic level within a record type change the meaning of fixed length data elements. MARC 21 008 and UNIMARC 100 have this variance problem. Supplementary local use fields might be a reasonable choice for solving other problems in the case of the fixed length data elements fields.

4.2. A MARC 21 CASE.

In the problem statement, I gave a case of conflicting field use between MARC 21 bibliographic records with 500, general note, and MARC 21 authority records with 500, see also from tracing--personal name.

4.3. A UNIMARC CASE.
One UNIMARC case for conflicting field use would be UNIMARC bibliographic records with 200, title and statement of responsibility, and also UNIMARC authorities records with 200, heading--personal name.

5. ISO 2709 PROBLEM RESTATEMENT.

The immediate need is to know what direction to go in designing an XML meta-record. However, I would still be interested in knowing how to index the same field number differently based on the value in 000/06 or an abstracted value in a local use field. An abstracted local use value might be in a local use field such as 01k as follows.

    000 content
    001 content
    01k ## $a MARC21 $b Bibliographic
    100 ## $a content
    245 ## $a content $c content
    500 ## $a general note

    000 content
    001 content
    01k ## $a MARC21 $b Authority
    100 ## $a content
    500 ## $a see also from personal name

Thomas Dukleth
Agogme
109 E 9th Street, 3D
New York, NY 10003
USA
http://www.agogme.com
212-674-3783
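As a rough illustration of the abstracted value described in section 5, the following Python sketch collapses the several MARC 21 000/06 codes that belong to one major record type into a single label of the kind shown in 01k $b. The function and table names are hypothetical, the labels other than "Bibliographic" and "Authority" are chosen for this sketch, and the code table covers MARC 21 only; UNIMARC would need its own table.

```python
# MARC 21 Leader/06 (type of record) codes grouped by major record type.
# Several codes map to the same major type, which is why an abstracted
# value is easier to index than the raw leader byte.
LEADER_06_GROUPS = {
    "Bibliographic": "acdefgijkmoprt",
    "Authority": "z",
    "Holdings": "uvxy",
    "Classification": "w",
    "Community": "q",
}

# Invert the grouping into a code -> major type lookup table.
CODE_TO_TYPE = {
    code: major_type
    for major_type, codes in LEADER_06_GROUPS.items()
    for code in codes
}


def major_record_type(leader):
    """Return the major record type for a MARC 21 leader string.

    Position 06 is the seventh character (index 6). Unrecognised codes
    yield "Unknown" rather than raising, so odd records can be reviewed.
    """
    if len(leader) < 7:
        raise ValueError("leader is too short to contain position 06")
    return CODE_TO_TYPE.get(leader[6], "Unknown")


# Hypothetical leaders for a bibliographic and an authority record.
print(major_record_type("00000nam a2200000 a 4500"))
print(major_record_type("00000nz  a2200000n  4500"))
```

The returned label could be written into a local use field such as 01k $b before indexing, so that the index definition depends on one value per major record type and not on every individual 000/06 code.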