[jira] Created: (NUTCH-677) Segment merge filtering based on segment content
-----------------------------------------------
Key: NUTCH-677
URL: https://issues.apache.org/jira/browse/NUTCH-677
Project: Nutch
Issue Type: Improvement
Affects Versions: 0.9.0
Reporter: Marcin Okraszewski
Fix For: 0.9.0

I needed segment filtering based on metadata detected during the parse phase. Unfortunately, the current URL-based filtering does not allow for this, so I have created a new SegmentMergeFilter extension, which receives the segment entry being merged and decides whether it should be included. Even though I needed only ParseData for my purpose, I made it more general, so the filter receives all of the merged data.

The attached patch is for version 0.9, which I use. Unfortunately, I didn't have time to check how it fits the trunk version. Sorry :(

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
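The idea in the issue description, a per-entry filter consulted during the merge that can look at parse results rather than only at the URL, can be sketched in Java. This is not the code from MergeFilter.patch or SegmentMergeFilter.java: the types `MergedEntry`, `MergeEntryFilter` and `MetaValueFilter`, the `accept` method and the `lang` metadata key are hypothetical, and only parse metadata is modelled here instead of the full set of merged data (ParseData and the other segment parts).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one segment entry being merged. Only the URL and
// the metadata recorded during parsing are modelled; real Nutch entries carry more.
final class MergedEntry {
  private final String url;
  private final Map<String, String> parseMeta;

  MergedEntry(String url, Map<String, String> parseMeta) {
    this.url = url;
    // Defensive, read-only copy so filters cannot modify the entry.
    this.parseMeta = Collections.unmodifiableMap(new HashMap<>(parseMeta));
  }

  String getUrl() { return url; }

  // Returns null when the parse phase did not record the key.
  String getParseMeta(String key) { return parseMeta.get(key); }
}

// Hypothetical filter contract: return true to keep the entry in the merged segment.
interface MergeEntryFilter {
  boolean accept(MergedEntry entry);
}

// Example filter that drops entries whose parse metadata has a given value,
// which a purely URL-based filter cannot see.
final class MetaValueFilter implements MergeEntryFilter {
  private final String key;
  private final String rejectedValue;

  MetaValueFilter(String key, String rejectedValue) {
    this.key = key;
    this.rejectedValue = rejectedValue;
  }

  @Override
  public boolean accept(MergedEntry entry) {
    // Entries without the key are kept; only an exact value match is rejected.
    return !rejectedValue.equals(entry.getParseMeta(key));
  }
}
```

Under these assumptions, `new MetaValueFilter("lang", "de").accept(entry)` returns `false` for an entry whose parse metadata contains `lang=de` and `true` for every other entry, including entries with no `lang` value at all.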
[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
-------------------------------------
Attachment: MergeFilter.patch

The patch for 0.9.
[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
-------------------------------------
Attachment: SegmentMergeFilter.java

The filter interface (referenced by the patch).
[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
-------------------------------------
Attachment: SegmentMergeFilters.java

Merge filter aggregation, which hides the extension point, etc. It is referenced by the patch.
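An aggregation class like this can be pictured as a thin wrapper that holds all configured filters and exposes a single call, so the merge code never deals with the plugin extension point directly. The sketch below is not the attached SegmentMergeFilters.java: `MergeFilterChain` and its `filter` method are hypothetical names, `java.util.function.Predicate` stands in for the filter interface, loading the filters from Nutch plugins is omitted, and it assumes an entry is kept only when every filter accepts it (the thread does not state the actual combination rule).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical aggregator: callers make one filter() call and never see the
// individual filters. Assumed rule: keep an entry only if all filters accept it.
final class MergeFilterChain<T> {
  private final List<Predicate<T>> filters;

  MergeFilterChain(List<Predicate<T>> filters) {
    // Read-only copy; a Nutch version would obtain this list from the plugin system.
    this.filters = Collections.unmodifiableList(new ArrayList<>(filters));
  }

  boolean filter(T entry) {
    for (Predicate<T> f : filters) {
      if (!f.test(entry)) {
        return false; // first rejection wins; remaining filters are skipped
      }
    }
    return true; // an empty chain keeps every entry
  }
}
```

With this rule, an empty chain keeps everything, and evaluation stops at the first rejecting filter, so registering inexpensive filters first avoids running costlier ones on entries that are already rejected.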
[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doğacan Güney updated NUTCH-677:
--------------------------------
Fix Version/s: 1.1 (was: 0.9.0)

Moving this issue to 1.1.
[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
-------------------------------------
Attachment: MergeFilter_for_1.0.patch

The patch ported to Nutch 1.0. The Java files remain unchanged; only the patch has changed.
[jira] Commented: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12714091#action_12714091 ]

Otis Gospodnetic commented on NUTCH-677:
----------------------------------------

Marcin, could you please include the Apache license header at the top of the code, as other Nutch classes do?
[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
-------------------------------------
Attachment: SegmentMergeFilter.java

Added the Apache License header.
[jira] Updated: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcin Okraszewski updated NUTCH-677:
-------------------------------------
Attachment: SegmentMergeFilters.java

Added the Apache license header.
[jira] Commented: (NUTCH-677) Segment merge filtering based on segment content
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763681#action_12763681 ]

Marcin Okraszewski commented on NUTCH-677:
------------------------------------------

Sorry, I didn't notice the request for the license header. I've just uploaded the files with the header.