You might want to ask in the technical forum. Hopefully someone can
point you that way, or answer your question here.
Carcharoth
On Sat, Jun 27, 2009 at 10:24 PM, akhil1988 <akhilanger@...> wrote:
>
> Hi All!
>
> I'm new to this forum.
>
> I am looking for some references to help me use the Wikipedia XML dump.
>
> Here's what I have to do with the XML dump:
>
> I will set up a server on which people can browse Wikipedia articles and
> also a processed version of each article. By "processed version" I mean
> a Wikipedia article with some additional information attached to each
> line, e.g.
>
> A line in a Wikipedia article (http://en.wikipedia.org/wiki/Chicago)
> reads:
>
> Chicago (pronounced /ʃɨˈkɑːɡoʊ/ or /ʃɨˈkɔːɡoʊ/) is the largest city in the
> U.S. state of Illinois, and with over 2.8 million people is the third
> largest city in the country.
>
> My processed version of the Wikipedia page would look like this:
>
> Chicago (pronounced /ʃɨˈkɑːɡoʊ/ or /ʃɨˈkɔːɡoʊ/) is the largest city in the
> U.S. state of Illinois, and with over 2.8 million people is the third
> largest city in the country. <Some additional information about this line>
>
> Don't worry about "Some additional information about this line". It comes
> from some NLP (natural language processing) code that processes the line
> and generates additional information about it.
>
> So, if somebody wants to access the processed version of any Wikipedia
> article, they can go to: http://myserver/wiki/processed_Chicago
>
> I hope it is clear what I intend to do with the Wikipedia XML dump.
>
> For this I need to know the following things:
>
> 1. How should I extract articles from the XML dump, extract plain text
> from them for processing, and then insert the processed text back line by
> line at the same place in the article, along with the additional
> information generated by the NLP code? Throughout this process, I want the
> page to keep the same look as the original Wikipedia version.
>
> 2. How can I render a Wikipedia page from the XML dump so that it looks
> like the online version of Wikipedia?
>
> 3. The XML dump does not contain images, so how will I render images when
> a page on my server is accessed?
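
For point 1, one possible starting point is to stream the dump instead of loading it whole, and to append the extra information line by line. The following is only a rough sketch under several assumptions: it uses Python's standard xml.etree.ElementTree, it assumes a pages-articles style dump with one revision per page, and the names iter_pages, annotate and SAMPLE are made up for illustration. The lambda stands in for the NLP step, and the sample namespace is illustrative only.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export stream.

    Assumes one <revision> per page; with several, the last one seen wins.
    """
    title, text = None, ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            elem.clear()  # release the finished page's subtree
            title, text = None, ""


def annotate(wikitext, annotate_line):
    """Append annotate_line(line) to every non-blank line, keeping line order."""
    out = []
    for line in wikitext.splitlines():
        out.append(f"{line} {annotate_line(line)}" if line.strip() else line)
    return "\n".join(out)


# Tiny hand-written stand-in for a dump (hypothetical content).
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Chicago</title>
    <revision>
      <text>Chicago is the largest city in Illinois.

It is the third largest city in the country.</text>
    </revision>
  </page>
</mediawiki>"""

pages = list(iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))))
title, text = pages[0]
# Placeholder for the NLP step: here it just reports a word count.
processed = annotate(text, lambda line: f"<{len(line.split())} words>")
print(title)
print(processed)
```

For a real dump, a file object from bz2.open(path, "rb") could be passed to iter_pages in place of the in-memory sample. Note that the text in the dump is raw wikitext (templates, links, tables), so getting true plain text, and rendering the result as in point 2, would still need a wikitext parser or a local MediaWiki install.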
>
> Any references or ideas in this regard will be greatly appreciated.
>
> Thanks,
> Akhil
> --
> View this message in context: http://www.nabble.com/Using-english-Wikipedia-XML-dump-tp24236727p24236727.html
> Sent from the English Wikipedia mailing list archive at Nabble.com.
>
>
> _______________________________________________
> WikiEN-l mailing list
> WikiEN-l@...
> To unsubscribe from this mailing list, visit:
> https://lists.wikimedia.org/mailman/listinfo/wikien-l