web to semantic web : an automated approach


web to semantic web : an automated approach

by रविंदर ठाकुर (ravinder thakur)


Hello friends,

I have been following the semantic web for some time now and have seen quite
a lot of projects (DBpedia, FOAF, etc.) trying to generate semantic
content. While these projects may have succeeded in their own goals, one
major problem plaguing the semantic web as a whole is the lack of semantic
content. Unfortunately, there is nothing in sight that we can rely on to
generate semantic content for the truckloads of information being put on
the web every day. I think one of the _wrong_ assumptions in the semantic
web community is that content creators will produce semantic data
themselves. That is too much to ask even of the more technically sound
part of the web community, let alone the whole of it. It hasn't happened
in all these years, and I don't see it happening in the near future.

I think what we need to move the semantic web forward is a mechanism to
_automatically_ convert the information on the web to semantic
information. There are many tools and services that can be used for this
purpose, and I am currently developing a prototype. It uses the
OpenCalais service (http://www.opencalais.com/) to convert ordinary text
to semantic form. The service is very limited in the entities it
supports at the moment, but it's a very good start. I am pretty sure
there are many other good options that I don't know about. The prototype,
currently very primitive, can be seen at http://arcse.appspot.com. It
implements very few of the ideas I have so far. It is hosted on Google
App Engine, which sometimes gives timeout messages internally, so please
bear with it :).
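
To make "convert ordinary text to semantic form" a little more concrete, here is a minimal sketch of the last step only: turning already-extracted entities into RDF triples in N-Triples syntax. It does not call OpenCalais or reproduce the prototype; the `(name, type)` input format, the `example.org` namespaces and the function names are all assumptions made for illustration.

```python
from urllib.parse import quote

# Hypothetical namespaces, chosen only for this illustration.
BASE = "http://example.org/entity/"
TYPE_NS = "http://example.org/type/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape the characters that N-Triples requires escaping in literals."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def entities_to_ntriples(entities):
    """Turn (name, type) pairs from some entity extractor into N-Triples lines.

    Each entity yields two triples: one stating its type and one giving
    its human-readable label.
    """
    lines = []
    for name, kind in entities:
        subject = "<" + BASE + quote(name.replace(" ", "_")) + ">"
        lines.append(f"{subject} <{RDF_TYPE}> <{TYPE_NS}{quote(kind)}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{escape_literal(name)}" .')
    return lines


# Assumed extractor output for a sentence mentioning a person and a city.
for line in entities_to_ntriples([("Tim Berners-Lee", "Person"), ("Geneva", "City")]):
    print(line)
```

The hard part, of course, is everything before this step: deciding which entities and relations the text actually contains.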

This automatic conversion is not a simple task, however, and needs work in
a lot of domains, ranging from NLP to artificial intelligence to the
semantic web to logic. That is why I am writing this mail. I would be more
than happy if we could form a like-minded team to work on solving this
most important problem currently plaguing the semantic web.

Waiting for your suggestions/criticisms
Ravinder Thakur




Re: web to semantic web : an automated approach

by रविंदर ठाकुर (ravinder thakur)

any thoughts on this...





Re: web to semantic web : an automated approach

by Kjetil Kjernsmo-4


On Sunday 19 October 2008 21:08:10 ravinder thakur wrote:
> I think one of the _wrong_
> assumptions in the semantic web community is that content creators will
> produce semantic data themselves. That is too much to ask even of the
> more technically sound part of the web community, let alone the whole of
> it. It hasn't happened in all these years, and I don't
> see it happening in the near future.

This is indeed an essential point in the development of the Semantic Web. I'm
mostly in the "it'll happen" camp with regard to people creating semantic
content. There are two main sources. One is that, reportedly, 70% of the
data on the web is already in some structured form, so what's needed is to
clarify what that structure means.

The other is what is often called "Mass Intelligence". Geonames.org is
an excellent example of an application that lets people produce meaningful
data, and I think that could be extended to many more domains.

I personally would prefer Mass Intelligence, i.e. the real intelligence of
many people, over Artificial Intelligence (when it tries to do things humans
are better at), as I fear that the Semantic Web would inherit the perceived
and/or real weaknesses of AI if AI were dominant.

That's not to say that an automated approach isn't useful, I think it is, but
I see it as something that should be applied with care where it makes sense.
I also have colleagues who would agree with you that it is the most
interesting thing to do right now.

> I am
> pretty sure there will be many other good options available that might
> be unknown to me.

I think Cypher, which had a 1.9 release announced here a couple of days ago,
would be of great interest to you: http://www.monrai.com/products/cypher/

Also, I think IBM's SUKI http://www.research.ibm.com/UIMA/SUKI/ might be of
interest.

Kind regards

Kjetil Kjernsmo
--
Senior Knowledge Engineer
Mobile: +47 986 48 234
Email: kjetil.kjernsmo@...  
Web: http://www.computas.com/

|  SHARE YOUR KNOWLEDGE  |

Computas AS  PO Box 482, N-1327 Lysaker | Phone:+47 6783 1000 | Fax:+47 6783
1001



Re: web to semantic web : an automated approach

by Kannan Rajkumar

Hi Mr. Ravinder,

It is a nice idea: why can't we transform web content into semantic web content?

This is a necessity, and it would avoid having to recreate web content as semantic web data.

I am also working in this direction.
 
With regards,
 
Dr. Rajkumar Kannan
Associate Professor
Dept. of Computer Science
Bishop Heber College, Tiruchirappalli, TN, India
 
 




Re: web to semantic web : an automated approach

by Andreas Langegger

Hi,

it's all happening, but it's not as easy as one might think at first.

Basically, there are multiple sources of structured/interlinked information (A) and multiple ways to expose linked data on the Web (B).

(A)
1. generated (wrapped) from information systems (RDBMS, etc.) => needs mapping
2. user-generated (natively RDF-based systems, Semantic Wikis, etc.) => already in the right form
3. extracted (AI, heuristics, Cypher, etc.; different levels of granularity; difficult, sometimes wrong)

(B)
1. RDF documents
2. SPARQL endpoints
3. embedded into HTML (RDFa)

The Linked Data community project plays an important role regarding A1 and A2. A3 is cumbersome and may produce wrong links and information, which is a nightmare without implicit support for provenance. In corporate environments A3 is already very popular, but at Web scale I'm a bit sceptical that it will work well. What do you think?
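
To illustrate why provenance matters for A3, here is a small sketch that keeps the source of every extracted statement and serializes it as an N-Quad, using the source document as the graph name. This is only one possible way to carry provenance, not something any of the projects mentioned here is known to do; the `Extracted` record, the function names and the `example.org` URIs are assumptions.

```python
from collections import namedtuple

# One extracted statement plus the document it was extracted from.
# All four fields are assumed to be absolute URIs in this sketch.
Extracted = namedtuple("Extracted", "subject predicate obj source")


def to_nquads(statements):
    """Serialize statements as N-Quads, with the source as the graph name."""
    return [
        f"<{s.subject}> <{s.predicate}> <{s.obj}> <{s.source}> ."
        for s in statements
    ]


def drop_source(statements, bad_source):
    """Retract everything that came from a source found to be unreliable."""
    return [s for s in statements if s.source != bad_source]


statements = [
    Extracted("http://example.org/Linz", "http://example.org/locatedIn",
              "http://example.org/Austria", "http://example.org/pages/1"),
    Extracted("http://example.org/Linz", "http://example.org/locatedIn",
              "http://example.org/Germany", "http://example.org/pages/2"),
]

# If pages/2 turns out to be wrongly parsed, its statements can be removed
# without touching the rest of the graph.
for quad in to_nquads(drop_source(statements, "http://example.org/pages/2")):
    print(quad)
```

Without the fourth element there would be no way to tell which of the two conflicting statements came from where.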

Regards,
AndyL









Web of Data Practitioners Days / Oct 22-23 / Vienna
http://www.webofdata.info
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
Institute for Applied Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69





Re: web to semantic web : an automated approach

by Stephane Corlosquet

Hi all,

Popular content management systems have a great role to play in democratizing the semantic web. Some CMSs, like Drupal, have already understood this and are rapidly moving towards exposing their content as RDF data. Because so many people use them, there is great potential in implementing built-in semantic web features, and that's what's happening with Drupal 7, which will ship with RDFa support in core. Drupal will then be part of category A2, with more than 30,000 RDFa-enabled sites!

--
Stéphane,
scor @ drupal.org
http://drupal.org/user/52142







Re: web to semantic web : an automated approach

by रविंदर ठाकुर (ravinder thakur)

> This is indeed an essential point in the development of the Semantic Web. I'm
> mostly in the "it'll happen" camp with regards to people creating semantic
> content. There are two main sources, one is that they say that 70% of the
> data on the web is already in some structured form, thus what's needed is to
> clarify what that structure means.

I have been in the "it will happen" camp, but nothing far-reaching seems to be happening, so I am out. I would say that most of the data out there (90%) is unstructured. Also, most of the structured data is specific to companies, and they won't share it. People are writing blogs and Wikipedia articles, news websites are producing content continuously, people are reviewing products and putting their opinions online; the list of unstructured data is endless and will continue to grow with increasing Internet penetration in third-world countries. To assume that all users will manually convert this data to structured form seems too far-fetched. To assume that the information put online by these end users is of less use than, say, Wikipedia/DBpedia would be a horrible mistake. Even if we have a large amount of data, someone needs to combine this vast amount of RDF/OWL data and create a global graph interlinking all of it. (BTW, I see some serious ontology issues that anyone is likely to hit with this approach.)

> Also, I think IBM's SUKI http://www.research.ibm.com/UIMA/SUKI/ might be of
> interest.

I have used UIMA, but it's not a one-man job. It's just a framework, and there is a hell of a lot still to be done on top of it, e.g. writing domain-specific components.



> A3 is cumbersome and may produce wrong links and information - a nightmare without implicit support for provenance. In corporate environments A3 is already very popular, but in the broader Web-scale I'm a bit sceptical this will work well. What do you think?

I am pinning a lot of hope on the progress we have made in NLP, and no doubt NLP will continue to improve in the near future. For now, to alleviate the wrong linking/information problem, I think redundancy of information will play an important role. If we have 10 sources for the same piece of information and 6 NLP parsers give one view while the other 4 give another, I am pretty sure the one the 6 agree on will be the right one. Also, we don't have to be 100% right (certainly not in the beginning), since (other than your boss :) ) nobody is 100% right :)
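
The 6-versus-4 idea can be sketched as a simple majority vote over whatever the parsers return. This is a toy version under stated assumptions: each parser result is reduced to one hashable value, the function name and the `min_share` threshold are invented for the example, and agreement is treated as evidence, not proof (sources that copy from one another would agree without being right).

```python
from collections import Counter


def majority_vote(candidates, min_share=0.5):
    """Return the value most parsers agree on, or None if there is no clear winner.

    candidates: one extracted value per (source, parser) run.
    min_share:  the winner must be backed by strictly more than this
                fraction of all candidates.
    """
    if not candidates:
        return None
    value, count = Counter(candidates).most_common(1)[0]
    if count / len(candidates) > min_share:
        return value
    return None


# Six runs say one thing, four say another, as in the example above.
views = ["born in London"] * 6 + ["born in Londonderry"] * 4
print(majority_vote(views))
```

A 5-to-5 split returns `None` here, which leaves the statement undecided instead of picking a side arbitrarily.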


> Some CMS like Drupal have already understood this and are rapidly moving towards exposing their content as RDF data

Here's the problem. Drupal is exposing _the data stored in Drupal_. Do we expect everyone on the web to use Drupal? No. What happens to the information on times.com, blogspot.com, googlegroups.com or kashmirtimes.com? The semantic web is not about converting someone's data and exposing it with a semantic view. It's about the _whole_ of the data out there on the web, then building a web of semantic links on top of that, and then doing reasoning on top of that, etc.


Thanks for initiating the discussion anyway. Keep it coming :)

Re: web to semantic web : an automated approach

by Paola Di Maio-2

Stefan

I have a content management background, and I am a Drupal user.
I have been looking forward to the advances that you mention below; however,
my problem is modelling the triples.
I am sure that a tool can automatically extract/infer triples from
content, but I am not sure these would be meaningful/representative.

Assume the functionality to expose the data as RDF is available (of
course I would have to upgrade to Drupal 7, and all the custom modules
and functionality written for Drupal 6 would have to be rewritten,
but that's admittedly another problem). What kind of knowledge
schema/ontology would it adhere to? Would the system automatically
infer what is the subject and what is the predicate, and would I (website
mom) be able to override what the system suggests? I haven't quite
worked out how the system would work.
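
To pin down what "override what the system suggests" could mean, here is a rough sketch, not a description of how Drupal 7 actually does it: the system ships a default mapping from content fields to predicates, and the site owner supplies overrides that win. The field names, the `node_to_triples` function and the node URI are hypothetical; the Dublin Core and FOAF predicate URIs are real vocabulary terms.

```python
# Default suggestions the system might make for common fields (assumed).
DEFAULT_MAPPING = {
    "title": "http://purl.org/dc/terms/title",
    "author": "http://purl.org/dc/terms/creator",
    "created": "http://purl.org/dc/terms/created",
}


def node_to_triples(node_uri, fields, overrides=None):
    """Map a content node's fields to (subject, predicate, object) triples.

    The page itself is the subject; each mapped field supplies one predicate.
    Entries in `overrides` replace or extend the default mapping.
    """
    mapping = {**DEFAULT_MAPPING, **(overrides or {})}
    triples = []
    for field, value in fields.items():
        predicate = mapping.get(field)
        if predicate is None:
            continue  # unmapped fields are skipped rather than guessed
        triples.append((node_uri, predicate, value))
    return triples


node = {"title": "Hello", "author": "paola", "mood": "curious"}
# The site owner prefers foaf:maker over dcterms:creator for authors.
my_overrides = {"author": "http://xmlns.com/foaf/0.1/maker"}
for triple in node_to_triples("http://example.org/node/1", node, my_overrides):
    print(triple)
```

In this picture the subject is always the page or node, and the schema question reduces to which vocabulary each field is mapped to.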

clues welcome

Paola Di Maio






--
Paola Di Maio
School of IT
www.mfu.ac.th
*********************************************

Re: web to semantic web : an automated approach

by Paola Di Maio-2


> > Some CMS like Drupal have already understood this and are rapidly moving
> > towards exposing their content as RDF data
>
> Here's the problem. Drupal is exposing _the data stored in Drupal_. Do we
> expect everyone on the web to use Drupal? No. What happens to information on
> times.com, blogspot.com, googlegroups.com or kashmirtimes.com? Semantic web
> is not about converting someone's data and exposing it with semantic view.
> It's about the _whole_ data out there on web and then building a web of
> semantic links on top of that and then doing reasoning on top of that etc.

Modular components can be reused across platforms and, if necessary, rewritten.
I remember when first RSS, and then Atom, became available for exporting feeds:
first one, then two, then all the platforms supported this small but
vital feature.

I figure RDF could be another one of those, but the question I asked
about modelling the triples is still something I have not come to
terms with. I still can't figure it out, but I think it's somewhere along
these lines.


pdm


Re: web to semantic web : an automated approach

by Stephane Corlosquet

> Here's the problem. Drupal are exposing _the data stored in Drupal. Do we expect everyone on web to use Drupal ? No.

I gave Drupal as an example. It's up to the other systems to follow through, perhaps learning from the systems that have already implemented it.

> What happens to information on times.com, blogspot.com, googlegroups.com or kashmirtimes.com ?

You'd be surprised by the sites already using Drupal; to name a few: observer.com, fastcompany.com, amnesty.org, universalmusic.com, theonion.com. These already represent many web pages ready to make the switch.

> Semantic web is not about converting someone's data and exposing it with semantic view. Its about the _whole_ data out there on web and then building a web of semantic links on top of that and then doing reasoning on top of that etc.

This cannot happen in one day. This cannot happen via one project. It's a global effort that will take a bit of time.

Stéphane.


On Mon, Oct 20, 2008 at 11:12 AM, रविंदर ठाकुर (ravinder thakur) <ravinderthakur@...> wrote:
>>>>This is indeed an essential point in the development of the Semantic Web. I'm
>>>>mostly in the "it'll happen" camp with regards to people creating semantic
>>>> content. There are two main sources, one is that they say that 70% of the
>>>> data on the web is allready in some structured form, thus what's needed is to
>>>> clarify what that structure means.

I have been in "it will happen" camp but nothing far reaching seems to be happening so i am out. I would say that most of the data (90%) of data out there is unstructured. Also most of the strucutred data is specific to companies and they wont share it. There are people writing blogs, wikipedia, news websites producing content continuisley, people reviewing the products, putting their opinions online, the list of unstructured data is endless and will continue to grow with increasing Internet peneratration in 3rd world conturies. To assume that all users will manually convert this data to sturcutred seems too far fetched. To assume that the information being put by these end users is of little uses than say wikipedia/dbpedia would be a horrible mistake. Even if we have large data, someone needs to club this vast amount of rdf/owl data and create a global graph interlinking all of that.(BTW i see some serious ontology issues anyone will likely to hit in this approach)


>>>>Also, I think IBM's SUKI http://www.research.ibm.com/UIMA/SUKI/ might be of
>>>>interest.

I have used UIMA, but it's not a one-man-army job. It's just a framework, and there is a hell of a lot still to be done on top of it, e.g. writing domain-specific components.




>>>>A3 is cumbersome and may produce wrong links and information - a nightmare without implicit support for provenance. In corporate environments A3 is already very popular, but in the broader Web-scale I'm a bit sceptical this will work well. What do you think?

I am counting a lot on the progress we have made in NLP, and no doubt NLP will continue to improve in the near future. For now, to alleviate the wrong linking/information problem, I think redundancy of information will play an important role. If we have 10 sources for the same piece of information, and 6 NLP parses give one view and the remaining 4 give another, I am pretty sure the view the 6 agree on will be the right one. Also, we don't have to be 100% right (certainly not in the beginning), since (other than your boss :) ) nobody is 100% right :)
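
The redundancy idea above can be sketched as a simple majority vote. This is only a minimal illustration: the function name `majority_view` and the example values are made up, and agreement among sources is treated as a proxy for correctness, which fails when sources copy from each other.

```python
from collections import Counter

def majority_view(extractions):
    """Return (value, share) for the most common extracted value.

    `extractions` holds one extracted value per source.
    Ties are broken by first appearance, which is arbitrary.
    """
    if not extractions:
        raise ValueError("need at least one extraction")
    counts = Counter(extractions)
    value, votes = counts.most_common(1)[0]
    return value, votes / len(extractions)

# Ten hypothetical extractions of the same fact: six agree, four disagree.
votes = ["Paris"] * 6 + ["Lyon"] * 4
winner, share = majority_view(votes)
print(winner, share)  # Paris 0.6
```

The returned share could be kept alongside the fact as a rough confidence score, so that a 6-to-4 split is not stored with the same weight as a 10-to-0 one.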


>>>>Some CMS like Drupal have already understood this and are rapidly moving towards exposing their content as RDF data

Here's the problem. Drupal is exposing _the data stored in Drupal_. Do we expect everyone on the web to use Drupal? No. What happens to the information on times.com, blogspot.com, googlegroups.com or kashmirtimes.com? The semantic web is not about converting someone's data and exposing it with a semantic view. It's about the _whole_ data out there on the web, then building a web of semantic links on top of that, and then doing reasoning on top of that, etc.


Thanks for initiating the discussion anyways. Keep it coming :)


Re: web to semantic web : an automated approach

by Andreas Langegger

I've always been a member of the pragmatics camp. Scepticism helps indeed, but it doesn't help us move forward.

I think the idea of a global "Semantic Web" was, and still is, tempting, and many bloggers, columnists, and many smart and visionary people like to talk about web-scale reasoning. Some even said the SW will replace the traditional Web, or the Web 2.0... This is so stupid. The most important thing to me is the SW standards, the layer cake: the possibility to share and interlink information where it's appropriate. It's just about an open standard for data.

For the last 10 years everybody has been talking about open protocols and Web services. But what's actually communicated between endpoints is data. XML/XML Schema won't be replaced either. But if you want to interlink data on the Web, XML alone is not feasible. This is where SW standards rule.

Because SW research is an open and democratic process, there are many different viewpoints and interpretations of what it actually is. Many have stopped using ontologies and reasoners altogether; they just use RDF and maybe RDF-S, even Ora Lassila - co-author of the original Scientific American article in 2001 [1] - as far as I know. Apart from subsumption based on class hierarchies, reasoning probably does not work for everyday information, but it works very, very well for many applications, mainly from the life sciences. This is what Web 2.0 people and all those sceptical about reasoning usually don't see! They see blogs, FOAF, vCards, etc. Here the possibility for RDF-only interlinking is great, and I'm sure it will be successful, and I'm sure that other CMSs besides Drupal have introduced or will soon introduce RDF features!

Nobody will ever demand that 100% of all information on the Web be RDFized... think pragmatically!

Regards,
AndyL

[1] http://www.sciam.com/article.cfm?id=the-semantic-web




Web of Data Practitioners Days / Oct 22-23 / Vienna
http://www.webofdata.info
----------------------------------------------------------------------
Dipl.-Ing.(FH) Andreas Langegger
Institute for Applied Knowledge Processing
Johannes Kepler University Linz
A-4040 Linz, Altenberger Straße 69





Re: web to semantic web : an automated approach

by रविंदर ठाकुर (ravinder thakur)

There will always be people who are overly optimistic or overly pessimistic about anything in the world, so I won't take the case of the SW as an exception. On the other hand, we shouldn't dismiss an option just because nobody seems to be asking for it. When Faraday found a way to generate electricity, nobody was using or asking for electricity, yet it turned out to be a useful invention. I don't expect _everyone_ to use RDF, but it can be used to solve many problems that no other technology can. This is especially true since more and more data is coming to the web and we need a better way to analyse/search/present that data to the end users looking for it.

But coming to the main point, I don't see why the semantic web as envisioned by its main proponents shouldn't work. I am not saying that the first attempt at it will be the last one, but an honest attempt is much better than some untested opinions :).

What we need is
a) an NLP system (similar to the one at www.opencalais.com) that converts the data on the web to its semantic form (RDF/OWL etc.) for a much broader set of concepts.
b) a store for the data from a)
c) a reasoner for the data stored in b)


As I see it, a) is the hardest part and, ignoring performance/scalability issues, b) and c) already exist. It's the lack of a) that is keeping them from achieving anything great with the semantic web.
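
To make the three parts a), b) and c) concrete, here is a toy sketch of how they could fit together. Everything in it is an assumption for illustration: `extract_triples` is a regex stand-in for a real NLP service, `TripleStore` is a plain in-memory set rather than a real RDF store, and `infer_types` implements only one rule (type propagation along `subClassOf`), not a full reasoner.

```python
import re

def extract_triples(text):
    """Stand-in for step a): turn sentences like 'X is a Y.' into triples.

    A real NLP system would do far more; this regex is only a placeholder.
    """
    triples = set()
    for subj, cls in re.findall(r"(\w+) is an? (\w+)", text):
        triples.add((subj, "type", cls))
    return triples

class TripleStore:
    """Step b): an in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add_all(self, triples):
        self.triples |= set(triples)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        return {
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }

def infer_types(store):
    """Step c): propagate 'type' along 'subClassOf' until nothing new appears."""
    changed = True
    while changed:
        changed = False
        for subj, _, cls in list(store.match(p="type")):
            for _, _, parent in store.match(s=cls, p="subClassOf"):
                new = (subj, "type", parent)
                if new not in store.triples:
                    store.triples.add(new)
                    changed = True

store = TripleStore()
store.add_all(extract_triples("Faraday is a physicist."))
# A tiny hand-written class hierarchy, standing in for an ontology.
store.add_all({("physicist", "subClassOf", "scientist"),
               ("scientist", "subClassOf", "person")})
infer_types(store)
print(sorted(o for _, _, o in store.match(s="Faraday", p="type")))
# ['person', 'physicist', 'scientist']
```

Even in this toy form, the store and the inference rule are a few lines each, while the extraction step is the part that would need real NLP to handle anything beyond one sentence pattern.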


Re: web to semantic web : an automated approach

by Ed Summers


On Sun, Oct 19, 2008 at 3:08 PM, ravinder thakur
<ravinderthakur@...> wrote:

> I have been following semantic web for some time now and have seen quite a
> lot of projects being run (dbpedia, FOAF etc) trying to generate some
> semantic content. While these approaches might have been successful in their
> goals, one major problem plaguing semantic web as a whole is the lack of
> semantic content. Unfortunately there is nothing in sight that we can rely
> on to generate semantic content for the truckloads of information being put
> on web everyday. I think one of the _wrong_ assumption in semantic web
> community is that content creators will be creating a semantic data which I
> think is too much for the asking from even more technically sound part of
> web community let along whole of the web community. It hasn't happened over
> last so many years and I don't see it happening in the near future.

Thanks for the thought provoking question Ravinder. I wrote up a
longish response about how we basically need to convince people with
rich domain models (RDBMS, etc) to make their data available on the
web in a particular way. But then I re-read your email and realized
you already knew this :-)

My perspective is that we already see people publishing their data in
machine readable form on the web in vast quantities [1,2]. There is no
lack of semantic data on the web. Although I guess we could go back
and forth for a bit about the semantics of 'semantic' ... which
wouldn't be much fun. The challenge (I think) is in convincing current
and future publishers that there is value in using some patterns from
the semweb community [3, 4].

What sort of stuff does a linked-data space enable? What sorts of
things can you _do_ with a linked data graph that you couldn't do
before? Where are the real, working, non-hypothetical applications I
can play with now that use, say, the linked-open-data space? How do we
get people excited enough about these things to invest the time in
making their data assets available in this way on the web? These are
the questions I personally need help with. I already have semantic
data, expressed in databases of various kinds, painstakingly entered
by real people. I need to convince my colleagues that it's a
worthwhile endeavor expressing these semantics on the web using URIs
and RDF.

I think web2.0 has convinced large portions of the software
development and business communities that there _is_ value in making
machine readable data available on the web. But I don't think these
people have yet been convinced that the entities described in this
data need URIs, and that there is value in linking these entities
together within the enterprise and across organizational boundaries.
But it feels like we're close to a tipping point ... perhaps you feel
like we've been on the tipping point for a while eh?

In the end I think the way forward is to continue to add to the list
of success stories [5], to bring semweb technologies to the broader
web developer community [6,7], and to show the synergies between
semweb technologies and other stuff like REST, AtomPub, OpenID,
Microformats, content-management-systems (Drupal, etc). To keep on
keeping on, as it were ... finding new algorithms, building useful
tools and real systems now (as you are)--not giving in to despair, and
retreating to an ivory tower to gaze into the future where the
computers will just take care of it.

//Ed

[1] http://www.oreillynet.com/pub/a/oreilly/tim/news/2005/09/30/what-is-web-20.html
[2] http://www.programmableweb.com/
[3] http://www.w3.org/TR/cooluris/
[4] http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
[5] http://www.w3.org/2001/sw/sweo/public/UseCases/
[6] http://www.w3.org/TR/xhtml-rdfa-primer/
[7] http://www.w3.org/2003/g/data-view


Re: web to semantic web : an automated approach

by Semantics-ProjectParadigm

Dear all,

Before we can make any sensible comments on the extent to which we can structure information before it is put on the web or convert it into SW formats, we need to know what is on the web in terms of (raw) information and data, which types of web services access this information, and who the users are.

For the Semantic Web to gain momentum and expand its user base and number of SW compliant web pages, we need to do a survey.

For all the hype the Semantic Web has been getting, we must accept the fact that some (quite a bit, actually) of the (raw) data and information will not lend itself to useful content creation.

The question that should be answered is what would be useful to convert into SW-compliant format, in what form, and for whom.

More effort should be focused on this latter issue.

Some key words here are open archives (library exchange), open access (to digital repositories), open licenses and open source applications, open access publications, web portals.

Rainbow Warriors International, Ekolibrium Foundation and WiserEarth are just three examples of non-profits in a specific field, namely sustainable development, that believe the people-centered approach is necessary for expanding the Semantic Web.

I am glad to see Drupal embrace SW, as last year we requested that Drupal consider incorporating SW, in a long email detailing the convergence of (new) technologies: 3G and 4G GSM mobile network platforms on GSM phones, and search engine technologies. (We recently made a submission to Google's www.project10tothe100.com in which we suggested the GOLDEN TIP, i.e. expanding the search heuristics in the search algorithms to include links to tagged data, which means Google would be able to filter for pages containing SW content.)

We have bounced similar ideas off companies like Sybase and the Eclipse consortium (www.eclipse.org).

There are also barriers and obstacles to consider: specialty print publications that cater to professionals, libraries, academic institutions, research institutes etc. potentially stand to lose, as do printed newspaper and magazine publishers.

As a sustainable development organization we are simply trying to rally as much support for the broadest possible platform utilizing the semantic web for a common good.

In our case we are trying to empower all stakeholders in sustainable development worldwide through the use of ICT (with mobile telephony and the internet as spearhead technology platforms).

Why? Because it has the power of consensus of the UN to throw at it, which makes it easier to persuade corporate players to throw resources at it.

Companies like Microsoft, Sun Microsystems, Sybase, Oracle, and the developers of web browsers and internet applications widely used by internet users need to come onboard.

Then we will have the thrust to get things moving.

I am betting that Google, browser-developing software companies and open source networks will lead the way, with academia and libraries following close behind.

But before we get into anything, we need raw numbers and answers to the question at the beginning of this email.

Milton Ponson
GSM: +297 747 8280
Rainbow Warriors Core Foundation
PO Box 1154, Oranjestad
Aruba, Dutch Caribbean
www.rainbowwarriors.net (under revision)
Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide
www.projectparadigm.info (under construction)
NGO-Opensource: Creating ICT tools for NGOs worldwide for Project Paradigm
www.ngo-opensource.org (proposed project)
MetaPortal: providing online access to web sites and repositories of data and information for sustainable development
www.metaportal.info (proposed project)
SemanticWebSoftware, part of NGO-Opensource to enable SW technologies in the Metaportal project (proposed site: www.semanticwebsoftware.org)







Re: web to semantic web : an automated approach

by रविंदर ठाकुर (ravinder thakur)

>>>>What sort of stuff does a linked-data space enable? What sorts of
>>>>things can you _do_ with a linked data graph that you couldn't do
>>>>before?
A lot depends upon the kind of data we have. Assuming we have the requisite data, I would expect the best system to answer a query like "world population in 1968" with 3.5 billion, and not with something like this (the current best system). How good and confident is a system that says it has found 489,000 matches for a simple question like this? The reason is the underlying architecture used to process the data. Unless that architecture moves from a system based on word analysis (PageRank etc.) to one based on meaning (SW), there isn't much scope to improve.
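
The contrast can be sketched in a few lines. This is only an illustration under assumptions: the three documents and the `facts` table are invented, `keyword_search` is a crude word-overlap match and says nothing about how any real search engine ranks pages, and the 3.5 billion value is simply the figure quoted above.

```python
# Hypothetical documents and facts, invented for illustration only.
documents = [
    "World population figures by year",
    "Population growth in 1968 and after",
    "The world in 1968: a year of protest",
]

facts = {
    # (subject, property, year) -> value; 3.5 billion is the figure quoted above
    ("world", "population", 1968): 3.5e9,
}

def keyword_search(query, docs):
    """Return every document sharing at least one word with the query."""
    words = set(query.lower().split())
    return [d for d in docs if words & set(d.lower().replace(":", " ").split())]

def fact_lookup(subject, prop, year, kb):
    """Return the single stored value for the pattern, or None if unknown."""
    return kb.get((subject, prop, year))

print(len(keyword_search("world population in 1968", documents)))  # 3
print(fact_lookup("world", "population", 1968, facts))  # 3500000000.0
```

The word-based search returns every page that mentions any of the terms and leaves the reading to the user, while the lookup over structured facts returns one value or nothing.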


>>>>Where are the real, working, non-hypothetical applications I
>>>>can play with now that use say the linked-open-data space?.
There isn't any real semantic web application that I know of. Please do let me know if anyone knows of one.


>>>>How do we get people excited enough about these things to invest the time in
>>>> making their data assets available in this way on the web? These are
>>>>the questions I personally need help with.
The point is that data owners won't do it unless they find some monetary benefit in it. One or two might share, but there will always be a bigger proportion that won't share their data. But why should we wait for others to share their data? Why not create our own semantic data? If the approach of convincing people to share their data is not working, why not take the other one or, even better, use both approaches? We only stand to gain from these approaches.


RE: web to semantic web : an automated approach

by John Flynn


The popular opinion in the community seems to be that the data for the
Semantic Web will mostly come from large structured data sources. However,
currently a large amount of the information on the Web is contained in
unstructured form. One of the key reasons that large unstructured sources of
data remain unavailable to the Semantic Web is that very little effort has
been made to make it easy and compelling for traditional html web site
developers to mark up their data in a way that it can simply be accessed via
the Semantic Web. Both RDFa and HTML2 are addressing this issue, but there
is still no simple way to html tag specific local web site data as instances
of a widely used ontology located at a remote site. You might envision a
generally accepted ontology on a domain such as "wine" that many of the
individual html web sites on that subject would link their data to as
instances. A capability to search that ontology could lead back to the
marked up instance data, which might, in turn, give a compelling reason for
the web site developers to go to the effort of making the changes to their
web site. But, this could only happen if a very simple way is provided for
them to mark up their data as instances of a remote ontology while also
allowing the data to show up in a traditional web browser.
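
As a rough sketch of what such markup could look like, the fragment below annotates ordinary HTML with RDFa-style attributes (`about`, `typeof`, `property`) and pulls the instance data back out. The ontology URL `http://example.org/wine#`, the property names and the `InstanceExtractor` class are all made up for illustration, and the extractor is a toy, not a conforming RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: ordinary HTML that still renders in a browser,
# annotated with RDFa-style attributes pointing at a made-up remote ontology.
PAGE = """
<div xmlns:wine="http://example.org/wine#" about="#bottle1" typeof="wine:Wine">
  <span property="wine:name">Example Riesling</span>,
  vintage <span property="wine:vintage">2006</span>
</div>
"""

class InstanceExtractor(HTMLParser):
    """Toy extractor; not a conforming RDFa processor."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self.subject = None   # value of the most recent 'about' attribute
        self.pending = None   # property name waiting for its text content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "about" in attrs:
            self.subject = attrs["about"]
        if "typeof" in attrs and self.subject:
            self.triples.append((self.subject, "rdf:type", attrs["typeof"]))
        if "property" in attrs:
            self.pending = attrs["property"]

    def handle_data(self, data):
        if self.pending and self.subject and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

parser = InstanceExtractor()
parser.feed(PAGE)
parser.close()
for triple in parser.triples:
    print(triple)
```

A browser ignores the extra attributes and shows the text as usual, while a crawler that understands the remote ontology could index the page as an instance of its wine class.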

John





Re: web to semantic web : an automated approach

by रविंदर ठाकुर (ravinder thakur)

But what's the incentive for web site owners to mark up their websites with semantic data? A few days back I was reading a study conducted by the Opera browser team which said that most of the HTML generated by websites is not even valid. How can we expect them to create correct semantic data? Also, what happens to all the other user-submitted content (blogs, wikis etc.)?

Instead, why not create a mechanism to automatically convert web data to semantic data? Opencalais.com is already doing it on a small domain; why can't/shouldn't we do it at web scale?


John: I realized that you are from BBN. If you are aware, can you please tell us from your experience about the state of NLP? To what extent are the current best NLP systems capable of extracting information from unformatted text? And what are the hopes for overcoming the current shortcomings of NLP systems in the future?




Re: web to semantic web : an automated approach

by Semantics-ProjectParadigm

Dear Ravinder,

You are spot on in saying that quite a few data owners won't feel any incentive to have their data made SW compatible if no monetary or other gain is to be made.

And I also share the idea that we cannot wait. In the email I sent off to Google I actually made reference to tag-templates and not tags as an extra heuristic set to be included in the search algorithms.

I would like to know if there is anyone out there willing to look at the idea of creating a new search engine, that does what Google does, but includes semantic web filters?

Google can also be customized for languages, how about customizing for specific tags and semantic web structure formats?

I am a mathematician by training and would love to accept the challenge of creating a new search engine which is customizable and semantic-web-enabled.

After all, anyone who knows the origin of Google may believe it is possible to create something new if Google no longer fits the bill.

The simple line of reasoning Page and Brin had when they built the prototype was the same: "Build it and they (the guys with money) will come (to our door)."

But the project will have to be OPEN SOURCE with OPEN LICENSE approach, and preferably have support from popular browsers.

Anyone up to the challenge?

Milton Ponson


--- On Mon, 10/20/08, रविंदर ठा <ravinderthakur@...> wrote:
From: रविंदर ठा <ravinderthakur@...>
Subject: Re: web to semantic web : an automated approach
To: metadataportals@...
Cc: "Semantic Web" <semantic-web@...>, semantic_web@..., "Andreas Langegger" <al@...>
Date: Monday, October 20, 2008, 2:44 PM

>>>>What sort of stuff does a linked-data space enable? What sorts of
>>>>things can you _do_ with a linked data graph that you couldn't do
>>>>before?
A lot depends upon the kind of data we have. Assuming we have the requisite data i would expect best system to give answers to queries like "world population in 1968" as  3.5 billion and not something like this (the current best system). How good and confident a system is which says he has found 489,000 matches for a simple one word question like this. The reason for this is the underlying architecture used to process the data. Unless that architecture is moved form a word analysis based system(pagerank etc) to one based on meaning (SW) there isn't much scope to improve.


>>>>Where are the real, working, non-hypothetical applications I
>>>>can play with now that use say the linked-open-data space?.
There isn't any real semantic web application that I know of. Please do let me know if anyone knows of one.


>>>>How do we get people excited enough about these things to invest the time in
>>>> making their data assets available in this way on the web? These are
>>>>the questions I personally need help with.
The point is that data owners won't do it unless they find some monetary benefit in it. One or two might share, but there will always be a bigger proportion that won't share their data. But why should we wait for others to share their data? Why not create our own semantic data? If the approach of convincing people to share their data is not working, why not take the other one, or even better, why not use both approaches? We only stand to gain from these approaches.



RE: web to semantic web : an automated approach

by John Flynn


BBN, and other NLP researchers, have had considerable success in using NLP to automatically extract instance data from unstructured text and map it into ontological knowledge bases. The issue of co-reference resolution remains a difficult problem. Extracting structure and automatically creating the ontology is an even harder problem. Continued research in these areas is important because a great deal of human knowledge is contained in unstructured data. However, I’m personally convinced that in the long (maybe very long) run the best approach will be to mark up data as instances of classes and properties of ontologies as the very first step in information processing, and then to automatically generate unstructured (human-readable) text from the knowledge bases that is tailored to specific human information requests. Human analysts would no longer spend their time writing unstructured text but would rather populate Semantic Web knowledge bases. Of course, this approach to publishing wouldn’t apply to novels, plays, poems and other such works of art, only to tailored responses to direct requests for information.
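
The "knowledge base first, text second" idea can be sketched in a few lines of Python. This is a toy under stated assumptions, not a description of how BBN or any real system works: the `Triple` class, the `KB` contents, the `TEMPLATES` table and the `describe` function are all hypothetical names introduced for illustration, and the sentences come from fixed templates rather than real natural-language generation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triple:
    """One statement in the knowledge base: subject, property, value."""
    subject: str
    predicate: str
    obj: str


# Illustrative knowledge base, populated directly instead of writing prose.
KB = [
    Triple("BBN", "type", "Organization"),
    Triple("BBN", "researches", "natural language processing"),
    Triple("OpenCalais", "type", "Service"),
]

# One sentence template per property (an assumption of this sketch).
TEMPLATES = {
    "type": "{s} is an instance of {o}.",
    "researches": "{s} conducts research in {o}.",
}


def describe(subject: str, kb: List[Triple]) -> str:
    """Generate human-readable text about one subject from the stored triples."""
    sentences = [
        TEMPLATES[t.predicate].format(s=t.subject, o=t.obj)
        for t in kb
        if t.subject == subject and t.predicate in TEMPLATES
    ]
    return " ".join(sentences)


if __name__ == "__main__":
    print(describe("BBN", KB))
```

Here the analyst's work is the list of triples, and the readable text is produced on request for whichever subject is asked about.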

 

John

 

From: semantic-web-request@... [mailto:semantic-web-request@...] On Behalf Of रविंदर ठाकुर (ravinder thakur)
Sent: Monday, October 20, 2008 11:55 AM
To: John Flynn; semantic-web@...; semantic_web@...
Subject: Re: web to semantic web : an automated approach

 

But what's the incentive for web site owners to mark up their websites with semantic data? A few days back I was reading a study conducted by the Opera browser team that said that most of the HTML generated by websites is not even valid. How can we expect them to create correct semantic data? Also, what happens to all the other user-submitted content (blogs, wikis etc.)?

Instead, why not create a mechanism to automatically convert web data to semantic data? Opencalais.com is already doing it on a small domain; why can't/shouldn't we do it at the web's scale?


John: I realized that you are from BBN. In case you are aware, can you please tell us from your experience about the state of NLP? To what extent are the current best NLP systems capable of extracting information from unformatted text? And what are the hopes for the future to overcome the current shortcomings in NLP systems?



Re: web to semantic web : an automated approach

by c



> I am a mathematician by training and would love to accept the challenge of creating a new search engine which is customizable and semantic web enabled.

you could get a job at MetaWeb or Twine or Hakia. or heck even Google

that is, if they're actually hiring right now, not unhiring - YMMV

>
> After all, anyone who knows the origin of Google, may believe it is possible to create something new if Google no longer fits the bill.
>
> The same simple line of reasoning that Page and Brin had when they built the prototype was: "Build it and they (the guys with money) will come (to our door)."
>
> But the project will have to be OPEN SOURCE with OPEN LICENSE approach, and preferably have support from popular browsers.

ok yeah, the above-mentioned companies are all proprietary hosted web services. how about Wikia's search, or whatever? i know they bought some open source search engines. plus there's SINDICE/SWOOGLE etc. some of these are probably open

>
> Anyone up to the challenge?

frankly, my search needs are already met:

amazon for non-consumerelectronics products and reviews
newegg for consumerelectronics products and reviews
ebay for discontinued products
google for DNSy searches (eg engadget, dailykos, etc)
imdb for film reviews
isohunt for torrents
googlenews/topix for commercial news stories

the only needs that aren't fully met, ironically, are for the stuff on my own HD. i don't have an interest in a beastly trackerd churning through the drive, and i won't run a proprietary OS, so Spotlight is out of the question.


>
> Milton Ponson
> GSM: +297 747 8280
> Rainbow Warriors Core Foundation
> PO Box 1154, Oranjestad
> Aruba, Dutch Caribbean
> www.rainbowwarriors.net (under revision)

gay militia?

> Project Paradigm: A structured approach to bringing the tools for sustainable development to all stakeholders worldwide
> www.projectparadigm.info (under construction)

the paradigm is shifting. perhaps remove 'www' from the URL?

> NGO-Opensource: Creating ICT tools for NGOs worldwide for Project Paradigm
> www.ngo-opensource.org (proposed project)

how about ngopen? easier to type/remember


> MetaPortal: providing online access to web sites and repositories of data and information for sustainable development
> www.metaportal.info (proposed project)

metaportal? welcome to 1993

> SemanticWebSoftware, part of NGO-Opensource to enable SW technologies in the Metaportal project (proposed site: www.semanticwebsoftware.org)
>
>
> --- On Mon, 10/20/08, रविंदर ठा <ravinderthakur@...> wrote:
> From: रविंदर ठा <ravinderthakur@...>
> Subject: Re: web to semantic web : an automated approach
> To: metadataportals@...
> Cc: "Semantic Web" <semantic-web@...>, semantic_web@..., "Andreas Langegger" <al@...>
> Date: Monday, October 20, 2008, 2:44 PM
>
> >>>>What sort of stuff does a linked-data space enable? What sorts of
> >>>>things can you _do_ with a linked data graph that you couldn't do
> >>>>before?
> A lot depends upon the kind of data we have. Assuming we have the requisite data, I would expect the best system to answer a query like "world population in 1968" with 3.5 billion, and not with something like this (the current best system). How good and confident is a system that says it has found 489,000 matches for a simple question like this? The reason is the underlying architecture used to process the data. Unless that architecture is moved from a system based on word analysis (PageRank etc.) to one based on meaning (SW), there isn't much scope to improve.
>
> >>>>Where are the real, working, non-hypothetical applications I
> >>>>can play with now that use say the linked-open-data space?.
> There isn't any real semantic web application that I know of. Please do let me know if anyone knows of one.
>
> >>>>How do we get people excited enough about these things to invest the time in
> >>>> making their data assets available in this way on the web? These are
> >>>>the questions I personally need help with.
> The point is that data owners won't do it unless they find some monetary benefit in it. One or two might share, but there will always be a bigger proportion that won't share their data. But why should we wait for others to share their data? Why not create our own semantic data? If the approach of convincing people to share their data is not working, why not take the other one, or even better, why not use both approaches? We only stand to gain from these approaches.
