
Re: How to sort list of sublists as per key/keys of sublist?

by Lev777


Let me start from the beginning.
There is XML data, separate for each type of object.
It looks like:
...
<Oil_Painting>305</Oil_Painting>
<Title>Original Title</Title>
<Title_English>Some Title</Title_English>
<Title_German>Some other title in German</Title_German>
<Place>Bonn,school of arts, 1997</Place>
<Exhib>Exhibition of XYZ, Gallery of Bonn, Germany, 1980/1981</Exhib>
<Exhib>Exhibition of XYZ1, Gallery of Amsterdam, Holland, April 1981- 1982</Exhib>
<Monographs>M.L. Sam, The book name 1940-2001, Catalogue of Paintings, Cologne, 2002, Vol. II, p. 1174</Monographs>
<Monographs>M.L. Markus, The book name, Catalogue of Paintings, Cologne, 2003, p. 1174 (c)</Monographs>
.....

<Tapestry>305</Tapestry>
<Title_English>Original Title</Title_English>
<Author>A.S. Kushkin, 1963</Author>
<Reproduced_By></Reproduced_By>
<Based_On></Based_On>
<Year>1980 April 2</Year>
<Exhibition></Exhibition>
<Catalog></Catalog>
.....

....

As you can see, I take the object type from the first field (or from a field agreed with the author of the XML), which is Oil_Painting, Tapestry, etc. The unique ID within a type is the value of that type field.
Title is the main descriptive field of the object. Together, these two properties, Type and ID, are unique and will be used to generate a primary key (or, as you say, a surrogate key).
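One way to turn the (Type, ID) pair into a single-atom key is to concatenate the two parts. This is only a minimal sketch: the predicate name surrogate_key/3 and the ':' separator are hypothetical choices, and it assumes neither part ever contains ':'.

```prolog
% Hypothetical helper: combine the type tag and its ID value into one atom,
% e.g. 'Oil_Painting' and 305 become 'Oil_Painting:305'.
% Assumes neither part contains ':'; otherwise keys could be ambiguous.
surrogate_key(Type, Id, Key) :-
    atomic_list_concat([Type, Id], ':', Key).
```

With this, Oil_Painting 305 and Tapestry 305 get distinct keys ('Oil_Painting:305' and 'Tapestry:305'), even though both IDs are 305.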

A PHP script parses the XML into Prolog terms (the knowledge base). This knowledge base then runs as a local HTTP server.

The main web site sends HTTP requests to the Prolog server and shows the results in a browser.

The XML data format is not strictly defined. Some fields contain compound data, like:

<Monographs>M.L. Sam, The great painter T.Smith 1940-2001, Catalogue of Paintings, Cologne, 2002, Vol. II, p. 1174</Monographs>
When the parser finds this type of value, it can either leave it as it is or try to parse it into several objects: author of the monograph, first title (or line) of the book, second title (or line) of the book, publisher city, year of publication, etc.
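A naive sketch of the "parse it, or leave it as it is" step, done in Prolog rather than in the PHP script: split the value on commas and map the first five parts to named fields. The field order and the predicate names are assumptions for illustration; a title that itself contains a comma would be mis-split.

```prolog
% Naive split of a compound <Monographs> value on commas; surrounding
% spaces are stripped. Breaks if a book title itself contains a comma.
monograph_parts(Text, Parts) :-
    split_string(Text, ",", " ", Parts).

% Hypothetical field order: author, title line 1, title line 2, city, year.
% Anything after the year (volume, page) is kept as a list of strings.
monograph(Text, monograph(Author, Title1, Title2, City, Year, Rest)) :-
    monograph_parts(Text, [Author, Title1, Title2, City, YearS | Rest]),
    catch(number_string(Year, YearS), _, fail).

% Try to structure the value; if that fails, keep the raw text.
parse_monograph(Text, Parsed) :-
    (   monograph(Text, Parsed0)
    ->  Parsed = Parsed0
    ;   Parsed = raw(Text)
    ).
```

For the second <Monographs> value above this yields monograph("M.L. Markus", "The book name", "Catalogue of Paintings", "Cologne", 2003, ["p. 1174 (c)"]); a value that does not fit the pattern comes back as raw(Text).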

The problem is that I cannot demand 'proper' XML, with all fields and values separated, from the XML issuer.
The data come from an old database system that was meant for entering data and printing it on paper in a human-readable way.

However, extracting and structuring information from these compound fields (generating new objects and linking them to parent objects) is priceless.

Later, objects will be categorized manually by an art expert and linked to different categories. Categories are hierarchical.
The objects themselves will also be interconnected (by an art expert). Relations will be parent->child or node<=>node.
Thus we will get an undirected graph...
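As plain Prolog relations, the category hierarchy and the expert's links might look like the sketch below. All facts and predicate names are hypothetical; it assumes objects are identified by single-atom keys and that the category hierarchy has no cycles.

```prolog
% Hypothetical category hierarchy: subcategory(Child, Parent).
subcategory(oil_paintings, paintings).
subcategory(paintings, artworks).

% Objects assigned to categories by the art expert.
in_category('Oil_Painting:305', oil_paintings).

% An object belongs to its own category and to every ancestor category.
belongs_to(Obj, Cat) :- in_category(Obj, Cat).
belongs_to(Obj, Cat) :- in_category(Obj, C0), ancestor(C0, Cat).

% Transitive closure over subcategory/2 (assumes no cycles).
ancestor(C, P) :- subcategory(C, P).
ancestor(C, A) :- subcategory(C, P), ancestor(P, A).

% Undirected node<=>node links: store each pair once, query both ways.
link('Oil_Painting:305', 'Tapestry:305').
related(A, B) :- link(A, B).
related(A, B) :- link(B, A).
```

Storing each undirected link once and symmetrizing in related/2 avoids keeping two facts in sync.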

When we receive XML with updates or new objects, all links created so far must (of course!) be preserved. This is where a surrogate key (Type+Id) helps.
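A minimal sketch of why the key helps on re-import: if imported data and expert-made links live in separate relations that refer only to the key, reloading an object's data never touches the links. The predicates title/2, link/2 and import_title/2 are hypothetical, and only one imported property is shown.

```prolog
% title/2 is reloaded from each XML import; link/2 holds the expert's links.
:- dynamic title/2.
:- dynamic link/2.

% Hypothetical update step: replace the imported data for one key.
% link/2 facts refer only to the key, so they are left untouched.
import_title(Key, Title) :-
    retractall(title(Key, _)),
    assertz(title(Key, Title)).
```

Importing 'Oil_Painting:305' twice leaves exactly one title/2 fact (the newer one), while any link/2 fact mentioning that key survives.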
 
That is all, basically.


On Fri, Jun 12, 2009 at 1:05 AM, Richard O'Keefe <ok@...> wrote:

On 12 Jun 2009, at 9:23 am, Levan Cheishvili wrote:

I have the following structure:

object('obj1', [id='12', color='blue']).
object('obj4', [id='13', color='red', weight=120]).
object('obj21', [id='15 a', color='yellow', weight=1000, price=23]).
object('obj21', [id='16 a', color='blue', weight=200, price=230]).


While choosing a suitable structure for data representation, I read this newsletter: http://www.ainewsletter.com/newsletters/aix_0309.htm#ontology
and did not know that in Prolog (=)/2 is not supported for key-value pairs the way (-)/2 is. (I am learning Prolog while working on my very first project in this language.)

I think it might be better to read a good book.
And while Prolog is _better_ than SQL, a lot of the stuff in
data base _theory_ (which SQL little respects) is useful to know.


How come you have two facts for obj21?

My data is like:

object(type1, id, title, [
 prop1-val1,
 prop2-val2,...
 propN-valN
]).

The data come from an external source in the form of XML. I have no idea how many properties there will be, but I know there will be a type, an id and a title, which together form a unique key.

The point of an "id" is to be a unique identifier.
In fact, surely 'obj1', 'obj4', and so on *are* "ids".

With all the earnestness I possess, I tell you that life
will be *far* simpler if the things you use as unique keys
are SINGLE atoms or SINGLE numbers.  (What William Kent in
his wonderful book "Data and Reality" calls "surrogates".)

In general you will need relations mapping between surrogates
(internal keys) and external names.  So if external names are
(type,id,title) keys, use

surrogate(Atom, Type, Id, Title).
 % fd  Atom -> Type, Id, Title
 % fd Type, Id, Title -> Atom
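The two functional dependencies in the comments can be checked mechanically. This is a sketch with made-up sample facts; fd1_violation/1 and fd2_violation/3 are hypothetical names, and each should have no solutions if surrogate/4 respects its fds.

```prolog
% Hypothetical sample facts for the surrogate/4 relation.
surrogate(s1, oil_painting, '305', 'Original Title').
surrogate(s2, tapestry,     '305', 'Original Title').

% fd Atom -> Type, Id, Title: one atom never maps to two external names.
fd1_violation(Atom) :-
    surrogate(Atom, T1, I1, N1),
    surrogate(Atom, T2, I2, N2),
    t(T1, I1, N1) \== t(T2, I2, N2).

% fd Type, Id, Title -> Atom: one external name never maps to two atoms.
fd2_violation(Type, Id, Title) :-
    surrogate(A1, Type, Id, Title),
    surrogate(A2, Type, Id, Title),
    A1 \== A2.
```

Everything else in the knowledge base can then refer to s1 or s2, and the external (Type, Id, Title) triple is looked up only at the XML boundary.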


I'm still very troubled by something being *called* an Id that is none.

If your data are provided in XML, then it simply isn't true that
your data "is (sic.) like:
       object(type1, id1, title, [ ... ])."
Your data are like *whatever you want them to be like*.

The big difference between
       prop1(key1, val1).
and
       object(key1, [...,prop1-val1,...])
is that

  If you just want to know "what is the value of prop1 for key1"
  then the first version (one relation per property) JUST RETURNS
  THE INFORMATION YOU WANT, while the second version COPIES
  EACH AND EVERY PROPERTY for that object EVERY TIME.

That is to say, using the list-of-properties version is
seriously inefficient, and that's throwing roses at it.
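To make the comparison concrete, here is the same question asked of both representations, using hypothetical sample data in the style of the original obj1 fact. Version 1 goes straight to the matching fact; version 2 has to retrieve the whole property list and then scan it for one pair.

```prolog
% Version 1: one relation per property (hypothetical sample data).
color(obj1, blue).
weight(obj1, 120).

% Version 2: one fact per object carrying a property list.
object(obj1, [color-blue, weight-120]).

% Version 2 lookup: fetch the entire list, then search it for one pair.
object_prop(Key, Prop, Value) :-
    object(Key, Props),
    memberchk(Prop-Value, Props).
```

The query ?- color(obj1, C). returns only the colour, while ?- object_prop(obj1, color, C). first brings back every property of obj1.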




During my work on this project I have tried several structures, and the first one was binary relations. But I found it difficult to write predicates for querying them,

Give specific examples.  The data structure you displayed in your
message only really handles binary relations, so there could be no
simple queries easily expressible that way that are not easier
expressed using binary relations.


as I have complex properties, e.g. prop(obj, val1, val2, val3...), and each value may be connected to another property, etc.

This is not clear.  Give an example.  There is no reason why the
*value* of a property cannot be a list or arbitrary term.
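For example, a binary monograph relation can carry a structured value and still be queried field by field. The book/5 layout and the predicate names are hypothetical illustrations built from the sample <Monographs> data.

```prolog
% A property whose value is a compound term rather than an atom.
% The book/5 layout is a hypothetical illustration.
monograph('Oil_Painting:305',
          book('M.L. Sam', 'The book name 1940-2001',
               'Catalogue of Paintings', 'Cologne', 2002)).
monograph('Oil_Painting:305',
          book('M.L. Markus', 'The book name',
               'Catalogue of Paintings', 'Cologne', 2003)).

% Query inside the structured value: authors of monographs after Year0.
monograph_after(Obj, Year0, Author) :-
    monograph(Obj, book(Author, _, _, _, Year)),
    Year > Year0.
```

Here findall(A, monograph_after('Oil_Painting:305', 2002, A), As) collects only 'M.L. Markus'.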

Why don't you tell us what your _abstract_ problem is, your
"logical data model", as it were, and _then_ we can discuss how
it might be mapped well onto Prolog.



However, I think that RDF would be better in this case... I will do it in RDF when I learn it :-)

The _basic_ data model in RDF is binary relations, where admittedly
the relation names can be very long.  RDF can be written in Notation3,
after all.
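A rough sketch of that point in plain Prolog: RDF-style subject/predicate/object triples are one ternary relation, and each binary property relation is just a view over it. The triple/3 predicate and its facts are hypothetical; SWI-Prolog's library(semweb/rdf_db) provides a real triple store with rdf/3.

```prolog
% Hypothetical triples: triple(Subject, Predicate, Object).
triple('Oil_Painting:305', title, 'Original Title').
triple('Oil_Painting:305', exhib,
       'Exhibition of XYZ, Gallery of Bonn, Germany, 1980/1981').
triple('Tapestry:305', author, 'A.S. Kushkin, 1963').

% One binary relation per property, defined as a view over the triples.
title(S, T) :- triple(S, title, T).
author(S, A) :- triple(S, author, A).
```

Either form answers the same questions; the triple form just keeps the property name as data.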

Please do tell us more about what you are trying to model/do.
Let's see how well it can be done.


