proof of concept code with bsd db4



by solsTiCe d'Hiver-2


hi.
I wanted to write a new BSD DB4 back-end for alpm, but I never reached
that goal and will not.
All I have is proof-of-concept code that uses the BSD DB4 API to store
pmpkg_t, and I wanted to share it with anyone who is interested.

I have written three utilities:
- one that converts pacman's db into a BSD DB4 file for each repo
- one that reads that new db format to perform queries as pacman does
- one that converts a tarball db (taken from a sync mirror) directly
into a BSD DB4 file
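
To make the third utility's job concrete, here is a minimal Python sketch (not the author's code). It assumes that a sync db tarball holds one `<name>-<version>/desc` entry per package, written as `%FIELD%` blocks separated by blank lines. A plain dict stands in for the BSD DB4 file, and the sample package and the key layout are made up for illustration.

```python
import io
import tarfile


def parse_desc(text):
    """Parse '%FIELD%' blocks into a dict mapping field name to a list of values."""
    fields = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            current = None  # a blank line ends the current block
        elif (current is None and len(line) > 2
              and line.startswith("%") and line.endswith("%")):
            current = line.strip("%")
            fields[current] = []
        elif current is not None:
            fields[current].append(line)
    return fields


def tarball_to_kv(fileobj):
    """Return {"<pkgdir>/<FIELD>": value} for every desc entry in the tarball."""
    store = {}  # stand-in for a key/value database file
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar:
            if not (member.isfile() and member.name.endswith("/desc")):
                continue
            pkgdir = member.name.rsplit("/", 1)[0]
            text = tar.extractfile(member).read().decode("utf-8")
            for field, values in parse_desc(text).items():
                store[f"{pkgdir}/{field}"] = "\n".join(values)
    return store


def make_sample_tarball():
    """Build a tiny in-memory tarball with one hypothetical package."""
    desc = "%NAME%\nexample\n\n%VERSION%\n1.0-1\n\n%DESC%\nAn example package\n"
    data = desc.encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo("example-1.0-1/desc")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


store = tarball_to_kv(make_sample_tarball())
for key in sorted(store):
    print(key, "=", store[key])
```

A real converter would write each pair into the database instead of a dict; the parsing loop would stay much the same.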

If this proves useful to someone, great.
More info at http://pagesperso-orange.fr/solstice.dhiver/alpmdb4.html
and in the README of
http://pagesperso-orange.fr/solstice.dhiver/data/readdb.tar.gz








Re: proof of concept code with bsd db4

by Dan McGee


On Sat, Oct 31, 2009 at 11:37 AM, solsTiCe d'Hiver
<solstice.dhiver@...> wrote:

> [...]

Nice work on actually doing something here and sharing the code!
Thanks, as it might just make some wheels turn for some other people
here on the list.

I grabbed your code and took it for a spin. I liked that you had a
README and all; I didn't have much trouble getting it running. I even
found a real hotspot in readdb: add_sorted is a killer in a tight
loop, and it makes a lot more sense to do all your adds followed by
a single alpm_list_msort().
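
The hotspot can be illustrated with a small Python sketch. It uses plain Python lists rather than alpm_list_t, so `add_sorted` below is only a stand-in that mimics a linear-scan sorted insert, and `build_then_sort` stands in for "append everything, then sort once". The package names are randomly generated placeholders.

```python
import random


def add_sorted(items, value):
    """Insert value into an already sorted list by linear scan (O(n) per add)."""
    i = 0
    while i < len(items) and items[i] <= value:
        i += 1
    items.insert(i, value)


def build_incrementally(values):
    """Keep the list sorted after every add: O(n^2) comparisons overall."""
    out = []
    for v in values:
        add_sorted(out, v)
    return out


def build_then_sort(values):
    """Append everything first, then sort once: O(n log n) overall."""
    out = list(values)
    out.sort()
    return out


random.seed(0)
names = [f"pkg{random.randrange(10_000):05d}" for _ in range(2000)]

a = build_incrementally(names)
b = build_then_sort(names)
print(a == b)  # both strategies yield the same ordering
```

Both functions produce the same list; only the amount of work differs, and the gap grows with the number of packages.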

For others on the list who haven't looked at it yet:
* Raw speed alone, this wins. Of course, pacman does a lot more (this
isn't parsing conf files, reading mirrorlists, etc.), but a "-Ss pacman"
search took 0.083 seconds with this code vs 0.282 seconds with pacman
(in the hot cache case, of course).
* For those who aren't familiar, BDB stores key/value pairs. The database
layout could probably be simplified a bit: we could pack many
attributes into one key/value pair, namely those we don't use all that
often, or that we only look up and never search by.
* It didn't take all that much code to do this. That is encouraging.
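
One possible reading of the "pack many attributes into one key/value pair" idea, sketched in Python. A dict stands in for the BDB file, and the key scheme (`<name>-<version>/<field>`), the choice of searched fields, and the JSON packing are all assumptions made for illustration, not the layout used by the posted code.

```python
import json

# Hypothetical split: fields a search scans get their own key,
# everything else is packed into a single "meta" value.
SEARCHED_FIELDS = ("name", "desc")


def pack_package(pkg):
    """Return the key/value pairs to store for one package."""
    pkgid = f"{pkg['name']}-{pkg['version']}"
    pairs = {}
    for field in SEARCHED_FIELDS:
        pairs[f"{pkgid}/{field}"] = pkg[field].encode("utf-8")
    rest = {k: v for k, v in pkg.items() if k not in SEARCHED_FIELDS}
    pairs[f"{pkgid}/meta"] = json.dumps(rest, sort_keys=True).encode("utf-8")
    return pairs


def unpack_meta(store, pkgid):
    """Decode the packed attributes of one package with a single lookup."""
    return json.loads(store[f"{pkgid}/meta"].decode("utf-8"))


store = {}  # stand-in for the key/value database
store.update(pack_package({
    "name": "example",
    "version": "1.0-1",
    "desc": "An example package",
    "url": "http://example.org/",
    "licenses": ["GPL"],
}))

meta = unpack_meta(store, "example-1.0-1")
print(sorted(store))
print(meta["licenses"])
```

The trade-off is that a search only touches the small per-field values, while a full lookup costs one extra decode of the packed value.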

What do people think about non-file-system-based backends? There are
several options we could think about:
* BSD DB4, similar to what was done here (fast and pretty simple)
* SQLite, which might give us a bit more flexibility for querying/lookup
* Direct tarfile parsing each time, no conversion needed but likely
rather inefficient
* ???
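
To give a feel for the "flexibility for querying" that the SQLite option might bring, here is a minimal sketch using Python's standard sqlite3 module. The table layout, column names, and sample rows are invented for illustration, and the substring match is only roughly analogous to what a "-Ss" search does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE packages ("
    " name TEXT PRIMARY KEY,"
    " version TEXT NOT NULL,"
    " description TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?, ?)",
    [
        ("alpha-tool", "1.0-1", "An example tool"),
        ("beta-lib", "2.1-3", "Library used by alpha-tool"),
        ("gamma", "0.9-2", "Unrelated package"),
    ],
)
conn.commit()


def search(conn, term):
    """Return (name, version) of packages whose name or description contains term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT name, version FROM packages"
        " WHERE name LIKE ? OR description LIKE ?"
        " ORDER BY name",
        (pattern, pattern),
    )
    return rows.fetchall()


results = search(conn, "alpha")
print(results)
```

With a key/value store the same search means walking every record by hand; here the query engine does the scan, at the cost of a heavier dependency.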

The biggest reason always raised in the past against non-file backends
was corruption. If you get a corrupted localdb or something you can't
recover from, you are in a bad place. With files, you have the lowest
barrier to recovery. With a more binary format, it is a lot trickier.
Thoughts?

-Dan


Re: proof of concept code with bsd db4

by Ciprian Dorin, Craciun


On Mon, Nov 9, 2009 at 5:50 AM, Dan McGee <dpmcgee@...> wrote:

> [...]


    Interesting. A quicker pacman should be a positive thing, right? :)

    I vote for BerkeleyDB, because I've used it in previous projects,
and besides performance it also brings data integrity and
recoverability. (For example, what happens if the power goes out
during a pacman upgrade, just as pacman is writing to its file-system
database? With BerkeleyDB we get atomic operations, so this is not a
problem.)
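
    The all-or-nothing behaviour described above can be sketched as follows. Python's standard library has no BerkeleyDB binding, so this uses sqlite3 purely as a stand-in for a transactional store. The table, the package names, and the `fail_after` parameter are hypothetical, and an exception only simulates the interruption; it is not a real power failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE installed (name TEXT PRIMARY KEY, version TEXT NOT NULL)"
)
conn.execute("INSERT INTO installed VALUES ('example', '1.0-1')")
conn.commit()


def upgrade(conn, updates, fail_after=None):
    """Apply all updates in one transaction; any failure rolls back every one."""
    with conn:  # commits on success, rolls back if an exception escapes
        for i, (name, version) in enumerate(updates):
            if fail_after is not None and i == fail_after:
                raise RuntimeError("simulated interruption")
            conn.execute(
                "INSERT OR REPLACE INTO installed VALUES (?, ?)",
                (name, version),
            )


try:
    upgrade(conn, [("example", "2.0-1"), ("other", "1.0-1")], fail_after=1)
except RuntimeError:
    pass

rows = conn.execute(
    "SELECT name, version FROM installed ORDER BY name"
).fetchall()
print(rows)  # the half-applied upgrade left no trace
```

    The first update had already been written when the failure hit, yet the database still shows the old state, which is the property being argued for.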

    Another note: BerkeleyDB also supports indices, allowing us to
search more efficiently for keys based on values (searching packages
by fields). Newer versions of BerkeleyDB also have a kind of SQL-like
language for defining structures. [1]
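
    The idea of an index from values back to keys can be shown with plain Python dicts. This is only a conceptual sketch: the records, the "groups" field, and the `build_secondary` helper are made up, and the index is rebuilt by hand here rather than maintained by the database.

```python
from collections import defaultdict

# Primary store: package name -> record (hypothetical sample data).
primary = {
    "alpha-tool": {"version": "1.0-1", "groups": ["base", "devel"]},
    "beta-lib": {"version": "2.1-3", "groups": ["devel"]},
    "gamma": {"version": "0.9-2", "groups": []},
}


def build_secondary(primary, field):
    """Map each value found in `field` to the sorted list of primary keys holding it."""
    index = defaultdict(list)
    for key in sorted(primary):
        for value in primary[key][field]:
            index[value].append(key)
    return dict(index)


by_group = build_secondary(primary, "groups")
print(by_group["devel"])
```

    Looking up a group is then a single index access instead of a scan over every package record.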

    About backups: there are tools to dump and load a database
(db_dump and db_load), so backups should be very easy.

    If someone needs help implementing this feature, I could also
pitch in.

    Ciprian.

    [1] http://www.oracle.com/technology/pub/articles/seltzer-berkeleydb-sql.html


Re: proof of concept code with bsd db4

by Ciprian Dorin, Craciun


On Mon, Nov 9, 2009 at 8:54 AM, Ciprian Dorin, Craciun
<ciprian.craciun@...> wrote:

> [...]

    Sorry for the wrong link (I searched for it in a hurry on Google).
It's the following one:
    http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/db_sql.html