proof of concept code with bsd db4

Hi.
I wanted to make a new BSD DB4 back-end for alpm, but I never reached my goal and will not. All I have is proof-of-concept code that uses the BSD DB4 API to store pmpkg_t, and I wanted to share it with anyone who is interested.

I have coded 3 utilities:
- one that converts pacman's db into a BSD DB4 file for each repo
- one that reads that new db format to perform queries as pacman does
- one that converts a tarball db (taken from a sync mirror) directly into a BSD DB4 file

If this proves useful for someone, great. More info at http://pagesperso-orange.fr/solstice.dhiver/alpmdb4.html and in the README of http://pagesperso-orange.fr/solstice.dhiver/data/readdb.tar.gz
Re: proof of concept code with bsd db4

On Sat, Oct 31, 2009 at 11:37 AM, solsTiCe d'Hiver <solstice.dhiver@...> wrote:
> hi.
> i wanted to make a new bsd db4 back-end for alpm. [...]

Nice work on actually doing something here and sharing the code! Thanks; it might just get the wheels turning for some other people here on the list.

I grabbed your code and took it for a spin. I liked that you included a README; I had very little trouble getting it running. I even found a real hotspot in readdb: add_sorted is a killer in a tight loop, and it makes a lot more sense to do all your adds first and then sort once with alpm_list_msort().

For others on the list who haven't looked at it yet:
* On raw speed alone, this wins. Of course, pacman does a lot more (this isn't parsing conf files, reading mirrorlists, etc.), but a "-Ss pacman" search took 0.083 seconds vs. 0.282 seconds (in the hot-cache case, of course).
* For those who aren't familiar, BDB stores key/value pairs. The database layout could probably be simplified a bit: we could pack many attributes into one key/value pair for those we don't use very often, or never search by but only look up.
* It didn't take much code to do this. That is encouraging.

What do people think about non-file-system-based backends?
There are several options we could think about:
* BSD DB4, similar to what was done here (fast and pretty simple)
* SQLite, which might give us a bit more flexibility for querying/lookup
* Direct tarfile parsing each time: no conversion needed, but likely rather inefficient
* ???

The biggest argument always raised in the past against non-file backends was corruption. If you get a corrupted localdb or something else you can't recover from, you are in a bad place. With files, the barrier to recovery is lowest; with a binary format, it is a lot trickier. Thoughts?

-Dan
Re: proof of concept code with bsd db4

On Mon, Nov 9, 2009 at 5:50 AM, Dan McGee <dpmcgee@...> wrote:
> What do people think about non-file-system-based backends? [...]
> The biggest reason always raised in the past against non-file backends
> was corruption. [...]

Interesting. A quicker pacman should be a positive thing, right? :)

I vote for BerkeleyDB, because I've used it in previous projects, and besides performance it also brings data integrity and recoverability. (For example, what happens if a power outage hits while pacman is upgrading, just as it is writing to the file system? With BerkeleyDB we have atomic operations, so this is not a problem.)

Another note: BerkeleyDB also supports indices, which would let us search for keys based on values more efficiently (searching packages by fields). Newer versions of BerkeleyDB also have a kind of SQL-like language for defining structures. [1]

About backups: there are tools to dump and load a database (db_dump and db_load), so backups should be very easy.

So if someone needs help implementing this feature, I could also help.

Ciprian.

[1] http://www.oracle.com/technology/pub/articles/seltzer-berkeleydb-sql.html
Re: proof of concept code with bsd db4

On Mon, Nov 9, 2009 at 8:54 AM, Ciprian Dorin, Craciun <ciprian.craciun@...> wrote:
> [1] http://www.oracle.com/technology/pub/articles/seltzer-berkeleydb-sql.html

Sorry for the wrong link (I searched for it in a hurry on Google). This is the right one:
http://www.oracle.com/technology/documentation/berkeley-db/db/api_reference/C/db_sql.html