|
View:
New views
9 Messages
—
Rating Filter:
Alert me
|
|
|
Reduce database file sizeHello,
I would like to make an SQLite database as small as possible to transfer it over a slow link. Currently I am dropping all custom indices, run VACUUM and compress the file with lzma -9. However, I noticed that if I dump the entire database into a text file with the SQLite shell and then compress the text file, the result is significantly smaller than the "stripped" compressed database: Full database: 146 MB Without Custom Indices: 117 MB Compressed: 13 MB Dumped DB: 181 MB Compressed: 6.8 MB Is there a way to strip even more redundancies from the DB for the transfer (e.g. the automatically created indices for primary keys)? Alternatively, is there an easy way to dump and recover the DB using the standard API rather than the SQLite shell? (Obviously I could write a dump program myself, but I'd be nice if there is a solution that requires less work). Best, -Nikolaus -- »Time flies like an arrow, fruit flies like a Banana.« PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
|
|
Re: Reduce database file sizeOn 10 Apr 2010, at 11:06pm, Nikolaus Rath wrote: > However, I noticed that if I dump the entire database into a text file > with the SQLite shell and then compress the text file, the result is > significantly smaller than the "stripped" compressed database: > > Full database: 146 MB > Without Custom Indices: 117 MB > Compressed: 13 MB > > Dumped DB: 181 MB > Compressed: 6.8 MB I assume you're using lzma -9 for both of these. You might try other compressors besides lzma but you're already getting very good compression. > Is there a way to strip even more redundancies from the DB for the > transfer (e.g. the automatically created indices for primary keys)? I think those are inherent in how SQLite works. You can't strip them from its format. > Alternatively, is there an easy way to dump and recover the DB using the > standard API rather than the SQLite shell? (Obviously I could write a > dump program myself, but I'd be nice if there is a solution that > requires less work). I think you've already worked out the best way to do it using sqlite and SQL formats. And of course if you're writing your own dump program you're probably better off using TSV format than raw SQL commands. Unless you have BLOBs in your file. Simon. _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
|
|
Re: Reduce database file sizeOn 04/10/2010 03:06 PM, Nikolaus Rath wrote:
> However, I noticed that if I dump the entire database into a text file > with the SQLite shell and then compress the text file, the result is > significantly smaller than the "stripped" compressed database: Have you tried different page sizes? You could also do the transmission using rsync with compression which may turn out to transfer even less. It may also be worthwhile looking to other compression mechanisms. For example smaz works well on English text because the compression dictionary is prebuilt rather than generated for the specific data. A prebuilt dictionary type mechanism may work well for you. You may also find a PPM compressor even better. > Alternatively, is there an easy way to dump and recover the DB using the > standard API rather than the SQLite shell? (Obviously I could write a > dump program myself, but I'd be nice if there is a solution that > requires less work). You are using Python and APSW IIRC. APSW includes a Python based shell that has dumping and restore code and can be used programmatically or interactively. Roger _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
|
|
Re: Reduce database file sizeRoger Binns <rogerb@...> writes:
> On 04/10/2010 03:06 PM, Nikolaus Rath wrote: >> However, I noticed that if I dump the entire database into a text file >> with the SQLite shell and then compress the text file, the result is >> significantly smaller than the "stripped" compressed database: > > Have you tried different page sizes? Yes, but the effect isn't very big: Page Size: DB: Compressed DB: 512 152 MB 21 MB 1024 147 MB 19 MB 4096 143 MB 18 MB 10240 147 MB 19 MB > You could also do the transmission using rsync with compression which may > turn out to transfer even less. > > It may also be worthwhile looking to other compression mechanisms. For > example smaz works well on English text because the compression dictionary > is prebuilt rather than generated for the specific data. A prebuilt > dictionary type mechanism may work well for you. You may also find a PPM > compressor even better. Using rsync is unfortunately not possible, I'm stuck with HTTP and FTP. I tried a few PPM compressors, but even though the compression ratio is amazing, I'm not desperate enough to invest *that* much CPU time yet. I may take at smaz though, thanks for the pointer. >> Alternatively, is there an easy way to dump and recover the DB using the >> standard API rather than the SQLite shell? (Obviously I could write a >> dump program myself, but I'd be nice if there is a solution that >> requires less work). > > You are using Python and APSW IIRC. APSW includes a Python based shell > that has dumping and restore code and can be used programmatically or > interactively. Yes, you remember correctly. Actually that would be a perfect solution. But how do I use it? It seems to me that I need to pass some argument to Shell.command_dump(), because the following just produces an empty file: import apsw ofh = open('dump.txt', 'w') db = apsw.Connection('test.db') s = apsw.Shell(db=db, stdout=ofh) s.command_dump('.dump') Thanks, -Nikolaus -- »Time flies like an arrow, fruit flies like a Banana.« PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
|
|
Re: Reduce database file sizeOn 04/11/2010 11:09 AM, Nikolaus Rath wrote:
> Yes, you remember correctly. Actually that would be a perfect solution. > But how do I use it? It seems to me that I need to pass some argument to > Shell.command_dump(), because the following just produces an empty file: > > > import apsw > ofh = open('dump.txt', 'w') > db = apsw.Connection('test.db') > s = apsw.Shell(db=db, stdout=ofh) > s.command_dump('.dump') That is asking it to dump the tables named '.', 'd', 'u', 'm' and 'p' :-) I'd suggest the more orthodox way of invoking commands which is less likely to catch you out like this. s.process_command(".dump") Roger _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
|
|
Re: Reduce database file sizeRoger Binns <rogerb@...> writes:
> On 04/11/2010 11:09 AM, Nikolaus Rath wrote: >> Yes, you remember correctly. Actually that would be a perfect solution. >> But how do I use it? It seems to me that I need to pass some argument to >> Shell.command_dump(), because the following just produces an empty file: >> >> >> import apsw >> ofh = open('dump.txt', 'w') >> db = apsw.Connection('test.db') >> s = apsw.Shell(db=db, stdout=ofh) >> s.command_dump('.dump') > > That is asking it to dump the tables named '.', 'd', 'u', 'm' and 'p' :-) I see. I gues I got confused by the help that refers to the parameter as 'cmd': | command_dump(self, cmd) | dump ?TABLE? [TABLE...]: Dumps all or specified tables in SQL text format | | The table name is treated as like pattern so you can use % as > I'd suggest the more orthodox way of invoking commands which is less likely > to catch you out like this. > > s.process_command(".dump") That works, thanks! But why is it *that* slow? [0] vostro:~/tmp$ cat dump.py import apsw db = apsw.Connection('test.db') ofh = open('dump-py.sql', 'w') s = apsw.Shell(db=db, stdout=ofh) s.process_command('.dump') [0] vostro:~/tmp$ cat dump.sh cat <<EOF | sqlite3 test.db .output dump.txt .dump EOF [0] vostro:~/tmp$ time python dump.py ; time sh dump.sh real 2m16.632s user 2m12.080s sys 0m3.756s real 0m13.289s user 0m9.661s sys 0m3.600s Best, -Nikolaus -- »Time flies like an arrow, fruit flies like a Banana.« PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6 02CF A9AD B7F8 AE4E 425C _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
|
|
Re: Reduce database file sizeNikolaus Rath wrote:
> I see. I gues I got confused by the help that refers to the parameter as > 'cmd': The help shows that a list of TABLE is taken in the same syntax as SQLite's shell uses. Also note that the help is formatted for documenting interactive usage rather than API usage. > That works, thanks! But why is it *that* slow? Because it is written in Python and SQLite's is in C. I've made no effort to optimise the Python - please send me a sample database and I'll see what can be done. Additionally the Python supports more functionality which does have an effect on speed. For example you can choose what encoding is used. Roger _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
|
|
Re: Reduce database file sizeIf the subject is transferring data from a web server to a central
database, I like to refer to a feature of an open source package that I published myself. It is only a side-side feature, resulting from the general design, but it was added with just this in mind. This feature, 'Connecting over the internet', is described in http:// packages.python.org/sqmediumlite/#5.4 (this is only for Python users). Op 12-apr-2010, om 6:57 heeft Roger Binns het volgende geschreven: > Nikolaus Rath wrote: >> I see. I gues I got confused by the help that refers to the >> parameter as >> 'cmd': > > The help shows that a list of TABLE is taken in the same syntax as > SQLite's shell uses. Also note that the help is formatted for > documenting interactive usage rather than API usage. > >> That works, thanks! But why is it *that* slow? > > Because it is written in Python and SQLite's is in C. I've made no > effort to optimise the Python - please send me a sample database and > I'll see what can be done. > > Additionally the Python supports more functionality which does have an > effect on speed. For example you can choose what encoding is used. > > Roger > _______________________________________________ > sqlite-users mailing list > sqlite-users@... > http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
|
|
Re: Reduce database file sizeOn Sun, 11 Apr 2010 14:09:21 -0400
Nikolaus Rath <Nikolaus@...> wrote: > Using rsync is unfortunately not possible, I'm stuck with HTTP and > FTP. > > I tried a few PPM compressors, but even though the compression ratio > is amazing, I'm not desperate enough to invest *that* much CPU time > yet. > > I may take at smaz though, thanks for the pointer. > You can try bbb and paq8. You can find them at http://cs.fit.edu/~mmahoney/compression/ and other compressors. Please, note that they are very slow (depending on options it can take hours) At http://mattmahoney.net/dc/text.html you can look for more compression programs. PeaZip has a gui for some of this compression programs http://peazip.sourceforge.net/ and works under windows and linux (but hasn't bbb). _______________________________________________ sqlite-users mailing list sqlite-users@... http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users |
| Free embeddable forum powered by Nabble | Forum Help |