Reduce database file size

View: New views
9 Messages — Rating Filter:   Alert me  

Reduce database file size

by Nikolaus Rath :: Rate this Message:

| View Threaded | Show Only this Message

Hello,

I would like to make an SQLite database as small as possible to transfer
it over a slow link. Currently I am dropping all custom indices, run
VACUUM and compress the file with lzma -9.

However, I noticed that if I dump the entire database into a text file
with the SQLite shell and then compress the text file, the result is
significantly smaller than the "stripped" compressed database:

Full database: 146 MB
Without Custom Indices: 117 MB
Compressed: 13 MB

Dumped DB: 181 MB
Compressed: 6.8 MB


Is there a way to strip even more redundancies from the DB for the
transfer (e.g. the automatically created indices for primary keys)?

Alternatively, is there an easy way to dump and recover the DB using the
standard API rather than the SQLite shell? (Obviously I could write a
dump program myself, but I'd be nice if there is a solution that
requires less work).


Best,

   -Nikolaus

--
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Reduce database file size

by Simon Slavin-3 :: Rate this Message:

| View Threaded | Show Only this Message


On 10 Apr 2010, at 11:06pm, Nikolaus Rath wrote:

> However, I noticed that if I dump the entire database into a text file
> with the SQLite shell and then compress the text file, the result is
> significantly smaller than the "stripped" compressed database:
>
> Full database: 146 MB
> Without Custom Indices: 117 MB
> Compressed: 13 MB
>
> Dumped DB: 181 MB
> Compressed: 6.8 MB

I assume you're using lzma -9 for both of these.  You might try other compressors besides lzma but you're already getting very good compression.

> Is there a way to strip even more redundancies from the DB for the
> transfer (e.g. the automatically created indices for primary keys)?

I think those are inherent in how SQLite works.  You can't strip them from its format.

> Alternatively, is there an easy way to dump and recover the DB using the
> standard API rather than the SQLite shell? (Obviously I could write a
> dump program myself, but I'd be nice if there is a solution that
> requires less work).

I think you've already worked out the best way to do it using sqlite and SQL formats.  And of course if you're writing your own dump program you're probably better off using TSV format than raw SQL commands.  Unless you have BLOBs in your file.

Simon.
_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Reduce database file size

by Roger Binns :: Rate this Message:

| View Threaded | Show Only this Message

On 04/10/2010 03:06 PM, Nikolaus Rath wrote:
> However, I noticed that if I dump the entire database into a text file
> with the SQLite shell and then compress the text file, the result is
> significantly smaller than the "stripped" compressed database:

Have you tried different page sizes?

You could also do the transmission using rsync with compression which may
turn out to transfer even less.

It may also be worthwhile looking to other compression mechanisms.  For
example smaz works well on English text because the compression dictionary
is prebuilt rather than generated for the specific data.  A prebuilt
dictionary type mechanism may work well for you.  You may also find a PPM
compressor even better.

> Alternatively, is there an easy way to dump and recover the DB using the
> standard API rather than the SQLite shell? (Obviously I could write a
> dump program myself, but I'd be nice if there is a solution that
> requires less work).

You are using Python and APSW IIRC.  APSW includes a Python based shell that
has dumping and restore code and can be used programmatically or interactively.

Roger
_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Reduce database file size

by Nikolaus Rath :: Rate this Message:

| View Threaded | Show Only this Message

Roger Binns <rogerb@...> writes:
> On 04/10/2010 03:06 PM, Nikolaus Rath wrote:
>> However, I noticed that if I dump the entire database into a text file
>> with the SQLite shell and then compress the text file, the result is
>> significantly smaller than the "stripped" compressed database:
>
> Have you tried different page sizes?

Yes, but the effect isn't very big:

Page Size:      DB:       Compressed DB:
512             152 MB    21 MB
1024            147 MB    19 MB
4096            143 MB    18 MB
10240           147 MB    19 MB

> You could also do the transmission using rsync with compression which may
> turn out to transfer even less.
>
> It may also be worthwhile looking to other compression mechanisms.  For
> example smaz works well on English text because the compression dictionary
> is prebuilt rather than generated for the specific data.  A prebuilt
> dictionary type mechanism may work well for you.  You may also find a PPM
> compressor even better.

Using rsync is unfortunately not possible, I'm stuck with HTTP and FTP.

I tried a few PPM compressors, but even though the compression ratio is
amazing, I'm not desperate enough to invest *that* much CPU time yet.

I may take at smaz though, thanks for the pointer.

>> Alternatively, is there an easy way to dump and recover the DB using the
>> standard API rather than the SQLite shell? (Obviously I could write a
>> dump program myself, but I'd be nice if there is a solution that
>> requires less work).
>
> You are using Python and APSW IIRC. APSW includes a Python based shell
> that has dumping and restore code and can be used programmatically or
> interactively.

Yes, you remember correctly. Actually that would be a perfect solution.
But how do I use it? It seems to me that I need to pass some argument to
Shell.command_dump(), because the following just produces an empty file:


import apsw
ofh = open('dump.txt', 'w')
db = apsw.Connection('test.db')
s = apsw.Shell(db=db, stdout=ofh)
s.command_dump('.dump')


Thanks,

   -Nikolaus

--
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Reduce database file size

by Roger Binns :: Rate this Message:

| View Threaded | Show Only this Message

On 04/11/2010 11:09 AM, Nikolaus Rath wrote:

> Yes, you remember correctly. Actually that would be a perfect solution.
> But how do I use it? It seems to me that I need to pass some argument to
> Shell.command_dump(), because the following just produces an empty file:
>
>
> import apsw
> ofh = open('dump.txt', 'w')
> db = apsw.Connection('test.db')
> s = apsw.Shell(db=db, stdout=ofh)
> s.command_dump('.dump')

That is asking it to dump the tables named '.', 'd', 'u', 'm' and 'p' :-)

I'd suggest the more orthodox way of invoking commands which is less likely
to catch you out like this.

   s.process_command(".dump")

Roger
_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Reduce database file size

by Nikolaus Rath :: Rate this Message:

| View Threaded | Show Only this Message

Roger Binns <rogerb@...> writes:

> On 04/11/2010 11:09 AM, Nikolaus Rath wrote:
>> Yes, you remember correctly. Actually that would be a perfect solution.
>> But how do I use it? It seems to me that I need to pass some argument to
>> Shell.command_dump(), because the following just produces an empty file:
>>
>>
>> import apsw
>> ofh = open('dump.txt', 'w')
>> db = apsw.Connection('test.db')
>> s = apsw.Shell(db=db, stdout=ofh)
>> s.command_dump('.dump')
>
> That is asking it to dump the tables named '.', 'd', 'u', 'm' and 'p' :-)

I see. I gues I got confused by the help that refers to the parameter as
'cmd':

 |  command_dump(self, cmd)
 |      dump ?TABLE? [TABLE...]: Dumps all or specified tables in SQL text format
 |      
 |      The table name is treated as like pattern so you can use % as

 
> I'd suggest the more orthodox way of invoking commands which is less likely
> to catch you out like this.
>
>    s.process_command(".dump")

That works, thanks! But why is it *that* slow?

[0] vostro:~/tmp$ cat dump.py
import apsw

db = apsw.Connection('test.db')
ofh = open('dump-py.sql', 'w')
s = apsw.Shell(db=db, stdout=ofh)
s.process_command('.dump')

[0] vostro:~/tmp$ cat dump.sh
cat <<EOF | sqlite3 test.db
.output dump.txt
.dump
EOF

[0] vostro:~/tmp$ time python dump.py ; time sh dump.sh

real 2m16.632s
user 2m12.080s
sys 0m3.756s

real 0m13.289s
user 0m9.661s
sys 0m3.600s


Best,

   -Nikolaus

--
 »Time flies like an arrow, fruit flies like a Banana.«

  PGP fingerprint: 5B93 61F8 4EA2 E279 ABF6  02CF A9AD B7F8 AE4E 425C

_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Reduce database file size

by Roger Binns :: Rate this Message:

| View Threaded | Show Only this Message

Nikolaus Rath wrote:
> I see. I gues I got confused by the help that refers to the parameter as
> 'cmd':

The help shows that a list of TABLE is taken in the same syntax as
SQLite's shell uses.  Also note that the help is formatted for
documenting interactive usage rather than API usage.

> That works, thanks! But why is it *that* slow?

Because it is written in Python and SQLite's is in C.  I've made no
effort to optimise the Python - please send me a sample database and
I'll see what can be done.

Additionally the Python supports more functionality which does have an
effect on speed.  For example you can choose what encoding is used.

Roger
_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Reduce database file size

by Edzard Pasma-2 :: Rate this Message:

| View Threaded | Show Only this Message

If the subject is transferring data from a web server to a central  
database, I like to refer to a feature of an open source package that  
I published myself. It is only a side-side feature, resulting from  
the general design, but it was added with just this in mind. This  
feature, 'Connecting over the internet', is described in http://
packages.python.org/sqmediumlite/#5.4 (this is only for Python users).

Op 12-apr-2010, om 6:57 heeft Roger Binns het volgende geschreven:

> Nikolaus Rath wrote:
>> I see. I gues I got confused by the help that refers to the  
>> parameter as
>> 'cmd':
>
> The help shows that a list of TABLE is taken in the same syntax as
> SQLite's shell uses.  Also note that the help is formatted for
> documenting interactive usage rather than API usage.
>
>> That works, thanks! But why is it *that* slow?
>
> Because it is written in Python and SQLite's is in C.  I've made no
> effort to optimise the Python - please send me a sample database and
> I'll see what can be done.
>
> Additionally the Python supports more functionality which does have an
> effect on speed.  For example you can choose what encoding is used.
>
> Roger
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@...
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Re: Reduce database file size

by Eduardo Morras-3 :: Rate this Message:

| View Threaded | Show Only this Message

On Sun, 11 Apr 2010 14:09:21 -0400
Nikolaus Rath <Nikolaus@...> wrote:

> Using rsync is unfortunately not possible, I'm stuck with HTTP and
> FTP.
>
> I tried a few PPM compressors, but even though the compression ratio
> is amazing, I'm not desperate enough to invest *that* much CPU time
> yet.
>
> I may take at smaz though, thanks for the pointer.
>


You can try bbb and paq8. You can find them at
http://cs.fit.edu/~mmahoney/compression/ and other compressors.
Please, note that they are very slow (depending on options it can take
hours)

At http://mattmahoney.net/dc/text.html you can look for more
compression programs.

PeaZip has a gui for some of this
compression programs http://peazip.sourceforge.net/ and works under
windows and linux (but hasn't bbb).
_______________________________________________
sqlite-users mailing list
sqlite-users@...
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users