Overriding one database's content with another

View: New views
9 Messages — Rating Filter:   Alert me  

Overriding one database's content with another

by Blanky-kun :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm currently developing a Java version of Mids' Hero Designer
<http://www.cohplanner.com/>, using Derby as my data storage/retrieval.
The data in the database is representative of the archetypes, powersets,
powers, and enhancements in the online game City of Heroes
<http://www.cityofheroes.com/>. Updates to the program would include
updated values in the database to reflect changes to the game.

One of the features I'd like for the program is to allow user
customization of the database without interfering with the "official"
values, and without updates to the official values interfering with user
changes. User customization would be used, for example, to input
upcoming changes to the game by the user, before an official program
update is created. Alternatively, as a way to show the effects of
hypothetical changes when making a suggestion for the game.

My idea for achieving this was to maintain two databases: 'Data',
storing the official information, and 'User_Data', storing the
user-generated changes from Data. What I want is for any SELECT
statement to be passed to both DBs; results in User_Data which are not
in Data would be returned (information added by the user), results in
Data which are not in User_Data would be returned (information not
touched by the user), and results in both databases would return the
version from User_Data (information modified by the user).

The problem is, I'm not exactly certain how to go about it. I suppose I
need to create some sort of abstraction layer between my queries and
that actual database access, but how would I combine the ResultSets?

Re: Overriding one database's content with another

by Rick Hillegas-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Brian,

Combining data from two databases can be done by using database
procedures or table functions, but you will probably get better
performance if you can just maintain two separate schemas in the same
database.

What you are trying to accomplish sounds a lot like an outer join. You
might want to read about this kind of join here:
http://en.wikipedia.org/wiki/Join_%28SQL%29

The following script may help you understand more about how to use outer
joins:

connect 'jdbc:derby:memory:dummy;create=true';

create schema data;
create schema user_data;

create table data.t( a int );
create table user_data.t( a int );

insert into data.t( a ) values ( 1 ), ( 2 );
insert into user_data.t( a ) values ( 2 ), ( 3 );

  select *
  from data.t left outer join user_data.t
  on data.t.a = user_data.t.a
union
  select *
  from data.t right outer join user_data.t
  on data.t.a = user_data.t.a
;

Hope this helps,
-Rick

Brian Shields wrote:

> I'm currently developing a Java version of Mids' Hero Designer
> <http://www.cohplanner.com/>, using Derby as my data
> storage/retrieval. The data in the database is representative of the
> archetypes, powersets, powers, and enhancements in the online game
> City of Heroes <http://www.cityofheroes.com/>. Updates to the program
> would include updated values in the database to reflect changes to the
> game.
>
> One of the features I'd like for the program is to allow user
> customization of the database without interfering with the "official"
> values, and without updates to the official values interfering with
> user changes. User customization would be used, for example, to input
> upcoming changes to the game by the user, before an official program
> update is created. Alternatively, as a way to show the effects of
> hypothetical changes when making a suggestion for the game.
>
> My idea for achieving this was to maintain two databases: 'Data',
> storing the official information, and 'User_Data', storing the
> user-generated changes from Data. What I want is for any SELECT
> statement to be passed to both DBs; results in User_Data which are not
> in Data would be returned (information added by the user), results in
> Data which are not in User_Data would be returned (information not
> touched by the user), and results in both databases would return the
> version from User_Data (information modified by the user).
>
> The problem is, I'm not exactly certain how to go about it. I suppose
> I need to create some sort of abstraction layer between my queries and
> that actual database access, but how would I combine the ResultSets?


Re: Overriding one database's content with another

by Blanky-kun :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Rick Hillegas wrote:

> Hi Brian,
>
> Combining data from two databases can be done by using database
> procedures or table functions, but you will probably get better
> performance if you can just maintain two separate schemas in the same
> database.
>
> What you are trying to accomplish sounds a lot like an outer join. You
> might want to read about this kind of join here:
> http://en.wikipedia.org/wiki/Join_%28SQL%29

Thanks for the help, that looks exactly like what I need. However, I've got two
questions:
1) Using two schemas in one database, how does Derby store the filesystem?
Currently, I've got a Data directory, containing a couple files, a log folder,
and a seg0 folder with many data files. If the two schemas are stored separately
(or in separate subfolders), adding database updates to my program should be
trivial. If the data from the two schemas is merged into the same data files,
updating the non-user data would be more tricky.
2) I don't think it would come up, but it doesn't seem that there's any
possibility with this method to detect if a row has been removed by the user. If
Data contains row 'foo', and the user deletes 'foo', this method seems that it
would return 'foo' from Data, and a set of NULLs in the corresponding User_Data,
which is the same result as if the user had done nothing.

Of course, as I write this, I think I may have come up with a solution to that
problem, as well; adding an extra column to the User_Data (boolean, likely) to
indicate if the row should be deleted. Does that seem reasonable?


Re: Overriding one database's content with another

by Rick Hillegas-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Brian,

Some responses inline...

Brian Shields wrote:

> Rick Hillegas wrote:
>  
>> Hi Brian,
>>
>> Combining data from two databases can be done by using database
>> procedures or table functions, but you will probably get better
>> performance if you can just maintain two separate schemas in the same
>> database.
>>
>> What you are trying to accomplish sounds a lot like an outer join. You
>> might want to read about this kind of join here:
>> http://en.wikipedia.org/wiki/Join_%28SQL%29
>>    
>
> Thanks for the help, that looks exactly like what I need. However, I've got two
> questions:
> 1) Using two schemas in one database, how does Derby store the filesystem?
> Currently, I've got a Data directory, containing a couple files, a log folder,
> and a seg0 folder with many data files. If the two schemas are stored separately
> (or in separate subfolders), adding database updates to my program should be
> trivial. If the data from the two schemas is merged into the same data files,
> updating the non-user data would be more tricky.
>  
You'll see one file per table in the seg0 folder. The schemas will not
share data files.

> 2) I don't think it would come up, but it doesn't seem that there's any
> possibility with this method to detect if a row has been removed by the user. If
> Data contains row 'foo', and the user deletes 'foo', this method seems that it
> would return 'foo' from Data, and a set of NULLs in the corresponding User_Data,
> which is the same result as if the user had done nothing.
>
> Of course, as I write this, I think I may have come up with a solution to that
> problem, as well; adding an extra column to the User_Data (boolean, likely) to
> indicate if the row should be deleted. Does that seem reasonable?
>  
Sounds reasonable to me. You'll have to track history yourself; the
outer joins won't do that job for you.

Regards,
-Rick


Re: Overriding one database's content with another

by Blanky-kun :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Rick Hillegas <Richard.Hillegas@...> writes:
> > 1) Using two schemas in one database, how does Derby store the filesystem?
> > Currently, I've got a Data directory, containing a couple files, a log
> > folder, and a seg0 folder with many data files. If the two schemas are
> > stored separately (or in separate subfolders), adding database updates to
> > my program should be trivial. If the data from the two schemas is merged
> > into the same data files, updating the non-user data would be more tricky.
> >
> You'll see one file per table in the seg0 folder. The schemas will not
> share data files.

So I've successfully got two schemas in my database: data and user_data. Both
have 19 tables, and each corresponding table has the same columns, with the
exception that the tables in user_data have a column 'delete_row' at the end.
All 38 tables have a single index. Aside from the two schemas I've created, the
following exist in my database, according to RazorSQL
<http://www.razorsql.com/>:
Schema SQLJ contains 3 procedures: install_jar, remove_jar, and replace_jar.
Schema SYS contains 19 system tables and 42 indices.
Schema SYSCS_UTIL contains 26 procedures.
Schema SYSIBM contains 1 system table (sysdummy1) and 22 procedures.

All told, that's 58 tables (20 are system tables), 51 procedures, and 80
indices. My seg0 folder contains 139 data files, which neither matches the total
number of tables in my database, nor the total number of items (189). So, I'm
not exactly sure which files represent what, and there's no noticeable naming
scheme to the DAT files I can go off of.

Most importantly, I don't want to overwrite the files representing user_data
tables when I push an update, so if worst comes to worst, I could
programatically delete the user_data tables from *my* copy before pushing an
update, and I would be overwriting the data and system
tables/procedures/whatever.

Ideally, I would want to only overwrite the data schema tables, as that would be
a smaller download, but it also means I need a way of identifying which files
correspond to what, but I can't exactly read them, so it's a problem.

In short: Is there any way to identify what data files in the seg0 folder
correspond to what tables, and if so what is it? Or do I just have to suck it up
and lop off my user_data before pushing an update?


Re: Overriding one database's content with another

by Bryan Pendleton :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> not exactly sure which files represent what, and there's no noticeable naming
> scheme to the DAT files I can go off of.

The naming scheme is based off the conglomerate ID, which can be used to
cross-reference back to the system catalogs.

This should help you get the hang of figuring out which file goes with
which conglomerate, and hence with which table/index:
http://db.apache.org/derby/javadoc/engine/org/apache/derby/diag/SpaceTable.html

thanks,

bryan


Re: Overriding one database's content with another

by Blanky-kun :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Bryan Pendleton <bpendleton@...> writes:
> > not exactly sure which files represent what, and there's no noticeable naming
> > scheme to the DAT files I can go off of.
>
> The naming scheme is based off the conglomerate ID, which can be used to
> cross-reference back to the system catalogs.
>
> This should help you get the hang of figuring out which file goes with
> which conglomerate, and hence with which table/index:
> http://db.apache.org/derby/javadoc/engine/org/apache/derby/diag/SpaceTable.html

The vast majority of the CONGLOMERATEIDs for my tables contain either c16f or
c20e, and end in c00 or c30. My filenames range from c1a1 to cf0, so unless you
can tell me how the filenames are generated from the CONGLOMERATEID, I don't see
how that solves my problem...?


Re: Overriding one database's content with another

by Kristian Waagan-4 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Brian Shields wrote:

> Bryan Pendleton <bpendleton@...> writes:
>  
>>> not exactly sure which files represent what, and there's no noticeable naming
>>> scheme to the DAT files I can go off of.
>>>      
>> The naming scheme is based off the conglomerate ID, which can be used to
>> cross-reference back to the system catalogs.
>>
>> This should help you get the hang of figuring out which file goes with
>> which conglomerate, and hence with which table/index:
>> http://db.apache.org/derby/javadoc/engine/org/apache/derby/diag/SpaceTable.html
>>    
>
> The vast majority of the CONGLOMERATEIDs for my tables contain either c16f or
> c20e, and end in c00 or c30. My filenames range from c1a1 to cf0, so unless you
> can tell me how the filenames are generated from the CONGLOMERATEID, I don't see
> how that solves my problem...
Hi Brian,

I think what Bryan meant was the conglomerate number. See in
SYS.SYSCONGLOMERATES.
You may also want to filter based on the schema id etc, and you have to
join data from SYS.SYSTABLES if you want to target a specific table/index.
The conglomerate number is represented as an integer in the system
tables. To get the corresponding file, convert the integer to
hexadecimal, prefix a 'c' and append '.dat'.

Remember that indexes are stored in their own separate file (also
recorded in the system tables).
I've never tried to update parts of a database in this way, let us know
how it works out :)


Regards,
--
Kristian

Re: Overriding one database's content with another

by Blanky-kun :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Kristian Waagan <Kristian.Waagan@...> writes:
> The conglomerate number is represented as an integer in the system
> tables. To get the corresponding file, convert the integer to
> hexadecimal, prefix a 'c' and append '.dat'.
>
> Remember that indexes are stored in their own separate file (also
> recorded in the system tables).

Thank you, this is exactly what I was looking for!

As for the indices, I don't think they're ever changing, but my plan was to grab
all of the conglomerates corresponding to the data schema, and the indices would
be a part of that. I don't mind 38 vs. 19 files, it was the 139 vs. 19 that was
problematic. I'd rather not needlessly overwrite a bunch of system information
for the database, and I'd rather not have to programatically drop the user_data
tables before pushing out a build of the program just to ensure the users don't
get their settings changed.