re[5]: List construction for 'pivoting' list data

by Kevin Day-7

It looks like the lists returned by GroupingList aren't EventLists, so the CollectionList->GroupingList strategy may not work.  The docs for GroupingList say that additional transformations can be applied to the GroupingList.  It would probably be a good idea to add a clarifying note that it is not possible to transform the sub-elements (the individual group lists) of the GroupingList.
 
Bummer.
 
- K
----------------------- Original Message -----------------------
  
From: Kevin Day kevin@...
To: Glazed Lists users@...
Cc: 
Date: Wed, 20 May 2009 17:46:07 -0700
Subject: re[4]: List construction for 'pivoting' list data
  
I like the term 'tag' a lot.  That applies very well to my situation.
 
I'm still struggling a little bit with how what I'm trying to do matches to a true pivot table (except as a degenerate case with only one column, result sets with a single entry, and a summarizing function of 'First').  I'm wondering if what I'm trying to do might not be as complex as a full-on pivot table (after all, I'm not trying to do any summarizing, nor support multiple columns)...  Maybe I'm missing something subtle where the type of behavior I'm looking for could easily be expanded to full-on pivot functionality?
 
I'm kind of viewing things as a two level tree structure that can be viewed in two different ways:
 
Data\Tags
 
or
 
Tag\Data (plural)
 
 
and needing an efficient way of getting at both representations.  In fact, an email viewer like GMail would be a good way to think about this.  Sorting on the 'tags' column would cause the tag values to expand and group by individual tag.  Sorting on the 'Reference' column would cause the tags column to collapse into a multi-value representation, and things would be grouped by reference.
 
I'm not really using different tags for different columns/rows/etc...  I'm really just trying to exchange the order of the tree.
 
 
Right now, my Data object has the tags already intrinsically linked to it.  Perhaps the pivot table idea would be more applicable if the tags were captured in a separate data structure.  I'll have to think about that one a bit - so far, the current design has been natural, but I could see an alternate approach that consists of a list of Data/Tag tuples.  The pivot table would then provide a Data\Tags view or Tag\Data view, with the summarizing function being a list concatenation operation.  It would be pretty trivial at that point to apply a Calculation against the resulting list for folks who wanted true pivot table functionality.  Interesting.
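
The "list of Data/Tag tuples" idea above, with list concatenation as the summarizing function, could be sketched roughly as follows. This is plain Java rather than Glazed Lists code, it is a one-shot computation (nothing here stays live as the source changes), and the names `DataTag`, `TupleViews`, and `view` are hypothetical; data and tags are both assumed to be Strings for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical tuple: one (data, tag) association.
final class DataTag {
    final String data;
    final String tag;

    DataTag(String data, String tag) {
        this.data = data;
        this.tag = tag;
    }
}

final class TupleViews {
    // Groups tuples by rowOf(tuple) and concatenates cellOf(tuple) into each row's list.
    // TreeMap is used only so that rows come out in sorted order.
    static Map<String, List<String>> view(List<DataTag> tuples,
                                          Function<DataTag, String> rowOf,
                                          Function<DataTag, String> cellOf) {
        Map<String, List<String>> rows = new TreeMap<>();
        for (DataTag t : tuples) {
            rows.computeIfAbsent(rowOf.apply(t), k -> new ArrayList<>())
                .add(cellOf.apply(t));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<DataTag> tuples = Arrays.asList(
                new DataTag("OBJ1", "ABC"), new DataTag("OBJ1", "DEF"),
                new DataTag("OBJ2", "ABC"), new DataTag("OBJ3", "DEF"));
        // Data\Tags view: each data item with its tags.
        System.out.println(view(tuples, t -> t.data, t -> t.tag));
        // Tag\Data view: each tag with its data items.
        System.out.println(view(tuples, t -> t.tag, t -> t.data));
    }
}
```

The same helper produces both tree orders; only the two accessor functions are swapped. A real pivot would replace the list concatenation with whatever summarizing function is wanted.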
 
 
Thanks again for taking the time to discuss - it helps me to explore different ideas.
 
- K
 
----------------------- Original Message -----------------------
  
From: James Lemieux jplemieux@...
To: users@...
Cc: 
Date: Wed, 20 May 2009 14:14:24 -0700
Subject: Re: re[2]: List construction for 'pivoting' list data
  
One suggestion for a terminology improvement:

Keys / KeyedData, etc is ok. "Key" is a very reasonable term.

That said, I think "tag", as popularized by Flickr or "label" as popularized by GMail, may be the best term to use. It reinforces the idea that any data item may have "multiple tags" which allow it to participate in multiple "views" (aka pivot table cells).

Traditional pivot tables typically assume that any given data item can only participate in one "view" (i.e. one cell in the pivot table), but you've said you are after something even more generic than that.

I can imagine a FunctionList<MyDataItem, TaggedData<MyDataItem>> where TaggedData is:

public class TaggedData<E> {
   private final Collection<Object> tags;
   private final E data;
}

and at that point it starts to become hazy as to what optimal data structures would be for creating the 2D grid based on the tags... but this is where I think you have to ask the API user what they want via an interface. (i.e. which tags constitute columns, which constitute rows, and in what order would you like each of them to appear)

With that hazy part in place, you could then return to the world of GL as much or as little as you like when coming up with the data structure that backs each "cell in the pivot table". Perhaps it's a simple FilterList<TaggedData>.... but I'm not sure if that will scale as well as you'd like, since all cells in the grid would be reacting to each and every ListEvent... and you'd probably want to be smarter about that part.

Overall, it seems like a really well-defined fun problem!

James

On Wed, May 20, 2009 at 1:38 PM, Kevin Day <kevin@...> wrote:
Yeah, I hear you - the type of thing I'm going for isn't exactly a pivot table, it's more of a CollectionList inversion, I guess - or maybe a multi-valued grouping list where a single element in the source can wind up in multiple groups??  Not sure exactly what to call it.
 
As for API design for something like this, I envision something very similar to GroupingList, but instead of passing in a Comparator, the user would pass in an object that would provide the keys for a given list element (similar to CollectionList.Model, I guess).  That's easy enough.  But I'm struggling a bit with what the internal data structures of such a beast would look like.
 
I suppose creating another list with objects that hold the key and the original element is an obvious first step.  Then grouping that other list by key.
 
This seems to open another avenue of solution here using existing GL constructs (I know you recommended against it, but I can't help it)...  I could use a CollectionList model to create FunctionLists that transform each key list into a list of intermediate objects (holding the key and the original element).  Then put that into a GroupingList with a comparator that pulls the key value out of the intermediate object.
 
I think my pipeline would be:
 
EventList<DataWithKeys> -> CollectionList<DataWithKeys, KeyedData> -> GroupingList<KeyedData>
 
and the model for the CollectionList would be:
 
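public List<KeyedData> getChildren(DataWithKeys parent){
    // The function remembers the parent so each key can be paired with it.
    KeyedDataFunction f = new KeyedDataFunction(parent);
    // The FunctionList's source is the parent's key list, so it maps Key -> KeyedData.
    return new FunctionList<Key, KeyedData>(parent.getKeys(), f);
}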
 
and KeyedDataFunction would be:
 
public KeyedData evaluate(Key sourceValue){
    return new KeyedData(data, sourceValue);
}
 
 
 
Any comments on that type of approach?  (And if you still think it would be better to construct something from scratch, please definitely let me know).
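
The flatten-then-group shape of that pipeline can be tried out without Glazed Lists at all. The sketch below is plain Java and computes the result once instead of keeping it live, so it says nothing about ListEvent behaviour. The class names `SourceItem`, `KeyedPair`, and `KeyedPairSketch` are hypothetical stand-ins for DataWithKeys and KeyedData, and keys are assumed to be Strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for DataWithKeys: a name plus its keys.
final class SourceItem {
    final String name;
    final List<String> keys;

    SourceItem(String name, String... keys) {
        this.name = name;
        this.keys = Arrays.asList(keys);
    }
}

// Hypothetical stand-in for KeyedData: one key and the element it came from.
final class KeyedPair {
    final String key;
    final SourceItem data;

    KeyedPair(String key, SourceItem data) {
        this.key = key;
        this.data = data;
    }
}

final class KeyedPairSketch {
    // Step 1: expand every element into one pair per key (the CollectionList role).
    static List<KeyedPair> flatten(List<SourceItem> source) {
        List<KeyedPair> pairs = new ArrayList<>();
        for (SourceItem item : source) {
            for (String key : item.keys) {
                pairs.add(new KeyedPair(key, item));
            }
        }
        return pairs;
    }

    // Step 2: group the pairs by key (the GroupingList role); TreeMap keeps keys sorted.
    static Map<String, List<String>> groupByKey(List<KeyedPair> pairs) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (KeyedPair p : pairs) {
            groups.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.data.name);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<SourceItem> allData = Arrays.asList(
                new SourceItem("OBJ1", "ABC", "DEF", "GHI"),
                new SourceItem("OBJ2", "ABC", "GHI"),
                new SourceItem("OBJ3", "DEF"));
        System.out.println(groupByKey(flatten(allData)));
    }
}
```

With the three sample objects from the original post, the groups come out as ABC with OBJ1 and OBJ2, DEF with OBJ1 and OBJ3, and GHI with OBJ1 and OBJ2, which is the desired result list.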
 
- K
 
----------------------- Original Message -----------------------
  
From: James Lemieux jplemieux@...
Cc: 
Date: Wed, 20 May 2009 12:27:31 -0700
Subject: Re: List construction for 'pivoting' list data
  
If I was going to design a Pivot Solution to work with GL, I don't think I'd use too many existing GL transformations to accomplish it. It may seem tempting, but the potential to have a flood of ListEvents running everywhere is great.

It might be best to simply start with a ListEventListener and write your own logic for generating a pivot table from a raw list. If you want to make it generic, design an interface that you expect API users to implement which provides the tools you need to make decisions about the pivot data. e.g.: how many rows / columns will my pivot table have? what are the criteria by which data is judged for membership in each of those columns / rows?... these are questions you may want to consult a user implementation for....

Just a thought...

James

On Wed, May 20, 2009 at 8:50 AM, Kevin Day <kevin@...> wrote:
Hi all-
 
If anyone has any thoughts on this challenge, please share.
 
We have a list of data objects that themselves contain lists of keys.  We need to transform this list into another list of lists, but grouped by each key value (note that a single Data object is associated with a given key only once, but could be associated with multiple keys).
 
 
To make this concrete, here's the source list:
 
EventList<Data> allData = {
    OBJ1+{ABC, DEF, GHI},
    OBJ2+{ABC, GHI},
    OBJ3+{DEF}
    }
 
and here's the desired result list:
 
EventList<Data> dataForKeys={
    ABC+{OBJ1, OBJ2},
    DEF+{OBJ1, OBJ3},
    GHI+{OBJ1, OBJ2}
    }
 
 
We will probably have 500-2000 data objects, and 500-1500 different keys.  Each data object will probably have between 1 and 3 keys associated with it, although it could be as high as 500 in some instances.
 
So far, I've come up with two approaches:
 
Approach 1:
Use a CollectionList against allData to get the list of keys, then apply UniqueList to get individual keys.  Then apply a FunctionList against the list of keys that creates a FilterList for each key (filtering allData by the key passed into the Function).  This approach has two issues (that I see, anyway).  First, there is an odd dependency in here that will probably require a related subject or listener to be registered with the publisher.  I haven't tracked this down yet (if anyone has any quick pointers, I'd appreciate it).  The test harness I put together yesterday evening definitely has problems with the order in which list updates occur.  The bigger problem that I see with this approach is that we could wind up with a whole lot of FilterLists that are filtering down to a very small subset of the original data set.  Many of the FilterLists will be returning just a single element.  The thought of having 500 FilterLists all running at the same time gives me an uncomfortable feeling (but maybe I'm concerned over nothing?)
 
Approach 2:
Some sort of map-based approach.  For this to work, I'd need to have a multi-map implementation that would allow for a list of keys to be specified (instead of a single key).  And that list would have to be an EventList.  Then the list returned from the multi-map would also have to be an EventList.  The issue with this approach is that it's not even remotely close to practical given the current GL implementations.  I could see trying to expand the GL map wrapper so that it returns live lists (I think the obstacle here is that delete operations from the source list do not contain the deleted element, so it could be a right bugger to figure out how to remove elements from the mapped lists).  Support for multi-keys would also require information about the removed element.  If this approach were possible, I think that it would certainly be performant.
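
One possible way around the delete obstacle described in Approach 2 is for the multi-map to keep its own shadow copy of the source, so that a delete reported only by index can still be resolved to the removed element, at the cost of some extra memory. The sketch below is plain Java and does not use the Glazed Lists API. `Item`, `ShadowedMultiMap`, `inserted`, and `deleted` are hypothetical names, and the two methods only stand in for what a listener on the source list would do for insert and delete events.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical element type: a name plus the keys it is filed under.
final class Item {
    final String name;
    final List<String> keys;

    Item(String name, String... keys) {
        this.name = name;
        this.keys = Arrays.asList(keys);
    }
}

// Keeps key -> items up to date from index-based insert/delete notifications.
final class ShadowedMultiMap {
    // Mirror of the source list, so a delete-by-index can recover the removed element.
    private final List<Item> shadow = new ArrayList<>();
    private final Map<String, List<Item>> byKey = new LinkedHashMap<>();

    // Called when the source reports an insert at the given index.
    void inserted(int index, Item item) {
        shadow.add(index, item);
        for (String key : item.keys) {
            byKey.computeIfAbsent(key, k -> new ArrayList<>()).add(item);
        }
    }

    // Called when the source reports a delete; only the index is known.
    void deleted(int index) {
        Item removed = shadow.remove(index);
        for (String key : removed.keys) {
            List<Item> group = byKey.get(key);
            group.remove(removed);   // identity match: Item does not override equals
            if (group.isEmpty()) {
                byKey.remove(key);   // drop keys that no longer have any items
            }
        }
    }

    // Names currently filed under a key, in insertion order (empty if the key is unknown).
    List<String> namesFor(String key) {
        List<String> names = new ArrayList<>();
        for (Item item : byKey.getOrDefault(key, new ArrayList<>())) {
            names.add(item.name);
        }
        return names;
    }

    int keyCount() {
        return byKey.size();
    }

    public static void main(String[] args) {
        ShadowedMultiMap map = new ShadowedMultiMap();
        map.inserted(0, new Item("OBJ1", "ABC", "DEF", "GHI"));
        map.inserted(1, new Item("OBJ2", "ABC", "GHI"));
        map.inserted(2, new Item("OBJ3", "DEF"));
        map.deleted(0); // OBJ1 leaves every group it was in
        System.out.println(map.namesFor("ABC") + " " + map.namesFor("DEF"));
    }
}
```

Each insert or delete touches only the groups for that element's own keys, rather than every per-key list. The sketch ignores updates in place and does not publish change events of its own, both of which a live EventList-backed version would need.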
 
 
Are there other approaches that may be better (or just different) than these two?
 
Thanks in advance,
 
- Kevin
--------------------------------------------------------------------- To unsubscribe, e-mail: users-unsubscribe@... For additional commands, e-mail: users-help@...