Cluster Comparison


Cluster Comparison

by Curtis Jensen-2

Does Weka have cluster comparison capabilities?
For example, if I run two different clustering algorithms on the same
data set, is there a way to compare and contrast the clusters
generated from the different algorithms?

Sorry if this is an inappropriate place to ask a general data mining question.

--
Curtis


_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Cluster Comparison

by Peter Reutemann-3

> Does Weka have cluster comparison capabilities?
> For example, if I run two different clustering algorithms on the same
> data set, is there a way to compare and contrast the clusters
> generated from the different algorithms?
>
> Sorry if this is an inappropriate place to ask a general data mining question.

Apart from visualizing the cluster assignments from both runs, I don't
think you can do much else.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


RE: [Wekalist] Cluster Comparison

by Hongbo Du

Hi there

I have done something similar before using the 3.5.6 developer version of Weka. In that version, there is a filter known as Add Cluster, which allows you to call a clustering algorithm and save the clustering result while ignoring one attribute (which can be your previous cluster assignments). After applying the filter, you will see the previous cluster result (the ignored attribute) and the new assignment side by side. Of course, you could then save the file for later use. However, this filter cannot be used in the new 3.6.x versions.

Am I correct, Peter?

Cheers

Hongbo

-----Original Message-----
From: wekalist-bounces@... [mailto:wekalist-bounces@...] On Behalf Of Curtis Jensen
Sent: 21 October 2009 17:11
To: Weka machine learning workbench list.
Subject: [Wekalist] Cluster Comparison


Does Weka have cluster comparison capabilities?
For example, if I run two different clustering algorithms on the same
data set, is there a way to compare and contrast the clusters
generated from the different algorithms?

Sorry if this is an inappropriate place to ask a general data mining question.

--
Curtis



Re: RE: [Wekalist] Cluster Comparison

by Peter Reutemann-3

> I have done something similar before using the 3.5.6 developer version of Weka. In that version, there is a filter known as Add Cluster, which allows you to call a clustering algorithm and save the clustering result while ignoring one attribute (which can be your previous cluster assignments). After applying the filter, you will see the previous cluster result (the ignored attribute) and the new assignment side by side. Of course, you could then save the file for later use. However, this filter cannot be used in the new 3.6.x versions.

Huh? Why shouldn't you be able to use the filter in 3.6.x? Did you
maybe not *unset* the class attribute in the Explorer's preprocess
panel? Remember, clusterers *cannot* process datasets with a class
attribute set, and the preprocess panel automatically uses the last
attribute as the class attribute if a dataset format doesn't support
specifying one (ARFF and CSV don't, XRFF does).

BTW, when you visualize the cluster assignments, you can save them to
a file as well.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: RE: [Wekalist] Cluster Comparison

by Curtis Jensen-2

On Thu, Oct 22, 2009 at 2:30 AM, Hongbo Du <hongbo.du@...> wrote:

> Hi there
>
> I have done something similar before using the 3.5.6 developer version of Weka. In that version, there is a filter known as Add Cluster, which allows you to call a clustering algorithm and save the clustering result while ignoring one attribute (which can be your previous cluster assignments). After applying the filter, you will see the previous cluster result (the ignored attribute) and the new assignment side by side. Of course, you could then save the file for later use. However, this filter cannot be used in the new 3.6.x versions.
>
> Am I correct, Peter?
>
> Cheers
>
> Hongbo
>
That is a good idea.  Thanks.

--
Curtis



Re: RE: [Wekalist] Cluster Comparison

by Curtis Jensen-2

On Thu, Oct 22, 2009 at 2:30 AM, Hongbo Du <hongbo.du@...> wrote:
> Hi there
>
> I have done something similar before using the 3.5.6 developer version of Weka. In that version, there is a filter known as Add Cluster, which allows you to call a clustering algorithm and save the clustering result while ignoring one attribute (which can be your previous cluster assignments). After applying the filter, you will see the previous cluster result (the ignored attribute) and the new assignment side by side. Of course, you could then save the file for later use. However, this filter cannot be used in the new 3.6.x versions.
>
> Am I correct, Peter?
>
> Cheers
>
> Hongbo

Can I just use the "Ignore attributes" button, instead of the filter,
to ignore the cluster attribute (Weka 3.6.1)?

Thanks,
Curtis



Re: RE: [Wekalist] Cluster Comparison

by Peter Reutemann-3

>> I have done something similar before using the 3.5.6 developer version of Weka. In that version, there is a filter known as Add Cluster, which allows you to call a clustering algorithm and save the clustering result while ignoring one attribute (which can be your previous cluster assignments). After applying the filter, you will see the previous cluster result (the ignored attribute) and the new assignment side by side. Of course, you could then save the file for later use.
>
> Can I just use the "Ignore attributes" button, instead of the filter,
> to ignore the cluster attribute (Weka 3.6.1)?

Hongbo was outlining an approach of how to compare *two* cluster runs
on the same data. Each run of the filter adds a new attribute with the
cluster assignments. But in the second run, you need to exclude the
assignments of the first run (they shouldn't influence the cluster
model being built). You can then inspect the generated cluster
assignments either in Weka or export the file as CSV and then use a
spreadsheet application.

Using the Explorer's cluster panel, you don't get the cluster
assignments of two runs in a *single* file.
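
As a rough illustration of the spreadsheet step, the following Python
sketch cross-tabulates two cluster-assignment columns from an exported
CSV file. The column names `cluster_run1` and `cluster_run2` and the
sample rows are made up for the example; the attribute names in a real
export may differ.

```python
import csv
import io
from collections import Counter


def cross_tabulate(csv_text, col_a, col_b):
    """Count how often each pair of cluster labels occurs on the same row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row[col_a], row[col_b]) for row in reader)


# Hypothetical export: two numeric attributes plus one label column per run.
SAMPLE = """x,y,cluster_run1,cluster_run2
1.0,1.1,cluster1,cluster2
0.9,1.0,cluster1,cluster2
5.0,5.2,cluster2,cluster1
5.1,4.9,cluster2,cluster1
3.0,3.0,cluster1,cluster1
"""

table = cross_tabulate(SAMPLE, "cluster_run1", "cluster_run2")
for (a, b), n in sorted(table.items()):
    print(f"run1={a} run2={b} count={n}")
```

If most instances of each run-1 cluster fall into a single run-2
cluster, the two runs found roughly the same groups, possibly under
different labels.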

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: RE: [Wekalist] Cluster Comparison

by Curtis Jensen-2

On Fri, Oct 23, 2009 at 3:19 PM, Peter Reutemann <fracpete@...> wrote:

>>> I have done something similar before using the 3.5.6 developer version of Weka. In that version, there is a filter known as Add Cluster, which allows you to call a clustering algorithm and save the clustering result while ignoring one attribute (which can be your previous cluster assignments). After applying the filter, you will see the previous cluster result (the ignored attribute) and the new assignment side by side. Of course, you could then save the file for later use.
>>
>> Can I just use the "Ignore attributes" button, instead of the filter,
>> to ignore the cluster attribute (Weka 3.6.1)?
>
> Hongbo was outlining an approach of how to compare *two* cluster runs
> on the same data. Each run of the filter adds a new attribute with the
> cluster assignments. But in the second run, you need to exclude the
> assignments of the first run (they shouldn't influence the cluster
> model being built). You can then inspect the generated cluster
> assignments either in Weka or export the file as CSV and then use a
> spreadsheet application.
>
> Using the Explorer's cluster panel, you don't get the cluster
> assignments of two runs in a *single* file.
>
I understand getting both cluster assignments in the same file.  I'm
wondering about the filter part.  It sounds like Hongbo suggested
using a filter to remove the cluster attribute from the first run so
the second cluster run doesn't use the cluster attribute from the
first run.  The output would then have both cluster attributes.

I'm wondering if the "Ignore attribute" option can be used to have the
second run ignore the cluster attribute instead of using the filter?
Does the "Ignore attribute" accomplish the same thing as the filter?

Thanks,
Curtis



Re: RE: [Wekalist] Cluster Comparison

by Peter Reutemann-3

> I understand getting both cluster assignments in the same file.  I'm
> wondering about the filter part.  It sounds like Hongbo suggested
> using a filter to remove the cluster attribute from the first run so
> the second cluster run doesn't use the cluster attribute from the
> first run.  The output would then have both cluster attributes.

He was talking about the AddCluster filter (package
weka.filters.unsupervised.attribute), which takes a cluster algorithm
as parameter and also a parameter for ignoring attributes (i.e., the
attribute that got added after the first run of this filter). You
don't need the cluster panel at all.
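
A minimal sketch of the two filter runs, written as a Python helper
that only assembles the command lines (it does not execute them). It
assumes weka.jar is on the CLASSPATH, that the filter accepts `-W` for
the clusterer specification and `-I` for the attribute range to ignore,
and that the attribute added by the first run ends up as the last one.
The file names and the choice of SimpleKMeans and EM with 3 clusters
are hypothetical.

```python
import shlex


def add_cluster_command(clusterer_spec, in_file, out_file, ignore=None):
    """Build a java command line for one AddCluster run (not executed here)."""
    cmd = [
        "java", "weka.filters.unsupervised.attribute.AddCluster",
        "-W", clusterer_spec,  # clusterer class plus its own options
        "-i", in_file,         # input dataset
        "-o", out_file,        # output dataset with the added attribute
    ]
    if ignore:
        # -I: range of attributes the clusterer should not see
        cmd += ["-I", ignore]
    return cmd


# Run 1 clusters the raw data; run 2 ignores the attribute appended by run 1.
run1 = add_cluster_command("weka.clusterers.SimpleKMeans -N 3",
                           "data.arff", "run1.arff")
run2 = add_cluster_command("weka.clusterers.EM -N 3",
                           "run1.arff", "run2.arff", ignore="last")

for cmd in (run1, run2):
    print(shlex.join(cmd))
```

The resulting run2.arff would then hold both cluster attributes side by
side, which is the single-file comparison described earlier in the
thread.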

> I'm wondering if the "Ignore attribute" option can be used to have the
> second run ignore the cluster attribute instead of using the filter?

It can.

> Does the "Ignore attribute" accomplish the same thing as the filter?

It accomplishes the same as the filter's "ignore attributes" option.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: RE: [Wekalist] Cluster Comparison

by Curtis Jensen-2

On Fri, Oct 23, 2009 at 7:55 PM, Peter Reutemann <fracpete@...> wrote:

>> I understand getting both cluster assignments in the same file.  I'm
>> wondering about the filter part.  It sounds like Hongbo suggested
>> using a filter to remove the cluster attribute from the first run so
>> the second cluster run doesn't use the cluster attribute from the
>> first run.  The output would then have both cluster attributes.
>
> He was talking about the AddCluster filter (package
> weka.filters.unsupervised.attribute), which takes a cluster algorithm
> as parameter and also a parameter for ignoring attributes (i.e., the
> attribute that got added after the first run of this filter). You
> don't need the cluster panel at all.
>
>> I'm wondering if the "Ignore attribute" option can be used to have the
>> second run ignore the cluster attribute instead of using the filter?
>
> It can.
>
>> Does the "Ignore attribute" accomplish the same thing as the filter?
>
> It accomplishes the same as the filter's "ignore attributes" option.
>

Sorry for the confusion.  On the "Preprocess" tab of the explorer,
there is a panel that allows attributes to be removed.  On the
"Cluster" tab, there is an "Ignore attribute" button.  Hongbo
suggested using the "Add Cluster" filter to ignore an attribute (as
well as add the cluster as an attribute to the output).

That's three different ways to ignore an attribute.  However, when I
run clustering after removing attributes from the "Preprocess" tab,
the results indicate that it is using the cluster attribute.  When I
use the "Ignore attribute" button from the "Cluster" tab, it still
shows that it is using the cluster attribute.

Am I using the tool incorrectly, or am I misunderstanding what the
remove attribute and ignore attribute controls are for?

Thanks,
Curtis



Re: RE: [Wekalist] Cluster Comparison

by Peter Reutemann-3

> Sorry for the confusion.  On the "Preprocess" tab of the explorer,
> there is a panel that allows attributes to be removed.

Yes, forget about that option. Removing =/= ignoring. Removing
deletes the attribute permanently from the dataset, whereas the ignore
option only temporarily "hides" the attributes from the clusterer (it
internally applies the Remove filter before presenting the data to the
cluster algorithm).
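
The difference can be sketched in a few lines of Python. This is only
an analogy using plain dictionaries, not Weka code; the helper name and
the sample data are invented for the illustration.

```python
def without_attributes(rows, ignored):
    """Return a filtered copy; the original rows are left untouched."""
    return [{k: v for k, v in row.items() if k not in ignored} for row in rows]


dataset = [
    {"x": 1.0, "y": 2.0, "Cluster": "cluster1"},
    {"x": 4.0, "y": 5.0, "Cluster": "cluster2"},
]

# "Ignore": the clusterer gets a temporary filtered copy of the data.
clusterer_input = without_attributes(dataset, {"Cluster"})
before_remove = sorted(dataset[0])  # the dataset still has every attribute
print(sorted(clusterer_input[0]), before_remove)

# "Remove": the dataset itself is overwritten and the attribute is gone.
dataset = without_attributes(dataset, {"Cluster"})
print(sorted(dataset[0]))
```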

> On the
> "Cluster" tab, there is an "Ignore attribute" button.  Hongbo
> suggested using the "Add Cluster" filter to ignore an attribute (as
> well as add the cluster as an attribute to the output).

Yes, the AddCluster filter has its own option for ignoring attributes.

> That's three different ways to ignore an attribute.  However, when I
> run clustering after removing attributes from the "Preprocess" tab,
> the results indicate that it is using the cluster attribute.

Not quite sure what you mean. If you store the cluster assignments for
visualization, then this will automatically add an attribute called
"Cluster" to the data (basically, what you just removed on the
preprocess pane;). Are you confused by that maybe?

> When I
> use the "Ignore attribute" button from the "Cluster" tab, it still
> shows that it is using the cluster attribute.

See my explanation above regarding the filter's ignore option.

> Am I using the tool incorrectly, or am I misunderstanding what the
> remove attribute and ignore attribute controls are for?

You might be confused by the fact that the AddCluster filter and the
"store cluster assignments for visualization" option both add an
attribute called "Cluster".

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174


Re: RE: [Wekalist] Cluster Comparison

by Curtis Jensen-2

On Fri, Oct 23, 2009 at 8:19 PM, Peter Reutemann <fracpete@...> wrote:

>> Sorry for the confusion.  On the "Preprocess" tab of the explorer,
>> there is a panel that allows attributes to be removed.
>
> Yes, forget about that option. Removing =/= ignoring. Removing
> deletes the attribute permanently from the dataset, whereas the ignore
> option only temporarily "hides" the attributes from the clusterer (it
> internally applies the Remove filter before presenting the data to the
> cluster algorithm).
>
>> On the
>> "Cluster" tab, there is an "Ignore attribute" button.  Hongbo
>> suggested using the "Add Cluster" filter to ignore an attribute (as
>> well as add the cluster as an attribute to the output).
>
> Yes, the AddCluster filter has its own option for ignoring attributes.
>
>> That's three different ways to ignore an attribute.  However, when I
>> run clustering after removing attributes from the "Preprocess" tab,
>> the results indicate that it is using the cluster attribute.
>
> Not quite sure what you mean. If you store the cluster assignments for
> visualization, then this will automatically add an attribute called
> "Cluster" to the data (basically, what you just removed on the
> preprocess pane;). Are you confused by that maybe?
>
>> When I
>> use the "Ignore attribute" button from the "Cluster" tab, it still
>> shows that it is using the cluster attribute.
>
> See my explanation above regarding the filter's ignore option.
>
>> Am I using the tool incorrectly, or am I misunderstanding what the
>> remove attribute and ignore attribute controls are for?
>
> You might be confused by the fact that the AddCluster filter and the
> "store cluster assignments for visualization" option both add an
> attribute called "Cluster".
>
Thanks for clearing things up.  There are two things that were confusing.

1. TWO new attributes are added when the cluster results are saved
through the visualization window.  I knew about the Cluster attribute.
However, an "Instance_number" attribute is also added, which I did not
expect.  When I ran tests to verify that the ignore option worked, my
results were skewed because of the "Instance_number" attribute.  To get
things to work, I need to ignore two attributes on the second cluster
run.

2. The output window for the cluster run lists the number of
attributes.  It was always 2 more than in the original data set.  I
figured that since I was ignoring attributes, it should be the same as
the original number of attributes.  And since my test experiments
never came out as expected, I figured the ignore attribute option
wasn't working or I was doing it wrong.

It all works as expected when I ignore both newly created attributes.
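
For anyone scripting this, here is a small Python sketch that works out
which attribute positions to ignore on the second run. Weka attribute
ranges are 1-based, so the helper returns 1-based indices. The header
below is a made-up example of a file saved from the visualization
window; only the "Instance_number" and "Cluster" names come from my
tests.

```python
def ignore_range(attribute_names, to_ignore):
    """Return a comma-separated, 1-based index list for the named attributes."""
    wanted = set(to_ignore)
    missing = wanted - set(attribute_names)
    if missing:
        raise ValueError(f"attributes not found: {sorted(missing)}")
    indices = [
        str(position)
        for position, name in enumerate(attribute_names, start=1)
        if name in wanted
    ]
    return ",".join(indices)


# Hypothetical header; the two middle attribute names are placeholders.
header = ["Instance_number", "sepallength", "sepalwidth", "Cluster"]
result = ignore_range(header, ["Instance_number", "Cluster"])
print(result)
```

With this header the helper prints 1,4, which is the range to enter in
the ignore option for the second run.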

Thanks again for the help,
Curtis

