Cluster Comparison
Does Weka have cluster comparison capabilities? For example, if I run two different clustering algorithms on the same data set, is there a way to compare and contrast the clusters generated by the different algorithms? Sorry if this is an inappropriate place to ask a general data mining question.
-- Curtis
_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html
Re: Cluster Comparison
> Does Weka have cluster comparison capabilities?
> For example, if I run two different clustering algorithms on the same
> data set, is there a way to compare and contrast the clusters
> generated from the different algorithms?
>
> Sorry if this is an inappropriate place to ask a general data mining question.
Apart from visualizing the cluster assignments from both runs, I don't think you can do much else.
Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174
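Peter's answer is that Weka offers little here beyond visualizing the assignments. Once both runs' assignments are available for the same instances (for example as two columns in one exported file), a common external measure for comparing two clusterings is the Rand index: the fraction of instance pairs on which both runs agree about whether the two instances belong together. The sketch below is plain Java and not part of Weka; the class name `RandIndex` and the sample assignments are made up for illustration.

```java
/**
 * Pair-counting comparison of two cluster assignments over the same instances.
 * Stand-alone sketch on plain int arrays; not a Weka API.
 */
final class RandIndex {

    /**
     * Returns the fraction of instance pairs on which both clusterings agree:
     * the pair is either in the same cluster in both runs or in different
     * clusters in both runs. Labels need not correspond between runs.
     */
    static double randIndex(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("both runs must cover the same instances");
        }
        long agree = 0;
        long pairs = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                boolean togetherInA = a[i] == a[j];
                boolean togetherInB = b[i] == b[j];
                if (togetherInA == togetherInB) {
                    agree++;
                }
                pairs++;
            }
        }
        // Fewer than two instances: no pairs, so nothing to disagree on.
        return pairs == 0 ? 1.0 : (double) agree / pairs;
    }

    public static void main(String[] args) {
        int[] run1 = {0, 0, 0, 1, 1, 1};
        // Same grouping with different labels, except one instance split off.
        int[] run2 = {2, 2, 2, 0, 0, 1};
        // 13 of the 15 pairs agree.
        System.out.println(randIndex(run1, run2));
    }
}
```

Because only co-membership is compared, the score does not depend on how each run numbers its clusters. The double loop is O(n²) in the number of instances, which is fine for small data sets; for larger ones, a contingency-table formulation (or the adjusted Rand index, which corrects for chance agreement) would be the usual next step.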
RE: [Wekalist] Cluster Comparison
Hi there,
I have done something like this before using the 3.5.6 developer version of Weka. In that version, there is a filter called AddCluster which lets you run a clustering algorithm and save the clustering result while ignoring one attribute (which can be your previous cluster assignments). After applying the filter, you will see the previous cluster result (the ignored attribute) and the new assignment side by side. Of course, you can then save the file for later use. However, this filter cannot be used in the new 3.6.x versions. Am I correct, Peter?
Cheers
Hongbo
-----Original Message-----
From: wekalist-bounces@... [mailto:wekalist-bounces@...] On Behalf Of Curtis Jensen
Sent: 21 October 2009 17:11
To: Weka machine learning workbench list.
Subject: [Wekalist] Cluster Comparison
Re: RE: [Wekalist] Cluster Comparison
> However, this filter cannot be used in the new 3.6.x versions.
Huh? Why shouldn't you be able to use the filter in 3.6.x? Did you perhaps not *unset* the class attribute in the Explorer's Preprocess panel? Remember, clusterers *cannot* process datasets that have a class attribute set, and the Preprocess panel automatically uses the last attribute as the class attribute if the dataset format doesn't support specifying one (ARFF and CSV don't, XRFF does).
BTW, when you visualize the cluster assignments, you can also save them to a file.
Cheers, Peter
Re: RE: [Wekalist] Cluster Comparison
On Thu, Oct 22, 2009 at 2:30 AM, Hongbo Du <hongbo.du@...> wrote:
> In that version, there is a filter called AddCluster which lets you run a clustering algorithm and save the clustering result while ignoring one attribute (which can be your previous cluster assignments).
Can I just use the "Ignore attributes" button, instead of the filter, to ignore the cluster attribute (Weka 3.6.1)?
Thanks,
Curtis
Re: RE: [Wekalist] Cluster Comparison
> Can I just use the "Ignore attributes" button, instead of the filter,
> to ignore the cluster attribute (Weka 3.6.1)?
Hongbo was outlining an approach for comparing *two* cluster runs on the same data. Each run of the filter adds a new attribute with the cluster assignments. In the second run, however, you need to exclude the assignments of the first run (they shouldn't influence the cluster model being built). You can then inspect the generated cluster assignments either in Weka, or export the file as CSV and use a spreadsheet application.
Using the Explorer's cluster panel, you don't get the cluster assignments of two runs in a *single* file.
Cheers, Peter
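Once both runs' assignments are in one CSV file, the spreadsheet step Peter mentions is essentially a cross-tabulation: for each cluster of the first run, count how its instances are spread over the clusters of the second run. A minimal Java sketch, assuming the first line is a header, no field contains a quoted comma, and the two assignment columns carry the hypothetical names `cluster_run1` and `cluster_run2` (substitute whatever names the exported file actually uses); the sample rows are invented.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Cross-tabulates two cluster-assignment columns from CSV lines, like a
 * spreadsheet pivot table. Simple comma splitting only: quoted fields with
 * embedded commas are not handled.
 */
final class ClusterCrossTab {

    /** Maps each run-A cluster to the counts of its instances per run-B cluster. */
    static Map<String, Map<String, Integer>> crossTab(List<String> csvLines,
                                                      String colA, String colB) {
        String[] header = csvLines.get(0).split(",", -1);
        int ia = indexOf(header, colA);
        int ib = indexOf(header, colB);
        Map<String, Map<String, Integer>> table = new TreeMap<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.isBlank()) {
                continue; // tolerate blank lines, e.g. a trailing newline
            }
            String[] cells = line.split(",", -1);
            table.computeIfAbsent(cells[ia].trim(), k -> new TreeMap<>())
                 .merge(cells[ib].trim(), 1, Integer::sum);
        }
        return table;
    }

    /** Position of a named column in the header row. */
    private static int indexOf(String[] header, String name) {
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("no column named " + name);
    }

    public static void main(String[] args) {
        // Invented sample of an exported file with both runs' assignments.
        List<String> csv = List.of(
                "x,y,cluster_run1,cluster_run2",
                "1.0,2.0,cluster1,cluster2",
                "1.1,2.1,cluster1,cluster2",
                "5.0,6.0,cluster2,cluster1",
                "5.2,6.1,cluster2,cluster3");
        System.out.println(crossTab(csv, "cluster_run1", "cluster_run2"));
    }
}
```

A row whose counts sit almost entirely in one column means the two algorithms found essentially the same group; counts spread across several columns show where they disagree. Files with quoted fields would need a proper CSV reader.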
Re: RE: [Wekalist] Cluster Comparison
On Fri, Oct 23, 2009 at 3:19 PM, Peter Reutemann <fracpete@...> wrote:
> Using the Explorer's cluster panel, you don't get the cluster
> assignments of two runs in a *single* file.
I understand getting both cluster assignments in the same file. I'm wondering about the filter part. It sounds like Hongbo suggested using a filter to remove the cluster attribute from the first run so that the second cluster run doesn't use it. The output would then have both cluster attributes. I'm wondering whether the "Ignore attribute" option can be used to make the second run ignore the cluster attribute, instead of using the filter. Does the "Ignore attribute" option accomplish the same thing as the filter?
Thanks,
Curtis
Re: RE: [Wekalist] Cluster Comparison
> I understand getting both cluster assignments in the same file. I'm
> wondering about the filter part. It sounds like Hongbo suggested
> using a filter to remove the cluster attribute from the first run so
> the second cluster run doesn't use the cluster attribute from the
> first run. The output would then have both cluster attributes.
He was talking about the AddCluster filter (package weka.filters.unsupervised.attribute), which takes a clustering algorithm as a parameter, as well as a parameter for ignoring attributes (i.e., the attribute that was added by the first run of this filter). You don't need the cluster panel at all.
> I'm wondering if the "Ignore attribute" option can be used to have the
> second run ignore the cluster attribute instead of using the filter?
It can.
> Does the "Ignore attribute" accomplish the same thing as the filter?
It accomplishes the same as the filter's "ignore attributes" option.
Cheers, Peter
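The two-pass AddCluster idea can be mimicked in a few lines to make the role of the ignore option concrete: each pass clusters only the non-ignored columns and appends the resulting assignment as a new column, so the second pass must ignore the column appended by the first. This is a simulation on plain arrays, not Weka's AddCluster; `addClusterColumn`, `withoutColumns` and `kMeans` are hypothetical names, and the clusterer is a bare-bones k-means seeded with the first k rows (assumed to be distinct).

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.IntStream;

/**
 * Simulation of the two-pass AddCluster idea on plain arrays (not Weka code).
 * addClusterColumn, withoutColumns and kMeans are hypothetical names.
 */
final class TwoPassClustering {

    /** Copies every row without the ignored columns; the input is left untouched. */
    static double[][] withoutColumns(double[][] data, Set<Integer> ignored) {
        double[][] out = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            double[] row = data[r];
            out[r] = IntStream.range(0, row.length)
                    .filter(c -> !ignored.contains(c))
                    .mapToDouble(c -> row[c])
                    .toArray();
        }
        return out;
    }

    /** Plain k-means; the first k rows are the initial centroids (assumed distinct). */
    static int[] kMeans(double[][] x, int k, int iterations) {
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = x[c].clone();
        }
        int[] assign = new int[x.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int i = 0; i < x.length; i++) {
                double bestDist = Double.POSITIVE_INFINITY;
                for (int c = 0; c < k; c++) {
                    double d = 0;
                    for (int j = 0; j < x[i].length; j++) {
                        double diff = x[i][j] - centroids[c][j];
                        d += diff * diff;
                    }
                    if (d < bestDist) {
                        bestDist = d;
                        assign[i] = c;
                    }
                }
            }
            // Update step: move each non-empty cluster's centroid to its members' mean.
            double[][] sums = new double[k][x[0].length];
            int[] counts = new int[k];
            for (int i = 0; i < x.length; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < x[i].length; j++) {
                    sums[assign[i]][j] += x[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int j = 0; j < sums[c].length; j++) {
                        centroids[c][j] = sums[c][j] / counts[c];
                    }
                }
            }
        }
        return assign;
    }

    /** Clusters the non-ignored columns and appends the assignment as a new last column. */
    static double[][] addClusterColumn(double[][] data, Set<Integer> ignored, int k) {
        int[] assign = kMeans(withoutColumns(data, ignored), k, 10);
        double[][] out = new double[data.length][];
        for (int r = 0; r < data.length; r++) {
            out[r] = Arrays.copyOf(data[r], data[r].length + 1);
            out[r][data[r].length] = assign[r];
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] data = {{1.0, 1.0}, {9.0, 9.0}, {1.2, 0.8}, {8.8, 9.1}, {0.9, 1.1}, {9.2, 8.9}};
        // Pass 1: cluster on the original columns 0 and 1; appends column 2.
        double[][] pass1 = addClusterColumn(data, Set.of(), 2);
        // Pass 2: ignore column 2 so pass 1's assignments cannot influence
        // the new model; appends column 3.
        double[][] pass2 = addClusterColumn(pass1, Set.of(2), 2);
        for (double[] row : pass2) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In practice the second pass would use a different algorithm. The point of the sketch is that, with column 2 ignored, the second pass sees exactly the original columns, so the first run's assignments cannot shape the second model, while the output still carries both assignment columns side by side.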
Re: RE: [Wekalist] Cluster Comparison
On Fri, Oct 23, 2009 at 7:55 PM, Peter Reutemann <fracpete@...> wrote:
>> Does the "Ignore attribute" accomplish the same thing as the filter?
>
> It accomplishes the same as the filter's "ignore attributes" option.
Sorry for the confusion. On the "Preprocess" tab of the Explorer, there is a panel that allows attributes to be removed. On the "Cluster" tab, there is an "Ignore attribute" button. Hongbo suggested using the "Add Cluster" filter to ignore an attribute (as well as to add the cluster as an attribute to the output). That's three different ways to ignore an attribute. However, when I run clustering after removing attributes on the "Preprocess" tab, the results indicate that it is using the cluster attribute. When I use the "Ignore attribute" button on the "Cluster" tab, it still shows that it is using the cluster attribute. Am I using the tool incorrectly, or am I misunderstanding what the remove attribute and ignore attribute controls are for?
Thanks,
Curtis
Re: RE: [Wekalist] Cluster Comparison
> Sorry for the confusion. On the "Preprocess" tab of the explorer,
> there is a panel that allows attributes to be removed.
Yes, forget about that option. Removing =/= ignoring. Removing deletes the attribute permanently from the dataset, whereas the ignore option only temporarily "hides" the attributes from the clusterer (internally, it applies the Remove filter before presenting the data to the clustering algorithm).
> On the "Cluster" tab, there is an "Ignore attribute" button. Hongbo
> suggested using the "Add Cluster" filter to ignore an attribute (as
> well as add the cluster as an attribute to the output).
Yes, the AddCluster filter has its own option for ignoring attributes.
> That's three different ways to ignore an attribute. However, when I
> run clustering after removing attributes from the "Preprocess" tab,
> the results indicate that it is using the cluster attribute.
Not quite sure what you mean. If you store the cluster assignments for visualization, this automatically adds an attribute called "Cluster" to the data (basically, what you just removed in the Preprocess panel ;). Is that maybe what confused you?
> When I use the "Ignore attribute" button from the "Cluster" tab, it
> still shows that it is using the cluster attribute.
See my explanation above regarding the filter's ignore option.
> Am I using the tool incorrectly, or am I miss understanding what the
> remove attribute and ignore attribute controls are for?
You might be confused by the fact that AddCluster and the "store cluster assignments for visualization" option both add an attribute called "Cluster".
Cheers, Peter
Re: RE: [Wekalist] Cluster Comparison
On Fri, Oct 23, 2009 at 8:19 PM, Peter Reutemann <fracpete@...> wrote:
> You might get confused by the fact that the AddCluster and the "store
> cluster assignments for visualization" both add an attribute called
> "Cluster".
1. There are TWO new attributes that are added when the cluster results are saved through the visualization window. I knew about the "Cluster" attribute. However, an "Instance_number" attribute is also added, which I did not expect. When I ran tests to verify that the ignore option worked, my results were skewed by the "Instance_number" attribute. To get things to work, I really need to ignore two attributes on the second cluster run.
2. The output window for the cluster run lists the number of attributes. It was always 2 more than in the original data set. I figured that since I was ignoring attributes, it should be the same as the original number of attributes. And since my test experiments never came out as expected, I figured the ignore attribute option wasn't working or I was doing it wrong.
It all works as expected when I ignore both newly created attributes.
Thanks again for the help,
Curtis
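Curtis's fix was to ignore both attributes that saving from the visualization window adds, "Cluster" and "Instance_number". If the ignored attributes are given as a list of indices rather than picked in a dialog, they can be located by name. The sketch below builds a 1-based, comma-separated list such as "1,4"; that this matches the range format of Weka options like AddCluster's ignored-attributes setting is an assumption to check against the filter's documentation, and the helper name `ignoreRange` and the sample header are hypothetical.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Builds a 1-based, comma-separated index list for the bookkeeping
 * attributes that saving from the cluster visualization appends.
 * The two attribute names come from this thread; the helper is hypothetical.
 */
final class IgnoreList {

    static final List<String> ADDED_BY_VISUALIZATION = List.of("Instance_number", "Cluster");

    /** Returns e.g. "1,4" for a header where those two names sit at positions 1 and 4. */
    static String ignoreRange(List<String> attributeNames) {
        StringJoiner range = new StringJoiner(",");
        for (int i = 0; i < attributeNames.size(); i++) {
            if (ADDED_BY_VISUALIZATION.contains(attributeNames.get(i))) {
                range.add(Integer.toString(i + 1)); // assumed 1-based, as in Weka ranges
            }
        }
        return range.toString();
    }

    public static void main(String[] args) {
        // Made-up header of a file saved from the visualization window.
        List<String> header = List.of("Instance_number", "x", "y", "Cluster");
        System.out.println(ignoreRange(header));
    }
}
```

Looking the attributes up by name, rather than assuming they are the last two, keeps the list correct wherever the saved file places them. After ignoring them, the attribute count reported in the cluster output should match the original data set, which was the check that tipped Curtis off.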