Different Cluster Results When Running From Command Line

by Curtis Jensen-2

For the attached arff file, I get drastically different results when
running from the command line than I do from the Explorer.

Here are the command line parameters:
weka.clusterers.Cobweb -A 1.0 -C 1.715825875 -S 42

Why would I get different results between the command line and the explorer?

--
Curtis


_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Attachment: all.arff (145K)

Re: Different Cluster Results When Running From Command Line

by Peter Reutemann-3

> For the attached arff file, I get drastically different results when
> running from the command line than I do from the Explorer.
>
> Here are the command line parameters:
> weka.clusterers.Cobweb -A 1.0 -C 1.715825875 -S 42
>
> Why would I get different results between the command line and the explorer?

The Explorer always builds clusterers in batch mode, whereas the
command line distinguishes between incremental and batch ones. If you
hack the code of the weka.clusterers.ClusterEvaluation class and turn
off incremental training, then you get the same output.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174

Re: Different Cluster Results When Running From Command Line

by Curtis Jensen-2

Hack the weka.clusterers.ClusterEvaluation class?
Are you saying that the command line can't (without code modification)
produce the same output as the explorer?

Thanks,
Curtis


On Mon, Oct 26, 2009 at 4:59 PM, Peter Reutemann <fracpete@...> wrote:

>> For the attached arff file, I get drastically different results when
>> running from the command line than I do from the Explorer.
>>
>> Here are the command line parameters:
>> weka.clusterers.Cobweb -A 1.0 -C 1.715825875 -S 42
>>
>> Why would I get different results between the command line and the explorer?
>
> The Explorer always builds clusterers in batch mode, whereas the
> command line distinguishes between incremental and batch ones. If you
> hack the code of the weka.clusterers.ClusterEvaluation class and turn
> off incremental training, then you get the same output.
>
> Cheers, Peter
> --
> Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
> http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174

Re: Different Cluster Results When Running From Command Line

by Peter Reutemann-3

Please don't top-post; see the mailing list etiquette for why
(http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html).

> Hack the weka.clusterers.ClusterEvaluation class?
> Are you saying that the command line can't (without code modification)
> produce the same output as the explorer?

What I'm saying is that the Explorer is "dumb" and treats every
clustering algorithm as a batch algorithm. With the command line, you're
able to process large amounts of data that don't have to fit into
memory all at once. The Explorer always loads the data completely into
memory.

I'm not very familiar with Cobweb, so I'm not sure exactly why it
generates different results in incremental and batch mode. One
difference, for instance: in batch mode, Cobweb randomizes the data
again internally before incrementally building its model.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174