Deployment of Weka models to frontline


Deployment of Weka models to frontline

by Kip Marks


Hi

Does anyone on the list have experience with, or ideas about, how best to deploy Weka models in an operational environment? At the moment we plan to refresh and revalidate the models every month, but when it comes to interfacing with production systems we simply don't know how best to go about it. Should our applications make an RMI call to a small data-mining server that runs a JDBC-compliant database for querying and analysis of results? Or should they call Weka's classes directly? Help! There must be pros and cons for each option.
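
For what it's worth, the two options need not be mutually exclusive. Below is a minimal sketch, assuming a hypothetical `ScoringService` interface (the interface, the class names and the averaging "model" are all made up for illustration and are not part of Weka). Because the interface follows the RMI rules (it extends `java.rmi.Remote` and its method declares `RemoteException`), the same implementation can be called directly in-process, as here, or exported later with `java.rmi.server.UnicastRemoteObject` and a registry if a separate scoring server turns out to be needed.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;

class ScoringDemo {
    public static void main(String[] args) throws RemoteException {
        // Direct, in-process call: no registry or network involved.
        ScoringService service = new LocalScoringService();
        System.out.println(service.score(new double[] {1.0, 2.0, 3.0}));
    }
}

/** Hypothetical remote contract; RMI requires Remote and RemoteException here. */
interface ScoringService extends Remote {
    double score(double[] features) throws RemoteException;
}

/** Hypothetical implementation; the average stands in for a real model call. */
class LocalScoringService implements ScoringService {
    @Override
    public double score(double[] features) {
        double sum = 0.0;
        for (double f : features) {
            sum += f;
        }
        return sum / features.length;
    }
}
```

Callers that depend only on `ScoringService` would not need to change if the implementation later moved behind RMI, which keeps the deployment decision reversible.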

We're a small data- and text-mining team in a large government Ministry that offers many services to the citizens of NZ (benefits, student loans, child protection, etc.), and the models we're building are designed to support the decision-making of frontline staff. We have terabytes of data in one of NZ's biggest data warehouses and lots of models to build, so if anyone out there wants to apply their data-mining skills to real-life problems, please send me your CV, as we are hiring!

Dr. Kip Marks

CSRE Forecasting & Modelling
Ministry of Social Development
Wellington
NZ

DDI: +64-4-9163594

 -------------------------------
 This email and any attachments may contain information that is confidential and subject to legal privilege. If you are not the intended recipient, any use, dissemination, distribution or duplication of this email and attachments is prohibited. If you have received this email in error please notify the author immediately and erase all copies of the email and attachments. The Ministry of Social Development accepts no responsibility for changes made to this message or attachments after transmission  from the Ministry.
 -------------------------------


_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Deployment of Weka models to frontline

by Mark Hall-9

Hi Kip,

Pentaho offers a platform for deploying Weka models for scoring and for
refreshing/rebuilding models via their Kettle ETL tool. Kettle is a streaming,
process-flow style ETL engine that can interface with many data sources. There
are components for Kettle that allow serialized Weka models to be loaded from
disk or repositories at run time and used to score data as part of an ETL
process. Similarly, another Kettle component can execute Weka Knowledge Flow
processes to rebuild models. The ETL processes can be scheduled using your OS's
scheduling utilities or deployed on the Pentaho BI server.

There is a white paper on this at:

http://www.pentaho.com/products/demo/data_mining_models_with_pentaho.php?asset=data-mining-models-pdf

Documentation on the Weka-related components for Kettle can be found at:

http://wiki.pentaho.com/display/DATAMINING/Pentaho+Data+Mining+Community+Documentation
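
To make the "serialized model loaded from disk at run time" idea concrete, here is a minimal sketch using plain Java serialization. `ThresholdModel` and `ModelStore` are hypothetical stand-ins, not Weka or Kettle classes; a trained Weka classifier would take the place of `ThresholdModel`, since Weka classifiers are `Serializable` (Weka also provides `weka.core.SerializationHelper` for the same round trip).

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;

class ModelStore {
    /** Writes a model to disk with standard Java serialization (the refresh step). */
    static void save(Serializable model, Path file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(model);
        }
    }

    /** Reads a model back at run time (the scoring step). */
    static Object load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("model", ".ser");
        save(new ThresholdModel(0.5), file);
        ThresholdModel model = (ThresholdModel) load(file);
        System.out.println(model.score(0.7));
        Files.delete(file);
    }
}

/** Hypothetical stand-in for a trained model. */
class ThresholdModel implements Serializable {
    private static final long serialVersionUID = 1L;
    private final double threshold;

    ThresholdModel(double threshold) {
        this.threshold = threshold;
    }

    /** Returns 1.0 when the value is at or above the threshold, else 0.0. */
    double score(double value) {
        return value >= threshold ? 1.0 : 0.0;
    }
}
```

One caveat with this approach: the JVM that loads the file needs the same model classes on its classpath as the JVM that wrote it, so the scoring side and the training side should use compatible library versions.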

Cheers,
Mark.

On 6/11/09 8:15 AM, Kip Marks wrote:

> [snip]


--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL
32822, USA
+64 7 348-7099 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>



Re: Deployment of Weka models to frontline

by NightlordTW



2009/11/5 Kip Marks <Kip.Marks007@...>

[snip]


If you are working with huge datasets, it might be more appropriate to choose one technique and program it in a language such as C. As far as I know, Weka is only available in Java, which is an excellent programming language for many situations, but not when it comes to speed. This is a typical result of interpretation (e.g. Java) instead of compilation (e.g. C).

However, if you do choose Java (or any other programming language), I'd keep the querying load separate from the mining load. That means at least two servers, not counting the computer or server that interacts with the end user, and a central database where features can be stored and indexed. The database then becomes the central point: it answers queries from the user, while the mining server sends its results to the database on a regular basis (e.g. in batches or at fixed intervals). You can then easily spread the mining work across more servers when they become available.

RMI only looks interesting when you want to develop an application for client computers that do not have much CPU power. In that case, RMI lets you move the load of interpreting the query results to a central server, so that the client only has to visualise them. However, when many computers are connected to such a server, expect delays.

In any case, there are many possibilities, but it is more important to define what the goal is. Otherwise, you risk creating a great application that performs poorly.
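
The split between mining load and query load described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory map stands in for the central database table, the summing lambda stands in for a real model, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToDoubleFunction;

class BatchScoringSketch {
    /** Hypothetical stand-in for the central, indexed results table in the database. */
    static final Map<String, Double> SCORE_TABLE = new ConcurrentHashMap<>();

    /** Mining side: scores a whole batch and publishes the results to the table. */
    static void runBatch(Map<String, double[]> featuresById, ToDoubleFunction<double[]> model) {
        featuresById.forEach((id, features) -> SCORE_TABLE.put(id, model.applyAsDouble(features)));
    }

    /** Query side: a plain lookup that never invokes the model; null if not yet scored. */
    static Double lookup(String id) {
        return SCORE_TABLE.get(id);
    }

    public static void main(String[] args) {
        Map<String, double[]> batch = new LinkedHashMap<>();
        batch.put("case-1", new double[] {1.0, 2.0});
        batch.put("case-2", new double[] {0.5, 0.25});
        // Stand-in model: the score is simply the sum of the two features.
        runBatch(batch, features -> features[0] + features[1]);
        System.out.println(lookup("case-1"));
        System.out.println(lookup("case-3"));
    }
}
```

Because the query side only reads stored results, adding more mining servers changes who calls `runBatch`, not how the front end looks up a score.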
 




--
Thomas Debray | Theoretical Epidemiology | Julius Center | Stratenum 6.131 | University Medical Center Utrecht  | P.O.Box 85500  | 3508 GA Utrecht | The Netherlands | www.juliuscenter.nl | www.thomasdebray.be | www.netstorm.be


Re: Deployment of Weka models to frontline

by Bernhard Pfahringer-2

> If you are working with huge datasets, it might be more appropriate to
> choose one technique, and to programm it in a language such as C. As far as
> I know, Weka is only available in Java, which is an excellent programming
> language for many situations, but not when it comes to speed. This is a
> typical result of interpretation (eg Java) in stead of compilation (eg C).

Please stop spreading myths!
Do you have a *recent* comparison or reference to back up your speed claims?

Java running on your mobile phone might be interpreted, but Java running
on a server will be compiled, and with modern HotSpot technology it will be
optimized using immediate runtime performance feedback. That is, it will be
specifically optimized for the current problem set; in other words, it uses
profiling information on the fly. The same could of course be done for C,
but I am not aware of any C product actually implementing this.

http://java.sun.com/javase/technologies/hotspot/
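
A rough way to watch this happen is to time the same hot loop over several rounds, as in the sketch below. This is not a rigorous benchmark (timings depend on the JVM, its flags and the machine, so no particular numbers are claimed, and a harness such as JMH is the better tool for real measurements); the class and method names are made up for illustration.

```java
class WarmUpSketch {
    /** Sums i*i for i = 1..n; a deliberately simple hot loop. */
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += (long) i * i;
        }
        return total;
    }

    public static void main(String[] args) {
        // Time several rounds of identical work; differences between early and
        // late rounds hint at the JIT compiler's effect on the running code.
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            long result = sumOfSquares(2_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + micros + " us (result " + result + ")");
        }
    }
}
```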

Bernhard

---------------------------------------------------------------------
Bernhard Pfahringer, Dept. of Computer Science, University of Waikato
http://www.cs.waikato.ac.nz/~bernhard                  +64 7 838 4041


Re: Deployment of Weka models to frontline

by Michael C. Harris

On Thu, Nov 05, 2009 at 11:24:09PM +0100, Thomas Debray wrote:

[snip]

>    As far as I know, Weka is only available in Java, which is an
>    excellent programming language for many situations, but not when
>    it comes to speed.

It has little bearing on the speed issue, though it might be of
interest or news to some that Weka is "available" in any programming
language that has a JVM implementation, of which there are many.

I'm currently accessing Weka classes through JRuby.

--
Michael C. Harris, School of CS&IT, RMIT University
http://twofishcreative.com/michael/blog
IRC: michaeltwofish #habari



Re: Deployment of Weka models to frontline

by NightlordTW



2009/11/5 Bernhard Pfahringer <bernhard.pfahringer@...>
>> If you are working with huge datasets, it might be more appropriate to
>> choose one technique, and to programm it in a language such as C. As far as
>> I know, Weka is only available in Java, which is an excellent programming
>> language for many situations, but not when it comes to speed. This is a
>> typical result of interpretation (eg Java) in stead of compilation (eg C).
>
> Please stop spreading myths!
> Do you have *recent* comparison reference to backup your speed claims?

I had never heard of HotSpot. However, after some research I must admit that recent developments have put Java in a better position.



--
www.juliuscenter.nl | www.thomasdebray.be | www.netstorm.be


Exception when calling classifyInstance method

by Doan Viet Dung

Hi everybody,

When I call the following method

try {
      ((AdaBoostM1) myClassifier).classifyInstance(testing_instances.firstInstance());
}
catch (Exception e) {
      System.out.println(" nothing ");
}
it throws an exception and prints " nothing ". I have checked my classifier and it looks fine. In fact, AdaBoostM1 only runs for one iteration (it then stops because the error is too small).

testing_instances starts out empty and is created from the dataset of the training instances. I then add a new instance to testing_instances and set its class value to 0. Everything seems to be in order, so why does this result in an exception?

Any suggestions?
Thank you very much in advance
VDung



Re: Exception when calling classifyInstance method

by Peter Reutemann-3

> When I call the following method
>
> try {
>       ((AdaBoostM1) myClassifier).classifyInstance(testing_instances.firstInstance());
> }
> catch (Exception e) {
>       System.out.println(" nothing ");
> }
> it throw an exception -> " nothing ". I verified my classifier, it is ok. In fact, the AdaBoostM1 classifier is only performed 1 time ( then it gets out due to the error is too small ).
>
> The testing_instances is empty and is created based on the dataset of the training instances. Then I add a new instance to the testing_instances, set the classValue to 0. It seems everything is ok but why it results an exception ?
>
> Any suggestion ?

Can you please post the full stacktrace of the exception? That will
most likely shed some more light on the problem. Use the following
code to output the stacktrace:

try {
  ((AdaBoostM1) myClassifier).classifyInstance(testing_instances.firstInstance());
}
catch (Exception e) {
  // print the full stack trace instead of a fixed message
  e.printStackTrace();
}
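
If the trace needs to go into a log file or an email rather than the console, it can also be captured as a string. A minimal, Weka-free sketch follows; `classify` is a hypothetical stand-in for a call that fails and says nothing about the actual cause of the problem above.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

class StackTraceSketch {
    /** Hypothetical stand-in for a call that fails, e.g. a classifier given bad input. */
    static double classify(double[] instance) {
        if (instance == null) {
            throw new IllegalArgumentException("instance has no data");
        }
        return 0.0;
    }

    /** Renders the full stack trace as a string so it can be logged or posted. */
    static String stackTraceOf(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            classify(null);
        } catch (Exception e) {
            // Unlike printing a fixed message, this keeps the exception type,
            // its message and the call chain that led to it.
            System.err.println(stackTraceOf(e));
        }
    }
}
```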

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/           Ph. +64 (7) 858-5174
