jonathanbriggs wrote:
I know from searching the archive that this has been asked before, but the WEKA software may have changed since then; I have tried this with 3-6-1 and 3-7.
I have some basket data (1300 instances and 300 attributes) and I have tried using ? to represent unknown values, with no luck: my 2G Mac crashes (in both versions of WEKA) even when I cut the data down to 300 instances with 200 attributes. I have tried representing the true values as T, 1 and TRUE and get the same crashes.
I want to check that what is happening is what others experience. WEKA will not let me use APRIORI, only filtered apriori instead (I assume because of the missing values).
I made some breakthroughs in my learning today and thought I would share them for others.
1. Use 0 to stand for unknown values AND set the Apriori filter in WEKA to accept zeros as unknowns (left click on the filter to change this).
2. Do not use "replace unknowns", as this sets all the unknowns to TRUE. I was doing this yesterday and ran out of memory: once every item appears in every basket, every combination of items counts as frequent, so the number of rules that can be generated is astronomically large.
3. Lower the Support and Metric values where the data is very sparse (left click on the filter to change these).
4. Increase the number of rules produced to see a greater range.
5. Use sampling if the data set is very large.
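
To show why points 2 and 3 matter, here is a small sketch in plain Python. It is not WEKA code and not how WEKA implements Apriori; it is a brute-force count over a made-up six-item basket set (the items, baskets and the frequent_itemsets helper are all hypothetical, for illustration only). It compares sparse data, where unknowns are left as absent, with the same data after every unknown has been replaced by TRUE.

```python
from itertools import combinations


def frequent_itemsets(baskets, items, min_support):
    """Brute-force search for itemsets whose support is >= min_support.

    Support is the fraction of baskets that contain every item in the set.
    """
    n = len(baskets)
    frequent = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for combo in combinations(items, size):
            count = sum(1 for basket in baskets if all(i in basket for i in combo))
            support = count / n
            if support >= min_support:
                frequent[combo] = support
                found_any = True
        if not found_any:
            # If no itemset of this size is frequent, no larger one can be.
            break
    return frequent


# Hypothetical toy data: each basket lists only the items actually bought.
items = ["bread", "milk", "eggs", "beer", "jam", "tea"]
baskets = [
    {"bread", "milk"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"beer"},
    {"bread", "milk", "tea"},
]

# Sparse data, unknowns treated as "not bought".
sparse = frequent_itemsets(baskets, items, min_support=0.3)

# Same number of baskets, but every unknown replaced by TRUE:
# each basket now contains every item.
filled = frequent_itemsets([set(items) for _ in baskets], items, min_support=0.3)

# Sparse data with a support threshold that is too high for it.
too_strict = frequent_itemsets(baskets, items, min_support=0.7)

print(len(sparse))      # 3  -> (bread), (milk), (bread, milk)
print(len(filled))      # 63 -> every non-empty subset of 6 items, 2**6 - 1
print(len(too_strict))  # 0  -> nothing survives until the support is lowered
```

With only 6 items the "filled" case already yields 63 frequent itemsets; with 300 attributes the same effect gives 2**300 - 1 candidate itemsets, which would explain running out of memory. The last line shows the other side: on sparse data a high support threshold finds nothing, so it has to be lowered.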
Hope this is useful