Generalized Sequential Patterns: Data representation


Generalized Sequential Patterns: Data representation

by Laura Lozano

Hello,

I have a question about how to represent sequential data for use with GSP, specifically sequences of different lengths.

Imagine I have the following transactions:

1,pear
1,peach,pear
2,apple,orange
2,apple,pear
2,pear
3,apple,orange,peach
3,orange
4,melon,orange,peach
4,orange

The first attribute indicates the sequence number. How can I represent this data in Weka, given that all instances must have the same number of attributes?

I have tried two things. First:

@relation sequential_fruits

@attribute sequence{1,2,3,4}
@attribute fruta_1{pear,peach,orange,melon,apple}
@attribute fruta_2{pear,peach,orange,melon,apple}
@attribute fruta_3{pear,peach,orange,melon,apple}
@attribute fruta_4{pear,peach,orange,melon,apple}
@attribute fruta_5{pear,peach,orange,melon,apple}

@data
1,pear,?,?,?,?
1,pear,peach,?,?,?
2,orange,apple,?,?,?
2,pear,apple,?,?,?
2,pear,?,?,?,?
3,peach,orange,apple,?,?
3,orange,?,?,?,?
4,peach,orange,melon,?,?
4,orange,?,?,?,?


But GSP can't deal with missing values. It does not work with string values either.

After that, I tried this:

@relation sequential_fruits

@attribute sequence{1,2,3,4}
@attribute pear{pear=0,pear=1}
@attribute peach{peach=0,peach=1}
@attribute orange{orange=0,orange=1}
@attribute melon{melon=0,melon=1}
@attribute apple{apple=0,apple=1}

@data
1,pear=1,peach=0,orange=0,melon=0,apple=0
1,pear=1,peach=1,orange=0,melon=0,apple=0
2,pear=0,peach=0,orange=1,melon=0,apple=1
2,pear=1,peach=0,orange=0,melon=0,apple=1
2,pear=1,peach=0,orange=0,melon=0,apple=0
3,pear=0,peach=1,orange=1,melon=0,apple=1
3,pear=0,peach=0,orange=1,melon=0,apple=0
4,pear=0,peach=1,orange=1,melon=1,apple=0
4,pear=0,peach=0,orange=1,melon=0,apple=0
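
The binary rows above do not have to be typed by hand. The following is a minimal Python sketch, not part of Weka, that derives them from the raw transactions. The names TRANSACTIONS, ITEMS and to_binary_arff are hypothetical and were chosen only for this illustration; the column order is copied from the header above.

```python
# Raw transactions from the post: (sequence id, items in that transaction).
TRANSACTIONS = [
    (1, ["pear"]),
    (1, ["peach", "pear"]),
    (2, ["apple", "orange"]),
    (2, ["apple", "pear"]),
    (2, ["pear"]),
    (3, ["apple", "orange", "peach"]),
    (3, ["orange"]),
    (4, ["melon", "orange", "peach"]),
    (4, ["orange"]),
]

# Column order copied from the ARFF header in the post.
ITEMS = ["pear", "peach", "orange", "melon", "apple"]


def to_binary_arff(transactions, items, relation="sequential_fruits"):
    """Return ARFF text with one presence/absence attribute per item."""
    seq_ids = sorted({seq for seq, _ in transactions})
    lines = [f"@relation {relation}", ""]
    lines.append("@attribute sequence{" + ",".join(map(str, seq_ids)) + "}")
    for item in items:
        # Doubled braces emit literal { and } inside an f-string.
        lines.append(f"@attribute {item}{{{item}=0,{item}=1}}")
    lines += ["", "@data"]
    for seq, present in transactions:
        flags = [f"{item}={int(item in present)}" for item in items]
        lines.append(",".join([str(seq)] + flags))
    return "\n".join(lines)


arff_text = to_binary_arff(TRANSACTIONS, ITEMS)
print(arff_text)
```

Adding a new fruit only requires extending ITEMS; every row is then regenerated with the extra column.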


This last representation works, but the results are very confusing to interpret, because patterns in which attributes equal 0 (absence of the fruit) are not interesting at all. Is there any other reasonable way to represent this kind of data?
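
One possible way to live with the binary encoding is to post-process the mined patterns and discard every pattern that mentions an absent item. The Python sketch below assumes the patterns are available as plain text lines in which items appear as name=0 or name=1; that layout is an assumption, and the sample lines are invented purely to exercise the filter, they are not real GSP output. The names ABSENT_ITEM and keep_presence_only are hypothetical.

```python
import re

# Matches an item such as "pear=0", i.e. an absent fruit in the binary encoding.
ABSENT_ITEM = re.compile(r"\b\w+=0\b")


def keep_presence_only(pattern_lines):
    """Drop every pattern line that mentions an absent (=0) item."""
    return [line for line in pattern_lines if not ABSENT_ITEM.search(line)]


# Hypothetical pattern lines, invented only for this illustration.
sample = [
    "<{pear=1}{orange=1}>",
    "<{melon=0}{orange=1}>",
    "<{apple=0,melon=0}>",
    "<{orange=1}>",
]

for line in keep_presence_only(sample):
    print(line)
```

This does not make the mining itself any cheaper, since the absence patterns are still generated; it only hides them from the final listing.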


Thank you very much.

_______________________________________________
Wekalist mailing list
Send posts to: Wekalist@...
List info and subscription status: https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
List etiquette: http://www.cs.waikato.ac.nz/~ml/weka/mailinglist_etiquette.html

Re: Generalized Sequential Patterns: Data representation

by Sebastian Beer

Hi Laura,

I'm afraid you have to use your last-mentioned workaround at the
moment; I never considered instances of different lengths when
implementing the algorithm. One more item for my to-do list.

Greetings
Sebastian


On Thu, Nov 5, 2009 at 12:50 PM, Laura Lozano <laurloz83@...> wrote:

> [...]


Re: Generalized Sequential Patterns: Data representation

by asma ben zakour

Hi Laura,

You can use the first version! I do it that way, but you have to add a null value to each attribute, like this:


@relation sequential_fruits

@attribute sequence{1,2,3,4}
@attribute fruta_1{pear,peach,orange,melon,apple,null}
@attribute fruta_2{pear,peach,orange,melon,apple,null}
@attribute fruta_3{pear,peach,orange,melon,apple,null}
@attribute fruta_4{pear,peach,orange,melon,apple,null}
@attribute fruta_5{pear,peach,orange,melon,apple,null}

@data
1,pear,null,null,null,null
1,pear,peach,null,null,null
2,orange,apple,null,null,null
2,pear,apple,null,null,null
2,pear,null,null,null,null
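
The padding shown above can also be generated rather than typed. Below is a minimal Python sketch, assuming five fruta_N columns as in the header and "null" as the filler label; ROWS, WIDTH, FILLER and pad_row are hypothetical names used only for this illustration.

```python
# Item lists per transaction, in the order used in the first ARFF attempt.
ROWS = [
    (1, ["pear"]),
    (1, ["pear", "peach"]),
    (2, ["orange", "apple"]),
    (2, ["pear", "apple"]),
    (2, ["pear"]),
    (3, ["peach", "orange", "apple"]),
    (3, ["orange"]),
    (4, ["peach", "orange", "melon"]),
    (4, ["orange"]),
]

WIDTH = 5        # number of fruta_N attributes declared in the header
FILLER = "null"  # ordinary nominal label standing in for "no item"


def pad_row(seq, items, width=WIDTH, filler=FILLER):
    """Return one @data line, padded with the filler label to a fixed width."""
    if len(items) > width:
        raise ValueError(f"row has {len(items)} items, header allows {width}")
    return ",".join([str(seq)] + items + [filler] * (width - len(items)))


data_lines = [pad_row(seq, items) for seq, items in ROWS]
print("\n".join(data_lines))
```

Because null is just another nominal label, patterns containing it may still appear in the results and may need to be ignored when reading them.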

I hope it helps you.
Cheers
Asma





2009/11/5 Sebastian Beer <derphieffekt@...>
> [...]