Using matplotlib's prctile on masked arrays

by Gökhan SEVER-2

Hello,

Consider these two sample columns of data:

 999999.9999   999999.9999
 999999.9999   999999.9999
 999999.9999   999999.9999
 999999.9999     1693.9069
 999999.9999     1676.1059
 999999.9999     1621.5875
    651.8040     1542.1373
    691.0138     1650.4214
    678.5558     1710.7311
    621.5777   999999.9999
    644.8341   999999.9999
    696.2080   999999.9999

Put this data into a file, say "sample.data", and load it with:

a,b = np.loadtxt('sample.data', dtype="float").T

I[16]: a
O[16]:
array([  1.00000000e+06,   1.00000000e+06,   1.00000000e+06,
         1.00000000e+06,   1.00000000e+06,   1.00000000e+06,
         6.51804000e+02,   6.91013800e+02,   6.78555800e+02,
         6.21577700e+02,   6.44834100e+02,   6.96208000e+02])

I[17]: b
O[17]:
array([ 999999.9999,  999999.9999,  999999.9999,    1693.9069,
          1676.1059,    1621.5875,    1542.1373,    1650.4214,
          1710.7311,  999999.9999,  999999.9999,  999999.9999])

### Interestingly, the second column is printed as entered, but the values of a are displayed in a slightly different form. Why might this be happening? Any ideas? Anyway, back to masked arrays:
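
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the difference looks like a display effect only. NumPy picks one print format per array and seems to switch to scientific notation when the largest and smallest magnitudes in the array are far apart (that heuristic is an assumption here; the thread does not discuss it). Column a spans 621.5777 to 999999.9999, while b only goes down to 1542.1373. The snippet below inlines the sample data with io.StringIO instead of reading "sample.data", and checks that the stored values are the same in both columns.

```python
import io
import numpy as np

# The two columns from the post, inlined so that no file is needed.
SAMPLE = """\
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 999999.9999
999999.9999 1693.9069
999999.9999 1676.1059
999999.9999 1621.5875
651.8040 1542.1373
691.0138 1650.4214
678.5558 1710.7311
621.5777 999999.9999
644.8341 999999.9999
696.2080 999999.9999
"""

a, b = np.loadtxt(io.StringIO(SAMPLE), dtype=float).T

# Both columns hold the same parsed value for the marker; only the repr differs.
print(a[0] == 999999.9999, b[0] == 999999.9999)

# Force a fixed-point format to look at the stored values of a directly.
print(np.array2string(a, formatter={"float_kind": lambda x: "%.4f" % x}))
```

If both comparisons print True, nothing was changed on load and the 1.00000000e+06 entries are just 999999.9999 rounded to the displayed precision.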

I[24]: am = ma.masked_values(a, value=999999.9999)

I[25]: am
O[25]:
masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 644.8341 696.208],
             mask = [ True  True  True  True  True  True False False False False False False],
       fill_value = 999999.9999)


I[30]: bm = ma.masked_values(b, value=999999.9999)

I[31]: am
O[31]:
masked_array(data = [-- -- -- -- -- -- 651.804 691.0138 678.5558 621.5777 644.8341 696.208],
             mask = [ True  True  True  True  True  True False False False False False False],
       fill_value = 999999.9999)


So far so good. A few basic checks:

I[33]: am/bm
O[33]:
masked_array(data = [-- -- -- -- -- -- 0.422662755126 0.418689311712 0.39664667346 -- -- --],
             mask = [ True  True  True  True  True  True False False False  True  True  True],
       fill_value = 999999.9999)


I[34]: mean(am/bm)
O[34]: 0.41266624676580849
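
For anyone who wants to reproduce the steps so far without the data file, here is a minimal self-contained sketch. The name FILL and the inlined arrays are introduced only for this example; the values are the ones listed at the top of the post.

```python
import numpy as np
import numpy.ma as ma

FILL = 999999.9999  # missing-value marker used in the sample data

a = np.array([FILL] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([FILL] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [FILL] * 3)

# Mask every entry that is (approximately) equal to the marker value.
am = ma.masked_values(a, value=FILL)
bm = ma.masked_values(b, value=FILL)

# An element of the ratio is masked if it is masked in either operand.
ratio = am / bm

print(ratio.count())  # number of unmasked entries
print(ratio.mean())   # mean over the unmasked entries only
```

Only rows 7 to 9 are valid in both columns, so three ratios remain and the mean is taken over those three.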

Unfortunately, matplotlib.mlab's prctile cannot handle this division:

I[54]: prctile(am/bm, p=[5,25,50,75,95])
O[54]:
array([  3.96646673e-01,   6.21577700e+02,   1.00000000e+06,
         1.00000000e+06,   1.00000000e+06])


This also results in wrong-looking box-and-whisker plots.


Further testing with scipy.stats functions yields the expected, correct results:

I[55]: stats.scoreatpercentile(am/bm, per=5)
O[55]: 0.40877012449846228

I[49]: stats.scoreatpercentile(am/bm, per=25)
O[49]:
masked_array(data = --,
             mask = True,
       fill_value = 1e+20)

I[56]: stats.scoreatpercentile(am/bm, per=95)
O[56]:
masked_array(data = --,
             mask = True,
       fill_value = 1e+20)


Any confirmation?







--
Gökhan

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@...
https://lists.sourceforge.net/lists/listinfo/matplotlib-users

Re: [Numpy-discussion] Using matplotlib's prctile on masked arrays

by josef.pktd

On Tue, Oct 27, 2009 at 7:56 AM, Gökhan Sever <gokhansever@...> wrote:

> Testing further with scipy.stats functions yields expected correct results:

These should not be the correct results if you use scipy.stats.scoreatpercentile:
it doesn't handle missing values correctly, and treats nans or
mask/fill values as regular numbers sorted to the end.

stats.mstats.scoreatpercentile  is the corresponding function for
masked arrays.
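
A minimal sketch of the masked-array variant, assuming SciPy is installed and using the sample values from the original post (the name FILL is introduced only for this example):

```python
import numpy as np
import numpy.ma as ma
from scipy.stats import mstats

FILL = 999999.9999  # missing-value marker used in the sample data

a = np.array([FILL] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([FILL] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [FILL] * 3)
ratio = ma.masked_values(a, FILL) / ma.masked_values(b, FILL)

# The mstats version uses only the unmasked entries when computing the score.
q25 = mstats.scoreatpercentile(ratio, 25)
print(q25)
```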

(BTW I wasn't able to quickly copy and paste your example because
MaskedArrays don't seem to have a constructive __repr__, i.e.
no commas)

I don't know anything about the matplotlib story.

Josef



Re: [Numpy-discussion] Using matplotlib's prctile on masked arrays

by Gökhan SEVER-2



On Tue, Oct 27, 2009 at 8:25 AM, <josef.pktd@...> wrote:
> This should not be the correct results if you use scipy.stats.scoreatpercentile,
> it doesn't have correct missing value handling, it treats nans or
> mask/fill values as regular numbers sorted to the end.
>
> stats.mstats.scoreatpercentile  is the corresponding function for
> masked arrays.


Thanks for the suggestion. I forgot that such a module exists. It yields better results.

I[14]: st.mstats.scoreatpercentile(r, per=25)
O[14]:
masked_array(data = 0.401055201111,
             mask = False,
       fill_value = 1e+20)

I[17]: st.scoreatpercentile(r, per=25)
O[17]:
masked_array(data = --,
             mask = True,
       fill_value = 1e+20)

I usually fall into traps when using masked arrays. Hopefully I will figure these out before I make funnier mistakes in my analysis.

Besides, it would be nice if the "per" argument accepted a sequence instead of a single item, like matplotlib's prctile, so that it could be used as ...(array, per=[5,25,50,75,95]) in one call.
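
One way to get several percentiles in a single call, sketched under the assumption that scipy.stats.mstats.mquantiles is an acceptable substitute: it takes a sequence of probabilities in the range 0 to 1 (not percentages) and skips masked entries. The name FILL and the inlined arrays are introduced only for this example.

```python
import numpy as np
import numpy.ma as ma
from scipy.stats import mstats

FILL = 999999.9999  # missing-value marker used in the sample data

a = np.array([FILL] * 6
             + [651.8040, 691.0138, 678.5558, 621.5777, 644.8341, 696.2080])
b = np.array([FILL] * 3
             + [1693.9069, 1676.1059, 1621.5875, 1542.1373, 1650.4214, 1710.7311]
             + [FILL] * 3)
ratio = ma.masked_values(a, FILL) / ma.masked_values(b, FILL)

# Percentiles expressed as probabilities; masked entries are ignored.
percentiles = [5, 25, 50, 75, 95]
qs = mstats.mquantiles(ratio, prob=[p / 100.0 for p in percentiles])

for p, q in zip(percentiles, qs):
    print(p, q)
```

With its default alphap and betap, mquantiles should agree with mstats.scoreatpercentile at each individual percentile.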
 
> (BTW I wasn't able to quickly copy and paste your example because
> MaskedArrays don't seem to have a constructive __repr__, i.e.
> no commas)


You can copy and paste the sample data from this link. When I copied it from a text file into Gmail, the original layout of the data somehow got distorted.

http://code.google.com/p/ccnworks/source/browse/trunk/sample.data

 



--
Gökhan
