regriggging 'index' data

View: New views
10 Messages — Rating Filter:   Alert me  

regriggging 'index' data

by Dan Kokron :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello all,

I have a gridded 2D dataset that contains global (43200x21600) soil
texture data.  Each 30-sec X 30-sec grid box contains a single integer
that represents the whole box.  There are 16 soil types represented in
the whole file.  I'd like to create a lower resolution (still global)
version of this file using a grid that is 1152x721.  The various linear
and cubic schemes don't make sense given the 'index' nature of the data.
Any ideas?

Does GMT have a box-averaging scheme that includes voting.  (see the
'vt' option in)
http://opengrads.org/doc/udxt/re/re.html

--
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron@...
Phone: (301) 614-5192
Fax:   (301) 614-5304

To unsubscribe, send the message "signoff gmt-help" to listserv@...

Re: regriggging 'index' data

by Walter H. F. Smith :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I think what you want is to sort the data into boxes sized to the new  
smaller size, and then take the median or the mode in each box.  I  
think you have to do this, because with the number representing an  
index to a type, you can only have one index per box, and an average  
doesn't make sense.

So look at blockmode and blockmedian in GMT


Walter H F Smith (PhD)
Chairman, GEBCO TSCOM/SCDB
Geophysicist, Laboratory for Satellite Altimetry
NOAA NESDIS code E/RA-31
1335 East West Hwy, room 5408
Silver Spring MD 20910-3226
tel 301-713-2857, extension 126
fax 301-713-3136
Walter.HF.Smith@...



On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:

> Hello all,
>
> I have a gridded 2D dataset that contains global (43200x21600) soil
> texture data.  Each 30-sec X 30-sec grid box contains a single integer
> that represents the whole box.  There are 16 soil types represented in
> the whole file.  I'd like to create a lower resolution (still global)
> version of this file using a grid that is 1152x721.  The various  
> linear
> and cubic schemes don't make sense given the 'index' nature of the  
> data.
> Any ideas?
>
> Does GMT have a box-averaging scheme that includes voting.  (see the
> 'vt' option in)
> http://opengrads.org/doc/udxt/re/re.html
>
> --
> Dan Kokron
> Global Modeling and Assimilation Office
> NASA Goddard Space Flight Center
> Greenbelt, MD 20771
> Daniel.S.Kokron@...
> Phone: (301) 614-5192
> Fax:   (301) 614-5304
>
> To unsubscribe, send the message "signoff gmt-help" to  
> listserv@...

To unsubscribe, send the message "signoff gmt-help" to listserv@...

Re: regriggging 'index' data

by Dan Kokron :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message



Thanks for the pointer to 'blockmode'.  It does seem to be doing the
'right' thing most of the time, but it is giving me 'mean' values for
some blocks.  I guess this could happen if a target blocks ends up with
equal counts of two values, but that is not the case here.  I've
attached a series of images that I hope demonstrates the issue.

The first image shows the a region of Puerto Rico that roughly
corresponds to a single grid box on the target (lower resolution) grid.
The squiggly line represents the coastline.  The northern part of the
image (purple) is land.

The second image overlays the actual target grid box with the value that
'blockmode' came up with (11.5).

The third image overlays the native grid boxes with integer values for
each.  All the land points are 9 while all the water points are 14.
There are clearly more land points in the target grid box yet
'blockmode' decided to do a simple mean of the high and low values.

Does this seem reasonable?

There are numerous other examples in this dataset where 'blockmode' took
simple means instead of mode values.  Here are a few from the xyz file
that was created using ...

blockmode global_topsoil_texture_43200x21600.xyz
-Rglobal_slmask_1152x721_gt600kmsq.grd -E -Q >

-66.4625        18.75   14      0       14      14
-66.4625        18.5    11.5    3.7065  9       14
-66.4625        18.25   9       4.4478  6       12
-66.4625        18      11.5    3.7065  9       14
-66.4625        8.75    5.5     5.1891  2       9
-66.4625        8.5     5.5     5.1891  2       9
-66.4625        -18     6       0       6       16
-66.4625        -18.25  6       0       6       16
-66.4625        -18.5   6       0       6       16
-66.4625        -18.75  6       0       6       16
-66.4625        -21.75  9.5     5.1891  4       13
-66.4625        -34.75  5.5     5.1891  2       9
-66.4625        -35     5.5     5.1891  2       9
-66.4625        -35.25  5.5     5.1891  2       9
-66.4625        -35.5   5.5     5.1891  2       9
-66.4625        -35.75  5.5     5.1891  2       9
-66.4625        -36     5.5     5.1891  2       9

The process to reproduce this is quite involved, but I will put together
a recipe if needed.

Thanks
Dan

On Mon, 2009-10-26 at 16:39 -0500, Walter H. F. Smith wrote:

> I think what you want is to sort the data into boxes sized to the new  
> smaller size, and then take the median or the mode in each box.  I  
> think you have to do this, because with the number representing an  
> index to a type, you can only have one index per box, and an average  
> doesn't make sense.
>
> So look at blockmode and blockmedian in GMT
>
>
> Walter H F Smith (PhD)
> Chairman, GEBCO TSCOM/SCDB
> Geophysicist, Laboratory for Satellite Altimetry
> NOAA NESDIS code E/RA-31
> 1335 East West Hwy, room 5408
> Silver Spring MD 20910-3226
> tel 301-713-2857, extension 126
> fax 301-713-3136
> Walter.HF.Smith@...
>
>
>
> On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:
>
> > Hello all,
> >
> > I have a gridded 2D dataset that contains global (43200x21600) soil
> > texture data.  Each 30-sec X 30-sec grid box contains a single integer
> > that represents the whole box.  There are 16 soil types represented in
> > the whole file.  I'd like to create a lower resolution (still global)
> > version of this file using a grid that is 1152x721.  The various  
> > linear
> > and cubic schemes don't make sense given the 'index' nature of the  
> > data.
> > Any ideas?
> >
> > Does GMT have a box-averaging scheme that includes voting.  (see the
> > 'vt' option in)
> > http://opengrads.org/doc/udxt/re/re.html
> >
> > --
> > Dan Kokron
> > Global Modeling and Assimilation Office
> > NASA Goddard Space Flight Center
> > Greenbelt, MD 20771
> > Daniel.S.Kokron@...
> > Phone: (301) 614-5192
> > Fax:   (301) 614-5304
> >
> > To unsubscribe, send the message "signoff gmt-help" to  
> > listserv@...
>
> To unsubscribe, send the message "signoff gmt-help" to listserv@...
--

To unsubscribe, send the message "signoff gmt-help" to listserv@...




PR_mode_case.gif (45K) Download Attachment
PR_mode_case_1.gif (49K) Download Attachment
PR_mode_case_2.gif (136K) Download Attachment

Re: regriggging 'index' data

by Paul Wessel-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Dan-

blockmode is using the "least median of squares" algorithm to  
approximate the mode, and this is defined as the half-point of the  
shortest range that covers half of the data.  Since your data are  
integers it would have been more optimal to simply pick a histogram  
peak but for general data sets (floating point) there is no way to  
select an optimum bin-width.  Currently, blockmode does not have an  
option to apply a histogram algorithm to the data.  However, if you  
roll up your sleeves for some serious scripting you could accomplish  
what you need by invoking pshistogram on each data bin inside a loop.  
pshistogram can return the counts instead of making a plot and you  
could pull out the bin with the highest frequency (the mode) and build  
a data set of modes.

Cheers,
Paul

On Oct 29, 2009, at 4:54 AM, Dan Kokron wrote:

>
>
> Thanks for the pointer to 'blockmode'.  It does seem to be doing the
> 'right' thing most of the time, but it is giving me 'mean' values for
> some blocks.  I guess this could happen if a target blocks ends up  
> with
> equal counts of two values, but that is not the case here.  I've
> attached a series of images that I hope demonstrates the issue.
>
> The first image shows the a region of Puerto Rico that roughly
> corresponds to a single grid box on the target (lower resolution)  
> grid.
> The squiggly line represents the coastline.  The northern part of the
> image (purple) is land.
>
> The second image overlays the actual target grid box with the value  
> that
> 'blockmode' came up with (11.5).
>
> The third image overlays the native grid boxes with integer values for
> each.  All the land points are 9 while all the water points are 14.
> There are clearly more land points in the target grid box yet
> 'blockmode' decided to do a simple mean of the high and low values.
>
> Does this seem reasonable?
>
> There are numerous other examples in this dataset where 'blockmode'  
> took
> simple means instead of mode values.  Here are a few from the xyz file
> that was created using ...
>
> blockmode global_topsoil_texture_43200x21600.xyz
> -Rglobal_slmask_1152x721_gt600kmsq.grd -E -Q >
>
> -66.4625        18.75   14      0       14      14
> -66.4625        18.5    11.5    3.7065  9       14
> -66.4625        18.25   9       4.4478  6       12
> -66.4625        18      11.5    3.7065  9       14
> -66.4625        8.75    5.5     5.1891  2       9
> -66.4625        8.5     5.5     5.1891  2       9
> -66.4625        -18     6       0       6       16
> -66.4625        -18.25  6       0       6       16
> -66.4625        -18.5   6       0       6       16
> -66.4625        -18.75  6       0       6       16
> -66.4625        -21.75  9.5     5.1891  4       13
> -66.4625        -34.75  5.5     5.1891  2       9
> -66.4625        -35     5.5     5.1891  2       9
> -66.4625        -35.25  5.5     5.1891  2       9
> -66.4625        -35.5   5.5     5.1891  2       9
> -66.4625        -35.75  5.5     5.1891  2       9
> -66.4625        -36     5.5     5.1891  2       9
>
> The process to reproduce this is quite involved, but I will put  
> together
> a recipe if needed.
>
> Thanks
> Dan
>
> On Mon, 2009-10-26 at 16:39 -0500, Walter H. F. Smith wrote:
>> I think what you want is to sort the data into boxes sized to the new
>> smaller size, and then take the median or the mode in each box.  I
>> think you have to do this, because with the number representing an
>> index to a type, you can only have one index per box, and an average
>> doesn't make sense.
>>
>> So look at blockmode and blockmedian in GMT
>>
>>
>> Walter H F Smith (PhD)
>> Chairman, GEBCO TSCOM/SCDB
>> Geophysicist, Laboratory for Satellite Altimetry
>> NOAA NESDIS code E/RA-31
>> 1335 East West Hwy, room 5408
>> Silver Spring MD 20910-3226
>> tel 301-713-2857, extension 126
>> fax 301-713-3136
>> Walter.HF.Smith@...
>>
>>
>>
>> On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:
>>
>>> Hello all,
>>>
>>> I have a gridded 2D dataset that contains global (43200x21600) soil
>>> texture data.  Each 30-sec X 30-sec grid box contains a single  
>>> integer
>>> that represents the whole box.  There are 16 soil types  
>>> represented in
>>> the whole file.  I'd like to create a lower resolution (still  
>>> global)
>>> version of this file using a grid that is 1152x721.  The various
>>> linear
>>> and cubic schemes don't make sense given the 'index' nature of the
>>> data.
>>> Any ideas?
>>>
>>> Does GMT have a box-averaging scheme that includes voting.  (see the
>>> 'vt' option in)
>>> http://opengrads.org/doc/udxt/re/re.html
>>>
>>> --
>>> Dan Kokron
>>> Global Modeling and Assimilation Office
>>> NASA Goddard Space Flight Center
>>> Greenbelt, MD 20771
>>> Daniel.S.Kokron@...
>>> Phone: (301) 614-5192
>>> Fax:   (301) 614-5304
>>>
>>> To unsubscribe, send the message "signoff gmt-help" to
>>> listserv@...
>>
>> To unsubscribe, send the message "signoff gmt-help" to listserv@...
> --
>
> To unsubscribe, send the message "signoff gmt-help" to listserv@...
> <PR_mode_case.gif><PR_mode_case_1.gif><PR_mode_case_2.gif>

To unsubscribe, send the message "signoff gmt-help" to listserv@...

Re: regriggging 'index' data

by Dan Kokron :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Let me start by saying that I am not a C programmer.

Looking through blockmode.c I notice that when the 'E' option is active,
the code makes a sorted copy of the values in a particular target block
(z_tmp).  Is it safe to compute the mode (most frequent integer value)
on this array?

Dan

On Thu, 2009-10-29 at 14:06 -0500, Paul Wessel wrote:

> Hi Dan-
>
> blockmode is using the "least median of squares" algorithm to  
> approximate the mode, and this is defined as the half-point of the  
> shortest range that covers half of the data.  Since your data are  
> integers it would have been more optimal to simply pick a histogram  
> peak but for general data sets (floating point) there is no way to  
> select an optimum bin-width.  Currently, blockmode does not have an  
> option to apply a histogram algorithm to the data.  However, if you  
> roll up your sleeves for some serious scripting you could accomplish  
> what you need by invoking pshistogram on each data bin inside a loop.  
> pshistogram can return the counts instead of making a plot and you  
> could pull out the bin with the highest frequency (the mode) and build  
> a data set of modes.
>
> Cheers,
> Paul
>
> On Oct 29, 2009, at 4:54 AM, Dan Kokron wrote:
>
> >
> >
> > Thanks for the pointer to 'blockmode'.  It does seem to be doing the
> > 'right' thing most of the time, but it is giving me 'mean' values for
> > some blocks.  I guess this could happen if a target blocks ends up  
> > with
> > equal counts of two values, but that is not the case here.  I've
> > attached a series of images that I hope demonstrates the issue.
> >
> > The first image shows the a region of Puerto Rico that roughly
> > corresponds to a single grid box on the target (lower resolution)  
> > grid.
> > The squiggly line represents the coastline.  The northern part of the
> > image (purple) is land.
> >
> > The second image overlays the actual target grid box with the value  
> > that
> > 'blockmode' came up with (11.5).
> >
> > The third image overlays the native grid boxes with integer values for
> > each.  All the land points are 9 while all the water points are 14.
> > There are clearly more land points in the target grid box yet
> > 'blockmode' decided to do a simple mean of the high and low values.
> >
> > Does this seem reasonable?
> >
> > There are numerous other examples in this dataset where 'blockmode'  
> > took
> > simple means instead of mode values.  Here are a few from the xyz file
> > that was created using ...
> >
> > blockmode global_topsoil_texture_43200x21600.xyz
> > -Rglobal_slmask_1152x721_gt600kmsq.grd -E -Q >
> >
> > -66.4625        18.75   14      0       14      14
> > -66.4625        18.5    11.5    3.7065  9       14
> > -66.4625        18.25   9       4.4478  6       12
> > -66.4625        18      11.5    3.7065  9       14
> > -66.4625        8.75    5.5     5.1891  2       9
> > -66.4625        8.5     5.5     5.1891  2       9
> > -66.4625        -18     6       0       6       16
> > -66.4625        -18.25  6       0       6       16
> > -66.4625        -18.5   6       0       6       16
> > -66.4625        -18.75  6       0       6       16
> > -66.4625        -21.75  9.5     5.1891  4       13
> > -66.4625        -34.75  5.5     5.1891  2       9
> > -66.4625        -35     5.5     5.1891  2       9
> > -66.4625        -35.25  5.5     5.1891  2       9
> > -66.4625        -35.5   5.5     5.1891  2       9
> > -66.4625        -35.75  5.5     5.1891  2       9
> > -66.4625        -36     5.5     5.1891  2       9
> >
> > The process to reproduce this is quite involved, but I will put  
> > together
> > a recipe if needed.
> >
> > Thanks
> > Dan
> >
> > On Mon, 2009-10-26 at 16:39 -0500, Walter H. F. Smith wrote:
> >> I think what you want is to sort the data into boxes sized to the new
> >> smaller size, and then take the median or the mode in each box.  I
> >> think you have to do this, because with the number representing an
> >> index to a type, you can only have one index per box, and an average
> >> doesn't make sense.
> >>
> >> So look at blockmode and blockmedian in GMT
> >>
> >>
> >> Walter H F Smith (PhD)
> >> Chairman, GEBCO TSCOM/SCDB
> >> Geophysicist, Laboratory for Satellite Altimetry
> >> NOAA NESDIS code E/RA-31
> >> 1335 East West Hwy, room 5408
> >> Silver Spring MD 20910-3226
> >> tel 301-713-2857, extension 126
> >> fax 301-713-3136
> >> Walter.HF.Smith@...
> >>
> >>
> >>
> >> On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:
> >>
> >>> Hello all,
> >>>
> >>> I have a gridded 2D dataset that contains global (43200x21600) soil
> >>> texture data.  Each 30-sec X 30-sec grid box contains a single  
> >>> integer
> >>> that represents the whole box.  There are 16 soil types  
> >>> represented in
> >>> the whole file.  I'd like to create a lower resolution (still  
> >>> global)
> >>> version of this file using a grid that is 1152x721.  The various
> >>> linear
> >>> and cubic schemes don't make sense given the 'index' nature of the
> >>> data.
> >>> Any ideas?
> >>>
> >>> Does GMT have a box-averaging scheme that includes voting.  (see the
> >>> 'vt' option in)
> >>> http://opengrads.org/doc/udxt/re/re.html
> >>>
> >>> --
> >>> Dan Kokron
> >>> Global Modeling and Assimilation Office
> >>> NASA Goddard Space Flight Center
> >>> Greenbelt, MD 20771
> >>> Daniel.S.Kokron@...
> >>> Phone: (301) 614-5192
> >>> Fax:   (301) 614-5304
> >>>
> >>> To unsubscribe, send the message "signoff gmt-help" to
> >>> listserv@...
> >>
> >> To unsubscribe, send the message "signoff gmt-help" to listserv@...
> > --
> >
> > To unsubscribe, send the message "signoff gmt-help" to listserv@...
> > <PR_mode_case.gif><PR_mode_case_1.gif><PR_mode_case_2.gif>
>
> To unsubscribe, send the message "signoff gmt-help" to listserv@...
--
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron@...
Phone: (301) 614-5192
Fax:   (301) 614-5304

To unsubscribe, send the message "signoff gmt-help" to listserv@...

Re: regriggging 'index' data

by Walter H. F. Smith :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Dan,

The answer in your case may be "no".  You might be better off with  
block median, but there are still caveats.

Before I launch into a 50-minute lecture, here is a simple example.
Your data are integers representing index numbers to something else.  
Suppose that once your data are geographically sorted, the numbers  
that fall in a particular bin are:
{ 3, 11, 12, 12 }
Since these are just index values, we don't even really know whether  
all integers in some finite range are valid answers.  For example,  
there might not be any index # 7 or #8 or #9 or #10 in your set of  
possibilities.

The math used in these GMT tools is assuming that the data are  
quantities that lie on some sort of continuum.  It is assumed that  
each numerical value in the set is drawn from some distribution.  By  
sorting the data into non-decreasing order,
x[i] <= x[i+j] when i < i+j for all i,j such that x[i],x[i+j] is in  
the list
we can make a discrete approximation to the cumulative distribution  
function, C(x).  In an ideal world such as exists in the mind of  
those who write statistics textbooks, C(x) is a continuous function  
and ideally is differentiable.  The median can be defined as that x  
such that C(x_median) = 0.5, and the mode can be defined (if it is  
unique) as that x such that dC/dx is maximized at x_mode.

For weighted data the definition of the median is a bit more  
complicated but for unweighted data one can imagine simply sorting  
the list, and taking the middle value if there are an odd number of  
points in the list, or the average of the two values nearest the  
middle if there are an even number of points in the list.  Note that,  
if in your case, the number of points in each bin is sure to be odd  
(because of the input and output grid sizes being carefully chosen),  
then the median will always result in a number which is already in  
your set, and thus will give you a safe result.  However, if the  
number of points in the list is even, as in the above example, then  
the output could be a floating point number that is not in your set  
(11.5, in the above example).

For the mode definition the process is more complicated.  We need to  
define a numerical approximation of dC/dx and search for its  
maximum.  To guard against a singularity we tend to take the finite  
difference x[i+j] - x[i] over all i for some j that is chosen large  
enough to avoid (we hope) a divide by zero, yet small enough to  
isolate a local mode.  In GMT j is set to N/2, with truncation if N  
is odd.  This has proven to be a safe bet most of the time.

The process (for unweighted data) is to loop over all i and find the  
one (or more) points where x[i+j] - x[i] is smallest.  The mode is  
the mid-point, 0.5* (x[i+j]+x[i]), for that i making the difference  
smallest.  In the case of several non-unique smallest intervals, the  
average of their midpoints is the mode used in GMT.  In the above  
example this formula leads to 11.5, although a human would have  
chosen 11.

I think that in your case if you can force the medians to work on odd  
numbers of inputs, that is the best solution for you.

--walter



Walter H F Smith (PhD)
Chairman, GEBCO TSCOM/SCDB
Geophysicist, Laboratory for Satellite Altimetry
NOAA NESDIS code E/RA-31
1335 East West Hwy, room 5408
Silver Spring MD 20910-3226
tel 301-713-2857, extension 126
fax 301-713-3136
Walter.HF.Smith@...



On Oct 30, 2009, at 11:38 AM, Dan Kokron wrote:

> Let me start by saying that I am not a C programmer.
>
> Looking through blockmode.c I notice that when the 'E' option is  
> active,
> the code makes a sorted copy of the values in a particular target  
> block
> (z_tmp).  Is it safe to compute the mode (most frequent integer value)
> on this array?
>
> Dan
>
> On Thu, 2009-10-29 at 14:06 -0500, Paul Wessel wrote:
>> Hi Dan-
>>
>> blockmode is using the "least median of squares" algorithm to
>> approximate the mode, and this is defined as the half-point of the
>> shortest range that covers half of the data.  Since your data are
>> integers it would have been more optimal to simply pick a histogram
>> peak but for general data sets (floating point) there is no way to
>> select an optimum bin-width.  Currently, blockmode does not have an
>> option to apply a histogram algorithm to the data.  However, if you
>> roll up your sleeves for some serious scripting you could accomplish
>> what you need by invoking pshistogram on each data bin inside a loop.
>> pshistogram can return the counts instead of making a plot and you
>> could pull out the bin with the highest frequency (the mode) and  
>> build
>> a data set of modes.
>>
>> Cheers,
>> Paul
>>
>> On Oct 29, 2009, at 4:54 AM, Dan Kokron wrote:
>>
>>>
>>>
>>> Thanks for the pointer to 'blockmode'.  It does seem to be doing the
>>> 'right' thing most of the time, but it is giving me 'mean' values  
>>> for
>>> some blocks.  I guess this could happen if a target blocks ends up
>>> with
>>> equal counts of two values, but that is not the case here.  I've
>>> attached a series of images that I hope demonstrates the issue.
>>>
>>> The first image shows the a region of Puerto Rico that roughly
>>> corresponds to a single grid box on the target (lower resolution)
>>> grid.
>>> The squiggly line represents the coastline.  The northern part of  
>>> the
>>> image (purple) is land.
>>>
>>> The second image overlays the actual target grid box with the value
>>> that
>>> 'blockmode' came up with (11.5).
>>>
>>> The third image overlays the native grid boxes with integer  
>>> values for
>>> each.  All the land points are 9 while all the water points are 14.
>>> There are clearly more land points in the target grid box yet
>>> 'blockmode' decided to do a simple mean of the high and low values.
>>>
>>> Does this seem reasonable?
>>>
>>> There are numerous other examples in this dataset where 'blockmode'
>>> took
>>> simple means instead of mode values.  Here are a few from the xyz  
>>> file
>>> that was created using ...
>>>
>>> blockmode global_topsoil_texture_43200x21600.xyz
>>> -Rglobal_slmask_1152x721_gt600kmsq.grd -E -Q >
>>>
>>> -66.4625        18.75   14      0       14      14
>>> -66.4625        18.5    11.5    3.7065  9       14
>>> -66.4625        18.25   9       4.4478  6       12
>>> -66.4625        18      11.5    3.7065  9       14
>>> -66.4625        8.75    5.5     5.1891  2       9
>>> -66.4625        8.5     5.5     5.1891  2       9
>>> -66.4625        -18     6       0       6       16
>>> -66.4625        -18.25  6       0       6       16
>>> -66.4625        -18.5   6       0       6       16
>>> -66.4625        -18.75  6       0       6       16
>>> -66.4625        -21.75  9.5     5.1891  4       13
>>> -66.4625        -34.75  5.5     5.1891  2       9
>>> -66.4625        -35     5.5     5.1891  2       9
>>> -66.4625        -35.25  5.5     5.1891  2       9
>>> -66.4625        -35.5   5.5     5.1891  2       9
>>> -66.4625        -35.75  5.5     5.1891  2       9
>>> -66.4625        -36     5.5     5.1891  2       9
>>>
>>> The process to reproduce this is quite involved, but I will put
>>> together
>>> a recipe if needed.
>>>
>>> Thanks
>>> Dan
>>>
>>> On Mon, 2009-10-26 at 16:39 -0500, Walter H. F. Smith wrote:
>>>> I think what you want is to sort the data into boxes sized to  
>>>> the new
>>>> smaller size, and then take the median or the mode in each box.  I
>>>> think you have to do this, because with the number representing an
>>>> index to a type, you can only have one index per box, and an  
>>>> average
>>>> doesn't make sense.
>>>>
>>>> So look at blockmode and blockmedian in GMT
>>>>
>>>>
>>>> Walter H F Smith (PhD)
>>>> Chairman, GEBCO TSCOM/SCDB
>>>> Geophysicist, Laboratory for Satellite Altimetry
>>>> NOAA NESDIS code E/RA-31
>>>> 1335 East West Hwy, room 5408
>>>> Silver Spring MD 20910-3226
>>>> tel 301-713-2857, extension 126
>>>> fax 301-713-3136
>>>> Walter.HF.Smith@...
>>>>
>>>>
>>>>
>>>> On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> I have a gridded 2D dataset that contains global (43200x21600)  
>>>>> soil
>>>>> texture data.  Each 30-sec X 30-sec grid box contains a single
>>>>> integer
>>>>> that represents the whole box.  There are 16 soil types
>>>>> represented in
>>>>> the whole file.  I'd like to create a lower resolution (still
>>>>> global)
>>>>> version of this file using a grid that is 1152x721.  The various
>>>>> linear
>>>>> and cubic schemes don't make sense given the 'index' nature of the
>>>>> data.
>>>>> Any ideas?
>>>>>
>>>>> Does GMT have a box-averaging scheme that includes voting.  
>>>>> (see the
>>>>> 'vt' option in)
>>>>> http://opengrads.org/doc/udxt/re/re.html
>>>>>
>>>>> --
>>>>> Dan Kokron
>>>>> Global Modeling and Assimilation Office
>>>>> NASA Goddard Space Flight Center
>>>>> Greenbelt, MD 20771
>>>>> Daniel.S.Kokron@...
>>>>> Phone: (301) 614-5192
>>>>> Fax:   (301) 614-5304
>>>>>
>>>>> To unsubscribe, send the message "signoff gmt-help" to
>>>>> listserv@...
>>>>
>>>> To unsubscribe, send the message "signoff gmt-help" to  
>>>> listserv@...
>>> --
>>>
>>> To unsubscribe, send the message "signoff gmt-help" to  
>>> listserv@...
>>> <PR_mode_case.gif><PR_mode_case_1.gif><PR_mode_case_2.gif>
>>
>> To unsubscribe, send the message "signoff gmt-help" to  
>> listserv@...
> --
> Dan Kokron
> Global Modeling and Assimilation Office
> NASA Goddard Space Flight Center
> Greenbelt, MD 20771
> Daniel.S.Kokron@...
> Phone: (301) 614-5192
> Fax:   (301) 614-5304
>
> To unsubscribe, send the message "signoff gmt-help" to  
> listserv@...

To unsubscribe, send the message "signoff gmt-help" to listserv@...

Re: regriggging 'index' data

by Walter H. F. Smith :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> In the above example this formula leads to 11.5, although a human  
> would have chosen 11.

I meant, of course, the human would have chosen 12.  Sorry.

w


On Oct 30, 2009, at 1:32 PM, Walter H. F. Smith wrote:

> Dan,
>
> The answer in your case may be "no".  You might be better off with  
> block median, but there are still caveats.
>
> Before I launch into a 50-minute lecture, here is a simple example.
> Your data are integers representing index numbers to something  
> else.  Suppose that once your data are geographically sorted, the  
> numbers that fall in a particular bin are:
> { 3, 11, 12, 12 }
> Since these are just index values, we don't even really know  
> whether all integers in some finite range are valid answers.  For  
> example, there might not be any index # 7 or #8 or #9 or #10 in  
> your set of possibilities.
>
> The math used in these GMT tools is assuming that the data are  
> quantities that lie on some sort of continuum.  It is assumed that  
> each numerical value in the set is drawn from some distribution.  
> By sorting the data into non-decreasing order,
> x[i] <= x[i+j] when i < i+j for all i,j such that x[i],x[i+j] is in  
> the list
> we can make a discrete approximation to the cumulative distribution  
> function, C(x).  In an ideal world such as exists in the mind of  
> those who write statistics textbooks, C(x) is a continuous function  
> and ideally is differentiable.  The median can be defined as that x  
> such that C(x_median) = 0.5, and the mode can be defined (if it is  
> unique) as that x such that dC/dx is maximized at x_mode.
>
> For weighted data the definition of the median is a bit more  
> complicated but for unweighted data one can imagine simply sorting  
> the list, and taking the middle value if there are an odd number of  
> points in the list, or the average of the two values nearest the  
> middle if there are an even number of points in the list.  Note  
> that, if in your case, the number of points in each bin is sure to  
> be odd (because of the input and output grid sizes being carefully  
> chosen), then the median will always result in a number which is  
> already in your set, and thus will give you a safe result.  
> However, if the number of points in the list is even, as in the  
> above example, then the output could be a floating point number  
> that is not in your set (11.5, in the above example).
>
> For the mode definition the process is more complicated.  We need  
> to define a numerical approximation of dC/dx and search for its  
> maximum.  To guard against a singularity we tend to take the finite  
> difference x[i+j] - x[i] over all i for some j that is chosen large  
> enough to avoid (we hope) a divide by zero, yet small enough to  
> isolate a local mode.  In GMT j is set to N/2, with truncation if N  
> is odd.  This has proven to be a safe bet most of the time.
>
> The process (for unweighted data) is to loop over all i and find  
> the one (or more) points where x[i+j] - x[i] is smallest.  The mode  
> is the mid-point, 0.5* (x[i+j]+x[i]), for that i making the  
> difference smallest.  In the case of several non-unique smallest  
> intervals, the average of their midpoints is the mode used in GMT.  
> In the above example this formula leads to 11.5, although a human  
> would have chosen 11.
>
> I think that in your case if you can force the medians to work on  
> odd numbers of inputs, that is the best solution for you.
>
> --walter
>
>
>
> Walter H F Smith (PhD)
> Chairman, GEBCO TSCOM/SCDB
> Geophysicist, Laboratory for Satellite Altimetry
> NOAA NESDIS code E/RA-31
> 1335 East West Hwy, room 5408
> Silver Spring MD 20910-3226
> tel 301-713-2857, extension 126
> fax 301-713-3136
> Walter.HF.Smith@...
>
>
>
> On Oct 30, 2009, at 11:38 AM, Dan Kokron wrote:
>
>> Let me start by saying that I am not a C programmer.
>>
>> Looking through blockmode.c I notice that when the 'E' option is  
>> active,
>> the code makes a sorted copy of the values in a particular target  
>> block
>> (z_tmp).  Is it safe to compute the mode (most frequent integer  
>> value)
>> on this array?
>>
>> Dan
>>
>> On Thu, 2009-10-29 at 14:06 -0500, Paul Wessel wrote:
>>> Hi Dan-
>>>
>>> blockmode is using the "least median of squares" algorithm to
>>> approximate the mode, and this is defined as the half-point of the
>>> shortest range that covers half of the data.  Since your data are
>>> integers it would have been more optimal to simply pick a histogram
>>> peak but for general data sets (floating point) there is no way to
>>> select an optimum bin-width.  Currently, blockmode does not have an
>>> option to apply a histogram algorithm to the data.  However, if you
>>> roll up your sleeves for some serious scripting you could accomplish
>>> what you need by invoking pshistogram on each data bin inside a  
>>> loop.
>>> pshistogram can return the counts instead of making a plot and you
>>> could pull out the bin with the highest frequency (the mode) and  
>>> build
>>> a data set of modes.
>>>
>>> Cheers,
>>> Paul
>>>
>>> On Oct 29, 2009, at 4:54 AM, Dan Kokron wrote:
>>>
>>>>
>>>>
>>>> Thanks for the pointer to 'blockmode'.  It does seem to be doing  
>>>> the
>>>> 'right' thing most of the time, but it is giving me 'mean'  
>>>> values for
>>>> some blocks.  I guess this could happen if a target blocks ends up
>>>> with
>>>> equal counts of two values, but that is not the case here.  I've
>>>> attached a series of images that I hope demonstrates the issue.
>>>>
>>>> The first image shows the a region of Puerto Rico that roughly
>>>> corresponds to a single grid box on the target (lower resolution)
>>>> grid.
>>>> The squiggly line represents the coastline.  The northern part  
>>>> of the
>>>> image (purple) is land.
>>>>
>>>> The second image overlays the actual target grid box with the value
>>>> that
>>>> 'blockmode' came up with (11.5).
>>>>
>>>> The third image overlays the native grid boxes with integer  
>>>> values for
>>>> each.  All the land points are 9 while all the water points are 14.
>>>> There are clearly more land points in the target grid box yet
>>>> 'blockmode' decided to do a simple mean of the high and low values.
>>>>
>>>> Does this seem reasonable?
>>>>
>>>> There are numerous other examples in this dataset where 'blockmode'
>>>> took
>>>> simple means instead of mode values.  Here are a few from the  
>>>> xyz file
>>>> that was created using ...
>>>>
>>>> blockmode global_topsoil_texture_43200x21600.xyz
>>>> -Rglobal_slmask_1152x721_gt600kmsq.grd -E -Q >
>>>>
>>>> -66.4625        18.75   14      0       14      14
>>>> -66.4625        18.5    11.5    3.7065  9       14
>>>> -66.4625        18.25   9       4.4478  6       12
>>>> -66.4625        18      11.5    3.7065  9       14
>>>> -66.4625        8.75    5.5     5.1891  2       9
>>>> -66.4625        8.5     5.5     5.1891  2       9
>>>> -66.4625        -18     6       0       6       16
>>>> -66.4625        -18.25  6       0       6       16
>>>> -66.4625        -18.5   6       0       6       16
>>>> -66.4625        -18.75  6       0       6       16
>>>> -66.4625        -21.75  9.5     5.1891  4       13
>>>> -66.4625        -34.75  5.5     5.1891  2       9
>>>> -66.4625        -35     5.5     5.1891  2       9
>>>> -66.4625        -35.25  5.5     5.1891  2       9
>>>> -66.4625        -35.5   5.5     5.1891  2       9
>>>> -66.4625        -35.75  5.5     5.1891  2       9
>>>> -66.4625        -36     5.5     5.1891  2       9
>>>>
>>>> The process to reproduce this is quite involved, but I will put
>>>> together
>>>> a recipe if needed.
>>>>
>>>> Thanks
>>>> Dan
>>>>
>>>> On Mon, 2009-10-26 at 16:39 -0500, Walter H. F. Smith wrote:
>>>>> I think what you want is to sort the data into boxes sized to  
>>>>> the new
>>>>> smaller size, and then take the median or the mode in each box.  I
>>>>> think you have to do this, because with the number representing an
>>>>> index to a type, you can only have one index per box, and an  
>>>>> average
>>>>> doesn't make sense.
>>>>>
>>>>> So look at blockmode and blockmedian in GMT
>>>>>
>>>>>
>>>>> Walter H F Smith (PhD)
>>>>> Chairman, GEBCO TSCOM/SCDB
>>>>> Geophysicist, Laboratory for Satellite Altimetry
>>>>> NOAA NESDIS code E/RA-31
>>>>> 1335 East West Hwy, room 5408
>>>>> Silver Spring MD 20910-3226
>>>>> tel 301-713-2857, extension 126
>>>>> fax 301-713-3136
>>>>> Walter.HF.Smith@...
>>>>>
>>>>>
>>>>>
>>>>> On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I have a gridded 2D dataset that contains global (43200x21600)  
>>>>>> soil
>>>>>> texture data.  Each 30-sec X 30-sec grid box contains a single
>>>>>> integer
>>>>>> that represents the whole box.  There are 16 soil types
>>>>>> represented in
>>>>>> the whole file.  I'd like to create a lower resolution (still
>>>>>> global)
>>>>>> version of this file using a grid that is 1152x721.  The various
>>>>>> linear
>>>>>> and cubic schemes don't make sense given the 'index' nature of  
>>>>>> the
>>>>>> data.
>>>>>> Any ideas?
>>>>>>
>>>>>> Does GMT have a box-averaging scheme that includes voting.  
>>>>>> (see the
>>>>>> 'vt' option in)
>>>>>> http://opengrads.org/doc/udxt/re/re.html
>>>>>>
>>>>>> --
>>>>>> Dan Kokron
>>>>>> Global Modeling and Assimilation Office
>>>>>> NASA Goddard Space Flight Center
>>>>>> Greenbelt, MD 20771
>>>>>> Daniel.S.Kokron@...
>>>>>> Phone: (301) 614-5192
>>>>>> Fax:   (301) 614-5304
>>>>>>
>>>>>> To unsubscribe, send the message "signoff gmt-help" to
>>>>>> listserv@...
>>>>>
>>>>> To unsubscribe, send the message "signoff gmt-help" to  
>>>>> listserv@...
>>>> --
>>>>
>>>> To unsubscribe, send the message "signoff gmt-help" to  
>>>> listserv@...
>>>> <PR_mode_case.gif><PR_mode_case_1.gif><PR_mode_case_2.gif>
>>>
>>> To unsubscribe, send the message "signoff gmt-help" to  
>>> listserv@...
>> --
>> Dan Kokron
>> Global Modeling and Assimilation Office
>> NASA Goddard Space Flight Center
>> Greenbelt, MD 20771
>> Daniel.S.Kokron@...
>> Phone: (301) 614-5192
>> Fax:   (301) 614-5304
>>
>> To unsubscribe, send the message "signoff gmt-help" to  
>> listserv@...
>
> To unsubscribe, send the message "signoff gmt-help" to  
> listserv@...

To unsubscribe, send the message "signoff gmt-help" to listserv@...

Re: regriggging 'index' data

by Remko Scharroo :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

As always, Walter, thanks for the lecture. Enlightning as always.

I want to point out a bit stronger that doing a median search is of no  
use to Dan, since he is dealing with categorical data, which most  
likely cannot be sorted.

If the indexes 3, 10, 11, and 12 mean 3, 10, 11 and 12 trees per acre,  
then you can do a median test. But 3, 10, 11, 12 may mean, "river  
bed", "tundra", "salt flat" and "parking lot", which is not a sortable  
list, and hence the median (nor the mean) makes any sense. So Dan is  
right that he needs the mode to figure if in a certain section we have  
mostly "parking lot", for example.

I'm glad Walter explained how the "mode" in a "continuous series" can  
be found by looking for the largest derivative of the cumulative  
distribution function C(x). That would also work for your index  
values, even if C(x) itself doesn't really make any sense.

But blockmode is actually not doing the derivative at all, but is  
looking for the least difference between x[i+j] - x[i] where j = N/2  
(half the number of points in the block, truncated). The mode is then  
the midpoint, 0.5 * (x[i+j] + x[i]). Looking for an even number of  
points in a block, though laudable, is of no use in this matter. x[i
+j] and x[i] are realizations, index numbers, so taking the average of  
"tundra" (10) and "parking lot" (11) ends up to be "salt flat" (12).

Blockmean should have an option to do a true mode on integer values,  
simply by sorting (as now) and looking for the largest interval j  
where x[i+j] - x[i] < SMALL (any small number < 1 will do, 0.5 is  
fine). In fact this works too for continuous data, where the user  
could set SMALL to whatever limit. The mode would then be 0.5 * (x[i
+j] + x[i]), which in the integer case will be x[i+j] = x[i].
That is a minor change in the code, that Dan may be able to implement  
and we're happy to make that part of blockmode.

Remko


On 30 Oct 2009, at 13:32 , Walter H. F. Smith wrote:

> Dan,
>
> The answer in your case may be "no".  You might be better off with  
> block median, but there are still caveats.
>
> Before I launch into a 50-minute lecture, here is a simple example.
> Your data are integers representing index numbers to something  
> else.  Suppose that once your data are geographically sorted, the  
> numbers that fall in a particular bin are:
> { 3, 11, 12, 12 }
> Since these are just index values, we don't even really know whether  
> all integers in some finite range are valid answers.  For example,  
> there might not be any index # 7 or #8 or #9 or #10 in your set of  
> possibilities.
>
> The math used in these GMT tools is assuming that the data are  
> quantities that lie on some sort of continuum.  It is assumed that  
> each numerical value in the set is drawn from some distribution.  By  
> sorting the data into non-decreasing order,
> x[i] <= x[i+j] when i < i+j for all i,j such that x[i],x[i+j] is in  
> the list
> we can make a discrete approximation to the cumulative distribution  
> function, C(x).  In an ideal world such as exists in the mind of  
> those who write statistics textbooks, C(x) is a continuous function  
> and ideally is differentiable.  The median can be defined as that x  
> such that C(x_median) = 0.5, and the mode can be defined (if it is  
> unique) as that x such that dC/dx is maximized at x_mode.
>
> For weighted data the definition of the median is a bit more  
> complicated but for unweighted data one can imagine simply sorting  
> the list, and taking the middle value if there are an odd number of  
> points in the list, or the average of the two values nearest the  
> middle if there are an even number of points in the list.  Note  
> that, if in your case, the number of points in each bin is sure to  
> be odd (because of the input and output grid sizes being carefully  
> chosen), then the median will always result in a number which is  
> already in your set, and thus will give you a safe result.  However,  
> if the number of points in the list is even, as in the above  
> example, then the output could be a floating point number that is  
> not in your set (11.5, in the above example).
>
> For the mode definition the process is more complicated.  We need to  
> define a numerical approximation of dC/dx and search for its  
> maximum.  To guard against a singularity we tend to take the finite  
> difference x[i+j] - x[i] over all i for some j that is chosen large  
> enough to avoid (we hope) a divide by zero, yet small enough to  
> isolate a local mode.  In GMT j is set to N/2, with truncation if N  
> is odd.  This has proven to be a safe bet most of the time.
>
> The process (for unweighted data) is to loop over all i and find the  
> one (or more) points where x[i+j] - x[i] is smallest.  The mode is  
> the mid-point, 0.5* (x[i+j]+x[i]), for that i making the difference  
> smallest.  In the case of several non-unique smallest intervals, the  
> average of their midpoints is the mode used in GMT.  In the above  
> example this formula leads to 11.5, although a human would have  
> chosen 11.
>
> I think that in your case if you can force the medians to work on  
> odd numbers of inputs, that is the best solution for you.
>
> --walter
>
>
>
> Walter H F Smith (PhD)
> Chairman, GEBCO TSCOM/SCDB
> Geophysicist, Laboratory for Satellite Altimetry
> NOAA NESDIS code E/RA-31
> 1335 East West Hwy, room 5408
> Silver Spring MD 20910-3226
> tel 301-713-2857, extension 126
> fax 301-713-3136
> Walter.HF.Smith@...
>
>
>
> On Oct 30, 2009, at 11:38 AM, Dan Kokron wrote:
>
>> Let me start by saying that I am not a C programmer.
>>
>> Looking through blockmode.c I notice that when the 'E' option is  
>> active,
>> the code makes a sorted copy of the values in a particular target  
>> block
>> (z_tmp).  Is it safe to compute the mode (most frequent integer  
>> value)
>> on this array?
>>
>> Dan
>>
>> On Thu, 2009-10-29 at 14:06 -0500, Paul Wessel wrote:
>>> Hi Dan-
>>>
>>> blockmode is using the "least median of squares" algorithm to
>>> approximate the mode, and this is defined as the half-point of the
>>> shortest range that covers half of the data.  Since your data are
>>> integers it would have been more optimal to simply pick a histogram
>>> peak but for general data sets (floating point) there is no way to
>>> select an optimum bin-width.  Currently, blockmode does not have an
>>> option to apply a histogram algorithm to the data.  However, if you
>>> roll up your sleeves for some serious scripting you could accomplish
>>> what you need by invoking pshistogram on each data bin inside a  
>>> loop.
>>> pshistogram can return the counts instead of making a plot and you
>>> could pull out the bin with the highest frequency (the mode) and  
>>> build
>>> a data set of modes.
>>>
>>> Cheers,
>>> Paul
>>>
>>> On Oct 29, 2009, at 4:54 AM, Dan Kokron wrote:
>>>
>>>>
>>>>
>>>> Thanks for the pointer to 'blockmode'.  It does seem to be doing  
>>>> the
>>>> 'right' thing most of the time, but it is giving me 'mean' values  
>>>> for
>>>> some blocks.  I guess this could happen if a target blocks ends up
>>>> with
>>>> equal counts of two values, but that is not the case here.  I've
>>>> attached a series of images that I hope demonstrates the issue.
>>>>
>>>> The first image shows the a region of Puerto Rico that roughly
>>>> corresponds to a single grid box on the target (lower resolution)
>>>> grid.
>>>> The squiggly line represents the coastline.  The northern part of  
>>>> the
>>>> image (purple) is land.
>>>>
>>>> The second image overlays the actual target grid box with the value
>>>> that
>>>> 'blockmode' came up with (11.5).
>>>>
>>>> The third image overlays the native grid boxes with integer  
>>>> values for
>>>> each.  All the land points are 9 while all the water points are 14.
>>>> There are clearly more land points in the target grid box yet
>>>> 'blockmode' decided to do a simple mean of the high and low values.
>>>>
>>>> Does this seem reasonable?
>>>>
>>>> There are numerous other examples in this dataset where 'blockmode'
>>>> took
>>>> simple means instead of mode values.  Here are a few from the xyz  
>>>> file
>>>> that was created using ...
>>>>
>>>> blockmode global_topsoil_texture_43200x21600.xyz
>>>> -Rglobal_slmask_1152x721_gt600kmsq.grd -E -Q >
>>>>
>>>> -66.4625        18.75   14      0       14      14
>>>> -66.4625        18.5    11.5    3.7065  9       14
>>>> -66.4625        18.25   9       4.4478  6       12
>>>> -66.4625        18      11.5    3.7065  9       14
>>>> -66.4625        8.75    5.5     5.1891  2       9
>>>> -66.4625        8.5     5.5     5.1891  2       9
>>>> -66.4625        -18     6       0       6       16
>>>> -66.4625        -18.25  6       0       6       16
>>>> -66.4625        -18.5   6       0       6       16
>>>> -66.4625        -18.75  6       0       6       16
>>>> -66.4625        -21.75  9.5     5.1891  4       13
>>>> -66.4625        -34.75  5.5     5.1891  2       9
>>>> -66.4625        -35     5.5     5.1891  2       9
>>>> -66.4625        -35.25  5.5     5.1891  2       9
>>>> -66.4625        -35.5   5.5     5.1891  2       9
>>>> -66.4625        -35.75  5.5     5.1891  2       9
>>>> -66.4625        -36     5.5     5.1891  2       9
>>>>
>>>> The process to reproduce this is quite involved, but I will put
>>>> together
>>>> a recipe if needed.
>>>>
>>>> Thanks
>>>> Dan
>>>>
>>>> On Mon, 2009-10-26 at 16:39 -0500, Walter H. F. Smith wrote:
>>>>> I think what you want is to sort the data into boxes sized to  
>>>>> the new
>>>>> smaller size, and then take the median or the mode in each box.  I
>>>>> think you have to do this, because with the number representing an
>>>>> index to a type, you can only have one index per box, and an  
>>>>> average
>>>>> doesn't make sense.
>>>>>
>>>>> So look at blockmode and blockmedian in GMT
>>>>>
>>>>>
>>>>> Walter H F Smith (PhD)
>>>>> Chairman, GEBCO TSCOM/SCDB
>>>>> Geophysicist, Laboratory for Satellite Altimetry
>>>>> NOAA NESDIS code E/RA-31
>>>>> 1335 East West Hwy, room 5408
>>>>> Silver Spring MD 20910-3226
>>>>> tel 301-713-2857, extension 126
>>>>> fax 301-713-3136
>>>>> Walter.HF.Smith@...
>>>>>
>>>>>
>>>>>
>>>>> On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I have a gridded 2D dataset that contains global (43200x21600)  
>>>>>> soil
>>>>>> texture data.  Each 30-sec X 30-sec grid box contains a single
>>>>>> integer
>>>>>> that represents the whole box.  There are 16 soil types
>>>>>> represented in
>>>>>> the whole file.  I'd like to create a lower resolution (still
>>>>>> global)
>>>>>> version of this file using a grid that is 1152x721.  The various
>>>>>> linear
>>>>>> and cubic schemes don't make sense given the 'index' nature of  
>>>>>> the
>>>>>> data.
>>>>>> Any ideas?
>>>>>>
>>>>>> Does GMT have a box-averaging scheme that includes voting.  
>>>>>> (see the
>>>>>> 'vt' option in)
>>>>>> http://opengrads.org/doc/udxt/re/re.html
>>>>>>
>>>>>> --
>>>>>> Dan Kokron
>>>>>> Global Modeling and Assimilation Office
>>>>>> NASA Goddard Space Flight Center
>>>>>> Greenbelt, MD 20771
>>>>>> Daniel.S.Kokron@...
>>>>>> Phone: (301) 614-5192
>>>>>> Fax:   (301) 614-5304
>>>>>>
>>>>>> To unsubscribe, send the message "signoff gmt-help" to
>>>>>> listserv@...
>>>>>
>>>>> To unsubscribe, send the message "signoff gmt-help" to listserv@...
>>>> --
>>>>
>>>> To unsubscribe, send the message "signoff gmt-help" to listserv@...
>>>> <PR_mode_case.gif><PR_mode_case_1.gif><PR_mode_case_2.gif>
>>>
>>> To unsubscribe, send the message "signoff gmt-help" to listserv@...
>> --
>> Dan Kokron
>> Global Modeling and Assimilation Office
>> NASA Goddard Space Flight Center
>> Greenbelt, MD 20771
>> Daniel.S.Kokron@...
>> Phone: (301) 614-5192
>> Fax:   (301) 614-5304
>>
>> To unsubscribe, send the message "signoff gmt-help" to listserv@...
>
> To unsubscribe, send the message "signoff gmt-help" to listserv@...

To unsubscribe, send the message "signoff gmt-help" to listserv@...

Re: regriggging 'index' data

by Dan Kokron :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

This quote explains my situation very well.

"taking the average of "tundra" (10) and "parking lot" (11) ends up to
be "salt flat" (12)."

I'm working on a subroutine now and learning some C along the way.  It's
just a simple 'switch' structure which should work for my 16 category
problem.

If someone wants to code up a more general routine, I'd be happy to test
it. : )

Dan

On Fri, 2009-10-30 at 13:19 -0500, Remko Scharroo wrote:

> As always, Walter, thanks for the lecture. Enlightning as always.
>
> I want to point out a bit stronger that doing a median search is of no
> use to Dan, since he is dealing with categorical data, which most
> likely cannot be sorted.
>
> If the indexes 3, 10, 11, and 12 mean 3, 10, 11 and 12 trees per acre,
> then you can do a median test. But 3, 10, 11, 12 may mean, "river
> bed", "tundra", "salt flat" and "parking lot", which is not a sortable
> list, and hence the median (nor the mean) makes any sense. So Dan is
> right that he needs the mode to figure if in a certain section we have
> mostly "parking lot", for example.
>
> I'm glad Walter explained how the "mode" in a "continuous series" can
> be found by looking for the largest derivative of the cumulative
> distribution function C(x). That would also work for your index
> values, even if C(x) itself doesn't really make any sense.
>
> But blockmode is actually not doing the derivative at all, but is
> looking for the least difference between x[i+j] - x[i] where j = N/2
> (half the number of points in the block, truncated). The mode is then
> the midpoint, 0.5 * (x[i+j] + x[i]). Looking for an even number of
> points in a block, though laudable, is of no use in this matter. x[i
> +j] and x[i] are realizations, index numbers, so taking the average of
> "tundra" (10) and "parking lot" (11) ends up to be "salt flat" (12).
>
> Blockmean should have an option to do a true mode on integer values,
> simply by sorting (as now) and looking for the largest interval j
> where x[i+j] - x[i] < SMALL (any small number < 1 will do, 0.5 is
> fine). In fact this works too for continuous data, where the user
> could set SMALL to whatever limit. The mode would then be 0.5 * (x[i
> +j] + x[i]), which in the integer case will be x[i+j] = x[i].
> That is a minor change in the code, that Dan may be able to implement
> and we're happy to make that part of blockmode.
>
> Remko
>
>
> On 30 Oct 2009, at 13:32 , Walter H. F. Smith wrote:
>
> > Dan,
> >
> > The answer in your case may be "no".  You might be better off with
> > block median, but there are still caveats.
> >
> > Before I launch into a 50-minute lecture, here is a simple example.
> > Your data are integers representing index numbers to something
> > else.  Suppose that once your data are geographically sorted, the
> > numbers that fall in a particular bin are:
> > { 3, 11, 12, 12 }
> > Since these are just index values, we don't even really know whether
> > all integers in some finite range are valid answers.  For example,
> > there might not be any index # 7 or #8 or #9 or #10 in your set of
> > possibilities.
> >
> > The math used in these GMT tools is assuming that the data are
> > quantities that lie on some sort of continuum.  It is assumed that
> > each numerical value in the set is drawn from some distribution.  By
> > sorting the data into non-decreasing order,
> > x[i] <= x[i+j] when i < i+j for all i,j such that x[i],x[i+j] is in
> > the list
> > we can make a discrete approximation to the cumulative distribution
> > function, C(x).  In an ideal world such as exists in the mind of
> > those who write statistics textbooks, C(x) is a continuous function
> > and ideally is differentiable.  The median can be defined as that x
> > such that C(x_median) = 0.5, and the mode can be defined (if it is
> > unique) as that x such that dC/dx is maximized at x_mode.
> >
> > For weighted data the definition of the median is a bit more
> > complicated but for unweighted data one can imagine simply sorting
> > the list, and taking the middle value if there are an odd number of
> > points in the list, or the average of the two values nearest the
> > middle if there are an even number of points in the list.  Note
> > that, if in your case, the number of points in each bin is sure to
> > be odd (because of the input and output grid sizes being carefully
> > chosen), then the median will always result in a number which is
> > already in your set, and thus will give you a safe result.  However,
> > if the number of points in the list is even, as in the above
> > example, then the output could be a floating point number that is
> > not in your set (11.5, in the above example).
> >
> > For the mode definition the process is more complicated.  We need to
> > define a numerical approximation of dC/dx and search for its
> > maximum.  To guard against a singularity we tend to take the finite
> > difference x[i+j] - x[i] over all i for some j that is chosen large
> > enough to avoid (we hope) a divide by zero, yet small enough to
> > isolate a local mode.  In GMT j is set to N/2, with truncation if N
> > is odd.  This has proven to be a safe bet most of the time.
> >
> > The process (for unweighted data) is to loop over all i and find the
> > one (or more) points where x[i+j] - x[i] is smallest.  The mode is
> > the mid-point, 0.5* (x[i+j]+x[i]), for that i making the difference
> > smallest.  In the case of several non-unique smallest intervals, the
> > average of their midpoints is the mode used in GMT.  In the above
> > example this formula leads to 11.5, although a human would have
> > chosen 11.
> >
> > I think that in your case if you can force the medians to work on
> > odd numbers of inputs, that is the best solution for you.
> >
> > --walter
> >
> >
> >
> > Walter H F Smith (PhD)
> > Chairman, GEBCO TSCOM/SCDB
> > Geophysicist, Laboratory for Satellite Altimetry
> > NOAA NESDIS code E/RA-31
> > 1335 East West Hwy, room 5408
> > Silver Spring MD 20910-3226
> > tel 301-713-2857, extension 126
> > fax 301-713-3136
> > Walter.HF.Smith@...
> >
> >
> >
> > On Oct 30, 2009, at 11:38 AM, Dan Kokron wrote:
> >
> >> Let me start by saying that I am not a C programmer.
> >>
> >> Looking through blockmode.c I notice that when the 'E' option is
> >> active,
> >> the code makes a sorted copy of the values in a particular target
> >> block
> >> (z_tmp).  Is it safe to compute the mode (most frequent integer
> >> value)
> >> on this array?
> >>
> >> Dan
> >>
> >> On Thu, 2009-10-29 at 14:06 -0500, Paul Wessel wrote:
> >>> Hi Dan-
> >>>
> >>> blockmode is using the "least median of squares" algorithm to
> >>> approximate the mode, and this is defined as the half-point of the
> >>> shortest range that covers half of the data.  Since your data are
> >>> integers it would have been more optimal to simply pick a histogram
> >>> peak but for general data sets (floating point) there is no way to
> >>> select an optimum bin-width.  Currently, blockmode does not have an
> >>> option to apply a histogram algorithm to the data.  However, if you
> >>> roll up your sleeves for some serious scripting you could accomplish
> >>> what you need by invoking pshistogram on each data bin inside a
> >>> loop.
> >>> pshistogram can return the counts instead of making a plot and you
> >>> could pull out the bin with the highest frequency (the mode) and
> >>> build
> >>> a data set of modes.
> >>>
> >>> Cheers,
> >>> Paul
> >>>
> >>> On Oct 29, 2009, at 4:54 AM, Dan Kokron wrote:
> >>>
> >>>>
> >>>>
> >>>> Thanks for the pointer to 'blockmode'.  It does seem to be doing
> >>>> the
> >>>> 'right' thing most of the time, but it is giving me 'mean' values
> >>>> for
> >>>> some blocks.  I guess this could happen if a target blocks ends up
> >>>> with
> >>>> equal counts of two values, but that is not the case here.  I've
> >>>> attached a series of images that I hope demonstrates the issue.
> >>>>
> >>>> The first image shows the a region of Puerto Rico that roughly
> >>>> corresponds to a single grid box on the target (lower resolution)
> >>>> grid.
> >>>> The squiggly line represents the coastline.  The northern part of
> >>>> the
> >>>> image (purple) is land.
> >>>>
> >>>> The second image overlays the actual target grid box with the value
> >>>> that
> >>>> 'blockmode' came up with (11.5).
> >>>>
> >>>> The third image overlays the native grid boxes with integer
> >>>> values for
> >>>> each.  All the land points are 9 while all the water points are 14.
> >>>> There are clearly more land points in the target grid box yet
> >>>> 'blockmode' decided to do a simple mean of the high and low values.
> >>>>
> >>>> Does this seem reasonable?
> >>>>
> >>>> There are numerous other examples in this dataset where 'blockmode'
> >>>> took
> >>>> simple means instead of mode values.  Here are a few from the xyz
> >>>> file
> >>>> that was created using ...
> >>>>
> >>>> blockmode global_topsoil_texture_43200x21600.xyz
> >>>> -Rglobal_slmask_1152x721_gt600kmsq.grd -E -Q >
> >>>>
> >>>> -66.4625        18.75   14      0       14      14
> >>>> -66.4625        18.5    11.5    3.7065  9       14
> >>>> -66.4625        18.25   9       4.4478  6       12
> >>>> -66.4625        18      11.5    3.7065  9       14
> >>>> -66.4625        8.75    5.5     5.1891  2       9
> >>>> -66.4625        8.5     5.5     5.1891  2       9
> >>>> -66.4625        -18     6       0       6       16
> >>>> -66.4625        -18.25  6       0       6       16
> >>>> -66.4625        -18.5   6       0       6       16
> >>>> -66.4625        -18.75  6       0       6       16
> >>>> -66.4625        -21.75  9.5     5.1891  4       13
> >>>> -66.4625        -34.75  5.5     5.1891  2       9
> >>>> -66.4625        -35     5.5     5.1891  2       9
> >>>> -66.4625        -35.25  5.5     5.1891  2       9
> >>>> -66.4625        -35.5   5.5     5.1891  2       9
> >>>> -66.4625        -35.75  5.5     5.1891  2       9
> >>>> -66.4625        -36     5.5     5.1891  2       9
> >>>>
> >>>> The process to reproduce this is quite involved, but I will put
> >>>> together
> >>>> a recipe if needed.
> >>>>
> >>>> Thanks
> >>>> Dan
> >>>>
> >>>> On Mon, 2009-10-26 at 16:39 -0500, Walter H. F. Smith wrote:
> >>>>> I think what you want is to sort the data into boxes sized to
> >>>>> the new
> >>>>> smaller size, and then take the median or the mode in each box.  I
> >>>>> think you have to do this, because with the number representing an
> >>>>> index to a type, you can only have one index per box, and an
> >>>>> average
> >>>>> doesn't make sense.
> >>>>>
> >>>>> So look at blockmode and blockmedian in GMT
> >>>>>
> >>>>>
> >>>>> Walter H F Smith (PhD)
> >>>>> Chairman, GEBCO TSCOM/SCDB
> >>>>> Geophysicist, Laboratory for Satellite Altimetry
> >>>>> NOAA NESDIS code E/RA-31
> >>>>> 1335 East West Hwy, room 5408
> >>>>> Silver Spring MD 20910-3226
> >>>>> tel 301-713-2857, extension 126
> >>>>> fax 301-713-3136
> >>>>> Walter.HF.Smith@...
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:
> >>>>>
> >>>>>> Hello all,
> >>>>>>
> >>>>>> I have a gridded 2D dataset that contains global (43200x21600)
> >>>>>> soil
> >>>>>> texture data.  Each 30-sec X 30-sec grid box contains a single
> >>>>>> integer
> >>>>>> that represents the whole box.  There are 16 soil types
> >>>>>> represented in
> >>>>>> the whole file.  I'd like to create a lower resolution (still
> >>>>>> global)
> >>>>>> version of this file using a grid that is 1152x721.  The various
> >>>>>> linear
> >>>>>> and cubic schemes don't make sense given the 'index' nature of
> >>>>>> the
> >>>>>> data.
> >>>>>> Any ideas?
> >>>>>>
> >>>>>> Does GMT have a box-averaging scheme that includes voting.
> >>>>>> (see the
> >>>>>> 'vt' option in)
> >>>>>> http://opengrads.org/doc/udxt/re/re.html
> >>>>>>
> >>>>>> --
> >>>>>> Dan Kokron
> >>>>>> Global Modeling and Assimilation Office
> >>>>>> NASA Goddard Space Flight Center
> >>>>>> Greenbelt, MD 20771
> >>>>>> Daniel.S.Kokron@...
> >>>>>> Phone: (301) 614-5192
> >>>>>> Fax:   (301) 614-5304
> >>>>>>
> >>>>>> To unsubscribe, send the message "signoff gmt-help" to
> >>>>>> listserv@...
> >>>>>
> >>>>> To unsubscribe, send the message "signoff gmt-help" to listserv@...
> >>>> --
> >>>>
> >>>> To unsubscribe, send the message "signoff gmt-help" to listserv@...
> >>>> <PR_mode_case.gif><PR_mode_case_1.gif><PR_mode_case_2.gif>
> >>>
> >>> To unsubscribe, send the message "signoff gmt-help" to listserv@...
> >> --
> >> Dan Kokron
> >> Global Modeling and Assimilation Office
> >> NASA Goddard Space Flight Center
> >> Greenbelt, MD 20771
> >> Daniel.S.Kokron@...
> >> Phone: (301) 614-5192
> >> Fax:   (301) 614-5304
> >>
> >> To unsubscribe, send the message "signoff gmt-help" to listserv@...
> >
> > To unsubscribe, send the message "signoff gmt-help" to listserv@...
>
> To unsubscribe, send the message "signoff gmt-help" to listserv@...
--
Dan Kokron
Global Modeling and Assimilation Office
NASA Goddard Space Flight Center
Greenbelt, MD 20771
Daniel.S.Kokron@...
Phone: (301) 614-5192
Fax:   (301) 614-5304

To unsubscribe, send the message "signoff gmt-help" to listserv@...

Re: regriggging 'index' data

by Walter H. F. Smith :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Remko is right.  But the median still might not be a bad approach.

If 3 means rock and 7 means tree and 11 means hamburger and 28 means  
pizza, then the median of these numbers is nonsense.

The user has a high-resolution (I mean small grid spacing) grid, at  
each point of which an index category was assigned.  He wanted to  
make a lower resolution (larger grid spacing) grid, and wanted to  
know how to assign category numbers to the low resolution grid that  
would properly choose one of the valid categories.

This would be an operation for a mode, if we had a way of getting a  
mode that was sure to choose one of the categories in the list.  As I  
explained, the mode operator we have in GMT won't do that.

If the data categories are chosen so that there aren't big gaps among  
the valid integers used, and if "nearby" integers represent nearby  
categories, then the median may be a good choice, because it may be  
"close" to what an ideal mode might have been.

But yes to what Remko said, it is possible to choose the index values  
so as to make the median idea truly awful.

Another approach is just to down-sample the grid naively, using  
grdsample, being sure that the new grid points lie on top of a subset  
of the pre-existing grid points, so no interpolation is done.  This  
is sure to work, but it will "alias" the data in some way.  (What  
aliasing means in the context of category indices is an exercise for  
the reader.)

walter



Walter H F Smith (PhD)
Chairman, GEBCO TSCOM/SCDB
Geophysicist, Laboratory for Satellite Altimetry
NOAA NESDIS code E/RA-31
1335 East West Hwy, room 5408
Silver Spring MD 20910-3226
tel 301-713-2857, extension 126
fax 301-713-3136
Walter.HF.Smith@...



On Oct 30, 2009, at 2:19 PM, Remko Scharroo wrote:

> As always, Walter, thanks for the lecture. Enlightning as always.
>
> I want to point out a bit stronger that doing a median search is of  
> no use to Dan, since he is dealing with categorical data, which  
> most likely cannot be sorted.
>
> If the indexes 3, 10, 11, and 12 mean 3, 10, 11 and 12 trees per  
> acre, then you can do a median test. But 3, 10, 11, 12 may mean,  
> "river bed", "tundra", "salt flat" and "parking lot", which is not  
> a sortable list, and hence the median (nor the mean) makes any  
> sense. So Dan is right that he needs the mode to figure if in a  
> certain section we have mostly "parking lot", for example.
>
> I'm glad Walter explained how the "mode" in a "continuous series"  
> can be found by looking for the largest derivative of the  
> cumulative distribution function C(x). That would also work for  
> your index values, even if C(x) itself doesn't really make any sense.
>
> But blockmode is actually not doing the derivative at all, but is  
> looking for the least difference between x[i+j] - x[i] where j = N/
> 2 (half the number of points in the block, truncated). The mode is  
> then the midpoint, 0.5 * (x[i+j] + x[i]). Looking for an even  
> number of points in a block, though laudable, is of no use in this  
> matter. x[i+j] and x[i] are realizations, index numbers, so taking  
> the average of "tundra" (10) and "parking lot" (11) ends up to be  
> "salt flat" (12).
>
> Blockmean should have an option to do a true mode on integer  
> values, simply by sorting (as now) and looking for the largest  
> interval j where x[i+j] - x[i] < SMALL (any small number < 1 will  
> do, 0.5 is fine). In fact this works too for continuous data, where  
> the user could set SMALL to whatever limit. The mode would then be  
> 0.5 * (x[i+j] + x[i]), which in the integer case will be x[i+j] = x
> [i].
> That is a minor change in the code, that Dan may be able to  
> implement and we're happy to make that part of blockmode.
>
> Remko
>
>
> On 30 Oct 2009, at 13:32 , Walter H. F. Smith wrote:
>
>> Dan,
>>
>> The answer in your case may be "no".  You might be better off with  
>> block median, but there are still caveats.
>>
>> Before I launch into a 50-minute lecture, here is a simple example.
>> Your data are integers representing index numbers to something  
>> else.  Suppose that once your data are geographically sorted, the  
>> numbers that fall in a particular bin are:
>> { 3, 11, 12, 12 }
>> Since these are just index values, we don't even really know  
>> whether all integers in some finite range are valid answers.  For  
>> example, there might not be any index # 7 or #8 or #9 or #10 in  
>> your set of possibilities.
>>
>> The math used in these GMT tools is assuming that the data are  
>> quantities that lie on some sort of continuum.  It is assumed that  
>> each numerical value in the set is drawn from some distribution.  
>> By sorting the data into non-decreasing order,
>> x[i] <= x[i+j] when i < i+j for all i,j such that x[i],x[i+j] is  
>> in the list
>> we can make a discrete approximation to the cumulative  
>> distribution function, C(x).  In an ideal world such as exists in  
>> the mind of those who write statistics textbooks, C(x) is a  
>> continuous function and ideally is differentiable.  The median can  
>> be defined as that x such that C(x_median) = 0.5, and the mode can  
>> be defined (if it is unique) as that x such that dC/dx is  
>> maximized at x_mode.
>>
>> For weighted data the definition of the median is a bit more  
>> complicated but for unweighted data one can imagine simply sorting  
>> the list, and taking the middle value if there are an odd number  
>> of points in the list, or the average of the two values nearest  
>> the middle if there are an even number of points in the list.  
>> Note that, if in your case, the number of points in each bin is  
>> sure to be odd (because of the input and output grid sizes being  
>> carefully chosen), then the median will always result in a number  
>> which is already in your set, and thus will give you a safe  
>> result.  However, if the number of points in the list is even, as  
>> in the above example, then the output could be a floating point  
>> number that is not in your set (11.5, in the above example).
>>
>> For the mode definition the process is more complicated.  We need  
>> to define a numerical approximation of dC/dx and search for its  
>> maximum.  To guard against a singularity we tend to take the  
>> finite difference x[i+j] - x[i] over all i for some j that is  
>> chosen large enough to avoid (we hope) a divide by zero, yet small  
>> enough to isolate a local mode.  In GMT j is set to N/2, with  
>> truncation if N is odd.  This has proven to be a safe bet most of  
>> the time.
>>
>> The process (for unweighted data) is to loop over all i and find  
>> the one (or more) points where x[i+j] - x[i] is smallest.  The  
>> mode is the mid-point, 0.5* (x[i+j]+x[i]), for that i making the  
>> difference smallest.  In the case of several non-unique smallest  
>> intervals, the average of their midpoints is the mode used in  
>> GMT.  In the above example this formula leads to 11.5, although a  
>> human would have chosen 11.
>>
>> I think that in your case if you can force the medians to work on  
>> odd numbers of inputs, that is the best solution for you.
>>
>> --walter
>>
>>
>>
>> Walter H F Smith (PhD)
>> Chairman, GEBCO TSCOM/SCDB
>> Geophysicist, Laboratory for Satellite Altimetry
>> NOAA NESDIS code E/RA-31
>> 1335 East West Hwy, room 5408
>> Silver Spring MD 20910-3226
>> tel 301-713-2857, extension 126
>> fax 301-713-3136
>> Walter.HF.Smith@...
>>
>>
>>
>> On Oct 30, 2009, at 11:38 AM, Dan Kokron wrote:
>>
>>> Let me start by saying that I am not a C programmer.
>>>
>>> Looking through blockmode.c I notice that when the 'E' option is  
>>> active,
>>> the code makes a sorted copy of the values in a particular target  
>>> block
>>> (z_tmp).  Is it safe to compute the mode (most frequent integer  
>>> value)
>>> on this array?
>>>
>>> Dan
>>>
>>> On Thu, 2009-10-29 at 14:06 -0500, Paul Wessel wrote:
>>>> Hi Dan-
>>>>
>>>> blockmode is using the "least median of squares" algorithm to
>>>> approximate the mode, and this is defined as the half-point of the
>>>> shortest range that covers half of the data.  Since your data are
>>>> integers it would have been more optimal to simply pick a histogram
>>>> peak but for general data sets (floating point) there is no way to
>>>> select an optimum bin-width.  Currently, blockmode does not have an
>>>> option to apply a histogram algorithm to the data.  However, if you
>>>> roll up your sleeves for some serious scripting you could  
>>>> accomplish
>>>> what you need by invoking pshistogram on each data bin inside a  
>>>> loop.
>>>> pshistogram can return the counts instead of making a plot and you
>>>> could pull out the bin with the highest frequency (the mode) and  
>>>> build
>>>> a data set of modes.
>>>>
>>>> Cheers,
>>>> Paul
>>>>
>>>> On Oct 29, 2009, at 4:54 AM, Dan Kokron wrote:
>>>>
>>>>>
>>>>>
>>>>> Thanks for the pointer to 'blockmode'.  It does seem to be  
>>>>> doing the
>>>>> 'right' thing most of the time, but it is giving me 'mean'  
>>>>> values for
>>>>> some blocks.  I guess this could happen if a target blocks ends up
>>>>> with
>>>>> equal counts of two values, but that is not the case here.  I've
>>>>> attached a series of images that I hope demonstrates the issue.
>>>>>
>>>>> The first image shows the a region of Puerto Rico that roughly
>>>>> corresponds to a single grid box on the target (lower resolution)
>>>>> grid.
>>>>> The squiggly line represents the coastline.  The northern part  
>>>>> of the
>>>>> image (purple) is land.
>>>>>
>>>>> The second image overlays the actual target grid box with the  
>>>>> value
>>>>> that
>>>>> 'blockmode' came up with (11.5).
>>>>>
>>>>> The third image overlays the native grid boxes with integer  
>>>>> values for
>>>>> each.  All the land points are 9 while all the water points are  
>>>>> 14.
>>>>> There are clearly more land points in the target grid box yet
>>>>> 'blockmode' decided to do a simple mean of the high and low  
>>>>> values.
>>>>>
>>>>> Does this seem reasonable?
>>>>>
>>>>> There are numerous other examples in this dataset where  
>>>>> 'blockmode'
>>>>> took
>>>>> simple means instead of mode values.  Here are a few from the  
>>>>> xyz file
>>>>> that was created using ...
>>>>>
>>>>> blockmode global_topsoil_texture_43200x21600.xyz
>>>>> -Rglobal_slmask_1152x721_gt600kmsq.grd -E -Q >
>>>>>
>>>>> -66.4625        18.75   14      0       14      14
>>>>> -66.4625        18.5    11.5    3.7065  9       14
>>>>> -66.4625        18.25   9       4.4478  6       12
>>>>> -66.4625        18      11.5    3.7065  9       14
>>>>> -66.4625        8.75    5.5     5.1891  2       9
>>>>> -66.4625        8.5     5.5     5.1891  2       9
>>>>> -66.4625        -18     6       0       6       16
>>>>> -66.4625        -18.25  6       0       6       16
>>>>> -66.4625        -18.5   6       0       6       16
>>>>> -66.4625        -18.75  6       0       6       16
>>>>> -66.4625        -21.75  9.5     5.1891  4       13
>>>>> -66.4625        -34.75  5.5     5.1891  2       9
>>>>> -66.4625        -35     5.5     5.1891  2       9
>>>>> -66.4625        -35.25  5.5     5.1891  2       9
>>>>> -66.4625        -35.5   5.5     5.1891  2       9
>>>>> -66.4625        -35.75  5.5     5.1891  2       9
>>>>> -66.4625        -36     5.5     5.1891  2       9
>>>>>
>>>>> The process to reproduce this is quite involved, but I will put
>>>>> together
>>>>> a recipe if needed.
>>>>>
>>>>> Thanks
>>>>> Dan
>>>>>
>>>>> On Mon, 2009-10-26 at 16:39 -0500, Walter H. F. Smith wrote:
>>>>>> I think what you want is to sort the data into boxes sized to  
>>>>>> the new
>>>>>> smaller size, and then take the median or the mode in each  
>>>>>> box.  I
>>>>>> think you have to do this, because with the number  
>>>>>> representing an
>>>>>> index to a type, you can only have one index per box, and an  
>>>>>> average
>>>>>> doesn't make sense.
>>>>>>
>>>>>> So look at blockmode and blockmedian in GMT
>>>>>>
>>>>>>
>>>>>> Walter H F Smith (PhD)
>>>>>> Chairman, GEBCO TSCOM/SCDB
>>>>>> Geophysicist, Laboratory for Satellite Altimetry
>>>>>> NOAA NESDIS code E/RA-31
>>>>>> 1335 East West Hwy, room 5408
>>>>>> Silver Spring MD 20910-3226
>>>>>> tel 301-713-2857, extension 126
>>>>>> fax 301-713-3136
>>>>>> Walter.HF.Smith@...
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Oct 26, 2009, at 5:06 PM, Dan Kokron wrote:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I have a gridded 2D dataset that contains global  
>>>>>>> (43200x21600) soil
>>>>>>> texture data.  Each 30-sec X 30-sec grid box contains a single
>>>>>>> integer
>>>>>>> that represents the whole box.  There are 16 soil types
>>>>>>> represented in
>>>>>>> the whole file.  I'd like to create a lower resolution (still
>>>>>>> global)
>>>>>>> version of this file using a grid that is 1152x721.  The various
>>>>>>> linear
>>>>>>> and cubic schemes don't make sense given the 'index' nature  
>>>>>>> of the
>>>>>>> data.
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> Does GMT have a box-averaging scheme that includes voting.  
>>>>>>> (see the
>>>>>>> 'vt' option in)
>>>>>>> http://opengrads.org/doc/udxt/re/re.html
>>>>>>>
>>>>>>> --
>>>>>>> Dan Kokron
>>>>>>> Global Modeling and Assimilation Office
>>>>>>> NASA Goddard Space Flight Center
>>>>>>> Greenbelt, MD 20771
>>>>>>> Daniel.S.Kokron@...
>>>>>>> Phone: (301) 614-5192
>>>>>>> Fax:   (301) 614-5304
>>>>>>>
>>>>>>> To unsubscribe, send the message "signoff gmt-help" to
>>>>>>> listserv@...
>>>>>>
>>>>>> To unsubscribe, send the message "signoff gmt-help" to  
>>>>>> listserv@...
>>>>> --
>>>>>
>>>>> To unsubscribe, send the message "signoff gmt-help" to  
>>>>> listserv@...
>>>>> <PR_mode_case.gif><PR_mode_case_1.gif><PR_mode_case_2.gif>
>>>>
>>>> To unsubscribe, send the message "signoff gmt-help" to  
>>>> listserv@...
>>> --
>>> Dan Kokron
>>> Global Modeling and Assimilation Office
>>> NASA Goddard Space Flight Center
>>> Greenbelt, MD 20771
>>> Daniel.S.Kokron@...
>>> Phone: (301) 614-5192
>>> Fax:   (301) 614-5304
>>>
>>> To unsubscribe, send the message "signoff gmt-help" to  
>>> listserv@...
>>
>> To unsubscribe, send the message "signoff gmt-help" to  
>> listserv@...
>
> To unsubscribe, send the message "signoff gmt-help" to  
> listserv@...

To unsubscribe, send the message "signoff gmt-help" to listserv@...