Sizing thread pools

by Ashley Williams

Hi,

I have a question after reading up on sizing thread pools in the JCIP
(Java Concurrency in Practice) book.

It recommends that for I/O-bound tasks the thread pool can be much
larger, which makes sense since the CPU isn't the bottleneck. But say I
have two threads executing on a single CPU, where the first is
executing a blocking I/O method and a context switch occurs to the
second thread. Will the underlying operating system call that the
first thread is waiting on still continue in some way (e.g. data still
gets buffered) so that, when the first thread is rescheduled, it has at
least made some progress?
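The sizing rule in question, from the JCIP section on sizing thread pools, is N_threads = N_cpu * U_cpu * (1 + W/C), where U_cpu is the target CPU utilization and W/C is the ratio of wait time to compute time per task. A minimal Java sketch of that arithmetic follows; the class name and the wait/compute figures are hypothetical, and in practice W and C would have to be measured.

```java
/**
 * Sketch of the JCIP pool-sizing rule: Nthreads = Ncpu * Ucpu * (1 + W/C).
 * Class name and example figures are illustrative assumptions.
 */
final class PoolSizing {

    /**
     * @param nCpu              number of CPUs available to the application
     * @param targetUtilization desired CPU utilization, 0 < Ucpu <= 1
     * @param waitTime          average time a task spends waiting (same unit as computeTime)
     * @param computeTime       average time a task spends computing
     */
    static int estimatePoolSize(int nCpu, double targetUtilization,
                                double waitTime, double computeTime) {
        if (nCpu < 1 || targetUtilization <= 0 || targetUtilization > 1
                || waitTime < 0 || computeTime <= 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        double threads = nCpu * targetUtilization * (1 + waitTime / computeTime);
        return (int) Math.max(1, Math.round(threads));
    }

    public static void main(String[] args) {
        int nCpu = Runtime.getRuntime().availableProcessors();
        // CPU-bound: W/C is close to 0, so roughly one thread per CPU.
        System.out.println("CPU-bound pool: " + estimatePoolSize(nCpu, 1.0, 0, 10));
        // I/O-bound (assumed figures): 90 ms waiting for every 10 ms of computing.
        System.out.println("I/O-bound pool: " + estimatePoolSize(nCpu, 1.0, 90, 10));
    }
}
```

On a 4-CPU machine the I/O-bound call works out to 4 * 1.0 * (1 + 90/10) = 40 threads.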

A related question: since there isn't any way to assign threads to
specific CPU cores from Java (is this true?), should one always assume
that any two thread executions are time-sliced?

Cheers
- Ashley
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Sizing thread pools

by David Holmes

> Ashley Williams writes:
> It recommends that for IO bound tasks, the thread pool can be much
> larger which makes sense since the cpu isn't the bottleneck. But say I
> have two threads executing on a single cpu where the first is
> executing a blocking io method and a context switch occurs to the
> second thread. Will the underlying operating system call that the
> first thread is waiting on still somehow continue - eg data
> still gets buffered for example - so that when the first thread is
> rescheduled it has at least made some progress?

That's a rather general question - there is a lot of detail that determines
exactly what may happen. For example, if the first thread has initiated an
I/O request that requires it to block, say a read from a socket, and data
arrives on that socket, then the OS will process the arrival of that data in
some form. At a minimum the interrupt from the Ethernet card will be
processed, but it could also be that the thread itself is marked as no longer
blocked, and will be eligible to be switched to at the next scheduling
point. However, if the I/O thread was preempted at some arbitrary point prior
to actually initiating the I/O request and prior to blocking, then it will -
if chosen for execution again - simply continue from that point with no
"progress" having been made. And anything in between. It depends on the
details of the blocking operation and how it is handled by the OS.
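A small way to see the buffering half of this: data that arrives on a socket is accepted and queued by the OS even while the consuming thread is busy with something else. The sketch below is a hypothetical demo (class name and timings are assumptions) that uses a loopback TCP connection and has the consumer sleep before it reads anything. It does not exercise a real network card, and how much the OS will buffer depends on its socket buffer sizes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo: the OS buffers socket data while the consumer is not reading. */
final class KernelBufferingDemo {

    static String receiveAfterDelay(String message, long delayMillis) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            server.setSoTimeout(5_000);                // don't hang forever if the producer fails
            Thread producer = new Thread(() -> {
                try (Socket s = new Socket(loopback, server.getLocalPort());
                     OutputStream out = s.getOutputStream()) {
                    out.write(message.getBytes(StandardCharsets.UTF_8));
                    out.flush();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();
            try (Socket consumer = server.accept()) {
                producer.join();                       // data handed to the OS, connection closed
                Thread.sleep(delayMillis);             // consumer does other work; no read() pending
                InputStream in = consumer.getInputStream();
                // The bytes were queued by the OS while this thread slept,
                // so they are already there to read.
                byte[] data = in.readAllBytes();       // Java 9+; reads until end of stream
                return new String(data, StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(receiveAfterDelay("hello", 200));
    }
}
```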

> A related question - since there isn't any way to assign threads to
> specific cpu cores from java (is this true?),

It is true that there is no Java SE API for this (it will be in the
Real-Time Specification for Java 1.1). You can do it using native code.

> should one always assume that any two thread executions are time-sliced?

One should always strive to write portable Java programs, and for portability
the golden rule with regard to scheduling is:
 - to assume that any two actions in two threads could be interleaved; but
 - never require that any two actions must be interleaved

In a nutshell: don't use scheduling decisions as an alternative to
synchronization. (Even in the real-time world, with strict priority
scheduling, you can only use scheduling as an alternative to synchronization
in very specific circumstances).
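To make "don't use scheduling as synchronization" concrete, here is a minimal Java sketch (hypothetical names) in which one thread hands a result to another. The correct version waits on a java.util.concurrent.CountDownLatch; the alternative described in the comment, sleeping or yielding and hoping the other thread has already run, is exactly the kind of scheduling assumption the rule above warns against.

```java
import java.util.concurrent.CountDownLatch;

/**
 * Sketch: hand a result from one thread to another using explicit
 * synchronization rather than relying on how the threads are scheduled.
 * Class and method names are hypothetical.
 */
final class HandoffDemo {

    static int squareOnOtherThread(int x) throws InterruptedException {
        int[] box = new int[1];                    // written by the worker, read by the caller
        CountDownLatch done = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            box[0] = x * x;
            done.countDown();                      // actions before countDown() happen-before await() returns
        });
        worker.start();
        // Not this: Thread.yield() or Thread.sleep(10) and then reading box[0].
        // That assumes a particular interleaving and gives no visibility guarantee.
        done.await();                              // correct under any interleaving
        return box[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(squareOnOtherThread(12)); // 144
    }
}
```

Thread.join() would work equally well here; a latch is shown because it also covers the common case where the worker is a pooled thread that never terminates.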

I'm unclear what connection you are making between this and the ability to
bind threads to cores. If you have two threads bound to different cores,
they cannot preempt each other, but they can still execute code that is
effectively interleaved - the possible interleavings are more restricted,
but as the programmer you have no idea what the relative phasing is, nor how
long individual actions take to execute. It would be difficult, and very
context sensitive, to be able to take advantage of that situation I think.
But what were you thinking of?

Cheers,
David Holmes

Re: Sizing thread pools

by Ashley Williams

Thanks for your detailed response. My assumption in all this, which I
should have mentioned, is that the tasks in the thread pools are
self-contained with no side effects, so I'm not approaching this from a
code-safety angle - this is more about performance and getting the best
use out of the CPU configuration.

For I/O-bound tasks I am interested in finding out whether the
underlying OS I/O operation that produces the data keeps going in some
fashion even if the corresponding thread that consumes it has not yet
been reactivated. It would be great to have confidence that even within
a single core there is some sort of concurrency that occurs when, for
example, reading off a socket.

And CPU-bound threads would ideally be assigned to separate cores so
that they can truly execute concurrently. But if the underlying
scheduler decides to assign them to the same core then there will be no
performance improvement.

So to somebody ignorant of the facts such as myself, it seems that the
scheduler would need some sort of clue as to how to schedule the
threads; otherwise any effort to balance thread pools for size and type
of task could well be in vain.

On 12 Aug 2009, at 14:08, David Holmes wrote:

>
>> Ashley Williams writes:
>> It recommends that for IO bound tasks, the thread pool can be much
>> larger which makes sense since the cpu isn't the bottleneck. But  
>> say I
>> have two threads executing on a single cpu where the first is
>> executing a blocking io method and a context switch occurs to the
>> second thread. Will the underlying operating system call that the
>> first thread is waiting on still somehow still continue - eg data
>> still gets buffered for example - so that when the first thread is
>> rescheduled it has at least made some progress?
>
> That's a rather general question - there is a lot of detail that  
> determines
> exactly what may happen. For example, if the first thread has  
> initiated an
> I/O request that requires it to block, say a read from a socket, and  
> data
> arrives on that socket, then the OS will process the arrival of that  
> data in
> some form. At a minimum the interrupt of the ethernet card will be
> processed, but it could be that the thread itself is marked as no  
> longer
> blocked, and will be eligible to be switched to at the next scheduling
> point. However if the I/O thread was preempted at some arbitrary  
> point prior
> to actually initiating the I/O request and prior to blocking, then  
> it will -
> if chosen for execution again - simply continue from that point with  
> no
> "progress" having been made. And anything on-between. It depends on  
> the
> details of the blocking operation and how it is handled by the OS.
>
>> A related question - since there isn't any way to assign threads to
>> specific cpu cores from java (is this true?),
>
> It is true that there is no Java SE API for this (It will be in the
> Real-Time Specification for Java 1.1). You can do it using native  
> code.
>
>> should one always assume that any two thread executions are time-
>> sliced?
>
> One should always strive to write portable Java programs and for  
> portability
> the golden rule with regards to scheduling is:
> - to assume that any two actions in two threads could be  
> interleaved; but
> - never require that any two actions must be interleaved
>
> In a nutshell: don't use scheduling decisions as an alternative to
> synchronization. (Even in the real-time world, with strict priority
> scheduling, you can only use scheduling as an alternative to  
> synchronization
> in very specific circumstances).
>
> I'm unclear what connection you are making between this and the  
> ability to
> bind threads to cores? If you have two threads bound on different  
> cores,
> they can not preempt each other, but they can still execute code  
> that is
> effectively interleaved - the possible interleavings are more  
> restricted,
> but as the programmer you have no idea what the relative phasing is,  
> nor how
> long individual actions take to execute. It would be difficult, and  
> very
> context sensitive, to be able to take advantage of that situation I  
> think.
> But what were you thinking of?
>
> Cheers,
> David Holmes
>

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Sizing thread pools

by David Holmes

Hi Ashley,

> For IO bound tasks I am interested in finding out if the underlying OS
> io operation that produces the data keeps going in some fashion even
> if the corresponding thread that consumes it has not yet been reactivated.

In general, yes. Exactly how much depends on the OS and its structure. As
long as the I/O request has been made, the OS can service that request -
for example, processing an incoming Ethernet packet.
>
> And cpu bound threads would ideally be assigned to separate cores so
> that they can truly execute concurrently. But if the underlying scheduler
> decides to assign them to the same core then there will be no improved
> performance.

Depends on how many threads you are trying to run on the system and how many
cores you have. If you give threads dedicated cores and the threads are
totally CPU bound then you effectively remove them from the scheduling
decisions: if nothing else can run on their core then they won't get
timesliced.

But to do this you require complete knowledge of everything on the system -
including OS services - to ensure that you don't "starve" your own threads
of services they need, and to ensure the system as a whole still functions
correctly.

> So to somebody ignorant of the facts such as myself, it seems that the
> scheduler would need some sort of clue as to how to schedule the threads
> otherwise
> any effort to balance thread pools for size and type of task could
> well be in vain.

The scheduler only knows about scheduling policies and what properties
threads have under those policies. For "normal" time-sharing this means all
threads are initially equal, and the scheduler will schedule them as fairly
as it can, whilst still having regard for performance (e.g. thread affinity
typically tries to run a thread on the same processor as it last ran on).
But CPU bound threads will consume their quanta and be switched out; while
I/O bound threads that block a lot tend to get some preferential treatment -
but it all depends on the OS and the scheduling policy.

Binding threads to specific cores makes the scheduler's job both harder and
easier, because there are fewer choices available.

It sounds to me that you may want to partition your available processors
into two sets: one for CPU bound and one for I/O bound tasks. That might
give your application better overall "performance" (as defined by your
application). But again there are no Java APIs for doing this, and
processor-set/cpu-set management is a system level administrative task, not
something to be attempted from within the application.
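Standard Java cannot bind a pool to a processor set, but an application-level approximation of the split suggested above is to keep CPU-bound and I/O-bound work in separate executors sized differently. In this sketch the class name and the assumed wait/compute ratio of 9 for the I/O tasks are illustrative only; the OS remains free to run any of these threads on any core.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: separate executors for CPU-bound and I/O-bound tasks. */
final class PartitionedPools implements AutoCloseable {
    final int cpuThreads;
    final int ioThreads;
    final ExecutorService cpuPool;
    final ExecutorService ioPool;

    PartitionedPools(int nCpu, double ioWaitToCompute) {
        if (nCpu < 1 || ioWaitToCompute < 0) {
            throw new IllegalArgumentException("invalid pool parameters");
        }
        cpuThreads = nCpu;                                        // about one thread per core
        ioThreads = (int) Math.max(1, Math.round(nCpu * (1 + ioWaitToCompute)));
        cpuPool = Executors.newFixedThreadPool(cpuThreads);
        ioPool = Executors.newFixedThreadPool(ioThreads);
    }

    @Override
    public void close() {
        cpuPool.shutdown();   // already-submitted tasks are allowed to finish
        ioPool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        int nCpu = Runtime.getRuntime().availableProcessors();
        try (PartitionedPools pools = new PartitionedPools(nCpu, 9.0)) {
            Future<Long> sum = pools.cpuPool.submit(() -> {
                long s = 0;
                for (int i = 0; i < 1_000_000; i++) s += i;       // pure computation
                return s;
            });
            Future<String> reply = pools.ioPool.submit(() -> {
                TimeUnit.MILLISECONDS.sleep(50);                  // stands in for a blocking read
                return "reply";
            });
            System.out.println(sum.get() + " / " + reply.get());
        }
    }
}
```

The benefit, if any, is that a burst of blocked I/O tasks cannot occupy the threads that CPU-bound work needs, and vice versa; whether that is measurable is exactly the "identify if there is indeed a problem" question.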

But as always with these things the first step is to identify if there is
indeed a problem.

David Holmes

Re: Sizing thread pools

by Robert Fischer

My general guideline has been to shotgun the system: aim to be at the high end of the order of
magnitude of the number of threads that can be processed at any point in time, and default to
throwing off new threads to support a high water mark.  Yes, it creates some waste in terms of
context switching and memory, but anything less than that seems to eventually hit throughput limits.
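The grow-to-a-high-water-mark approach maps fairly directly onto java.util.concurrent.ThreadPoolExecutor: with a SynchronousQueue as the work queue, a task that finds no idle thread causes a new thread to be created (up to the maximum), and getLargestPoolSize() reports the high-water mark actually reached. Executors.newCachedThreadPool() is this configuration with no core threads and an effectively unbounded maximum. The sketch below is hypothetical and picks a bounded maximum and CallerRunsPolicy as assumptions, so that a burst beyond the cap makes the submitting thread run the task itself instead of creating unlimited threads.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical demo of a pool that grows on demand and records its high-water mark. */
final class HighWaterMarkPool {

    static ThreadPoolExecutor create(int coreThreads, int maxThreads) {
        return new ThreadPoolExecutor(
                coreThreads, maxThreads,
                60, TimeUnit.SECONDS,                       // idle non-core threads die after this
                new SynchronousQueue<>(),                   // direct hand-off: no idle thread => new thread
                new ThreadPoolExecutor.CallerRunsPolicy()); // at the cap, the submitter runs the task
    }

    /** Runs 'tasks' tasks that all block at once and returns the largest pool size reached. */
    static int peakThreadsFor(int tasks, int maxThreads) throws InterruptedException {
        if (tasks > maxThreads) {
            // A caller-run task would block on 'release' before release.countDown() is reached.
            throw new IllegalArgumentException("tasks must not exceed maxThreads in this demo");
        }
        ThreadPoolExecutor pool = create(2, maxThreads);
        CountDownLatch allStarted = new CountDownLatch(tasks);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                allStarted.countDown();
                try {
                    release.await();                        // simulate a long blocking call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        allStarted.await();                                 // every task now occupies its own thread
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return pool.getLargestPoolSize();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("high-water mark: " + peakThreadsFor(50, 200) + " threads");
    }
}
```

With 50 tasks blocked at the same time, each one ends up on its own thread, so the reported high-water mark is 50.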

At the end of the day, though, it's all voodoo.

~~ Robert.

--
~~ Robert Fischer, Smokejumper IT Consulting.
Enfranchised Mind Blog http://EnfranchisedMind.com/blog

Check out my book, "Grails Persistence with GORM and GSQL"!
http://www.smokejumperit.com/redirect.html

Re: Sizing thread pools

by Ashley Williams

I'm starting to think this is the best approach too. The kicker for me
is that you can't assign threads to specific cores, at least not with
standard Java - but maybe the JVM is smart enough to notice when there
are cores going spare and to shunt a CPU-intensive thread that is
currently time-slicing with other threads onto one of them. I suppose
in that case adaptive scheduling techniques would do a far better job
than my manual efforts would ever achieve.

The other fly in the ointment when trying to be scientific about these
calculations is the sheer number of threads introduced by dependency
libraries. For example, a thread dump of a running JBoss application
reveals scores of threads, making my own thread count a drop in the
ocean. I suppose I could make a ham-fisted attempt to calculate the
wait/compute time ratio, but I wonder how representative I could ever
hope to make my sample.
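For measurements like these, the JDK's java.lang.management.ThreadMXBean reports both the JVM-wide live thread count (which includes library and container threads) and the CPU time consumed by the current thread, so wall-clock time minus CPU time around a task gives a rough W. The sketch assumes the JVM supports per-thread CPU time; the class name and the sleeping stand-in task are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Hypothetical sampler for estimating a task's wait/compute ratio. */
final class WaitComputeSampler {

    /** Wall-clock and CPU time (nanoseconds) spent by the calling thread on one task. */
    static final class Sample {
        final long wallNanos;
        final long cpuNanos;

        Sample(long wallNanos, long cpuNanos) {
            this.wallNanos = wallNanos;
            this.cpuNanos = cpuNanos;
        }

        /** Estimated W/C: time off the CPU divided by time on it. */
        double waitToCompute() {
            long wait = Math.max(0, wallNanos - cpuNanos);
            return (double) wait / Math.max(1, cpuNanos);
        }
    }

    static Sample time(Runnable task) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            throw new UnsupportedOperationException("thread CPU time not available");
        }
        long cpuStart = mx.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();                                 // runs on the calling thread
        long wall = System.nanoTime() - wallStart;
        long cpu = mx.getCurrentThreadCpuTime() - cpuStart;
        return new Sample(wall, cpu);
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Counts every live thread in the JVM, including those started by libraries.
        System.out.println("live threads: " + mx.getThreadCount()
                + ", peak: " + mx.getPeakThreadCount());

        // Stand-in for a task that mostly waits (e.g. on a socket read).
        Sample s = time(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("wall=%.1f ms, cpu=%.1f ms, W/C~%.1f%n",
                s.wallNanos / 1e6, s.cpuNanos / 1e6, s.waitToCompute());
    }
}
```

Time a thread spends runnable but waiting for a CPU is also counted as W here, so samples taken on a heavily loaded machine will overstate the ratio.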

On 13 Aug 2009, at 00:36, Robert Fischer wrote:

> My general guideline has been to shotgun the system: aim to be at  
> the high end of the order of magnitude of the number of threads that  
> can be processed at any point in time, and default to throwing off  
> new threads to support a high water mark.  Yes, it creates some  
> waste in terms of context switching and memory, but anything less  
> than that seems to eventually hit throughput limits.
>
> At the end of the day, though, it's all voodoo.
>
> ~~ Robert.
>
> David Holmes wrote:
>> Hi Ashley,
>>> For IO bound tasks I am interested in finding out if the  
>>> underlying OS
>>> io operation that produces the data keeps going in some fashion even
>>> if the corresponding thread that consumes it has not yet been  
>>> reactivated.
>> In general yes. Exactly how much depends on the OS and its  
>> structure. As
>> long as the I/O request has been made then the OS can service that  
>> request -
>> for example processing an incoming ethernet packet.
>>> And cpu bound threads would ideally be assigned to separate cores so
>>> that they can truly execute concurrently. But if the underlying  
>>> scheduler
>>> decides to assign them to the same core then there will be no  
>>> improved
>> performance.
>> Depends on how many threads you are trying to run on the system and  
>> how many
>> cores you have. If you give threads dedicated cores and the threads  
>> are
>> totally CPU bound then you effectively remove them from the  
>> scheduling
>> decisions: if nothing else can run on their core then they won't get
>> timesliced.
>> But to do this you require complete knowledge of everything on the  
>> system -
>> including OS services - to ensure that you don't "starve" your own  
>> threads
>> of services they need, and to ensure the system as a whole still  
>> functions
>> correctly.
>>> So to somebody ignorant of the facts such as myself, it seems that  
>>> the
>>> scheduler would need some sort of clue as to how to schedule the  
>>> threads
>> otherwise
>>> any effort to balance thread pools for size and type of task could
>>> well be in vain.
>> The scheduler only knows about scheduling policies and what  
>> properties
>> threads have under those policies. For "normal" time-sharing this  
>> means all
>> threads are initially equal, and the scheduler will schedule them  
>> as fairly
>> as it can, whilst still having regard for performance (eg. thread  
>> affinity
>> typically tries to run a thread on the same processor as it last  
>> ran on).
>> But CPU bound threads will consume their quanta and be switched  
>> out; while
>> I/O bound threads that block a lot tend to get some preferential  
>> treatment -
>> but it all depends on the OS and the scheduling policy.
>> Binding threads to specific cores makes the scheduler's job both  
>> harder and
>> easier - because there are less choices available.
>> It sounds to me that you may want to partition your available  
>> processors
>> into two sets: one for CPU bound and one for I/O bound tasks. That  
>> might
>> give your application better overall "performance" (as defined by  
>> your
>> application). But again there are no Java API's for doing this, and
>> processor-set/cpu-set management is a system level administrative  
>> task, not
>> something to be attempted from within the application.
>> But as always with these things the first step is to identify if  
>> there is
>> indeed a problem.
>> David Holmes
>> _______________________________________________
>> Concurrency-interest mailing list
>> Concurrency-interest@...
>> http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
> --
> ~~ Robert Fischer, Smokejumper IT Consulting.
> Enfranchised Mind Blog http://EnfranchisedMind.com/blog
>
> Check out my book, "Grails Persistence with GORM and GSQL"!
> http://www.smokejumperit.com/redirect.html
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest@...
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Sizing thread pools

by David Holmes

Ashley Williams writes:
> I'm starting to think this is the best approach too. The kicker for me
> is that you can't assign threads to specific cores at least not with
> standard java - but maybe the jvm is smart enough to notice when there
> are cores going spare and to shunt a cpu intensive thread that is
> currently time slicing with other threads onto one of them. I suppose
> in that case, adaptive scheduling techniques would do a far better job
> than my manual efforts would ever achieve.

The JVMs I'm familiar with do zero scheduling in general - it is all done by
the OS.

But unless you've established there is a problem, how do you know what you are
trying to achieve?

Cheers,
David Holmes

Re: Sizing thread pools

by Ashley Williams

On 20 Aug 2009, at 10:06, David Holmes wrote:

> Ashley Williams writes:
>> I'm starting to think this is the best approach too. The kicker for me
>> is that you can't assign threads to specific cores at least not with
>> standard java - but maybe the jvm is smart enough to notice when there
>> are cores going spare and to shunt a cpu intensive thread that is
>> currently time slicing with other threads onto one of them. I suppose
>> in that case, adaptive scheduling techniques would do a far better job
>> than my manual efforts would ever achieve.
>
> The JVMs I'm familiar with do zero scheduling in general - it is all done by
> the OS.
>
> But unless you've established there is a problem, how do you know what you are
> trying to achieve?

Hi David,

I'm not trying to solve a problem; I'm trying to understand how to
solve problems.

- Ashley

>
> Cheers,
> David Holmes
>
>> The other fly in the ointment when trying to be scientific about  
>> these
>> calculations is the sheer number of threads introduced by dependency
>> libraries. For example dumping the stack for a running jboss
>> application reveals scores of threads, making my own thread count a
>> drop in the ocean. I suppose I could make a ham-fisted attempt to
>> calculate the wait/compute time ratio but I wonder how representative
>> I could ever hope to make my sample.
>>
>> On 13 Aug 2009, at 00:36, Robert Fischer wrote:
>>
>>> My general guideline has been to shotgun the system: aim to be at
>>> the high end of the order of magnitude of the number of threads that
>>> can be processed at any point in time, and default to throwing off
>>> new threads to support a high water mark.  Yes, it creates some
>>> waste in terms of context switching and memory, but anything less
>>> than that seems to eventually hit throughput limits.
>>>
>>> At the end of the day, though, it's all voodoo.
>>>
>>> ~~ Robert.
>>>
>>> David Holmes wrote:
>>>> Hi Ashley,
>>>>> For IO bound tasks I am interested in finding out if the
>>>>> underlying OS
>>>>> io operation that produces the data keeps going in some fashion  
>>>>> even
>>>>> if the corresponding thread that consumes it has not yet been
>>>>> reactivated.
>>>> In general yes. Exactly how much depends on the OS and its
>>>> structure. As
>>>> long as the I/O request has been made then the OS can service that
>>>> request -
>>>> for example processing an incoming ethernet packet.
>>>>> And cpu bound threads would ideally be assigned to separate  
>>>>> cores so
>>>>> that they can truly execute concurrently. But if the underlying
>>>>> scheduler
>>>>> decides to assign them to the same core then there will be no
>>>>> improved
>>>> performance.
>>>> Depends on how many threads you are trying to run on the system and
>>>> how many
>>>> cores you have. If you give threads dedicated cores and the threads
>>>> are
>>>> totally CPU bound then you effectively remove them from the
>>>> scheduling
>>>> decisions: if nothing else can run on their core then they won't  
>>>> get
>>>> timesliced.
>>>> But to do this you require complete knowledge of everything on the
>>>> system -
>>>> including OS services - to ensure that you don't "starve" your own
>>>> threads
>>>> of services they need, and to ensure the system as a whole still
>>>> functions
>>>> correctly.
>>>>> So to somebody ignorant of the facts such as myself, it seems that
>>>>> the
>>>>> scheduler would need some sort of clue as to how to schedule the
>>>>> threads
>>>> otherwise
>>>>> any effort to balance thread pools for size and type of task could
>>>>> well be in vain.
>>>> The scheduler only knows about scheduling policies and what
>>>> properties
>>>> threads have under those policies. For "normal" time-sharing this
>>>> means all
>>>> threads are initially equal, and the scheduler will schedule them
>>>> as fairly
>>>> as it can, whilst still having regard for performance (eg. thread
>>>> affinity
>>>> typically tries to run a thread on the same processor as it last
>>>> ran on).
>>>> But CPU bound threads will consume their quanta and be switched
>>>> out; while
>>>> I/O bound threads that block a lot tend to get some preferential
>>>> treatment -
>>>> but it all depends on the OS and the scheduling policy.
>>>> Binding threads to specific cores makes the scheduler's job both
>>>> harder and
>>>> easier - because there are less choices available.
>>>> It sounds to me that you may want to partition your available
>>>> processors
>>>> into two sets: one for CPU bound and one for I/O bound tasks. That
>>>> might
>>>> give your application better overall "performance" (as defined by
>>>> your
>>>> application). But again there are no Java API's for doing this, and
>>>> processor-set/cpu-set management is a system level administrative
>>>> task, not
>>>> something to be attempted from within the application.
>>>> But as always with these things the first step is to identify if
>>>> there is
>>>> indeed a problem.
>>>> David Holmes
>>>
>>> --
>>> ~~ Robert Fischer, Smokejumper IT Consulting.
>>> Enfranchised Mind Blog http://EnfranchisedMind.com/blog
>>>
>>> Check out my book, "Grails Persistence with GORM and GSQL"!
>>> http://www.smokejumperit.com/redirect.html
>
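David's suggestion of separate CPU-bound and I/O-bound processor sets has no Java API, but the application-level analogue is two thread pools sized separately. The sketch below assumes the sizing heuristic JCIP gives for thread pools, N_threads = N_cpu * U_cpu * (1 + W/C), where W/C is the ratio of time a task spends waiting to time it spends computing. The class and method names and the 9:1 wait/compute ratio are hypothetical, and these pools do not bind threads to cores: the OS still schedules every thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Sketch: one pool for CPU-bound work and one for I/O-bound work, sized with
 * N_threads = N_cpu * U_cpu * (1 + W/C). No core binding is involved.
 */
final class PoolSizing {

    /**
     * @param nCpu          number of processors available
     * @param targetUtil    desired CPU utilization, in (0, 1]
     * @param waitToCompute ratio of time a task waits (e.g. on I/O) to time it computes
     */
    static int poolSize(int nCpu, double targetUtil, double waitToCompute) {
        if (nCpu < 1 || targetUtil <= 0 || targetUtil > 1 || waitToCompute < 0) {
            throw new IllegalArgumentException("invalid sizing inputs");
        }
        return (int) Math.max(1, Math.round(nCpu * targetUtil * (1 + waitToCompute)));
    }

    public static void main(String[] args) throws InterruptedException {
        int nCpu = Runtime.getRuntime().availableProcessors();
        // CPU-bound tasks rarely wait (W/C close to 0): about one thread per processor.
        ExecutorService cpuPool = Executors.newFixedThreadPool(poolSize(nCpu, 1.0, 0.0));
        // I/O-bound tasks: the 9:1 wait/compute ratio is hypothetical; measure your own.
        ExecutorService ioPool = Executors.newFixedThreadPool(poolSize(nCpu, 1.0, 9.0));

        cpuPool.submit(() -> System.out.println("compute task on " + Thread.currentThread().getName()));
        ioPool.submit(() -> System.out.println("I/O task on " + Thread.currentThread().getName()));

        cpuPool.shutdown();
        ioPool.shutdown();
        cpuPool.awaitTermination(5, TimeUnit.SECONDS);
        ioPool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Keeping the pools separate means a burst of blocked I/O tasks cannot occupy the threads that compute-heavy tasks need, which is the intent of partitioning, even though the OS may still run both pools' threads on any core.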

Re: Sizing thread pools

by Hanson Char

For scaling services (written in Java) in general, given the same box equipped with multiple CPUs (and sufficient memory), can there be performance advantages in running multiple (identical) JVM instances (each with a smaller thread pool) instead of running a single JVM instance with a larger thread pool?

Thanks,
Hanson

On Thu, Aug 20, 2009 at 2:06 AM, David Holmes <davidcholmes@...> wrote:
> The JVMs I'm familiar with do zero scheduling in general - it is all done by
> the OS.



Re: Sizing thread pools

by David Holmes-6


Hanson Char writes:
> For scaling services (written in Java) in general, given the same box
> equipped with multiple CPU's (and sufficient memory), can there be
> performance advantages in running multiple (identical) JVM instances
> (each with smaller thread pool) instead of running a single instance of
> JVM but with a larger thread pool ?

Generally I'd say that the additional VM overheads would count against you
too much. But it all depends on where your performance bottleneck is. In
theory if you're saturating a particular aspect of the VM then
horizontal-scaling might be a benefit (but usually that involves adding new
machines). The main issue is heap: what throughput can you maintain with 10
400MB heaps versus 1 4GB heap, for example?

As always with these things, the proof is in the measurement. :)

Cheers,
David Holmes
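In the spirit of "the proof is in the measurement", one low-effort starting point is the standard java.lang.management API, which reports accumulated collection counts and times per collector. This is a sketch: the class name is hypothetical, and the reported time is approximate accumulated elapsed collection time, which for concurrent collectors may include work that did not pause the application, so it is not a direct measure of pause latency.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: report how much time this JVM has spent in GC so far. */
final class GcStats {

    /** Fraction of elapsed time spent in GC; 0 if no time has elapsed. */
    static double gcFraction(long gcMillis, long uptimeMillis) {
        return uptimeMillis <= 0 ? 0.0 : (double) gcMillis / uptimeMillis;
    }

    /** Sum of accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if undefined for this collector
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime(); // milliseconds
        System.out.printf("GC time fraction so far: %.4f%n", gcFraction(totalGcMillis(), uptime));
    }
}
```

Running the same workload once in a single JVM with -Xmx4g and once spread across ten JVMs with -Xmx400m, and comparing these figures alongside measured request latencies, would give a first, application-specific answer to the 10 x 400MB versus 1 x 4GB question.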

Re: Sizing thread pools

by Shaffer, Darron

Some older (1.4) JVMs on Linux couldn't use more than 3 CPUs
effectively. I wouldn't expect that to be the case anymore.
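One crude way to check whether a given JVM/OS combination actually makes use of more processors is to time a fixed CPU-bound job per thread at increasing thread counts: if it scales, wall-clock time should stay roughly flat up to the number of available processors. This is a naive microbenchmark (JIT warm-up, frequency scaling and hyper-threading all distort it, and a harness such as JMH is more trustworthy); the class name and iteration count are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: does a fixed amount of per-thread CPU work finish in similar time as threads are added? */
final class ScalingCheck {

    // CPU-bound busy work; returns a value so the loop is not optimized away.
    static long spin(long iterations) {
        long x = 0;
        for (long i = 0; i < iterations; i++) {
            x += i * 31 ^ (x >>> 3);
        }
        return x;
    }

    /** Wall-clock nanoseconds for `threads` threads each running spin(iterations). */
    static long timeWith(int threads, long iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                tasks.add(() -> spin(iterations));
            }
            long start = System.nanoTime();
            for (Future<Long> f : pool.invokeAll(tasks)) {
                f.get(); // propagate any task failure
            }
            return System.nanoTime() - start;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        long iterations = 200_000_000L;
        timeWith(cpus, iterations); // warm-up run so the JIT has compiled spin()
        for (int n = 1; n <= cpus; n++) {
            System.out.printf("%2d threads: %d ms%n", n, timeWith(n, iterations) / 1_000_000);
        }
    }
}
```

A sharp rise in time beyond some thread count well below the processor count would be the kind of symptom Darron describes.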

-----Original Message-----
From: concurrency-interest-bounces@...
[mailto:concurrency-interest-bounces@...] On Behalf Of David
Holmes
Sent: Thursday, August 20, 2009 5:06 PM
To: Hanson Char
Cc: concurrency-interest@...
Subject: Re: [concurrency-interest] Sizing thread pools


Hanson Char writes:
> For scaling services (written in Java) in general, given the same box
> equipped with multiple CPU's (and sufficient memory), can there be
> performance advantages in running multiple (identical) JVM instances
> (each with smaller thread pool) instead of running a single instance of
> JVM but with a larger thread pool ?

Generally I'd say that the additional VM overheads would count against you
too much. But it all depends on where your performance bottleneck is. In
theory if you're saturating a particular aspect of the VM then
horizontal-scaling might be a benefit (but usually that involves adding new
machines). The main issue is heap: what throughput can you maintain with 10
400MB heaps versus 1 4GB heap, for example ?

As always with these things, the proof is in the measurement. :)

Cheers,
David Holmes

Re: Sizing thread pools

by Hanson Char

>the proof is in the measurement

Totally agree.

Besides throughput, there is also the consideration of service response time, in particular TP99.9 (the 99.9th-percentile response time).

My (naive?) understanding is that the latency of a full GC is in general linear (or at least in some way proportional) to the heap size, and that a full GC will eventually kick in on a busy JVM regardless of the specific collector (such as CMS) used. So the freeze when a full GC kicks in will probably take (10 times?) longer for a 4GB heap than for a 0.4GB heap. With 10 JVMs each having a 0.4GB heap, the chance of all of them performing a full GC at the same time is low. Wouldn't this translate to a better TP99.9 if we ran 10 small JVMs on the same box instead of 1 big JVM (assuming we have, say, 10 or more CPUs on the box)?

Thanks,
Hanson
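A sketch of computing TP99.9 from recorded response times with the nearest-rank method, plus Hanson's "all ten JVMs in a full GC at once" probability. That probability equals f^k only if each of k JVMs is paused a fraction f of the time independently of the others; co-located JVMs that share CPUs, memory bandwidth and load spikes may not satisfy that assumption. The class name, method names and sample data are hypothetical.

```java
import java.util.Arrays;

/** Sketch: tail-latency percentile (e.g. TP99.9) from recorded response times. */
final class TailLatency {

    /** Nearest-rank percentile: the smallest sample such that at least p% of samples are <= it. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) throw new IllegalArgumentException("no samples");
        if (p <= 0 || p > 100) throw new IllegalArgumentException("p must be in (0, 100]");
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        // The small epsilon guards against floating-point error in p * n / 100.
        int rank = (int) Math.ceil(p * sorted.length / 100.0 - 1e-9);
        rank = Math.max(1, Math.min(rank, sorted.length)); // 1-based rank
        return sorted[rank - 1];
    }

    /** Chance all k JVMs are in a full GC at once, IF each is paused a fraction f of the time independently. */
    static double allPaused(double f, int k) {
        return Math.pow(f, k);
    }

    public static void main(String[] args) {
        long[] latenciesMs = new long[1000];
        for (int i = 0; i < latenciesMs.length; i++) {
            latenciesMs[i] = i + 1; // hypothetical data: 1..1000 ms
        }
        System.out.println("TP50   = " + percentile(latenciesMs, 50) + " ms");
        System.out.println("TP99.9 = " + percentile(latenciesMs, 99.9) + " ms");
        System.out.println("P(all 10 paused), f = 0.01: " + allPaused(0.01, 10));
    }
}
```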

On Thu, Aug 20, 2009 at 3:05 PM, David Holmes <davidcholmes@...> wrote:

Hanson Char writes:
> For scaling services (written in Java) in general, given the same box
> equipped with multiple CPU's (and sufficient memory), can there be
> performance advantages in running multiple (identical) JVM instances
> (each with smaller thread pool) instead of running a single instance of
> JVM but with a larger thread pool ?

Generally I'd say that the additional VM overheads would count against you
too much. But it all depends on where your performance bottleneck is. In
theory if you're saturating a particular aspect of the VM then
horizontal-scaling might be a benefit (but usually that involves adding new
machines). The main issue is heap: what throughput can you maintain with 10
400MB heaps versus 1 4GB heap, for example ?

As always with these things, the proof is in the measurement. :)

Cheers,
David Holmes



Re: Sizing thread pools

by David Holmes-6

Hanson,
 
You'd have to consult some GC experts for an answer to that. My guess is that it all depends :)
 
Even looking at it simplistically though, you'd have to account for the full GCs happening 10x more frequently.
 
Without thinking too hard about this I'd say that with 10 VMs you'd probably reduce the worst-case latency but increase the median latency.
 
David
 
 -----Original Message-----
From: Hanson Char [mailto:hanson.char@...]
Sent: Friday, 21 August 2009 2:57 PM
To: dholmes@...
Cc: concurrency-interest@...
Subject: Re: [concurrency-interest] Sizing thread pools

>the proof is in the measurement

Totally agree.

Besides throughput, there is also the consideration of service response time.  In particular, TP99.9.

My (naive?) understanding is the latency of performing a full GC is in general linear (or at least in some way proportional) to the heap size, and a full GC would eventually kicks in on a busy JVM regardless of the specific collector (such as CMS) used.  So it will probably take (10 times?) longer for the freeze when a full GC kicks in for a heap of 4GB than 0.4GB.  Having 10 JVM's with 0.4GB heap, the chances of all JVM performing a full GC at the same time is low.  Wouldn't this translate to better TP99.9 if we have 10 small JVM's running on the same box than 1 big JVM (assuming we have say 10 (or more) CPU's on the box) ?

Thanks,
Hanson

On Thu, Aug 20, 2009 at 3:05 PM, David Holmes <davidcholmes@...> wrote:

Hanson Char writes:
> For scaling services (written in Java) in general, given the same box
> equipped with multiple CPU's (and sufficient memory), can there be
> performance advantages in running multiple (identical) JVM instances
> (each with smaller thread pool) instead of running a single instance of
> JVM but with a larger thread pool ?

Generally I'd say that the additional VM overheads would count against you
too much. But it all depends on where your performance bottleneck is. In
theory if you're saturating a particular aspect of the VM then
horizontal-scaling might be a benefit (but usually that involves adding new
machines). The main issue is heap: what throughput can you maintain with 10
400MB heaps versus 1 4GB heap, for example ?

As always with these things, the proof is in the measurement. :)

Cheers,
David Holmes



Re: Sizing thread pools

by Hanson Char

My simplistic thinking leads me to the same conclusion/expectation :)  Thanks, David.

Hanson

On Thu, Aug 20, 2009 at 10:31 PM, David Holmes <davidcholmes@...> wrote:
Hanson,
 
You'd have to consult some GC experts for an answer to that. My guess is that it all depends :)
 
Even looking at it simplistically though, you'd have to account for the full GCs happening 10x more frequently.
 
Without thinking too hard about this I'd say that with 10 VMs you'd probably reduce the worst-case latency but increase the median latency.
 
David
 
 -----Original Message-----
From: Hanson Char [mailto:hanson.char@...]
Sent: Friday, 21 August 2009 2:57 PM
To: dholmes@...
Cc: concurrency-interest@...
Subject: Re: [concurrency-interest] Sizing thread pools

>the proof is in the measurement

Totally agree.

Besides throughput, there is also the consideration of service response time.  In particular, TP99.9.

My (naive?) understanding is the latency of performing a full GC is in general linear (or at least in some way proportional) to the heap size, and a full GC would eventually kicks in on a busy JVM regardless of the specific collector (such as CMS) used.  So it will probably take (10 times?) longer for the freeze when a full GC kicks in for a heap of 4GB than 0.4GB.  Having 10 JVM's with 0.4GB heap, the chances of all JVM performing a full GC at the same time is low.  Wouldn't this translate to better TP99.9 if we have 10 small JVM's running on the same box than 1 big JVM (assuming we have say 10 (or more) CPU's on the box) ?

Thanks,
Hanson

On Thu, Aug 20, 2009 at 3:05 PM, David Holmes <davidcholmes@...> wrote:

Hanson Char writes:
> For scaling services (written in Java) in general, given the same box
> equipped with multiple CPU's (and sufficient memory), can there be
> performance advantages in running multiple (identical) JVM instances
> (each with smaller thread pool) instead of running a single instance of
> JVM but with a larger thread pool ?

Generally I'd say that the additional VM overheads would count against you
too much. But it all depends on where your performance bottleneck is. In
theory if you're saturating a particular aspect of the VM then
horizontal-scaling might be a benefit (but usually that involves adding new
machines). The main issue is heap: what throughput can you maintain with 10
400MB heaps versus 1 4GB heap, for example ?

As always with these things, the proof is in the measurement. :)

Cheers,
David Holmes




Re: Sizing thread pools

by kohlerm

Hi Hanson,
This might be a bit off topic here, but it's an interesting topic, so
let me try to answer your question :)

Yes, running more than one JVM can definitely be beneficial. There are
several reasons.
First, you could run several 32-bit JVMs on a 64-bit machine. The
32-bit JVMs were usually faster and needed less memory because of their
smaller references. SAP, for example, used to recommend 2 nodes instead
of only one even on a 64-bit machine. You can also get better failover
support.

As others have said, "it all depends", and IMHO today there aren't
many good reasons to do so.
CMS performs fairly well even with large heaps, and there is compressed
references support in the latest SUN JDK. And with multiple JVMs you
also waste memory, because some data will be kept in each JVM.

Regards,
Markus
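For the 32-bit versus 64-bit and compressed-references points, a HotSpot-specific sketch that reports which data model a JVM is running and whether compressed references (-XX:+UseCompressedOops) are enabled. It assumes a HotSpot-based JVM: sun.arch.data.model is a HotSpot system property rather than a standard one, and com.sun.management.HotSpotDiagnosticMXBean is a JDK-specific interface. The class and method names are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Sketch (HotSpot-specific): report pointer width and the compressed-references setting. */
final class RefWidth {

    /** "32", "64", or "unknown"; sun.arch.data.model is HotSpot-specific, not standard. */
    static String dataModel() {
        return System.getProperty("sun.arch.data.model", "unknown");
    }

    /** Value of the UseCompressedOops VM option, or a note if this JVM does not expose it. */
    static String compressedOops() {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hs == null) {
            return "unavailable (not a HotSpot-based JVM?)";
        }
        try {
            return hs.getVMOption("UseCompressedOops").getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option does not exist, e.g. on a 32-bit JVM.
            return "no such option on this JVM";
        }
    }

    public static void main(String[] args) {
        System.out.println("Data model (bits): " + dataModel());
        System.out.println("UseCompressedOops: " + compressedOops());
    }
}
```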

On Thu, Aug 20, 2009 at 7:21 PM, Hanson Char <hanson.char@...> wrote:

> For scaling services (written in Java) in general, given the same box
> equipped with multiple CPU's (and sufficient memory), can there be
> performance advantages in running multiple (identical) JVM instances (each
> with smaller thread pool) instead of running a single instance of JVM  but
> with a larger thread pool ?
>
> Thanks,
> Hanson
>
> On Thu, Aug 20, 2009 at 2:06 AM, David Holmes <davidcholmes@...>
> wrote:
>>
>> The JVMs I'm familiar with do zero scheduling in general - it is all done
>> by
>> the OS.
>>

Re: Sizing thread pools

by Gregg Wonderly-2

In the distant past, around 1986, I was responsible for a Vax-11/750 that was
used by both the statistics department and the Math department.  The stats
people ran simulations at their terminals all the time, letting them run for
hours and just stopping by every once in a while to see if they were done.
The more terminals they had access to, the more simulations they'd run.
The math people used a few math packages that usually used some CPU for a few
minutes.  The majority of the people were writing papers and sending email,
and with all the CPU bound stuff going on, the machine was dog slow in
response to interactive requests.

I did some studies of the scheduler and found that it was actually pretty rigid,
and so I wrote a program that ran in the background and periodically lowered
processes' priorities the longer they stayed CPU bound, eventually putting them
below the batch queue priorities even.  This made the interactive users feel
like they were the only ones using the computer.  They started at prio=5, and
got set down to 4 after 6 seconds of CPU.  At 30 seconds they went to 3 (batch
queue priority), and after a couple of minutes or more, they went to 2 as I
recall.  I finally would set them to priority 1 if they ran longer than an hour.
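The demotion rule described above can be written as a function of accumulated CPU time. This is a sketch only: the thresholds come from the anecdote, except that the 120-second step is an assumed value for "a couple of minutes or more", and the class name is hypothetical. Java Thread priorities (1 to 10) are only hints to the OS scheduler and on some platforms have little effect, so a Java monitor applying this rule would be much weaker than what VMS allowed.

```java
/**
 * Sketch of the VAX/VMS demotion rule: the more CPU a process has used, the
 * lower its priority. The 120-second threshold is an assumption.
 */
final class PriorityDecay {

    static int priorityFor(long cpuSeconds) {
        if (cpuSeconds > 3600) return 1; // ran longer than an hour
        if (cpuSeconds >= 120) return 2; // assumed: "a couple of minutes or more"
        if (cpuSeconds >= 30) return 3;  // batch-queue priority
        if (cpuSeconds >= 6) return 4;
        return 5;                        // starting (interactive) priority
    }

    public static void main(String[] args) {
        long[] samples = {0, 5, 6, 29, 30, 119, 120, 3600, 3601};
        for (long s : samples) {
            System.out.printf("%5d s of CPU -> priority %d%n", s, priorityFor(s));
        }
    }
}
```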

I mention this because, for me, it points out that the larger the load on the
machine, the worse latency gets for any response (this is not news).
With priorities, I was able to divide the machine into pieces so that different
users had different opportunities to use the machine's resources more readily.

This was a form of horizontal expansion, because the priority structure on VMS
was fairly rigid.  Expanding horizontally with more machines/CPUs is almost
always the easiest way to deal with latency in a non-real-time OS.  Having 10
VMs running on the same machine is, for me, not horizontal expansion unless you
manage how each VM uses the machine, so that work needing particular resources
is placed in particular VMs, each with purposefully partitioned behavior.  I/O
bandwidth to disk for swapping/paging, network contention, and OS-based
scheduling bottlenecks all need to be considered.

Gregg Wonderly

Hanson Char wrote:

> My simplistic thinking leads me to the same conclusion/expectation :)  
> Thanks, David.
>
> Hanson
>
> On Thu, Aug 20, 2009 at 10:31 PM, David Holmes <davidcholmes@...> wrote:
>
>     Hanson,
>      
>     You'd have to consult some GC experts for an answer to that. My
>     guess is that it all depends :)
>      
>     Even looking at it simplistically though, you'd have to account for
>     the full GCs happening 10x more frequently.
>      
>     Without thinking too hard about this I'd say that with 10 VMs you'd
>     probably reduce the worst-case latency but increase the median latency.
>      
>     David
>      
>      -----Original Message-----
>     From: Hanson Char [mailto:hanson.char@...]
>     Sent: Friday, 21 August 2009 2:57 PM
>     To: dholmes@...
>     Cc: concurrency-interest@...
>     Subject: Re: [concurrency-interest] Sizing thread pools
>
>          >the proof is in the measurement
>
>         Totally agree.
>
>         Besides throughput, there is also the consideration of service
>         response time.  In particular, TP99.9.
>
>         My (naive?) understanding is the latency of performing a full GC
>         is in general linear (or at least in some way proportional) to
>         the heap size, and a full GC would eventually kicks in on a busy
>         JVM regardless of the specific collector (such as CMS) used.  So
>         it will probably take (10 times?) longer for the freeze when a
>         full GC kicks in for a heap of 4GB than 0.4GB.  Having 10 JVM's
>         with 0.4GB heap, the chances of all JVM performing a full GC at
>         the same time is low.  Wouldn't this translate to better TP99.9
>         if we have 10 small JVM's running on the same box than 1 big JVM
>         (assuming we have say 10 (or more) CPU's on the box) ?
>
>         Thanks,
>         Hanson
>
>         On Thu, Aug 20, 2009 at 3:05 PM, David Holmes
>         <davidcholmes@...> wrote:
>
>
>             Hanson Char writes:
>              > For scaling services (written in Java) in general, given
>              > the same box equipped with multiple CPU's (and sufficient
>              > memory), can there be performance advantages in running
>              > multiple (identical) JVM instances (each with smaller
>              > thread pool) instead of running a single instance of JVM
>              > but with a larger thread pool ?
>
>             Generally I'd say that the additional VM overheads would
>             count against you too much. But it all depends on where your
>             performance bottleneck is. In theory if you're saturating a
>             particular aspect of the VM then horizontal-scaling might be
>             a benefit (but usually that involves adding new machines).
>             The main issue is heap: what throughput can you maintain
>             with 10 400MB heaps versus 1 4GB heap, for example ?
>
>             As always with these things, the proof is in the measurement. :)
>
>             Cheers,
>             David Holmes
>
>
>
