Use of scheduledexecutor appropriate for this requirement?

View: New views
12 Messages — Rating Filter:   Alert me  

Use of scheduledexecutor appropriate for this requirement?

by Robert Nicholson-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I have a need to process messages off a JMS queue in batches but yet  
still not wait around for messages to arrive before I can flush a  
batch. So occasionally I will flush a batch of records that is less  
than my batch size long. I do this by arranging for a one-time  
scheduled task to run every time I receive a message. The task will  
run in roughly 10 seconds from the time it's scheduled. If in the  
meantime another message arrives in less than 1 second I will cancel  
the previously scheduled task and arrange for another one. etc etc  
etc. So you can see the difference b/w when I scheduled the most  
recent task and the time of receipt of the most recent message  
determines whether or not I will cancel the most recently scheduled  
task. The idea being is that if I'm receiving a steady flow of  
incoming messages I don't need to rely on the scheduled tasks as I'll  
hit my batch size and flush the batch anyway.

Can somebody comment on the above logic and whether it's an  
appropriate use of scheduled executor.

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by Joe Bowbeer :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Scheduled Executor is a thread-pool re-implementation of java.util.Timer.  If you need multiple threads servicing periodic tasks, that is, if your periodic tasks should execute concurrently, then Scheduled Executor is prescribed.

You can accomplish what you are describing with Timer or ScheduledExecutor.  Your code needs to maintain a current cleanup task that is scheduled to execute in 10 seconds.  New messages would cancel the existing cleanup task and schedule a new task.

Alternatively, you could schedule a single repeating task that executes every 10 seconds, and have it do nothing if the last message was queued less than, say, 1 second prior to its execution.

Joe

On Sat, Oct 24, 2009 at 9:11 PM, Robert Nicholson wrote:
I have a need to process messages off a JMS queue in batches but yet still not wait around for messages to arrive before I can flush a batch. So occasionally I will flush a batch of records that is less than my batch size long. I do this by arranging for a one-time scheduled task to run every time I receive a message. The task will run in roughly 10 seconds from the time it's scheduled. If in the meantime another message arrives in less than 1 second I will cancel the previously scheduled task and arrange for another one. etc etc etc. So you can see the difference b/w when I scheduled the most recent task and the time of receipt of the most recent message determines whether or not I will cancel the most recently scheduled task. The idea being is that if I'm receiving a steady flow of incoming messages I don't need to rely on the scheduled tasks as I'll hit my batch size and flush the batch anyway.

Can somebody comment on the above logic and whether it's an appropriate use of scheduled executor.


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by Gregg Wonderly-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Many times when I need to do this, and I don't otherwise have a STPE around, I
just create a class that does this.  java.util.Timer was not designed for
cancelled tasks at the beginning and it took several iterations (JDK versions)
for it to get fixed to sorta work, and it still had a silly purge mechanism
instead of taking care of this issue by itself.  I stopped using it because of
the garbage it creates for mostly cancel situations like you described.

Gregg Wonderly

Joe Bowbeer wrote:

> Scheduled Executor is a thread-pool re-implementation of
> java.util.Timer.  If you need multiple threads servicing periodic tasks,
> that is, if your periodic tasks should execute concurrently, then
> Scheduled Executor is prescribed.
>
> You can accomplish what you are describing with Timer or
> ScheduledExecutor.  Your code needs to maintain a current cleanup task
> that is scheduled to execute in 10 seconds.  New messages would cancel
> the existing cleanup task and schedule a new task.
>
> Alternatively, you could schedule a single repeating task that executes
> every 10 seconds, and have it do nothing if the last message was queued
> less than, say, 1 second prior to its execution.
>
> Joe
>
> On Sat, Oct 24, 2009 at 9:11 PM, Robert Nicholson wrote:
>
>     I have a need to process messages off a JMS queue in batches but yet
>     still not wait around for messages to arrive before I can flush a
>     batch. So occasionally I will flush a batch of records that is less
>     than my batch size long. I do this by arranging for a one-time
>     scheduled task to run every time I receive a message. The task will
>     run in roughly 10 seconds from the time it's scheduled. If in the
>     meantime another message arrives in less than 1 second I will cancel
>     the previously scheduled task and arrange for another one. etc etc
>     etc. So you can see the difference b/w when I scheduled the most
>     recent task and the time of receipt of the most recent message
>     determines whether or not I will cancel the most recently scheduled
>     task. The idea being is that if I'm receiving a steady flow of
>     incoming messages I don't need to rely on the scheduled tasks as
>     I'll hit my batch size and flush the batch anyway.
>
>     Can somebody comment on the above logic and whether it's an
>     appropriate use of scheduled executor.
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest@...
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by Joe Bowbeer :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

A purge() method was added to Timer in Java 5 -- at the same time that (Scheduled)ThreadPoolExecutor with its purge() method was added.

The backlog of delayed tasks in this case will resemble the input queue delayed by 10 secs.  Purging canceled tasks may be necessary, depending on how many tasks are generated in 10 seconds, and how fast the timer thread can sift through them on its own.

I like the simplicity of the single, repetitive task that only functions if there are messages on the queue that have been sitting there for a while.

Joe


On Mon, Oct 26, 2009 at 6:27 AM, Gregg Wonderly wrote:
Many times when I need to do this, and I don't otherwise have a STPE around, I just create a class that does this.  java.util.Timer was not designed for cancelled tasks at the beginning and it took several iterations (JDK versions) for it to get fixed to sorta work, and it still had a silly purge mechanism instead of taking care of this issue by itself.  I stopped using it because of the garbage it creates for mostly cancel situations like you described.

Gregg Wonderly

Joe Bowbeer wrote:
Scheduled Executor is a thread-pool re-implementation of java.util.Timer.  If you need multiple threads servicing periodic tasks, that is, if your periodic tasks should execute concurrently, then Scheduled Executor is prescribed.

You can accomplish what you are describing with Timer or ScheduledExecutor.  Your code needs to maintain a current cleanup task that is scheduled to execute in 10 seconds.  New messages would cancel the existing cleanup task and schedule a new task.

Alternatively, you could schedule a single repeating task that executes every 10 seconds, and have it do nothing if the last message was queued less than, say, 1 second prior to its execution.

Joe

On Sat, Oct 24, 2009 at 9:11 PM, Robert Nicholson wrote:

   I have a need to process messages off a JMS queue in batches but yet
   still not wait around for messages to arrive before I can flush a
   batch. So occasionally I will flush a batch of records that is less
   than my batch size long. I do this by arranging for a one-time
   scheduled task to run every time I receive a message. The task will
   run in roughly 10 seconds from the time it's scheduled. If in the
   meantime another message arrives in less than 1 second I will cancel
   the previously scheduled task and arrange for another one. etc etc
   etc. So you can see the difference b/w when I scheduled the most
   recent task and the time of receipt of the most recent message
   determines whether or not I will cancel the most recently scheduled
   task. The idea being is that if I'm receiving a steady flow of
   incoming messages I don't need to rely on the scheduled tasks as
   I'll hit my batch size and flush the batch anyway.

   Can somebody comment on the above logic and whether it's an
   appropriate use of scheduled executor.


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by Gregg Wonderly-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Joe Bowbeer wrote:

> A purge() method was added to Timer in Java 5 -- at the same time that
> (Scheduled)ThreadPoolExecutor with its purge() method was added.
>
> The backlog of delayed tasks in this case will resemble the input queue
> delayed by 10 secs.  Purging canceled tasks may be necessary, depending
> on how many tasks are generated in 10 seconds, and how fast the timer
> thread can sift through them on its own.
>
> I like the simplicity of the single, repetitive task that only functions
> if there are messages on the queue that have been sitting there for a while.

This is the bit of the API that I just don't understand the logic of design.  It
takes the same amount of time to purge an entry from the list rather a
background thread does it post cancel(), or the calling thread does it at
cancel().  When cancel() is use heavily, purge's actions need to to be used
heavily.  When cancel is used lightly, purge's actions can be used occasionally.

If purge's actions were performed each time cancel() was called, then in the
occasional use, it would have limited impact on the calling thread.  When cancel
is used heavily, then purge has to be used unconditionally, and this provides
the needed resolution of garbage build up.

So, I don't understand how not performing the purge actions, every time cancel
is called can really be of any benefit.

The data structure design of the "list" may be a factor, but it would seem to be
to be more beneficial to fix that data structure and purge() on cancel(), no
matter what, compared to the surprises that the API can create today when
software designed for own use of the API changes in use, and the API then
creates problems because it doesn't automatically account for that change.

Gregg Wonderly
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by tpeierls :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Looks like a setRemoveOnCancel() method is being added for Java 7. I think it makes cancel() go from O(1) to O(log n). In contrast, purge() is linear in the size of the queue when there is no interference, but potentially quadratic if there's lots of interference. So you can make the time-memory tradeoff decision per STPE instance.

--tim

On Mon, Oct 26, 2009 at 7:16 PM, Gregg Wonderly <gregg@...> wrote:
Joe Bowbeer wrote:
A purge() method was added to Timer in Java 5 -- at the same time that (Scheduled)ThreadPoolExecutor with its purge() method was added.

The backlog of delayed tasks in this case will resemble the input queue delayed by 10 secs.  Purging canceled tasks may be necessary, depending on how many tasks are generated in 10 seconds, and how fast the timer thread can sift through them on its own.

I like the simplicity of the single, repetitive task that only functions if there are messages on the queue that have been sitting there for a while.

This is the bit of the API that I just don't understand the logic of design.  It takes the same amount of time to purge an entry from the list rather a background thread does it post cancel(), or the calling thread does it at cancel().  When cancel() is use heavily, purge's actions need to to be used heavily.  When cancel is used lightly, purge's actions can be used occasionally.

If purge's actions were performed each time cancel() was called, then in the occasional use, it would have limited impact on the calling thread.  When cancel is used heavily, then purge has to be used unconditionally, and this provides the needed resolution of garbage build up.

So, I don't understand how not performing the purge actions, every time cancel is called can really be of any benefit.

The data structure design of the "list" may be a factor, but it would seem to be to be more beneficial to fix that data structure and purge() on cancel(), no matter what, compared to the surprises that the API can create today when software designed for own use of the API changes in use, and the API then creates problems because it doesn't automatically account for that change.

Gregg Wonderly

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Parent Message unknown Re: Use of scheduledexecutor appropriate for this requirement?

by tpeierls :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Are you talking about changing the BlockingQueue implementation in STPE from a heap-based structure to a doubly-linked list? Would that do much better in practice than STPE.setRemoveOrCancel() for the kind of use you're talking about (enqueuing lots of tasks only to cancel them most of the time)? It would mean a performance and concurrency hit for what I think of as more typical uses.

Adding setRemoveOrCancel() isn't "patching over issues" -- it's addressing an issue involving a particular pattern of use without breaking or imposing a performance/contention hit on other uses. Yes, it's an API change, but it's the best kind: adding methods to a concrete class.

If you turn on setRemoveOnCancel() and you're canceling 99% of the tasks you enqueue, the queue size n doesn't get huge, so O(log n) is likely to be small. Have you tried this and determined that it isn't sufficiently fast for your intended use? If so, why not create a new implementation of ScheduledExecutorService, possibly extending ThreadPoolExecutor? If SES is too constrained for your intended use, how about creating a new interface (and implementation)? If you do, please share it with us.

--tim

On Tue, Oct 27, 2009 at 12:03 PM, Gregg Wonderly <gergg@...> wrote:
I've written various timer management things over the years, and I've almost always used a time sorted doubly linked list.  Insert is usually implemented by searching from the end of the list for the insertion point guessing that there would not be a random set of times.  Searching from the end thus has advantages compared to the front, either one being an O(n) operation.

Historically this has made insertion happen as an append and thus a O(1) operation, and the cancel is an O(1) remove.  I understand that the API could have been developed for a specific use case where cancel was not an issue, but more and more cases of people doing things that are more like 99% cancel seem to pop up in discussions like this.  It seems like it would be a good idea to rethink the internals of this class rather than continuing to patch over the issues with more API name space.

Gregg Wonderly

Tim Peierls wrote:
Looks like a setRemoveOnCancel() method is being added for Java 7. I think it makes cancel() go from O(1) to O(log n). In contrast, purge() is linear in the size of the queue when there is no interference, but potentially quadratic if there's lots of interference. So you can make the time-memory tradeoff decision per STPE instance.

--tim

On Mon, Oct 26, 2009 at 7:16 PM, Gregg Wonderly <gregg@... <mailto:gregg@...>> wrote:

   Joe Bowbeer wrote:

       A purge() method was added to Timer in Java 5 -- at the same
       time that (Scheduled)ThreadPoolExecutor with its purge() method
       was added.

       The backlog of delayed tasks in this case will resemble the
       input queue delayed by 10 secs.  Purging canceled tasks may be
       necessary, depending on how many tasks are generated in 10
       seconds, and how fast the timer thread can sift through them on
       its own.

       I like the simplicity of the single, repetitive task that only
       functions if there are messages on the queue that have been
       sitting there for a while.


   This is the bit of the API that I just don't understand the logic of
   design.  It takes the same amount of time to purge an entry from the
   list rather a background thread does it post cancel(), or the
   calling thread does it at cancel().  When cancel() is use heavily,
   purge's actions need to to be used heavily.  When cancel is used
   lightly, purge's actions can be used occasionally.

   If purge's actions were performed each time cancel() was called,
   then in the occasional use, it would have limited impact on the
   calling thread.  When cancel is used heavily, then purge has to be
   used unconditionally, and this provides the needed resolution of
   garbage build up.

   So, I don't understand how not performing the purge actions, every
   time cancel is called can really be of any benefit.

   The data structure design of the "list" may be a factor, but it
   would seem to be to be more beneficial to fix that data structure
   and purge() on cancel(), no matter what, compared to the surprises
   that the API can create today when software designed for own use of
   the API changes in use, and the API then creates problems because it
   doesn't automatically account for that change.

   Gregg Wonderly

   _______________________________________________
   Concurrency-interest mailing list
   Concurrency-interest@...
   <mailto:Concurrency-interest@...> ------------------------------------------------------------------------


_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest



_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by Gregg Wonderly-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Doubly linked list was just an example, any kind of sorted structure which
includes a partitioned structure that can reduce O(n) to a more manageable N is
fine with me.

One traversal to delete 100 entries (for the setRemoveOnCancel(false) case) vs
100 traversals to delete 100 entries (for the setRemoveOnCancel(true) case) is
the issue that I'm trying to focus on.  That's where data structure design makes
the difference.  Constant time access is what should be happening I think.

The reason why I said setRemoveOnCancel() was patching over the issue was that
it implies that there are performance trade offs for one use case vs the other.
   Deferring that activity doesn't reduce the computational impact of finding
and removing a single entry, it only changes when that impact occurs.  If it
does make an impact for deleting multiple entries, one at a time, then that
implies a data structure design issue to me.

Canceling 10/minute is different than 200/sec.  In either case, you need to be
able to get the work done.  If setRemoveOnCancel() is viable for 10/minute but
not for 200/second, then it seems odd that the method would exist.  It it's not
viable for all N/second (i.e. it scales linearly or worse), then adding it is
just a patch to try and make some cases work out better till the point that
performance degrades and it is no longer a viable solution.

Gregg Wonderly

Tim Peierls wrote:

> Are you talking about changing the BlockingQueue implementation in STPE
> from a heap-based structure to a doubly-linked list? Would that do much
> better in practice than STPE.setRemoveOrCancel() for the kind of use
> you're talking about (enqueuing lots of tasks only to cancel them most
> of the time)? It would mean a performance and concurrency hit for what I
> think of as more typical uses.
>
> Adding setRemoveOrCancel() isn't "patching over issues" -- it's
> addressing an issue involving a particular pattern of use without
> breaking or imposing a performance/contention hit on other uses. Yes,
> it's an API change, but it's the best kind: adding methods to a concrete
> class.
>
> If you turn on setRemoveOnCancel() and you're canceling 99% of the tasks
> you enqueue, the queue size n doesn't get huge, so O(log n) is likely to
> be small. Have you tried this and determined that it isn't sufficiently
> fast for your intended use? If so, why not create a new implementation
> of ScheduledExecutorService, possibly extending ThreadPoolExecutor? If
> SES is too constrained for your intended use, how about creating a new
> interface (and implementation)? If you do, please share it with us.
>
> --tim
>
> On Tue, Oct 27, 2009 at 12:03 PM, Gregg Wonderly <gergg@...
> <mailto:gergg@...>> wrote:
>
>     I've written various timer management things over the years, and
>     I've almost always used a time sorted doubly linked list.  Insert is
>     usually implemented by searching from the end of the list for the
>     insertion point guessing that there would not be a random set of
>     times.  Searching from the end thus has advantages compared to the
>     front, either one being an O(n) operation.
>
>     Historically this has made insertion happen as an append and thus a
>     O(1) operation, and the cancel is an O(1) remove.  I understand that
>     the API could have been developed for a specific use case where
>     cancel was not an issue, but more and more cases of people doing
>     things that are more like 99% cancel seem to pop up in discussions
>     like this.  It seems like it would be a good idea to rethink the
>     internals of this class rather than continuing to patch over the
>     issues with more API name space.
>
>     Gregg Wonderly
>
>     Tim Peierls wrote:
>
>         Looks like a setRemoveOnCancel() method is being added for Java
>         7. I think it makes cancel() go from O(1) to O(log n). In
>         contrast, purge() is linear in the size of the queue when there
>         is no interference, but potentially quadratic if there's lots of
>         interference. So you can make the time-memory tradeoff decision
>         per STPE instance.
>
>         --tim
>
>         On Mon, Oct 26, 2009 at 7:16 PM, Gregg Wonderly
>         <gregg@... <mailto:gregg@...>
>         <mailto:gregg@... <mailto:gregg@...>>> wrote:
>
>            Joe Bowbeer wrote:
>
>                A purge() method was added to Timer in Java 5 -- at the same
>                time that (Scheduled)ThreadPoolExecutor with its purge()
>         method
>                was added.
>
>                The backlog of delayed tasks in this case will resemble the
>                input queue delayed by 10 secs.  Purging canceled tasks
>         may be
>                necessary, depending on how many tasks are generated in 10
>                seconds, and how fast the timer thread can sift through
>         them on
>                its own.
>
>                I like the simplicity of the single, repetitive task that
>         only
>                functions if there are messages on the queue that have been
>                sitting there for a while.
>
>
>            This is the bit of the API that I just don't understand the
>         logic of
>            design.  It takes the same amount of time to purge an entry
>         from the
>            list rather a background thread does it post cancel(), or the
>            calling thread does it at cancel().  When cancel() is use
>         heavily,
>            purge's actions need to to be used heavily.  When cancel is used
>            lightly, purge's actions can be used occasionally.
>
>            If purge's actions were performed each time cancel() was called,
>            then in the occasional use, it would have limited impact on the
>            calling thread.  When cancel is used heavily, then purge has
>         to be
>            used unconditionally, and this provides the needed resolution of
>            garbage build up.
>
>            So, I don't understand how not performing the purge actions,
>         every
>            time cancel is called can really be of any benefit.
>
>            The data structure design of the "list" may be a factor, but it
>            would seem to be to be more beneficial to fix that data structure
>            and purge() on cancel(), no matter what, compared to the
>         surprises
>            that the API can create today when software designed for own
>         use of
>            the API changes in use, and the API then creates problems
>         because it
>            doesn't automatically account for that change.
>
>            Gregg Wonderly
>
>            _______________________________________________
>            Concurrency-interest mailing list
>            Concurrency-interest@...
>         <mailto:Concurrency-interest@...>
>            <mailto:Concurrency-interest@...
>         <mailto:Concurrency-interest@...>>
>
>            http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>
>
>         ------------------------------------------------------------------------
>
>
>         _______________________________________________
>         Concurrency-interest mailing list
>         Concurrency-interest@...
>         <mailto:Concurrency-interest@...>
>         http://cs.oswego.edu/mailman/listinfo/concurrency-interest
>
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Concurrency-interest mailing list
> Concurrency-interest@...
> http://cs.oswego.edu/mailman/listinfo/concurrency-interest

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by tpeierls :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Oct 27, 2009 at 3:02 PM, Gregg Wonderly <gergg@...> wrote:
Doubly linked list was just an example, any kind of sorted structure which includes a partitioned structure that can reduce O(n) to a more manageable N is fine with me.

How is doubly-linked list a partitioned structure?

 
One traversal to delete 100 entries (for the setRemoveOnCancel(false) case) vs 100 traversals to delete 100 entries (for the setRemoveOnCancel(true) case) is the issue that I'm trying to focus on.  That's where data structure design makes the difference.  Constant time access is what should be happening I think.

Do you have a sorted BlockingQueue implementation in mind that exhibits constant time put, take, and removal of internal nodes? That would be handy.

 
The reason why I said setRemoveOnCancel() was patching over the issue was that it implies that there are performance trade offs for one use case vs the other.  

There *are* trade-offs: setRemoveOnCancel(false) doesn't cause the lock on the shared queue to be acquired for each cancel; setRemoveOnCancel(true) does. If your main concern is avoiding contention on the queue, then you'll probably prefer the former. If your main concern is to avoid filling up the queue with canceled tasks, you'll probably prefer the latter. If both of these are major concerns, then you have decide which approach will work best, periodic purge() or remove-on-cancel or some combination of these.

I don't see a way around these competing concerns.

 
Deferring that activity doesn't reduce the computational impact of finding and removing a single entry, it only changes when that impact occurs.  If it does make an impact for deleting multiple entries, one at a time, then that implies a data structure design issue to me.

Removing a canceled task from the middle of the queue in STPE means holding the lock for O(log n). Taking a canceled task off the end of the queue and discarding it is O(1). Purging is O(n). Are there concurrent data structures that can do better?

 
Canceling 10/minute is different than 200/sec.  In either case, you need to be able to get the work done.  If setRemoveOnCancel() is viable for 10/minute but not for 200/second, then it seems odd that the method would exist.  It it's not viable for all N/second (i.e. it scales linearly or worse), then adding it is just a patch to try and make some cases work out better till the point that performance degrades and it is no longer a viable solution.

Try setRemoveOnCancel(true) in practice. If it doesn't scale, the right response is to come up with an implementation of SES (or to come up with another approach entirely) that solves your problem, not to force all STPE users to risk contention for the queue lock on every cancel.

Unless, as I've been hinting, you have a data structure that balances both sets of concerns without recourse to a configuration property like remove-on-cancel, in which case ... let's see it! :-)

--tim 

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by Gregg Wonderly-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Tim Peierls wrote:
> On Tue, Oct 27, 2009 at 3:02 PM, Gregg Wonderly <gergg@...
> <mailto:gergg@...>> wrote:
>
>     Doubly linked list was just an example, any kind of sorted structure
>     which includes a partitioned structure that can reduce O(n) to a
>     more manageable N is fine with me.
>
> How is doubly-linked list a partitioned structure?

 From a locking perspective, you lock individual objects to traverse, not one
global lock, so there is less contention on a single resource, over time.  If
you then segment the list into groups of lists (as you'd do in a hash map), you
get partitioning of space/time to search too.

>     One traversal to delete 100 entries (for the
>     setRemoveOnCancel(false) case) vs 100 traversals to delete 100
>     entries (for the setRemoveOnCancel(true) case) is the issue that I'm
>     trying to focus on.  That's where data structure design makes the
>     difference.  Constant time access is what should be happening I think.
>
> Do you have a sorted BlockingQueue implementation in mind that exhibits
> constant time put, take, and removal of internal nodes? That would be handy.

No, I've not used BlockingQueue as the mechanism of implementation.  I have used
paired map (key for search was not the timer entry itself) and a doubly linked
list though.  Random times make the insertion more O(n) than constant, but if
times have semi constant intervals, then it might be that most adds are appends,
and so you can search from the back of a doubly linked list to find the
insertion point.

>     The reason why I said setRemoveOnCancel() was patching over the
>     issue was that it implies that there are performance trade offs for
>     one use case vs the other.  
>
> There *are* trade-offs: setRemoveOnCancel(false) doesn't cause the lock
> on the shared queue to be acquired for each cancel;
> setRemoveOnCancel(true) does.

As I said this is an implementation detail.  I'm not suggesting that it is easy
to create a data structure/model that makes all of this easy to code.  I'm just
suggesting that some shortcuts are being taken because of the existing
implementation details.

> I don't see a way around these competing concerns.

I guess you are saying that it's the number of times the lock is held that is
the biggest issue in your mind?  I.e. if the lock is in place for every cancel,
then every add (happening at the same rate) will likely contend for access?

>     Deferring that activity doesn't reduce the computational impact of
>     finding and removing a single entry, it only changes when that
>     impact occurs.  If it does make an impact for deleting multiple
>     entries, one at a time, then that implies a data structure design
>     issue to me.
>
> Removing a canceled task from the middle of the queue in STPE means
> holding the lock for O(log n). Taking a canceled task off the end of the
> queue and discarding it is O(1). Purging is O(n). Are there concurrent
> data structures that can do better?
>
>     Canceling 10/minute is different than 200/sec.  In either case, you
>     need to be able to get the work done.  If setRemoveOnCancel() is
>     viable for 10/minute but not for 200/second, then it seems odd that
>     the method would exist.  It it's not viable for all N/second (i.e.
>     it scales linearly or worse), then adding it is just a patch to try
>     and make some cases work out better till the point that performance
>     degrades and it is no longer a viable solution.
>
> Try setRemoveOnCancel(true) in practice. If it doesn't scale, the right
> response is to come up with an implementation of SES (or to come up with
> another approach entirely) that solves your problem, not to force all
> STPE users to risk contention for the queue lock on every cancel.
>
> Unless, as I've been hinting, you have a data structure that balances
> both sets of concerns without recourse to a configuration property like
> remove-on-cancel, in which case ... let's see it! :-)

I appreciate your excitement to see a better solution.  I don't have one now,
but perhaps I'll play a bit with some of the things I have laying around to see
if there is something there that I can gel into a BlockingQueue.

Gregg Wonderly
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by tpeierls :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Tue, Oct 27, 2009 at 5:53 PM, Gregg Wonderly <gregg@...> wrote:
How is doubly-linked list a partitioned structure?

>From a locking perspective, you lock individual objects to traverse, not one global lock, so there is less contention on a single resource, over time.  If you then segment the list into groups of lists (as you'd do in a hash map), you get partitioning of space/time to search too.

Ah, OK -- not just a simple doubly-linked list, but something with hand-over-hand or nested locking.

 
I have used paired map (key for search was not the timer entry itself) and a doubly linked list though.  Random times make the insertion more O(n) than constant, but if times have semi constant intervals, then it might be that most adds are appends, and so you can search from the back of a doubly linked list to find the insertion point.

Not something you'd want to use for general-purpose scheduling, but excellent for certain patterns of use.

 
As I said this is an implementation detail.  I'm not suggesting that it is easy to create a data structure/model that makes all of this easy to code.  I'm just suggesting that some shortcuts are being taken because of the existing implementation details.

I think it's unfair to suggest that setRemoveOnCancel() is a "shortcut" without presenting a better general-purpose SES implementation that renders it unnecessary. You'd have to demonstrate (1) that STPE has pathologically bad behavior in certain practical contexts and (2) that the new impl has good behavior in the same contexts and performance similar to (or better than) STPE in all other practical contexts. I haven't seen (1) demonstrated for the Java 7 STPE yet.

My perspective is that there are *some* uses of STPE -- I won't call them niche uses, but I will say that they are by no means the only uses for STPE -- for which the accumulation of canceled tasks is such a problem that it's appropriate to risk a little contention by removing tasks on cancellation. There was no way to achieve this in the past but as of Java 7 there will be. That's not a patch or a shortcut; that's evolution.

 
I don't see a way around these competing concerns.

I guess you are saying that it's the number of times the lock is held that is the biggest issue in your mind?  I.e. if the lock is in place for every cancel, then every add (happening at the same rate) will likely contend for access?

I'm saying that contention for the lock is one of several competing concerns. In some contexts it might be irrelevant; it sounds like that's the case for the folks who have argued for remove-on-cancel as the normal mode.

 
Unless, as I've been hinting, you have a data structure that balances both sets of concerns without recourse to a configuration property like remove-on-cancel, in which case ... let's see it! :-)

I appreciate your excitement to see a better solution.  I don't have one now, but perhaps I'll play a bit with some of the things I have laying around to see if there is something there that I can gel into a BlockingQueue.

OK, and don't rule out the possibility that the best solution for the class of problems that is motivating you does *not* involve a BlockingQueue (and hence does not involve implementing ScheduledExecutorService); maybe you're looking for a different beast entirely, a "ScheduledTaskExecutor", say. Not having to implement BlockingQueue would remove some constraints on your implementation.

--tim

_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest

Re: Use of scheduledexecutor appropriate for this requirement?

by i30817 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Off topic, but still speaking of the scheduled executor,  the main
reason i changed from timer to scheduled executor was that timer
didn't allow reusing the TimerTask easily (if it is only going to be
used by one thread (resubmit) or is immutable). It was a pain to
recreate the object and add a long list of fields to another
constructor (to allow the required immutability that comes from that
design decision).

So i switched to ScheduledExecutor. However things here are missing.
Especially the scheduledExecutionTime from timer task, the fact that
there is no way to (easily) set it up to die after a time without
tasks (like for the others here :
http://www.kimchy.org/juc-executorservice-gotcha/) no way to set up a
sequence of executions with relative to last time (for ex:
schedule(long [] deltas, List<Runnable> s) - with error correction
from scheduledExecutionTime of course).

This is just whining so you can ignore it, it is already solved
(except the thread dieing when after a time without tasks).
_______________________________________________
Concurrency-interest mailing list
Concurrency-interest@...
http://cs.oswego.edu/mailman/listinfo/concurrency-interest