|
View:
New views
12 Messages
—
Rating Filter:
Alert me
|
|
|
Use of scheduledexecutor appropriate for this requirement?I have a need to process messages off a JMS queue in batches but yet
still not wait around for messages to arrive before I can flush a batch. So occasionally I will flush a batch of records that is less than my batch size long. I do this by arranging for a one-time scheduled task to run every time I receive a message. The task will run in roughly 10 seconds from the time it's scheduled. If in the meantime another message arrives in less than 1 second I will cancel the previously scheduled task and arrange for another one. etc etc etc. So you can see the difference b/w when I scheduled the most recent task and the time of receipt of the most recent message determines whether or not I will cancel the most recently scheduled task. The idea being is that if I'm receiving a steady flow of incoming messages I don't need to rely on the scheduled tasks as I'll hit my batch size and flush the batch anyway. Can somebody comment on the above logic and whether it's an appropriate use of scheduled executor. _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?Scheduled Executor is a thread-pool re-implementation of java.util.Timer. If you need multiple threads servicing periodic tasks, that is, if your periodic tasks should execute concurrently, then Scheduled Executor is prescribed.
You can accomplish what you are describing with Timer or ScheduledExecutor. Your code needs to maintain a current cleanup task that is scheduled to execute in 10 seconds. New messages would cancel the existing cleanup task and schedule a new task. Alternatively, you could schedule a single repeating task that executes every 10 seconds, and have it do nothing if the last message was queued less than, say, 1 second prior to its execution. Joe
On Sat, Oct 24, 2009 at 9:11 PM, Robert Nicholson wrote:
I have a need to process messages off a JMS queue in batches but yet still not wait around for messages to arrive before I can flush a batch. So occasionally I will flush a batch of records that is less than my batch size long. I do this by arranging for a one-time scheduled task to run every time I receive a message. The task will run in roughly 10 seconds from the time it's scheduled. If in the meantime another message arrives in less than 1 second I will cancel the previously scheduled task and arrange for another one. etc etc etc. So you can see the difference b/w when I scheduled the most recent task and the time of receipt of the most recent message determines whether or not I will cancel the most recently scheduled task. The idea being is that if I'm receiving a steady flow of incoming messages I don't need to rely on the scheduled tasks as I'll hit my batch size and flush the batch anyway. _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?Many times when I need to do this, and I don't otherwise have a STPE around, I
just create a class that does this. java.util.Timer was not designed for cancelled tasks at the beginning and it took several iterations (JDK versions) for it to get fixed to sorta work, and it still had a silly purge mechanism instead of taking care of this issue by itself. I stopped using it because of the garbage it creates for mostly cancel situations like you described. Gregg Wonderly Joe Bowbeer wrote: > Scheduled Executor is a thread-pool re-implementation of > java.util.Timer. If you need multiple threads servicing periodic tasks, > that is, if your periodic tasks should execute concurrently, then > Scheduled Executor is prescribed. > > You can accomplish what you are describing with Timer or > ScheduledExecutor. Your code needs to maintain a current cleanup task > that is scheduled to execute in 10 seconds. New messages would cancel > the existing cleanup task and schedule a new task. > > Alternatively, you could schedule a single repeating task that executes > every 10 seconds, and have it do nothing if the last message was queued > less than, say, 1 second prior to its execution. > > Joe > > On Sat, Oct 24, 2009 at 9:11 PM, Robert Nicholson wrote: > > I have a need to process messages off a JMS queue in batches but yet > still not wait around for messages to arrive before I can flush a > batch. So occasionally I will flush a batch of records that is less > than my batch size long. I do this by arranging for a one-time > scheduled task to run every time I receive a message. The task will > run in roughly 10 seconds from the time it's scheduled. If in the > meantime another message arrives in less than 1 second I will cancel > the previously scheduled task and arrange for another one. etc etc > etc. So you can see the difference b/w when I scheduled the most > recent task and the time of receipt of the most recent message > determines whether or not I will cancel the most recently scheduled > task. The idea being is that if I'm receiving a steady flow of > incoming messages I don't need to rely on the scheduled tasks as > I'll hit my batch size and flush the batch anyway. > > Can somebody comment on the above logic and whether it's an > appropriate use of scheduled executor. > > > ------------------------------------------------------------------------ > > _______________________________________________ > Concurrency-interest mailing list > Concurrency-interest@... > http://cs.oswego.edu/mailman/listinfo/concurrency-interest _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?A purge() method was added to Timer in Java 5 -- at the same time that (Scheduled)ThreadPoolExecutor with its purge() method was added.
The backlog of delayed tasks in this case will resemble the input queue delayed by 10 secs. Purging canceled tasks may be necessary, depending on how many tasks are generated in 10 seconds, and how fast the timer thread can sift through them on its own. I like the simplicity of the single, repetitive task that only functions if there are messages on the queue that have been sitting there for a while. Joe On Mon, Oct 26, 2009 at 6:27 AM, Gregg Wonderly wrote:
Many times when I need to do this, and I don't otherwise have a STPE around, I just create a class that does this. java.util.Timer was not designed for cancelled tasks at the beginning and it took several iterations (JDK versions) for it to get fixed to sorta work, and it still had a silly purge mechanism instead of taking care of this issue by itself. I stopped using it because of the garbage it creates for mostly cancel situations like you described. _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?Joe Bowbeer wrote:
> A purge() method was added to Timer in Java 5 -- at the same time that > (Scheduled)ThreadPoolExecutor with its purge() method was added. > > The backlog of delayed tasks in this case will resemble the input queue > delayed by 10 secs. Purging canceled tasks may be necessary, depending > on how many tasks are generated in 10 seconds, and how fast the timer > thread can sift through them on its own. > > I like the simplicity of the single, repetitive task that only functions > if there are messages on the queue that have been sitting there for a while. This is the bit of the API that I just don't understand the logic of design. It takes the same amount of time to purge an entry from the list rather a background thread does it post cancel(), or the calling thread does it at cancel(). When cancel() is use heavily, purge's actions need to to be used heavily. When cancel is used lightly, purge's actions can be used occasionally. If purge's actions were performed each time cancel() was called, then in the occasional use, it would have limited impact on the calling thread. When cancel is used heavily, then purge has to be used unconditionally, and this provides the needed resolution of garbage build up. So, I don't understand how not performing the purge actions, every time cancel is called can really be of any benefit. The data structure design of the "list" may be a factor, but it would seem to be to be more beneficial to fix that data structure and purge() on cancel(), no matter what, compared to the surprises that the API can create today when software designed for own use of the API changes in use, and the API then creates problems because it doesn't automatically account for that change. Gregg Wonderly _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?Looks like a setRemoveOnCancel() method is being added for Java 7. I think it makes cancel() go from O(1) to O(log n). In contrast, purge() is linear in the size of the queue when there is no interference, but potentially quadratic if there's lots of interference. So you can make the time-memory tradeoff decision per STPE instance.
--tim
On Mon, Oct 26, 2009 at 7:16 PM, Gregg Wonderly <gregg@...> wrote:
_______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
|
|
|
Re: Use of scheduledexecutor appropriate for this requirement?Doubly linked list was just an example, any kind of sorted structure which
includes a partitioned structure that can reduce O(n) to a more manageable N is fine with me. One traversal to delete 100 entries (for the setRemoveOnCancel(false) case) vs 100 traversals to delete 100 entries (for the setRemoveOnCancel(true) case) is the issue that I'm trying to focus on. That's where data structure design makes the difference. Constant time access is what should be happening I think. The reason why I said setRemoveOnCancel() was patching over the issue was that it implies that there are performance trade offs for one use case vs the other. Deferring that activity doesn't reduce the computational impact of finding and removing a single entry, it only changes when that impact occurs. If it does make an impact for deleting multiple entries, one at a time, then that implies a data structure design issue to me. Canceling 10/minute is different than 200/sec. In either case, you need to be able to get the work done. If setRemoveOnCancel() is viable for 10/minute but not for 200/second, then it seems odd that the method would exist. It it's not viable for all N/second (i.e. it scales linearly or worse), then adding it is just a patch to try and make some cases work out better till the point that performance degrades and it is no longer a viable solution. Gregg Wonderly Tim Peierls wrote: > Are you talking about changing the BlockingQueue implementation in STPE > from a heap-based structure to a doubly-linked list? Would that do much > better in practice than STPE.setRemoveOrCancel() for the kind of use > you're talking about (enqueuing lots of tasks only to cancel them most > of the time)? It would mean a performance and concurrency hit for what I > think of as more typical uses. > > Adding setRemoveOrCancel() isn't "patching over issues" -- it's > addressing an issue involving a particular pattern of use without > breaking or imposing a performance/contention hit on other uses. Yes, > it's an API change, but it's the best kind: adding methods to a concrete > class. > > If you turn on setRemoveOnCancel() and you're canceling 99% of the tasks > you enqueue, the queue size n doesn't get huge, so O(log n) is likely to > be small. Have you tried this and determined that it isn't sufficiently > fast for your intended use? If so, why not create a new implementation > of ScheduledExecutorService, possibly extending ThreadPoolExecutor? If > SES is too constrained for your intended use, how about creating a new > interface (and implementation)? If you do, please share it with us. > > --tim > > On Tue, Oct 27, 2009 at 12:03 PM, Gregg Wonderly <gergg@... > <mailto:gergg@...>> wrote: > > I've written various timer management things over the years, and > I've almost always used a time sorted doubly linked list. Insert is > usually implemented by searching from the end of the list for the > insertion point guessing that there would not be a random set of > times. Searching from the end thus has advantages compared to the > front, either one being an O(n) operation. > > Historically this has made insertion happen as an append and thus a > O(1) operation, and the cancel is an O(1) remove. I understand that > the API could have been developed for a specific use case where > cancel was not an issue, but more and more cases of people doing > things that are more like 99% cancel seem to pop up in discussions > like this. It seems like it would be a good idea to rethink the > internals of this class rather than continuing to patch over the > issues with more API name space. > > Gregg Wonderly > > Tim Peierls wrote: > > Looks like a setRemoveOnCancel() method is being added for Java > 7. I think it makes cancel() go from O(1) to O(log n). In > contrast, purge() is linear in the size of the queue when there > is no interference, but potentially quadratic if there's lots of > interference. So you can make the time-memory tradeoff decision > per STPE instance. > > --tim > > On Mon, Oct 26, 2009 at 7:16 PM, Gregg Wonderly > <gregg@... <mailto:gregg@...> > <mailto:gregg@... <mailto:gregg@...>>> wrote: > > Joe Bowbeer wrote: > > A purge() method was added to Timer in Java 5 -- at the same > time that (Scheduled)ThreadPoolExecutor with its purge() > method > was added. > > The backlog of delayed tasks in this case will resemble the > input queue delayed by 10 secs. Purging canceled tasks > may be > necessary, depending on how many tasks are generated in 10 > seconds, and how fast the timer thread can sift through > them on > its own. > > I like the simplicity of the single, repetitive task that > only > functions if there are messages on the queue that have been > sitting there for a while. > > > This is the bit of the API that I just don't understand the > logic of > design. It takes the same amount of time to purge an entry > from the > list rather a background thread does it post cancel(), or the > calling thread does it at cancel(). When cancel() is use > heavily, > purge's actions need to to be used heavily. When cancel is used > lightly, purge's actions can be used occasionally. > > If purge's actions were performed each time cancel() was called, > then in the occasional use, it would have limited impact on the > calling thread. When cancel is used heavily, then purge has > to be > used unconditionally, and this provides the needed resolution of > garbage build up. > > So, I don't understand how not performing the purge actions, > every > time cancel is called can really be of any benefit. > > The data structure design of the "list" may be a factor, but it > would seem to be to be more beneficial to fix that data structure > and purge() on cancel(), no matter what, compared to the > surprises > that the API can create today when software designed for own > use of > the API changes in use, and the API then creates problems > because it > doesn't automatically account for that change. > > Gregg Wonderly > > _______________________________________________ > Concurrency-interest mailing list > Concurrency-interest@... > <mailto:Concurrency-interest@...> > <mailto:Concurrency-interest@... > <mailto:Concurrency-interest@...>> > > http://cs.oswego.edu/mailman/listinfo/concurrency-interest > > > > ------------------------------------------------------------------------ > > > _______________________________________________ > Concurrency-interest mailing list > Concurrency-interest@... > <mailto:Concurrency-interest@...> > http://cs.oswego.edu/mailman/listinfo/concurrency-interest > > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Concurrency-interest mailing list > Concurrency-interest@... > http://cs.oswego.edu/mailman/listinfo/concurrency-interest _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?On Tue, Oct 27, 2009 at 3:02 PM, Gregg Wonderly <gergg@...> wrote: Doubly linked list was just an example, any kind of sorted structure which includes a partitioned structure that can reduce O(n) to a more manageable N is fine with me.
How is doubly-linked list a partitioned structure? One traversal to delete 100 entries (for the setRemoveOnCancel(false) case) vs 100 traversals to delete 100 entries (for the setRemoveOnCancel(true) case) is the issue that I'm trying to focus on. That's where data structure design makes the difference. Constant time access is what should be happening I think. Do you have a sorted BlockingQueue implementation in mind that exhibits constant time put, take, and removal of internal nodes? That would be handy. The reason why I said setRemoveOnCancel() was patching over the issue was that it implies that there are performance trade offs for one use case vs the other.
There *are* trade-offs: setRemoveOnCancel(false) doesn't cause the lock on the shared queue to be acquired for each cancel; setRemoveOnCancel(true) does. If your main concern is avoiding contention on the queue, then you'll probably prefer the former. If your main concern is to avoid filling up the queue with canceled tasks, you'll probably prefer the latter. If both of these are major concerns, then you have decide which approach will work best, periodic purge() or remove-on-cancel or some combination of these.
I don't see a way around these competing concerns. Deferring that activity doesn't reduce the computational impact of finding and removing a single entry, it only changes when that impact occurs. If it does make an impact for deleting multiple entries, one at a time, then that implies a data structure design issue to me. Removing a canceled task from the middle of the queue in STPE means holding the lock for O(log n). Taking a canceled task off the end of the queue and discarding it is O(1). Purging is O(n). Are there concurrent data structures that can do better?
Canceling 10/minute is different than 200/sec. In either case, you need to be able to get the work done. If setRemoveOnCancel() is viable for 10/minute but not for 200/second, then it seems odd that the method would exist. It it's not viable for all N/second (i.e. it scales linearly or worse), then adding it is just a patch to try and make some cases work out better till the point that performance degrades and it is no longer a viable solution. Try setRemoveOnCancel(true) in practice. If it doesn't scale, the right response is to come up with an implementation of SES (or to come up with another approach entirely) that solves your problem, not to force all STPE users to risk contention for the queue lock on every cancel.
Unless, as I've been hinting, you have a data structure that balances both sets of concerns without recourse to a configuration property like remove-on-cancel, in which case ... let's see it! :-)
--tim _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?Tim Peierls wrote:
> On Tue, Oct 27, 2009 at 3:02 PM, Gregg Wonderly <gergg@... > <mailto:gergg@...>> wrote: > > Doubly linked list was just an example, any kind of sorted structure > which includes a partitioned structure that can reduce O(n) to a > more manageable N is fine with me. > > How is doubly-linked list a partitioned structure? From a locking perspective, you lock individual objects to traverse, not one global lock, so there is less contention on a single resource, over time. If you then segment the list into groups of lists (as you'd do in a hash map), you get partitioning of space/time to search too. > One traversal to delete 100 entries (for the > setRemoveOnCancel(false) case) vs 100 traversals to delete 100 > entries (for the setRemoveOnCancel(true) case) is the issue that I'm > trying to focus on. That's where data structure design makes the > difference. Constant time access is what should be happening I think. > > Do you have a sorted BlockingQueue implementation in mind that exhibits > constant time put, take, and removal of internal nodes? That would be handy. No, I've not used BlockingQueue as the mechanism of implementation. I have used paired map (key for search was not the timer entry itself) and a doubly linked list though. Random times make the insertion more O(n) than constant, but if times have semi constant intervals, then it might be that most adds are appends, and so you can search from the back of a doubly linked list to find the insertion point. > The reason why I said setRemoveOnCancel() was patching over the > issue was that it implies that there are performance trade offs for > one use case vs the other. > > There *are* trade-offs: setRemoveOnCancel(false) doesn't cause the lock > on the shared queue to be acquired for each cancel; > setRemoveOnCancel(true) does. As I said this is an implementation detail. I'm not suggesting that it is easy to create a data structure/model that makes all of this easy to code. I'm just suggesting that some shortcuts are being taken because of the existing implementation details. > I don't see a way around these competing concerns. I guess you are saying that it's the number of times the lock is held that is the biggest issue in your mind? I.e. if the lock is in place for every cancel, then every add (happening at the same rate) will likely contend for access? > Deferring that activity doesn't reduce the computational impact of > finding and removing a single entry, it only changes when that > impact occurs. If it does make an impact for deleting multiple > entries, one at a time, then that implies a data structure design > issue to me. > > Removing a canceled task from the middle of the queue in STPE means > holding the lock for O(log n). Taking a canceled task off the end of the > queue and discarding it is O(1). Purging is O(n). Are there concurrent > data structures that can do better? > > Canceling 10/minute is different than 200/sec. In either case, you > need to be able to get the work done. If setRemoveOnCancel() is > viable for 10/minute but not for 200/second, then it seems odd that > the method would exist. It it's not viable for all N/second (i.e. > it scales linearly or worse), then adding it is just a patch to try > and make some cases work out better till the point that performance > degrades and it is no longer a viable solution. > > Try setRemoveOnCancel(true) in practice. If it doesn't scale, the right > response is to come up with an implementation of SES (or to come up with > another approach entirely) that solves your problem, not to force all > STPE users to risk contention for the queue lock on every cancel. > > Unless, as I've been hinting, you have a data structure that balances > both sets of concerns without recourse to a configuration property like > remove-on-cancel, in which case ... let's see it! :-) I appreciate your excitement to see a better solution. I don't have one now, but perhaps I'll play a bit with some of the things I have laying around to see if there is something there that I can gel into a BlockingQueue. Gregg Wonderly _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?On Tue, Oct 27, 2009 at 5:53 PM, Gregg Wonderly <gregg@...> wrote:
Ah, OK -- not just a simple doubly-linked list, but something with hand-over-hand or nested locking.
Not something you'd want to use for general-purpose scheduling, but excellent for certain patterns of use.
I think it's unfair to suggest that setRemoveOnCancel() is a "shortcut" without presenting a better general-purpose SES implementation that renders it unnecessary. You'd have to demonstrate (1) that STPE has pathologically bad behavior in certain practical contexts and (2) that the new impl has good behavior in the same contexts and performance similar to (or better than) STPE in all other practical contexts. I haven't seen (1) demonstrated for the Java 7 STPE yet.
My perspective is that there are *some* uses of STPE -- I won't call them niche uses, but I will say that they are by no means the only uses for STPE -- for which the accumulation of canceled tasks is such a problem that it's appropriate to risk a little contention by removing tasks on cancellation. There was no way to achieve this in the past but as of Java 7 there will be. That's not a patch or a shortcut; that's evolution.
I'm saying that contention for the lock is one of several competing concerns. In some contexts it might be irrelevant; it sounds like that's the case for the folks who have argued for remove-on-cancel as the normal mode.
OK, and don't rule out the possibility that the best solution for the class of problems that is motivating you does *not* involve a BlockingQueue (and hence does not involve implementing ScheduledExecutorService); maybe you're looking for a different beast entirely, a "ScheduledTaskExecutor", say. Not having to implement BlockingQueue would remove some constraints on your implementation.
--tim _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
|
|
Re: Use of scheduledexecutor appropriate for this requirement?Off topic, but still speaking of the scheduled executor, the main
reason i changed from timer to scheduled executor was that timer didn't allow reusing the TimerTask easily (if it is only going to be used by one thread (resubmit) or is immutable). It was a pain to recreate the object and add a long list of fields to another constructor (to allow the required immutability that comes from that design decision). So i switched to ScheduledExecutor. However things here are missing. Especially the scheduledExecutionTime from timer task, the fact that there is no way to (easily) set it up to die after a time without tasks (like for the others here : http://www.kimchy.org/juc-executorservice-gotcha/) no way to set up a sequence of executions with relative to last time (for ex: schedule(long [] deltas, List<Runnable> s) - with error correction from scheduledExecutionTime of course). This is just whining so you can ignore it, it is already solved (except the thread dieing when after a time without tasks). _______________________________________________ Concurrency-interest mailing list Concurrency-interest@... http://cs.oswego.edu/mailman/listinfo/concurrency-interest |
| Free embeddable forum powered by Nabble | Forum Help |