File upload progress


RE: File upload progress

by Mike Wilson-5

It sounds like a good idea to expose the progress identity in some way. First though, I just realized there is something we need to figure out: should progress apply to a single call or to all calls in a batch?
I.e., with the following code:

dwr.engine.beginBatch();
remote.uploadFile(..., {progressHandler:a}); 
remote.uploadFile(..., {progressHandler:b}); 
dwr.engine.endBatch({progressHandler:c});
would we deliver individual progress to a and b, or just total progress to c?
I guess the individual version (a+b) is the most elegant, but how doable is that with file uploading, as we rely on Commons FileUpload?
When progress is done on call level the invocationId needs to contain callId in addition to batchId but that's not a problem.
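To make the call-level identity concrete, here is a rough sketch of how the parts could be joined. The helper name `makeInvocationId`, the "/" separator, and the sample values are assumptions made for illustration only; none of them are taken from engine.js.

```javascript
// Hypothetical helper, not part of engine.js.
// Builds a batch-level id, or a call-level id when a callId is supplied.
function makeInvocationId(scriptSessionId, batchId, callId) {
    var id = scriptSessionId + "/" + batchId;
    if (callId !== undefined && callId !== null) {
        id += "/" + callId; // call-level progress needs the callId as well
    }
    return id;
}

var batchLevel = makeInvocationId("SSID123", 7);    // total progress (handler c)
var callLevelA = makeInvocationId("SSID123", 7, 0); // first call (handler a)
var callLevelB = makeInvocationId("SSID123", 7, 1); // second call (handler b)
```

With ids shaped like this, the server could key progress either per batch or per call without changing the client-side scheme.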
 
Anyway, once we know the above the next question is how to include the invocation/progress identity in WebContext:
 
Browser data               Server object  WebContext member
------------               -------------  -----------------
JSESSIONID                 HttpSession    HttpSession
scriptSessionId            ScriptSession  ScriptSession
batch                      CallBatch      String batchInvocationId?
batch.map["c"+callId+...]  Call           String callInvocationId?
 
The existing information in WebContext is exposed as objects and not ids, i.e. you get the actual ScriptSession object and not just the scriptSessionId. Looking at invocations, we do have internal server-side objects that correspond to calls and batches, but we are not exposing them today. We probably don't want to expose them just to be able to query about invocationIds, so putting the string ids directly on WebContext may be the best option, although that makes it somewhat inconsistent. Thoughts?
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: 19 June 2009 10:43
To: users@...
Subject: Re: [dwr-user] File upload progress

That's an interesting thought.

On a side note... do you think I should add getInvocationId() to WebContext?
With this available, we'd be one step closer to supporting progress updates for long running tasks other than fileUploads.

2009/6/18 Mike Wilson <mikewse@...>
Lance wrote:

> Maybe this is what you suggested initially ? :-)
It sure was!!
Sorry about that... <:-S
Ok, so you want getProgress() to be a serverside call. Personally I don't care... I think most people will use the progressHandler approach. 
So do I, so let's make that first. An interesting use for the getProgress() version though is the following:
var future1 = remote.uploadFile(..., {errorHandler:...}); 
var future2 = remote.uploadFile(..., {errorHandler:...});
... 
setTimeout(function(){
    dwr.engine.beginBatch();
    future1.getProgress(function(pinfo){...}); 
    future2.getProgress(function(pinfo){...});
    dwr.engine.endBatch();
}, 1000);
I.e., user code can ask to get several progress remote calls packed into one request.
(Note that this whole thing is assuming a browser with >2 connections allowed to the server.)
Best regards
Mike 
2009/6/18 Mike Wilson <mikewse@...>
Lance wrote:

Ah... ok... future.getProgress() is a client side action and the progress gets updated by the progressHandler. I am happy to go ahead with this solution. 
No, I was thinking that getProgress() could be a low-level access to the progress API when the user wants to set up his own timer. I was assuming that this syntax:
remote.uploadFile(..., {errorHandler:..., progressHandler:...});
would imply that we keep asking for progress updates at regular intervals and call the supplied progressHandler?
The getProgress() construct would rather look like this:
var future = remote.uploadFile(..., {errorHandler:...});
...
future.getProgress(function(progressinfo){progressinfo.percentage...});
and ask for progress just once (and it should have a callback just like you point out for cancel() as it leads to a remote call). Maybe this is what you suggested initially ? :-)
Anyway, the progressHandler call option is the most important one here, I think. I just thought that we could implement the getProgress() one as well, as it seems pretty easy. The difference is that the former is called by a timer that we keep in DWR but the latter is triggered from user code.
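The difference between the two styles can be sketched as follows. This assumes the callback form of getProgress() discussed in this thread; `makeFakeFuture` and `pollProgress` are hypothetical names, and the fake future answers from memory where a real one would make a remote call.

```javascript
// Hypothetical stand-in for the object returned by a remote call.
// Each getProgress() call advances the fake upload by 50%.
function makeFakeFuture() {
    var percentage = 0;
    return {
        getProgress: function (callback) {
            percentage = Math.min(100, percentage + 50);
            callback({ percentage: percentage });
        }
    };
}

// Timer-driven variant (what the progressHandler option would do inside DWR):
// keep asking at a fixed interval until the upload reports 100%.
function pollProgress(future, progressHandler, intervalMs) {
    var timer = setInterval(function () {
        future.getProgress(function (info) {
            progressHandler(info);
            if (info.percentage >= 100) {
                clearInterval(timer); // stop polling once the upload is done
            }
        });
    }, intervalMs);
    return timer;
}
```

User code that wants its own schedule would skip `pollProgress` and call `future.getProgress(...)` directly whenever it likes.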
 
Best regards
Mike
I think we can still pass a handler to cancel though since it invokes a remote call.
  future.cancel(function() {
     alert('Task cancelled on server');
  });

As for changing the progressHandler or callback in the middle of the call... I'm not going anywhere near this but I agree it's possible. 
Yes, I think there is no reason for us to do this, so let's skip it.
Cheers,
Lance.


2009/6/18 Mike Wilson <mikewse@...>
I think we are mixing two ways of specifying call options here, I'll try to describe what I mean:
 
Currently we specify all call options in the last call argument:
remote.uploadFile(..., {errorHandler:..., progressHandler:...});
These call options are input to the call, telling DWR how to process the call.
With the new returned "future" object:
var future = remote.uploadFile(...);
we now have a handle to everything supplied as input to the call, and its future output. In theory, this could include being able to examine, or change, all call options supplied in the actual call:
var future = remote.uploadFile(..., {errorHandler:..., progressHandler:...});
future.getErrorHandler()
future.getProgressHandler()
future.setProgressHandler(function...) // corresponds to your getProgress()
This could offer a totally new way of supplying call options, but on the other hand allows the user to change handlers in the middle of a call (which could mean more cases for us to handle) and can only be supported for a subset of call options (e.g. not for the async or ordered flags). I think we should probably skip this kind of API for the time being.
 
OTOH, things that I think we should provide on the "future" object are methods that don't correspond to call options. The cancel() method is one example, and "my version" of a getProgress() method that just manually asks for the current progress once and returns that value is another:
var future = remote.uploadFile(..., {errorHandler:..., progressHandler:...});
...
var progressinfo = future.getProgress();
if (progressinfo.percentage < 20) future.cancel();
This could be used by user code that wants to implement their own progress checking without relying on our timer.
What do you think about this distinction?
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: 18 June 2009 10:53

To: users@...
Subject: Re: [dwr-user] File upload progress

It will be more like:

var future = remote.uploadFile(...);
future.getProgress(function(progress) {
   progress.percentage;
   progress.totalBytes; 
});

Cheers,
Lance

2009/6/18 Mike Wilson <mikewse@...>
My personal taste, and I believe more in the "JS style" of things, is to have fewer methods and more stuff as object properties, so I would go for a single method and an object with all needed info, thus:
 
    var future = remote.uploadFile(...);
    var progress = future.getProgress();
    ->
    progress.percentage
    progress.totalBytes
    etc...
 
future.cancel() sounds good too!
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: 18 June 2009 10:02

To: users@...
Subject: Re: [dwr-user] File upload progress

I like the sound of this and it's a decent first cut.

What object do you think I should return from getProgress()? I could either return an object that can be used in all cases (percentage, totalBytes, currentBytes) or I could have two functions:
   getProgress() // returns percentage
   getBytesProgress() // returns totalBytes, currentBytes

Also, I will have a cancel() function on the returned object.

2009/6/17 Mike Wilson <mikewse@...>
So, I propose the following:
  1. Lance implements the progress function along these lines for RC2 (or later?):
    • client-side algorithm with Math.random() etc as described to generate invocationId
    • no logic for using custom invocationId
    • update dwr.engine.transport.send to return the batch (or some object wrapping the batch or batchId) for everything except XHR with async:false
    • add a getProgress() method to the returned object (batch or other) that will transparently use the saved invocationId on the batch
  2. Later (for RC3 or 3.0) I implement:
    • server-side entropy cookie
    • change engine.js scriptSessionId handling to use the client-side progress algorithm (including entropy cookie) and replace the current server-side solution
What do you all think?
 
Best regards
Mike


From: Mike Wilson [mailto:mikewse@...]
Sent: 15 June 2009 17:49
Subject: RE: [dwr-user] File upload progress

If doing (1) we could probably combine the following available sources to get more randomness:
  • Math.random()
  • window.location
  • document.cookie
  • new Date().getTime() executed at include of engine.js
  • new Date().getTime() executed at first call
  • checksum of serialized inbound data in first call (although this is pretty uninteresting if the first call is a poll ;-)
We could also make a "light" (no extra request) combination of version (1) and (4) by letting all DWR responses carry a small entropy cookie that we change for every response. This would put the entropy algorithm on the server and a page would take a snapshot of the value at load time to use for constructing its scriptSessionId that it will use throughout its own lifetime. Normally, the entropy cookie would be set before the first call as we would deliver it in the engine.js response as well.
In theory, it would be enough with a 10-digit string containing [0-9A-F] to serve 1000 unique seeds per second for 35 years.
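The arithmetic behind that estimate can be checked with a few lines. The variable names are illustrative, and a year is taken as 365.25 days, which is an assumption.

```javascript
// 10 hex digits give 16^10 distinct values.
var combinations = Math.pow(16, 10);

// Seeds consumed per year at 1000 seeds per second (365.25-day year assumed).
var seedsPerYear = 1000 * 60 * 60 * 24 * 365.25;

// How long the id space lasts at that rate.
var years = combinations / seedsPerYear;
```

The result comes out just under 35 years, so the figure quoted above is a fair rounding.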
 
I was thinking that maybe this cookie could also be used for your suggestion on the new CSRF protection mechanism, but I made some tests and at least IE (surprise) seems to do no locking and be very liberal about changing cookie values under your feet when you work with the same cookie in multiple windows. So this would need some more work.
 
Though, I think we may be close to a scriptSessionId solution with the above algorithm. And yes, let's see where we end up in the discussion. Maybe it's a 3.1/3.5/4.0 feature.
 
Best regards
Mike


From: joseph.walker@... [mailto:joseph.walker@...] On Behalf Of Joe Walker
Sent: 13 June 2009 09:25
To: users@...
Subject: Re: [dwr-user] File upload progress


I would vote for the alternatives in the order 1, 3, 2, the same as you, Mike, although I would temper that with the need to do a release. It feels like this may be stepping into new features a bit.

This is my trail of how to create a crypto secure random # in JavaScript:

Version 1:
// Builds a random string of the given length (default 16) from Math.random().
var randomPassword = function(length) {
    length = length || 16;
    var chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    var pass = "";
    for (var x = 0; x < length; x++) {
        var charIndex = Math.floor(Math.random() * chars.length);
        pass += chars.charAt(charIndex);
    }
    return pass;
};

I have a feeling that Math.random() is decent on IE, but not generally trustworthy.
http://stackoverflow.com/questions/578700/how-trustworthy-is-javascripts-random-implementation-in-various-browsers/578714 certainly dislikes it.

Version 2:
window.crypto.random();
For once mozdev is up to date (https://developer.mozilla.org/en/JavaScript_crypto) I tried window.crypto.random(10); and it indeed is a function that dies instantly.
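For comparison, a feature-detecting sketch is shown below. It relies on `crypto.getRandomValues`, an API that postdates this thread and is not something the participants had available; where it is missing, the code falls back to Math.random(). The function name `randomHex` is an assumption.

```javascript
// Returns a string of `length` hex digits.
// Uses crypto.getRandomValues when the environment provides it;
// otherwise falls back to Math.random(), which is not cryptographically secure.
function randomHex(length) {
    var hex = "0123456789ABCDEF";
    var out = "";
    var i;
    if (typeof crypto !== "undefined" && crypto.getRandomValues) {
        var bytes = new Uint8Array(length);
        crypto.getRandomValues(bytes);
        for (i = 0; i < length; i++) {
            out += hex.charAt(bytes[i] & 15); // low 4 bits give one hex digit
        }
    } else {
        for (i = 0; i < length; i++) {
            out += hex.charAt(Math.floor(Math.random() * 16));
        }
    }
    return out;
}

var sample = randomHex(10);
```

Checking for the function before calling it avoids the failure mode described above, where a documented entry point exists but cannot be used.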

Version 3:
Clipperzlib is an AGPL, JavaScript PRNG http://sourceforge.net/projects/clipperzlib http://www.clipperz.com/open_source/javascript_crypto_library

Version 4:
We could even have a dwr "send me a random number function". ;-)

I would have thought that version 1 could be updated with some entropy as you noted Mike, without too much difficulty.
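A minimal sketch of what "version 1 with some entropy" might look like follows. The hash is a simple djb2-style mix chosen for brevity, not a cryptographic function, and the names `hash32` and `seededId` as well as the sample source values are assumptions.

```javascript
// Simple 32-bit string hash (djb2-style). Not cryptographic.
function hash32(str) {
    var h = 5381;
    for (var i = 0; i < str.length; i++) {
        h = ((h * 33) ^ str.charCodeAt(i)) >>> 0;
    }
    return h;
}

// Builds an id of the given length, mixing the supplied entropy sources
// (timestamps, location, cookie text, ...) with Math.random() at every step.
function seededId(sources, length) {
    var chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    var seed = hash32(sources.join("|"));
    var id = "";
    for (var x = 0; x < length; x++) {
        seed = hash32(seed + ":" + Math.random() + ":" + x);
        id += chars.charAt(seed % chars.length);
    }
    return id;
}

// In a browser the sources could be window.location.href, document.cookie and
// the time engine.js was included; literal stand-ins are used here.
var loadTime = new Date().getTime();
var id = seededId([loadTime, "http://example.com/page", "name=value"], 16);
```

This only spreads the risk of two pages picking the same id; it does not make the result unguessable.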

Joe.


On Fri, Jun 12, 2009 at 9:48 AM, Mike Wilson <mikewse@...> wrote:
That's true, but I find the principle important about not bloating DWR with multiple algorithms filling the same purpose.
 
We have the following alternatives:
  1. Use new client-generated ID for both scriptSessionId and progress.
  2. Use current server-generated ID for scriptSessionId and new client-generated ID for progress.
  3. Use current server-generated ID for both scriptSessionId and progress.
I understand that you Jose, and I think Lance, prefer (2)?
If possible to implement, I would choose (1) but if not, I lean towards (3).
Then again, if everybody else agrees on (2) I don't have a problem with that.
Joe: did you have any thoughts on how an algorithm for (1) could look? Maybe add some entropy from the browser window's x/y and width/height :-).
 
Best regards
Mike


From: Jose Noheda [mailto:jose.noheda@...]
Sent: 12 June 2009 08:48

To: users@...
Subject: Re: [dwr-user] File upload progress

I'm all for a client-side generated id for upload management only. I think we have deviated a lot from our original purpose.

Regards

On Thu, Jun 11, 2009 at 11:13 PM, Joe Walker <joe@...> wrote:

I'm not convinced that we need server-side IDs at all, do we?

The reason I originally changed my code from client-side generated IDs to server-side generated ones was over 2 fears:
- ID spoofing: that there might be a way to session fixate someone (there's a good wikipedia article on it)
- Denial of Service: that a rogue client could flood the server with IDs

Traditional Http-Session fixation isn't prevented by server-side generated IDs, so I think that's bogus. I would have thought that it was possible to be somewhat smart with the way you handled client-side generated IDs to prevent trivial DoS attacks.
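One possible reading of "somewhat smart" handling is to validate the shape of each client-supplied id and cap how many are tracked at once. The sketch below is an illustration of that idea only; `makeIdRegistry` and its limits are hypothetical, and a real server-side version would live in Java rather than JavaScript.

```javascript
// Hypothetical registry that accepts client-generated ids but bounds
// how much memory a flood of ids can consume.
function makeIdRegistry(maxIds, maxLength) {
    var ids = {};
    var count = 0;
    return {
        // Returns true if the id is tracked after the call, false if rejected.
        register: function (id) {
            if (typeof id !== "string" || id.length === 0 || id.length > maxLength) {
                return false; // wrong type or size
            }
            if (!/^[A-Za-z0-9]+$/.test(id)) {
                return false; // unexpected characters
            }
            if (Object.prototype.hasOwnProperty.call(ids, id)) {
                return true; // already tracked
            }
            if (count >= maxIds) {
                return false; // cap reached, refuse new ids
            }
            ids[id] = true;
            count++;
            return true;
        },
        // Stops tracking an id, e.g. when its upload has finished.
        release: function (id) {
            if (Object.prototype.hasOwnProperty.call(ids, id)) {
                delete ids[id];
                count--;
            }
        }
    };
}
```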

On the other hand there are considerable advantages in Client generated IDs:
- They are immune to HttpOnly cookie settings
- The client can always do the same thing, allowing the server to manage what it tracks, if it wants to track anything at all.

For the case of file upload with a stateless scriptsession manager, we could easily have the ID tracking done by the download manager only for IDs involved in a download, and then the minimum memory is used.

What do you think?

Joe.


On Thu, Jun 11, 2009 at 10:44 AM, Mike Wilson <mikewse@...> wrote:
I'd like to first settle if we should ship an alternative "id" algorithm in our codebase at all.
 
I.e., do we want to provide and maintain a separate client-side id algorithm for progress handling when we already have the scriptSessionId+batchId algorithm solving the same problem?
If I am the only one against providing the alternative algorithm, then no problem, let's go for it.
Joe: you've been doing work in this area before, can we hear your input on this?
 
Depending on this decision we can then continue the discussion on the looks of the API.
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: 11 June 2009 10:01
To: users@...
Subject: Re: [dwr-user] File upload progress

Firstly, I think I'll change UploadManager to be ProgressManager.

How's this for a solution?

Have a new flag in engine.js:
   dwr.engine._trustRandomForInvocationId

I realise random numbers are generated from the current time, so 2 browsers can get the same random number at the same time. If using an HttpSessionProgressManager there's no real chance of a clash though.

When generating an invocation id (clientside):
1. If (scriptSessionId != null) use scriptSessionId + batchId
2. If (scriptSessionId == null && trustRandomForInvocationId) use a random number for invocationId
3. Otherwise, make a serverside call to obtain a scriptSessionId, then use method 1.
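The three steps above can be sketched as a single function. This is an illustration under assumptions: `chooseInvocationId`, the `state` object, the "/" separator, and the synchronous `fetchScriptSessionId` stand-in for the server call are all hypothetical.

```javascript
// state: { scriptSessionId, batchId, trustRandomForInvocationId }
// fetchScriptSessionId: stand-in for the serverside call in step 3.
function chooseInvocationId(state, fetchScriptSessionId) {
    // Step 1: a scriptSessionId is already known.
    if (state.scriptSessionId != null) {
        return state.scriptSessionId + "/" + state.batchId;
    }
    // Step 2: no scriptSessionId, but random ids are trusted.
    if (state.trustRandomForInvocationId) {
        return String(Math.floor(Math.random() * 1e9));
    }
    // Step 3: ask the server, remember the answer, then apply step 1.
    state.scriptSessionId = fetchScriptSessionId();
    return state.scriptSessionId + "/" + state.batchId;
}
```

A real version of step 3 would be asynchronous, so the upload itself would have to wait for the round trip.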

An extension to this, a new method could be added to the ProgressManager interface.
   public boolean isTrustRandomForInvocationId()

This would return false for the global manager and true for HttpSessionProgressManager and ScriptSessionProgressManager.

We could then use this method when generating engine.js and set the default for the flag:
   dwr.engine._trustRandomForInvocationId = ${progressManager.trustRandomForInvocationId}

The developer can override this in javascript:
  dwr.engine.setTrustRandomForInvocationId(false);


2009/6/10 Mike Wilson <mikewse@...>
Lance wrote:

I was hoping that the invocationId generation technique did not care about which upload manager implementation was being used. It therefore needs to assume the worst (unique to the application). The invocationId is generated in javascript. This may include a serverside generated token (ie scriptSessionId). 
Ah, ok, I don't mind either way here, and now I understand your point about a "complete" invocationId. 
One could say that adding the scriptSessionId to the invocationId in client code is redundant when a ScriptSession-based progress manager is used (as invocationIds are scoped on the current script session) but it certainly doesn't do any harm. And as you say, it lets you choose any progress manager (although it would be natural to choose the ScriptSession-based manager when the invocationId is based on scriptSessionId). No worries there.
 
Do we agree to use the scriptSessionId-based algorithm as default, and have the user implement their own algorithm in case they want something different?
 
Best regards
Mike
2009/6/10 Mike Wilson <mikewse@...>
Lance wrote:

I also came up with a JSP taglib suggestion to avoid the initial request
Right, that's good for JSP users.
Please keep in mind that in all of my talk on the subject, I have meant that invocationId is an application-wide unique identifier for the invocation (i.e. scriptSessionId + batchId). I have seen Mike using it to mean batch id. I'd prefer to stick with my terminology if that's ok. 
I think we mean the same thing, I am just pointing out that depending on the server-side manager the supplied invocationId is working inside different-sized scopes, and therefore has different requirements on "how unique" we have to make it. For a session-based manager the "effective" invocationId is really JSESSIONID + the invocationId from engine.js.
 
Isn't it so, that when you say "application wide" you are assuming that you are using the application global upload manager, which doesn't add anything to the "effective invocationId"?
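The scoping argument can be illustrated with a small sketch. The scope names, the function `effectiveInvocationId`, and the "/" separator are assumptions; only the idea that each manager prefixes the client-supplied id with its own scope comes from the discussion.

```javascript
// Returns the key a manager would effectively use for a given scope.
// ids: { jsessionId, scriptSessionId, invocationId }
function effectiveInvocationId(scope, ids) {
    if (scope === "httpSession") {
        return ids.jsessionId + "/" + ids.invocationId;
    }
    if (scope === "scriptSession") {
        return ids.scriptSessionId + "/" + ids.invocationId;
    }
    // Application-global manager: nothing is added, so the
    // client-supplied id must be unique on its own.
    return ids.invocationId;
}
```

The narrower the scope the manager adds, the weaker the uniqueness the client-side id has to provide by itself.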
 
Best regards
Mike










Re: File upload progress

by Lance Java



2009/6/21 <mike@...>
It sounds like a good idea to expose the progress identity in some way. First though, I just realized there is something we need to figure out: should progress apply to a single call or to all calls in a batch?
I.e., with the following code:

dwr.engine.beginBatch();
remote.uploadFile(..., {progressHandler:a}); 
remote.uploadFile(..., {progressHandler:b}); 
dwr.engine.endBatch({progressHandler:c});
would we deliver individual progress to a and b, or just total progress to c?
I guess the individual version (a+b) is the most elegant, but how doable is that with file uploading, as we rely on Commons FileUpload?
When progress is done on call level the invocationId needs to contain callId in addition to batchId but that's not a problem.

I just maintain a list of handlers and call all of them with the progress.
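A rough sketch of that approach is shown below, assuming every handler receives the same progress object. `makeProgressDispatcher` is a hypothetical name, not the actual implementation.

```javascript
// Keeps a list of progress handlers and passes each update to all of them.
function makeProgressDispatcher() {
    var handlers = [];
    return {
        add: function (handler) {
            if (typeof handler === "function") {
                handlers.push(handler); // ignore missing handlers
            }
        },
        notify: function (progress) {
            for (var i = 0; i < handlers.length; i++) {
                handlers[i](progress);
            }
        }
    };
}
```

In the batch example, a, b and c would all be added to one dispatcher and would each see the total progress of the request.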
 
 
Anyway, once we know the above the next question is how to include the invocation/progress identity in WebContext:
 
Browser data               Server object  WebContext member
------------               -------------  -----------------
JSESSIONID                 HttpSession    HttpSession
scriptSessionId            ScriptSession  ScriptSession
batch                      CallBatch      String batchInvocationId?
batch.map["c"+callId+...]  Call           String callInvocationId?
 
The existing information in WebContext is exposed as objects and not ids, i.e. you get the actual ScriptSession object and not just the scriptSessionId. Looking at invocations, we do have internal server-side objects that correspond to calls and batches, but we are not exposing them today. We probably don't want to expose them just to be able to query about invocationIds, so putting the string ids directly on WebContext may be the best option, although that makes it somewhat inconsistent. Thoughts?
 
Best regards
Mike

I guess there's the option of

class Batch {
   Call[] calls;
   String invocationId;
   int batchId;
}

class Call {
   Method method;
   Object[] args;
   Object target;
   int callId;
}

webContext.getBatch()

I'm not sure if this information is helpful though.



From: Lance Java [mailto:lance.java@...]
Sent: den 19 juni 2009 10:43

To: users@...
Subject: Re: [dwr-user] File upload progress

That's an interesting thought.

On a side note... do you think I should add getInvocationId() to WebContext?
With this available, we'd be one step closer to supporting progress updates for long running tasks other than fileUploads.

2009/6/18 Mike Wilson <mikewse@...>
Lance wrote:

> Maybe this is what you suggested initially ? :-)
It sure was!!
Sorry about that... <:-S
Ok, so you want getProgress() to be a serverside call. Personally I don't care... I think most people will use the progressHandler approach. 
So do I, so let's make that first. An interesting use for the getProgress() version though is the following:
var future1 = remote.uploadFile(..., {errorHandler:...}); 
var future2 = remote.uploadFile(..., {errorHandler:...})
... 
setTimeout(function(){
    dwr.engine.beginBatch();
    future1.getProgress(function(pinfo){...}); 
    future2.getProgress(function(pinfo){...});
    dwr.engine.endBatch();
}, 1000);
Ie, user code can ask to get several progress remote calls packed into one request.
(Note that this whole thing is assuming a browser with >2 connections allowed to the server.)
Best regards
Mike 
2009/6/18 Mike Wilson <mikewse@...>
Lance wrote:

Ah... ok... future.getProgress() is a client side action and the progress gets updated by the progressHandler. I am happy to go ahead with this solution. 
No, I was thinking that getProgress() could be a low-level access to the progress API when the user wants to set up his own timer. I was assuming that this syntax:
remote.uploadFile(..., {errorHandler:..., progressHandler:...});
would imply that we keep asking for progress updates at regular intervals and call the supplied progressHandler?
The getProgress() construct would rather look like this:
var future = remote.uploadFile(..., {errorHandler:...});
...
future.getProgress(function(progressinfo){progressinfo.percentage...});
and ask for progress just once (and it should have a callback just like you point out for cancel() as it leads to a remote call). Maybe this is what you suggested initially ? :-)
Anyway, the progressHandler call option is the most important one here, I think. I just thought that we could implement the getProgress() one as well, as it seems pretty easy. The difference is that the former is called by a timer that we keep in DWR but the latter is triggered from user code.
 
Best regards
Mike
I think we can still pass a handler to cancel though since it invokes a remote call.
  future.cancel(function() {
     alert('Task cancelled on server');
  });

As for changing the progressHandler or callback in the middle of the call... I'm not going anywhere near this but I agree it's possible. 
Yes, I think there is no reason for us to do this, so let's skip it.
Cheers,
Lance.


2009/6/18 Mike Wilson <mikewse@...>
I think we are mixing two ways of specifying call options here, I'll try to describe what I mean:
 
Currently we specify all call options in the last call argument:
remote.uploadFile(..., {errorHandler:..., progressHandler:...});
These call options are input to the call, telling DWR how to process the call.
With the new returned "future" object:
var future = remote.uploadFile(...);
we now have a handle to everything supplied as input to the call, and its future output. In theory, this could include being able to examine, or change, all call options supplied in the actual call:
var future = remote.uploadFile(..., {errorHandler:..., progressHandler:...});
future.getErrorHandler()
future.getProgressHandler()
future.setProgressHandler(function...) // corresponds to your getProgress()
This could offer a totally new way of supplying call options, but on the other hand allows the user to change handlers in the middle of a call (which could mean more cases for us to handle) and can only be supported for a subset of call options (f ex not for the async or ordered flags). I think we should probably skip this kind of API for the time being.
 
OTOH, things that I think we should provide on the "future" object are methods that don't correspond to call options. The cancel() method is one example, and "my version" of a getProgress() method that just manually asks for the current progress once and returns that value is another:
var future = remote.uploadFile(..., {errorHandler:..., progressHandler:...});
...
var progressinfo = future.getProgress();
if (progressinfo.percentage < 20) future.cancel();
This could be used by user code that wants to implement their own progress checking without relying on our timer.
What do you think about this distinction?
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: den 18 juni 2009 10:53

To: users@...
Subject: Re: [dwr-user] File upload progress

It will be more like:

var future = remote.uploadFile(...);
future.getProgress(function(progress) {
   progress.percentage;
   progress.totalBytes; 
});

Cheers,
Lance

2009/6/18 Mike Wilson <mikewse@...>
My personal taste, and I believe more on the "JS style" of things, is to have less methods and more stuff as object properties, so I would go for a single method and an object with all needed info, thus:
 
    var future = remote.uploadFile(...);
    var progress = future.getProgress();
    ->
    progress.percentage
    progress.totalBytes
    etc...
 
future.cancel() sounds good too!
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: den 18 juni 2009 10:02

To: users@...
Subject: Re: [dwr-user] File upload progress

I like the sound of this and it's a decent first cut.

What object do think I should return from getProgress()? I could either return an object that can be used in all cases (percentage, totalBytes, currentBytes) or I could have two functions:
   getProgress() // returns percentage
   getBytesProgress() // returns totalBytes, currentBytes

Also, I will have a cancel() function on the returned object.

2009/6/17 Mike Wilson <mikewse@...>
So, I propose the following:
  1. Lance implements the progress function along these lines for RC2 (or later?):
    • client-side algorithm with Math.random() etc as described to generate invocationId
    • no logic for using custom invocationId
    • update dwr.engine.transport.send to return the batch (or some object wrapping the batch or batchId) for everything except XHR with async:false
    • add a getProgress() method to the returned object (batch or other) that will transparently use the saved invocationId on the batch
  2. Later (for RC3 or 3.0) I implement:
    • server-side entropy cookie
    • change engine.js scriptSessionId handling to use the client-side progress algorithm (including entropy cookie) and replace the current server-side solution
What do you all think?
 
Best regards
Mike


From: Mike Wilson [mailto:mikewse@...]
Sent: den 15 juni 2009 17:49 Subject: RE: [dwr-user] File upload progress

If doing (1) we could probably combine the following available sources to get more randomness:
  • Math.random()
  • window.location
  • document.cookie
  • new Date().getTime() executed at include of engine.js
  • new Date().getTime() executed at first call
  • checksum of serialized inbound data in first call (although this is pretty uninteresting if the first call is a poll ;-)
We could also make a "light" (no extra request) combination of version (1) and (4) by letting all DWR responses carry a small entropy cookie that we change for every response. This would put the entropy algorithm on the server and a page would take a snapshot of the value at load time to use for constructing its scriptSessionId that it will use throughout its own lifetime. Normally, the entropy cookie would be set before the first call as we would deliver it in the engine.js response as well.
In theory, it would be enough with a 10-digit string containing [0-9A-F] to serve 1000 unique seeds per second for 35 years.
 
I was thinking that maybe this cookie could also be used for your suggestion on the new CSRF protection mechanism, but I made some tests and at least IE (surprise) seems to do no locking and be very liberal about changing cookie values under your feet when you work with the same cookie in multiple windows. So this would need some more work.
 
Though, I think we may be close to a scriptSessionid solution with the above algorithm. And yes, let's see where we end up in the discussion. Maybe it's a 3.1/3.5/4.0 feature.
 
Best regards
Mike


From: joseph.walker@... [mailto:joseph.walker@...] On Behalf Of Joe Walker
Sent: den 13 juni 2009 09:25
To: users@...
Subject: Re: [dwr-user] File upload progress


I would vote for alternatives in the order 1, 3, 2 as you, Mike. Although I would temper that with the need to do a release. It feels like this maybe stepping into new features a bit.

This is my trail of how to create a crypto secure random # in JavaScript:

Version 1:
randomPassword = function(length) {
    length = length || 16;
    var chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    var pass = "";
    for (var x = 0; x < length; x++) {
        var charIndex = Math.floor(Math.random() * chars.length);
        pass += chars.charAt(charIndex);
    }
    return pass;
}

I have a feeling that Math.random() is decent on IE, but not generally trustworthy.
http://stackoverflow.com/questions/578700/how-trustworthy-is-javascripts-random-implementation-in-various-browsers/578714 certainly dislikes it.

Version 2:
window.crypto.random();
For once mozdev is up to date (https://developer.mozilla.org/en/JavaScript_crypto) I tried window.crypto.random(10); and it indeed is a function that dies instantly.

Version 3:
Clipperzlib is an AGPL, JavaScript PRNG http://sourceforge.net/projects/clipperzlib http://www.clipperz.com/open_source/javascript_crypto_library

Version 4:
We could even have a dwr "send me a random number function". ;-)

I would have thought that version 1 could be updated with some entropy as you noted Mike, without too much difficulty.

Joe.


On Fri, Jun 12, 2009 at 9:48 AM, Mike Wilson <mikewse@...> wrote:
That's true, but I find the principle important about not bloating DWR with multiple algorithms filling the same purpose.
 
We have the following alternatives:
  1. Use new client-generated ID for both scriptSessionId and progress.
  2. Use current server-generated ID for scriptSessionId and new client-generated ID for progress.
  3. Use current server-generated ID for both scriptSessionId and progress.
I understand that you Jose, and I think Lance, prefer (2)?
If possible to implement, I would choose (1) but if not, I lean towards (3).
Then again, if everybody else agrees on (2) I don't have a problem with that.
Joe: did you have any thoughts on how an algorithm for (1) could look? Maybe add some enthropy from the browser window's x/y and width/height :-).
 
Best regards
Mike


From: Jose Noheda [mailto:jose.noheda@...]
Sent: den 12 juni 2009 08:48

To: users@...
Subject: Re: [dwr-user] File upload progress

I'm all for a client side generated id for upload management only. I think we have deviated a lot from our original purpose

Regards

On Thu, Jun 11, 2009 at 11:13 PM, Joe Walker <joe@...> wrote:

I'm not convinced that we need server side IDs at all do we?

The reason I originally changed my code from client-side generated IDs to server-side generated ones was over 2 fears:
- ID spoofing: that there might be a way to session fixate someone (there's a good wikipedia article on it)
- Denial of Service: that a rogue client could flood the server with IDs

Traditional Http-Session fixation isn't prevented by server-side generated IDs, so I think that's bogus. I would have thought that it was possible to be somewhat smart with the way you handled client-side generated IDs to prevent trivial DoS attacks.

On the other hand there are considerable advantages in Client generated IDs:
- They are immune to HttpOnly cookie settings
- The client can always do the same thing, allowing the server to manage what it tracks, if it wants to track anything at all.

For the case of file upload with a stateless scriptsession manager, we could easily have the ID tracking done by the download manager only for IDs involved in a download, and then the minimum memory is used.

What do you think?

Joe.


On Thu, Jun 11, 2009 at 10:44 AM, Mike Wilson <mikewse@...> wrote:
I'd like to first settle if we should ship an alternative "id" algorithm in our codebase at all.
 
Ie, do we want to provide and maintain a separate client-side id algorithm for progress handling when we already have the scriptSessionId+batchId algorithm solving the same problem?
If I am the only one against providing the alternative algorithm, then no problem, let's go for it.
Joe: you've been doing work in this area before, can we hear your input on this?
 
Depending on this decision we can then continue the discussion on the looks of the API.
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: 11 June 2009 10:01
To: users@...
Subject: Re: [dwr-user] File upload progress

Firstly, I think I'll change UploadManager to be ProgressManager.

How's this for a solution.

Have a new flag in engine.js:
   dwr.engine._trustRandomForInvocationId

I realise random numbers are generated from the current time, so 2 browsers can get the same random number at the same time. If using an HttpSessionProgressManager there's no real chance of a clash though.

When generating an invocation id (clientside):
1. If (scriptSessionId != null) use scriptSessionId + batchId
2. If (scriptSessionId == null && trustRandomForInvocationId) use a random number for invocationId
3. Otherwise, make a serverside call to obtain a scriptSessionId, then use method 1.
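
The three steps above could be sketched roughly as follows. This is only an illustration: chooseInvocationId, the state object, the "-" separator and fetchScriptSessionId are hypothetical names and choices, not part of engine.js.

```javascript
// Hypothetical helper illustrating the three-step rule; not DWR API.
// state: { scriptSessionId, batchId, trustRandomForInvocationId }
// fetchScriptSessionId: assumed synchronous stand-in for the serverside call.
function chooseInvocationId(state, fetchScriptSessionId) {
  // Step 1: a server-issued scriptSessionId is already known.
  if (state.scriptSessionId != null) {
    return state.scriptSessionId + "-" + state.batchId;
  }
  // Step 2: no scriptSessionId yet, but the progress manager's scope is
  // narrow enough that a plain random number is acceptable.
  if (state.trustRandomForInvocationId) {
    return String(Math.floor(Math.random() * 1e9));
  }
  // Step 3: ask the server for a scriptSessionId, then apply step 1.
  state.scriptSessionId = fetchScriptSessionId();
  return state.scriptSessionId + "-" + state.batchId;
}
```

With this shape, only step 3 costs an extra request, and only when the flag is false and no scriptSessionId has been seen yet.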

An extension to this, a new method could be added to the ProgressManager interface.
   public boolean isTrustRandomForInvocationId()

This would return false for the global manager and true for HttpSessionProgressManager and ScriptSessionProgressManager.

We could then use this method when generating engine.js and set the default for the flag:
   dwr.engine._trustRandomForInvocationId = ${progressManager.trustRandomForInvocationId}

The developer can override this in javascript:
  dwr.engine.setTrustRandomForInvocationId(false);


2009/6/10 Mike Wilson <mikewse@...>
Lance wrote:

I was hoping that the invocationId generation technique did not care about which upload manager implementation was being used. It therefore needs to assume the worst (unique to the application). The invocationId is generated in javascript. This may include a serverside generated token (ie scriptSessionId). 
Ah, ok, I don't mind either way here, and now I understand your point about a "complete" invocationId. 
One could say that adding the scriptSessionId to the invocationId in client code is redundant when a ScriptSession-based progress manager is used (as invocationIds are scoped on the current script session) but it certainly doesn't do any harm. And as you say, it lets you choose any progress manager (although it would be natural to choose the ScriptSession-based manager when the invocationId is based on scriptSessionId). No worries there.
 
Do we agree to use the scriptSessionId-based algorithm as default, and have the user implement their own algorithm in case they want something different?
 
Best regards
Mike
2009/6/10 Mike Wilson <mikewse@...>
Lance wrote:

I also came up with a JSP taglib suggestion to avoid the initial request
Right, that's good for JSP users.
Please keep in mind that in all of my talk on the subject, I have meant that invocationId is an application-wide unique identifier for the invocation (i.e. scriptSessionId + batchId). I have seen Mike using it to mean batch id. I'd prefer to stick with my terminology if that's ok. 
I think we mean the same thing, I am just pointing out that depending on the server-side manager the supplied invocationId is working inside different-sized scopes, and therefore has different requirements on "how unique" we have to make it. For a session-based manager the "effective" invocationId is really JSESSIONID + the invocationId from engine.js.
 
Isn't it so, that when you say "application wide" you are assuming that you are using the application global upload manager, which doesn't add anything to the "effective invocationId"?
 
Best regards
Mike

Re: File upload progress

by Lance Java

Hmm... I think I responded too soon, before thinking your email through. You are considering the possibility of having different calls in a batch each having their own progress meter. This is very difficult to do for file uploads because you need to parse the request to get the parameter names, which are used to group parameters into calls. Hence you've done all the work before updating any progress.

For long running tasks however... each call in a batch could update a separate progress. In this case, calls would need to know their id. If we maintained a currentCall() on webContext, we could achieve this. I'm not sure I like the idea of a thread local that changes but am open to the idea.

Eg

webContext.getBatch() // determined after parsing the request into a batch
webContext.getCurrentCall() // this changes as DWR loops through the batch.

interface ProgressManager {
   public Progress getProgress(String invocationId, int callId);
   public void updateProgress(String invocationId, int callId, Progress progress);
   public void cancel(String invocationId, int callId);
}

I seem to remember that there is an option to run a batch's calls in parallel threads? I guess this still works as each thread has its own threadLocal.

2009/6/21 Lance Java <lance.java@...>


2009/6/21 <mike@...>

It sounds like a good idea to expose the progress identity in some way. First though, I just realized there is something we need to figure out: should progress apply to a single call or to all calls in a batch?
I e, with the following code:

dwr.engine.beginBatch();
remote.uploadFile(..., {progressHandler:a}); 
remote.uploadFile(..., {progressHandler:b}); 
dwr.engine.endBatch({progressHandler:c});
would we deliver individual progress to a and b, or just total progress to c?
I guess the individual version (a+b) is the most elegant, but how doable is that with file uploading, as we rely on Commons FileUpload?
When progress is done on call level the invocationId needs to contain callId in addition to batchId but that's not a problem.

I just maintain a list of handlers and call all of them with the progress.
 
 
Anyway, once we know the above the next question is how to include the invocation/progress identity in WebContext:
 
Browser data               Server object  WebContext member
------------               -------------  -----------------
JSESSIONID                 HttpSession    HttpSession
scriptSessionId            ScriptSession  ScriptSession
batch                      CallBatch      String batchInvocationId?
batch.map["c"+callId+...]  Call           String callInvocationId?
 
The existing information in WebContext is exposed as objects and not ids, i e you get the actual ScriptSession object and not just the scriptSessionId. Looking at invocations, we do have internal server-side objects that correspond to calls and batches, but we are not exposing them today. We probably don't want to expose them just to be able to query about invocationIds, so putting the string ids directly on WebContext may be the best option, although that makes it somewhat inconsistent. Thoughts?
 
Best regards
Mike

I guess there's the option of

class Batch {
   Call[] calls;
   String invocationId;
   int batchId;
}

class Call {
   Method method;
   Object[] args;
   Object target;
   int callId;
}

webContext.getBatch()

I'm not sure if this information is helpful though.



From: Lance Java [mailto:lance.java@...]
Sent: 19 June 2009 10:43

To: users@...
Subject: Re: [dwr-user] File upload progress

That's an interesting thought.

On a side note... do you think I should add getInvocationId() to WebContext?
With this available, we'd be one step closer to supporting progress updates for long running tasks other than fileUploads.

2009/6/18 Mike Wilson <mikewse@...>
Lance wrote:

> Maybe this is what you suggested initially ? :-)
It sure was!!
Sorry about that... <:-S
Ok, so you want getProgress() to be a serverside call. Personally I don't care... I think most people will use the progressHandler approach. 
So do I, so let's make that first. An interesting use for the getProgress() version though is the following:
var future1 = remote.uploadFile(..., {errorHandler:...}); 
var future2 = remote.uploadFile(..., {errorHandler:...});
... 
setTimeout(function(){
    dwr.engine.beginBatch();
    future1.getProgress(function(pinfo){...}); 
    future2.getProgress(function(pinfo){...});
    dwr.engine.endBatch();
}, 1000);
Ie, user code can ask to get several progress remote calls packed into one request.
(Note that this whole thing is assuming a browser with >2 connections allowed to the server.)
Best regards
Mike 
2009/6/18 Mike Wilson <mikewse@...>
Lance wrote:

Ah... ok... future.getProgress() is a client side action and the progress gets updated by the progressHandler. I am happy to go ahead with this solution. 
No, I was thinking that getProgress() could be a low-level access to the progress API when the user wants to set up his own timer. I was assuming that this syntax:
remote.uploadFile(..., {errorHandler:..., progressHandler:...});
would imply that we keep asking for progress updates at regular intervals and call the supplied progressHandler?
The getProgress() construct would rather look like this:
var future = remote.uploadFile(..., {errorHandler:...});
...
future.getProgress(function(progressinfo){progressinfo.percentage...});
and ask for progress just once (and it should have a callback just like you point out for cancel() as it leads to a remote call). Maybe this is what you suggested initially ? :-)
Anyway, the progressHandler call option is the most important one here, I think. I just thought that we could implement the getProgress() one as well, as it seems pretty easy. The difference is that the former is called by a timer that we keep in DWR but the latter is triggered from user code.
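
The difference between the two styles could look roughly like this. It is a sketch only: makeFuture, fetchProgress, the interval option and the injectable schedule function are hypothetical stand-ins, and fetchProgress represents whatever remote progress call DWR would end up making.

```javascript
// Hypothetical sketch of a "future" supporting both progress styles.
// fetchProgress(callback) stands in for the remote progress request.
// schedule(fn, ms) defaults to setTimeout and is injectable for testing.
function makeFuture(fetchProgress, options, schedule) {
  schedule = schedule || setTimeout;
  var done = false;
  var future = {
    // One-shot, user-triggered: ask once and pass the result to the callback.
    getProgress: function (callback) {
      fetchProgress(callback);
    },
    // Stops the DWR-owned timer (a real cancel would also tell the server).
    stopPolling: function () {
      done = true;
    }
  };
  // Timer-driven: only when a progressHandler call option was supplied.
  if (options && options.progressHandler) {
    var interval = options.interval || 1000;
    var tick = function () {
      if (done) return;
      fetchProgress(function (info) {
        options.progressHandler(info);
        if (info.percentage >= 100) {
          done = true; // finished, so stop asking
          return;
        }
        schedule(tick, interval);
      });
    };
    schedule(tick, interval);
  }
  return future;
}
```

Here the timer lives inside the library and stops itself at 100 percent, while getProgress() leaves all scheduling to user code.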
 
Best regards
Mike
I think we can still pass a handler to cancel though since it invokes a remote call.
  future.cancel(function() {
     alert('Task cancelled on server');
  });

As for changing the progressHandler or callback in the middle of the call... I'm not going anywhere near this but I agree it's possible. 
Yes, I think there is no reason for us to do this, so let's skip it.
Cheers,
Lance.


2009/6/18 Mike Wilson <mikewse@...>
I think we are mixing two ways of specifying call options here, I'll try to describe what I mean:
 
Currently we specify all call options in the last call argument:
remote.uploadFile(..., {errorHandler:..., progressHandler:...});
These call options are input to the call, telling DWR how to process the call.
With the new returned "future" object:
var future = remote.uploadFile(...);
we now have a handle to everything supplied as input to the call, and its future output. In theory, this could include being able to examine, or change, all call options supplied in the actual call:
var future = remote.uploadFile(..., {errorHandler:..., progressHandler:...});
future.getErrorHandler()
future.getProgressHandler()
future.setProgressHandler(function...) // corresponds to your getProgress()
This could offer a totally new way of supplying call options, but on the other hand allows the user to change handlers in the middle of a call (which could mean more cases for us to handle) and can only be supported for a subset of call options (f ex not for the async or ordered flags). I think we should probably skip this kind of API for the time being.
 
OTOH, things that I think we should provide on the "future" object are methods that don't correspond to call options. The cancel() method is one example, and "my version" of a getProgress() method that just manually asks for the current progress once and returns that value is another:
var future = remote.uploadFile(..., {errorHandler:..., progressHandler:...});
...
var progressinfo = future.getProgress();
if (progressinfo.percentage < 20) future.cancel();
This could be used by user code that wants to implement their own progress checking without relying on our timer.
What do you think about this distinction?
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: 18 June 2009 10:53

To: users@...
Subject: Re: [dwr-user] File upload progress

It will be more like:

var future = remote.uploadFile(...);
future.getProgress(function(progress) {
   progress.percentage;
   progress.totalBytes; 
});

Cheers,
Lance

2009/6/18 Mike Wilson <mikewse@...>
My personal taste, and I believe more in the "JS style" of things, is to have fewer methods and more stuff as object properties, so I would go for a single method and an object with all needed info, thus:
 
    var future = remote.uploadFile(...);
    var progress = future.getProgress();
    ->
    progress.percentage
    progress.totalBytes
    etc...
 
future.cancel() sounds good too!
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: 18 June 2009 10:02

To: users@...
Subject: Re: [dwr-user] File upload progress

I like the sound of this and it's a decent first cut.

What object do you think I should return from getProgress()? I could either return an object that can be used in all cases (percentage, totalBytes, currentBytes) or I could have two functions:
   getProgress() // returns percentage
   getBytesProgress() // returns totalBytes, currentBytes

Also, I will have a cancel() function on the returned object.

2009/6/17 Mike Wilson <mikewse@...>
So, I propose the following:
  1. Lance implements the progress function along these lines for RC2 (or later?):
    • client-side algorithm with Math.random() etc as described to generate invocationId
    • no logic for using custom invocationId
    • update dwr.engine.transport.send to return the batch (or some object wrapping the batch or batchId) for everything except XHR with async:false
    • add a getProgress() method to the returned object (batch or other) that will transparently use the saved invocationId on the batch
  2. Later (for RC3 or 3.0) I implement:
    • server-side entropy cookie
    • change engine.js scriptSessionId handling to use the client-side progress algorithm (including entropy cookie) and replace the current server-side solution
What do you all think?
 
Best regards
Mike


From: Mike Wilson [mailto:mikewse@...]
Sent: 15 June 2009 17:49
Subject: RE: [dwr-user] File upload progress

If doing (1) we could probably combine the following available sources to get more randomness:
  • Math.random()
  • window.location
  • document.cookie
  • new Date().getTime() executed at include of engine.js
  • new Date().getTime() executed at first call
  • checksum of serialized inbound data in first call (although this is pretty uninteresting if the first call is a poll ;-)
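
As a rough illustration of combining such sources, the values could be joined and run through a simple mixing hash. The sketch below uses an FNV-1a-style 32-bit hash over UTF-16 code units; mixEntropy is a hypothetical name, the choice of hash is an assumption, and the result is not cryptographically secure.

```javascript
// Hypothetical sketch: fold several weak entropy sources into one hex seed.
// FNV-1a-style mixing over UTF-16 code units; NOT cryptographically secure.
function mixEntropy(sources) {
  var hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  var text = sources.join("|");
  for (var i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  // Left-pad to a fixed width of 8 hex digits.
  return ("00000000" + hash.toString(16)).slice(-8);
}

// In a browser the sources from the list above might be passed like this:
// mixEntropy([Math.random(), window.location, document.cookie, new Date().getTime()]);
```

Mixing only spreads the available entropy around; it cannot add any, so the seed is no stronger than the least guessable source.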
We could also make a "light" (no extra request) combination of version (1) and (4) by letting all DWR responses carry a small entropy cookie that we change for every response. This would put the entropy algorithm on the server and a page would take a snapshot of the value at load time to use for constructing its scriptSessionId that it will use throughout its own lifetime. Normally, the entropy cookie would be set before the first call as we would deliver it in the engine.js response as well.
In theory, it would be enough with a 10-digit string containing [0-9A-F] to serve 1000 unique seeds per second for 35 years.
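
The arithmetic behind that estimate can be checked quickly. The snippet assumes a year of 365.25 days; the variable names are only for illustration.

```javascript
// Back-of-envelope check of the "10 hex digits" estimate.
var seeds = Math.pow(16, 10);               // number of distinct 10-digit hex strings
var seedsPerSecond = 1000;                  // consumption rate from the estimate above
var secondsPerYear = 365.25 * 24 * 60 * 60; // assumes 365.25-day years
var years = seeds / seedsPerSecond / secondsPerYear;
// years comes out a little under 35
```

So 10 hex digits give about 1.1 * 10^12 values, which lasts slightly less than 35 years at 1000 seeds per second.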
 
I was thinking that maybe this cookie could also be used for your suggestion on the new CSRF protection mechanism, but I made some tests and at least IE (surprise) seems to do no locking and be very liberal about changing cookie values under your feet when you work with the same cookie in multiple windows. So this would need some more work.
 
Though, I think we may be close to a scriptSessionId solution with the above algorithm. And yes, let's see where we end up in the discussion. Maybe it's a 3.1/3.5/4.0 feature.
 
Best regards
Mike


From: joseph.walker@... [mailto:joseph.walker@...] On Behalf Of Joe Walker
Sent: 13 June 2009 09:25
To: users@...
Subject: Re: [dwr-user] File upload progress


I would vote for alternatives in the order 1, 3, 2 as you, Mike. Although I would temper that with the need to do a release. It feels like this may be stepping into new features a bit.

This is my trail of how to create a crypto secure random # in JavaScript:

Version 1:
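// Builds a random string of the given length (default 16) from an
// alphanumeric alphabet. Math.random() is not cryptographically secure.
var randomPassword = function(length) {
    length = length || 16;
    var chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
    var pass = "";
    for (var x = 0; x < length; x++) {
        // Pick one character uniformly from the alphabet.
        var charIndex = Math.floor(Math.random() * chars.length);
        pass += chars.charAt(charIndex);
    }
    return pass;
};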

I have a feeling that Math.random() is decent on IE, but not generally trustworthy.
http://stackoverflow.com/questions/578700/how-trustworthy-is-javascripts-random-implementation-in-various-browsers/578714 certainly dislikes it.

Version 2:
window.crypto.random();
For once mozdev is up to date (https://developer.mozilla.org/en/JavaScript_crypto). I tried window.crypto.random(10); and it is indeed a function that dies instantly.

Version 3:
Clipperzlib is an AGPL-licensed JavaScript PRNG: http://sourceforge.net/projects/clipperzlib http://www.clipperz.com/open_source/javascript_crypto_library

Version 4:
We could even have a dwr "send me a random number function". ;-)

I would have thought that version 1 could be updated with some entropy as you noted Mike, without too much difficulty.

Joe.


RE: File upload progress

by mikewse

Exactly, there is a mismatch here between long-running tasks that probably want to report progress per task ("call") and file upload that wants to report for the whole batch.
 
We also don't really know what kind of progress (call- or batch-oriented) the UI user code wants? Maybe we should provide both?
When the progress manager is batch-oriented (file upload) we can emulate progress on call level by forwarding the batch percentage to all call handlers. And when the progress manager is call-oriented we can emulate progress on batch level by summing up the progress from the individual calls.
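
Both emulations are small. The sketch below uses hypothetical function names and takes the unweighted mean of the call percentages as one possible reading of "summing up"; weighting by call size would be another.

```javascript
// Hypothetical sketch of the two emulations; names are illustrative only.

// Batch-oriented source (file upload): every call handler simply receives
// the batch percentage.
function forwardBatchToCalls(batchPercentage, callHandlers) {
  callHandlers.forEach(function (handler) {
    handler(batchPercentage);
  });
}

// Call-oriented source (long-running tasks): batch progress is taken as the
// unweighted mean of the per-call percentages (an assumption).
function combineCallsIntoBatch(callPercentages) {
  if (callPercentages.length === 0) return 0;
  var sum = callPercentages.reduce(function (a, b) { return a + b; }, 0);
  return sum / callPercentages.length;
}
```

For example, two calls at 100 and 50 percent would report 75 percent for the batch under this rule.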
 
Another question is: should we have separate progress for inbound and outbound data? In theory, our
remote.uploadFile(..., function(reply){...});
could return something from a long-running task that takes longer than the upload. Then it would be confusing to show 0-100 just for the upload, and nothing for the download, but it would also be confusing to always show 0-50 or something for the upload.
Maybe progress info should have two phases (send/receive, upload/download, inbound/outbound etc) and go from 0 to 100 twice?
 
Best regards
Mike


Re: File upload progress

by Lance Java

hmm... this is quite a can of worms we've opened!

So... worst case scenario is a batch with multiple calls. Each call does the following:
1. upload some large files
2. perform some long running task
3. return a list of large files

For 1, we can only update progress per batch
For 2, we can update progress per call (requires a pretty complex algorithm I described in a previous email)
For 3, we can update progress per call or per batch if we know the size of the item we're downloading using some fancy InputStream wrappers.

Stop me if I'm going too far now!

On the client side we could have:
dwr.engine.beginBatch();
remote.uploadFile(file1, {
   callProgressHandler: function(progressType, progress) { ... }
});
remote.uploadFile(file2, {
   callProgressHandler: function(progressType, progress) { ... }
});
dwr.engine.endBatch({
   batchProgressHandler: function(progressType, progress) { ... }
});

where progressType is one of 'inbound', 'processing', 'outbound'

batchProgressHandler will receive notifications of 'inbound' and 'outbound'
callProgressHandler will receive notifications of 'processing' and 'outbound'
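
The routing described above could be sketched like this. dispatchProgress is a hypothetical name and nothing here is existing engine.js code; it only shows which handlers would see which progressType.

```javascript
// Hypothetical sketch of routing a progress notification by progressType.
// batchHandler: the batchProgressHandler given to endBatch (may be absent).
// callHandlers: the callProgressHandlers of the calls in the batch.
function dispatchProgress(progressType, progress, batchHandler, callHandlers) {
  // The batch handler hears about data moving in either direction.
  if (batchHandler && (progressType === "inbound" || progressType === "outbound")) {
    batchHandler(progressType, progress);
  }
  // Call handlers hear about their task running and their reply going out.
  if (progressType === "processing" || progressType === "outbound") {
    callHandlers.forEach(function (handler) {
      handler(progressType, progress);
    });
  }
}
```

Note that 'outbound' reaches both kinds of handler, while 'inbound' never reaches a call handler because calls cannot be told apart until the upload has been parsed.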


2009/6/22 Mike Wilson <mikewse@...>
Exactly, there is a mismatch here between long-running tasks that probably want to report progress per task ("call") and file upload that wants to report for the whole batch.
 
We also don't really know what kind of progress (call- or batch-oriented) the UI user code wants? Maybe we should provide both?
When the progress manager is batch-oriented (file upload) we can emulate progress on call level by forwarding the batch percentage to all call handlers. And when the progress manager is call-oriented we can emulate progress on batch level by summing up the progress from the individual calls.
 
Another question is: should we have separate progress for inbound and outbound data? In theory, our
remote.uploadFile(..., function(reply){...});
could return something from a long-running task that takes longer time than the upload. Then it would be confusing to show 0-100 just for the upload, and nothing for the download, but it would also be confusing to always show 0-50 or something for the upload.
Maybe progress info should have two phases (send/receive, upload/download, inbound/outbound etc) and go from 0 to 100 twice?
 
Best regards
Mike


From: Lance Java [mailto:lance.java@...]
Sent: den 21 juni 2009 22:34

To: users@...
Subject: Re: [dwr-user] File upload progress

Hmm... I think I responded too soon, before thinking your email through. You are considering the possibility of having different calls in a batch each having their own progress meter. This is very difficult to do for file uploads because you need to parse the request to get the parameter names, which are used to group parameters into calls. Hence you've done all the work before updating any progress.

For long running tasks however... each call in a batch could update a separate progress. In this case, calls would need to know their id. If we maintained a currentCall() on webContext, we could achieve this. I'm not sure I like the idea of a thread local that changes but am open to the idea.

Eg

webContext.getBatch() // determined after parsing the request into a batch
webContext.getCurrentCall() // this changes as DWR loops through the batch.

interface ProgressManager {
   public Progress getProgress(String invocationId, int callId);
   public void updateProgress(String invocationId, int callId, Progress progress);
   public void cancel(String invocationId, int callId);
}

I seem to remember that there is an option to run a batch's calls in parallel threads? I guess this still works, as each thread has its own thread local.

2009/6/21 Lance Java <lance.java@...>


2009/6/21 <mike@...>

It sounds like a good idea to expose the progress identity in some way. First though, I just realized there is something we need to figure out: should progress apply to a single call or to all calls in a batch?
I.e., with the following code:

dwr.engine.beginBatch();
remote.uploadFile(..., {progressHandler:a}); 
remote.uploadFile(..., {progressHandler:b}); 
dwr.engine.endBatch({progressHandler:c});
would we deliver individual progress to a and b, or just total progress to c?
I guess the individual version (a+b) is the most elegant, but how doable is that with file uploading, as we rely on Commons FileUpload?
When progress is done on call level the invocationId needs to contain callId in addition to batchId but that's not a problem.

I just maintain a list of handlers and call all of them with the progress.
 
 
Anyway, once we know the above the next question is how to include the invocation/progress identity in WebContext:
 
Browser data               Server object  WebContext member
------------               -------------  -----------------
JSESSIONID                 HttpSession    HttpSession
scriptSessionId            ScriptSession  ScriptSession
batch                      CallBatch      String batchInvocationId?
batch.map["c"+callId+...]  Call           String callInvocationId?
 
The existing information in WebContext is exposed as objects and not ids, i.e. you get the actual ScriptSession object and not just the scriptSessionId. Looking at invocations, we do have internal server-side objects that correspond to calls and batches, but we are not exposing them today. We probably don't want to expose them just to be able to query about invocationIds, so putting the string ids directly on WebContext may be the best option, although that makes it somewhat inconsistent. Thoughts?
 
Best regards
Mike

I guess there's the option of

class Batch {
   Call[] calls;
   String invocationId;
   int batchId;
}

class Call {
   Method method;
   Object[] args;
   Object target;
   int callId;
}

webContext.getBatch()

I'm not sure if this information is helpful though.


RE: File upload progress

by mikewse

Sorry about the delay Lance, I haven't had any mailing list time recently... :-(
 
Lance wrote:

hmm... this is quite a can of worms we've opened!
Sort of I guess :-S. I'm not at all saying we should implement everything in this area now, but it would be nice to design enough with the future in mind so we can have stable APIs when we do.
So... worst case scenario is a batch with multiple calls. Each call does the following:
1. upload some large files
2. perform some long running task
3. return a list of large files

For 1, we can only update progress per batch 
Right. I guess we could do a "best-effort" approach for multi-call batches that detects when the upload moves into the next call's data, quickly goes to 100% on the current call, and then uses the remaining received/total data statistics for that next call (until this shifts into the next-next call etc).
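
As a rough sketch of that best-effort idea, assuming we can record how many bytes had arrived each time the upload is seen to enter a new call's data: estimateCallProgress and its parameters are hypothetical names for illustration and do not exist in DWR or Commons FileUpload.

```javascript
// Hypothetical sketch for illustration; not DWR or Commons FileUpload API.
//
// callStartOffsets[i] is the number of bytes that had been received when
// the upload was first seen to enter call i's data (so entry 0 is 0).
// Only calls that have already been reached have an entry.
function estimateCallProgress(callStartOffsets, bytesReceived, totalBytes, callCount) {
  var current = callStartOffsets.length - 1; // call currently being received
  var result = [];
  for (var i = 0; i < callCount; i++) {
    if (i < current) {
      // The upload has moved past this call, so it jumps to 100%.
      result.push(100);
    } else if (i === current) {
      // Measure the current call against all data still outstanding
      // when it started, since its own size is not known in advance.
      var start = callStartOffsets[current];
      var remaining = totalBytes - start;
      result.push(remaining > 0 ? ((bytesReceived - start) / remaining) * 100 : 100);
    } else {
      // Not reached yet.
      result.push(0);
    }
  }
  return result;
}
```

With two calls and 1000 bytes in total, 250 bytes received while still in the first call gives 25% and 0%. Once the second call is seen to start at byte 400 and 700 bytes have arrived, the first call reads 100% and the second 50%.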
For 2, we can update progress per call (requires a pretty complex algorithm I described in a previous email)
I was hoping that we, for now, could invent an API that lets user code report progress for its long-running tasks, and that the "complex algorithm" could later be installed as a fallback implementation that reports to this same API? Then it could be added transparently later.
For 3, we can update progress per call or per batch if we know the size of the item we're downloading using some fancy InputStream wrappers. 
I'm not sure on how much progress reporting we should do for actual file downloads (the ones where DWR generates a temporary path, returns the file path in the Ajax reply, and user code then does some kind of DOM/BOM update to make the browser load the file).
 
From our POV I guess we could say that our progress is at 100% once we return the Ajax reply, and we shouldn't concern ourselves with what user code does with these files once handed over to it. Also, when there is a file download in DOM/BOM the browser usually gives some feedback through its own progress bar or similar. But I agree it is probably quite possible to include this in our progress handling, so why not? We have control over the file download servlet, so it shouldn't be too hard to track how many bytes have been written to the response.
 
Then of course there are also the alternatives:
1b. upload some large data
3b. download some large data
that deserve progress handling as well.
Stop me if I'm going too far now!
Not at all, it's refreshing to look at all angles of this topic :-)
On the client side we could have:
dwr.engine.beginBatch();
remote.uploadFile(file1, {
   // proposed option: progress for this call only
   callProgressHandler: function(progressType, progress) { ... }
});
remote.uploadFile(file2, {
   callProgressHandler: function(progressType, progress) { ... }
});
dwr.engine.endBatch({
   // proposed option: progress for the batch as a whole
   batchProgressHandler: function(progressType, progress) { ... }
});

where progressType is one of 'inbound', 'processing' or 'outbound'.

batchProgressHandler will receive notifications of 'inbound' and 'outbound'
callProgressHandler will receive notifications of 'processing' and 'outbound' 
Nice addition with the "processing" stage.
With the emulation algorithms to convert between call and batch progress I mentioned previously, I think we could have inbound/processing/outbound phases both for call and batch progress. Currently we load all calls before starting processing, and we finish all calls before starting to send the response, so there would be a clear processing phase for the batch also. Even if that is optimized a bit we can say that processing is everything between inbound and outbound phases for the batch.
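
To illustrate, here is a minimal sketch of a handler that treats the three phases the same way whether it is attached to a call or to a batch. It assumes the (progressType, progress) signature proposed in this thread; createPhaseTracker and snapshot are hypothetical names, not DWR API.

```javascript
// Hypothetical sketch for illustration; not DWR API. The handler
// signature follows the (progressType, progress) proposal in this thread.
var PHASES = ['inbound', 'processing', 'outbound'];

// Each phase runs 0-100 on its own. Once a later phase reports,
// all earlier phases are treated as complete.
function createPhaseTracker() {
  var percents = { inbound: 0, processing: 0, outbound: 0 };
  return {
    handler: function (progressType, progress) {
      var index = PHASES.indexOf(progressType);
      if (index === -1) {
        throw new Error('Unknown progressType: ' + progressType);
      }
      for (var i = 0; i < index; i++) {
        percents[PHASES[i]] = 100;
      }
      percents[progressType] = progress;
    },
    // Returns a copy of the current percentage for every phase.
    snapshot: function () {
      return {
        inbound: percents.inbound,
        processing: percents.processing,
        outbound: percents.outbound
      };
    }
  };
}
```

The tracker's handler function could be passed as the progress option of a single call or of endBatch alike, which is the point of giving both levels the same phases.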
 
For the option naming I would prefer to go with just "progressHandler" in line with other handlers (e.g. exceptionHandler is not named callExceptionHandler). The need to differentiate between the two handler versions becomes less important when they can have the same semantics, i.e. handle the same phases, as just mentioned above.
 
Best regards
Mike 
