S3ServiceMulti vs Home-brewed

I have been using S3ServiceMulti, but I am thinking about changing to a home-brewed multi-threading uploader.

Specifically, I have a set of threads that generate content I want to upload to S3. Right now my code dumps that content to /mnt on my EC2 machine, and only after all the threads have completed does it make a simple S3ServiceMulti call that uploads everything in /mnt.

The problem is that much of that content could already have been uploading while the threads were still running. Since I know there will never be more than 300 objects, this seems like a perfect candidate for a simple ThreadPoolExecutor backed by a LinkedBlockingQueue: I would submit one Runnable per file, and each Runnable would upload its file using S3Service.

My only concern is that S3ServiceMulti might do some extra "magic" that makes batch uploading more efficient than my code will be. Perhaps it uses HTTP keep-alive connections that my code wouldn't?

I'm not sure whether this is a real concern, which is why I'm asking here. I did look at S3ServiceMulti.CreateObjectRunnable, which appears to simply call S3Service.putObject(), so I think I'm OK, but I wanted to ask.

Patrick

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
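
The approach Patrick describes (upload each file as soon as it is ready, via a ThreadPoolExecutor backed by a LinkedBlockingQueue) might look roughly like the minimal sketch below. It assumes a hypothetical FileUploader interface standing in for the real per-object S3 call (for example a putObject() call on a RestS3Service), so it compiles without JetS3t; the ImmediateUploader class and its methods are likewise illustrative names, not part of JetS3t.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Sketch: upload each file as soon as a producer thread finishes writing it,
// instead of waiting for all producers and then running one batch upload.
class ImmediateUploader {

    // Hypothetical stand-in for the real per-object upload (e.g. a putObject() call);
    // this is NOT a JetS3t type.
    interface FileUploader {
        void upload(File file) throws Exception;
    }

    private final FileUploader uploader;
    private final ThreadPoolExecutor executor;
    private final List<Future<?>> pending = new ArrayList<>();

    ImmediateUploader(FileUploader uploader, int threads) {
        this.uploader = uploader;
        // Fixed-size pool (core == max); work waiting for a thread sits in an unbounded LinkedBlockingQueue.
        this.executor = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Safe to call from any producer thread as soon as one file is ready.
    synchronized void submit(File file) {
        // A Callable (rather than a bare Runnable) lets an upload failure surface via Future.get().
        pending.add(executor.submit(() -> {
            uploader.upload(file);
            return null;
        }));
    }

    // Call once all producers have finished. Waits for every upload in submission order and
    // throws ExecutionException wrapping the first failure found. shutdown() lets queued
    // uploads finish but rejects any new submissions.
    void awaitAll() throws InterruptedException, ExecutionException {
        List<Future<?>> snapshot;
        synchronized (this) {
            snapshot = new ArrayList<>(pending);
        }
        try {
            for (Future<?> f : snapshot) {
                f.get();
            }
        } finally {
            executor.shutdown();
        }
    }
}
```

Each producer thread calls submit() right after writing its file, and the main thread calls awaitAll() once the producers are done. Submitting Callables instead of plain Runnables is a design choice here: an upload failure resurfaces from Future.get() rather than being lost inside a worker thread.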
Re: S3ServiceMulti vs Home-brewed

Hi Patrick,

A lot of the HTTP-level smarts exhibited by JetS3t come from the underlying HttpClient library, including things like connection pipelining. The REST implementation always uses a MultiThreadedHttpConnectionManager to manage a pool of connections, so your own multi-threaded code can take advantage of this without extra work.

To take full advantage of HttpClient's connection management, simply make sure that you use a RestS3Service instance that is initialised with HTTP socket, connection, and timeout settings appropriate for your use case. Check the RestS3Service constructor to see which HttpClient settings are applied, and add to or adjust them as necessary.

The most important thing is to never run more threads than there are available connections. That limit is set by the "httpclient.max-connections" property in jets3t.properties, which is applied via HttpConnectionManagerParams#setMaxConnectionsPerHost. If you attempt to use more connections than this maximum, the HttpClient connection manager can deadlock. From your email, it sounds like setting this maximum to 300 will work for your situation.

Hope this helps,
James

---
http://www.jamesmurty.com

On Tue, Nov 25, 2008 at 1:02 AM, Patrick Lightbody <patrick@...> wrote:
> I have been using S3ServiceMulti, but I am thinking about changing to
> a home-brewed multi-threading uploader.
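
James's sizing rule (never run more upload threads than pooled HTTP connections) can be enforced when the thread pool is created. The sketch below is a hedged illustration, not JetS3t code: it reads the "httpclient.max-connections" key mentioned above from jets3t.properties-style settings using java.util.Properties and caps the thread count at that value. The ConnectionBoundedPool class and the fallback of 4 used when the key is absent are assumptions for illustration; the fallback is not JetS3t's documented default.

```java
import java.util.Properties;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: keeps the upload thread count within the HTTP connection limit.
class ConnectionBoundedPool {

    // Property key named in the thread; it sizes the HttpClient connection pool.
    static final String MAX_CONNECTIONS_KEY = "httpclient.max-connections";

    // Assumed fallback when the key is missing; NOT taken from JetS3t's defaults.
    static final int ASSUMED_FALLBACK = 4;

    // Reads the configured connection limit from jets3t.properties-style settings.
    static int maxConnections(Properties props) {
        String raw = props.getProperty(MAX_CONNECTIONS_KEY);
        return raw == null ? ASSUMED_FALLBACK : Integer.parseInt(raw.trim());
    }

    // Never run more upload threads than there are pooled connections (and always at least one).
    static int uploadThreads(int requested, Properties props) {
        return Math.max(1, Math.min(requested, maxConnections(props)));
    }

    // Fixed-size pool that respects the connection limit; surplus uploads wait in the queue.
    static ThreadPoolExecutor newUploadPool(int requested, Properties props) {
        int threads = uploadThreads(requested, props);
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }
}
```

With httpclient.max-connections set to 300, as James suggests, a request for 300 threads gets all 300; with a smaller limit the pool shrinks to match, and extra uploads wait in the queue instead of competing for connections. Loading the real jets3t.properties file (for example with Properties.load()) is left out to keep the sketch self-contained.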
Re: S3ServiceMulti vs Home-brewed

James,

Thanks for the info. I think I'm all set then.

Patrick

On Mon, Nov 24, 2008 at 2:43 PM, James Murty <jmurty@...> wrote:
> Hi Patrick,
> [...]