Best way to test if object exists?

I've just started working with jets3t/s3 and have a couple of elementary questions about best practices. Any guidance would be appreciated.

(i) What's the most efficient way to determine if an object with a particular key exists in a particular bucket? I'm going to be doing this a lot, so I need to find the fastest method. One way is just to use S3Service.listObjects(bucket) (and cache the results). But of course this could lead to problems if I have more than one application accessing the bucket (I'd prefer to avoid having to implement locks). The only other way I've come up with is to use S3Service.getObjectDetails(). E.g.

    S3Service s3RestService;

    public boolean objectExists(S3Bucket bucket, String key)
            throws S3ServiceException {
        try {
            S3Object sobj = s3RestService.getObjectDetails(bucket, key);
            return true;
        } catch (S3ServiceException x) {
            return false;
        }
    }

^ Not very nice. Is there a better way of doing this?

(ii) Re uploading files: I initially tried using putObject() in S3Service, but ran into problems with largish objects. I'm currently using methods in S3ServiceMulti, which I gather breaks large files into chunks. I don't really understand how this works, and I'm wondering whether I'm using this package correctly and whether it is the right choice. Currently I'm uploading files with

    S3ServiceMulti mService;

    public void uploadFile(S3Bucket bucket, String key, File file)
            throws Exception {
        S3Object sobj = ObjectUtils.createObjectForUpload(key, file, null, false);
        mService.putObjects(bucket, new S3Object[] { sobj });
        throwError();
    }

Is this reasonable, or is there a better way to upload a single file? (My files range in size from 50 KB to ~100 MB.)

It appears that sending several files at once is much faster. E.g. using

    public void uploadFiles(S3Bucket bucket, String[] keys, File[] files)
            throws Exception {
        S3Object[] sobjs = new S3Object[files.length];
        for (int i = 0; i < files.length; i++) {
            sobjs[i] = ObjectUtils.createObjectForUpload(keys[i], files[i], null, GZIP);
        }
        mService.putObjects(bucket, sobjs);
        throwError();
    }

seems to be considerably faster than sending one at a time. Does this make sense?

(iii) Is there a way of creating an S3Object from a Java object, rather than from a File? My data consists of Java objects which I want to store in S3 in serialized form. At present, I'm writing them (serialized to byte arrays) to a ramdisk and then giving the File to createObjectForUpload(), but it would be nice to cut out the middleman. Is there a way of doing this without modifying jets3t?

thanks,
Barnet Wagman

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@...
For additional commands, e-mail: users-help@...
Re: Best way to test if object exists?

Hi Barnet,

(i)
For testing the existence of an object in S3, the code you are currently using (with the getObjectDetails method) is the best way. The code's a bit messy, but it's the only way the toolkit provides to do this at present.

I should warn you that you cannot (or should not) rely on this test to give you accurate results all the time, due to the "eventual consistency" model of S3 and the other AWS services. Depending on which of the S3 data centers or servers your request hits, you may get outdated information about an object. For example, it may look like an object exists when it has already been deleted, or vice versa. Generally this won't be too much of an issue as long as you give the service a few seconds to sort itself out, but it's something you should bear in mind.

(ii)
For uploading files, the putObject method should work, but you will be far better off uploading files in parallel with the S3ServiceMulti class. This class does not break files into chunks; it just uploads multiple objects at the same time. You can control how many simultaneous HTTP/S connections this class can use for uploads and downloads by setting the "s3service.max-thread-count" property in configs/jets3t.properties. If you have the bandwidth for it, I would recommend setting this value to 50 or higher (it's only 10 by default, to be reliable for people with slow DSL connections). Increasing the number of connection threads will have a dramatic impact if you are transferring many small files.

(iii)
JetS3t doesn't include a convenient way to serialize Java objects, but if you write the objects to a byte array you can upload them easily by wrapping the array in a ByteArrayInputStream and providing this to the S3 object like the following (example code only, it may not compile):

    S3Object o = new S3Object(objectName);
    o.setContentLength(byteArray.length);
    o.setDataInputStream(new ByteArrayInputStream(byteArray));

Hope this helps,
James

On Fri, May 2, 2008 at 4:53 AM, Barnet Wagman <b.wagman@...> wrote:
> I've just started working with jets3t/s3 and have a couple elementary questions about best practices. Any guidance would be appreciated.

--
http://www.jamesmurty.com
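One refinement to the getObjectDetails approach, shown as a plain-Java sketch so it runs without JetS3t: the catch-all in the original code also answers "does not exist" when the real problem is a network, permission or server error. The sketch treats only a not-found response (HTTP 404) as absence and rethrows everything else. `ObjectStore`, `StoreException` and the in-memory store are hypothetical stand-ins, not JetS3t classes; with JetS3t you would have to inspect the S3ServiceException for whatever error details your version exposes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical exception carrying an HTTP-style status code (not a JetS3t class).
// Unchecked here only to keep the sketch short.
class StoreException extends RuntimeException {
    final int statusCode;

    StoreException(int statusCode, String message) {
        super(message);
        this.statusCode = statusCode;
    }
}

// Hypothetical stand-in for a metadata-only lookup such as getObjectDetails().
interface ObjectStore {
    Map<String, String> getObjectDetails(String bucket, String key);
}

class ExistenceCheck {
    // Returns false only for "not found"; any other failure propagates.
    static boolean objectExists(ObjectStore store, String bucket, String key) {
        try {
            store.getObjectDetails(bucket, key);
            return true;
        } catch (StoreException e) {
            if (e.statusCode == 404) {
                return false; // the object genuinely was not found
            }
            throw e; // network, permission or server errors do not mean "absent"
        }
    }

    // In-memory fake for trying the logic out; bucket "broken" simulates a server error.
    static ObjectStore inMemoryStore(Map<String, Map<String, String>> objects) {
        return (bucket, key) -> {
            if (bucket.equals("broken")) {
                throw new StoreException(500, "InternalError");
            }
            Map<String, String> details = objects.get(bucket + "/" + key);
            if (details == null) {
                throw new StoreException(404, "NoSuchKey");
            }
            return details;
        };
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> objects = new HashMap<>();
        objects.put("my-bucket/data/a.bin", Map.of("Content-Length", "51200"));
        ObjectStore store = inMemoryStore(objects);
        System.out.println(objectExists(store, "my-bucket", "data/a.bin")); // prints true
        System.out.println(objectExists(store, "my-bucket", "data/b.bin")); // prints false
    }
}
```

This does not remove the eventual-consistency caveat above: a 404 (or a successful lookup) can still be briefly out of date.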
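To illustrate why parallel uploads help with many objects, here is a stdlib-only sketch of the general idea behind point (ii): a fixed-size thread pool, whose size plays the role of s3service.max-thread-count, uploads several whole objects concurrently without splitting any of them into chunks. The `Uploader` interface is hypothetical, and this is not a description of how S3ServiceMulti is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelUpload {
    // Hypothetical single-object upload; returns the key it stored.
    interface Uploader {
        String upload(String key, byte[] data) throws Exception;
    }

    // Uploads each payload whole (no chunking), running at most maxThreads
    // uploads at once; maxThreads plays the role of s3service.max-thread-count.
    static List<String> uploadAll(Uploader uploader, List<String> keys,
                                  List<byte[]> payloads, int maxThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < keys.size(); i++) {
                String key = keys.get(i);
                byte[] data = payloads.get(i);
                futures.add(pool.submit(() -> uploader.upload(key, data)));
            }
            List<String> uploaded = new ArrayList<>();
            for (Future<String> f : futures) {
                uploaded.add(f.get()); // waits; a failed upload surfaces as ExecutionException
            }
            return uploaded;
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps the returned keys in the same order as the input, even though the uploads themselves may finish in any order. With many small objects, most of each upload's time is connection and request overhead, which is exactly what running several at once overlaps.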
Re: Best way to test if object exists?

Just a tip for getting the byte array:

    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ObjectOutput out = new ObjectOutputStream(bos);
    out.writeObject(object);
    out.close(); // flushes buffered data, so call this before toByteArray()
    byte[] byteArray = bos.toByteArray();

On Thu, May 1, 2008 at 5:55 PM, James Murty <jmurty@...> wrote:
> Hi Barnet,
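Building on Travis's tip, a minimal self-contained sketch of the full round trip with only java.io: serialize an object to a byte array (the array you would wrap in a ByteArrayInputStream for the S3Object, as in James's snippet) and deserialize it again after download. The `Sample` class is a hypothetical payload; try-with-resources replaces the explicit close().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationRoundTrip {
    // Hypothetical payload type; any Serializable object works the same way.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int[] values;

        Sample(String name, int[] values) {
            this.name = name;
            this.values = values;
        }
    }

    // Serialize to bytes. Closing the ObjectOutputStream flushes its buffer,
    // so toByteArray() is called only after the try block has ended.
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(object);
        }
        return bos.toByteArray();
    }

    // Deserialize bytes (e.g. a downloaded object's data) back into an object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}
```

The array returned by toBytes() is what you would pass to setContentLength() and wrap in a ByteArrayInputStream, which removes the ramdisk step entirely.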
Re: Best way to test if object exists?
Thanks, that helps a great deal. I'll be accessing S3 primarily from EC2 instances, so raising s3service.max-thread-count is probably appropriate.

One other related question: I will have many small objects (~50 KB) to store in S3, and was wondering if I'd be better off bundling them into larger objects. My main concern is access speed, in both directions. I'll typically be generating around five to ten thousand small objects at a time (~350 MB) and will end up with a few terabytes of data stored in S3. I don't have any idea of how much overhead is associated with transfers to and from S3, and have been wondering if having several million little objects will be inefficient.

thanks again,
Barnet

James Murty wrote:
> Hi Barnet,
Re: Best way to test if object exists?

Thanks for that example code, Travis.

Regarding storing many small objects in S3, I would definitely recommend bundling them together if you can. The time to set up each HTTP/S connection to S3 can be a big overhead when you're dealing with many small files, so reducing the number of objects will help. JetS3t will re-use connections to a certain extent, but reducing the number of objects should pay dividends. You will also reduce your per-request usage fees, which could mount up quite quickly if you're saving and reading millions of objects in S3.

There are a few strategies you could use to bundle objects. You could zip them together into archives, or just merge the different files. If you merge the files and keep a local record of the byte boundaries, you could use range-limited GET requests to retrieve portions of the merged file.

If possible, you should run some small-scale tests to try out different approaches. The performance of S3 can vary a bit depending on load, so run tests at different times to see what kind of performance you can expect.

Cheers,
James

On Fri, May 2, 2008 at 12:39 PM, Barnet Wagman <b.wagman@...> wrote:

--
http://www.jamesmurty.com
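A minimal stdlib sketch of the merge-and-index strategy: concatenate many small payloads into one bundle, record each item's byte boundaries locally, and derive the HTTP Range header value (offsets are inclusive at both ends) for fetching one item with a range-limited GET. The `Bundle` class and item names are hypothetical, and the actual request is not shown; the `slice` method only simulates locally what a range-limited GET would return, so check your JetS3t version's API for how to issue one.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class Bundle {
    // Inclusive byte range [start, end] of one item inside the merged object.
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        // HTTP Range header value; both offsets are inclusive.
        String header() {
            return "bytes=" + start + "-" + end;
        }
    }

    private final ByteArrayOutputStream merged = new ByteArrayOutputStream();
    private final Map<String, Range> index = new LinkedHashMap<>();

    // Appends one small payload to the bundle and records where it landed.
    void add(String name, byte[] payload) {
        if (payload.length == 0) {
            throw new IllegalArgumentException("an empty payload has no valid byte range");
        }
        long start = merged.size();
        merged.write(payload, 0, payload.length);
        index.put(name, new Range(start, start + payload.length - 1));
    }

    byte[] bytes() {
        return merged.toByteArray();
    }

    Range rangeOf(String name) {
        return index.get(name);
    }

    // Local stand-in for a range-limited GET: the bytes S3 would return for r.
    static byte[] slice(byte[] bundle, Range r) {
        return Arrays.copyOfRange(bundle, (int) r.start, (int) (r.end + 1));
    }
}
```

For example, adding a 3-byte item "a" and then a 2-byte item "b" gives "b" the header value `bytes=3-4`. The local index is the only record of these boundaries, so it has to be kept alongside the merged object's key.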