[Building Sakai] TinyURL Service Impl Issue

View: New views
15 Messages — Rating Filter:   Alert me  

[Building Sakai] TinyURL Service Impl Issue

by carl.hall :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Summary: There is an ever increasing probability that the collision of
IDs in the TinyURL service will slow down the service considerably.

In looking at the implementation of the TinyURL service, I've noticed
something about the way URLs are generated.  If I'm headed in the
wrong direction with this, please correct me.

The very general approach is:
1) generate a random ID
2) check that the ID doesn't already exist
3) if exists, go to 1.  if not, record the ID and release it to the caller.

The following code for generating a URL:

Random generator = new Random();
int seed = Math.abs(generator.nextInt());

has an issue with collision which is addressed by checking for the
existence of that ID before handing it out but the algorithm run time
will continue to grow because of the growing chance of collision over
time.

Explanation of concern:
The first generated ID will have a 0/n chance of collision where n is
the total number of possible IDs.  The next ID has a 1/n chance, the
next 2/n, etc up to the last having a chance of collision being n-1/n.
 Considering n needs to be sufficiently large to have a low to zero
rate of duplication over time, this check will take more time because
of the number of collisions over time.  Assuming that the ID column in
the database is indexed, the lookup time is minimized but there's
still a call to the database lookup on each generated ID.  As the
number of IDs generated reaches the max possible, there are possibly
n-1 lookups before a valid ID is found.

The current code recreates the Random each time which stops nextInt()
from knowing history (ie. new random seed on each generation).  Java
implements a linear congruential generator to generate random numbers
which requires knowledge of the previously generated ID to be
effective.  Math.random() creates the random seed once and reuses it
for the life of the JVM process which is a better approach.

I would suggest creating an LCG in the TinyURL service so that the
proper range can be easily managed (better performance due to a
narrower scope than Math.random()) and the seed can be restored after
a restart.  I have created a prototype in Python based on the rules
laid out in the Wikipedia article on LCG's [1] and using a
sufficiently large prime number.  This still carries a concern of
generating IDs across a cluster but is relatively easy to address.

WDYT?

[1] http://en.wikipedia.org/wiki/Linear_congruential_generator
_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Steve Swinsburg-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Carl,

Sounds good, is there an existing LCG Java implementation we can  
leverage or should we roll our own?

cheers,
Steve


On 6 Jul 2009, at 12:34, Carl Hall wrote:

> Summary: There is an ever increasing probability that the collision of
> IDs in the TinyURL service will slow down the service considerably.
>
> In looking at the implementation of the TinyURL service, I've noticed
> something about the way URLs are generated.  If I'm headed in the
> wrong direction with this, please correct me.
>
> The very general approach is:
> 1) generate a random ID
> 2) check that the ID doesn't already exist
> 3) if exists, go to 1.  if not, record the ID and release it to the  
> caller.
>
> The following code for generating a URL:
>
> Random generator = new Random();
> int seed = Math.abs(generator.nextInt());
>
> has an issue with collision which is addressed by checking for the
> existence of that ID before handing it out but the algorithm run time
> will continue to grow because of the growing chance of collision over
> time.
>
> Explanation of concern:
> The first generated ID will have a 0/n chance of collision where n is
> the total number of possible IDs.  The next ID has a 1/n chance, the
> next 2/n, etc up to the last having a chance of collision being n-1/n.
> Considering n needs to be sufficiently large to have a low to zero
> rate of duplication over time, this check will take more time because
> of the number of collisions over time.  Assuming that the ID column in
> the database is indexed, the lookup time is minimized but there's
> still a call to the database lookup on each generated ID.  As the
> number of IDs generated reaches the max possible, there are possibly
> n-1 lookups before a valid ID is found.
>
> The current code recreates the Random each time which stops nextInt()
> from knowing history (ie. new random seed on each generation).  Java
> implements a linear congruential generator to generate random numbers
> which requires knowledge of the previously generated ID to be
> effective.  Math.random() creates the random seed once and reuses it
> for the life of the JVM process which is a better approach.
>
> I would suggest creating an LCG in the TinyURL service so that the
> proper range can be easily managed (better performance due to a
> narrower scope than Math.random()) and the seed can be restored after
> a restart.  I have created a prototype in Python based on the rules
> laid out in the Wikipedia article on LCG's [1] and using a
> sufficiently large prime number.  This still carries a concern of
> generating IDs across a cluster but is relatively easy to address.
>
> WDYT?
>
> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>  with a subject of "unsubscribe"

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Antranig Basman :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Please take a look at the comments at the head of the "EighteenIDGenerator"
file for some details of the maths involved in computing how many
documents will arise before a collision:
https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java

I wouldn't really recommend using a key space smaller than this one
if you want to avoid collisions - although of course you will reprobe
to be sure. Another issue with LCGs is that they are highly predictable -
this is why I use a SecureRandom here: One of the requirements of the
TinyURLService that I made for the work at CARET was that the ID
sequence could not be predicted - to prevent the possibility of malicious
"hijacking" of upcoming URL slots (this was actually a system designed
to allocate small URLs that were plaintext signup URLs routed to a PHP
system, perhaps more risk than is present in a typical application!)

Note that a further issue you have with an LCG is running it in a cluster -
even if you can lock on the particular instance you have, you can't
guarantee that some other instance in the cluster has not stepped on
past the slot you want. I think that the approach in my service
(a SecureRandom plus a transactional check into persistence) is most
likely to work reasonably well in a cluster - although it hasn't been
tested there :P

Cheers,
Antranig.


Quoting Steve Swinsburg <s.swinsburg@...>:

> Carl,
>
> Sounds good, is there an existing LCG Java implementation we can
> leverage or should we roll our own?
>
> cheers,
> Steve
>
>
> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>
> > Summary: There is an ever increasing probability that the collision of
> > IDs in the TinyURL service will slow down the service considerably.
> >
> > In looking at the implementation of the TinyURL service, I've noticed
> > something about the way URLs are generated.  If I'm headed in the
> > wrong direction with this, please correct me.
> >
> > The very general approach is:
> > 1) generate a random ID
> > 2) check that the ID doesn't already exist
> > 3) if exists, go to 1.  if not, record the ID and release it to the
> > caller.
> >
> > The following code for generating a URL:
> >
> > Random generator = new Random();
> > int seed = Math.abs(generator.nextInt());
> >
> > has an issue with collision which is addressed by checking for the
> > existence of that ID before handing it out but the algorithm run time
> > will continue to grow because of the growing chance of collision over
> > time.
> >
> > Explanation of concern:
> > The first generated ID will have a 0/n chance of collision where n is
> > the total number of possible IDs.  The next ID has a 1/n chance, the
> > next 2/n, etc up to the last having a chance of collision being n-1/n.
> > Considering n needs to be sufficiently large to have a low to zero
> > rate of duplication over time, this check will take more time because
> > of the number of collisions over time.  Assuming that the ID column in
> > the database is indexed, the lookup time is minimized but there's
> > still a call to the database lookup on each generated ID.  As the
> > number of IDs generated reaches the max possible, there are possibly
> > n-1 lookups before a valid ID is found.
> >
> > The current code recreates the Random each time which stops nextInt()
> > from knowing history (ie. new random seed on each generation).  Java
> > implements a linear congruential generator to generate random numbers
> > which requires knowledge of the previously generated ID to be
> > effective.  Math.random() creates the random seed once and reuses it
> > for the life of the JVM process which is a better approach.
> >
> > I would suggest creating an LCG in the TinyURL service so that the
> > proper range can be easily managed (better performance due to a
> > narrower scope than Math.random()) and the seed can be restored after
> > a restart.  I have created a prototype in Python based on the rules
> > laid out in the Wikipedia article on LCG's [1] and using a
> > sufficiently large prime number.  This still carries a concern of
> > generating IDs across a cluster but is relatively easy to address.
> >
> > WDYT?
> >
> > [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
> > _______________________________________________
> > sakai-dev mailing list
> > sakai-dev@...
> > http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> >
> > TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
> >  with a subject of "unsubscribe"
>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
> with a subject of "unsubscribe"
>




----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Matthew Jones-8 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

You could consider using 22 character uuids as autogenerated tinies in
the url. It might not be able to be called tinyurl anymore, but it
would at least be short, linkable and (much more) collision free.

http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string

I'm not convinced it would be necessary to keep these id's from being
unpredictable. The current security model should prevent anyone from
accessing content they don't have permission to even guessing id's. I
think eventually when a UI was designed for this it might even want to
allow the user to specify a custom ID, similar to how the Site Aliases
(on new site creation) works, allowing a user to specify anything they
want for their site alias. (Like tinyurl.com) So this would almost
guarantee collisions.

So I wouldn't think that the possibility of a collision on an low use
feature for an indexed column (when collisions will probably be
featured in eventually anyway) likely won't be a big deal.

-Matthew

On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:

> Please take a look at the comments at the head of the "EighteenIDGenerator"
> file for some details of the maths involved in computing how many
> documents will arise before a collision:
> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>
> I wouldn't really recommend using a key space smaller than this one
> if you want to avoid collisions - although of course you will reprobe
> to be sure. Another issue with LCGs is that they are highly predictable -
> this is why I use a SecureRandom here: One of the requirements of the
> TinyURLService that I made for the work at CARET was that the ID
> sequence could not be predicted - to prevent the possibility of malicious
> "hijacking" of upcoming URL slots (this was actually a system designed
> to allocate small URLs that were plaintext signup URLs routed to a PHP
> system, perhaps more risk than is present in a typical application!)
>
> Note that a further issue you have with an LCG is running it in a cluster -
> even if you can lock on the particular instance you have, you can't
> guarantee that some other instance in the cluster has not stepped on
> past the slot you want. I think that the approach in my service
> (a SecureRandom plus a transactional check into persistence) is most
> likely to work reasonably well in a cluster - although it hasn't been
> tested there :P
>
> Cheers,
> Antranig.
>
>
> Quoting Steve Swinsburg <s.swinsburg@...>:
>
>> Carl,
>>
>> Sounds good, is there an existing LCG Java implementation we can
>> leverage or should we roll our own?
>>
>> cheers,
>> Steve
>>
>>
>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>
>> > Summary: There is an ever increasing probability that the collision of
>> > IDs in the TinyURL service will slow down the service considerably.
>> >
>> > In looking at the implementation of the TinyURL service, I've noticed
>> > something about the way URLs are generated.  If I'm headed in the
>> > wrong direction with this, please correct me.
>> >
>> > The very general approach is:
>> > 1) generate a random ID
>> > 2) check that the ID doesn't already exist
>> > 3) if exists, go to 1.  if not, record the ID and release it to the
>> > caller.
>> >
>> > The following code for generating a URL:
>> >
>> > Random generator = new Random();
>> > int seed = Math.abs(generator.nextInt());
>> >
>> > has an issue with collision which is addressed by checking for the
>> > existence of that ID before handing it out but the algorithm run time
>> > will continue to grow because of the growing chance of collision over
>> > time.
>> >
>> > Explanation of concern:
>> > The first generated ID will have a 0/n chance of collision where n is
>> > the total number of possible IDs.  The next ID has a 1/n chance, the
>> > next 2/n, etc up to the last having a chance of collision being n-1/n.
>> > Considering n needs to be sufficiently large to have a low to zero
>> > rate of duplication over time, this check will take more time because
>> > of the number of collisions over time.  Assuming that the ID column in
>> > the database is indexed, the lookup time is minimized but there's
>> > still a call to the database lookup on each generated ID.  As the
>> > number of IDs generated reaches the max possible, there are possibly
>> > n-1 lookups before a valid ID is found.
>> >
>> > The current code recreates the Random each time which stops nextInt()
>> > from knowing history (ie. new random seed on each generation).  Java
>> > implements a linear congruential generator to generate random numbers
>> > which requires knowledge of the previously generated ID to be
>> > effective.  Math.random() creates the random seed once and reuses it
>> > for the life of the JVM process which is a better approach.
>> >
>> > I would suggest creating an LCG in the TinyURL service so that the
>> > proper range can be easily managed (better performance due to a
>> > narrower scope than Math.random()) and the seed can be restored after
>> > a restart.  I have created a prototype in Python based on the rules
>> > laid out in the Wikipedia article on LCG's [1] and using a
>> > sufficiently large prime number.  This still carries a concern of
>> > generating IDs across a cluster but is relatively easy to address.
>> >
>> > WDYT?
>> >
>> > [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>> > _______________________________________________
>> > sakai-dev mailing list
>> > sakai-dev@...
>> > http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>> >
>> > TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>> >  with a subject of "unsubscribe"
>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>> with a subject of "unsubscribe"
>>
>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"
>
_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Aaron Zeckoski :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Is there a reason why the id generation could be more like tinyURL and
simply be sequential and thus very short (at least initially and
probably for most institutions it would stay small).

Just a thought.
-AZ


On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...> wrote:

> You could consider using 22 character uuids as autogenerated tinies in
> the url. It might not be able to be called tinyurl anymore, but it
> would at least be short, linkable and (much more) collision free.
>
> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>
> I'm not convinced it would be necessary to keep these id's from being
> unpredictable. The current security model should prevent anyone from
> accessing content they don't have permission to even guessing id's. I
> think eventually when a UI was designed for this it might even want to
> allow the user to specify a custom ID, similar to how the Site Aliases
> (on new site creation) works, allowing a user to specify anything they
> want for their site alias. (Like tinyurl.com) So this would almost
> guarantee collisions.
>
> So I wouldn't think that the possibility of a collision on an low use
> feature for an indexed column (when collisions will probably be
> featured in eventually anyway) likely won't be a big deal.
>
> -Matthew
>
> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>> Please take a look at the comments at the head of the "EighteenIDGenerator"
>> file for some details of the maths involved in computing how many
>> documents will arise before a collision:
>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>
>> I wouldn't really recommend using a key space smaller than this one
>> if you want to avoid collisions - although of course you will reprobe
>> to be sure. Another issue with LCGs is that they are highly predictable -
>> this is why I use a SecureRandom here: One of the requirements of the
>> TinyURLService that I made for the work at CARET was that the ID
>> sequence could not be predicted - to prevent the possibility of malicious
>> "hijacking" of upcoming URL slots (this was actually a system designed
>> to allocate small URLs that were plaintext signup URLs routed to a PHP
>> system, perhaps more risk than is present in a typical application!)
>>
>> Note that a further issue you have with an LCG is running it in a cluster -
>> even if you can lock on the particular instance you have, you can't
>> guarantee that some other instance in the cluster has not stepped on
>> past the slot you want. I think that the approach in my service
>> (a SecureRandom plus a transactional check into persistence) is most
>> likely to work reasonably well in a cluster - although it hasn't been
>> tested there :P
>>
>> Cheers,
>> Antranig.
>>
>>
>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>
>>> Carl,
>>>
>>> Sounds good, is there an existing LCG Java implementation we can
>>> leverage or should we roll our own?
>>>
>>> cheers,
>>> Steve
>>>
>>>
>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>
>>> > Summary: There is an ever increasing probability that the collision of
>>> > IDs in the TinyURL service will slow down the service considerably.
>>> >
>>> > In looking at the implementation of the TinyURL service, I've noticed
>>> > something about the way URLs are generated.  If I'm headed in the
>>> > wrong direction with this, please correct me.
>>> >
>>> > The very general approach is:
>>> > 1) generate a random ID
>>> > 2) check that the ID doesn't already exist
>>> > 3) if exists, go to 1.  if not, record the ID and release it to the
>>> > caller.
>>> >
>>> > The following code for generating a URL:
>>> >
>>> > Random generator = new Random();
>>> > int seed = Math.abs(generator.nextInt());
>>> >
>>> > has an issue with collision which is addressed by checking for the
>>> > existence of that ID before handing it out but the algorithm run time
>>> > will continue to grow because of the growing chance of collision over
>>> > time.
>>> >
>>> > Explanation of concern:
>>> > The first generated ID will have a 0/n chance of collision where n is
>>> > the total number of possible IDs.  The next ID has a 1/n chance, the
>>> > next 2/n, etc up to the last having a chance of collision being n-1/n.
>>> > Considering n needs to be sufficiently large to have a low to zero
>>> > rate of duplication over time, this check will take more time because
>>> > of the number of collisions over time.  Assuming that the ID column in
>>> > the database is indexed, the lookup time is minimized but there's
>>> > still a call to the database lookup on each generated ID.  As the
>>> > number of IDs generated reaches the max possible, there are possibly
>>> > n-1 lookups before a valid ID is found.
>>> >
>>> > The current code recreates the Random each time which stops nextInt()
>>> > from knowing history (ie. new random seed on each generation).  Java
>>> > implements a linear congruential generator to generate random numbers
>>> > which requires knowledge of the previously generated ID to be
>>> > effective.  Math.random() creates the random seed once and reuses it
>>> > for the life of the JVM process which is a better approach.
>>> >
>>> > I would suggest creating an LCG in the TinyURL service so that the
>>> > proper range can be easily managed (better performance due to a
>>> > narrower scope than Math.random()) and the seed can be restored after
>>> > a restart.  I have created a prototype in Python based on the rules
>>> > laid out in the Wikipedia article on LCG's [1] and using a
>>> > sufficiently large prime number.  This still carries a concern of
>>> > generating IDs across a cluster but is relatively easy to address.
>>> >
>>> > WDYT?
>>> >
>>> > [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>> > _______________________________________________
>>> > sakai-dev mailing list
>>> > sakai-dev@...
>>> > http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>> >
>>> > TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>> >  with a subject of "unsubscribe"
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev@...
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>> with a subject of "unsubscribe"
>>>
>>
>>
>>
>>
>> ----------------------------------------------------------------
>> This message was sent using IMP, the Internet Messaging Program.
>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"
>>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"
>



--
Aaron Zeckoski (azeckoski (at) vt.edu)
Senior Research Engineer - CARET - University of Cambridge
https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Steve Swinsburg-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The security is handled in the container anyway so you get a shortened  
URL and it resolves into the full URL, redirects and Sakai takes over.  
Even if it was sequential it wouldn't be a security concern.

With a keyspace of lowercase letters and numbers(36 chars) there are  
about 2 billion combinations, with uppercase as well there's 50  
billion. Info here on the theoretical limits:
http://bit.ly/3QG2uj

Using RandomStringUtils from commons-lang, I wrote a short method to  
iterate over a million alphanumeric combinations using letters and  
numbers of length 6 chars and didn't run into a single collision.  
Using this method and a transactional check when adding into the  
database is probably the way to go.

We can also filter for bad words, BBC has an interesting list (thanks  
Seth! :)

WDYT?

Steve


On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:

> Is there a reason why the id generation could be more like tinyURL and
> simply be sequential and thus very short (at least initially and
> probably for most institutions it would stay small).
>
> Just a thought.
> -AZ
>
>
> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>  
> wrote:
>> You could consider using 22 character uuids as autogenerated tinies  
>> in
>> the url. It might not be able to be called tinyurl anymore, but it
>> would at least be short, linkable and (much more) collision free.
>>
>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>
>> I'm not convinced it would be necessary to keep these id's from being
>> unpredictable. The current security model should prevent anyone from
>> accessing content they don't have permission to even guessing id's. I
>> think eventually when a UI was designed for this it might even want  
>> to
>> allow the user to specify a custom ID, similar to how the Site  
>> Aliases
>> (on new site creation) works, allowing a user to specify anything  
>> they
>> want for their site alias. (Like tinyurl.com) So this would almost
>> guarantee collisions.
>>
>> So I wouldn't think that the possibility of a collision on an low use
>> feature for an indexed column (when collisions will probably be
>> featured in eventually anyway) likely won't be a big deal.
>>
>> -Matthew
>>
>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>>> Please take a look at the comments at the head of the  
>>> "EighteenIDGenerator"
>>> file for some details of the maths involved in computing how many
>>> documents will arise before a collision:
>>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>
>>> I wouldn't really recommend using a key space smaller than this one
>>> if you want to avoid collisions - although of course you will  
>>> reprobe
>>> to be sure. Another issue with LCGs is that they are highly  
>>> predictable -
>>> this is why I use a SecureRandom here: One of the requirements of  
>>> the
>>> TinyURLService that I made for the work at CARET was that the ID
>>> sequence could not be predicted - to prevent the possibility of  
>>> malicious
>>> "hijacking" of upcoming URL slots (this was actually a system  
>>> designed
>>> to allocate small URLs that were plaintext signup URLs routed to a  
>>> PHP
>>> system, perhaps more risk than is present in a typical application!)
>>>
>>> Note that a further issue you have with an LCG is running it in a  
>>> cluster -
>>> even if you can lock on the particular instance you have, you can't
>>> guarantee that some other instance in the cluster has not stepped on
>>> past the slot you want. I think that the approach in my service
>>> (a SecureRandom plus a transactional check into persistence) is most
>>> likely to work reasonably well in a cluster - although it hasn't  
>>> been
>>> tested there :P
>>>
>>> Cheers,
>>> Antranig.
>>>
>>>
>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>
>>>> Carl,
>>>>
>>>> Sounds good, is there an existing LCG Java implementation we can
>>>> leverage or should we roll our own?
>>>>
>>>> cheers,
>>>> Steve
>>>>
>>>>
>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>
>>>>> Summary: There is an ever increasing probability that the  
>>>>> collision of
>>>>> IDs in the TinyURL service will slow down the service  
>>>>> considerably.
>>>>>
>>>>> In looking at the implementation of the TinyURL service, I've  
>>>>> noticed
>>>>> something about the way URLs are generated.  If I'm headed in the
>>>>> wrong direction with this, please correct me.
>>>>>
>>>>> The very general approach is:
>>>>> 1) generate a random ID
>>>>> 2) check that the ID doesn't already exist
>>>>> 3) if exists, go to 1.  if not, record the ID and release it to  
>>>>> the
>>>>> caller.
>>>>>
>>>>> The following code for generating a URL:
>>>>>
>>>>> Random generator = new Random();
>>>>> int seed = Math.abs(generator.nextInt());
>>>>>
>>>>> has an issue with collision which is addressed by checking for the
>>>>> existence of that ID before handing it out but the algorithm run  
>>>>> time
>>>>> will continue to grow because of the growing chance of collision  
>>>>> over
>>>>> time.
>>>>>
>>>>> Explanation of concern:
>>>>> The first generated ID will have a 0/n chance of collision where  
>>>>> n is
>>>>> the total number of possible IDs.  The next ID has a 1/n chance,  
>>>>> the
>>>>> next 2/n, etc up to the last having a chance of collision being  
>>>>> n-1/n.
>>>>> Considering n needs to be sufficiently large to have a low to zero
>>>>> rate of duplication over time, this check will take more time  
>>>>> because
>>>>> of the number of collisions over time.  Assuming that the ID  
>>>>> column in
>>>>> the database is indexed, the lookup time is minimized but there's
>>>>> still a call to the database lookup on each generated ID.  As the
>>>>> number of IDs generated reaches the max possible, there are  
>>>>> possibly
>>>>> n-1 lookups before a valid ID is found.
>>>>>
>>>>> The current code recreates the Random each time which stops  
>>>>> nextInt()
>>>>> from knowing history (ie. new random seed on each generation).  
>>>>> Java
>>>>> implements a linear congruential generator to generate random  
>>>>> numbers
>>>>> which requires knowledge of the previously generated ID to be
>>>>> effective.  Math.random() creates the random seed once and  
>>>>> reuses it
>>>>> for the life of the JVM process which is a better approach.
>>>>>
>>>>> I would suggest creating an LCG in the TinyURL service so that the
>>>>> proper range can be easily managed (better performance due to a
>>>>> narrower scope than Math.random()) and the seed can be restored  
>>>>> after
>>>>> a restart.  I have created a prototype in Python based on the  
>>>>> rules
>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>> sufficiently large prime number.  This still carries a concern of
>>>>> generating IDs across a cluster but is relatively easy to address.
>>>>>
>>>>> WDYT?
>>>>>
>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev@...
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>>>  with a subject of "unsubscribe"
>>>>
>>>> _______________________________________________
>>>> sakai-dev mailing list
>>>> sakai-dev@...
>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>
>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>> with a subject of "unsubscribe"
>>>>
>>>
>>>
>>>
>>>
>>> ----------------------------------------------------------------
>>> This message was sent using IMP, the Internet Messaging Program.
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev@...
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>  with a subject of "unsubscribe"
>>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>  with a subject of "unsubscribe"
>>
>
>
>
> --
> Aaron Zeckoski (azeckoski (at) vt.edu)
> Senior Research Engineer - CARET - University of Cambridge
> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>  with a subject of "unsubscribe"

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Jim Martino :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

i love the bad words filter idea, once inserted a one-line filter in a
perl script (for displaying student comments on course surveys) to
replace offending constructions with asterisks. easily the funniest line
of code i ever constructed, especially with the regular expressions :)

jim

Steve Swinsburg wrote:

> The security is handled in the container anyway so you get a shortened
> URL and it resolves into the full URL, redirects and Sakai takes over.
> Even if it was sequential it wouldn't be a security concern.
>
> With a keyspace of lowercase letters and numbers(36 chars) there are
> about 2 billion combinations, with uppercase as well there's 50
> billion. Info here on the theoretical limits:
> http://bit.ly/3QG2uj
>
> Using RandomStringUtils from commons-lang, I wrote a short method to
> iterate over a million alphanumeric combinations using letters and
> numbers of length 6 chars and didn't run into a single collision.
> Using this method and a transactional check when adding into the
> database is probably the way to go.
>
> We can also filter for bad words, BBC has an interesting list (thanks
> Seth! :)
>
> WDYT?
>
> Steve
>
>
> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>
>  
>> Is there a reason why the id generation could be more like tinyURL and
>> simply be sequential and thus very short (at least initially and
>> probably for most institutions it would stay small).
>>
>> Just a thought.
>> -AZ
>>
>>
>> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
>> wrote:
>>    
>>> You could consider using 22 character uuids as autogenerated tinies
>>> in
>>> the url. It might not be able to be called tinyurl anymore, but it
>>> would at least be short, linkable and (much more) collision free.
>>>
>>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>>
>>> I'm not convinced it would be necessary to keep these id's from being
>>> unpredictable. The current security model should prevent anyone from
>>> accessing content they don't have permission to even guessing id's. I
>>> think eventually when a UI was designed for this it might even want
>>> to
>>> allow the user to specify a custom ID, similar to how the Site
>>> Aliases
>>> (on new site creation) works, allowing a user to specify anything
>>> they
>>> want for their site alias. (Like tinyurl.com) So this would almost
>>> guarantee collisions.
>>>
>>> So I wouldn't think that the possibility of a collision on an low use
>>> feature for an indexed column (when collisions will probably be
>>> featured in eventually anyway) likely won't be a big deal.
>>>
>>> -Matthew
>>>
>>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>>>      
>>>> Please take a look at the comments at the head of the
>>>> "EighteenIDGenerator"
>>>> file for some details of the maths involved in computing how many
>>>> documents will arise before a collision:
>>>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>>
>>>> I wouldn't really recommend using a key space smaller than this one
>>>> if you want to avoid collisions - although of course you will
>>>> reprobe
>>>> to be sure. Another issue with LCGs is that they are highly
>>>> predictable -
>>>> this is why I use a SecureRandom here: One of the requirements of
>>>> the
>>>> TinyURLService that I made for the work at CARET was that the ID
>>>> sequence could not be predicted - to prevent the possibility of
>>>> malicious
>>>> "hijacking" of upcoming URL slots (this was actually a system
>>>> designed
>>>> to allocate small URLs that were plaintext signup URLs routed to a
>>>> PHP
>>>> system, perhaps more risk than is present in a typical application!)
>>>>
>>>> Note that a further issue you have with an LCG is running it in a
>>>> cluster -
>>>> even if you can lock on the particular instance you have, you can't
>>>> guarantee that some other instance in the cluster has not stepped on
>>>> past the slot you want. I think that the approach in my service
>>>> (a SecureRandom plus a transactional check into persistence) is most
>>>> likely to work reasonably well in a cluster - although it hasn't
>>>> been
>>>> tested there :P
>>>>
>>>> Cheers,
>>>> Antranig.
>>>>
>>>>
>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>
>>>>        
>>>>> Carl,
>>>>>
>>>>> Sounds good, is there an existing LCG Java implementation we can
>>>>> leverage or should we roll our own?
>>>>>
>>>>> cheers,
>>>>> Steve
>>>>>
>>>>>
>>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>>
>>>>>          
>>>>>> Summary: There is an ever increasing probability that the
>>>>>> collision of
>>>>>> IDs in the TinyURL service will slow down the service
>>>>>> considerably.
>>>>>>
>>>>>> In looking at the implementation of the TinyURL service, I've
>>>>>> noticed
>>>>>> something about the way URLs are generated.  If I'm headed in the
>>>>>> wrong direction with this, please correct me.
>>>>>>
>>>>>> The very general approach is:
>>>>>> 1) generate a random ID
>>>>>> 2) check that the ID doesn't already exist
>>>>>> 3) if exists, go to 1.  if not, record the ID and release it to
>>>>>> the
>>>>>> caller.
>>>>>>
>>>>>> The following code for generating a URL:
>>>>>>
>>>>>> Random generator = new Random();
>>>>>> int seed = Math.abs(generator.nextInt());
>>>>>>
>>>>>> has an issue with collision which is addressed by checking for the
>>>>>> existence of that ID before handing it out but the algorithm run
>>>>>> time
>>>>>> will continue to grow because of the growing chance of collision
>>>>>> over
>>>>>> time.
>>>>>>
>>>>>> Explanation of concern:
>>>>>> The first generated ID will have a 0/n chance of collision where
>>>>>> n is
>>>>>> the total number of possible IDs.  The next ID has a 1/n chance,
>>>>>> the
>>>>>> next 2/n, etc up to the last having a chance of collision being
>>>>>> n-1/n.
>>>>>> Considering n needs to be sufficiently large to have a low to zero
>>>>>> rate of duplication over time, this check will take more time
>>>>>> because
>>>>>> of the number of collisions over time.  Assuming that the ID
>>>>>> column in
>>>>>> the database is indexed, the lookup time is minimized but there's
>>>>>> still a call to the database lookup on each generated ID.  As the
>>>>>> number of IDs generated reaches the max possible, there are
>>>>>> possibly
>>>>>> n-1 lookups before a valid ID is found.
>>>>>>
>>>>>> The current code recreates the Random each time which stops
>>>>>> nextInt()
>>>>>> from knowing history (ie. new random seed on each generation).
>>>>>> Java
>>>>>> implements a linear congruential generator to generate random
>>>>>> numbers
>>>>>> which requires knowledge of the previously generated ID to be
>>>>>> effective.  Math.random() creates the random seed once and
>>>>>> reuses it
>>>>>> for the life of the JVM process which is a better approach.
>>>>>>
>>>>>> I would suggest creating an LCG in the TinyURL service so that the
>>>>>> proper range can be easily managed (better performance due to a
>>>>>> narrower scope than Math.random()) and the seed can be restored
>>>>>> after
>>>>>> a restart.  I have created a prototype in Python based on the
>>>>>> rules
>>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>>> sufficiently large prime number.  This still carries a concern of
>>>>>> generating IDs across a cluster but is relatively easy to address.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>>> _______________________________________________
>>>>>> sakai-dev mailing list
>>>>>> sakai-dev@...
>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>
>>>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>>>>  with a subject of "unsubscribe"
>>>>>>            
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev@...
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>>> with a subject of "unsubscribe"
>>>>>
>>>>>          
>>>>
>>>>
>>>> ----------------------------------------------------------------
>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>
>>>> _______________________________________________
>>>> sakai-dev mailing list
>>>> sakai-dev@...
>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>
>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>>  with a subject of "unsubscribe"
>>>>
>>>>        
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev@...
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>  with a subject of "unsubscribe"
>>>
>>>      
>>
>> --
>> Aaron Zeckoski (azeckoski (at) vt.edu)
>> Senior Research Engineer - CARET - University of Cambridge
>> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>  with a subject of "unsubscribe"
>>    
>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"
>  

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Antranig Basman :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

The issue with collisions occurs when you are hosting URLs which are
not protected by the container, but held by some external resource -
for example, a signup system "click here to validate your email to
Xxxxx service". The implementation I made actually runs in two modes,
you may request either "SECURE" or "SHORT" URLs by an option to the
API. The short URLs start at only a couple of characters and
are indeed sequential.
Anyway, feel free to implement your own system, it is the Sakai way
after all :)

Cheers,
Antranig.

Quoting Steve Swinsburg <s.swinsburg@...>:

> The security is handled in the container anyway so you get a shortened
> URL and it resolves into the full URL, redirects and Sakai takes over.
> Even if it was sequential it wouldn't be a security concern.
>
> With a keyspace of lowercase letters and numbers(36 chars) there are
> about 2 billion combinations, with uppercase as well there's 50
> billion. Info here on the theoretical limits:
> http://bit.ly/3QG2uj
>
> Using RandomStringUtils from commons-lang, I wrote a short method to
> iterate over a million alphanumeric combinations using letters and
> numbers of length 6 chars and didn't run into a single collision.
> Using this method and a transactional check when adding into the
> database is probably the way to go.
>
> We can also filter for bad words, BBC has an interesting list (thanks
> Seth! :)
>
> WDYT?
>
> Steve
>
>
> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>
> > Is there a reason why the id generation could be more like tinyURL and
> > simply be sequential and thus very short (at least initially and
> > probably for most institutions it would stay small).
> >
> > Just a thought.
> > -AZ
> >
> >
> > On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
> > wrote:
> >> You could consider using 22 character uuids as autogenerated tinies
> >> in
> >> the url. It might not be able to be called tinyurl anymore, but it
> >> would at least be short, linkable and (much more) collision free.
> >>
> >> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
> >>
> >> I'm not convinced it would be necessary to keep these id's from being
> >> unpredictable. The current security model should prevent anyone from
> >> accessing content they don't have permission to even guessing id's. I
> >> think eventually when a UI was designed for this it might even want
> >> to
> >> allow the user to specify a custom ID, similar to how the Site
> >> Aliases
> >> (on new site creation) works, allowing a user to specify anything
> >> they
> >> want for their site alias. (Like tinyurl.com) So this would almost
> >> guarantee collisions.
> >>
> >> So I wouldn't think that the possibility of a collision on an low use
> >> feature for an indexed column (when collisions will probably be
> >> featured in eventually anyway) likely won't be a big deal.
> >>
> >> -Matthew
> >>
> >> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
> >>> Please take a look at the comments at the head of the
> >>> "EighteenIDGenerator"
> >>> file for some details of the maths involved in computing how many
> >>> documents will arise before a collision:
> >>>
>
https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java

> >>>
> >>> I wouldn't really recommend using a key space smaller than this one
> >>> if you want to avoid collisions - although of course you will
> >>> reprobe
> >>> to be sure. Another issue with LCGs is that they are highly
> >>> predictable -
> >>> this is why I use a SecureRandom here: One of the requirements of
> >>> the
> >>> TinyURLService that I made for the work at CARET was that the ID
> >>> sequence could not be predicted - to prevent the possibility of
> >>> malicious
> >>> "hijacking" of upcoming URL slots (this was actually a system
> >>> designed
> >>> to allocate small URLs that were plaintext signup URLs routed to a
> >>> PHP
> >>> system, perhaps more risk than is present in a typical application!)
> >>>
> >>> Note that a further issue you have with an LCG is running it in a
> >>> cluster -
> >>> even if you can lock on the particular instance you have, you can't
> >>> guarantee that some other instance in the cluster has not stepped on
> >>> past the slot you want. I think that the approach in my service
> >>> (a SecureRandom plus a transactional check into persistence) is most
> >>> likely to work reasonably well in a cluster - although it hasn't
> >>> been
> >>> tested there :P
> >>>
> >>> Cheers,
> >>> Antranig.
> >>>
> >>>
> >>> Quoting Steve Swinsburg <s.swinsburg@...>:
> >>>
> >>>> Carl,
> >>>>
> >>>> Sounds good, is there an existing LCG Java implementation we can
> >>>> leverage or should we roll our own?
> >>>>
> >>>> cheers,
> >>>> Steve
> >>>>
> >>>>
> >>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
> >>>>
> >>>>> Summary: There is an ever increasing probability that the
> >>>>> collision of
> >>>>> IDs in the TinyURL service will slow down the service
> >>>>> considerably.
> >>>>>
> >>>>> In looking at the implementation of the TinyURL service, I've
> >>>>> noticed
> >>>>> something about the way URLs are generated.  If I'm headed in the
> >>>>> wrong direction with this, please correct me.
> >>>>>
> >>>>> The very general approach is:
> >>>>> 1) generate a random ID
> >>>>> 2) check that the ID doesn't already exist
> >>>>> 3) if exists, go to 1.  if not, record the ID and release it to
> >>>>> the
> >>>>> caller.
> >>>>>
> >>>>> The following code for generating a URL:
> >>>>>
> >>>>> Random generator = new Random();
> >>>>> int seed = Math.abs(generator.nextInt());
> >>>>>
> >>>>> has an issue with collision which is addressed by checking for the
> >>>>> existence of that ID before handing it out but the algorithm run
> >>>>> time
> >>>>> will continue to grow because of the growing chance of collision
> >>>>> over
> >>>>> time.
> >>>>>
> >>>>> Explanation of concern:
> >>>>> The first generated ID will have a 0/n chance of collision where
> >>>>> n is
> >>>>> the total number of possible IDs.  The next ID has a 1/n chance,
> >>>>> the
> >>>>> next 2/n, etc up to the last having a chance of collision being
> >>>>> n-1/n.
> >>>>> Considering n needs to be sufficiently large to have a low to zero
> >>>>> rate of duplication over time, this check will take more time
> >>>>> because
> >>>>> of the number of collisions over time.  Assuming that the ID
> >>>>> column in
> >>>>> the database is indexed, the lookup time is minimized but there's
> >>>>> still a call to the database lookup on each generated ID.  As the
> >>>>> number of IDs generated reaches the max possible, there are
> >>>>> possibly
> >>>>> n-1 lookups before a valid ID is found.
> >>>>>
> >>>>> The current code recreates the Random each time which stops
> >>>>> nextInt()
> >>>>> from knowing history (ie. new random seed on each generation).
> >>>>> Java
> >>>>> implements a linear congruential generator to generate random
> >>>>> numbers
> >>>>> which requires knowledge of the previously generated ID to be
> >>>>> effective.  Math.random() creates the random seed once and
> >>>>> reuses it
> >>>>> for the life of the JVM process which is a better approach.
> >>>>>
> >>>>> I would suggest creating an LCG in the TinyURL service so that the
> >>>>> proper range can be easily managed (better performance due to a
> >>>>> narrower scope than Math.random()) and the seed can be restored
> >>>>> after
> >>>>> a restart.  I have created a prototype in Python based on the
> >>>>> rules
> >>>>> laid out in the Wikipedia article on LCG's [1] and using a
> >>>>> sufficiently large prime number.  This still carries a concern of
> >>>>> generating IDs across a cluster but is relatively easy to address.
> >>>>>
> >>>>> WDYT?
> >>>>>
> >>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
> >>>>> _______________________________________________
> >>>>> sakai-dev mailing list
> >>>>> sakai-dev@...
> >>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> >>>>>
> >>>>> TO UNSUBSCRIBE: send email to
> sakai-dev-unsubscribe@...
> >>>>>  with a subject of "unsubscribe"
> >>>>
> >>>> _______________________________________________
> >>>> sakai-dev mailing list
> >>>> sakai-dev@...
> >>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> >>>>
> >>>> TO UNSUBSCRIBE: send email to
> sakai-dev-unsubscribe@...
> >>>> with a subject of "unsubscribe"
> >>>>
> >>>
> >>>
> >>>
> >>>
> >>> ----------------------------------------------------------------
> >>> This message was sent using IMP, the Internet Messaging Program.
> >>>
> >>> _______________________________________________
> >>> sakai-dev mailing list
> >>> sakai-dev@...
> >>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> >>>
> >>> TO UNSUBSCRIBE: send email to
> sakai-dev-unsubscribe@...
> >>>  with a subject of "unsubscribe"
> >>>
> >> _______________________________________________
> >> sakai-dev mailing list
> >> sakai-dev@...
> >> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> >>
> >> TO UNSUBSCRIBE: send email to
> sakai-dev-unsubscribe@...
> >>  with a subject of "unsubscribe"
> >>
> >
> >
> >
> > --
> > Aaron Zeckoski (azeckoski (at) vt.edu)
> > Senior Research Engineer - CARET - University of Cambridge
> > https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
> > http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
> > _______________________________________________
> > sakai-dev mailing list
> > sakai-dev@...
> > http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
> >
> > TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
> >  with a subject of "unsubscribe"
>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
> with a subject of "unsubscribe"
>




----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Michael Osterman-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

You realize, of course, that said bad word filter would need to be internationalized. Could be some of the more entertaining il8n work, though.

-Mike

On Wed, Jul 8, 2009 at 4:51 PM, Jim Martino <jrm@...> wrote:
i love the bad words filter idea, once inserted a one-line filter in a
perl script (for displaying student comments on course surveys) to
replace offending constructions with asterisks. easily the funniest line
of code i ever constructed, especially with the regular expressions :)

jim

Steve Swinsburg wrote:
> The security is handled in the container anyway so you get a shortened
> URL and it resolves into the full URL, redirects and Sakai takes over.
> Even if it was sequential it wouldn't be a security concern.
>
> With a keyspace of lowercase letters and numbers(36 chars) there are
> about 2 billion combinations, with uppercase as well there's 50
> billion. Info here on the theoretical limits:
> http://bit.ly/3QG2uj
>
> Using RandomStringUtils from commons-lang, I wrote a short method to
> iterate over a million alphanumeric combinations using letters and
> numbers of length 6 chars and didn't run into a single collision.
> Using this method and a transactional check when adding into the
> database is probably the way to go.
>
> We can also filter for bad words, BBC has an interesting list (thanks
> Seth! :)
>
> WDYT?
>
> Steve
>
>
> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>
>
>> Is there a reason why the id generation could be more like tinyURL and
>> simply be sequential and thus very short (at least initially and
>> probably for most institutions it would stay small).
>>
>> Just a thought.
>> -AZ
>>
>>
>> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
>> wrote:
>>
>>> You could consider using 22 character uuids as autogenerated tinies
>>> in
>>> the url. It might not be able to be called tinyurl anymore, but it
>>> would at least be short, linkable and (much more) collision free.
>>>
>>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>>
>>> I'm not convinced it would be necessary to keep these id's from being
>>> unpredictable. The current security model should prevent anyone from
>>> accessing content they don't have permission to even guessing id's. I
>>> think eventually when a UI was designed for this it might even want
>>> to
>>> allow the user to specify a custom ID, similar to how the Site
>>> Aliases
>>> (on new site creation) works, allowing a user to specify anything
>>> they
>>> want for their site alias. (Like tinyurl.com) So this would almost
>>> guarantee collisions.
>>>
>>> So I wouldn't think that the possibility of a collision on an low use
>>> feature for an indexed column (when collisions will probably be
>>> featured in eventually anyway) likely won't be a big deal.
>>>
>>> -Matthew
>>>
>>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>>>
>>>> Please take a look at the comments at the head of the
>>>> "EighteenIDGenerator"
>>>> file for some details of the maths involved in computing how many
>>>> documents will arise before a collision:
>>>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>>
>>>> I wouldn't really recommend using a key space smaller than this one
>>>> if you want to avoid collisions - although of course you will
>>>> reprobe
>>>> to be sure. Another issue with LCGs is that they are highly
>>>> predictable -
>>>> this is why I use a SecureRandom here: One of the requirements of
>>>> the
>>>> TinyURLService that I made for the work at CARET was that the ID
>>>> sequence could not be predicted - to prevent the possibility of
>>>> malicious
>>>> "hijacking" of upcoming URL slots (this was actually a system
>>>> designed
>>>> to allocate small URLs that were plaintext signup URLs routed to a
>>>> PHP
>>>> system, perhaps more risk than is present in a typical application!)
>>>>
>>>> Note that a further issue you have with an LCG is running it in a
>>>> cluster -
>>>> even if you can lock on the particular instance you have, you can't
>>>> guarantee that some other instance in the cluster has not stepped on
>>>> past the slot you want. I think that the approach in my service
>>>> (a SecureRandom plus a transactional check into persistence) is most
>>>> likely to work reasonably well in a cluster - although it hasn't
>>>> been
>>>> tested there :P
>>>>
>>>> Cheers,
>>>> Antranig.
>>>>
>>>>
>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>
>>>>
>>>>> Carl,
>>>>>
>>>>> Sounds good, is there an existing LCG Java implementation we can
>>>>> leverage or should we roll our own?
>>>>>
>>>>> cheers,
>>>>> Steve
>>>>>
>>>>>
>>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>>
>>>>>
>>>>>> Summary: There is an ever increasing probability that the
>>>>>> collision of
>>>>>> IDs in the TinyURL service will slow down the service
>>>>>> considerably.
>>>>>>
>>>>>> In looking at the implementation of the TinyURL service, I've
>>>>>> noticed
>>>>>> something about the way URLs are generated.  If I'm headed in the
>>>>>> wrong direction with this, please correct me.
>>>>>>
>>>>>> The very general approach is:
>>>>>> 1) generate a random ID
>>>>>> 2) check that the ID doesn't already exist
>>>>>> 3) if exists, go to 1.  if not, record the ID and release it to
>>>>>> the
>>>>>> caller.
>>>>>>
>>>>>> The following code for generating a URL:
>>>>>>
>>>>>> Random generator = new Random();
>>>>>> int seed = Math.abs(generator.nextInt());
>>>>>>
>>>>>> has an issue with collision which is addressed by checking for the
>>>>>> existence of that ID before handing it out but the algorithm run
>>>>>> time
>>>>>> will continue to grow because of the growing chance of collision
>>>>>> over
>>>>>> time.
>>>>>>
>>>>>> Explanation of concern:
>>>>>> The first generated ID will have a 0/n chance of collision where
>>>>>> n is
>>>>>> the total number of possible IDs.  The next ID has a 1/n chance,
>>>>>> the
>>>>>> next 2/n, etc up to the last having a chance of collision being
>>>>>> n-1/n.
>>>>>> Considering n needs to be sufficiently large to have a low to zero
>>>>>> rate of duplication over time, this check will take more time
>>>>>> because
>>>>>> of the number of collisions over time.  Assuming that the ID
>>>>>> column in
>>>>>> the database is indexed, the lookup time is minimized but there's
>>>>>> still a call to the database lookup on each generated ID.  As the
>>>>>> number of IDs generated reaches the max possible, there are
>>>>>> possibly
>>>>>> n-1 lookups before a valid ID is found.
>>>>>>
>>>>>> The current code recreates the Random each time which stops
>>>>>> nextInt()
>>>>>> from knowing history (ie. new random seed on each generation).
>>>>>> Java
>>>>>> implements a linear congruential generator to generate random
>>>>>> numbers
>>>>>> which requires knowledge of the previously generated ID to be
>>>>>> effective.  Math.random() creates the random seed once and
>>>>>> reuses it
>>>>>> for the life of the JVM process which is a better approach.
>>>>>>
>>>>>> I would suggest creating an LCG in the TinyURL service so that the
>>>>>> proper range can be easily managed (better performance due to a
>>>>>> narrower scope than Math.random()) and the seed can be restored
>>>>>> after
>>>>>> a restart.  I have created a prototype in Python based on the
>>>>>> rules
>>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>>> sufficiently large prime number.  This still carries a concern of
>>>>>> generating IDs across a cluster but is relatively easy to address.
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>>> _______________________________________________
>>>>>> sakai-dev mailing list
>>>>>> sakai-dev@...
>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>
>>>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>>>>  with a subject of "unsubscribe"
>>>>>>
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev@...
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>>> with a subject of "unsubscribe"
>>>>>
>>>>>
>>>>
>>>>
>>>> ----------------------------------------------------------------
>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>
>>>> _______________________________________________
>>>> sakai-dev mailing list
>>>> sakai-dev@...
>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>
>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>>  with a subject of "unsubscribe"
>>>>
>>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev@...
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>  with a subject of "unsubscribe"
>>>
>>>
>>
>> --
>> Aaron Zeckoski (azeckoski (at) vt.edu)
>> Senior Research Engineer - CARET - University of Cambridge
>> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>  with a subject of "unsubscribe"
>>
>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"
>

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"


_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Aaron Zeckoski :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I would say that the tinyurl service we are using at CARET is a good
starting point but that is just my opinion. It gives us 2 out of 3
things that we want and the 3rd (custom tiny url) is easy to add.
-AZ

On Wed, Jul 8, 2009 at 10:25 PM, <antranig@...> wrote:

> The issue with collisions occurs when you are hosting URLs which are
> not protected by the container, but held by some external resource -
> for example, a signup system "click here to validate your email to
> Xxxxx service". The implementation I made actually runs in two modes,
> you may request either "SECURE" or "SHORT" URLs by an option to the
> API. The short URLs start at only a couple of characters and
> are indeed sequential.
> Anyway, feel free to implement your own system, it is the Sakai way
> after all :)
>
> Cheers,
> Antranig.
>
> Quoting Steve Swinsburg <s.swinsburg@...>:
>
>> The security is handled in the container anyway so you get a shortened
>> URL and it resolves into the full URL, redirects and Sakai takes over.
>> Even if it was sequential it wouldn't be a security concern.
>>
>> With a keyspace of lowercase letters and numbers(36 chars) there are
>> about 2 billion combinations, with uppercase as well there's 50
>> billion. Info here on the theoretical limits:
>> http://bit.ly/3QG2uj
>>
>> Using RandomStringUtils from commons-lang, I wrote a short method to
>> iterate over a million alphanumeric combinations using letters and
>> numbers of length 6 chars and didn't run into a single collision.
>> Using this method and a transactional check when adding into the
>> database is probably the way to go.
>>
>> We can also filter for bad words, BBC has an interesting list (thanks
>> Seth! :)
>>
>> WDYT?
>>
>> Steve
>>
>>
>> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>>
>> > Is there a reason why the id generation could be more like tinyURL and
>> > simply be sequential and thus very short (at least initially and
>> > probably for most institutions it would stay small).
>> >
>> > Just a thought.
>> > -AZ
>> >
>> >
>> > On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
>> > wrote:
>> >> You could consider using 22 character uuids as autogenerated tinies
>> >> in
>> >> the url. It might not be able to be called tinyurl anymore, but it
>> >> would at least be short, linkable and (much more) collision free.
>> >>
>> >> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>> >>
>> >> I'm not convinced it would be necessary to keep these id's from being
>> >> unpredictable. The current security model should prevent anyone from
>> >> accessing content they don't have permission to even guessing id's. I
>> >> think eventually when a UI was designed for this it might even want
>> >> to
>> >> allow the user to specify a custom ID, similar to how the Site
>> >> Aliases
>> >> (on new site creation) works, allowing a user to specify anything
>> >> they
>> >> want for their site alias. (Like tinyurl.com) So this would almost
>> >> guarantee collisions.
>> >>
>> >> So I wouldn't think that the possibility of a collision on an low use
>> >> feature for an indexed column (when collisions will probably be
>> >> featured in eventually anyway) likely won't be a big deal.
>> >>
>> >> -Matthew
>> >>
>> >> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>> >>> Please take a look at the comments at the head of the
>> >>> "EighteenIDGenerator"
>> >>> file for some details of the maths involved in computing how many
>> >>> documents will arise before a collision:
>> >>>
>>
> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>> >>>
>> >>> I wouldn't really recommend using a key space smaller than this one
>> >>> if you want to avoid collisions - although of course you will
>> >>> reprobe
>> >>> to be sure. Another issue with LCGs is that they are highly
>> >>> predictable -
>> >>> this is why I use a SecureRandom here: One of the requirements of
>> >>> the
>> >>> TinyURLService that I made for the work at CARET was that the ID
>> >>> sequence could not be predicted - to prevent the possibility of
>> >>> malicious
>> >>> "hijacking" of upcoming URL slots (this was actually a system
>> >>> designed
>> >>> to allocate small URLs that were plaintext signup URLs routed to a
>> >>> PHP
>> >>> system, perhaps more risk than is present in a typical application!)
>> >>>
>> >>> Note that a further issue you have with an LCG is running it in a
>> >>> cluster -
>> >>> even if you can lock on the particular instance you have, you can't
>> >>> guarantee that some other instance in the cluster has not stepped on
>> >>> past the slot you want. I think that the approach in my service
>> >>> (a SecureRandom plus a transactional check into persistence) is most
>> >>> likely to work reasonably well in a cluster - although it hasn't
>> >>> been
>> >>> tested there :P
>> >>>
>> >>> Cheers,
>> >>> Antranig.
>> >>>
>> >>>
>> >>> Quoting Steve Swinsburg <s.swinsburg@...>:
>> >>>
>> >>>> Carl,
>> >>>>
>> >>>> Sounds good, is there an existing LCG Java implementation we can
>> >>>> leverage or should we roll our own?
>> >>>>
>> >>>> cheers,
>> >>>> Steve
>> >>>>
>> >>>>
>> >>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>> >>>>
>> >>>>> Summary: There is an ever increasing probability that the
>> >>>>> collision of
>> >>>>> IDs in the TinyURL service will slow down the service
>> >>>>> considerably.
>> >>>>>
>> >>>>> In looking at the implementation of the TinyURL service, I've
>> >>>>> noticed
>> >>>>> something about the way URLs are generated.  If I'm headed in the
>> >>>>> wrong direction with this, please correct me.
>> >>>>>
>> >>>>> The very general approach is:
>> >>>>> 1) generate a random ID
>> >>>>> 2) check that the ID doesn't already exist
>> >>>>> 3) if exists, go to 1.  if not, record the ID and release it to
>> >>>>> the
>> >>>>> caller.
>> >>>>>
>> >>>>> The following code for generating a URL:
>> >>>>>
>> >>>>> Random generator = new Random();
>> >>>>> int seed = Math.abs(generator.nextInt());
>> >>>>>
>> >>>>> has an issue with collision which is addressed by checking for the
>> >>>>> existence of that ID before handing it out but the algorithm run
>> >>>>> time
>> >>>>> will continue to grow because of the growing chance of collision
>> >>>>> over
>> >>>>> time.
>> >>>>>
>> >>>>> Explanation of concern:
>> >>>>> The first generated ID will have a 0/n chance of collision where
>> >>>>> n is
>> >>>>> the total number of possible IDs.  The next ID has a 1/n chance,
>> >>>>> the
>> >>>>> next 2/n, etc up to the last having a chance of collision being
>> >>>>> n-1/n.
>> >>>>> Considering n needs to be sufficiently large to have a low to zero
>> >>>>> rate of duplication over time, this check will take more time
>> >>>>> because
>> >>>>> of the number of collisions over time.  Assuming that the ID
>> >>>>> column in
>> >>>>> the database is indexed, the lookup time is minimized but there's
>> >>>>> still a call to the database lookup on each generated ID.  As the
>> >>>>> number of IDs generated reaches the max possible, there are
>> >>>>> possibly
>> >>>>> n-1 lookups before a valid ID is found.
>> >>>>>
>> >>>>> The current code recreates the Random each time which stops
>> >>>>> nextInt()
>> >>>>> from knowing history (ie. new random seed on each generation).
>> >>>>> Java
>> >>>>> implements a linear congruential generator to generate random
>> >>>>> numbers
>> >>>>> which requires knowledge of the previously generated ID to be
>> >>>>> effective.  Math.random() creates the random seed once and
>> >>>>> reuses it
>> >>>>> for the life of the JVM process which is a better approach.
>> >>>>>
>> >>>>> I would suggest creating an LCG in the TinyURL service so that the
>> >>>>> proper range can be easily managed (better performance due to a
>> >>>>> narrower scope than Math.random()) and the seed can be restored
>> >>>>> after
>> >>>>> a restart.  I have created a prototype in Python based on the
>> >>>>> rules
>> >>>>> laid out in the Wikipedia article on LCG's [1] and using a
>> >>>>> sufficiently large prime number.  This still carries a concern of
>> >>>>> generating IDs across a cluster but is relatively easy to address.
>> >>>>>
>> >>>>> WDYT?
>> >>>>>
>> >>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>> >>>>> _______________________________________________
>> >>>>> sakai-dev mailing list
>> >>>>> sakai-dev@...
>> >>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>> >>>>>
>> >>>>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@...
>> >>>>>  with a subject of "unsubscribe"
>> >>>>
>> >>>> _______________________________________________
>> >>>> sakai-dev mailing list
>> >>>> sakai-dev@...
>> >>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>> >>>>
>> >>>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@...
>> >>>> with a subject of "unsubscribe"
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> ----------------------------------------------------------------
>> >>> This message was sent using IMP, the Internet Messaging Program.
>> >>>
>> >>> _______________________________________________
>> >>> sakai-dev mailing list
>> >>> sakai-dev@...
>> >>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>> >>>
>> >>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@...
>> >>>  with a subject of "unsubscribe"
>> >>>
>> >> _______________________________________________
>> >> sakai-dev mailing list
>> >> sakai-dev@...
>> >> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>> >>
>> >> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@...
>> >>  with a subject of "unsubscribe"
>> >>
>> >
>> >
>> >
>> > --
>> > Aaron Zeckoski (azeckoski (at) vt.edu)
>> > Senior Research Engineer - CARET - University of Cambridge
>> > https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>> > http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>> > _______________________________________________
>> > sakai-dev mailing list
>> > sakai-dev@...
>> > http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>> >
>> > TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>> >  with a subject of "unsubscribe"
>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>> with a subject of "unsubscribe"
>>
>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>
>



--
Aaron Zeckoski (azeckoski (at) vt.edu)
Senior Research Engineer - CARET - University of Cambridge
https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Steve Swinsburg-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ok I've added an alternative method to return a longer random id which  
makes it less guessable. The number of permutations with this combo  
has 30 zeroes (on a-z and 0-9).

Have some fun with the calculator! http://bit.ly/gJP1X

cheers,
Steve



On 8 Jul 2009, at 17:25, antranig@... wrote:

> The issue with collisions occurs when you are hosting URLs which are
> not protected by the container, but held by some external resource -
> for example, a signup system "click here to validate your email to
> Xxxxx service". The implementation I made actually runs in two modes,
> you may request either "SECURE" or "SHORT" URLs by an option to the
> API. The short URLs start at only a couple of characters and
> are indeed sequential.
> Anyway, feel free to implement your own system, it is the Sakai way
> after all :)
>
> Cheers,
> Antranig.
>
> Quoting Steve Swinsburg <s.swinsburg@...>:
>
>> The security is handled in the container anyway so you get a  
>> shortened
>> URL and it resolves into the full URL, redirects and Sakai takes  
>> over.
>> Even if it was sequential it wouldn't be a security concern.
>>
>> With a keyspace of lowercase letters and numbers(36 chars) there are
>> about 2 billion combinations, with uppercase as well there's 50
>> billion. Info here on the theoretical limits:
>> http://bit.ly/3QG2uj
>>
>> Using RandomStringUtils from commons-lang, I wrote a short method to
>> iterate over a million alphanumeric combinations using letters and
>> numbers of length 6 chars and didn't run into a single collision.
>> Using this method and a transactional check when adding into the
>> database is probably the way to go.
>>
>> We can also filter for bad words, BBC has an interesting list (thanks
>> Seth! :)
>>
>> WDYT?
>>
>> Steve
>>
>>
>> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>>
>>> Is there a reason why the id generation could be more like tinyURL  
>>> and
>>> simply be sequential and thus very short (at least initially and
>>> probably for most institutions it would stay small).
>>>
>>> Just a thought.
>>> -AZ
>>>
>>>
>>> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
>>> wrote:
>>>> You could consider using 22 character uuids as autogenerated tinies
>>>> in
>>>> the url. It might not be able to be called tinyurl anymore, but it
>>>> would at least be short, linkable and (much more) collision free.
>>>>
>>>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>>>
>>>> I'm not convinced it would be necessary to keep these id's from  
>>>> being
>>>> unpredictable. The current security model should prevent anyone  
>>>> from
>>>> accessing content they don't have permission to even guessing  
>>>> id's. I
>>>> think eventually when a UI was designed for this it might even want
>>>> to
>>>> allow the user to specify a custom ID, similar to how the Site
>>>> Aliases
>>>> (on new site creation) works, allowing a user to specify anything
>>>> they
>>>> want for their site alias. (Like tinyurl.com) So this would almost
>>>> guarantee collisions.
>>>>
>>>> So I wouldn't think that the possibility of a collision on an low  
>>>> use
>>>> feature for an indexed column (when collisions will probably be
>>>> featured in eventually anyway) likely won't be a big deal.
>>>>
>>>> -Matthew
>>>>
>>>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>>>>> Please take a look at the comments at the head of the
>>>>> "EighteenIDGenerator"
>>>>> file for some details of the maths involved in computing how many
>>>>> documents will arise before a collision:
>>>>>
>>
> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>>>
>>>>> I wouldn't really recommend using a key space smaller than this  
>>>>> one
>>>>> if you want to avoid collisions - although of course you will
>>>>> reprobe
>>>>> to be sure. Another issue with LCGs is that they are highly
>>>>> predictable -
>>>>> this is why I use a SecureRandom here: One of the requirements of
>>>>> the
>>>>> TinyURLService that I made for the work at CARET was that the ID
>>>>> sequence could not be predicted - to prevent the possibility of
>>>>> malicious
>>>>> "hijacking" of upcoming URL slots (this was actually a system
>>>>> designed
>>>>> to allocate small URLs that were plaintext signup URLs routed to a
>>>>> PHP
>>>>> system, perhaps more risk than is present in a typical  
>>>>> application!)
>>>>>
>>>>> Note that a further issue you have with an LCG is running it in a
>>>>> cluster -
>>>>> even if you can lock on the particular instance you have, you  
>>>>> can't
>>>>> guarantee that some other instance in the cluster has not  
>>>>> stepped on
>>>>> past the slot you want. I think that the approach in my service
>>>>> (a SecureRandom plus a transactional check into persistence) is  
>>>>> most
>>>>> likely to work reasonably well in a cluster - although it hasn't
>>>>> been
>>>>> tested there :P
>>>>>
>>>>> Cheers,
>>>>> Antranig.
>>>>>
>>>>>
>>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>>
>>>>>> Carl,
>>>>>>
>>>>>> Sounds good, is there an existing LCG Java implementation we can
>>>>>> leverage or should we roll our own?
>>>>>>
>>>>>> cheers,
>>>>>> Steve
>>>>>>
>>>>>>
>>>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>>>
>>>>>>> Summary: There is an ever increasing probability that the
>>>>>>> collision of
>>>>>>> IDs in the TinyURL service will slow down the service
>>>>>>> considerably.
>>>>>>>
>>>>>>> In looking at the implementation of the TinyURL service, I've
>>>>>>> noticed
>>>>>>> something about the way URLs are generated.  If I'm headed in  
>>>>>>> the
>>>>>>> wrong direction with this, please correct me.
>>>>>>>
>>>>>>> The very general approach is:
>>>>>>> 1) generate a random ID
>>>>>>> 2) check that the ID doesn't already exist
>>>>>>> 3) if exists, go to 1.  if not, record the ID and release it to
>>>>>>> the
>>>>>>> caller.
>>>>>>>
>>>>>>> The following code for generating a URL:
>>>>>>>
>>>>>>> Random generator = new Random();
>>>>>>> int seed = Math.abs(generator.nextInt());
>>>>>>>
>>>>>>> has an issue with collision which is addressed by checking for  
>>>>>>> the
>>>>>>> existence of that ID before handing it out but the algorithm run
>>>>>>> time
>>>>>>> will continue to grow because of the growing chance of collision
>>>>>>> over
>>>>>>> time.
>>>>>>>
>>>>>>> Explanation of concern:
>>>>>>> The first generated ID will have a 0/n chance of collision where
>>>>>>> n is
>>>>>>> the total number of possible IDs.  The next ID has a 1/n chance,
>>>>>>> the
>>>>>>> next 2/n, etc up to the last having a chance of collision being
>>>>>>> n-1/n.
>>>>>>> Considering n needs to be sufficiently large to have a low to  
>>>>>>> zero
>>>>>>> rate of duplication over time, this check will take more time
>>>>>>> because
>>>>>>> of the number of collisions over time.  Assuming that the ID
>>>>>>> column in
>>>>>>> the database is indexed, the lookup time is minimized but  
>>>>>>> there's
>>>>>>> still a call to the database lookup on each generated ID.  As  
>>>>>>> the
>>>>>>> number of IDs generated reaches the max possible, there are
>>>>>>> possibly
>>>>>>> n-1 lookups before a valid ID is found.
>>>>>>>
>>>>>>> The current code recreates the Random each time which stops
>>>>>>> nextInt()
>>>>>>> from knowing history (ie. new random seed on each generation).
>>>>>>> Java
>>>>>>> implements a linear congruential generator to generate random
>>>>>>> numbers
>>>>>>> which requires knowledge of the previously generated ID to be
>>>>>>> effective.  Math.random() creates the random seed once and
>>>>>>> reuses it
>>>>>>> for the life of the JVM process which is a better approach.
>>>>>>>
>>>>>>> I would suggest creating an LCG in the TinyURL service so that  
>>>>>>> the
>>>>>>> proper range can be easily managed (better performance due to a
>>>>>>> narrower scope than Math.random()) and the seed can be restored
>>>>>>> after
>>>>>>> a restart.  I have created a prototype in Python based on the
>>>>>>> rules
>>>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>>>> sufficiently large prime number.  This still carries a concern  
>>>>>>> of
>>>>>>> generating IDs across a cluster but is relatively easy to  
>>>>>>> address.
>>>>>>>
>>>>>>> WDYT?
>>>>>>>
>>>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>>>> _______________________________________________
>>>>>>> sakai-dev mailing list
>>>>>>> sakai-dev@...
>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>
>>>>>>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@...
>>>>>>> with a subject of "unsubscribe"
>>>>>>
>>>>>> _______________________________________________
>>>>>> sakai-dev mailing list
>>>>>> sakai-dev@...
>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>
>>>>>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@...
>>>>>> with a subject of "unsubscribe"
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>>
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev@...
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@...
>>>>> with a subject of "unsubscribe"
>>>>>
>>>> _______________________________________________
>>>> sakai-dev mailing list
>>>> sakai-dev@...
>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>
>>>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@...
>>>> with a subject of "unsubscribe"
>>>>
>>>
>>>
>>>
>>> --
>>> Aaron Zeckoski (azeckoski (at) vt.edu)
>>> Senior Research Engineer - CARET - University of Cambridge
>>> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>>> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev@...
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>> with a subject of "unsubscribe"
>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>> with a subject of "unsubscribe"
>>
>
>
>
>
> ----------------------------------------------------------------
> This message was sent using IMP, the Internet Messaging Program.
>

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Jon Gorrono :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I have been a little curious why timeboxing is not used to decrease
collisions for this service.... we use the service in a custom guest
access app we made to put registration urls etc in emails....
obviously we want to timebox access to the urls but a ttl can be
xtremely effective in reducing an otherwise skewed data set to
something resembling white noise... as it give a small window view of
a larger distribution

Maybe an api call could change the ttl and return the probability of
collision during that time or some other silliness



On Wed, Jul 8, 2009 at 2:52 PM, Steve
Swinsburg<s.swinsburg@...> wrote:

> Ok I've added an alternative method to return a longer random id which
> makes it less guessable. The number of permutations with this combo
> has 30 zeroes (on a-z and 0-9).
>
> Have some fun with the calculator! http://bit.ly/gJP1X
>
> cheers,
> Steve
>
>
>
> On 8 Jul 2009, at 17:25, antranig@... wrote:
>
>> The issue with collisions occurs when you are hosting URLs which are
>> not protected by the container, but held by some external resource -
>> for example, a signup system "click here to validate your email to
>> Xxxxx service". The implementation I made actually runs in two modes,
>> you may request either "SECURE" or "SHORT" URLs by an option to the
>> API. The short URLs start at only a couple of characters and
>> are indeed sequential.
>> Anyway, feel free to implement your own system, it is the Sakai way
>> after all :)
>>
>> Cheers,
>> Antranig.
>>
>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>
>>> The security is handled in the container anyway so you get a
>>> shortened
>>> URL and it resolves into the full URL, redirects and Sakai takes
>>> over.
>>> Even if it was sequential it wouldn't be a security concern.
>>>
>>> With a keyspace of lowercase letters and numbers(36 chars) there are
>>> about 2 billion combinations, with uppercase as well there's 50
>>> billion. Info here on the theoretical limits:
>>> http://bit.ly/3QG2uj
>>>
>>> Using RandomStringUtils from commons-lang, I wrote a short method to
>>> iterate over a million alphanumeric combinations using letters and
>>> numbers of length 6 chars and didn't run into a single collision.
>>> Using this method and a transactional check when adding into the
>>> database is probably the way to go.
>>>
>>> We can also filter for bad words, BBC has an interesting list (thanks
>>> Seth! :)
>>>
>>> WDYT?
>>>
>>> Steve
>>>
>>>
>>> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>>>
>>>> Is there a reason why the id generation could be more like tinyURL
>>>> and
>>>> simply be sequential and thus very short (at least initially and
>>>> probably for most institutions it would stay small).
>>>>
>>>> Just a thought.
>>>> -AZ
>>>>
>>>>
>>>> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
>>>> wrote:
>>>>> You could consider using 22 character uuids as autogenerated tinies
>>>>> in
>>>>> the url. It might not be able to be called tinyurl anymore, but it
>>>>> would at least be short, linkable and (much more) collision free.
>>>>>
>>>>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>>>>
>>>>> I'm not convinced it would be necessary to keep these id's from
>>>>> being
>>>>> unpredictable. The current security model should prevent anyone
>>>>> from
>>>>> accessing content they don't have permission to even guessing
>>>>> id's. I
>>>>> think eventually when a UI was designed for this it might even want
>>>>> to
>>>>> allow the user to specify a custom ID, similar to how the Site
>>>>> Aliases
>>>>> (on new site creation) works, allowing a user to specify anything
>>>>> they
>>>>> want for their site alias. (Like tinyurl.com) So this would almost
>>>>> guarantee collisions.
>>>>>
>>>>> So I wouldn't think that the possibility of a collision on an low
>>>>> use
>>>>> feature for an indexed column (when collisions will probably be
>>>>> featured in eventually anyway) likely won't be a big deal.
>>>>>
>>>>> -Matthew
>>>>>
>>>>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>>>>>> Please take a look at the comments at the head of the
>>>>>> "EighteenIDGenerator"
>>>>>> file for some details of the maths involved in computing how many
>>>>>> documents will arise before a collision:
>>>>>>
>>>
>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>>>>
>>>>>> I wouldn't really recommend using a key space smaller than this
>>>>>> one
>>>>>> if you want to avoid collisions - although of course you will
>>>>>> reprobe
>>>>>> to be sure. Another issue with LCGs is that they are highly
>>>>>> predictable -
>>>>>> this is why I use a SecureRandom here: One of the requirements of
>>>>>> the
>>>>>> TinyURLService that I made for the work at CARET was that the ID
>>>>>> sequence could not be predicted - to prevent the possibility of
>>>>>> malicious
>>>>>> "hijacking" of upcoming URL slots (this was actually a system
>>>>>> designed
>>>>>> to allocate small URLs that were plaintext signup URLs routed to a
>>>>>> PHP
>>>>>> system, perhaps more risk than is present in a typical
>>>>>> application!)
>>>>>>
>>>>>> Note that a further issue you have with an LCG is running it in a
>>>>>> cluster -
>>>>>> even if you can lock on the particular instance you have, you
>>>>>> can't
>>>>>> guarantee that some other instance in the cluster has not
>>>>>> stepped on
>>>>>> past the slot you want. I think that the approach in my service
>>>>>> (a SecureRandom plus a transactional check into persistence) is
>>>>>> most
>>>>>> likely to work reasonably well in a cluster - although it hasn't
>>>>>> been
>>>>>> tested there :P
>>>>>>
>>>>>> Cheers,
>>>>>> Antranig.
>>>>>>
>>>>>>
>>>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>>>
>>>>>>> Carl,
>>>>>>>
>>>>>>> Sounds good, is there an existing LCG Java implementation we can
>>>>>>> leverage or should we roll our own?
>>>>>>>
>>>>>>> cheers,
>>>>>>> Steve
>>>>>>>
>>>>>>>
>>>>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>>>>
>>>>>>>> Summary: There is an ever increasing probability that the
>>>>>>>> collision of
>>>>>>>> IDs in the TinyURL service will slow down the service
>>>>>>>> considerably.
>>>>>>>>
>>>>>>>> In looking at the implementation of the TinyURL service, I've
>>>>>>>> noticed
>>>>>>>> something about the way URLs are generated.  If I'm headed in
>>>>>>>> the
>>>>>>>> wrong direction with this, please correct me.
>>>>>>>>
>>>>>>>> The very general approach is:
>>>>>>>> 1) generate a random ID
>>>>>>>> 2) check that the ID doesn't already exist
>>>>>>>> 3) if exists, go to 1.  if not, record the ID and release it to
>>>>>>>> the
>>>>>>>> caller.
>>>>>>>>
>>>>>>>> The following code for generating a URL:
>>>>>>>>
>>>>>>>> Random generator = new Random();
>>>>>>>> int seed = Math.abs(generator.nextInt());
>>>>>>>>
>>>>>>>> has an issue with collision which is addressed by checking for
>>>>>>>> the
>>>>>>>> existence of that ID before handing it out but the algorithm run
>>>>>>>> time
>>>>>>>> will continue to grow because of the growing chance of collision
>>>>>>>> over
>>>>>>>> time.
>>>>>>>>
>>>>>>>> Explanation of concern:
>>>>>>>> The first generated ID will have a 0/n chance of collision where
>>>>>>>> n is
>>>>>>>> the total number of possible IDs.  The next ID has a 1/n chance,
>>>>>>>> the
>>>>>>>> next 2/n, etc up to the last having a chance of collision being
>>>>>>>> n-1/n.
>>>>>>>> Considering n needs to be sufficiently large to have a low to
>>>>>>>> zero
>>>>>>>> rate of duplication over time, this check will take more time
>>>>>>>> because
>>>>>>>> of the number of collisions over time.  Assuming that the ID
>>>>>>>> column in
>>>>>>>> the database is indexed, the lookup time is minimized but
>>>>>>>> there's
>>>>>>>> still a call to the database lookup on each generated ID.  As
>>>>>>>> the
>>>>>>>> number of IDs generated reaches the max possible, there are
>>>>>>>> possibly
>>>>>>>> n-1 lookups before a valid ID is found.
>>>>>>>>
>>>>>>>> The current code recreates the Random each time which stops
>>>>>>>> nextInt()
>>>>>>>> from knowing history (ie. new random seed on each generation).
>>>>>>>> Java
>>>>>>>> implements a linear congruential generator to generate random
>>>>>>>> numbers
>>>>>>>> which requires knowledge of the previously generated ID to be
>>>>>>>> effective.  Math.random() creates the random seed once and
>>>>>>>> reuses it
>>>>>>>> for the life of the JVM process which is a better approach.
>>>>>>>>
>>>>>>>> I would suggest creating an LCG in the TinyURL service so that
>>>>>>>> the
>>>>>>>> proper range can be easily managed (better performance due to a
>>>>>>>> narrower scope than Math.random()) and the seed can be restored
>>>>>>>> after
>>>>>>>> a restart.  I have created a prototype in Python based on the
>>>>>>>> rules
>>>>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>>>>> sufficiently large prime number.  This still carries a concern
>>>>>>>> of
>>>>>>>> generating IDs across a cluster but is relatively easy to
>>>>>>>> address.
>>>>>>>>
>>>>>>>> WDYT?
>>>>>>>>
>>>>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>>>>> _______________________________________________
>>>>>>>> sakai-dev mailing list
>>>>>>>> sakai-dev@...
>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>
>>>>>>>> TO UNSUBSCRIBE: send email to
>>> sakai-dev-unsubscribe@...
>>>>>>>> with a subject of "unsubscribe"
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> sakai-dev mailing list
>>>>>>> sakai-dev@...
>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>
>>>>>>> TO UNSUBSCRIBE: send email to
>>> sakai-dev-unsubscribe@...
>>>>>>> with a subject of "unsubscribe"
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ----------------------------------------------------------------
>>>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>>>
>>>>>> _______________________________________________
>>>>>> sakai-dev mailing list
>>>>>> sakai-dev@...
>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>
>>>>>> TO UNSUBSCRIBE: send email to
>>> sakai-dev-unsubscribe@...
>>>>>> with a subject of "unsubscribe"
>>>>>>
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev@...
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to
>>> sakai-dev-unsubscribe@...
>>>>> with a subject of "unsubscribe"
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Aaron Zeckoski (azeckoski (at) vt.edu)
>>>> Senior Research Engineer - CARET - University of Cambridge
>>>> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>>>> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>>>> _______________________________________________
>>>> sakai-dev mailing list
>>>> sakai-dev@...
>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>
>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>> with a subject of "unsubscribe"
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev@...
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>> with a subject of "unsubscribe"
>>>
>>
>>
>>
>>
>> ----------------------------------------------------------------
>> This message was sent using IMP, the Internet Messaging Program.
>>
>
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"
>



--
Jon Gorrono
PGP Key: 5434509D
email{>+++++++++[>+++++++++++>++++++++++++>+++++++>+++++<<<<-]>+++++++.>++++.<---.>-.+++..---.-.+.>+.<++++++.<----.+.---.>+.<++++++++.>---.>>+.<<<----.-.>++.}
http{ats.ucdavis.edu}
_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Steve Swinsburg-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Good idea, I'll add that as an additional method where you specify the  
TTL before the URL is no longer resolved.


cheers,
Steve

---
Steve Swinsburg
Portal Systems Developer
Centre for e-Science
Lancaster University
Lancaster
LA1 4YT

email: s.swinsburg@...
phone: +44 (0) 1524 594870







On 15 Jul 2009, at 04:27, Jon Gorrono wrote:

> I have been a little curious why timeboxing is not used to decrease
> collisions for this service.... we use the service in a custom guest
> access app we made to put registration urls etc in emails....
> obviously we want to timebox access to the urls but a ttl can be
> xtremely effective in reducing an otherwise skewed data set to
> something resembling white noise... as it give a small window view of
> a larger distribution
>
> Maybe an api call could change the ttl and return the probability of
> collision during that time or some other silliness
>
>
>
> On Wed, Jul 8, 2009 at 2:52 PM, Steve
> Swinsburg<s.swinsburg@...> wrote:
>> Ok I've added an alternative method to return a longer random id  
>> which
>> makes it less guessable. The number of permutations with this combo
>> has 30 zeroes (on a-z and 0-9).
>>
>> Have some fun with the calculator! http://bit.ly/gJP1X
>>
>> cheers,
>> Steve
>>
>>
>>
>> On 8 Jul 2009, at 17:25, antranig@... wrote:
>>
>>> The issue with collisions occurs when you are hosting URLs which are
>>> not protected by the container, but held by some external resource -
>>> for example, a signup system "click here to validate your email to
>>> Xxxxx service". The implementation I made actually runs in two  
>>> modes,
>>> you may request either "SECURE" or "SHORT" URLs by an option to the
>>> API. The short URLs start at only a couple of characters and
>>> are indeed sequential.
>>> Anyway, feel free to implement your own system, it is the Sakai way
>>> after all :)
>>>
>>> Cheers,
>>> Antranig.
>>>
>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>
>>>> The security is handled in the container anyway so you get a
>>>> shortened
>>>> URL and it resolves into the full URL, redirects and Sakai takes
>>>> over.
>>>> Even if it was sequential it wouldn't be a security concern.
>>>>
>>>> With a keyspace of lowercase letters and numbers(36 chars) there  
>>>> are
>>>> about 2 billion combinations, with uppercase as well there's 50
>>>> billion. Info here on the theoretical limits:
>>>> http://bit.ly/3QG2uj
>>>>
>>>> Using RandomStringUtils from commons-lang, I wrote a short method  
>>>> to
>>>> iterate over a million alphanumeric combinations using letters and
>>>> numbers of length 6 chars and didn't run into a single collision.
>>>> Using this method and a transactional check when adding into the
>>>> database is probably the way to go.
>>>>
>>>> We can also filter for bad words, BBC has an interesting list  
>>>> (thanks
>>>> Seth! :)
>>>>
>>>> WDYT?
>>>>
>>>> Steve
>>>>
>>>>
>>>> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>>>>
>>>>> Is there a reason why the id generation could be more like tinyURL
>>>>> and
>>>>> simply be sequential and thus very short (at least initially and
>>>>> probably for most institutions it would stay small).
>>>>>
>>>>> Just a thought.
>>>>> -AZ
>>>>>
>>>>>
>>>>> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
>>>>> wrote:
>>>>>> You could consider using 22 character uuids as autogenerated  
>>>>>> tinies
>>>>>> in
>>>>>> the url. It might not be able to be called tinyurl anymore, but  
>>>>>> it
>>>>>> would at least be short, linkable and (much more) collision free.
>>>>>>
>>>>>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>>>>>
>>>>>> I'm not convinced it would be necessary to keep these id's from
>>>>>> being
>>>>>> unpredictable. The current security model should prevent anyone
>>>>>> from
>>>>>> accessing content they don't have permission to even guessing
>>>>>> id's. I
>>>>>> think eventually when a UI was designed for this it might even  
>>>>>> want
>>>>>> to
>>>>>> allow the user to specify a custom ID, similar to how the Site
>>>>>> Aliases
>>>>>> (on new site creation) works, allowing a user to specify anything
>>>>>> they
>>>>>> want for their site alias. (Like tinyurl.com) So this would  
>>>>>> almost
>>>>>> guarantee collisions.
>>>>>>
>>>>>> So I wouldn't think that the possibility of a collision on an low
>>>>>> use
>>>>>> feature for an indexed column (when collisions will probably be
>>>>>> featured in eventually anyway) likely won't be a big deal.
>>>>>>
>>>>>> -Matthew
>>>>>>
>>>>>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>>>>>>> Please take a look at the comments at the head of the
>>>>>>> "EighteenIDGenerator"
>>>>>>> file for some details of the maths involved in computing how  
>>>>>>> many
>>>>>>> documents will arise before a collision:
>>>>>>>
>>>>
>>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>>>>>
>>>>>>> I wouldn't really recommend using a key space smaller than this
>>>>>>> one
>>>>>>> if you want to avoid collisions - although of course you will
>>>>>>> reprobe
>>>>>>> to be sure. Another issue with LCGs is that they are highly
>>>>>>> predictable -
>>>>>>> this is why I use a SecureRandom here: One of the requirements  
>>>>>>> of
>>>>>>> the
>>>>>>> TinyURLService that I made for the work at CARET was that the ID
>>>>>>> sequence could not be predicted - to prevent the possibility of
>>>>>>> malicious
>>>>>>> "hijacking" of upcoming URL slots (this was actually a system
>>>>>>> designed
>>>>>>> to allocate small URLs that were plaintext signup URLs routed  
>>>>>>> to a
>>>>>>> PHP
>>>>>>> system, perhaps more risk than is present in a typical
>>>>>>> application!)
>>>>>>>
>>>>>>> Note that a further issue you have with an LCG is running it  
>>>>>>> in a
>>>>>>> cluster -
>>>>>>> even if you can lock on the particular instance you have, you
>>>>>>> can't
>>>>>>> guarantee that some other instance in the cluster has not
>>>>>>> stepped on
>>>>>>> past the slot you want. I think that the approach in my service
>>>>>>> (a SecureRandom plus a transactional check into persistence) is
>>>>>>> most
>>>>>>> likely to work reasonably well in a cluster - although it hasn't
>>>>>>> been
>>>>>>> tested there :P
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Antranig.
>>>>>>>
>>>>>>>
>>>>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>>>>
>>>>>>>> Carl,
>>>>>>>>
>>>>>>>> Sounds good, is there an existing LCG Java implementation we  
>>>>>>>> can
>>>>>>>> leverage or should we roll our own?
>>>>>>>>
>>>>>>>> cheers,
>>>>>>>> Steve
>>>>>>>>
>>>>>>>>
>>>>>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>>>>>
>>>>>>>>> Summary: There is an ever increasing probability that the
>>>>>>>>> collision of
>>>>>>>>> IDs in the TinyURL service will slow down the service
>>>>>>>>> considerably.
>>>>>>>>>
>>>>>>>>> In looking at the implementation of the TinyURL service, I've
>>>>>>>>> noticed
>>>>>>>>> something about the way URLs are generated.  If I'm headed in
>>>>>>>>> the
>>>>>>>>> wrong direction with this, please correct me.
>>>>>>>>>
>>>>>>>>> The very general approach is:
>>>>>>>>> 1) generate a random ID
>>>>>>>>> 2) check that the ID doesn't already exist
>>>>>>>>> 3) if exists, go to 1.  if not, record the ID and release it  
>>>>>>>>> to
>>>>>>>>> the
>>>>>>>>> caller.
>>>>>>>>>
>>>>>>>>> The following code for generating a URL:
>>>>>>>>>
>>>>>>>>> Random generator = new Random();
>>>>>>>>> int seed = Math.abs(generator.nextInt());
>>>>>>>>>
>>>>>>>>> has an issue with collision which is addressed by checking for
>>>>>>>>> the
>>>>>>>>> existence of that ID before handing it out but the algorithm  
>>>>>>>>> run
>>>>>>>>> time
>>>>>>>>> will continue to grow because of the growing chance of  
>>>>>>>>> collision
>>>>>>>>> over
>>>>>>>>> time.
>>>>>>>>>
>>>>>>>>> Explanation of concern:
>>>>>>>>> The first generated ID will have a 0/n chance of collision  
>>>>>>>>> where
>>>>>>>>> n is
>>>>>>>>> the total number of possible IDs.  The next ID has a 1/n  
>>>>>>>>> chance,
>>>>>>>>> the
>>>>>>>>> next 2/n, etc up to the last having a chance of collision  
>>>>>>>>> being
>>>>>>>>> n-1/n.
>>>>>>>>> Considering n needs to be sufficiently large to have a low to
>>>>>>>>> zero
>>>>>>>>> rate of duplication over time, this check will take more time
>>>>>>>>> because
>>>>>>>>> of the number of collisions over time.  Assuming that the ID
>>>>>>>>> column in
>>>>>>>>> the database is indexed, the lookup time is minimized but
>>>>>>>>> there's
>>>>>>>>> still a call to the database lookup on each generated ID.  As
>>>>>>>>> the
>>>>>>>>> number of IDs generated reaches the max possible, there are
>>>>>>>>> possibly
>>>>>>>>> n-1 lookups before a valid ID is found.
>>>>>>>>>
>>>>>>>>> The current code recreates the Random each time which stops
>>>>>>>>> nextInt()
>>>>>>>>> from knowing history (ie. new random seed on each generation).
>>>>>>>>> Java
>>>>>>>>> implements a linear congruential generator to generate random
>>>>>>>>> numbers
>>>>>>>>> which requires knowledge of the previously generated ID to be
>>>>>>>>> effective.  Math.random() creates the random seed once and
>>>>>>>>> reuses it
>>>>>>>>> for the life of the JVM process which is a better approach.
>>>>>>>>>
>>>>>>>>> I would suggest creating an LCG in the TinyURL service so that
>>>>>>>>> the
>>>>>>>>> proper range can be easily managed (better performance due  
>>>>>>>>> to a
>>>>>>>>> narrower scope than Math.random()) and the seed can be  
>>>>>>>>> restored
>>>>>>>>> after
>>>>>>>>> a restart.  I have created a prototype in Python based on the
>>>>>>>>> rules
>>>>>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>>>>>> sufficiently large prime number.  This still carries a concern
>>>>>>>>> of
>>>>>>>>> generating IDs across a cluster but is relatively easy to
>>>>>>>>> address.
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>>>>>> _______________________________________________
>>>>>>>>> sakai-dev mailing list
>>>>>>>>> sakai-dev@...
>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>>
>>>>>>>>> TO UNSUBSCRIBE: send email to
>>>> sakai-dev-unsubscribe@...
>>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> sakai-dev mailing list
>>>>>>>> sakai-dev@...
>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>
>>>>>>>> TO UNSUBSCRIBE: send email to
>>>> sakai-dev-unsubscribe@...
>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ----------------------------------------------------------------
>>>>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> sakai-dev mailing list
>>>>>>> sakai-dev@...
>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>
>>>>>>> TO UNSUBSCRIBE: send email to
>>>> sakai-dev-unsubscribe@...
>>>>>>> with a subject of "unsubscribe"
>>>>>>>
>>>>>> _______________________________________________
>>>>>> sakai-dev mailing list
>>>>>> sakai-dev@...
>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>
>>>>>> TO UNSUBSCRIBE: send email to
>>>> sakai-dev-unsubscribe@...
>>>>>> with a subject of "unsubscribe"
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Aaron Zeckoski (azeckoski (at) vt.edu)
>>>>> Senior Research Engineer - CARET - University of Cambridge
>>>>> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>>>>> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev@...
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>>> with a subject of "unsubscribe"
>>>>
>>>> _______________________________________________
>>>> sakai-dev mailing list
>>>> sakai-dev@...
>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>
>>>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>>> with a subject of "unsubscribe"
>>>>
>>>
>>>
>>>
>>>
>>> ----------------------------------------------------------------
>>> This message was sent using IMP, the Internet Messaging Program.
>>>
>>
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>>  with a subject of "unsubscribe"
>>
>
>
>
> --
> Jon Gorrono
> PGP Key: 5434509D
> email{>+++++++++[>+++++++++++>++++++++++++>+++++++>+++++<<<<-]>++++++
> +.>++++.<---.>-.+++..---.-.+.>+.<++++++.<----.+.---.>+.<+++++++
> +.>---.>>+.<<<----.-.>++.}
> http{ats.ucdavis.edu}
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>  with a subject of "unsubscribe"

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Jon Gorrono :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I'm sort of reeling over this.... are you sure?

 unaccustomed and unprepared for this kind of response...

:)



On Wed, Jul 15, 2009 at 2:51 AM, Steve
Swinsburg<s.swinsburg@...> wrote:

> Good idea, I'll add that as an additional method where you specify the TTL
> before the URL is no longer resolved.
>
>
> cheers,
> Steve
>
> ---
> Steve Swinsburg
> Portal Systems Developer
> Centre for e-Science
> Lancaster University
> Lancaster
> LA1 4YT
>
> email: s.swinsburg@...
> phone: +44 (0) 1524 594870
>
>
>
>
>
>
>
> On 15 Jul 2009, at 04:27, Jon Gorrono wrote:
>
>> I have been a little curious why timeboxing is not used to decrease
>> collisions for this service.... we use the service in a custom guest
>> access app we made to put registration urls etc in emails....
>> obviously we want to timebox access to the urls but a ttl can be
>> xtremely effective in reducing an otherwise skewed data set to
>> something resembling white noise... as it give a small window view of
>> a larger distribution
>>
>> Maybe an api call could change the ttl and return the probability of
>> collision during that time or some other silliness
>>
>>
>>
>> On Wed, Jul 8, 2009 at 2:52 PM, Steve
>> Swinsburg<s.swinsburg@...> wrote:
>>>
>>> Ok I've added an alternative method to return a longer random id which
>>> makes it less guessable. The number of permutations with this combo
>>> has 30 zeroes (on a-z and 0-9).
>>>
>>> Have some fun with the calculator! http://bit.ly/gJP1X
>>>
>>> cheers,
>>> Steve
>>>
>>>
>>>
>>> On 8 Jul 2009, at 17:25, antranig@... wrote:
>>>
>>>> The issue with collisions occurs when you are hosting URLs which are
>>>> not protected by the container, but held by some external resource -
>>>> for example, a signup system "click here to validate your email to
>>>> Xxxxx service". The implementation I made actually runs in two modes,
>>>> you may request either "SECURE" or "SHORT" URLs by an option to the
>>>> API. The short URLs start at only a couple of characters and
>>>> are indeed sequential.
>>>> Anyway, feel free to implement your own system, it is the Sakai way
>>>> after all :)
>>>>
>>>> Cheers,
>>>> Antranig.
>>>>
>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>
>>>>> The security is handled in the container anyway so you get a
>>>>> shortened
>>>>> URL and it resolves into the full URL, redirects and Sakai takes
>>>>> over.
>>>>> Even if it was sequential it wouldn't be a security concern.
>>>>>
>>>>> With a keyspace of lowercase letters and numbers(36 chars) there are
>>>>> about 2 billion combinations, with uppercase as well there's 50
>>>>> billion. Info here on the theoretical limits:
>>>>> http://bit.ly/3QG2uj
>>>>>
>>>>> Using RandomStringUtils from commons-lang, I wrote a short method to
>>>>> iterate over a million alphanumeric combinations using letters and
>>>>> numbers of length 6 chars and didn't run into a single collision.
>>>>> Using this method and a transactional check when adding into the
>>>>> database is probably the way to go.
>>>>>
>>>>> We can also filter for bad words, BBC has an interesting list (thanks
>>>>> Seth! :)
>>>>>
>>>>> WDYT?
>>>>>
>>>>> Steve
>>>>>
>>>>>
>>>>> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>>>>>
>>>>>> Is there a reason why the id generation could be more like tinyURL
>>>>>> and
>>>>>> simply be sequential and thus very short (at least initially and
>>>>>> probably for most institutions it would stay small).
>>>>>>
>>>>>> Just a thought.
>>>>>> -AZ
>>>>>>
>>>>>>
>>>>>> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
>>>>>> wrote:
>>>>>>>
>>>>>>> You could consider using 22 character uuids as autogenerated tinies
>>>>>>> in
>>>>>>> the url. It might not be able to be called tinyurl anymore, but it
>>>>>>> would at least be short, linkable and (much more) collision free.
>>>>>>>
>>>>>>>
>>>>>>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>>>>>>
>>>>>>> I'm not convinced it would be necessary to keep these id's from
>>>>>>> being
>>>>>>> unpredictable. The current security model should prevent anyone
>>>>>>> from
>>>>>>> accessing content they don't have permission to even guessing
>>>>>>> id's. I
>>>>>>> think eventually when a UI was designed for this it might even want
>>>>>>> to
>>>>>>> allow the user to specify a custom ID, similar to how the Site
>>>>>>> Aliases
>>>>>>> (on new site creation) works, allowing a user to specify anything
>>>>>>> they
>>>>>>> want for their site alias. (Like tinyurl.com) So this would almost
>>>>>>> guarantee collisions.
>>>>>>>
>>>>>>> So I wouldn't think that the possibility of a collision on an low
>>>>>>> use
>>>>>>> feature for an indexed column (when collisions will probably be
>>>>>>> featured in eventually anyway) likely won't be a big deal.
>>>>>>>
>>>>>>> -Matthew
>>>>>>>
>>>>>>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...> wrote:
>>>>>>>>
>>>>>>>> Please take a look at the comments at the head of the
>>>>>>>> "EighteenIDGenerator"
>>>>>>>> file for some details of the maths involved in computing how many
>>>>>>>> documents will arise before a collision:
>>>>>>>>
>>>>>
>>>>
>>>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>>>>>>
>>>>>>>> I wouldn't really recommend using a key space smaller than this
>>>>>>>> one
>>>>>>>> if you want to avoid collisions - although of course you will
>>>>>>>> reprobe
>>>>>>>> to be sure. Another issue with LCGs is that they are highly
>>>>>>>> predictable -
>>>>>>>> this is why I use a SecureRandom here: One of the requirements of
>>>>>>>> the
>>>>>>>> TinyURLService that I made for the work at CARET was that the ID
>>>>>>>> sequence could not be predicted - to prevent the possibility of
>>>>>>>> malicious
>>>>>>>> "hijacking" of upcoming URL slots (this was actually a system
>>>>>>>> designed
>>>>>>>> to allocate small URLs that were plaintext signup URLs routed to a
>>>>>>>> PHP
>>>>>>>> system, perhaps more risk than is present in a typical
>>>>>>>> application!)
>>>>>>>>
>>>>>>>> Note that a further issue you have with an LCG is running it in a
>>>>>>>> cluster -
>>>>>>>> even if you can lock on the particular instance you have, you
>>>>>>>> can't
>>>>>>>> guarantee that some other instance in the cluster has not
>>>>>>>> stepped on
>>>>>>>> past the slot you want. I think that the approach in my service
>>>>>>>> (a SecureRandom plus a transactional check into persistence) is
>>>>>>>> most
>>>>>>>> likely to work reasonably well in a cluster - although it hasn't
>>>>>>>> been
>>>>>>>> tested there :P
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Antranig.
>>>>>>>>
>>>>>>>>
>>>>>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>>>>>
>>>>>>>>> Carl,
>>>>>>>>>
>>>>>>>>> Sounds good, is there an existing LCG Java implementation we can
>>>>>>>>> leverage or should we roll our own?
>>>>>>>>>
>>>>>>>>> cheers,
>>>>>>>>> Steve
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>>>>>>
>>>>>>>>>> Summary: There is an ever increasing probability that the
>>>>>>>>>> collision of
>>>>>>>>>> IDs in the TinyURL service will slow down the service
>>>>>>>>>> considerably.
>>>>>>>>>>
>>>>>>>>>> In looking at the implementation of the TinyURL service, I've
>>>>>>>>>> noticed
>>>>>>>>>> something about the way URLs are generated.  If I'm headed in
>>>>>>>>>> the
>>>>>>>>>> wrong direction with this, please correct me.
>>>>>>>>>>
>>>>>>>>>> The very general approach is:
>>>>>>>>>> 1) generate a random ID
>>>>>>>>>> 2) check that the ID doesn't already exist
>>>>>>>>>> 3) if exists, go to 1.  if not, record the ID and release it to
>>>>>>>>>> the
>>>>>>>>>> caller.
>>>>>>>>>>
>>>>>>>>>> The following code for generating a URL:
>>>>>>>>>>
>>>>>>>>>> Random generator = new Random();
>>>>>>>>>> int seed = Math.abs(generator.nextInt());
>>>>>>>>>>
>>>>>>>>>> has an issue with collision which is addressed by checking for
>>>>>>>>>> the
>>>>>>>>>> existence of that ID before handing it out but the algorithm run
>>>>>>>>>> time
>>>>>>>>>> will continue to grow because of the growing chance of collision
>>>>>>>>>> over
>>>>>>>>>> time.
>>>>>>>>>>
>>>>>>>>>> Explanation of concern:
>>>>>>>>>> The first generated ID will have a 0/n chance of collision where
>>>>>>>>>> n is
>>>>>>>>>> the total number of possible IDs.  The next ID has a 1/n chance,
>>>>>>>>>> the
>>>>>>>>>> next 2/n, etc up to the last having a chance of collision being
>>>>>>>>>> n-1/n.
>>>>>>>>>> Considering n needs to be sufficiently large to have a low to
>>>>>>>>>> zero
>>>>>>>>>> rate of duplication over time, this check will take more time
>>>>>>>>>> because
>>>>>>>>>> of the number of collisions over time.  Assuming that the ID
>>>>>>>>>> column in
>>>>>>>>>> the database is indexed, the lookup time is minimized but
>>>>>>>>>> there's
>>>>>>>>>> still a call to the database lookup on each generated ID.  As
>>>>>>>>>> the
>>>>>>>>>> number of IDs generated reaches the max possible, there are
>>>>>>>>>> possibly
>>>>>>>>>> n-1 lookups before a valid ID is found.
>>>>>>>>>>
>>>>>>>>>> The current code recreates the Random each time which stops
>>>>>>>>>> nextInt()
>>>>>>>>>> from knowing history (ie. new random seed on each generation).
>>>>>>>>>> Java
>>>>>>>>>> implements a linear congruential generator to generate random
>>>>>>>>>> numbers
>>>>>>>>>> which requires knowledge of the previously generated ID to be
>>>>>>>>>> effective.  Math.random() creates the random seed once and
>>>>>>>>>> reuses it
>>>>>>>>>> for the life of the JVM process which is a better approach.
>>>>>>>>>>
>>>>>>>>>> I would suggest creating an LCG in the TinyURL service so that
>>>>>>>>>> the
>>>>>>>>>> proper range can be easily managed (better performance due to a
>>>>>>>>>> narrower scope than Math.random()) and the seed can be restored
>>>>>>>>>> after
>>>>>>>>>> a restart.  I have created a prototype in Python based on the
>>>>>>>>>> rules
>>>>>>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>>>>>>> sufficiently large prime number.  This still carries a concern
>>>>>>>>>> of
>>>>>>>>>> generating IDs across a cluster but is relatively easy to
>>>>>>>>>> address.
>>>>>>>>>>
>>>>>>>>>> WDYT?
>>>>>>>>>>
>>>>>>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>>>>>>> _______________________________________________
>>>>>>>>>> sakai-dev mailing list
>>>>>>>>>> sakai-dev@...
>>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>>>
>>>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>
>>>>> sakai-dev-unsubscribe@...
>>>>>>>>>>
>>>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> sakai-dev mailing list
>>>>>>>>> sakai-dev@...
>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>>
>>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>
>>>>> sakai-dev-unsubscribe@...
>>>>>>>>>
>>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ----------------------------------------------------------------
>>>>>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> sakai-dev mailing list
>>>>>>>> sakai-dev@...
>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>
>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>
>>>>> sakai-dev-unsubscribe@...
>>>>>>>>
>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> sakai-dev mailing list
>>>>>>> sakai-dev@...
>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>
>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>
>>>>> sakai-dev-unsubscribe@...
>>>>>>>
>>>>>>> with a subject of "unsubscribe"
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Aaron Zeckoski (azeckoski (at) vt.edu)
>>>>>> Senior Research Engineer - CARET - University of Cambridge
>>>>>> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>>>>>> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>>>>>> _______________________________________________
>>>>>> sakai-dev mailing list
>>>>>> sakai-dev@...
>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>
>>>>>> TO UNSUBSCRIBE: send email to
>>>>>> sakai-dev-unsubscribe@...
>>>>>> with a subject of "unsubscribe"
>>>>>
>>>>> _______________________________________________
>>>>> sakai-dev mailing list
>>>>> sakai-dev@...
>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>
>>>>> TO UNSUBSCRIBE: send email to
>>>>> sakai-dev-unsubscribe@...
>>>>> with a subject of "unsubscribe"
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> ----------------------------------------------------------------
>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev@...
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to
>>> sakai-dev-unsubscribe@... with a subject of
>>> "unsubscribe"
>>>
>>
>>
>>
>> --
>> Jon Gorrono
>> PGP Key: 5434509D
>>
>> email{>+++++++++[>+++++++++++>++++++++++++>+++++++>+++++<<<<-]>+++++++.>++++.<---.>-.+++..---.-.+.>+.<++++++.<----.+.---.>+.<++++++++.>---.>>+.<<<----.-.>++.}
>> http{ats.ucdavis.edu}
>> _______________________________________________
>> sakai-dev mailing list
>> sakai-dev@...
>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>
>> TO UNSUBSCRIBE: send email to
>> sakai-dev-unsubscribe@... with a subject of
>> "unsubscribe"
>
>



--
Jon Gorrono
PGP Key: 5434509D
email{>+++++++++[>+++++++++++>++++++++++++>+++++++>+++++<<<<-]>+++++++.>++++.<---.>-.+++..---.-.+.>+.<++++++.<----.+.---.>+.<++++++++.>---.>>+.<<<----.-.>++.}
http{ats.ucdavis.edu}

Sent from Davis, CA, United States
_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"

Re: [Building Sakai] TinyURL Service Impl Issue

by Steve Swinsburg-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

What, that you had a good idea? Or someone would implement it :) Just  
got a Jira space, I'll file it.


cheers,
Steve


On 18/07/2009, at 2:56 AM, Jon Gorrono wrote:

> I'm sort of reeling over this.... are you sure?
>
> unaccustomed and unprepared for this kind of response...
>
> :)
>
>
>
> On Wed, Jul 15, 2009 at 2:51 AM, Steve
> Swinsburg<s.swinsburg@...> wrote:
>> Good idea, I'll add that as an additional method where you specify  
>> the TTL
>> before the URL is no longer resolved.
>>
>>
>> cheers,
>> Steve
>>
>> ---
>> Steve Swinsburg
>> Portal Systems Developer
>> Centre for e-Science
>> Lancaster University
>> Lancaster
>> LA1 4YT
>>
>> email: s.swinsburg@...
>> phone: +44 (0) 1524 594870
>>
>>
>>
>>
>>
>>
>>
>> On 15 Jul 2009, at 04:27, Jon Gorrono wrote:
>>
>>> I have been a little curious why timeboxing is not used to decrease
>>> collisions for this service.... we use the service in a custom guest
>>> access app we made to put registration urls etc in emails....
>>> obviously we want to timebox access to the urls but a ttl can be
>>> xtremely effective in reducing an otherwise skewed data set to
>>> something resembling white noise... as it give a small window view  
>>> of
>>> a larger distribution
>>>
>>> Maybe an api call could change the ttl and return the probability of
>>> collision during that time or some other silliness
>>>
>>>
>>>
>>> On Wed, Jul 8, 2009 at 2:52 PM, Steve
>>> Swinsburg<s.swinsburg@...> wrote:
>>>>
>>>> Ok I've added an alternative method to return a longer random id  
>>>> which
>>>> makes it less guessable. The number of permutations with this combo
>>>> has 30 zeroes (on a-z and 0-9).
>>>>
>>>> Have some fun with the calculator! http://bit.ly/gJP1X
>>>>
>>>> cheers,
>>>> Steve
>>>>
>>>>
>>>>
>>>> On 8 Jul 2009, at 17:25, antranig@... wrote:
>>>>
>>>>> The issue with collisions occurs when you are hosting URLs which  
>>>>> are
>>>>> not protected by the container, but held by some external  
>>>>> resource -
>>>>> for example, a signup system "click here to validate your email to
>>>>> Xxxxx service". The implementation I made actually runs in two  
>>>>> modes,
>>>>> you may request either "SECURE" or "SHORT" URLs by an option to  
>>>>> the
>>>>> API. The short URLs start at only a couple of characters and
>>>>> are indeed sequential.
>>>>> Anyway, feel free to implement your own system, it is the Sakai  
>>>>> way
>>>>> after all :)
>>>>>
>>>>> Cheers,
>>>>> Antranig.
>>>>>
>>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>>
>>>>>> The security is handled in the container anyway so you get a
>>>>>> shortened
>>>>>> URL and it resolves into the full URL, redirects and Sakai takes
>>>>>> over.
>>>>>> Even if it was sequential it wouldn't be a security concern.
>>>>>>
>>>>>> With a keyspace of lowercase letters and numbers(36 chars)  
>>>>>> there are
>>>>>> about 2 billion combinations, with uppercase as well there's 50
>>>>>> billion. Info here on the theoretical limits:
>>>>>> http://bit.ly/3QG2uj
>>>>>>
>>>>>> Using RandomStringUtils from commons-lang, I wrote a short  
>>>>>> method to
>>>>>> iterate over a million alphanumeric combinations using letters  
>>>>>> and
>>>>>> numbers of length 6 chars and didn't run into a single collision.
>>>>>> Using this method and a transactional check when adding into the
>>>>>> database is probably the way to go.
>>>>>>
>>>>>> We can also filter for bad words, BBC has an interesting list  
>>>>>> (thanks
>>>>>> Seth! :)
>>>>>>
>>>>>> WDYT?
>>>>>>
>>>>>> Steve
>>>>>>
>>>>>>
>>>>>> On 8 Jul 2009, at 12:10, Aaron Zeckoski wrote:
>>>>>>
>>>>>>> Is there a reason why the id generation could be more like  
>>>>>>> tinyURL
>>>>>>> and
>>>>>>> simply be sequential and thus very short (at least initially and
>>>>>>> probably for most institutions it would stay small).
>>>>>>>
>>>>>>> Just a thought.
>>>>>>> -AZ
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jul 8, 2009 at 5:00 PM, Matthew Jones<jonespm@...>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> You could consider using 22 character uuids as autogenerated  
>>>>>>>> tinies
>>>>>>>> in
>>>>>>>> the url. It might not be able to be called tinyurl anymore,  
>>>>>>>> but it
>>>>>>>> would at least be short, linkable and (much more) collision  
>>>>>>>> free.
>>>>>>>>
>>>>>>>>
>>>>>>>> http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string
>>>>>>>>
>>>>>>>> I'm not convinced it would be necessary to keep these id's from
>>>>>>>> being
>>>>>>>> unpredictable. The current security model should prevent anyone
>>>>>>>> from
>>>>>>>> accessing content they don't have permission to even guessing
>>>>>>>> id's. I
>>>>>>>> think eventually when a UI was designed for this it might  
>>>>>>>> even want
>>>>>>>> to
>>>>>>>> allow the user to specify a custom ID, similar to how the Site
>>>>>>>> Aliases
>>>>>>>> (on new site creation) works, allowing a user to specify  
>>>>>>>> anything
>>>>>>>> they
>>>>>>>> want for their site alias. (Like tinyurl.com) So this would  
>>>>>>>> almost
>>>>>>>> guarantee collisions.
>>>>>>>>
>>>>>>>> So I wouldn't think that the possibility of a collision on an  
>>>>>>>> low
>>>>>>>> use
>>>>>>>> feature for an indexed column (when collisions will probably be
>>>>>>>> featured in eventually anyway) likely won't be a big deal.
>>>>>>>>
>>>>>>>> -Matthew
>>>>>>>>
>>>>>>>> On Tue, Jul 7, 2009 at 7:08 PM, <antranig@...>  
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> Please take a look at the comments at the head of the
>>>>>>>>> "EighteenIDGenerator"
>>>>>>>>> file for some details of the maths involved in computing how  
>>>>>>>>> many
>>>>>>>>> documents will arise before a collision:
>>>>>>>>>
>>>>>>
>>>>>
>>>>> https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java
>>>>>>>>>
>>>>>>>>> I wouldn't really recommend using a key space smaller than  
>>>>>>>>> this
>>>>>>>>> one
>>>>>>>>> if you want to avoid collisions - although of course you will
>>>>>>>>> reprobe
>>>>>>>>> to be sure. Another issue with LCGs is that they are highly
>>>>>>>>> predictable -
>>>>>>>>> this is why I use a SecureRandom here: One of the  
>>>>>>>>> requirements of
>>>>>>>>> the
>>>>>>>>> TinyURLService that I made for the work at CARET was that  
>>>>>>>>> the ID
>>>>>>>>> sequence could not be predicted - to prevent the possibility  
>>>>>>>>> of
>>>>>>>>> malicious
>>>>>>>>> "hijacking" of upcoming URL slots (this was actually a system
>>>>>>>>> designed
>>>>>>>>> to allocate small URLs that were plaintext signup URLs  
>>>>>>>>> routed to a
>>>>>>>>> PHP
>>>>>>>>> system, perhaps more risk than is present in a typical
>>>>>>>>> application!)
>>>>>>>>>
>>>>>>>>> Note that a further issue you have with an LCG is running it  
>>>>>>>>> in a
>>>>>>>>> cluster -
>>>>>>>>> even if you can lock on the particular instance you have, you
>>>>>>>>> can't
>>>>>>>>> guarantee that some other instance in the cluster has not
>>>>>>>>> stepped on
>>>>>>>>> past the slot you want. I think that the approach in my  
>>>>>>>>> service
>>>>>>>>> (a SecureRandom plus a transactional check into persistence)  
>>>>>>>>> is
>>>>>>>>> most
>>>>>>>>> likely to work reasonably well in a cluster - although it  
>>>>>>>>> hasn't
>>>>>>>>> been
>>>>>>>>> tested there :P
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Antranig.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quoting Steve Swinsburg <s.swinsburg@...>:
>>>>>>>>>
>>>>>>>>>> Carl,
>>>>>>>>>>
>>>>>>>>>> Sounds good, is there an existing LCG Java implementation  
>>>>>>>>>> we can
>>>>>>>>>> leverage or should we roll our own?
>>>>>>>>>>
>>>>>>>>>> cheers,
>>>>>>>>>> Steve
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>>>>>>>>
>>>>>>>>>>> Summary: There is an ever increasing probability that the
>>>>>>>>>>> collision of
>>>>>>>>>>> IDs in the TinyURL service will slow down the service
>>>>>>>>>>> considerably.
>>>>>>>>>>>
>>>>>>>>>>> In looking at the implementation of the TinyURL service,  
>>>>>>>>>>> I've
>>>>>>>>>>> noticed
>>>>>>>>>>> something about the way URLs are generated.  If I'm headed  
>>>>>>>>>>> in
>>>>>>>>>>> the
>>>>>>>>>>> wrong direction with this, please correct me.
>>>>>>>>>>>
>>>>>>>>>>> The very general approach is:
>>>>>>>>>>> 1) generate a random ID
>>>>>>>>>>> 2) check that the ID doesn't already exist
>>>>>>>>>>> 3) if exists, go to 1.  if not, record the ID and release  
>>>>>>>>>>> it to
>>>>>>>>>>> the
>>>>>>>>>>> caller.
>>>>>>>>>>>
>>>>>>>>>>> The following code for generating a URL:
>>>>>>>>>>>
>>>>>>>>>>> Random generator = new Random();
>>>>>>>>>>> int seed = Math.abs(generator.nextInt());
>>>>>>>>>>>
>>>>>>>>>>> has an issue with collision which is addressed by checking  
>>>>>>>>>>> for
>>>>>>>>>>> the
>>>>>>>>>>> existence of that ID before handing it out but the  
>>>>>>>>>>> algorithm run
>>>>>>>>>>> time
>>>>>>>>>>> will continue to grow because of the growing chance of  
>>>>>>>>>>> collision
>>>>>>>>>>> over
>>>>>>>>>>> time.
>>>>>>>>>>>
>>>>>>>>>>> Explanation of concern:
>>>>>>>>>>> The first generated ID will have a 0/n chance of collision  
>>>>>>>>>>> where
>>>>>>>>>>> n is
>>>>>>>>>>> the total number of possible IDs.  The next ID has a 1/n  
>>>>>>>>>>> chance,
>>>>>>>>>>> the
>>>>>>>>>>> next 2/n, etc up to the last having a chance of collision  
>>>>>>>>>>> being
>>>>>>>>>>> n-1/n.
>>>>>>>>>>> Considering n needs to be sufficiently large to have a low  
>>>>>>>>>>> to
>>>>>>>>>>> zero
>>>>>>>>>>> rate of duplication over time, this check will take more  
>>>>>>>>>>> time
>>>>>>>>>>> because
>>>>>>>>>>> of the number of collisions over time.  Assuming that the ID
>>>>>>>>>>> column in
>>>>>>>>>>> the database is indexed, the lookup time is minimized but
>>>>>>>>>>> there's
>>>>>>>>>>> still a call to the database lookup on each generated ID.  
>>>>>>>>>>> As
>>>>>>>>>>> the
>>>>>>>>>>> number of IDs generated reaches the max possible, there are
>>>>>>>>>>> possibly
>>>>>>>>>>> n-1 lookups before a valid ID is found.
>>>>>>>>>>>
>>>>>>>>>>> The current code recreates the Random each time which stops
>>>>>>>>>>> nextInt()
>>>>>>>>>>> from knowing history (ie. new random seed on each  
>>>>>>>>>>> generation).
>>>>>>>>>>> Java
>>>>>>>>>>> implements a linear congruential generator to generate  
>>>>>>>>>>> random
>>>>>>>>>>> numbers
>>>>>>>>>>> which requires knowledge of the previously generated ID to  
>>>>>>>>>>> be
>>>>>>>>>>> effective.  Math.random() creates the random seed once and
>>>>>>>>>>> reuses it
>>>>>>>>>>> for the life of the JVM process which is a better approach.
>>>>>>>>>>>
>>>>>>>>>>> I would suggest creating an LCG in the TinyURL service so  
>>>>>>>>>>> that
>>>>>>>>>>> the
>>>>>>>>>>> proper range can be easily managed (better performance due  
>>>>>>>>>>> to a
>>>>>>>>>>> narrower scope than Math.random()) and the seed can be  
>>>>>>>>>>> restored
>>>>>>>>>>> after
>>>>>>>>>>> a restart.  I have created a prototype in Python based on  
>>>>>>>>>>> the
>>>>>>>>>>> rules
>>>>>>>>>>> laid out in the Wikipedia article on LCG's [1] and using a
>>>>>>>>>>> sufficiently large prime number.  This still carries a  
>>>>>>>>>>> concern
>>>>>>>>>>> of
>>>>>>>>>>> generating IDs across a cluster but is relatively easy to
>>>>>>>>>>> address.
>>>>>>>>>>>
>>>>>>>>>>> WDYT?
>>>>>>>>>>>
>>>>>>>>>>> [1] http://en.wikipedia.org/wiki/Linear_congruential_generator
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> sakai-dev mailing list
>>>>>>>>>>> sakai-dev@...
>>>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>>>>
>>>>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>>
>>>>>> sakai-dev-unsubscribe@...
>>>>>>>>>>>
>>>>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> sakai-dev mailing list
>>>>>>>>>> sakai-dev@...
>>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>>>
>>>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>>
>>>>>> sakai-dev-unsubscribe@...
>>>>>>>>>>
>>>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ----------------------------------------------------------------
>>>>>>>>> This message was sent using IMP, the Internet Messaging  
>>>>>>>>> Program.
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> sakai-dev mailing list
>>>>>>>>> sakai-dev@...
>>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>>
>>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>>
>>>>>> sakai-dev-unsubscribe@...
>>>>>>>>>
>>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> sakai-dev mailing list
>>>>>>>> sakai-dev@...
>>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>>
>>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>>
>>>>>> sakai-dev-unsubscribe@...
>>>>>>>>
>>>>>>>> with a subject of "unsubscribe"
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Aaron Zeckoski (azeckoski (at) vt.edu)
>>>>>>> Senior Research Engineer - CARET - University of Cambridge
>>>>>>> https://twitter.com/azeckoski - http://www.linkedin.com/in/azeckoski
>>>>>>> http://aaronz-sakai.blogspot.com/ - http://tinyurl.com/azprofile
>>>>>>> _______________________________________________
>>>>>>> sakai-dev mailing list
>>>>>>> sakai-dev@...
>>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>>
>>>>>>> TO UNSUBSCRIBE: send email to
>>>>>>> sakai-dev-unsubscribe@...
>>>>>>> with a subject of "unsubscribe"
>>>>>>
>>>>>> _______________________________________________
>>>>>> sakai-dev mailing list
>>>>>> sakai-dev@...
>>>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>>>
>>>>>> TO UNSUBSCRIBE: send email to
>>>>>> sakai-dev-unsubscribe@...
>>>>>> with a subject of "unsubscribe"
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----------------------------------------------------------------
>>>>> This message was sent using IMP, the Internet Messaging Program.
>>>>>
>>>>
>>>> _______________________________________________
>>>> sakai-dev mailing list
>>>> sakai-dev@...
>>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>>
>>>> TO UNSUBSCRIBE: send email to
>>>> sakai-dev-unsubscribe@... with a subject of
>>>> "unsubscribe"
>>>>
>>>
>>>
>>>
>>> --
>>> Jon Gorrono
>>> PGP Key: 5434509D
>>>
>>> email{>+++++++++[>+++++++++++>++++++++++++>+++++++>+++++<<<<-]>++++
>>> +++.>++++.<---.>-.+++..---.-.+.>+.<++++++.<----.+.---.>+.<+++++++
>>> +.>---.>>+.<<<----.-.>++.}
>>> http{ats.ucdavis.edu}
>>> _______________________________________________
>>> sakai-dev mailing list
>>> sakai-dev@...
>>> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>>>
>>> TO UNSUBSCRIBE: send email to
>>> sakai-dev-unsubscribe@... with a subject of
>>> "unsubscribe"
>>
>>
>
>
>
> --
> Jon Gorrono
> PGP Key: 5434509D
> email{>+++++++++[>+++++++++++>++++++++++++>+++++++>+++++<<<<-]>++++++
> +.>++++.<---.>-.+++..---.-.+.>+.<++++++.<----.+.---.>+.<+++++++
> +.>---.>>+.<<<----.-.>++.}
> http{ats.ucdavis.edu}
>
> Sent from Davis, CA, United States
> _______________________________________________
> sakai-dev mailing list
> sakai-dev@...
> http://collab.sakaiproject.org/mailman/listinfo/sakai-dev
>
> TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@...
>  with a subject of "unsubscribe"

_______________________________________________
sakai-dev mailing list
sakai-dev@...
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev

TO UNSUBSCRIBE: send email to sakai-dev-unsubscribe@... with a subject of "unsubscribe"