probably for most institutions it would stay small).
Just a thought.
> You could consider using 22 character uuids as autogenerated tinies in
> the url. It might not be able to be called tinyurl anymore, but it
> would at least be short, linkable and (much more) collision free.
>
>
http://stackoverflow.com/questions/772802/storing-uuid-as-base64-string>
> I'm not convinced it would be necessary to keep these id's from being
> unpredictable. The current security model should prevent anyone from
> accessing content they don't have permission to even guessing id's. I
> think eventually when a UI was designed for this it might even want to
> allow the user to specify a custom ID, similar to how the Site Aliases
> (on new site creation) works, allowing a user to specify anything they
> want for their site alias. (Like tinyurl.com) So this would almost
> guarantee collisions.
>
> So I wouldn't think that the possibility of a collision on an low use
> feature for an indexed column (when collisions will probably be
> featured in eventually anyway) likely won't be a big deal.
>
> -Matthew
>
> On Tue, Jul 7, 2009 at 7:08 PM, <
antranig@...> wrote:
>> Please take a look at the comments at the head of the "EighteenIDGenerator"
>> file for some details of the maths involved in computing how many
>> documents will arise before a collision:
>>
https://source.caret.cam.ac.uk/rsf/projects/PonderUtilCore/trunk/src/uk/org/ponder/hashutil/EighteenIDGenerator.java>>
>> I wouldn't really recommend using a key space smaller than this one
>> if you want to avoid collisions - although of course you will reprobe
>> to be sure. Another issue with LCGs is that they are highly predictable -
>> this is why I use a SecureRandom here: One of the requirements of the
>> TinyURLService that I made for the work at CARET was that the ID
>> sequence could not be predicted - to prevent the possibility of malicious
>> "hijacking" of upcoming URL slots (this was actually a system designed
>> to allocate small URLs that were plaintext signup URLs routed to a PHP
>> system, perhaps more risk than is present in a typical application!)
>>
>> Note that a further issue you have with an LCG is running it in a cluster -
>> even if you can lock on the particular instance you have, you can't
>> guarantee that some other instance in the cluster has not stepped on
>> past the slot you want. I think that the approach in my service
>> (a SecureRandom plus a transactional check into persistence) is most
>> likely to work reasonably well in a cluster - although it hasn't been
>> tested there :P
>>
>> Cheers,
>> Antranig.
>>
>>
>> Quoting Steve Swinsburg <
s.swinsburg@...>:
>>
>>> Carl,
>>>
>>> Sounds good, is there an existing LCG Java implementation we can
>>> leverage or should we roll our own?
>>>
>>> cheers,
>>> Steve
>>>
>>>
>>> On 6 Jul 2009, at 12:34, Carl Hall wrote:
>>>
>>> > Summary: There is an ever increasing probability that the collision of
>>> > IDs in the TinyURL service will slow down the service considerably.
>>> >
>>> > In looking at the implementation of the TinyURL service, I've noticed
>>> > something about the way URLs are generated. If I'm headed in the
>>> > wrong direction with this, please correct me.
>>> >
>>> > The very general approach is:
>>> > 1) generate a random ID
>>> > 2) check that the ID doesn't already exist
>>> > 3) if exists, go to 1. if not, record the ID and release it to the
>>> > caller.
>>> >
>>> > The following code for generating a URL:
>>> >
>>> > Random generator = new Random();
>>> > int seed = Math.abs(generator.nextInt());
>>> >
>>> > has an issue with collision which is addressed by checking for the
>>> > existence of that ID before handing it out but the algorithm run time
>>> > will continue to grow because of the growing chance of collision over
>>> > time.
>>> >
>>> > Explanation of concern:
>>> > The first generated ID will have a 0/n chance of collision where n is
>>> > the total number of possible IDs. The next ID has a 1/n chance, the
>>> > next 2/n, etc up to the last having a chance of collision being n-1/n.
>>> > Considering n needs to be sufficiently large to have a low to zero
>>> > rate of duplication over time, this check will take more time because
>>> > of the number of collisions over time. Assuming that the ID column in
>>> > the database is indexed, the lookup time is minimized but there's
>>> > still a call to the database lookup on each generated ID. As the
>>> > number of IDs generated reaches the max possible, there are possibly
>>> > n-1 lookups before a valid ID is found.
>>> >
>>> > The current code recreates the Random each time which stops nextInt()
>>> > from knowing history (ie. new random seed on each generation). Java
>>> > implements a linear congruential generator to generate random numbers
>>> > which requires knowledge of the previously generated ID to be
>>> > effective. Math.random() creates the random seed once and reuses it
>>> > for the life of the JVM process which is a better approach.
>>> >
>>> > I would suggest creating an LCG in the TinyURL service so that the
>>> > proper range can be easily managed (better performance due to a
>>> > narrower scope than Math.random()) and the seed can be restored after
>>> > a restart. I have created a prototype in Python based on the rules
>>> > laid out in the Wikipedia article on LCG's [1] and using a
>>> > sufficiently large prime number. This still carries a concern of
>>> > generating IDs across a cluster but is relatively easy to address.
>>> >
>>> > WDYT?
>>> >
>>> > [1]
http://en.wikipedia.org/wiki/Linear_congruential_generator>>> > _______________________________________________
>>> > sakai-dev mailing list
>>> >
sakai-dev@...
>>> >
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev>>> >
>>> > TO UNSUBSCRIBE: send email to
sakai-dev-unsubscribe@...
>>> > with a subject of "unsubscribe"
>>>
>>> _______________________________________________
>>> sakai-dev mailing list
>>>
sakai-dev@...
>>>
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev>>>
>>> TO UNSUBSCRIBE: send email to
sakai-dev-unsubscribe@...
>>> with a subject of "unsubscribe"
>>>
>>
>>
>>
>>
>> ----------------------------------------------------------------
>> This message was sent using IMP, the Internet Messaging Program.
>>
>> _______________________________________________
>> sakai-dev mailing list
>>
sakai-dev@...
>>
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev>>
>> TO UNSUBSCRIBE: send email to
sakai-dev-unsubscribe@... with a subject of "unsubscribe"
>>
> _______________________________________________
> sakai-dev mailing list
>
sakai-dev@...
>
http://collab.sakaiproject.org/mailman/listinfo/sakai-dev>
> TO UNSUBSCRIBE: send email to
sakai-dev-unsubscribe@... with a subject of "unsubscribe"
>