
How to generate unique strings in PHP/SQL?

Our application has millions of entries, and each one has a unique string identifier. Until now these have been generated by random functions in PHP, but to improve efficiency, my boss asked me to generate unique strings before inserting into the DB.

To explain better: right now I use some form of rand() and check whether the string (4 alphanumeric characters) already exists in the DB. If it doesn't, the script inserts it; if it does, the script generates another string.

Now my boss would like rand() to generate a unique string, so the script doesn't have to check for duplicates. Is that possible?

Since the script runs each time a user asks to insert, how could it know the "previous" strings? Should I use an SQL stored procedure?

Thank you.

3 Answers

  • Anonymous
    4 months ago

    You do understand the principles of how random number generators work, don't you? They're pseudo-random: you have a seed and a count variable.

    In the same way that a car's key fob can generate a unique code each time you push the button (until the count overflows), you just store a seed (which remains constant) and a count (which increases every time you generate a new userID).

    At its most basic, you can just count from 0 to 36^4 - 1 (36 = 10 digits + 26 letters, ignoring case)... it shouldn't be hard to adapt the general ideas of a random generator to fit your use case; what your boss is asking for really isn't a terribly hard problem.
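    A minimal sketch of the seed-plus-counter idea, assuming case-insensitive codes over the 36 symbols 0-9 and a-z. The multiplier A and offset B are hypothetical "seed" values chosen only for illustration. Because A shares no factor with 36^4 = 2^8 * 3^8, the map counter -> (A * counter + B) mod 36^4 is a bijection, so every counter value from 0 to 36^4 - 1 yields a different code and no duplicate check is needed. This only scrambles the order; it is not cryptographically unpredictable, and the counter itself would still have to be stored and incremented atomically in the database (an auto-increment column is one option).

```python
import math

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyz"  # 36 case-insensitive symbols
LENGTH = 4
SPACE = len(ALPHABET) ** LENGTH  # 36^4 = 1,679,616 possible codes

# Hypothetical "seed": any A coprime to SPACE and any B in [0, SPACE) would work.
A = 1_000_003
B = 424_242
assert math.gcd(A, SPACE) == 1, "A must be coprime to SPACE to get a bijection"


def code_for_counter(counter: int) -> str:
    """Map a sequential counter (0 .. SPACE-1) to a distinct 4-character code."""
    if not 0 <= counter < SPACE:
        raise ValueError("counter has exhausted the 4-character key space")
    n = (A * counter + B) % SPACE  # affine permutation of 0 .. SPACE-1
    chars = []
    for _ in range(LENGTH):  # fixed-width base-36 encoding
        n, digit = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))


if __name__ == "__main__":
    for i in range(5):
        print(i, code_for_counter(i))
```

    Consecutive counters map to codes that don't look sequential, but anyone who learns A and B can reverse the mapping, so treat it as obfuscation rather than secrecy.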

    • Alex
      4 months ago

      Yes, I do know, but how do I do that?

  • Anonymous
    4 months ago

    Apply a unique key to the identifier field and set it to NOT NULL, then regenerate the rand() value if a duplicate occurs. (Four alphanumeric characters is too small; consider using a GUID instead.)
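    A minimal sketch of the "unique key plus retry" approach, using Python's built-in sqlite3 as a stand-in for the real database. The table name entries, its columns, and the MAX_ATTEMPTS cap are hypothetical. The idea is to let the UNIQUE NOT NULL constraint reject a duplicate and simply draw again, instead of running a separate SELECT first. The code uses the secrets module rather than a plain rand()-style generator.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits  # 62 symbols, case-sensitive
CODE_LENGTH = 4
MAX_ATTEMPTS = 20  # hypothetical cap so a nearly full key space cannot loop forever


def random_code(length: int = CODE_LENGTH) -> str:
    # secrets draws from the operating system's cryptographic RNG
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries ("
        " id INTEGER PRIMARY KEY,"
        " code TEXT NOT NULL UNIQUE,"
        " payload TEXT)"
    )
    return conn


def insert_with_unique_code(conn: sqlite3.Connection, payload: str) -> str:
    """Insert a row, retrying with a fresh code whenever the UNIQUE constraint fires."""
    for _ in range(MAX_ATTEMPTS):
        code = random_code()
        try:
            with conn:  # commits on success, rolls back if the INSERT fails
                conn.execute(
                    "INSERT INTO entries (code, payload) VALUES (?, ?)",
                    (code, payload),
                )
            return code
        except sqlite3.IntegrityError:
            continue  # duplicate code: the database rejected it, so draw again
    raise RuntimeError("no free code found; the key space may be too full")


if __name__ == "__main__":
    db = open_db()
    for i in range(3):
        print(insert_with_unique_code(db, f"entry {i}"))
```

    For the GUID alternative, Python's uuid.uuid4() returns a value with 122 random bits, which makes the retry path practically unreachable, at the cost of a much longer identifier.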

  • 4 months ago

    A random generator with no "memory" will always have a nonzero probability of generating the same number twice.  The probability of a duplicate increases when many such numbers are generated.  This is closely related to the famous Birthday Paradox, where there's a slightly better than even-money chance of a duplicate birthday among just 23 randomly chosen people.

    You mentioned strings of 4 alphanumeric characters.  Allowing upper and lower case, that's 62 different characters in each position, just under 6 bits (2^6 = 64).  Four of those is just under 24 bits: 62^4 = 14,776,336 possible unique strings, a little less than 2^24 (about 16.78 million).  The figures below round up to 2^24 for simplicity.
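    The key-space arithmetic is easy to check directly. This small sketch compares the exact count for 62 case-sensitive symbols with the 2^24 approximation used in the calculations that follow, and with 36^4 for case-insensitive codes; the helper name keyspace is just for illustration.

```python
import math


def keyspace(symbols: int, length: int) -> int:
    """Number of distinct strings of `length` characters over `symbols` symbols."""
    return symbols ** length


if __name__ == "__main__":
    for symbols in (36, 62):  # case-insensitive vs case-sensitive alphanumerics
        size = keyspace(symbols, 4)
        print(f"{symbols}^4 = {size:,} strings ({math.log2(size):.2f} bits)")
    print(f"2^24 = {2 ** 24:,} (the rounded-up approximation)")
```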

    With n total distinct values to choose from, the probability of no duplicates in a set of k randomly chosen values is approximately:

    p(n, k) = [(n - 1) / n] ^ [k(k-1) / 2]

    See the Wikipedia article on the birthday problem for details.  That ^ means "raised to the power", not PHP's "bitwise exclusive or" operator, btw.

    The probability of at least one collision is then q(n, k) = 1 - p(n, k).

    With only 1,000 random numbers from a set of 2^24 (about 16.78 million), you have a probability of q(2^24, 1000) ≈ 0.029, or nearly a 3% chance.  With 10,000 values, the probability of at least one duplicate is nearly 0.95 (95%).

    With "millions", duplicates are effectively a certainty.  You can't do this with random choices alone from such a short string.  With 8 characters (about 48 bits, or 2^48 possible values), the probability of a collision among 1 million samples drops to 0.001775, less than 0.2%, but that is still too big for my taste.  With 16 characters, the chance of a duplicate is essentially zero even after 100 million random choices.
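    The figures above can be reproduced from the approximation p(n, k) = [(n - 1) / n] ^ [k(k-1) / 2]. This sketch computes it through log1p and expm1 instead of raising (n - 1) / n to a power directly, because for n = 2^96 that ratio rounds to exactly 1.0 in double precision and the direct formula would report exactly zero risk.

```python
import math


def p_no_duplicate(n: int, k: int) -> float:
    """Approximate chance that k uniform draws from n values are all distinct:
    p(n, k) = ((n - 1) / n) ** (k * (k - 1) / 2)."""
    pairs = k * (k - 1) // 2
    # log1p keeps precision when 1/n is tiny and (n - 1) / n would round to 1.0
    return math.exp(pairs * math.log1p(-1 / n))


def q_collision(n: int, k: int) -> float:
    """Approximate chance of at least one duplicate: q(n, k) = 1 - p(n, k)."""
    pairs = k * (k - 1) // 2
    # expm1 avoids cancellation in 1 - p when p is very close to 1
    return -math.expm1(pairs * math.log1p(-1 / n))


if __name__ == "__main__":
    print(f"{q_collision(2 ** 24, 1_000):.4f}")        # 1,000 4-char codes
    print(f"{q_collision(2 ** 24, 10_000):.4f}")       # 10,000 4-char codes
    print(f"{q_collision(2 ** 48, 1_000_000):.6f}")    # 1 million 8-char codes
    print(f"{q_collision(2 ** 96, 100_000_000):.2e}")  # 100 million 16-char codes
```

    As a sanity check, q_collision(365, 23) comes out just above 0.5, in line with the Birthday Paradox figure for 23 people (the approximation slightly understates the exact value).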

    Those aren't the only problems.  PHP can use a thoroughly deficient version of rand() if it's running on a Windows host, and the interface won't return more than 32 bits per call on any 32-bit OS.

    Have you looked at making the string a primary key in the database table?  That would let you generate random values, with a small chance of having to retry with a new random value after getting a duplicate-key error when adding a table row.  (I still suggest a longer string: 4 alphanumerics give you at most 62^4, about 14.8 million keys, and with a database of "millions" of rows, every new insert is going to have over a 10% chance of collision.)
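    The "over 10% chance of collision every time" figure is the per-insert view: once k codes are taken out of n, one fresh uniform draw hits an existing code with probability k / n, and the expected number of draws until a free code turns up is n / (n - k) (a geometric distribution). A small sketch, where 2 million rows is a hypothetical stand-in for "millions":

```python
def per_insert_collision(n: int, taken: int) -> float:
    """Chance that one uniform random code is already in use."""
    return taken / n


def expected_attempts(n: int, taken: int) -> float:
    """Mean number of draws until an unused code appears (geometric distribution)."""
    if taken >= n:
        raise ValueError("key space is full")
    return n / (n - taken)


if __name__ == "__main__":
    n = 62 ** 4        # 14,776,336 case-sensitive 4-char codes
    taken = 2_000_000  # hypothetical table size standing in for "millions"
    print(f"collision per insert: {per_insert_collision(n, taken):.1%}")
    print(f"expected attempts:    {expected_attempts(n, taken):.3f}")
```

    The retries stay cheap on average, but the per-insert collision rate keeps climbing as the table fills, which is why a longer string is the more durable fix.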

    • Alex
      4 months ago

      The 4-character length is required because users have to memorize this "unique ID" as their ID.
