Today I came across some methods for generating random strings, but saw they had some pretty harsh limitations or were implemented in hacky (albeit beautiful) ways. I took it upon myself to come up with a better way of generating both hex and alphanumeric random strings, and figured I'd document it here in case I need it again in the future.
The method I found for generating random hex strings was neat, but it only works for hex or other numeric strings in various bases. The method for generating alphanumeric strings was fairly simple and worked, but it had one huge flaw:
import random
import string

def alphanum(length):
    """
    Generate a random alphanumeric string.

    @type length: int
    @param length: The length of the string to generate
    @rtype: str
    @return: A random alphanumeric string
    """
    # string.ascii_letters is the locale-independent spelling of
    # string.letters and is also available on Python 3.
    return ''.join(random.sample(string.ascii_letters + string.digits, length))
The major issue becomes apparent when you try to generate a long random string, say anything over 62 characters in length (26 * 2 + 10). The problem is that random.sample(...) samples without replacement. That means no letter or digit is ever repeated in the string, and the maximum length is the total number of choices mentioned above; ask for more than that and random.sample(...) raises a ValueError.
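To make the limitation concrete, here is a small sketch of how random.sample(...) behaves at and just past the 62-character limit. It assumes string.ascii_letters as the alphabet (the locale-independent name, which also exists on Python 3); the variable names are only for illustration.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits  # 62 distinct characters

# Sampling without replacement: every character appears at most once,
# so a 62-character sample is just a shuffle of the alphabet.
sample = random.sample(ALPHABET, 62)
assert len(set(sample)) == 62

# Asking for more items than the population holds raises ValueError.
try:
    random.sample(ALPHABET, 63)
except ValueError as exc:
    error = exc
else:
    error = None
```

The version that follows works around both problems by repeating the population before sampling.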
import random
import string

def generate_sampling(choices, length):
    """
    Generate a random list of items from choices that is length long.

    @type choices: list or string
    @param choices: The possible choices
    @type length: int
    @param length: The length of the list to generate
    @rtype: list
    @return: The generated random sampling
    """
    # Repeating the population length times lets the same item be
    # picked more than once, at the cost of extra memory.
    return random.sample(choices * length, length)

def alphanum(length):
    """
    Generate a random alphanumeric string.

    @type length: int
    @param length: The length of the string to generate
    @rtype: str
    @return: A random alphanumeric string
    """
    sampling = generate_sampling(string.ascii_letters + string.digits, length)
    return ''.join(sampling)

# Note: this name shadows the built-in hex() within this module.
def hex(length):
    """
    Generate a random hexadecimal string.

    @type length: int
    @param length: The length of the string to generate
    @rtype: str
    @return: A random hexadecimal string
    """
    sampling = generate_sampling("0123456789abcdef", length)
    return ''.join(sampling)
With the above you can easily generate random strings of any length, with letters and numbers allowed to repeat, but for large strings this can use a lot more memory than it should, since the population is copied length times before sampling. My original idea was to use random.choice(...) to pick each item, appending it to a list inside a loop. I ran several tests, and it turns out that the loop is about 30% slower (feel free to try it yourself). As always, this leads me back to what I've said time and again: Python is great, but let C do the hard work. In this case the hard work is choosing in a loop, and it's plain to see the advantage of using a bit of extra memory to let C handle the loop.
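If you do want to try the comparison yourself, something along these lines should work. It is only a rough sketch: the function names, the string length, and the repeat count are illustrative assumptions rather than the original test, and the timings (and possibly even the ranking) will vary by machine and Python version.

```python
import random
import string
import timeit

CHOICES = string.ascii_letters + string.digits


def with_sample(length):
    # Repeat the population so sample() can return duplicates.
    return ''.join(random.sample(CHOICES * length, length))


def with_choice_loop(length):
    # Pick one character at a time in a Python-level loop.
    result = []
    for _ in range(length):
        result.append(random.choice(CHOICES))
    return ''.join(result)


if __name__ == '__main__':
    for func in (with_sample, with_choice_loop):
        # Time 1,000 generations of a 500-character string.
        elapsed = timeit.timeit(lambda: func(500), number=1000)
        print('%s: %.3f s' % (func.__name__, elapsed))
```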
Hope this helps someone else out there, but if not, hello to me in the future wondering why the hell random.sample(...) doesn't allow replacement. :-)
Update (17 Feb 2009)
I decided to try out another approach:
return ''.join([choices[random.randint(0, len(choices) - 1)] for x in range(length)])
This one was even worse, taking three times as long to complete for 10,000 random alphanumeric strings of 500 characters.
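For anyone reading this on Python 3.6 or later: the standard library has since gained random.choices(...), which samples with replacement directly and so needs neither the enlarged population nor a hand-written loop. It did not exist when this post was written, so the sketch below is an alternative technique, not one of the approaches timed above, and generate_with_replacement is a made-up name.

```python
import random
import string


def generate_with_replacement(choices, length):
    # random.choices draws k items independently, with replacement.
    return ''.join(random.choices(choices, k=length))


alphanum = generate_with_replacement(string.ascii_letters + string.digits, 500)
hexstr = generate_with_replacement('0123456789abcdef', 32)
```

Each draw here is independent, whereas sampling from a repeated population makes a repeat very slightly less likely than a truly independent draw would.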





