Not worth creating a project for, and it might be interesting to see what changes people would make.
Non-standard dependencies: zsh, GNU sed (for \U in the replacement), and the American English word list at /usr/share/dict/american-english.
#!/usr/bin/zsh
# Author: @[email protected]
# 2025-02-23
final=(xargs echo)
count=6

while getopts d opt; do
    case $opt in
        d)
            final=(tr 'A-Z' 'a-z')
            ;;
        *)
            printf "Password generator based on the correcthorse algorithm from http://xkcd.com/936/\n\n"
            printf "USAGE: %s [-d] [#]\n" "$0"
            printf "  -d   make the result all lower case; otherwise, each word will be capitalized.\n"
            printf "  #    the number of words to include. Defaults to 6.\n"
            exit 1
            ;;
    esac
done
shift $((OPTIND - 1))
[[ $# -gt 0 ]] && count=$1

# Pull 2N words, strip apostrophe suffixes, capitalize, dedupe,
# pick N of the survivors, and join them into a single string.
shuf -n $((count * 2)) /usr/share/dict/american-english | \
    sed 's/'"'"'.*//; s/^\(\w\)/\U\1/' | \
    sort | uniq | shuf -n $count | xargs echo | \
    tr -d ' ' | $final
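Assuming the script is saved on your PATH as pony (my name for it, borrowed from the alias below), usage looks like:

$ pony         # six capitalized words run together
$ pony -d 4    # four words, all lower case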
What’s going on here:
Nearly 30% of the American dictionary (34,242 words) contains apostrophes. They could be left in to help satisfy password requirements that demand “special characters,” but correcthorse isn’t an algorithm that handles idiot “password best practices” well anyway. So, since every word with an apostrophe has a pair word without one, we pull 2·N words to make sure we have enough. Then we strip out the plural/possessive suffixes and capitalize every word. Then we remove duplicates and select our N words from the result. Finally, we compact that into a space-less string of words, and if the user passed the -d option, we downcase the entire thing.
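For example, the sed stage strips everything from the apostrophe onward and upper-cases the first letter (the \U is a GNU sed extension):

$ printf "dad's\nhorse\n" | sed 's/'"'"'.*//; s/^\(\w\)/\U\1/'
Dad
Horse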
Without the user options, this really could be a 1-liner; that’s how it started:
alias pony="shuf -n 12 /usr/share/dict/american-english | sed 's/'\"'\"'.*//; s/^\(\w\)/\U\1/' | sort | uniq | shuf -n 6 | xargs echo | tr -d ' '"
Finally got around to reviewing this and it’s surprisingly efficient. I’ve considered myself a pretty advanced Basher for a while and admit learning better technique from this. More specifically, I was unaware of shuf after years and years of Bash scripting. Cheers!

That’s the one that made me make this script. I thought, surely this is a one-liner? And indeed, except that I kept adding features. I think I’m going to change my shell function back to the one-liner though. All of that extra complexity is unnecessary.
Looks nice - though I don’t feel great about the 2n solution to apostrophes. You could just as well end up with 2n words with apostrophes, no? It’s not particularly robust.
With n=6, and only grabbing n words, you have roughly an 88.24% chance of getting at least one word with an apostrophe (1 - 0.7^6 ≈ 0.8824, at a 30% apostrophe rate), i.e. you can’t generate a valid passphrase. With n=6, and grabbing 2n words, you have roughly a 3.86% chance of getting at least 7 words with an apostrophe (the binomial tail for 12 draws at p = 0.3), i.e. you can’t generate a valid passphrase. That’s more than a 1-in-30 failure rate!
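A quick sanity check of those figures, treating each draw as an independent 30% chance of an apostrophe:

awk 'BEGIN { printf "%.4f\n", 1 - 0.7^6 }'   # P(at least 1 of 6) = 0.8824
awk 'BEGIN {
    p = 0.3; n = 12; tail = 0
    for (k = 7; k <= n; k++) {
        c = 1
        for (i = 0; i < k; i++) c *= (n - i) / (i + 1)   # C(n,k)
        tail += c * p^k * (1 - p)^(n - k)
    }
    printf "%.4f\n", tail   # P(at least 7 of 12) = 0.0386
}'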
If every apostrophed word has a non-apostrophe pair, as you say, then perhaps better practice would be to keep the dictionary in order and generate a (somewhat) random number as an index to grab. If the word at that index has an apostrophe, grab the next/previous word (i.e., its pair).
Might be trickier to fit into exactly 27 lines, but at least it would be robust?
Edit: in hindsight, since the dictionary is required for this to function in the first place, you could just pre-prune it to remove the apostrophe words.
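That pre-pruning is a one-time grep (a sketch; the cache path here is arbitrary):

grep -v "'" /usr/share/dict/american-english > ~/.local/share/pony-words
shuf -n 6 ~/.local/share/pony-words | sed 's/^\(\w\)/\U\1/' | xargs echo | tr -d ' '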
I don’t feel great about the 2n solution to apostrophes. You could just as well end up with 2n words with apostrophes, no? It’s not particularly robust.
It doesn’t matter - the algorithm takes the stems, it doesn’t drop the words. “Dad’s” becomes “Dad”. If you get both “Dad’s” and “Dad”, you might indeed get a passphrase containing “DadDad” - but that’s not a weakness. Good randomness doesn’t include a guarantee of no duplicates. In fact, the uniq call reduces the quality of the passphrase: “DadDadDadDadDadDad” is a perfectly good phrase.

But it’s a good catch in another way: I’d considered only plurals and possessives, but the American dictionary word file does indeed include many words with more than one apostrophe suffix. No word of more than one letter appears more than 5 times, so 5n would guarantee enough different words. But the best thing about your comment is that it exposes another weakness: the dictionary contains several 1-letter “words”, and one of them - “O” - has 25 variations with apostrophes. They’re all names: “O’Connell”, “O’Keefe”, etc. The next largest is “L” with 8 variations: all borrowed words from French, such as “L’Amour”.
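You can check those counts yourself by tallying the stems of the apostrophed entries (a quick sketch; the top of the list should be “O” with its 25 variations):

sed -n "s/'.*//p" /usr/share/dict/american-english | sort | uniq -c | sort -rn | head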
I don’t see a simple solution to excluding names, although a tweak could ensure that we get no single letter words. However, maybe simplifying the algorithm would be better: simply grab N words and delete any apostrophes. You might end up with mush like “OBrianMustveHed”, but perhaps that’s not a bad thing.
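A sketch of that simplification:

shuf -n 6 /usr/share/dict/american-english | sed 's/^\(\w\)/\U\1/' | tr -d "'" | xargs echo | tr -d ' '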
Perhaps the best implementation would be the simplest:
alias pony="shuf -n 6 /usr/share/dict/american-english | xargs echo | tr -d ' '"
Leave in the apostrophes; more random bits. Leave in the spaces, if they’re legal characters in the authentication program, and you get even more.
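Keeping the spaces is just a matter of dropping the final tr:

alias pony="shuf -n 6 /usr/share/dict/american-english | xargs echo"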
Aaaaah I totally misunderstood why you were taking 2n. You were taking 2n in case the truncated string was the same as one you already had. Makes more sense now.
I opened this in a browser tab on my phone so that I can remember to review on my computer when I get home. Cheers for sharing.