- cross-posted to:
- rust
- cross-posted to:
- rust
One of the digits of your credit card number is not like the rest of them: it’s purpose is to check for value correctness, this way any website or form knows what if checksum matches - the value was typed in correctly. Or you are lucky with a typo because it’s not a very good checksum - it was designed to be easy to calculate on a mechanical device and been sticking around mostly for historical reasons.
To check if a number is valid according to Luhn checksum you would go from right to left, double every second digit, add all the digits together, if result ends with 0 - checksum matches, if it ends with some other value - it does not.
For example let’s check the number 1594: write down the number as a bunch of digits
1 5 9 4
double every second digit from the right
2 5 18 4
add all the digits
2 + 5 + 1 + 8 + 4 = 20
ends with 0, so checksum is valid
Three key optimizations help to calculate it fast:
- You can split longer sums into short ones
- You can skip second splitting into digits by doing multiplication as usual and adding number of digits above 5 to the total
- SWAR can to perform a bunch of operations on individual digits at once
To illustrate first optimization let’s calculate 1 5 9 4
sum in two parts
1 5 => 2 5 => 2 + 5 = 7
9 4 => 18 4 => 1 + 8 + 4 = 13
7 + 13 = 20 - checksum is valid
to illustrate the second one
// digits themselves
1 5 9 4 => 2 5 18 4 => 2 + 5 + 18 + 4 = 29
// correction
0 0 1 0 => 0 + 0 + 1 + 0 = 1
total: 29 + 1 = 30
Result is off by 10, but for checksum validity purposes it’s good enough.
Last trick comes from doing SWAR and skipping extracting digits in the first place.
User input comes as an ASCII string “1594”, which we split into chunks of size 8 to fit into 64 bit register and pad with “0” from the right:
"1594" => "00001594"
// subtract 0x3030303030303030 ("00000000")
// to get decimal values
"00001594" - "00000000" => [0, 0, 0, 0, 1, 5, 9, 4]
At this point it is easy to check if string consists of only decimal digits just by adding 0x4646464646464646
and looking for overflows and underflows in both results - can be done for both at once
Multiplying every other digit by 2 and adding them together is done with a regular multiplication by 0x0201020102010201
, and math will do the rest:
A B
* 1 2
----------
2A 2B
A B
Top byte of the lower half of the result will contain 2A + B
for 2 digit case or the full result for all 8 digits in the actual code. Because all the digits are below 10 - there’s no overflow from earlier digits.
Whole code looks like this:
fn fold10_swar(mask1: u64, mask2: u64, raw: &[u8]) -> Option<u64> {
let mut sum = 0;
for c in raw.rchunks(8) {
let mut buf = [b'0'; 8];
copy_from_small_slice(&mut buf, c);
let mut v = u64::from_le_bytes(buf);
// try to overflow value up
let a = v.wrapping_add(0x4646464646464646);
// and down
v = v.wrapping_sub(0x3030303030303030);
// if either direction overflows - there are non 0..9 digits so
// checksum can't possibly be valid, otherwise all the values are digits
if (a | v) & 0x8080808080808080 == 0 {
// Calculate number of digits above 5 located at positions that would double
// Doubling them would result to values above 9 which will necessitate subtracting
// 9. But in mod 10 arithmetic -9 and +1 is the same so simply adding
// count of such digits is enough
sum += u64::from((mask2.wrapping_sub(v) & 0x8080808080808080).count_ones());
sum += v.wrapping_mul(mask1) >> 56;
} else {
return None;
}
}
Some(sum)
}
And good news - luhn3
crate can check the correctness or calculate a digit to use as a checksum for you and it works somewhat fast - on a 5 year old processor when compiled with target-cpu=native
it can verify a 16 digit credit card checksum in about 3 nanoseconds - in 12 or so CPU cycles, benchmarks included.
You’ve got me wondering exactly how the mechanical version works . There’s probably a Youtube video in it, for anyone reading who wants ideas.
https://patents.google.com/patent/US2950048A/en - more images and explanations here
Yes, that’s where I got that image from. My curiosity is at the level of “I’d happily watch/read a clear explanation, preferably with animation” but not at “let’s wade through this patent that’s not written to help people understand”.
I’m already reading RFCs with my free time, I’ve got to draw the line somewhere!