If you pay a special sort of attention, you may notice that the upper-case and lower-case alphabet characters on the ASCII table are always exactly offset from each other by 32.

For example, upper-case A is 65, while lower-case a is 97.

Because of the way binary place values work out, you can manipulate the case of a letter by setting or clearing its sixth bit, that is, bit 5 (2^5 = 32; we start counting bits and everything else at 0, we’re programmers!).

Decimal 32 expressed as hexadecimal is 0x20, and expressed as binary is 00100000.
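To see the offset at the bit level, here is a minimal sketch. The helper names `format_bits` and `case_difference` are made up for illustration, and the code assumes an ASCII-compatible character set.

```c
/* Hypothetical helper: write the 8 bits of value into out as '0'/'1'
 * characters, most significant bit first, followed by a terminating NUL. */
void format_bits(unsigned char value, char out[9])
{
    for (int bit = 7; bit >= 0; bit--)
        out[7 - bit] = ((value >> bit) & 1) ? '1' : '0';
    out[8] = '\0';
}

/* XOR leaves a 1 only where the two inputs differ, so for the two cases
 * of the same ASCII letter this returns 0x20 (binary 00100000). */
int case_difference(char upper, char lower)
{
    return upper ^ lower;
}
```

Formatting 'A' gives 01000001 and formatting 'a' gives 01100001: the two differ only in bit 5, which is why `case_difference('A', 'a')` is 0x20.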

This binary form is useful for clearing, setting or toggling bits:

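| Operation | Operator | Mask bit | Effect on target bit |
|-----------|----------|----------|----------------------|
| SET       | OR (\|)  | 1        | Forces bit to 1      |
| CLEAR     | AND (&)  | 0        | Forces bit to 0      |
| TOGGLE    | XOR (^)  | 1        | Flips bit (0→1, 1→0) |
| NO-OP     | AND (&)  | 1        | Leaves bit unchanged |
| NO-OP     | OR (\|)  | 0        | Leaves bit unchanged |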
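The set, clear and toggle rows translate almost directly into C. This is only a sketch: the function names and the `CASE_BIT` macro are hypothetical, chosen for this illustration. Note that CLEAR inverts the mask first, so the bit to clear meets a 0 and every other bit meets a 1 (the AND no-op case).

```c
#define CASE_BIT 0x20u /* binary 00100000, i.e. bit 5 */

/* SET: OR with a 1 in the mask forces that bit to 1. */
unsigned char set_bits(unsigned char value, unsigned char mask)
{
    return (unsigned char)(value | mask);
}

/* CLEAR: AND with the inverted mask forces the masked bits to 0
 * and leaves all other bits unchanged. */
unsigned char clear_bits(unsigned char value, unsigned char mask)
{
    return (unsigned char)(value & ~mask);
}

/* TOGGLE: XOR with a 1 in the mask flips that bit. */
unsigned char toggle_bits(unsigned char value, unsigned char mask)
{
    return (unsigned char)(value ^ mask);
}
```

With 0x41 ('A') as input, `set_bits(0x41, CASE_BIT)` and `toggle_bits(0x41, CASE_BIT)` both give 0x61 ('a'), while `clear_bits(0x41, CASE_BIT)` leaves 0x41 untouched because the bit is already 0.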

Read up on logic gates, particularly the truth tables section, for this to make more sense.

What we’re building to here is the understanding that it’s incredibly easy and efficient to lower-case, upper-case or toggle the case of a letter using bitwise operations.

A little demo in C:

#include <stdio.h>

char letter_togglecase (char a)
{
        return a ^ 0x20;
}

char letter_lowercase (char a)
{
        return a | 0x20;
}

char letter_uppercase (char a)
{
        return a & ~ 0x20;
}

int main (int argc, char **argv)
{
        char my_char = 'A';

        printf ("%c\n", letter_lowercase (my_char));
        printf ("%c\n", letter_togglecase (my_char));

        printf ("%c\n", letter_uppercase ('b'));

        return 0;
}

Compile and run the program to see the functions work…

$ clang -O0 -g -o charcase charcase.c
$ ./charcase
a
a
B
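One caveat: the bit trick is only meaningful for the letters A-Z and a-z. Applied to other characters it silently produces something else; for example, '@' (64) with bit 5 set becomes '`' (96), and '{' (123) with bit 5 cleared becomes '[' (91). A guarded variant might look like the sketch below. The `safe_*` and `is_ascii_*` names are hypothetical, and the range checks assume an ASCII-compatible character set.

```c
/* Range checks that assume ASCII, where A-Z and a-z are contiguous. */
static int is_ascii_upper(char c) { return c >= 'A' && c <= 'Z'; }
static int is_ascii_lower(char c) { return c >= 'a' && c <= 'z'; }

/* Set bit 5 only when the input is an upper-case letter. */
char safe_lowercase(char c)
{
    return is_ascii_upper(c) ? (char)(c | 0x20) : c;
}

/* Clear bit 5 only when the input is a lower-case letter. */
char safe_uppercase(char c)
{
    return is_ascii_lower(c) ? (char)(c & ~0x20) : c;
}

/* Flip bit 5 only when the input is a letter of either case. */
char safe_togglecase(char c)
{
    return (is_ascii_upper(c) || is_ascii_lower(c)) ? (char)(c ^ 0x20) : c;
}
```

The extra comparisons cost a little of the raw simplicity, but non-letters such as digits and punctuation now pass through unchanged.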

…and as you can imagine, we’re going to peek at this through a debugger as well.

% clang -O0 -g -o charcase charcase.c
% lldb ./charcase 
(lldb) target create "./charcase"
Current executable set to './charcase' (x86_64).
(lldb) b letter_togglecase
Breakpoint 1: where = charcase`letter_togglecase + 10 at charcase.c:5:16, address = 0x000000000020167a
(lldb) run
Process 15598 launched: './charcase' (x86_64)
a
Process 15598 stopped
* thread #1, name = 'charcase', stop reason = breakpoint 1.1
    frame #0: 0x000000000020167a charcase`letter_togglecase(a='A') at charcase.c:5:16
   2   	
   3   	char letter_togglecase (char a)
   4   	{
-> 5   	        return a ^ 0x20;
   6   	}
   7   	
   8   	char letter_lowercase (char a)
(lldb) disassemble -b
charcase`letter_togglecase:
    0x201670 <+0>:  55           push   rbp
    0x201671 <+1>:  48 89 e5     mov    rbp, rsp
    0x201674 <+4>:  40 88 f8     mov    al, dil
    0x201677 <+7>:  88 45 ff     mov    byte ptr [rbp - 0x1], al
->  0x20167a <+10>: 0f be 45 ff  movsx  eax, byte ptr [rbp - 0x1]
    0x20167e <+14>: 83 f0 20     xor    eax, 0x20
    0x201681 <+17>: 5d           pop    rbp
    0x201682 <+18>: c3           ret    
(lldb) var -L
0x0000000820324e2f: (char) a = 'A'
(lldb) memory read -format b 0x0000000820324e2f -size 1 -count 1
0x820324e2f: 0b01000001
(lldb) memory read -format d 0x0000000820324e2f -size 1 -count 1
0x820324e2f: 65
(lldb) b 0x201681
Breakpoint 2: where = charcase`letter_togglecase + 17 at charcase.c:5:9, address = 0x0000000000201681
(lldb) continue 
Process 15598 resuming
Process 15598 stopped
* thread #1, name = 'charcase', stop reason = breakpoint 2.1
    frame #0: 0x0000000000201681 charcase`letter_togglecase(a='A') at charcase.c:5:9
   2   	
   3   	char letter_togglecase (char a)
   4   	{
-> 5   	        return a ^ 0x20;
   6   	}
   7   	
   8   	char letter_lowercase (char a)
(lldb) register read -format c al
      al = 'a'
(lldb) register read -format d al
      al = 97
(lldb) register read -format b al
      al = 0b01100001
(lldb) 

Note the xor 0x20 instruction in the function’s ASM:

push   rbp
mov    rbp, rsp
mov    al, dil
mov    byte ptr [rbp - 0x1], al
movsx  eax, byte ptr [rbp - 0x1]
xor    eax, 0x20
pop    rbp
ret    

eax is the full 32-bit register, whereas al is just its lowest byte, a subset of the same register. That is why the xor operates on eax while we read the result back from al.
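The same relationship can be modelled in C. In the expression `a ^ 0x20` the char is promoted to int before the XOR, which matches the movsx and xor on eax (char is signed on this x86_64 target, hence the sign extension), and returning a char keeps only the low byte, which is what al holds. The sketch below uses fixed-width types to make the widths explicit; `widen_and_xor` and `low_byte` are hypothetical names, not anything the compiler or lldb provides.

```c
#include <stdint.h>

/* Models: movsx eax, byte ptr [...]  followed by  xor eax, 0x20 */
uint32_t widen_and_xor(int8_t a)
{
    int32_t eax = a; /* sign-extend 8 bits to 32 bits, like movsx */
    eax ^= 0x20;     /* flip bit 5 of the 32-bit value */
    return (uint32_t)eax;
}

/* Models reading al: the lowest 8 bits of the 32-bit register. */
uint8_t low_byte(uint32_t eax)
{
    return (uint8_t)(eax & 0xFFu);
}
```

For 'A' (65), `widen_and_xor` yields 0x00000061 and `low_byte` of that is 0x61, the 97 that `register read -format d al` reported above.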