C Programming – How to Fix String Function Breakdowns

c++

This is my second day with C, and I'm still trying to familiarize myself with how memory works.
Here's the code I wrote to split a string into an array; it does its job pretty well:

char* stringCutterOneDArr(char *str, int *len){   

    if(strlen(str) == 0) {
        printf("No string is inputted.\n");
        return "";
    }

    char *piece = strtok(str, " ");

    int startFrom = 0;
    int arrLen = 0;
    char *arr = (char* )malloc(sizeof(char) * strlen(piece));

    if(arr == NULL) {
        printf("Memory allocation failed.");
        return "";
    }

    /* I'll comment this later for the example */
    strcpy(arr, piece);


    startFrom += strlen(piece) + 1;
    arrLen++;

    piece = strtok(NULL, " ");

    while(piece != NULL) {

        // Reallocate memory before assigning string.
        char *newarr = (char* )realloc(arr, startFrom + sizeof(char) * strlen(piece)); 
        if(newarr == NULL) {
            printf("Memory allocation failed.");
            return "";
        }
        arr = newarr;

        // Assing the string to the newly allocated memory
        strcpy(arr + startFrom, piece);
        startFrom += strlen(piece) + 1;
        arrLen++;

        // Refetch the next change if there's any
        piece = strtok(NULL, " ");
    }

    *len = arrLen;

    return arr;
}

void stringSplitOneDArr() {
    char str[] = "Let's break this string apart";
    int stringStart = 0;
    int len;
    
    char *string = stringCutterOneDArr(str, &len);

    for(int i = 0; i < len; i++) {
        char *selectedString = string + stringStart;
        printf("%s\n", selectedString);

        stringStart += strlen(selectedString) + 1; 
    }
    
}

This code produces the expected output:

❯ runc 
Let's
break
this
string
apart

However, if I comment out the first strcpy (the one marked with a comment above), it only prints \n five times, which suggests the array is empty. The output I expected after commenting out that strcpy was:

❯ runc 

break
this
string
apart 

That is, I expected it to skip the first string, since the memory for it is already allocated, and then print the rest as normal. Instead, none of the five strings appear to be in the array. Why did this happen?

Best Answer

Short answer:

  • The first problem is the missing +1 in malloc(sizeof(char) * strlen(piece)) and realloc(arr, startFrom + strlen(piece)).
  • The second problem is that each subsequent call to strcpy(arr + startFrom, piece); places another token further along in arr. If you skip strcpy(arr, piece);, the start of arr is left uninitialized. Reading it is undefined behavior, which often shows up as empty lines or garbage values when the strings are printed later.

Explanation:

Let me explain exactly what the problem is, and also how your code worked in the first place, because strictly speaking it should not have worked at all. Let's break it down:

  • The first and most important problem is the missing +1 in malloc(sizeof(char) * strlen(piece)) and realloc(arr, startFrom + strlen(piece)).

You are reserving memory for your string but not for its null terminator (\0), so the terminator always lands outside the space you asked for and the string is not reliably terminated. You then print like this:

printf("%s\n", selectedString);

printf with %s keeps reading until it finds a \0, and that is how it finds the end of the first token, the second one, and so on. But the question is:

How did it work in the first place without the +1, given that it printed correctly before you commented out the strcpy?

The answer is luck: the behavior is undefined, and it happened to match what you wanted. Let me show you what happened.

First you called malloc(sizeof(char) * strlen(piece)) with a size of 5, which is exactly "Let's" without the \0. Then you copied with strcpy(arr, piece);, which copies every character of the source up to and including its \0.

As a result, you copied "Let's\0" into arr and overflowed the buffer: you wrote 6 bytes into 5 bytes of allocated memory.
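The size mismatch can be checked directly. The sketch below is only an illustration: bytesWrittenByStrcpy and copyToken are hypothetical helper names that do not appear in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Number of bytes strcpy writes for s: its characters plus the '\0'. */
size_t bytesWrittenByStrcpy(const char *s) {
    return strlen(s) + 1;
}

/* Hypothetical helper: heap copy of piece, sized for the terminator too.
 * Returns NULL if the allocation fails; the caller frees the result. */
char *copyToken(const char *piece) {
    assert(piece != NULL);
    size_t size = bytesWrittenByStrcpy(piece);
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, piece, size);  /* copies the '\0' as well */
    }
    return copy;
}
```

For "Let's", strlen reports 5 while bytesWrittenByStrcpy reports 6, which is the one byte the original malloc leaves out.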

Then you reallocated to add the next token, "break", but you wrote it starting at offset 6, not 5. That leaves one byte after "Let's" that realloc does not promise to preserve, since it was never part of the 5-byte block. The new size, 6 + 5 = 11 bytes, again has no room for a terminator, so the strcpy writes one byte past the end once more.

So the \0 after "Let's" can be lost, because it was never in reserved memory. The same thing happens for every token, so the result is each token followed by one undefined byte.

The final arr that you return from the function is this:

(4c 65 74 27 73) cd (62 72 65 61 6b) cd (74 68 69 73)
cd (73 74 72 69 6e 67) cd (61 70 61 72 74) 00

Each cd is an undefined byte, and the 00 at the end lies outside the allocation. You may lose it if another allocation in your program ends up using that byte.
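For comparison, a correctly terminated buffer shows 00 in the separator positions. The helper below is a minimal sketch of how such a dump could be produced; hexDump is a hypothetical name, not something from the question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the first count bytes of buf into out as two-digit hex values
 * separated by spaces. out must hold at least 3 * count bytes, and at
 * least 1 byte when count is 0. */
void hexDump(const void *buf, size_t count, char *out) {
    assert(buf != NULL && out != NULL);
    const unsigned char *bytes = buf;
    out[0] = '\0';
    for (size_t i = 0; i < count; i++) {
        out += sprintf(out, i ? " %02x" : "%02x", (unsigned)bytes[i]);
    }
}
```

Dumping the 12 bytes of the literal "Let's\0break" this way gives the same token bytes as above, with 00 after each token instead of cd.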

Finally, printf("%s\n", selectedString); only stops at a \0. It printed one token per line because, by luck, the undefined byte after each token acted as a terminator on your run, and by luck the final 00 was still intact.

So this answers the question of how it was working.

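How did it stop working after you commented out the strcpy? Skipping strcpy(arr, piece); leaves the first 5 bytes of arr uninitialized. The loop in stringSplitOneDArr finds each string with strlen, so when those bytes read as terminators, every one of them counts as an empty string, and printf("%s\n", selectedString); prints five empty lines before the loop ever reaches "break".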

To prove it, change the loop condition from i < len to i < 10 and you will see the 5 empty lines and then your strings.
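To make the effect repeatable, the sketch below rebuilds the buffer with calloc so the skipped bytes are guaranteed to be zero. That is an assumption for illustration only: in the question's code those bytes are uninitialized and could hold anything. buildWithoutFirstCopy and countEmptyStrings are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Lays out the five tokens the way the question does, but never copies
 * the first one. calloc zero-fills the buffer, standing in for
 * uninitialized bytes that happen to be zero. Caller frees the result. */
char *buildWithoutFirstCopy(void) {
    const char *tokens[] = {"Let's", "break", "this", "string", "apart"};
    size_t total = 0;
    for (int i = 0; i < 5; i++) {
        total += strlen(tokens[i]) + 1;        /* +1 per terminator */
    }
    char *buf = calloc(total, 1);
    if (buf == NULL) {
        return NULL;
    }
    size_t offset = strlen(tokens[0]) + 1;     /* slot reserved, never filled */
    for (int i = 1; i < 5; i++) {
        strcpy(buf + offset, tokens[i]);
        offset += strlen(tokens[i]) + 1;
    }
    return buf;
}

/* Same walk as the loop in stringSplitOneDArr: counts how many of the
 * first len strings come out empty. */
int countEmptyStrings(const char *buf, int len) {
    assert(buf != NULL);
    int empty = 0;
    size_t start = 0;
    for (int i = 0; i < len; i++) {
        size_t n = strlen(buf + start);
        if (n == 0) {
            empty++;
        }
        start += n + 1;                        /* an empty string advances 1 byte */
    }
    return empty;
}
```

With len set to 5, every iteration lands inside the unfilled first slot, so all five strings are empty even though "break" sits intact at offset 6.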

In short, strcpy(arr, piece); is what initializes the start of arr, and leaving it uninitialized leads to the behavior you got, on top of the missing +1 in the allocations.
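Putting both fixes together, a corrected version could look roughly like the sketch below. It is one possible rewrite, not the only one: splitPacked and packedAt are hypothetical names, and returning NULL instead of "" on failure is my own choice so the caller can tell failure apart from a result and can always pass the return value to free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Packs the space-separated tokens of str back to back, each followed by
 * its own '\0'. Stores the token count in *len. Returns NULL when there
 * are no tokens or when allocation fails. strtok modifies str in place. */
char *splitPacked(char *str, int *len) {
    assert(str != NULL && len != NULL);
    *len = 0;

    char *arr = NULL;
    size_t used = 0;                          /* bytes written so far */

    for (char *piece = strtok(str, " "); piece != NULL;
         piece = strtok(NULL, " ")) {
        size_t need = strlen(piece) + 1;      /* +1 for the terminator */
        char *grown = realloc(arr, used + need);
        if (grown == NULL) {
            free(arr);                        /* old block is still valid */
            *len = 0;
            return NULL;
        }
        arr = grown;
        memcpy(arr + used, piece, need);      /* copies the '\0' too */
        used += need;
        (*len)++;
    }
    return arr;
}

/* Returns the index-th packed string; the caller keeps index < len. */
const char *packedAt(const char *arr, int index) {
    while (index-- > 0) {
        arr += strlen(arr) + 1;
    }
    return arr;
}
```

Because every token is written in the same loop with room for its own terminator, there is no separate first strcpy to forget, and no byte in the buffer is left uninitialized. The caller frees the returned pointer when done.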