This is my second day on C. I'm still trying to familiarize myself with how memory works!
Here's the code I wrote to split a string into an array; it does its job pretty well:
char* stringCutterOneDArr(char *str, int *len){
    if(strlen(str) == 0) {
        printf("No string is inputted.\n");
        return "";
    }
    char *piece = strtok(str, " ");
    int startFrom = 0;
    int arrLen = 0;
    char *arr = (char* )malloc(sizeof(char) * strlen(piece));
    if(arr == NULL) {
        printf("Memory allocation failed.");
        return "";
    }
    /* I'll comment this later for the example */
    strcpy(arr, piece);
    startFrom += strlen(piece) + 1;
    arrLen++;
    piece = strtok(NULL, " ");
    while(piece != NULL) {
        // Reallocate memory before assigning string.
        char *newarr = (char* )realloc(arr, startFrom + sizeof(char) * strlen(piece));
        if(newarr == NULL) {
            printf("Memory allocation failed.");
            return "";
        }
        arr = newarr;
        // Assign the string to the newly allocated memory
        strcpy(arr + startFrom, piece);
        startFrom += strlen(piece) + 1;
        arrLen++;
        // Refetch the next piece if there's any
        piece = strtok(NULL, " ");
    }
    *len = arrLen;
    return arr;
}
void stringSplitOneDArr() {
    char str[] = "Let's break this string apart";
    int stringStart = 0;
    int len;
    char *string = stringCutterOneDArr(str, &len);
    for(int i = 0; i < len; i++) {
        char *selectedString = string + stringStart;
        printf("%s\n", selectedString);
        stringStart += strlen(selectedString) + 1;
    }
}
This code produces the expected output:
❯ runc
Let's
break
this
string
apart
However, if I comment out the first strcpy that I highlighted above, it only prints \n five times (indicating that the array is empty). The output I expected after commenting out the first strcpy was:
❯ runc
break
this
string
apart
I expected it to skip the first token, even though the memory for it is already allocated, and then print the rest as normal. Instead, none of the five strings are in the array. Why does this happen?
Best Answer
Short answer:
A +1 is missing in malloc(sizeof(char) * strlen(piece)) and in realloc(arr, startFrom + strlen(piece));.
strcpy(arr + startFrom, piece); appends each additional token to arr, but if you skip strcpy(arr, piece);, the start of arr is left uninitialized. Reading it is undefined behavior, which often shows up as empty lines or garbage values when the strings are printed later.
Explanation:
Let me explain exactly what the problem is and how your code worked in the first place, because my first thought was that it should not have worked at all. Let's break it down.
A +1 is missing in malloc(sizeof(char) * strlen(piece)) and in realloc(arr, startFrom + strlen(piece));. You reserve memory for the characters of each token but forget to allocate space for the null terminator (\0), so the strings are not guaranteed to be terminated properly. You then print each token with printf("%s\n", selectedString);, which keeps reading until it finds a \0, first for the first token, then for the second, and so on.
So the question is: how did it work without the +1, as it did before you commented out the line?
The answer is luck. Here is what happened.
First you called malloc(sizeof(char) * strlen(piece)) with a size of 5, which is exactly "Let's" without the \0.
Then you copied with strcpy(arr, piece);, which copies every character of the source, including the terminating \0, to the destination. As a result you copied "Let's\0" into arr and wrote out of bounds: 6 bytes into a 5-byte allocation.
[Image: memory layout of arr after the first strcpy]
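The size mismatch described above can be checked directly. This is only a minimal sketch: bytes_needed and copy_token are hypothetical helper names that do not appear in the question, and the only point is that a token of length n needs n + 1 bytes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of bytes strcpy() writes for s,
 * that is, its characters plus the terminating '\0'. */
size_t bytes_needed(const char *s) {
    assert(s != NULL);
    return strlen(s) + 1;
}

/* Hypothetical helper: heap copy of s with room for the '\0'.
 * Returns NULL on allocation failure; the caller frees the result. */
char *copy_token(const char *s) {
    char *out = malloc(bytes_needed(s));
    if (out == NULL) {
        return NULL;
    }
    strcpy(out, s);  /* stays inside the allocation this time */
    return out;
}
```

For "Let's", strlen reports 5 while bytes_needed reports 6, which is the byte the original malloc call leaves out.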
Then you reallocated to add the next token, "break". You placed it at offset 6, not 5, which leaves one byte after "Let's" for the terminator, but the requested size (6 + 5 = 11 bytes) again has no room for the \0 of "break", so this strcpy also writes one byte past the end of the allocation.
[Image: memory layout of arr after the realloc and the second strcpy]
As you can see, the \0 after "Let's" was lost, because that byte was never reserved: realloc is only required to preserve the 5 bytes that were originally allocated. The same thing happens for every token, so the result is each token followed by one undefined byte.
The final arr that you return from the function looks like this:
[Image: final contents of arr]
Each (cd) is an undefined byte, and the (00) at the end lies outside the allocation; you may lose it if another allocation in your program takes that byte.
In the end, printf("%s\n", selectedString); only cuts the strings at the right places by luck, and by luck the (00) was still in memory. That is how it was working.
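To make the layout concrete, the offsets can be worked out by hand or with a few lines of code. The sketch below uses a hypothetical helper, packed_offsets, that is not part of the question; it applies the same strlen + 1 step as startFrom and reports how many bytes a correctly sized buffer would need.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores in offsets[i] where token i starts in a packed
 * buffer in which every token is followed by its own '\0'.
 * Returns the total number of bytes such a buffer needs. */
size_t packed_offsets(const char *const tokens[], size_t count, size_t offsets[]) {
    assert(tokens != NULL && offsets != NULL);
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = pos;
        pos += strlen(tokens[i]) + 1;  /* same step as startFrom in the question */
    }
    return pos;
}
```

For the five tokens of "Let's break this string apart" this gives offsets 0, 6, 12, 17 and 24 and a total of 30 bytes. The last realloc in the question asks for 24 + 5 = 29 bytes, so the final \0 always lands one byte past the end.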
Why did it stop working after you commented out the strcpy? Skipping strcpy(arr, piece); leaves the first 5 bytes of arr uninitialized. If those bytes happen to be zero, as they apparently were here, printf("%s\n", selectedString); sees an empty string at each of them, and because the loop then advances by only strlen(selectedString) + 1 = 1 byte, its five iterations never get past them.
To prove it, change the loop condition from i < len to i < 10 and you should see the empty lines followed by your strings.
In short, strcpy(arr, piece); is what initializes the start of arr, and leaving it uninitialized leads to the behavior you got, on top of the missing +1 in the allocations.
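Putting both points together, one possible corrected version could look like the sketch below. It is not the only way to fix the code: the name split_packed is made up for illustration, and returning NULL on failure (instead of "", which the caller cannot pass to free) is a design choice of this sketch rather than something from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a corrected splitter. Returns a heap buffer with the tokens packed
 * back to back, each followed by '\0', and stores the token count in *len.
 * Returns NULL (with *len == 0) when there are no tokens or allocation fails.
 * Like the original, it modifies str through strtok(). */
char *split_packed(char *str, int *len) {
    assert(str != NULL && len != NULL);
    char *arr = NULL;
    size_t used = 0;
    *len = 0;

    for (char *piece = strtok(str, " "); piece != NULL; piece = strtok(NULL, " ")) {
        size_t need = strlen(piece) + 1;          /* +1 for the terminator */
        char *grown = realloc(arr, used + need);  /* realloc(NULL, n) acts like malloc(n) */
        if (grown == NULL) {
            free(arr);                            /* do not leak the old buffer */
            *len = 0;
            return NULL;
        }
        arr = grown;
        memcpy(arr + used, piece, need);          /* copies the '\0' as well */
        used += need;
        (*len)++;
    }
    return arr;
}
```

The printing loop from the question works unchanged with this buffer, since every token now has its own terminator inside the allocation. The caller should free the returned pointer once it is done with it.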