
String

Introduction

There is no separate string type in C. Strings are treated as arrays of characters, i.e. arrays of type char. For example, the string "Hello" is treated as an array {'H', 'e', 'l', 'l', 'o'}.

The compiler allocates a contiguous block of memory for the array, so all the characters are stored in adjacent memory cells. At the end of the string, C automatically adds a byte whose bits are all 0, written as the \0 character, to mark the end of the string. The character \0 differs from the character 0: the ASCII code of the former is 0 (binary 00000000), while the ASCII code of the latter is 48 (binary 00110000). So the string "Hello" is actually stored as the array {'H', 'e', 'l', 'l', 'o', '\0'}.

The last character of every string is \0. The advantage is that C does not need to know the length of a string in advance to read it from memory: it reads the characters one by one, and as soon as it finds \0, it knows the string has ended.
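To see the terminator in memory, the sketch below prints the numeric value of each byte of an array initialized from "Hello". print_bytes is a hypothetical helper written only for this illustration, and the values mentioned assume an ASCII character set.

```c
#include <stdio.h>

/* Hypothetical helper: print the numeric value of each of the first n
   bytes of a char array, including the terminating '\0'. */
void print_bytes(const char *a, size_t n) {
    for (size_t i = 0; i < n; i++)
        printf("%d ", a[i]);
    putchar('\n');
}
```

For example, with char h[] = "Hello"; the call print_bytes(h, sizeof h) prints 72 101 108 108 111 0. The final 0 is the terminator \0, whereas the character '0' would print as 48.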

char localString[10];

The example above declares a 10-element character array that can be used as a string. Since one position must be reserved for \0, it can hold a string of at most 9 characters.

Writing strings as arrays is cumbersome, so C provides a shorthand: characters enclosed in double quotes are automatically treated as a character array.

{'H', 'e', 'l', 'l', 'o', '\0'}

// Equivalent to
"Hello"

The two forms above are equivalent and are stored internally in the same way. For a string in double quotes, you don't need to add the terminating \0 yourself; C adds it automatically.

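Note that double quotes denote strings and single quotes denote characters; the two are not interchangeable. If you put Hello inside single quotes, the compiler will complain (usually with a warning that the character constant is too long), and the result is not a string.

// wrong: not a string
'Hello'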

Conversely, even a single character inside double quotes (e.g. "a") is still a string, stored as 2 bytes ('a' followed by \0), not the character 'a' (stored as 1 byte).

If a string contains double quotes inside it, the double quotes need to be escaped using a backslash.

"She replied, \"It does.\""

Backslashes can also indicate other special characters, such as line breaks (\n), tabs (\t), etc.

"Hello, world!\n"

If a string is too long, you can split it across multiple lines by putting a backslash (\) at the end of each line where you want to break it.

"hello \
world"

In the example above, the backslash at the end of the first line splits the string into two lines.

One drawback of this style is that the second line must start at the very beginning of the line: any indentation would become part of the string. To solve this, C automatically merges adjacent string literals, as long as nothing but whitespace (spaces, tabs, or line breaks) separates them.

char greeting[50] = "Hello, " "how are you " "today!";
// Equivalent to
char greeting[50] = "Hello, how are you today!";

Because the literals may be separated by line breaks, this form also works for a string spread over several lines, and the continuation lines can be indented freely.

char greeting[50] = "Hello, "
                    "how are you "
                    "today!";
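To check the merging rule, the sketch below compares the split spelling with a single literal. The names split_greeting, whole_greeting and greetings_match are illustrative only.

```c
#include <string.h>

/* Adjacent string literals separated only by whitespace are joined
   into one literal at compile time, so both arrays get the same contents. */
static const char split_greeting[] = "Hello, "
                                     "how are you "
                                     "today!";
static const char whole_greeting[] = "Hello, how are you today!";

/* Return 1 if both spellings produced arrays of the same size and text. */
int greetings_match(void) {
    return sizeof split_greeting == sizeof whole_greeting
        && strcmp(split_greeting, whole_greeting) == 0;
}
```

Note that the spaces at the end of "Hello, " and "how are you " are needed: the compiler joins the literals exactly as written and inserts nothing between them.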

printf() uses the placeholder %s to output a string.

printf("%s\n", "hello world");

Declaration of string variables

A string variable can be declared as an array of characters, or as a pointer to an array of characters.

// Form 1
char s[14] = "Hello, world!";

// Form 2
char* s = "Hello, world!";

Both declarations above create a string variable s. In the first form, the length of the character array can be omitted, because the compiler can calculate it automatically.

char s[] = "Hello, world!";

In the example above, the compiler sets the length of the array s to 14, exactly enough to hold the string and its terminating \0.

The length of a character array can be greater than the actual length of the string.

char s[50] = "hello";

In the example above, the character array s has length 50, but the string "hello" occupies only 6 bytes (including the terminator \0), so the remaining 44 positions are initialised to \0.

The length of the character array should not be less than the actual length of the string.

char s[5] = "hello"; // no room for '\0'

In the example above, the array s has length 5, which is less than the 6 bytes that "hello" needs. C accepts this particular case (an array exactly one byte too short) and silently drops the terminator; C++ rejects it, and a string that is longer still draws a compiler diagnostic. Either way, s does not hold a valid string, and any string function applied to it will read past the end of the array looking for \0.
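Because string functions trust the \0, it can help to check that a fixed-size buffer actually contains one before treating it as a string. Below is a minimal sketch using the standard memchr() from string.h; the name has_terminator is hypothetical.

```c
#include <string.h>

/* Return 1 if a '\0' occurs within the first n bytes of buf, i.e. buf
   can safely be passed to strlen(), printf("%s"), and similar functions. */
int has_terminator(const char *buf, size_t n) {
    return memchr(buf, '\0', n) != NULL;
}
```

For example, an array filled with the five characters of hello and no room for \0 fails this check, while char ok[6] = "hello"; passes it.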

The two ways of declaring string variables, character pointers and character arrays, are basically equivalent, but there are two differences.

The first difference is that a pointer declared this way points to a string literal, which C treats as a constant: the string cannot be modified through the pointer.

char* s = "Hello, world!";
s[0] = 'z'; // wrong: undefined behavior

The code above declares a string variable through a pointer and then modifies the first character of the string. This is wrong: the behavior is undefined, and on many systems the program crashes at run time (for example with a segmentation fault).

If you declare a string variable using an array, you don't have this problem and can modify any member of the array.

char s[] = "Hello, world!";
s[0] = 'z';

Why can a string declared as a pointer not be modified, while one declared as an array can? The reason is that string literals are usually stored in a read-only area of memory. When the string is declared as a pointer, the pointer variable holds an address inside that read-only area, so the string cannot be changed through it. When it is declared as an array, the compiler allocates separate memory for the array and copies the characters of the literal into it one by one; that memory belongs to the array and can be modified.

To remind users that a string declared through a pointer must not be modified, the pointer can be declared with the const qualifier, which makes the string read-only.

const char* s = "Hello, world!";

With the const qualifier in the declaration above, any attempt to modify the string through s is rejected by the compiler.
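When the text does need to change, a common pattern is to keep the literal behind a const pointer and edit a copy stored in an array. The sketch below assumes this pattern; capitalize_first is a hypothetical helper, and toupper() comes from the standard header ctype.h.

```c
#include <ctype.h>

/* Hypothetical helper: upper-case the first character of a
   modifiable string in place. Empty strings are left unchanged. */
void capitalize_first(char *s) {
    if (s[0] != '\0')
        s[0] = (char)toupper((unsigned char)s[0]);
}
```

Calling it on char copy[] = "hello"; turns copy into "Hello". Passing a const char* such as const char* s = "hello"; instead is diagnosed by the compiler, because the parameter is not const.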

The second difference is that pointer variables can point to other strings.

char* s = "hello";
s = "world";

In the above example, the character pointer can point to another string.

However, a character array variable cannot point to another string.

char s[] = "hello";
s = "world"; // error reported

In the example above, the array name refers to the memory allocated for the array when it was declared; it cannot be changed to point to another string.

For the same reason, once a character array has been declared, you cannot assign a string to it directly.

char s[10];
s = "abc"; // error

In the example above, assigning a string directly to a character array variable is an error, because the array name is bound to its own memory and cannot be made to point to another address.

Why can't an array variable be assigned to another array? The reason is that the address of an array variable cannot be changed, or rather, once the compiler has assigned an address to an array variable, that address is bound to the array variable and the binding relationship remains unchanged.

To give the array new contents, use the standard library function strcpy() (declared in string.h) to copy a string into it. The array's address stays the same: strcpy() simply writes the new string into the existing memory, rather than making the array variable point to a new address.

char s[10];
strcpy(s, "abc");

In the example above, strcpy() copies the string "abc" into the array s. This function is described in detail later.

strlen()

The strlen() function returns the length in bytes of the string, excluding the null character \0 at the end. The prototype of this function is as follows.

// string.h
size_t strlen(const char* s);

Its argument is a string, and it returns an unsigned integer of type size_t. Unless the string is extremely long, the result can usually be treated as an int. Here is an example.

char* str = "hello";
int len = strlen(str); // 5

The prototype of strlen() is in the standard library header string.h, so that header must be included before the function is used.

#include <stdio.h>
#include <string.h>

int main(void) {
    char* s = "Hello, world!";
    printf("The string is %zu characters long.\n", strlen(s));
    return 0;
}

Note that the length of a string (strlen()) and the size of a string variable (sizeof()) are two different concepts.

char s[50] = "hello";
printf("%zu\n", strlen(s)); // 5
printf("%zu\n", sizeof(s)); // 50

In the above example, the string length is 5 and the string variable length is 50.

Without this function, you can compute the length yourself by counting characters until you reach the \0 at the end of the string.

int my_strlen(const char *s) {
    int count = 0;
    // Advance until the terminating '\0' is found
    while (s[count] != '\0')
        count++;
    return count;
}
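The same count can be obtained by walking a pointer instead of an index. The sketch below is a pointer-based variant; the name my_strlen_ptr is hypothetical, and it returns size_t to match strlen().

```c
#include <stddef.h>

/* Pointer-walking variant: advance p to the terminator; the distance
   from the start of the string is its length. */
size_t my_strlen_ptr(const char *s) {
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

Subtracting two pointers into the same array gives the number of elements between them, which is exactly the number of characters before \0.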

strcpy()

When copying a string, you cannot use the assignment operator to assign a string directly to a character array variable.

char str1[10];
char str2[10];

str1 = "abc"; // error reported
str2 = str1; // report an error

Both of the above attempts to copy a string are wrong, because the name of an array is a fixed address and cannot be changed to point to another address.

In the case of character pointers, the assignment operator (=) simply copies the address of one pointer to another, not the string.

char* s1;
char* s2;

s1 = "abc";
s2 = s1;

The code above compiles and runs, but the two pointer variables s1 and s2 end up pointing to the same string; the contents of s1 are not copied into s2.
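The difference between sharing an address and copying the contents shows up as soon as the original is modified. The sketch below is illustrative; alias_vs_copy is a hypothetical helper that reports what an alias and a copy see after the original array changes.

```c
#include <string.h>

/* Hypothetical helper contrasting pointer assignment (aliasing) with
   strcpy() (copying). After changing the original, it reports the
   first character seen through the alias and in the copy. */
void alias_vs_copy(char *alias_sees, char *copy_sees) {
    char original[] = "abc";
    char *alias = original;      /* same address, no new storage */
    char copy[sizeof original];  /* separate storage */
    strcpy(copy, original);

    original[0] = 'z';
    *alias_sees = alias[0];      /* 'z': the alias shares the storage */
    *copy_sees = copy[0];        /* 'a': the copy is independent */
}
```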

The C language provides the strcpy() function for copying the contents of one string to another, which is equivalent to string assignment. The prototype of this function is defined inside the string.h header file.

char* strcpy(char* dest, const char* source);

strcpy() takes two arguments: the destination character array and the source string. Before copying, you must make sure that the destination array is large enough to hold the whole source string, including its terminating \0. Otherwise no error is reported, but the copy overflows the bounds of the destination array and the results are unpredictable. The const qualifier on the second parameter indicates that the function does not modify the source string.

#include <stdio.h>
#include <string.h>

int main(void) {
    char s[] = "Hello, world!";
    char t[100];

    strcpy(t, s);

    t[0] = 'z';
    printf("%s\n", s); // "Hello, world!"
    printf("%s\n", t); // "zello, world!"
    return 0;
}

In the example above, a copy of the string in s is placed in t, giving two independent strings, so modifying one does not affect the other. Also, t is longer than s; the positions after the copied \0 are not initialised and hold indeterminate values.

strcpy() can also be used for character array assignment.

char str[10];
strcpy(str, "abcd");

The example above assigns the string "abcd" to a character array variable.

strcpy() returns its first argument, i.e. a string pointer (char*) to the destination.

char* s1 = "beast";
char s2[40] = "Be the best that you can be.";
char* ps;

ps = strcpy(s2 + 7, s1);

puts(s2); // Be the beast
puts(ps); // beast

In the example above, the string beast is copied into s2 starting at index 7, leaving the earlier characters unchanged. Everything that previously followed in s2 is cut off, because the null character at the end of beast is copied as well. strcpy() returns a pointer to the position where the copy begins, i.e. s2 + 7.

Another use of the strcpy() return value is to assign values to multiple character arrays in succession.

strcpy(str1, strcpy(str2, "abcd"));

The above example calls strcpy() twice to complete the assignment of two string variables.

Also, the first argument to strcpy() should be an array that has already been declared (or a pointer to memory that has been allocated), never a character pointer that has been declared but not initialized.

char* str;
strcpy(str, "hello world"); // error

The code above is faulty. strcpy() copies the string into the memory that str points to, but str is not initialized and points to an arbitrary location, so the string could be written anywhere in memory.
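One way to give a pointer real storage before copying is to allocate exactly strlen(src) + 1 bytes with malloc(). The sketch below assumes heap allocation is acceptable; dup_string is a hypothetical name (POSIX and C23 provide a similar strdup()), and the caller must free() the result.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate a buffer large enough for src plus its '\0', then copy.
   Returns NULL if the allocation fails; the caller must free() it. */
char *dup_string(const char *src) {
    char *p = malloc(strlen(src) + 1);
    if (p != NULL)
        strcpy(p, src);
    return p;
}
```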

If you don't use strcpy() and implement the string copy yourself, you can use the following code.

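#include <stdio.h>

// Named my_strcpy() so it does not clash with the standard strcpy()
char* my_strcpy(char* dest, const char* source) {
    char* ptr = dest;            // remember the start of dest
    while (*dest++ = *source++)  // copy one char; stop after copying '\0'
        ;
    return ptr;
}

int main(void) {
    char str[25];
    my_strcpy(str, "hello world");
    printf("%s\n", str);
    return 0;
}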

The key line in the code above is while (*dest++ = *source++). This loop copies each character of source to dest in turn, then moves both pointers to the next position. When the \0 at the end of source has been copied, the value of the assignment expression is 0, the loop condition becomes false, and the loop ends. The expression *dest++ is equivalent to *(dest++): it accesses the character at the current value of dest, and dest is incremented afterwards to point to the next position.

strcpy() is a security risk because it does not check whether the destination is large enough to hold a copy of the source string, so it can write past the end of the destination. If you cannot guarantee that no overflow will occur, it is recommended to use the strncpy() function instead.

strncpy()

strncpy() is used in exactly the same way as strcpy(), except that it takes a third argument to specify the maximum number of characters to be copied to prevent overflowing the bounds of the target string variable.

char* strncpy(
    char* dest,
    const char* src,
    size_t n
);

The third parameter n in the prototype above is the maximum number of characters to copy. If the end of the source string has not been reached after n characters, copying stops and the destination string has no terminating \0, which is important to note. If the source string is shorter than n characters, strncpy() copies it together with its \0 and then fills the remaining positions, up to n, with \0.

strncpy(str1, str2, sizeof(str1) - 1);
str1[sizeof(str1) - 1] = '\0';

In the example above, str2 is copied into str1, but at most sizeof(str1) - 1 characters are copied. The last byte of str1 is reserved for the terminating \0. This is necessary because strncpy() does not add \0 when it truncates, so if the copied fragment does not include the terminator, it must be added manually.
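The two-line idiom above can be wrapped in a small function so that it is not forgotten. Below is a sketch under the assumption that the caller passes the real size of the destination array; copy_bounded is a hypothetical name.

```c
#include <string.h>

/* Copy src into dst (an array of dst_size bytes), truncating if needed,
   and always leave dst null-terminated. Hypothetical helper. */
void copy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0)
        return;                      /* no room even for '\0' */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}
```

With char buf[6]; the call copy_bounded(buf, sizeof buf, "hello world") leaves "hello" in buf.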

strncpy() can also be used to copy parts of strings.

char s1[40];
char s2[12] = "hello world";

strncpy(s1, s2, 5);
s1[5] = '\0';

printf("%s\n", s1); // hello

The above example specifies that only the first 5 characters are copied.
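The padding and truncation behaviour of strncpy() can be observed by counting null bytes. In the C standard, a source shorter than n is followed by \0 bytes up to n, and a source of n characters or more leaves no \0 at all. count_nuls is a hypothetical helper for this check.

```c
#include <string.h>

/* Count how many of the first n bytes of buf are '\0'. */
size_t count_nuls(const char *buf, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (buf[i] == '\0')
            count++;
    return count;
}
```

For example, after strncpy(d, "ab", 8) on an 8-byte array d, positions 2 through 7 are all \0 (6 null bytes). After strncpy(e, "hello", 4) on a 4-byte array e, there are none, so e is not a string.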

strcat()

The strcat() function is used to concatenate strings. It accepts two strings as arguments and adds a copy of the second string to the end of the first. This function changes the first string, but leaves the second string unchanged.

The prototype of this function is defined inside the string.h header file.

char* strcat(char* s1, const char* s2);

The return value of strcat() is a string pointer to the first argument.

char s1[12] = "hello";
char s2[6] = "world";

strcat(s1, s2);
puts(s1); // "helloworld"

In the above example, after calling strcat(), you can see that the value of the string s1 has changed.

Note that the first argument to strcat() must be large enough to hold the second string appended to it. Otherwise, the concatenated string overflows the bounds of the first array and is written into adjacent memory, which is dangerous. It is recommended to use strncat(), described below, instead.
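If the destination size is known, the space check can be done before calling strcat(). This is a minimal sketch, assuming the caller passes the full size of the destination array; append_if_fits is a hypothetical name.

```c
#include <string.h>

/* Append src to the string in dst only if the result, including its
   '\0', fits in dst_size bytes. Returns 1 on success, 0 if it would
   overflow (dst is then left unchanged). */
int append_if_fits(char *dst, size_t dst_size, const char *src) {
    size_t used = strlen(dst);
    size_t add = strlen(src);
    if (used + add + 1 > dst_size)
        return 0;
    strcat(dst, src);
    return 1;
}
```

Unlike strncat(), this refuses the whole append instead of truncating it, which can be preferable when a partial result would be misleading.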

strncat()

strncat() concatenates two strings and is used in the same way as strcat(), except that it takes a third argument specifying the maximum number of characters to append. Appending stops as soon as that many characters have been added or the null character \0 at the end of the source string is reached, whichever comes first. Its prototype is defined in the string.h header file.

char* strncat(
    char* dest,
    const char* src,
    size_t n
);

strncat() returns the first argument, which is a pointer to the target string.

To make sure the result of the concatenation does not overflow the target array, strncat() is usually called as follows.

strncat(
str1,
str2,
sizeof(str1) - strlen(str1) - 1
);

strncat() always appends the null character \0 after the characters it adds, so the third argument must leave room for it: its maximum value is the size of the array str1, minus the length of the string already in str1, minus 1. Here is an example.

char s1[10] = "Monday";
char s2[8] = "Tuesday";

strncat(s1, s2, 3);
puts(s1); // "MondayTue"

In the example above, the array s1 has size 10 and already holds a 6-character string. 10 - 6 - 1 = 3, so at most three more characters can be appended to s1, and the result is MondayTue.
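The size calculation above can be wrapped in a helper so that it is written only once. This is a sketch, assuming dst already holds a valid string and dst_size is the full size of its array; append_bounded is a hypothetical name.

```c
#include <string.h>

/* Append as much of src as fits into dst, an array of dst_size bytes
   that already holds a string, using the sizeof - strlen - 1 rule. */
char *append_bounded(char *dst, size_t dst_size, const char *src) {
    size_t used = strlen(dst);
    if (used + 1 < dst_size)         /* at least one free byte besides '\0' */
        strncat(dst, src, dst_size - used - 1);
    return dst;
}
```

Applied to char s1[10] = "Monday"; and "Tuesday", it produces "MondayTue"; a further call finds no free byte and leaves s1 unchanged.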

strcmp()

Two strings cannot be compared directly with an operator such as == (for arrays or pointers, that compares addresses, not contents); they must be compared character by character. C provides the strcmp() function for this.

The strcmp() function is used to compare the contents of two strings. The prototype of this function is as follows and is defined inside the string.h header file.

int strcmp(const char* s1, const char* s2);

Strings are compared in dictionary order: if the two strings are identical, the return value is 0; if s1 is less than s2, strcmp() returns a value less than 0; if s1 is greater than s2, it returns a value greater than 0.

Here is an example usage.

char* s1 = "Happy New Year";
char* s2 = "Happy New Year";
char* s3 = "Happy Holidays";

strcmp(s1, s2); // 0
strcmp(s1, s3); // greater than 0
strcmp(s3, s1); // less than 0

Note that strcmp() is only for comparing strings, not single characters. Characters are small integers and can be compared directly with the equality operator (==), so do not pass a value of type char to strcmp().
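Because the sign of the result of strcmp() encodes dictionary order, it can drive a sort. The sketch below uses the standard qsort() from stdlib.h; compare_strings and sort_strings are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() passes pointers to the array elements; each element is a
   const char*, so the arguments are really const char* const*. */
static int compare_strings(const void *a, const void *b) {
    const char *const *pa = a;
    const char *const *pb = b;
    return strcmp(*pa, *pb);
}

/* Sort an array of n strings into dictionary order. */
void sort_strings(const char **words, size_t n) {
    qsort(words, n, sizeof words[0], compare_strings);
}
```

Sorting {"pear", "apple", "fig"} this way gives {"apple", "fig", "pear"}.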

strncmp()

Since strcmp() compares entire strings, C also provides the strncmp() function, which compares at most a specified number of characters.

This function adds a third argument specifying the number of characters to be compared. Its prototype is defined inside the string.h header file.

int strncmp(
    const char* s1,
    const char* s2,
    size_t n
);

Its return value has the same meaning as that of strcmp(): if the first n characters of the two strings are the same, it returns 0; if s1 is smaller than s2, strncmp() returns a value less than 0; if s1 is larger than s2, it returns a value greater than 0.

Here is an example.

char s1[12] = "hello world";
char s2[12] = "hello C";

if (strncmp(s1, s2, 5) == 0) {
    printf("Both start with hello.\n");
}

The above example compares only the first 5 characters of two strings.
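Comparing only the first strlen(prefix) characters is a common way to test whether a string starts with a given prefix. Here is a minimal sketch; starts_with is a hypothetical name.

```c
#include <string.h>

/* Return 1 if s begins with prefix, comparing only as many
   characters as prefix contains. An empty prefix always matches. */
int starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

If s is shorter than prefix, the comparison reaches the \0 of s, finds a difference there, and correctly returns 0.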

sprintf(), snprintf()

The sprintf() function is similar to printf(), but is used to write data to a string rather than outputting it to the display. The prototype of this function is defined inside the stdio.h header file.

int sprintf(char* s, const char* format, ...);

The first argument to sprintf() is a pointer to the character array that receives the output. The remaining arguments are the same as for printf(): the second argument is a format string, and the subsequent arguments are the values to be written.

char first[6] = "hello";
char last[6] = "world";
char s[40];

sprintf(s, "%s %s", first, last);

printf("%s\n", s); // hello world

In the example above, sprintf() builds the string "hello world" and stores it in the variable s.

The return value of sprintf() is the number of characters written to the variable (not counting the trailing null character \0). If an error is encountered, a negative value is returned.

sprintf() has a serious security risk in that if the string written is too long and exceeds the length of the target string, sprintf() will still write it, causing an overflow to occur. To control the length of the written string, C provides another function, snprintf().

snprintf() takes only one more argument, n, than sprintf() and is used to control that the string written to the variable does not exceed n - 1 characters, leaving one position to write the null character \0. Here is its prototype.

int snprintf(char* s, size_t n, const char* format, ...);

snprintf() always writes a null character at the end of the string (provided n is greater than 0). If the formatted result is longer than the maximum allowed, snprintf() writes only the first n - 1 characters and uses the last position for the null character.

Here is an example.

snprintf(s, 12, "%s %s", "hello", "world");

In the example above, the second argument to snprintf() is 12, so at most 12 bytes are written, including the trailing null character. "hello world" needs exactly 12 bytes, so it fits.

The return value of snprintf() is the number of characters in the complete formatted string (not counting the trailing null character \0), whether or not they all fit. If n is large enough, the return value is less than n; if the formatted string is too long, the return value is n or greater, even though only n - 1 characters were actually written to the buffer. If an error occurs, a negative value is returned. Therefore, the full formatted string was written only if the return value is non-negative and less than n.
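The check described above fits in a single expression. This is a sketch; format_full_name is a hypothetical helper that formats two words separated by a space and reports whether the result fit.

```c
#include <stdio.h>

/* Write "first last" into buf (size bytes). Returns 1 if the whole
   result fit, 0 if it was truncated or an encoding error occurred. */
int format_full_name(char *buf, size_t size,
                     const char *first, const char *last) {
    int n = snprintf(buf, size, "%s %s", first, last);
    return n >= 0 && (size_t)n < size;
}
```

With a 12-byte buffer, "hello" and "world" fit (snprintf() returns 11). With an 8-byte buffer, it still returns 11, the buffer holds the truncated "hello w", and the helper returns 0.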

String arrays

If each member of an array is to be a string, one way is to use a two-dimensional character array: each string is itself an array of characters, and the strings together form an outer array.

char weekdays[7][10] = {
    "Monday",
    "Tuesday",
    "Wednesday",
    "Thursday",
    "Friday",
    "Saturday",
    "Sunday"
};

The above example is an array of 7 strings, so the length of the first dimension is 7. The length of the longest string is 10 (with the terminator \0 at the end), so the length of the second dimension is set to 10.

Because the length of the first dimension can be calculated automatically by the compiler, it can be omitted.

char weekdays[][10] = {
    "Monday",
    "Tuesday",
    "Wednesday",
    "Thursday",
    "Friday",
    "Saturday",
    "Sunday"
};

In the above example, the length of the first dimension of the two-dimensional array can be calculated automatically by the compiler, based on the assignment that follows, so it can be left out.

Giving the second dimension a uniform length of 10 wastes some space, since most of the strings are shorter than 10 bytes. The solution is to use an array of character pointers instead of a two-dimensional character array.

char* weekdays[] = {
    "Monday",
    "Tuesday",
    "Wednesday",
    "Thursday",
    "Friday",
    "Saturday",
    "Sunday"
};

The string array above is actually a one-dimensional array whose members are seven character pointers, each pointing to a string (the character array).

Iterating through the string array is written as follows.

for (int i = 0; i < 7; i++) {
    printf("%s\n", weekdays[i]);
}
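Hard-coding 7 in the loop means it must be updated by hand if the array changes. The sketch below lets the compiler compute the element count with sizeof and declares the pointers const, because they point to string literals. The names weekday_count and print_weekdays are hypothetical.

```c
#include <stdio.h>

/* const char* because the pointers refer to read-only string literals. */
static const char *weekdays[] = {
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday"
};

/* Element count computed by the compiler instead of hard-coding 7. */
static const size_t weekday_count = sizeof weekdays / sizeof weekdays[0];

void print_weekdays(void) {
    for (size_t i = 0; i < weekday_count; i++)
        printf("%s\n", weekdays[i]);
}
```

This sizeof trick only works where the array itself is visible. Once the array is passed to a function it decays to a pointer, and the count must be passed separately.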