C strings are notoriously difficult to use and easy to mess up, for both beginners and experienced. If you come from other programming languages, C strings can cause a lot of headaches due to its lack of supporting ordinary features to work with strings that you might see in other programming languages.

#1. Array of characters

Strings in C are defined as an array of chracters. While type character, char, is a primitive data type in C, strings are not. Therefore, you can assign strings to variables like these:

file_type_c
char *str1 = "Hello world";
char str2[] = "Hello world";
char str3[] = {'H', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '\0'};
char str4[12] = "Hello world";

The first two examples are special syntaxes that the compiler can understand and create two arrays of characters for you. The only difference is that the double-quoted string in the second example specifies the initial values of the character in the array, whereas the first one creates a static, read-only string in memory and assign the address to the pointer str1.1

Fun fact

In memory, the first example will create a readonly string in the .rodata, which is a section in the binary that stores read-only data such as string literals. Those are created at compile time and cannot be modified during runtime.

You can test this by creating two strings with the same content and compare their addresses.

file_type_c
main.c
char hello1[] = {'H', 'e', 'l', 'l', 'o', '\0'};
char hello2[] = {'H', 'e', 'l', 'l', 'o', '\0'};
char *constant1 = "template string";
char *constant2 = "template string";
printf("Address of hello1: %p\n", hello1);
printf("Address of hello2: %p\n", hello2);
printf("Address of constant1: %p\n", constant1);
printf("Address of constant2: %p\n", constant2);

Output:

default_file
Output
Address of hello1: 0x7ffe2cb51346
Address of hello2: 0x7ffe2cb51340
Address of constant1: 0x402010
Address of constant2: 0x402010

The third example allows you to explicitly define the array of characters. However, you need to append the null terminator character, \0, at the end of the array to indicate the end of the string. Otherwise, other functions can keep reading pass the array and cause undefined behaviors.

The last example is simply for readability. Similar to the second example where the compiler can infer the size of the array from the string literal, the third one allows you to explicitly define the size of the array.

Note

In the last example, the size of array must be at least 12 to include the null terminator character. Otherwise, the compiler will complain about the size of the array is too small.

file_type_c
main.c
char too_long_str[12] = "This initializer string exceeds the size of the array";
printf("%s\n", too_long_str);

Output:

default_file
Output
main.c: In function 'main':
main.c:13:29: warning: initializer-string for array of 'char' is too long
| char too_long_str[12] = "This initializer string exceeds the size of the array";
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This initial

#2. Modifying C strings

Programming languages like Java and Python have immutable strings, which means once you create a string, you cannot modify it. C, on the other hand, allows you to modify strings in place because to C, strings are just arrays of characters.

file_type_c
main.c
char str[] = "Hello world";
str[0] = 'h';
printf("%s\n", str);

Output:

default_file
Output
hello world

C also provides a set of functions to directly modify strings. For example,

file_type_c
main.c
#include <stdio.h>
#include <ctype.h>
int main()
{
char str[] = "Hello world";
// Uppercase all characters
for (int i = 0; str[i] != '\0'; ++i) {
str[i] = toupper(str[i]);
}
printf("%s\n", str);
// Lower case all characters
for (int i = 0; str[i] != '\0'; ++i) {
str[i] = tolower(str[i]);
}
printf("%s\n", str);
return 0;
}

Output:

default_file
Output
HELLO WORLD
hello world
Note

C allows you to directly modify strings as long as those changes are within the boundaries of the array. Another limitation is that you cannot change the size of the array. If you need to change the size of the string, you need to create a new array and copy the content of the old array to the new one.

As mentioned in the previous section, in order to modify strings, you need to create one with char[] instead of char *, because the latter creates a readonly string and cannot be modified.

#3. Size and iteration

One easy way to get the size of a string is to iterate through the array and count the number of characters until the null terminator character is reached.

file_type_c
main.c
#include <stdio.h>
int main()
{
char str[] = "Hello world";
int size = 0;
for (int i = 0; str[i] != '\0'; ++i) {
++size;
}
// or
for (char *p = str; *p != '\0'; ++p) {
++size;
}
printf("Size of string: %d\n", size);
return 0;
}

Output:

default_file
Output
Size of string: 11

However, this is not efficient and can be troublesome if you're not careful. Luckily, C provides a function to get the size of a string called strlen(3).

file_type_c
main.c
#include <stdio.h>
int main()
{
char str[] = "Hello world";
int size = strlen(str);
printf("Size of string: %d\n", size);
return 0;
}

Output:

default_file
Output
Size of string: 11
Note

Another way to get the size of a string is to use the sizeof operator.

file_type_c
main.c
size_t size = sizeof(str) / sizeof(str[0]); // 12

However, this method is not recommended because it also counts the null terminator character.

#4. String comparison

An intuitive way to compare two strings is to iterate through two strings and compare one-by-one character. However, C provides a set of string-comparison functions to save the troubles for you

#strcmp and strncmp

strcmp is the most commonly used function to compare two strings.

Meanwhile, strncmp is similar to strcmp but allows you to specify the number of characters to compare. This is useful when you want to compare a substring of two strings such checking if two strings have the same prefix.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
int main()
{
char str1[] = "Hello world";
char str2[] = "Hello world";
char str3[] = "hello world";
printf("str1 == str2: %d\n", strcmp(str1, str2));
printf("str1 == str3: %d\n", strcmp(str1, str3));
printf("str2 == str3: %d\n", strcmp(str2, str3));
printf("(last word) str1 == str3: %d\n", strncmp(str1 + 6, str3 + 6, 5));
return 0;
}

Output:

default_file
Output
str1 == str2: 0
str1 == str3: -32
str2 == str3: -32
(last word) str1 == str3: 0

The return value of strcmp and strncmp returns 0 if the two strings are equal, a negative value if the first string is less than the second string, and a positive value if the first string is greater than the second string. To know what the value means exactly, you might want to refer to the documentation.

#strcasecmp and strncasecmp

Addition to strcmp, strcasecmp is another function to compare two strings but ignores the case of the strings.

This is useful when you want to compare two strings but don't care about their cases such as checking two headers in HTTP requests.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
int main()
{
char *request_header = "Content-Type";
char *content_type_header = "content-type";
char *content_length_header = "content-length";
printf("request_header == content_type_header: %d\n",
strcmp(request_header, content_type_header));
printf("(same prefix) request_header == content_length_header: %d\n",
strcmp(request_header, content_length_header, 7));
return 0;
}

Output:

default_file
Output
request_header == content_type_header: 0
(same prefix) request_header == content_length_header: 0

#5. String copy

#strcpy

String copy can be done by iterating through the source string and copy each character to the destination string. However, C provides a function to do this for you.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
int main()
{
char str1[] = "Hello world";
char str2[12];
strcpy(str2, str1);
printf("str2: %s\n", str2);
return 0;
}

Output:

default_file
Output
str2: Hello world

#strncpy

strncpy is similar to strcpy but allows you to specify the number of characters to copy. This is useful when you want to copy a substring of a string.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
int main()
{
char str1[] = "Hello world";
char str2[6];
strncpy(str2, str1 + 6, 5);
printf("str2: %s\n", str2);
return 0;
}

Output:

default_file
Output
str2: world
Note

strcpy and strncpy do not check the size of the destination string. It is the responsibility of the programmer to make sure that the destination string has enough space to store the source string, including the null terminator, i.e. strlen(src) + 1.

#strdup

While strcpy and strcpy requires you to create the destination string and make sure it has enough space on your own, strdup(3) and strndup(3) will do those things for you.

The only drawback is that since the returned string is dynamically allocated, you need to free it to avoid memory leaks. Also, as a result of dynamic allocation, you might also want to check if the allocation is successful.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char str1[] = "Hello world";
char *str2 = strdup(str1);
char *str3 = strndup(str1, 5);
if (str2 == NULL) {
printf("Failed to allocate memory\n");
return 1;
}
printf("str2: %s\n", str2);
printf("str3: %s\n", str3);
free(str2);
return 0;
}

Output:

default_file
Output
str2: Hello world
str3: Hello

#6. String concatenation

Programming languages like Java and Python provide an overloaded operator+() to concatenate two strings. C, on the other hand, does not provide such thing. However, C provides a set of functions to achieve the same results

#In-place concatenation

strcat allows you to append a string to an original string, and override the null terminator if need. Similar to strcpy, you need to make sure that the original string has enough space to store the appended string, i.e. strlen(dest) + strlen(src) + 1.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
int main()
{
char str1[12] = "Hello";
char str2[] = " world";
strcat(str1, str2);
printf("str1: %s\n", str1);
return 0;
}

#New strings with malloc

If you don't want to modify the original strings while concatenating them, you can create a new string and copy the content of the original strings to the new one.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main()
{
char str1[] = "Hello";
char str2[] = " world";
// Make sure to allocate enough space for the null terminator (+1)
char *str3 = malloc(strlen(str1) + strlen(str2) + 1);
if (str3 == NULL) {
printf("Failed to allocate memory\n");
return 1;
}
strcpy(str3, str1);
strcat(str3, str2);
printf("str3: %s\n", str3);
free(str3);
return 0;
}

#New strings with char[]

Allocating memory with malloc can be troublesome. If the string is relatively small, you can create a new string with char[] and copy the content of the original strings to the new one.

In order to determine the size of the new string, you can use strlen to get the size of the original strings and add 1 for the null terminator.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
int main()
{
char str1[] = "Hello";
char str2[] = " world";
char str3[strlen(str1) + strlen(str2) + 1];
strcpy(str3, str1);
strcat(str3, str2);
printf("str3: %s\n", str3);
return 0;
}

Another way is to determine the maximum size of your string. This is useful when your application requires a fixed-size buffer to store the string, like HTTP headers

file_type_c
main.c
#include <stdio.h>
#include <string.h>
#define MAX_HEADER_SIZE 1024
int main()
{
char str1[] = "Hello";
char str2[] = " world";
char str3[MAX_HEADER_SIZE];
strcpy(str3, str1);
strcat(str3, str2);
printf("str3: %s\n", str3);
return 0;
}

#New string with string literals

C allows you to concatenate string literals togehter.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
int main()
{
char *str1 = "Hello" " " "world";
printf("str1: %s\n", str1);
return 0;
}

The above syntax is extremely useful when you have string literals as macros. For example, you can define a macro to change the color of the text in the terminal.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
#define RED "\033[0;31m"
#define GREEN "\033[0;32m"
#define RESET "\033[0m"
int main()
{
char *str1 = RED "Hello" RESET " " GREEN "world" RESET " from C";
printf("str1: %s\n", str1);
return 0;
}

Output:

default_file
Output

str1: Hello world from C

#7. String breaking

In particular programs, you might encounter strings that are too long to fit in a single readable line. In this case, you can break the string into multiple lines by using the \ character. This is used in almost CLI programs written in C. For example, coreutils/cat uses this technique to break the help message into multiple lines.

file_type_c
main.c
const char *helpmsg = "\
Usage: cat [OPTION]... [FILE]...\n\
Concatenate FILE(s) to standard output.\n\
\n\
With no FILE, or when FILE is -, read standard input.\n\
\n\
-A, --show-all equivalent to -vET\n\
-b, --number-nonblank number nonempty output lines, overrides -n\n\
-e equivalent to -vE\n\
-E, --show-ends display $ at end of each line\n\
-n, --number number all output lines\n\
-s, --squeeze-blank suppress repeated empty output lines\n\
-t equivalent to -vT\n\
-T, --show-tabs display TAB characters as ^I\n\
-u (ignored)\n\
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB\n\
--help display this help and exit\n\
--version output version information and exit\n\
";

You can break a string into multiple lines using string literals.

file_type_c
main.c
const char *http_request_template = "GET / HTTP/1.1\r\n"
"Host: %s\r\n"
"User-Agent: %s\r\n"
"Accept: */*\r\n"
"\r\n";

#8. String templating

Another features that other programming languages have but C does not is string templating/interpolation. For example, in Python, you can use the f-string, or in JavaScript, you can use the template literal `${}` syntax to interpolate variables into a string.

file_type_python
main
.py
main
.js
name = input("What is your name? ")
print(f"Hello {name}")

#With stack-based strings

In C, you can use sprintf to achieve the same result. However, as usual, you need to make sure that the destination string has enough space to store the interpolated string.

file_type_c
main.c
#include <stdio.h>
#include <string.h>
int main()
{
char name[32];
char greeting[64];
printf("What is your name? ");
scanf("%s", name);
sprintf(greeting, "Hello %s", name);
printf("%s\n", greeting);
return 0;
}

You can extent this technique to apply on template files such as HTML files where there are placeholders that you want to replace with actual values.

index
.html
file_type_c
main
.c
default_file
Output
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>%s</title>
</head>
<body>
<h1 class="text-3xl font-bold underline">Hello %s</h1>
</body>
</html>

The problem with this approach is that you need to make sure that the size of the destination string is large enough to store the interpolated string. For example:

file_type_c
main.c
const char *template = "You entered %d";
char str[13];
snprintf(str, 13, template, 123456789);

This might cause a buffer overflow because the size of the destination string is smaller than what is required to store the interpolated string. Buffer overflow can cause undefined behaviors, which are not detected by the compiler and also hard to catch during runtime.

Use `snprintf`

sprintf is not safe because neither does it check the size of the destination nor give you a way to specify the size of the destination string. Instead, you can use snprintf. It allows you to specify the size of the destination string and the function will not write more than the size you specified.

#With dynamic strings

It is sometimes hard to determine the correct size of the destination array in some situations. Let's consider the following example where you want to return an HTTP response with an HTML page.

file_type_c
main.c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
int main()
{
int fd = open("index.html", O_RDONLY);
if (fd == -1)
{
perror("open");
exit(EXIT_FAILURE);
}
struct stat sb;
if (fstat(fd, &sb) == -1)
{
close(fd);
perror("fstat");
exit(EXIT_FAILURE);
}
char *html_str = (char *)mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
if (html_str == MAP_FAILED)
{
close(fd);
perror("mmap");
exit(EXIT_FAILURE);
}
char *response = NULL;
const char *template = "HTTP/1.1 200 OK\r\n"
"Content-Type: text/html\r\n"
"Content-Length: %ld\r\n"
"\r\n"
"%s";
asprintf(&response, template, sb.st_size, html_str);
if (response == NULL)
{
close(fd);
munmap(html_str, sb.st_size);
perror("asprintf");
exit(EXIT_FAILURE);
}
printf("%s\n", response);
close(fd);
munmap(html_str, sb.st_size);
free(response);
return 0;
}

Output:

default_file
HTTP Response
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 266
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Index page</title>
</head>
<body>
<h1 class="text-3xl font-bold underline">Hello page</h1>
</body>
</html>

In this example, you don't need to worry about the size of the destination string because asprintf will calculate the correct size and allocate the string for you. The only drawback is that you need to check and free the allocated string manually.

#9. Strings as paramaters

In C, strings are passed as pointers to the first character of the string. That implies two things:

The string parameters are new pointers that point to the same memory address as the original string. Modifying the content to which the new pointers point will also modify the original string.

#With char *

The most simple way to pass a string as a parameter is to use char *.

file_type_c
read_str.c
void read_str(char *str)
{
for (int i = 0; str[i] != '\0'; ++i) {
printf("%c", str[i]);
}
}

This function assumes that your string is correctly initialized. With the help of the null terminator character, you can iterate through the string and read each character until the null terminator character is reached.

#With size_t

In most case, you need to retrieve the size of the array. With only one string as parameter, you can use strlen to get the size of the string. However, strlen will iterate through the string every time you call it. This is inefficient if you need to call it multiple times.

To increase a bit of performance, you can pass the size of the string as an extra parameter.

file_type_c
read_str.c
void read_str(char *str, size_t size)
{
for (int i = 0; i < size; ++i) {
printf("%c", str[i]);
}
}

#With const

If you don't have an intention to modify the string, you might want to use const to indicate that the string is readonly and prevent accidental modification.

file_type_c
read_str.c
void read_str(const char *str, size_t size)
{
for (int i = 0; i < size; ++i) {
printf("%c", str[i]);
}
}

#10. Return strings

Unlike other programming languages, C does not allow you to return strings directly from a function. However, there are 3 ways to work around this.

#Why not return directly?

Consider an intuitive way to return a string from a function.

file_type_c
str.c
char *get_str()
{
char str[BUFLEN];
snprintf(str, BUFLEN, "Hello, world!");
return str;
}

But when we compile it with gcc, we get the following warning:

default_file
Output

str.c: In function 'get_str':

str.c:10:12: warning: function returns address of local variable [-Wreturn-local-addr]

   10 |     return str;

      |            ^~~

The array str is a chunk of memory allocated on the stack in the scope of the function get_str. str points to the first character of the array. When the function returns, the stack frame is popped and the memory chunk ceases to exist. Therefore, the pointer str has nothing to point to and becomes invalid.

#With static

The first way is to return a pointer to a static string. This is similar to returning a pointer to a string literal.

file_type_c
static_str
.c
file_type_c
main
.c
default_file
Output
#define BUFLEN 1024
char *get_static_str(void)
{
static char str[BUFLEN];
snprintf(str, BUFLEN, "Hello world");
return str
}

Static strings are created once and live throughout the lifetime of the program. Therefore, you don't need to worry about the lifetime of the string. However, since the string is shared among all calls to the function, you need to make sure that the string is used with caution.

#With stack-based strings

The second way is to create an empty array of characters and pass it as an argument. As mentioned above, modifying the content of the array will also modify the original string.

file_type_c
stack_str
.c
file_type_c
main
.c
default_file
Output
int get_stack_str(char *str, size_t size)
{
snprintf(str, size, "Hello, world!");
return 0;
}

Most of the C string functions use this technique such as strcpy, strcat. The only drawback is that you need to make sure the size of the parameterized string is large enough to store the returned string.

Since you already use an argument to store the returned string, you can use the return value to indicate the status of the function. In most C functions, 0 indicates success and -1 indicates failure.

#With malloc

The third way is to use malloc to allocate memory for the string. This is similar to the technique used in strdup and strndup.

file_type_c
malloc_str
.c
file_type_c
main
.c
default_file
Output
char *get_heap_str()
{
char *str = (char *)malloc(14);
if (str == NULL) {
return NULL;
}
snprintf(str, 14, "Hello, world!");
return str;
}

This technique can extent to other dynamically-allocated string functions such as asprintf and strdup. The allocated string is created on the heap and the object will live there until you free it. That's why you can return the string because the memory is still valid. The same drawback in any dynamic allocation is that you need to keep track and free the allocated memory to avoid memory leaks.

#Conclusion

Other programming languages provide string-support features and abstract away the details of working with strings. C, on the other hand, does not provide such features. You have the total control over how you want to create and use strings, with some extra help from the standard library. It's also easier to mess up with C strings over small things such as forgetting to append the null terminator.

Working with C strings might be frustrating due to its lack of supporting, especially if you come from other programming languages. Therefore, I hope this article can make your work with C a bit easier.

Footnotes

  1. comp.lang.c FAQ list - Question 1.32 (Link)