
string literal

Syntax

" (unescaped_character|escaped_character)* " (1)
L " (unescaped_character|escaped_character)* " (2)
u8 " (unescaped_character|escaped_character)* " (3) (since C++11)
u " (unescaped_character|escaped_character)* " (4) (since C++11)
U " (unescaped_character|escaped_character)* " (5) (since C++11)
prefix(optional) R "delimiter( raw_characters )delimiter" (6) (since C++11)

Explanation

unescaped_character - Any valid character except the double-quote ", backslash \, or new-line character
escaped_character - See escape sequences
prefix - One of L, u8, u, U
delimiter - A character sequence made of any source characters except parentheses, backslash, and whitespace (it can be empty and is at most 16 characters long)
raw_characters - Any character sequence, except that it must not contain the closing sequence )delimiter"
1) Narrow multibyte string literal. The type of an unprefixed string literal is const char[N], where N is the size of the string in code units of the execution narrow encoding, including the null terminator.
2) Wide string literal. The type of a L"..." string literal is const wchar_t[N], where N is the size of the string in code units of the execution wide encoding, including the null terminator.
3) UTF-8 encoded string literal. The type of a u8"..." string literal is const char[N] (until C++20) or const char8_t[N] (since C++20), where N is the size of the string in UTF-8 code units including the null terminator.
4) UTF-16 encoded string literal. The type of a u"..." string literal is const char16_t[N], where N is the size of the string in UTF-16 code units including the null terminator.
5) UTF-32 encoded string literal. The type of a U"..." string literal is const char32_t[N], where N is the size of the string in UTF-32 code units including the null terminator.
6) Raw string literal. Used to avoid escaping any character: everything between the delimiters, including backslashes, double quotes, and new-lines, becomes part of the string. prefix, if present, has the same meaning as described above.

Notes

The null character ('\0', L'\0', char16_t(), etc.) is always appended to the string literal: thus, a string literal "Hello" is a const char[6] holding the characters 'H', 'e', 'l', 'l', 'o', and '\0'.

The encoding of narrow multibyte string literals (1) and wide string literals (2) is implementation-defined. For example, GCC selects them with the command-line options -fexec-charset and -fwide-exec-charset.

String literals placed side-by-side are concatenated at translation phase 6 (after the preprocessor). That is, "Hello," " world!" yields the (single) string "Hello, world!". If the two strings have the same encoding prefix (or neither has one), the resulting string will have the same encoding prefix (or no prefix).

If one of the strings has an encoding prefix and the other doesn't, the one that doesn't will be considered to have the same encoding prefix as the other.

L"Δx = %" PRId16 // at phase 4, PRId16 expands to "d"
                 // at phase 6, L"Δx = %" and "d" form L"Δx = %d"

If a UTF-8 string literal and a wide string literal are side by side, the program is ill-formed.

(since C++11)

Any other combination of encoding prefixes may or may not be supported by the implementation. The result of such a concatenation is implementation-defined.

String literals have static storage duration, and thus exist in memory for the life of the program.

String literals can be used to initialize character arrays. For example, after char str[] = "foo"; the array str holds its own copy of the string "foo", which may be modified freely.

The compiler is allowed, but not required, to combine storage for equal or overlapping string literals. That means that identical string literals may or may not compare equal when compared by pointer.

bool b = "bar" == 3+"foobar"; // could be true or false; the result is unspecified

Attempting to modify a string literal results in undefined behavior: string literals may be stored in read-only storage (such as .rodata) or combined with other string literals:

const char* pc = "Hello";
char* p = const_cast<char*>(pc);
p[0] = 'M'; // undefined behavior

In C, string literals are of type char[], and can be assigned directly to a (non-const) char*. C++03 allowed it as well (but deprecated it, as literals are const in C++). C++11 no longer allows such assignments without a cast.

A string literal is not necessarily a C string: if a string literal has embedded null characters, it represents an array which contains more than one string.

const char* p = "abc\0def"; // std::strlen(p) == 3, but the array has size 8

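A hex escape sequence consumes every hex digit that follows it, so a valid hex digit placed directly after a hex escape becomes part of that escape: "\xfff" is a single escape with the value 0xfff, which is out of range for char and can fail to compile. String concatenation can be used as a workaround: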

//const char* p = "\xfff"; // error: hex escape sequence out of range
const char* p = "\xff""f"; // OK: the literal is const char[3] holding {'\xff','f','\0'}

Example

#include <iostream>
 
char array1[] = "Foo" "bar";
// same as
char array2[] = { 'F', 'o', 'o', 'b', 'a', 'r', '\0' };
 
const char* s1 = R"foo(
Hello
World
)foo";
//same as
const char* s2 = "\nHello\nWorld\n";
 
int main()
{
    std::cout << array1 << '\n';
    std::cout << array2 << '\n';
 
    std::cout << s1;
    std::cout << s2;
}

Output:

Foobar
Foobar
 
Hello
World
 
Hello
World

See also

user-defined literals (C++11): literals with user-defined suffix

© cppreference.com
Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.
http://en.cppreference.com/w/cpp/language/string_literal