A popular Unicode multibyte encoding. For example, the locale data
$codecvt_wide
UTF-8
specifies that codecvt_byname<wchar_t, char, mbstate_t> will implement the UTF-8 encoding scheme. If this data is in a file called "en_US", then the following program can be used to output a wchar_t string in UTF-8 to a file:
#include <locale>
#include <fstream>
int main()
{
    std::locale loc("en_US");
    std::wofstream out;
    out.imbue(loc);
    out.open("test.dat");
    out << L"This is a test \x00DF";
}
The binary contents of the file are (in hex):
54 68 69 73 20 69 73 20 61 20 74 65 73 74 20 C3 9F
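The last two bytes, C3 9F, are the UTF-8 form of the single wide character \x00DF; every other character is below 0x80 and maps to one byte. The following is a minimal sketch of that mapping, written by hand rather than taken from the library: encode_utf8 is a hypothetical helper name, and it assumes its argument is a valid Unicode code point (no surrogate or range checking).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the library): encode one Unicode
// code point as a UTF-8 byte sequence. Assumes cp <= 0x10FFFF and
// performs no validation.
std::string encode_utf8(std::uint32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encode_utf8(0xDF) yields the two bytes C3 9F, and encode_utf8(0x74) yields the single byte 74, matching the dump above.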
Without the UTF-8 encoding, the default encoding takes over, and every byte of each wchar_t is written out in native byte order:
#include <fstream>
int main()
{
    std::wofstream out("test.dat");
    out << L"This is a test \x00DF";
}
On a big-endian machine with a 2-byte wchar_t, the resulting file in hex is:
00 54 00 68 00 69 00 73 00 20 00 69 00 73 00 20
00 61 00 20 00 74 00 65 00 73 00 74 00 20 00 DF
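The layout above can be reproduced independently of the host platform. The sketch below is an illustration only, not the library's implementation: to_big_endian_bytes is a hypothetical helper, and it uses std::u16string (C++11) to stand in for a string of 2-byte wchar_t values.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library): write each 16-bit
// code unit as two bytes, most significant byte first, which is what
// a big-endian machine with a 2-byte wchar_t would store.
std::vector<unsigned char> to_big_endian_bytes(const std::u16string& s)
{
    std::vector<unsigned char> bytes;
    bytes.reserve(s.size() * 2);
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned int unit = static_cast<unsigned int>(s[i]);
        bytes.push_back(static_cast<unsigned char>(unit >> 8));   // high byte
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF)); // low byte
    }
    return bytes;
}
```

For the two-character string u"t\u00DF" this produces 00 74 00 DF, the same four bytes that end the dump above apart from the intervening space character. On a little-endian machine the two bytes of each pair would appear in the opposite order.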