Character Set Considerations
This page discusses character set design considerations for users of the library. The most expedient solution is to define neither _UNICODE nor UNICODE; everything then uses narrow characters.
Data is Neutral
Crypto++ is generally a neutral library. That is, when the Crypto++ Library operates on data (even when the data is housed in a string), the data is interpreted as a byte[]. A better (but less portable) abstraction is a Rope. Consider the following fragment, assuming the file stores binary data:
string sink;
FileSource( filename, true, new StringSink( sink ) );
There is no regard to wide or narrow - data is data. Next, suppose it is desired to hash the data:
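MD5 hash;
byte digest[ MD5::DIGESTSIZE ];
// Hash objects take input through Update() and produce the digest with Final()
hash.Update( (const byte*)sink.data(), sink.size() );
hash.Final( digest );
...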
The hash operates on a stream of bytes - the stream could be binary data, narrow characters (which the hash regards as byte[]), or wide characters (which the hash regards as byte[]). The programmer only needs to specify the number of bytes (size) to hash. The hash is indifferent.
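The point about specifying the number of bytes can be made concrete with a small sketch. The helper names ByteCount and BytePointer are hypothetical and are not part of Crypto++; they only show how a narrow or wide string would be presented to a byte-oriented consumer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of Crypto++): the number of bytes a
// byte-oriented consumer such as a hash would be asked to process.
inline std::size_t ByteCount( const std::string& s )
{
    return s.size() * sizeof( char );      // sizeof(char) is always 1
}

inline std::size_t ByteCount( const std::wstring& s )
{
    return s.size() * sizeof( wchar_t );   // 2 on Windows, commonly 4 elsewhere
}

// View the string's storage as raw bytes.
inline const unsigned char* BytePointer( const std::string& s )
{
    return reinterpret_cast<const unsigned char*>( s.data() );
}

inline const unsigned char* BytePointer( const std::wstring& s )
{
    return reinterpret_cast<const unsigned char*>( s.data() );
}
```

The pointer and byte count would then be handed to the hash as a pair. Note that the bytes of a wide string depend on the platform's wchar_t size and byte order, so the digest of the same text in a wide string can differ between platforms.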
Finally, suppose the MD5 example above hex-encoded the digest before storing it in a narrow string. The program itself could be built as Unicode, SBCS, or MBCS, so the narrow result may need converting. The next sections discuss this issue.
Crypto++ is Narrow
There are times when one must pass a string to Crypto++. These include Named Parameters and filenames. In these cases, one of two situations arises.
Narrow to Wide
Narrow to Wide conversion can further be decomposed into two cases:
- using the Standard C++ Library
- using the Win32 API
Using the Standard C++ Library
Users of Visual Studio 6.0 and earlier are at a disadvantage. Bjarne Stroustrup devoted Appendix D: Locales of his work to issues similar to these (complete with sample code). However, that code does not compile with VS 6.0. The following will work for the reader.
// Courtesy of Tom Widmer (VC++ MVP)
std::wstring StringWiden( const std::string& narrow )
{
    std::wstring wide;
    wide.resize( narrow.length() );
    if( narrow.empty() )
        return wide;

    typedef std::ctype<wchar_t> CT;
    // std::_USE is specific to VC++ 6.0
    CT const& ct = std::_USE( std::locale(), CT );

    // Non-portable: iterators should not be used as pointers (works in VC++ 6.0)
    // ct.widen( narrow.begin(), narrow.end(), wide.begin() );

    // Portable: &wide[0] is writable, whereas wide.data() is const before C++17
    ct.widen( narrow.data(), narrow.data() + narrow.size(), &wide[0] );
    return wide;
}
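On a conforming compiler, the same conversion can be written with the standard std::use_facet in place of the VC++ 6.0 std::_USE macro. The following is a minimal sketch under that assumption; the name StringWidenPortable is hypothetical, and the conversion uses the global locale, just as the VC++ 6.0 version does.

```cpp
#include <locale>
#include <string>

// Hypothetical name; mirrors StringWiden but relies only on standard C++.
std::wstring StringWidenPortable( const std::string& narrow )
{
    std::wstring wide( narrow.size(), L'\0' );
    if( narrow.empty() )
        return wide;

    typedef std::ctype<wchar_t> CT;
    const CT& ct = std::use_facet<CT>( std::locale() );

    // &wide[0] is a writable pointer to the string's storage
    ct.widen( narrow.data(), narrow.data() + narrow.size(), &wide[0] );
    return wide;
}
```

This widens character by character. It does not decode multibyte encodings such as UTF-8, so it is best suited to ASCII data such as hex-encoded digests.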
Using the Win32 API
See MSDN for examples of using MultiByteToWideChar.
Wide to Narrow
Wide to Narrow conversion can further be decomposed into two cases:
- using the Standard C++ Library
- using the Win32 API
Using the Standard C++ Library
// Courtesy of Tom Widmer (VC++ MVP)
std::string StringNarrow( const std::wstring& wide )
{
    typedef std::ctype<wchar_t> CT;

    std::string narrow;
    narrow.resize( wide.length() );
    if( wide.empty() )
        return narrow;

    // std::_USE is specific to VC++ 6.0
    CT const& ct = std::_USE( std::locale(), CT );

    // Non-portable: iterators should not be used as pointers
    // ct.narrow( wide.begin(), wide.end(), '_', narrow.begin() );

    // Portable: characters with no narrow equivalent become '_'
    ct.narrow( &wide[0], &wide[0] + wide.length(), '_', &narrow[0] );
    return narrow;
}
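As with widening, a conforming compiler allows std::use_facet in place of std::_USE. The following is a minimal sketch under that assumption; the name StringNarrowPortable is hypothetical.

```cpp
#include <locale>
#include <string>

// Hypothetical name; mirrors StringNarrow but relies only on standard C++.
std::string StringNarrowPortable( const std::wstring& wide )
{
    std::string narrow( wide.size(), '\0' );
    if( wide.empty() )
        return narrow;

    typedef std::ctype<wchar_t> CT;
    const CT& ct = std::use_facet<CT>( std::locale() );

    // '_' is the substitute for characters the locale cannot narrow
    ct.narrow( wide.data(), wide.data() + wide.size(), '_', &narrow[0] );
    return narrow;
}
```

Because unrepresentable characters are replaced by the default character, this conversion is lossy. A filename containing such characters would no longer refer to the same file after narrowing.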
Using the Win32 API
See MSDN for examples of using WideCharToMultiByte.
Application is Wide
Due to the predominance of Windows NT and family, the author exclusively uses the Unicode character set. With that in mind, the following is a typical Design Overview. Notice that anything data related is omitted - a byte[] is a byte[].
Windows API ⇔ Application is fairly generic. The Application will use L"" rather than the _T("") macro. This means conversions occur frequently if UNICODE and _UNICODE are not defined.
Generally, the Crypto ⇔ Application conversion is StringWiden(...) for items such as digests. An exception is the occasional need for narrowing a filename.
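The digest case can be sketched without any locale machinery, because hex digits are plain ASCII. Both helpers below are hypothetical and self-contained. In a real program the hex encoding would more likely come from the Crypto++ HexEncoder filter, and the widening from StringWiden.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: hex-encode raw digest bytes into a narrow string.
std::string HexEncode( const unsigned char* data, std::size_t size )
{
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve( size * 2 );
    for( std::size_t i = 0; i < size; ++i )
    {
        out += digits[ data[i] >> 4 ];     // high nibble
        out += digits[ data[i] & 0x0F ];   // low nibble
    }
    return out;
}

// Hypothetical helper: widen an ASCII-only string element by element.
// Only valid when every character is 7-bit ASCII, as hex digits are.
std::wstring WidenAscii( const std::string& narrow )
{
    return std::wstring( narrow.begin(), narrow.end() );
}
```

A wide application would then display or store WidenAscii( HexEncode( digest, size ) ), while the digest bytes themselves never need conversion.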
Caveats
The Win32 API switches between the narrow and wide character sets based on UNICODE. The C run-time library's generic-text mappings (tchar.h) switch based on _UNICODE. A mismatch will rear its head when one outputs using cout: one may receive memory addresses rather than strings on the console (in Visual C++ 6.0). Either #define both, or #define neither (and use cout or wcout accordingly). A similar behavior used to occur in database code.
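One way to enforce the "both or neither" rule is a compile-time check in a common header. The following is a sketch only; ConsoleStream, Console, and IsWideBuild are hypothetical names introduced for illustration.

```cpp
#include <iostream>

// Refuse to build when only one of the two macros is defined.
#if defined(UNICODE) && !defined(_UNICODE)
#  error "UNICODE is defined but _UNICODE is not"
#elif defined(_UNICODE) && !defined(UNICODE)
#  error "_UNICODE is defined but UNICODE is not"
#endif

// Hypothetical helper: select the console stream that matches the build.
#if defined(UNICODE)
typedef std::wostream ConsoleStream;
inline ConsoleStream& Console() { return std::wcout; }
#else
typedef std::ostream ConsoleStream;
inline ConsoleStream& Console() { return std::cout; }
#endif

// Hypothetical helper: reports which character set the build selected.
inline bool IsWideBuild()
{
#if defined(UNICODE)
    return true;
#else
    return false;
#endif
}
```

Code that writes through Console() then receives a stream whose character type matches the string literals produced by the build's character set.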
When using wide.resize( narrow.length() ) (and the narrow version), do not use length() + 1. The resulting string would contain an additional trailing null character as part of its contents, which breaks some substring code and most string matching code.
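The effect of the extra element is easy to reproduce. The sketch below uses two hypothetical helpers that differ only in the resize argument, and assumes ASCII-only input to keep the copy loop trivial.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: correct sizing.
std::wstring WidenCorrect( const std::string& narrow )
{
    std::wstring wide;
    wide.resize( narrow.length() );
    for( std::size_t i = 0; i < narrow.length(); ++i )
        wide[i] = static_cast<wchar_t>( static_cast<unsigned char>( narrow[i] ) );
    return wide;
}

// Hypothetical helper: the off-by-one mistake described above.
std::wstring WidenWithExtraNull( const std::string& narrow )
{
    std::wstring wide;
    wide.resize( narrow.length() + 1 );   // BUG: one element too many
    for( std::size_t i = 0; i < narrow.length(); ++i )
        wide[i] = static_cast<wchar_t>( static_cast<unsigned char>( narrow[i] ) );
    return wide;                          // last element is an embedded L'\0'
}
```

The second result has a size one greater than the input and compares unequal to the expected literal, even though both print identically on most consoles.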
Sample
Please visit The Code Project and download A File Checksum Shell Menu Extension Dll.
Downloads
AppendixDLocales.zip - The C++ Programming Language (3rd Edition), Appendix D: Locales by Stroustrup - 232 kB