Adding a Hash
Crypto++ can be a challenge when attempting to add new algorithms, especially for new users. This article discusses how to add a new hash algorithm to the library. There's nothing revolutionary about this article. Rather, it walks you through the steps you would likely take on your own while explaining why the library does some things. It may also help with understanding the use of the Curiously Recurring Template Pattern in the library.
The example below adds IdentityHash, which is a hash that copies the first N bytes of input to an internal buffer and then provides it as the hashed data. N is a template parameter, and it represents the digest size of the hash. The new hash derives from HashTransformation as a base class.
While not readily apparent, IdentityHash may be useful for "raw" signing a hash under a private key. For example, the technique was used at Sign precomputed hash with ECDSA or DSA on Stack Overflow to sign an existing hash. From the security engineering perspective, you should avoid doing this because it divorces the message to be signed from the digest, and it puts the signature scheme at risk of substitution attacks. Put another way, you may not know what you are signing. Also see Whether to hash-then-sign with Dilithium and Falcon? on the Spasm mailing list.
Modern hashes, like BLAKE2, are designed to operate both with and without a key. If you have a hash that can operate both ways, then use a MessageAuthenticationCode instead of a HashTransformation as a base class.
HashTransformation
HashTransformation is the base class to use for hash classes. The interface is defined in cryptlib.h and most methods have a default implementation. There are three or four items that you need to add to get a working hash.
The easiest way to determine what you need is to try to compile a class which derives from HashTransformation. The methods you need to provide are pure virtuals without a body, and they will cause a compile error. In the case of IdentityHash:
$ cat test.cxx
#include "cryptlib.h"
using namespace CryptoPP;

class IdentityHash : public HashTransformation
{
};

int main(int argc, char* argv[])
{
    IdentityHash hash;
    return 0;
}
The compile results in:
test.cxx:4:7: note: because the following virtual functions are pure within ‘IdentityHash’:
 class IdentityHash : public HashTransformation
       ^~~~~~~~~~~~
In file included from test.cxx:1:0:
cryptlib.h:949:15: note: virtual void CryptoPP::HashTransformation::Update(const byte*, size_t)
  virtual void Update(const byte *input, size_t length) =0;
               ^~~~~~
cryptlib.h:976:23: note: virtual unsigned int CryptoPP::HashTransformation::DigestSize() const
  virtual unsigned int DigestSize() const =0;
                       ^~~~~~~~~~
cryptlib.h:1045:15: note: virtual void CryptoPP::HashTransformation::TruncatedFinal(CryptoPP::byte*, size_t)
  virtual void TruncatedFinal(byte *digest, size_t digestSize) =0;
               ^~~~~~~~~~~~~~
Required Functions
From the earlier compile results there are three functions you must implement: DigestSize, Update, and TruncatedFinal. DigestSize is a runtime function that returns the size of the digest. Update is the business logic of a hash and it usually implements the algorithm. The function buffers the input if the input is too small, and it can be called multiple times.
TruncatedFinal is the method that finalizes the hash. It may pad the last block of buffered input and then it completes processing the input. It also resets the hash for the next input. TruncatedFinal can output 0 to DigestSize() bytes. That means you can get the full digest, or you can ask for a partial digest. Other functions that finalize the hash, like Final, are routed into TruncatedFinal.
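The routing of Final into TruncatedFinal can be illustrated with a small standalone sketch. HashSketch and SumHash are hypothetical stand-ins written only for this illustration. They mimic the shape of the interface described above and are not the library's classes.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

typedef unsigned char byte;

// Hypothetical stand-in for the hash interface: three pure virtuals,
// plus a Final() that is routed into TruncatedFinal().
class HashSketch
{
public:
    virtual ~HashSketch() {}
    virtual unsigned int DigestSize() const = 0;
    virtual void Update(const byte *input, std::size_t length) = 0;
    virtual void TruncatedFinal(byte *digest, std::size_t digestSize) = 0;

    // Full-size finalization simply asks for DigestSize() bytes.
    void Final(byte *digest) { TruncatedFinal(digest, DigestSize()); }
};

// Toy one-byte "hash" (a running sum), only to exercise the interface.
class SumHash : public HashSketch
{
public:
    SumHash() : m_sum(0) {}

    unsigned int DigestSize() const { return 1; }

    void Update(const byte *input, std::size_t length)
    {
        for (std::size_t i = 0; i < length; ++i)
            m_sum = static_cast<byte>(m_sum + input[i]);
    }

    void TruncatedFinal(byte *digest, std::size_t digestSize)
    {
        if (digestSize > DigestSize())
            throw std::invalid_argument("digest size too large");
        if (digest && digestSize)
            digest[0] = m_sum;
        m_sum = 0;  // reset for the next message
    }

private:
    byte m_sum;
};
```

Calling Update twice with {1, 2} and then {3}, followed by Final, yields 6. A second Final with no new input yields 0 because the first call reset the state.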
DigestSize
Because we need to provide a value for DigestSize, IdentityHash needs a template parameter that gets returned when DigestSize is called. So the first change to make is:
template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    virtual unsigned int DigestSize() const
    {
        return HASH_SIZE;
    }
};
In real life, you will probably use a CRYPTOPP_CONSTANT because the digest size is fixed, and you don't need the template parameter. We revisit CRYPTOPP_CONSTANT below.
Update
The next item which needs tending is Update. Update is usually where your algorithm is implemented. It's also where buffering usually occurs. IdentityHash buffers the first N bytes of input to use as the digest when TruncatedFinal is called. Our hash just copies bytes into an accumulator.
The change would look similar to the code below. m_digest is the accumulator, and m_idx tracks where to write and what's been written. The extra gyrations try to ensure unexpected parameters and wrap are handled gracefully.
template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    virtual unsigned int DigestSize() const
    {
        return HASH_SIZE;
    }

    virtual void Update(const byte *input, size_t length)
    {
        // Copy no more than the caller supplied and no more than the room left
        size_t sz = STDMIN(STDMIN<size_t>(HASH_SIZE, length), HASH_SIZE - m_idx);
        if (sz)
            std::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

private:
    SecByteBlock m_digest;
    size_t m_idx;
};
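The size calculation in Update is easier to follow with concrete numbers. The helper below is a hypothetical, library-free restatement of that one expression. BytesToCopy is a name invented for this sketch, and it assumes the index never exceeds the digest size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper mirroring the size calculation in Update():
// copy no more than the caller supplied and no more than the room left.
std::size_t BytesToCopy(std::size_t digestSize, std::size_t length, std::size_t idx)
{
    const std::size_t room = digestSize - idx;   // assumes idx <= digestSize
    return std::min(std::min(digestSize, length), room);
}
```

With a 32-byte digest, BytesToCopy(32, 10, 0) is 10, BytesToCopy(32, 40, 10) is 22, and BytesToCopy(32, 5, 32) is 0. Input beyond the first 32 bytes is therefore silently ignored.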
TruncatedFinal
TruncatedFinal finalizes the hash and copies the result to the caller. It may pad the last block of buffered input and then completes processing the input. It also validates the requested digest size. In the case of IdentityHash it also adds some business logic to ensure HASH_SIZE bytes are input.
The change would look similar to the code below. ThrowIfInvalidTruncatedSize is built into the library. It uses DigestSize to validate the requested size and throws an exception if the size is invalid.
Copying the hash to the buffer supplied by the user is guarded for a NULL pointer. Using a NULL pointer with memcpy is undefined behavior in C and C++. Some users may call TruncatedFinal with a NULL pointer to reset the hash. In fact, the default implementation of Restart in cryptlib.h does so.
template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    virtual unsigned int DigestSize() const
    {
        return HASH_SIZE;
    }

    virtual void Update(const byte *input, size_t length)
    {
        size_t sz = STDMIN(STDMIN<size_t>(HASH_SIZE, length), HASH_SIZE - m_idx);
        if (sz)
            std::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

    virtual void TruncatedFinal(byte *digest, size_t digestSize)
    {
        // Validate input
        if (m_idx != HASH_SIZE)
            throw Exception(Exception::OTHER_ERROR, "Input size must be " + IntToString(HASH_SIZE));

        // Validate output
        ThrowIfInvalidTruncatedSize(digestSize);

        // Copy the input to output
        if (digest)
            std::memcpy(digest, m_digest, digestSize);

        // Reset for next hash
        m_idx = 0;
    }

private:
    SecByteBlock m_digest;
    size_t m_idx;
};
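To experiment with the three checks in TruncatedFinal without building the library, a rough imitation can be written with only the standard library. IdentitySketch and kSize are hypothetical names. The size check stands in for ThrowIfInvalidTruncatedSize, and std::vector replaces SecByteBlock, so the sketch does not wipe memory the way the real class would.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

const std::size_t kSize = 32;   // hypothetical fixed digest size

// Hypothetical, library-free imitation of the class above.
class IdentitySketch
{
public:
    IdentitySketch() : m_digest(kSize), m_idx(0) {}

    void Update(const unsigned char *input, std::size_t length)
    {
        const std::size_t sz = std::min(length, kSize - m_idx);
        if (sz)
            std::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

    void TruncatedFinal(unsigned char *digest, std::size_t digestSize)
    {
        // Validate input: exactly kSize bytes must have been supplied
        if (m_idx != kSize)
            throw std::runtime_error("Input size must be 32");

        // Stand-in for ThrowIfInvalidTruncatedSize
        if (digestSize > kSize)
            throw std::invalid_argument("Truncated digest size is too large");

        // Guard memcpy against a null output pointer
        if (digest)
            std::memcpy(digest, &m_digest[0], digestSize);

        // Reset for next hash
        m_idx = 0;
    }

private:
    std::vector<unsigned char> m_digest;
    std::size_t m_idx;
};
```

Asking for 16 bytes after supplying 32 returns the first half of the input. Asking for 33 bytes throws, and so does finalizing before 32 bytes have been supplied.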
Additional Members
DigestSize, Update, and TruncatedFinal provide the meat and potatoes of a Crypto++ hash. We call them the "meat and potatoes" because they are the business logic and implementation of the hash algorithm.
There are a few loose ends to tie up before the hash can be used in the library. They would be discovered when you use IdentityHash in a real program.
Construction
As you study the code you probably noticed we don't depend on constructors very much. That's by design and the reasons are not discussed here. However, the object still needs some initialization since the constructor is where Init occurs in the Crypto++ implementation of the Init/Update/Final model.
For initialization IdentityHash needs the accumulator properly sized and m_idx set to an initial value. The initialization would look as shown below.
One thing to keep in mind when designing your class is that the testing framework does not know how to select an overloaded constructor to get your object into a certain state. The constructor should be simple and get the object into a state where it is ready to start processing data.
More complex algorithms, like block ciphers operated in a mode of operation, have particular methods the testing framework calls for tasks like setting keys and initialization vectors. If more information needs to be passed to an object, then NameValuePairs are used to pass the additional information. However, HashTransformation does not use them.
template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    // Size the accumulator and start writing at the beginning
    IdentityHash() : m_digest(HASH_SIZE), m_idx(0) {}

    virtual unsigned int DigestSize() const
    {
        return HASH_SIZE;
    }

    virtual void Update(const byte *input, size_t length)
    {
        size_t sz = STDMIN(STDMIN<size_t>(HASH_SIZE, length), HASH_SIZE - m_idx);
        if (sz)
            std::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

    virtual void TruncatedFinal(byte *digest, size_t digestSize)
    {
        if (m_idx != HASH_SIZE)
            throw Exception(Exception::OTHER_ERROR, "Input size must be " + IntToString(HASH_SIZE));

        ThrowIfInvalidTruncatedSize(digestSize);

        if (digest)
            std::memcpy(digest, m_digest, digestSize);

        m_idx = 0;
    }

private:
    SecByteBlock m_digest;
    size_t m_idx;
};
Restart
Initializing and restarting some hash functions is non-trivial. Some hashes provide a Restart function that's called in the constructor and TruncatedFinal. It's up to you whether you want to add a Restart function.
Restart is declared in cryptlib.h, and the default implementation performs:
virtual void Restart() {TruncatedFinal(NULLPTR, 0);}
Our implementation of IdentityHash could provide an override which performs:
virtual void Restart() {m_idx = 0;}
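One reason to consider the override: as written, TruncatedFinal checks the input size first, so restarting through TruncatedFinal would throw if fewer than HASH_SIZE bytes had been buffered. The sketch below models only that interaction. RestartSketch and DirectRestartSketch are hypothetical classes, and Update merely counts bytes instead of copying them.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in that keeps only the state Restart cares about.
class RestartSketch
{
public:
    RestartSketch() : m_idx(0) {}
    virtual ~RestartSketch() {}

    // Pretend to buffer: only the byte count is tracked in this sketch.
    void Update(std::size_t length) { m_idx += length; }

    virtual void TruncatedFinal(unsigned char *, std::size_t)
    {
        if (m_idx != 32)
            throw std::runtime_error("Input size must be 32");
        m_idx = 0;
    }

    // Mirrors the default: restart by finalizing into a null buffer.
    virtual void Restart() { TruncatedFinal(NULL, 0); }

    std::size_t Buffered() const { return m_idx; }

protected:
    std::size_t m_idx;
};

// Override that resets the state directly.
class DirectRestartSketch : public RestartSketch
{
public:
    virtual void Restart() { m_idx = 0; }
};
```

After buffering 10 bytes, Restart on RestartSketch throws, while Restart on DirectRestartSketch simply clears the count.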
DIGESTSIZE
All Crypto++ hashes provide a constant called DIGESTSIZE. It's a compile time constant, and it's often used by DigestSize at runtime. With the standard constant DIGESTSIZE in place, we can switch to it instead of the non-standard HASH_SIZE.
template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = HASH_SIZE)

    IdentityHash() : m_digest(DIGESTSIZE), m_idx(0) {}

    virtual unsigned int DigestSize() const
    {
        return DIGESTSIZE;
    }
    ...
};
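The difference between the compile time constant and the runtime function can be seen in a small standalone sketch. DigestSizeSketch is a hypothetical class, and the enum is just one portable way to expose such a constant. The library's macro may expand to something different depending on the compiler.

```cpp
#include <cassert>

// Hypothetical sketch: an enum is one portable way to expose a
// compile-time constant; the library's macro may expand differently.
template <unsigned int HASH_SIZE = 32>
class DigestSizeSketch
{
public:
    enum { DIGESTSIZE = HASH_SIZE };

    // Runtime accessor that simply reports the compile-time value
    unsigned int DigestSize() const { return DIGESTSIZE; }
};

// The constant is usable where a constant expression is required,
// such as an array bound. DigestSize() could not be used here.
typedef unsigned char Digest256[DigestSizeSketch<32>::DIGESTSIZE];
static_assert(sizeof(Digest256) == 32, "DIGESTSIZE is known at compile time");
```

Callers that only hold a base class pointer use the runtime function, while code that knows the concrete type can size buffers from the constant.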
StaticAlgorithmName
StaticAlgorithmName returns the name of the hash. It's used extensively throughout the library, and it mostly surfaces under the benchmark programs. The Curiously Recurring Template Pattern provides polymorphic behavior for the static function.
template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = HASH_SIZE)

    static const char * StaticAlgorithmName()
    {
        return "IdentityHash";
    }
    ...
};
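For readers new to the pattern, the following is a minimal sketch of how the Curiously Recurring Template Pattern lets a base template reach a derived class's static function. NamedSketch, FooHashSketch and BarHashSketch are hypothetical names. The library's own helper classes are not reproduced here.

```cpp
#include <cassert>
#include <string>

// Hypothetical CRTP helper: the base template is parameterized on the
// derived class, so it can call the derived class's static function.
template <class DERIVED>
class NamedSketch
{
public:
    virtual ~NamedSketch() {}

    virtual std::string AlgorithmName() const
    {
        return DERIVED::StaticAlgorithmName();
    }
};

class FooHashSketch : public NamedSketch<FooHashSketch>
{
public:
    static const char * StaticAlgorithmName() { return "FooHash"; }
};

class BarHashSketch : public NamedSketch<BarHashSketch>
{
public:
    static const char * StaticAlgorithmName() { return "BarHash"; }
};
```

Each derived class supplies only the static function, and AlgorithmName returns "FooHash" or "BarHash" depending on the object it is called on.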
AlgorithmName
AlgorithmName can be used to fine tune the algorithm name. In the case of IdentityHash the digest size can be added. If StaticAlgorithmName provides the complete name, then AlgorithmName is not needed.
template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = HASH_SIZE)

    static const char * StaticAlgorithmName()
    {
        return "IdentityHash";
    }

    // Append the digest size in bits, for example "IdentityHash-256"
    std::string AlgorithmName() const
    {
        return std::string(StaticAlgorithmName()) + "-" + IntToString(DIGESTSIZE*8);
    }
    ...
};
More Members
IdentityHash was fairly simple and it only needed a few member functions to become operational. Real algorithms often need to use more facilities from the library. Some of them are listed below (and some of them are not even class members).
CRYPTOPP_NO_VTABLE
You will often see CRYPTOPP_NO_VTABLE used in class declarations. It is a preprocessor macro used to help flatten objects by removing intermediate object vtables.
CRYPTOPP_NO_VTABLE is used with Microsoft compilers on Windows. On Windows the macro expands to __declspec(novtable), while on other platforms it is empty. CRYPTOPP_NO_VTABLE should only be applied to pure interface classes, meaning classes that will never be instantiated on their own.
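A macro of this kind might be defined and applied as sketched below. SKETCH_NO_VTABLE, ShapeInterface and Triangle are hypothetical names chosen for illustration. The library's actual definition lives in its configuration headers and may test different conditions.

```cpp
#include <cassert>

// Hypothetical macro modelled on the description above.
#if defined(_MSC_VER)
# define SKETCH_NO_VTABLE __declspec(novtable)
#else
# define SKETCH_NO_VTABLE
#endif

// Pure interface: never instantiated on its own, so the hint is safe.
class SKETCH_NO_VTABLE ShapeInterface
{
public:
    virtual ~ShapeInterface() {}
    virtual int Sides() const = 0;
};

// Concrete class: declared without the macro, so it gets a normal vtable.
class Triangle : public ShapeInterface
{
public:
    virtual int Sides() const { return 3; }
};
```

Virtual dispatch through a ShapeInterface reference still works because the most derived class, Triangle, carries a complete vtable.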
OptimalDataAlignment
OptimalDataAlignment allows you to specify how you would like data aligned. Some algorithms can operate efficiently on bytes, while others need alignment for 32-bit or 64-bit words, and still others need 16-byte alignment for SSE2.
cryptlib.h provides a default implementation of OptimalDataAlignment for BlockTransformation, StreamTransformation and HashTransformation. From cryptlib.cpp:
unsigned int HashTransformation::OptimalDataAlignment() const
{
    return GetAlignmentOf<word32>();
}
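What an alignment value means for a caller can be shown with standard C++ alone. The helpers below are hypothetical and only approximate the idea. They assume alignof is an acceptable stand-in for the library's GetAlignmentOf, which may return a different value on some configurations.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: report the natural alignment of a type.
template <class T>
unsigned int AlignmentOfSketch()
{
    return static_cast<unsigned int>(alignof(T));
}

// Hypothetical helper: true if ptr is a multiple of the given alignment.
inline bool IsAlignedOnSketch(const void *ptr, unsigned int alignment)
{
    return reinterpret_cast<std::uintptr_t>(ptr) % alignment == 0;
}
```

A caller could use such a check to decide whether input can be processed in place or should first be copied to an aligned buffer.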
Using IdentityHash
With all the pieces in place the sample program would look as follows.
$ cat test.cxx
#include "cryptlib.h"
#include "secblock.h"

#include <iostream>
#include <string>

using namespace CryptoPP;

template <unsigned int HASH_SIZE = 32>
class IdentityHash : public HashTransformation
{
public:
    CRYPTOPP_CONSTANT(DIGESTSIZE = HASH_SIZE)

    static const char * StaticAlgorithmName()
    {
        return "IdentityHash";
    }

    IdentityHash() : m_digest(DIGESTSIZE), m_idx(0) {}

    virtual unsigned int DigestSize() const
    {
        return DIGESTSIZE;
    }

    virtual void Update(const byte *input, size_t length)
    {
        size_t sz = STDMIN(STDMIN<size_t>(DIGESTSIZE, length), DIGESTSIZE - m_idx);
        if (sz)
            std::memcpy(&m_digest[m_idx], input, sz);
        m_idx += sz;
    }

    virtual void TruncatedFinal(byte *digest, size_t digestSize)
    {
        if (m_idx != DIGESTSIZE)
            throw Exception(Exception::OTHER_ERROR, "Input size must be " + IntToString(DIGESTSIZE));

        ThrowIfInvalidTruncatedSize(digestSize);

        if (digest)
            std::memcpy(digest, m_digest, digestSize);

        m_idx = 0;
    }

    std::string AlgorithmName() const
    {
        return std::string(StaticAlgorithmName()) + "-" + IntToString(DIGESTSIZE*8);
    }

private:
    SecByteBlock m_digest;
    size_t m_idx;
};

int main(int argc, char* argv[])
{
    std::string message(32, 'A');
    IdentityHash<32> hash;
    hash.Update((const byte*)message.data(), message.size());

    // Write through &digest[0] because data() returns a const pointer before C++17
    std::string digest(32, 0);
    hash.TruncatedFinal((byte*)&digest[0], digest.size());

    std::cout << "Message: " << message << std::endl;
    std::cout << " Digest: " << digest << std::endl;

    return 0;
}
Running the program produces the expected output:
$ ./test.exe
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
 Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
A more complete example using a random private key to sign the precomputed hash is shown below.
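// Headers needed in addition to those in the previous listing
#include "osrng.h"
#include "eccrypto.h"
#include "oids.h"
#include "filters.h"
#include "hex.h"

int main(int argc, char* argv[])
{
    AutoSeededRandomPool prng;

    // Random private key over secp256r1
    ECDSA<ECP, IdentityHash<32> >::PrivateKey privateKey;
    privateKey.Initialize(prng, ASN1::secp256r1());

    // The 32-byte message stands in for the precomputed hash
    std::string message(32, 'A'), signature;
    ECDSA<ECP, IdentityHash<32> >::Signer signer(privateKey);

    StringSource ss(message, true,
        new SignerFilter(prng, signer,
            new HexEncoder(new StringSink(signature))
        ) // SignerFilter
    ); // StringSource

    std::cout << "Signature: " << signature << std::endl;

    return 0;
}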
Algorithm Testing
Once you have an algorithm cut-in you usually want to test it. The following three sections detail how to perform testing and evaluation within the Crypto++ testing framework.
Algorithm Registration
The Crypto++ test framework includes an object registry for testing and benchmarks. IdentityHash is a good example of how to register algorithms by name because StaticAlgorithmName always returns IdentityHash, and not IdentityHash-256, IdentityHash-512, etc.
To register algorithm variations by name, open regtest2.cpp and add the following to register 32-byte and 64-byte variants of IdentityHash. They are used below in Algorithm Validation and Algorithm Benchmarking.
RegisterDefaultFactoryFor<HashTransformation, IdentityHash<32> >("IdentityHash-256");
RegisterDefaultFactoryFor<HashTransformation, IdentityHash<64> >("IdentityHash-512");
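The idea behind registering by name can be sketched with a plain map from names to factory functions. Everything below is hypothetical and written for illustration: HashBaseSketch, SizedHashSketch, RegisterSketch and CreateSketch are not library names, and the library's registry is more elaborate.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; not the library's registry classes.
struct HashBaseSketch
{
    virtual ~HashBaseSketch() {}
    virtual unsigned int DigestSize() const = 0;
};

template <unsigned int N>
struct SizedHashSketch : HashBaseSketch
{
    unsigned int DigestSize() const { return N; }
};

typedef std::unique_ptr<HashBaseSketch> (*FactoryFn)();

template <class T>
std::unique_ptr<HashBaseSketch> MakeSketch()
{
    return std::unique_ptr<HashBaseSketch>(new T);
}

// Name-to-factory table; a function-local static avoids init-order issues.
inline std::map<std::string, FactoryFn>& RegistrySketch()
{
    static std::map<std::string, FactoryFn> registry;
    return registry;
}

// Associate an explicit name with one template instantiation.
template <class T>
void RegisterSketch(const std::string& name)
{
    RegistrySketch()[name] = &MakeSketch<T>;
}

// Returns a new object for a registered name, or an empty pointer.
inline std::unique_ptr<HashBaseSketch> CreateSketch(const std::string& name)
{
    std::map<std::string, FactoryFn>::const_iterator it = RegistrySketch().find(name);
    return it == RegistrySketch().end() ? std::unique_ptr<HashBaseSketch>() : it->second();
}
```

Because the name is passed explicitly at registration, two instantiations of the same template can be told apart even though they share one static name.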
Algorithm Validation
A real hash should have test vectors, and the vectors should be exercised by the cryptest.exe program. Adding the functionality requires five steps. First, open validate.h and add a declaration for ValidateIdentityHash.
Second, open test.cpp and add a call to ValidateIdentityHash in Validate at the bottom of the source file:
bool Validate(int alg, bool thorough, const char *seedInput)
{
    ...
    switch(alg)
    {
    case 0: result = Test::ValidateAll(thorough); break;
    ...
    case 500: result = Test::ValidateIdentityHash(); break;
    }
}
Third, open validat1.cpp and add ValidateIdentityHash to the function ValidateAll:
bool ValidateAll(bool thorough)
{
    bool pass=TestSettings();
    ...
    pass=ValidateIdentityHash() && pass;
    ...
}
Fourth, open validat2.cpp and add the implementation. In the case of IdentityHash we can use known answers:
bool ValidateIdentityHash()
{
    std::cout << "\nIdentityHash validation suite running...\n\n";
    return RunTestDataFile(CRYPTOPP_DATA_DIR "TestVectors/identhash.txt");
}
Fifth, add the following to TestVectors/identhash.txt. Be mindful of whitespace because the Crypto++ parser is sensitive to the location of new lines when parsing. An empty line indicates the start of a new algorithm, and it's easy to add one in the wrong place.
AlgorithmType: MessageDigest
Source: Calculated offline with Crypto++ library
Name: IdentityHash-256
Comment: 32-byte hash
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Test: Verify
Name: IdentityHash-256
Comment: 32-byte hash, 64-byte input
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Test: Verify
Name: IdentityHash-512
Comment: 64-byte hash
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Test: Verify
Name: IdentityHash-512
Comment: 64-byte hash, 96-byte input
Message: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Digest: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Test: Verify
Algorithm Benchmarking
To add IdentityHash to the benchmarking gear, open bench2.cpp and add the following around the other hashes.
BenchMarkByNameKeyLess<HashTransformation>("IdentityHash-256");
BenchMarkByNameKeyLess<HashTransformation>("IdentityHash-512");
Warning
As stated earlier, signing a precomputed hash could subject the underlying signature scheme to a number of attacks, including substitution attacks. According to Bernstein on the Spasm mailing list:
But moving this [hash computation] _out_ of the underlying signature system is dangerous. Applications will often expose the underlying signature system directly to attackers. For example, an RSA HSM that returns h^d given h, trusting the environment to choose h as a hash, is breakable by essentially the attack of https://link.springer.com/article/10.1007/s00145-015-9205-5.
Also see Whether to hash-then-sign with Dilithium and Falcon? on the Spasm mailing list.