Enhancing the block hash function
Ryan Matthew Lederman, ryan@ript.net
26 January 2003, USA
-------------------------------------------
1. SYNOPSIS
2. DESIGN PRINCIPLES
3. PERFORMANCE BENEFITS
4. SECURITY NOTES
5. SUMMARY
6. EXAMPLE IMPLEMENTATION
-------------------------------------------
Section 1 : Synopsis
This document is based on some new hash
function design principles, and looks specifically at potential design advances
in block hash functions like the MD (Message Digest) family of hash functions.
It will be discussed how it is possible, at least in certain
implementations, to increase the output size of the block function by 50%
while still retaining a good portion of the original
performance. For the examples in this document, the MD5 function is used,
since it is very popular as well as the basis for design of other
functions. The findings discussed here should be applicable to other
block hash functions as well.
The primary reason that one would want to increase the output size of a hash
function is simple. As computer hardware advances, the difficulty
involved in a brute-force attack on a cryptographic system is lessened. A
decade ago, a 128-bit hash was considerably less computationally feasible to
break, since processing power was so much more restrictive. Today, with
the exponential growth in technology, this size becomes less secure, in
theory. Increasing the output of a function to 192-bits increases the
difficulty in attacking the function by an astronomically large amount, and
provides peace of mind for cryptographers relying on the function to be secure.
Unfortunately, modern hash functions that have a larger output size are
incredibly slow in comparison to older and smaller functions like MD5.
These functions do not need to be slow, and the authors of these functions have
overlooked a simple enough concept in their design.
The design principles introduced here are formulated with software-based
implementations in mind. These principles may not be ideal, or even
feasible, for certain platforms with limited resources or processing
power. However, when these principles are considered for implementation
on any nearly-modern computer of moderate quality, the results are quite
staggering. Since a good portion of computer-related products are
designed for the PC/server architecture, this information is necessary.
We will begin by exploring exactly what these new design principles are, and
why they are advantageous.
Section 2 : Design Principles
The ideas presented here are based on a single concept. This is the concept of optimal allocation of CPU cycles used during a hash function's execution. We all know that the purpose of a hash function is to easily compute the fingerprint for any message. For optimal performance, the CPU cycles required by this process should be used in such a way that they are directly responsible for the computation of the fingerprint. In other words, CPU cycles needed to compute the fingerprint should be allocated so that a minimal amount are used for secondary calculations.
In the MD-style function, these secondary calculations include things such as:
These will be referred to as secondary calculations for the remainder of this document. These secondary calculations, although necessary, can be used in such a way that they have less impact on the time needed to compute a fingerprint. By taking CPU cycles away from secondary calculations, more cycles are available for what will be referred to as primary calculations. Primary calculations will be any calculation performed by the function that is directly involved in the generation of the fingerprint. These will be defined as "Any operation included in the mathematical algorithm of the function." Some primary calculations are listed here:
Obviously, in an ideal situation, all CPU cycles would be dedicated to the direct computation of the fingerprint. Unfortunately, given the design of computers and the complexities involved in said computation, this is just not possible. No matter what, some cycles must be spent preparing data or temporary memory for the actual computation. The computer is not capable of determining what you intend to do -- at least not at this point in time.
There is one fundamental way in which optimal allocation of CPU cycles can be attained: processing larger blocks of message data.
MD5 processes message data in blocks of 64 octets. In
the example implementation of an enhanced version of MD5, a 192 octet
block-size will be used. When increasing the block-size of a function,
you must also increase the amount of operations performed on said block, so
that the security of the function is not weakened. In other words, you
must perform an operation at least once for every part of the block, whether
these parts are measured as bits, octets, or words. This is necessary so
that every bit of message data has an effect on the fingerprint produced.
Thus, as you increase the size of the data processed with each pass, you must
also increase the amount of operations performed on the block to retain this
1-to-1 relationship between parts of the message and their effect on the output
of the function. This is because with the addition of 2 chaining
variables, each pass must ensure that all chaining variables are modified
by the compression functions equally for each pass-set.
As a result of this required 1-to-1 relationship, there is a point where
increasing the block size no longer maintains an advantage over smaller block
sizes. A block of 192 octets is sufficiently large as to increase
performance by a hefty margin, but not so large as to be counter-productive.
It is not possible to simply increase the amount of data processed in each pass of the function without a complete overhaul of the function's internal mechanisms. This modification of the function's design requires a few special changes to the internal mechanisms, most notably:
The addition of chaining variables is not very difficult, but the result of this is that now the internal design of the function requires adjustment. This requires much design consideration and security consideration, and is outside of the scope of this document. This document will only discuss here that it is possible to perform these modifications, not how they should be performed.
Before delving further in to the discussion of the benefits that
result from these design principles, let's take some time to discuss the
sample enhanced MD-style function that will be used in all examples
and comparisons for the remainder of the document.
For completeness, an example ANSI C implementation of an enhanced MD-style
function that puts to use the design principles introduced in this document has
been prepared. It is code named S-MD5, for Super-MD5. Do
not be fooled by this name. In no way, shape or form is S-MD5 going to
produce the same output as MD5. It is simply a mock
function for testing and example purposes. Throughout the rest of
this document, it will be used as an example enhanced block
function. The example is not designed to be considered a concrete
implementation, but rather a good model for measurement of performance
benefits that result from the introduction of the ideas
presented. By performing benchmark tests with S-MD5 and MD5 side-by-side,
the performance gains, or otherwise, will be made evident.
S-MD5's design is the result of borrowing code from other hash
functions (MD5 and Validus), with some ad hoc functionality. The
compression functions are designed to accept 6 chaining variables as input, but
their functionality is essentially the same as MD5's. These compression
functions are not designed to be secure. This is
because there was no time to waste on the theoretical security of
these functions, but they still needed to make use of the extra chaining
variables in order to get realistic benchmark statistics.
The majority of the code found in S-MD5 is taken directly from Validus. The exceptions to this statement are that one of the two cyclical rotations in the compression functions has been removed (and replaced them with MD5's rotation constants), the internals of the compression functions have been changed to more closely resemble MD5, and the mixer functions have been replaced with those of MD5.
Now the performance advantages gained by the ideas introduced here will be thoroughly discussed and measured. The results of benchmark tests involving not only MD5 and S-MD5 will be documented, but other hash functions as well. This will provide a good visual and conceptual scale of the relative performance (on the test platform) of different hash functions in existence.
Section 3 : Performance Benefits
First and foremost, let's examine the reasons why our enhanced function can produce more output, and not lose performance when compared to its original design.
As discussed in Section 2, the re-allocation of CPU cycle consumption
is the fundamental concept behind this enhanced function. Let's look at
what this really means, mathematically.
For a message of about 1 billion octets in size (1 GB), it is
apparent how the larger block size would be advantageous:
Function Iterations
--------------------------------------------------
MD5 1GB / 64
octets = (1,073,741,824 / 64) = 16,777,216.000
S-MD5 1GB / 192 octets
= (1,073,741,824 / 192) = 5,592,405.333
This small table compares the iterations required by both MD5 and our enhanced function to process exactly 1 GB of message data. What we can see here is that MD5 needs to make about 16.7 million iterations, and our enhanced function about 5.5 million. This is a difference of ~11,184,810 iterations, or about 66%.
During each iteration, secondary calculations are performed to
prepare a block of the message to be hashed, perform any special operations
required, and change the state of the function to reflect its new
progress. The direct result of this large decrease in iterations for our
enhanced function is that 66% less time needs to be spent executing secondary
calculations, and can now be used for primary calculations.
The amount of time saved in secondary calculations by performing 66% less
iterations is expressed as:
(where Ts is time saved for the entire duration of the fingerprint
generation, I is the number of iterations not performed, and Ti
is the time spent on secondary calculations for each iteration)
Ts = (I x Ti)
Therefore, for the enhanced function to retain at least the overall performance of the original function, it must be able to perform all of its additional primary calculations in the time range [0 ... Ts]. This can be represented in two ways: One was just mentioned, that is, that the sum of the time it takes to perform the additional primary calculations of each iteration is within the range just defined. Another way is to say that the time required for the additional primary calculations in each iteration must be within the range [0 ... Ti].
The design of the enhanced function's mixer and compression functions will be the deciding factor in the overall performance comparison of the enhanced and original functions. If the augmentation of the internal functions causes a massive performance loss over the original counterparts, then obviously the enhanced function could not be able to retain the overall performance of the original function. However, the enhanced function, although slower overall, would still be faster relatively because it produces more output. The enhanced function would have to be 50% slower than the original to be at the same speed/output ratio, since it produces 50% more output bits.
We can determine how much overall performance is gained or lost in
the enhanced function by running benchmark tests that measure the throughput of
both versions of the function, as well as other existing hash functions.
In the following benchmarks, the host computer is equipped with an
Intel Pentium 4 Mobile processor with a clock speed of 2.2GHz, and 512MB
of DDR memory. The compiler used is the Intel C++ compiler v6.0.
The following benchmark statistics are the result of a 1GB random message
test. The hash functions that were tested are listed from fastest to
slowest. All the functions were set to little-endian mode so that no time
was spent swapping the byte order of the message.
Function Time
elapsed
Throughput Output size
------------------------------------------------------------------
S-MD5 3835ms (3.84
sec) 248.68 MB/sec 192
bits
MD5 4377ms
(4.38 sec) 217.88 MB/sec 128
bits
HAVAL 8573ms (8.57
sec) 111.24 MB/sec 256
bits
SHA-1 13429ms (13.43
sec) 71.02 MB/sec 160 bits
Tiger 16624ms (16.62
sec) 57.37 MB/sec 192 bits
RMD-160 17045ms (17.05
sec) 55.95 MB/sec 160 bits
The results of this benchmark are clear; the enhanced S-MD5 function
was able to process the additional primary calculations required by its larger
block size in less time than Ts. In fact, S-MD5 actually processed the
data in less time than MD5. This can be attributed to the ad hoc
design of the compression functions in S-MD5, and perhaps that the MD5
implementation used is not very optimized. In a real-world implementation
of the enhanced function, the compression and mixer functions would most likely
be more complex, and therefore, slower.
However, this benchmark still shows that it is definitely possible to extend
the MD-style function to 192-bits of output while retaining comparable overall
performance. This is possible, of course by the simple redistribution of
CPU time.
Section 4 : Security Notes
If the security of the function was compromised by enhancing it to
produce more output, then the whole idea would be useless. Nobody has a
use for a less secure hash function, even if it is much faster.
Therefore, the security retention of the enhanced function is an important
subject. However, any in-depth discussion of security in a theoretical
enhanced function would be purely speculation, and not within the scope of this
document.
However, there is a statement that can be made about the theoretical security
of the enhanced function in relation to the original without risking
speculation: If the internal workings of the enhanced function are
properly expanded from the original version, then the enhanced version would
have at least the security level of the original. That is, that the
enhanced function makes use of the same design principles found in the
original, secure function, but expands the design to accommodate the additional
chaining variables. Additionally, the enhanced function would have the
extra security provided by the 50% more output bits.
Obviously, the ultimate burden for security would be on the designer of the enhanced function. All that is implied here is that it is possible to make the enhanced function at least as secure as the original, but likely, more secure.
Section 5 : Summary
The ideas here are presented to you in order to show the potential of their application to existing and future block hash functions. Although, it is doubtful that any currently used functions will be replaced just because there is a faster version. Since the enhanced function would not be able to retain the same internal algorithm, it wouldn't be compatible with software implementing the original version. This is not to say that the ideas introduced in this document won't be taken in to consideration when the next generation of hash functions are designed, and by all means they should. It is the natural progression of technology that bigger and better designs emerge as the direct result of this type of information, that is, information that results from people seeing what others may have missed in the past. Perhaps Sir Isaac Newton said it best: "If I have seen farther, it is because I have stood on the shoulders of giants."
Section 6 : Example Implementation
The ANSI C source code of the S-MD5 test function is available
here. Remember, this is only an example function, and shouldn't be used
for cryptography. Choose the file format that you wish to download:
.ZIP Format, 5 KB
.TGZ Format, 5 KB