Cryptographic hash functions

Introduction

In this lesson we will look at cryptographic hash functions which see extensive use in quick validation & authentication.

By the end of the lesson we will have covered:

What cryptographic hash functions are
Python code examples demonstrating the use of hash functions
A look at applications of cryptographic hashing
The security of cryptographic hashing
Threats to these algorithms from both classical and quantum computers

Introduction to cryptographic hashing

represent a valuable construct in cryptography as they help enable validation with confidentiality. As such, hash functions form an important component of mechanisms for data authentication and integrity, such as (HMAC) and . This article will discuss the basic ideas and security considerations underpinning cryptographic hash functions and outline potential vulnerabilities from the advent of quantum computing.

Basic rationale and design of hash functions

There are many situations where authentication and integrity verification need to be performed cheaply and without revealing private information to the party performing the validation.

For example, when downloading software from a remote server, an efficient mechanism is needed to verify that the software actually downloaded has not been modified since being created by the original author of the software. Similarly, when authenticating users of web applications, it would be desirable to use a mechanism that does not involve physically storing or transmitting the actual passwords, which can potentially compromise their confidentiality.

Cryptographic hash functions (CHFs) address such needs efficiently and securely.

Fundamentally, a cryptographic hash function takes an input (or message) of arbitrary length and returns a fixed-size string of n-bits as output. The output of a is also referred to as a digest.

A useful CHF should satisfy several key properties:

Uniformity: The digests produced by a CHF should be distributed uniformly and should look random. The aim is to ensure the output leaks no information about the input.
Determinism: For a given input, a CHF must always produce the same , i.e., it must be deterministic.
Irreversibility: A CHF is a in that, given a digest, it should be infeasible to invert the hashing and obtain the input.
Approximate injectivity: While CHFs are many-to-one functions, they should appear to look like one-to one functions. This is achieved by combining an enormous output space size with the avalanche effect whereby tiny changes in the input lead to wildly divergent digests. This characteristic is known as approximate injectivity.

Given this, it's possible to validate a piece of data against the original instance by comparing a digest of the data to a digest of the original.

If the two digests match, we can be confident with high probability that the data is identical to the original.
If the digests differ, we can be sure that the data was tampered with or is otherwise inauthentic.

Since the CHF digests themselves do not reveal the actual contents of the data or the original, they enable validation while preserving privacy.

A hash function $H$ can be defined as:

$H : Σ^* \rightarrow \{0,1\}^n$

where $Σ^*$ is the set of all possible strings which we may consider to be binary strings of any length.

The fact that the size of the input domain $Σ^*$ of $H$ is unbounded while that of the co-domain $\{0,1\}^n$ is bounded means that $H$ is necessarily many-to-one mapping infinitely many inputs to any given n-bit string.

The properties of uniformity and determinism are nicely encapsulated within the of cryptographic hashing.

Example of cryptographic hashing in Python with SHA-256

This simple example demonstrates cryptographic hashing using the popular algorithm as provided by the cryptography Python library.

First we show how a minor difference in plain texts leads to a very large difference in the hash digests.

Output:

The two messages differ by 1 characters

The two messages differ in exactly one character.

Next, we instantiate hash objects to enable the hashing process, which involves calls to two methods: update and finalize.

We see that due to the avalanche effect in the SHA-256 CHF, a one-character difference in input messages yields two very different digests.

Output:

digest-1: 6e0e6261b7131bd80ffdb2a4d42f9d042636350e45e184b92fcbcc9646eaf1e7
digest-2: 6b0abb368c3a1730f935b68105e3f3ae7fd43d7e786d3ed3503dbb45c74ada46
The two digests differ by 57 characters

Applications of cryptographic hashing

The unique properties of CHFs make them suitable for a wide array of applications:

Data integrity checks: Hash functions can be used to create a for a set of data. Any modifications to the data, intentional or not, will result in a different checksum, alerting systems or users to the change. The checksum is also typically much more compact than the original data, which makes checksum comparisons very fast.

Fig 1: Secure hashing for data integrity checks

Figure 1. Secure hashing for data integrity

Digital signatures: Cryptographic hashes are essential to the functioning of digital signatures as they involve comparing cryptographically hashed messages to establish authenticity and integrity while preserving privacy.

Fig 2: Digital signatures

Figure 2. Digital signatures

Blockchain and cryptocurrencies: Cryptocurrencies like Bitcoin rely heavily on CHFs, particularly in creating transaction integrity and enabling consensus mechanisms like proof of work.

Security of cryptographic hashing

The security of a CHF is typically assessed based on resistance to two types of attacks: and .

Pre-image resistance

Pre-image resistance means that given a digest, it should be infeasible to find the input.

This is related to the one-way property of CHFs.

A good CHF is designed in such a way that a party wishing to conduct a pre-image attack has no better option than a brute force approach, which has time complexity $2^n$ .

mathematical details

Given a CHF $H$ and digest $g$ , it should be computationally infeasible to find any input $m$ from the pre-image of $g$ whereby $H(m) = g$ .

Collision resistance

Collision resistance means that it is difficult to find two different inputs that hash to the same digest.

A cryptographic hash collision occurs when two inputs hash to the same digest. While collisions inevitably exist given the many-to-one nature of CHFs, a good CHF nevertheless makes it infeasible to locate one at will.

Collision resistance is crucial for applications like digital signatures and certificates, where it could be disastrous if a malicious party were able to create a forgery that hashes to the same value.

mathematical details of hash collisions

$m_1, m_2$ can be found such that $H(m_1) = H(m_2)$ .

Hash length

Collision resistance is a harder requirement than pre-image resistance and necessitates output lengths twice as long as that needed for pre-image resistance. This is because a brute force attack known as the , which can be used to identify hash collisions, has time complexity $2^{n/2}$ .

In the absence of cryptanalytic weaknesses, the security of a hash function is therefore primarily influenced by its hash length. The longer the hash, the more secure it is, as it becomes harder to mount brute force attacks.

Commonly used cryptographic hash functions

The following table lists some commonly used cryptographic hash functions, along with their hash lengths and primary application domains:

Hash Function	Output Length (bits)	Common Applications
MD5	128	File integrity checking, older systems, non-crypto uses
SHA-1	160	Legacy systems, Git for version control
SHA-256	256	Cryptocurrency (Bitcoin), digital signatures, certificates
SHA-3	Variable (up to 512)	Various cryptographic applications, successor to SHA-2
Blake2	Variable (up to 512)	Cryptography, replacing MD5/SHA-1 in some systems
Blake3	Variable (up to 256)	Cryptography, file hashing, data integrity

and , while still seen in less sensitive applications places, are considered deprecated in terms of collision resistance and are not recommended for new systems. SHA-256, part of the family, is widely used and currently secure for most applications.
is a newer standard that was selected by NIST in 2015 as the winner of the NIST hash function competition. It's designed to be a drop-in replacement for SHA-2, but it also has some different internal characteristics and is resistant to certain types of attacks that might be effective against SHA-2.
are cryptographic hash functions that are faster than MD5, SHA-1, SHA-2, and SHA-3, but at least as secure as the latest standard, SHA-3. They are increasingly being used in new systems, particularly where speed is important.

Quantum risks to traditional cryptographic hashing

The primary quantum threat to cryptographic hashing is posed by brute force attacks.

Given a particular digest, an attacker will try out random inputs until they locate one which produces the same digest.

With $n$ bits in the input, there are $2^n$ possible values. Therefore, the attacker needs to try out around $2^{n-1}$ inputs to have a better than 50% chance of success.

Grover's algorithm

For such an unstructured search context, Grover's algorithm can provide a quadratic speedup using a technique known as , reducing the time complexity of a pre-image attack to $2^{n/2}$ .

In practical terms, this means that a 256-bit CHF, which is currently considered secure against pre-image attacks by classical computers, would provide the same level of security as a 128-bit CHF when faced with a quantum attacker utilizing Grover's algorithm.

Grover's algorithm by itself is not known to provide additional quantum speedups with respect to collision attacks beyond the limit set by the , which can be carried out on classical computers. Since the classical birthday attack already requires CHFs to employ hash lengths that are twice as long as needed for pre-image resistance, the fact that Grover search effectively halves the hash length with respect to pre-image attacks does not pose a practical threat.

For example, in the case of SHA-256, performing on the order of $2^{128}$ operations to execute a Grover-assisted pre-image attack would still be infeasible in the foreseeable future.

BHT algorithm

A quantum algorithm that combines aspects of the birthday attack with Grover search was proposed in 1997 by (BHT) and affords a theoretical scaling of $O(2^{n/3})$ for finding hash collisions. However, this improved scaling is predicated on the existence of quantum random access memory technology, which currently does not exist.

Without QRAM, the realizable scaling is $\tilde{O}(2^{2n/5})$ and for hash lengths currently in use, this marginal improvement in collision-finding capability relative to the birthday attack does not represent a critical threat.

Summary

Cryptographic hash functions are an essential component for ensuring data integrity and privacy in digital information systems and find widespread application in many contexts.

The security requirements of CHFs are mainly understood in terms of their resistance to pre-image and collision attacks. For well-designed CHFs, the hash length is a good proxy for the security level.

While the advent of quantum computers executing the Grover and BHT algorithms in theory affects the pre-image and collision resistance of traditional CHFs, the long hash lengths already in use today means that modern cryptographic hashing algorithms, such as SHA-3, are likely to remain secure barring the discovery of as-yet-unknown cryptanalytic attacks.

The relevance of CHFs lies in their role as a fundamental building block for quantum-resistant cryptographic schemes, ensuring that digital information remains secure even in the face of potential future advancements in quantum computing algorithms and technologies.

Was this page helpful?

Yes