Shorten, or integrity in practice

Piotr Nazimek
Calendar icon
4 października 2016

Integrity service ( integrity) is one of the four basic services of information protection. It performs the function of ensuring that the processed information has not been altered in any way. Such change can be accidental (error during transmission) as well as intentional (change by an attacker). Cryptographic hash functions are used to ensure data integrity.

Cryptographic hash functions

The hash function takes a message as input, and the result of its operation is a fixed-length digest that is independent of the length of the message. This means that there are bound to be messages that will result in the same hash function value. Such a situation is called collision, and in cryptographic applications it is highly undesirable.

funkcja-skrotu.webp

Not every function can be a cryptographic hash function. To become one, it must meet the following conditions:

  • it is a hard problem to find for a given value of the hash function such a message that produced it ( preimage resistance),
  • is a hard problem to find N messages for a given M for which the values of the hash function will be the same ( second preimage resistance),
  • is a hard problem to find any two messages N and M for which the values of the hash function will be equal ( collision resistance).

Thus, a hash function used in data structures is not a hash function. Nor is it a checksum such as cyclic redundancy check(CRC). These are easily reversible functions. This means that for a given hash function value, it is easy to find any message giving that value. It is also an easy problem to find two messages that give the same hash, that is, to find a collision. In cryptography, such functions have no use. For cryptographic hash functions there is a solution to the above problems, but it is difficult to find. Difficult means useless in practice and impossible with the currently available memory and time resources. It is unlikely to be profitable for an attacker to find a collision decades later.

Algorithms

The best-known hash functions form two families of algorithms: MD ( Message Digest) and SHA ( Secure Hash Algorithm). Representatives of the first are, for example, the MD4 and MD5 algorithms. They should no longer be used because finding collisions for these functions is now an easy problem. The second family includes the algorithms developed by the National Institute of Standards and Technology (NIST) presented in FIPS 180-4. The current recommendation is to use a hash function of at least SHA-256, which produces a 256-bit hash value. The SHA-1 function should be retired from use in existing systems and not used in new ones.

In Java, we can calculate hash functions using the MessageDigest factory. The name of the hash function is passed as a string. It can be found in the Standard Algorithm Name Documentation.

The effect of the following program will be to calculate the value of the SHA-256 hash function for a sample message.

1import java.security.MessageDigest;
2
3public class DigestTest {
4	public static void main(String[] args) throws Exception {
5		byte[] data = "abcdefghijklmnoprstuwxyz1234567890!".getBytes();
6
7		MessageDigest md = MessageDigest.getInstance("SHA-256");
8		byte sha256[] = md.digest(data);
9	}
10}

The preparation of a new algorithm by NIST takes many years and involves detailed analysis. In the past, algorithms were developed by specialists employed by NIST. Nowadays, competitions are announced, among other reasons, to avoid suspicion of proposing such algorithms, which perhaps NIST already knows how to crack, unlike the rest of the world. Such a competition for the title of a new hash function was announced in 2007, and a standard in the form of the FIPS 202 document was adopted in August 2015. The winner was the Keccak algorithm, which joined the SHA family becoming the SHA-3 function. Currently in Java we can calculate it, using an external cryptographic library such as Bouncy Castle. This is demonstrated in the following example.

1import java.security.Security;
2import java.security.MessageDigest;
3
4import org.bouncycastle.jce.provider.BouncyCastleProvider;
5
6public class DigestTest2 {
7    public static void main(String[] args) throws Exception {
8     		byte[] data = "abcdefghijklmnoprstuwxyz1234567890!".getBytes();
9
10        Security.addProvider(new BouncyCastleProvider());
11        MessageDigest md = MessageDigest.getInstance("SHA3-256");
12        byte sha3[] = md.digest(data);
13    }
14}

Practical applications

Looking at the properties of hash functions, one can point out their other interesting uses beyond the integrity service itself. Properly used, they are perfect for securely storing user passwords in the database. Instead of storing the password in an explicit form, we store its hash function value. In this way, an attacker who steals such a database will not learn the users' passwords. On one condition: if he prepares in advance a large dictionary containing thousands of passwords and their corresponding hashes, he can carry out a so-called dictionary attack, hoping to find a hash from the stolen database in his dictionary, and this will allow him to learn the password. This is not a surefire attack, but it is often very effective, as users often use weak passwords. This attack can be avoided by using a salt, or salt, which is a random string pasted to the user's password before calculating the hash. The salt is stored along with the password digest in plaintext. It renders prepared dictionaries useless. Why does an attacker need to know passwords? After all, he has already hacked into the service under attack. However, he counts on users applying the same password to other sites, which is a common dangerous practice. By hacking into one site, he will thus gain easy access to others.

Using the hash function, it is also possible to play a safe coin toss over the network. Such a protocol is called the bit commitment protocol. The coin tosser must treat its toss result with a hash function and send its value to the guesser. With this operation, he will no longer be able to change it, i.e. cheat the guesser, but the guesser has no way of knowing what was thrown. Only after an unsuccessful guessing attempt does the thrower reveal what message produced the transmitted digest. This protocol should also be properly implemented. Otherwise, the guesser will easily apply a dictionary attack.

Summary

Without ensuring integrity, there is no point in building any information system. However, it alone will not protect us from an attack based on receiving integral information from an unknown source. After all, an attacker can intercept the data, change it, recalculate the value of the hash function and send it to us if we transmit this data through the channels to which he gained access. The message will be consistent, but it will not be the message sent by the sender from whom we expected to receive it. However, this additional service --- checking that the message came from the original source --- is not the purpose of integrity. In the next post, this problem will be solved by adding authentication implemented with a message authentication code (MAC).

Read also

Calendar icon

22 sierpień

A new era of knowledge management: Omega-PSIR at Kozminski University

Kozminski University in Warsaw, one of the leading universities in Poland, has been using the Omega-PSIR system we have implemented t...

Calendar icon

12 sierpień

What is Event-Driven Achitecture and why do you need it?

Event-Driven Architecture (EDA) is a modern approach to IT system design. Learn how EDA can impact your organization's growth!

Calendar icon

31 lipiec

How to use Rust with Python?

Learn how to integrate Rust and Python using PyO3 and Maturin. Learn how to write native Python modules in Rust and how to build and ...