What is a Hash, Actually?
A hash, or hash value, is a fixed-length value (16 bits, 32 bits, 64 bits, and so on; modern cryptographic hashes are usually 256 bits or longer) used to verify data or data transmissions. To create a hash value, you need a hash function. The function converts text, music, or entire programs into a hash value. Regardless of the input's size, the hash is always the same length! Hashes are typically written in hexadecimal notation, using the digits 0 through 9 and the letters A through F. Strictly speaking, hashing is not encryption: there is no key, and a hash cannot be decrypted back into the original data. Over time, many different hash functions have been developed, creating a cat-and-mouse game. Black hats (hackers with criminal goals) constantly try to break the functions, for example to recover the passwords behind stored hashes, while developers and the NSA keep designing new, more robust functions to keep data secure. One formerly common hash function was the MD5 algorithm, but more on that later.
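To make the fixed-length property concrete, here is a minimal sketch using Python's standard hashlib module. The choice of SHA-256 and the helper name sha256_hex are assumptions made for illustration; any other hash function would show the same behavior with a different output length.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of `data` in hexadecimal notation."""
    return hashlib.sha256(data).hexdigest()


# Two inputs of very different size (both made up for this example).
short_input = b"Hi"
long_input = b"x" * 1_000_000  # one million bytes

short_hash = sha256_hex(short_input)
long_hash = sha256_hex(long_input)

# Both hashes are 64 hexadecimal characters long, i.e. 256 bits.
print(len(short_hash), short_hash)
print(len(long_hash), long_hash)
```

Each hexadecimal character encodes 4 bits, so a 256-bit hash is always printed as 64 characters.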
What Does a Hash Look Like?
Here's an example hash, generated with an algorithm from the SHA-2 family:
Compare the two hashes. Even though only a "!" was added, the complete hash value looks different. Nearly every character has changed (the so-called avalanche effect), and the result looks like complete nonsense. However, the special thing is that these results always stay the same, no matter how many times I hash "Hello World." It outputs the same string every time. So, identical data always leads to the same hash.
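The comparison can be reproduced with a short sketch. It assumes SHA-256 as the SHA-2 variant (the text does not say which one was used), and the helper names are made up for this example.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hash a text string with SHA-256 and return the hex representation."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count the bit positions in which two equally long hex hashes differ."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


first = sha256_hex("Hello World")
second = sha256_hex("Hello World!")
again = sha256_hex("Hello World")

print(first)
print(second)
print("same input, same hash:", first == again)
print("bits that differ:", differing_bits(first, second), "of 256")
```

With a well-designed hash function, roughly half of the output bits are expected to flip when the input changes even slightly.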
The Secure Hash Algorithm (SHA)
The Secure Hash Algorithm was first published in 1993. The National Security Agency (NSA) and the National Institute of Standards and Technology (NIST) were involved in its development. The first version, now called SHA-0, had to be revised quickly after a weakness was discovered. So, in 1995, SHA-1 was published. This algorithm uses a compression function and generates 160-bit hash values. The SHA family has been extended over time to keep pace with the increasing performance of common computers, and SHA-2 followed in 2001. In 2005, researchers showed for the first time that SHA-1 collisions can be found far faster than by brute force, and in 2017 the first practical collision (two different files with the same SHA-1 hash) was demonstrated publicly. SHA-3, which is based on an entirely different design, was standardized in 2015. SHA-2 and SHA-3 are still considered secure hashing standards today.
Properties of a Hash
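The generated hash values are collision-resistant. Because the output has a fixed length, two different inputs can in theory produce the same hash (a collision), but with a good hash function it is practically impossible to find such a pair. This is why hash values are so suitable for verifying data.
A hash function is a one-way function! The original data cannot be reconstructed from the hash: if a hash is generated from an entire Word document, you cannot recreate the document from the hash.
A good hash algorithm for passwords should not be too fast, because a fast function lets an attacker try enormous numbers of guesses in a short time. Password hashing functions such as bcrypt, scrypt, or Argon2 are therefore deliberately slow. For large data sets, a processing time of 1-2 seconds is good.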
How Does Hashing Work?
You've probably wondered how it's possible to generate such long strings of characters. At first glance, these lengthy hash values may seem very complicated, but once you understand the core principle, everything becomes much simpler. Each hash function applies a fixed calculation that transforms the input into a hash value. Let's explain this with a simple example:
Number sequence: 135879 = 1+3+5+8+7+9 = 33
Letter sequence: Hello World = 85121215 231518124 = 25 27
This is a simple transformation based on forming a checksum (here, a digit sum)! In the case of "Hello World," each letter is assigned a number corresponding to its position in the alphabet. H becomes 8, E becomes 5, and so on. This results in the two digit strings above. You then add up the digits of each string, and ultimately, "Hello World" becomes "25 27." In essence, complex hash functions follow the same idea, only with far more elaborate arithmetic!
However, they can convert entire documents and entire films into a hash value. They break down the content of the files into small blocks of bits. These blocks are processed one after another until all bits of the file have been processed. Let's assume one block consists of 512 bits, and the document is 5000 bits in size. The function processes nine full blocks (4608 bits), which leaves 392 bits for a final, incomplete block. This isn't a problem because the algorithm fills the remaining space with padding. As a result, the hash value always has the same length. During this block processing, additional mixing steps are applied, but these are too complex to explain here. If you're interested, we recommend checking out the Computerphile YouTube channel.
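The toy checksum and the block arithmetic described above can be sketched in a few lines. All function names are invented for this illustration, and the padding calculation is deliberately simplified: real algorithms such as SHA-256 also append the message length, which this sketch ignores.

```python
def digit_sum(digits: str) -> int:
    """Add up the individual digits of a digit string."""
    return sum(int(d) for d in digits)


def letters_to_positions(word: str) -> str:
    """Replace each letter by its position in the alphabet (A=1 ... Z=26)
    and join the numbers into one digit string. Other characters are skipped."""
    return "".join(
        str(ord(c) - ord("a") + 1) for c in word.lower() if "a" <= c <= "z"
    )


def toy_checksum(text: str) -> str:
    """Toy 'hash': one digit sum per word. Not secure in any way."""
    return " ".join(str(digit_sum(letters_to_positions(w))) for w in text.split())


def block_layout(total_bits: int, block_bits: int = 512):
    """Return (full blocks, leftover bits, fill bits needed to complete
    the last block). Simplified: ignores the length field real hashes add."""
    full_blocks, leftover = divmod(total_bits, block_bits)
    fill = (block_bits - leftover) % block_bits
    return full_blocks, leftover, fill


print(digit_sum("135879"))          # 1+3+5+8+7+9
print(letters_to_positions("Hello"))
print(toy_checksum("Hello World"))
print(block_layout(5000))           # the 5000-bit document, 512-bit blocks
```

The toy checksum also shows why real hash functions need more than a digit sum: many different inputs produce the same small number, so collisions are trivial to find.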
Three Practical Examples
Identification of Duplicates: In bulk processing of assets, you can identify duplicates using a hash. To do this, generate hash values for all your assets and check afterward if any duplicate hashes appear. Duplicate hashes indicate duplicate files!
Password Verification: Servers store the hash value of a password and compare hash values when the password is entered again. If these values match, the passwords match, granting you access.
Data Transfer Verification: Hash values can be used to verify the transmission of internal documents. Sender and recipient each generate a hash value of the document and compare the two. This way, you can check afterward whether the document was altered during transmission.
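Two of these use cases, duplicate detection and transfer verification, can be sketched as follows. The asset names and contents are hypothetical, SHA-256 is an assumed choice, and a real tool would read files from disk rather than from an in-memory dictionary.

```python
import hashlib
import hmac
from collections import defaultdict


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hash of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(assets: dict) -> list:
    """Group asset names by the hash of their content and return
    only the groups that contain more than one name."""
    names_by_hash = defaultdict(list)
    for name, content in assets.items():
        names_by_hash[sha256_hex(content)].append(name)
    return [names for names in names_by_hash.values() if len(names) > 1]


def arrived_intact(received: bytes, expected_hash: str) -> bool:
    """Compare the hash of the received data with the hash the sender reported."""
    return hmac.compare_digest(sha256_hex(received), expected_hash)


# Hypothetical assets: two of them have identical content.
assets = {
    "logo.png": b"\x89PNG...logo-bytes",
    "logo_copy.png": b"\x89PNG...logo-bytes",
    "banner.png": b"\x89PNG...banner-bytes",
}
print(find_duplicates(assets))

# Hypothetical transfer: the sender publishes the hash, the recipient checks it.
document = b"Internal report, version 1"
sender_hash = sha256_hex(document)
print(arrived_intact(document, sender_hash))
print(arrived_intact(b"Internal report, version 2", sender_hash))
```

Note that this check only detects changes if the hash itself reaches the recipient through a channel the attacker cannot modify.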
What Is Salting?
The term "salting" comes from cooking: just as a meal is salted to add flavor, passwords are "salted" to make them more secure. A random string of characters (the salt) is added to the existing password.
For example, the password "hello123" could become either "D2+sc#aPhello123" or "hello123D2+sc#aP". The salt can be added at either end of the password. Servers assign each user an individual salt value, and the salted password is then converted into a hash value. As a result, two users with the same password end up with different hashes, and precomputed lists of common password hashes become useless.
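Salted password storage could look roughly like the sketch below. Instead of concatenating salt and password by hand, it uses PBKDF2 from Python's standard library, which combines both and repeats the hash many times to slow down guessing. That substitution, the iteration count, and the function names are assumptions made for this example, not something the text above prescribes.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; higher means slower guessing


def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest). A fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)  # 16 random bytes, individual per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    """Re-hash the entered password with the stored salt and compare."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored_digest)


# Two users choose the same password but receive different salts.
salt_a, digest_a = hash_password("hello123")
salt_b, digest_b = hash_password("hello123")

print(digest_a.hex())
print(digest_b.hex())
print(verify_password("hello123", salt_a, digest_a))
print(verify_password("hello124", salt_a, digest_a))
```

The server stores the salt next to the digest. The salt does not need to be secret; its job is to make every stored hash unique.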
The Risk of (Old) Hash Functions!
Poor hash functions can be manipulated and, in a sense, even reversed. While you cannot recover entire text documents, the hashes of simple passwords like "Hello123" can be matched to their original text in seconds. You can try this for yourself.
Hashed with MD5 (Message-Digest Algorithm 5).
Search for "Decrypt MD5 hash" in Google, and you will find services that return the original text for this random-looking string. They do not actually decrypt anything; they look the hash up in huge tables of precomputed hashes. The MD5 algorithm was a standard hashing method on the internet for years, but it has now been superseded by more secure algorithms. This is a good thing! As you can see, this algorithm is no longer secure; it has been broken. Newer hash functions like SHA-2 and SHA-3 are considered secure and are used in practice.
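How such a lookup service might work can be sketched with a tiny table of precomputed hashes. The candidate list is made up for this example; real tables contain billions of entries.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 hash of a text string as hex. MD5 is used here only
    to illustrate the attack, not as a recommendation."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_lookup(candidates) -> dict:
    """Precompute hash -> original text for a list of likely passwords."""
    return {md5_hex(candidate): candidate for candidate in candidates}


# Hypothetical list of common passwords.
common_passwords = ["password", "123456", "qwerty", "Hello123", "letmein"]
lookup = build_lookup(common_passwords)

# Stands in for a hash taken from a leaked, unsalted password database.
leaked_hash = md5_hex("Hello123")

print(lookup.get(leaked_hash))                              # prints Hello123
print(lookup.get(md5_hex("correct horse battery staple")))  # prints None
```

Nothing is decrypted here: the lookup only succeeds because the password was guessable and the hash was unsalted and fast to compute.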
What Could Possibly Go Wrong?!
A greater risk with bad hashes is the deliberate forging of a hash. Once the weaknesses of a function are understood, an attacker can construct different files that share the same hash value, and that can be very dangerous.
Let's say you're sending an important document like a birth certificate to a government office for a record. The birth certificate has a specific hash value. With older hash algorithms (like the MD5 algorithm), it is feasible to prepare a second, altered version of the file (a changed name, an incorrect birth year, etc.) that produces exactly the same hash value. As a result, the recipient would believe the fake document is genuine, and from that point on, you might be living under a false identity or be 10 years younger. Scary or rather tempting?