Storing passwords database

Storing passwords securely is a recurring concern.
But what are the main methods, how do they work, and what are they worth against current password cracking techniques?
In this article we explain the main principles of secure storage (hash, salt, pepper, iteration) and highlight their importance for resisting password recovery methods. Finally, we will talk about a reliable hash function for secure storage.

Plain Text Passwords

Let’s say an attacker manages to retrieve a database from a Web application and extracts from it the couple <Login, Password>.

LoginPassword
adminazerty
totomatrix
billyyep59f$4txwrr
tatamatrix
titifreepass
attackertesPwndPassword

In this case, the attacker directly owns the passwords of all users in plain text. Even Billy who has a strong password is not protected.

Storing passwords in plain text is NOT a secure solution. No one, including website / database administrators, should have access to the plain text password of the user.

Encrypted passwords

In some cases, passwords are stored in a database after being encrypted by a reversible algorithm (rot13, mask encryption…). As the algorithm is reversible, it does not comply with the rules of the CNIL (French National Commission on Informatics and Liberty). 

Indeed, it recommends that any password be transformed by a non-reversible cryptographic function. (source in French)

Since the attacker knows his password in plain text/encrypted form, he can guess the logic of the encryption and try to reverse it. If he succeeds, all passwords will be retrieved as quickly as they were in plain text, regardless of the algorithm’s complexity.

Obsolete Hash Function 

In many cases, passwords are stored with outdated irreversible cryptographic functions (md5, sha1…). For example, the LinkedIn site used to store part of its passwords with sha1, and after the hash leaks in 2012, it took only three days to recover 90% of the passwords. (source in French)

Let’s take the following database (the passwords are the same as earlier)

LoginPassword (md5)
adminab4f63f9ac65152575886860dde480a1
toto21b72c0b7adc5c7b4a50ffcb90d92dd6
billy47ad898a379c3dad10b4812eba843601
tata21b72c0b7adc5c7b4a50ffcb90d92dd6
titi5b9a8069d33fe9812dc8310ebff0a315

To notice:

  • In our case, all passwords (except Billy’s) are very frequently used passwords and are among the most used passwords (for example in the 10-million-password-list-top-1000.txt)
  •  It is also interesting to note that since hashs do not have a notion of randomness, toto and tata share the same hash, as they have the same password.

A simple search of the admin’s hash on the internet allows to directly retrieve their passwords.

Searching md5 hash in google

Except for the user Billy who has a complex password, it is possible to retrieve all hashes directly by queries in a search engine.

If the hashes are not found directly in a search engine, the attacker has other methods:

Brute force

Brute force is the action of trying out all the possibilities iteratively following a generation rule. It’s like when we try to open a padlock by listing all the possibilities from 0000 to 9999 until the lock opens. 

Dictionary

A dictionary attack is an attack where you try all the terms in a word list. Several types of dictionaries can be imagined:

  • A language dictionary
  • A ranking of the most used passwords 
  • A list adapted to a given context

If we take the image of the padlock, we can imagine a contextualised list like this:  

We know that the padlock belongs to “Tutu”, that he loves the number 42, and that he was born on November 26, 2001.  So we can assume that the padlock can possibly contain the number 42, 26, 11 and 01 and thus generate a list according to these criteria. 

Note: By abuse of language, it is common to call a dictionary attack a brute force attack. 

Rainbow table

Rainbow tables are a subject that deserves an article on its own. Quickly, it’s a data structure that allows retrieving passwords with a good storage/time compromise. This structure has a list of pre-calculated hashes and makes it possible to retrieve a hash in an acceptable time. 
Many rainbow tables are available online.

Benchmark to Retrieve Md5 Passwords

A benchmark was performed on the database with the rockyou list which contains 14,341,564 unique passwords. The benchmark was performed on a virtual machine, which is not optimal for breaking passwords.

Rockyou list

Looking at the results, we see that except for Billy’s password, which isn’t in the rockyou list, all three passwords have been found and only 11 seconds were necessary for the software to calculate all hashes present in rockyou.

Inappropriate hash function

After seeing the previous bad examples, it is tempting to use secure irreversible functions like sha256, sha512, or sha3. However, the purpose of these functions is to be used to compute a cryptographic summary to check the integrity of a file, to make an electronic signature, or to optimise search and indexing. They are not suitable for storing passwords, because they are fast to calculate, as the following benchmark proves:

LoginPassword (sha512)
admindf6b9fb15cfdbb7527be5a8a6e39f39e572c8ddb943fbc79a943438e9d3d85ebfc2ccf9e0eccd9346026c0b6876e0e01556fe56f135582c05fbdbb505d46755a
toto11a25e88658143a853d280bf77f81ff391347aaba2db54a3aab0149b265276de419880762a473fc496388bcf70566d7cfd0346c34add40652f8f7b669caf9ec0
billyfe9cb9b07725fd1cc3906232405119fedf9a020436630d3c1e0f632f73909e6ed9e731c149ac22743bbe9541881f35ceebf1d2782d469eb3b42968469d55a7a4
tata11a25e88658143a853d280bf77f81ff391347aaba2db54a3aab0149b265276de419880762a473fc496388bcf70566d7cfd0346c34add40652f8f7b669caf9ec0
titif767036acd951f5ddaf4eed5291c677db060055806dbcae69ca35d95847559dc8abce5011fd2b50833e760eca2d84d6daf1f078200f42b4fc10b58bad3761c88
Hashcat

Here again, except for the user Billy, all passwords have been retrieved and only 16 seconds was necessary for hashcat to finish its operations.

Improving SHA512

Even if it has been said earlier that the SHA512 function was not optimised for password storage, it may be interesting to show how to optimise it to understand the interest of appropriate hash functions for passwords.

Use of a Salt

The salt is a pollutant to the raw data (here the password) allowing producing two different hashes from the same data.  The salt is unique for each user and is composed of a random sequence.  It increases the chance that a password is unique and therefore the chance that a hash has never been used. 
For example, with salt, toto and tata will not have the same hash in the database.

The advantages of salt are multiple:

  1. It is almost impossible to find hash directly on the internet if it is salted. However, the salt must be long enough and random.
  2. Rainbow tables do not work with salted hash.
  3. As said before, two users with the same password will not have the same hash if salt is used. Password cracking software (hashcat, Johntheripper…), after breaking a hash, look to see if it is not present for another user. Therefore, without salt, after discovering the password of toto, the password of tata is directly discovered. However, with a salt, the software must start again from scratch for each user. 

Benchmark with salt

UserSaltHash sha512 with salt
adminBGdd6d6^ZgvkMhKf@W3RqT7509d123bce1aa92331861cf8fd738a58205045123f0e25f0862477cb19d3ee0757cd99865c30b123ad1e7f1be1e31a6058090458cb9941031f5c36683c8446e
totoHZBD^@gL*wvoExo6yJ7hVB6b28830776de6ad7ef1dd8c221e0d53fec4532c623075d0216d937ab82ab284a56a461ce5d4ec77d1783665a262a6a1eb98627b1f6260da55dbb782d7cb75bc4
billywvVndjwcZJy!dwT4fBD@U^2847b2605f6a1cd88399e6c9784c0e583799be9485cb128fe5f541f43636559067ec32de33e9b3fa2c15b15eec294cf262fd7aab2395dd64d6dbd9640b4fe6fd
tataQeNWm9NXqJ8m@m2^F7Kh9*165bc06b69fa2bfcd893bfde86358394406c87c7f7abba891cd10ed9fac887c54d52ed14310ad675078033e9bca80084d345fb2836933e55c60f734982430e2b
titiiQUemgw9M6Gw*&v6RG%MZ#f8eded6c815c7522ab6197aa319d3ff4cddc2c7eeffa0f91c1291603f807a47f320324d2ce2fed1fb3cbfe19524fc5d9c105093f755d76a949efb212fb85c942

With this configuration, it took 33 seconds for the hashcat software to recover the passwords. None of the hashes in the database are indexed by a search engine.

Use of Pepper

Pepper is also a pollutant, but common to all users. It is not stored in a database, but in the sources of an application, in a configuration file or in an environment variable. An attacker having “just” understood a database must guess the pepper or retrieve it in another way to be able to effectively break hashes.

Increasing the Number of Iterations

Another way to increase security is to repeat the number of iterations of hash. Increasing the number of iterations means that we’re going to hash the password several times. For example, with sha512 we have the following loop:

As long as iteration is greater than 0
hash = sha512(hash)
Decrement iteration

For a user who logs in, the calculation of the hash will be longer (it still takes a millisecond). But where a user loses a few milliseconds to log in, an attacker will lose much more time, because the attacker will lose several milliseconds per attempt, and since the attacker makes millions of attempts, this will result in additional hours/days to retrieve passwords.

Merging the 3 Methods

We can merge the three methods (salt, pepper and number of iterations) to have one method to store passwords more securely than a simple hash.

Function calculation_hash(password,salt,pepper,iteration)
Inputs
password is the user's password in plain text
salt is the unique salt per user and is randomly generated
pepper is the common pepper for all users and is randomly generated.
iteration is the number of iterations

Output:
The password hash
Hash = sha512(salt+password+pepper)
As long as iteration is greater than 0
hash = sha512(hash)
Decrement iteration

return hash

Then, to check the passwords when logging in, just call the same function with the password entered by the user and compare it with the hash in the database. If both are identical, then the login is successful.

Using Specific Functions

Previously, we managed to create an algorithm generating hashes that are more resistant to password cracking software. However, functions already exist and have proven their effectiveness over time. It is therefore useless to reinvent the wheel and risk causing errors. Among these different functions, we can find: Argon2, scrypt, PBKDF2, bcrypt…

These functions have many strengths:

  • Slower to calculate
  • More RAM-intensive (which is the weak point of GPUs)
  • Defining the number of iterations of the cryptographic function used. As seen previously, the more iterations, the more expensive the calculation will be.

Bcrypt

bcrypt is a hash function created by Niels Provos and David Mazières. It is based on the Blowfish encryption algorithm and was presented at USENIX in 1999. 

Among these positive points in addition to those mentioned above we find implementations in many languages. Moreover, since this algorithm dates back to 1999, it has shown its robustness over time, where some algorithms like Argon2(i) only exist since 2015.

The hash computed by bcrypt has a predefined form:

$2y$11$SXAXZyioy60hbnymeoJ9.ulscXwUFMhbvLaTxAt729tGusw.5AG4C

  1. Algorithm: This one can take several versions depending on the version of bcrypt ($2$, $2a$, $2x$, $2y$ and $2b$)
  2. The cost: The number of iterations in power of 2. For example here, the iteration is 11, the algorithm will do 211 iterations (2048 iterations).
  3. Salt: Instead of storing the salt in a dedicated column, it is directly stored in the final hash.
  4. The hashed password

Since bcrypt stores the number of iterations, this makes it an adaptive function, because the number of iterations can be increased and therefore it is longer and longer. This allows it, despite its age and the evolution of computing power, to be still robust against brute force attacks. The following benchmark shows that it takes 23 days for hashcat to compute the totality of rockyou hashes.

Bcrypt

It is interesting to note that the passwords azerty and matrix, being very weak passwords and present at the top of the list, were found during the short time (2 hours) that the software worked in the example.

Conclusion

We have seen in this article the usefulness of a robust hash function and the advantage of using an already existing function. Moreover, the problem of password storage has legal issues in addition to security issues.

Finally, it is interesting to note that in all cases the passwords azerty and matrix were found quickly, while the password yep59f$4txwrr was never found. Indeed, as this one is not present on any list, the only way to find it is to perform an exhaustive brute force on 13 characters, which is a very time-consuming operation (due to the large number of password possibilities). This also shows how important it is for a web application to force a strong password policy.