Storing passwords securely is a recurring concern.
But what are the main methods, how do they work, and what are they worth against current password cracking techniques?
In this article we explain the main principles of secure storage (hash, salt, pepper, iteration) and highlight their importance for resisting password recovery methods. Finally, we will talk about a reliable hash function for secure storage.
Plain Text Passwords
Let’s say an attacker manages to retrieve a database from a Web application and extracts from it the couple <Login, Password>.
In this case, the attacker directly owns the passwords of all users in plain text. Even Billy who has a strong password is not protected.
Storing passwords in plain text is NOT a secure solution. No one, including website / database administrators, should have access to the plain text password of the user.
In some cases, passwords are stored in a database after being encrypted by a reversible algorithm (rot13, mask encryption…). As the algorithm is reversible, it does not comply with the rules of the CNIL (French National Commission on Informatics and Liberty).
Indeed, it recommends that any password be transformed by a non-reversible cryptographic function. (source in French)
Since the attacker knows his password in plain text/encrypted form, he can guess the logic of the encryption and try to reverse it. If he succeeds, all passwords will be retrieved as quickly as they were in plain text, regardless of the algorithm’s complexity.
Obsolete Hash Function
In many cases, passwords are stored with outdated irreversible cryptographic functions (md5, sha1…). For example, the LinkedIn site used to store part of its passwords with sha1, and after the hash leaks in 2012, it took only three days to recover 90% of the passwords. (source in French)
Let’s take the following database (the passwords are the same as earlier)
- In our case, all passwords (except Billy’s) are very frequently used passwords and are among the most used passwords (for example in the 10-million-password-list-top-1000.txt)
- It is also interesting to note that since hashs do not have a notion of randomness, toto and tata share the same hash, as they have the same password.
A simple search of the admin’s hash on the internet allows to directly retrieve their passwords.
Except for the user Billy who has a complex password, it is possible to retrieve all hashes directly by queries in a search engine.
If the hashes are not found directly in a search engine, the attacker has other methods:
Brute force is the action of trying out all the possibilities iteratively following a generation rule. It’s like when we try to open a padlock by listing all the possibilities from 0000 to 9999 until the lock opens.
A dictionary attack is an attack where you try all the terms in a word list. Several types of dictionaries can be imagined:
- A language dictionary
- A ranking of the most used passwords
- A list adapted to a given context
If we take the image of the padlock, we can imagine a contextualised list like this:
We know that the padlock belongs to “Tutu”, that he loves the number 42, and that he was born on November 26, 2001. So we can assume that the padlock can possibly contain the number 42, 26, 11 and 01 and thus generate a list according to these criteria.
Note: By abuse of language, it is common to call a dictionary attack a brute force attack.
Rainbow tables are a subject that deserves an article on its own. Quickly, it’s a data structure that allows retrieving passwords with a good storage/time compromise. This structure has a list of pre-calculated hashes and makes it possible to retrieve a hash in an acceptable time.
Many rainbow tables are available online.
Benchmark to Retrieve Md5 Passwords
A benchmark was performed on the database with the rockyou list which contains 14,341,564 unique passwords. The benchmark was performed on a virtual machine, which is not optimal for breaking passwords.
Looking at the results, we see that except for Billy’s password, which isn’t in the rockyou list, all three passwords have been found and only 11 seconds were necessary for the software to calculate all hashes present in rockyou.
Inappropriate hash function
After seeing the previous bad examples, it is tempting to use secure irreversible functions like sha256, sha512, or sha3. However, the purpose of these functions is to be used to compute a cryptographic summary to check the integrity of a file, to make an electronic signature, or to optimise search and indexing. They are not suitable for storing passwords, because they are fast to calculate, as the following benchmark proves:
Here again, except for the user Billy, all passwords have been retrieved and only 16 seconds was necessary for hashcat to finish its operations.
Even if it has been said earlier that the SHA512 function was not optimised for password storage, it may be interesting to show how to optimise it to understand the interest of appropriate hash functions for passwords.
Use of a Salt
The salt is a pollutant to the raw data (here the password) allowing producing two different hashes from the same data. The salt is unique for each user and is composed of a random sequence. It increases the chance that a password is unique and therefore the chance that a hash has never been used.
For example, with salt, toto and tata will not have the same hash in the database.
The advantages of salt are multiple:
- It is almost impossible to find hash directly on the internet if it is salted. However, the salt must be long enough and random.
- Rainbow tables do not work with salted hash.
- As said before, two users with the same password will not have the same hash if salt is used. Password cracking software (hashcat, Johntheripper…), after breaking a hash, look to see if it is not present for another user. Therefore, without salt, after discovering the password of toto, the password of tata is directly discovered. However, with a salt, the software must start again from scratch for each user.
Benchmark with salt
|User||Salt||Hash sha512 with salt|
With this configuration, it took 33 seconds for the hashcat software to recover the passwords. None of the hashes in the database are indexed by a search engine.
Use of Pepper
Pepper is also a pollutant, but common to all users. It is not stored in a database, but in the sources of an application, in a configuration file or in an environment variable. An attacker having “just” understood a database must guess the pepper or retrieve it in another way to be able to effectively break hashes.
Increasing the Number of Iterations
Another way to increase security is to repeat the number of iterations of hash. Increasing the number of iterations means that we’re going to hash the password several times. For example, with sha512 we have the following loop:
As long as iteration is greater than 0
hash = sha512(hash)
For a user who logs in, the calculation of the hash will be longer (it still takes a millisecond). But where a user loses a few milliseconds to log in, an attacker will lose much more time, because the attacker will lose several milliseconds per attempt, and since the attacker makes millions of attempts, this will result in additional hours/days to retrieve passwords.
Merging the 3 Methods
We can merge the three methods (salt, pepper and number of iterations) to have one method to store passwords more securely than a simple hash.
password is the user's password in plain text
salt is the unique salt per user and is randomly generated
pepper is the common pepper for all users and is randomly generated.
iteration is the number of iterations
The password hash
Hash = sha512(salt+password+pepper)
As long as iteration is greater than 0
hash = sha512(hash)
Then, to check the passwords when logging in, just call the same function with the password entered by the user and compare it with the hash in the database. If both are identical, then the login is successful.
Using Specific Functions
Previously, we managed to create an algorithm generating hashes that are more resistant to password cracking software. However, functions already exist and have proven their effectiveness over time. It is therefore useless to reinvent the wheel and risk causing errors. Among these different functions, we can find: Argon2, scrypt, PBKDF2, bcrypt…
These functions have many strengths:
- Slower to calculate
- More RAM-intensive (which is the weak point of GPUs)
- Defining the number of iterations of the cryptographic function used. As seen previously, the more iterations, the more expensive the calculation will be.
bcrypt is a hash function created by Niels Provos and David Mazières. It is based on the Blowfish encryption algorithm and was presented at USENIX in 1999.
Among these positive points in addition to those mentioned above we find implementations in many languages. Moreover, since this algorithm dates back to 1999, it has shown its robustness over time, where some algorithms like Argon2(i) only exist since 2015.
The hash computed by bcrypt has a predefined form:
- Algorithm: This one can take several versions depending on the version of bcrypt ($2$, $2a$, $2x$, $2y$ and $2b$)
- The cost: The number of iterations in power of 2. For example here, the iteration is 11, the algorithm will do 211 iterations (2048 iterations).
- Salt: Instead of storing the salt in a dedicated column, it is directly stored in the final hash.
- The hashed password
Since bcrypt stores the number of iterations, this makes it an adaptive function, because the number of iterations can be increased and therefore it is longer and longer. This allows it, despite its age and the evolution of computing power, to be still robust against brute force attacks. The following benchmark shows that it takes 23 days for hashcat to compute the totality of rockyou hashes.
It is interesting to note that the passwords azerty and matrix, being very weak passwords and present at the top of the list, were found during the short time (2 hours) that the software worked in the example.
We have seen in this article the usefulness of a robust hash function and the advantage of using an already existing function. Moreover, the problem of password storage has legal issues in addition to security issues.
Finally, it is interesting to note that in all cases the passwords azerty and matrix were found quickly, while the password yep59f$4txwrr was never found. Indeed, as this one is not present on any list, the only way to find it is to perform an exhaustive brute force on 13 characters, which is a very time-consuming operation (due to the large number of password possibilities). This also shows how important it is for a web application to force a strong password policy.