
Commit f931fcb

Add a DCE barrier builtin
In #43852 we noticed that the compiler is getting good enough to completely DCE a number of our benchmarks. We need to add some sort of mechanism to prevent the compiler from doing so, and this commit adds just such an intrinsic. The intrinsic itself doesn't do anything, but it is considered effectful by our optimizer, which prevents it from being DCE'd. At the LLVM level, it turns into a volatile store to an alloca (or an `llvm.sideeffect` call if the values passed to `dcebarrier` do not have any actual LLVM-level representation).

The docs for the new intrinsic are as follows:

```
dcebarrier(args...)

This function prevents dead-code elimination (DCE) of itself and any arguments
passed to it, but is otherwise the lightest barrier possible. In particular,
it is not a GC safepoint, does not model an observable heap effect, does not expand
to any code itself and may be re-ordered with respect to other side effects
(though the total number of executions may not change).

A useful model for this function is that it hashes all memory `reachable` from
args and escapes this information through some observable side-channel that does
not otherwise impact program behavior. Of course that's just a model. The
function does nothing and returns `nothing`.

This is intended for use in benchmarks that want to guarantee that `args` are
actually computed. (Otherwise DCE may see that the result of the benchmark is
unused and delete the entire benchmark code.)

**Note**: `dcebarrier` does not affect constant folding. For example, in
`dcebarrier(1+1)`, no add instruction needs to be executed at runtime and
the code is semantically equivalent to `dcebarrier(2)`.

# Examples

function loop()
    for i = 1:1000
        # The compiler must guarantee that there are 1000 program points (in the correct
        # order) at which the value of `i` is in a register, but has otherwise
        # total control over the program.
        dcebarrier(i)
    end
end
```

I believe the volatile store at the LLVM level is actually somewhat stronger than what we want here. Ideally `dcebarrier` would not end up generating any machine code at all and would also be compatible with optimizations like SROA and vectorization. However, I think this is fine for now.
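The volatile-store lowering described in the commit message has a close analog in plain C, where a store through a `volatile` lvalue is observable behavior that the compiler must perform. The sketch below is illustrative only: `dce_barrier_long` and `benchmark_sum` are hypothetical names, not part of Julia's runtime, and the code models the LLVM-level effect of the barrier rather than the Julia builtin itself.

```c
#include <assert.h>

/* Hypothetical helper modeling the lowering: write the value into a stack
 * slot through a volatile lvalue. Volatile accesses are observable behavior
 * in C, so the compiler must emit this store and therefore must compute `v`. */
static void dce_barrier_long(long v)
{
    volatile long slot;
    slot = v;
}

/* Toy benchmark kernel. A timing harness typically ignores the return value;
 * the barrier keeps the computation of `acc` from being deleted in that case. */
static long benchmark_sum(long n)
{
    assert(n >= 0);
    long acc = 0;
    for (long i = 0; i < n; ++i)
        acc += i * i;
    dce_barrier_long(acc);
    return acc;
}
```

As with `dcebarrier`, the barrier only guarantees that the value of `acc` is materialized. The compiler may still obtain it by other means, for example by folding `benchmark_sum(1000)` to the constant `332833500` when the argument is known, which mirrors the docstring's note that `dcebarrier(1+1)` is equivalent to `dcebarrier(2)`.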
1 parent e3b681c commit f931fcb

9 files changed: +88 additions, -2 deletions


base/boot.jl

Lines changed: 2 additions & 0 deletions
@@ -198,6 +198,8 @@ export
     <:, typeof, isa, typeassert,
     # method reflection
     applicable, invoke,
+    # dcebarrier
+    dcebarrier,
     # constants
     nothing, Main


base/compiler/tfuncs.jl

Lines changed: 3 additions & 0 deletions
@@ -527,6 +527,7 @@ add_tfunc(atomic_pointerset, 3, 3, (a, v, order) -> (@nospecialize; a), 5)
 add_tfunc(atomic_pointerswap, 3, 3, (a, v, order) -> (@nospecialize; pointer_eltype(a)), 5)
 add_tfunc(atomic_pointermodify, 4, 4, atomic_pointermodify_tfunc, 5)
 add_tfunc(atomic_pointerreplace, 5, 5, atomic_pointerreplace_tfunc, 5)
+add_tfunc(dcebarrier, 0, INT_INF, (@nospecialize args...)->Nothing, 0)

 # more accurate typeof_tfunc for vararg tuples abstract only in length
 function typeof_concrete_vararg(t::DataType)
@@ -1695,6 +1696,8 @@ function _builtin_nothrow(@nospecialize(f), argtypes::Array{Any,1}, @nospecializ
             return true
         end
         return false
+    elseif f === dcebarrier
+        return true
     end
     return false
 end

base/docs/basedocs.jl

Lines changed: 35 additions & 0 deletions
@@ -2897,4 +2897,39 @@ See also [`"`](@ref \")
 """
 kw"\"\"\""

+"""
+    dcebarrier(args...)
+
+This function prevents dead-code elimination (DCE) of itself and any arguments
+passed to it, but is otherwise the lightest barrier possible. In particular,
+it is not a GC safepoint, does not model an observable heap effect, does not expand
+to any code itself and may be re-ordered with respect to other side effects
+(though the total number of executions may not change).
+
+A useful model for this function is that it hashes all memory `reachable` from
+args and escapes this information through some observable side-channel that does
+not otherwise impact program behavior. Of course that's just a model. The
+function does nothing and returns `nothing`.
+
+This is intended for use in benchmarks that want to guarantee that `args` are
+actually computed. (Otherwise DCE may see that the result of the benchmark is
+unused and delete the entire benchmark code.)
+
+**Note**: `dcebarrier` does not affect constant folding. For example, in
+`dcebarrier(1+1)`, no add instruction needs to be executed at runtime and
+the code is semantically equivalent to `dcebarrier(2)`.
+
+# Examples
+
+function loop()
+    for i = 1:1000
+        # The compiler must guarantee that there are 1000 program points (in the correct
+        # order) at which the value of `i` is in a register, but has otherwise
+        # total control over the program.
+        dcebarrier(i)
+    end
+end
+"""
+dcebarrier
+
 end
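The docstring's mental model (hash all memory reachable from `args` and leak the result through a side channel that nothing reads) can be written down concretely for a flat, pointer-free argument. This is only a model, as the docstring itself stresses; the real builtin hashes nothing and expands to no code of its own. `model_sink`, `fnv1a` and `model_dcebarrier` are hypothetical names, and 64-bit FNV-1a is an arbitrary choice of hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical observable side channel: a volatile global that the program
 * writes but never reads, so it cannot change program behavior. */
static volatile uint64_t model_sink;

/* 64-bit FNV-1a over a byte range (any hash would do for the model). */
static uint64_t fnv1a(const void *p, size_t n)
{
    const unsigned char *b = (const unsigned char *)p;
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    for (size_t i = 0; i < n; ++i) {
        h ^= b[i];
        h *= 1099511628211ULL;            /* FNV prime */
    }
    return h;
}

/* Model of dcebarrier for one flat object: the published hash depends on
 * every byte, so the object must be fully computed before this call.
 * A complete model would also follow pointers to cover reachable memory. */
static void model_dcebarrier(const void *obj, size_t size)
{
    assert(obj != NULL || size == 0);
    model_sink = fnv1a(obj, size);
}
```

The model makes the docstring's guarantees easy to read off: the argument cannot be dead because its bytes feed an observable write, yet no program-visible value depends on that write.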

src/builtin_proto.h

Lines changed: 2 additions & 0 deletions
@@ -53,6 +53,7 @@ DECLARE_BUILTIN(typeassert);
 DECLARE_BUILTIN(_typebody);
 DECLARE_BUILTIN(typeof);
 DECLARE_BUILTIN(_typevar);
+DECLARE_BUILTIN(dcebarrier);

 JL_CALLABLE(jl_f_invoke_kwsorter);
 #ifdef DEFINE_BUILTIN_GLOBALS
@@ -65,6 +66,7 @@ JL_CALLABLE(jl_f__abstracttype);
 JL_CALLABLE(jl_f__primitivetype);
 JL_CALLABLE(jl_f__setsuper);
 JL_CALLABLE(jl_f__equiv_typedef);
+JL_CALLABLE(jl_f_dcebarrier);

 #ifdef __cplusplus
 }

src/builtins.c

Lines changed: 6 additions & 0 deletions
@@ -1472,6 +1472,11 @@ JL_CALLABLE(jl_f__setsuper)
     return jl_nothing;
 }

+JL_CALLABLE(jl_f_dcebarrier)
+{
+    return jl_nothing;
+}
+
 static int equiv_field_types(jl_value_t *old, jl_value_t *ft)
 {
     size_t nf = jl_svec_len(ft);
@@ -1834,6 +1839,7 @@ void jl_init_primitives(void) JL_GC_DISABLED
     add_builtin_func("_setsuper!", jl_f__setsuper);
     jl_builtin__typebody = add_builtin_func("_typebody!", jl_f__typebody);
     add_builtin_func("_equiv_typedef", jl_f__equiv_typedef);
+    jl_builtin_dcebarrier = add_builtin_func("dcebarrier", jl_f_dcebarrier);

     // builtin types
     add_builtin("Any", (jl_value_t*)jl_any_type);

src/cgutils.cpp

Lines changed: 7 additions & 0 deletions
@@ -8,6 +8,13 @@ static Value *track_pjlvalue(jl_codectx_t &ctx, Value *V)
     return ctx.builder.CreateAddrSpaceCast(V, ctx.types().T_prjlvalue);
 }

+static Value *unsafe_untrack_prjlvalue(jl_codectx_t &ctx, Value *V)
+{
+    if (V->getType() == ctx.types().T_prjlvalue)
+        return ctx.builder.CreateAddrSpaceCast(V, ctx.types().T_pjlvalue);
+    return V;
+}
+
 // Take an arbitrary untracked value and make it gc-tracked
 static Value *maybe_decay_untracked(jl_codectx_t &ctx, Value *V)
 {

src/codegen.cpp

Lines changed: 23 additions & 0 deletions
@@ -3464,6 +3464,29 @@ static bool emit_builtin_call(jl_codectx_t &ctx, jl_cgval_t *ret, jl_value_t *f,
         return true;
     }

+    else if (f == jl_builtin_dcebarrier) {
+        *ret = mark_julia_const(ctx, jl_nothing);
+        bool emitted_any_side_effect = false;
+        for (size_t i = 1; i <= nargs; ++i) {
+            const jl_cgval_t &obj = argv[i];
+            if (obj.V) {
+                // TODO is this strong enough to constitute a read of any contained
+                // pointers?
+                Value *V = unsafe_untrack_prjlvalue(ctx, obj.V);
+                Value *slotv = emit_static_alloca(ctx, V->getType());
+                ctx.builder.CreateStore(V, slotv, true);
+                emitted_any_side_effect = true;
+            }
+        }
+        if (!emitted_any_side_effect) {
+            Function *sideeffect_func = Intrinsic::getDeclaration(
+                ctx.f->getParent(),
+                Intrinsic::sideeffect);
+            ctx.builder.CreateCall(sideeffect_func);
+        }
+        return true;
+    }
+
     return false;
 }
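The new branch in `emit_builtin_call` follows a simple rule: every argument that has an LLVM value (`obj.V` non-null) gets a volatile store into a fresh stack slot, and only when no store was emitted does the call fall back to a bare `llvm.sideeffect`. The C sketch below models just that decision so the two cases exercised by the new tests are easy to see. `arg_model`, `lowering_model` and `lower_dcebarrier` are hypothetical names; nothing here calls LLVM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for jl_cgval_t: records only whether the argument
 * has an LLVM-level value (the `obj.V` check in codegen.cpp). */
typedef struct {
    bool has_llvm_value;
} arg_model;

/* What the lowering would emit for one dcebarrier call. */
typedef struct {
    size_t volatile_stores; /* one per argument with an LLVM value */
    bool sideeffect_call;   /* llvm.sideeffect fallback */
} lowering_model;

/* `args` holds only the call's arguments (codegen.cpp skips argv[0], the
 * callee, by starting its loop at 1). */
static lowering_model lower_dcebarrier(const arg_model *args, size_t nargs)
{
    assert(args != NULL || nargs == 0);
    lowering_model out = { 0, false };
    for (size_t i = 0; i < nargs; ++i) {
        if (args[i].has_llvm_value)
            out.volatile_stores++; /* models the volatile CreateStore */
    }
    /* Mirrors `if (!emitted_any_side_effect)`: keep the call effectful even
     * when there is nothing to store. */
    if (out.volatile_stores == 0)
        out.sideeffect_call = true;
    return out;
}
```

Under this model, `dcebarrier(x+1)` with an `Int64` argument is one argument with a value (one store, no `llvm.sideeffect`), while `dcebarrier(1+1)` presumably reaches codegen as a folded constant with no LLVM value, leaving only `llvm.sideeffect`. That is the pattern the tests in `test/compiler/codegen.jl` check for.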

src/staticdata.c

Lines changed: 3 additions & 2 deletions
@@ -26,7 +26,7 @@ extern "C" {
 // TODO: put WeakRefs on the weak_refs list during deserialization
 // TODO: handle finalizers

-#define NUM_TAGS 152
+#define NUM_TAGS 153

 // An array of references that need to be restored from the sysimg
 // This is a manually constructed dual of the gvars array, which would be produced by codegen for Julia code, for C.
@@ -198,6 +198,7 @@ jl_value_t **const*const get_tags(void) {
         INSERT_TAG(jl_builtin__expr);
         INSERT_TAG(jl_builtin_ifelse);
         INSERT_TAG(jl_builtin__typebody);
+        INSERT_TAG(jl_builtin_dcebarrier);

         // All optional tags must be placed at the end, so that we
         // don't accidentally have a `NULL` in the middle
@@ -252,7 +253,7 @@ static const jl_fptr_args_t id_to_fptrs[] = {
     &jl_f_applicable, &jl_f_invoke, &jl_f_sizeof, &jl_f__expr, &jl_f__typevar,
     &jl_f_ifelse, &jl_f__structtype, &jl_f__abstracttype, &jl_f__primitivetype,
     &jl_f__typebody, &jl_f__setsuper, &jl_f__equiv_typedef, &jl_f_opaque_closure_call,
-    NULL };
+    &jl_f_dcebarrier, NULL };

 typedef struct {
     ios_t *s;

test/compiler/codegen.jl

Lines changed: 7 additions & 0 deletions
@@ -711,3 +711,10 @@ end
 @test !cmp43123(Ref{Function}(+), Ref{Union{typeof(+), typeof(-)}}(-))
 @test cmp43123(Function[+], Union{typeof(+), typeof(-)}[+])
 @test !cmp43123(Function[+], Union{typeof(+), typeof(-)}[-])
+
+# Test that dcebarrier survives through to LLVM time
+f_dcebarrier_input(x) = dcebarrier(x+1)
+f_dcebarrier_const() = dcebarrier(1+1)
+@test occursin("store", get_llvm(f_dcebarrier_input, Tuple{Int64}, true, false, false))
+@test !occursin("store", get_llvm(f_dcebarrier_const, Tuple{}, true, false, false))
+@test occursin("llvm.sideeffect", get_llvm(f_dcebarrier_const, Tuple{}, true, false, false))
