
Commit 34cf4d1

Support names field in source maps
This adds support for the `names` field in source maps, which holds function names. The source map mappings are updated accordingly, so emsymbolizer can now report function names from source maps alone. Source maps don't encode full inlining hierarchies, but each mapping records the name of the original (pre-inlining) function, which may no longer exist in the final binary because it was inlined. This is intentional, since source maps are primarily intended for user debugging. This also demangles C++ function names with `llvm-cxxfilt` so that the printed names are human-readable.

I tested this with `wasm-opt.wasm` from Binaryen, built with the `if (EMSCRIPTEN)` setup here: https://github.com/WebAssembly/binaryen/blob/95b2cf0a4ab2386f099568c5c61a02163770af32/CMakeLists.txt#L311-L372 and `-g -gsource-map`. With this PR and WebAssembly/binaryen#8068, the source map file size increases by 3.5x (8632423 -> 30070042), primarily due to the function name strings.

Extracting function names from `llvm-dwarfdump` output also requires parsing `DW_TAG_subprogram` and `DW_TAG_inlined_subroutine` tags, which can appear at any depth (functions can be nested within namespaces or classes), so we can no longer use `--recurse-depth=0` (#9580). For `wasm-opt.wasm` built with DWARF info, dropping `--recurse-depth=0` from the command line increased the size of the text output by 27.5x; with the `--filter-child-tag` / `-t` option (llvm/llvm-project#165720), the text output increased by only (?) 3.2x, which I think is tolerable. Because `-t` was added to `llvm-dwarfdump` only recently, `names` field generation is disabled when the option is not available. To avoid this text size problem altogether, we could consider a DWARF-parsing Python library such as https://github.com/eliben/pyelftools, but that would add another third-party dependency, so I'm not sure it's worth it at this point.
This also increases the running time of `wasm-sourcemap.py` by 2.3x for `wasm-opt.wasm` (6.6s -> 15.4s), but compared to the linking time this is not very noticeable. Fixes #20715 and closes #25116.
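The depth problem with `llvm-dwarfdump` output can be illustrated with a small, hypothetical parser. This is a minimal sketch, not the logic in `wasm-sourcemap.py`: it assumes the default text layout, where a DIE header looks like `0x0000002e:     DW_TAG_subprogram` and the indentation grows by two spaces per nesting level, and it runs on a hand-written, abbreviated sample rather than real tool output. `collect_function_dies`, `SAMPLE`, and the regexes are illustrative names only.

```python
import re

# A DIE header such as "0x0000002e:     DW_TAG_subprogram". Assumption: the
# spaces after the colon grow by two per nesting level, as in llvm-dwarfdump's
# default text output.
DIE_RE = re.compile(r'^0x[0-9a-fA-F]+:(\s+)(DW_TAG_\w+)')
# A name attribute such as '  DW_AT_linkage_name  ("_ZN7MyClass3fooEv")'.
NAME_RE = re.compile(r'^\s+DW_AT_(?:linkage_)?name\s+\("([^"]*)"\)')
FUNC_TAGS = {'DW_TAG_subprogram', 'DW_TAG_inlined_subroutine'}

# Hand-written, abbreviated sample; real output has many more attributes.
SAMPLE = """\
0x0000000b: DW_TAG_compile_unit
              DW_AT_name  ("test_dwarf.cpp")

0x00000026:   DW_TAG_class_type
                DW_AT_name  ("MyClass")

0x0000002e:     DW_TAG_subprogram
                  DW_AT_linkage_name  ("_ZN7MyClass3fooEv")
                  DW_AT_name  ("foo")

0x00000050:   DW_TAG_subprogram
                DW_AT_name  ("main")
"""


def collect_function_dies(text):
  """Return (depth, tag, name) for each function DIE at any nesting depth.

  The first name-like attribute wins, so a mangled linkage name is preferred
  over the plain DW_AT_name when both are present.
  """
  results = []
  pending = None  # (depth, tag) of a function DIE still waiting for a name
  for line in text.splitlines():
    header = DIE_RE.match(line)
    if header:
      depth = (len(header.group(1)) - 1) // 2
      tag = header.group(2)
      pending = (depth, tag) if tag in FUNC_TAGS else None
      continue
    attr = NAME_RE.match(line)
    if attr and pending is not None:
      results.append((pending[0], pending[1], attr.group(1)))
      pending = None
  return results


for depth, tag, name in collect_function_dies(SAMPLE):
  print(depth, tag, name)
```

In the sample, `MyClass::foo` sits inside the class at depth 2 while `main` is at depth 1, so any scan restricted to shallow DIEs would miss member functions and functions in nested namespaces.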
1 parent 9f751bb commit 34cf4d1


4 files changed (+321, -35 lines)


test/core/test_dwarf.cpp

Lines changed: 26 additions & 0 deletions
```diff
@@ -0,0 +1,26 @@
+#include <emscripten.h>
+
+EM_JS(int, out_to_js, (int x), {})
+
+class MyClass {
+public:
+  void foo();
+  void bar();
+};
+
+void __attribute__((noinline)) MyClass::foo() {
+  out_to_js(0); // line 12
+  out_to_js(1);
+  out_to_js(2);
+}
+
+void __attribute__((always_inline)) MyClass::bar() {
+  out_to_js(3);
+  __builtin_trap(); // line 19
+}
+
+int main() {
+  MyClass mc;
+  mc.foo();
+  mc.bar();
+}
```
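The C++ test expects emsymbolizer to print `MyClass::foo()` rather than the Itanium-mangled `_ZN7MyClass3fooEv`, which is what running names through `llvm-cxxfilt` provides. Below is a hedged sketch of batch demangling by piping names to `llvm-cxxfilt` on stdin; the helper name `demangle` and the fall-back-to-raw-names behavior are assumptions for illustration, not necessarily how `wasm-sourcemap.py` invokes the tool.

```python
import shutil
import subprocess


def demangle(names, tool='llvm-cxxfilt'):
  """Demangle symbol names in one batch, or return them unchanged.

  With no symbol arguments, llvm-cxxfilt (like c++filt) reads names from
  stdin, one per line, and writes one result per line. Assumption: if the
  tool is not on PATH or its output does not line up one-to-one with the
  input, the raw names are returned.
  """
  names = list(names)
  exe = shutil.which(tool)
  if exe is None or not names:
    return names
  result = subprocess.run([exe], input='\n'.join(names) + '\n',
                          capture_output=True, text=True, check=True)
  out = result.stdout.splitlines()
  return out if len(out) == len(names) else names


print(demangle(['_ZN7MyClass3fooEv', 'main']))
```

With `llvm-cxxfilt` on PATH, the first name should come back as `MyClass::foo()`, while non-mangled names such as `main` pass through unchanged. Sending all names to a single process avoids spawning one process per symbol, which matters for a binary with as many functions as `wasm-opt.wasm`.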

test/test_other.py

Lines changed: 64 additions & 25 deletions
```diff
@@ -9639,12 +9639,50 @@ def check_dwarf_loc_info(address, funcs, locs):
       for loc in locs:
         self.assertIn(loc, out)
 
-    def check_source_map_loc_info(address, loc):
+    def check_source_map_loc_info(address, func, loc):
       out = self.run_process(
           [emsymbolizer, '-s', 'sourcemap', 'test_dwarf.wasm', address],
           stdout=PIPE).stdout
+      self.assertIn(func, out)
       self.assertIn(loc, out)
 
+    def do_tests(src):
+      # 1. Test DWARF + source map together
+      # For DWARF, we check for the full inlined info for both function names and
+      # source locations. Source maps does not provide inlined info. So we only
+      # check for the info of the outermost function.
+      self.run_process([EMCC, test_file(src), '-g', '-gsource-map', '-O1', '-o',
+                        'test_dwarf.js'])
+      check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
+                           out_to_js_call_loc)
+      check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_func[0],
+                                out_to_js_call_loc[0])
+      check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
+      # Source map shows the original (inlined) source location with the original
+      # function name
+      check_source_map_loc_info(unreachable_addr, unreachable_func[0],
+                                unreachable_loc[0])
+
+      # 2. Test source map only
+      # The addresses, function names, and source locations are the same across
+      # the builds because they are relative offsets from the code section, so we
+      # don't need to recompute them
+      self.run_process([EMCC, test_file(src), '-gsource-map', '-O1', '-o',
+                        'test_dwarf.js'])
+      check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_func[0],
+                                out_to_js_call_loc[0])
+      check_source_map_loc_info(unreachable_addr, unreachable_func[0],
+                                unreachable_loc[0])
+
+      # 3. Test DWARF only
+      self.run_process([EMCC, test_file(src), '-g', '-O1', '-o',
+                        'test_dwarf.js'])
+      check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
+                           out_to_js_call_loc)
+      check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
+
+
+    # -- C program test --
     # We test two locations within test_dwarf.c:
     # out_to_js(0); // line 6
     # __builtin_trap(); // line 13
```
```diff
@@ -9667,31 +9705,32 @@ def check_source_map_loc_info(address, loc):
     # The first one corresponds to the innermost inlined location.
     unreachable_loc = ['test_dwarf.c:13:3', 'test_dwarf.c:18:3']
 
-    # 1. Test DWARF + source map together
-    # For DWARF, we check for the full inlined info for both function names and
-    # source locations. Source maps provide neither function names nor inlined
-    # info. So we only check for the source location of the outermost function.
-    check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
-                         out_to_js_call_loc)
-    check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_loc[0])
-    check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
-    check_source_map_loc_info(unreachable_addr, unreachable_loc[0])
-
-    # 2. Test source map only
-    # The addresses, function names, and source locations are the same across
-    # the builds because they are relative offsets from the code section, so we
-    # don't need to recompute them
-    self.run_process([EMCC, test_file('core/test_dwarf.c'),
-                      '-gsource-map', '-O1', '-o', 'test_dwarf.js'])
-    check_source_map_loc_info(out_to_js_call_addr, out_to_js_call_loc[0])
-    check_source_map_loc_info(unreachable_addr, unreachable_loc[0])
+    do_tests('core/test_dwarf.c')
 
-    # 3. Test DWARF only
-    self.run_process([EMCC, test_file('core/test_dwarf.c'),
-                      '-g', '-O1', '-o', 'test_dwarf.js'])
-    check_dwarf_loc_info(out_to_js_call_addr, out_to_js_call_func,
-                         out_to_js_call_loc)
-    check_dwarf_loc_info(unreachable_addr, unreachable_func, unreachable_loc)
+    # -- C++ program test --
+    # We test two locations within test_dwarf.cpp:
+    # out_to_js(0); // line 12
+    # __builtin_trap(); // line 19
+    self.run_process([EMCC, test_file('core/test_dwarf.cpp'),
+                      '-g', '-gsource-map', '-O1', '-o', 'test_dwarf.js'])
+    # Address of out_to_js(0) within MyClass::foo(), uninlined
+    out_to_js_call_addr = self.get_instr_addr('call\t0', 'test_dwarf.wasm')
+    # Address of __builtin_trap() within MyClass::bar(), inlined into main()
+    unreachable_addr = self.get_instr_addr('unreachable', 'test_dwarf.wasm')
+
+    # Function name of out_to_js(0) within MyClass::foo(), uninlined
+    out_to_js_call_func = ['MyClass::foo()']
+    # Function names of __builtin_trap() within MyClass::bar(), inlined into
+    # main(). The first one corresponds to the innermost inlined function.
+    unreachable_func = ['MyClass::bar()', 'main']
+
+    # Source location of out_to_js(0) within MyClass::foo(), uninlined
+    out_to_js_call_loc = ['test_dwarf.cpp:12:3']
+    # Source locations of __builtin_trap() within MyClass::bar(), inlined into
+    # main(). The first one corresponds to the innermost inlined location.
+    unreachable_loc = ['test_dwarf.cpp:19:3', 'test_dwarf.cpp:25:6']
+
+    do_tests('core/test_dwarf.cpp')
 
   def test_emsymbolizer_functions(self):
     'Test emsymbolizer use cases that only provide function-granularity info'
```

tools/emsymbolizer.py

Lines changed: 7 additions & 1 deletion
```diff
@@ -118,6 +118,7 @@ class Location:
   def __init__(self):
     self.version = None
     self.sources = []
+    self.funcs = []
     self.mappings = {}
     self.offsets = []
 
@@ -129,6 +130,7 @@ def parse(self, filename):
 
     self.version = source_map_json['version']
     self.sources = source_map_json['sources']
+    self.funcs = source_map_json['names']
 
     chars = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/='
     vlq_map = {c: i for i, c in enumerate(chars)}
@@ -156,6 +158,7 @@ def decodeVLQ(string):
     src = 0
     line = 1
     col = 1
+    func = 0
     for segment in source_map_json['mappings'].split(','):
       data = decodeVLQ(segment)
       info = []
@@ -170,7 +173,9 @@ def decodeVLQ(string):
       if len(data) >= 4:
         col += data[3]
         info.append(col)
-      # TODO: see if we need the name, which is the next field (data[4])
+      if len(data) == 5:
+        func += data[4]
+        info.append(func)
 
       self.mappings[offset] = WasmSourceMap.Location(*info)
       self.offsets.append(offset)
@@ -208,6 +213,7 @@ def lookup(self, offset, lower_bound=None):
       self.sources[info.source] if info.source is not None else None,
       info.line,
       info.column,
+      self.funcs[info.func] if info.func is not None else None,
     )
 
 
```
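To see what the new `len(data) == 5` branch does, here is a standalone sketch of decoding a Wasm source map `mappings` string that includes the optional fifth field. It follows the Source Map v3 encoding (Base64 VLQ digits, every field a delta from the previous segment, the fifth field an index into `names`) and uses zero-based lines and columns as in that format, whereas the emsymbolizer loop above starts `line` and `col` at 1. The function names, source name, and mappings string are made up for illustration.

```python
import bisect

# Base64 alphabet used by Source Map v3 VLQ encoding.
B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
VLQ_MAP = {c: i for i, c in enumerate(B64)}


def decode_vlq(segment):
  """Decode one Base64 VLQ segment into a list of signed integers."""
  values, acc, shift = [], 0, 0
  for c in segment:
    digit = VLQ_MAP[c]
    acc += (digit & 31) << shift
    if digit & 32:
      # Continuation bit set: the next digit carries higher-order bits.
      shift += 5
    else:
      # The lowest bit is the sign; the rest is the magnitude.
      values.append(-(acc >> 1) if acc & 1 else acc >> 1)
      acc, shift = 0, 0
  return values


def decode_mappings(mappings, sources, names):
  """Map each code offset to (source, line, column, name).

  Wasm source maps keep all segments on one line, separated by commas.
  Fields are deltas from the previous segment; a 5th field indexes `names`.
  """
  result = {}
  offset = src = line = col = name = 0
  for segment in mappings.split(','):
    if not segment:
      continue
    data = decode_vlq(segment)
    offset += data[0]
    entry = [None, None, None, None]
    if len(data) >= 4:
      src += data[1]
      line += data[2]
      col += data[3]
      entry[0:3] = [sources[src], line, col]
    if len(data) == 5:
      name += data[4]
      entry[3] = names[name]
    result[offset] = tuple(entry)
  return result


def lookup(decoded, address):
  """Return the entry for the closest mapped offset at or before `address`."""
  offsets = sorted(decoded)
  i = bisect.bisect_right(offsets, address) - 1
  return decoded[offsets[i]] if i >= 0 else None


decoded = decode_mappings('KAWEA,gBACAC,G', ['example.cpp'], ['foo', 'main'])
print(lookup(decoded, 22))  # ('example.cpp', 12, 2, 'main')
```

Because the name index is itself a delta, a decoder has to carry a running `func`/`name` counter across segments, exactly like the source, line, and column counters. The `lookup` helper returns the nearest mapping at or before an address, a common way to resolve offsets that fall between mapped instructions.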
