Commit 3ba4997

chandlerc, danakj, CarbonInfraBot, and jonmeow authored
Canonicalize away bit width and embed small integers into IntIds (#4487)
The first change here is to canonicalize away bit width when tracking integers in our shared value store. This lets us have a more definitive model of "what is the mathematical value". It also frees us to use more efficient bit widths when available, such as bits inside the ID itself.

For canonicalizing, we try to minimize the width adjustments and maximize the use of the SSO in APInt, so we never shrink below 64 bits and we grow in multiples of the word bit width in the implementation. We also canonicalize to the signed two's complement representation so we can represent negative numbers in an intuitive way.

The canonicalizing requires getting the bit width out of the type and adjusting to it within the toolchain when doing any kind of math. This PR updates various places to do that and adds some convenience APIs to assist.

Then we take advantage of the canonical form and embed small integers into the ID itself rather than allocating storage for them and referencing them with an index. This is especially helpful for the pervasive small integers such as the sizes of types, arrays, etc. Those no longer require indirection at all. Various shortcut APIs to take advantage of this have also been added.

This PR improves lexing by about 5% when there are lots of `i32` types.

---------

Co-authored-by: Dana Jansens <[email protected]>
Co-authored-by: Carbon Infra Bot <[email protected]>
Co-authored-by: Jon Ross-Perkins <[email protected]>
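For readers unfamiliar with the "embed small integers into the ID" idea, the following is a minimal standalone sketch of one way such a scheme can work. It is not the toolchain's `IntId` or `IntStore`: the names `SketchIntId` and `SketchIntStore`, the 32-bit ID layout, the tag bit, and the 31-bit embedded range are all assumptions made for illustration, and the side table here is a plain vector with no deduplication.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical ID type: 32 bits that hold either a value or a table index.
struct SketchIntId {
  uint32_t bits;
};

// Hypothetical store. The layout is an assumption for this sketch only:
// the top bit tags an embedded value, the low 31 bits hold it in two's
// complement. Untagged IDs index into a side table.
class SketchIntStore {
 public:
  static constexpr uint32_t kEmbeddedTag = uint32_t{1} << 31;
  static constexpr int64_t kMinEmbedded = -(int64_t{1} << 30);
  static constexpr int64_t kMaxEmbedded = (int64_t{1} << 30) - 1;

  static auto IsEmbedded(SketchIntId id) -> bool {
    return (id.bits & kEmbeddedTag) != 0;
  }

  auto Add(int64_t value) -> SketchIntId {
    if (value >= kMinEmbedded && value <= kMaxEmbedded) {
      // Small value: keep the low 31 bits and set the tag. No allocation.
      uint32_t low = static_cast<uint32_t>(value) & ~kEmbeddedTag;
      return {kEmbeddedTag | low};
    }
    // Large value: store it out of line and return its index.
    large_.push_back(value);
    return {static_cast<uint32_t>(large_.size() - 1)};
  }

  auto Get(SketchIntId id) const -> int64_t {
    if (IsEmbedded(id)) {
      // Sign-extend from 31 bits: bit 30 is the embedded sign bit.
      int64_t low = id.bits & ~kEmbeddedTag;
      if ((low & (int64_t{1} << 30)) != 0) {
        low -= int64_t{1} << 31;
      }
      return low;
    }
    return large_[id.bits];
  }

 private:
  std::vector<int64_t> large_;
};
```

In this sketch, a value such as 32 (for example, the width of an `i32`) round-trips through `Add` and `Get` without touching `large_`, which is the kind of indirection the commit message describes avoiding.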
1 parent 39ed62d commit 3ba4997

23 files changed, +848 -98 lines changed

toolchain/base/BUILD

Lines changed: 32 additions & 0 deletions
@@ -56,6 +56,7 @@ cc_library(
     hdrs = ["value_ids.h"],
     deps = [
         ":index_base",
+        "//common:check",
         "//common:ostream",
         "@llvm-project//llvm:Support",
     ],
@@ -89,10 +90,41 @@ cc_test(
     ],
 )
 
+cc_library(
+    name = "int_store",
+    srcs = ["int_store.cpp"],
+    hdrs = ["int_store.h"],
+    deps = [
+        ":index_base",
+        ":mem_usage",
+        ":value_store",
+        ":yaml",
+        "//common:check",
+        "//common:hashtable_key_context",
+        "//common:ostream",
+        "//common:set",
+        "@llvm-project//llvm:Support",
+    ],
+)
+
+cc_test(
+    name = "int_store_test",
+    size = "small",
+    srcs = ["int_store_test.cpp"],
+    deps = [
+        ":int_store",
+        "//testing/base:gtest_main",
+        "//testing/base:test_raw_ostream",
+        "//toolchain/testing:yaml_test_helpers",
+        "@googletest//:gtest",
+    ],
+)
+
 cc_library(
     name = "shared_value_stores",
     hdrs = ["shared_value_stores.h"],
     deps = [
+        ":int_store",
         ":mem_usage",
         ":value_ids",
         ":value_store",

toolchain/base/int_store.cpp

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+// Part of the Carbon Language project, under the Apache License v2.0 with LLVM
+// Exceptions. See /LICENSE for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+
+#include "toolchain/base/int_store.h"
+
+namespace Carbon {
+
+auto IntStore::CanonicalBitWidth(int significant_bits) -> int {
+  // For larger integers, we store them as a signed APInt with a canonical
+  // width that is the smallest multiple of the word type's bits, but no
+  // smaller than a minimum of 64 bits to avoid spurious resizing of the most
+  // common cases (<= 64 bits).
+  static constexpr int WordWidth = llvm::APInt::APINT_BITS_PER_WORD;
+
+  return std::max<int>(
+      MinAPWidth, ((significant_bits + WordWidth - 1) / WordWidth) * WordWidth);
+}
+
+auto IntStore::CanonicalizeSigned(llvm::APInt value) -> llvm::APInt {
+  return value.sextOrTrunc(CanonicalBitWidth(value.getSignificantBits()));
+}
+
+auto IntStore::CanonicalizeUnsigned(llvm::APInt value) -> llvm::APInt {
+  // We need the width to include a zero sign bit as we canonicalize to a
+  // signed representation.
+  return value.zextOrTrunc(CanonicalBitWidth(value.getActiveBits() + 1));
+}
+
+auto IntStore::AddLarge(int64_t value) -> IntId {
+  auto ap_id =
+      values_.Add(llvm::APInt(CanonicalBitWidth(64), value, /*isSigned=*/true));
+  return MakeIndexOrInvalid(ap_id.index);
+}
+
+auto IntStore::AddSignedLarge(llvm::APInt value) -> IntId {
+  auto ap_id = values_.Add(CanonicalizeSigned(value));
+  return MakeIndexOrInvalid(ap_id.index);
+}
+
+auto IntStore::AddUnsignedLarge(llvm::APInt value) -> IntId {
+  auto ap_id = values_.Add(CanonicalizeUnsigned(value));
+  return MakeIndexOrInvalid(ap_id.index);
+}
+
+auto IntStore::LookupLarge(int64_t value) const -> IntId {
+  auto ap_id = values_.Lookup(
+      llvm::APInt(CanonicalBitWidth(64), value, /*isSigned=*/true));
+  return MakeIndexOrInvalid(ap_id.index);
+}
+
+auto IntStore::LookupSignedLarge(llvm::APInt value) const -> IntId {
+  auto ap_id = values_.Lookup(CanonicalizeSigned(value));
+  return MakeIndexOrInvalid(ap_id.index);
+}
+
+auto IntStore::OutputYaml() const -> Yaml::OutputMapping {
+  return values_.OutputYaml();
+}
+
+auto IntStore::CollectMemUsage(MemUsage& mem_usage, llvm::StringRef label) const
+    -> void {
+  mem_usage.Collect(std::string(label), values_);
+}
+
+}  // namespace Carbon
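
To make the rounding in `CanonicalBitWidth` concrete, here is a small LLVM-free sketch of the same arithmetic. The constants are assumptions: `kWordWidth` mirrors `llvm::APInt::APINT_BITS_PER_WORD` (64, since APInt words are `uint64_t`), and `kMinAPWidth` is taken to be 64 from the commit message, because `MinAPWidth` itself is defined in a header not shown in this diff. The helper names are hypothetical; the two bit-counting helpers are stand-ins for `getSignificantBits()` and `getActiveBits()` restricted to 64-bit inputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumed constants; see the lead-in for where they come from.
constexpr int kWordWidth = 64;
constexpr int kMinAPWidth = 64;

// Round up to a multiple of the word width, but never below the minimum.
constexpr auto CanonicalBitWidthSketch(int significant_bits) -> int {
  return std::max(
      kMinAPWidth,
      ((significant_bits + kWordWidth - 1) / kWordWidth) * kWordWidth);
}

// Minimum two's complement width of a signed value, sign bit included.
constexpr auto SignedSignificantBits(int64_t value) -> int {
  // Flipping the bits of a negative value leaves only its magnitude bits.
  uint64_t magnitude = value < 0 ? ~static_cast<uint64_t>(value)
                                 : static_cast<uint64_t>(value);
  int bits = 1;  // The sign bit.
  while (magnitude != 0) {
    ++bits;
    magnitude >>= 1;
  }
  return bits;
}

// Number of bits up to and including the highest set bit.
constexpr auto UnsignedActiveBits(uint64_t value) -> int {
  int bits = 0;
  while (value != 0) {
    ++bits;
    value >>= 1;
  }
  return bits;
}
```

Under these assumptions, any signed 64-bit value canonicalizes to 64 bits, while an unsigned value with its top bit set, such as `UINT64_MAX`, needs 64 active bits plus a zero sign bit and so rounds up to 128. This is the case the `+ 1` in `CanonicalizeUnsigned` accounts for.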
