Bug report
Bug description:
When constructing dataclass or NamedTuple instances on multiple threads (on a free threading build), or accessing enum class attributes, performance doesn't scale when using multiple threads.
Regular class example (scales well):
```python
# b_regular_class.py
from threading import Thread
from time import time
import sys

class Foo:
    def __init__(self, x):
        self.x = x

niter = 5 * 1000 * 1000

def benchmark(n):
    for i in range(n):
        Foo(x=1)

for nth in (1, 4):
    t0 = time()
    threads = [Thread(target=benchmark, args=(niter,)) for _ in range(nth)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{nth=} {(time() - t0) / nth}")
```
Dataclass example (doesn't scale well):
```python
# b_dataclass.py
from threading import Thread
from dataclasses import dataclass
from time import time
import sys

@dataclass
class Foo:
    x: int

niter = 5 * 1000 * 1000

def benchmark(n):
    for i in range(n):
        Foo(x=1)

for nth in (1, 4):
    t0 = time()
    threads = [Thread(target=benchmark, args=(niter,)) for _ in range(nth)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{nth=} {(time() - t0) / nth}")
```
Named tuple example (doesn't scale well):
```python
# b_namedtuple.py
from threading import Thread
from typing import NamedTuple
from time import time
import sys

class Foo(NamedTuple):
    x: int

niter = 5 * 1000 * 1000

def benchmark(n):
    for i in range(n):
        Foo(x=1)

for nth in (1, 4):
    t0 = time()
    threads = [Thread(target=benchmark, args=(niter,)) for _ in range(nth)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{nth=} {(time() - t0) / nth}")
```
Enum example (doesn't scale well):
```python
# b_enum.py
from threading import Thread
from time import time
from enum import Enum
import sys

class Foo(Enum):
    X = 1
    Y = 2

niter = 5 * 1000 * 1000

def benchmark(n):
    for i in range(n):
        Foo.X
        Foo.Y.value

for nth in (1, 4):
    t0 = time()
    threads = [Thread(target=benchmark, args=(niter,)) for _ in range(nth)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{nth=} {(time() - t0) / nth}")
```
Results on recent main branch (running on an EC2 instance):
```
(cpython-dev) jukka@jukka-coder-dbx free-threading-benchmarks $ py b_regular_class.py
nth=1 1.1085155010223389
nth=4 0.2796591520309448
(cpython-dev) jukka@jukka-coder-dbx free-threading-benchmarks $ py b_dataclass.py
nth=1 1.1910037994384766
nth=4 1.0931583642959595
(cpython-dev) jukka@jukka-coder-dbx free-threading-benchmarks $ py b_namedtuple.py
nth=1 1.5688557624816895
nth=4 2.0257126092910767
(cpython-dev) jukka@jukka-coder-dbx free-threading-benchmarks $ py b_enum.py
nth=1 0.9439797401428223
nth=4 2.272495985031128
```
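To make the comparison concrete, here is a small helper (hypothetical, not part of the benchmarks above) that turns the two printed per-thread times into an effective speedup. Since each benchmark prints wall time divided by `nth`, dividing the `nth=1` value by the `nth=4` value gives the speedup over a single thread: a value near 4 means near-linear scaling, while a value at or below 1 means no scaling at all.

```python
# Hypothetical helper: compute effective speedup from the printed values.
def speedup(t1: float, t4_per_thread: float) -> float:
    # Each benchmark prints (wall time) / nth, so this ratio is the
    # effective speedup over one thread; ~4.0 is near-linear scaling.
    return t1 / t4_per_thread

# Values taken from the benchmark output above (rounded).
for name, t1, t4 in [
    ("regular class", 1.1085, 0.2797),
    ("dataclass", 1.1910, 1.0932),
    ("namedtuple", 1.5689, 2.0257),
    ("enum", 0.9440, 2.2725),
]:
    print(f"{name}: {speedup(t1, t4):.2f}x")
```

This shows the regular class scaling almost perfectly (~4x), while the dataclass stays near 1x and the namedtuple and enum benchmarks actually get slower with more threads.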
The expected behavior is that when using 4 threads (`nth=4`), the elapsed time per benchmark iteration (the second printed value) goes down significantly compared to when using a single thread (`nth=1`). This happens with the first benchmark (`b_regular_class.py`) but not the others.
cc @colesbury (we discussed this at CPython Core Dev Sprint in person)
CPython versions tested on:
CPython main branch
Operating systems tested on:
Linux