Hi,
We're a research group that works on testing concurrent language implementations. When applying our recent prototype to GraalPy, we noticed that it would abruptly crash without any output or error.
This is a rare bug (observed once or twice per 100 executions) that affects the Windows AMD64 release of versions 25.0 and 24.2.2. We have not been able to reproduce it on Linux AMD64 at all. We're checking ARM; I'll update below if we find anything.
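Since the failure shows up in only about 1% to 2% of runs, it may help to estimate how many driver iterations are needed before a crash is likely to appear. The sketch below assumes each run is independent and crashes with a fixed probability; `runs_needed` is a hypothetical helper for illustration, not part of our tooling.

```python
import math


def runs_needed(p_crash, confidence=0.95):
    """Smallest n such that P(at least one crash in n runs) >= confidence.

    Assumes independent runs with a fixed per-run crash probability,
    which is only an approximation of the real behavior.
    """
    if not 0.0 < p_crash < 1.0:
        raise ValueError("p_crash must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= confidence for n.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_crash))


if __name__ == "__main__":
    for p in (0.01, 0.02):
        print(p, runs_needed(p))
```

Under these assumptions, a 1% crash rate needs roughly 300 runs and a 2% rate roughly 150 runs for a 95% chance of seeing at least one failure, so a single batch of 100 executions can easily come back clean.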
One time I was able to observe the following stack trace on 24.2.2:
ERROR: java.lang.IllegalStateException: The language did not complete all polyglot threads but should have: [Thread[#50051,Polyglot-python-49999,5,GRAALPYTHON_THREADS]]
org.graalvm.polyglot.PolyglotException: java.lang.IllegalStateException: The language did not complete all polyglot threads but should have: [Thread[#50051,Polyglot-python-49999,5,GRAALPYTHON_THREADS]]
    at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotLanguageContext.dispose(PolyglotLanguageContext.java:463)
    at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.disposeContext(PolyglotContextImpl.java:3404)
    at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.finishClose(PolyglotContextImpl.java:2979)
    at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.closeImpl(PolyglotContextImpl.java:2883)
    at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.closeAndMaybeWait(PolyglotContextImpl.java:2035)
    at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.close(PolyglotContextImpl.java:1968)
    at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextDispatch.close(PolyglotContextDispatch.java:72)
    at org.graalvm.polyglot/org.graalvm.polyglot.Context.close(Context.java:881)
    at org.graalvm.polyglot/org.graalvm.polyglot.Context.close(Context.java:908)
    at com.oracle.graal.python.shell.GraalPythonMain.launch(GraalPythonMain.java:854)
    at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:312)
    at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:126)
    at org.graalvm.launcher.AbstractLanguageLauncher.runLauncher(AbstractLanguageLauncher.java:180)
    Suppressed: Attached Guest Language Frames (0)
Internal GraalVM error, please report at https://github.com/oracle/graal/issues/.
Here is the program we're using to find this bug (test.py):
import threading
import sys
from collections import *

def t0(b1,s,res):
    b1.wait()
    try:
        res.append(s.insert(0, 10))
    except Exception as error:
        res.append(repr(error))

def t1(b1,s,res):
    b1.wait()
    try:
        res.append(s.clear())
    except Exception as error:
        res.append(repr(error))

def t2(b1,s,res):
    b1.wait()
    try:
        res.append(s.insert(0, 10))
    except Exception as error:
        res.append(repr(error))

def t3(b1,s,res):
    b1.wait()
    try:
        res.append(s.__ne__([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
    except Exception as error:
        res.append(repr(error))

def Test():
    normal_data_structure = [1, 2, 3]
    possible_results = [([10], ['None', 'None', 'None', 'True']),
                        ([], ['None', 'None', 'None', 'True']),
                        ([10, 10], ['None', 'None', 'None', 'True'])]
    threads=[]
    barrier = threading.Barrier(4)
    res = []
    threads.append(threading.Thread(target= t0, args=(barrier, normal_data_structure,res)))
    threads.append(threading.Thread(target= t1, args=(barrier, normal_data_structure,res)))
    threads.append(threading.Thread(target= t2, args=(barrier, normal_data_structure,res)))
    threads.append(threading.Thread(target= t3, args=(barrier, normal_data_structure,res)))
    for i in range(0, len(threads)):
        threads[i].start()
    for i in range(0, len(threads)):
        threads[i].join()
    for i in range(0, len(res)):
        if isinstance(res[i], set):
            temp = list(map(str,list(res[i])))
            temp.sort()
            res[i]= str(temp)
        else:
            res[i]= str(res[i])
    res.sort()
    if (normal_data_structure, res) not in possible_results:
        print("found bug: " + str((normal_data_structure,res)))
    normal_data_structure = [1, 2, 3]

print("test begin...")
for i in range(0,100):
    threads = []
    print(i)
    # if i % 1000 == 0:
    #     print(i)
    for i in range(0,100):
        threads.append(threading.Thread(target= Test))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
print("test Done")
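Because the one stack trace we captured complains about a polyglot thread that was not completed at context close, a small diagnostic at the end of test.py might help narrow things down. The sketch below only uses the standard `threading` module; `leftover_threads` and `report_leftovers` are hypothetical helper names, and whether the stuck thread would be visible through `threading.enumerate()` on GraalPy is an assumption we have not verified.

```python
import threading


def leftover_threads():
    """Return threads other than the main and current thread that are still alive."""
    skip = (threading.main_thread(), threading.current_thread())
    return [t for t in threading.enumerate() if t not in skip and t.is_alive()]


def report_leftovers(timeout=1.0):
    """Give stragglers a bounded chance to finish, then report what remains."""
    for t in leftover_threads():
        t.join(timeout)
    remaining = leftover_threads()
    if remaining:
        print("threads still alive at exit:", [t.name for t in remaining])
    return remaining
```

Calling `report_leftovers()` right after the final `print("test Done")` would show whether any Python-level thread outlives its `join()` before the launcher closes the context.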
And we run it with the following driver:
import subprocess
import traceback

fail_count = 0
for i in range(100):
    try:
        r = subprocess.run(['graalpy-24.2.2-windows-amd64/bin/graalpy.exe',
                            '--log.python.level=FINE', '--log.engine.level=FINE', '--log.launcher.level=FINE',
                            'test.py'],
                           capture_output=True, # comment this line if you want to see the live counter from the subprocess
                           check=True,
                           text=True)
        print(f"Done with {i}")
        if r.stderr:
            with open(f'good_output{i}.txt', 'w+') as f:
                f.write(r.stderr)
    except subprocess.CalledProcessError as e:
        fail_count += 1
        print(f"Failcount: {fail_count}")
        print(f"Subprocess Error: [{e}]")
        print("Return code:", e.returncode)
        print('stderr written to error_output.txt')
        if e.stderr:
            with open(f'error_output{i}.txt', 'w+') as f:
                f.write(e.stderr)
    # dead code below
    except Exception:
        print("Something else went wrong")
        traceback.print_exc()
print(f"Subprocess failed {fail_count} times")
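When the process dies without output, the return code is often the only clue. On Windows, an abnormal exit typically surfaces as a 32-bit NTSTATUS value, which is easier to read in hexadecimal. The sketch below is a hypothetical helper (`describe_returncode` and the `KNOWN_NTSTATUS` table are our own names, and the table lists only a few well-known codes); it is not something the driver above currently does.

```python
# A few well-known NTSTATUS values; this table is deliberately incomplete.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
    0xC000013A: "STATUS_CONTROL_C_EXIT",
}


def describe_returncode(code):
    """Format a process return code as 32-bit hex plus a name when known."""
    unsigned = code & 0xFFFFFFFF  # normalize negative (signed) representations
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08X} ({name})"
```

In the `CalledProcessError` handler, `print("Return code:", describe_returncode(e.returncode))` would then distinguish, for example, an access violation from an ordinary nonzero exit.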
@mqbal is part of the team; I'm adding them so they get notified about further discussion.