GraalPy silent error on multi-threaded code #543

@luisggpina

Hi,

We're a research group that works on testing concurrent language implementations. When applying our recent prototype to GraalPy, we noticed that GraalPy would abruptly crash without any output or error message.

This is a rare bug (observed in 1 or 2 out of 100 executions) that affects the Windows AMD64 releases of versions 25.0 and 24.2.2. We have not been able to reproduce it on Linux AMD64 at all. We're also checking ARM and will post an update below if we find anything.
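For the ARM check (and to put the Linux result in context), here is a rough way to size the number of runs. It assumes each run fails independently with a fixed probability p, taken from the 1 to 2 failures per 100 runs observed on Windows. `prob_no_failures` and `runs_needed` are illustrative helpers, not part of the reproducer.

```python
import math

# Assumes runs are independent and share one per-run failure probability p.

def prob_no_failures(p: float, runs: int) -> float:
    """Chance that `runs` independent runs all succeed when each fails with probability p."""
    return (1.0 - p) ** runs

def runs_needed(p: float, confidence: float = 0.95) -> int:
    """Smallest number of runs whose chance of showing at least one failure reaches `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

for p in (0.01, 0.02):
    print(f"p={p}: P(0 failures in 100 runs)={prob_no_failures(p, 100):.2f}, "
          f"runs for 95% detection={runs_needed(p)}")
```

Under that assumption, 100 clean runs would still occur about 13% (p = 0.02) to 37% (p = 0.01) of the time, so a few hundred runs per platform give a much stronger signal.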

On one occasion, I observed the following stack trace on 24.2.2:

```
ERROR: java.lang.IllegalStateException: The language did not complete all polyglot threads but should have: [Thread[#50051,Polyglot-python-49999,5,GRAALPYTHON_THREADS]]
org.graalvm.polyglot.PolyglotException: java.lang.IllegalStateException: The language did not complete all polyglot threads but should have: [Thread[#50051,Polyglot-python-49999,5,GRAALPYTHON_THREADS]]
	at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotLanguageContext.dispose(PolyglotLanguageContext.java:463)
	at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.disposeContext(PolyglotContextImpl.java:3404)
	at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.finishClose(PolyglotContextImpl.java:2979)
	at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.closeImpl(PolyglotContextImpl.java:2883)
	at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.closeAndMaybeWait(PolyglotContextImpl.java:2035)
	at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextImpl.close(PolyglotContextImpl.java:1968)
	at org.graalvm.truffle/com.oracle.truffle.polyglot.PolyglotContextDispatch.close(PolyglotContextDispatch.java:72)
	at org.graalvm.polyglot/org.graalvm.polyglot.Context.close(Context.java:881)
	at org.graalvm.polyglot/org.graalvm.polyglot.Context.close(Context.java:908)
	at com.oracle.graal.python.shell.GraalPythonMain.launch(GraalPythonMain.java:854)
	at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:312)
	at org.graalvm.launcher.AbstractLanguageLauncher.launch(AbstractLanguageLauncher.java:126)
	at org.graalvm.launcher.AbstractLanguageLauncher.runLauncher(AbstractLanguageLauncher.java:180)
	Suppressed: Attached Guest Language Frames (0)
Internal GraalVM error, please report at https://github.com/oracle/graal/issues/.
```

Here is the program we're using to find this bug (test.py):

```python
import threading

# Each worker waits on the shared barrier so that the four operations on the
# shared list start as close to simultaneously as possible, then records the
# operation's return value (or the exception it raised) in `res`.
def t0(b1, s, res):
    b1.wait()
    try:
        res.append(s.insert(0, 10))
    except Exception as error:
        res.append(repr(error))

def t1(b1, s, res):
    b1.wait()
    try:
        res.append(s.clear())
    except Exception as error:
        res.append(repr(error))

def t2(b1, s, res):
    b1.wait()
    try:
        res.append(s.insert(0, 10))
    except Exception as error:
        res.append(repr(error))

def t3(b1, s, res):
    b1.wait()
    try:
        res.append(s.__ne__([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
    except Exception as error:
        res.append(repr(error))

def Test():
    normal_data_structure = [1, 2, 3]
    # (final list, sorted return values) pairs that are acceptable outcomes.
    possible_results = [([10], ['None', 'None', 'None', 'True']),
                        ([], ['None', 'None', 'None', 'True']),
                        ([10, 10], ['None', 'None', 'None', 'True'])]
    barrier = threading.Barrier(4)
    res = []
    threads = [threading.Thread(target=worker, args=(barrier, normal_data_structure, res))
               for worker in (t0, t1, t2, t3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Normalize results to strings (set contents are sorted first) so they can be compared.
    for i in range(len(res)):
        if isinstance(res[i], set):
            res[i] = str(sorted(map(str, res[i])))
        else:
            res[i] = str(res[i])
    res.sort()
    if (normal_data_structure, res) not in possible_results:
        print("found bug: " + str((normal_data_structure, res)))

print("test begin...")
for i in range(100):
    print(i)
    # Each iteration runs 100 copies of Test concurrently.
    threads = [threading.Thread(target=Test) for _ in range(100)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
print("test Done")
```
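The three entries in `possible_results` are exactly the states reachable when the four operations run one after another in some order, i.e. assuming each list operation is atomic. The sketch below, which runs under any Python and does not involve GraalPy, enumerates all 24 orderings to make that oracle explicit; `OPS` and `sequential_outcomes` are names introduced here for illustration.

```python
from itertools import permutations

# The operations performed by t0..t3 in test.py, treated here as atomic steps.
OPS = [
    lambda s: s.insert(0, 10),                           # t0
    lambda s: s.clear(),                                 # t1
    lambda s: s.insert(0, 10),                           # t2
    lambda s: s.__ne__([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),  # t3
]

def sequential_outcomes():
    """Return every (final list, sorted results) pair reachable by running OPS one at a time in some order."""
    outcomes = set()
    for order in permutations(range(len(OPS))):
        s = [1, 2, 3]
        # sorted() consumes the generator in `order`, so the ops run in that order.
        res = sorted(str(OPS[k](s)) for k in order)
        outcomes.add((tuple(s), tuple(res)))
    return outcomes

for final, results in sorted(sequential_outcomes()):
    print(list(final), list(results))
```

Any pair outside this set is what `test.py` reports as `found bug`.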

And we run it with the following driver:

```python
import subprocess
import traceback

fail_count = 0
for i in range(100):
    try:
        r = subprocess.run(['graalpy-24.2.2-windows-amd64/bin/graalpy.exe',
                            '--log.python.level=FINE', '--log.engine.level=FINE', '--log.launcher.level=FINE',
                            'test.py'],
                           capture_output=True,  # comment this line if you want to see the live counter from the subprocess
                           check=True,
                           text=True)
        print(f"Done with {i}")
        if r.stderr:
            with open(f'good_output{i}.txt', 'w') as f:
                f.write(r.stderr)
    except subprocess.CalledProcessError as e:
        fail_count += 1
        print(f"Failcount: {fail_count}")
        print(f"Subprocess Error: [{e}]")
        print("Return code:", e.returncode)
        if e.stderr:
            print(f"stderr written to error_output{i}.txt")
            with open(f'error_output{i}.txt', 'w') as f:
                f.write(e.stderr)
    # Catch-all for anything else; not expected to trigger in practice
    except Exception:
        print("Something else went wrong")
        traceback.print_exc()
print(f"Subprocess failed {fail_count} times")
```
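Since two failure modes have been seen so far (a run that dies with no error message, and the `IllegalStateException` above), it may help to tally them separately. A minimal sketch, assuming the launcher writes that exception message to stderr and that stdout is captured too (it is, via `capture_output=True`); `classify_run` and its category names are hypothetical. Note that with `FINE` logging enabled, stderr is typically non-empty even on clean runs, so the check relies on `test Done` in stdout rather than on an empty stderr.

```python
from typing import Optional

# Message taken from the IllegalStateException in the stack trace above.
POLYGLOT_THREADS_MSG = "The language did not complete all polyglot threads"

def classify_run(returncode: int, stdout: Optional[str], stderr: Optional[str]) -> str:
    """Hypothetical helper: bucket one test.py run by how it ended."""
    if returncode == 0 and "test Done" in (stdout or ""):
        return "ok"
    if POLYGLOT_THREADS_MSG in (stderr or ""):
        return "polyglot-threads-not-completed"
    # Anything else: nonzero exit (or missing final line) without the message above.
    return "crash-without-known-message"
```

In the driver, this could be called as `classify_run(e.returncode, e.stdout, e.stderr)` inside the `CalledProcessError` handler; `None` streams (when `capture_output` is commented out) are treated as empty.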

@mqbal is part of the team; I'm adding them so they get notified of further discussion.
