Grid tables with double-width characters crash rst2myst: “null bytes should be removed by now” (Docutils NUL \x00 leaks into tokens) #95

@coxxny

Describe the bug

context
When I run rst2myst on an .rst file (UTF-8) that contains a grid table with double-width characters (e.g., Japanese):

+-------+-------+
|  列A  |  列B  |
+-------+-------+
|  あい |  かき |
+-------+-------+

expectation
I expected the table to convert to MyST/Markdown (or fall back safely) without errors, just as Docutils' rst2html.py renders the same file correctly.

bug
Instead, rst2myst fails with a null-byte error. This is the output I get:

$ rst2myst convert test.rst
test.rst -> test.md
FAILED:
null bytes should be removed by now

FINISHED ALL! (extensions: [])

problem
This affects anyone converting RST docs that contain fullwidth text, such as multilingual projects using rst-to-myst, because the conversion fails on valid tables. Internally, Docutils pads each double-width character with a NUL (\x00) during table parsing so that string offsets line up with the table borders; that sentinel appears to leak into token text in the RST→MyST path and triggers the renderer’s null-byte error, breaking documentation builds.
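To illustrate the padding described above, here is a minimal standalone sketch. It is not Docutils' or rst-to-myst's actual code: `pad_double_width` and `strip_pad` are hypothetical helpers written for this issue, and the use of `\x00` as the sentinel is taken from the description above.

```python
import unicodedata

PAD = "\x00"  # sentinel assumed from the description above


def pad_double_width(line: str, pad: str = PAD) -> str:
    """Append `pad` after every East Asian wide or fullwidth character."""
    out = []
    for ch in line:
        out.append(ch)
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            out.append(pad)
    return "".join(out)


def strip_pad(text: str, pad: str = PAD) -> str:
    """Remove the sentinel again before the text is handed to a renderer."""
    return text.replace(pad, "")


border = "+-------+-------+"
row = "|  あい |  かき |"

padded = pad_double_width(row)
# After padding, the row has as many characters as the border line,
# so column boundaries can be located by plain string index.
print(len(row), len(padded), len(border))
print(PAD in padded, strip_pad(padded) == row)
```

If the sentinel is not stripped from cell text before tokens are built, a check like the renderer's "null bytes should be removed by now" would be expected to fire. A possible fix direction, stated as a guess, is to apply something like `strip_pad` to table cell text in the RST→MyST path.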

Reproduce the bug

  1. Create test.rst (UTF-8, no BOM) with a grid table that includes double-width characters.
  2. Run rst2myst convert test.rst.
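Step 1 can be scripted so the file is guaranteed to be UTF-8 without a BOM. This is only a convenience sketch; `write_repro` is a hypothetical helper name, and the table is the one from the description above.

```python
from pathlib import Path

# The grid table from the bug description.
TABLE = (
    "+-------+-------+\n"
    "|  列A  |  列B  |\n"
    "+-------+-------+\n"
    "|  あい |  かき |\n"
    "+-------+-------+\n"
)


def write_repro(path: str = "test.rst") -> Path:
    """Write the table as UTF-8; the "utf-8" codec emits no BOM (unlike "utf-8-sig")."""
    target = Path(path)
    with open(target, "w", encoding="utf-8", newline="\n") as fh:
        fh.write(TABLE)
    return target


if __name__ == "__main__":
    print(write_repro())
```

After running it, step 2 is unchanged: rst2myst convert test.rst.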

List your environment

No response


Labels: bug (Something isn't working)
