Description
I haven't done any benchmarking yet, but it seems likely that the current algorithm will be fairly slow. It can probably be made much faster if needed.
For the "plain text" of the document, PATTERN does a regex match one character at a time via the final (.) pattern. (And for each regex match there is a bunch of type-unstable code that executes.) One simple improvement would be to update the regex so that it can match long runs of plain text in a single match.
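To illustrate the idea, here is a rough sketch. The two regexes below are hypothetical stand-ins, not the package's actual PATTERN, and `nmatches` is a helper name made up for this example. The only difference between them is an extra alternative that consumes a whole run of ordinary characters before falling back to (.).

```julia
# Hypothetical simplified tokenizers; the real PATTERN has more alternatives.
# SLOW: plain text falls through to the final (.) and matches one character at a time.
const SLOW = r"(\\[a-z]+)|([{}])|(.)"s
# FAST: a run of characters that are not a backslash or brace matches all at once;
# (.) stays as a last-resort fallback (for example, for a stray backslash).
const FAST = r"(\\[a-z]+)|([{}])|([^\\{}]+)|(.)"s

# Number of regex matches needed to consume the whole input.
nmatches(re, s) = length(collect(eachmatch(re, s)))

doc = raw"{\b Hello, world}"

slow_n = nmatches(SLOW, doc)
fast_n = nmatches(FAST, doc)

# Both tokenizations cover exactly the same text.
slow_text = join(m.match for m in eachmatch(SLOW, doc))
fast_text = join(m.match for m in eachmatch(FAST, doc))

println("SLOW: ", slow_n, " matches, FAST: ", fast_n, " matches")
```

Fewer matches also means fewer trips through the per-match (type-unstable) handling code, which is probably where most of the gain would come from. I haven't measured this.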
There might be other ways to improve performance. Relying on StringEncodings/iconv for translating a few bytes at a time (we have to flush before every print because Unicode and Windows-codepage encodings are intermixed) is surely inefficient, and also type-unstable because the code page is in the type of the encoder stream. But I don't really want to implement a Julia-native code-page conversion routine myself.
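One possible direction, short of writing a native converter, is to buffer adjacent byte chunks that share a code page and decode each run with a single call. The sketch below only shows the batching step and assumes the input can be represented as `codepage => bytes` pairs, which may not match the current code. `batch_by_codepage`, `decode_all`, and `latin1` are hypothetical names; `latin1` is a stand-in decoder for the demo (Latin-1 bytes map directly to Unicode code points), not a StringEncodings call.

```julia
# Merge adjacent chunks with the same code page so the decoder runs
# once per run instead of once per chunk.
function batch_by_codepage(chunks::Vector{Pair{Int,Vector{UInt8}}})
    batches = Pair{Int,Vector{UInt8}}[]
    for (cp, bytes) in chunks
        if !isempty(batches) && batches[end].first == cp
            append!(batches[end].second, bytes)   # extend the current run
        else
            push!(batches, cp => copy(bytes))     # start a new run
        end
    end
    return batches
end

# Passing the decoder as an argument acts as a function barrier: the
# decoder is compiled for concrete argument types once it is called.
decode_all(decoder, batches) = join(decoder(cp, bytes) for (cp, bytes) in batches)

# Stand-in decoder for the demo; it ignores the code page and treats bytes as Latin-1.
latin1(cp, bytes) = String(Char.(bytes))

chunks = [1252 => UInt8[0x48, 0x69], 1252 => UInt8[0x21],
          1251 => UInt8[0x3f], 1252 => UInt8[0x2e]]

batches = batch_by_codepage(chunks)
text = decode_all(latin1, batches)
println(length(chunks), " chunks -> ", length(batches), " decoder calls: ", text)
```

Whether this helps depends on how often code-page text really arrives in consecutive pieces; if Unicode escapes alternate with single code-page bytes, the batches stay small and little is gained.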