Skip to content

Conversation

@serso
Copy link

@serso serso commented Oct 17, 2025

Prior to this commit it was not possible to trace back the parsed node location in the original string. This information could be used to highlight the syntax of a math expression displayed in a rich editor (e.g. a spreadsheet or a calculator).

This commit is based on the PR #2796 that has been abandoned for 2 years. I rebased the task branch and fixed the issues reported in the original PR.

In the nutshell, each parsed node stores an array of sources (SourceMapping[]) that are set during parsing (see tokenSource and its usages for details). The node's constructor and clone method are adjusted to take an optional MetaOptions object containing the source mappings. In the future MetaOptions could be extended to store more information.

Benchmarks showed no change in evaluate. parse became slower, from 3.68µs to 5.17µs with the changes from this commit.

Closes #2795

Prior to this commit it was not possible to trace back the parsed node location
in the original string. This information could be used to highlight the syntax of
a math expression displayed in a rich editor (e.g. a spreadsheet or a calculator).

This commit is based on the PR josdejong#2796 that has been abandoned for 2 years.
I rebased the task branch and fixed the issues reported in the original PR.

In the nutshell, each parsed node stores an array of sources (`SourceMapping[]`)
that are set during parsing (see `tokenSource` and its usages for details).
The node's constructor and clone method are adjusted to take an optional `MetaOptions`
object containing the source mappings. In the future `MetaOptions` could be
extended to store more information.

Benchmarks showed no change in `evaluate`. `parse` became slower, from 3.68µs to 5.17µs
with the changes from this commit.

Closes josdejong#2795
serso added 2 commits October 20, 2025 12:16
In some cases it's important to parse data as quickly as possible.
As source tracing impacts the `parse` performance I've decided to add a new config
option that allows disabling the feature.

I set `traceSources` to `false` and ran `expression_parser.js` benchmark and got
the following result:
```
(plain js) evaluate                   0.04 µs   ±0.01%
(mathjs) evaluate                     0.25 µs   ±0.30%
(mathjs) parse, compile, evaluate     7.22 µs   ±0.79%
(mathjs) parse, compile               7.19 µs   ±5.51%
(mathjs) parse                        6.05 µs  ±12.34%
```

If `traceSources` is set to `true` I'm getting these times:
```
(plain js) evaluate                   0.04 µs   ±0.00%
(mathjs) evaluate                     0.25 µs   ±0.19%
(mathjs) parse, compile, evaluate     7.25 µs   ±0.59%
(mathjs) parse, compile               7.59 µs  ±10.22%
(mathjs) parse                        6.42 µs  ±18.91%
```

The control times (prior to the changes in this PR):
```
(plain js) evaluate                   0.04 µs   ±0.00%
(mathjs) evaluate                     0.25 µs   ±0.27%
(mathjs) parse, compile, evaluate     6.48 µs  ±15.36%
(mathjs) parse, compile               6.42 µs  ±14.74%
(mathjs) parse                        4.94 µs  ±13.23%
```
It looks like the hotspot was `tokenSource` method and the object
allocation followed by the method call: `{ sources: [tokenSource(state)] }`.
Instead of using `tokenSource` directly let's add a new method `tokenSourceMetaOptions`
and use it. This method returns a preallocated `defaultMetaOptions` if
source tracing is disabled.

`expression_parser.js` benchmark BEFORE this change:
```
(plain js) evaluate                   0.04 µs   ±0.00%
(mathjs) evaluate                     0.26 µs  ±31.65%
(mathjs) parse, compile, evaluate     6.23 µs  ±13.74%
(mathjs) parse, compile               6.01 µs  ±10.45%
(mathjs) parse                        5.31 µs  ±27.41%
```

AFTER this change:
```
(plain js) evaluate                   0.04 µs   ±0.00%
(mathjs) evaluate                     0.25 µs  ±15.79%
(mathjs) parse, compile, evaluate     6.00 µs   ±0.41%
(mathjs) parse, compile               5.89 µs   ±6.29%
(mathjs) parse                        4.66 µs   ±5.90%
```

BEFORE all the changes in the PR (control group):
```
(plain js) evaluate                   0.04 µs   ±0.00%
(mathjs) evaluate                     0.24 µs   ±0.09%
(mathjs) parse, compile, evaluate     4.89 µs   ±0.28%
(mathjs) parse, compile               4.97 µs  ±10.20%
(mathjs) parse                        3.61 µs   ±6.10%
```
@serso
Copy link
Author

serso commented Oct 22, 2025

@josdejong any chance you or someone else can look into this PR? Feedback is highly appreciated 🙏

@josdejong
Copy link
Owner

josdejong commented Oct 22, 2025

Hi Sergey, thanks a lot for picking this up! I think I don't have enough time today to review this (large) PR, I hope to look into it coming Friday.

Copy link
Owner

@josdejong josdejong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for picking this up Sergey! It's quite a job but it looks solid. it will be a very useful addition.

I made quite some inline comments (nothing very big though). Can you have a look at those? Thanks!

All nodes have the following methods:

- `clone() : Node`
- `clone(options: MetaOptions) : Node`
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The options argument is optional, and it is named meta in the source code (consistent with the constructor arguments of the Node classes). Can you document this like clone(meta?: MetaOptions) : Node?

And similarly, the argument meta is optional in the constructors of each of the Node classes, can you udpate that too in this documentation page?

Properties:

- `text: string` the token representing this node in the parsed string
- `index; number` the index of the token in the parsed string
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo, I think the semicolon ; sould be a colon :.

All is all well documented BTW 👍


const name = 'Node'
const dependencies = ['mathWithTransform']
export const defaultMetaOptions = {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to define a defaultMetaOptions. However, would it make sense to create a new object and array every time to prevent people from being able to mess up by mutating this shared/reused object? Maybe we can use Object.freeze here to solve this. (Thinking aloud here)

*/
constructor (object, index) {
super()
constructor (object, index, meta = defaultMetaOptions) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking aloud here: technically it is not needed to define a default value for meta in each of the Node classes since it will be set in the root Node.js class. What do you think? Would it be OK to leave meta as it is in these constructors and only set it in the constructor of Node.js? It would save a bit of overhead.

const name = 'Node'
const dependencies = ['mathWithTransform']
export const defaultMetaOptions = {
sources: []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about what would best communicate when sources are not available: an empty array like it is now? Or setting sources: null for example? In usage, it may be easier if you can know for sure that sources is always an array so you don't have to check whether it exists, so an empty array can make sense. But it can also be misleading and giving the impression that sources are available. Any thoughts?

assert.deepStrictEqual(parsedNaN.sources, [{ index: 0, text: 'NaN' }])
})

it('adds sources for function calls', function () {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for defining a function (uses the FunctionAssignmentNode)?

})

it('adds sources for function calls', function () {
const parsed = math.parse('foo(1, 2)')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for a function call with more than two arguments?

assert.deepStrictEqual(doubleQuote.sources, doubleSources)
})

it('adds sources for objects', function () {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for an empty object?

* {string} randomSeed
* Random seed for seeded pseudo random number generator.
* Set to null to randomly seed.
* {boolean} traceSources
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you document this new configuration option traceSources on the page docs/core/configuration.md?

}

// Maps a parsed node back to its place in the original source string
export interface SourceMapping {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the type definitions of the constructors of all Node classes and also their .clone method describing the new meta option?

@gwhitney
Copy link
Collaborator

Does this also supersede #2615 ?

@josdejong
Copy link
Owner

Does this also supersede #2615 ?

Ah, yes. I'll close that PR in favor of this one. Thanks for pointing this out.

@josdejong
Copy link
Owner

@serso I just merged #3547 adding support for optional chaining. This causes a few merge conflicts with this PR. Please let me know if you need help resolving them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants