Skip to content

Conversation

@gfngfn
Copy link

@gfngfn gfngfn commented Oct 22, 2018

This PR proposes an extension that enables us to handle Yojson data equipped with code positions the data come from. It adds

  • a submodule Yojson.SafePos and
  • a type Yojson.position for code positions,

and thereby maintains backward compatibility with the latest version (i.e. yojson 1.4.1). It probably be useful in situations like the following: where one expects the value corresponding to a certain name (such as "title") in objects to be a specific kind of data (such as string) but some data in a given (possibly handwritten) Yojson do not meet the expectation. In such case, one can report the code position where the unexpected data occur by using Yojson.SafePos in order to ask for correction in an easy-to-understand manner.

For an example of usage, see examples/filtering_pos.ml (which can be run by invoking the updated run-examples.sh).

@mjambon
Copy link
Member

mjambon commented Nov 19, 2018

Hi! Thanks for looking into this.

It seems that keeping track of locations would be useful to people who use yojson directly i.e. not from atdgen and don't have strict performance requirements. I imagine this fits the scenarios where json is used as a simple config file. Note that there's a bit of tension between achieving the best performance when working with large amounts of machine-generated json data and with human-written json which benefits from better error reporting.

A thing I never particularly like about yojson is that we have already 3 modules to pick from (Basic, Safe, and Raw) and that they're generated by cppo. Are you confident that this fourth module is the right one? Is there anything else we should add to it or remove while we're at it?

Other than that, my concern is about the performance loss introduced by creating and discarding location objects. Did you run performance benchmarks? I haven't done that recently, but based on results I got in the past, it would be significantly faster (10% or more) to call get_raw_position only if the position is not discarded. Can you try getting rid of these calls by using macros or preprocessor conditionals?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants