The performance of DOM Parser and Schema-Based Parser. #52

@ZhaiMo15

I've recently been testing the performance of Simdjson. The basic test is similar to the default benchmark and uses twitter.json, as below:

    @Benchmark
    public int recordSimdjson() {
        Set<String> defaultUsers = new HashSet<>();
        TwitterRecord twitter = simdJsonParser.parse(buffer, buffer.length, TwitterRecord.class);
        for (StatusRecord status : twitter.statuses()) {
            UserRecord user = status.user();
            if (user.default_profile()) {
                defaultUsers.add(user.screen_name());
            }
        }
        return defaultUsers.size();
    }

    @Benchmark
    public int JsonValueSimdjson() {
        JsonValue simdJsonValue = simdJsonParser.parse(buffer, buffer.length);
        Set<String> defaultUsers = new HashSet<>();
        Iterator<JsonValue> tweets = simdJsonValue.get("statuses").arrayIterator();
        while (tweets.hasNext()) {
            JsonValue tweet = tweets.next();
            JsonValue user = tweet.get("user");
            if (user.get("default_profile").asBoolean()) {
                defaultUsers.add(user.get("screen_name").asString());
            }
        }
        return defaultUsers.size();
    }

    @Benchmark
    public int recordJackson() throws IOException {
        Set<String> defaultUsers = new HashSet<>();
        TwitterRecord twitter = objectMapper.readValue(buffer, TwitterRecord.class);
        for (StatusRecord status : twitter.statuses()) {
            UserRecord user = status.user();
            if (user.default_profile()) {
                defaultUsers.add(user.screen_name());
            }
        }
        return defaultUsers.size();
    }

    record UserRecord(boolean default_profile, String screen_name) {
    }

    record StatusRecord(UserRecord user) {
    }

    record TwitterRecord(List<StatusRecord> statuses) {
    }

The difference is that I shrank the statuses array. The default size is 101; I tested sizes 101, 51, and 1. The results are below:
size 101:
image

size 51:
image

size 1:
image
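
For anyone who wants to reproduce the smaller inputs, here is a minimal sketch of one way to cut the statuses array down to a given number of elements using only the JDK. It is not necessarily how the files above were produced. The class name `StatusesTrimmer` and the method `trimArray` are hypothetical, and the sketch assumes the first occurrence of the quoted key in the text is the real key and that its value is an array.

```java
final class StatusesTrimmer {

    /**
     * Returns json with the array stored under key cut down to at most keep elements.
     * Assumptions: the first occurrence of "key" is the actual key (not text inside
     * another string), its value is an array, and keep is at least 1.
     */
    static String trimArray(String json, String key, int keep) {
        if (keep < 1) {
            throw new IllegalArgumentException("keep must be >= 1");
        }
        int keyPos = json.indexOf("\"" + key + "\"");
        if (keyPos < 0) {
            throw new IllegalArgumentException("key not found: " + key);
        }
        int open = json.indexOf('[', keyPos);
        if (open < 0) {
            throw new IllegalArgumentException("no array after key: " + key);
        }
        int depth = 0;            // nesting depth, 1 means directly inside the target array
        boolean inString = false; // true while scanning a JSON string literal
        int separators = 0;       // element separators seen at depth 1
        int cut = -1;             // index of the comma that follows the last kept element
        for (int i = open; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;          // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
                continue;
            }
            switch (c) {
                case '"' -> inString = true;
                case '[', '{' -> depth++;
                case ']', '}' -> {
                    depth--;
                    if (depth == 0) {
                        // Closing bracket of the target array.
                        if (cut < 0) {
                            return json; // already keep elements or fewer
                        }
                        return json.substring(0, cut) + json.substring(i);
                    }
                }
                case ',' -> {
                    if (depth == 1 && cut < 0 && ++separators == keep) {
                        cut = i;
                    }
                }
                default -> { }
            }
        }
        throw new IllegalArgumentException("unterminated array for key: " + key);
    }

    public static void main(String[] args) {
        String sample = "{\"statuses\":[{\"id\":1},{\"id\":2},{\"id\":3}],\"n\":3}";
        System.out.println(trimArray(sample, "statuses", 1));
    }
}
```

The trimmed text can then be converted to bytes and passed to the parsers as `buffer`, in the same way as the original twitter.json.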

I also changed the depth of the fields being read. The default test reads at depth 3 (statuses, user, field); I changed it to depth 2 (statuses, field), as below:

    @Benchmark
    public int recordSimdjson() {
        Set<Object> defaultUsers = new HashSet<>();
        TwitterRecord twitter = simdJsonParser.parse(buffer, buffer.length, TwitterRecord.class);
        for (StatusRecord status : twitter.statuses()) {
            long id = status.id();
            String text = status.text();
            defaultUsers.add(id);
            defaultUsers.add(text);
        }
        return defaultUsers.size();
    }

    @Benchmark
    public int JsonValueSimdjson() {
        JsonValue simdJsonValue = simdJsonParser.parse(buffer, buffer.length);
        Set<Object> defaultUsers = new HashSet<>();
        Iterator<JsonValue> tweets = simdJsonValue.get("statuses").arrayIterator();
        while (tweets.hasNext()) {
            JsonValue tweet = tweets.next();
            JsonValue id = tweet.get("id");
            JsonValue text = tweet.get("text");
            defaultUsers.add(id.asLong());
            defaultUsers.add(text.asString());
        }
        return defaultUsers.size();
    }

    @Benchmark
    public int recordJackson() throws IOException {
        Set<Object> defaultUsers = new HashSet<>();
        TwitterRecord twitter = objectMapper.readValue(buffer, TwitterRecord.class);
        for (StatusRecord status : twitter.statuses()) {
            long id = status.id();
            String text = status.text();
            defaultUsers.add(id);
            defaultUsers.add(text);
        }
        return defaultUsers.size();
    }

    record StatusRecord(long id, String text) {
    }

    record TwitterRecord(List<StatusRecord> statuses) {
    }

The results are:
size 101:
image

size 51:
image

size 1:
image

Here are my questions:

  1. Is Simdjson not always faster than Jackson? Does Simdjson perform worse the shorter the JSON is? If my JSON is short, should I avoid Simdjson?
  2. DOM parser vs. schema-based parser: does their relative performance also depend on the size of the JSON? My initial expectation was that the schema-based parser would be faster.
