
Student Proposal: Integrating performance improvements from the Elm compiler into elm-format #13

@emmabastas


Integrating performance improvements from the Elm compiler into elm-format

Edit1: Added section Relevant parsec API
Edit2: Updated section Benefits and Requirements to reflect some of the points of feedback
Edit3: Updated Timeline after feedback.
Edit4: Final edit before submitting proposal.

Name: Emma Bastås
Name in Elm Slack: @emmabastas
Email: [email protected]

Summary

Elm's de-facto standard source code formatter, elm-format, is based on the parsing code of the Elm 0.15 compiler. Since Elm 0.19, the compiler's parsing code has been rewritten to eliminate the dependency on parsec and indents and to greatly improve performance. However, elm-format has diverged from the compiler to the point where integrating this rewrite is nontrivial. The end goal of this project is to integrate the performance improvements from the 0.19 compiler into elm-format.

How will I achieve this

  1. Replace elm-format's dependency on parsec with an adapter layer that implements the relevant parsec API on top of the compiler's new parser API.
  2. Incrementally migrate elm-format to use the new parser API directly instead of via the adapter layer. This step does not need to be fully completed.
  3. Benchmark elm-format before and after the change. The compiler rewrite already gives strong evidence that performance will improve, and this project has other benefits regardless, so benchmarking is not essential and can be considered optional.
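As a rough illustration of step 1, the sketch below assumes a continuation-passing parser with four continuations (consumed-ok, empty-ok, consumed-error, empty-error), which is the general style of the compiler's Primitives.hs as I understand it. The `State` and `Parser` types here are simplified, hypothetical stand-ins, not the compiler's real definitions. The point is that parsec's `try` and `<|>` can be provided on top of such a parser just by re-wiring continuations.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Hypothetical, simplified state: remaining input and a 1-based position.
data State = State { input :: String, row :: !Int, col :: !Int }

-- A continuation-passing parser with four continuations. This is a stand-in
-- for the compiler's parser type, not its real definition.
newtype Parser a = Parser
  { unParser
      :: forall b.
         State
      -> (a -> State -> b)  -- consumed input, succeeded
      -> (a -> State -> b)  -- consumed nothing, succeeded
      -> (Int -> Int -> b)  -- consumed input, failed at row/col
      -> (Int -> Int -> b)  -- consumed nothing, failed at row/col
      -> b
  }

instance Functor Parser where
  fmap f (Parser p) = Parser (\s cok eok cerr eerr ->
    p s (cok . f) (eok . f) cerr eerr)

instance Applicative Parser where
  pure x = Parser (\s _ eok _ _ -> eok x s)
  pf <*> px = pf >>= \f -> fmap f px

instance Monad Parser where
  Parser p >>= k = Parser (\s cok eok cerr eerr ->
    let -- once the first parser has consumed input, so has the whole sequence
        cok1 a s1 = unParser (k a) s1 cok cok cerr cerr
        eok1 a s1 = unParser (k a) s1 cok eok cerr eerr
    in p s cok1 eok1 cerr eerr)

-- Primitive: consume one character that satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser (\(State inp r c) cok _ _ eerr ->
  case inp of
    ch : rest | ok ch ->
      let (r', c') = if ch == '\n' then (r + 1, 1) else (r, c + 1)
      in cok ch (State rest r' c')
    _ -> eerr r c)

char :: Char -> Parser Char
char x = satisfy (== x)

-- A partial match fails after having consumed input.
string :: String -> Parser String
string = mapM char

-- Adapter for parsec's `try`: a failure after consuming input is reported as
-- an empty failure, so an enclosing `<|>` will still try its alternative.
try :: Parser a -> Parser a
try (Parser p) = Parser (\s cok eok _ eerr -> p s cok eok eerr eerr)

infixr 1 <|>

-- Adapter for parsec's `<|>`: run the right parser only if the left one
-- failed without consuming input (errors are not merged in this sketch).
(<|>) :: Parser a -> Parser a -> Parser a
Parser p <|> Parser q = Parser (\s cok eok cerr eerr ->
  p s cok eok cerr (\_ _ -> q s cok eok cerr eerr))

-- Run a parser from the start of the input, returning the failure position.
runParser :: Parser a -> String -> Either (Int, Int) a
runParser (Parser p) src = p (State src 1 1) ok ok err err
  where
    ok a _ = Right a
    err r c = Left (r, c)
```

With this sketch, `runParser (string "let" <|> string "lex") "lex"` fails at row 1, column 3 because the first branch consumed input before failing, while `runParser (try (string "let") <|> string "lex") "lex"` returns `Right "lex"`, which is the backtracking behaviour parsec's `try` exists for.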

What will the project focus on

Integrating the performance improvements from the Elm compiler into elm-format. We should be confident that no new bugs are introduced and the code should be relatively clean and maintainable.

Benefits

elm-format is an integral part of the Elm experience and is used almost universally within the community. This widespread adoption has a major benefit: how Elm code should be formatted becomes a non-issue. For elm-format to stay this widely used, it has to be strictly better than not using a formatter, and performance is part of that; no one should ever have to consider dropping elm-format over performance concerns. This project will help with that. Another benefit is that having elm-format and the compiler share as much of their parsing code as is reasonable is good for maintainability, since future changes to the compiler's parsing logic will be easier to integrate.

Timeline

Week 20-22 - Community bonding

Get to know the mentor and other relevant community members. Figure out how we would like to hold our meetings (I would prefer regular meetings at fixed dates and times): how often, how long, what to discuss, and what we expect of one another. Refine goals, focus and timeline. Discuss procedures for submitting code, testing, style, etc. Most importantly: have a good time, build trust and lay the foundation for a healthy, frustration-free mentor-student relationship.

Week 23 & 24

During week 23 my university holds its semester evaluations, which usually makes it a very intense week. I have therefore planned to do less GSoC work that week. Any lost time will of course be made up during the subsequent weeks.

Get familiar with how elm-format and the Elm compiler do their parsing. Develop an intuitive understanding of parsec/indents (Text.Parsec.Prim and Text.Parsec.Indent strike me as the most important modules) and of the Elm compiler's Primitives.hs, including how they differ, and write this down in a document. Look at the relevant parsec API in more detail: are there declarations from parsec that look particularly difficult to implement? Detail these in the document.

By the end of these two weeks I will have produced a document detailing the key differences between parsec and Primitives.hs, as well as the important or problematic parsec functions. The mentor and I agree on a rough order and manner in which to implement the adapter layer, and I assign subgoals to the week 25-28 block. I will also have written a skeleton for the adapter module(s), with all the declarations needed for elm-format to compile without parsec and indents, and with every function body being error "todo".
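A skeleton of this kind might look roughly like the following. The placeholder `ParsecT` representation is hypothetical; the names and type signatures are modelled on parsec's so that code written against parsec would type-check unchanged, and only a few of the declarations from the list under Relevant parsec API are shown.

```haskell
{-# LANGUAGE KindSignatures #-}

import Data.Kind (Type)

-- Placeholder for the adapter's parser type. The type parameters mirror
-- parsec's `ParsecT s u m a`; the real representation would wrap the
-- compiler's parser instead of ().
newtype ParsecT s u (m :: Type -> Type) a = ParsecT ()

-- Text.Parsec.Pos
type SourceName = String

data SourcePos = SourcePos SourceName Int Int
  deriving (Eq, Show)

sourceLine :: SourcePos -> Int
sourceLine (SourcePos _ line _) = line

sourceColumn :: SourcePos -> Int
sourceColumn (SourcePos _ _ column) = column

-- Text.Parsec.Prim: signatures modelled on parsec, bodies left as stubs.
infix 0 <?>
infixr 1 <|>

(<?>) :: ParsecT s u m a -> String -> ParsecT s u m a
(<?>) = error "todo"

(<|>) :: ParsecT s u m a -> ParsecT s u m a -> ParsecT s u m a
(<|>) = error "todo"

try :: ParsecT s u m a -> ParsecT s u m a
try = error "todo"

many :: ParsecT s u m a -> ParsecT s u m [a]
many = error "todo"

getPosition :: Monad m => ParsecT s u m SourcePos
getPosition = error "todo"

getState :: Monad m => ParsecT s u m u
getState = error "todo"

updateState :: Monad m => (u -> u) -> ParsecT s u m ()
updateState = error "todo"
```

Because the placeholder keeps parsec's four type parameters, signatures written against `ParsecT s u m a` would not need to change while the stub bodies are being filled in.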

Week 25-28

Implement the adapter layer on top of the compiler's parsing code, i.e. replace all the error "todo" bodies with suitable implementations. At the end of this block, all tests pass and we are confident that elm-format behaves the same as before.
This block will be split further during week 24, once more is known about the work required.
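Beyond the existing test suite, one way to build confidence that elm-format behaves the same is differential testing: run the old parsec-based parser and the new adapter-based parser on the same inputs and collect every input on which they disagree. The sketch below is generic; `differential` and `agree` are hypothetical helpers, and the two parsers passed in would be elm-format's real entry points. It treats any two failures as agreeing, on the assumption that error messages may differ while successful parse results must be identical.

```haskell
-- Two results agree if both parses succeed with equal values, or if both
-- fail (error messages are allowed to differ under this assumption).
agree :: Eq a => Either e a -> Either e a -> Bool
agree (Right x) (Right y) = x == y
agree (Left _) (Left _) = True
agree _ _ = False

-- Returns every input on which the two parsers disagree, with both results shown.
differential
  :: (Show e, Show a, Eq a)
  => (String -> Either e a)  -- reference parser (e.g. the parsec-based one)
  -> (String -> Either e a)  -- candidate parser (e.g. the adapter-based one)
  -> [String]
  -> [(String, String, String)]
differential old new inputs =
  [ (src, show r1, show r2)
  | src <- inputs
  , let r1 = old src
        r2 = new src
  , not (agree r1 r2)
  ]
```

Any non-empty result points at a concrete input to turn into a regression test.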

Week 29 & 30

Incrementally migrate elm-format to use the new parsing API directly instead of via the adapter layer. This migration doesn't necessarily have to be completed fully.

Week 31

This week is a buffer for schedule overruns.
If there are no overruns, this week can be used for various things: refactoring code, performing a simple benchmark before and after this project, or continuing the work done during weeks 29 & 30.

Week X

Reserved for a potential vacation, date not yet decided. During this week I would not be able to do any work or be contacted (no internet). I plan to have the date set before the community bonding period starts. The hours lost during this week would be made up across the other weeks.

Goals

  • elm-format's parsers use the parser API from the compiler via the adapter layer instead of parsec and indents.
  • The dependency on parsec and indents can be removed.
  • No new bugs are introduced.
  • Code is relatively clean and maintainable.
  • (optional) The parsers use the new parser API directly instead of via the adapter layer, and the adapter layer can be removed.

Requirements

GHC, Cabal and other Haskell-related tools.

Relevant parsec API

Here is a list of all the declarations from parsec and indents that elm-format uses. Instance declarations are not included.

Text.Parsec.Pos

  • SourceName
  • SourcePos
  • sourceLine
  • sourceColumn

Text.Parsec.Error

  • Message
  • ParseError
  • errorPos
  • errorMessages
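Since elm-format reads positions and messages out of parsec's `ParseError` (via `errorPos` and `errorMessages`), the adapter has to build such values from whatever the underlying parser reports. The sketch below assumes, hypothetically, that a failure arrives as a row, a column and a list of expected labels (`fromRowCol` is an illustrative name). The `Message` constructors mirror parsec's, and `mergeError` is a simplified version of parsec's rule for combining the errors of two failed alternatives: prefer the error that got further into the input, and concatenate the messages when both failed at the same position.

```haskell
type SourceName = String

-- Line and column are 1-based, as in parsec.
data SourcePos = SourcePos SourceName Int Int
  deriving (Eq, Ord, Show)

sourceLine, sourceColumn :: SourcePos -> Int
sourceLine (SourcePos _ l _) = l
sourceColumn (SourcePos _ _ c) = c

-- Same constructor names as parsec's Message type.
data Message
  = SysUnExpect String
  | UnExpect String
  | Expect String
  | Message String
  deriving (Eq, Show)

data ParseError = ParseError SourcePos [Message]
  deriving (Eq, Show)

errorPos :: ParseError -> SourcePos
errorPos (ParseError p _) = p

errorMessages :: ParseError -> [Message]
errorMessages (ParseError _ msgs) = msgs

-- Hypothetical bridge from a row/column failure to a parsec-style error.
fromRowCol :: SourceName -> Int -> Int -> [String] -> ParseError
fromRowCol name r c expected =
  ParseError (SourcePos name r c) (map Expect expected)

-- Keep the error that got further; concatenate messages on a tie.
mergeError :: ParseError -> ParseError -> ParseError
mergeError e1@(ParseError p1 m1) e2@(ParseError p2 m2) =
  case compare p1 p2 of
    EQ -> ParseError p1 (m1 ++ m2)
    GT -> e1
    LT -> e2
```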

Text.Parsec.Prim

  • ParsecT
  • Stream
  • (<?>)
  • (<|>)
  • lookAhead
  • try
  • many
  • skipMany
  • runParserT
  • getPosition
  • getInput
  • setInput
  • getState
  • updateState
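The state-related functions in this list (`getState`, `updateState`, `getInput`, `setInput`, `getPosition`) need the adapter to carry more than a character stream. Assuming the underlying parser does not track parsec's user state itself, one option is to keep it in the adapter's own state record, as in this simplified sketch. The `State` and `Parser` types are hypothetical, failures carry no position, and `getPosition` returns a plain line/column pair instead of a `SourcePos`.

```haskell
-- Hypothetical adapter state: remaining input, current line and column, and
-- the parsec-style user state `u`, which the adapter keeps itself.
data State u = State
  { stInput :: String
  , stLine :: Int
  , stColumn :: Int
  , stUser :: u
  }

-- Simplified parser: Nothing means failure.
newtype Parser u a = Parser { runP :: State u -> Maybe (a, State u) }

instance Functor (Parser u) where
  fmap f (Parser p) = Parser (\s -> fmap (\(a, s') -> (f a, s')) (p s))

instance Applicative (Parser u) where
  pure x = Parser (\s -> Just (x, s))
  pf <*> px = pf >>= \f -> fmap f px

instance Monad (Parser u) where
  Parser p >>= k = Parser (\s -> p s >>= \(a, s') -> runP (k a) s')

getState :: Parser u u
getState = Parser (\s -> Just (stUser s, s))

updateState :: (u -> u) -> Parser u ()
updateState f = Parser (\s -> Just ((), s { stUser = f (stUser s) }))

getInput :: Parser u String
getInput = Parser (\s -> Just (stInput s, s))

-- Replaces the remaining input but, as in parsec, leaves the position alone.
setInput :: String -> Parser u ()
setInput inp = Parser (\s -> Just ((), s { stInput = inp }))

-- Returns (line, column) instead of a parsec SourcePos, for brevity.
getPosition :: Parser u (Int, Int)
getPosition = Parser (\s -> Just ((stLine s, stColumn s), s))

anyChar :: Parser u Char
anyChar = Parser $ \s -> case stInput s of
  [] -> Nothing
  c : rest ->
    let s'
          | c == '\n' = s { stLine = stLine s + 1, stColumn = 1 }
          | otherwise = s { stColumn = stColumn s + 1 }
    in Just (c, s' { stInput = rest })

runParser :: Parser u a -> u -> String -> Maybe a
runParser p u src = fst <$> runP p (State src 1 1 u)
```

For example, `anyChar >> updateState (+ 1)` run twice on `"ab"` with an initial user state of `0` leaves the user state at `2` and the position at line 1, column 3.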

Text.Parsec.Combinator

  • choice
  • many1
  • manyTill
  • skipMany1
  • option
  • optionMaybe
  • anyToken
  • notFollowedBy
  • between
  • eof
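Most of Text.Parsec.Combinator does not need to touch the compiler's primitives at all: the functions can be written against a small core (`<|>`, `try`, a monadic interface and a failing parser), closely following their definitions in parsec (`notFollowedBy` is simplified to drop parsec's "unexpected" message). The sketch below illustrates this with a deliberately simple, hypothetical `Parser` type that only records whether input was consumed; error messages are omitted.

```haskell
-- Hypothetical simplified parser: the Bool in each reply records whether
-- input was consumed, which is what parsec's `<|>` and `try` depend on.
data Reply a = Ok a String Bool | Err Bool

newtype Parser a = Parser { runP :: String -> Reply a }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Ok a rest c -> Ok (f a) rest c
    Err c -> Err c

instance Applicative Parser where
  pure x = Parser (\s -> Ok x s False)
  pf <*> px = pf >>= \f -> fmap f px

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> case p s of
    Err c -> Err c
    Ok a rest c1 -> case runP (k a) rest of
      Ok b rest' c2 -> Ok b rest' (c1 || c2)
      Err c2 -> Err (c1 || c2)

-- The small core everything else is built from.
infixr 1 <|>

(<|>) :: Parser a -> Parser a -> Parser a
Parser p <|> Parser q = Parser $ \s -> case p s of
  Err False -> q s
  r -> r

try :: Parser a -> Parser a
try (Parser p) = Parser $ \s -> case p s of
  Err _ -> Err False
  r -> r

parserZero :: Parser a
parserZero = Parser (const (Err False))

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  c : rest | ok c -> Ok c rest True
  _ -> Err False

eof :: Parser ()
eof = Parser $ \s -> if null s then Ok () s False else Err False

run :: Parser a -> String -> Maybe a
run (Parser p) s = case p s of
  Ok a _ _ -> Just a
  Err _ -> Nothing

-- Combinators written only against the core. As in parsec, `many` must not be
-- given a parser that succeeds without consuming input (here it would loop).
many :: Parser a -> Parser [a]
many p = ((:) <$> p <*> many p) <|> pure []

many1 :: Parser a -> Parser [a]
many1 p = (:) <$> p <*> many p

skipMany :: Parser a -> Parser ()
skipMany p = () <$ many p

option :: a -> Parser a -> Parser a
option x p = p <|> pure x

optionMaybe :: Parser a -> Parser (Maybe a)
optionMaybe p = option Nothing (Just <$> p)

between :: Parser open -> Parser close -> Parser a -> Parser a
between open close p = open *> p <* close

choice :: [Parser a] -> Parser a
choice = foldr (<|>) parserZero

manyTill :: Parser a -> Parser end -> Parser [a]
manyTill p end = go
  where
    go = ([] <$ end) <|> ((:) <$> p <*> go)

-- Simplified: parsec additionally reports the unexpected token.
notFollowedBy :: Parser a -> Parser ()
notFollowedBy p = try ((try p >>= \_ -> parserZero) <|> pure ())
```

If this holds up, the adapter work concentrates on Text.Parsec.Prim, Text.Parsec.Char and Text.Parsec.Indent, with the combinators mostly carried over.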

Text.Parsec.Char

  • oneOf
  • space
  • upper
  • lower
  • alphaNum
  • letter
  • digit
  • hexDigit
  • octDigit
  • char
  • anyChar
  • satisfy
  • string
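Most of the Text.Parsec.Char functions in this list are thin wrappers around `satisfy` in parsec, so once `satisfy` exists in the adapter they follow directly. One detail worth preserving is position tracking: parsec's `updatePosChar` moves to the next line on `'\n'` and to the next tab stop (columns 1, 9, 17, ...) on `'\t'`, and an adapter that counts columns differently could change the columns that the indentation checks see. The `Parser` type below is a hypothetical minimal stand-in; `string` is omitted because it needs sequencing.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit, isHexDigit, isLower, isOctDigit, isSpace, isUpper)

-- Hypothetical minimal parser over (remaining input, (line, column)).
newtype Parser a = Parser
  { runP :: (String, (Int, Int)) -> Maybe (a, (String, (Int, Int))) }

-- Position update modelled on parsec's updatePosChar.
updatePosChar :: (Int, Int) -> Char -> (Int, Int)
updatePosChar (line, column) c = case c of
  '\n' -> (line + 1, 1)
  '\t' -> (line, column + 8 - ((column - 1) `mod` 8))
  _ -> (line, column + 1)

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \(inp, pos) -> case inp of
  c : rest | ok c -> Just (c, (rest, updatePosChar pos c))
  _ -> Nothing

oneOf :: [Char] -> Parser Char
oneOf cs = satisfy (`elem` cs)

char :: Char -> Parser Char
char c = satisfy (== c)

space, upper, lower, alphaNum, letter, digit, hexDigit, octDigit, anyChar :: Parser Char
space = satisfy isSpace
upper = satisfy isUpper
lower = satisfy isLower
alphaNum = satisfy isAlphaNum
letter = satisfy isAlpha
digit = satisfy isDigit
hexDigit = satisfy isHexDigit
octDigit = satisfy isOctDigit
anyChar = satisfy (const True)
```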

Text.Parsec.Indent

  • IndentParser
  • runIndent
  • block
  • indented
  • checkIndent
  • withPos
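The indents functions listed above revolve around a reference position stored alongside the parser state. The sketch below is a simplified model of their behaviour as I understand it, not the indents source (error messages and other details are omitted). `State`, `Parser` and `run` are hypothetical; the point is that the adapter only needs the current position plus one saved reference position to support `withPos`, `indented`, `checkIndent` and `block`.

```haskell
-- Hypothetical state: remaining input, the current (line, column), and the
-- reference position that the indentation checks compare against.
data State = State
  { input :: String
  , pos :: (Int, Int)
  , ref :: (Int, Int)
  }

newtype Parser a = Parser { runP :: State -> Maybe (a, State) }

instance Functor Parser where
  fmap f (Parser p) = Parser (\s -> fmap (\(a, s') -> (f a, s')) (p s))

instance Applicative Parser where
  pure x = Parser (\s -> Just (x, s))
  pf <*> px = pf >>= \f -> fmap f px

instance Monad Parser where
  Parser p >>= k = Parser (\s -> p s >>= \(a, s') -> runP (k a) s')

-- Backtracking choice; consumption is not tracked in this sketch.
(<|>) :: Parser a -> Parser a -> Parser a
Parser p <|> Parser q = Parser (\s -> maybe (q s) Just (p s))

many, many1 :: Parser a -> Parser [a]
many p = many1 p <|> pure []
many1 p = (:) <$> p <*> many p

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case input s of
  c : rest | ok c ->
    let (line, column) = pos s
        pos' = if c == '\n' then (line + 1, 1) else (line, column + 1)
    in Just (c, s { input = rest, pos = pos' })
  _ -> Nothing

spaces :: Parser ()
spaces = () <$ many (satisfy (`elem` " \n"))

-- withPos: run p with the reference set to the current position, then
-- restore the previous reference.
withPos :: Parser a -> Parser a
withPos p = Parser $ \s -> case runP p (s { ref = pos s }) of
  Nothing -> Nothing
  Just (a, s') -> Just (a, s' { ref = ref s })

-- indented: succeed only if the current column is right of the reference column.
indented :: Parser ()
indented = Parser $ \s ->
  if snd (pos s) > snd (ref s) then Just ((), s) else Nothing

-- checkIndent: succeed only if the current column equals the reference column.
checkIndent :: Parser ()
checkIndent = Parser $ \s ->
  if snd (pos s) == snd (ref s) then Just ((), s) else Nothing

-- block: one or more items, each starting in the column where the block began.
block :: Parser a -> Parser [a]
block p = withPos (many1 (checkIndent *> p))

-- Start with both the position and the reference at (1, 1).
run :: Parser a -> String -> Maybe a
run p src = fst <$> runP p (State src (1, 1) (1, 1))
```

In this model, `block` over `"a\nb\nc"` collects all three items because each starts in column 1, while on `"a\n b"` the second item starts in column 2, so `checkIndent` fails and the block ends after `"a"`.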
