From 60c9950695de58a15b0b53f73de5fe563c8e39d3 Mon Sep 17 00:00:00 2001
From: Nikita Popov
This document is meant to be a sort of "big TODO list" for LLVM. Each -project in this document is something that would be useful for LLVM to have, and -would also be a great way to get familiar with the system. Some of these -projects are small and self-contained, which may be implemented in a couple of -days, others are larger. Several of these projects may lead to interesting -research projects in their own right. In any case, we welcome all -contributions.
- -If you are thinking about tackling one of these projects, please send a mail -to the LLVM -Developer's mailing list, so that we know the project is being worked on. -Additionally this is a good way to get more information about a specific project -or to suggest other projects to add to this page. -
- -The projects in this page are open-ended. More specific projects are -filed as unassigned enhancements in the -LLVM bug tracker. See the -list of currently outstanding issues -if you wish to help improve LLVM.
- -In addition to hacking on the main LLVM project, LLVM has several subprojects, - including Clang and others. If you are interested in working on these, please - see their "Open projects" page:
- -Improvements to the current infrastructure are always very welcome and tend -to be fairly straightforward to implement. Here are some of the key areas that -can use improvement...
- -Currently, both Clang and LLVM have a separate target description infrastructure, -with some features duplicated, others "shared" (in the sense that Clang has to create -a full LLVM target description to query specific information).
- -This separation has grown in parallel, since in the beginning they were quite -different and served disparate purposes. But as the compiler evolved, more and -more features had to be shared between the two so that the compiler would behave -properly. An example is when targets have default features on specific configurations -that don't have flags for them. If the back-end has a different "default" behaviour -than the front-end and the latter has no way of enforcing behaviour, it -won't work.
- -An alternative would be to create flags for all little quirks, but first, Clang -is not the only front-end or tool that uses LLVM's middle/back ends, and second, -that's what "default behaviour" is there for, so we'd be missing the point.
- -Several ideas have been floating around to fix the Clang driver WRT recognizing -architectures, features and so on (table-gen it, user-specific configuration files, -etc) but none of them touch the critical issue: sharing that information with the -back-end.
- -Recently, the idea to factor out the target description infrastructure from -both Clang and LLVM into its own library that both use, has been floating around. -This would make sure that all defaults, flags and behaviour are shared, but would -also reduce the complexity (and thus the cost of maintenance) a lot. That would -also allow all tools (lli, llc, lld, lldb, etc) to have the same behaviour -across the board.
- -The main challenges are:
- --The LLVM bug tracker occasionally -has "code-cleanup" bugs filed in it. -Taking one of these and fixing it is a good way to get your feet wet in the -LLVM code and discover how some of its components work. Some of these include -some major IR redesign work, which is high-impact because it can simplify a lot -of things in the optimizer. -
- --Some specific ones that would be great to have: - -
Additionally, there are performance problems in LLVM that need to be -fixed. These are marked with the slow-compile keyword. Use - -this LLVM bug tracker query -to find them.
- --The llvm-test testsuite is -a large collection of programs we use for nightly testing of generated code -performance, compile times, correctness, etc. Having a large testsuite gives -us a lot of coverage of programs and enables us to spot and improve any -problem areas in the compiler.
- --One extremely useful task, which does not require in-depth knowledge of -compilers, would be to extend our testsuite to include new programs and benchmarks. -In particular, we are interested in cpu-intensive programs that have few -library dependencies, produce some output that can be used for correctness -testing, and that are redistributable in source form. Many different programs -are suitable, for example, see this list for some -potential candidates. -
- -We are always looking for new testcases and benchmarks for use with LLVM. In -particular, it is useful to try compiling your favorite C source code with LLVM. -If it doesn't compile, try to figure out why or report it to the llvm-bugs list. If you -get the program to compile, it would be extremely useful to convert the build -system to be compatible with the LLVM Programs testsuite so that we can check it -into SVN and the automated tester can use it to track progress of the -compiler.
- -When testing a program, try running it with a variety of optimizations, and with -all the back-ends: CBE, llc, and lli.
- -Find benchmarks either using our test results or on your own, -where LLVM code generators do not produce optimal code or where another -compiler produces better code. Try to minimize the test case that demonstrates -the issue. Then, either submit a -bug with your testcase and the code that LLVM produces vs. the code that it -should produce, or even better, see if you can improve the code -generator and submit a patch. The basic idea is that it's generally quite easy -for us to fix performance problems if we know about them, but we generally don't -have the resources to go finding out why performance is bad.
- -The -LNT perf database has some nice features like detecting moving averages, -standard deviations, variations, etc. But the report page gives too much emphasis -on the individual variation (where noise can be higher than signal), e.g. - -this case.
- -The first part of the project would be to create an analysis tool that would -track moving averages and report: -
The second part would be to create a web page which would show all related -benchmarks (possibly configurable, like a dashboard) and show the basic statistics -with red/yellow/green colour codes to show status and links to more detailed -analysis of each benchmark.
- -A possible third part would be to be able to automatically cross reference -different builds, so that if you group them by architecture/compiler/number -of CPUs, this automated tool would understand that the changes are more common -to one particular group.
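As a rough illustration of the first part of this project, the check below compares the newest measurement against a moving average instead of against the previous run alone. It is only a sketch under stated assumptions: the function name, the window size and the threshold `k` are hypothetical and are not part of LNT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of LNT: returns 1 if the newest sample
   (samples[n-1]) lies more than k standard deviations away from the mean
   of the `window` samples that precede it, and 0 otherwise. */
static int deviates_from_moving_average(const double *samples, size_t n,
                                        size_t window, double k) {
    if (window == 0 || n < window + 1)
        return 0; /* not enough history to judge */

    const double *w = samples + (n - 1 - window);

    /* Moving average over the window. */
    double mean = 0.0;
    for (size_t i = 0; i < window; i++)
        mean += w[i];
    mean /= (double)window;

    /* Population variance over the same window. */
    double var = 0.0;
    for (size_t i = 0; i < window; i++) {
        double d = w[i] - mean;
        var += d * d;
    }
    var /= (double)window;

    /* Compare squared values so that no square root is needed. */
    double diff = samples[n - 1] - mean;
    return diff * diff > k * k * var;
}

/* Small self-check with made-up timings (seconds). */
static void self_check_moving_average(void) {
    const double quiet[] = {10.0, 10.2, 9.8, 10.1, 9.9, 10.0};
    const double jump[]  = {10.0, 10.2, 9.8, 10.1, 9.9, 12.0};
    assert(!deviates_from_moving_average(quiet, 6, 5, 3.0));
    assert(deviates_from_moving_average(jump, 6, 5, 3.0));
}
```

Judging a run against the spread of its recent history, rather than against a single earlier run, is one way to keep ordinary noise from being reported as a regression.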
- -The -LLVM Coverage Report has a nice interface to show what source lines are -covered by the tests, but it doesn't mention which tests, which revision and -what architecture are covered.
- -A project to renovate LCOV would involve: -
Another idea is to enable the test suite to run all built backends, not only - the host architecture, so that the coverage report can be built on a fast machine - and have one report per commit without needing to update the buildbots.
- -Sometimes creating new things is more fun than improving existing things. -These projects tend to be more involved and perhaps require more work, but can -also be very rewarding.
- -Many proposed extensions and -improvements to LLVM core are awaiting design and implementation.
- -We have a strong base for development of -both pointer analysis based optimizations as well as pointer analyses -themselves. We want to take advantage of this:
- -- for () - x += sqrt(loopinvariant); -- -
We'd like to transform this into:
- -- t = sqrt(loopinvariant); - for () - x += t; -- -
This transformation is safe, because the value of errno isn't -otherwise changed in the loop and the exit value of errno from the -loop is the same. We currently can't do this, because sqrt clobbers -errno, so it isn't "readonly" or "readnone" and we don't have a good -way to model this.
- -The important part of this project is figuring out how to describe -errno in the optimizer: it seems that each libc #defines errno to something -different. Maybe the solution is to have a __builtin_errno_addr() or -something similar and change the system headers to use it.
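The equivalence of the two loop forms can be checked with a small sketch. Everything here is an assumption made for illustration: `isqrt_errno` is a hypothetical stand-in for `sqrt` that writes `errno` on a domain error, and the two functions are hand-written versions of the loop before and after hoisting, not output of any LLVM pass.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for sqrt(): an integer square root that, like the
   libm function, reports a domain error by writing errno. */
static long isqrt_errno(long v) {
    if (v < 0) {
        errno = EDOM;
        return 0;
    }
    long r = 0;
    while ((r + 1) * (r + 1) <= v)
        r++;
    return r;
}

/* Original form: the call is repeated on every iteration. */
static long sum_in_loop(long invariant, int n) {
    long x = 0;
    for (int i = 0; i < n; i++)
        x += isqrt_errno(invariant);
    return x;
}

/* Hoisted form: the call is made once, before the loop. */
static long sum_hoisted(long invariant, int n) {
    long t = isqrt_errno(invariant);
    long x = 0;
    for (int i = 0; i < n; i++)
        x += t;
    return x;
}

/* Returns 1 when both forms agree on the result and on the final errno. */
static int forms_agree(long invariant, int n) {
    errno = 0;
    long a = sum_in_loop(invariant, n);
    int ea = errno;
    errno = 0;
    long b = sum_hoisted(invariant, n);
    int eb = errno;
    return a == b && ea == eb;
}

static void self_check_hoisting(void) {
    assert(forms_agree(16, 3));  /* no error: same sum, errno untouched */
    assert(forms_agree(-4, 3));  /* domain error: errno is EDOM in both */
    assert(!forms_agree(-4, 0)); /* zero iterations: only the hoisted form sets errno */
}
```

The last assertion shows a caveat this sketch makes explicit: the two forms only agree on `errno` when the loop body runs at least once, so a real transformation would presumably have to hoist the call only where that is guaranteed.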
- -We now have a unified infrastructure for writing profile-guided -transformations, which will work either at offline-compile-time or in the JIT, -but we don't have many transformations. We would welcome new profile-guided -transformations as well as improvements to the current profiling system. -
- -Ideas for profile-guided transformations:
- -Improvements to the existing support:
- -LLVM aggressively optimizes for performance, but does not yet optimize for code size. -With a new ARM backend, there is increasing interest in using LLVM for embedded systems -where code size is more of an issue. -
- -Someone interested in working on implementing code compaction in LLVM might want to read -this article, describing using -link-time optimizations for code size optimization. -
- -- -It would seem that operating on LLVM code would save a lot of time -because its semantics are much simpler than x86. The cost of operating -on LLVM is that target-specific tricks would be missed.
- -The outcome would be a new LLVM pass that subsumes at least the -instruction combiner, and probably a few other passes as well. Benefits -would include not missing cases missed by the current combiner and also -more easily adapting to changes in the LLVM IR.
- -All previous superoptimizers have worked on linear sequences of code. -It would seem much better to operate on small subgraphs of the program -dependency graph.
- In addition to projects that enhance the existing LLVM infrastructure, there - are projects that improve software that uses, but is not included with, the - LLVM compiler infrastructure. These projects include open-source software - projects and research projects that use LLVM. Like projects that enhance the - core LLVM infrastructure, these projects are often challenging and rewarding. -
- -- At least one project (and probably more) needs to use analysis information - (such as call graph analysis) from within a MachineFunctionPass; however, - most analysis passes operate at the LLVM IR level. In some cases, a value - (e.g., a function pointer) cannot be mapped from the MachineInstr level back - to the LLVM IR level reliably, making the use of existing LLVM analysis - passes from within a MachineFunctionPass impossible (or at least brittle). -
- -- This project is to encode analysis information from the LLVM IR level into - the MachineInstr IR when it is generated so that it is available to a - MachineFunctionPass. The exemplar is call graph analysis (useful for - control-flow integrity instrumentation, analysis of code reuse defenses, and - gadget compilers); however, other LLVM analyses may be useful. -
-- Implement an on-demand function relocator in the LLVM JIT. This can help - improve code locality using runtime profiling information. The idea is to use - a relocation table for every function. The relocation entries need to be - updated upon every function relocation (take a look at - - this article). - A (per-function) basic block reordering would be a useful extension. -
-- The goal of this project is to implement better data layout optimizations - using the model of reference affinity. This - - paper - provides some background information. -
-- Slimmer is a prototype tool, built using LLVM, that uses dynamic analysis to - find potential performance bugs in programs. Development on Slimmer started - during Google Summer of Code in 2015 and resulted in an initial prototype, - but evaluation of the prototype and improvements to make it portable and - robust are still needed. This project would have a student pick up and - finish the Slimmer work. The source code of Slimmer and - its current documentation can be found at its - GitHub web page. -
-