From 60c9950695de58a15b0b53f73de5fe563c8e39d3 Mon Sep 17 00:00:00 2001
From: Nikita Popov
Date: Mon, 22 Sep 2025 15:07:42 +0200
Subject: [PATCH] Remove non-GSoC "open projects"

These lists of "open projects" are badly outdated, and nobody is
maintaining them. And I don't think anyone wants to maintain them in
this place and format. A very large fraction of the "open" projects
have already been implemented. I think it would be best to just delete
these entirely.
---
 OpenProjects.html | 675 ----------------------------------------------
 1 file changed, 675 deletions(-)

diff --git a/OpenProjects.html b/OpenProjects.html
index 898d1d4b..444729cc 100755
--- a/OpenProjects.html
+++ b/OpenProjects.html
@@ -6088,681 +6088,6 @@

What is this?

This document is meant to be a sort of "big TODO list" for LLVM. Each project in this document is something that would be useful for LLVM to have, and would also be a great way to get familiar with the system. Some of these projects are small and self-contained and may be implemented in a couple of days; others are larger. Several of these projects may lead to interesting research projects in their own right. In any case, we welcome all contributions.


If you are thinking about tackling one of these projects, please send a mail to the LLVM Developer's mailing list, so that we know the project is being worked on. Additionally, this is a good way to get more information about a specific project or to suggest other projects to add to this page.


The projects on this page are open-ended. More specific projects are filed as unassigned enhancements in the LLVM bug tracker. See the list of currently outstanding issues if you wish to help improve LLVM.

LLVM Subprojects: Clang and More

In addition to the main LLVM project, there are several subprojects, including Clang and others. If you are interested in working on these, please see their respective "Open Projects" pages.

Improving the current system

Improvements to the current infrastructure are always very welcome and tend to be fairly straightforward to implement. Here are some of the key areas that can use improvement...

Factor out target descriptions

Currently, Clang and LLVM each have their own target description infrastructure, with some features duplicated and others "shared" (in the sense that Clang has to create a full LLVM target description to query specific information).


The two descriptions have grown in parallel, since in the beginning they were quite different and served disparate purposes. But as the compiler evolved, more and more features had to be shared between the two so that the compiler would behave properly. An example is when targets enable default features on specific configurations for which there are no flags. If the back end has a different "default" behaviour than the front end, and the latter has no way of enforcing that behaviour, things break.


An alternative would be to create flags for every little quirk, but first, Clang is not the only front end or tool that uses LLVM's middle and back ends, and second, that is what "default behaviour" is there for, so we would be missing the point.


Several ideas have been floating around to fix the Clang driver with respect to recognizing architectures, features and so on (TableGen it, user-specific configuration files, etc.), but none of them touch the critical issue: sharing that information with the back end.


Recently, the idea of factoring out the target description infrastructure from both Clang and LLVM into a separate library that both use has been floating around. This would make sure that all defaults, flags and behaviour are shared, but it would also greatly reduce the complexity (and thus the cost of maintenance). It would also allow all tools (lli, llc, lld, lldb, etc.) to behave the same way across the board.
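
One way to picture the proposal is a small query library that both the driver and the code generator link against. The sketch below is purely hypothetical (the namespace, types, and feature strings are invented for illustration and are not an existing LLVM API); it only shows the shape of the idea of asking one source of truth for target defaults.

    // Purely hypothetical sketch of a shared target-description library (none
    // of these names exist in LLVM); it only illustrates both the driver and
    // the code generator asking one source of truth for target defaults.
    #include <cstdio>
    #include <string>
    #include <vector>

    namespace shared_td {

    struct TargetDescription {
      std::string triple;
      std::vector<std::string> defaultFeatures; // features implied by the triple
      bool hasFeature(const std::string &F) const {
        for (const std::string &Feat : defaultFeatures)
          if (Feat == F)
            return true;
        return false;
      }
    };

    // The single place that knows, e.g., which features a triple implies by
    // default.  Today this knowledge is split between Clang's driver and the
    // LLVM back end.
    inline TargetDescription lookup(const std::string &Triple) {
      TargetDescription TD;
      TD.triple = Triple;
      if (Triple.find("armv7") != std::string::npos)
        TD.defaultFeatures = {"+neon", "+vfp3"};
      return TD;
    }

    } // namespace shared_td

    int main() {
      auto TD = shared_td::lookup("armv7-unknown-linux-gnueabihf");
      // Both the front end and the back end would consult the same answer
      // here, so they could not disagree about a default.
      std::printf("neon on by default: %s\n",
                  TD.hasFeature("+neon") ? "yes" : "no");
    }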


The main challenges are:

Implementing Code Cleanup bugs

The LLVM bug tracker occasionally has "code-cleanup" bugs filed in it. Taking one of these and fixing it is a good way to get your feet wet in the LLVM code and discover how some of its components work. Some of these involve major IR redesign work, which is high-impact because it can simplify a lot of things in the optimizer.


Some specific ones that would be great to have:


Additionally, there are compile-time performance problems in LLVM that need to be fixed. These are marked with the slow-compile keyword. Use this LLVM bug tracker query to find them.

Add programs to the llvm-test testsuite

The llvm-test testsuite is a large collection of programs we use for nightly testing of generated code performance, compile times, correctness, etc. Having a large testsuite gives us a lot of coverage of programs and enables us to spot and improve any problem areas in the compiler.


One extremely useful task, which does not require in-depth knowledge of compilers, would be to extend our testsuite to include new programs and benchmarks. In particular, we are interested in CPU-intensive programs that have few library dependencies, produce some output that can be used for correctness testing, and are redistributable in source form. Many different programs are suitable; for example, see this list for some potential candidates.

Compile programs with the LLVM Compiler

We are always looking for new testcases and benchmarks for use with LLVM. In particular, it is useful to try compiling your favorite C source code with LLVM. If it doesn't compile, try to figure out why or report it to the llvm-bugs list. If you get the program to compile, it would be extremely useful to convert the build system to be compatible with the LLVM Programs testsuite so that we can check it into SVN and the automated tester can use it to track progress of the compiler.


When testing a program, try running it with a variety of optimizations and with all the back-ends: CBE, llc, and lli.

Benchmark the LLVM compiler

Find benchmarks, either using our test results or on your own, where LLVM code generators do not produce optimal code or where another compiler produces better code. Try to minimize the test case that demonstrates the issue. Then, either submit a bug with your testcase and the code that LLVM produces vs. the code that it should produce, or, even better, see if you can improve the code generator and submit a patch. The basic idea is that it's generally quite easy for us to fix performance problems if we know about them, but we generally don't have the resources to go find out why performance is bad.

Benchmark Statistics and Warning System

The LNT perf database has some nice features like moving-average detection, standard deviations, variations, etc. But the report page puts too much emphasis on individual variations (where noise can be higher than signal), e.g. this case.


The first part of the project would be to create an analysis tool that would track moving averages and report:


The second part would be to create a web page (possibly configurable, like a dashboard) that shows all related benchmarks, with basic statistics, red/yellow/green colour codes for status, and links to a more detailed analysis of each benchmark.


A possible third part would be to automatically cross-reference different builds, so that if you group them by architecture, compiler, or number of CPUs, the tool would understand that the changes are more common to one particular group.
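
To make the first two parts concrete, here is a minimal sketch (standalone C++, not LNT code; the data layout and the 1-sigma/3-sigma thresholds are invented for the example) that computes a moving average and standard deviation over a benchmark's previous runs and maps the newest run onto the red/yellow/green scheme described above.

    #include <algorithm>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    // Status of a benchmark, based on how far the newest sample deviates from
    // the moving average of the preceding window.
    enum class Status { Green, Yellow, Red };

    Status classify(const std::vector<double> &runtimes, std::size_t window) {
      if (runtimes.size() < window + 1)
        return Status::Green;                 // not enough history to judge
      double sum = 0.0, sq = 0.0;
      for (std::size_t i = runtimes.size() - 1 - window;
           i + 1 < runtimes.size(); ++i) {
        sum += runtimes[i];
        sq += runtimes[i] * runtimes[i];
      }
      double mean = sum / window;
      double var = std::max(0.0, sq / window - mean * mean);
      double stddev = std::sqrt(var);
      double delta = std::fabs(runtimes.back() - mean);
      // Arbitrary thresholds for the sketch: within 1 sigma of the moving
      // average is green, within 3 sigma is yellow, anything beyond is red.
      if (delta <= stddev)
        return Status::Green;
      return delta <= 3 * stddev ? Status::Yellow : Status::Red;
    }

    int main() {
      std::vector<double> runtimes = {1.02, 0.99, 1.01, 1.00, 1.03, 1.21};
      const char *names[] = {"green", "yellow", "red"};
      std::printf("status: %s\n",
                  names[static_cast<int>(classify(runtimes, 5))]);
    }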

Improving Coverage Reports

The LLVM Coverage Report has a nice interface to show which source lines are covered by the tests, but it doesn't mention which tests, which revision, and which architecture are covered.


A project to renovate LCOV would involve:


Another idea is to enable the test suite to run with all built backends, not only the host architecture, so that the coverage report can be built on a fast machine and produced once per commit without needing to update the buildbots.

Miscellaneous Improvements
1. Completely rewrite bugpoint. In addition to being a mess, bugpoint suffers from a number of problems where it will "lose" a bug when reducing. It should be rewritten from scratch to solve these and other problems.
2. Add support for transactions to the PassManager for improved bugpoint.
3. Improve bugpoint to support running tests in parallel on MP machines.
4. Add MC assembler/disassembler and JIT support to the SPARC port.
5. Move more optimizations out of the instcombine pass and into InstructionSimplify. The optimizations that should be moved are those that do not create new instructions, for example turning "sub i32 %x, 0" into "%x". Many passes use InstructionSimplify to clean up code as they go, so making it smarter can result in improvements all over the place. (A small sketch of this kind of fold appears after this list.)
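
To make item 5 concrete, here is a tiny standalone sketch of the kind of fold that belongs in a pure simplifier (illustrative C++ with an invented toy IR, not LLVM's actual InstructionSimplify API): it only ever returns values that already exist and never builds new instructions.

    #include <cstdio>
    #include <string>

    // Toy "IR" for illustration only; this is not LLVM's Value/Instruction
    // hierarchy, just enough structure to show a no-new-instruction fold.
    struct Value {
      virtual ~Value() = default;
      virtual std::string name() const = 0;
    };

    struct Constant : Value {
      long long Val;
      explicit Constant(long long V) : Val(V) {}
      std::string name() const override { return std::to_string(Val); }
    };

    struct SubInst : Value {
      Value *LHS, *RHS;
      SubInst(Value *L, Value *R) : LHS(L), RHS(R) {}
      std::string name() const override {
        return "(" + LHS->name() + " - " + RHS->name() + ")";
      }
    };

    // InstructionSimplify-style helper: return an existing value that the
    // instruction is equivalent to, or nullptr if no fold applies.  It never
    // creates new instructions, which is what distinguishes it from
    // instcombine.
    Value *simplifySub(SubInst *I, Constant *Zero) {
      if (auto *C = dynamic_cast<Constant *>(I->RHS))
        if (C->Val == 0)
          return I->LHS;          // sub x, 0  -->  x
      if (I->LHS == I->RHS)
        return Zero;              // sub x, x  -->  0 (constants are not instructions)
      return nullptr;             // no fold applies
    }

    int main() {
      Constant X(42), Zero(0);
      SubInst S(&X, &Zero);
      Value *V = simplifySub(&S, &Zero);
      std::printf("%s simplifies to %s\n", S.name().c_str(),
                  V ? V->name().c_str() : "<nothing>");
    }
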
Adding new capabilities to LLVM

Sometimes creating new things is more fun than improving existing things. These projects tend to be more involved and perhaps require more work, but can also be very rewarding.

Extend the LLVM intermediate representation

Many proposed extensions and improvements to LLVM core are awaiting design and implementation.

1. Improvements for Debug Information Generation
2. EH support for non-call exceptions
3. Many ideas for feature requests are stored in LLVM bugzilla. Search for bugs with a "new-feature" keyword.
Pointer and Alias Analysis

We have a strong base for developing both pointer-analysis-based optimizations and pointer analyses themselves. We want to take advantage of this:

1. The globals mod/ref pass does an inexpensive bottom-up, context-sensitive alias analysis. There are some inexpensive things that we could do to better capture the effects of functions that access pointer arguments. This can be really important for C++ methods, which spend lots of time accessing pointers off 'this'.
2. The alias analysis API supports the getModRefBehavior method, which allows the implementation to give a detailed analysis of the functions. For example, we could implement full knowledge of printf/scanf side effects, which would be useful. This feature is in place but not being used for anything right now.
3. We need some way to reason about errno. Consider a loop like this:

       for (...)
         x += sqrt(loopinvariant);

    We'd like to transform this into:

       t = sqrt(loopinvariant);
       for (...)
         x += t;

    This transformation is safe, because the value of errno isn't otherwise changed in the loop and the exit value of errno from the loop is the same. We currently can't do this, because sqrt clobbers errno, so it isn't "readonly" or "readnone" and we don't have a good way to model this.


    The important part of this project is figuring out how to describe errno in the optimizer: each libc seems to #define errno to something different. Maybe the solution is to have a __builtin_errno_addr() or something and change the system headers to use it. (A compilable version of the loop example above appears right after this list.)

4. There are lots of ways to optimize out and improve handling of memcpy/memset.
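
For reference, here is the loop example from item 3 written out as compilable C++. The transformation is applied by hand; today the optimizer cannot do it automatically precisely because sqrt may write errno.

    #include <cmath>
    #include <cstdio>

    // Before: sqrt() may set errno, so it is not "readonly"/"readnone" and the
    // optimizer cannot hoist the call out of the loop on its own.
    double sum_before(int n, double loopinvariant) {
      double x = 0.0;
      for (int i = 0; i < n; ++i)
        x += std::sqrt(loopinvariant);   // re-evaluated every iteration
      return x;
    }

    // After the (manual) transformation: the call is hoisted once.  This is
    // safe because errno is not otherwise changed in the loop and its value at
    // loop exit is the same as in the version above.
    double sum_after(int n, double loopinvariant) {
      double t = std::sqrt(loopinvariant);
      double x = 0.0;
      for (int i = 0; i < n; ++i)
        x += t;
      return x;
    }

    int main() {
      std::printf("%f %f\n", sum_before(4, 2.0), sum_after(4, 2.0));
    }
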
Profile-Guided Optimization

We now have a unified infrastructure for writing profile-guided transformations, which will work either at offline compile time or in the JIT, but we don't have many transformations. We would welcome new profile-guided transformations as well as improvements to the current profiling system.


Ideas for profile-guided transformations:

1. Superblock formation (with many optimizations)
2. Loop unrolling/peeling
3. Profile directed inlining
4. Code layout
5. ...

Improvements to the existing support:

1. The current block and edge profiling code that gets inserted is very simple and inefficient. Through the use of control-dependence information, many fewer counters could be inserted into the code. Also, if the execution count of a loop is known to be a compile-time or runtime constant, all of the counters in the loop could be avoided. (A sketch of the naive counter insertion appears after this list.)
2. You could implement one of the "static profiling" algorithms, which analyze a piece of code and make educated guesses about the relative execution frequencies of various parts of the code.
3. You could add path profiling support, or adapt the existing LLVM path profiling code to work with the generic profiling interfaces.
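
As a rough illustration of why item 1 matters, here is what naive edge instrumentation amounts to in a standalone C++ mock-up (not LLVM's profiling runtime): every edge gets its own counter, even though one of them is derivable from the others, which is exactly the redundancy that control-dependence information lets you eliminate.

    #include <cstdint>
    #include <cstdio>

    // Naive edge profiling for the CFG of:
    //     if (c) { A } else { B }
    //     join
    // Every edge gets a counter, even though entry = then + else, so one of
    // the three counters is redundant and could be derived instead of
    // instrumented.  Smarter instrumentation places counters only on a subset
    // of edges (e.g. the complement of a spanning tree of the CFG).
    static uint64_t edge_entry_then = 0;
    static uint64_t edge_entry_else = 0;
    static uint64_t edge_to_join = 0;   // always equals the sum of the other two

    static int f(int c) {
      int r;
      if (c) {
        ++edge_entry_then;   // naive: counter on the "then" edge
        r = 1;
      } else {
        ++edge_entry_else;   // naive: counter on the "else" edge
        r = 2;
      }
      ++edge_to_join;        // redundant: derivable as then + else
      return r;
    }

    int main() {
      for (int i = 0; i < 10; ++i)
        f(i % 3 != 0);
      std::printf("then=%llu else=%llu join=%llu\n",
                  (unsigned long long)edge_entry_then,
                  (unsigned long long)edge_entry_else,
                  (unsigned long long)edge_to_join);
    }
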
Code Compaction

LLVM aggressively optimizes for performance, but does not yet optimize for code size. With a new ARM backend, there is increasing interest in using LLVM for embedded systems where code size is more of an issue.


Someone interested in working on implementing code compaction in LLVM might want to read this article, which describes using link-time optimizations to reduce code size.

New Transformations and Analyses
1. Implement a Loop Dependence Analysis Infrastructure
   - Design some way to represent and query dependence analysis
2. Value range propagation pass
3. More fun with loops: Predictive Commoning
4. Type inference (a.k.a. devirtualization)
5. Value assertions (also PR810).
Code Generator Improvements
1. Generalize target-specific backend passes that could be target-independent, by adding the necessary target hooks and making sure all IR/MI features (such as register masks and predicated instructions) are properly handled. Enable these for other targets where doing so is demonstrably beneficial. For example:
   1. lib/Target/Hexagon/RDF*
   2. lib/Target/AArch64/AArch64AddressTypePromotion.cpp
2. Merge the delay slot filling logic that is duplicated into (at least) the Sparc and Mips backends into a single target-independent pass. Likewise, the branch shortening logic in several targets should be merged into one pass.
3. Implement 'stack slot coloring' to allocate two frame indexes to the same stack offset if their live ranges don't overlap. This can reuse a bunch of analysis machinery from LiveIntervals. Making the stack smaller is good for cache use and very important on targets where loads have limited displacement, like ppc, thumb, mips, sparc, etc. This should be done as a pass before prolog/epilog insertion. This is now done for register allocator temporaries, but not for allocas. (A toy illustration of the idea appears after this list.)
4. Implement 'shrink wrapping', which is the intelligent placement of callee-saved register saves/restores. Right now PrologEpilogInsertion always saves every (modified) callee-saved register in the prolog and restores it in the epilog; however, some paths through a function (e.g. an early exit) may not use all registers. Sinking the save down the CFG avoids useless work on these paths. Work has started on this; please inquire on llvm-dev.
5. Implement interprocedural register allocation. The CallGraphSCCPass can be used to implement a bottom-up analysis that will determine the *actual* registers clobbered by a function. Use the pass to fine-tune register usage in callers based on the *actual* registers used by the callee.
6. Add support for 16-bit x86 assembly and real mode to the assembler and disassembler, for use by BIOS code. This includes both 16-bit instruction encodings as well as privileged instructions (lgdt, lldt, ltr, lmsw, clts, invd, invlpg, wbinvd, hlt, rdmsr, wrmsr, rdpmc, rdtsc) and the control and debug registers.
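
Here is a toy version of the stack slot coloring idea from item 3, with an invented interval representation standing in for LiveIntervals (a real implementation also has to respect alignment, aliasing, and spill weights): two slots share an offset whenever their live ranges do not overlap.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // A stack slot with a live range expressed as [start, end) instruction
    // indices.  The representation is invented for this example.
    struct Slot {
      int id;
      int start, end;  // live range
      int size;        // bytes
    };

    // Greedy "coloring": give each slot the lowest existing offset whose
    // current occupant's live range has already ended (and which is big
    // enough); otherwise allocate a fresh offset.
    std::vector<int> colorSlots(std::vector<Slot> slots) {
      std::sort(slots.begin(), slots.end(),
                [](const Slot &a, const Slot &b) { return a.start < b.start; });
      struct Offset { int offset; int size; int lastEnd; };
      std::vector<Offset> offsets;
      std::vector<int> assignment(slots.size());
      int nextFree = 0;
      for (const Slot &s : slots) {
        bool placed = false;
        for (Offset &o : offsets) {
          if (o.lastEnd <= s.start && o.size >= s.size) {
            o.lastEnd = s.end;           // reuse: live ranges don't overlap
            assignment[s.id] = o.offset;
            placed = true;
            break;
          }
        }
        if (!placed) {
          offsets.push_back({nextFree, s.size, s.end});
          assignment[s.id] = nextFree;
          nextFree += s.size;
        }
      }
      return assignment;
    }

    int main() {
      // Slot 0 and slot 2 are never live at the same time, so they share
      // offset 0, shrinking the frame from 24 to 16 bytes.
      std::vector<Slot> slots = {{0, 0, 10, 8}, {1, 5, 20, 8}, {2, 12, 30, 8}};
      auto ofs = colorSlots(slots);
      for (std::size_t i = 0; i < ofs.size(); ++i)
        std::printf("slot %zu -> offset %d\n", i, ofs[i]);
    }
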
Miscellaneous Additions
1. Port the Bigloo Scheme compiler, from Manuel Serrano at INRIA Sophia-Antipolis, to output LLVM bytecode. It seems that it can already output .NET bytecode, JVM bytecode, and C, so LLVM would ostensibly be another good candidate.
2. Write a new frontend for some other language (Java? OCaml? Forth?).
3. Random test vector generator: use a C grammar to generate random C code, e.g. quest; run it through llvm-gcc, then run a random set of passes on it using opt. Try to crash opt. When opt crashes, use bugpoint to reduce the test case and post it to a website or mailing list. Repeat ad infinitum.
4. Add sandbox features to the Interpreter: catch invalid memory accesses, potentially unsafe operations (access via an arbitrary memory pointer), etc.
5. Port Valgrind to use LLVM code generation and optimization passes instead of its own.
6. Write an LLVM IR level debugger (extend the Interpreter?).
7. Write an LLVM superoptimizer. It would be interesting to take ideas from this superoptimizer for x86 (paper #1 and paper #2) and adapt them to run on LLVM code. (A toy sketch of the basic search loop appears after the notes below.)

   It would seem that operating on LLVM code would save a lot of time because its semantics are much simpler than x86. The cost of operating on LLVM is that target-specific tricks would be missed.

   The outcome would be a new LLVM pass that subsumes at least the instruction combiner, and probably a few other passes as well. Benefits would include catching cases missed by the current combiner and adapting more easily to changes in the LLVM IR.

   All previous superoptimizers have worked on linear sequences of code. It would seem much better to operate on small subgraphs of the program dependency graph.

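As a sketch of what item 7's search loop looks like at its simplest, here is a standalone toy (not a real superoptimizer: equivalence is checked by brute-force testing of all 8-bit inputs rather than a SAT/SMT query, and the candidates are a fixed list rather than an enumeration):

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Toy superoptimizer sketch: check, by testing all 256 inputs, whether a
    // cheaper candidate expression matches a target computation.  A real
    // superoptimizer over LLVM IR would need a proper equivalence check and
    // would search DAGs rather than a fixed candidate list.
    struct Expr {
      std::string text;
      uint8_t (*eval)(uint8_t);
    };

    // The "original" code we would like to improve: x * 2 - x.
    static uint8_t target(uint8_t x) { return (uint8_t)(x * 2 - x); }

    int main() {
      // Candidate replacement expressions, cheapest first.
      std::vector<Expr> candidates = {
          {"0", [](uint8_t) -> uint8_t { return 0; }},
          {"x", [](uint8_t x) -> uint8_t { return x; }},
          {"x + x", [](uint8_t x) -> uint8_t { return (uint8_t)(x + x); }},
      };
      for (const Expr &C : candidates) {
        bool equivalent = true;
        for (int v = 0; v < 256 && equivalent; ++v)
          equivalent = C.eval((uint8_t)v) == target((uint8_t)v);
        if (equivalent) {
          std::printf("x * 2 - x  ==>  %s\n", C.text.c_str());
          return 0;
        }
      }
      std::printf("no cheaper equivalent found\n");
    }
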
Projects using LLVM

In addition to projects that enhance the existing LLVM infrastructure, there are projects that improve software that uses, but is not included with, the LLVM compiler infrastructure. These projects include open-source software projects and research projects that use LLVM. Like projects that enhance the core LLVM infrastructure, these projects are often challenging and rewarding.

Encode Analysis Results in MachineInstr IR

At least one project (and probably more) needs to use analysis information (such as call graph analysis) from within a MachineFunctionPass; however, most analysis passes operate at the LLVM IR level. In some cases, a value (e.g., a function pointer) cannot be reliably mapped from the MachineInstr level back to the LLVM IR level, making the use of existing LLVM analysis passes from within a MachineFunctionPass impossible (or at least brittle).


This project is to encode analysis information from the LLVM IR level into the MachineInstr IR when it is generated, so that it is available to a MachineFunctionPass. The exemplar is call graph analysis (useful for control-flow integrity instrumentation, analysis of code reuse defenses, and gadget compilers); however, other LLVM analyses may be useful.

Code Layout in the LLVM JIT

Implement an on-demand function relocator in the LLVM JIT. This can help improve code locality using runtime profiling information. The idea is to use a relocation table for every function. The relocation entries need to be updated upon every function relocation (take a look at this article). A (per-function) basic block reordering would be a useful extension.
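
A minimal sketch of the indirection that makes on-demand relocation cheap (plain C++ standing in for JIT'd code; a real JIT would patch code addresses rather than C function pointers): callers always go through a per-function table slot, so moving a function only requires updating its slot.

    #include <cstdio>

    // The "relocation table": one slot per function.  Callers never hold a
    // direct address, only the slot, so the JIT can move the code and update
    // the slot without touching any caller.
    using Fn = int (*)(int);

    static int hot_v1(int x) { return x + 1; }   // original placement
    static int hot_v2(int x) { return x + 1; }   // same function, "relocated"

    static Fn relocation_table[] = {hot_v1};

    static int call_hot(int x) { return relocation_table[0](x); }

    int main() {
      std::printf("%d\n", call_hot(41));   // dispatches to hot_v1
      relocation_table[0] = hot_v2;        // the JIT relocates the function
      std::printf("%d\n", call_hot(41));   // now dispatches to hot_v2
    }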

Improved Structure Splitting and Field Reordering

The goal of this project is to implement better data layout optimizations using the model of reference affinity. This paper provides some background information.
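
As a tiny illustration of the kind of transformation this enables (the struct and its fields are invented for the example; a real pass would use the reference-affinity model from the paper to decide the grouping), fields that are accessed together stay in a "hot" struct while rarely-used fields move to a parallel "cold" array:

    #include <cstdio>
    #include <vector>

    struct ParticleOriginal {
      float x, y, z;         // touched every frame (high affinity)
      char debugName[64];    // touched almost never
    };

    struct ParticleHot  { float x, y, z; };
    struct ParticleCold { char debugName[64]; };

    int main() {
      // After splitting, iterating over positions streams only hot data, so
      // many more elements fit in each cache line.
      std::vector<ParticleHot> hot(1000, ParticleHot{0.f, 0.f, 0.f});
      std::vector<ParticleCold> cold(1000);   // indexed the same way as 'hot'
      float sum = 0.f;
      for (const ParticleHot &p : hot)
        sum += p.x + p.y + p.z;
      std::printf("sizeof original=%zu hot=%zu, sum=%f\n",
                  sizeof(ParticleOriginal), sizeof(ParticleHot), sum);
      (void)cold;
    }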

Finish the Slimmer Project

Slimmer is a prototype tool, built using LLVM, that uses dynamic analysis to find potential performance bugs in programs. Development on Slimmer started during Google Summer of Code in 2015 and resulted in an initial prototype, but evaluation of the prototype and improvements to make it portable and robust are still needed. This project would have a student pick up and finish the Slimmer work. The source code of Slimmer and its current documentation can be found at its Github web page.
