Skip to content

Data-oriented representation of Expression partly on stack and optimized for constant collection #463

@dadhi

Description

@dadhi

Imagine having a compact and serializable Expression representation by using a SOA (structure of arrays):

public struct Expr
{
    public int RootLambdaIndex;
    
    // using ImTools.SmallList4 will keep the first 4 items on stack

    public SmallList4<LambdaItem> Lambdas; // the root lambda is referred by index
    public SmallList4<object> Constants;
    public SmallList4<ParamOrVarEntry> ParamsAndVars;
    public SmallList4<MethodLikeItem> CtorsMethodsInvokes;
    
    public SmallList4<BinaryEntry> Binaries;
    public SmallList4<UnaryEntry> Unaries;
    
    // ... the rest non-conformists
}

// The lambda is repsented as a slice of `LambdaItem`, 0 element in slice denoting the ParamCount. 
// The next ParamCount items refer to `ParamsAndVars` by index in `ParamCountOrParamIndex`, 
// the `ReturnTypeOrNullForParam` being null (@perf how to optimize it even more?)
public struct LambdaItem
{
    public int ParamCountOrParamIndex;
    public Type ReturnTypeOrNullForParam;
}

public struct ParamOrVarEntry
{
    public Type ParamType;
    public bool IsByRef;

    // For now Name is a string, but it may be even a slice referencing to the external continuos text of all strings, 
    // including all Param and Var names.
    public string NameMaybe;
}

public struct MethodLikeItem
{
    public MethodBase Method; // ConstructorInfo, MethodInfo, etc.
    // Instead of enum and tranlastion to the corresponing list Expr, it may be the offest inside Expr struct, or both the enum with the offset value
    public ExprKind ParamKind;
    public int ParamIndex;
}

// Can be constructed like this:
Expr expr = default;
expr.Lambda<Func<int, Foo>>(
    expr.New(typeof(Foo).GetConstructors()[0], 
        Constant("hey, "), 
        expr.Parameter(typeof(int), "n", same: 0), 
        Constant("sistah", putIntoClosure: true)),
    expr.Parameter(same: 0)
);


Lambda<int, Foo> lambda = expr.CompileFastToLambda();
var fooSis = lambda.Invoke(31);

lambda.Closure.SetConstant(0, "brotha");
var fooBro = lambda.Invoke(42);

// Where the Lambda might be
public struct Lambda<T1, R> 
{
    public Func<ArrayClosure, T1, R> Func;
    public FriendlyClosure Closure;
    public R Invoke<T1>(T1 t1) => Func(FriendlyClosure.ArrayClosure, t1); 
}

The benefits:

  1. Less memory consumed overall.
  2. Small expressions are fully on stack.
  3. Using the indexes to refer to the children expressions makes it transferrable and persistent over the network, benefits the De/Serialization making it almost no-op.
  4. Data is grouped and serialized by the ExpressionType simplifies the processing of the specific ExpressionType, e.g. collecting the Constants does not require the full expression traversal, nor the collecting of the nested Lambdas or the nested Blocks.

The cons:

  1. Not compatible with System Expression API and requires the conversion.

Update - 2025.04.22

  • Expression forest as a single struct of "array"
  • Use precise capacity for "arrays" for the expression with all known sub-expressions, number of constants, calls, etc.

Metadata

Metadata

Assignees

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions