Composite specifications #541

@fiseni

Every once in a while, we get requests for composite specifications. In this issue, I want to elaborate on why we've been reluctant to add them and why we think they might not be the best idea for this library.

First, let's briefly explain the "original" specification pattern. It was proposed in the early 2000s by Eric Evans and Martin Fowler. The core idea was to encapsulate business rules/conditions into separate constructs. Then, using boolean operations (AND/OR/NOT), these atomic specifications can be combined to form composite specifications. Their main output is a boolean result: whether a given object/entity satisfies the specification or not. And that's the crucial point: they're all about criteria.
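For anyone who hasn't seen the classic pattern, here's a minimal sketch of what that composition can look like. Every name in it (`ICriteria<T>`, `Criteria<T>`, the `And`/`Or`/`Not` extensions, `ClassicSpecDemo`, and the `IsBlocked` property on `Customer`) is made up for illustration and is not part of this library. It assumes the rules are held as plain delegates, so nothing has to be compiled to check an entity.

```csharp
using System;

// Hypothetical types for illustration only; none of this is part of Ardalis.Specification.
public interface ICriteria<T>
{
    bool IsSatisfiedBy(T candidate);
}

// An atomic rule backed by a plain delegate (no expression tree involved).
public sealed class Criteria<T> : ICriteria<T>
{
    private readonly Func<T, bool> _predicate;

    public Criteria(Func<T, bool> predicate) => _predicate = predicate;

    public bool IsSatisfiedBy(T candidate) => _predicate(candidate);
}

// Boolean combinators that build composite rules out of atomic ones.
public static class CriteriaExtensions
{
    public static ICriteria<T> And<T>(this ICriteria<T> left, ICriteria<T> right)
        => new Criteria<T>(x => left.IsSatisfiedBy(x) && right.IsSatisfiedBy(x));

    public static ICriteria<T> Or<T>(this ICriteria<T> left, ICriteria<T> right)
        => new Criteria<T>(x => left.IsSatisfiedBy(x) || right.IsSatisfiedBy(x));

    public static ICriteria<T> Not<T>(this ICriteria<T> inner)
        => new Criteria<T>(x => !inner.IsSatisfiedBy(x));
}

// IsBlocked is an assumed property, added only to have a second rule to combine.
public record Customer(int Age, bool IsBlocked);

public static class ClassicSpecDemo
{
    public static bool CanBuyAlcohol(Customer customer)
    {
        var isAdult = new Criteria<Customer>(c => c.Age >= 21);
        var isBlocked = new Criteria<Customer>(c => c.IsBlocked);

        // Composite rule: adult AND NOT blocked.
        var canBuy = isAdult.And(isBlocked.Not());
        return canBuy.IsSatisfiedBy(customer);
    }
}
```

Note that the only thing such a specification ever produces is a yes/no answer for one object. It carries no information a query provider could translate, which is exactly where it differs from a query specification.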

This library, on the other hand, implements query specifications. The main goal is to extract common queries into separate constructs and apply them across different providers. The intent has been clear since its inception: it was always about queries. Accordingly, the overall design was optimized for this purpose. The aim was to have as little overhead as possible, and we've done tons of optimizations to achieve that. Allocation-wise, as seen from the benchmarks, we have barely 0.5% overhead, and the difference in execution time is statistically insignificant. We plan to improve this even further in the next versions.

Why am I writing about it? The design that makes this library efficient for queries is the very reason it's a poor fit for in-memory operations. The primary issue is the state, or the data. Once you design it for queries and keep the state as expressions, the implementation for the other usage will be "mediocre" at best. Any in-memory operation requires us to compile the expressions we store, and that's a very expensive process. For in-memory collections and the Evaluate feature, we cache the delegates locally, so for large collections it might be acceptable. But for the IsSatisfiedBy feature, it's quite inefficient: we're compiling expressions just to check a single entity. Even if we ignore the compiling part, the cost of merely storing the criteria as an expression (instead of a delegate) is not small at all.

Expression<Func<Customer, bool>> criteria = x => x.Age > 18;

This seemingly simple expression (no capturing, no closures) allocates ~600 bytes. That's totally acceptable for queries, since users will create these expressions anyway and there won't be any overhead from our side. However, that's not the case for in-memory operations.
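To picture the "compile once, reuse the delegate" idea mentioned earlier for collections, here's a rough sketch. `CachedCriteria<T>` and `CachingDemo` are hypothetical names, and this is not how the library is actually implemented internally; it only assumes that the criteria is stored as an expression and compiled lazily on first in-memory use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public record Customer(int Age);

// Hypothetical helper for illustration; not the library's actual implementation.
public sealed class CachedCriteria<T>
{
    private readonly Lazy<Func<T, bool>> _compiled;

    public CachedCriteria(Expression<Func<T, bool>> criteria)
    {
        // The expression tree is what a query provider would consume.
        Criteria = criteria;

        // Compile() runs at most once, on first in-memory use; later calls reuse the delegate.
        _compiled = new Lazy<Func<T, bool>>(() => criteria.Compile());
    }

    public Expression<Func<T, bool>> Criteria { get; }

    // Checking a single entity still pays the full compile cost on the first call.
    public bool IsSatisfiedBy(T entity) => _compiled.Value(entity);

    // For a collection, the one-time compile cost is spread over all items.
    public IEnumerable<T> Evaluate(IEnumerable<T> source) => source.Where(_compiled.Value);
}

public static class CachingDemo
{
    public static int CountAdults(IEnumerable<Customer> customers)
    {
        var isAdult = new CachedCriteria<Customer>(c => c.Age >= 21);
        return isAdult.Evaluate(customers).Count();
    }
}
```

Caching helps when the same instance filters many items. It does nothing for the common pattern of creating a specification, checking one entity, and throwing the specification away, and the expression tree itself is still allocated either way.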

Let me be more pragmatic and provide some simple benchmarks. The example below is very simple and rudimentary, and yet it allocates about 10 KB of memory. For more complex cases and composite specifications, this value will be drastically higher.

| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Gen1 | Allocated | Alloc Ratio |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| IfStatements | 0.7186 ns | 0.3452 ns | 0.0189 ns | 1.00 | 0.03 | - | - | - | NA |
| Specifications | 65,891.1906 ns | 12,552.7049 ns | 688.0561 ns | 91,731.85 | 2,221.37 | 0.7324 | 0.6104 | 10371 B | NA |

Benchmark Code
```csharp
using Ardalis.Specification;
using BenchmarkDotNet.Attributes;

namespace CompositeSpecifications;

[MemoryDiagnoser]
[ShortRunJob]
public class Benchmark
{
    private Customer _customer = null!;
    private Order _order = null!;

    [GlobalSetup]
    public void Setup()
    {
        _customer = new Customer(30);
        _order = new Order("Alcohol");
    }

    [Benchmark(Baseline = true)]
    public bool IfStatements()
    {
        return _customer.Age >= 21
            && _order.ItemName == "Alcohol";
    }

    [Benchmark]
    public bool Specifications()
    {
        var adultSpec = new AdultCustomerSpec();
        var alcoholSpec = new AlcoholBeveragesSpec();
        return adultSpec.IsSatisfiedBy(_customer)
            && alcoholSpec.IsSatisfiedBy(_order);
    }

    public class AdultCustomerSpec : Specification<Customer>
    {
        public AdultCustomerSpec()
            => Query.Where(c => c.Age >= 21);
    }

    public class AlcoholBeveragesSpec : Specification<Order>
    {
        public AlcoholBeveragesSpec()
            => Query.Where(o => o.ItemName == "Alcohol");
    }

    public record Customer(int Age);
    public record Order(string? ItemName);
}
```

You may be compelled to say we've done a terrible job. That wouldn't be entirely accurate. Here is another benchmark to show you the pure cost of compiling expressions. Creating and compiling an expression (a very simple one) allocates about 5 KB of memory. In the previous example we had two of them, hence the 10 KB of allocations. So, virtually all of the cost originates from this one operation.

| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|---|---:|---:|---:|---:|---:|---:|
| Delegate | 0.7886 ns | 1.0418 ns | 0.0571 ns | - | - | - |
| Expression | 266.9091 ns | 100.0398 ns | 5.4835 ns | 0.0458 | - | 584 B |
| CompiledExpr | 14,349.5107 ns | 1,907.9908 ns | 104.5834 ns | 0.3662 | 0.3357 | 4903 B |

Benchmark Code
```csharp
using BenchmarkDotNet.Attributes;
using System.Linq.Expressions;

namespace CompositeSpecifications;

[MemoryDiagnoser]
[ShortRunJob]
public class Benchmark
{
    public record Customer(int Age);

    [Benchmark]
    public Func<Customer, bool> Delegate()
    {
        return x => x.Age >= 21;
    }

    [Benchmark]
    public Expression<Func<Customer, bool>> Expression()
    {
        return x => x.Age >= 21;
    }

    [Benchmark]
    public Func<Customer, bool> CompiledExpr()
    {
        Expression<Func<Customer, bool>> expr = x => x.Age >= 21;
        var func = expr.Compile();
        return func;
    }
}
```

And there is not much to do here. Either we optimize for queries (and that's 95% of our users), or we optimize for in-memory operations.

I hope it's clearer now why we've been so reluctant to expand the in-memory features. Yes, we have indeed exposed the IsSatisfiedBy functionality. But at this point, that's a niche feature (somewhere on the edge) that no one talks about. If we expand into composite specifications and other related features, that will somehow become the main theme of the library while offering sub-standard performance. We care about quality and performance very deeply. The epic for version 9 was all about reducing allocations, and we went to great lengths just to shave off a few bytes.

There are other reasons why we've been avoiding composite specifications; we've elaborated on them on our FAQ page. But in this post/issue, I didn't want to focus on all the "subjective" reasons, and instead focus on one very tangible and objective reason.

We'll leave this issue open, and we're keen to hear your opinion. Now that you have more details on this issue, would you still consider using those new features in this library?
