[Improvement]: Simplify table process extension for table formats #4095

@baiyangtx

Description

Search before asking

  • I have searched in the issues and found no similar issues.

What would you like to be improved?

Currently, extending table processes for a new TableFormat requires wiring several low-level, AMS-internal components together:

  • Implementing TableRuntimeFactory and coupling it with AMS table runtime lifecycle.
  • Implementing ActionCoordinator and handling process creation / recovery / retry logic via AMS internals.
  • Managing process state via ProcessState / TableProcessState and other AMS storage details.

This makes it hard for plugin authors to add new processes for additional table formats (e.g. Paimon, Lance) or actions (e.g. compaction, snapshot expiration) in a simple and consistent way. The current entrypoint is too low-level and leaks AMS internal implementation details to plugin implementations.

How should we improve?

We should provide a higher-level and simpler extension point for table processes, and let plugins focus on describing:

  • which TableFormat and Action combinations they support, and
  • how to trigger and build the corresponding table processes.
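As a rough illustration of what such an extension point could look like from a plugin author's side, here is a minimal sketch. Every type in it (`SketchFormat`, `SketchAction`, `PlannedProcess`, `SketchProcessFactory`, `PaimonSketchFactory`) is a hypothetical stand-in invented for this example; none of them is the actual Amoro interface, and the real signatures in PR #4081 may differ.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// All types below are hypothetical stand-ins, not Amoro's real classes.
enum SketchFormat { ICEBERG, PAIMON, LANCE }

enum SketchAction { COMPACTION, SNAPSHOT_EXPIRATION }

/** A planned unit of work for one table; a placeholder for a real table process. */
final class PlannedProcess {
    final String tableName;
    final SketchFormat format;
    final SketchAction action;

    PlannedProcess(String tableName, SketchFormat format, SketchAction action) {
        this.tableName = tableName;
        this.format = format;
        this.action = action;
    }
}

/** Hypothetical high-level extension point: declare support, then build processes. */
interface SketchProcessFactory {
    /** Which actions this plugin can run, keyed by table format. */
    Map<SketchFormat, Set<SketchAction>> supportedActions();

    /** Builds the process for one table; no storage or recovery details leak in here. */
    PlannedProcess build(String tableName, SketchFormat format, SketchAction action);

    default boolean supports(SketchFormat format, SketchAction action) {
        Set<SketchAction> actions = supportedActions().get(format);
        return actions != null && actions.contains(action);
    }
}

/** Example plugin that only declares Paimon support (assumed for illustration). */
final class PaimonSketchFactory implements SketchProcessFactory {
    @Override
    public Map<SketchFormat, Set<SketchAction>> supportedActions() {
        return Map.of(SketchFormat.PAIMON,
            EnumSet.of(SketchAction.COMPACTION, SketchAction.SNAPSHOT_EXPIRATION));
    }

    @Override
    public PlannedProcess build(String tableName, SketchFormat format, SketchAction action) {
        if (!supports(format, action)) {
            throw new IllegalArgumentException("Unsupported: " + format + "/" + action);
        }
        return new PlannedProcess(tableName, format, action);
    }
}
```

The point of the sketch is that the plugin only states what it supports and how to build a process; creation, recovery and retry would stay on the AMS side.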

The direction (implemented in PR #4081) is:

  1. Promote ProcessFactory to an ActivePlugin

    • Let ProcessFactory work directly with TableRuntime / TableProcess instead of low-level ProcessState / TableProcessState.
    • Introduce ProcessTriggerStrategy so each ProcessFactory can describe its trigger behavior (interval, snapshot-driven triggering, parallelism, etc.).
    • Enhance the ProcessFactory interface to expose supported table formats, actions and scheduling policies.
  2. Integrate ProcessFactory into the plugin system

    • This lets us manage ProcessFactories through the standard plugin management configuration and the SPI interface.
  3. Unify table runtime implementations

    • Rename the original DefaultTableRuntime to CompatibleTableRuntime.
    • Provide a new DefaultTableRuntime implementation that works across table formats.
    • Update optimizing, scheduler and table runtime code to use the new abstractions while keeping behavior consistent.
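To make steps 1 and 2 more concrete, the following sketch shows one way a trigger strategy and SPI-based discovery could fit together. It assumes Java's standard `java.util.ServiceLoader` as the SPI mechanism, which the issue does not confirm. `TriggerStrategySketch`, `PluggableProcessFactory`, `LanceCompactionFactory` and `ProcessFactoryRegistry` are hypothetical names, and the trigger rule (run on a new snapshot, or once the interval has elapsed) is an assumed policy rather than the behavior of the real ProcessTriggerStrategy.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.ServiceLoader;

/** Hypothetical trigger description: interval, snapshot-driven triggering, parallelism. */
final class TriggerStrategySketch {
    final Duration interval;
    final boolean triggerOnNewSnapshot;
    final int parallelism;

    TriggerStrategySketch(Duration interval, boolean triggerOnNewSnapshot, int parallelism) {
        this.interval = interval;
        this.triggerOnNewSnapshot = triggerOnNewSnapshot;
        this.parallelism = parallelism;
    }

    /** Assumed policy: fire on a new snapshot if enabled, else when the interval has elapsed. */
    boolean shouldTrigger(Duration sinceLastRun, boolean hasNewSnapshot) {
        if (triggerOnNewSnapshot && hasNewSnapshot) {
            return true;
        }
        return sinceLastRun.compareTo(interval) >= 0;
    }
}

/** Hypothetical plugin interface; format and action are plain strings to stay self-contained. */
interface PluggableProcessFactory {
    String name();

    boolean supports(String tableFormat, String action);

    TriggerStrategySketch triggerStrategy();
}

/** Example plugin (assumed): compaction for Lance tables every 10 minutes or on new snapshots. */
final class LanceCompactionFactory implements PluggableProcessFactory {
    @Override
    public String name() {
        return "lance-compaction";
    }

    @Override
    public boolean supports(String tableFormat, String action) {
        return "LANCE".equals(tableFormat) && "COMPACTION".equals(action);
    }

    @Override
    public TriggerStrategySketch triggerStrategy() {
        return new TriggerStrategySketch(Duration.ofMinutes(10), true, 2);
    }
}

/** Collects factories and picks the first one that supports a format/action pair. */
final class ProcessFactoryRegistry {
    private final List<PluggableProcessFactory> factories = new ArrayList<>();

    /** Discovers implementations listed under META-INF/services via the standard ServiceLoader. */
    static ProcessFactoryRegistry fromServiceLoader() {
        ProcessFactoryRegistry registry = new ProcessFactoryRegistry();
        for (PluggableProcessFactory factory : ServiceLoader.load(PluggableProcessFactory.class)) {
            registry.register(factory);
        }
        return registry;
    }

    void register(PluggableProcessFactory factory) {
        factories.add(factory);
    }

    Optional<PluggableProcessFactory> find(String tableFormat, String action) {
        return factories.stream().filter(f -> f.supports(tableFormat, action)).findFirst();
    }
}
```

With this shape, a scheduler would look up a factory for a table's format and action, then consult its trigger strategy to decide when to run and with how much parallelism, without the plugin touching process state storage.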

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Subtasks

No response
