This document outlines the known limitations of VIDEX. The primary goal of VIDEX is to simulate the MySQL/MariaDB optimizer's behavior for "what-if" index analysis. Its accuracy depends on simulating the database's internal cost models and using statistical metadata.
Important: Due to the inherent complexity of database systems, many edge cases remain untested. This list is not exhaustive. We strongly encourage users to test VIDEX's behavior in their own environments and welcome contributions and issue reports for any discrepancies found.
This section details the database features that VIDEX does not support or has not been fully tested.
- Unsupported: VIDEX may not correctly handle tables containing the following data types:
BLOBandTEXTfamilies (BLOB,MEDIUMBLOB,LONGBLOB,TEXT,MEDIUMTEXT,LONGTEXT). (See Issue #66)JSONdata type.
- Insufficiently Tested: The following data types have not been thoroughly tested and may lead to unexpected behavior:
- For MySQL 8+:
BIT,YEAR,TIME,SET, and Spatial Data Types (GEOMETRY,POINT, etc.). - For MariaDB 11.8+:
BIT,INET4/INET6,SERIAL,ROW TYPE OF,TYPE OF,SQL_TSI_YEAR,YEAR,TIME.
- For MySQL 8+:
- Functional Indexes: Currently not supported, though under internal testing. (See Issue #66)
- FULL-TEXT Indexes: Currently not supported, though under internal testing.
- Spatial Indexes.
- Multi-valued Indexes (MySQL 8+).
- Cost simulation for equality lookups on Primary Keys (PK) and Unique Keys (UK) has known issues. (See Issue #67)
EXPLAINOnly: VIDEX is designed to work withEXPLAINstatements. It does not supportEXPLAIN ANALYZE, asANALYZEinvolves actual query execution, which is outside VIDEX's simulation scope.- Single-Row Subqueries: May not be simulated correctly in all cases. (See Issue #68)
- Table Structures: Features like Partitioned Tables, Temporary Tables, and Sequences have not been sufficiently tested.
- Column Operations: While
ADD/DROP INDEXis fully supported, the impact of otherALTER TABLEoperations like adding or dropping columns has not been fully tested.
VIDEX's accuracy is fundamentally tied to the quality of its statistical metadata and its ability to capture the joint distributions of multiple columns.
- Multi-Column NDV Estimation: VIDEX only collects multi-column NDV (Number of Distinct Values) statistics for existing composite indexes. For other column combinations, it defaults to the independence assumption, which often overestimates the true NDV for correlated data.
- Multi-Column Cardinality Estimation: Similarly, cardinality estimation for predicates on multiple columns defaults to the independence assumption, which can lead to an underestimation of selectivity (and thus, cardinality) for correlated data.
pct_cachedVariability: Thepct_cachedparameter, which estimates the percentage of an index residing in the buffer pool, is highly dynamic in a live production environment. VIDEX's simulation of this value is a simplification and may not reflect the real-time state of the cache.- Dependency on AI Models: If you integrate a custom AI model for NDV and cardinality estimation, the accuracy of VIDEX's predictions becomes entirely dependent on the performance and accuracy of that model.
- Deviation from InnoDB's Estimation: In some cases, particularly for range queries, InnoDB's own cardinality estimates can be significantly inaccurate. VIDEX, by using more precise statistics (like histograms), may produce estimates closer to the ground truth. This can result in VIDEX generating a different but potentially better query plan than the one produced on the actual InnoDB instance. While VIDEX is more 'correct' in this scenario, it deviates from its goal of perfectly mimicking the original optimizer's behavior. (See Issue #5)
Unverified AI Models: While AI models (PLM4NDV, AdaNDV) have been integrated (see #46, #48), they lack comprehensive validation on benchmarks like TPC-H and JOB. Additionally, the model training pipelines are not included in this repository, necessitating further documentation and testing for user adoption.
- Performance Impact of Default Collection: The default metadata collection scripts can impact production database performance, although collection operations (NDV and histogram) are run at most once per column.
- Sampling-based Metadata Collection: This feature, introduced in #48, has not been fully tested and requires further validation on benchmarks like TPC-H and JOB. (See Issue #69)
- Stale Statistics: VIDEX does not automatically refresh statistics. The collected metadata can become stale if the underlying data changes significantly, leading to inaccurate plan simulations.
VIDEX's scope is limited to simulating the query optimizer's planning phase. It does not simulate the actual query execution environment.
- Single-Query Focus: VIDEX is designed to simulate a single query in isolation. It does not account for runtime behaviors or database features like:
- Triggers
- Stored Procedures and Functions
- Transactions and their isolation levels
- Locking and concurrency effects
- Views: Support for
VIEWs has undergone limited testing and is not guaranteed to be fully stable or accurate in all scenarios.
- VIDEX has been primarily tested on MySQL 8.0, MySQL 5.7, and MariaDB 11.8. While it may function on other patch versions, compatibility is not guaranteed.
- For a detailed support matrix, please refer to the
Version Supportsection in the mainREADME.md.
For a list of currently open bugs and known issues, please consult our GitHub Issues page, filtered by the "Bug" label:
https://github.com/bytedance/videx/issues?q=is%3Aissue+is%3Aopen+label%3ABug