[feature](variant) schema template auto cast #60362
run buildall

run buildall
TPC-H: Total hot run time: 31752 ms
ClickBench: Total hot run time: 28.57 s

run buildall
TPC-H: Total hot run time: 31852 ms
ClickBench: Total hot run time: 28.51 s

run buildall
TPC-H: Total hot run time: 31759 ms
ClickBench: Total hot run time: 28.38 s

run buildall
TPC-H: Total hot run time: 32287 ms
ClickBench: Total hot run time: 28.23 s

run buildall
TPC-H: Total hot run time: 30810 ms
ClickBench: Total hot run time: 28.26 s

run p0 5
run p0

PR approved by anyone and no changes requested.
PR approved by at least one committer and no changes requested.
Pull request overview
Implements schema-template-based auto-casting for VARIANT path expressions during Nereids analysis, gated by a new session variable, and aligns glob-pattern matching behavior across FE/BE with expanded regression/unit tests.
Changes:
- Add session variable `enable_variant_schema_auto_cast` and apply template-based casts for `VARIANT` `ElementAt` paths during expression analysis.
- Introduce shared glob→regex utilities (FE) and switch BE variant glob matching to RE2 with caching.
- Add end-to-end regression tests and focused FE/BE unit tests for field matching and auto-cast behavior.
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| regression-test/suites/variant_p0/predefine/test_schema_template_auto_cast.groovy | New regression suite covering auto-cast across clauses, chained paths, alias/subquery, joins, and disable-switch behavior. |
| regression-test/data/variant_p0/predefine/test_schema_template_auto_cast.out | Expected outputs for the new regression suite. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/types/VariantFieldMatchTest.java | Unit tests for glob/exact matching and VariantType.findMatchingField. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/rules/analysis/ExpressionAnalyzerVariantAutoCastTest.java | Unit tests validating analysis-time cast insertion and chained-path behavior. |
| fe/fe-core/src/test/java/org/apache/doris/common/GlobRegexUtilTest.java | Unit tests for FE glob→regex conversion and cache behavior. |
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Adds enable_variant_schema_auto_cast session variable definition and accessor. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/types/VariantType.java | Adds findMatchingField helper for schema template lookup. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/types/VariantField.java | Adds matches() for glob/exact matching using FE glob-regex utility. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/ElementAt.java | Preserves VariantType predefined fields in signature computation to support template matching. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/ExpressionAnalyzer.java | Adds variant schema auto-cast logic for ElementAt and alias binding behavior. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/CheckAfterRewrite.java | Updates join-equality variant checks to account for casts. |
| fe/fe-core/src/main/java/org/apache/doris/common/GlobRegexUtil.java | New FE glob→regex conversion + small compiled-pattern LRU cache. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java | Switch inverted-index glob matching to new FE glob-regex matcher. |
| be/test/olap/rowset/segment_v2/variant_util_test.cpp | Adds BE tests for glob→regex conversion and RE2-based matching. |
| be/src/vec/common/variant_util.h | Declares BE glob→regex and glob match helpers. |
| be/src/vec/common/variant_util.cpp | Implements BE glob→regex and RE2 glob matching with an LRU cache; replaces fnmatch usage. |
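As a rough illustration of the glob→regex conversion with a small compiled-pattern LRU cache that the table attributes to `GlobRegexUtil`, here is a minimal Java sketch. The class name `GlobRegexSketch`, the cache size, and the supported syntax (only `*` and `?`, with every other character treated as a literal) are assumptions made for illustration; the PR's actual utility may support more syntax (such as bracket classes or escapes) and differ in detail.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; this is not the PR's GlobRegexUtil.
final class GlobRegexSketch {
    private static final int MAX_CACHE_ENTRIES = 256; // assumed cache size

    // An access-ordered LinkedHashMap drops the least recently used entry
    // once the size limit is exceeded.
    private static final Map<String, Pattern> CACHE =
            new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                    return size() > MAX_CACHE_ENTRIES;
                }
            };

    // Translate a glob into a regex: '*' becomes ".*", '?' becomes ".",
    // and regex metacharacters are escaped so they match literally.
    static String globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                if ("\\.[]{}()+-^$|".indexOf(c) >= 0) {
                    regex.append('\\');
                }
                regex.append(c);
            }
        }
        return regex.toString();
    }

    // Whole-string match; compiled patterns are reused through the cache.
    static synchronized boolean matches(String glob, String path) {
        Pattern pattern = CACHE.get(glob);
        if (pattern == null) {
            pattern = Pattern.compile(globToRegex(glob), Pattern.DOTALL);
            CACHE.put(glob, pattern);
        }
        return pattern.matcher(path).matches();
    }
}
```

Under these assumptions, `GlobRegexSketch.matches("a.*", "a.b.c")` is true, while `GlobRegexSketch.matches("a.*", "ab")` is false because the dot is matched literally.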
    std::mutex g_glob_regex_cache_mutex;
    std::list<std::string> g_glob_regex_cache_lru;
    std::unordered_map<std::string, GlobRegexCacheEntry> g_glob_regex_cache;
The glob regex cache globals have external linkage (g_glob_regex_cache_mutex, g_glob_regex_cache_lru, g_glob_regex_cache). Since they're only used in this translation unit, consider marking them static or moving them into an anonymous namespace to avoid unintended symbol exports and reduce the chance of name collisions.
Suggested change:
    - std::mutex g_glob_regex_cache_mutex;
    - std::list<std::string> g_glob_regex_cache_lru;
    - std::unordered_map<std::string, GlobRegexCacheEntry> g_glob_regex_cache;
    + static std::mutex g_glob_regex_cache_mutex;
    + static std::list<std::string> g_glob_regex_cache_lru;
    + static std::unordered_map<std::string, GlobRegexCacheEntry> g_glob_regex_cache;
    private Expression wrapVariantElementAtWithCast(Expression expr) {
        ElementAt elementAt = (ElementAt) expr;
        if (suppressVariantElementAtCastDepth > 0) {
            return elementAt;
        }
wrapVariantElementAtWithCast assumes the input is always an ElementAt and unconditionally casts expr to ElementAt. This can throw ClassCastException when auto-cast is enabled and maybeCastAliasExpression passes an Alias whose child is not ElementAt (e.g., StructElement from struct dereference). Consider guarding with instanceof ElementAt (and ideally also ensuring the resolved root type is VariantType) and returning expr unchanged when it doesn't apply.
        return alias;
    }
    Expression child = alias.child();
    Expression casted = wrapVariantElementAtWithCast(child);
maybeCastAliasExpression calls wrapVariantElementAtWithCast(alias.child()) without checking the child expression type. When Alias wraps non-ElementAt expressions (e.g., StructElement from nested struct access), this will crash due to the cast inside wrapVariantElementAtWithCast. Add a type check (e.g., only attempt when child is ElementAt) or make wrapVariantElementAtWithCast safely no-op for non-ElementAt inputs.
Suggested change:
    - Expression casted = wrapVariantElementAtWithCast(child);
    + Expression casted = child;
    + if (child instanceof ElementAt) {
    +     casted = wrapVariantElementAtWithCast(child);
    + }
    }

    private boolean containsVariantTypeOutsideCast(Expression expr, boolean underCast) {
        boolean nextUnderCast = underCast || expr instanceof Cast;
containsVariantTypeOutsideCast treats any Cast node as a safe boundary and ignores variant-typed expressions under it. This can incorrectly allow join equal conditions that still evaluate to VARIANT (e.g., a cast whose target type is VariantType / identity casts), bypassing the intended restriction. Consider only suppressing the check when the Cast's result type is non-variant (or checking expr instanceof Cast && !expr.getDataType().isVariantType()).
Suggested change:
    - boolean nextUnderCast = underCast || expr instanceof Cast;
    + boolean nextUnderCast = underCast || (expr instanceof Cast && !expr.getDataType().isVariantType());
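To make the review comment's point concrete, the following Java sketch models the cast-boundary check on a toy expression tree. Everything here is hypothetical: `Expr`, `CastBoundarySketch`, and the string-typed `kind`/`dataType` fields stand in for the Nereids classes, and the body of the traversal beyond the `nextUnderCast` line is an assumption, since the PR excerpt shows only that line.

```java
import java.util.Arrays;
import java.util.List;

// Toy expression model; these are not Doris/Nereids classes.
final class CastBoundarySketch {
    static final class Expr {
        final String kind;      // e.g. "Cast", "ElementAt", "Slot"
        final String dataType;  // e.g. "VARIANT", "INT"
        final List<Expr> children;

        Expr(String kind, String dataType, Expr... children) {
            this.kind = kind;
            this.dataType = dataType;
            this.children = Arrays.asList(children);
        }

        boolean isVariant() {
            return "VARIANT".equals(dataType);
        }
    }

    // Returns true if a VARIANT-typed node is reachable without first
    // passing through a cast whose result type is non-variant.
    static boolean containsVariantOutsideCast(Expr expr, boolean underCast) {
        // Only a cast to a non-variant type counts as a safe boundary.
        boolean nextUnderCast = underCast || ("Cast".equals(expr.kind) && !expr.isVariant());
        if (!nextUnderCast && expr.isVariant()) {
            return true;
        }
        for (Expr child : expr.children) {
            if (containsVariantOutsideCast(child, nextUnderCast)) {
                return true;
            }
        }
        return false;
    }
}
```

In this model a cast to `INT` over a variant slot is accepted, whereas a cast whose target type is still `VARIANT` is flagged, which is the case the original `expr instanceof Cast` check would have let through.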
    if (sessionVariable == null || !sessionVariable.isEnableVariantSchemaAutoCast()) {
        return false;
    }
    return sessionVariable.isEnableVariantSchemaAutoCast();
isEnableVariantSchemaAutoCast() calls sessionVariable.isEnableVariantSchemaAutoCast() twice: once in the guard and again in the return statement. This can be simplified to a single return expression to reduce noise and avoid future inconsistencies if the logic changes.
Suggested change:
    - if (sessionVariable == null || !sessionVariable.isEnableVariantSchemaAutoCast()) {
    -     return false;
    - }
    - return sessionVariable.isEnableVariantSchemaAutoCast();
    + return sessionVariable != null && sessionVariable.isEnableVariantSchemaAutoCast();
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
This PR implements Variant Schema Template Auto Cast end-to-end. It applies schema-template-based casts during analysis so behavior is consistent across all clauses (`SELECT`/`WHERE`/`ORDER`/`GROUP`/`HAVING`/`JOIN`/window), supports chained paths (`a.b` / `a['b']`) with correct path resolution, and makes alias-based `ORDER BY`/`GROUP BY`/`JOIN ON` work by restoring original expressions via alias mapping. A single global switch, `enable_variant_schema_auto_cast`, controls the feature. Regression tests are expanded to cover leaf vs non-leaf paths, alias/subquery scenarios, and ordering/aggregation/join behavior.
doc: apache/doris-website#3339
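The general idea of a template-driven cast can be sketched in a few lines of Java. This is a simplified illustration only: `AutoCastSketch`, its `autoCast` method, the use of an exact-match map as the "template", and the rendering of expressions as SQL strings are all assumptions; the PR itself works on Nereids expression trees and uses glob/exact field matching.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the real feature rewrites expression trees during analysis.
final class AutoCastSketch {
    // column:   the VARIANT column name, e.g. "v"
    // keys:     the chained path keys, e.g. ["a", "b"] for v['a']['b']
    // template: assumed mapping from dotted path to declared type (exact match only)
    // enabled:  stands in for the enable_variant_schema_auto_cast switch
    static String autoCast(String column, List<String> keys,
                           Map<String, String> template, boolean enabled) {
        StringBuilder access = new StringBuilder(column);
        for (String key : keys) {
            access.append("['").append(key).append("']");
        }
        if (!enabled) {
            return access.toString(); // switch off: leave the expression untouched
        }
        // Collapse the chained keys into one dotted path before the lookup.
        String type = template.get(String.join(".", keys));
        return type == null
                ? access.toString()
                : "CAST(" + access + " AS " + type + ")";
    }
}
```

With a template that maps `a.b` to `BIGINT`, the path `v['a']['b']` is rendered as `CAST(v['a']['b'] AS BIGINT)` when the switch is on, and is left unchanged when the switch is off or no template entry matches.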