[feature](variant) schema template auto cast#60362

Merged
eldenmoon merged 28 commits into apache:master from gary-cloud:schema-cast
Feb 9, 2026

@gary-cloud
Contributor

@gary-cloud gary-cloud commented Jan 29, 2026

What problem does this PR solve?
Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
This PR implements Variant Schema Template Auto Cast end to end. Casts derived from the schema template are applied during analysis, so behavior is consistent across all clauses (SELECT, WHERE, ORDER BY, GROUP BY, HAVING, JOIN, and window functions). Chained paths (a.b and a['b']) are supported with correct path resolution, and alias-based ORDER BY, GROUP BY, and JOIN ON work because the original expressions are restored through alias mapping. A single global switch, enable_variant_schema_auto_cast, controls the feature. Regression tests are expanded to cover leaf vs. non-leaf paths, alias and subquery scenarios, and ordering, aggregation, and join behavior.
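
To make the chained-path idea concrete, here is a minimal Java sketch of flattening nested element accesses into one dotted path before any template lookup. The `Column` and `ElementAt` classes below are hypothetical stand-ins written for this example, not the Nereids classes, and the sketch is not a claim about how this PR resolves paths internally.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ChainedPathSketch {
    // Hypothetical stand-ins for expression nodes; not the Doris classes.
    interface Expression {}

    static final class Column implements Expression {
        final String name;
        Column(String name) { this.name = name; }
    }

    static final class ElementAt implements Expression {
        final Expression base;
        final String key;
        ElementAt(Expression base, String key) { this.base = base; this.key = key; }
    }

    // Walk from the outermost access down to the root column, collecting keys
    // so that v['a']['b'] yields the path "a.b".
    static String resolvePath(Expression expr) {
        Deque<String> parts = new ArrayDeque<>();
        Expression cur = expr;
        while (cur instanceof ElementAt) {
            ElementAt access = (ElementAt) cur;
            parts.addFirst(access.key);
            cur = access.base;
        }
        return String.join(".", parts);
    }
}
```

Under this assumption both spellings of a chained access reduce to the same path string, which is what a template lookup would then be keyed on.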

doc: apache/doris-website#3339

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 29, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@gary-cloud
Contributor Author

run buildall

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 80.56% (87/108) 🎉
Increment coverage report
Complete coverage report

@gary-cloud
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31752 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2fb3545ddcb3b2fc79ad6a643f2b6a176614a0d6, data reload: false

------ Round 1 ----------------------------------
q1	17651	5209	5063	5063
q2	2012	322	191	191
q3	10210	1300	744	744
q4	10218	893	319	319
q5	7548	2157	1906	1906
q6	193	178	150	150
q7	878	737	616	616
q8	9258	1372	1042	1042
q9	5248	4783	4834	4783
q10	6820	1955	1559	1559
q11	545	286	292	286
q12	334	379	228	228
q13	17799	4034	3242	3242
q14	230	246	216	216
q15	905	860	815	815
q16	660	672	630	630
q17	626	765	505	505
q18	6608	6469	6394	6394
q19	1402	1003	630	630
q20	392	346	225	225
q21	2556	2032	1927	1927
q22	347	310	281	281
Total cold run time: 102440 ms
Total hot run time: 31752 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5323	5322	5326	5322
q2	262	342	234	234
q3	2129	2679	2238	2238
q4	1374	1747	1275	1275
q5	4261	4158	4233	4158
q6	215	188	139	139
q7	1919	2298	1911	1911
q8	2616	2365	2417	2365
q9	7540	7369	7465	7369
q10	2858	3034	2685	2685
q11	542	481	440	440
q12	651	710	626	626
q13	3892	4518	3740	3740
q14	352	313	277	277
q15	875	841	833	833
q16	676	720	687	687
q17	1179	1351	1443	1351
q18	8524	7984	7941	7941
q19	880	828	832	828
q20	2098	2221	2031	2031
q21	4942	4214	4087	4087
q22	598	591	529	529
Total cold run time: 53706 ms
Total hot run time: 51066 ms

@doris-robot

ClickBench: Total hot run time: 28.57 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2fb3545ddcb3b2fc79ad6a643f2b6a176614a0d6, data reload: false

query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.09	0.08
query4	1.61	0.11	0.12
query5	0.27	0.25	0.25
query6	1.16	0.69	0.67
query7	0.03	0.03	0.02
query8	0.05	0.04	0.04
query9	0.56	0.49	0.49
query10	0.56	0.54	0.55
query11	0.14	0.09	0.10
query12	0.14	0.10	0.11
query13	0.62	0.61	0.63
query14	1.09	1.05	1.05
query15	0.88	0.86	0.88
query16	0.39	0.39	0.39
query17	1.08	1.12	1.12
query18	0.23	0.21	0.21
query19	2.12	2.02	2.01
query20	0.02	0.01	0.02
query21	15.39	0.28	0.16
query22	5.09	0.06	0.05
query23	15.73	0.28	0.10
query24	1.45	0.61	0.72
query25	0.12	0.06	0.08
query26	0.14	0.14	0.13
query27	0.08	0.05	0.05
query28	4.87	1.13	0.96
query29	12.57	3.92	3.16
query30	0.29	0.13	0.12
query31	2.82	0.64	0.40
query32	3.24	0.60	0.51
query33	3.23	3.22	3.30
query34	15.94	5.36	4.72
query35	4.78	4.73	4.81
query36	0.64	0.49	0.48
query37	0.11	0.07	0.07
query38	0.07	0.04	0.03
query39	0.04	0.03	0.02
query40	0.19	0.16	0.16
query41	0.09	0.03	0.04
query42	0.04	0.02	0.02
query43	0.05	0.03	0.03
Total cold run time: 98.33 s
Total hot run time: 28.57 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 73.15% (79/108) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 78.70% (85/108) 🎉
Increment coverage report
Complete coverage report

@gary-cloud
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31852 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3d6e7c2e8e9ad6fdfb96cc89e03adfa4003c7f40, data reload: false

------ Round 1 ----------------------------------
q1	17610	5315	5096	5096
q2	2072	303	197	197
q3	10225	1301	744	744
q4	10217	881	320	320
q5	7548	2168	1921	1921
q6	206	187	151	151
q7	886	740	624	624
q8	9268	1382	1050	1050
q9	5159	4825	4759	4759
q10	6835	1956	1569	1569
q11	502	303	297	297
q12	340	384	232	232
q13	17768	4053	3215	3215
q14	242	238	219	219
q15	875	829	808	808
q16	687	673	632	632
q17	663	819	482	482
q18	6648	6488	6453	6453
q19	1248	1008	629	629
q20	396	344	236	236
q21	2666	2026	1949	1949
q22	350	316	269	269
Total cold run time: 102411 ms
Total hot run time: 31852 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5322	5333	5351	5333
q2	258	342	260	260
q3	2191	2699	2246	2246
q4	1339	1763	1339	1339
q5	4333	4207	4364	4207
q6	218	185	141	141
q7	2122	2079	1988	1988
q8	2752	2558	2483	2483
q9	7646	7671	7536	7536
q10	2847	3096	2596	2596
q11	552	508	484	484
q12	714	757	720	720
q13	3895	4557	3754	3754
q14	323	320	311	311
q15	912	847	831	831
q16	703	771	692	692
q17	1150	1468	1434	1434
q18	8187	8348	7867	7867
q19	857	801	798	798
q20	1991	2038	1903	1903
q21	4564	4203	4062	4062
q22	599	529	509	509
Total cold run time: 53475 ms
Total hot run time: 51494 ms

@doris-robot

ClickBench: Total hot run time: 28.51 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3d6e7c2e8e9ad6fdfb96cc89e03adfa4003c7f40, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.04	0.04
query3	0.26	0.09	0.08
query4	1.60	0.11	0.12
query5	0.28	0.25	0.26
query6	1.18	0.67	0.67
query7	0.03	0.02	0.02
query8	0.05	0.04	0.04
query9	0.55	0.50	0.48
query10	0.56	0.55	0.53
query11	0.14	0.11	0.09
query12	0.14	0.11	0.10
query13	0.63	0.62	0.61
query14	1.06	1.07	1.08
query15	0.88	0.87	0.89
query16	0.39	0.39	0.40
query17	1.10	1.13	1.11
query18	0.22	0.21	0.21
query19	2.14	2.03	2.10
query20	0.02	0.01	0.02
query21	15.41	0.27	0.15
query22	5.40	0.05	0.05
query23	16.13	0.28	0.11
query24	1.22	0.38	1.18
query25	0.09	0.07	0.12
query26	0.15	0.13	0.14
query27	0.07	0.05	0.06
query28	4.77	1.14	0.97
query29	12.56	3.92	3.13
query30	0.28	0.14	0.12
query31	2.81	0.63	0.42
query32	3.24	0.60	0.50
query33	3.28	3.33	3.29
query34	16.30	5.38	4.72
query35	4.85	4.83	4.80
query36	0.64	0.50	0.48
query37	0.11	0.08	0.07
query38	0.08	0.04	0.04
query39	0.04	0.03	0.03
query40	0.19	0.16	0.15
query41	0.09	0.04	0.04
query42	0.04	0.03	0.02
query43	0.05	0.04	0.04
Total cold run time: 99.17 s
Total hot run time: 28.51 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 77.52% (100/129) 🎉
Increment coverage report
Complete coverage report

@gary-cloud
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31759 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit cd777b776723ba66fa7cf50edc4dacf3a7ea8cec, data reload: false

------ Round 1 ----------------------------------
q1	17632	5329	5069	5069
q2	2006	319	201	201
q3	10209	1276	733	733
q4	10202	827	316	316
q5	7542	2131	1887	1887
q6	194	181	148	148
q7	901	742	604	604
q8	9263	1401	1048	1048
q9	5156	4870	4823	4823
q10	6806	1926	1565	1565
q11	518	285	281	281
q12	336	370	225	225
q13	17796	4039	3239	3239
q14	231	237	213	213
q15	893	823	809	809
q16	674	688	612	612
q17	654	716	576	576
q18	6623	6522	6333	6333
q19	1239	975	606	606
q20	379	338	234	234
q21	2605	2015	1961	1961
q22	358	308	276	276
Total cold run time: 102217 ms
Total hot run time: 31759 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5315	5287	5301	5287
q2	262	345	250	250
q3	2197	2743	2233	2233
q4	1324	1727	1295	1295
q5	4242	4135	4166	4135
q6	219	179	137	137
q7	1898	2281	1820	1820
q8	2647	2446	2499	2446
q9	7401	7466	7380	7380
q10	2900	3057	2554	2554
q11	559	475	452	452
q12	653	732	633	633
q13	3891	4510	3691	3691
q14	298	322	299	299
q15	858	810	809	809
q16	683	926	659	659
q17	1146	1353	1337	1337
q18	8135	7656	7737	7656
q19	899	842	815	815
q20	2057	2148	2011	2011
q21	4669	4235	4020	4020
q22	566	536	524	524
Total cold run time: 52819 ms
Total hot run time: 50443 ms

@doris-robot

ClickBench: Total hot run time: 28.38 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit cd777b776723ba66fa7cf50edc4dacf3a7ea8cec, data reload: false

query1	0.05	0.05	0.05
query2	0.10	0.05	0.04
query3	0.25	0.08	0.08
query4	1.60	0.11	0.11
query5	0.28	0.26	0.25
query6	1.16	0.70	0.67
query7	0.04	0.02	0.03
query8	0.05	0.04	0.04
query9	0.58	0.50	0.50
query10	0.54	0.54	0.55
query11	0.14	0.09	0.10
query12	0.14	0.10	0.10
query13	0.64	0.62	0.62
query14	1.07	1.06	1.09
query15	0.90	0.87	0.88
query16	0.41	0.41	0.40
query17	1.11	1.14	1.18
query18	0.24	0.22	0.22
query19	2.13	1.98	2.06
query20	0.02	0.02	0.02
query21	15.38	0.26	0.14
query22	5.08	0.05	0.05
query23	15.91	0.28	0.10
query24	0.93	0.25	0.35
query25	0.09	0.11	0.06
query26	0.16	0.13	0.14
query27	0.11	0.08	0.04
query28	3.66	1.20	0.97
query29	12.54	3.93	3.16
query30	0.28	0.13	0.11
query31	2.81	0.68	0.40
query32	3.24	0.61	0.51
query33	3.31	3.30	3.23
query34	15.87	5.58	4.81
query35	4.83	4.76	4.80
query36	0.67	0.50	0.50
query37	0.11	0.07	0.06
query38	0.07	0.04	0.04
query39	0.05	0.03	0.04
query40	0.19	0.16	0.15
query41	0.08	0.03	0.02
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 96.91 s
Total hot run time: 28.38 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 81.58% (62/76) 🎉
Increment coverage report
Complete coverage report

@gary-cloud
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32287 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c21e1182dadc032e39e82be4ef9876b82b71f5bb, data reload: false

------ Round 1 ----------------------------------
q1	17661	5340	5067	5067
q2	2043	309	192	192
q3	10227	1325	754	754
q4	10211	825	315	315
q5	7528	2130	1921	1921
q6	193	181	149	149
q7	881	731	603	603
q8	9271	1379	1094	1094
q9	5113	4829	4891	4829
q10	6845	1944	1550	1550
q11	503	282	279	279
q12	355	382	233	233
q13	17770	4071	3194	3194
q14	234	232	223	223
q15	893	825	803	803
q16	678	674	638	638
q17	623	765	518	518
q18	6735	6564	7280	6564
q19	1351	1072	651	651
q20	413	387	264	264
q21	2811	2267	2168	2168
q22	368	331	278	278
Total cold run time: 102707 ms
Total hot run time: 32287 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5522	5436	5484	5436
q2	278	347	246	246
q3	2449	2861	2568	2568
q4	1535	1908	1454	1454
q5	5114	4572	4547	4547
q6	226	182	139	139
q7	2054	1936	1783	1783
q8	2531	2375	2420	2375
q9	7545	7774	7685	7685
q10	2780	3102	2819	2819
q11	562	476	452	452
q12	715	712	582	582
q13	3668	4082	3380	3380
q14	276	304	290	290
q15	852	814	813	813
q16	646	699	649	649
q17	1100	1359	1387	1359
q18	7611	7538	7495	7495
q19	889	804	839	804
q20	1993	2043	1881	1881
q21	4527	4230	4119	4119
q22	598	554	510	510
Total cold run time: 53471 ms
Total hot run time: 51386 ms

@doris-robot

ClickBench: Total hot run time: 28.23 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c21e1182dadc032e39e82be4ef9876b82b71f5bb, data reload: false

query1	0.06	0.05	0.05
query2	0.09	0.05	0.05
query3	0.26	0.08	0.08
query4	1.61	0.11	0.12
query5	0.27	0.24	0.24
query6	1.16	0.68	0.67
query7	0.03	0.02	0.02
query8	0.06	0.04	0.04
query9	0.57	0.50	0.49
query10	0.56	0.55	0.55
query11	0.14	0.10	0.11
query12	0.14	0.10	0.10
query13	0.62	0.62	0.63
query14	1.06	1.07	1.04
query15	0.88	0.86	0.86
query16	0.39	0.42	0.37
query17	1.10	1.06	1.14
query18	0.23	0.24	0.21
query19	2.03	1.96	1.93
query20	0.02	0.02	0.01
query21	15.42	0.26	0.15
query22	5.25	0.06	0.05
query23	16.00	0.28	0.10
query24	1.22	0.38	0.20
query25	0.08	0.11	0.11
query26	0.15	0.13	0.13
query27	0.09	0.05	0.06
query28	3.98	1.17	0.96
query29	12.54	3.89	3.18
query30	0.28	0.13	0.12
query31	2.81	0.64	0.41
query32	3.25	0.59	0.50
query33	3.16	3.25	3.21
query34	16.35	5.40	4.76
query35	4.84	4.84	4.84
query36	0.63	0.50	0.49
query37	0.11	0.07	0.06
query38	0.08	0.04	0.04
query39	0.05	0.03	0.04
query40	0.18	0.16	0.15
query41	0.10	0.04	0.03
query42	0.05	0.03	0.04
query43	0.04	0.04	0.03
Total cold run time: 97.94 s
Total hot run time: 28.23 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 81.58% (62/76) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 64.46% (107/166) 🎉
Increment coverage report
Complete coverage report

@eldenmoon
Member

run buildall

@doris-robot

TPC-H: Total hot run time: 30810 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 566e82c354ca8ade00cdbfb7ed4ae291d3f211ac, data reload: false

------ Round 1 ----------------------------------
q1	17599	4505	4290	4290
q2	2030	366	237	237
q3	10136	1303	731	731
q4	10204	774	303	303
q5	7533	2191	1931	1931
q6	195	190	149	149
q7	893	737	612	612
q8	9262	1399	1118	1118
q9	4687	4616	4639	4616
q10	6743	1975	1557	1557
q11	534	310	300	300
q12	335	386	223	223
q13	17764	4018	3264	3264
q14	227	251	227	227
q15	893	820	810	810
q16	689	727	619	619
q17	696	782	558	558
q18	6565	5923	6291	5923
q19	1116	1085	686	686
q20	558	520	389	389
q21	2766	2011	1976	1976
q22	374	337	291	291
Total cold run time: 101799 ms
Total hot run time: 30810 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4511	4590	4734	4590
q2	275	352	255	255
q3	2392	2938	2453	2453
q4	1492	1909	1493	1493
q5	4583	4434	4551	4434
q6	221	182	139	139
q7	2004	1932	1782	1782
q8	2611	2387	2505	2387
q9	7521	7529	7510	7510
q10	2884	3104	2578	2578
q11	560	471	445	445
q12	699	833	628	628
q13	3761	4434	3524	3524
q14	271	284	277	277
q15	817	786	774	774
q16	634	687	638	638
q17	1073	1262	1287	1262
q18	7644	7704	7374	7374
q19	831	808	816	808
q20	1956	2041	1882	1882
q21	4529	4339	4095	4095
q22	595	588	509	509
Total cold run time: 51864 ms
Total hot run time: 49837 ms

@doris-robot

ClickBench: Total hot run time: 28.26 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 566e82c354ca8ade00cdbfb7ed4ae291d3f211ac, data reload: false

query1	0.05	0.05	0.04
query2	0.10	0.05	0.05
query3	0.25	0.08	0.08
query4	1.61	0.12	0.11
query5	0.27	0.25	0.25
query6	1.16	0.69	0.66
query7	0.04	0.03	0.02
query8	0.05	0.03	0.04
query9	0.57	0.51	0.49
query10	0.55	0.54	0.54
query11	0.14	0.10	0.10
query12	0.15	0.11	0.11
query13	0.65	0.62	0.61
query14	1.08	1.08	1.06
query15	0.89	0.87	0.87
query16	0.41	0.39	0.41
query17	1.08	1.09	1.14
query18	0.23	0.21	0.22
query19	2.12	2.01	2.04
query20	0.02	0.02	0.02
query21	15.41	0.27	0.15
query22	5.27	0.05	0.06
query23	15.85	0.28	0.10
query24	1.42	0.27	0.21
query25	0.09	0.06	0.12
query26	0.16	0.13	0.15
query27	0.06	0.06	0.06
query28	3.07	1.16	0.97
query29	12.65	3.93	3.16
query30	0.29	0.13	0.14
query31	2.81	0.64	0.41
query32	3.24	0.60	0.49
query33	3.23	3.22	3.30
query34	15.86	5.38	4.72
query35	4.79	4.76	4.80
query36	0.64	0.51	0.50
query37	0.12	0.07	0.07
query38	0.08	0.05	0.04
query39	0.05	0.04	0.03
query40	0.21	0.16	0.16
query41	0.09	0.03	0.04
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 96.9 s
Total hot run time: 28.26 s

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 84.34% (140/166) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 93.53% (130/139) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.63% (19418/36898)
Line Coverage 36.17% (180874/500106)
Region Coverage 32.48% (140082/431335)
Branch Coverage 33.50% (60693/181165)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.53% (130/139) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.67% (25911/36154)
Line Coverage 54.34% (270994/498705)
Region Coverage 51.66% (224915/435381)
Branch Coverage 53.20% (96712/181800)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 64.46% (107/166) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.53% (130/139) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.32% (26513/36159)
Line Coverage 56.42% (281482/498882)
Region Coverage 53.90% (234862/435719)
Branch Coverage 55.71% (101316/181873)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.53% (130/139) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.33% (26515/36159)
Line Coverage 56.42% (281477/498882)
Region Coverage 53.90% (234846/435719)
Branch Coverage 55.71% (101318/181873)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 64.46% (107/166) 🎉
Increment coverage report
Complete coverage report

@eldenmoon
Member

run p0 5

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.53% (130/139) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.51% (26581/36159)
Line Coverage 56.65% (282603/498882)
Region Coverage 54.17% (236009/435719)
Branch Coverage 55.91% (101688/181873)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 64.46% (107/166) 🎉
Increment coverage report
Complete coverage report

@eldenmoon
Member

run p0

Contributor

@csun5285 csun5285 left a comment

LGTM

@github-actions
Contributor

github-actions bot commented Feb 9, 2026

PR approved by anyone and no changes requested.

Member

@eldenmoon eldenmoon left a comment

LGTM

@eldenmoon eldenmoon added the kind/feature Categorizes issue or PR as related to a new feature. label Feb 9, 2026
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Feb 9, 2026
@github-actions
Contributor

github-actions bot commented Feb 9, 2026

PR approved by at least one committer and no changes requested.

Copilot AI left a comment

Pull request overview

Implements schema-template-based auto-casting for VARIANT path expressions during Nereids analysis, gated by a new session variable, and aligns glob-pattern matching behavior across FE/BE with expanded regression/unit tests.

Changes:

  • Add session variable enable_variant_schema_auto_cast and apply template-based casts for VARIANT ElementAt paths during expression analysis.
  • Introduce shared glob→regex utilities (FE) and switch BE variant glob matching to RE2 with caching.
  • Add end-to-end regression tests and focused FE/BE unit tests for field matching and auto-cast behavior.
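
The glob→regex utility with a small compiled-pattern cache mentioned in the list above can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the PR's `GlobRegexUtil`: the class name, the cache capacity of 64, and the supported syntax (only `*` and `?`, with `*` also matching dots) are all hypothetical choices made for this example.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class GlobRegexSketch {
    // Hypothetical capacity; the real cache size is not stated in this PR page.
    private static final int CACHE_CAPACITY = 64;

    // Access-ordered LinkedHashMap that evicts the least recently used entry.
    private static final Map<String, Pattern> CACHE = Collections.synchronizedMap(
            new LinkedHashMap<String, Pattern>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, Pattern> eldest) {
                    return size() > CACHE_CAPACITY;
                }
            });

    // Translate a glob into an anchored regex: '*' -> ".*", '?' -> ".",
    // and regex metacharacters are escaped so they match literally.
    static String globToRegex(String glob) {
        StringBuilder sb = new StringBuilder("^");
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                if ("\\.[]{}()+-^$|".indexOf(c) >= 0) {
                    sb.append('\\');
                }
                sb.append(c);
            }
        }
        return sb.append('$').toString();
    }

    // Compile each distinct glob once and reuse the cached Pattern afterwards.
    static boolean matches(String glob, String path) {
        Pattern p = CACHE.computeIfAbsent(glob, g -> Pattern.compile(globToRegex(g)));
        return p.matcher(path).matches();
    }
}
```

For example, `GlobRegexSketch.matches("metrics.*", "metrics.cpu")` is true in this sketch, while `GlobRegexSketch.matches("a?c", "abbc")` is false because `?` stands for exactly one character.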

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| regression-test/suites/variant_p0/predefine/test_schema_template_auto_cast.groovy | New regression suite covering auto-cast across clauses, chained paths, alias/subquery, joins, and disable-switch behavior. |
| regression-test/data/variant_p0/predefine/test_schema_template_auto_cast.out | Expected outputs for the new regression suite. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/types/VariantFieldMatchTest.java | Unit tests for glob/exact matching and VariantType.findMatchingField. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/rules/analysis/ExpressionAnalyzerVariantAutoCastTest.java | Unit tests validating analysis-time cast insertion and chained-path behavior. |
| fe/fe-core/src/test/java/org/apache/doris/common/GlobRegexUtilTest.java | Unit tests for FE glob→regex conversion and cache behavior. |
| fe/fe-core/src/main/java/org/apache/doris/qe/SessionVariable.java | Adds enable_variant_schema_auto_cast session variable definition and accessor. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/types/VariantType.java | Adds findMatchingField helper for schema template lookup. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/types/VariantField.java | Adds matches() for glob/exact matching using FE glob-regex utility. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/scalar/ElementAt.java | Preserves VariantType predefined fields in signature computation to support template matching. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/ExpressionAnalyzer.java | Adds variant schema auto-cast logic for ElementAt and alias binding behavior. |
| fe/fe-core/src/main/java/org/apache/doris/nereids/rules/analysis/CheckAfterRewrite.java | Updates join-equality variant checks to account for casts. |
| fe/fe-core/src/main/java/org/apache/doris/common/GlobRegexUtil.java | New FE glob→regex conversion + small compiled-pattern LRU cache. |
| fe/fe-core/src/main/java/org/apache/doris/catalog/OlapTable.java | Switch inverted-index glob matching to new FE glob-regex matcher. |
| be/test/olap/rowset/segment_v2/variant_util_test.cpp | Adds BE tests for glob→regex conversion and RE2-based matching. |
| be/src/vec/common/variant_util.h | Declares BE glob→regex and glob match helpers. |
| be/src/vec/common/variant_util.cpp | Implements BE glob→regex and RE2 glob matching with an LRU cache; replaces fnmatch usage. |


Comment on lines +139 to +141
std::mutex g_glob_regex_cache_mutex;
std::list<std::string> g_glob_regex_cache_lru;
std::unordered_map<std::string, GlobRegexCacheEntry> g_glob_regex_cache;

Copilot AI Feb 9, 2026

The glob regex cache globals have external linkage (g_glob_regex_cache_mutex, g_glob_regex_cache_lru, g_glob_regex_cache). Since they're only used in this translation unit, consider marking them static or moving them into an anonymous namespace to avoid unintended symbol exports and reduce the chance of name collisions.

Suggested change
- std::mutex g_glob_regex_cache_mutex;
- std::list<std::string> g_glob_regex_cache_lru;
- std::unordered_map<std::string, GlobRegexCacheEntry> g_glob_regex_cache;
+ static std::mutex g_glob_regex_cache_mutex;
+ static std::list<std::string> g_glob_regex_cache_lru;
+ static std::unordered_map<std::string, GlobRegexCacheEntry> g_glob_regex_cache;

Comment on lines +744 to +748
private Expression wrapVariantElementAtWithCast(Expression expr) {
    ElementAt elementAt = (ElementAt) expr;
    if (suppressVariantElementAtCastDepth > 0) {
        return elementAt;
    }

Copilot AI Feb 9, 2026

wrapVariantElementAtWithCast assumes the input is always an ElementAt and unconditionally casts expr to ElementAt. This can throw ClassCastException when auto-cast is enabled and maybeCastAliasExpression passes an Alias whose child is not ElementAt (e.g., StructElement from struct dereference). Consider guarding with instanceof ElementAt (and ideally also ensuring the resolved root type is VariantType) and returning expr unchanged when it doesn't apply.
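
A minimal, self-contained Java sketch of the guard described above follows. The nested `Expression`, `ElementAt`, `StructElement`, and `Cast` types are hypothetical stand-ins that only mirror the names in this comment, and the extra `templateType` parameter is an assumption added to keep the example runnable; none of this is the actual `ExpressionAnalyzer` code.

```java
class AutoCastGuardSketch {
    // Hypothetical stand-ins; the real classes live in the Nereids expression tree.
    interface Expression {}

    static final class ElementAt implements Expression {}

    static final class StructElement implements Expression {}

    static final class Cast implements Expression {
        final Expression child;
        final String targetType;
        Cast(Expression child, String targetType) {
            this.child = child;
            this.targetType = targetType;
        }
    }

    // Return the input unchanged unless it is an ElementAt with a matching
    // template type, so a non-ElementAt child never reaches a blind downcast.
    static Expression wrapVariantElementAtWithCast(Expression expr, String templateType) {
        if (!(expr instanceof ElementAt)) {
            return expr; // e.g. StructElement from a struct dereference
        }
        if (templateType == null) {
            return expr; // assumption: no template match means no cast
        }
        return new Cast(expr, templateType);
    }
}
```

With the guard inside the helper, callers that pass an alias child of any expression type get a safe no-op instead of a `ClassCastException`.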

        return alias;
    }
    Expression child = alias.child();
    Expression casted = wrapVariantElementAtWithCast(child);

Copilot AI Feb 9, 2026

maybeCastAliasExpression calls wrapVariantElementAtWithCast(alias.child()) without checking the child expression type. When Alias wraps non-ElementAt expressions (e.g., StructElement from nested struct access), this will crash due to the cast inside wrapVariantElementAtWithCast. Add a type check (e.g., only attempt when child is ElementAt) or make wrapVariantElementAtWithCast safely no-op for non-ElementAt inputs.

Suggested change
- Expression casted = wrapVariantElementAtWithCast(child);
+ Expression casted = child;
+ if (child instanceof ElementAt) {
+     casted = wrapVariantElementAtWithCast(child);
+ }

}

private boolean containsVariantTypeOutsideCast(Expression expr, boolean underCast) {
    boolean nextUnderCast = underCast || expr instanceof Cast;

Copilot AI Feb 9, 2026

containsVariantTypeOutsideCast treats any Cast node as a safe boundary and ignores variant-typed expressions under it. This can incorrectly allow join equal conditions that still evaluate to VARIANT (e.g., a cast whose target type is VariantType / identity casts), bypassing the intended restriction. Consider only suppressing the check when the Cast's result type is non-variant (or checking expr instanceof Cast && !expr.getDataType().isVariantType()).
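
The difference can be illustrated with a small Java sketch over a toy expression tree. Only the first line of the real method appears in this review, so the rest of the traversal below is an assumption, and `Node` is a hypothetical stand-in that records just whether a node is a cast and whether its result type is VARIANT.

```java
import java.util.Arrays;
import java.util.List;

class VariantJoinCheckSketch {
    // Hypothetical stand-in for an expression node.
    static final class Node {
        final boolean isCast;
        final boolean variantTyped;
        final List<Node> children;
        Node(boolean isCast, boolean variantTyped, Node... children) {
            this.isCast = isCast;
            this.variantTyped = variantTyped;
            this.children = Arrays.asList(children);
        }
    }

    // A cast only shields its subtree when its own result type is not VARIANT.
    static boolean containsVariantTypeOutsideCast(Node expr, boolean underCast) {
        boolean nextUnderCast = underCast || (expr.isCast && !expr.variantTyped);
        if (!nextUnderCast && expr.variantTyped) {
            return true;
        }
        for (Node child : expr.children) {
            if (containsVariantTypeOutsideCast(child, nextUnderCast)) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch a variant leaf under a cast to a non-variant type is accepted, while a cast whose result is still VARIANT is reported, which is the case the comment says the original condition would miss.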

Suggested change
- boolean nextUnderCast = underCast || expr instanceof Cast;
+ boolean nextUnderCast = underCast || (expr instanceof Cast && !expr.getDataType().isVariantType());

Comment on lines +738 to +741
if (sessionVariable == null || !sessionVariable.isEnableVariantSchemaAutoCast()) {
    return false;
}
return sessionVariable.isEnableVariantSchemaAutoCast();

Copilot AI Feb 9, 2026

isEnableVariantSchemaAutoCast() redundantly checks and then returns sessionVariable.isEnableVariantSchemaAutoCast() twice. This can be simplified to a single return expression to reduce noise and avoid future inconsistencies if the logic changes.

Suggested change
- if (sessionVariable == null || !sessionVariable.isEnableVariantSchemaAutoCast()) {
-     return false;
- }
- return sessionVariable.isEnableVariantSchemaAutoCast();
+ return sessionVariable != null && sessionVariable.isEnableVariantSchemaAutoCast();

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 93.53% (130/139) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.34% (26518/36159)
Line Coverage 56.43% (281508/498882)
Region Coverage 53.90% (234864/435719)
Branch Coverage 55.72% (101332/181873)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 64.46% (107/166) 🎉
Increment coverage report
Complete coverage report

@eldenmoon eldenmoon merged commit 3ad74c0 into apache:master Feb 9, 2026
38 of 40 checks passed

Labels

approved: Indicates a PR has been approved by one committer.
dev/4.1.x
kind/feature: Categorizes issue or PR as related to a new feature.
reviewed

6 participants