@zhiqiang-hhhh
Contributor

@zhiqiang-hhhh zhiqiang-hhhh commented Jan 29, 2026

Before this change, if the number of rows available to train the index was below the required minimum, import or compaction could fail, which severely impacted user experience. With this change, Doris automatically determines whether training and index generation are needed. When there are too few rows to train on, index construction is skipped and queries fall back to brute-force computation.

min_train_rows is calculated as follows:

  1. IVF requires at least nlist rows.
  2. PQ requires at least 2^pq_nbits * 100 rows.

The required minimum number of rows is the larger of the two.
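
As a rough illustration of the rule above, here is a minimal C++ sketch. It is not the code in this PR: the struct `AnnTrainParams`, the helper `should_build_index`, and the convention that `pq_nbits == 0` means PQ is disabled are all assumptions made for the example. Only the two thresholds and the max come from the description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical parameter bundle; field names are illustrative only.
struct AnnTrainParams {
    int64_t nlist = 0;  // number of IVF clusters
    int pq_nbits = 0;   // bits per PQ code; 0 is assumed to mean "PQ not used"
};

// IVF needs at least nlist rows; PQ needs at least 2^pq_nbits * 100 rows.
// The required minimum is the larger of the two.
inline int64_t min_train_rows(const AnnTrainParams& p) {
    assert(p.nlist >= 0 && p.pq_nbits >= 0 && p.pq_nbits < 32);
    const int64_t ivf_rows = p.nlist;
    const int64_t pq_rows =
            p.pq_nbits > 0 ? (int64_t{1} << p.pq_nbits) * 100 : 0;
    return std::max(ivf_rows, pq_rows);
}

// Returns true when there are enough rows to train and build the index.
// When it returns false, the caller would skip index construction and
// let queries fall back to brute-force computation.
inline bool should_build_index(int64_t num_rows, const AnnTrainParams& p) {
    return num_rows >= min_train_rows(p);
}
```

For example, with nlist = 1024 and pq_nbits = 8 the PQ term dominates: 2^8 * 100 = 25600 rows, so a segment with 20000 rows would skip index construction under this sketch.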

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Jan 29, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32707 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2d278478a0f238840f14b45790c04ee387019ca4, data reload: false

------ Round 1 ----------------------------------
q1	17658	5294	5080	5080
q2	2057	308	197	197
q3	10222	1366	759	759
q4	10235	809	322	322
q5	8411	2221	1913	1913
q6	224	182	159	159
q7	917	794	612	612
q8	9271	1453	1097	1097
q9	5600	4835	4952	4835
q10	6868	1977	1568	1568
q11	519	300	281	281
q12	388	381	230	230
q13	17815	4041	3216	3216
q14	250	242	217	217
q15	908	828	822	822
q16	674	685	627	627
q17	700	807	453	453
q18	7125	6961	7340	6961
q19	1609	1057	628	628
q20	413	377	242	242
q21	2926	2225	2188	2188
q22	374	332	300	300
Total cold run time: 105164 ms
Total hot run time: 32707 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5574	5599	5572	5572
q2	268	364	303	303
q3	2335	2883	2628	2628
q4	1466	1993	1451	1451
q5	4751	4512	4437	4437
q6	225	178	138	138
q7	2157	1949	1790	1790
q8	2605	2482	2781	2482
q9	7446	7426	7540	7426
q10	2759	2888	2416	2416
q11	513	452	432	432
q12	618	711	546	546
q13	3562	4003	3204	3204
q14	266	299	265	265
q15	843	799	792	792
q16	642	677	634	634
q17	1078	1268	1338	1268
q18	7603	7378	7336	7336
q19	838	777	803	777
q20	1952	2032	1896	1896
q21	4485	4218	4033	4033
q22	582	550	517	517
Total cold run time: 52568 ms
Total hot run time: 50343 ms

@doris-robot

ClickBench: Total hot run time: 28.28 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2d278478a0f238840f14b45790c04ee387019ca4, data reload: false

query1	0.05	0.05	0.04
query2	0.10	0.05	0.05
query3	0.26	0.09	0.09
query4	1.61	0.11	0.10
query5	0.27	0.25	0.25
query6	1.16	0.69	0.67
query7	0.03	0.03	0.03
query8	0.05	0.04	0.04
query9	0.57	0.49	0.50
query10	0.56	0.54	0.55
query11	0.14	0.10	0.09
query12	0.14	0.11	0.10
query13	0.64	0.61	0.61
query14	1.04	1.07	1.05
query15	0.88	0.86	0.86
query16	0.41	0.41	0.40
query17	1.12	1.14	1.17
query18	0.23	0.22	0.21
query19	1.99	1.99	2.06
query20	0.02	0.02	0.01
query21	15.40	0.25	0.15
query22	5.20	0.06	0.05
query23	16.19	0.28	0.10
query24	0.93	0.23	0.58
query25	0.06	0.10	0.08
query26	0.15	0.14	0.13
query27	0.08	0.06	0.07
query28	3.88	1.14	0.96
query29	12.59	3.95	3.18
query30	0.27	0.13	0.11
query31	2.82	0.63	0.40
query32	3.24	0.59	0.49
query33	3.28	3.21	3.26
query34	16.37	5.37	4.73
query35	4.80	4.78	4.82
query36	0.65	0.50	0.49
query37	0.11	0.07	0.06
query38	0.08	0.04	0.04
query39	0.05	0.03	0.04
query40	0.20	0.16	0.15
query41	0.10	0.03	0.03
query42	0.04	0.03	0.03
query43	0.06	0.04	0.03
Total cold run time: 97.82 s
Total hot run time: 28.28 s

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31776 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b78079c1e4740ab892bab06fd2d818b8ec8f2579, data reload: false

------ Round 1 ----------------------------------
q1	17649	5271	5056	5056
q2	2031	310	192	192
q3	10203	1377	774	774
q4	10233	870	317	317
q5	8217	2199	1920	1920
q6	228	188	153	153
q7	913	729	609	609
q8	9267	1425	1124	1124
q9	5494	4904	4784	4784
q10	6859	1972	1571	1571
q11	537	299	287	287
q12	380	382	227	227
q13	17775	4072	3267	3267
q14	240	238	227	227
q15	927	827	816	816
q16	704	667	626	626
q17	656	843	448	448
q18	7587	6554	6396	6396
q19	1235	981	642	642
q20	398	357	230	230
q21	2778	2140	1836	1836
q22	364	319	274	274
Total cold run time: 104675 ms
Total hot run time: 31776 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5346	5361	5340	5340
q2	268	339	279	279
q3	2175	2705	2268	2268
q4	1383	1760	1294	1294
q5	4319	4264	4610	4264
q6	270	212	144	144
q7	2191	1950	1789	1789
q8	2588	2425	2371	2371
q9	7699	7523	7713	7523
q10	2818	3040	2681	2681
q11	537	493	456	456
q12	654	780	666	666
q13	4039	4318	3508	3508
q14	289	303	287	287
q15	925	872	826	826
q16	689	747	918	747
q17	1235	1408	1363	1363
q18	8219	8014	7840	7840
q19	905	869	889	869
q20	2079	2145	1977	1977
q21	4837	4276	4164	4164
q22	585	569	515	515
Total cold run time: 54050 ms
Total hot run time: 51171 ms

@doris-robot

ClickBench: Total hot run time: 28.48 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b78079c1e4740ab892bab06fd2d818b8ec8f2579, data reload: false

query1	0.06	0.04	0.04
query2	0.10	0.05	0.04
query3	0.25	0.08	0.08
query4	1.61	0.11	0.10
query5	0.29	0.25	0.25
query6	1.16	0.69	0.68
query7	0.04	0.03	0.02
query8	0.05	0.04	0.05
query9	0.56	0.51	0.49
query10	0.54	0.55	0.54
query11	0.14	0.09	0.10
query12	0.14	0.11	0.10
query13	0.65	0.63	0.62
query14	1.08	1.06	1.05
query15	0.87	0.87	0.88
query16	0.39	0.40	0.40
query17	1.14	1.10	1.14
query18	0.22	0.22	0.21
query19	1.98	1.97	1.93
query20	0.02	0.02	0.01
query21	15.41	0.27	0.15
query22	5.35	0.05	0.05
query23	16.15	0.28	0.10
query24	1.24	0.49	0.59
query25	0.09	0.12	0.09
query26	0.13	0.13	0.13
query27	0.06	0.06	0.06
query28	4.31	1.16	0.97
query29	12.60	3.93	3.17
query30	0.28	0.13	0.12
query31	2.83	0.62	0.41
query32	3.23	0.60	0.49
query33	3.28	3.36	3.23
query34	16.12	5.37	4.74
query35	4.81	4.78	4.74
query36	0.64	0.50	0.48
query37	0.11	0.07	0.07
query38	0.07	0.04	0.04
query39	0.05	0.04	0.04
query40	0.20	0.16	0.17
query41	0.09	0.03	0.04
query42	0.05	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 98.44 s
Total hot run time: 28.48 s

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33229 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 431a87e10cb4b1bb5332e2899f5881ccf26a0eba, data reload: false

------ Round 1 ----------------------------------
q1	17621	5353	5180	5180
q2	2002	314	190	190
q3	10420	1335	761	761
q4	10313	809	327	327
q5	9239	2215	1949	1949
q6	211	179	150	150
q7	898	743	612	612
q8	9265	1450	1207	1207
q9	5224	4838	4833	4833
q10	6859	1973	1577	1577
q11	528	295	285	285
q12	395	383	234	234
q13	17815	4087	3231	3231
q14	247	236	218	218
q15	899	832	822	822
q16	678	687	617	617
q17	901	857	458	458
q18	7205	7057	7436	7057
q19	1577	1074	676	676
q20	420	380	246	246
q21	3086	2378	2307	2307
q22	375	331	292	292
Total cold run time: 106178 ms
Total hot run time: 33229 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5601	5502	5568	5502
q2	268	355	259	259
q3	2425	2916	2495	2495
q4	1454	2007	1464	1464
q5	4778	4470	4538	4470
q6	226	186	142	142
q7	2000	1997	1877	1877
q8	2593	2412	2445	2412
q9	7647	7506	7440	7440
q10	2612	2822	2428	2428
q11	527	458	433	433
q12	644	701	555	555
q13	3574	4067	3265	3265
q14	279	297	262	262
q15	847	813	809	809
q16	661	684	642	642
q17	1092	1227	1239	1227
q18	7582	7312	7459	7312
q19	867	835	849	835
q20	1987	2084	1909	1909
q21	4656	4359	4126	4126
q22	581	554	529	529
Total cold run time: 52901 ms
Total hot run time: 50393 ms

@doris-robot

ClickBench: Total hot run time: 28.19 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 431a87e10cb4b1bb5332e2899f5881ccf26a0eba, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.04	0.04
query3	0.26	0.09	0.08
query4	1.60	0.11	0.11
query5	0.27	0.25	0.24
query6	1.16	0.70	0.69
query7	0.03	0.03	0.03
query8	0.05	0.03	0.04
query9	0.57	0.50	0.50
query10	0.55	0.55	0.53
query11	0.14	0.10	0.09
query12	0.14	0.10	0.11
query13	0.63	0.62	0.62
query14	1.07	1.06	1.04
query15	0.87	0.85	0.88
query16	0.40	0.38	0.40
query17	1.15	1.13	1.14
query18	0.22	0.21	0.21
query19	2.05	1.90	1.97
query20	0.02	0.02	0.01
query21	15.41	0.28	0.16
query22	5.32	0.05	0.04
query23	16.25	0.26	0.11
query24	0.94	0.50	0.19
query25	0.09	0.12	0.08
query26	0.16	0.13	0.13
query27	0.08	0.07	0.05
query28	3.42	1.16	0.96
query29	12.56	3.97	3.16
query30	0.30	0.13	0.12
query31	2.80	0.62	0.41
query32	3.25	0.60	0.50
query33	3.31	3.29	3.28
query34	15.83	5.39	4.75
query35	4.75	4.78	4.80
query36	0.65	0.51	0.49
query37	0.12	0.07	0.06
query38	0.08	0.04	0.04
query39	0.05	0.03	0.03
query40	0.19	0.16	0.14
query41	0.09	0.03	0.04
query42	0.04	0.04	0.03
query43	0.05	0.04	0.03
Total cold run time: 97.06 s
Total hot run time: 28.19 s

@doris-robot

BE UT Coverage Report

Increment line coverage 64.41% (76/118) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.49% (19275/36720)
Line Coverage 35.98% (179143/497892)
Region Coverage 32.43% (139220/429240)
Branch Coverage 33.34% (60112/180316)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 77.24% (95/123) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.51% (25741/35995)
Line Coverage 54.18% (269131/496705)
Region Coverage 51.80% (224609/433637)
Branch Coverage 53.12% (96169/181044)

@zhiqiang-hhhh
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31816 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b2ec788892d845487708874d5f8c6c6e08637d69, data reload: false

------ Round 1 ----------------------------------
q1	17669	5280	5029	5029
q2	2035	307	188	188
q3	10194	1319	739	739
q4	10206	827	309	309
q5	7540	2149	1892	1892
q6	202	187	152	152
q7	856	723	609	609
q8	9257	1378	1080	1080
q9	5193	4815	4818	4815
q10	6821	1956	1568	1568
q11	515	301	277	277
q12	334	386	224	224
q13	17772	4049	3270	3270
q14	236	246	216	216
q15	893	813	828	813
q16	692	670	628	628
q17	628	813	497	497
q18	6917	6506	6452	6452
q19	1114	979	628	628
q20	380	341	229	229
q21	2624	2023	1928	1928
q22	353	310	273	273
Total cold run time: 102431 ms
Total hot run time: 31816 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5310	5251	5274	5251
q2	262	345	260	260
q3	2211	2723	2261	2261
q4	1383	1799	1319	1319
q5	4251	4183	4211	4183
q6	217	180	139	139
q7	2130	2140	1860	1860
q8	2677	2403	2386	2386
q9	7633	7421	7576	7421
q10	2877	3134	2659	2659
q11	573	473	454	454
q12	719	731	613	613
q13	3850	4445	3619	3619
q14	301	323	329	323
q15	865	856	813	813
q16	665	735	702	702
q17	1166	1340	1352	1340
q18	8058	8135	7805	7805
q19	884	852	867	852
q20	2111	2141	1898	1898
q21	4519	4286	4177	4177
q22	554	559	508	508
Total cold run time: 53216 ms
Total hot run time: 50843 ms

@doris-robot

ClickBench: Total hot run time: 28.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b2ec788892d845487708874d5f8c6c6e08637d69, data reload: false

query1	0.05	0.05	0.05
query2	0.10	0.04	0.04
query3	0.26	0.08	0.08
query4	1.60	0.11	0.10
query5	0.27	0.25	0.25
query6	1.19	0.66	0.67
query7	0.03	0.03	0.02
query8	0.06	0.04	0.04
query9	0.58	0.49	0.50
query10	0.55	0.55	0.55
query11	0.14	0.10	0.10
query12	0.15	0.11	0.10
query13	0.64	0.61	0.60
query14	1.06	1.06	1.06
query15	0.87	0.87	0.87
query16	0.40	0.40	0.40
query17	1.15	1.11	1.15
query18	0.23	0.21	0.21
query19	2.01	2.01	2.08
query20	0.02	0.02	0.02
query21	15.39	0.25	0.14
query22	5.07	0.06	0.05
query23	15.79	0.29	0.11
query24	2.00	0.24	0.32
query25	0.08	0.10	0.08
query26	0.14	0.12	0.13
query27	0.07	0.06	0.06
query28	3.67	1.15	0.96
query29	12.59	3.90	3.20
query30	0.28	0.13	0.12
query31	2.82	0.63	0.40
query32	3.23	0.59	0.50
query33	3.19	3.25	3.30
query34	16.41	5.37	4.71
query35	4.78	4.84	4.81
query36	0.65	0.52	0.49
query37	0.10	0.07	0.07
query38	0.07	0.05	0.04
query39	0.04	0.04	0.03
query40	0.18	0.16	0.16
query41	0.09	0.03	0.03
query42	0.04	0.03	0.03
query43	0.05	0.04	0.04
Total cold run time: 98.09 s
Total hot run time: 28.4 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 64.41% (76/118) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.49% (19275/36720)
Line Coverage 35.98% (179146/497892)
Region Coverage 32.45% (139285/429240)
Branch Coverage 33.34% (60115/180316)
