@liaoxin01
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

The async lambdas in TabletStream captured a raw 'this' pointer, which could dangle if the TabletStream object was destroyed before the async task completed (e.g., when the thrift connection is broken).

Fix this by capturing a shared_ptr obtained from shared_from_this() instead of the raw pointer, so the object stays alive until all async tasks complete.
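
The capture pattern described above can be sketched as follows. This is a minimal illustration, not the Doris code: `Stream` and `DeferredPool` are hypothetical stand-ins for TabletStream and its thread pool, and the pool runs tasks on the calling thread so that the ordering is deterministic.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread pool: tasks are queued and run later.
struct DeferredPool {
    std::vector<std::function<void()>> tasks;
    void submit(std::function<void()> task) { tasks.push_back(std::move(task)); }
    void run_all() {
        for (auto& task : tasks) task();
        tasks.clear();  // dropping the lambdas releases their captured shared_ptrs
    }
};

// Hypothetical stand-in for TabletStream (assumption, not the real class).
class Stream : public std::enable_shared_from_this<Stream> {
public:
    explicit Stream(int* destroyed) : _destroyed(destroyed) {}
    ~Stream() { *_destroyed = 1; }

    void append_data(DeferredPool& pool, int bytes) {
        // Capture a shared_ptr instead of raw `this`: the queued task now
        // shares ownership, so the object cannot be destroyed before it runs.
        auto self = shared_from_this();
        pool.submit([self, bytes]() { self->_received += bytes; });
    }

private:
    int _received = 0;
    int* _destroyed;  // set to 1 by the destructor so the demo can observe it
};

// Returns true if the object survived the owner's reset() until the task ran,
// and was destroyed once the task had been released.
bool demo_stream_outlives_owner() {
    int destroyed = 0;
    DeferredPool pool;
    auto stream = std::make_shared<Stream>(&destroyed);
    stream->append_data(pool, 128);
    stream.reset();  // the owner goes away before the task has run
    bool alive_before_run = (destroyed == 0);
    pool.run_all();  // the task runs against a still-valid object
    bool destroyed_after_run = (destroyed == 1);
    return alive_before_run && destroyed_after_run;
}
```

With a raw `this` capture, the same sequence (owner reset, then task run) would access a destroyed object, which is the use-after-free this PR targets.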

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Copilot AI review requested due to automatic review settings January 22, 2026 08:55
@Thearas
Contributor

Thearas commented Jan 22, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01
Contributor Author

run buildall

Copilot AI left a comment

Pull request overview

This PR fixes a use-after-free bug in TabletStream where async lambdas captured raw this pointers that could become dangling when the TabletStream object is destroyed before async tasks complete (e.g., when thrift connection is broken).

Changes:

  • Made TabletStream inherit from std::enable_shared_from_this<TabletStream> to enable safe shared_ptr capture
  • Updated async lambdas in append_data(), add_segment(), and _run_in_heavy_work_pool() to capture shared_from_this() instead of raw this
  • Replaced direct member access inside the async lambdas with access through self->
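
One consequence of inheriting from std::enable_shared_from_this is worth keeping in mind: shared_from_this() is only valid on an object that is already owned by a std::shared_ptr. The sketch below illustrates this with a hypothetical `Tablet` class (not the Doris TabletStream) and assumes C++17 semantics, where the call throws std::bad_weak_ptr if no shared_ptr owns the object. Whether every TabletStream in Doris is created through a shared_ptr is not shown in this conversation.

```cpp
#include <cassert>
#include <memory>

// Hypothetical class for illustration only.
class Tablet : public std::enable_shared_from_this<Tablet> {
public:
    std::shared_ptr<Tablet> self() { return shared_from_this(); }
};

// Returns true if shared_from_this() threw std::bad_weak_ptr for an object
// that is not owned by any shared_ptr (the C++17 behavior).
bool throws_without_shared_owner() {
    Tablet on_stack;  // no shared_ptr owns this object
    try {
        auto p = on_stack.self();
        (void)p;
    } catch (const std::bad_weak_ptr&) {
        return true;
    }
    return false;
}

// Returns true if shared_from_this() yields a second owner of the same object.
bool works_with_shared_owner() {
    auto owned = std::make_shared<Tablet>();
    auto p = owned->self();
    return p.get() == owned.get() && owned.use_count() == 2;
}
```

For the same reason, shared_from_this() should not be called from a constructor, since no shared_ptr owns the object yet at that point.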

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| be/src/runtime/load_stream.h | Added std::enable_shared_from_this<TabletStream> inheritance to TabletStream class |
| be/src/runtime/load_stream.cpp | Updated three async lambda captures in TabletStream methods to use shared_from_this() instead of raw this pointer, ensuring object lifetime safety |

@doris-robot

TPC-H: Total hot run time: 31003 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d4cc73673f7a85760901c06a1bd27cc3ae02e432, data reload: false

------ Round 1 ----------------------------------
q1	17658	4791	4555	4555
q2	2035	311	197	197
q3	10233	1294	774	774
q4	10203	817	300	300
q5	7522	2128	1802	1802
q6	192	174	139	139
q7	832	712	602	602
q8	9258	1408	1166	1166
q9	4899	4688	4494	4494
q10	6742	1703	1298	1298
q11	476	283	288	283
q12	343	372	223	223
q13	17775	3824	3118	3118
q14	232	238	212	212
q15	589	520	529	520
q16	655	637	611	611
q17	651	754	516	516
q18	6606	6444	6426	6426
q19	1268	983	637	637
q20	397	354	231	231
q21	2661	1996	1935	1935
q22	1031	986	964	964
Total cold run time: 102258 ms
Total hot run time: 31003 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4769	4723	4693	4693
q2	324	403	326	326
q3	2168	2675	2260	2260
q4	1363	1811	1332	1332
q5	4126	3996	4038	3996
q6	211	175	137	137
q7	1931	1841	1749	1749
q8	2871	2487	2453	2453
q9	7365	7209	7266	7209
q10	2675	2721	2333	2333
q11	558	520	473	473
q12	707	769	631	631
q13	3623	4208	3386	3386
q14	316	363	304	304
q15	576	512	513	512
q16	677	693	648	648
q17	1138	1363	1357	1357
q18	8194	8298	7824	7824
q19	990	938	873	873
q20	1980	2145	1926	1926
q21	4902	4519	4400	4400
q22	1132	1096	976	976
Total cold run time: 52596 ms
Total hot run time: 49798 ms

@doris-robot

TPC-DS: Total hot run time: 173013 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d4cc73673f7a85760901c06a1bd27cc3ae02e432, data reload: false

query5	4393	634	503	503
query6	316	207	196	196
query7	4237	447	268	268
query8	331	251	227	227
query9	8697	2862	2863	2862
query10	428	322	280	280
query11	15390	15178	15044	15044
query12	202	122	115	115
query13	1260	460	368	368
query14	6493	3101	2840	2840
query14_1	2728	2661	2672	2661
query15	199	193	180	180
query16	1026	521	467	467
query17	1094	692	575	575
query18	2599	439	339	339
query19	203	182	157	157
query20	123	120	122	120
query21	223	141	120	120
query22	3823	3871	3791	3791
query23	16005	15607	15363	15363
query23_1	15559	15503	15381	15381
query24	7225	1549	1191	1191
query24_1	1155	1182	1196	1182
query25	544	456	414	414
query26	1231	273	157	157
query27	2758	455	281	281
query28	4499	2161	2154	2154
query29	799	555	460	460
query30	310	240	210	210
query31	850	640	575	575
query32	90	79	78	78
query33	542	371	322	322
query34	908	891	534	534
query35	731	758	675	675
query36	843	920	851	851
query37	139	105	103	103
query38	2746	2699	2726	2699
query39	796	748	731	731
query39_1	731	711	716	711
query40	220	142	121	121
query41	73	68	67	67
query42	101	94	92	92
query43	425	442	412	412
query44	1314	759	748	748
query45	195	192	187	187
query46	848	957	631	631
query47	1380	1441	1378	1378
query48	313	338	247	247
query49	603	421	342	342
query50	683	279	200	200
query51	3718	3782	3779	3779
query52	90	103	81	81
query53	208	235	167	167
query54	281	251	243	243
query55	81	80	76	76
query56	290	291	291	291
query57	1017	1058	909	909
query58	271	254	280	254
query59	2153	2133	1978	1978
query60	332	340	310	310
query61	151	140	143	140
query62	397	357	312	312
query63	198	162	163	162
query64	4870	1117	825	825
query65	3775	3724	3749	3724
query66	1413	415	313	313
query67	15678	15459	15497	15459
query68	2434	1059	709	709
query69	409	324	275	275
query70	1000	933	931	931
query71	309	335	267	267
query72	5289	3103	3224	3103
query73	619	728	305	305
query74	8752	8733	8631	8631
query75	2285	2312	1901	1901
query76	2286	1050	676	676
query77	362	377	316	316
query78	9771	10010	9186	9186
query79	1052	893	582	582
query80	1150	511	450	450
query81	531	263	230	230
query82	1376	150	117	117
query83	349	257	245	245
query84	250	116	99	99
query85	987	476	410	410
query86	363	293	328	293
query87	2856	2892	2856	2856
query88	3470	2595	2557	2557
query89	310	263	241	241
query90	1784	169	162	162
query91	179	156	136	136
query92	75	75	68	68
query93	1054	1034	644	644
query94	549	332	299	299
query95	576	341	382	341
query96	632	516	233	233
query97	2387	2393	2335	2335
query98	212	202	193	193
query99	597	592	520	520
Total cold run time: 246632 ms
Total hot run time: 173013 ms

@doris-robot

ClickBench: Total hot run time: 26.72 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d4cc73673f7a85760901c06a1bd27cc3ae02e432, data reload: false

query1	0.05	0.04	0.05
query2	0.09	0.04	0.05
query3	0.26	0.08	0.08
query4	1.61	0.11	0.11
query5	0.27	0.27	0.26
query6	1.15	0.66	0.66
query7	0.03	0.02	0.03
query8	0.05	0.04	0.04
query9	0.57	0.51	0.49
query10	0.55	0.55	0.55
query11	0.15	0.09	0.09
query12	0.14	0.11	0.11
query13	0.61	0.59	0.59
query14	0.95	0.93	0.94
query15	0.80	0.78	0.79
query16	0.42	0.39	0.39
query17	1.00	1.00	1.07
query18	0.24	0.21	0.21
query19	1.96	1.79	1.86
query20	0.02	0.01	0.01
query21	15.51	0.25	0.13
query22	5.42	0.05	0.05
query23	16.04	0.28	0.10
query24	1.14	0.61	0.23
query25	0.06	0.08	0.06
query26	0.13	0.13	0.13
query27	0.08	0.08	0.08
query28	4.15	1.09	0.88
query29	12.53	3.99	3.21
query30	0.29	0.17	0.11
query31	2.83	0.63	0.40
query32	3.24	0.56	0.46
query33	2.98	3.02	3.09
query34	16.03	5.11	4.43
query35	4.47	4.51	4.42
query36	0.63	0.50	0.49
query37	0.12	0.06	0.07
query38	0.07	0.03	0.03
query39	0.05	0.03	0.03
query40	0.16	0.14	0.13
query41	0.10	0.03	0.02
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 97.04 s
Total hot run time: 26.72 s

@liaoxin01 liaoxin01 force-pushed the fix-tablet-stream-use-after-free-master branch from d4cc736 to 1cdf449 Compare January 22, 2026 09:38
@liaoxin01
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 31159 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1cdf44902ed42e446d9c346b6a62c721ee11ff59, data reload: false

------ Round 1 ----------------------------------
q1	17684	4806	4582	4582
q2	2031	313	197	197
q3	10239	1280	730	730
q4	10202	821	304	304
q5	7518	2073	1890	1890
q6	186	166	141	141
q7	865	719	574	574
q8	9254	1368	1145	1145
q9	4952	4596	4687	4596
q10	6821	1635	1235	1235
q11	512	303	283	283
q12	340	382	236	236
q13	17794	3843	3045	3045
q14	255	240	210	210
q15	591	522	534	522
q16	650	629	568	568
q17	647	799	513	513
q18	6676	6248	6631	6248
q19	1308	1032	681	681
q20	431	406	250	250
q21	3139	2292	2148	2148
q22	1124	1061	1070	1061
Total cold run time: 103219 ms
Total hot run time: 31159 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5066	4870	4916	4870
q2	346	419	348	348
q3	2355	2942	2518	2518
q4	1402	1846	1406	1406
q5	4576	4343	4463	4343
q6	213	168	135	135
q7	2126	2052	2024	2024
q8	2543	2420	2424	2420
q9	7327	7301	7084	7084
q10	2471	2840	2309	2309
q11	552	477	437	437
q12	717	749	655	655
q13	3654	3815	3150	3150
q14	264	279	271	271
q15	536	495	492	492
q16	612	657	607	607
q17	1085	1253	1273	1253
q18	7193	7439	7170	7170
q19	857	773	806	773
q20	1888	1968	1824	1824
q21	4540	4257	4109	4109
q22	1066	1002	983	983
Total cold run time: 51389 ms
Total hot run time: 49181 ms

@doris-robot

TPC-DS: Total hot run time: 172530 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1cdf44902ed42e446d9c346b6a62c721ee11ff59, data reload: false

query5	4422	642	503	503
query6	319	213	201	201
query7	4220	453	261	261
query8	325	251	223	223
query9	8652	2855	2850	2850
query10	459	322	279	279
query11	15277	15174	14906	14906
query12	184	116	116	116
query13	1243	486	396	396
query14	6049	3037	2777	2777
query14_1	2686	2675	2692	2675
query15	203	195	173	173
query16	976	472	451	451
query17	1105	687	577	577
query18	2434	432	328	328
query19	198	177	158	158
query20	120	121	115	115
query21	217	138	117	117
query22	3824	3907	3832	3832
query23	15956	15549	15255	15255
query23_1	15555	15502	15324	15324
query24	7184	1569	1178	1178
query24_1	1184	1190	1169	1169
query25	558	420	386	386
query26	1236	266	144	144
query27	2771	437	270	270
query28	4582	2127	2129	2127
query29	749	524	421	421
query30	307	236	205	205
query31	797	624	556	556
query32	80	74	73	73
query33	512	343	317	317
query34	915	877	530	530
query35	704	737	664	664
query36	881	934	774	774
query37	131	90	84	84
query38	2757	2673	2643	2643
query39	775	736	730	730
query39_1	699	710	731	710
query40	217	141	114	114
query41	65	77	60	60
query42	97	88	90	88
query43	440	465	441	441
query44	1340	745	750	745
query45	190	189	186	186
query46	838	928	583	583
query47	1392	1413	1375	1375
query48	320	329	246	246
query49	594	433	333	333
query50	691	280	205	205
query51	3860	3794	3751	3751
query52	90	97	80	80
query53	205	219	174	174
query54	271	260	245	245
query55	85	80	75	75
query56	293	297	306	297
query57	1032	1008	926	926
query58	264	254	262	254
query59	2110	2076	2029	2029
query60	316	320	311	311
query61	148	148	142	142
query62	398	345	313	313
query63	189	163	163	163
query64	4906	1146	833	833
query65	3781	3686	3799	3686
query66	1457	424	311	311
query67	15501	15585	15582	15582
query68	2464	1073	711	711
query69	391	303	273	273
query70	1020	937	839	839
query71	292	276	275	275
query72	5313	3238	3388	3238
query73	620	732	315	315
query74	8709	8739	8517	8517
query75	2326	2353	1904	1904
query76	2279	1049	689	689
query77	362	396	310	310
query78	9721	9969	9117	9117
query79	1079	931	587	587
query80	1391	529	457	457
query81	522	266	230	230
query82	1367	149	122	122
query83	360	263	251	251
query84	257	121	102	102
query85	931	464	403	403
query86	371	302	325	302
query87	2859	2832	2828	2828
query88	3527	2583	2560	2560
query89	308	274	239	239
query90	1904	172	165	165
query91	168	176	133	133
query92	79	72	69	69
query93	1057	1013	651	651
query94	577	312	264	264
query95	582	324	310	310
query96	667	517	231	231
query97	2353	2370	2318	2318
query98	209	202	198	198
query99	610	577	505	505
Total cold run time: 245794 ms
Total hot run time: 172530 ms

@doris-robot

ClickBench: Total hot run time: 26.95 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1cdf44902ed42e446d9c346b6a62c721ee11ff59, data reload: false

query1	0.06	0.05	0.05
query2	0.10	0.05	0.05
query3	0.25	0.09	0.08
query4	1.61	0.11	0.10
query5	0.28	0.24	0.26
query6	1.14	0.65	0.66
query7	0.03	0.02	0.02
query8	0.06	0.04	0.04
query9	0.55	0.51	0.50
query10	0.54	0.54	0.55
query11	0.14	0.09	0.10
query12	0.14	0.11	0.11
query13	0.60	0.59	0.59
query14	0.95	0.93	0.93
query15	0.80	0.77	0.79
query16	0.40	0.38	0.40
query17	1.04	1.06	1.06
query18	0.23	0.21	0.21
query19	1.98	1.83	1.85
query20	0.02	0.01	0.01
query21	15.43	0.24	0.14
query22	5.25	0.06	0.05
query23	15.99	0.27	0.10
query24	1.57	0.35	0.44
query25	0.06	0.05	0.06
query26	0.15	0.13	0.13
query27	0.08	0.05	0.06
query28	4.60	1.07	0.88
query29	12.70	3.94	3.15
query30	0.28	0.14	0.12
query31	2.82	0.64	0.40
query32	3.25	0.56	0.47
query33	2.95	3.03	3.04
query34	16.04	5.08	4.46
query35	4.50	4.46	4.50
query36	0.65	0.49	0.49
query37	0.12	0.07	0.07
query38	0.07	0.04	0.04
query39	0.05	0.03	0.03
query40	0.17	0.13	0.14
query41	0.08	0.04	0.03
query42	0.04	0.03	0.04
query43	0.05	0.04	0.04
Total cold run time: 97.82 s
Total hot run time: 26.95 s

@doris-robot

BE UT Coverage Report

Increment line coverage 66.67% (12/18) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.46% (19115/36440)
Line Coverage 35.82% (177554/495624)
Region Coverage 32.31% (137303/424963)
Branch Coverage 33.23% (59420/178793)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 88.89% (16/18) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 71.45% (25521/35719)
Line Coverage 54.02% (267466/495092)
Region Coverage 51.45% (220981/429476)
Branch Coverage 52.99% (95148/179564)

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 22, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Labels

approved Indicates a PR has been approved by one committer. reviewed

5 participants