Skip to content

Bluetooth: controller: Control PDU buffer leak into Data PDU buffer pool #32898

@LeBlue

Description

@LeBlue

Describe the bug
hci_usb FW crashes with imprecise bus error after a repeated pairing attempt with secure only connections.

To Reproduce
Steps to reproduce the behavior:
On the central side I use the hci_usb FW with a linux/bluez 5.55 host. The peripheral is a custom FW with secure only connections that uses bt_conn_set_security(conn, BT_SECURITY_L4) in the connect cb.

  1. Connect and pair the peripheral
  2. Disconnect the peripheral and remove its stored keys
  3. Try reconnecting. bluez provides the now outdated keys and establishing secure connction fails.
  4. Remove the keys in bluez
  5. Reconnect the peripheral. The key exchange procedure crashes the central (hci_usb FW)

Unfortunately I was not able to reproduce the issue with the peripheral_sc_only sample.

If step 3. is removed, everything works as expected.
If after step 3. bluez or the FW is reset, everything works as expected.

The behaviour is reproducable almost 100%, sometimes the hci_usb FW just "hangs" instead of crashing.

Expected behavior
hci_usb FW should not crash or hang.

Impact
Would be showstopper, but workarounds exist.

Logs and console output

btmon log

< HCI Command: LE Create Conne.. (0x08|0x000d) plen 25  #6065 [hci0] 13669.632058
        Scan interval: 60.000 msec (0x0060)
        Scan window: 60.000 msec (0x0060)
        Filter policy: White list is not used (0x00)
        Peer address type: Random (0x01)
        Peer address: DA:16:BC:11:74:7E (Static)
        Own address type: Random (0x01)
        Min connection interval: 50.00 msec (0x0028)
        Max connection interval: 70.00 msec (0x0038)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Min connection length: 0.000 msec (0x0000)
        Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4               #6066 [hci0] 13669.634932
      LE Create Connection (0x08|0x000d) ncmd 1
        Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19               #6067 [hci0] 13670.633029
      LE Connection Complete (0x01)
        Status: Success (0x00)
        Handle: 0
        Role: Master (0x00)
        Peer address type: Random (0x01)
        Peer address: DA:16:BC:11:74:7E (Static)
        Connection interval: 70.00 msec (0x0038)
        Connection latency: 0 (0x0000)
        Supervision timeout: 420 msec (0x002a)
        Master clock accuracy: 0x06
@ MGMT Event: Device Connected (0x000b) plen 16      {0x0001} [hci0] 13670.633124
        LE Address: DA:16:BC:11:74:7E (Static)
        Flags: 0x00000000
        Data length: 3
        Flags: 0x06
          LE General Discoverable Mode
          BR/EDR Not Supported
< HCI Command: LE Read Remote U.. (0x08|0x0016) plen 2  #6068 [hci0] 13670.633420
        Handle: 0
> HCI Event: LE Meta Event (0x3e) plen 4                #6069 [hci0] 13670.634051
      LE Channel Selection Algorithm (0x14)
        Handle: 0
        Algorithm: #2 (0x01)
> HCI Event: Command Status (0x0f) plen 4               #6070 [hci0] 13670.636064
      LE Read Remote Used Features (0x08|0x0016) ncmd 1
        Status: Success (0x00)
> ACL Data RX: Handle 0 flags 0x02 dlen 6               #6071 [hci0] 13670.704037
      SMP: Security Request (0x0b) len 1
        Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
> HCI Event: LE Meta Event (0x3e) plen 12               #6072 [hci0] 13670.775035
      LE Read Remote Used Features (0x04)
        Status: Success (0x00)
        Handle: 0
        Features: 0x7f 0x49 0x01 0x00 0x00 0x00 0x00 0x00
          LE Encryption
          Connection Parameter Request Procedure
          Extended Reject Indication
          Slave-initiated Features Exchange
          LE Ping
          LE Data Packet Length Extension
          LL Privacy
          LE 2M PHY
          LE Coded PHY
          Channel Selection Algorithm #2
          Minimum Number of Used Channels Procedure
< ACL Data TX: Handle 0 flags 0x00 dlen 11              #6073 [hci0] 13670.775244
      SMP: Pairing Request (0x01) len 6
        IO capability: KeyboardOnly (0x02)
        OOB data: Authentication data not present (0x00)
        Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
        Max encryption key size: 16
        Initiator key distribution: EncKey Sign (0x05)
        Responder key distribution: EncKey IdKey Sign (0x07)
= bluetoothd: ../git/src/device.c:load_gatt_db() No cache for DA..   13670.778396
< ACL Data TX: Handle 0 flags 0x00 dlen 7               #6074 [hci0] 13670.781270
      ATT: Exchange MTU Request (0x02) len 2
        Client RX MTU: 517
> HCI Event: Number of Completed Packets (0x13) plen 5  #6075 [hci0] 13670.845037
        Num handles: 1
        Handle: 0
        Count: 1
> HCI Event: LE Meta Event (0x3e) plen 11               #6076 [hci0] 13670.847032
      LE Data Length Change (0x07)
        Handle: 0
        Max TX octets: 27
        Max TX time: 2704
        Max RX octets: 27
        Max RX time: 2704
> HCI Event: Number of Completed Packets (0x13) plen 5  #6077 [hci0] 13670.849026
        Num handles: 1
        Handle: 0
        Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 11              #6078 [hci0] 13670.914059
      SMP: Pairing Response (0x02) len 6
        IO capability: DisplayOnly (0x00)
        OOB data: Authentication data not present (0x00)
        Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
        Max encryption key size: 16
        Initiator key distribution: <none> (0x00)
        Responder key distribution: <none> (0x00)
> ACL Data RX: Handle 0 flags 0x02 dlen 7               #6079 [hci0] 13670.914095
      ATT: Exchange MTU Response (0x03) len 2
        Server RX MTU: 252
< ACL Data TX: Handle 0 flags 0x00 dlen 27              #6080 [hci0] 13670.923254
< ACL Data TX: Handle 0 flags 0x01 dlen 27              #6081 [hci0] 13670.923293
< ACL Data TX: Handle 0 flags 0x01 dlen 15              #6082 [hci0] 13670.923312
      SMP: Pairing Public Key (0x0c) len 64
        X: fa943f0262bdc282650467cbdedc15e2da2af338a8660eb241ebdfb80da59e1e
        Y: a40bd8d1e379d130baf81a4baf86b0dee11c8f3bab3bdd23156043e08c71a952
< ACL Data TX: Handle 0 flags 0x00 dlen 11              #6083 [hci0] 13670.923686
      ATT: Read By Type Request (0x08) len 6
        Handle range: 0x0001-0xffff
        Attribute type: Unknown (0x2b3a)
< HCI Command: Disconnect (0x01|0x0006) plen 3          #6084 [hci0] 13701.312561
        Handle: 0
        Reason: Remote User Terminated Connection (0x13)

hci_usb sample log (annotated)

Mar 04 13:27:54 box-mwr log-ble-fw[368]: *** Booting Zephyr OS build zephyr-v2.5.0  ***
Mar 04 13:27:54 box-mwr log-ble-fw[368]: Bluetooth over USB sample
Mar 04 13:27:54 box-mwr log-ble-fw[368]: [00:00:00.349,853] <inf> bt_hci_raw: Bluetooth enabled in RAW mode
Mar 04 13:27:55 box-mwr systemd[1]: Starting Start BLE Bluetooth Pin Agent...
Mar 04 13:27:55 box-mwr systemd[1]: Started Start BLE Bluetooth Pin Agent.
Mar 04 13:28:41 box-mwr ble-autopair[862]: PASSKEY REQUEST COMING IN FOR  "da:16:bc:11:74:7e" <- 1. step: initial pair
Mar 04 13:28:42 box-mwr log-ble-fw[368]: [00:00:47.434,570] <inf> bt_ctlr_ull_conn: incoming Tx: count= 0, len= 0, rate= 0 bps.
Mar 04 13:28:43 box-mwr log-ble-fw[368]: [00:00:48.485,046] <inf> bt_ctlr_ull_conn: incoming Tx: count= 15, len= 247, rate= 2016 bps.
Mar 04 13:28:44 box-mwr log-ble-fw[368]: [00:00:49.534,790] <inf> bt_ctlr_ull_conn: incoming Tx: count= 13, len= 203, rate= 1656 bps.
Mar 04 13:28:45 box-mwr log-ble-fw[368]: [00:00:50.583,435] <inf> bt_ctlr_ull_conn: incoming Tx: count= 11, len= 179, rate= 1575 bps.
Mar 04 13:28:45 box-mwr log-ble-fw[368]: [00:00:51.563,201] <inf> bt_ctlr_ull_conn: force_md_cnt: old= 0, new= 211.
Mar 04 13:28:46 box-mwr log-ble-fw[368]: [00:00:51.703,094] <inf> bt_ctlr_ull_conn: incoming Tx: count= 16, len= 239, rate= 1947 bps.
Mar 04 13:28:48 box-mwr log-ble-fw[368]: [00:00:52.823,272] <inf> bt_ctlr_ull_conn: incoming Tx: count= 8, len= 168, rate= 1371 bps.
Mar 04 13:28:49 box-mwr log-ble-fw[368]: [00:00:53.944,274] <inf> bt_ctlr_ull_conn: incoming Tx: count= 8, len= 168, rate= 1371 bps.
Mar 04 13:28:50 box-mwr log-ble-fw[368]: [00:00:55.170,471] <inf> bt_ctlr_ull_conn: incoming Tx: count= 1, len= 23, rate= 164 bps.
Mar 04 13:29:13 box-mwr log-ble-fw[368]: [00:01:18.065,826] <inf> bt_ctlr_ull_conn: incoming Tx: count= 7, len= 49, rate= 974 bps.
Mar 04 13:29:17 box-mwr log-ble-fw[368]: [00:01:22.024,688] <inf> bt_ctlr_ull_conn: incoming Tx: count= 1, len= 9, rate= 3 bps. <- 3. Step: reconnect with old keys
Mar 04 13:29:20 box-mwr log-ble-fw[368]: [00:01:25.225,158] <inf> bt_ctlr_ull_conn: incoming Tx: count= 1, len= 23, rate= 46 bps.
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.540,344] <inf> bt_ctlr_ull_conn: incoming Tx: count= 1, len= 23, rate= 57 bps. <- 5. Step: reconnect with old keys removed
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,842] <err> os: ***** BUS FAULT *****
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,842] <err> os:   Imprecise data bus error
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,842] <err> os: r0/a1:  0x200066a8  r1/a2:  0x200066ac  r2/a3:  0x01c4e0ac
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,842] <err> os: r3/a4:  0x01c4e0a8 r12/ip:  0x01c4e0a8 r14/lr:  0x00006fcf
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,873] <err> os:  xpsr:  0x21000000
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,873] <err> os: Faulting instruction address (r15/pc): 0x00006fe0
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,873] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,873] <err> os: Current thread: 0x20000778 (unknown)
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.755,096] <err> os: Halting system
addr2line -e zephyr/zephyr.elf 0x6fe0
subsys/bluetooth/controller/util/mem.c:66

Environment (please complete the following information):

  • OS: MacOS
  • Toolchain gnuarmemb (gcc-arm-none-eabi-9-2019-q4-major)
  • v2.5.0, also tested v2.4.0 (first discovered), master@f1a27a8189e

Additional context
Now to the interesting parts:

current master works
I have tested the master@f1a27a8189e and the crash does not happen. Bisecting from v2.5.0 shows the first fixed commit is:
45bdff1

The odd thing is that it does not seem related. The change that makes the difference is the different size of struct pdu_data_llctrl . Manually increasing the size by adding a pad union member also fixes the bug on older revisions (v2.5.0):

diff --git a/subsys/bluetooth/controller/ll_sw/pdu.h b/subsys/bluetooth/controller/ll_sw/pdu.h
index 0e7385ba85..dc94078389 100644
--- a/subsys/bluetooth/controller/ll_sw/pdu.h
+++ b/subsys/bluetooth/controller/ll_sw/pdu.h
@@ -599,6 +599,7 @@ struct pdu_data_llctrl {
                struct pdu_data_llctrl_phy_rsp phy_rsp;
                struct pdu_data_llctrl_phy_upd_ind phy_upd_ind;
                struct pdu_data_llctrl_min_used_chans_ind min_used_chans_ind;
+              char pad[25];
        } __packed;
 } __packed;

** trying to find the source **

Sometimes the pairing would fail with

Mar 03 03:48:35 box-mwr log-ble-fw[389]: [00:01:25.475,555] <err> bt_ctlr_hci: Tx Buffer Overflow
Mar 03 03:48:35 box-mwr log-ble-fw[389]: [00:01:25.475,555] <err> usb_bluetooth: Error sending to driver

Poking around gdb, trying to find the code part that triggers bus fault

Breakpoint 1, k_sys_fatal_error_handler (reason=reason@entry=0, esf=esf@entry=0x20004ff8 <z_interrupt_stacks+1984>) at zephyr/include/generated/syscalls/log_ctrl.h:33
33		compiler_barrier();
(gdb) frame 3
#3  0x0000696a in z_arm_fault (msp=<optimized out>, psp=<optimized out>, exc_return=<optimized out>, callee_regs=<optimized out>)
    at /Users/matthiaswauer/Projects/nordic_hci_ble/ble_fw/firmware/zephyr/zephyr/arch/arm/core/aarch32/cortex_m/fault.c:1005
1005		z_arm_fatal_error(reason, &esf_copy);
(gdb) p esf
$1 = (z_arch_esf_t *) 0x200038b8 <rx_thread_stack+896>
(gdb) p/x *esf
$3 = {basic = {{a1 = 0x200038dc, r0 = 0x200038dc}, {a2 = 0x2000280d, r1 = 0x2000280d}, {a3 = 0x12a4, r2 = 0x12a4}, {a4 = 0x12a0, r3 = 0x12a0}, {ip = 0x0, r12 = 0x0}, {
      lr = 0x6fcf, r14 = 0x6fcf}, {pc = 0x6fda, r15 = 0x6fda}, xpsr = 0x21000000}}
(gdb) info symbol 0x200038dc
rx_thread_stack + 932 in section noinit
(gdb) x/20xw rx_thread_stack + 932
0x200038dc <rx_thread_stack+932>:	0x000012a0	0x20005bd0	0x20005bd0	0x2000390c
0x200038ec <rx_thread_stack+948>:	0x00009ff7	0x20005bd0	0x20005bd0	0x20000554
0x200038fc <rx_thread_stack+964>:	0x00000000	0x00000000	0x00008cdf	0x20000554
0x2000390c <rx_thread_stack+980>:	0x00000000	0x00000000	0x00000841	0x20005bd0
0x2000391c <rx_thread_stack+996>:	0x00004c13	0x00004bed	0x00000000	0x00000000
(gdb) info symbol 0x000012a0
event_conn_upd_prep + 1552 in section text
(gdb) info symbol 0x00009ff7
hci_acl_handle + 43 in section text
(gdb) info symbol 0x20005bd0
net_buf_hci_acl_pool in section noinit
(gdb) info symbol 0x00008cdf
hci_driver_send + 59 in section text
(gdb) disassemble /m hci_acl_handle
---- snip
4070		len = sys_le16_to_cpu(acl->len);
   0x00009fe6 <+26>:	ldrh	r3, [r5, #12]
   0x00009fe8 <+28>:	ldrh	r7, [r0, #2]

4071		handle = sys_le16_to_cpu(acl->handle);
   0x00009fea <+30>:	ldrh.w	r8, [r0]

4072
4073		if (buf->len < len) {
   0x00009fee <+34>:	cmp	r3, r7
   0x00009ff0 <+36>:	bcc.n	0xa07a <hci_acl_handle+174>

4074			BT_ERR("Invalid HCI ACL packet length");
   0x0000a07a <+174>:	ldr	r1, [pc, #104]	; (0xa0e4 <hci_acl_handle+280>)
   0x0000a07c <+176>:	ldr	r3, [pc, #96]	; (0xa0e0 <hci_acl_handle+276>)
   0x0000a07e <+178>:	ldr	r0, [pc, #112]	; (0xa0f0 <hci_acl_handle+292>)
   0x0000a080 <+180>:	b.n	0xa066 <hci_acl_handle+154>

4075			return -EINVAL;
   0x0000a074 <+168>:	mvn.w	r0, #21
   0x0000a078 <+172>:	b.n	0xa00e <hci_acl_handle+66>

4076		}
4077
4078		/* assigning flags first because handle will be overwritten */
4079		flags = bt_acl_flags(handle);
   0x00009ff2 <+38>:	bl	0xd70c <ll_tx_mem_acquire>
   0x00009ff6 <+42>:	mov.w	r3, r8, lsr #12

4080		handle = bt_acl_handle(handle);

4081
4082		node_tx = ll_tx_mem_acquire();

4083		if (!node_tx) {
   0x00009ffa <+46>:	mov	r4, r0
   0x00009ffc <+48>:	cmp	r0, #0
   0x00009ffe <+50>:	beq.n	0xa082 <hci_acl_handle+182>

4084			BT_ERR("Tx Buffer Overflow");
   0x0000a082 <+182>:	ldr	r3, [pc, #92]	; (0xa0e0 <hci_acl_handle+276>)
---- snip

From another run (free value is garbage, should be SRAM addr? Probably dereferencing caused bus error ll_tx_mem_acquire -> mem_aquire(mem_conn_tx))

(gdb) p/x mem_conn_tx
$6 = {free = 0x5a1820, pool = {0x10, 0x26, 0x0, 0x20, 0x5, 0x0, 0x0, 0xb5, 0x4f, 0xf, 0xde, 0x1e, 0x99, 0x1f, 0x50, 0xf9, 0xf4, 0x87, 0xb, 0x9a, 0xfd, 0x5e, 0x5, 0x15, 0xe0,
    0x5e, 0x6a, 0x29, 0xa7, 0x17, 0x33, 0xd3, 0x3d, 0x1d, 0x0, 0x0, 0x7c, 0x26, 0x0, 0x20, 0x4, 0x0, 0x0, 0x41, 0x0, 0x6, 0x0, 0xc, 0xe2, 0x83, 0x73, 0xca, 0x6a, 0xb0, 0xac,
    0x97, 0x4, 0xd0, 0x90, 0xe7, 0x68, 0xe2, 0x76, 0x47, 0x2d, 0x64, 0x5c, 0xc3, 0xeb, 0xe1, 0x0, 0x0, 0x58, 0x26, 0x0, 0x20, 0x7, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0xf0, 0xaa, 0xad, 0xba, 0x2e, 0x5f, 0x22, 0x30, 0x37, 0xdb, 0x39, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xec, 0x25, 0x0, 0x20, 0x6, 0x0, 0x0, 0x11, 0x0,
    0x6, 0x0, 0x3, 0x22, 0xe5, 0xc2, 0xd5, 0xe7, 0x11, 0x91, 0x26, 0xb7, 0xb0, 0x25, 0x3d, 0x78, 0x2a, 0x64, 0xe3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xa0, 0x26, 0x0,
    0x20, 0x3, 0x0 <repeats 31 times>, 0xc4, 0x26, 0x0, 0x20, 0x0 <repeats 68 times>}}

Metadata

Metadata

Assignees

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions