-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Description
Describe the bug
hci_usb FW crashes with imprecise bus error after a repeated pairing attempt with secure only connections.
To Reproduce
Steps to reproduce the behavior:
On the central side I use the hci_usb FW with a linux/bluez 5.55 host. The peripheral is a custom FW with secure only connections that uses bt_conn_set_security(conn, BT_SECURITY_L4)
in the connect cb.
- Connect and pair the peripheral
- Disconnect the peripheral and remove its stored keys
- Try reconnecting. bluez provides the now outdated keys and establishing secure connction fails.
- Remove the keys in bluez
- Reconnect the peripheral. The key exchange procedure crashes the central (hci_usb FW)
Unfortunately I was not able to reproduce the issue with the peripheral_sc_only sample.
If step 3. is removed, everything works as expected.
If after step 3. bluez or the FW is reset, everything works as expected.
The behaviour is reproducable almost 100%, sometimes the hci_usb FW just "hangs" instead of crashing.
Expected behavior
hci_usb FW should not crash or hang.
Impact
Would be showstopper, but workarounds exist.
Logs and console output
btmon log
< HCI Command: LE Create Conne.. (0x08|0x000d) plen 25 #6065 [hci0] 13669.632058
Scan interval: 60.000 msec (0x0060)
Scan window: 60.000 msec (0x0060)
Filter policy: White list is not used (0x00)
Peer address type: Random (0x01)
Peer address: DA:16:BC:11:74:7E (Static)
Own address type: Random (0x01)
Min connection interval: 50.00 msec (0x0028)
Max connection interval: 70.00 msec (0x0038)
Connection latency: 0 (0x0000)
Supervision timeout: 420 msec (0x002a)
Min connection length: 0.000 msec (0x0000)
Max connection length: 0.000 msec (0x0000)
> HCI Event: Command Status (0x0f) plen 4 #6066 [hci0] 13669.634932
LE Create Connection (0x08|0x000d) ncmd 1
Status: Success (0x00)
> HCI Event: LE Meta Event (0x3e) plen 19 #6067 [hci0] 13670.633029
LE Connection Complete (0x01)
Status: Success (0x00)
Handle: 0
Role: Master (0x00)
Peer address type: Random (0x01)
Peer address: DA:16:BC:11:74:7E (Static)
Connection interval: 70.00 msec (0x0038)
Connection latency: 0 (0x0000)
Supervision timeout: 420 msec (0x002a)
Master clock accuracy: 0x06
@ MGMT Event: Device Connected (0x000b) plen 16 {0x0001} [hci0] 13670.633124
LE Address: DA:16:BC:11:74:7E (Static)
Flags: 0x00000000
Data length: 3
Flags: 0x06
LE General Discoverable Mode
BR/EDR Not Supported
< HCI Command: LE Read Remote U.. (0x08|0x0016) plen 2 #6068 [hci0] 13670.633420
Handle: 0
> HCI Event: LE Meta Event (0x3e) plen 4 #6069 [hci0] 13670.634051
LE Channel Selection Algorithm (0x14)
Handle: 0
Algorithm: #2 (0x01)
> HCI Event: Command Status (0x0f) plen 4 #6070 [hci0] 13670.636064
LE Read Remote Used Features (0x08|0x0016) ncmd 1
Status: Success (0x00)
> ACL Data RX: Handle 0 flags 0x02 dlen 6 #6071 [hci0] 13670.704037
SMP: Security Request (0x0b) len 1
Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
> HCI Event: LE Meta Event (0x3e) plen 12 #6072 [hci0] 13670.775035
LE Read Remote Used Features (0x04)
Status: Success (0x00)
Handle: 0
Features: 0x7f 0x49 0x01 0x00 0x00 0x00 0x00 0x00
LE Encryption
Connection Parameter Request Procedure
Extended Reject Indication
Slave-initiated Features Exchange
LE Ping
LE Data Packet Length Extension
LL Privacy
LE 2M PHY
LE Coded PHY
Channel Selection Algorithm #2
Minimum Number of Used Channels Procedure
< ACL Data TX: Handle 0 flags 0x00 dlen 11 #6073 [hci0] 13670.775244
SMP: Pairing Request (0x01) len 6
IO capability: KeyboardOnly (0x02)
OOB data: Authentication data not present (0x00)
Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
Max encryption key size: 16
Initiator key distribution: EncKey Sign (0x05)
Responder key distribution: EncKey IdKey Sign (0x07)
= bluetoothd: ../git/src/device.c:load_gatt_db() No cache for DA.. 13670.778396
< ACL Data TX: Handle 0 flags 0x00 dlen 7 #6074 [hci0] 13670.781270
ATT: Exchange MTU Request (0x02) len 2
Client RX MTU: 517
> HCI Event: Number of Completed Packets (0x13) plen 5 #6075 [hci0] 13670.845037
Num handles: 1
Handle: 0
Count: 1
> HCI Event: LE Meta Event (0x3e) plen 11 #6076 [hci0] 13670.847032
LE Data Length Change (0x07)
Handle: 0
Max TX octets: 27
Max TX time: 2704
Max RX octets: 27
Max RX time: 2704
> HCI Event: Number of Completed Packets (0x13) plen 5 #6077 [hci0] 13670.849026
Num handles: 1
Handle: 0
Count: 1
> ACL Data RX: Handle 0 flags 0x02 dlen 11 #6078 [hci0] 13670.914059
SMP: Pairing Response (0x02) len 6
IO capability: DisplayOnly (0x00)
OOB data: Authentication data not present (0x00)
Authentication requirement: Bonding, MITM, SC, No Keypresses (0x0d)
Max encryption key size: 16
Initiator key distribution: <none> (0x00)
Responder key distribution: <none> (0x00)
> ACL Data RX: Handle 0 flags 0x02 dlen 7 #6079 [hci0] 13670.914095
ATT: Exchange MTU Response (0x03) len 2
Server RX MTU: 252
< ACL Data TX: Handle 0 flags 0x00 dlen 27 #6080 [hci0] 13670.923254
< ACL Data TX: Handle 0 flags 0x01 dlen 27 #6081 [hci0] 13670.923293
< ACL Data TX: Handle 0 flags 0x01 dlen 15 #6082 [hci0] 13670.923312
SMP: Pairing Public Key (0x0c) len 64
X: fa943f0262bdc282650467cbdedc15e2da2af338a8660eb241ebdfb80da59e1e
Y: a40bd8d1e379d130baf81a4baf86b0dee11c8f3bab3bdd23156043e08c71a952
< ACL Data TX: Handle 0 flags 0x00 dlen 11 #6083 [hci0] 13670.923686
ATT: Read By Type Request (0x08) len 6
Handle range: 0x0001-0xffff
Attribute type: Unknown (0x2b3a)
< HCI Command: Disconnect (0x01|0x0006) plen 3 #6084 [hci0] 13701.312561
Handle: 0
Reason: Remote User Terminated Connection (0x13)
hci_usb sample log (annotated)
Mar 04 13:27:54 box-mwr log-ble-fw[368]: *** Booting Zephyr OS build zephyr-v2.5.0 ***
Mar 04 13:27:54 box-mwr log-ble-fw[368]: Bluetooth over USB sample
Mar 04 13:27:54 box-mwr log-ble-fw[368]: [00:00:00.349,853] <inf> bt_hci_raw: Bluetooth enabled in RAW mode
Mar 04 13:27:55 box-mwr systemd[1]: Starting Start BLE Bluetooth Pin Agent...
Mar 04 13:27:55 box-mwr systemd[1]: Started Start BLE Bluetooth Pin Agent.
Mar 04 13:28:41 box-mwr ble-autopair[862]: PASSKEY REQUEST COMING IN FOR "da:16:bc:11:74:7e" <- 1. step: initial pair
Mar 04 13:28:42 box-mwr log-ble-fw[368]: [00:00:47.434,570] <inf> bt_ctlr_ull_conn: incoming Tx: count= 0, len= 0, rate= 0 bps.
Mar 04 13:28:43 box-mwr log-ble-fw[368]: [00:00:48.485,046] <inf> bt_ctlr_ull_conn: incoming Tx: count= 15, len= 247, rate= 2016 bps.
Mar 04 13:28:44 box-mwr log-ble-fw[368]: [00:00:49.534,790] <inf> bt_ctlr_ull_conn: incoming Tx: count= 13, len= 203, rate= 1656 bps.
Mar 04 13:28:45 box-mwr log-ble-fw[368]: [00:00:50.583,435] <inf> bt_ctlr_ull_conn: incoming Tx: count= 11, len= 179, rate= 1575 bps.
Mar 04 13:28:45 box-mwr log-ble-fw[368]: [00:00:51.563,201] <inf> bt_ctlr_ull_conn: force_md_cnt: old= 0, new= 211.
Mar 04 13:28:46 box-mwr log-ble-fw[368]: [00:00:51.703,094] <inf> bt_ctlr_ull_conn: incoming Tx: count= 16, len= 239, rate= 1947 bps.
Mar 04 13:28:48 box-mwr log-ble-fw[368]: [00:00:52.823,272] <inf> bt_ctlr_ull_conn: incoming Tx: count= 8, len= 168, rate= 1371 bps.
Mar 04 13:28:49 box-mwr log-ble-fw[368]: [00:00:53.944,274] <inf> bt_ctlr_ull_conn: incoming Tx: count= 8, len= 168, rate= 1371 bps.
Mar 04 13:28:50 box-mwr log-ble-fw[368]: [00:00:55.170,471] <inf> bt_ctlr_ull_conn: incoming Tx: count= 1, len= 23, rate= 164 bps.
Mar 04 13:29:13 box-mwr log-ble-fw[368]: [00:01:18.065,826] <inf> bt_ctlr_ull_conn: incoming Tx: count= 7, len= 49, rate= 974 bps.
Mar 04 13:29:17 box-mwr log-ble-fw[368]: [00:01:22.024,688] <inf> bt_ctlr_ull_conn: incoming Tx: count= 1, len= 9, rate= 3 bps. <- 3. Step: reconnect with old keys
Mar 04 13:29:20 box-mwr log-ble-fw[368]: [00:01:25.225,158] <inf> bt_ctlr_ull_conn: incoming Tx: count= 1, len= 23, rate= 46 bps.
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.540,344] <inf> bt_ctlr_ull_conn: incoming Tx: count= 1, len= 23, rate= 57 bps. <- 5. Step: reconnect with old keys removed
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,842] <err> os: ***** BUS FAULT *****
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,842] <err> os: Imprecise data bus error
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,842] <err> os: r0/a1: 0x200066a8 r1/a2: 0x200066ac r2/a3: 0x01c4e0ac
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,842] <err> os: r3/a4: 0x01c4e0a8 r12/ip: 0x01c4e0a8 r14/lr: 0x00006fcf
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,873] <err> os: xpsr: 0x21000000
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,873] <err> os: Faulting instruction address (r15/pc): 0x00006fe0
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,873] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.688,873] <err> os: Current thread: 0x20000778 (unknown)
Mar 04 13:29:32 box-mwr log-ble-fw[368]: [00:01:38.755,096] <err> os: Halting system
addr2line -e zephyr/zephyr.elf 0x6fe0
subsys/bluetooth/controller/util/mem.c:66
Environment (please complete the following information):
- OS: MacOS
- Toolchain gnuarmemb (gcc-arm-none-eabi-9-2019-q4-major)
- v2.5.0, also tested v2.4.0 (first discovered), master@f1a27a8189e
Additional context
Now to the interesting parts:
current master works
I have tested the master@f1a27a8189e and the crash does not happen. Bisecting from v2.5.0 shows the first fixed commit is:
45bdff1
The odd thing is that it does not seem related. The change that makes the difference is the different size of struct pdu_data_llctrl
. Manually increasing the size by adding a pad union member also fixes the bug on older revisions (v2.5.0):
diff --git a/subsys/bluetooth/controller/ll_sw/pdu.h b/subsys/bluetooth/controller/ll_sw/pdu.h
index 0e7385ba85..dc94078389 100644
--- a/subsys/bluetooth/controller/ll_sw/pdu.h
+++ b/subsys/bluetooth/controller/ll_sw/pdu.h
@@ -599,6 +599,7 @@ struct pdu_data_llctrl {
struct pdu_data_llctrl_phy_rsp phy_rsp;
struct pdu_data_llctrl_phy_upd_ind phy_upd_ind;
struct pdu_data_llctrl_min_used_chans_ind min_used_chans_ind;
+ char pad[25];
} __packed;
} __packed;
** trying to find the source **
Sometimes the pairing would fail with
Mar 03 03:48:35 box-mwr log-ble-fw[389]: [00:01:25.475,555] <err> bt_ctlr_hci: Tx Buffer Overflow
Mar 03 03:48:35 box-mwr log-ble-fw[389]: [00:01:25.475,555] <err> usb_bluetooth: Error sending to driver
Poking around gdb, trying to find the code part that triggers bus fault
Breakpoint 1, k_sys_fatal_error_handler (reason=reason@entry=0, esf=esf@entry=0x20004ff8 <z_interrupt_stacks+1984>) at zephyr/include/generated/syscalls/log_ctrl.h:33
33 compiler_barrier();
(gdb) frame 3
#3 0x0000696a in z_arm_fault (msp=<optimized out>, psp=<optimized out>, exc_return=<optimized out>, callee_regs=<optimized out>)
at /Users/matthiaswauer/Projects/nordic_hci_ble/ble_fw/firmware/zephyr/zephyr/arch/arm/core/aarch32/cortex_m/fault.c:1005
1005 z_arm_fatal_error(reason, &esf_copy);
(gdb) p esf
$1 = (z_arch_esf_t *) 0x200038b8 <rx_thread_stack+896>
(gdb) p/x *esf
$3 = {basic = {{a1 = 0x200038dc, r0 = 0x200038dc}, {a2 = 0x2000280d, r1 = 0x2000280d}, {a3 = 0x12a4, r2 = 0x12a4}, {a4 = 0x12a0, r3 = 0x12a0}, {ip = 0x0, r12 = 0x0}, {
lr = 0x6fcf, r14 = 0x6fcf}, {pc = 0x6fda, r15 = 0x6fda}, xpsr = 0x21000000}}
(gdb) info symbol 0x200038dc
rx_thread_stack + 932 in section noinit
(gdb) x/20xw rx_thread_stack + 932
0x200038dc <rx_thread_stack+932>: 0x000012a0 0x20005bd0 0x20005bd0 0x2000390c
0x200038ec <rx_thread_stack+948>: 0x00009ff7 0x20005bd0 0x20005bd0 0x20000554
0x200038fc <rx_thread_stack+964>: 0x00000000 0x00000000 0x00008cdf 0x20000554
0x2000390c <rx_thread_stack+980>: 0x00000000 0x00000000 0x00000841 0x20005bd0
0x2000391c <rx_thread_stack+996>: 0x00004c13 0x00004bed 0x00000000 0x00000000
(gdb) info symbol 0x000012a0
event_conn_upd_prep + 1552 in section text
(gdb) info symbol 0x00009ff7
hci_acl_handle + 43 in section text
(gdb) info symbol 0x20005bd0
net_buf_hci_acl_pool in section noinit
(gdb) info symbol 0x00008cdf
hci_driver_send + 59 in section text
(gdb) disassemble /m hci_acl_handle
---- snip
4070 len = sys_le16_to_cpu(acl->len);
0x00009fe6 <+26>: ldrh r3, [r5, #12]
0x00009fe8 <+28>: ldrh r7, [r0, #2]
4071 handle = sys_le16_to_cpu(acl->handle);
0x00009fea <+30>: ldrh.w r8, [r0]
4072
4073 if (buf->len < len) {
0x00009fee <+34>: cmp r3, r7
0x00009ff0 <+36>: bcc.n 0xa07a <hci_acl_handle+174>
4074 BT_ERR("Invalid HCI ACL packet length");
0x0000a07a <+174>: ldr r1, [pc, #104] ; (0xa0e4 <hci_acl_handle+280>)
0x0000a07c <+176>: ldr r3, [pc, #96] ; (0xa0e0 <hci_acl_handle+276>)
0x0000a07e <+178>: ldr r0, [pc, #112] ; (0xa0f0 <hci_acl_handle+292>)
0x0000a080 <+180>: b.n 0xa066 <hci_acl_handle+154>
4075 return -EINVAL;
0x0000a074 <+168>: mvn.w r0, #21
0x0000a078 <+172>: b.n 0xa00e <hci_acl_handle+66>
4076 }
4077
4078 /* assigning flags first because handle will be overwritten */
4079 flags = bt_acl_flags(handle);
0x00009ff2 <+38>: bl 0xd70c <ll_tx_mem_acquire>
0x00009ff6 <+42>: mov.w r3, r8, lsr #12
4080 handle = bt_acl_handle(handle);
4081
4082 node_tx = ll_tx_mem_acquire();
4083 if (!node_tx) {
0x00009ffa <+46>: mov r4, r0
0x00009ffc <+48>: cmp r0, #0
0x00009ffe <+50>: beq.n 0xa082 <hci_acl_handle+182>
4084 BT_ERR("Tx Buffer Overflow");
0x0000a082 <+182>: ldr r3, [pc, #92] ; (0xa0e0 <hci_acl_handle+276>)
---- snip
From another run (free
value is garbage, should be SRAM addr? Probably dereferencing caused bus error ll_tx_mem_acquire
-> mem_aquire(mem_conn_tx)
)
(gdb) p/x mem_conn_tx
$6 = {free = 0x5a1820, pool = {0x10, 0x26, 0x0, 0x20, 0x5, 0x0, 0x0, 0xb5, 0x4f, 0xf, 0xde, 0x1e, 0x99, 0x1f, 0x50, 0xf9, 0xf4, 0x87, 0xb, 0x9a, 0xfd, 0x5e, 0x5, 0x15, 0xe0,
0x5e, 0x6a, 0x29, 0xa7, 0x17, 0x33, 0xd3, 0x3d, 0x1d, 0x0, 0x0, 0x7c, 0x26, 0x0, 0x20, 0x4, 0x0, 0x0, 0x41, 0x0, 0x6, 0x0, 0xc, 0xe2, 0x83, 0x73, 0xca, 0x6a, 0xb0, 0xac,
0x97, 0x4, 0xd0, 0x90, 0xe7, 0x68, 0xe2, 0x76, 0x47, 0x2d, 0x64, 0x5c, 0xc3, 0xeb, 0xe1, 0x0, 0x0, 0x58, 0x26, 0x0, 0x20, 0x7, 0x0, 0x0, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
0x0, 0x0, 0x0, 0x0, 0xf0, 0xaa, 0xad, 0xba, 0x2e, 0x5f, 0x22, 0x30, 0x37, 0xdb, 0x39, 0xfc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xec, 0x25, 0x0, 0x20, 0x6, 0x0, 0x0, 0x11, 0x0,
0x6, 0x0, 0x3, 0x22, 0xe5, 0xc2, 0xd5, 0xe7, 0x11, 0x91, 0x26, 0xb7, 0xb0, 0x25, 0x3d, 0x78, 0x2a, 0x64, 0xe3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xa0, 0x26, 0x0,
0x20, 0x3, 0x0 <repeats 31 times>, 0xc4, 0x26, 0x0, 0x20, 0x0 <repeats 68 times>}}