Skip to content

Conversation

worr
Copy link
Contributor

@worr worr commented Oct 11, 2014

Only test for sched_yield on non-Windows platforms, since it breaks under MingW. sched_yield ought to be tested for on platforms other than Solaris, since it is required. Fixes #45.

@xfxyjwf xfxyjwf merged commit 20e0a61 into protocolbuffers:master Oct 11, 2014
TeBoring pushed a commit to TeBoring/protobuf that referenced this pull request Jan 19, 2019
make googlepb compile with gcc < 4.5.1
copybara-service bot pushed a commit that referenced this pull request May 31, 2025
There is no need to keep these inline as they are not specialized to the function.  We also don't need branch prediction slots for them as the `ret` will always be correctly predicted, and the "end of buffer" branch is rarely taken.

With these changes, the functions to parse fixed fields are about as short and optimal as they could be:

<details>

<summary>x86-64 Assembly</summary>

```
0000000000000000 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte>:
       0: 45 84 c9                      test    r9b, r9b
       3: 0f 85 00 00 00 00             jne     0x9 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x9>
                0000000000000005:  R_X86_64_PLT32       _upb_FastDecoder_DecodeGeneric-0x4
       9: 4c 89 c8                      mov     rax, r9
       c: 48 c1 e8 30                   shr     rax, 0x30
      10: 45 89 ca                      mov     r10d, r9d
      13: 41 c1 ea 18                   shr     r10d, 0x18
      17: 4d 0f ab d0                   bts     r8, r10
      1b: 44 8b 56 01                   mov     r10d, dword ptr [rsi + 0x1]
      1f: 44 89 14 02                   mov     dword ptr [rdx + rax], r10d
      23: 48 83 c6 05                   add     rsi, 0x5
      27: 48 3b 77 08                   cmp     rsi, qword ptr [rdi + 0x8]
      2b: 0f 83 00 00 00 00             jae     0x31 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x31>
                000000000000002d:  R_X86_64_PLT32       upb_DecodeFast_MessageIsDoneFallback-0x4
      31: 0f b7 06                      movzx   eax, word ptr [rsi]
      34: 49 89 c9                      mov     r9, rcx
      37: 49 c1 f9 08                   sar     r9, 0x8
      3b: 41 89 c2                      mov     r10d, eax
      3e: 41 21 ca                      and     r10d, ecx
      41: 41 81 e2 f8 00 00 00          and     r10d, 0xf8
      48: 4f 8b 5c 51 20                mov     r11, qword ptr [r9 + 2*r10 + 0x20]
      4d: 4f 8b 4c 51 18                mov     r9, qword ptr [r9 + 2*r10 + 0x18]
      52: 49 31 c1                      xor     r9, rax
      55: 41 ff e3                      jmp     r11
```

</details>

<details>

<summary>ARM64 Assembly</summary>

```
0000000000000000 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte>:
       0: f2401cbf      tst     x5, #0xff
       4: 54000040      b.eq    0xc <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0xc>
       8: 14000000      b       0x8 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x8>
                0000000000000008:  R_AARCH64_JUMP26     _upb_FastDecoder_DecodeGeneric
       c: d370fca8      lsr     x8, x5, #48
      10: b840102      ldur    w9, [x1, #0x1]
      14: 5280002a      mov     w10, #0x1               // =1
      18: 91001421      add     x1, x1, #0x5
      1c: b8286849      str     w9, [x2, x8]
      20: d358fca8      lsr     x8, x5, #24
      24: f9400409      ldr     x9, [x0, #0x8]
      28: 9ac82148      lsl     x8, x10, x8
      2c: eb01013f      cmp     x9, x1
      30: 54000149      b.ls    0x58 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x58>
      34: 79400029      ldrh    w9, [x1]
      38: 9348fc6a      asr     x10, x3, #8
      3c: 12001c6b      and     w11, w3, #0xff
      40: aa040104      orr     x4, x8, x4
      44: 8a09016b      and     x11, x11, x9
      48: 8b0b054a      add     x10, x10, x11, lsl #1
      4c: a941994b      ldp     x11, x6, [x10, #0x18]
      50: ca090165      eor     x5, x11, x9
      54: d61f00c0      br      x6
      58: aa040104      orr     x4, x8, x4
      5c: 14000000      b       0x5c <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x5c>
                000000000000005c:  R_AARCH64_JUMP26     upb_DecodeFast_MessageIsDoneFallback
```

</details>

PiperOrigin-RevId: 765574494
copybara-service bot pushed a commit that referenced this pull request May 31, 2025
There is no need to keep these inline as they are not specialized to the function.  We also don't need branch prediction slots for them as the `ret` will always be correctly predicted, and the "end of buffer" branch is rarely taken.

With these changes, the functions to parse fixed fields are about as short and optimal as they could be:

<details>

<summary>x86-64 Assembly</summary>

```
0000000000000000 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte>:
       0: 45 84 c9                      test    r9b, r9b
       3: 0f 85 00 00 00 00             jne     0x9 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x9>
                0000000000000005:  R_X86_64_PLT32       _upb_FastDecoder_DecodeGeneric-0x4
       9: 4c 89 c8                      mov     rax, r9
       c: 48 c1 e8 30                   shr     rax, 0x30
      10: 45 89 ca                      mov     r10d, r9d
      13: 41 c1 ea 18                   shr     r10d, 0x18
      17: 4d 0f ab d0                   bts     r8, r10
      1b: 44 8b 56 01                   mov     r10d, dword ptr [rsi + 0x1]
      1f: 44 89 14 02                   mov     dword ptr [rdx + rax], r10d
      23: 48 83 c6 05                   add     rsi, 0x5
      27: 48 3b 77 08                   cmp     rsi, qword ptr [rdi + 0x8]
      2b: 0f 83 00 00 00 00             jae     0x31 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x31>
                000000000000002d:  R_X86_64_PLT32       upb_DecodeFast_MessageIsDoneFallback-0x4
      31: 0f b7 06                      movzx   eax, word ptr [rsi]
      34: 49 89 c9                      mov     r9, rcx
      37: 49 c1 f9 08                   sar     r9, 0x8
      3b: 41 89 c2                      mov     r10d, eax
      3e: 41 21 ca                      and     r10d, ecx
      41: 41 81 e2 f8 00 00 00          and     r10d, 0xf8
      48: 4f 8b 5c 51 20                mov     r11, qword ptr [r9 + 2*r10 + 0x20]
      4d: 4f 8b 4c 51 18                mov     r9, qword ptr [r9 + 2*r10 + 0x18]
      52: 49 31 c1                      xor     r9, rax
      55: 41 ff e3                      jmp     r11
```

</details>

<details>

<summary>ARM64 Assembly</summary>

```
0000000000000000 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte>:
       0: f2401cbf      tst     x5, #0xff
       4: 54000040      b.eq    0xc <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0xc>
       8: 14000000      b       0x8 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x8>
                0000000000000008:  R_AARCH64_JUMP26     _upb_FastDecoder_DecodeGeneric
       c: d370fca8      lsr     x8, x5, #48
      10: b840102      ldur    w9, [x1, #0x1]
      14: 5280002a      mov     w10, #0x1               // =1
      18: 91001421      add     x1, x1, #0x5
      1c: b8286849      str     w9, [x2, x8]
      20: d358fca8      lsr     x8, x5, #24
      24: f9400409      ldr     x9, [x0, #0x8]
      28: 9ac82148      lsl     x8, x10, x8
      2c: eb01013f      cmp     x9, x1
      30: 54000149      b.ls    0x58 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x58>
      34: 79400029      ldrh    w9, [x1]
      38: 9348fc6a      asr     x10, x3, #8
      3c: 12001c6b      and     w11, w3, #0xff
      40: aa040104      orr     x4, x8, x4
      44: 8a09016b      and     x11, x11, x9
      48: 8b0b054a      add     x10, x10, x11, lsl #1
      4c: a941994b      ldp     x11, x6, [x10, #0x18]
      50: ca090165      eor     x5, x11, x9
      54: d61f00c0      br      x6
      58: aa040104      orr     x4, x8, x4
      5c: 14000000      b       0x5c <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x5c>
                000000000000005c:  R_AARCH64_JUMP26     upb_DecodeFast_MessageIsDoneFallback
```

</details>

PiperOrigin-RevId: 765574494
copybara-service bot pushed a commit that referenced this pull request May 31, 2025
There is no need to keep these inline as they are not specialized to the function.  We also don't need branch prediction slots for them as the `ret` will always be correctly predicted, and the "end of buffer" branch is rarely taken.

With these changes, the functions to parse fixed fields are about as short and optimal as they could be:

<details>

<summary>x86-64 Assembly</summary>

```
0000000000000000 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte>:
       0: 45 84 c9                      test    r9b, r9b
       3: 0f 85 00 00 00 00             jne     0x9 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x9>
                0000000000000005:  R_X86_64_PLT32       _upb_FastDecoder_DecodeGeneric-0x4
       9: 4c 89 c8                      mov     rax, r9
       c: 48 c1 e8 30                   shr     rax, 0x30
      10: 45 89 ca                      mov     r10d, r9d
      13: 41 c1 ea 18                   shr     r10d, 0x18
      17: 4d 0f ab d0                   bts     r8, r10
      1b: 44 8b 56 01                   mov     r10d, dword ptr [rsi + 0x1]
      1f: 44 89 14 02                   mov     dword ptr [rdx + rax], r10d
      23: 48 83 c6 05                   add     rsi, 0x5
      27: 48 3b 77 08                   cmp     rsi, qword ptr [rdi + 0x8]
      2b: 0f 83 00 00 00 00             jae     0x31 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x31>
                000000000000002d:  R_X86_64_PLT32       upb_DecodeFast_MessageIsDoneFallback-0x4
      31: 0f b7 06                      movzx   eax, word ptr [rsi]
      34: 49 89 c9                      mov     r9, rcx
      37: 49 c1 f9 08                   sar     r9, 0x8
      3b: 41 89 c2                      mov     r10d, eax
      3e: 41 21 ca                      and     r10d, ecx
      41: 41 81 e2 f8 00 00 00          and     r10d, 0xf8
      48: 4f 8b 5c 51 20                mov     r11, qword ptr [r9 + 2*r10 + 0x20]
      4d: 4f 8b 4c 51 18                mov     r9, qword ptr [r9 + 2*r10 + 0x18]
      52: 49 31 c1                      xor     r9, rax
      55: 41 ff e3                      jmp     r11
```

</details>

<details>

<summary>ARM64 Assembly</summary>

```
0000000000000000 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte>:
       0: f2401cbf      tst     x5, #0xff
       4: 54000040      b.eq    0xc <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0xc>
       8: 14000000      b       0x8 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x8>
                0000000000000008:  R_AARCH64_JUMP26     _upb_FastDecoder_DecodeGeneric
       c: d370fca8      lsr     x8, x5, #48
      10: b840102      ldur    w9, [x1, #0x1]
      14: 5280002a      mov     w10, #0x1               // =1
      18: 91001421      add     x1, x1, #0x5
      1c: b8286849      str     w9, [x2, x8]
      20: d358fca8      lsr     x8, x5, #24
      24: f9400409      ldr     x9, [x0, #0x8]
      28: 9ac82148      lsl     x8, x10, x8
      2c: eb01013f      cmp     x9, x1
      30: 54000149      b.ls    0x58 <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x58>
      34: 79400029      ldrh    w9, [x1]
      38: 9348fc6a      asr     x10, x3, #8
      3c: 12001c6b      and     w11, w3, #0xff
      40: aa040104      orr     x4, x8, x4
      44: 8a09016b      and     x11, x11, x9
      48: 8b0b054a      add     x10, x10, x11, lsl #1
      4c: a941994b      ldp     x11, x6, [x10, #0x18]
      50: ca090165      eor     x5, x11, x9
      54: d61f00c0      br      x6
      58: aa040104      orr     x4, x8, x4
      5c: 14000000      b       0x5c <upb_DecodeFast_Fixed32_Scalar_Tag1Byte+0x5c>
                000000000000005c:  R_AARCH64_JUMP26     upb_DecodeFast_MessageIsDoneFallback
```

</details>

PiperOrigin-RevId: 765632464
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Searching for sched_yield fails on MinGW.

2 participants