Increase robustness of LwsApiCall implementation #2134

vikramdattu · 2025-05-14T17:04:22Z

- Handle partial writes by sending data in multiple iterations
- Retry when a message send fails
- Track and handle messages that are received in parts, using the `receiveMessage` variable
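The three changes can be sketched in plain C. This is a minimal illustration under stated assumptions, not the PR's code. `write_fn_t`, `send_all`, `MAX_WRITE_RETRIES`, `RecvBuf`, `on_fragment`, `StubWriter` and `stub_write` are hypothetical names. The writer callback stands in for `lws_write()`, and `isFinal` stands in for what `lws_is_final_fragment()` reports inside a real libwebsockets callback.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical writer standing in for lws_write(): returns bytes accepted
 * (possibly fewer than requested) or a negative value on failure. */
typedef int (*write_fn_t)(void* ctx, const unsigned char* data, size_t len);

#define MAX_WRITE_RETRIES 3 /* assumed retry budget, not taken from the PR */

/* Sends buf[*offset..len), resuming after partial writes. Up to MAX_WRITE_RETRIES
 * consecutive failures are retried; returns 0 when everything has been sent. */
static int send_all(write_fn_t write_fn, void* ctx, const unsigned char* buf, size_t len, size_t* offset)
{
    int failures = 0;
    while (*offset < len) {
        int n = write_fn(ctx, buf + *offset, len - *offset);
        if (n <= 0) {
            if (++failures > MAX_WRITE_RETRIES) {
                return -1; /* caller should map this to a real status code */
            }
            continue;
        }
        failures = 0;            /* progress was made, reset the retry budget */
        *offset += (size_t) n;   /* remember how far we got */
    }
    return 0;
}

/* Hypothetical receive buffer that accumulates fragments of one message.
 * The caller resets len to 0 before the next message. */
typedef struct {
    char data[256];
    size_t len;
} RecvBuf;

/* Appends a fragment. Returns 1 when the message is complete (and NUL-terminated),
 * 0 if more fragments are expected, -1 if the message would overflow the buffer. */
static int on_fragment(RecvBuf* rb, const char* frag, size_t fragLen, int isFinal)
{
    if (rb->len + fragLen >= sizeof(rb->data)) {
        return -1;
    }
    memcpy(rb->data + rb->len, frag, fragLen);
    rb->len += fragLen;
    if (isFinal) {
        rb->data[rb->len] = '\0';
        return 1;
    }
    return 0;
}

/* Test stub: accepts at most STUB_CHUNK bytes per call and fails the first
 * failFirst calls, mimicking a constrained socket that forces partial writes. */
#define STUB_CHUNK 4
typedef struct {
    unsigned char* sink;
    size_t used;
    size_t cap;
    int failFirst;
} StubWriter;

static int stub_write(void* ctx, const unsigned char* data, size_t len)
{
    StubWriter* w = (StubWriter*) ctx;
    size_t take = len < STUB_CHUNK ? len : STUB_CHUNK;
    if (w->failFirst > 0) {
        w->failFirst--;
        return -1;
    }
    assert(w->used + take <= w->cap); /* the sink must be large enough */
    memcpy(w->sink + w->used, data, take);
    w->used += take;
    return (int) take;
}
```

A stub writer like this is one possible way to exercise the recovery paths in a unit test without needing real network failures.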

sirknightj

These are good improvements.

Do you have a method or setup to test these failure-recovery scenarios, and could we add something for it? It seems like it would be a lot of additional effort.

sirknightj · 2025-05-22T17:24:21Z

src/source/Signaling/LwsApiCalls.c

+                retValue = (INT32) lws_write(wsi, pLwsCallInfo->sendBuffer + LWS_PRE, remainingSize, LWS_WRITE_TEXT);
+                if (retValue < 0) {
+                    DLOGW("Write failed with %d", retValue);
+                    CHK(FALSE, !STATUS_SUCCESS);


Should put an actual status here. I believe `!STATUS_SUCCESS` would get converted to 1, which is `STATUS_NULL_ARG`.
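The point can be illustrated with simplified stand-ins for the SDK's `STATUS` type and `CHK` macro. These are assumed to behave as the comment describes: set `retStatus` and jump to cleanup. `STATUS_HYPOTHETICAL_LWS_WRITE_FAILED` and `report_write_failure` are made-up placeholders, not SDK names. A real fix would use an appropriate existing status code.

```c
#include <assert.h>

typedef unsigned int STATUS;

/* Values mirroring what the review describes for the SDK's common status codes. */
#define STATUS_SUCCESS  ((STATUS) 0x00000000)
#define STATUS_NULL_ARG ((STATUS) 0x00000001)
/* Hypothetical dedicated code for a failed write; not an existing SDK constant. */
#define STATUS_HYPOTHETICAL_LWS_WRITE_FAILED ((STATUS) 0x5200ff01)

/* Compile-time proof of the collision: !0 is 1, the same value as STATUS_NULL_ARG. */
static_assert((!STATUS_SUCCESS) == STATUS_NULL_ARG, "!STATUS_SUCCESS equals STATUS_NULL_ARG");

/* Simplified CHK: on a false condition, record the error status and jump to cleanup. */
#define CHK(condition, errRet)    \
    do {                          \
        if (!(condition)) {       \
            retStatus = (errRet); \
            goto CleanUp;         \
        }                         \
    } while (0)

static STATUS report_write_failure(int useDedicatedStatus)
{
    STATUS retStatus = STATUS_SUCCESS;
    if (useDedicatedStatus) {
        CHK(0, STATUS_HYPOTHETICAL_LWS_WRITE_FAILED); /* caller can tell what failed */
    } else {
        CHK(0, !STATUS_SUCCESS); /* reported to the caller as STATUS_NULL_ARG */
    }
CleanUp:
    return retStatus;
}
```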

sirknightj · 2025-05-22T17:26:11Z

src/source/Signaling/LwsApiCalls.c

        offset = ATOMIC_LOAD(&pSignalingClient->pOngoingCallInfo->sendOffset);
        size = ATOMIC_LOAD(&pSignalingClient->pOngoingCallInfo->sendBufferSize);

        result = (SERVICE_CALL_RESULT) ATOMIC_LOAD(&pSignalingClient->messageResult);

        if (offset != size && result == SERVICE_CALL_RESULT_NOT_SET) {
            CHK_STATUS(CVAR_WAIT(pSignalingClient->sendCvar, pSignalingClient->sendLock, SIGNALING_SEND_TIMEOUT));
+            retryCount++;


Could we add a debug log (DLOGD) when the retry count is incremented?

If `CVAR_WAIT` returns a non-success status (e.g. the operation timed out), `CHK_STATUS` would goto Cleanup and bypass the retry count. That doesn't seem intended, since there's a "// Check if we timed out" check further down.

@sirknightj It looks a bit odd, but here is what's happening:

If `CVAR_WAIT` returns on timeout, we exit anyway.

Only if it returns early (before the timeout) do we increment `retryCount` and wait again, provided `offset != size`. Otherwise we stop iterating and continue with the rest of the code path.

So `retryCount` essentially makes it retry when the wake-up happened within the timeout.
Let me know if you have a better suggestion for handling `sendCvar` wake-ups without giving up entirely.
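The behaviour described above can be sketched with POSIX threads, assuming `CVAR_WAIT` behaves like `pthread_cond_timedwait()` with a relative timeout. `SendState`, `WaitResult` and `wait_for_send` are hypothetical names. The `printf` stands in for the suggested `DLOGD`, and the PR's `SERVICE_CALL_RESULT_NOT_SET` check is omitted for brevity. A timeout ends the wait, while an early wake-up with `offset != size` counts as one retry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define MAX_RETRY_COUNT 3 /* a #define rather than a const local */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t cvar;
    size_t offset; /* bytes sent so far (hypothetical stand-in for sendOffset) */
    size_t size;   /* total bytes to send (stand-in for sendBufferSize) */
} SendState;

typedef enum { WAIT_DONE = 0, WAIT_TIMED_OUT, WAIT_RETRIES_EXHAUSTED } WaitResult;

static WaitResult wait_for_send(SendState* s, long timeoutMs)
{
    WaitResult res = WAIT_DONE;
    unsigned int retryCount = 0;

    assert(timeoutMs > 0);
    pthread_mutex_lock(&s->lock);
    while (s->offset != s->size) {
        if (retryCount > MAX_RETRY_COUNT) {
            res = WAIT_RETRIES_EXHAUSTED;
            break;
        }
        /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeoutMs / 1000;
        deadline.tv_nsec += (timeoutMs % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }
        int rc = pthread_cond_timedwait(&s->cvar, &s->lock, &deadline);
        if (s->offset == s->size) {
            break; /* send completed, res stays WAIT_DONE */
        }
        if (rc == ETIMEDOUT) {
            res = WAIT_TIMED_OUT; /* a timeout exits immediately, no retry */
            break;
        }
        retryCount++; /* early or spurious wake-up: wait again */
        printf("send wait woke early, retryCount=%u\n", retryCount);
    }
    pthread_mutex_unlock(&s->lock);
    return res;
}
```

Because each iteration starts a fresh deadline, the caller waits through at most `MAX_RETRY_COUNT + 1` timeouts, so repeated early wake-ups cannot keep it blocked indefinitely.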

sirknightj · 2025-05-23T04:17:26Z

src/source/Signaling/LwsApiCalls.c

    connectInfo.port = SIGNALING_DEFAULT_SSL_PORT;
+    connectInfo.alpn = "http/1.1";     // Force HTTP/1.1 only
+    connectInfo.protocol = "http/1.1"; // Force HTTP/1.1 protocol


I'm wondering about the purpose of the H2 flag if HTTP/1.1 is forced. Is it for latency/performance optimization?

It seems that HTTP/2 should work, based on this quick test:

curl -sI https://m-xxxxxxxx.kinesisvideo.us-west-2.amazonaws.com -o/dev/null -w '%{http_version}\n'
2
curl -sI https://v-xxxxxxxx.kinesisvideo.us-west-2.amazonaws.com -o/dev/null -w '%{http_version}\n'
2

I added this to make the WebSocket connection work more reliably on ESP platforms. It reduces memory consumption a bit and is much more reliable. By the way, do we really gain much from HTTP/2 for our use case?

Now that I have a lot of optimisations in place, I can test HTTP/2 on ESP32 and revert the change.

sirknightj · 2025-05-23T04:20:20Z

src/source/Signaling/LwsApiCalls.c

@@ -1907,6 +1948,9 @@ STATUS writeLwsData(PSignalingClient pSignalingClient, BOOL awaitForResponse)
    SIZE_T offset, size;
    SERVICE_CALL_RESULT result;

+    UINT32 retryCount = 0;
+    const UINT32 MAX_RETRY_COUNT = 3;


I think this should be a `#define` (a very, very minor memory reduction).


vikramdattu · 2025-06-18T13:37:26Z

> Wanted to know if you had a method or setup to test these failure recovery scenarios and if we could add something for it? Seems like it would be a lot of additional effort.

@sirknightj The re-attempts are reproducible quite often on ESP platforms, hence the change. I'll check whether it's straightforward to add a test.

vikramdattu force-pushed the improve/lwsapicalls_robustness branch from 38a25f5 to 9f7ebd9, May 15, 2025 10:21

sirknightj reviewed May 23, 2025


Increase robustness of LwsApiCall implementation

152ba5c

- Handle partial writes by sending data in multiple iterations
- Use retries when message send fails
- Track and handle message receive if in parts using `receiveMessage` var

vikramdattu force-pushed the improve/lwsapicalls_robustness branch from 9f7ebd9 to 152ba5c, June 18, 2025 13:00

vikramdattu requested a review from sirknightj June 18, 2025 15:30
