vikramdattu
Contributor

  • Handle partial writes by sending the data over multiple iterations
  • Retry when a message send fails
  • Track and handle messages that are received in parts, using the receiveMessage variable
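
For readers unfamiliar with the partial-write pattern named in the first bullet, here is a minimal sketch in plain C. It is not the code from this PR and does not use the libwebsockets API: `write_fn`, `send_all`, `chunk_sink` and `chunk_sink_write` are hypothetical names introduced only for illustration, and the writer is assumed to return the number of bytes it accepted, or a negative value on error.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport callback: returns bytes accepted, or < 0 on error. */
typedef int (*write_fn)(const unsigned char* data, size_t len, void* ctx);

/* Keeps calling the writer from the current offset until every byte is sent.
 * Returns 0 on success, -1 on a write error or when no progress is made. */
static int send_all(write_fn writer, void* ctx, const unsigned char* data, size_t len)
{
    size_t offset = 0;
    assert(writer != NULL && data != NULL);
    while (offset < len) {
        int written = writer(data + offset, len - offset, ctx);
        if (written <= 0) {
            return -1;
        }
        offset += (size_t) written; /* resume after the bytes already sent */
    }
    return 0;
}

/* Demo-only sink that accepts at most max_chunk bytes per call,
 * which forces send_all to iterate. */
typedef struct {
    unsigned char buf[64];
    size_t used;
    size_t max_chunk;
} chunk_sink;

static int chunk_sink_write(const unsigned char* data, size_t len, void* ctx)
{
    chunk_sink* sink = (chunk_sink*) ctx;
    size_t n = len < sink->max_chunk ? len : sink->max_chunk;
    if (sink->used + n > sizeof(sink->buf)) {
        return -1;
    }
    memcpy(sink->buf + sink->used, data, n);
    sink->used += n;
    return (int) n;
}
```

With `max_chunk` set to 4, an 11-byte message goes out in three calls; the caller only sees the final result.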

@vikramdattu vikramdattu force-pushed the improve/lwsapicalls_robustness branch from 38a25f5 to 9f7ebd9 Compare May 15, 2025 10:21
Contributor

@sirknightj sirknightj left a comment

These are good improvements.

Wanted to know if you had a method or setup to test these failure recovery scenarios and if we could add something for it? Seems like it would be a lot of additional effort.

retValue = (INT32) lws_write(wsi, pLwsCallInfo->sendBuffer + LWS_PRE, remainingSize, LWS_WRITE_TEXT);
if (retValue < 0) {
DLOGW("Write failed with %d", retValue);
CHK(FALSE, !STATUS_SUCCESS);
Contributor

This should use an actual status code. I believe !STATUS_SUCCESS evaluates to 1, which is the value of STATUS_NULL_ARG, so a failed write would be reported as a null-argument error.
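
The concern can be reproduced in isolation. The sketch below assumes, as the comment does, that STATUS_SUCCESS is 0 and STATUS_NULL_ARG is 1; the typedef and both values are re-declared here rather than taken from the SDK headers, and `STATUS_EXAMPLE_WRITE_FAILED` and both helper functions are hypothetical names used only to show the alternative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t STATUS;

#define STATUS_SUCCESS  ((STATUS) 0x00000000) /* assumed value */
#define STATUS_NULL_ARG ((STATUS) 0x00000001) /* assumed value */

/* Hypothetical dedicated code, not an SDK constant. */
#define STATUS_EXAMPLE_WRITE_FAILED ((STATUS) 0x7fff0001)

/* Logical negation of 0 is 1 in C, so this yields the same value as
 * STATUS_NULL_ARG under the assumptions above. */
static STATUS negated_success(void)
{
    STATUS status = (STATUS) !STATUS_SUCCESS;
    assert(status == 1);
    return status;
}

/* Mapping the write result to a dedicated code keeps the two failures apart. */
static STATUS status_for_write_result(int retValue)
{
    return retValue < 0 ? STATUS_EXAMPLE_WRITE_FAILED : STATUS_SUCCESS;
}
```

A caller that inspects the returned status can then tell a failed write from a genuine null-argument error.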

offset = ATOMIC_LOAD(&pSignalingClient->pOngoingCallInfo->sendOffset);
size = ATOMIC_LOAD(&pSignalingClient->pOngoingCallInfo->sendBufferSize);

result = (SERVICE_CALL_RESULT) ATOMIC_LOAD(&pSignalingClient->messageResult);

if (offset != size && result == SERVICE_CALL_RESULT_NOT_SET) {
CHK_STATUS(CVAR_WAIT(pSignalingClient->sendCvar, pSignalingClient->sendLock, SIGNALING_SEND_TIMEOUT));
retryCount++;
Contributor

Could we add a debug log (DLOGD) whenever the retry count is incremented?

Contributor

If CVAR_WAIT returns a non-success status (e.g. the operation timed out), CHK_STATUS would goto Cleanup and bypass the retry count. That doesn't seem intended, since there is a "// Check if we timed out" comment further down.

Contributor Author

@sirknightj it looks a bit odd, but here is what's happening:

  • If CVAR_WAIT returns on timeout, we exit anyway.
  • Only if it returns early do we increment retryCount and wait again, provided offset != size.
    Otherwise we stop iterating and continue with the rest of the code path.

So retryCount basically makes it retry when the wake-up came within the timeout.
Let me know if you have a better suggestion for handling sendCvar wake-ups without giving up entirely.
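
The control flow described above can be modelled without any threading, which may also help with the earlier question about testing. This is a sketch under assumptions, not the PR's loop: the condition-variable wait is replaced by a hypothetical `simulated_wait`, all type and function names are made up, and stopping once MAX_RETRY_COUNT wake-ups have been used is an assumption about the intended behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RETRY_COUNT 3

typedef enum { WAIT_SIGNALED, WAIT_TIMED_OUT } wait_result;
typedef enum { SEND_COMPLETE, SEND_TIMED_OUT, SEND_RETRIES_EXHAUSTED } send_result;

typedef struct {
    size_t offset;              /* bytes sent so far */
    size_t size;                /* total bytes to send */
    size_t progress_per_wakeup; /* demo only: 0 simulates a timeout */
} send_state;

/* Stand-in for the condition-variable wait: each wake-up advances offset. */
static wait_result simulated_wait(send_state* st)
{
    size_t remaining = st->size - st->offset;
    if (st->progress_per_wakeup == 0) {
        return WAIT_TIMED_OUT;
    }
    st->offset += st->progress_per_wakeup < remaining ? st->progress_per_wakeup : remaining;
    return WAIT_SIGNALED;
}

/* A timeout exits immediately; an early wake-up with data still pending
 * counts as one retry and waits again, up to MAX_RETRY_COUNT times. */
static send_result await_send(send_state* st, unsigned* wakeups)
{
    assert(st != NULL && wakeups != NULL);
    *wakeups = 0;
    while (st->offset != st->size) {
        if (*wakeups == MAX_RETRY_COUNT) {
            return SEND_RETRIES_EXHAUSTED;
        }
        if (simulated_wait(st) == WAIT_TIMED_OUT) {
            return SEND_TIMED_OUT;
        }
        (*wakeups)++;
    }
    return SEND_COMPLETE;
}
```

Returning distinct results for timeout and exhausted retries is a design choice of this sketch; it makes each exit path individually checkable in a unit test.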

connectInfo.port = SIGNALING_DEFAULT_SSL_PORT;
connectInfo.alpn = "http/1.1"; // Force HTTP/1.1 only
connectInfo.protocol = "http/1.1"; // Force HTTP/1.1 protocol
Contributor

What is the purpose of the H2 flag if HTTP/1.1 is forced? Is it for latency/performance optimization?

It seems that HTTP/2 should work, judging from this quick test:

curl -sI https://m-xxxxxxxx.kinesisvideo.us-west-2.amazonaws.com -o/dev/null -w '%{http_version}\n'
2
curl -sI https://v-xxxxxxxx.kinesisvideo.us-west-2.amazonaws.com -o/dev/null -w '%{http_version}\n'
2

Contributor Author

I added this to make the WebSocket connection work more reliably on ESP platforms. It reduces memory consumption a bit and works much more reliably. BTW, do we really gain much from HTTP/2 for our use case?

Now that I have a lot of optimisations in place, I can test HTTP/2 on ESP32 and revert the change.

@@ -1907,6 +1948,9 @@ STATUS writeLwsData(PSignalingClient pSignalingClient, BOOL awaitForResponse)
SIZE_T offset, size;
SERVICE_CALL_RESULT result;

UINT32 retryCount = 0;
const UINT32 MAX_RETRY_COUNT = 3;
Contributor

I think this should be a #define (a very, very minor memory reduction).
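
Beyond memory, one practical difference in C is that a macro is an integer constant expression while a `const` variable is not, so only the macro can size a file-scope array. A small sketch, with hypothetical delay values and a hypothetical helper `delay_for_attempt`:

```c
#include <assert.h>

#define MAX_RETRY_COUNT 3

/* A file-scope array size must be a constant expression in C;
 * `const UINT32 MAX_RETRY_COUNT = 3;` would not qualify here. */
static const unsigned retry_delays_ms[MAX_RETRY_COUNT] = {100, 200, 400};

/* Returns the (made-up) delay before the given retry attempt. */
static unsigned delay_for_attempt(unsigned attempt)
{
    assert(attempt < MAX_RETRY_COUNT);
    return retry_delays_ms[attempt];
}
```

Whether the macro actually saves memory depends on the compiler; an optimizing build will often fold a local `const` into an immediate value as well.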

@vikramdattu vikramdattu force-pushed the improve/lwsapicalls_robustness branch from 9f7ebd9 to 152ba5c Compare June 18, 2025 13:00
@vikramdattu
Contributor Author

> Wanted to know if you had a method or setup to test these failure recovery scenarios and if we could add something for it? Seems like it would be a lot of additional effort.

@sirknightj I see these re-attempts quite often on ESP platforms, hence the change. I will check whether it is straightforward to add a test.

@vikramdattu vikramdattu requested a review from sirknightj June 18, 2025 15:30