[WIP] AWS Polly Agent TTS Provider #536
Walkthrough
The changes introduce AWS Polly as a supported text-to-speech provider for Deepgram Voice Agent. This includes new data structures for AWS credentials, updates to the provider configuration to support Polly-specific fields, an example Python script demonstrating integration, and comprehensive documentation for setup and troubleshooting.
Sequence Diagram(s)
sequenceDiagram
participant User
participant MainScript
participant DeepgramClient
participant DeepgramAgentAPI
participant AWSPolly
User->>MainScript: Start script
MainScript->>DeepgramClient: Initialize with AWS Polly credentials
MainScript->>DeepgramAgentAPI: Connect via WebSocket
DeepgramAgentAPI->>MainScript: Connection established
MainScript->>DeepgramAgentAPI: Send agent configuration (incl. Polly as speak provider)
DeepgramAgentAPI->>AWSPolly: Request speech synthesis (with credentials)
AWSPolly-->>DeepgramAgentAPI: Return synthesized audio
DeepgramAgentAPI-->>MainScript: Deliver audio and events
MainScript-->>User: Playback audio, handle events, keepalive loop
User->>MainScript: Interact (e.g., keyboard interrupt)
MainScript->>DeepgramAgentAPI: Close connection
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 Pylint (3.3.7): deepgram/clients/agent/v1/websocket/options.py, examples/agent/aws_polly/main.py
Actionable comments posted: 2
🔭 Outside diff range comments (1)
deepgram/clients/agent/v1/websocket/options.py (1)
187-247: ⚠️ Potential issue
Address pylint violation for too many instance attributes.
The SpeakProvider enhancements properly support AWS Polly integration, but the class now exceeds pylint's instance attribute limit (8/7).
Consider these solutions:
- Add a pylint disable comment for this specific case
- Group related AWS Polly fields into a nested dataclass
- Increase the pylint limit if justified
Apply this diff to suppress the pylint warning:
 @dataclass
+# pylint: disable=too-many-instance-attributes
 class SpeakProvider(BaseResponse):
🧰 Tools
🪛 GitHub Actions: Check - lint
[error] 189-189: pylint: Too many instance attributes (8/7) (too-many-instance-attributes)
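The second option listed in this comment, grouping related AWS Polly fields into a nested dataclass, could look roughly like the sketch below. The class names and the fields `engine`, `language_code`, and `region` are hypothetical placeholders, not the SDK's actual attributes; the point is only that the top-level class ends up with fewer instance attributes.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class PollySettingsSketch:
    """Hypothetical container for Polly-only options (names are assumptions)."""

    engine: Optional[str] = None
    language_code: Optional[str] = None
    region: Optional[str] = None


@dataclass
class SpeakProviderSketch:
    """Hypothetical stand-in for SpeakProvider, not the SDK class."""

    type: str = "aws_polly"
    voice: Optional[str] = None
    # All Polly-specific options live behind a single attribute.
    polly: PollySettingsSketch = field(default_factory=PollySettingsSketch)


provider = SpeakProviderSketch(
    voice="Matthew",
    polly=PollySettingsSketch(engine="standard"),
)
# Number of top-level attributes that pylint would count for this class.
top_level_count = len(fields(SpeakProviderSketch))
```

Nesting also changes the serialized shape of the object, so this only fits if the wire format accepts a nested structure or the fields are flattened again during serialization.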
🧹 Nitpick comments (1)
examples/agent/aws_polly/main.py (1)
148-153: Consider more efficient keepalive implementation.
The current keepalive loop sends a message every second, which might be excessive. Consider using a longer interval or implementing a more sophisticated heartbeat mechanism.
Apply this diff for a more reasonable keepalive interval:
 # Keep the connection alive
 while True:
-    time.sleep(1)
+    time.sleep(30)  # Send keepalive every 30 seconds instead of every second
     keep_alive = AgentKeepAlive()
     connection.send(keep_alive.to_json())
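A slightly more flexible variant of this loop is sketched below, assuming the caller supplies a `send` callable and a stop event. The function name `run_keepalive` and the default payload string are assumptions for illustration; in the example script the payload would come from `AgentKeepAlive().to_json()`. A suitable interval depends on the server's idle timeout, which this thread does not state.

```python
import threading
from typing import Callable


def run_keepalive(
    send: Callable[[str], None],
    interval_s: float,
    stop: threading.Event,
    payload: str = '{"type": "KeepAlive"}',  # assumed payload, for illustration
) -> int:
    """Send `payload` every `interval_s` seconds until `stop` is set.

    Returns the number of keepalive messages sent.
    """
    sent = 0
    # Event.wait returns False on timeout and True once the event is set,
    # so the loop sleeps for the interval yet exits promptly on shutdown.
    while not stop.wait(interval_s):
        send(payload)
        sent += 1
    return sent
```

Using `Event.wait` instead of `time.sleep` lets a signal handler or another thread stop the loop without waiting out a full interval.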
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
deepgram/clients/agent/v1/websocket/options.py (4 hunks)
examples/agent/aws_polly/README.md (1 hunks)
examples/agent/aws_polly/main.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
deepgram/clients/agent/v1/websocket/options.py (1)
deepgram/clients/common/v1/shared_response.py (1)
BaseResponse (16-44)
🪛 LanguageTool
examples/agent/aws_polly/README.md
[uncategorized] ~63-~63: The preposition ‘to’ seems more likely in this position.
Context: ...e 1. Set up your environment variables in .env
2. Run the example: ```bash ...
(AI_HYDRA_LEO_REPLACE_IN_TO)
🪛 GitHub Actions: Check - lint
deepgram/clients/agent/v1/websocket/options.py
[error] 189-189: pylint: Too many instance attributes (8/7) (too-many-instance-attributes)
🔇 Additional comments (4)
examples/agent/aws_polly/README.md (1)
1-81: Well-structured documentation with comprehensive setup instructions.
The README provides excellent coverage of AWS Polly integration including prerequisites, configuration examples, and troubleshooting guidance. The examples correctly demonstrate the new AWSPollyCredentials usage for both IAM and STS authentication methods.
deepgram/clients/agent/v1/websocket/options.py (2)
169-185: Well-implemented AWS Polly credentials dataclass.
The AWSPollyCredentials dataclass follows the established patterns in the codebase with proper field definitions, exclusion metadata, and dictionary access implementation. The support for both IAM and STS credential types provides flexibility for different AWS authentication scenarios.
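To make the IAM/STS distinction concrete, here is a minimal standalone sketch of a credentials dataclass. It is not the SDK's AWSPollyCredentials: the class name is made up, and the `session_token` field for STS is an assumption. Only `type`, `region`, `access_key_id`, and `secret_access_key` appear in the example code quoted in this review.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class PollyCredentialsSketch:
    """Hypothetical stand-in for AWSPollyCredentials (not the SDK class)."""

    type: str  # "IAM" or "STS"
    region: str
    access_key_id: str
    secret_access_key: str
    session_token: Optional[str] = None  # assumed field name, STS only

    def __post_init__(self) -> None:
        if self.type not in ("IAM", "STS"):
            raise ValueError(f"unsupported credential type: {self.type!r}")
        if self.type == "STS" and not self.session_token:
            raise ValueError("STS credentials require a session token")

    def to_dict(self) -> Dict[str, Any]:
        # Drop unset optional fields so IAM payloads carry no session_token key.
        return {k: v for k, v in asdict(self).items() if v is not None}


iam = PollyCredentialsSketch("IAM", "us-east-1", "example-key-id", "example-secret")
iam_payload = iam.to_dict()
```

Validating in `__post_init__` surfaces a missing STS token when the object is built rather than when the request is sent.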
210-215: ✅ Verification successful
Verify Union type handling in voice field.
The voice field now accepts Union[CartesiaVoice, str] to support both Cartesia objects and AWS Polly voice names. Ensure that downstream code properly handles both types during serialization/deserialization.
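As an illustration of what handling both types means, the sketch below round-trips a voice value that is either a structured object or a plain string. `VoiceSketch` is a hypothetical stand-in for CartesiaVoice, with assumed field names, and the helpers use only the standard library rather than the SDK's DataClassJsonMixin.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Union


@dataclass
class VoiceSketch:
    """Hypothetical stand-in for CartesiaVoice (field names are assumptions)."""

    mode: str
    id: str


def serialize_voice(voice: Union[VoiceSketch, str]) -> Union[Dict[str, Any], str]:
    # Structured voices become dicts; plain voice names pass through unchanged.
    if isinstance(voice, VoiceSketch):
        return asdict(voice)
    return voice


def deserialize_voice(raw: Union[Dict[str, Any], str]) -> Union[VoiceSketch, str]:
    # Dicts are rebuilt into the structured type; strings stay strings.
    if isinstance(raw, dict):
        return VoiceSketch(**raw)
    return raw
```

The type check on the way back in is what keeps a string voice name from being mistaken for a structured voice.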
🏁 Script executed:
#!/bin/bash
# Description: Search for voice field usage to ensure proper type handling
# Expected: Find usages that can handle both CartesiaVoice objects and strings
echo "Searching for voice field usage in agent-related files..."
rg -A 3 -B 3 "\.voice" --type py | grep -E "(agent|speak|voice)"
echo -e "\nSearching for CartesiaVoice usage..."
rg -A 3 "CartesiaVoice" --type py

Length of output: 1785

🏁 Script executed:
#!/bin/bash
# Inspect serialization logic for the `voice` field in options.py
echo "Locating `to_dict` implementation in options.py..."
grep -R -n "def to_dict" deepgram/clients/agent/v1/websocket/options.py
echo -e "\nShowing the `to_dict` block around the `voice` field (lines ~190–260)..."
sed -n '190,260p' deepgram/clients/agent/v1/websocket/options.py

Length of output: 2977

🏁 Script executed:
#!/bin/bash
# Locate BaseResponse definition to inspect its to_dict implementation
echo "Searching for BaseResponse class definition..."
rg -n "class BaseResponse" -C 3 --type py
# Once found, display the to_dict method block (adjust path if needed)
# Example: if BaseResponse is in deepgram/clients/common/base.py
# sed -n '1,200p' deepgram/clients/common/base.py | grep -n -C 5 "def to_dict"

Length of output: 656

🏁 Script executed:
#!/bin/bash
# Display the BaseResponse implementation to inspect to_dict and JSON methods
echo "Showing deepgram/clients/common/v1/shared_response.py (first 200 lines):"
sed -n '1,200p' deepgram/clients/common/v1/shared_response.py

Length of output: 2260
Resolved: voice Union[CartesiaVoice, str] is correctly handled
DataClassJsonMixin’s to_dict serializes both CartesiaVoice instances (to dict) and strings natively, and the overridden __getitem__ in Options converts any dict back to CartesiaVoice via from_dict. No further changes are needed.
examples/agent/aws_polly/main.py (1)
87-95: Clarify audio configuration and sample rate consistency.
The audio input and output configurations use the same sample rate (24000), which is good for consistency. The comment about AWS Polly outputting MP3 is helpful for understanding the format choice.
api_key = os.getenv("DEEPGRAM_API_KEY")
if not api_key:
    raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
print("API Key found")
🛠️ Refactor suggestion
Add validation for AWS credentials.
While the Deepgram API key is validated, the AWS credentials are not checked before usage, which could lead to runtime errors.
Apply this diff to add AWS credential validation:
 if not api_key:
     raise ValueError("DEEPGRAM_API_KEY environment variable is not set")
 print("API Key found")
+
+# Validate AWS credentials
+aws_access_key = os.getenv("AWS_ACCESS_KEY_ID")
+aws_secret_key = os.getenv("AWS_SECRET_ACCESS_KEY")
+if not aws_access_key or not aws_secret_key:
+    raise ValueError("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are required")
+print("AWS credentials found")
🤖 Prompt for AI Agents
In examples/agent/aws_polly/main.py around lines 24 to 27, AWS credentials are
not validated before use, which can cause runtime errors. Add checks to verify
that the necessary AWS environment variables (such as AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY) are set, and raise a clear error if any are missing,
similar to the existing validation for DEEPGRAM_API_KEY.
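One way to fold the Deepgram and AWS checks into a single step is sketched below. The helper name `require_env` is hypothetical and not part of the example script or the SDK. It reports every missing variable at once instead of stopping at the first one.

```python
import os
from typing import Dict, Iterable, Mapping


def require_env(
    names: Iterable[str],
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Return the requested variables, or raise naming every missing one."""
    names = list(names)
    # Treat empty strings as missing, matching the `if not api_key` check.
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise ValueError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in names}
```

In the example script this could be called once at startup with `DEEPGRAM_API_KEY`, `AWS_ACCESS_KEY_ID`, and `AWS_SECRET_ACCESS_KEY`; the `env` parameter exists so the check can be exercised without touching the real environment.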
    credentials=AWSPollyCredentials(
        type="IAM",
        region="us-east-1",
        access_key_id=os.getenv("AWS_ACCESS_KEY_ID", ""),
        secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", ""),
    )
)

# Configure AWS Polly endpoint
options.agent.speak.endpoint = Endpoint(
    method="POST",  # Explicitly set the method for AWS Polly
    url="https://polly.us-east-1.amazonaws.com/v1/speech",  # use the correct region
    headers=[
        Header(
            key="Content-Type",
            value="application/json"
        )
    ]
)
Fix region inconsistency between credentials and endpoint.
The AWS credentials specify the us-east-1 region, but this should be consistent throughout the configuration. Also, the hardcoded Polly endpoint URL may not be necessary as AWS SDKs typically handle endpoint resolution.
Apply this diff to ensure consistency:
     credentials=AWSPollyCredentials(
         type="IAM",
-        region="us-east-1",
+        region="us-west-2",  # Match the region used in README examples
         access_key_id=os.getenv("AWS_ACCESS_KEY_ID", ""),
         secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY", ""),
     )
 )
-# Configure AWS Polly endpoint
-options.agent.speak.endpoint = Endpoint(
-    method="POST",  # Explicitly set the method for AWS Polly
-    url="https://polly.us-east-1.amazonaws.com/v1/speech",  # use the correct region
-    headers=[
-        Header(
-            key="Content-Type",
-            value="application/json"
-        )
-    ]
-)
+# Note: AWS Polly endpoint is typically handled automatically by the AWS SDK
🤖 Prompt for AI Agents
In examples/agent/aws_polly/main.py around lines 116 to 134, ensure the AWS
region specified in the credentials matches the region used in the Polly
endpoint configuration. Remove the hardcoded Polly endpoint URL and rely on the
AWS SDK's default endpoint resolution to maintain consistency and avoid region
mismatches.
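If the explicit endpoint turns out to be required after all, an alternative way to avoid region mismatches is to derive both the credentials region and the endpoint URL from one value. The sketch below assumes the URL pattern used in the example script. The helper names and the plain-dict shape are illustrative only, not the SDK's objects or the API's wire format.

```python
from typing import Any, Dict


def polly_speech_url(region: str) -> str:
    """Build the Polly speech URL for a region, following the example's pattern."""
    if not region:
        raise ValueError("region must be a non-empty string")
    return f"https://polly.{region}.amazonaws.com/v1/speech"


def build_polly_config(
    region: str, access_key_id: str, secret_access_key: str
) -> Dict[str, Any]:
    """Hypothetical helper: one region value feeds both credentials and endpoint."""
    return {
        "credentials": {
            "type": "IAM",
            "region": region,
            "access_key_id": access_key_id,
            "secret_access_key": secret_access_key,
        },
        "endpoint": {
            "method": "POST",
            "url": polly_speech_url(region),
            "headers": [{"key": "Content-Type", "value": "application/json"}],
        },
    }
```

With a single source for the region, changing it in one place cannot leave the endpoint pointing at a different region than the credentials.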
More to do here as the example isn't running correctly yet, getting this error:
Closing for #532, which will be our approach going forward for the Agent Provider objects.