feat - tests + evals #20

mclenhard · 2025-05-02T14:08:05Z

Adds a new e2e test that loads an MCP client, which in turn runs the server and processes an actual tool call. The test then grades the response for correctness.
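For readers unfamiliar with this style of test, the flow described above (call a tool through a client, then grade the response) might be sketched roughly as follows. Every name here (`CallTool`, `Grader`, `EvalCase`, `runEval`) is hypothetical and chosen for illustration; none of it is taken from this PR or from any real library's API. The client and the grader are passed in as plain functions so the sketch makes no claims about how either is implemented.

```typescript
// Hypothetical shapes for illustration only; not the PR's actual code.
type ToolCall = { tool: string; args: Record<string, unknown> };

// Stands in for "an MCP client that runs the server and performs the tool call".
type CallTool = (call: ToolCall) => Promise<string>;

type Grade = { score: number; reason: string };

// Stands in for whatever grades the response (for example, an LLM-backed grader).
type Grader = (prompt: string, response: string) => Promise<Grade>;

interface EvalCase {
  name: string;
  prompt: string;       // what the grader is told the tool call should achieve
  call: ToolCall;       // the tool call to send through the client
  passingScore: number; // minimum score that counts as a pass
}

interface EvalResult {
  name: string;
  passed: boolean;
  grade: Grade;
}

// Run one eval: perform the tool call first, then grade the response.
async function runEval(
  evalCase: EvalCase,
  callTool: CallTool,
  grade: Grader
): Promise<EvalResult> {
  const response = await callTool(evalCase.call);
  const result = await grade(evalCase.prompt, response);
  return {
    name: evalCase.name,
    passed: result.score >= evalCase.passingScore,
    grade: result,
  };
}
```

Injecting `callTool` and `grade` keeps the runner testable with stubs, so the real server and a real grader are only needed in the end-to-end run.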

note: I'm the package author

mclenhard · 2025-05-15T13:00:45Z

Hey @ferrislucas, any feedback?

ferrislucas · 2025-05-17T13:07:30Z

Hi @mclenhard, sorry, I've been very busy. I really appreciate the contribution! I love this idea. I have two questions:

1. Can this be made a dev dependency instead of requiring it as a dependency to run the server?
2. It seems like this is intended to be used as a GitHub action, and I think it needs a yml file to set up the action. Is that your intent?

Thanks again! It would be great to get this to run on every PR without requiring end users to have it in their build.

mclenhard · 2025-05-18T12:23:45Z

Yeah, I can make this a dev dependency. Good idea.

Regarding the GitHub action: it doesn't have to run as a GitHub action. For some projects I just model the evals as tests that I run locally. It's really up to you how you want the setup to be. That being said, I can add it as a GitHub action, but you'll need to set up an OpenAI API key, which you would add as a GitHub action secret. Since this is an open-source project, though, if you enable data sharing in OpenAI, you can get 2.5 million tokens free, which should be more than enough.
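As an illustration of the API key requirement, an eval run could check for the key up front and skip cleanly when it is missing, for example on a fork where the secret is not available. This is only a sketch: the helper `resolveApiKey` is hypothetical, and the variable name `OPENAI_API_KEY` is an assumption, since the thread does not say which name the evals read.

```typescript
// Hypothetical helper for illustration; not part of this PR.
type KeyCheck =
  | { run: true; apiKey: string }
  | { run: false; reason: string };

// Decide whether the evals can run, given a map of environment variables.
// ASSUMPTION: the key is exposed as OPENAI_API_KEY; the thread does not name it.
function resolveApiKey(env: Record<string, string | undefined>): KeyCheck {
  const raw = env["OPENAI_API_KEY"];
  if (raw === undefined || raw.trim() === "") {
    return { run: false, reason: "OPENAI_API_KEY is not set; skipping evals" };
  }
  return { run: true, apiKey: raw.trim() };
}
```

In a Node-based test this would be called as `resolveApiKey(process.env)`, with the value supplied locally by the shell or, in CI, by the repository secret.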

Let me know if you want to go the GitHub action route and I can set up the YAML files.

ferrislucas · 2025-05-19T12:25:11Z

@mclenhard awesome! I would love to get this configured as an action, and I'm happy to set up the necessary secret. Thank you so much for contributing! Much appreciated!

mclenhard · 2025-05-19T22:13:48Z

Sounds great, will work on this tomorrow!

mclenhard · 2025-05-22T14:47:47Z

Just added the GitHub action! Let me know if you have any feedback.

ferrislucas · 2025-05-28T02:55:20Z

Thanks @mclenhard! I think the action needs to run on a macOS image in order for iTerm to be available and for the eval to pass. There probably needs to be an init step to start iTerm before the eval. Does that make sense?

feat - tests + evals

cc2dd96

[feat] add github action

90c2b2d
