*[AlpacaEval](https://github.com/tatsu-lab/alpaca_eval) - AlpacaEval is an automatic evaluator for instruction-following language models.
*[ARES](https://github.com/stanford-futuredata/ARES) - ARES is a framework for automatically evaluating Retrieval-Augmented Generation (RAG) models.
*[AutoML Benchmark](https://github.com/openml/automlbenchmark) - AutoML Benchmark is a framework for evaluating and comparing open-source AutoML systems.
*[Banana-lyzer](https://github.com/reworkd/bananalyzer) - Banana-lyzer is an open-source AI Agent evaluation framework and dataset for web tasks with Playwright.
*[Code Generation LM Evaluation Harness](https://github.com/bigcode-project/bigcode-evaluation-harness) - Code Generation LM Evaluation Harness is a framework for the evaluation of code generation models.
*[continuous-eval](https://github.com/relari-ai/continuous-eval) - continuous-eval is a framework for data-driven evaluation of LLM-powered applications.
*[Deepchecks](https://github.com/deepchecks/deepchecks) - Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.
*[DeepEval](https://github.com/confident-ai/deepeval) - DeepEval is a simple-to-use, open-source evaluation framework for LLM applications (a minimal usage sketch follows this list).
*[EvalAI](https://github.com/Cloud-CV/EvalAI) - EvalAI is an open-source platform for evaluating and comparing AI algorithms at scale.
*[Evals](https://github.com/openai/evals) - Evals is a framework for evaluating OpenAI models and an open-source registry of benchmarks.
*[Inspect](https://github.com/UKGovernmentBEIS/inspect_ai) - Inspect is a framework for large language model evaluations.
*[InterCode](https://github.com/princeton-nlp/intercode) - InterCode is a lightweight, flexible, and easy-to-use framework for designing interactive code environments to evaluate language agents that can code.
*[Langfuse](https://github.com/langfuse/langfuse) - Langfuse is an observability & analytics solution for LLM-based applications.
*[LangTest](https://github.com/JohnSnowLabs/langtest) - LangTest is a comprehensive evaluation toolkit for NLP models.
*[Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) - Language Model Evaluation Harness is a framework to test generative language models on a large number of different evaluation tasks.
*[LightEval](https://github.com/huggingface/lighteval) - LightEval is a lightweight LLM evaluation suite.
*[LLMonitor](https://github.com/lunary-ai/lunary) - LLMonitor is an observability & analytics tool for AI apps and agents.
*[LLMPerf](https://github.com/ray-project/llmperf) - LLMPerf is a tool for evaluating the performance of LLM APIs.
*[LLM AutoEval](https://github.com/mlabonne/llm-autoeval) - LLM AutoEval simplifies the process of evaluating LLMs using a convenient Colab notebook.
*[lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval) - lmms-eval is an evaluation framework meticulously crafted for consistent and efficient evaluation of large multimodal models (LMMs).
*[MLPerf Inference](https://github.com/mlcommons/inference) - MLPerf Inference is a benchmark suite for measuring how fast systems can run models in a variety of deployment scenarios.
*[mltrace](https://github.com/loglabs/mltrace) - mltrace is a lightweight, open-source Python tool to get "bolt-on" observability in ML pipelines.
*[Tonic Validate](https://github.com/TonicAI/tonic_validate) - Tonic Validate is a high-performance evaluation framework for LLM/RAG outputs.
*[TruLens](https://github.com/truera/trulens) - TruLens provides a set of tools for evaluating and tracking LLM experiments.
*[TrustLLM](https://github.com/HowieHwong/TrustLLM) - TrustLLM is a comprehensive framework to evaluate the trustworthiness of large language models, which includes principles, surveys, and benchmarks.
*[UpTrain](https://github.com/uptrain-ai/uptrain) - UpTrain is an open-source tool for evaluating LLM applications.
*[VBench](https://github.com/Vchitect/VBench) - VBench is a comprehensive benchmark suite for video generative models.
*[VLMEvalKit](https://github.com/open-compass/VLMEvalKit) - VLMEvalKit is an open-source evaluation toolkit for large vision-language models (LVLMs).
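
As a quick illustration of how these evaluation frameworks are typically used, below is a minimal sketch with DeepEval. It assumes `deepeval` is installed (`pip install deepeval`) and that an `OPENAI_API_KEY` is available for the default LLM-as-judge metric; the test case contents and threshold are illustrative only, not part of any framework's required setup.

```python
# Minimal DeepEval sketch: score a single LLM response for answer relevancy.
# Assumes `pip install deepeval` and an OPENAI_API_KEY in the environment,
# since AnswerRelevancyMetric uses an LLM judge by default.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def test_answer_relevancy():
    # Hypothetical input/output pair; replace with your application's data.
    test_case = LLMTestCase(
        input="What if these shoes don't fit?",
        actual_output="You can return them within 30 days for a full refund.",
    )
    metric = AnswerRelevancyMetric(threshold=0.7)
    # Fails the test if the relevancy score falls below the threshold.
    assert_test(test_case, [metric])
```

Run it with `deepeval test run <file>.py` or plain `pytest`; the general pattern of wrapping model inputs and outputs in test cases and scoring them with metrics carries over to many of the frameworks above.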