Metric | promptfoo | OLMo
---|---|---
Mentions | 5 | 3
Stars | 328 | 4,081
Growth | - | 2.4%
Activity | 10.0 | 9.9
Last commit | 11 months ago | 7 days ago
Language | TypeScript | Python
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
promptfoo
- Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B
Jumping in because I'm a big believer in (1) local LLMs, and (2) evals specific to individual use cases.
[0] https://github.com/typpo/promptfoo
- Meta Llama 3
- Launch HN: Talc AI (YC S23) – Test Sets for AI
Congrats on the launch!
I've been interested in automatic testset generation because I find that the chore of writing tests is one of the reasons people shy away from evals. Recently landed eval testset generation for promptfoo (https://github.com/typpo/promptfoo), but it is non-RAG so more simplistic than your implementation.
Was also eyeballing this paper https://arxiv.org/abs/2401.03038, which outlines a method for generating asserts from prompt version history that may also be useful for these eval tools.
- GPT-Prompt-Engineer
Thanks for the promptfoo mention. For anyone else who might prefer deterministic, programmatic evaluation of LLM outputs, I've been building promptfoo: https://github.com/typpo/promptfoo
Example asserts include basic string checks, regex, is-json, cosine similarity, etc.
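To make the assert types named above concrete, here is a minimal Python sketch of deterministic output checks. It is not promptfoo's API: the function names (`contains`, `matches_regex`, `is_json`, `cosine_similarity`) are hypothetical, and the similarity check uses simple word-count vectors as a stand-in for whatever vector representation a real tool would use.

```python
import json
import math
import re
from collections import Counter


def contains(output: str, needle: str) -> bool:
    """Basic string check: pass if the needle appears in the output."""
    return needle in output


def matches_regex(output: str, pattern: str) -> bool:
    """Regex check: pass if the pattern matches anywhere in the output."""
    return re.search(pattern, output) is not None


def is_json(output: str) -> bool:
    """is-json check: pass if the output parses as JSON."""
    try:
        json.loads(output)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over word-count vectors (an assumption for this
    sketch; it ignores word order and meaning)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical model output and the checks applied to it.
output = '{"answer": "Paris is the capital of France"}'
results = {
    "contains": contains(output, "Paris"),
    "regex": matches_regex(output, r'"answer":\s*"'),
    "is-json": is_json(output),
    "similar": cosine_similarity(
        "Paris is the capital of France",
        "the capital of France is Paris",
    )
    >= 0.8,
}
print(results)
```

Each check returns the same result for the same output, which is the property that makes this style of evaluation repeatable across prompt versions.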
OLMo
- Meta Llama 3
OLMo from AI2. They released the model weights plus training data and training code.
link: https://allenai.org/olmo
- Hello OLMo: A Open LLM
It looks like the weights [0] and code [1] are Apache licensed, but the training data [2] is using the license that OP is quoting from.
[0] https://huggingface.co/allenai/OLMo-7B
[1] https://github.com/allenai/OLMo
[2] https://huggingface.co/datasets/allenai/dolma
- FLaNK Stack Weekly 12 February 2024
What are some alternatives?
rebuff - LLM Prompt Injection Detector
electric - Local-first sync layer for web and mobile apps. Build reactive, realtime, local-first apps directly on Postgres.
gpt-engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
RustPython - A Python Interpreter written in Rust
ChainForge - An open-source visual programming environment for battle-testing prompts to LLMs.
Real-Time-Voice-Cloning - Clone a voice in 5 seconds to generate arbitrary speech in real-time
gateway - A Blazing Fast AI Gateway. Route to 100+ LLMs with 1 fast & friendly API.
flink-cdc - Flink CDC is a streaming data integration tool
shap-e - Generate 3D objects conditioned on text or images
rich-cli - Rich-cli is a command line toolbox for fancy output in the terminal
sugarcane-ai - npm like package ecosystem for Prompts 🤖
tortoise-tts - A multi-voice TTS system trained with an emphasis on quality