 | promptfoo | DeepSeek-Coder
---|---|---
Mentions | 5 | 8
Stars | 328 | 5,679
Growth | - | 6.4%
Activity | 10.0 | 8.5
 | 11 months ago | 13 days ago
Language | TypeScript | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
promptfoo
- Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B
Jumping in because I'm a big believer in (1) local LLMs, and (2) evals specific to individual use cases.
[0] https://github.com/typpo/promptfoo
- Meta Llama 3
- Launch HN: Talc AI (YC S23) – Test Sets for AI
Congrats on the launch!
I've been interested in automatic testset generation because I find that the chore of writing tests is one of the reasons people shy away from evals. Recently landed eval testset generation for promptfoo (https://github.com/typpo/promptfoo), but it is non-RAG so more simplistic than your implementation.
Was also eyeballing this paper https://arxiv.org/abs/2401.03038, which outlines a method for generating asserts from prompt version history that may also be useful for these eval tools.
- GPT-Prompt-Engineer
Thanks for the promptfoo mention. For anyone else who might prefer deterministic, programmatic evaluation of LLM outputs, I've been building promptfoo: https://github.com/typpo/promptfoo
Example asserts include basic string checks, regex, is-json, cosine similarity, etc.
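To make the assert types mentioned above concrete, here is a minimal sketch of deterministic output checks in plain Python. It is not promptfoo's API or configuration format; the function names are hypothetical, and the cosine-similarity check assumes both texts have already been turned into numeric vectors by an embedding model that is not shown here.

```python
import json
import math
import re


def contains(output: str, needle: str) -> bool:
    """Basic string check: the output must contain the expected substring."""
    return needle in output


def matches_regex(output: str, pattern: str) -> bool:
    """Regex check: the pattern must match somewhere in the output."""
    return re.search(pattern, output) is not None


def is_json(output: str) -> bool:
    """is-json check: the whole output must parse as valid JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length numeric vectors.

    Returns 0.0 when either vector has zero length (an assumption made
    here to avoid dividing by zero).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Each check returns a plain boolean or number, so the same model output always produces the same verdict, which is the point of programmatic evaluation. A similarity check would typically pass when the score clears a chosen threshold.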
DeepSeek-Coder
- Meta Llama 3
deepseek-coder-instruct 6.7B still looks better than Llama 3 8B on HumanEval [0], and deepseek-coder-instruct 33B is still within reach to run on a 32 GB MacBook M2 Max. Llama 3 70B, on the other hand, will be hard to run locally unless you really have 128 GB of RAM or more. But we will see in the following days how it performs in real life.
[0] https://github.com/deepseek-ai/deepseek-coder?tab=readme-ov-...
- Mistral Remove "Committing to open models" from their website
Deepseek (https://github.com/deepseek-ai/DeepSeek-Coder?tab=readme-ov-...) code is MIT and the model license is available too.
- FLaNK Stack 05 Feb 2024
- Stable Code 3B: Coding on the Edge
https://github.com/deepseek-ai/deepseek-coder
33B Instruct doesn't beat 6.7B Instruct by much but maybe those % improvements mean more for your usage.
I run 6.7B since I have 16GB RAM.
- What the heck is so great about this model?
Deepseek Coder: https://github.com/deepseek-ai/DeepSeek-Coder (Best open source coding model right now)
- Deepseek Coder instruct – 6.7B model beats gpt3.5-turbo in coding
- FLaNK Stack Weekly for 13 November 2023
- DeepSeek-Coder: Has anyone tried this one?
What are some alternatives?
rebuff - LLM Prompt Injection Detector
draw-a-ui - Draw a mockup and generate html for it
gpt-engineer - Specify what you want it to build, the AI asks for clarification, and then builds it.
FT-Merge-Quantize-Infer-CML
ChainForge - An open-source visual programming environment for battle-testing prompts to LLMs.
cucim - cuCIM - RAPIDS GPU-accelerated image processing library
gateway - A Blazing Fast AI Gateway. Route to 100+ LLMs with 1 fast & friendly API.
linen.dev - Lightweight Google-searchable Slack alternative for Communities
shap-e - Generate 3D objects conditioned on text or images
wubloader
sugarcane-ai - npm like package ecosystem for Prompts
clipea - Like Clippy but for the CLI. A blazing fast AI helper for your command line