| | anthropic-tokenizer | llm_utils |
|---|---|---|
| Mentions | 3 | 2 |
| Stars | 87 | 24 |
| Growth | - | - |
| Activity | 7.8 | 6.0 |
| Latest commit | 6 days ago | 22 days ago |
| Language | Python | Rust |
| License | MIT License | MIT License |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
Show HN: Token price calculator for 400+ LLMs
> tiktoken.encoding_for_model(model)
Calling this with model == 'gpt-4o' will encode with o200k_base, no?
But yes, I do agree with you. I had a hard time implementing non-tiktoken tokenizers for my project, and ended up manually adding tokenizer.json files into my repo.[1] The other option is downloading from HF, but the official repos hosting a model's tokenizer.json often require agreeing to their terms before access. That means an HF key plus accepting the license, which is not a good experience for a consumer of the package.
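The appeal of vendoring is that a bundled tokenizer.json loads with no network access or HF token. A minimal sketch with the Hugging Face `tokenizers` library; the toy word-level vocabulary here is just a stand-in so the example has a tokenizer.json on disk:

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Whitespace

# Build a toy tokenizer so we have a tokenizer.json to load;
# in practice this file would be vendored from the model's repo.
tok = Tokenizer(WordLevel(vocab={"[UNK]": 0, "hello": 1, "world": 2},
                          unk_token="[UNK]"))
tok.pre_tokenizer = Whitespace()

path = os.path.join(tempfile.mkdtemp(), "tokenizer.json")
tok.save(path)

# Loading the vendored file requires no HF account or API key.
loaded = Tokenizer.from_file(path)
ids = loaded.encode("hello world").ids
```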
> Message frame tokens?
Do you mean the chat template tokens? Oh, that's another good point. Yeah, it counts OpenAI prompt tokens. I solved this by implementing a Jinja templating engine to create the full prompt. [2] Granted, both llama.cpp and mistral-rs do this on the backend, so it's purely for counting tokens. I guess it would make sense to add a function to convert tokens to dollars.
[1] https://github.com/ShelbyJenkins/llm_utils/tree/main/src/mod...
What are some alternatives?
tokencost - Easy token price estimates for 400+ LLMs
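The kind of estimate tokencost provides reduces to a per-million-token price lookup. A hedged sketch of the arithmetic; the function name and prices here are made-up examples, not tokencost's actual API or rates:

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    # Prices are quoted per million tokens, so scale both counts down.
    return (prompt_tokens * input_price_per_m
            + completion_tokens * output_price_per_m) / 1_000_000

# 1,200 prompt + 300 completion tokens at $2.50/$10.00 per M tokens:
cost = estimate_cost(1_200, 300, 2.50, 10.00)  # → 0.006
```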