Show HN: Token price calculator for 400+ LLMs

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • tokencost

    Easy token price estimates for 400+ LLMs

    > Tangential critiques are preferable?

    Not at all. Reasoned arguments and adding information (the examples I gave were what I thought your main points were) to the discussion are preferable to (what seemed to be) character attacks. Your comment here, as an example, was mostly great. It provides the same level of usefulness to anyone reading it (highlighting that the computation is just C100K and that people will be misled if they try to use it the wrong way), and you also added reasoned counter-arguments to my OOM idea and several other interesting pieces of information. To the extent that you kept the character attacks against the author, you at least softened the language.

    Respectfully attacking ideas instead of people is especially important in online discourse like this. Even if you're right, attacking people tends to spiral a conversation out of control and convince no one (often persuading them of the opposite of whatever you were trying to say).

    > just C100K

    It's not just C100K though. It is cl100k_base for a few models [0], but even then the author does warn the caller (mind you, I'd prefer a mechanism like an `allow_approximate_token_count=False` parameter or whatever, but that's not fraud on the author's part; it's just dangerous API design).

    Going back to the "tone" thing, calling out those sorts of deficiencies is a great way to warn other people, let them decide if that sort of thing matters for their use case, point out potential flaws in your own reasoning (e.g., it's not totally clear to me if you think the code always uses C100K or always uses it for a subset of models, but if it's the former then you'd probably be interested in knowing that the tokenizer is actually correct for most models) and discuss better API designs. It makes everyone better off for having read your comment and invites more discussions which will hopefully also make everyone better off.
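    As a sketch of that safer API shape, the opt-in parameter could look something like this (the function name, the toy tokenizer registry, and the 4-characters-per-token heuristic are all hypothetical, invented purely for illustration):

```python
# Hypothetical sketch of an opt-in approximation API for token counting.
# The registry below is a stand-in; a real library would map model names
# to their actual tokenizers.

EXACT_TOKENIZERS = {
    "gpt-4o": lambda text: len(text.split()),  # stand-in for a real tokenizer
}

def count_tokens(text, model, allow_approximate_token_count=False):
    """Count tokens exactly if we have the model's tokenizer;
    otherwise approximate only if the caller explicitly opts in."""
    tokenizer = EXACT_TOKENIZERS.get(model)
    if tokenizer is not None:
        return tokenizer(text)
    if not allow_approximate_token_count:
        raise ValueError(
            f"No exact tokenizer for {model!r}; "
            "pass allow_approximate_token_count=True to accept an estimate"
        )
    return len(text) // 4  # rough heuristic: ~4 characters per token

print(count_tokens("hello world", "gpt-4o"))  # exact path
print(count_tokens("hello world" * 10, "some-other-model",
                   allow_approximate_token_count=True))  # approximate path
```

    The point of the design is that an approximate count can never reach the caller silently; they have to ask for it.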

    > outside the realm of even needing a function

    Maybe! I'd argue that it's useful to have all those prices (especially since not all tokens are created equally) in one place somewhere, but arguing that this is left-pad for LLM pricing is also a reasonable thing to talk about.

    > it's impossible to get definitive error bars

    That's also true, but that doesn't matter for every application. E.g., suppose you want to run some process on your entire corporate knowledge-base and want a ballpark estimate of costs. The tokenizer error is on average much smaller than the 30%+ you saw for some specific (currently unknown to us here at HN) very small input. Just run your data through this tool, tally up the costs, and you ought to be within 10%. Nobody cares if it's a $900 project or a $1300 project (since nobody is allocating expensive, notoriously unpredictable developers to a project with only 10-30% margins). You just tell the stakeholders it'll cost $2k and a dev-week, and if it takes less then everyone is happily surprised. If they say no at that estimate, they probably wouldn't have been ecstatic with the result if it actually cost $900 and a dev-day anyway.

    [0] https://github.com/AgentOps-AI/tokencost/blob/main/tokencost...
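    The back-of-the-envelope workflow described above could be sketched like this (the prices and the characters-per-token heuristic are placeholders, not real rates):

```python
# Ballpark cost estimate for running a corpus through an LLM.
# Prices are placeholders (USD per 1M input tokens), not current rates.
PRICE_PER_1M_INPUT = {"model-a": 5.00, "model-b": 0.50}

def estimate_cost(documents, model, chars_per_token=4):
    """Rough cost: estimate tokens from character counts, then price them."""
    total_tokens = sum(len(doc) // chars_per_token for doc in documents)
    return total_tokens * PRICE_PER_1M_INPUT[model] / 1_000_000

corpus = ["some document text " * 1000 for _ in range(500)]
# Even a 10-30% tokenizer error only shifts this estimate proportionally,
# which is fine for a go/no-go budgeting decision.
print(f"~${estimate_cost(corpus, 'model-a'):,.2f}")
```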

  • openai-messages-token-helper

    A utility library for dealing with token counting for messages sent to an LLM (currently OpenAI models only)

    I grappled with that issue for https://github.com/pamelafox/openai-messages-token-helper as I wanted to be able to use it for a quick token check with SLMs as well, so I ended up adding a parameter "fallback_to_default" for developers to indicate they're okay with assuming gpt-35 BPE encoding.
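    That opt-in fallback pattern might look roughly like this (hypothetical names and mappings; the real library's API may differ):

```python
# Sketch of a fallback-controlled encoding lookup, modeled loosely on the
# fallback_to_default idea described above. Mappings are illustrative only.
MODEL_TO_ENCODING = {
    "gpt-35-turbo": "cl100k_base",
    "gpt-4": "cl100k_base",
}
DEFAULT_ENCODING = "cl100k_base"  # the "gpt-35 BPE" default described above

def encoding_for(model, fallback_to_default=False):
    """Return the model's encoding, or the default if the caller opts in."""
    encoding = MODEL_TO_ENCODING.get(model)
    if encoding is None:
        if not fallback_to_default:
            raise KeyError(
                f"Unknown model {model!r}; "
                "set fallback_to_default=True to assume the default BPE"
            )
        encoding = DEFAULT_ENCODING
    return encoding
```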

  • anthropic-tokenizer

    Approximation of the Claude 3 tokenizer by inspecting generation stream

  • llm_utils

    Utilities for Llama.cpp, Openai, Anthropic, Mistral-rs.

    > tiktoken.encoding_for_model(model)

    Calling this where model == 'gpt-4o' will encode with o200k_base, no?

    But yes, I do agree with you. I had a hard time implementing non-tiktoken tokenizers for my project, and ended up manually adding tokenizer.json files to my repo. [1] The other option is downloading from HF, but the official repos where each model's tokenizer.json lives require agreeing to their terms for access, which means an HF key plus a terms agreement; not a good experience for a consumer of the package.

    > Message frame tokens?

    Do you mean the chat template tokens? Oh, that's another good point. Yes, it counts OpenAI prompt tokens. I solved this by implementing a Jinja templating engine to build the full prompt. [2] Granted, both llama.cpp and mistral-rs do this on the backend, so it's purely for counting tokens. I guess it would also make sense to add a function to convert token counts to dollars.

    [1] https://github.com/ShelbyJenkins/llm_utils/tree/main/src/mod...
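    A tokens-to-dollars helper like the one suggested above could be as simple as this sketch (the model name and rates are placeholders, not real prices):

```python
# Hypothetical tokens-to-dollars helper; rates are placeholders.
RATES = {  # USD per 1M tokens: (prompt, completion)
    "model-x": (3.00, 15.00),
}

def tokens_to_dollars(model, prompt_tokens, completion_tokens=0):
    """Convert token counts to an estimated dollar cost for one model."""
    prompt_rate, completion_rate = RATES[model]
    return (prompt_tokens * prompt_rate
            + completion_tokens * completion_rate) / 1_000_000

print(tokens_to_dollars("model-x", 10_000, 2_000))
```

    Keeping prompt and completion rates separate matters, since most providers price output tokens several times higher than input tokens.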

  • anthropic-tokenizer-typescript

    Anthropic actually has a Claude 3 tokenizer tucked away in one of their repos: https://github.com/anthropics/anthropic-tokenizer-typescript

    At this moment, Tokencost uses the OpenAI tokenizer as a default tokenizer, but this would be a welcome PR!

  • litellm

    Call all LLM APIs using the OpenAI format. Use Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate (100+ LLMs)

    Very cool! Is this cost directory you're using the best source for historical cost per 1M tokens? https://github.com/BerriAI/litellm/blob/main/model_prices_an...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Gloe v0.6.0 released Your Code as a Flow

    1 project | news.ycombinator.com | 26 Jun 2024
  • A Better Way to Code: Documentation Driven Development

    3 projects | news.ycombinator.com | 26 Jun 2024
  • Nuitka Is a Python Compiler

    1 project | news.ycombinator.com | 26 Jun 2024
  • Show HN: R2R V2 – A open source RAG engine with prod features

    2 projects | news.ycombinator.com | 26 Jun 2024
  • Show HN: TF-GPT – a TensorFlow implementation of a decoder-only transformer

    1 project | news.ycombinator.com | 26 Jun 2024

Did you know that Python is
the most popular programming language
based on number of mentions?