I’m working on GPT Pilot, an AI dev tool that relies heavily on LLMs, so I was interested in context recall: how well can an LLM find the information it needs within its context? The problem becomes more apparent at larger context sizes, and recall turns out to be less than ideal.
This research follows the “haystack test” Greg Kamradt published when the updated GPT-4 came out (twitter, code). That test provided useful insight into (the lack of) context recall performance. But it was performed on a very small sample (limiting its statistical significance) and was initially limited to GPT-4 (he has since published an updated version that also tests Claude 2.1). Moreover, the test data consists of essays that were likely already used in pretraining the LLMs, and the results were evaluated by GPT-4 itself, potentially introducing confounding variables into the mix.
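The haystack test described above can be sketched roughly as follows. This is a hypothetical illustration, not Kamradt’s actual implementation: the function names, the filler-text source, and the exact-match scorer are all assumptions (his version evaluated answers with GPT-4 rather than substring matching).

```python
# Sketch of a "needle in a haystack" test: bury a known fact (the needle)
# at a chosen depth inside filler text (the haystack) of a target length,
# then ask the model to retrieve it. All names here are illustrative.

def build_haystack_prompt(needle: str, filler_sentences: list[str],
                          target_words: int, depth_percent: float) -> str:
    """Concatenate filler sentences up to ~target_words words, inserting
    the needle roughly depth_percent of the way through the context."""
    parts, words = [], 0
    for sentence in filler_sentences:
        parts.append(sentence)
        words += len(sentence.split())
        if words >= target_words:
            break
    insert_at = int(len(parts) * depth_percent)
    parts.insert(insert_at, needle)
    context = " ".join(parts)
    question = "What is the magic number mentioned in the text?"
    return f"{context}\n\n{question}"

def score(answer: str, expected: str) -> bool:
    """Binary recall check. Kamradt's test graded answers with GPT-4;
    this sketch substitutes simple substring matching instead."""
    return expected in answer
```

A full run would sweep `target_words` and `depth_percent` over a grid, send each prompt to the model under test, and plot recall accuracy by context length and needle depth.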