Show HN: Skyvern – open-source browser automation tool

skyvern

Mentions: 8 | Stars: 5,122 | Activity: 9.6 | Language: Python

Automate browser-based workflows with LLMs and Computer Vision

https://github.com/Skyvern-AI/skyvern/blob/d0935755963b017ed...
We also report the cost of each step in the visualizer: click any task > Steps, and there's a column dedicated to how much each step cost to run.
https://github.com/Skyvern-AI/skyvern/issues/70
2. We have a roadmap item to "cache" or "memorize" specific tasks, so you pay the cost once and then just rerun them over and over again. We're going to get to it soon!
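The caching idea in item 2 can be pictured as memoizing the LLM-generated plan for a task, so the model is billed only on the first run and later runs replay the stored steps. The following is a minimal sketch under stated assumptions, not Skyvern's design. The planner function, the flat per-plan cost, and the (URL, goal) cache key are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TaskPlanCache:
    """Hypothetical cache: pay the LLM planning cost once per (url, goal), then replay."""

    planner: Callable[[str, str], list[str]]  # hypothetical LLM call returning action steps
    cost_per_plan: float  # assumed flat dollar cost of one planning call
    total_cost: float = 0.0
    _plans: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def get_plan(self, url: str, goal: str) -> list[str]:
        key = (url, goal)
        if key not in self._plans:
            # Cache miss: call the expensive planner once and record what it cost.
            self._plans[key] = self.planner(url, goal)
            self.total_cost += self.cost_per_plan
        # Cache hit: replay the stored steps with no further LLM spend.
        return self._plans[key]


def fake_planner(url: str, goal: str) -> list[str]:
    # Stand-in for a real LLM call; returns a fixed sequence of actions.
    return [f"open {url}", "fill form", f"submit ({goal})"]


cache = TaskPlanCache(planner=fake_planner, cost_per_plan=0.05)
for _ in range(3):
    steps = cache.get_plan("https://example.com/quote", "get an insurance quote")
print(steps)
print(f"total cost after 3 runs: ${cache.total_cost:.2f}")  # $0.05, not $0.15
```

The obvious trade-off is staleness. If a site's layout changes, a replayed plan can fail, so a real cache would need some way to detect that and re-plan.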

LaVague

Mentions: 4 | Stars: 4,563 | Activity: 9.7 | Language: Python

Large Action Model framework to develop AI Web Agents

We're quite different from LaVague. LaVague passes the entire HTML DOM to the LLM to help it generate XPaths and valid Selenium code. (https://github.com/lavague-ai/LaVague/blob/main/src/lavague/...)
Try this at your own risk: on any reasonably sized website, this results in extraordinarily high input token costs.
We spend quite a bit of our time building a layer between the HTML and the LLM call. That layer distills the important information down to the actions the LLM can take, which gives a better trade-off between cost and output. We're still not at 100% coverage.
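The next example illustrates the difference in input size between sending the whole DOM and sending a distilled view. It is a minimal sketch that uses only Python's standard-library `html.parser` and keeps just the interactable elements (links, buttons, inputs, selects, textareas), a few of their attributes, and their visible text. The tag set, the kept attributes, and the use of character count as a rough stand-in for token count are assumptions for illustration. This is not Skyvern's actual distillation layer.

```python
from html.parser import HTMLParser
from typing import Optional

# Assumed set of tags an agent can act on; a real system would also need to
# consider ARIA roles, click handlers, visibility, iframes, and so on.
INTERACTABLE_TAGS = {"a", "button", "input", "select", "textarea"}
# Assumed subset of attributes worth showing to the LLM.
KEPT_ATTRS = ("id", "name", "type", "href", "placeholder", "aria-label")


class InteractableExtractor(HTMLParser):
    """Collects interactable elements and their visible text, dropping everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list = []
        self._open: Optional[dict] = None  # element whose inner text is being collected

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTABLE_TAGS:
            return
        kept = {k: v for k, v in attrs if k in KEPT_ATTRS and v}
        element = {"id": len(self.elements), "tag": tag, "attrs": kept, "text": ""}
        self.elements.append(element)
        if tag != "input":  # <input> is a void element with no inner text
            self._open = element

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            self._open = None

    def handle_data(self, data):
        text = data.strip()
        if self._open is not None and text:
            self._open["text"] = (self._open["text"] + " " + text).strip()


def distill(html: str) -> str:
    """Render each interactable element as one numbered line for the prompt."""
    parser = InteractableExtractor()
    parser.feed(html)
    parser.close()
    lines = []
    for el in parser.elements:
        attrs = " ".join(f'{k}="{v}"' for k, v in el["attrs"].items())
        head = el["tag"] + (" " + attrs if attrs else "")
        lines.append(f"[{el['id']}] <{head}> {el['text']}".rstrip())
    return "\n".join(lines)


page = """
<html><head><style>body { font-family: sans-serif; }</style></head>
<body>
  <div class="hero"><h1>Get a quote</h1><p>Lots of marketing copy here...</p></div>
  <form>
    <input type="text" name="zip" placeholder="ZIP code">
    <select name="coverage"><option>Basic</option><option>Full</option></select>
    <button type="submit">Get quote</button>
  </form>
  <a href="/help">Help</a>
</body></html>
"""

distilled = distill(page)
print(distilled)
print(f"{len(page)} chars of raw HTML -> {len(distilled)} chars sent to the LLM")
```

Numbering each element also gives the model a stable handle to refer back to, rather than an XPath it has to generate itself.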

self-operating-computer

Mentions: 14 | Stars: 7,815 | Activity: 9.8 | Language: Python

A framework to enable multimodal models to operate a computer.

This is quite different from https://github.com/OthersideAI/self-operating-computer
Self-operating-computer uses pixel mapping to control your computer. This is a very good approach, but it's extremely unreliable: GPT-4V frequently hallucinates pixel coordinates, causing it to miss interactions or get stuck in failure loops.
> The approach by AI Jason
AI Jason is using image-only methods to interact with the browser. This is a great first step, but the approach tends to be rife with hallucinations and errors. We do DOM parsing in addition to image analysis to help GPT-4V correlate information in the image with the interactable elements in the DOM. This dramatically boosts its ability to perform the same task over and over again reliably (which proved impossible with the image-only approach).
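The following sketch shows why pairing DOM parsing with the screenshot helps. Instead of asking the model for raw pixel coordinates, the harness assigns each interactable element an ID and a bounding box taken from the browser. The model answers with an ID, and the harness computes the click point itself. An unknown ID can then be rejected rather than clicked. The sketch assumes the bounding boxes are already available and that the model replies in a hypothetical `CLICK <id>` format. It illustrates the general idea and is not Skyvern's implementation.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    element_id: int
    description: str
    # Bounding box in screenshot pixels (assumed to come from the browser/DOM).
    x: float
    y: float
    width: float
    height: float

    def center(self) -> tuple:
        return (self.x + self.width / 2, self.y + self.height / 2)


def resolve_action(reply: str, elements: dict) -> tuple:
    """Turn a hypothetical 'CLICK <id>' model reply into a click point.

    Unknown IDs raise instead of clicking a hallucinated location.
    """
    match = re.fullmatch(r"\s*CLICK\s+(\d+)\s*", reply)
    if match is None:
        raise ValueError(f"unparseable reply: {reply!r}")
    element_id = int(match.group(1))
    if element_id not in elements:
        raise KeyError(f"model referenced unknown element {element_id}")
    return elements[element_id].center()


elements = {
    0: Element(0, 'input name="zip"', x=100, y=200, width=180, height=30),
    1: Element(1, 'button "Get quote"', x=300, y=200, width=90, height=30),
}
print(resolve_action("CLICK 1", elements))  # (345.0, 215.0)
```

The model only has to pick from a closed set of IDs. The geometry comes from the page, not from the model's guess.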

vimGPT

Mentions: 7 | Stars: 2,504 | Activity: 7.4 | Language: Python

Browse the web with GPT-4V and Vimium

OpenAdapt

Mentions: 28 | Stars: 619 | Activity: 9.3 | Language: Python

AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

Congratulations on shipping!
Check out https://github.com/OpenAdaptAI/OpenAdapt for an open source (MIT license) alternative that also works on desktop (including Citrix!)

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Llama 3-V: Matching GPT4-V with a 100x smaller model and 500 dollars

4 projects | news.ycombinator.com | 28 May 2024
PaliGemma: Open-Source Multimodal Model by Google

5 projects | news.ycombinator.com | 15 May 2024
Show HN: Tarsier – vision for text-only LLM web agents that beats GPT-4o

8 projects | news.ycombinator.com | 15 May 2024
Rabbit R1 can be run on a Android device

1 project | news.ycombinator.com | 5 May 2024
OpenAdapt: AI-First Process Automation with Large Multimodal Models

1 project | news.ycombinator.com | 5 May 2024

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.
Tags: process-automation, Python, Transformers
Post date: 14 Mar 2024