Show HN: Skyvern – open-source browser automation tool

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • skyvern

    Automate browser-based workflows with LLMs and Computer Vision

  • https://github.com/Skyvern-AI/skyvern/blob/d0935755963b017ed...

    We also show the cost of each step in the visualizer. Click on any task > Steps, and there's a column dedicated to how much each step cost to run.

    https://github.com/Skyvern-AI/skyvern/issues/70

    2. We have a roadmap item to "cache" or "memorize" specific tasks, so you pay the cost once and can then run the task over and over again. We're going to get to it soon!
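
As a rough illustration of the "pay once, re-run many times" idea in the comment above, here is a minimal Python sketch. It is not Skyvern's implementation (the feature is described only as a roadmap item); `ActionCache`, `fake_planner`, and the shape of the action list are all hypothetical names chosen for this example.

```python
import hashlib
import json


class ActionCache:
    """Hypothetical cache: maps a (url, goal) pair to a previously planned action list."""

    def __init__(self):
        self._store = {}
        self.planner_calls = 0  # counts how often the expensive planner actually ran

    @staticmethod
    def _key(url, goal):
        # Stable key derived from the task description.
        raw = json.dumps({"url": url, "goal": goal}, sort_keys=True)
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def get_or_plan(self, url, goal, planner):
        key = self._key(url, goal)
        if key not in self._store:
            # Cache miss: pay for planning once and remember the result.
            self.planner_calls += 1
            self._store[key] = planner(url, goal)
        return self._store[key]


def fake_planner(url, goal):
    # Stand-in for an expensive LLM call (assumption for this sketch).
    return [{"action": "click", "target": "submit"}]


cache = ActionCache()
first = cache.get_or_plan("https://example.com/form", "submit the form", fake_planner)
second = cache.get_or_plan("https://example.com/form", "submit the form", fake_planner)
```

A real system would also need to detect when the page has changed enough that a cached plan no longer applies; this sketch ignores invalidation entirely.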

  • LaVague

    Large Action Model framework to develop AI Web Agents

  • We're quite different from LaVague. LaVague passes the entire HTML DOM to the LLM to help it generate XPaths and valid Selenium code. (https://github.com/lavague-ai/LaVague/blob/main/src/lavague/...)

    Try this at your own risk: any reasonably sized website would result in extraordinarily high input token costs.

    We spend quite a bit of our time building a layer between the HTML and the LLM call that distills the important pieces of information down to actions the LLM can take, which better balances cost against output. We're still not at 100% coverage.
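
To make the "layer between the HTML and the LLM call" more concrete, the following Python sketch reduces a page to a short list of interactable elements using only the standard library. It is an assumption-laden illustration, not Skyvern's actual code: the tag list, the kept attributes, and the names `InteractableExtractor` and `distill` are all invented for this example.

```python
from html.parser import HTMLParser

# Assumed set of tags worth showing to the model (illustrative, not exhaustive).
INTERACTABLE_TAGS = {"a", "button", "input", "select", "textarea"}
# Assumed attributes that help the model tell elements apart.
KEPT_ATTRS = ("type", "name", "placeholder", "href", "aria-label")


class InteractableExtractor(HTMLParser):
    """Collects a compact description of each interactable element."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTABLE_TAGS:
            return
        attr_map = dict(attrs)
        if attr_map.get("type") == "hidden":
            return  # hidden inputs are not something a user can act on
        kept = {k: attr_map[k] for k in KEPT_ATTRS if attr_map.get(k)}
        # Each element gets a small integer id the model can refer to.
        self.elements.append({"id": len(self.elements), "tag": tag, **kept})


def distill(html):
    """Return only the interactable elements of an HTML document."""
    parser = InteractableExtractor()
    parser.feed(html)
    return parser.elements


page = (
    "<div><p>Long marketing copy the model does not need.</p>"
    '<input type="hidden" name="csrf" value="x">'
    '<input type="text" name="email" placeholder="Email">'
    '<button type="submit">Go</button>'
    '<a href="/help">Help</a></div>'
)
for element in distill(page):
    print(element)
```

The point of the sketch is the size difference: the prompt carries a few short element descriptions instead of the full DOM, which is where the input-token savings would come from.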

  • self-operating-computer

    A framework to enable multimodal models to operate a computer.

  • This is quite different from https://github.com/OthersideAI/self-operating-computer

    Self-operating-computer uses pixel mapping to control your computer. This is a very good idea in principle, but it's extremely unreliable in practice: GPT-4V frequently hallucinates pixel outputs, causing it to miss interactions or enter failure loops.

    > The approach by AI Jason

    AI Jason is using image-only methods to interact with the browser. This is a great first step, but this approach tends to be rife with hallucinations and errors. We do DOM parsing in addition to image analysis to help GPT-4V correlate information in the image with the interactable elements in the DOM. This dramatically boosts its ability to perform the same task over and over again reliably (which proved impossible with the image-only approach).
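
One way to picture how DOM information can guard against hallucinated pixel coordinates is sketched below in Python. This is only an interpretation of the comment above, not the project's method: it assumes each interactable element comes with a bounding box, and the names `Box` and `resolve_click` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Bounding box of one interactable DOM element, in screenshot pixels."""
    element_id: int
    x: float
    y: float
    width: float
    height: float

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

    def area(self) -> float:
        return self.width * self.height


def resolve_click(boxes: List[Box], px: float, py: float) -> Optional[int]:
    """Map a model-proposed pixel to a DOM element id, or None if it hits nothing."""
    hits = [b for b in boxes if b.contains(px, py)]
    if not hits:
        return None  # the proposed pixel matches no known element: reject it
    # Prefer the smallest enclosing box, i.e. the most specific element.
    return min(hits, key=Box.area).element_id


boxes = [
    Box(element_id=0, x=0, y=0, width=800, height=600),    # page-wide container
    Box(element_id=1, x=100, y=100, width=80, height=30),  # a button
]
print(resolve_click(boxes, 120, 110))
print(resolve_click(boxes, 900, 10))
```

Under this scheme a coordinate that lands on no element is rejected instead of clicked, and the action is recorded against an element id that can be replayed on later runs.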

  • vimGPT

    Browse the web with GPT-4V and Vimium

  • OpenAdapt

    AI-First Process Automation with Large ([Language (LLMs) / Action (LAMs) / Multimodal (LMMs)] / Visual Language (VLMs)) Models

  • Congratulations on shipping!

    Check out https://github.com/OpenAdaptAI/OpenAdapt for an open source (MIT license) alternative that also works on desktop (including Citrix!)

NOTE: The number of mentions on this list counts mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • Llama 3-V: Matching GPT4-V with a 100x smaller model and 500 dollars

    4 projects | news.ycombinator.com | 28 May 2024
  • PaliGemma: Open-Source Multimodal Model by Google

    5 projects | news.ycombinator.com | 15 May 2024
  • Show HN: Tarsier – vision for text-only LLM web agents that beats GPT-4o

    8 projects | news.ycombinator.com | 15 May 2024
  • Rabbit R1 can be run on a Android device

    1 project | news.ycombinator.com | 5 May 2024
  • OpenAdapt: AI-First Process Automation with Large Multimodal Models

    1 project | news.ycombinator.com | 5 May 2024