Top 8 Python multi-modality Projects

LLaVA

Mentions: 21 | Stars: 17,102 | Activity: 9.3 | Language: Python

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Project mention: PaliGemma: Open-Source Multimodal Model by Google | news.ycombinator.com | 2024-05-15

Here's a tutorial https://wandb.ai/byyoung3/ml-news/reports/How-to-Fine-Tune-L...
There's not really a super easy to use software solution yet, but a few different ones have cropped up. Right now you'll have to read papers to get the training recipes.
- https://github.com/haotian-liu/LLaVA/blob/main/scripts/finet...

clip-as-service

Mentions: 15 | Stars: 12,232 | Activity: 5.2 | Language: Python

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Project mention: Search for anything ==> Immich fails to download textual.onnx | /r/immich | 2023-09-15

deep-daze

Mentions: 49 | Stars: 4,379 | Activity: 0.0 | Language: Python

Simple command-line tool for text-to-image generation using OpenAI's CLIP and Siren (an implicit neural representation network). The technique was originally created by https://twitter.com/advadnoun

Otter

Mentions: 4 | Stars: 3,473 | Activity: 9.1 | Language: Python

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Project mention: OpenAI vs Google, Detect ChatGPT Content with 99% accuracy, Navigating AI compute costs | /r/ChatGPT | 2023-06-15

👀 Video-LLaMA - Empower large language models with video and audio understanding capability.
🦦 Otter - Multi-modal model with improved instruction-following and in-context learning ability.
🔗 Linkly.AI - AI-powered lead analytics and management platform that helps you track, analyze, and streamline your leads in one place.
🎬 Jet Cut Ready - AI plugin for Adobe Premiere Pro that automatically removes silent parts in videos.
💬 HeyGen's ChatGPT Plugin - Convert text into high-quality videos using AI text and video generation.

swarms

Mentions: 1 | Stars: 739 | Activity: 10.0 | Language: Python

Orchestrate Swarms of Agents From Any Framework (OpenAI, LangChain, etc.) for Business Operation Automation. Join our Community: https://discord.gg/DbjBMJTSWD

Project mention: Swarms – Automating all digital activities with millions of autonomous AI Agents | news.ycombinator.com | 2023-07-10

Multi-Modality-Arena

Mentions: 1 | Stars: 387 | Activity: 7.7 | Language: Python

Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!

Project mention: [R] Tiny LVLM-eHub: Early Multimodal Experiments with Bard - OpenGVLab, Shanghai AI Laboratory 2023 - Encourages innovative strategies aimed at advancing multimodal techniques! | /r/MachineLearning | 2023-08-13

GitHub: https://github.com/OpenGVLab/Multi-Modality-Arena

Sophia

Mentions: 3 | Stars: 361 | Activity: 4.5 | Language: Python

Effortless plug-and-play optimizer to cut model training costs by 50%. New optimizer that is 2x faster than Adam on LLMs. (by kyegomez)

Project mention: [D] Potential scammer on github stealing work of other ML researchers? | /r/MachineLearning | 2023-08-17

multi_token

Mentions: 1 | Stars: 150 | Activity: 8.5 | Language: Python

Embed arbitrary modalities (images, audio, documents, etc.) into large language models.

Project mention: Embed arbitrary modalities (images, audio, documents, etc.) into LLMs | news.ycombinator.com | 2023-12-18

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python multi-modality related posts

[R] Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

2 projects | /r/MachineLearning | 26 May 2023

The Sophia optimizer, a faster alternative to AdamW

2 projects | news.ycombinator.com | 24 May 2023