Top 23 PDF Open-Source Projects
-
quivr
Your GenAI Second Brain 🧠 A personal productivity assistant (RAG) ⚡️🤖 Chat with your docs (PDF, CSV, ...) & apps using Langchain, GPT 3.5 / 4 turbo, Private, Anthropic, VertexAI, Ollama, LLMs, Groq that you can share with users! Local & Private alternative to OpenAI GPTs & ChatGPT powered by retrieval-augmented generation.
-
Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
-
paperless-ngx
A community-supported supercharged version of paperless: scan, index and archive all your physical documents
-
best-resume-ever
Build multiple beautiful resumes quickly and easily, and create your best CV ever! Made with Vue and LESS.
-
koodo-reader
A modern ebook manager and reader with sync and backup capacities for Windows, macOS, Linux and Web
-
koreader
An ebook reader application supporting PDF, DjVu, EPUB, FB2 and many more formats, running on Cervantes, Kindle, Kobo, PocketBook and Android devices
-
mit-deep-learning-book-pdf
MIT Deep Learning Book in PDF format (complete and parts) by Ian Goodfellow, Yoshua Bengio and Aaron Courville
-
milewski-ctfp-pdf
Bartosz Milewski's 'Category Theory for Programmers' unofficial PDF and LaTeX source
-
QuestPDF
QuestPDF is a modern open-source .NET library for PDF document generation. It offers a comprehensive layout engine powered by a concise and discoverable C# Fluent API. Easily generate PDF reports, invoices, exports, etc.
-
h2ogpt
Private chat with a local GPT over documents, images, video, etc. 100% private, Apache 2.0. Supports oLLaMa, Mixtral, llama.cpp, and more. Demo: https://gpt.h2o.ai/ https://codellama.h2o.ai/
-
xournalpp
Xournal++ is handwriting note-taking software with PDF annotation support. Written in C++ with GTK3, supporting Linux (e.g. Ubuntu, Debian, Arch, SUSE), macOS and Windows 10. Supports pen input from devices such as Wacom Tablets.
-
PyPDF2
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Project mention: privateGPT VS quivr - a user suggested alternative | libhunt.com/r/privateGPT | 2024-01-12
Project mention: Stirling PDF: Self-hosted, web-based PDF manipulation tool | news.ycombinator.com | 2024-05-02
Well it was developed initially by ChatGPT. First file I open I see repeated comments.
https://github.com/Stirling-Tools/Stirling-PDF/blob/7f577a60...
Project mention: How can I turn awesome-cv coverletter.tex and cv.tex into a single PDF? | /r/LaTeX | 2023-10-02
I am in the process of rewriting my CV using the [awesome-cv](https://github.com/posquit0/Awesome-CV) template and am pretty happy with how things are turning out.
I steered a friend towards Paperless (and away from an LLM solution) as a way of searching/accessing GBs of architectural PDFs recently - so far, it’s apparently working well for them.
https://github.com/paperless-ngx/paperless-ngx
Kobos[1] and Pocketbooks[2] are a lot more open than Kindles. AFAIK you can transfer .epub files into both devices and these epubs are perfectly readable via the stock OS. If for some reason you find the stock proprietary OS lacking, you can install an open source one like KOreader [3] or Plato[4]
Of course, you want a good way of organizing EPUB, PDF, and MOBI files, and, as has already been mentioned, Calibre[5] is a great option.
[1] https://www.kobo.com/
[2] https://pocketbookstore.com/en-ca
[3] https://github.com/koreader/koreader
[4] https://github.com/baskerville/plato
[5] https://calibre-ebook.com/
I am playing around with this GitHub project, which takes a user question as input and immediately runs a vector search on it to find relevant stored information before delivering an answer.
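The retrieval step described in that mention can be sketched in a few lines of Python. This is not the project's code. A real system would typically use learned embeddings and a vector store; this sketch swaps in a toy bag-of-words vector and cosine similarity from the standard library purely for illustration. The names `embed`, `cosine`, `retrieve`, and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: a bag-of-words term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical document chunks, e.g. text extracted from a PDF.
chunks = [
    "Invoices are due within 30 days of receipt.",
    "The warranty covers parts and labour for two years.",
    "Refunds are processed within five business days.",
]

top = retrieve("How long does the warranty last?", chunks, k=1)
print(top[0])  # The warranty covers parts and labour for two years.
```

In a full pipeline, the retrieved chunks would be passed to a language model together with the question so that the answer is grounded in the stored text.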
Using react-pdf, we crafted a solution that allowed users to manipulate their reports with an impressive degree of flexibility. But, as data grew (imagine trying to cram an entire financial year's worth of invoices, up to 22,000 rows, into one PDF), our solution began to falter, especially on older PCs with limited resources.
I’m curious, have you tried SumatraPDF (uses muPDF under the hood)?
https://github.com/sumatrapdfreader/sumatrapdf
Project mention: TextSnatcher: Copy text from images, for the Linux Desktop | news.ycombinator.com | 2024-03-14
Try https://github.com/ocrmypdf/OCRmyPDF - it uses Tesseract behind the scenes and it is absolutely brilliant.
Project mention: reflect-cpp - Now with compile time extraction of field names from structs and enums using C++-20. | /r/cpp | 2023-12-09
Category Theory for Programmers by Bartosz Milewski (https://github.com/hmemcpy/milewski-ctfp-pdf/releases)
What is QuestPDF? QuestPDF is an open-source .NET library for PDF document generation. It uses a fluent API approach to compose together many simple elements to create complex documents.
Project mention: Ask HN: How do I train a custom LLM/ChatGPT on my own documents in Dec 2023? | news.ycombinator.com | 2023-12-24
As others have said you want RAG.
The most feature complete implementation I've seen is h2ogpt[0] (not affiliated).
The code is kind of a mess (most of the logic is in an ~8000 line python file) but it supports ingestion of everything from YouTube videos to docx, pdf, etc - either offline or from the web interface. It uses langchain and a ton of additional open source libraries under the hood. It can run directly on Linux, via docker, or with one-click installers for Mac and Windows.
It has various model hosting implementations built in - transformers, exllama, llama.cpp as well as support for model serving frameworks like vLLM, HF TGI, etc or just OpenAI.
You can also define your preferred embedding model along with various other parameters, but I've found the out-of-the-box defaults to be pretty sane and usable.
[0] - https://github.com/h2oai/h2ogpt
Project mention: Intro to DOMPDF - lightest and simplest PHP library to generate PDF documents | dev.to | 2024-04-05
Generating PDF documents out of your app's HTML output is a very common requirement and there are several open source libraries to accomplish this. I came across this need for my project recently and I evaluated many popular ones such as TCPDF, mpdf, FPDF, etc. But the one that truly stood up to my evaluation in terms of efficiency (minimal footprint) and ease of implementation was DOMPDF.
Project mention: Rnote – An open-source vector-based drawing app | news.ycombinator.com | 2024-03-11
I highly recommend Rnote to anyone on Linux that misses the "hodgepodge" notetaking of apps like OneNote. It works like a dream on touchscreens and drawing tablets, with a surprising amount of configuration under the hood.
Also worth noting is Xournal, an older but similar project: https://xournalpp.github.io/
Read through the comments and was surprised no one mentioned libvips - https://github.com/libvips/libvips. At my current small company we were trying to allow image uploads and started with imagemagick but certain images took too long to process and we were looking for faster alternatives. It's a great tool with minimum overhead. For video thumbnails, we use ffmpeg which is really heavy. We off-load video thumbnail generation to a queue. We've had great luck with these tools.
Project mention: 33 React Libraries Every React Developer Should Have In Their Arsenal | dev.to | 2024-01-07
23. react-pdf
PDF related posts
-
Sioyek is a PDF viewer with a focus on textbooks and research papers
-
Stirling PDF: Self-hosted, web-based PDF manipulation tool
-
PDF Generation using QuestPDF in ASP.NET Core — Part 1
-
A small lathe built in a Japanese prison camp
-
DEMO - Voice to PDF - Complete PDF documents with voice commands using the Claude 3 Opus API
-
Ask HN: Best Open E-Reader?
-
MuPDF WASM Viewer Demo
Index
What are some of the best open-source PDF projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | quivr | 32,917 |
2 | Stirling-PDF | 24,441 |
3 | Awesome-CV | 21,887 |
4 | paperless-ngx | 17,064 |
5 | awesome-english-ebooks | 16,697 |
6 | best-resume-ever | 16,236 |
7 | Etherpad | 15,898 |
8 | koodo-reader | 15,767 |
9 | koreader | 15,319 |
10 | gpt4-pdf-chatbot-langchain | 14,594 |
11 | react-pdf | 14,185 |
12 | sumatrapdf | 12,642 |
13 | mit-deep-learning-book-pdf | 12,360 |
14 | OCRmyPDF | 12,134 |
15 | milewski-ctfp-pdf | 10,768 |
16 | QuestPDF | 10,662 |
17 | h2ogpt | 10,506 |
18 | Dompdf | 10,285 |
19 | xournalpp | 10,313 |
20 | Zettlr | 9,660 |
21 | libvips | 9,125 |
22 | react-pdf | 8,616 |
23 | PyPDF2 | 7,466 |