Parsing URLs in Python

can_ada

2 123 6.9 C++

Python bindings for Ada, a fast and spec-compliant URL parser.

I apologize for the misjudgment. I just followed the link to can_ada and saw really minimal tests, e.g. https://github.com/TkTech/can_ada/blob/main/tests/test_parsi...
I didn't understand that can_ada is not where the parser is developed.

furl

1 2,574 0.0 Python

🌐 URL parsing and manipulation made easy.

yarl

2 1,246 9.4 Python

Yet another URL library

universal_pathlib

1 186 7.6 Python

pathlib api extended to use fsspec backends

You might be interested in https://github.com/fsspec/universal_pathlib

ada

6 1,244 9.2 C++

WHATWG-compliant and fast URL parser written in modern C++

...
can_ada is just the python bindings, largely generated via pybind11.
The actual project is at https://github.com/ada-url/ada

w3lib

1 384 6.4 Python

Python library of web-related functions

A great initiative!
We need a better URL parser in Scrapy, for similar reasons. Speed and WHATWG standard compliance (i.e. doing the same as web browsers) are the main things.
It's possible to get closer to WHATWG behavior by using urllib and some hacks. This is what https://github.com/scrapy/w3lib does, and Scrapy currently uses it. But it's still not quite compliant.
Also, surprisingly, on some crawls URL parsing can take about as much CPU time as HTML parsing.
Ada / can_ada look very promising!
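
To make the "urllib and some hacks" idea above concrete, here is a minimal sketch using only the standard library. It is not w3lib's actual implementation. The names `normalize_url`, `remove_dot_segments` and `DEFAULT_PORTS` are hypothetical, and the sketch covers only a few of the normalizations a browser applies (lowercase scheme and host, dropped default port, resolved dot segments). It ignores userinfo, IDNA and percent-encoding, so it is still far from WHATWG compliance.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for a few common schemes (illustrative subset, not exhaustive).
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443, "ftp": 21}


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." path segments, roughly the way browsers do."""
    segments = path.split("/")
    out = []
    for i, seg in enumerate(segments):
        is_last = i == len(segments) - 1
        if seg == ".":
            if is_last:
                out.append("")  # keep the trailing slash
        elif seg == "..":
            if len(out) > 1:
                out.pop()  # never pop the leading empty segment (the root)
            if is_last:
                out.append("")
        else:
            out.append(seg)
    return "/".join(out)


def normalize_url(url: str) -> str:
    """Hypothetical helper: browser-like normalization on top of urllib."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urllib returns the hostname lowercased
    if ":" in host:
        host = f"[{host}]"  # restore brackets around IPv6 literals
    netloc = host
    # Keep the port only when it differs from the scheme's default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc = f"{host}:{parts.port}"
    path = remove_dot_segments(parts.path) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(normalize_url("HTTP://EXAMPLE.com:80/a/./b/../c?x=1#frag"))
```

Every extra rule of this kind is one more hand-written hack layered on urllib, which is part of why a dedicated spec-compliant parser is attractive.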

url

1 4 8.4 Python

Python bindings to the Rust url crate (by crate-py)

Nice.
I'll also throw in that I recently wrote bindings to Mozilla's servo URL library.
Those live at https://github.com/crate-py/url
They're not complete yet (meaning only the parsing bits are exposed, not URL modification) but I too was frustrated with the state of URL parsing.
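
To illustrate the distinction the comment draws between parsing a URL and modifying it, here is a small sketch using Python's standard `urllib.parse`. It does not use the crate-py bindings, whose API is not shown in this thread. The helper names `read_components` and `set_query_param` are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def read_components(url: str) -> dict:
    """Parsing only: split a URL into parts without changing anything."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "path": parts.path,
        "query": parse_qsl(parts.query),
    }


def set_query_param(url: str, key: str, value: str) -> str:
    """Modification: rebuild the URL with one query parameter replaced."""
    parts = urlsplit(url)
    pairs = [(k, value if k == key else v) for k, v in parse_qsl(parts.query)]
    if key not in dict(pairs):
        pairs.append((key, value))  # add the parameter if it was missing
    # SplitResult is a namedtuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(query=urlencode(pairs)))


url = "https://example.com/search?q=url+parsing&page=1"
print(read_components(url))
print(set_query_param(url, "page", "2"))
```

A bindings library that exposes only the first kind of operation can still cover validation and component extraction. Rewriting URLs needs the second.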

rust-url

2 1,246 7.2 Rust

URL parser for Rust

IMO that URL crate is not especially high quality. I barely work with URLs and I quickly found an embarrassingly trivial bug:
https://github.com/servo/rust-url/issues/864#issuecomment-16...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Tags: Url, Python, URL Manipulation, HacktoberFest, url-parsing
Post date: 16 Mar 2024
