Parsing URLs in Python

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • can_ada

    Python bindings for Ada, a fast and spec-compliant URL parser.

  • I apologize for the misjudgment. I just followed the link to can_ada and saw really minimal tests, e.g. https://github.com/TkTech/can_ada/blob/main/tests/test_parsi...

    I didn't understand that can_ada is not where the parser is developed.

  • furl

    🌐 URL parsing and manipulation made easy.

  • yarl

    Yet another URL library

  • universal_pathlib

    pathlib api extended to use fsspec backends

  • You might be interested in https://github.com/fsspec/universal_pathlib

  • ada

    WHATWG-compliant and fast URL parser written in modern C++

  • ...

    can_ada is just the Python bindings, largely generated via pybind11.

    The actual project is at https://github.com/ada-url/ada

  • w3lib

    Python library of web-related functions

  • A great initiative!

    We need a better URL parser in Scrapy, for similar reasons. Speed and WHATWG standard compliance (i.e. do the same as web browsers) are the main things.

    It's possible to get closer to WHATWG behavior by using urllib and some hacks. This is what https://github.com/scrapy/w3lib does, which Scrapy currently uses. But it's still not quite compliant.

    Also, surprisingly, on some crawls URL parsing can take about as much CPU time as HTML parsing.

    Ada / can_ada look very promising!

  • url

    Python bindings to the Rust url crate (by crate-py)

  • Nice.

    I'll also throw in that I recently wrote bindings to Mozilla's Servo URL library.

    Those live at https://github.com/crate-py/url

    They're not complete yet (only the parsing bits are exposed, not URL modification), but I too was frustrated with the state of URL parsing.

  • rust-url

    URL parser for Rust

  • IMO that URL crate is not especially high quality. I barely work with URLs and I quickly found an embarrassingly trivial bug:

    https://github.com/servo/rust-url/issues/864#issuecomment-16...
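
The Scrapy comment above mentions getting closer to WHATWG behavior with urllib "and some hacks". As a rough illustration of what such hacks can look like, here is a minimal sketch using only the standard library. It is not how w3lib, Ada, or any other project listed here works; the helper names `remove_dot_segments` and `normalize_url` are hypothetical, and the sketch covers only a few browser-like normalizations for a handful of schemes (lowercased host, default port dropped, dot segments resolved, empty path turned into "/"). It ignores IDNA hosts, percent-encoding rules, backslashes, and most other cases a spec-compliant parser handles.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports a browser omits when serializing a URL (illustrative subset).
DEFAULT_PORTS = {"http": 80, "https": 443, "ws": 80, "wss": 443}


def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." path segments, roughly the way browsers do."""
    segments = path.split("/")
    output = []
    for index, segment in enumerate(segments):
        is_last = index == len(segments) - 1
        if segment == ".":
            if is_last:
                output.append("")  # "/a/." becomes "/a/"
        elif segment == "..":
            if len(output) > 1:  # never pop the leading empty segment
                output.pop()
            if is_last:
                output.append("")  # "/a/b/.." becomes "/a/"
        else:
            output.append(segment)
    return "/".join(output)


def normalize_url(url: str) -> str:
    """Hypothetical helper: apply a few WHATWG-like normalizations."""
    parts = urlsplit(url)
    scheme = parts.scheme  # urlsplit already lowercases the scheme
    if scheme not in DEFAULT_PORTS:
        return url  # this sketch leaves other schemes untouched

    host = parts.hostname or ""  # .hostname is lowercased, .netloc is not
    if ":" in host:  # IPv6 literal: restore the brackets
        host = f"[{host}]"

    netloc = host
    if parts.port is not None and parts.port != DEFAULT_PORTS[scheme]:
        netloc = f"{netloc}:{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo = f"{userinfo}:{parts.password}"
        netloc = f"{userinfo}@{netloc}"

    path = remove_dot_segments(parts.path) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(normalize_url("HTTP://Example.COM:80/./path/../path2/?q=1#frag"))
```

Every rule in this sketch is a special case bolted onto urllib's RFC-style splitting, which hints at why the commenters prefer a parser that implements the WHATWG algorithm directly.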
