Python git-scraping

Open-source Python projects categorized as git-scraping

Top 6 Python git-scraping Projects

  • github-stats

    Better GitHub statistics images for your profile, with stats from private repos too

  • Project mention: Ask HN: How to Do a GitHub Wrapped? | news.ycombinator.com | 2023-12-19

    I have done similar work using the GitHub APIs before. I recommend using their GraphQL explorer to develop your queries interactively. You may need to fall back on the REST API instead of the GraphQL one for certain stats.

    https://docs.github.com/en/graphql/overview/explorer

    You can also refer to my code here, which may already collect some of the statistics you're interested in.

    https://github.com/jstrieb/github-stats/blob/master/github_s...

    I predict the most annoying part of this project will be dealing with authentication. There are a handful of ways to do it, and the permissions can be finicky depending on what data you are fetching.

    Best of luck!
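As a rough illustration of the GraphQL approach the comment describes, the sketch below builds an authenticated request for GitHub's GraphQL endpoint using only the standard library. The query itself, the `build_request` and `run_query` helper names, and the use of a personal access token are assumptions chosen for illustration; they are not taken from the github-stats code. Which token permissions are required depends on the data being fetched, as the comment warns.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"

# Illustrative query (an assumption, not the github-stats query):
# repository count and star counts for the authenticated user.
QUERY = """
query($first: Int!) {
  viewer {
    login
    repositories(first: $first, ownerAffiliations: [OWNER]) {
      totalCount
      nodes { name stargazerCount }
    }
  }
}
"""


def build_request(token: str, first: int = 10) -> urllib.request.Request:
    """Build (but do not send) a POST request for the GraphQL endpoint."""
    body = json.dumps({"query": QUERY, "variables": {"first": first}}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(token: str, first: int = 10) -> dict:
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(token, first)) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the payload while iterating on a query in the explorer, and to swap in a REST call for statistics the GraphQL API does not expose.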

  • spotify-playlist-archive

    Daily snapshots of public Spotify playlists

  • Project mention: Git Scraping Spotify | news.ycombinator.com | 2023-08-11
  • csv-diff

    Python CLI tool and library for diffing CSV and JSON files

  • help-scraper

    Record a history of --help for various commands

  • nepstonks

    An automated bot that scrapes the latest upcoming issues, news, and investment opportunities announced in Nepal and sends them to a Telegram channel.

  • Project mention: Git scraping: track changes over time by scraping to a Git repository | news.ycombinator.com | 2023-08-10

    Git is a key technology in this approach, because the value you get out of this form of scraping is the commit history - it's a way of turning a static source of information into a record of how that information changed over time.

    I think it's fine to use the term "scraping" to refer to downloading a JSON file.

    These days an increasing number of websites work by serving up JSON which is then turned into HTML by a client-side JavaScript app. The JSON often isn't a formally documented API, but you can grab it directly to avoid the extra step of processing the HTML.
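A minimal sketch of the pattern described above, assuming the data is already available as parsed JSON and that the target directory is an existing Git repository. The function names, the `data.json` filename, and the commit message are hypothetical. The JSON is written with sorted keys so that a commit is created only when the content actually changes, which keeps the history meaningful.

```python
import json
import subprocess
from pathlib import Path


def write_snapshot(data, path: Path) -> bool:
    """Write data as stable, pretty-printed JSON; return True if the file changed."""
    text = json.dumps(data, indent=2, sort_keys=True) + "\n"
    if path.exists() and path.read_text(encoding="utf-8") == text:
        return False
    path.write_text(text, encoding="utf-8")
    return True


def commit_snapshot(repo_dir: Path, filename: str, message: str) -> None:
    """Stage and commit one file; assumes repo_dir is an existing Git repository."""
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)


def scrape_once(fetch, repo_dir: Path, filename: str = "data.json") -> bool:
    """Fetch, snapshot, and commit only when the content differs from the last run."""
    changed = write_snapshot(fetch(), repo_dir / filename)
    if changed:
        commit_snapshot(repo_dir, filename, "Latest data")
    return changed
```

Here `fetch` is any callable that returns JSON-serializable data, for example a function that downloads and parses the JSON a site serves to its own front end. Running `scrape_once` on a schedule then produces one commit per observed change.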

    I do run Git scrapers that process HTML as well. A couple of examples:

    scrape-san-mateo-fire-dispatch https://github.com/simonw/scrape-san-mateo-fire-dispatch scrapes the HTML from http://www.firedispatch.com/iPhoneActiveIncident.asp?Agency=... and records both the original HTML and converted JSON in the repository.

    scrape-hacker-news-by-domain https://github.com/simonw/scrape-hacker-news-by-domain uses my https://shot-scraper.datasette.io/ browser automation tool to convert an HTML page on Hacker News into JSON and save that to the repo. I wrote more about how that works here: https://simonwillison.net/2022/Dec/2/datasette-write-api/
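For the HTML case, the conversion step might look something like the sketch below, which turns a simple HTML table into JSON records with the standard library's `html.parser`. This is a stand-in technique, not the method either repository uses (the second one relies on shot-scraper), and the table layout it expects is an assumption rather than the real structure of the pages mentioned above.

```python
import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_json(html: str) -> str:
    """Treat the first row as column names and return the rest as JSON records."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]"
    header, *body = parser.rows
    records = [dict(zip(header, row)) for row in body]
    return json.dumps(records, indent=2)
```

Saving both the raw HTML and the derived JSON in the repository, as described above, means the conversion can be rerun over the full history if the parser later needs fixing.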

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python git-scraping related posts

  • Datasette’s new JSON write API: The first alpha of Datasette 1.0

    3 projects | news.ycombinator.com | 2 Dec 2022
  • LocalStack and AWS Parity Explained

    4 projects | news.ycombinator.com | 4 Aug 2022
  • Canberra COVID Megathread: Tuesday 17 August

    2 projects | /r/canberra | 16 Aug 2021
  • New exposure sites just added

    2 projects | /r/canberra | 16 Aug 2021

Index

What are some of the best open-source git-scraping projects in Python? This list will help you:

#  Project                         Stars
1  github-stats                    2,750
2  spotify-playlist-archive          383
3  csv-diff                          275
4  help-scraper                       41
5  nepstonks                          24
6  scrape-san-mateo-fire-dispatch      1
