YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.
Updated Nov 24, 2024 - Python
A versatile Python-based web scraper that extracts content from single URLs or entire sitemaps, organizing data into structured text files. Features include sitemap parsing, content grouping by URL structure, and an easy-to-use command-line interface. Ideal for data extraction, content analysis, and web research tasks.
Recursively crawls the sitemap of a given website and exports the metadata of its pages to CSV.
A Python script to automate URL update and deletion notifications to Google using the Indexing and Search Console APIs. It fetches URLs from an XML sitemap or processes batch URLs from a JSON file, handling Google API authentication and indexing requests.
Python tool to extract URLs from XML sitemaps (including nested sitemap indexes) and automatically submit them to Google Indexing API for faster indexing. Supports bulk processing, rate limiting, and detailed progress reporting.
An advanced, auto-pilot web scraping framework for building clean text datasets. Features recursive sitemap parsing, smart media filtering, and content extraction from complex page builders (like Elementor). Supports auto-resume and multithreading.
Atlas — Web-scraping toolkit for Swedish authorities. Automates data collection with pagination, PDF crawling, metadata extraction, and multi-format export.
A high-performance asynchronous web scraper in Python with Sitemap support, JS rendering (Playwright), and a real-time dashboard built with Streamlit.
🕷️ Automate web scraping with OmniCrawler, a powerful tool that builds large datasets by discovering and downloading relevant content effortlessly.
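Several of the tools listed above describe extracting URLs from XML sitemaps, including nested sitemap indexes. The following is a minimal sketch of that idea using only the Python standard library; it is not taken from any of the listed projects. The `extract_urls` function, the `fetch` callable, and the `example.com` URLs are all hypothetical names introduced for illustration. The sketch assumes the sitemaps use the sitemaps.org 0.9 namespace.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org 0.9 protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_url, fetch, seen=None):
    """Return page URLs from a sitemap, following nested sitemap indexes.

    `fetch` is a hypothetical callable that maps a URL to its XML text;
    in real use it might wrap an HTTP client.
    """
    seen = set() if seen is None else seen
    if sitemap_url in seen:  # guard against indexes that reference each other
        return []
    seen.add(sitemap_url)

    root = ET.fromstring(fetch(sitemap_url))
    urls = []
    if root.tag.endswith("sitemapindex"):
        # A sitemap index lists child sitemaps; recurse into each one.
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(extract_urls(loc.text.strip(), fetch, seen))
    else:
        # A regular sitemap (urlset) lists page URLs directly.
        for loc in root.findall("sm:url/sm:loc", NS):
            urls.append(loc.text.strip())
    return urls


# In-memory stand-in for network access (hypothetical example data).
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
PAGES = {
    "https://example.com/sitemap.xml": (
        f"<sitemapindex {XMLNS}>"
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        f"<urlset {XMLNS}>"
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    ),
}

found = extract_urls("https://example.com/sitemap.xml", PAGES.__getitem__)
print(found)
```

Passing `fetch` as a parameter keeps the parsing logic separate from network access, so rate limiting, retries, or compressed-sitemap handling could be added in the fetcher without touching the traversal.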