🚜 Parse text and tables from PDF files.
Updated Jan 21, 2026 - HTML
ExifTool is a powerful command-line tool that can be used to extract and edit metadata in a wide range of media files, including images, audio, and video. Metadata is information that is stored within a file that describes the file’s content or other attributes.
TelegramBackup is a sophisticated tool designed for extracting, organizing, and archiving messages from your Telegram chats, channels, and groups.
Extract HTML elements from the command line using CSS selectors or XPath. Pipe-friendly Python CLI.
A metadatabase of 13,145 allergen records with a tabular view of the data. A web interface eases use, analysis, and extraction of the data, with several added functionalities. A tutorial section introduces users to the interface design, its features, and the database.
Zer0Snatch – A lightweight, zero-dependency tool to securely extract and archive target data from digital sources. Designed for OSINT, automation, and ethical security research.
In this project, we built a database that shows how America's fastest-growing private companies have changed over time. The database is built by ingesting, combining, and restructuring data from three main data sources into one conformed PostgreSQL database, which is then deployed in a Flask app.
A lightweight data extraction tool that collects, processes, and structures information from web sources for analysis and automation.
An intelligent resume parsing engine built with Python and NLP, aimed at automating the tedious task of sifting through resumes. It accurately extracts vital candidate information such as contact details, employment history, educational qualifications, and technical skills, making it an invaluable asset for recruitment and HR professionals.
A Selenium WebDriver project that reads all article and post analytics and stores them in an MS Excel file.
Rotating proxy system.
HARVEST - AI-powered web data to CSV automation using Claude Code and Playwright MCP
A user-friendly web application built with Python and the Flask framework that allows you to easily fetch and download subtitles from any YouTube video URL. The app leverages the powerful youtube-transcript-api library to retrieve both manually created and automatically generated captions, offering various download formats like SRT or plain text.
Full-stack web scraping system that extracts structured, section-aware content from static and JS-rendered websites with interaction support and a JSON viewer frontend.
Web scraping of Webometrics and Wikipedia using Python (Beautiful Soup and wptools) to compile a list of the top 100 universities in Nigeria.
Professional Rails app for converting embroidery thread colors between brands
This project visualizes open ports around the world on a map.
Airflow orchestrated ETL pipeline for extracting board game data from BoardGameGeek.com
Personal notes and experiments while learning the Scrapy framework for web scraping with Python