# data-extraction

Here are 42 public repositories matching this topic...

adrienjoly / npm-pdfreader

🚜 Parse text and tables from PDF files.

javascript parsing tabular-data pdf-converter data-extraction pdf-reader parse-tables rule-based-parsing

Updated Jan 21, 2026
HTML

AryanVBW / Exif

ExifTool is a powerful command-line tool that can be used to extract and edit metadata in a wide range of media files, including images, audio, and video. Metadata is information that is stored within a file that describes the file’s content or other attributes.

image-processing data-extraction image-metadata hacktoberfest information-gathering vivek powered-by-aryan-technologies aryan-technologies images-hacking aryanshop aryanvbw vivek-w vivek-wagdare

Updated Oct 12, 2025
HTML

N4rr34n6 / TelegramBackup

TelegramBackup is a sophisticated tool designed for extracting, organizing, and archiving messages from your Telegram chats, channels, and groups.

python open-source telegram data-extraction media-download automation-script telegram-backup channel-backup message-archiving supergroup-backup entity-processing chat-history-export telegram-data-analysis html-report-generation group-chat-backup media-files-download

Updated Apr 14, 2025
HTML

aborruso / scrape-cli

Extract HTML elements from the command line using CSS selectors or XPath. Pipe-friendly Python CLI.

python html cli scraping web-scraping xpath data-extraction command-line-tool lxml css-selectors

Updated Feb 22, 2026
HTML

maitreyeepaliwal / Alleropedia-Database-for-Allergens

Metadatabase of 13,145 allergen records with a tabular view of the data. A web interface eases use, analysis, and extraction of the data, with several added functionalities. A tutorial section educates users about the interface design, its features, and the database.

bioinformatics biology data-extraction bioinformatics-data allergies bioinformatics-databases allergy database-generator bioinfo metadatabase allergic-diseases biological-database biological-databases database-generation-for-allergens allergen-database secondary-database biology-project alleropedia

Updated Jun 5, 2021
HTML

AyushParkara / Zer0Snatch

Zer0Snatch – A lightweight, zero-dependency tool to securely extract and archive target data from digital sources. Designed for OSINT, automation, and ethical security research.

python security automation osint cybersecurity data-extraction ethical-hacking information-gathering password-sniffer security-tools cli-tool ethical-hacking-tools zer0snatch

Updated Jun 3, 2025
HTML

ermiasgelaye / ETL-Project

In this project, we built a database that tracks changes in America's fastest-growing private companies over time. The database is built by ingesting, combining, and restructuring data from three main data sources into one conformed PostgreSQL database, which is then deployed in a Flask app.

python api postgres data-science etl pandas-dataframe extract scraping postgresql pandas flask-application data-extraction load transformation scraping-websites flask-sqlalchemy production-database

Updated Aug 17, 2020
HTML

Shreesh8 / Data-Extractor

A lightweight data extraction tool that collects, processes, and structures information from web sources for analysis and automation.

javascript python html automation web-scraping data-extraction

Updated Aug 14, 2025
HTML

Anwarsha7 / resumeparser

An intelligent resume parsing engine built with Python and NLP, aimed at automating the tedious task of sifting through resumes. It accurately extracts vital candidate information such as contact details, employment history, educational qualifications, and technical skills, making it an invaluable asset for recruitment and HR professionals.

python natural-language-processing text-mining information-extraction data-extraction recruitment resume-parser npl resume-analysis hr-management hr-tech parsing-data document-parsing candidate-screening

Updated Jun 2, 2025
HTML

SamadhanSonwane / LinkedIn-Activity-Stats

A Selenium WebDriver project that reads all article and post analytics and stores them in an MS Excel file.

java automation selenium selenium-java data-extraction selenium-webdriver testng data-extractor automated-testing linkedin-signin apache-poi

Updated Mar 25, 2018
HTML

wyattowalsh / proxywhirl

Rotating proxy system.

python data sqlite proxy proxy-server python3 rotating-proxy data-extraction sqlite3 proxypool proxy-list proxy-checker web-data-extraction dataextraction proxy-scraper

Updated Mar 4, 2026
HTML

GitHawkAI / harvest

HARVEST - AI-powered web data to CSV automation using Claude Code and Playwright MCP

automation csv ai mcp web-scraping data-extraction playwright claude-code

Updated Oct 27, 2025
HTML

raymondoyondi / YouTube-Subtitle-Downloader

A user-friendly web application built with Python and the Flask framework that allows you to easily fetch and download subtitles from any YouTube video URL. The app leverages the powerful youtube-transcript-api library to retrieve both manually created and automatically generated captions, offering various download formats like SRT or plain text.

python flask web-app data-extraction youtube-downloader subtitle-downloader transcript-api

Updated Feb 8, 2026
HTML

KomalGoel18 / lyftr-assignment

Full-stack web scraping system that extracts structured, section-aware content from static and JS-rendered websites with interaction support and a JSON viewer frontend.

python automation frontend backend json-api rest-api web-scraping data-extraction fullstack scraping-tool fastapi playwright

Updated Feb 23, 2026
HTML

TelRich / Web_Scrapping_with_BeautifulSoup-and-Wptool

Web scraping Webometrics and Wikipedia using Python (Beautiful Soup and wptools) to make a list of the top 100 universities in Nigeria.

pandas-dataframe web-scraping data-extraction data-gathering beautifulsoup4 api-python requests-library-python wptools json-python

Updated Aug 24, 2022
HTML

raven-wills / embroidery-thread-converter

Professional Rails app for converting embroidery thread colors between brands

ruby rails crafting postgresql web-application data-extraction embroidery portfolio-project conversion-tool thread-conversion

Updated Nov 21, 2025
HTML

Onurkekec0 / Open-Ports-visualization

This project visualizes open ports around the world on a map.

python shodan data-visualization cybersecurity data-extraction folium shodan-api passive-data

Updated Nov 3, 2023
HTML

Diggernaut / diggernaut-meta-lang-docs

Diggernaut meta language documentation

web-scraping data-extraction

Updated Mar 3, 2024
HTML

boardgameanalytics / bga-pipeline

Airflow orchestrated ETL pipeline for extracting board game data from BoardGameGeek.com

python docker airflow data-engineering data-extraction etl-pipeline

Updated Oct 22, 2022
HTML

Aidoni0797 / docs.scrapy

Personal notes and experiments while learning the Scrapy framework for web scraping with Python

python learning tutorial web-scraping scrapy data-extraction spiders

Updated Dec 11, 2024
HTML
