Scalable data preprocessing and curation toolkit for LLMs
Visual AI development framework for training and inference of ML models, scaling pipelines, and automating workflows with Python
convtools is a specialized Python library for dynamic, declarative data transformations with automatic code generation
A Python library that acts as a client to download, pre-process, and post-process weather data; friendly to users on VPN/proxy connections.
Making it easier to navigate and clean TAHMO weather station data for ML development
A simple, general-purpose pipeline framework.
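A general-purpose pipeline framework of this kind typically composes a sequence of stages, each a callable, and threads data through them in order. The sketch below is a hypothetical illustration using only the standard library; the `Pipeline`, `then`, and `run` names are assumptions for the example, not the API of any library listed here.

```python
# Hypothetical sketch of a minimal pipeline framework: each stage is a
# callable, and the pipeline feeds a value through the stages in order.
from functools import reduce
from typing import Any, Callable, Iterable


class Pipeline:
    """Compose callables into a single data-processing pipeline."""

    def __init__(self, stages: Iterable[Callable[[Any], Any]] = ()):
        self.stages = list(stages)

    def then(self, stage: Callable[[Any], Any]) -> "Pipeline":
        # Return a new pipeline with the extra stage appended,
        # leaving the original pipeline unchanged.
        return Pipeline(self.stages + [stage])

    def run(self, value: Any) -> Any:
        # Thread the value through each stage, left to right.
        return reduce(lambda acc, stage: stage(acc), self.stages, value)


# Usage: clean, filter, and aggregate a list of raw readings.
pipeline = (
    Pipeline()
    .then(lambda rows: [r.strip() for r in rows])          # strip whitespace
    .then(lambda rows: [float(r) for r in rows if r])      # drop blanks, parse
    .then(sum)                                             # aggregate
)
result = pipeline.run([" 1.5", "2.5 ", "", "4.0"])
```

The immutable `then` chaining is one common design choice; real frameworks often add branching, error handling, and parallel execution on top of this core idea.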
Streamlit app to export and bulk-update Plex music metadata and create smart playlists.
Artifician is an event-driven framework designed to simplify and accelerate the process of preparing datasets for Artificial Intelligence models.
A pipeline that consumes Twitter data to extract meaningful insights about a variety of topics, built with the Twitter API, Kafka, MongoDB, and Tableau.
The Resume Application Tracking System uses Google Gemini Pro Vision to automatically parse, analyze, and categorize resumes for efficient recruitment. It integrates AI-driven vision capabilities to enhance resume processing and candidate selection.