oneAPI Deep Neural Network Library (oneDNN)
Up to 200x faster dot products and similarity metrics for Python, Rust, C, JS, and Swift, supporting f64, f32, and f16 (real and complex), i8, and bit vectors, using SIMD on AVX2, AVX-512, NEON, SVE, and SVE2. 📐
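For context, the kernel such libraries specialize per data type is small; here is a plain-NumPy cosine-similarity reference (a sketch, not the library's API):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Reference cosine similarity; SIMD libraries replace this with
    dtype-specialized kernels (f16, bf16, i8, even bit vectors)."""
    a64, b64 = a.astype(np.float64), b.astype(np.float64)
    return float(a64 @ b64 / (np.linalg.norm(a64) * np.linalg.norm(b64)))

a = np.random.default_rng(0).random(1536, dtype=np.float32)
b = np.random.default_rng(1).random(1536, dtype=np.float32)
print(cosine_similarity(a, b))
```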
Half-precision floating-point types f16 and bf16 for Rust.
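bf16 keeps float32's 8-bit exponent but only 7 fraction bits, so (up to rounding) converting f32 to bf16 is just dropping the low 16 bits of the bit pattern. A minimal Python sketch of that truncation (the crate itself is Rust, and real hardware usually rounds to nearest even rather than truncating):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    """Keep sign, the 8 exponent bits, and the top 7 fraction bits of a
    binary32 value; drop the rest (truncation, for simplicity)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16

def bf16_bits_to_f32(bits: int) -> float:
    """Widening bf16 -> f32 is exact: restore the dropped zero bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", bits << 16))
    return x

print(bf16_bits_to_f32(f32_to_bf16_bits(3.14159265)))  # 3.140625
```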
Round matrix elements to lower precision in MATLAB.
C++ template library for floating-point operations.
Floating-Point Arithmetic Library for Z80.
CUDA/HIP header-only library for low-precision (16-bit, 8-bit) and vectorized GPU kernel development.
IEEE 754-style floating-point converter.
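A converter like this splits a value into its three bit fields; a small Python sketch of the binary32 decomposition (a hypothetical helper, not this project's code):

```python
import struct

def decode_binary32(x: float) -> tuple[int, int, int]:
    """Split a binary32 value into its sign, biased exponent, and
    fraction fields, the way an IEEE 754 converter displays them."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

sign, exponent, fraction = decode_binary32(-6.25)
# -6.25 = -1.5625 * 2**2: sign 1, biased exponent 127 + 2 = 129,
# fraction 0.5625 * 2**23 = 0x480000
print(sign, exponent, hex(fraction))
```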
A LLaMA2-7b chatbot with memory, running on CPU and optimized using smooth quantization, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.
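For the bfloat16 path specifically, a heavily simplified sketch, assuming the public meta-llama/Llama-2-7b-chat-hf checkpoint and the documented ipex.optimize entry point; the repo's memory and quantization features are not reproduced here:

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
model.eval()
model = ipex.optimize(model, dtype=torch.bfloat16)  # CPU bf16 kernels

inputs = tokenizer("What is bfloat16?", return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```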
A JAX implementation of stochastic addition.
A PyTorch implementation of stochastic addition.
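Both implementations revolve around stochastic rounding: perform the add exactly (here in float32), then round to the lower precision, bumping the magnitude up with probability equal to the discarded fraction, so the rounding error is zero in expectation. A dependency-free NumPy sketch targeting bfloat16 (an illustration, not either repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def add_stochastic_bf16(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Add in float32, then stochastically round to bfloat16 by keeping
    the top 16 bits and rounding away from zero with probability equal
    to the discarded fraction."""
    exact = a.astype(np.float32) + b.astype(np.float32)
    bits = exact.view(np.uint32)
    discarded = (bits & np.uint32(0xFFFF)).astype(np.float64) / 2.0**16
    bump = rng.random(bits.shape) < discarded
    truncated = bits & np.uint32(0xFFFF0000)
    rounded = np.where(bump, truncated + np.uint32(0x00010000), truncated)
    return rounded.view(np.float32)  # bf16 values carried in f32 storage

# Round-to-nearest stagnates here (0.25 is below half the bf16 spacing
# of 2 at magnitude 256), but stochastic rounding grows on average:
acc = np.array([256.0], dtype=np.float32)
step = np.array([0.25], dtype=np.float32)
for _ in range(1000):
    acc = add_stochastic_bf16(acc, step)
print(float(acc[0]))  # roughly 506 = 256 + 1000 * 0.25 in expectation
```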
Customizable floating-point types, with all standard floating-point operations implemented from scratch.
Basic linear algebra routines implemented using the chop rounding function.
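A NumPy sketch of the same idea, rounding every operand and result of a matrix product to bfloat16 (an illustration of simulated low-precision linear algebra, not the repo's routines):

```python
import numpy as np

def chop_to_bf16(a: np.ndarray) -> np.ndarray:
    """Round each element to the nearest bfloat16 (round-to-nearest-even
    via the bias-and-truncate bit trick; NaNs are not special-cased)."""
    bits = np.ascontiguousarray(a, dtype=np.float32).view(np.uint32)
    bias = ((bits >> 16) & np.uint32(1)) + np.uint32(0x7FFF)
    return ((bits + bias) & np.uint32(0xFFFF0000)).view(np.float32)

def matmul_chopped(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Chop the inputs to bf16, multiply in float32, chop the result."""
    return chop_to_bf16(chop_to_bf16(A) @ chop_to_bf16(B))

rng = np.random.default_rng(1)
A, B = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
print(np.max(np.abs(matmul_chopped(A, B) - A @ B)))  # bf16-scale error, ~1e-2
```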
IHP 130nm ASIC tapeout of a 2x2 bfloat16 matrix-matrix multiplication with DFT infrastructure. An iteration on the previous accelerator taped out on GF180.
Comparison of vector element sum using various data types.
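A sketch of what such a comparison can look like in NumPy, summing one vector at several precisions against a float64 reference (bfloat16 itself would need an extension dtype such as the ml_dtypes package, so only native types are shown):

```python
import numpy as np

x = np.random.default_rng(2).random(10_000)
reference = float(np.sum(x, dtype=np.float64))
for dtype in (np.float64, np.float32, np.float16):
    # Cast the data and accumulate in the same precision.
    total = float(x.astype(dtype).sum(dtype=dtype))
    print(f"{np.dtype(dtype).name:>8}: sum = {total:.4f}, "
          f"rel. error = {abs(total - reference) / reference:.2e}")
```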
Comparison of the PageRank algorithm using various data types.
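In the same spirit, a toy dense power-iteration PageRank whose state is held in a chosen dtype, compared against a float64 run (a sketch; real comparisons would use sparse graphs and handle dangling nodes):

```python
import numpy as np

def pagerank(adj: np.ndarray, dtype, damping=0.85, iters=50) -> np.ndarray:
    """Power-iteration PageRank with the transition matrix and rank
    vector held in `dtype`; dangling nodes are ignored for brevity."""
    n = adj.shape[0]
    out_deg = adj.sum(axis=1, keepdims=True)
    P = (adj / np.where(out_deg == 0, 1, out_deg)).astype(dtype)
    rank = np.full(n, 1.0 / n, dtype=dtype)
    for _ in range(iters):
        rank = dtype(damping) * (P.T @ rank) + dtype((1 - damping) / n)
    return rank

rng = np.random.default_rng(3)
adj = (rng.random((64, 64)) < 0.1).astype(np.float64)
reference = pagerank(adj, np.float64)
for dt in (np.float32, np.float16):
    err = np.max(np.abs(pagerank(adj, dt).astype(np.float64) - reference))
    print(np.dtype(dt).name, f"max abs deviation = {err:.1e}")
```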