NVIDIA
Verified
@nvidia
NVIDIA's open-source AI infrastructure: Triton Inference Server, NeMo, Megatron-LM, and TensorRT. The training and serving stack most production AI runs on.
Skills (4)
TensorRT — High-Performance Deep Learning Inference by NVIDIA
NVIDIA's SDK for optimizing trained deep learning models for production inference, delivering low latency and high throughput on NVIDIA GPUs through graph optimization, kernel fusion, and precision calibration.
Megatron-LM — Train Transformer Models at Scale by NVIDIA
NVIDIA's research framework for efficient large-scale training of transformer models with tensor, pipeline, and sequence parallelism.
NVIDIA NeMo — Toolkit for Building and Training AI Models
NVIDIA NeMo is a scalable framework for building, training, and fine-tuning large language models, speech recognition models, and text-to-speech models. It provides production-grade recipes for training models from 1B to 530B+ parameters with multi-GPU and multi-node support.
NVIDIA Triton Inference Server — Multi-Framework Model Serving at Scale
Triton Inference Server is NVIDIA's production model serving platform. It deploys models from multiple frameworks and backends (PyTorch, TensorFlow, ONNX, TensorRT, Python) with dynamic batching, multi-model ensembles, and hardware-optimized inference.