Running a trained AI model on new input data to generate predictions or outputs. Inference is a primary use case for GPU compute networks like Akash and io.net.
A single inference request typically needs far less compute than a training run, so inference can be served on a wider range of GPUs. As LLM deployment grows, inference demand is becoming the dominant workload in DePIN compute networks, and per-token pricing is emerging as a standard billing unit.
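As a rough illustration of per-token billing, the sketch below computes the cost of one request from its token counts. The `TokenPricing` type, the `request_cost` function, the split into separate input and output rates, and the rates themselves are all hypothetical; they are not taken from Akash, io.net, or any other provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Hypothetical rates, quoted in USD per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(pricing: TokenPricing, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single inference request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * pricing.input_per_million
             + output_tokens * pricing.output_per_million)
    return total / 1_000_000


# Made-up rates for illustration only.
pricing = TokenPricing(input_per_million=0.50, output_per_million=1.50)
cost = request_cost(pricing, input_tokens=2_000, output_tokens=500)
print(f"${cost:.5f}")
```

Billing by tokens rather than by GPU-hour ties the charge to the work a request actually consumes, which is one plausible reason the unit suits inference workloads.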