Technical Guide
5 min read
Build an AI Content Moderation Pipeline With Open-Source Models
Learn how to wire together open-source classifiers and LLMs into a production-ready content moderation pipeline that catches harmful text, images, and edge cases. This hands-on guide walks you through every step, from model selection to deployment considerations.
Technical Guide
5 min read
Build a Document Q&A Pipeline With Open-Weights Embeddings
Learn how to build a fully local document Q&A system using open-weights embedding models, a vector store, and a retrieval-augmented generation pattern. This hands-on tutorial takes you from raw PDFs to accurate, cited answers in under an hour.
Technical Guide
5 min read
Model Quantisation: Cut Inference Costs Without Losing Quality
Model quantisation can cut your inference costs by up to a factor of four while preserving most of your model's accuracy. This hands-on tutorial walks you through INT8 and INT4 quantisation using Hugging Face and bitsandbytes, covering real pitfalls and how to sidestep them.
Technical Guide
5 min read
Run LLM Inference on CPU With llama.cpp and a REST API
Learn how to compile llama.cpp, load a quantized model, and expose it through a local REST API endpoint — all without a GPU. A practical walkthrough for developers who need cost-effective, self-hosted language model inference.
Technical Guide
5 min read
Build a Low-Cost Semantic Search Engine With Open-Source Embeddings
Learn how to build a fully functional semantic search engine using free, open-source embedding models and a lightweight vector store — no expensive APIs required. This hands-on tutorial walks you through every step, from encoding documents to querying results in milliseconds.
Technical Guide
4 min read
Run LLM Inference on CPU With llama.cpp and a REST API
Learn how to compile llama.cpp, download a quantized model, and expose it through a local REST API — all without a GPU. This tutorial walks you through every step so you can run production-grade language model inference on any Linux or macOS machine.
Technical Guide
5 min read
Build a Production RAG System With Open-Source Models, No GPU
Learn how to build a fully functional Retrieval-Augmented Generation pipeline using open-source models that run entirely on CPU. This step-by-step guide covers everything from document ingestion to query serving without a single GPU in sight.
Technical Guide
5 min read
Build an AI Content Moderation Pipeline With Open-Source Models
Learn how to build a production-ready AI content moderation pipeline using open-source models like Llama Guard and Detoxify. This step-by-step guide walks developers through setup, inference, and deployment considerations.
Technical Guide
5 min read
Run LLM Inference on CPU With llama.cpp and a REST API
Learn how to build a fully local, CPU-based LLM inference server using llama.cpp and a lightweight REST API wrapper. This tutorial walks you through every step, from model download to serving real HTTP requests.