High-Performance AI Inference: Systems, Caching, and Distributed Execution