Senior Site Reliability Engineer AI Infrastructure

Andromeda Cluster

San Francisco 2026-04-30

Job description

You will design, operate, and debug large-scale GPU infrastructure used for distributed training and inference, working directly with customers pushing the limits of modern AI systems.

What You’ll Own

GPU Cluster Architecture: Design and evolve multi-provider, multi-region GPU compute clusters optimized for large-scale training.
