Software Engineer, Site Reliability - Niantic
Learn more: https://careers.nianticlabs.com/openings/software-engineer-site-reliability/
Contact us: ahoy@nianticlabs.com
San Francisco, CA
Sunnyvale, CA
Seattle Area (Bellevue), WA
Los Angeles, CA
Tokyo, Japan
Niantic’s Engineering Team is seeking a Site Reliability Engineer to ensure the availability of the server infrastructure that supports the hosted AR/Geo platform underpinning projects such as Pokémon GO, Ingress, and Harry Potter: Wizards Unite. You will design and implement infrastructure to host and monitor our game servers and platform services, and conduct real-time root cause analysis of system anomalies at massive scale on servers handling hundreds of millions of events per day.
Responsibilities
Build high-throughput, low-latency, highly available and scalable systems that host Niantic’s products.
Design and implement monitoring and fault-tolerance systems for Java-based servers.
Collaborate with other engineers to ensure that new and upgraded systems meet internal standards for reliability, availability, latency, performance and cost.
Develop tools to automate system provisioning, application deployment and configuration management. Participate in code reviews and troubleshoot live systems to ensure uptime.
Participate in blameless incident post-mortems, identify lessons learned and take ownership of follow-up action items.
Qualifications
BS in Computer Science or a related field.
2+ years of experience monitoring or building production infrastructure that serves substantial traffic.
Experience with scripting languages such as Bash, Python or Lua.
Familiarity with Kubernetes and cloud platforms such as Google Cloud, Azure or AWS.
Plus If...
You have experience with infrastructure management tools such as Terraform and Ansible.
You have experience with real-time or asynchronous processing of large-scale datasets.
You have experience with a compiled server-side language, such as C++, Go, Java or C#.