Site Reliability Engineer

Do you want to be part of the team?

Information

We are looking for a skilled Site Reliability Engineer to join our team. This position will focus on supporting the LATAM timezone, working closely with a team of SREs and a hands-on Lead SRE, while collaborating with a European-based SRE team. The role ensures seamless follow-the-sun 24/7 on-call support for a customer platform comprising multiple Java backend services.

Responsibilities

* Deliver 12/7 on-call support for Java backend services, ensuring consistent platform performance and uptime
* Oversee API Gateway observability to monitor and safeguard service health
* Implement and deploy patches to address issues in Java code and cloud infrastructure components
* Build and maintain metrics and dashboards to evaluate and enhance platform stability and performance
* Develop and refine runbooks for EOS backend services to optimize operational workflows
* Track and monitor Service Level Objectives (SLOs), addressing errors and contributing code changes to improve service reliability
* Diagnose and resolve complex system issues using logs and telemetry to identify root causes effectively
* Work with various teams to ensure operational readiness and improve incident response processes

Company name
EPAM
Location
Argentina
Work scheme
Remote
Years of experience required
2
