
Site Reliability Engineer
Do you want to be part of the team?



Information
We are looking for a skilled Site Reliability Engineer to join our team. This position will focus on supporting the LATAM time zone, working closely with a team of SREs and a hands-on Lead SRE, while collaborating with a Europe-based SRE team. The role ensures seamless follow-the-sun 24/7 on-call support for a customer platform comprising multiple Java backend services.
Responsibilities
* Deliver 12/7 on-call support for Java backend services, ensuring consistent platform performance and uptime
* Oversee API Gateway observability to monitor and safeguard service health
* Implement and deploy patches to address issues in Java code and cloud infrastructure components
* Build and maintain metrics and dashboards to evaluate and enhance platform stability and performance
* Develop and refine runbooks for EOS backend services to optimize operational workflows
* Track and monitor Service Level Objectives (SLOs), addressing errors and contributing code changes to improve service reliability
* Diagnose and resolve complex system issues using logs and telemetry to identify root causes effectively
* Work with various teams to ensure operational readiness and improve incident response processes
Company name
EPAM
Location
Argentina
Work scheme
Remote
Years of experience required
2
