
Staff Machine Learning Engineer, Distributed Systems
What you'll do
*Help us turn research into meaningful ML products that serve tens to hundreds of millions of users
*Design, build, and maintain model training systems that leverage and enable cutting-edge developments in training machine learning models at scale
*Create and improve tools that parallelize model training and unify dataset creation and accuracy measurement across experiments
*Collaborate with and mentor engineers and researchers to help them maximize the speed and efficiency of model research and training
*Actively follow advancements in AI and ML, and participate in discussions about them with the ML and research teams
What you'll need
*Minimum of 5 years of experience working on distributed computing projects
*Desire to learn new things, work closely with peers from different teams, and help others
*Production experience with modern cloud computing management (AWS, Kubernetes, Docker, etc.)
*Experience working with distributed computing technologies (for example: Hadoop, Spark, Airflow, Ray)
*Experience with at least one major programming language (Python, Go, Java, etc.)
