Site Reliability Engineer (SRE), ML Platform Job at Donato Technologies, Inc, Austin, TX

  • Donato Technologies, Inc
  • Austin, TX

Job Description

Title: Site Reliability Engineer (SRE), ML Platform

Location: Austin, TX OR Sunnyvale, CA

Type: Fixed-term contract (FTC)

Responsibilities:

  • Implement continuous deployment using GitHub Actions, Flux, and Kustomize
  • Design and implement cloud solutions and build MLOps pipelines on AWS
  • Containerize and deploy data science models using Docker, vLLM, and Kubernetes
  • Communicate with a team of data scientists, data engineers, and architects, and document processes
  • Develop and deploy scalable tools and services that let our clients handle machine learning training and inference
  • Apply knowledge of ML models and LLMs

Qualifications:

  • 6+ years of experience in MLOps, with strong knowledge of Kubernetes, Python, MongoDB, and AWS.
  • Good understanding of Apache Solr.
  • Proficient with Linux administration.
  • Knowledge of ML models and LLMs.
  • Ability to understand the tools used by data scientists; experience with software development and test automation
  • Ability to design and implement cloud solutions and to build MLOps pipelines on AWS
  • Experience working with cloud computing and database systems
  • Experience building custom integrations between cloud-based systems using APIs
  • Experience developing and maintaining ML systems built with open-source tools
  • Experience with MLOps frameworks such as Kubeflow, MLflow, DataRobot, and Airflow, and with Docker and Kubernetes
  • Experience building containers and running Kubernetes in cloud computing environments
  • Familiarity with one or more data-oriented workflow orchestration frameworks (Kubeflow, Airflow, Argo, etc.)
  • Ability to translate business needs to technical requirements
  • Strong understanding of software testing, benchmarking, and continuous integration
  • Exposure to machine learning methodology and best practices
  • Good communication skills and the ability to work in a team

Note: The role's focus is 60% SRE and 40% MLOps.

Skill Area (Weight %): Includes

  • Platform Reliability & Containerization (30%): Kubernetes, Docker, microservices, Linux
  • MLOps & AWS Cloud (25%): model deployment, versioning, and monitoring; AWS (SageMaker, S3, Lambda, EKS)
  • CI/CD & GitOps (15%): GitHub Actions, Flux
  • Monitoring & Observability (15%): Splunk, Grafana, Prometheus, performance tracking
  • Integration & Collaboration (15%): Python scripting, API integrations, Apache Solr, LLM awareness, teamwork with data scientists and engineers
