Senior Site Reliability Engineer

August 11

Partner Vacancies

City:

Tbilisi

Employment:

Full-time

Experience:

More than 6 years

Company: "Балхаш Системс"

We have 30 years of expertise in designing and building custom software systems. We provide software development services focusing on complex high-load applications, AI and BI solutions, and mobile apps.

Our client is a Luxembourg-based company specializing in a knowledge assessment system, with expertise in various areas, including academia (universities and schools).

As a DevOps Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, scalability, and performance of our systems. You will bridge the gap between development and operations by applying software engineering principles to infrastructure and operations problems. Your role will focus on automation, incident response, monitoring, capacity planning, and improving system resilience while supporting production workloads on Google Cloud Platform (GCP).

Responsibilities:

  • Design, implement, and maintain highly available, scalable, and resilient cloud-based infrastructure using Google Cloud Platform (GCP).
  • Define and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).
  • Conduct capacity planning, performance tuning, and load testing to optimize system performance.
  • Develop chaos engineering practices to identify and mitigate failure scenarios.
  • Develop and maintain Infrastructure as Code (IaC) using Terraform, Ansible, or equivalent tools.
  • Automate system provisioning, configuration management, and deployments using CI/CD pipelines and GitOps workflows (ArgoCD, GitHub Actions).
  • Improve auto-healing and self-recovery capabilities in production environments.
  • Monitor system health and performance using Google Cloud Operations Suite (formerly Stackdriver), Prometheus, Dynatrace, Grafana, and Datadog.
  • Participate in the on-call rotation, and troubleshoot and resolve production incidents using root cause analysis (RCA).
  • Implement postmortem processes and drive corrective actions to prevent recurrence.
  • Implement and enforce security best practices, ensuring compliance with ISO 27001, SOC 2, and GDPR.
  • Apply IAM (Identity & Access Management) best practices for secure cloud operations.
  • Manage network security, including firewalls, VPNs, and service mesh (e.g., Istio).
  • Work closely with development, security, and operations teams to improve deployment strategies.
  • Advocate for blameless postmortems, knowledge sharing, and documentation improvements.
  • Lead SRE best practices adoption, including error budgeting and toil reduction.

Required experience and skills:

  • 3+ years of experience in a DevOps, SRE, or Cloud Engineering role.
  • Strong expertise in Google Cloud Platform (GCP) services, including GKE, Cloud Run, Cloud Functions, Cloud SQL, BigQuery, and Pub/Sub.
  • Experience with Kubernetes (GKE) and container orchestration.
  • Proficiency in Terraform, Helm, and Kubernetes operators for infrastructure automation.
  • Strong scripting and automation skills in Python, Bash, or Go.
  • Experience with monitoring, logging, and tracing tools (e.g., Google Cloud Operations Suite, Prometheus, OpenTelemetry).
  • Strong understanding of CI/CD pipelines using tools like ArgoCD, Jenkins, or GitHub Actions.
  • Knowledge of GitOps methodologies and IaC best practices.
  • Strong experience with PostgreSQL, Redis, and NoSQL databases.
  • Strong problem-solving and critical-thinking skills.
  • Ability to work collaboratively in a fast-paced environment.
  • Strong communication and documentation skills.
  • Ability to manage incidents under pressure and work on call as needed.
  • Experience with multi-cloud (AWS/GCP) and hybrid environments.
  • Knowledge of site reliability engineering principles (Google SRE).
  • Understanding of security best practices for cloud-native applications.
  • Google Cloud Certification (Professional Cloud DevOps Engineer, Professional Cloud Architect) is a plus.

Our offer as your future employer:

  • Full-time job with a flexible work schedule.
  • Possibility to work remotely.
  • Opportunities for professional growth.
Vacancy posted in the industry

Information Technology / IT / Internet