
Jr. Site Reliability Engineer

Build automated tools to improve infrastructure reliability and system performance
Taipei, Taiwan
Junior
1 month ago
17LIVE

A live streaming platform that allows users to broadcast, watch, and interact with performers in real-time.

Site Reliability Engineer

We are currently hiring Site Reliability Engineers (SREs). SREs are responsible for the overall performance and reliability of 17LIVE's infrastructure and products, and they design and implement the tools that automate building reliable, performant systems.

In this role, you will take part in:

  • Automation: Automating manual tasks and building missing tools; SREs can't stand tasks that aren't automated or tools that aren't in place.
  • System Architecture: Understanding the lifecycle of a system (e.g., from startup to service provision to shutdown).
  • Deployment and Change Management: Knowing the service release process (e.g., GitFlow, GitHub Flow, GitLab Flow), how to manage version control, and the principles of GitOps.
  • Monitoring Services: Understanding how to collect logs, metrics, and create dashboards for monitoring services.
  • Enhancing Availability: Knowing how to deploy High Availability (HA) and Disaster Recovery (DR) architectures.
  • Incident Management: Handling system incidents, improving the on-call experience with tools and procedures, making a preliminary diagnosis of likely causes, and assisting with post-incident analysis.
  • Infrastructure as Code (IaC): Understanding IaC and being proficient with at least one IaC tool, such as Terraform.
  • On-Call: Participating in on-call duty to provide 24/7 support.

Good to Have:

  • Understanding the basic principles of Linux and a willingness to delve deeper into Linux's internal structure.
  • Strong programming skills in at least one of the following languages: Go, C, C++, Python, Java, and the ability to learn other languages.
  • Basic shell scripting skills.
  • Experience in maintaining Kubernetes, CI/CD, and Monitoring systems.
  • Experience with IDC, AWS, GCP, or Azure.

You will be highly considered if you have any of the following:

  • Kubernetes or cloud-related certifications.
  • Knowledge of container technologies such as Docker, containerd, or Podman.
  • Knowledge of at least one of the following: MySQL, MongoDB, ELK, Datadog, Prometheus, or similar technologies.
  • Understanding of caching and queue systems like Redis, Memcached, RabbitMQ, Apache Kafka, etc.
  • Contributions to open-source software.

Engineering