Cloud Operations Specialist
Insight Global is looking for a Cloud Operations Specialist to join the team of one of its large OEM customers based in Dearborn, MI. This individual will join the Cloud Operations organization and will focus on providing platform services to modern cloud-native applications based in GCP. Currently, Cloud Data Messaging Services leverage technologies like GCP Pub/Sub and Confluent Kafka Cloud. This position requires an understanding of how GCP messaging services like Pub/Sub, alongside Kafka, integrate with other native services like Cloud Run, Dataflow, etc., to meet customer needs. This role is hybrid, with 1 day a week on-site in Dearborn (subject to potential change).
Specific responsibilities include:
- Develop, improve, and support Infrastructure as Code (IaC) practices.
- Provide highly scalable and available infrastructure.
- Implement and enhance SRE practices.
- Develop, integrate, and enhance automation for the services and their management.
- Enable real-time data processing and event-driven architectures.
- Collaborate with many application DevOps teams, as well as product vendors.
- Develop automated processes to simplify adoption and improve the experience for application development teams.
- Identify opportunities for adopting new data streaming technologies and patterns to solve existing needs and anticipate future challenges.
- Create and maintain Terraform modules and documentation for provisioning and managing Pub/Sub topics/subscriptions, Kafka clusters, and related networking configurations, often pairing with a partner.
- Improve continuous integration tooling by automating manual processes within the delivery pipeline for messaging applications and enhancing quality gates based on past learnings.
- Monitor application logs, metrics, and alerts to proactively identify and resolve issues in cloud resources and infrastructure, ensuring high uptime and optimal performance.
- Ensure cloud systems are configured correctly, run efficiently, and remain secure against potential threats.
- Ensure the availability and reliability of cloud services on public and private cloud platforms through proactive monitoring and incident response.
- Own and drive end-to-end technical resolution of critical incidents that may require involvement from multiple parties, ensuring the right collaboration and communication.
- Implement disaster recovery and backup strategies to protect critical data and configurations and perform periodic full-scale tests to verify plans and make improvements.
- Ensure compliance with industry regulations and maintain clear and up-to-date documentation for cloud infrastructure and procedures.
- Recommend solutions to improve availability, performance, incident resolution, observability, and supportability.
Compensation: $50/hr to $57/hr. Exact compensation may vary based on several factors, including skills, experience, and education. Benefit packages for this role will start on the 31st day of employment and include medical, dental, and vision insurance, as well as HSA, FSA, and DCFSA account options, and 401k retirement account access with employer matching. Employees in this role are also entitled to paid sick leave and/or other paid time off as provided by applicable law.
We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters. Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request to HR@insightglobal.com.
Skills and requirements:
- 5 years of experience in IT Operations and Infrastructure Automation / DevOps
- 2 years of experience utilizing cloud Data Messaging services (GCP Pub/Sub and Confluent Kafka)
- 2 years of experience in Terraform for infrastructure provisioning
- Experience in CI/CD pipelines using Jenkins, Cloud Build, or Red Hat Tekton (Tekton is preferred)
- Scripting experience with Python, PowerShell, or Bash
- Programming experience with either Java or Python (Java is preferred)
- Experience in authentication and authorization services/tools like OAuth2, AD, LDAP, ADFS, SSL
- Experience with GitHub
- Experience with monitoring tools such as Grafana or Dynatrace
- Experience with Chatbot development or Agentic AI