Distinguished, Software Engineer - Observability

Architect and lead the development of enterprise-wide real-time telemetry systems
Sunnyvale, California, United States
Expert
$169,000 – 338,000 USD / year
2 days ago
Walmart

A multinational retail corporation operating a chain of hypermarkets, discount department stores, and grocery stores.

As an Observability Distinguished Engineer, you will be a key researcher and lead technical expert in the architecture and development of cloud-native observability designs, managed services, and real-time telemetry software systems. You will draw on your depth of engineering experience to create visionary software architectures and telemetry systems that make up an observability software product portfolio. You will also design, develop, and implement large-scale distributed systems that process large volumes of data, with a focus on scalability, latency, and fault tolerance in every system built.

You must be able to communicate effectively and build collaboration across all areas and levels of the business and engineering. An ideal candidate will be adept at architecting large-scale distributed systems, proficient in coding in Java, and experienced in socializing architectural designs and roadmaps with internal and external customers.

To deliver software solutions and designs, you will use multiple telemetry technologies, such as data models, metric libraries, data logging, distributed tracing, data lakes, data correlation, rule-based alerting engines, real-time data streaming pipelines, time series databases (TSDBs), and application performance management (APM). Working in a cloud infrastructure ecosystem consisting of VMs, Kubernetes, and containers, you will create metric software designs and solutions that enable real-time monitoring and alerting on system and application metrics. You will lead research initiatives for cloud-native designs and implementations within public and private clouds. You will also use TSDBs together with correlation and data fusion of multiple data types and heterogeneous data streams, coupled with Artificial Intelligence (AI) and learned behaviors, for anomaly detection and forward projection of expected system and application behavior.

This role involves collaboration with enterprise architects, product managers, data scientists, engineers, and business managers to bring telemetry R&D projects into production. To that end, you will use a combination of open-source and commercial off-the-shelf (COTS) technologies to solve real-time telemetry problems at enterprise-wide scale. In parallel, you will lead the design of new systems and the redesign of existing systems to meet business requirements, changing needs, and the integration of state-of-the-art technology. You will be an evangelist for the Observability foundation, socializing technology designs and implementations with engineering and business customers.

Location: Open to Sunnyvale CA, Seattle WA, and Bentonville AR

Minimum Qualifications:

  • BS/MS in Computer Science, Engineering, or equivalent, with 15+ years in software engineering, design, and architecture

  • A deep understanding of the Java language and associated frameworks, and previous development of Java applications, libraries, SDKs, or services.

  • Strong architecture leadership with demonstrated enterprise level software implementations.

  • Previous demonstrated architectural leadership in research, evaluation, creation of software designs, and distributed software implementations in production.

  • Experience with technical leadership, software roadmaps, research and development, new software initiatives and customer and engineering coordination and engagement.

  • Full stack cloud software development experience.

Experience with the following:

  • API development, integration, and utilization

  • Cloud technologies and cloud native designs

  • Cloud infrastructures and technologies, such as OpenStack, Azure, GCP, or AWS

  • Large scale distributed systems experience including scalability and fault tolerance.

  • TSDBs (InfluxDB, Kairos, Cortex, Thanos, Prometheus) or equivalent

  • Extract, transform, and load (ETL) processes

  • Real-time telemetry pipelines and publish/subscribe models (Kafka or equivalent)

  • Data warehousing, data lakes, processing and data analytics

  • SQL (Azure SQL, PostgreSQL, or equivalent), with a solid foundation in advanced SQL

  • Unix/Linux shell scripting or similar programming/scripting knowledge

  • Real-time monitoring and alerting: metric agents, real-time dashboards, alerting rules

  • Excellent written and verbal communication skills for diverse audiences based on engineering subject matter

  • Ability to document requirements, architectural designs, and analysis findings in both business and technical terminology

  • Software development in an Agile iterative CI/CD development environment

  • Promote and support company policies, procedures, mission, values, and standards of ethics and integrity

Preferred Qualifications:

  • Knowledge and/or use of agentic AI: Model Context Protocol (MCP) servers, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), Natural Language Processing (NLP)

  • Fluency in Python, JavaScript, advanced shell scripting, and configuration management (Ansible, Chef, Puppet)

  • Experience with the following:

    • Application Performance Monitoring (APM) and/or Distributed Tracing

    • Deployment of Kubernetes, containers, service meshes, and microservices

    • Microservices architectures, Istio, and Micrometer

    • OpenTelemetry standards and protocols

    • Go development

    • Observability tools and system architectures

    • Creating and maintaining managed metric services

    • NoSQL (Cassandra, CosmosDB or equivalent)

    • Storm, Spark or similar real-time streaming software

  • Knowledge of UI development (JavaScript, HTML, CSS) and experience with frameworks such as React and AngularJS

  • Involvement and contribution with open-source software communities

  • Demonstrated background in developing software systems and

Engineering