Software Engineer (Networking & Telemetry Systems)

Lead the development of telemetry frameworks to improve cloud network observability
San Francisco Bay Area
yesterday
California Staffing

Arkansas Staffing appears to be a government-associated entity focused on workforce development and employment services within the state of Arkansas.

Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft's cloud growth? Are you seeking a unique career opportunity that combines technical capabilities, cross-team collaboration with business insight and strategy? Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to achieve our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Join the Strategic Planning and Architecture (SPARC) team within Microsoft's Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft's expanding cloud infrastructure and responsible for powering Microsoft's "Intelligent Cloud" mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide. We are seeking a Software Engineer (Networking & Telemetry Systems) to lead the design and development of scalable networking systems, transports, and telemetry frameworks that support Azure's AI and data center infrastructure. This role requires deep expertise in systems and network architecture, performance diagnostics, and telemetry engineering, with a focus on building robust observability and debugging capabilities.

Responsibilities:

  • Coding: Learns to review code and helps to review code of others to ensure it meets team standards. Participates in code review processes for self-development, gathers feedback, and learns about coding standards and the team's features. Learns how to use, and begins to use, automated source code analysis tools that are incorporated into the build/development process with minimal supervision. Develops and applies knowledge of debugging tools, tests, logs, telemetry, and other methods to begin supporting efforts to proactively flag issues before they occur for product features in production. Learns to conduct incident retrospectives to identify root causes of problems, and begins to implement repair actions with direct supervision. Grows understanding of and begins to apply least-access principles, and uses logging, telemetry, and other appropriate mechanisms with direct supervision to investigate issues while retaining privacy and security. With guidance, learns how to create and implement code for a product, service, or feature, reusing code as applicable. Learns to write code that is extensible and maintainable. Learns about and applies diagnosability, reliability, and maintainability, and understands when the code is ready to be shared and delivered. Applies coding patterns and best practices to write code (e.g., leveraging state-of-the-art generative artificial intelligence [GenAI], approaches to source code organization, naming conventions). With guidance from more experienced colleagues, identifies and escalates blockers or unknowns during the development process, and communicates how they will impact timelines.
  • Design: Understands proposals and develops an understanding of how to apply them under the technical leadership of others. With managerial guidance, tests and explores various design options for a product/solution feature, outlining strengths and weaknesses of each option. Produces code to test hypotheses for technical solutions and assists with technical validation efforts. Helps with and participates in the development of design documents that support simple user stories with oversight. Develops an awareness of the current technology landscape. Escalates findings from investigations to team members for design decisions. Learns about the implications of performance, scalability, resiliency, cost of goods sold (COGS), and other requirements and expectations in systems architecture. Begins to uphold Microsoft standards of security, privacy, and other compliance requirements in systems architecture. Develops an understanding of the importance of building solutions that expand upon the work of others. Contributes to the refinement and integration of feedback in product features by escalating findings from analyses to inform decisions regarding the engineering of products. Supports the identification of dependencies, and their incorporation into the development of design documents for a product feature with oversight. Learns and helps to actively identify other teams and technologies to leverage, how they interact, and where their own system or team can support others. Learns about downstream interactions between systems. Collaborates with others to understand and execute a defined test strategy that ensures solution quality and prevents regressions from being introduced into existing code. Assists with executing test plans that incorporate security testing to validate security invariants (including negative cases) as assigned. Builds testable code for a feature under guidance from more experienced peers. Understands the most common types of tests and test strategies that can be done for the code for their feature, and begins to develop an understanding of testing architectures used both across Microsoft and across the industry. Leverages artificial intelligence (AI) tools for test automation with direct managerial oversight.
  • Engineering Excellence: Learns about and helps to ensure the correct processes are followed to achieve a high degree of security, privacy, safety, and accessibility. Contributes to efforts to check for visible evidence (e.g., audit trail) to demonstrate compliance for product features. Develops understanding of the implications of onboarding new technologies following expectations of compliance at Microsoft. Develops an understanding of global and local regulations for technologies and system applications. Develops an understanding of and applies security best practices and establishes code invariants to model "security as code," ensuring each layer is independently secure, and minimizing risk with direct supervision. Begins to adopt security standards for clear security code review practices for a product feature that align with design and engineering principles to raise the security hardening for both protections and detections. Supports efforts to incorporate deployment gates on security controls, and scanners for a product feature to prevent regressions and/or vulnerabilities that would have customer impact. Includes required security monitoring to ensure detection of violations with direct guidance. With direct supervision, contributes to working with relevant security partners to define security promises and security invariants while factoring in attacker/investigator personas for security monitoring and telemetry needs, ensure threat models and premortems validate upstream and downstream assumptions and security invariants, establish security breach drills and security incident response processes (e.g., impact analysis, containment), and ensure that artificial intelligence (AI) safety features are implemented for the AI production systems tied to a product feature. Works with partner teams to ensure a product feature works well with the components of the partner team with direct supervision, supporting team efforts to ensure proper end-to-end testing, live-site coverage, scalability, performance, and DRI escalation pathways are established before going live. Learns to develop and contribute to automation within production and deployment of a product feature. Runs code in simulated or other non-production environments to confirm functionality and error-free runtime for products with oversight. Develops knowledge of and learns to apply best practices to build code based on well-established methods and secure design principles. Learns about customer scaling requirements and application of best practices for meeting scaling needs, performance expectations, and security promises. Reviews current developments and proactively seeks new knowledge that will improve the availability, reliability, efficiency, observability, and performance of products while also driving consistency in monitoring and operations at scale. Develops collateral materials for learning and literary sessions used to raise awareness on relevant engineering design principles (e.g., security, testability, performance, scalability, accessibility, product knowledge), with some oversight. Learns about, shares new ideas, and leverages software developer tools to create, debug, and maintain code for features. Identifies whether open source or internal code is available for addressing coding needs for a set of product features, and reuses it in a responsible manner where applicable with some guidance.
  • Implement: Reviews work items to increase knowledge of product features in partnership with appropriate stakeholders (e.g., technical program managers) with guidance from more experienced peers. Supports team efforts on breaking down work items into tasks and providing estimation. Escalates issues that might cause a delay. Assists with ensuring required security protections and detection processes are accounted for in planning. Supports team efforts for ensuring project plans adhere to security, privacy, and compliance requirements with direct supervision. Ensures assigned code for a product feature is properly flighted for quicker mitigation of production incidents with managerial oversight. Contributes to calculating capacity for planning, accounting for appropriate failover and backup/restore mechanisms for disaster recovery for a product feature with guidance from more experienced peers. Learns about and begins to make considerations for efficient operation of a product feature after it is live with direct managerial oversight. Supports efforts to establish a rollback plan for a feature as instructed. Learns about and supports deployment to customers by following the correct measures to push features out to customers. Follows safe change deployment practices (e.g., ensuring that flights are set correctly) for their team to minimize adverse impact to users and other services with managerial guidance. Learns about and applies best practices for the deployment of features safely with managerial oversight and/or guidance from more experienced peers. Contributes to monitoring dependency status and ensuring that only the latest, secure versions are deployed. Identifies when rollback plans should be enacted for a product feature with direct supervision. Contributes to building deployment infrastructure to allow developers' private builds for a product feature to be tested in a production-like environment.
  • Reliability and Supportability: Acts as a designated responsible individual (DRI) in monitoring a system/product feature/service for degradation, downtime, or interruptions for simple problems, and recommends actions to restore system/product/service by following the playbook. Escalates more complex problems to other DRIs as to status. Responds within service level agreement (SLA) timeframe. Escalates
Engineering