Lead, Service Reliability Engineer R&D

This job posting is anticipated to close on Jun 05 2026. We may, however, extend this time period, in which case the posting will remain available on www.careers.jnj.com to accept additional applications.

Description

At Johnson & Johnson, we believe health is everything. Our strength in healthcare innovation empowers us to build a world where complex diseases are prevented, treated, and cured, where treatments are smarter and less invasive, and solutions are personal. Through our expertise in Innovative Medicine and MedTech, we are uniquely positioned to innovate across the full spectrum of healthcare solutions today to deliver the breakthroughs of tomorrow, and profoundly impact health for humanity. Learn more at jnj.com

As guided by Our Credo, Johnson & Johnson is responsible to our employees who work with us throughout the world. We provide an inclusive work environment where each person is considered as an individual. At Johnson & Johnson, we respect the diversity and dignity of our employees and recognize their merit.

Job Function:

Technology Product & Platform Management

Job Sub Function:

Technical Product Management

Job Category:

Scientific/Technology

All Job Posting Locations:

Raritan, New Jersey, United States of America

Job Description:

We are searching for the best talent for a Lead, Service Reliability Engineer R&D to be located in Raritan, NJ.

Position Summary

The Service Reliability Engineer (SRE) designs, builds, and operates reliability practices and technical capabilities that ensure critical engineering and enterprise services are available, performant, secure, and resilient. This is a hands-on, non-manager role focused on improving service reliability through observability, incident response, automation, and engineering excellence.

This role partners closely with Product Owners, development teams, infrastructure/platform engineering, Quality/Validation, Security, and Enterprise Architecture to define reliability targets, implement operational controls, and maintain documentation appropriate for regulated environments. The SRE helps standardize operational patterns across environments (dev/test/prod), including monitoring baselines, access controls, runbooks, change management, and deployment readiness.

Key outcomes include establishing and measuring Service Level Indicators/Objectives (SLIs/SLOs), improving alert quality and troubleshooting speed, reducing incident frequency and Mean Time to Recovery (MTTR), and enabling safe, repeatable releases through automation and operational readiness. The SRE identifies reliability risks and technical gaps, recommends scalable and resilient designs, implements reusable operational tooling, and participates in Agile ceremonies and on-call support aligned to the team’s ways of working.

Major Duties & Responsibilities

  • Define, implement, and continuously improve reliability standards for production services, including SLIs/SLOs, error budgets, and operational readiness criteria.
  • Build and maintain observability capabilities (metrics, logs, traces, dashboards) and establish actionable alerts that reflect customer impact.
  • Participate in on-call rotations, lead incident triage and restoration, and drive root-cause analysis with corrective and preventive actions.
  • Engineer reliability improvements through automation (self-healing, auto-remediation, runbook automation) and eliminate toil through scripting and tooling.
  • Partner with engineering teams to design and validate resilient architectures (timeouts/retries, circuit breaking, queuing, graceful degradation) and to improve deployment safety.
  • Perform capacity planning and performance analysis; proactively identify bottlenecks and reliability risks, and validate scaling strategies.
  • Establish and maintain operational runbooks, playbooks, and escalation paths; conduct game days and resilience testing (e.g., failover/chaos exercises) as appropriate.
  • Improve change management by defining deployment/rollback standards, validating monitoring coverage, and supporting release readiness reviews across dev/test/prod.
  • Create and maintain operational documentation (service catalogs, SLIs/SLOs, runbooks, monitoring standards) and ensure knowledge transfer across teams.
  • Support validation and audit readiness by following SDLC/IT controls, producing required evidence (e.g., monitoring/test results), and supporting controlled releases in regulated environments.
  • Develop reliability reporting (availability, latency, error rates, MTTR, incident trends) and present insights and recommendations to stakeholders.
  • Apply security-by-design principles (identity/access, secrets management, vulnerability management, data protection) and ensure operational practices meet company standards.
  • Collaborate with internal teams and vendors as needed to implement reliability improvements, manage platform upgrades, and continuously improve maintainability and supportability.

Qualifications

Required

  • Bachelor’s degree in Computer Science, Engineering, or related discipline, or equivalent experience.
  • 5+ years of experience in SRE, DevOps, platform engineering, or software engineering with substantial production operations responsibilities.
  • Hands-on experience with observability and incident management practices, including monitoring/alerting design, on-call operations, and root-cause analysis.
  • Experience with infrastructure-as-code and CI/CD (e.g., Terraform/CloudFormation, Git, Azure DevOps/Jenkins or similar) and automated testing/release practices.
  • Experience operating services in cloud-hosted or hybrid enterprise environments (AWS and/or on-prem), including networking fundamentals, secure configuration, and environment management.
  • Strong communication skills with the ability to explain technical issues, incident impact, reliability risks, and tradeoffs to both technical and non-technical stakeholders.
  • Working knowledge of Agile delivery practices and ability to collaborate across cross-functional teams (Product, Engineering, QA/Validation, Security, Infrastructure) to deliver reliable, well-managed releases.
  • Experience working in MedTech, Life Sciences, or other regulated environments, including familiarity with validated systems, documentation expectations, and controlled change processes.
  • Demonstrates AI Fluency: the ability to use and evaluate AI technologies responsibly (with a primary focus on generative AI in the workplace) to improve productivity and decision quality while maintaining human accountability, managing risk, and complying with applicable governance, privacy, security, and policy requirements.

Johnson & Johnson is an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, age, national origin, disability, protected veteran status or other characteristics protected by federal, state or local law. We actively seek qualified candidates who are protected veterans and individuals with disabilities as defined under VEVRAA and Section 503 of the Rehabilitation Act.

Johnson & Johnson is committed to providing an interview process that is inclusive of our applicants’ needs. If you are an individual with a disability and would like to request an accommodation: external applicants, please contact us via https://www.jnj.com/contact-us/careers; internal employees, contact AskGS to be directed to your accommodation resource.

#JNJTECH

#LI-Hybrid

Required Skills:

Product Lifecycle Management (PLM), Reliability Engineering

Preferred Skills:

Agile Product Development, Analytical Reasoning, Coaching, Collaborating, Competitive Landscape Analysis, Critical Thinking, Customer Alignment, Demand Forecasting, Human-Computer Interaction (HCI), Organizing, Product Development, Product Improvements, Product Strategies, Requirements Analysis, Research and Development, Software Development Life Cycle (SDLC), Software Development Management, Stakeholder Management, Technical Credibility, Technical Writing, Technologically Savvy

The anticipated base pay range for this position is:

$94,000 - $151,800

Additional Description for Pay Transparency:

Subject to the terms of their respective plans, employees and/or eligible dependents are eligible to participate in the following Company sponsored employee benefit programs: medical, dental, vision, life insurance, short- and long-term disability, business accident insurance, and group legal insurance. Subject to the terms of their respective plans, employees are eligible to participate in the Company’s consolidated retirement plan (pension) and savings plan (401(k)). Subject to the terms of their respective policies and date of hire, Employees are eligible for the following time off benefits:

  • Vacation – 120 hours per calendar year
  • Sick time – 40 hours per calendar year; for employees who reside in the State of Washington – 56 hours per calendar year
  • Holiday pay, including Floating Holidays – 13 days per calendar year
  • Work, Personal and Family Time – up to 40 hours per calendar year
  • Parental Leave – 480 hours within one year of the birth/adoption/foster care of a child
  • Condolence Leave – 30 days for an immediate family member; 5 days for an extended family member
  • Caregiver Leave – 10 days
  • Volunteer Leave – 4 days
  • Military Spouse Time-Off – 80 hours

Additional information can be found through the link below.
https://www.careers.jnj.com/employee-benefits
