Site Reliability Engineer
Company
Guidewire
Location
Remote
Type
Full Time
Job Description
What You'll Do
- Collaborate with development teams to troubleshoot and solve problems, reducing customer impact.
- Develop automated runbooks and implement measures to handle issues proactively.
- Apply sound engineering principles and mature automation to our operating environments.
- Monitor, maintain, and enhance the reliability and performance of applications on our Guidewire Cloud Platform.
- Leverage your automation and software engineering expertise to optimize systems and eliminate toil.
- Document and examine incidents to improve processes and prevent recurrence.
- Stay up to date with the latest industry trends, tools, and best practices in site reliability engineering.
- Contribute to a culture of innovation, learning, and continuous improvement.
What You'll Bring
- Proven experience as an SRE or in a similar role, with a track record of improving system reliability
- Strong problem-solving skills and the ability to analyze complex systems and devise effective solutions
- Excellent collaboration and communication abilities to work cross-functionally and clearly document processes
- Experience with automation, monitoring, and performance optimization tools and techniques
- Dedication to maximizing uptime and scalability and to delivering an exceptional end-user experience
- A passion for technology and a strong desire to continuously learn and grow your skills
- Alignment with Guidewire's mission to leverage technology to help protect and support others
Required Skills & Experience
- Proven experience leveraging application performance monitoring (APM) and telemetry tools to troubleshoot and diagnose problems
- Proven experience triaging and debugging distributed systems on cloud infrastructure
- Proven experience in designing and engineering CI/CD pipelines within Kubernetes (K8s) and legacy ecosystems
- Proven experience in designing and engineering monitors, dashboards, and synthetic transactions in Datadog
- Proven experience in building, deploying, and running scalable infrastructure within AWS and Kubernetes ecosystems using Terraform and other cloud-native approaches
- Proven experience in managing infrastructure configuration at scale using multiple approaches and/or tools such as GitOps, Puppet, or Ansible
- Good understanding of AWS cloud networking and security with hands-on experience remediating infrastructure vulnerabilities at scale
- Good understanding of SLIs, SLOs, and Error Budgets
- Comfortable with Linux system administration, with the ability to program/script using Python, Go, Java, shell, or equivalent
- Willingness and ability to participate in mandatory on-call rotations that ensure service availability and reliability, including responding to incidents and alerts outside regular hours, on weekends, and on holidays. This is a critical responsibility of the role.
Preferred Skills
- SRE certified in multiple categories
- AWS certified in multiple categories
- Proficiency with SQL, database administration, data pipelines, performance tuning, and schema design
- Proficiency with multiple pipelining tools such as TeamCity, Bitbucket Pipelines, Jenkins, and GitHub Actions
- Familiarity with distributed data processing frameworks and platforms such as Hadoop, Apache Spark, Amazon Redshift, etc.
Date Posted
10/14/2024