Staff Engineer - SRE

Remote, USA | Full-time
Responsibilities:

The Site Reliability Engineering (SRE) team is responsible for the reliability, scalability, stability and performance of systems and services. They work with cross-functional teams to design, build and maintain systems, and they troubleshoot issues when they arise. They bridge the gap between development and operations teams.

They work closely with business teams to define Service Level Objectives (SLOs) and Service Level Agreements (SLAs) for critical systems, and they monitor and maintain the uptime of these systems in line with the defined SLOs and SLAs. They deploy and manage monitoring tools to gain insight into system health and performance. They analyze performance, identify bottlenecks and implement solutions to improve a system’s scalability and latency. They develop scripts and implement tools and automation frameworks to reduce manual effort in deployment, monitoring and scaling.

They work with development teams to design and develop observability practices such as logging, metrics and tracing. They aim to diagnose and troubleshoot issues proactively. They create actionable alerts on monitoring systems to ensure a rapid response to potential production incidents. They forecast resource needs and provision adequately for current and future demand.

They design and execute “chaos experiments” to test a system’s resilience to failure. They own, define and implement the Disaster Recovery (DR) processes for systems, and they conduct planned and unplanned mock DR drills to test response preparedness for production incidents. They ensure that security best practices are followed during the design and operation of systems. They also own and maintain documentation of processes, playbooks and systems, and they publish KPI reports and other system health updates to the business on a regular basis.
Requirements:

Must-have - Bachelor's degree, preferably in CS or a related field, or equivalent experience
Must-have - 12+ years of overall IT experience
Must-have - 7+ years of proven work experience as a Senior Site Reliability Engineer or in a similar position
Must-have - 5+ years of AWS Cloud experience with an AWS certification (Certified DevOps Engineer, SysOps, Security, etc.)
Must-have - 3+ years of AWS experience using a broad range of AWS technologies (e.g. EC2, RDS, ELB, S3, VPC, CloudWatch and monitoring tools) to develop and maintain an AWS-based cloud solution, with an emphasis on best-practice cloud security
Must-have - 2+ years of experience with CDN and/or cache systems like Fastly, Akamai, CloudFront, etc.
Proven understanding of and strong experience with cloud deployments (AWS, Docker, Kubernetes)
Knowledge of IaC provisioning and scripting tools like Terraform, Chef, Ansible, Shell, Groovy, Python, etc.
Experience with monitoring systems such as CloudWatch, New Relic, Datadog/Splunk and the ELK stack
Experience managing cloud network resources (AWS preferred) such as CloudWatch, VPC, URL proxies, private link, DNS, ACLs, firewalls, and C2S access points
Platform or application engineering and operational knowledge of CI/CD tooling like GitHub Actions, Jenkins, etc.
Experience with other tooling like JIRA, Bitbucket, Jenkins, Fortify, SonarQube, Nexus and Nexus IQ
Experience with configuration automation tools like Puppet, Ansible, Chef and Salt
Scripting Skills: Strong scripting (e.g. Bash and Python) and automation skills
Operating Systems: Windows and Linux system administration
Problem Solving: Ability to analyze and resolve complex infrastructure resource and application deployment issues
Strong attention to detail
Excellent verbal and written communication skills
Strong documentation skills
Good To Have:

Experience with Terraform/Ansible/Chef/Puppet
Experience with GitHub Actions
Experience with CloudFront, Fastly
Oversees team members performing these functions
Anticipates problems and future technical needs and takes the necessary steps to address them
Works primarily in server-side technologies and is comfortable with client-side work whenever required
Enthusiastically follows technology trends and software engineering best practices

Perks:

Day off on the 3rd Friday of every month (one long weekend each month)
Monthly Wellness Reimbursement Program to promote health and well-being
Paid paternity and maternity leave

Forbes Advisor is a new initiative for consumers under the Forbes Marketplace umbrella that provides journalist- and expert-written insights, news and reviews on all things personal finance, health, business, and everyday life decisions. We do this by providing consumers with the knowledge and research they need to make informed decisions they can feel confident in, so they can get back to doing the things they care about most. If you're looking for challenges and opportunities similar to those of a startup, with the benefits of a seasoned and successful company, then read on.

Originally posted on Himalayas