Site Reliability Engineer

Information Technology in St. Petersburg, FL
Reference: 19-02559


Our Site Reliability Engineering team and our Software Engineers work side by side to deliver quality solutions inside AWS. We have a large presence in AWS and are charged with stability, security, deployment automation, and helping developers use AWS within their projects.

We need people who have a STRONG knowledge of our core AWS technologies (ECS, Lambda, API Gateway, SQS, ElastiCache, IAM, VPC), with production-level experience using and debugging these services. Candidates should have experience with at least two of the following languages: Python, Ruby, PowerShell, Bash, and JavaScript. They should also have experience integrating with third-party APIs, debugging REST APIs and other HTTP services, and automating deployment processes. We need people who are creative, passionate about technology, and ready to help build something great.

Each SRE team member is embedded with one of several engineering teams within our client. As an SRE, you will attend standup with the dev team and help guide them as they create new solutions within AWS. We do a lot of work to empower our developers to use these services safely while still following company policies, and we do this through various utilities that we have built and implemented within the engineering team. As processes constantly evolve, you will help create and expand tools that further this initiative.


  • You will help us expand on our monitoring capabilities by identifying valuable data and ensuring it is parsed and searchable by our engineering teams.
  • We use a combination of logs, system metrics, and application performance metrics to help debug interesting challenges.
  • As our services grow, these challenges become more interesting, so we will rely on you to help us identify, resolve, and predict them.
  • You will build infrastructure inside of AWS via code.
  • All of our environments are expected to be scripted and checked in, so familiarity with tools such as Terraform or CloudFormation will come in handy here.
  • You will architect secure and robust solutions with regional disaster recovery in mind.
  • You will help design ‘self-healing’ solutions to help ensure the stability and security of our services, in addition to helping control costs.
  • We regularly write custom code to automate monotonous tasks that would normally require human intervention.
  • This requires careful consideration and lots of testing, but tends to be pretty fun.

Required Skills:

  • Deep knowledge of and production experience in designing, deploying, and administering complex AWS cloud applications (API Gateway, Lambda, ECS, ALB, WAF, EC2, RDS, ElastiCache, Elasticsearch, SQS, IAM, VPC, CloudFormation)
  • Experience working with configuration management tools (Puppet, Chef, Ansible)
  • Production experience with Docker
  • An in-depth knowledge of Linux troubleshooting, including networking, file systems, security, and the kernel
  • Strong knowledge of TCP/IP networking, including both hardware and host-based routing, VLANs, firewalls, subnetting, and load balancing
  • Experience writing code to create automated solutions (Bash, Ruby, Python, JavaScript, PowerShell)
  • In-depth knowledge of troubleshooting tools for debugging and tuning RESTful APIs
  • Excellent knowledge of Git best practices (Git Flow)
  • Good understanding of modern microservices architectures
  • Production experience gathering, digesting, and improving monitoring and performance metrics
  • Experience designing and enforcing disaster recovery plans and business continuity contingencies
  • Experience being on-call in a 24/7 production environment
  • Meticulous attention to detail and strong organization skills
  • Bachelor’s degree in Computer Science, Information Systems, Engineering, or other related disciplines and 5+ years of experience in IT infrastructure services or related field with at least 5 years of RHEL, CentOS, Ubuntu or Debian Linux experience
  • Additional training, technical certification, and/or years of experience may be substituted for a degree

What We Offer:

  • Competitive compensation and benefit packages
  • A quickly growing, great work environment that supports growth and development
  • A company that enjoys having fun: holiday and summer parties, an annual global company off-site, a private Star Wars pre-opening-day viewing, and lots of other great stuff