Site Reliability Engineer

About IO
IO is a flexible remote organisation with its origins in Cape Town. We design, build and ship scalable digital products. We work collectively on client and startup projects (which makes work at IO exciting, dynamic and agile) and we launch about 4 to 5 digital products per year. We are committed to modern methodologies and team structures, and we really do put our people first.

About the role
We design, build and ship scalable products and startups. We are cultivating a company with a collaborative culture, strong values and a strong work ethic, supported by a team-based organisational structure. The products we build are exceptional, and so is our team. We are always on the lookout for people to support our clients, stakeholders, entrepreneurs, product designers, developers and engineers in delivering the quality products and startups we strive for at IO.

We require a Site Reliability Engineer who has a solid track record of working in a collaborative team and is well-versed in Agile practices. Our ideal candidate can research, apply and document pragmatic SRE solutions, tools and software in line with our company's maturity and product requirements.

Our ideal candidate has

3 or more years of experience in the field of DevOps or Site Reliability Engineering
Experience with Cloud providers (AWS, Google Cloud Platform, Microsoft Azure, DigitalOcean or similar)
Experience with installing, setting up and running Linux distributions (Ubuntu, SUSE, Red Hat or similar)
A technical background, with coding experience in a language like Python, PHP, JavaScript or similar (previous development experience would be a bonus)
The ability to use Git as a version control system, and experience with GitLab, GitHub, Bitbucket or similar
Strong skills in using configuration, serialisation or markup languages like YAML, JSON, HCL and XML
An understanding of Agile development processes
Experience in managing environments using Terraform and Kubernetes
Excellent organisational and time management skills
Accuracy and attention to detail
Self-development skills to keep up to date with fast-changing trends
The ability to continually improve on processes, tools, documentation and their own skill-level

Our ideal candidate is...

A true problem-solver and self-starter
Knowledgeable about the software development lifecycle (from UX / UI design to deployment)
Knowledgeable about project and defect management tooling and SaaS platforms, specifically ClickUp (or something similar like Jira)
Patient, collaborative and transparent
Respectful towards their peers
Well-spoken and articulate
Pragmatic and logical
A team player, and understands team dynamics
Fantastic at prioritisation
Trustworthy (trust is a major part of this role; it comes with the territory)

Our ideal candidate has most of these SaaS platforms and tools under their belt (or at the very least a solid knowledge of them)

GitLab (or GitHub / Bitbucket)
ClickUp (or Jira)
Google Workspace
Docker
GCP, AWS, Azure
Terraform & Kubernetes
Ansible, Chef or Puppet

Responsibilities of the role

Availability of applications (Kubernetes) and Cloud infrastructure
Latency and networking of Cloud infrastructure and applications (Kubernetes)
Performance of Cloud infrastructure and applications, performance testing, auto-scaling setups
Efficiency in automating processes and creating reusable code libraries for IaC (Infrastructure as Code)
Change management using Infrastructure as Code, Git, GitLab, HCL, YAML
Monitoring of Cloud infrastructure and applications
Identifying, handling and responding to emergency issues, and drafting security response plans
Capacity planning of applications and Cloud infrastructure, clusters, VMs, storage
High-security practices and enforcement (HashiCorp Vault/Boundary), vulnerability scanning and reports, orchestrating penetration testing of applications and infrastructure
Building, scaling and maintaining the infrastructure for various projects
Develop, maintain, and configure software to automate processes and improve efficiency
Testing and optimising systems to create stable, operational environments
Perform code reviews, maintain and improve code quality
Compose and maintain documentation of our infrastructure and tooling
Collaborate with cross-functional teams to define, ship and scale new features

Our expectations

Above all else, we value an attitude of lifelong self-learning. We are a team of people who keep up to date and continue to educate ourselves through research, mentoring and discussions
An attitude of openness to keep learning is more important to us than fancy qualifications
We are looking for highly motivated individuals who are willing to be part of a growing company. You must display a continuous willingness to learn and grow as a team player, and adaptability and flexibility in terms of tech stacks used
We expect you to take full ownership of your work, and to be a reliable team member especially when production issues arise and need to be tackled quickly
We take the time to put good structures, apps and tools in place to make work-life as easy as possible at IO, but your teams will still rely on you to display coping skills when it comes to complexity and real-world deadlines

Why join us?

We are a very close-knit, supportive and kind team
We are proud of what we build, and we believe in our products
We build great products and startups with a fantastic and highly skilled team, focused on standards, quality and efficiency
We stay ahead of the curve
We are remote and flexible
We believe in continuous improvement
We don't micro-manage
We have a flat but mutually respectful structure
We want to assist you to grow at all times

Our work principles
At IO we pride ourselves on our ability to stay current while adopting modern methodologies to achieve the best possible results for our clients and teams.

Principle #1
Customer Value = Business Value
We are as invested in the success of our clients' products as they are themselves. Working together towards a shared goal delivers value and meaning far beyond what is simply represented on an invoice.

Principle #2
Work in short cycles
Short work cycles allow us to quickly learn from our actions and make evidence-based decisions.

Principle #3
Hold regular open retrospectives
We use regular retrospectives to look back at our work as a team and improve our performance and work relationships swiftly.

Principle #4
Go and See
We observe, learn and share. We amplify good patterns and successes and make them part of our daily discussions.

Principle #5
Fast and Flexible
We quickly identify what we need to know, progressing to research and validating critical assumptions when and where it adds the most value.

Principle #6
Work as a modern, balanced team
We have a modern staffing model. Working with our dedicated cross-functional teams, we work on the same things at the same time. We empower and trust our teams to work with autonomy.

Principle #7
Radical Transparency
To produce the best possible work, everyone needs to be on the same page, at all times.

Principle #8
Celebrate Achievements
We provide regular incentives and celebrate exceptional work.

Principle #9
Make learning a first-class citizen of your backlog
Learning is part of our product development process; we document learnings and incorporate them into future developments.

Occupation:

IT, computing jobs