Site Reliability Engineer – Remote (Hourly)
Crossing Hurdles · Nigeria
Job description
About the role
We are looking for a skilled Site Reliability Engineer to join our remote team on an hourly contract basis. You will be responsible for ensuring the stability, performance, and scalability of containerized AI training environments.
Key responsibilities
- Deploy, monitor, and recover containerized AI training workloads.
- Troubleshoot infrastructure bottlenecks and resolve system failures in real time.
- Build and manage resilient systems to optimize stability and performance.
- Collaborate with engineering teams to improve CI/CD pipelines and automation.
- Manage filesystem structures, storage, and process scheduling in containerized environments.
- Replan dynamically when runtime issues arise.
- Document processes, solutions, and best practices.
Required profile
- Strong experience with terminal‑based system administration and troubleshooting.
- Ability to manage dynamic infrastructure recovery in high‑pressure scenarios.
- Excellent written and verbal communication skills.
Required skills
- Docker
- Kubernetes
- Python (scripting, automation, debugging)
- Bash
- Linux system administration
- CI/CD pipeline concepts
- Version control systems (e.g., Git)