Role

Reporting to the Head of Engineering, this individual will work closely with the QA and Customer Success teams to ensure that the product is delivered to customers with high quality and remains securely available at all times.

  • Design and maintain fault-tolerant, scalable, and reliable systems to meet uptime goals.
  • Implement and manage monitoring, logging, and alerting systems.
  • Optimize infrastructure and application performance on cloud platforms (AWS, Azure, GCP).
  • Build and maintain CI/CD pipelines for automated deployments using tools like Jenkins or GitLab CI.
  • Automate operational tasks using scripting (Python, Bash) and infrastructure-as-code tools (Terraform, Ansible).
  • Collaborate with development and operations teams to improve reliability and resolve incidents.
  • Participate in on-call rotations to maintain 24x7 system availability, responding promptly to production issues and resolving them.

Required Skills

  • Platforms: Strong experience in Linux/Unix system administration and cloud platforms (AWS, Azure, or GCP).
  • Tools: Proficiency in containerization and orchestration tools (Docker, Kubernetes).
  • Network and security: Hands-on experience with networking concepts (DNS, HTTP, load balancing) and security best practices.
  • Scripting: Advanced scripting skills (Python, Groovy) and familiarity with at least one programming language (e.g., Go, Java).
  • Monitoring and log analysis: Expertise in monitoring and observability tools (e.g., Datadog, Nagios, or Splunk).
  • Deployment: Solid understanding of CI/CD principles and tools (e.g., Jenkins, Git).
  • Mindset: Must be driven to deliver a robust, reliable, secure environment that is always available for a customer.

Qualifications

  • 2-4 years of experience 
  • Show us your work to date

Apply for position

Send your resume to: careers@devicethread.com