devicethread® | Transforming Smart Hospitality

Role

Reporting to the Head of Engineering, this individual will work closely with the QA and Customer Success teams to ensure that the product is delivered to customers successfully and with high quality, and that it remains secure and always available.

Design and maintain fault-tolerant, scalable, and reliable systems to meet uptime goals.
Implement and manage monitoring, logging, and alerting systems.
Optimize infrastructure and application performance on cloud platforms (AWS, Azure, GCP).
Build and maintain CI/CD pipelines for automated deployments using tools like Jenkins or GitLab CI.
Automate operational tasks using scripting (Python, Bash) and infrastructure-as-code tools (Terraform, Ansible).
Collaborate with development and operations teams to improve reliability and resolve incidents.
Participate in on-call rotations to maintain 24x7 system availability, promptly responding to and resolving production issues.

Required Skills

Platforms: Strong experience in Linux/Unix system administration and cloud platforms (AWS, Azure, or GCP).
Tools: Proficiency in containerization and orchestration tools (Docker, Kubernetes).
Network and security: Hands-on experience with networking concepts (DNS, HTTP, load balancing) and security best practices.
Scripting: Advanced scripting skills (Python, Groovy) and familiarity with at least one programming language (e.g., Go, Java).
Monitoring and log analysis: Expertise in monitoring and observability tools (e.g., Datadog, Nagios, or Splunk).
Deployment: Solid understanding of CI/CD principles and tools (e.g., Jenkins, Git).
Mindset: Must be driven to deliver a robust, reliable, and secure environment that is always available to customers.

Qualifications

2-4 years of experience
Show us your work to date

Junior Site Reliability Engineer

Location: Hyderabad, India | In-office | Full-time

This role requires working across shifts, including night and holiday shifts.