Senior DevOps/SRE Engineer (Azure)
Responsibilities
  • Develop systems and applications to the client's coding and quality standards
  • Provide technical leadership on large, complex projects covering the development, maintenance, and enhancement of business system applications
  • Improve the reliability, availability, and performance of highly scalable cloud systems through recovery automation
  • Develop high-level system narratives, storyboards, designs and user interface prototypes
  • Develop system test plans that meet software quality assurance (SQA) standards and validate achievement of business goals
  • Report project/task status weekly to the appropriate Manager, DevOps Engineering
  • Identify issues that require more attention and work to resolve them based on an understanding of the business problem being solved.
  • Bring together the appropriate resources to address technical issues.

Key Areas of Responsibility

  • Work with other members of your assigned value stream to ensure that in-scope applications/platforms meet performance and stability requirements. This includes managing major incidents through to mitigation/resolution.
  • Improve monitoring capabilities to reduce outage frequency and duration.
  • Conduct post-incident reviews and identify improvements using the right tooling and tech stack
  • Explore and research the latest SRE/DevOps technology and trends.
  • Perform all stages of the software development life cycle, self-manage activities on smaller projects and serve as technical lead on small, medium and large projects.
  • Perform post-incident reviews of all major incidents and determine action items required to avoid similar issues/minimize downtime for future incidents
  • Provide primary operational support and engineering for multiple large, distributed software applications
  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
  • Partner with Application Development to ensure that assigned applications/platforms have appropriate monitoring and metrics in place to appropriately measure performance and stability
  • Create sustainable systems and services through automation and uplifts
  • Balance feature development speed and reliability with well-defined service level objectives
  • Ensure that applications/platforms in the value stream are operationally ready for production. This includes an annual review of all SOPs/knowledge articles
  • Optimize and participate in on-call rotations & investigations related to the platform's availability
  • Participate in a 24x7 on-call rotation
  • Support incidents related to production systems
Requirements
  • 5 years of Site Reliability Engineering or related systems support experience
  • Undergraduate degree in Computer Science or equivalent working experience 
  • 4 years of direct experience programming in two of the following languages: Java/.Net, C#, Ruby, Python, PowerShell, Bash
  • 3 years of direct experience programming and/or supporting Terraform
  • 4 years of direct experience with APM and infrastructure monitoring tools such as SCCM, Splunk, Dynatrace, DataDog
  • 5 years of direct experience in two of the following data management and visualization tools: Grafana, Prometheus, Splunk, ELK, SquaredUp
  • Proven ability to solve new challenges and problems quickly and independently
  • Excellent written and verbal communication skills with the ability to communicate effectively with all stakeholders including senior leadership
  • Demonstrated ability to understand and articulate details and impacts of complex proposed solutions
  • Ability to debug, optimize code, and automate routine tasks
  • Familiarity with Windows and Linux operating systems and networking
  • Strong technical troubleshooting, diagnosing and problem-solving skills
  • A detail-oriented mindset that puts edge cases, failure modes, and behavioral patterns first
  • Experience in DevSecOps, including integrating code analysis and vulnerability scanning tools into the CI/CD pipeline, is a plus
  • Experience with containers and container orchestration (e.g., ECS, EKS, Kubernetes, and Docker Swarm) is a plus
  • Experience working in Azure, AWS, or Google Cloud is a plus
  • Experience working in an Agile Scrum environment
  • 3 years of direct experience with one of the following CI/CD pipelines and tools: Azure DevOps, Jenkins, Git, or Jira
  • 4 years of direct experience supporting and managing Azure PaaS or SaaS (strongly preferred) or other cloud providers such as AWS or Google, or relevant certifications
Location
Remote
Contacts
hrmarketing@intellias.com
We'll be glad to provide you with any additional information

Lviv | Kyiv | Kharkiv | Odesa | Ivano-Frankivsk | Krakow
