Designation: Software Engineer / Sr Software Engineer (Platform)
Experience: 5-10 yrs
The Senior Cloud Operations Engineer will be part of the Operations team, which has an exciting and challenging mission: to help Tally deliver the next-generation cloud platform and SaaS to millions of businesses for billions of transactions. This position offers significant growth opportunities and increased responsibility over time. Responsibilities include building, deploying, operating, monitoring and maintaining large-scale, multi-system SaaS environments in public and private clouds that are secure, stable, auto-scalable and always on.
What will you be doing?
- Systems engineering
- Work with product engineering teams to design, develop, operate and maintain secure, auto-scalable, highly available, vendor-agnostic cloud environments with the lowest TCO (total cost of ownership)
- Sizing and capacity planning
- Release management
- Automate deployment of great products and services in cloud environments
- Patch management – emergency and scheduled maintenance
- Operations for public (AWS/Azure/SoftLayer/NetMagic) and private cloud environments
- Capacity management
- Logging, monitoring and event management
- Change management
- Incident management and support
- Proactively maintain the overall health of all environments
- Understand the overall system design and ensure integrity is maintained
- Be available on-call as part of a team maintaining 24×7 availability of SaaS
- Execute procedures and activities in accordance with audit guidelines
- Implement and follow security policies
- Maintain up-to-date knowledge of industry trends, emerging technologies, and software development best practices
Who are we looking for?
- 5+ years of hands-on experience across a wide range of infrastructure technologies and operations, supporting large-scale, mission-critical IaaS and SaaS environments serving millions of users
- Bachelor’s degree in Information Technology, Computer Science or engineering (or equivalent experience)
- Expert level knowledge of Linux or Unix system administration
- Strong knowledge of compute, storage and network architecture
- Strong knowledge and expertise in virtualization and managing cloud based virtualized infrastructure
- Experience in rapid troubleshooting of HW, OS and App issues
- Skilled in at least one tool or technology in each of the following categories:
- Public clouds: AWS, Azure or SoftLayer
- Deployment & configuration management: Puppet, Chef, Salt or Ansible
- Scripting language: Shell, PHP, Python, Go or Perl
- Log management: Logstash, Syslog, ElasticSearch, etc.
- Monitoring: Ganglia, Graphite, Nagios or similar
- Highly driven, self-managed individual who demonstrates initiative and proactively seeks solutions to problems
- Well-developed problem analysis and critical thinking skills
- Passion for automation and efficient management of infrastructure
- Ability to understand technical issues and translate them into business and customer needs
- Outstanding interpersonal and communication skills
- Well organized, with strong attention to detail
- Should be able to work independently
- Container technology such as Docker
- Windows system administration
- SQL and NoSQL databases such as MySQL, MongoDB or HBase
- Experience with big data systems such as Hadoop or Cassandra
- Experience with load balancers, caching layers and real-time event processing
- Familiarity with encryption technologies, key management, etc.
- Server and storage hardware knowledge
- Capacity planning and performance tuning