• Build, deploy, and manage the enterprise log search and analytics platforms (Splunk and Elastic) so that the legacy physical, virtual, and container infrastructure behind business-critical services receives reliable, high-quality, highly available logging services.
• Support periodic releases and upgrades of observability and infrastructure monitoring tools, environment creation, and performance tuning of large-scale Prometheus systems
• Serve in development, operations, and SRE roles for the internal observability systems in the Client's data centers worldwide, including cloud environments
• Lead the evaluation, selection, design, deployment, and advancement of the portfolio of tools used for infrastructure and service monitoring. Ensure these tools provide critical visibility into modern architectures built on technologies such as cloud and containers.
• Expand the scope and capabilities of the Enterprise Monitoring team with a top-down, service-driven focus. Ensure methodologies keep pace with the shifts and transformations taking place within IT.
• Ensure the monitoring team increases its use of automation and adopts a DevOps/SRE mindset
• 10+ years of experience with enterprise logging and monitoring tools, preferably including 5+ years supporting critical infrastructure built on Elasticsearch, Elastic Cloud Enterprise (ECE), Open Distro for Elasticsearch, and Splunk Enterprise
• Experience with designing and engineering solutions to monitor critical systems and container infrastructure across a wide array of technologies and platforms
• In-depth experience managing monitoring tools such as Prometheus, Grafana, Nagios, SCOM, Zabbix, Sysdig, and BMC PATROL, as well as commercial APM tools.
• Strong knowledge of open-source logging and monitoring tools.
• Experience with container logging and monitoring solutions.
• Experience with Windows and Linux operating system management and administration
• Familiarity with LAN/WAN technologies and a clear understanding of basic networking concepts and services
• Strong understanding of multi-tier application architectures and application runtime environments
• Experience with monitoring infrastructure in cloud platforms such as AWS and Azure is desired
• Knowledge of Python or other scripting languages, and of infrastructure automation technologies such as Ansible, is desired
• CKA (Certified Kubernetes Administrator) or CKAD (Certified Kubernetes Application Developer) certification is a plus