Company: SAIC
Location: Springfield, VA
Career Level: Mid-Senior Level
Industries: Technology, Software, IT, Electronics

Description

The Senior Monitoring Engineer in Springfield, VA is a senior technical expert accountable for advanced troubleshooting, performance analysis, and optimization of enterprise monitoring platforms. The position designs, implements, and continually enhances observability solutions across hybrid environments, including on-premises, cloud, and virtual infrastructure. The engineer serves as the final escalation point for complex monitoring issues, collaborates with other teams to ensure system reliability, and promotes observability best practices.

Key Responsibilities:

  • Serve as the Tier 3 escalation point for issues involving any of the monitoring and observability platforms and tools.
  • Lead root cause analysis (RCA) for major incidents and recurring performance issues.
  • Maintain, configure, and optimize monitoring tool deployments across cloud (e.g., AWS, Azure), on-premises, and VMware environments.
  • Design and implement custom dashboards, synthetic monitoring, and service-level objectives (SLOs).
  • Develop and maintain alerting strategies that reduce noise and ensure actionable notifications.
  • Work closely with application, infrastructure, DevOps, and security teams to define monitoring requirements and integrate observability into CI/CD pipelines.
  • Analyze metrics, logs, and traces to ensure end-to-end service visibility and performance optimization.
  • Assist in onboarding applications and teams into the observability platform.
  • Provide training and mentorship to Tier 1 and Tier 2 support teams.
  • Ensure platform resilience, availability, and compliance with internal standards and SLAs.
  • Participate in on-call rotations for high-priority incidents as needed.

Qualifications

Required Education & Experience:

  • BS and 9 years of experience, or MS and 7 years of experience; additional experience may be accepted in lieu of a degree.
  • 5+ years of experience in IT infrastructure, application performance monitoring, or site reliability engineering (SRE).
  • 2+ years of hands-on experience using Dynatrace and monitoring tools in VMware Cloud Foundation (VCF).
  • Solid understanding of observability concepts including metrics, logs, traces, and user experience monitoring.
  • Experience supporting complex, distributed systems in cloud and hybrid environments.
  • Proficient with scripting and automation (e.g., PowerShell, Python, Bash, or Ansible).
  • Strong understanding of networking, Linux/Windows systems, containers, and application architectures (microservices, APIs, etc.).
  • Experience curating and implementing dashboards.
  • Excellent troubleshooting and problem-solving skills.
  • Strong written and verbal communication.
  • Ability to work independently and collaboratively across teams.
  • Customer-focused mindset and attention to detail.
  • Continuous learning and adaptability in a fast-paced environment.

Required Clearance:

  • US Citizenship.
  • Active Top Secret security clearance.

Preferred Qualifications:

  • Dynatrace Associate or Professional Certification.
  • Experience with Dynatrace, including OneAgent deployment, Smartscape, PurePath, and Davis AI.
  • Experience with integration of Dynatrace with tools such as ServiceNow, Splunk, Jira, or CI/CD pipelines.
  • Experience with other observability tools (e.g., Prometheus, Grafana, New Relic, AppDynamics, Splunk, Elastic).
  • Familiarity with DevOps practices and Infrastructure-as-Code (e.g., Terraform).
  • Understanding of ITIL framework and change management processes.
  • Experience with other monitoring platforms such as Zabbix.