
Description
The Technology Operations Center (TOC) Manager is responsible for the strategic leadership and performance of the TOC, including its processes, procedures, daily operations, team members, and observability platform(s), ensuring 24/7 monitoring and management of all technologies including, but not limited to, infrastructure, applications, and services. This role leads a team of operations analysts focused on proactive incident prevention, rapid incident response, service restoration, and continuous improvement. The TOC Manager also leads the observability team of engineering resource(s) and is accountable for observability strategy, platform ownership, and platform management. The TOC Manager will collaborate closely with cross-functional IT teams and key business stakeholders to ensure alignment, transparency, and timely communication across all operational activities.
Duties and Responsibilities
- Establish the TOC as a central function in incident response, observability, technology operations, and continuous service improvement
- Establish and foster an operational ecosystem where observability, incident, change, capacity, knowledge, and problem management function as a unified lifecycle
- Monitor the effectiveness of operational processes and workflows based on factors including, but not limited to, risks, adoption rates, efficiencies, outcomes, and service levels
- Ensure 24x7x365 proactive monitoring and operational support of critical systems, applications, and services, with a focus on early detection, automated response, and continuous service improvement
- Facilitate post-incident reviews for major incidents, ensuring clear root cause analysis, accountable follow-through, and integration of lessons learned into ongoing service improvement efforts
- Monitor and manage technology partners' performance related to support issues, monitoring, incident response, SLA adherence, and integration of partner platforms with ITSM, holding vendors accountable through data-driven evaluations and continuous improvement practices
- Foster strong collaboration across the IT Pillars, including infrastructure and operations, applications and systems, cybersecurity, data and analytics, and service management, to ensure shared visibility, aligned priorities, and coordinated responses that support end-to-end service health and resilience
- Contribute to the configuration and continuous improvement of core ITSM processes, ensuring they are effectively operationalized within the platform and integrated with TOC workflows
- Collaborate with other IT teams to reduce incident recurrence through thorough root cause analysis leading to permanent resolution of recurring issues
- Conduct after-action reviews (AARs) for major incidents and improvement initiatives
- Govern and enhance ITSM core processes, including but not limited to incident, request, problem, change, and asset management
- Lead and develop the TOC team with a focus on operational ownership, technical growth, and accountability, providing clear direction and performance coaching, and aligning the team's contributions with the evolving needs of a proactive, service-centric operations model
- Assist with the efforts to reduce tool sprawl by consolidating legacy ticketing systems into a unified ITSM platform
- Establish and monitor KPIs and SLAs to support visibility into operational effectiveness, including the ITSM core processes, resolution times, system availability and team output
- Integrate observability insights into ITSM reporting and CSI efforts to move from reactive troubleshooting to data-informed service improvements
- Develop and maintain operational runbooks and standard operating procedures, ensuring accuracy, version control, and use during live incidents
- Govern CMDB accuracy and ensure relationships between CIs and service impacts are maintained
- Design, implement, and manage observability platforms, defining telemetry standards, building meaningful dashboards, and establishing actionable alerting tied to service health and business impact
Scope
- Staff supervision and development: Yes
- Decision making: Drafts policy documents and resolves problems; Provides data for decision support; Provides consultation or expert advice; Participates in planning business objectives; Represents the company in handling complaints, disputes or resolving grievances
- Travel: Up to 5%
- Flex Designation: Anywhere
Requirements
Education and Experience
Education Requirements
- Bachelor's degree in computer science, information systems, or a related technical field, or equivalent experience in a technical operations leadership role within a Retail and/or Supply Chain environment
Years of Experience
- 5 years of experience in a Network or Technology Operations Center environment, supporting enterprise systems and services
Skills
- Availability: Nights and weekends as needed
- Remote work environment with occasional on-site presence required during events
- Maintain regular collaboration with cross-functional IT teams and key business stakeholders
- Drive improvement initiatives based on service performance analytics
- Hands-on understanding of telemetry, alerting, and monitoring technologies
- Hands-on experience with observability and IT operations tools such as Datadog, Splunk, Dynatrace, PagerDuty, and enterprise ITSM platforms
- Strong working knowledge of ITIL principles and other service management frameworks, with focus on operationalizing core ITSM processes
- Excellent communication and collaboration skills with the ability to partner effectively across technical and non-technical stakeholders
- Proficient in SQL and scripting to automate operational tasks and improve incident diagnostics, reporting, and service improvement efforts
Physical Requirements
General office environment requiring ability to:
- Stand, walk, and sit for extended periods of time.
- Speak and listen to others in person, over the phone, and by video conference.
- Use a keyboard and read from computer screens and reports.
- Lift up to 15 lbs.
Safety
- Must be able to perform this job safely in accordance with standard operating procedures and good manufacturing practices, without endangering the health or safety of self or others.