Description
217713-en_US
About the role
Computacenter is looking for an AI Principal Consultant to join its professional services team. The ideal candidate has excellent interpersonal skills and works well on a dynamic, customer-focused team.
In this role, you will work with customers, partners, and internal teams to analyse, define, and implement large-scale networking projects. The scope of this work includes a combination of data-center-scale networking, system design, and automation.
Annual compensation: $175K - $220K USD
What you'll be doing
- Partner with business leaders to deliver services that support company objectives and that are consistent with Winning Together values.
- Design, architect, and implement distributed InfiniBand networks for high-performance computing (HPC) and data center environments.
- Provide Ethernet and routing expertise to customers during project delivery to design, architect, and test Ethernet networking solutions.
- Design, implement, and optimize high-performance fabric architectures for our data center and infrastructure projects.
- Support the operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
- Focus primarily on understanding the AI workload and how it interacts with other parts of the system, such as networking, storage, deep learning frameworks, and data cleaning tools.
- Work on multi-functional teams to provide Ethernet network expertise to server infrastructure builds, accelerated computing workloads, and GPU-enabled AI applications.
- Carry out network configuration and validation tasks for data centers.
- Create methods of procedure and deployment documents.
- Embrace and support Computacenter's mission and core values.
What you have
- Bachelor's degree in Information Technology, Engineering, or related field (or equivalent experience)
- Strong understanding of NVIDIA technologies including DGX Cloud, NVIDIA AI Enterprise software, Base Command Manager, NeMo, and NVIDIA Inference Microservices.
- Deep understanding of Kubernetes‑based GPU scheduling, GPU virtualization concepts (fractional GPUs, MIG awareness), and policy‑driven resource allocation in multi‑tenant clusters.
- Experience optimizing cluster‑level GPU utilization, workload throughput, and job latency using Run:AI in conjunction with NVIDIA GPU platforms.
- Strong hands-on routing experience, including BGP, VXLAN, and EVPN.
- Knowledge of cluster management technologies and Base Command Manager (BCM).
- Legally eligible to work in the United States
- Experience with IT service delivery lifecycle and methodologies.
- Demonstrated experience designing, deploying, or operating Run:AI–based GPU orchestration platforms in production environments.
- Ability to design in-depth, complex technical solutions.
- In-depth knowledge of IT Infrastructure technology and its business application.
- Excellent communication and presentation skills, with the ability to present to large internal and external audiences and at board level.