Company: AMD
Location: Austin, TX
Career Level: Mid-Senior Level
Industries: Technology, Software, IT, Electronics

Description



WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. 

AMD together we advance_



THE TEAM:

AMD's Data Center GPU organization is at the forefront of innovation, transforming industries with powerful AI-based graphics processors. Our mission is to develop and deliver exceptional AI and GPU solutions that drive the next generation of computing—from cloud AI workloads and HPC to embedded and industrial systems. Join a team building world-class AI-powered products with exceptional people.

 

THE ROLE:

This role blends AI model operations with infrastructure-level orchestration to ensure smooth deployment, monitoring, and scaling of cutting-edge AI workloads. It requires hands-on experience with Python, Kubernetes (K8s), Slurm, OpenStack, and Ansible, along with the ability to support external clients in live troubleshooting sessions.

 

THE PERSON:

We are seeking an experienced AI Engineer with a strong background in deploying and validating Large Language Models (LLMs) across high-performance environments.

 

KEY RESPONSIBILITIES:

  • Deploy, run, and validate LLMs on both local and distributed compute clusters.
  • Develop Python-based automation scripts for data pipelines, validation workflows, and deployment.
  • Leverage Ansible (where applicable) to automate infrastructure configuration and software provisioning.
  • Manage and troubleshoot Kubernetes clusters and containerized environments (Docker), and support extensive K8s-based workflows.
  • Operate and optimize Grafana and Prometheus for metrics visualization and system observability.
  • Assist with application deployments and ensure reliable runtime across cloud and on-prem systems.
  • Support MPI-based workloads and optimize parallel execution (bonus).
  • Validate system compatibility and performance using frameworks like ROCm, RCCL, or similar.
  • Perform low-level cluster management tasks, including node health checks, job orchestration, and basic hardware monitoring.

PREFERRED QUALIFICATIONS:

  • Strong Python programming skills with experience in system orchestration and AI model integration.
  • Hands-on experience with LLMs, AI model fine-tuning, and inference pipelines.
  • Deep understanding of Kubernetes concepts including operators, Helm charts, service meshes, and cluster scaling.
  • Exposure to HPC environments, MPI job execution, and GPU compute clusters.
  • Familiarity with ROCm stack, RCCL, or equivalent AI/ML hardware acceleration frameworks.
  • Working knowledge of Ansible, Docker, Prometheus, and Grafana.
  • Prior experience with infrastructure for AI/ML workloads in a production or research setting.
  • Knowledge of Slurm or other HPC workload managers.
  • Experience contributing to open-source AI or orchestration frameworks.
  • Exposure to model monitoring, drift detection, and performance optimization strategies.

 

LOCATION:

Remote or Austin, TX

 




Benefits offered are described in AMD benefits at a glance.

 

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

