Company: AMD
Location: Beijing, Beijing, China
Career Level: Mid-Senior Level
Industries: Technology, Software, IT, Electronics

Description



WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives. 

AMD together we advance_



THE ROLE: 

You will be responsible for developing and optimizing deep learning operators (kernels) for high-performance training, as well as contributing to the design and implementation of large-scale training frameworks such as Megatron-LM.

 

THE PERSON: 

This position sits at the intersection of low-level performance engineering and high-level model training. It is ideal for candidates who are passionate about pushing the limits of training efficiency and scalability on modern hardware platforms (e.g., AMD GPUs).

 

KEY RESPONSIBILITIES: 

  • Design, implement, and optimize custom deep learning operators (CUDA/HIP/Triton); a minimal Triton sketch follows this list.

  • Profile and improve performance bottlenecks in end-to-end model training pipelines.

  • Contribute to distributed training frameworks such as Megatron-LM, DeepSpeed, or similar.

  • Ensure correctness, numerical stability, and scalability of both operators and frameworks.

  • Collaborate with model teams to support real-world training workloads (e.g., LLMs, MoE, multimodal models).

  • Work closely with compiler and runtime teams to integrate low-level improvements.
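
To make the operator work concrete, here is a minimal illustrative sketch of a fused elementwise kernel written in Triton (which targets both NVIDIA and AMD GPUs). This is hypothetical example code, not AMD's: it fuses an add with a ReLU so the intermediate tensor never round-trips through global memory.

import torch
import triton
import triton.language as tl

@triton.jit
def fused_add_relu_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # each program instance handles one BLOCK_SIZE-wide tile of the flat tensor
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    # add and ReLU fused into a single pass over memory
    tl.store(out_ptr + offsets, tl.maximum(x + y, 0.0), mask=mask)

def fused_add_relu(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_add_relu_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

Fusing the two operations saves one full read and one full write of the intermediate tensor; that memory-hierarchy reasoning is the heart of the kernel-fusion work described above.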

 

PREFERRED EXPERIENCE: 

  • Strong programming skills in C++ and Python.

  • Hands-on experience with GPU programming (CUDA or HIP), and familiarity with performance tuning tools (Nsight, rocprof, etc.).

  • Deep understanding of tensor computation, memory hierarchy, and kernel fusion techniques.

  • Experience contributing to open-source DL frameworks (e.g., PyTorch, Megatron-LM, DeepSpeed, Hugging Face Transformers).

  • Solid grasp of training large-scale models, including distributed training techniques (DDP, pipeline/model parallelism); a minimal DDP sketch follows this list.

  • Familiarity with PyTorch custom ops, FX, TorchScript, or Triton is a plus.

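For the distributed side, here is a minimal DDP sketch, likewise hypothetical rather than AMD code, assuming a torchrun launch with one process per GPU. On ROCm builds of PyTorch, the "nccl" backend name transparently uses RCCL, and HIP devices appear under torch.cuda.

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # maps to RCCL on ROCm builds
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    # placeholder model standing in for a real training workload
    model = torch.nn.Linear(1024, 1024).to(local_rank)
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(32, 1024, device=local_rank)
    model(x).square().mean().backward()  # DDP overlaps gradient all-reduce with backward
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, for example: torchrun --nproc_per_node=8 train.py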

 

ACADEMIC CREDENTIALS: 

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field.

 




Benefits offered are described in AMD benefits at a glance.

 

AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law.   We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.

