UT Dallas Job - 49099908 | CareerArc
Company: UT Dallas
Location: Dallas, TX
Career Level: Mid-Senior Level
Industries: Government, Nonprofit, Education

Description

Reporting to the Center for Vital Longevity's Associate Director, this position is a systems engineer with a background in High Performance Computing (HPC) environments. To collaborate with and support center personnel, the engineer must have demonstrated a consultative customer service attitude in prior roles in similar organizations. Primary responsibilities include building and running the underlying storage and high-performance computer systems that form the foundation of HPC services. The engineer ensures systems are secure, available, and reliable, and must have broad industry knowledge of the hardware and software services involved in building and operating HPC environments.

This position is responsible for provisioning, deploying, administering, monitoring, maintaining, troubleshooting, upgrading, and patching high performance computing (HPC) resources and related research services supporting the operations of the Center for Vital Longevity (CVL). The engineer will collaborate closely with the HPC Operations team at The University of Texas at Dallas. The engineer will demonstrate a customer service mindset and adapt with agility to different work styles. They will interact with employees and stakeholders at all levels in a positive, productive, and appropriate manner. The applicant must be self-motivated to stay abreast of applicable new technologies and technical methodologies to advance their productivity and career path. The engineer will produce effective, timely results both on individual projects and in team projects of all sizes. The engineer will have a good understanding of HPC solutions, architecture, and Linux administration. The engineer will also be responsible for documenting processes, procedures, system configurations, and services, and for placing configuration information in our configuration management systems.

Minimum Education and Experience

No degree with six years of recent applicable experience
Associate degree with four years of applicable experience
Bachelor's degree with two years of applicable experience

Preferred Education and Experience

  • Master's degree in Computer Science or a closely related field, with four years of experience in corresponding research services, support efforts, products, and technologies.
  • Current knowledge of HPC best practices and of systems deployment and maintenance.
  • Troubleshooting methodology and awareness of industry standards.
  • Excellent interpersonal, written, and verbal communication skills.
  • Strong technical documentation, architecture diagramming, and organizational skills.
  • Good understanding of data center operations including fundamentals of networking and electrical power.
  • Excellent Linux administration and networking skills.
  • Experience supporting on-premises and code storage platforms, administering operating systems (multiple Linux versions), applying security policies to platforms, and integrating new hardware into an HPC framework.
  • Experience with at least one high performance cluster operating system, such as OpenHPC, ROCKS, or Bright/NVIDIA Cluster Manager.
  • Experience with large-scale, high performance parallel file storage systems such as MooseFS, WEKA, VAST, GPFS, BeeGFS, and Ceph.
  • Experience supporting and operating 1 Gbps to 100 Gbps Ethernet and 56 Gbps to 200 Gbps InfiniBand HPC network interconnects.
  • Familiarity with open source and commercial research-related software such as Python, R, MATLAB, MathWorks, Julia, Ansys, and the Intel, NVIDIA CUDA, and GCC compilers.
  • Experience with related DevOps tools such as GitHub, GitLab, and Ansible, and with package management tools for building rpm and/or deb packages.
  • Experience with the Slurm job scheduler.
  • Familiarity with national level academic HPC resources such as those found at TACC, SDSC, NCSA and PSC.
  • Familiarity with national level HPC Research Computing organizations such as XSEDE, ACCESS et al.


Essential Duties and Responsibilities

Expected areas of expertise and duties will include proficiency in the following:
  • Able to consolidate design specifications, independently and collaboratively, and to develop and manage stable, best-practice, environment-specific, enterprise-class solutions. Able to manage projects and move them forward.
  • Respond to user tickets from faculty, staff, and students. Level 3 support experience is expected (on a scale of 1 to 3, with 3 being a senior specialist).
  • Act as a role model in demonstrating integrity and ethical behavior in working with confidential and university information. 
  • Assist in the development and implementation of internal policies, rules, and operating procedures for Research Computing and Cyberinfrastructure to satisfy assurance models such as NIST 800-53 and NIST 800-171, under which assured research is conducted.
  • Perform annual updates and expert-level software coding in at least two languages (preferably Python, Linux shell, etc.).
  • Support network technologies (routing, switching, firewalls, etc.) in the HPC environment.
  • Independent installation, configuration, updating, networking, performance monitoring and troubleshooting of HPC Systems.
  • Ability to develop, troubleshoot, modify, catalog, document, and update scripts.
  • Ability to package scientific software into RPMs and integrate with Lmod—so users can 
  • Able to compile, test and install many related open source scientific software packages as requested by research faculty, staff and students.
  • Supervise and guide Software Systems Specialist I/II based at the CVL and promote their professional development through training and special projects.


Additional Information

  • On-call availability is necessary for quickly responding to and resolving system emergencies, both during regular hours and during off-hours.
  • Emergency on-call rotation availability for 24×7×365 coverage.
  • Hybrid remote work may be available after six months of continuous employment. The remote work schedule at CVL is 3 days in office/2 days remote. Employees must be flexible and adjust this schedule as needed with 24 hours' notice.
  • Sitting for extended periods of time. Dexterity of hands and fingers to operate a computer keyboard, mouse, power tools, and to handle other computer components. Lifting and transporting of moderately heavy objects, such as servers, switches, computers, and peripherals.
  • Work as part of the CVL administrative support team.
  • Visa sponsorship is not available.


Important Message

1) All employees serve as a representative of the University and are expected to display respect, civility, professional courtesy, consideration of others and discretion in all interactions with members of the UT Dallas community and the general public.

2) The University of Texas at Dallas is committed to providing an educational, living, and working environment that is welcoming, respectful, and inclusive of all members of the university community. UT Dallas does not discriminate on the basis of race, color, religion, sex (including pregnancy), sexual orientation, gender identity, gender expression, age, national origin, disability, genetic information, or veteran status in its services, programs, activities, employment, and education, including in admission and enrollment. EOE, including disability/veterans. The University is committed to providing access, equal opportunity, and reasonable accommodation for individuals with disabilities. To request reasonable accommodation in the employment application and interview process, contact the ADA Coordinator. For inquiries regarding nondiscrimination policies, contact the Title IX Coordinator.

