HPC Systems Administrator

Prestige Staffing | Information Technology | Dallas, TX (Onsite) | Contractor
JobID: 48998

HPC Systems Administrator | Job Description

Overview

Compensation: $60 - $80 per hour

This individual's primary goal will be to ensure a smooth experience for users of a hybrid on-premises/cloud HPC system through onboarding, documentation, and day-to-day support. This individual will also participate in the evaluation of the latest hardware technologies and cluster management tools, make recommendations, support any hardware upgrades, and monitor the system for performance and security.

Key Responsibilities: 

  • Oversee smooth operation of the HPC cluster in support of various R&D initiatives. Perform installation, testing, maintenance, upgrades and administration of operating system and application software

  • Fine-tune system configuration for reliability and performance. Perform file management and administration tasks; troubleshoot problems; ensure the system remains operational; and assist with user onboarding and access to the system

  • Monitor systems for performance and security; analyze malfunctions; troubleshoot and resolve problems in response to system performance or security issues

  • Implement system policies that comply with relevant company policies and standards; recommend policies where applicable

  • Research and recommend configurations for new systems based on vendor and industry trends and contacts. Maintain up-to-date knowledge of the HPC hardware and management tools

  • Perform account maintenance and user management activities

Required Education: 

  • Bachelor’s degree + 5 years of experience (or educational equivalent) in a relevant discipline

Preferred Education: 

  • Relevant certification/training in various cluster management tools

  • Relevant certification/training in various cluster provisioning tools

  • Relevant certification/training in high-performance networking systems such as InfiniBand

  • Relevant certification/training in various parallel file systems (e.g., Lustre)

  • Relevant certification/training in parallel programming (e.g., Open MPI)

Required Experience: 

  • Experience with Linux systems administration

  • Experience with network and security administration

  • Experience with cluster design and system tuning

  • Experience programming in modern languages and working with parallel application software, protocols, tools, and utilities

  • Understanding of Linux file systems (e.g., ext3)

  • Experience with hardware maintenance of HPC systems

  • Understanding of parallel file systems (e.g., Lustre)

  • Experience with HPC schedulers and workload managers (e.g., Slurm)

  • Ability to provide technical support to users

  • Excellent problem identification, troubleshooting, and system performance tuning skills

  • Excellent organizational and communication skills; ability to clearly communicate technical concepts to a non-technical audience

Preferred Experience: 

  • Management and design of HPC systems; fundamentals of InfiniBand networking

  • Cluster provisioning and configuration tools

  • Application parallelization (MPI and OpenMP); working knowledge of core programming languages (C, Python, Perl)

  • Working knowledge of parallel applications installation, debugging, and support

  • Experience with cloud HPC offerings from leading providers

  • Experience with cloud bursting


Job Snapshot

Employee Type: Contractor

Location: Dallas, TX (Onsite)

Experience: Not Specified

Date Posted: 04/01/2025
