Chief HPC Engineer

EPAM Systems • Barra do Garças, Brazil

Company

EPAM Systems

Location

Barra do Garças, Brazil

Type

Full Time

Job Description

We are seeking an experienced Chief HPC Engineer to manage daily operations and engineering activities in our HPC environment.
The ideal candidate has a strong engineering background and substantial expertise in setting up and improving HPC infrastructure. This role involves collaborating with our L3 HPC infrastructure engineering team to enable our scientific research team to use an HPC cluster. Priority will be given to candidates residing in India, though the position is open to candidates from any location.


Responsibilities

  • Maintenance and support of the HPC infrastructure
  • Implementation of infrastructure automation through IaC (Infrastructure as Code)
  • Participation in software and hardware upgrades while resolving incidents
  • Management of job scheduling and resource distribution with HPC job schedulers
  • Configuration and installation of Bright Cluster Manager
  • Optimization and maintenance of GPFS/Lustre file systems
  • Supervision of InfiniBand/OmniPath network interconnect configurations

Requirements

  • 10+ years as a general technical expert in HPC
  • Background in engineering or HPC system development
  • Experience in configuring and supporting HPC infrastructure
  • Proficiency in Linux (any RPM-based distribution), including kernel module compilation and debugging tools such as strace, coredump, and tcpdump
  • Skills in managing HPC job schedulers including IBM LSF and Slurm
  • Competency in configuring and installing Bright Cluster Manager
  • Familiarity with GPFS and Lustre file systems
  • Understanding of InfiniBand and OmniPath network interconnect technologies

Nice to have

  • Understanding of hardware diagnostics, upgrades, and tuning, including InfiniBand HCAs and disk arrays from Lustre, Vast, and IBM
  • Skills in infrastructure monitoring using Zabbix, Splunk, or Grafana
  • Familiarity with EasyBuild
  • Experience in a GxP environment
  • Capability to use Jira and ServiceNow

Date Posted

01/24/2025

