IT - InfiniBand GPU, Sr Systems Engineer (Atlanta, Georgia)

Cadence Remote

Company

Cadence

Location

Remote

Type

Full Time

Job Description

At Cadence, we hire and develop leaders and innovators who want to make an impact on the world of technology.

Responsibilities:

  • Responsible for assisting with all projects and repairs throughout the data center.
  • Participate in an on-call rotation and provide hands-on coverage during maintenance.
  • Direct and perform tasks related to solving operational issues within the data center.
  • Analyze and design operations that will improve workflow, handle equipment layout, and help ensure accident prevention.
  • Support operations, including the physical layout of equipment.
  • Support customer deployments and ensure on-time bring-up of GPU servers.
  • InfiniBand fabric bring-up, configuration, and subnet management on the IB switch.

  • Document existing operational processes and equipment.
  • Utilize a framework for monitoring tools, escalating key issues, and ensuring timely service implementation.
  • Diagnose, troubleshoot, install, and repair all software, hardware, and components.
  • Install, perform basic configuration of, and troubleshoot networking equipment (routers and switches).
  • Good understanding of the OSI model and TCP/IP protocol suite (IP, ARP, ICMP, TCP, UDP, SMTP, FTP, TFTP).
  • Configure terminal servers for out-of-band management.
  • Manage daily issues, including daily health checks of servers and processes, working closely with end-users, development teams, and Infrastructure teams to prioritize, resolve, and mitigate outages.
  • Server installation and maintenance (rack and stack, label, HDD, memory, CPU, RAID batteries, NICs, etc.)
  • Review design documentation and validate equipment deployment according to plans.
  • Network installation and maintenance (rack and stack, label, cabling, parts replacement, etc.)
  • Perform site builds and refreshes while meeting current quality standards.
  • Interact with onsite staff and vendors for hardware replacement, delivery, and diagnostics.
  • Perform operational tasks associated with data center implementation, migration, deployments, cabling, and rack and stack.

Requirements:

  • Experience with cluster bring-up, including driver loading.
  • Experience with end-to-end GPU testing in a cluster with InfiniBand.
  • Experience with setup of GPU servers in a cluster.
  • Experience in Linux environments and proficiency in tasks such as shell scripting.
  • Excellent data center organization skills and meticulous attention to detail.
  • Familiarity with fiber and copper network cabling, including IP and SAN deployments.
  • Responsible for maintaining acceptable ticket loads and incident SLAs.
  • Follow documented escalation procedures.
  • Sync with global teams on various tasks and upcoming initiatives.
  • Understand and adhere to documented policies, processes, and procedures.
  • Assist with process improvement initiatives and documentation of policies, processes, and procedures, including runbooks.
  • Able to move 50+ pounds.

We're doing work that matters. Help us solve what others can't.

Date Posted

01/24/2025
