High-Performance Network Engineer (Infiniband)

Aptly Technology

All India, Delhi • 2 months ago

Experience: 5 to 9 Yrs

Job Description

As an InfiniBand Engineer, your role involves designing, deploying, and supporting high-performance, low-latency network infrastructures. You will have hands-on experience with InfiniBand fabrics, data center networking, and large-scale distributed computing environments such as HPC, AI, and ML clusters.

**Key Responsibilities:**

- Design, implement, and manage large-scale InfiniBand (IB) fabrics in data center and HPC environments.
- Configure and troubleshoot InfiniBand switches and adapters (e.g., Mellanox / NVIDIA IB platforms).
- Perform fabric bring-up, subnet management (OpenSM), partitioning, and performance tuning.
- Monitor and optimize network performance, latency, throughput, and congestion control.
- Integrate InfiniBand with Ethernet-based networking environments.
- Support RDMA technologies (RoCE, iWARP) and GPUDirect environments.
- Collaborate with system, storage, and compute teams to support AI/ML and distributed workloads.
- Perform firmware upgrades, patching, and capacity planning.
- Troubleshoot Layer 2 / Layer 3 networking issues (BGP, OSPF, VLAN, VXLAN, etc.).
- Maintain documentation, network diagrams, and SOPs.

**Required Skills & Qualifications:**

- 5+ years of networking experience with solid fundamentals (TCP/IP, routing, switching).
- Hands-on experience with InfiniBand technologies (HDR/NDR preferred).
- Experience with NVIDIA / Mellanox Technologies switches and adapters.
- Strong understanding of RDMA, congestion control, QoS, and low-latency tuning.
- Experience with subnet managers (OpenSM) and fabric diagnostic tools.
- Solid understanding of BGP, OSPF, EVPN-VXLAN, MPLS (good to have).
- Experience in HPC, AI/ML cluster networking environments is highly preferred.
- Familiarity with Linux networking and troubleshooting tools.
- Experience with automation (Python, Ansible) is a plus.

**Preferred Qualifications:**

- Experience supporting large GPU clusters.
- Knowledge of storage networking (NVMe-oF, parallel file systems).
- Experience with monitoring tools and telemetry systems.
- Networking certifications (CCNP/CCIE or equivalent).

In addition to the above details, the company values individuals with strong analytical and troubleshooting skills, the ability to work in high-performance, mission-critical environments, excellent documentation and communication skills, and a proactive problem-solving mindset.

Posted on: March 7, 2026
