HPC System Administrator

APAR TECHNOLOGIES PTE. LTD.
Job Description
We are seeking a skilled HPC System Administrator to manage and maintain high-performance computing (HPC) systems. The ideal candidate will be responsible for system administration, user support, software integration, and collaboration with research teams to optimize computational workflows.
Key Responsibilities:
1. HPC System Management and Maintenance
• Install, configure, integrate, and maintain high-performance compute clusters and associated hardware
• Monitor system performance, troubleshoot issues, and ensure security compliance
• Process and document change management procedures
2. User Support and Consultation
• Assist users with computational jobs and optimize workflows for efficient resource utilization
• Provide training sessions and resolve user issues related to HPC environments
3. Software and Application Support
• Install, configure, and maintain scientific and engineering HPC software solutions
• Support software development for parallel computing and performance optimization
4. Collaboration with Research Teams
• Understand research project requirements and recommend appropriate HPC solutions
• Assist in designing and optimizing computational workflows for researchers
5. Resource Allocation and Scheduling
• Manage resource allocation and job scheduling within the HPC environment
• Implement policies for job queuing, resource limits, and workload balancing
• Enforce operational best practices and implementation plans
6. System and Network Optimization
• Configure and maintain high-speed networks for optimal data transfer within the HPC infrastructure
• Conduct performance benchmarking and optimization efforts
7. Documentation and Reporting
• Maintain detailed system documentation, configuration guides, and user manuals
• Generate reports on system performance, resource utilization, and operational efficiency
Qualifications and Skills:
• Strong experience with HPC system administration, Linux-based environments, and cluster management tools.
• Proficiency in job scheduling and resource management frameworks (e.g., Slurm, PBS, Grid Engine).
• Hands-on experience with networking protocols, security policies, and data transfer optimizations.
• Familiarity with scientific computing software and parallel programming techniques.
• Ability to troubleshoot complex system and application issues effectively.
• Strong communication skills to collaborate with researchers and support teams.
EA Number: 11C4879
JOB SUMMARY
Singapore
Full-time