High-Performance Computing (HPC) Systems Administrator
Brigham and Women's Hospital
Mass General Brigham relies on a wide range of professionals, including doctors, nurses, business people, tech experts, researchers, and systems analysts to advance our mission. As a not-for-profit, we support patient care, research, teaching, and community service, striving to provide exceptional care. We believe that high-performing teams drive groundbreaking medical discoveries and invite all applicants to join us and experience what it means to be part of Mass General Brigham.
Job Summary
The Martinos Center for Biomedical Imaging at Massachusetts General Hospital seeks a dedicated and highly motivated High-Performance Computing (HPC) Systems Administrator (Sysadmin) to oversee and optimize the center's HPC cluster, a core computational resource supporting cutting-edge biomedical and neuroimaging research. The HPC Sysadmin will play a critical role in maintaining and enhancing the cluster's performance, supporting researchers in their computational workflows, and ensuring the scalability and reliability of the system. This role is ideal for an individual with strong experience in HPC systems administration, an understanding of scientific computing needs, and the ability to work collaboratively with researchers from diverse disciplines.
Work Environment
This position is based at the Martinos Center for Biomedical Imaging in the Charlestown Navy Yard. This position offers a hybrid work environment, allowing for a combination of remote work and on-site responsibilities. The candidate must be located within a commutable distance to Charlestown, MA, and be available to attend regular in-person meetings with the Center’s Faculty and Leadership.
Why Join Us?
• Work in a multidisciplinary environment supporting groundbreaking research in computational methods, neuroscience, cancer, and cardiovascular health.
• Operate a state-of-the-art HPC cluster in collaboration with world-class researchers and scientists.
• Be part of a team dedicated to pushing the boundaries of technology in biomedical imaging.
Qualifications
Education
Bachelor's degree in a related field of study required
Licenses and Credentials
Class D passenger vehicle driver's license (state license) preferred
Experience
2-3 years of experience in systems/applications administration required
Key Responsibilities
- Cluster Management:
  - Oversee the day-to-day operations, maintenance, and optimization of the Martinos Center's HPC cluster, ensuring high availability, reliability, and performance.
  - Perform hardware and software upgrades, patching, and troubleshooting of HPC nodes, storage, and networking.
- User Support:
  - Provide technical support and guidance to researchers and staff using the HPC cluster for computational tasks, such as neuroimaging, machine learning, and data analysis.
  - Assist users with job scheduling, resource allocation, and troubleshooting.
- System Monitoring and Performance Optimization:
  - Develop and implement robust monitoring tools to track resource utilization and identify performance bottlenecks.
  - Analyze workloads and provide recommendations for optimization of computational workflows.
- Collaboration and Training:
  - Collaborate with researchers to understand their computational needs and assist in designing tailored HPC solutions for their projects.
  - Develop training materials and lead workshops to educate researchers on best practices for using the cluster.
Qualifications
Required:
- Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field.
- 3+ years of experience in HPC systems administration or equivalent.
- Strong expertise in Linux systems administration (e.g., CentOS, RHEL, Ubuntu) in an HPC environment.
- Experience with job scheduling using Slurm.
- Proficiency in HPC-related programming and scripting languages (e.g., Bash, Python, Perl).
- Familiarity with parallel computing, distributed systems, and scientific computing frameworks.
- Hands-on experience with storage systems, networking, and security in an HPC environment.
- Excellent interpersonal and communication skills for working with researchers and non-technical staff, and previous experience working with researchers.
- Demonstrated ability to adapt to changing technologies, workflows, and priorities in a dynamic research environment.
- Strong organizational and time-management skills to efficiently manage multiple concurrent projects and tasks.
Preferred:
- Advanced degree in Computer Science, Engineering, or a related field.
- Knowledge of biomedical or neuroimaging applications and related software (e.g., FreeSurfer, FSL, SPM, ANTs, MATLAB).
- Experience with machine learning workflows and GPU-based computing (e.g., PyTorch, CUDA, TensorFlow).
- Familiarity with data-intensive workflows and large-scale storage systems.
Additional Job Details (if applicable)
Pay Range
$63,648.00 - $90,750.40/Annual
Grade
6
Mass General Brigham Competency Framework
At Mass General Brigham, our competency framework defines what effective leadership “looks like” by specifying which behaviors are most critical for successful performance at each job level. The framework comprises ten competencies (half People-Focused, half Performance-Focused), each defined by observable and measurable skills and behaviors that contribute to workplace effectiveness and career success. These competencies are used to evaluate performance, make hiring decisions, identify development needs, mobilize employees across our system, and establish a strong talent pipeline.