HPC Systems Administrator Job at Northeastern University
About the Opportunity
Job Summary
The Research Computing (RC) systems team at Northeastern University (NU) is seeking a talented individual to fill the role of High-Performance Computing (HPC) Systems Administrator. This critical role will help operate and maintain cutting-edge technologies in support of the university's research efforts and contribute to NURC’s mission to assist NU researchers in taking full advantage of the HPC resources located at the Massachusetts Green High Performance Computing Center (MGHPCC).
Reporting to the Senior HPC Systems Administrator, the successful candidate will support the ongoing operations of all NURC HPC systems – in particular NURC's flagship HPC cluster – to maximize system uptime and ensure that research needs are met. The successful candidate will also help design and integrate new technology solutions to support research, teaching, and learning as they apply to RC systems and services.
Minimum Qualifications
Requirements
- Minimum of 3 years post-secondary education or relevant work experience.
- At least one year of experience in a combination of: building, configuration, and administration of large Linux clusters (e.g. storage, cluster computing, network, database, virtualized systems).
- Experience with configuration management (e.g. Ansible) and version control (Git).
- Experience diagnosing system and application software problems.
- Knowledge of Linux kernel internals and kernel modules.
- Knowledge of or experience in networking systems, including DNS, HTTP, and TCP/IP.
- Familiarity with cluster configuration and management tools (e.g. Torque, SLURM, OGE).
- Demonstrated experience working in an environment with rapidly changing underlying technologies and job priorities.
- Knowledge of, or experience administering, computer security software and hardware requirements.
- Demonstrated team performance skills, service mindset approach, and the ability to act as a trusted collaborator.
- Demonstrated strong writing skills with an ability to document and communicate solutions to users and team members clearly.
- Ability and willingness to learn new technologies and remain current in developing trends in the HPC community.
Preferred
- Experience with HPC systems, in particular HPC clusters.
- Experience with a parallel file system, e.g. GPFS.
- Experience with compilers, e.g. C/C++.
- Experience with parallel computing software (MPI, OpenMP).
- Experience with scripting languages, e.g. Bash, Python, Perl.
- Experience working with Agile methodologies.
- Experience with virtualization tools, container development and deployment/orchestration, e.g. Docker, Kubernetes, Terraform, Vagrant.
- Experience with automating IT infrastructure provisioning, Infrastructure as Code (IaC).
Key Responsibilities & Accountabilities
- Help administer the RC HPC cluster, storage systems and other RC infrastructure, including hardware maintenance.
- Diagnose problems with the HPC cluster and implement solutions, which may include hardware repairs (break/fix), operating system configuration, system software updates, and procedure automation.
- Proactively monitor and maintain the health and integrity of the RC systems including upgrading and patching.
- Use and develop additional monitoring scripts and/or platforms as needed.
- Take part in collaborative efforts defining and tracking performance metrics to ensure efficient current and future use of RC resources.
- Assist end-users through the RC’s ticket queue system.
- Assist the RC systems team with network hardware and network service maintenance and configuration.
- Communicate progress and participate in reviews with the Senior HPC Systems Administrator, technical staff and senior management.
- Work in collaboration with RC’s Documentation Specialist to create new or update existing internal documentation in support of the RC HPC infrastructure.
- Build and maintain relationships with external vendor technicians, engineers and support teams.
- Participate in external collaborations (locally/regionally) such as NESE, NERC, MOC, etc.
- Attend conferences and workshops relevant to HPC technologies to advance skills.
- Participate in regional/national/international collaborations to advance skills and expand the NU RC solution/service catalog.
Position Type
Information Technology
Additional Information
Northeastern University considers factors such as candidate work experience, education and skills when extending an offer.
Northeastern has a comprehensive benefits package for benefit-eligible employees. This includes medical, vision, dental, paid time off, tuition assistance, wellness & life, retirement, as well as commuting & transportation. Visit https://hr.northeastern.edu/benefits/ for more information.
Northeastern University is an equal opportunity employer, seeking to recruit and support a broadly diverse community of faculty and staff. Northeastern values and celebrates diversity in all its forms and strives to foster an inclusive culture built on respect that affirms inter-group relations and builds cohesion.
All qualified applicants are encouraged to apply and will receive consideration for employment without regard to race, religion, color, national origin, age, sex, sexual orientation, disability status, or any other characteristic protected by applicable law.
To learn more about Northeastern University’s commitment to and support of diversity and inclusion, please see www.northeastern.edu/diversity.