Cluster Maintenance Updates and News

Wulver Maintenance

Wulver Monthly Maintenance

Beginning Feb 1, 2024, ARCS HPC will institute a monthly maintenance downtime on all HPC systems on the second Tuesday of each month from 9 AM to 9 PM. Wulver and the associated GPFS storage will be taken out of service for maintenance, repairs, patches, and upgrades. During the maintenance downtime, logins will be disabled and users will not have access to their stored data in /project, /home, and /scratch. All jobs that do not end before 9 AM will be held by the scheduler until the downtime is complete and the systems are returned to service.

We anticipate that maintenance will be completed by the scheduled time. Occasionally, however, it may finish earlier than scheduled or extend into the following days. A notification will be sent to the user mailing list when the systems are returned to service or the maintenance window is extended. Additionally, users will see the cluster service information upon logging in to Wulver during maintenance. Please pay attention to the Message of the Day when logging in, as it will serve as a reminder of upcoming downtimes and other crucial cluster-related information. Users should take the maintenance window into account when scheduling jobs and planning to meet deadlines. Please do not contact the help desk or HPC staff, or open SNOW tickets, for access to the cluster or data during the maintenance downtime.
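As a planning aid, the schedule described above (second Tuesday of each month, starting at 9 AM) can be computed ahead of time. The following is a minimal Python sketch, not an official tool: the function names are hypothetical, times are assumed to be naive datetimes in the cluster's local time, and it does not account for windows that end early or are extended.

```python
from datetime import date, datetime, time, timedelta


def second_tuesday(year: int, month: int) -> date:
    """Return the date of the second Tuesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, so Tuesday is 1.
    days_to_first_tuesday = (1 - first.weekday()) % 7
    return first + timedelta(days=days_to_first_tuesday + 7)


def next_maintenance_start(now: datetime) -> datetime:
    """Return the start (9 AM) of the first maintenance window after `now`."""
    year, month = now.year, now.month
    while True:
        start = datetime.combine(second_tuesday(year, month), time(9, 0))
        if start > now:
            return start
        # This month's window has already started; try the next month.
        month += 1
        if month > 12:
            year, month = year + 1, 1


def finishes_before_maintenance(start: datetime, walltime: timedelta) -> bool:
    """True if a job starting at `start` ends before the next window opens."""
    return start + walltime < next_maintenance_start(start)


if __name__ == "__main__":
    now = datetime.now()
    print("Next maintenance window starts:", next_maintenance_start(now))
```

For example, under these assumptions a 24-hour job that starts at 8 AM on the Monday before a maintenance Tuesday would end before 9 AM, while a 26-hour job starting at the same time would not.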

Lochness Maintenance Updates

Lochness is Back Online!

Lochness is mostly back up after being moved to a new facility. The move required complete disassembly and reassembly of the entire cluster. Eight nodes are down because they were damaged in the move; repairs are forthcoming. InfiniBand network issues affect 50 nodes, which are in the "drain" state. Currently, 120 nodes are fully functional. You can use sinfo to see the exact states of the nodes accessible to you. Please email hpc@njit.edu for assistance.
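To get a quick per-state node count rather than reading the full sinfo listing, the output can be summarized with a short script. This is a rough sketch, assuming sinfo is on your PATH and using its --noheader and --format options with the %T (node state) and %D (node count) fields; the function names are hypothetical.

```python
import shutil
import subprocess
from collections import Counter


def count_nodes_by_state(sinfo_text: str) -> Counter:
    """Sum node counts per state from lines of the form '<state> <count>'."""
    counts = Counter()
    for line in sinfo_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip blank or unexpected lines
        state, nodes = parts
        # A trailing '*' marks non-responding nodes; fold it into the base state.
        counts[state.rstrip("*")] += int(nodes)
    return counts


def query_sinfo() -> str:
    """Run sinfo and return one '<state> <count>' line per partition/state."""
    result = subprocess.run(
        ["sinfo", "--noheader", "--format=%T %D"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    if shutil.which("sinfo"):
        for state, nodes in sorted(count_nodes_by_state(query_sinfo()).items()):
            print(f"{state}: {nodes}")
    else:
        print("sinfo not found; run this on a cluster login node.")
```

The same state can appear on several lines (once per partition), which is why the counts are summed. Note that a node belonging to more than one partition would be counted once per partition.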

Wulver Maintenance

GPFS Fileset Changes

Wulver will be out of service on Wed, Oct 18th, between 9:00 am and 11:00 am for updates and configuration changes. The maintenance will fix the stale file handle error on /scratch that occurs when accessing files from the login node.

Maintenance Plans

Recommendation:

  • Each fileset gets its own inode namespace
  • Fileset names to automatically inherit pool policies
  • Additional fileset settings for chmod to not conflict with ACLs

Migration Plan:

  • Create new filesets and link under /mmfs1/Scratch and /mmfs1/Project
    • New fileset names with sata1-project_xx and nvme1-scratch_xx (no bearing on FS path)
    • New filesets have their own inode spaces
  • Rsync data from old to new location
  • Job outage for final copy and change
  • Final rsyncs
    • Remove symlink for /mmfs1/scratch and create /mmfs1/scratch
    • Unlink/relink filesets in new location
    • Resolve any links remaining on nodes/images
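
The two-phase copy in the plan above (a bulk rsync while jobs still run, then a final rsync during the job outage) can be illustrated with a small command builder. This is only a sketch of the general technique, not the commands the HPC staff actually ran: the function name and the example paths are hypothetical, and the choice of rsync options (preserving hard links, ACLs, and extended attributes, and deleting stale files only in the final pass) is an assumption.

```python
from typing import List


def rsync_command(src: str, dst: str, final_pass: bool = False) -> List[str]:
    """Build an rsync argument list for copying one fileset's contents."""
    cmd = ["rsync", "--archive", "--hard-links", "--acls", "--xattrs"]
    if final_pass:
        # In the final pass, remove files from the destination that were
        # deleted from the source after the earlier bulk copy.
        cmd.append("--delete")
    # A trailing slash on the source copies its contents, not the directory.
    cmd += [src.rstrip("/") + "/", dst.rstrip("/") + "/"]
    return cmd


if __name__ == "__main__":
    # Hypothetical placeholder paths, for illustration only.
    old, new = "/path/to/old_fileset", "/path/to/new_fileset"
    print(" ".join(rsync_command(old, new)))
    print(" ".join(rsync_command(old, new, final_pass=True)))
```

Running the bulk pass first keeps the outage short, because the final pass only has to transfer what changed in the meantime.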

Relocation of Lochness to Databank Datacenter

Dear Lochness Users,

We hope this message finds you well. We want to inform you of an upcoming significant event regarding our HPC (High-Performance Computing) cluster, lochness.njit.edu. The GITC datacenter is scheduled for demolition on November 1, 2023. In order to maintain the operation of our computing infrastructure, we will be relocating the cluster to the Databank colocation facility in Piscataway.

Key Details:

  • Cluster Shutdown Date: October 6th, 2023 at Noon
  • Anticipated Duration: Up to Seven Days
  • Operational Continuity: After the move the cluster will remain operational until the end of the semester.
  • User Migration: We are actively working on migrating all users to the new cluster, wulver.njit.edu, before the end of the semester.

Cluster Relocation Details:

The scheduled shutdown of the lochness.njit.edu cluster will take place on October 6th, 2023. We have estimated that the relocation process will require no longer than five days to complete. During this time, the cluster will not be accessible. We understand the importance of uninterrupted access to computational resources, and we will make every effort to minimize downtime.

Operational Continuity:

Rest assured that we are committed to maintaining cluster availability for your research and academic needs. The lochness.njit.edu cluster will remain operational until the end of the current semester. This means that you will have access to its computing power throughout your ongoing projects and coursework.

User Migration:

Our team is actively working on the migration process to ensure a smooth transition for all users. We plan to migrate all users to the new cluster, wulver.njit.edu, well before the end of the semester. Detailed instructions and support will be provided to facilitate this transition, and we will keep you updated on the migration progress.

We understand that this relocation may raise questions or concerns, and we are here to address them. Please feel free to reach out to hpc@njit.edu if you have any specific inquiries or require further information.

We appreciate your understanding and cooperation during this transitional period. The relocation of our HPC cluster is aimed at providing you with an improved and more reliable computing environment.

Thank you for your ongoing support and contributions to our research community.