RESEARCH COMPUTING NEWS 2006

Please check here regularly for updated news and information on research computing.


Recent announcements

November 21: SSCC batch server out of service.
November 6: SSCC back online after problem with file server.
October 10: "/scratch2" in skew3 accessible again -- /scratch2 clean up policy.
October 10: WRDS reported outage last night.
October 9: SSCC back online after complete hardware overhaul.
October 6: skew3 upcoming maintenance; WRDS service disruption.
September 29: skew3 and SSCC back online after power outage.
August 31: Stata Journal issues 6(2) and 6(3) available now
August 29: WRDS UNIX shell and PC SAS connections restored this afternoon
August 28: WRDS server is down
August 21: Hardware upgrades in the SSCC
August 4: Daily US Treasury
July 8: Matlab server was down
February 28: Latest issues of Stata Journal


November 21: SSCC batch server out of service.

The PBS Pro batch server in the SSCC experienced problems yesterday and stopped working last night during routine maintenance. Jobs that were already running in the batch queue are unaffected by this outage. However, users will be unable to submit new jobs or check the status of existing ones (the server will refuse the connection when the qstat command is issued). PBS Pro technical support has been contacted, and we hope the issue will be resolved today.

November 6: SSCC back online after problem with file server.

Home directories in the SSCC were not accessible over the weekend and this morning due to a crash of the system's file server. The SSCC system administrators are still investigating the cause of the crash, but the system is accessible again. Jobs that were running in the batch queue during the outage may have timed out if they needed to write to a user's home directory. If you had any job running that required writing to your home directory, please go over the output carefully.

October 10: "/scratch2" in skew3 accessible again -- /scratch2 clean up policy.

To skew3 users:

/scratch2 is accessible again: NUIT system administrators have been able to remount the disk array that holds the files in the "/scratch2" directory. Fortunately, this was possible without rebooting skew3. Furthermore, the data held in this temporary storage space is intact as far as can be determined. However, some files could be corrupted if a program was writing to them when the power outage occurred.

The problem was due to a change in the disk array's device ID, probably a consequence of the power outage on September 29.

New clean up policy on /scratch2: Starting Monday, October 16, a clean up policy will take effect for files in "/scratch2": every night, the system will run a script that deletes files older than 120 days. This is a longer-term equivalent of the policy in effect for "/scratch", where files 15 days or older are purged every night. Please make judicious use of these shared spaces for your work. "/scratch" has 400 GB of space allocated, while "/scratch2" has 350 GB. At any time, you can check how much space is available in both directories by issuing the following command: df -k
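As an illustration of the age-based rule described above, the following minimal Python sketch lists the files under a directory that are older than a given number of days, so you can see what a nightly purge might remove. It is not the script the system administrators run: the function names `files_older_than` and `free_kb` are hypothetical, and the sketch assumes that file age is measured by last modification time, which the announcement does not specify.

```python
import os
import shutil
import time

SECONDS_PER_DAY = 86400


def files_older_than(root, days, now=None):
    """Return sorted paths under `root` last modified more than `days` days ago.

    Assumption: age is judged by modification time (st_mtime); the real
    purge script may use a different criterion.
    """
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    old = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.lstat(path).st_mtime < cutoff:
                    old.append(path)
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
    return sorted(old)


def free_kb(path):
    """Free space on the filesystem holding `path`, in kilobytes."""
    return shutil.disk_usage(path).free // 1024
```

For example, `files_older_than("/scratch2", 120)` would list candidates under the 120-day rule, and `free_kb("/scratch2")` reports free space in kilobytes, the same unit that `df -k` uses.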

Once again, thanks for your patience while the system administrators diagnosed and fixed the problem. Please feel free to contact us if you have any further concerns or questions.

October 10: WRDS reported outage last night

Last night, WRDS was unavailable from approximately 7:30 pm to 9:45 pm (ET) due to a server problem. A few users also reported data permission problems early today; these are related to last night’s downtime, and WRDS is working on a fix.

October 9: SSCC back online after complete hardware overhaul

The Social Sciences Computing Cluster has returned to service after its computers were upgraded with new hardware.

Please note that significant changes have been made:

  • The SSH secure shell host identities of the interactive hosts have changed. When you first log in, you will be warned that the host identification has changed. Continue with the connection, and save the new host key to the local database.
  • The node name of hardin2 has been changed to hardin. Change your login profile to log in to hardin.it.northwestern.edu.
  • The IPR computer mule2 is still out of service, pending an operating system upgrade. Please use hardin and seldon until mule2 is put back into service.
  • All user accounts and home directory files were unchanged by this upgrade, but files in scratch directories on seldon and hardin2 were destroyed.

The upgrades include the following:

  • Seldon and hardin are now Dual-CPU AMD Opteron Model 254 (2.8 GHz) machines, each with 16 GB of memory. These machines are expected to run 2 times faster than the AMD Athlons previously in use.
    The AMD Opterons are also Intel-compatible CPUs, but, like the Athlons, their speed in MHz is not directly comparable to the speed of Intel CPUs in MHz.
  • Mule2 was unchanged, in terms of hardware. The operating system will be upgraded to bring it to the same level as the new machines.
  • Sixteen batch nodes were added with the same dual-CPU architecture, but with 8 GB of memory per dual-CPU machine.
    The new AMD Opteron CPUs provide 64-bit memory address support, so that jobs requiring more than 2 GB of memory may be run with programs that provide 64-bit capability.
  • Software applications will be upgraded to 64-bit capability as they become available.
    Currently, only MATLAB has been upgraded to 64-bit capability.

Planned upgrades include Stata, GAUSS and the Portland Group Fortran and C/C++ compiler suite.

Other applications may be transparently run in 32-bit compatibility mode on the new machines.
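To make the 32-bit versus 64-bit distinction concrete, here is a minimal Python sketch that reports the pointer width of the interpreter it runs in and the size of the address space that a given width can describe. The function names are hypothetical and chosen only for illustration. A 32-bit address can name at most 4 GiB, and the memory actually usable by one process is typically lower than that, which is consistent with the 2 GB figure quoted above.

```python
import struct


def pointer_bits():
    """Width in bits of a native pointer in the running interpreter."""
    return struct.calcsize("P") * 8


def max_addressable_bytes(bits):
    """Number of distinct byte addresses an address of `bits` bits can name."""
    return 2 ** bits


if __name__ == "__main__":
    bits = pointer_bits()
    print(f"This interpreter uses {bits}-bit pointers.")
    print(f"A 32-bit address space spans {max_addressable_bytes(32)} bytes.")
```

A program built with 64-bit support reports 64 here; one running in 32-bit compatibility mode reports 32 and stays subject to the smaller address space.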

The timing of this change was chosen to accommodate people with computational deadlines in late September as well as people with deadlines in November.

The technical details of this upgrade will be documented separately on the SSCC web site at http://sscc.northwestern.edu.

Funding for this upgrade was provided by the generous support of President Henry Bienen, the Weinberg College of Arts and Sciences, the Kellogg School of Management, the School of Education and Social Policy, the Institute for Policy Research, and NU Information Technology.

If you have questions concerning the SSCC upgrade, please contact:

Bruce Foster (bef@northwestern.edu), 847/491-4055
Northwestern University | NU IT Academic Technologies
NU Library, 2EAST | 1970 Campus Drive | Evanston IL 60208-2323
http://charlotte.at.northwestern.edu/bef/

October 6: skew3 upcoming maintenance; WRDS service disruption.

skew3 upcoming reboot

Following last Friday's power outage, the disk array that holds "/scratch2", one of the directories dedicated to temporary storage space, needs to be repaired and reconnected to skew3. This may require several reboots of skew3, which we will not be able to schedule with any significant lead time because they require coordination between NUIT system administrators and technical support staff from Sun Microsystems. The work so far has focused on diagnosing the precise nature of the problem; all involved suspect a hardware problem that will require rebooting the system.

Thus, we ask that you avoid running jobs that require several days of computing on skew3 at this time. We will avoid rebooting skew3 until next week, when the Social Science Computing Cluster (SSCC, http://sscc.northwestern.edu) is back online after a complete hardware upgrade, so that everyone will have a platform for computing while skew3 is worked on. When a reboot becomes necessary, we will send a notice the day before.

Please let us know if you have any further questions or concerns.

WRDS service disruption

This afternoon we were notified of a disruption of access to WRDS between 4:00pm and 4:15pm EST due to network maintenance work on their system.

September 29: skew3 and SSCC back online after power outage.

Due to a power outage in Northwestern's data center, major NU servers were not accessible for most of the day. The Social Science Computing Cluster (SSCC) was back online around 5:00pm, while Kellogg's UNIX server, skew3, was brought back online close to 11:00pm. The system engineers were unable to connect to skew3's disk arrays (/scratch2) and have therefore created a pointer to a different directory. Further work on the failed disk array will continue on Monday morning. Please report any problems you might encounter to Research Computing. You can check the status of major servers by pointing your browser to status.northwestern.edu.

August 31: Stata Journal issues 6(2) and 6(3) available now

The latest electronic copies of the Stata Journal are available online.

August 29: WRDS UNIX shell and PC SAS connections restored this afternoon

Earlier today the WRDS server was brought back to full service, with UNIX shell and PC SAS connections restored. User files were recovered from backup tapes, and all content has been restored to its state as of Friday, August 25 at 10:00pm.

August 28: WRDS server is down

The WRDS server was down for several hours yesterday, Sunday August 27th, and today. WRDS system engineers have been working with Sun to determine the cause of a failure in the file system. As of 6:45pm today, web services were available, while UNIX shell and PC SAS connections had not been restored yet. If the cause of the problem is a hardware failure, there will be additional down time. Otherwise, WRDS expects to bring the remaining services back up tomorrow, August 29. The message from WRDS:

"WRDS Server Unavailable, August 27 & 28, 2006

The WRDS server was unavailable for a significant time period on Sunday August 27 and Monday August 28. The problem stemmed from a file system error and the WRDS server was shut down late Sunday August 27 for approximately 7 hours and again on Monday starting at 1 PM.

Downtime is likely to continue until after 8PM today (ET). We are in contact with Sun system engineers and we hope to resolve the issue shortly.

We apologize for any inconvenience."

August 21: Hardware upgrades in the SSCC

Thanks to co-funding from the Social Science Computing Cluster's major partners, the compute nodes on the SSCC will be replaced with the latest compute technology this October.

We need your cooperation as we prepare for this upgrade, particularly by avoiding running jobs on the SSCC that will not be finished by October 2.

The Social Sciences Computing Cluster will be removed from service for the week starting Monday October 2, 2006 in order to perform extensive hardware upgrades.

The SSCC will return to service on Monday October 9, 2006.

Please note the following interruptions in service during the upgrade week:

  • ALL SSCC Interactive Hosts (seldon, hardin2, mule2) will be unavailable October 2-October 9.
  • ALL interactive jobs running on seldon, hardin2 and mule2 will be terminated on October 2.
  • ALL SSCC Batch Nodes will be shut down on October 2.
  • ALL SSCC batch jobs still running on October 2 will be lost; this includes jobs running at the time of shutdown and those waiting in a queue to begin execution. We cannot transfer batch jobs to the new SSCC system configuration.
  • All user accounts and home directory files will be unchanged by this upgrade, but files in scratch directories on seldon, hardin2 and mule2 will disappear.

Electronic mail service on seldon, hardin2 and mule2 will be unavailable during October 2-October 9.

New computers will be installed during the week of October 2-October 9.

The upgrades include the following:

  • Seldon and Hardin2 will be replaced with Dual-CPU AMD Opteron Model 254 (2.8 GHz) machines, each with 16 GB of memory. These machines are expected to run 2-3 times faster than the AMD Athlons currently in use. The AMD Opterons are also Intel-compatible CPUs, but, like the Athlons, their speed in MHz is not directly comparable to the speed of Intel CPUs in MHz.
  • Mule2 will be unchanged.
  • Sixteen batch nodes will be added with the same dual-CPU architecture, but with 8 GB of memory per dual-CPU machine. The new AMD Opteron CPUs provide 64-bit memory address support, so that jobs requiring more than 2 GB of memory may be run with programs that provide 64-bit capability.
  • Software applications will be upgraded to 64-bit capability as they become available. Currently, that includes MATLAB, Stata, GAUSS and the Portland Group Fortran and C/C++ compiler suite. Other applications may be transparently run in 32-bit compatibility mode on the new machines.

The timing of this change was chosen to accommodate people with computational deadlines in late September as well as people with deadlines in November.

The technical details of this upgrade will be documented separately on the SSCC web site at sscc.northwestern.edu.

Funding for this upgrade was provided by the generous support of President Henry Bienen, the Weinberg College of Arts and Sciences, the Kellogg School of Management, the School of Education and Social Policy, the Institute for Policy Research, and NU Information Technology.

If you have questions concerning your preparation for the SSCC upgrade, please contact:

Bruce Foster (bef@northwestern.edu), 847/491-4055
Northwestern University | NU IT Academic Technologies
NU Library, 2EAST | 1970 Campus Drive | Evanston IL 60208-2323
charlotte.at.northwestern.edu/bef

August 4: Daily US Treasury

Daily US Treasury data is now available through WRDS. For the contents of the daily US Treasury data, please go to this page.

July 8: Matlab server was down

The Matlab license manager was reported down during the morning and was reset shortly before 5:00pm; it is now working properly.

February 28: Latest issues of Stata Journal

The latest electronic copies of the Stata Journal are available online.

Access to the STBs and the Stata Journal is by request only. Send a message to Research Computing to have your netid added. Hard copies of all issues are available in the research computing library (Jacobs 4219) for short-term borrowing by Kellogg faculty and doctoral students only.

© 2001-2010 Kellogg School of Management, Northwestern University