above are the results from the first week of uptime studies on the HDI test stand here at ucsb.
we find that the rate of hardware failures varies wildly on timescales longer than about 15 minutes. we are still accumulating data on this subject, so this page will be updated fairly often. (the image is updated automatically at the end of every data-taking run, so only the commentary may lag behind.)
the grayed areas in the plot mark times when the uptime routine was not taking data. each vertical line marks midnight on the date indicated. only statistical errors are shown.
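for readers who want to see how a point and its statistical error bar could be computed, here is a minimal sketch. it assumes simple poisson counting statistics (error = sqrt(N) on the number of failures in a time bin); the function name `rate_with_error` and its arguments are hypothetical and are not taken from the actual uptime routine.

```python
import math


def rate_with_error(n_failures, live_seconds):
    """return (rate, error) in failures per hour for one time bin.

    assumption: failures are poisson distributed, so the statistical
    error on a count of N is sqrt(N). systematic effects are ignored.
    """
    if live_seconds <= 0:
        raise ValueError("live time must be positive")
    hours = live_seconds / 3600.0
    rate = n_failures / hours
    error = math.sqrt(n_failures) / hours
    return rate, error


if __name__ == "__main__":
    # made-up example: 9 failures seen in a 30 minute bin
    print(rate_with_error(9, 1800))
```

note that a bin with zero failures gets a zero error bar under this simple rule, which understates the true uncertainty for empty bins.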
okay... now the result... this graph strongly suggests that the cause of these failures is environmental in nature. if the failures were due to a malfunctioning component on the ATOM chip, on the HDI or DAQ link card, or on the ROM, we would expect the failure rate to be fairly constant throughout the day, and that is not what we see. instead, our test stand appears to be susceptible to some as yet unknown environmental condition which causes these failures. the near absence of failures between 9 PM and 4:30 AM further suggests that this environmental condition is related to building occupancy. there are always exceptions, however, as indicated by the overnight data on 5-6 nov.
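the "constant failure rate" argument can be made quantitative with a chi-square comparison of the binned counts against a single average rate. the sketch below is only an illustration under stated assumptions: poisson counting in each bin, and hypothetical names (`chi2_constant_rate`, `counts`, `live_hours`). the numbers in the example are made up and are not the test stand data.

```python
def chi2_constant_rate(counts, live_hours):
    """compare binned failure counts with a constant-rate hypothesis.

    counts     : number of failures in each time bin
    live_hours : live time of each bin in hours (0 for grayed-out bins)
    returns (chi2, ndf), where ndf = number of live bins - 1.
    """
    total_time = sum(live_hours)
    if total_time <= 0:
        raise ValueError("no live time in any bin")
    mean_rate = sum(counts) / total_time  # best-fit constant rate

    chi2 = 0.0
    live_bins = 0
    for n, t in zip(counts, live_hours):
        expected = mean_rate * t
        if expected > 0:  # skip bins with no live time or no expectation
            chi2 += (n - expected) ** 2 / expected
            live_bins += 1
    return chi2, live_bins - 1


if __name__ == "__main__":
    # made-up numbers: quiet overnight bins, busy daytime bins
    counts = [0, 1, 12, 15, 2]
    live_hours = [1.0, 1.0, 1.0, 1.0, 1.0]
    print(chi2_constant_rate(counts, live_hours))
```

a chi2 much larger than ndf would disfavor a constant rate, which is the kind of behavior one would expect if the failures follow some time-dependent environmental condition. with small counts per bin the chi-square approximation is rough, so this is a sanity check rather than a precise test.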
created and maintained by michael mazur
last modified on 6/11/01
created on 31/10/01