Berkeley Outage

Sir Ulli · 07.01.2005

January 7, 2005
Outage Notice. Because of nearby building construction there will be an extended power outage this coming weekend. All of our machines will be turned off so there will be no data or web service. The outage will begin Sunday, January 9, at 18:00 PST (January 10, 02:00 UT). We hope to be back up Monday at 08:00 PST (16:00 UT).

This applies to BOINC and, of course, to SETI Classic as well.

http://setiweb.ssl.berkeley.edu/

Regards
Sir Ulli

ComputerTho · 08.01.2005

Well then, time to fill up the cache properly once more.

In case it takes longer again ...

Sir Ulli · 08.01.2005

Originally posted by ComputerTho
Well then, time to fill up the cache properly once more.

In case it takes longer again ...

That reminds me of a week-long outage about 3 years ago. Apparently the cable was stolen back then; yes, copper prices were already very high at the time.

I can't find the link right now, but back then the server was down for 1 week....

Regards
Sir Ulli

ComputerTho · 08.01.2005

Oh yes, I remember that too.

Those were the days.

Wasn't the WAN link broken back then?!
Whether the cable was stolen or an excavator drove through it, I don't remember exactly.

Mike · 09.01.2005

Hi

I've set the cache to 5 days.
That should be enough.

greetz Mike

Sir Ulli · 10.01.2005

Berkeley is back, but as always after outages this large, there may be problems at first.

Regards
Sir Ulli

Sir Ulli · 10.01.2005

Update: there seem to be a few problems after all.

January 10, 2005 - 20:00 UTC
Because of nearby building construction there was an extended power outage last night, during which all of our machines were turned off. Everything powered back up normally, except for one drive in the master science database, which failed upon booting up. Assimilating/splitting is turned off until we can figure out how to recover.

http://setiweb.ssl.berkeley.edu/tech_news.php

Regards
Sir Ulli

Sir Ulli · 10.01.2005

And the next update:

UPDATE: we were able to rescue the drive by replacing its circuit board (we figured the disks were good and had circuit boards fry in the past). However, there was slight data corruption in one of the pages on this disk, and therefore the database won't cleanly start. This is probably easy to fix, but we're waiting to hear back from the experts first.

Regards
Sir Ulli

ComputerTho · 11.01.2005

Well, luckily the data isn't lost.

One can only hope it is being backed up properly.

Are there actually any figures on the storage sizes being used?

Sir Ulli · 12.01.2005

Are there actually any figures on the storage sizes being used?

I believe I read that the database alone is about 70 GB.

Latest news:

January 11, 2005 - 20:00 UTC
Update on yesterday's disk failure: the database integrity has been checked, and the remaining off-line servers are now being started. For the record, these disk arrays were not only powered down, but unplugged from the wall during the outage. We've had disks (and monitors) die before that were on the edge of failure that finally died during a very clean power cycle. As well, this disk array happens to be completely non-RAIDed. We are firmly aware this is not optimal, and are very actively working towards replacing the array with bigger, RAIDed storage. We would have fallen back to a tape backup if yesterday's disk repair didn't work, and would have only lost scientific results. These are reproducible - just resplit missing tapes and resend work. Yes, there is a net loss of CPU processing, but users would still have the credit for the work completed since the last backup.

Well, let's hope the guys get it sorted out again...

Regards
Sir Ulli

ComputerTho · 12.01.2005

It's a shame, really, that results get lost just because the data can't be stored on RAID systems.

Has anyone else had validate errors within the last 2 days?

I've already had 3:
http://setiweb.ssl.berkeley.edu/workunit.php?wuid=7480363
http://setiweb.ssl.berkeley.edu/workunit.php?wuid=7480244
http://setiweb.ssl.berkeley.edu/workunit.php?wuid=7568379

I've never had that on this system before... :-[

ComputerTho · 31.01.2005

January 30, 2005
We will begin the migration to the new hardware at 19:00 UT tomorrow, 1/31. The project will be down for several hours.

Maybe quickly top up the caches with some WUs again...

You never know when the whole thing will be running again. *buck*

Frank · 31.01.2005

We are currently in the middle of a server outage that started at 19:00 UTC. We hope to be back on line within 4-5 hours. This estimate will be updated as time progresses.

Can someone quickly help me out: what was "19:00 UTC" in our time?

Thanks

Sir Ulli · 31.01.2005

UTC = GMT

http://www.worldtimeserver.com/current_time_in_UTC.aspx
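
As a minimal sketch of that conversion, assuming "our time" means CET (MEZ), which is UTC+1 in winter; the helper name `utc_to_cet` is made up for illustration, and the fixed offset would be wrong during summer time:

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET (MEZ) as a fixed UTC+1 offset, valid in winter only.
CET = timezone(timedelta(hours=1), name="CET")


def utc_to_cet(year, month, day, hour, minute=0):
    """Return the given UTC wall-clock time converted to CET."""
    utc_time = datetime(year, month, day, hour, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(CET)


# Start of the outage announced for 19:00 UTC on 31.01.2005.
outage_start = utc_to_cet(2005, 1, 31, 19)
print(outage_start.strftime("%d.%m.%Y %H:%M %Z"))  # 31.01.2005 20:00 CET
```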

Regards
Sir Ulli

Frank · 31.01.2005

Many thanks, Ulli, now I'm in the picture.

But something seems to have gone wrong at SETI once again:
Updated: 22:30 UTC: <- 23:30 CET (MEZ)

We are currently in the middle of a server outage that started at 19:00 UTC. We hope to be back on line within 8-10 hours. This estimate will be updated as time progresses.
It's somehow reassuring that Berkeley can at least be relied upon for its unreliability.

Oh well, my caches were full, so I can head off to bed with an easy mind *lol*

Sir Ulli · 31.01.2005

Seems it's going to take a bit longer after all.

Updated: 22:30 UTC:

We are currently in the middle of a server outage that started at 19:00 UTC. We hope to be back on line within 8-10 hours. This estimate will be updated as time progresses.

Edit: two people, one thought.

Regards
Sir Ulli

Major J · 01.02.2005

I still have WUs for ~30-40h ... we'll see.

pipin · 01.02.2005

Thank goodness I set the cache duration to 5 days the other day when I installed the client on my laptop.

Sir Ulli · 01.02.2005

I have no problems either: enough work for 2-3 days, and then there's still EAH. But it is always the DB that causes the trouble.

For those who don't know:

the BOINC DB (database) is currently about 70 gigabytes in size.

An SQL DB is used, but imported data, e.g. from SAH Classic, keeps causing problems, as I know from my own experience.

In addition, the new server uses a hardware RAID, whereas the old Sun only ran a software RAID, and transferring everything from an old server to a new one is simply quite laborious. If we're unlucky, another disk will fail, or...

Fingers crossed that everything goes well.

Btw, the new server is a dual Opteron, expandable to 4 CPUs, with 8 GB of RAM and a hardware RAID of Ultra320 SCSI disks.

Regards
Sir Ulli

Sir Ulli · 01.02.2005

Update: it seems to be running again.

http://setiweb.ssl.berkeley.edu/index.php

http://setiweb.ssl.berkeley.edu/sah_status.html

The forum is still down... but

SETI@home - 2005-02-01 01:46:33 - Sending request to scheduler: http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi
SETI@home - 2005-02-01 01:46:37 - Scheduler RPC to http://setiboinc.ssl.berkeley.edu/sah_cgi/cgi succeeded

So it looks good.

Edit: my Firefox plugin is working again too...

Regards
Sir Ulli

Sir Ulli · 01.02.2005

Well, that didn't work out.

February 1, 2005 - 01:00 UTC
We were about one third or so through the database migration when the new server hung and the migration job stopped. We are diagnosing the problem now. During the migration the internal I/O mentioned below (we think it is some sort of garbage collection) was also occurring. This was vastly slowing the data movement to the new server. In addition to figuring out why the server crashed, we will wait until the garbage collection is finished before restarting the migration. It will be at least a day from now.

Well, another attempt won't hurt. I was almost certain it wouldn't be that easy.

Regards
Sir Ulli

Major J · 01.02.2005

Working again. Great stuff!

ComputerTho · 01.02.2005

But the migration to the new hardware didn't work out, did it?

January 31, 2005
The DB migration stopped when the new server crashed. See Technical News.

February 1, 2005 - 01:00 UTC
We were about one third or so through the database migration when the new server hung and the migration job stopped. We are diagnosing the problem now. During the migration the internal I/O mentioned below (we think it is some sort of garbage collection) was also occurring. This was vastly slowing the data movement to the new server. In addition to figuring out why the server crashed, we will wait until the garbage collection is finished before restarting the migration. It will be at least a day from now.

Sir Ulli · 01.02.2005

New day, new attempt.

Let's hope it works next time.

Regards
Sir Ulli

Sir Ulli · 03.02.2005

Somehow BOINC seems to be down.

http://setiweb.ssl.berkeley.edu/index.php

A ping gets through, though *no idea*

Edit: that explains it:

Warning: mysql_pconnect(): Too many connections in /disks/koloth/raid5_b/users/boincadm/projects/sah/html/inc/db.inc on line 16
Unable to connect to database - please try again later Error: 1040Too many connections
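
MySQL error 1040 means the server has reached its connection limit, so a client can only wait and try again. A minimal sketch of retrying with exponential backoff follows; `TooManyConnections`, `connect_with_backoff`, and the `connect` callable are hypothetical placeholders for illustration, not part of BOINC or of any MySQL library:

```python
import time


class TooManyConnections(Exception):
    """Hypothetical stand-in for MySQL error 1040."""


def connect_with_backoff(connect, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds, doubling the wait after each failure.

    connect: any callable that returns a connection or raises
             TooManyConnections (an assumption made for this sketch).
    """
    for attempt in range(attempts):
        try:
            return connect()
        except TooManyConnections:
            if attempt == attempts - 1:
                raise  # out of attempts: pass the error on to the caller
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
```

Passing `sleep` in as a parameter keeps the function easy to try out without actually waiting.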

Regards
Sir Ulli
