Erratic behavior of WUs after suspend/resume



Message boards : SZTAKI Desktop Grid : Erratic behavior of WUs after suspend/resume

wstomv
Joined: Jan 16 07
Posts: 4
Credit: 603
RAC: 0
Message 5553 - Posted 25 Jan 2007 13:10:37 UTC

    Here are two observations/questions:

    1) I have had WUs (on a Mac G4) that ran for well over 10 hours, seemingly making no progress at all (still stuck at 0%). Typically, I let them "Run always", since I do no number crunching on this machine. Occasionally I suspend them, and when I resume them, all of a sudden the WU is completed and ready to report, as if suspending had interrupted it. The result was invariably Success. I find this strange.

    2) When I suspend a task, the CPU timer stops (obviously). However, when I resume the same task, the time jumps back (on some occasions by a significant amount). Is that time lost somehow? That is, was my CPU time wasted?
    ____________

    robert.mouris
    Joined: Nov 3 05
    Posts: 129
    Credit: 4,124,194
    RAC: 0
    Message 5554 - Posted 25 Jan 2007 13:31:24 UTC - in response to Message 5553.

      The result invariably was Success.

      This project has problems with checkpoints and loops. Processing only works properly if we don't stop crunching WUs. Suspending them is OK, provided that the WU stays in memory. Otherwise, on resumption the program jumps to the next line to process ("line" is the name we give to a sub-unit). The result looks correct to BOINC, but during the joint validation of 3 units your work is discarded. The result is useless to the project and you get no credit. This is annoying, so please try by all means not to interrupt BOINC.

      Sometimes WUs can run for many hours without the percentage going up, but rest assured that processing is working correctly. Other BOINC projects have reported stuck WUs, but apparently not this one.

      ____________

      gwg
      Joined: Aug 1 06
      Posts: 58
      Credit: 213,886
      RAC: 0
      Message 5561 - Posted 25 Jan 2007 23:58:41 UTC - in response to Message 5554.

        Also, if for some reason BOINC's 30-second "heartbeat" isn't sent in time, all active processes quit. I run a mix of processes under BOINCstats BAM! and, depending on the mix, a heartbeat can be missed every three to four hours. That doesn't bother other processes, but it forces SZTAKI either to start all over again or to skip the previous line. It happens even though I run 24/7 and rarely reboot. Not good. The SZTAKI team needs to find a way to checkpoint the state in the middle of a line.
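To make the heartbeat point concrete, here is a minimal sketch of a timeout watchdog. It is not BOINC's actual code or API; the names `HeartbeatWatchdog`, `run_steps`, and `save_checkpoint` are hypothetical, and the only figure taken from the post is the 30-second interval. The idea shown is that an application that notices a missed heartbeat can save its progress before stopping, rather than losing the work in flight.

```python
import time

HEARTBEAT_TIMEOUT_S = 30.0  # interval quoted in the post; everything else is assumed


class HeartbeatWatchdog:
    """Remembers when the last heartbeat arrived and reports a timeout."""

    def __init__(self, timeout_s=HEARTBEAT_TIMEOUT_S, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock          # injectable so the class can be tested
        self._last_beat = clock()

    def beat(self):
        """Record that a heartbeat has just been received."""
        self._last_beat = self._clock()

    def expired(self):
        """Return True if no heartbeat arrived within the timeout."""
        return self._clock() - self._last_beat > self._timeout_s


def run_steps(steps, watchdog, save_checkpoint):
    """Run work steps in order; on a missed heartbeat, checkpoint and stop.

    Returns the number of steps completed. `save_checkpoint` is a
    hypothetical callback that receives that number.
    """
    done = 0
    for step in steps:
        if watchdog.expired():
            save_checkpoint(done)
            return done
        step()
        done += 1
    return done
```

The watchdog is only checked between steps, so this helps only if each step is short compared with the timeout. That is the same reason a checkpoint in the middle of a line matters.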

        Since we are dealing with matrix manipulation and probably deep recursion, such a thing is hard to do unless the recursion is unwound. The saved state will be very large, but it is worth it to obtain more completed results faster.
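As an illustration of what "unwinding the recursion" buys, here is a small sketch on a toy problem (counting bit strings of length n with no two adjacent 1s). It has nothing to do with the project's real computation, and all the names (`new_state`, `run`, `save_checkpoint`, `load_checkpoint`) are made up for the example. The point is that once the call stack is an explicit list, the whole state is ordinary data that can be written to disk at any step and reloaded later.

```python
import json
import os
import tempfile


def new_state(n):
    """Initial search state: one frame (length so far, last bit) and a counter."""
    return {"n": n, "stack": [[0, 0]], "count": 0}


def run(state, max_steps):
    """Process at most max_steps frames; return True once the search is done."""
    stack = state["stack"]
    n = state["n"]
    for _ in range(max_steps):
        if not stack:
            return True
        length, last_bit = stack.pop()
        if length == n:
            state["count"] += 1      # reached a complete string
            continue
        stack.append([length + 1, 0])        # appending a 0 is always allowed
        if last_bit == 0:
            stack.append([length + 1, 1])    # a 1 may only follow a 0
    return not stack


def save_checkpoint(state, path):
    """Write the state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)   # a crash mid-write leaves the old file intact


def load_checkpoint(path):
    """Read back a state previously written by save_checkpoint."""
    with open(path) as handle:
        return json.load(handle)
```

A caller would alternate `run(state, k)` with `save_checkpoint(state, path)`, and after a restart continue from `load_checkpoint(path)`. The saved stack grows with the depth of the search, which matches the remark above that the saved state can be large.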

        George
        ------
        ____________
        Dr George W Gerrity
        4 Coral Place
        Campbell, ACT 2612
        AUSTRALIA

        Ph: +61 2 6156 0286
        Time: +10 hours (ref GMT)
        PGP RSA Public Key Fingerprint:
        73EF 318A DFF5 EB8A 6810 49AC 0763 AF07
