An Old Problem Returns ???


Message boards : SZTAKI Desktop Grid : An Old Problem Returns ???

Stick
Joined: Jun 12 06
Posts: 193
Credit: 66,271
RAC: 0
Message 7592 - Posted 18 Dec 2008 23:00:19 UTC

    Last modified: 18 Dec 2008 23:03:43 UTC

    I got some of the most recent WUs on 12/17/08, and both of my hosts started working on one. One WU had 8 hours of CPU time but was still at 0.00% progress. The other had about an hour of CPU time and was also still at 0.00% progress. Then, today, I installed a Windows Update on both hosts and had to restart the computers afterwards. When BOINC resumed working on the WUs after the restart, both jumped from 0.00% to 5.00% progress immediately. For those of you who may not have been with this project very long, this "jumping" was a symptom of an old mid-line checkpointing problem that was discussed at length in several old threads. On single-line WUs, it also resulted in the "Output is Empty" problem. AFAIK, the problem was never fixed. It just sort of "went away" when the current version of the program was introduced and the time required to process a "line" was reduced dramatically. Now that we are in dimension 13, it appears the time to process a line has risen again. Therefore, the problem appears to be back.
    ____________

    ai5000
    Joined: Mar 19 08
    Posts: 8
    Credit: 284,115
    RAC: 0
    Message 7593 - Posted 19 Dec 2008 10:00:05 UTC - in response to Message 7592.

      I also had one stuck at 0% after running for an hour.

      mbliny
      Joined: Feb 7 07
      Posts: 2
      Credit: 324,238
      RAC: 0
      Message 7594 - Posted 19 Dec 2008 16:40:39 UTC

        Same problem here. 8 hrs, 0%.
        Tried "Suspend" -> "Resume": still 0%.
        Tried exiting BOINC Manager and restarting it. It's now at 5%.

        Work Unit is:
        52b339a5-f283-42c9-90ef-35ded7000473_5559e73f-f849-4640-b8fe-9caae4be0673_1488_2

        ai5000
        Joined: Mar 19 08
        Posts: 8
        Credit: 284,115
        RAC: 0
        Message 7595 - Posted 19 Dec 2008 21:24:01 UTC

          So how long is it supposed to be stuck at 5%? Or should I just abort this work unit?

          Stick
          Joined: Jun 12 06
          Posts: 193
          Credit: 66,271
          RAC: 0
          Message 7596 - Posted 19 Dec 2008 22:02:36 UTC - in response to Message 7595.

            Last modified: 19 Dec 2008 22:03:46 UTC

            So how long is it supposed to be stuck at 5%? Or should I just abort this work unit?


            SZTAKI WUs tend to be unpredictable, so it's hard to know whether one is stuck or the line is just taking a long time to calculate. However, in the past, a "jump" after a restart was an indication that the "line" was not calculated properly and, therefore, that the result may not be valid. I hope Adam is reading this thread and will comment on these issues (i.e. being "stuck", "jumping" and "validating").
            ____________

            ai5000
            Joined: Mar 19 08
            Posts: 8
            Credit: 284,115
            RAC: 0
            Message 7598 - Posted 20 Dec 2008 7:42:21 UTC

              Abort it is then!

              Death
              Joined: Jul 7 08
              Posts: 2
              Credit: 47,647
              RAC: 11
              Message 7600 - Posted 21 Dec 2008 22:43:29 UTC

                Database Status
                Status          Approximate # of WUs
                Ready to send   10,791
                In progress     10,791

                what the??
                ____________
                ===
                wbr, Me. Dead J. Dona
                Jodis | AllSubmitter

                jjwhalen
                Joined: Feb 15 06
                Posts: 3
                Credit: 35,795
                RAC: 0
                Message 7602 - Posted 24 Dec 2008 2:06:09 UTC

                  I recall that when this problem occurred last time, I simply turned SDG "off" for several months until it resolved.

                  Already, one of my machines can't get any work from SDG, due to the ongoing "'windows_x86_64' not found" problem that project management has known about for months. Now the other is (again) spending valuable CPU time on work of questionable validity. Once again, SDG is turned off pending resolution.

                  The ongoing lack of responsiveness by project management to these threads is worrying, whatever the reason. But, I have 9 other projects that are glad of the extra CPU cycles.

                  Happy holidays to all.




                  Odysseus
                  Joined: Feb 27 06
                  Posts: 212
                  Credit: 221,397
                  RAC: 0
                  Message 7603 - Posted 24 Dec 2008 10:44:16 UTC - in response to Message 7596.

                    SZTAKI WUs tend to be unpredictable, so it's hard to know whether one is stuck or the line is just taking a long time to calculate. However, in the past, a "jump" after a restart was an indication that the "line" was not calculated properly and, therefore, that the result may not be valid. I hope Adam is reading this thread and will comment on these issues (i.e. being "stuck", "jumping" and "validating").

                    AFAICT “stuck” isn’t necessarily a problem (unlike in other projects that have more or less continuous progress indicators), depending on how many lines are in the task. A twenty-liner will only show progress in 5% increments, and it can indeed take quite some time to get through a line. OTOH the “jumping” issue can cause results to be considered invalid: it’s best to keep apps in memory and to avoid relaunching BOINC or restarting the system while an SDG task is running.
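
To make the 5% increments concrete, here is a minimal Python sketch. It assumes, as described above but not confirmed by the project, that progress is only updated when a whole line finishes; the function names are made up for illustration and are not part of BOINC or the SDG application.

```python
def progress_step_percent(total_lines: int) -> float:
    """Size of one progress increment, assuming progress only moves
    when a whole line completes (an assumption, not project code)."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    return 100.0 / total_lines


def reported_progress(completed_lines: int, total_lines: int) -> float:
    """Progress a user would see after `completed_lines` whole lines."""
    if not 0 <= completed_lines <= total_lines:
        raise ValueError("completed_lines out of range")
    return completed_lines * progress_step_percent(total_lines)


if __name__ == "__main__":
    # A twenty-line task: nothing is shown until the first line is done.
    for done in (0, 1, 2, 20):
        print(done, "lines ->", reported_progress(done, 20), "%")
```

Under this assumption a twenty-liner sits at 0% for the whole of its first line, however many hours that takes, and then moves straight to 5%.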


                    ____________

                    [B^S] Astral Walker
                    Joined: May 30 06
                    Posts: 6
                    Credit: 26,070
                    RAC: 0
                    Message 7604 - Posted 24 Dec 2008 15:22:43 UTC

                      I've been on NNW for a while, but I've heard from teammates that things were running better here. I'm currently running this WU, which I downloaded yesterday. It has a 14 day deadline and an ETC of about 9.5 days. I thought this was supposed to be fixed? Oh, and 16 hours in, the WU is still at 0%.

                      I'm tired of wasting my CPU time here, so if something isn't resolved, and soon, not only am I going back to NNW, I will detach for good. The lack of any communication from the project managers after all this time leads me to think that this project has been abandoned.
                      ____________

                      Badidas
                      Joined: Dec 28 06
                      Posts: 1
                      Credit: 1,002,316
                      RAC: 0
                      Message 7607 - Posted 26 Dec 2008 11:59:10 UTC

                        ...this is madness, no answer or explanation from the admin(s). I have already spent too many CPU cycles on this "dying" project. Stop sending out work until the errors are fixed!!! Or tell us what's wrong.
                        None of my Macs have managed to finish the 13th dimension packages. Hundreds of hours of wasted CPU time :(
                        ____________

                        6dj72cn8
                        Joined: May 27 06
                        Posts: 36
                        Credit: 8,504
                        RAC: 0
                        Message 7608 - Posted 27 Dec 2008 3:32:40 UTC

                          Last modified: 27 Dec 2008 3:56:14 UTC

                          I am afraid I have bailed out of my current WU. It is a 20-liner at 0% after nearly three hours, which points to a fairly long run time. The problem is that I am the sixth quorum member to receive it. Two have already aborted, two have over 500 other tasks each in their queue, and the fifth member only joined on Christmas Eve and must be wondering what the heck has hit them. In my judgement, the WU will hit six failures before it hits three successes. So, while I'm confident I could return a valid result, there is no point in doing so.
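
The "six failures before three successes" judgement can be put into rough numbers. The sketch below is only an illustration: it assumes every outstanding result succeeds independently with the same probability, and it takes the thresholds (three successes, six failures) from the post above rather than from any confirmed project setting. The function name is hypothetical.

```python
from math import comb


def chance_of_quorum(p_success: float, successes_needed: int,
                     failures_allowed: int) -> float:
    """Probability of collecting `successes_needed` successes before
    `failures_allowed` failures, for independent results that each
    succeed with probability `p_success` (a simplifying assumption).

    The race is decided within the first
    successes_needed + failures_allowed - 1 results, so this equals the
    chance of at least `successes_needed` successes among that many.
    """
    if successes_needed <= 0:
        return 1.0
    if failures_allowed <= 0:
        return 0.0
    trials = successes_needed + failures_allowed - 1
    return sum(
        comb(trials, k) * p_success**k * (1.0 - p_success) ** (trials - k)
        for k in range(successes_needed, trials + 1)
    )


if __name__ == "__main__":
    # Two results already aborted, so four more failures end the WU.
    for p in (0.2, 0.5, 0.8):
        print(p, chance_of_quorum(p, successes_needed=3, failures_allowed=4))
```

With two aborts already counted, even a 50% per-result success rate leaves roughly a one-in-three chance that the WU fails outright, and the odds get worse quickly if the remaining hosts are less reliable than that.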

                          Back to NNT for another couple of months for me.
                          ____________

                          Paul D. Buck
                          Joined: Jul 20 05
                          Posts: 35
                          Credit: 57,732
                          RAC: 0
                          Message 7611 - Posted 28 Dec 2008 17:39:58 UTC - in response to Message 7602.

                            I recall that when this problem occurred last time, I simply turned SDG "off" for several months until it resolved.

                            Already, one of my machines can't get any work from SDG, due to the ongoing "'windows_x86_64' not found" problem that project management has known about for months. Now the other is (again) spending valuable CPU time on work of questionable validity. Once again, SDG is turned off pending resolution.

                            The ongoing lack of responsiveness by project management to these threads is worrying, whatever the reason. But, I have 9 other projects that are glad of the extra CPU cycles.

                            Happy holidays to all.

                            I too have turned off this project and am waiting to see if they notice that there are problems. All indications are that they are not paying attention, again ...

                            Unlike you I have only about 40 other projects ... :)

                            But the good news is that I almost have my Gold badge for the WCG Clean Energy Project, making it my 6th or 7th gold badge ... only 30-some work days to process and I should be there ... also I am getting close to pushing WCG over my total work for SaH, a long-time goal ... which I should do in about 9 days ...

                            Even better, I have a faster Nvidia card in the "mail" and that should let me very rapidly push GPU Grid right up there ... heck, in a week or so of working, my fairly slow 9800 card has done 30K of work already ... at that rate I could be pushing GPU Grid over SaH by mid-summer!

                            Well, I can see that with no project response to a message posted nearly 2 weeks ago, I will have to wait awhile longer before I will be turning any computers back to this project ...
                            ____________

                            [B^S] Astral Walker
                            Joined: May 30 06
                            Posts: 6
                            Credit: 26,070
                            RAC: 0
                            Message 7621 - Posted 2 Jan 2009 18:39:10 UTC - in response to Message 7604.

                              I'm currently running this WU, which I downloaded yesterday. It has a 14 day deadline and an ETC of about 9.5 days. I thought this was supposed to be fixed? Oh, and 16 hours in, the WU is still at 0%.


                              Even BOINC seems to have given up on this project. After 9 days I found that the WU is still at 16 hours: BOINC added a boatload of WUs from other projects in front of it and simply refuses to run SZTAKI.
                              ____________

                              philip-in-hongkong
                              Joined: Feb 21 06
                              Posts: 14
                              Credit: 199,266
                              RAC: 73
                              Message 7624 - Posted 3 Jan 2009 5:37:05 UTC

                                The WU 2283038 looks good. 70% completed with 10hrs26mins (4hrs14mins to go). Please do not abort this WU. I will post the WU's progress again.
                                ____________

                                philip-in-hongkong
                                Joined: Feb 21 06
                                Posts: 14
                                Credit: 199,266
                                RAC: 73
                                Message 7625 - Posted 3 Jan 2009 8:55:30 UTC - in response to Message 7624.

                                  An update on this WU 2283038. 75% completed with 11hrs26mins (3hrs42mins to go). Looking good.
                                  ____________

                                  philip-in-hongkong
                                  Joined: Feb 21 06
                                  Posts: 14
                                  Credit: 199,266
                                  RAC: 73
                                  Message 7630 - Posted 3 Jan 2009 14:01:17 UTC - in response to Message 7625.

                                    Another update on this WU 2283038. 90% completed with 13hrs24mins (1hr31mins to go). Nearly finished.
                                    ____________

                                    philip-in-hongkong
                                    Joined: Feb 21 06
                                    Posts: 14
                                    Credit: 199,266
                                    RAC: 73
                                    Message 7632 - Posted 4 Jan 2009 1:44:39 UTC - in response to Message 7630.

                                      Further update on this WU 2283038. 95% completed with 15hrs20mins (49mins to go). Almost ...
                                      ____________

                                      philip-in-hongkong
                                      Joined: Feb 21 06
                                      Posts: 14
                                      Credit: 199,266
                                      RAC: 73
                                      Message 7634 - Posted 4 Jan 2009 16:37:44 UTC - in response to Message 7632.

                                        This story ended sadly. When the WU was completed and uploaded, a "Too many total results" error was generated. I counted the total replicates to be 11. So in any case, this error message would have been generated and no credit will be granted. A lesson learned, but the system should keep the replicates to 10 and not send out excess WUs for crunching.
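
The kind of check being asked for here can be sketched in a few lines of Python. This is not the BOINC scheduler's code: the function name is hypothetical, and the cap of 10 is simply the figure quoted in the post above, not a confirmed project setting.

```python
def may_send_another_replica(results_so_far: int, max_total: int = 10) -> bool:
    """Return True only if issuing one more replica keeps the total
    number of results for the WU within `max_total`.

    `max_total` defaults to 10, the cap mentioned in the post above
    (an assumption about the project's configuration).
    """
    if results_so_far < 0 or max_total < 0:
        raise ValueError("counts must be non-negative")
    return results_so_far < max_total


if __name__ == "__main__":
    for issued in (9, 10, 11):
        print(issued, "issued ->", may_send_another_replica(issued))
```

With such a guard in place, a WU that already has 10 results would not be handed to an eleventh host, so nobody would crunch a replica that can only end in the error described above.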
                                        ____________

                                        ANCHULA-MARK
                                        Joined: Mar 2 08
                                        Posts: 2
                                        Credit: 115,796
                                        RAC: 0
                                        Message 7650 - Posted 8 Jan 2009 1:45:47 UTC - in response to Message 7634.

                                          3 more machines are still running that WU, just to get no credit at the end of it.

                                          etrecords
                                          Joined: Aug 9 08
                                          Posts: 2
                                          Credit: 179,437
                                          RAC: 0
                                          Message 7653 - Posted 8 Jan 2009 10:08:36 UTC

                                            During my holiday, 5 SDG WUs were loaded on my system. One finished and was granted credit. One WU finished after 66 hours and is waiting for validation.
                                            But the three other WUs all crashed after 86.4 hours with the following error: Maximum CPU time exceeded
                                            This means that the configuration of the WUs is not correct, and because of that my system has worked about 250 hours for nothing.
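
For reference, the "about 250 hours" figure follows from the three failed tasks each running up to the 86.4-hour limit. A minimal Python sketch of that arithmetic, using only the numbers given in the post (the function name is made up for illustration):

```python
def wasted_cpu_hours(limit_hours: float, failed_tasks: int) -> float:
    """CPU hours lost when `failed_tasks` tasks each run until they hit
    the same CPU-time limit and are then discarded."""
    if limit_hours < 0 or failed_tasks < 0:
        raise ValueError("inputs must be non-negative")
    return limit_hours * failed_tasks


if __name__ == "__main__":
    hours = wasted_cpu_hours(86.4, 3)
    print(f"{hours:.1f} hours lost, or {hours / 24:.1f} days of CPU time")
```

Three tasks at 86.4 hours each come to 259.2 hours, which is a little more than the rounded figure quoted in the post.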

                                            I saw this at the end of the summer as well and switched SDG to NNW. I have decided to do so again now, until there is a clear answer from the project lead on how they will prevent such a waste of time in the future.

                                            b_rr
                                            Joined: Jan 19 08
                                            Posts: 1
                                            Credit: 9,909
                                            RAC: 0
                                            Message 7654 - Posted 8 Jan 2009 11:10:03 UTC - in response to Message 7653.

                                              The problem here is that the computation time of the recent WUs is non-deterministic, which means that a lot of problems may occur.

                                              It's not official, but AFAIK Adam is working on splitting the current WUs into smaller packets. The server infrastructure has its own limitations, though, so smaller WUs may cause server-side trouble.

                                              Please keep up the good work and don't detach, or come back later if you are disappointed. The project isn't dead, they have just had their own problems.



                                              Copyright © 2017 SZTAKI Desktop Grid