when do these wu finish?



Message boards : SZTAKI Desktop Grid : when do these wu finish?

Jordan Bashir
Joined: Jun 6 06
Posts: 3
Credit: 13,213
RAC: 0
Message 5292 - Posted 25 Dec 2006 5:22:25 UTC

    Hello,

    I have 4 WUs on a quad G5:
    http://szdg.lpds.sztaki.hu/szdg/show_host_detail.php?hostid=222327
    http://szdg.lpds.sztaki.hu/szdg/workunit.php?wuid=57806
    http://szdg.lpds.sztaki.hu/szdg/workunit.php?wuid=57796
    http://szdg.lpds.sztaki.hu/szdg/workunit.php?wuid=57793
    http://szdg.lpds.sztaki.hu/szdg/workunit.php?wuid=57806

    All WUs have run for more than 32 h, all show "00:02:18" as time to completion, and all are at "0.00%" progress?!

    When do these WUs finish?
    ____________

    Jordan Bashir
    Joined: Jun 6 06
    Posts: 3
    Credit: 13,213
    RAC: 0
    Message 5293 - Posted 25 Dec 2006 7:59:58 UTC - in response to Message 5292.

      All 4 WUs are now "ready to report" after 33:40:57, but the next WUs are at "0.00%" progress again!

      How long do these WUs normally take?
      ____________

      Stick
      Joined: Jun 12 06
      Posts: 193
      Credit: 66,271
      RAC: 0
      Message 5294 - Posted 25 Dec 2006 8:43:07 UTC - in response to Message 5293.

        All 4 WUs are now "ready to report" after 33:40:57, but the next WUs are at "0.00%" progress again!

        How long do these WUs normally take?


        Your units are all from the 21b342bd series. That series is well known for having problems. They are usually single-line units, meaning that progress stays at 0.00% while the line is being processed and then jumps to 100.00% when the line is finished. I have posted several times on this thread describing the problems I have encountered when processing them. Another, related thread that discusses "Output is empty" problems may also be helpful to you.
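As a rough illustration, assume (hypothetically; this is not the project's code) that a unit reports progress as finished lines divided by total lines, with no credit for a partly processed line. The Python sketch below shows why a single-line unit reads 0.00% for its whole run and then jumps to 100.00%; `reported_progress` is an illustrative name.

```python
def reported_progress(lines_done: int, total_lines: int) -> float:
    """Hypothetical progress fraction: only whole, finished lines count."""
    if total_lines <= 0:
        raise ValueError("total_lines must be positive")
    if not 0 <= lines_done <= total_lines:
        raise ValueError("lines_done out of range")
    return lines_done / total_lines


# A single-line unit: nothing is reported until its one line finishes.
print(f"{reported_progress(0, 1):.2%}")  # shown for the entire run
print(f"{reported_progress(1, 1):.2%}")  # shown once the line completes

# A multi-line unit moves in coarse steps, one per finished line.
for done in range(4):
    print(f"{reported_progress(done, 3):.2%}")
```

Under this model the displayed percentage says nothing about how far into the current line the unit is; only the climbing CPU time shows that work is being done.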
        ____________

        Jordan Bashir
        Joined: Jun 6 06
        Posts: 3
        Credit: 13,213
        RAC: 0
        Message 5295 - Posted 25 Dec 2006 9:07:01 UTC - in response to Message 5294.

          Thank you.
          ____________

          John Hunt
          Joined: Sep 15 06
          Posts: 24
          Credit: 20,933
          RAC: 0
          Message 5298 - Posted 26 Dec 2006 11:15:32 UTC

            Last modified: 26 Dec 2006 11:16:31 UTC

            If these 21b342bd are known to be 'problem' WUs, why are they still being issued?
            I decided to give this project another try, and after 25 hours I'm still running one of these.....

            http://szdg.lpds.sztaki.hu/szdg/workunit.php?wuid=58366



            ____________

            robert.mouris
            Joined: Nov 3 05
            Posts: 129
            Credit: 4,124,194
            RAC: 0
            Message 5300 - Posted 26 Dec 2006 12:32:54 UTC - in response to Message 5298.

              If these 21b342bd are known to be 'problem' WUs, why are they still being issued?

              The description of the project is: "The aim of the project is to find all the generalized binary number systems up to dimension 11." This is outdated, because we are now searching dimension 12. Nevertheless, we must crunch all the WUs, including the "problem" WUs. Until the whole batch has been processed, they must be reissued; otherwise the whole work has no scientific value.

              I would not consider them "problem" WUs, even if it is sometimes hard for them to reach a quorum right away. What is wrong with 25+ hours, as long as the maximum CPU time error doesn't occur? CPDN has WUs running thousands of hours, and this is accepted by the crunching community.

              ____________

              John Hunt
              Joined: Sep 15 06
              Posts: 24
              Credit: 20,933
              RAC: 0
              Message 5302 - Posted 26 Dec 2006 12:50:52 UTC - in response to Message 5300.


                I would not consider them "problem" WUs, even if it is sometimes hard for them to reach a quorum right away. What is wrong with 25+ hours, as long as the maximum CPU time error doesn't occur? CPDN has WUs running thousands of hours, and this is accepted by the crunching community.


                I run CPDN with no problems at all. I have also run WCG WUs that take 12+ hours.
                Where I do see a problem is that so far two computers have already produced 'valid' results for this WU: one of them in 8 minutes and the other after three and a half hours.....

                As for the apparent lack of response from admins to queries on this project: CPDN does have devs and moderators who are always there to answer questions and give advice!



                Stick
                Joined: Jun 12 06
                Posts: 193
                Credit: 66,271
                RAC: 0
                Message 5303 - Posted 26 Dec 2006 14:17:20 UTC - in response to Message 5300.

                  Last modified: 26 Dec 2006 14:19:41 UTC

                  I would not consider them "problem" WUs, even if it is sometimes hard for them to reach a quorum right away. What is wrong with 25+ hours, as long as the maximum CPU time error doesn't occur?


                  Although a "Max CPU time" error is a concern, the major problem with these WUs (as I see it) is the checkpointing problem. That is, these units are never able to write a usable checkpoint and, therefore, must complete processing without an intervening restart of BOINC to finish successfully and validate. If BOINC is ever restarted after the first (faulty) checkpoint is written, the units end immediately with "Output is empty". Since the units tend to take a long time, that is a tall order. I would add that I have yet to see one of these units form a quorum and validate. In fact, I don't believe I have ever seen a single result that did not have an "Output is empty" message (or some other error). Therefore, until Adam finds and fixes the checkpointing issue, these units are a big problem.
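For comparison, here is a minimal Python sketch of line-level checkpointing that does survive a restart: after each finished line the worker saves its position and the output so far, and on startup it resumes from the last saved position. The JSON layout, `process_line`, and the other function names are hypothetical, not the SZTAKI application's. Writing to a temporary file and then `os.replace`-ing it over the old checkpoint means a crash mid-write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def process_line(line: str) -> str:
    """Hypothetical stand-in for the real per-line computation."""
    return line.upper()


def save_checkpoint(path: str, state: dict) -> None:
    """Write the checkpoint atomically: temp file first, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"next_line": 0, "output": []}


def run(lines: list[str], checkpoint_path: str) -> list[str]:
    """Process lines, resuming after the last line recorded in the checkpoint."""
    state = load_checkpoint(checkpoint_path)
    for i in range(state["next_line"], len(lines)):
        state["output"].append(process_line(lines[i]))
        state["next_line"] = i + 1
        save_checkpoint(checkpoint_path, state)  # only whole lines are ever saved
    return state["output"]
```

With checkpoints only between lines, a single-line unit has nothing to save until it finishes; the best a restart could do is start that line over, rather than ending with empty output.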
                  ____________

                  John Hunt
                  Joined: Sep 15 06
                  Posts: 24
                  Credit: 20,933
                  RAC: 0
                  Message 5310 - Posted 26 Dec 2006 19:36:39 UTC


                    31 hours of crunching so far on this WU, and it is still showing 0.00%.
                    I've read a few of the comments in this forum about these WUs and have concluded that I would rather abandon the WU than spend 200+ CPU hours on it for no result at all. Both of the other computers on this WU seem to have an 'output is empty' message in their results. Does this mean that those two computers have wasted their time? Am I wasting my time?
                    http://szdg.lpds.sztaki.hu/szdg/result.php?resultid=276891
                    http://szdg.lpds.sztaki.hu/szdg/result.php?resultid=276892

                    Call me impatient if you wish. Call me unsuitable for this project.
                    As others have previously posted in these forums, I'm not prepared to work on a project that has so little going for it.


                    robert.mouris
                    Joined: Nov 3 05
                    Posts: 129
                    Credit: 4,124,194
                    RAC: 0
                    Message 5312 - Posted 26 Dec 2006 20:58:49 UTC - in response to Message 5310.

                      Last modified: 26 Dec 2006 21:01:07 UTC

                      Call me unsuitable for this project.

                      Distributed computing is free and voluntary. So you are not unsuitable for the project, the project is unsuitable for you.

                      Sorry to see you leaving, especially leaving with a negative feeling. I wish you more satisfaction with the other projects you are going to work for.

                      Robert
                      ____________

                      Chris Luth
                      Joined: Feb 14 06
                      Posts: 2
                      Credit: 2,616
                      RAC: 0
                      Message 5518 - Posted 22 Jan 2007 12:05:40 UTC

                        So I take it from this thread that my WUs that appear to be stuck (on one machine I'm sitting at 57.999% after 20 hours of processing, and on the other I'm at 30.000% after 15 hours) are not really stuck? I was about to abort, reset, and perhaps even detach from and reattach to the project (I can't remember, but I may have aborted one or two previous WUs too, thinking they were stuck), but I thought I should check the forums before doing so.

                        If I'm reading this thread correctly, then, as I said, they're not stuck--they're just in the middle of a certain line and will be updated when that line is finished processing. Can anyone confirm my analysis?

                        If I really am having problems, I'll reset the project, but if it's working as designed, I'll leave it be. It is a bit scary, though, that my RAC is 0.02 and no completed WUs are appearing in the results list (though this may be because I aborted a WU the other day and haven't completed one since then). Might I really be having a problem?
                        ____________

                        Stick
                        Joined: Jun 12 06
                        Posts: 193
                        Credit: 66,271
                        RAC: 0
                        Message 5519 - Posted 22 Jan 2007 13:24:47 UTC - in response to Message 5518.

                          Last modified: 22 Jan 2007 13:25:02 UTC

                          If I'm reading this thread correctly, then, as I said, they're not stuck--they're just in the middle of a certain line and will be updated when that line is finished processing. Can anyone confirm my analysis?


                          Your analysis is correct. You just need to "believe". But you should also read this and the messages that follow it. That is, the problem with writing disk "checkpoints" could cause your results to be invalid if you exit/restart BOINC while your WUs are "In progress".

                          ____________

                          robert.mouris
                          Joined: Nov 3 05
                          Posts: 129
                          Credit: 4,124,194
                          RAC: 0
                          Message 5520 - Posted 22 Jan 2007 13:43:39 UTC - in response to Message 5518.

                            If I'm reading this thread correctly, then, as I said, they're not stuck--they're just in the middle of a certain line and will be updated when that line is finished processing. Can anyone confirm my analysis?

                            I have never had a WU that was really "stuck". Sooner or later they did progress. I would leave yours crunching.
                            ____________

                            larry1186
                            Joined: Sep 25 06
                            Posts: 37
                            Credit: 18,502
                            RAC: 0
                            Message 5523 - Posted 22 Jan 2007 17:58:33 UTC - in response to Message 5520.

                              If I'm reading this thread correctly, then, as I said, they're not stuck--they're just in the middle of a certain line and will be updated when that line is finished processing. Can anyone confirm my analysis?

                              I have never had a WU that was really "stuck". Sooner or later they did progress. I would leave yours crunching.


                              I have had some (maybe one or two) WUs get stuck, but they weren't SZTAKI WUs. I say they were stuck because the CPU time was not incrementing even though the status was "running". SZTAKI WUs just seem stuck because the % complete doesn't move for hours upon hours, BUT the CPU time still climbs, indicating that they are doing something. Leave them crunching unless the CPU time stops incrementing while the status is "running".
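larry1186's rule of thumb can be written down as a small check. This is a hypothetical Python sketch: it assumes you already have periodic samples of a task's status and accumulated CPU time (how you collect them is left open), and `Sample`, `looks_stuck`, and the three-sample threshold are illustrative choices.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    status: str         # e.g. "running" or "suspended", as shown in the manager
    cpu_seconds: float  # accumulated CPU time reported for the task


def looks_stuck(samples: list[Sample], min_samples: int = 3) -> bool:
    """Flag a task only if it is "running" in each of the last min_samples
    samples and its CPU time did not increase across them.

    A frozen % complete alone is NOT treated as stuck: single-line
    units legitimately sit at 0.00% while CPU time keeps climbing.
    """
    recent = samples[-min_samples:]
    if len(recent) < min_samples:
        return False  # not enough evidence yet
    if any(s.status != "running" for s in recent):
        return False  # a suspended task is expected to stop accumulating CPU time
    return recent[-1].cpu_seconds <= recent[0].cpu_seconds
```

Requiring several consecutive samples avoids flagging a task on a single reading.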
                              ____________
                              Don't get distracted by shiny objects.

                              robert.mouris
                              Joined: Nov 3 05
                              Posts: 129
                              Credit: 4,124,194
                              RAC: 0
                              Message 5525 - Posted 22 Jan 2007 20:34:42 UTC

                                Last modified: 22 Jan 2007 20:34:59 UTC

                                Not exiting BOINC isn't enough. I should also make sure to avoid "No heartbeat from core client for 31 sec - exiting", as seen here. No credit in prospect. So we try again with a new replacement result, number 16! Will this be enough to get a quorum? We are heading dangerously towards the 10/20/10 limit.
                                ____________

                                Nightbird
                                Forum moderator
                                Joined: Jul 12 05
                                Posts: 920
                                Credit: 114,924
                                RAC: 0
                                Message 5526 - Posted 23 Jan 2007 2:07:36 UTC - in response to Message 5525.

                                  Last modified: 23 Jan 2007 2:09:00 UTC

                                  Not exiting BOINC isn't enough. I should also make sure to avoid "No heartbeat from core client for 31 sec - exiting", as seen here. No credit in prospect. So we try again with a new replacement result, number 16! Will this be enough to get a quorum? We are heading dangerously towards the 10/20/10 limit.

                                  Since all the results have the infamous "APP: Output is empty, placing msg in out.txt" in their stderr output, a quorum is presently not possible and no credit will be granted.

                                  ____________

                                  Philip Martin Kryder
                                  Joined: Apr 3 06
                                  Posts: 73
                                  Credit: 10,801
                                  RAC: 0
                                  Message 5527 - Posted 23 Jan 2007 2:37:35 UTC - in response to Message 5525.

                                    Not exiting BOINC isn't enough. I should also make sure to avoid "No heartbeat from core client for 31 sec - exiting", as seen here. No credit in prospect. So we try again with a new replacement result, number 16! Will this be enough to get a quorum? We are heading dangerously towards the 10/20/10 limit.


                                    Why dangerously?

                                    I've never seen a cogent argument for keeping those numbers "low."

                                    If the initial replication "works," then the others are neither needed nor sent. If the initial replication doesn't work, then the higher numbers are there to help.
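Philip's point can be made concrete with a simplified model. Assuming "10/20/10" refers to BOINC's per-workunit limits on error, total, and success results (the fields `max_error_results`, `max_total_results`, and `max_success_results`), the Python sketch below decides what happens next to one workunit. The decision order, the quorum of 2, and the function names are assumptions for illustration, not BOINC's actual transitioner logic.

```python
from dataclasses import dataclass


@dataclass
class WorkunitLimits:
    # Assumed reading of "10/20/10": max error / max total / max success results.
    max_error_results: int = 10
    max_total_results: int = 20
    max_success_results: int = 10
    min_quorum: int = 2  # hypothetical value; projects set their own quorum


def next_action(limits: WorkunitLimits, n_total: int, n_errors: int,
                n_success: int, n_agreeing: int, n_in_progress: int) -> str:
    """Simplified model of what happens to one workunit (not BOINC's real code).

    n_success:  results returned without a client error
    n_agreeing: size of the largest group of successful results that match
    """
    if n_agreeing >= limits.min_quorum:
        return "quorum reached: grant credit, send no more replicas"
    if n_errors >= limits.max_error_results:
        return "give up: too many error results"
    if n_success >= limits.max_success_results:
        return "give up: too many success results without a quorum"
    if n_total >= limits.max_total_results:
        return "give up: too many total results"
    if n_agreeing + n_in_progress >= limits.min_quorum:
        return "wait for outstanding results"
    return "issue another replica"
```

If the first replicas agree, none of the limits is ever reached; they only matter for units like the 21b342bd series, where results keep ending with "Output is empty" and no quorum can form.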



                                    ____________

                                    Chris Luth
                                    Joined: Feb 14 06
                                    Posts: 2
                                    Credit: 2,616
                                    RAC: 0
                                    Message 5539 - Posted 24 Jan 2007 5:47:40 UTC

                                      OK. On two of my three computers, BOINC runs solidly in the background, and they're up for 30, 60, 90 or more days at a time.

                                      The third (and my primary) computer is a laptop. I don't usually restart it (although occasionally it locks up or the battery dies or whatever, because I actually use it and do things to it), so I understand from these posts that some of those SZTAKI results may be error-prone. However, it does get put to sleep a lot, and of course BOINC cycles through the projects every few hours, suspending SZTAKI.

                                      I do have BOINC set to leave applications in memory. As long as neither the BOINC app itself nor the computer is shut down, will the SZTAKI WUs survive these suspensions (sleep and other projects running)?

                                      Is a solution to this problem near?

                                      I've been considering suspending or detaching from this project, as well as SETI@Home and SETI@Home Beta. I'm not convinced that we will ever find extraterrestrial life, and I'm not familiar enough with the mathematics behind this project to know whether it is really a great problem of our time, so I'm leaning towards making sure my cycles go to the projects that will help humanity most, like protein folding and other medically slanted projects that could help find a cure for cancer or other diseases, although I find Einstein@Home interesting, as physics and astronomy have always fascinated me.

                                      With that in mind, does someone want to attempt to convince me that SZTAKI is worthwhile? :-)

                                      It's fun reliving memories seeing the Hungarian here: I spent a week in Hungary and enjoyed both Buda and Pest (mostly Pest, sorry Buda) and some of the country hospitality in rural regions (driving through the eastern part on our way to Romania). My all-time favorite place name is Oktogon, translated in one (or more) of my English guidebooks as the oxymoronic "Octagon Square." (Not by my choice, I remember eating in a Burger King there, or at least nearby... I would have preferred authentic Hungarian food, but my traveling companion--who was paying for the food--is less interested in culinary exploration than I am...)
                                      ____________

                                      gwg
                                      Joined: Aug 1 06
                                      Posts: 58
                                      Credit: 213,886
                                      RAC: 0
                                      Message 5548 - Posted 24 Jan 2007 22:17:59 UTC - in response to Message 5539.

                                        ...

                                        I've been considering suspending or detaching from this project, as well as SETI@Home and SETI@Home Beta. I'm not convinced that we will ever find extraterrestrial life, and I'm not familiar enough with the mathematics behind this project to know whether it is really a great problem of our time, so I'm leaning towards making sure my cycles go to the projects that will help humanity most, like protein folding and other medically slanted projects that could help find a cure for cancer or other diseases, although I find Einstein@Home interesting, as physics and astronomy have always fascinated me.

                                        With that in mind, does someone want to attempt to convince me that SZTAKI is worthwhile? :-)


                                        I'm as frustrated as you are, but I can assure you that this work is fundamental to our understanding of numbers and very important to many fields of engineering and computer science, as well as being intellectually important in its own right.

                                        The results will take the form of new ways to do computation, particularly in coding theory, digital filters and encryption. Basically, to do anything with numbers (even counting), we have to find some way of representing them. As we know, the positional number system based on powers of ten, perfected in India and passed on to us by Arab scholars, is much more useful for doing computations than non-positional systems, such as Roman numerals.

                                        Some positional systems are better than others, both for performing computations and for revealing the fundamental structure of numbers. We are familiar with binary number systems because most computer arithmetic is done in base 2 or some power of 2. This is because computational operations are particularly simple in base 2, and because storage with 2 stable states tends to be more reliable than storage with more than 2.
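In a positional system each digit is weighted by a power of the base. A minimal Python sketch (the function names are illustrative) converts a non-negative integer to its digits and back; with base 2 the only digits needed are 0 and 1.

```python
def to_digits(n: int, base: int) -> list[int]:
    """Digits of a non-negative integer n in the given base, most significant first."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        n, d = divmod(n, base)  # peel off the least significant digit
        digits.append(d)
    return digits[::-1]


def from_digits(digits: list[int], base: int) -> int:
    """Evaluate a positional representation with Horner's rule."""
    value = 0
    for d in digits:
        value = value * base + d
    return value
```

For example, `to_digits(13, 2)` gives `[1, 1, 0, 1]`, i.e. 8 + 4 + 1, and `from_digits` turns it back into 13.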

                                        However, there are many ways of representing numbers based on 2, and some of them are not unique (this is also true of base ten, but we won't go into that). If a number has more than one representation, or if more than one number has the same representation, the system is in most cases useless for computational purposes.
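One simple way a base-2 system loses uniqueness is a redundant digit set. The illustrative Python sketch below (not one of the systems the project searches) lists all fixed-length digit strings, padded with leading zeros, that evaluate to a given integer: with digits {0, 1} the number 2 has one representation, while with digits {0, 1, 2} it has two.

```python
from itertools import product


def representations(n: int, base: int, digit_set: set[int],
                    length: int) -> list[tuple[int, ...]]:
    """All digit strings of the given length (most significant first)
    over digit_set whose positional value in `base` equals n."""
    found = []
    for digits in product(sorted(digit_set), repeat=length):
        value = 0
        for d in digits:
            value = value * base + d  # Horner's rule
        if value == n:
            found.append(digits)
    return found
```

`representations(2, 2, {0, 1, 2}, 3)` returns both `(0, 0, 2)` and `(0, 1, 0)`, so that system cannot tell you which string "is" 2.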

                                        What this project is doing is searching for unique representations in generalised number systems, discarding the ones that aren\'t unique. Why do so?
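The simplest setting for "generalized" is dimension 1: a base b with digit set {0, 1} is a binary number system when every integer has a finite expansion. (In higher dimensions the base is, as usually defined, an integer matrix and the digits are vectors; the sketch does not attempt that.) Because {0, 1} contains exactly one digit from each residue class mod 2, every division step is forced, so a finite expansion, when it exists, is unique. The Python sketch below, with illustrative names, only spot-checks a finite range of integers, so it illustrates rather than proves: base 2 fails because negative numbers cycle forever, while base -2 (negabinary) works.

```python
from typing import Optional


def expansion(n: int, base: int, max_steps: int = 200) -> Optional[list[int]]:
    """Digits (least significant first) of n in `base` with digit set {0, 1}.

    Requires |base| == 2, so each digit is forced by n mod 2. Returns None
    if the division process cycles instead of reaching 0, i.e. n has no
    finite expansion.
    """
    if abs(base) != 2:
        raise ValueError("this sketch only handles base 2 or -2")
    digits: list[int] = []
    seen = set()
    while n != 0:
        if n in seen or len(digits) >= max_steps:
            return None
        seen.add(n)
        d = n % 2            # the only digit in {0, 1} congruent to n mod 2
        digits.append(d)
        n = (n - d) // base  # exact division, since n - d is even
    return digits


def is_number_system(base: int, radius: int = 100) -> bool:
    """Spot-check (for |n| <= radius only) that every integer has a finite expansion."""
    return all(expansion(n, base) is not None for n in range(-radius, radius + 1))
```

For example, `expansion(-1, 2)` never terminates (it keeps returning to -1), whereas `expansion(-1, -2)` gives `[1, 1]`, since 1 + 1·(-2) = -1.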

                                        Well, the hope is to find a pattern in the distribution of unique number systems, and thence to develop theorems about the properties of unique number systems.

                                        A similar sort of challenge is work on prime numbers: if you study their distribution, you can not only develop better means of finding prime numbers, but you actually find out more about the nature of numbers themselves. For instance, there is a relationship between the distribution of prime numbers and the zeros of the Riemann zeta function. An understanding of why this should be so would give enormous insight into the nature of numbers (and hence of the nature of existence) itself.

                                        In a similar manner, knowing about unique number systems and why they are unique will give insights into the nature of numbers themselves.

                                        These comments are entirely my own, based on my background as a non-mathematician who has nevertheless been involved in algorithms for computer arithmetic, algorithms for digital correlators and filters, encryption, and computational algorithms for experimental physics. They do not represent the mathematicians for whom the SZTAKI computations are being done, and I have no contact with them.

                                        George
                                        ------
                                        ____________
                                        Dr George W Gerrity
                                        4 Coral Place
                                        Campbell, ACT 2612
                                        AUSTRALIA

                                        Ph: +61 2 6156 0286
                                        Time: +10 hours (ref GMT)
                                        PGP RSA Public Key Fingerprint:
                                        73EF 318A DFF5 EB8A 6810 49AC 0763 AF07






                                        Copyright © 2017 SZTAKI Desktop Grid