I haven't received any work for Sztaki anymore



Message boards : Észrevételek, tapasztalatok (Comments, experiences) : I haven't received any work for Sztaki anymore

Kurt D.C.
Joined: Feb 24 06
Posts: 3
Credit: 4,010
RAC: 0
Message 2724 - Posted 20 May 2006 17:42:01 UTC - in response to Message 1650.

    Problems with the BOINC client:

    Options set to 95% SETI, 5% to Sztaki.

    Since March 5th I haven't received any work for Sztaki anymore.

    Kurt.
    ____________

    Odysseus
    Joined: Feb 27 06
    Posts: 212
    Credit: 221,397
    RAC: 0
    Message 2725 - Posted 20 May 2006 20:07:28 UTC - in response to Message 2724.

      Since March 5th I haven't received any work for Sztaki anymore.

      Could you post twenty lines or so of messages from the log, beginning where the BOINC client is launched?
      ____________

      Nightbird
      Forum moderator
      Joined: Jul 12 05
      Posts: 920
      Credit: 114,924
      RAC: 0
      Message 2726 - Posted 20 May 2006 21:05:27 UTC

        Last modified: 20 May 2006 21:12:44 UTC

        Thread moved from Ad@m's BLOG (page 3) ;)
        ____________

        Nightbird
        Forum moderator
        Joined: Jul 12 05
        Posts: 920
        Credit: 114,924
        RAC: 0
        Message 2733 - Posted 22 May 2006 0:01:16 UTC

          Ehm, you have received 50 WUs now.
          ____________

          Rakarin
          Joined: Feb 4 06
          Posts: 17
          Credit: 46,513
          RAC: 0
          Message 2737 - Posted 22 May 2006 4:37:55 UTC - in response to Message 2724.

            Last modified: 22 May 2006 4:39:43 UTC

            Problems with the BOINC client:
            Options set to 95% SETI, 5% to Sztaki.
            Since March 5th I haven't received any work for Sztaki anymore.
            Kurt.



            [Edit: Never mind. I just saw the post about bad WUs from March 8 & 9.]


            I know it's a bit late now...

            Around mid-March, I noticed my Linux PC wasn't running any SZTAKI work units. I waited a few days to see if the server was down. Nothing for Linux, no errors, but I was getting work units on my Windows and Mac PCs. I reset the project on that computer, and it's been working fine since.

            Is it possible there were some bad work units around that time?

            Not that it makes much difference now.

            ____________

            Ananas
            Joined: Jul 12 05
            Posts: 222
            Credit: 665,833
            RAC: 0
            Message 2740 - Posted 22 May 2006 8:43:23 UTC

              Last modified: 22 May 2006 8:45:51 UTC

              That sounds very much like a problem with piled-up debt.

              BOINC accumulates debt for a project whenever it does not run, for whatever reason: for example, when the project is out of work, when a different project runs in "panic mode" (earliest deadline first), or when it is set to "no new work".

              The projects that run in its place receive negative debt during this phase.

              If a project has accumulated too much negative debt, BOINC stops downloading enough work to fill the cache, and eventually it stops downloading any work from that project at all.
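To make the debt mechanism concrete, here is a toy model in Python. It is a simplified sketch under assumed rules, not BOINC's actual scheduler code: each project is owed CPU time in proportion to its resource share, and its debt changes by the time it was owed minus the time it actually ran. The function `update_debts` and the hourly interval are hypothetical; only the 95%/5% shares come from this thread.

```python
# Toy model of long-term debt accounting (simplified illustration,
# NOT BOINC's scheduler code): debt += time owed - time actually used.

def update_debts(debts, shares, cpu_used, interval):
    """Return new debts after one scheduling interval.

    debts    -- dict project -> current debt (CPU seconds)
    shares   -- dict project -> resource share (any positive weights)
    cpu_used -- dict project -> CPU seconds the project actually ran
    interval -- total CPU seconds available in the interval
    """
    total_share = sum(shares.values())
    new = {}
    for project, debt in debts.items():
        owed = interval * shares[project] / total_share
        new[project] = debt + owed - cpu_used.get(project, 0.0)
    return new


# Shares as in Kurt's setup. Suppose SZTAKI has no work for a whole day,
# so SETI runs for every hour instead.
debts = {"SETI@home": 0.0, "SZTAKI": 0.0}
shares = {"SETI@home": 95, "SZTAKI": 5}
for _ in range(24):  # 24 hourly intervals
    debts = update_debts(debts, shares, {"SETI@home": 3600.0}, 3600.0)
print(debts)  # {'SETI@home': -4320.0, 'SZTAKI': 4320.0}
```

In this example all CPU time is used, so SZTAKI's positive debt is exactly matched by SETI's negative debt.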

              This can be fixed without a project reset, either by editing client_state.xml and setting all <long_term_debt> and <short_term_debt> values to 0 (stop BOINC before you modify client_state.xml), or by using a modified BOINC core client.
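For the client_state.xml route, a small script can do the edit. This is a minimal sketch, assuming the values appear as plain `<long_term_debt>` and `<short_term_debt>` elements (the tag names come from the post above; the regex approach and the helper names `zero_debts` and `reset_file` are my own). Stop the BOINC client before running it; it keeps a `.bak` copy of the file.

```python
import re
import shutil
import sys

# Matches <long_term_debt>VALUE</long_term_debt> or the short-term
# equivalent; group 1 is the tag name, reused for the closing tag.
DEBT_RE = re.compile(r"<(long_term_debt|short_term_debt)>[^<]*</\1>")


def zero_debts(xml_text):
    """Return xml_text with every long- and short-term debt set to 0."""
    return DEBT_RE.sub(r"<\1>0.000000</\1>", xml_text)


def reset_file(path):
    """Zero all debts in a client_state.xml file, keeping a .bak copy.

    Only run this while the BOINC client is stopped.
    """
    shutil.copy2(path, path + ".bak")
    # latin-1 maps every byte to one character, so bytes outside the
    # edited tags are written back unchanged; newline="" keeps the
    # original line endings.
    with open(path, encoding="latin-1", newline="") as f:
        text = f.read()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(zero_debts(text))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        reset_file(sys.argv[1])
    else:
        print("usage: python reset_debts.py PATH/TO/client_state.xml")
```

Usage (the script name is hypothetical): save it as `reset_debts.py`, stop BOINC, run it against client_state.xml in the BOINC data directory, then start BOINC again.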

              The optimized BOINC client by Trux has a command-line option for resetting that debt, and there is a simple way to patch any BOINC client so that a restart resets all debts (this is what I am using).

              The Berkeley people are aware of the problem and will redesign the scheduler soon.

              Keck_Komputers
              Joined: Sep 18 05
              Posts: 16
              Credit: 7,898
              RAC: 0
              Message 2741 - Posted 22 May 2006 13:49:55 UTC - in response to Message 2740.


                The Berkeley people are aware of the problem and will redesign the scheduler soon.


                I think you are mistaken here. This is not a problem; it is how BOINC enforces the resource shares set by the participant. A scheduler redesign is expected in the next client; however, I do not believe it will include any major changes to LTD (long-term debt) or to how it is used to respect resource shares.
                ____________
                BOINC WIKI

                BOINCing since 2002/12/8

                Ananas
                Joined: Jul 12 05
                Posts: 222
                Credit: 665,833
                RAC: 0
                Message 2742 - Posted 22 May 2006 14:41:38 UTC

                  Last modified: 22 May 2006 14:47:41 UTC

                  It is a problem of the scheduler, as there is no limit on that debt.

                  Projects like LHC need computing power only now and then, but when they need it, they need a lot of it. This makes people give LHC a very high priority, so that IF it kicks in, it crunches a lot.

                  Or projects like Orbit: people have attached to it hoping to get work some day within the next 20 years, but it already influences the caching behaviour now, which is not such a good idea.

                  Another case where caching stops working after a while: I have one Sulphur task running on a dual Xeon with HT. To ensure that this task runs all the time, it gets a little more than 25% of the resources. This "little more" piles up debt for CPDN and negative debt for the other projects, so after a while the other projects don't fill their caches anymore.
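The arithmetic behind the "little more" can be sketched with assumed numbers: 4 logical CPUs (a dual Xeon with HT, which also fits the 25% figure being one CPU), a CPDN share of 26% standing in for "a little more than 25%", and a single-threaded Sulphur task that can never use more than one CPU. The function `debt_per_hour` is hypothetical and applies a simplified owed-minus-used rule, not BOINC's exact formula.

```python
def debt_per_hour(share, logical_cpus, cpus_usable):
    """CPU-hours of debt a project accrues per wall-clock hour when it is
    owed share * logical_cpus CPUs but can only use cpus_usable of them."""
    return share * logical_cpus - cpus_usable


# Assumed values: 26% share, 4 logical CPUs, one single-threaded task.
per_hour = debt_per_hour(0.26, 4, 1.0)
per_month = per_hour * 24 * 30
print(round(per_hour, 2), round(per_month, 1))  # 0.04 28.8

# The other projects are owed 0.74 * 4 = 2.96 CPUs but run on 3,
# so together they accrue the same amount as negative debt.
print(round(debt_per_hour(0.74, 4, 3.0), 2))  # -0.04
```

Under these assumptions the imbalance never stops growing, which is the point of the post: a share that is slightly too high keeps adding debt for as long as the setup runs.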


                  The scheduler should not pile up unlimited debt for projects that have no work; long_term_debt needs to be capped.

                  Everything would be fine if long_term_debt faded away over time, similar to RAC, so that 10-year-old debt wouldn't count so much today.
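Both suggestions, a cap and a RAC-like fade, are easy to express. A minimal sketch, assuming exponential decay with RAC's one-week half-life and an arbitrary cap of one day of CPU time; neither constant nor the function names (`decay_debt`, `clamp_debt`) come from BOINC.

```python
import math

HALF_LIFE_DAYS = 7.0  # assumption: borrow RAC's one-week half-life
DEBT_CAP = 86400.0    # hypothetical cap: one day of CPU time, in seconds


def decay_debt(debt, elapsed_days, half_life=HALF_LIFE_DAYS):
    """Shrink a debt exponentially toward zero, the way RAC decays."""
    return debt * math.pow(0.5, elapsed_days / half_life)


def clamp_debt(debt, cap=DEBT_CAP):
    """Keep a debt within [-cap, +cap]."""
    return max(-cap, min(cap, debt))


# A debt of 100000 s left alone for two weeks falls to a quarter.
print(decay_debt(100000.0, 14.0))  # 25000.0
print(clamp_debt(-250000.0))       # -86400.0
```

With decay, old debt loses weight on its own; the cap additionally bounds how far a project can fall behind in the meantime.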


                  In theory, it's of course a problem of "wrong" resource share settings, but the program needs to be able to handle it, as the "right" settings exist only in theory.

                  Kurt D.C.
                  Joined: Feb 24 06
                  Posts: 3
                  Credit: 4,010
                  RAC: 0
                  Message 2743 - Posted 22 May 2006 23:41:43 UTC - in response to Message 2724.

                    Problems with the BOINC client:

                    Options set to 95% SETI, 5% to Sztaki.

                    Since March 5th I haven't received any work for Sztaki anymore.

                    Kurt.


                    When I stop SETI from receiving new units, I receive Sztaki units again. The problem is that a project sends too much work (at least the BOINC client thinks it's too much). In fact, every unit has a much larger estimated time than it actually takes to process. This makes the BOINC client switch to the overcommitted state and to earliest-deadline-first processing.

                    5/22/2006 5:27:19 PM|SZTAKI Desktop Grid|Throughput 30306 bytes/sec
                    5/22/2006 5:55:52 PM||Allowing work fetch again.
                    5/22/2006 6:03:23 PM||Rescheduling CPU: application exited
                    5/22/2006 6:03:23 PM|SZTAKI Desktop Grid|Computation for task 4883f9e4-ac07-4faf-9ce8-a4c3e82c263d_1 finished
                    5/22/2006 6:03:23 PM|SZTAKI Desktop Grid|Starting task a5faec8e-96bb-481c-9ef9-523631cd9156_1 using search version 110
                    5/22/2006 6:03:24 PM||Suspending work fetch because computer is overcommitted.
                    5/22/2006 6:03:25 PM|SZTAKI Desktop Grid|Started upload of file 4883f9e4-ac07-4faf-9ce8-a4c3e82c263d_1_0
                    5/22/2006 6:03:27 PM|SZTAKI Desktop Grid|Finished upload of file 4883f9e4-ac07-4faf-9ce8-a4c3e82c263d_1_0
                    5/22/2006 6:03:27 PM|SZTAKI Desktop Grid|Throughput 26028 bytes/sec
                    5/22/2006 6:04:40 PM||Allowing work fetch again.
                    5/22/2006 6:04:50 PM||Suspending work fetch because computer is overcommitted.
                    5/22/2006 6:04:55 PM||Allowing work fetch again.
                    5/22/2006 6:12:53 PM||Rescheduling CPU: application exited
                    5/22/2006 6:12:53 PM|SZTAKI Desktop Grid|Computation for task 3eabf159-4166-4dda-90d3-68cfaa7ee55b_1 finished
                    5/22/2006 6:12:53 PM||Resuming round-robin CPU scheduling.
                    5/22/2006 6:12:53 PM|SETI@home|Restarting task 26fe99aa.5731.17680.272152.3.142_2 using setiathome_enhanced version 512
                    5/22/2006 6:12:53 PM|SETI@home|Resuming task 30mr99aa.29736.11506.254826.3.201_0 using setiathome_enhanced version 512
                    5/22/2006 6:12:53 PM|SZTAKI Desktop Grid|Pausing task a5faec8e-96bb-481c-9ef9-523631cd9156_1 (removed from memory)
                    5/22/2006 6:12:55 PM|SZTAKI Desktop Grid|Started upload of file 3eabf159-4166-4dda-90d3-68cfaa7ee55b_1_0
                    5/22/2006 6:12:59 PM|SZTAKI Desktop Grid|Finished upload of file 3eabf159-4166-4dda-90d3-68cfaa7ee55b_1_0
                    5/22/2006 6:12:59 PM|SZTAKI Desktop Grid|Throughput 27186 bytes/sec
                    5/22/2006 9:38:25 PM||Rescheduling CPU: application exited
                    5/22/2006 9:38:25 PM|SETI@home|Computation for task 26fe99aa.5731.17680.272152.3.142_2 finished

                    So it is in 'Allowing work fetch again', it uploads a result, and it ends up in the overcommitted state without having fetched any units. Strange, isn't it?
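Reading a long log by eye makes these transitions easy to miss. A small helper (hypothetical name `work_fetch_events`) can pull out just the work-fetch state changes, assuming every line has the `date time|project|message` layout shown in the log above.

```python
def work_fetch_events(log_text):
    """Return (timestamp, state) pairs for work-fetch changes in a BOINC
    message log whose lines look like 'date time|project|message'."""
    events = []
    for line in log_text.splitlines():
        parts = line.split("|", 2)
        if len(parts) != 3:
            continue  # not a log line in the expected layout
        when, _project, message = (p.strip() for p in parts)
        if message.startswith("Suspending work fetch"):
            events.append((when, "suspended"))
        elif message.startswith("Allowing work fetch"):
            events.append((when, "allowed"))
    return events


# A few lines copied from the log above.
excerpt = """\
5/22/2006 5:55:52 PM||Allowing work fetch again.
5/22/2006 6:03:24 PM||Suspending work fetch because computer is overcommitted.
5/22/2006 6:04:40 PM||Allowing work fetch again.
5/22/2006 6:04:50 PM||Suspending work fetch because computer is overcommitted.
5/22/2006 6:04:55 PM||Allowing work fetch again.
"""
for when, state in work_fetch_events(excerpt):
    print(when, state)
```

Run over the full log, this shows the client flipping between the two states within minutes.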
                    ____________

                    Kurt D.C.
                    Joined: Feb 24 06
                    Posts: 3
                    Credit: 4,010
                    RAC: 0
                    Message 2744 - Posted 23 May 2006 0:15:07 UTC - in response to Message 2733.

                      Ehm, you have received 50 WUs now.


                      Yes, I finally received the 50 WUs after detaching and reattaching the project.

                      Now 49 of the 50 have been processed, and still no new WUs are being fetched (even though the SETI project is set to 'No new tasks').

                      I guess the scheduler is based on far too complicated logic. Perhaps that's also why the number of server connections is so high (on average 4 per WU for SETI since switching to BOINC).
                      ____________

                      Ananas
                      Joined: Jul 12 05
                      Posts: 222
                      Credit: 665,833
                      RAC: 0
                      Message 2745 - Posted 23 May 2006 1:04:43 UTC

                        Last modified: 23 May 2006 1:16:17 UTC

                        A smaller cache might help ("Overcommitted" is always a sign of a far too large cache). With 2 or 3 projects attached to a computer, it shouldn't need more than a day of caching (I'm even using less than half a day).


                        Remember, BOINC tries to fill the whole cache size for each project, so a 5-day cache with 2 projects is already a 10-day cache. Well, it isn't exactly like this; I think it tries to respect the resource share up to some point, but that does not always work so well.
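The two readings of the cache setting can be compared with a small sketch. Both models below are simplifications for illustration; the function `cache_days_per_project` and its `respect_shares` flag are hypothetical, not BOINC settings. In one model every project fills the whole cache; in the other, the cache is split by resource share. Kurt's 95%/5% shares and a 5-day cache are used as the example.

```python
def cache_days_per_project(cache_days, shares, respect_shares):
    """Days of work each project would queue under two simple models."""
    if respect_shares:
        total = sum(shares.values())
        return {p: cache_days * s / total for p, s in shares.items()}
    # Naive model: each project fills the full cache on its own.
    return {p: float(cache_days) for p in shares}


shares = {"SETI@home": 95, "SZTAKI": 5}
naive = cache_days_per_project(5, shares, respect_shares=False)
split = cache_days_per_project(5, shares, respect_shares=True)
print(naive, sum(naive.values()))  # 5 days each, 10 days in total
print(split, sum(split.values()))  # 4.75 + 0.25 = 5 days in total
```

As the post says, the real client lands somewhere between these two extremes.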


                        Now, SETI Enhanced is new, and your BOINC client does not know its new duration correction factor yet. If the project estimated the time needed too low (caused by a high benchmark value), the first few WUs will update this correction factor, and it might turn out that (for example) the 5 days of SETI cache you downloaded are enough for 8 days of work, which (together with 5 days for SZTAKI) already makes 13 days of total cache.

                        This is when BOINC goes into panic mode, i.e. it chooses WUs by deadline, not by priority. And, as there is already too much work in the cache, it will of course refuse to download any more.
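The chain from a wrong estimate to panic mode can be sketched as a deadline check. This is a simplified single-CPU model, not BOINC's actual scheduling simulation: each task's runtime is its estimate times a per-project correction factor, tasks are simulated in earliest-deadline-first order, and if any task would finish late the client switches to EDF and stops fetching work. The deadlines, the factor of 1.6, and all names here are hypothetical, chosen to reproduce the 5 + 8 = 13 days example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    project: str
    est_hours: float       # project's own runtime estimate
    deadline_hours: float  # hours from now until the report deadline


def runtime(task, factors):
    # Corrected runtime: estimate times the project's correction factor.
    return task.est_hours * factors.get(task.project, 1.0)


def total_days(tasks, factors):
    return sum(runtime(t, factors) for t in tasks) / 24


def would_miss_deadline(tasks, factors):
    """Run the queue on one CPU in earliest-deadline-first order.
    On a single CPU, if EDF misses a deadline, every other order does too."""
    clock = 0.0
    for t in sorted(tasks, key=lambda t: t.deadline_hours):
        clock += runtime(t, factors)
        if clock > t.deadline_hours:
            return True
    return False


def scheduling_mode(tasks, factors):
    if would_miss_deadline(tasks, factors):
        return "panic: earliest deadline first, no work fetch"
    return "normal: resource-share order, work fetch allowed"


# 5 days of SETI and 5 days of SZTAKI, as estimated by the projects.
cache = [
    Task("SETI@home", est_hours=120, deadline_hours=288),
    Task("SZTAKI", est_hours=120, deadline_hours=168),
]
for factor in (1.0, 1.6):
    f = {"SETI@home": factor}
    print(factor, total_days(cache, f), scheduling_mode(cache, f))
```

With the factor at 1.0 the queue fits (10 days of work); once the factor is learned as 1.6, the same cache holds 13 days of work, the SETI task would finish late, and the sketch reports panic mode.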
                        ____________________

                        By the way, a project reset discards the correction factor, so the BOINC client has to learn it again.

                        A share of only 5% is very small, too; this surely influences the caching behaviour.






                        Copyright © 2017 SZTAKI Desktop Grid