
Welcome to Smart Home Forum by FIBARO


Posted (edited)
19 minutes ago, tinman said:

So the question is rather: is the QuickApp code broken because something caused an overflow and data corruption, or does the code contain characters that break the pipe?

Interesting.

 

Btw, it seems to be some cron job that restarts the QA every minute (on crash). Or rather, it really likes to restart the QA 1 s past the whole minute for me, no matter when it crashes.
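A minimal Python sketch of the timing described above, assuming (hypothetically) that the supervisor behaves like a cron job that fires at second 1 of every minute and restarts whatever QA has crashed. `next_restart` is a made-up name; HC3's actual watchdog is not documented in this thread.

```python
from datetime import datetime, timedelta


def next_restart(crash_time: datetime) -> datetime:
    """Return the first "whole minute + 1 s" tick strictly after crash_time.

    Hypothetical model: the watchdog only runs at hh:mm:01, so the restart
    time depends on the clock, not on how long ago the crash happened.
    """
    candidate = crash_time.replace(second=1, microsecond=0)
    if candidate <= crash_time:
        candidate += timedelta(minutes=1)
    return candidate


crash = datetime(2022, 1, 20, 12, 0, 30)
print(next_restart(crash))  # 2022-01-20 12:01:01
```

Under this model, crashes at 12:00:02 and at 12:00:59 are both restarted at 12:01:01, which would match the observation that the restart moment does not depend on when the crash happened.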

Edited by jgab
  • Topic Author
  • Posted
    3 minutes ago, jgab said:

    Interesting.

     

    Btw, it seems to be some cron job that restarts the QA every minute - or rather, it really likes to restart the QA 1s past the whole minute for me - no matter when it crashes.

    Restarting the QA by cron looks like a woodpecker? Hoping that the issue in the QA will be solved by praying?

    So we have a situation with low memory, for example, and the cron restarts try to put the last nail in the coffin; the next stop is a 502.

    Posted

    Thanks guys. Really appreciate it. @jgab Yeah, the "one minute" watchdog thing makes sense in explaining the repeated (nonsensical) error. Like @10der says, it has the signature of a "leak", but I would like to point out that it has to be something "localised", because the error does not seem to affect the BUI or scenes or any other part of HC3. @tinman interesting data, do you happen to know what kind of "pipe" we're talking about? I mean, is it a Linux pipe or some TCP socket thingy? I was just wondering what gets "killed" when this error gets logged and which part stays up (threads? processes?). To me it sounds as if there is an A and a B, and A detects a problem with B and reloads or restarts A, but that does not solve the problem because B behaves in exactly the same way as before. Rebooting kills A and B and solves the problem.

     

    1 minute ago, 10der said:

    Restarting the QA by cron looks like a woodpecker? Hoping that the issue in the QA will be solved by praying?

    I think I get what you mean: if you do a "divide by zero" in your scene, you can restart as many times as you like and you'll get the same error.

     

    On the other hand... If the cause is external then it does make sense.

     

    It's not bad. But I cannot think of anything better from a "supervision" point of view. Except... of course... catching some post-mortem information and giving some error message LOL
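To make the supervision idea concrete, here is a small Python sketch of a supervisor that restarts a crashing task but also keeps the post-mortem information mentioned above, and gives up when the same error repeats. It is not how HC3 works internally; `supervise`, its restart limit, and the "same error twice" rule are hypothetical choices for illustration.

```python
import traceback


def supervise(task, max_restarts=3):
    """Run `task`, restarting it after each crash and recording post-mortems.

    Stops early when the same error occurs twice in a row: restarting a
    deterministic fault (such as a divide by zero) cannot fix it.
    """
    post_mortems = []  # list of (short description, full traceback text)
    for _ in range(max_restarts + 1):
        try:
            return task(), post_mortems
        except Exception as exc:
            record = f"{type(exc).__name__}: {exc}"
            post_mortems.append((record, traceback.format_exc()))
            if len(post_mortems) >= 2 and post_mortems[-2][0] == record:
                break  # identical failure again: stop and report instead
    return None, post_mortems


def divide_by_zero():
    return 1 / 0


result, log = supervise(divide_by_zero)
print(result, [r for r, _ in log])
```

The point of the design is that the restart loop still exists, but each crash leaves a readable message and traceback behind instead of only a silent restart.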

    Posted (edited)

    I vaguely remember reading about "Luabind" exceptions and if you want to waste some time on a lazy afternoon then this might be the ultimate read :D 

     

    Please login or register to see this link.

     

    TL;DR

     

    So, section 13, "Exceptions":

     

    ... If any of the functions you register throws an exception when called, that exception will be caught by luabind and converted to an error string

    ... If the exception is unknown, a generic string saying that the function threw an exception will be pushed.

    ... The state function returns a pointer to the Lua state in which the error was thrown. This pointer may be invalid if you catch this exception after the lua state is destructed. If the Lua state is valid you can use it to retrieve the error message from the top of the Lua stack.
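A loose Python analogy of the behaviour quoted above (luabind itself is C++ and is not shown here): a "binding layer" that calls a registered function and turns any exception into an error string. `KnownError` stands in for exceptions the layer knows how to describe; everything else collapses into a generic message. All names and the generic wording are hypothetical.

```python
GENERIC_MESSAGE = "function threw an exception"  # hypothetical wording


class KnownError(Exception):
    """Stand-in for exception types the binding layer can describe."""


def call_registered(fn, *args):
    """Call fn; on failure return (False, error string) instead of raising."""
    try:
        return True, fn(*args)
    except KnownError as exc:
        return False, str(exc)  # known type: its message survives
    except Exception:
        return False, GENERIC_MESSAGE  # unknown type: the detail is lost here


def closed_socket():
    raise KnownError("socket closed")


def bad_parse():
    raise ValueError("unexpected character at line 12")


print(call_registered(closed_socket))  # (False, 'socket closed')
print(call_registered(bad_parse))      # (False, 'function threw an exception')
```

Note that `bad_parse` carried a perfectly useful message; it disappears only because the layer does not recognise the exception type.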

     

    So even without ever having compiled and used "luabind", I think there are many ways in which error messages can just disappear at this level, and there are a few more levels.
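A purely illustrative Python sketch of how a message can vanish across several levels (this is not HC3's code; the layer names are hypothetical, and only the "Unknown error occurred:" text is taken from this thread). Each layer catches and re-raises; the middle one keeps the cause with `from exc`, the outer one drops it with `from None`.

```python
def inner_layer():
    # The real cause, with a useful message.
    raise KeyError("quickApp.lua")


def middle_layer():
    try:
        inner_layer()
    except Exception as exc:
        # `from exc` sets __cause__, so the original error stays attached.
        raise RuntimeError("QA failed to load") from exc


def outer_layer():
    try:
        middle_layer()
    except Exception:
        # `from None` sets __suppress_context__, so the printed traceback
        # shows only this generic error and hides everything below it.
        raise RuntimeError("Unknown error occurred:") from None


try:
    outer_layer()
except RuntimeError as err:
    top = err

print(top, top.__cause__)  # Unknown error occurred: None
```

The useful `KeyError` is still reachable through `top.__context__`, but nobody reading the log would ever see it.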

     

    This, however, does not explain why the crashes repeat.

     

    Of course, if this was Python, someone might just have written this code somewhere

     

    Please login or register to see this code.

     

    It's never easy. As someone said, "you can hide complexity, but you cannot make it go away".

    Edited by petergebruers
    Posted (edited)
    39 minutes ago, petergebruers said:

    interesting data

     

    Let's look at that in more detail.

     

    39 minutes ago, petergebruers said:

    you happen to know what kind of "pipe" we're talking? I mean is it at Linux pipe or some TCP socket thingy?

     

    The OS is read-only, so things run from tmp; the main QA code of each QA is loaded from there,

     

     

    Please login or register to see this image.


     

    piped together with the user part of the code (the ~include part)

     

    Please login or register to see this image.

     

    and parsed line by line into startluaenvironment (there are lots of checks to catch errors... the last one is the unknown error)
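A hypothetical Python sketch of the flow described above: the main QA code and its include (user) parts are read from tmp, joined, and then fed line by line through a check. The file names, `assemble_qa_source`, and the "control character" check are assumptions made for illustration; the real startluaenvironment step is not shown in this thread.

```python
import tempfile
from pathlib import Path


def assemble_qa_source(main_file: Path, include_files: list) -> str:
    """Concatenate the main QA code with its include ("user") parts."""
    parts = [main_file.read_text(encoding="utf-8")]
    parts += [p.read_text(encoding="utf-8") for p in include_files]
    return "\n".join(parts)


def check_lines(source: str) -> list:
    """Walk the source line by line and report suspicious characters.

    Hypothetical check: flag control characters (other than tab) that could
    confuse a line-oriented pipe, e.g. NUL bytes or stray carriage returns.
    """
    problems = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        bad = [hex(ord(c)) for c in line if ord(c) < 32 and c != "\t"]
        if bad:
            problems.append((line_no, bad))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    main = Path(tmp) / "main.lua"
    user = Path(tmp) / "user.lua"
    main.write_text("function QuickApp:onInit()\nend", encoding="utf-8")
    user.write_text("local x = 1\x00\nlocal y = 2", encoding="utf-8")
    source = assemble_qa_source(main, [user])

found = check_lines(source)
print(found)  # [(3, ['0x0'])]
```

A check like this only tells you *where* a bad character sits in the combined stream; it cannot tell whether it came from a corrupted tmp file or from the piping itself.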

     

    Please login or register to see this attachment.

     

     

     

    39 minutes ago, petergebruers said:

    I was just wondering what gets "killed" when this error gets logged and which part stays up (threads? processes?). To me it sounds as if there is an A and a B, and A detects a problem with B and reloads or restarts A, but that does not solve the problem because B behaves in exactly the same way as before. Rebooting kills A and B and solves the problem.

     

    So I can imagine that when names are broken (remember the space/special-character issue in Lua QA files/libs?) things could go bad, but since your QA runs all the time, that is not the case here. Wrong characters would be there all the time as well, unless the code got replaced somehow (I'm thinking aloud about API calls that replace code from external tools; remember the 502?).

     

    If the content of these tmp files is broken, only a reboot can fix it, which makes sense given how read-only systems work.

     

    However, is the content broken, or is it piped wrong? Support can of course log in and check these tmp files to see whether they are broken, but that does not solve the problem that they are broken (the question is why they break). If they are not broken, it is again not easy to track, because the question remains why they can't be piped together when they are not broken... Of course, it could also be that the setTimeout part of the code is piped or executed in a corrupted state.
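One way to separate "the content is broken" from "it is piped wrong" is to compare a hash of the tmp copy with a hash of the stored source: if they match, the file is fine and the problem lies in the piping or execution. A minimal Python sketch, assuming a reference copy of the QA code is available (for example fetched by some other route); `tmp_copy_is_intact` and the file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def tmp_copy_is_intact(reference: bytes, tmp_file: Path) -> bool:
    """True if the tmp copy is byte-for-byte identical to the stored code."""
    return hashlib.sha256(reference).hexdigest() == sha256_of(tmp_file)


stored = b"function QuickApp:onInit() end\n"
with tempfile.TemporaryDirectory() as tmp:
    tmp_file = Path(tmp) / "qa_main.lua"
    tmp_file.write_bytes(stored)
    intact_before = tmp_copy_is_intact(stored, tmp_file)
    tmp_file.write_bytes(stored[:-5])  # simulate a truncated tmp copy
    intact_after = tmp_copy_is_intact(stored, tmp_file)

print(intact_before, intact_after)  # True False
```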

     

    While generally one could say "something is wrong here", I know that I fixed these errors in my QuickApps by catching errors and thinking a bit more about errors coming from e.g. TCP/HTTP.
    I know that there was a memory leak in 5.040, which caused massive errors after months of runtime when one tried to change QuickApp code (while the tmp code was still OK and running), but that was fixed a long time ago as well, and that was not the "unknown" thing alone.

     

    I think Fibaro could spend some hours on that (even if this is, 99.7%, an error only for @10der), but I think they will need access to the affected HC3 before it gets rebooted. Oh sh***t, is this what you proposed, @petergebruers?

     

     

     

     

    Edited by tinman
    Posted

    I gave up...

    I will move my automation system to HASS.

    HC3 will work as a coordinator for me.

  • Topic Author
  • Posted
    54 minutes ago, rangee said:

    I gave up...

    I will move my automation system to HASS.

    HC3 will work as a coordinator for me.

    * I am addicted to the HC3 mobile app

    * I want to see all my Zigbee devices in HC3

     

    But yes, no automation in HC3.

    Yesterday I removed the last one, GPS (the lighting when I'm near home), because the HC3 mobile app works through the part of the body we sit on!

     

    Posted (edited)
    9 hours ago, petergebruers said:

    Well, that was kind of the whole point why I reported it to helpdesk, because it is hard to reproduce and my HC3 had such QA in such a "weird state". It happens 3 times per year. I offered Fibaro a way to look into this and they said "you have to use paid support for that".

     

    I guess it's their right to think that it's your code that is the problem. You get an error "Unknown error occurred:" even if the error message seems to be "chopped" off.

    The error is probably not that common so an investigation ends up at the bottom of the back-log...

    If we could come up with an example that always generate the error it may get more attention...?

     

    When you provide code that others use, it quickly becomes clear (to a normal organization) that it's worth investing in good error messages for the users of the code. It drastically reduces the support effort needed if users can figure out for themselves what went wrong. The "Silent restart" and the "Unknown error occurred:" clearly fall under this, even if the "Silent restart" seems to be more common.

    "Silent restart" is easy to fix, and for "Unknown..." they seem to have a clue what it is...

    If I were sitting on the money at Fibaro, I would see a business case to fix it asap. Besides the cost of unnecessary support, what is at stake is also the attractiveness of the HC3 as a developer platform going forward... and we now see good developers increasingly turning their backs on the platform.

     

     

    Edited by jgab
    Posted

    I rarely see the unknown error. It's not a problem. But memory use increases every time.

    After 2 days, it shows over 90%.

    There are some memory leaks. Maybe it could be my fault.
