Posted

Oh my ... interesting logs :) Will post later.

Posted

I sent an email to Fibaro asking them to have a look at my HC3.

It's not normal that all 4 cores sit at 80% and up; I have no QA/Lua code whatsoever.

 

 

Posted

@Sjakie @akatar do you have any “outside” application, such as a HomeKit bridge or a commercial version of one?

@tinman Tin, good evening.

Has “later” occurred yet?! ;)

 
Quote

 


Oh my ... interesting logs :) Will post later.

 

 

Posted

@10der No. I had the Buienradar QA, which I deleted, and I had Google Home, for which I erased my account.

So nothing outside, except what the HC3 itself sends around to Google and Fibaro (I use a Pi-hole, so not much is going to Google).

Posted
3 hours ago, 10der said:

Has “later” occurred yet?! ;)

 

Yes, it has.

 

Short answer: a combination of Lua errors (ER), missing values (ER), JSON errors (ER), Hue timeouts, and probably the Telegram QA killing TCP. The question is what starts the disaster (watchdog loop, Z-Wave starting, hcserver starting, hcserver crashing due to errors, and so on). There are periods in which all services are up and running, and the HC3 seems to work (even if delayed by the high load) until the next crash. Additionally, 3 Aeotec devices and one Fibaro device probably need a soft reconfiguration.

@Sjakie got a full description of my observations and can check the ER-related things (maybe with Jan). Telegram can be disabled (as a test) and those few devices soft-reconfigured; then I can take another look at the logs (after an HC3 reboot). I would probably try switching to the Hue QA, as it gives at least some logs, so one can see how it behaves (or fix the other things first and then check again whether Hue is still doing something bad to TCP read/write).

 

  • Topic Author
  • Posted

    hi @10der,

    The only "outside" application I use is a Raspberry Pi.

    //Sjakie

    Posted
    6 hours ago, tinman said:

    yes, it is.

    Short answer: a combination of Lua errors (ER), missing values (ER), JSON errors (ER) ... @Sjakie got a full description of my observations and can check the ER-related things (maybe with Jan).

    Great, it would be nice to have a look.

    My understanding is that @Sjakie is in the process of porting a large set of ER rules from the HC2 to the HC3 and is, understandably, getting a lot of errors.

    ER throws an error when it detects one, often triggered by Lua itself throwing an error. Usually an error results in the offending rule being disabled.

    So I would expect a lot of errors. Are these error reports causing the load? Are they repeated and frequent (every x ms)? If so, it would be interesting to see if we can do anything about it. If ER is unable to parse the rules, errors are reported and the QA is terminated, meaning that it would go into the 60s restart cycle. I imagine that would cause a spike in load every 60s as all the rules are re-parsed. Maybe the QA should be disabled after X restarts.
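One way to answer the "every x ms or every 60s" question offline is to look at the gaps between consecutive error timestamps. This is only a rough Python sketch: the function names and the sample timestamps are hypothetical, and it assumes the error times have already been extracted from the logs as seconds.

```python
from statistics import median


def restart_gaps(timestamps):
    """Return the gaps (in seconds) between consecutive error timestamps."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_like_restart_cycle(timestamps, cycle_s=60.0, tolerance_s=5.0):
    """True if the typical gap between errors is close to the restart cycle."""
    gaps = restart_gaps(timestamps)
    if not gaps:
        return False
    return abs(median(gaps) - cycle_s) <= tolerance_s


# Hypothetical timestamps in seconds, NOT taken from the real logs.
slow = [0.0, 60.2, 120.1, 180.4]   # one error per restart cycle
fast = [0.0, 1.0, 2.1, 3.0]        # errors roughly every second

print(looks_like_restart_cycle(slow))  # True
print(looks_like_restart_cycle(fast))  # False
```

A median gap near 60s would point at the normal restart cycle; a median gap of a second or less would suggest something else is re-triggering the errors.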

    Anyway, it would be very interesting to see what's going on. If there are errors that ER does not report in the console log but that still generate internal errors, that is something we could improve on.

    @Sjakie could you share @tinman's findings with me?

  • Topic Author
  • Posted

    Morning guys,

    I have no problem sharing Tinman's findings if he is okay with it.

    //Sjakie

    Posted

    @Sjakie Do you use my Hue QA? I think there could be a problem with growing logs (history). I have tried adding self:updateProperty("saveLogs", false) to all my Hue QAs.
     

     

    @tinman Is it possible to check the size of the logs/history for QAs?

     

     

     

  • Topic Author
  • Posted

    Hello Petrkl, yep, I was using your QA.

    //Sjakie

    Posted

    Generally interesting that Sjakie's HC3 got sig11 crashes something like 11 times within 1.5 hrs, then no crash for the next 6 hrs (as everybody was sleeping), and then only 3 more crashes between 16:00 and 21:00. I don't think it's a hardware issue here; on the other hand, I have never got a sig11 on my HC3 (even though I test a lot).

     

    The provided logs cover something like a 48-hr timeframe, with 2000+ Lua errors and ~1500 TCP/HTTP errors in that time (there are a lot of other errors as well, like wrong property, no valid icons, bad JSON, etc.; however, I think the HC3 can handle them). For comparison, my test HC3 had zero Lua errors (of the unknown-exception kind below) and ~1200 TCP/HTTP errors within 7 days.
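To put the two boxes on the same scale, the counts can be turned into errors per hour. A small Python sketch; it uses only the rounded figures quoted above ("2000+" and "~" values), so the results are rough lower bounds rather than exact rates.

```python
def per_hour(count, hours):
    """Average number of errors per hour over the observation window."""
    return count / hours


sjakie_hours = 48        # ~48-hr log window
test_hours = 7 * 24      # 7 days on the test HC3

rates = {
    "sjakie_lua": per_hour(2000, sjakie_hours),
    "sjakie_tcp": per_hour(1500, sjakie_hours),
    "test_lua": per_hour(0, test_hours),
    "test_tcp": per_hour(1200, test_hours),
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} errors/hour")
```

That works out to roughly 42 Lua errors and 31 TCP/HTTP errors per hour on Sjakie's HC3, against no Lua errors and about 7 TCP/HTTP errors per hour on the test HC3.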

     

    ######################################################

     

    This is for Jan (somehow), if Sjakie still has these things on the HC3.

     

    device 397 - ER OverloopCombi:

    Lua error for deviceId: 397. Message: Unknown exception: 

    Lua error for deviceId: 397. Message: ./include/main.lua:3: attempt to index a nil value (global 'Rule')

     

    device 813 - ER BovenVerdieping

    Lua error for deviceId: 813. Message: Unknown exception:

    Lua error for deviceId: 813. Message: #011./include/main.lua:57: '}' expected (to close '{' at line 55) near '.'

     

    These two errors occur a "lot" of times; they appear (alternately) in the same second as, or right before, a sig11 crash.
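For anyone who wants to count such lines per device in their own log export, a minimal Python sketch follows. The regular expression is inferred from the four lines quoted above and is an assumption, not a documented log format; the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the log lines quoted in this thread (an assumption).
LUA_ERROR = re.compile(r"Lua error for deviceId: (\d+)\. Message: (.*)")


def count_lua_errors(lines):
    """Count Lua error lines per device ID."""
    counts = Counter()
    for line in lines:
        match = LUA_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


sample = [
    "Lua error for deviceId: 397. Message: Unknown exception: ",
    "Lua error for deviceId: 397. Message: ./include/main.lua:3: attempt to index a nil value (global 'Rule')",
    "Lua error for deviceId: 813. Message: Unknown exception:",
    "Lua error for deviceId: 813. Message: #011./include/main.lua:57: '}' expected (to close '{' at line 55) near '.'",
    "some unrelated line",
]

for device_id, count in count_lua_errors(sample).most_common():
    print(device_id, count)
```

Sorting by count this way makes it easy to see which QA dominates the log.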

     

    ######################################################

     

    This is for petrkl12:

     

    device 629, 644, 646,  621,  623, 649 (hue QAs) - 

    Unable to load view for Quick App XXX. Something went wrong: $jason:body:sections:unsupported component type: map::at

     

    When there is no connectivity to the bridge, there are tons of TCP errors: HueMainID_Axxxxxxx_Read, HueMainID_Axxxxxxx_Write.

    No idea if that can be fixed.

     

    ######################################################

     

    For Sjakie:

    What are these? There are a lot of webservice / HTTP requests to them:

    SchoonmaakBadkamer, SchoonmaakToilet, HumDouche,  wasDroger

     

  • Topic Author
  • Posted

    hi,

    Tinman thanks for your support.

    Long story!

    The Wasdroger (dryer) Qubino plug was not functioning well.

    The plug can handle 3000 W, yet every now and then it switched off (max load 2200 W). The supplier has agreed to send a modified plug for testing. That's why the dryer is still in the QA.

     

    Webservice / HTTP Requests to them, lot SchoonmaakBadkamer, SchoonmaakToilet, HumDouche,  wasDroger

    Those are global variables (GVs).

    If the toilet, bathroom 1 or bathroom 2 door is open > 4 min, then schoonmaak (cleaning).

    If 6 people use the sanitary rooms, let's say, 8 times a day, you should have 50-80 calls max.

    HumDouche is a global variable: when the humidity is > x, it starts the mechanical fan.

    I had no idea that a GV makes webservice / HTTP requests.

     

    Let me try to explain:

     

    ER Overloop should become a new QA holding the logic for the sanitary rooms, including the mechanical fan, so that all GVs (SchoonmaakBadkamer, SchoonmaakToilet, HumDouche) can be deleted.

     

    ER BovenVerdieping should be the new QA for everything left over on the first floor.

     

    Because I am not that good with all these QAs, I started this "project" 2 days ago to reduce the number of QAs from 20 to around 5.

    But I also ran into some delays while waiting for the HC3 to do what I asked.

     

    Okay, you can/will say there are other ways to accomplish this, but for me it worked smoothly.

    //Sjakie

     

     

    Posted
    4 minutes ago, Sjakie said:

    Webservice / HTTP Requests to them, lot SchoonmaakBadkamer, SchoonmaakToilet, HumDouche,  wasDroger

    Those are global variables (GVs).

     

    OK, so something goes wrong there while reading them. That might be nothing, a code error, or just the HC3 already being somewhere in nirvana.

    Posted
    6 hours ago, tinman said:

    This is for Jan (somehow), if Sjakie still has these things on the HC3.

     

    device 397 - ER OverloopCombi:

    Lua error for deviceId: 397. Message: Unknown exception: 

    Lua error for deviceId: 397. Message: ./include/main.lua:3: attempt to index a nil value (global 'Rule')

     

    device 813 - ER BovenVerdieping

    Lua error for deviceId: 813. Message: Unknown exception:

    Lua error for deviceId: 813. Message: #011./include/main.lua:57: '}' expected (to close '{' at line 55) near '.'

     

    These two errors occur a "lot" of times; they appear (alternately) in the same second as, or right before, a sig11 crash.

     

    The "main" of the ER is the user's own code.

     

    397: the table Rule is not defined, probably a leftover from Sjakie's rules moved over from the HC2. I trap it with a pcall, print a log message and rethrow the error, which causes a restart of the QA.

    813: a Lua syntax error; the QA crashes when it tries to load the user's code in the QA, or alternatively when the QA is saved. It's really tricky to catch the error if it's a syntax error raised while the Lua engine loads the code... does it cause a restart of the QA...? For syntax errors I would expect the QA to be disabled. What's the use of continuing to reload it?

     

    So I understand the crashes, but I don't understand why they repeat every second. Both errors crash the QA(s) at startup, and I would expect the QA to fall back to the 60s restart cycle, which is the behaviour I have seen.

     

    So, I try to make sure that ER crashes gracefully, but I would expect the QA engine to handle Lua errors without sig11... because that has scary implications....

     

     

    Posted
    7 hours ago, tinman said:

    This is for petrkl12:

     

    device 629, 644, 646,  621,  623, 649 (hue QAs) - 

    Unable to load view for Quick App XXX. Something went wrong: $jason:body:sections:unsupported component type: map::at

     

    @Sjakie

    What type of devices are 629, 644, 646, 621, 623 and 649?

    Posted
    8 hours ago, tinman said:

    This is for petrkl12:

     

    device 629, 644, 646,  621,  623, 649 (hue QAs) - 

    Unable to load view for Quick App XXX. Something went wrong: $jason:body:sections:unsupported component type: map::at

     

    When there is no connectivity to the bridge, there are tons of TCP errors: HueMainID_Axxxxxxx_Read, HueMainID_Axxxxxxx_Write.

    No idea if that can be fixed.

    Everything can be fixed :) but without access to the logs you have, it's difficult ...

     

    For example, I need more information about this one. Maybe I have the same error on my HC3 ...

    Unable to load view for Quick App XXX. Something went wrong: $jason:body:sections:unsupported component type: map::at

     

  • Topic Author
  • Posted

    Petrkl,

    Sorry, I have done a factory restore; with so many errors it was the better option (I hope).

    //Sjakie

    Posted

    amen
