
Welcome to Smart Home Forum by FIBARO




Posted
6 hours ago, cag014 said:

What do you suggest doing with "broken" slave devices/scenes/variables?

I mean, what if I have actions that are executed according to this slave's sensors, or that turnOn/Off devices on this slave?

And even then, when the slave is back online, AOQ needs to be reinitialized to read the slave's configuration and check the jM lines?

 

It might be an idea to have an option to load different jM configurations if any slave is down.

 

 

I suggest displaying a warning and continuing to run without calling abortscene or crashing. This would let AOQ keep serving non-broken devices and reduce the need for human intervention, which is the main point of automation systems.

I can see that you use fn abortscene() excessively; it aborts and waits for the user to restart AOQ. I recommend changing that: keep running to serve non-broken devices while displaying proper errors. Don't abort unless it's an absolute must.

For slave controllers that come back online, why do you assume their configuration has changed, so that you need to reload jM? In a stable running environment there will usually be no configuration changes.

Again, my advice would be to keep running using the initial jM data you have. The user, if required, can restart AOQ at their convenience.

  • Topic Author
  • Posted (edited)
    8 hours ago, Mohamed Refaat said:

    For slave controllers that come back online, why do you assume their configuration has changed, so that you need to reload jM? In a stable running environment there will usually be no configuration changes.

    Again, my advice would be to keep running using the initial jM data you have. The user, if required, can restart AOQ at their convenience.

    To continue without reloading the slave configuration, I would need to store the entire slave configuration somewhere, and that is a huge amount of data... it's almost like backing up the slave.

    The biggest problem is not controlling slave devices, but the case where the user uses values of slave devices in conditions or calculations. Since the device is not reachable, there is no way to get its correct value. Yes, while AOQ is running and the slave is disconnected, AOQ uses the last received values from the slave, but once AOQ is restarted all slave data is gone...
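A minimal Python sketch of why the last received values survive a disconnection but not a restart, assuming a plain in-memory cache keyed by hub, device ID and property. `LastValueCache` is a hypothetical name for illustration, not part of AOQ (which is Lua):

```python
from typing import Any, Dict, Optional, Tuple


class LastValueCache:
    """Hypothetical in-memory store of the last values received from hubs."""

    def __init__(self) -> None:
        # Key: (hub name, device ID, property name). Lives only in process memory.
        self._values: Dict[Tuple[str, int, str], Any] = {}

    def update(self, hub: str, device_id: int, prop: str, value: Any) -> None:
        # Called whenever a fresh value arrives from the hub.
        self._values[(hub, device_id, prop)] = value

    def get(self, hub: str, device_id: int, prop: str) -> Optional[Any]:
        # Last received value, or None if nothing was received since startup.
        return self._values.get((hub, device_id, prop))


# While running: the slave reported a value, then went offline.
running = LastValueCache()
running.update("slave2", 38, "value", True)
print(running.get("slave2", 38, "value"))    # True

# After a restart the cache starts empty, so nothing is known about the offline slave.
restarted = LastValueCache()
print(restarted.get("slave2", 38, "value"))  # None
```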

    Posted
    24 minutes ago, cag014 said:

    Yes, while AOQ is running and the slave is disconnected, AOQ uses the last received values from the slave, but once AOQ is restarted all slave data is gone...

    Then, instead of cleanly aborting and waiting for the user to restart AOQ (at that point it's manual control and the whole automation concept is lost), let's restart AOQ from within the code, or even let it crash and have the system restart it.

    As said before, the key to successful automation is, of course, correct functionality plus minimal human intervention. Don't let your users feel that there is a system that needs maintaining, restarting, checking, etc. Work silently, display proper warnings, and restart AOQ from code only when absolutely necessary.
     

  • Topic Author
  • Posted

    Currently (if a slave is offline) AOQ crashes after ~10 minutes and is restarted by the hub. Are you saying we should let AOQ crash immediately and be restarted by the hub?

    Posted

    Let me explain myself further.

    Let's take this example: a master connected to slave#1 and slave#2. The master controls local devices plus devices on both slaves.

    We have different cases:
    1. Case 1: At AOQ startup (for any reason; let's take a power outage and recovery as the reason), the master finds Slave#1 but Slave#2 is missing (maybe it is booting late, was removed, is damaged, whatever). In your current code, the scene will just abort, ceasing service entirely to local devices and Slave#1 devices, and will wait for a human to intervene, check, and restart AOQ.

    Suggestion: don't abort; serve whatever you can. Per your explanation above, when Slave#2 comes back you must fetch its data, and that cannot be done while running. So when Slave#2 comes online, just restart AOQ with no need for human intervention. The restart can be done from within the code, or by crashing and letting the system re-run it. However, if you can find a way to fetch the data while running, that would be best.

    2. Case 2: While AOQ is running, Slave#2 goes offline. Display a warning that it went offline and continue serving the other devices without stopping until it comes back online; then you face a situation similar to Case 1.

    3. As much as you can, don't let errors or missing devices disturb the whole thing. You can add if conditions to handle "nil" and print debug messages instead of crashing.
    Don't crash for no reason. Why would a missing device make your code crash after 10 minutes, restart, find the device is still missing, and then abort and wait for the user? Why all that?
    Don't make the user notice you. You have already written pretty smart code; don't ruin it by crashing and aborting. Instead, keep it running whenever possible and handle exceptions and errors as best you can.

    I really appreciate your eagerness to listen to us and make it better.
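The two cases above could be sketched roughly like this. It is hypothetical Python, not AOQ code; `plan_slave_actions` and its inputs are invented for illustration. A periodic check warns about offline slaves and keeps going, and requests an automatic restart only when a slave whose configuration was never loaded becomes reachable:

```python
from typing import Dict, List, Tuple


def plan_slave_actions(
    loaded_at_startup: Dict[str, bool],
    reachable_now: Dict[str, bool],
) -> Tuple[List[str], bool]:
    """Hypothetical periodic check for the "don't abort" proposal.

    loaded_at_startup[hub] is True if that slave's configuration was fetched
    when AOQ started. Returns (warnings, should_restart).
    """
    warnings: List[str] = []
    should_restart = False
    for hub, loaded in loaded_at_startup.items():
        if not reachable_now.get(hub, False):
            # Slave missing (Case 1) or gone offline (Case 2): warn and keep
            # serving everything else instead of aborting.
            warnings.append(f"{hub} is offline; its jM lines are skipped")
        elif not loaded:
            # Case 1: slave was missing at startup and is now reachable, so its
            # configuration still has to be fetched; ask for an automatic restart.
            should_restart = True
    return warnings, should_restart


# Startup after a power outage: slave2 has not booted yet.
print(plan_slave_actions({"slave1": True, "slave2": False},
                         {"slave1": True, "slave2": False}))
# (['slave2 is offline; its jM lines are skipped'], False)

# Later, slave2 is reachable: restart automatically, no user needed.
print(plan_slave_actions({"slave1": True, "slave2": False},
                         {"slave1": True, "slave2": True}))
# ([], True)
```

In this sketch a slave whose configuration was already loaded (Case 2) simply resumes without a restart; only a slave that was never loaded triggers one. That split is an assumption of the sketch, not a statement of how AOQ behaves.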

     

     

    Posted

    I added 2 new temperature sensors today, which gave me 6 new device entries: each sensor has one parent device, one temperature device and one humidity device (IDs 212..217).

    Restarting AOQ showed the following warning:
    [25.01.2023] [20:58:57] [WARNING] [AOQ98]: jM{-}Ni  Device:214:New[unknown] ➯ New Device has been detected. Please restart AOQ at your convenience.


    After adding some additional lines to the user data, I got the error:

    (36) ID 214 not found.

    Device #214 can be used in regular scenes and automations.

    After removing it, the other devices worked fine; adding it again gave me the error, and the QA crashed when referencing #214 (humidity sensor):

    (36) ID 214 not found.

  • Topic Author
  • Posted

    During debugging I reset device number 214 to verify that AOQ recognizes new devices, and forgot to delete that line from the code.

    You added a new device with exactly the same ID number. Unbelievable, what are the chances of that!!!

    Sorry, my bad.

    Please download the attached version:

     


    Posted

    Thanks for the fix, now going to buy a lottery ticket ;)

  • Topic Author
  • Posted (edited)
    On 1/25/2023 at 4:28 PM, Mohamed Refaat said:

    Let me explain myself further.

    Let's take this example: a master connected to slave#1 and slave#2. The master controls local devices plus devices on both slaves.

    We have different cases:
    1. Case 1: At AOQ startup (for any reason; let's take a power outage and recovery as the reason), the master finds Slave#1 but Slave#2 is missing (maybe it is booting late, was removed, is damaged, whatever). In your current code, the scene will just abort, ceasing service entirely to local devices and Slave#1 devices, and will wait for a human to intervene, check, and restart AOQ.

    Suggestion: don't abort; serve whatever you can. Per your explanation above, when Slave#2 comes back you must fetch its data, and that cannot be done while running. So when Slave#2 comes online, just restart AOQ with no need for human intervention. The restart can be done from within the code, or by crashing and letting the system re-run it. However, if you can find a way to fetch the data while running, that would be best.

    2. Case 2: While AOQ is running, Slave#2 goes offline. Display a warning that it went offline and continue serving the other devices without stopping until it comes back online; then you face a situation similar to Case 1.

    3. As much as you can, don't let errors or missing devices disturb the whole thing. You can add if conditions to handle "nil" and print debug messages instead of crashing.
    Don't crash for no reason. Why would a missing device make your code crash after 10 minutes, restart, find the device is still missing, and then abort and wait for the user? Why all that?
    Don't make the user notice you. You have already written pretty smart code; don't ruin it by crashing and aborting. Instead, keep it running whenever possible and handle exceptions and errors as best you can.

    I really appreciate your eagerness to listen to us and make it better.

     

     

    1. As I mentioned, AOQ will restart automatically every 10 minutes.

        Using your example: AOQ has started, but Slave#2 is missing and its devices are part of conditions, like

        {"`light`", "turnOn", " if {"`lux'slave2`:value>60 and `motion'slave2`:value=true}"}

        For the line above, I need some values that are not available.

        Please note that in both statements the property is value, but one of them is a number and the other is a Boolean! Another problem is dealing with global variables, since AOQ doesn't have any reference to their values.

        So even if it were somehow possible to ignore the errors, it would be a mess to figure out why and when the lights are ON (in this example), or any other unpredictable condition result.

        At least in my case, I am adding new devices to the HC3, while the majority of devices and conditions are on my slave HC2.

    2. If a slave goes offline while AOQ is running, AOQ will not crash. It will continue to run (using the latest device values received before the slave went down) and will return to full operation when the slave is back online.

        The only reasonable option is to ignore any jM line that includes offline slave devices, and to re-init AOQ when the slave is back online.

        By the way, I don't see a big problem with re-initializing AOQ, since all working devices are already in the correct state and the offline slave's devices and conditions need to be initialized anyway.
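A rough Python sketch of that option. It assumes, based only on the example jM line above, that a slave device is written as `name'hub` between backticks and that a name without a suffix is a local device. The regex and both function names are hypothetical; this is not AOQ's parser:

```python
import re
from typing import Iterable, List, Set

# Assumption taken only from the example line above: a slave device is written
# as `name'hub` between backticks; a name without a 'hub suffix is local.
SLAVE_REF = re.compile(r"`[^`']+'([^`]+)`")


def hubs_referenced(jm_line: str) -> Set[str]:
    """Return the names of the slave hubs a jM line depends on."""
    return set(SLAVE_REF.findall(jm_line))


def active_lines(jm_lines: Iterable[str], offline: Set[str]) -> List[str]:
    """Keep only the jM lines that reference no offline slave."""
    return [line for line in jm_lines if not (hubs_referenced(line) & offline)]


example = '''{"`light`", "turnOn", " if {"`lux'slave2`:value>60 and `motion'slave2`:value=true}"}'''
local_only = '{"`light`", "turnOff"}'

print(hubs_referenced(example))  # {'slave2'}
# With slave2 offline, only the line that uses local devices stays active.
print(active_lines([example, local_only], offline={"slave2"}))
```

Skipping the whole line, rather than evaluating it with missing values, avoids the ambiguity described above where a number and a Boolean property would otherwise both be unknown.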

     

    Posted
    4 hours ago, cag014 said:

    The only reasonable option is to ignore any jM line that includes offline slave devices, and to re-init AOQ when the slave is back online.

    Why don't you use the same approach at startup?

    Please keep in mind that stopping because one slave is missing stops functionality for everything, even local devices.

    Posted

    Using v15.8, I noticed the following, which is driving me crazy:

    The AOQ ID on the master is 38, and I'm using it to get the status of a light switch on a slave. The light switch ID on the slave is also 38.

    AOQ always reads this status as on, even when the switch is turned off and I can see the correct status on the slave.

    Does the shared ID between AOQ and the switch have anything to do with that?

     

     

     

  • Topic Author
  • Posted
    18 hours ago, Mohamed Refaat said:

    Why don't you use the same approach at startup?

    Please keep in mind that stopping because one slave is missing stops functionality for everything, even local devices.

    Because on startup I don't have any data from the slave: no device IDs, properties, or values, and the same goes for global variables. I don't even know whether the devices included in a jM line actually exist on the slave.

    Once AOQ is running, I have all the necessary data from the slave, and if it goes down AOQ still has the latest properties and values. That's the difference.

     

  • Topic Author
  • Posted (edited)
    22 hours ago, Mohamed Refaat said:

    Using v15.8, I noticed the following, which is driving me crazy:

    The AOQ ID on the master is 38, and I'm using it to get the status of a light switch on a slave. The light switch ID on the slave is also 38.

    AOQ always reads this status as on, even when the switch is turned off and I can see the correct status on the slave.

    Does the shared ID between AOQ and the switch have anything to do with that?

     

     

     

    Oops... another debugging line left behind... sorry.

    In addition, this version detects all dead devices on startup, but wakes up only the devices included in jM lines before initialization.

    The same approach applies to devices that go dead while AOQ is running.

    To reduce unnecessary processing load, only dead devices included in jM lines will be woken up.

    Please note that the data table shows all dead devices across all controllers (master and slaves).
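A minimal Python sketch of that filtering, with hypothetical function and data shapes (not AOQ's actual Lua code): only dead devices that some jM line references are selected for a wake-up attempt, while the full dead list can still be shown in a data table.

```python
from typing import Dict, List, Set


def devices_to_wake(
    dead: Dict[str, Set[int]],
    used_in_jm: Dict[str, Set[int]],
) -> Dict[str, List[int]]:
    """Return, per hub, the dead device IDs that jM lines actually use.

    dead:       hub -> IDs of all dead devices (all controllers, like the data table)
    used_in_jm: hub -> IDs referenced by jM lines
    """
    result: Dict[str, List[int]] = {}
    for hub, dead_ids in dead.items():
        # Devices are identified by (hub, ID), so ID 38 on the master and
        # ID 38 on a slave are never confused with each other.
        wake = sorted(dead_ids & used_in_jm.get(hub, set()))
        if wake:
            result[hub] = wake
    return result


dead = {"master": {12, 38}, "slave1": {38, 99}}
used = {"master": {12}, "slave1": {38, 7}}
print(devices_to_wake(dead, used))  # {'master': [12], 'slave1': [38]}
```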

     


     

     

     

    Posted
    4 hours ago, cag014 said:

    Oops... another debugging line left behind... sorry.

    In addition, this version detects all dead devices on startup, but wakes up only the devices included in jM lines before initialization.

    I wanted to ask you about this: why do I need AOQ to wake up devices? Isn't that done natively by the system?

    Each controller handles its own, doesn't it?

    Posted

    Another question: does AOQ show a debug message when internet connectivity is lost or regained?

  • Topic Author
  • Posted
    On 1/28/2023 at 5:42 AM, Mohamed Refaat said:

    I wanted to ask you about this: why do I need AOQ to wake up devices? Isn't that done natively by the system?

    Each controller handles its own, doesn't it?

    The answer is yes and no.

    By default, dead devices are marked as unavailable and ignored by the controller, but you can configure the hub to poll dead devices in its Z-Wave configuration panel.


     

    Based on my experience, I strongly recommend keeping the wakeUpDeadDevice process running.

     

  • Topic Author
  • Posted
    23 hours ago, Mohamed Refaat said:

    Another question: does AOQ show a debug message when internet connectivity is lost or regained?

    Yes 

     

    Posted (edited)

    Hi,

     

    We all know about the FIBARO remote access problem. I started seeing the following messages while the problem was going on:

    [04.02.2023] [16:54:54] [ERROR] [AOQ1081]: jM{-}Ne Slave hub hc3l[Bad file descriptor] ➯ An error occurred while making the HTTP request.
    [04.02.2023] [17:01:28] [ERROR] [AOQ1081]: jM{-}Ne Local hub[Bad file descriptor] ➯ An error occurred while making the HTTP request.
    [04.02.2023] [17:01:28] [ERROR] [AOQ1081]: jM{-}Ne Slave hub hc3l[Bad file descriptor] ➯ An error occurred while making the HTTP request.

    What does this mean? Does it affect functionality? Why would AOQ be affected by a problem with FIBARO's remote access platform when the master and slave communicate locally on the network?

     

    @cag014

  • Topic Author
  • Posted (edited)

    It does not affect functionality; it just refers to an HTTP socket error... Usually this points to a slow LAN connection (probably some delays).

    Interestingly, I have never seen this error on my hub.

    It might be a good idea to ignore it...

    Posted
    5 hours ago, cag014 said:

    It does not affect functionality; it just refers to an HTTP socket error... Usually this points to a slow LAN connection (probably some delays).

    Strangely enough, I got the same error on an AOQ that works locally only (no slave hubs). It only manages local devices, and it gave the same error.
