Device "transfer failed" followed by "transfer OK"

AutoFrank · April 25, 2017

Hi

I have a good many devices that, when they process an action (switch on/off), show a "Transfer failed" message

[Screenshot: failed.PNG]

followed a few seconds later by a "Transfer OK"

[Screenshot: success.PNG]

The screenshots above are on device masters (as I was naming them) but the same is seen on device slaves

The action gets completed but I don't recall seeing the failed before

It's across a number of device types

Is anybody seeing the same, or does anybody know why this is happening?

HC2 running 4.120

Thanks

_f

Edited April 25, 2017 by AutoFrank

pos · April 26, 2017

10 minutes ago, AutoFrank said:

Thanks @pos

my fibaro multi sensor was fw 2.7

I tried to remove it but it didn't remove fully.. so I've started the recovery process as a last ditch effort before I ask Fibaro to dial in and take a look

Please post the result...

Curious

Peo

AutoFrank · April 26, 2017

Just now, pos said:

Please post the result...

Curious

Peo

@pos

Will do

recovery complete, logged in as admin and now started the restore process from last good backup

AutoFrank · April 26, 2017

Hi Fibaro Admins

@A.Socha, @T.Konopka, @M.Baranowski

I have been having an issue with my system and some of the Fibaro devices. I don't think it's a device issue; I think it is more a system issue.

I have tried a lot of things as you can see from the thread above and other threads. This evening I tried a full recovery and restore and it didn't fix the issue. I am still getting transfer fails and transfer ok.

I also have one device that is stuck in a reconfiguration loop and I'm not sure if that is related.

Could you organise Fibaro support to dial into my system on Friday morning, if they are available, to see if they can determine what is wrong from system logs, etc.?

The HC system resources seem okay but there is a possibility that the zwave queue is being flooded by some process or device.

I have done what I can from my side

Thanks

_f

Support request logged earlier this morning - no case number yet

Edited April 27, 2017 by AutoFrank

AutoFrank · April 27, 2017

Hi,

I think I am making a little progress here

I did a full recovery and restore last night and, apart from losing the custom icons (which is expected and won't take long to fix), the system is back running, but my issue is not fully resolved. I do have the same issue as last time: the switch configuration of my Dimmer 1 modules is now incorrect (toggle vs. momentary), so I'll have to reset these.

I have logged a support case with Fibaro to remote in to my HC2, remove some 'Not configured' devices and stop a reconfiguration loop that I cannot stop myself.

One of @petergebruers' suggestions was that I have something overloading the zwave queue (and perhaps Fibaro can see it in the logs), and I think he is correct. The system seems sluggish, but not for everything. Anything leveraging my sonos-api is working, my alarm works as normal and system resource utilisation is healthy. Anything that leverages an HTTP request that is not to a dodgy zwave device seems okay.

I also found that at some point "Mark if Dead" was either never enabled or got disabled, and I have now enabled it for all devices. The troublesome devices (dimmers and relays) may always have been an issue, or may just be a victim of one recovery process without my being aware of it.

After a recovery it may be worth checking these two (physical switch setting for dimmers and "Mark if Dead")

Plan...

I think I have one or more rogue devices that are impacting the zwave queue, and my chief suspect is one or more of my swiid cord switches (I have 7 in total)

This morning I plugged out all of these devices. I may exclude all of them this evening and leave them out altogether.

I'll see if the zwave part of my setup stabilises. This evening I'll try a mesh reconfiguration on some of the 'dead devices' and failing that I'll try and exclude/include one or two to see if that resolves their stability and "transfer failed / transfer OK" issue I'm observing.

I think I'm making progress and hopefully support can remote in and stop the reconfiguration loop because the HC2 cannot reconfigure more than one at a time and all other reconfiguration requests just queue and never get started/completed

Edited April 27, 2017 by AutoFrank

petergebruers · April 27, 2017

17 hours ago, AutoFrank said:

@petergebruers

I did a bit more checking and noticed that many of my fibaro devices were not enabled for "Mark if dead" so I went through them all and enabled them. I am now seeing a few more dead devices.

I ran your script and seem to get a different result each time

Is this what you'd expect

Here is the result from three lights in the kitchen (repeated twice)

(...)

and this is the result from multiple lights

(...)

Some strange ones here - utility light was dead and then OK. I was watching the device in the WebUI and the message toggled between no communication, transfer failed, transfer OK

TBH, I'm not sure what to make of these results.

I think it's really best to let devices die when communication fails. If it's really, really important that a device gets a message, then you can modify my script (or I can do it for you) so it makes a few attempts to make it "undead".

Yes, the output of the script varies, if you have time-outs or communication problems. Let me go through the different sections of the log

(...)

Three devices turned and confirmed within 1 second. All good.

(...)

Some 15 seconds later, you try the same three devices and it doesn't look good: ID 176 now takes 4-5 seconds to get OK. You might have seen the message "Zwave transfer failed." briefly on your homepage.

The script continues and gives similar results for 174 (4 seconds) and 90 (7 seconds).

I see no "DEAD" in the log, I think in post #9 you accidentally copy/pasted the same output for the second test.

But anyway, the only difference is that you will get a "dead" device, but the script will make it undead by calling:

(...)

With id = the ID of the failed device.

It seems to suggest to me that sometimes everything is fine, and then for at least 30 seconds, your network is very busy.

Some information regarding turning off devices: if you power off a device, it's best to let it 'die'. If it is dead, the HC2 makes no attempt to send information to the device. But if you decide to make it immortal, every command, even one simple 'turnOn', will cause a lot of traffic, because Z-Wave makes several (futile) attempts to get the message across.
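
The idea of making only "a few attempts" to revive a dead device, rather than retrying forever, can be sketched as a bounded retry loop. This is a minimal Python illustration, not the script discussed in this thread: `is_dead` and `wake` are hypothetical callables standing in for whatever the controller offers, and the attempt count and pause are arbitrary assumptions.

```python
import time
from typing import Callable


def revive_with_limit(
    is_dead: Callable[[], bool],
    wake: Callable[[], None],
    max_attempts: int = 3,
    pause_s: float = 0.0,
) -> int:
    """Try to wake a dead device at most max_attempts times.

    Returns the number of wake attempts made. Both callables are
    hypothetical placeholders, not Fibaro API functions.
    """
    attempts = 0
    while is_dead() and attempts < max_attempts:
        wake()                # each attempt costs Z-Wave traffic
        attempts += 1
        time.sleep(pause_s)   # give the device time to answer
    return attempts
```

Capping the attempts keeps a powered-off device from generating endless futile traffic: once the limit is reached, the device simply stays dead.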

AutoFrank · April 27, 2017

3 minutes ago, petergebruers said:

I think in post #9 you accidentally copy/pasted the same output for the second test

Oops @petergebruers

(...)

4 minutes ago, petergebruers said:

It seems to suggest to me that sometimes everything is fine, and then for at least 30 seconds, your network is very busy.

@petergebruers

zwave network I assume ?

5 minutes ago, petergebruers said:

But if you decide to make it immortal

How could I (accidentally) make it immortal ?

I don't have a scene or VD that tries to wake up dead devices....

I have disabled almost all VDs at this stage and the majority of scenes, and I'm still getting dead devices

I think these may be dead all the time as opposed to being in that state from a flooded zwave queue or something else

At this stage I'm out of ideas and hopefully when Fibaro support remote in they will find something

One question regarding reconfiguring the mesh: am I better off bringing the HC2 close to that location, or does it not matter at all?

thanks again for all the help in trying to resolve

_f

petergebruers · April 27, 2017

4 minutes ago, AutoFrank said:

zwave network I assume ?

How could i (accidentally) make it immortal ?

I don't have a scene or VD that tries to wake up dead devices....

Yes, the Z-Wave network is the most likely explanation. Not memory, not CPU.

But keep in mind, that your HC2 might be the cause. For example, if you send ten thousand 'turnOn' commands to a device, they will be queued and sent. I guesstimate, on a normal network and with a direct connection, this will keep your network busy for about 10 minutes.
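
As a rough check on that guesstimate: 10 minutes for 10,000 commands works out to about 60 ms per command. The Python sketch below only does that arithmetic. The per-command figure is derived from the estimate above, not measured, and `drain_time_s` is a name made up for illustration.

```python
def drain_time_s(queued_commands: int, seconds_per_command: float) -> float:
    """Time to empty a queue that is processed strictly one command at a time."""
    return queued_commands * seconds_per_command


# Per-command time implied by the estimate: 10 minutes for 10,000 commands.
per_command_s = (10 * 60) / 10_000   # about 0.06 s, i.e. 60 ms

print(drain_time_s(10_000, per_command_s) / 60)  # minutes for the full burst
print(drain_time_s(100, per_command_s))          # seconds for a 100-command burst
```

On these assumptions even a burst of a hundred queued commands would occupy the network for several seconds, which is in the same range as the delays seen in this thread.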

You only have the global "mark as dead" and individual "mark as dead" flags, but that's it. So if you never use "wakeUpAllDevices" then you can't accidentally change status 'dead' into 'alive'.

The second data set looks worse (performance). All devices would be 'dead' after this test. Oh, no, that's not true! Look at the last line... device 1499 is OK. And the device before that takes a very long time to acknowledge, but it's never dead. But before that, device 88 doesn't seem to recover within 45 seconds, a limit I put in the code (because I don't want to be the cause of a loop of pointless traffic...).
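
The outcomes described in this thread (confirmed within about a second, confirmed after several seconds, or no recovery within the 45-second limit) could be classified as in the sketch below. It is Python for illustration only, not the script used above. The 1-second threshold for "good" and the function name are assumptions.

```python
from typing import Optional

GOOD_S = 1.0      # assumption: "all good" means confirmed within about 1 second
LIMIT_S = 45.0    # give-up limit mentioned in the post above


def classify(ack_after_s: Optional[float]) -> str:
    """Classify one device by how long it took to confirm a command.

    ack_after_s is None when no confirmation was seen at all.
    """
    if ack_after_s is None or ack_after_s > LIMIT_S:
        return "no recovery"
    if ack_after_s <= GOOD_S:
        return "good"
    return "slow"


# Delays quoted in the thread: about 1 s, then 4 to 7 s on the second run.
for delay in (1.0, 4.0, 7.0, None):
    print(delay, classify(delay))
```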

I think... I'd try to limit scenes and VDs, check reporting of sensors (you already mentioned that yourself, regarding the MS6). Trial and error, unfortunately.

Welcome to Smart Home Forum by FIBARO
