Preventing I2C hanging

rovtech · Joined: 24 Sep 2006 Posts: 262

My master PIC is controlling several slave PICs.
It does not mind a slave being disconnected as long as it is not expecting more than one reply. This works if the device is disconnected:

newguy · Joined: 24 Jun 2004 Posts: 1902

Start a timer just before you attempt to address the slave. Set the timer to expire (one shot mode) in, perhaps, 2x or 3x the "normal" time it would take to signal the slave and receive a response (the ACK).

The line immediately following where the slave would ACK (if it was present), insert a command to kill the timer.

In the timer's ISR, "kill" the I2C. You'll have to read the data sheet to see how to abort a transfer in progress. Also set a flag in the ISR - something like "I2C aborted". Any I2C commands/transfers after the line that may hang now need to be protected by a test for the "I2C aborted" flag.

Ttelmah · Joined: 11 Mar 2010 Posts: 19221

You don't have to bother.

Use the NACK returned by the I2C_write.

Basically when you send the device address, the write returns the acknowledge. If there is no device it'll return a 1, if the device is there and NACK's it'll return a 0.

So:

rovtech · Joined: 24 Sep 2006 Posts: 262

Thank you both.
I was thinking like you, newguy, but was looking for something like Ttelmah's reply.
I will try that Ttelmah, Excellent. Thanks.

temtronic · Posted: Mon Mar 20, 2017 4:51 pm

Mr T's code will work IF the I2C devices are working properly
but...
silicon DOES fail..
so...

Newguys timer code works for the 'bad chip' problem.

Whenever you have a 'network' of 'devices', and I2C is a 'network' , you need to consider a chip failing, either holding lines high, low, or open.
Some method of 'failsafe' needs to be enabled. In that respect I2C is no different than 'RS232' code which CCS has a 'timeout' or Microchip,which has a WDT for their PICs.

Really smart coders can isolate 'bad sections' of a network,yet allow data to travel to their destinations.

Jay

rovtech · Joined: 24 Sep 2006 Posts: 262

I tried the suggestion from Ttelmah and it appeared to fix the problem but when I tried to duplicate the problem I found the software worked fine whether the three slave PICs were connected or not. Apparently I had some other problem, possibly my ICD3 being connected to the PIC, or finger trouble. I also get a warning from the compiler that the “if” statement code does nothing.
My understanding is that the master initiates a start pulling SDA low, it sends 7 bits for the address of the slave followed by a high 8th bit to indicate it wants to read data (an odd number address), and expects the slave to pull SDA low in the 9th clock to acknowledge. If there is no ACK because the device is not connected then the master thinks the slave has nothing to send and terminates the transmission with a stop bit (SDA goes high when SCL is high). I don’t know what happens with the next two I2C_read(); statements but assume the master goes through the motions clocking the data (all highs because of the pullups). I get 255 returned for the two variables when disconnected and the correct values when connected which seems to indicate this is what is happening.
The only way to hang the communications is by clock stretching but this cannot happen with nothing connected.
temtronic gave me something else to consider: I have a Master PIC in a control console that gets switch and pot data from another PIC by I2C and sends data to three graphic LCDs by I2C. It also communicates with an ROV over 300ft of CAT5 on a pair of the wires using RS422 to another Master PIC in the ROV. This ROV Master sends instructions over I2C to three other PICs that control motors, and to three more PICs on the same I2C that control sections of a robotic arm. Perhaps I should be using watchdog timers to act in case the I2C fails. The ROV shuts down if RS422 communications is lost but having a thruster jammed full on, or a robotic arm that won't retract, because of a failed I2C would be a problem. Everything has been working well for several years but things happen.

newguy · Joined: 24 Jun 2004 Posts: 1902

Ttelmah · Joined: 11 Mar 2010 Posts: 19221

This is the thing of 'layers'.

Write your code internally to catch as many possible failure modes as possible.
Then add 'layers' round this to cope with things not going as expected.

This is also how a properly written watchdog has to be done.

On the watchdog, you see people enabling the watchdog, then 'scattering' the code with watchdog resets. Now this will work to catch the one condition (where the processor hangs), but does not catch the conditions where either the code goes 'off course' (an incorrect jump or return can still hit the watchdog resets), or the code is not operating as expected...

So the well designed watchdog, has just one routine that resets the watchdog. This is programmed so there is a 'return' in front of the entry.
(This prevents code doing a 'drunkards walk', reaching the routine).
Then when it enters it tests a series of 'status' bits, exiting immediately for any that are not reset. It then resets the watchdog, only if all of these have been reset as required. It then re-activates all these bits.

The main code just clears the bits one by one, only when individual operations have worked correctly, and at intervals calls the watchdog restart. Calling this more than needed won't restart the watchdog, since the bits won't have all been cleared. You design the watchdog timing, and the status tests, so that when the code is running correctly, all the bits will be reset in perhaps half the minimum watchdog time (beware on most PIC's the minimum watchdog time is very much shorter than the 'nominal' time). This way _all_ the components of the code have to be running correctly, or the chip will restart. A proper 'watchdog'.

The same idea can be used at a lower level for problems like yours.

What you can also do, is simply test that the clock line is 'high' before performing any I2C. So just use the 'input_state' command on the line used for the I2C clock. This will catch things like clock stretching, or something going wrong and perhaps an I2C chip being hung...

Which 'if' is saying it does nothing?.

The test on the I2C_write should not do this, unless I've typed something wrong. I use exactly that test to 'know' whether multiple I2C cards are present, and it compiles without complaint:

rovtech · Joined: 24 Sep 2006 Posts: 262

Are we agreed that the master will not see if the slaves(s) are disconnected and just go through the motions?
If you get sloppy with your editing and write:

Ttelmah · Joined: 11 Mar 2010 Posts: 19221

So I didn't get it wrong.... Very Happy

I didn't have the semicolon in what I posted....

Though I often do make mistakes when typing, that is one I'd have expected to 'self catch'.

As I have said since, you can also test for the bus being 'hung', by testing if the clock line is idle. However there is not much you can do about this, _except_ have code in your slaves, that if they don't (say) receive a transaction every second, they switch off and back on their I2C (which will release a clock stretch if one has gone wrong).