SOLVED/Kinda - Random Resets with reason of MCLR_FROM_RUN...
terryopie



Joined: 13 Nov 2015
Posts: 13

Posted: Fri Nov 13, 2015 8:46 am

For reference I am using the following:
Compiler: PCWHD v4.135
Chip: PIC18F66K80
Memory usage: ROM=86% RAM=37% - 39%
FUSES:
Code:
#FUSES VREGSLEEP_SW,INTRC_HP,SOSC_DIG,HSM,PLLEN,NOFCMEN,IESO,PUT,NOBROWNOUT,BORV30,NOWDT,WDT1,CANE,MCLR,NOXINST,PROTECT,NOCPD,NOSTVREN,NODEBUG,NOCPB,NOWRT,NOWRTB,NOWRTC,NOWRTD,NOEBTR,NOEBTRB



I recently made what should have been a minor change to the project in question. This project has been running on this chip for the last 3 years and has not had any issues.

After making the change I started experiencing random resets. When the reset happens for a given version of code, I can duplicate it nearly every time and on different boards. But, removing commented code, adding comments, or removing unused code all result in either the problem going away, or moving to a new part of the code. The part of the code where the reset/problem manifests has nothing to do with the minor change that was made.

I implemented a call to restart_cause() as the first line of main, and am showing the result on our display. After a normal power cycle, the value is 12 (NORMAL_POWER_UP). But after an unexpected reset, the value is 15 (MCLR_FROM_RUN).

I am using a Saleae Logic Analyzer to monitor the MCLR pin. The pin does not drop out at any time during this period. I have the sampling rate set to the highest that it supports, so I should have seen any glitch. I did try disabling the MCLR fuse, but that didn't seem to have any effect. The MCLR pin is connected through a 10K resistor to pin 6 of a TC1232 watchdog chip.

Using the logic analyzer and a spare pin, I narrowed down the part of the code where the reset occurs, but I can find no issues with that part of the code. The code in question is listed below. The current and future LCD line arrays all have a size of 17. All other variables are unsigned int8.

Code:
                case 1:                                                     // IN INSP? Y/N
                case 5:                                                     // AT DOWN LIMIT Y/N
                case 7:                                                     // AT FLR XX Y/N
                case 11:                                                    // AT UP LIMIT Y/N
                case 12:                                                    // 102? Y/N
                    if(current_lcd_line1[15]==0x20){                        // Blank
                        if(setup_var)
                            future_lcd_line1[15] = "Y";
                        else
                            future_lcd_line1[15] = "N";
                        setup_var_blink_tmr = 75;
                    }
                    else{
                        future_lcd_line1[15] = 0x20;
                        setup_var_blink_tmr = 15;
                    }
                     break;
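
One detail in the listing above may be worth checking: the assignments to future_lcd_line1[15] use double quotes ("Y", "N"). In standard C a double-quoted literal is a two-byte array (the character plus a terminating zero), not a single character, and how PCWHD v4.135 handles assigning one to an array element is not established in this thread. The following is a minimal host-side sketch of the same blink logic using character constants. The helper names next_blink_char and blink_step are hypothetical and are not part of the original project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LCD_LINE_LEN 17   /* array size stated in the post */
#define BLINK_COL    15   /* column written by the posted case block */

/* Hypothetical helper: the character to show next in the blink column. */
char next_blink_char(char current, uint8_t setup_var)
{
    if (current == 0x20)                /* blank: show the answer */
        return setup_var ? 'Y' : 'N';   /* single quotes give one char */
    return 0x20;                        /* otherwise blank it again */
}

/* One blink step on the line buffers; returns the new blink timer value. */
uint8_t blink_step(const char *current_line, char *future_line,
                   uint8_t setup_var)
{
    assert(current_line != NULL && future_line != NULL);
    char shown = current_line[BLINK_COL];
    future_line[BLINK_COL] = next_blink_char(shown, setup_var);
    return (shown == 0x20) ? 75 : 15;   /* timer values from the post */
}
```

Keeping the character choice in one small function also makes it easy to confirm that only a single byte is ever written to column 15.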



I am at a loss as to what to try now. My minor change was to add some logic for checking an already existing value. At most I added 4 lines of code. I can't see that this is a software issue, but maybe it is uncovering a hidden issue? Too close to Max ROM usage on this chip?

I'm hoping that someone can point me in the right direction to get this solved. But as of right now, because of how seemingly unrelated changes cause errors in other unrelated parts of the code, I don't trust the compiler anymore. I tried looking for change logs for version 4.1xx of the compilers, but can only find it for v5.xxx. Is it possible this is a compiler issue and fixed in a later version?

Any help is appreciated!


Last edited by terryopie on Tue Nov 24, 2015 7:52 am; edited 1 time in total
wangine



Joined: 07 Jul 2009
Posts: 98
Location: Curtea de Arges, Romania

Posted: Fri Nov 13, 2015 9:18 am

Several things can cause a random reset through the MCLR pin, so first try to identify where the real problem is. It could be the WDT chip, the power supply, the PIC, or the compiler, and you need to check them step by step. As a test, remove the watchdog chip and pull MCLR up directly with a 1k resistor, and also put a 100nF capacitor close to the MCLR pin. If the resets stop and the chip runs normally, the fault is in the WDT chip or the power supply. Then remove the capacitor and watch again: if the chip still runs normally, the power supply is probably OK too, although it is worth swapping it to see what happens. After that you can test the WDT chip separately. If everything checks out, what remains is a code issue or simply a compiler mistake. Without going step by step it is hard to find the fault.
Ttelmah



Joined: 11 Mar 2010
Posts: 19266

Posted: Fri Nov 13, 2015 9:24 am

First thing: why have you got NOSTVREN selected? STVREN is one that should always be selected, unless you add your own code to monitor the stack. I would not be surprised if, with this enabled, you got the flag saying that the reset was caused by a stack overflow.

Now 'MCLR_FROM_RUN' is distinguished by being the one you get if nothing else has reset the chip. So there has not been a watchdog, not been a power-on reset, not been a brownout, etc. All that is left is an MCLR reset from run, so this is what it reported. You will get this if (for instance) you jump to the first location in memory, or if an invalid value is popped from the stack, resulting in a return to the bottom of memory. This is why I suspect STVREN.

So look carefully at your stack handling.
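
The "nothing else has reset the chip" reading can be illustrated with the numeric values that appear in this thread (12, 15, and so on). The sketch below assumes the value is the low nibble of RCON with active-low flags in the usual PIC18 positions; those bit names and positions are an assumption here and should be checked against the data sheet for the exact part. The function name recorded_events is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed RCON low-nibble layout (active low: 0 means "this happened"). */
#define RCON_BOR 0x01u   /* brown-out reset   */
#define RCON_POR 0x02u   /* power-on reset    */
#define RCON_PD  0x04u   /* SLEEP executed    */
#define RCON_TO  0x08u   /* watchdog time-out */

/* Returns a mask of the events recorded in a restart-cause value.
   A result of 0 means no flag was cleared, which is the value 15 case. */
uint8_t recorded_events(uint8_t cause)
{
    assert((cause & 0xF0u) == 0);        /* only the low nibble is modelled */
    return (uint8_t)(~cause & 0x0Fu);
}
```

Under that assumption, 12 decodes to "power-on and brown-out flags cleared" and 15 decodes to "no event recorded at all", which matches the point that MCLR_FROM_RUN is what is left when nothing else is flagged. The value 0 listed for a RESET instruction does not fit this simple pattern and is not covered by the sketch.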
terryopie



Joined: 13 Nov 2015
Posts: 13

Posted: Fri Nov 13, 2015 10:58 am

Quote:
First thing: why have you got NOSTVREN selected? STVREN is one that should always be selected, unless you add your own code to monitor the stack. I would not be surprised if, with this enabled, you got the flag saying that the reset was caused by a stack overflow.


The following are the only defined constants for restart_cause() that I find in the header file:

Code:
// Constants returned from RESTART_CAUSE() are:

#define WDT_TIMEOUT       7   
#define MCLR_FROM_SLEEP  11   
#define MCLR_FROM_RUN    15   
#define NORMAL_POWER_UP  12   
#define BROWNOUT_RESTART 14   
#define WDT_FROM_SLEEP   3     
#define RESET_INSTRUCTION 0   
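
For showing the cause on a display, a small lookup from the number to the header's name can be easier to read than the raw value. This is a minimal sketch in plain C that uses only the seven values listed above; the function name restart_cause_name is hypothetical, and any value not in the list (a stack error, for example) falls through to "UNKNOWN".

```c
#include <assert.h>
#include <string.h>

/* Maps a restart-cause value to the name used in the device header. */
const char *restart_cause_name(int cause)
{
    switch (cause) {
    case 7:  return "WDT_TIMEOUT";
    case 11: return "MCLR_FROM_SLEEP";
    case 15: return "MCLR_FROM_RUN";
    case 12: return "NORMAL_POWER_UP";
    case 14: return "BROWNOUT_RESTART";
    case 3:  return "WDT_FROM_SLEEP";
    case 0:  return "RESET_INSTRUCTION";
    default: return "UNKNOWN";           /* not defined in this header */
    }
}
```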



I did enable STVREN, but still got back the MCLR_FROM_RUN reason. Possibly because there isn't a define for stack overflow?

Since there wasn't I also dumped out the value of the STKPTR register to my display... It came up with a value of 0x40. This would indicate a stack underflow condition.
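
The 0x40 reading can be decoded field by field. The sketch below assumes the usual PIC18 STKPTR layout (bit 7 = stack full, bit 6 = stack underflow, bits 4:0 = stack pointer); that layout is an assumption to confirm in the data sheet, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STKPTR_STKFUL 0x80u   /* assumed: stack full/overflow flag */
#define STKPTR_STKUNF 0x40u   /* assumed: stack underflow flag     */
#define STKPTR_SP     0x1Fu   /* assumed: 5-bit stack pointer      */

typedef struct {
    bool    full;        /* overflow flag set     */
    bool    underflow;   /* underflow flag set    */
    uint8_t depth;       /* current stack pointer */
} stkptr_info;

/* Splits a raw STKPTR byte into its flags and pointer value. */
stkptr_info decode_stkptr(uint8_t value)
{
    stkptr_info info;
    info.full      = (value & STKPTR_STKFUL) != 0;
    info.underflow = (value & STKPTR_STKUNF) != 0;
    info.depth     = (uint8_t)(value & STKPTR_SP);
    return info;
}
```

With that layout, 0x40 decodes to "underflow set, full clear, pointer at 0", which agrees with the underflow reading above.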

I understand an overflow... Get into a recursion loop that you can't get out of, or not take the full stack size into account before going too deep.

But what causes a stack underflow? How can it pop the stack pointer past "main"? Now I'm confused... I didn't add or modify any functions.

Ideas?
Thank you!
guy



Joined: 21 Oct 2005
Posts: 291

Posted: Fri Nov 13, 2015 1:55 pm

I had an issue just like you describe (stable code, minor change, resets). In my case it turned out to be a stack overflow in a printf() statement. These, when nested inside functions and dealing with floating point numbers (but not only FP) tend to cause resets. If this could be the case, try simplifying the printf() by making calculations before the printf and then only displaying the result, try avoiding floating point, etc.
Also avoid nested functions if the printf() is deeply nested.
On the PIC24 you can increase the stack.

I'm not sure that my explanation is 100% correct but these practices solved the problem in my case.
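
As an illustration of "do the calculation first and avoid floating point", here is a minimal sketch in standard C, run on a host rather than with the CCS printf. It assumes the value is already held as an integer number of tenths; the function name format_tenths is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Formats a value held in tenths (235 becomes "23.5") using only
   integer arithmetic. Returns what snprintf returns. */
int format_tenths(char *buf, size_t len, int tenths)
{
    assert(buf != NULL && len > 0);
    const char *sign = (tenths < 0) ? "-" : "";
    int magnitude = abs(tenths);         /* caller must not pass INT_MIN */
    return snprintf(buf, len, "%s%d.%d", sign,
                    magnitude / 10, magnitude % 10);
}
```

The formatting call then only receives plain integers that were computed beforehand, which is the pattern suggested above.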
Ttelmah



Joined: 11 Mar 2010
Posts: 19266

Posted: Sat Nov 14, 2015 3:33 am

OK.

On the restart cause, _you_ have to test for stack over/underflow. The bits for this are not part of the RCON register, which is what restart_cause() actually reflects. With STVREN enabled, you can add:
Code:

#bit STKFUL=getenv("bit:STKFUL")
#bit STKUNF=getenv("bit:STKUNF")

   if (STKFUL)
   {
       // display or indicate somehow that you have a stack overflow
   }

   if (STKUNF)
   {
       // display or indicate that you had a stack underflow
   }

All the RCON bits are 'undefined' if a stack error occurs.

The 'underflow' errors can be tested for without STVREN, but the overflow error can't.

Now, are you running this in debug? There is a little problem here: the debugger steals two stack levels, so code that would actually run OK for real then gives stack overflows.

What does the listing show for stack used?

The classic thing that can cause a stack error, other than just 'running out', is a GOTO. This is one reason they are 'discouraged'. If (for instance) you jump out of a piece of code inside a function while a return address is on the stack (in some cases this can happen inside a switch statement), then the stack can be left 'out of balance'.
Also remember that if your code uses, say, one more stack level, the actual fault can appear somewhere else entirely, wherever the code just happens to step over the edge.
asmallri



Joined: 12 Aug 2004
Posts: 1630
Location: Perth, Australia

Posted: Sat Nov 14, 2015 7:03 am

terryopie wrote:

After making the change I started experiencing random resets. When the reset happens for a given version of code, I can duplicate it nearly every time and on different boards.


Are the boards powered with their own power supply, or are all boards being tested with a common test bench power supply? If you are performing this testing with a common test setup, then check for problems in the test setup: insufficient power supply filtering, a faulty power supply, insufficient current, etc.
_________________
Regards, Andrew

http://www.brushelectronics.com/software
Home of Ethernet, SD card and Encrypted Serial Bootloaders for PICs!!
terryopie



Joined: 13 Nov 2015
Posts: 13

Posted: Mon Nov 16, 2015 7:44 am

Quote:
On the restart cause, _you_ have to test for stack over/underflow. The bits for this are not part of the RCON register, which is what restart_cause() actually reflects. With STVREN enabled, you can add:


I did enable STVREN. First thing in main, I am saving away the STKPTR register. It is giving me a value of 0x40 (underflow).


Quote:
Now, are you running this in debug? There is a little problem here: the debugger steals two stack levels, so code that would actually run OK for real then gives stack overflows.


No, I am not running in debug.


Quote:
What does the listing show for stack used?


Listing shows stack usage here:
Code:

               ROM used: 56620 bytes (86%)
                         Largest free fragment is 8912
               RAM used: 1366 (37%) at main() level
                         1423 (39%) worst case
               Stack:    7 worst case (6 in main + 1 for interrupts)
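
As a rough cross-check of that figure, the headroom can be worked out by hand. The sketch below assumes a 31-level hardware return stack, which is the usual PIC18 figure and should be confirmed for this part, and takes the "two levels" debugger cost mentioned earlier in the thread as an input. The function name stack_headroom is hypothetical.

```c
#include <assert.h>

#define HW_STACK_LEVELS 31   /* assumed PIC18 return stack depth */

/* Levels left over after the worst-case usage reported by the listing
   and any levels reserved by a debugger. Negative means overflow. */
int stack_headroom(int hw_levels, int worst_case, int reserved)
{
    assert(hw_levels > 0 && worst_case >= 0 && reserved >= 0);
    return hw_levels - reserved - worst_case;
}
```

With 7 levels reported, this gives 24 spare levels without a debugger and 22 with one, so a plain overflow looks unlikely on these numbers. The listing only counts calls the compiler can see, so the figure says nothing about an unbalanced pop.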
Ttelmah



Joined: 11 Mar 2010
Posts: 19266

Posted: Mon Nov 16, 2015 8:16 am

_Underflow_. Very interesting.

Somehow you are executing a return from something that is not actually called, or popping a value from the stack.

Does the code use function pointers? A classic is these being overwritten so the code jumps to an unexpected location in memory.

A GOTO, as already mentioned.

An interrupt enabled without a handler present (the effect depends on what other code is down there).
terryopie



Joined: 13 Nov 2015
Posts: 13

Posted: Mon Nov 16, 2015 9:28 am

Ttelmah wrote:
_Underflow_. Very interesting.

Somehow you are executing a return from something that is not actually called, or popping a value from the stack.

Does the code use function pointers? A classic is these being overwritten so the code jumps to an unexpected location in memory.

A GOTO, as already mentioned.

An interrupt enabled without a handler present (the effect depends on what other code is down there).



Not using function pointers anywhere. Only have one segment of inline assembly. No GOTO or CALL commands being used. I'll have to go back through and double check that all but the one interrupt that we are using are disabled. Unfortunately can't check that for a few days... I'll report back.

Thank you for the suggestions!!
Ttelmah



Joined: 11 Mar 2010
Posts: 19266

Posted: Mon Nov 16, 2015 9:44 am

One section of in-line assembly?
Postable?

Any write to STKPTR could cause this.
Any instruction that accesses PCL, PCLATH, or PCLATU.
Any POP.

The first two could be the result of a memory pointer (or array access) that is accessing an address outside the array...
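
One way to rule out the out-of-range array access case is to route writes to the LCD line buffers through a checked helper while debugging. This is a minimal sketch in plain C; the size 17 comes from the first post, and the helper name lcd_put is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LCD_LINE_LEN 17   /* array size stated in the first post */

/* Writes one character into a line buffer only if the index is in range.
   Returns false, and writes nothing, when the index is out of bounds. */
bool lcd_put(char *line, uint8_t index, char ch)
{
    assert(line != NULL);
    if (index >= LCD_LINE_LEN)
        return false;
    line[index] = ch;
    return true;
}
```

A false return can then be counted or shown on the display, which would point at the offending index rather than at a later reset.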
terryopie



Joined: 13 Nov 2015
Posts: 13

Posted: Mon Nov 16, 2015 10:11 am

Here is the Assembly:

Code:
#ASM

    MOVF   _a_lo,W                                          ; Set-up address to write to
    MOVWF   EEADR

    MOVF   _a_hi,W
    MOVWF   EEADRH

    MOVF   _a_lo,W                                          ; Set-up address to write to
    MOVWF   EEADR

    MOVF   _ee_data,W                                       ; Set-up data to write
    MOVWF   EEDATA

    BCF      EECON1,7                                       ; Point to Data EEPROM Memory
    BSF      EECON1,2                                       ; Enable EEPROM Write

    BCF      INTCON,7                                       ; Disable interrupts globally

    MOVLW   0x55                                            ; The next four lines are required to allow the write
    MOVWF   EECON2
    MOVLW   0xAA
    MOVWF   EECON2
    BSF      EECON1,1                                       ; Set WR bit to begin write

    BSF      INTCON,7                                       ; Enable interrupts globally

#ENDASM



This snippet is how we write the internal EEPROM. It's somewhat faster than using the built-in interface.
PCM programmer



Joined: 06 Sep 2003
Posts: 21708

Posted: Mon Nov 16, 2015 10:28 am

I suspect that you are putting a RETURN instruction in the ASM code,
instead of letting the compiler handle the return by letting the function
proceed to the closing brace. Maybe you are not doing it in the posted
routine, but you may be doing it somewhere.

This would work, but sometimes the compiler won't do a CALL. It will do
a pseudo-call with a BRA to the routine, and the compiler inserts a BRA at
the end of the routine to jump back to the caller. There is no stack
involved. In this case, the insertion of RETURN is extremely ill advised.

If you thwart the compiler by inserting in your own RETURN in #asm,
you are sabotaging your own program. Absolutely marginal gains
are not worth going to assembly code.
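
The CALL versus BRA point can be modelled with a toy return stack on a host machine. This is only a simulation under stated assumptions (a 31-level stack, and a RETURN that always pops one level); it is not how the compiler or the hardware is implemented, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RS_LEVELS 31   /* assumed hardware return stack depth */

typedef struct {
    uint8_t depth;       /* return addresses currently stacked */
    bool    overflow;    /* set when a push finds the stack full */
    bool    underflow;   /* set when a pop finds the stack empty */
} ret_stack;

/* CALL: pushes a return address. */
void rs_call(ret_stack *s)
{
    assert(s != NULL);
    if (s->depth >= RS_LEVELS) s->overflow = true;
    else s->depth++;
}

/* RETURN: pops a return address. */
void rs_return(ret_stack *s)
{
    assert(s != NULL);
    if (s->depth == 0) s->underflow = true;
    else s->depth--;
}

/* Routine reached by CALL and ended by RETURN: balanced. */
void run_called_routine(ret_stack *s)
{
    rs_call(s);
    rs_return(s);
}

/* Routine reached by a branch (nothing pushed) that still executes a
   hand-written RETURN: one pop too many. */
void run_branched_routine_with_return(ret_stack *s)
{
    rs_return(s);
}
```

In this model the extra pop does not have to fail on the spot. If something is already on the stack, the stray RETURN consumes that caller's address, and the underflow only shows up when the caller later returns, which fits a fault that appears in unrelated code.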
terryopie



Joined: 13 Nov 2015
Posts: 13

Posted: Mon Nov 16, 2015 10:36 am

PCM programmer wrote:
I suspect that you are putting a RETURN instruction in the ASM code,
instead of letting the compiler handle the return by letting the function
proceed to the closing brace. Maybe you are not doing it in the posted
routine, but you may be doing it somewhere.

This would work, but sometimes the compiler won't do a CALL. It will do
a pseudo-call with a BRA to the routine, and the compiler inserts a BRA at
the end of the routine to jump back to the caller. There is no stack
involved. In this case, the insertion of RETURN is extremely ill advised.

If you thwart the compiler by inserting in your own RETURN in #asm,
you are sabotaging your own program. Absolutely marginal gains
are not worth going to assembly code.



The only assembly is what is listed in my above reply... No return that I can see would be added from that. Correct me if I am wrong.
Ttelmah



Joined: 11 Mar 2010
Posts: 19266

Posted: Mon Nov 16, 2015 11:16 am

There are several instructions missing from the posted assembler. After re-enabling GIE, you should clear the WREN bit. If this is not done, later table accesses can result in writes to the memory.
Then, before initiating the write, you must clear the EEPGD and CFGS bits, and set the WREN bit. As written it could fail to write completely (if the WREN bit is not set), and could write to the program memory instead of the EEPROM.
Look at the listing in the data sheet.
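
The bit handling described above can be sketched as plain mask operations on a copy of EECON1. Bits 7 (EEPGD), 2 (WREN) and 1 (WR) are taken from the comments in the posted assembler; bit 6 for CFGS is an assumption to check against the data sheet. This runs on a host and only models the register value; it does not perform a write, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EECON1_EEPGD 0x80u   /* bit 7: 0 selects data EEPROM          */
#define EECON1_CFGS  0x40u   /* bit 6 (assumed): 0 selects EEPROM/flash */
#define EECON1_WREN  0x04u   /* bit 2: write enable                   */
#define EECON1_WR    0x02u   /* bit 1: starts the write               */

/* Value EECON1 should hold just before the 0x55/0xAA unlock sequence. */
uint8_t eecon1_before_write(uint8_t value)
{
    value = (uint8_t)(value & (uint8_t)~(EECON1_EEPGD | EECON1_CFGS));
    return (uint8_t)(value | EECON1_WREN);
}

/* Value EECON1 should hold once the write has been started and
   interrupts are back on: write enable dropped again. */
uint8_t eecon1_after_write(uint8_t value)
{
    return (uint8_t)(value & (uint8_t)~EECON1_WREN);
}
```

Compared with the posted routine, the two differences this highlights are clearing CFGS before the write and clearing WREN afterwards.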
All times are GMT - 6 Hours
Page 1 of 2