Can you help debug an ESP32 demo crash?

(Marc MERLIN) #1

I’ve had this set of demos from Sublime for a while now:
The problem is that they work perfectly fine on ESP8266 or Teensy, but one of them crashes (reboots) with no traceback on ESP32. Last time I spent a while doing printf debugging, it felt like I was just moving the crash around as code alignment/compilation changed.
I’m almost certain it’s an ESP32 core bug, but I can’t prove it and can’t narrow it down, because when I do, it just moves/goes away.
@Yves_BAZIN or someone else with ESP32, is that something you’d have the time, skill and curiosity to debug on your ESP32 rig?
You’d think it’s an obvious out-of-bounds write, but:

  1. I haven’t found it
  2. typically ESP32 will catch those and give a traceback, which it does not do here

You’d have to take the code, install FastLED::NeoMatrix and change config.h to comment out SmartMatrix
and define the init of your array (pin numbers and resolution), it’s reasonably easy to do.

(Yves BAZIN) #2

@Marc_MERLIN I’ll have a look at it !!

(Marc MERLIN) #3

Thanks @Yves_BAZIN, this is what lightning looks like when it works (top, on Teensy) vs. on ESP32 (bottom), where lightning is disabled.
By the way, it’s kind of glorious to have 64x64 in such a small form factor :slight_smile:
missing/deleted image from Google+

(Dylan Lovinger) #4

I just debugged a freeze/reboot issue on my ESP8266. The WiFi module was causing it: once I disabled the WiFi chip, the reboots stopped. You just need the line WiFi.forceSleepBegin(); in setup, from “ESP8266WiFi.h” (may be different for ESP32).

(Marc MERLIN) #5

@Dylan_Lovinger thanks for the idea. On ESP32 it’s this instead:
#include <WiFi.h>
WiFi.mode( WIFI_OFF );

I just tried, it made no difference.

(Yves BAZIN) #6

On this line
leds[XY2(lx+1,ly-1,true)] = lightningColor;
When lx = matrix_width-1 (end of the loop),
then lx+1 = matrix_width.
I don’t know the function XY2, but my guess is that you’re going out of bounds of the array. Display the result of that function and see if you are outside 64*64.
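The check suggested here can be sketched as follows. This is only an illustration under assumptions: the real loop bounds and the real XY2 are not shown in the thread, so the sketch simply walks every (lx, ly) on a 64x64 panel and tests the neighbour coordinates (lx+1, ly-1) directly. `coord_in_bounds` and `count_out_of_bounds_neighbours` are hypothetical names.

```cpp
#include <cstdint>

constexpr int kWidth = 64;   // panel size mentioned in the thread
constexpr int kHeight = 64;

// True when (x, y) is a valid coordinate on the panel.
bool coord_in_bounds(int x, int y) {
    return x >= 0 && x < kWidth && y >= 0 && y < kHeight;
}

// Walk every cell and count how often the neighbour (lx+1, ly-1),
// the one used in the quoted line, falls off the panel.
// Assumption: the loops cover the full width and height.
int count_out_of_bounds_neighbours() {
    int bad = 0;
    for (int ly = 0; ly < kHeight; ++ly) {
        for (int lx = 0; lx < kWidth; ++lx) {
            if (!coord_in_bounds(lx + 1, ly - 1)) {
                ++bad;
            }
        }
    }
    return bad;
}
```

Checking the coordinates rather than the computed index matters here: with a plain row-major mapping, x = matrix_width on row y yields the same index as x = 0 on row y+1, so an index-only check would not flag it.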

(Marc MERLIN) #7

@Yves_BAZIN yeah, that code (which I inherited) is confusing and full of out of bounds stuff. The original code had a checker for XY that went to a dead pixel.
In FastLED::NeoMatrix / SmartMatrix_GFX I fixed it another way
XY2 mostly calls XY,
which “fixes” any coordinates that are out of bounds:

So it’s not a simple reading/writing out of bounds
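For readers unfamiliar with the "dead pixel" idea mentioned above, here is a minimal sketch of that pattern. It is not the code from FastLED::NeoMatrix or SmartMatrix_GFX, whose fix is not shown in the thread; the names `xy_safe`, `kDeadPixel`, and `make_led_buffer` are hypothetical, and the row-major layout is an assumption.

```cpp
#include <cstdint>
#include <vector>

constexpr int kWidth = 64;   // panel size mentioned in the thread
constexpr int kHeight = 64;
constexpr int kNumLeds = kWidth * kHeight;
constexpr int kDeadPixel = kNumLeds;  // one spare slot past the visible pixels

// Map (x, y) to a buffer index; anything off the panel lands on the spare slot.
int xy_safe(int x, int y) {
    if (x < 0 || x >= kWidth || y < 0 || y >= kHeight) {
        return kDeadPixel;
    }
    return y * kWidth + x;  // row-major layout (assumption)
}

// Allocate one extra element so that writes to kDeadPixel stay inside the buffer.
std::vector<std::uint32_t> make_led_buffer() {
    return std::vector<std::uint32_t>(kNumLeds + 1, 0);
}
```

With this layout a write such as `leds[xy_safe(lx + 1, ly - 1)] = color;` can never touch memory outside the buffer, which is consistent with the conclusion above that the crash is not a plain out-of-bounds access.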

(Yves BAZIN) #8

@Marc_MERLIN this is what you want?

(Marc MERLIN) #9

@Yves_BAZIN yes, I wanted to have rain and lightning in your living room :wink:
But more seriously, thanks for trying it out. I’m now even more curious as to why it works for you while the exact same code on the same chip does not work for me.
Either my Arduino compiler is different from yours (I have 1.8.6 hourly, but I’ve had it fail with older compilers a few months ago), or my ESP32 is older silicon with a hardware bug that is fixed in yours, or something else that is escaping me.
Either way, that narrows it down quite a bit, thanks for that.

Now you can go mop up all that water in your living room :smiley:

(Yves BAZIN) #10

@Marc_MERLIN you have
As it goes in a loop, it crashes.
Move it to a global variable.
If you want it to be used only in the function, then use malloc and then free. Otherwise it seems that the compiler has an issue with this local variable.
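The two options suggested here can be sketched as below. This is an illustration, not the demo's actual code: `g_lightning`, `run_effect_global`, and `run_effect_heap` are hypothetical names, and the "effect" just marks the diagonal so the result is easy to check. Note that a file-scope array normally lives in static storage rather than on the heap; either way it takes no stack space.

```cpp
#include <cstdint>
#include <cstdlib>

constexpr int kWidth = 64;   // panel size mentioned in the thread
constexpr int kHeight = 64;

// Option 1: file-scope array. It is zero-initialized once at startup and
// uses no stack, but it keeps its contents between calls.
static std::uint8_t g_lightning[kWidth][kHeight];

// Hypothetical effect body using option 1: mark the diagonal, count the
// marked cells, and reset each cell after use so the next call starts clean.
int run_effect_global() {
    for (int i = 0; i < kWidth; ++i) {
        g_lightning[i][i] = 1;
    }
    int marked = 0;
    for (int x = 0; x < kWidth; ++x) {
        for (int y = 0; y < kHeight; ++y) {
            marked += g_lightning[x][y];
            g_lightning[x][y] = 0;
        }
    }
    return marked;
}

// Option 2: heap buffer that lives only for the duration of the call.
// Returns -1 if the allocation fails.
int run_effect_heap() {
    auto *grid = static_cast<std::uint8_t *>(
        std::calloc(kWidth * kHeight, sizeof(std::uint8_t)));
    if (grid == nullptr) {
        return -1;
    }
    for (int i = 0; i < kWidth; ++i) {
        grid[i * kHeight + i] = 1;
    }
    int marked = 0;
    for (int i = 0; i < kWidth * kHeight; ++i) {
        marked += grid[i];
    }
    std::free(grid);
    return marked;
}
```

The heap variant pays an allocation on every call but can report failure through its return value, whereas an oversized local array simply overruns the stack.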

(Marc MERLIN) #11

@Yves_BAZIN so first of all, you have fixed my problem, and I think I understand why, now, but I don’t think I would have found that problem on my own without your hint, so thanks a lot.

If I understand correctly, it’s because global variables go in heap space, of which there was enough, while local variables in a function go in stack space, of which there was not enough, causing the crash.

This code is indeed quite inefficient. I didn’t read it all carefully to fix all its issues. For one, the lightning array could be a bitfield array, but in the meantime, I changed it to be uint8_t instead of int.
This single line patch makes the crash go away:

-                   int lightning[MATRIX_WIDTH][MATRIX_HEIGHT];
+                   uint8_t lightning[MATRIX_WIDTH][MATRIX_HEIGHT];

This points towards a limit on how much can be allocated in a function, I guess: 4KB is OK but 16KB is not.
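The arithmetic behind those two numbers can be written out as a small sketch. One value here is an assumption that does not come from the thread: `kAssumedLoopStackBytes` is set to 8192, which to my knowledge is the default stack size of the task that runs loop() in the Arduino core for ESP32. If that holds for the core version in use, it would explain why a 4KB local array survives and a 16KB one does not.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kWidth = 64;    // panel size mentioned in the thread
constexpr std::size_t kHeight = 64;
constexpr std::size_t kCells = kWidth * kHeight;

// Assumed default stack of the Arduino loop task on ESP32 (not from the thread).
constexpr std::size_t kAssumedLoopStackBytes = 8192;

// Bytes needed for a kWidth x kHeight local array with the given element size.
constexpr std::size_t grid_bytes(std::size_t element_size) {
    return kCells * element_size;
}

// True when the array alone would fit within the given stack size.
constexpr bool fits_on_stack(std::size_t bytes, std::size_t stack_bytes) {
    return bytes < stack_bytes;
}

constexpr std::size_t kIntGridBytes = grid_bytes(sizeof(std::int32_t));   // 32-bit int, as on ESP32
constexpr std::size_t kByteGridBytes = grid_bytes(sizeof(std::uint8_t));  // the patched version
constexpr std::size_t kBitGridBytes = kCells / 8;                         // one bit per cell
```

The int version needs 16384 bytes, the uint8_t version 4096 bytes, and a one-bit-per-cell layout would need only 512 bytes. The real headroom is smaller than the raw stack size, since the rest of the call chain also uses stack.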

I can also move this to a global, which of course is more efficient since it doesn’t need to be recreated every time the function is called. Two notes on that:
a) The code seemed to depend on that array being re-initialized to 0 every time the function is entered, although it actually resets each cell to 0 every time it’s used, so it ends up working as a global after all.
b) I was curious as to why re-creating an array every time you enter the function was causing a crash. It is of course inefficient, but after all it’s a 16KB array that goes out of scope at the end of the function call and should be freed, so that it can be re-created next time around.

Do you think there is an issue where it runs out of stack space, which is apparently limited to something smaller than 16KB?
I can see that moving from a 32-bit int to uint8_t divides the space by 4, which may get us back under a limit, and that going global puts the array in heap instead of stack, so the problem isn’t triggered either?

Did it work for you before you moved it to a global?

(Yves BAZIN) #12

When I saw that line, it also seemed really strange to me to redeclare a large array each time.
First, it did not work for me before I moved it to a global variable.
I had the error ‘overflow in a loop’. So moving it to global was the first logical move for me. Luckily it worked :wink:
Happy that it is running for you as well. Can’t wait to see a video.

(Marc MERLIN) #13

@Yves_BAZIN oh, your ESP32 crashed with a useful message? You’re lucky, mine just rebooted with no output. I guess your screensize is different from mine and triggered a crash report that for me didn’t do anything but cause the stack to be smashed and the code to just reboot with no warning.
As for the video, I could give you one, but you already have it running on your giant screen, sounds like you don’t need one :slight_smile:

(Yves BAZIN) #14

@Marc_MERLIN I have installed the package to interpret the error codes, maybe it helped too :wink:
Size doesn’t matter: I only put one pattern on for the test, and 64x64 in such a small form factor is nice to see in action.

(Marc MERLIN) #15

@Yves_BAZIN I have EspExceptionDecoder too, but all it does is translate the binary crash sent over the serial port. For some reason the 4096*4 byte allocation on the stack made my ESP32 crash and reboot, while the different allocation on yours caused the hypervisor thingy to kick in and help you.
I made a video yesterday and uploaded it when I announced SmartMatrix::GFX here: