While playing with the Grbl workspace to control my Raspberry Pi 3 + Arduino based laser cutter, I encountered the problem that the g-code streaming stops at some arbitrary point. Is there any material that helps with debugging such issues? Maybe examples of “normal” serial-port-json-server logs? Troubleshooting FAQs? Debug modes for the workspace with extended logging, etc.? Thanks :slight_smile:

Does the g-code stop at the same line or randomly? What is the machine state after the stop (alarm or not)? Do you have limit switches enabled? It sounds like a fault alarm due to electrical noise.

It stops randomly, the machine state is Idle, and I disabled hard/soft limit checking. It feels more like a protocol sync loss between bufferflow_grbl.go and the workspace: something with buffers, queues, maybe even \r\n handling? I could provide the verbose serial-port-json-server log at the point where it stops.

I think this is the same bug I mentioned two days ago to somebody else. I think the Grbl buffer has always had the possibility of getting deadlocked. TinyG used to have this bug and I fixed it over a year ago. The Grbl buffer still has the old code in it, and it should just get rewritten using the TinyG buffer code.

Yep, it was me. Hopefully somebody can get this fixed.

@John Ok, thanks for this fast assessment.

@jlauer Do you possibly remember what part of the code has to be checked/changed?

You could start with using the TinyG buffer as your template and then steal code from the Grbl buffer. Things you’d have to change: Grbl needs 120 bytes in its buffer while TinyG has 200, and Grbl needs to be queried every 250ms since it doesn’t report location on its own. Or you could go the reverse way and start with the Grbl buffer and merge in the locking code from TinyG. A rough sketch of the 250ms polling is below.
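
For illustration, here is a minimal sketch of that 250ms status polling, assuming the serial port is exposed as a plain io.ReadWriter. The names (pollStatus, grblpoll) and the parsing are hypothetical, not SPJS’s actual implementation; the grounded part is that Grbl’s real-time status query is a single “?” byte and that the host must poll on an interval because Grbl doesn’t push position reports.

```go
// Hypothetical sketch of periodic Grbl status polling; not SPJS code.
package grblpoll

import (
	"bufio"
	"io"
	"log"
	"strings"
	"time"
)

// pollStatus writes Grbl's real-time status query "?" every 250ms and
// logs status report lines until stop is closed.
func pollStatus(port io.ReadWriter, stop <-chan struct{}) {
	ticker := time.NewTicker(250 * time.Millisecond)
	defer ticker.Stop()

	// Read replies on a separate goroutine, mirroring the send/receive
	// thread split discussed later in this thread.
	go func() {
		scanner := bufio.NewScanner(port)
		for scanner.Scan() {
			line := scanner.Text()
			// Status reports look like "<Idle,MPos:..." (Grbl 0.9)
			// or "<Idle|MPos:..." (Grbl 1.1).
			if strings.HasPrefix(line, "<") {
				log.Println("status:", line)
			}
		}
	}()

	for {
		select {
		case <-ticker.C:
			if _, err := port.Write([]byte("?")); err != nil {
				log.Println("write error:", err)
				return
			}
		case <-stop:
			return
		}
	}
}
```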

After having a look at the source code, it is clear that just reading a Go tutorial won’t be sufficient for fixing synchronization issues. Is it realistic to expect that one of the contributors will fix it in the next few days?

@Henry_Hoffmann_elyss I am trying to reproduce this problem, running a 1.7 MB file with more than 83,000 lines of g-code. 10,550 lines have been sent already, ~13,600 are in the queue, and there is no problem so far. I’m running SPJS v1.88 on a Win7 PC with the Arduino connected to the same host.

That’s why it was originally such an elusive bug to find

And this is still only a theory

What I will say, though, is that the TinyG buffer has had no reports of this bug in the last year.

I filled the buffer after sending 34,700 lines. Now the buffer is emptying and I am waiting to see whether consecutive g-code lines will follow…
G-code lines started ‘flowing’ again when the buffer was at ~16,000 and everything went back to normal.
SPJS -v logging shows ‘Buffer decreased’, ‘Buffer increased’, ‘Buffer full’.
The next ‘buffer full’ waiting point came after sending 45,900 lines, and lines continued after the buffer dropped to ~15,600.

I have no RPi3 here.

First, thanks a lot for trying to reproduce the problem! So far I have only observed the problem with picture-engraving g-code, e.g. generated by LaserWeb. I uploaded one file that fails to Dropbox: https://www.dropbox.com/s/35h801ac9ndu91i/engraving%20that%20breaks%20SJPS.gcode?dl=0
To be sure, I just retested it by pasting it into LaserWeb2 and running it after one homing cycle. It failed again at about 2/3 of the code. I also uploaded the verbose log file from this test: https://www.dropbox.com/s/496rm6akyq334jq/engraving%20that%20breaks%20SJPS%20log.txt.zip?dl=0 . Line 212429 is the last line with a “Grbl just completed a line of gcode” message; after that, the machine is idle.

@sszafran I think you reinforced the theory. It’s a hard bug to reproduce, but it’s there. What happens is that when a line of g-code is sent to Grbl, a counter is incremented. If the g-code line is 80 bytes long, the counter is set to 80. Then the next line is sent immediately; if it is 19 bytes long, the counter is now at 99. When it goes to send the 3rd line, which is 50 bytes, it knows that would overflow the Grbl buffer, which has a maximum of 128 bytes available. So it does a thread pause, blocking further sending. When an OK comes back from Grbl on the receive thread, the counter is decremented for that OK’s line. This would be the OK from the 1st line, which was 80 bytes, so the counter is now at 19 bytes. The receive thread then sends a signal to the send thread to unblock itself, because there’s room again to send. A rough sketch of this flow, and of where it can go wrong, is below.
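
Here is a minimal Go sketch of that character-counting flow, with the suspected race window marked. The names (counter, sendLine, onOK) are hypothetical, not the actual bufferflow_grbl.go code, and the signaling scheme is only one plausible shape of the bug being described:

```go
// Hypothetical sketch of the character-counting protocol described
// above; not the actual bufferflow_grbl.go code.
package grblbuf

import "sync/atomic"

const grblBufSize = 128 // Grbl's serial RX buffer size in bytes

type counter struct {
	inFlight int64         // bytes believed to be in Grbl's RX buffer
	wake     chan struct{} // receive thread signals "room was freed"
}

func newCounter() *counter {
	return &counter{wake: make(chan struct{})}
}

// sendLine runs on the send thread: block while the line would overflow
// Grbl's buffer, then account for its bytes and write it out.
func (c *counter) sendLine(line string) {
	for atomic.LoadInt64(&c.inFlight)+int64(len(line)) > grblBufSize {
		// RACE: if the OK arrives between the fullness check above and
		// this receive, the wakeup is dropped (see onOK) and the send
		// thread can sleep forever: the suspected deadlock.
		<-c.wake
	}
	atomic.AddInt64(&c.inFlight, int64(len(line)))
	// ... write line plus newline to the serial port here ...
}

// onOK runs on the receive thread for each "ok" Grbl returns, crediting
// back the byte count of the oldest outstanding line.
func (c *counter) onOK(lineLen int) {
	atomic.AddInt64(&c.inFlight, -int64(lineLen))
	select {
	case c.wake <- struct{}{}: // a sender was already waiting: wake it
	default: // nobody waiting yet: the signal is lost
	}
}
```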

The deadlock happens if the two threads unblock and block at just the right moment but in the wrong order, down to the nanosecond in the processor: the receive thread’s wakeup signal fires before the send thread has actually gone to sleep, so the signal is lost and the sender blocks forever. This was solved via thread locking in the TinyG buffer code, because it eliminates the possibility of those two things happening at the same time.
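
For comparison, a sketch of that locking approach using Go’s sync.Cond, so the fullness check and the wait happen atomically under one mutex. Again these are assumed names, not the actual TinyG or SPJS buffer code:

```go
// Hypothetical sketch of the TinyG-style locking fix; not SPJS code.
package grblbuf

import "sync"

const grblBufSize = 128 // Grbl's serial RX buffer size in bytes

type lockedCounter struct {
	mu       sync.Mutex
	cond     *sync.Cond
	inFlight int // bytes believed to be in Grbl's RX buffer
}

func newLockedCounter() *lockedCounter {
	c := &lockedCounter{}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// sendLine blocks while the line would overflow Grbl's buffer. The check
// and the wait both happen while holding mu, and Wait atomically releases
// mu as it suspends, so an "ok" can no longer slip into the gap between
// "decide to sleep" and "actually asleep".
func (c *lockedCounter) sendLine(line string) {
	c.mu.Lock()
	for c.inFlight+len(line) > grblBufSize {
		c.cond.Wait()
	}
	c.inFlight += len(line)
	c.mu.Unlock()
	// ... write line plus newline to the serial port here ...
}

// onOK is called by the receive thread for each "ok" from Grbl.
func (c *lockedCounter) onOK(lineLen int) {
	c.mu.Lock()
	c.inFlight -= lineLen
	c.mu.Unlock()
	c.cond.Signal() // wakeup cannot be lost: Wait re-checks under mu
}
```

The design point is that the condition variable closes exactly the window described above: a signal sent while no sender is waiting is harmless, because the sender re-checks the counter under the lock before ever sleeping.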

@sszafran Did the files help to reproduce the issue on your hardware?

Today I tried LaserWeb2-generated g-code in the ChiliPeppr Grbl workspace and I ran into the ‘freezing’ situation you mentioned earlier. Unfortunately no logging was enabled, and I am now trying to catch the problem again.