I decided to do some more time checks with my POV code. As a refresher, this is doing a single read for 144 bytes from an SD card and pushing it out via FastSPI, which is bitbanging it on pins 6 and 7 to a 48-pixel WS2801 string. So I’m thinking: there are a few possible slowdowns there … reading the SD … bit banging … the WS2801 itself … why not figure it out? So this is what the timing code does:

while (1) {
  unsigned long timeStart, timeMid, timeEnd;  // micros() returns unsigned long
  for (int myCol = 0; myCol < columns; myCol++) {
    timeStart = micros();
    myFile.read((char*)leds, NUM_LEDS * 3);  // read all 144 bytes (48 pixels * 3)
    timeMid = micros();
    LEDS.show();  // update string
    timeEnd = micros();
    cout << "Elapsed time to read SD: " << timeMid - timeStart << " micros.\n";
    cout << "Elapsed time to push to string: " << timeEnd - timeMid << " micros.\n\n";
  }
  // Rewind back to the beginning of the file, AFTER the header
}

As you can see, I start the timer right before an SD read. I take a sample after the read is completed, and another at the end after LEDS.show();. This is the result:

Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 1524 micros.

Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 1564 micros.

Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 1568 micros.

Elapsed time to read SD: 1292 micros.
Elapsed time to push to string: 1564 micros.

Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 1572 micros.

Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 1568 micros.

Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 1564 micros.

Elapsed time to read SD: 1292 micros.
Elapsed time to push to string: 1564 micros.

Every 4th read, it takes a long time because it’s seeking for the next block on the card. This is expected. But what really surprised me was the time it takes to push the data out.

When I tested the same timing checks with the PROGMEM version of the program, same thing: pushing data out to the string is what’s taking a long time. The numbers were about 20-30 usecs faster. Big whoop. (The PROGMEM version is actually a tad slower in reading the data because I have to loop 48 times just to get a single column of data, and I also have to process each value to extract r, g, and b. The SD version is raw data with no processing needed. - Thanks to @Mark_Kriegsman for that suggestion.)
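For reference, the per-pixel unpacking the PROGMEM version has to do looks roughly like this. This is a hedged sketch: the 0x00RRGGBB packing and the `unpack` helper are assumptions for illustration, not the actual code from the project; the point is just that the SD version skips this work because it reads raw r, g, b bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical packed-color layout: 0x00RRGGBB, one uint32_t per pixel.
// The SD version reads raw r,g,b bytes, so no per-pixel extraction is needed there.
struct RGB { uint8_t r, g, b; };

RGB unpack(uint32_t packed) {
    RGB c;
    c.r = (packed >> 16) & 0xFF;  // high byte
    c.g = (packed >> 8)  & 0xFF;  // middle byte
    c.b =  packed        & 0xFF;  // low byte
    return c;
}
```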

So, I’m back to wondering if it’s the WS2801 string that’s slow, or the fact that I’m bitbanging it on non-SPI pins. Or maybe both?

Unfortunately, I can’t (easily) put the string on the SPI bus because the SD card is on there. I say ‘easily’ because I could use a gate to add a select pin that switches between the card and the string, but I have to believe that toggling back and forth between a read and a string push will cause some delays … But I’m willing to entertain the possibilities … Does anyone have experience flipping back and forth between two SPI devices, and do both still work fine at full speed?

Because of the 800kHz timing, it takes 1.25us for one bit. Multiply by 24 bits per pixel and multiply by 48 pixels and you get 1440us. That’s just for the basic bit timing and not including any inherent delays or cycle time for reading the data out of memory. With either the SPI pins or GPIO, the library can’t update the pixels faster than that.

That still doesn’t explain me needing to add an additional 1ms of pause between each full column read (not visible in the above snippet of code.) So updates to the string actually happen in the 2.5ms range. On an image with 200 columns, that’s 500ms to display the full image. And I don’t know where the breakdown happens.
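The arithmetic in the two paragraphs above can be checked directly (times in microseconds; the helper names here are just for illustration):

```cpp
#include <cassert>

// Time to clock out one full frame at a given per-bit time (all in microseconds).
double frameTimeUs(double bitTimeUs, int bitsPerPixel, int numPixels) {
    return bitTimeUs * bitsPerPixel * numPixels;
}

// Total image time in milliseconds given a per-column cycle time.
double imageTimeMs(double columnUs, int columns) {
    return columnUs * columns / 1000.0;
}
```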

Basically replace the serial out messages with:

_delay_us(2500L - (timeEnd - timeStart));

Anything lower than 2500us and I get image corruption (or as @Mark_Kriegsman said, repeating pixels.) I’ve checked the SD reads and there’s no data loss there, it’s reading in all the necessary data, correctly, and in the order it needs to be. So somewhere between that happening and it being pushed out to the string, something bad happens, but only if there isn’t an appropriate amount of “pause” …
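One side note on that delay line: avr-libc’s `_delay_us` is documented to want a compile-time constant argument, so passing it a runtime expression is shaky. The underlying idea, padding each loop pass out to a fixed period, can be sketched generically (on Arduino you would feed the result to `delayMicroseconds()`; `padToPeriodUs` is a hypothetical helper name, not anything from the project):

```cpp
#include <cstdint>

// Given the elapsed time of the work already done in this pass, return how long
// to stall so the whole pass takes exactly periodUs. Returns 0 if the work
// already overran the period.
uint32_t padToPeriodUs(uint32_t periodUs, uint32_t elapsedUs) {
    return (elapsedUs >= periodUs) ? 0 : (periodUs - elapsedUs);
}
```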

Here’s a question: does LEDS.show() have some sort of buffer? As in, when LEDS.show() is called, does it shove the data out and latch while blocking, or does it take the data and start sending it to the string while allowing the program to continue to run? Kinda like a threaded setup. I know 8-bit AVRs can’t do threading, but then Daniel’s a magician with his code, so who knows …

If there’s some kind of buffering, that may explain why I’m getting corruption if the next set of data comes in before it’s actually done sending the previous bit.

Actually, if you’re bit banging, try pulling the clock divider back down towards 1 - your data rate is closer to 700kbps than 1Mbps - the clock timings for bit banging are nowhere near exact at the moment. The clock divider basically becomes how many nops to add to each bit output for bit banging.

Also, Paul, you are confusing the ws2811 with the ws2801 - the 2801’s data rate is at least 1Mhz, and I’ve seen it higher with shorter strands or good grounding.

Are you referring to setting the data rate? Cuz I’m not declaring that right now.

Yes, I am - for ws2801 it defaults to the clock divider value for 1Mhz - but that setting is nowhere near exact for bit banging.

Ok, I’ll play with that when I get home later. Certainly something to try. Thanks, I hadn’t thought of that. I figured if I don’t declare it, it will go “as fast as it can” …

HEY, waidaminit! Aren’t you supposed to be working on your July 4th project?! grin

Ahh sorry for the confusion - I misread it as WS2811.

I got enough questions/issues caused by the ws2801’s clock limitations that I put in a default lower clock rate for it.

Well, as soon as my LPDs come in, I’m bidding farewell to the WS2801 for a while … :slight_smile:

Between the ws2811’s for low cost, and the LPD8806’s for speed, the only time I light up the ws2801s anymore is for testing for folks who still insist on using them :slight_smile:

Of course, I may have a way to get a higher aggregate data rate out of the ws2811’s than the LPD8806’s at even their fastest speed :slight_smile:

I hope it works, because the july 4th project will need it…

I’m thinking the next logical step for me (in a future revision) is to use a DMA capable microcontroller … Then push data like there’s no tomorrow … Of course this also relies on the author of FastSPI having time to work on DMA code … oh wait. Hey, HI DANIEL!

Ok @Daniel_Garcia , different values for DATA_RATE_MHZ(x). The values to push the data out are averages over 10 columns.

x = 128, 64, 32, 16, 8 (identical results):
Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 531.6 micros.

x = 4
Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 681.2 micros.

x = 2
Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 982.0 micros.

x = 1
Elapsed time to read SD: 108 micros.
Elapsed time to push to string: 1,582.8 micros.

And when I don’t define a data rate at all it returns the same values as if it was set to 1, so 1,582.8 usecs.

So it seems the lower a value I use for data rate, the slower it gets.

I’ll try setting it to >8 and try some pictures later tonight, see what it does when I change my delay, which right now is at 2500 usecs.

Lower being slower - that makes a lot of sense, actually. The data rate values are direct - e.g. DATA_RATE_MHZ(1) should be (roughly) 1MHz (though for bit banging it looks like you’re actually getting closer to 725kHz), DATA_RATE_MHZ(2) should be 2MHz, so twice as fast, and so on. For the hardware SPI support, those values are used to actually set the clock (as close as possible, at least, depending on what dividers are available). This is a lot more portable than simply having people specify clock dividers, which change based on what you’re building for :slight_smile: For bit banging, it affects the insertion of nop delays, which - as you can see - results in a much rougher mapping between desired data rate and real data rate. (Those rates up there map to 0.72Mbps, 1.17Mbps, 1.69Mbps, and 2.16Mbps.)

So, it looks like you’re able to get a bit over 2 Mbps with bit banging. Alas, higher bit banging rates would require either a faster CPU clock, or clock/data on the same port, which I need to fix up to work better with devices that can’t handle clock/data being strobed together.

I’m happy with the higher rate. It’s whether it solves the image corruption that I’m even more interested in. It’s too bright to take a proper picture and look at the image itself. I still don’t quite understand why it works when I add the 2500 usecs delay after LEDS.show(), essentially telling to wait before cycling to the next column.

When it sends the latch (500usecs delay), does it actually block the program, or does it send a delay and return right away? In other words, is it possible that I’m sending the next column of data before the 500usecs have expired?

So the docs that I had for the ws2801 said a 24µs delay, not 500 - I wonder if you’ve got a different run/batch? You can go into chipsets.h and change the line that says:

CMinWait<24> mDelay;

to:

CMinWait<500> mDelay;

although - shit, I just realized that the code in there isn’t using the delay at all - argh… brb…

Ignore that last line - I was looking at a different chipset - anyway, back to this - try changing the delay from 24 to 500.

The code is smart in that it marks the time in µs when you finish writing data, and then on your next attempt to write, it checks to see if 24µs (or, 500 if you make that change above) have passed since mark was called. If not, then it waits the remaining amount of time.

In other words, it tries to let you burn as much of that 24 or 500µs in your code instead of mine as possible :slight_smile:

Hmm - I’ll need to add a fix in there to deal with the fact that micros rolls over every 70 minutes…
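The mark/wait behavior described above boils down to something like the following. To be clear, this is a host-side sketch of the idea, not FastSPI’s actual CMinWait code: doing the elapsed-time comparison with an unsigned subtraction is one common way to stay correct across a uint32_t micros() rollover, since the subtraction wraps modulo 2^32.

```cpp
#include <cstdint>

// Sketch of a CMinWait-style gate: mark() stamps the time a write finished, and
// remaining() reports how much of the minimum gap is still left at time nowUs.
// The caller would burn off that remainder before the next write.
struct MinWaitSketch {
    uint32_t minUs;
    uint32_t markUs;
    void mark(uint32_t nowUs) { markUs = nowUs; }
    uint32_t remaining(uint32_t nowUs) const {
        uint32_t elapsed = nowUs - markUs;  // wraps correctly, modulo 2^32
        return (elapsed >= minUs) ? 0 : (minUs - elapsed);
    }
};
```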

Ok, I’ll add that to the list of test to perform. I don’t actually know if these strips require a delay of 500usecs. I’m going by a datasheet I received when I purchased my first set of WS2801 individual ICs for a custom project.

So this is the plan right now:

  • check image display with current settings of data rate set at >8 and delay at 2500 usecs
  • if the image looks okay, gradually drop the delay. Ideal delay is around 750 usecs.
  • if things DON’T look okay, I’ll change the delay in chipsets.h and run the same battery of tests again.

Anything I’m forgetting?

Not that I can think of.

You know - this is something I should think about how to export - you could use the value of that minwait delay to force a framerate. You know it takes 531µs to write out a frame - so take your desired framerate (say, 1000fps), figure out how many µs you need per frame (1000µs - I made my math easy), and then set CMinWait in the header file to 470 (give or take).

Gating this way is probably better than trying to figure out the appropriate delay values to put in various places.
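That framerate math can be written out as a small sketch (the 531µs frame write time and 1000fps target are the numbers from this thread; `minWaitForFps` is a hypothetical helper, not part of FastSPI):

```cpp
// Minimum wait needed between frames to hit a target framerate, given how long
// one frame takes to write out. All times in microseconds.
int minWaitForFps(int targetFps, int frameWriteUs) {
    int periodUs = 1000000 / targetFps;  // e.g. 1000 fps -> 1000 us per frame
    int wait = periodUs - frameWriteUs;
    return wait > 0 ? wait : 0;
}
```

At 1000fps with a 531µs frame this lands on 469µs, which matches Daniel’s “470 (give or take)”.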