For folks who raise questions about the overhead of C++ vs.

Daniel_Garcia · December 10, 2015, 2:17am

For folks who raise questions about the overhead of C++ vs. C - check this out. While doing timing testing and tweaking for the CRGBSet stuff, I added support for C++11 style ranged loops - and I discovered that they were faster! For example - this code:

for( CRGB & pixel : leds) { pixel = CRGB::Black; }

runs about 10-20% faster on AVR than this code:

for(int i = 0; i < NUM_LEDS; i++ ) { leds[i] = CRGB::Black; }

(10% faster if NUM_LEDS is less than 255, 20% faster if NUM_LEDS is over 255.) Bonus points to folks who can tell me why (I already know, and @Mark_Kriegsman, I suspect you do too :) )

Some good stuff coming down the line. (Also, I’ve changed how I’m referring to these things internally to PixelSets - mostly because RGB pixels aren’t going to be the only types of pixels supported before long.)

Mark_Kriegsman · December 10, 2015, 2:27am

This is GREAT. I’m loving the new compact, simple syntax for doing the thing that we do ALL the time: loop over pixels… Just great. And the performance increase? Very, very nice!

Jarrod_Wagner · December 10, 2015, 3:35am

What a coincidence, I just finished the chapter in C++ primer on iterators and range loops!

No idea why a byte value would run faster though…

That’s a bit (8 to be precise) over my head!

Any tests on an ARM platform?

marmil · December 10, 2015, 3:41am

I don’t understand (yet) how the C++11 style loops work, but I do know that a 10% or 20% speed improvement is a very nice bonus.

Any link suggestions for where to learn the basics of this style loop would be welcome. Because 10%!

Daniel_Garcia · December 10, 2015, 4:09am

So - to be clear, this is 10-20% over what is a mostly empty loop (foo = CRGB::Black basically just copies 3 zeros) - if you’re doing a whole lot of work inside of the loop, you probably won’t see a whole lot of increase. (Also, the performance difference has as much to do with me tweaking the internals of CRGBSet and its iterators; it used to be 70% slower than the regular C loop.)

Not everything can use this kind of loop - although C++11 is trying to make it easier. http://www.codesynthesis.com/~boris/blog/2012/05/16/cxx11-range-based-for-loop/ is a pretty good article about it.

I need to be careful with these features in the library. Some things - like putting in support for iterator-based range loops - can be done in a way that lets folks using pre-C++11 compilers still use the library. Other things, though - for example, if I were to start using ranged loops inside of FastLED itself - would break the library for people who aren’t using C++11/C++14 compilers yet.
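A minimal sketch of the first case described above, assuming a hypothetical `PixelSpan` class and a hypothetical `Pixel` struct standing in for CRGB (neither is FastLED's actual CRGBSet code). The class itself uses only pre-C++11 features; it just exposes `begin()` and `end()`, which is what a range-based for loop looks for when the caller happens to have a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CRGB so the sketch compiles on its own.
struct Pixel {
    unsigned char r, g, b;
};

// Hypothetical view over a run of pixels. Nothing in the class needs C++11.
class PixelSpan {
public:
    PixelSpan(Pixel* data, std::size_t length) : data_(data), length_(length) {
        assert(data != 0 || length == 0);  // an empty span may have no storage
    }

    // A range-based for loop calls these two members to find its bounds.
    Pixel* begin() const { return data_; }
    Pixel* end() const { return data_ + length_; }
    std::size_t size() const { return length_; }

private:
    Pixel* data_;
    std::size_t length_;
};

// Caller code on a C++11 compiler can use the short form...
void fillRanged(PixelSpan span, unsigned char level) {
    for (Pixel& pixel : span) {
        pixel.r = pixel.g = pixel.b = level;
    }
}

// ...while caller code on an older compiler walks the same iterators by hand.
void fillClassic(PixelSpan span, unsigned char level) {
    for (Pixel* it = span.begin(); it != span.end(); ++it) {
        it->r = it->g = it->b = level;
    }
}
```

Only `fillRanged` needs C++11; the class and `fillClassic` would build on an older compiler, which is the compatibility property the post is describing.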

Still, I’m looking forward to spending some time with the spec and new features and seeing what I can pull into FastLED to make things easier for folks.

Jarrod_Wagner · December 10, 2015, 4:16am

@marmil for(reference to object of: this){ operation on object }

The compiler determines the elements in the range (this) and runs the code between the braces once for each element in “this”, without needing to manually increment an index the way we do with a traditional C-style for loop.

Declaring the loop variable as a reference (&) allows us to modify the element we are currently at, whereas without it you only get a copy: you can “view” the object, but changes to it don’t reach the original.

For example:

int i=0;
for(CRGB pixel: leds){i++;}

Would leave i equal to the number of pixels in leds (NUM_LEDS in the example above), which is to say, for each CRGB pixel in leds, we increment i by 1.

If you Google range loop it should yield some insightful reading.
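To make the explanation above concrete, here is a small sketch of a range loop next to a hand-written loop that does the same work, plus the by-value case. `Pixel` and the value of `NUM_LEDS` are assumptions made so the example compiles without FastLED; for a plain array, the compiler's expansion is roughly the pointer walk shown in the second function.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CRGB.
struct Pixel {
    unsigned char r, g, b;
};

const std::size_t NUM_LEDS = 8;  // assumed strip length for the example

// Range-based form: the reference lets the body write to each element.
void clearWithRangeLoop(Pixel (&leds)[NUM_LEDS]) {
    for (Pixel& pixel : leds) {
        pixel = Pixel{0, 0, 0};
    }
}

// Roughly what the loop above expands to for a plain array: walk a pointer
// from the first element to one past the last. There is no index variable.
void clearWithExplicitPointers(Pixel (&leds)[NUM_LEDS]) {
    Pixel* first = leds;
    Pixel* last = leds + NUM_LEDS;
    for (; first != last; ++first) {
        Pixel& pixel = *first;
        pixel = Pixel{0, 0, 0};
    }
}

// Without the '&', each pass works on a copy, so the array is left unchanged.
std::size_t countWithoutModifying(Pixel (&leds)[NUM_LEDS]) {
    std::size_t count = 0;
    for (Pixel pixel : leds) {
        pixel.r = 0;  // changes only the local copy
        ++count;
    }
    return count;
}
```

After `countWithoutModifying` runs, the return value equals `NUM_LEDS` and the array still holds whatever it held before.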

Daniel_Garcia · December 10, 2015, 5:29am

@Jarrod_Wagner right now the range loop is about 10% slower on ARM - some of this, I suspect, is because a lot of operations that on ARM take only a single cycle can take 2-8 cycles on AVR. As I did with AVR, I’ll spend some time digging through the compiler output to see what I can tweak.

Daniel_Garcia · December 10, 2015, 6:22am

Ok - tracked down. It turned out the C loop on ARM was 10% faster than the ranged loop because the compiler was taking advantage of the loop conditional being a constant and was putting an extra bit of optimization in there. If I changed my test code to pass in a CRGBSet for the range loop test and a CRGB*/int for the C loop test, then the tables are reversed, and the C loop is now 10% slower. For those interested in what the compiler is doing under the hood - here’s a note dump from digging into it - https://gist.github.com/focalintent/6a936de0502c98bbba6c
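The difference between the two test setups can be sketched roughly like this. It is not the actual benchmark code: `Pixel`, the value of `NUM_LEDS`, and both function names are assumptions for illustration. The point is only that in the first function the compiler can see the loop bound as a constant, while in the second the bound arrives as a CRGB*/int-style pair of arguments.

```cpp
#include <cassert>

// Hypothetical stand-in for CRGB.
struct Pixel {
    unsigned char r, g, b;
};

const int NUM_LEDS = 300;  // assumed value for the example
Pixel leds[NUM_LEDS];

// The trip count is a compile-time constant and the target is a known global
// array, so the optimizer has the most information to work with.
void clearConstantBound() {
    for (int i = 0; i < NUM_LEDS; i++) {
        leds[i] = Pixel{0, 0, 0};
    }
}

// The pointer and count arrive as arguments, so the loop has to be compiled
// for an arbitrary length (unless the call gets inlined with constants).
void clearRuntimeBound(Pixel* pixels, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        pixels[i] = Pixel{0, 0, 0};
    }
}
```

When timing something like this, keeping the runtime-bound version from being inlined at a call site with constant arguments matters; otherwise the compiler may recover the constant and the two cases stop being different.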