For folks who raise questions about the overhead of C++ vs. C - check this out. While doing timing testing and tweaking for the CRGBSet stuff, I added support for C++11 style ranged loops - and I discovered that they were faster! For example - this code:
for(int i = 0; i < NUM_LEDS; i++ ) { leds[i] = CRGB::Black; }
(The ranged loop is 10% faster than that C loop if NUM_LEDS is less than 255, 20% faster if NUM_LEDS is over 255. Bonus points to folks that can tell me why - I already know, and @Mark_Kriegsman, I suspect you do too :)
Some good stuff coming down the line. (Also, I’ve changed how I’m referring to these things internally to PixelSets - mostly because RGB pixels aren’t going to be the only types of pixels supported before long.)
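For anyone who wants to see the two loop styles from this post side by side, here is a minimal sketch. It is not the benchmark code: `Pixel` is a hypothetical stand-in for CRGB so the snippet compiles without FastLED, the strip length of 60 is an assumption, and `countLit` is only there to check the result.

```cpp
#include <stdint.h>

// Hypothetical stand-in for FastLED's CRGB, so the sketch compiles on its own.
struct Pixel {
    uint8_t r, g, b;
};

const int NUM_LEDS = 60;  // assumed strip length

// Traditional C-style loop: index into the array on every pass.
void clearIndexed(Pixel (&leds)[NUM_LEDS]) {
    for (int i = 0; i < NUM_LEDS; i++) {
        leds[i] = Pixel{0, 0, 0};
    }
}

// C++11 ranged loop: no index variable, each element is visited by reference.
void clearRanged(Pixel (&leds)[NUM_LEDS]) {
    for (Pixel &pixel : leds) {
        pixel = Pixel{0, 0, 0};
    }
}

// Helper for checking results: counts pixels that are not black.
int countLit(const Pixel (&leds)[NUM_LEDS]) {
    int lit = 0;
    for (const Pixel &pixel : leds) {
        if (pixel.r || pixel.g || pixel.b) lit++;
    }
    return lit;
}
```

Both functions leave the array in the same state; any speed difference comes from the code the compiler generates for each form, so it will vary by platform and compiler version.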
This is GREAT. I’m loving the new compact, simple syntax for doing the thing that we do ALL the time: loop over pixels… Just great. And the performance increase? Very, very nice!
So - to be clear, this is 10-20% over what is a mostly empty loop (foo = CRGB::Black basically just copies 3 zeros) - if you’re doing a whole lot of work inside of the loop, you probably won’t see a whole lot of increase. (Also, the performance difference has as much to do with me tweaking the internals of CRGBSet and its iterators; it used to be 70% slower than the regular C loop.)
I need to be careful with these features in the library. Some things - like putting in support for iterator-based range loops - can be done in a way that folks using pre-C++11 compilers can still use the library. Other things, though - for example, using ranged loops inside of FastLED itself - would break people who aren’t using C++11/C++14 compilers yet.
Still, I’m looking forward to spending some time with the spec and new features and seeing what I can pull into FastLED to make things easier for folks.
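To illustrate the compatibility point: supporting ranged loops only requires a type to expose `begin()` and `end()`, and those are ordinary member functions that older compilers accept. The sketch below is a rough illustration under that assumption. `Pixel`, `PixelView`, `fillRed` and `fillBlue` are hypothetical names, not FastLED's actual classes or API.

```cpp
#include <stdint.h>

// Hypothetical stand-in types; not FastLED's actual CRGB or CRGBSet.
struct Pixel {
    uint8_t r, g, b;
};

class PixelView {
public:
    PixelView(Pixel *data, int len) : data_(data), len_(len) {}

    // begin()/end() are all a ranged loop needs, and they are plain
    // member functions that a pre-C++11 compiler accepts too.
    Pixel *begin() const { return data_; }
    Pixel *end() const { return data_ + len_; }
    int size() const { return len_; }

private:
    Pixel *data_;
    int len_;
};

// Pre-C++11 callers can still walk the view with explicit iterators.
void fillRed(PixelView view) {
    for (Pixel *it = view.begin(); it != view.end(); ++it) {
        it->r = 255;
        it->g = 0;
        it->b = 0;
    }
}

#if __cplusplus >= 201103L
// C++11 callers get the compact form from the same begin()/end() pair.
void fillBlue(PixelView view) {
    for (Pixel &p : view) {
        p.r = 0;
        p.g = 0;
        p.b = 255;
    }
}
#endif
```

The ranged loop lives only in user-facing code guarded by the language version check, so nothing inside the "library" part depends on C++11.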
@marmil for(reference to object of: this){ operation on object }
The compiler determines the elements in the range (this) and works its way through the code between the braces once for each element in “this”, without needing to manually increment an index the way we do with a traditional C for loop.
Declaring the loop variable as a reference (&) lets us modify the element we are currently at, whereas without it the loop variable is a copy: you can read it, but changes to it don’t reach the original.
For example:
int i=0;
for(CRGB pixel: leds){i++;}
That would leave i equal to NUM_LEDS (the number of elements in leds), which is to say, for each CRGB pixel in leds, we increment i by 1.
If you Google range loop it should yield some insightful reading.
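The by-value versus by-reference distinction described above can be sketched like this. `Pixel` is a hypothetical stand-in for CRGB, the array length of 8 is an assumption, and the function names are made up for illustration.

```cpp
#include <stdint.h>

// Hypothetical stand-in for CRGB.
struct Pixel {
    uint8_t r, g, b;
};

const int NUM_LEDS = 8;  // assumed

// By value: `pixel` is a copy, so writing to it leaves the array untouched.
int countAndTryToSet(Pixel (&leds)[NUM_LEDS]) {
    int i = 0;
    for (Pixel pixel : leds) {
        pixel.r = 255;  // changes only the local copy
        i++;
    }
    return i;  // number of elements visited
}

// By reference: `pixel` aliases the array element, so the write sticks.
void setRed(Pixel (&leds)[NUM_LEDS]) {
    for (Pixel &pixel : leds) {
        pixel.r = 255;
    }
}
```

Calling `countAndTryToSet` returns the element count and leaves every pixel as it was; only `setRed` actually changes the strip.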
@Jarrod_Wagner right now the range loop is about 10% slower on ARM - some of this, I suspect, is because a lot of operations that on ARM take only a single cycle can take 2-8 cycles on AVR. As I did with AVR, I’ll spend some time digging through the compiler output to see what I can tweak.
Ok - tracked down. It turned out the C loop on ARM was 10% faster than the ranged loop because the compiler was taking advantage of the loop conditional being a constant and was putting an extra bit of optimization in there. If I changed my test code to pass in a CRGBSet for the range loop test and a CRGB*/int for the C loop test, then the tables are turned, and the C loop is now 10% slower. For those interested in what the compiler is doing under the hood - here’s a note dump from digging into it - https://gist.github.com/focalintent/6a936de0502c98bbba6c
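A rough sketch of the two C loop shapes being compared in that last test, for anyone reproducing it. This is an assumption about the shape of the test, not the actual test code: `Pixel` stands in for CRGB, and the function names and strip length are hypothetical.

```cpp
#include <stdint.h>

// Hypothetical stand-in for CRGB.
struct Pixel {
    uint8_t r, g, b;
};

#define NUM_LEDS 60  // assumed compile-time strip length

Pixel leds[NUM_LEDS];

// Loop bound is a compile-time constant, so the compiler knows the trip
// count and is free to specialize the loop around it.
void clearConstantBound() {
    for (int i = 0; i < NUM_LEDS; i++) {
        leds[i].r = 0;
        leds[i].g = 0;
        leds[i].b = 0;
    }
}

// Loop bound arrives as an argument (the CRGB*/int style of call), so the
// trip count is only known at run time unless the call gets inlined.
void clearRuntimeBound(Pixel *pixels, int count) {
    for (int i = 0; i < count; i++) {
        pixels[i].r = 0;
        pixels[i].g = 0;
        pixels[i].b = 0;
    }
}
```

Both do the same work; the point is only that a benchmark comparing loop styles should give each loop the same kind of bound, otherwise it measures the constant-bound optimization instead.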