Re fast scaling (following up on another thread)

For full on (scale = 255) the library is really scaling by 255/256. Thus input values map as follows:
0->0
1->0
2->1
3->2
...
254->253
255->254
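To make the table above concrete, here is a minimal sketch of the plain multiply-and-take-the-high-byte scaling being described. The function name `scale_plain` is made up for illustration and is not taken from the library.

```c
#include <stdint.h>

/* Plain fast scaling: multiply, then keep only the high byte of the
   16-bit product (an effective right shift by 8).
   scale_plain is a hypothetical name used only for this illustration. */
uint8_t scale_plain(uint8_t invalue, uint8_t scale)
{
    uint16_t product = (uint16_t)invalue * scale;
    return (uint8_t)(product >> 8);
}
```

With scale = 255 this reproduces the mapping listed above: every nonzero input comes out one lower than it went in.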

One option (if there is enough time) would be to increment the output value if non-zero.

Better (if there is code space/time for it) would be to add the channel value to the result of the multiply (add it to the low byte and propagate the carry to the high byte) before the effective right shift of 8 (which is done by using the high byte of the result). This not only makes a scale of 255 mean “full”, it also handles the intermediate values more accurately.

outvalue = (invalue * scale + invalue) >> 8;
(This simplifies in assembly.)
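As a quick check of that formula, here is a small C sketch. The name `scale_add_input` is hypothetical; the arithmetic is exactly the expression above, widened to 16 bits so the intermediate sum cannot overflow (the largest value is 255 * 255 + 255 = 65280).

```c
#include <stdint.h>

/* Scaling with the input added back in before the shift, so the
   effective factor is (scale + 1) / 256 instead of scale / 256.
   scale_add_input is a hypothetical name used only for illustration. */
uint8_t scale_add_input(uint8_t invalue, uint8_t scale)
{
    uint16_t sum = (uint16_t)invalue * scale + invalue;
    return (uint8_t)(sum >> 8);
}
```

With scale = 255 the output equals the input for every value, and 255 scaled by 128 gives 128 instead of the 127 that the plain multiply produces. An input of 1 scaled by 64 still comes out as 0.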

You don’t want to increment the output value if non-zero because the original value may have been 0 itself :)

There are some chipsets and platforms where I quite literally do not have the spare clock cycle for the extra addition.

Most likely, the way to deal with this will be to have separate code paths for when the scale is 255 that don’t go through the scaling operation at all. However, that requires re-adjusting the timings for the clockless chipsets, so it’s probably something that’s going to get pushed off until the summer. (I’m fairly under the gun, time-wise, for the next few weeks, and I am going to end up implementing new controllers and/or porting the library to a new platform for this project, which cuts into my time even more until July.)

There’s another thing that comes up at the low end of the scale process, too: consider the case where the input value is very low, e.g., “1”. If this were sent to the strip directly, there would be SOME light. If you just multiply by ANY scale value, it now becomes zero: total black, no light. And that looks VERY different from “dim light”. And even if you use the adjustment proposed above (which works well at the high end of the scale), when scale is, say, 64, the result still rounds down to zero, since (1 * 64 + 1) >> 8 is 0.

To deal with this, the library provides two main scaling functions: scale8(inputvalue, scale) and scale8_video(inputvalue, scale). The scale8_video function makes sure that a value with SOME light never rounds down to a value with NO light, and that when scale=255, the output is the same as the input. It works well for this, but it takes a few more clock cycles and I don’t think we can incorporate that one into the ‘global dimming for free’ system. Nevertheless, it’s great to use in your own code; it’s what I use all the time to scale brightness values without worrying about accidentally rounding down to total black.
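For anyone who wants to see the two guarantees described here in code, below is a rough C sketch of one way to get them. It is an assumption-laden illustration, not a copy of the library's scale8_video; the name `scale_video_sketch` is hypothetical.

```c
#include <stdint.h>

/* One possible way to get "video" style scaling:
   - do the plain multiply and keep the high byte,
   - then add 1 whenever both the input and the scale are nonzero.
   This is a sketch of the behaviour described in the post, not
   necessarily how the library implements scale8_video. */
uint8_t scale_video_sketch(uint8_t invalue, uint8_t scale)
{
    uint8_t scaled = (uint8_t)(((uint16_t)invalue * scale) >> 8);
    if (invalue != 0 && scale != 0) {
        scaled += 1;  /* cannot overflow: the plain result is at most 254 */
    }
    return scaled;
}
```

Under this sketch an input of 1 scaled by 64 stays at 1 rather than dropping to 0, an input of 0 stays 0, and scale = 255 returns the input unchanged.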

…and just for the record, Dan and I are already talking together about a few different options for what we could do in the library for POV users who want scale=255 to basically mean “don’t scale at all.” We still have to document and make examples for the big v2.0 release, but I suspect this will be on the to-do list for after that.

OK, given your concern about the low end, the proposed calculation (in a world where there were enough cycles) could be:

outvalue = (invalue * scale + 255) >> 8;

Yep. Basically: round up, not down.
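Here is the round-up variant as a small C sketch, just to check the edge cases. The name `scale_round_up` is hypothetical; the expression is the one proposed above, with a 16-bit intermediate (the largest value is 255 * 255 + 255 = 65280, which fits).

```c
#include <stdint.h>

/* Round-up scaling: add 255 before the shift, so any nonzero product
   yields at least 1, while a zero product still yields 0.
   scale_round_up is a hypothetical name used only for illustration. */
uint8_t scale_round_up(uint8_t invalue, uint8_t scale)
{
    uint16_t sum = (uint16_t)invalue * scale + 255;
    return (uint8_t)(sum >> 8);
}
```

An input of 1 scaled by 64 now gives 1, zero input or zero scale still gives 0, and scale = 255 leaves every input unchanged.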

A thought: for the 1829 chipset, you could add 254 instead. Then the maximum scaled value would be 254, but every nonzero input except value 1 scaled by 1 would still show up as visible. If this is just a matter of loading a register with the appropriate “roundup offset” based on chipset before going into the send loop, it would not slow anything down. Of course, finding the cycles to do that addition at all is another matter.
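A sketch of that idea in C, with the round-up offset passed in as a parameter standing in for the register that would be loaded once before the send loop. The function name `scale_with_offset` and the parameter name `roundup_offset` are hypothetical.

```c
#include <stdint.h>

/* Round-up scaling with a configurable offset. The caller would pick
   the offset once per chipset (255 normally, 254 for the case
   discussed above) before entering the send loop.
   Names here are hypothetical and used only for illustration. */
uint8_t scale_with_offset(uint8_t invalue, uint8_t scale,
                          uint8_t roundup_offset)
{
    uint16_t sum = (uint16_t)invalue * scale + roundup_offset;
    return (uint8_t)(sum >> 8);
}
```

With an offset of 254 the maximum output is 254, since (255 * 255 + 254) >> 8 is 254, and the only nonzero combination that drops to 0 is 1 scaled by 1, since (1 * 1 + 254) >> 8 is 0.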

See a comment elsewhere - I’m beginning to run out of registers with some of the controllers :)

I’m beginning to think there’s a fairly strong case to be made for either the fastest in-loop scaling that we can do (with minimal other hits on the system) or no in-loop scaling, and letting people do their own scaling algorithms outside the loop.

There’s also code complexity/maintainability to think about in here. As it is, I have 8+ chipsets across at least 3 separate MCU architectures, some of which have 3-4 different control structure possibilities (hardware SPI, USART SPI, bit-bang on same port, bit-bang on separate port). Juggling 2-4 different types of scaling algorithms to meet different requirements people have is just starting to get nightmarish from a testing standpoint :)

Ah, let’s call these “desiderata” rather than “requirements”. Do what you can, it’s already a great library. I do see how it gets complicated.

That said, I am all for trying to find places where we can support these things outside of the show loop. Basically, in the show loop, depending on chipset, we sometimes have a couple of extra clock cycles that we could do things in. If there’s something that I can squeeze in there that provides a general benefit to people, I’m a fan of that, though we now have two items squeezed into those clock cycles: the RGB re-ordering and the global brightness (which I’m mostly OK with having be linear vs. gamma corrected, mostly because it’s the rough equivalent of a dimmer switch).

However, for things like providing higher-level gamma adjustments in, say, the hsv2rgb conversion, or RGB-specific adjustments, etc., I’m a fan of seeing what we can throw in there, since far fewer clock cycles are being counted.

At least until I start trying to interleave the hsv2rgb conversion as part of the show loop :)