Re fast scaling (following up on another thread)
For full on (scale = 255) the library is really scaling by 255/256. Thus input values
One option (if there is enough time) would be to increment the output value whenever it is non-zero.
Better (if there is code space/time for it) would be to add the channel value to the result of the multiply (add it to the low byte and propagate the carry into the high byte) before the effective right shift by 8 (that is, before taking the high byte of the result). This not only makes a scale of 255 mean “full”, it also handles the intermediate values more accurately.
outvalue = (invalue * scale + invalue) >> 8;
(This simplifies in assembly.)
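
For comparison, here is a rough C sketch of the three variants discussed above. The function names (scale8_plain, scale8_bump, scale8_fixed) are made up for this post and are not the library's API; I am also assuming 8-bit channel and scale values with a 16-bit intermediate.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only (not the library's API). */

/* Current behaviour: a scale of 255 really means 255/256. */
static uint8_t scale8_plain(uint8_t invalue, uint8_t scale)
{
    return (uint8_t)(((uint16_t)invalue * scale) >> 8);
}

/* First option: bump the result by one whenever it is non-zero. */
static uint8_t scale8_bump(uint8_t invalue, uint8_t scale)
{
    uint8_t out = (uint8_t)(((uint16_t)invalue * scale) >> 8);
    return out ? (uint8_t)(out + 1) : out;
}

/* Second option: add the channel value to the 16-bit product before
 * taking the high byte. The largest intermediate is
 * 255 * 255 + 255 = 65280, so it still fits in 16 bits. */
static uint8_t scale8_fixed(uint8_t invalue, uint8_t scale)
{
    return (uint8_t)(((uint16_t)invalue * scale + invalue) >> 8);
}
```

With these, scale8_plain(255, 255) gives 254, while scale8_fixed(255, 255) gives 255, and scale8_fixed(x, 255) returns x for every x because x * 255 + x is exactly x * 256. The second option is arithmetically the same as multiplying by (scale + 1), which is why it treats the intermediate scale values more evenly than the bump-by-one approach.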