MeerK40t 0.6.0 released!

It took a couple of months and about 250 commits, but it’s a really nice bit of software.

  • The svgelements update fixes the use of defs, use, and style tags, and of CSS-classed objects. This should correct some loading errors and the loading of Illustrator’s CSS-classed SVG data.
  • The Ruida emulator does some cursory work and should function like a Ruida device.
  • Kernel/Device was changed significantly. This change makes devices more independent of the kernel and generally more customizable for each backend.
  • The Device Manager now launches a separate main window for each device, and the Device Manager itself launches at startup if nothing else is set to launch.
  • RDWorks RD files can now be opened natively.
  • Cut optimization: inner cuts first.
  • Added the console and a large number of commands. This allows expert manipulation, testing, and diagnostics from text-based commands.
  • Addition of a CLI batch command to execute a file of console commands before starting.
  • Linked the keymap and the console, so keymap commands run console commands and can be made to run anything the console has control over, which is most things.
  • Alias/bind functionality was added, so keymapped keys and command key sequences can be aliased and bound. This includes keydown binds with automatic keyup aliases via the + and - prefixes, as well as methods to loop and to end particular commands so they run while a key is held down.
  • A number of additional keybinds were added, including most vector manipulations through the numpad keys.
  • Default operation values now save. They are used for reclassification and persist between sessions.
  • Added a proper widget system, so drawn objects are self-contained, each handles its own mouse input, and each can do so in its own way.
  • Added direct manipulations of objects on screen with the mouse.
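The +/- prefixes in the alias/bind bullet appear to follow the Quake-style console convention: a key bound to `+name` runs `+name` on keydown and automatically runs `-name` on keyup. A minimal sketch of that dispatch, assuming nothing about MeerK40t’s internals; `KeyBinder` and command strings such as `loop nudge_right` are hypothetical, not MeerK40t’s actual classes or console commands.

```python
from typing import Callable, Dict, List


class KeyBinder:
    """Hypothetical binder illustrating +/- keydown/keyup alias binds."""

    def __init__(self, run: Callable[[str], None]):
        self.run = run  # stand-in for sending one line to the console
        self.binds: Dict[str, str] = {}
        self.aliases: Dict[str, List[str]] = {}

    def bind(self, key: str, command: str) -> None:
        self.binds[key] = command

    def alias(self, name: str, *commands: str) -> None:
        self.aliases[name] = list(commands)

    def _execute(self, command: str) -> None:
        # Expand an alias into its command list; otherwise run the command as-is.
        for cmd in self.aliases.get(command, [command]):
            self.run(cmd)

    def key_down(self, key: str) -> None:
        command = self.binds.get(key)
        if command is not None:
            self._execute(command)

    def key_up(self, key: str) -> None:
        command = self.binds.get(key)
        # Only "+" binds get an automatic keyup counterpart: "+name" -> "-name".
        if command is not None and command.startswith("+"):
            self._execute("-" + command[1:])


log: List[str] = []
kb = KeyBinder(log.append)
kb.alias("+nudge", "loop nudge_right")  # start repeating while held (hypothetical)
kb.alias("-nudge", "end nudge_right")   # stop repeating on release (hypothetical)
kb.bind("right", "+nudge")
kb.bind("delete", "element delete")     # plain bind: no keyup action
kb.key_down("right")
kb.key_up("right")
kb.key_down("delete")
kb.key_up("delete")
print(log)  # ['loop nudge_right', 'end nudge_right', 'element delete']
```

Deriving the keyup action from the bind string means a user only ever binds one name per key; the matching `-` alias carries the "stop" behavior.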

When I read the Ruida emulator, I wondered why you didn’t populate a LUT once and then use it, instead of calculating the swizzle for every byte. There are only 256 possible byte values, so a LUT is cheap. :wink:

It’s not merely usually faster; I think it can also lead to more readable code. (I remember you caring about performance when I asked about another design decision earlier. :slight_smile:)

Two reasons. First, the magic number changes. There are at least two different magic numbers in use, and while I detect these and switch on the fly, it means I’d need at least two LUTs, maybe more if there are others. Second, I used an XOR swap and got the algorithm down to a few bit operations:

    def swizzle_byte(b, magic):
        # Swap bits 0 and 7 with a three-step XOR swap.
        b ^= (b >> 7) & 0xFF
        b ^= (b << 7) & 0xFF
        b ^= (b >> 7) & 0xFF
        b ^= magic
        b = (b + 1) & 0xFF
        return b

That’s literally not something that takes any real time. It’s a handful of basically free, one-clock-cycle operations. I don’t think a LUT could beat that, in terms of either speed or readability. Some of the earlier versions, without an XOR swap for bits 0 and 7, looked fairly complex and set another variable, but here it’s just a few XORs, and those are basically processor free.
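To see why the three XOR lines swap bits 0 and 7, here is the XOR swap checked against a straightforward version that pulls each bit into its own variable. The explicit version is written for illustration only; it is not the author’s earlier code.

```python
def swap_bits_xor(b: int) -> int:
    """Swap bits 0 and 7 of a byte with the XOR-swap trick."""
    b ^= (b >> 7) & 0xFF  # bit0 ^= bit7 (b >> 7 leaves only bit 7, now at position 0)
    b ^= (b << 7) & 0xFF  # bit7 ^= bit0 (the mask discards everything but bit 0)
    b ^= (b >> 7) & 0xFF  # bit0 ^= bit7
    return b


def swap_bits_explicit(b: int) -> int:
    """Swap bits 0 and 7 by extracting them into separate variables."""
    low = b & 0x01
    high = (b >> 7) & 0x01
    middle = b & 0x7E  # bits 1..6 stay where they are
    return middle | (low << 7) | high


# The two versions agree on every possible byte value.
assert all(swap_bits_xor(b) == swap_bits_explicit(b) for b in range(256))
print(hex(swap_bits_xor(0x80)), hex(swap_bits_xor(0x01)))  # 0x1 0x80
```

Because each step is a single XOR of one bit into another, the sequence is its own inverse, which is why the unswizzle in the test code further down can reuse the same three lines.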

You can calculate the LUT for a magic when you first need it, and even bidirectional tables for three magic numbers come to 256 × 6 = 1,536 bytes (1.5K) of total memory for six LUTs, plus a few bytes of overhead. Byte LUTs were very common decades ago, even when memory was expensive. I remember sending byte LUTs to firmware in drivers…
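A sketch of lazily built, bidirectional tables, assuming the swizzle steps shown earlier in this thread and the magic 0x88 from the test code below. `tables` is a hypothetical helper; `functools.lru_cache` handles the "build on first use" part, and storing each table as a `bytes` object keeps it at exactly 256 bytes of payload, matching the 256 × 6 estimate for three magics.

```python
from functools import lru_cache
from typing import Tuple


def swizzle_byte(b: int, magic: int) -> int:
    """Encode one byte: swap bits 0 and 7, XOR with magic, add 1 (mod 256)."""
    b ^= (b >> 7) & 0xFF
    b ^= (b << 7) & 0xFF
    b ^= (b >> 7) & 0xFF
    b ^= magic
    return (b + 1) & 0xFF


def unswizzle_byte(b: int, magic: int) -> int:
    """Decode one byte: the encode steps inverted, in reverse order."""
    b = (b - 1) & 0xFF
    b ^= magic
    b ^= (b >> 7) & 0xFF
    b ^= (b << 7) & 0xFF
    b ^= (b >> 7) & 0xFF
    return b


@lru_cache(maxsize=None)
def tables(magic: int) -> Tuple[bytes, bytes]:
    """Build (encode, decode) 256-byte tables the first time a magic is seen."""
    encode = bytes(swizzle_byte(b, magic) for b in range(256))
    decode = bytes(unswizzle_byte(b, magic) for b in range(256))
    return encode, decode


encode, decode = tables(0x88)
assert all(decode[encode[b]] == b for b in range(256))
print(len(encode) + len(decode))  # 512 bytes of table data for one magic
```

A second call with the same magic returns the cached pair, so a detected magic change costs one table build and nothing afterwards.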

Python is byte compiled, but the standard CPython implementation doesn’t JIT, so unless you are using PyPy, each of those operations isn’t one clock cycle; it’s at least one trip through the bytecode interpreter loop. But again, I think the most important thing is readability :slight_smile: I think it’s clearer to write b = swizzle[b] than to do the math inline.
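One way to see the "bytecode interpreter cycle" point is to count the instructions CPython compiles for each version with the standard library’s `dis.get_instructions`. The exact counts differ between CPython versions, so none are quoted here; the magic 0x88 is taken from the test code in this thread.

```python
import dis

MAGIC = 0x88  # magic value used in the test code in this thread


def unswizzle_math(b):
    b = (b - 1) & 0xFF
    b ^= MAGIC
    b ^= (b >> 7) & 0xFF
    b ^= (b << 7) & 0xFF
    b ^= (b >> 7) & 0xFF
    return b


LUT = [unswizzle_math(b) for b in range(256)]


def unswizzle_lut(b):
    return LUT[b]


def count_instructions(func):
    """Number of bytecode instructions CPython compiled for func."""
    return len(list(dis.get_instructions(func)))


print(count_instructions(unswizzle_math), count_instructions(unswizzle_lut))
dis.dis(unswizzle_lut)  # shows the handful of instructions behind a table lookup
```

Each of those instructions is dispatched by the interpreter loop, which is where the per-byte cost in the pure-Python version comes from.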

I’m not remotely convinced that even an on-the-fly LUT would be worth it. These are trivial bit manipulations. I wouldn’t consider a LUT for adding one to a number, and I don’t consider this set of operations to be much more complex than that. It might not JIT, and maybe you’re talking tens of clock cycles, but there isn’t even a sqrt or anything else I’d consider hefty: 4 XORs, 4 ANDs, 3 bit shifts, and an addition. I’m not ready to accept that Python makes that so slow as to warrant creating a data structure to accelerate it. I can’t fathom it saving 10 milliseconds over an entire file.

I guess I’ll just time it. It would be basically a single line to make the lut: swiz = [unswizzle_byte(s) for s in range(256)]

Over 2.55 million bytes, we get:
without lut: 2.7991600036621094 s
with lut: 0.3950226306915283 s

That’s about 7x the speed. Without the LUT, we’re spending about 1.1e-6 seconds (1.1 µs) per byte, so a reasonably sized 50 KB RD file takes about 0.055 seconds (55 ms) to convert. With the LUT, that drops to roughly 8 ms.
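Here is the arithmetic from the two timings, spelled out. The byte count comes from the benchmark loops (10,000 × 255 calls per version), and the 50 KB file size is the example used in the post.

```python
bytes_processed = 10_000 * 255    # calls per version in the benchmark loops
without_lut = 2.7991600036621094  # seconds, measured above
with_lut = 0.3950226306915283     # seconds, measured above
file_size = 50_000                # a "reasonably sized" RD file, in bytes

per_byte_without = without_lut / bytes_processed
per_byte_with = with_lut / bytes_processed

print(f"speedup: {without_lut / with_lut:.1f}x")
print(f"per byte: {per_byte_without:.2e} s vs {per_byte_with:.2e} s")
print(f"50 KB file: {per_byte_without * file_size * 1000:.0f} ms vs "
      f"{per_byte_with * file_size * 1000:.1f} ms")
```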

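import time
import unittest


class TestRuidaSpeed(unittest.TestCase):
    def test_ruidaspeed(self):
        magic = 0x88

        def unswizzle_byte(b):
            # Undo the swizzle in reverse order: subtract 1, XOR magic, swap bits 0 and 7.
            b = (b - 1) & 0xFF
            b ^= magic
            b ^= (b >> 7) & 0xFF
            b ^= (b << 7) & 0xFF
            b ^= (b >> 7) & 0xFF
            return b

        # Build the 256-entry lookup table once.
        swiz = [unswizzle_byte(s) for s in range(256)]

        def unswizzle_byte2(b):
            return swiz[b]

        # 10000 * 255 = 2.55 million calls for each version.
        t = time.perf_counter()
        for i in range(10000):
            for q in range(255):
                unswizzle_byte(q)
        print("without lut:", time.perf_counter() - t)

        t = time.perf_counter()
        for i in range(10000):
            for q in range(255):
                unswizzle_byte2(q)
        print("with lut:", time.perf_counter() - t)

        # The LUT must agree with the arithmetic for every byte value, 0..255.
        for q in range(256):
            self.assertEqual(unswizzle_byte2(q), unswizzle_byte(q))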

I didn’t mean to argue about what color to paint this bike shed; it was primarily readability, not performance, that triggered the question. I just reflexively think about performance in tight loops too, and it was stupid of me to then lead my question with performance. :frowning: I didn’t mean to go haring off after clock cycles. Sorry about that.


Meh, I did the timing after writing it up, so I changed it. Sure, it’s 7x faster for something that takes well under a tenth of a second on a typical file. But writing it is most of the work, and I already did that, so I might as well bank the improvement. Maybe if there’s a raster taking several megs, we’d start talking about real time.
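For the several-megabyte raster case, one further option is to store the table as a 256-byte `bytes` object and pass the whole buffer to `bytes.translate`, so the per-byte loop runs in C rather than in the interpreter. This is a different technique from the per-byte LUT call in the test above, not something the emulator is known to do. A sketch, assuming the magic 0x88 from the test; `unswizzle_buffer` and `unswizzle_buffer_loop` are hypothetical helpers, and random bytes stand in for real raster data.

```python
import os
import timeit


def unswizzle_byte(b: int, magic: int = 0x88) -> int:
    """Arithmetic decode of one byte, as in the test code above."""
    b = (b - 1) & 0xFF
    b ^= magic
    b ^= (b >> 7) & 0xFF
    b ^= (b << 7) & 0xFF
    b ^= (b >> 7) & 0xFF
    return b


# 256-byte translation table: TABLE[i] is the decoded value of byte i.
TABLE = bytes(unswizzle_byte(i) for i in range(256))


def unswizzle_buffer(data: bytes) -> bytes:
    """Decode an entire buffer in one C-level pass via bytes.translate."""
    return data.translate(TABLE)


def unswizzle_buffer_loop(data: bytes) -> bytes:
    """Same result, one Python-level table lookup per byte."""
    return bytes(TABLE[b] for b in data)


payload = os.urandom(1_000_000)  # 1 MB of random bytes as a stand-in raster
assert unswizzle_buffer(payload) == unswizzle_buffer_loop(payload)
print("translate:", timeit.timeit(lambda: unswizzle_buffer(payload), number=5))
print("loop:     ", timeit.timeit(lambda: unswizzle_buffer_loop(payload), number=5))
```

Timings depend on the machine, so none are claimed here; the assertion only confirms that both paths produce identical output.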