Optimization ideas for Draco calculation engine

Since HOnza posted about his Initiative ’24 about making FileMaker calculations quicker, I spend some time thinking on how to do this practically. Like what could be optimized if Claris puts work into their calculation engine. And I got some ideas:

Math seems to be a problem sometimes. Have a flag internal for number values on whether they are 32-bit integers. If math functions are invoked and the flag is set for both values, perform integer math directly with normal C math instructions and skip the precision math library. E.g. the normal "j = j + 1 " in a loop doesn't need to do precision math.
Cache parsed calculations. Evaluate() taking a text parsed calculation opcodes. The same calculations are frequently used, so if possible cache the parsed version of the calculation for some time. e.g. the calculation engine could keep a hash map with the last 10 parsed calculations to avoid parsing it again if the same calculation is needed again soon. e.g. for calculations running very frequently in a loop.
Apply caching for constants. It may be worth to cache calculations and their result for calculations, that don't need externals. e.g. if the calculation is just a number or text and avoid running it through the full calculation engine each time.
Have a flag for text values on whether they carry StyleRuns. Have text functions optimized to run a faster code path if no styles are involved. e.g. a function like text concatting may not need need to bother about merging style runs if both strings have no styles.
For small text values, consider a small text optimization. e.g. like std::string class have space for a few characters in the text value data structure, so you don't need to allocate bytes for the data. If I remember correctly std::string uses the high bit in length value to indicate that characters are stored in the data pointer directly instead of it referencing additional memory for the text. 8 bytes for a pointer then fit up to 7 characters in std::string.
Use reference counting as much as possible for strings. Avoid copying text values and just reference them. Using e..g Middle($s; 200; 300) on a long text should reference the new text and reference the original memory for the character to avoid copying it. The new text value then has a different data pointer and size value to just point to the requested text portion.
Consider walking through older Draco code and find cases where Draco was optimized to work on Macs with 128 K of memory. Like don't try to process data in small chunks, have it enjoy the current machines with GBs of memory and optimize functions to use bigger block sizes or work on the whole string.
Have values be marked with a flag for whether they a known to contain only ASCII text, so text functions could have branches which handle full unicode text different than pure ASCII text, since that usually can be handled much quicker.
A ton of API calls need text as UTF-8 or UTF-16 or UTF-32 depending on what API is used. Your text values could cache such representations. If the same text encoding is needed later, you don't need the conversion again. If the text changes, you would clear it. And if the same text encoding is not needed again, you just delayed the freeing operation.
Cache parsed JSON/XML/Calculation values. If you are at caching in the text values, you could do it for parsed texts as XML, JSON or Calculations.

Well, if I would design a class in C++ to hold a text value (n UTF-16), it may be something like this:

class TextValue {

	// Flags: IsSmallText, IsASCIIOnly, HasStyles
	// one of: hasJSON, hasXML, hasParsedCalculation
	// one of: hasTextUTF8, hasTextUTF32
	uint32 flags;
	uint32 length;
	StyleRuns *styleRuns;
	
	union
	{
		uint16 *bigTextdata;
		uint16 smallTextData[4];
	};
	
	union
	{
		JSONFragment *json;
		XMLDocument *xml;
		ParsedCalculation *calculation;
	};

	union
	{
		uint8* textUTF8;
		uint32* textUTF32;
	};
}

Since we use union to store values exclusively controlled by flags, we can do this all in about 48 bytes. For most texts almost all fields are NULL, but if you'd ask the text value twice for something, the second call can return a cached value. Just make sure all code calls the relevant functions instead of accessing the text directly. I personally use optimizations like this in my plugin code.

For all changes, please measure. As an outsider we have no idea whether the overhead for caching needs more CPU time than it saves. But the goal could be to write fast replacements for a couple of functions which produces the same result and behavior. It would be an interesting job to optimize a calculation engine like FileMaker's and find the spots where an optimization brings a real result to the users.

Or maybe someone at Claris is already working on a translation of calculations to JavaScript to leverage a quick JavaScript engine? But if that approach would be considered, better skip JavaScript and write a LLVM frontend for FileMaker calculations and let LLVM do the compile part and run the machine code it outputs.