Furball file size optimizations

SyntacticKitsune
Posts: 22
Joined: 05 Jan 2024, 02:44

Furball file size optimizations

Post by SyntacticKitsune »

So the first thing I want to talk about is the previously mentioned furball GZIP compression. I did manage to get a C# implementation together, which can be found here. This implementation is designed to be completely transparent, both to code using this class and to people using the editor. I didn't see any need to make the compression configurable, since I don't think there's likely to be any case where compressing makes things worse. The only remaining concern would be CPU time, and I'm probably not the person to measure that (I can only run VS using a VM). I'm just not entirely sure what there would be to gain from such a configuration option. Regardless, writing the implementation was surprisingly simple, even if I had to spend some time debugging shenanigans like my Java implementation not writing the GZIP trailer or the written format version being misleading. (What's funny is I forgot to bump the max format version the first time I tested it, so Finmer thought it was itself out of date. Another fun moment was when I found out the format version of the furball being written doesn't necessarily equal the latest version, so I was accidentally writing format 21 furballs without compression. Fun times.)
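To illustrate the trailer issue mentioned above, here is a minimal Java sketch. It is not the linked C# implementation or Furblorb's code; the class and method names are made up for illustration. It shows that closing (or finishing) a `GZIPOutputStream` is what writes the trailer, whose last four bytes hold the uncompressed length.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, for illustration only.
final class GzipSketch {
    /** Compresses a payload. Closing the stream calls finish(), which emits the trailer. */
    static byte[] compress(byte[] payload) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decompresses a complete GZIP stream back into the original bytes. */
    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the uncompressed length stored little-endian in the last four trailer bytes. */
    static int storedLength(byte[] compressed) {
        int n = compressed.length;
        return (compressed[n - 4] & 0xFF)
                | (compressed[n - 3] & 0xFF) << 8
                | (compressed[n - 2] & 0xFF) << 16
                | (compressed[n - 1] & 0xFF) << 24;
    }
}
```

If the stream is only flushed and never closed or finished, those trailer bytes are missing and the output comes out slightly shorter, which would be consistent with the changed figure below.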

Anyway, so for Core it's a decrease of 59.4% (down to 420 KiB -- small enough to upload to the forum!), which is different from my previous figure because apparently I wasn't writing the GZIP trailer. Regardless, I still think that's a pretty good decrease (and certainly blows any other optimizations I'm about to suggest out of the water). I've also tested some other furballs I have on hand: "DeepForest" decreases by 33980 bytes (69.1%), "Commission02.furball" decreases by 18726 bytes (60.6%), "Fetch module.furball" decreases by 9360 bytes (58.6%), and "Ollie.furball" decreases by 14362 bytes (57.9%). The average decrease for all of these is 61.1%, so that's probably around the figure that can be expected for most furballs. (This is only a sample size of 5, but there are simply not many furballs to test with, at least as far as I'm aware.) For fun, I also tested an empty furball (no assets, no dependencies), and compression did add 15 bytes to the output -- I think this is a bit of an extreme case though.

The next thing I wanted to suggest (or at least bring up): those funny 7-bit ints. They're already in use by strings (see BinaryReader), but they could also be used for other numbers like list lengths, byte array lengths, or enum constants. I've similarly tested this in Java and the decrease I got (for Core) was 1.2% (12404 byte decrease), which isn't terribly exciting (and combining that with compression makes the decrease almost imperceptible at only a 1108 byte difference). This doesn't take into account using 7-bit ints for e.g. creature stats though, but I don't imagine those would change the figure much. (I suppose I should also mention that 7-bit ints are bad for negative numbers, since each one ends up taking a full five bytes, or so I've heard.)
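For readers unfamiliar with the encoding, a rough Java sketch follows. It mirrors the general 7-bit scheme (seven payload bits per byte, high bit set when more bytes follow); the class name is hypothetical and this is not Furblorb's or Finmer's actual code.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper class, for illustration only.
final class SevenBitInt {
    /** Encodes a 32-bit value, least significant group first, 7 bits per byte. */
    static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        int v = value;
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80); // high bit set: another byte follows
            v >>>= 7;                     // unsigned shift, so negatives terminate
        }
        out.write(v);                     // final byte has the high bit clear
        return out.toByteArray();
    }

    /** Decodes a value produced by encode(). */
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            if (shift > 28) {
                throw new IllegalArgumentException("7-bit int longer than five bytes");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
        throw new IllegalArgumentException("truncated 7-bit int");
    }
}
```

Small values such as typical list lengths fit in one byte here, while any negative value has its top bit set and so needs all five bytes, which matches the caveat above.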

Another thing -- although this is arguably more error avoidance than any kind of file size optimization -- is about how lists are currently written using the optional serializable code path, meaning you could have lists with null objects. Requiring them to always be present and omitting the booleans results in a truly amazing decrease of 0.1% (1193 bytes), which is why I'm not suggesting this one as an optimization. Of course, a null object inside a list would probably only appear inside one of those questionable hand-crafted furballs, but it might still be worth doing this.
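A small Java sketch of the difference, under stated assumptions: the real furball list layout is not shown in this thread, so the 4-byte count, the one-byte presence flag, and the string elements here are all hypothetical stand-ins. It only shows where the one-byte-per-element saving and the null rejection come from.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Objects;

// Hypothetical helper class; the layout is an assumption, not the furball format.
final class ListEncodingSketch {
    /** "Optional" path: every element is preceded by a presence flag, so nulls are representable. */
    static byte[] writeWithPresenceFlags(List<String> items) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeInt(items.size());
            for (String item : items) {
                out.writeBoolean(item != null);
                if (item != null) {
                    out.writeUTF(item);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** "Required" path: no flags, and a null entry is rejected at write time. */
    static byte[] writeRequired(List<String> items) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            out.writeInt(items.size());
            for (String item : items) {
                out.writeUTF(Objects.requireNonNull(item, "list entries must not be null"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }
}
```

The saving is exactly one byte per element in this sketch, which fits with the small overall figure quoted above.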

Finally, something I just now thought of (rather than like, three months ago): has OptiPNG (or one of its derivatives) been run over the item icons? It could reduce their file size somewhat (at the cost of muddying the git history slightly), assuming Finmer can read indexed mode (i.e., non-RGB) PNGs. (In fact, I just tested this, and it seems to shrink the Core furball by 10 kB, which isn't amazing, but it's free. Still don't know if Finmer will read these images -- though I assume it can.)

I'll probably submit a PR for the first optimization (assuming the current implementation is acceptable), but should I also include any of the others?

Nuntis
Game Creator
Posts: 32
Joined: 11 Nov 2023, 13:27

Re: Furball file size optimizations

Post by Nuntis »

Okay, a lot to think about here! Thanks for taking the time to write this all out.

GZIP compression
Fair enough. You're correct that there's not really a reason to not do it, and provided that the loader remains backwards-compatible with versions 19 and 20, it should be fine.

Also, I can bump the forum upload size if needed; 1 MB seemed like a somewhat sane default to me (I do have limited disk space with my web host, though we're very far from using it up), but I can tweak that to allow for larger modules.

7-bit integers
I'm familiar with them; I've done some Minecraft network protocol stuff in the past, and they use 7-bit integers everywhere. You're right that there's probably a couple places that could benefit from them. Annoyingly, BinaryReader.Read7BitEncodedInt (and the writer counterpart) are protected, so Finmer would have to reimplement them. Not a showstopper, though.

Closely related - there are also, I believe, a few int-sized enums that could be downsized to bytes. Negligible gains, probably, though.

Non-optional list elements
Ah, that'd be referring largely to the visual script node lists, I think? I don't really think this would be worthwhile to pursue, given the tiny gains. But maybe if we're in the area anyway... hm. I'll think about it for deserialization error handling.

PNG images
I haven't looked into image optimization tools; however, I did note some time ago that there seem to be different ways to save the exact same PNG, which I did not know. For instance, GIMP saves one of the item icons in Core as a 4 kB file, where Paint.NET can re-save the exact same image in around 1.5 kB.

I suppose if this were a thing, it would be kinda nice if it were a built-in feature of the editor, where it automatically optimizes imported assets for you.

Pull request
Your implementation looks sensible; I may have some naming nitpicks but those can be worked out in the PR. I think that if there is going to be a movement to optimize the furball file format, it would be wise to merge such a PR to a branch (instead of master), so that more changes can be made before committing format version 21.

If you'd like to include multiple of these optimizations, separate commits or perhaps even separate PRs depending on scale would make sense, I think.

SyntacticKitsune
Posts: 22
Joined: 05 Jan 2024, 02:44

Re: Furball file size optimizations

Post by SyntacticKitsune »

Nuntis wrote: 22 May 2024, 20:34

I can bump the forum upload size if needed

I don't think there's a huge need to bump the upload size (at least right now), since the only thing it's preventing is like, redistributing modified versions of the core module (which won't be as much of a problem with compression). I don't imagine there'll be any >1 MB modules showing up for a while anyway -- all of the ones I've seen have been relatively small. But if at some point someone has such a module then I imagine it can be bumped then. (I mainly pointed it out as a fun side effect of the compression.)

Nuntis wrote: 22 May 2024, 20:34

provided that the loader remains backwards-compatible with versions 19 and 20

Related to the point about backwards-compatibility: I did test a version 20 module (I think my Deep Forest one), and I didn't see any loading errors. It could be worth at some point setting up some kind of automated testing for loading old modules (it can just make sure they parse) to quickly catch anything like this, should it ever happen.
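As a rough idea of what such a parse-only check could exercise, here is a Java sketch of a version-gated loader. The header layout (a bare 4-byte version followed by the body) is an assumption made for the example and is not the real furball header; only the idea that format 21 is the first compressed version comes from this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; the file layout is an assumption for illustration.
final class VersionedLoaderSketch {
    static final int FIRST_COMPRESSED_VERSION = 21;

    /** Writes a version header, then the body (GZIP-compressed from format 21 onwards). */
    static byte[] write(int version, byte[] body) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            DataOutputStream header = new DataOutputStream(sink);
            header.writeInt(version);
            header.flush();
            if (version >= FIRST_COMPRESSED_VERSION) {
                try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
                    gzip.write(body);
                }
            } else {
                sink.write(body);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reads the version, then picks a plain or decompressing stream for the rest. */
    static byte[] readBody(byte[] file) {
        try {
            ByteArrayInputStream source = new ByteArrayInputStream(file);
            int version = new DataInputStream(source).readInt();
            InputStream body = version >= FIRST_COMPRESSED_VERSION
                    ? new GZIPInputStream(source)
                    : source;
            return body.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

An automated check could then loop over a folder of archived version 19, 20, and 21 modules and simply assert that each one loads without throwing.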

Nuntis wrote: 22 May 2024, 20:34

they use 7-bit integers everywhere

Yeah, apparently there's (as of 1.20.1) a whole 138 instances of 7-bit ints there (and oh boy apparently there's var longs now too).

Nuntis wrote: 22 May 2024, 20:34

Annoyingly, BinaryReader.Read7BitEncodedInt (and the writer counterpart) are protected

I didn't notice that those methods are protected (although I don't think I've looked at them since writing Furblorb's versions). Assuming protected works like I'm used to (in Java), Finmer could probably just extend the two classes and widen the visibility of those methods.

Nuntis wrote: 22 May 2024, 20:34

there are also, I believe, a few int-sized enums that could be downsized to bytes.

I think I pointed out the int-sized enums back in the Deep Forest thread, although that may have gone unnoticed. Let me go find all of them again... Looks like all visual scripting stuff:

  • CommandPlayerSetEquipment.ESlot
  • CommandPlayerSetHealth.EOperation
  • CommandPlayerSetMoney.EOperation
  • CommandPlayerSetStat.EOperation
  • CommandPlayerSetStat.EStat
  • CommandPlayerStat.EStat
  • CommandVarSetNumber.EOperation
  • ScriptConditionGroup.EConditionMode
  • ScriptConditionNumberComparison.EOperator

Nuntis wrote: 22 May 2024, 20:34

that'd be referring largely to the visual script node lists, I think?

Outside of visual scripting components, there's also AssetItem equip effects and EquipEffectGroup buffs (from what I can see in Furblorb), but it is mostly the visual scripting node lists.

Nuntis wrote: 22 May 2024, 20:34

For instance, GIMP saves one of the item icons in Core as a 4 kB file, where Paint.NET can re-save the exact same image in around 1.5 kB.

The size difference there is possibly the difference between RGBA (4 bytes per pixel) and indexed (max 256 colors, but uses 1 byte per pixel). OptiPNG and friends I believe also try to optimize the compression (since PNGs are compressed -- who would have thought?), which usually results in slight further decreases (although it depends on the image in question). It can be a little slow (mainly on large images), since it's kind of a brute force operation.

Nuntis wrote: 22 May 2024, 20:34

I suppose if this were a thing, it would be kinda nice if it were a built-in feature of the editor, where it automatically optimizes imported assets for you.

I imagine for integrating with the editor first off there'd likely be a toggle somewhere (since this optimization process can be slow sometimes) and then I suppose the question would be whether to use OptiPNG itself (which is C) or I believe Oxipng (which is Rust) is the new one (although I've never used it). I guess it mainly boils down to whether the editor would try to interact with it natively (like with Lua) or just fork out a different process to run it. The former approach might be more difficult with Rust, but the latter'll probably work for anything.
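The "fork out a different process" approach could look roughly like the Java sketch below. It assumes an `optipng` executable is on the PATH and uses its `-o` optimization level option (0 to 7). The class and method names are hypothetical, and the Finmer editor itself is C#, so this only illustrates the shape of the approach.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class PngOptimizerSketch {
    /** Builds the command line; OptiPNG rewrites the file in place when it finds a smaller encoding. */
    static List<String> buildCommand(Path png, int level) {
        if (level < 0 || level > 7) {
            throw new IllegalArgumentException("OptiPNG levels range from 0 to 7");
        }
        return List.of("optipng", "-o" + level, png.toString());
    }

    /** Runs the external tool and returns its exit code (0 on success). */
    static int run(Path png, int level) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(png, level))
                .inheritIO() // forward the tool's output to our console
                .start();
        return process.waitFor();
    }
}
```

Because higher levels try more parameter combinations, a toggle (or a low default level) would keep imports responsive.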

Nuntis wrote: 22 May 2024, 20:34

Your implementation looks sensible; I may have some naming nitpicks but those can be worked out in the PR.

I'll open the PR soon. I think I put one of the methods in the wrong spot (according to the style guide) so I'll move that first.

Nuntis wrote: 22 May 2024, 20:34

it would be wise to merge such a PR to a branch (instead of master)

I was going to ask if you wanted to make a format-21 (or some other name) branch I could target the PR against, but I guess I never got around to actually doing that. Worst case scenario the base branch of the PR can be changed.

Nuntis wrote: 22 May 2024, 20:34

separate commits or perhaps even separate PRs depending on scale would make sense

I was planning on keeping the changes in at least separate commits (or PRs -- that'd be good for e.g. the lists), since otherwise it becomes a pain to review (at least for me).

Nuntis
Game Creator
Posts: 32
Joined: 11 Nov 2023, 13:27

Re: Furball file size optimizations

Post by Nuntis »

SyntacticKitsune wrote: 22 May 2024, 22:14

It could be worth at some point setting up some kind of automated testing for loading old modules (it can just make sure they parse) to quickly catch anything like this, should it ever happen.

Yep; hopefully the upcoming CLI I'm working on can help with that. However, I do not have a test framework set up for Finmer, nor am I actually sure if I want one outside a few quick integration tests in the CI pipeline. TDD is great and all, but it's often still a little vexing to me (again, C++ background talking: mock frameworks are an unfathomable luxury).

SyntacticKitsune wrote: 22 May 2024, 22:14

I didn't notice that those methods are protected (although I don't think I've looked at them since writing Furblorb's versions). Assuming protected works like I'm used to (in Java), Finmer could probably just extend the two classes and widen the visibility of those methods.

Unfortunately no - they're marked protected internal to be precise, so they are not visible outside the source assembly. I could use reflection to get at them, but that would be silly; I'd rather just reimplement them. The question then is how to expose that to the IFurballContentWriter interface; I don't want to give it a Write/Read7BitEncodedInt method because that exposes binary-mode implementation details - the JSON implementation has no need for such a feature. Perhaps I could just change all ints to be written compactly...

SyntacticKitsune wrote: 22 May 2024, 22:14

The size difference there is possibly the difference between RGBA (4 bytes per pixel) and indexed (max 256 colors, but uses 1 byte per pixel).

Hmm, maybe - I did notice that there were also differences in file layout. GIMP seemed to insert big header blocks and tons of null bytes; PDN cut off almost all the headers and seemed to generate data that almost looked compressed (random-looking bytes, as opposed to patterns / lots of nulls).

SyntacticKitsune wrote: 22 May 2024, 22:14

then I suppose the question would be whether to use OptiPNG itself (which is C) or I believe Oxipng (which is Rust) is the new one (although I've never used it).

Unless there is a decently performant managed library, I'd likely just go with a C library and P/Invoke into it. That is, if this should even be an editor feature. It is a little niche, admittedly.

SyntacticKitsune wrote: 22 May 2024, 22:14

I'll open the PR soon.

Received - and I see you found the branch I quickly put up too :) I'll take a peek when I'm not falling asleep in my chair.

Thank you, for all the time and energy you're putting into this silly vorny game.

SyntacticKitsune
Posts: 22
Joined: 05 Jan 2024, 02:44

Re: Furball file size optimizations

Post by SyntacticKitsune »

Nuntis wrote: 22 May 2024, 22:53

they're marked protected internal to be precise, so they are not visible outside the source assembly.

Ah, unfortunate. In that case yeah they'd just have to be re-implemented.

Nuntis wrote: 22 May 2024, 22:53

The question then is how to expose that to the IFurballContentWriter interface

For exposing them there's probably three options: we could write all ints as 7-bit ones (like you suggested) or maybe we could introduce a set of methods called like WriteCompressedInt32Property(), with a contract that's something like: "Writes a 32-bit integer property. The integer may or may not be compressed (optional operation) in a format to be read using ReadCompressedInt32Property()." The third option is we could just keep writing the properties as they are and only tackle lists etc. (the sizes of which are already controlled by the implementation).
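The second option might be sketched like this in Java. The interface and class names are hypothetical stand-ins for IFurballContentWriter and its binary and JSON implementations, and the text output format is invented for the example. The point is that the "may or may not be compressed" contract lets a text-based writer ignore compression entirely.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for the content writer interface.
interface ContentWriterSketch {
    void writeInt32Property(String key, int value);

    /**
     * Writes a 32-bit integer property. The integer may or may not be
     * compressed; by default it is written like any other integer.
     */
    default void writeCompressedInt32Property(String key, int value) {
        writeInt32Property(key, value);
    }
}

// Binary mode: keys are not stored, and the compressed variant uses 7-bit groups.
final class BinaryWriterSketch implements ContentWriterSketch {
    final ByteArrayOutputStream out = new ByteArrayOutputStream();

    @Override
    public void writeInt32Property(String key, int value) {
        for (int shift = 0; shift < 32; shift += 8) {
            out.write(value >>> shift); // fixed four bytes, little-endian
        }
    }

    @Override
    public void writeCompressedInt32Property(String key, int value) {
        int v = value;
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }
}

// Text mode: inherits the default, so compression never leaks into its output.
final class TextWriterSketch implements ContentWriterSketch {
    final StringBuilder out = new StringBuilder();

    @Override
    public void writeInt32Property(String key, int value) {
        out.append(key).append('=').append(value).append('\n');
    }
}
```

Callers opt in per property, so values that are often large or negative can keep using the plain method.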

I've just tested the first option and it does seem to make the size gains significantly worse (it added 1 kB to Core). It looks like this is because of the asset type IDs, which are frequently large numbers (or negative). AssetSerializer does appear to be relying on the concrete writer/reader type already, so it could possibly cast to that type and invoke a normal WriteInt32Property implementation. Accounting for this, the sizes do actually get better than even the original figures (which only touched lists, enums, and arrays) by around 1 kB. In light of this, the option most worth pursuing is probably just changing all the ints over to 7-bit ones except for the AssetSerializer's use.
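To make the break-even point concrete, here is a small Java helper (hypothetical name) that computes how many bytes the 7-bit encoding needs for a given value. Anything at or above 2^28, and every negative number, costs five bytes, one more than a fixed 4-byte int.

```java
// Hypothetical helper class, for illustration only.
final class VarIntSize {
    /** Returns the number of bytes a 7-bit encoded 32-bit value occupies (1 to 5). */
    static int encodedSize(int value) {
        int size = 1;
        // Unsigned shifts treat negatives as large values, as the encoding does.
        for (int v = value >>> 7; v != 0; v >>>= 7) {
            size++;
        }
        return size;
    }

    public static void main(String[] args) {
        int[] samples = {0, 127, 128, 16_384, (1 << 28) - 1, 1 << 28, -1};
        for (int sample : samples) {
            System.out.println(sample + " -> " + encodedSize(sample) + " byte(s)");
        }
    }
}
```

This fits the observation above: hash-like or negative type IDs land in the five-byte range, while counts and small enum values take a single byte.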

Nuntis wrote: 22 May 2024, 22:53

GIMP seemed to insert big header blocks and tons of null bytes

Oh yeah something I forgot until just now about GIMP is that it likes inserting a bunch of metadata (even stuff like a thumbnail) into the saved images. That's more likely to be what's going on. I usually turn those all off, to the point where I guess I don't even notice they're there.

Nuntis wrote: 22 May 2024, 22:53

Thank you, for all the time and energy you're putting into this silly vorny game.

You're welcome. I think what's funny is that this is not even the first time I've sunk countless hours into modifying or developing tools for vore games -- just the first time I've ever publicly shared any results. (I did not in fact originally write the 7-bit int code that's in Furblorb for Finmer, nor its string handling. Ironically the reason I split Furblorb's IO stuff out into a separate jar was so I could reuse it in this other project.)

SyntacticKitsune
Posts: 22
Joined: 05 Jan 2024, 02:44

Re: Furball file size optimizations

Post by SyntacticKitsune »

Alright, so now that the compression PR has been merged, should I work on and open a PR for the 7-bit stuff?

Nuntis
Game Creator
Posts: 32
Joined: 11 Nov 2023, 13:27

Re: Furball file size optimizations

Post by Nuntis »

If you want to, absolutely! I'm a little unsure what the best approach would be for exposing this to the content stream interfaces; your idea of adding a WriteCompressed(U)Int32Property contract (and then kind of not look too hard at the slightly leaky abstraction :D) might be the simplest solution. Maybe the function should accept only unsigned integers, too, to help indicate its intended use, though if that requires a lot of casting all over the place, that'd be undesirable.

Looking at the usages of WriteInt32Property, I think these are the types to not convert to using var-ints:

  • SingleDeltaBuff
  • AssetSerializer (type keys)
  • CommandShop
  • ValueWrapperInt
  • FurballFileDeviceText

All other usages should be fair game as far as I can tell; there's only a few places where negative integers could show up, and the above cases should cover that.

FurballContentWriterBinary.BeginArray could also be changed to write a var-int, since an array cannot have a negative length. Maybe the same for WriteByteArrayProperty, however that function is unused (WriteAttachment references it, but could inline it). Perhaps zero could be used as 'null' marker instead.

SyntacticKitsune
Posts: 22
Joined: 05 Jan 2024, 02:44

Re: Furball file size optimizations

Post by SyntacticKitsune »

Alright, that PR has now been created. After that I believe the only two changes left in this area are downsizing some enums (which likely requires a method to pick sizes based on version -- could be useful infrastructure in case any enums need to be resized later) and the optional-not-so-optional list entries (although I think you were still on the fence about those?).
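A version-based size picker for enums could be as simple as the following Java sketch. The threshold constant, the enum, and the method names are all hypothetical; nothing in this thread fixes which format version would carry the change.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper class, for illustration only.
final class EnumWidthSketch {
    /** Hypothetical: first format version that stores enums in one byte. */
    static final int FIRST_BYTE_ENUM_VERSION = 22;

    /** Hypothetical enum standing in for the visual scripting ones. */
    enum Operation { SET, ADD, SUBTRACT }

    /** Writes the ordinal as four bytes for old formats, one byte for new ones. */
    static byte[] write(int formatVersion, Enum<?> value) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(sink)) {
            if (formatVersion >= FIRST_BYTE_ENUM_VERSION) {
                out.writeByte(value.ordinal());
            } else {
                out.writeInt(value.ordinal());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Reads an enum back using the width that matches the file's format version. */
    static <E extends Enum<E>> E read(int formatVersion, byte[] data, Class<E> type) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int ordinal = formatVersion >= FIRST_BYTE_ENUM_VERSION
                    ? in.readUnsignedByte()
                    : in.readInt();
            return type.getEnumConstants()[ordinal];
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the width decision in one place means a later resize only touches the threshold logic, not every call site.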

In other news I tried to tackle the Lua bit32 library backport. It went well until I found out lbitlib.c depends on some kind of new Lua data type (representing an unsigned integer I guess). I might try that again from the C# side later.