Brian Chrzanowski's Website

JSON Parsing With Fat Structs

Recently, while working on a C/C++ library to interact with a JSON API, I had an idea about how to make the parsing much easier. In my experience, it's quite aggravating to swap between statically typed languages (C, C++, C#) and dynamically typed languages (Python, Ruby, JavaScript).

In my professional work, I've seen libraries that use reflection and meta-programming to parse a given JSON blob into a higher-level structure. The most common place I've encountered this is in ASP.NET Core, which uses it to bridge the gap between a REST API and a C# controller method.

And that works reasonably well. In my experience, issues with a JSON (de)serializer can be quite hard to debug, but once the problems are resolved, the mapping between JSON and C# objects happens smoothly.

But, what if you didn't want a runtime reflection system to map JSON blobs to higher level objects? What if the end goal was an extremely deterministic set of steps for each JSON mapping operation?

Enter Fat Structs.

Fat Structs

So, what is a Fat Struct?

Simply put, a Fat Struct is a single structure that includes the members of one or more logical types. That means no unions, no sub-typing, no components, nothing of the sort. The downside is clear: runtime storage gets bloated compared to using a discrete type for each piece of functionality.

However, the benefits are enormous. Fat Structs give you a much richer set of implicit sub-types, since any combination of populated members is a valid value, and fewer code paths for the same number of features. In practice, you end up with bundles of data that can be manipulated however you'd like.

It should be noted that the specific use of a struct isn't required for this technique. You can obviously use classes or any other kind of grouping mechanism your language / environment has.
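As a rough illustration of the idea, here is a minimal sketch outside the JSON setting. The shape types and function names are hypothetical and exist only for this example. It relies on the assumption that members which do not apply to a given kind are left at zero, which is what lets one function serve every kind without branching.

```c
/* Hypothetical example types; none of these names come from a real library. */
typedef enum { SHAPE_NONE, SHAPE_CIRCLE, SHAPE_RECT } SHAPE_KIND;

/* The discrete alternative: one small type per logical kind. */
struct circle { double radius; };
struct rect   { double width; double height; };

/* The Fat Struct: members for every logical kind live side by side.
   Members that do not apply to a given shape are assumed to stay zero. */
struct shape {
    SHAPE_KIND kind;
    double radius;   /* meaningful for circles */
    double width;    /* meaningful for rectangles */
    double height;   /* meaningful for rectangles */
};

/* One code path for every kind: the unused members contribute zero. */
double ShapeArea(const struct shape *s)
{
    return 3.141592653589793 * s->radius * s->radius
         + s->width * s->height;
}

/* Scaling also needs no knowledge of the kind. */
void ShapeScale(struct shape *s, double factor)
{
    s->radius *= factor;
    s->width  *= factor;
    s->height *= factor;
}
```

The cost shows up in storage: struct shape is larger than either struct circle or struct rect, because every instance carries all of the members.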

JSON Parsing

So, those are fine things to claim, but how does this help us parse JSON?

Assume we have two kinds of JSON blobs whose schemas are defined as such:

// cat
{
    "properties": {
        "id":         { "type": "string" },
        "name":       { "type": "string" },
        "hasFur":     { "type": "boolean" },
        "isHungry":   { "type": "boolean" },
        "isMeowing":  { "type": "boolean" }
    }
}

// dog
{
    "properties": {
        "id":         { "type": "string" },
        "name":       { "type": "string" },
        "hasFur":     { "type": "boolean" },
        "isHungry":   { "type": "boolean" },
        "isBarking":  { "type": "boolean" },
        "isAGoodBoy": { "type": "boolean" }
    }
}

Let's say we use a discrete type for both of these JSON blobs. We might define and parse data from these blobs with a fictitious JSON API like so:

struct cat {
    char *id;
    char *name;
    bool hasFur;
    bool isHungry;
    bool isMeowing;
};

struct dog {
    char *id;
    char *name;
    bool hasFur;
    bool isHungry;
    bool isBarking;
    bool isAGoodBoy;
};

struct cat ParseCat(char *json)
{
    struct cat c = {0};

    c.id = GetString(json, "id");
    c.name = GetString(json, "name");
    c.hasFur = GetBool(json, "hasFur");
    c.isHungry = GetBool(json, "isHungry");
    c.isMeowing = GetBool(json, "isMeowing");

    return c;
}

struct dog ParseDog(char *json)
{
    struct dog d = {0};

    d.id = GetString(json, "id");
    d.name = GetString(json, "name");
    d.hasFur = GetBool(json, "hasFur");
    d.isHungry = GetBool(json, "isHungry");
    d.isBarking = GetBool(json, "isBarking");
    d.isAGoodBoy = GetBool(json, "isAGoodBoy");

    return d;
}

As you can tell, there's quite a bit of duplicated code. An observant reader might suggest using inheritance (emulated in C by embedding a base struct) to solve this problem; however, observe how much additional code must be written to make use of it.

struct base {
    char *id;
    char *name;
    bool hasFur;
    bool isHungry;
};

struct cat {
    struct base base;
    bool isMeowing;
};

struct dog {
    struct base base;
    bool isBarking;
    bool isAGoodBoy;
};

struct base ParseBase(char *json)
{
    struct base b = {0};

    b.id = GetString(json, "id");
    b.name = GetString(json, "name");
    b.hasFur = GetBool(json, "hasFur");
    b.isHungry = GetBool(json, "isHungry");

    return b;
}

struct cat ParseCat(char *json)
{
    struct cat c;

    c.base = ParseBase(json);
    c.isMeowing = GetBool(json, "isMeowing");

    return c;
}

struct dog ParseDog(char *json)
{
    struct dog d;

    d.base = ParseBase(json);
    d.isBarking = GetBool(json, "isBarking");
    d.isAGoodBoy = GetBool(json, "isAGoodBoy");

    return d;
}

Not only do we still need to parse the cat and dog types separately, but we even need a new function just to parse out the base properties, "because we want to re-use code".

And in both of these examples, users of this API are still required to peek into the JSON blob to see what data they actually have, so interacting over a WebSocket or something similar requires multiple parsing / deserialization steps.

That truly is the worst possible timeline. Sigh.
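To make the "peek, then parse" problem concrete, here is a small sketch of what a caller ends up writing with discrete types. The KeyPresent and GetBool helpers below are naive, hypothetical stand-ins for the fictitious JSON API (they just search the text and would not survive real-world JSON), the structs are trimmed to a few members, and MSG_KIND, PeekKind, and MessageIsHungry are names invented for this example.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Naive stand-in: true if the quoted key appears anywhere in the text. */
static bool KeyPresent(const char *json, const char *key)
{
    char quoted[64];
    snprintf(quoted, sizeof quoted, "\"%s\"", key);
    return strstr(json, quoted) != NULL;
}

/* Naive stand-in: true only if the key is followed by the literal true. */
static bool GetBool(const char *json, const char *key)
{
    char quoted[64];
    const char *p;

    snprintf(quoted, sizeof quoted, "\"%s\"", key);
    p = strstr(json, quoted);
    if (p == NULL)
        return false;
    p = strchr(p + strlen(quoted), ':');
    if (p == NULL)
        return false;
    p++;
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    return strncmp(p, "true", 4) == 0;
}

/* Trimmed-down discrete types. */
struct cat { bool isHungry; bool isMeowing; };
struct dog { bool isHungry; bool isBarking; bool isAGoodBoy; };

struct cat ParseCat(const char *json)
{
    struct cat c = {0};
    c.isHungry = GetBool(json, "isHungry");
    c.isMeowing = GetBool(json, "isMeowing");
    return c;
}

struct dog ParseDog(const char *json)
{
    struct dog d = {0};
    d.isHungry = GetBool(json, "isHungry");
    d.isBarking = GetBool(json, "isBarking");
    d.isAGoodBoy = GetBool(json, "isAGoodBoy");
    return d;
}

typedef enum { MSG_UNKNOWN, MSG_CAT, MSG_DOG } MSG_KIND;

/* Step 1: peek at the blob only to learn which parser applies. */
MSG_KIND PeekKind(const char *json)
{
    if (KeyPresent(json, "isMeowing"))
        return MSG_CAT;
    if (KeyPresent(json, "isBarking"))
        return MSG_DOG;
    return MSG_UNKNOWN;
}

/* Step 2: go over the same blob again with the matching parser.
   Even a question about a shared property needs one branch per type. */
bool MessageIsHungry(const char *json)
{
    switch (PeekKind(json)) {
    case MSG_CAT: { struct cat c = ParseCat(json); return c.isHungry; }
    case MSG_DOG: { struct dog d = ParseDog(json); return d.isHungry; }
    default:      return false;
    }
}
```

Every new animal type adds another case to both PeekKind and MessageIsHungry, on top of its own struct and parse function.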

Observe what happens to parsing when we stick all of the possible fields into a Fat Struct.

typedef enum {
    ANIMAL_TYPE_NONE,
    ANIMAL_TYPE_CAT,
    ANIMAL_TYPE_DOG
} ANIMAL_TYPE;

struct animal {
    char *id;
    char *name;
    int type;
    bool hasFur;
    bool isHungry;
    bool isMeowing;
    bool isBarking;
    bool isAGoodBoy;
};

struct animal ParseAnimal(char *json)
{
    struct animal a = {0};

    if (KeyPresent(json, "id"))
        a.id = GetString(json, "id");

    if (KeyPresent(json, "name"))
        a.name = GetString(json, "name");

    if (KeyPresent(json, "hasFur"))
        a.hasFur = GetBool(json, "hasFur");

    if (KeyPresent(json, "isHungry"))
        a.isHungry = GetBool(json, "isHungry");

    if (KeyPresent(json, "isMeowing"))
        a.isMeowing = GetBool(json, "isMeowing");

    if (KeyPresent(json, "isBarking"))
        a.isBarking = GetBool(json, "isBarking");

    if (KeyPresent(json, "isAGoodBoy"))
        a.isAGoodBoy = GetBool(json, "isAGoodBoy");

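    /* Classify by which keys are present, not by their values:
       a cat that isn't meowing right now is still a cat. */
    if (KeyPresent(json, "isMeowing"))
        a.type = ANIMAL_TYPE_CAT;
    else if (KeyPresent(json, "isBarking") || KeyPresent(json, "isAGoodBoy"))
        a.type = ANIMAL_TYPE_DOG;
    else
        a.type = ANIMAL_TYPE_NONE;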

    return a;
}

The resulting function is a little longer and a little more complicated, but both of our JSON structures pass through that single function. To help callers tell which kind of animal was actually parsed, I included a type field, which is mostly for convenience. Additionally, now that all of the data lives in a Fat Struct, operating over any and all animals is as easy as writing a function that takes a struct animal.
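For instance, a couple of functions over struct animal might look like the sketch below. The enum and struct are repeated from above so the example compiles on its own; CountHungry and IsNoisy are hypothetical names picked for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    ANIMAL_TYPE_NONE,
    ANIMAL_TYPE_CAT,
    ANIMAL_TYPE_DOG
} ANIMAL_TYPE;

struct animal {
    char *id;
    char *name;
    int type;
    bool hasFur;
    bool isHungry;
    bool isMeowing;
    bool isBarking;
    bool isAGoodBoy;
};

/* Works for cats and dogs alike, because isHungry lives in the one struct. */
size_t CountHungry(const struct animal *animals, size_t count)
{
    size_t hungry = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        if (animals[i].isHungry)
            hungry++;
    }
    return hungry;
}

/* Kind-specific members are readable on every animal; a member that was
   never set by the parser simply reads as false. */
bool IsNoisy(const struct animal *a)
{
    return a->isMeowing || a->isBarking;
}
```

Neither function has to check the type field, and a mixed array of cats and dogs needs no casts or separate containers.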

I'll leave you with this thought: how much more complexity is introduced if you want to have a struct catdog with each of the three techniques?
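For the Fat Struct version, one possible answer is "very little": struct animal already has every member a catdog needs, so the only change is in how the type is decided. The sketch below assumes a hypothetical ANIMAL_TYPE_CATDOG value and a hypothetical ClassifyAnimal helper that the parser would call after noting which kind-specific keys it saw; neither is part of the code above.

```c
#include <stdbool.h>

typedef enum {
    ANIMAL_TYPE_NONE,
    ANIMAL_TYPE_CAT,
    ANIMAL_TYPE_DOG,
    ANIMAL_TYPE_CATDOG   /* hypothetical addition for this sketch */
} ANIMAL_TYPE;

/* sawCatKey: the blob contained a cat-only key such as "isMeowing".
   sawDogKey: the blob contained a dog-only key such as "isBarking"
   or "isAGoodBoy". */
ANIMAL_TYPE ClassifyAnimal(bool sawCatKey, bool sawDogKey)
{
    if (sawCatKey && sawDogKey)
        return ANIMAL_TYPE_CATDOG;
    if (sawCatKey)
        return ANIMAL_TYPE_CAT;
    if (sawDogKey)
        return ANIMAL_TYPE_DOG;
    return ANIMAL_TYPE_NONE;
}
```

With discrete types or an embedded base struct, the same change would likely mean a new struct and a new parse function at minimum.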