OwenGage.com: writing

Understanding Rust's serde using macro expansion

2021-07-23

While I was writing fastnbt, I struggled to find an in-depth explanation of how to write a deserializer with serde. In this article I explore how serde works using cargo-expand.

This article expects familiarity with Rust, and at least a little experience using the de facto serialization/deserialization library serde.

Expansive mess

cargo expand is a custom subcommand for Cargo that lets you print the results of expanding a macro. Let's try it for a simple Deserialize macro:

use serde::Deserialize;

#[derive(Deserialize)]
struct Human {
    name: String,
}

Here we simply have a Human struct that contains a name. We derive an implementation of the Deserialize trait. If we run cargo expand...

cargo install cargo-expand
cargo expand

...then we get the incredibly short... (don't spend time looking at this)

#[doc(hidden)]
#[allow(non_upper_case_globals, unused_attributes, unused_qualifications)]
const _: () = {
    #[allow(unused_extern_crates, clippy::useless_attribute)]
    extern crate serde as _serde;
    #[automatically_derived]
    impl<'de> _serde::Deserialize<'de> for Human {
        fn deserialize<__D>(__deserializer: __D) -> _serde::__private::Result<Self, __D::Error>
        where
            __D: _serde::Deserializer<'de>,
        {
            #[allow(non_camel_case_types)]
            enum __Field {
                __field0,
                __ignore,
            }
            struct __FieldVisitor;
            impl<'de> _serde::de::Visitor<'de> for __FieldVisitor {
                type Value = __Field;
                fn expecting(
                    &self,
                    __formatter: &mut _serde::__private::Formatter,
                ) -> _serde::__private::fmt::Result {
                    _serde::__private::Formatter::write_str(__formatter, "field identifier")
                }
                fn visit_u64<__E>(self, __value: u64) -> _serde::__private::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        0u64 => _serde::__private::Ok(__Field::__field0),
                        _ => _serde::__private::Ok(__Field::__ignore),
                    }
                }
                fn visit_str<__E>(
                    self,
                    __value: &str,
                ) -> _serde::__private::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        "name" => _serde::__private::Ok(__Field::__field0),
                        _ => _serde::__private::Ok(__Field::__ignore),
                    }
                }
                fn visit_bytes<__E>(
                    self,
                    __value: &[u8],
                ) -> _serde::__private::Result<Self::Value, __E>
                where
                    __E: _serde::de::Error,
                {
                    match __value {
                        b"name" => _serde::__private::Ok(__Field::__field0),
                        _ => _serde::__private::Ok(__Field::__ignore),
                    }
                }
            }
            impl<'de> _serde::Deserialize<'de> for __Field {
                #[inline]
                fn deserialize<__D>(
                    __deserializer: __D,
                ) -> _serde::__private::Result<Self, __D::Error>
                where
                    __D: _serde::Deserializer<'de>,
                {
                    _serde::Deserializer::deserialize_identifier(__deserializer, __FieldVisitor)
                }
            }
            struct __Visitor<'de> {
                marker: _serde::__private::PhantomData<Human>,
                lifetime: _serde::__private::PhantomData<&'de ()>,
            }
            impl<'de> _serde::de::Visitor<'de> for __Visitor<'de> {
                type Value = Human;
                fn expecting(
                    &self,
                    __formatter: &mut _serde::__private::Formatter,
                ) -> _serde::__private::fmt::Result {
                    _serde::__private::Formatter::write_str(__formatter, "struct Human")
                }
                #[inline]
                fn visit_seq<__A>(
                    self,
                    mut __seq: __A,
                ) -> _serde::__private::Result<Self::Value, __A::Error>
                where
                    __A: _serde::de::SeqAccess<'de>,
                {
                    let __field0 =
                        match match _serde::de::SeqAccess::next_element::<String>(&mut __seq) {
                            _serde::__private::Ok(__val) => __val,
                            _serde::__private::Err(__err) => {
                                return _serde::__private::Err(__err);
                            }
                        } {
                            _serde::__private::Some(__value) => __value,
                            _serde::__private::None => {
                                return _serde::__private::Err(_serde::de::Error::invalid_length(
                                    0usize,
                                    &"struct Human with 1 element",
                                ));
                            }
                        };
                    _serde::__private::Ok(Human { name: __field0 })
                }
                #[inline]
                fn visit_map<__A>(
                    self,
                    mut __map: __A,
                ) -> _serde::__private::Result<Self::Value, __A::Error>
                where
                    __A: _serde::de::MapAccess<'de>,
                {
                    let mut __field0: _serde::__private::Option<String> = _serde::__private::None;
                    while let _serde::__private::Some(__key) =
                        match _serde::de::MapAccess::next_key::<__Field>(&mut __map) {
                            _serde::__private::Ok(__val) => __val,
                            _serde::__private::Err(__err) => {
                                return _serde::__private::Err(__err);
                            }
                        }
                    {
                        match __key {
                            __Field::__field0 => {
                                if _serde::__private::Option::is_some(&__field0) {
                                    return _serde::__private::Err(
                                        <__A::Error as _serde::de::Error>::duplicate_field("name"),
                                    );
                                }
                                __field0 = _serde::__private::Some(
                                    match _serde::de::MapAccess::next_value::<String>(&mut __map) {
                                        _serde::__private::Ok(__val) => __val,
                                        _serde::__private::Err(__err) => {
                                            return _serde::__private::Err(__err);
                                        }
                                    },
                                );
                            }
                            _ => {
                                let _ = match _serde::de::MapAccess::next_value::<
                                    _serde::de::IgnoredAny,
                                >(&mut __map)
                                {
                                    _serde::__private::Ok(__val) => __val,
                                    _serde::__private::Err(__err) => {
                                        return _serde::__private::Err(__err);
                                    }
                                };
                            }
                        }
                    }
                    let __field0 = match __field0 {
                        _serde::__private::Some(__field0) => __field0,
                        _serde::__private::None => {
                            match _serde::__private::de::missing_field("name") {
                                _serde::__private::Ok(__val) => __val,
                                _serde::__private::Err(__err) => {
                                    return _serde::__private::Err(__err);
                                }
                            }
                        }
                    };
                    _serde::__private::Ok(Human { name: __field0 })
                }
            }
            const FIELDS: &'static [&'static str] = &["name"];
            _serde::Deserializer::deserialize_struct(
                __deserializer,
                "Human",
                FIELDS,
                __Visitor {
                    marker: _serde::__private::PhantomData::<Human>,
                    lifetime: _serde::__private::PhantomData,
                },
            )
        }
    }
};

We can add some clarity here by:

  • Replacing private aliases with the more expected form. So _serde::__private::Result is actually std::result::Result.
  • Renaming type parameters to be easier on the eyes, like __D to just D.
  • Removing some of the annotations like #[automatically_derived].
  • Removing the wrapping scope, i.e. const _: () = {...}.
  • Moving nested struct and impl blocks to the top level.

The macro does all of this to isolate the expanded code from the code around it: it prevents the expanded code from affecting yours, and yours from affecting the expanded code.

A Reddit user pointed out that the wrapping scope was introduced because of GitHub serde issue 159.
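
To make the isolation concrete, here is a minimal std-only sketch of the wrapping-scope trick. The Describe trait and the two Visitor types are hypothetical stand-ins, not anything serde generates; the point is only that items declared inside const _: () = {...} stay private to that block, while a trait impl written there still applies everywhere.

```rust
// Hypothetical trait standing in for serde's Deserialize.
trait Describe {
    fn describe(&self) -> String;
}

struct Human;

// An item in the surrounding code that happens to share a name
// with a helper declared inside the anonymous constant below.
struct Visitor;

impl Visitor {
    fn origin(&self) -> &'static str {
        "outer"
    }
}

// Items declared in this block are only nameable inside it,
// but the trait impl is still visible to the rest of the crate.
const _: () = {
    // Shadows the outer `Visitor` within this block only.
    struct Visitor;

    impl Visitor {
        fn origin(&self) -> &'static str {
            "inner"
        }
    }

    impl Describe for Human {
        fn describe(&self) -> String {
            format!("described by the {} Visitor", Visitor.origin())
        }
    }
};
```

Outside the block, Visitor still refers to the outer type, and Human.describe() uses the inner one without either definition clashing.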

There are quite a few types and implementations created by this expansion. Below is a quick summary:

Thing | Description
impl Deserialize for Human | This is exactly what we wanted to derive.
struct HumanVisitor | A visitor that gets called by the deserializer. Its job is to produce the Human value.
enum Field | This enum represents the fields of our struct Human. In our case it simply contains field0 for 'name', and an ignore variant.
struct FieldVisitor | A visitor purely to check that identifier-like values produced by the deserializer match our fields.

Deserialize implementation

After all that clean up for human eyes, here's our Deserialize implementation:

impl<'de> serde::Deserialize<'de> for Human {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        const FIELDS: &'static [&'static str] = &["name"];

        serde::Deserializer::deserialize_struct(
            deserializer,
            "Human",
            FIELDS,
            HumanVisitor {
                marker: PhantomData::<Human>,
                lifetime: PhantomData,
            },
        )
    }
}

We can see here that this just delegates to the deserialize_struct method, passing some extra information like the names of our fields, and a visitor that was also generated by the macro. Nothing too complicated here. What's that HumanVisitor?

Our visitor

Here's our visitor with some code snipped out for brevity:

struct HumanVisitor<'de> {
    marker: PhantomData<Human>,
    lifetime: PhantomData<&'de ()>,
}

impl<'de> serde::de::Visitor<'de> for HumanVisitor<'de> {
    type Value = Human;
    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        fmt::Formatter::write_str(formatter, "struct Human")
    }

    fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
    where
        A: SeqAccess<'de>,
    {
        // ...
    }

    fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
    where
        A: MapAccess<'de>,
    {
        // ...
    }
}

The purpose of a visitor is to be driven by the deserializer, constructing the values as it goes. This visitor in particular expects the deserializer to call methods whose input can be converted into our Human structure. It would make no sense if the visitor received an integer type, because that does not represent a structure.

The visitor only implements the methods that make sense for the type it is expecting. For structs, you would generally expect a map of key-value pairs. This is why our visitor implements visit_map. It also implements visit_seq (seq for sequence); this supports formats that encode the values in order, skipping keys.
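
The relationship between the two methods can be sketched with a std-only toy model. Everything here is an assumption made for illustration: this Visitor trait is a heavily simplified stand-in for serde's (no lifetimes, no access traits, String errors), and keyed_format and positional_format are hypothetical "deserializers" that each hold one hard-coded record.

```rust
#[derive(Debug, PartialEq)]
struct Human {
    name: String,
}

// Simplified stand-in for serde's Visitor trait (not the real API).
trait Visitor {
    type Value;
    fn visit_seq(self, values: Vec<String>) -> Result<Self::Value, String>;
    fn visit_map(self, entries: Vec<(String, String)>) -> Result<Self::Value, String>;
}

struct HumanVisitor;

impl Visitor for HumanVisitor {
    type Value = Human;

    // Values arrive in field order, without keys.
    fn visit_seq(self, values: Vec<String>) -> Result<Human, String> {
        let name = values
            .into_iter()
            .next()
            .ok_or_else(|| "invalid length 0, expected 1 element".to_string())?;
        Ok(Human { name })
    }

    // Values arrive as key-value pairs; pick out the one we need.
    fn visit_map(self, entries: Vec<(String, String)>) -> Result<Human, String> {
        let name = entries
            .into_iter()
            .find(|(key, _)| key.as_str() == "name")
            .map(|(_, value)| value)
            .ok_or_else(|| "missing field `name`".to_string())?;
        Ok(Human { name })
    }
}

// A format that stores keys drives the visitor through visit_map.
fn keyed_format<V: Visitor>(visitor: V) -> Result<V::Value, String> {
    visitor.visit_map(vec![("name".to_string(), "Ada".to_string())])
}

// A format that stores only values, in order, drives visit_seq.
fn positional_format<V: Visitor>(visitor: V) -> Result<V::Value, String> {
    visitor.visit_seq(vec!["Ada".to_string()])
}
```

Both keyed_format(HumanVisitor) and positional_format(HumanVisitor) produce the same Human; the visitor never needs to know which format it is being driven by.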

Default method implementations

The serde::de::Visitor trait has default implementations of each method, which simply raise an error or forward the call on to another method. Here's the default implementation of visit_bool:

fn visit_bool<E>(self, v: bool) -> Result<Self::Value, E>
where
    E: Error,
{
    Err(Error::invalid_type(Unexpected::Bool(v), &self))
}

So if the deserializer calls the visitor with a bool, but the visitor does not override the method, by default you will get an error. Some default implementations add convenience for the common cases, like visit_u8, which forwards on to visit_u64:

fn visit_u8<E>(self, v: u8) -> Result<Self::Value, E>
where
    E: Error,
{
    self.visit_u64(v as u64)
}

This makes some sense. If you are making a visitor that accepts unsigned integer types, you can implement visit_u64 and handle unsigned integers of smaller types like u32 for free. The downside is that a visitor relying on these defaults cannot tell the smaller types apart, which can make untagged enums containing these forwarded types difficult and forces us to create stricter deserializers.
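
Here is a small std-only sketch of how such defaults behave. The trait below is a simplified imitation of serde's Visitor (String errors, only three methods), and CountVisitor is a hypothetical visitor that overrides nothing but visit_u64.

```rust
// Simplified imitation of serde's Visitor trait (not the real API).
trait Visitor: Sized {
    type Value;

    // Describes what this visitor expects, for error messages.
    fn expecting(&self) -> &'static str;

    // Default: reject the type with an error.
    fn visit_bool(self, v: bool) -> Result<Self::Value, String> {
        Err(format!(
            "invalid type: boolean `{}`, expected {}",
            v,
            self.expecting()
        ))
    }

    // Default: forward the smaller integer to visit_u64.
    fn visit_u8(self, v: u8) -> Result<Self::Value, String> {
        self.visit_u64(u64::from(v))
    }

    // Default: reject the type with an error.
    fn visit_u64(self, v: u64) -> Result<Self::Value, String> {
        Err(format!(
            "invalid type: integer `{}`, expected {}",
            v,
            self.expecting()
        ))
    }
}

// Hypothetical visitor that only overrides visit_u64.
struct CountVisitor;

impl Visitor for CountVisitor {
    type Value = u64;

    fn expecting(&self) -> &'static str {
        "an unsigned integer"
    }

    fn visit_u64(self, v: u64) -> Result<u64, String> {
        Ok(v)
    }
}
```

In this sketch CountVisitor.visit_u8(7) returns Ok(7) through the forwarded call, while CountVisitor.visit_bool(true) falls back to the default and returns an error. By the time visit_u64 runs, the original width of the integer is no longer visible.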

Map access

Let's take a look at the code for visit_map:

fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
where
    A: MapAccess<'de>,
{
    let mut field0: Option<String> = None;
    while let Some(key) = MapAccess::next_key::<Field>(&mut map)? {
        match key {
            Field::field0 => {
                if Option::is_some(&field0) {
                    return Err(<A::Error as Error>::duplicate_field("name"));
                }
                field0 = Some(MapAccess::next_value::<String>(&mut map)?);
            }
            _ => {
                let _ = MapAccess::next_value::<serde::de::IgnoredAny>(&mut map)?;
            }
        }
    }
    let field0 = match field0 {
        Some(field0) => field0,
        None => serde::__private::de::missing_field("name")?,
    };
    Ok(Human { name: field0 })
}

This function iterates through the given map's keys and values, looking for any of the fields of the structure. In our case we have the single field 'name', which is encoded in the Field type. We will look at that type in a second.

For each key, it checks if it is one of our fields. If it is, it tries to get the value as the expected String type. It ignores fields that are not in our structure. Finally it checks it has all the required fields and returns our Human.
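
The same control flow can be tried out without serde at all. The following is a std-only sketch under simplifying assumptions: human_from_entries is a hypothetical function, keys and values are plain strings instead of coming from a MapAccess, and errors are Strings rather than a format's error type.

```rust
#[derive(Debug, PartialEq)]
struct Human {
    name: String,
}

// Mirrors the shape of the generated visit_map: track each field in an
// Option, reject duplicates, skip unknown keys, then check for gaps.
fn human_from_entries<I>(entries: I) -> Result<Human, String>
where
    I: IntoIterator<Item = (String, String)>,
{
    let mut name: Option<String> = None;

    for (key, value) in entries {
        match key.as_str() {
            "name" => {
                if name.is_some() {
                    return Err("duplicate field `name`".to_string());
                }
                name = Some(value);
            }
            // Unknown key: consume the value and discard it.
            _ => drop(value),
        }
    }

    // Every required field must have been seen exactly once.
    let name = name.ok_or_else(|| "missing field `name`".to_string())?;
    Ok(Human { name })
}
```

Feeding it an extra key such as "age" still yields a Human, a repeated "name" key produces the duplicate-field error, and an empty input produces the missing-field error.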

The Field types

The Field and related types look like this:

enum Field {
    field0,
    ignore,
}

impl<'de> serde::Deserialize<'de> for Field {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: serde::Deserializer<'de>,
    {
        serde::Deserializer::deserialize_identifier(deserializer, FieldVisitor)
    }
}

struct FieldVisitor;

impl<'de> serde::de::Visitor<'de> for FieldVisitor {
    type Value = Field;

    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
        fmt::Formatter::write_str(formatter, "field identifier")
    }

    fn visit_u64<E>(self, value: u64) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            0u64 => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }

    fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            "name" => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }

    fn visit_bytes<E>(self, value: &[u8]) -> Result<Self::Value, E>
    where
        E: Error,
    {
        match value {
            b"name" => Ok(Field::field0),
            _ => Ok(Field::ignore),
        }
    }
}

We have a very similar situation to our Human visitor here. Deserialize is implemented for the Field enum, and it just delegates to deserialize_identifier, handing it the FieldVisitor.

This FieldVisitor expects the deserializer to call methods on it that 'look like' field identifiers. So it implements visit_str and visit_bytes, both of which see if the deserialized value looks like one of our fields. If it doesn't, the field gets deserialized to the special Field::ignore variant.

There is also the visit_u64 method, which allows the field name to be the number of the field; zero in our case.

Conclusion

This was a brief look into how serde deserializes data into values. If you would like more detailed information about this, let me know via whatever medium you found this on.

Some things I think I would like to expand upon are:

  • How optional types, nested structures, enums, bytes, and strings are handled.
  • How to write a deserializer for a data format.
  • How borrowing from underlying data is provided.

Thanks for reading!