Introduction
When we work with data values, the way we align them in memory is not arbitrary. At first glance we might think we should simply store one value right after the other, no matter what. That is often true, but not always. In this article I am going to explain the different ways Rust gives us to explicitly define how we want our structures to be laid out and aligned.
Layout alignment in Rust
The Rust compiler does not guarantee that it will generate the same layout on every compilation. That means the layout of our data structures may be different every time we change the code. Sometimes this is good (there is a reason Rust does it: the compiler is free to reorder fields to reduce padding), but sometimes it can lead to performance issues we do not want. It is always a trade-off between two properties: Size vs Speed.
Size vs Speed
This trade-off is the main reason to choose one layout representation over another. Sometimes we are working with low-capacity microcontrollers and need every bit of space optimization to store our structures. In these situations it is better to lay out our data structures as sequentially as possible.
On the other hand, sometimes we face performance issues where our code is not fast enough for the task at hand. The cache and the way we access memory are the key here, and a well-organized data layout makes good use of both.
We are not going to discuss how cache and memory access works in computers but you can find more about this topic in the following links:
Default representation
With the #[repr] attribute we can specify how we want our values to be represented in memory. If no attribute is present, Rust uses its own default representation, which does not guarantee any particular data layout.
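As a rough illustration, the sketch below declares the same three fields twice: once with the default representation and once with #[repr(C)], the C representation covered in the next section. The names DefaultLayout, CLayout and layout_sizes are made up for this example, and the numbers assume a target where u32 is 4-byte aligned. Only the #[repr(C)] size is guaranteed; the default one is whatever the compiler picks.

```rust
use std::mem::size_of;

// Default representation: the compiler is free to reorder the fields.
#[allow(dead_code)]
struct DefaultLayout {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: fields stay in declaration order, padding included.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

/// Returns (size with the default representation, size with `#[repr(C)]`).
/// `layout_sizes` is a hypothetical helper, not part of any library.
fn layout_sizes() -> (usize, usize) {
    (size_of::<DefaultLayout>(), size_of::<CLayout>())
}
```

Under the assumption above, the #[repr(C)] version takes 12 bytes: 1 byte, 3 bytes of padding, 4 bytes, 1 byte and 3 more bytes of padding. The default version is usually smaller in practice because the compiler can place the two u8 fields together, but that is an observation, not a promise.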
C representation
The #[repr(C)] attribute lets us lay out our data structures the way C does, which allows Rust to interoperate with C. The C representation has some well-defined properties:
- Structure alignment: The alignment of the struct is the alignment of its most-aligned field. E.g. if we have a struct called T with two fields, a bool and a u32, the structure alignment would be 4 bytes (based on u32).
- Size and offset: For each field in declaration order in the struct, first determine the size and alignment of the field. If the current offset is not a multiple of the field’s alignment, then add padding bytes to the current offset until it is a multiple of the field’s alignment. The offset for the field is what the current offset is now. Then increase the current offset by the size of the field. Finally, the size of the struct is the current offset rounded up to the nearest multiple of the struct’s alignment.
Pseudocode:
/// Returns the amount of padding needed after `offset` to ensure that the
/// following address will be aligned to `alignment`.
fn padding_needed_for(offset: usize, alignment: usize) -> usize {
    let misalignment = offset % alignment;
    if misalignment > 0 {
        // round up to next multiple of `alignment`
        alignment - misalignment
    } else {
        // already a multiple of `alignment`
        0
    }
}

struct.alignment = struct.fields().map(|field| field.alignment).max();

// The offset changes as we walk the fields, so it has to be mutable.
let mut current_offset = 0;

for field in struct.fields_in_declaration_order() {
    // Increase the current offset so that it's a multiple of the alignment
    // of this field. For the first field, this will always be zero.
    // The skipped bytes are called padding bytes.
    current_offset += padding_needed_for(current_offset, field.alignment);

    struct[field].offset = current_offset;

    current_offset += field.size;
}

struct.size = current_offset + padding_needed_for(current_offset, struct.alignment);
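The pseudocode above can be turned into real Rust by describing each field as a (size, alignment) pair. This is only a sketch: repr_c_layout is a hypothetical helper written for this article, not something the compiler or the standard library exposes, and it assumes every alignment is non-zero.

```rust
/// Padding needed after `offset` so that the next address is a multiple
/// of `alignment`.
fn padding_needed_for(offset: usize, alignment: usize) -> usize {
    let misalignment = offset % alignment;
    if misalignment > 0 {
        alignment - misalignment
    } else {
        0
    }
}

/// Hypothetical helper: takes `(size, alignment)` pairs in declaration
/// order and returns `(field offsets, struct size, struct alignment)`.
fn repr_c_layout(fields: &[(usize, usize)]) -> (Vec<usize>, usize, usize) {
    // The struct is as aligned as its most-aligned field
    // (1 if it has no fields at all).
    let alignment = fields.iter().map(|&(_, align)| align).max().unwrap_or(1);

    let mut offsets = Vec::with_capacity(fields.len());
    let mut current_offset = 0;
    for &(size, align) in fields {
        // Skip padding bytes until the offset suits this field.
        current_offset += padding_needed_for(current_offset, align);
        offsets.push(current_offset);
        current_offset += size;
    }

    // Round the total up to a multiple of the struct alignment.
    let size = current_offset + padding_needed_for(current_offset, alignment);
    (offsets, size, alignment)
}
```

For a bool followed by a u32, described as [(1, 1), (4, 4)], the helper places the fields at offsets 0 and 4 and reports a size of 8 with an alignment of 4.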
A small, simple example, taken from allaboutcircuits:
#[repr(C)]
struct T {
    c: u32,
    d: bool,
}
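To check the rules against what the compiler actually produces for this struct, we can ask for its size and alignment and measure where d lands. The helpers offset_of_d and size_and_align are names invented for this sketch, and the expected numbers assume a target where u32 is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
struct T {
    c: u32,
    d: bool,
}

/// Byte offset of field `d` from the start of a `T`, measured with raw
/// pointers (hypothetical helper for this example).
fn offset_of_d() -> usize {
    let t = T { c: 0, d: false };
    let base = std::ptr::addr_of!(t) as usize;
    let field = std::ptr::addr_of!(t.d) as usize;
    field - base
}

/// Returns (size, alignment) of `T` as reported by the compiler.
fn size_and_align() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

Under that assumption c occupies bytes 0 to 3 and d sits at offset 4. The current offset after d is 5, so 3 trailing padding bytes round the size up to 8, the next multiple of the 4-byte struct alignment.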
#[repr(C)] for unions and enums
Unions
A #[repr(C)] union has the size of its largest field, rounded up to a multiple of its alignment, and the alignment of its most-aligned field.
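#[repr(C)]
union MyUnion {
    f1: u64,
    f2: [u32; 8],
}
fn main() {
    // The largest field is `f2`: 8 * 4 = 32 bytes.
    assert_eq!(std::mem::size_of::<MyUnion>(), std::mem::size_of::<u32>() * 8);
    // The most-aligned field is `f1`, so the union is aligned like a `u64`.
    assert_eq!(std::mem::align_of::<MyUnion>(), std::mem::align_of::<u64>());
}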
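In the union above the largest field is already a multiple of the alignment, so the rounding step does nothing. A second sketch, with a made-up union called Padded, shows the case where the rounding matters. It assumes a target where u32 is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

#[allow(dead_code)]
#[repr(C)]
union Padded {
    bytes: [u8; 5], // size 5, alignment 1
    word: u32,      // size 4, alignment 4
}

/// Returns (size, alignment) of `Padded` as reported by the compiler
/// (hypothetical helper for this example).
fn padded_layout() -> (usize, usize) {
    (size_of::<Padded>(), align_of::<Padded>())
}
```

The largest field needs 5 bytes, but the union is 4-byte aligned because of word, so its size is rounded up to 8.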
Enums with fields
The representation of a #[repr(C)] enum with fields is a struct with two fields, also called a “tagged union” in C:
// This enum has the same representation as ...
#[repr(C)]
enum MyEnum {
    A(u32),
    B(f32, u64),
}

// ... this struct.
#[repr(C)]
struct MyEnumRepr {
    tag: MyEnumDiscriminant,
    payload: MyEnumFields,
}

// This is the discriminant enum: one field-less variant per variant of MyEnum.
#[repr(C)]
enum MyEnumDiscriminant { A, B }

// This is the variant union.
#[repr(C)]
union MyEnumFields {
    A: MyAFields,
    B: MyBFields,
}

#[repr(C)]
#[derive(Copy, Clone)]
struct MyAFields(u32);

#[repr(C)]
#[derive(Copy, Clone)]
struct MyBFields(f32, u64);

fn main() {
    assert_eq!(std::mem::size_of::<MyEnum>(), std::mem::size_of::<MyEnumRepr>());
}
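Because the tag is the first field of the equivalent struct, it can be read without matching on the enum. The sketch below uses made-up types Shape and ShapeTag and a hypothetical helper tag_of. It relies on the layout described above and needs unsafe, so it is meant as an illustration of the layout rather than a pattern to copy.

```rust
#[allow(dead_code)]
#[repr(C)]
enum Shape {
    Circle(u32),
    Rect(f32, u64),
}

// Field-less mirror of `Shape`: same variants, same order.
#[allow(dead_code)]
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum ShapeTag {
    Circle,
    Rect,
}

/// Reads the tag of a `Shape` through a pointer cast.
fn tag_of(shape: &Shape) -> ShapeTag {
    // SAFETY: a `#[repr(C)]` enum with fields is laid out as a struct whose
    // first field is the tag, and the tag has the representation of the
    // field-less `#[repr(C)]` enum with the same variants in the same order.
    // The first bytes of a `Shape` therefore always hold a valid `ShapeTag`.
    unsafe { *(shape as *const Shape as *const ShapeTag) }
}
```

Calling tag_of(&Shape::Rect(1.0, 2)) yields ShapeTag::Rect. In ordinary Rust code a match does the same job safely; the cast is mostly useful to see how C code on the other side of an FFI boundary would inspect the value.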
References
- https://doc.rust-lang.org/reference/type-layout.html
- Rust for Rustaceans: Idiomatic Programming for Experienced Developers (B0957SWKBS)