Linux internals and network programming: Data structure alignment

First, consider the structure below:

typedef struct student_s
{     
       int id;        // sizeof (int) = 4
       char name[7];  // sizeof (name) = 7 
} student_t;

When we print:

printf ("n = %d", (int) sizeof (student_t));

What number do you think will be printed?

The answer is 12 :). The two members need only 4 + 7 = 11 bytes; the compiler adds 1 more byte at the end.

This is really interesting, especially for newcomers, so let's go through it and figure out what is going on here.

This effect is called padding.
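To see where the extra byte ends up, the layout can be inspected with offsetof and sizeof. The following is only a sketch: MEMBER_SIZE, student_tail_padding and print_student_layout are hypothetical names introduced for illustration, and the numbers in the comments assume a 4-byte int with 4-byte alignment.

```c
#include <stddef.h>  /* offsetof, size_t */
#include <stdio.h>

typedef struct student_s
{
    int id;
    char name[7];
} student_t;

/* Hypothetical helper macro: size of one member without needing an object. */
#define MEMBER_SIZE(type, member) (sizeof(((type *)0)->member))

/* Bytes the compiler appended after the last member (name). */
size_t student_tail_padding(void)
{
    size_t used = offsetof(student_t, name) + MEMBER_SIZE(student_t, name);
    return sizeof(student_t) - used;   /* 12 - 11 with a 4-byte int */
}

/* Print where each member starts and how large the whole struct is. */
void print_student_layout(void)
{
    printf("id   starts at offset %zu\n", offsetof(student_t, id));
    printf("name starts at offset %zu\n", offsetof(student_t, name));
    printf("sizeof (student_t) = %zu, tail padding = %zu\n",
           sizeof(student_t), student_tail_padding());
}
```

Calling print_student_layout() from main() should show that name starts right after id and that the single padding byte sits at the very end of the struct.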

Memory addressing

-> Memory is byte-addressable, but processors commonly fetch it in word-sized chunks. The word size is defined by the computer architecture or the specific processor design.

On a 32-bit architecture the word size is typically 4 bytes, and on a 64-bit architecture it is 8 bytes. As a result, the processor accesses memory most efficiently at offsets that are multiples of the word size.

What is padding?

-> Storage for the basic C datatypes on modern processors doesn’t normally start at arbitrary byte addresses in memory. Rather, each object of a multi-byte type has an alignment requirement: chars can start on any byte address, but 2-byte shorts must start on an even address, 4-byte ints or floats must start on an address divisible by 4, and 8-byte longs or doubles must start on an address divisible by 8 (on typical 64-bit systems). Signed or unsigned makes no difference.
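The alignment requirements described above can be queried from the compiler instead of memorised. This is a minimal sketch assuming a C11 compiler (for _Alignof); is_aligned and print_alignments are hypothetical helper names, and the values printed depend on the platform.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr is a multiple of align (align must be non-zero). */
int is_aligned(uintptr_t addr, size_t align)
{
    return addr % align == 0;
}

/* Print the alignment the compiler reports for some basic types. */
void print_alignments(void)
{
    printf("char   : %zu\n", _Alignof(char));
    printf("short  : %zu\n", _Alignof(short));
    printf("int    : %zu\n", _Alignof(int));
    printf("float  : %zu\n", _Alignof(float));
    printf("long   : %zu\n", _Alignof(long));
    printf("double : %zu\n", _Alignof(double));
}
```

For example, is_aligned(8, 4) is true while is_aligned(6, 4) is false, which is the "divisible by 4" rule for a 4-byte int.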

Consider this structure:

typedef struct mystruct_s
{
       char c;
       int i;
       short s;
} mystruct_t;

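Without alignment, the members would be laid out back to back:

[Figure: misaligned memory. c occupies byte 0, i occupies bytes 1-4, s occupies bytes 5-6]

With 4-byte words, the int would straddle two words, so fetching it would need two memory accesses and some bit shifting.

To avoid this, structures are padded:

[Figure: aligned memory. c occupies byte 0, bytes 1-3 are padding, i occupies bytes 4-7, s occupies bytes 8-9, bytes 10-11 are padding]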
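The gaps in the aligned layout can be measured directly. The sketch below uses the same mystruct_t; padding_before_i and padding_after_s are hypothetical helpers, and the values in the comments assume a 4-byte int and a 2-byte short.

```c
#include <stddef.h>  /* offsetof, size_t */

typedef struct mystruct_s
{
    char c;
    int i;
    short s;
} mystruct_t;

/* Padding the compiler placed between c and i. */
size_t padding_before_i(void)
{
    /* typically 4 - (0 + 1) = 3 */
    return offsetof(mystruct_t, i) - (offsetof(mystruct_t, c) + sizeof(char));
}

/* Padding after s, at the end of the struct. */
size_t padding_after_s(void)
{
    /* typically 12 - (8 + 2) = 2 */
    return sizeof(mystruct_t) - (offsetof(mystruct_t, s) + sizeof(short));
}
```

Together with the 7 bytes of real data, the 3 + 2 bytes of padding account for the 12-byte size of the struct.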

Why is padding necessary?

-> Padding uses more memory, but it lets the processor fetch each member in a single aligned access, so the program runs faster.

Who is responsible for padding?

-> Memory padding is done automatically by the compiler.
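Because the compiler does this on its own, a struct declared with no padding members still keeps every member aligned, even inside an array. The trailing padding is what makes this work: it rounds the struct size up to a multiple of its alignment. A small sketch that checks this at run time (all_ints_aligned is a hypothetical helper; _Alignof assumes C11):

```c
#include <stddef.h>
#include <stdint.h>

typedef struct mystruct_s
{
    char c;
    int i;
    short s;
} mystruct_t;

/* Returns 1 if member i of every element in arr[0..count-1]
 * sits on an address that satisfies the alignment of int. */
int all_ints_aligned(const mystruct_t *arr, size_t count)
{
    for (size_t k = 0; k < count; k++) {
        uintptr_t addr = (uintptr_t)&arr[k].i;
        if (addr % _Alignof(int) != 0)
            return 0;
    }
    return 1;
}
```

For an ordinary array such as mystruct_t items[8], all_ints_aligned(items, 8) is expected to return 1, because sizeof (mystruct_t) is a multiple of the struct's alignment.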

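How is a structure padded?

In the examples below, the padding members are written out only to show what the compiler inserts; you do not declare them yourself.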

1.

typedef struct mystruct_s
{     
       char c;
       char padding[1];
       short s;
       int i;
} mystruct_t;

size of this struct is 1 + 1 + 2 + 4 = 8 bytes

2.

typedef struct mystruct_s
{
       short s;
       char s_padding[2];
       int i;
       char c;
       char padding[3];
} mystruct_t;

size of this struct is 2 + 2 + 4 + 1 + 3 = 12 bytes

3.

typedef struct mystruct_s
{
       int i;
       short s;
       char c;
       char padding[1];
} mystruct_t;

size of this struct is 4 + 2 + 1 + 1 = 8 bytes

4.

typedef struct mystruct_s
{
       char c;
       char c_padding[7];
       double d;
       int i;
       char i_padding[4];
} mystruct_t;

size of this struct is 1 + 7 + 8 + 4 + 4 = 24 bytes (assuming double is 8-byte aligned, as on typical 64-bit systems)

5.

typedef struct mystruct_s
{
       double d;
       int i;
       char c;
       char padding[3];
} mystruct_t;

size of this struct is 8 + 4 + 1 + 3 = 16 bytes
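The five results above all follow from one rule: place each member at the next offset that is a multiple of its alignment, then round the total up to the largest alignment in the struct. The sketch below models that rule; member_t, round_up and padded_size are hypothetical names, and the model assumes "natural" alignment, which real compilers and ABIs may deviate from.

```c
#include <stddef.h>

/* Hypothetical description of one member: its size and its alignment. */
typedef struct member_s
{
    size_t size;
    size_t align;
} member_t;

/* Round offset up to the next multiple of align (align must be non-zero). */
static size_t round_up(size_t offset, size_t align)
{
    return (offset + align - 1) / align * align;
}

/* Size of a struct whose members are laid out in the given order. */
size_t padded_size(const member_t *members, size_t count)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t k = 0; k < count; k++) {
        offset = round_up(offset, members[k].align);  /* padding before member */
        offset += members[k].size;
        if (members[k].align > max_align)
            max_align = members[k].align;
    }
    return round_up(offset, max_align);               /* trailing padding */
}
```

Feeding it { {1,1}, {2,2}, {4,4} } (char, short, int) gives 8, and { {1,1}, {8,8}, {4,4} } (char, double, int) gives 24, matching examples 1 and 4.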

Conversely, what about packing?

Packing, on the other hand, prevents the compiler from inserting padding.

- It is a technique for reducing the memory footprint of C programs by manually repacking C structure declarations for reduced size.

- If you want to avoid memory padding, the 1st way is to put #pragma pack(1) before the struct declaration (not after it).

- Under GCC, the 2nd way is to declare the struct with __attribute__((__packed__)), like this one:

typedef struct __attribute__((__packed__)) mystruct_s
{
       double d;
       int i;
       char c;
} mystruct_t;

This produces a structure of size 13 (8 + 4 + 1).
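Both ways can be compared side by side. In this sketch the type names padded_t, pragma_packed_t and attr_packed_t are made up for illustration; the push/pop form of #pragma pack is used so that the packing applies only to the one declaration, and the attribute form assumes GCC or a compatible compiler such as Clang.

```c
#include <stddef.h>

/* Default layout: the compiler pads the struct (typically to 16 bytes). */
typedef struct padded_s
{
    double d;
    int i;
    char c;
} padded_t;

/* 1st way: the pragma comes BEFORE the declaration; pop restores the default. */
#pragma pack(push, 1)
typedef struct pragma_packed_s
{
    double d;
    int i;
    char c;
} pragma_packed_t;
#pragma pack(pop)

/* 2nd way: the GCC attribute on the struct itself. */
typedef struct __attribute__((__packed__)) attr_packed_s
{
    double d;
    int i;
    char c;
} attr_packed_t;

/* Bytes saved per object by packing. */
size_t bytes_saved_by_packing(void)
{
    return sizeof(padded_t) - sizeof(attr_packed_t);
}
```

With an 8-byte double and a 4-byte int, both packed variants should come out at 13 bytes, so bytes_saved_by_packing() is the 3 bytes of trailing padding.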

- Misaligned access is not allowed on strict-alignment architectures like SPARC: a misaligned load or store raises a bus error there, so packed structures need extra care.

- Usefulness: packing helps when you write code for memory-constrained embedded systems or operating-system kernels. It is useful if you are working with application data sets so large that your programs routinely hit memory limits. It is good to know in any application where you really, really care about minimizing cache-line misses.
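On strict-alignment machines, one common way to read a value that may sit at a misaligned address (for example inside a packed buffer) is to copy its bytes into an aligned local with memcpy rather than dereferencing a cast pointer. This is a sketch of that general technique, not something specific to any one platform; load_u32 is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>

/* Copy 4 bytes from a possibly misaligned address into an aligned local.
 * The bytes are interpreted in the host's native byte order. */
uint32_t load_u32(const void *src)
{
    uint32_t value;
    memcpy(&value, src, sizeof value);
    return value;
}
```

For instance, if a uint32_t was stored at buf + 1 inside an unsigned char buffer, load_u32(buf + 1) returns it without ever performing a misaligned 4-byte load in the source code; compilers usually turn the memcpy into whatever access sequence is safe for the target.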


Wednesday, August 10, 2016
