How a compiler processes structs


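For example, given struct {short :2; short b :14;}c;, how does the compiler know from the syntax tree that it needs to allocate a two-byte object for c, and that c contains two bit-fields?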

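This is just one example. What I really want to know is how the compiler, once it has built the syntax tree, processes a struct variable and works out its layout.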

c compiler

neko3 10 years, 2 months ago

(I'm only reposting this; the text below is quoted from the references at the end.)

Structure with information about how a bitfield should be accessed.

Often we lay out a sequence of bitfields as a contiguous sequence of bits.
When the AST record layout does this, we represent it in the LLVM IR's type
as either a sequence of i8 members or a byte array to reserve the number of
bytes touched without forcing any particular alignment beyond the basic
character alignment.

Then accessing a particular bitfield involves converting this byte array
into a single integer of that size (i24 or i40 -- may not be power-of-two
size), loading it, and shifting and masking to extract the particular
subsequence of bits which make up that particular bitfield. This structure
encodes the information used to construct the extraction code sequences.
The CGRecordLayout also has a field index which encodes which byte-sequence
this bitfield falls within. Let's assume the following C struct:

 
  struct S {
    char a, b, c;
    unsigned bits : 3;
    unsigned more_bits : 4;
    unsigned still_more_bits : 7;
  };
 

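This will end up as the following LLVM type. The i8 members at indices 3
and 4 hold the bitfields (bits and more_bits share index 3; still_more_bits
would straddle a 4-byte storage unit, so it starts at index 4), and the
trailing array is the padding out to a 4-byte alignment.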

 
  %t = type { i8, i8, i8, i8, i8, [3 x i8] }
 

When generating code to access more_bits, we'll generate something
essentially like this:

 
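  define i32 @foo(%t* %base) {
    ; the i8 at index 3 holds both bits and more_bits
    %0 = getelementptr inbounds %t, %t* %base, i32 0, i32 3
    %1 = load i8, i8* %0
    ; drop the 3 low bits that belong to bits...
    %2 = lshr i8 %1, 3
    ; ...then keep the 4 bits of more_bits
    %3 = and i8 %2, 15
    ; more_bits is unsigned, so zero-extend it to the i32 result
    %4 = zext i8 %3 to i32
    ret i32 %4
  }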
 

References:
[1]: http://clang.llvm.org/doxygen/CGRecordLayout_8h_source.html
[2]: http://www.zhihu.com/question/26415342/answer/32741740

mizia answered 10 years, 2 months ago
