Extracting type-information from a Go binary

Introduction

Go is a strongly typed language. This means that you can’t, for example, concatenate a string with an integer without first converting the integer to a string. For this to be enforced, there needs to be a way for the runtime to track all the different types. In Go, every type has a definition that is included in the binary. By parsing these type definitions, it is possible to reconstruct all the types inside the binary, which can aid the analysis of a suspicious application or malware. This post walks through where this data is located and how to extract and parse it so that the type definitions can be reconstructed for all the types in the binary.

It all starts with moduledata

As described in a previous blog post, the moduledata structure holds pointers to some very important data structures in the Go binary. For recovering type-information, we are mainly interested in two of them: types and typelinks. Below is the current moduledata structure as of this writing.

type moduledata struct {
	pclntable    []byte
	ftab         []functab
	filetab      []uint32
	findfunctab  uintptr
	minpc, maxpc uintptr

	text, etext           uintptr
	noptrdata, enoptrdata uintptr
	data, edata           uintptr
	bss, ebss             uintptr
	noptrbss, enoptrbss   uintptr
	end, gcdata, gcbss    uintptr
	types, etypes         uintptr

	textsectmap []textsect
	typelinks   []int32 // offsets from types
	itablinks   []*itab

	ptab []ptabEntry

	pluginpath string
	pkghashes  []modulehash

	modulename   string
	modulehashes []modulehash

	hasmain uint8 // 1 if module contains the main function, 0 otherwise

	gcdatamask, gcbssmask bitvector

	typemap map[typeOff]*_type // offset to *_rtype in previous module

	bad bool // module failed to load and should be ignored

	next *moduledata
}

The moduledata structure has been relatively stable over the last few releases of the Go compiler. In version 1.8 the field textsectmap was added in front of typelinks, which means the offset of the typelinks slice differs between 1.7 and 1.8+. The other fields that differ between the two definitions (ptab, pluginpath, pkghashes, hasmain and bad) come after typelinks and do not affect its offset. The moduledata structure for 1.7 is shown below.

type moduledata struct {
	pclntable    []byte
	ftab         []functab
	filetab      []uint32
	findfunctab  uintptr
	minpc, maxpc uintptr

	text, etext           uintptr
	noptrdata, enoptrdata uintptr
	data, edata           uintptr
	bss, ebss             uintptr
	noptrbss, enoptrbss   uintptr
	end, gcdata, gcbss    uintptr
	types, etypes         uintptr

	typelinks []int32 // offsets from types
	itablinks []*itab

	modulename   string
	modulehashes []modulehash

	gcdatamask, gcbssmask bitvector

	typemap map[typeOff]*_type // offset to *_rtype in previous module

	next *moduledata
}
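Every field up to typelinks is either a pointer-sized word or a slice header (three words), so the offsets can be worked out by counting words. The following is a minimal sketch of that arithmetic, derived only from the two definitions above. The function names are made up for illustration, and it assumes no padding between these fields, which seems safe since they are all word-sized.

```go
package main

import "fmt"

// Word counts derived from the moduledata definitions above.
// A slice header is three words (pointer, len, cap); a uintptr is one word.
const (
	// pclntable, ftab, filetab (3 slices), findfunctab, minpc, maxpc,
	// five start/end pairs (text..noptrbss), then end, gcdata, gcbss.
	wordsBeforeTypes = 3*3 + 1 + 2 + 5*2 + 3
	sliceWords       = 3
)

// typesOffset returns the byte offset of the types field in moduledata.
func typesOffset(ptrSize int) int {
	return wordsBeforeTypes * ptrSize
}

// typelinksOffset returns the byte offset of the typelinks slice header.
// hasTextsectmap should be true for Go 1.8+ and false for Go 1.7.
func typelinksOffset(ptrSize int, hasTextsectmap bool) int {
	words := wordsBeforeTypes + 2 // skip types and etypes
	if hasTextsectmap {
		words += sliceWords
	}
	return words * ptrSize
}

func main() {
	fmt.Println("types:", typesOffset(8))
	fmt.Println("typelinks (1.7):", typelinksOffset(8, false))
	fmt.Println("typelinks (1.8+):", typelinksOffset(8, true))
}
```

On a 64-bit binary this gives 200 for types, and 216 or 240 for typelinks depending on the version.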

All the type-information is located in the types data, the region between types and etypes. This region holds more than the type structures themselves; it also holds other data about the types, such as their names. To find the type-information, the typelinks slice is needed. Each entry in this slice is an offset from the beginning of types to where the information for one type is stored. Unfortunately, not every type has an entry in this slice, but the remaining types can still be reached from it, since type structures refer to other types (for example, a pointer type refers to the type it points to).
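Turning the slice into addresses is one addition per entry. The sketch below illustrates this; the function name, the base address, and the offsets are all hypothetical and not taken from a real binary.

```go
package main

import "fmt"

// typeAddrs converts typelinks offsets into absolute addresses of the
// type structures by adding each offset to the start of the types data.
func typeAddrs(typesBase uint64, typelinks []int32) []uint64 {
	addrs := make([]uint64, 0, len(typelinks))
	for _, off := range typelinks {
		addrs = append(addrs, typesBase+uint64(int64(off)))
	}
	return addrs
}

func main() {
	// Hypothetical values for illustration only.
	for _, a := range typeAddrs(0x4a0000, []int32{0x40, 0x1c0, 0x2e80}) {
		fmt.Printf("type structure at %#x\n", a)
	}
}
```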

Parsing the type-information

The offsets in typelinks point to a data structure that describes the type. Go uses this data structure to track all the different types within the binary. The structure is defined in three places: the compiler, the reflect package, and the runtime package. In the runtime package, the name of the structure is _type and in the reflect package it is called rtype. The definition of the rtype structure is shown below.

type rtype struct {
	size       uintptr
	ptrdata    uintptr  // number of bytes in the type that can contain pointers
	hash       uint32   // hash of type; avoids computation in hash tables
	tflag      tflag    // extra type-information flags
	align      uint8    // alignment of variable with this type
	fieldAlign uint8    // alignment of struct field with this type
	kind       uint8    // enumeration for C
	alg        *typeAlg // algorithm table
	gcdata     *byte    // garbage collection data
	str        nameOff  // string form
	ptrToThis  typeOff  // type for pointer to this type, may be zero
}
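As a rough illustration, the structure above can be read from raw bytes as follows. This is a sketch that assumes a 64-bit, little-endian binary, where the fields add up to 48 bytes (8+8+4+1+1+1+1+8+8+4+4), and that nameOff and typeOff are 32-bit signed integers. The names rtypeHeader and parseRtype are made up for this example.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// rtypeSize64 is the size of rtype on a 64-bit target (assumption).
const rtypeSize64 = 48

// rtypeHeader mirrors the rtype fields for a 64-bit binary.
type rtypeHeader struct {
	Size       uint64
	PtrData    uint64
	Hash       uint32
	Tflag      uint8
	Align      uint8
	FieldAlign uint8
	Kind       uint8
	Alg        uint64 // pointer, kept as a raw address
	GCData     uint64 // pointer, kept as a raw address
	Str        int32  // nameOff
	PtrToThis  int32  // typeOff
}

// parseRtype decodes one rtype from the start of b.
func parseRtype(b []byte) (rtypeHeader, error) {
	if len(b) < rtypeSize64 {
		return rtypeHeader{}, errors.New("buffer too short for rtype")
	}
	le := binary.LittleEndian
	return rtypeHeader{
		Size:       le.Uint64(b[0:]),
		PtrData:    le.Uint64(b[8:]),
		Hash:       le.Uint32(b[16:]),
		Tflag:      b[20],
		Align:      b[21],
		FieldAlign: b[22],
		Kind:       b[23],
		Alg:        le.Uint64(b[24:]),
		GCData:     le.Uint64(b[32:]),
		Str:        int32(le.Uint32(b[40:])),
		PtrToThis:  int32(le.Uint32(b[44:])),
	}, nil
}

func main() {
	// Hypothetical bytes for an 8-byte type; the values are made up.
	b := make([]byte, rtypeSize64)
	binary.LittleEndian.PutUint64(b[0:], 8)
	b[20] = 0x07
	b[23] = 2
	binary.LittleEndian.PutUint32(b[40:], 0x1234)

	h, err := parseRtype(b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("size=%d tflag=%#x kind=%d str=%#x\n", h.Size, h.Tflag, h.Kind, h.Str)
}
```

For a 32-bit binary the pointer-sized fields shrink to four bytes and the offsets change accordingly.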

As said earlier, all types in the binary have a corresponding _type/rtype structure. This includes all the primitive types and user-defined types. The kind field is an enum value corresponding to the underlying primitive type. All the possible options are shown below.

const (
	Invalid Kind = iota
	Bool
	Int
	Int8
	Int16
	Int32
	Int64
	Uint
	Uint8
	Uint16
	Uint32
	Uint64
	Uintptr
	Float32
	Float64
	Complex64
	Complex128
	Array
	Chan
	Func
	Interface
	Map
	Ptr
	Slice
	String
	Struct
	UnsafePointer
)
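A small lookup table is enough to turn the kind byte into something readable. One detail not covered above: the reflect and runtime packages only use the low five bits of the byte for the kind and keep flag bits in the rest, so the sketch masks the value first. The names kindNames and kindName are made up for this example.

```go
package main

import "fmt"

// kindMask selects the low five bits of the kind byte; the upper bits
// hold flags that are unrelated to the enum value.
const kindMask = (1 << 5) - 1

// kindNames is indexed by the enum values listed above.
var kindNames = []string{
	"invalid", "bool", "int", "int8", "int16", "int32", "int64",
	"uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
	"float32", "float64", "complex64", "complex128",
	"array", "chan", "func", "interface", "map", "ptr",
	"slice", "string", "struct", "unsafe.Pointer",
}

// kindName returns a printable name for the kind byte of a type.
func kindName(k uint8) string {
	k &= kindMask
	if int(k) < len(kindNames) {
		return kindNames[k]
	}
	return fmt.Sprintf("kind%d", k)
}

func main() {
	fmt.Println(kindName(25))        // struct
	fmt.Println(kindName(22 | 1<<5)) // ptr, with a flag bit set
}
```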

Another interesting field is str. This value is an offset from the beginning of the types data to a packed byte structure that holds the type’s name and other string information. For a primitive type, the name matches the kind: the type with kind Int is named int. Derived types are different. Say you have defined a type superInt as below. Its name would be superInt while its kind is Int.

type superInt int
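The packed name can be decoded with a few lines of code. The layout used here is an assumption that is not spelled out in this post: one flag byte (bit 0 set for exported names), a two-byte big-endian length, then the name bytes. To my understanding this matches Go 1.7 through 1.16, while later releases are understood to use a variable-length size instead. The function name decodeName and the sample bytes are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeName reads the name stored at offset off in the types data.
// Assumed layout: flag byte, 2-byte big-endian length, name bytes.
func decodeName(types []byte, off int32) (name string, exported bool, err error) {
	o := int(off)
	if o < 0 || o+3 > len(types) {
		return "", false, errors.New("name offset out of range")
	}
	n := int(types[o+1])<<8 | int(types[o+2])
	if o+3+n > len(types) {
		return "", false, errors.New("name runs past end of types data")
	}
	return string(types[o+3 : o+3+n]), types[o]&1 != 0, nil
}

func main() {
	// Hypothetical types data holding a single name at offset 4.
	types := append([]byte{0, 0, 0, 0, 0x00, 0x00, 0x08}, "superInt"...)
	name, exported, err := decodeName(types, 4)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("name=%s exported=%v\n", name, exported)
}
```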

The tflag field is a bitmask that signals whether extra data follows the structure and how the name in str should be interpreted, as described in the source code snippet shown below.

// tflag is used by an rtype to signal what extra type-information is
// available in the memory directly following the rtype value.
//
// tflag values must be kept in sync with copies in:
//	cmd/compile/internal/gc/reflect.go
//	cmd/link/internal/ld/decodesym.go
//	runtime/type.go
type tflag uint8

const (
	// tflagUncommon means that there is a pointer, *uncommonType,
	// just beyond the outer type structure.
	//
	// For example, if t.Kind() == Struct and t.tflag&tflagUncommon != 0,
	// then t has uncommonType data and it can be accessed as:
	//
	//	type tUncommon struct {
	//		structType
	//		u uncommonType
	//	}
	//	u := &(*tUncommon)(unsafe.Pointer(t)).u
	tflagUncommon tflag = 1 << 0

	// tflagExtraStar means the name in the str field has an
	// extraneous '*' prefix. This is because for most types T in
	// a program, the type *T also exists and reusing the str data
	// saves binary size.
	tflagExtraStar tflag = 1 << 1

	// tflagNamed means the type has a name.
	tflagNamed tflag = 1 << 2
)
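In a parser, these flags translate into two simple checks: whether to look for an uncommonType after the type structure, and whether to drop the leading '*' from the decoded name. The sketch below illustrates both; the helper names typeString and hasUncommon are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values copied from the snippet above.
const (
	tflagUncommon  = 1 << 0
	tflagExtraStar = 1 << 1
	tflagNamed     = 1 << 2
)

// typeString returns the display name for a type, given the raw string
// read through the str offset and the type's tflag byte.
func typeString(raw string, tflag uint8) string {
	if tflag&tflagExtraStar != 0 {
		return strings.TrimPrefix(raw, "*")
	}
	return raw
}

// hasUncommon reports whether an uncommonType follows the type structure.
func hasUncommon(tflag uint8) bool {
	return tflag&tflagUncommon != 0
}

func main() {
	flags := uint8(tflagUncommon | tflagExtraStar | tflagNamed)
	fmt.Println(typeString("*main.superInt", flags), hasUncommon(flags))
	fmt.Println(typeString("*int", 0), hasUncommon(0))
}
```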

An uncommonType

As mentioned in the previous section, some types can be uncommon types. So what are uncommon types? It turns out that they are more common than you might think. In Go, any defined type can have methods associated with it, as in the example shown below.

type T struct{}

func (t T) myMethod() {}

In the code snippet, myMethod is a method for the type T. This makes T an uncommon type. In other words, uncommon types are types with methods.

Information about the type’s methods is stored in the uncommonType structure. As described in the section above, this structure is located right after the type structure. The layout of the uncommonType structure is shown below. It holds the import path, the number of methods (total and exported), and an offset from this structure to an array of method data structures. This is the definition of the structure as of the release of Go 1.13beta1, and its general shape has been like this since the first release of Go 1.7. Versions before 1.7 look very different.

type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	xcount  uint16  // number of exported methods
	moff    uint32  // offset from this uncommontype to [mcount]method
	_       uint32  // unused
}

Go 1.7beta1 was the first release with the new design of this structure. Its uncommonType is shown below. It is much smaller than the current one, but it essentially holds the same information. This definition is unique to that release and does not exist in any binaries produced by other versions of the Go compiler.

type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	moff    uint16  // offset from this uncommontype to [mcount]method
}

The general shape of the structure arrived with the release of Go 1.7beta2. It is the same size as the current structure, but the field that later became xcount is unused. For extracting the methods, this has no noticeable effect.

type uncommonType struct {
	pkgPath nameOff // import path; empty for built-in types like int, string
	mcount  uint16  // number of methods
	_       uint16  // unused
	moff    uint32  // offset from this uncommontype to [mcount]method
	_       uint32  // unused
}
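Reading the 16-byte form of this structure is straightforward. The sketch below assumes a little-endian binary and that nameOff is a 32-bit signed integer; the names uncommon, parseUncommon and methodsAddr are made up for this example. For a Go 1.7beta2 binary the Xcount value should be ignored, since those two bytes are unused there.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// uncommonSize is the size of the 16-byte uncommonType layout.
const uncommonSize = 16

// uncommon mirrors the uncommonType fields that carry information.
type uncommon struct {
	PkgPath int32 // nameOff
	Mcount  uint16
	Xcount  uint16
	Moff    uint32
}

// parseUncommon decodes an uncommonType from the start of b.
func parseUncommon(b []byte) (uncommon, error) {
	if len(b) < uncommonSize {
		return uncommon{}, errors.New("buffer too short for uncommonType")
	}
	le := binary.LittleEndian
	return uncommon{
		PkgPath: int32(le.Uint32(b[0:])),
		Mcount:  le.Uint16(b[4:]),
		Xcount:  le.Uint16(b[6:]),
		Moff:    le.Uint32(b[8:]),
	}, nil
}

// methodsAddr returns where the method array starts. moff is relative
// to the address of the uncommonType itself.
func methodsAddr(uncommonAddr uint64, u uncommon) uint64 {
	return uncommonAddr + uint64(u.Moff)
}

func main() {
	// Hypothetical bytes: 9 methods, 7 exported, array right after.
	b := []byte{0x10, 0, 0, 0, 9, 0, 7, 0, 0x10, 0, 0, 0, 0, 0, 0, 0}
	u, err := parseUncommon(b)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("mcount=%d xcount=%d methods at %#x\n",
		u.Mcount, u.Xcount, methodsAddr(0x4b2000, u))
}
```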

One of the fields in the structure, moff, points to an array of method structures. The definition of this structure is shown below.

// Method on non-interface type
type method struct {
	name nameOff // name of method
	mtyp typeOff // method type (without receiver)
	ifn  textOff // fn used in interface call (one-word receiver)
	tfn  textOff // fn used for normal method call
}

The mtyp field is an offset to the function type for the method. It is a _type/rtype structure with the kind value of Func. More on this type later. Both the ifn and tfn fields are offsets into the text section of the binary. This is where the function code is located.
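Walking the method array could look like the sketch below. It assumes a little-endian binary and that nameOff, typeOff and textOff are all 32-bit signed integers, which makes each entry 16 bytes. The names parseMethods and noCode are made up, and the sample bytes are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// methodSize is the size of one method entry (four 32-bit offsets).
const methodSize = 16

// method mirrors the method structure shown above.
type method struct {
	Name int32 // nameOff
	Mtyp int32 // typeOff
	Ifn  int32 // textOff
	Tfn  int32 // textOff
}

// parseMethods decodes mcount consecutive method entries from b.
func parseMethods(b []byte, mcount int) ([]method, error) {
	if mcount < 0 || len(b) < mcount*methodSize {
		return nil, errors.New("buffer too short for method array")
	}
	le := binary.LittleEndian
	ms := make([]method, mcount)
	for i := range ms {
		e := b[i*methodSize:]
		ms[i] = method{
			Name: int32(le.Uint32(e[0:])),
			Mtyp: int32(le.Uint32(e[4:])),
			Ifn:  int32(le.Uint32(e[8:])),
			Tfn:  int32(le.Uint32(e[12:])),
		}
	}
	return ms, nil
}

// noCode reports whether both text offsets are zero, meaning the entry
// carries no pointer to function code.
func (m method) noCode() bool {
	return m.Ifn == 0 && m.Tfn == 0
}

func main() {
	// Two hypothetical entries; the second one has a name only.
	b := make([]byte, 2*methodSize)
	le := binary.LittleEndian
	le.PutUint32(b[0:], 0x100)
	le.PutUint32(b[4:], 0x2000)
	le.PutUint32(b[8:], 0x58310)
	le.PutUint32(b[12:], 0x58310)
	le.PutUint32(b[16:], 0x120)

	ms, err := parseMethods(b, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, m := range ms {
		fmt.Printf("method %d: name=%#x mtyp=%#x ifn=%#x tfn=%#x noCode=%v\n",
			i+1, m.Name, m.Mtyp, m.Ifn, m.Tfn, m.noCode())
	}
}
```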

When analyzing real binaries, it turns out that some methods do not have a method type or an offset in the text section. Below is an analysis of a binary. In the snippet, the method array for *strconv.decimal is walked and the values are printed. It can be seen that most of them do not have a method type and some of the functions do not have offsets to function code.

*strconv.decimal has 9 methods
Method 1 name: Assign
Function at 0x58930 and 0x58930
Method 2 name: Round
Function at 0x59170 and 0x59170
Method 3 name: RoundDown
Function at 0x592d0 and 0x592d0
Method 4 name: RoundUp
Function at 0x59320 and 0x59320
Method 5 name: RoundedInteger
Function at 0x0 and 0x0
Method 6 name: Shift
Function at 0x590a0 and 0x590a0
Method 7 name: String
Method type: func() string
Function at 0x58310 and 0x58310
Method 8 name: floatBits
Function at 0x0 and 0x0
Method 9 name: set
Method type: func(string) bool
Function at 0x0 and 0x0

The symbols in the binary, shown below, also confirm that some functions are missing.

0x00458720  sym.strconv.__decimal_.String
0x00458bf0  sym.strconv.__decimal_.Assign
0x00459130  sym.strconv.__decimal_.Shift
0x00459200  sym.strconv.__decimal_.Round
0x004592d0  sym.strconv.__decimal_.RoundUp
0x00459710  sym.strconv.__extFloat_.FixedDecimal
0x00459c10  sym.strconv.__extFloat_.ShortestDecimal
0x0045e210  sym.type..hash.strconv.decimal
0x0045e270  sym.type..eq.strconv.decimal

It turns out that the Go compiler does some pruning of methods that are not used. While not all information is always present, the name of the method is still available which can be used for further analysis.

Some of Go’s types

Each primitive type has a corresponding data type in the runtime. All of these data types are structures with the _type/rtype as the first field. It is an anonymous field and hence embedded. This means that, when parsing the type data, the extra data for the specific type is usually located right after the _type/rtype data. The kind field can be used to figure out which type it is and what data follows the _type/rtype structure.
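Since an uncommonType, when present, comes right after the kind-specific structure, a parser needs the size of that structure for each kind. The sketch below derives those sizes from the definitions shown in the following subsections. It rests on several assumptions: a 64-bit binary, a 48-byte rtype, the name type being a single pointer, the mapType layout shown in this post, and funcType being padded from 52 to 56 bytes for alignment. The function name descriptorSize is made up.

```go
package main

import "fmt"

const (
	kindMask = (1 << 5) - 1 // low five bits hold the kind enum

	kindArray     = 17
	kindChan      = 18
	kindFunc      = 19
	kindInterface = 20
	kindMap       = 21
	kindPtr       = 22
	kindSlice     = 23
	kindStruct    = 25

	rtypeSize = 48 // 64-bit rtype (assumption)
)

// descriptorSize returns the size in bytes of the kind-specific
// structure, which is also the offset of a trailing uncommonType.
func descriptorSize(kind uint8) int {
	switch kind & kindMask {
	case kindPtr, kindSlice:
		return rtypeSize + 8 // elem
	case kindChan:
		return rtypeSize + 8 + 8 // elem, dir
	case kindArray:
		return rtypeSize + 8 + 8 + 8 // elem, slice, len
	case kindFunc:
		return rtypeSize + 8 // inCount, outCount, 4 bytes of padding
	case kindInterface, kindStruct:
		return rtypeSize + 8 + 24 // pkgPath, slice header
	case kindMap:
		return rtypeSize + 3*8 + 1 + 1 + 2 + 4 // key, elem, bucket, sizes, flags
	}
	return rtypeSize // kinds without extra fields
}

func main() {
	for _, k := range []uint8{2, kindPtr, kindFunc, kindStruct} {
		fmt.Printf("kind %d: uncommonType at +%d\n", k, descriptorSize(k))
	}
}
```

These numbers should be checked against the Go version of the binary being analyzed, since the structures have changed between releases.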

Struct type

The structType data type, shown below, is used to store information about each type derived from the primitive struct type. It has two extra fields, pkgPath and fields. The pkgPath field is the import path of the package, while fields is a slice of structField, also shown below, which stores information about the fields of the struct. The structField structure has three fields. The first is the name of the field, the second is a pointer to a _type/rtype structure that can be used to determine the type of the field, and the last is an integer that encodes the byte offset of the field and whether the field is embedded/anonymous.

// structType represents a struct type.
type structType struct {
	rtype
	pkgPath name
	fields  []structField // sorted by offset
}

// Struct field
type structField struct {
	name        name    // name is always non-empty
	typ         *rtype  // type of field
	offsetEmbed uintptr // byte offset of field<<1 | isEmbedded
}

func (f *structField) offset() uintptr {
	return f.offsetEmbed >> 1
}

func (f *structField) embedded() bool {
	return f.offsetEmbed&1 != 0
}

If the struct type has some methods attached to it, it is an uncommon type. In this scenario, the uncommon data structure is right after the structType data as shown below.

type structTypeUncommon struct {
	structType
	u uncommonType
}

Pointer type

Pointers to types have their own type called ptrType, shown in the code block below. It essentially just adds a pointer to a _type/rtype for the type it points to. This means, for example, that *int and *uint are two different types and each has its own ptrType structure stored in the binary.

// ptrType represents a pointer type.
type ptrType struct {
	rtype
	elem *rtype // pointer element (pointed at) type
}

One note about methods: if a pointer receiver is used when defining a method, as seen in the example below, the method will be attached to *myThing and not myThing.

type myThing struct{}

func (m *myThing) DoSomething() {}

Interface type

The data structure for interfaces is simple and is shown below. It has essentially two additional fields: one for the import path and a slice of imethod. The imethod structure, also shown below, provides information about the functions that need to be implemented to satisfy the interface. The first field in the imethod structure is the name of the function. The second field is the offset to a _type/rtype structure. This structure has the kind Func and hence provides information about the function definition, i.e., the types of the function arguments and return values.

// interfaceType represents an interface type.
type interfaceType struct {
	rtype
	pkgPath name      // import path
	methods []imethod // sorted by hash
}

// imethod represents a method on an interface type
type imethod struct {
	name nameOff // name of method
	typ  typeOff // .(*FuncType) underneath
}

Map type

The map type is probably the most complex structure of all the types. It is shown below. It holds a number of sizes that are used under the hood. Luckily, these are created by the compiler and the programmer has no control over them, so they can be ignored. The fields of interest are key and elem. By parsing these values, it is possible to reconstruct the source code representation of the type definition. The fields are pointers to two _type/rtype structures and essentially correspond to map[key]elem.

// mapType represents a map type.
type mapType struct {
	rtype
	key        *rtype // map key type
	elem       *rtype // map element (value) type
	bucket     *rtype // internal bucket structure
	keysize    uint8  // size of key slot
	valuesize  uint8  // size of value slot
	bucketsize uint16 // size of bucket
	flags      uint32
}

Slice and array type

The slice and array types are very similar; both are shown below. In both, the element type is recorded in the elem field. The arrayType additionally stores the length of the array in the len field and a pointer to the corresponding slice type in the slice field.

// sliceType represents a slice type.
type sliceType struct {
	rtype
	elem *rtype // slice element type
}

// arrayType represents a fixed array type.
type arrayType struct {
	rtype
	elem  *rtype // array element type
	slice *rtype // slice type
	len   uintptr
}
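Once the key and elem pointers have been followed, composite types can be printed back in source form with a short recursive function. The sketch below covers the pointer, map, slice and array types described so far. The typeInfo structure is a hypothetical, already-parsed view of a type and not something that exists in the binary.

```go
package main

import "fmt"

// typeInfo is a hypothetical parsed view of a type structure.
type typeInfo struct {
	Kind string    // "ptr", "map", "slice", "array"; anything else uses Name
	Name string    // name of a primitive or named type
	Key  *typeInfo // map key type
	Elem *typeInfo // element type for ptr, map, slice and array
	Len  uint64    // array length
}

// formatType rebuilds the source representation of a type.
func formatType(t *typeInfo) string {
	if t == nil {
		return "?" // unresolved reference
	}
	switch t.Kind {
	case "ptr":
		return "*" + formatType(t.Elem)
	case "slice":
		return "[]" + formatType(t.Elem)
	case "array":
		return fmt.Sprintf("[%d]%s", t.Len, formatType(t.Elem))
	case "map":
		return "map[" + formatType(t.Key) + "]" + formatType(t.Elem)
	}
	return t.Name
}

func main() {
	intT := &typeInfo{Kind: "int", Name: "int"}
	strT := &typeInfo{Kind: "string", Name: "string"}
	m := &typeInfo{
		Kind: "map",
		Key:  strT,
		Elem: &typeInfo{Kind: "slice", Elem: &typeInfo{Kind: "ptr", Elem: intT}},
	}
	fmt.Println(formatType(m))
	fmt.Println(formatType(&typeInfo{Kind: "array", Len: 4, Elem: intT}))
}
```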

Channel type

Similar to the array, slice, and map types, the chanType also has a field called elem to track what type is sent over the channel. It also has an enum to indicate whether the channel can only receive, only send, or both send and receive.

// chanType represents a channel type.
type chanType struct {
	rtype
	elem *rtype  // channel element type
	dir  uintptr // channel direction (ChanDir)
}
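The dir value maps directly onto the three ways a channel type can be written. The sketch below uses the values of the reflect package’s ChanDir constants (RecvDir is 1, SendDir is 2, BothDir is 3); the function name chanString is made up for this example.

```go
package main

import "fmt"

// Values of reflect.RecvDir, reflect.SendDir and reflect.BothDir.
const (
	recvDir = 1
	sendDir = 2
	bothDir = recvDir | sendDir
)

// chanString rebuilds the source form of a channel type from its
// direction and the already formatted element type.
func chanString(dir uint64, elem string) string {
	switch dir {
	case recvDir:
		return "<-chan " + elem
	case sendDir:
		return "chan<- " + elem
	case bothDir:
		return "chan " + elem
	}
	return fmt.Sprintf("chan(dir=%d) %s", dir, elem)
}

func main() {
	for _, d := range []uint64{recvDir, sendDir, bothDir} {
		fmt.Println(chanString(d, "int"))
	}
}
```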

Function type

Since functions in Go are first-class citizens, there is also a type definition for function types. The following code snippet, taken from the standard library, describes the type. The funcType has just two additional fields after the rtype/_type structure: a uint16 for the number of function arguments and a uint16 for the number of function return values. The type-information for the function arguments and return values is stored in an array right after the funcType data structure. Since it’s possible for all types to have methods, making them an uncommonType, function types can also have methods. When this happens, the uncommonType data sits between the funcType and that array, as described in the comment in the snippet.

// funcType represents a function type.
//
// A *rtype for each in and out parameter is stored in an array that
// directly follows the funcType (and possibly its uncommonType). So
// a function type with one method, one input, and one output is:
//
//	struct {
//		funcType
//		uncommonType
//		[2]*rtype    // [0] is in, [1] is out
//	}
type funcType struct {
	rtype
	inCount  uint16
	outCount uint16 // top bit is set if last input parameter is ...
}
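Putting the two counts to use, a function signature can be rebuilt once the parameter types have been resolved to strings. The sketch below clears the top bit of outCount to get the real number of return values and uses that bit to print the last argument in variadic form. The function name funcString is made up, and the parameter strings are assumed to have been produced by an earlier step.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// variadicBit is the top bit of outCount, set for variadic functions.
const variadicBit = 1 << 15

// funcString rebuilds a signature from the two counts and the formatted
// parameter types (inputs first, then outputs).
func funcString(inCount, outCount uint16, params []string) (string, error) {
	variadic := outCount&variadicBit != 0
	nIn := int(inCount)
	nOut := int(outCount &^ variadicBit)
	if len(params) != nIn+nOut {
		return "", errors.New("parameter count does not match inCount+outCount")
	}

	in := append([]string(nil), params[:nIn]...)
	if variadic && nIn > 0 {
		// The last input is stored as a slice type; print it as ...T.
		in[nIn-1] = "..." + strings.TrimPrefix(in[nIn-1], "[]")
	}

	s := "func(" + strings.Join(in, ", ") + ")"
	out := params[nIn:]
	switch len(out) {
	case 0:
	case 1:
		s += " " + out[0]
	default:
		s += " (" + strings.Join(out, ", ") + ")"
	}
	return s, nil
}

func main() {
	s, err := funcString(2, 2|variadicBit, []string{"string", "[]int", "int", "error"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```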

Conclusion

All the types used by a Go application are stored within the types data inside the binary. By parsing this data, it is possible to recover the type definitions, including those of private types and fields.