Fishhook Source Code Analysis

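While reading through OOMDetector, Tencent's open-source iOS memory monitoring component, I found that it uses Facebook's open-source fishhook internally, so I took the opportunity to study that library.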

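fishhook can dynamically rebind symbols in Mach-O binaries running on iOS, both in the simulator and on a device, which makes it possible to swap out C functions.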

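fishhook is small: just two files, fishhook.h and fishhook.c, and the .c file is only a little over 200 lines. It does not look complicated overall, but it relies on Mach-O and function pointer knowledge, so some background is needed to follow the code.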

Let's start with the call flow.

Call flow

int rebind_symbols_image(void *header,
                         intptr_t slide,
                         struct rebinding rebindings[],
                         size_t rebindings_nel);

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel);

These two functions are exposed in the .h file for users to call. Internally, both of them call, directly or indirectly:

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide);

which in turn ends up calling:

static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab);

That is the overall flow. Inside perform_rebinding_with_section, the following code is where the function pointers are actually swapped:

if (cur->rebindings[j].replaced != NULL &&
    indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
    *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
}
indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
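The effect of this snippet can be tried out without any Mach-O machinery. The sketch below is a portable stand-in, not fishhook's code: the variable slot plays the role of one entry of indirect_symbol_bindings, and the names install_hook, real_add, hooked_add and orig_add are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*binary_fn)(int, int);

/* Hypothetical "original" function that the slot initially points to. */
static int real_add(int a, int b) { return a + b; }

/* Receives the original pointer, like the `replaced` out-parameter. */
static binary_fn orig_add = NULL;

/* Hypothetical replacement: calls through to the saved original. */
static int hooked_add(int a, int b) { return orig_add(a, b) + 1000; }

/*
 * Mirrors the snippet above for a single slot: remember what the slot held
 * (unless it already holds the replacement), then overwrite it.
 */
static void install_hook(binary_fn *slot, binary_fn replacement,
                         binary_fn *replaced) {
    assert(slot != NULL);
    if (replaced != NULL && *slot != replacement) {
        *replaced = *slot;
    }
    *slot = replacement;
}

/* Stand-in for one entry of a symbol pointer section. */
static binary_fn slot = real_add;
```

Calling install_hook(&slot, hooked_add, &orig_add) twice in a row leaves orig_add pointing at real_add. Without the inequality check, the second call would store hooked_add in orig_add, and hooked_add would then call itself forever.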

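Example: replacing malloc and related functions in OOMDetector

This example shows how OOMDetector uses fishhook to replace malloc.

hookMalloc, shown below, is the entry point for replacing malloc and related functions. It calls OOMDetector's rebind_symbols_for_imagename function.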

void hookMalloc()
{
    if(!isPaused){
        beSureAllRebindingFuncBeenCalled();
        
        orig_malloc = malloc;
        orig_calloc = calloc;
        orig_valloc = valloc;
        orig_realloc = realloc;
        orig_block_copy = _Block_copy;
        rebind_symbols_for_imagename(
                                     (struct rebinding[5]){
                                                        {"realloc",(void*)new_realloc,(void**)&orig_realloc},
                                                        {"malloc", (void*)new_malloc, (void **)&orig_malloc},
                                                        {"valloc",(void*)new_valloc,(void**)&orig_valloc},
                                                        {"calloc",(void*)new_calloc,(void**)&orig_calloc},
                                                        {"_Block_copy",(void*)new_block_copy,(void**)&orig_block_copy}},
                                     5,
                                     getImagename());
    }
    else{
        isPaused = false;
    }

}

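hookMalloc replaces realloc, malloc, valloc, calloc, and _Block_copy.

Below are the declarations of the function pointers that hold the original implementations (the new_malloc family calls through them):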

static void* (*orig_malloc)(size_t);
static void* (*orig_calloc)(size_t, size_t);
static void* (*orig_realloc)(void *, size_t);
static void* (*orig_valloc)(size_t);
static void* (*orig_block_copy)(const void *aBlock);

Below is how the third argument of rebind_symbols_for_imagename, the image name, is obtained:

const char *getImagename()
{
    const char* name = _dyld_get_image_name(0);
    const char* tmp = strrchr(name, '/');
    if (tmp) {
        name = tmp + 1;
    }
    return name;
}

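This calls _dyld_get_image_name from the <mach-o/dyld.h> header, which returns the name of the image at the given index. strrchr searches the string backwards for the given character and returns a pointer to its last occurrence, or NULL if the character does not appear.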
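The strrchr step can be isolated into a small helper. This is a minimal sketch with a hypothetical name (last_path_component); it assumes the image name is a slash-separated path, as in the output shown later in this post.

```c
#include <assert.h>
#include <string.h>

/*
 * Returns the part of `path` after the last '/', or `path` itself when it
 * contains no '/'. The result points into `path`; nothing is copied.
 */
static const char *last_path_component(const char *path) {
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}
```

For a path such as ".../LearnMachO.app/LearnMachO" the helper yields "LearnMachO", which is the kind of name that rebind_symbols_for_imagename compares against.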

rebind_symbols_for_imagename performs the swap by calling fishhook's rebind_symbols_image.

void rebind_symbols_for_imagename(struct rebinding rebindings[],
                                  size_t rebindings_nel,
                                  const char *imagename)
{
    uint32_t count = _dyld_image_count();
    for (uint32_t i = 0; i < count; i++) {
        const mach_header_t* header = (const mach_header_t*)_dyld_get_image_header(i);
        const char* name = _dyld_get_image_name(i);
        const char* tmp = strrchr(name, '/');
        long slide = _dyld_get_image_vmaddr_slide(i);
        if (tmp) {
            name = tmp + 1;
        }
        if(strcmp(name,imagename) == 0){
            rebind_symbols_image((void *)header,
                                 slide,
                                 rebindings,
                                 rebindings_nel);
            break;
        }
    }
}

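rebindings carries all the rebinding information, rebindings_nel is the number of elements in rebindings, and imagename is the name of the image whose function pointers should be replaced. Only the pointers in the image named imagename are replaced; other libraries are left untouched.

_dyld_get_image_header returns the image's Mach-O header, and _dyld_get_image_vmaddr_slide returns the image's ASLR slide. rebind_symbols_image prepares for the rebinding and, compared with rebind_symbols, takes two extra parameters: header and slide. header is the header of the binary as loaded in memory. slide is the random load offset introduced by ASLR. For more on what this value means, see the articles "iOS crash reports: atos not working as expected" and "iOS crash log analysis: symbol address = stack address - slide, the runtime API for getting the slide, and using dwarfdump to get symbols from a dSYM file".

About ASLR

To make the ASLR slide easier to understand, here is a piece of test code: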

void understandASLR()
{
    // Enumerate all loaded images
    for (int i = 0; i < _dyld_image_count(); i++)
    {
        char *image_name = (char *)_dyld_get_image_name(i);
        const struct mach_header *mh = _dyld_get_image_header(i);
        intptr_t vmaddr_slide = _dyld_get_image_vmaddr_slide(i);
        
        printf("Image name %s ,image header 0x%llx , ASLR slide 0x%lx.\n",
               image_name, (mach_vm_address_t)mh, vmaddr_slide);
    }
}

The output is:

Image name /Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/Library/CoreSimulator/Profiles/Runtimes/iOS.simruntime/Contents/Resources/RuntimeRoot/usr/lib/dyld_sim ,image header 0x107ca6000 , ASLR slide 0x107ca6000.

Image name /Users/ankang/Library/Developer/CoreSimulator/Devices/C9991234-7FA4-4F9E-9C73-629AFC886DC1/data/Containers/Bundle/Application/5C4118B1-2236-4A9C-B0A3-0DF77D765054/LearnMachO.app/LearnMachO ,image header 0x107c9c000 , ASLR slide 0x7c9c000.

Screenshots of the load address recorded in the binary for this code (the link-time load address):

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/macho01.png

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/macho02.png

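Combining the sample code above with the screenshots of the binary, we can interpret the following formulas:

slide = (runtime) load address - (link-time) load address
symbol address = stack address - slide

where:

  1. stack address: the address of a function call as it appears on a thread's stack while the program is running.
  2. symbol address: the address of the function's symbol in the dSYM file. Use this address to look up the symbol information in the dSYM file.

As you can see, without ASLR:

symbol address = stack address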
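The two formulas can be checked against the LearnMachO line of the output above. In this sketch the helper names (aslr_slide, symbol_address) are hypothetical, and the link-time load address 0x100000000 is not printed anywhere in the output; it is what header minus slide implies (0x107c9c000 - 0x7c9c000).

```c
#include <assert.h>
#include <stdint.h>

/* slide = (runtime) load address - (link-time) load address */
static int64_t aslr_slide(uint64_t runtime_load, uint64_t linktime_load) {
    assert(runtime_load >= linktime_load);
    return (int64_t)(runtime_load - linktime_load);
}

/* symbol address = stack address - slide */
static uint64_t symbol_address(uint64_t stack_address, int64_t slide) {
    return stack_address - (uint64_t)slide;
}
```

With the printed values, aslr_slide(0x107c9c000, 0x100000000) gives 0x7c9c000. A made-up stack address of 0x107c9d234 in that image would then map to the symbol address 0x100001234, and with a slide of 0 the two addresses coincide.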

Source code analysis

Since fishhook has so little code, we can go through all of it.

First, fishhook defines a struct that stores the rebinding data:

/*
 * A structure representing a particular intended rebinding from a symbol
 * name to its replacement
 */
struct rebinding {
  const char *name;     // name of the symbol to replace
  void *replacement;    // pointer to the replacement function
  void **replaced;      // out: receives the original function pointer
};

Next, look at rebind_symbols_image:

int rebind_symbols_image(void *header,
                         intptr_t slide,
                         struct rebinding rebindings[],
                         size_t rebindings_nel) {
    struct rebindings_entry *rebindings_head = NULL;
    int retval = prepend_rebindings(&rebindings_head, rebindings, rebindings_nel);
    rebind_symbols_for_image(rebindings_head, (const struct mach_header *) header, slide);
    if (rebindings_head) {
      free(rebindings_head->rebindings);
    }
    free(rebindings_head);
    return retval;
}

The main job of rebind_symbols_image is to build a struct rebindings_entry holding the bindings, via prepend_rebindings. The definitions are:

struct rebindings_entry {
  struct rebinding *rebindings;
  size_t rebindings_nel;
  struct rebindings_entry *next;
};

static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
  // Allocate the new entry
  struct rebindings_entry *new_entry = (struct rebindings_entry *) malloc(sizeof(struct rebindings_entry));
  if (!new_entry) {
    return -1;
  }
  // Allocate memory for new_entry->rebindings
  new_entry->rebindings = (struct rebinding *) malloc(sizeof(struct rebinding) * nel);
  if (!new_entry->rebindings) {
    free(new_entry);
    return -1;
  }
  // Copy the rebinding info into new_entry->rebindings
  memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
  new_entry->rebindings_nel = nel;
  new_entry->next = *rebindings_head;
    
  // Store the new entry in *rebindings_head, handing it back to the caller through the double pointer
  *rebindings_head = new_entry;
  return 0;
}
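One consequence of pushing each new entry onto the front of the list is that a lookup walking from the head sees the most recently added bindings first. The sketch below illustrates this with a simplified, hypothetical struct binding (a name and an int instead of function pointers); prepend has the same shape as prepend_rebindings, while find_first and free_all are helpers invented for the illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct rebinding. */
struct binding { const char *name; int value; };

struct entry {
    struct binding *bindings;
    size_t nel;
    struct entry *next;
};

/* Same shape as prepend_rebindings: copy the array, push it on the front. */
static int prepend(struct entry **head, const struct binding bindings[], size_t nel) {
    assert(head != NULL);
    struct entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->bindings = malloc(sizeof *e->bindings * nel);
    if (!e->bindings) { free(e); return -1; }
    memcpy(e->bindings, bindings, sizeof *e->bindings * nel);
    e->nel = nel;
    e->next = *head;
    *head = e;
    return 0;
}

/* Walk from the head and return the first binding with this name, or NULL. */
static const struct binding *find_first(const struct entry *head, const char *name) {
    for (const struct entry *cur = head; cur; cur = cur->next)
        for (size_t j = 0; j < cur->nel; j++)
            if (strcmp(cur->bindings[j].name, name) == 0)
                return &cur->bindings[j];
    return NULL;
}

/* Release every entry and its copied array. */
static void free_all(struct entry *head) {
    while (head) {
        struct entry *next = head->next;
        free(head->bindings);
        free(head);
        head = next;
    }
}
```

If {"malloc", 1} is prepended first and {"malloc", 3} second, find_first returns the binding with value 3. In rebind_symbols_image the list only ever has one entry, because the head starts out as NULL on every call; the ordering matters when a list is kept across several calls.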

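prepend_rebindings simply wraps the struct rebinding array passed in by the caller in a struct rebindings_entry. The entries form a linked list, which makes it easy to manage several struct rebinding arrays. Next, rebind_symbols_for_image: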

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  Dl_info info;
  // find the image containing a given address
  // Check that the header belongs to a loaded image
  if (dladdr(header, &info) == 0) {
    return;
  }

  segment_command_t *cur_seg_cmd;
  
  // Pointer to the __LINKEDIT segment command
  segment_command_t *linkedit_segment = NULL;
  // Pointer to the LC_SYMTAB command
  struct symtab_command* symtab_cmd = NULL;
  // Pointer to the LC_DYSYMTAB command
  struct dysymtab_command* dysymtab_cmd = NULL;
	
  // Locate linkedit_segment, symtab_cmd, and dysymtab_cmd
  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize)
  {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT)
    {
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0)
      {
        linkedit_segment = cur_seg_cmd;
      }
    }
    else if (cur_seg_cmd->cmd == LC_SYMTAB)
    {
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    }
    else if (cur_seg_cmd->cmd == LC_DYSYMTAB)
    {
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }

  if (!symtab_cmd || !dysymtab_cmd || !linkedit_segment ||
      !dysymtab_cmd->nindirectsyms) {
    return;
  }

  // Find base symbol/string table addresses
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

  // Get indirect symbol table (array of uint32_t indices into symbol table)
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

  cur = (uintptr_t)header + sizeof(mach_header_t);
  
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) 
  {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT)
    {
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
      // Find sections of type S_LAZY_SYMBOL_POINTERS or S_NON_LAZY_SYMBOL_POINTERS
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}

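rebind_symbols_for_image mainly locates three tables (the symbol table, the string table, and the indirect symbol table) and then the sections of type S_LAZY_SYMBOL_POINTERS and S_NON_LAZY_SYMBOL_POINTERS, and calls perform_rebinding_with_section on each of those sections. Some of the macros involved are defined as follows.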

#define LC_SEGMENT_ARCH_DEPENDENT LC_SEGMENT_64
#define	SEG_LINKEDIT	"__LINKEDIT"
#define	S_NON_LAZY_SYMBOL_POINTERS 0x6  /* section with only non-lazy symbol pointers */
#define	S_LAZY_SYMBOL_POINTERS 0x7     /* section with only lazy symbol pointers */

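The last three macros are defined in <mach-o/loader.h>, at line 477; LC_SEGMENT_ARCH_DEPENDENT is defined in fishhook.c itself and maps to LC_SEGMENT_64 on 64-bit targets. Just above the S_NON_LAZY_SYMBOL_POINTERS macro in the header there is a comment that is crucial. It explains that the entries of a symbol pointer section correspond, in order, to entries in the indirect symbol table. Only once this is understood does fishhook really make sense.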

For the two types of symbol pointers sections and the symbol stubs section they have indirect symbol table entries.

For each of the entries in the section, the indirect symbol table entries, in corresponding order in the indirect symbol table, start at the index stored in the reserved1 field of the section structure.

Since the indirect symbol table entries correspond to the entries in the section the number of indirect symbol table entries is inferred from the size of the section divided by the size of the entries in the section. For symbol pointers sections the size of the entries in the section is 4 bytes

In short: the symbol pointer sections and the symbol stubs section each own a run of entries in the indirect symbol table. The run starts at the index stored in the section's reserved1 field and follows the same order as the entries in the section, and its length is the section size divided by the entry size. The comment gives 4 bytes for a symbol pointer entry; in the structures examined here each entry is 8 bytes, presumably because the binary is 64-bit, which is why fishhook divides by sizeof(void *).

Below is the source of perform_rebinding_with_section:

static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,        // symbol pointer section to patch
                                           intptr_t slide,            // ASLR slide of the image
                                           nlist_t *symtab,           // symbol table
                                           char *strtab,              // string table
                                           uint32_t *indirect_symtab  // indirect symbol table
                                           )
{
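    // Start of this section's entries in the indirect symbol table:
    // table base + the section's starting index (reserved1)
    uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;

    // In-memory address of the section (the pointers stored here are what get replaced)
    void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
    
    // Replace function pointers in the section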
    for (uint i = 0; i < section->size / sizeof(void *); i++)
    {
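        // Section entries and indirect symbol table entries correspond one to one,
        // so the section index i can be used directly in the indirect symbol table.
        // Goal: find the symbol name for the pointer stored in the section, via
        // indirect symbol table -> symbol table -> string table.
        
        // indirect_symbol_indices[i] holds an index into the symbol table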
        uint32_t symtab_index = indirect_symbol_indices[i];
        if (symtab_index == INDIRECT_SYMBOL_ABS || 
            symtab_index == INDIRECT_SYMBOL_LOCAL ||
            symtab_index == (INDIRECT_SYMBOL_LOCAL | 
                             INDIRECT_SYMBOL_ABS))
        {
            continue;
        }
        // n_strx of the nlist_t entry is the symbol name's offset into the string table
        uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
        char *symbol_name = strtab + strtab_offset;
        if (strnlen(symbol_name, 2) < 2)
        {
            continue;
        }
        
        // Walk the rebindings_entry list, looking for a rebinding whose name matches the current symbol
        struct rebindings_entry *cur = rebindings;
        while (cur)
        {
            for (uint j = 0; j < cur->rebindings_nel; j++)
            {
                // If the symbol name (without its leading underscore) matches, rebind it
                if (strcmp(&symbol_name[1], cur->rebindings[j].name) == 0)
                {
                    if (cur->rebindings[j].replaced != NULL &&
                      indirect_symbol_bindings[i] != cur->rebindings[j].replacement)
                    {
                        // Save the original pointer, unless the slot already holds the replacement
                        *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
                    }
                    // Overwrite the function pointer stored in the section slot
                    indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
                    goto symbol_loop;
                }
            }
            cur = cur->next;
        }
        symbol_loop:;
    }
}

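perform_rebinding_with_section is the key function that carries out the replacement: it overwrites the function pointers stored in the section with the new function pointers.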
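The lookup chain in this function (section slot -> indirect symbol table -> symbol table -> string table) can be reproduced with ordinary arrays. The sketch below is a mock, not fishhook's code: struct mock_sym, the mock_* tables, the three small functions and rebind_slots are all hypothetical, it handles a single rebinding rather than a list, and it leaves out the INDIRECT_SYMBOL_ABS / INDIRECT_SYMBOL_LOCAL checks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*fn_t)(void);

/* Simplified stand-in for nlist_t: only the string table offset is kept. */
struct mock_sym { uint32_t n_strx; };

static int mock_open(void)  { return 1; }
static int mock_close(void) { return 2; }
static int my_close(void)   { return 42; }

/* String table: "_open" starts at offset 1, "_close" at offset 7. */
static const char mock_strtab[] = "\0_open\0_close";
/* Symbol table: index 0 is _close, index 1 is _open. */
static const struct mock_sym mock_symtab[] = { {7}, {1} };
/* Indirect symbol table: this section's run starts at index 2 (reserved1). */
static const uint32_t mock_indirect[] = { 9, 9, 1, 0 };
/* The "section": slot 0 is bound to open, slot 1 to close. */
static fn_t mock_slots[2] = { mock_open, mock_close };

/* Rewrites every slot whose symbol name (minus the leading '_') equals
 * `name`; returns how many slots were rewritten. */
static int rebind_slots(fn_t *slots, size_t count, uint32_t reserved1,
                        const uint32_t *indirect_symtab,
                        const struct mock_sym *symtab, const char *strtab,
                        const char *name, fn_t replacement, fn_t *replaced) {
    assert(slots && indirect_symtab && symtab && strtab && name);
    const uint32_t *indices = indirect_symtab + reserved1;
    int rewritten = 0;
    for (size_t i = 0; i < count; i++) {
        const char *symbol_name = strtab + symtab[indices[i]].n_strx;
        if (symbol_name[0] == '\0' || symbol_name[1] == '\0') continue;
        if (strcmp(symbol_name + 1, name) != 0) continue;
        if (replaced != NULL && slots[i] != replacement) *replaced = slots[i];
        slots[i] = replacement;
        rewritten++;
    }
    return rewritten;
}
```

Rebinding "close" with rebind_slots(mock_slots, 2, 2, mock_indirect, mock_symtab, mock_strtab, "close", my_close, &orig) rewrites only slot 1 and stores mock_close in orig. Slot 0 is untouched because its name resolves to "_open".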

fishhook in pictures

A colleague drew nine figures that explain how fishhook works. If the code leaves you confused, the figures should help.

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP1.png

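Figure 1. Structure of the __la_symbol_ptr section of the __DATA segment, as listed in the load commands. It shows that this section's entries start at index 146 in the indirect symbol table.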

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP2.png

Figure 2. Layout of __la_symbol_ptr in the data region. The section starts at 0x000240B0. Figures 2 and 3 are used to locate the malloc function pointer.

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP3.png

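Figure 3. Layout of __la_symbol_ptr in the data region, scrolled further down. The malloc function pointer is stored at file offset 0x000242B0.

To compute the entry offset: 0x242B0 - 0x240B0 = 0x200, and each entry occupies 8 bytes, so 0x200 / 0x8 = 0x40 = 64. In other words, malloc is 64 entries into the section, and therefore also 64 entries into the section's run of the indirect symbol table.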

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP4.png

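Figure 4. Jumping to the start of the indirect symbol table in the data region. The figure shows that the indirect symbol table starts at 0x3B0A4.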

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP5.png

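Figure 5. Computing where the entries for __la_symbol_ptr begin in the indirect symbol table: 0x3B060 + 146 * 4 = 0x3B060 + 0x248 = 0x3B2A8. The entries from 0x0003B2A8 onward correspond, in order, to the entries of __la_symbol_ptr.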

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP6.png

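Figure 6. Finding the indirect symbol table entry 64 entries further on: 0x3B2A8 + 0x40 * 4 = 0x3B3A8. The value stored at 0x3B3A8 is an index into the symbol table, 0xB32. MachOView does not show the symbol table, so code is used to inspect the symbol at index 0xB32.

Inspection code: struct nlist_64 const * mallocNlist =[self getSymbol64ByIndex:0xB32];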

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP7.png

Figure 7. Inspecting the symbol table from code: the entry at symbol table index 0xB32. Its n_strx is 0x2B07, which is an offset into the string table.

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP8.png

Figure 8. The string table, which starts at 0x3B498.

https://anyanf-img-1256234566.cos.ap-beijing.myqcloud.com/2018/fishhookP9.png

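Figure 9. The string table at offset 0x2B07: 0x3B498 + 0x2B07 = 0x3DF9F. The string stored at this address is the name of the function pointer at address 0x10001ef10 in Figure 3, namely malloc. That matches the name we want to replace, so the 0x10001ef10 in Figure 3 is overwritten with the address of the new function, which completes the replacement.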
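The arithmetic in the figure captions can be collected into three small helpers. This is only a sketch: the function names are hypothetical, and the indirect symbol table start is taken as 0x3B060, the value used in the Figure 5 calculation.

```c
#include <assert.h>
#include <stdint.h>

/* Index of a slot inside a symbol pointer section. */
static uint64_t slot_index(uint64_t slot_off, uint64_t section_off,
                           uint64_t entry_size) {
    assert(entry_size != 0 && slot_off >= section_off);
    return (slot_off - section_off) / entry_size;
}

/* File offset of the indirect symbol table entry that describes a slot.
 * Each indirect symbol table entry is a 4-byte index. */
static uint64_t indirect_entry_offset(uint64_t indirect_table_off,
                                      uint32_t reserved1, uint64_t index) {
    return indirect_table_off + ((uint64_t)reserved1 + index) * sizeof(uint32_t);
}

/* File offset of a symbol name, given the string table start and n_strx. */
static uint64_t name_offset(uint64_t strtab_off, uint32_t n_strx) {
    return strtab_off + n_strx;
}
```

Plugging in the numbers from the captions: slot_index(0x242B0, 0x240B0, 8) is 64, indirect_entry_offset(0x3B060, 146, 64) is 0x3B3A8, and name_offset(0x3B498, 0x2B07) is 0x3DF9F, matching Figures 3, 6, and 9.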