linux – Capturing user-space assembly with ftrace and kprobes (by using virtual address translation)?

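Apologies for the long post; I'm having trouble formulating it in a shorter manner. Also, this may be better suited for Unix & Linux Stack Exchange, but I'll try here at SO first, since there is an ftrace tag.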

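Anyway: I'd like to observe the machine instructions of a user program as they execute, in the context of a full function_graph capture using ftrace. One problem is that I need this for an older kernel: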

$ uname -a
Linux mypc 2.6.38-16-generic #67-Ubuntu SMP Thu Sep 6 18:00:43 UTC 2012 i686 i686 i386 GNU/Linux

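... and in this version there is no UPROBES, which, as Uprobes in 3.5 [LWN.net] notes, should be able to do something like that. (As long as I don't have to patch the original kernel, I would be willing to try a kernel module built out of tree, as User-Space Probes (Uprobes) [chunghwan.com] seems to demonstrate; but as far as I can see from 0: Inode based uprobes [LWN.net], 2.6 would probably need a full patch.)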

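However, on this version there is a /sys/kernel/debug/kprobes and a /sys/kernel/debug/tracing/kprobe_events, and Documentation/trace/kprobetrace.txt implies that a kprobe can be set directly on an address, even though I cannot find an example of how this is used anywhere.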

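In any case, I'm still not sure what address to use. As a small example, say I want to trace the start of the main function of the wtest.c program (included below). I can do this to compile it and obtain an assembly listing of the machine instructions: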

$ gcc -g -O0 wtest.c -o wtest
$ objdump -S wtest | less
...
08048474 <main>:
int main(void) {
 8048474:       55                      push   %ebp
 8048475:       89 e5                   mov    %esp,%ebp
 8048477:       83 e4 f0                and    $0xfffffff0,%esp
 804847a:       83 ec 30                sub    $0x30,%esp
 804847d:       65 a1 14 00 00 00       mov    %gs:0x14,%eax
 8048483:       89 44 24 2c             mov    %eax,0x2c(%esp)
 8048487:       31 c0                   xor    %eax,%eax
  char filename[] = "/tmp/wtest.txt";
...
  return 0;
 804850a:       b8 00 00 00 00          mov    $0x0,%eax
}
...

I would set up ftrace logging through this script:

sudo bash -c '
KDBGPATH="/sys/kernel/debug/tracing"
echo function_graph > $KDBGPATH/current_tracer
echo funcgraph-abstime > $KDBGPATH/trace_options
echo funcgraph-proc > $KDBGPATH/trace_options
echo 0 > $KDBGPATH/tracing_on
echo > $KDBGPATH/trace
echo 1 > $KDBGPATH/tracing_on ; ./wtest ; echo 0 > $KDBGPATH/tracing_on
cat $KDBGPATH/trace > wtest.ftrace
'

You can see a portion of the (otherwise complex) resulting ftrace log in debugging – Observing a hard-disk write in kernel space (with drivers/modules) – Unix & Linux Stack Exchange (which is where I got the example from).

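Basically, I'd want a printout in this ftrace log whenever the first instructions of main (say, the instructions at 0x8048474, 0x8048475, 0x8048477, 0x804847a, 0x804847d, 0x8048483 and 0x8048487) are executed by (any) CPU. The problem is that, as far as I can understand from Anatomy of a Program in Memory : Gustavo Duarte, these addresses are virtual addresses, as seen from the perspective of the process itself (and I gather the same perspective is shown by /proc/PID/maps)... and apparently, for a kprobe_event I'd need a physical address?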

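So, my idea would be: if I can find the physical addresses corresponding to the virtual addresses of the program disassembly (say, by coding a kernel module that accepts a pid and an address and returns the physical address via procfs), I could set up those addresses as a sort of "tracepoints" via /sys/kernel/debug/tracing/kprobe_events in the script above, and hopefully get them in the ftrace log. Could this work, in principle?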

One problem with this I found in Linux(ubuntu), C language: Virtual to Physical Address Translation – Stack Overflow:

In user code, you can’t know the physical address corresponding to a virtual address. This information is simply not exported outside the kernel. It could even change at any time, especially if the kernel decides to swap out part of your process’s memory.

Pass the virtual address to the kernel using systemcall/procfs and use vmalloc_to_pfn. Return the Physical address through procfs/registers.

However, vmalloc_to_pfn doesn't seem to be trivial either:

x86 64 – vmalloc_to_pfn returns 32 bit address on Linux 32 system. Why does it chop off higher bits of PAE physical address? – Stack Overflow

VA: 0xf8ab87fc PA using vmalloc_to_pfn: 0x36f7f7fc. But I’m actually expecting: 0x136f7f7fc.

The physical address falls between 4 to 5 GB. But I can’t get the exact physical address, I only get the chopped off 32-bit address. Is there another way to get true physical address?

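So, I'm not sure how reliably I could extract the physical addresses so that kprobes can trace them, especially since "it could even change at any time". But here I would hope that, since the program is small and trivial, there is a reasonable chance it will not be swapped out while being traced, allowing a proper capture. (So even if I have to run the debug script multiple times, I'm fine as long as I can hope to get a "proper" capture once out of 10, or even 100, runs.)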

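Note that I'd like the output to go through ftrace, so that the timestamps are expressed in the same domain (see Reliable Linux kernel timestamps (or adjustment thereof) with both usbmon and ftrace? – Stack Overflow for an illustration of the timestamp problem). Thus, even if I could come up with, say, a gdb script to run and trace the program from user space (while an ftrace capture is taken at the same time), I'd like to avoid that, as the overhead from gdb itself would show up in the ftrace log.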

So, in summary:

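>Is the approach of obtaining physical addresses (possibly through a separate kernel module) from the virtual addresses (taken from the executable's disassembly), so that they can be used to trigger a kprobe_event logged by ftrace, worth pursuing? If so, are there any examples of kernel modules that can be used for this address translation?
>Could I otherwise use a kernel module to "register" a callback/handler function that runs when a particular memory address is executed? Then I could simply use trace_printk in that function to get an ftrace log entry (or, even without that, the handler function name itself should show up in the ftrace log), and it doesn't seem that there would be too much overhead from that.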

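Actually, in this 2007 posting, Jim Keniston – utrace-based uprobes: systemtap mailing list, there is an "11. Uprobes Example" (added to Documentation/uprobes.txt), which seems to be exactly that: a kernel module registering a handler function. Unfortunately, it uses linux/uprobes.h, and I have only kprobes.h in my /usr/src/linux-headers-2.6.38-16/include/linux/. Also, on my system, even systemtap complains about CONFIG_UTRACE not being enabled (see this comment)... So if there is any other approach I could use to obtain the debug trace I want, without having to recompile the kernel to get uprobes, it would be great to know...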

wtest.c:

#include <stdio.h>
#include <fcntl.h>     // O_CREAT,O_WRONLY,S_IRUSR
#include <sys/stat.h>  // mode_t and the S_I* permission bits
#include <unistd.h>    // write,close

int main(void) {
  char filename[] = "/tmp/wtest.txt";
  char buffer[] = "abcd";
  int fd;
  mode_t perms = S_IRUSR|S_IWUSR|S_IRGRP|S_IWGRP|S_IROTH|S_IWOTH;

  fd = open(filename,O_RDWR|O_CREAT,perms);
  write(fd,buffer,4);
  close(fd);

  return 0;
}

Solution

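Obviously, this would be much easier with the built-in uprobes on kernels 3.5+; but given that uprobes for my 2.6.38 kernel is a very deep-going patch (which I couldn't really isolate into a separate kernel module, so as to avoid patching the kernel), here is what I can note for a standalone module on 2.6.38. (Since I'm still unsure about many things, I would still like to see an answer that corrects any misunderstandings in this post.)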

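I think I got somewhere, but not with kprobes. I'm not sure, but it seems I managed to get the physical addresses right; however, the kprobes documentation is specific about "@ADDR : Fetch memory at ADDR (ADDR should be in kernel)", and the physical addresses I get are below the kernel boundary of 0xc0000000 (but then, 0xc0000000 usually refers to the virtual memory layout?).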

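So I used hardware breakpoints instead. The module is below; however, caveat emptor: it behaves randomly, and occasionally can cause a kernel oops! After compiling the module, and running this in bash: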

$ sudo bash -c 'KDBGPATH="/sys/kernel/debug/tracing" ;
echo function_graph > $KDBGPATH/current_tracer ; echo funcgraph-abstime > $KDBGPATH/trace_options
echo funcgraph-proc > $KDBGPATH/trace_options ; echo 8192 > $KDBGPATH/buffer_size_kb ;
echo 0 > $KDBGPATH/tracing_on ; echo > $KDBGPATH/trace'
$ sudo insmod ./callmodule.ko && sleep 0.1 && sudo rmmod callmodule && \
tail -n25 /var/log/syslog | tee log.txt && \
sudo cat /sys/kernel/debug/tracing/trace >> log.txt

... I get a log. I want to trace the first two instructions of main() of wtest, which for me are:

$ objdump -S wtest/wtest | grep -A3 'int main'
int main(void) {
 8048474:   55                      push   %ebp
 8048475:   89 e5                   mov    %esp,%ebp
 8048477:   83 e4 f0                and    $0xfffffff0,%esp

... at virtual addresses 0x08048474 and 0x08048475. In the syslog output, I could get, say:

...
[ 1106.383011] callmodule: parent task a: f40a9940 c: kworker/u:1 p: [14] s: stopped
[ 1106.383017] callmodule: - wtest [9404]
[ 1106.383023] callmodule: Trying to walk page table; addr task 0xEAE90CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
[ 1106.383029] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383049] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383067] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f63e5d80; *virtual (page_address) @   (null) (is_vmalloc_addr 0 virt_addr_valid 0 virt_to_phys 0x40000000) page_to_pfn 639ec page_to_phys 0x639ec000
[ 1106.383083] callmodule: physaddr : (0x080483c0 ->) 0x639ec3c0 : (0x08048474 ->) 0x639ec474
[ 1106.383106] callmodule: 0x08048474 id [3]
[ 1106.383113] callmodule: 0x08048475 id [4]
[ 1106.383118] callmodule: (( 0x08048000 is_vmalloc_addr 0 virt_addr_valid 0 ))
[ 1106.383130] callmodule: cont pid task a: eae90ca0 c: wtest p: [9404] s: runnable
[ 1106.383147] initcall callmodule_init+0x0/0x1000 [callmodule] returned with preemption imbalance
[ 1106.518074] callmodule: < exit

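... meaning that it mapped the virtual address 0x08048474 to the physical address 0x639ec474. However, the physical address is not used for hardware breakpoints: we can supply a virtual address directly to register_user_hw_breakpoint, but we also need to supply the task_struct of the process. With that, I can get something like this in the ftrace output: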

...
  597.907256 |   1)   wtest-5339   |               |  handle_mm_fault() {
...
  597.907310 |   1)   wtest-5339   | + 35.627 us   |      }
  597.907311 |   1)   wtest-5339   | + 46.245 us   |    }
  597.907312 |   1)   wtest-5339   | + 56.143 us   |  }
  597.907313 |   1)   wtest-5339   |   1.039 us    |  up_read();
  597.907317 |   1)   wtest-5339   |   1.285 us    |  native_get_debugreg();
  597.907319 |   1)   wtest-5339   |   1.075 us    |  native_set_debugreg();
  597.907322 |   1)   wtest-5339   |   1.129 us    |  native_get_debugreg();
  597.907324 |   1)   wtest-5339   |   1.189 us    |  native_set_debugreg();
  597.907329 |   1)   wtest-5339   |               |  () {
  597.907333 |   1)   wtest-5339   |               |  /* callmodule: hwbp hit: id [3] */
  597.907334 |   1)   wtest-5339   |   5.567 us    |  }
  597.907336 |   1)   wtest-5339   |   1.123 us    |  native_set_debugreg();
  597.907339 |   1)   wtest-5339   |   1.130 us    |  native_get_debugreg();
  597.907341 |   1)   wtest-5339   |   1.075 us    |  native_set_debugreg();
  597.907343 |   1)   wtest-5339   |   1.075 us    |  native_get_debugreg();
  597.907345 |   1)   wtest-5339   |   1.081 us    |  native_set_debugreg();
  597.907348 |   1)   wtest-5339   |               |  () {
  597.907350 |   1)   wtest-5339   |               |  /* callmodule: hwbp hit: id [4] */
  597.907351 |   1)   wtest-5339   |   3.033 us    |  }
  597.907352 |   1)   wtest-5339   |   1.105 us    |  native_set_debugreg();
  597.907358 |   1)   wtest-5339   |   1.315 us    |  down_read_trylock();
  597.907360 |   1)   wtest-5339   |   1.123 us    |  _cond_resched();
  597.907362 |   1)   wtest-5339   |   1.027 us    |  find_vma();
  597.907364 |   1)   wtest-5339   |               |  handle_mm_fault() {
...

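... where the traces corresponding to the assembly are marked by the breakpoint id. Thankfully, they come right after one another, as expected; however, ftrace has also captured some debug-register calls in between. Anyway, this is what I wanted to see.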

Here are some notes about the module:

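>Most of the module is from Execute/invoke user-space program, and get its pid, from a kernel module, where a user process is started and its pid obtained

>Since we have to go through the task_struct to get to the pid, here I save both (which is kind of redundant)

>Where function symbols are not exported: if the symbol is in kallsyms, I use a function pointer to its address; otherwise the other needed functions are copied from the source
>I didn't know how to start the user-space process in a stopped state, so after spawning it I issue a SIGSTOP (which on its own seems kind of unreliable at that point), and set the state to __TASK_STOPPED.

>I may still get the status "runnable" sometimes where I don't expect it; however, if the init exits early with an error, I've noticed wtest hanging in the process list long after it would have terminated naturally, so I guess that works.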

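>To get absolute/physical addresses, I used Walking page tables of a process in Linux to get to the page corresponding to a virtual address; then, digging through the kernel sources, I found page_to_phys() to get to the address (internally via the page frame number). LDD3 ch. 15 helps with understanding the relationship between a pfn and a physical address.

>Since here I expect to have a physical address, I don't use PAGE_SHIFT, but calculate offsets directly from objdump's assembly output; I am not 100% sure this is correct, though.
>Note (see also How to get a struct page from any address in the Linux kernel) that the module output says the virtual address 0x08048000 is neither is_vmalloc_addr nor virt_addr_valid; I guess this should tell me that one can use neither vmalloc_to_pfn() nor virt_to_page() to get to its physical address!?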

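>Setting up kprobes for ftrace from kernel space is kind of tricky (it needs functions copied)

>Trying to set a kprobe on the physical addresses I get (e.g. 0x639ec474) always results in "Could not insert probe(-22)"
>Just to see if the format is parsed, I'm trying the kallsyms address of the tracing_on() function (0xc10bcf60) below; that seems to work, because it raises a fatal "BUG: scheduling while atomic" (apparently, we're not meant to set breakpoints in module_init?). The bug is fatal, because it makes the kprobes directory disappear from the ftrace debug directory
>Just creating the kprobe does not make it appear in the ftrace log; it also needs to be enabled. The necessary code for enabling is there, but I've never tried it, because of the previous bug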

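>Finally, the breakpoint setup is from Watch a variable (memory address) change in Linux kernel, and print stack trace when it changes?

>I've never seen an example of setting an executable hardware breakpoint; it kept failing for me, until I searched through the kernel source and found that for HW_BREAKPOINT_X, attr.bp_len needs to be set to sizeof(long)
>If I try to printk the attr variable(s), from _init or from the handler, something gets seriously messed up, and whatever variable I try to print next, I get the value 0x5 (or 0x48) for it (?!)
>Since I'm trying to use a single handler function for both breakpoints, the only reliable piece of information that survives from _init to the handler, and that can differentiate between the two, seems to be bp->id
>These ids are auto-assigned, and seem not to be reclaimed if you unregister the breakpoints (I do not unregister them, to avoid extra ftrace printouts).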

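As far as the randomness goes, I think it's because the process is not started in a stopped state; by the time it gets stopped, it ends up in a different state (or, quite possibly, I'm missing some locking somewhere). Anyway, you can also expect this in syslog: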

[ 1661.815114] callmodule: Trying to walk page table; addr task 0xEAF68CA0 ->mm ->start_code: 0x08048000 ->end_code: 0x080485F4
[ 1661.815319] callmodule: walk_ 0x8048000 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
[ 1661.815837] callmodule: walk_ 0x80483c0 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0
[ 1661.816846] callmodule: walk_ 0x8048474 callmodule: Valid pgd : Valid pud: Valid pmd: page frame struct is @ f5772000; *virtual (page_address) @ c0000000 (is_vmalloc_addr 0 virt_addr_valid 1 virt_to_phys 0x0) page_to_pfn 0 page_to_phys 0x0

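... that is, even with a proper task pointer (judging by start_code), only 0x0 is obtained as the physical address. Sometimes you get the same outcome, but with start_code: 0x00000000 ->end_code: 0x00000000. And sometimes a task_struct cannot be obtained, even if the pid can: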

[  833.380417] callmodule:c: pid 7663
[  833.380424] callmodule: everything all right; pid 7663 (7663)
[  833.380430] callmodule: p is NULL - exiting
[  833.516160] callmodule: < exit

Well, hopefully someone will comment and clarify some of the behavior of this module :)
Hope this helps someone,
Cheers!

Makefile:

EXTRA_CFLAGS=-g -O0
obj-m += callmodule.o
all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

callmodule.c:

#include <linux/module.h>
#include <linux/slab.h> //kzalloc
#include <linux/syscalls.h> // SIGCHLD,... sys_wait4,...
#include <linux/kallsyms.h> // kallsyms_lookup,print_symbol
#include <linux/highmem.h> // ‘kmap_atomic’ (via pte_offset_map)
#include <asm/io.h> // page_to_phys (arch/x86/include/asm/io.h)

struct subprocess_infoB; // forward declare
// global variable - to avoid intervening too much in the return of call_usermodehelperB:
static int callmodule_pid;
static struct subprocess_infoB* callmodule_infoB;
#define TRY_USE_KPROBES 0 // 1 // enable/disable kprobes usage code
#include <linux/kprobes.h> // enable_kprobe
// for hardware breakpoint:
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>

// define a modified struct (with extra fields) here:
struct subprocess_infoB {
  struct work_struct work;
  struct completion *complete;
  char *path;
  char **argv;
  char **envp;
  int wait; //enum umh_wait wait;
  int retval;
  int (*init)(struct subprocess_info *info);
  void (*cleanup)(struct subprocess_info *info);
  void *data;
  pid_t pid;
  struct task_struct *task;
  unsigned long long last_page_physaddr;
};

struct subprocess_infoB *call_usermodehelper_setupB(char *path,char **argv,char **envp,gfp_t gfp_mask);

static inline int
call_usermodehelper_fnsB(char *path,char **argv,char **envp,int wait, //enum umh_wait wait,
                         int (*init)(struct subprocess_info *info),
                         void (*cleanup)(struct subprocess_info *),void *data)
{
  struct subprocess_info *info;
  struct subprocess_infoB *infoB;
  gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL;
  int ret;

  populate_rootfs_wait();

  infoB = call_usermodehelper_setupB(path,argv,envp,gfp_mask);
  // check for allocation failure before touching infoB->pid:
  if (infoB == NULL)
      return -ENOMEM;
  printk(KBUILD_MODNAME ":a: pid %d\n",infoB->pid);
  info = (struct subprocess_info *) infoB;

  call_usermodehelper_setfns(info,init,cleanup,data);
  printk(KBUILD_MODNAME ":b: pid %d\n",infoB->pid);

  // this must be called first,before infoB->pid is populated (by __call_usermodehelperB):
  ret = call_usermodehelper_exec(info,wait);

  // assign global pid (and infoB) here,so rest of the code has it:
  callmodule_pid = infoB->pid;
  callmodule_infoB = infoB;    
  printk(KBUILD_MODNAME ":c: pid %d\n",callmodule_pid);

  return ret;
}

static inline int
call_usermodehelperB(char *path,char **argv,char **envp,int wait) //enum umh_wait wait)
{
  return call_usermodehelper_fnsB(path,argv,envp,wait,NULL,NULL,NULL);
}

static void __call_usermodehelperB(struct work_struct *work)
{
  struct subprocess_infoB *sub_infoB =
      container_of(work,struct subprocess_infoB,work);
  int wait = sub_infoB->wait; // enum umh_wait wait = sub_info->wait;
  pid_t pid;
  struct subprocess_info *sub_info;
  // hack - declare function pointers
  int (*ptrwait_for_helper)(void *data);
  int (*ptr____call_usermodehelper)(void *data);
  // assign function pointers to verbatim addresses as obtained from /proc/kallsyms
  int killret;
  struct task_struct *spawned_task;
  ptrwait_for_helper = (void *)0xc1065b60;
  ptr____call_usermodehelper = (void *)0xc1065ed0;

  sub_info = (struct subprocess_info *)sub_infoB;

  if (wait == UMH_WAIT_PROC)
      pid = kernel_thread((*ptrwait_for_helper),sub_info, //(wait_for_helper,sub_info,
                  CLONE_FS | CLONE_FILES | SIGCHLD);
  else
      pid = kernel_thread((*ptr____call_usermodehelper),sub_info, //(____call_usermodehelper,sub_info,
                  CLONE_VFORK | SIGCHLD);

  spawned_task = pid_task(find_vpid(pid),PIDTYPE_PID);

  // stop/suspend/pause task
  killret = kill_pid(find_vpid(pid),SIGSTOP,1); 
  if (spawned_task!=NULL) {
    // does this stop the process really?
    spawned_task->state = __TASK_STOPPED;
    printk(KBUILD_MODNAME ": : exst %d exco %d exsi %d diex %d inex %d inio %d\n",spawned_task->exit_state,spawned_task->exit_code,spawned_task->exit_signal,spawned_task->did_exec,spawned_task->in_execve,spawned_task->in_iowait);
  }
  printk(KBUILD_MODNAME ": : (kr: %d)\n",killret);
  printk(KBUILD_MODNAME ": : pid %d (%p) (%s)\n",pid,spawned_task,(spawned_task!=NULL)?((spawned_task->state==-1)?"unrunnable":((spawned_task->state==0)?"runnable":"stopped")):"null" );
  // grab and save the pid (and task_struct) here:
  sub_infoB->pid = pid;
  sub_infoB->task = spawned_task;
    switch (wait) {
    case UMH_NO_WAIT:
        call_usermodehelper_freeinfo(sub_info);
        break;
    case UMH_WAIT_PROC:
        if (pid > 0)
            break;
        /* FALLTHROUGH */
    case UMH_WAIT_EXEC:
        if (pid < 0)
            sub_info->retval = pid;
        complete(sub_info->complete);
    }
}

struct subprocess_infoB *call_usermodehelper_setupB(char *path,char **argv,char **envp,gfp_t gfp_mask)
{
    struct subprocess_infoB *sub_infoB;
    sub_infoB = kzalloc(sizeof(struct subprocess_infoB),gfp_mask);
    if (!sub_infoB)
        goto out;

    INIT_WORK(&sub_infoB->work,__call_usermodehelperB);
    sub_infoB->path = path;
    sub_infoB->argv = argv;
    sub_infoB->envp = envp;
  out:
    return sub_infoB;
}

#if TRY_USE_KPROBES
// copy from /kernel/trace/trace_probe.c (is unexported)
int traceprobe_command(const char *buf,int (*createfn)(int,char **))
{
  char **argv;
  int argc,ret;

  argc = 0;
  ret = 0;
  argv = argv_split(GFP_KERNEL,buf,&argc);
  if (!argv)
    return -ENOMEM;

  if (argc)
    ret = createfn(argc,argv);

  argv_free(argv);

  return ret;
}

// copy from kernel/trace/trace_kprobe.c?v=2.6.38 (is unexported)
#define TP_FLAG_TRACE   1
#define TP_FLAG_PROFILE 2
typedef void (*fetch_func_t)(struct pt_regs *,void *,void *);
struct fetch_param {
  fetch_func_t    fn;
  void *data;
};
typedef int (*print_type_func_t)(struct trace_seq *,const char *,void *);
enum {
  FETCH_MTD_reg = 0,FETCH_MTD_stack,FETCH_MTD_retval,FETCH_MTD_memory,FETCH_MTD_symbol,FETCH_MTD_deref,FETCH_MTD_END,};
// Fetch type information table * /
struct fetch_type {
  const char      *name;          /* Name of type */
  size_t          size;           /* Byte size of type */
  int             is_signed;      /* Signed flag */
  print_type_func_t       print;  /* Print functions */
  const char      *fmt;           /* Fromat string */
  const char      *fmttype;       /* Name in format file */
  // Fetch functions * /
  fetch_func_t    fetch[FETCH_MTD_END];
};
struct probe_arg {
  struct fetch_param      fetch;
  struct fetch_param      fetch_size;
  unsigned int            offset; /* Offset from argument entry */
  const char              *name;  /* Name of this argument */
  const char              *comm;  /* Command of this argument */
  const struct fetch_type *type;  /* Type of this argument */
};
struct trace_probe {
  struct list_head        list;
  struct kretprobe        rp;     /* Use rp.kp for kprobe use */
  unsigned long           nhit;
  unsigned int            flags;  /* For TP_FLAG_* */
  const char              *symbol;        /* symbol name */
  struct ftrace_event_class       class;
  struct ftrace_event_call        call;
  ssize_t                 size;           /* trace entry size */
  unsigned int            nr_args;
  struct probe_arg        args[];
};
static  int probe_is_return(struct trace_probe *tp)
{
  return tp->rp.handler != NULL;
}
static int probe_event_enable(struct ftrace_event_call *call)
{
  struct trace_probe *tp = (struct trace_probe *)call->data;

  tp->flags |= TP_FLAG_TRACE;
  if (probe_is_return(tp))
    return enable_kretprobe(&tp->rp);
  else
    return enable_kprobe(&tp->rp.kp);
}
#define KPROBE_EVENT_SYSTEM "kprobes"
#endif // TRY_USE_KPROBES

// <<<<<<<<<<<<<<<<<<<<<<

static struct page *walk_page_table(unsigned long addr,struct task_struct *intask)
{
  pgd_t *pgd;
  pte_t *ptep,pte;
  pud_t *pud;
  pmd_t *pmd;

  struct page *page = NULL;
  struct mm_struct *mm = intask->mm;

  callmodule_infoB->last_page_physaddr = 0ULL; // reset here,in case of early exit

  printk(KBUILD_MODNAME ": walk_ 0x%lx ",addr);

  pgd = pgd_offset(mm,addr);
  if (pgd_none(*pgd) || pgd_bad(*pgd))
    goto out;
  printk(KBUILD_MODNAME ": Valid pgd ");

  pud = pud_offset(pgd,addr);
  if (pud_none(*pud) || pud_bad(*pud))
    goto out;
  printk( ": Valid pud");

  pmd = pmd_offset(pud,addr);
  if (pmd_none(*pmd) || pmd_bad(*pmd))
    goto out;
  printk( ": Valid pmd");

  ptep = pte_offset_map(pmd,addr);
  if (!ptep)
    goto out;
  pte = *ptep;

  page = pte_page(pte);
  if (page) {
    callmodule_infoB->last_page_physaddr = (unsigned long long)page_to_phys(page);
    printk( ": page frame struct is @ %p; *virtual (page_address) @ %p (is_vmalloc_addr %d virt_addr_valid %d virt_to_phys 0x%llx) page_to_pfn %lx page_to_phys 0x%llx",page,page_address(page),is_vmalloc_addr((void*)page_address(page)),virt_addr_valid(page_address(page)),(unsigned long long)virt_to_phys(page_address(page)),page_to_pfn(page),callmodule_infoB->last_page_physaddr);
  }

  //~ pte_unmap(ptep);

out:
  printk("\n");
  return page;
}

static void sample_hbp_handler(struct perf_event *bp,struct perf_sample_data *data,struct pt_regs *regs)
{
  trace_printk(KBUILD_MODNAME ": hwbp hit: id [%llu]\n",bp->id );
  //~ unregister_hw_breakpoint(bp);
}

// ----------------------

static int __init callmodule_init(void)
{
  int ret = 0;
  char userprog[] = "/path/to/wtest";
  char *argv[] = {userprog,"2",NULL };
  char *envp[] = {"HOME=/","PATH=/sbin:/usr/sbin:/bin:/usr/bin",NULL };
  struct task_struct *p;
  struct task_struct *par;
  struct task_struct *pc;
  struct list_head *children_list_head;
  struct list_head *cchildren_list_head;
  char *state_str;
  unsigned long offset,taddr;
  int (*ptr_create_trace_probe)(int argc,char **argv); 
  struct trace_probe* (*ptr_find_probe_event)(const char *event,const char *group);
  //int (*ptr_probe_event_enable)(struct ftrace_event_call *call); // not exported,copy
  #if TRY_USE_KPROBES
  char trcmd[256] = "";
  struct trace_probe *tp;
  #endif //TRY_USE_KPROBES
  struct perf_event *sample_hbp,*sample_hbpb;
  struct perf_event_attr attr,attrb;

  printk(KBUILD_MODNAME ": > init %s\n",userprog);

  ptr_create_trace_probe = (void *)0xc10d5120;
  ptr_find_probe_event = (void *)0xc10d41e0;
  print_symbol(KBUILD_MODNAME ": symbol @ 0xc1065b60 is %s\n",0xc1065b60); // shows wait_for_helper+0x0/0xb0
  print_symbol(KBUILD_MODNAME ": symbol @ 0xc1065ed0 is %s\n",0xc1065ed0); // shows ____call_usermodehelper+0x0/0x90
  print_symbol(KBUILD_MODNAME ": symbol @ 0xc10d5120 is %s\n",0xc10d5120); // shows create_trace_probe+0x0/0x590
  ret = call_usermodehelperB(userprog,argv,envp,UMH_WAIT_EXEC);
  if (ret != 0)
      printk(KBUILD_MODNAME ": error in call to usermodehelper: %i\n",ret);
  else
      printk(KBUILD_MODNAME ": everything all right; pid %d (%d)\n",callmodule_pid,callmodule_infoB->pid);
  tracing_on(); // earlier,so trace_printk of handler is caught!
  // find the task:
  rcu_read_lock();
  p = pid_task(find_vpid(callmodule_pid),PIDTYPE_PID);
  rcu_read_unlock();
  if (p == NULL) {
    printk(KBUILD_MODNAME ": p is NULL - exiting\n");
    return 0;
  }
  state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
  printk(KBUILD_MODNAME ": pid task a: %p c: %s p: [%d] s: %s\n",p,p->comm,p->pid,state_str);
  // find parent task:
  par = p->parent;
  if (par == NULL) {
    printk(KBUILD_MODNAME ": par is NULL - exiting\n");
    return 0;
  }
  state_str = (par->state==-1)?"unrunnable":((par->state==0)?"runnable":"stopped");
  printk(KBUILD_MODNAME ": parent task a: %p c: %s p: [%d] s: %s\n",par,par->comm,par->pid,state_str);

  // iterate through parent's (and our task's) child processes:
  rcu_read_lock(); // read_lock(&tasklist_lock);
  list_for_each(children_list_head,&par->children){
    p = list_entry(children_list_head,struct task_struct,sibling);
    printk(KBUILD_MODNAME ": - %s [%d] \n",p->comm,p->pid);
    if (p->pid == callmodule_pid) {
      list_for_each(cchildren_list_head,&p->children){
        pc = list_entry(cchildren_list_head,struct task_struct,sibling);
        printk(KBUILD_MODNAME ": - - %s [%d] \n",pc->comm,pc->pid);
      }
    }
  }
  rcu_read_unlock(); //~ read_unlock(&tasklist_lock);

  // NOTE: here p == callmodule_infoB->task !!
  printk(KBUILD_MODNAME ": Trying to walk page table; addr task 0x%X ->mm ->start_code: 0x%08lX ->end_code: 0x%08lX \n",(unsigned int) callmodule_infoB->task,callmodule_infoB->task->mm->start_code,callmodule_infoB->task->mm->end_code);
  walk_page_table(0x08048000,callmodule_infoB->task);
  // 080483c0 is start of .text; 08048474 start of main; for objdump -S wtest
  walk_page_table(0x080483c0,callmodule_infoB->task);
  walk_page_table(0x08048474,callmodule_infoB->task);

  if (callmodule_infoB->last_page_physaddr != 0ULL) {
    printk(KBUILD_MODNAME ": physaddr ");
    taddr = 0x080483c0; // .text
    offset = taddr - callmodule_infoB->task->mm->start_code;
    printk(": (0x%08lx ->) 0x%08llx ",taddr,callmodule_infoB->last_page_physaddr+offset);
    taddr = 0x08048474; // main
    offset = taddr - callmodule_infoB->task->mm->start_code;
    printk(": (0x%08lx ->) 0x%08llx ",taddr,callmodule_infoB->last_page_physaddr+offset);
    printk("\n");

    #if TRY_USE_KPROBES // can't use this here (BUG: scheduling while atomic,if probe inserts)
    //~ sprintf(trcmd,"p:myprobe 0x%08llx",callmodule_infoB->last_page_physaddr+offset);
    // try symbol for c10bcf60 - tracing_on
    sprintf(trcmd,"p:myprobe 0x%08llx",(unsigned long long)0xc10bcf60);
    ret = traceprobe_command(trcmd,ptr_create_trace_probe); //create_trace_probe);
    printk("%s -- ret: %d\n",trcmd,ret);
    // try find probe and enable it (compiles,but untested):
    tp = ptr_find_probe_event("myprobe",KPROBE_EVENT_SYSTEM);
    if (tp != NULL) probe_event_enable(&tp->call);
    #endif //TRY_USE_KPROBES
  }

  hw_breakpoint_init(&attr);
  attr.bp_len = sizeof(long); //HW_BREAKPOINT_LEN_1;
  attr.bp_type = HW_BREAKPOINT_X ;
  attr.bp_addr = 0x08048474; // main
  sample_hbp = register_user_hw_breakpoint(&attr,(perf_overflow_handler_t)sample_hbp_handler,p);
  if (IS_ERR((void __force *)sample_hbp)) {
    int ret = PTR_ERR((void __force *)sample_hbp);
    printk(KBUILD_MODNAME ": Breakpoint registration failed (%d)\n",ret);
    //~ return ret;
  } else {
    // only dereference the event once registration is known to have succeeded:
    printk(KBUILD_MODNAME ": 0x08048474 id [%llu]\n",sample_hbp->id);
  }

  hw_breakpoint_init(&attrb);
  attrb.bp_len = sizeof(long);
  attrb.bp_type = HW_BREAKPOINT_X ;
  attrb.bp_addr = 0x08048475; // first instruction after main
  sample_hbpb = register_user_hw_breakpoint(&attrb,(perf_overflow_handler_t)sample_hbp_handler,p);
  if (IS_ERR((void __force *)sample_hbpb)) {
    int ret = PTR_ERR((void __force *)sample_hbpb);
    printk(KBUILD_MODNAME ": Breakpoint registration failed (%d)\n",ret);
    //~ return ret;
  } else {
    // only dereference the event once registration is known to have succeeded:
    printk(KBUILD_MODNAME ": 0x08048475 id [%llu]\n",sample_hbpb->id);
  }

  printk(KBUILD_MODNAME ": (( 0x08048000 is_vmalloc_addr %d virt_addr_valid %d ))\n",is_vmalloc_addr((void*)0x08048000),virt_addr_valid(0x08048000));

  kill_pid(find_vpid(callmodule_pid),SIGCONT,1); // resume/continue/restart task
  state_str = (p->state==-1)?"unrunnable":((p->state==0)?"runnable":"stopped");
  printk(KBUILD_MODNAME ": cont pid task a: %p c: %s p: [%d] s: %s\n",p,p->comm,p->pid,state_str);

  return 0;
}

static void __exit callmodule_exit(void)
{
  tracing_off(); //corresponds to the user space /sys/kernel/debug/tracing/tracing_on file
  printk(KBUILD_MODNAME ": < exit\n");
}

module_init(callmodule_init);
module_exit(callmodule_exit);
MODULE_LICENSE("GPL");
