Analyzing the Cause of a Kernel Crash with kdump: A Worked Example

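This article introduces the kdump service and the crash utility, then walks through a simple example of analyzing the cause of a kernel crash. It is based on Linux kernel 4.19 on the aarch64 architecture.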

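kdump overview
  1. kdump

kdump is a kexec-based kernel crash dump mechanism that captures a crash dump when the kernel crashes. When the kernel hits a fatal error, kdump exports the contents of memory as a vmcore file and saves it to disk.

  2. kdump flow

When the system crashes, kdump uses kexec to boot into a second kernel. This second kernel, usually called the capture kernel, boots with a very small amount of memory and captures the dump image. The first kernel reserves a region of memory for it at boot time.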


  3. kdump configuration
  • Reserve memory for the crash kernel at boot

Add the following parameter to the kernel command line: crashkernel=size[@offset]. Whether the reservation succeeded can be checked in /proc/meminfo:

cat /proc/meminfo | grep Crash
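As a cross-check on the reservation, the sketch below reads the crashkernel= argument back from the kernel command line and prints the size the kernel actually reserved. It assumes a kernel built with kexec crash support, which exposes /sys/kernel/kexec_crash_size. The helper names are hypothetical, and the optional path argument exists only so the helpers can be tried against a sample file.

```shell
#!/bin/sh
# Hypothetical helpers; names and the optional path argument are illustrative.

# Print the value of crashkernel= from a kernel command line file
# (defaults to /proc/cmdline). Prints nothing if the option is absent.
crashkernel_param() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | sed -n 's/^crashkernel=//p'
}

# Print the number of bytes reserved for the capture kernel, or "unknown"
# if the sysfs file cannot be read. 0 means no memory is reserved.
crashkernel_reserved_bytes() {
    cat "${1:-/sys/kernel/kexec_crash_size}" 2>/dev/null || echo unknown
}
```

Called with no arguments, both helpers inspect the running system. The reserved region is usually also listed as "Crash kernel" in /proc/iomem, which is a useful fallback on kernels whose /proc/meminfo has no matching line.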

  • Install kexec-tools

yum install kexec-tools

Installing kexec-tools from an RPM package is recommended; the package version needs to match the kernel version.

  • Start the kdump service

systemctl start kdump.service    # start the kdump service

service kdump status    # check the kdump status
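Besides the service status, the kernel itself reports whether a capture kernel is loaded. A minimal sketch, assuming the standard /sys/kernel/kexec_crash_loaded file (it reads 1 once the capture kernel has been loaded); the helper name is hypothetical and the path argument is there only for experimentation:

```shell
#!/bin/sh
# Succeeds (exit status 0) when a capture kernel is loaded.
# Hypothetical helper; $1 optionally overrides the sysfs path.
capture_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

# Example use after starting the service:
#   systemctl enable --now kdump.service
#   capture_kernel_loaded && echo "kdump is armed"
```

If the helper fails even though the service was started, the crashkernel= reservation is the first thing worth re-checking.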

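  • Test that kdump can actually take a dump

echo c > /proc/sysrq-trigger

If everything is set up correctly, the system reboots automatically, and after the reboot a vmcore file can be found under /var/crash/.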
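After the reboot, the newest dump can be located with a small helper. This is only a sketch: it assumes each crash gets its own subdirectory under /var/crash containing a file named vmcore (a common layout, but distribution dependent), and it relies on GNU find's -printf. The function name is hypothetical.

```shell
#!/bin/sh
# Print the path of the most recently modified file named "vmcore"
# below the given directory (default: /var/crash). Requires GNU find.
latest_vmcore() {
    find "${1:-/var/crash}" -type f -name vmcore -printf '%T@ %p\n' 2>/dev/null |
        sort -rn | head -n 1 | cut -d ' ' -f 2-
}
```

The printed path can be handed straight to crash together with the matching vmlinux.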

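kdump with QEMU

Virtual machines are often started with QEMU. When a kernel running under QEMU crashes, a vmcore file can be generated as well, here through the QEMU monitor.

  1. First disable reboot-on-panic in the guest, so that it does not reboot while the dump is being taken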

echo 0 > /proc/sys/kernel/panic

  2. Trigger a kernel panic

echo c > /proc/sysrq-trigger

  3. After the kernel panic, switch QEMU to monitor mode

Press Ctrl-A, then c, to enter the QEMU monitor

  4. In the monitor, take the dump

dump-guest-memory -z xxx-vmcore

As the figure below shows, the coredump file was obtained successfully after the kernel panic in QEMU.

[Figure 2: the coredump file obtained after the kernel panic in QEMU]
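The Ctrl-A c key sequence only works when the guest console and the QEMU monitor share the terminal, which is what -nographic sets up. The launcher below is a sketch under stated assumptions: the machine settings (virt machine, 2 CPUs, 2 GB of RAM, rdinit=/linuxrc console=ttyAMA0) are read off the boot log shown later in this article, the cortex-a57 CPU model is inferred from the CPU ID in that log, and booting from an initramfs is a guess. The function name, the RUNNER variable and both file paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical launcher. $1 is the kernel Image, $2 the initramfs;
# both are placeholders supplied by the caller.
# Set RUNNER=echo to print the command line instead of running it.
run_guest() {
    ${RUNNER:-} qemu-system-aarch64 \
        -M virt -cpu cortex-a57 -smp 2 -m 2048 \
        -kernel "$1" -initrd "$2" \
        -append "rdinit=/linuxrc console=ttyAMA0" \
        -nographic
}
```

For example, `RUNNER=echo run_guest Image rootfs.cpio.gz` prints the command for review, and dropping RUNNER starts the guest. Once the guest has panicked, Ctrl-A c reaches the (qemu) prompt where dump-guest-memory is issued.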

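Analyzing the kernel crash dump with crash

After a kernel crash, if kdump is deployed, the vmcore dump file can be found under /var/crash and analyzed with the crash utility.

The crash binary has to match the kernel being analyzed. For example, the dump taken above comes from an arm64 QEMU guest, so it needs a crash build that targets arm64; otherwise crash reports a compatibility error.

Building crash for arm64:

Download: https://github.com/crash-utility/crash/releases

Build and install: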

$ tar -xf crash-7.2.8.tar.gz

$ cd crash-7.2.8/

$ make target=arm64

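Once it is built, use crash to analyze the vmcore file. vmlinux is generated in the top-level directory of the kernel tree when the kernel is compiled; crash needs a vmlinux that contains debug information.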

crash vmcore vmlinux
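crash can also run a fixed list of commands non-interactively, which is handy for a first triage of a new dump. A minimal sketch, assuming crash's -i option (run the commands in a file before reading user input) and -s option (suppress the startup banner); the helper name, the file names and the choice of commands are illustrative.

```shell
#!/bin/sh
# Write a crash command file to the path given as $1.
write_crash_script() {
    cat > "$1" <<'EOF'
bt
log
ps
kmem -i
exit
EOF
}

# Example use (vmlinux and vmcore paths are placeholders):
#   write_crash_script triage.cmd
#   crash -s -i triage.cmd vmlinux vmcore > triage.txt
```

The trailing exit makes crash quit once the listed commands have run, so the whole session ends up in triage.txt.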


Common crash commands
  • bt: show the call stack (backtrace)

crash> bt
PID: 1452   TASK: ffff80007b0f1a80  CPU: 1   COMMAND: "sh"
 #0 [ffff00000aeb3900] __delay at ffff000008af2528
 #1 [ffff00000aeb3930] __const_udelay at ffff000008af2488
 #2 [ffff00000aeb3940] panic at ffff0000080d7f04
 #3 [ffff00000aeb3a20] die at ffff00000808cb18
 #4 [ffff00000aeb3a60] die_kernel_fault at ffff00000809f7e8
 #5 [ffff00000aeb3a90] __do_kernel_fault at ffff00000809f07c
 #6 [ffff00000aeb3ac0] do_page_fault at ffff00000809f12c
 #7 [ffff00000aeb3b30] do_translation_fault at ffff00000809f574
 #8 [ffff00000aeb3b40] do_mem_abort at ffff000008081448
 #9 [ffff00000aeb3ca0] el1_ia at ffff00000808318c
     PC: ffff0000085dc0d0  [sysrq_handle_crash+32]
     LR: ffff0000085dc0bc  [sysrq_handle_crash+12]
     SP: ffff00000aeb3cb0  PSTATE: 40000005
    X29: ffff00000aeb3cb0  X28: ffff80007b0f1a80  X27: 0000000000000000
    X26: 0000000000000000  X25: 0000000056000000  X24: 0000000000000000
    X23: 0000000000000007  X22: ffff000009289000  X21: ffff000009289400
    X20: 0000000000000063  X19: ffff0000091a1000  X18: ffffffffffffffff
    X17: 0000000000000000  X16: 0000000000000000  X15: ffff0000091896c8
    X14: ffff0000892ed70f  X13: ffff0000092ed71d  X12: ffff0000091a1000
    X11: 0000000005f5e0ff  X10: ffff000009189940   X9: 00000000ffffffd0
     X8: ffff000008602b08   X7: 54203a2071527379   X6: 00000000000000d2
     X5: 0000000000000000   X4: 0000000000000000   X3: ffffffffffffffff
     X2: 2c501196acfc7700   X1: 0000000000000000   X0: 0000000000000001
#10 [ffff00000aeb3cb0] sysrq_handle_crash at ffff0000085dc0cc
#11 [ffff00000aeb3cc0] __handle_sysrq at ffff0000085dc6cc
#12 [ffff00000aeb3d00] write_sysrq_trigger at ffff0000085dcc60
#13 [ffff00000aeb3d20] proc_reg_write at ffff0000082ac7e4
#14 [ffff00000aeb3d40] __vfs_write at ffff00000823a9cc
#15 [ffff00000aeb3de0] vfs_write at ffff00000823ace0
#16 [ffff00000aeb3e20] ksys_write at ffff00000823afd4
#17 [ffff00000aeb3e70] __arm64_sys_write at ffff00000823b064
#18 [ffff00000aeb3e80] el0_svc_common at ffff000008094ef4
#19 [ffff00000aeb3eb0] el0_svc_handler at ffff000008094fa8
#20 [ffff00000aeb3ff0] el0_svc at ffff000008084044
     PC: 0000000000401a58   LR: 00000000004b2be4   SP: 0000ffffe68f8e10
    X29: 0000ffffe68f9500  X28: 0000ffffe68f9fba  X27: 000000000056f9c0
    X26: 00000000005ed000  X25: 0000000000000000  X24: 0000000000000020
    X23: 0000000011710110  X22: 00000000005ed000  X21: 0000000000000002
    X20: 0000000011710110  X19: 0000000000000001  X18: 0000000000000001
    X17: 0000000000000000  X16: 0000000000000000  X15: 0000000000000008
    X14: 0000000000000012  X13: 726567676972742d  X12: 0101010101010101
    X11: 0000005000564818  X10: 0101010101010101   X9: fffffffffffffff0
     X8: 0000000000000040   X7: 0000000011710120   X6: 0080808080808080
     X5: 0000000000000000   X4: 0000000000000063   X3: 0000000011710111
     X2: 0000000000000002   X1: 0000000011710110   X0: 0000000000000001
    ORIG_X0: 0000000000000001  SYSCALLNO: 40  PSTATE: 80000000
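With a long trace it can help to reduce a saved bt output to the bare chain of function names, skipping the register dumps. The awk filter below is a sketch that assumes the usual one-frame-per-line layout, "#N [stack address] function at address"; the helper name is hypothetical.

```shell
#!/bin/sh
# Print only the function names from saved "bt" output.
# Reads the files given as arguments, or standard input if there are none.
bt_frames() {
    awk '$1 ~ /^#[0-9]+$/ && $4 == "at" { print $3 }' "$@"
}
```

For the trace above the list starts with __delay and ends with el0_svc. The frame that matters for the diagnosis is sysrq_handle_crash, which sits directly beneath the el1 exception entry and is the function named in the PC line.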

  • log: show the kernel dmesg log

crash> log
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x411fd070]
[    0.000000] Linux version 4.20.0-rc4-00007-gef78e5e (root@localhost.localdomain) (gcc version 7.3.1 20180425 [linaro-7.3-2018.05 revision d29120a424ecfbc167ef90065c0eeb7f91977701] (Linaro GCC 7.3-2018.05)) #3 SMP PREEMPT Wed Jan 15 07:52:10 PST 2020
[    0.000000] Machine model: linux,dummy-virt
[    0.000000] efi: Getting EFI parameters from FDT:
[    0.000000] efi: UEFI not found.
[    0.000000] cma: Reserved 32 MiB at 0x00000000be000000
[    0.000000] NUMA: No NUMA configuration found
[    0.000000] NUMA: Faking a node at [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] NUMA: NODE_DATA [mem 0xbdfea840-0xbdfebfff]
[    0.000000] Zone ranges:
[    0.000000]   DMA32    [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000]   Normal   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000040000000-0x00000000bfffffff]
[    0.000000] On node 0 totalpages: 524288
[    0.000000]   DMA32 zone: 8192 pages used for memmap
[    0.000000]   DMA32 zone: 0 pages reserved
[    0.000000]   DMA32 zone: 524288 pages, LIFO batch:63
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv0.2 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: Trusted OS migration not required
[    0.000000] random: get_random_bytes called from start_kernel+0xa8/0x418 with crng_init=0
[    0.000000] percpu: Embedded 23 pages/cpu @(____ptrval____) s55704 r8192 d30312 u94208
[    0.000000] pcpu-alloc: s55704 r8192 d30312 u94208 alloc=23*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1
[    0.000000] Detected PIPT I-cache on CPU0
[    0.000000] CPU features: enabling workaround for ARM erratum 832075
[    0.000000] CPU features: enabling workaround for ARM erratum 834220
[    0.000000] CPU features: enabling workaround for EL2 vector hardening
[    0.000000] CPU features: detected: Kernel page table isolation (KPTI)
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 516096
[    0.000000] Policy zone: DMA32
[    0.000000] Kernel command line: rdinit=/linuxrc console=ttyAMA0
[    0.000000] Memory: 2009884K/2097152K available (10876K kernel code, 1414K rwdata, 5100K rodata, 1344K init, 380K bss, 54500K reserved, 32768K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
[    0.000000] rcu: Preemptible hierarchical RCU implementation.
[    0.000000] rcu: RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=2.
[    0.000000] Tasks RCU enabled.
[    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 25 jiffies.
[    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2
[    0.000000] NR_IRQS: 64, nr_irqs: 64, preallocated irqs: 0
[    0.000000] GICv2m: range[mem 0x08020000-0x08020fff], SPI[80:143]
[    0.000000] arch_timer: cp15 timer(s) running at 62.50MHz (virt).
[    0.000000] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0x1cd42e208c, max_idle_ns: 881590405314 ns

  • struct: show a data structure

The address used in this example is the PC from the backtrace, so the fields below are simply instruction bytes reinterpreted as a task_struct; the example only illustrates the command syntax.

crash> struct task_struct ffff0000085dc0d0 -x
struct task_struct {
  thread_info = {
    flags = 0xa8c17bfd39000020,
    addr_limit = 0xd503201fd65f03c0,
    preempt_count = 0xa9bf7bfd
  },
  state = 0x97ec827fd50342ff,
  stack = 0xd65f03c0a8c17bfd,
  usage = {
    counter = 0xa9bd7bfd
  },
  flags = 0x910003fd,
  ptrace = 0xa90153f3,
  wake_entry = {
    next = 0xaa0103f4911b2262
  },
  on_cpu = 0xf9400041,
  cpu = 0xf90017a1,
  wakee_flips = 0xd2800001,
  wakee_flip_decay_ts = 0x37f8018097f909a2,
  last_wakee = 0xf10bfc7ff94013a3,
  recent_used_cpu = 0x54000228,
  wake_cpu = 0xb0006561,
  on_rq = 0x91018021,
  prio = 0xf9401284,
  static_prio = 0x52800000,
  normal_prio = 0xb9404022,
  rt_priority = 0x79000083,
  sched_class = 0x911b2273b9004022,
  se = {
    load = {
      weight = 0x940b05adf9400013,
      inv_weight = 0x91012260
    },
    runnable_weight = 0x97edbddd91052260,
    run_node = {
      __rb_parent_color = 0x940b05c5aa1403e0,
      rb_right = 0x97f0ec85aa1303e0,
      rb_left = 0xa8c27bfda94153f3
    },
    group_node = {
      next = 0xd503201fd65f03c0,
      prev = 0x52800021a9bf7bfd
    },
    on_rq = 0x910003fd,
    exec_start = 0xd280000097f251d0,
    sum_exec_runtime = 0xa8c17bfd97ec8318,
    vruntime = 0xd503201fd65f03c0,
    prev_sum_exec_runtime = 0x910003fda9be7bfd,
    nr_migrations = 0xd1012013f9000bf3,
    statistics = {<No data fields>},
    depth = 0x39434660,
    parent = 0x52800020f9000fb4,
    cfs_rq = 0xb940ce7439034a60,
    my_q = 0x52800023d5033f9f,
    avg = {
      last_update_time = 0x940b0cf552800001,
      load_sum = 0x52800003aa1303e0,
      runnable_load_sum = 0x5280002152800c62,
      util_sum = 0x940b0cf0,
      ...

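struct -o [struct]: show the offset of each member in the structure

struct [struct] [address]: show the structure's values at the given address

[struct] [address]: shorthand form of the above

[struct] [address] -xo: print the structure layout and size in hexadecimal (with an address, each member is shown with its own address)

[struct].member [address]: show the value of a single member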

  • rd: read memory contents

crash> rd ffff0000085dc0d0 32
ffff0000085dc0d0:  a8c17bfd39000020 d503201fd65f03c0    ..9.{...._.. ..
ffff0000085dc0e0:  910003fda9bf7bfd 97ec827fd50342ff   .{.......B......
ffff0000085dc0f0:  d65f03c0a8c17bfd 910003fda9bd7bfd   .{...._..{......
ffff0000085dc100:  b0005d73a90153f3 aa0103f4911b2262   .S..s]..b"......
ffff0000085dc110:  f90017a1f9400041 910083a2d2800001   A.@.............
ffff0000085dc120:  37f8018097f909a2 f10bfc7ff94013a3   .......7..@.....
ffff0000085dc130:  b000656154000228 f940128491018021   (..Tae..!.....@.
ffff0000085dc140:  b940402252800000 1100044279000083   ...R"@@....yB...
ffff0000085dc150:  911b2273b9004022 f9400261f94017a2   "@..s"....@.a.@.
ffff0000085dc160:  b50000c1ca010041 a8c37bfda94153f3   A........SA..{..
ffff0000085dc170:  128002a0d65f03c0 97ebee0b17fffff7   .._.............
ffff0000085dc180:  910003fda9be7bfd aa0003f4a90153f3   .{.......S......
ffff0000085dc190:  940b05adf9400013 97ec5fb991012260   ..@.....`"..._..
ffff0000085dc1a0:  97edbddd91052260 940b05c5aa1403e0   `"..............
ffff0000085dc1b0:  97f0ec85aa1303e0 a8c27bfda94153f3   .........SA..{..
ffff0000085dc1c0:  d503201fd65f03c0 52800021a9bf7bfd   .._.. ...{..!..R

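rd [addr] [len]: show len units of memory starting at the given address

rd -S [addr] [len]: try to translate the values into their corresponding symbols

rd [addr] -e [addr]: show the contents of the given memory range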
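By default rd prints 64-bit words, while AArch64 instructions are 32 bits wide. On this little-endian system each word therefore holds two instructions, with the lower-addressed one in the low half. The sketch below (helper name hypothetical) splits one 16-digit word from the rd output into its two instruction encodings, lower address first.

```shell
#!/bin/sh
# Split a 64-bit hex word as printed by rd into two 32-bit instruction
# words, lower address first (little-endian layout assumed).
split_insns() {
    word=$1
    # POSIX parameter expansion: drop the first 8 digits, then the last 8.
    echo "${word#????????} ${word%????????}"
}

split_insns a8c17bfd39000020   # prints: 39000020 a8c17bfd
```

Here 39000020 is the encoding of the strb w0, [x1] at sysrq_handle_crash+32 and a8c17bfd is the ldp x29, x30, [sp],#16 that follows it, which agrees with the dis output for the same address. rd also accepts -32 to print 32-bit units directly.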

  • dis: disassemble, to inspect the code at a given address


crash> dis -r ffff0000085dc0d0
0xffff0000085dc0b0 <sysrq_handle_crash>:        stp     x29, x30, [sp,#-16]!
0xffff0000085dc0b4 <sysrq_handle_crash+4>:      mov     x29, sp
0xffff0000085dc0b8 <sysrq_handle_crash+8>:      bl      0xffff000008141a48 <__rcu_read_unlock>
0xffff0000085dc0bc <sysrq_handle_crash+12>:     adrp    x1, 0xffff0000092e9000 <xen_dummy_shared_info+984>
0xffff0000085dc0c0 <sysrq_handle_crash+16>:     mov     w0, #0x1                        // #1
0xffff0000085dc0c4 <sysrq_handle_crash+20>:     str     w0, [x1,#1448]
0xffff0000085dc0c8 <sysrq_handle_crash+24>:     dsb     st
0xffff0000085dc0cc <sysrq_handle_crash+28>:     mov     x1, #0x0                        // #0
0xffff0000085dc0d0 <sysrq_handle_crash+32>:     strb    w0, [x1]

crash> dis -f ffff0000085dc0d0
0xffff0000085dc0d0 <sysrq_handle_crash+32>:     strb    w0, [x1]
0xffff0000085dc0d4 <sysrq_handle_crash+36>:     ldp     x29, x30, [sp],#16
0xffff0000085dc0d8 <sysrq_handle_crash+40>:     ret

  • ps: show the state of processes and threads

crash> ps
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
>     0      0   0  ffff000009192580  RU   0.0       0      0  [swapper/0]
      0      0   1  ffff80007bbc1a80  RU   0.0       0      0  [swapper/1]
      1      0   0  ffff80007bb68000  IN   0.0    2196     60  linuxrc
      2      0   0  ffff80007bb68d40  IN   0.0       0      0  [kthreadd]
      3      2   0  ffff80007bb69a80  ID   0.0       0      0  [rcu_gp]
      4      2   0  ffff80007bb6a7c0  ID   0.0       0      0  [rcu_par_gp]
      5      2   0  ffff80007bb6b500  ID   0.0       0      0  [kworker/0:0]
      6      2   0  ffff80007bb6c240  ID   0.0       0      0  [kworker/0:0H]
      7      2   0  ffff80007bb6cf80  ID   0.0       0      0  [kworker/u4:0]
      8      2   0  ffff80007bb6dcc0  ID   0.0       0      0  [mm_percpu_wq]
      9      2   0  ffff80007bb6ea00  IN   0.0       0      0  [ksoftirqd/0]
     10      2   0  ffff80007bbc0000  ID   0.0       0      0  [rcu_preempt]
     11      2   0  ffff80007bbc0d40  IN   0.0       0      0  [migration/0]
     12      2   0  ffff80007bbc27c0  IN   0.0       0      0  [cpuhp/0]
     13      2   1  ffff80007bbc3500  IN   0.0       0      0  [cpuhp/1]
     14      2   1  ffff80007bbc4240  IN   0.0       0      0  [migration/1]
     15      2   1  ffff80007bbc4f80  IN   0.0       0      0  [ksoftirqd/1]
     16      2   1  ffff80007bbc5cc0  ID   0.0       0      0  [kworker/1:0]
     17      2   1  ffff80007bbc6a00  ID   0.0       0      0  [kworker/1:0H]
     18      2   0  ffff80007bbd0000  IN   0.0       0      0  [kdevtmpfs]
     19      2   0  ffff80007bbd0d40  ID   0.0       0      0  [netns]
     20      2   0  ffff80007b040000  ID   0.0       0      0  [kworker/u4:1]
     21      2   1  ffff80007b040d40  IN   0.0       0      0  [rcu_tasks_kthre]
     42      2   1  ffff80007b0f3500  ID   0.0       0      0  [kworker/1:1]
     43      2   0  ffff80007b0f4240  ID   0.0       0      0  [kworker/0:1]
     49      2   1  ffff80007b0f4f80  ID   0.0       0      0  [kworker/u4:2]
     56      2   1  ffff80007b140000  IN   0.0       0      0  [kauditd]
    212      2   0  ffff80007b26ea00  ID   0.0       0      0  [kworker/u4:3]
    256      2   0  ffff80007b336a00  ID   0.0       0      0  [kworker/u4:4]
    471      2   1  ffff80007b2d6a00  IN   0.0       0      0  [oom_reaper]
    472      2   1  ffff80007b2d5cc0  ID   0.0       0      0  [writeback]
    474      2   0  ffff80007b330d40  IN   0.0       0      0  [kcompactd0]
    475      2   0  ffff80007b3327c0  IN   0.0       0      0  [ksmd]
    476      2   0  ffff80007b2d1a80  IN   0.0       0      0  [khugepaged]
    477      2   0  ffff80007b2d0000  ID   0.0       0      0  [crypto]
    478      2   1  ffff80007b2d0d40  ID   0.0       0      0  [kintegrityd]
    480      2   1  ffff80007b2d27c0  ID   0.0       0      0  [kblockd]
    501      2   1  ffff80007b2d3500  ID   0.0       0      0  [tpm_dev_wq]
    508      2   1  ffff80007b2d4240  ID   0.0       0      0  [ata_sff]
    541      2   0  ffff80007ac98000  ID   0.0       0      0  [edac-poller]
    551      2   1  ffff80007b044240  ID   0.0       0      0  [devfreq_wq]
    561      2   1  ffff80007b268000  IN   0.0       0      0  [watchdogd]
    647      2   0  ffff80007b268d40  ID   0.0       0      0  [rpciod]
    648      2   1  ffff80007b26c240  ID   0.0       0      0  [kworker/u5:0]
    649      2   0  ffff80007ad04f80  ID   0.0       0      0  [xprtiod]
    718      2   1  ffff80007bbd3500  IN   0.0       0      0  [kswapd0]
    815      2   1  ffff80007ad00000  ID   0.0       0      0  [nfsiod]
   1250      2   0  ffff80007b26dcc0  ID   0.0       0      0  [vfio-irqfd-clea]
>  1452      1   1  ffff80007b0f1a80  RU   0.0    2196     76  sh

ps -p [pid]: show the process's parent/child hierarchy

ps -t [pid]: show the process's run time

  • kmem: show kernel memory usage

crash> kmem -i
                 PAGES        TOTAL      PERCENTAGE
    TOTAL MEM   511276         2 GB         ----
         FREE   506631       1.9 GB   99% of TOTAL MEM
         USED     4645      18.1 MB    0% of TOTAL MEM
       SHARED      353       1.4 MB    0% of TOTAL MEM
      BUFFERS        0            0    0% of TOTAL MEM
       CACHED      480       1.9 MB    0% of TOTAL MEM
         SLAB     1930       7.5 MB    0% of TOTAL MEM

   TOTAL HUGE        0            0         ----
    HUGE FREE        0            0    0% of TOTAL HUGE

   TOTAL SWAP        0            0         ----
    SWAP USED        0            0    0% of TOTAL SWAP
    SWAP FREE        0            0    0% of TOTAL SWAP

 COMMIT LIMIT   255638     998.6 MB         ----
    COMMITTED      479       1.9 MB    0% of TOTAL LIMIT
crash>

kmem -i: show overall memory usage

kmem -s: show slab usage

kmem [addr]: find the memory structure that an address belongs to

  • For other commands, see help

Kernel panic example

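The kernel dereferences a null pointer and panics.

  1. Writing the driver

Write a driver, a kernel module that deliberately dereferences a null pointer, to demonstrate how crash is used to find the cause of a kernel crash.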

#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/atomic.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct my_struct {
    unsigned long head;
    spinlock_t lock;
};

int *addr = 0; //null pointer

void panic_foo(struct my_struct *ms)
{
    int *p = addr;

    spin_lock(&ms->lock);
    if (ms->head == 10) {
        *p = 0xFFFF;    /* deliberate write through the null pointer */
    } else if (ms->head == 0) {
        // do sth
    } else {
        // do sth
    }
    spin_unlock(&ms->lock);
}

int panic_kernel_init(void)
{
    struct my_struct *ms = kzalloc(sizeof(struct my_struct), GFP_KERNEL);

    if (!ms)
        return -ENOMEM;
    spin_lock_init(&ms->lock);
    ms->head = 10;      /* selects the faulting branch in panic_foo() */
    panic_foo(ms);
    return 0;
}

void panic_kernel_exit(void)
{
}

module_init(panic_kernel_init);
module_exit(panic_kernel_exit);
MODULE_LICENSE("GPL");

obj-m := panic-kernel.o
KERNEL_DIR := /home/linux
PWD := $(shell pwd)

all:
	make -C $(KERNEL_DIR) M=$(PWD) modules

clean:
	rm -f *.o *.ko *.mod.c

.PHONY: all clean

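Package the compiled driver into the root filesystem, then insert the kernel module after boot.

[Figure 5: kernel call trace printed when the module is inserted]

  2. panic analysis

The kernel call trace is shown in the figure above. Disassemble the corresponding object file to find the code where the problem occurs.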

aarch64-linux-gnu-objdump -S panic-kernel.o > test.txt

An excerpt of the disassembly:

Disassembly of section .text:

0000000000000000 <panic_foo>:

int *addr = 0; //null pointer

void panic_foo(struct my_struct *ms)
{
   0:   a9bd7bfd    stp   x29, x30, [sp, #-48]!
   4:   910003fd    mov   x29, sp
   8:   a90153f3    stp   x19, x20, [sp, #16]
   c:   aa0003f3    mov   x19, x0
    int *p = addr;
  10:   90000000    adrp  x0, 0 <panic_foo>
    raw_spin_lock_init(&(_lock)->rlock);    \
} while (0)

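The assembly shows that the argument of panic_foo (x0) ends up saved in register x19. What we want to know now is which branch the code took when the problem occurred.

To analyze this with crash, first load the module's symbol table: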

crash> mod -S my_module
     MODULE       NAME          SIZE  OBJECT FILE
ffff000000ae2000  panic_kernel  16384  my_module/panic-kernel.o

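Use crash to look at the structure's contents at the time of the failure and confirm which branch the function took. The function's argument is held in x19, so that register's value is the address to pass to struct: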

crash> struct my_struct ffff8000fa4d9780
struct my_struct {
  head = 10,
  lock = {
    {
      rlock = {
        raw_lock = {
          {
            val = {
              counter = 1
            },
            {
              locked = 1 '\001',
              pending = 0 '\000'
            },
            {
              locked_pending = 1,
              tail = 0
            }
          }
        }
      }
    }
  }
}

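From the printed values, the head member is 10, which pins down the branch: the code took the if (ms->head == 10) path.

Combined with the disassembly, the faulting location is pc: panic_foo+0x54. pc holds the address of the instruction that faulted, and lr (x30) holds the function's return address.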

static __always_inline void spin_unlock(spinlock_t *lock)
{
    raw_spin_unlock(&lock->rlock);
  38:   aa1403e0    mov   x0, x20
  3c:   94000000    bl    0 <_raw_spin_unlock>
    } else {
        // do sth
    }
    spin_unlock(&ms->lock);
}
  40:   f94013f5    ldr   x21, [sp, #32]
  44:   a94153f3    ldp   x19, x20, [sp, #16]
  48:   a8c37bfd    ldp   x29, x30, [sp], #48
  4c:   d65f03c0    ret
        *p = 0xFFFF;
  50:   529fffe0    mov   w0, #0xffff    // #65535
  54:   b90002a0    str   w0, [x21]
  58:   aa1403e0    mov   x0, x20
  5c:   94000000    bl    0 <_raw_spin_unlock>
}

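The instruction at offset 0x54 stores the value of w0 to the address held in x21, and x21 holds 0. The value of w0 comes straight from mov w0, #0xffff. So the fault is caused by writing 0xffff directly to address 0.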
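The registers involved can be double-checked from the raw encoding at offset 0x54. In A64 load/store instructions of this form, bits [4:0] hold the data register Rt and bits [9:5] hold the base register Rn. The sketch below (helper name hypothetical) extracts both fields with shell arithmetic:

```shell
#!/bin/sh
# Print the Rt (bits 4:0) and Rn (bits 9:5) register numbers encoded in
# an AArch64 load/store instruction word given as hex without a 0x prefix.
ldst_regs() {
    insn=$((0x$1))
    echo "Rt=$(( insn & 0x1f )) Rn=$(( (insn >> 5) & 0x1f ))"
}

ldst_regs b90002a0   # prints: Rt=0 Rn=21
```

Rt=0 and Rn=21 correspond to str w0, [x21]: the value in w0 is written to the address in x21, which is the null pointer p.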

Original author: 人人极客

Original title: 实例演示 | 用Kdump分析内核奔溃原因

Original link: https://mp.weixin.qq.com/s/_ZKix3ZJ8NqwkcvgR_6Zgg
