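I have hit a strange problem on a project and made no progress in two months, so I am posting here for advice.
Hardware platform
GD32F450VGT6 (same architecture as the STM32F407; I am using the STM32F407 HAL library)
RT-Thread (RTT) version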
3.1.5
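Problem description
My system has 5 serial devices: 1: debug port, 2: serial display, 3: unused, 4: serial WiFi module, 5: serial electricity meter, 6: card reader board. Whenever the display and the WiFi module are working at the same time, the system crashes after about ten hours. Sometimes it prints a hard fault and then hangs, and sometimes it hangs without printing anything at all.
However, running with only one of those two modules causes no problem at all. I tried removing the WiFi module and it ran for 72 hours without crashing, and removing the display instead also ran for 72 hours without trouble, with the other serial devices still working in both cases. I wonder whether the GD library and the STM32 library differ in how they handle the UARTs, but comparing the registers in the datasheets I did not see any notable difference.
Below is one hard fault log. If anyone has run into the same problem, please give me some ideas. Thanks!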
r00: 0x20001af8
r01: 0x00000800
r02: 0x2101e9d4
r03: 0x2000e338
r04: 0x00000800
r05: 0x20003148
r06: 0x00000000
r07: 0x00000000
r08: 0x080b2828
r09: 0x20001af8
r10: 0x00000000
r11: 0x00000000
r12: 0xffffffff
lr: 0x080aa9c7
pc: 0x080ac4f2
hard fault on thread: main
thread pri status sp stack size max used left tick error
-------- --- ------- ---------- ---------- ------ ---------- ---
pt02 24 ready 0x00000530 0x00001000 53% 0x00000003 000
pt01 24 suspend 0x00000224 0x00002000 29% 0x00000002 000
pt00 24 ready 0x00000170 0x00000c00 59% 0x00000003 -02
tshell 20 suspend 0x0000020c 0x00001000 50% 0x00000006 000
esp_net 16 suspend 0x000001f8 0x00000800 67% 0x0000000d 000
at_clnt 9 suspend 0x000000f8 0x00000600 70% 0x00000004 000
sys_work 23 suspend 0x0000010c 0x00000800 66% 0x00000001 000
phy 30 ready 0x000000bc 0x00000200 36% 0x00000001 -02
tcpip 10 suspend 0x000000f0 0x00000400 58% 0x00000008 000
etx 12 suspend 0x000000b0 0x00000400 17% 0x0000000d 000
erx 12 suspend 0x000000b8 0x00000400 17% 0x00000010 000
tidle 31 ready 0x00000080 0x00000400 19% 0x00000004 000
main 10 ready 0x00000240 0x00001000 48% 0x00000010 000
FPU active!
bus fault:
SCB_CFSR_BFSR:0x82 PRECISERR SCB->BFAR:A4088A88
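For readers unfamiliar with the last line of the dump, the BFSR value can be decoded bit by bit. The sketch below is a host-side helper, not part of RT-Thread; the name `bfsr_describe` is hypothetical, and the bit positions are the BFSR layout as I understand it from the ARMv7-M architecture (BFSR is bits [15:8] of SCB->CFSR).

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* BFSR bit positions on Cortex-M3/M4 (ARMv7-M). */
#define BFSR_IBUSERR     (1u << 0)  /* bus fault on instruction fetch        */
#define BFSR_PRECISERR   (1u << 1)  /* precise data bus error                */
#define BFSR_IMPRECISERR (1u << 2)  /* imprecise data bus error              */
#define BFSR_UNSTKERR    (1u << 3)  /* fault while unstacking on exc. return */
#define BFSR_STKERR      (1u << 4)  /* fault while stacking on exc. entry    */
#define BFSR_LSPERR      (1u << 5)  /* fault during lazy FP state save       */
#define BFSR_BFARVALID   (1u << 7)  /* BFAR holds the faulting address       */

/* Hypothetical helper: writes the names of all set flags, separated by
 * spaces, into out (at most out_len bytes including the terminator). */
const char *bfsr_describe(uint8_t bfsr, char *out, size_t out_len)
{
    static const struct { uint8_t mask; const char *name; } flags[] = {
        { BFSR_IBUSERR,     "IBUSERR"     },
        { BFSR_PRECISERR,   "PRECISERR"   },
        { BFSR_IMPRECISERR, "IMPRECISERR" },
        { BFSR_UNSTKERR,    "UNSTKERR"    },
        { BFSR_STKERR,      "STKERR"      },
        { BFSR_LSPERR,      "LSPERR"      },
        { BFSR_BFARVALID,   "BFARVALID"   },
    };
    size_t i;

    if (out_len == 0) {
        return out;
    }
    out[0] = '\0';
    for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (bfsr & flags[i].mask) {
            if (out[0] != '\0') {
                strncat(out, " ", out_len - strlen(out) - 1);
            }
            strncat(out, flags[i].name, out_len - strlen(out) - 1);
        }
    }
    return out;
}
```

With the value from the log, `bfsr_describe(0x82, buf, sizeof buf)` gives `PRECISERR BFARVALID`: a precise data access fault, and the BFAR value printed next to it is the address the faulting instruction tried to access.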
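Follow-up
At first I also strongly suspected that memory was running out, so I printed the memory statistics. The log below shows the memory information just before the failure. My MCU has 192 KB of RAM, which should be enough.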
I/ocpprpc [05-23 00:53:12] ocpp16.c....pt02---301...prebuffer=1024
I/ocpprpc [05-23 00:53:12] ocpp16.c....pt02---305
I/ocpprpc [05-23 00:53:12] >>> [2,"1621731192190","MeterValues",{"connectorId":1,"transactionId":16313,"meterValue":[{"timestamp":"2021-23T00:53:12Z","sampledValue":[{"value":"15520","context":"Sample.Periodic","measurand":"Energy.Active.Import.Register","unit":"Wh"},{"val:"0.9","context":"Sample.Periodic","measurand":"Current.Import","unit":"A"},{"value":"232.6","context":"Sample.Periodic","measurand":"Vole","unit":"V"},{"value":"51","context":"Sample.Periodic","measurand":"Temperature"}]}]}]
I/websocket [05-23 00:53:12] >>>>>> rws send 484 bytes
I/websocket [05-23 00:53:13] <<<<<<< rws recv 24 bytes
I/ocpprpc [05-23 00:53:14] <<< [3,"1621731192190",{}]
I/ocpp16 [05-23 00:54:13] ocpp16.c.............Heartbeat1()
total memory: 187644
used memory : 92804
maximum allocated memory: 98556
available memory: 94840
memory heap address:
heap_ptr: 0x200022e4
lfree : 0x2000d944
heap_end: 0x2002fff0
I/ocpprpc [05-23 00:54:13] ocpp16.c....pt02---301...prebuffer=1024
I/ocpprpc [05-23 00:54:13] ocpp16.c....pt02---305
I/ocpprpc [05-23 00:54:13] >>> [2,"162173125396","Heartbeat",{}]
I/websocket [05-23 00:54:13] >>>>>> rws send 39 bytes
I/websocket [05-23 00:54:14] <<<<<<< rws recv 59 bytes
I/ocpprpc [05-23 00:54:14] <<< [3,"162173125396",{"currentTime":"2021-05-23T00:54:16Z"}]
psr: 0x81000000
r00: 0x31333631
r01: 0x00000000
r02: 0x00000001
r03: 0x2000e400
r04: 0x00000000
r05: 0x20001b00
r06: 0x00000001
r07: 0x40030134
r08: 0x00000012
r09: 0x20001b00
r10: 0xa0000000
r11: 0x406d1999
r12: 0xa0da3332
lr: 0x080abbdf
pc: 0x080ad0b8
hard fault on thread: pt02
thread pri status sp stack size max used left tick error
-------- --- ------- ---------- ---------- ------ ---------- ---
pt03 24 suspend 0x0000032c 0x00000c00 62% 0x00000003 000
pt02 24 ready 0x0000041c 0x00001000 64% 0x00000005 000
pt01 24 suspend 0x00000198 0x00002000 30% 0x00000001 000
pt00 24 suspend 0x00000170 0x00000c00 53% 0x00000005 000
tshell 20 suspend 0x00000210 0x00001000 47% 0x00000006 000
ec20_net 16 suspend 0x00000204 0x00000800 64% 0x00000013 000
at_clnt 9 suspend 0x000000f8 0x00000600 70% 0x00000001 000
sys_work 23 suspend 0x0000010c 0x00000800 50% 0x00000007 000
phy 30 suspend 0x000000bc 0x00000200 47% 0x00000002 000
tcpip 10 suspend 0x000000f0 0x00000400 89% 0x00000006 000
etx 12 suspend 0x000000ac 0x00000400 17% 0x0000000e 000
erx 12 suspend 0x000000b4 0x00000400 62% 0x00000005 000
tidle 31 ready 0x00000064 0x00000400 12% 0x00000002 000
main 10 suspend 0x00000368 0x00001000 51% 0x00000002 000
FPU active!
bus fault:
SCB_CFSR_BFSR:0x82 PRECISERR SCB->BFAR:31333631
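One thing worth checking in this second dump: r00 and BFAR are both 0x31333631, and every byte of that value is a printable ASCII digit. The sketch below (a hypothetical host-side helper, `addr_as_ascii`, not an RT-Thread function) reads a 32-bit value as four bytes in little-endian memory order. If a fault address decodes to text, one possible explanation is that string data overwrote a pointer, for example through a buffer overrun. That is only a hypothesis; the dump alone does not prove it.

```c
#include <stdint.h>
#include <ctype.h>

/* Hypothetical helper: interpret a 32-bit value as four bytes in
 * little-endian memory order (the byte order of Cortex-M data).
 * Writes the bytes to out, replacing non-printable ones with '.',
 * and returns 1 only if all four bytes are printable ASCII. */
int addr_as_ascii(uint32_t addr, char out[5])
{
    int i;
    int all_printable = 1;

    for (i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(addr >> (8 * i));
        if (isprint(c)) {
            out[i] = (char)c;
        } else {
            out[i] = '.';
            all_printable = 0;
        }
    }
    out[4] = '\0';
    return all_printable;
}
```

For 0x31333631 this returns 1 and the text `1631`; for the earlier fault address 0xA4088A88 it returns 0. The digits resemble numeric text seen in the log above (for instance the transactionId 16313), but that match is a guess and would need to be confirmed from the code at the reported PC and LR.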
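Judging from the address A4088A88, this is 99% likely caused by a null pointer. With the PC and LR values it should be easy to locate.
@flashman2002 A problem like this generally cannot be checked that way, can it? It normally runs fine and only crashes after a long time, so how would you use free to inspect the memory resources?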
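To be honest, I really did add a printout of the current memory information in mem.c, and there was enough memory when the problem occurred.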
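@aozima If it were caused by a null pointer, it should crash sooner or later no matter which device I remove. But what I actually see is that removing either one of the two serial devices makes the problem go away.
@suntao_222 You have a definite PC and LR now, so go find the real cause.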