Hi guys,
We have a strange issue where only a couple of our 50 deployed devices seem to randomly get “stuck”, sometimes after weeks in the field. When we get them back home nothing prints on the serial port, but restarting seems to help. I can’t really get closer to the problem without either a way to capture and understand a trace of what’s going on, or a way for the system to reboot itself when it hits a fault. I seem to remember seeing some of the sample code restart on faults, but I can’t track down the right config flags to make this happen.
So my question is two-fold:
1) Which config flags should I be using to protect against faults just making the entire system hang?
2) If I manage to get one of these stuck devices home (assuming it’s one of the ones that haven’t been rebooted automatically), what can I do to get closer to finding out what’s gone wrong?
Note that I have a watchdog in place, and I have an LED that I turn on/off when the device is in the main loop, which is why I know that it’s stuck: the LED stays on forever and the watchdog is not triggered, so I’m thinking that it’s a fault that just makes the system hang. Here is my config:
CONFIG_HEAP_MEM_POOL_SIZE=1024
CONFIG_MAIN_STACK_SIZE=2048
CONFIG_DEBUG=n
# Needed for battery measurement
CONFIG_ADC=y
# Add the accelerometer
CONFIG_I2C=y
CONFIG_SENSOR=y
CONFIG_LIS2DH=y
CONFIG_LIS2DH_TRIGGER_GLOBAL_THREAD=y
CONFIG_LIS2DH_ODR_RUNTIME=y
# SPI Flash
CONFIG_SPI=y
CONFIG_SPI_NOR=y
CONFIG_SPI_NOR_FLASH_LAYOUT_PAGE_SIZE=4096
CONFIG_SPI_NOR_IDLE_IN_DPD=y
# Enable flash operations.
CONFIG_FLASH=y
CONFIG_PM_EXTERNAL_FLASH_MCUBOOT_SECONDARY=n
CONFIG_LOG_PRINTK=y
CONFIG_LOG_MODE_IMMEDIATE=y
# Zephyr Device Power Management
CONFIG_PM_DEVICE=y
CONFIG_PM=y
# Network
CONFIG_NETWORKING=y
CONFIG_NET_SOCKETS=y
CONFIG_NET_NATIVE=n
CONFIG_NET_SOCKETS_POSIX_NAMES=y
# LTE link control
CONFIG_NEWLIB_LIBC=y
CONFIG_NEWLIB_LIBC_FLOAT_PRINTF=y
CONFIG_LTE_LINK_CONTROL=y
CONFIG_LTE_AUTO_INIT_AND_CONNECT=n
CONFIG_LTE_NETWORK_MODE_NBIOT=y
CONFIG_LTE_NETWORK_MODE_LTE_M=n
CONFIG_LTE_NETWORK_USE_FALLBACK=n
CONFIG_LTE_NETWORK_TIMEOUT=300
CONFIG_LTE_PSM_REQ_RPTAU="00000110" # 60 min
CONFIG_LTE_PSM_REQ_RAT="00000000" # 0 sec active time
CONFIG_LTE_LC_MODEM_SLEEP_NOTIFICATIONS=y
# GPS
CONFIG_LOCATION=n
#CONFIG_LOCATION_METHOD_GNSS=y
#CONFIG_LOCATION_METHOD_CELLULAR=n
#CONFIG_MODEM_ANTENNA=y
#CONFIG_MODEM_ANTENNA_AT_COEX0="AT\%XCOEX0=1,1,1565,1586"
#CONFIG_MODEM_ANTENNA_GNSS_EXTERNAL=y
#CONFIG_LTE_NETWORK_MODE_NBIOT_GPS=y
# Modem library
CONFIG_NRF_MODEM_LIB=y
# Async
CONFIG_UART_ASYNC_API=y
CONFIG_UART_LINE_CTRL=y
# Enable Zephyr application to be booted by MCUboot
CONFIG_BOOTLOADER_MCUBOOT=y
# Enable reboot
CONFIG_REBOOT=y
CONFIG_WATCHDOG=y
CONFIG_WDT_LOG_LEVEL_DBG=n
CONFIG_WDT_DISABLE_AT_BOOT=y
# MQTT
CONFIG_MQTT_LIB=y
CONFIG_MQTT_LIB_TLS=n
CONFIG_MQTT_CLEAN_SESSION=n
CONFIG_MQTT_KEEPALIVE=0
# Time
CONFIG_DATE_TIME=y
CONFIG_DATE_TIME_MODEM=y
CONFIG_DATE_TIME_AUTO_UPDATE=y
# utils
CONFIG_JSON_LIBRARY=y
CONFIG_CJSON_LIB=y
# RTC
CONFIG_COUNTER=y
CONFIG_PCF85063A=y
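
As a sanity check on the two PSM timer strings in the config (the "# 60 min" and "# 0 sec active time" comments), here is a minimal C sketch of the decoding. It assumes the values follow the 3GPP TS 24.008 bit layouts for GPRS Timer 3 (periodic TAU) and GPRS Timer 2 (active time): the top three bits select a unit and the low five bits are a multiplier. The function names are made up for illustration and are not part of any Nordic or Zephyr API.

```c
#include <string.h>

/* Hypothetical helper: parse an 8-character bit string such as "00000110".
 * Returns the byte value, or -1 if the string is not exactly 8 binary digits. */
static int parse_bits(const char *s)
{
    if (strlen(s) != 8) {
        return -1;
    }
    int v = 0;
    for (int i = 0; i < 8; i++) {
        if (s[i] != '0' && s[i] != '1') {
            return -1;
        }
        v = (v << 1) | (s[i] - '0');
    }
    return v;
}

/* Requested periodic TAU (assumed GPRS Timer 3 layout).
 * Returns the interval in seconds, or -1 if deactivated or malformed. */
static long decode_rptau_seconds(const char *s)
{
    /* Unit for codes 000..110: 10 min, 1 h, 10 h, 2 s, 30 s, 1 min, 320 h */
    static const long unit_s[] = {600, 3600, 36000, 2, 30, 60, 1152000};
    int v = parse_bits(s);
    if (v < 0) {
        return -1;
    }
    int unit = v >> 5;
    if (unit == 7) {
        return -1; /* 111 means the timer is deactivated */
    }
    return unit_s[unit] * (v & 0x1F);
}

/* Requested active time (assumed GPRS Timer 2 layout).
 * Returns the time in seconds, or -1 if deactivated or malformed. */
static long decode_rat_seconds(const char *s)
{
    int v = parse_bits(s);
    if (v < 0) {
        return -1;
    }
    long value = v & 0x1F;
    switch (v >> 5) {
    case 0:  return 2 * value;   /* multiples of 2 seconds */
    case 1:  return 60 * value;  /* multiples of 1 minute */
    case 2:  return 360 * value; /* multiples of 6 minutes */
    case 7:  return -1;          /* timer deactivated */
    default: return 60 * value;  /* other codes are treated as 1 minute */
    }
}
```

Under those assumptions, decode_rptau_seconds("00000110") gives 3600 (unit 000 = 10 minutes, multiplier 6) and decode_rat_seconds("00000000") gives 0, which matches the comments in the config.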