-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Description
I initially reported this on the mailing list, but I figured I should add an issue.
I was getting strange lockups on my nodes, and managed to work out that a timer was being inserted into the list when it was already in the list, which would cause an infinite loop here
I modified the function with this inside the while loop:
if (*list_head == timer || (*list_head)->next == timer){
disableIRQ();
while(1);
}And waited to see where the invocation was coming from. It triggered (on the second condition in the if statement), and this was the call chain:
#0 _add_timer_to_list (list_head=0x200021ac <sampling_stack+1755>, list_head@entry=0x20000630 <timer_list_head>, timer=timer@entry=0x2000148c <comms_stack+1832>) at //sys/xtimer/xtimer_core.c:226
#1 0x00000d12 in _xtimer_set_absolute (timer=0x2000148c <comms_stack+1832>, target=1963159872) at //sys/xtimer/xtimer_core.c:197
#2 0x00000dec in xtimer_set (timer=timer@entry=0x2000148c <comms_stack+1832>, offset=offset@entry=1000000) at //sys/xtimer/xtimer_core.c:133
#3 0x00000a4e in xtimer_set_msg (target_pid=<optimized out>, msg=0x20001484 <comms_stack+1824>, offset=1000000, timer=0x2000148c <comms_stack+1832>) at //sys/xtimer/xtimer.c:124
#4 xtimer_msg_receive_timeout (msg=msg@entry=0x200014bc <comms_stack+1880>, us=us@entry=1000000) at //sys/xtimer/xtimer.c:200
#5 0x00001c94 in comms (arg=<optimized out>) at //Apps/accel/main.c:192
#6 0x00000288 in sched_switch (other_prio=<optimized out>) at //core/sched.c:180
#7 0x20001404 in comms_stack ()
The comms thread looks basically like this (I am working on isolated reproducer):
msg_t dns_msg; //type set to MSG_TYPE_DNS_TIMER once in main
xtimer_t dns_timer; //nobody else uses this
void *comms(void *arg)
{
msg_t msg;
msg_init_queue(_comms_msg_queue, COMMS_QUEUE_SIZE);
gnrc_netreg_entry_t entry = { NULL, 4041, thread_getpid() };
gnrc_netreg_register(GNRC_NETTYPE_UDP, &entry))
while(1) {
if (xtimer_msg_receive_timeout(&msg, NETWORK_MAXRTT_US) < 0)
msg.type = MSG_TYPE_MUST_TX; //default to resending old packet
switch(msg.type) {
case GNRC_NETAPI_MSG_TYPE_RCV: {
/* snip */
gnrc_pktbuf_release((gnrc_pktsnip_t *) msg.content.ptr);
//Deliberately do not break
}
case MSG_TYPE_MUST_TX: { //Sent every ~100ms from sampling_thread
/* snip building packet 'ip' */
gnrc_netapi_dispatch_send(GNRC_NETTYPE_UDP, GNRC_NETREG_DEMUX_CTX_ALL, ip)
break;
}
case MSG_TYPE_DNS_TIMER: { //sent once by main, otherwise only to ourselves
update_dns_record(); //Sends a UDP packet, like above
xtimer_set_msg(&dns_timer, DNS_INTERVAL, &dns_msg, thread_getpid());
break;
}
} //end switch
} //end while
} //endfunSo essentially the xtimer_msg_receive_timeout(&msg, NETWORK_MAXRTT_US) is sometimes causing a stack-allocated timer to appear in the timer list twice. This is pretty interesting considering that there is a call to remove the timer earlier in the call chain, although they are in different critical sections, so maybe something is happening between them.
I will do my best to reproduce this with a small application that runs on the SAMR21-xpro.
Also, apologies if this is user error, I am still getting the hang of RIOT.