I have spent my spare time in the past few days debugging a problem that simply wouldn't go away. Every few seconds, there would be a random byte that was simply skipped out of the UART. Fortunately, the checksum was protecting bad data from getting through, but there were times when more than 10% of the frames were being corrupted!
What's more, the symptoms weren't very consistent with one another, leading me down a few rabbit trails. For instance, compiling with a 64 byte UART buffer was causing problems, but compiling with a 32 byte buffer was not. This lead me to believe that the issue was related to memory map, which took a while to get through before I was convinced that it wasn't a memory issue.
The full symptoms are documented in Issue 7 and Issue 8 of Dispatch.
After all of that messing around, it turns out that the UART driver had a concurrency issue with buffer read/write access. This was a total amateur mistake which I should have caught when writing the UART library, but I didn't.
Lets look at the problem code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | void UART_write(uint8_t* data, uint16_t length){
uint32_t i = 0;
while(i < length){
while(UART_writeable() == 0); /* wait for any current writes to clear */
BUF_write8((Buffer*)&txBuf, data[i]);
i++;
}
/* if the transmit isn't active, then kick-start the
* transmit; the interrupt routine will finish sending
* the remainder of the buffer */
while((U1STAbits.UTXBF == 0) && (BUF_status((Buffer*)&txBuf) != BUFFER_EMPTY)){
U1TXREG = BUF_read8((Buffer*)&txBuf);
}
}
void _ISR _U1TXInterrupt(void){
/* read the byte(s) to be transmitted from the tx circular
* buffer and transmit using the hardware register */
while((BUF_status((Buffer*)&txBuf) != BUFFER_EMPTY)
&& (U1STAbits.UTXBF == 0)){
U1TXREG = BUF_read8((Buffer*)&txBuf);
}
IFS0bits.U1TXIF = 0;
}
|
We have two functions above, UART_write()
and _U1TXInterupt()
. The purpose of UART_write()
is to load the circular buffer and, if necessary, load the U1TXREG
register to get transmit
interrupts started. The purpose of _U1TXInterrupt()
is to empty the circular buffer and to
move the contents to the hardware UART transmit register.
For those that are experienced, you may have already spotted the problem. Both of these functions
have full and unblocked access to the same buffer! In other words, In the middle of a UART_write()
function, an interrupt could occur and the interrupt would modify the same variables before the
UART_write()
was complete. This is NOT a thread-safe design and must be fixed.
Fortunately, the fix is simple. We are going to introduce a locking mechanism that limits access to the variable any time the non-interrupt code is modifying the circular buffer.
The fixed code:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 | volatile static uint8_t writeLock = 0;
void UART_write(uint8_t* data, uint16_t length){
uint32_t i = 0;
while(i < length){
while(UART_writeable() == 0); /* wait for any current writes to clear */
writeLock = 1;
BUF_write8((Buffer*)&txBuf, data[i]);
writeLock = 0;
i++;
}
/* if the transmit isn't active, then kick-start the
* transmit; the interrupt routine will finish sending
* the remainder of the buffer */
writeLock = 1;
while((U1STAbits.UTXBF == 0) && (BUF_status((Buffer*)&txBuf) != BUFFER_EMPTY)){
U1TXREG = BUF_read8((Buffer*)&txBuf);
}
writeLock = 0;
}
void _ISR _U1TXInterrupt(void){
if(writeLock == 0){
/* read the byte(s) to be transmitted from the tx circular
* buffer and transmit using the hardware register */
while((BUF_status((Buffer*)&txBuf) != BUFFER_EMPTY)
&& (U1STAbits.UTXBF == 0)){
U1TXREG = BUF_read8((Buffer*)&txBuf);
}
}
IFS0bits.U1TXIF = 0;
}
|
The writeLock
variable is used to provide access to the buffer. Note that only non-interrupt
code will modify writeLock
while the interrupt code will only read writeLock
. Also note that
the writeLock is active for the shortest period of time possible, which is just before and just
after the buffer accesses.
This case has one additional wrinkle. The interrupt is the primary method of getting bytes to the
transmit register, but we have written the writeLock
in such a way that the entire _U1TXInterrupt()
is basically skipped if writeLock
is active... how do we know that everything will be transmitted?
The first writeLock
occurs at lines 9-11, which is where the bulk of the parameter array is
shifted into the circular buffer. This is the likely place for a collision between the two buffer
accesses. If you look to lines 19-23, we have an additional load of the U1TXREG if the transmit
registers are empty and the circular buffer is not empty. In this way, we ensure that the process
continues even if an interrupt was skipped during the writeLock event.
It is not necessary to make all of your code thread-safe, but it is absolutely critical to make any variables that are modified by both interrupt and main-line code thread-safe. If you have random problems that don't seem very consistent, this is an area to focus on.
There are books written about this issue and the approach that I took to solve the problem - though valid - is not very sophisticated. For more sophisticated methods, you might want to utilize an RTOS or other similar architectures that support semaphores and limited access.