In transferring data, there are many layers that one can add in order to ensure that the communication has certain properties. For instance, up to this point, we have implemented the physical layer and part of the the data layer. The physical layer specifies thevoltages we are permitted to operate on and the data layer ensures that each element transferred is a byte.
In this post, we are going to elaborate on the data layer and enable arbitrary transfers that contain a series of bytes or a block of data. What those bytes mean won't be won't matter at this level, we just need to ensure that the data that is received is actually the data that was intended.
Framing is a method of sending a coherent block of data and verifying that the received data is exactly the intended data.
The first element of the frame is the payload. In our data model, the frame payload has a minimum size of 1 byte.
Once the sender knows the contents of the frame payload, it can calculate a CRC or checksum of the data and append that verification code to the end of the data. This verification code allows the receiver to verify that the payload is unchanged between sender and receiver.
The Start-of-Frame (SOF) flag is a specific value that indicates that the frame is about to begin. The SOF will not occur at any place within the frame (see 'Escape Sequences' below) and will not occur again until the next frame starts. The SOF itself is discarded by the receiver as it is not valid data in itself.
The End-of-Frame (EOF) flag is exactly analagous to the SOF flag except that it delimits the end of a frame. All data between the SOF and EOF are data to be consumed by the receiver while the SOF and EOF flags are discarded by the receiver.
One might imagine that the payload itself might contain a byte of data that happens to be the of the same value as the SOF or EOF. What are we to do when this occurs? This is where we escape the character and switch it out for another so that the entire sequence between SOF and EOF remains free of SOF and EOF characters. Thus, we find ourselves with a third special character called the ESC character. As the data is being framed, each byte is examined to ensure that it doesn't correspond to the SOF, EOF, or ESC characters. If it does, then the ESC character is inserted and the offending character is XORed with a value that changes its transmitted value. When the receiver encounters the ESC character, the ESC character is discarded and the very next character in the sequence is XORed with the same value, returning the string of characters to its original value.
The numbers that were chosen for the escape sequences are as follows:
#define SOF 0xf7
#define EOF 0x7f
#define ESC 0xf6
#define ESC_XOR 0x20
There is nothing special about these numbers. They are simply chosen based on observation of other similar implementations.
Assume, for a moment, that we have a sequence of data that is to be transmitted.
The data is the payload, so we will designate it PL[0] through PL[4] for a 5-byte
sequence that must be transmitted.
The first step in framing is to simply calculate the verification sequence. In
this case, we will be calculating a fletcher16 checksum and append the value to
the end as CK[0] and CK[1]:
We append the checksum to the end of the data. Next, we must add any escaping that is necessary. In this sequence, lets say that PL[2] corresponds to the value of the SOF, EOF, or ESC characters. Note that the checksum is subject to escaping as well.
Once all of the SOF, EOF, and ESC characters have been removed by the escaping process, we can add our SOF and EOF secure in the knowledge that there is no data in between that will coincide with the SOF, EOF, or ESC that isn't supposed to be there.
The receiver now re-executes this sequence in reverse. It first removes the SOF and EOF flags and discards them. The receiver then goes through the sequence identifying ESC characters and, where found, discards the ESC character, XORs the next character with the proper escape value, and saves that value as the data. Once the data has been un-framed and un-escaped, the receiver then calculates the checksum on the data and compares it with the checksum appended to the end of the data. If the checksum matches, then the data is accepted as a frame. If the checksum does not match, the entire data set is discarded.
One thing to take note of. As the features of the network increase, so, too, does the amount of metadata required. For instance, in our simple framing example above, it took 10 characters to transmit a single 5 characters. This is an abysmal 50% transmission efficiency just to frame the characters! The advantage is that we can send messages that are far longer than 5 bytes. If we were to send a similar message of a length of 100 bytes with a 1.2% escape occurance, then we would end up adding 6 bytes for framing (2 checksum, 1 SOF, 1 EOF, and 1.2 rounded up to 2 for escaping). This would result in 94% efficiency. For this reason, it is critical for bandwidth that higher-level protocols transmit as many bytes in a single frame as possible in order to better leverage the available channels.
In order to implement framing, we must have two byte-aligned circular buffers in the software layer just below our framing layer. The software layer must have available four functions:
function | description |
---|---|
UART_readable() | returns the number of full slots in the receive buffer |
UART_writeable() | returns the number of empty slots in the send buffer |
UART_read() | reads a number of bytes from the receive circular buffer |
UART_write() | writes a number of bytes to the send circular buffer |
The underlying UART library must take care of sending and receiving all bytes
written to and received from the circular buffers. In our case, we are using
the circular buffers written about in a previous article
to store UART data in both directions.
We are going to write three functions, FRM_push(), FRM_pull(), and FRM_fletcher16().
function | description |
---|---|
FRM_push() | "pushes" data to the UART for sending after framing |
FRM_pull() | "pulls" data from the UART for receiving and de-frames |
FRM_fletcher16() | calculates the Fletcher16 checksum of the data |
Data is 'pushed' out from higher level software using the FRM_push function:
void FRM_push(uint8_t* data, uint16_t length)
An array of data and the data length are supplied to the function. This data corresponds to our payload from the initial framing discussions. At this point, there is no SOF, EOF, checksum, or ESC applied to the payload.
The C code is as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 | void FRM_push(uint8_t* data, uint16_t length){
uint8_t txFrame[TX_FRAME_LENGTH];
int16_t i = 0, txIndex = 0;
uint16_t check = FRM_fletcher16(data, length);
uint8_t check0 = (uint8_t)(check & 0x00ff);
uint8_t check1 = (uint8_t)((check & 0xff00) >> 8);
/* start the frame */
txFrame[txIndex] = SOF;
txIndex++;
/* copy the data from data to the txFrame */
while(i < length){
/* add proper escape sequences */
if((data[i] == SOF) || (data[i] == EOF) || (data[i] == ESC)){
txFrame[txIndex] = ESC;
txIndex++;
txFrame[txIndex] = data[i] ^ ESC_XOR;
txIndex++;
}else{
txFrame[txIndex] = data[i];
txIndex++;
}
i++;
}
/* append the checksum ensuring that the values are properly escaped */
if((check0 == SOF) || (check0 == EOF) || (check0 == ESC)){
txFrame[txIndex] = ESC;
txIndex++;
txFrame[txIndex] = check0 ^ ESC_XOR;
txIndex++;
}else{
txFrame[txIndex] = check0;
txIndex++;
}
if((check1 == SOF) || (check1 == EOF) || (check1 == ESC)){
txFrame[txIndex] = ESC;
txIndex++;
txFrame[txIndex] = check1 ^ ESC_XOR;
txIndex++;
}else{
txFrame[txIndex] = check1;
txIndex++;
}
/* end the frame */
txFrame[txIndex] = EOF;
txIndex++;
/* wait for the UART circular buffer to clear from
* any currently in-progress transmissions */
while(UART_writeable() < txIndex);
UART_write(txFrame, txIndex);
}
|
The comments explain reasonably well what is going on, but lets take a few lines at a time anyway
uint8_t txFrame[TX_FRAME_LENGTH];
int16_t i = 0, txIndex = 0;
These lines simply declare some useful variables. The txFrame[]
array is
the place in which we will build the frame to be transmitted. The txIndex
variable will act as an index for txFrame as the frame is being constructed.
The i
variable will be used throughout the function for looping and is
declared here for convenience.
uint16_t check = FRM_fletcher16(data, length);
uint8_t check0 = (uint8_t)(check & 0x00ff);
uint8_t check1 = (uint8_t)((check & 0xff00) >> 8);
First, we calculate the fletcher checksum and store the 16-bit results into
two 8-bit registers, check0
and check1
. This calculation is done on the
raw data before any data is stored on the frame.
txFrame[txIndex] = SOF;
txIndex++;
Next, we start the frame. Based on previous discussions, this should be
clear, but note that we increment the txIndex
. Each write to the txFrame[]
must be accompanied by an increment of the txIndex
so that data doesn't
get overwritten.
while(i < length){
/* add proper escape sequences */
if((data[i] == SOF) || (data[i] == EOF) || (data[i] == ESC)){
txFrame[txIndex] = ESC;
txIndex++;
txFrame[txIndex] = data[i] ^ ESC_XOR;
txIndex++;
}else{
txFrame[txIndex] = data[i];
txIndex++;
}
i++;
}
We now copy the majority of the data into the txFrame[]
array. Note that
we have an if
condition that ensures that SOF, EOF, and ESC characters are
identified and properly escaped.
if((check0 == SOF) || (check0 == EOF) || (check0 == ESC)){
txFrame[txIndex] = ESC;
txIndex++;
txFrame[txIndex] = check0 ^ ESC_XOR;
txIndex++;
}else{
txFrame[txIndex] = check0;
txIndex++;
}
if((check1 == SOF) || (check1 == EOF) || (check1 == ESC)){
txFrame[txIndex] = ESC;
txIndex++;
txFrame[txIndex] = check1 ^ ESC_XOR;
txIndex++;
}else{
txFrame[txIndex] = check1;
txIndex++;
}
Nearing the end now, each checkx
byte that was saved previously is also
checked to ensure that it doesn't coincide with the SOF, EOF, or ESC characters
before saving on the txFrame[]
array.
txFrame[txIndex] = EOF;
txIndex++;
Of course, the final step to constructing the frame is saving the EOF character.
At this point, the frame is completely constructed and ready to be written
to the UART using UART_write()
. It is prudent to ensure that the number of
bytes to be written will actually fit in the transmit buffer, so we place a
check and wait statement. The software waits for UART_writeable()
to return
a value less than the length of the frame before writing to it.
while(UART_writeable() < txIndex);
UART_write(txFrame, txIndex);
Note that this method would still allow multiple frames to stack up in the outbound circular buffer, as long as those frames were relatively short. This effectively stalls the processor and keeps it from losing data. Care must be taking to write a reaonable amount of data through this interface or the processor will bottleneck at this interface.
I will leave FRM_pull()
as an exercise for the reader. One solution can be found
in the project github repository.
For the Python code, we will take a closer look at the de-framing mechanism. I should include the caveat that all code is subject to change, depending on what sorts of bugs are found. The presented code is what is working today.
First off, a class called Frame
is created, which creates a thread called run
.
This thread will be run continually in order to extract frames into a list of
lists rx_messages
, which is part of the Frame object.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 | def run(self):
while True:
for element in self.port.read(1000):
self.raw.append(element)
# pull out the frames
while (self.SOF in self.raw) and (self.EOF in self.raw):
# find the SOF by removing bytes in front of it
while self.raw[0] != self.SOF:
self.raw.pop(0)
# find the EOF
for i, element in enumerate(self.raw):
if element == self.EOF:
end_char_index = i
break
frame = self.raw[:end_char_index]
self.raw = self.raw[end_char_index:]
# remove the SOF and EOF from the frame
frame.pop(0)
message = []
escape_flag = False
for element in frame:
if escape_flag is False:
if element == self.ESC:
escape_flag = True
else:
message.append(element)
else:
message.append(element ^ self.ESC_XOR)
escape_flag = False
# remove the fletcher16 checksum
f16_check = message.pop(-1) * 256
f16_check += message.pop(-1)
# calculate the checksum
if self.fletcher16_checksum(message) == f16_check:
self.rx_messages.append(message)
time.sleep(0.1)
|
Again, a few lines at a time...
while True:
for element in self.port.read(1000):
self.raw.append(element)
The code will be executed for as long as the Frame
object is in memory.
Also, the code will continually read up to 1000 bytes from the serial port
self.port
and append to self.raw
in preparation for processing.
while (self.SOF in self.raw) and (self.EOF in self.raw):
When self.raw
contains at least one SOF and one EOF chracter, process the
detected frame.
while self.raw[0] != self.SOF:
self.raw.pop(0)
These lines remove any residual data from in front of self.raw
up to the
first detected SOF.
for i, element in enumerate(self.raw):
if element == self.EOF:
end_char_index = i
break
frame = self.raw[:end_char_index]
self.raw = self.raw[end_char_index:]
The frame
list is created from the values between the SOF and EOF of
self.raw
. This swath is then removed from self.raw
so that it doesn't
get re-processed later. From this point on, the data of concern is within
frame
.
frame.pop(0)
message = []
escape_flag = False
for element in frame:
if escape_flag is False:
if element == self.ESC:
escape_flag = True
else:
message.append(element)
else:
message.append(element ^ self.ESC_XOR)
escape_flag = False
The message
list will store the final de-framed data. Note that any
escaped variables are caught before appending data to message
. This
is a stateful execution, meaning that the state of this code is either
'escaped' or 'not escaped' using escape_flag
.
# remove the fletcher16 checksum
f16_check = message.pop(-1) * 256
f16_check += message.pop(-1)
# calculate the checksum
if self.fletcher16_checksum(message) == f16_check:
self.rx_messages.append(message)
The last two bytes of the received data contain the Fletcher16 value.
This value is first removed from the message and integrated back to
f16_check
. If the calculated checksum value is the same as
f16_check
, then the complete processed data is placed into
self.rx_messages
for later processing. If the check fails, the
data is discarded.
For a test of our implementation, we are sending a short frame from the micrcontroller to the PC using the presented code.
1 2 3 4 | uint8_t txData[] = {0,0xf7,0,0x7f,0,0xf6,6,7};
uint16_t txLength = 8;
FRM_push(txData, txLength);
|
Note that we chose values in txData
so that the escape sequences would
be necessary in order to transmit the data. We displayed the data on the
console using the print statement:
[0,247,0,127,0,246,6,7]
Which is exactly what we intended to send, except as a list rather than an array.
For full implementations, be sure to visit the github repository.