Pokology - a community-driven site around GNU poke
_____
---' __\_______
______) Tips and Tricks
__)
__)
---._______)
Table of Contents
_________________
1. Sequence of bytes ended by two consecutive zero bytes
2. Sub-byte endianness swaps
1 Sequence of bytes ended by two consecutive zero bytes
=======================================================
So today I was writing a pickle for the BER encoding of ASN-1 and came
with an interesting problem: when the length of the contents of a BER
data value is indefinite (not explicitly specified) we shall read
bytes until we find an end marker that consists on two consecutive 0
bytes.
Sounds simple enough... but I really had to think for a while before
coming to this:
,----
| type BER_Variable_Contents =
| struct
| {
| type Datum =
| union
| {
| byte[2] pair : pair[1] != 0UB;
| byte single : single != 0UB;
| };
|
| Datum[] data;
| byte[2] end : end == [0UB,0UB];
| };
`----
How does this work?
First, notice that there must be at least two bytes in the IO space,
which would correspond to the two 0 bytes that terminate the sequence.
Otherwise it would be an integrity error.
When it reaches the data field of the struct, which is an unbounded
array, poke will decode Datum=s and add them to the array =data until
either a constraint fails or the end of the IO space is reached. So
it will first try to decode a pair of bytes in which the second byte
is not zero. If that fails (due to either a constraint error or EOF)
then it will try to map a single byte that is not zero. If that also
fails, poke is done with data and then it expects the two zero bytes
in end.
This is how the resulting structures look like:
,----
| (poke) .mem
| (poke) byte[] @ 0#B = ['a','b','c','d','e']
| (poke) BER_Variable_Contents @ 0#B
| BER_Variable_Contents {
| data=[Datum {
| pair=[0x61UB,0x62UB]
| },Datum {
| pair=[0x63UB,0x64UB]
| },Datum {
| single=0x65UB
| }],
| end=[0x0UB,0x0UB]
| }
`----
A suitable method could be added to the struct type in order to get an
array with all the bytes:
,----
| method get_bytes = byte[]:
| {
| var bytes = byte[]();
| for (d in data)
| try bytes += d.pair;
| catch if E_elem { bytes += [d.single]; }
| return bytes;
| }
`----
In action:
,----
| (poke) (BER_Variable_Contents @ 0#B).get_bytes
| [0x61UB,0x62UB,0x63UB,0x64UB,0x65UB]
`----
2 Sub-byte endianness swaps
===========================
Byte-endianness is typically handled quite transparently in poke by
using integral types and integral structs. However, endianness can
have an impact on the ordering of C bit-fields in a structure, even
when the bit-fields are smaller than one byte. This ordering is
indeed implementation specific.
A good (or bad) example of this is the way registers are encoded in
eBPF instructions. eBPF is the in-kernel virtual machine of Linux,
and features an ISA with ten general-purpose registers. eBPF
instructions generally use two registers, namely the source register
and the destination register. Each register is encoded using 4 bits,
and the fields encoding registers are consecutive in the instructions.
Typical. However, the kernel hackers used the following C struct to
denote BPF instructions:
,----
| struct bpf_insn {
| __u8 code; /* opcode */
| __u8 dst_reg:4; /* dest register */
| __u8 src_reg:4; /* source register */
| __s16 off; /* signed offset */
| __s32 imm; /* signed immediate constant */
| };
`----
The ordering of the sub-byte bit fields (like dst_reg and src_reg) can
change depending on the endianess, and it is implementation-specific.
In the case of GCC the source and destination register fields are
switched depending on the endianness.
In big-endian systems the order is:
,----
| dst:4 src:4
`----
Whereas in little-endian systems the order is:
,----
| src:4 dst:4
`----
In Poke, the obvious way of representing data whose structure depends
on some condition is using an union. In this case, it could read like
this:
,----
| type BPF_Insn_Regs =
| union
| {
| struct
| {
| BPF_Reg src;
| BPF_Reg dst;
| } le : get_endian == ENDIAN_LITTLE;
|
| struct
| {
| BPF_Reg dst;
| BPF_Reg src;
| } be;
| };
`----
Note the call to the get_endian function (which takes no arguments and
thus can be called Algol68-style, without specifying an empty argument
list) in the constraint of the union alternative. This way, the
register fields will have the right order corresponding to the current
endianness.
Nifty. However, there is an ever better way to denote the structure
of these fields. This is it:
,----
| type BPF_Insn_Regs =
| struct
| {
| var little_p = (get_endian == ENDIAN_LITTLE);
|
| BPF_Reg src @ !little_p * 4#b;
| BPF_Reg dst @ little_p * 4#b;
| };
`----
This version, where the ordering of the fields is implemented using
field labels, is not only more compact, but also has the virtue of not
requiring additional "intermediate" fields like le and be above. It
also shows how convenient can be to declare variables inside structs.