GONet Raw Data Format

Overview

GONet cameras save raw image data in a special 12-bit packed format. Each frame is saved as a .jpg file, but the pixel data inside the file is encoded using a compact Bayer-mosaiced scheme with 12 bits per pixel. This allows storage of high dynamic range data with minimal space usage.

The sensor uses a BGGR Bayer pattern, as shown below:

Byte Parsing

Each pixel is encoded with 12 bits, and every group of two pixels is packed into 3 consecutive bytes. The format is as follows:

Byte 0: lower 8 bits of Pixel 0
Byte 1: lower 8 bits of Pixel 1
Byte 2: upper 4 bits of both Pixel 0 and Pixel 1, packed as two nibbles

This packaging is commonly known as a 12-bit packed little endian format. The following diagram shows how 3 bytes form 2 pixel values:

Byte-wise packing of 12-bit pixels — Packing of 12-bit pixel values into 8-bit bytes.

In GONetFile._parse_jpg_file() raw files are read in binary mode. In order to reconstruct the original pixels values from the bytes, the following procedures are executed. Let’s assume the bytes are listed in a bytes_array numpy.array

To recreate the first pixel, the first element of every block of 3 elements of bytes_array is left shifted by 4 bits (using the operator <<). The third element of every block of 3 elements is then cut to the first 4 less significant bits (done by using an & operator with the number 15, which is 1111). These 2 new numbers are then summed.
```
byte0 = b'00010110'
byte0_left_shifted = byte0 << 4 # -> 000101100000
byte2 = b'10100111'
byte2_cut = byte2 & 15 # -> 0111
pixel0 = byte0_left_shifted + byte2_cut # -> 000101100111
```
To recreate the second pixel, the second element of every block of 3 elements of bytes_array is left shifted by 4 bits. The third element of every block of 3 elements is then right shifted by 4 bits (using the operator >>). These 2 new numbers are then summed.
```
byte0 = b'11000110'
byte0_left_shifted = byte0 << 4 # -> 110001100000
byte2_right_shifted = byte2 >> 4 # -> 1010
pixel0 = byte0_left_shifted + byte2_right_shifted # -> 110001101010
```

Each recreated pixel is then stored in each channel following the bayer pattern.

Reconstructing the full image

After decoding the raw bytes from the sensor, we reconstruct the pixel values and arrange them into three separate images—one for each color channel: red, green, and blue. Due to the nature of the Bayer pattern, the sensor contains twice as many green pixels as red or blue. To create a balanced color representation, we define a super pixel as a 2×2 block consisting of one red, one blue, and two green pixels. For each super pixel, we compute the green value as the average of its two green pixels, while red and blue are taken directly. As a result of this averaging, the green channel will exhibit different statistical properties, such as reduced variance, compared to the red and blue channels.