finalfusion specification

Version 0

Goals

finalfusion is a format for storing word embeddings. The goals of the first version of the finalfusion format are:

- Easy to parse
- Fast to parse
- Extensible
- Support for:
  - Memory mapping
  - Tokens with spaces
  - Subword units
  - Quantized matrices
- Existing embeddings should be convertible

Each finalfusion file consists of a header, followed by chunks. Currently, a finalfusion file must contain its chunks in the following order:

The permitted chunks may be extended in a future version of the specification. In particular, we would like to make it possible:

All data must be in little endian byte order.
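
As an illustration of the byte-order rule, the following minimal Python sketch encodes and decodes a few of the primitive types with the standard `struct` module. The helper names (`encode_u32`, `decode_u64`, `encode_f32`) are hypothetical and are not part of the specification.

```python
import struct

# In struct format strings, "<" selects little-endian byte order with
# standard sizes and no implicit alignment padding.

def encode_u32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer in little-endian order."""
    return struct.pack("<I", value)

def decode_u64(buf: bytes, offset: int = 0) -> int:
    """Decode an unsigned 64-bit little-endian integer at `offset`."""
    return struct.unpack_from("<Q", buf, offset)[0]

def encode_f32(value: float) -> bytes:
    """Encode a 32-bit IEEE 754 float in little-endian order."""
    return struct.pack("<f", value)
```

The least significant byte comes first, so the u32 value 1 is stored as the bytes 01 00 00 00.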

The header consists of:

i8
u8
i16
u16
i32
u32
i64
u64
i128
u128
f32
f64

The chunk format is as follows:

Chunk identifier: 1
Vocab length: u64 (vocab_len)
vocab_len times:
- word length in bytes: u32 (word_len)
- word_len times u8.
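
The word list above can be read and written with a few lines of Python. This is a minimal sketch, not a reference implementation. It assumes that the chunk identifier and any framing that precedes the vocab length have already been consumed, and that words are encoded as UTF-8 (the layout above only says that each word is a sequence of `word_len` bytes). The function names are hypothetical.

```python
import struct
from typing import List, Tuple

def read_words(buf: bytes, offset: int = 0) -> Tuple[List[str], int]:
    """Read a u64 count followed by that many length-prefixed words.

    Returns the words and the offset just past the last word.
    """
    (vocab_len,) = struct.unpack_from("<Q", buf, offset)
    offset += 8
    words = []
    for _ in range(vocab_len):
        (word_len,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        raw = buf[offset:offset + word_len]
        if len(raw) != word_len:
            raise ValueError("truncated word")
        words.append(raw.decode("utf-8"))  # assumption: UTF-8 encoded words
        offset += word_len
    return words, offset

def write_words(words: List[str]) -> bytes:
    """Serialize words as a u64 count plus length-prefixed byte strings."""
    out = [struct.pack("<Q", len(words))]
    for word in words:
        raw = word.encode("utf-8")
        out.append(struct.pack("<I", len(raw)))  # length in bytes, not characters
        out.append(raw)
    return b"".join(out)
```

Because every word carries an explicit byte length, tokens that contain spaces (such as "new york") need no escaping.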

Chunk identifier: 3
Minimum n-gram length: u32
Maximum n-gram length: u32
Bucket exponent: u32
Vocab length: u64 (vocab_len)
vocab_len times:
- word length in bytes: u32 (word_len)
- word_len times u8.

Chunk identifier: 7
Minimum n-gram length: u32
Maximum n-gram length: u32
Number of buckets: u32
Vocab length: u64 (vocab_len)
vocab_len times:
- word length in bytes: u32 (word_len)
- word_len times u8.
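
Chunks 3 and 7 share the same field layout and differ only in the meaning of the third u32 field (bucket exponent versus number of buckets). The sketch below reads either one. It assumes that the chunk identifier and any preceding framing have already been consumed and that words are UTF-8 encoded. The names `BucketVocab`, `buckets_field`, and `read_bucket_vocab` are hypothetical.

```python
import struct
from typing import List, NamedTuple

class BucketVocab(NamedTuple):
    min_n: int          # minimum n-gram length
    max_n: int          # maximum n-gram length
    buckets_field: int  # bucket exponent (chunk 3) or number of buckets (chunk 7)
    words: List[str]

def read_bucket_vocab(buf: bytes, offset: int = 0) -> BucketVocab:
    """Read the fields of a bucketed subword vocabulary (chunk 3 or 7)."""
    # Three u32 fields followed by the u64 vocab length: 4 + 4 + 4 + 8 = 20 bytes.
    min_n, max_n, buckets_field, vocab_len = struct.unpack_from("<IIIQ", buf, offset)
    offset += 20
    words = []
    for _ in range(vocab_len):
        (word_len,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        # Assumption: words are UTF-8 encoded.
        words.append(buf[offset:offset + word_len].decode("utf-8"))
        offset += word_len
    return BucketVocab(min_n, max_n, buckets_field, words)
```

The caller decides how to interpret `buckets_field` based on the chunk identifier. Reading a bucket exponent `e` as `2 ** e` buckets is a plausible interpretation, but it is an assumption that the layout above does not state.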

Chunk identifier: 8
Minimum n-gram length: u32
Maximum n-gram length: u32
Vocab length: u64 (vocab_len)
vocab_len times:
- word length in bytes: u32 (word_len)
- word_len times u8.
N-gram vocab length: u64 (n_ngrams)
n_ngrams times:
- n-gram length in bytes: u32 (ngram_len)
- ngram_len times u8.
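
Chunk 8 stores two length-prefixed lists back to back: the words, then the n-grams. A minimal sketch of a reader follows, under the same assumptions as before: the chunk identifier and any preceding framing have already been consumed, and strings are UTF-8 encoded. The helper names are hypothetical.

```python
import struct
from typing import List, Tuple

def _read_strings(buf: bytes, offset: int) -> Tuple[List[str], int]:
    """Read a u64 count followed by that many u32-length-prefixed strings."""
    (count,) = struct.unpack_from("<Q", buf, offset)
    offset += 8
    items = []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        # Assumption: strings are UTF-8 encoded.
        items.append(buf[offset:offset + length].decode("utf-8"))
        offset += length
    return items, offset

def read_explicit_vocab(buf: bytes, offset: int = 0):
    """Return (min_n, max_n, words, ngrams) for an explicit n-gram vocabulary."""
    min_n, max_n = struct.unpack_from("<II", buf, offset)
    words, offset = _read_strings(buf, offset + 8)  # word list comes first
    ngrams, offset = _read_strings(buf, offset)     # n-gram list follows directly
    return min_n, max_n, words, ngrams
```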

Chunk identifier: 4
Use projection (0 or 1): u32
Use norms (0 or 1): u32
Quantized embedding length: u32 (quantized_len)
Reconstructed embedding length: u32 (reconstructed_len)
Number of quantizer centroids: u32
Quantized matrix rows: u64 (matrix_rows)
Quantized matrix type: u32 (quantized_type)
Reconstructed matrix type: u32 (reconstructed_type)
Padding, such that the data that follows starts at an offset that is a multiple of the size of the largest matrix data type.
Projection matrix: reconstructed_len x reconstructed_len x sizeof(reconstructed_type)
Subquantizers: quantized_len x number of quantizer centroids x (reconstructed_len / quantized_len) x sizeof(reconstructed_type)
Norms: matrix_rows x sizeof(reconstructed_type)
Quantized embedding matrix: matrix_rows x quantized_len x sizeof(quantized_type)
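
The fixed-size fields of chunk 4 and the padding rule can be sketched as follows. This is an illustrative sketch rather than a reference implementation. It assumes that the chunk identifier and any preceding framing have already been consumed, and that the padding is computed from the absolute file offset at which the matrix data would start. The sizes of the two matrix types are passed in by the caller, because the mapping from numeric type identifiers to types is not listed here. All names are hypothetical.

```python
import struct
from typing import NamedTuple, Tuple

# Five u32 fields, one u64, two u32 fields: 5*4 + 8 + 2*4 = 36 bytes.
HEADER = struct.Struct("<IIIIIQII")

class QuantizedHeader(NamedTuple):
    use_projection: bool
    use_norms: bool
    quantized_len: int
    reconstructed_len: int
    n_centroids: int
    matrix_rows: int
    quantized_type: int      # numeric type identifier, left uninterpreted
    reconstructed_type: int  # numeric type identifier, left uninterpreted

def read_quantized_header(buf: bytes, offset: int = 0) -> Tuple[QuantizedHeader, int]:
    """Read the fixed-size fields and return them with the next offset."""
    fields = HEADER.unpack_from(buf, offset)
    if fields[0] not in (0, 1) or fields[1] not in (0, 1):
        raise ValueError("projection and norms flags must be 0 or 1")
    header = QuantizedHeader(bool(fields[0]), bool(fields[1]), *fields[2:])
    return header, offset + HEADER.size

def padding_len(offset: int, quantized_size: int, reconstructed_size: int) -> int:
    """Bytes of padding needed so that `offset` becomes a multiple of the
    size of the larger of the two matrix data types."""
    alignment = max(quantized_size, reconstructed_size)
    return (alignment - offset % alignment) % alignment
```

For example, with a 1-byte quantized type and a 4-byte reconstructed type, data that would otherwise start at offset 38 needs 2 bytes of padding to start at offset 40. Aligning the data this way is what allows the matrices to be memory mapped and accessed in place.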