// Copyright (c) The Diem Core Contributors
// SPDX-License-Identifier: Apache-2.0

#![forbid(unsafe_code)]

//! # Binary Canonical Serialization (BCS)
//!
//! BCS (formerly "Libra Canonical Serialization" or LCS) is a serialization format developed
//! in the context of the [Diem](https://diem.com) blockchain.
//!
//! BCS was designed with the following main goals in mind:
//! * provide good performance and concise (binary) representations;
//! * support a rich set of data types commonly used in Rust;
//! * enforce canonical serialization, meaning that every value of a given type should have
//! a single valid representation.
//!
//! BCS also aims to mitigate the consequences of malicious inputs by enforcing well-defined limits
//! on large or nested containers during (de)serialization.
//!
//! ## Rust Implementation
//!
//! This crate provides a Rust implementation of BCS as an encoding format for the [Serde library](https://serde.rs).
//! As such, this implementation covers most data types supported by Serde -- including user-defined structs,
//! tagged variants (Rust enums), tuples, and maps -- excluding floats, single Unicode characters (char), and sets.
//!
//! BCS is also available in other programming languages, thanks to the separate project [serde-reflection](https://github.com/novifinancial/serde-reflection).
//!
//! ## Application to Cryptography
//!
//! The BCS format guarantees canonical serialization, meaning that for any given data type, there
//! is a one-to-one correspondence between in-memory values and valid byte representations.
//!
//! In the context of a cryptographic application, canonical serialization has several benefits:
//! * It provides a natural and reliable way to associate in-memory values with cryptographic hashes.
//! * It allows the signature of a message to be defined equivalently as the signature of the serialized bytes or as the signature of the in-memory value.
//!
//! Note that BCS ensures canonical serialization for each data type separately. The data type of a serialized value
//! must be enforced by the application itself. This requirement is typically fulfilled
//! using unique hash seeds for each data type. (See [Diem's cryptographic library](https://github.com/diem/diem/blob/master/crypto/crypto/src/hash.rs) for an example.)
//!
//! ## Backwards Compatibility
//!
//! By design, BCS does not provide implicit versioning or backwards/forwards compatibility; therefore,
//! applications must carefully plan in advance for ad hoc extension points:
//! * Enums may be used for explicit versioning and backward compatibility (e.g. extensible query interfaces).
//! * In some cases, data fields of type `Vec<u8>` may also be added to allow (future) unknown payloads
//! in serialized form.
//!
//! ## Detailed Specifications
//!
//! BCS supports the following data types:
//!
//! * Booleans
//! * Signed 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integers
//! * Unsigned 8-bit, 16-bit, 32-bit, 64-bit, and 128-bit integers
//! * Option
//! * Unit (an empty value)
//! * Fixed and variable length sequences
//! * UTF-8 encoded strings
//! * Tuples
//! * Structures (aka "structs")
//! * Externally tagged enumerations (aka "enums")
//! * Maps
//!
//! BCS is not a self-describing format. As such, in order to deserialize a message, one must
//! know the message type and layout ahead of time.
//!
//! Unless otherwise specified, all numbers are stored in little-endian, two's complement format.
//!
//! ### Recursion and Depth of BCS Data
//!
//! Recursive data structures (e.g. trees) are allowed. However, because of the possibility of stack
//! overflow during (de)serialization, the *container depth* of any valid BCS data cannot exceed the constant
//! `MAX_CONTAINER_DEPTH`. Formally, we define *container depth* as the number of structs and enums traversed
//! during (de)serialization.
//!
//! This definition aims to minimize the number of operations while ensuring that
//! (de)serialization of a known BCS format cannot cause arbitrarily large stack allocations.
//!
//! For example, if `v1` and `v2` are values of depth `n1` and `n2` respectively:
//! * a struct value `Foo { v1, v2 }` has depth `1 + max(n1, n2)`;
//! * an enum value `E::Foo { v1, v2 }` has depth `1 + max(n1, n2)`;
//! * a pair `(v1, v2)` has depth `max(n1, n2)`;
//! * the value `Some(v1)` has depth `n1`.
//!
//! All string and integer values have depth `0`.
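//!
//! The rules above can be illustrated with a small, std-only sketch. `Value`, `depth`, and
//! `max_depth` are hypothetical names that model BCS data for this example only; they are not
//! part of this crate's API. Following the definition, only structs and enums add a level, while
//! options, tuples, and sequences take the maximum depth of their contents.
//!
//! ```rust
//! /// Hypothetical, simplified model of BCS values (illustration only).
//! enum Value {
//!     Int(u64),
//!     Str(String),
//!     Opt(Option<Box<Value>>),
//!     Tuple(Vec<Value>),
//!     Seq(Vec<Value>),
//!     /// A struct, or an enum variant, holding these fields.
//!     Container(Vec<Value>),
//! }
//!
//! /// Maximum depth among `values`, or 0 if there are none.
//! fn max_depth(values: &[Value]) -> usize {
//!     values.iter().map(depth).max().unwrap_or(0)
//! }
//!
//! /// Container depth: the number of nested structs and enums.
//! fn depth(value: &Value) -> usize {
//!     match value {
//!         Value::Int(_) | Value::Str(_) | Value::Opt(None) => 0,
//!         Value::Opt(Some(inner)) => depth(inner),
//!         Value::Tuple(items) | Value::Seq(items) => max_depth(items),
//!         Value::Container(fields) => 1 + max_depth(fields),
//!     }
//! }
//! ```
//!
//! For instance, a struct holding `Some(x)`, where `x` is itself a struct of integers, has depth 2.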
//!
//! ### Booleans and Integers
//!
//! |Type                       |Original data          |Hex representation |Serialized bytes        |
//! |---                        |---                    |---                |---                     |
//! |Boolean                    |True / False           |0x01 / 0x00        |01 / 00                 |
//! |8-bit signed integer       |-1                     |0xFF               |FF                      |
//! |8-bit unsigned integer     |1                      |0x01               |01                      |
//! |16-bit signed integer      |-4660                  |0xEDCC             |CC ED                   |
//! |16-bit unsigned integer    |4660                   |0x1234             |34 12                   |
//! |32-bit signed integer      |-305419896             |0xEDCBA988         |88 A9 CB ED             |
//! |32-bit unsigned integer    |305419896              |0x12345678         |78 56 34 12             |
//! |64-bit signed integer      |-1311768467750121216   |0xEDCBA98754321100 |00 11 32 54 87 A9 CB ED |
//! |64-bit unsigned integer    |1311768467750121216    |0x12345678ABCDEF00 |00 EF CD AB 78 56 34 12 |
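//!
//! The same layout can be reproduced with the standard library alone. For every fixed-size
//! integer type, the `to_le_bytes` method returns the little-endian, two's complement bytes,
//! which also covers the 128-bit types not listed in the table. The helpers below are a
//! hypothetical, std-only sketch rather than this crate's implementation. In particular,
//! `decode_bool` assumes that a canonical decoder must reject any byte other than `0x00` and `0x01`.
//!
//! ```rust
//! /// Encodes an `i64` as its 8 little-endian, two's complement bytes.
//! fn encode_i64(value: i64) -> [u8; 8] {
//!     value.to_le_bytes()
//! }
//!
//! /// Encodes a `u128` as its 16 little-endian bytes.
//! fn encode_u128(value: u128) -> [u8; 16] {
//!     value.to_le_bytes()
//! }
//!
//! /// Encodes a boolean as a single byte: `0x01` for true, `0x00` for false.
//! fn encode_bool(value: bool) -> u8 {
//!     u8::from(value)
//! }
//!
//! /// Decodes a boolean byte. Accepting any other byte would allow a second representation.
//! fn decode_bool(byte: u8) -> Option<bool> {
//!     match byte {
//!         0x00 => Some(false),
//!         0x01 => Some(true),
//!         _ => None,
//!     }
//! }
//! ```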
//!
//! ### ULEB128-Encoded Integers
//!
//! The BCS format also uses the [ULEB128 encoding](https://en.wikipedia.org/wiki/LEB128) internally
//! to represent unsigned 32-bit integers in two cases where small values are usually expected:
//! (1) lengths of variable-length sequences and (2) variant indices of enum values (see the
//! corresponding sections below).
//!
//! |Type                       |Original data          |Hex representation |Serialized bytes   |
//! |---                        |---                    |---                |---                |
//! |ULEB128-encoded u32-integer|2^0 = 1                |0x00000001         |01                 |
//! |                           |2^7 = 128              |0x00000080         |80 01              |
//! |                           |2^14 = 16384           |0x00004000         |80 80 01           |
//! |                           |2^21 = 2097152         |0x00200000         |80 80 80 01        |
//! |                           |2^28 = 268435456       |0x10000000         |80 80 80 80 01     |
//! |                           |9487                   |0x0000250f         |8f 4a              |
//!
//! In general, a ULEB128 encoding consists of a little-endian sequence of base-128 (7-bit)
//! digits. Each digit is completed into a byte by setting the highest bit to 1, except for the
//! last (highest-significance) digit, whose highest bit is set to 0.
//!
//! In BCS, the result of decoding ULEB128 bytes is required to fit into a 32-bit unsigned
//! integer and be in canonical form. For instance, the following values are rejected:
//! * 80 80 80 80 80 01 (2^35) is too large.
//! * 80 80 80 80 10 (2^32) is too large.
//! * 80 00 is not a minimal encoding of 0.
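//!
//! The encoding rule and the canonical-form checks above can be sketched with the standard
//! library alone. The names `uleb128_encode` and `uleb128_decode` are hypothetical and the
//! error strings are illustrative; this is not the code used internally by this crate.
//!
//! ```rust
//! /// Encodes `value` as ULEB128: 7-bit digits, least significant first, with the
//! /// high bit set on every byte except the last.
//! fn uleb128_encode(mut value: u32) -> Vec<u8> {
//!     let mut out = Vec::new();
//!     while value >= 0x80 {
//!         out.push((value & 0x7f) as u8 | 0x80);
//!         value >>= 7;
//!     }
//!     out.push(value as u8);
//!     out
//! }
//!
//! /// Decodes a ULEB128 prefix of `bytes`, returning the value and the number of bytes
//! /// read. Rejects values that do not fit in a `u32` and non-minimal encodings.
//! fn uleb128_decode(bytes: &[u8]) -> Result<(u32, usize), &'static str> {
//!     let mut value: u64 = 0;
//!     for (i, &byte) in bytes.iter().enumerate() {
//!         if i == 5 {
//!             // Five 7-bit digits already cover every u32 value.
//!             return Err("too many bytes for a u32");
//!         }
//!         let digit = u64::from(byte & 0x7f);
//!         value |= digit << (7 * i);
//!         if (byte & 0x80) == 0 {
//!             if value > u64::from(u32::MAX) {
//!                 return Err("value does not fit in a u32");
//!             }
//!             // A zero final digit is only allowed in the one-byte encoding of 0.
//!             if digit == 0 && i > 0 {
//!                 return Err("non-minimal encoding");
//!             }
//!             return Ok((value as u32, i + 1));
//!         }
//!     }
//!     Err("unexpected end of input")
//! }
//! ```
//!
//! For example, `uleb128_encode(9487)` returns `[0x8f, 0x4a]`, matching the last row of the table,
//! and all three byte strings listed above make `uleb128_decode` return an error.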
//!
//! ### Optional Data
//!
//! Optional or nullable data either exists in its full representation or does not. BCS represents
//! this as a single byte indicating the presence (`0x01`) or absence (`0x00`) of data. If the data
//! is present, its serialized form follows. For example:
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # fn main() -> Result<()> {
//! let some_data: Option<u8> = Some(8);
//! assert_eq!(to_bytes(&some_data)?, vec![1, 8]);
//!
//! let no_data: Option<u8> = None;
//! assert_eq!(to_bytes(&no_data)?, vec![0]);
//! # Ok(())}
//! ```
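//!
//! Because every value must have a single valid representation, a decoder should accept only
//! `0x00` and `0x01` as the leading byte of an optional value. The std-only sketch below
//! illustrates this for `Option<u8>`. `decode_option_u8` is a hypothetical helper, not part of
//! this crate's API, and its error strings are illustrative.
//!
//! ```rust
//! /// Decodes an `Option<u8>` from the front of `bytes` and returns it
//! /// together with the unread remainder of the input.
//! fn decode_option_u8(bytes: &[u8]) -> Result<(Option<u8>, &[u8]), &'static str> {
//!     match bytes {
//!         [0x00, rest @ ..] => Ok((None, rest)),
//!         [0x01, value, rest @ ..] => Ok((Some(*value), rest)),
//!         [0x01] => Err("missing payload after tag 0x01"),
//!         [] => Err("unexpected end of input"),
//!         // Accepting any other tag byte would allow a second encoding of the same value.
//!         _ => Err("invalid option tag"),
//!     }
//! }
//! ```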
//!
//! ### Fixed and Variable Length Sequences
//!
//! Sequences can be made up of any BCS-supported types (even complex structures), but all
//! elements in the sequence must be of the same type. If the length of a sequence is fixed and
//! well known, then BCS represents this as just the concatenation of the serialized form of each
//! individual element in the sequence. If the length of the sequence can be variable, then the
//! serialized sequence is length prefixed with a ULEB128-encoded unsigned integer indicating
//! the number of elements in the sequence. All variable length sequences must be
//! `MAX_SEQUENCE_LENGTH` elements long or less.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # fn main() -> Result<()> {
//! let fixed: [u16; 3] = [1, 2, 3];
//! assert_eq!(to_bytes(&fixed)?, vec![1, 0, 2, 0, 3, 0]);
//!
//! let variable: Vec<u16> = vec![1, 2];
//! assert_eq!(to_bytes(&variable)?, vec![2, 1, 0, 2, 0]);
//!
//! // `()` serializes to zero bytes, so only the ULEB128 length prefix remains.
//! let large_variable_length: Vec<()> = vec![(); 9_487];
//! assert_eq!(to_bytes(&large_variable_length)?, vec![0x8f, 0x4a]);
//! # Ok(())}
//! ```
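//!
//! For illustration only, here is a std-only sketch of the variable-length rule for a `&[u16]`:
//! a length check against `MAX_SEQUENCE_LENGTH`, the ULEB128-encoded element count, then each
//! element in order. The helpers `write_uleb128` and `encode_u16_seq` are hypothetical, and the
//! constant is redefined locally (with the same value as this crate's) so that the sketch is
//! self-contained.
//!
//! ```rust
//! /// Local copy of the sequence-length limit, `2^31 - 1`.
//! const MAX_SEQUENCE_LENGTH: usize = (1 << 31) - 1;
//!
//! /// Appends `value` to `out` as ULEB128 (7-bit digits, least significant first).
//! fn write_uleb128(out: &mut Vec<u8>, mut value: u32) {
//!     while value >= 0x80 {
//!         out.push((value & 0x7f) as u8 | 0x80);
//!         value >>= 7;
//!     }
//!     out.push(value as u8);
//! }
//!
//! /// Serializes `items` as a length-prefixed sequence of little-endian `u16`s.
//! fn encode_u16_seq(items: &[u16]) -> Result<Vec<u8>, &'static str> {
//!     if items.len() > MAX_SEQUENCE_LENGTH {
//!         return Err("sequence is too long");
//!     }
//!     let mut out = Vec::with_capacity(5 + 2 * items.len());
//!     // The check above guarantees that the length fits in a u32.
//!     write_uleb128(&mut out, items.len() as u32);
//!     for item in items {
//!         out.extend_from_slice(&item.to_le_bytes());
//!     }
//!     Ok(out)
//! }
//! ```
//!
//! A fixed-length array such as `[u16; 3]` would skip the length prefix and emit only the elements.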
//!
//! ### Strings
//!
//! Only valid UTF-8 Strings are supported. BCS serializes such strings as a variable length byte
//! sequence, i.e. length prefixed with a ULEB128-encoded unsigned integer followed by the byte
//! representation of the string.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # fn main() -> Result<()> {
//! // Note that this string has 10 characters but has a byte length of 24
//! let utf8_str = "çå∞≠¢õß∂ƒ∫";
//! let expecting = vec![
//!     24, 0xc3, 0xa7, 0xc3, 0xa5, 0xe2, 0x88, 0x9e, 0xe2, 0x89, 0xa0, 0xc2,
//!     0xa2, 0xc3, 0xb5, 0xc3, 0x9f, 0xe2, 0x88, 0x82, 0xc6, 0x92, 0xe2, 0x88, 0xab,
//! ];
//! assert_eq!(to_bytes(&utf8_str)?, expecting);
//! # Ok(())}
//! ```
//!
//! ### Tuples
//!
//! Tuples are typed compositions of objects: `(Type0, Type1)`.
//!
//! Tuples are considered a fixed length sequence where each element in the sequence can be a
//! different type supported by BCS. Each element of a tuple is serialized in the order it is
//! defined within the tuple, i.e. [tuple.0, tuple.1].
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # fn main() -> Result<()> {
//! // -1i8 is the single byte 0xFF; the string is length-prefixed.
//! let tuple = (-1i8, "diem");
//! let expecting = vec![0xFF, 4, b'd', b'i', b'e', b'm'];
//! assert_eq!(to_bytes(&tuple)?, expecting);
//! # Ok(())}
//! ```
//!
//! ### Structures
//!
//! Structures are fixed length sequences consisting of fields with potentially different types.
//! Each field within a struct is serialized in the order specified by the canonical structure
//! definition. Structs can be nested within other structs, in which case BCS recurses into each
//! inner struct and serializes its fields in order. There are no labels in the serialized format;
//! the order of the fields alone defines the organization within the serialization stream.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # use serde::Serialize;
//! # fn main() -> Result<()> {
//! #[derive(Serialize)]
//! struct MyStruct {
//!     boolean: bool,
//!     bytes: Vec<u8>,
//!     label: String,
//! }
//!
//! #[derive(Serialize)]
//! struct Wrapper {
//!     inner: MyStruct,
//!     name: String,
//! }
//!
//! let s = MyStruct {
//!     boolean: true,
//!     bytes: vec![0xC0, 0xDE],
//!     label: "a".to_owned(),
//! };
//! let s_bytes = to_bytes(&s)?;
//! // bool, then length-prefixed bytes, then length-prefixed string; no field names.
//! let mut expecting = vec![1, 2, 0xC0, 0xDE, 1, b'a'];
//! assert_eq!(s_bytes, expecting);
//!
//! let w = Wrapper {
//!     inner: s,
//!     name: "b".to_owned(),
//! };
//! let w_bytes = to_bytes(&w)?;
//! // The nested struct is serialized inline, before the remaining fields.
//! assert!(w_bytes.starts_with(&s_bytes));
//!
//! expecting.extend_from_slice(&[1, b'b']);
//! assert_eq!(w_bytes, expecting);
//! # Ok(())}
//! ```
//!
//! ### Externally Tagged Enumerations
//!
//! An enumeration is typically represented as a type that can take one of potentially many
//! different variants. In BCS, each variant is mapped to a variant index, a ULEB128-encoded 32-bit unsigned
//! integer, followed by serialized data if the variant has an associated value. An
//! associated value can be of any BCS-supported type. The variant index is determined by the
//! order of the variants in the canonical enum definition, where the first variant has an index
//! of `0`, the second an index of `1`, etc.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # use serde::Serialize;
//! # fn main() -> Result<()> {
//! #[derive(Serialize)]
//! enum E {
//!     Variant0(u16),
//!     Variant1(u8),
//!     Variant2(String),
//! }
//!
//! let v0 = E::Variant0(8000);
//! let v1 = E::Variant1(255);
//! let v2 = E::Variant2("e".to_owned());
//!
//! // Variant index first, then the payload (8000 = 0x1F40, little-endian).
//! assert_eq!(to_bytes(&v0)?, vec![0, 0x40, 0x1F]);
//! assert_eq!(to_bytes(&v1)?, vec![1, 0xFF]);
//! assert_eq!(to_bytes(&v2)?, vec![2, 1, b'e']);
//! # Ok(())}
//! ```
//!
//! If you need to serialize a C-style enum, you should use a primitive integer type.
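//!
//! To make the "index, then payload" layout explicit, the sketch below encodes and decodes a
//! hypothetical `Message` enum by hand, using only the standard library; it is not how this
//! crate is implemented. Since every index used here is below 128, its canonical ULEB128 form
//! is a single byte, so any other first byte is rejected, being either an unknown index or the
//! start of a multi-byte tag. The fieldless `Ping` variant is encoded as its index alone.
//!
//! ```rust
//! /// Hypothetical enum, used only for this sketch.
//! #[derive(Debug, PartialEq)]
//! enum Message {
//!     Ping,           // index 0, no payload
//!     Byte(u8),       // index 1, one byte
//!     Pair(u16, u16), // index 2, two little-endian u16 values
//! }
//!
//! fn encode_message(message: &Message) -> Vec<u8> {
//!     match message {
//!         Message::Ping => vec![0],
//!         Message::Byte(b) => vec![1, *b],
//!         Message::Pair(a, b) => {
//!             let mut out = vec![2];
//!             out.extend_from_slice(&a.to_le_bytes());
//!             out.extend_from_slice(&b.to_le_bytes());
//!             out
//!         }
//!     }
//! }
//!
//! /// Decodes a `Message`, requiring the input to be consumed exactly.
//! fn decode_message(bytes: &[u8]) -> Result<Message, &'static str> {
//!     let (&index, payload) = bytes.split_first().ok_or("empty input")?;
//!     match (index, payload) {
//!         (0, []) => Ok(Message::Ping),
//!         (1, [b]) => Ok(Message::Byte(*b)),
//!         (2, [a0, a1, b0, b1]) => Ok(Message::Pair(
//!             u16::from_le_bytes([*a0, *a1]),
//!             u16::from_le_bytes([*b0, *b1]),
//!         )),
//!         (0..=2, _) => Err("payload has the wrong length"),
//!         // An unknown index, or the first byte of a multi-byte ULEB128 tag, which here
//!         // could only denote an index >= 128 or be a non-minimal encoding.
//!         _ => Err("unknown variant index"),
//!     }
//! }
//! ```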
//!
//! ### Maps (Key / Value Stores)
//!
//! Maps are represented as a variable-length, sorted sequence of (Key, Value) tuples. Keys must be
//! unique, and the tuples are sorted by increasing lexicographical order on the BCS bytes of each key.
//! The representation is otherwise similar to that of a variable-length sequence. In particular,
//! it is preceded by the number of tuples, encoded in ULEB128.
//!
//! ```rust
//! # use bcs::{Result, to_bytes};
//! # use std::collections::HashMap;
//! # fn main() -> Result<()> {
//! let mut map = HashMap::new();
//! map.insert(b'e', b'f');
//! map.insert(b'a', b'b');
//! map.insert(b'c', b'd');
//!
//! // Regardless of insertion order, entries are emitted sorted by their serialized keys.
//! let expecting = vec![(b'a', b'b'), (b'c', b'd'), (b'e', b'f')];
//!
//! assert_eq!(to_bytes(&map)?, to_bytes(&expecting)?);
//! # Ok(())}
//! ```
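//!
//! Ordering by serialized key bytes is not always the same as the natural order of the key type.
//! In the std-only sketch below (the helpers `write_uleb128` and `encode_i8_map` are hypothetical,
//! not part of this crate), an `i8` key of `-1` serializes to `0xFF` and therefore sorts after
//! `0` and `1`; duplicate keys are rejected.
//!
//! ```rust
//! use std::collections::BTreeMap;
//!
//! /// Appends `value` to `out` as ULEB128 (7-bit digits, least significant first).
//! fn write_uleb128(out: &mut Vec<u8>, mut value: u32) {
//!     while value >= 0x80 {
//!         out.push((value & 0x7f) as u8 | 0x80);
//!         value >>= 7;
//!     }
//!     out.push(value as u8);
//! }
//!
//! /// Serializes `(key, value)` pairs as a map, sorted by the serialized bytes of each key.
//! fn encode_i8_map(entries: &[(i8, u8)]) -> Result<Vec<u8>, &'static str> {
//!     // A BTreeMap keyed by the serialized key orders entries lexicographically
//!     // on those bytes and makes duplicate keys easy to detect.
//!     let mut sorted: BTreeMap<Vec<u8>, u8> = BTreeMap::new();
//!     for &(key, value) in entries {
//!         if sorted.insert(key.to_le_bytes().to_vec(), value).is_some() {
//!             return Err("duplicate key");
//!         }
//!     }
//!     let mut out = Vec::new();
//!     write_uleb128(&mut out, sorted.len() as u32);
//!     for (key_bytes, value) in &sorted {
//!         out.extend_from_slice(key_bytes);
//!         out.push(*value);
//!     }
//!     Ok(out)
//! }
//! ```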

mod de;
mod error;
mod ser;
pub mod test_helpers;

/// Variable length sequences in BCS are limited to a maximum length of 2^31 - 1.
pub const MAX_SEQUENCE_LENGTH: usize = (1 << 31) - 1;

/// Maximum allowed depth of BCS data, counting only structs and enums.
pub const MAX_CONTAINER_DEPTH: usize = 500;

pub use de::{
    from_bytes, from_bytes_seed, from_bytes_seed_with_limit, from_bytes_with_limit, from_reader,
    from_reader_seed, from_reader_seed_with_limit, from_reader_with_limit,
};
pub use error::{Error, Result};
pub use ser::{
    is_human_readable, serialize_into, serialize_into_with_limit, serialized_size,
    serialized_size_with_limit, to_bytes, to_bytes_with_limit,
};