libwebsockets/READMEs/README.json-lejp.md
Andy Green dcaa0013b4 lecp: add CBOR stream parser LECP like JSON LEJP
This provides very memory-efficient CBOR stream parsing
and writing.

The parser  converts pieces of CBOR into callbacks that define
the structure and collate string and blobs into buffer chunks
for extensible and easy access.

It is fragementation-safe and does not need all the CBOR in
the same place at one time, chunks of CBOR are parsed and
discarded as provided.

It does not allocate and just needs a few hundred bytes of
stack for even huge CBOR objects.  Huge strings and blobs
are handled without needing memory to hold them atomically.

Includes ./minimal-examples/api-tests/api-test-lecp that
unit tests it against 82 official example CBORs and
26 additional test vectors from COSE (just checking the CBOR
parsing).

The writing apis allow printf style semantics with a variety
of CBOR-aware %-formats.  The apis write into a context that
manages output buffer usage, if the output buffer fills,
then the apis return with an AGAIN code that lets you issue
and reset the output buffer and repeat the api all to issue
more output.  The subsequent calls can occur much later or
from a different function context, so this is perfect for
WRITEABLE-mediated output from the network parts of lws.

See ./READMEs/README.cbor-lecp.md
2021-08-21 17:44:40 +01:00

107 lines
4.1 KiB
Markdown

# LEJP JSON Stream Parser
|||
|---|---|---|
|cmake| `LWS_WITH_LEJP`|
|Header| ./include/libwebsockets/lws-lejp.h|
|api-test| ./minimal-examples/api-tests/api-test-lejp/|
|test app| ./test-apps/test-lejp.c -> libwebsockets-test-lejp|
LEJP is a lightweight JSON stream parser.
The features are:
- completely immune to input fragmentation, give it any size blocks of JSON as
they become available, 1 byte, or 100K at a time give identical parsing
results
- input chunks discarded as they are parsed, whole JSON never needed in memory
- nonrecursive, fixed stack usage of a few dozen bytes
- no heap allocations at all, just requires ~500 byte context usually on
caller stack
- creates callbacks to a user-provided handler as members are parsed out
- no payload size limit, supports huge / endless strings bigger than
system memory
- collates utf-8 text payloads into a 250-byte chunk buffer in the json parser
context object for ease of access
## Type handling
LEJP leaves all numbers in text form, they are signalled in different callbacks
according to int or float, but delivered as text strings in the first
`ctx->npos` chars of `ctx->buf`.
For numeric types, you would typically use `atoi()` or similar to recover the
number as a host type.
## Callback reasons
The user callback does not have to handle any callbacks, it only needs to
process the data for the ones it is interested in.
|Callback reason|JSON structure|Associated data|
|---|---|---|
|`LEJPCB_CONSTRUCTED`|Created the parse context||
|`LEJPCB_DESTRUCTED`|Destroyed the parse context||
|`LEJPCB_COMPLETE`|The parsing completed OK||
|`LEJPCB_FAILED`|The parsing failed||
|`LEJPCB_VAL_TRUE`|boolean true||
|`LEJPCB_VAL_FALSE`|boolean false||
|`LEJPCB_VAL_NULL`|explicit NULL||
|`LEJPCB_PAIR_NAME`|The name part of a JSON `key: value` map pair|`ctx->buf`|
|`LEJPCB_VAL_STR_START`|A UTF-8 string is starting||
|`LEJPCB_VAL_STR_CHUNK`|The next string chunk|`ctx->npos` bytes in `ctx->buf`|
|`LEJPCB_VAL_STR_END`|The last string chunk|`ctx->npos` bytes in `ctx->buf`|
|`LEJPCB_ARRAY_START`|An array is starting||
|`LEJPCB_ARRAY_END`|An array has ended||
|`LEJPCB_OBJECT_START`|A JSON object is starting||
|`LEJPCB_OBJECT_END`|A JSON object has ended||
## Handling JSON UTF-8 strings
When a string is parsed, an advisory callback of `LECPCB_VAL_STR_START` occurs
first. No payload is delivered with the START callback.
Payload is collated into `ctx->buf[]`, the valid length is in `ctx->npos`.
For short strings or blobs where the length is known, the whole payload is
delivered in a single `LECPCB_VAL_STR_END` callback.
For payloads larger than the size of `ctx->buf[]`, `LECPCB_VAL_STR_CHUNK`
callbacks occur delivering each sequential bufferload.
The last chunk (which may be zero length) is delievered by `LECPCB_VAL_STR_END`.
## Parsing paths
LEJP maintains a "parsing path" in `ctx->path` that represents the context of
the callback events. As a convenience, at LEJP context creation time, you can
pass in an array of path strings you want to match on, and have any match
checkable in the callback using `ctx->path_match`, it's 0 if no active match,
or the match index from your path array starting from 1 for the first entry.
|CBOR element|Representation in path|
|---|---|
|JSON Array|`[]`|
|JSON Map|`.`|
|JSON Map entry key string|`keystring`|
## Comparison with LECP (CBOR parser)
LECP is based on the same principles as LEJP and shares most of the callbacks.
The major differences:
- LEJP value callbacks all appear in `ctx->buf[]`, ie, floating-point is
provided to the callback in ascii form like `"1.0"`. CBOR provides a more
strict typing system, and the different type values are provided either in
`ctx->buf[]` for blobs or utf-8 text strtings, or the `item.u` union for
converted types, with additional callback reasons specific to each type.
- CBOR "maps" use `_OBJECT_START` and `_END` parsing callbacks around the
key / value pairs. LEJP has a special callback type `PAIR_NAME` for the
key string / integer, but in LECP these are provided as generic callbacks
dependent on type, ie, generic string callbacks or integer ones, and the
value part is represented according to whatever comes.