X-Git-Url: https://git.efficios.com/?a=blobdiff_plain;f=README.adoc;h=c3c545ed91fed049bbf7a1fdb3699d3435fda810;hb=HEAD;hp=72c2f82b8e61890162af90d4f65d7fd798f7e51a;hpb=c2b79cf65845f7358da8cd45803d513934a667ac;p=normand.git diff --git a/README.adoc b/README.adoc index 72c2f82..c3c545e 100644 --- a/README.adoc +++ b/README.adoc @@ -1,3 +1,6 @@ +// SPDX-FileCopyrightText: 2023 Philippe Proulx +// SPDX-License-Identifier: CC-BY-SA-4.0 + // Show ToC at a specific location for a GitHub rendering ifdef::env-github[] :toc: macro @@ -29,7 +32,7 @@ _**Normand**_ is a text-to-binary processor with its own language. This package offers both a portable {py3} module and a command-line tool. -WARNING: This version of Normand is 0.5, meaning both the Normand +WARNING: This version of Normand is 0.23, meaning both the Normand language and the module/CLI interface aren't stable. ifdef::env-github[] @@ -98,13 +101,14 @@ Output: aa bb f7 a7 32 da ---- -UTF-8, UTF-16, and UTF-32 literal strings:: +Strings:: + Input: + ---- "hello world!" 00 u16le"stress\nverdict 🤣" +s:latin3{hex(ICITTE)} ---- + Output: @@ -112,7 +116,8 @@ Output: ---- 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 00 73 00 74 ┆ hello world!•s•t 00 72 00 65 00 73 00 73 00 0a 00 76 00 65 00 72 ┆ •r•e•s•s•••v•e•r -00 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd ┆ •d•i•c•t• •>•#• +00 64 00 69 00 63 00 74 00 20 00 3e d8 23 dd 30 ┆ •d•i•c•t• •>•#•0 +78 32 66 ┆ x2f ---- Labels: special variables holding the offset where they're defined:: @@ -133,23 +138,25 @@ Variables:: The value of a variable assignment is the evaluation of a valid {py3} expression which may include label and variable names. 
-Fixed-length integer with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order:: +Fixed-length number with a given length (8{nbsp}bits to 64{nbsp}bits) and byte order:: + Input: + ---- {strength = 4} -{be} 67 44 $178 {(end - lbl) * 8 + strength : 16} $99 -{le} {-1993 : 32} +!be 67 44 $178 [(end - lbl) * 8 + strength : 16] $99 +!le [-1993 : 32] +[-3.141593 : 64be] ---- + Output: + ---- -67 44 b2 00 2c 63 37 f8 ff ff +67 44 b2 00 2c 63 37 f8 ff ff c0 09 21 fb 82 c2 +bd 7f ---- + -The encoded integer is the evaluation of a valid {py3} expression which +The encoded number is the evaluation of a valid {py3} expression which may include label and variable names. https://en.wikipedia.org/wiki/LEB128[LEB128] integer:: @@ -157,8 +164,8 @@ https://en.wikipedia.org/wiki/LEB128[LEB128] integer:: Input: + ---- -aa bb cc {-1993 : sleb128} dd ee ff -{meow * 199 : uleb128} +aa bb cc [-1993 : sleb128] dd ee ff +[meow * 199 : uleb128] ---- + Output: @@ -170,12 +177,41 @@ aa bb cc b7 70 dd ee ff e3 07 The encoded integer is the evaluation of a valid {py3} expression which may include label and variable names. 
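As a cross-check of the fixed-length number example above (`!le [-1993 : 32]`), the following Python sketch reproduces the documented bytes with the standard library only. `encode_fixed` is a hypothetical helper written for illustration, not part of `normand.py`, and its range check is an assumption about how out-of-range values are rejected.

```python
import struct


def encode_fixed(value, bits, byte_order):
    """Encode `value` on `bits` bits; `byte_order` is "be" or "le".

    Hypothetical helper, not the Normand implementation.
    """
    if isinstance(value, float):
        # Floats only support binary32 (`f`) and binary64 (`d`).
        fmt = {32: "f", 64: "d"}[bits]
        return struct.pack((">" if byte_order == "be" else "<") + fmt, value)

    value = int(value)  # also converts `bool`

    # Assumption: accept anything that fits as a signed or unsigned integer.
    if not -(1 << (bits - 1)) <= value < (1 << bits):
        raise ValueError("{} is outside the {}-bit range".format(value, bits))

    order = "big" if byte_order == "be" else "little"

    # The modulo maps a negative value to its two's complement form.
    return (value % (1 << bits)).to_bytes(bits // 8, order)


print(encode_fixed(-1993, 32, "le").hex())  # 37f8ffff
print(encode_fixed(44, 16, "be").hex())     # 002c
```

Calling `encode_fixed(-3.141593, 64, "be")` should likewise give the eight bytes shown for `[-3.141593 : 64be]` in the example output.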
+Conditional:: ++ +Input: ++ +---- +aa bb cc + +( + "foo" + + !if {ICITTE > 10} + "bar" + !else + "fight" + !end +) * 4 +---- ++ +Output: ++ +---- +aa bb cc 66 6f 6f 66 69 67 68 74 66 6f 6f 66 69 ┆ •••foofightfoofi +67 68 74 66 6f 6f 62 61 72 66 6f 6f 62 61 72 ┆ ghtfoobarfoobar +---- + Repetition:: + Input: + ---- aa bb * 5 cc "yeah\0" * {zoom * 3} + +!repeat 3 + ff ee "juice" +!end ---- + Output: @@ -188,8 +224,86 @@ aa bb bb bb bb bb cc 79 65 61 68 00 79 65 61 68 ┆ •••••••yeah 61 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 ┆ ah•yeah•yeah•yea 68 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 ┆ h•yeah•yeah•yeah 00 79 65 61 68 00 79 65 61 68 00 79 65 61 68 00 ┆ •yeah•yeah•yeah• +ff ee 6a 75 69 63 65 ff ee 6a 75 69 63 65 ff ee ┆ ••juice••juice•• +6a 75 69 63 65 ┆ juice +---- + +Alignment:: ++ +Input: ++ +---- +!be + + [199:32] +@64 [43:64] +@16 [-123:16] +@32~255 [5584:32] +---- ++ +Output: ++ +---- +00 00 00 c7 00 00 00 00 00 00 00 00 00 00 00 2b +ff 85 ff ff 00 00 15 d0 +---- + +Filling:: ++ +Input: ++ +---- +!le +[0xdeadbeef:32] +[-1993:16] +[9:16] ++0x40 +[ICITTE:8] +"meow mix" ++200~FFh +[ICITTE:8] +---- ++ +Output: ++ +---- +ef be ad de 37 f8 09 00 00 00 00 00 00 00 00 00 ┆ ••••7••••••••••• +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• +40 6d 65 6f 77 20 6d 69 78 ff ff ff ff ff ff ff ┆ @meow mix••••••• +ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• +ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• +ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• +ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• +ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• +ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• +ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ┆ •••••••••••••••• +ff ff ff ff ff ff ff ff c8 ┆ 
••••••••• +---- + +Transformation:: ++ +Input: ++ ---- +"end of file @ " [end:8] + +!transform gzip + "this part will be gzipped" +!end + +---- ++ +Output: ++ +---- +65 6e 64 20 6f 66 20 66 69 6c 65 20 40 20 3c 1f ┆ end of file @ <• +8b 08 00 7b 7b 26 65 02 ff 2b c9 c8 2c 56 28 48 ┆ •••{{&e••+••,V(H +2c 2a 51 28 cf cc c9 51 48 4a 55 48 af ca 2c 28 ┆ ,*Q(•••QHJUH••,( +48 4d 01 00 d4 cc 5b 8a 19 00 00 00 ┆ HM••••[••••• +---- Multilevel grouping:: + @@ -211,6 +325,38 @@ aa bb 7a 6f 6f 6d cc aa bb 7a 6f 6f 6d cc aa bb ┆ ••zoom•••zoom• 6f 6d cc aa bb 7a 6f 6f 6d cc de de de de ┆ om•••zoom••••• ---- +Macros:: ++ +Input: ++ +---- +!macro hello(world) + "hello" + !if world " world" !end +!end + +!repeat 17 + ff ff ff ff + m:hello({ICITTE > 15 and ICITTE < 60}) +!end +---- ++ +Output: ++ +---- +ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c ┆ ••••hello••••hel +6c 6f ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c ┆ lo••••hello worl +64 ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c 64 ┆ d••••hello world +ff ff ff ff 68 65 6c 6c 6f 20 77 6f 72 6c 64 ff ┆ ••••hello world• +ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c ┆ •••hello••••hell +6f ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 ┆ o••••hello••••he +6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ff ff ff ff ┆ llo••••hello•••• +68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ff ff ┆ hello••••hello•• +ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c 6c 6f ┆ ••hello••••hello +ff ff ff ff 68 65 6c 6c 6f ff ff ff ff 68 65 6c ┆ ••••hello••••hel +6c 6f ff ff ff ff 68 65 6c 6c 6f ┆ lo••••hello +---- + Precise error reporting:: + ---- @@ -222,11 +368,13 @@ Precise error reporting:: ---- + ---- -/tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`mix`, `zoom`}. +/tmp/meow.normand:24:19 - Illegal (unknown or unreachable) variable/label name `meow` in expression `(meow - 45) // 8`; the legal names are {`ICITTE`, `mix`, `zoom`}. 
---- + ---- -/tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE` at byte offset 45. +/tmp/meow.normand:32:19 - While expanding the macro `meow`: +/tmp/meow.normand:35:5 - While expanding the macro `zzz`: +/tmp/meow.normand:18:9 - Value 315 is outside the 8-bit range when evaluating expression `end - ICITTE`. ---- You can use Normand to track data source files in your favorite VCS @@ -257,10 +405,81 @@ to your project to use it (both the <> function and the <>). `normand.py` has _no external dependencies_, but if you're using -Python{nbsp}3.4, you'll need a local copy of the standard `typing` -module. +Python{nbsp}3.4 or Python{nbsp}3.5, you'll need a local copy of the +standard `typing` module. ==== +== Design goals + +The design goals of Normand are: + +Portability:: + We're making sure `normand.py` works with Python{nbsp}≥{nbsp}3.4 and + doesn't have any external dependencies so that you may just copy the + module as is to your own project. + +Ease of use:: + The most basic Normand input is a sequence of hexadecimal constants + (for example, `4e6f726d616e64`) which produce exactly what you'd + expect. ++ +Most Normand features map to programming language concepts you already +know and understand: constant integers, literal strings, variables, +conditionals, repetitions/loops, and the rest. + +Concise and readable input:: + We could have chosen XML or YAML as the input format, but having a + DSL here makes a Normand input compact and easy to read, two + important traits when using Normand to write tests, for example. ++ +Compare the following Normand input and some hypothetical XML +equivalent, for example: ++ +.Actual Normand input. +---- +ff dd 01 ab $192 $-128 %1101:0011 + +[end:8] + +{iter = 1} + +!if {not something} + # five times because xyz + !repeat 5 + "hello world " [iter:8] + {iter = iter + 1} + !end +!end + + +---- ++ +.Hypothetical Normand XML input. 
+[source,xml] +---- + + + + + + + + + + + + + + + hello world + + + + + +---- + == Learn Normand A Normand text input is a sequence of items which represent a sequence @@ -276,21 +495,27 @@ current state: |[[cur-offset]] Current offset | The current offset has an effect on the value of <> and of -the special `ICITTE` name in <>, <>, and -<> expression evaluation. +the special `ICITTE` name in <>, <>, <>, +<>, <>, +<>, <>, <>, and +<> expression evaluation. Each generated byte increments the current offset. A <> may change the -current offset. +current offset without generating data. + +An <> generates +padding bytes to make the current offset satisfy a given alignment. |`init_offset` parameter of the `parse()` function. |`--offset` option. |[[cur-bo]] Current byte order | -The current byte order has an effect on the encoding of -<>. +The current byte order can have an effect on the encoding of +<>. A <> may change the current byte order. @@ -303,30 +528,39 @@ the current byte order. |One or more `--label` options. |<> -|Mapping of variable names to integral values. +|Mapping of variable names to integral or floating point number values. |`init_variables` parameter of the `parse()` function. -|One or more `--var` options. +|One or more `--var` or `--var-str` options. |=== The available items are: -* A <> representing a single byte. +* A <> representing one or more + constant bytes. -* A <> representing a sequence of bytes - encoding UTF-8, UTF-16, or UTF-32 data. +* A <> representing a constant sequence + of bytes encoding UTF-8, UTF-16, UTF-32, or Latin-1 to Latin-10 data. * A <> (big or little endian). -* A <> using the - <> and of which the value is the result of - a {py3} expression. +* A <> (integer or + floating point), possibly using the <>, and + of which the value is the result of a {py3} expression. * An <> of which the value is the result of a {py3} expression. 
+* A <> representing a sequence of bytes encoding UTF-8, + UTF-16, UTF-32, or Latin-1 to Latin-10 data, and of which the value is + the result of a {py3} expression. + * A <>. +* A <>. + +* A <>. + * A <>, that is, a named constant holding the current offset. + @@ -337,31 +571,65 @@ This is similar to an assembly label. * A <>, that is, a scoped sequence of items. -Moreover, you can <> any item above, except an offset -or a label, a given fixed or variable number of times. This is called a -repetition. +* A <>. + +* A <>. -A Normand comment may exist: +* A <>. -* Between items, possibly within a group. -* Between the nibbles of a constant hexadecimal byte. -* Between the bits of a constant binary byte. -* Between the last item and the ``pass:[*]`` character of a repetition, - and between that ``pass:[*]`` character and the following number - or expression. +* A <>. + +* A <>. + +Moreover, you can repeat many items above a constant or variable number +of times with the ``pass:[*]`` operator _after_ the item to repeat. This +is called a <>. + +A Normand comment may exist pretty much anywhere between tokens. A comment is anything between two ``pass:[#]`` characters on the same -line, or from ``pass:[#]`` until the end of the line. Whitespaces and -the following symbol characters are also considered comments where a -comment may exist: +line, or from ``pass:[#]`` until the end of the line. Whitespaces are +also considered comments. The following symbols are also considered +comments around and between items, as well as between hexadecimal +nibbles and binary bits of <>: ---- -! @ / \ ? & : ; . , + [ ] _ = | - +& , - . / : ; = ? \ _ | ---- The latter serve to improve readability so that you may write, for example, a MAC address or a UUID as is. +[[const-int]] Many items require a _constant integer_, possibly +negative, in which case it may start with `-` for a negative integer. A +positive constant integer is any of: + +Decimal:: + One or more digits (`0` to `9`).
+ +Hexadecimal:: + One of: ++ +* The `0x` or `0X` prefix followed by one or more hexadecimal digits + (`0` to `9`, `a` to `f`, or `A` to `F`). +* One or more hexadecimal digits followed by the `h` or `H` suffix. + +Octal:: + One of: ++ +* The `0o` or `0O` prefix followed by one or more octal digits + (`0` to `7`). +* One or more octal digits followed by the `o`, `O`, `q`, or `Q` + suffix. + +Binary:: + One of: ++ +* The `0b` or `0B` prefix followed by one or more bits (`0` or `1`). +* One or more bits followed by the `b` or `B` suffix. + +In general, anything between `pass:[{]` and `}` is a {py3} expression. + You can test the examples of this section with the `normand` <> as such: @@ -373,24 +641,31 @@ where `file` is the name of a file containing the Normand input. === Byte constant -A _byte constant_ represents a single byte. +A _byte constant_ represents one or more constant bytes. A byte constant is: Hexadecimal form:: - Two consecutive hexits. + Two consecutive hexadecimal digits representing a single byte. Decimal form:: - A decimal number after the `$` prefix. + One or more digits after the `$` prefix representing a single byte. -Binary form:: - Eight bits after the `%` prefix. +Binary form:: {empty} ++ +-- +. __**N**__ `%` prefixes (at least one). ++ +The number of `%` characters is the number of subsequent expected bytes. + +. __**N**__{nbsp}×{nbsp}8 bits (`0` or `1`). +-- ==== Input: ---- -ab cd [3d 8F] CC +ab cd (3d 8F) CC ---- Output: @@ -435,33 +710,80 @@ Input: ---- %01110011 %01100001 %01101100 %01110101 %01110100 +%%%1101:0010 11111111 #A#11 #B#00 #C#011 #D#1 ---- Output: ---- -73 61 6c 75 74 ┆ salut +73 61 6c 75 74 d2 ff c7 ┆ salut••• ---- ==== === Literal string -A _literal string_ represents the UTF-8-, UTF-16-, or UTF-32-encoded -bytes of a string. +A _literal string_ represents the encoded bytes of a literal string +using the UTF-8, UTF-16, UTF-32, or Latin-1 to Latin-10 encoding.
The string to encode isn't implicitly null-terminated: use `\0` at the end of the string to add a null character. A literal string is: -. **Optional**: one of the following encodings instead of UTF-8: +. **Optional**: one of the following encodings instead of the default + UTF-8: + -- [horizontal] -`u16be`:: UTF-16BE. -`u16le`:: UTF-16LE. -`u32be`:: UTF-32BE. -`u32le`:: UTF-32LE. +`s:u8`:: +`u8`:: + UTF-8. + +`s:u16be`:: +`u16be`:: + UTF-16BE. + +`s:u16le`:: +`u16le`:: + UTF-16LE. + +`s:u32be`:: +`u32be`:: + UTF-32BE. + +`s:u32le`:: +`u32le`:: + UTF-32LE. + +`s:latin1`:: + ISO/IEC 8859-1. + +`s:latin2`:: + ISO/IEC 8859-2. + +`s:latin3`:: + ISO/IEC 8859-3. + +`s:latin4`:: + ISO/IEC 8859-4. + +`s:latin5`:: + ISO/IEC 8859-9. + +`s:latin6`:: + ISO/IEC 8859-10. + +`s:latin7`:: + ISO/IEC 8859-13. + +`s:latin8`:: + ISO/IEC 8859-14. + +`s:latin9`:: + ISO/IEC 8859-15. + +`s:latin10`:: + ISO/IEC 8859-16. -- . The ``pass:["]`` prefix. @@ -526,7 +848,7 @@ Output: Input: ---- -u32be "\"illusion is the first\nof all pleasures\" 🦉" +s:u32be "\"illusion is the first\nof all pleasures\" 🦉" ---- Output: @@ -546,6 +868,20 @@ Output: ---- ==== +==== +Input: + +---- +s:latin1 "Paul Piché" +---- + +Output: + +---- +50 61 75 6c 20 50 69 63 68 e9 ┆ Paul Pich• +---- +==== + === Current byte order setting This special item sets the <>. @@ -553,43 +889,79 @@ This special item sets the <>. The two accepted forms are: [horizontal] -``pass:[{be}]``:: Set the current byte order to big endian. -``pass:[{le}]``:: Set the current byte order to little endian. +`!be`:: Set the current byte order to big endian. +`!le`:: Set the current byte order to little endian. -=== Fixed-length integer +=== Fixed-length number -A _fixed-length integer_ represents a fixed number of bytes encoding an -unsigned or signed integer which is the result of evaluating a {py3} -expression using the <>. 
+A _fixed-length number_ represents a fixed number of bytes encoding +either: -A fixed-length integer is: +* An unsigned or signed integer (two's complement). ++ +The available lengths are 8, 16, 24, 32, 40, 48, 56, and 64. -. The ``pass:[{]`` prefix. +* A floating point number + (https://standards.ieee.org/standard/754-2008.html[IEEE{nbsp}754-2008]). ++ +The available lengths are 32 (_binary32_) and 64 (_binary64_). + +The value is the result of evaluating a {py3} expression. + +The byte order to use to encode the value is either directly specified +or is the <>. + +A fixed-length number is: + +. The `[` prefix. . A valid {py3} expression. + -For a fixed-length integer at some source location{nbsp}__**L**__, this +For a fixed-length number at some source location{nbsp}__**L**__, this expression may contain the name of any accessible <> (not within a nested group), including the name of a label defined -after{nbsp}__**L**__, as well as the name of any -<> known at{nbsp}__**L**__. +after{nbsp}__**L**__ (except within a +<>), as well as the name of +any <> known at{nbsp}__**L**__. + -The value of the special name `ICITTE` in this expression is the -<> (before encoding the integer). +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before encoding the number). . The `:` character. -. An encoding length in bits amongst `8`, `16`, `24`, `32`, `40`, - `48`, `56`, and `64`. +. An encoding length in bits amongst: ++ +-- +The expression evaluates to an `int` or `bool` value:: + `8`, `16`, `24`, `32`, `40`, `48`, `56`, and `64`. ++ +NOTE: Normand automatically converts a `bool` value to `int`. + +The expression evaluates to a `float` value:: + `32` and `64`. +-- -. The `}` suffix. +. **Optional**: a suffix of the previous encoding length, without + any whitespace, amongst: ++ +-- +[horizontal] +`be`:: Encode in big endian. +`le`:: Encode in little endian. 
+-- ++ +Without this suffix, the encoding byte order is the <> which must be defined if the encoding length is greater +than eight. + +. The `]` suffix. ==== Input: ---- -{le} {345:16} -{be} {-0xabcd:32} +[345:16le] +[-0xabcd:32be] ---- Output: @@ -603,10 +975,10 @@ Output: Input: ---- -{be} +!be # String length in bits -{8 * (str_end - str_beg) : 16} +[8 * (str_end - str_beg) : 16] # String @@ -625,7 +997,7 @@ Output: Input: ---- -{20 - ICITTE : 8} * 10 +[20 - ICITTE : 8] * 10 ---- Output: @@ -635,6 +1007,20 @@ Output: ---- ==== +==== +Input: + +---- +[2 * 0.0529 : 32le] +---- + +Output: + +---- +ac ad d8 3d +---- +==== + === LEB128 integer An _LEB128 integer_ represents a variable number of bytes encoding an @@ -644,22 +1030,23 @@ format. An LEB128 integer is: -. The ``pass:[{]`` prefix. +. The `[` prefix. -. A valid {py3} expression. +. A valid {py3} expression of which the evaluation result type + is `int` or `bool` (automatically converted to `int`). + For an LEB128 integer at some source location{nbsp}__**L**__, this expression may contain: + -- -* The name of any <> defined before{nbsp}__**L**__. -* The name of any <> known at{nbsp}__**L**__ - which doesn't, directly or indirectly, refer to a label - defined after{nbsp}__**L**__. +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. +* The name of any <> known + at{nbsp}__**L**__. -- + -The value of the special name `ICITTE` in this expression is the -<> (before encoding the integer). +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before encoding the integer). . The `:` character. @@ -671,13 +1058,13 @@ The value of the special name `ICITTE` in this expression is the `sleb128`:: Use the signed LEB128 format. -- -. The `}` suffix. +. The `]` suffix. 
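The `uleb128` and `sleb128` formats can be sketched in a few lines of Python. This is an illustrative encoder written from the general LEB128 definition, not the code Normand uses; `encode_leb128` is a hypothetical name.

```python
def encode_leb128(value, signed):
    """Return the LEB128 encoding of the integer `value` as bytes.

    Hypothetical helper, not the Normand implementation.
    """
    if not signed and value < 0:
        raise ValueError("unsigned LEB128 cannot encode a negative value")

    out = bytearray()

    while True:
        byte = value & 0x7F  # low seven bits
        value >>= 7          # arithmetic shift keeps the sign

        if signed:
            # Done once the remaining bits are only the sign extension
            # of bit 6 of the byte about to be written.
            sign_bit = bool(byte & 0x40)
            done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        else:
            done = value == 0

        if done:
            out.append(byte)
            return bytes(out)

        out.append(byte | 0x80)  # set the continuation bit


print(encode_leb128(-1993, signed=True).hex())    # b770
print(encode_leb128(624485, signed=False).hex())  # e58e26
```

The first result matches the `b7 70` bytes that the `[-1993 : sleb128]` example of this README produces.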
==== Input: ---- -{624485 : uleb128} +[624485 : uleb128] ---- Output: @@ -694,7 +1081,7 @@ Input: aa bb cc dd ee ff -{-981238311 + (meow * -23) : sleb128} +[-981238311 + (meow * -23) : sleb128] "hello" ---- @@ -705,141 +1092,184 @@ aa bb cc dd ee ff fd fa 8d ac 7c 68 65 6c 6c 6f ┆ •••••••• ---- ==== -=== Current offset setting - -This special item sets the <>. - -A current offset setting is: +=== String -. The `<` prefix. +A _string_ represents a variable number of bytes encoding a string which +is the result of evaluating a {py3} expression using the UTF-8, UTF-16, +UTF-32, or Latin-1 to Latin-10 encoding. -. A positive integer (hexadecimal starting with `0x` or `0X` accepted) - which is the new current offset. +A string has two possible forms: -. The `>` suffix. +Encoding prefix form:: {empty} ++ +. An encoding amongst: ++ +-- +[horizontal] +`s:u8`:: +`u8`:: + UTF-8. -==== -Input: +`s:u16be`:: +`u16be`:: + UTF-16BE. ----- - {ICITTE : 8} * 8 -<0x61> {ICITTE : 8} * 8 ----- +`s:u16le`:: +`u16le`:: + UTF-16LE. -Output: +`s:u32be`:: +`u32be`:: + UTF-32BE. ----- -00 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh ----- -==== +`s:u32le`:: +`u32le`:: + UTF-32LE. -==== -Input: +`s:latin1`:: + ISO/IEC 8859-1. ----- -aa bb cc dd ee ff -<12> 11 22 33 44 55 -{meow : 8} {mix : 8} ----- +`s:latin2`:: + ISO/IEC 8859-2. -Output: +`s:latin3`:: + ISO/IEC 8859-3. ----- -aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU•• ----- -==== +`s:latin4`:: + ISO/IEC 8859-4. -=== Label +`s:latin5`:: + ISO/IEC 8859-9. -A _label_ associates a name to the <>. +`s:latin6`:: + ISO/IEC 8859-10. -All the labels of a whole Normand input must have unique names. +`s:latin7`:: + ISO/IEC 8859-13. -A label must not share the name of a <> -name. +`s:latin8`:: + ISO/IEC 8859-14. -A label is: +`s:latin9`:: + ISO/IEC 8859-15. -. The `<` prefix. +`s:latin10`:: + ISO/IEC 8859-16. +-- -. A valid {py3} name which is not `ICITTE` (see - <>, <>, and - <> to learn more). +. The ``pass:[{]`` prefix. 
-. The `>` suffix. +. A valid {py3} expression of which the evaluation result type + is `bool`, `int`, `float`, or `str` (the first three automatically + converted to `str`). ++ +For a string at some source location{nbsp}__**L**__, this expression may +contain: ++ +-- +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. +* The name of any <> known + at{nbsp}__**L**__. +-- ++ +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before encoding the string). -=== Variable assignment +. The `}` suffix. -A _variable assignment_ associates a name to the integral result of an -evaluated {py3} expression. +Encoding suffix form:: {empty} ++ +. The `[` prefix. -A variable assignment is: +. A valid {py3} expression of which the evaluation result type + is `bool`, `int`, `float`, or `str` (the first three automatically + converted to `str`). ++ +For a string at some source location{nbsp}__**L**__, this expression may +contain: ++ +-- +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. +* The name of any <> known + at{nbsp}__**L**__. +-- ++ +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before encoding the string). -. The ``pass:[{]`` prefix. +. The `:` character. -. A valid {py3} name which is not `ICITTE` (see - <>, <>, and - <> to learn more). +. A string encoding amongst: ++ +-- +[horizontal] +`s:u8`:: + UTF-8. -. The `=` character. +`s:u16be`:: + UTF-16BE. -. A valid {py3} expression. -+ -For a variable assignment at some source location{nbsp}__**L**__, this -expression may contain the name of any accessible <> (not -within a nested group), including the name of a label defined -after{nbsp}__**L**__, as well as the name of any -<> known at{nbsp}__**L**__. -+ -The value of the special name `ICITTE` in this expression is the -<>. +`s:u16le`:: + UTF-16LE. -. The `}` suffix. +`s:u32be`:: + UTF-32BE. -==== -Input: +`s:u32le`:: + UTF-32LE. 
----- -{mix = 101} {le} -{meow = 42} 11 22 {meow:8} 33 {meow = ICITTE + 17} -"yooo" {meow + mix : 16} ----- +`s:latin1`:: + ISO/IEC 8859-1. -Output: +`s:latin2`:: + ISO/IEC 8859-2. ----- -11 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz• ----- -==== +`s:latin3`:: + ISO/IEC 8859-3. -=== Group +`s:latin4`:: + ISO/IEC 8859-4. -A _group_ is a scoped sequence of items. +`s:latin5`:: + ISO/IEC 8859-9. -The <> within a group aren't visible outside of it. +`s:latin6`:: + ISO/IEC 8859-10. -The main purpose of a group is to <> more than a -single item. +`s:latin7`:: + ISO/IEC 8859-13. -A group is: +`s:latin8`:: + ISO/IEC 8859-14. -. The `(` prefix. +`s:latin9`:: + ISO/IEC 8859-15. -. Zero or more items. +`s:latin10`:: + ISO/IEC 8859-16. +-- -. The `)` suffix. +. The `]` suffix. ==== Input: ---- -((aa bb cc) dd () ee) "leclerc" +{iter = 1} + +!repeat 10 + u8{iter} " " + {iter = iter + 1} +!end ---- Output: ---- -aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc +31 20 32 20 33 20 34 20 35 20 36 20 37 20 38 20 ┆ 1 2 3 4 5 6 7 8 +39 20 31 30 20 ┆ 9 10 ---- ==== @@ -847,52 +1277,1029 @@ aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc Input: ---- -((aa bb cc) * 3 dd ee) * 5 +{meow = 'salut jérémie'} +[meow.upper() : s:latin1] ---- Output: ---- -aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa bb -cc aa bb cc dd ee aa bb cc aa bb cc aa bb cc dd -ee aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa -bb cc aa bb cc dd ee +53 41 4c 55 54 20 4a c9 52 c9 4d 49 45 ┆ SALUT J•R•MIE ---- ==== +=== Current offset setting + +This special item sets the <>. + +A current offset setting is: + +. The `<` prefix. + +. A <> which is the new current + offset. + +. The `>` suffix. 
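The interaction between a current offset setting and `ICITTE` can be modeled with a small Python sketch. It mimics the input `[ICITTE : 8] * 8 <0x61> [ICITTE : 8] * 8`; `icitte_bytes` is a hypothetical helper and not how `normand.py` is structured.

```python
def icitte_bytes(start_offset, count):
    """Mimic `[ICITTE : 8] * count` with the current offset at `start_offset`."""
    data = bytearray()
    offset = start_offset

    for _ in range(count):
        data.append(offset)  # `ICITTE` is the offset before encoding
        offset += 1          # each generated byte increments the offset

    return bytes(data), offset


first, offset = icitte_bytes(0, 8)
offset = 0x61  # `<0x61>`: new current offset, no data generated
second, offset = icitte_bytes(offset, 8)

print((first + second).hex())  # 00010203040506076162636465666768
print(offset)                  # 105
```

The setting itself contributes no bytes: only the values later derived from the current offset change.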
+ +==== +Input: + +---- + [ICITTE : 8] * 8 +<0x61> [ICITTE : 8] * 8 +---- + +Output: + +---- +00 01 02 03 04 05 06 07 61 62 63 64 65 66 67 68 ┆ ••••••••abcdefgh +---- +==== + +==== +Input: + +---- +aa bb cc dd ee ff +<12> 11 22 33 44 55 +[meow : 8] [mix : 8] +---- + +Output: + +---- +aa bb cc dd ee ff 11 22 33 44 55 04 0f ┆ •••••••"3DU•• +---- +==== + +=== Current offset alignment + +A _current offset alignment_ represents zero or more padding bytes to +make the <> meet a given +https://en.wikipedia.org/wiki/Data_structure_alignment[alignment] value. + +More specifically, for an alignment value of{nbsp}__**N**__{nbsp}bits, +a current offset alignment represents the required padding bytes until +the current offset is a multiple of __**N**__{nbsp}/{nbsp}8. + +A current offset alignment is: + +. The `@` prefix. + +. A <> which is the alignment value + in _bits_. ++ +This value must be greater than zero and a multiple of{nbsp}8. + +. **Optional**: ++ +-- +. The ``pass:[~]`` prefix. +. A <> which is the value of the + byte to use as padding to align the <>. +-- ++ +Without this section, the padding byte value is zero. + +==== +Input: + +---- +11 22 (@32 aa bb cc) * 3 +---- + +Output: + +---- +11 22 00 00 aa bb cc 00 aa bb cc 00 aa bb cc +---- +==== + +==== +Input: + +---- +!le +77 88 +@32~0xcc [-893.5:32] +@128~0x55 "meow" +---- + +Output: + +---- +77 88 cc cc 00 60 5f c4 55 55 55 55 55 55 55 55 ┆ w••••`_•UUUUUUUU +6d 65 6f 77 ┆ meow +---- +==== + +==== +Input: + +---- +aa bb cc <29> @64~255 "zoom" +---- + +Output: + +---- +aa bb cc ff ff ff 7a 6f 6f 6d ┆ ••••••zoom +---- +==== + +=== Filling + +A _filling_ represents zero or more padding bytes to make the +<> reach a given value. + +A filling is: + +. The ``pass:[+]`` prefix. + +. One of: + +** A <> which is the current offset + target. + +** The ``pass:[{]`` prefix, a valid {py3} expression of which the + evaluation result type is `int` or `bool` (automatically converted to + `int`), and the `}` suffix. 
++ +For a filling at some source location{nbsp}__**L**__, this expression +may contain: ++ +-- +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. +* The name of any <> known + at{nbsp}__**L**__. +-- ++ +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before generating the padding +bytes). + +** A valid {py3} name. ++ +For the name `__NAME__`, this is equivalent to the +`pass:[{]__NAME__}` form above. + ++ +This value must be greater than or equal to the current offset where +it's used. + +. **Optional**: ++ +-- +. The ``pass:[~]`` prefix. +. A <> which is the value of the + byte to use as padding to reach the current offset target. +-- ++ +Without this section, the padding byte value is zero. + +==== +Input: + +---- +aa bb cc dd ++0x40 +"hello world" +---- + +Output: + +---- +aa bb cc dd 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• +00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ┆ •••••••••••••••• +68 65 6c 6c 6f 20 77 6f 72 6c 64 ┆ hello world +---- +==== + +==== +Input: + +---- +!macro part(iter, fill) + <0> "particular security " [ord('0') + iter : 8] +fill~0x80 +!end + +{iter = 1} + +!repeat 5 + m:part(iter, {32 + 4 * iter}) + {iter = iter + 1} +!end +---- + +Output: + +---- +70 61 72 74 69 63 75 6c 61 72 20 73 65 63 75 72 ┆ particular secur +69 74 79 20 31 80 80 80 80 80 80 80 80 80 80 80 ┆ ity 1••••••••••• +80 80 80 80 70 61 72 74 69 63 75 6c 61 72 20 73 ┆ ••••particular s +65 63 75 72 69 74 79 20 32 80 80 80 80 80 80 80 ┆ ecurity 2••••••• +80 80 80 80 80 80 80 80 80 80 80 80 70 61 72 74 ┆ ••••••••••••part +69 63 75 6c 61 72 20 73 65 63 75 72 69 74 79 20 ┆ icular security +33 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ 3••••••••••••••• +80 80 80 80 80 80 80 80 70 61 72 74 69 63 75 6c ┆ ••••••••particul +61 72 20 73 65 63 75 72 69 74 79 20 
34 80 80 80 ┆ ar security 4••• +80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ •••••••••••••••• +80 80 80 80 80 80 80 80 70 61 72 74 69 63 75 6c ┆ ••••••••particul +61 72 20 73 65 63 75 72 69 74 79 20 35 80 80 80 ┆ ar security 5••• +80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 ┆ •••••••••••••••• +80 80 80 80 80 80 80 80 80 80 80 80 ┆ •••••••••••• +---- +==== + +=== Label + +A _label_ associates a name to the <>. + +All the labels of a whole Normand input must have unique names. + +A label must not share the name of a <> +name. + +A label is: + +. The `<` prefix. + +. A valid {py3} name which is not `ICITTE`. + +. The `>` suffix. + +=== Variable assignment + +A _variable assignment_ associates a name to the result of an +evaluated {py3} expression. + +A variable assignment is: + +. The ``pass:[{]`` prefix. + +. A valid {py3} name which is not `ICITTE`. + +. The `=` character. + +. A valid {py3} expression of which the evaluation result type is `int`, + `float`, `bool` (automatically converted to `int`), or `str`. ++ +For a variable assignment at some source location{nbsp}__**L**__, this +expression may contain: ++ +-- +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. +* The name of any <> known + at{nbsp}__**L**__. +-- ++ +The value of the special name `ICITTE` (`int` type) in this expression +is the <>. + +. The `}` suffix. + +==== +Input: + +---- +{mix = 101} !le +{meow = 42} 11 22 [meow:8] 33 {meow = ICITTE + 17} +"yooo" [meow + mix : 16] +---- + +Output: + +---- +11 22 2a 33 79 6f 6f 6f 7a 00 ┆ •"*3yoooz• +---- +==== + +=== Group + +A _group_ is a scoped sequence of items. + +The <> within a group aren't visible outside of it. + +The main purpose of a group is to <> more +than a single item and to isolate labels. + +A group is: + +. The `(`, `!group`, or `!g` opening. + +. Zero or more items except, recursively, a macro definition block. + +. Depending on the group opening: ++ +-- +`(`:: + The `)` closing. 
+ +`!group`:: +`!g`:: + The `!end` closing. +-- + +==== +Input: + +---- +((aa bb cc) dd () ee) "leclerc" +---- + +Output: + +---- +aa bb cc dd ee 6c 65 63 6c 65 72 63 ┆ •••••leclerc +---- +==== + +==== +Input: + +---- +!group + (aa bb cc) * 3 dd ee +!end * 5 +---- + +Output: + +---- +aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa bb +cc aa bb cc dd ee aa bb cc aa bb cc aa bb cc dd +ee aa bb cc aa bb cc aa bb cc dd ee aa bb cc aa +bb cc aa bb cc dd ee +---- +==== + +==== +Input: + +---- +!be +( + u16le"sébastien diaz" + [ICITTE - str_beg : 8] + [(end - str_beg) * 5 : 24] +) * 3 + +---- + +Output: + +---- +73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e• +6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z••••• +73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e• +6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@ +73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e• +6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z••••• +---- +==== + +=== Conditional block + +A _conditional block_ represents either the bytes of zero or more items +if some expression is true, or the bytes of zero or more other items if +it's false. + +A conditional block is: + +. The `!if` opening. + +. One of: + +** The ``pass:[{]`` prefix, a valid {py3} expression of which the + evaluation result type is `int` or `bool` (automatically converted to + `int`), and the `}` suffix. ++ +For a conditional block at some source location{nbsp}__**L**__, this +expression may contain: ++ +-- +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. +* The name of any <> known + at{nbsp}__**L**__. +-- ++ +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before handling the contained +items). + +** A valid {py3} name. ++ +For the name `__NAME__`, this is equivalent to the +`pass:[{]__NAME__}` form above. + +. 
Zero or more items to be handled when the condition is true + except, recursively, a macro definition block. + +. **Optional**: + +.. The `!else` opening. +.. Zero or more items to be handled when the condition is false + except, recursively, a macro definition block. + +. The `!end` closing. + +==== +Input: + +---- +{at = 1} +{rep_count = 9} + +!repeat rep_count + "meow " + + !if {ICITTE > 25} + "mix" + !else + "zoom" + !end + + !if {at < rep_count} 20 !end + + {at = at + 1} +!end +---- + +Output: + +---- +6d 65 6f 77 20 7a 6f 6f 6d 20 6d 65 6f 77 20 7a ┆ meow zoom meow z +6f 6f 6d 20 6d 65 6f 77 20 7a 6f 6f 6d 20 6d 65 ┆ oom meow zoom me +6f 77 20 6d 69 78 20 6d 65 6f 77 20 6d 69 78 20 ┆ ow mix meow mix +6d 65 6f 77 20 6d 69 78 20 6d 65 6f 77 20 6d 69 ┆ meow mix meow mi +78 20 6d 65 6f 77 20 6d 69 78 20 6d 65 6f 77 20 ┆ x meow mix meow +6d 69 78 ┆ mix +---- +==== + +==== +Input: + +---- + +u16le"meow mix!" + + +!if {str_end - str_beg > 10} + " BIG" +!end +---- + +Output: + +---- +6d 00 65 00 6f 00 77 00 20 00 6d 00 69 00 78 00 ┆ m•e•o•w• •m•i•x• +21 00 20 42 49 47 ┆ !• BIG +---- +==== + +=== Repetition block + +A _repetition block_ represents the bytes of one or more items repeated +a given number of times. + +A repetition block is: + +. The `!repeat` or `!r` opening. + +. One of: + +** A <> which is the number of + times to repeat the items. + +** The ``pass:[{]`` prefix, a valid {py3} expression of which the + evaluation result type is `int` or `bool` (automatically converted to + `int`), and the `}` suffix. ++ +For a repetition block at some source location{nbsp}__**L**__, this +expression may contain: ++ +-- +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. +* The name of any <> known + at{nbsp}__**L**__. +-- ++ +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before handling the items to +repeat). + +** A valid {py3} name. 
++ +For the name `__NAME__`, this is equivalent to the +`pass:[{]__NAME__}` form above. + +. Zero or more items except, recursively, a macro definition block. + +. The `!end` closing. + +You may also use a <> after +some items. The form ``!repeat{nbsp}__X__{nbsp}__ITEMS__{nbsp}!end`` +is equivalent to ``(__ITEMS__){nbsp}pass:[*]{nbsp}__X__``. + ==== Input: ---- -{be} -( - u16le"sébastien diaz" - {ICITTE - str_beg : 8} - {(end - str_beg) * 5 : 24} -) * 3 +!repeat 0o400 + [end - ICITTE - 1 : 8] +!end + ---- Output: ---- -73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e• -6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 e0 ┆ n• •d•i•a•z••••• -73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e• -6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 01 40 ┆ n• •d•i•a•z••••@ -73 00 e9 00 62 00 61 00 73 00 74 00 69 00 65 00 ┆ s•••b•a•s•t•i•e• -6e 00 20 00 64 00 69 00 61 00 7a 00 1c 00 00 a0 ┆ n• •d•i•a•z••••• +ff fe fd fc fb fa f9 f8 f7 f6 f5 f4 f3 f2 f1 f0 ┆ •••••••••••••••• +ef ee ed ec eb ea e9 e8 e7 e6 e5 e4 e3 e2 e1 e0 ┆ •••••••••••••••• +df de dd dc db da d9 d8 d7 d6 d5 d4 d3 d2 d1 d0 ┆ •••••••••••••••• +cf ce cd cc cb ca c9 c8 c7 c6 c5 c4 c3 c2 c1 c0 ┆ •••••••••••••••• +bf be bd bc bb ba b9 b8 b7 b6 b5 b4 b3 b2 b1 b0 ┆ •••••••••••••••• +af ae ad ac ab aa a9 a8 a7 a6 a5 a4 a3 a2 a1 a0 ┆ •••••••••••••••• +9f 9e 9d 9c 9b 9a 99 98 97 96 95 94 93 92 91 90 ┆ •••••••••••••••• +8f 8e 8d 8c 8b 8a 89 88 87 86 85 84 83 82 81 80 ┆ •••••••••••••••• +7f 7e 7d 7c 7b 7a 79 78 77 76 75 74 73 72 71 70 ┆ •~}|{zyxwvutsrqp +6f 6e 6d 6c 6b 6a 69 68 67 66 65 64 63 62 61 60 ┆ onmlkjihgfedcba` +5f 5e 5d 5c 5b 5a 59 58 57 56 55 54 53 52 51 50 ┆ _^]\[ZYXWVUTSRQP +4f 4e 4d 4c 4b 4a 49 48 47 46 45 44 43 42 41 40 ┆ ONMLKJIHGFEDCBA@ +3f 3e 3d 3c 3b 3a 39 38 37 36 35 34 33 32 31 30 ┆ ?>=<;:9876543210 +2f 2e 2d 2c 2b 2a 29 28 27 26 25 24 23 22 21 20 ┆ /.-,+*)('&%$#"! 
+1f 1e 1d 1c 1b 1a 19 18 17 16 15 14 13 12 11 10 ┆ •••••••••••••••• +0f 0e 0d 0c 0b 0a 09 08 07 06 05 04 03 02 01 00 ┆ •••••••••••••••• +---- +==== + +==== +Input: + +---- +{times = 1} + +aa bb cc dd + +!repeat 3 + + + !repeat {here + 1} + ee ff + !end + + 11 22 !repeat times 33 !end + + {times = times + 1} +!end + +"coucou!" +---- + +Output: + +---- +aa bb cc dd ee ff ee ff ee ff ee ff ee ff 11 22 ┆ •••••••••••••••" +33 ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ 3••••••••••••••• +ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ •••••••••••••••• +ff ee ff ee ff 11 22 33 33 ee ff ee ff ee ff ee ┆ ••••••"33••••••• +ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ •••••••••••••••• +ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ •••••••••••••••• +ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ •••••••••••••••• +ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ •••••••••••••••• +ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ •••••••••••••••• +ff ee ff ee ff ee ff ee ff ee ff ee ff ee ff ee ┆ •••••••••••••••• +ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ ••••••••••••••"3 +33 33 63 6f 75 63 6f 75 21 ┆ 33coucou! +---- +==== + +=== Transformation block + +A _transformation block_ represents the bytes of one or more items +transformed into other bytes by a function. + +As of this version, Normand only offers a predetermined set of +transformation functions. + +A transformation block is: + +. The `!transform` or `!t` opening. + +. A transformation function name amongst: ++ +-- +[horizontal] +`base64`:: +`b64`:: + Standard https://datatracker.ietf.org/doc/html/rfc4648.html#section-4[Base64]. + +`base64u`:: +`b64u`:: + URL-safe Base64, using `-` instead of `pass:[+]` and `_` instead of + `/`. + +`base32`:: +`b32`:: + Standard https://datatracker.ietf.org/doc/html/rfc4648.html#section-6[Base32]. + +`base16`:: +`b16`:: + Standard https://datatracker.ietf.org/doc/html/rfc4648.html#section-8[Base16]. 
+ +`ascii85`:: +`a85`:: + https://en.wikipedia.org/wiki/Ascii85[Ascii85] without padding. + +`ascii85p`:: +`a85p`:: + Ascii85 with padding. + +`base85`:: +`b85`:: + https://en.wikipedia.org/wiki/Ascii85[Base85] (like Git-style binary + diffs) without padding. + +`base85p`:: +`b85p`:: + Base85 with padding. + +`quopri`:: +`qp`:: + MIME + https://datatracker.ietf.org/doc/html/rfc2045#section-6.7[quoted-printable] + without quoted whitespaces. + +`quoprit`:: +`qpt`:: + MIME quoted-printable with quoted whitespaces. + +`gzip`:: +`gz`:: + https://en.wikipedia.org/wiki/Gzip[gzip]. + +`bzip2`:: +`bz2`:: + https://en.wikipedia.org/wiki/Bzip2[bzip2]. +-- + +. Zero or more items except, recursively, a macro definition block. ++ +Any {py3} expression within any of those items may not refer to a future +<>. ++ +The value of the special name `ICITTE` in any {py3} expression within +any of those items is the <> _before_ Normand +applies the transformation function. Therefore, labels defined within +those items also have the current offset value _before_ Normand applies +the transformation function. + +. The `!end` closing. + +The <> after having handled the last item of +a transformation block is the value of the current offset before +handling the first item plus the size of the generated (transformed) +bytes. In other words, <> within the items of the block have no impact outside said +block. + +==== +Input: + +---- +aa bb cc dd + +"size of compressed section: " [end - start : 8] + + + +!transform bzip2 + "this will be compressed!" + 89*100 00*5000 +!end + + + +"yes!" 
+---- + +Output: + +---- +aa bb cc dd 73 69 7a 65 20 6f 66 20 63 6f 6d 70 ┆ ••••size of comp +72 65 73 73 65 64 20 73 65 63 74 69 6f 6e 3a 20 ┆ ressed section: +52 42 5a 68 39 31 41 59 26 53 59 68 e1 8c fc 00 ┆ RBZh91AY&SYh•••• +00 33 d1 e0 c0 00 60 00 5e 66 dc 80 00 20 00 80 ┆ •3••••`•^f••• •• +00 08 20 00 31 40 d3 43 23 26 20 ca 87 a9 a1 e8 ┆ •• •1@•C#& ••••• +18 29 44 80 9c 80 49 bf cc b3 e8 45 ed e2 76 ad ┆ •)D•••I••••E••v• +0f 12 8b 8a d6 cd 40 04 7e 2e e4 8a 70 a1 20 d1 ┆ ••••••@•~.••p• • +c3 19 f8 79 65 73 21 ┆ •••yes! +---- +==== + +==== +Input: + +---- +88*16 + +!t a85 + "I am determined to be cheerful and happy in whatever situation " + "I may find myself. For I have learned that the greater part of " + "our misery or unhappiness is determined not by our circumstance " + "but by our disposition." +!end + +@128~99h + +!t qp [ICITTE - beg : 8] * 50 !end +---- + +Output: + +---- +88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 88 ┆ •••••••••••••••• +38 4b 5f 47 59 2b 43 6f 26 2a 41 54 44 58 25 44 ┆ 8K_GY+Co&*ATDX%D +49 6d 3f 24 46 44 69 3a 32 41 4b 59 4a 72 41 53 ┆ Im?$FDi:2AKYJrAS +23 6d 6f 46 5f 69 31 2f 44 49 61 6c 27 40 3b 70 ┆ #moF_i1/DIal'@;p +31 32 2b 44 47 5e 39 47 41 28 45 2c 41 54 68 58 ┆ 12+DG^9GA(E,AThX +2a 2b 45 4d 37 3d 46 5e 5d 42 2b 44 66 2d 5b 68 ┆ *+EM7=F^]B+Df-[h +2b 44 6b 50 34 2b 44 2c 3e 2a 41 30 3e 60 37 46 ┆ +DkP4+D,>*A0>`7F +28 4b 30 22 2f 67 2a 57 25 45 5a 64 70 72 42 4f ┆ (K0"/g*W%EZdprBO +51 27 71 2b 44 62 55 74 45 63 2c 48 21 2b 45 56 ┆ Q'q+DbUtEc,H!+EV +3a 2a 46 3c 47 5b 3d 41 4b 59 57 2b 41 52 54 5b ┆ :*F +63 2e 46 3c 47 25 3c 2b 45 29 43 43 2b 43 66 2c ┆ c.F> does so. + +A macro definition may only exist at the root level, that is, not within +a <>, a <>, a +<>, or another +<>. + +All macro definitions must have unique names. + +A macro definition is: + +. The `!macro` or `!m` opening. + +. A valid {py3} name (the macro name). + +. The `(` parameter name list prefix. + +. 
A comma-separated list of zero or more unique parameter names, + each one being a valid {py3} name. + +. The `)` parameter name list suffix. + +. Zero or more items except, recursively, a macro definition block. + +. The `!end` closing. + +==== +---- +!macro bake() + !le [ICITTE * 8 : 16] + u16le"predict explode" +!end +---- +==== + +==== +---- +!macro nail(rep, with_extra, val) + {iter = 1} + + !repeat rep + [val + iter : uleb128] + [0xdeadbeef : 32] + {iter = iter + 1} + !end + + !if with_extra + "meow mix\0" + !end +!end +---- +==== + +=== Macro expansion + +A _macro expansion_ expands the items of a defined +<>. + +The macro to expand must be defined _before_ the expansion. + +The <> before handling the first item of the chosen macro +is: + +<>:: + Unchanged. + +<>:: + Unchanged. + +Variables:: + The only available variables initially are the macro parameters. + +Labels:: + None. + +The state after having handled the last item of the chosen macro is: + +Current offset:: + The one before handling the first item of the macro plus the size + of the generated data of the macro expansion. ++ +IMPORTANT: This means <> +items within the expanded macro don't impact the final current offset. + +Current byte order:: + The one before handling the first item of the macro. + +Variables:: + The ones before handling the first item of the macro. + +Labels:: + The ones before handling the first item of the macro. + +A macro expansion is: + +. The `m:` prefix. + +. A valid {py3} name (the name of the macro to expand). + +. The `(` parameter value list prefix. + +. A comma-separated list of zero or more unique parameter values. ++ +The number of parameter values must match the number of parameter +names of the definition of the chosen macro. ++ +A parameter value is one of: ++ +-- +* A <>, possibly negative. + +* A constant floating point number. 
+ +* The ``pass:[{]`` prefix, a valid {py3} expression of which the + evaluation result type is `int` or `bool` (automatically converted to + `int`), and the `}` suffix. ++ +For a macro expansion at some source location{nbsp}__**L**__, this +expression may contain: + +** The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group. +** The name of any <> known + at{nbsp}__**L**__. + ++ +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before handling the items of the +chosen macro). + +* A valid {py3} name. ++ +For the name `__NAME__`, this is equivalent to the +`pass:[{]__NAME__pass:[}]` form above. +-- + +. The `)` parameter value list suffix. + +==== +Input: + +---- +!macro bake() + !le [ICITTE * 8 : 16] + u16le"predict explode" +!end + +"hello [" m:bake() "] world" + +m:bake() * 5 +---- + +Output: + +---- +68 65 6c 6c 6f 20 5b 38 00 70 00 72 00 65 00 64 ┆ hello [8•p•r•e•d +00 69 00 63 00 74 00 20 00 65 00 78 00 70 00 6c ┆ •i•c•t• •e•x•p•l +00 6f 00 64 00 65 00 5d 20 77 6f 72 6c 64 70 01 ┆ •o•d•e•] worldp• +70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • +65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 02 ┆ e•x•p•l•o•d•e•p• +70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • +65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 03 ┆ e•x•p•l•o•d•e•p• +70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • +65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 04 ┆ e•x•p•l•o•d•e•p• +70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • +65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 70 05 ┆ e•x•p•l•o•d•e•p• +70 00 72 00 65 00 64 00 69 00 63 00 74 00 20 00 ┆ p•r•e•d•i•c•t• • +65 00 78 00 70 00 6c 00 6f 00 64 00 65 00 ┆ e•x•p•l•o•d•e• +---- +==== + +==== +Input: + +---- +!macro A(val, is_be) + !le + + !if is_be + !be + !end + + [val : 16] +!end + +!macro B(rep, is_be) + {iter = 1} + + !repeat rep + m:A({iter * 3}, is_be) + {iter = iter + 1} + !end +!end + +m:B(5, 1) +m:B(3, 
0) +---- + +Output: + +---- +00 03 00 06 00 09 00 0c 00 0f 03 00 06 00 09 00 ---- ==== -=== Repetition +==== +Input: + +---- +!macro flt32be(val) !be [val : 32] !end + +"CHEETOS" +m:flt32be(-42.17) +m:flt32be(56.23e-4) +---- + +Output: -A _repetition_ represents the bytes of an item repeated a given number -of times. +---- +43 48 45 45 54 4f 53 c2 28 ae 14 3b b8 41 25 ┆ CHEETOS•(••;•A% +---- +==== + +=== Post-item repetition -A repetition is: +A _post-item repetition_ represents the bytes of an item repeated a +given number of times. -. Any item. +A post-item repetition is: + +. One of those items: + +** A <>. +** A <>. +** A <>. +** An <>. +** A <>. +** A <>. +** A <>. +** A <>. . The ``pass:[*]`` character. @@ -901,28 +2308,40 @@ A repetition is: ** A positive integer (hexadecimal starting with `0x` or `0X` accepted) which is the number of times to repeat the previous item. -** The ``pass:[{]`` prefix, a valid {py3} expression, and the - ``pass:[}]`` suffix. +** The ``pass:[{]`` prefix, a valid {py3} expression of which the + evaluation result type is `int` or `bool` (automatically converted to + `int`), and the `}` suffix. + -For a repetition at some source location{nbsp}__**L**__, this expression -may contain: +For a post-item repetition at some source location{nbsp}__**L**__, this +expression may contain: + -- -* The name of any <> defined before{nbsp}__**L**__ and - which isn't part of its repeated item. +* The name of any <> defined before{nbsp}__**L**__ + which isn't within a nested group and + which isn't part of the repeated item. * The name of any <> known at{nbsp}__**L**__, which isn't part of its repeated item, and which - doesn't, directly or indirectly, refer to a label defined - after{nbsp}__**L**__. + doesn't. -- + -This expression must not contain the special name `ICITTE`. +The value of the special name `ICITTE` (`int` type) in this expression +is the <> (before handling the items to +repeat). + +** A valid {py3} name. 
++ +For the name `__NAME__`, this is equivalent to the +`pass:[{]__NAME__pass:[}]` form above. + +You may also use a <>. The form +``__ITEM__{nbsp}pass:[*]{nbsp}__X__`` is equivalent to +``!repeat{nbsp}__X__{nbsp}__ITEM__{nbsp}!end``. ==== Input: ---- -{end - ICITTE - 1 : 8} * 0x100 +[end - ICITTE - 1 : 8] * 0x100 ---- Output: @@ -980,32 +2399,6 @@ ff ee ff ee ff ee ff ee ff ee ff ee ff 11 22 33 ┆ •••••••• ---- ==== -==== -This example shows how to use a repetition as a conditional section -depending on some predefined variable. - -Input: - ----- -aa bb cc dd -(ee ff "meow mix" 00) * {cond} -{be} {-1993:16} ----- - -Output (`cond` is 0): - ----- -aa bb cc dd f8 37 ----- - -Output (`cond` is 1): - ----- -aa bb cc dd ee ff 6d 65 6f 77 20 6d 69 78 00 f8 ┆ ••••••meow mix•• -37 ┆ 7 ----- -==== - == Command-line tool If you <> the `normand` package, then you @@ -1043,10 +2436,11 @@ use the `--help` option to learn more. == {py3} API -The whole `normand` package/module API is: +The whole `normand` package/module public API is: [source,python] ---- +# Byte order. class ByteOrder(enum.Enum): # Big endian. BE = ... @@ -1055,7 +2449,8 @@ class ByteOrder(enum.Enum): LE = ... -class TextLoc: +# Text location. +class TextLocation: # Line number. @property def line_no(self) -> int: @@ -1067,16 +2462,38 @@ class TextLoc: ... -class ParseError(RuntimeError): +# Parsing error message. +class ParseErrorMessage: + # Message text. + @property + def text(self): + ... + # Source text location. @property - def text_loc(self) -> TextLoc: + def text_location(self): ... -SymbolsT = typing.Dict[str, int] +# Parsing error. +class ParseError(RuntimeError): + # Parsing error messages. + # + # The first message is the most _specific_ one. + @property + def messages(self): + ... + +# Variables dictionary type (for type hints). +VariablesT = typing.Dict[str, typing.Union[int, float]] + +# Labels dictionary type (for type hints). +LabelsT = typing.Dict[str, int] + + +# Parsing result. 
class ParseResult: # Generated data. @property @@ -1104,6 +2521,9 @@ class ParseResult: ... +# Parses the `normand` input using the initial state defined by +# `init_variables`, `init_labels`, `init_offset`, and `init_byte_order`, +# and returns the corresponding parsing result. def parse(normand: str, init_variables: typing.Optional[SymbolsT] = None, init_labels: typing.Optional[SymbolsT] = None, @@ -1137,6 +2557,10 @@ $ normand <<< '"lol" * 10 0a' * https://github.com/psf/black[Black] * https://pycqa.github.io/isort/[isort] +Licensing and copyright follow the +https://reuse.software/tutorial/[REUSE] specification and are checked +with the https://github.com/fsfe/reuse-tool[reuse tool]. + === Testing Use https://docs.pytest.org/[pytest] to test Normand once the package is