      1 <html>
<head>
<title>
PyASN1 codecs
</title>
</head>
      7 <body>
      8 <center>
      9 <table width=60%>
     10 <tr>
     11 <td>
     12 <h3>
     13 2. PyASN1 Codecs
     14 </h3>
     15 
     16 <p>
In the ASN.1 context, a
<a href=http://en.wikipedia.org/wiki/Codec>codec</a>
is a program that transforms between concrete data structures and a stream
of octets suitable for transmission over the wire. This serialized form of
data is sometimes called <i>substrate</i> or <i>essence</i>.
     22 </p>
     23 
     24 <p>
In the pyasn1 implementation, substrate takes the form of Python 3 bytes or
Python 2 string objects.
     27 </p>
     28 
     29 <p>
One property of a codec is its ability to cope with incomplete data and/or
substrate, which implies that the codec is stateful. In other words, when a
decoder runs out of substrate while the data item being recovered is still
incomplete, a stateful codec would suspend and then complete the data item
recovery whenever the rest of the substrate becomes available. Similarly, a
stateful encoder would encode data items in multiple steps, waiting for
source data to arrive. Codec restartability is especially important when an
application deals with large volumes of data and/or runs on low RAM. For an
interesting discussion of codec options and design choices, refer to the
<a href=http://directory.apache.org/subprojects/asn1/>Apache ASN.1 project</a>.
     41 </p>
     42 
     43 <p>
     44 As of this writing, codecs implemented in pyasn1 are all stateless, mostly
     45 to keep the code simple.
     46 </p>
     47 
     48 <p>
     49 The pyasn1 package currently supports 
     50 <a href=http://en.wikipedia.org/wiki/Basic_encoding_rules>BER</a> codec and
     51 its variations -- 
     52 <a href=http://en.wikipedia.org/wiki/Canonical_encoding_rules>CER</a> and
     53 <a href=http://en.wikipedia.org/wiki/Distinguished_encoding_rules>DER</a>.
     54 More ASN.1 codecs are planned for implementation in the future.
     55 </p>
     56 
     57 <a name="2.1"></a>
     58 <h4>
     59 2.1 Encoders
     60 </h4>
     61 
     62 <p>
An encoder transforms pyasn1 value objects into substrate. Only pyasn1 value
objects can be serialized; attempts to process pyasn1 type objects will
cause the encoder to fail.
     66 </p>
     67 
     68 <p>
     69 The following code will create a pyasn1 Integer object and serialize it with
     70 BER encoder:
     71 </p>
     72 
     73 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
     74 <pre>
     75 >>> from pyasn1.type import univ
     76 >>> from pyasn1.codec.ber import encoder
     77 >>> encoder.encode(univ.Integer(123456))
     78 b'\x02\x03\x01\xe2@'
     79 >>>
     80 </pre>
     81 </td></tr></table>
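
<p>
The substrate above reads as a tag octet (0x02, INTEGER), a length octet
(0x03) and three value octets. The following is a minimal hand-rolled sketch
of that layout, not the pyasn1 implementation. The helper name
<b>encode_integer</b> is hypothetical, and the sketch assumes the value fits
the short length form.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of BER INTEGER encoding; this is not pyasn1 code.
# Assumption: the content fits the short length form (at most 127 octets).

def encode_integer(value):
    """Return tag + length + value octets for an ASN.1 INTEGER."""
    # Smallest two's-complement width that can hold the value.
    # For negative numbers, -2**n still fits in n + 1 bits, hence the + 1.
    magnitude = value if value >= 0 else value + 1
    width = magnitude.bit_length() // 8 + 1
    content = value.to_bytes(width, "big", signed=True)
    if len(content) > 127:
        raise ValueError("long length form is not covered by this sketch")
    return bytes([0x02, len(content)]) + content


print(encode_integer(123456))  # b'\x02\x03\x01\xe2@'
```
</pre>
</td></tr></table>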
     82 
     83 <p>
The BER standard also defines a so-called <i>indefinite length</i> encoding
form, which makes the processing of large data items more memory efficient.
It is mostly useful when the encoder does not have the whole value at once
and the length of the value cannot be determined at the beginning of encoding.
     88 </p>
     89 
     90 <p>
<i>Constructed encoding</i> is another feature of BER closely related to the
indefinite length form. In essence, a large scalar value (such as an ASN.1
character string or BitString type) can be chopped into smaller chunks by the
encoder and transmitted incrementally to limit memory consumption. Unlike the
indefinite length case, the length of the whole value must be known in
advance when using the constructed, definite length encoding form.
     97 </p>
     98 
     99 <p>
Since pyasn1 codecs are not restartable, the pyasn1 encoder can only encode
a data item all at once. However, even in this case, generating indefinite
length encoding may help a low-memory receiver, running a restartable
decoder, to process a large data item.
    104 </p>
    105 
    106 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    107 <pre>
    108 >>> from pyasn1.type import univ
    109 >>> from pyasn1.codec.ber import encoder
    110 >>> encoder.encode(
    111 ...   univ.OctetString('The quick brown fox jumps over the lazy dog'),
    112 ...   defMode=False,
    113 ...   maxChunkSize=8
    114 ... )
    115 b'$\x80\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \
    116 t\x04\x08he lazy \x04\x03dog\x00\x00'
    117 >>>
    118 >>> encoder.encode(
    119 ...   univ.OctetString('The quick brown fox jumps over the lazy dog'),
    120 ...   maxChunkSize=8
    121 ... )
    122 b'$7\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \
    123 t\x04\x08he lazy \x04\x03dog'
    124 </pre>
    125 </td></tr></table>
    126 
    127 <p>
Setting the <b>defMode</b> encoder parameter to False disables definite
length encoding mode, which is otherwise the default, as the second call
shows. The optional <b>maxChunkSize</b> parameter specifies the desired
substrate chunk size, which influences memory requirements at the decoder's
end.
    131 </p>
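
<p>
To make the two substrates above easier to read, here is a rough sketch that
builds the same octets by hand. It is not how pyasn1 does it. The function
name <b>encode_octets_chunked</b> and its parameters are hypothetical, and
only short-form lengths are handled. The constructed OCTET STRING tag is 0x24,
the indefinite form uses the length octet 0x80, and two zero octets close it.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of constructed OCTET STRING encoding; not pyasn1 code.
# Assumption: every length fits the short form (at most 127 octets).

def encode_octets_chunked(data, chunk_size, definite=True):
    if chunk_size not in range(1, 128):
        raise ValueError("chunk_size must be between 1 and 127 in this sketch")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Each chunk becomes a primitive OCTET STRING: tag 0x04, length, content.
    body = b"".join(bytes([0x04, len(chunk)]) + chunk for chunk in chunks)
    if not definite:
        # Indefinite form: length octet 0x80, end-of-contents octets 00 00.
        return b"\x24\x80" + body + b"\x00\x00"
    if len(body) > 127:
        raise ValueError("long length form is not covered by this sketch")
    # Definite form: the total length of all chunks is written up front.
    return bytes([0x24, len(body)]) + body


text = b"The quick brown fox jumps over the lazy dog"
print(encode_octets_chunked(text, 8, definite=False))
print(encode_octets_chunked(text, 8))
```
</pre>
</td></tr></table>

<p>
Note that the definite form needs the combined size of all chunks before the
first octet can be written, whereas the indefinite form does not.
</p>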
    132 
    133 <p>
To use CER or DER encoders, one needs to explicitly import and call them; the
APIs are all compatible.
    136 </p>
    137 
    138 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    139 <pre>
    140 >>> from pyasn1.type import univ
    141 >>> from pyasn1.codec.ber import encoder as ber_encoder
    142 >>> from pyasn1.codec.cer import encoder as cer_encoder
    143 >>> from pyasn1.codec.der import encoder as der_encoder
    144 >>> ber_encoder.encode(univ.Boolean(True))
    145 b'\x01\x01\x01'
    146 >>> cer_encoder.encode(univ.Boolean(True))
    147 b'\x01\x01\xff'
    148 >>> der_encoder.encode(univ.Boolean(True))
    149 b'\x01\x01\xff'
    150 >>>
    151 </pre>
    152 </td></tr></table>
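
<p>
The differing outputs above follow from the encoding rules themselves: BER
accepts any non-zero content octet for TRUE, while CER and DER require 0xFF.
The sketch below illustrates that rule by hand. The names
<b>encode_boolean</b>, <b>decode_boolean</b> and the <b>canonical</b> flag
are hypothetical, and the sketch makes no claim about how strictly any
particular decoder enforces the rule.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of BOOLEAN encoding under BER versus CER/DER.
# This is not pyasn1 code; all names here are illustrative.

def encode_boolean(value, canonical=False):
    if value:
        # BER leaves the choice of non-zero octet open; 0x01 matches the
        # BER output shown above. CER and DER fix TRUE to 0xFF.
        content = 0xFF if canonical else 0x01
    else:
        content = 0x00
    return bytes([0x01, 0x01, content])


def decode_boolean(substrate, canonical=False):
    tag, length, content = substrate[0], substrate[1], substrate[2]
    if (tag, length) != (0x01, 0x01):
        raise ValueError("not a BOOLEAN")
    if canonical and content not in (0x00, 0xFF):
        raise ValueError("canonical rules allow only 0x00 and 0xFF")
    return content != 0


print(encode_boolean(True))                  # b'\x01\x01\x01'
print(encode_boolean(True, canonical=True))  # b'\x01\x01\xff'
```
</pre>
</td></tr></table>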
    153 
    154 <a name="2.2"></a>
    155 <h4>
    156 2.2 Decoders
    157 </h4>
    158 
    159 <p>
In the process of decoding, pyasn1 value objects are created and linked to
each other, based on the information contained in the substrate. Thus,
the original pyasn1 value object(s) are recovered.
    163 </p>
    164 
    165 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    166 <pre>
    167 >>> from pyasn1.type import univ
    168 >>> from pyasn1.codec.ber import encoder, decoder
    169 >>> substrate = encoder.encode(univ.Boolean(True))
    170 >>> decoder.decode(substrate)
    171 (Boolean('True(1)'), b'')
    172 >>>
    173 </pre>
    174 </td></tr></table>
    175 
    176 <p>
As the code snippet above shows, the pyasn1 decoder accepts substrate
as an argument and returns a tuple of a pyasn1 value object (the
top-level one in the case of a constructed object) and the unprocessed part
of the input substrate.
    181 </p>
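
<p>
Returning the unprocessed remainder is what lets a caller walk through
several items packed back to back. The sketch below shows the idea with a
hand-written tag-length-value reader rather than the pyasn1 decoder. The
name <b>read_tlv</b> is hypothetical, and the sketch assumes single-octet
tags and short-form lengths.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of consuming concatenated items; not pyasn1 code.
# Assumptions: single-octet tags and short definite lengths only.

def read_tlv(substrate):
    """Split one item off the front; return (tag, content, rest)."""
    tag, length = substrate[0], substrate[1]
    if length >= 0x80:
        raise ValueError("only short definite lengths are handled here")
    end = 2 + length
    if end > len(substrate):
        raise ValueError("substrate is truncated")
    return tag, substrate[2:end], substrate[end:]


# A Boolean followed directly by an Integer.
stream = b"\x01\x01\xff" + b"\x02\x01\x0c"
items = []
while stream:
    # Feed the leftover octets back in until nothing remains.
    tag, content, stream = read_tlv(stream)
    items.append((tag, content))

print(items)  # [(1, b'\xff'), (2, b'\x0c')]
```
</pre>
</td></tr></table>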
    182 
    183 <p>
All pyasn1 decoders handle both definite and indefinite length encoding
modes automatically; explicit switching from one mode to another is not
required.
    187 </p>
    188 
    189 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    190 <pre>
    191 >>> from pyasn1.type import univ
    192 >>> from pyasn1.codec.ber import encoder, decoder
    193 >>> substrate = encoder.encode(
    194 ...   univ.OctetString('The quick brown fox jumps over the lazy dog'),
    195 ...   defMode=False,
    196 ...   maxChunkSize=8
    197 ... )
    198 >>> decoder.decode(substrate)
    199 (OctetString(b'The quick brown fox jumps over the lazy dog'), b'')
    200 >>>
    201 </pre>
    202 </td></tr></table>
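
<p>
A decoder can tell the two modes apart from the length octet alone: 0x80
announces the indefinite form, which then runs until two zero octets. The
sketch below reassembles a chunked OCTET STRING in either form. It is a
hand-rolled illustration, not the pyasn1 decoder; the name
<b>decode_octet_string</b> is hypothetical and long-form lengths are not
handled.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of OCTET STRING reassembly; not pyasn1 code.
# Assumption: lengths use the short form or the indefinite marker 0x80.

def decode_octet_string(substrate):
    """Return (payload, rest) for a primitive or constructed OCTET STRING."""
    tag, length = substrate[0], substrate[1]
    if tag not in (0x04, 0x24):
        raise ValueError("not an OCTET STRING")
    if length > 0x80:
        raise ValueError("long length form is not covered by this sketch")
    if tag == 0x04:
        if length == 0x80:
            raise ValueError("primitive encoding needs a definite length")
        return substrate[2:2 + length], substrate[2 + length:]
    chunks = []
    if length == 0x80:
        # Indefinite form: read chunks until the end-of-contents octets.
        rest = substrate[2:]
        while rest[:2] != b"\x00\x00":
            chunk, rest = decode_octet_string(rest)
            chunks.append(chunk)
        return b"".join(chunks), rest[2:]
    # Definite form: the chunks occupy exactly `length` octets.
    body, rest = substrate[2:2 + length], substrate[2 + length:]
    while body:
        chunk, body = decode_octet_string(body)
        chunks.append(chunk)
    return b"".join(chunks), rest


indefinite = b"\x24\x80\x04\x03The\x04\x04 fox\x00\x00"
definite = b"\x24\x0b\x04\x03The\x04\x04 fox"
print(decode_octet_string(indefinite))  # (b'The fox', b'')
print(decode_octet_string(definite))    # (b'The fox', b'')
```
</pre>
</td></tr></table>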
    203 
    204 <p>
Speaking of BER/CER/DER encoding, in many situations substrate may not
contain all the information needed for complete and accurate recovery of
ASN.1 values. The most obvious cases include implicitly tagged ASN.1 types
and constrained types.
    209 </p>
    210 
    211 <p>
As discussed earlier in this handbook, when an ASN.1 type is implicitly
tagged, the previous outermost tag is lost and never appears in substrate.
If it is the base tag that gets lost, the decoder cannot pick a type-specific
value decoder from its table of built-in types, and therefore cannot recover
the value part based only on the information contained in substrate. The
approach taken by the pyasn1 decoder is to use a prototype pyasn1 type object
(or a set of them) to <i>guide</i> the decoding process by matching [possibly
incomplete] tags recovered from substrate with those found in prototype pyasn1
type objects (called pyasn1 specification objects in the rest of this paper).
    221 </p>
    222 
    223 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    224 <pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import decoder
>>> decoder.decode(b'\x02\x01\x0c', asn1Spec=univ.Integer())
(Integer(12), b'')
    228 >>>
    229 </pre>
    230 </td></tr></table>
    231 
    232 <p>
The decoder neither modifies the pyasn1 specification object nor uses
its current values (if it's a pyasn1 value object); rather, it uses the
object as a hint for choosing the proper decoder and as a pattern for
creating new objects:
    236 </p>
    237 
    238 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    239 <pre>
    240 >>> from pyasn1.type import univ, tag
    241 >>> from pyasn1.codec.ber import encoder, decoder
    242 >>> i = univ.Integer(12345).subtype(
    243 ...   implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40)
    244 ... )
    245 >>> substrate = encoder.encode(i)
    246 >>> substrate
    247 b'\x9f(\x0209'
    248 >>> decoder.decode(substrate)
    249 Traceback (most recent call last):
    250 ...
    251 pyasn1.error.PyAsn1Error: 
    252    TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec
    253 >>> decoder.decode(substrate, asn1Spec=i)
    254 (Integer(12345), b'')
    255 >>>
    256 </pre>
    257 </td></tr></table>
    258 
    259 <p>
Notice in the example above that an attempt to run the decoder without
passing a pyasn1 specification object fails because the recovered tag does
not belong to any of the built-in types.
    263 </p>
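
<p>
The implicitly tagged substrate above starts with the octets 0x9F 0x28
instead of the usual 0x02 of an Integer, which is why no built-in type
matches. The sketch below shows, by hand, how such identifier octets are
formed from the tag class, format and number. The function name
<b>encode_tag</b> and the two constants are hypothetical and not part of
pyasn1.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of BER identifier (tag) octets; not pyasn1 code.
# The class values mirror the tagClass numbers printed in the traceback.
CLASS_UNIVERSAL = 0x00
CLASS_CONTEXT = 0x80


def encode_tag(tag_class, constructed, number):
    first = tag_class | (0x20 if constructed else 0x00)
    if number in range(31):
        # Low tag numbers share the first octet with class and format bits.
        return bytes([first | number])
    # High tag numbers: five low bits all set, then base-128 digits,
    # every digit except the last flagged with the top bit.
    digits = [number % 128]
    number //= 128
    while number:
        digits.append(0x80 | number % 128)
        number //= 128
    return bytes([first | 0x1F]) + bytes(reversed(digits))


print(encode_tag(CLASS_UNIVERSAL, False, 2))  # b'\x02'
print(encode_tag(CLASS_CONTEXT, False, 40))   # b'\x9f('
```
</pre>
</td></tr></table>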
    264 
    265 <p>
Another important feature of guided decoder operation is the use of
value constraints possibly present in the pyasn1 specification object.
To explain this, we will decode an arbitrary integer, first into a generic
Integer and then into a constrained one.
    270 </p>
    271 
    272 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    273 <pre>
    274 >>> from pyasn1.type import univ, constraint
    275 >>> from pyasn1.codec.ber import encoder, decoder
    276 >>> class DialDigit(univ.Integer):
    277 ...   subtypeSpec = constraint.ValueRangeConstraint(0,9)
    278 >>> substrate = encoder.encode(univ.Integer(13))
    279 >>> decoder.decode(substrate)
    280 (Integer(13), b'')
    281 >>> decoder.decode(substrate, asn1Spec=DialDigit())
    282 Traceback (most recent call last):
    283 ...
    284 pyasn1.type.error.ValueConstraintError:
    285   ValueRangeConstraint(0, 9) failed at: 13
    286 >>> 
    287 </pre>
    288 </td></tr></table>
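
<p>
The same substrate is thus accepted or rejected depending on what the caller
expects of it. As a loose analogy in plain Python, the sketch below decodes
an integer by hand and then checks it against an optional set of permitted
values. The name <b>decode_integer</b> and its <b>allowed</b> parameter are
hypothetical, and the sketch does not reflect how pyasn1 implements
constraints.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of a constraint check during decoding; not pyasn1 code.
# Assumption: short definite length form only.

def decode_integer(substrate, allowed=None):
    """Return (value, rest); reject values outside `allowed` if given."""
    tag, length = substrate[0], substrate[1]
    if tag != 0x02:
        raise ValueError("not an INTEGER")
    if length >= 0x80:
        raise ValueError("only short definite lengths are handled here")
    value = int.from_bytes(substrate[2:2 + length], "big", signed=True)
    # The octets are the same either way; only the expectation differs.
    if allowed is not None and value not in allowed:
        raise ValueError("%d is outside the permitted values" % value)
    return value, substrate[2 + length:]


substrate = b"\x02\x01\x0d"
print(decode_integer(substrate))  # (13, b'')
try:
    decode_integer(substrate, allowed=range(0, 10))
except ValueError as exc:
    print("rejected:", exc)
```
</pre>
</td></tr></table>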
    289 
    290 <p>
As with encoders, to use CER or DER decoders an application has to
explicitly import and call them; all APIs are compatible.
    293 </p>
    294 
    295 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    296 <pre>
    297 >>> from pyasn1.type import univ
    298 >>> from pyasn1.codec.ber import encoder as ber_encoder
    299 >>> substrate = ber_encoder.encode(univ.OctetString('http://pyasn1.sf.net'))
    300 >>>
    301 >>> from pyasn1.codec.ber import decoder as ber_decoder
    302 >>> from pyasn1.codec.cer import decoder as cer_decoder
    303 >>> from pyasn1.codec.der import decoder as der_decoder
    304 >>> 
    305 >>> ber_decoder.decode(substrate)
    306 (OctetString(b'http://pyasn1.sf.net'), b'')
    307 >>> cer_decoder.decode(substrate)
    308 (OctetString(b'http://pyasn1.sf.net'), b'')
    309 >>> der_decoder.decode(substrate)
    310 (OctetString(b'http://pyasn1.sf.net'), b'')
    311 >>> 
    312 </pre>
    313 </td></tr></table>
    314 
    315 <a name="2.2.1"></a>
    316 <h4>
    317 2.2.1 Decoding untagged types
    318 </h4>
    319 
    320 <p>
It has already been mentioned that ASN.1 has two "special case" types:
CHOICE and ANY. They differ from other types with respect to tagging:
unless these two are additionally tagged, neither of them has a tag of
its own. Therefore these types become invisible in substrate
and cannot be recovered without passing a pyasn1 specification object to
the decoder.
    327 </p>
    328 
    329 <p>
    330 To explain the issue, we will first prepare a Choice object to deal with:
    331 </p>
    332 
    333 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    334 <pre>
    335 >>> from pyasn1.type import univ, namedtype
    336 >>> class CodeOrMessage(univ.Choice):
    337 ...   componentType = namedtype.NamedTypes(
    338 ...     namedtype.NamedType('code', univ.Integer()),
    339 ...     namedtype.NamedType('message', univ.OctetString())
    340 ...   )
    341 >>>
    342 >>> codeOrMessage = CodeOrMessage()
    343 >>> codeOrMessage.setComponentByName('message', 'my string value')
    344 >>> print(codeOrMessage.prettyPrint())
    345 CodeOrMessage:
    346  message=b'my string value'
    347 >>>
    348 </pre>
    349 </td></tr></table>
    350 
    351 <p>
    352 Let's now encode this Choice object and then decode its substrate
    353 with and without pyasn1 specification object:
    354 </p>
    355 
    356 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    357 <pre>
    358 >>> from pyasn1.codec.ber import encoder, decoder
    359 >>> substrate = encoder.encode(codeOrMessage)
    360 >>> substrate
    361 b'\x04\x0fmy string value'
    362 >>> encoder.encode(univ.OctetString('my string value'))
    363 b'\x04\x0fmy string value'
    364 >>>
    365 >>> decoder.decode(substrate)
    366 (OctetString(b'my string value'), b'')
    367 >>> codeOrMessage, substrate = decoder.decode(substrate, asn1Spec=CodeOrMessage())
    368 >>> print(codeOrMessage.prettyPrint())
    369 CodeOrMessage:
    370  message=b'my string value'
    371 >>>
    372 </pre>
    373 </td></tr></table>
    374 
    375 <p>
The first thing to notice in the listing above is that the substrate produced
for our Choice value object is identical to the substrate for an OctetString
object initialized to the same value. In other words, the encoding carries
no information about the Choice type itself.
    380 </p>
    381 
    382 <p>
    383 Sure enough, that kind of substrate will decode into an OctetString object,
    384 unless original Choice type object is passed to decoder to guide the decoding
    385 process.
    386 </p>
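
<p>
Conceptually, guiding the decoder with a Choice specification amounts to
looking up the tag found in substrate among the tags of the alternatives.
The sketch below shows that lookup by hand. It is not the pyasn1
implementation; the name <b>decode_choice</b> and the
<b>CODE_OR_MESSAGE</b> table are hypothetical stand-ins for the
CodeOrMessage type above, and only single-octet tags and short lengths are
handled.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of matching a tag against CHOICE alternatives.
# This is not pyasn1 code; the table stands in for CodeOrMessage.
CODE_OR_MESSAGE = {0x02: "code", 0x04: "message"}


def decode_choice(substrate, alternatives):
    """Return (alternative name, content octets, rest)."""
    tag, length = substrate[0], substrate[1]
    if length >= 0x80:
        raise ValueError("only short definite lengths are handled here")
    if tag not in alternatives:
        raise ValueError("tag 0x%02x matches no alternative" % tag)
    # The CHOICE adds no octets of its own; the inner tag selects the name.
    return alternatives[tag], substrate[2:2 + length], substrate[2 + length:]


print(decode_choice(b"\x04\x0fmy string value", CODE_OR_MESSAGE))
# ('message', b'my string value', b'')
```
</pre>
</td></tr></table>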
    387 
    388 <p>
Similarly, the untagged ANY type behaves differently in the decoding phase:
when the decoder bumps into an Any object in a pyasn1 specification, it stops
decoding and puts all the substrate into a new Any value object in the form
of an octet string. The application concerned can then re-run the decoder
with an additional, more exact pyasn1 specification object to recover the
contents of the Any object.
    395 </p>
    396 
    397 <p>
    398 As it was mentioned elsewhere in this paper, Any type allows for incomplete
    399 or changing ASN.1 specification to be handled gracefully by decoder and
    400 applications.
    401 </p>
    402 
    403 <p>
To illustrate how the Any type works, we first have to set the stage
by encoding a pyasn1 object and then putting its substrate into an Any
object.
    407 </p>
    408 
    409 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    410 <pre>
    411 >>> from pyasn1.type import univ
    412 >>> from pyasn1.codec.ber import encoder, decoder
    413 >>> innerSubstrate = encoder.encode(univ.Integer(1234))
    414 >>> innerSubstrate
    415 b'\x02\x02\x04\xd2'
    416 >>> any = univ.Any(innerSubstrate)
    417 >>> any
    418 Any(b'\x02\x02\x04\xd2')
    419 >>> substrate = encoder.encode(any)
    420 >>> substrate
    421 b'\x02\x02\x04\xd2'
    422 >>>
    423 </pre>
    424 </td></tr></table>
    425 
    426 <p>
As with Choice type encoding, there are no traces of the Any type in
substrate. Obviously, the substrate we are dealing with will decode into the
inner [Integer] component unless a pyasn1 specification is given to guide the
decoder. Continuing the previous code:
    431 </p>
    432 
    433 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    434 <pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder, decoder
>>> decoder.decode(substrate)
(Integer(1234), b'')
>>> any, substrate = decoder.decode(substrate, asn1Spec=univ.Any())
>>> any
Any(b'\x02\x02\x04\xd2')
>>> decoder.decode(any.asOctets())
(Integer(1234), b'')
    445 >>>
    446 </pre>
    447 </td></tr></table>
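
<p>
The two-pass pattern above, capturing the octets first and interpreting them
later, can be sketched without pyasn1 as follows. The names
<b>capture_any</b> and <b>interpret_integer</b> are hypothetical, and the
sketch assumes a single-octet tag and a short-form length. Unlike the
description above, it captures only the first item and returns the rest.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of deferred decoding, in the spirit of the Any type.
# This is not pyasn1 code; single-octet tags and short lengths only.

def capture_any(substrate):
    """First pass: keep the whole first item as raw octets, undecoded."""
    length = substrate[1]
    if length >= 0x80:
        raise ValueError("only short definite lengths are handled here")
    end = 2 + length
    return substrate[:end], substrate[end:]


def interpret_integer(raw):
    """Second pass: the caller now knows the captured item is an INTEGER."""
    if raw[0] != 0x02:
        raise ValueError("not an INTEGER")
    return int.from_bytes(raw[2:2 + raw[1]], "big", signed=True)


raw, rest = capture_any(b"\x02\x02\x04\xd2" + b"\x01\x01\xff")
print(raw)                     # b'\x02\x02\x04\xd2'
print(interpret_integer(raw))  # 1234
```
</pre>
</td></tr></table>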
    448 
    449 <p>
    450 Both CHOICE and ANY types are widely used in practice. Reader is welcome to
    451 take a look at 
    452 <a href=http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt>
    453 ASN.1 specifications of X.509 applications</a> for more information.
    454 </p>
    455 
    456 <a name="2.2.2"></a>
    457 <h4>
    458 2.2.2 Ignoring unknown types
    459 </h4>
    460 
    461 <p>
When dealing with a loosely specified ASN.1 structure, the receiving
end may not be aware of some types present in the substrate. It may then be
convenient to switch the decoder into a recovery mode. In that mode, the
decoder does not bail out when it hits an unknown tag but rather treats the
item as an Any type.
    467 </p>
    468 
    469 <table bgcolor="lightgray" border=0 width=100%><TR><TD>
    470 <pre>
    471 >>> from pyasn1.type import univ, tag
    472 >>> from pyasn1.codec.ber import encoder, decoder
    473 >>> taggedInt = univ.Integer(12345).subtype(
    474 ...   implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40)
    475 ... )
    476 >>> substrate = encoder.encode(taggedInt)
    477 >>> decoder.decode(substrate)
    478 Traceback (most recent call last):
    479 ...
    480 pyasn1.error.PyAsn1Error: TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec
    481 >>>
    482 >>> decoder.decode.defaultErrorState = decoder.stDumpRawValue
    483 >>> decoder.decode(substrate)
(Any(b'\x9f(\x0209'), b'')
    485 >>>
    486 </pre>
    487 </td></tr></table>
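
<p>
The idea behind the recovery mode can be sketched by hand: look the tag up
in a table of known types, and on a miss either fail or hand back the raw
item. The code below is an illustration only and does not use pyasn1. The
names <b>KNOWN_TAGS</b>, <b>split_tag</b>, <b>decode_item</b> and the
<b>recover</b> flag are hypothetical, and only short-form lengths are
handled.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
```python
# Hand-rolled sketch of "treat unknown tags as raw octets"; not pyasn1 code.
KNOWN_TAGS = {b"\x01": "Boolean", b"\x02": "Integer", b"\x04": "OctetString"}


def split_tag(substrate):
    """Return (identifier octets, rest), following the high-tag-number form."""
    end = 1
    if substrate[0] % 32 == 31:
        # Continuation octets have the top bit set; the last one does not.
        while substrate[end] >= 0x80:
            end += 1
        end += 1
    return substrate[:end], substrate[end:]


def decode_item(substrate, recover=False):
    tag, after_tag = split_tag(substrate)
    length = after_tag[0]
    if length >= 0x80:
        raise ValueError("only short definite lengths are handled here")
    content, rest = after_tag[1:1 + length], after_tag[1 + length:]
    if tag in KNOWN_TAGS:
        return (KNOWN_TAGS[tag], content), rest
    if not recover:
        raise ValueError("unknown tag %r" % tag)
    # Recovery: keep tag, length and content together as one raw item.
    whole = substrate[:len(substrate) - len(rest)]
    return ("Any", whole), rest


print(decode_item(b"\x9f(\x0209", recover=True))
# (('Any', b'\x9f(\x0209'), b'')
```
</pre>
</td></tr></table>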
    488 
    489 <p>
It's also possible to configure a custom decoder to handle unknown tags
found in substrate. This can be done by means of the <b>defaultRawDecoder</b>
attribute, which holds a reference to a type decoder object. Refer to the
source for API details.
    494 </p>
    495 
    496 <hr>
    497 
    498 </td>
    499 </tr>
    500 </table>
    501 </center>
    502 </body>
    503 </html>