<html>
<title>
PyASN1 codecs
</title>
<head>
</head>
<body>
<center>
<table width=60%>
<tr>
<td>
<h3>
2. PyASN1 Codecs
</h3>

<p>
In the ASN.1 context, a
<a href=http://en.wikipedia.org/wiki/Codec>codec</a>
is a program that transforms between concrete data structures and a stream
of octets suitable for transmission over the wire. This serialized form of
data is sometimes called <i>substrate</i> or <i>essence</i>.
</p>

<p>
In the pyasn1 implementation, substrate takes the shape of Python 3 bytes or
Python 2 string objects.
</p>

<p>
One of the properties of a codec is its ability to cope with incomplete
data and/or substrate, which implies that the codec is stateful. In other
words, when a decoder runs out of substrate while the data item being
recovered is still incomplete, a stateful codec suspends and completes data
item recovery whenever the rest of the substrate becomes available. Similarly,
a stateful encoder encodes data items in multiple steps, waiting for source
data to arrive. Codec restartability is especially important when an
application deals with large volumes of data and/or runs on low RAM. For an
interesting discussion of codec options and design choices, refer to the
<a href=http://directory.apache.org/subprojects/asn1/>Apache ASN.1 project</a>.
</p>

<p>
As of this writing, codecs implemented in pyasn1 are all stateless, mostly
to keep the code simple.
</p>

<p>
The pyasn1 package currently supports the
<a href=http://en.wikipedia.org/wiki/Basic_encoding_rules>BER</a> codec and
its variations,
<a href=http://en.wikipedia.org/wiki/Canonical_encoding_rules>CER</a> and
<a href=http://en.wikipedia.org/wiki/Distinguished_encoding_rules>DER</a>.
More ASN.1 codecs are planned for implementation in the future.
</p>

<a name="2.1"></a>
<h4>
2.1 Encoders
</h4>

<p>
An encoder transforms pyasn1 value objects into substrate. Only
pyasn1 value objects can be serialized; attempts to process pyasn1 type
objects cause the encoder to fail.
</p>

<p>
The following code creates a pyasn1 Integer object and serializes it with
the BER encoder:
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder
>>> encoder.encode(univ.Integer(123456))
b'\x02\x03\x01\xe2@'
>>>
</pre>
</td></tr></table>

<p>
The BER standard also defines a so-called <i>indefinite length</i> encoding
form, which makes processing of large data items more memory efficient. It is
mostly useful when the encoder does not have the whole value at once and the
length of the value cannot be determined at the beginning of encoding.
</p>

<p>
<i>Constructed encoding</i> is another feature of BER closely related to the
indefinite length form. In essence, a large scalar value (such as an ASN.1
character string or BitString value) can be chopped into smaller chunks by the
encoder and transmitted incrementally to limit memory consumption. Unlike the
indefinite length case, the length of the whole value must be known in advance
when using the constructed, definite length encoding form.
</p>

<p>
Since pyasn1 codecs are not restartable, a pyasn1 encoder can only encode a
data item all at once. However, even in this case, generating indefinite
length encoding may help a low-memory receiver, running a restartable decoder,
to process a large data item.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder
>>> encoder.encode(
...     univ.OctetString('The quick brown fox jumps over the lazy dog'),
...     defMode=False,
...     maxChunkSize=8
... )
b'$\x80\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \
t\x04\x08he lazy \x04\x03dog\x00\x00'
>>>
>>> encoder.encode(
...     univ.OctetString('The quick brown fox jumps over the lazy dog'),
...     maxChunkSize=8
... )
b'$7\x04\x08The quic\x04\x08k brown \x04\x08fox jump\x04\x08s over \
t\x04\x08he lazy \x04\x03dog'
</pre>
</td></tr></table>

<p>
Setting the <b>defMode</b> encoder parameter to False switches the encoder
from definite to indefinite length encoding mode, while the optional
<b>maxChunkSize</b> parameter specifies the desired substrate chunk size,
which influences memory requirements at the decoder's end.
</p>

<p>
To use the CER or DER encoders, explicitly import and call them; the
APIs are all compatible.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder as ber_encoder
>>> from pyasn1.codec.cer import encoder as cer_encoder
>>> from pyasn1.codec.der import encoder as der_encoder
>>> ber_encoder.encode(univ.Boolean(True))
b'\x01\x01\x01'
>>> cer_encoder.encode(univ.Boolean(True))
b'\x01\x01\xff'
>>> der_encoder.encode(univ.Boolean(True))
b'\x01\x01\xff'
>>>
</pre>
</td></tr></table>

<a name="2.2"></a>
<h4>
2.2 Decoders
</h4>

<p>
In the process of decoding, pyasn1 value objects are created and linked to
each other, based on the information contained in the substrate. Thus,
the original pyasn1 value object(s) are recovered.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder, decoder
>>> substrate = encoder.encode(univ.Boolean(True))
>>> decoder.decode(substrate)
(Boolean('True(1)'), b'')
>>>
</pre>
</td></tr></table>

<p>
As the code snippet above shows, the pyasn1 decoder accepts substrate
as an argument and returns a tuple of a pyasn1 value object (the
top-level one in the case of a constructed object) and the unprocessed part
of the input substrate.
</p>

<p>
All pyasn1 decoders handle both definite and indefinite length
encoding modes automatically; explicit switching from one mode
to another is not required.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder, decoder
>>> substrate = encoder.encode(
...     univ.OctetString('The quick brown fox jumps over the lazy dog'),
...     defMode=False,
...     maxChunkSize=8
... )
>>> decoder.decode(substrate)
(OctetString(b'The quick brown fox jumps over the lazy dog'), b'')
>>>
</pre>
</td></tr></table>

<p>
With BER/CER/DER encoding, in many situations the substrate may not contain
all the information needed for complete and accurate recovery of ASN.1
values. The most obvious cases include implicitly tagged ASN.1 types
and constrained types.
</p>

<p>
As discussed earlier in this handbook, when an ASN.1 type is implicitly
tagged, the previous outermost tag is lost and never appears in substrate.
If it is the base tag that gets lost, the decoder is unable to pick a
type-specific value decoder from its table of built-in types, and therefore
cannot recover the value part, based only on the information contained in
substrate.
The approach taken by the pyasn1 decoder is to use a prototype pyasn1 type
object (or a set of them) to <i>guide</i> the decoding process by matching
[possibly incomplete] tags recovered from substrate with those found in
prototype pyasn1 type objects (called pyasn1 specification objects in the
rest of this paper).
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import decoder
>>> decoder.decode(b'\x02\x01\x0c', asn1Spec=univ.Integer())
(Integer(12), b'')
>>>
</pre>
</td></tr></table>

<p>
The decoder neither modifies the pyasn1 specification object nor uses
its current values (if it's a pyasn1 value object), but rather uses it as
a hint for choosing the proper decoder and as a pattern for creating new
objects:
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ, tag
>>> from pyasn1.codec.ber import encoder, decoder
>>> i = univ.Integer(12345).subtype(
...     implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40)
... )
>>> substrate = encoder.encode(i)
>>> substrate
b'\x9f(\x0209'
>>> decoder.decode(substrate)
Traceback (most recent call last):
...
pyasn1.error.PyAsn1Error:
   TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec
>>> decoder.decode(substrate, asn1Spec=i)
(Integer(12345), b'')
>>>
</pre>
</td></tr></table>

<p>
Notice in the example above that an attempt to run the decoder without passing
a pyasn1 specification object fails because the recovered tag does not belong
to any of the built-in types.
</p>

<p>
Another important feature of guided decoder operation is the use of
value constraints possibly present in the pyasn1 specification object.
To illustrate, we will decode an integer both into a generic Integer
and into a constrained one.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ, constraint
>>> from pyasn1.codec.ber import encoder, decoder
>>> class DialDigit(univ.Integer):
...     subtypeSpec = constraint.ValueRangeConstraint(0,9)
>>> substrate = encoder.encode(univ.Integer(13))
>>> decoder.decode(substrate)
(Integer(13), b'')
>>> decoder.decode(substrate, asn1Spec=DialDigit())
Traceback (most recent call last):
...
pyasn1.type.error.ValueConstraintError:
  ValueRangeConstraint(0, 9) failed at: 13
>>>
</pre>
</td></tr></table>

<p>
As with encoders, to use the CER or DER decoders an application has to
explicitly import and call them; all APIs are compatible.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder as ber_encoder
>>> substrate = ber_encoder.encode(univ.OctetString('http://pyasn1.sf.net'))
>>>
>>> from pyasn1.codec.ber import decoder as ber_decoder
>>> from pyasn1.codec.cer import decoder as cer_decoder
>>> from pyasn1.codec.der import decoder as der_decoder
>>>
>>> ber_decoder.decode(substrate)
(OctetString(b'http://pyasn1.sf.net'), b'')
>>> cer_decoder.decode(substrate)
(OctetString(b'http://pyasn1.sf.net'), b'')
>>> der_decoder.decode(substrate)
(OctetString(b'http://pyasn1.sf.net'), b'')
>>>
</pre>
</td></tr></table>

<a name="2.2.1"></a>
<h4>
2.2.1 Decoding untagged types
</h4>

<p>
It has already been mentioned that ASN.1 has two "special case" types:
CHOICE and ANY. They differ from other types in how they are tagged:
unless these two are additionally tagged, neither of them has
its own tag.
Therefore these types become invisible in substrate
and cannot be recovered without passing a pyasn1 specification object to
the decoder.
</p>

<p>
To explain the issue, we will first prepare a Choice object to deal with:
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ, namedtype
>>> class CodeOrMessage(univ.Choice):
...     componentType = namedtype.NamedTypes(
...         namedtype.NamedType('code', univ.Integer()),
...         namedtype.NamedType('message', univ.OctetString())
...     )
>>>
>>> codeOrMessage = CodeOrMessage()
>>> codeOrMessage.setComponentByName('message', 'my string value')
>>> print(codeOrMessage.prettyPrint())
CodeOrMessage:
 message=b'my string value'
>>>
</pre>
</td></tr></table>

<p>
Let's now encode this Choice object and then decode its substrate
with and without a pyasn1 specification object:
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.codec.ber import encoder, decoder
>>> substrate = encoder.encode(codeOrMessage)
>>> substrate
b'\x04\x0fmy string value'
>>> encoder.encode(univ.OctetString('my string value'))
b'\x04\x0fmy string value'
>>>
>>> decoder.decode(substrate)
(OctetString(b'my string value'), b'')
>>> codeOrMessage, substrate = decoder.decode(substrate, asn1Spec=CodeOrMessage())
>>> print(codeOrMessage.prettyPrint())
CodeOrMessage:
 message=b'my string value'
>>>
</pre>
</td></tr></table>

<p>
The first thing to notice in the listing above is that the substrate produced
for our Choice value object is identical to the substrate for an OctetString
object initialized to the same value. In other words, any information about
the Choice component is absent in encoding.
</p>

<p>
Sure enough, that kind of substrate decodes into an OctetString object,
unless the original Choice type object is passed to the decoder to guide the
decoding process.
</p>

<p>
Similarly, the untagged ANY type behaves differently in the decoding phase:
when the decoder encounters an Any object in the pyasn1 specification, it
stops decoding and puts all the substrate into a new Any value object in the
form of an octet string. The application can then re-run the decoder with an
additional, more exact pyasn1 specification object to recover the contents of
the Any object.
</p>

<p>
As mentioned elsewhere in this paper, the Any type allows an incomplete
or changing ASN.1 specification to be handled gracefully by the decoder and
applications.
</p>

<p>
To illustrate how the Any type works, we first set the stage
by encoding a pyasn1 object and then putting its substrate into an Any
object.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder, decoder
>>> innerSubstrate = encoder.encode(univ.Integer(1234))
>>> innerSubstrate
b'\x02\x02\x04\xd2'
>>> any = univ.Any(innerSubstrate)
>>> any
Any(b'\x02\x02\x04\xd2')
>>> substrate = encoder.encode(any)
>>> substrate
b'\x02\x02\x04\xd2'
>>>
</pre>
</td></tr></table>

<p>
As with Choice type encoding, there are no traces of the Any type in
substrate. Obviously, the substrate we are dealing with will decode into the
inner [Integer] component, unless a pyasn1 specification is given to guide the
decoder.
Continuing the previous code:
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ
>>> from pyasn1.codec.ber import encoder, decoder

>>> decoder.decode(substrate)
(Integer(1234), b'')
>>> any, substrate = decoder.decode(substrate, asn1Spec=univ.Any())
>>> any
Any(b'\x02\x02\x04\xd2')
>>> decoder.decode(any.asOctets())
(Integer(1234), b'')
>>>
</pre>
</td></tr></table>

<p>
Both CHOICE and ANY types are widely used in practice. The reader is welcome to
take a look at the
<a href=http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt>
ASN.1 specifications of X.509 applications</a> for more information.
</p>

<a name="2.2.2"></a>
<h4>
2.2.2 Ignoring unknown types
</h4>

<p>
When dealing with a loosely specified ASN.1 structure, the receiving
end may not be aware of some types present in the substrate. It may then be
convenient to switch the decoder into a recovery mode. In this mode, the
decoder does not bail out when it hits an unknown tag but rather treats it as
an Any type.
</p>

<table bgcolor="lightgray" border=0 width=100%><TR><TD>
<pre>
>>> from pyasn1.type import univ, tag
>>> from pyasn1.codec.ber import encoder, decoder
>>> taggedInt = univ.Integer(12345).subtype(
...     implicitTag=tag.Tag(tag.tagClassContext, tag.tagFormatSimple, 40)
... )
>>> substrate = encoder.encode(taggedInt)
>>> decoder.decode(substrate)
Traceback (most recent call last):
...
pyasn1.error.PyAsn1Error: TagSet(Tag(tagClass=128, tagFormat=0, tagId=40)) not in asn1Spec
>>>
>>> decoder.decode.defaultErrorState = decoder.stDumpRawValue
>>> decoder.decode(substrate)
(Any(b'\x9f(\x0209'), b'')
>>>
</pre>
</td></tr></table>

<p>
It's also possible to configure a custom decoder to handle unknown tags
found in substrate.
This can be done by means of the <b>defaultRawDecoder</b>
attribute, which holds a reference to a type decoder object. Refer to the
source for API details.
</p>

<hr>

</td>
</tr>
</table>
</center>
</body>
</html>