Additions
---------

More of the jQuery API: nextUntil?

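For reference, jQuery's .nextUntil() returns the siblings that follow an element, up to but not including the first one that matches a selector. Below is a minimal sketch of the same idea in Python. next_until() is a hypothetical helper, not part of Beautiful Soup, and it takes a predicate instead of a CSS selector to keep it short. It only relies on the next_siblings generator that Beautiful Soup elements already expose.

```python
from itertools import takewhile
from typing import Any, Callable, List, Optional


def next_until(element: Any, stop: Optional[Callable[[Any], bool]] = None) -> List[Any]:
    """Hypothetical helper mirroring jQuery's .nextUntil().

    Collects the siblings that follow `element`, stopping before the
    first sibling for which `stop` returns True. `element` only needs a
    `next_siblings` iterable. With no `stop`, every following sibling is
    returned, as jQuery does when called without a selector.
    """
    siblings = element.next_siblings
    if stop is None:
        return list(siblings)
    # takewhile yields siblings until the predicate first matches and
    # never includes the matching sibling itself.
    return list(takewhile(lambda sib: not stop(sib), siblings))
```

With Beautiful Soup, a predicate such as lambda sib: getattr(sib, "name", None) == "h2" would collect everything between one heading and the next. Unlike jQuery, next_siblings yields strings as well as tags, so a real implementation would have to decide whether to skip them.
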
Optimizations
-------------

The html5lib tree builder doesn't use the standard tree-building API,
which worries me and has resulted in a number of bugs.

markup_attr_map can be optimized since it's always a map now.

Upon encountering UTF-16LE data or some other uncommon encoding of
Unicode, UnicodeDammit will convert the data to Unicode, then encode
it as UTF-8. This is wasteful, because the UTF-8 bytes will just get
decoded back to Unicode.

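The round trip described above can be sketched with nothing but Python's built-in codecs. The two functions below are hypothetical and do not mirror UnicodeDammit's actual code; they only show that the intermediate UTF-8 step reproduces the string the first decode already produced, at the cost of an extra encode and decode over the whole document.

```python
def via_utf8(data: bytes, encoding: str) -> str:
    """The wasteful path: decode, re-encode as UTF-8, then decode again."""
    text = data.decode(encoding)        # bytes in the detected encoding -> str
    utf8_bytes = text.encode("utf-8")   # str -> UTF-8 bytes (extra work)
    return utf8_bytes.decode("utf-8")   # UTF-8 bytes -> the same str again


def direct(data: bytes, encoding: str) -> str:
    """The shortcut: keep the result of the first decode."""
    return data.decode(encoding)


# Example input: a short string serialized as UTF-16LE.
sample = "caf\u00e9 \u2603".encode("utf-16-le")
same = via_utf8(sample, "utf-16-le") == direct(sample, "utf-16-le")
```

Because a successful decode never yields lone surrogates, the UTF-8 step is always lossless here, which is exactly why it adds nothing.
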
CDATA
-----

lxml's etree.XMLParser has a strip_cdata argument that, when set to
False, should allow Beautiful Soup to preserve CDATA sections instead
of treating them as text. Except it doesn't. (The same argument is
present on lxml's etree.HTMLParser, and it does nothing there either.)

Currently, html5lib converts CDATA sections into comments. An
as-yet-unreleased version of html5lib changes the parser's handling of
CDATA sections to allow CDATA sections in tags like <svg> and
<math>. The HTML5TreeBuilder will need to be updated to create CData
objects instead of Comment objects in this situation.

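Under the HTML5 parsing rules, a CDATA section outside foreign content becomes a bogus comment, so <![CDATA[x]]> reaches the tree builder as a comment whose text is [CDATA[x]]. The sketch below shows one way a builder might choose between the two node types. classify_comment() is a hypothetical hook, and the in_foreign_content flag (true inside elements like <svg> or <math>) is an assumption: the builder would have to track it itself. The returned labels stand in for the CData and Comment classes in bs4.element, so the sketch runs without html5lib or bs4 installed.

```python
from typing import Tuple

CDATA_PREFIX = "[CDATA["
CDATA_SUFFIX = "]]"


def classify_comment(text: str, in_foreign_content: bool) -> Tuple[str, str]:
    """Hypothetical hook: decide whether comment text from the parser
    should become a CData node or stay a Comment.

    Returns a (node_type, content) pair, where node_type is "CData"
    or "Comment".
    """
    looks_like_cdata = text.startswith(CDATA_PREFIX) and text.endswith(CDATA_SUFFIX)
    if in_foreign_content and looks_like_cdata:
        # Strip the markers left behind by the bogus-comment handling.
        return "CData", text[len(CDATA_PREFIX):-len(CDATA_SUFFIX)]
    # In ordinary HTML content, HTML5 treats CDATA as a comment; keep it.
    return "Comment", text
```

Keeping the HTML-content case as a Comment matches the note above, which only asks for CData objects inside tags like <svg> and <math>.
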