Changes from 1.2 to 1.2.1
=========================
Match DOCTYPE case-blind
Extend PushbackReader's size for oddball cases like & followed by CR
Leo Sutic's 2x-4x speedup by precompiling HTMLScanner table

Changes from 1.1.3 to 1.2
=========================
Changed license to Apache 2.0
Bogon default model is now ANY, not EMPTY
Support new DOCTYPE output switches --doctype-system and --doctype-public
Support new XML declaration output switches --standalone and --version
New --norootbogons switch makes bogons children of the root
Don't resolve entity references in attribute values unless semicolon-terminated
Support character entities above U+FFFF
Add character entities from the 2007-12-14 draft of xml-entity-names
Call SAX events startPrefixMapping and endPrefixMapping to report prefixes
Clean up newline processing, shrinking html.stml considerably
Allow link elements in the body as well as the head, to avoid excess bodies
Allow tables inside paragraphs
Allow cells and forms in thead and tfoot elements without intervening tr element
The span element is no longer restartable
Support non-standard elements bgsound, blink, canvas, comment, listing,
	marquee, nobr, ruby, rbc, rtc, rb, rt, rp, wbr, xmp
In HTML mode, boolean attributes like checked are output in minimized form
Correctly handle runs of less-than characters
Suppress all but the first DOCTYPE declaration
Modify PI targets containing colons to have underscores instead
The case of element tags is now canonicalized to the schema
PI targets are no longer forced to lower case

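The "character entities above U+FFFF" entry concerns code points that need two UTF-16 chars in Java. A minimal sketch of that expansion follows; it is not TagSoup's actual code, the class and method names are hypothetical, and only standard java.lang.Character calls are used.

```java
final class SupplementaryEntity {
    // Hypothetical helper: expand a code point into the UTF-16 chars
    // that a SAX characters() callback would carry.
    static char[] expand(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        // One char for U+0000..U+FFFF, a surrogate pair above that.
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] bmp = expand(0x00E9);      // LATIN SMALL LETTER E WITH ACUTE
        char[] astral = expand(0x1D504);  // MATHEMATICAL FRAKTUR CAPITAL A
        System.out.println(bmp.length + " " + astral.length);
    }
}
```
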
Changes from 1.1.2 to 1.1.3
===========================
Allow Parser.set* methods to accept null
Allow setting the LexicalHandler feature to be null
	(in both cases, null means "use default behavior")

Changes from 1.1.1 to 1.1.2
===========================
Setting CDATAElementsFeature didn't really set CDATAElements instance variable

Changes from 1.1 to 1.1.1
=========================
Removed lexical handler calls to startCDATA/endCDATA from CDATA element handling
Added lexical handler calls to startCDATA/endCDATA from CDATA section handling
Added CDATAElementsFeature, the programmatic equivalent of the --nocdata switch

Changes from 1.0.5 to 1.1
=========================
Add Tatu Saloranta's JAXP support package

Changes from 1.0.4 to 1.0.5
===========================
Major repairs to comment scanning
Skip leading BOM
Comment out debugging code in PYXWriter
Allow &#X as well as &#x
Add net.sf.saxon to list of supported XSLT engines

Changes from 1.0.3 to 1.0.4
===========================
Certain options were mutually exclusive that should not have been
Blocked XML declaration from specifying an encoding of ""
--method=html was not doing the right thing

Changes from 1.0.2 to 1.0.3
===========================
Fixed build file to use Java target version 1.4
Fixed --version switch to print the right thing

Changes from 1.0.1 to 1.0.2
===========================
Version attribute default value removed from html element
Leading and trailing hyphens now trimmed properly from comments
Added --output-encoding switch to control encoding
If output encoding is Unicode, don't generate character references
Whitespace compressed and junk stripped from public identifiers

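The public-identifier cleanup mentioned above can be pictured with a short sketch. This is an illustration under stated assumptions, not TagSoup's code: the class name is hypothetical, "junk" is taken to mean anything outside the PubidChar set of the XML 1.0 specification, and runs of whitespace or junk are collapsed to a single space and trimmed at both ends.

```java
final class PublicIdCleaner {
    // Punctuation allowed by the XML 1.0 PubidChar production.
    private static final String LEGAL_PUNCT = "-'()+,./:=?;!*#@$_%";

    // Hypothetical helper: collapse whitespace and illegal characters
    // into single spaces, dropping them at the start and end.
    static String clean(String publicId) {
        StringBuilder sb = new StringBuilder();
        boolean pendingSpace = false;
        for (int i = 0; i < publicId.length(); i++) {
            char c = publicId.charAt(i);
            boolean legal = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9') || LEGAL_PUNCT.indexOf(c) >= 0;
            if (legal) {
                if (pendingSpace && sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(c);
                pendingSpace = false;
            } else {
                pendingSpace = true;  // whitespace or junk: remember one space
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(clean("  -//W3C//DTD  HTML\t4.01//EN "));
    }
}
```
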
Changes from 1.0 to 1.0.1
=========================
Added ignorableWhitespaceFeature and --ignorable to report ignorable whitespace
	Patch due to David Pashley
Insert spaces to break up -- in comments
Change bogus chars in publicids to spaces
--lexical switch now outputs DOCTYPE if there is one
Remove unnecessary blank line after XML declaration

Changes from 1.0rc9 to 1.0
==========================
Added feature to control restartability
	Patch due to Nikita Zhuk
Added corresponding --norestart switch in CommandLine
Made translate-colons feature actually work

Changes from 1.0rc8 to 1.0rc9
=============================
If there is a publicid but no systemid, set systemid to ""

Changes from 1.0rc7 to 1.0rc8
=============================
Fixed paper-bag bug (source didn't match binary in release)

Changes from 1.0rc6 to 1.0rc7
=============================
LexicalHandler now gets DOCTYPE information (publicid and systemid)
	Patch due to Mike Bremford
HTMLScanner now reports more useful debug output when not commented out
	Patch due to Mike Bremford
Change "<memberOfAny>" to exclude "<root>" pseudo-element
	This prevents "script" from being output as a root
The shared HTMLParser object has been eliminated

Changes from 1.0rc5 to 1.0rc6
=============================
If namespaceFeature is false, uri and localname are passed as empty strings
The namespacePrefixesFeature is now always false
Command line switch --nons no longer affects namespacePrefixesFeature
Command line switch --html now implies --nons
XMLWriter is now told directly to use the schema's URI as default namespace
XMLWriter now takes the element name from the qname if localname is empty

Changes from 1.0rc4 to 1.0rc5
=============================
The --nodefault switch now removes only default attributes, not all of them
Added --nocolons switch and translate-colons feature to convert ":"
	in names to "_" (thus suppressing namespaces other than the basic one)
The root element can be unknown without problem
Empty <script/> and <style/> tags now work
Added all standard SAX2 features to feature hashtable
Reimplemented namespacePrefixes feature (broken since 1.0rc3)

Changes from 1.0rc3 to 1.0rc4
=============================
Remove trailing ? from processing instructions (in case the input is XHTML)
Added Javadocs for all SAX standard and TagSoup-specific features and properties
Fixed termination conditions for entity/character references
Fixed EOF-pushback bug that was generating bogus &#x65535; references
Added Parser feature and --nodefaults switch to ignore default attribute values
Added support for SAX Locator
Updated AFL license to version 3.0
Scanner buffer size increases as needed, allowing large attribute values
Look for various XSLT implementations as available (still fails in raw 5.0)
Clean up handling of XML empty tags and SGML minimized end-tags
Support proper options and help message internally
Use Hashtable in CommandLine class instead of HashMap
Do proper buffering of InputStream and Reader
Clean up content model of noframes element
Removed htmlMode in XMLWriter
Added support for XSLT output options METHOD=html and OMIT_XML_DECLARATION=yes
Command line option --html sets both of these
Wrote simple validator for TSSL schemas (tssl/tssl-validator.xslt)
Removed various validity problems in html.tssl
When processing a start-tag, don't restart elements that aren't in the new
	element's content model
Remove bogus double param in tssl.xslt

Changes from 1.0rc2 to 1.0rc3
=============================
Convert CR and CRLF to LF in comments and PIs
Force empty elements to close immediately
Match close tags of CDATA elements more precisely (but case-blind)
Process switches on the command line
Man page available

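The CR and CRLF conversion listed above amounts to the usual XML line-end normalization. A minimal sketch, assuming the comment or PI text is already available as a String; the class and method names are hypothetical, and TagSoup itself may do this inside its scanner rather than on whole strings.

```java
final class NewlineNormalizer {
    // Hypothetical helper: CRLF pairs are handled first so that a
    // pair becomes one LF rather than two.
    static String normalize(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String normalized = normalize("line one\r\nline two\rline three\n");
        System.out.println(normalized.indexOf('\r'));
    }
}
```
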
Changes from 1.0rc1 to 1.0rc2
=============================
Isolated & and &# now don't crash parser
TagSoup no longer depends on /dev/stdin existing
Refactored Parser class, removing main method to new CommandLine class
Changes to content models of form, button, table, and tr elements in html.tssl
'</scr' + 'ipt>' in a script element no longer terminates it
Introduced "uncloseability" of form and table elements
"pyxin" property specifies that input is in PYX format
Correctly cope with unexpected characters around colons, also with multiple colons
Correctly output comments with "--" in them (by adding a space)

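XML forbids "--" inside a comment, which is why the last entry above adds a space. One way to do that is sketched here; the class and method names are hypothetical and this is not taken from XMLWriter. A plain replace of "--" with "- -" would leave "--" behind in runs of three or more hyphens, so the sketch checks the previously emitted character instead. It does not deal with a comment that ends in a hyphen.

```java
final class CommentHyphens {
    // Hypothetical helper: insert a space before any hyphen that
    // directly follows another hyphen in the output so far.
    static String breakUp(String comment) {
        StringBuilder sb = new StringBuilder(comment.length() + 4);
        for (int i = 0; i < comment.length(); i++) {
            char c = comment.charAt(i);
            if (c == '-' && sb.length() > 0 && sb.charAt(sb.length() - 1) == '-') {
                sb.append(' ');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("<!--" + breakUp(" a -- b --- c ") + "-->");
    }
}
```
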
Changes from 0.10.2 to 1.0rc1
=============================
Script can now appear anywhere
Switch -nocdata correctly implemented
Eliminated useless M_n constants in Schema
Introduced <memberOfAny> and <isRoot> as alternatives to
	<memberOf> in TSSL
Allow prefixes in element names
Attributes are now normalized
Expanded public API for Element and ElementType
Javadoc improved

Changes from 0.10.1 to 0.10.2
=============================
Removed misfeature whereby > terminated a tag even inside quotes
Added licensing language to XSLT scripts, RELAX NG schemas
Removed long-standing mishandling of entity references in attributes
Cleaned up logic for converting junky strings to proper XML Names
Correctly handle empty tag that has no whitespace or attributes
Restore correct 0.9.3 handling of an apparent end-tag in a CDATA element
Added script element to content model of head element

Changes from 0.9.7 to 0.10.1 (there is no 0.10.0):
==================================================
Convert to XSLT configuration exclusively;
	Perl code and tab-separated tables are gone
Remove xmlns:* attributes
Append "_" to attribute names ending in ":"
Don't prepend "_" to an attribute name starting in "_"
Handle namespace prefixes in attributes:
	"xml" prefix is handled correctly
	other prefixes are mapped to "urn:x-prefix:foo"
Ignore XML declarations
-Dnocdata=true turns off F_CDATA on script and style elements
Fixed off-by-one errors in character references that made them uninterpreted
Start-tags ending in a minimized attribute are no longer being dropped
XML empty tags are now supported (though slashes are still allowed in
	unquoted attribute values)

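The attribute-name entries above can be read as two small rules, sketched here under assumptions. The class and method names are hypothetical, and the sketch assumes that an unprefixed attribute gets the empty namespace name and that "handled correctly" for the "xml" prefix means mapping it to the standard XML namespace URI. It is not the Parser's actual logic.

```java
final class AttributeNames {
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    // A name ending in ":" has no local part, so pad it with "_".
    static String fixTrailingColon(String name) {
        return name.endsWith(":") ? name + "_" : name;
    }

    // Hypothetical mapping from an attribute's prefix to a namespace name.
    static String namespaceFor(String qname) {
        int colon = qname.indexOf(':');
        if (colon <= 0) {
            return "";  // assumption: no prefix means no namespace
        }
        String prefix = qname.substring(0, colon);
        return prefix.equals("xml") ? XML_NS : "urn:x-prefix:" + prefix;
    }

    public static void main(String[] args) {
        System.out.println(fixTrailingColon("foo:"));
        System.out.println(namespaceFor("foo:bar"));
        System.out.println(namespaceFor("xml:lang"));
    }
}
```
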
Changes from 0.9.6 to 0.9.7:
============================
Upgraded AFL to version 2.1
Passed through newlines in character content (very old bug)

Changes from 0.9.5 to 0.9.6:
============================
Script element can appear directly in body
">" terminates a start-tag even inside a quoted attribute,
	to protect against unbalanced quotes
"_" is prepended to attributes that don't begin with a letter
Remove "xmlns" attributes from the input
All standard features can now be set
	(although there is no effect from doing so)
New "bogons-empty" feature can be set to false to give bogons
	a content model of ANY rather than EMPTY;
	-Dany switch sets this feature to false
TSSL now has an explicit group element to declare an element group
STML is a new XML format for modeling state-table changes
License updated to AFL 2.1

Changes from 0.9.4 to 0.9.5:
============================
S in the statetable now means \r and \n and \t as well as space
	(as was always intended; brain fart!)
Ins and del elements are now allowed everywhere
TSSL now correctly supports attributes that are legal on all elements

Changes from 0.9.3 to 0.9.4:
============================
Fixed paper-bag bug that revealed attribute type BOOLEAN to applications.
Obsolete ABSTRACT removed in favor of README.
Improved implementation of CDATA restart after bogus end-tag.
Allowed hyphen, underscore, and period in names as well as colon.
First cut at TagSoup Schema Language -- doesn't do anything yet.
Support CDATA sections on input.
Don't generate built-in entities within CDATA elements.

Changes from 0.9.2 to 0.9.3:
============================
Convenience main program "tagsoup" in bin directory.
Begin to integrate tests.
Introduced BOOLEAN type (currently just converted to NMTOKEN).
Features that actually work are now named constants in Parser.
Double root elements are really gone now.
ID attributes weren't being removed from restarted elements.
Fixed a bug that made unknown elements disappear in some cases.
Parser is now safely reusable.
PYXWriter and XMLWriter now implement LexicalHandler.
Parser reports comments, startCDATA, and endCDATA events to a LexicalHandler.
ScanHandler methods now throw only SAXException, not also IOException.
-Dlexical=true switch sets the ContentHandler as a LexicalHandler as well
	(XMLWriter prints comments, ignores CDATA sections; PYXWriter ignores all).
-Dreuse=true switch reuses a single Parser object (no great speed gain).
We now disallow an a element as the child of another a element.
An empty input is now treated as zero-length character content.
HTMLWriter is gone in favor of an extended XMLWriter with get/setHTMLMode methods.
CDATA elements only terminate with matching end-tags (thanks to Sebastien Bardoux).

Changes from 0.9.1 to 0.9.2:
============================
No longer inserts bogus ; after unknown entity reference without ;.
Consecutive entity references now work correctly.
Setting namespaces and namespace-prefixes methods now works.
-Dnons=true option turns off namespace and prefix.
New feature "http://www.ccil.org/~cowan/tagsoup/features/ignore-bogons"
	suppresses unknown start-tags (any end-tag will be automatically ignored).
-Dnobogons=true option turns ignore-bogons on.
Suppress unknown and/or empty initial start-tag always
	(prevents double root element).
Schema now allows style as an inline element, like script.
Schema now allows tr as a child of table to avoid problems with embedded tables.
Clear Parser instance variables to make Parsers properly reusable.

Changes from 0.9 to 0.9.1:
==========================
Incorporated patch for -jar support by Joseph Walton.
Incorporated patch for Megginson XMLWriter support by Joseph Walton.
Changed existing XMLWriter to HTMLWriter.
Rewrote Parser.main for better features, removed Tester class.
-Dnewline=true removed, now implied by -DHTML=true.
-Dfiles=true now used to generate separate outputs (old Tester behavior)
	with extension xhtml (removing any old extension).
Fixed nasty bug in HTMLScanner that was failing to fix unusual entities.
Don't attempt to smash whitespace to spaces any more.

Changes from 0.8 to 0.9:
========================
Ant-ified by Martin Rademacher.
Don't suppress colons in element names.
Entity problems fixed (I hope).
Can now set namespace and namespace-prefixes features (without effect).
Properly templatize HTMLModels.java.
Attributes are no longer in the HTML namespace.