Changes from 1.2 to 1.2.1
=========================
Match DOCTYPE case-blind
Extend PushbackReader's size for oddball cases like & followed by CR
Leo Sutic's 2x-4x speedup by precompiling HTMLScanner table

Changes from 1.1.3 to 1.2
=========================
Changed license to Apache 2.0
Bogon default model is now ANY, not EMPTY
Support new DOCTYPE output switches --doctype-system and --doctype-public
Support new XML declaration output switches --standalone and --version
New --norootbogons switch makes bogons children of the root
Don't resolve entity references in attribute values unless semicolon-terminated
Support character entities above U+FFFF
Add character entities from the 2007-12-14 draft of xml-entity-names
Call SAX events startPrefixMapping and endPrefixMapping to report prefixes
Clean up newline processing, shrinking html.stml considerably
Allow link elements in the body as well as the head, to avoid excess bodies
Allow tables inside paragraphs
Allow cells and forms in thead and tfoot elements without intervening tr element
The span element is no longer restartable
Support non-standard elements bgsound, blink, canvas, comment, listing,
	marquee, nobr, ruby, rbc, rtc, rb, rt, rp, wbr, xmp
In HTML mode, boolean attributes like checked are output in minimized form
Correctly handle runs of less-than characters
Suppress all but the first DOCTYPE declaration
Modify PI targets containing colons to have underscores instead
The case of element tags is now canonicalized to the schema
PI targets are no longer forced to lower case

Changes from 1.1.2 to 1.1.3
===========================
Allow Parser.set* methods to accept null
Allow setting the LexicalHandler feature to be null
	(null in both cases means "use default behavior")

Changes from 1.1.1 to 1.1.2
===========================
Setting CDATAElementsFeature didn't really set the CDATAElements instance variable

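The 1.2 entries "Don't resolve entity references in attribute values unless semicolon-terminated" and "Support character entities above U+FFFF" together describe a rule that is easy to state in code. Below is a minimal Java sketch of that rule, not TagSoup's implementation: the class AttrEntitySketch, its methods, and the four-entry ENTITIES table are hypothetical, and any reference that is not terminated or not known is simply copied through literally.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only, not TagSoup's code: expand entity and character
// references in an attribute value only when they end with ';'.
public class AttrEntitySketch {

    // Hypothetical, tiny entity table; a real parser would use its full schema table.
    static final Map<String, Integer> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", 0x26);
        ENTITIES.put("lt", 0x3C);
        ENTITIES.put("copy", 0xA9);
        ENTITIES.put("Afr", 0x1D504); // a named entity above U+FFFF
    }

    static String resolve(String value) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            if (c == '&') {
                int semi = value.indexOf(';', i + 1);
                // Only a non-empty name followed by ';' is a candidate reference.
                Integer cp = semi > i + 1 ? lookup(value.substring(i + 1, semi)) : null;
                if (cp != null) {
                    out.appendCodePoint(cp); // emits a surrogate pair above U+FFFF
                    i = semi + 1;
                    continue;
                }
            }
            out.append(c); // bare '&', unterminated or unknown reference: copy literally
            i++;
        }
        return out.toString();
    }

    // Returns the code point for "#123", "#x7B", or a known name; null otherwise.
    static Integer lookup(String name) {
        try {
            if (name.matches("#[xX][0-9a-fA-F]+")) {
                return checked(Integer.parseInt(name.substring(2), 16));
            }
            if (name.matches("#[0-9]+")) {
                return checked(Integer.parseInt(name.substring(1)));
            }
        } catch (NumberFormatException e) {
            return null; // too many digits to fit in an int
        }
        return ENTITIES.get(name);
    }

    static Integer checked(int cp) {
        return Character.isValidCodePoint(cp) ? Integer.valueOf(cp) : null;
    }

    public static void main(String[] args) {
        System.out.println(resolve("a &amp; b"));          // a & b
        System.out.println(resolve("?x=1&copy=2"));        // unchanged: no ';'
        System.out.println(resolve("&#x1D504;").length()); // 2 UTF-16 units
    }
}
```

Copying unresolvable references through unchanged is what lets a query string such as ?x=1&copy=2 survive intact instead of gaining a stray copyright sign.
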
Changes from 1.1 to 1.1.1
=========================
Removed lexical handler calls to startCDATA/endCDATA from CDATA element handling
Added lexical handler calls to startCDATA/endCDATA from CDATA section handling
Added CDATAElementsFeature, the programmatic equivalent of the --nocdata switch

Changes from 1.0.5 to 1.1
=========================
Add Tatu Saloranta's JAXP support package

Changes from 1.0.4 to 1.0.5
===========================
Major repairs to comment scanning
Skip leading BOM
Comment out debugging code in PYXWriter
Allow &#X as well as &#x
Add net.sf.saxon to list of supported XSLT engines

Changes from 1.0.3 to 1.0.4
===========================
Certain options were mutually exclusive when they should not have been
Blocked XML declaration from specifying an encoding of ""
--method=html was not doing the right thing

Changes from 1.0.2 to 1.0.3
===========================
Fixed build file to use Java target version 1.4
Fixed --version switch to print the right thing

Changes from 1.0.1 to 1.0.2
===========================
Version attribute default value removed from html element
Leading and trailing hyphens now trimmed properly from comments
Added --output-encoding switch to control encoding
If output encoding is Unicode, don't generate character references
Whitespace compressed and junk stripped from public identifiers

Changes from 1.0 to 1.0.1
=========================
Added ignorableWhitespaceFeature and --ignorable to report ignorable whitespace
	Patch due to David Pashley
Insert spaces to break up -- in comments
Change bogus chars in publicids to spaces
--lexical switch now outputs DOCTYPE if there is one
Remove unnecessary blank line after XML declaration

Changes from 1.0rc9 to 1.0
==========================
Added feature to control restartability
	Patch due to Nikita Zhuk
Added corresponding --norestart switch in CommandLine
Made translate-colons feature actually work

Changes from 1.0rc8 to 1.0rc9
=============================
If there is a publicid but no systemid, set systemid to ""

Changes from 1.0rc7 to 1.0rc8
=============================
Fixed paper-bag bug (source didn't match binary in release)

Changes from 1.0rc6 to 1.0rc7
=============================
LexicalHandler now gets DOCTYPE information (publicid and systemid)
	Patch due to Mike Bremford
HTMLScanner now reports more useful debug output when not commented out
	Patch due to Mike Bremford
Change "<memberOfAny>" to exclude "<root>" pseudo-element
	This prevents "script" from being output as a root
The shared HTMLParser object has been eliminated

Changes from 1.0rc5 to 1.0rc6
=============================
If namespaceFeature is false, uri and localname are passed as empty strings
The namespacePrefixesFeature is now always false
Command line switch --nons no longer affects namespacePrefixesFeature
Command line switch --html now implies --nons
XMLWriter is now told directly to use the schema's URI as default namespace
XMLWriter now takes the element name from the qname if localname is empty

Changes from 1.0rc4 to 1.0rc5
=============================
The --nodefault switch now removes only default attributes, not all of them
Added --nocolons switch and translate-colons feature to convert ":"
	in names to "_" (thus suppressing namespaces other than the basic one)
The root element can be unknown without problem
Empty <script/> and <style/> tags now work
Added all standard SAX2 features to feature hashtable
Reimplemented namespacePrefixes feature (broken since 1.0rc3)

Changes from 1.0rc3 to 1.0rc4
=============================
Remove trailing ? from processing instructions (in case the input is XHTML)
Added Javadocs for all SAX standard and TagSoup-specific features and properties
Fixed termination conditions for entity/character references
Fixed EOF-pushback bug that was generating bogus &#65535; references
Added Parser feature and --nodefaults switch to ignore default attribute values
Added support for SAX Locator
Updated AFL license to version 3.0
Scanner buffer size increases as needed, allowing large attribute values
Look for various XSLT implementations as available (still fails in raw 5.0)
Clean up handling of XML empty tags and SGML minimized end-tags
Support proper options and help message internally
Use Hashtable in CommandLine class instead of HashMap
Do proper buffering of InputStream and Reader
Clean up content model of noframes element
Removed htmlMode in XMLWriter
Added support for XSLT output options METHOD=html and OMIT_XML_DECLARATION=yes
	Command line option --html sets both of these
Wrote simple validator for TSSL schemas (tssl/tssl-validator.xslt)
Removed various validity problems in html.tssl
When processing a start-tag, don't restart elements that aren't in the new
	element's content model
Remove bogus double param in tssl.xslt

Changes from 1.0rc2 to 1.0rc3
=============================
Convert CR and CRLF to LF in comments and PIs
Force empty elements to close immediately
Match close tags of CDATA elements more precisely (but case-blind)
Process switches on the command line
Man page available

Changes from 1.0rc1 to 1.0rc2
=============================
Isolated & and &# now don't crash parser
TagSoup no longer depends on /dev/stdin existing
Refactored Parser class, removing main method to new CommandLine class
Changes to content models of form, button, table, and tr elements in html.tssl
'</scr' + 'ipt>' in a script element no longer terminates it
Introduced "uncloseability" of form and table elements
"pyxin" property specifies that input is in PYX format
Correctly cope with unexpected characters around colons, also with multiple colons
Correctly output comments with "--" in them (by adding a space)

Changes from 0.10.2 to 1.0rc1
=============================
Script can now appear anywhere
Switch -nocdata correctly implemented
Eliminated useless M_n constants in Schema
Introduced <memberOfAny> and <isRoot> as alternatives to
	<memberOf> in TSSL
Allow prefixes in element names
Attributes are now normalized
Expanded public API for Element and ElementType
Javadoc improved

Changes from 0.10.1 to 0.10.2
=============================
Removed misfeature whereby > terminated a tag even inside quotes
Added licensing language to XSLT scripts, RELAX NG schemas
Removed long-standing mishandling of entity references in attributes
Cleaned up logic for converting junky strings to proper XML Names
Correctly handle empty tag that has no whitespace or attributes
Restore correct 0.9.3 handling of an apparent end-tag in a CDATA element
Added script element to content model of head element

Changes from 0.9.7 to 0.10.1 (there is no 0.10.0):
==================================================
Convert to XSLT configuration exclusively;
	Perl code and tab-separated tables are gone
Remove xmlns:* attributes
Append "_" to attribute names ending in ":"
Don't prepend "_" to an attribute name starting in "_"
Handle namespace prefixes in attributes:
	"xml" prefix is handled correctly
	other prefixes are mapped to "urn:x-prefix:foo"
Ignore XML declarations
-Dnocdata=true turns off F_CDATA on script and style elements
Fixed off-by-one errors in character references that made them uninterpreted
Start-tags ending in a minimized attribute are no longer being dropped
XML empty tags are now supported (though slashes are still allowed in
	unquoted attribute values)

Changes from 0.9.6 to 0.9.7:
============================
Upgraded AFL to version 2.1
Passed through newlines in character content (very old bug)

Changes from 0.9.5 to 0.9.6:
============================
Script element can appear directly in body
">" terminates a start-tag even inside a quoted attribute,
	to protect against unbalanced quotes
"_" is prepended to attributes that don't begin with a letter
Remove "xmlns" attributes from the input
All standard features can now be set
	(although there is no effect from doing so)
New "bogons-empty" feature can be set to false to give bogons
	content model of ANY rather than EMPTY;
	-Dany switch sets this feature to false
TSSL now has an explicit group element to declare an element group
STML is a new XML format for modeling state-table changes
License updated to AFL 2.1

Changes from 0.9.4 to 0.9.5:
============================
S in the statetable now means \r and \n and \t as well as space
	(as was always intended; brain fart!)
Ins and del elements are now allowed everywhere
TSSL now correctly supports attributes that are legal on all elements

Changes from 0.9.3 to 0.9.4:
============================
Fixed paper-bag bug that revealed attribute type BOOLEAN to applications.
Obsolete ABSTRACT removed in favor of README.
Improved implementation of CDATA restart after bogus end-tag.
Allowed hyphen, underscore, and period in names as well as colon.
First cut at TagSoup Schema Language -- doesn't do anything yet.
Support CDATA sections on input.
Don't generate built-in entities within CDATA elements.

Changes from 0.9.2 to 0.9.3:
============================
Convenience main program "tagsoup" in bin directory.
Begin to integrate tests.
Introduced BOOLEAN type (currently just converted to NMTOKEN).
Features that actually work are now named constants in Parser.
Double root elements are really gone now.
ID attributes weren't being removed from restarted elements.
Fixed a bug that made unknown elements disappear in some cases.
Parser is now safely reusable.
PYXWriter and XMLWriter now implement LexicalHandler.
Parser reports comments, startCDATA, and endCDATA events to a LexicalHandler.
ScanHandler methods now throw only SAXException, not also IOException.
-Dlexical=true switch sets the ContentHandler as a LexicalHandler as well
	(XMLWriter prints comments, ignores CDATA sections; PYXWriter ignores all).
-Dreuse=true switch reuses a single Parser object (no great speed gain).
We now disallow an a element as the child of another a element.
An empty input is now treated as zero-length character content.
HTMLWriter is gone in favor of an extended XMLWriter with get/setHTMLMode methods.
CDATA elements only terminate with matching end-tags (thanks to Sebastien Bardoux).

Changes from 0.9.1 to 0.9.2:
============================
No longer inserts bogus ; after unknown entity reference without ;.
Consecutive entity references now work correctly.
Setting namespaces and namespace-prefixes features now works.
-Dnons=true option turns off namespace and prefix.
New feature "http://www.ccil.org/~cowan/tagsoup/features/ignore-bogons"
	suppresses unknown start-tags (any end-tag will be automatically ignored).
-Dnobogons=true option turns ignore-bogons on.
Suppress unknown and/or empty initial start-tag always
	(prevents double root element).
Schema now allows style as an inline element, like script.
Schema now allows tr as a child of table to avoid problems with embedded tables.
Clear Parser instance variables to make Parsers properly reusable.
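
The 0.9.3 entry "CDATA elements only terminate with matching end-tags", refined in 1.0rc3 to "Match close tags of CDATA elements more precisely (but case-blind)", describes how the end of a script or style element is found. Below is a minimal Java sketch of such a matcher, including the 1.0rc2 '</scr' + 'ipt>' case; it is not TagSoup's scanner, the names CdataEndSketch and findEndTag are hypothetical, and treating a case-blind name followed by '>' or whitespace as a match is an assumption.

```java
// Illustrative sketch only, not TagSoup's scanner: locate the end-tag that
// closes a CDATA element such as script or style.
public class CdataEndSketch {

    // Returns the offset of the "</" that closes elementName, or -1 if none.
    // Assumption: a case-blind name match followed by '>' or whitespace counts.
    static int findEndTag(String content, String elementName) {
        int from = 0;
        while (true) {
            int lt = content.indexOf("</", from);
            if (lt < 0) {
                return -1;
            }
            int nameEnd = lt + 2 + elementName.length();
            if (nameEnd < content.length()
                    && content.regionMatches(true, lt + 2, elementName, 0, elementName.length())
                    && isTagBoundary(content.charAt(nameEnd))) {
                return lt;
            }
            from = lt + 2; // some other "</", e.g. inside a JavaScript string
        }
    }

    static boolean isTagBoundary(char c) {
        return c == '>' || Character.isWhitespace(c);
    }

    public static void main(String[] args) {
        String js = "var s = '</scr' + 'ipt>'; document.write('</b>');</SCRIPT>";
        int end = findEndTag(js, "script");
        System.out.println(js.substring(0, end)); // the script body, minus its end-tag
    }
}
```

Requiring a boundary character after the name keeps an end-tag like </scripts> from closing a script element early.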

Changes from 0.9 to 0.9.1:
==========================
Incorporated patch for -jar support by Joseph Walton.
Incorporated patch for Megginson XMLWriter support by Joseph Walton.
Changed existing XMLWriter to HTMLWriter.
Rewrote Parsermain for better features, removed Tester class.
-Dnewline=true removed, now implied by -DHTML=true.
-Dfiles=true now used to generate separate outputs (old Tester behavior)
	with extension xhtml (removing any old extension).
Fixed nasty bug in HTMLScanner that was failing to fix unusual entities.
Don't attempt to smash whitespace to spaces any more.

Changes from 0.8 to 0.9:
========================
Ant-ified by Martin Rademacher.
Don't suppress colons in element names.
Entity problems fixed (I hope).
Can now set namespace and namespace-prefixes features (without effect).
Properly templatize HTMLModels.java.
Attributes are no longer in the HTML namespace.
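
Several entries above govern how comment text is written out: 1.0rc2 and 1.0.1 break up "--" with spaces, 1.0.2 trims leading and trailing hyphens, and 1.0rc3 converts CR and CRLF to LF. Below is a minimal Java sketch that combines these rules; it is not TagSoup's XMLWriter code, the names CommentSketch and sanitize are hypothetical, and the order of the three steps is an assumption.

```java
// Illustrative sketch only, not TagSoup's XMLWriter: clean comment text so that
// it can be written out safely as an XML comment.
public class CommentSketch {

    static String sanitize(String text) {
        // 1. Convert CRLF and lone CR to LF.
        String s = text.replace("\r\n", "\n").replace('\r', '\n');

        // 2. Trim leading and trailing hyphens (a trailing '-' would produce "--->").
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) == '-') {
            start++;
        }
        while (end > start && s.charAt(end - 1) == '-') {
            end--;
        }
        s = s.substring(start, end);

        // 3. Break up every "--" by inserting a space between the hyphens.
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '-' && i > 0 && s.charAt(i - 1) == '-') {
                out.append(' ');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("<!--" + sanitize("--a---b\r\nc-") + "-->");
    }
}
```

Trimming before splitting means a comment such as --x--y-- comes out as x- -y, with no dangling hyphen next to the closing -->.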