page.title=Parsing XML Data
parent.title=Performing Network Operations
parent.link=index.html

trainingnavtop=true

previous.title=Managing Network Usage
previous.link=managing.html

@jd:body

<div id="tb-wrapper">
<div id="tb">

<h2>This lesson teaches you to</h2>
<ol>
  <li><a href="#choose">Choose a Parser</a></li>
  <li><a href="#analyze">Analyze the Feed</a></li>
  <li><a href="#instantiate">Instantiate the Parser</a></li>
  <li><a href="#read">Read the Feed</a></li>
  <li><a href="#parse">Parse XML</a></li>
  <li><a href="#skip">Skip Tags You Don't Care About</a></li>
  <li><a href="#consume">Consume XML Data</a></li>
</ol>

<h2>You should also read</h2>
<ul>
  <li><a href="{@docRoot}guide/webapps/index.html">Web Apps Overview</a></li>
</ul>

<h2>Try it out</h2>

<div class="download-box">
  <a href="{@docRoot}shareables/training/NetworkUsage.zip"
class="button">Download the sample</a>
 <p class="filename">NetworkUsage.zip</p>
</div>

</div>
</div>

<p>Extensible Markup Language (XML) is a set of rules for encoding documents in
machine-readable form. XML is a popular format for sharing data on the internet.
Websites that frequently update their content, such as news sites or blogs,
often provide an XML feed so that external programs can keep abreast of content
changes. Downloading and parsing XML data is a common task for network-connected
apps. This lesson explains how to parse XML documents and use their data.</p>

<h2 id="choose">Choose a Parser</h2>

<p>We recommend {@link org.xmlpull.v1.XmlPullParser}, which is an efficient and
maintainable way to parse XML on Android. Historically, Android has had two
implementations of this interface:</p>

<ul>
  <li><a href="http://kxml.sourceforge.net/"><code>KXmlParser</code></a>
  via {@link org.xmlpull.v1.XmlPullParserFactory#newPullParser XmlPullParserFactory.newPullParser()}.
  </li>
  <li><code>ExpatPullParser</code>, via
  {@link android.util.Xml#newPullParser Xml.newPullParser()}.
  </li>
</ul>

<p>Either choice is fine. The
example in this section uses <code>ExpatPullParser</code>, via
{@link android.util.Xml#newPullParser Xml.newPullParser()}.</p>

<h2 id="analyze">Analyze the Feed</h2>

<p>The first step in parsing a feed is to decide which fields you're interested in.
The parser extracts data for those fields and ignores the rest.</p>

<p>Here is an excerpt from the feed that's being parsed in the sample app. Each
post to <a href="http://stackoverflow.com">StackOverflow.com</a> appears in the
feed as an <code>entry</code> tag that contains several nested tags:</p>

<pre>&lt;?xml version="1.0" encoding="utf-8"?&gt;
&lt;feed xmlns="http://www.w3.org/2005/Atom" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" ...&gt;
&lt;title type="text"&gt;newest questions tagged android - Stack Overflow&lt;/title&gt;
...
    &lt;entry&gt;
    ...
    &lt;/entry&gt;
    &lt;entry&gt;
        &lt;id&gt;http://stackoverflow.com/q/9439999&lt;/id&gt;
        &lt;re:rank scheme="http://stackoverflow.com"&gt;0&lt;/re:rank&gt;
        &lt;title type="text"&gt;Where is my data file?&lt;/title&gt;
        &lt;category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest/tags" term="android"/&gt;
        &lt;category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest/tags" term="file"/&gt;
        &lt;author&gt;
            &lt;name&gt;cliff2310&lt;/name&gt;
            &lt;uri&gt;http://stackoverflow.com/users/1128925&lt;/uri&gt;
        &lt;/author&gt;
        &lt;link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" /&gt;
        &lt;published&gt;2012-02-25T00:30:54Z&lt;/published&gt;
        &lt;updated&gt;2012-02-25T00:30:54Z&lt;/updated&gt;
        &lt;summary type="html"&gt;
            &lt;p&gt;I have an Application that requires a data file...&lt;/p&gt;

        &lt;/summary&gt;
    &lt;/entry&gt;
    &lt;entry&gt;
    ...
    &lt;/entry&gt;
...
&lt;/feed&gt;</pre>

<p>The sample app
extracts data for the <code>entry</code> tag and its nested tags
<code>title</code>, <code>link</code>, and <code>summary</code>.</p>


<h2 id="instantiate">Instantiate the Parser</h2>

<p>The next step is to
instantiate a parser and kick off the parsing process. In this snippet, a parser
is initialized to not process namespaces, and to use the provided {@link
java.io.InputStream} as its input. It starts the parsing process with a call to
{@link org.xmlpull.v1.XmlPullParser#nextTag() nextTag()} and invokes the
<code>readFeed()</code> method, which extracts and processes the data the app is
interested in:</p>

<pre>import android.util.Xml;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class StackOverflowXmlParser {
    // We don't use namespaces
    private static final String ns = null;

    public List&lt;Entry&gt; parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            parser.setInput(in, null);
            parser.nextTag();
            return readFeed(parser);
        } finally {
            in.close();
        }
    }
 ...
}</pre>

<h2 id="read">Read the Feed</h2>

<p>The <code>readFeed()</code> method does the actual work of processing the
feed. It looks for elements tagged "entry" as a starting point for recursively
processing the feed. If a tag isn't an {@code entry} tag, it skips it. Once the whole
feed has been recursively processed, <code>readFeed()</code> returns a {@link
java.util.List} containing the entries (including nested data members) it
extracted from the feed.
This {@link java.util.List} is then returned by the
parser.</p>

<pre>
private List&lt;Entry&gt; readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List&lt;Entry&gt; entries = new ArrayList&lt;Entry&gt;();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}</pre>


<h2 id="parse">Parse XML</h2>


<p>The steps for parsing an XML feed are as follows:</p>
<ol>

  <li>As described in <a href="#analyze">Analyze the Feed</a>, identify the tags you want to include in your app. This
  example extracts data for the <code>entry</code> tag and its nested tags
  <code>title</code>, <code>link</code>, and <code>summary</code>.</li>

<li>Create the following methods:

<ul>

<li>A "read" method for each tag you're interested in. For example,
<code>readEntry()</code>, <code>readTitle()</code>, and so on. The parser reads
tags from the input stream. When it encounters a tag named <code>entry</code>,
<code>title</code>,
<code>link</code>, or <code>summary</code>, it calls the appropriate method
for that tag. Otherwise, it skips the tag.
</li>

<li>Methods to extract data for each different type of tag and to advance the
parser to the next tag. For example:
<ul>

<li>For the <code>title</code> and <code>summary</code> tags, the parser calls
<code>readText()</code>.
This method extracts data for these tags by calling
<code>parser.getText()</code>.</li>

<li>For the <code>link</code> tag, the parser first checks the link's
<code>rel</code> attribute to determine whether the link is the kind
it's interested in (<code>alternate</code>). If it is, the parser uses
<code>parser.getAttributeValue()</code> to extract the link's <code>href</code> value.</li>

<li>For the <code>entry</code> tag, the parser calls <code>readEntry()</code>.
This method parses the entry's nested tags and returns an <code>Entry</code>
object with the data members <code>title</code>, <code>link</code>, and
<code>summary</code>.</li>

</ul>
</li>
<li>A helper <code>skip()</code> method that skips a tag the parser isn't interested in, along with everything nested inside it. For more discussion of this topic, see <a href="#skip">Skip Tags You Don't Care About</a>.</li>
</ul>

</li>
</ol>

<p>This snippet shows how the parser parses entries, titles, links, and summaries.</p>
<pre>public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective "read" methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            link = readLink(parser);
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed. Keeps the href of rel="alternate" links
// and returns an empty string for any other kind of link.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String relType = parser.getAttributeValue(null, "rel");
    // Comparing against the constant avoids a NullPointerException if rel is missing
    if ("alternate".equals(relType)) {
        link = parser.getAttributeValue(null, "href");
    }
    // Advance to the closing tag for every link, not only the ones that are kept
    parser.nextTag();
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
  ...
}</pre>

<h2 id="skip">Skip Tags You Don't Care About</h2>

<p>One of the steps in the XML parsing described above is for the parser to skip tags it's not interested in. Here is the parser's <code>skip()</code> method:</p>

<pre>
private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
        case XmlPullParser.END_TAG:
            depth--;
            break;
        case XmlPullParser.START_TAG:
            depth++;
            break;
        }
    }
}
</pre>

<p>This is how it works:</p>

<ul>

<li>It throws an exception if the current event isn't a
<code>START_TAG</code>.</li>

<li>It consumes the <code>START_TAG</code>, and all events up to and including
the matching <code>END_TAG</code>.</li>

<li>To make sure that it stops at the correct <code>END_TAG</code> and not at
the first tag it encounters after the original <code>START_TAG</code>, it keeps
track of the nesting depth.</li>

</ul>

<p>Thus if the current element has nested elements, the value of
<code>depth</code> won't be 0 until the parser has consumed all events between
the original <code>START_TAG</code> and its matching <code>END_TAG</code>.
For
example, consider how the parser skips the <code>&lt;author&gt;</code> element,
which has two nested elements, <code>&lt;name&gt;</code> and
<code>&lt;uri&gt;</code>. (Text events, such as the whitespace between tags, also
pass through the loop but leave <code>depth</code> unchanged, so the steps below
count only the passes that encounter a tag.)</p>

<ul>

<li>The first time through the <code>while</code> loop, the next tag the parser
encounters after <code>&lt;author&gt;</code> is the <code>START_TAG</code> for
<code>&lt;name&gt;</code>. The value for <code>depth</code> is incremented to
2.</li>

<li>The second time through the <code>while</code> loop, the next tag the parser
encounters is the <code>END_TAG</code> <code>&lt;/name&gt;</code>. The value
for <code>depth</code> is decremented to 1.</li>

<li>The third time through the <code>while</code> loop, the next tag the parser
encounters is the <code>START_TAG</code> <code>&lt;uri&gt;</code>. The value
for <code>depth</code> is incremented to 2.</li>

<li>The fourth time through the <code>while</code> loop, the next tag the parser
encounters is the <code>END_TAG</code> <code>&lt;/uri&gt;</code>. The value for
<code>depth</code> is decremented to 1.</li>

<li>The fifth and final time through the <code>while</code> loop, the next
tag the parser encounters is the <code>END_TAG</code>
<code>&lt;/author&gt;</code>. The value for <code>depth</code> is decremented to
0, indicating that the <code>&lt;author&gt;</code> element has been successfully
skipped.</li>

</ul>

<h2 id="consume">Consume XML Data</h2>

<p>The example application fetches and parses the XML feed within an {@link
android.os.AsyncTask}. This takes the processing off the main UI thread.
When
processing is complete, the app updates the UI in the main activity
(<code>NetworkActivity</code>).</p>
<p>In the excerpt shown below, the <code>loadPage()</code> method does the
following:</p>

<ul>

<li>Uses the <code>URL</code> constant, which holds the address of the XML feed.</li>

<li>If the user's settings and the network connection allow it, invokes
<code>new DownloadXmlTask().execute(URL)</code>. This instantiates a new
<code>DownloadXmlTask</code> object (an {@link android.os.AsyncTask} subclass) and
calls its {@link android.os.AsyncTask#execute execute()} method, which starts the
task. The task downloads and parses the feed in the background and produces a
string result to be displayed in the UI.</li>

</ul>
<pre>
public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    public static String sPref = null;

    ...

    // Uses AsyncTask to download the XML feed from stackoverflow.com.
    public void loadPage() {
        // ANY.equals(sPref) is safe even while sPref is still null
        if (ANY.equals(sPref) &amp;&amp; (wifiConnected || mobileConnected)) {
            new DownloadXmlTask().execute(URL);
        } else if (WIFI.equals(sPref) &amp;&amp; wifiConnected) {
            new DownloadXmlTask().execute(URL);
        } else {
            // show error
        }
    }</pre>

<p>The {@link android.os.AsyncTask} subclass shown below,
<code>DownloadXmlTask</code>, implements the following {@link
android.os.AsyncTask} methods:</p>

<ul>

<li>{@link android.os.AsyncTask#doInBackground doInBackground()} calls
<code>loadXmlFromNetwork()</code>, passing it the feed URL.
<code>loadXmlFromNetwork()</code> fetches and processes the feed and
returns a result string.</li>

<li>{@link android.os.AsyncTask#onPostExecute onPostExecute()} takes the
returned string and displays it in the UI.</li>

</ul>

<pre>
// Implementation of AsyncTask used to download XML feed from stackoverflow.com.
private class DownloadXmlTask extends AsyncTask&lt;String, Void, String&gt; {
    @Override
    protected String doInBackground(String... urls) {
        try {
            return loadXmlFromNetwork(urls[0]);
        } catch (IOException e) {
            return getResources().getString(R.string.connection_error);
        } catch (XmlPullParserException e) {
            return getResources().getString(R.string.xml_error);
        }
    }

    @Override
    protected void onPostExecute(String result) {
        setContentView(R.layout.main);
        // Displays the HTML string in the UI via a WebView
        WebView myWebView = (WebView) findViewById(R.id.webview);
        myWebView.loadData(result, "text/html", null);
    }
}</pre>

<p>Below is the method <code>loadXmlFromNetwork()</code> that is invoked from
<code>DownloadXmlTask</code>. It does the following:</p>

<ol>

<li>Instantiates a <code>StackOverflowXmlParser</code>.
It also creates variables for
a {@link java.util.List} of <code>Entry</code> objects (<code>entries</code>), and
<code>title</code>, <code>url</code>, and <code>summary</code>, to hold the
values extracted from the XML feed for those fields.</li>

<li>Calls <code>downloadUrl()</code>, which fetches the feed and returns it as
an {@link java.io.InputStream}.</li>

<li>Uses <code>StackOverflowXmlParser</code> to parse the {@link java.io.InputStream}.
<code>StackOverflowXmlParser</code> returns a {@link java.util.List} of
<code>Entry</code> objects (<code>entries</code>) populated with data from the feed.</li>

<li>Processes the <code>entries</code> {@link java.util.List}
and combines the feed data with HTML markup.</li>

<li>Returns an HTML string that is displayed in the main activity
UI by the {@link android.os.AsyncTask} method {@link
android.os.AsyncTask#onPostExecute onPostExecute()}.</li>

</ol>

<pre>
// Downloads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiate the parser
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List&lt;Entry&gt; entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("&lt;h3&gt;" + getResources().getString(R.string.page_title) + "&lt;/h3&gt;");
    htmlString.append("&lt;em&gt;" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "&lt;/em&gt;");

    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    } finally {
        // Makes sure that the InputStream is closed after the app is
        // finished using it.
        if (stream != null) {
            stream.close();
        }
    }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("&lt;p&gt;&lt;a href='");
        htmlString.append(entry.link);
        htmlString.append("'&gt;" + entry.title + "&lt;/a&gt;&lt;/p&gt;");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query
    conn.connect();
    // Returns the response body; the caller is responsible for closing it
    return conn.getInputStream();
}</pre>
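<p>You might want to try the parsing steps from this lesson outside Android, on a desktop JVM where <code>android.util.Xml</code> and the <code>org.xmlpull.v1</code> classes aren't available. The following rough sketch swaps in StAX (<code>javax.xml.stream.XMLStreamReader</code>), the pull parser that ships with the JDK. It is not the sample app's code. The class name <code>FeedSketch</code> and its methods are hypothetical. Unlike the Android snippet, the sketch keeps StAX's default namespace awareness and matches on local names rather than turning namespaces off. It also uses <code>getElementText()</code> in place of <code>readText()</code>. Otherwise, the structure mirrors <code>readFeed()</code>, <code>readEntry()</code>, <code>readLink()</code>, and the depth-counting <code>skip()</code> described above.</p>

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical desktop-JVM sketch of StackOverflowXmlParser using StAX, because
// org.xmlpull.v1 and android.util.Xml are not part of the standard JDK.
public class FeedSketch {

    // The three fields the lesson extracts from each <entry>.
    public static final class Entry {
        public final String title;
        public final String summary;
        public final String link;
        Entry(String title, String summary, String link) {
            this.title = title;
            this.summary = summary;
            this.link = link;
        }
    }

    // Counterpart of parse() plus readFeed(): keeps <entry> elements, skips everything else.
    public static List<Entry> parse(InputStream in) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            reader.nextTag(); // moves past the prolog to the root element
            reader.require(XMLStreamConstants.START_ELEMENT, null, "feed");
            List<Entry> entries = new ArrayList<>();
            while (reader.next() != XMLStreamConstants.END_ELEMENT) {
                if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
                    continue; // whitespace, comments, and so on
                }
                if (reader.getLocalName().equals("entry")) {
                    entries.add(readEntry(reader));
                } else {
                    skip(reader);
                }
            }
            return entries;
        } finally {
            reader.close(); // frees the reader; the caller still closes the stream
        }
    }

    // Counterpart of readEntry(). getElementText() stands in for readText()
    // and leaves the reader on the element's end tag, as readText() does.
    static Entry readEntry(XMLStreamReader reader) throws XMLStreamException {
        reader.require(XMLStreamConstants.START_ELEMENT, null, "entry");
        String title = null, summary = null, link = null;
        while (reader.next() != XMLStreamConstants.END_ELEMENT) {
            if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            switch (reader.getLocalName()) {
                case "title": title = reader.getElementText(); break;
                case "summary": summary = reader.getElementText(); break;
                case "link": link = readLink(reader); break;
                default: skip(reader);
            }
        }
        return new Entry(title, summary, link);
    }

    // Keeps only rel="alternate" links and always advances to </link>.
    static String readLink(XMLStreamReader reader) throws XMLStreamException {
        reader.require(XMLStreamConstants.START_ELEMENT, null, "link");
        String link = "";
        if ("alternate".equals(reader.getAttributeValue(null, "rel"))) {
            link = reader.getAttributeValue(null, "href");
        }
        reader.nextTag();
        reader.require(XMLStreamConstants.END_ELEMENT, null, "link");
        return link;
    }

    // Same depth-counting approach as the lesson's skip() method.
    static void skip(XMLStreamReader reader) throws XMLStreamException {
        if (reader.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException("skip() must start on a start tag");
        }
        int depth = 1;
        while (depth != 0) {
            switch (reader.next()) {
                case XMLStreamConstants.END_ELEMENT: depth--; break;
                case XMLStreamConstants.START_ELEMENT: depth++; break;
                default: break; // text and other events leave depth unchanged
            }
        }
    }
}
```

<p>As in the Android version, the caller still owns the {@link java.io.InputStream}. <code>XMLStreamReader.close()</code> frees the reader's resources but does not close the underlying input source.</p>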