page.title=Parsing XML Data
parent.title=Performing Network Operations
parent.link=index.html

trainingnavtop=true

previous.title=Managing Network Usage
previous.link=managing.html

@jd:body

<div id="tb-wrapper">
<div id="tb">

<h2>This lesson teaches you to</h2>
<ol>
  <li><a href="#choose">Choose a Parser</a></li>
  <li><a href="#analyze">Analyze the Feed</a></li>
  <li><a href="#instantiate">Instantiate the Parser</a></li>
  <li><a href="#read">Read the Feed</a></li>
  <li><a href="#parse">Parse XML</a></li>
  <li><a href="#skip">Skip Tags You Don't Care About</a></li>
  <li><a href="#consume">Consume XML Data</a></li>
</ol>

<h2>You should also read</h2>
<ul>
  <li><a href="{@docRoot}guide/webapps/index.html">Web Apps Overview</a></li>
</ul>

<h2>Try it out</h2>

<div class="download-box">
  <a href="{@docRoot}shareables/training/NetworkUsage.zip"
class="button">Download the sample</a>
 <p class="filename">NetworkUsage.zip</p>
</div>

</div>
</div>

<p>Extensible Markup Language (XML) is a set of rules for encoding documents in
machine-readable form. XML is a popular format for sharing data on the internet.
Websites that frequently update their content, such as news sites or blogs,
often provide an XML feed so that external programs can keep abreast of content
changes. Downloading and parsing XML data is a common task for network-connected
apps. This lesson explains how to parse XML documents and use their data.</p>

<h2 id="choose">Choose a Parser</h2>

<p>We recommend {@link org.xmlpull.v1.XmlPullParser}, which is an efficient and
maintainable way to parse XML on Android. Historically, Android has had two
implementations of this interface:</p>

<ul>
  <li><a href="http://kxml.sourceforge.net/"><code>KXmlParser</code></a>,
  via {@link org.xmlpull.v1.XmlPullParserFactory#newPullParser XmlPullParserFactory.newPullParser()}.
  </li>
  <li><code>ExpatPullParser</code>, via
  {@link android.util.Xml#newPullParser Xml.newPullParser()}.
  </li>
</ul>

<p>Either choice is fine. Both implement the same <code>XmlPullParser</code>
interface, so the parsing code is written the same way for either. The example
in this section uses <code>ExpatPullParser</code>, via
{@link android.util.Xml#newPullParser Xml.newPullParser()}.</p>
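
<p>The Android classes above need a device or emulator. To get a feel for the pull model on a desktop JVM first, you can use the JDK's StAX API (<code>javax.xml.stream</code>) as a close analogue. With a pull API, your code asks the reader for the next event instead of receiving callbacks. The sketch below swaps in StAX, so it is not <code>XmlPullParser</code> code, and the class name <code>PullDemo</code> is made up for illustration.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: pull parsing with the JDK's StAX API, not Android's XmlPullParser.
final class PullDemo {
    // Returns the local names of all start tags, in document order.
    static List<String> startTagNames(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> names = new ArrayList<>();
        try {
            // The caller drives the parse: each next() call pulls exactly one event.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }
}
```

<p>For example, passing <code>&lt;feed&gt;&lt;entry&gt;&lt;title&gt;Hi&lt;/title&gt;&lt;/entry&gt;&lt;/feed&gt;</code> to <code>startTagNames()</code> returns <code>feed</code>, <code>entry</code>, and <code>title</code>, in that order.</p>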

<h2 id="analyze">Analyze the Feed</h2>

<p>The first step in parsing a feed is to decide which fields you're interested in.
The parser extracts data for those fields and ignores the rest.</p>

<p>Here is an excerpt from the feed that's being parsed in the sample app. Each
post to <a href="http://stackoverflow.com">StackOverflow.com</a> appears in the
feed as an <code>entry</code> tag that contains several nested tags:</p>

<pre>&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;
&lt;feed xmlns=&quot;http://www.w3.org/2005/Atom&quot; xmlns:creativeCommons=&quot;http://backend.userland.com/creativeCommonsRssModule&quot; ...&gt;
&lt;title type=&quot;text&quot;&gt;newest questions tagged android - Stack Overflow&lt;/title&gt;
...
    &lt;entry&gt;
    ...
    &lt;/entry&gt;
    &lt;entry&gt;
        &lt;id&gt;http://stackoverflow.com/q/9439999&lt;/id&gt;
        &lt;re:rank scheme="http://stackoverflow.com"&gt;0&lt;/re:rank&gt;
        &lt;title type="text"&gt;Where is my data file?&lt;/title&gt;
        &lt;category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest/tags" term="android"/&gt;
        &lt;category scheme="http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest/tags" term="file"/&gt;
        &lt;author&gt;
            &lt;name&gt;cliff2310&lt;/name&gt;
            &lt;uri&gt;http://stackoverflow.com/users/1128925&lt;/uri&gt;
        &lt;/author&gt;
        &lt;link rel="alternate" href="http://stackoverflow.com/questions/9439999/where-is-my-data-file" /&gt;
        &lt;published&gt;2012-02-25T00:30:54Z&lt;/published&gt;
        &lt;updated&gt;2012-02-25T00:30:54Z&lt;/updated&gt;
        &lt;summary type="html"&gt;
            &lt;p&gt;I have an Application that requires a data file...&lt;/p&gt;

        &lt;/summary&gt;
    &lt;/entry&gt;
    &lt;entry&gt;
    ...
    &lt;/entry&gt;
...
&lt;/feed&gt;</pre>

<p>The sample app extracts data for the <code>entry</code> tag and its nested tags
<code>title</code>, <code>link</code>, and <code>summary</code>.</p>

<h2 id="instantiate">Instantiate the Parser</h2>

<p>The next step is to instantiate a parser and kick off the parsing process. In
this snippet, a parser is initialized to not process namespaces, and to use the
provided {@link java.io.InputStream} as its input. It starts the parsing process
with a call to {@link org.xmlpull.v1.XmlPullParser#nextTag() nextTag()} and
invokes the <code>readFeed()</code> method, which extracts and processes the data
the app is interested in:</p>

<pre>import android.util.Xml;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class StackOverflowXmlParser {
    // We don't use namespaces
    private static final String ns = null;

    public List&lt;Entry&gt; parse(InputStream in) throws XmlPullParserException, IOException {
        try {
            XmlPullParser parser = Xml.newPullParser();
            parser.setFeature(XmlPullParser.FEATURE_PROCESS_NAMESPACES, false);
            // A null encoding lets the parser detect the encoding itself.
            parser.setInput(in, null);
            // Advances to the first start tag, the root &lt;feed&gt; element.
            parser.nextTag();
            return readFeed(parser);
        } finally {
            in.close();
        }
    }
    ...
}</pre>
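
<p>For comparison, here is the same setup sketched with the JDK's StAX API rather than <code>XmlPullParser</code>. The class and method names are hypothetical. As in <code>parse()</code>, <code>nextTag()</code> moves from the start of the document, past any whitespace or comments, to the root element. Then <code>require()</code> fails fast if that element isn't <code>feed</code>, and a <code>finally</code> block closes the stream. StAX's <code>require()</code> skips the namespace check when the namespace argument is <code>null</code>, which plays the same role as <code>ns = null</code> above.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical StAX counterpart of parse(): opens the stream and checks the root element.
final class FeedReaderSetup {
    // Returns the root element's local name, or throws XMLStreamException if it isn't <feed>.
    static String checkRoot(InputStream in) throws XMLStreamException, IOException {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
            // Skips whitespace, comments, and processing instructions up to the first tag.
            reader.nextTag();
            // A null namespace URI means "don't compare namespaces".
            reader.require(XMLStreamConstants.START_ELEMENT, null, "feed");
            return reader.getLocalName(); // a real parser would continue with readFeed(reader)
        } finally {
            in.close(); // always release the stream, as parse() does
        }
    }
}
```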

<h2 id="read">Read the Feed</h2>

<p>The <code>readFeed()</code> method does the actual work of processing the
feed. It looks for elements tagged "entry" as a starting point for processing
the feed, and skips any tag that isn't an {@code entry} tag. Once the whole feed
has been processed, <code>readFeed()</code> returns a {@link java.util.List}
containing the entries (including nested data members) it extracted from the
feed. This {@link java.util.List} is then returned by the parser.</p>

<pre>
private List&lt;Entry&gt; readFeed(XmlPullParser parser) throws XmlPullParserException, IOException {
    List&lt;Entry&gt; entries = new ArrayList&lt;Entry&gt;();

    parser.require(XmlPullParser.START_TAG, ns, "feed");
    // readEntry() and skip() each consume their element up to its END_TAG,
    // so the only END_TAG this loop sees is the one that closes &lt;feed&gt;.
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        // Starts by looking for the entry tag
        if (name.equals("entry")) {
            entries.add(readEntry(parser));
        } else {
            skip(parser);
        }
    }
    return entries;
}</pre>
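
<p>The loop in <code>readFeed()</code> carries over directly to other pull APIs. This sketch uses the JDK's StAX reader in place of <code>XmlPullParser</code>. It counts the <code>entry</code> elements directly under <code>feed</code> and skips everything else with the same depth-counting <code>skip()</code> described later in this lesson. The class name <code>FeedScanner</code> is hypothetical, and a real parser would call a <code>readEntry()</code> method where this one only counts.</p>

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical StAX version of the readFeed() loop: counts the <entry> children of <feed>.
final class FeedScanner {
    static int countEntries(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        r.nextTag();
        r.require(XMLStreamConstants.START_ELEMENT, null, "feed");
        int entries = 0;
        // Each child is consumed up to its own end tag, so the next END_ELEMENT seen here closes <feed>.
        while (r.next() != XMLStreamConstants.END_ELEMENT) {
            if (r.getEventType() != XMLStreamConstants.START_ELEMENT) {
                continue; // text, whitespace, or comments between children
            }
            if ("entry".equals(r.getLocalName())) {
                entries++; // a real parser would call readEntry(r) here instead of skipping
            }
            skip(r);
        }
        return entries;
    }

    // Consumes the current element and everything nested inside it, tracking depth.
    static void skip(XMLStreamReader r) throws XMLStreamException {
        if (r.getEventType() != XMLStreamConstants.START_ELEMENT) {
            throw new IllegalStateException();
        }
        int depth = 1;
        while (depth != 0) {
            int event = r.next();
            if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            } else if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            }
        }
    }
}
```

<p>Given <code>&lt;feed&gt;&lt;meta&gt;&lt;entry/&gt;&lt;/meta&gt;&lt;entry/&gt;&lt;entry/&gt;&lt;/feed&gt;</code>, <code>countEntries()</code> returns 2. The <code>entry</code> nested inside <code>meta</code> is consumed by <code>skip()</code> and never reaches the loop.</p>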


<h2 id="parse">Parse XML</h2>


<p>The steps for parsing an XML feed are as follows:</p>
<ol>

  <li>As described in <a href="#analyze">Analyze the Feed</a>, identify the tags you want to include in your app. This
example extracts data for the <code>entry</code> tag and its nested tags
<code>title</code>, <code>link</code>, and <code>summary</code>.</li>

<li>Create the following methods:

<ul>

<li>A "read" method for each tag you're interested in, such as
<code>readEntry()</code> and <code>readTitle()</code>. The parser reads
tags from the input stream. When it encounters a tag named <code>entry</code>,
<code>title</code>, <code>link</code>, or <code>summary</code>, it calls the
appropriate method for that tag. Otherwise, it skips the tag.
</li>

<li>Methods to extract data for each different type of tag and to advance the
parser to the next tag. For example:
<ul>

<li>For the <code>title</code> and <code>summary</code> tags, the parser calls
<code>readText()</code>. This method extracts data for these tags by calling
<code>parser.getText()</code>.</li>

<li>For the <code>link</code> tag, the parser first checks whether the link is
the kind it's interested in (one with <code>rel="alternate"</code>). If it is,
the parser uses <code>parser.getAttributeValue()</code> to extract the link's
<code>href</code> value.</li>

<li>For the <code>entry</code> tag, the parser calls <code>readEntry()</code>.
This method parses the entry's nested tags and returns an <code>Entry</code>
object with the data members <code>title</code>, <code>link</code>, and
<code>summary</code>.</li>

</ul>
</li>
<li>A helper <code>skip()</code> method that skips an element and everything
nested inside it by tracking the nesting depth. For more discussion of this
topic, see <a href="#skip">Skip Tags You Don't Care About</a>.</li>
</ul>

  </li>
</ol>

<p>This snippet shows how the parser parses entries, titles, links, and summaries.</p>
<pre>public static class Entry {
    public final String title;
    public final String link;
    public final String summary;

    private Entry(String title, String summary, String link) {
        this.title = title;
        this.summary = summary;
        this.link = link;
    }
}

// Parses the contents of an entry. If it encounters a title, summary, or link tag, hands them off
// to their respective &quot;read&quot; methods for processing. Otherwise, skips the tag.
private Entry readEntry(XmlPullParser parser) throws XmlPullParserException, IOException {
    parser.require(XmlPullParser.START_TAG, ns, "entry");
    String title = null;
    String summary = null;
    String link = null;
    while (parser.next() != XmlPullParser.END_TAG) {
        if (parser.getEventType() != XmlPullParser.START_TAG) {
            continue;
        }
        String name = parser.getName();
        if (name.equals("title")) {
            title = readTitle(parser);
        } else if (name.equals("summary")) {
            summary = readSummary(parser);
        } else if (name.equals("link")) {
            // readLink() returns "" for links that aren't rel="alternate";
            // don't let those overwrite a link that was already found.
            String href = readLink(parser);
            if (!href.isEmpty()) {
                link = href;
            }
        } else {
            skip(parser);
        }
    }
    return new Entry(title, summary, link);
}

// Processes title tags in the feed.
private String readTitle(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "title");
    String title = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "title");
    return title;
}

// Processes link tags in the feed. Returns the href of a rel="alternate" link,
// or "" for any other kind of link.
private String readLink(XmlPullParser parser) throws IOException, XmlPullParserException {
    String link = "";
    parser.require(XmlPullParser.START_TAG, ns, "link");
    String relType = parser.getAttributeValue(null, "rel");
    // Comparing against the literal means a missing rel attribute (null) doesn't throw.
    if ("alternate".equals(relType)) {
        String href = parser.getAttributeValue(null, "href");
        if (href != null) {
            link = href;
        }
    }
    // Always advance to the END_TAG, whether or not the link was used;
    // otherwise the require() below fails for non-alternate links.
    parser.nextTag();
    parser.require(XmlPullParser.END_TAG, ns, "link");
    return link;
}

// Processes summary tags in the feed.
private String readSummary(XmlPullParser parser) throws IOException, XmlPullParserException {
    parser.require(XmlPullParser.START_TAG, ns, "summary");
    String summary = readText(parser);
    parser.require(XmlPullParser.END_TAG, ns, "summary");
    return summary;
}

// For the tags title and summary, extracts their text values.
// Leaves the parser on the element's END_TAG; an empty element yields "".
private String readText(XmlPullParser parser) throws IOException, XmlPullParserException {
    String result = "";
    if (parser.next() == XmlPullParser.TEXT) {
        result = parser.getText();
        parser.nextTag();
    }
    return result;
}
  ...
}</pre>
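
<p>To try the entry-level logic off-device, here is a sketch of <code>readEntry()</code> and <code>readLink()</code> ported to the JDK's StAX API. This is not the Android API, and the class names are hypothetical. StAX's <code>getElementText()</code> reads a text-only element and leaves the reader on its end tag, so it stands in for <code>readText()</code>. <code>readLink()</code> always advances to <code>&lt;/link&gt;</code>, whether or not the link is used, and returns an empty string for links that aren't <code>rel="alternate"</code>.</p>

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical StAX port of readEntry() and readLink(); not the Android XmlPullParser API.
final class EntryReader {
    static final class Entry {
        final String title;
        final String link;
        final String summary;

        Entry(String title, String link, String summary) {
            this.title = title;
            this.link = link;
            this.summary = summary;
        }
    }

    // Parses a document whose root element is a single <entry>.
    static Entry parseEntry(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        r.nextTag();
        return readEntry(r);
    }

    static Entry readEntry(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "entry");
        String title = null;
        String link = null;
        String summary = null;
        while (r.next() != XMLStreamConstants.END_ELEMENT) {
            if (r.getEventType() != XMLStreamConstants.START_ELEMENT) {
                continue; // whitespace between child elements
            }
            switch (r.getLocalName()) {
                case "title":
                    title = r.getElementText(); // leaves r on </title>
                    break;
                case "summary":
                    summary = r.getElementText(); // entity references come back decoded
                    break;
                case "link": {
                    String href = readLink(r);
                    if (!href.isEmpty()) {
                        link = href; // ignore non-alternate links
                    }
                    break;
                }
                default:
                    skip(r);
            }
        }
        return new Entry(title, link, summary);
    }

    // Returns the href of a rel="alternate" link, or "" for any other link.
    static String readLink(XMLStreamReader r) throws XMLStreamException {
        r.require(XMLStreamConstants.START_ELEMENT, null, "link");
        String href = "alternate".equals(r.getAttributeValue(null, "rel"))
                ? r.getAttributeValue(null, "href")
                : null;
        r.nextTag(); // always move to </link>, whether or not the link is used
        r.require(XMLStreamConstants.END_ELEMENT, null, "link");
        return href == null ? "" : href;
    }

    // Depth-counting skip, as in the lesson's skip().
    static void skip(XMLStreamReader r) throws XMLStreamException {
        int depth = 1;
        while (depth != 0) {
            int event = r.next();
            if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            } else if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            }
        }
    }
}
```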

<h2 id="skip">Skip Tags You Don't Care About</h2>

<p>One of the steps in the XML parsing described above is for the parser to skip tags it's not interested in. Here is the parser's <code>skip()</code> method:</p>

<pre>
private void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
    if (parser.getEventType() != XmlPullParser.START_TAG) {
        throw new IllegalStateException();
    }
    int depth = 1;
    while (depth != 0) {
        switch (parser.next()) {
        case XmlPullParser.END_TAG:
            depth--;
            break;
        case XmlPullParser.START_TAG:
            depth++;
            break;
        }
    }
}
</pre>

<p>This is how it works:</p>

<ul>

<li>It throws an exception if the current event isn't a
<code>START_TAG</code>.</li>

<li>It consumes the <code>START_TAG</code>, and all events up to and including
the matching <code>END_TAG</code>.</li>

<li>To make sure that it stops at the correct <code>END_TAG</code> and not at
the first tag it encounters after the original <code>START_TAG</code>, it keeps
track of the nesting depth.</li>

</ul>

<p>Thus if the current element has nested elements, the value of
<code>depth</code> won't be 0 until the parser has consumed all events between
the original <code>START_TAG</code> and its matching <code>END_TAG</code>. For
example, consider how the parser skips the <code>&lt;author&gt;</code> element,
which has two nested elements, <code>&lt;name&gt;</code> and
<code>&lt;uri&gt;</code>. The steps below count only tag events. Text events,
such as the whitespace between tags, also pass through the loop but leave
<code>depth</code> unchanged.</p>

<ul>

<li>The first time through the <code>while</code> loop, the next tag the parser
encounters after <code>&lt;author&gt;</code> is the <code>START_TAG</code> for
<code>&lt;name&gt;</code>. The value for <code>depth</code> is incremented to
2.</li>

<li>The second time through the <code>while</code> loop, the next tag the parser
encounters is the <code>END_TAG</code> <code>&lt;/name&gt;</code>. The value
for <code>depth</code> is decremented to 1.</li>

<li>The third time through the <code>while</code> loop, the next tag the parser
encounters is the <code>START_TAG</code> <code>&lt;uri&gt;</code>. The value
for <code>depth</code> is incremented to 2.</li>

<li>The fourth time through the <code>while</code> loop, the next tag the parser
encounters is the <code>END_TAG</code> <code>&lt;/uri&gt;</code>. The value for
<code>depth</code> is decremented to 1.</li>

<li>The fifth and final time through the <code>while</code> loop, the next
tag the parser encounters is the <code>END_TAG</code>
<code>&lt;/author&gt;</code>. The value for <code>depth</code> is decremented to
0, indicating that the <code>&lt;author&gt;</code> element has been successfully
skipped.</li>

</ul>
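
<p>You can check this depth bookkeeping without a parser. This plain-Java sketch replays the loop from <code>skip()</code> over a hand-written list of the events that follow <code>&lt;author&gt;</code>. After each tag, it records the value of <code>depth</code>, which should match the five steps above. The event strings and the <code>SkipTrace</code> class are illustrative and are not part of the sample app.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative replay of skip()'s depth counter over pre-tokenized events.
// "<x>" stands for a START_TAG, "</x>" for an END_TAG, and anything else for TEXT.
final class SkipTrace {
    // Returns the depth after each tag event, starting just after the skipped element's start tag.
    static List<Integer> trace(List<String> eventsAfterStart) {
        List<Integer> depths = new ArrayList<>();
        int depth = 1;
        for (String event : eventsAfterStart) {
            if (depth == 0) {
                break; // skip() has already returned; later events belong to the caller
            }
            if (event.startsWith("</")) {
                depth--;
            } else if (event.startsWith("<")) {
                depth++;
            } else {
                continue; // TEXT passes through the loop but leaves depth unchanged
            }
            depths.add(depth);
        }
        return depths;
    }
}
```

<p>With the events from the excerpt (<code>&lt;name&gt;</code>, its text, <code>&lt;/name&gt;</code>, <code>&lt;uri&gt;</code>, its text, <code>&lt;/uri&gt;</code>, <code>&lt;/author&gt;</code>), <code>trace()</code> returns <code>[2, 1, 2, 1, 0]</code>.</p>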

<h2 id="consume">Consume XML Data</h2>

<p>The example application fetches and parses the XML feed within an {@link
android.os.AsyncTask}. This takes the processing off the main UI thread. When
processing is complete, the app updates the UI in the main activity
(<code>NetworkActivity</code>).</p>
<p>In the excerpt shown below, the <code>loadPage()</code> method does the
following:</p>

<ul>

  <li>Uses the feed's URL, which is stored in the <code>URL</code> constant.</li>

  <li>If the user's settings and the network connection allow it, invokes
<code>new DownloadXmlTask().execute(URL)</code>. This instantiates a new
<code>DownloadXmlTask</code> object ({@link android.os.AsyncTask} subclass) and
runs its {@link android.os.AsyncTask#execute execute()} method, which downloads
and parses the feed and returns a string result to be displayed in the UI.</li>

</ul>
<pre>
public class NetworkActivity extends Activity {
    public static final String WIFI = "Wi-Fi";
    public static final String ANY = "Any";
    private static final String URL = "http://stackoverflow.com/feeds/tag?tagnames=android&amp;sort=newest";

    // Whether there is a Wi-Fi connection.
    private static boolean wifiConnected = false;
    // Whether there is a mobile connection.
    private static boolean mobileConnected = false;
    // Whether the display should be refreshed.
    public static boolean refreshDisplay = true;
    // The user's network preference (WIFI or ANY), set in code not shown here.
    public static String sPref = null;

    ...

    // Uses AsyncTask to download the XML feed from stackoverflow.com.
    public void loadPage() {
        // Comparing against the constants means an unset (null) preference
        // falls through to the error branch instead of throwing.
        if (ANY.equals(sPref) &amp;&amp; (wifiConnected || mobileConnected)) {
            new DownloadXmlTask().execute(URL);
        } else if (WIFI.equals(sPref) &amp;&amp; wifiConnected) {
            new DownloadXmlTask().execute(URL);
        } else {
            // show error
        }
    }</pre>
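
<p>The branching in <code>loadPage()</code> is a small decision that you can pull out and test without an emulator. Here is a sketch under these assumptions: the class and method names are hypothetical, the <code>"Wi-Fi"</code> and <code>"Any"</code> values come from the snippet above, and an unset (<code>null</code>) preference means "don't download".</p>

```java
// Hypothetical, Android-free version of loadPage()'s connectivity check.
final class LoadPolicy {
    static final String WIFI = "Wi-Fi";
    static final String ANY = "Any";

    // Returns true if the feed should be downloaded under the given preference and connections.
    static boolean shouldLoad(String pref, boolean wifiConnected, boolean mobileConnected) {
        if (ANY.equals(pref)) {
            return wifiConnected || mobileConnected;
        }
        if (WIFI.equals(pref)) {
            return wifiConnected;
        }
        return false; // unknown or unset preference: show the error instead
    }
}
```

<p>Keeping the check free of Android types means <code>loadPage()</code> only has to call it and either start the download or show the error.</p>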

<p>The {@link android.os.AsyncTask} subclass shown below,
<code>DownloadXmlTask</code>, implements the following {@link
android.os.AsyncTask} methods:</p>

<ul>

  <li>{@link android.os.AsyncTask#doInBackground doInBackground()} runs on a
background thread and executes the method <code>loadXmlFromNetwork()</code>,
passing it the feed URL as a parameter. The method
<code>loadXmlFromNetwork()</code> fetches and processes the feed. When it
finishes, it passes back a result string. If a network or parsing error occurs,
<code>doInBackground()</code> returns an error message string instead.</li>

  <li>{@link android.os.AsyncTask#onPostExecute onPostExecute()} runs on the UI
thread after <code>doInBackground()</code> finishes. It takes the returned string
and displays it in the UI.</li>

</ul>

<pre>
// Implementation of AsyncTask used to download XML feed from stackoverflow.com.
private class DownloadXmlTask extends AsyncTask&lt;String, Void, String&gt; {
    &#64;Override
    protected String doInBackground(String... urls) {
        try {
            return loadXmlFromNetwork(urls[0]);
        } catch (IOException e) {
            return getResources().getString(R.string.connection_error);
        } catch (XmlPullParserException e) {
            return getResources().getString(R.string.xml_error);
        }
    }

    &#64;Override
    protected void onPostExecute(String result) {
        setContentView(R.layout.main);
        // Displays the HTML string in the UI via a WebView
        WebView myWebView = (WebView) findViewById(R.id.webview);
        myWebView.loadData(result, "text/html", null);
    }
}</pre>
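
<p>{@link android.os.AsyncTask} was deprecated in API level 30. <code>DownloadXmlTask</code> does blocking work off the calling thread, turns exceptions into a user-facing message, and hands the resulting string to a callback. You can express the same pattern with plain <code>java.util.concurrent</code>. The sketch below is not Android code. It has no main-thread handoff, so on Android you would still need to post the result to the UI thread before touching views. <code>FeedLoader</code>, <code>BackgroundLoad</code>, and the message strings are hypothetical stand-ins for <code>loadXmlFromNetwork()</code> and the <code>R.string</code> resources.</p>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical stand-in for loadXmlFromNetwork(): does blocking I/O and may throw.
interface FeedLoader {
    String load(String url) throws IOException;
}

final class BackgroundLoad {
    static final String CONNECTION_ERROR = "Unable to connect";   // stand-in for R.string.connection_error
    static final String UNEXPECTED_ERROR = "Something went wrong"; // catch-all for other failures

    // Runs loader on the executor, then passes either its result or an error message to onResult.
    static CompletableFuture<Void> start(Executor executor, FeedLoader loader,
                                         String url, Consumer<String> onResult) {
        return CompletableFuture
                .supplyAsync(() -> {
                    try {
                        return loader.load(url); // plays the role of doInBackground()
                    } catch (IOException e) {
                        throw new UncheckedIOException(e); // suppliers can't throw checked exceptions
                    }
                }, executor)
                .exceptionally(failure -> {
                    // Failures usually arrive wrapped in a CompletionException.
                    Throwable cause = failure instanceof CompletionException && failure.getCause() != null
                            ? failure.getCause() : failure;
                    return cause instanceof UncheckedIOException ? CONNECTION_ERROR : UNEXPECTED_ERROR;
                })
                .thenAccept(onResult); // plays the role of onPostExecute(), but not on a UI thread
    }
}
```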

<p>Below is the method <code>loadXmlFromNetwork()</code> that is invoked from
<code>DownloadXmlTask</code>. It does the following:</p>

<ol>

  <li>Instantiates a <code>StackOverflowXmlParser</code>. It also creates variables for
a {@link java.util.List} of <code>Entry</code> objects (<code>entries</code>), and
<code>title</code>, <code>url</code>, and <code>summary</code>, to hold the
values extracted from the XML feed for those fields.</li>

  <li>Calls <code>downloadUrl()</code>, which fetches the feed and returns it as
  an {@link java.io.InputStream}.</li>

  <li>Uses <code>StackOverflowXmlParser</code> to parse the {@link java.io.InputStream}.
  <code>StackOverflowXmlParser</code> populates a
  {@link java.util.List} of <code>entries</code> with data from the feed.</li>

  <li>Processes the <code>entries</code> {@link java.util.List},
  and combines the feed data with HTML markup.</li>

  <li>Returns an HTML string that is displayed in the main activity
UI by the {@link android.os.AsyncTask} method {@link
android.os.AsyncTask#onPostExecute onPostExecute()}.</li>

</ol>

<pre>
// Downloads XML from stackoverflow.com, parses it, and combines it with
// HTML markup. Returns HTML string.
private String loadXmlFromNetwork(String urlString) throws XmlPullParserException, IOException {
    InputStream stream = null;
    // Instantiate the parser
    StackOverflowXmlParser stackOverflowXmlParser = new StackOverflowXmlParser();
    List&lt;Entry&gt; entries = null;
    String title = null;
    String url = null;
    String summary = null;
    Calendar rightNow = Calendar.getInstance();
    DateFormat formatter = new SimpleDateFormat("MMM dd h:mmaa");

    // Checks whether the user set the preference to include summary text
    SharedPreferences sharedPrefs = PreferenceManager.getDefaultSharedPreferences(this);
    boolean pref = sharedPrefs.getBoolean("summaryPref", false);

    StringBuilder htmlString = new StringBuilder();
    htmlString.append("&lt;h3&gt;" + getResources().getString(R.string.page_title) + "&lt;/h3&gt;");
    htmlString.append("&lt;em&gt;" + getResources().getString(R.string.updated) + " " +
            formatter.format(rightNow.getTime()) + "&lt;/em&gt;");

    // Makes sure that the InputStream is closed after the app is
    // finished using it.
    try {
        stream = downloadUrl(urlString);
        entries = stackOverflowXmlParser.parse(stream);
    } finally {
        if (stream != null) {
            stream.close();
        }
    }

    // StackOverflowXmlParser returns a List (called "entries") of Entry objects.
    // Each Entry object represents a single post in the XML feed.
    // This section processes the entries list to combine each entry with HTML markup.
    // Each entry is displayed in the UI as a link that optionally includes
    // a text summary.
    for (Entry entry : entries) {
        htmlString.append("&lt;p&gt;&lt;a href='");
        htmlString.append(entry.link);
        htmlString.append("'&gt;" + entry.title + "&lt;/a&gt;&lt;/p&gt;");
        // If the user set the preference to include summary text,
        // adds it to the display.
        if (pref) {
            htmlString.append(entry.summary);
        }
    }
    return htmlString.toString();
}

// Given a string representation of a URL, sets up a connection and gets
// an input stream.
private InputStream downloadUrl(String urlString) throws IOException {
    URL url = new URL(urlString);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setReadTimeout(10000 /* milliseconds */);
    conn.setConnectTimeout(15000 /* milliseconds */);
    conn.setRequestMethod("GET");
    conn.setDoInput(true);
    // Starts the query
    conn.connect();
    // Returns the response body; the caller is responsible for closing it.
    return conn.getInputStream();
}</pre>

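
<p><code>loadXmlFromNetwork()</code> inserts <code>entry.title</code> and <code>entry.link</code> directly into the HTML it hands to the {@link android.webkit.WebView}. As a result, a title that contains <code>&lt;</code> or <code>&amp;</code> is read as markup rather than as text. One way to harden the loop is to escape those two fields. The sketch below does this in plain Java with a hypothetical <code>FeedHtml</code> class and a hand-written <code>escapeHtml()</code> helper. On Android, {@link android.text.TextUtils#htmlEncode TextUtils.htmlEncode()} does a similar job. The sketch leaves the summary unescaped, as the original does, because the feed marks it as <code>type="html"</code>.</p>

```java
import java.util.List;

// Hypothetical sketch of loadXmlFromNetwork()'s HTML assembly, with escaping added.
final class FeedHtml {
    static final class Entry {
        final String title;
        final String link;
        final String summary;

        Entry(String title, String link, String summary) {
            this.title = title;
            this.link = link;
            this.summary = summary;
        }
    }

    // Escapes the characters that are special in HTML text and in quoted attribute values.
    static String escapeHtml(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break; // the href below is single-quoted
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Builds one linked paragraph per entry, optionally followed by the entry's HTML summary.
    static String toHtml(List<Entry> entries, boolean includeSummary) {
        StringBuilder html = new StringBuilder();
        for (Entry entry : entries) {
            html.append("<p><a href='").append(escapeHtml(entry.link)).append("'>")
                .append(escapeHtml(entry.title)).append("</a></p>");
            if (includeSummary && entry.summary != null) {
                html.append(entry.summary); // already HTML (type="html"), so appended as-is
            }
        }
        return html.toString();
    }
}
```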