For this assignment, you will work with an XML corpus that has already been tagged with geographic information (Digital Mitford) and use XSLT to transform the XML into GeoJSON. You will then use the output GeoJSON file as input to ArcGIS, which you will use to create your own maps. You already know how to achieve this from the previous assignment. The main differences are that (a) you will be writing your own XSLT transformation, (b) you will be using real-world data, and (c) you will be learning to style the output data with ArcGIS.
Head on over to Digital Mitford and you can find an index of the reading views of the encoded letters. For now, your corpus can be a single letter: pick something that looks interesting. Once you click through to the reading view of the letter, you can download the XML from a link at the top that says "TEI encoding of this letter."
Now you are going to want to open up your XML document in oXygen and explore the structure: in what element / attribute can you find a suitable unique identifier? Remember, to get the coordinates, you are going to need a UID to use with the map function.
Next comes the fun part: exploit the power of XML structure for geographic analysis. At a minimum, our goal will be to first determine how many unique locations are mentioned in the corpus, and then count how many times each of those locations is referenced. If we can capture that location count as a GeoJSON property, it will be pretty straightforward to later style our geographic data based on that property value using something like ArcGIS (e.g., how about circles sized by the number of references in the corpus?).
Our target output is something like this, but with more points:
{ "type": "FeatureCollection", "features": [ { "type": "Feature", "geometry": { "type": "Point", "coordinates": [ -3.716667, 50.716667 ] }, "properties": { "name": "Devonshire_county", "count": "4" } } ] }
So the fields we need to supply dynamically with our transformation are (a) the coordinates, and (b) the count.
For the purposes of this exercise, we are providing the coordinates for you in the form of an XSLT map-function, which you can find at this link. You can use it to swap out the unique location ID for the coordinates themselves.
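To make the lookup step concrete, here is a minimal sketch of swapping a location ID for its coordinates and writing out one GeoJSON feature. The variable name `$coordinates` and the shape of the map (ID string as key, longitude and latitude as the value) are assumptions for illustration; the provided map may be organized differently, so adjust the lookup to match it. The single entry reuses the values from the target output above.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

    <xsl:output method="text" encoding="UTF-8"/>

    <!-- HYPOTHETICAL stand-in for the provided map: ID -> (longitude, latitude). -->
    <xsl:variable name="coordinates" as="map(xs:string, xs:decimal+)"
        select="map { 'Devonshire_county': (-3.716667, 50.716667) }"/>

    <xsl:template name="xsl:initial-template">
        <!-- In your own stylesheet this ID would come from the letter, not a literal. -->
        <xsl:variable name="id" as="xs:string" select="'Devonshire_county'"/>
        <xsl:text>{ "type": "Feature", "geometry": { "type": "Point", "coordinates": [ </xsl:text>
        <!-- Calling the map like a function returns the value stored under that key. -->
        <xsl:value-of select="$coordinates($id)" separator=", "/>
        <xsl:text> ] }, "properties": { "name": "</xsl:text>
        <xsl:value-of select="$id"/>
        <xsl:text>" } }</xsl:text>
    </xsl:template>
</xsl:stylesheet>
```

An ID that is missing from the map returns an empty sequence, which would leave the coordinates array empty, so it may be worth testing for that case before writing a feature.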
We produced the map by running an XSLT transformation on the Mitford Site Index and then copying the resulting code into a new stylesheet. You should be able to read and understand the XSLT that produced the provided map.
Now on to the data we wish to extract from the XML: to get a sequence of unique places, you can declare a variable that uses the distinct-values() function. Then you can iterate over the unique values in your variable with a <xsl:for-each/> loop. Note: once you are inside the for-loop iterating over the set of unique place names, you are effectively disconnected from the XML tree (because you created a variable that is a sequence of strings). One solution is to create a document-node variable, and then call it inside the for-loop: <xsl:variable name="root" as="document-node()" select="/"/>
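The pattern described above might look something like the following sketch. It assumes that places in the letter are encoded as TEI `<placeName>` elements carrying a `ref` attribute; check your own file in oXygen and adjust the XPath if the encoding differs. The variable name `$place-ids` is made up for this example. To keep the focus on the counting step, the sketch writes a plain-text list (one ID and its count per line) rather than GeoJSON.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs">

    <xsl:output method="text" encoding="UTF-8"/>

    <!-- A handle on the input document, usable even when the context item is a string. -->
    <xsl:variable name="root" as="document-node()" select="/"/>

    <!-- ASSUMPTION: places are encoded as <placeName ref="...">. -->
    <xsl:variable name="place-ids" as="xs:string*"
        select="distinct-values($root//placeName/@ref)"/>

    <xsl:template match="/">
        <xsl:for-each select="$place-ids">
            <!-- Inside this loop "." is a string, not a node in the tree. -->
            <xsl:variable name="id" as="xs:string" select="."/>
            <!-- Go back to the tree through $root to count the matching references. -->
            <xsl:value-of select="$id, count($root//placeName[@ref = $id])"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

If the `ref` values begin with a `#`, you would probably need to strip it (for example with `substring-after(., '#')`) before using the value as a key in the coordinates map.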
Now that you have a valid GeoJSON file, what to do with it? GeoJSON files contain information about geometry, but do not inherently contain any styling information: for that, you have to decide what application or coding language you want to work with. Coding languages like R and Python have robust, non-GUI libraries that can read GeoJSON and produce maps. QGIS is a free, open-source GUI alternative. These are all strong options, depending on your needs, and there are others as well.
Since Pitt has a licence for the online version (requiring no installation), we are going to work with ArcGIS for this exercise. Please consult these instructions for using your GeoJSON file to design your own map.
What's cooler than deriving geographic data from one XML document? Extracting it from all the documents.
In your assignments so far, you've been taking one single XML document as the input for your XSLT transformation. But you can also use a collection() of documents as your input. For our purposes in this assignment, doing so will allow you to derive much more interesting results: one single Mitford letter will generally only reference a handful of locations, but if you consider many of the letters together, you can answer an actual research question (e.g., "what did Mitford's conceptual world look like?").
First, in your oXygen debugging view, select "( None ) " for your XML input: we will specify the input in the code itself.
Next, create a variable with a collection as the value: <xsl:variable name="mitford-corpus" as="document-node()+" select="collection('/Users/YourPath')"/> . This variable specifies the path to the directory where you are storing all of the XML files you want to use in this collection. If that directory contains any files in formats not recognized by your XSLT transformation, your code won't work. To prevent problems, you can add a query parameter to the end of your path telling it to ignore anything that is not an XML file: <xsl:variable name="mitford-corpus" as="document-node()+" select="collection('/Users/YourPath?select=*.xml')"/>
You are nearly there: all you have to do to make this work is start from a named initial template that reads from the collection variable you created, instead of matching the document node: <xsl:template name="xsl:initial-template"> . (Note that if you declare a match for the document node (<xsl:template match="/">), it will fire once for each document in the collection.)
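Putting the collection variable and the initial template together, a stylesheet skeleton might look like the sketch below. As before, it assumes places are encoded as `<placeName>` elements with a `ref` attribute, and '/Users/YourPath' is the placeholder from the text, to be replaced with your own directory. The output is a plain-text list of IDs and counts across the whole corpus; turning it into GeoJSON is left to your own transformation.

```xslt
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="3.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="xs">

    <xsl:output method="text" encoding="UTF-8"/>

    <!-- Placeholder path from the text; ?select=*.xml skips non-XML files. -->
    <xsl:variable name="mitford-corpus" as="document-node()+"
        select="collection('/Users/YourPath?select=*.xml')"/>

    <!-- No XML input is needed: processing starts here. -->
    <xsl:template name="xsl:initial-template">
        <!-- ASSUMPTION: places are encoded as <placeName ref="...">. -->
        <xsl:for-each select="distinct-values($mitford-corpus//placeName/@ref)">
            <xsl:variable name="id" as="xs:string" select="."/>
            <!-- Count references to this place across every document in the collection. -->
            <xsl:value-of select="$id, count($mitford-corpus//placeName[@ref = $id])"/>
            <xsl:text>&#10;</xsl:text>
        </xsl:for-each>
    </xsl:template>
</xsl:stylesheet>
```

Because the collection variable holds all the document nodes, it plays the same role here that the `$root` variable played for a single letter.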
Please submit both your XSLT transformation and the map image file(s) you produced using ArcGIS.