Tag Archives: Free data

Analysing Southwark’s natural geography

Following my map of London’s green and blue infrastructure, I have been working on some analysis of the land uses.

I was inspired and encouraged to try this by Liliana’s interesting work called “imagining all of Southwark”. Lili and Ari have managed to get the council to release lots of data on properties and car parking, and they are producing analysis of this data by postal code area and by street. They haven’t managed to get anything on land uses, so I thought, why not produce this with OpenStreetMap data?

A few evenings later, here is the result shared on Google docs (direct link) covering the eight postal code areas that between them cover most of the borough (SE1, SE5, SE15, SE16, SE17, SE21, SE22, SE24):

What the data means

The “summary” worksheet shows the total land area, expressed in hectares (10,000 m²), for various different types of land coverage. I have also calculated the percentage of that postal code area that the land uses represent, which gives an interesting insight into the differences between the areas.

Some of the land uses will overlap; for example, miscellaneous bits of green space are often mapped on top of residential areas. So the numbers aren’t supposed to add up to anything like 100%.
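As a rough illustration of the arithmetic behind the summary worksheet, here is a minimal Python sketch. The figures are made up for the example and are not taken from the spreadsheet; the function names are my own.

```python
SQ_M_PER_HECTARE = 10_000  # one hectare is 10,000 square metres


def to_hectares(area_m2):
    """Convert an area in square metres to hectares."""
    return area_m2 / SQ_M_PER_HECTARE


def percent_of_area(land_use_m2, postal_area_m2):
    """Share of a postal code area covered by one land use, as a percentage."""
    return 100 * land_use_m2 / postal_area_m2


# Made-up figures for illustration only.
postal_area_m2 = 4_000_000
land_uses_m2 = {
    "Residential": 2_600_000,
    "Parks and commons": 500_000,
    "Misc green spaces": 300_000,
}

for name, m2 in land_uses_m2.items():
    print(name, to_hectares(m2), "ha", percent_of_area(m2, postal_area_m2), "%")
```

Because land uses can be mapped on top of one another, the percentages are worked out independently of each other and are not expected to sum to 100.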

The spreadsheet also contains worksheets for each postal code area. These contain a dump of all the objects in OpenStreetMap in those postal code areas, and this is the raw data the summary spreadsheet uses to get the totals.

Flaws in the data

You should use this data with a large spoonful of salt. Here are the significant flaws I have noticed:

Postal code areas are approximate. For example, the boundary between SE15 and SE22 should mark the boundary between Peckham Rye Common (SE15) and Peckham Rye Park (SE22), but in my data both the park and the common show up in both of the postal codes, because the boundary isn’t quite right. Read down to my method to see why. The errors introduced are pretty tiny in most places (plus or minus a few meters along the full boundary), and probably cancel themselves out for big land uses like residential, but they probably also introduce some significant errors for parks, where the boundaries go awry by 20-30m in places. Sadly there aren’t any accurate open data polygons I can use.

Data is missing because OpenStreetMap contributors haven’t mapped it. Of course the easy solution here is to get more of it mapped and up to date! My estimate of the different types is as follows:

  • Allotments: complete for the whole borough.
  • Parks and commons: all major and district parks complete.
  • Misc green spaces: very poor coverage of, for example, large areas of grass on estates, especially in SE5, the north part of SE15 and SE17.
  • Woods/forest: all major woods complete, coverage of big clumps of trees e.g. on a housing estate or in a park is very uneven.
  • Residential: complete except for SE16.
  • Industrial, retail, commercial: large areas are complete, but small shopping parades, industrial parks and rows of offices are very patchy.
  • Brownfield/construction: patchy across the borough and sometimes out of date as sites are built on.

Data is also sometimes missing because of flaws in the Geofabrik shapefiles, not all of which I have corrected. For example, I noticed they were missing commons so I manually added those in, but I may have missed other land uses. One major omission, a shame given the interest in them, is the humble sports pitch/playing field.

How I produced this

After a lot of experimentation – I’ve never been trained to use GIS tools – I worked out this method. If you know of an easier way I’d love to hear about it.

  1. Prepare the boundary data:
    1. Extract a polygon for the London Borough of Southwark from the OS Boundary-Line data.
    2. Download the OS Code-Point-Open data, open the spreadsheet for the SE area in QGIS and use the ftools ‘Voronoi polygons’ plugin to infer polygons for the postal codes from the centroids. Post code centroids are very dense in the middle of residential areas, so the boundary between SE15 4HR and SE22 9BD is only going to be out by a few meters, but they are quite far apart around large parks and commons, so the inferred boundaries get less accurate in those areas. See this map for an illustration of the Peckham Rye Park / Common problem mentioned above.
    3. Merge together postal codes into the areas (e.g. SE22 9QF, SE22 4DU etc. into SE22) by querying the shapefile for all objects with postal codes starting with SE22, then using the mmqgis merge tool to merge them into single polygons. Clean up the attributes so the shapefile just has one attribute for the correct postal code area.
    4. Clip the postal codes by the Southwark polygon and save the result – finally – as the postal codes shapefile for Southwark.
  2. Prepare the land use data:
    1. Download the OpenStreetMap shapefiles from Geofabrik for Greater London.
    2. Download common and marsh ways/relations using the Overpass API (with the meta flag on), import the data into QGIS using the OpenStreetMap plugin, and save the data as a Shapefile.
    3. Merge together the Geofabrik natural and landuse shapefiles with my Overpass-derived shapefile into one land use shapefile using the mmqgis plugin.
    4. Clip the land use file by the Southwark polygon and save the result – finally – as the land uses shapefile for Southwark.
  3. Produce the postal code stats; for each postal code:
    1. Select the postal code, and clip the land use layer to that selected code, saving it as a new shapefile.
    2. Open that shapefile, then save it in a new projection that will be in meters rather than degrees (I used EPSG:32631 – WGS 84 / UTM zone 31N).
    3. Open the new shapefile, then run the ftools ‘Export/add geometry columns’ tool (in Vector/Geometry Tools) to add two attributes to the objects for the area and perimeter.
    4. Save the layer again as a CSV file.
  4. Produce the stats for the area of each postal code so we can calculate % of the area as well as ha for each land use:
    1. Save the Southwark postal codes polygon in the meters projection, add the geometry columns, and save as a CSV file.
  5. Collate all the data
    1. Tidy up and copy the data from each CSV file into a spreadsheet, then add in the formulae to tot everything up. You’re done!
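
To show why the Voronoi step in 1.2 behaves the way it does, here is a minimal Python sketch rather than the QGIS workflow itself. A Voronoi polygon is just the set of places nearer to one centroid than to any other, so asking “which centroid is nearest?” gives the same answer as asking “which inferred polygon am I in?”. The coordinates below are invented for the example (real ones would come from the Code-Point Open files), and the function names are hypothetical.

```python
import math

# Invented centroid coordinates in metres, for illustration only.
CENTROIDS = {
    "SE15 4HR": (534500.0, 175300.0),
    "SE22 9BD": (534450.0, 175100.0),
    "SE22 9QF": (534100.0, 174600.0),
}


def nearest_postcode(point, centroids=CENTROIDS):
    """Return the postcode whose centroid is closest to the point.

    This is the same as finding the Voronoi polygon containing the point.
    """
    return min(centroids, key=lambda code: math.dist(point, centroids[code]))


def postal_area(postcode):
    """Reduce a full postcode such as 'SE22 9BD' to its area, 'SE22'."""
    return postcode.split()[0]


point = (534460.0, 175120.0)
code = nearest_postcode(point)
print(code, postal_area(code))
```

Where centroids are dense, the nearest one is never far away, so the inferred boundary is close to the real one. In the middle of a big park the nearest centroid may be hundreds of metres off, which is where the inferred boundaries drift.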

For reference, some of the totals in the summary work off more than one land use type so here are the categories and the corresponding OpenStreetMap tags:

  • Allotments – landuse=allotments
  • Parks and commons – leisure=park / leisure=common
  • Misc green spaces – landuse=conservation / landuse=farm / leisure=garden / landuse=grass / landuse=greenfield / landuse=greenspace / landuse=meadow / landuse=orchard / landuse=recreation_ground
  • Woods and forest – landuse=forest / natural=wood
  • Residential, industrial, retail, commercial, brownfield, construction – corresponding landuse tags
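
The same mapping can be written down as a small lookup, sketched here in Python. This is only an illustration of the list above, not something the spreadsheet uses; the names `CATEGORY_TAGS` and `categories_for` are made up.

```python
# Summary category -> list of (key, value) OpenStreetMap tags that count towards it.
CATEGORY_TAGS = {
    "Allotments": [("landuse", "allotments")],
    "Parks and commons": [("leisure", "park"), ("leisure", "common")],
    "Misc green spaces": [
        ("landuse", "conservation"),
        ("landuse", "farm"),
        ("leisure", "garden"),
        ("landuse", "grass"),
        ("landuse", "greenfield"),
        ("landuse", "greenspace"),
        ("landuse", "meadow"),
        ("landuse", "orchard"),
        ("landuse", "recreation_ground"),
    ],
    "Woods and forest": [("landuse", "forest"), ("natural", "wood")],
}

# The remaining categories each use the landuse tag of the same name.
for value in ("residential", "industrial", "retail",
              "commercial", "brownfield", "construction"):
    CATEGORY_TAGS[value.capitalize()] = [("landuse", value)]


def categories_for(tags):
    """Return every summary category that an object's tags fall into."""
    return [
        category
        for category, pairs in CATEGORY_TAGS.items()
        if any(tags.get(key) == value for key, value in pairs)
    ]


print(categories_for({"leisure": "common", "name": "Peckham Rye Common"}))
```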

Future ideas

One obvious improvement would be to get more data in. Perhaps this first analysis will encourage people to help out with that? I have also emailed Geofabrik about the flaws I have discovered in their shapefiles, so I hope those get fixed.

Another thought is to produce the stats by council ward. But given that there are far more wards, I’d like to find a quicker way of producing the stats for each ward (step three above) first.
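
One possible shortcut, sketched here in plain Python rather than QGIS, is to total up the areas in a single pass. It assumes the land use polygons have already been clipped to each ward and saved in a projection measured in metres, and it ignores holes in polygons. The ward names and shapes are invented, and the function names are hypothetical.

```python
from collections import defaultdict


def ring_area_m2(ring):
    """Area of a simple polygon from its corner coordinates (shoelace formula)."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def summarise(features):
    """Sum hectares per land use category for each area.

    Each feature is (area name, category, list of (x, y) corners in metres).
    """
    totals = defaultdict(lambda: defaultdict(float))
    for area_name, category, ring in features:
        totals[area_name][category] += ring_area_m2(ring) / 10_000
    return {name: dict(categories) for name, categories in totals.items()}


# Invented shapes for illustration only.
FEATURES = [
    ("Ward A", "Parks and commons", [(0, 0), (200, 0), (200, 100), (0, 100)]),
    ("Ward A", "Allotments", [(0, 0), (50, 0), (50, 100)]),
    ("Ward B", "Parks and commons", [(0, 0), (100, 0), (100, 100), (0, 100)]),
]

print(summarise(FEATURES))
```

This only replaces the adding-up; the clipping and reprojection would still need doing some other way.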

It would also be interesting to do it by town/suburb, for example comparing Peckham to East Dulwich. But we don’t have any meaningful boundaries for those natural areas. It would be really interesting to do a mass version of “this isn’t fucking Dalston” for a whole borough, using the Voronoi polygons method to infer areas from surveys at thousands of locations around the borough. One day…


London’s natural geography

I’ve been playing around with open data from OpenStreetMap and Natural England to make a pretty map of “green and blue infrastructure” in London. Here’s the result:

You can download a PDF version suitable for printing here: natural_london.

I’m pretty happy with the result, my first real attempt to produce something useful with QGIS. The data I used was:

There’s no reason the Natural England data couldn’t be manually added to OpenStreetMap, giving us a complete dataset of natural features. I just chose to get on and do it this way rather than wait, or try to add all the data across areas of the city I don’t know well and am not going to visit any time soon. I also didn’t really need to use the Ordnance Survey data for boundaries, but it’s slightly more accurate and complete than OpenStreetMap data.

The map is probably missing lots of smaller patches of green space, including grass verges, green roofs and biodiverse brownfield sites. The biggest omission is the humble private garden. They cover 24% of London’s land!

But the map at least shows the more obvious, visible, public green spaces, and is a nice example of what a geek with no GIS training (but years of playing with OpenStreetMap) can do with free software and free data these days.


Sitting around the data campfire

Similar to Gail Ramster, I went along to the Friday afternoon part of UK GovCamp 2012 without really knowing why. I suspect most people would say the same thing. You go because… well, you never know which useful people you might bump into, and what interesting things you might hear about. Plus a colleague Janet Hughes was going, and I’d cleared my desk of essential work for the week.

Here are a few takeaway thoughts from my afternoon.

1. I barely knew anyone

It’s years since I was a fish in a geeky pool, active in the free culture movement, the KDE community, software patent activism and other odds and sods.

For the past five years or so I’ve moved onto land, or perhaps a coral reef, to be more involved with issues around the environment, housing and pay inequality. The past two or so have been working as a local government employee at the GLA, supporting Green Party Members of the London Assembly. They have pushed for open data, but it’s not exactly a hot topic in our weekly meetings. My only remaining connection has been OpenStreetMap, my one geeky obsession.

Still, it didn’t matter: go along even if you know no-one at all.

2. It was nice to reconnect with optimistic techies

The event reminded me of one of the things I most like about these crowds: they’re all optimistic about the future and enthusiastic about the common interest.

I’m glad I managed to quickly chat to a few people I did know, sort of… Gail via Twitter, and Giles Gibson from the Herne Hill Forum, but sadly I only said as much as “hello” to people like Emer Coleman and Chris Osborne. That’s what you get for arriving late and leaving early.

3. It’s more meaty than you’d think

That’s “meatspace” as in “the real physical world”, compared to “cyberspace” online. Compared to events a few years ago on open data and technology, most of the discussion I heard was about councils and companies working on staff structures and consultation processes, and then thinking about how technology and data could help.

I used to get frustrated with discussions that started with the assumption that open data and technology was going to revolutionise the world. That seemed upside down to me. So I was pleasantly surprised at this.

4. There’s a lot of “we”

Somebody pointed this out in one session – it’s very easy to apply “we” to the wider population when you really mean “we sort of people in this room”.

Often “we” are innovators or early adopters of ideas that become more mainstream, like using a smart phone to access services. Sometimes “we” are set to be a significant minority, like journalists, bloggers and politicians who use data to enhance their investigatory work. Just as often “we” are a world unto our own.

It’s fine, innocent mostly, typical of any event with like-minded people. It just grated on me when people talked about reconfiguring public services or management around their preferences, as though the rest of the world will thank them.

I might make a badge for myself if I go again, with the slogan “we’re not normal” or similar!

5. There’s a lot going on out there

Cocooned in City Hall, working on affordable housing or the pay gap, it’s hard to keep even a toe dipped in this pool. It was great hearing from so many people in so many walks of work and life doing so many useful things.

Sometimes when I map an area for OpenStreetMap, walking down a street noting house numbers, I feel a bit bewildered by all these people living here! London feels impossibly enormous. I left UKGovCamp feeling similarly bewildered by the enormity of work going on in this field, relative that is to my own small bits and pieces in my job and my free time.


Problems and possibilities with ward boundaries

Being actively involved in my local branch of the Green Party means I’ve spent a lot of time wandering around carrying a map of a local ward.

Almost nobody seems to know which ward they are in, often because the names are a bit abstract (e.g. “The Lane” in Peckham, which I presume is because “Rye Lane” runs through the middle) or because almost nobody would say they live in the area described (e.g. “Peckham Rye”, which has Peckham Rye Common and Park in the middle and includes areas normally thought to be part of East Dulwich and Nunhead).

Since the Ordnance Survey published open data, including political boundaries, it’s been possible to put this information into OpenStreetMap. I’ve finally bothered to start doing this for Southwark – you can see the results on this nice ITO map.

Unfortunately the default map on the OpenStreetMap homepage draws the names of the wards along the rather nice dotted boundaries, displacing actual road names and leaving junctions that could easily confuse the user. Here are three examples:

You’ll notice I’ve added “ward” to the end of the names to try and help, but it’s not much of a solution. Three different proposals have been put forward on the OpenStreetMap bug tracker (a dedicated map, hide them, make them less bold).

A simple solution to the problem above would be to remove the names; a more sophisticated solution would be to give road names priority, change the text colour to the purple of the boundary lines, and hope Mapnik allows us to offset the labels so you can have the two ward names either side of the line.

It’s a bit of a shame that they create a mess because they’re useful data to include in OpenStreetMap (much like trees, which I’ve asked for a solution to).

For example, the holy grail of software to help us canvass voters would be to connect the electoral register to the OSM database of houses, allowing us to visualise and manage information on voting intentions and canvasser visits by ward on a nice map.

Another useful application could be Nominatim, which could tell you the political boundaries that any chosen OSM-mapped home, business, park or set of co-ordinates lies within.

For now I just need to finish getting all those Southwark boundaries into the database…

Data quality

One other quick point. The London Borough of Southwark boundary was already in the database, but it’s not very well mapped.

It’s really important to know if a boundary runs down the middle of the road, so that homes on one side belong to one borough and homes on the other side to another borough; or whether the boundary is offset away from the road, usually down back gardens, so that all homes on both sides are in the same borough.

Fellow OpenStreetMappers should be careful to put the boundary in exactly the right position, ideally sharing nodes/ways with the actual roads where the boundary goes down the middle so it’s precise and won’t go wrong if somebody adjusts the road position.


Why map data sometimes matters

I was contacted recently by a parent campaigning for a local school to ensure its admissions policy is properly applied. Over-subscribed schools like this one are a common source of frustration and worry up and down the country.

Here’s the rub. Which of these two homes would you say is closer to the school, and therefore more likely to secure a place? By the way, I’m not sure that the location on the left actually is within the catchment area; it’s just a place I randomly chose to illustrate the coming point…

Routes to the school from two locations using CloudMade maps, the home on the right wins by 500m.

Parents at the location on the right were told they were too far from the school. The method the school uses to calculate safe distances actually suggests that the location on the right is farther away than the location on the left!

Why?

Because they are calculating distances using a model that measures the distance as if you are driving a car. If you try that, you get a totally different result:

Routes plotted for cars to get to the school, the home on the left wins by 400m.

The school’s model uses the Ordnance Survey ITN maps, and apparently doesn’t account for this short footpath at the end of one road. It was pedestrianised 25 years ago.
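
The effect is easy to reproduce with a toy network. This is a minimal Python sketch with made-up distances, not the real streets or the school’s actual model: each link says who can use it, and the shortest route is found separately for walkers and drivers. All the names are hypothetical.

```python
import heapq

# (from, to, metres, who can use it). Distances are invented for illustration.
EDGES = [
    ("home_left", "junction", 600, {"car", "foot"}),
    ("junction", "school", 300, {"car", "foot"}),
    ("home_right", "junction", 1000, {"car", "foot"}),
    ("home_right", "end_of_road", 300, {"car", "foot"}),
    ("end_of_road", "school", 100, {"foot"}),  # the short footpath
]


def shortest_distance(edges, start, goal, mode):
    """Dijkstra's algorithm over only the links that allow the given mode."""
    graph = {}
    for a, b, metres, modes in edges:
        if mode in modes:
            graph.setdefault(a, []).append((b, metres))
            graph.setdefault(b, []).append((a, metres))
    best = {start: 0}
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # a shorter way to this node was already found
        for neighbour, metres in graph.get(node, []):
            new_dist = dist + metres
            if new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return None  # no route for this mode


for mode in ("foot", "car"):
    for home in ("home_left", "home_right"):
        print(mode, home, shortest_distance(EDGES, home, "school", mode))
```

Leave the footpath out of the data, or treat everyone as a driver, and the home on the right goes from the closer of the two to the farther.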

Happily OpenStreetMap has all the relevant data (and a few minor corrections the parent, Jasia, pointed out to me) so anybody can plot the route to prove the point.

Incidentally, if you fancy showing your support for this campaign, download this letter to the governors, sign it and send it to the address at the top of the document.


Making open data maps the almost-easy way

One of the annoying things about open data is that you often need ninja skills to do anything with it. OpenStreetMap contains a wealth of geodata, but most tools make you jump through several steps involving the command line and all manner of data wrangling just to produce a custom map.

Maperitive tries to make it much easier to create nice looking maps. It has been in gestation since late 2007, and is now close to being easy to use.

It took me about half an hour of playing around to produce my first nice hiking map of Snowdon, although a problem with NASA’s elevation data led me on a frustrating journey to get Ordnance Survey open data in there to fill the gaps. I also had to work out Maperitive’s settings file for the way features are drawn to make the maps look a little neater and, well, British.


(Click on the images to see them on Flickr, where you can look at full sized versions).

Another hour messing around with the settings file and I had a nice map of an area my new father in law likes to go walking, the Long Mynd in the south of Shropshire. This time I aimed for something familiar to users of the Ordnance Survey walking maps.


The latest beta of Maperitive also allows you to export a 3 dimensional model using elevation data, and a flat image of the map. You can import these into a modelling tool, laying the map image over the 3d model, to produce nice graphics like this one of walking routes up Snowdon:


If the NASA elevation data works for you and you don’t want to change the style of the maps, it’s already a fantastic and fairly usable free-to-download tool. It’s a shame it isn’t free software with the code open sourced.

UPDATE: I completely forgot about this, you can download my Ordnance Survey-inspired stylesheet here.


Open scrutiny in the age of open data

This is the first of perhaps two or three short essays inspired by Emer Coleman’s masters dissertation on open data, written in a personal capacity and not as part of my job. In this post I want to look at what her proposed model of “iterative and adaptive open government” would mean for scrutiny of the Mayor of London. Her dissertation considers the difference between the New Public Management approach, characterised by public managers setting the goals and other public managers auditing their performance, and an emerging “Open Governance” approach using open data.

