Thursday, 16 October 2014

The Urban Fabric of English Cities

[now updated, thanks to @udlondon - scroll to bottom of page]
Inspired by some mapping in the US by Seth Kadish, the availability of new GIS open data, and the fact that I love looking at patterns of urban form, structure and density, I have created a comparative graphic showing the building footprints of nine English cities, with London at the centre (just because it's the biggest). I have done this in a very simple way: all cities are mapped at a scale of 1:125,000 in the full size versions (which are massive), with one small scale bar and a little explanatory text. Here's what it looks like:

The urban fabric of English cities (black/red, medium res)

This graphic does a good job - in my view - of demonstrating the compactness or otherwise of the cities in question. It also illustrates how tightly-bounded some places are and how under-bounded others are. For example, Liverpool is very dense and compact in contrast to Leeds but this really is a boundary effect because the size of the local authorities differs so much. The urban area of 'Liverpool' extends far beyond the boundaries of the local authority area, which is what I show above. I wanted to compare the local authority areas rather than the wider city-region because I wanted to highlight this boundedness issue and compare like with like in terms of formal administrative areas. London is obviously a bit different so I've shown the 33 constituent parts of Greater London.

Take a closer look at the graphic by clicking on the two larger images below - one in white and one in black. In their full size versions, in the zipped folder below, they are both just a bit bigger than A0 paper size, so if you want to take a really close look, download them. I've also uploaded smaller-sized versions in the same folder. I deliberately didn't include more information on the graphic itself, but at the bottom of the post you'll see the population of each city in 2011 (which relates to the individual city images), plus its urban area and metropolitan area population. The population of Greater London in 2011 was 8.2 million, compared to 4.4 million combined for the other eight cities shown. The cities I selected are the English members of the Core Cities group, which now also includes Glasgow and Cardiff.

Click here for a full screen white version

Click here for a full screen black version

Download a zipped folder with black and white versions in different sizes.

Update: the @udlondon people got in touch via Twitter to show their attempt at fitting the core cities inside the London boundary - as below - which inspired me to try the same with the original data. The first image below is the original @udlondon artwork and the next one is my attempt using GIS. Finally, as a reminder that nothing is ever really new, I have added a similar map which we found in the JR James urban image archive, which we launched last year. This version has 13 different cities.

A manual approach to GIS!

My attempt at the same thing, using QGIS - full size

Some of the boundaries were a bit different in those days

[Table: population of each city's local authority, urban area and metropolitan area]

Totals: the population of the 8 city local authority areas is 4.4 million, for their urban areas it is 9.8 million and for their metropolitan areas it is 16.5 million. I may compare metropolitan areas next time, but mapping this is a little more time-consuming.

Saturday, 11 October 2014

Flow mapping with QGIS

[Now updated with sample data file - see Step 1.]
I've written quite a bit about flow mapping with GIS in the past, including on this blog and in a couple of academic papers. Previously I'd used ArcView 3.2, ArcGIS 9 or 10, and MapInfo. MapInfo in particular has been my 'go to' GIS for mapping large flow matrices, thanks to a very short line of MapBasic code explained to me by Ed Ferrari. Others, such as James Cheshire, have used R to great effect, but this post is about flow mapping with QGIS, whose flow mapping capabilities have really impressed me. I've posted many of my QGIS flow maps on Twitter, but here I want to explain a little about the method so others can experiment with their own data. Here's an example of a flow map created in QGIS - though in this case it's not a very satisfying result because of population distribution, county shape and so on*.

US county to county commuting

So, to the method. If you want to create these kinds of maps in QGIS, it's mostly about data preparation. I should also add that I currently use QGIS 2.4, but I believe the method is the same in any version. Here are the ingredients you need.

1. A file with some kind of flow data, such as commuting, migration, flight paths, trade flows or similar. There should be columns with an origin x coordinate, origin y coordinate, destination x coordinate, destination y coordinate, some other number (such as total commuters) and any other attributes your dataset has (such as area codes and names). Here's an example csv file of global airline flows, if you want to experiment - it's the one from the screenshots below. I put it together using data from OpenFlights - by combining the airports.dat and routes.dat files. 

2. Once you have a file with the above ingredients, you need to create a new column containing the geometry in WKT format: the word 'LINESTRING', a space, an opening bracket, the origin coordinates separated by a space, a comma and a space, the destination coordinates separated by a space, and a closing bracket - i.e. LINESTRING (origin_x origin_y, dest_x dest_y), as you can see below. You don't actually need to call the column 'Geom' as I have below, because when you import the file QGIS will ask you which column is the geometry one. You can create the new column in Excel using the 'concatenate' function; if you're not familiar with it, there are loads of explainers online.

This bit probably takes the most time
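If you have a lot of rows, a few lines of scripting can build the WKT column for you instead of Excel. Here's a minimal Python sketch - the column names o_x, o_y, d_x and d_y are made up for illustration, so rename them to match your own file:

```python
import csv
import io

def add_wkt_column(rows):
    """Append a WKT 'Geom' column built from each row's coordinates.

    Column names o_x, o_y, d_x, d_y are assumptions - rename them to
    match your own file.
    """
    for row in rows:
        row["Geom"] = "LINESTRING ({o_x} {o_y}, {d_x} {d_y})".format(**row)
    return rows

# A tiny example in place of a real commuting file:
sample = "o_x,o_y,d_x,d_y,commuters\n-2.24,53.48,-1.47,53.38,120\n"
rows = add_wkt_column(list(csv.DictReader(io.StringIO(sample))))
print(rows[0]["Geom"])  # LINESTRING (-2.24 53.48, -1.47 53.38)
```

Write the result back out with csv.DictWriter and it's ready for QGIS.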

3. Once you have your data in this format, you need to save it as a CSV so it's ready to import into QGIS. From within QGIS, you simply click on the 'Add Delimited Text Layer' button (the one that looks like a comma) and then make sure your settings look like the example below.

Make sure you click the right import button
Import CSV dialogue in QGIS - should be on WKT

4. Once you've done this, simply click OK and wait a few seconds for QGIS to ask which CRS (coordinate reference system) you want to use. Select your preferred option, wait a few more seconds, and QGIS will display the results of the import. You can then right-click on the new layer and save it as a shapefile, or another format of your choice. In the screenshot example above, the file with c60,000 airline flows took only about 10 seconds to appear on my fairly average PC running 64-bit Windows 7. I also tried it with 2.4 million lines and it only took about a minute. In my experience ArcGIS normally doesn't work with that many flows; MapInfo will handle it, but takes longer. QGIS also renders the result more nicely because it handles transparency in a more sophisticated way - with hundreds of thousands of flows you usually have to set the layer transparency to 90% or higher.
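If you'd rather script the import than use the dialogue, the same thing can be driven from the QGIS Python console. Below is a sketch of how the 'delimitedtext' provider URI can be assembled; the URI-building function is plain Python, while the commented-out lines at the end would only run inside QGIS (the file path and field name are assumptions):

```python
from pathlib import Path

def delimited_text_uri(csv_path, wkt_field="Geom", delimiter=","):
    """Build the URI the QGIS 'delimitedtext' provider expects for a WKT CSV."""
    base = Path(csv_path).resolve().as_uri()
    return f"{base}?delimiter={delimiter}&wktField={wkt_field}"

uri = delimited_text_uri("flows.csv")  # 'flows.csv' is a stand-in path
print(uri)
# Inside the QGIS Python console you would then load and register the layer:
# layer = QgsVectorLayer(uri, "flows", "delimitedtext")
# QgsMapLayerRegistry.instance().addMapLayer(layer)  # QGIS 2.x API
```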

The results, once you've done a bit of symbolisation and layer ordering, will look like some of the examples below.

Rail flows

All commuter flows

Bus flows - no labels, obviously

* I'm still trying to make sense of the US county to county flow map. The spatial structure of the counties and the distribution of the population make it more difficult to filter, so the above example is just a very rough (and not very satisfying) example.

Wednesday, 3 September 2014

A national map of cycling to work

I've recently been doing some visualisation work with the newly released Census commuting data from 2011. I've produced maps of all travel to work, and of travel by car, train and bus. I've now done a map of cycling to work (below). This map is particularly interesting for the patterns it reveals, but also for the strange long-distance flows we can see. I'm certainly not saying that anyone actually commutes by bike between Manchester and Bristol, as the map may suggest. Click on the big version and have a look around to see if you can spot anything interesting or particularly unexpected. A version with some place name labels can be found here.
This data comes from Question 41 of the 2011 Census form, which asked people to say how they 'usually' travelled to work in relation to the mode of transport which accounted for the largest part, by distance, of their journey. The results can look quite beautiful on a map, but they can also be confusing. Look closely at the map above and you'll ask yourself why there are so many long distance cyclists in England and Wales. More seriously, you might begin to question the validity of the data, the honesty of respondents or some other aspect of the results. 

One of the strengths of visualising large datasets is that we can interrogate them in this way, often immediately identifying anomalous patterns or results that confound expectations or are just plain wrong. I'm not entirely sure what's going on with the long-distance flows. Perhaps some people take their bike on a train and ticked the 'bike' option, despite the train journey being longer. Perhaps some people live in one part of the country during the week and cycle to work there, but have their usual address - the one registered on the Census form - elsewhere. I'm only speculating, but these are possible explanations.

In the image below, I've filtered the data so that only flows of 2 or more are shown. This significantly reduces the visual clutter, but also draws out stronger long distance connections between places such as Bristol and Manchester, and indeed Manchester and lots of other places. Take a closer look by clicking the link below this map. I've added some place names to this map to help with orientation.

Go to the full size version

I'd be keen to hear different interpretations of the data. You get similar results when you map the 'walk to work' data, so there's definitely something interesting going on with how people have answered the Census question and the data we have to work with. I'm certainly not saying it's 'wrong', more that we need to understand what exactly it tells us. For now, I'll leave it at that.

N.B. Why didn't I include Scotland and Northern Ireland? The data are not out yet. It's not some ploy to exclude anyone and I know the blog title says 'national' so forgive me if that threw you. I intend to expand the analysis in due course.

Tuesday, 26 August 2014

Why you should start using QGIS

I've been a user of GIS since the late 1990s and in that time have mostly used ESRI software, such as ArcView 3.2 and ArcGIS versions 8 to 10. The first piece of GIS software I ever used was MapInfo 5 and I continue to use it now and again (in version 9.5 or above) - mostly for manipulating large datasets with hundreds of thousands or millions of records. I still really like both of these for different reasons so this post is definitely not a proprietary-GIS-bashing piece. It's just an encouragement to current GIS users to take a serious look at QGIS if you haven't already. I've been using it on and off for a couple of years and in that time have seen serious improvements. Most recently, I've done a good bit of mapping with it - as in the example below (commuter flows in Scotland, in case you're asking).

A flow map made in QGIS 2.4

There are many reasons to start using QGIS. The most obvious one might be that it is a cross-platform, free and open source GIS that can do many things as well as or better than paid-for software. Take a look at the QGIS Flickr map showcase for some more examples. Of course, it is possible to make stunning maps with other open source packages such as R, but the learning curve is really steep and many people don't have the time or inclination to get into it.

If I were to pick my four favourite features of QGIS, I'd have to go with the following:

1. The high quality map rendering and symbology options available to you - for example, QGIS handles layer and feature transparency in such a way that you can produce really attractive maps. QGIS includes so many nice-looking, sensible colour schemes by default that it's much easier to produce quality maps. Anita Graser (QGIS author and guru) highlighted the way QGIS integrated ColorBrewer at version 1.4, for example. With version 2.4, you can also automatically invert colour schemes - the lack of which was one of the rare things that frustrated me in previous versions.

Flow map layout created in QGIS 2.4

2. The Processing Toolbox, with which you can access a huge range of spatial analysis and data management tools to perform a massive variety of tasks. See screenshot below for how it looks. If you want to add x,y coordinates to a polygon layer, this can be done really simply here, in addition to so many other geocomputation tasks (e.g. calculating area, line lengths and so on). Beyond the basics there are also so many other more complex tasks you can perform here.

The Processing Toolbox in QGIS 2.4
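As a flavour of what's going on under the hood, the 'add x,y coordinates to a polygon layer' task reduces to a centroid calculation. Here's a plain-Python sketch of the standard area-weighted (shoelace) centroid formula that tools like these implement:

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    points: list of (x, y) vertices in ring order; the first vertex need
    not be repeated at the end.
    """
    a = cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0  # shoelace term for this edge
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed polygon area
    return cx / (6 * a), cy / (6 * a)

# A unit square, centred on (0.5, 0.5):
print(polygon_centroid([(0, 0), (1, 0), (1, 1), (0, 1)]))  # (0.5, 0.5)
```

In practice you'd let the Processing Toolbox do this, of course - this is just to show there's no magic involved.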

3. QGIS Plugins - which really are fantastic. The one I probably use the most is OpenLayers, which allows you to add a large number of different base layers to your QGIS map - from Google streets and Bing Roads to OpenStreetMap and Stamen Toner layers. As I write, there are currently 214 available plugins listed in QGIS 2.4. Another fantastically useful plugin is Table Manager, which allows you to very quickly change field headers in attribute tables.

The Plugins menu in QGIS 2.4

4. Flow mapping in QGIS. This is something I've done a lot of over the years, but recently I've been blown away by the simplicity and elegance of the way QGIS can convert massive CSV files into large flow maps. MapInfo had served me very well in the past - and is still amazing when you use a single line of MapBasic - and recently ArcGIS has improved, but it still has a way to go. All you need to do in QGIS is format a CSV file so that it has one 'Geom' field containing a WKT LINESTRING with the x, y coordinate pairs, formatted as in the image below. Once you import this file using the Add Delimited Text Layer tool, the job is done. The results - following a bit of styling - can be amazing. What I love even more is that I picked up this tip from a StackExchange post by a 73 year old retiree! Isn't the internet amazing?

This CSV file is easily turned into images like the one below

Travel to work flows - car and train

As I've been writing this I've mentally added several more things to the list but I'll stick with the above for my favourite four right now. I do, however, also love the labelling options, the coordinate system selector options, the vector tools and the fact the user community is so helpful. I still find the Print Composer a bit fiddly for creating maps in but this is a relatively minor issue.

I expect I'll always take a portfolio approach to working with GIS software and continue to use QGIS alongside ArcGIS and MapInfo, but I'd be a bit lost without QGIS now. I'm probably quite behind the curve with all this and should have got into QGIS a long time ago, but it's still relatively early days in the mass take-up of the software, though some UK councils are now big users.

So, why should you start using QGIS? Because it's absolutely fantastic, really powerful and pretty straightforward to learn if you already know your way around another GIS. Oh, and it's free - though it costs money to develop so you can always donate here.

Friday, 18 July 2014

Mapping Blight in the Motor City

In my preparations for the launch of our MSc in Applied GIS, I've been putting together lots of case studies of GIS in action. Luckily for me, this has coincided with the launch of the Motor City Mapping project in Detroit; part of a wider attempt by the city to understand and prevent urban blight. One part of this project has produced an amazing survey dataset covering nearly 380,000 land parcels in the city. An overview of this is provided by Motor City Mapping in the following graphic.


This data was generated by survey staff over a short period during winter 2013/14 and is probably the most detailed parcel-level city survey carried out in recent times. For more about the project, take a look at the short video below. One great feature - in addition to all the rest - is that the final dataset contains a link to the photo taken of each land parcel by the survey staff (residents of Detroit surveyed their own neighbourhoods). The entire dataset is pretty big - close to 1GB - but it can be downloaded via this page and used in your GIS. This direct link worked for me.

The image below shows you what it looks like when you map the data using the land use category. 

Link to bigger version

Finally, since they very cleverly included a photo url for each land parcel in a separate column, I decided to extract a small area and put it in a web map using CartoDB so that you can click each land parcel and see what it looks like, in addition to some of the characteristics of the parcel. I extracted the data for Grand Boulevard since it's an important street in Detroit's history, with important locations such as Lee Plaza, Motown Records and Henry Ford Hospital. Click on the image below to go to the full size version. You'll see that I've coloured the map by building condition - mostly good on Grand Boulevard - and when you click on a land parcel you'll see an image of what's on it plus details about condition, occupancy and use. I also included a date of when the survey was carried out.

Full screen version

This is all part of a wider city planning project called 'Time to End Blight', and you can read more about it on their web pages. The report is a great piece of work in a really difficult time in Detroit's history so it's great to see so many people coming together for this. If you have any interest in cities, urban blight, regeneration or revitalisation then I suggest you take a closer look at the report and its recommendations in particular.

Friday, 23 May 2014

The Wonderful World of Open Access

I'm one of the editors of an open access journal, but that's not what this post is about. Instead, it's about the wider world of open access, which I've blogged about before, with some charts and stats. The web is full of opinions on open access, with comments from sceptics, advocates and others somewhere in between. I'm really excited by open access publishing, but of course - like any publishing model - it's not perfect. What I've been doing over the past few years is trying to learn a lot more about the world of open access and really understand what the open access landscape looks like. 

In doing so, I've become pretty familiar with where to find information and for this purpose DOAJ is my first port of call. What you find out very quickly is that there are literally thousands of open access journals - about 10,000 - and that they constitute a very diverse, colourful group. One way to demonstrate this is to look at the metadata on the DOAJ website. Here you can find, amongst other things, a list of URLs. So, I took a screenshot of them all - the results of which you can see here (or by clicking the image below - it takes a while to load).

Why on earth did I do this? Partly as a little spare time project to see how easily it could be done but mostly because I wanted a quick way to see what all the websites looked like - i.e. how many are full-blown fancy websites backed by international publishing houses and how many are more small-scale ventures. It also allows you to more easily identify families of open access journals (scroll down and you'll see quite a bit of this). This doesn't necessarily say anything about the quality of journals (that's for readers to decide) but it does provide a visual overview in a more accessible way. Looking through the full list of 10,000 websites would take a little longer! I used a Firefox extension for this task, and it did take quite a while. The DOAJ spreadsheet I used is from late in 2013 so some more recent journals are not included. To finish with, here are some of my favourites...

'Fast Capitalism' - I love the name and the musical intro:

'Studies in Social Justice' - nice cover shot:

'International Journal of Dental Clinics' - so many languages:

'Reading in a Foreign Language' - I just like this idea:

Not sure what caught my eye about this one, but I like it:

Tuesday, 8 April 2014

Mortgage lending data in Great Britain - a step in the right direction

Since late 2013, data on bank lending at the postcode sector level for Great Britain have been available via the Council of Mortgage Lenders (mortgages) and the British Bankers' Association (personal loans). This followed an announcement in July 2013 that such data would be made available in order to - among other things - "boost competition" and "identify where action is required to help boost access to finance". It was also said at the time that the data would be "a major step forward in transparency on bank lending". My assessment is that this is only partly true. The new data do represent a major step forward, and organisations such as the CML are to be commended for their work on this, but in relation to mortgage lending at least, things are more opaque than transparent, as I attempt to explain below.

Location quotient lending map for Liverpool - HSBC lending

I should begin by saying that I think this newly available data is a fantastic resource and that it does allow us to ask important questions and - to an extent - hold lenders to account. However, since we have no data on local demand - unlike in the United States, where it is collected under the terms of the Home Mortgage Disclosure Act of 1975 - the extent to which we can identify which lenders are rationing finance, excluding areas or 'redlining' is extremely limited. In fact, I would concur with George Benston, who said in 1979 (p. 147) that:

  • “If the focus is on the supply of mortgages, either in terms of numbers or dollars, a demand as well as a supply function must be constructed and specified. When demand is not accounted for, there is no way to determine the reason for any given level of supply.”
So, in the image above, which shows lending location quotients for HSBC in Merseyside, there is no way of knowing whether areas of lower lending receive less finance because fewer people there apply for mortgages or because some other supply-side mechanism is in force (e.g. bank lending policy). All we can really do is compare the lending practices of different institutions and note the differences. The same type of map is shown below for Lloyds Banking Group (the UK's biggest lender). This would appear to suggest some significant differences in where these two banks lend the most. People with a knowledge of Merseyside will recognise that Lloyds has many higher lending location quotients in poorer areas. Is this 'evidence' of financial exclusion, redlining or sub-prime lending? No, it definitely is not. It does show, however, that banks lend differently at the local level. This is not news, but the new data releases allow us to identify very local patterns and ask questions about them.

Location quotient lending map for Liverpool - Lloyds lending
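For anyone wanting to reproduce these maps, the location quotient itself is simple: a lender's share of lending in an area divided by its share of lending nationally. A quick sketch with entirely made-up figures:

```python
def location_quotient(local_lender, local_total, national_lender, national_total):
    """LQ = (lender's share of lending in the area) / (lender's share nationally).

    Values above 1 mean the lender is over-represented in that area.
    """
    return (local_lender / local_total) / (national_lender / national_total)

# Hypothetical numbers: a lender holds 30% of outstanding lending in one
# postcode sector but only 15% of all lending nationally -> LQ of 2.0.
lq = location_quotient(3_000_000, 10_000_000, 15_000_000_000, 100_000_000_000)
print(round(lq, 2))  # 2.0
```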

Interpreting change
Following the first release of data in December 2013 - which included all outstanding mortgage debt up to the end of June 2013 - an updated dataset was released in April 2014, covering the period up to the end of September 2013. Once again, this is a fantastic development in many respects, but it poses new questions of interpretation. We have no idea at all - and please let me know if I'm wrong - how much of the change is down to people paying off mortgages, how much is down to new loans being taken out (none?) and how much is down to data disclosure mechanisms put in place by the banks. The CML's chief economist, Bob Pannell, said this upon the release of the second iteration of the dataset:

  • "Unsurprisingly, with data covering outstanding lending rather than new flows, there are only small changes since the last quarter. It is likely to take some time before any discernable changes or trends emerge from this quarterly data series."

My calculations indicate that most areas experienced only modest change (c. +/- 2%) but, as far as I can tell, it's not possible to say exactly what causes the change in each area. There are some big changes at the local level in relation to individual postcode sectors, but it's only really possible to speculate about the causes. Nonetheless, here are the top five:

  • London NW9 1 - £907,640 (end Q2, 2013) to £1,396,795 (end Q3, 2013) - 53.9% increase
  • Exeter EX5 7 - £16,894,775 to £23,497,142 - 39.1% increase
  • London EC3A 5 - £2,729,724 to £3,677,645 - 34.7% increase
  • Cambridge CB2 9 - £56,992,750 to £75,546,239 - 32.6% increase
  • London EC1V 8 - £23,254,820 to £30,342,713 - 30.5% increase
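The percentage changes above are straightforward to calculate from the two quarterly totals; a quick sketch using the NW9 1 figures as the example:

```python
def pct_change(old, new):
    """Percentage change from one quarter's outstanding total to the next."""
    return (new - old) / old * 100

# London NW9 1: end Q2 2013 vs end Q3 2013 outstanding mortgage lending.
print(round(pct_change(907_640, 1_396_795), 1))  # 53.9
```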

Without some additional contextual information - as is available in the United States under the provisions of the HMDA - we can only really guess at the causes of such change. That's why various organisations have been campaigning for greater financial transparency for some time - most notably Friends Provident in their report from 2012.

The data
I've done a reasonable amount of analysis with the new mortgage lending data, including writing and submitting an academic paper, a series of maps and various other bits and bobs via twitter. My assessment closely mirrors that of Owen Boswarva, who notes the 'open-ish' nature of the data releases. The data do not, as far as I can see, come with any kind of licence (such as the Open Government Licence) but I and many others have just taken for granted that the data are 'open'.

The way the data are released is also interesting. The press releases and aggregate lending figures for mortgages are released via the CML, and cover about 73% of the mortgage market. This is obviously a significant advance on what was previously in the public domain, but in terms of getting your hands on individual bank data in one file, you have to scrape and mash the data together, like I did when creating my mortgage lending map site. Since I've worked with the data quite a bit, I thought I'd give a little overview of how I think individual lenders have done in relation to making the data available:

  • Barclays - if you go to the postcode sector data page on the CML website, there are links to data for all banks. When you click on 'Barclays' it will take you to their 'Citizenship' pages (as of 8 April 2014). From there you can link to the new Q3 2013 data release within a news article. It's not the easiest of journeys and could be made much more obvious. The file itself is, rather strangely, called 'Satellite.xlsx'. I think they could do better.
  • Clydesdale & Yorkshire Banks - these institutions are part of the same banking group and so report together as one. The data are pretty easy to find.
  • HSBC - for me, this is the most troublesome data release since I can only find it in PDF. It's not a massive task to convert it into a usable format, but it seems really odd in this day and age that a major financial institution would choose to release 10,000 rows of data in PDFs. If anyone has spotted another format please get in touch. The HSBC approach is at odds with the spirit of the exercise, surely.
  • Lloyds - the UK's biggest lender (following a series of acquisitions) also have a nice data page, which is easy to navigate. They provide some useful information, such as the fact that most buy-to-let mortgages are included in the data, and a direct link to the Excel spreadsheet.
  • Nationwide - my analysis suggests that Nationwide (the only building society to release data) truly live up to their name in terms of the geography of their lending. Their data page is basic but it does the job.
  • Santander - this institution also provides clear and simple access to their lending data. This is now different from the link provided on the CML website. 
  • RBS - as far as I can tell, RBS are the only bank to have produced their own maps of lending patterns across Great Britain and their data page is really quite good. The new data are currently provided via a news page link.

Despite the fact that some of the data are a little hard to find, it's mostly quite a good situation - apart from the HSBC PDFs of course. It would be much better if all the data were put together in a single spreadsheet by the CML but perhaps this is something the individual lenders are not too keen on (!) so it's up to people like me to stitch it all together. In which case, it would be great if HSBC started publishing in a more convenient format.
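For what it's worth, the stitching-together step can be sketched in a few lines of Python - the column names and lender labels here are invented for illustration, since the real spreadsheets vary in layout and usually need cleaning first:

```python
import csv
import io

def stitch_lenders(named_files):
    """Combine per-lender postcode-sector tables into one list of rows,
    tagging each row with the lender it came from.

    named_files: dict mapping lender name -> open file-like object with
    'sector' and 'lending' columns (hypothetical names).
    """
    combined = []
    for lender, f in named_files.items():
        for row in csv.DictReader(f):
            row["lender"] = lender
            combined.append(row)
    return combined

# Tiny in-memory example in place of the real downloads:
files = {
    "Lender A": io.StringIO("sector,lending\nL1 1,5000000\n"),
    "Lender B": io.StringIO("sector,lending\nL1 1,2500000\n"),
}
rows = stitch_lenders(files)
print(len(rows), rows[0]["lender"])  # 2 Lender A
```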

What's the point of all this?
I can sum up quite simply. If this data were released as part of a drive to increase transparency in the banking sector, then I think a few more things need to happen next:

  1. We need some indication of demand - e.g. number of applications, number of refusals, and so on.
  2. We can't do research into subjects like 'redlining' because we don't have the above information. We can make comparisons between banks in poorer areas but that's about it for now. If we really want to look at transparency, we need more information.
  3. We need more of a breakdown of the data in each new release to say how much is new lending and how much of the change is down to other factors - particularly so at the postcode sector level.
  4. Some additional clarity on the 'open' nature of the data would be very welcome.
  5. We need more banks to follow the example of the first seven and make their data available. With more than 100 lenders in the market it's probably not possible to get all to comply, but more work here would be useful.

All of the above represents really positive progress, but I think more is needed. I do realise of course that the CML are "considering additional features and functionality for future reporting waves", so I look forward to seeing what happens next.