On Tuesday 21 June 2016 the Operational Research Society’s Annual Analytics Summit takes place with morning presentations from Marks and Spencer, Movement Strategies, the Department for Education, and the Trussell Trust. The plenary talk is by Megan Lucero, Data Journalism Editor at The Times & Sunday Times. In the afternoon we are holding workshops to go deeper into the technology solutions reported in the morning sessions. We will be presenting the geospatial app built for the Trussell Trust.
We worked with the Trussell Trust to build a prototype tool for visualising and analysing food bank usage in the UK. A short version of the report was written for The Conversation; the full version is available on the Trust’s Web site.
We’ve been working with the Trussell Trust over the last 15 months to develop analytics for food banks. The Trust released its annual report on Friday 15 April and received a lot of press coverage, which also picked up on the role of the University of Hull project funded by NEMODE.
The story of Jesus feeding 5,000 people with just five loaves of bread and two fish takes some believing, especially when read by a modern audience that is used to a society of waste and want.
But while “waste not, want not” may well be the choice phrase for millions of parents at mealtimes, food banks across the UK are performing their own small miracles every day in making sure there is enough food to go round.
UK food bank use is still at record levels. Over a million food packages – each with three days’ worth of food – were given to people in crisis by the Trussell Trust in the last year alone.
The figures from the charity, which operates a network of more than 420 food banks, underline the scale of the challenge for those tackling poverty and point to a problem with hunger that’s not going away.
For the first time, academics from the University of Hull, working with data scientists from Coppelia and consultants from AAM Associates, developed a prototype tool to map food bank data against geographical demand. As well as showing actual food bank usage the prototype uses 2011 Census data to predict possible areas of food bank need.
Researchers took various census variables, for example levels of deprivation and unemployment at ward level, and found that many of these were highly correlated with food bank usage per head of population. Food bank use was shown to be higher in wards where more people are unable to work due to long-term sickness or disability.
Higher food bank use was also shown to be associated with deprived wards or areas with higher levels of people in skilled manual work.
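To make the ward-level analysis concrete, here is a minimal sketch of how one census variable might be correlated with food bank usage per head. It is not the project's code: the ward records, field names and figures are made up for illustration, and Pearson's coefficient is assumed as the correlation measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up ward records for illustration only (not Trussell Trust or Census data).
wards = [
    {"ward": "W1", "population": 8000, "referrals": 40, "pct_long_term_sick": 3.1},
    {"ward": "W2", "population": 12000, "referrals": 96, "pct_long_term_sick": 4.0},
    {"ward": "W3", "population": 9500, "referrals": 114, "pct_long_term_sick": 5.2},
    {"ward": "W4", "population": 11000, "referrals": 187, "pct_long_term_sick": 6.8},
    {"ward": "W5", "population": 7000, "referrals": 154, "pct_long_term_sick": 8.5},
]

# Usage per head puts wards of different sizes on a comparable footing.
usage_per_head = [w["referrals"] / w["population"] for w in wards]
sickness = [w["pct_long_term_sick"] for w in wards]

r = pearson(sickness, usage_per_head)
print(f"correlation with long-term sickness: {r:.2f}")
```

A high coefficient on real data would still only show association, not cause, which is why local investigation is needed before drawing conclusions.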
Looking at anonymous postcode data for people referred to Trussell Trust food banks against census data has also enabled the trust to drill down to a micro-level and look at trends specific to a local area, as well as looking at the national picture.
Taking London as an example, the mapping shows high levels of food bank referrals due to benefit delays in certain wards in north and south-east London.
While the data alone can paint a vivid picture of food bank use in these areas, it requires more investigation to really get to the heart of the issue, and to find out if crisis provision is failing in these places, or if it is simply the case that local authorities are working more closely with food banks.
While finding out where food banks are used, and by whom, is interesting in itself, the tool now offers an opportunity that goes beyond the nitty-gritty of data metrics: it can be used on a wider scale to make a real difference to people’s lives.
Adding in the referral agencies that provide access to food banks will help to provide another dimension for analysis. The Trussell Trust runs the majority of food banks, but future initiatives to incorporate data from non-trust food banks will also allow us to provide full coverage of UK emergency food provision for the first time.
And in time we will also add more open and external data: for example, to see if and how weather data impacts on food bank use.
On top of this, sharing data with other charities involved in poverty alleviation – for example homelessness charities – will provide a richer picture of food poverty and deprivation across the country.
With a joined-up approach to data, and insights from other charities and food aid providers, this data could be used by local projects to work out where to target their efforts and which additional services would best help tackle the biggest local issues. It is hoped this will lead to better-informed interventions and greater influence on policy.
Data is a big opportunity for charities and third sector organisations, and one whose impact we are only beginning to understand. We hope this early analytics tool will provide a basis for food banks and other front-line agencies to create powerful real-world data applications.
The use of p-values has long been subject to criticism, not least because p-values can be ‘hacked’. P-hacking is when a researcher tries lots of analyses and data treatments until they get the result they want (i.e., p<.05). Examples include fishing for p-values in a dataset, excluding outliers, transforming the data, and analysing many measures but reporting only those with p<.05; all represent potential selection decisions by the researcher. As Coase said, if you torture the data long enough it will confess. On 7 March 2016 the American Statistical Association (ASA) published a statement on the use of p-values (see Nature and the Oxford Internet Institute blog for background and commentary on the p-value problem). At least one journal has introduced a ban on the reporting of p-values.
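A rough sketch of why trying many analyses is so dangerous: if a researcher runs k tests on data where there is no real effect, and the tests are assumed to be independent, the chance that at least one comes out at p<.05 is 1 - 0.95^k. With 20 tests that is roughly 64%. The code below computes this and checks it by simulation, using the fact that p-values are uniformly distributed when the null hypothesis is true. The function names and the independence assumption are mine, for illustration only.

```python
import random


def chance_of_false_positive(k, alpha=0.05):
    """Probability that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulate(k, alpha=0.05, n_studies=20000, seed=1):
    """Estimate the same probability by simulating studies with no true effect.

    Under the null hypothesis a p-value is uniform on [0, 1], so each test is
    modelled as a single uniform random draw (an idealisation).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # The "p-hacker" reports the study if the smallest p-value is significant.
        if min(rng.random() for _ in range(k)) < alpha:
            hits += 1
    return hits / n_studies


for k in (1, 5, 20):
    print(k, round(chance_of_false_positive(k), 3), round(simulate(k), 3))
```

In practice the analyses a researcher tries are rarely independent, so the real inflation will differ, but the direction of the effect is the same.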
The outcome of over-reliance on, misinterpretation of, and misuse of p-values is that much reported social science research is not reproducible – estimates range anywhere between 50% and 80% (see also John Ioannidis’ pioneering article, “Why Most Published Research Findings are False”).
For further detail on reproducibility of research see the Ioannidis video.
To find out how to p-hack (and how to prevent it) see the video by Neuroskeptic.
See the “dance of the p-values” video to see how unreliable p-values can be.
On 23 February 2016 Giles Hindle (University of Hull) and I gave a presentation to the York and Humber OR Group (YHORG) at the Circle in Sheffield on our thoughts about OR practitioners and data scientists: where they overlap and where they might differ. In coming to some tentative conclusions we reflected on our experiences on an analytics project for food banks that we have recently completed. It is only fair to say that we grossly over-simplified and set up stereotypes (caricatures?) of OR practitioners and data scientists in arriving at our points for discussion:
Technical skills: OR practitioners need IT skills, top of the list being Python and R. They also need to know where to start and where to stop with IT work (e.g., when should it be handed over to an IT professional who knows how to deploy an operational system?).
Heterogeneous data: OR practitioners need to work with different data types, e.g., text mining and video analysis, rather than only seeing the world in terms of quantitative data.
Out of the comfort zone: OR practitioners need to get out of their comfort zone and engage with the business and business users in an agile way, rather than being in a specialist departmental niche with a traditional engineering mindset in which they provide solutions to the business (e.g., using small scale data in simulations).
Embedded analytics: Analytics will become embedded in successful organisations with a greater emphasis on prescriptive (action-based) applications where action is then subject to an evidence base (e.g., randomised controlled trials).
Transformation: Analytics is about organisational transformation – culture change is needed throughout the organisation if it is to become data-driven and embrace evidence-based management.
The last two points relate to the business analytics methodology that we are developing, BAM, which uses value mapping and soft systems to develop business questions that can be tackled through analytics. The full presentation is here:
As part of a research agenda in scholarly impact I wrote an R program to analyse Google Scholar data for sets of scholars, e.g., all the researchers in a Department or a University. The R package scholar does the work of scraping the Google data for individual scholars; scholarNET takes individual Google Scholar IDs, retrieves the data for each scholar, and produces a ranking based on h-index.
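For readers unfamiliar with the h-index, the sketch below shows how the ranking step could work once citation counts have been retrieved. It is written in Python rather than R, it is not the scholarNET code, and it does not contact Google Scholar; the scholar names and citation counts are invented. A scholar has an h-index of h when h of their papers have at least h citations each.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Invented citation counts per paper, standing in for retrieved profile data.
scholars = {
    "Scholar A": [30, 12, 9, 4, 1],
    "Scholar B": [5, 5, 5, 5, 5, 5],
    "Scholar C": [100, 2],
}

# Rank scholars from highest to lowest h-index.
ranking = sorted(
    ((name, h_index(cites)) for name, cites in scholars.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, h in ranking:
    print(name, h)
```

Note how Scholar C, with one very highly cited paper, still ranks last: the h-index rewards a body of consistently cited work rather than a single hit.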
The program also retrieves the publications for each scholar and then matches them on title to detect coauthorship relationships, which are then represented in a social network graph.
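The title-matching idea can be sketched as follows. This is an illustrative Python version under my own assumptions, not the R code itself: titles are normalised by lower-casing and stripping punctuation (the exact normalisation rule is a guess), and any title that appears in two scholars' publication lists counts as one coauthored paper. The scholars and titles are invented.

```python
import re
from collections import Counter, defaultdict
from itertools import combinations


def normalise(title):
    """Lower-case a title and collapse punctuation/whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def coauthor_counts(publications):
    """Count shared titles for every pair of scholars.

    publications maps a scholar's name to a list of their paper titles.
    Returns a Counter keyed by alphabetically ordered (name, name) pairs.
    """
    authors_by_title = defaultdict(set)
    for scholar, titles in publications.items():
        for title in titles:
            key = normalise(title)
            if key:  # skip empty titles, which would match each other
                authors_by_title[key].add(scholar)
    counts = Counter()
    for authors in authors_by_title.values():
        for pair in combinations(sorted(authors), 2):
            counts[pair] += 1
    return counts


# Invented publication lists for illustration.
publications = {
    "Scholar A": ["Supply Chain Risk: A Review", "Analytics for Charities"],
    "Scholar B": ["Supply chain risk - a review.", "Soft Systems in Practice"],
    "Scholar C": ["Analytics for charities", "Soft systems in practice", "A Solo Paper"],
}
edges = coauthor_counts(publications)
for pair, n in sorted(edges.items()):
    print(pair, n)
```

Matching on title alone is a heuristic: two different papers with the same title would be counted as a collaboration, and a paper whose title is recorded differently on two profiles would be missed.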
The social network shows that Shaw and Grant have coauthored two papers and Creazza and Colicchia have coauthored 14 papers. There is not a lot of evidence of coauthorship activity in this particular network. The network is also written out in GML (graph modelling language) format for visualisation in Gephi.
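As a rough illustration of the GML step, the sketch below writes a small undirected graph using the two coauthorship counts reported above. It is a Python sketch rather than the R code, and the choice of `weight` as the edge attribute name is an assumption; check how your version of Gephi maps edge attributes on import.

```python
def to_gml(edges):
    """Serialise {(name_a, name_b): shared_papers} as an undirected GML graph."""
    names = sorted({name for pair in edges for name in pair})
    ids = {name: i for i, name in enumerate(names)}
    lines = ["graph [", "  directed 0"]
    for name in names:
        # Labels are written as-is; names containing double quotes are not handled.
        lines.append(f'  node [ id {ids[name]} label "{name}" ]')
    for (a, b), weight in sorted(edges.items()):
        lines.append(f"  edge [ source {ids[a]} target {ids[b]} weight {weight} ]")
    lines.append("]")
    return "\n".join(lines) + "\n"


# Coauthorship counts taken from the network described in the text.
edges = {("Grant", "Shaw"): 2, ("Colicchia", "Creazza"): 14}
gml_text = to_gml(edges)
print(gml_text)
```

Saving `gml_text` to a file with a `.gml` extension should give something Gephi can open, with the edge weight available for sizing the links.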
In setting up scholarNET the hard work is collating the Google Scholar IDs for individual researchers, which is a manual task involving cutting the Google Scholar ID from the URL string for each academic’s profile. Of course, the approach also relies on academics having established a Google Scholar profile. However, more and more academics are setting themselves up on Google Scholar as they seek to demonstrate impact (e.g., at job interviews, for promotion cases) and to understand how their work is being used by others.
Download the R code for scholarNET and the list of Google Scholar IDs to try out the analysis. This was the first program I wrote in R to do something useful and the code is not pretty! If there is interest in it I will rewrite it.