A literature review is a central part of any research project, allowing the existing research to be mapped and new research questions to be asked. However, due to the limitations of human data processing, the literature review can suffer from an inability to handle large volumes of research articles. The computational literature review (CLR) automates the analysis of research articles with analyses of:
impact (citation analysis, e.g., H-index)
structure (co-authorship social network analysis)
content (topic modeling of article abstracts)
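To illustrate the impact analysis, the h-index can be computed directly from an author's citation counts: the largest h such that the author has h papers with at least h citations each. The CLR itself is written in R; this is a minimal Python sketch of the calculation for illustration only.

```python
def h_index(citations):
    """Return the h-index: the largest h such that at least
    h papers have h or more citations each."""
    h = 0
    # Sort citation counts from highest to lowest, then walk down
    # until a paper's citations fall below its rank.
    for rank, cites in enumerate(sorted(citations, reverse=True), start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with >= 4 citations each
```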
The CLR software can be used to support three use cases: (1) analysis of the literature for a research area, (2) analysis and ranking of journals, and (3) analysis and ranking of individual scholars and research teams.
The CLR is explained and illustrated using a set of 3,386 articles related to the technology acceptance model (TAM) in:
The CLR is open source, developed in the statistical programming language R, and made freely available for researchers to use and develop further. The code for the CLR is available from GitHub.
All organizations have limited resources and have to be mindful of where their time, money, people, and attention are focused. Without a clear business analytics strategy – which must be aligned with the organization’s business strategy and business model – it is unlikely that the potential of business analytics will be achieved (indeed, much time and money are likely to be wasted).
We have been working on a way of developing a business analytics strategy that is aligned with the business strategy and business model, i.e., the creation of a portfolio of analytics developments that will add value to an organization or focal business unit.
AnVIM uses a combination of the business model canvas (BMC), developed by Osterwalder and Pigneur, and systems thinking to provide context and depth to the business model. This analysis is followed by a mapping from the business model to analytics opportunities:
AnVIM has been presented and workshopped at Operational Research conferences over the last two years and we are looking for collaborators who would like to experiment with the approach and work with us to develop it further.
On Tuesday 21 June 2016 the Operational Research Society’s Annual Analytics Summit takes place with morning presentations from Marks and Spencer, Movement Strategies, the Department for Education, and the Trussell Trust. The plenary talk is by Megan Lucero, Data Journalism Editor at The Times & Sunday Times. In the afternoon we are holding workshops to go deeper into the technology solutions reported in the morning sessions. We will be presenting the geospatial app built for the Trussell Trust.
We worked with the Trussell Trust to build a prototype tool for visualising and analysing food bank usage in the UK. A short report was produced for The Conversation from the full version of the report available on the Trust’s Web site.
We’ve been working with the Trussell Trust over the last 15 months to develop analytics for food banks. The Trust released its annual report on Friday, 15th April and received a lot of press coverage, which also picked up on the role of the University of Hull project funded by NEMODE.
The story of Jesus feeding 5,000 people with just five loaves of bread and two fish takes some believing, especially when read by a modern audience that is used to a society of waste and want.
But while “waste not, want not” may well be the choice phrase for millions of parents at mealtimes, food banks across the UK are performing their own small miracles every day in making sure there is enough food to go round.
UK food bank use is still at record levels. Over a million food packages – each with three days’ worth of food – were given to people in crisis by the Trussell Trust in the last year alone.
The figures from the charity, which operates a network of more than 420 food banks, underline the scale of the challenge for those tackling poverty and point to a problem with hunger that’s not going away.
For the first time, academics from the University of Hull, working with data scientists from Coppelia and consultants from AAM Associates, developed a prototype tool to map food bank data against geographical demand. As well as showing actual food bank usage the prototype uses 2011 Census data to predict possible areas of food bank need.
Researchers took various census variables, such as ward-level measures of deprivation and unemployment, and found that many of these were highly correlated with food bank usage per head of population. Food bank use was shown to be higher in wards where more people are unable to work due to long-term sickness or disability.
Higher food bank use was also shown to be associated with deprived wards and with areas with higher levels of people in skilled manual work.
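The kind of ward-level association described above amounts to a simple correlation between a census variable and usage per head. A minimal sketch of that calculation follows; the ward figures are hypothetical, invented for illustration, and are not the project's actual data.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical ward data: % of residents unable to work due to
# long-term sickness/disability vs. food bank referrals per 1,000
# residents (illustrative numbers only).
sickness = [2.1, 3.4, 4.0, 5.2, 6.8, 7.5]
referrals = [1.2, 2.0, 2.6, 3.1, 4.4, 4.9]
print(round(pearson(sickness, referrals), 3))
```

A coefficient close to 1 would indicate the kind of strong positive ward-level association reported above; in practice the project worked with many census variables across all wards, not six invented points.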
Looking at anonymous postcode data for people referred to Trussell Trust food banks against census data has also enabled the Trust to drill down to a micro-level and look at trends specific to a local area, as well as at the national picture.
Taking London as an example, the mapping shows high levels of food bank referrals due to benefit delays in certain wards in north and south-east London.
While the data alone can paint a vivid picture of food bank use in these areas, it requires more investigation to really get to the heart of the issue, and to find out if crisis provision is failing in these places, or if it is simply the case that local authorities are working more closely with food banks.
While finding out where food banks are used and by whom is all interesting stuff, beyond the nitty-gritty of data metrics, there is now the opportunity for this tool to be used on a wider scale and really help to make a difference to people’s lives.
Adding in the referral agencies that provide access to food banks will help to provide another dimension for analysis. The Trussell Trust runs the majority of food banks, but future initiatives to incorporate data from non-trust food banks will also allow us to provide full coverage of UK emergency food provision for the first time.
And in time we will also add more open and external data: for example, to see if and how weather data impacts on food bank use.
On top of this, sharing data with other charities involved in poverty alleviation – for example homelessness charities – will provide a richer picture of food poverty and deprivation across the country.
With a joined-up approach to data, and insights from other charities and food aid providers, this data could be used by local projects to work out where to target their efforts and which additional services would best help tackle the biggest local issues. And it is hoped this will lead to better informed interventions and greater influence on policy.
Data is a big opportunity for charities and third sector organisations and one that may have an impact that we are only beginning to understand. We hope this early analytics tool will provide a basis for food banks and other front line agencies to create powerful real world data applications.
The use of p-values has long been subject to criticism, not least because they can be ‘hacked’. P-hacking is when a researcher tries lots of analyses and data treatments until they get the result they want (i.e., p < .05): fishing for p-values in a dataset, excluding outliers, transforming the data, or analysing many measures but only reporting those with p < .05. All of these are potential selection decisions by the researcher. As Coase said, if you torture the data long enough, it will confess. On 7 March 2016 the American Statistical Association (ASA) published a statement on the use of p-values (see Nature and the Oxford Internet Institute blog for background and commentary on the p-value problem). At least one journal has introduced a ban on the reporting of p-values.
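A small simulation shows why analysing many measures and reporting only the ‘significant’ ones is so dangerous. The sketch below (illustrative, not drawn from any of the studies cited) compares two groups with no real difference on 20 independent measures and counts how often at least one measure reaches p < .05 by chance alone.

```python
import math
import random

random.seed(1)

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means, via a normal
    approximation (reasonable here: both samples are N(0, 1))."""
    n = len(a)
    z = (sum(a) / n - sum(b) / n) / math.sqrt(2 / n)
    # Standard normal CDF via erf.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def fishing_trip(n_measures=20, n=30):
    """True if any of n_measures null comparisons gives p < .05."""
    for _ in range(n_measures):
        a = [random.gauss(0, 1) for _ in range(n)]
        b = [random.gauss(0, 1) for _ in range(n)]
        if two_sample_p(a, b) < 0.05:
            return True
    return False

trials = 1000
rate = sum(fishing_trip() for _ in range(trials)) / trials
print(f"At least one 'significant' result in {rate:.0%} of studies")
# Theory: 1 - 0.95**20, about 0.64, when the measures are independent.
```

In other words, a researcher who fishes across 20 unrelated measures will find something to report roughly two times in three, even when nothing is there.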
The outcome of over-reliance, misinterpretation, and misuse of p-values is that much reported social science research is not reproducible (anywhere between 50% and 80%; also see John Ioannidis’ pioneering article, “Why Most Published Research Findings Are False”).
For further detail on reproducibility of research see the Ioannidis video.
To find out how to p-hack (and how to prevent it) see the video by Neuroskeptic.
See the “dance of the p-values” video to see how unreliable p-values can be.
On 23 February 2016 Giles Hindle (University of Hull) and I gave a presentation to the York and Humber OR Group (YHORG) at the Circle in Sheffield on our thoughts about OR practitioners and data scientists: where they overlap and where they might differ. In coming to some tentative conclusions we reflected on our experiences on an analytics project for food banks that we have recently completed. It is only fair to say that we grossly over-simplified, setting up stereotypes (caricatures?) of OR practitioners and data scientists in arriving at our points for discussion:
Technical skills: OR practitioners need IT skills, top of the list being Python and R. They also need to know where to start and where to stop with IT work (e.g., when should it be handed over to an IT professional who knows how to deploy an operational system?).
Heterogeneous data: OR practitioners need to work with different data types, e.g., text mining and video analysis, rather than only seeing the world in terms of quantitative data.
Out of the comfort zone: OR practitioners need to get out of their comfort zone and engage with the business and business users in an agile way, rather than sitting in a specialist departmental niche with a traditional engineering mindset in which they provide solutions to the business (e.g., using small-scale data in simulations).
Embedded analytics: Analytics will become embedded in successful organizations with a greater emphasis on prescriptive (action-based) applications where action is then subject to an evidence base (e.g., randomized controlled trials).
Transformation: Analytics is about organizational transformation – culture change is needed throughout the organisation if it is to become data-driven and embrace evidence-based management.
The last two points relate to the business analytics methodology that we are developing, BAM, which uses value mapping and soft systems to develop business questions that can be tackled through analytics. The full presentation is here: