Data determinism – can data really speak for itself?

It’s tempting to think that data can somehow speak for itself, that we can abandon theory because (as Anderson said in 2008):

There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

This ‘data determinism’ is a worrying aspect of big data and big technology. It makes no more sense to think of data speaking for itself than to think of people speaking meaningfully without data and technologies to convey their message. More than 60 years ago, in 1950, Trist noted that:

Organizations are comprised of nested socio-technical systems consisting of human and technological elements intertwined in a complex web of mutual causality

Although this is a big step forward in recognising the entanglement of the technical and the social, it presents them as a duality. Karen Barad’s (2007) agential realist philosophy moves away from seeing people and technology as discrete entities, presenting them instead as part of composite and shifting assemblages without inherently determinate boundaries. Reality is not given, but enacted and re-enacted through practice – the social and material are intra-acting rather than inter-acting. Making a ‘cut’ stabilises the social/people and the material/technology, giving them the appearance of being separate and durable entities.

However, this separability can be misleading. It is the mangling of people, organisations, technology, data, data models, data visualizations, etc. that constitutes reality through practice. As Bruno Latour (1993) has argued:

We do not need to attach our explanations to the two pure forms known as the Object or Subject/Society, because these are, on the contrary, partial and purified results of the central practice that is our sole concern.  The explanation we seek will indeed obtain Nature and Society, but only as a final outcome, not as a beginning.  Nature does revolve, but not around the Subject/Society.  It revolves around the collective that produces things and people.  The Subject does revolve, but not around Nature.  It revolves around the collective out of which people and things are generated.  At last the Middle Kingdom is represented.  Natures and societies are its satellites.  [Latour 1993, p. 79]

However, while technology does not determine how the world is, it would be foolish to deny the potential for data products and their embedded algorithms to destabilise the collective and for new (intended and unintended) forms of ‘congealed agency’ to emerge.

