An extended version of this blog-post (available through a SharedIt link) has now appeared as a commentary in the journal Philosophy & Technology, 30:541–545.
A few months ago I was invited to enter into a dialogue with the Brussels-based artist Rossella Biscotti on the occasion of the exhibition of her 2015 installation “Other” at the Contour Biennale in Mechelen (Belgium). In this work, she uses the Jacquard weaving technique to visualise Belgian census data, and explores the data-subjects that are categorised as ‘other’ within this data-set. The resulting installation consists of four large carpets that display data on various minority-groups and rest-categories of the Brussels population.
My role in this collaboration was to contribute formal or mathematical insights into how rest-categories like “other” or “none of the above” could be understood. In this short piece I reflect on this collaboration. I first discuss how artistic research like Biscotti’s can contribute to the critical evaluation of contemporary data-practices, and then elaborate on how logico-mathematical insights can become part of such inquiries.
Biscotti’s installation 10×10, the precursor of Other, was originally designed and produced to be exhibited at Haus Esters in Krefeld (Germany), a modernist villa designed by Ludwig Mies van der Rohe for the silk-manufacturer Josef Esters, and it integrates multiple modernist ideals in a single work of art. In this work, Biscotti explores how institutional structures are imposed on individuals by combining features of automated mechanical manufacturing with conceptual and technological aspects of how large data-sets are collected and processed. She focuses in particular on how categories are used to create an overarching structure, and relates this to the punch-cards used to implement such structures within industrial processes (the Jacquard loom, an early 19th-century device that automated the weaving of complex patterns) and administrative processes (the Hollerith tabulator), processes that became increasingly automated in the early 20th century. By exhibiting the resulting work in Haus Esters, Biscotti makes it part of a more encompassing modernist narrative exemplified by Mies van der Rohe’s architecture.
For the exhibition of Other at Contour, Biscotti’s team wished to extend their research with a more rigorous expression of the logic behind the use of rest-categories like “other”, and to capture this logic in a single formal expression. This led us on a brief excursion into the meaning of the labels we use to designate such rest-categories, and suggested that we should interpret these labels as semantically empty labels that share certain features with the sentinel values data-scientists now use to signal that certain data are missing. By asking how such empty labels interact with the generation (and ensuing reification) of categories, for instance when data are aggregated, we arrived at an interpretation of rest-categories as sets of data-subjects whose members should not, given the lack of positive evidence of their similarity, be subsumed under a single kind or profile.
The recurrent attention to minority-groups and rest-categories, as well as the value accorded to automated and/or mechanical processes, naturally places Biscotti’s work within the scope of current debates on large-scale data-processing and the data-revolution. Mechanical objectivity and data-shadows are, for instance, current topics of interest within the scholarly community that tries to understand and assess the ethical, legal, and social implications of the data-revolution. And yet, the artistic research that led to 10×10 and Other deliberately investigates only historical computational technologies like the punch-card, and remains focused on the functioning of categories in census-data, which is itself a very traditional form of large-scale data-collection and organisation. It is, therefore, not immediately clear how Biscotti’s work, which (unlike the work shown at last year’s Big Bang Data exhibition at Somerset House in London) remains silent on matters like Big Data and machine learning, can contribute to our understanding of what we now see as the most salient features of the data-revolution.
What I’d like to suggest is that taking early manifestations of automated data-processing as an object of study can open up new ways of questioning data-centric forms of knowledge-production, for instance by making us aware of practices that have become too familiar to attract critical assessment. Punch-cards and tabulators are, in that sense, similar to pre-cinematic devices: just as we study the latter to understand the technology (cinema) that enables contemporary artistic and documentary practices, we can study the former to understand the technologies that enable novel epistemic practices. Such a study (re)directs our attention to the technological changes that make epistemic practices possible, or even just conceivable. It becomes a genealogical project, and by exposing us again to the historical building blocks of our current practices, it has the potential to identify the technical and conceptual changes we need to be aware of to understand those practices.
At the same time, Biscotti’s work helps us avoid certain distractions. It can encourage us to look beneath the prevailing rhetoric on Big Data, the mythical abilities that are often attributed to machine learning and artificial intelligence, and perhaps even the most rudimentary principles of inferential statistics. It invites us to take a few steps back, into what we think of as known territory, and draws our attention to the practices and assumptions that make data-driven inquiry and decision-making possible: recording, organising, and processing data through counting, categorisation, and automated calculation. Because artistic research like Biscotti’s is situated at the periphery of current scholarly debates, it isn’t bound by a given research-agenda; it can reinvestigate familiar and often widely trusted practices, and ask elementary questions anew from a contemporary (artistic) perspective. These include questions that may have lost their immediate relevance because they no longer drive our scientific or scholarly curiosity, but also questions that are not aligned with the dominant themes of ongoing debates concerning privacy, fairness, transparency, or responsibility.
What, then, can a logico-mathematical approach contribute to artistic research concerned with the classification practices on which census-data are built? At least two things. First, it can help make the idea of a “logic of classification” more explicit, and develop its implications in purely abstract terms (for instance, without associating rest-categories with forms of exclusion). In doing so, it can redirect our critical attention from how classification-structures affect specific data-subjects in concrete settings to how classification-rules create abstract entities, such as profiles or categories, that become the primary entities we reason about or use to make decisions. Second, it can be used to explore alternative approaches: in this specific case, different ways of conceptualising how rest-categories should be used in the construction of categories of (in certain respects) similar data-subjects.
In relation to the focus on “other”, I specifically contrasted two ways of conceptualising membership of a rest-category. The basic principle that underlies both is that data-subjects belong to the same category (or fall under the same profile) if and only if, for all the relevant data-dimensions, they have been attributed the same values (or values within the same range). In this way, we can construct categories of, say, all children aged between 6 and 10 who have at least one sibling. Similarly, we seem to be able to construct the category of all the data-subjects categorised as “other” in the data-dimension “household position”, even if the actual household-roles of the presumed members of this category have nothing in common apart from the fact that they do not conform to any of the roles privileged by the designers of the census, and that their place or role within a household probably isn’t very common (as in the case of “other nationalities”). Treating such rest-categories as bona fide categories makes sense if we think of labels like “other” or “none of the above” as semantically significant labels: labels that provide sufficient grounds for identifying the so-labelled data-subjects with one another, because they indicate that we have sufficient evidence to treat them as alike.
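The basic grouping principle can be sketched in a few lines of Python. This is a minimal illustration under assumptions of my own, not the procedure used for the installation: the records, their field layout, the label “head”, and the helper `group_by` are all hypothetical. Here “other” is treated as an ordinary label that equals itself, so every subject so labelled ends up in one shared category.

```python
from collections import defaultdict

# Hypothetical records: (subject id, age, number of siblings, household position).
# These values are invented for illustration; they are not census data.
records = [
    ("a", 7, 1, "child"),
    ("b", 9, 2, "child"),
    ("c", 34, 0, "other"),
    ("d", 71, 3, "other"),
    ("e", 40, 1, "head"),
]

def group_by(rows, key):
    """Put subjects in the same category iff key() gives them equal values."""
    categories = defaultdict(list)
    for row in rows:
        categories[key(row)].append(row[0])  # store the subject id
    return dict(categories)

# Children aged 6 to 10 with at least one sibling: values within a range
# are mapped to the same key, so they count as "the same value".
children_with_siblings = group_by(records, lambda r: (6 <= r[1] <= 10, r[2] >= 1))

# Grouping on household position alone: "other" behaves like any bona fide label.
by_position = group_by(records, lambda r: r[3])
print(by_position)  # {'child': ['a', 'b'], 'other': ['c', 'd'], 'head': ['e']}
```

Subjects “c” and “d” share nothing recorded here except the label, yet they land in the same category, which is exactly the situation the paragraph above calls into question.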
If, however, we think of such labels as a mere indication of the absence of any information, this strategy quickly becomes questionable. In the context of the household positions mentioned above, being categorised as “other” results from negative answers to four consecutive yes/no-questions, and need not carry any positive information. At least for some rest-categories, it thus makes more sense to treat the labels we use to denote these categories along the same lines as the sentinel-values that are customarily used to signal missing data, like 9999 or the NaN (not a number) values defined by the IEEE 754 floating-point standard. Let us stipulate that two data-subjects fall under the same profile or belong to the same category if and only if, first, there is no information that indicates that they are different in a relevant respect (a potentially vacuous sense of being similar), and, second, there is positive evidence that they are similar in the relevant respects. By the second requirement, the label “other” then no longer leads to the creation of a category of others. Because NaN, unlike an ordinary sentinel-value such as 9999, has the property of not being equal to itself (the expression NaN==NaN will typically evaluate to False), this requirement for positive information can be simulated by using NaN to denote rest-categories.
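The two-part stipulation can be made concrete in a short Python sketch. It assumes, hypothetically, that profiles are tuples of numeric codes and that a rest-category is encoded either as an ordinary number (10) or as NaN; the helper names are illustrative only. The point is that element-wise `==` turns NaN into a label that never supplies positive evidence of similarity, while the vacuous first requirement is still met. One Python-specific caveat: comparing whole tuples or lists would not work here, because built-in containers treat identical objects as equal before calling `==`.

```python
import math

def is_missing(v):
    """True for NaN, the sentinel used here to mean 'no information'."""
    return isinstance(v, float) and math.isnan(v)

def no_evidence_of_difference(x, y):
    """First requirement (possibly vacuous): no dimension where both values
    are known and differ."""
    return not any(
        not is_missing(a) and not is_missing(b) and a != b
        for a, b in zip(x, y)
    )

def positive_evidence_of_similarity(x, y):
    """Second requirement: every value compares equal; NaN == NaN is False."""
    # Compare element by element: tuple == tuple would treat the identical
    # NaN object as equal to itself and defeat the test.
    return all(a == b for a, b in zip(x, y))

def same_category(x, y):
    return no_evidence_of_difference(x, y) and positive_evidence_of_similarity(x, y)

# Hypothetical profiles: (age band, household position), with "other" as 10 or NaN.
print(same_category((1, 10), (1, 10)))                          # True
print(same_category((1, math.nan), (1, math.nan)))              # False
print(no_evidence_of_difference((1, math.nan), (1, math.nan)))  # True
```

The last line shows the vacuous sense of similarity: nothing indicates that the two “others” differ, yet without positive evidence they are not placed in the same category.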
Using a randomly generated data-set similar to the data used by Biscotti, we can easily visualise the difference between the two approaches. In the figures below, the sizes of categories are displayed as bubbles; the figure on the left uses the number 10 to denote “other” (and 10==10 evaluates to True), whereas the figure on the right uses NaN.
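The comparison behind the figures can be approximated in Python as follows. The generator is a hypothetical stand-in for the data-set (three age bands, household positions coded 1 to 4, and roughly one subject in five labelled “other”); these proportions are assumptions, and the sketch computes category sizes rather than drawing bubble charts. Because the same seed produces the same random draws in both runs, the two encodings of “other” are applied to the same data-subjects.

```python
import math
import random

def generate(n, other_code, seed=0):
    """Hypothetical random data-set of (age band, household position) pairs.

    Positions 1-4 stand in for the roles listed by the census; about one
    subject in five receives `other_code` instead (an assumed proportion).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        age_band = rng.randrange(3)
        position = other_code if rng.random() < 0.2 else rng.randint(1, 4)
        data.append((age_band, position))
    return data

def same_profile(x, y):
    # Element-wise == so that a NaN never matches anything, not even itself.
    return all(a == b for a, b in zip(x, y))

def category_sizes(data):
    """Greedy grouping: each subject joins the first category whose
    representative has the same profile, or else founds a new category."""
    representatives, sizes = [], []
    for subject in data:
        for i, rep in enumerate(representatives):
            if same_profile(rep, subject):
                sizes[i] += 1
                break
        else:
            representatives.append(subject)
            sizes.append(1)
    return sizes

sizes_with_10 = category_sizes(generate(200, other_code=10))
sizes_with_nan = category_sizes(generate(200, other_code=math.nan))
print(len(sizes_with_10), len(sizes_with_nan))
```

With 10, all “other”-subjects within an age band join one shared category; with NaN, each of them becomes a category of size one, which corresponds to the periphery of small bubbles in the right-hand figure.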
Here, we immediately see that whenever the label used to denote rest-categories indicates the absence of information, the presence of data-subjects labelled as “other” leads to the creation of a large periphery of distinct (because unknown) data-subjects. This gives us at least a minimal sense of how the meaning we assign to the labels we use to denote categories interacts with the process of creating categories or profiles, and with the subsequent use of these categories as an ontology for describing a given subject-matter.