As a data scientist, I am all too familiar with the second part of the phrase, ‘all for one’. Some might say that it explains what machine learning does today. You aggregate lots and lots of data and then apply machine learning or coded rules to predict the outcome of one specific case – translating into ‘all for one’.
An example is the collection of data on COVID-19, which – in combination with an algorithm – can be used to predict how the virus could affect Switzerland. By adding more data and doing further modelling, we can then estimate how specific governmental measures may affect the predicted development of COVID-19. That is the ‘all for one’ that I know as a data scientist: from data consisting of thousands of cases, we try to estimate how different measures will affect one outcome.
Of course, the true meaning of the first part of the phrase, ‘one for all’, is solidarity: a word which, due to COVID-19, we now find at every grocery store, apartment complex and governmental institution throughout Switzerland, not to mention across the internet. Data science is changing the world as we self-isolate to flatten a statistical curve. One for all: each one of us staying at home to flatten our aggregated infection curve.
Data Science and Decision Making
The measures, invoked by the Swiss government and upheld by each and every one of us, might save thousands of lives. At the same time, following these measures is also creating negative consequences: companies are pushed towards bankruptcy, people are losing their livelihoods, domestic violence cases are increasing and the physical and mental health of many is deteriorating. The Swiss government has made several very impactful and difficult decisions based on domain knowledge, data and predictive modelling.
While not all decisions made by the public sector are equally existential in their impact, collaboration with and reliance on domain experts are crucial to avoid inefficient, stand-alone, top-down decisions. In these collaborative efforts, a key success factor is the communication between domain experts and decision-makers. The language of data science, i.e. facts and figures, will be used frequently, so contextualising these facts and figures is extremely important for making robust and meaningful decisions. I therefore hypothesise that the output of my project at the ETH Library Lab will help ease the burden on decision-makers in the public sector. Let me explain why:
The strength of data science lies in the ‘all for one’ part of the phrase, but the same does not apply to the ‘one for all’. In fact, the latter is a concept rather foreign to data science. We data scientists usually start with a large amount of data and aim to extract just one high-performing predictive model, or a few explanatory numbers and figures. Data scientists’ expertise is reduction and abstraction; we do not generate much from little. On its own this is not necessarily an issue, but in the context of communication and knowledge transfer we cannot deny the inefficiency created by communication that is based solely on isolated and selective summary points of the data.
In my project at the ETH Library Lab, I am developing a tool kit that is built on the premise of using stories as communication tools. By telling the story of the data-in-question instead of providing isolated and selective summary points of the data, we can translate the aggregated data into the decision-making context. Thus, the tool kit supports the decision-making process by making it more effective.
Complexity, Context and Communication
Telling the story of the data-in-question instead of providing isolated and selective summary points of the data is like preparing the kitchen for service ‘mise en place’ as opposed to simply stocking up the kitchen with groceries. In this metaphor the decision-making process is the kitchen service and data science the preparation of it. In business terms this implies that we have to pivot our product, data science, to fit the needs of the end-users, in this case the decision-makers. In my project at the ETH Library Lab, I do so by complementing data science’s strength in reduction and abstraction with the use of stories as a communication tool.
In the subsequent paragraphs, I will explain why I suggest using stories as communication tools for data science, as opposed to simply tweaking the communication tools data science has used so far, such as statistical figures, graphs and tables. Let me give a fictional example of a scenario where it is easy to pinpoint how the requested information will be used, a situation where a data scientist might try to deliver the information ‘mise en place’ for the decision-maker using traditional data science communication tools.
If we are asked to estimate how long a given supply of surgical N95 respirators will last, a conventional machine-learning-oriented data scientist might look at the prediction of their winner-takes-all model and answer: “We are 91% certain that our supply of surgical N95 respirators is sufficient for the next 2 weeks.” A more statistics-oriented data scientist might look at a few facts and figures and answer: “Given the current status, we predict that the current inventory will last 14 days, with a 95% confidence interval of 10 to 18 days.” In both cases we have answered the decision-maker’s question, no doubt. But that is only half of the story.
The decision-maker must now interpret that information in the context of their responsibilities. For example, they need to translate it into a decision on whether to place an order with their surgical N95 respirator supplier. In this context, answering with “We probably have enough for the next two weeks. But this estimate might be slightly too high, as it is based on past numbers, from when fewer masks were needed. Additionally, due to the current situation, the average delivery time has increased to seven days over the past weeks and is still rising.” would most likely improve the efficiency of the knowledge transfer. In simple scenarios like this one, where it is easy to pinpoint how the requested information will be used and the potential damage is low, traditional data science communication tools might therefore suffice. Yet these types of scenarios are fairly rare in data science communication.
How can the efficiency of the knowledge transfer be improved in vaguer contexts, or where the potential damage is higher? Suppose a data scientist is asked to provide a decision-maker with the average mortality rate for COVID-19 patients in a discussion on what nationwide measures should be taken. The data scientist can answer by saying “2.5%”. Again, we have answered the decision-maker’s question, no doubt. But we have left the contextual interpretation of the information completely up to the decision-maker. Most data scientists would try to improve their answer by adding further information, e.g. “2.5% is the average mortality rate, but with the right medical treatment it is reduced to 0.7%. It does, however, go up to 23% for COVID-19-based hospitalisations.” [2, 3]. This example raises the question: does stating more statistics improve the efficiency of the knowledge transfer to the decision-maker? I do not think so.
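To make the contrast concrete, here is a minimal Python sketch of how the bare interval from the fictional respirator example could be translated into the context of the ordering decision. It is an illustration only, not part of the tool kit described in this article: the names `SupplyEstimate` and `contextualise` are hypothetical, and the numbers are the fictional figures used above.

```python
from dataclasses import dataclass


@dataclass
class SupplyEstimate:
    """Days the current stock is predicted to last (hypothetical structure)."""
    low_days: float    # lower bound of the interval
    point_days: float  # point estimate
    high_days: float   # upper bound of the interval


def contextualise(estimate: SupplyEstimate, lead_time_days: float,
                  lead_time_rising: bool) -> str:
    """Phrase a supply estimate in terms of the ordering decision."""
    # Slack between the pessimistic end of the interval and the delivery time.
    worst_case_slack = estimate.low_days - lead_time_days
    message = (
        f"Stock will probably last about {estimate.point_days:.0f} days "
        f"(between {estimate.low_days:.0f} and {estimate.high_days:.0f}). "
        f"A new delivery currently takes {lead_time_days:.0f} days. "
    )
    if worst_case_slack > 0:
        message += (
            f"That leaves {worst_case_slack:.0f} days of slack in the worst case."
        )
    else:
        message += "In the worst case, an order placed today may arrive too late."
    if lead_time_rising:
        message += " Delivery times are still rising, so the margin is shrinking."
    return message


# Fictional figures from the example in the text.
estimate = SupplyEstimate(low_days=10, point_days=14, high_days=18)
print(contextualise(estimate, lead_time_days=7, lead_time_rising=True))
```

The arithmetic is trivial. What changes is that the statement is phrased in terms of the order the decision-maker has to place, rather than in terms of the model’s output.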
Data scientists who do their job well acquire not only information, but knowledge. I argue that some of the issues with data science in our society today can be attributed to the fact that those communicating rely heavily on providing information instead of transferring knowledge. My interpretation of the problem stands in contrast to the common view that treats data science, and especially machine learning algorithms, as black boxes. In my opinion, the actual complexity of data-based technology is seldom the problem. But the perceived complexity caused by inefficient communication just might be. This perceived complexity often leads to people either denying the actual complexity or feeling overwhelmed by it. I therefore hypothesise that the selective nature of data-science-based communication is the issue and not the complexity of data science itself.
Introducing stories as supportive communication tools would therefore allow us to embrace the holistic complexity while putting data science applications into understandable terms. It would fully translate data science’s forte of ‘all for one’ into the decision-maker’s context of ‘one for all’: finding one decision that is beneficial for all. Communicating with stories does so by providing the decision-maker with an intuition for the topic, which fosters action-oriented engagement with the subject matter and thus supports the decision-maker in their role. This will ultimately lead to improved explainability of data-based technology and therefore create more trust in data-science-based communication.
 “Unus pro omnibus, omnes pro uno.” Wikipedia: The Free Encyclopedia. Wikimedia Foundation, Inc. 22 July 2004. Web. 10 Aug. 2004. https://en.wikipedia.org/wiki/Unus_pro_omnibus,_omnes_pro_uno as of 4 May 2020.
 Fischer, S. and Petersen, T., 2018. Was Deutschland über Algorithmen weiß und denkt. Bertelsmann Stiftung (ed.), 1st edition.
 Guo, T., Fan, Y., Chen, M., Wu, X., Zhang, L., He, T., Wang, H., Wan, J., Wang, X. and Lu, Z., 2020. Cardiovascular implications of fatal outcomes of patients with coronavirus disease 2019 (COVID-19). JAMA cardiology.
 Ji, Y., Ma, Z., Peppelenbosch, M.P. and Pan, Q., 2020. Potential association between COVID-19 mortality and health-care resource availability. The Lancet Global Health, 8(4), p.e480.
 Krafft, T. and Zweig, K., 2019. Transparenz und Nachvollziehbarkeit algorithmenbasierter Entscheidungsprozesse. Verbraucherzentrale Bundesverband e.V.