From the humble magnifying glass to the state-of-the-art electron microscope, the approach to magnification has been one of exploration and innovation.
As physicists and chemists strove to deepen their understanding of the physical world, their efforts benefited the early users of magnification instruments, namely biologists. As the field moved from applying the macroscopic laws of optics to exploiting the behavior of molecules and atoms, the journey was marked by both profound conceptual shifts and interdisciplinary exchange.
There is a necessary bootstrapping between the theoretical advancement of any field of study and the development of the tools that enable this advancement. Each takes temporary precedence over the other from time to time. This has been the case with the older natural sciences of biology, chemistry and physics.
A similar phenomenon likely exists within the younger computational sciences, too. However, their unique methodological requirements may necessitate new conceptualizations of quality and durability.
Datasets form one of the key ingredients of statistical data-driven research. Much like the materials of experimental work, such as in a chemistry lab, data have to be carefully sourced. The purity of these materials depends not only on the diligence of the experimenter but also on the tools employed to gather these materials.
And yet impurity has sometimes led to discovery. Penicillin was discovered in a petri dish that had been left unwashed for days, and a process for developing photographic film is said to have emerged from under-developed plates that were thrown into a dark cabinet for later disposal.
Computational researchers can encounter instances of what may be called “measurement mismatch error”. For example, the data used to train a network may be obtained from a measuring device whose properties differ from those of the device that produces the test data. This type of error is sometimes referred to as “Measurement Bias”.
However, such biases have motivated the development of datasheets that describe the conditions under which a particular dataset may be used for best results. This is a key step towards the standardization of datasets.
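The measurement mismatch described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two "devices", their gain, offset and noise values, and the helper names `simulate_device` and `standardized_shift` are hypothetical and chosen for illustration. It measures the same underlying quantities with two devices and reports how far the test readings drift from the training readings, in units of the training standard deviation.

```python
import random
import statistics


def simulate_device(true_values, gain, offset, noise_sd, rng):
    """Return readings of the same quantities from a hypothetical device."""
    return [gain * v + offset + rng.gauss(0.0, noise_sd) for v in true_values]


def standardized_shift(train, test):
    """Difference in means, in units of the training standard deviation."""
    return (statistics.mean(test) - statistics.mean(train)) / statistics.stdev(train)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
true_values = [rng.uniform(0.0, 10.0) for _ in range(1000)]

# Training data and one test set come from the same (assumed) device.
train_readings = simulate_device(true_values, gain=1.0, offset=0.0, noise_sd=0.1, rng=rng)
test_same_device = simulate_device(true_values, gain=1.0, offset=0.0, noise_sd=0.1, rng=rng)

# A second test set comes from a device with an assumed constant offset.
test_other_device = simulate_device(true_values, gain=1.0, offset=2.0, noise_sd=0.1, rng=rng)

shift_same = standardized_shift(train_readings, test_same_device)
shift_other = standardized_shift(train_readings, test_other_device)

print(f"same device:  {shift_same:+.2f}")
print(f"other device: {shift_other:+.2f}")
```

A shift near zero suggests the two sets were measured under comparable conditions, while a large shift is the kind of mismatch that a datasheet entry on the acquisition device could warn about. Real devices may differ in ways that a single mean cannot capture, so a check like this is a starting point rather than a test of compatibility.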
Similarly, the stage before training involves labelling data entries, a task that demands a high level of consistency. Sometimes, however, the labeller assigns different labels to two similar items, an instance of “Recall Bias”.
While error mitigation is desirable, a closer study of this peculiar feature of the labelling process can also reveal hitherto unknown aspects of the dataset itself, enriching our understanding of the field, not unlike the famous “failures” of history that paved the way for scientific breakthroughs.
A new science characterized by unique methodological requirements will have its correspondingly unique patterns of error, evolution and stabilization. The sophistication of the older sciences can support the development of their younger siblings in some ways, but there will be new avenues of growth that the older sciences have never experienced.
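One simple way to surface the labelling inconsistencies discussed above is to group items that are treated as equivalent and list those that carry more than one label. The sketch below is illustrative only: the function name `find_label_conflicts`, the sample records, and the assumption that "similar" means "identical after trimming whitespace and lowercasing" are all hypothetical simplifications.

```python
from collections import defaultdict


def find_label_conflicts(records, key=lambda text: text.strip().lower()):
    """Map each normalized item to its labels, keeping only items with 2+ labels."""
    labels_by_item = defaultdict(set)
    for text, label in records:
        labels_by_item[key(text)].add(label)
    return {
        item: sorted(labels)
        for item, labels in labels_by_item.items()
        if len(labels) > 1
    }


# Hypothetical labelled entries: (item description, assigned label).
records = [
    ("Granite texture", "rough"),
    ("granite texture ", "smooth"),
    ("Silk texture", "smooth"),
    ("silk texture", "smooth"),
]

conflicts = find_label_conflicts(records)
print(conflicts)
```

The conflicting items returned here are candidates for correction, but they are also the cases worth studying: a pair of labels that keeps recurring for the same kind of item may point to a genuine ambiguity in the data rather than to labeller error.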
The new data sciences have much to offer and therefore would benefit from the same expansive exploration opportunities that were afforded to the older sciences.
Dr. Monika Krishan is a Senior Fellow at the Centre for Public Policy Research (CPPR).
Views expressed by the author are personal and need not reflect or represent the views of the Centre for Public Policy Research.
Dr. Monika Krishan's academic background includes a Master’s in Electrical Engineering from the Indian Institute of Science, Bangalore, India, and a Ph.D. in Cognitive Psychology from Rutgers University, New Jersey, USA. Her research interests include image processing, psychovisual perception of textures, perception of animacy, goal-based inference, perception of uncertainty and invariance detection in visual and non-visual domains. Areas of study also include the impact of artificial intelligence devices on human cognition from the developmental stages of the human brain, through adulthood, all the way through the aging process, and the resulting impact on the socio-cognitive health of society. She has worked on several projects on the cognitive aspects of the use and misuse of technology in social and antisocial contexts at SERC, IISc, as well as the development of interactive graphics for Magnetic Resonance Imaging systems at Siemens. She is a member of Ohio University’s Consortium for the Advancement of Cognitive Science. She has offered services at economically challenged schools and hospitals for a number of years and continues to be an active community volunteer in the field of education and mental health.