Both business and science undertake research to survive and thrive. It is invariably a complex process that requires not only knowledge but also experience. Ultimately, however, data is always key, and it will be my focus here. This post will attempt to define data, its purpose, whether or not it can be classified using specific criteria, and whether data comes solely from research.
First, what is ‘data’? There are many definitions available and, as usual, it is difficult to select one that is perfect, comprehensive, and precise. In a broad sense, data is everything that is or can be processed to obtain information. It can be man-made content but also readings of machinery indicators or sensors.
Above all, data should not be confused with information. Data is usually unordered and unprocessed, and most often refers to the past. Data may become a medium for information after it is processed, analysed, and structured. Then it becomes the building blocks of specific messages, such as a recommendation to increase or decrease prices based on the current market circumstances. Hence, data per se is of no value to a researcher, academic, or manager; it is only once it has been processed and analysed that it provides useful information.
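To make the distinction concrete, here is a minimal Python sketch of how raw, unordered records might be turned into a message such as a price recommendation. The sales records, the function names, and the 10% threshold are all hypothetical and chosen only for illustration; this is not a real pricing method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw data: unordered (week, units_sold) records.
raw_sales = [(3, 80), (1, 120), (2, 100), (1, 110), (3, 70), (2, 95)]


def weekly_averages(records):
    """Structure the raw records: average units sold per week, in week order."""
    by_week = defaultdict(list)
    for week, units in records:
        by_week[week].append(units)
    return {week: mean(units) for week, units in sorted(by_week.items())}


def recommend_price_change(averages, threshold=0.10):
    """Turn the processed data into a message (assumed rule, for illustration)."""
    weeks = sorted(averages)
    first, last = averages[weeks[0]], averages[weeks[-1]]
    change = (last - first) / first  # relative change in average sales
    if change <= -threshold:
        return "consider decreasing the price"
    if change >= threshold:
        return "consider increasing the price"
    return "keep the price unchanged"


averages = weekly_averages(raw_sales)
recommendation = recommend_price_change(averages)
print(averages)
print(recommendation)
```

The list of tuples on its own says little. Only after grouping, averaging, and comparing does it yield something a manager could act on.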
Without going into the depths of knowledge management and the theory of information, the preparation of data and the extraction of information form the foundation of the knowledge structure hierarchy, which is often presented as the DIKW pyramid (Data, Information, Knowledge, and Wisdom).
Analysed data yields information that can be used in effective, informed decision making. Data collection and analysis is the key element of research that makes it possible to answer research questions, thus driving science. In business, it is mostly about making decisions to grow the business, win new customers, improve product quality, or offer new products and services with the ultimate goal of maximising profit. Data on the social, economic, or cultural situation allows public administrations or NGOs to diagnose problems and implement measures to improve the life of the general public.
Data can be secondary or primary. Secondary data is available to the researcher without his or her intervention. It is most often produced by other researchers or results from the collection and documentation of public life. Secondary data can be further divided by form (raw data, processed data), source (private, public data), objectivity (objective, subjective), etc. Secondary data is often the first step towards an insight into the scale of the phenomenon of interest.
The other data category is primary data, which is produced by the researcher using various research techniques such as surveys, experiments, in-depth interviews, or focus groups. The process of primary data collection, that is, the research process, will be discussed in future posts.
Various repositories, libraries, databases, and the internet offer a large quantity of research-compatible data. The digital revolution and the widespread use of the internet have resulted in the generation of vast quantities of data. You generate data by commenting on a post, sharing your thoughts on social media, scoring a seller after a purchase, and even by merely visiting a website. You generate some of it knowingly, but the rest is collected by various algorithms and loggers on websites or in mobile applications. Remember that secondary data is available not only as a ‘ready-made’ .sav or .xls file but also (mostly) as online data that needs to be downloaded by the researcher (for example from websites) and saved in an analysis-compatible file format.
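As a rough illustration of that last step, the sketch below takes the HTML of a web page containing a table and converts it into CSV text that most analysis tools can open. It uses only the Python standard library. The page content is a made-up sample, the choice of CSV as the target format is an assumption, and the `TableParser` class and `html_table_to_csv` function are hypothetical helpers written for this example. Real pages are usually messier and may need a dedicated scraping library.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableParser()
    parser.feed(html)
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(parser.rows)
    return buffer.getvalue()


# Made-up page content; in practice it would first be downloaded,
# for example with urllib.request.urlopen(url).read().decode("utf-8").
sample_html = """
<table>
  <tr><th>Year</th><th>Population</th></tr>
  <tr><td>2020</td><td>1000</td></tr>
  <tr><td>2021</td><td>1010</td></tr>
</table>
"""

csv_text = html_table_to_csv(sample_html)
print(csv_text)
```

Writing `csv_text` to a file with a `.csv` extension gives a dataset that can be opened in a spreadsheet or imported into a statistics package.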
Secondary data may include:
• Public statistics (published by such institutions as the Central Statistical Office, Eurostat, OECD, etc.);
• Internal data of companies and institutions (sales volume, website visitors, number of requests made, etc.);
• Videos, pictures, audio files (digital and analogue);
• Blog posts, comments, even whole websites or portals;
• Books, newspapers, magazines, etc.;
• Social media content;
• Research results (reports, documents, datasets), including polls, research experiments, focus group interviews, etc.
Data can also be a product of human activity, such as a film, book, or research report, that carries a specific message. Such materials convey specific content to the recipient, but for a researcher, they are data for analysis. An example can be movies with a specific theme from a certain period (such as superhero movies, the transformation of gangster films in the 1990s, or Reagan-era patriotic cinema), which can be analysed in terms of their content and structure.
Let us consider the advantages and disadvantages of working with secondary data. As I mentioned before, access to the internet has made data acquisition relatively easy without visits to libraries or archives (although these are still often necessary in the case of historical data). Desk research is often the first step towards determining the scale of the phenomenon of interest in social or marketing research. Below is a list of the main issues that need to be taken into consideration when selecting secondary data for analysis, starting with the disadvantages:
• Reliability of the data: one of the biggest disadvantages of secondary data. It is often difficult to ascertain that the data is truthful and has not been tampered with or altered. Data on governmental websites can generally be considered reliable;
• Copyright and restricted use of secondary data;
• Fragmentation: secondary data does not always cover a continuous time interval of interest to the researcher, or may be missing some factual aspects. Secondary data is often scattered across sources, which makes it difficult to compare, and it may not be labelled adequately (for example, by thematic field);
• Data validity: data may be outdated (often in the case of social research data published as datasets after some time);
• Technical challenges of data acquisition: not every researcher has the skills necessary for efficient acquisition of data from websites and data processing. Secondary data often requires long processing before relevant information can be extracted.
Secondary data also has advantages:
• Cost: it is usually cheaper to acquire secondary data than primary data;
• Data availability: easy access to the internet has made a lot of data available online;
• No influence of the researcher on the data acquisition process, which avoids so-called researcher bias;
• Analyses can be conducted for complete datasets, if available (for example, all issues of a magazine, often in a digital format).
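The fragmentation issue listed above can be checked mechanically before any analysis begins. The sketch below is a minimal example assuming yearly data gathered from several sources. The source names, the years, and the `missing_periods` function are hypothetical.

```python
def missing_periods(observed_years, start, end):
    """Return the years in [start, end] for which no observation exists."""
    observed = set(observed_years)
    return [year for year in range(start, end + 1) if year not in observed]


# Hypothetical years covered by three separate secondary sources.
source_years = {
    "source_a": [2010, 2011, 2012],
    "source_b": [2012, 2013, 2016],
    "source_c": [2018, 2019],
}

# Pool the years from all sources, then look for gaps in the period of interest.
all_years = [year for years in source_years.values() for year in years]
gaps = missing_periods(all_years, 2010, 2019)
print(gaps)  # [2014, 2015, 2017]
```

Even when the sources are combined, three years remain uncovered. The researcher then has to decide whether to look for further sources, narrow the period, or accept the gaps.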
To sum up, secondary data often helps the researcher gain, at a relatively low cost, an initial insight into the subject matter and discover the approach of other researchers and the state of the phenomenon in the past. The internet has made it easier to access various types of data, so the researcher does not need to burn the midnight oil over archives or traditional sources, as these are often available in digital form. When using secondary data, you need to pay attention to its reliability and validity. The researcher often needs to assess whether the available research material can be used for their purpose before any analysis takes place, as the data may be copyrighted.
In conclusion, you need to remember that we all generate data. The internet and technological advances have made it easier to find, acquire, and process data for all manner of research purposes. This vast amount of data has given rise to the notion of Big Data, which refers to large, variable, and diverse datasets that can be used to obtain useful information. It would not be possible to process large datasets without dedicated software (such as PS CLEMENTINE PRO), which streamlines analyses of large volumes of data.
As I mentioned above, data can be divided into secondary data, which existed before and the form of which cannot be changed by the analyst, and primary data which is acquired as part of the research process. I will look deeper into the data acquisition process in a future article.
If you are interested in definitions, check out the paper by Mariusz Grabowski and Agnieszka Zając, Dane, informacja, wiedza – próba definicji [Data, information, knowledge: an attempted definition], in Polish.
Sułek A., Ogród metodologii socjologicznej [A Garden of Sociological Methodology]. Warsaw, 2002.
If you are interested in data classification, check out Analiza danych zastanych [Desk Research], edited by Marta Makowska.