Data Types and Sources

Data collection is the process of gathering and measuring information on variables of interest, in an established systematic fashion that enables one to answer stated research questions, test hypotheses, and evaluate outcomes. The data collection component of research is common to all fields of study including physical and social sciences, humanities, business, etc. While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. The goal for all data collection is to capture quality evidence that then translates to rich data analysis and allows the building of a convincing and credible answer to questions that have been posed.

Regardless of the field of study or preference for defining data (quantitative, qualitative), accurate data collection is essential to maintaining the integrity of research. Both the selection of appropriate data collection instruments (existing, modified, or newly developed) and clearly delineated instructions for their correct use reduce the likelihood of errors occurring.

A formal data collection process is necessary because it ensures that the data gathered are both defined and accurate, and that subsequent decisions based on arguments embodied in the findings are valid. The process provides both a baseline from which to measure and, in certain cases, a target for what to improve.

Types of Data

At the highest level, two kinds of data exist: quantitative and qualitative.

  • Quantitative data deals with numbers and things you can measure objectively: dimensions such as height, width, and length; temperature and humidity; prices; area and volume.
  • Qualitative data deals with characteristics and descriptors that can’t be easily measured, but can be observed subjectively—such as smells, tastes, textures, attractiveness, and color.

Broadly speaking, when you measure something and give it a number value, you create quantitative data. When you classify or judge something, you create qualitative data. So far, so good. But this is just the highest level of data: there are also different types of quantitative and qualitative data.

Continuous Data and Discrete Data – Quantitative data, also referred to as numeric data, comes in two types: continuous and discrete. As a general rule, counts are discrete and measurements are continuous.

  • Discrete data is a count that can’t be made more precise. Typically it involves integers. For instance, the number of children (or adults, or pets) in your family is discrete data, because you are counting whole, indivisible entities: you can’t have 2.5 kids, or 1.3 pets.
  • Continuous data, on the other hand, could be divided and reduced to finer and finer levels. For example, you can measure the height of your kids at progressively more precise scales—meters, centimeters, millimeters, and beyond—so height is continuous data.

If I tally the number of individual watches in a box, that count is discrete data. If I use a scale to measure the weight of each watch, or the weight of the entire box, that’s continuous data.

Continuous data can be used in many different kinds of hypothesis tests. For example, to assess the accuracy of the weight printed on the watch box, we could measure 30 boxes and perform a 1-sample t-test. Some analyses use continuous and discrete quantitative data at the same time. For instance, we could perform a regression analysis to see if the weight of watch boxes (continuous data) is correlated with the number of watches inside (discrete data).
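The two analyses mentioned above can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the box weights, watch counts, and the 550 g nominal weight are made-up values, and the helper names `one_sample_t` and `least_squares` are hypothetical. It computes only the t statistic and the fitted line; obtaining a p-value would additionally need a t distribution table or a statistics package.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


def least_squares(x, y):
    """Return (slope, intercept) of the ordinary least-squares line y = slope * x + intercept."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical measurements, invented for illustration only.
box_weights_g = [502.1, 548.9, 601.3, 499.8, 598.2, 551.0]  # continuous data
watch_counts = [10, 11, 12, 10, 12, 11]                      # discrete data
nominal_weight_g = 550.0                                     # assumed printed weight

t_stat, dof = one_sample_t(box_weights_g, nominal_weight_g)
slope, intercept = least_squares(watch_counts, box_weights_g)

print(f"t = {t_stat:.3f} with {dof} degrees of freedom")
print(f"weight = {slope:.2f} * count + {intercept:.2f}")
```

A real study would use the full sample of 30 boxes described in the text; the six values here only keep the sketch short.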

Binary Data, Nominal Data, and Ordinal Data – When you classify or categorize something, you create qualitative data, also called attribute data. There are three main kinds of qualitative data: binary, nominal, and ordinal.

Binary data place things in one of two mutually exclusive categories: right/wrong, true/false, or accept/reject. Occasionally, I’ll get a box of watches that contains a couple of defective pieces. If I went through the box and classified each piece as “Good” or “Bad,” that would be binary data. I could use this kind of data to develop a statistical model to predict how frequently I can expect to get a bad watch.

When collecting unordered or nominal data, we assign individual items to named categories that do not have an implicit or natural value or rank. If I went through a box of watches and recorded the color of each in my worksheet, that would be nominal data. This kind of data can be used in many different ways. For instance, I could use chi-square analysis to see if there are statistically significant differences in the counts of each color in a box.
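As a rough sketch of the chi-square idea, the snippet below computes a goodness-of-fit statistic for color counts against the hypothesis that every color is equally frequent. The counts are invented for illustration, the function name `chi_square_gof` is hypothetical, and the equal-frequency hypothesis is an assumption chosen for simplicity.

```python
def chi_square_gof(observed):
    """Chi-square statistic and degrees of freedom, assuming all categories are equally likely."""
    total = sum(observed.values())
    expected = total / len(observed)  # same expected count for every category
    stat = sum((count - expected) ** 2 / expected for count in observed.values())
    return stat, len(observed) - 1


# Hypothetical color tally for one box of watches (nominal data).
color_counts = {"black": 12, "silver": 9, "gold": 4, "blue": 7}

stat, dof = chi_square_gof(color_counts)
print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
```

The statistic is then compared with a chi-square critical value for the same degrees of freedom (7.815 for 3 degrees of freedom at the 5% level); a larger statistic suggests the colors are not equally represented.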

We also can have ordered or ordinal data, in which items are assigned to categories that do have some kind of implicit or natural order, such as “Short, Medium, or Tall.”  Another example is a survey question that asks us to rate an item on a 1 to 10 scale, with 10 being the best. This implies that 10 is better than 9, which is better than 8, and so on.
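One way to work with ordinal data without treating the labels as true numbers is to record only their order. The sketch below uses the "Short, Medium, Tall" categories from the text with made-up observations; the names `HEIGHT_LEVELS` and `RANK` are hypothetical. It relies only on operations that ordering supports: sorting, picking the middle category, and counting frequencies.

```python
from collections import Counter

# Ordered categories, lowest to highest (labels taken from the example in the text).
HEIGHT_LEVELS = ["Short", "Medium", "Tall"]
RANK = {label: position for position, label in enumerate(HEIGHT_LEVELS)}

# Hypothetical observations, invented for illustration.
observations = ["Tall", "Short", "Medium", "Medium", "Tall", "Medium", "Short"]

# Ordinal data can be sorted by rank...
ordered = sorted(observations, key=RANK.__getitem__)

# ...so a median category exists (the middle element of an odd-length sample).
median_category = ordered[len(ordered) // 2]

# Frequency counts per category, for example as input to a bar chart.
counts = Counter(observations)

print("median category:", median_category)
for label in HEIGHT_LEVELS:
    print(label, counts[label])
```

Averaging the rank numbers would assume equal spacing between categories, which ordinal data does not guarantee, so the sketch deliberately avoids it.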

The use of ordered data is a matter of some debate among statisticians. Everyone agrees it’s appropriate for creating bar charts, but beyond that the answer to the question “What should I do with my ordinal data?” is “It depends.”

Data can also be classified as either primary or secondary:

  • Primary Data – Primary data is original data collected first hand, from the original source, specifically for the purpose in mind. The people who gather primary data may be an authorized organization, an investigator, an enumerator, or just someone with a clipboard. Because these people act as witnesses, primary data is only as reliable as the people who gathered it. Research that gathers this kind of data is referred to as field research. For example: your own questionnaire.
  • Secondary Data – Secondary data is data that was originally collected for another purpose and is being reused, usually in a different context. When we apply statistical methods to primary data that was collected for a different purpose, we refer to it as secondary data. In other words, one purpose’s primary data is another purpose’s secondary data. Research that gathers this kind of data is referred to as desk research.

Data Sources

Sources of failure rate and failure mode data can be classified as:

  • Site/company specific: failure-rate data that have been collected from similar equipment being used on very similar sites (e.g. two or more gas compression sites where environment, operating methods, maintenance strategy and equipment are largely the same). Another example would be the use of failure rate data from a flow corrector used throughout a specific distribution network. These data might be applied to the RAMS prediction for a new design of circuitry for the same application.
  • Industry specific: an example would be the use of the OREDA offshore failure rate data book for a RAMS prediction of a proposed offshore process package.
  • Generic: a generic data source combines a large number of applications and sources.
  • Field service data: information gathered from products in use by customers, together with customer feedback. It is affected by variations in installation, environment, operator procedures, etc., but it represents realistic applications of the product.

Predictions require failure rates for specific modes of failure (e.g. open circuit, signal high, valve closes), but only a few data sources contain specific failure mode percentages. When considering data sources, special attention must be paid to the conditions under which the product will be transported, stored and used.
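When a source does provide failure mode percentages, a common convention is to apportion the overall failure rate across the modes by multiplying it by each percentage. The sketch below assumes that convention; the overall rate, the mode split, and the function name `mode_failure_rates` are hypothetical values and names chosen for illustration, not figures from any data book.

```python
import math


def mode_failure_rates(total_rate, mode_fractions):
    """Split an overall failure rate into per-mode rates using mode fractions that sum to 1."""
    if not math.isclose(sum(mode_fractions.values()), 1.0, abs_tol=1e-9):
        raise ValueError("failure mode fractions must sum to 1")
    return {mode: total_rate * fraction for mode, fraction in mode_fractions.items()}


# Hypothetical overall failure rate, in failures per hour.
total_rate_per_hour = 2.0e-6

# Hypothetical failure mode split (fractions of all failures).
fractions = {"open circuit": 0.6, "signal high": 0.3, "other": 0.1}

rates = mode_failure_rates(total_rate_per_hour, fractions)
for mode, rate in rates.items():
    print(f"{mode}: {rate:.2e} failures per hour")
```

The check on the fractions guards against a split that silently loses or double counts part of the overall rate.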

