Statistical scales. The place of measurement theory in data analysis methods

Each measurement of an object is made on a certain scale. Different coordinates of one observation vector can be expressed on different scales. For example, § 5.1 gives an observation vector (Table 5.1) in which the first coordinates are conventional labels (social affiliation of the family, gender and profession of the head of the family, quality of housing conditions), while the rest are expressed in numbers (number of family members, number of children, average annual income, etc.). The properties of these scales differ greatly. About the sex of the head of the family, one can only say that it is either male or female and that the two differ; about housing conditions, that they coincide or differ and that in some cases some housing conditions are better than others; about expenses, that one family's food expenses are less than, equal to, or greater than another's, and it is also possible to estimate the difference in expenses between families and to calculate how many times the expenses of one family exceed those of another.

The main types of scales and mathematical techniques for unifying data expressed in different scales, which usually precede the application of multivariate analysis methods, are described below.

10.2.1. Nominal scale.

This scale is used only to classify an individual, an object, into a certain class. If possible classes and rules for classifying an object in them are described in advance, then one speaks of a categorized scale, if not, then of an uncategorized one. An example of a categorized scale is gender. In the study, one of two values is assigned to an individual: the letter M or F, a special character or the number 1 or 2. In principle, other letters and numbers could be assigned, it is only important that a one-to-one correspondence between codes is maintained. To enter categorized data, it is convenient to use the “menu”, i.e. a list of possible categories with their codes. Examples of uncategorized nominal variables are first name, last name, place of birth.

Another important source of uncategorized nominal data is given in § 5.3. This is the case when an observation is given on a pair of objects, and the variable only indicates whether the objects belong to the same class or not, and does not indicate which classes they belong to.

The latter circumstance should not be regarded as a curiosity. Of course, if the classes are predetermined and it is not difficult to assign each object to a certain class, then this should be done and the class of each object recorded. But sometimes the classes are not described in advance, constructing their complete classification is precisely the goal of the work, and yet it is possible to judge whether two objects belong to the same class. For example, one can speak of a “close” or “similar” course of the disease in two patients even though not all variants of the course of the disease have been described. Moreover, identifying empirically close variants of the course of the disease can serve as a starting point for identifying and describing all variants of the development of the pathological process. The same applies to the identification of socio-economic groups, etc.

The same variable can play different roles depending on the purpose for which it is used. For example, an uncategorized nominal variable such as the program name serves only to identify the program, and if there are few programs, the program can be found by browsing the list directly. At the same time, if the program names in the list are sorted in some way (for example, in alphanumeric order), then the program name, used as a search key, acquires elements of an ordinal variable. For any two names we can say that they either coincide or that one of them precedes the other in the accepted ordering. When the ordering method changes, the precedence relation changes as well.

Arithmetic operations on quantities measured on the nominal scale are meaningless. Therefore, neither the median nor the arithmetic mean can be used as a meaningful measure of central tendency. The appropriate statistic here is the mode.
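The point about the mode can be illustrated with a minimal sketch. The sample and both codings below are hypothetical and chosen only for illustration: the most frequent category is the same under any one-to-one recoding, while the arithmetic mean of the codes depends entirely on the arbitrary numbers chosen.

```python
from collections import Counter

def mode(values):
    """Return the most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]

# Hypothetical sample of a categorized nominal variable (gender).
sample = ["M", "F", "F", "M", "F"]

# Two equally admissible numeric codings of the same two categories.
coding_a = {"M": 1, "F": 2}
coding_b = {"M": 7, "F": 3}
decode_a = {code: label for label, code in coding_a.items()}
decode_b = {code: label for label, code in coding_b.items()}

codes_a = [coding_a[v] for v in sample]
codes_b = [coding_b[v] for v in sample]

# The mode, decoded back to a category, does not depend on the coding.
mode_a = decode_a[mode(codes_a)]
mode_b = decode_b[mode(codes_b)]

# The mean of the codes changes with the coding and names no category.
mean_a = sum(codes_a) / len(codes_a)
mean_b = sum(codes_b) / len(codes_b)

print(mode_a, mode_b)   # same category under both codings
print(mean_a, mean_b)   # different, coding-dependent numbers
```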

10.2.2. Ordinal (rank) scale.

In addition to assigning objects to a certain class, this scale also orders the classes according to the degree to which a given property is expressed. Each class is assigned its own symbol in such a way that a pre-established order of the symbols corresponds to the order of the classes. Thus, if numerical values are assigned to the classes, the classes are ordered by numerical sequence; if letters, in alphabetical order; and if words, according to the meanings of the words.

For example, § 5.3 uses an ordinal scale with four gradations (classes) to describe the quality of housing conditions: “poor”, “satisfactory”, “good”, “very good”. Naturally, these classes could equally be coded 1, 2, 3, 4, or 4, 3, 2, 1, or by the letters a, b, c, d, and so on.

Other well-known examples of ordinal scales are: in medicine, the scale of stages of hypertension according to Myasnikov, the scale of degrees of heart failure according to Strazhesko - Vasilenko - Lang, and the scale of severity of coronary insufficiency according to Fogelson; in mineralogy, the Mohs scale (talc - 1, gypsum - 2, calcite - 3, fluorite - 4, apatite - 5, orthoclase - 6, quartz - 7, topaz - 8, corundum - 9, diamond - 10), by which minerals are classified according to hardness; in geography, the Beaufort wind scale ("calm", "light wind", "moderate wind", etc.).

The structure of the ordinal scale is not destroyed by any one-to-one transformation of the codes that preserves order. Just as in the case of the nominal scale, arithmetic operations do not retain their meaning under such transformations of ordinal scales, so it is advisable not to use them. It is easy to show that if we rely only on the properties of the scales and do not bring in additional considerations external to them, then the only admissible statistics for ordinal scales are the members of the variational series (order statistics).
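A small sketch of why arithmetic on ordinal codes is unsafe while order statistics are not. The two groups and the recoding are hypothetical. An order-preserving recoding reverses the comparison of the two group means but leaves the comparison of medians unchanged (the lower median, `statistics.median_low`, is used because it is always an actual member of the variational series).

```python
from statistics import mean, median_low

# Hypothetical housing-quality ratings coded 1..4 ("poor" .. "very good").
group_a = [1, 1, 4, 4, 4]
group_b = [3, 3, 3, 3, 3]

# An order-preserving, one-to-one recoding of the same four classes.
recode = {1: 1, 2: 2, 3: 3, 4: 10}
recoded_a = [recode[v] for v in group_a]
recoded_b = [recode[v] for v in group_b]

# Comparing means: the conclusion depends on the arbitrary codes.
mean_order_before = mean(group_a) < mean(group_b)       # A below B
mean_order_after = mean(recoded_a) < mean(recoded_b)    # now A above B

# Comparing medians: the conclusion survives the recoding.
median_order_before = median_low(group_a) > median_low(group_b)
median_order_after = median_low(recoded_a) > median_low(recoded_b)

print(mean_order_before, mean_order_after)
print(median_order_before, median_order_after)
```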

10.2.3. Quantitative scales.

A scale on which it is possible to express by how much one object differs from another in the degree to which a given property is expressed is called an interval scale. To set up an interval scale, it is necessary to define the objects corresponding to the starting point and to the unit of measurement. Then, when measuring, each object is assigned a number showing by how many units of measurement it differs from the object taken as the starting point. The simplest example of an interval scale is temperature in degrees Celsius, where 0° is the starting point and 1° is the unit.

The structure of the interval scale does not change under linear transformations of the form $y = ax + b$, $a > 0$. The effect of such a transformation is to shift the starting point by b units and multiply the unit by a.

For example, the transformation $y = 1.8x + 32$, where $x$ is the temperature in degrees Celsius, gives the temperature in degrees Fahrenheit.
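As an illustrative sketch (the three temperatures are arbitrary), the Celsius-to-Fahrenheit conversion F = 1.8 C + 32 shows what a linear transformation of an interval scale preserves: ratios of differences stay the same, while ratios of the values themselves do not.

```python
def celsius_to_fahrenheit(c):
    """Linear change of interval scale: new unit (factor 1.8), new origin (+32)."""
    return 1.8 * c + 32.0

celsius = [20.0, 30.0, 40.0]
fahrenheit = [celsius_to_fahrenheit(c) for c in celsius]

# Ratio of two temperature differences: preserved by the transformation.
diff_ratio_c = (celsius[2] - celsius[0]) / (celsius[2] - celsius[1])
diff_ratio_f = (fahrenheit[2] - fahrenheit[0]) / (fahrenheit[2] - fahrenheit[1])

# Ratio of two temperature values: not preserved, hence not meaningful.
value_ratio_c = celsius[2] / celsius[0]
value_ratio_f = fahrenheit[2] / fahrenheit[0]

print(diff_ratio_c, diff_ratio_f)     # equal up to rounding
print(value_ratio_c, value_ratio_f)   # differ
```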

If the starting point of an interval scale is an absolute zero, it becomes possible to express on the scale how many times one measurement differs from another. The corresponding scale is called the ratio scale. The ratio scale admits transformations of the form $y = ax$, $a > 0$. Most of the scales used in physics are either interval scales (for temperature, potential energy) or ratio scales (for time, mass, charge, distance).

Since quantitative scales admit arithmetic operations, the arithmetic mean can be used to describe the central tendency of the data.

10.2.4. Unified representation of heterogeneous data.

Each type of scale has its own statistical technique. For variables measured on a nominal scale, one can use the $\chi^2$-test for multinomial distributions, the $\chi^2$-test for the absence of association in contingency tables, and tests of hypotheses about the probability in the binomial distribution. The ordinal scale corresponds to methods based on ranks (rank correlation, nonparametric tests of hypotheses, etc.). With an interval scale, the entire arsenal of statistical methods can be used.
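For the nominal case, the test for absence of association in a contingency table can be sketched as follows. This is a plain-Python computation of the Pearson chi-square statistic; the 2 x 2 table of counts is hypothetical, and the function name is introduced here only for illustration.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: rows = two genders, columns = smoker / non-smoker.
table = [[30, 20],
         [20, 30]]
stat, dof = chi_square_independence(table)

# 3.841 is the tabulated 5% critical value of chi-square with 1 degree of freedom.
reject_independence = dof == 1 and stat > 3.841
print(stat, dof, reject_independence)
```

Only the counts enter the computation; the codes assigned to the categories play no role, which is what makes the test admissible on a nominal scale.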

Moreover, statistical procedures have been developed for cases where vectors are observed, some coordinates of which are measured in one scale, and others in another. A typical example is the usual analysis of variance (see § 3.5), in which factors are measured on a nominal scale, and the responses corresponding to their combinations are measured on an interval scale.

However, a number of statistical methods, especially modern methods of multivariate analysis, assume that the data are measured on scales of the same type. To make these methods applicable in the general case of heterogeneous data, various data unification techniques have been proposed. The most important of them are described below.

Reduction to binary variables. This method is based on the introduction, instead of each initial random variable, of a series of random variables that take only two values: 0 and 1.

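For a nominal variable $x$ having $k$ gradations, $k$ variables $x_1, \dots, x_k$ are introduced such that $x_i = 1$ when $x$ takes its $i$-th gradation and $x_i = 0$ when it does not.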
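The reduction of a nominal variable with k gradations to k binary (0/1) variables can be sketched as below. The function name, the sample and the list of gradations are hypothetical.

```python
def to_binary_variables(values, gradations):
    """Replace each observation by k indicators, one per gradation."""
    return [[1 if v == g else 0 for g in gradations] for v in values]

# Hypothetical nominal variable with k = 4 gradations.
gradations = ["single", "married", "widowed", "divorced"]
marital = ["single", "married", "divorced", "married"]

encoded = to_binary_variables(marital, gradations)
for original, row in zip(marital, encoded):
    print(original, row)
```

Each row contains exactly one 1, so the k new variables are linearly dependent (they sum to 1); this is one reason the number of introduced variables is often felt to be excessive.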

The same technique is sometimes used when reducing a random variable measured on an ordinal scale to binary variables. However, in some cases it turns out to be convenient to single out not the event but the event. To compare the relative merits of these two methods, consider the following model problem. Let - a random variable uniformly distributed on the segment, - a small number;

The function obviously models the first way of passing to binary variables, and the function models the second. After simple calculations, we get:

The main disadvantage of the described technique is that it introduces a large number of new variables and partially loses the information contained in the data, both through quantization and through the artificial lowering of the level of the scale used.

Digitization of nominal and ordinal variables. This method is the direct opposite of the one just described: here all variables are raised to the level of quantitative ones by assigning numerical values to their gradations. The assigned values are sometimes called labels.

The choice of labels depends essentially on the purpose for which the digitization is performed. If the strength of the relationship between two nominal features is being studied, the labels can be chosen so as to maximize the correlation coefficient between them. If the task is to assign observations to one of several predetermined classes (discriminant analysis), the choice of labels can be tied to maximizing the normalized distance in the multidimensional sample space between the centers of the populations under study (the Mahalanobis distance). Sometimes this task is simplified and labels are assigned coordinate by coordinate so as to maximize only the normalized distance between the mean values of the given coordinate. A statistical comparison, on one particular problem, of the effectiveness of the global and the coordinate-by-coordinate approaches to digitization in discriminant analysis has been reported in the literature.

The digitization methods presented here, in which the labels are chosen to maximize a suitably selected functional, fit into the framework of the extremal approach to formulating the main problems of mathematical statistics mentioned in § 1.2.

In general, the digitization of qualitative variables is a complex task in both computational and purely statistical terms. Some aspects of this problem are discussed in the literature.

In an empirical study, for example, the following variables may occur (their most likely coding is indicated):

Gender:          1 = male
                 2 = female

Marital status:  1 = single
                 2 = married
                 3 = widowed
                 4 = divorced

Smoking:         1 = non-smoker
                 2 = occasional smoker
                 3 = heavy smoker
                 4 = very heavy smoker

Weight, etc.

Consider first the variable Gender. The assignment of the numbers 1 and 2 to the two sexes is completely arbitrary; they could be interchanged or replaced by other numbers.

We certainly do not mean that women rank one step below men, or that men are less important than women. Consequently, the individual numbers do not correspond to any empirical value. In this case, one speaks of variables on a nominal scale. In our example, we have a nominal-scale variable with two categories. Such a variable has another name: dichotomous.

The same is true of the variable Marital status. Here, too, the correspondence between the numbers and the categories of marital status has no empirical value. But unlike Gender, this variable is not dichotomous: it has four categories instead of two.

The possibilities for processing nominal-scale variables are very limited. Strictly speaking, only a frequency analysis of such variables can be carried out. For example, calculating the average value of the variable Marital status is completely meaningless. Nominal-scale variables are often used for grouping, in which the whole sample is split according to the categories of these variables. The same statistical tests are then carried out in the resulting subsamples, and their results are compared with each other.

As the next example, consider the variable Smoking. Here the code numbers carry empirical meaning through the order in which they appear in the list. The variable Smoking is ordered by intensity from low to high: an occasional smoker smokes more than a non-smoker, a heavy smoker smokes more than an occasional smoker, and so on. Variables whose numerical values correspond to such a gradual change in empirical meaning are said to be on an ordinal scale.

However, the empirical meaning of these variables does not depend on the difference between neighboring numerical values. Although the difference between the code numbers for a non-smoker and an occasional smoker, and for an occasional smoker and a heavy smoker, equals one in both cases, it cannot be argued that the actual difference between a non-smoker and an occasional smoker is the same as that between an occasional smoker and a heavy smoker. These concepts are too vague for that.

In addition to frequency analysis, ordinal-scale variables also allow the calculation of certain statistical characteristics, such as medians. In some cases, it is possible to calculate an average value. If a relationship (correlation) with other variables of this kind is to be established, the rank correlation coefficient can be used for this purpose.
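A minimal sketch of a rank correlation coefficient (Spearman's, computed here as the Pearson correlation of average ranks; the choice of Spearman's coefficient and the two small samples are assumptions made for illustration). Because only ranks are used, any order-preserving recoding of the ordinal codes leaves the result unchanged.

```python
def average_ranks(values):
    """Ranks 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical ordinal codes for five respondents.
smoking = [1, 2, 2, 3, 4]
housing = [4, 3, 3, 2, 1]
rho = spearman(smoking, housing)

# An order-preserving recoding of Smoking does not change the coefficient.
recode = {1: 1, 2: 5, 3: 6, 4: 100}
rho_recoded = spearman([recode[v] for v in smoking], housing)
print(rho, rho_recoded)
```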

To compare different samples of variables related to the ordinal scale, non-parametric tests can be used, the formulas of which operate on ranks.

Consider now the variable Height. Its absolute values reflect the ordinal relationship between the respondents, but the difference between two values also has empirical meaning. For example, if Ivan's height is 180, Fedor's is 170, and Peter's is 160, we can say that Ivan is taller than Fedor by as much as Fedor is taller than Peter. Variables for which the difference (interval) between two values has empirical meaning are said to be on an interval scale. They can be processed by any statistical methods without restrictions. For example, the average value is a fully valid statistic for characterizing such variables. These variables include Weight, Size, etc.

Very often, interval-scale variables can also be assigned to the ratio scale. Therefore, in the Define Variable settings, both of these scales are defined as Scale (metric).

Now we need to determine and justify the type of scale for each of our variables: Vozrast (Age); Ves (Weight); Rost (Height); Noga (Shoe size); Pol (Gender); Volos (Hair color); Glaz (Eye color).

Vozrast (Age), Ves (Weight), Rost (Height) - interval scale.

Pol (Gender), Volos (Hair color), Glaz (Eye color) - nominal scale.

Once this crucial question is settled, the type of scale must be entered in the table for our variables. This is done very simply: double-click the variable name and the Define Variable window appears. In this window, find the Measurement group and set the radio button to one of its three states for each variable.

The variables Vozrast (Age), Ves (Weight), Rost (Height) will have the value Scale.

The variables Pol (Gender), Volos (Hair color), Glaz (Eye color) will have the value Nominal.

We will have no variables on the Ordinal scale.

We have dealt with the scale of variables. Now let's continue with the definition of variables.

Type (variable type). To set the variable type, click the Type button. The Define Variable Type dialog box opens. Accept the suggested setting Numeric and, for the variable Vozrast, set the width to "2" and the number of decimal places to "0", since this variable will store only age values. Confirm the setting with the OK button and move on to the next field, for the variable Ves. Since the values of all variables are coded numerically, all our variables will be Numeric.

Labels (variable label). A variable label is a name that describes the variable in more detail. After clicking the Labels... button, a dialog box appears in which you can enter up to 256 characters. Variable labels are case-sensitive and are displayed exactly as they were entered. For the Vozrast variable, enter "student(s) age data" as the label.

The same dialog is used to enter Value Labels. Value labels are names that describe the possible values of a variable in more detail. For example, for the variable Pol you can set the label "female" for the value "1" and the label "male" for the value "2". Confirm the default setting. Data entry can also be confirmed from the keyboard.

Missing values. SPSS allows two kinds of missing values:

System-defined missing values: If there are unfilled numeric cells in the data matrix, SPSS automatically identifies them as missing values. This fact is displayed in the data matrix using a comma (,).

User-defined missing values: If values of a variable are missing in certain cases, for example because the question was not answered, the answer is unknown, or for other reasons, the user can use the Missing button to declare these values as missing. Missing values can be excluded from subsequent calculations. In our example, we will declare the answer option "0" (no data) for the variable Pol as a user-defined missing value.

Column Format (column format). The Columns field determines the width that this column will have in the table when displaying values. The column width can also be changed directly in the data editor window. To do this, place the mouse pointer on the separator between two column headers with variable names. The pointer will change. A double arrow that appears indicates that the corresponding column can be expanded or contracted by dragging.
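Outside SPSS, the effect of the two kinds of missing values on a frequency analysis can be sketched as follows. This is plain Python, not SPSS syntax; the data vector is hypothetical, `None` stands in for an empty cell, and the code 0 plays the role of the user-defined "no data" value.

```python
from collections import Counter

# Hypothetical user-defined missing code for the variable Pol: 0 = "no data".
USER_MISSING = {0}

# None models an unfilled cell (system-defined missing value).
pol = [1, 2, 0, 2, 1, 2, None]

# Both kinds of missing values are excluded before counting.
valid = [v for v in pol if v is not None and v not in USER_MISSING]
frequencies = Counter(valid)
n_missing = len(pol) - len(valid)

print(dict(frequencies), n_missing)
```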

Thus, having determined all the parameters of the variables, you can start entering the collected data for your group.

Which statistical methods can be applied is determined by the statistical scale to which the collected material belongs. S. S. Stevens proposed to distinguish four statistical scales:

1) the scale of names (or nominal scale);

2) the scale of order;

3) the scale of intervals;

4) the scale of ratios.

Knowing the typical features of each scale, it is not difficult to establish to which of them the material to be statistically processed should be attributed.

Name scale. This scale includes materials in which the studied objects differ from each other in their quality.

When processing such materials, there is no need to arrange these objects in any order based on their characteristics. In principle, objects can be placed in any order.

Here is an example: the composition of an international scientific conference is being studied. Among the participants there are French, British, Danes, Germans and Russians. Does it matter the order in which the participants are placed when examining the composition of the conference? You can arrange them alphabetically, this is convenient, but it is clear that there is no fundamental significance in this arrangement. When translating these materials into another language (and hence into another alphabet), this order will be violated. You can arrange the national groups according to the number of participants. But when comparing this material with the material of another conference, we find that this order is unlikely to be the same. The objects referred to the scale of names can be placed in any sequence depending on the purpose of the study.

In the statistical processing of such materials, one must take into account the number of units each object is represented by. There are very effective statistical methods that allow scientifically significant conclusions to be drawn from these numerical data (for example, the chi-square method).

Order scale. Whereas in the name scale the order of the objects studied plays practically no role, in the order scale, as its name makes clear, all attention shifts to precisely this sequence.

In statistics this scale covers research materials in which the objects under consideration belong to one or more classes but differ when compared with one another: “more - less”, “higher - lower”, etc.

The easiest way to show the typical features of the order scale is to look at the published results of any sports competition. These results list, in sequence, the participants who took first, second, third, and subsequent places. But information about the actual achievements of the athletes is often missing from such reports or fades into the background, while their rankings are brought to the fore.

Suppose chess player D. took first place in the competition. What did he achieve? It turns out he scored 12 points. Chess player E. took second place with 10 points. Third place went to J. with eight points, fourth to Z. with six points, and so on. In reports on the competition, the differences in the players' achievements fade into the background, and their ordinal places come first. There is a reason why the ordinal place is given the main importance. Indeed, in our example Z. scored six points and D. scored 12. These are their absolute achievements, the games they won. If one tried to interpret this difference purely arithmetically, one would have to admit that Z. plays half as well as D., which cannot be accepted. The circumstances of a competition are not always simple, nor is the way a particular participant played. Therefore, refraining from arithmetic absolutization, one limits oneself to stating that chess player Z. lags behind D., who won first place, by three ordinal places.
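The chess example can be sketched in a few lines. The scores are those given in the text; the squared rescoring is a hypothetical order-preserving change added only to show that the places, unlike the point ratios, do not depend on how the points are counted.

```python
# Scores from the example: player -> points.
scores = {"D.": 12, "E.": 10, "J.": 8, "Z.": 6}

def places(points):
    """Ordinal place of each player: 1 for the highest score, and so on."""
    ranking = sorted(points, key=points.get, reverse=True)
    return {player: position + 1 for position, player in enumerate(ranking)}

place = places(scores)
gap_in_places = place["Z."] - place["D."]        # what the order scale reports
ratio_of_points = scores["D."] / scores["Z."]    # not interpretable as "twice as good"

# Hypothetical order-preserving rescoring: the places stay the same.
rescored = {player: s ** 2 for player, s in scores.items()}
place_rescored = places(rescored)

print(place, gap_in_places, ratio_of_points)
print(place_rescored == place)
```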

Interval scale. This includes materials that contain a quantitative assessment of the object under study in fixed units.

Let's return to the experiments that the psychologist conducted with Sasha. The experiments recorded how many dots Sasha and each of his peers could put down when working at the maximum speed available to them. The unit of evaluation was the number of dots. By counting them, the researcher obtained the absolute number of dots that each participant managed to put down in the allotted time. The main difficulty in assigning materials to the interval scale is that one needs a unit that is identical to itself in all repeated measurements, i.e. the same and unchanged. In the example of the chess players (order scale), there is no such unit.

Indeed, what is counted there is the number of games won by each competitor. But the games are clearly far from equivalent: it is possible that the participant who placed fourth, having won six games, won the most difficult game against the leader himself! The final results, however, implicitly assume that all games won are the same. In reality, this is not so. Therefore, when working with such materials, it is appropriate to evaluate them according to the requirements of the order scale, not the interval scale. Materials that conform to the interval scale must have a unit of measurement.

Ratio scale. This scale includes materials that take into account not only the number of fixed units, as in the interval scale, but also the ratios of the results obtained to one another. To work with such ratios, one needs an absolute zero point from which counting starts. In the study of psychological subjects this scale is practically inapplicable.

Variables differ in how well they can be measured, or, in other words, in how much measurable information their measurement scale provides. Every measurement contains some error, which bounds the “amount of information” that can be obtained from it. The type of scale on which the measurement is made is another factor that determines the amount of information contained in a variable. The following types of scales are distinguished: nominal, ordinal (rank), interval, and ratio. Accordingly, there are four types of variables.

Name scale (nominal scale) is not actually related to the concept of "value" and is used only for qualitative classification, to distinguish one object from another: the number of an animal in a group, a unique code assigned to it, etc. Such variables can be measured only in terms of membership in distinctly different classes; the classes themselves cannot be ordered. For example, individuals belong to different nationalities. Typical examples of nominal variables are gender, nationality, color, city, etc. Nominal variables are often called categorical. Categorical variables are often represented as frequencies of observations falling into particular categories and classes. If there are only two classes, the variable is called dichotomous. For example, in a study of a sample, 30 subjects with elevated blood pressure were assigned to the first category, female, and 25 subjects with elevated blood pressure to the second category, male. The possibilities for processing nominal-scale variables are very limited. Strictly speaking, only a frequency analysis of such variables can be carried out. For example, calculating the average value of the variable Gender is completely meaningless.

Ordinal scale (rank scale) is a scale for whose values one cannot say either how many times one measured value is greater (less) than another or by how much it is greater (less). Such a scale only orders objects by assigning certain scores to them (the result of measurement is a non-strict ordering of the objects). It indicates which objects possess the quality expressed by the variable to a greater or lesser extent, but it does not allow one to say "how much more" or "how much less". Ordinal variables are sometimes also called rank variables. House numbers on a street are measured on an ordinal scale. A typical example of an ordinal variable is the socioeconomic status of a family. Clothing sizes use the ordinal scale S, M, L, XL, XXL, XXXL, XXXXL. The Mohs scale of mineral hardness is also ordinal, and the Beaufort wind scale and the Richter earthquake scale are constructed similarly. Order scales are widely used in pedagogy, psychology, medicine, and other sciences that are not as exact as, say, physics and chemistry. In particular, the ubiquitous scale of school grades in points (five-point, twelve-point, etc.) can be assigned to the order scale.

In biomedical research, order scales are ubiquitous and sometimes very cleverly disguised. For example, a thrombotest is used to analyze blood coagulation: 0 - no coagulation during the test time, 1 - "weak threads", 2 - jelly-like clot, 3 - easily deformable clot, 4 - dense, elastic clot, 5 - dense clot occupying the entire volume, and so on. It is clear that the intervals between these poorly distinguishable and highly subjective positions are arbitrary. In this case it makes no sense to compare the average values in two samples! Many similar scales are still found in experimental toxicology, experimental surgery, and experimental morphology.

Ordinal scales in medicine include the scale of stages of hypertension (according to Myasnikov), the scale of degrees of heart failure (according to Strazhesko-Vasilenko-Lang), the scale of severity of coronary insufficiency (according to Fogelson), etc. All these scales are built on the same scheme: disease not detected; first stage of the disease; second stage; third stage. Each stage has its own medical characteristics. In describing disability groups, the numbers are used in the opposite order: the most severe is the first disability group, then the second, and the lightest is the third. In addition to frequency analysis, ordinal-scale variables also allow the calculation of certain statistical characteristics, such as medians. In some cases, it is possible to calculate an average value. To compare different samples of ordinal-scale variables, non-parametric tests can be used, the formulas of which operate on ranks.

Interval variables allow one not only to order the objects of measurement but also to express numerically and compare the differences between them. For example, temperature measured in degrees Fahrenheit or Celsius forms an interval scale. The Celsius scale, as is well known, was set as follows: the freezing point of water was taken as zero and its boiling point as 100 degrees, and the temperature interval between the freezing and boiling of water was accordingly divided into 100 equal parts. Here the statement that a temperature of 40°C is twice as high as 20°C would be incorrect. The interval scale preserves the ratio of interval lengths. One can say not only that a temperature of 40°C is higher than a temperature of 30°C, but also that an increase in temperature from 20°C to 40°C is twice the increase from 30°C to 40°C. Such variables can be processed by any statistical methods without restrictions. For example, the average value is a fully valid statistic for characterizing such variables.

Almost all physical quantities are measured on ratio scales: time, linear dimensions, areas, volumes, current, power, etc. This is the most powerful scale. It includes all interval variables that have an absolute zero point. In biomedical research, the ratio scale occurs, for example, when one measures the time to the appearance of a particular sign after the onset of exposure (time threshold, in seconds or minutes) or the intensity of exposure needed for a sign to appear (threshold of exposure strength in volts, roentgens, and so on). Naturally, all data in biochemical and electrophysiological studies (concentrations of substances, voltages, time parameters of an electrocardiogram, etc.) belong to the ratio scale. This also includes, for example, the number of correctly or incorrectly completed "tasks" in various tests for studying higher nervous activity in animals. For example, temperature in kelvins forms a ratio scale, so it can be stated that a temperature of 200 K is not only higher than 100 K but also twice as high. Interval scales (such as the Celsius scale) do not have this property of a ratio scale. Note that most statistical procedures do not distinguish between interval scales and ratio scales. For these last two scales, it is possible to calculate such numerical indicators as the mean value and the standard deviation.
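The Kelvin example can be checked with a short sketch. The conversion to millikelvins is a hypothetical change of unit chosen for illustration; the Kelvin-to-Celsius shift of 273.15 is the standard one. A change of unit (multiplication by a positive constant) keeps the ratio of two temperatures, whereas moving the zero point, as in going to Celsius, destroys it.

```python
def kelvin_to_millikelvin(k):
    """Admissible ratio-scale transformation: change of unit only."""
    return 1000.0 * k

def kelvin_to_celsius(k):
    """Shift of the zero point: admissible for interval scales, not ratio scales."""
    return k - 273.15

t1, t2 = 100.0, 200.0   # temperatures in kelvins

ratio_kelvin = t2 / t1
ratio_millikelvin = kelvin_to_millikelvin(t2) / kelvin_to_millikelvin(t1)
ratio_celsius = kelvin_to_celsius(t2) / kelvin_to_celsius(t1)

print(ratio_kelvin, ratio_millikelvin)   # both express "twice as high"
print(ratio_celsius)                     # a different number with no physical meaning
```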

Let's look at a few more concrete examples of variables in an empirical study. Let's encode them like this:

Table 1.1

Scale types

We see that the encoding of the variable Gender using the numbers 1 and 2 is completely arbitrary: the codes could be swapped or replaced by other numbers. The coding does not mean that women are one step below men. In this case, one speaks of variables belonging to the nominal scale. The same holds for the variable Marital status. Here, too, the correspondence between numbers and categories of marital status has no empirical significance. But unlike gender, this variable is not dichotomous: it has four codes instead of two.
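The arbitrariness of nominal codes can be illustrated with a short sketch. The sample and both codebooks below are invented for the illustration. The most frequent category does not depend on which numbers are used, whereas the arithmetic mean of the codes does, so the mean carries no meaning on this scale.

```python
from collections import Counter
from statistics import mean

# Two equally valid (hypothetical) codebooks for the same nominal variable.
codebook_a = {1: "male", 2: "female"}
codebook_b = {1: "female", 2: "male"}

# Invented sample of five respondents.
sample = ["female", "male", "female", "female", "male"]


def encode(labels, codebook):
    """Replace each label by its numeric code under the given codebook."""
    inverse = {label: code for code, label in codebook.items()}
    return [inverse[label] for label in labels]


def modal_label(codes, codebook):
    """Most frequent code, translated back to its category label."""
    code, _count = Counter(codes).most_common(1)[0]
    return codebook[code]


codes_a = encode(sample, codebook_a)
codes_b = encode(sample, codebook_b)

mode_a = modal_label(codes_a, codebook_a)
mode_b = modal_label(codes_b, codebook_b)
mean_a = mean(codes_a)
mean_b = mean(codes_b)

print("mode under coding A / B:", mode_a, "/", mode_b)
print("mean of codes under A / B:", mean_a, "/", mean_b)
```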

The values of the variable Smoking are ordered from low to high: a moderate smoker smokes more than a non-smoker, a heavy smoker smokes more than a moderate smoker, and so on. Such variables belong to the ordinal scale. However, the empirical significance of these variables does not depend on the difference between neighboring numerical values. So, although the codes for adjacent categories (say, a non-smoker and an occasional smoker, or an occasional smoker and a heavy smoker) differ by exactly one in both cases, it cannot be argued that the actual difference between these categories is the same. The concepts are too vague for that. Other classical examples of ordinal variables are those obtained by grouping quantities into classes, such as monthly income in our example.
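The following sketch illustrates the point about ordinal codes. The category names, both codings and the sample are hypothetical and are not taken from Table 1.1. Any order-preserving recoding leaves the ordering of respondents and the median category unchanged, while the gaps between neighboring codes are an artifact of the coding.

```python
from statistics import median_low

# Hypothetical ordered categories, from lowest to highest consumption.
categories = ["non-smoker", "occasional", "moderate", "heavy"]

# Two order-preserving codings of the same categories.
coding_a = {cat: i + 1 for i, cat in enumerate(categories)}  # 1, 2, 3, 4
coding_b = {"non-smoker": 0, "occasional": 1, "moderate": 10, "heavy": 40}

# Invented sample of five respondents.
sample = ["moderate", "non-smoker", "heavy", "occasional", "moderate"]


def median_category(labels, coding):
    """Median of the codes, translated back to the category label."""
    decode = {code: cat for cat, code in coding.items()}
    return decode[median_low([coding[label] for label in labels])]


def adjacent_gaps(coding):
    """Differences between the codes of neighboring categories."""
    codes = [coding[cat] for cat in categories]
    return [b - a for a, b in zip(codes, codes[1:])]


median_a = median_category(sample, coding_a)
median_b = median_category(sample, coding_b)
order_a = sorted(sample, key=coding_a.get)
order_b = sorted(sample, key=coding_b.get)
gaps_a = adjacent_gaps(coding_a)
gaps_b = adjacent_gaps(coding_b)

print("median category under A / B:", median_a, "/", median_b)
print("same ordering of respondents:", order_a == order_b)
print("gaps between adjacent codes:", gaps_a, "versus", gaps_b)
```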

Consider now the intelligence quotient (IQ). Its absolute values reflect the ordinal relationship between respondents, and the difference between two values also has empirical significance. For example, if Fedor has an IQ of 80, Peter has an IQ of 120, and Ivan has an IQ of 160, one can say that Peter is more intelligent than Fedor by the same amount as Ivan is more intelligent than Peter (namely, by 40 IQ points). However, from the fact that Fedor's IQ is half that of Ivan, one cannot conclude that Ivan is twice as smart as Fedor. Such variables belong to the interval scale.

The highest statistical scale, on which the ratio of two values also acquires empirical significance, is the ratio scale. An example of a variable on this scale is age: if Andrey is 30 years old and Alexey is 60, one can say that Alexey is twice as old as Andrey. The Kelvin temperature scale, with its absolute zero, is also a ratio scale.

In practice, including when processing data in the Statistica package, the difference between the variables related to the interval scale and the ratio scale is usually insignificant.

From a richer, more powerful scale one can always move to a poorer one. Thus, continuous variables can be categorized. For example, the continuous random variable (RV) Height can be converted from the ratio scale to an ordinal scale with the gradations low, medium, high.

Suppose the entire range of an interval variable was divided into high, medium, and low ranges, and each observation was assigned to one of the three categories. This means that a phenomenon originally described on an interval scale can also be described on a nominal scale, and therefore all statistical methods that require nominal variables can be used to analyze it. But it must be kept in mind that when moving to a nominal scale from a scale of a higher order, we lose some information about the observations. Observations that differed from each other on the interval scale may be treated as identical on the nominal scale. Therefore, it is recommended to use the nominal scale only when a scale of a higher order cannot be used.
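The loss of information under categorization can be shown with a small sketch. The cut points of 165 cm and 180 cm and the sample heights are assumptions made for the illustration; the text does not specify any thresholds.

```python
from bisect import bisect_right

# Hypothetical cut points (in cm) separating low / medium / high.
CUT_POINTS = [165.0, 180.0]
LABELS = ["low", "medium", "high"]


def categorize(height_cm: float) -> str:
    """Map a height to the label of the range it falls into."""
    return LABELS[bisect_right(CUT_POINTS, height_cm)]


# Invented sample of heights measured on the original scale.
heights = [158.0, 166.5, 179.0, 181.0, 195.0]
grades = [categorize(h) for h in heights]

distinct_before = len(set(heights))
distinct_after = len(set(grades))

print(grades)
print("distinct values before / after:", distinct_before, "/", distinct_after)
```

Heights of 166.5 cm and 179.0 cm become indistinguishable after categorization, and the original values cannot be recovered from the labels.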

The choice of statistical methods is determined by the statistical scale to which the collected material belongs. S. Stevens proposed distinguishing four statistical scales:

1. nominal scale (scale of names);

2. ordinal scale;

3. interval scale;

4. ratio scale.

Knowing the typical features of each scale, it is easy to determine to which of them the material to be statistically processed should be assigned.
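As a rough sketch, the nesting of the four scales can be written down as a small lookup structure. The text itself only states that the mean and standard deviation apply to the interval and ratio scales. The assignment of the mode, the median and the coefficient of variation below follows the usual textbook convention and is an assumption of this example, as are all names in the code.

```python
from enum import IntEnum


class Scale(IntEnum):
    """The four scales, ordered from poorest to richest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Weakest scale on which each statistic is conventionally considered
# meaningful (textbook convention, assumed for this illustration).
MINIMUM_SCALE = {
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "mean": Scale.INTERVAL,
    "standard deviation": Scale.INTERVAL,
    "coefficient of variation": Scale.RATIO,
}


def permitted_statistics(scale: Scale) -> list[str]:
    """Statistics meaningful on the given scale, in alphabetical order."""
    return sorted(
        name for name, needed in MINIMUM_SCALE.items() if scale >= needed
    )


for scale in Scale:
    print(scale.name, "->", permitted_statistics(scale))
```

Because every richer scale inherits what is allowed on the poorer ones, a single comparison `scale >= needed` is enough to express the hierarchy.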

Nominal scale. This scale includes materials in which the objects under study differ from each other qualitatively.

When processing such materials, there is no need to arrange these objects in any order based on their characteristics. In principle, objects can be arranged in any sequence.

Ordinal scale. While on the nominal scale the order of the objects under study plays practically no role, on the ordinal scale, as its name suggests, it is precisely this order that becomes the focus of attention.

In statistics, this scale includes research materials in which the objects under consideration belong to one or more classes but differ when compared with one another: "more or less", "higher or lower", and so on.

The easiest way to show the typical features of the ordinal scale is to refer to the published results of any sports competition. In these results, the participants who took first, second, third and subsequent places are listed in order. But in such reports, information about the athletes' actual achievements is often missing or fades into the background, while their rankings are put in the foreground.

Let's say the chess player D. took first place in the competition. What are his achievements? It turns out he scored 12 points. Chess player E. took second place with 10 points. Third place was taken by J. with eight points, fourth by Z. with six points, etc. In reports about the competition, the difference in the players' achievements fades into the background, and their ordinal places come first. There is a reason why the ordinal place is given the main importance. Indeed, in our example, Z. scored six points and D. scored 12. These are their absolute achievements: the games they won. If we tried to interpret this difference purely arithmetically, we would have to admit that Z. plays half as well as D. But we cannot agree with this. The circumstances of a competition are not always simple, nor is the way a given participant played. Therefore, refraining from arithmetic absolutization, one limits oneself to stating that chess player Z. lags behind D., who took first place, by three ordinal places.
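The chess example can be reproduced in a few lines. The sketch below uses the four scores given in the text and assumes there are no ties. It derives the ordinal places from the points and contrasts the gap in places with the arithmetic ratio of points that the text warns against interpreting.

```python
# Points scored by the four players named in the example.
points = {"D.": 12, "E.": 10, "J.": 8, "Z.": 6}

# Ordinal places: sort by points, highest first (ties are not handled here).
ranking = sorted(points, key=points.get, reverse=True)
places = {name: place for place, name in enumerate(ranking, start=1)}

# What the ordinal scale lets us state: Z. is three places behind D.
place_gap = places["Z."] - places["D."]

# What it does NOT justify: reading the ratio of points as "twice as good".
point_ratio = points["D."] / points["Z."]

print("places:", places)
print("Z. is behind D. by", place_gap, "places")
print("ratio of points D. / Z.:", point_ratio)
```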

Interval scale. It includes materials in which the object under study is assessed quantitatively in fixed units.

Let's return to the experiments that the psychologist conducted with Sasha. The experiments recorded how many dots Sasha and each of his peers could make while working at the maximum speed available to them. The unit of evaluation in the experiments was the number of dots. By counting them, the researcher obtained the absolute number of dots that each participant managed to make in the allotted time. The main difficulty in assigning materials to the interval scale is that one needs a unit that is identical to itself across all repeated measurements, that is, the same and unchanging. In the example with the chess players (ordinal scale), such a unit does not exist at all.

Indeed, the number of games won by each participant is counted. But it is clear that the games are far from identical. It is possible that the competitor in fourth place, who won six games, won the most difficult game of all, against the leader himself! But the final results assume, as it were, that all games won are the same. In reality, this is not the case. Therefore, when working with such materials, it is appropriate to evaluate them according to the requirements of the ordinal scale, not the interval scale. Materials that conform to the interval scale must have a unit of measurement.

Ratio scale. This scale includes materials that take into account not only the number of fixed units, as in the interval scale, but also the ratios of the results obtained to one another. To work with such ratios, one needs an absolute zero point from which measurement starts. When studying psychological objects, this scale is practically inapplicable.