STATISTICS : AN INTRODUCTION - Quantitative Techniques Tutorials

Origin and Growth of Statistics
The term statistics is ultimately derived from the New Latin statisticum collegium ("council of state") and the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the "science of state" (then called political arithmetic in English).
Every State administration in the past collected and analyzed data. The data regarding population gave an idea about the possible military strength and the data regarding material wealth of a country gave an idea about the possible source of finance to the State. Similarly, data were collected for other purposes also. On examining the historical records of various ancient countries, one might find that almost all the countries had a system of collection of data.
In ancient Egypt, data on the population and material wealth of the country were collected as early as 3050 B.C., for the construction of pyramids. A census was conducted in Jidda in 2030 B.C. and the population was estimated to be 38,00,000. The first census of Rome was done as early as 435 B.C. After the 15th century statistical data also began to be published, but the first analysis of data on a scientific basis was carried out by Captain John Graunt in the 17th century.
His first work on social statistics, 'Observation on London Bills of Mortality', was published in 1662. During the same period gamblers in western countries had started using statistics because they wanted more precise estimates of the odds at the gambling table. This led to the development of the 'Theory of Probability'.
Ancient India also had a tradition of collecting statistical data. In ancient works such as Manusmriti, Shukraniti, etc., we find evidence of data being collected for the purpose of running the affairs of the State, with population, military force and other resources expressed in the form of figures. The facts and figures of Chandragupta Maurya's regime are described in 'Kautilya's Arthashastra'.
Statistics were also in use during the Mughal period. Data were collected regarding population, military strength, revenue, land revenue, measurements of land, etc. The system of data collection was described in Tuzuk-i-Babri and Ain-i-Akbari. During Akbar's period, his revenue minister, Raja Todarmal, made a well organised survey of land for the collection of land revenue. During the British period too, statistics were used in various areas of activity.
Although the tradition of collecting data and using it for various purposes is very old, the development of modern statistics as a subject is of recent origin. The development of the subject took place mainly after the sixteenth century. The notable mathematicians who contributed to the development of statistics are Galileo, Pascal, De Méré, Fermat and Cardano of the 16th and 17th centuries.
Then in later years the subject was developed by Abraham De Moivre (1667 - 1754), Marquis De Laplace (1749 - 1827), Carl Friedrich Gauss (1777 - 1855), Adolphe Quetelet (1796 - 1874), Francis Galton (1822 - 1911), etc. Karl Pearson (1857 - 1936), who is regarded as the father of modern statistics, was greatly motivated by the research of Galton and was the first person to be appointed as Galton Professor in the University of London.
William S. Gosset (1876 - 1937), a student of Karl Pearson, propounded a number of statistical formulae under the pen-name of 'Student'. R.A. Fisher is yet another notable contributor to the field of statistics. His book 'Statistical Methods for Research Workers', published in 1925, marks the beginning of the theory of modern statistics.
The science of statistics also received contributions from notable economists such as Augustin Cournot (1801 - 1877), Leon Walras (1834 - 1910), Vilfredo Pareto (1848 - 1923), Alfred Marshall (1842 - 1924), Edgeworth, A.L. Bowley, etc. They gave an applied form to the subject.
Among the noteworthy Indian scholars who contributed to statistics are P.C. Mahalanobis, V.K.R.V. Rao, R.C. Desai, P.V. Sukhatme, etc.
Meaning and Definition of Statistics
The meaning of the word 'Statistics' is implied by the pattern of development of the subject. Since the subject originated with the collection of data and then, in later years, the techniques of analysis and interpretation were developed, the word 'statistics' has been used in both the plural and the singular sense. Statistics, in plural sense, means a set of numerical figures or data. In the singular sense, it represents a method of study and therefore, refers to statistical principles and methods developed for analysis and interpretation of data.
Statistics has been defined in different ways by different authors. These definitions can be broadly classified into two categories. In the first category are those definitions which lay emphasis on statistics as data whereas the definitions in second category emphasize statistics as a scientific method.
Statistics as Data
Statistics used in the plural sense implies a set of numerical figures collected with reference to a certain problem under investigation. It may be noted here that not every set of numerical figures can be regarded as statistics. There are certain characteristics which a given set of numerical figures must satisfy in order to be termed statistics. Before giving these characteristics it will be advantageous to go through the definitions of statistics in the plural sense, given by noted scholars.
"Statistics are numerical facts in any department of enquiry placed in relation to each other.”
- A.L. Bowley
The main features of the above definition are:
(i) Statistics (or Data) implies numerical facts.
(ii) Numerical facts or figures are related to some enquiry or investigation.
(iii) Numerical facts should be capable of being arranged in relation to each other.
On the basis of the above features we can say that data are those numerical facts which have been expressed as a set of numerical figures related to each other and to some area of enquiry or research. We may, however, note here that all the characteristics of data are not covered by the above definition.
"By statistics we mean quantitative data affected to a marked extent by multiplicity of causes.”
 - Yule & Kendall
 This definition covers two aspects, i.e., the data are quantitative and affected by a large number of causes.
"Statistics are classified facts respecting the conditions of the people in a state- especially those facts which can be stated in numbers or in tables of numbers or in any other tabular or classified arrangement.”
- Webster
"A collection of noteworthy facts concerning state, both historical and descriptive.”
 - Achenwall
The definitions given by Webster and Achenwall are not comprehensive because they confine the scope of statistics to facts and figures related to the conditions of the people in a state. Since data are now collected on almost all aspects of human and natural activity, statistics cannot be regarded as a matter of state-craft only.
"Statistics are measurements, enumerations or estimates of natural or social phenomena, systematically arranged, so as to exhibit their interrelations.”
 - L.R. Connor
 This definition also covers only some but not all characteristics of data.
"By statistics we mean aggregate of facts affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other.”
- H. Secrist
This definition can be taken as a comprehensive definition of statistics since most of the characteristics of statistics are covered by it.
Characteristics of Statistics as Data
On the basis of the above definitions we can now state the following characteristics of statistics as data :
Statistics are numerical facts:
Any set of facts can be called statistics or data only if it is capable of being represented numerically or quantitatively. Ordinarily, facts can be classified into two categories:
(a) Measurable facts, which can be represented by numerical measurements. For example, measurement of heights of students in a college, income of persons in a locality, yield of wheat per acre in a certain district, etc.
(b) Facts that are not measurable, but whose presence or absence can be observed. Honesty, color of hair or eyes, beauty, intelligence, smoking habit, etc., are examples of immeasurable facts. Statistics or data of immeasurable facts can be obtained by counting the number of individuals in different categories. For example, the population of a country can be divided into three categories on the basis of complexion of the people such as white, whitish or black.
Statistics are aggregates of facts:
Statistics are always aggregates of facts. A single figure pertaining to age, birth, death, production, employment, etc., does not constitute statistics because it cannot be compared with or related to anything, but a series of ages, births, deaths, production and employment figures is called statistics because the figures are related and comparable.
It is also possible to study them in relation to time, place and frequency of occurrence. For example, the statement that 1,00,000 people died in accidents in a city in 1999 is not statistics on its own, even though it is a numerical statement of fact. But when this figure is set against the 85,000 deaths that occurred in accidents in another city in the same year, the two figures together are called statistics because they are related and comparable.
Statistics are affected to a marked extent by a multiplicity of factors:
Statistical facts and figures are not generally traceable to a single cause. They are affected to a considerable extent by a number of factors working together.
For example, statistics of yield of paddy are highly affected by factors such as fertility of the soil, amount of rainfall, seeds, fertilizer used, pesticides and insecticides applied, etc. It is not possible to study separately the effect of each of these factors on the yield of paddy.
Statistics are either enumerated or estimated with reasonable standard of accuracy:
Statistical facts and figures may be obtained either by enumeration or by estimation. It is clear that estimated figures cannot be absolutely precise and accurate. The degree of accuracy desired depends largely on the purpose for which statistics are collected and also on the nature of the study. If precise results are desired, statistics should be accurately compiled, but if only general impressions are looked for, even rough estimates may serve the purpose.
For example, if the distance between two places is measured, even a difference of a few kilometers may be ignored. Therefore, 100% accuracy cannot be expected in any statistical enumeration or estimation. A reasonable standard of accuracy must, however, be attained; otherwise the results may be misleading.
Statistics are collected for a predetermined purpose:
The purpose of collecting statistical facts and figures should be pre-determined in the sense that the purpose should be decided and well defined in advance. It is clear that if statistical facts and figures are not collected with some well defined purpose decided in advance, the collected data would be of limited use. If they are collected with some specific purpose decided in advance, the collected data would be of great use.
Statistics should be capable of being placed in relation to each other:
Statistics are capable of being placed in relation to each other. It means that they are comparable. The statistical facts and figures collected should be homogeneous in character because it is not possible to have valid comparisons of facts and figures which are not homogeneous in character. For example, the ages of husbands are to be compared with the corresponding ages of wives but the ages of husbands cannot be compared with the heights of trees.
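The remark under the first characteristic, that data on immeasurable facts are obtained by counting the individuals in each category, can be illustrated with a minimal Python sketch. The observations below are hypothetical and serve only to show how an attribute such as smoking habit becomes a set of numerical figures.

```python
from collections import Counter

# Hypothetical observations of a non-measurable attribute (smoking habit).
# These values are illustrative only and are not taken from any real survey.
observations = [
    "smoker", "non-smoker", "non-smoker",
    "smoker", "non-smoker", "non-smoker",
]

# Counting the individuals in each category turns the attribute into numbers.
counts = Counter(observations)

for category, number in counts.items():
    print(category, number)
```

The resulting counts are numerical, form an aggregate and can be compared with each other, which is what allows attribute data to be treated as statistics.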
Example:
Would you regard the following information as statistics? Explain by giving reasons.
(i) The height of a person is 160 cm.
(ii) The height of Ram is 165 cm and of Shyam is 155 cm.
(iii) Ram is taller than Shyam.
(iv) Ram is taller than Shyam by 10 cm.
(v) The height of Ram is 165 cm and the weight of Shyam is 55 kg.
Solution:
Each of the above statements should be examined with reference to the following conditions:
(a) Whether the information is presented as an aggregate of numerical figures
(b) Whether the numerical figures are homogeneous or comparable
(c) Whether the numerical figures are affected by a multiplicity of factors
On examination of the given information in the light of these conditions we find that only the information given by statement (ii) can be regarded as statistics. It should be noted that condition (c) will be satisfied almost invariably. In order to illustrate the circumstances in which this condition is not satisfied, we assume that the relation between quantity demanded and price of a commodity is given by the mathematical equation q = 100 - 10p and that the quantity demanded at various prices, using this equation, is shown in a table.
[Table not reproduced: quantity demanded (q) at various prices (p), computed from q = 100 - 10p]

The above information cannot be regarded as statistics because here quantity demanded is affected by only one factor, i.e., price and not by a multiplicity of factors. Contrary to this, the figures of quantity demanded obtained from a market at these very prices are to be regarded as statistics.
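A schedule of this kind can be generated directly from the equation q = 100 - 10p. The following Python sketch does so; the price values are assumptions chosen for illustration, since the prices used in the original table are not available here.

```python
def quantity_demanded(price):
    """Quantity demanded under the relation q = 100 - 10p given in the text."""
    return 100 - 10 * price

# Hypothetical prices, assumed for illustration only.
prices = [1, 2, 3, 4, 5]

# Each quantity is fully determined by its price: one factor, no other causes.
table = [(p, quantity_demanded(p)) for p in prices]

print("p    q")
for p, q in table:
    print(f"{p:<4} {q}")
```

Every figure in such a table follows exactly from the price, so repeating the calculation always gives the same numbers. Quantities observed in an actual market at the same prices would also reflect income, tastes, prices of other goods and so on, which is why those figures qualify as statistics and these do not.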
Statistics as a Science
The use of the word 'STATISTICS' in singular form refers to a science which provides methods of collection, analysis and interpretation of statistical data. Thus, statistics as a science is defined on the basis of its functions and different scholars have defined it in a different way. In order to know about various aspects of statistics, we now state some of these definitions.
"Statistics is the science of counting.”
- A.L. Bowley
"Statistics may rightly be called the science of averages.”
- A.L. Bowley
"Statistics is the science of measurement of social organism regarded as a whole in all its manifestations.”
- A.L. Bowley
"Statistics is the science of estimates and probabilities.”
- Boddington
All of the above definitions are incomplete in one sense or the other because each considers only one aspect of statistics. According to the first definition, statistics is the science of counting. However, we know that if the population or group under investigation is large, we do not count but obtain estimates.
The second definition, viz. statistics is the science of averages, covers only one aspect, i.e., measures of average, but besides this there are other measures used to describe a given set of data.
The third definition limits the scope of statistics to social sciences only. Bowley himself realised this limitation and admitted that the scope of statistics is not confined to this area only.
The fourth definition considers yet another aspect of statistics. Although the use of estimates and probabilities has become very popular in modern statistics, there are other techniques as well which are also very important. The following definitions cover some more but not all aspects of statistics.
"The science of statistics is the method of judging collective, natural or social phenomena from the results obtained by the analysis or enumeration or collection of estimates.”
- W.I. King
"Statistics or statistical method may be defined as collection, presentation, analysis and interpretation of numerical data.”
- Croxton and Cowden
This is a simple and comprehensive definition of statistics which implies that statistics is a scientific method.
"Statistics is a science which deals with collection, classification and tabulation of numerical facts as the basis for the explanation, description and comparison of phenomena.”
- Lovitt
"Statistics is the science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry.”
- Seligman
The definitions given by Lovitt and Seligman are similar to the definition of Croxton and Cowden except that they regard statistics as a science while Croxton and Cowden have termed it as a scientific method.
With the development of the subject of statistics, the definitions given above have also become outdated. In the last few decades the discipline of drawing conclusions and making decisions under uncertainty has grown, and it is proving to be very helpful to decision-makers, particularly in the field of business. Although various definitions have been given which include this aspect of statistics, we shall now give the definition given by Spiegel to reflect this new dimension of statistics.
"Statistics is concerned with scientific method for collecting, organizing, summarizing, presenting and analyzing data as well as drawing valid conclusions and making reasonable decisions on the basis of such analysis.”
On the basis of the above definitions we can say that statistics, in the singular sense, is a science which consists of various statistical methods that can be used for collection, classification, presentation and analysis of data relating to social, political, natural, economic, business or any other phenomena. The results of the analysis can be used further to draw valid conclusions and to make reasonable decisions in the face of uncertainty.
Statistics as a Science different from Natural Sciences
Science is a body of systematized knowledge developed by generalizations of relations based on the study of cause and effect. These generalized relations are also called the laws of science. For example, there are laws in physics, chemistry, statistics, mathematics, etc. It is obvious from this that statistics is also a science like any other natural science. The basic difference between statistics and other natural sciences lies in the difference in conditions under which its experiments are conducted.
Whereas the experiments in natural sciences are done in a laboratory, under more or less controlled conditions, the experiments in statistics are conducted under uncontrolled conditions. Consider, for example, the collection of data regarding expenditure of households in a locality. There may be a large number of factors affecting expenditure and some of these factors might be different for different households.
Due to these reasons, statistics is often termed as a non-experimental science while natural sciences are termed as experimental sciences. We may note here that social sciences like economics, business, sociology, geography, political science, etc., belong to the category of non-experimental science and thus, the laws and methods of statistics can be used to understand and analyze the problems of these sciences also.
Statistics as a Scientific Method
We have seen above that statistics, as a non-experimental science, can be used to study and analyze various problems of social sciences. It may, however, be pointed out that there may be situations even in natural sciences where conducting an experiment under hundred per cent controlled conditions is practically impossible. Statistics, under such conditions, finds its use in natural sciences like physics, chemistry, etc.
In view of the uses of statistics in almost all the disciplines of natural as well as social sciences, it will be more appropriate to regard it as a scientific method rather than a science. Statistics as a scientific method can be divided into the following two categories:
(a) Theoretical Statistics and (b) Applied Statistics
Theoretical Statistics: Theoretical statistics can be further sub-divided into the following three categories:
Descriptive Statistics: All those methods which are used for the collection, classification, tabulation and diagrammatic presentation of data, and the methods of calculating average, dispersion, correlation and regression, index numbers, etc., are included in descriptive statistics.
Inductive Statistics: These are the methods that deal with generalizations, predictions, estimations and decisions from data initially presented. The techniques of forecasting are also included in inductive statistics.
Inferential Statistics: These methods are used to make claims about the populations that give rise to the data we collect. This requires that we go beyond the data available to us. Consequently, the claims we make about populations are always subject to error; hence the term “inferential statistics” and not deductive statistics. This kind of statistics enables us to make confident decisions in the face of uncertainty.

Applied Statistics: It consists of the application of statistical methods to practical problems. Design of sample surveys, techniques of quality control, decision-making in business, etc., are included in applied statistics.
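The difference between describing the data in hand and making a claim about the population behind it can be sketched in a few lines of Python. The sample of heights below is hypothetical, and the interval uses a normal approximation, which is an assumption made here for simplicity; for a sample this small a t-distribution would normally be preferred.

```python
import math
import statistics

# Hypothetical sample: heights (in cm) of 10 students. Illustrative data only.
heights = [150, 152, 155, 158, 160, 160, 162, 165, 168, 170]

# Descriptive statistics: summarise the data actually collected.
mean_height = statistics.mean(heights)      # measure of average
median_height = statistics.median(heights)  # another measure of average
spread = statistics.stdev(heights)          # sample standard deviation (dispersion)

# Inferential statistics: go beyond the sample to the population.
# Approximate 95% interval for the population mean (normal approximation assumed).
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * spread / math.sqrt(len(heights))
interval = (mean_height - margin, mean_height + margin)

print("mean:", mean_height)
print("median:", median_height)
print("standard deviation:", round(spread, 2))
print("approximate 95% interval for the population mean:", interval)
```

The mean, median and standard deviation are statements about these ten students only. The interval is a claim about the wider population and is therefore subject to error, which is the sense in which inferential statistics goes beyond the data available.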

Is statistics a Science or an Art?
Statistics is both a science and an art in the sense that statistics affects everybody and touches life at many points. Statistics is an art in the sense that it is concerned with the skill of collecting and handling data for the purpose of achieving a given objective, e.g. the formulation of future policies that are more reliable. By nature, the science of statistics is less precise than the physical and natural sciences.
This is because statistics deals with variables whose individual effects cannot be studied separately. In fact, science is only knowledge, whereas art is action. As such, statistics is an art. Based on the above analysis, statistics can claim to be both a science and an art.
