Posts

Showing posts with the label 23-2

Computer Applications and Biostatistics

Computer application in biology is a complex blend of two distinct scientific disciplines: computer technology and life science. Biology also depends invariably on statistics, which gives rise to biostatistics. The amalgam of computer applications and biostatistics, combined with several other scientific fields, gave birth to an interdisciplinary field called bioinformatics, which arrives at biological predictions in silico with the help of statistics. Advances in computer technology have enabled giant leaps in biological research. Progress in the basic functions of a computer, such as storing, processing, retrieving, and reusing data, has led to a well-synchronized interplay of biology, computing technology, and statistics. Biological data is extensive and heterogeneous, ranging from text-based genome sequences, geometric and spatial information to patterns, large images and s...

Information and computer technology

The Oxford dictionary defines information technology (IT) as the study and use of electronic systems, especially the combination of computers and telecommunications, for storing, retrieving, and sharing information. In contrast, the Free On-line Dictionary of Computing (FOLDOC) describes the term as commonly used as a synonym for computers and computer technology, while also encompassing other information distribution technologies such as television and telephones. The term is usually used in the context of a business or other enterprise. The term computer technology is usually reserved for the more theoretical, academic aspects of computing. Computer technology has a long history of development, from the ancient digital computation aid, the abacus, to fourth-generation core processors. All the developments are depicted in Table 1 (Morley, 2014; Parsons, 2011; Jain, 1989; webopedia.com). Computer hardware, softwa...

Biostatistics

Statistics is a combination of logic and mathematics. Biostatistics is the science that applies mathematical logic to the analysis and interpretation of biological information and observations. Unlike other scientific disciplines, statistics is not a body of substantive knowledge but a body of methods for obtaining knowledge. It deals only with numerical data. Information supported by numerical facts is precise, and hence statistical information is more meaningful and measurable than non-numerical information. For example, the statement “There is a 90% chance of an event occurring” is more precise than the statement “There is a very good chance of the event occurring”. In the ordinary sense, the term 'statistics' refers to numerical figures or statistical data, whereas in the wider sense it refers to various statistical methods. Statistics is as old as human society itself. Its origin can be traced back to the pre-Christian era. T...

Types of Data - Biostatistics

Observations recorded during research constitute data. There are three types of data: nominal, ordinal, and interval data. The statistical methods used for analysis depend mainly on the type of data. Nominal data: This is synonymous with categorical data, where data is simply assigned “names” or categories based on the presence or absence of certain attributes or characteristics, without any ranking between the categories. For example, bacterial culture studies are categorized by growth as positive or negative on a particular growth medium. It also includes binomial data, which has two possible outcomes. For example, the outcome of cancer may be death or survival, and therapy with drug ‘X’ will show improvement or no improvement at all. Ordinal data: This is also called ordered categorical or graded data. Generally, this type of data is expressed as scores or ranks. There is a natural order among the categories, and they can be ranked or arranged in order. For exampl...

Measures of Central Tendencies - Biostatistics

Mean, median, and mode are the three measures of central tendency. The mean is the most common measure of central tendency and the most widely used in calculating averages. It is least affected by sampling fluctuations. The mean of a number of individual values is generally nearer the true value than any individual value itself. Means show less variation than individual values, and hence give confidence in using them. The mean is calculated by adding up the individual values (Σx) and dividing the sum by the number of items (n). Suppose the heights of 7 children are 60, 70, 80, 90, 90, 100, and 110 cm. The sum of the heights of the 7 children is 600 cm, so mean (X̄) = Σx/n = 600/7 = 85.71 cm. The median is an average obtained by taking the middle value of a set of data arranged from lowest to highest (or vice versa). Thus, 50% of the observations have values smaller than the median and 50% have values larger than the median. It is used for sco...
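The height example above can be reproduced with a minimal Python sketch. It assumes only the standard library `statistics` module; the variable names are illustrative, and the median and mode values are computed here rather than quoted from the post.

```python
import statistics

# Heights of the 7 children from the example, in cm.
heights_cm = [60, 70, 80, 90, 90, 100, 110]

# Mean: sum of the values divided by the number of values.
mean_height = statistics.mean(heights_cm)

# Median: middle value of the ordered data.
median_height = statistics.median(heights_cm)

# Mode: the most frequently occurring value.
mode_height = statistics.mode(heights_cm)

print(f"sum    = {sum(heights_cm)} cm")
print(f"mean   = {mean_height:.2f} cm")
print(f"median = {median_height} cm")
print(f"mode   = {mode_height} cm")
```

With an odd number of observations, the median is simply the fourth of the seven ordered values.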

Types of Distribution - Biostatistics

Though this universe is full of uncertainty and variability, a large set of experimental or biological observations often tends towards a normal distribution. This behavior of data is the key to the whole of inferential statistics. There are two types of distribution. Gaussian/normal distribution: If data is symmetrically distributed on both sides of the mean and forms a bell-shaped curve in a frequency distribution plot, the distribution is called normal or Gaussian. The noted statistician Professor Gauss developed this, and it was therefore named after him. The normal curve describes the ideal distribution of continuous values, e.g. heart rate, blood sugar level, and Hb % level. Whether data is normally distributed can be checked by entering the raw study data directly into statistical software and applying a distribution test. Statistical treatment of data can generate a number of useful measurements, the most...
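As a rough illustration of the symmetric, bell-shaped curve described above, the following Python sketch computes how much of a normal distribution lies within 1, 2, and 3 standard deviations of the mean. It uses `statistics.NormalDist` from the standard library; the heart-rate mean and SD are hypothetical values chosen only for the example, not figures from the post.

```python
from statistics import NormalDist

# Hypothetical heart-rate distribution (beats per minute); values are assumed.
heart_rate = NormalDist(mu=72, sigma=8)

def fraction_within(dist, k):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(heart_rate, k):.4f}")

# Symmetry: the areas below and above the mean are equal.
print(f"area below the mean: {heart_rate.cdf(heart_rate.mean):.2f}")
```

The fractions do not depend on the chosen mean and SD; any normal distribution gives the same proportions.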

Applications of Standard Error of Mean

Applications of SEM include: determining whether or not a sample is drawn from the same population when its mean is known, and working out the limits of desired confidence within which the population mean should lie. For example, take the fasting blood sugar of 200 lawyers. Suppose the mean is 90 mg% and SD = 8 mg%. For 95% confidence limits, n = 200 and SD = 8; hence SEM = SD/√n = 8/√200 = 8/14.14 ≈ 0.566. Hence, mean fasting blood sugar + 2 SEM = 90 + (2 × 0.566) = 91.13, while mean fasting blood sugar - 2 SEM = 90 - (2 × 0.566) = 88.87. So the confidence limits for the mean fasting blood sugar of the lawyer population are 88.87 to 91.13 mg%. If the mean fasting blood sugar of another group of lawyers is 80 mg%, we can say that the group is not from the same population.
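The fasting blood sugar example can be checked with a short Python sketch. The function names are illustrative, and the multiplier of 2 follows the approximation used in the example. The SEM is not rounded before the limits are computed, so the last digit may differ slightly from a hand calculation.

```python
import math

def sem(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

def approx_confidence_limits(mean, sd, n, multiplier=2.0):
    """Return (lower, upper) limits as mean -/+ multiplier * SEM."""
    half_width = multiplier * sem(sd, n)
    return mean - half_width, mean + half_width

# Values from the example: 200 lawyers, mean 90 mg%, SD 8 mg%.
lower, upper = approx_confidence_limits(mean=90, sd=8, n=200)

print(f"SEM    = {sem(8, 200):.3f} mg%")
print(f"limits = {lower:.2f} to {upper:.2f} mg%")

# A mean of 80 mg% falls outside these limits.
print("80 mg% inside limits:", lower <= 80 <= upper)
```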

Confidence Interval (CI)

Confidence limits are the two extremes of a measurement within which 95% of observations would lie. They describe the limits within which 95% of the mean values, if determined in similar experiments, are likely to fall. The value of ‘t’ corresponding to a probability of 0.05 for the appropriate degrees of freedom is read from the table of the t distribution. Multiplying this value by the standard error gives the 95% confidence limits for the mean, as per the formulas below. Lower confidence limit = mean - (t0.05 × SEM). Upper confidence limit = mean + (t0.05 × SEM). If n > 30, the interval M ± 2(SEM) will include the population mean with a probability of 95%, and the interval M ± 2.58(SEM) will include the population mean with a probability of 99%. These intervals are, therefore, called the 95% and 99% confidence intervals, respectively. The important difference between the ‘p’ value and the confidence interval is that the confidence interval represents clinical significance, ...
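The multipliers used for large samples can be approximated in Python. This is a sketch under one stated assumption: it uses the standard normal distribution (`statistics.NormalDist`) instead of the t table, which is reasonable only when n is large. The function names are illustrative, and the mean and SEM are taken from the fasting blood sugar example elsewhere on this page.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier for a given confidence level."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

def confidence_interval(mean, sem, confidence=0.95):
    """Large-sample interval: mean -/+ z * SEM (normal approximation)."""
    z = z_multiplier(confidence)
    return mean - z * sem, mean + z * sem

print(f"95% multiplier: {z_multiplier(0.95):.2f}")
print(f"99% multiplier: {z_multiplier(0.99):.2f}")

# Mean 90 mg% and SEM of about 0.566 mg% (n = 200, SD = 8).
for level in (0.95, 0.99):
    low, high = confidence_interval(90, 0.566, level)
    print(f"{level:.0%} CI: {low:.2f} to {high:.2f} mg%")
```

For small samples, the value read from the t table for the appropriate degrees of freedom is larger than these multipliers, so the interval is wider.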

Null Hypothesis

The primary object of statistical analysis is to find out whether the effect produced by a compound under study is genuine and not due to chance. Hence, the analysis usually includes a test of statistical significance. The first step in such a test is to state the null hypothesis. In the null hypothesis (statistical hypothesis), we assume that there is no difference between the two groups. The alternative hypothesis (research hypothesis) states that there is a difference between the two groups. For example, a new drug ‘A’ is claimed to have analgesic activity and we want to test it against a placebo. In this study, the null hypothesis would be ‘drug A is not better than the placebo.’ The alternative hypothesis would be ‘there is a difference between new drug ‘A’ and placebo.’ When the null hypothesis is accepted, the difference between the two groups is not significant. It means both samples were drawn from a single population, and the difference ob...

Level of Significance

If the probability (P) of an event or outcome is high, we say it is not rare or not uncommon. But if P is low, we say it is rare or uncommon. In biostatistics, a rare event or outcome is called significant, whereas a non-rare event is called non-significant. The ‘P’ value at which we regard an event or outcome as rare enough to be considered significant is called the significance level. In research, a P value less than 0.05 (5%) is most commonly taken as the significance level. However, on justifiable grounds, we may adopt a different standard such as P < 0.01 (1%). Whenever possible, it is better to give the actual P value instead of P < 0.05. Even if we have found the true value or population value from a sample, we cannot be fully confident, as we are dealing with only a part of the population, however big the sample may be. We would be wrong in only 5% of cases if we place the population value within 95% confidence limits. Significant or in...

Outliers

Sometimes, when we analyze data, one value is very extreme compared with the others. Such a value is referred to as an outlier. This could be due to one of two reasons. Firstly, the value may have occurred by chance; in that case, we should keep it in the final analysis, as it comes from the same distribution. Secondly, it may be due to a mistake, such as a typographical or measurement error. In such cases, the value should be deleted to avoid invalid results.

One tailed and two tailed Test

When comparing two groups of continuous data, the null hypothesis is that there is no real difference between the groups (A and B). The alternative hypothesis is that there is a real difference between the groups. This difference could be in either direction, e.g. A > B or A < B. When there is some sure way to know in advance that the difference could only be in one direction, e.g. A > B, and there are good grounds to consider only one possibility, the test is called a one-tailed test. Whenever we consider both possibilities, the test of significance is known as a two-tailed test. For example, when we know that English boys are taller than Indian boys, the result will lie at one end, that is, in one tail of the distribution; hence a one-tailed test is used. When we are not absolutely sure of the direction of the difference, which is usual, it is always better to use a two-tailed test. For example, a new drug ‘X’ is supposed to have an antihypertens...
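The practical difference between the two tests can be sketched in Python. This example assumes a test statistic that follows the standard normal distribution (a z statistic), and the value z = 1.8 is hypothetical; neither comes from the post. Function names are illustrative.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def one_tailed_p(z):
    """P value when only the direction A > B is considered."""
    return 1 - STANDARD_NORMAL.cdf(z)

def two_tailed_p(z):
    """P value when a difference in either direction is considered."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))

z = 1.8       # hypothetical test statistic
alpha = 0.05  # significance level

p_one = one_tailed_p(z)
p_two = two_tailed_p(z)

print(f"one-tailed P = {p_one:.4f}, significant: {p_one < alpha}")
print(f"two-tailed P = {p_two:.4f}, significant: {p_two < alpha}")
```

For the same statistic, the two-tailed P value is twice the one-tailed value, which is why a one-tailed test should be chosen only when the direction of the difference is known in advance.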

Importance of Sample Size Determination

A sample is a fraction of the universe. Studying the entire universe is the ideal approach. But when it is possible to achieve the same result by studying a fraction of the universe, a sample is taken. In doing so, we save time, manpower, and cost, and at the same time increase efficiency. Hence, an adequate sample size is of prime importance in biomedical studies. If the sample size is too small, the validity of the results is questionable, and the whole study will be a waste. On the other hand, an unnecessarily large sample requires more cost and manpower. It is a misuse of money to enroll more subjects than required. A good small sample is much better than a bad large sample. Hence, an appropriate sample size is both ethical and necessary to produce precise results.

Factors Influencing Sample Size Include

1) Prevalence of a particular event or characteristic: If the prevalence is high, a small sample can be taken, and vice versa. If the prevalence is not known, it can be obtained from a pilot study.
2) Probability level considered for accuracy of the estimate: If we need more safeguard about conclusions on the data, we need a larger sample. Hence, the sample size would be larger when the safeguard is 99% than when it is only 95%. If only a small difference is expected and we need to detect even that small difference, then we need a large sample.
3) Availability of money, material, and manpower.
4) A time-bound study curtails the sample size, as routinely observed with dissertation work in postgraduate courses.
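The effect of the probability level and of the size of the difference (factor 2 above) can be illustrated with a Python sketch. It assumes a commonly used large-sample formula for estimating a mean, n = (z × SD / E)², where E is the acceptable margin of error; this formula is not given in the excerpt above, and the SD and margins below are hypothetical values.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sd, margin, confidence=0.95):
    """Smallest n such that z * SD / sqrt(n) <= margin (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd / margin) ** 2)

sd = 8  # hypothetical standard deviation, e.g. from a pilot study

n_95 = sample_size_for_mean(sd, margin=1.0, confidence=0.95)
n_99 = sample_size_for_mean(sd, margin=1.0, confidence=0.99)
n_small_margin = sample_size_for_mean(sd, margin=0.5, confidence=0.95)

print(f"95% safeguard, margin 1.0: n = {n_95}")
print(f"99% safeguard, margin 1.0: n = {n_99}")
print(f"95% safeguard, margin 0.5: n = {n_small_margin}")
```

Raising the safeguard from 95% to 99% increases the required sample, and halving the margin roughly quadruples it, in line with the points in the list.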

Sample Size Determination and Variance Estimate

To calculate sample size, the formula requires knowledge of the standard deviation or variance, but the population variance is unknown. Therefore, the standard deviation has to be estimated. Frequently used sources for estimating the standard deviation are: i. A pilot or preliminary sample may be drawn from the population, and the variance computed from this sample may be used as an estimate. Observations used in the pilot sample may be counted as part of the final sample. ii. Estimates of standard deviation may be available from previous or similar studies, but sometimes they may not be correct. Calculation of Sample Size: Calculation of sample size plays a key role in any research. Before calculating the sample size, the following five points are to be considered very carefully. First of all, we have to assess the m...

How to Choose an Appropriate Statistical Test

There are a number of tests in biostatistics, but the choice depends mainly on the characteristics of the data and the type of analysis. Sometimes we need to find the difference between means or medians, or the association between variables. The number of groups used in a study may vary; therefore, the study design also varies. Hence, in such situations, we have to decide carefully which test is the most appropriate. An inappropriate test will lead to invalid conclusions. Statistical tests can be divided into parametric and non-parametric tests. If the variables follow a normal distribution, the data can be subjected to a parametric test, and for a non-Gaussian distribution, we should apply a non-parametric test. The statistical test should be decided at the start of the study. Following are the different parametric tests used in the analysis of various types of data. 1) Student's ‘t’ Test: Mr. W. S. Gosset, a civil serv...