# Are statistical functions used in research papers

The first subject received Treatment 1, and had Outcome 1. X and Y are the values of two measurements on each subject. We were unable to get a measurement for Y on the second subject, or for X on the last subject, so these cells are blank.

- The input range is a rectangular arrangement of cells, with rows representing levels of one factor, columns the levels of the other factor, and the cell contents the one value in that cell. With this arrangement you can do any of the analyses discussed here, and many others as well, without having to sort or rearrange your data in any way.
- In our small test, we had to sort the rows in order to do the t-test, and copy some cells in order to get labels for the output.

We used this data to do some simple analyses and compared the results with a standard statistical package. The comparison considered the accuracy of the results as well as the ease with which the interface could be used for bigger data sets.

The Data Analysis tool pack includes a variety of choices, including simple descriptive statistics, t-tests, correlations, 1 or 2-way analysis of variance, regression, etc. Two other Excel features are useful for certain analyses, but the Data Analysis tool pack is the only one that provides reasonably complete tests of statistical significance. Pivot Table in the Data menu can be used to generate summary tables of means, standard deviations, counts, etc.

Also, you could use functions to generate some statistical measures, such as a correlation coefficient. Functions generate a single number, so using functions you will likely have to combine bits and pieces to get what you want.

Even so, you may not be able to generate all the parts you need for a complete analysis.

In order to check a variety of statistical tests, we chose the following tasks:

- Get means and standard deviations of X and Y for the entire group, and for each treatment group.
- Get the correlation between X and Y.
- Do a two sample t-test to test whether the two treatment groups differ on X and Y.
- Do a paired t-test to test whether X and Y are statistically different from each other.
- Compare the number of subjects with each outcome by treatment group, using a chi-squared test.


All of these tasks are routine for a data set of this nature, and all of them could easily be done using any of the above-listed statistical packages.

Look in the Tools menu. If you do not have a Data Analysis item, you will need to install the Data Analysis tools. Search Help for "Data Analysis Tools" for instructions.

## Missing Values

A blank cell is the only way for Excel to deal with missing data. If you have any other missing value codes, you will need to change them to blanks.
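Recoding other missing-value codes to blanks can be done before the data ever reaches Excel. The following is a minimal Python sketch, not part of the original workflow. The codes `NA`, `-99` and `.`, the sample rows, and the helper name `blank_missing` are all hypothetical; substitute whatever your data actually uses.

```python
import csv
import io

# Hypothetical missing-value codes (an assumption, not from the article).
MISSING_CODES = {"NA", "-99", "."}


def blank_missing(rows, codes=MISSING_CODES):
    """Return a copy of rows with every missing-value code replaced by ''."""
    return [["" if cell.strip() in codes else cell for cell in row] for row in rows]


# Hypothetical rows as they might be read from a CSV export.
rows = [
    ["Treatment", "X", "Y"],
    ["1", "10.5", "NA"],
    ["2", "-99", "7.2"],
]

cleaned = blank_missing(rows)

# Write the cleaned rows as CSV text; empty fields open as blank cells.
buffer = io.StringIO()
csv.writer(buffer).writerows(cleaned)
csv_text = buffer.getvalue()
```

Matching is done on whole cell contents only, so a legitimate value such as `-99.5` is left alone.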

## Data Arrangement

Different analyses require the data to be arranged in various ways.


If you plan on a variety of different tests, there may not be a single arrangement that will work. You will probably need to rearrange the data several ways to get everything you need.

The typical dialog box will have the following items:

- Input Range - Type the upper left and lower right corner cells. You can only choose adjacent rows and columns. Unless there is a checkbox for grouping data by rows or columns (and there usually is not), all the data is considered as one glop.
- Labels - There is sometimes a box you can check off to indicate that the first row of your sheet contains labels. If you have labels in the first row, check this box, and your output MAY be labeled with your label. Then again, it may not.
- Output location - New Sheet is the default. Or, type in the cell address of the upper left corner of where you want to place the output in the current sheet. New Workbook is another option, which I have not tried. Ramifications of this choice are discussed below.
- Other items, depending on the analysis.

Output location: The output from each analysis can go to a new sheet within your current Excel file (this is the default), or you can place it within the current sheet by specifying the upper left corner cell where you want it placed. Either way is a bit of a nuisance. If each output is in a new sheet, you end up with lots of sheets, each with a small bit of output.

You will want to make this column wide in order to be able to read the labels. But if a simple Frequency output is right underneath, then the column displaying the values being counted, which may just contain small integers, will also be wide.

## Results of Analyses

### Descriptive Statistics

The quickest way to get means and standard deviations for an entire group is using Descriptives in the Data Analysis tools. You can choose several adjacent columns for the Input Range (in this case the X and Y columns), and each column is analyzed separately. The labels in the first row are used to label the output, and the empty cells are ignored. If you have more, non-adjacent columns you need to analyze, you will have to repeat the process for each group of contiguous columns.

The procedure is straightforward, can manage many columns reasonably efficiently, and empty cells are treated properly.

Getting the means and standard deviations of X and Y for each treatment group requires the use of Pivot Tables (unless you want to rearrange the data sheet to separate the two groups). Finally, drag X in one more time, leaving it as Count of X. This will give us the average, standard deviation and number of observations in each treatment group for X.

- If you do frequencies on lots of variables, you will have difficulty knowing which frequency belongs to which column of data;
- This is because each procedure requires that the data be arranged in a particular way, often different from the way another procedure wants the data arranged;
- In all fairness, it was never intended to be one.

Do the same for Y, so we will get the average, standard deviation and number of observations for Y also. This will put a total of six items in the Data box (three for X and three for Y).
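Outside Excel, the same six numbers per treatment group (mean, standard deviation and count, for X and for Y) can be sketched in a few lines of Python. This is an illustration only: the data values are hypothetical, chosen to mimic the layout described earlier (Y missing for the second subject, X missing for the last), and the helper names `describe` and `describe_by_group` are made up for this example.

```python
from statistics import mean, stdev

# Hypothetical data: (treatment, X, Y); None marks an empty cell.
data = [
    (1, 10.0, 9.0), (1, 12.0, None), (1, 11.0, 10.5), (1, 14.0, 13.0), (1, 9.0, 8.5),
    (2, 13.0, 12.0), (2, 15.0, 14.5), (2, 12.0, 11.0), (2, 16.0, 15.0), (2, None, 13.0),
]


def describe(values):
    """Mean, sample standard deviation and count, ignoring empty cells."""
    vals = [v for v in values if v is not None]
    return {"mean": mean(vals), "sd": stdev(vals), "n": len(vals)}


def describe_by_group(rows):
    """Summarize X and Y separately within each treatment group."""
    summary = {}
    for group in sorted({row[0] for row in rows}):
        members = [row for row in rows if row[0] == group]
        summary[group] = {
            "X": describe(row[1] for row in members),
            "Y": describe(row[2] for row in members),
        }
    return summary


summary = describe_by_group(data)
```

Each variable keeps its own count, so a subject missing only Y still contributes to the X statistics.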

As you can see, if you want to get a variety of descriptive statistics for several variables, the process will get tedious. A statistical package lets you choose as many variables as you wish for descriptive statistics, whether or not they are contiguous. You can get the descriptive statistics for all the subjects together, or broken down by a categorical variable such as treatment.

You can select the statistics you want to see once, and it will apply to all variables chosen.

### Correlations

Using the Data Analysis tools, the dialog for correlations is much like the one for descriptives - you can choose several contiguous columns, and get an output matrix of all pairs of correlations. Empty cells are ignored appropriately. The output does NOT include the number of pairs of data points used to compute each correlation (which can vary, depending on where you have missing data), and does not indicate whether any of the correlations are statistically significant.
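The number of usable pairs is easy to report alongside the correlation itself. The sketch below is a plain-Python illustration under stated assumptions: the data values are hypothetical, the function name `pearson_with_n` is invented here, and the function returns the t statistic for testing whether the correlation is zero (with n - 2 degrees of freedom) rather than a p-value, because the Python standard library has no t distribution.

```python
from math import sqrt


def pearson_with_n(xs, ys):
    """Pearson r over rows where both values are present, plus n and t."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    r = sxy / sqrt(sxx * syy)
    # t statistic for H0: correlation = 0, on n - 2 degrees of freedom.
    # Undefined when |r| is exactly 1 (division by zero).
    t = r * sqrt((n - 2) / (1 - r * r))
    return r, n, t


# Hypothetical measurements; None marks an empty cell.
x = [10.0, 12.0, 11.0, 14.0, 9.0, 13.0, 15.0, 12.0, 16.0, None]
y = [9.0, None, 10.5, 13.0, 8.5, 12.0, 14.5, 11.0, 15.0, 13.0]

r, n_pairs, t_stat = pearson_with_n(x, y)
```

With these ten rows, two are dropped because one of the measurements is missing, so `n_pairs` is 8.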

If you want correlations on non-contiguous columns, you would either have to include the intervening columns, or copy the desired columns to a contiguous location. A statistical package would permit you to choose non-contiguous columns for your correlations.


The output would tell you how many pairs of data points were used to compute each correlation, and which correlations are statistically significant.

### Two-Sample T-test

This test can be used to check whether the two treatment groups differ on the values of either X or Y.

- Using a statistical program, the data would normally be arranged with the rows representing the subjects, and the columns representing variables as they are in our sample data;
- How easily do the above procedures scale to a larger problem?
- This would become a serious drawback if you had more than a handful of columns, even if you use cut and paste or macros to reduce the work;
- Each time you move one column to the right of the original frequency cells, the column to be counted is shifted right from the first column you counted;
- However, it does not tolerate any empty cells anywhere in the input ranges, and you are limited to 16 independent variables.

Be sure to take all the other columns along with treatment, so that the data for each subject remains intact. Do not include the row with the labels, because the second group does not have a label row.

Therefore your output will not be labeled to indicate that this output is for X. The empty cells are ignored, and other than the problems with labeling the output, the results are correct.
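For contrast, a script can split the rows by treatment without any sorting or copying. The sketch below is hypothetical and makes one assumption the text does not state: it computes the pooled-variance (equal variances) form of the two-sample t statistic. The function name `two_sample_t` and the sample rows are invented for illustration, and only the t statistic and degrees of freedom are returned, since the standard library cannot supply the p-value.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(rows, value_index, group_index=0):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    groups = {}
    for row in rows:
        value = row[value_index]
        if value is not None:  # skip empty cells
            groups.setdefault(row[group_index], []).append(value)
    # Expects exactly two distinct group labels.
    (_, a), (_, b) = sorted(groups.items())
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical unsorted rows: (treatment, X); None marks an empty cell.
rows = [(1, 1.0), (2, 3.0), (1, 2.0), (2, 4.0), (1, 3.0), (2, 5.0), (2, None)]

t_stat, df = two_sample_t(rows, value_index=1)
```

Calling the function again with a different `value_index` tests another variable on the same rows.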

A statistical package would do this task without any need to sort the data or copy it to another column, and the output would always be properly labeled to the extent that you provide labels for your variables and treatment groups. It would also allow you to choose more than one variable at a time for the t-test.

### Paired t-test

The paired t-test is a method for testing whether the difference between two measurements on the same subject is significantly different from 0.

In this example, we wish to test the difference between X and Y measured on the same subject. The important feature of this test is that it compares the measurements within each subject. If you scan the X and Y columns separately, they do not look obviously different. But if you look at each X-Y pair, you will notice that in every case, X is greater than Y. The paired t-test should be sensitive to this difference.

In the two cases where either X or Y is missing, it is not possible to compare the two measures on a subject. Hence, only 8 rows are usable for the paired t-test. When you run the paired t-test on this data, you get a t-statistic of 0. The test does not find any significant difference between X and Y.

Looking at the output more carefully, we notice that it says there are 9 observations. As noted above, there should only be 8. It appears that Excel has failed to exclude the observations that did not have both X and Y measurements. To get the correct results, copy X and Y to two new columns and remove the data in the cells that have no value for the other measure. Now re-run the paired t-test. This time the t-statistic is 6.

- Now select enough empty cells in one column to store the results - 4 in this example, even if the current column only has 2 values.
- It does not get better when you try to do more. Therefore, if you have any empty cells, you will need to copy all the columns involved in the regression to new columns, and delete any rows that contain any empty cells.
- The Regression procedure in the Data Analysis tools lets you choose one column as the dependent variable, and a set of contiguous columns for the independents.

The conclusion is completely different! Of course, this is an extreme example. But the point is that Excel does not calculate the paired t-test correctly when some observations have one of the measurements but not the other. Although it is possible to get the correct result, you would have no reason to suspect the results you get unless you are sufficiently alert to notice that the number of observations is wrong.

There is nothing in online help that would warn you about this issue. Apparently the functions and the Data Analysis tools are not consistent in how they deal with missing cells. Nevertheless, I cannot recommend the use of functions in preference to the Data Analysis tools, because the result of using a function is a single number - in this case, the 2-tail probability of the t-statistic.

The function does not give you the t-statistic itself, the degrees of freedom, or any number of other items that you would want to see if you were doing a statistical test. A statistical package will correctly exclude the cases with one of the measurements missing, and will provide all the supporting statistics you need to interpret the output.
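Excluding incomplete pairs before computing the test takes only one filtering step. The sketch below uses hypothetical data (not the article's values, so the t-statistic will differ) in which X exceeds Y in every complete pair and two rows are missing one measurement. The name `paired_t` is invented here, and the function reports the t statistic, degrees of freedom and number of pairs used, but no p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(xs, ys):
    """Paired t statistic using only rows where both values are present."""
    diffs = [x - y for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1, n


# Hypothetical measurements; None marks an empty cell.
x = [10.0, 12.0, 11.0, 14.0, 9.0, 13.0, 15.0, 12.0, 16.0, None]
y = [9.0, None, 10.5, 13.0, 8.5, 12.0, 14.5, 11.0, 15.0, 13.0]

t_stat, df, n_pairs = paired_t(x, y)
```

Reporting `n_pairs` next to the statistic makes it obvious when rows have been dropped, or when they should have been and were not.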

### Crosstabulation and Chi-Squared Test of Independence

Our final task is to count the two outcomes in each treatment group, and use a chi-square test of independence to test for a relationship between treatment and outcome.

In order to count the outcomes by treatment group, you need to use Pivot Tables. The Data area should say "Count of Outcome" — if not, double-click on it and select "Count". If you want both counts and percents, you can drag the same variable into the Data area twice, and use it once for counts and once for percents.
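The counting step and the chi-square test can also be sketched directly in Python. Everything here is illustrative: the treatment and outcome values are hypothetical, the function name `chi_square_2x2` is invented, and the calculation is the plain Pearson chi-square without a continuity correction. The p-value formula used is valid only for one degree of freedom, which is the case for a 2 x 2 table.

```python
from collections import Counter
from math import erfc, sqrt


def chi_square_2x2(pairs, treatments=(1, 2), outcomes=(1, 2)):
    """Cross-tabulate (treatment, outcome) pairs and test independence."""
    counts = Counter(pairs)
    table = [[counts[(t, o)] for o in outcomes] for t in treatments]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence; assumes no empty row or column.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return table, chi2, p_value


# Hypothetical (treatment, outcome) pairs for ten subjects.
pairs = [(1, 1)] * 4 + [(1, 2)] + [(2, 1)] + [(2, 2)] * 4

table, chi2, p_value = chi_square_2x2(pairs)
```

The returned `table` plays the role of the Pivot Table counts, with treatments as rows and outcomes as columns.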