# Sample Size Considerations

As part of the review of every LTRC study proposal, power calculations will be performed for the primary end point. Reports from the main LTRC database will be used to determine whether enough participants are available to achieve the objectives of a proposed study. The DCC may assist in these calculations, or an external reviewer may submit their own as part of their proposal.

The approach to estimating study size and power depends on the type of study and the type of end point considered. Two types of end points are anticipated in the LTRC studies: 1) categorical end points, such as whether a participant has elevated cytokine levels in a tissue sample, and 2) continuous end points, such as pulmonary function test results or the actual levels of certain proteins in a tissue sample. Most categorical and continuous variables will be collected once per participant.

Statistical power will be assessed for the primary end point(s) and the pre-specified secondary end points of each approved study. The DCC staff recommends adjusting the alpha level for secondary end points to reduce the number of spurious associations that would be detected if an alpha level of 0.05 were used for the analysis of all secondary end points. In prior studies, the DCC has recommended reducing the alpha level for secondary hypotheses to 0.01 as an indicator of statistical evidence, and to 0.001 as an indicator of strong evidence, of an association between the risk factor and the event.
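The effect of lowering the alpha level for secondary end points can be illustrated with a short calculation. The sketch below is not a DCC procedure. It assumes that the secondary tests are independent and that every null hypothesis is true, and the function names and the count of 20 secondary end points are hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of spurious associations when every null hypothesis is true."""
    return n_tests * alpha


def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more spurious associations, assuming independent tests."""
    return 1 - (1 - alpha) ** n_tests


if __name__ == "__main__":
    n_secondary = 20  # hypothetical number of secondary end points
    for alpha in (0.05, 0.01, 0.001):
        expected = expected_false_positives(n_secondary, alpha)
        at_least_one = prob_at_least_one_false_positive(n_secondary, alpha)
        print(f"alpha={alpha}: expected={expected:.2f}, P(at least one)={at_least_one:.3f}")
```

Under these assumptions, 20 secondary end points tested at 0.05 would be expected to yield one spurious association, with roughly a 64% chance of at least one. At 0.01 that chance falls to about 18%, and at 0.001 to about 2%.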

In Section 7.2, two commonly used study size formulas for analyses of continuous and categorical end points are presented. These calculations are based upon the comparison of two groups. Examples include a comparison of COPD participants to a control/comparison group of participants with other types of lung diseases (e.g., a case-control design), or a comparison of participants in the early stages of a specific lung disease to participants in the terminal stages of the same disease. Both of these designs would allow for equal allocation of participants into the two groups, but this is not required. DCC staff will assist the investigators with issues concerning differential versus balanced allocation in the design phase of each LTRC study. The DCC also has methods to estimate study size for comparing three or more groups and for estimating regression coefficients, but these designs are used less frequently and their formulas are not presented here.

All study size estimates and power calculations will account for losses in power due to missing data. Sample size formulas will also account for the possibility that multivariate regression may be used in a study. In this circumstance, it is necessary to ensure that a study includes enough participants for the regression analysis to be carried out reliably. In general, if one adheres to a rule of having 10 observations for every regressor (independent variable) anticipated to be included in a regression equation, the design will have adequate numbers to perform the required analyses.
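The 10-observations-per-regressor rule and the allowance for missing data can be combined in a few lines of arithmetic. The following is a minimal sketch, not the DCC's actual method: the function names are hypothetical, and dividing by the expected complete-data fraction is only one common way to inflate a sample size for missing data.

```python
import math


def min_n_for_regression(n_regressors, obs_per_regressor=10):
    """Minimum analyzable sample size under the 10-observations-per-regressor rule."""
    return n_regressors * obs_per_regressor


def inflate_for_missing(n_required, missing_fraction):
    """Inflate a required sample size so that n_required complete records are expected.

    Assumption: records are lost independently at rate missing_fraction.
    """
    if not 0 <= missing_fraction < 1:
        raise ValueError("missing_fraction must be in [0, 1)")
    return math.ceil(n_required / (1 - missing_fraction))


if __name__ == "__main__":
    n_complete = min_n_for_regression(8)               # 8 regressors -> 80 observations
    n_enrolled = inflate_for_missing(n_complete, 0.15)  # 15% missing -> 95 participants
    print(n_complete, n_enrolled)
```

In this hypothetical case, a model with 8 regressors needs at least 80 complete observations, and allowing for 15% missing data raises the enrollment target to 95.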

As stated above, the main difference in performing study size and power calculations for observational studies is the unequal number of participants that fall into the two comparison groups. We have designated these proportions by "a" and "1-a" in all of the study size formulas presented. The other features common to both of the sample size formulas are the critical values that correspond to the alpha level and the power of the test. We have designated these values as Z_{α} and Z_{β}, respectively. "N" is the total sample size necessary for a study. The size of each comparison group can be obtained by multiplying "N" by "a" for one group and "N" by "1-a" for the other group. We have presented the formulas for study size calculations, but all of these formulas can be algebraically rearranged to give the corresponding power calculations.
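To make the notation concrete, the sketch below applies a standard normal-approximation formula for comparing two group means, N = (Z_{α} + Z_{β})² σ² / (a (1-a) δ²), together with its rearrangement for power. This is an assumed form and is not necessarily the formula given in Section 7.2. The symbols δ (the difference in means to be detected) and σ (the common standard deviation), the two-sided test, and the function names are all assumptions made for illustration.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and SD 1


def total_sample_size(delta, sigma, a=0.5, alpha=0.05, power=0.80):
    """Total N for a two-sided comparison of two means with allocation a : (1 - a)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # critical value for the alpha level
    z_beta = _STD_NORMAL.inv_cdf(power)           # critical value for the power
    n = (z_alpha + z_beta) ** 2 * sigma ** 2 / (a * (1 - a) * delta ** 2)
    return math.ceil(n)


def power_for_sample_size(n_total, delta, sigma, a=0.5, alpha=0.05):
    """Same formula solved for Z_beta, then converted to power."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = abs(delta) / sigma * math.sqrt(n_total * a * (1 - a)) - z_alpha
    return _STD_NORMAL.cdf(z_beta)


if __name__ == "__main__":
    n_balanced = total_sample_size(delta=0.5, sigma=1.0)            # 126
    n_unequal = total_sample_size(delta=0.5, sigma=1.0, a=0.25)     # 168
    print(n_balanced, n_unequal)
    print(round(n_unequal * 0.25), round(n_unequal * 0.75))         # group sizes: 42 and 126
    print(round(power_for_sample_size(n_balanced, 0.5, 1.0), 3))
```

Under these assumptions, detecting a difference of half a standard deviation with 80% power at a two-sided alpha of 0.05 takes 126 participants with balanced allocation and 168 with a 25%/75% split, which shows the cost of unequal group sizes.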

One of the most important aspects of the design of the analyses for the LTRC is ensuring that the investigators are able to assess and test sufficient numbers of samples representing the full spectrum of each lung disease studied. Sufficient numbers are required so that an etiological pathway can be constructed for each disease, and so that different diseases can be compared to determine how their pathways are the same or different. This will require stratification to ensure that sufficient numbers of data points are present for each disease and that the samples for each disease are not over-weighted toward specimens collected long after the disease process has started.