Skip to contents

Obtaining Marginal Distributions

Marginal distributions should first be obtained using the get_marginal_distributions() function.

To obtain the marginal distributions for all variables you should only specify the dataset:

To obtain marginal distributions for select variables, you should specify the variables using the variables parameter:

library(RESIDE)
marginals <- get_marginal_distributions(
  IST,
  variables = c(
    "SEX",
    "AGE",
    "ID14",
    "RSBP",
    "RATRIAL",
    "SET14D",
    "DSIDED"
  )
)

Printing the Marginal Distributions Prior to Export

Marginal distributions can be printed when generating marginal distributions using the print parameter:

library(RESIDE)
marginals <- get_marginal_distributions(
  IST,
  print = TRUE
)

Or from a stored marginals object:

library(RESIDE)
marginals <- get_marginal_distributions(IST)
print(marginals)

Exporting Marginal Distributions

Marginal distributions can be exported using the export_marginal_distributions() function, specifying the marginal distributions (generated by `get_marginal_distributions()’) and a folder path:

library(RESIDE)
marginals <- get_marginal_distributions(IST)
export_marginal_distributions(
  marginals,
  folder_path = "/Users/ryan/marginals"
)

This folder should exist and not contain any previously exported marginal distributions. You can create the folder automatically using the create_folder parameter:

library(RESIDE)
marginals <- get_marginal_distributions(IST)
export_marginal_distributions(
  marginals,
  folder_path = "/Users/ryan/marginals",
  create_folder = TRUE
)

Files created by export_marginal_distributions()

The following files will be created by the export_marginal_distributions() function:

  • binary_variables.csv - Contains the marginal distributions for binary variables including:
    • Variable Name
      • Mean
      • Number of Missing Observations
  • categorical_variables.csv Contains the marginal distributions for categorical variables including:
    • Variable Name & Category Name
    • Number of Observations in Each Category
    • NB Missing Observations are coded as a separate category labelled missing.
  • continuous_variables.csv - Contains the marginal distributions for continuous variables including:
    • Variable Name
    • Transformed Mean
    • Transformed Standard Deviation
    • Number of Missing Observations
    • Number of Decimal Places
  • continuous_quantiles.csv - Contains the Quantile mapping to allow for back transformation. For each continuous variable this contains:
    • The original quantile value
    • The transformed quantile value
    • An epsilon value to indicate the amount of thinning applied
  • summary.csv - Contains and overall summary of the dataset including:
    • Number of Rows
    • Number of Columns
    • Variable Names (for validation)

These files should then be sent to the user.

NB If there are no variables of a certain type the corresponding file will not be created.