Documentation & User Guide

Welcome to the official documentation for the Synthetic Data Generator. This manual provides comprehensive guidance on configuring parameters for various statistical modules to synthesize robust, academically viable datasets.

1. Overview & General Interface

The primary dashboard serves as the central hub for module selection. Users can define global sample sizes, group names (e.g., Control, Experimental), and interdisciplinary variables (Numeric or Categorical).

Home Dashboard

2. Independent Samples T-Test

Designed for cross-sectional studies comparing the means of two distinct groups. Widely utilized in clinical trials (e.g., drug efficacy between Treatment and Placebo groups) and sociological surveys.

  • Group Configuration: Independently set Sample Size (N), Mean (μ), and Standard Deviation (σ) for each group.
  • Distribution: Optionally constrain the generated values to follow a normal distribution.
  • Output: Generates SPSS-compatible outputs including Levene's Test for Equality of Variances and two-tailed significance values (p-values).
T-Test Configuration
T-Test Results
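
The per-group settings can be illustrated with a minimal Python sketch. It is not the generator's actual algorithm, and the names `synthesize_group` and `pooled_t` are hypothetical. It draws normal values, rescales them so the sample mean and SD match the requested μ and σ exactly, and then computes the equal-variance t statistic for the two groups.

```python
import math
import random
import statistics


def synthesize_group(n, mean, sd, seed=None):
    """Return n values whose sample mean and SD equal the targets exactly."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
    m, s = statistics.mean(raw), statistics.stdev(raw)
    # Standardize the draws, then rescale to the requested mean and SD.
    return [mean + sd * (x - m) / s for x in raw]


def pooled_t(a, b):
    """Student's t statistic for two independent samples (equal variances assumed)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Illustrative values only; they are not taken from the tool.
treatment = synthesize_group(30, mean=5.2, sd=1.1, seed=1)
placebo = synthesize_group(30, mean=6.0, sd=1.3, seed=2)
print(round(pooled_t(treatment, placebo), 3))
```

Because the mean and SD are matched exactly, the t statistic is fully determined by N, μ, and σ; the seed only changes the individual values.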

3. Paired Samples T-Test

Employed for longitudinal or crossover studies where the same subjects are measured twice (e.g., Pre-test vs. Post-test). Focuses on synthesizing the mean difference between paired observations.

  • Paired Variables: Configure Mean and SD for 'Paired Var 1' and 'Paired Var 2'.
  • Target Control: Specify the exact number of pairs and target P-value (e.g., <0.05) to generate statistically valid paired differences.
Paired T-Test Configuration
Paired T-Test Results
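
One way to see how a target statistic constrains paired data is the sketch below. It is an assumption-laden illustration, not the tool's method: the function names are hypothetical, and it works from a target t value rather than a target p-value (converting a p-value to t requires the t distribution quantile function, which is left out here). It solves t = mean_diff / (diff_sd / √n) for the mean difference and builds differences with exactly that mean and SD.

```python
import math
import random
import statistics


def synthesize_pairs(n_pairs, pre_mean, pre_sd, diff_sd, target_t, seed=None):
    """Return (pre, post) lists whose paired t statistic equals target_t."""
    rng = random.Random(seed)

    def exact(n, mean, sd):
        # Normal draws rescaled to an exact sample mean and SD.
        raw = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m, s = statistics.mean(raw), statistics.stdev(raw)
        return [mean + sd * (x - m) / s for x in raw]

    # t = mean_diff / (diff_sd / sqrt(n)), solved for mean_diff.
    mean_diff = target_t * diff_sd / math.sqrt(n_pairs)
    pre = exact(n_pairs, pre_mean, pre_sd)
    diffs = exact(n_pairs, mean_diff, diff_sd)
    post = [p + d for p, d in zip(pre, diffs)]
    return pre, post


def paired_t(pre, post):
    """Paired-samples t statistic computed from the post minus pre differences."""
    diffs = [b - a for a, b in zip(pre, post)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


pre, post = synthesize_pairs(25, pre_mean=120.0, pre_sd=10.0,
                             diff_sd=4.0, target_t=2.5, seed=7)
print(round(paired_t(pre, post), 3))
```

Note that this sketch controls the mean and SD of the first variable and of the differences; the SD of the second variable follows from those and is not set directly.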

4. One-Way ANOVA

Utilized when comparing the means of three or more independent groups. The algorithm synthesizes intra-group variance and inter-group differences to meet target F-values.

  • Multi-Group Support: Add dynamic groups (e.g., Control, Model Group, Treatment Group) with specific metrics.
  • Post-Hoc Analysis: Automatically calculates Tukey HSD Post Hoc tests, detailing Mean Differences, Standard Errors, and 95% Confidence Intervals between pairs.
ANOVA Configuration
ANOVA Output Data
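
For reference, the F-value that the module targets is the ratio of between-group to within-group mean squares. The following Python sketch (the function name `one_way_f` is hypothetical) computes it for any list of groups; it shows the standard formula, not the generator's internal search for data that meets a target F.

```python
import statistics


def one_way_f(groups):
    """One-way ANOVA F statistic: MS_between / MS_within."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Inter-group variation: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Intra-group variation: observations around their own group mean.
    ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


control = [1, 2, 3]
model = [2, 3, 4]
treatment = [3, 4, 5]
print(one_way_f([control, model, treatment]))
```

Widening the gaps between group means raises F, while increasing the spread inside each group lowers it.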

5. Two-Way ANOVA

Examines the influence of two independent categorical variables on one continuous dependent variable. Essential for factorial designs to evaluate main effects and interaction effects.

  • Factor Matrix: Set up Factor A and Factor B, and define the parameters for each intersection (cell) group.
  • Interaction Effects: Control significance across rows, columns, and their interaction components to simulate complex research hypotheses.
Two-Way ANOVA Setup
Two-Way ANOVA Interaction Data
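
The relationship between main effects, interaction, and cell means can be sketched as follows. This is an illustrative decomposition under the usual additive model, with hypothetical function names and made-up numbers; it does not describe how the tool stores its factor matrix.

```python
def cell_means(grand_mean, a_effects, b_effects, interaction):
    """Cell mean = grand mean + Factor A effect + Factor B effect + interaction."""
    return [
        [grand_mean + a + b + interaction[i][j] for j, b in enumerate(b_effects)]
        for i, a in enumerate(a_effects)
    ]


def interaction_contrast(m):
    """For a 2x2 design: the Factor B difference at A1 minus the same difference at A2."""
    return (m[0][0] - m[0][1]) - (m[1][0] - m[1][1])


# No interaction: the effect of Factor B is the same at both levels of Factor A.
additive = cell_means(50.0, [-2.0, 2.0], [-1.0, 1.0], [[0.0, 0.0], [0.0, 0.0]])
# With interaction: the effect of Factor B changes across levels of Factor A.
crossed = cell_means(50.0, [-2.0, 2.0], [-1.0, 1.0], [[1.5, -1.5], [-1.5, 1.5]])

print(interaction_contrast(additive), interaction_contrast(crossed))
```

A contrast of zero corresponds to parallel lines in an interaction plot; a non-zero contrast is what an interaction term in the ANOVA table picks up.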

6. Repeated Measures ANOVA

Extends the Paired Samples T-Test to three or more time points. Ideal for longitudinal tracking of the same subjects over extended periods (e.g., baseline, month 1, month 3).

Repeated Measures Setup
Repeated Measures Results
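
A minimal sketch of repeated-measures data is shown below, assuming a simple model in which each subject has a personal offset shared across all time points plus independent measurement noise. The function name and parameters are hypothetical and the model is an assumption, not the generator's documented behavior.

```python
import random
import statistics


def synthesize_repeated(n_subjects, time_means, subject_sd, noise_sd, seed=None):
    """Return one row per subject with one value per time point."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_subjects):
        # The shared offset makes a subject's measurements correlated over time.
        offset = rng.gauss(0.0, subject_sd)
        rows.append([m + offset + rng.gauss(0.0, noise_sd) for m in time_means])
    # Shift each time-point column so its sample mean matches the target exactly.
    for j, target in enumerate(time_means):
        shift = target - statistics.mean([row[j] for row in rows])
        for row in rows:
            row[j] += shift
    return rows


# Illustrative targets for baseline, month 1, and month 3.
data = synthesize_repeated(20, [140.0, 132.0, 128.0],
                           subject_sd=8.0, noise_sd=3.0, seed=3)
print(len(data), len(data[0]))
```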

7. Multiple Linear Regression

A core predictive modeling tool. It synthesizes a continuous Dependent Variable (Y) influenced by multiple Independent Variables (X), which can be continuous, categorical, or ordinal.

  • Target R² Range: Define the desired explanatory power (e.g., 0.40 ~ 0.60) of the overall model.
  • Variable Correlation: Specify the direction of influence (Positive/Negative) and desired P-value threshold for individual predictors (e.g., Total Cholesterol on Blood Sugar).
Linear Regression Settings
Linear Regression Data Preview
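
To show how a target R² relates to the amount of noise in Y, the sketch below uses the identity R² = var(signal) / (var(signal) + var(noise)). It is a simplified illustration with hypothetical names: predictors are independent standard normal draws, the sign of each coefficient sets the direction of influence, and the realized sample R² will only approximate the target.

```python
import math
import random
import statistics


def noise_sd_for_r2(signal_variance, target_r2):
    """Noise SD that yields the target R^2 for a given signal variance."""
    # R^2 = var(signal) / (var(signal) + var(noise)), solved for the noise SD.
    return math.sqrt(signal_variance * (1.0 - target_r2) / target_r2)


def synthesize_regression(n, coefficients, target_r2, seed=None):
    """Return (X rows, y) where y is a linear signal plus calibrated noise."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in coefficients] for _ in range(n)]
    signal = [sum(b * x for b, x in zip(coefficients, row)) for row in xs]
    sd = noise_sd_for_r2(statistics.variance(signal), target_r2)
    y = [s + rng.gauss(0.0, sd) for s in signal]
    return xs, y


# One positive and one negative predictor, aiming for R^2 near 0.50.
xs, y = synthesize_regression(200, [1.5, -0.8], target_r2=0.5, seed=11)
print(len(xs), len(y))
```

Hitting a range such as 0.40 ~ 0.60 with this approach would mean regenerating (or adjusting the noise) until the fitted model's R² falls inside the range.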

8. Binary Logistic Regression

Essential for classification problems where the outcome is dichotomous (DV=0 or DV=1). Highly prevalent in epidemiology for risk factor identification (e.g., Disease vs. Healthy).

  • Cross-Tabulation Matrix: Configure precise distributions of categorical predictors (e.g., Smoking, Gender) across the outcome states.
  • Significance Settings: Lock in P-values to ensure synthesized data reflects hypothesized Odds Ratios (OR).
Logistic Regression Configuration
Logistic Regression Output
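
For a single binary predictor, the odds ratio follows directly from the cross-tabulation counts. The sketch below, with hypothetical function names and made-up counts, computes the OR from a 2x2 table and expands the table into individual rows; it illustrates the arithmetic only and does not cover multiple predictors or p-value locking.

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds of the outcome among the exposed divided by odds among the unexposed."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


def expand_table(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Expand a 2x2 table into (predictor, outcome) rows, e.g. (Smoking, DV)."""
    return ([(1, 1)] * exposed_cases + [(1, 0)] * exposed_controls
            + [(0, 1)] * unexposed_cases + [(0, 0)] * unexposed_controls)


# Illustrative counts: 30 of 100 smokers and 15 of 100 non-smokers have the disease.
rows = expand_table(30, 70, 15, 85)
print(len(rows), round(odds_ratio(30, 70, 15, 85), 3))
```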

9. Cox Proportional Hazards Regression

A widely used model for survival analysis. Simulates time-to-event data while accounting for right-censoring, allowing researchers to evaluate the effect of covariates on survival times.

  • Time & Event Configuration: Define the total sample size, the actual event count (mortality/failure), and the overall survival time span (e.g., 1~60 months).
  • Hazard Ratios (HR): Configure whether an independent variable increases risk (HR > 1) or decreases risk (HR < 1).
Cox Regression Configuration
Cox Regression Results
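
The idea of a hazard ratio and right-censoring can be sketched with exponential event times, a common simplifying assumption. The function and its parameters are hypothetical. Unlike the tool, this sketch does not enforce an exact event count: subjects whose simulated event time exceeds the follow-up window are simply censored at the end of the window.

```python
import math
import random


def synthesize_survival(n, baseline_hazard, hazard_ratio, max_time, seed=None):
    """Return (time, event, x) rows; event is 1 if observed, 0 if right-censored."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randint(0, 1)  # binary covariate, e.g. exposed vs. not exposed
        hazard = baseline_hazard * (hazard_ratio ** x)
        # Inverse-transform sampling of an exponential event time.
        t = -math.log(1.0 - rng.random()) / hazard
        event = 1 if t <= max_time else 0
        rows.append((min(t, max_time), event, x))
    return rows


# HR > 1: the covariate increases risk, so exposed subjects tend to fail earlier.
rows = synthesize_survival(150, baseline_hazard=0.02, hazard_ratio=2.0,
                           max_time=60, seed=5)
print(sum(event for _, event, _ in rows), "events out of", len(rows))
```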

10. Non-Parametric Test (2 Independent Samples)

Mann-Whitney U test equivalent: the non-parametric alternative to the Independent Samples T-Test for data that fails normality assumptions. Compares the two groups by rank order rather than by means.

Non-Parametric 2 Samples Config
Non-Parametric 2 Samples Output
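
The rank-order logic behind the Mann-Whitney U statistic can be sketched in a few lines of Python. The function names are hypothetical; the sketch pools both samples, assigns average ranks to ties, and derives U for each group from its rank sum. It does not compute a p-value.

```python
def average_ranks(values):
    """Rank values from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) computed from the pooled rank sums."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

A U of 0 for one group means every value in that group ranks below every value in the other, which is the most extreme separation possible.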

11. Non-Parametric Test (K Independent Samples)

Kruskal-Wallis H test equivalent. Generates ordinal or non-normally distributed continuous data across three or more groups.

Non-Parametric K Samples Config
Non-Parametric K Samples Output
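
As a reference for the statistic this module targets, the sketch below computes the Kruskal-Wallis H value from pooled ranks. The function name is hypothetical, and the sketch omits the tie correction factor that statistical packages typically apply, so results can differ slightly when the data contain many ties.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    # Average 1-based rank for each distinct value (tied values share a rank).
    positions = {}
    for idx, value in enumerate(pooled, start=1):
        positions.setdefault(value, []).append(idx)
    rank_of = {v: sum(p) / len(p) for v, p in positions.items()}
    # Sum over groups of (rank sum squared / group size).
    total = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)


print(round(kruskal_wallis_h([[1, 2], [3, 4], [5, 6]]), 4))
```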

12. Chi-Square Test

Determines if there is a significant association between two categorical variables. Widely used for demographic cross-tabulations.

Chi-Square Setup
Chi-Square Results
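
The test statistic for a cross-tabulation can be computed as in the following sketch (Pearson's chi-square without continuity correction; the function name is hypothetical). Expected counts come from the row and column totals, and the statistic sums the squared deviations of observed from expected counts.

```python
def chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Illustrative 2x2 table of counts.
stat, dof = chi_square([[10, 20], [20, 10]])
print(round(stat, 3), dof)
```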

13. Correlation Analysis

Simulates bivariate relationships (Pearson or Spearman) by enforcing target correlation coefficients (r-values) and significance levels.

Correlation Config
Correlation Matrix
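
One standard way to enforce an exact Pearson r is to combine the standardized X with noise that has been made orthogonal to it. The sketch below illustrates that construction with hypothetical function names; whether the tool uses the same approach is not stated in this guide, and Spearman targets are not covered.

```python
import math
import random


def _unit(v):
    """Center a vector and scale it to unit length."""
    m = sum(v) / len(v)
    centered = [x - m for x in v]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def pearson_r(x, y):
    """Pearson correlation as the dot product of centered unit vectors."""
    return sum(a * b for a, b in zip(_unit(x), _unit(y)))


def synthesize_correlated(x, target_r, seed=None):
    """Return y (on a standardized scale) with sample Pearson r equal to target_r."""
    rng = random.Random(seed)
    zx = _unit(x)
    noise = _unit([rng.gauss(0.0, 1.0) for _ in x])
    # Remove the part of the noise that is correlated with x, then renormalize.
    dot = sum(a * b for a, b in zip(noise, zx))
    resid = _unit([e - dot * a for e, a in zip(noise, zx)])
    return [target_r * a + math.sqrt(1.0 - target_r ** 2) * e
            for a, e in zip(zx, resid)]


x = list(range(1, 41))
y = synthesize_correlated(x, target_r=-0.45, seed=9)
print(round(pearson_r(x, y), 6))
```

Because Pearson r is unchanged by positive linear rescaling, the returned values can be multiplied by a desired SD and shifted to a desired mean afterwards.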

14. ROC Curve Analysis

Evaluates the diagnostic ability of a continuous test variable in distinguishing between two states (e.g., Positive vs. Negative diagnosis).

  • Visual Feedback: Real-time plotting of Sensitivity vs. 1-Specificity.
  • AUC Control: Manipulate the mean and SD sliders for Positive and Negative groups to dynamically hit a target Area Under the Curve (AUC), e.g., 0.769.
ROC Curve Plot
ROC Curve Settings
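
The link between the group means, SDs, and the AUC can be made concrete if both groups are assumed to be normally distributed (the binormal model). Under that assumption, AUC = Φ((μ_pos − μ_neg) / √(σ_pos² + σ_neg²)). The sketch below uses hypothetical function names and is not the tool's slider logic; it shows the formula, its inverse, and an empirical AUC computed from raw scores.

```python
import math
from statistics import NormalDist


def binormal_auc(pos_mean, pos_sd, neg_mean, neg_sd):
    """AUC when both groups are normal: P(positive score > negative score)."""
    return NormalDist().cdf((pos_mean - neg_mean) / math.hypot(pos_sd, neg_sd))


def mean_gap_for_auc(target_auc, pos_sd, neg_sd):
    """Difference in means needed to reach target_auc for the given SDs."""
    return NormalDist().inv_cdf(target_auc) * math.hypot(pos_sd, neg_sd)


def empirical_auc(pos, neg):
    """Share of positive/negative pairs ranked correctly; ties count as half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


gap = mean_gap_for_auc(0.769, pos_sd=1.0, neg_sd=1.0)
print(round(gap, 3), round(binormal_auc(gap, 1.0, 0.0, 1.0), 3))
```

Increasing the gap between the group means or shrinking either SD pushes the AUC toward 1; identical distributions give an AUC of 0.5.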