Introduction
The kickstartR package initializes new R analysis projects with a
standardized directory structure and boilerplate files. This vignette
walks through the main functionality and shows how to create a
well-organized R project.
Why use kickstartR?
When starting a new data analysis project, you often face the same organizational challenges:
- Where should I put my raw data?
- How should I organize my R scripts?
- Where should outputs like plots and tables go?
- What should I include in my .gitignore file?
- How can I make my project reproducible?
kickstartR answers these questions with a standardized project
structure that follows common R community practices.
Installation
You can install kickstartR from GitHub:
# Install devtools if you haven't already
install.packages("devtools")
# Install kickstartR from GitHub
devtools::install_github("sidhuk/kickstartR")
Basic Usage
The main function in kickstartR is initialize_project().
Let’s create a simple project:
# Load the package
library(kickstartR)
# Create a new project in the current directory
initialize_project("MyAnalysisProject")
This creates a directory structure like this:
MyAnalysisProject/
├── 01_data/
│   ├── 01_raw/                  # Original, immutable data
│   ├── 02_processed/            # Cleaned, transformed data
│   └── 03_external/             # External data sources
├── 02_scripts/                  # R scripts and analysis code
│   └── 00_main_script.R         # Main analysis script template
├── 03_output/                   # All analysis outputs
│   ├── 01_figures/              # Generated plots and figures
│   ├── 02_tables/               # Generated tables and summaries
│   └── 03_reports_rendered/     # Rendered reports (HTML, PDF)
├── 04_models/                   # Saved model objects
├── 05_notebooks/                # R Markdown, Jupyter notebooks
├── README.md                    # Project documentation
├── .gitignore                   # Git ignore rules
├── .here                        # For use with the 'here' package
└── MyAnalysisProject.Rproj      # RStudio project file
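One way to confirm that this layout is in place is to compare the folders on disk with the expected list. The sketch below uses base R only. The helper `missing_dirs()` is hypothetical and not part of kickstartR, and a temporary directory built with `dir.create()` stands in for a real project so the example runs on its own.

```r
# Folders shown in the tree above, relative to the project root
expected_dirs <- c(
  "01_data/01_raw", "01_data/02_processed", "01_data/03_external",
  "02_scripts",
  "03_output/01_figures", "03_output/02_tables", "03_output/03_reports_rendered",
  "04_models", "05_notebooks"
)

# Hypothetical helper: return the expected folders that are absent
missing_dirs <- function(project_root, dirs = expected_dirs) {
  dirs[!dir.exists(file.path(project_root, dirs))]
}

# Stand-in project: a temporary directory holding only part of the layout
demo_root <- file.path(tempdir(), "LayoutDemo")
for (d in expected_dirs[1:4]) {
  dir.create(file.path(demo_root, d), recursive = TRUE, showWarnings = FALSE)
}

# Lists the output, models, and notebooks folders, which were not created
print(missing_dirs(demo_root))
```

For a real project, pass the project root instead of `demo_root`. An empty result means every expected folder exists.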
Project Structure Explained
Data Organization
The 01_data/ folder is organized into three subfolders:
- 01_raw/: Store your original, immutable data files here. Never modify these!
- 02_processed/: Save cleaned, transformed, or merged datasets here.
- 03_external/: Put data from external APIs, databases, or other sources here.
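The rule that raw data is never modified can be checked rather than just trusted. As a rough sketch, the code below records an MD5 checksum for each raw file with `tools::md5sum()` and compares the checksums later. The helper `checksum_manifest()` and the sample file contents are assumptions made for illustration. kickstartR itself does not provide this.

```r
# Hypothetical helper: MD5 checksum for every file under a folder,
# named by the file's path relative to that folder
checksum_manifest <- function(dir) {
  rel_paths <- list.files(dir, recursive = TRUE)
  sums <- tools::md5sum(file.path(dir, rel_paths))
  names(sums) <- rel_paths
  sums
}

# Temporary stand-in for 01_data/01_raw/ with one made-up file
raw_dir <- file.path(tempdir(), "ChecksumDemo", "01_data", "01_raw")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
raw_file <- file.path(raw_dir, "customer_data.csv")
writeLines(c("id,value", "1,10", "2,20"), raw_file)

before <- checksum_manifest(raw_dir)

# Simulate an accidental edit to the raw file
cat("3,30\n", file = raw_file, append = TRUE)
after <- checksum_manifest(raw_dir)

# Files whose checksum no longer matches the recorded one
changed <- names(before)[unname(before != after[names(before)])]
print(changed)
```

In practice the first manifest could be saved alongside the project and compared at the start of each analysis run.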
Scripts and Code
The 02_scripts/ folder contains your analysis code. The
package creates a starter script, 00_main_script.R, with
best-practice templates.
Customization Options
You can customize the project structure using various parameters:
Specify a Different Location
# Create project in a specific directory
initialize_project("MyProject", path = "~/Documents/R_Projects")
Exclude Certain Folders
# Create a simpler project without models and notebooks folders
initialize_project("SimpleProject",
                   include_models = FALSE,
                   include_notebooks = FALSE)
Skip RStudio Project File
# Don't create .Rproj file (if you're not using RStudio)
initialize_project("MyProject", create_rproj = FALSE)
Overwrite Existing Projects
# Overwrite an existing directory (use with caution!)
initialize_project("ExistingProject", overwrite = TRUE)
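Before reaching for `overwrite = TRUE`, it can help to look at what the target directory already contains. The following base R sketch is one possible pre-check. `is_safe_target()` is a hypothetical helper rather than a kickstartR function, and the temporary directory only imitates an existing project.

```r
# Hypothetical helper: TRUE when `path` does not exist or is an empty directory
is_safe_target <- function(path) {
  if (!dir.exists(path)) {
    return(TRUE)
  }
  length(list.files(path, all.files = TRUE, no.. = TRUE)) == 0
}

# Stand-in for an existing project that already holds a file
demo_dir <- file.path(tempdir(), "ExistingProject")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("hand-written notes", file.path(demo_dir, "notes.txt"))

existing_ok <- is_safe_target(demo_dir)                              # FALSE
missing_ok <- is_safe_target(file.path(tempdir(), "NoSuchProject"))  # TRUE

# Show what would be at risk before deciding to overwrite
print(list.files(demo_dir, all.files = TRUE, no.. = TRUE))
```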
Recommended Workflow
Here’s how to use kickstartR in your typical analysis workflow:
1. Create Your Project
initialize_project("CustomerAnalysis2024")
2. Open in RStudio
Open the .Rproj
file to set up your RStudio environment
with the correct working directory.
3. Add Your Data
Place your raw data files in 01_data/01_raw/. For example:
customer_data.csv
sales_data.xlsx
survey_responses.json
4. Start Analyzing
Edit 02_scripts/00_main_script.R
or create new scripts.
The template includes helpful patterns:
# Load libraries
library(tidyverse)
library(here)
# Set base directory
here::i_am("02_scripts/00_main_script.R")
# Load data
raw_data <- read.csv(here::here("01_data", "01_raw", "customer_data.csv"))
# Process data
processed_data <- raw_data %>%
  filter(!is.na(important_column)) %>%
  mutate(new_column = old_column * 2)
# Save processed data
write.csv(processed_data,
          here::here("01_data", "02_processed", "cleaned_customers.csv"),
          row.names = FALSE)
# Create visualizations
my_plot <- ggplot(processed_data, aes(x = x_var, y = y_var)) +
  geom_point() +
  theme_minimal()
# Save outputs
ggsave(here::here("03_output", "01_figures", "customer_analysis.png"),
       my_plot, width = 10, height = 6, dpi = 300)
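The column names in the template are placeholders, so it will not run until they are replaced. To try the raw-to-processed pattern end to end, here is a small base R sketch with made-up data. The `customer_id` and `spend` columns and the temporary project root are assumptions for illustration only, and `file.path()` stands in for `here::here()` so that no extra packages are needed.

```r
# Temporary stand-in for a kickstartR project root
root <- file.path(tempdir(), "PipelineDemo")
raw_dir <- file.path(root, "01_data", "01_raw")
processed_dir <- file.path(root, "01_data", "02_processed")
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(processed_dir, recursive = TRUE, showWarnings = FALSE)

# Made-up raw data with one missing value
write.csv(
  data.frame(customer_id = 1:4, spend = c(10, NA, 30, 40)),
  file.path(raw_dir, "customer_data.csv"),
  row.names = FALSE
)

# Read from 01_raw, drop incomplete rows, derive a column
raw_data <- read.csv(file.path(raw_dir, "customer_data.csv"))
processed_data <- raw_data[!is.na(raw_data$spend), ]
processed_data$spend_doubled <- processed_data$spend * 2

# Write the result to 02_processed, leaving the raw file untouched
processed_file <- file.path(processed_dir, "cleaned_customers.csv")
write.csv(processed_data, processed_file, row.names = FALSE)

print(processed_data)
```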
5. Use the here Package
kickstartR creates a .here file so that the here package can locate
the project root and build file paths relative to it.
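As a minimal sketch of what this looks like in practice, `here::here()` joins path components onto the project root, so the same call works regardless of which subfolder a script is run from. The example assumes the here package is installed and skips itself otherwise. The file names are the illustrative ones used earlier in this vignette.

```r
if (requireNamespace("here", quietly = TRUE)) {
  # Paths are built from the project root, not the current working directory
  raw_file <- here::here("01_data", "01_raw", "customer_data.csv")
  figure_file <- here::here("03_output", "01_figures", "customer_analysis.png")

  print(raw_file)
  print(figure_file)
} else {
  message("Install the 'here' package to run this example.")
}
```

Outside a project, here falls back to the current working directory as the root, so the printed paths depend on where the code is run.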
Best Practices
The project structure created by kickstartR embodies several best practices:
1. Separate Raw and Processed Data
- Keep original data immutable in 01_data/01_raw/
- Save all processed/cleaned data to 01_data/02_processed/
- This makes your workflow reproducible and prevents data loss
2. Organize Outputs by Type
- Separate figures, tables, and reports for easy navigation
- Use descriptive filenames with dates: customer_analysis_2024-01-15.png
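Dated filenames are easy to generate consistently with a small helper. The sketch below is one way to do it in base R. `dated_filename()` is a hypothetical function written for this example, not something kickstartR exports.

```r
# Hypothetical helper: append an ISO date (YYYY-MM-DD) to a file stem
dated_filename <- function(stem, ext, date = Sys.Date()) {
  sprintf("%s_%s.%s", stem, format(as.Date(date), "%Y-%m-%d"), ext)
}

# A fixed date reproduces the example filename above
example_name <- dated_filename("customer_analysis", "png", date = "2024-01-15")
print(example_name)  # "customer_analysis_2024-01-15.png"

# With no date argument, today's date is used
print(dated_filename("summary_table", "csv"))
```

Because the date is written year first, the files also sort chronologically in a directory listing.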
3. Use Relative Paths
- The .here file enables robust path management
- Scripts work on any computer without path modification
Advanced Tips
Working with Teams
When sharing projects created with kickstartR:
- Git Repository: Initialize git and push to GitHub/GitLab
- Dependencies: Document required packages in your README
- Data Sharing: Consider using pins, targets, or cloud storage for large datasets
- Environment: Use renv for package version management
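To help with the dependencies item, the packages a project loads can be collected from its scripts and pasted into the README. The following base R sketch scans for `library()` and `require()` calls. `find_packages()` is a hypothetical helper, and its simple pattern match will miss packages used only through `pkg::fun()` and will also pick up calls that appear inside comments.

```r
# Hypothetical helper: package names found in library()/require() calls
find_packages <- function(script_dir) {
  scripts <- list.files(script_dir, pattern = "\\.[Rr]$",
                        full.names = TRUE, recursive = TRUE)
  if (length(scripts) == 0) {
    return(character(0))
  }
  code <- unlist(lapply(scripts, readLines, warn = FALSE))
  pattern <- "(library|require)\\([[:space:]]*[\"']?[A-Za-z][A-Za-z0-9.]*"
  calls <- unlist(regmatches(code, gregexpr(pattern, code)))
  # Strip the call prefix so only the package name remains
  sort(unique(sub("^(library|require)\\([[:space:]]*[\"']?", "", calls)))
}

# Stand-in scripts folder with one made-up script
scripts_dir <- file.path(tempdir(), "DepsDemo", "02_scripts")
dir.create(scripts_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(
  c("library(tidyverse)", "library(here)", "require(\"stats\")", "x <- 1"),
  file.path(scripts_dir, "00_main_script.R")
)

found <- find_packages(scripts_dir)
print(found)
```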
Troubleshooting
Common Issues
Error: Directory already exists
# Use overwrite = TRUE, but be careful!
initialize_project("MyProject", overwrite = TRUE)
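This vignette does not spell out what `overwrite = TRUE` does to files already in the directory, so a cautious approach is to copy the existing folder first. The base R sketch below shows one way to do that. `backup_project()` is a hypothetical helper, and the temporary directory stands in for a real project.

```r
# Hypothetical helper: copy a project folder to a timestamped sibling folder
backup_project <- function(project_dir) {
  stamp <- format(Sys.time(), "%Y%m%d-%H%M%S")
  backup_dir <- paste0(normalizePath(project_dir), "_backup_", stamp)
  dir.create(backup_dir)
  contents <- list.files(project_dir, full.names = TRUE,
                         all.files = TRUE, no.. = TRUE)
  copied <- file.copy(contents, backup_dir, recursive = TRUE)
  stopifnot(all(copied))
  backup_dir
}

# Stand-in project with one nested data file
project_dir <- file.path(tempdir(), "BackupDemo")
dir.create(file.path(project_dir, "01_data", "01_raw"),
           recursive = TRUE, showWarnings = FALSE)
writeLines("id,value", file.path(project_dir, "01_data", "01_raw", "a.csv"))

backup_dir <- backup_project(project_dir)
print(list.files(backup_dir, recursive = TRUE))
```

Once the backup exists, the overwrite call can be made with less risk of losing work.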
Want different folder names?
Currently, kickstartR uses a fixed naming convention. If you need different folder names, you can:
- Create the project with kickstartR
- Manually rename folders as needed
- Update paths in your scripts accordingly
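The renaming steps can also be scripted. The sketch below renames a folder with `file.rename()` and then rewrites references to the old name inside the moved scripts. The new folder name `scripts` is only an example, and the temporary directory imitates a project created by kickstartR.

```r
# Stand-in project with a starter script that refers to its own folder
root <- file.path(tempdir(), "RenameDemo")
dir.create(file.path(root, "02_scripts"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  'here::i_am("02_scripts/00_main_script.R")',
  file.path(root, "02_scripts", "00_main_script.R")
)

# Rename the folder; file.rename() returns TRUE on success
old_name <- "02_scripts"
new_name <- "scripts"
renamed <- file.rename(file.path(root, old_name), file.path(root, new_name))

# Update references to the old folder name inside the moved scripts
for (f in list.files(file.path(root, new_name), pattern = "\\.R$", full.names = TRUE)) {
  writeLines(gsub(old_name, new_name, readLines(f), fixed = TRUE), f)
}

updated <- readLines(file.path(root, new_name, "00_main_script.R"))
print(updated)
```

A plain text substitution like this is blunt, so it is worth reviewing the changed lines before committing them.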
RStudio doesn’t recognize the project?
Make sure you’re opening the .Rproj
file, not just the
folder. The .Rproj
file tells RStudio this is a
project.
What’s Next?
After creating your project structure with kickstartR:
- Explore your data in 05_notebooks/ with R Markdown
- Develop your analysis in 02_scripts/
- Document your process by updating the README
- Share your work by pushing to GitHub and sharing the repository
The standardized structure makes it easy for others (including future you!) to understand and reproduce your work.
Getting Help
If you encounter any issues or have suggestions:
- Check the function documentation
- Look at the GitHub repository
- Open an issue for bugs or feature requests
Happy analyzing! 🚀