CSV Generator Pro Documentation

Version 2.9.1 | Last Updated: November 2024

✨ New in v2.9.1: Batch processing with per-config split settings! Process multiple configurations automatically.

Overview

CSV Generator Pro is a browser-based tool designed for data engineers, data scientists, and developers who need to generate realistic test data for AWS analytics services like Athena and Redshift. It runs entirely in your browser with no server-side processing or data collection.

Key Features

  • 41+ realistic data field types
  • Direct upload to Amazon S3
  • Deterministic ID generation for SQL JOINs
  • Configuration presets for reusability
  • Multi-column sorting
  • Pagination for large datasets
  • Auto-save for loaded configurations
  • 100% client-side, no data leaves your browser

Getting Started

Basic Usage

  1. Add Fields: Click "Add Field" to create a new column
  2. Configure Field: Select field type and set options (name, min/max values, etc.)
  3. Set Row Count: Specify how many rows to generate
  4. Generate: Click "Generate CSV" to create your data
  5. Export: Download as CSV or upload to S3
💡 Pro Tip: Start with a small row count (100-1000) to test your configuration before generating millions of rows.

Field Types

CSV Generator Pro supports 41+ field types designed to create realistic test data.

Personal Information

Field Type | Description                  | Example
full_name  | Complete name (first + last) | John Smith
first_name | First name only              | Sarah
last_name  | Last name only               | Johnson
email      | Email address                | john.smith@example.com
phone      | Phone number                 | (555) 123-4567
ssn        | Social Security Number       | 123-45-6789

Location Data

Field Type     | Description           | Example
street_address | Street address        | 123 Main Street
city           | City name             | San Francisco
state          | US state abbreviation | CA
zip_code       | ZIP code              | 94102
country        | Country name          | United States
latitude       | Latitude coordinate   | 37.7749
longitude      | Longitude coordinate  | -122.4194

Numeric & Financial

Field Type  | Description            | Configurable
integer     | Whole numbers          | Min/Max values
decimal     | Floating-point numbers | Min/Max, decimal places
currency    | Money amounts          | Min/Max, currency symbol
credit_card | Credit card numbers    | Card type (Visa, MC, etc.)
percentage  | Percentage values      | 0-100 range

Date & Time

Field Type | Description    | Configurable
date       | Date values    | Start/End date, format
datetime   | Date and time  | Start/End range, format
time       | Time only      | Format (12/24 hour)
timestamp  | Unix timestamp | Date range

Technical & Identifiers

Field Type       | Description              | Example
uuid             | UUID v4                  | 550e8400-e29b-41d4-a716-446655440000
deterministic_id | Consistent IDs for JOINs | CUST-1001
ip_address       | IPv4 address             | 192.168.1.1
mac_address      | MAC address              | 00:1B:44:11:3A:B7
url              | Web URL                  | https://example.com/page
username         | Username                 | jsmith42
📖 Full List: See the application interface for the complete list of 41+ field types including company names, job titles, product names, and more.

Configuration Options

Field Configuration

Each field type has specific configuration options:

Common Options

  • Field Name: Column header in the CSV
  • Field Type: Type of data to generate
  • Nullable: Allow null/empty values

Numeric Options

  • Min Value: Minimum number
  • Max Value: Maximum number
  • Decimal Places: Precision for decimals

Date Options

  • Start Date: Earliest possible date
  • End Date: Latest possible date
  • Format: Date format string (YYYY-MM-DD, MM/DD/YYYY, etc.)
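To show how these options fit together, here is a minimal sketch of per-field value generation. This is an illustrative Python model, not the tool's actual implementation; the config keys (`min`, `max`, `decimalPlaces`, `start`, `end`) are assumed names that mirror the options listed above:

```python
import random
from datetime import date, timedelta

def generate_value(field, rng):
    """Generate one value from a field config dict (illustrative subset of types)."""
    t = field["type"]
    if t == "integer":
        return rng.randint(field["min"], field["max"])
    if t == "decimal":
        return round(rng.uniform(field["min"], field["max"]),
                     field.get("decimalPlaces", 2))
    if t == "date":
        # Pick a random day between the configured start and end dates
        offset = rng.randint(0, (field["end"] - field["start"]).days)
        return (field["start"] + timedelta(days=offset)).isoformat()  # YYYY-MM-DD
    raise ValueError(f"unsupported type: {t}")

rng = random.Random(42)  # seeded so the same config yields the same data
fields = [
    {"name": "age",    "type": "integer", "min": 18, "max": 65},
    {"name": "score",  "type": "decimal", "min": 0, "max": 1, "decimalPlaces": 3},
    {"name": "signup", "type": "date",
     "start": date(2024, 1, 1), "end": date(2024, 12, 31)},
]
row = {f["name"]: generate_value(f, rng) for f in fields}
print(row)
```

Each generated value stays inside the configured bounds, which is why testing with a small row count first (as recommended above) quickly surfaces misconfigured ranges.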

Amazon S3 Upload

Upload generated CSV files directly to your S3 bucket.

Setup

  1. Configure AWS credentials (Access Key ID and Secret Access Key)
  2. Specify S3 bucket name
  3. Set AWS region (e.g., us-east-1)
  4. Define file path using dynamic templates

Dynamic Path Templating

Use placeholders in your S3 path for dynamic organization:

data/{YYYY}/{MM}/{DD}/{table_name}_{timestamp}.csv

Examples:
data/2024/11/15/customers_1700012345.csv
data/2024/11/15/orders_1700012346.csv

Available Placeholders

  • {YYYY} - 4-digit year (2024)
  • {MM} - 2-digit month (01-12)
  • {DD} - 2-digit day (01-31)
  • {HH} - 2-digit hour (00-23)
  • {mm} - 2-digit minute (00-59)
  • {ss} - 2-digit second (00-59)
  • {timestamp} - Unix timestamp
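The placeholder expansion above can be sketched as a simple substitution pass. This is an illustrative model of the documented behavior, not the tool's own code:

```python
import re
from datetime import datetime, timezone

def expand_path(template, table_name, now=None):
    """Expand {YYYY}/{MM}/... placeholders in an S3 key template."""
    now = now or datetime.now(timezone.utc)
    values = {
        "YYYY": f"{now.year:04d}",
        "MM": f"{now.month:02d}",
        "DD": f"{now.day:02d}",
        "HH": f"{now.hour:02d}",
        "mm": f"{now.minute:02d}",
        "ss": f"{now.second:02d}",
        "timestamp": str(int(now.timestamp())),
        "table_name": table_name,
    }
    # Unknown placeholders are left untouched
    return re.sub(r"\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), template)

key = expand_path(
    "data/{YYYY}/{MM}/{DD}/{table_name}_{timestamp}.csv",
    "customers",
    now=datetime(2024, 11, 15, 12, 0, 0, tzinfo=timezone.utc),
)
print(key)  # data/2024/11/15/customers_1731672000.csv
```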
⚠️ Security Note: AWS credentials are stored only in your browser's local storage and never transmitted to external servers. For production use, consider using temporary credentials or IAM roles.

CORS Configuration

Your S3 bucket must allow cross-origin requests from the browser. Add this CORS configuration to your bucket:

[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["PUT", "POST"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": ["ETag"]
  }
]
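If you prefer to apply the configuration programmatically rather than through the S3 console, one option is boto3's `put_bucket_cors` (this assumes you have boto3 installed and credentials with `s3:PutBucketCORS` permission; the bucket name is a placeholder):

```python
import json

# The same rules as the JSON above, in the shape put_bucket_cors expects.
cors_config = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["PUT", "POST"],
            "AllowedOrigins": ["*"],
            "ExposeHeaders": ["ETag"],
        }
    ]
}

print(json.dumps(cors_config["CORSRules"], indent=2))

# Applying it requires boto3 and appropriate credentials (not run here):
#
#   import boto3
#   boto3.client("s3").put_bucket_cors(
#       Bucket="your-bucket", CORSConfiguration=cors_config
#   )
```

For production buckets, consider narrowing `AllowedOrigins` from `"*"` to the specific origin you load the tool from.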

Deterministic IDs

Create consistent IDs across multiple datasets to test relational data and SQL JOINs.

How It Works

Deterministic IDs generate the same set of IDs for a given configuration, allowing you to create related datasets:

// Customer dataset
customer_id | name
CUST-1001  | John Smith
CUST-1002  | Jane Doe

// Orders dataset (using same deterministic_id config)
order_id   | customer_id
ORD-5001  | CUST-1001
ORD-5002  | CUST-1002
ORD-5003  | CUST-1001

Configuration

  • Prefix: Text prefix (e.g., "CUST-", "ORD-")
  • Start Value: First ID number
  • Padding: Zero-pad to width (CUST-0001 vs CUST-1)
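The three options above fully determine the ID sequence, which is what makes it deterministic: the same prefix, start value, and padding always produce the same set of IDs. A minimal sketch (illustrative, not the tool's implementation):

```python
def deterministic_ids(prefix, start_value, count, padding=0):
    """Yield IDs like CUST-1001; the same config always yields the same
    sequence, which is what makes cross-dataset JOINs line up."""
    for i in range(start_value, start_value + count):
        yield f"{prefix}{i:0{padding}d}"

customers = list(deterministic_ids("CUST-", 1001, 3))
print(customers)  # ['CUST-1001', 'CUST-1002', 'CUST-1003']

# Zero-padding: width 4 gives CUST-0001 rather than CUST-1
padded = list(deterministic_ids("CUST-", 1, 2, padding=4))
print(padded)  # ['CUST-0001', 'CUST-0002']
```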

Use Cases

  • Testing foreign key relationships
  • Creating multi-table test scenarios
  • Validating JOIN operations in Athena/Redshift
  • Building dimensional models

Batch Processing

Automatically process and upload all saved configurations in sequence, which makes it easy to populate multiple datasets in one operation.

How It Works

Batch processing cycles through all your saved configuration presets, loading each one, generating data, and uploading to S3:

  1. Loads configuration from dropdown (in order)
  2. Restores all settings including split preferences
  3. Generates data according to config
  4. Optionally pauses for review
  5. Uploads to S3 respecting each config's split settings
  6. Moves to next configuration

Configuration Requirements

Each configuration must have:

  • rowCount: Number of rows to generate
  • outputFormat: CSV, NDJSON, or Parquet

Configurations missing these fields will be skipped with a warning.
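The skip-with-a-warning behavior can be sketched as a validation pass before the batch loop starts. This is an illustrative Python model of the documented rules, not the tool's own code; the `name` key is an assumed config field used only for the warning message:

```python
import logging

REQUIRED_KEYS = ("rowCount", "outputFormat")
VALID_FORMATS = {"CSV", "NDJSON", "Parquet"}

def partition_configs(configs):
    """Split saved configs into runnable and skipped, warning on each skip."""
    runnable, skipped = [], []
    for cfg in configs:
        missing = [k for k in REQUIRED_KEYS if k not in cfg]
        if missing or cfg.get("outputFormat") not in VALID_FORMATS:
            logging.warning("Skipping config %r: missing %s",
                            cfg.get("name", "?"), missing or "valid outputFormat")
            skipped.append(cfg)
        else:
            runnable.append(cfg)
    return runnable, skipped

configs = [
    {"name": "customers", "rowCount": 1000, "outputFormat": "CSV"},
    {"name": "broken"},  # missing both required keys -> skipped
]
runnable, skipped = partition_configs(configs)
print(len(runnable), len(skipped))  # 1 1
```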

Per-Config Split Settings (v2.9.1)

New in v2.9.1: Split settings (Split by Date, Split by Fields) are now saved with each configuration. This means:

  • Config A can split by date while Config B doesn't split
  • Each config remembers its own split preferences
  • Batch processing respects individual config settings
  • No manual checkbox changes needed between configs

Pause Mode

Enable "Pause for confirmation before each upload" to:

  • Review generated data in the preview table
  • Verify configuration loaded correctly
  • Check console logs for any issues
  • Click Continue to proceed or Stop to cancel

Progress Tracking

During batch processing, you'll see:

  • Current config being processed (e.g., "Config 3 of 12")
  • Real-time success/failed/skipped counts
  • Detailed console logging of all operations
  • Final summary when complete

Use Cases

  • Daily Data Pipeline: Generate multiple test datasets in one click
  • Multi-Region Deployment: Upload same data to different S3 paths
  • Testing Scenarios: Create variations of test data automatically
  • Client Deliverables: Process multiple client datasets at once

Example Workflow

1. Create configs for:
   - Customers (1000 rows, split by country)
   - Orders (5000 rows, split by date)
   - Products (500 rows, no splitting)

2. Click "Batch Upload All Configs"

3. Result:
   ✓ Customers → 5 files (USA, Canada, UK, Germany, France)
   ✓ Orders → 30 files (one per day)
   ✓ Products → 1 file
   
Total: 36 files uploaded automatically
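The "split by fields" step in the workflow above amounts to grouping rows by a column and writing one file per distinct value. A minimal sketch under that assumption (illustrative, not the tool's implementation):

```python
import csv
import io
from collections import defaultdict

def split_by_field(rows, field, base_name):
    """Group rows into one CSV string per distinct value of `field`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    files = {}
    for value, group in groups.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=group[0].keys())
        writer.writeheader()
        writer.writerows(group)
        files[f"{base_name}_{value}.csv"] = buf.getvalue()
    return files

rows = [
    {"customer_id": "CUST-1001", "country": "USA"},
    {"customer_id": "CUST-1002", "country": "Canada"},
    {"customer_id": "CUST-1003", "country": "USA"},
]
files = split_by_field(rows, "country", "customers")
print(sorted(files))  # ['customers_Canada.csv', 'customers_USA.csv']
```

With two distinct countries, three rows become two files, matching the "Customers → 5 files" arithmetic in the example (five countries, five files).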

Configuration Presets

Save and load field configurations for reuse and sharing.

Saving Presets

  1. Configure your fields
  2. Click "Save Configuration"
  3. Downloads as JSON file

Loading Presets

  1. Click "Load Configuration"
  2. Select your JSON preset file
  3. Fields are automatically configured
  4. Configuration is auto-saved for the session

Preset Format

{
  "fields": [
    {
      "name": "customer_id",
      "type": "deterministic_id",
      "prefix": "CUST-",
      "startValue": 1001
    },
    {
      "name": "email",
      "type": "email",
      "nullable": false
    }
  ],
  "version": "2.3.0"
}
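Since presets are plain JSON, they are easy to validate or generate in scripts before sharing. A small sketch that parses the format shown above and sanity-checks it (the checks are illustrative; the tool's own validation may be stricter):

```python
import json

PRESET_JSON = """
{
  "fields": [
    {"name": "customer_id", "type": "deterministic_id",
     "prefix": "CUST-", "startValue": 1001},
    {"name": "email", "type": "email", "nullable": false}
  ],
  "version": "2.3.0"
}
"""

def load_preset(text):
    """Parse a preset file and check that every field has a name and type."""
    preset = json.loads(text)
    if not isinstance(preset.get("fields"), list) or not preset["fields"]:
        raise ValueError("preset needs a non-empty 'fields' list")
    for field in preset["fields"]:
        if "name" not in field or "type" not in field:
            raise ValueError(f"incomplete field: {field}")
    return preset

preset = load_preset(PRESET_JSON)
print([f["name"] for f in preset["fields"]])  # ['customer_id', 'email']
```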
💡 Team Sharing: Share preset files with your team to ensure consistent test data schemas across projects.

Multi-Column Sorting

Sort generated data by multiple columns with configurable sort order.

Usage

  1. Generate your data
  2. Click column headers to sort
  3. Shift+Click for multi-column sort
  4. Click again to toggle ascending/descending

Use Cases

  • Test query sorting performance
  • Validate sort order logic
  • Create ordered test scenarios
  • Check index performance

AWS Analytics Integration

Amazon Athena

Query your generated data using Athena:

-- Create external table pointing to S3
CREATE EXTERNAL TABLE customers (
  customer_id STRING,
  name STRING,
  email STRING,
  signup_date DATE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-bucket/data/'
TBLPROPERTIES ('skip.header.line.count'='1');

-- Query your test data
SELECT * FROM customers LIMIT 10;

Amazon Redshift

Load data into Redshift:

-- Copy from S3 to Redshift
COPY customers
FROM 's3://your-bucket/data/'
IAM_ROLE 'arn:aws:iam::account:role/RedshiftRole'
CSV
IGNOREHEADER 1;

AWS Glue

Crawl and catalog your test data:

  1. Upload CSV to S3
  2. Create Glue Crawler pointing to S3 path
  3. Run crawler to auto-detect schema
  4. Query via Athena or use in ETL jobs

Troubleshooting

S3 Upload Fails

Common causes and fixes:

  • Check AWS credentials are correct
  • Verify bucket exists and region is correct
  • Ensure CORS is configured on bucket
  • Check bucket permissions allow PUT operations
  • Review browser console for detailed error messages

Large File Generation Slow

  • Generate in smaller batches (e.g., 100K rows at a time)
  • Use pagination to preview without generating all rows
  • Consider closing other browser tabs

Configuration Not Saving

  • Check browser allows local storage
  • Make sure you're not in private/incognito mode (local storage may be disabled)
  • Try clearing browser cache and reloading
🔍 Debug Mode: Check the browser console (F12) for detailed logging and error messages.

Frequently Asked Questions

Is my data sent to any servers?

No. CSV Generator Pro runs entirely in your browser. Data is generated client-side and either downloaded to your computer or uploaded directly to your S3 bucket. No data passes through external servers.

How much data can I generate?

The limit depends on your browser's memory. Most modern browsers can handle millions of rows, but for very large datasets (10M+ rows), generate in batches.

Can I use this for production data?

This tool is designed for test data generation. While the output looks realistic, it is synthetic and should not be treated as real production data.

What browsers are supported?

All modern browsers including Chrome, Firefox, Safari, and Edge. Requires JavaScript enabled.

Does CSV Generator Pro support Parquet format?

Yes! Parquet import and export are fully supported as of v2.8.0. You can import existing Parquet files, generate data and export to Parquet, or convert between CSV/NDJSON/Parquet formats. Parquet files use optimized columnar storage for faster Athena and Redshift queries.

How does batch processing work?

New in v2.9.0! Batch processing automatically cycles through all your saved configuration presets, generating and uploading each one to S3. You can enable "pause mode" to review data before each upload. Each configuration remembers its own split settings (v2.9.1), so some configs can split by date while others don't. Perfect for populating multiple test datasets in one operation.

Can I contribute or request features?

Yes! This is an open source project. Submit feature requests or contribute via GitHub.

How do I report bugs?

Report issues on GitHub or check the console for error messages to include in your bug report.

Need More Help?

If you have questions not covered here, check the GitHub repository or consider supporting the project to help fund expanded documentation.
