CSV Generator Pro Documentation

Version 2.9.1 | Last Updated: November 2024

✨ New in v2.9.1: Batch processing with per-config split settings! Process multiple configurations automatically.

Overview

CSV Generator Pro is a browser-based tool designed for data engineers, data scientists, and developers who need to generate realistic test data for AWS analytics services like Athena and Redshift. It runs entirely in your browser with no server-side processing or data collection.

Key Features

  • 41+ realistic data field types
  • Direct upload to Amazon S3
  • Deterministic ID generation for SQL JOINs
  • Configuration presets for reusability
  • Multi-column sorting
  • Pagination for large datasets
  • Auto-save for loaded configurations
  • 100% client-side, no data leaves your browser

Getting Started

Basic Usage

  1. Add Fields: Click "Add Field" to create a new column
  2. Configure Field: Select field type and set options (name, min/max values, etc.)
  3. Set Row Count: Specify how many rows to generate
  4. Generate: Click "Generate CSV" to create your data
  5. Export: Download as CSV or upload to S3
💡 Pro Tip: Start with a small row count (100-1000) to test your configuration before generating millions of rows.

Field Types

CSV Generator Pro supports 41+ field types designed to create realistic test data.

Personal Information

Field Type | Description                  | Example
full_name  | Complete name (first + last) | John Smith
first_name | First name only              | Sarah
last_name  | Last name only               | Johnson
email      | Email address                | john.smith@example.com
phone      | Phone number                 | (555) 123-4567
ssn        | Social Security Number       | 123-45-6789

Location Data

Field Type     | Description           | Example
street_address | Street address        | 123 Main Street
city           | City name             | San Francisco
state          | US state abbreviation | CA
zip_code       | ZIP code              | 94102
country        | Country name          | United States
latitude       | Latitude coordinate   | 37.7749
longitude      | Longitude coordinate  | -122.4194

Numeric & Financial

Field Type  | Description            | Configurable
integer     | Whole numbers          | Min/Max values
decimal     | Floating-point numbers | Min/Max, decimal places
currency    | Money amounts          | Min/Max, currency symbol
credit_card | Credit card numbers    | Card type (Visa, MC, etc.)
percentage  | Percentage values      | 0-100 range

Date & Time

Field Type | Description    | Configurable
date       | Date values    | Start/End date, format
datetime   | Date and time  | Start/End range, format
time       | Time only      | Format (12/24 hour)
timestamp  | Unix timestamp | Date range

Technical & Identifiers

Field Type       | Description              | Example
uuid             | UUID v4                  | 550e8400-e29b-41d4-a716-446655440000
deterministic_id | Consistent IDs for JOINs | CUST-1001
ip_address       | IPv4 address             | 192.168.1.1
mac_address      | MAC address              | 00:1B:44:11:3A:B7
url              | Web URL                  | https://example.com/page
username         | Username                 | jsmith42
📖 Full List: See the application interface for the complete list of 41+ field types including company names, job titles, product names, and more.

Configuration Options

Field Configuration

Each field type has specific configuration options:

Common Options

  • Field Name: Column header in the CSV
  • Field Type: Type of data to generate
  • Nullable: Allow null/empty values

Numeric Options

  • Min Value: Minimum number
  • Max Value: Maximum number
  • Decimal Places: Precision for decimals

Date Options

  • Start Date: Earliest possible date
  • End Date: Latest possible date
  • Format: Date format string (YYYY-MM-DD, MM/DD/YYYY, etc.)
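To show how these options fit together, here is a minimal sketch of per-field value generation. This is an illustrative Python model, not the tool's actual implementation; the config keys (`min`, `max`, `decimalPlaces`, `start`, `end`) are assumed names that mirror the options listed above:

```python
import random
from datetime import date, timedelta

def generate_value(field, rng):
    """Generate one value from a field config dict (illustrative subset of types)."""
    t = field["type"]
    if t == "integer":
        return rng.randint(field["min"], field["max"])
    if t == "decimal":
        return round(rng.uniform(field["min"], field["max"]),
                     field.get("decimalPlaces", 2))
    if t == "date":
        # Pick a random day between the configured start and end dates
        offset = rng.randint(0, (field["end"] - field["start"]).days)
        return (field["start"] + timedelta(days=offset)).isoformat()  # YYYY-MM-DD
    raise ValueError(f"unsupported type: {t}")

rng = random.Random(42)  # seeded so the same config yields the same data
fields = [
    {"name": "age",    "type": "integer", "min": 18, "max": 65},
    {"name": "score",  "type": "decimal", "min": 0, "max": 1, "decimalPlaces": 3},
    {"name": "signup", "type": "date",
     "start": date(2024, 1, 1), "end": date(2024, 12, 31)},
]
row = {f["name"]: generate_value(f, rng) for f in fields}
print(row)
```

Each generated value stays inside the configured bounds, which is why testing with a small row count first (as recommended above) quickly surfaces misconfigured ranges.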

Amazon S3 Upload

Upload generated CSV files directly to your S3 bucket.

Setup

  1. Configure AWS credentials (Access Key ID and Secret Access Key)
  2. Specify S3 bucket name
  3. Set AWS region (e.g., us-east-1)
  4. Define file path using dynamic templates

Dynamic Path Templating

Use placeholders in your S3 path for dynamic organization:

data/{YYYY}/{MM}/{DD}/{table_name}_{timestamp}.csv

Examples:
data/2024/11/15/customers_1700012345.csv
data/2024/11/15/orders_1700012346.csv

Available Placeholders

  • {YYYY} - 4-digit year (2024)
  • {MM} - 2-digit month (01-12)
  • {DD} - 2-digit day (01-31)
  • {HH} - 2-digit hour (00-23)
  • {mm} - 2-digit minute (00-59)
  • {ss} - 2-digit second (00-59)
  • {timestamp} - Unix timestamp
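The placeholder expansion above can be sketched as a simple substitution pass. This is an illustrative model of the documented behavior, not the tool's own code:

```python
import re
from datetime import datetime, timezone

def expand_path(template, table_name, now=None):
    """Expand {YYYY}/{MM}/... placeholders in an S3 key template."""
    now = now or datetime.now(timezone.utc)
    values = {
        "YYYY": f"{now.year:04d}",
        "MM": f"{now.month:02d}",
        "DD": f"{now.day:02d}",
        "HH": f"{now.hour:02d}",
        "mm": f"{now.minute:02d}",
        "ss": f"{now.second:02d}",
        "timestamp": str(int(now.timestamp())),
        "table_name": table_name,
    }
    # Unknown placeholders are left untouched
    return re.sub(r"\{(\w+)\}", lambda m: values.get(m.group(1), m.group(0)), template)

key = expand_path(
    "data/{YYYY}/{MM}/{DD}/{table_name}_{timestamp}.csv",
    "customers",
    now=datetime(2024, 11, 15, 12, 0, 0, tzinfo=timezone.utc),
)
print(key)  # data/2024/11/15/customers_1731672000.csv
```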
⚠️ Security Note: AWS credentials are stored only in your browser's local storage and never transmitted to external servers. For production use, consider using temporary credentials or IAM roles.

CORS Configuration

Your S3 bucket must allow cross-origin requests from the browser. Add this CORS configuration to your bucket:

[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["PUT", "POST"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": ["ETag"]
  }
]
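If you prefer to apply the configuration programmatically rather than through the S3 console, one option is boto3's `put_bucket_cors` (this assumes you have boto3 installed and credentials with `s3:PutBucketCORS` permission; the bucket name is a placeholder):

```python
import json

# The same rules as the JSON above, in the shape put_bucket_cors expects.
cors_config = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["PUT", "POST"],
            "AllowedOrigins": ["*"],
            "ExposeHeaders": ["ETag"],
        }
    ]
}

print(json.dumps(cors_config["CORSRules"], indent=2))

# Applying it requires boto3 and appropriate credentials (not run here):
#
#   import boto3
#   boto3.client("s3").put_bucket_cors(
#       Bucket="your-bucket", CORSConfiguration=cors_config
#   )
```

For production buckets, consider narrowing `AllowedOrigins` from `"*"` to the specific origin you load the tool from.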

Deterministic IDs

Create consistent IDs across multiple datasets to test relational data and SQL JOINs.

How It Works

Deterministic IDs generate the same set of IDs for a given configuration, allowing you to create related datasets:

// Customer dataset
customer_id | name
CUST-1001  | John Smith
CUST-1002  | Jane Doe

// Orders dataset (using same deterministic_id config)
order_id   | customer_id
ORD-5001  | CUST-1001
ORD-5002  | CUST-1002
ORD-5003  | CUST-1001

Configuration

  • Prefix: Text prefix (e.g., "CUST-", "ORD-")
  • Start Value: First ID number
  • Padding: Zero-pad to width (CUST-0001 vs CUST-1)
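The three options above fully determine the ID sequence, which is what makes it deterministic: the same prefix, start value, and padding always produce the same set of IDs. A minimal sketch (illustrative, not the tool's implementation):

```python
def deterministic_ids(prefix, start_value, count, padding=0):
    """Yield IDs like CUST-1001; the same config always yields the same
    sequence, which is what makes cross-dataset JOINs line up."""
    for i in range(start_value, start_value + count):
        yield f"{prefix}{i:0{padding}d}"

customers = list(deterministic_ids("CUST-", 1001, 3))
print(customers)  # ['CUST-1001', 'CUST-1002', 'CUST-1003']

# Zero-padding: width 4 gives CUST-0001 rather than CUST-1
padded = list(deterministic_ids("CUST-", 1, 2, padding=4))
print(padded)  # ['CUST-0001', 'CUST-0002']
```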

Use Cases

  • Testing foreign key relationships
  • Creating multi-table test scenarios
  • Validating JOIN operations in Athena/Redshift
  • Building dimensional models

Batch Processing

Automatically process and upload all saved configurations in sequence, which makes it easy to populate multiple datasets in one operation.

How It Works

Batch processing cycles through all your saved configuration presets, loading each one, generating data, and uploading to S3:

  1. Loads configuration from dropdown (in order)
  2. Restores all settings including split preferences
  3. Generates data according to config
  4. Optionally pauses for review
  5. Uploads to S3 respecting each config's split settings
  6. Moves to next configuration

Configuration Requirements

Each configuration must have:

  • rowCount: Number of rows to generate
  • outputFormat: CSV, NDJSON, or Parquet

Configurations missing these fields will be skipped with a warning.
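The skip-with-a-warning behavior can be sketched as a validation pass before the batch loop starts. This is an illustrative Python model of the documented rules, not the tool's own code; the `name` key is an assumed config field used only for the warning message:

```python
import logging

REQUIRED_KEYS = ("rowCount", "outputFormat")
VALID_FORMATS = {"CSV", "NDJSON", "Parquet"}

def partition_configs(configs):
    """Split saved configs into runnable and skipped, warning on each skip."""
    runnable, skipped = [], []
    for cfg in configs:
        missing = [k for k in REQUIRED_KEYS if k not in cfg]
        if missing or cfg.get("outputFormat") not in VALID_FORMATS:
            logging.warning("Skipping config %r: missing %s",
                            cfg.get("name", "?"), missing or "valid outputFormat")
            skipped.append(cfg)
        else:
            runnable.append(cfg)
    return runnable, skipped

configs = [
    {"name": "customers", "rowCount": 1000, "outputFormat": "CSV"},
    {"name": "broken"},  # missing both required keys -> skipped
]
runnable, skipped = partition_configs(configs)
print(len(runnable), len(skipped))  # 1 1
```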

Per-Config Split Settings (v2.9.1)

New in v2.9.1: Split settings (Split by Date, Split by Fields) are now saved with each configuration. This means:

  • Config A can split by date while Config B doesn't split
  • Each config remembers its own split preferences
  • Batch processing respects individual config settings
  • No manual checkbox changes needed between configs

Pause Mode

Enable "Pause for confirmation before each upload" to:

  • Review generated data in the preview table
  • Verify configuration loaded correctly
  • Check console logs for any issues
  • Click Continue to proceed or Stop to cancel

Progress Tracking

During batch processing, you'll see:

  • Current config being processed (e.g., "Config 3 of 12")
  • Real-time success/failed/skipped counts
  • Detailed console logging of all operations
  • Final summary when complete

Use Cases

  • Daily Data Pipeline: Generate multiple test datasets in one click
  • Multi-Region Deployment: Upload same data to different S3 paths
  • Testing Scenarios: Create variations of test data automatically
  • Client Deliverables: Process multiple client datasets at once

Example Workflow

1. Create configs for:
   - Customers (1000 rows, split by country)
   - Orders (5000 rows, split by date)
   - Products (500 rows, no splitting)

2. Click "Batch Upload All Configs"

3. Result:
   ✓ Customers → 5 files (USA, Canada, UK, Germany, France)
   ✓ Orders → 30 files (one per day)
   ✓ Products → 1 file
   
Total: 36 files uploaded automatically
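The "split by fields" step in the workflow above amounts to grouping rows by a column and writing one file per distinct value. A minimal sketch under that assumption (illustrative, not the tool's implementation):

```python
import csv
import io
from collections import defaultdict

def split_by_field(rows, field, base_name):
    """Group rows into one CSV string per distinct value of `field`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    files = {}
    for value, group in groups.items():
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=group[0].keys())
        writer.writeheader()
        writer.writerows(group)
        files[f"{base_name}_{value}.csv"] = buf.getvalue()
    return files

rows = [
    {"customer_id": "CUST-1001", "country": "USA"},
    {"customer_id": "CUST-1002", "country": "Canada"},
    {"customer_id": "CUST-1003", "country": "USA"},
]
files = split_by_field(rows, "country", "customers")
print(sorted(files))  # ['customers_Canada.csv', 'customers_USA.csv']
```

With two distinct countries, three rows become two files, matching the "Customers → 5 files" arithmetic in the example (five countries, five files).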

Configuration Presets

Save and load field configurations for reuse and sharing.

Saving Presets

  1. Configure your fields
  2. Click "Save Configuration"
  3. Downloads as JSON file

Loading Presets

  1. Click "Load Configuration"
  2. Select your JSON preset file
  3. Fields are automatically configured
  4. Configuration is auto-saved for the session

Preset Format

{
  "fields": [
    {
      "name": "customer_id",
      "type": "deterministic_id",
      "prefix": "CUST-",
      "startValue": 1001
    },
    {
      "name": "email",
      "type": "email",
      "nullable": false
    }
  ],
  "version": "2.3.0"
}
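Since presets are plain JSON, they are easy to validate or generate in scripts before sharing. A small sketch that parses the format shown above and sanity-checks it (the checks are illustrative; the tool's own validation may be stricter):

```python
import json

PRESET_JSON = """
{
  "fields": [
    {"name": "customer_id", "type": "deterministic_id",
     "prefix": "CUST-", "startValue": 1001},
    {"name": "email", "type": "email", "nullable": false}
  ],
  "version": "2.3.0"
}
"""

def load_preset(text):
    """Parse a preset file and check that every field has a name and type."""
    preset = json.loads(text)
    if not isinstance(preset.get("fields"), list) or not preset["fields"]:
        raise ValueError("preset needs a non-empty 'fields' list")
    for field in preset["fields"]:
        if "name" not in field or "type" not in field:
            raise ValueError(f"incomplete field: {field}")
    return preset

preset = load_preset(PRESET_JSON)
print([f["name"] for f in preset["fields"]])  # ['customer_id', 'email']
```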
💡 Team Sharing: Share preset files with your team to ensure consistent test data schemas across projects.

Multi-Column Sorting

Sort generated data by multiple columns with configurable sort order.

Usage

  1. Generate your data
  2. Click column headers to sort
  3. Shift+Click for multi-column sort
  4. Click again to toggle ascending/descending

Use Cases

  • Test query sorting performance
  • Validate sort order logic
  • Create ordered test scenarios
  • Check index performance

AWS Analytics Integration

Amazon Athena

Query your generated data using Athena:

-- Create external table pointing to S3
CREATE EXTERNAL TABLE customers (
  customer_id STRING,
  name STRING,
  email STRING,
  signup_date DATE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-bucket/data/'
TBLPROPERTIES ('skip.header.line.count'='1');

-- Query your test data
SELECT * FROM customers LIMIT 10;

Amazon Redshift

Load data into Redshift:

-- Copy from S3 to Redshift
COPY customers
FROM 's3://your-bucket/data/'
IAM_ROLE 'arn:aws:iam::account:role/RedshiftRole'
CSV
IGNOREHEADER 1;

AWS Glue

Crawl and catalog your test data:

  1. Upload CSV to S3
  2. Create Glue Crawler pointing to S3 path
  3. Run crawler to auto-detect schema
  4. Query via Athena or use in ETL jobs

Troubleshooting

S3 Upload Fails

Common causes and fixes:

  • Check AWS credentials are correct
  • Verify bucket exists and region is correct
  • Ensure CORS is configured on bucket
  • Check bucket permissions allow PUT operations
  • Review browser console for detailed error messages

Large File Generation Slow

  • Generate in smaller batches (e.g., 100K rows at a time)
  • Use pagination to preview without generating all rows
  • Consider closing other browser tabs

Configuration Not Saving

  • Check browser allows local storage
  • Make sure you're not in private/incognito mode (local storage may be disabled)
  • Try clearing browser cache and reloading
🔍 Debug Mode: Check the browser console (F12) for detailed logging and error messages.

Frequently Asked Questions

Is my data sent to any servers?

No. CSV Generator Pro runs entirely in your browser. Data is generated client-side and either downloaded to your computer or uploaded directly to your S3 bucket. No data passes through external servers.

How much data can I generate?

The limit depends on your browser's memory. Most modern browsers can handle millions of rows, but for very large datasets (10M+ rows), generate in batches.

Can I use this for production data?

This tool is designed for test data generation. While the output looks realistic, it is synthetic and should not be treated as real production data.

What browsers are supported?

All modern browsers including Chrome, Firefox, Safari, and Edge. Requires JavaScript enabled.

Does CSV Generator Pro support Parquet format?

Yes! Parquet import and export are fully supported as of v2.8.0. You can import existing Parquet files, generate data and export to Parquet, or convert between CSV/NDJSON/Parquet formats. Parquet files use optimized columnar storage for faster Athena and Redshift queries.

How does batch processing work?

New in v2.9.0! Batch processing automatically cycles through all your saved configuration presets, generating and uploading each one to S3. You can enable "pause mode" to review data before each upload. Each configuration remembers its own split settings (v2.9.1), so some configs can split by date while others don't. Perfect for populating multiple test datasets in one operation.

Can I contribute or request features?

Yes! This is an open source project. Submit feature requests or contribute via GitHub.

How do I report bugs?

Report issues on GitHub or check the console for error messages to include in your bug report.

Need More Help?

If you have questions not covered here, check the GitHub repository or consider supporting the project to help fund expanded documentation.
