CSV Generator Pro Documentation
Version 2.9.1 | Last Updated: November 2024
Overview
CSV Generator Pro is a browser-based tool designed for data engineers, data scientists, and developers who need to generate realistic test data for AWS analytics services like Athena and Redshift. It runs entirely in your browser with no server-side processing or data collection.
Key Features
- 41+ realistic data field types
- Direct upload to Amazon S3
- Deterministic ID generation for SQL JOINs
- Configuration presets for reusability
- Multi-column sorting
- Pagination for large datasets
- Auto-save for loaded configurations
- 100% client-side, no data leaves your browser
Getting Started
Basic Usage
- Add Fields: Click "Add Field" to create a new column
- Configure Field: Select field type and set options (name, min/max values, etc.)
- Set Row Count: Specify how many rows to generate
- Generate: Click "Generate CSV" to create your data
- Export: Download as CSV or upload to S3
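For example, a simple three-field setup (full_name, email, signup_date) produces a file along these lines (values are illustrative):

```text
full_name,email,signup_date
John Smith,john.smith@example.com,2024-03-14
Sarah Johnson,sarah.johnson@example.com,2024-06-02
```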
Field Types
CSV Generator Pro supports 41+ field types designed to create realistic test data.
Personal Information
| Field Type | Description | Example |
|---|---|---|
| full_name | Complete name (first + last) | John Smith |
| first_name | First name only | Sarah |
| last_name | Last name only | Johnson |
| email | Email address | john.smith@example.com |
| phone | Phone number | (555) 123-4567 |
| ssn | Social Security Number | 123-45-6789 |
Location Data
| Field Type | Description | Example |
|---|---|---|
| street_address | Street address | 123 Main Street |
| city | City name | San Francisco |
| state | US state abbreviation | CA |
| zip_code | ZIP code | 94102 |
| country | Country name | United States |
| latitude | Latitude coordinate | 37.7749 |
| longitude | Longitude coordinate | -122.4194 |
Numeric & Financial
| Field Type | Description | Configurable |
|---|---|---|
| integer | Whole numbers | Min/Max values |
| decimal | Floating point numbers | Min/Max, decimal places |
| currency | Money amounts | Min/Max, currency symbol |
| credit_card | Credit card numbers | Card type (Visa, MC, etc.) |
| percentage | Percentage values | 0-100 range |
Date & Time
| Field Type | Description | Configurable |
|---|---|---|
| date | Date values | Start/End date, format |
| datetime | Date and time | Start/End range, format |
| time | Time only | Format (12/24 hour) |
| timestamp | Unix timestamp | Date range |
Technical & Identifiers
| Field Type | Description | Example |
|---|---|---|
| uuid | UUID v4 | 550e8400-e29b-41d4-a716-446655440000 |
| deterministic_id | Consistent IDs for JOINs | CUST-1001 |
| ip_address | IPv4 address | 192.168.1.1 |
| mac_address | MAC address | 00:1B:44:11:3A:B7 |
| url | Web URL | https://example.com/page |
| username | Username | jsmith42 |
Configuration Options
Field Configuration
Each field type has specific configuration options:
Common Options
- Field Name: Column header in the CSV
- Field Type: Type of data to generate
- Nullable: Allow null/empty values
Numeric Options
- Min Value: Minimum number
- Max Value: Maximum number
- Decimal Places: Precision for decimals
Date Options
- Start Date: Earliest possible date
- End Date: Latest possible date
- Format: Date format string (YYYY-MM-DD, MM/DD/YYYY, etc.)
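Taken together, one field's settings can be pictured as a single definition object. The sketch below is illustrative only; the property names are not the tool's exact schema (see Preset Format below for a real saved file):

```typescript
// Illustrative only -- the actual preset schema may differ.
interface FieldConfig {
  name: string;           // Field Name: column header in the CSV
  type: string;           // Field Type: e.g. "decimal", "date", "email"
  nullable?: boolean;     // allow null/empty values
  min?: number;           // numeric: minimum value
  max?: number;           // numeric: maximum value
  decimalPlaces?: number; // numeric: precision for decimals
  startDate?: string;     // date: earliest possible date
  endDate?: string;       // date: latest possible date
  format?: string;        // date: format string, e.g. "YYYY-MM-DD"
}

// A decimal field between 5 and 500 with two decimal places:
const orderTotal: FieldConfig = {
  name: "order_total",
  type: "decimal",
  min: 5,
  max: 500,
  decimalPlaces: 2,
  nullable: false,
};
```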
Amazon S3 Upload
Upload generated CSV files directly to your S3 bucket.
Setup
- Configure AWS credentials (Access Key ID and Secret Access Key)
- Specify S3 bucket name
- Set AWS region (e.g., us-east-1)
- Define file path using dynamic templates
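Conceptually, the upload is one authenticated PUT to your bucket. Below is a minimal sketch of the equivalent call using the AWS SDK for JavaScript v3; the tool performs this step for you, and the bucket, region, key, and credentials shown are placeholders:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// Placeholder credentials -- prefer short-lived credentials in practice.
const s3 = new S3Client({
  region: "us-east-1",
  credentials: {
    accessKeyId: "YOUR_ACCESS_KEY_ID",
    secretAccessKey: "YOUR_SECRET_ACCESS_KEY",
  },
});

const csvString = "customer_id,name\nCUST-1001,John Smith\n"; // generated data

await s3.send(new PutObjectCommand({
  Bucket: "your-bucket",
  Key: "data/2024/11/15/customers_1700012345.csv",
  Body: csvString,
  ContentType: "text/csv",
}));
```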
Dynamic Path Templating
Use placeholders in your S3 path for dynamic organization:

```text
data/{YYYY}/{MM}/{DD}/{table_name}_{timestamp}.csv
```

Examples:

```text
data/2024/11/15/customers_1700012345.csv
data/2024/11/15/orders_1700012346.csv
```
Available Placeholders
- {YYYY} - 4-digit year (2024)
- {MM} - 2-digit month (01-12)
- {DD} - 2-digit day (01-31)
- {HH} - 2-digit hour (00-23)
- {mm} - 2-digit minute (00-59)
- {ss} - 2-digit second (00-59)
- {timestamp} - Unix timestamp
- {table_name} - Table/dataset name (as in the path examples above)
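A sketch of how this substitution could be implemented (the tool's internals aren't specified here; the tableName argument is an assumption about where {table_name} comes from):

```typescript
// Expand placeholders in an S3 path template.
function expandPath(template: string, tableName: string, date = new Date()): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  return template
    .replace("{YYYY}", String(date.getFullYear()))
    .replace("{MM}", pad(date.getMonth() + 1)) // getMonth() is 0-based
    .replace("{DD}", pad(date.getDate()))
    .replace("{HH}", pad(date.getHours()))
    .replace("{mm}", pad(date.getMinutes()))
    .replace("{ss}", pad(date.getSeconds()))
    .replace("{timestamp}", String(Math.floor(date.getTime() / 1000)))
    .replace("{table_name}", tableName);
}

// expandPath("data/{YYYY}/{MM}/{DD}/{table_name}_{timestamp}.csv", "customers")
// -> e.g. "data/2024/11/15/customers_1700012345.csv"
```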
CORS Configuration
Your S3 bucket must allow CORS requests. Add this policy to your bucket:
```json
[
  {
    "AllowedHeaders": ["*"],
    "AllowedMethods": ["PUT", "POST"],
    "AllowedOrigins": ["*"],
    "ExposeHeaders": ["ETag"]
  }
]
```
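If you'd rather set the policy from code than from the S3 console, the same rules can be applied with the AWS SDK for JavaScript v3 (bucket name is a placeholder):

```typescript
import { S3Client, PutBucketCorsCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({ region: "us-east-1" });

// Apply the CORS rules shown above to the bucket.
await s3.send(new PutBucketCorsCommand({
  Bucket: "your-bucket",
  CORSConfiguration: {
    CORSRules: [{
      AllowedHeaders: ["*"],
      AllowedMethods: ["PUT", "POST"],
      AllowedOrigins: ["*"],
      ExposeHeaders: ["ETag"],
    }],
  },
}));
```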
Deterministic IDs
Create consistent IDs across multiple datasets to test relational data and SQL JOINs.
How It Works
Deterministic IDs generate the same set of IDs for a given configuration, allowing you to create related datasets:
```text
// Customer dataset
customer_id | name
CUST-1001   | John Smith
CUST-1002   | Jane Doe

// Orders dataset (using same deterministic_id config)
order_id | customer_id
ORD-5001 | CUST-1001
ORD-5002 | CUST-1002
ORD-5003 | CUST-1001
```
Configuration
- Prefix: Text prefix (e.g., "CUST-", "ORD-")
- Start Value: First ID number
- Padding: Zero-pad to width (CUST-0001 vs CUST-1)
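The exact generator isn't documented here, but conceptually a deterministic ID is just the prefix plus a padded counter, so the same configuration always reproduces the same ID list. A minimal sketch:

```typescript
// Same { prefix, startValue, count, padding } -> same ID list,
// which is what lets two datasets share a key space for JOINs.
function deterministicIds(
  prefix: string,
  startValue: number,
  count: number,
  padding = 0,
): string[] {
  return Array.from({ length: count }, (_, i) =>
    prefix + String(startValue + i).padStart(padding, "0"),
  );
}

const customerIds = deterministicIds("CUST-", 1001, 1000); // CUST-1001 ... CUST-2000
// An orders dataset can draw its foreign keys from the same pool:
const customerId = customerIds[Math.floor(Math.random() * customerIds.length)];
```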
Use Cases
- Testing foreign key relationships
- Creating multi-table test scenarios
- Validating JOIN operations in Athena/Redshift
- Building dimensional models
Batch Processing
Automatically process and upload all saved configurations in sequence, perfect for populating multiple datasets in one operation.
How It Works
Batch processing cycles through all your saved configuration presets, loading each one, generating data, and uploading to S3:
- Loads configuration from dropdown (in order)
- Restores all settings including split preferences
- Generates data according to config
- Optionally pauses for review
- Uploads to S3 respecting each config's split settings
- Moves to next configuration
Configuration Requirements
Each configuration must have:
- rowCount: Number of rows to generate
- outputFormat: CSV, NDJSON, or Parquet
Configurations missing these fields will be skipped with a warning.
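A sketch of this loop under stated assumptions: the config shape is simplified, and generateData / uploadToS3 are placeholder stubs, not the tool's actual API:

```typescript
interface BatchConfig {
  name: string;
  rowCount?: number;
  outputFormat?: "CSV" | "NDJSON" | "Parquet";
}

// Placeholder stubs standing in for the tool's internal steps.
function generateData(config: BatchConfig): string {
  return `stub: ${config.rowCount} rows for ${config.name}`;
}
async function uploadToS3(config: BatchConfig, data: string): Promise<void> {
  // A real upload would PUT to S3, honoring the config's split settings.
}

async function runBatch(configs: BatchConfig[]): Promise<void> {
  let success = 0, failed = 0, skipped = 0;
  for (const config of configs) {
    // Configs missing required fields are skipped with a warning.
    if (config.rowCount == null || config.outputFormat == null) {
      console.warn(`Skipping ${config.name}: missing rowCount or outputFormat`);
      skipped++;
      continue;
    }
    try {
      await uploadToS3(config, generateData(config));
      success++;
    } catch (err) {
      console.error(`Failed ${config.name}:`, err);
      failed++;
    }
  }
  console.log(`Batch done: ${success} succeeded, ${failed} failed, ${skipped} skipped`);
}
```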
Per-Config Split Settings (v2.9.1)
New in v2.9.1: Split settings (Split by Date, Split by Fields) are now saved with each configuration. This means:
- Config A can split by date while Config B doesn't split
- Each config remembers its own split preferences
- Batch processing respects individual config settings
- No manual checkbox changes needed between configs
Pause Mode
Enable "Pause for confirmation before each upload" to:
- Review generated data in the preview table
- Verify configuration loaded correctly
- Check console logs for any issues
- Click Continue to proceed or Stop to cancel
Progress Tracking
During batch processing, you'll see:
- Current config being processed (e.g., "Config 3 of 12")
- Real-time success/failed/skipped counts
- Detailed console logging of all operations
- Final summary when complete
Use Cases
- Daily Data Pipeline: Generate multiple test datasets in one click
- Multi-Region Deployment: Upload same data to different S3 paths
- Testing Scenarios: Create variations of test data automatically
- Client Deliverables: Process multiple client datasets at once
Example Workflow
1. Create configs for:
   - Customers (1000 rows, split by country)
   - Orders (5000 rows, split by date)
   - Products (500 rows, no splitting)
2. Click "Batch Upload All Configs"
3. Result:
   - ✓ Customers → 5 files (USA, Canada, UK, Germany, France)
   - ✓ Orders → 30 files (one per day)
   - ✓ Products → 1 file

Total: 36 files uploaded automatically
Configuration Presets
Save and load field configurations for reuse and sharing.
Saving Presets
- Configure your fields
- Click "Save Configuration"
- Downloads as JSON file
Loading Presets
- Click "Load Configuration"
- Select your JSON preset file
- Fields are automatically configured
- Configuration is auto-saved for the session
Preset Format
```json
{
  "fields": [
    {
      "name": "customer_id",
      "type": "deterministic_id",
      "prefix": "CUST-",
      "startValue": 1001
    },
    {
      "name": "email",
      "type": "email",
      "nullable": false
    }
  ],
  "version": "2.3.0"
}
```
Multi-Column Sorting
Sort generated data by multiple columns with configurable sort order.
Usage
- Generate your data
- Click column headers to sort
- Shift+Click for multi-column sort
- Click again to toggle ascending/descending
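Conceptually, Shift+Click builds an ordered list of sort keys: earlier keys take priority, and later keys break ties. A sketch of such a comparator (not the tool's internal code):

```typescript
type Row = Record<string, string | number>;
type SortSpec = { column: string; direction: "asc" | "desc" };

// Numbers compare numerically, everything else as strings.
function compareValues(x: string | number, y: string | number): number {
  if (typeof x === "number" && typeof y === "number") return x - y;
  return String(x).localeCompare(String(y));
}

// Walk the specs in priority order; fall through to the next column on ties.
function multiColumnCompare(a: Row, b: Row, specs: SortSpec[]): number {
  for (const { column, direction } of specs) {
    const cmp = compareValues(a[column], b[column]);
    if (cmp !== 0) return direction === "asc" ? cmp : -cmp;
  }
  return 0;
}

const rows: Row[] = [
  { state: "CA", city: "San Francisco" },
  { state: "NY", city: "New York" },
  { state: "CA", city: "Los Angeles" },
];
rows.sort((a, b) =>
  multiColumnCompare(a, b, [
    { column: "state", direction: "asc" },
    { column: "city", direction: "desc" },
  ]),
);
// -> CA/San Francisco, CA/Los Angeles, NY/New York
```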
Use Cases
- Test query sorting performance
- Validate sort order logic
- Create ordered test scenarios
- Check index performance
AWS Analytics Integration
Amazon Athena
Query your generated data using Athena:
```sql
-- Create external table pointing to S3
CREATE EXTERNAL TABLE customers (
  customer_id STRING,
  name STRING,
  email STRING,
  signup_date DATE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION 's3://your-bucket/data/'
TBLPROPERTIES ('skip.header.line.count'='1');

-- Query your test data
SELECT * FROM customers LIMIT 10;
```
Amazon Redshift
Load data into Redshift:
```sql
-- Copy from S3 to Redshift
COPY customers
FROM 's3://your-bucket/data/'
IAM_ROLE 'arn:aws:iam::account:role/RedshiftRole'
CSV
IGNOREHEADER 1;
```
AWS Glue
Crawl and catalog your test data:
- Upload CSV to S3
- Create Glue Crawler pointing to S3 path
- Run crawler to auto-detect schema
- Query via Athena or use in ETL jobs
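If you script this step, the crawler can be created and started with the AWS SDK for JavaScript v3 (crawler name, role ARN, database, and path below are placeholders):

```typescript
import {
  GlueClient,
  CreateCrawlerCommand,
  StartCrawlerCommand,
} from "@aws-sdk/client-glue";

const glue = new GlueClient({ region: "us-east-1" });

// Point a crawler at the uploaded CSV prefix...
await glue.send(new CreateCrawlerCommand({
  Name: "csv-test-data-crawler",
  Role: "arn:aws:iam::123456789012:role/GlueCrawlerRole",
  DatabaseName: "test_data",
  Targets: { S3Targets: [{ Path: "s3://your-bucket/data/" }] },
}));

// ...then run it to detect the schema and populate the Data Catalog.
await glue.send(new StartCrawlerCommand({ Name: "csv-test-data-crawler" }));
```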
Troubleshooting
S3 Upload Fails
Common causes and checks:
- Check AWS credentials are correct
- Verify bucket exists and region is correct
- Ensure CORS is configured on bucket
- Check bucket permissions allow PUT operations
- Review browser console for detailed error messages
Large File Generation Slow
- Generate in smaller batches (e.g., 100K rows at a time)
- Use pagination to preview without generating all rows
- Consider closing other browser tabs
Configuration Not Saving
- Check that your browser allows local storage
- Make sure you are not in private/incognito mode
- Try clearing the browser cache and reloading
Frequently Asked Questions
Is my data sent to any servers?
No. CSV Generator Pro runs entirely in your browser. Data is generated client-side and either downloaded to your computer or uploaded directly to your S3 bucket. No data passes through external servers.
How much data can I generate?
The limit depends on your browser's memory. Most modern browsers can handle millions of rows, but for very large datasets (10M+ rows), generate in batches.
Can I use this for production data?
This tool is designed for test data generation. While the data looks realistic, it is entirely synthetic and should not be used in place of real production data.
What browsers are supported?
All modern browsers including Chrome, Firefox, Safari, and Edge. Requires JavaScript enabled.
Does CSV Generator Pro support Parquet format?
Yes! Parquet import and export are fully supported as of v2.8.0. You can import existing Parquet files, generate data and export to Parquet, or convert between CSV/NDJSON/Parquet formats. Parquet files use optimized columnar storage for faster Athena and Redshift queries.
How does batch processing work?
New in v2.9.0! Batch processing automatically cycles through all your saved configuration presets, generating and uploading each one to S3. You can enable "pause mode" to review data before each upload. Each configuration remembers its own split settings (v2.9.1), so some configs can split by date while others don't. Perfect for populating multiple test datasets in one operation.
Can I contribute or request features?
Yes! This is an open source project. Submit feature requests or contribute via GitHub.
How do I report bugs?
Report issues on GitHub or check the console for error messages to include in your bug report.
Need More Help?
If you have questions not covered here, check the GitHub repository or consider supporting the project to help fund expanded documentation.