🆕 What's New in v2.9.2
Auto-Add Unknown Fields Feature
A new checkbox in the Import Data section automatically adds custom fields found in imported files. Perfect for working with company-specific or legacy data formats!
- Enabled: Automatically adds any fields not in the standard list
- Disabled: Rejects files with incompatible fields (previous behavior)
- Default: Enabled (checked) for maximum flexibility
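The checkbox behavior can be pictured as a simple gate on imported column names. Below is a minimal Python sketch, illustrative only: the tool runs entirely in the browser, and `STANDARD_FIELDS` here is an abbreviated, hypothetical list rather than the real field catalog.

```python
STANDARD_FIELDS = {"id", "firstName", "lastName", "email"}  # hypothetical, abbreviated list

def resolve_fields(imported_columns, auto_add_unknown=True):
    """Return the columns to load, mirroring the documented checkbox behavior."""
    unknown = [c for c in imported_columns if c not in STANDARD_FIELDS]
    if unknown and not auto_add_unknown:
        # Disabled: reject files with incompatible fields (previous behavior)
        raise ValueError(f"Unknown fields rejected: {unknown}")
    # Enabled: unknown fields are added alongside the standard ones
    return list(imported_columns)

print(resolve_fields(["id", "email", "customer_tier"]))
# ['id', 'email', 'customer_tier']
```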
🏃 30-Second Quick Start
⏱️ 30 seconds
Generate Your First Dataset
- Open `csv-generator-pro.html` in your browser
- Click "Select Common" button (selects frequently-used fields)
- Click "Generate Data" button
- Click "Download CSV" to save your file
✅ Done! You've just created a realistic dataset with 1000 rows.
📋 Common Tasks
⏱️ 1 minute
Task 1: Generate Customer Data
- Select these fields: `id, firstName, lastName, email, phone, city, country, status`
- Set rows to `500`
- Click "Generate Data"
- Click "Download CSV"
⏱️ 1 minute
Task 2: Use a Built-in Preset
- Click the configuration dropdown
- Select "Sales Transaction Log"
- Click "Load Config"
- Click "Generate Data"
- Click "Download CSV"
⏱️ 2 minutes
Task 3: Import and Convert Files NEW
- Click "Choose File" in Import Data section
- Select your CSV, NDJSON, JSON, or Parquet file
- If the file has custom fields:
  - ✅ Keep "Auto-add unknown fields" checked (default)
  - Custom fields will be added automatically
- Data loads automatically with the imported fields selected
- Choose a new output format (CSV, NDJSON, or Parquet)
- Click "Download" to save the data in the new format
Example: Import a Parquet file with custom fields like customer_id and order_total, then export as CSV for Excel analysis.
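If you ever need to reproduce this conversion outside the tool, here is a minimal pandas sketch (assumes a Parquet engine such as pyarrow is installed; file names are examples only):

```python
import pandas as pd

# Read the Parquet file; custom columns such as customer_id and order_total
# come through unchanged.
df = pd.read_parquet("orders.parquet")

# Write CSV for spreadsheet tools like Excel.
df.to_csv("orders_for_excel.csv", index=False)
```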
⏱️ 3 minutes
Task 4: Create Consistent IDs for SQL Joins
- Check "Enable Deterministic IDs"
- Select method:
Standard(uses firstName + lastName + email) - Select fields:
id, firstName, lastName, email, date - Generate 1000 rows
- Save as
customers.csv - Change fields to:
id, firstName, lastName, email, product, price - Generate 5000 rows (using same Standard method)
- Save as
orders.csv
✅ Now you can JOIN these tables on the id field because the same person gets the same ID in both files!
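Conceptually, a deterministic ID is a pure function of the identity fields, so identical inputs always produce the identical ID. A short Python sketch of the idea (an illustration only, not the tool's actual "Standard" formula; the hashing scheme and `deterministic_id` helper are assumptions):

```python
import hashlib

def deterministic_id(first_name: str, last_name: str, email: str) -> str:
    # Same inputs -> same hash -> same ID, in every file that uses this method.
    key = f"{first_name}|{last_name}|{email}".lower()
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

# The same person gets the same id in customers.csv and orders.csv,
# so the two tables can later be joined on id.
print(deterministic_id("Ada", "Lovelace", "ada@example.com"))
print(deterministic_id("Ada", "Lovelace", "ada@example.com"))  # identical output
```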
⏱️ 5 minutes
Task 5: Upload to AWS S3 with Partitioning
- Fill in S3 credentials (bucket, region, access key, secret key)
- Set S3 Directory: `sales/category={{category}}/year=yyyy/month=mm/`
- Select fields including `category` and `date`
- Enable "Random Dates" with range 2024-01-01 to 2024-12-31
- Check "Split by Fields" and "Split by Date"
- Click "Generate Data"
- Click "Quick Upload to S3"
✅ Files will be organized into partitioned directories like:
`sales/category=Electronics/year=2024/month=11/`
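To make the placeholders concrete, here is a hedged sketch of how a directory template like the one above could expand for a single row (the substitution code below is an illustration based on the documented placeholders, not the tool's source):

```python
from datetime import date

def expand_path(template: str, row: dict) -> str:
    # {{field}} placeholders take the row's value; yyyy/mm take the row's date parts.
    d = row["date"]
    path = template.replace("{{category}}", str(row["category"]))
    return path.replace("yyyy", f"{d.year:04d}").replace("mm", f"{d.month:02d}")

row = {"category": "Electronics", "date": date(2024, 11, 5)}
print(expand_path("sales/category={{category}}/year=yyyy/month=mm/", row))
# sales/category=Electronics/year=2024/month=11/
```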
🔑 Key Features Quick Reference
Import & Export
- Import formats: CSV, NDJSON, JSON, Parquet
- Export formats: CSV, NDJSON, Parquet
- Auto-add fields: Automatically handle custom field names (v2.9.2)
- Format conversion: Import any format, export to any format
Field Selection
- 41+ field types covering personal, business, product, and technical data
- Quick buttons: Select All, Deselect All, Select Common
- Custom fields: Import files with any field names using auto-add feature
Data Generation
- Row counts: 1 to 1,000,000 records
- Date options: Fixed date or random date ranges
- Deterministic IDs: Create consistent IDs for multi-table relationships
- Deduplication: Automatic handling of duplicate records
AWS S3 Integration
- Direct upload: No additional tools needed
- Dynamic paths: Use {{field}} and yyyy/mm/dd placeholders
- File splitting: Automatic split by date or field values
- Hive partitioning: Create data lake structures like `year=2024/month=11/`
- Test CORS: Validate bucket configuration before uploading
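For comparison, the same partitioned layout can also be written from a script; a short boto3 sketch with placeholder bucket, key, and file names (the tool itself uploads directly from the browser, so this is only an out-of-tool equivalent):

```python
import boto3

s3 = boto3.client("s3")  # credentials resolved from your environment/profile

# Hive-style key: the year=/month= path segments are what query engines expect.
s3.upload_file(
    Filename="sales_2024_11.csv",
    Bucket="my-data-lake",  # placeholder bucket name
    Key="sales/category=Electronics/year=2024/month=11/sales_2024_11.csv",
)
```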
Configuration Management
- 12 built-in presets: Customer lists, sales logs, product inventory, etc.
- Save custom configs: Store your field selections and settings
- Import/Export: Share configurations with team members
- Batch processing: Automatically process multiple configs
🔄 Common Workflows
Workflow 1: Data Warehouse Testing
- Create customers table - 50,000 rows with deterministic IDs
- Create orders table - 200,000 rows with same ID method
- Create products table - 5,000 rows
- Upload all to S3 with Hive-style partitioning
- Test Athena queries with realistic data volumes
Workflow 2: Format Conversion
- Import your existing CSV file
- Custom fields? Auto-add feature handles them automatically
- Select Parquet as output format
- Download - your file is now 10-100x smaller
- Upload to S3 for use with Athena or Redshift
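The size reduction is easy to verify outside the tool as well; a short sketch of the equivalent conversion and size comparison, assuming pandas plus a Parquet engine such as pyarrow, with example file names (actual ratios depend on your data):

```python
import os
import pandas as pd

df = pd.read_csv("legacy_export.csv")                 # example input file
df.to_parquet("legacy_export.parquet", index=False)   # columnar, compressed output

ratio = os.path.getsize("legacy_export.csv") / os.path.getsize("legacy_export.parquet")
print(f"Parquet is {ratio:.1f}x smaller than the CSV")
```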
Workflow 3: Batch Data Lake Population
- Create configs for each table (customers, orders, products, etc.)
- Set S3 paths with partitioning for each config
- Enable split settings for large datasets
- Start batch upload - all tables generated and uploaded automatically
- Enable pause mode to review each dataset before upload
Workflow 4: Legacy Data Modernization
- Import old CSV file with custom field names
- Auto-add fields preserves all original columns
- Add standard fields if needed for enrichment
- Export as Parquet for modern data lake
- Upload to S3 with proper partitioning
💡 Pro Tips
- Start small: Generate 100 rows first to verify your configuration
- Use presets: The built-in configurations are optimized and ready to use
- Enable console logging: See exactly what's happening during generation and upload
- Test CORS first: Before uploading to S3, use the "Test CORS" button
- Export configs: Back up your configurations regularly
- Parquet for analytics: Use Parquet format for data warehouse uploads
- Random dates for splitting: Always use random dates when splitting by date
- Deterministic IDs: Use the same method across related tables for JOIN capability
- Batch with pause: Enable pause mode in batch processing to review before upload
- Auto-add for imports: Keep the checkbox enabled to handle any file format
❓ Getting Help
Need more details?
- Complete Help Documentation - Comprehensive guide to all features
- Documentation Hub - All documentation in one place
- Console Logging: Enable it to see detailed information about operations
- Built-in Examples: The 12 presets demonstrate best practices