🆕 What's New in v2.9.2
Auto-Add Unknown Fields Feature
A new checkbox in the Import Data section automatically adds custom fields found in imported files. Perfect for working with company-specific or legacy data formats!
- Enabled: Automatically adds any fields not in the standard list
- Disabled: Rejects files with incompatible fields (previous behavior)
- Default: Enabled (checked) for maximum flexibility
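The checkbox behavior can be pictured as a simple gate on imported column names. Below is a minimal Python sketch, illustrative only: the tool runs entirely in the browser, and `STANDARD_FIELDS` here is an abbreviated, hypothetical list rather than the real field catalog.

```python
STANDARD_FIELDS = {"id", "firstName", "lastName", "email"}  # hypothetical, abbreviated list

def resolve_fields(imported_columns, auto_add_unknown=True):
    """Return the columns to load, mirroring the documented checkbox behavior."""
    unknown = [c for c in imported_columns if c not in STANDARD_FIELDS]
    if unknown and not auto_add_unknown:
        # Disabled: reject files with incompatible fields (previous behavior)
        raise ValueError(f"Unknown fields rejected: {unknown}")
    # Enabled: unknown fields are added alongside the standard ones
    return list(imported_columns)

print(resolve_fields(["id", "email", "customer_tier"]))
# ['id', 'email', 'customer_tier']
```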
🏃 30-Second Quick Start
⏱️ 30 seconds
Generate Your First Dataset
- Open `csv-generator-pro.html` in your browser
- Click "Select Common" button (selects frequently-used fields)
- Click "Generate Data" button
- Click "Download CSV" to save your file
✅ Done! You've just created a realistic dataset with 1000 rows.
📋 Common Tasks
⏱️ 1 minute
Task 1: Generate Customer Data
- Select these fields: `id, firstName, lastName, email, phone, city, country, status`
- Set rows to `500`
- Click "Generate Data"
- Click "Download CSV"
⏱️ 1 minute
Task 2: Use a Built-in Preset
- Click the configuration dropdown
- Select "Sales Transaction Log"
- Click "Load Config"
- Click "Generate Data"
- Click "Download CSV"
⏱️ 2 minutes
Task 3: Import and Convert Files NEW
- Click "Choose File" in Import Data section
- Select your CSV, NDJSON, JSON, or Parquet file
- If the file has custom fields:
  - ✅ Keep "Auto-add unknown fields" checked (default)
  - Custom fields will be added automatically
- Data loads automatically with the imported fields selected
- Choose a new output format (CSV, NDJSON, or Parquet)
- Click "Download" to save the data in the new format
Example: Import a Parquet file with custom fields like customer_id and order_total, then export as CSV for Excel analysis.
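If you ever need to reproduce this conversion outside the tool, here is a minimal pandas sketch (assumes a Parquet engine such as pyarrow is installed; file names are examples only):

```python
import pandas as pd

# Read the Parquet file; custom columns such as customer_id and order_total
# come through unchanged.
df = pd.read_parquet("orders.parquet")

# Write CSV for spreadsheet tools like Excel.
df.to_csv("orders_for_excel.csv", index=False)
```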
⏱️ 3 minutes
Task 4: Create Consistent IDs for SQL Joins
- Check "Enable Deterministic IDs"
- Select method:
Standard(uses firstName + lastName + email) - Select fields:
id, firstName, lastName, email, date - Generate 1000 rows
- Save as
customers.csv - Change fields to:
id, firstName, lastName, email, product, price - Generate 5000 rows (using same Standard method)
- Save as
orders.csv
✅ Now you can JOIN these tables on the id field because the same person gets the same ID in both files!
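Conceptually, a deterministic ID is a pure function of the identity fields, so identical inputs always produce the identical ID. A short Python sketch of the idea (an illustration only, not the tool's actual "Standard" formula; the hashing scheme and `deterministic_id` helper are assumptions):

```python
import hashlib

def deterministic_id(first_name: str, last_name: str, email: str) -> str:
    # Same inputs -> same hash -> same ID, in every file that uses this method.
    key = f"{first_name}|{last_name}|{email}".lower()
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

# The same person gets the same id in customers.csv and orders.csv,
# so the two tables can later be joined on id.
print(deterministic_id("Ada", "Lovelace", "ada@example.com"))
print(deterministic_id("Ada", "Lovelace", "ada@example.com"))  # identical output
```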
⏱️ 5 minutes
Task 5: Upload to AWS S3 with Partitioning
- Fill in S3 credentials (bucket, region, access key, secret key)
- Set S3 Directory: `sales/category={{category}}/year=yyyy/month=mm/`
- Select fields including `category` and `date`
- Enable "Random Dates" with range 2024-01-01 to 2024-12-31
- Check "Split by Fields" and "Split by Date"
- Click "Generate Data"
- Click "Quick Upload to S3"
✅ Files will be organized into partitioned directories like:
`sales/category=Electronics/year=2024/month=11/`
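To make the placeholders concrete, here is a hedged sketch of how a directory template like the one above could expand for a single row (the substitution code below is an illustration based on the documented placeholders, not the tool's source):

```python
from datetime import date

def expand_path(template: str, row: dict) -> str:
    # {{field}} placeholders take the row's value; yyyy/mm take the row's date parts.
    d = row["date"]
    path = template.replace("{{category}}", str(row["category"]))
    return path.replace("yyyy", f"{d.year:04d}").replace("mm", f"{d.month:02d}")

row = {"category": "Electronics", "date": date(2024, 11, 5)}
print(expand_path("sales/category={{category}}/year=yyyy/month=mm/", row))
# sales/category=Electronics/year=2024/month=11/
```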
🔑 Key Features Quick Reference
Import & Export
- Import formats: CSV, NDJSON, JSON, Parquet
- Export formats: CSV, NDJSON, Parquet
- Auto-add fields: Automatically handle custom field names (v2.9.2)
- Format conversion: Import any format, export to any format
Field Selection
- 41+ field types covering personal, business, product, and technical data
- Quick buttons: Select All, Deselect All, Select Common
- Custom fields: Import files with any field names using auto-add feature
Data Generation
- Row counts: 1 to 1,000,000 records
- Date options: Fixed date or random date ranges
- Deterministic IDs: Create consistent IDs for multi-table relationships
- Deduplication: Automatic handling of duplicate records
AWS S3 Integration
- Direct upload: No additional tools needed
- Dynamic paths: Use {{field}} and yyyy/mm/dd placeholders
- File splitting: Automatic split by date or field values
- Hive partitioning: Create data lake structures like `year=2024/month=11/`
- Test CORS: Validate bucket configuration before uploading
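For comparison, the same partitioned layout can also be written from a script; a short boto3 sketch with placeholder bucket, key, and file names (the tool itself uploads directly from the browser, so this is only an out-of-tool equivalent):

```python
import boto3

s3 = boto3.client("s3")  # credentials resolved from your environment/profile

# Hive-style key: the year=/month= path segments are what query engines expect.
s3.upload_file(
    Filename="sales_2024_11.csv",
    Bucket="my-data-lake",  # placeholder bucket name
    Key="sales/category=Electronics/year=2024/month=11/sales_2024_11.csv",
)
```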
Configuration Management
- 12 built-in presets: Customer lists, sales logs, product inventory, etc.
- Save custom configs: Store your field selections and settings
- Import/Export: Share configurations with team members
- Batch processing: Automatically process multiple configs
🔄 Common Workflows
Workflow 1: Data Warehouse Testing
- Create customers table - 50,000 rows with deterministic IDs
- Create orders table - 200,000 rows with same ID method
- Create products table - 5,000 rows
- Upload all to S3 with Hive-style partitioning
- Test Athena queries with realistic data volumes
Workflow 2: Format Conversion
- Import your existing CSV file
- Custom fields? Auto-add feature handles them automatically
- Select Parquet as output format
- Download - your file is now 10-100x smaller
- Upload to S3 for use with Athena or Redshift
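The size reduction is easy to verify outside the tool as well; a short sketch of the equivalent conversion and size comparison, assuming pandas plus a Parquet engine such as pyarrow, with example file names (actual ratios depend on your data):

```python
import os
import pandas as pd

df = pd.read_csv("legacy_export.csv")                 # example input file
df.to_parquet("legacy_export.parquet", index=False)   # columnar, compressed output

ratio = os.path.getsize("legacy_export.csv") / os.path.getsize("legacy_export.parquet")
print(f"Parquet is {ratio:.1f}x smaller than the CSV")
```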
Workflow 3: Batch Data Lake Population
- Create configs for each table (customers, orders, products, etc.)
- Set S3 paths with partitioning for each config
- Enable split settings for large datasets
- Start batch upload - all tables generated and uploaded automatically
- Enable pause mode to review each dataset before upload
Workflow 4: Legacy Data Modernization
- Import old CSV file with custom field names
- Auto-add fields preserves all original columns
- Add standard fields if needed for enrichment
- Export as Parquet for modern data lake
- Upload to S3 with proper partitioning
💡 Pro Tips
- Start small: Generate 100 rows first to verify your configuration
- Use presets: The built-in configurations are optimized and ready to use
- Enable console logging: See exactly what's happening during generation and upload
- Test CORS first: Before uploading to S3, use the "Test CORS" button
- Export configs: Back up your configurations regularly
- Parquet for analytics: Use Parquet format for data warehouse uploads
- Random dates for splitting: Always use random dates when splitting by date
- Deterministic IDs: Use the same method across related tables for JOIN capability
- Batch with pause: Enable pause mode in batch processing to review before upload
- Auto-add for imports: Keep the checkbox enabled to handle any file format
❓ Getting Help
Need more details?
- Complete Help Documentation - Comprehensive guide to all features
- Documentation Hub - All documentation in one place
- Console Logging: Enable it to see detailed information about operations
- Built-in Examples: The 12 presets demonstrate best practices