Use Case: Real-Time Data Export for Analytics Platforms
The Problem: Data Export Bottlenecks
Analytics platforms, business intelligence tools, and data warehouses face a critical challenge: users need to export large datasets, but traditional approaches create significant bottlenecks:
Typical User Request: “Export the last 6 months of customer data with all related reports”
Traditional Export Flow:
- User clicks “Export to ZIP”
- Server queries database (thousands of rows)
- Server generates CSV files in memory
- Server writes files to disk
- Server creates ZIP archive (waiting for all files)
- Server sends ZIP to user
- Server cleans up temporary files
Problems:
- Memory explosion: Loading millions of rows crashes servers
- Slow exports: Users wait 5-10 minutes for large datasets
- Storage costs: Temporary CSV and ZIP files consume disk space
- Concurrency limits: Can’t handle many simultaneous exports
- Failed exports: Timeouts on large datasets waste computation
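To make these problems concrete, here is a minimal sketch of the buffered pattern described above. This is a hypothetical Express route (the /api/exports/legacy path, table name, and use of the archiver package are illustrative, not taken from any real codebase):
// Anti-pattern: the entire result set and the CSV are held in memory before anything is sent
const { Pool } = require('pg');
const archiver = require('archiver');
const pool = new Pool();
app.post('/api/exports/legacy', async (req, res) => {
  // 1. Buffer every row in memory (millions of rows = gigabytes of heap)
  const { rows } = await pool.query(
    'SELECT * FROM customers WHERE user_id = $1',
    [req.body.userId]
  );
  if (rows.length === 0) {
    return res.status(204).end();
  }
  // 2. Build the entire CSV as one string
  const csv = [Object.keys(rows[0]).join(',')]
    .concat(rows.map((r) => Object.values(r).join(',')))
    .join('\n');
  // 3. Only now does the ZIP start; until here the user has been staring at a spinner
  res.set('Content-Type', 'application/zip');
  res.set('Content-Disposition', 'attachment; filename="export.zip"');
  const archive = archiver('zip');
  archive.pipe(res);
  archive.append(csv, { name: 'customers.csv' });
  await archive.finalize();
});
Every export pays the full memory and wait-time cost before the first byte leaves the server, which is exactly what the streaming approach below avoids.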
Real-World Pain Points
SaaS Analytics Platform (100,000 users):
- Average export: 500MB (6 CSV files)
- Daily exports: 5,000 requests
- Peak hour exports: 800 concurrent requests
Infrastructure Costs:
- EC2 instances with 32GB RAM: $400/month (to handle memory spikes)
- EBS storage for temp files: $100/month (1TB)
- Failed exports (timeouts): 15% waste = $75/month in wasted compute
- Total: $575/month
User Experience:
- Average wait time: 4 minutes
- Timeout rate: 15%
- Support tickets: 50/month (“export failed”)
The ZipStream Solution: Streaming Data Exports
ZipStream enables zero-buffer data exports by streaming query results directly into ZIP archives without intermediate storage.
Architecture
Database → Stream Results → Generate CSV (on-the-fly) → ZipStream (concurrent) → User Download
No intermediate files!
Implementation Example
// Express.js example with PostgreSQL and pg-cursor
const { Pool } = require('pg');
const Cursor = require('pg-cursor');
const fetch = require('node-fetch');
app.post('/api/exports/customer-data', async (req, res) => {
const { startDate, endDate, userId } = req.body;
// 1. Create temporary URLs for streaming CSV generation
const csvGenerationUrls = [];
// 2. Generate customers CSV
const customersUrl = await createStreamingCSVEndpoint(
'customers',
`SELECT * FROM customers WHERE user_id = $1 AND created_at BETWEEN $2 AND $3`,
[userId, startDate, endDate]
);
csvGenerationUrls.push({
url: customersUrl,
zipPath: 'data/customers.csv'
});
// 3. Generate orders CSV
const ordersUrl = await createStreamingCSVEndpoint(
'orders',
`SELECT * FROM orders WHERE user_id = $1 AND order_date BETWEEN $2 AND $3`,
[userId, startDate, endDate]
);
csvGenerationUrls.push({
url: ordersUrl,
zipPath: 'data/orders.csv'
});
// 4. Generate analytics summary
const summaryUrl = await createStreamingCSVEndpoint(
'summary',
`SELECT date, SUM(revenue) as revenue, COUNT(*) as orders
FROM orders WHERE user_id = $1 AND order_date BETWEEN $2 AND $3
GROUP BY date ORDER BY date`,
[userId, startDate, endDate]
);
csvGenerationUrls.push({
url: summaryUrl,
zipPath: 'reports/summary.csv'
});
// 5. Add static README
csvGenerationUrls.push({
url: 'https://yourapi.com/export-templates/README.txt',
zipPath: 'README.txt'
});
// 6. Stream everything to ZIP
const zipStream = await fetch('https://zipstream.app/api/downloads', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
suggestedFilename: `export-${userId}-${Date.now()}.zip`,
files: csvGenerationUrls,
compression: "DEFLATE" // CSV compresses well
})
});
// 7. Pipe to user
res.set('Content-Type', 'application/zip');
res.set('Content-Disposition', `attachment; filename="customer-data.zip"`);
zipStream.body.pipe(res);
});
// Helper: Create streaming CSV endpoint
async function createStreamingCSVEndpoint(name, query, params) {
const exportId = generateUniqueId();
// Store query details temporarily (Redis, 10 min TTL)
await redis.setex(`export:${exportId}`, 600, JSON.stringify({
query,
params,
name
}));
// Return URL to streaming endpoint
return `https://yourapi.com/internal/stream-csv/${exportId}`;
}
// Streaming CSV endpoint
app.get('/internal/stream-csv/:exportId', async (req, res) => {
const { exportId } = req.params;
// Retrieve query details
const exportData = JSON.parse(await redis.get(`export:${exportId}`));
if (!exportData) {
return res.status(404).send('Export expired');
}
const { query, params } = exportData;
// Stream CSV directly from database query
res.set('Content-Type', 'text/csv');
// Note: in production, reuse a single shared Pool instead of creating one per request
const pool = new Pool();
const client = await pool.connect();
try {
// Use database cursor for memory-efficient streaming
const cursor = client.query(new Cursor(query, params));
// Write CSV header (values are not CSV-escaped here; use a CSV library for fields that may contain commas or quotes)
const firstRow = await cursor.read(1);
if (firstRow.length > 0) {
const headers = Object.keys(firstRow[0]).join(',');
res.write(headers + '\n');
// Write first row
res.write(Object.values(firstRow[0]).join(',') + '\n');
}
// Stream remaining rows in batches
let rows;
while ((rows = await cursor.read(100)).length > 0) {
for (const row of rows) {
res.write(Object.values(row).join(',') + '\n');
}
}
res.end();
} finally {
client.release();
}
});
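For completeness, here is one way a Node.js consumer might call this export endpoint and stream the resulting ZIP straight to disk. The URL matches the route above; the request body values are placeholders:
const fs = require('fs');
const { pipeline } = require('stream/promises');
const fetch = require('node-fetch');
async function downloadCustomerExport() {
  const response = await fetch('https://yourapi.com/api/exports/customer-data', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      userId: 'user-123', // placeholder values
      startDate: '2024-01-01',
      endDate: '2024-06-30'
    })
  });
  if (!response.ok) {
    throw new Error(`Export failed with status ${response.status}`);
  }
  // The ZIP is written to disk as it arrives; nothing is buffered in memory
  await pipeline(response.body, fs.createWriteStream('customer-data.zip'));
}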
Key Benefits
1. Constant Memory Usage
- Stream rows directly to CSV (no buffering)
- CSV streams directly to ZIP (no temp files)
- Memory usage: <50MB regardless of export size
- Result: Can run on smaller, cheaper instances
2. Instant Start Times
- Download begins immediately
- Users see progress within seconds
- No “generating export” spinner
3. Zero Storage Overhead
- No temporary CSV files
- No temporary ZIP files
- Storage cost: $0
4. Unlimited Concurrency
- Each export uses minimal resources
- Can handle 100+ concurrent exports on a single instance
- Infrastructure cost reduction: 70-80%
Performance Comparison
Scenario: Export 1 million rows (500MB CSV) to ZIP
| Metric | Traditional | ZipStream | Improvement |
|---|---|---|---|
| Memory usage | 2GB+ (buffer) | 50MB | 97% reduction |
| Time to first byte | 45 seconds | <1 second | 98% faster |
| Total time | 60 seconds | 30 seconds | 50% faster |
| Temp storage | 500MB | 0MB | 100% reduction |
| Concurrent exports (16GB instance) | 5-8 exports | 100+ exports | 15x improvement |
Advanced Pattern: Multi-Format Exports
// Export data in multiple formats simultaneously
app.post('/api/exports/multi-format', async (req, res) => {
const { query, params } = req.body;
const files = [];
// 1. CSV format
const csvUrl = await createStreamingEndpoint('csv', query, params);
files.push({ url: csvUrl, zipPath: 'data/export.csv' });
// 2. JSON format (streaming NDJSON)
const jsonUrl = await createStreamingEndpoint('json', query, params);
files.push({ url: jsonUrl, zipPath: 'data/export.ndjson' });
// 3. Excel format (if you have a streaming XLSX library)
const xlsxUrl = await createStreamingEndpoint('xlsx', query, params);
files.push({ url: xlsxUrl, zipPath: 'data/export.xlsx' });
// 4. Metadata
const metadata = {
exportDate: new Date().toISOString(),
rowCount: await getRowCount(query, params),
query: query,
formats: ['csv', 'json', 'xlsx']
};
const metadataUrl = await uploadJSON(metadata, 'metadata.json');
files.push({ url: metadataUrl, zipPath: 'metadata.json' });
// Create multi-format export
const zipStream = await fetch('https://zipstream.app/api/downloads', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
suggestedFilename: 'export-multi-format.zip',
files: files,
compression: "DEFLATE"
})
});
zipStream.body.pipe(res);
});
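The NDJSON endpoint referenced by createStreamingEndpoint('json', ...) is not shown above. A minimal sketch, mirroring the CSV endpoint and reusing the same Redis lookup, Pool, and pg-cursor setup from the main example (the route path is an assumption):
// Streaming NDJSON endpoint: one JSON object per line, written as rows arrive
app.get('/internal/stream-json/:exportId', async (req, res) => {
  const stored = await redis.get(`export:${req.params.exportId}`);
  if (!stored) {
    return res.status(404).send('Export expired');
  }
  const { query, params } = JSON.parse(stored);
  res.set('Content-Type', 'application/x-ndjson');
  // pool, Cursor, and redis are the same instances used in the main example
  const client = await pool.connect();
  try {
    const cursor = client.query(new Cursor(query, params));
    let rows;
    while ((rows = await cursor.read(100)).length > 0) {
      for (const row of rows) {
        res.write(JSON.stringify(row) + '\n');
      }
    }
    res.end();
  } finally {
    client.release();
  }
});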
Real-World Example: BI Dashboard Exports
Case Study: Business intelligence platform with 50,000 users
Export Types:
- Sales reports (10+ CSV files)
- Customer analytics (charts + data)
- Automated scheduled exports
Implementation:
// Scheduled export (runs daily at 2am)
async function scheduledExport(userId, reportConfig) {
const files = [];
// 1. Sales data (last 30 days)
const salesUrl = await streamQuery(
`SELECT * FROM sales WHERE user_id = $1 AND date >= NOW() - INTERVAL '30 days'`,
[userId]
);
files.push({ url: salesUrl, zipPath: 'reports/sales.csv' });
// 2. Customer metrics
const customersUrl = await streamQuery(
`SELECT customer_id, SUM(revenue) as total_revenue, COUNT(*) as orders
FROM sales WHERE user_id = $1 GROUP BY customer_id`,
[userId]
);
files.push({ url: customersUrl, zipPath: 'reports/customers.csv' });
// 3. Product performance
const productsUrl = await streamQuery(
`SELECT product_id, SUM(quantity) as units_sold, SUM(revenue) as revenue
FROM sales WHERE user_id = $1 GROUP BY product_id ORDER BY revenue DESC`,
[userId]
);
files.push({ url: productsUrl, zipPath: 'reports/products.csv' });
// 4. Charts (pre-rendered images)
const chartUrls = await generateCharts(userId, reportConfig);
chartUrls.forEach(chart => {
files.push({ url: chart.url, zipPath: `charts/${chart.name}.png` });
});
// 5. Executive summary (PDF)
const summaryPdfUrl = await generatePDF(userId, reportConfig);
files.push({ url: summaryPdfUrl, zipPath: 'summary.pdf' });
// Create ZIP and email to user
const zipResponse = await fetch('https://zipstream.app/api/downloads', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
suggestedFilename: `daily-report-${new Date().toISOString().split('T')[0]}.zip`,
files: files,
compression: "DEFLATE"
})
});
// Upload to S3 for user to download from email link
const s3Url = await uploadStreamToS3(
zipResponse.body,
`exports/${userId}/daily-report-${Date.now()}.zip`
);
// Email user
await sendEmail(userId, {
subject: 'Your Daily Business Report',
body: `Your report is ready. Download: ${s3Url}\n\nExpires in 7 days.`
});
}
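The uploadStreamToS3 helper used above is left undefined. One possible sketch using the AWS SDK v3 streaming upload helper (the bucket name is a placeholder; the pre-signed URL matches the 7-day expiry mentioned in the email):
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const { Upload } = require('@aws-sdk/lib-storage');
const { getSignedUrl } = require('@aws-sdk/s3-request-presigner');
const s3 = new S3Client({});
const BUCKET = 'your-export-bucket'; // placeholder bucket name
async function uploadStreamToS3(bodyStream, key) {
  // Multipart upload accepts a stream, so the ZIP never has to fit in memory
  const upload = new Upload({
    client: s3,
    params: { Bucket: BUCKET, Key: key, Body: bodyStream, ContentType: 'application/zip' }
  });
  await upload.done();
  // Pre-signed download link, valid for 7 days (the SigV4 maximum)
  return getSignedUrl(s3, new GetObjectCommand({ Bucket: BUCKET, Key: key }), {
    expiresIn: 7 * 24 * 60 * 60
  });
}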
Results:
- Before: 45 minutes to generate all daily reports (sequential)
- After: 8 minutes (parallel streaming)
- Cost savings: $300/month in compute (smaller instances)
- User satisfaction: 95% (up from 65%)
Integration with Data Warehouses
BigQuery Example
const { BigQuery } = require('@google-cloud/bigquery');
async function exportBigQueryData(query) {
const bigquery = new BigQuery();
const exportId = generateUniqueId();
// Store query
await redis.setex(`bq:${exportId}`, 600, query);
// Return streaming endpoint
return `https://yourapi.com/internal/bigquery-stream/${exportId}`;
}
app.get('/internal/bigquery-stream/:exportId', async (req, res) => {
const query = await redis.get(`bq:${req.params.exportId}`);
const bigquery = new BigQuery();
const [job] = await bigquery.createQueryJob({ query });
const stream = job.getQueryResultsStream();
res.set('Content-Type', 'text/csv');
let headerWritten = false;
stream.on('data', (row) => {
if (!headerWritten) {
res.write(Object.keys(row).join(',') + '\n');
headerWritten = true;
}
res.write(Object.values(row).join(',') + '\n');
});
stream.on('end', () => res.end());
stream.on('error', (err) => res.status(500).send(err.message));
});
Snowflake Example
const { Transform } = require('stream');
const snowflake = require('snowflake-sdk');
async function exportSnowflakeData(query) {
  const connection = snowflake.createConnection({...});
  await connection.connectAsync();
  const stream = connection.execute({
    sqlText: query,
    streamResult: true
  }).streamRows();
  // Convert row objects to CSV lines (objectMode so the transform accepts objects)
  const csvStream = new Transform({
    objectMode: true,
    transform(row, encoding, callback) {
      if (!this.headerWritten) {
        this.push(Object.keys(row).join(',') + '\n');
        this.headerWritten = true;
      }
      this.push(Object.values(row).join(',') + '\n');
      callback();
    }
  });
  stream.pipe(csvStream);
  return csvStream;
}
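Because ZipStream fetches each file by URL, the returned csvStream still has to be exposed over HTTP. A minimal sketch of that wiring, reusing the Redis lookup pattern from the BigQuery example (the sf: key prefix and route path are assumptions):
// Expose the Snowflake CSV stream at a URL ZipStream can fetch
app.get('/internal/snowflake-stream/:exportId', async (req, res) => {
  const query = await redis.get(`sf:${req.params.exportId}`);
  if (!query) {
    return res.status(404).send('Export expired');
  }
  res.set('Content-Type', 'text/csv');
  const csvStream = await exportSnowflakeData(query);
  csvStream.on('error', (err) => res.destroy(err));
  csvStream.pipe(res);
});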
Best Practices
1. Validate Export Size Before Starting
// Check row count before export
app.post('/api/exports/validate', async (req, res) => {
const { query, params } = req.body;
const countQuery = `SELECT COUNT(*) as total FROM (${query}) as subquery`;
const result = await db.query(countQuery, params);
const rowCount = result.rows[0].total;
const estimatedSize = rowCount * 500; // Assume 500 bytes per row
if (estimatedSize > 5 * 1024 * 1024 * 1024) { // 5GB limit
return res.status(400).json({
error: 'Export too large',
rowCount: rowCount,
estimatedSize: estimatedSize,
suggestion: 'Please narrow your date range or filter criteria'
});
}
res.json({
rowCount: rowCount,
estimatedSize: estimatedSize,
estimatedTime: Math.ceil(rowCount / 10000) + ' seconds'
});
});
2. Add Progress Tracking (WebSocket)
// Notify user of export progress via WebSocket
async function exportWithProgress(userId, query) {
const ws = getWebSocket(userId);
let rowsProcessed = 0;
const totalRows = await getRowCount(query);
// Assumes a driver that exposes query results as a row stream (e.g. pg-query-stream or knex's .stream())
const stream = db.query(query).stream();
stream.on('data', (row) => {
rowsProcessed++;
if (rowsProcessed % 1000 === 0) {
ws.send(JSON.stringify({
type: 'export-progress',
processed: rowsProcessed,
total: totalRows,
percentage: Math.floor((rowsProcessed / totalRows) * 100)
}));
}
});
stream.on('end', () => {
ws.send(JSON.stringify({ type: 'export-complete' }));
});
}
3. Error Handling and Retry
async function robustExport(query, params, retries = 3) {
for (let i = 0; i < retries; i++) {
try {
const csvUrl = await createStreamingCSVEndpoint('data', query, params);
// Validate URL is accessible
const validation = await fetch('https://zipstream.app/api/validations', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
files: [{ url: csvUrl, zipPath: 'data.csv' }]
})
});
const { valid } = await validation.json();
if (!valid) {
throw new Error('CSV stream not accessible');
}
// Proceed with export
return await createZipExport([{ url: csvUrl, zipPath: 'data.csv' }]);
} catch (error) {
console.error(`Export attempt ${i + 1} failed:`, error);
if (i === retries - 1) throw error;
await sleep(1000 * 2 ** i); // Exponential backoff
}
}
}
4. Rate Limit Handling
// Check ZipStream quota before export
async function checkExportQuota() {
const quota = await fetch('https://zipstream.app/api/rate-limits');
const { remaining } = await quota.json();
if (remaining < 1) {
throw new Error('Export quota exceeded. Please try again in an hour.');
}
return remaining;
}
app.post('/api/exports', async (req, res) => {
try {
await checkExportQuota();
// Proceed with export
} catch (error) {
res.status(429).json({
error: error.message,
retryAfter: 3600 // seconds
});
}
});
Cost Analysis
Monthly Export Volume: 10,000 exports
| Component | Traditional | ZipStream | Savings |
|---|---|---|---|
| EC2 instances (compute) | $400 | $150 | $250 |
| EBS storage (temp files) | $100 | $0 | $100 |
| Wasted compute (failed exports) | $75 | $10 | $65 |
| Support overhead | $200 | $50 | $150 |
| Total | $775 | $210 | $565/month |
Overall: 73% cost reduction
Limitations and Workarounds
50 File Limit
- Workaround: Combine multiple CSVs into one if they share schema
- Alternative: Create multiple ZIPs for large exports (see the sketch below)
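A rough sketch of that multiple-ZIP alternative, assuming the same createZipExport helper referenced in the retry example above:
// Split a large file list into ZIPs of at most 50 entries each
async function exportInChunks(allFiles, chunkSize = 50) {
  const archives = [];
  for (let i = 0; i < allFiles.length; i += chunkSize) {
    const chunk = allFiles.slice(i, i + chunkSize);
    archives.push(await createZipExport(chunk));
  }
  return archives;
}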
5GB Archive Limit
- Workaround: Paginate exports by date range
// Split by month if > 5GB
if (estimatedSize > 5 * 1024 * 1024 * 1024) { // 5GB
const months = getMonthsBetween(startDate, endDate);
for (const month of months) {
await createMonthlyExport(month);
}
}
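The getMonthsBetween and createMonthlyExport helpers above are hypothetical. One way they might look, reusing the streaming CSV endpoint from the main example (table and column names are illustrative):
// Hypothetical helpers for the monthly split above
function getMonthsBetween(startDate, endDate) {
  const months = [];
  const cursor = new Date(startDate.getFullYear(), startDate.getMonth(), 1);
  while (cursor <= endDate) {
    months.push({
      start: new Date(cursor),
      end: new Date(cursor.getFullYear(), cursor.getMonth() + 1, 0)
    });
    cursor.setMonth(cursor.getMonth() + 1);
  }
  return months;
}
async function createMonthlyExport(month) {
  const csvUrl = await createStreamingCSVEndpoint(
    'monthly',
    `SELECT * FROM orders WHERE order_date BETWEEN $1 AND $2`,
    [month.start.toISOString(), month.end.toISOString()]
  );
  return createZipExport([
    { url: csvUrl, zipPath: `orders-${month.start.toISOString().slice(0, 7)}.csv` }
  ]);
}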
Rate Limits (10 req/hour)
- Solution: Request higher limits for production use
- Workaround: Distribute exports across multiple IPs (different availability zones)
Conclusion
ZipStream revolutionizes data export workflows for analytics platforms:
- 97% memory reduction: Stream instead of buffer
- 70-80% infrastructure cost reduction: Smaller instances, zero storage
- 50% faster exports: Parallel streaming beats sequential generation
- 15x concurrency improvement: Handle more users on same infrastructure
- Near-zero failure rate: Streaming is more reliable than batch generation
Whether you’re building a BI platform, data warehouse interface, or SaaS analytics tool, ZipStream provides production-grade data export infrastructure without the traditional bottlenecks.
Ready to eliminate export bottlenecks? Try ZipStream and start building scalable file delivery infrastructure.