Use Case: Distributed Backup Aggregation Without Central Storage
The Challenge: Backing Up Distributed Systems
Modern applications are distributed across multiple services, regions, and providers. When disaster strikes, you need complete backups - but aggregating them is painful:
Typical Architecture:
- User uploads: S3 (us-east-1)
- Database backups: S3 (eu-west-1)
- Application logs: Google Cloud Storage
- Configuration files: Azure Blob Storage
- Code artifacts: GitHub/GitLab
Traditional Backup Workflow:
- Download database backup from S3 EU (3GB) → Local server
- Download user files from S3 US (10GB) → Local server
- Download logs from GCS (2GB) → Local server
- Create local ZIP archive (15GB) → Local disk
- Upload ZIP to backup storage (15GB) → S3 Glacier
- Delete local files
Problems:
- Requires 30GB local storage (download + ZIP)
- Takes hours: Multiple download/upload cycles
- High bandwidth costs: Pay egress from 3+ providers, then ingress to backup
- Single point of failure: If server crashes mid-process, start over
- Complex orchestration: Need scripts to manage multi-step process
The ZipStream Solution: Direct Aggregation
ZipStream enables zero-storage backup aggregation by streaming files from multiple sources directly into a single archive.
Architecture
┌─────────────┐
│ S3 US-East  │─────┐
└─────────────┘     │
                    │
┌─────────────┐     │      ┌─────────────┐      ┌──────────────┐
│ S3 EU-West  │─────┼─────→│  ZipStream  │─────→│ Final Backup │
└─────────────┘     │      └─────────────┘      │ (S3 Glacier) │
                    │                           └──────────────┘
┌─────────────┐     │
│     GCS     │─────┘
└─────────────┘
No intermediate storage required!
Implementation Example
// backup-orchestrator.js
const fetch = require('node-fetch');
const { S3 } = require('aws-sdk');
const { Storage } = require('@google-cloud/storage');
async function createDistributedBackup(backupDate) {
const s3 = new S3();
const gcs = new Storage();
// Generate signed URLs from all sources
const files = [];
// 1. Database backups from S3 EU
const dbBackupUrl = await s3.getSignedUrlPromise('getObject', {
Bucket: 'backups-eu',
Key: `db-backup-${backupDate}.sql.gz`,
Expires: 3600
});
files.push({
url: dbBackupUrl,
zipPath: 'database/backup.sql.gz'
});
// 2. User uploads from S3 US (multiple files)
const userFiles = await s3.listObjectsV2({
Bucket: 'user-uploads-us',
Prefix: `backups/${backupDate}/`
}).promise();
for (const file of userFiles.Contents) {
const url = await s3.getSignedUrlPromise('getObject', {
Bucket: 'user-uploads-us',
Key: file.Key,
Expires: 3600
});
files.push({
url: url,
zipPath: `user-data/${file.Key.replace('backups/', '')}`
});
}
// 3. Application logs from GCS
const [gcsFiles] = await gcs.bucket('app-logs').getFiles({
prefix: `logs/${backupDate}/`
});
for (const file of gcsFiles) {
const [url] = await file.getSignedUrl({
action: 'read',
expires: Date.now() + 3600000
});
files.push({
url: url,
zipPath: `logs/${file.name.replace('logs/', '')}`
});
}
// 4. Configuration files from your API
files.push({
url: `https://api.yourcompany.com/backups/config-${backupDate}.json`,
zipPath: 'config/app-config.json'
});
// 5. Create backup manifest
const manifest = {
backupDate: backupDate,
timestamp: new Date().toISOString(),
fileCount: files.length,
sources: ['S3-US', 'S3-EU', 'GCS', 'API']
};
// Host manifest temporarily (or use data URL)
const manifestUrl = await uploadManifestTemporarily(manifest);
files.push({
url: manifestUrl,
zipPath: 'BACKUP_MANIFEST.json'
});
// 6. Create ZipStream descriptor
const zipDescriptor = {
suggestedFilename: `backup-${backupDate}.zip`,
files: files,
compression: "STORE" // Backups are already compressed
};
// 7. Stream directly to S3 Glacier (or download locally)
const zipStream = await fetch('https://zipstream.app/api/downloads', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(zipDescriptor)
});
// Upload stream directly to Glacier
await s3.upload({
Bucket: 'long-term-backups',
Key: `archives/backup-${backupDate}.zip`,
Body: zipStream.body,
StorageClass: 'GLACIER'
}).promise();
console.log(`✓ Backup completed: backup-${backupDate}.zip`);
console.log(`✓ Files aggregated: ${files.length}`);
console.log(`✓ Zero intermediate storage used`);
}
// Run daily backups (surface failures so the process exits non-zero)
createDistributedBackup(new Date().toISOString().split('T')[0])
.catch(err => { console.error('Backup failed:', err); process.exit(1); });
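The `uploadManifestTemporarily` helper above is yours to supply. A minimal sketch, assuming you stage the manifest in an S3 bucket you control (the `backup-staging` bucket name and 10-minute expiry are illustrative):

// Hypothetical helper: stage the manifest and return a short-lived signed URL
const { S3 } = require('aws-sdk');

async function uploadManifestTemporarily(manifest) {
  const s3 = new S3();
  const key = `manifests/manifest-${Date.now()}.json`;
  // Stage the manifest where ZipStream can fetch it by URL
  await s3.putObject({
    Bucket: 'backup-staging',            // illustrative staging bucket
    Key: key,
    Body: JSON.stringify(manifest, null, 2),
    ContentType: 'application/json'
  }).promise();
  // Hand back a signed URL that outlives the archive creation
  return s3.getSignedUrlPromise('getObject', {
    Bucket: 'backup-staging',
    Key: key,
    Expires: 600
  });
}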
Key Benefits
1. Zero Intermediate Storage
- No need for large EC2 instances with attached storage
- No local disk to manage or fill up
- Cost savings: Eliminate backup server storage costs
2. Bandwidth Optimization
Traditional approach:
- Download from S3 EU: 3GB egress = $0.27
- Download from S3 US: 10GB egress = $0.90
- Download from GCS: 2GB egress = $0.24
- Upload to S3 Glacier: 15GB ingress = $0 (ingress is free)
- Total: $1.41 per backup
ZipStream approach:
- Stream from all sources directly to Glacier
- Only pay egress from source providers
- Total: $1.41 per backup (same egress costs)
- But: No download/upload cycles, no local storage, faster completion
3. Faster Backups
Traditional (sequential):
- Download 15GB @ 100Mbps: ~20 minutes
- Create ZIP: ~5 minutes
- Upload 15GB @ 100Mbps: ~20 minutes
- Total: ~45 minutes
ZipStream (parallel streaming):
- Stream and upload concurrently: ~20 minutes
- Total: ~20 minutes (55% faster)
4. Resilient to Failures
- No partial downloads to clean up
- If the stream fails, just retry the whole request (see the retry sketch after this list)
- No risk of filling disk and crashing
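Because nothing touches local disk, a failed run leaves no partial state behind; retrying is just calling the function again. A minimal retry wrapper around `createDistributedBackup` from the example above (the attempt count and backoff are illustrative):

// Retry the whole backup with a simple linear backoff (values are illustrative)
async function runBackupWithRetry(backupDate, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await createDistributedBackup(backupDate);
      return;
    } catch (err) {
      console.error(`Backup attempt ${attempt} failed:`, err.message);
      if (attempt === maxAttempts) throw err;
      // Nothing to clean up between attempts: just wait and go again
      await new Promise(resolve => setTimeout(resolve, attempt * 60000));
    }
  }
}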
Advanced Pattern: Multi-Region Disaster Recovery
// Backup strategy: Keep copies in 3 regions
async function distributedDRBackup() {
const zipDescriptor = await createBackupDescriptor();
// Create temporary ZipStream link
const linkResponse = await fetch('https://zipstream.app/api/download-links', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(zipDescriptor)
});
const { downloadUrl } = await linkResponse.json();
const fullUrl = `https://zipstream.app${downloadUrl}`;
// Upload to 3 regions in parallel
await Promise.all([
uploadToS3(fullUrl, 'us-east-1'),
uploadToS3(fullUrl, 'eu-west-1'),
uploadToS3(fullUrl, 'ap-south-1')
]);
console.log('✓ Backup replicated to 3 regions');
}
async function uploadToS3(streamUrl, region) {
const s3 = new S3({ region });
const response = await fetch(streamUrl);
await s3.upload({
Bucket: `backups-${region}`,
Key: `backup-${Date.now()}.zip`,
Body: response.body,
StorageClass: 'GLACIER'
}).promise();
}
Note: ZipStream temporary links expire 60 seconds after creation, so all three regional uploads must start fetching within that window. If that timing is too tight, use POST /api/downloads once and pipe the single response stream to every destination, as sketched below.
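A minimal sketch of that single-request approach, assuming the per-region bucket naming from the example above:

// Tee one POST /api/downloads response into several regional uploads
const fetch = require('node-fetch');
const { PassThrough } = require('stream');
const { S3 } = require('aws-sdk');

async function replicateBackup(zipDescriptor, regions) {
  const response = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(zipDescriptor)
  });
  // One PassThrough per destination; pipe() handles backpressure,
  // so the archive is only read from ZipStream once
  const tees = regions.map(() => new PassThrough());
  tees.forEach(tee => response.body.pipe(tee));
  await Promise.all(regions.map((region, i) => {
    const s3 = new S3({ region });
    return s3.upload({
      Bucket: `backups-${region}`,        // assumes one backup bucket per region
      Key: `backup-${Date.now()}.zip`,
      Body: tees[i],
      StorageClass: 'GLACIER'
    }).promise();
  }));
}
// Usage: await replicateBackup(zipDescriptor, ['us-east-1', 'eu-west-1', 'ap-south-1']);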
Kubernetes Backup Example
Backup all configs, secrets, and persistent volume snapshots:
// k8s-backup.js
const k8s = require('@kubernetes/client-node');
const fetch = require('node-fetch');
async function backupKubernetesCluster() {
const kc = new k8s.KubeConfig();
kc.loadFromDefault();
const k8sApi = kc.makeApiClient(k8s.CoreV1Api);   // ConfigMaps, Secrets
const appsApi = kc.makeApiClient(k8s.AppsV1Api);  // Deployments
const files = [];
// 1. Export all ConfigMaps
const configMaps = await k8sApi.listConfigMapForAllNamespaces();
const configMapJson = JSON.stringify(configMaps.body, null, 2);
const configMapUrl = await uploadToTempStorage(configMapJson, 'configmaps.json');
files.push({ url: configMapUrl, zipPath: 'k8s/configmaps.json' });
// 2. Export all Secrets (encrypted)
const secrets = await k8sApi.listSecretForAllNamespaces();
const encryptedSecrets = await encrypt(JSON.stringify(secrets.body));
const secretsUrl = await uploadToTempStorage(encryptedSecrets, 'secrets.enc');
files.push({ url: secretsUrl, zipPath: 'k8s/secrets.enc' });
// 3. Export deployment manifests
const deployments = await appsApi.listDeploymentForAllNamespaces();
const deploymentsUrl = await uploadToTempStorage(
JSON.stringify(deployments.body, null, 2),
'deployments.json'
);
files.push({ url: deploymentsUrl, zipPath: 'k8s/deployments.json' });
// 4. Add PersistentVolume snapshots (already in S3)
const pvSnapshots = await getPVSnapshotURLs(); // Returns S3 signed URLs
pvSnapshots.forEach(snapshot => {
files.push({
url: snapshot.url,
zipPath: `volumes/${snapshot.name}.tar.gz`
});
});
// Create unified backup
const zipStream = await fetch('https://zipstream.app/api/downloads', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
suggestedFilename: `k8s-backup-${Date.now()}.zip`,
files: files,
compression: "DEFLATE" // Config files benefit from compression
})
});
// Upload to backup storage
await uploadBackup(zipStream.body);
}
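The `uploadToTempStorage` and `uploadBackup` helpers are yours to supply, much like the manifest helper sketched earlier. Minimal sketches, assuming an S3 staging bucket and the same Glacier destination used in the first example (bucket names are illustrative):

// Hypothetical helpers for the Kubernetes example (bucket names are illustrative)
const { S3 } = require('aws-sdk');
const s3 = new S3();

// Stage a small blob and return a short-lived signed URL ZipStream can fetch
async function uploadToTempStorage(content, filename) {
  const key = `staging/${Date.now()}-${filename}`;
  await s3.putObject({ Bucket: 'backup-staging', Key: key, Body: content }).promise();
  return s3.getSignedUrlPromise('getObject', { Bucket: 'backup-staging', Key: key, Expires: 600 });
}

// Pipe the finished ZIP stream straight to long-term storage
async function uploadBackup(zipBody) {
  await s3.upload({
    Bucket: 'long-term-backups',
    Key: `archives/k8s-backup-${Date.now()}.zip`,
    Body: zipBody,
    StorageClass: 'GLACIER'
  }).promise();
}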
Docker Registry Backup
Aggregate images from multiple registries:
// Pull image layers from multiple registries and create backup archive
async function backupDockerImages() {
const images = [
'myregistry.io/app:v1.0.0',
'gcr.io/project/service:latest',
'docker.io/library/postgres:14'
];
const files = [];
for (const image of images) {
// Export image to tar (stored temporarily in S3)
const tarUrl = await exportImageToTar(image);
files.push({
url: tarUrl,
zipPath: `images/${image.replace(/[/:]/g, '_')}.tar`
});
}
// Create backup manifest
const manifest = {
images: images,
timestamp: new Date().toISOString(),
registry: 'multi-registry-backup'
};
const manifestUrl = await uploadToTempStorage(JSON.stringify(manifest), 'manifest.json');
files.push({ url: manifestUrl, zipPath: 'MANIFEST.json' });
// Stream to backup
const zipStream = await fetch('https://zipstream.app/api/downloads', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
suggestedFilename: `docker-backup-${Date.now()}.zip`,
files: files,
compression: "STORE" // Tar files are already compressed
})
});
return zipStream.body;
}
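`exportImageToTar` is another helper you would write yourself. A minimal sketch, assuming the Docker CLI is available on the host and reusing the hypothetical `backup-staging` bucket from earlier (the image tar does touch local disk briefly before it is staged):

// Hypothetical: export an image with `docker save`, stage it, return a signed URL
const { execFile } = require('child_process');
const { promisify } = require('util');
const fs = require('fs');
const os = require('os');
const path = require('path');
const { S3 } = require('aws-sdk');
const execFileAsync = promisify(execFile);

async function exportImageToTar(image) {
  const safeName = image.replace(/[/:]/g, '_');
  const tarPath = path.join(os.tmpdir(), `${safeName}.tar`);
  // `docker save` writes the image (layers + manifest) as a single tar archive
  await execFileAsync('docker', ['save', image, '-o', tarPath]);
  // Stage the tar in S3 so ZipStream can fetch it by URL
  const s3 = new S3();
  const key = `staging/images/${safeName}.tar`;
  await s3.upload({ Bucket: 'backup-staging', Key: key, Body: fs.createReadStream(tarPath) }).promise();
  fs.unlinkSync(tarPath); // remove the local copy once staged
  return s3.getSignedUrlPromise('getObject', { Bucket: 'backup-staging', Key: key, Expires: 3600 });
}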
Best Practices
1. Pre-flight Validation
// Validate all sources are accessible before starting backup
const validation = await fetch('https://zipstream.app/api/validations', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ files: backupFiles })
});
const result = await validation.json();
if (!result.valid) {
console.error('Backup failed: Some sources inaccessible');
console.error(result.results.filter(r => !r.accessible));
throw new Error('Pre-flight validation failed');
}
2. Backup Integrity Checks
// Include checksums in manifest (the map callback must be async to use await)
const manifest = {
files: await Promise.all(files.map(async f => ({
path: f.zipPath,
sourceUrl: f.url,
checksum: await calculateChecksum(f.url) // SHA-256 of the source content
}))),
timestamp: new Date().toISOString()
};
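`calculateChecksum` is not a ZipStream API; a minimal sketch that streams each source URL through Node's crypto module (note this costs one extra read of every source):

// Hypothetical helper: stream a URL through SHA-256 without buffering it in memory
const crypto = require('crypto');
const fetch = require('node-fetch');

async function calculateChecksum(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Cannot read ${url}: ${response.status}`);
  const hash = crypto.createHash('sha256');
  for await (const chunk of response.body) {
    hash.update(chunk);
  }
  return hash.digest('hex');
}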
3. Compression Strategy
// ZipStream currently applies one compression setting to the whole archive,
// so pick the setting based on what the archive mostly contains
const alreadyCompressed = /\.(gz|zip|jpg|png|mp4)$/;
const mostlyPreCompressed = files.every(f => alreadyCompressed.test(f.zipPath));
const descriptor = {
files: files,
compression: mostlyPreCompressed ? 'STORE' : 'DEFLATE'
};
// Tip: pre-compress large text files (e.g. gzip your logs) so the whole
// archive can safely use STORE and avoid re-compressing binary data
4. Monitoring and Alerting
const startTime = Date.now();
try {
await createDistributedBackup(backupDate);
const duration = Date.now() - startTime;
// Send success metric to monitoring
await sendMetric('backup.success', 1, {
duration: duration,
fileCount: files.length
});
} catch (error) {
await sendAlert('backup.failed', {
error: error.message,
backupDate: backupDate
});
throw error;
}
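`sendMetric` and `sendAlert` stand in for whatever monitoring stack you already run. A minimal sketch that posts JSON to a webhook (the endpoint variable is a placeholder):

// Hypothetical monitoring helpers; point the webhook at your own system
const fetch = require('node-fetch');
const MONITORING_WEBHOOK = process.env.MONITORING_WEBHOOK_URL;

async function sendMetric(name, value, tags = {}) {
  await fetch(MONITORING_WEBHOOK, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'metric', name, value, tags, at: new Date().toISOString() })
  });
}

async function sendAlert(name, details = {}) {
  await fetch(MONITORING_WEBHOOK, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ type: 'alert', name, details, at: new Date().toISOString() })
  });
}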
Performance Comparison
Scenario: Daily backup of 20GB across 4 cloud providers
| Metric | Traditional | ZipStream | Improvement |
|---|---|---|---|
| Local storage required | 40GB | 0GB | 100% reduction |
| Backup duration | 60 minutes | 25 minutes | 58% faster |
| Infrastructure cost | $50/month (EC2) | $0 | 100% reduction |
| Bandwidth cost | ~$2/day | ~$2/day | Same |
| Failure recovery time | 60 minutes (restart) | 0 minutes (retry) | Instant |
Limitations and Workarounds
Rate Limits
- Default: 10 requests/hour per IP
- Workaround: Spread backups throughout the day (see the scheduling sketch below), or contact ZipStream for higher limits
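For example, a minimal scheduling sketch with node-cron that staggers per-service backups (the times and the runBackup entry point are illustrative):

// Stagger backup jobs across the day to stay under the rate limit
const cron = require('node-cron');

// runBackup(service) is your own per-service backup entry point
cron.schedule('0 1 * * *', () => runBackup('database'));     // 01:00
cron.schedule('0 3 * * *', () => runBackup('user-uploads')); // 03:00
cron.schedule('0 5 * * *', () => runBackup('logs'));         // 05:00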
5GB Archive Limit
- Maximum ZIP size: 5GB
- Workaround: Split large backups into multiple archives by date or service
// Split by service
await createBackup('database', dbFiles);
await createBackup('user-uploads', uploadFiles);
await createBackup('logs', logFiles);
50 File Limit
- Maximum: 50 files per archive
- Workaround: For more files, create multiple archives or pre-bundle smaller files
// Bundle small config files into a tar first
const configTar = await createTarball(configFiles);
files.push({ url: configTar, zipPath: 'configs.tar' });
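`createTarball` would be your own helper as well. A minimal sketch using the tar package, assuming configFiles is a list of local paths and the same hypothetical staging bucket:

// Hypothetical: bundle local config files into one tar, stage it, return a signed URL
const tar = require('tar');
const fs = require('fs');
const os = require('os');
const path = require('path');
const { S3 } = require('aws-sdk');

async function createTarball(configFiles) {
  const tarPath = path.join(os.tmpdir(), `configs-${Date.now()}.tar`);
  await tar.c({ file: tarPath }, configFiles); // configFiles: array of local file paths
  const s3 = new S3();
  const key = `staging/${path.basename(tarPath)}`;
  await s3.upload({ Bucket: 'backup-staging', Key: key, Body: fs.createReadStream(tarPath) }).promise();
  fs.unlinkSync(tarPath);
  return s3.getSignedUrlPromise('getObject', { Bucket: 'backup-staging', Key: key, Expires: 3600 });
}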
Conclusion
ZipStream transforms distributed backup strategies by eliminating intermediate storage and orchestration complexity:
- Zero storage overhead: No backup server required
- Faster completion: Parallel streaming reduces backup windows
- Lower costs: Eliminate backup server infrastructure
- Simpler orchestration: One API call replaces complex multi-step scripts
- Cloud-native: Works seamlessly with S3, GCS, Azure, and any HTTP source
Whether you’re backing up Kubernetes clusters, Docker registries, or multi-cloud applications, ZipStream provides a production-grade solution for aggregating distributed data without central storage.
Ready to simplify your backups? Get started with ZipStream