Distributed Backup Aggregation Without Central Storage

The Challenge: Backing Up Distributed Systems

Modern applications are distributed across multiple services, regions, and providers. When disaster strikes, you need complete backups, but aggregating them is painful:

Typical Architecture:

  • User uploads: S3 (us-east-1)
  • Database backups: S3 (eu-west-1)
  • Application logs: Google Cloud Storage
  • Configuration files: Azure Blob Storage
  • Code artifacts: GitHub/GitLab

Traditional Backup Workflow:

  1. Download database backup from S3 EU (3GB) → Local server
  2. Download user files from S3 US (10GB) → Local server
  3. Download logs from GCS (2GB) → Local server
  4. Create local ZIP archive (15GB) → Local disk
  5. Upload ZIP to backup storage (15GB) → S3 Glacier
  6. Delete local files

Problems:

  • Requires 30GB local storage (download + ZIP)
  • Takes hours: Multiple download/upload cycles
  • High bandwidth costs: Pay egress from 3+ providers, then ingress to backup
  • Single point of failure: If server crashes mid-process, start over
  • Complex orchestration: Need scripts to manage multi-step process

The ZipStream Solution: Direct Aggregation

ZipStream enables zero-storage backup aggregation by streaming files from multiple sources directly into a single archive.

Architecture

┌─────────────┐
│  S3 US-East │─────┐
└─────────────┘     │
                    │
┌─────────────┐     │     ┌─────────────┐     ┌──────────────┐
│  S3 EU-West │─────┼────→│  ZipStream  │────→│ Final Backup │
└─────────────┘     │     └─────────────┘     │ (S3 Glacier) │
                    │                         └──────────────┘
┌─────────────┐     │
│     GCS     │─────┘
└─────────────┘

No intermediate storage required!

Implementation Example

// backup-orchestrator.js
const fetch = require('node-fetch');
const { S3 } = require('aws-sdk');
const { Storage } = require('@google-cloud/storage');

async function createDistributedBackup(backupDate) {
  const s3 = new S3();
  const gcs = new Storage();

  // Generate signed URLs from all sources
  const files = [];

  // 1. Database backups from S3 EU
  // (getSignedUrlPromise is the awaitable form of getSignedUrl in aws-sdk v2)
  const dbBackupUrl = await s3.getSignedUrlPromise('getObject', {
    Bucket: 'backups-eu',
    Key: `db-backup-${backupDate}.sql.gz`,
    Expires: 3600
  });
  files.push({
    url: dbBackupUrl,
    zipPath: 'database/backup.sql.gz'
  });

  // 2. User uploads from S3 US (multiple files)
  const userFiles = await s3.listObjectsV2({
    Bucket: 'user-uploads-us',
    Prefix: `backups/${backupDate}/`
  }).promise();

  for (const file of userFiles.Contents) {
    const url = await s3.getSignedUrlPromise('getObject', {
      Bucket: 'user-uploads-us',
      Key: file.Key,
      Expires: 3600
    });
    files.push({
      url: url,
      zipPath: `user-data/${file.Key.replace('backups/', '')}`
    });
  }

  // 3. Application logs from GCS
  const [gcsFiles] = await gcs.bucket('app-logs').getFiles({
    prefix: `logs/${backupDate}/`
  });

  for (const file of gcsFiles) {
    const [url] = await file.getSignedUrl({
      action: 'read',
      expires: Date.now() + 3600000
    });
    files.push({
      url: url,
      zipPath: `logs/${file.name.replace('logs/', '')}`
    });
  }

  // 4. Configuration files from your API
  files.push({
    url: `https://api.yourcompany.com/backups/config-${backupDate}.json`,
    zipPath: 'config/app-config.json'
  });

  // 5. Create backup manifest
  const manifest = {
    backupDate: backupDate,
    timestamp: new Date().toISOString(),
    fileCount: files.length,
    sources: ['S3-US', 'S3-EU', 'GCS', 'API']
  };

  // Host manifest temporarily (or use data URL)
  const manifestUrl = await uploadManifestTemporarily(manifest);
  files.push({
    url: manifestUrl,
    zipPath: 'BACKUP_MANIFEST.json'
  });

  // 6. Create ZipStream descriptor
  const zipDescriptor = {
    suggestedFilename: `backup-${backupDate}.zip`,
    files: files,
    compression: "STORE" // Backups are already compressed
  };

  // 7. Stream directly to S3 Glacier (or download locally)
  const zipStream = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(zipDescriptor)
  });

  // Upload stream directly to Glacier
  await s3.upload({
    Bucket: 'long-term-backups',
    Key: `archives/backup-${backupDate}.zip`,
    Body: zipStream.body,
    StorageClass: 'GLACIER'
  }).promise();

  console.log(`✓ Backup completed: backup-${backupDate}.zip`);
  console.log(`✓ Files aggregated: ${files.length}`);
  console.log(`✓ Zero intermediate storage used`);

  return { fileCount: files.length }; // used by the monitoring example below
}

// Run once per day (e.g. from a cron job), keyed by today's date
createDistributedBackup(new Date().toISOString().split('T')[0]);

Key Benefits

1. Zero Intermediate Storage

  • No need for large EC2 instances with attached storage
  • No local disk to manage or fill up
  • Cost savings: Eliminate backup server storage costs

2. Bandwidth Optimization

Traditional approach:

  • Download from S3 EU: 3GB egress = $0.27
  • Download from S3 US: 10GB egress = $0.90
  • Download from GCS: 2GB egress = $0.24
  • Upload to S3 Glacier: 15GB ingress = $0 (ingress is free)
  • Total: $1.41 per backup

ZipStream approach:

  • Stream from all sources directly to Glacier
  • Only pay egress from source providers
  • Total: $1.41 per backup (same egress costs)
  • But: No download/upload cycles, no local storage, faster completion
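
The arithmetic behind both totals is simple per-GB egress math. A quick sketch, using the rates assumed above (roughly $0.09/GB for S3 and $0.12/GB for GCS; check your providers' current pricing):

// Egress cost per backup at the assumed per-GB rates
const S3_RATE = 0.09;  // USD/GB (assumed)
const GCS_RATE = 0.12; // USD/GB (assumed)

const costPerBackup =
  3 * S3_RATE +   // DB backup from S3 EU
  10 * S3_RATE +  // user files from S3 US
  2 * GCS_RATE;   // logs from GCS

console.log(`Egress per backup: $${costPerBackup.toFixed(2)}`); // $1.41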

3. Faster Backups

Traditional (sequential):

  • Download 15GB @ 100Mbps: ~20 minutes
  • Create ZIP: ~5 minutes
  • Upload 15GB @ 100Mbps: ~20 minutes
  • Total: ~45 minutes

ZipStream (parallel streaming):

  • Stream and upload concurrently: ~20 minutes
  • Total: ~20 minutes (55% faster)
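
Both estimates come from the same bandwidth formula: seconds = GB × 8,000 / Mbps. A small helper makes the comparison explicit:

// Transfer time in minutes for a payload of `gb` gigabytes at `mbps`
const transferMinutes = (gb, mbps) => (gb * 8000) / mbps / 60;

console.log(transferMinutes(15, 100).toFixed(0)); // ~20 min per 15GB leg
// Traditional: download (~20) + zip (~5) + upload (~20) ≈ 45 minutes
// ZipStream:   one streamed pass                        ≈ 20 minutes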

4. Resilient to Failures

  • No partial downloads to clean up
  • If the stream fails, just retry (see the sketch below)
  • No risk of filling disk and crashing
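
Because a failed stream leaves no partial state behind, the retry logic can be a plain wrapper with backoff. A minimal sketch (the wrapper and delay values are illustrative, not part of the ZipStream API):

// Retry the whole backup with exponential backoff; safe because a failed
// stream leaves nothing on disk to clean up
async function backupWithRetry(backupDate, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await createDistributedBackup(backupDate);
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      const delayMs = 2 ** attempt * 1000; // 2s, 4s, 8s
      console.warn(`Attempt ${attempt} failed, retrying in ${delayMs}ms`);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}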

Advanced Pattern: Multi-Region Disaster Recovery

// Backup strategy: Keep copies in 3 regions
async function distributedDRBackup() {
  const zipDescriptor = await createBackupDescriptor();

  // Create temporary ZipStream link
  const linkResponse = await fetch('https://zipstream.app/api/download-links', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(zipDescriptor)
  });

  const { downloadUrl } = await linkResponse.json();
  const fullUrl = `https://zipstream.app${downloadUrl}`;

  // Upload to 3 regions in parallel
  await Promise.all([
    uploadToS3(fullUrl, 'us-east-1'),
    uploadToS3(fullUrl, 'eu-west-1'),
    uploadToS3(fullUrl, 'ap-south-1')
  ]);

  console.log('✓ Backup replicated to 3 regions');
}

async function uploadToS3(streamUrl, region) {
  const s3 = new S3({ region });
  const response = await fetch(streamUrl);

  await s3.upload({
    Bucket: `backups-${region}`,
    Key: `backup-${Date.now()}.zip`,
    Body: response.body,
    StorageClass: 'GLACIER'
  }).promise();
}

Note: ZipStream temporary links expire in 60 seconds. For multi-region uploads, use POST /api/downloads and pipe to multiple destinations simultaneously.
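
A sketch of that single-stream pattern: fetch the archive once via POST /api/downloads, then fan the one response out to several parallel uploads through PassThrough streams (the bucket names are illustrative):

const { PassThrough } = require('stream');
const fetch = require('node-fetch');
const { S3 } = require('aws-sdk');

// Fetch the archive once and tee the single stream to every region
async function fanOutBackup(zipDescriptor, regions) {
  const response = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(zipDescriptor)
  });

  const branches = regions.map(() => new PassThrough());
  response.body.on('data', chunk => branches.forEach(b => b.write(chunk)));
  response.body.on('end', () => branches.forEach(b => b.end()));
  response.body.on('error', err => branches.forEach(b => b.destroy(err)));

  await Promise.all(regions.map((region, i) =>
    new S3({ region }).upload({
      Bucket: `backups-${region}`,
      Key: `backup-${Date.now()}.zip`,
      Body: branches[i],
      StorageClass: 'GLACIER'
    }).promise()
  ));
}

Note that this simple tee ignores backpressure: if one region uploads slowly, data buffers in its PassThrough stream. For very large archives, use a backpressure-aware fan-out.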

Kubernetes Backup Example

Backup all configs, secrets, and persistent volume snapshots:

// k8s-backup.js
const k8s = require('@kubernetes/client-node');

async function backupKubernetesCluster() {
  const kc = new k8s.KubeConfig();
  kc.loadFromDefault();
  const k8sApi = kc.makeApiClient(k8s.CoreV1Api);  // ConfigMaps, Secrets
  const appsApi = kc.makeApiClient(k8s.AppsV1Api); // Deployments

  const files = [];

  // 1. Export all ConfigMaps
  const configMaps = await k8sApi.listConfigMapForAllNamespaces();
  const configMapJson = JSON.stringify(configMaps.body, null, 2);
  const configMapUrl = await uploadToTempStorage(configMapJson, 'configmaps.json');
  files.push({ url: configMapUrl, zipPath: 'k8s/configmaps.json' });

  // 2. Export all Secrets (encrypted)
  const secrets = await k8sApi.listSecretForAllNamespaces();
  const encryptedSecrets = await encrypt(JSON.stringify(secrets.body));
  const secretsUrl = await uploadToTempStorage(encryptedSecrets, 'secrets.enc');
  files.push({ url: secretsUrl, zipPath: 'k8s/secrets.enc' });

  // 3. Export deployment manifests
  const deployments = await appsApi.listDeploymentForAllNamespaces();
  const deploymentsUrl = await uploadToTempStorage(
    JSON.stringify(deployments.body, null, 2),
    'deployments.json'
  );
  files.push({ url: deploymentsUrl, zipPath: 'k8s/deployments.json' });

  // 4. Add PersistentVolume snapshots (already in S3)
  const pvSnapshots = await getPVSnapshotURLs(); // Returns S3 signed URLs
  pvSnapshots.forEach(snapshot => {
    files.push({
      url: snapshot.url,
      zipPath: `volumes/${snapshot.name}.tar.gz`
    });
  });

  // Create unified backup
  const zipStream = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      suggestedFilename: `k8s-backup-${Date.now()}.zip`,
      files: files,
      compression: "DEFLATE" // Config files benefit from compression
    })
  });

  // Upload to backup storage
  await uploadBackup(zipStream.body);
}

Docker Registry Backup

Aggregate images from multiple registries:

// Pull image layers from multiple registries and create backup archive
async function backupDockerImages() {
  const images = [
    'myregistry.io/app:v1.0.0',
    'gcr.io/project/service:latest',
    'docker.io/library/postgres:14'
  ];

  const files = [];

  for (const image of images) {
    // Export image to tar (stored temporarily in S3)
    const tarUrl = await exportImageToTar(image);
    files.push({
      url: tarUrl,
      zipPath: `images/${image.replace(/[/:]/g, '_')}.tar`
    });
  }

  // Create backup manifest
  const manifest = {
    images: images,
    timestamp: new Date().toISOString(),
    registry: 'multi-registry-backup'
  };
  const manifestUrl = await uploadToTempStorage(JSON.stringify(manifest), 'manifest.json');
  files.push({ url: manifestUrl, zipPath: 'MANIFEST.json' });

  // Stream to backup
  const zipStream = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      suggestedFilename: `docker-backup-${Date.now()}.zip`,
      files: files,
      compression: "STORE" // Tar files are already compressed
    })
  });

  return zipStream.body;
}

Best Practices

1. Pre-flight Validation

// Validate all sources are accessible before starting backup
const validation = await fetch('https://zipstream.app/api/validations', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ files: backupFiles })
});

const result = await validation.json();
if (!result.valid) {
  console.error('Backup failed: Some sources inaccessible');
  console.error(result.results.filter(r => !r.accessible));
  throw new Error('Pre-flight validation failed');
}

2. Backup Integrity Checks

// Include checksums in manifest (the map callback must be async to use await)
const manifest = {
  files: await Promise.all(files.map(async f => ({
    path: f.zipPath,
    sourceUrl: f.url,
    checksum: await calculateChecksum(f.url) // e.g. SHA-256 of source content
  }))),
  timestamp: new Date().toISOString()
};

3. Compression Strategy

// ZipStream currently applies one compression setting to the whole archive,
// so choose it from the dominant content type
const alreadyCompressed = files.every(f =>
  /\.(gz|zip|jpg|png|mp4)$/.test(f.zipPath)
);

const descriptor = {
  files: files,
  // STORE when everything is already compressed; DEFLATE when text dominates
  compression: alreadyCompressed ? 'STORE' : 'DEFLATE'
};

// For mixed archives, pre-compress the text files (e.g. gzip them)
// and use STORE globally

4. Monitoring and Alerting

const startTime = Date.now();

try {
  const { fileCount } = await createDistributedBackup(backupDate);

  const duration = Date.now() - startTime;

  // Send success metric to monitoring
  await sendMetric('backup.success', 1, {
    duration: duration,
    fileCount: fileCount
  });

} catch (error) {
  await sendAlert('backup.failed', {
    error: error.message,
    backupDate: backupDate
  });
  throw error;
}

Performance Comparison

Scenario: Daily backup of 20GB across 4 cloud providers

Metric                  Traditional            ZipStream           Improvement
Local storage required  40GB                   0GB                 100% reduction
Backup duration         60 minutes             25 minutes          58% faster
Infrastructure cost     $50/month (EC2)        $0                  100% reduction
Bandwidth cost          ~$2/day                ~$2/day             Same
Failure recovery time   60 minutes (restart)   0 minutes (retry)   Instant

Limitations and Workarounds

Rate Limits

  • Default: 10 requests/hour per IP
  • Workaround: Spread backups throughout the day, or contact ZipStream for higher limits
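
One way to stay under the default limit is to pin each service's archive to its own hour, e.g. with the node-cron package (the schedule is illustrative; createBackup is the per-service helper shown under the 5GB workaround below):

const cron = require('node-cron');

// One archive per service, spread across the day to respect rate limits
cron.schedule('0 1 * * *', () => createBackup('database', dbFiles));
cron.schedule('0 7 * * *', () => createBackup('user-uploads', uploadFiles));
cron.schedule('0 13 * * *', () => createBackup('logs', logFiles));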

5GB Archive Limit

  • Maximum ZIP size: 5GB
  • Workaround: Split large backups into multiple archives by date or service

// Split by service
await createBackup('database', dbFiles);
await createBackup('user-uploads', uploadFiles);
await createBackup('logs', logFiles);

50 File Limit

  • Maximum: 50 files per archive
  • Workaround: For more files, create multiple archives or pre-bundle smaller files

// Bundle small config files into a tar first (createTarball is a helper that
// uploads the bundle to temp storage and returns its URL)
const configTarUrl = await createTarball(configFiles);
files.push({ url: configTarUrl, zipPath: 'configs.tar' });

Conclusion

ZipStream transforms distributed backup strategies by eliminating intermediate storage and orchestration complexity:

  • Zero storage overhead: No backup server required
  • Faster completion: Parallel streaming reduces backup windows
  • Lower costs: Eliminate backup server infrastructure
  • Simpler orchestration: One API call replaces complex multi-step scripts
  • Cloud-native: Works seamlessly with S3, GCS, Azure, and any HTTP source

Whether you’re backing up Kubernetes clusters, Docker registries, or multi-cloud applications, ZipStream provides a production-grade solution for aggregating distributed data without central storage.


Ready to simplify your backups? Get started with ZipStream
