Distributed Backup Aggregation Without Central Storage

The Challenge: Backing Up Distributed Systems

Modern applications are distributed across multiple services, regions, and providers. When disaster strikes, you need complete backups, but aggregating them is painful:

Typical Architecture:

  • User uploads: S3 (us-east-1)
  • Database backups: S3 (eu-west-1)
  • Application logs: Google Cloud Storage
  • Configuration files: Azure Blob Storage
  • Code artifacts: GitHub/GitLab

Traditional Backup Workflow:

  1. Download database backup from S3 EU (3GB) → Local server
  2. Download user files from S3 US (10GB) → Local server
  3. Download logs from GCS (2GB) → Local server
  4. Create local ZIP archive (15GB) → Local disk
  5. Upload ZIP to backup storage (15GB) → S3 Glacier
  6. Delete local files

Problems:

  • Requires 30GB local storage (download + ZIP)
  • Takes hours: Multiple download/upload cycles
  • High bandwidth costs: Pay egress from 3+ providers, then ingress to backup
  • Single point of failure: If server crashes mid-process, start over
  • Complex orchestration: Need scripts to manage multi-step process

The ZipStream Solution: Direct Aggregation

ZipStream enables zero-storage backup aggregation by streaming files from multiple sources directly into a single archive.

Architecture

┌─────────────┐
│  S3 US-East │─────┐
└─────────────┘     │
                    │
┌─────────────┐     │     ┌─────────────┐     ┌──────────────┐
│  S3 EU-West │─────┼────→│  ZipStream  │────→│ Final Backup │
└─────────────┘     │     └─────────────┘     │ (S3 Glacier) │
                    │                         └──────────────┘
┌─────────────┐     │
│     GCS     │─────┘
└─────────────┘

No intermediate storage required!

Implementation Example

// backup-orchestrator.js
const fetch = require('node-fetch');
const { S3 } = require('aws-sdk');
const { Storage } = require('@google-cloud/storage');

async function createDistributedBackup(backupDate) {
  const s3 = new S3();
  const gcs = new Storage();

  // Generate signed URLs from all sources
  const files = [];

  // 1. Database backups from S3 EU
  // (getSignedUrlPromise is the awaitable form of getSignedUrl in aws-sdk v2)
  const dbBackupUrl = await s3.getSignedUrlPromise('getObject', {
    Bucket: 'backups-eu',
    Key: `db-backup-${backupDate}.sql.gz`,
    Expires: 3600
  });
  files.push({
    url: dbBackupUrl,
    zipPath: 'database/backup.sql.gz'
  });

  // 2. User uploads from S3 US (multiple files)
  const userFiles = await s3.listObjectsV2({
    Bucket: 'user-uploads-us',
    Prefix: `backups/${backupDate}/`
  }).promise();

  for (const file of userFiles.Contents) {
    const url = await s3.getSignedUrlPromise('getObject', {
      Bucket: 'user-uploads-us',
      Key: file.Key,
      Expires: 3600
    });
    files.push({
      url: url,
      zipPath: `user-data/${file.Key.replace('backups/', '')}`
    });
  }

  // 3. Application logs from GCS
  const [gcsFiles] = await gcs.bucket('app-logs').getFiles({
    prefix: `logs/${backupDate}/`
  });

  for (const file of gcsFiles) {
    const [url] = await file.getSignedUrl({
      action: 'read',
      expires: Date.now() + 3600000
    });
    files.push({
      url: url,
      zipPath: `logs/${file.name.replace('logs/', '')}`
    });
  }

  // 4. Configuration files from your API
  files.push({
    url: `https://api.yourcompany.com/backups/config-${backupDate}.json`,
    zipPath: 'config/app-config.json'
  });

  // 5. Create backup manifest
  const manifest = {
    backupDate: backupDate,
    timestamp: new Date().toISOString(),
    fileCount: files.length,
    sources: ['S3-US', 'S3-EU', 'GCS', 'API']
  };

  // Host manifest temporarily (or use data URL)
  const manifestUrl = await uploadManifestTemporarily(manifest);
  files.push({
    url: manifestUrl,
    zipPath: 'BACKUP_MANIFEST.json'
  });

  // 6. Create ZipStream descriptor
  const zipDescriptor = {
    suggestedFilename: `backup-${backupDate}.zip`,
    files: files,
    compression: "STORE" // Backups are already compressed
  };

  // 7. Stream directly to S3 Glacier (or download locally)
  const zipStream = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(zipDescriptor)
  });

  // Upload stream directly to Glacier
  await s3.upload({
    Bucket: 'long-term-backups',
    Key: `archives/backup-${backupDate}.zip`,
    Body: zipStream.body,
    StorageClass: 'GLACIER'
  }).promise();

  console.log(`✓ Backup completed: backup-${backupDate}.zip`);
  console.log(`✓ Files aggregated: ${files.length}`);
  console.log(`✓ Zero intermediate storage used`);

  return { fileCount: files.length }; // used by the monitoring example below
}

// Run once per day (e.g. from a cron job), keyed by today's date
createDistributedBackup(new Date().toISOString().split('T')[0]);

Key Benefits

1. Zero Intermediate Storage

  • No need for large EC2 instances with attached storage
  • No local disk to manage or fill up
  • Cost savings: Eliminate backup server storage costs

2. Bandwidth Optimization

Traditional approach:

  • Download from S3 EU: 3GB egress = $0.27
  • Download from S3 US: 10GB egress = $0.90
  • Download from GCS: 2GB egress = $0.24
  • Upload to S3 Glacier: 15GB ingress = $0 (ingress is free)
  • Total: $1.41 per backup

ZipStream approach:

  • Stream from all sources directly to Glacier
  • Only pay egress from source providers
  • Total: $1.41 per backup (same egress costs)
  • But: No download/upload cycles, no local storage, faster completion
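
The arithmetic behind both totals is simple per-GB egress math. A quick sketch, using the rates assumed above (roughly $0.09/GB for S3 and $0.12/GB for GCS; check your providers' current pricing):

// Egress cost per backup at the assumed per-GB rates
const S3_RATE = 0.09;  // USD/GB (assumed)
const GCS_RATE = 0.12; // USD/GB (assumed)

const costPerBackup =
  3 * S3_RATE +   // DB backup from S3 EU
  10 * S3_RATE +  // user files from S3 US
  2 * GCS_RATE;   // logs from GCS

console.log(`Egress per backup: $${costPerBackup.toFixed(2)}`); // $1.41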

3. Faster Backups

Traditional (sequential):

  • Download 15GB @ 100Mbps: ~20 minutes
  • Create ZIP: ~5 minutes
  • Upload 15GB @ 100Mbps: ~20 minutes
  • Total: ~45 minutes

ZipStream (parallel streaming):

  • Stream and upload concurrently: ~20 minutes
  • Total: ~20 minutes (55% faster)
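
Both estimates come from the same bandwidth formula: seconds = GB × 8,000 / Mbps. A small helper makes the comparison explicit:

// Transfer time in minutes for a payload of `gb` gigabytes at `mbps`
const transferMinutes = (gb, mbps) => (gb * 8000) / mbps / 60;

console.log(transferMinutes(15, 100).toFixed(0)); // ~20 min per 15GB leg
// Traditional: download (~20) + zip (~5) + upload (~20) ≈ 45 minutes
// ZipStream:   one streamed pass                        ≈ 20 minutes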

4. Resilient to Failures

  • No partial downloads to clean up
  • If the stream fails, just retry (see the sketch below)
  • No risk of filling disk and crashing
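
Because a failed stream leaves no partial state behind, the retry logic can be a plain wrapper with backoff. A minimal sketch (the wrapper and delay values are illustrative, not part of the ZipStream API):

// Retry the whole backup with exponential backoff; safe because a failed
// stream leaves nothing on disk to clean up
async function backupWithRetry(backupDate, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await createDistributedBackup(backupDate);
    } catch (err) {
      if (attempt === maxAttempts) throw err;
      const delayMs = 2 ** attempt * 1000; // 2s, 4s, 8s
      console.warn(`Attempt ${attempt} failed, retrying in ${delayMs}ms`);
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}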

Advanced Pattern: Multi-Region Disaster Recovery

// Backup strategy: Keep copies in 3 regions
async function distributedDRBackup() {
  const zipDescriptor = await createBackupDescriptor();

  // Create temporary ZipStream link
  const linkResponse = await fetch('https://zipstream.app/api/download-links', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(zipDescriptor)
  });

  const { downloadUrl } = await linkResponse.json();
  const fullUrl = `https://zipstream.app${downloadUrl}`;

  // Upload to 3 regions in parallel
  await Promise.all([
    uploadToS3(fullUrl, 'us-east-1'),
    uploadToS3(fullUrl, 'eu-west-1'),
    uploadToS3(fullUrl, 'ap-south-1')
  ]);

  console.log('✓ Backup replicated to 3 regions');
}

async function uploadToS3(streamUrl, region) {
  const s3 = new S3({ region });
  const response = await fetch(streamUrl);

  await s3.upload({
    Bucket: `backups-${region}`,
    Key: `backup-${Date.now()}.zip`,
    Body: response.body,
    StorageClass: 'GLACIER'
  }).promise();
}

Note: ZipStream temporary links expire in 60 seconds. For multi-region uploads, use POST /api/downloads and pipe to multiple destinations simultaneously.
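
A sketch of that single-stream pattern: fetch the archive once via POST /api/downloads, then fan the one response out to several parallel uploads through PassThrough streams (the bucket names are illustrative):

const { PassThrough } = require('stream');
const fetch = require('node-fetch');
const { S3 } = require('aws-sdk');

// Fetch the archive once and tee the single stream to every region
async function fanOutBackup(zipDescriptor, regions) {
  const response = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(zipDescriptor)
  });

  const branches = regions.map(() => new PassThrough());
  response.body.on('data', chunk => branches.forEach(b => b.write(chunk)));
  response.body.on('end', () => branches.forEach(b => b.end()));
  response.body.on('error', err => branches.forEach(b => b.destroy(err)));

  await Promise.all(regions.map((region, i) =>
    new S3({ region }).upload({
      Bucket: `backups-${region}`,
      Key: `backup-${Date.now()}.zip`,
      Body: branches[i],
      StorageClass: 'GLACIER'
    }).promise()
  ));
}

Note that this simple tee ignores backpressure: if one region uploads slowly, data buffers in its PassThrough stream. For very large archives, use a backpressure-aware fan-out.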

Kubernetes Backup Example

Backup all configs, secrets, and persistent volume snapshots:

// k8s-backup.js
const k8s = require('@kubernetes/client-node');

async function backupKubernetesCluster() {
  const kc = new k8s.KubeConfig();
  kc.loadFromDefault();
  const k8sApi = kc.makeApiClient(k8s.CoreV1Api);  // ConfigMaps, Secrets
  const appsApi = kc.makeApiClient(k8s.AppsV1Api); // Deployments

  const files = [];

  // 1. Export all ConfigMaps
  const configMaps = await k8sApi.listConfigMapForAllNamespaces();
  const configMapJson = JSON.stringify(configMaps.body, null, 2);
  const configMapUrl = await uploadToTempStorage(configMapJson, 'configmaps.json');
  files.push({ url: configMapUrl, zipPath: 'k8s/configmaps.json' });

  // 2. Export all Secrets (encrypted)
  const secrets = await k8sApi.listSecretForAllNamespaces();
  const encryptedSecrets = await encrypt(JSON.stringify(secrets.body));
  const secretsUrl = await uploadToTempStorage(encryptedSecrets, 'secrets.enc');
  files.push({ url: secretsUrl, zipPath: 'k8s/secrets.enc' });

  // 3. Export deployment manifests
  const deployments = await appsApi.listDeploymentForAllNamespaces();
  const deploymentsUrl = await uploadToTempStorage(
    JSON.stringify(deployments.body, null, 2),
    'deployments.json'
  );
  files.push({ url: deploymentsUrl, zipPath: 'k8s/deployments.json' });

  // 4. Add PersistentVolume snapshots (already in S3)
  const pvSnapshots = await getPVSnapshotURLs(); // Returns S3 signed URLs
  pvSnapshots.forEach(snapshot => {
    files.push({
      url: snapshot.url,
      zipPath: `volumes/${snapshot.name}.tar.gz`
    });
  });

  // Create unified backup
  const zipStream = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      suggestedFilename: `k8s-backup-${Date.now()}.zip`,
      files: files,
      compression: "DEFLATE" // Config files benefit from compression
    })
  });

  // Upload to backup storage
  await uploadBackup(zipStream.body);
}

Docker Registry Backup

Aggregate images from multiple registries:

// Pull image layers from multiple registries and create backup archive
async function backupDockerImages() {
  const images = [
    'myregistry.io/app:v1.0.0',
    'gcr.io/project/service:latest',
    'docker.io/library/postgres:14'
  ];

  const files = [];

  for (const image of images) {
    // Export image to tar (stored temporarily in S3)
    const tarUrl = await exportImageToTar(image);
    files.push({
      url: tarUrl,
      zipPath: `images/${image.replace(/[/:]/g, '_')}.tar`
    });
  }

  // Create backup manifest
  const manifest = {
    images: images,
    timestamp: new Date().toISOString(),
    registry: 'multi-registry-backup'
  };
  const manifestUrl = await uploadToTempStorage(JSON.stringify(manifest), 'manifest.json');
  files.push({ url: manifestUrl, zipPath: 'MANIFEST.json' });

  // Stream to backup
  const zipStream = await fetch('https://zipstream.app/api/downloads', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      suggestedFilename: `docker-backup-${Date.now()}.zip`,
      files: files,
      compression: "STORE" // Tar files are already compressed
    })
  });

  return zipStream.body;
}

Best Practices

1. Pre-flight Validation

// Validate all sources are accessible before starting backup
const validation = await fetch('https://zipstream.app/api/validations', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ files: backupFiles })
});

const result = await validation.json();
if (!result.valid) {
  console.error('Backup failed: Some sources inaccessible');
  console.error(result.results.filter(r => !r.accessible));
  throw new Error('Pre-flight validation failed');
}

2. Backup Integrity Checks

// Include checksums in manifest (the map callback must be async to use await)
const manifest = {
  files: await Promise.all(files.map(async f => ({
    path: f.zipPath,
    sourceUrl: f.url,
    checksum: await calculateChecksum(f.url) // e.g. SHA-256 of source content
  }))),
  timestamp: new Date().toISOString()
};

3. Compression Strategy

// ZipStream currently applies one compression setting to the whole archive,
// so choose it from the dominant content type
const alreadyCompressed = files.every(f =>
  /\.(gz|zip|jpg|png|mp4)$/.test(f.zipPath)
);

const descriptor = {
  files: files,
  // STORE when everything is already compressed; DEFLATE when text dominates
  compression: alreadyCompressed ? 'STORE' : 'DEFLATE'
};

// For mixed archives, pre-compress the text files (e.g. gzip them)
// and use STORE globally

4. Monitoring and Alerting

const startTime = Date.now();

try {
  const { fileCount } = await createDistributedBackup(backupDate);

  const duration = Date.now() - startTime;

  // Send success metric to monitoring
  await sendMetric('backup.success', 1, {
    duration: duration,
    fileCount: fileCount
  });

} catch (error) {
  await sendAlert('backup.failed', {
    error: error.message,
    backupDate: backupDate
  });
  throw error;
}

Performance Comparison

Scenario: Daily backup of 20GB across 4 cloud providers

Metric                  Traditional            ZipStream           Improvement
Local storage required  40GB                   0GB                 100% reduction
Backup duration         60 minutes             25 minutes          58% faster
Infrastructure cost     $50/month (EC2)        $0                  100% reduction
Bandwidth cost          ~$2/day                ~$2/day             Same
Failure recovery time   60 minutes (restart)   0 minutes (retry)   Instant

Limitations and Workarounds

Rate Limits

  • Default: 10 requests/hour per IP
  • Workaround: Spread backups throughout the day, or contact ZipStream for higher limits
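
One way to stay under the default limit is to pin each service's archive to its own hour, e.g. with the node-cron package (the schedule is illustrative; createBackup is the per-service helper shown under the 5GB workaround below):

const cron = require('node-cron');

// One archive per service, spread across the day to respect rate limits
cron.schedule('0 1 * * *', () => createBackup('database', dbFiles));
cron.schedule('0 7 * * *', () => createBackup('user-uploads', uploadFiles));
cron.schedule('0 13 * * *', () => createBackup('logs', logFiles));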

5GB Archive Limit

  • Maximum ZIP size: 5GB
  • Workaround: Split large backups into multiple archives by date or service

// Split by service
await createBackup('database', dbFiles);
await createBackup('user-uploads', uploadFiles);
await createBackup('logs', logFiles);

50 File Limit

  • Maximum: 50 files per archive
  • Workaround: For more files, create multiple archives or pre-bundle smaller files

// Bundle small config files into a tar first (createTarball is a helper that
// uploads the bundle to temp storage and returns its URL)
const configTarUrl = await createTarball(configFiles);
files.push({ url: configTarUrl, zipPath: 'configs.tar' });

Conclusion

ZipStream transforms distributed backup strategies by eliminating intermediate storage and orchestration complexity:

  • Zero storage overhead: No backup server required
  • Faster completion: Parallel streaming reduces backup windows
  • Lower costs: Eliminate backup server infrastructure
  • Simpler orchestration: One API call replaces complex multi-step scripts
  • Cloud-native: Works seamlessly with S3, GCS, Azure, and any HTTP source

Whether you’re backing up Kubernetes clusters, Docker registries, or multi-cloud applications, ZipStream provides a production-grade solution for aggregating distributed data without central storage.


Ready to simplify your backups? Get started with ZipStream
