Kalaido Documentation

Kalaido Overview

Kalaido is built as a coordinated system of expert foundation models. The diffusion pipeline is a cascade of two primary diffusion models, both operating in latent space, combined with multiple style LoRAs. This MoE-style arrangement lets Kalaido target frontier-level aesthetics, text-rendering accuracy, stylistic consistency, and adherence to human preferences. We innovate in both key phases of training, pre-training and reinforcement learning, to improve output quality as well as training efficiency. We curate highly specialized datasets from large internet corpora, keeping only the highest-signal images and prompts.
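To make the cascade-plus-LoRA idea concrete, here is a minimal, purely illustrative sketch. It is not Kalaido's implementation: the names `StyleLoRA`, `merge_lora`, `denoise`, and `cascade` are hypothetical, the "models" are tiny weight matrices, and the denoising loop is a toy stand-in for a real diffusion sampler. The only standard piece is the LoRA merge rule, where a low-rank product `B @ A` is scaled and added to a base weight matrix.

```python
from dataclasses import dataclass
from typing import List, Optional

Matrix = List[List[float]]
Latent = List[float]  # toy stand-in for a latent tensor


@dataclass
class StyleLoRA:
    """Hypothetical style adapter holding a low-rank update: scale * (B @ A)."""
    A: Matrix            # shape (r, d_in)
    B: Matrix            # shape (d_out, r)
    scale: float = 1.0


def matmul(X: Matrix, Y: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def merge_lora(W: Matrix, lora: StyleLoRA) -> Matrix:
    """Return W + scale * (B @ A) without mutating W."""
    delta = matmul(lora.B, lora.A)
    return [[w + lora.scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]


def denoise(W: Matrix, z: Latent, steps: int) -> Latent:
    """Toy refinement loop (assumption): blend z halfway toward W @ z each step."""
    for _ in range(steps):
        target = [sum(w * x for w, x in zip(row, z)) for row in W]
        z = [0.5 * a + 0.5 * b for a, b in zip(z, target)]
    return z


def cascade(noise: Latent, base_W: Matrix, refiner_W: Matrix,
            style: Optional[StyleLoRA] = None, steps: int = 4) -> Latent:
    """Run two stages in sequence, both in latent space.

    If a style adapter is given, it is merged into both stages' weights first.
    """
    if style is not None:
        base_W = merge_lora(base_W, style)
        refiner_W = merge_lora(refiner_W, style)
    coarse = denoise(base_W, noise, steps)     # stage 1: coarse latent
    return denoise(refiner_W, coarse, steps)   # stage 2: refined latent


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    style = StyleLoRA(A=[[1.0, 0.0]], B=[[0.0], [1.0]], scale=0.5)
    print(cascade([1.0, 2.0], identity, identity))
    print(cascade([1.0, 2.0], identity, identity, style=style))
```

In this sketch, swapping the `style` argument changes the output without touching the base weights, which is the property that makes a library of style adapters cheap to maintain alongside two shared diffusion stages.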

Technical Architecture

Kalaido Architecture

Benchmarking Details

Figures: Benchmark Images 1 to 5

Paper Publication(s)

  1. Effective Text-to-Image Alignment with Quality Aware Pair Ranking (NeurIPS’24 Adaptive Foundational Model)
  2. Visual Prompting Methods for GPT-4V Based Zero-Shot Graphic Layout Design Generation (ICLR’24)