Models

Capabilities and pricing for the Luma Agents models — uni-1 and uni-1-max for images, ray-3.2 for video.

The Luma Agents API has models for two media types: uni-1 and uni-1-max for image generation and editing, and ray-3.2 for video. All three share the same endpoint, request envelope, and submit-poll-download flow; only the supported type values and model-specific fields differ.

| Model | Supports | Use for |
| --- | --- | --- |
| uni-1 | image, image_edit | Default image generation and editing |
| uni-1-max | image, image_edit | Higher-quality image output than uni-1 |
| ray-3.2 | video, video_edit, video_reframe | Text-to-video, image-to-video, video editing, and aspect-ratio reframing |
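
The table above can be mirrored client-side to catch a mismatched model and type before a request is sent. This is a minimal sketch, not part of the API: the names SUPPORTED_TYPES and check_model_type are hypothetical, and the server remains the authority on what it accepts.

```python
# Supported request types per model, transcribed from the table above.
SUPPORTED_TYPES = {
    "uni-1": {"image", "image_edit"},
    "uni-1-max": {"image", "image_edit"},
    "ray-3.2": {"video", "video_edit", "video_reframe"},
}


def check_model_type(model: str, request_type: str) -> None:
    """Raise ValueError if the (model, type) pair is not listed in the table."""
    allowed = SUPPORTED_TYPES.get(model)
    if allowed is None:
        raise ValueError(f"unknown model: {model!r}")
    if request_type not in allowed:
        raise ValueError(
            f"{model} does not support type {request_type!r}; "
            f"expected one of {sorted(allowed)}"
        )
```

Calling check_model_type("ray-3.2", "image") raises, while check_model_type("uni-1", "image_edit") returns silently.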

See Pricing for current rates and Provisioned Throughput plans.

| Capability | Supported |
| --- | --- |
| Text-to-image (type: "image") | Yes |
| Image editing (type: "image_edit") | Yes |
| image_ref for style or content guidance (up to 9) | Yes |
| source image for editing | Yes |
| web_search grounding | Yes |
| Output formats png / jpeg | Yes |

uni-1-max produces higher-quality output than uni-1; the two share the same wire format and parameter set.

| Capability | Supported |
| --- | --- |
| Text-to-video (type: "video") | Yes |
| Image-to-video (video.start_frame / video.end_frame) | Yes |
| Multi-keyframe image-to-video (video.keyframes + video.keyframe_indexes, up to 64 anchors) | Yes |
| Video editing (type: "video_edit" with source.generation_id, source.url, or source.data) | Yes |
| Video reframing (type: "video_reframe") | Yes |
| Extend a prior video (video.start_frame.generation_id / video.end_frame.generation_id) | Yes |
| Per-signal edit controls (video.edit.controls) | Yes |
| HDR output (video.hdr, 720p/1080p) | Yes |
| EXR export (video.exr_export, requires hdr: true) | Yes |
| Seamless looping (video.loop, type: "video" only) | Yes |

Both uni-1 and uni-1-max accept the same parameter set; the difference is output quality and per-image price.

| Capability | Description |
| --- | --- |
| Text rendering | Renders readable text on signs, labels, book covers, and other surfaces. Put exact text in quotes in your prompt. |
| Spatial reasoning | Produces accurate shadows, correct perspective, and physically plausible object layouts. |
| Cultural styles | Understands visual traditions: manga panels with screentone shading, ukiyo-e woodblock prints, film noir lighting, art house cinema framing, and more. |
| Reference-guided generation | Accepts up to 9 reference images via image_ref for text-to-image, or up to 8 for image editing (where the source image occupies its own slot). |
| Image editing | Modifies existing images via source: change backgrounds, swap objects, or transfer styles while preserving unmentioned parts. |
| Web search grounding | When web_search: true, searches the web for visual references before generating. |
| Multi-panel output | Generates multi-panel sequences (e.g., storyboards) with consistent style when described in the prompt. |
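
The reference-image limits differ by request type (9 for text-to-image, 8 for image editing), which is easy to trip over when the same code path builds both. A small sketch of a pre-flight check follows; the helper name is hypothetical, and the entries of image_ref are treated as opaque because their shape is not described on this page.

```python
# Reference-image limits per request type, as stated in the table above.
# For image_edit the source image occupies its own slot, leaving 8.
MAX_IMAGE_REFS = {"image": 9, "image_edit": 8}


def check_image_ref_count(request_type: str, image_refs: list) -> int:
    """Return the number of references, or raise ValueError if over the limit."""
    limit = MAX_IMAGE_REFS.get(request_type)
    if limit is None:
        raise ValueError(f"image_ref is not described for type {request_type!r}")
    if len(image_refs) > limit:
        raise ValueError(
            f"type {request_type!r} accepts at most {limit} reference images, "
            f"got {len(image_refs)}"
        )
    return len(image_refs)
```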

Aspect ratios depend on both the model and the request type. The shared AspectRatio enum has twelve members, but no single (model, type) pair accepts all twelve, so check the accepted values for the specific model and type you are calling.

| Format | Best for |
| --- | --- |
| png | Lossless quality |
| jpeg | Smaller file size, photographs |

output_format applies to image models only. When it is omitted, the model picks a format based on the prompt.

Video output is delivered as MP4. When video.hdr: true, the MP4 is HDR-encoded; adding video.exr_export: true exports an EXR file alongside it for professional colour-grading workflows.
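
The HDR and EXR options depend on each other: EXR export requires HDR, and the capability list pairs HDR with 720p and 1080p. A minimal sketch of checking a video object before submission follows. The function name is hypothetical, and because this page does not state the default resolution, the resolution check only applies when a resolution is given.

```python
# Resolutions the capability list pairs with HDR output.
HDR_RESOLUTIONS = {"720p", "1080p"}


def check_hdr_options(video: dict) -> None:
    """Raise ValueError if hdr / exr_export are combined inconsistently."""
    hdr = bool(video.get("hdr", False))
    if video.get("exr_export", False) and not hdr:
        raise ValueError("video.exr_export requires video.hdr: true")
    resolution = video.get("resolution")
    # Only check the resolution when one is set explicitly (default unknown).
    if hdr and resolution is not None and resolution not in HDR_RESOLUTIONS:
        raise ValueError(
            f"video.hdr is listed for {sorted(HDR_RESOLUTIONS)}, got {resolution!r}"
        )
```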

Generated images and videos are delivered as presigned URLs that expire after 1 hour. Download promptly or request a fresh URL by polling GET /v1/generations/{id} again.
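
Because the URLs expire after an hour, long-running clients may want to track when a URL was obtained and re-poll before reusing it. The sketch below assumes a caller-supplied fetch_url function that polls GET /v1/generations/{id} and extracts the URL; how the URL appears in the response is not described here, so that part is deliberately left out. The class name and the five-minute safety margin are hypothetical choices, and the sketch assumes each poll returns a URL with a fresh one-hour lifetime.

```python
import time
from typing import Callable, Optional

URL_LIFETIME_SECONDS = 60 * 60   # presigned URLs expire after 1 hour
SAFETY_MARGIN_SECONDS = 5 * 60   # hypothetical margin: refresh a little early


class FreshUrl:
    """Caches a presigned URL and re-fetches it shortly before it expires."""

    def __init__(
        self,
        fetch_url: Callable[[], str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_url = fetch_url   # caller-supplied: polls and returns the URL
        self._clock = clock
        self._url: Optional[str] = None
        self._fetched_at = 0.0

    def get(self) -> str:
        now = self._clock()
        max_age = URL_LIFETIME_SECONDS - SAFETY_MARGIN_SECONDS
        if self._url is None or now - self._fetched_at >= max_age:
            self._url = self._fetch_url()
            self._fetched_at = now
        return self._url
```

Injecting the clock keeps the class testable without waiting an hour; in production the default time.monotonic is unaffected by wall-clock adjustments.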

uni-1 is the default image model. Use it for the broad set of image-generation and image-editing tasks (see Image model strengths above). When model is omitted from the request, it defaults to "uni-1".

See Pricing — uni-1 for the per-task table.

uni-1-max produces higher-quality output than uni-1. It uses the same wire format and parameter set, so prompts, aspect_ratio, and image_ref carry the same semantics. To use it, pass "model": "uni-1-max" on POST /v1/generations.

{ "model": "uni-1-max", "prompt": "A neon-lit Tokyo alley in the rain" }

See Pricing — uni-1-max for the per-task table.

Ray 3.2 generates video from text or from anchor images, edits existing videos, extends previous generations, and reframes videos to a new aspect ratio:

  • Text-to-video: type: "video" with a prompt
  • Image-to-video: type: "video" with video.start_frame and/or video.end_frame as ImageRef
  • Video extend: type: "video" with exactly one prior generation id in video.start_frame.generation_id or video.end_frame.generation_id
  • Video editing: type: "video_edit" with a source video: source.generation_id, a hosted source.url, or inline source.data (the last two with a video/* source.media_type)
  • Video reframing: type: "video_reframe" with a source video and target aspect_ratio
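
The "exactly one prior generation id" rule for video extend is the easiest of these to violate by accident. The following sketch builds an extend request body and enforces that rule. The helper name and its parameters are hypothetical, the nesting follows the dotted field paths in the list above, and including a prompt is an assumption rather than a documented requirement.

```python
from typing import Optional


def build_extend_request(
    prompt: str,
    start_generation_id: Optional[str] = None,
    end_generation_id: Optional[str] = None,
) -> dict:
    """Build a video-extend body with exactly one prior generation id."""
    given = [g for g in (start_generation_id, end_generation_id) if g is not None]
    if len(given) != 1:
        raise ValueError(
            "video extend needs exactly one prior generation id, "
            "in start_frame or end_frame"
        )
    frame_key = "start_frame" if start_generation_id is not None else "end_frame"
    return {
        "model": "ray-3.2",
        "type": "video",
        "prompt": prompt,  # assumption: a prompt accompanies the extend request
        "video": {frame_key: {"generation_id": given[0]}},
    }
```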

Resolutions: 540p, 720p, 1080p. Durations: 5s, 10s. Aspect ratios: 9:16, 3:4, 1:1, 4:3, 16:9, 21:9.

loop is create-only (type: "video" only). Video edits add video.edit for conditioning — auto_controls for a model-derived schedule, or strength and per-signal controls for manual tuning. Video reframing is standard dynamic range only and rejects video.edit, loop, start_frame, and end_frame.

{
  "model": "ray-3.2",
  "type": "video",
  "prompt": "A slow dolly shot through a misty greenhouse at sunrise",
  "video": { "resolution": "720p", "duration": "5s" }
}
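
The ray-3.2 constraints described above (allowed resolutions and durations, loop being create-only, and the fields that video reframing rejects) can be sketched as a client-side check. This is an illustration under assumptions, not the server's validation logic: the function name is hypothetical, it reads "standard dynamic range only" as rejecting video.hdr on reframes, and it skips aspect_ratio because the field's position in the request body is not shown on this page.

```python
RESOLUTIONS = {"540p", "720p", "1080p"}
DURATIONS = {"5s", "10s"}
# Fields under "video" that video_reframe is described as rejecting.
REFRAME_REJECTED = ("edit", "loop", "start_frame", "end_frame")


def validate_ray_request(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were flagged."""
    problems = []
    request_type = payload.get("type")
    video = payload.get("video", {})

    if "resolution" in video and video["resolution"] not in RESOLUTIONS:
        problems.append(f"unsupported resolution: {video['resolution']!r}")
    if "duration" in video and video["duration"] not in DURATIONS:
        problems.append(f"unsupported duration: {video['duration']!r}")

    if request_type == "video_reframe":
        for key in REFRAME_REJECTED:
            if key in video:
                problems.append(f"video_reframe rejects video.{key}")
        if video.get("hdr"):
            # Assumption: "standard dynamic range only" means hdr must be off.
            problems.append("video_reframe is standard dynamic range only")
    elif video.get("loop") and request_type != "video":
        problems.append('video.loop is only accepted with type: "video"')

    return problems
```

Passing the example request above to validate_ray_request returns an empty list.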

See Video generation for full parameters, Video editing for edits, Video reframing for aspect-ratio changes, and Pricing — ray-3.2 per-video pricing for rates.