Create a generation

POST /generations

Submit an image or video generation job. Returns immediately with an opaque job ID to poll via GET /generations/{id}.
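The submit-then-poll flow described above can be sketched as follows. This is a minimal illustration, not the official client: `BASE_URL`, the bearer-token header, and the helper names `http_json` and `create_and_wait` are assumptions made for the example. Only the two routes and the `state` values come from this page.

```python
import json
import time
import urllib.request

# Assumption: placeholder base URL and bearer auth; neither is stated on this page.
BASE_URL = "https://api.example.com/v1"


def http_json(method, path, body=None, api_key="YOUR_API_KEY"):
    """Send one JSON request and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def create_and_wait(payload, send=http_json, interval=2.0, timeout=600.0,
                    sleep=time.sleep):
    """POST /generations, then poll GET /generations/{id} until a terminal state."""
    job = send("POST", "/generations", payload)
    waited = 0.0
    while job["state"] not in ("completed", "failed"):
        if waited >= timeout:
            raise TimeoutError(f"generation {job['id']} is still {job['state']}")
        sleep(interval)
        waited += interval
        job = send("GET", f"/generations/{job['id']}")
    return job
```

The `send` and `sleep` parameters are injectable so the loop can be exercised without network access. The polling interval and timeout are arbitrary defaults, not documented limits.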

Body Parameters

prompt: string

Text prompt

minLength: 1

maxLength: 6000

aspect_ratio: optional "3:1" or "2:1" or "21:9" or 9 more

Output aspect ratio. Valid values depend on the selected model and generation type; the server validates the final model-specific set.

One of the following:

"3:1"

"2:1"

"21:9"

"16:9"

"4:3"

"3:2"

"1:1"

"3:4"

"2:3"

"9:16"

"1:2"

"1:3"

image_ref: optional array of ImageRef { data, generation_id, media_type, url }

Reference images for style/content guidance. Up to 9 for type 'image', up to 8 for type 'image_edit'.

data: optional string

Base64-encoded image or video data

generation_id: optional string

UUID of a prior generation owned by the same caller. Used on source for image_edit, video_edit, and video_reframe chaining and on video.start_frame / video.end_frame for video extension.

format: uuid

media_type: optional string

MIME type (for example, image/jpeg or video/mp4). Required with data. Required with source.url on video_edit/video_reframe so the route can dispatch video ingest before fetching bytes; optional for image URLs.

url: optional string

Publicly accessible image URL, or a video URL when used as source for video_edit/video_reframe with media_type=video/*.

model: optional Model

Model identifier. uni-1 is the default image tier; uni-1-max produces higher-quality output than uni-1 at a higher per-image price. ray-3.2 is the public video model for text-to-video, image-to-video, and video-to-video editing.

One of the following:

"uni-1"

"uni-1-max"

"ray-3.2"

output_format: optional "png" or "jpeg"

Output image format

One of the following:

"png"

"jpeg"

source: optional ImageRef { data, generation_id, media_type, url }

Media reference for guided generation. Provide exactly one of url, inline base64 data, or generation_id. URL/data references accept image media at image positions; video_edit and video_reframe sources also accept source.url or source.data when source.media_type is a video/* MIME. generation_id chains image_edit off a prior image output, video_edit/video_reframe off a prior video output, and video.start_frame/end_frame for extension.

data: optional string

Base64-encoded image or video data

generation_id: optional string

UUID of a prior generation owned by the same caller. Used on source for image_edit, video_edit, and video_reframe chaining and on video.start_frame / video.end_frame for video extension.

format: uuid

media_type: optional string

url: optional string

Publicly accessible image URL, or a video URL when used as source for video_edit/video_reframe with media_type=video/*.
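A small builder can enforce the "exactly one of url, data, or generation_id" rule and the "media_type is required with data" rule before a request is sent. The function name `make_image_ref` is hypothetical, and the choice to accept raw bytes and base64-encode them is an assumption of this sketch. The server remains the final validator.

```python
import base64


def make_image_ref(url=None, data=None, generation_id=None, media_type=None):
    """Build an ImageRef dict with exactly one of url, data, or generation_id."""
    chosen = [
        name
        for name, value in (("url", url), ("data", data), ("generation_id", generation_id))
        if value is not None
    ]
    if len(chosen) != 1:
        raise ValueError(
            f"provide exactly one of url, data, generation_id; got {chosen or 'none'}"
        )
    if data is not None and not media_type:
        raise ValueError("media_type is required with data")

    ref = {}
    if url is not None:
        ref["url"] = url
    elif data is not None:
        # Raw bytes are base64-encoded here because the field expects a base64 string.
        ref["data"] = base64.b64encode(data).decode("ascii")
    else:
        ref["generation_id"] = generation_id
    if media_type:
        ref["media_type"] = media_type
    return ref
```

For a video source passed by URL on video_edit or video_reframe, pass `media_type="video/mp4"` (or another video/* MIME type) explicitly. This sketch does not check that case because it depends on the request type.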

style: optional "auto" or "manga"

Style preset (auto, manga)

One of the following:

"auto"

"manga"

type: optional "image" or "image_edit" or "video" or 2 more

The kind of generation to perform

One of the following:

"image"

"image_edit"

"video"

"video_edit"

"video_reframe"

user_id: optional string

Your end-user's stable opaque identifier (no PII). Forwarded to upstream model providers as their per-user tagging field so trust & safety violations can be attributed to a specific end-user rather than the whole API account. Also used for per-end-user usage breakdowns in /v1/usage. Strongly recommended for partner integrations.

maxLength: 256
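One way to obtain a stable, opaque, PII-free identifier is to derive it from an internal account ID with a keyed hash. This is only a sketch of one possible approach: the function name `opaque_user_id`, the `u_` prefix, and the use of HMAC-SHA256 are choices made for the example, not requirements of the API.

```python
import hashlib
import hmac


def opaque_user_id(internal_id: str, secret: bytes) -> str:
    """Derive a stable pseudonymous ID; the same input always yields the same output."""
    digest = hmac.new(secret, internal_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # 2-character prefix + 64 hex characters = 66 characters, within the 256 limit.
    return "u_" + digest
```

Keeping the secret server-side means the identifier cannot be reversed to the internal ID by a third party, while usage breakdowns stay consistent per end user.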

video: optional VideoOptions { duration, edit, end_frame, 8 more }

Ray 3.2 video request options. Common output settings live at the top level for type=video, type=video_edit, and type=video_reframe; video-to-video conditioning lives under edit.

duration: optional VideoDuration

Video duration

One of the following:

"5s"

"10s"

edit: optional VideoEditOptions { auto_controls, controls, keyframe_indexes, 2 more }

Ray 3.2 video-to-video edit controls. Only valid under video.edit when type is video_edit. The source video must be 18 seconds or shorter; output duration matches the source.

auto_controls: optional boolean

When true, the model derives the control schedule from the source video. When omitted, supplying strength or controls implies manual mode.

controls: optional AdvancedControls { depth, face, normals, 2 more }

Per-signal manual conditioning controls for video edits

depth: optional DepthControl { blur, enabled }

Depth / scene-geometry conditioning control

blur: optional number

Depth-map blur amount from 0 to 1. Higher values allow more geometric freedom.

format: float

minimum: 0

maximum: 1

enabled: optional boolean

Enable or disable depth conditioning. Omit to use the model default.

face: optional FaceControl { enabled }

Face-identity conditioning control

enabled: optional boolean

Enable or disable face conditioning. Omit to use the model default.

normals: optional NormalsControl { augmentation, enabled }

Surface-normals conditioning control

augmentation: optional number

Surface-normals augmentation from 0 to 1. Higher values allow more reinterpretation of surface geometry.

format: float

minimum: 0

maximum: 1

enabled: optional boolean

Enable or disable normals conditioning. Omit to use the model default.

pose: optional PoseControl { enabled, strength }

Pose / skeleton conditioning control

enabled: optional boolean

Enable or disable pose conditioning. Omit to use the model default.

strength: optional PoseControlStrength

Pose-conditioning strength

One of the following:

"precise"

"coarse"

trajectory: optional TrajectoryControl { enabled, sparsity }

Motion-trajectory conditioning control

enabled: optional boolean

Enable or disable trajectory conditioning. Omit to use the model default.

sparsity: optional number

Point-trajectory sparsity from 0 to 1. Higher values use fewer motion anchors.

format: float

minimum: 0

maximum: 1
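The manual controls above nest under video.edit.controls in a video_edit request. The sketch below assembles such a body and checks the documented 0 to 1 ranges, the pose strength values, and the edit strength presets listed under video.edit.strength. The helper name `video_edit_request` is hypothetical, and fixing the model to ray-3.2 is an assumption based on the model description on this page.

```python
STRENGTHS = {
    f"{mode}_{level}"
    for mode in ("adhere", "flex", "reimagine")
    for level in (1, 2, 3)
}
# Control fields documented as numbers from 0 to 1.
UNIT_FIELDS = {"depth": "blur", "normals": "augmentation", "trajectory": "sparsity"}


def video_edit_request(prompt, source, strength=None, controls=None, auto_controls=None):
    """Assemble a type=video_edit body with optional manual controls."""
    controls = controls or {}
    if strength is not None and strength not in STRENGTHS:
        raise ValueError(f"unknown strength: {strength}")
    for signal, field in UNIT_FIELDS.items():
        value = controls.get(signal, {}).get(field)
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{signal}.{field} must be between 0 and 1")
    pose_strength = controls.get("pose", {}).get("strength")
    if pose_strength is not None and pose_strength not in ("precise", "coarse"):
        raise ValueError("pose.strength must be 'precise' or 'coarse'")

    edit = {}
    if auto_controls is not None:
        edit["auto_controls"] = auto_controls
    if strength is not None:
        edit["strength"] = strength
    if controls:
        edit["controls"] = controls
    return {
        "type": "video_edit",
        "model": "ray-3.2",
        "prompt": prompt,
        "source": source,
        "video": {"edit": edit},
    }
```

Leaving `auto_controls` unset while passing `strength` or `controls` relies on the documented behavior that doing so implies manual mode.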

keyframe_indexes: optional array of number

Parallel list of non-negative, unique frame positions in the source video's frame grid where each keyframes[i] is anchored. Must match keyframes in length.

keyframes: optional array of ImageRef { data, generation_id, media_type, url }

Multi-anchor guide-frame images at arbitrary source-frame positions (parallel with keyframe_indexes). Up to 64 anchors. Mutually exclusive with video.start_frame (the single-anchor case). Each entry takes the same ImageRef shape as source / image_ref[].

data: optional string

Base64-encoded image or video data

generation_id: optional string

UUID of a prior generation owned by the same caller. Used on source for image_edit, video_edit, and video_reframe chaining and on video.start_frame / video.end_frame for video extension.

format: uuid

media_type: optional string

url: optional string

Publicly accessible image URL, or a video URL when used as source for video_edit/video_reframe with media_type=video/*.

strength: optional VideoEditStrength

How much a video edit preserves or reimagines the source

One of the following:

"adhere_1"

"adhere_2"

"adhere_3"

"flex_1"

"flex_2"

"flex_3"

"reimagine_1"

"reimagine_2"

"reimagine_3"

end_frame: optional ImageRef { data, generation_id, media_type, url }

data: optional string

Base64-encoded image or video data

generation_id: optional string

UUID of a prior generation owned by the same caller. Used on source for image_edit, video_edit, and video_reframe chaining and on video.start_frame / video.end_frame for video extension.

format: uuid

media_type: optional string

url: optional string

Publicly accessible image URL, or a video URL when used as source for video_edit/video_reframe with media_type=video/*.

exr_export: optional boolean

Export EXR alongside the MP4. Requires hdr=true.

hdr: optional boolean

Generate HDR video. Requires HDR access. Not supported for video_reframe.

keyframe_indexes: optional array of number

Parallel list of non-negative, unique output-frame positions where each keyframes[i] is anchored, in the duration x 24fps grid (5s -> 0..120, 10s -> 0..240). Must match keyframes in length.

keyframes: optional array of ImageRef { data, generation_id, media_type, url }

Image-to-video guide frames (type=video only), each pinned to an output-frame position via the parallel keyframe_indexes. 1-64 anchors: a single anchor is a valid start-pinned i2v (an alternate to start_frame), and any count up to 64 places guide frames at arbitrary positions. Unlike start_frame/end_frame (the legacy 2-frame surface), this supports arbitrary positions, 10s durations, and HDR. Mutually exclusive with start_frame / end_frame / loop. Only supported on model ray-3.2. For video-to-video keyframes use video.edit.keyframes on type=video_edit instead.

data: optional string

Base64-encoded image or video data

generation_id: optional string

UUID of a prior generation owned by the same caller. Used on source for image_edit, video_edit, and video_reframe chaining and on video.start_frame / video.end_frame for video extension.

format: uuid

media_type: optional string

url: optional string

Publicly accessible image URL, or a video URL when used as source for video_edit/video_reframe with media_type=video/*.
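The pairing rules for video.keyframes and video.keyframe_indexes on type=video can be checked client-side before submitting. The sketch below reads the documented ranges (5s -> 0..120, 10s -> 0..240) as inclusive at both ends, which is an interpretation of the text above. The function name `check_keyframes` is hypothetical.

```python
FPS = 24  # frame-grid rate stated in the keyframe_indexes description


def check_keyframes(keyframes, keyframe_indexes, duration):
    """Return a list of problems; an empty list means the pair looks valid."""
    problems = []
    last_frame = int(duration.rstrip("s")) * FPS  # "5s" -> 120, "10s" -> 240
    if not 1 <= len(keyframes) <= 64:
        problems.append("keyframes must contain 1 to 64 anchors")
    if len(keyframes) != len(keyframe_indexes):
        problems.append("keyframe_indexes must match keyframes in length")
    if len(set(keyframe_indexes)) != len(keyframe_indexes):
        problems.append("keyframe_indexes must be unique")
    if any(i < 0 or i > last_frame for i in keyframe_indexes):
        problems.append(f"keyframe_indexes must lie in 0..{last_frame}")
    return problems
```

For example, two guide frames at the first and last positions of a 5s clip would use `keyframe_indexes=[0, 120]`.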

loop: optional boolean

Generate a seamlessly looping video. Only valid for type=video; not supported with duration=10s or hdr=true.
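Several video options on this page constrain one another: exr_export, hdr, loop, keyframes, edit, and the 360p rule stated under resolution. A client-side pre-check along the following lines can surface conflicts before a request is submitted. The function name `video_option_conflicts` is hypothetical, and the rule set is only what this page states. The server may enforce more.

```python
def video_option_conflicts(gen_type, video):
    """Return documented conflicts found in a `video` options dict."""
    problems = []
    hdr = video.get("hdr") is True
    if video.get("exr_export") and not hdr:
        problems.append("exr_export requires hdr=true")
    if hdr and gen_type == "video_reframe":
        problems.append("hdr is not supported for video_reframe")
    if video.get("loop"):
        if gen_type != "video":
            problems.append("loop is only valid for type=video")
        if video.get("duration") == "10s":
            problems.append("loop is not supported with duration=10s")
        if hdr:
            problems.append("loop is not supported with hdr=true")
    if gen_type == "video" and video.get("resolution") == "360p" and hdr:
        problems.append("360p is SDR-only on type=video")
    if video.get("keyframes"):
        clashes = [k for k in ("start_frame", "end_frame", "loop") if video.get(k)]
        if clashes:
            problems.append("keyframes is mutually exclusive with " + ", ".join(clashes))
    if "edit" in video and gen_type != "video_edit":
        problems.append("video.edit is only valid when type is video_edit")
    return problems
```

Returning a list rather than raising on the first issue lets a caller report every conflict at once.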

resolution: optional VideoResolution

Ray 3.2 video output resolution. 360p is the draft tier (fast, low-cost previews), accepted on type=video, video_edit, and video_reframe; on type=video it is SDR-only (not valid with hdr=true). 1080p is public for video generation; video_reframe 1080p is still rolling out and may return a coming-soon validation error until enabled for the caller.

One of the following:

"360p"

"540p"

"720p"

"1080p"

source_position: optional SourcePosition { h_norm, w_norm, x_norm, y_norm }

Normalized source rectangle inside the output canvas for video_reframe. Omit to let the model choose the default centered-fit crop.

h_norm: number

Source rectangle height, as a fraction of canvas height. Up to 2.0 so the source can bleed off-canvas.

format: float

exclusiveMinimum: 0

maximum: 2

w_norm: number

Source rectangle width, as a fraction of canvas width. Up to 2.0 so the source can bleed off-canvas.

format: float

exclusiveMinimum: 0

maximum: 2

x_norm: number

Left edge of the source rectangle, as a fraction of canvas width. May be negative when the source extends off-canvas.

format: float

minimum: -2

maximum: 2

y_norm: number

Top edge of the source rectangle, as a fraction of canvas height. May be negative when the source extends off-canvas.

format: float

minimum: -2

maximum: 2
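Because x_norm and y_norm are the left and top edges as fractions of the canvas, a source rectangle of a given size is centered when each edge sits at (1 - size) / 2. The sketch below computes such a rectangle and checks the documented size bounds. The helper name `centered_source_position` is hypothetical.

```python
def centered_source_position(w_norm, h_norm):
    """Return a source_position dict that centers a w_norm x h_norm rectangle."""
    for name, size in (("w_norm", w_norm), ("h_norm", h_norm)):
        # Sizes must be greater than 0 and at most 2.
        if not 0 < size <= 2:
            raise ValueError(f"{name} must be greater than 0 and at most 2")
    return {
        "w_norm": w_norm,
        "h_norm": h_norm,
        # A rectangle wider than the canvas gets a negative left edge (off-canvas bleed).
        "x_norm": (1 - w_norm) / 2,
        "y_norm": (1 - h_norm) / 2,
    }
```

With `w_norm=1.5` and `h_norm=1.0`, the source overflows the canvas by a quarter of the canvas width on each side, so `x_norm` is -0.25 and `y_norm` is 0.0.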

start_frame: optional ImageRef { data, generation_id, media_type, url }

data: optional string

Base64-encoded image or video data

generation_id: optional string

UUID of a prior generation owned by the same caller. Used on source for image_edit, video_edit, and video_reframe chaining and on video.start_frame / video.end_frame for video extension.

format: uuid

media_type: optional string

url: optional string

Publicly accessible image URL, or a video URL when used as source for video_edit/video_reframe with media_type=video/*.

web_search: optional boolean

Enable web search grounding. When true, the agent can search the web and download reference images before generating.

Returns

Generation = object { id, created_at, model, 5 more }

Generation status and output

id: string

Generation identifier

format: uuid

created_at: string

Creation timestamp

model: Model

Model used

One of the following:

"uni-1"

"uni-1-max"

"ray-3.2"

state: "queued" or "processing" or "completed" or "failed"

Current state of the generation

One of the following:

"queued"

"processing"

"completed"

"failed"

type: "image" or "image_edit" or "video" or 2 more

The kind of generation requested

One of the following:

"image"

"image_edit"

"video"

"video_edit"

"video_reframe"

failure_code: optional GenerationFailureCode

Machine-readable failure code for programmatic handling

One of the following:

"content_moderated"

"generation_failed"

"budget_exhausted"

"output_not_found"

"image_too_large"

"unsupported_format"

"corrupt_input"

"invalid_request"

"rate_limited"

failure_reason: optional string

Human-readable failure description

output: optional array of GenerationOutput { type, url }

Generated outputs (populated on completion)

type: string

Media type (e.g. image, video)

url: string

Presigned URL (expires after 1 hour)

format: uri
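A caller typically needs to turn a Generation object into either a list of output URLs or an error carrying failure_code and failure_reason. The sketch below shows one way to do that. The names `GenerationFailed` and `output_urls` are hypothetical, and the object is assumed to be the decoded JSON dict.

```python
class GenerationFailed(Exception):
    """Raised when a generation reaches state 'failed'."""

    def __init__(self, code, reason):
        super().__init__(f"{code}: {reason}")
        self.code = code
        self.reason = reason


def output_urls(generation):
    """Return output URLs for a completed generation, or [] while it is pending."""
    state = generation["state"]
    if state == "failed":
        raise GenerationFailed(
            generation.get("failure_code"), generation.get("failure_reason")
        )
    if state != "completed":
        return []  # still queued or processing
    # output is optional, so treat a missing or null value as empty.
    return [item["url"] for item in generation.get("output") or []]
```

Because the presigned URLs expire, it is safer to download or copy the outputs soon after completion than to store the URLs for later use.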