Multimodal Data¶
Multimodal is the normalized input/output type used by Draive generation APIs.
Instead of handling separate text/image/audio types at each boundary, you can pass one multimodal value and let Draive normalize it.
It accepts plain text and typed multimodal parts such as:
- TextContent
- ResourceContent / ResourceReference
- ArtifactContent
- MultimodalTag
- MultimodalContent (already-composed payload)
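To make the idea of normalization concrete, here is a minimal standard-library sketch of how mixed inputs could be flattened into one immutable container. It is not Draive's implementation: `TextPart`, `ResourcePart`, `Content`, and `normalize` are hypothetical stand-ins invented for this illustration.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins, NOT Draive types.
@dataclass(frozen=True)
class TextPart:
    text: str


@dataclass(frozen=True)
class ResourcePart:
    data: bytes
    mime_type: str


Part = Union[TextPart, ResourcePart]


@dataclass(frozen=True)
class Content:
    parts: tuple


def normalize(*elements: Union[str, TextPart, ResourcePart, Content]) -> Content:
    """Flatten plain strings, typed parts, and nested containers into one container."""
    parts: list = []
    for element in elements:
        if isinstance(element, str):
            # Plain text is wrapped in a typed text part.
            parts.append(TextPart(element))
        elif isinstance(element, Content):
            # An already-composed payload contributes its parts in order.
            parts.extend(element.parts)
        else:
            parts.append(element)
    return Content(tuple(parts))


merged = normalize("Describe the image", ResourcePart(b"\xff\xd8", "image/jpeg"))
nested = normalize(merged, "Be concise")
```

Because the container accepts itself as an input, callers can compose payloads incrementally without checking what they already hold.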
MultimodalContent¶
MultimodalContent is the main immutable container for multimodal parts.
from draive.multimodal import MultimodalContent, TextContent
from draive.resources import ResourceContent

# image_bytes is assumed to hold raw JPEG data loaded elsewhere
content = MultimodalContent.of(
    TextContent.of("Describe the image"),
    ResourceContent.of(image_bytes, mime_type="image/jpeg"),
)
Useful helpers include:
- texts(), images(), audio(), resources() for part extraction
- artifacts(...), tags(...) for structured component retrieval
- matching_meta(...), split_by_meta(...) for metadata-aware filtering
- without_resources(), without_artifacts() for creating reduced variants
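The following standard-library sketch shows the kind of behavior such helpers suggest: extraction by part kind, filtering by metadata, and building a reduced copy. The `Part` and `Content` classes are hypothetical stand-ins. The method names mirror the list above, but their signatures and semantics here are assumptions, so check the Draive API before relying on them.

```python
from dataclasses import dataclass, field

# Assumed set of part kinds treated as resources in this sketch.
RESOURCE_KINDS = {"image", "audio"}


@dataclass(frozen=True)
class Part:
    kind: str  # "text", "image", "audio", ...
    payload: object
    meta: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Content:
    parts: tuple = ()

    def texts(self) -> tuple:
        # Part extraction: keep only text parts.
        return tuple(p for p in self.parts if p.kind == "text")

    def matching_meta(self, **values) -> "Content":
        # Metadata-aware filtering: every given key must match.
        return Content(tuple(
            p for p in self.parts
            if all(p.meta.get(key) == value for key, value in values.items())
        ))

    def without_resources(self) -> "Content":
        # Reduced variant: a new container, the original is unchanged.
        return Content(tuple(p for p in self.parts if p.kind not in RESOURCE_KINDS))


content = Content((
    Part("text", "Describe the image", {"source": "user"}),
    Part("image", b"\xff\xd8\xff", {"source": "user"}),
    Part("text", "Use a neutral tone", {"source": "system"}),
))

user_only = content.matching_meta(source="user")
text_only = content.without_resources()
```

Each helper returns a new value instead of mutating the container, which is what makes an immutable container safe to share between pipeline stages.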
Artifacts With Typed State¶
Use artifacts to move typed payloads through multimodal flows.
from draive import State
from draive.multimodal import ArtifactContent, MultimodalContent

class User(State, serializable=True):
    first_name: str
    last_name: str

payload = MultimodalContent.of(
    "User profile",
    ArtifactContent.of(User(first_name="James", last_name="Smith"), category="profile"),
)
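On the receiving side, a typed payload can be recovered by checking both the wrapper's category and the wrapped value's type. The sketch below illustrates that idea with plain dataclasses. `Artifact` and `find_artifacts` are hypothetical names invented here, not Draive's `ArtifactContent` or its `artifacts(...)` helper, whose exact signature is not shown in this page.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class User:
    first_name: str
    last_name: str


# Hypothetical wrapper pairing a typed value with a category label.
@dataclass(frozen=True)
class Artifact:
    value: Any
    category: str


def find_artifacts(parts, of_type, category: Optional[str] = None) -> list:
    """Return wrapped values of the requested type, optionally limited to one category."""
    return [
        part.value
        for part in parts
        if isinstance(part, Artifact)
        and isinstance(part.value, of_type)
        and (category is None or part.category == category)
    ]


parts = ["User profile", Artifact(User("James", "Smith"), category="profile")]
users = find_artifacts(parts, User, category="profile")
```

The text part is skipped and the `User` instance comes back with its type intact, so downstream code does not need to parse it again.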
Model Input/Output Context¶
ModelInput and ModelOutput store MultimodalContent blocks, so the same content model is used for:
- direct user input,
- model responses,
- tool request/response payloads.
This keeps transformations, filtering, and observability logic consistent across pipeline stages.
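As a rough illustration of that consistency, the sketch below applies one logging helper to blocks from three different stages. `Block` and `summarize` are hypothetical stand-ins written for this example. They do not reflect the actual structure of `ModelInput` or `ModelOutput`.

```python
from dataclasses import dataclass


# Hypothetical block: one content shape regardless of pipeline stage.
@dataclass(frozen=True)
class Block:
    stage: str    # "input", "output", or "tool"
    parts: tuple  # str for text, bytes for binary resources


def summarize(block: Block) -> str:
    """Build a log line without branching on the stage."""
    text_chars = sum(len(p) for p in block.parts if isinstance(p, str))
    binary_bytes = sum(len(p) for p in block.parts if isinstance(p, bytes))
    return (
        f"{block.stage}: {len(block.parts)} parts, "
        f"{text_chars} text chars, {binary_bytes} binary bytes"
    )


blocks = [
    Block("input", ("Describe the image", b"\xff\xd8\xff")),
    Block("output", ("A red bicycle.",)),
    Block("tool", ('{"query": "bicycle"}',)),
]
summaries = [summarize(block) for block in blocks]
```

Since every stage carries the same content shape, `summarize` needs no stage-specific code paths. The same holds for redaction or filtering helpers.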