Multimodal Content¶
MultimodalContent is the core immutable container for text, resources, artifacts, and tags.
It is used across generation input/output, tool payloads, and model context blocks.
Quick Start¶
from draive.multimodal import MultimodalContent, TextContent
from draive.resources import ResourceContent
report = MultimodalContent.of(
"Quarterly report:",
ResourceContent.of(image_bytes, mime_type="image/png"),
TextContent.of("Highlights: revenue up 18%"),
)
MultimodalContent.of(...) normalizes nested content and keeps ordering deterministic.
Part Types¶
TextContentfor plain text payloadsResourceContentfor inlined bytes (image/audio/video/documents)ResourceReferencefor URI-based external resourcesArtifactContentfor typed structured payloadsMultimodalTagfor lightweight XML-style tagging
Typed Artifacts¶
Artifacts are ideal for internal typed payload exchange between workflow stages.
from collections.abc import Sequence
from draive import State
from draive.multimodal import ArtifactContent, MultimodalContent
class ProductSummary(State, serializable=True):
name: str
price: float
highlights: Sequence[str]
content = MultimodalContent.of(
"Product spotlight",
ArtifactContent.of(
ProductSummary(name="Aurora Lamp", price=89.0, highlights=("Warm light", "USB-C")),
category="product",
),
)
Common Helpers¶
# Metadata-based extraction and grouping.
user_parts = content.matching_meta(source="user")
section_groups = content.split_by_meta(key="section")
# Media and artifact extraction.
images = content.images()
resources = content.resources(mime_type="application/pdf")
artifacts = content.artifacts(model=ProductSummary, category="product")
# Build filtered variant for downstream processing.
clean = content.without_resources().without_artifacts()
Tags¶
Use tags to wrap semantic regions while keeping them in multimodal flow.
from draive.multimodal import MultimodalTag
title = MultimodalTag.of("Q1 Sales Report", name="title", meta={"lang": "en"})
document = MultimodalContent.of(title, "Summary: sales increased by 15%")
first_title = document.tag("title")
all_titles = document.tags("title")
Using With Generation¶
from draive import TextGeneration, ctx
from draive.openai import OpenAI, OpenAIResponsesConfig
async def analyze_image(image_bytes: bytes) -> str:
async with ctx.scope(
"image_analysis",
OpenAIResponsesConfig(model="gpt-5"),
disposables=(OpenAI(),),
):
prompt = MultimodalContent.of(
"Describe this image.",
ResourceContent.of(image_bytes, mime_type="image/jpeg"),
)
return await TextGeneration.generate(
instructions="Provide a concise analysis.",
input=prompt,
)
The same content helpers can be applied to outputs, which makes post-processing and auditing consistent.