# Multimodal Content

`MultimodalContent` is the core container for mixing text, media, and structured data across Draive pipelines. It keeps parts immutable, preserves ordering, and exposes rich helpers to query and transform what you feed into or receive from models.
## Quick Start

```python
from draive import MultimodalContent, TextContent, ResourceContent

# image_bytes: raw image data loaded elsewhere
report = MultimodalContent.of(
    "Quarterly report:",
    ResourceContent.of(image_bytes, mime_type="image/png"),
    TextContent.of("Highlights: revenue up 18%"),
)
```
- Input accepts plain strings, `TextContent`, `ResourceContent`, `ResourceReference`, `ArtifactContent`, other `MultimodalContent`, and `MultimodalTag` instances.
- Construction is normalized: adjacent text fragments are merged and existing `MultimodalContent` instances are reused (see the sketch after this list).
- The resulting object is immutable; every mutating-style call returns a new instance.
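To see these rules in action, here is a minimal sketch; the literals are illustrative, and the asserted strings assume `to_str()` simply concatenates the merged text parts:

```python
from draive import MultimodalContent

# Adjacent plain-text fragments are merged into a single text part
# during construction, per the normalization rule above.
greeting = MultimodalContent.of("Hello, ", "world")
assert greeting.to_str() == "Hello, world"

# Mutating-style calls return new instances; the original is untouched.
extended = greeting.appending("!")
assert greeting.to_str() == "Hello, world"
assert extended.to_str() == "Hello, world!"
```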
## Building Blocks

| Part type | When to use | Key helpers |
|---|---|---|
| `TextContent` | Text blocks with optional metadata (e.g. author, language) | `.of(...)`, `.meta`, `.to_str()` |
| `ResourceContent` | Inline binary data such as images or audio | `.of(bytes, mime_type=...)`, `.to_data_uri()` |
| `ResourceReference` | Reference external resources by URI when you cannot embed bytes | `.of(url, mime_type=...)` |
| `ArtifactContent` | Wrap strongly typed `DataModel` objects for passing structured payloads | `.of(artifact, category=...)` |
| `MultimodalTag` | Lightweight XML-like tags for templating or post-processing | `.of(content, name=..., meta=...)` |
All parts share the same metadata model, so you can attach consistent descriptors like `{"section": "summary"}` regardless of part type.
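For instance, the same descriptor can be attached to a text part and a tag alike; a short sketch using only the `.of(..., meta=...)` helpers listed in the table (the key and values are illustrative):

```python
from draive import MultimodalTag, TextContent

# The same descriptor works on any part type.
summary_text = TextContent.of(
    "Revenue grew 18% quarter over quarter.",
    meta={"section": "summary"},
)
summary_tag = MultimodalTag.of(
    summary_text,
    name="summary",
    meta={"section": "summary"},
)
```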
## Creating Content

```python
from collections.abc import Sequence

from draive import DataModel, MultimodalContent, ArtifactContent, ResourceReference, TextContent


class ProductSummary(DataModel):
    name: str
    price: float
    highlights: Sequence[str]


thumbnail = ResourceReference.of(
    "https://cdn.example.com/items/1234/thumbnail.jpg",
    mime_type="image/jpeg",
)

content = MultimodalContent.of(
    TextContent.of("Product spotlight", meta={"section": "title"}),
    thumbnail,
    ArtifactContent.of(
        ProductSummary(
            name="Aurora Lamp",
            price=89.0,
            highlights=("Warm light", "USB-C power"),
        ),
        category="product",
    ),
)
```
## Metadata & Organization

Use metadata to organize parts and later filter or group them.

```python
# `content` is the MultimodalContent built in the previous section.

# Filter by exact metadata match
user_parts = content.matching_meta(source="user", section="summary")

# Group whenever a meta key changes value
section_groups = content.split_by_meta(key="section")
```
Common patterns:

- Tag user-generated vs. system-generated text with `{"source": "user"}` and `{"source": "assistant"}` (see the sketch after this list).
- Store structured routing hints like `{"stage": "retrieval"}` to drive downstream stages.
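Putting these patterns together: a minimal sketch of provenance tagging followed by filtering, assuming `matching_meta` returns a filtered `MultimodalContent` as in the snippet above (the conversation text is illustrative):

```python
from draive import MultimodalContent, TextContent

conversation = MultimodalContent.of(
    TextContent.of("What changed this quarter?", meta={"source": "user"}),
    TextContent.of("Revenue grew 18%.", meta={"source": "assistant"}),
)

# Keep only the user-authored parts for downstream routing.
user_turns = conversation.matching_meta(source="user")
print(user_turns.to_str())
```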
## Working with Resources and Artifacts

```python
# Resource helpers
if content.contains_resources:
    images = content.images()
    audio_tracks = content.audio()
    video_clips = content.video()
    resources = content.resources(mime_type="application/pdf")
    content_without_media = content.without_resources()

# Artifact helpers
artifact_profiles = content.artifacts(model=ProductSummary)
content_without_artifacts = content.without_artifacts()
```

`resources(mime_type=...)` narrows by MIME type (exact match). `artifacts(model=..., category=...)` lets you filter by wrapped `DataModel` type and logical category.
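Combining both filters narrows the selection to a single artifact kind; a brief sketch reusing the `ProductSummary` artifact built earlier (the `"product"` category matches the one passed to `ArtifactContent.of` above):

```python
# Only ProductSummary artifacts tagged with the "product" category.
product_artifacts = content.artifacts(
    model=ProductSummary,
    category="product",
)
```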
## Tagging & Lightweight Markup

Wrap content in `MultimodalTag` when you need template-like markers that survive round-trips through generation.

```python
from draive import MultimodalContent, MultimodalTag, TextContent

title = MultimodalTag.of(
    TextContent.of("Q1 Sales Report"),
    name="title",
    meta={"lang": "en"},
)

document = MultimodalContent.of(
    title,
    "Summary: sales increased by 15%",
)

first_title = document.tag("title")
all_titles = document.tags("title")

updated = document.replacing_tag(
    "title",
    MultimodalContent.of("Updated Title"),
    strip_tags=True,
)
```
`tag(name)` returns the first matching tag; `tags(name)` returns all of them. `replacing_tag(...)` swaps one or all occurrences; pass `strip_tags=True` to unwrap the original tag markers.
## Transformations & Utilities

```python
# Append new pieces
extended = content.appending("Additional notes", thumbnail)

# Combine multiple multimodal payloads
final = MultimodalContent.of(content, extended)

# Render a text-only view (resources become placeholders)
text_view = content.to_str()
```

Because every method returns a new instance, you can chain calls to build complex documents while keeping the original inputs intact.
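For example, a minimal chaining sketch (the appended note is illustrative):

```python
# Derive a text-only digest in one chain; `content` itself is unchanged,
# so both the source and the derived view remain available for auditing.
digest = (
    content
    .appending("Reviewed by the analytics team.")
    .without_resources()
    .to_str()
)
```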
## Using in Generation Pipelines

```python
import asyncio

from draive import MultimodalContent, ResourceContent, TextGeneration, ctx
from draive.openai import OpenAI, OpenAIResponsesConfig


async def analyze_image(image_bytes: bytes) -> str:
    async with ctx.scope(
        "image_analysis",
        OpenAIResponsesConfig(model="gpt-4o"),
        disposables=(OpenAI(),),
    ):
        prompt = MultimodalContent.of(
            "Describe this image",
            ResourceContent.of(image_bytes, mime_type="image/jpeg"),
        )
        response = await TextGeneration.generate(
            instructions="Provide a concise analysis",
            input=prompt,
        )
        return response.to_str()


asyncio.run(analyze_image(b"..."))
```

This example scopes provider configuration, builds a mixed prompt, and feeds it directly into a generation facade. The response is also a `MultimodalContent`, so you can reuse the same helpers when handling model outputs.
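For instance, if your instructions ask the model to wrap its answer in an `<answer>` marker, the tag helpers from earlier apply directly to the output; a hedged sketch, assuming the model actually emits the requested tag:

```python
# Inside analyze_image, before returning: `response` is MultimodalContent,
# so the same query helpers apply to model output.
answer = response.tag("answer")  # first <answer> tag, if the model emitted one
plain = response.to_str()        # text-only fallback view
```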
## Best Practices

- Keep metadata small and serializable; prefer strings, numbers, and tuples of primitives.
- Embed only assets that must travel with the request. Use `ResourceReference` for large, cacheable files.
- Normalize user content early to attach provenance metadata and validate MIME types.
- When chaining transformations, retain the original content for auditing by storing both the source and derived instances.
- Use `ArtifactContent` to bridge typed internal data (e.g. summaries, retrieval chunks) rather than serializing to JSON manually.
With these patterns you can confidently build, inspect, and transform rich multimodal payloads while preserving structure for downstream Draive stages.