Multimodal Content

MultimodalContent is the core immutable container for text, resources, artifacts, and tags.

It is used across generation input/output, tool payloads, and model context blocks.

Quick Start

from draive.multimodal import MultimodalContent, TextContent
from draive.resources import ResourceContent

# image_bytes is assumed to hold raw PNG data loaded elsewhere (for example, read from a file).
report = MultimodalContent.of(
    "Quarterly report:",
    ResourceContent.of(image_bytes, mime_type="image/png"),
    TextContent.of("Highlights: revenue up 18%"),
)

MultimodalContent.of(...) normalizes nested content and keeps ordering deterministic.
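
To make "normalizes nested content" concrete, here is a minimal plain-Python sketch of the idea. It is not draive's implementation: `Text`, `Blob`, and `flatten` are hypothetical stand-ins. Bare strings are promoted to text parts, nested sequences are flattened in place, and the result is an immutable tuple whose order follows the argument order. Anything further a real implementation might do, such as merging adjacent text parts, is left out.

```python
from __future__ import annotations

from collections.abc import Iterable
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Text:  # hypothetical stand-in for a text part
    value: str


@dataclass(frozen=True)
class Blob:  # hypothetical stand-in for an inlined resource
    data: bytes
    mime_type: str


Part = Union[Text, Blob]


def flatten(*elements: object) -> tuple[Part, ...]:
    """Flatten arbitrarily nested elements into an ordered, immutable tuple of parts."""
    parts: list[Part] = []
    for element in elements:
        if isinstance(element, str):
            parts.append(Text(element))  # bare strings become text parts
        elif isinstance(element, (Text, Blob)):
            parts.append(element)  # existing parts are kept as-is
        elif isinstance(element, Iterable):
            parts.extend(flatten(*element))  # recurse into nesting, preserving order
        else:
            raise TypeError(f"unsupported element: {type(element).__name__}")
    return tuple(parts)


nested = flatten("Intro", [Blob(b"\x89PNG", "image/png"), ["Details"]], Text("Outro"))
```

Because the output depends only on the order of the inputs, the same arguments always produce the same sequence of parts.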

Part Types

  • TextContent for plain text payloads
  • ResourceContent for inlined bytes (image/audio/video/documents)
  • ResourceReference for URI-based external resources
  • ArtifactContent for typed structured payloads
  • MultimodalTag for lightweight XML-style tagging

Typed Artifacts

Use artifacts to pass typed, structured payloads between workflow stages within your application.

from collections.abc import Sequence

from draive import State
from draive.multimodal import ArtifactContent, MultimodalContent


class ProductSummary(State, serializable=True):
    name: str
    price: float
    highlights: Sequence[str]


content = MultimodalContent.of(
    "Product spotlight",
    ArtifactContent.of(
        ProductSummary(name="Aurora Lamp", price=89.0, highlights=("Warm light", "USB-C")),
        category="product",
    ),
)

Common Helpers

# Metadata-based extraction and grouping.
user_parts = content.matching_meta(source="user")
section_groups = content.split_by_meta(key="section")

# Media and artifact extraction.
images = content.images()
resources = content.resources(mime_type="application/pdf")
artifacts = content.artifacts(model=ProductSummary, category="product")

# Build a filtered variant for downstream processing.
clean = content.without_resources().without_artifacts()
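
The metadata helpers can be pictured with a small plain-Python model. This is a sketch of assumed semantics, not draive's code: `Part`, `matching_meta`, and `split_by_meta` below are hypothetical free functions, and grouping consecutive parts that share a value is an assumption about how splitting behaves.

```python
from dataclasses import dataclass, field
from itertools import groupby
from typing import Any


@dataclass(frozen=True)
class Part:  # hypothetical stand-in for a content part carrying metadata
    text: str
    meta: dict[str, Any] = field(default_factory=dict)


def matching_meta(parts: tuple[Part, ...], **values: Any) -> tuple[Part, ...]:
    """Keep parts whose metadata contains every given key/value pair."""
    return tuple(
        part
        for part in parts
        if all(part.meta.get(key) == value for key, value in values.items())
    )


def split_by_meta(parts: tuple[Part, ...], key: str) -> list[tuple[Part, ...]]:
    """Group consecutive parts sharing the same value under `key` (assumed behavior)."""
    return [
        tuple(group)
        for _, group in groupby(parts, key=lambda part: part.meta.get(key))
    ]


parts = (
    Part("Hi", {"source": "user", "section": "intro"}),
    Part("Hello!", {"source": "model", "section": "intro"}),
    Part("Budget?", {"source": "user", "section": "body"}),
)

user_parts = matching_meta(parts, source="user")
section_groups = split_by_meta(parts, key="section")
```

In this model, filtering never reorders parts, so the filtered result can be passed downstream without losing the original sequence.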

Tags

Use tags to mark semantic regions while keeping them inline with the rest of the multimodal content.

from draive.multimodal import MultimodalContent, MultimodalTag


title = MultimodalTag.of("Q1 Sales Report", name="title", meta={"lang": "en"})
document = MultimodalContent.of(title, "Summary: sales increased by 15%")

first_title = document.tag("title")
all_titles = document.tags("title")
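
The variable names above suggest that `tag(...)` returns the first match and `tags(...)` returns every match. The following plain-Python sketch models that lookup pattern with a hypothetical `Tag` dataclass and free functions rather than draive's classes. Returning `None` when nothing matches is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Tag:  # hypothetical stand-in for a named, tagged region
    name: str
    content: str


Element = Union[str, Tag]


def tags(elements: tuple[Element, ...], name: str) -> tuple[Tag, ...]:
    """Return every tag with the given name, in document order."""
    return tuple(e for e in elements if isinstance(e, Tag) and e.name == name)


def tag(elements: tuple[Element, ...], name: str) -> Optional[Tag]:
    """Return the first tag with the given name, or None (assumed) if absent."""
    return next(iter(tags(elements, name)), None)


document = (
    Tag("title", "Q1 Sales Report"),
    "Summary: sales increased by 15%",
    Tag("title", "Appendix"),
)

first_title = tag(document, "title")
all_titles = tags(document, "title")
```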

Using With Generation

from draive import TextGeneration, ctx
from draive.multimodal import MultimodalContent
from draive.openai import OpenAI, OpenAIResponsesConfig
from draive.resources import ResourceContent


async def analyze_image(image_bytes: bytes) -> str:
    async with ctx.scope(
        "image_analysis",
        OpenAIResponsesConfig(model="gpt-5"),
        disposables=(OpenAI(),),
    ):
        prompt = MultimodalContent.of(
            "Describe this image.",
            ResourceContent.of(image_bytes, mime_type="image/jpeg"),
        )
        return await TextGeneration.generate(
            instructions="Provide a concise analysis.",
            input=prompt,
        )

The same content helpers can be applied to outputs, which makes post-processing and auditing consistent.