Multimodal data¶
One of the most important skills when using Draive is understanding how to work with multimodal data and the model context.
Any data can be converted into the Multimodal alias: text, audio, images, or artifacts (any
DataModel). This multimodal representation is the input and output type for most
Draive functions.
Multimodal data makes up the model context (user inputs and model outputs), so understanding it is crucial.
Multimodal¶
The Multimodal alias is a union of MultimodalContent, MultimodalTag, and other content types. In many
places, input parameters are typed as Multimodal so that any of these forms is accepted and transformed
automatically.
!!! Note
    Constructors and helpers such as `MultimodalContent.of(*elements: "Multimodal")` use the `Multimodal` alias to normalize any mix of multimodal parts into one consistent `MultimodalContent`.
Multimodal Content¶
MultimodalContent is the central container bundling the different MultimodalContentPart blocks.
Each part carries a concrete payload: plain text (TextContent), references or embedded resources
(ResourceReference, ResourceContent), artifacts (ArtifactContent), and tags (MultimodalTag).
As a result, any span of content can be examined as a set of selected parts or as an entire tag tree.
MultimodalContent serves as the input or output in multiple places in the Draive framework. It
can be used for filtering, splitting, replacing, and other operations.
A few examples of MultimodalContent usage in Draive:
- The `@tool` decorator decorates functions that return `MultimodalContent` (see Basic tools use)
- The `ModelInput` and `ModelOutput` classes use `MultimodalContent`
- `TextGeneration.generate(...)` accepts `MultimodalContent` as input (see Basic usage)
MultimodalContent is an easy-to-use and intelligent class that does a lot for you under the
hood:
- It avoids extra nesting:

```python
inner_multimodal = MultimodalContent.of("Hello world!")
print(inner_multimodal)
# {'type': 'content', 'parts': [{'text': 'Hello world!', 'meta': {}}]}

outer_multimodal = MultimodalContent.of(inner_multimodal)
print(outer_multimodal)
# {'type': 'content', 'parts': [{'text': 'Hello world!', 'meta': {}}]}
# (same as the first one despite the nesting)
```
- It merges multiple parts if the types match:

```python
class User(DataModel):
    first_name: str
    last_name: str


content = MultimodalContent.of(
    MultimodalTag.of(
        MultimodalTag.of(
            "Hello",
            name="inner",
        ),
        ArtifactContent.of(
            User(
                first_name="James",
                last_name="Smith",
            )
        ),
        name="outer",
    )
)
print(content)
# {
#     'type': 'content',
#     'parts': [
#         {
#             'text': '<outer><inner>Hello</inner>',
#             'meta': {}
#         },
#         {
#             'category': 'User',
#             'artifact': {
#                 'first_name': 'James',
#                 'last_name': 'Smith'
#             },
#             'hidden': False,
#             'meta': {}
#         },
#         {
#             'text': '</outer>',
#             'meta': {}
#         }
#     ]
# }
```
- It comes with a set of helper functions to speed up your work. Examples:

```python
print(content.texts())
# (
#     {'text': '<outer><inner>Hello</inner>', 'meta': {}},
#     {'text': '</outer>', 'meta': {}}
# )

print(content.tags())
# (
#     {'name': 'outer', 'content': {...}, 'meta': {}},
#     {'name': 'inner', 'content': {...}, 'meta': {}}
# )

print(content.artifacts())
# (
#     {
#         'category': 'User',
#         'artifact': {
#             'first_name': 'James',
#             'last_name': 'Smith'
#         },
#         'hidden': False,
#         'meta': {}
#     },
# )
```
!!! Tip
    `MultimodalContent` has more ready-to-use methods for filtering, such as `matching_meta()`, `split_by_meta()`, `without_resources()`, or `audio()`. This is another reason to prefer `MultimodalContent` over other data models.
Model Input¶
ModelInput groups the user-provided blocks (ModelInputBlock). Each block is backed by
MultimodalContent, so it preserves every part type described above. The input stream may also
embed tool responses (ModelToolResponse), which return their payload as MultimodalContent,
letting you treat tool output the same way as regular text-and-media blocks.
!!! Note
    `ModelToolResponse` has a `content` attribute of type `MultimodalContent`, which is why `@tool`-decorated functions must return `MultimodalContent`.
Model Output¶
ModelOutput mirrors the same structure on the response side. Output blocks (ModelOutputBlock)
expose MultimodalContent, and both the model's reasoning trail (ModelReasoning) and its tool
requests (ModelToolRequest) rely on the same container. This means the full generation flow - from
visible content to internal thinking and tool invocations - can be analysed with one coherent set of
helpers.
!!! Tip