# Prompt Construction

A multimodal prompt system that combines text, images, and raw token sequences in a single prompt. Attention controls let you raise or lower the model's attention on specific character ranges, image regions, or token positions.

## Capabilities

### Core Prompt Container

Main container class for organizing multimodal prompt content: text, images, and token sequences.

```python { .api }
class Prompt:
    def __init__(self, items: Union[str, Sequence[PromptItem]]):
        """
        Create prompt from items or single string.

        Parameters:
        - items: Single string or sequence of PromptItem objects (Text, Image, Tokens)
        """

    @staticmethod
    def from_text(text: str, controls: Optional[Sequence[TextControl]] = None) -> Prompt:
        """Create prompt from plain text with optional attention controls."""

    @staticmethod
    def from_image(image: Image) -> Prompt:
        """Create prompt from single image."""

    @staticmethod
    def from_tokens(tokens: Sequence[int], controls: Optional[Sequence[TokenControl]] = None) -> Prompt:
        """Create prompt from token sequence with optional attention controls."""

    @staticmethod
    def from_json(items_json: Sequence[Mapping[str, Any]]) -> Prompt:
        """Create prompt from JSON representation."""

    def to_json(self) -> Sequence[Mapping[str, Any]]:
        """Serialize prompt to JSON format."""
```

### Text Prompt Items

Text content with optional attention controls that adjust how strongly the model attends to parts of the text.

```python { .api }
class Text:
    text: str
    controls: Sequence[TextControl]
    """
    Text prompt item with attention controls.

    Attributes:
    - text: The text content
    - controls: Sequence of TextControl objects for attention manipulation
    """

    @staticmethod
    def from_text(text: str) -> Text:
        """Create Text item from plain string."""

    @staticmethod
    def from_json(json: Mapping[str, Any]) -> Text:
        """Create Text item from JSON representation."""

    def to_json(self) -> Mapping[str, Any]:
        """Serialize to JSON format."""
```

### Image Prompt Items

Image content with support for cropping, attention controls, and multiple input sources (file path, URL, or raw bytes).

```python { .api }
class Image:
    base_64: str
    cropping: Optional[Cropping]
    controls: Sequence[ImageControl]
    """
    Image prompt item with cropping and attention controls.

    Attributes:
    - base_64: Base64 encoded image data
    - cropping: Optional image cropping specification
    - controls: Sequence of ImageControl objects for attention manipulation
    """

    @classmethod
    def from_image_source(
        cls,
        image_source: Union[str, Path, bytes],
        controls: Optional[Sequence[ImageControl]] = None
    ) -> Image:
        """Create from various image sources (file path, URL, or bytes)."""

    @classmethod
    def from_bytes(
        cls,
        bytes: bytes,
        cropping: Optional[Cropping] = None,
        controls: Optional[Sequence[ImageControl]] = None
    ) -> Image:
        """Create from raw image bytes."""

    @classmethod
    def from_url(
        cls,
        url: str,
        controls: Optional[Sequence[ImageControl]] = None
    ) -> Image:
        """Create from image URL."""

    @classmethod
    def from_file(
        cls,
        path: Union[str, Path],
        controls: Optional[Sequence[ImageControl]] = None
    ) -> Image:
        """Create from local file path."""

    @classmethod
    def from_file_with_cropping(
        cls,
        path: str,
        upper_left_x: int,
        upper_left_y: int,
        crop_size: int,
        controls: Optional[Sequence[ImageControl]] = None
    ) -> Image:
        """Create from file with square cropping."""

    @classmethod
    def from_json(cls, json: Mapping[str, Any]) -> Image:
        """Create from JSON representation."""

    def to_json(self) -> Mapping[str, Any]:
        """Serialize to JSON format."""

    def to_image(self) -> PILImage:
        """Convert to PIL Image object."""

    def dimensions(self) -> Tuple[int, int]:
        """Get image dimensions as (width, height)."""
```
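
The `base_64` attribute holds the image as Base64 text. As a rough, client-independent illustration of that encoding step, the sketch below uses only the standard library. The helper name `encode_image_bytes` is hypothetical and not part of the client, and the sketch makes no claim about how `Image.from_bytes` is implemented internally.

```python
import base64


def encode_image_bytes(data: bytes) -> str:
    """Return the Base64 text form of raw image bytes (hypothetical helper)."""
    return base64.b64encode(data).decode("ascii")


# The eight-byte PNG file signature, used here as stand-in image data.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = encode_image_bytes(png_signature)

# Decoding the text restores the original bytes exactly.
assert base64.b64decode(encoded) == png_signature
```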

### Token Prompt Items

Direct token sequence input with attention controls for low-level prompt manipulation.

```python { .api }
class Tokens:
    tokens: Sequence[int]
    controls: Sequence[TokenControl]
    """
    Token sequence prompt item with attention controls.

    Attributes:
    - tokens: Sequence of token IDs
    - controls: Sequence of TokenControl objects for attention manipulation
    """

    @staticmethod
    def from_token_ids(token_ids: Sequence[int]) -> Tokens:
        """Create from sequence of token IDs."""

    @staticmethod
    def from_json(json: Mapping[str, Any]) -> Tokens:
        """Create from JSON representation."""

    def to_json(self) -> Mapping[str, Any]:
        """Serialize to JSON format."""
```

### Attention Control Systems

#### Text Control

Fine-grained attention manipulation for text content based on character positions.

```python { .api }
class TextControl:
    start: int
    length: int
    factor: float
    token_overlap: Optional[ControlTokenOverlap] = None
    """
    Attention control for text prompts.

    Attributes:
    - start: Starting character index
    - length: Number of characters to affect
    - factor: Attention adjustment factor (>1 increases, <1 decreases)
    - token_overlap: How to handle partial token overlap
    """
```
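
Counting character offsets by hand is error-prone, so it can help to derive `start` and `length` from the phrase itself. The sketch below does this with plain string methods. `control_span` is a hypothetical helper, not part of the client, and it assumes that `start` is a zero-based character index, as Python string indices are.

```python
def control_span(text: str, phrase: str) -> dict:
    """Return `start` and `length` for the first occurrence of `phrase`.

    Hypothetical helper, not part of the client. Raises ValueError when
    the phrase does not occur in the text.
    """
    start = text.index(phrase)  # str.index raises ValueError if not found
    return {"start": start, "length": len(phrase)}


text = "Please focus on this important part of the text."
span = control_span(text, "important")
print(span)  # {'start': 21, 'length': 9}
```

The resulting dictionary can be unpacked into the constructor, for example `TextControl(factor=2.0, **span)`.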

#### Image Control

Spatial attention control for image regions using normalized coordinates.

```python { .api }
class ImageControl:
    left: float
    top: float
    width: float
    height: float
    factor: float
    token_overlap: Optional[ControlTokenOverlap] = None
    """
    Spatial attention control for images.

    Attributes:
    - left: Left coordinate (0-1, normalized)
    - top: Top coordinate (0-1, normalized)
    - width: Width (0-1, normalized)
    - height: Height (0-1, normalized)
    - factor: Attention adjustment factor
    - token_overlap: How to handle partial token overlap
    """
```
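
Regions are often known in pixels, while `ImageControl` expects values between 0 and 1. A minimal sketch of the conversion follows. `normalized_region` is a hypothetical helper, not part of the client, and it assumes the coordinates are fractions of the full image width and height. Whether they are measured before or after cropping is not stated here.

```python
def normalized_region(px_left, px_top, px_width, px_height, image_width, image_height):
    """Convert a pixel rectangle to 0-1 coordinates (hypothetical helper)."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    return {
        "left": px_left / image_width,
        "top": px_top / image_height,
        "width": px_width / image_width,
        "height": px_height / image_height,
    }


# A 320x240 pixel box whose top-left corner is at (160, 120) in a 640x480 image.
region = normalized_region(160, 120, 320, 240, image_width=640, image_height=480)
print(region)  # {'left': 0.25, 'top': 0.25, 'width': 0.5, 'height': 0.5}
```

The dictionary keys match the `ImageControl` attribute names, so it can be passed as `ImageControl(factor=1.5, **region)`.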

#### Token Control

Direct attention manipulation for specific token positions.

```python { .api }
class TokenControl:
    pos: int
    factor: float
    """
    Direct attention control for tokens.

    Attributes:
    - pos: Token position index
    - factor: Attention adjustment factor
    """
```
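
Because `pos` refers to a position in the token sequence, a control that points past the end of the sequence is likely a mistake. The sketch below checks for that before a prompt is built. It models each control as a plain `(pos, factor)` tuple so it runs without the client installed, and it assumes positions are zero-based. `out_of_range_positions` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def out_of_range_positions(
    tokens: Sequence[int], controls: Sequence[Tuple[int, float]]
) -> List[int]:
    """Return control positions that do not index into `tokens` (hypothetical helper)."""
    return [pos for pos, _factor in controls if not 0 <= pos < len(tokens)]


tokens = [1, 2, 3, 4, 5]
controls = [(2, 3.0), (7, 0.5)]  # (pos, factor) pairs; position 7 is past the end
print(out_of_range_positions(tokens, controls))  # [7]
```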

#### Control Token Overlap

Enumeration controlling how an attention factor is applied when a control covers only part of a token.

```python { .api }
class ControlTokenOverlap(Enum):
    Partial = "partial"    # Proportional factor adjustment
    Complete = "complete"  # Full factor application
```
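
To make the difference between the two modes concrete, here is an illustrative model of the factor a token might end up with. The exact formula used by the service is not given in this document, so the linear interpolation for `"partial"` is an assumption, and `effective_factor` is a hypothetical function rather than part of the client.

```python
def effective_factor(factor: float, overlap_fraction: float, mode: str) -> float:
    """Illustrative model of the two overlap modes (assumed, not the real formula).

    overlap_fraction is the share of the token covered by the control (0.0 to 1.0).
    """
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be between 0 and 1")
    if overlap_fraction == 0.0:
        return 1.0  # the control does not touch the token: attention unchanged
    if mode == "complete":
        return factor  # any overlap applies the full factor
    if mode == "partial":
        # Assumption: scale linearly between "no change" (1.0) and the full factor.
        return 1.0 + (factor - 1.0) * overlap_fraction
    raise ValueError(f"unknown mode: {mode!r}")


print(effective_factor(2.0, 0.5, "partial"))   # 1.5
print(effective_factor(2.0, 0.5, "complete"))  # 2.0
```

Under this model, `Complete` is the simpler choice when a control should affect every token it touches, while `Partial` softens the effect at the edges of the controlled range.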

### Image Cropping

Square cropping specification for image preprocessing.

```python { .api }
class Cropping:
    upper_left_x: int
    upper_left_y: int
    size: int
    """
    Square image cropping specification.

    Attributes:
    - upper_left_x: X coordinate of upper left corner
    - upper_left_y: Y coordinate of upper left corner
    - size: Size of the square crop in pixels
    """
```
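
A common choice is the largest square centered in the image. The arithmetic can be sketched as follows. `centered_square_crop` is a hypothetical helper, not part of the client, and it assumes pixel coordinates with the origin at the top-left corner.

```python
def centered_square_crop(width: int, height: int) -> dict:
    """Return the largest centered square crop for an image (hypothetical helper)."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    size = min(width, height)
    return {
        "upper_left_x": (width - size) // 2,
        "upper_left_y": (height - size) // 2,
        "size": size,
    }


print(centered_square_crop(640, 480))
# {'upper_left_x': 80, 'upper_left_y': 0, 'size': 480}
```

The keys match the `Cropping` attribute names. Note that `Image.from_file_with_cropping` names the last value `crop_size` rather than `size`.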

### Usage Examples

```python
from aleph_alpha_client import (
    Prompt, Text, Image, Tokens,
    TextControl, ImageControl, TokenControl,
    ControlTokenOverlap
)

# Simple text prompt
prompt = Prompt.from_text("What is artificial intelligence?")

# Text with attention control
text_with_control = Text(
    text="Please focus on this important part of the text.",
    controls=[
        TextControl(
            start=21,    # Character index where "important" begins
            length=9,    # Length of "important"
            factor=2.0,  # Factor above 1 increases attention
            token_overlap=ControlTokenOverlap.Complete
        )
    ]
)
prompt = Prompt([text_with_control])

# Image from file
image = Image.from_file("path/to/image.jpg")
prompt = Prompt.from_image(image)

# Image with cropping and attention control
image = Image.from_file_with_cropping(
    "path/to/image.jpg",
    upper_left_x=100,
    upper_left_y=100,
    crop_size=200,
    controls=[
        ImageControl(
            left=0.2, top=0.2, width=0.6, height=0.6,
            factor=1.5  # Increase attention on center region
        )
    ]
)

# Multimodal prompt
multimodal_prompt = Prompt([
    Text.from_text("Describe this image:"),
    Image.from_file("image.jpg"),
    Text.from_text("Focus on the colors and composition.")
])

# Token-level control
tokens = Tokens.from_token_ids([1, 2, 3, 4, 5])  # Tokens without controls
tokens_with_control = Tokens(
    tokens=[1, 2, 3, 4, 5],
    controls=[
        TokenControl(pos=2, factor=3.0)  # Emphasize token at position 2
    ]
)
```