# Response Content Processing

Functions for processing HTTP response content, with support for streaming, buffered access, automatic encoding detection, and JSON parsing. Each function returns a Deferred, in keeping with the asynchronous nature of Twisted's response system.

## Capabilities

### Incremental Content Collection

Collects response body data incrementally as it arrives, useful for streaming large responses or processing data in chunks.

```python { .api }
def collect(response, collector):
    """
    Incrementally collect the body of the response.

    This function may only be called once for a given response.
    If the collector raises an exception, it will be set as the error
    value on the response Deferred and the HTTP transport will be closed.

    Parameters:
    - response: IResponse - The HTTP response to collect body from
    - collector: callable - Function called with each data chunk (bytes)

    Returns:
    Deferred that fires with None when entire body has been read
    """
```
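Because a collector is just a callable that receives each `bytes` chunk, any object with a `__call__` method can serve as one. The sketch below uses a hypothetical `HashingCollector` class (not part of treq) that tracks the body size and a SHA-256 digest without keeping the body in memory. The chunks are fed by hand here; with treq the same object would be passed as `treq.collect(response, collector)`.

```python
import hashlib


class HashingCollector:
    """Hypothetical collector: tracks size and SHA-256 without storing the body."""

    def __init__(self):
        self.size = 0
        self._digest = hashlib.sha256()

    def __call__(self, data: bytes) -> None:
        # Called once per chunk, in arrival order.
        self.size += len(data)
        self._digest.update(data)

    def hexdigest(self) -> str:
        return self._digest.hexdigest()


# Simulate chunks arriving from the network; with treq this would be
#     d = treq.collect(response, collector)
collector = HashingCollector()
for chunk in (b"hello ", b"world"):
    collector(chunk)

print(collector.size)  # 11
```

Keeping the state on an object rather than in a closure makes the result easy to inspect once the Deferred fires.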

### Complete Content Retrieval

Gets the complete response content as bytes, caching the result for multiple calls.

```python { .api }
def content(response):
    """
    Read the complete contents of an HTTP response.

    This function may be called multiple times for a response; it uses
    a WeakKeyDictionary to cache the contents of the response.

    Parameters:
    - response: IResponse - The HTTP response to get contents of

    Returns:
    Deferred that fires with complete content as bytes
    """
```

### Text Content Decoding

Decodes response content as text, using the charset declared in the Content-Type header when one is present and a caller-supplied fallback encoding otherwise.

```python { .api }
def text_content(response, encoding="ISO-8859-1"):
    """
    Read and decode HTTP response contents as text.

    The charset is automatically detected from the Content-Type header.
    If no charset is specified, the provided encoding is used as fallback.

    Parameters:
    - response: IResponse - The HTTP response to decode
    - encoding: str - Fallback encoding if none detected (default: ISO-8859-1)

    Returns:
    Deferred that fires with decoded text as str
    """
```
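The choice of fallback matters when a server omits the charset. The following standard-library illustration decodes the same UTF-8 bytes with the default ISO-8859-1 fallback and with an explicit UTF-8 one; it does not call treq, but the second case corresponds to passing `encoding="utf-8"` to `text_content()`.

```python
# Body bytes as a server might send them: "café" encoded as UTF-8.
body = "caf\u00e9".encode("utf-8")

# Decoding with the default fallback (ISO-8859-1) never fails, because every
# byte maps to a character, but multi-byte UTF-8 sequences become mojibake.
as_latin1 = body.decode("ISO-8859-1")

# Decoding with the encoding the server actually used gives the intended text.
as_utf8 = body.decode("utf-8")

print(repr(as_latin1))
print(repr(as_utf8))
```

If an API is known to send UTF-8 without declaring it, passing the fallback explicitly is likely the safer choice.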

### JSON Content Parsing

Parses response content as JSON, automatically handling UTF-8 encoding for JSON data.

```python { .api }
def json_content(response, **kwargs):
    """
    Read and parse HTTP response contents as JSON.

    This function relies on text_content() and may be called multiple
    times for a given response. JSON content is automatically decoded
    as UTF-8 per RFC 7159.

    Parameters:
    - response: IResponse - The HTTP response to parse
    - **kwargs: Additional keyword arguments for json.loads()

    Returns:
    Deferred that fires with parsed JSON data
    """
```
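Since the extra keyword arguments are forwarded to `json.loads()`, any option the standard parser accepts can be used. The sketch below calls `json.loads()` directly on a made-up sample payload to show the effect of `parse_float=Decimal`; with treq the equivalent call would be `treq.json_content(response, parse_float=Decimal)`.

```python
import json
from decimal import Decimal

# Sample payload (an assumption for illustration only).
payload = '{"price": 19.99, "quantity": 3}'

default = json.loads(payload)                      # floats become Python float
exact = json.loads(payload, parse_float=Decimal)   # floats become Decimal

print(type(default["price"]).__name__)  # float
print(exact["price"])                   # 19.99
```

Parsing into `Decimal` avoids binary floating-point rounding, which may matter for monetary values.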

## Usage Examples

### Basic Content Access

```python
import treq
from twisted.internet import defer

@defer.inlineCallbacks
def get_content():
    response = yield treq.get('https://httpbin.org/get')

    # Get raw bytes
    raw_data = yield treq.content(response)
    print(f"Raw data: {raw_data[:100]}...")

    # Get decoded text
    text_data = yield treq.text_content(response)
    print(f"Text data: {text_data[:100]}...")

    # Parse as JSON
    json_data = yield treq.json_content(response)
    print(f"JSON data: {json_data}")
```

### Streaming Large Responses

```python
import treq
from twisted.internet import defer

@defer.inlineCallbacks
def stream_large_file():
    # unbuffered=True asks treq not to buffer the whole body in memory
    response = yield treq.get('https://httpbin.org/bytes/10000', unbuffered=True)

    chunks = []
    def collector(data):
        # Called once for each chunk of bytes as it arrives
        chunks.append(data)
        print(f"Received chunk of {len(data)} bytes")

    yield treq.collect(response, collector)
    total_data = b''.join(chunks)
    print(f"Total received: {len(total_data)} bytes")
```

### Processing Different Content Types

```python
import treq
from twisted.internet import defer

@defer.inlineCallbacks
def handle_different_types():
    # JSON API response
    json_response = yield treq.get('https://httpbin.org/json')
    data = yield treq.json_content(json_response)

    # Plain text response
    text_response = yield treq.get('https://httpbin.org/robots.txt')
    text = yield treq.text_content(text_response)

    # Binary data (image, file, etc.)
    binary_response = yield treq.get('https://httpbin.org/bytes/1024')
    binary_data = yield treq.content(binary_response)

    # Custom JSON parsing with parameters
    json_response = yield treq.get('https://httpbin.org/json')
    # Parse with custom options (forwarded to json.loads)
    data = yield treq.json_content(json_response, parse_float=float, parse_int=int)
```

### Error Handling

```python
import treq
from twisted.internet import defer

@defer.inlineCallbacks
def handle_content_errors():
    try:
        response = yield treq.get('https://httpbin.org/status/500')

        # Content functions work regardless of HTTP status
        content = yield treq.text_content(response)
        print(f"Error response content: {content}")

    except Exception as e:
        print(f"Request failed: {e}")

    try:
        response = yield treq.get('https://httpbin.org/html')

        # This will raise an exception if content is not valid JSON
        json_data = yield treq.json_content(response)

    except ValueError as e:
        print(f"JSON parsing failed: {e}")
        # Fall back to text content
        text_data = yield treq.text_content(response)
```

### Response Object Methods

The _Response object also provides convenient methods for content access:

```python
import treq
from twisted.internet import defer

@defer.inlineCallbacks
def use_response_methods():
    response = yield treq.get('https://httpbin.org/json')

    # These are equivalent to the module-level functions
    content_bytes = yield response.content()
    text_content = yield response.text()
    json_data = yield response.json()

    # Incremental collection
    chunks = []
    yield response.collect(chunks.append)
```

## Types

Content-related types:

```python { .api }
# Collector function type for incremental processing
CollectorFunction = Callable[[bytes], None]

# Encoding detection return type
Optional[str]  # Charset name or None if not detected

# Content function return types
Deferred[bytes]  # For content()
Deferred[str]  # For text_content()
Deferred[Any]  # For json_content()
Deferred[None]  # For collect()
```

## Encoding Detection

treq automatically detects character encoding from HTTP headers:

1. **Content-Type header parsing**: Extracts charset parameter from Content-Type
2. **JSON default**: Uses UTF-8 for application/json responses per RFC 7159
3. **Fallback encoding**: Uses provided encoding parameter (default: ISO-8859-1)
4. **Charset validation**: Validates charset names against RFC 2978 specification

The encoding detection handles edge cases like:
- Multiple Content-Type headers (uses last one)
- Quoted charset values (`charset="utf-8"`)
- Case-insensitive charset names
- Invalid charset characters (falls back to default)
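
The steps and edge cases above can be sketched with the standard library. This is an approximation under stated assumptions, not treq's implementation: `pick_encoding` is a hypothetical helper, the header values are passed in as a plain list of strings, and the validity check uses a simplified character set loosely modelled on RFC 2978.

```python
import re
from email.message import Message

# Simplified pattern for acceptable charset names (an assumption, loosely
# modelled on RFC 2978; treq's own check may differ).
_VALID_CHARSET = re.compile(r"^[A-Za-z0-9!#$%&'+^_`{}~-]+$")


def pick_encoding(content_types, fallback="ISO-8859-1"):
    """Return the encoding to use for a list of Content-Type header values."""
    if not content_types:
        return fallback

    # When several Content-Type headers are present, use the last one.
    msg = Message()
    msg["Content-Type"] = content_types[-1]

    # get_param() removes surrounding double quotes from the value.
    charset = msg.get_param("charset")
    if isinstance(charset, str):
        charset = charset.strip("'\"").lower()  # names are case-insensitive
        if _VALID_CHARSET.match(charset):
            return charset
        return fallback  # invalid characters: ignore the declared charset

    # No charset parameter: JSON defaults to UTF-8, everything else falls back.
    if msg.get_content_type() == "application/json":
        return "utf-8"
    return fallback


print(pick_encoding(['text/html; charset="UTF-8"']))  # utf-8
```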

## Performance Considerations

- **Buffering**: By default, treq buffers complete responses in memory
- **Unbuffered responses**: Pass `unbuffered=True` to the request call to stream large responses
- **Multiple access**: Content functions cache results for repeated access to the same response
- **Streaming**: Use `collect()` to process large responses incrementally
- **Memory usage**: Consider streaming for responses larger than available memory
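
To make the memory trade-off concrete, here is a small sketch of writing chunks straight to disk so that only one chunk is held in memory at a time. The chunks are simulated with a generator; with treq, the response would come from a request made with `unbuffered=True` and the file's `write` method would be passed to `collect()` as the collector. `save_chunks` and the file name are hypothetical.

```python
import os
import tempfile


def save_chunks(chunks, path):
    """Write an iterable of byte chunks to `path`; return the bytes written."""
    written = 0
    with open(path, "wb") as fh:
        # fh.write has the collector shape: it accepts one bytes chunk per call.
        # With treq: response = yield treq.get(url, unbuffered=True)
        #            yield treq.collect(response, fh.write)
        for chunk in chunks:
            written += fh.write(chunk)
    return written


target = os.path.join(tempfile.mkdtemp(), "download.bin")
total = save_chunks((b"a" * 4096 for _ in range(3)), target)
print(total)  # 12288
```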