Tessl Tile for pypi/jmcomic@2.6.0

# Text and Data Processing

Specialized utilities for text processing, HTML parsing, image processing, and cryptographic operations. These tools support the core functionality with URL parsing, data extraction, content processing, and API encryption.

## Types

```python { .api }
from typing import Dict, Any, List, Optional, Union, Pattern, Match
```

## Capabilities

### Text Processing Utilities

Text processing tools for URL handling, domain management, and ID parsing.

```python { .api }
class JmcomicText:
    """
    Text processing utilities for URL parsing, domain extraction, and ID parsing.

    Provides essential text manipulation functions for working with JMComic
    URLs, domain names, and content identifiers.

    Static Methods:
    - parse_to_jm_id(text): Parse text to extract JM IDs
    - extract_domain(url): Extract domain from URL
    - normalize_url(url): Normalize URL format
    - is_valid_jm_id(jm_id): Validate JM ID format
    - parse_album_id(text): Extract album ID from text
    - parse_photo_id(text): Extract photo ID from text
    - clean_filename(filename): Clean filename for filesystem
    - format_title(title): Format title for display
    """

    @staticmethod
    def parse_to_jm_id(text: Union[str, int]) -> str:
        """
        Parse text or URL to extract JM ID.

        Handles various input formats including URLs, raw IDs,
        and text containing IDs.

        Parameters:
        - text: str or int - Text containing JM ID

        Returns:
        str - Extracted and normalized JM ID

        Raises:
        ValueError - If no valid ID found
        """

    @staticmethod
    def extract_domain(url: str) -> str:
        """
        Extract domain from URL.

        Parameters:
        - url: str - URL to parse

        Returns:
        str - Extracted domain name
        """

    @staticmethod
    def normalize_url(url: str) -> str:
        """
        Normalize URL format for consistent processing.

        Parameters:
        - url: str - URL to normalize

        Returns:
        str - Normalized URL
        """

    @staticmethod
    def is_valid_jm_id(jm_id: Union[str, int]) -> bool:
        """
        Validate JM ID format.

        Parameters:
        - jm_id: str or int - ID to validate

        Returns:
        bool - True if valid JM ID format
        """

    @staticmethod
    def clean_filename(filename: str) -> str:
        """
        Clean filename for filesystem compatibility.

        Removes or replaces invalid characters for safe file operations.

        Parameters:
        - filename: str - Original filename

        Returns:
        str - Cleaned filename safe for filesystem
        """
```

Usage examples:

```python
from jmcomic import JmcomicText

# Parse various ID formats
jm_id = JmcomicText.parse_to_jm_id("https://example.com/album/123456")
jm_id = JmcomicText.parse_to_jm_id("123456")
jm_id = JmcomicText.parse_to_jm_id("Album ID: 123456")

# Validate IDs
is_valid = JmcomicText.is_valid_jm_id("123456")

# Clean filenames
safe_filename = JmcomicText.clean_filename("Album: Title with/invalid\\chars")
```
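
To make the accepted input formats concrete, here is a minimal standalone sketch of how an ID parser and a filename cleaner of this kind could behave. It is not jmcomic's implementation: the helper names `sketch_parse_id` and `sketch_clean_filename`, the regular expressions, and the `_` replacement character are all assumptions made for illustration.

```python
import re

# Hypothetical patterns, assumed for illustration only.
_ID_IN_URL = re.compile(r"/(?:album|photo)/(\d+)")
_BARE_DIGITS = re.compile(r"\d+")
_UNSAFE_CHARS = re.compile(r'[\\/:*?"<>|]')


def sketch_parse_id(text):
    """Return the first numeric ID found in a URL, a raw ID, or free text."""
    text = str(text)
    # Prefer an ID embedded in a URL path, then fall back to any digit run.
    match = _ID_IN_URL.search(text) or _BARE_DIGITS.search(text)
    if match is None:
        raise ValueError(f"no ID found in {text!r}")
    # Only the URL pattern has a capture group; otherwise use the whole match.
    return match.group(1) if match.re is _ID_IN_URL else match.group(0)


def sketch_clean_filename(filename, replacement="_"):
    """Replace characters that are invalid on common filesystems."""
    return _UNSAFE_CHARS.sub(replacement, filename).strip()


print(sketch_parse_id("https://example.com/album/123456"))
print(sketch_parse_id("Album ID: 123456"))
print(sketch_clean_filename("Album: Title with/invalid\\chars"))
```

Returning the ID as a string, as the documented signature does, avoids losing leading zeros and keeps URL-derived and integer inputs interchangeable.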

### HTML Parsing and Pattern Matching

Tools for parsing HTML content and extracting data using regular expressions.

```python { .api }
class PatternTool:
    """
    Regular expression utilities for HTML parsing and data extraction.

    Provides pre-compiled patterns and matching utilities for
    extracting structured data from HTML pages.

    Class Attributes:
    - ALBUM_ID_PATTERN: Pattern - Regex for album ID extraction
    - PHOTO_ID_PATTERN: Pattern - Regex for photo ID extraction
    - IMAGE_URL_PATTERN: Pattern - Regex for image URL extraction
    - TITLE_PATTERN: Pattern - Regex for title extraction

    Static Methods:
    - match_album_info(html): Extract album information from HTML
    - match_photo_info(html): Extract photo information from HTML
    - match_image_urls(html): Extract image URLs from HTML
    - find_all_matches(pattern, text): Find all regex matches
    """

    @staticmethod
    def match_album_info(html: str) -> Dict[str, Any]:
        """
        Extract album information from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        dict - Extracted album information
        """

    @staticmethod
    def match_photo_info(html: str) -> Dict[str, Any]:
        """
        Extract photo information from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        dict - Extracted photo information
        """

    @staticmethod
    def match_image_urls(html: str) -> List[str]:
        """
        Extract image URLs from HTML content.

        Parameters:
        - html: str - HTML content to parse

        Returns:
        List[str] - List of extracted image URLs
        """

    @staticmethod
    def find_all_matches(pattern: Pattern, text: str) -> List[Match]:
        """
        Find all regex matches in text.

        Parameters:
        - pattern: Pattern - Compiled regex pattern
        - text: str - Text to search

        Returns:
        List[Match] - List of regex match objects
        """
```
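
The general technique behind pre-compiled patterns can be sketched with the standard `re` module alone. The pattern `IMG_SRC_PATTERN`, the helper names, and the sample markup below are hypothetical and are not taken from jmcomic; the real patterns depend on the site's HTML.

```python
import re
from typing import List

# Hypothetical pattern: captures the src attribute of <img> tags.
IMG_SRC_PATTERN = re.compile(r'<img[^>]+src="([^"]+)"')


def sketch_find_all_matches(pattern: re.Pattern, text: str) -> List[re.Match]:
    """Collect every non-overlapping match object for pattern in text."""
    return list(pattern.finditer(text))


def sketch_match_image_urls(html: str) -> List[str]:
    """Return the captured URL from each <img> tag found in the HTML."""
    return [m.group(1) for m in sketch_find_all_matches(IMG_SRC_PATTERN, html)]


sample_html = (
    '<div><img class="page" src="https://cdn.example/1.webp">'
    '<img src="https://cdn.example/2.webp"></div>'
)
print(sketch_match_image_urls(sample_html))
```

Compiling the pattern once at module level means repeated page parses reuse the same compiled object instead of re-parsing the expression.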

### Page Processing Tools

Specialized tools for processing HTML pages and extracting structured data.

```python { .api }
class JmPageTool:
    """
    HTML page parsing and data extraction utilities.

    Provides high-level functions for parsing JMComic HTML pages
    and extracting structured data for albums, photos, and searches.

    Static Methods:
    - parse_album_page(html): Parse album detail page
    - parse_photo_page(html): Parse photo detail page
    - parse_search_page(html): Parse search results page
    - parse_category_page(html): Parse category listing page
    - extract_pagination(html): Extract pagination information
    - extract_metadata(html): Extract page metadata
    """

    @staticmethod
    def parse_album_page(html: str) -> 'JmAlbumDetail':
        """
        Parse album detail page HTML to extract album information.

        Parameters:
        - html: str - Album page HTML content

        Returns:
        JmAlbumDetail - Parsed album with metadata and episodes
        """

    @staticmethod
    def parse_photo_page(html: str) -> 'JmPhotoDetail':
        """
        Parse photo detail page HTML to extract photo information.

        Parameters:
        - html: str - Photo page HTML content

        Returns:
        JmPhotoDetail - Parsed photo with metadata and images
        """

    @staticmethod
    def parse_search_page(html: str) -> 'JmSearchPage':
        """
        Parse search results page HTML.

        Parameters:
        - html: str - Search page HTML content

        Returns:
        JmSearchPage - Parsed search results with albums and pagination
        """

    @staticmethod
    def extract_pagination(html: str) -> Dict[str, Any]:
        """
        Extract pagination information from page.

        Parameters:
        - html: str - HTML content with pagination

        Returns:
        dict - Pagination data (current_page, total_pages, has_next, etc.)
        """
```
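
As an illustration of the pagination dictionary described above, the following sketch derives `current_page`, `total_pages`, and `has_next` from page links using the standard library's `html.parser`. The markup convention (`<a data-page="N">`, with `class="active"` on the current page) and the function name are assumptions; the actual page structure may differ.

```python
from html.parser import HTMLParser
from typing import Any, Dict, List


class _PageLinkParser(HTMLParser):
    """Collect page numbers from <a data-page="N"> links (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.pages: List[int] = []
        self.current = 1

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag != "a" or "data-page" not in attr:
            return
        page = int(attr["data-page"])
        self.pages.append(page)
        # The active link is assumed to carry class="active".
        if "active" in (attr.get("class") or "").split():
            self.current = page


def sketch_extract_pagination(html: str) -> Dict[str, Any]:
    """Build a pagination dict from the page links found in html."""
    parser = _PageLinkParser()
    parser.feed(html)
    total = max(parser.pages, default=1)
    return {
        "current_page": parser.current,
        "total_pages": total,
        "has_next": parser.current < total,
    }


sample = '<a data-page="1">1</a><a data-page="2" class="active">2</a><a data-page="3">3</a>'
print(sketch_extract_pagination(sample))
```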

### API Response Processing

Tools for processing and adapting API responses from different client types.

```python { .api }
class JmApiAdaptTool:
    """
    API response adaptation and transformation utilities.

    Handles conversion between different API response formats and
    standardizes data structures across client types.

    Static Methods:
    - adapt_album_response(response): Adapt album API response
    - adapt_photo_response(response): Adapt photo API response
    - adapt_search_response(response): Adapt search API response
    - normalize_response_data(data): Normalize response data format
    - validate_api_response(response): Validate API response structure
    """

    @staticmethod
    def adapt_album_response(response: Dict[str, Any]) -> 'JmAlbumDetail':
        """
        Adapt album API response to standard format.

        Parameters:
        - response: dict - Raw API response data

        Returns:
        JmAlbumDetail - Standardized album entity
        """

    @staticmethod
    def adapt_photo_response(response: Dict[str, Any]) -> 'JmPhotoDetail':
        """
        Adapt photo API response to standard format.

        Parameters:
        - response: dict - Raw API response data

        Returns:
        JmPhotoDetail - Standardized photo entity
        """

    @staticmethod
    def normalize_response_data(data: Dict[str, Any]) -> Dict[str, Any]:
        """
        Normalize response data format across different APIs.

        Parameters:
        - data: dict - Raw response data

        Returns:
        dict - Normalized data structure
        """
```
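
Normalizing responses across client types usually comes down to mapping differing field names onto one canonical set and coercing value types. The sketch below shows that idea under stated assumptions: the alias table `KEY_ALIASES`, the canonical key names, and the function `sketch_normalize` are hypothetical and do not describe the real response schema.

```python
from typing import Any, Dict

# Hypothetical alias table: raw key -> canonical key.
KEY_ALIASES = {"id": "album_id", "aid": "album_id", "name": "title", "author": "authors"}


def sketch_normalize(data: Dict[str, Any]) -> Dict[str, Any]:
    """Rename aliased keys and coerce a few fields to consistent types."""
    normalized: Dict[str, Any] = {}
    for key, value in data.items():
        normalized[KEY_ALIASES.get(key, key)] = value
    # IDs are kept as strings so that "123" and 123 compare equal downstream.
    if "album_id" in normalized:
        normalized["album_id"] = str(normalized["album_id"])
    # A single author string becomes a one-element list.
    if isinstance(normalized.get("authors"), str):
        normalized["authors"] = [normalized["authors"]]
    return normalized


print(sketch_normalize({"id": 123456, "name": "Example", "author": "someone"}))
```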

### Image Processing Tools

Image processing utilities covering decryption of scrambled images, format conversion, and manipulation.

```python { .api }
class JmImageTool:
    """
    Image processing, decryption, and format conversion utilities.

    Provides tools for handling scrambled images, format conversion,
    and image manipulation operations.

    Static Methods:
    - decrypt_image(image_data, scramble_id): Decrypt scrambled image
    - is_image_scrambled(image_data): Check if image is scrambled
    - convert_image_format(image_data, target_format): Convert image format
    - resize_image(image_data, width, height): Resize image
    - get_image_info(image_data): Get image metadata
    - merge_images_vertical(images): Merge images vertically
    - optimize_image(image_data): Optimize image for size
    """

    @staticmethod
    def decrypt_image(image_data: bytes, scramble_id: int) -> bytes:
        """
        Decrypt scrambled image data.

        JMComic images are sometimes scrambled for protection.
        This function reverses the scrambling process.

        Parameters:
        - image_data: bytes - Scrambled image data
        - scramble_id: int - Scramble algorithm identifier

        Returns:
        bytes - Decrypted image data
        """

    @staticmethod
    def is_image_scrambled(image_data: bytes) -> bool:
        """
        Check if image data is scrambled.

        Parameters:
        - image_data: bytes - Image data to check

        Returns:
        bool - True if image appears to be scrambled
        """

    @staticmethod
    def convert_image_format(image_data: bytes, target_format: str) -> bytes:
        """
        Convert image to different format.

        Parameters:
        - image_data: bytes - Original image data
        - target_format: str - Target format ('JPEG', 'PNG', 'WEBP')

        Returns:
        bytes - Converted image data
        """

    @staticmethod
    def get_image_info(image_data: bytes) -> Dict[str, Any]:
        """
        Get image metadata and properties.

        Parameters:
        - image_data: bytes - Image data

        Returns:
        dict - Image information (width, height, format, size)
        """

    @staticmethod
    def merge_images_vertical(images: List[bytes]) -> bytes:
        """
        Merge multiple images vertically into single image.

        Parameters:
        - images: List[bytes] - List of image data to merge

        Returns:
        bytes - Merged image data
        """
```
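
The kind of dictionary `get_image_info` is documented to return (width, height, format, size) can be illustrated for one format using only the standard library. This sketch reads the dimensions from a PNG header; it handles PNG only, the name `sketch_png_info` is hypothetical, and it makes no claim about how the library itself inspects images.

```python
import struct
from typing import Any, Dict

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def sketch_png_info(image_data: bytes) -> Dict[str, Any]:
    """Read width and height from the IHDR chunk of a PNG byte string."""
    if not image_data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG image")
    # Bytes 8-16 hold the IHDR chunk length and type; 16-24 hold width and height.
    width, height = struct.unpack(">II", image_data[16:24])
    return {"width": width, "height": height, "format": "PNG", "size": len(image_data)}


# Build a minimal PNG prefix by hand (signature + start of IHDR) for demonstration.
header = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 800, 1200)
print(sketch_png_info(header))
```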

### Cryptographic Tools

Encryption and decryption utilities for API communications and data protection.

```python { .api }
class JmCryptoTool:
    """
    Encryption/decryption utilities for API communications.

    Handles the encryption protocols used by JMComic mobile API
    and provides security functions for data protection.

    Static Methods:
    - encrypt_api_request(data): Encrypt API request data
    - decrypt_api_response(encrypted_data): Decrypt API response
    - generate_request_signature(data): Generate request signature
    - validate_response_signature(response): Validate response signature
    - hash_password(password): Hash password for authentication
    """

    @staticmethod
    def encrypt_api_request(data: Dict[str, Any]) -> bytes:
        """
        Encrypt API request data using JMComic protocol.

        Parameters:
        - data: dict - Request data to encrypt

        Returns:
        bytes - Encrypted request data
        """

    @staticmethod
    def decrypt_api_response(encrypted_data: bytes) -> Dict[str, Any]:
        """
        Decrypt API response data using JMComic protocol.

        Parameters:
        - encrypted_data: bytes - Encrypted response data

        Returns:
        dict - Decrypted response data
        """

    @staticmethod
    def generate_request_signature(data: Dict[str, Any]) -> str:
        """
        Generate request signature for API authentication.

        Parameters:
        - data: dict - Request data

        Returns:
        str - Generated signature
        """

    @staticmethod
    def validate_response_signature(response: Dict[str, Any]) -> bool:
        """
        Validate response signature for data integrity.

        Parameters:
        - response: dict - API response with signature
        
        Returns:
        bool - True if signature is valid
        """
```
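
The sign-then-validate pattern behind the two signature methods can be sketched generically with `hmac` and `hashlib`. This is not the JMComic protocol: HMAC-SHA256, the canonical JSON encoding, the `signature` field name, the secret, and the function names are all assumptions chosen to show the general shape.

```python
import hashlib
import hmac
import json
from typing import Any, Dict

# Hypothetical shared secret; a real client would load this from configuration.
SECRET = b"example-secret"


def sketch_sign(data: Dict[str, Any], secret: bytes = SECRET) -> str:
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of data."""
    # Sorted keys and fixed separators make the encoding deterministic.
    payload = json.dumps(data, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()


def sketch_verify(response: Dict[str, Any], secret: bytes = SECRET) -> bool:
    """Check the 'signature' field of response against its other fields."""
    body = {k: v for k, v in response.items() if k != "signature"}
    expected = sketch_sign(body, secret)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, str(response.get("signature", "")))


body = {"album_id": "123456", "title": "Example"}
signed = dict(body, signature=sketch_sign(body))
print(sketch_verify(signed))
```

Signing a canonical encoding rather than the raw dictionary matters because two dictionaries with the same content but different key order must produce the same signature.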

## Usage Examples

```python
from jmcomic import (
    JmcomicText, PatternTool, JmPageTool,
    JmApiAdaptTool, JmImageTool, JmCryptoTool,
)

# Input variables below (html_content, scrambled_data, api_response, ...)
# are placeholders for data obtained elsewhere.

# Text processing
jm_id = JmcomicText.parse_to_jm_id("https://jmcomic.example/album/123456")
clean_name = JmcomicText.clean_filename("Album: Title/with\\invalid*chars")

# HTML parsing
album_info = PatternTool.match_album_info(html_content)
image_urls = PatternTool.match_image_urls(photo_html)

# Page processing
album = JmPageTool.parse_album_page(album_html)
search_results = JmPageTool.parse_search_page(search_html)

# Image processing
decrypted_image = JmImageTool.decrypt_image(scrambled_data, scramble_id)
image_info = JmImageTool.get_image_info(image_data)
converted_image = JmImageTool.convert_image_format(image_data, 'JPEG')

# API processing
album = JmApiAdaptTool.adapt_album_response(api_response)
normalized_data = JmApiAdaptTool.normalize_response_data(raw_data)

# Cryptographic operations
encrypted_request = JmCryptoTool.encrypt_api_request(request_data)
decrypted_response = JmCryptoTool.decrypt_api_response(encrypted_response)
```

## Integration with Core Systems

These tools support the core download and client systems as follows:

- **Text tools** are used throughout for ID parsing and URL handling
- **Pattern tools** power the HTML client's data extraction
- **Page tools** convert HTML pages to structured entities
- **API tools** standardize responses across different client types
- **Image tools** handle content processing in downloaders and plugins
- **Crypto tools** secure API communications in the mobile client
