# spaCy

Industrial-strength Natural Language Processing (NLP) in Python. spaCy is designed for production use and provides fast, accurate processing for 70+ languages, with state-of-the-art neural network models for tokenization, part-of-speech tagging, dependency parsing, named entity recognition, and text classification.

## Package Information

- **Package Name**: spacy
- **Language**: Python
- **Installation**: `pip install spacy`
- **Models**: Download language models with `python -m spacy download en_core_web_sm`

## Core Imports

```python
import spacy

# Load a language model
nlp = spacy.load("en_core_web_sm")
```

Most common imports:
```python
from spacy import displacy
from spacy.matcher import Matcher, PhraseMatcher
from spacy.tokens import Doc, Token, Span
```

## Basic Usage

```python
import spacy

# Load a language model
nlp = spacy.load("en_core_web_sm")

# Process text
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Access linguistic annotations
for token in doc:
    print(token.text, token.pos_, token.dep_, token.lemma_)

# Access named entities
for ent in doc.ents:
    print(ent.text, ent.label_)

# Process multiple texts efficiently
texts = ["First text", "Second text", "Third text"]
docs = list(nlp.pipe(texts))
```

## Architecture
52
53
spaCy's processing pipeline is built around a Language object that chains together multiple pipeline components. Each document passes through tokenization, then through pipeline components (tagger, parser, NER, etc.) in sequence. This design allows for:
54
55
- **Efficient processing**: Stream processing with `nlp.pipe()` for batches
56
- **Modular architecture**: Add, remove, or replace pipeline components
57
- **Multi-language support**: 70+ language models with specialized tokenizers
58
- **Production-ready**: Optimized for speed and memory usage
59
60
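The batching behavior can be demonstrated without downloading a trained model by using a blank pipeline (tokenizer only), a minimal sketch:

```python
import spacy

# A blank pipeline contains only the language's tokenizer,
# so this runs without any model download.
nlp = spacy.blank("en")

texts = ["First text", "Second text", "Third text"]

# nlp.pipe streams texts through the pipeline in batches,
# which is faster than calling nlp() once per text.
docs = list(nlp.pipe(texts, batch_size=2))
print(len(docs))        # 3
print(docs[0][0].text)  # First
```

With a loaded model, the same call additionally runs the trained components over each batch.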
## Capabilities

### Core Processing Objects

The fundamental objects for text processing, including documents, tokens, spans, and vocabulary management. These form the foundation of all spaCy operations.

```python { .api }
class Language:
    def __call__(self, text: str) -> Doc: ...
    def pipe(self, texts: Iterable[str]) -> Iterator[Doc]: ...

class Doc:
    text: str
    ents: tuple
    sents: Iterator

class Token:
    text: str
    pos_: str
    lemma_: str

class Span:
    text: str
    label_: str
```

[Core Objects](./core-objects.md)

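How these objects relate can be sketched with a blank pipeline (no trained model required):

```python
import spacy

nlp = spacy.blank("en")  # tokenizer-only pipeline
doc = nlp("San Francisco is foggy")

# Token: a single unit of text, accessed by index
assert doc[0].text == "San"

# Span: a slice of the Doc
span = doc[0:2]
assert span.text == "San Francisco"

# The Vocab is shared between the pipeline and its Docs
assert doc.vocab is nlp.vocab
```

Linguistic attributes such as `pos_` and `lemma_` are only populated once trained components have run over the document.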
### Pipeline Components

Built-in pipeline components for linguistic analysis, including part-of-speech tagging, dependency parsing, named entity recognition, and text classification.

```python { .api }
class Tagger: ...
class DependencyParser: ...
class EntityRecognizer: ...
class TextCategorizer: ...
```

[Pipeline Components](./pipeline-components.md)

### Pattern Matching

Powerful pattern matching systems for finding and extracting specific linguistic patterns, phrases, and dependency structures from text.

```python { .api }
class Matcher:
    def add(self, key: str, patterns: List[dict]) -> None: ...
    def __call__(self, doc: Doc) -> List[tuple]: ...

class PhraseMatcher:
    def add(self, key: str, docs: List[Doc]) -> None: ...
```

[Pattern Matching](./pattern-matching.md)

### Language Models

Access to 70+ language-specific models and tokenizers, each optimized for specific linguistic characteristics and writing systems.

```python { .api }
def load(name: str, **overrides) -> Language: ...
def blank(name: str, **kwargs) -> Language: ...
```

[Language Models](./language-models.md)

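The difference between the two entry points, sketched briefly: `spacy.blank` builds a pipeline with only the language-specific tokenizer and works offline, while `spacy.load` requires an installed model package.

```python
import spacy

# Blank German pipeline: language-specific tokenizer, no trained parts.
nlp = spacy.blank("de")
doc = nlp("Hallo Welt")
print([t.text for t in doc])  # ['Hallo', 'Welt']

# spacy.load needs the model installed first, e.g.:
#   python -m spacy download en_core_web_sm
# nlp = spacy.load("en_core_web_sm")
```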
### Visualization

Interactive visualization tools for displaying linguistic analysis, including dependency trees, named entities, and custom visualizations.

```python { .api }
def render(docs, style: str = "dep", **options) -> str: ...
def serve(docs, style: str = "dep", port: int = 5000, **options) -> None: ...
```

[Visualization](./visualization.md)

### Training and Model Building

Tools for training custom models, fine-tuning existing models, and creating specialized NLP pipelines for domain-specific applications.

```python { .api }
def train(nlp: Language, examples: List, **kwargs) -> dict: ...
def evaluate(nlp: Language, examples: List, **kwargs) -> dict: ...
```

[Training](./training.md)

## Key Types

```python { .api }
class Language:
    """Main NLP pipeline class."""
    vocab: Vocab
    pipeline: List[tuple]
    pipe_names: List[str]

    def __call__(self, text: str) -> Doc: ...
    def pipe(self, texts: Iterable[str], batch_size: int = 1000) -> Iterator[Doc]: ...
    def add_pipe(self, factory_name: str, name: str = None, **kwargs) -> callable: ...

class Doc:
    """Container for accessing linguistic annotations."""
    text: str
    text_with_ws: str
    ents: tuple
    noun_chunks: Iterator
    sents: Iterator
    vector: numpy.ndarray

    def similarity(self, other) -> float: ...
    def to_json(self) -> dict: ...

class Token:
    """Individual token with linguistic annotations."""
    text: str
    lemma_: str
    pos_: str
    tag_: str
    dep_: str
    ent_type_: str
    head: 'Token'
    children: Iterator
    is_alpha: bool
    is_digit: bool
    is_punct: bool
    like_num: bool

class Span:
    """Slice of a document."""
    text: str
    label_: str
    kb_id_: str
    vector: numpy.ndarray

    def similarity(self, other) -> float: ...
    def as_doc(self) -> Doc: ...

class Vocab:
    """Vocabulary store."""
    strings: StringStore
    vectors: Vectors

    def __getitem__(self, string: str) -> Lexeme: ...
```
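
The `Vocab` behaviors above can be sketched briefly: indexing the vocab with a string yields a `Lexeme` (a context-independent word type), and the `StringStore` interns strings as 64-bit hashes.

```python
import spacy

nlp = spacy.blank("en")

# Vocab.__getitem__ returns a Lexeme for the given string.
lexeme = nlp.vocab["coffee"]
print(lexeme.text, lexeme.is_alpha)  # coffee True

# StringStore maps strings to hashes and back.
coffee_hash = nlp.vocab.strings["coffee"]
print(nlp.vocab.strings[coffee_hash])  # coffee
```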