# Collaborative Filtering

Recommendation system capabilities, including specialized learners and models for collaborative filtering, matrix factorization, and neural collaborative filtering.

## Capabilities

### Collaborative Filtering Learner

Main entry point for creating recommendation system learners using collaborative filtering techniques.

```python { .api }
def collab_learner(dls, n_factors=50, use_nn=False, emb_szs=None, layers=None,
                   config=None, y_range=None, loss_func=None, **kwargs):
    """
    Create a collaborative filtering learner.

    Parameters:
    - dls: CollabDataLoaders with user-item interaction data
    - n_factors: Number of factors/embeddings for matrix factorization
    - use_nn: Use neural network instead of dot product
    - emb_szs: Custom embedding sizes for users and items
    - layers: Hidden layer sizes for neural collaborative filtering
    - config: Model configuration dictionary
    - y_range: Range of rating values
    - loss_func: Loss function (MSE for ratings, BCE for binary)
    - **kwargs: Additional learner arguments

    Returns:
    - Learner instance for collaborative filtering
    """
```
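The `y_range` parameter bounds predictions to the rating scale. The sketch below shows one common way such a bound can be applied, assuming a sigmoid rescaling of the raw model score. The helper `scale_to_range` is a hypothetical name used for illustration and is not part of the API above.

```python
import math

def scale_to_range(x, y_range):
    """Squash a raw score into [low, high] with a sigmoid (assumed approach)."""
    low, high = y_range
    # Numerically stable sigmoid: avoids overflow for large negative x.
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))
    else:
        z = math.exp(x)
        s = z / (1.0 + z)
    return low + (high - low) * s

# A raw score of 0 lands at the midpoint of the range.
print(scale_to_range(0.0, (0, 5.5)))  # 2.75
```

Because a sigmoid only approaches its bounds, an upper limit slightly above the highest rating (for example 5.5 on a 1 to 5 scale) lets predictions actually reach the top rating.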

### Collaborative Filtering Data Processing

Specialized data loaders for recommendation datasets with user-item interactions.

```python { .api }
class CollabDataLoaders(DataLoaders):
    """DataLoaders for collaborative filtering datasets."""

    @classmethod
    def from_csv(cls, path, csv_name='ratings.csv', header='infer', delimiter=None,
                 user_name=None, item_name=None, rating_name=None, valid_pct=0.2,
                 seed=None, **kwargs):
        """
        Create CollabDataLoaders from CSV file.

        Parameters:
        - path: Path to data directory
        - csv_name: Name of CSV file with ratings
        - header: CSV header handling
        - delimiter: CSV delimiter
        - user_name: Column name for user IDs (auto-detected if None)
        - item_name: Column name for item IDs (auto-detected if None)
        - rating_name: Column name for ratings (auto-detected if None)
        - valid_pct: Fraction of the data held out for validation
        - seed: Random seed for splitting

        Returns:
        - CollabDataLoaders instance
        """

    @classmethod
    def from_df(cls, ratings, valid_pct=0.2, user_name=None, item_name=None,
                rating_name=None, seed=None, **kwargs):
        """
        Create from pandas DataFrame.

        Parameters:
        - ratings: DataFrame with user-item ratings
        - valid_pct: Fraction of the data held out for validation
        - user_name: User column name
        - item_name: Item column name
        - rating_name: Rating column name
        - seed: Random seed

        Returns:
        - CollabDataLoaders instance
        """

class CollabBlock(TransformBlock):
    """Transform block for collaborative filtering data."""

    def __init__(self): ...
```
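Embedding-based models look up users and items by integer position, so raw IDs usually have to be mapped to contiguous indices before training. The sketch below illustrates that step in plain Python, on the assumption that the data loaders do something equivalent internally. The function `build_index` and the sample rows are illustrative only.

```python
def build_index(values):
    """Map each distinct raw ID to a contiguous integer index, in sorted order."""
    return {v: i for i, v in enumerate(sorted(set(values)))}

# Hypothetical (user, item, rating) rows.
ratings = [("u7", "m3", 4.0), ("u2", "m3", 5.0), ("u7", "m9", 2.5)]

user_index = build_index(u for u, _, _ in ratings)
item_index = build_index(m for _, m, _ in ratings)

# Encoded rows: (user_idx, item_idx, rating)
encoded = [(user_index[u], item_index[m], r) for u, m, r in ratings]
print(encoded)  # [(1, 0, 4.0), (0, 0, 5.0), (1, 1, 2.5)]
```

The number of distinct keys in each index gives the embedding table size for users and items respectively.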

### Collaborative Filtering Models

Model architectures for different collaborative filtering approaches.

```python { .api }
class EmbeddingDotBias(nn.Module):
    """
    Matrix factorization model with bias terms.
    Standard approach using dot product of user and item embeddings with biases.
    """

    def __init__(self, n_users, n_items, n_factors, y_range=(0, 5.5)):
        """
        Initialize embedding dot bias model.

        Parameters:
        - n_users: Number of unique users
        - n_items: Number of unique items
        - n_factors: Number of embedding factors
        - y_range: Range of rating values
        """

    def forward(self, users, items):
        """
        Forward pass computing predictions.

        Parameters:
        - users: User ID tensor
        - items: Item ID tensor

        Returns:
        - Predicted ratings tensor
        """

class EmbeddingNN(nn.Module):
    """
    Neural collaborative filtering model.
    Uses neural network on concatenated user and item embeddings.
    """

    def __init__(self, n_users, n_items, n_factors, layers, y_range=(0, 5.5),
                 use_bn=True, emb_drop=0.0, lin_drop=0.0):
        """
        Initialize neural collaborative filtering model.

        Parameters:
        - n_users: Number of unique users
        - n_items: Number of unique items
        - n_factors: Number of embedding factors
        - layers: List of hidden layer sizes
        - y_range: Range of rating values
        - use_bn: Use batch normalization
        - emb_drop: Embedding dropout probability
        - lin_drop: Linear layer dropout probability
        """

    def forward(self, users, items):
        """Forward pass through neural network."""

class CollabTensorBias(TensorBase):
    """Tensor class for collaborative filtering with bias terms."""

    def show(self, ctx=None, **kwargs): ...
```
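The prediction rule described for `EmbeddingDotBias` can be sketched without any framework: the raw score is the dot product of the user and item factor vectors plus the two bias terms. This is a plain-Python illustration with made-up factor values, not the model's actual implementation, and `predict_dot_bias` is a hypothetical name.

```python
def predict_dot_bias(user_vec, item_vec, user_bias, item_bias):
    """Raw score: dot(user factors, item factors) + user bias + item bias."""
    dot = sum(u * i for u, i in zip(user_vec, item_vec))
    return dot + user_bias + item_bias

# Made-up 3-factor embeddings for one user and one item.
user_vec, user_bias = [0.5, -1.0, 2.0], 0.25
item_vec, item_bias = [1.0, 0.5, 0.5], -0.5

score = predict_dot_bias(user_vec, item_vec, user_bias, item_bias)
print(score)  # 0.75
```

A `y_range`, when given, would typically be applied to this raw score afterwards. The neural variant replaces the dot product by feeding the concatenated user and item vectors through hidden layers.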

### Collaborative Filtering Utilities

Utility functions for creating learners and models and for splitting recommendation data.

```python { .api }
def get_collab_learner(dls, n_factors=50, use_nn=False, **kwargs):
    """Create collaborative filtering learner (alias for collab_learner)."""

def collab_config(**kwargs):
    """Get default configuration for collaborative filtering models."""

def split_collab_data(df, valid_pct=0.2, seed=None, user_name=None,
                      item_name=None, rating_name=None):
    """
    Split collaborative filtering data maintaining user/item coverage.

    Parameters:
    - df: DataFrame with ratings data
    - valid_pct: Fraction of the data held out for validation
    - seed: Random seed
    - user_name: User column name
    - item_name: Item column name
    - rating_name: Rating column name

    Returns:
    - (train_df, valid_df)
    """

def collab_bias(n_factors=50, y_range=None):
    """Create embedding dot bias model."""

def collab_nn(n_factors=50, layers=None, y_range=None, **kwargs):
    """Create neural collaborative filtering model."""
```
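"Maintaining user/item coverage" means that every user and item appearing in the validation set also appears in the training set. The sketch below shows one possible strategy in plain Python. It is an assumption about how such a split could work, not the behavior of `split_collab_data`, and `split_with_coverage` is a hypothetical name operating on `(user, item, rating)` tuples rather than a DataFrame.

```python
import random

def split_with_coverage(rows, valid_pct=0.2, seed=None):
    """Split rows so every user and item in validation also occurs in training.

    Assumed strategy: shuffle, force the first row seen for each user or item
    into the training set, then fill validation from the remaining rows.
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)

    n_valid = int(len(shuffled) * valid_pct)
    seen_users, seen_items = set(), set()
    train, valid = [], []
    for user, item, rating in shuffled:
        is_new = user not in seen_users or item not in seen_items
        seen_users.add(user)
        seen_items.add(item)
        if is_new or len(valid) >= n_valid:
            train.append((user, item, rating))
        else:
            valid.append((user, item, rating))
    return train, valid

# 10 users x 8 items, every pair rated once (synthetic data).
rows = [(u, i, float((u + i) % 5 + 1)) for u in range(10) for i in range(8)]
train, valid = split_with_coverage(rows, valid_pct=0.2, seed=42)
print(len(train), len(valid))  # 64 16
```

On very sparse data the validation set can come out smaller than requested, because rows that introduce a new user or item are always kept for training.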

### Recommendation Metrics

Specialized metrics for evaluating recommendation systems.

```python { .api }
def mean_reciprocal_rank(pred, targ, k=None):
    """
    Mean Reciprocal Rank metric for ranking evaluation.

    Parameters:
    - pred: Predicted rankings
    - targ: Target rankings
    - k: Top-k cutoff

    Returns:
    - MRR score
    """

def ndcg(pred, targ, k=None):
    """
    Normalized Discounted Cumulative Gain.

    Parameters:
    - pred: Predicted rankings
    - targ: Target rankings
    - k: Top-k cutoff

    Returns:
    - NDCG score
    """

def precision_at_k(pred, targ, k=10):
    """Precision at K metric."""

def recall_at_k(pred, targ, k=10):
    """Recall at K metric."""

def hit_rate_at_k(pred, targ, k=10):
    """Hit rate at K metric."""
```
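To make the definitions concrete, the sketch below computes these metrics for a single user's ranked list using the standard textbook formulas. The function names and argument shapes (a ranked list of item IDs plus a set or dict of relevant items) are assumptions chosen for illustration and differ from the tensor-based signatures above.

```python
import math

def reciprocal_rank(ranked, relevant, k=None):
    """1 / rank of the first relevant item in the top-k list, or 0.0 if none."""
    for rank, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def precision_recall_hit_at_k(ranked, relevant, k=10):
    """Return (precision@k, recall@k, hit@k) for one user's ranked list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    recall = hits / len(relevant) if relevant else 0.0
    return hits / k, recall, float(hits > 0)

def ndcg_at_k(ranked, relevance, k=None):
    """NDCG with graded relevance given as {item: gain}; unlisted items gain 0."""
    def dcg(gains):
        return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))
    gains = [relevance.get(item, 0.0) for item in ranked[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(gains) / best if best > 0 else 0.0

ranked, relevant = ["a", "b", "c", "d"], {"b", "d"}
print(reciprocal_rank(ranked, relevant))                   # 0.5
print(precision_recall_hit_at_k(ranked, relevant, k=2))    # (0.5, 0.5, 1.0)
print(round(ndcg_at_k(ranked, {"b": 1.0, "d": 1.0}), 3))   # 0.651

# MRR averages the per-user reciprocal ranks.
per_user = [reciprocal_rank(ranked, relevant), reciprocal_rank(["x", "y"], {"x"})]
print(sum(per_user) / len(per_user))                       # 0.75
```

Hit rate over a dataset is the mean of the per-user hit values, in the same way that MRR is the mean of the per-user reciprocal ranks.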

### Advanced Collaborative Filtering

Additional model architectures and helpers for improving recommendation quality, including deep factorization machines, neural matrix factorization, and a bias-only learner.

```python { .api }
class DeepFactorizationMachine(nn.Module):
    """Deep Factorization Machine for collaborative filtering."""

    def __init__(self, field_dims, embed_dim, mlp_dims, dropout): ...

class NeuralMatrixFactorization(nn.Module):
    """Neural Matrix Factorization combining MF and MLP."""

    def __init__(self, n_users, n_items, n_factors, layers): ...

def create_loaders_collab(df, valid_pct=0.2, seed=None, **kwargs):
    """Create data loaders for collaborative filtering with advanced options."""

def bias_learner(dls, n_factors=50, **kwargs):
    """Create learner with bias-only model (no user/item factors)."""

class CollabLine:
    """Single interaction representation for collaborative filtering."""

    def __init__(self, cats, classes, names): ...
    def show(self): ...
```
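A bias-only model predicts a rating from a global average plus a per-user and a per-item offset, with no latent factors. A learner would normally fit these offsets by gradient descent. The sketch below instead estimates them with simple averages to stay dependency-free, so it should be read as an illustration of the model form rather than of `bias_learner` itself. The names `fit_bias_baseline` and `predict_bias` are hypothetical.

```python
from collections import defaultdict

def fit_bias_baseline(rows):
    """Estimate the global mean, per-user offsets, then per-item residual offsets."""
    mu = sum(r for _, _, r in rows) / len(rows)

    user_sum, user_n = defaultdict(float), defaultdict(int)
    for u, _, r in rows:
        user_sum[u] += r - mu
        user_n[u] += 1
    user_bias = {u: user_sum[u] / user_n[u] for u in user_sum}

    item_sum, item_n = defaultdict(float), defaultdict(int)
    for u, i, r in rows:
        item_sum[i] += r - mu - user_bias[u]
        item_n[i] += 1
    item_bias = {i: item_sum[i] / item_n[i] for i in item_sum}
    return mu, user_bias, item_bias

def predict_bias(mu, user_bias, item_bias, user, item):
    """Unknown users or items fall back to a zero offset."""
    return mu + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)

# Made-up ratings: user "a" rates higher than "b", item "x" higher than "y".
rows = [("a", "x", 5.0), ("a", "y", 3.0), ("b", "x", 4.0), ("b", "y", 2.0)]
mu, user_bias, item_bias = fit_bias_baseline(rows)
print(predict_bias(mu, user_bias, item_bias, "a", "x"))  # 5.0
print(predict_bias(mu, user_bias, item_bias, "c", "y"))  # 2.5
```

Such a model is a useful reference point: a factor model that cannot beat it is not learning anything beyond average user and item tendencies.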

### Cold Start and Content-Based Extensions

Approaches for handling new users/items and incorporating content features.

```python { .api }
class HybridModel(nn.Module):
    """
    Hybrid model combining collaborative and content-based filtering.
    Handles cold start problem by incorporating item/user features.
    """

    def __init__(self, n_users, n_items, n_factors, content_features=None,
                 layers=None): ...

def content_based_features(items_df, feature_cols=None):
    """
    Extract content-based features from item metadata.

    Parameters:
    - items_df: DataFrame with item information
    - feature_cols: Columns to use as features

    Returns:
    - Feature matrix for items
    """

def popularity_baseline(train_df, user_col='user', item_col='item',
                        rating_col='rating'):
    """
    Create popularity-based baseline recommendations.

    Parameters:
    - train_df: Training data
    - user_col: User column name
    - item_col: Item column name
    - rating_col: Rating column name

    Returns:
    - Popularity scores for items
    """
```
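A popularity baseline needs no per-user history, which is why it is a common fallback for brand-new users. The sketch below ranks items by interaction count and breaks ties by mean rating. That scoring rule is an assumption, since the reference above does not say how popularity is computed, and the function names and tuple-based input are illustrative rather than the `popularity_baseline` signature.

```python
from collections import defaultdict

def popularity_scores(rows):
    """Return (item, count, mean_rating) sorted by count, then mean rating."""
    counts, totals = defaultdict(int), defaultdict(float)
    for _, item, rating in rows:
        counts[item] += 1
        totals[item] += rating
    scored = [(item, counts[item], totals[item] / counts[item]) for item in counts]
    # Most interactions first; ties broken by higher mean rating, then item ID.
    return sorted(scored, key=lambda s: (-s[1], -s[2], s[0]))

def recommend_popular(rows, user, k=10):
    """Top-k popular items that the user has not interacted with yet."""
    seen = {i for u, i, _ in rows if u == user}
    return [item for item, _, _ in popularity_scores(rows) if item not in seen][:k]

# Made-up (user, item, rating) training rows.
rows = [("u1", "a", 4.0), ("u2", "a", 5.0), ("u3", "a", 3.0),
        ("u1", "b", 5.0), ("u2", "b", 4.0), ("u3", "c", 2.0)]

print(popularity_scores(rows))               # [('a', 3, 4.0), ('b', 2, 4.5), ('c', 1, 2.0)]
print(recommend_popular(rows, "u3", k=2))    # ['b']
print(recommend_popular(rows, "new", k=2))   # ['a', 'b']
```

A user with no history simply receives the global top-k, while known users have already-seen items filtered out.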