
tessl/pypi-wordcloud

A little word cloud generator for creating visually appealing word clouds from text data.


Core Word Cloud Generation

Primary functionality for creating and customizing word clouds, including text-to-visual conversion, frequency-based generation, layout control, and multiple output format support.

Capabilities

WordCloud Class

The main class for generating word clouds with comprehensive customization options for appearance, layout, text processing, and output formatting.

class WordCloud:
    def __init__(
        self,
        font_path=None,
        width=400,
        height=200,
        margin=2,
        ranks_only=None,
        prefer_horizontal=0.9,
        mask=None,
        contour_width=0,
        contour_color='black',
        scale=1,
        color_func=None,
        colormap=None,
        max_words=200,
        min_font_size=4,
        font_step=1,
        stopwords=None,
        random_state=None,
        background_color='black',
        max_font_size=None,
        mode="RGB",
        relative_scaling='auto',
        regexp=None,
        collocations=True,
        normalize_plurals=True,
        repeat=False,
        include_numbers=False,
        min_word_length=0,
        collocation_threshold=30
    ):
        """
        Initialize WordCloud object with customization parameters.

        Parameters:
        - font_path (str, optional): Path to font file (OTF or TTF)
        - width (int): Canvas width in pixels (default: 400)
        - height (int): Canvas height in pixels (default: 200)
        - margin (int): Spacing around words in pixels (default: 2)
        - ranks_only (optional): DEPRECATED parameter with no effect, use relative_scaling instead
        - prefer_horizontal (float): Ratio of horizontal vs vertical placement attempts (0.0-1.0, default: 0.9)
        - mask (numpy.ndarray, optional): Mask defining where words may be drawn; pure white entries are masked out, and the mask's shape overrides width and height
        - contour_width (float): Width of mask contour in pixels (default: 0)
        - contour_color (str): Color of mask contour (default: 'black')
        - scale (float): Scaling factor between computation and drawing (default: 1)
        - color_func (callable, optional): Custom color generation function
        - colormap (str, optional): Matplotlib colormap name; 'viridis' is used when both colormap and color_func are None, and colormap is ignored if color_func is given
        - max_words (int): Maximum number of words to display (default: 200)
        - min_font_size (int): Minimum font size in pixels (default: 4)
        - font_step (int): Font size step increment (default: 1)
        - stopwords (set, optional): Words to exclude; if None, the built-in STOPWORDS set is used (ignored by generate_from_frequencies)
        - random_state (int or Random, optional): Random seed for reproducibility
        - background_color (str): Background color (default: 'black')
        - max_font_size (int, optional): Maximum font size in pixels
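        - mode (str): Image mode 'RGB' or 'RGBA' (default: 'RGB'); with 'RGBA' and background_color=None the background is transparent
        - relative_scaling (float or str): Importance of word frequencies for sizing, from 0.0 (rank only) to 1.0 (size proportional to frequency); 'auto' (default) resolves to 0.5, or to 0 when repeat=True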
        - regexp (str, optional): Regular expression for tokenization
        - collocations (bool): Whether to include bigrams (default: True)
        - normalize_plurals (bool): Whether to normalize plural forms (default: True)
        - repeat (bool): Whether to repeat words and phrases until max_words or min_font_size is reached (default: False)
        - include_numbers (bool): Whether to include numbers (default: False)
        - min_word_length (int): Minimum word length to include (default: 0)
        - collocation_threshold (int): Minimum Dunning likelihood score a bigram must exceed to be counted as a collocation (default: 30)
        """
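
To make the `relative_scaling` parameter more concrete, here is a minimal sketch of how a value between 0 and 1 can blend rank-based and frequency-based sizing. The helper `next_font_size` is hypothetical and written only for illustration; it reflects the documented behaviour (0 means rank only, 1 means size proportional to frequency) and should not be read as the library's actual code.

```python
def next_font_size(font_size, freq, last_freq, relative_scaling):
    """Hypothetical helper: pick the starting font size for the next word.

    font_size        -- size used for the previous (more frequent) word
    freq, last_freq  -- frequency of this word and of the previous word
    relative_scaling -- 0 ignores frequencies, 1 scales linearly with them
    """
    if relative_scaling == 0:
        # Rank only: keep the current size; a layout loop would shrink it
        # only when the word does not fit.
        return font_size
    ratio = freq / last_freq
    blended = relative_scaling * ratio + (1 - relative_scaling)
    return int(round(blended * font_size))


# A word half as frequent as the previous one, under three settings.
sizes = {rs: next_font_size(100, 0.5, 1.0, rs) for rs in (0, 0.5, 1)}
print(sizes)
```

With `relative_scaling=1` the half-as-frequent word gets half the size; with `0.5` the drop is softened, which is why that value is a reasonable middle ground.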

Text-Based Generation

Generate word clouds directly from text strings with automatic tokenization and frequency calculation.

def generate(self, text):
    """
    Generate word cloud from text string.

    Parameters:
    - text (str): Input text for word cloud generation

    Returns:
    - WordCloud: Self for method chaining
    """

def generate_from_text(self, text):
    """
    Generate word cloud from text string (alias for generate).

    Parameters:
    - text (str): Input text for word cloud generation

    Returns:
    - WordCloud: Self for method chaining
    """

Frequency-Based Generation

Generate word clouds from pre-calculated word frequency dictionaries for precise control over word importance.

def generate_from_frequencies(self, frequencies, max_font_size=None):
    """
    Generate word cloud from word frequency dictionary.

    Parameters:
    - frequencies (dict): Dictionary mapping words to frequencies
    - max_font_size (int, optional): Override maximum font size

    Returns:
    - WordCloud: Self for method chaining
    """

def fit_words(self, frequencies):
    """
    Generate word cloud from word frequencies (alias for generate_from_frequencies).

    Parameters:
    - frequencies (dict): Dictionary mapping words to frequencies

    Returns:
    - WordCloud: Self for method chaining
    """
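
The `words_` attribute is described as holding normalized frequencies. The sketch below shows one plausible normalization: rank by frequency, keep at most `max_words` entries, and divide by the largest value. `normalize_frequencies` is a hypothetical stand-alone function, an assumption about the general approach rather than the library's implementation.

```python
def normalize_frequencies(frequencies, max_words=200):
    """Hypothetical helper: rank, truncate, and scale so the top word is 1.0."""
    ranked = sorted(frequencies.items(), key=lambda item: item[1], reverse=True)
    ranked = ranked[:max_words]
    if not ranked:
        raise ValueError("at least one word is required")
    top = float(ranked[0][1])
    return {word: count / top for word, count in ranked}


normalized = normalize_frequencies(
    {'python': 10, 'data': 8, 'science': 6, 'analysis': 4},
    max_words=3,
)
print(normalized)
```

Because only the ratios matter after this step, raw counts, percentages, or TF-IDF weights can all serve as input frequencies.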

Text Processing

Extract and process word frequencies from text with customizable tokenization and filtering.

def process_text(self, text):
    """
    Process text and return word frequencies.

    Parameters:
    - text (str): Input text to process

    Returns:
    - dict: Dictionary mapping words to frequencies
    """
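
As a rough illustration of what this kind of text processing involves, the following sketch tokenizes with a regular expression, drops stopwords, short words, and pure numbers, and counts what remains. It is a simplified stand-in: `count_words` and its default pattern are assumptions made for this example, it lowercases every token, and it skips bigram detection and plural normalization entirely.

```python
import re
from collections import Counter


def count_words(text, stopwords=frozenset(), min_word_length=0,
                include_numbers=False, regexp=r"\w[\w']+"):
    """Hypothetical, simplified word counter (not the library's process_text)."""
    counts = Counter()
    for token in re.findall(regexp, text):
        word = token.lower()
        if word in stopwords:
            continue  # filtered as a stopword
        if len(word) < min_word_length:
            continue  # too short to keep
        if not include_numbers and word.isdigit():
            continue  # pure numbers are skipped unless requested
        counts[word] += 1
    return dict(counts)


result = count_words(
    "The cat and the hat. The cat sat in 2024.",
    stopwords={"the", "and", "in"},
)
print(result)
```

A dictionary produced this way has the same shape as the one `generate_from_frequencies` accepts.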

Output Generation

Convert generated word clouds to various output formats for display, saving, or further processing.

def to_image(self):
    """
    Convert word cloud to PIL Image object.

    Returns:
    - PIL.Image: Word cloud as PIL Image
    """

def to_array(self):
    """
    Convert word cloud to numpy array.

    Returns:
    - numpy.ndarray: Word cloud as RGB array
    """

def __array__(self):
    """
    Support numpy array conversion.

    Returns:
    - numpy.ndarray: Word cloud as RGB array
    """

def to_file(self, filename):
    """
    Save word cloud to image file.

    Parameters:
    - filename (str): Output file path (supports PNG, JPEG, etc.)

    Returns:
    - WordCloud: Self for method chaining
    """

def to_svg(self, embed_font=False, optimize_embedded_font=True, embed_image=False):
    """
    Export word cloud as SVG format.

    Parameters:
    - embed_font (bool): Whether to embed font data (default: False)
    - optimize_embedded_font (bool): Whether to optimize embedded font (default: True)
    - embed_image (bool): Whether to include a rasterized image of the cloud inside the SVG (default: False)

    Returns:
    - str: SVG markup string
    """

Styling and Recoloring

Modify colors and appearance of existing word clouds without regenerating layout.

def recolor(self, random_state=None, color_func=None, colormap=None):
    """
    Recolor existing word cloud with new color scheme.

    Parameters:
    - random_state (int or Random, optional): Random seed for color generation
    - color_func (callable, optional): Custom color function
    - colormap (str, optional): Matplotlib colormap name

    Returns:
    - WordCloud: Self for method chaining
    """
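
A custom color function receives details about each word and returns a color string that PIL understands. The sketch below is one possible greyscale function; the name `grey_color_func` and the lightness range are illustrative choices, and the function assumes `random_state`, when supplied, is a `random.Random` instance.

```python
import random


def grey_color_func(word, font_size, position, orientation,
                    random_state=None, **kwargs):
    """Return a random shade of light grey as an HSL color string."""
    rng = random_state if random_state is not None else random.Random()
    lightness = rng.randint(60, 100)  # illustrative range, tune to taste
    return "hsl(0, 0%, {}%)".format(lightness)


# Called directly here so the example runs without generating a cloud.
color = grey_color_func("python", 40, (0, 0), None,
                        random_state=random.Random(42))
print(color)
```

With an already generated cloud `wc`, the function would be applied as `wc.recolor(color_func=grey_color_func, random_state=3)`. Accepting `**kwargs` keeps the function tolerant of extra keyword arguments such as `font_path`.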

Word Cloud Attributes

Properties available after generation containing word and layout information.

words_: dict[str, float]  # Word frequencies (normalized)
layout_: list[tuple]      # Layout data: ((word, frequency), font_size, position, orientation, color)
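
The entries of `layout_` can be unpacked in a loop. The sketch below uses made-up sample data in place of a real cloud and assumes that the first element is a `(word, frequency)` pair and that `orientation` is `None` for horizontal words and a PIL transpose constant for vertical ones. Both the sample values and the helper `describe_layout` are hypothetical.

```python
# Made-up entries shaped like layout_ items; not output from a real run.
sample_layout = [
    (("python", 1.0), 96, (20, 35), None, "rgb(68, 1, 84)"),
    (("data", 0.8), 61, (110, 12), 2, "rgb(33, 145, 140)"),
]


def describe_layout(layout):
    """Summarize each placed word as (word, font_size, direction)."""
    rows = []
    for (word, freq), font_size, position, orientation, color in layout:
        direction = "horizontal" if orientation is None else "vertical"
        rows.append((word, font_size, direction))
    return rows


summary = describe_layout(sample_layout)
for row in summary:
    print(row)
```

With a real cloud, `describe_layout(wc.layout_)` would give a quick overview of what was placed and at what size.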

Usage Examples

Basic Text Generation

from wordcloud import WordCloud

# Create word cloud from text
wc = WordCloud(width=800, height=400, background_color='white')
wc.generate("Python is great for data science and machine learning")

# Save result
wc.to_file('wordcloud.png')

Frequency-Based Generation

from wordcloud import WordCloud

# Use custom word frequencies
frequencies = {'python': 10, 'data': 8, 'science': 6, 'analysis': 4}
wc = WordCloud().generate_from_frequencies(frequencies)
image = wc.to_image()

Masked Shape Generation

from wordcloud import WordCloud
import numpy as np
from PIL import Image

# Load mask image
mask_image = np.array(Image.open('mask.png'))

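# Text to render (placeholder; substitute your own corpus)
text = "Python is great for data science and machine learning"

# Generate word cloud in custom shape
wc = WordCloud(mask=mask_image, contour_width=2, contour_color='blue')
wc.generate(text)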
wc.to_file('shaped_wordcloud.png')

Install with Tessl CLI

npx tessl i tessl/pypi-wordcloud
