UTF-8 String

The golang.org/x/exp/utf8string package provides an efficient way to index strings by rune (Unicode code point) rather than by byte. This is particularly useful for UTF-8 encoded text, where a single character may occupy several bytes and indexing a standard Go string yields individual bytes rather than characters.

Package Information

  • Package Name: golang.org/x/exp/utf8string
  • Package Type: Go package (part of the golang.org/x/exp module)
  • Language: Go
  • Import Path: golang.org/x/exp/utf8string

Core Imports

import "golang.org/x/exp/utf8string"

Basic Usage

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	// Create a new UTF-8 string
	s := utf8string.NewString("Hello, 世界!")

	// Get the number of runes (not bytes)
	fmt.Println("Rune count:", s.RuneCount()) // Output: 10 (H, e, l, l, o, comma, space, 世, 界, !)

	// Index by rune position (not byte position)
	r := s.At(7)
	fmt.Printf("Rune at position 7: %c\n", r) // Output: 世

	// Get a slice by rune positions
	slice := s.Slice(0, 5)
	fmt.Println("Slice [0:5]:", slice) // Output: Hello

	// Get the full string
	fmt.Println("Full string:", s.String()) // Output: Hello, 世界!

	// Check if the string is ASCII only
	fmt.Println("Is ASCII:", s.IsASCII()) // Output: false
}

Architecture

The String type wraps a regular Go string and provides efficient rune-based indexing by keeping internal state about the position it last visited. Key characteristics:

  • Rune-based indexing: Access characters by rune position instead of byte position
  • Incremental scanning: Efficient forward and backward scanning with O(1) cost per operation
  • Random access: O(N) in the length of the string, but with less overhead than always scanning from the beginning
  • ASCII optimization: O(1) random access for pure ASCII strings
  • Mutable state: The String type maintains internal state and is not thread-safe
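
The characteristics above can be illustrated with a minimal standard-library sketch. The cursorString type and its at method are hypothetical stand-ins written for this illustration, not the package's actual implementation. The sketch assumes the index is always in range and shows why stepping to a neighbouring rune is cheap once a position has been cached.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// cursorString is a hypothetical stand-in for utf8string.String.
// It remembers the last rune index visited and that rune's byte offset.
type cursorString struct {
	str     string
	runePos int // rune index of the cached position
	bytePos int // byte offset of the rune at runePos
}

// at returns the rune at rune index i, walking from the cached position.
// It assumes 0 <= i < utf8.RuneCountInString(c.str).
func (c *cursorString) at(i int) rune {
	// Step forward one rune at a time until the target index is reached.
	for c.runePos < i {
		_, size := utf8.DecodeRuneInString(c.str[c.bytePos:])
		c.bytePos += size
		c.runePos++
	}
	// Step backward one rune at a time if the target lies behind the cursor.
	for c.runePos > i {
		_, size := utf8.DecodeLastRuneInString(c.str[:c.bytePos])
		c.bytePos -= size
		c.runePos--
	}
	r, _ := utf8.DecodeRuneInString(c.str[c.bytePos:])
	return r
}

func main() {
	c := &cursorString{str: "Hello, 世界!"}
	fmt.Printf("%c\n", c.at(7)) // walks forward from index 0
	fmt.Printf("%c\n", c.at(8)) // one step forward from the cached position
	fmt.Printf("%c\n", c.at(6)) // two steps backward from the cached position
}
```

Because every call moves the cached cursor, two goroutines sharing one value would interfere with each other, which is the same reason the real String type is not thread-safe.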

Capabilities

String Creation

Create a new UTF-8 string with rune-based indexing capabilities.

func NewString(contents string) *String

Creates a new String instance that wraps the provided UTF-8 string, enabling efficient rune-based indexing and operations.

Parameters:

  • contents string - The UTF-8 encoded string to wrap

Returns:

  • *String - A pointer to a newly created String instance

String Initialization

Initialize an existing String structure with new contents.

func (s *String) Init(contents string) *String

Initializes an existing String to hold the provided contents. Useful for reusing a String instance with different content.

Parameters:

  • contents string - The UTF-8 encoded string to store

Returns:

  • *String - A pointer to the initialized String

Rune Access

Retrieve a single rune at a specific index position.

func (s *String) At(i int) rune

Returns the rune at the specified index. The sequence of runes is the same as iterating over the contents with a for range clause.

Parameters:

  • i int - The zero-based rune index

Returns:

  • rune - The rune (Unicode code point) at position i
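
To make the for range equivalence concrete, here is a standard-library sketch. The runeAt helper is hypothetical and only mirrors the ordering described above. It reports an out-of-range index through a second return value, which is a choice made for this sketch rather than a statement about how At behaves.

```go
package main

import "fmt"

// runeAt is a hypothetical helper that returns the i-th rune of s by
// iterating with for range. ok is false when i is out of range.
func runeAt(s string, i int) (r rune, ok bool) {
	n := 0
	for _, c := range s {
		if n == i {
			return c, true
		}
		n++
	}
	return 0, false
}

func main() {
	r, ok := runeAt("Hello, 世界!", 7)
	fmt.Printf("%c %v\n", r, ok) // 世 true
}
```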

Rune Counting

Get the total number of runes in the string.

func (s *String) RuneCount() int

Returns the number of runes (Unicode code points) in the String. For a string "Hello, 世界", this returns 9, not the byte length.

Returns:

  • int - The number of runes in the string
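
The difference between rune count and byte length can be checked with the standard library alone. The counts helper below is a hypothetical name used only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts is a hypothetical helper returning the byte length and the
// rune count of s.
func counts(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := counts("Hello, 世界")
	fmt.Println(b, r) // 13 9: each of 世 and 界 takes 3 bytes in UTF-8
}
```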

String Slicing

Extract a substring using rune positions.

func (s *String) Slice(i, j int) string

Returns the string sliced at rune positions [i:j], similar to string slicing in Go but using rune indices instead of byte indices.

Parameters:

  • i int - The starting rune index (inclusive)
  • j int - The ending rune index (exclusive)

Returns:

  • string - The sliced substring
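
As a rough illustration of what slicing by rune position involves, the sketch below converts rune indices to byte offsets and then slices the underlying string. The sliceRunes helper is hypothetical and assumes 0 <= i <= j <= rune count. It is not the package's implementation.

```go
package main

import "fmt"

// sliceRunes is a hypothetical helper that returns s[i:j] where i and j
// are rune indices. It assumes 0 <= i <= j <= number of runes in s.
func sliceRunes(s string, i, j int) string {
	// Default both offsets to len(s) so an index equal to the rune count works.
	start, end := len(s), len(s)
	n := 0
	for off := range s { // off is the byte offset of the n-th rune
		if n == i {
			start = off
		}
		if n == j {
			end = off
			break
		}
		n++
	}
	return s[start:end]
}

func main() {
	fmt.Println(sliceRunes("Hello, 世界!", 7, 9)) // 世界
	fmt.Println(sliceRunes("Hello, 世界!", 0, 5)) // Hello
}
```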

ASCII Detection

Check if the string contains only ASCII characters.

func (s *String) IsASCII() bool

Returns a boolean indicating whether the String contains only ASCII bytes. This is useful for optimization decisions since ASCII strings have O(1) random access.

Returns:

  • bool - true if the string contains only ASCII characters, false otherwise
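
An equivalent check can be sketched with the standard library. The isASCII helper below is hypothetical. It relies on the fact that every byte below utf8.RuneSelf (0x80) encodes a single-byte rune, so in a pure ASCII string the rune index and the byte index coincide, which is what makes O(1) random access possible.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII is a hypothetical helper reporting whether every byte of s is
// below utf8.RuneSelf, meaning each byte is a complete ASCII character.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("Hello"))       // true
	fmt.Println(isASCII("Hello, 世界")) // false
}
```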

String Representation

Get the underlying string value.

func (s *String) String() string

Returns the full contents of the String. This method makes the String type directly printable by fmt.Print and other formatting functions.

Returns:

  • string - The complete string contents

Types

type String struct {
	// Has unexported fields
}

String wraps a regular string with internal state that provides efficient indexing by code point (rune) index, as opposed to byte index.

Performance Characteristics:

  • Incremental scanning (forward or backward): O(1) per index operation (though not as fast as a range clause going forwards)
  • Random access: O(N) in the length of the string, but with lower overhead than always scanning from the beginning
  • ASCII optimization: If the string contains only ASCII, random access is O(1)
  • Thread-safety: Unlike the built-in string type, String has internal mutable state and is not thread-safe
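
One possible way to share a single indexer between goroutines is to serialize access with a mutex. The sketch below is an assumption-laden illustration: lockedIndexer, runeIndexer, and runeSlice are hypothetical names, and runeSlice is a standard-library stand-in used so the example runs without the package. A *utf8string.String could be supplied as the inner value, since its At method has the same signature.

```go
package main

import (
	"fmt"
	"sync"
)

// runeIndexer is the only behaviour this sketch needs from the wrapped value.
type runeIndexer interface {
	At(i int) rune
}

// lockedIndexer is a hypothetical wrapper that serializes calls to At.
type lockedIndexer struct {
	mu    sync.Mutex
	inner runeIndexer
}

// At forwards to the wrapped indexer while holding the lock.
func (l *lockedIndexer) At(i int) rune {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.inner.At(i)
}

// runeSlice is a stand-in indexer used only to keep the sketch self-contained.
type runeSlice []rune

func (r runeSlice) At(i int) rune { return r[i] }

func main() {
	idx := &lockedIndexer{inner: runeSlice([]rune("Hello, 世界!"))}

	results := make([]rune, 10) // the text has 10 runes
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = idx.At(i) // each goroutine writes its own element
		}(i)
	}
	wg.Wait()
	fmt.Println(string(results)) // Hello, 世界!
}
```

Giving each goroutine its own String created with NewString avoids locking altogether and is often the simpler choice, since the underlying Go string is immutable and cheap to share.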

Usage Examples

Example 1: Indexing Multi-Byte Characters

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	// Using a standard Go string with multi-byte characters
	standardStr := "こんにちは" // Japanese text (5 characters, 15 bytes)
	fmt.Printf("Standard string byte length: %d\n", len(standardStr)) // Output: 15

	// Using UTF-8 String for efficient rune indexing
	utf8Str := utf8string.NewString("こんにちは")
	fmt.Printf("UTF-8 String rune count: %d\n", utf8Str.RuneCount()) // Output: 5

	// Access runes by position
	for i := 0; i < utf8Str.RuneCount(); i++ {
		fmt.Printf("Position %d: %c\n", i, utf8Str.At(i))
	}
}

Example 2: Comparing ASCII and Non-ASCII Strings

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	// ASCII string
	asciiStr := utf8string.NewString("Hello")
	fmt.Printf("'%s' is ASCII: %v\n", asciiStr.String(), asciiStr.IsASCII()) // true

	// Non-ASCII string
	mixedStr := utf8string.NewString("Hello, 世界")
	fmt.Printf("'%s' is ASCII: %v\n", mixedStr.String(), mixedStr.IsASCII()) // false
}

Example 3: String Slicing with Rune Indices

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	original := utf8string.NewString("The quick brown fox")

	// Slice by rune positions
	words := original.Slice(4, 9) // "quick"
	fmt.Println(words) // Output: quick

	// Standard byte slicing gives the same result here because the text is pure ASCII
	standardStr := "The quick brown fox"
	fmt.Println(standardStr[4:9]) // Output: quick (byte and rune indices differ only when multi-byte characters are involved)
}

Example 4: Reusing String Instances

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	// Create a String instance
	s := utf8string.NewString("First string")
	fmt.Printf("Rune count: %d\n", s.RuneCount()) // Output: 12

	// Reuse the instance with new content
	s.Init("Second string")
	fmt.Printf("Rune count: %d\n", s.RuneCount()) // Output: 13
	fmt.Println(s.String()) // Output: Second string
}

Example 5: Comparison of Indexing Methods

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
	"unicode/utf8"
)

func main() {
	text := "Café" // 4 runes: 3 bytes for "Caf" plus 2 bytes for "é" (assuming precomposed U+00E9), 5 bytes in total

	// Standard Go string indexing (by bytes)
	fmt.Printf("Standard string length: %d bytes\n", len(text)) // 5

	// Using the unicode/utf8 package to count runes
	runeCount := utf8.RuneCountInString(text)
	fmt.Printf("Rune count (manual): %d runes\n", runeCount) // 4

	// Using UTF-8 String for easy rune access
	utf8Str := utf8string.NewString(text)
	fmt.Printf("UTF-8 String rune count: %d runes\n", utf8Str.RuneCount()) // 4

	// Easy access to specific runes
	for i := 0; i < utf8Str.RuneCount(); i++ {
		fmt.Printf("Rune %d: %c\n", i, utf8Str.At(i))
	}
}