UTF-8 String

The golang.org/x/exp/utf8string package provides an efficient way to index strings by rune (Unicode code point) rather than by byte. This is particularly useful for UTF-8 encoded text that contains multi-byte characters, because indexing and slicing a standard Go string operate on bytes, not runes.

Package Information

Package Name: golang.org/x/exp/utf8string
Package Type: Go module
Language: Go
Import Path: golang.org/x/exp/utf8string

Core Imports

import "golang.org/x/exp/utf8string"

Basic Usage

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	// Create a new UTF-8 string
	s := utf8string.NewString("Hello, 世界!")

	// Get the number of runes (not bytes)
	fmt.Println("Rune count:", s.RuneCount()) // Output: 10 (H, e, l, l, o, comma, space, 世, 界, !)

	// Index by rune position (not byte position)
	r := s.At(7)
	fmt.Printf("Rune at position 7: %c\n", r) // Output: 世

	// Get a slice by rune positions
	slice := s.Slice(0, 5)
	fmt.Println("Slice [0:5]:", slice) // Output: Hello

	// Get the full string
	fmt.Println("Full string:", s.String()) // Output: Hello, 世界!

	// Check if the string is ASCII only
	fmt.Println("Is ASCII:", s.IsASCII()) // Output: false
}

Architecture

The String type wraps a regular Go string and provides efficient rune-based indexing through an internal state machine. Key characteristics:

Rune-based indexing: Access characters by rune position instead of byte position
Incremental scanning: Efficient forward and backward scanning with O(1) cost per operation
Random access: O(N) complexity with optimizations for ASCII strings
ASCII optimization: O(1) random access for pure ASCII strings
Mutable state: The String type maintains internal state and is not thread-safe
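
The cost model in the list above can be illustrated with a small standard-library sketch. The runeCursor type and its at method are hypothetical names invented for this illustration; they are not the utf8string implementation. The sketch only shows why remembering the last visited position makes a step to a neighbouring index cheap, while a jump to a distant index costs time proportional to the distance.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeCursor is a hypothetical type for illustration only; it is not the
// utf8string implementation. It remembers where it last looked so that
// moving to a nearby rune index does not rescan the whole string.
type runeCursor struct {
	s       string
	runePos int // rune index of the cached position
	bytePos int // byte offset of the cached position
}

// at returns the rune at rune index i by walking from the cached position.
// It assumes 0 <= i < the number of runes in c.s.
func (c *runeCursor) at(i int) rune {
	// Step forward one rune at a time.
	for c.runePos < i {
		_, size := utf8.DecodeRuneInString(c.s[c.bytePos:])
		c.bytePos += size
		c.runePos++
	}
	// Step backward one rune at a time.
	for c.runePos > i {
		_, size := utf8.DecodeLastRuneInString(c.s[:c.bytePos])
		c.bytePos -= size
		c.runePos--
	}
	r, _ := utf8.DecodeRuneInString(c.s[c.bytePos:])
	return r
}

func main() {
	c := &runeCursor{s: "Hello, 世界!"}
	fmt.Printf("%c\n", c.at(7)) // walks 7 runes forward from the start
	fmt.Printf("%c\n", c.at(8)) // one step from the cached position
	fmt.Printf("%c\n", c.at(0)) // walks all the way back
}
```

Because the cursor mutates its own fields on every call, a type built this way cannot be shared between goroutines without synchronization, which is the same trade-off the list describes for String.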

Capabilities

String Creation

Create a new UTF-8 string with rune-based indexing capabilities.

func NewString(contents string) *String

Creates a new String instance that wraps the provided UTF-8 string, enabling efficient rune-based indexing and operations.

Parameters:

contents string - The UTF-8 encoded string to wrap

Returns:

*String - A pointer to a newly created String instance

String Initialization

Initialize an existing String structure with new contents.

func (s *String) Init(contents string) *String

Initializes an existing String to hold the provided contents. Useful for reusing a String instance with different content.

Parameters:

contents string - The UTF-8 encoded string to store

Returns:

*String - A pointer to the initialized String

Rune Access

Retrieve a single rune at a specific index position.

func (s *String) At(i int) rune

Returns the rune at the specified index. The sequence of runes is the same as iterating over the contents with a for range clause.

Parameters:

i int - The zero-based rune index

Returns:

rune - The rune (Unicode code point) at position i
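
Since At follows the order of a for range clause, it can help to see what that clause yields. The sketch below uses only the standard library; byteOffsets is a hypothetical helper written for this illustration. It shows that the index a range clause reports is a byte offset, so the rune index has to be counted separately.

```go
package main

import "fmt"

// byteOffsets returns the byte offset at which each rune of s starts,
// in the order a for range clause visits them. The position of an offset
// in the returned slice is the rune index.
func byteOffsets(s string) []int {
	var offs []int
	for off := range s {
		offs = append(offs, off)
	}
	return offs
}

func main() {
	s := "a世b"
	for i, off := range byteOffsets(s) {
		fmt.Printf("rune index %d starts at byte offset %d\n", i, off)
	}
}
```

Here 世 occupies three bytes, so the rune at index 2 starts at byte offset 4 rather than 2.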

Rune Counting

Get the total number of runes in the string.

func (s *String) RuneCount() int

Returns the number of runes (Unicode code points) in the String. For a string "Hello, 世界", this returns 9, not the byte length.

Returns:

int - The number of runes in the string
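
The difference between rune count and byte length can be checked with the standard library alone. In this sketch, counts is a hypothetical helper; utf8.RuneCountInString is the standard-library function that counts runes.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// counts returns the byte length and the rune count of s.
func counts(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	b, r := counts("Hello, 世界")
	// The seven ASCII characters take one byte each; 世 and 界 take three each.
	fmt.Printf("bytes: %d, runes: %d\n", b, r) // bytes: 13, runes: 9
}
```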

String Slicing

Extract a substring using rune positions.

func (s *String) Slice(i, j int) string

Returns the string sliced at rune positions [i:j], similar to string slicing in Go but using rune indices instead of byte indices.

Parameters:

i int - The starting rune index (inclusive)
j int - The ending rune index (exclusive)

Returns:

string - The sliced substring
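
To make the rune-index semantics concrete, here is a standard-library sketch of slicing by rune position. sliceRunes is a hypothetical helper, not the package's code, and its choice to panic on an invalid range is an assumption made for the sketch rather than a statement about how Slice behaves.

```go
package main

import "fmt"

// sliceRunes returns the substring of s covering rune indices [i:j).
// It translates the rune indices into byte offsets with one pass over s.
func sliceRunes(s string, i, j int) string {
	if i < 0 || j < i {
		panic("sliceRunes: invalid rune range")
	}
	start, end := -1, -1
	n := 0 // rune index of the rune currently being visited
	for off := range s {
		if n == i {
			start = off
		}
		if n == j {
			end = off
		}
		n++
	}
	// An index equal to the rune count refers to the end of the string.
	if i == n {
		start = len(s)
	}
	if j == n {
		end = len(s)
	}
	if start < 0 || end < 0 {
		panic("sliceRunes: rune index out of range")
	}
	return s[start:end]
}

func main() {
	s := "Hello, 世界"
	fmt.Println(sliceRunes(s, 0, 5)) // Hello
	fmt.Println(sliceRunes(s, 7, 9)) // 世界
}
```

A byte-based s[7:9] on the same string would cut 世 in the middle of its three-byte encoding, which is the situation rune-based slicing avoids.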

ASCII Detection

Check if the string contains only ASCII characters.

func (s *String) IsASCII() bool

Returns a boolean indicating whether the String contains only ASCII bytes. This is useful for optimization decisions since ASCII strings have O(1) random access.

Returns:

bool - true if the string contains only ASCII characters, false otherwise
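
The reason ASCII-only content allows O(1) random access is that every rune then occupies exactly one byte, so rune index and byte index coincide. A minimal standard-library sketch of such a check follows; isASCII is a hypothetical helper and is not claimed to be how the package performs its check.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isASCII reports whether every byte of s is below utf8.RuneSelf (0x80),
// the first byte value that cannot be a single-byte rune.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(isASCII("Hello"))       // true
	fmt.Println(isASCII("Hello, 世界")) // false
}
```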

String Representation

Get the underlying string value.

func (s *String) String() string

Returns the full contents of the String. This method makes the String type directly printable by fmt.Print and other formatting functions.

Returns:

string - The complete string contents

Types

type String struct {
	// Has unexported fields
}

String wraps a regular string with internal state that provides efficient indexing by code point (rune) index, as opposed to byte index.

Performance Characteristics:

Incremental scanning (forward or backward): O(1) per index operation (though not as fast as a range clause going forwards)
Random access: O(N) in the length of the string, but with lower overhead than always scanning from the beginning
ASCII optimization: If the string contains only ASCII, random access is O(1)
Thread-safety: Unlike the built-in string type, String has internal mutable state and is not thread-safe
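
Because of the mutable internal state, sharing one instance across goroutines needs external synchronization. One possible approach, sketched here under stated assumptions, is a mutex-guarded wrapper. runeIndexer, lockedIndexer, and sliceIndexer are hypothetical names; sliceIndexer is a stand-in so the sketch runs with the standard library only. A *utf8string.String has the At(i int) rune method required by the interface, so it could be passed as inner in the same way.

```go
package main

import (
	"fmt"
	"sync"
)

// runeIndexer is a hypothetical interface listing the one method this
// sketch needs.
type runeIndexer interface {
	At(i int) rune
}

// lockedIndexer serializes calls to the wrapped indexer so that its
// internal state is never touched by two goroutines at once.
type lockedIndexer struct {
	mu    sync.Mutex
	inner runeIndexer
}

func (l *lockedIndexer) At(i int) rune {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.inner.At(i)
}

// sliceIndexer is a stand-in implementation backed by a []rune.
type sliceIndexer []rune

func (s sliceIndexer) At(i int) rune { return s[i] }

func main() {
	l := &lockedIndexer{inner: sliceIndexer([]rune("Hello, 世界"))}

	results := make([]rune, 9)
	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = l.At(i) // each goroutine writes its own element
		}(i)
	}
	wg.Wait()
	fmt.Println(string(results)) // Hello, 世界
}
```

Giving each goroutine its own String over the same contents is a simpler alternative when the extra allocations are acceptable, since the underlying Go string is immutable and safe to share.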

Usage Examples

Example 1: Indexing Multi-Byte Characters

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	// Using a standard Go string with multi-byte characters
	standardStr := "こんにちは" // Japanese text (5 characters, 15 bytes)
	fmt.Printf("Standard string byte length: %d\n", len(standardStr)) // Output: 15

	// Using UTF-8 String for efficient rune indexing
	utf8Str := utf8string.NewString("こんにちは")
	fmt.Printf("UTF-8 String rune count: %d\n", utf8Str.RuneCount()) // Output: 5

	// Access runes by position
	for i := 0; i < utf8Str.RuneCount(); i++ {
		fmt.Printf("Position %d: %c\n", i, utf8Str.At(i))
	}
}

Example 2: Comparing ASCII and Non-ASCII Strings

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	// ASCII string
	asciiStr := utf8string.NewString("Hello")
	fmt.Printf("'%s' is ASCII: %v\n", asciiStr.String(), asciiStr.IsASCII()) // true

	// Non-ASCII string
	mixedStr := utf8string.NewString("Hello, 世界")
	fmt.Printf("'%s' is ASCII: %v\n", mixedStr.String(), mixedStr.IsASCII()) // false
}

Example 3: String Slicing with Rune Indices

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	original := utf8string.NewString("The quick brown fox")

	// Slice by rune positions
	words := original.Slice(4, 9) // "quick"
	fmt.Println(words) // Output: quick

	// Standard byte-based slicing gives the same result here only because the
	// text is pure ASCII; with multi-byte characters, byte and rune indices diverge.
	standardStr := "The quick brown fox"
	fmt.Println(standardStr[4:9]) // Output: quick
}

Example 4: Reusing String Instances

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
)

func main() {
	// Create a String instance
	s := utf8string.NewString("First string")
	fmt.Printf("Rune count: %d\n", s.RuneCount()) // Output: 12

	// Reuse the instance with new content
	s.Init("Second string")
	fmt.Printf("Rune count: %d\n", s.RuneCount()) // Output: 13
	fmt.Println(s.String()) // Output: Second string
}

Example 5: Comparison of Indexing Methods

package main

import (
	"fmt"
	"golang.org/x/exp/utf8string"
	"unicode/utf8"
)

func main() {
	text := "Café" // 4 runes: 3 bytes for "Caf" + 2 bytes for "é" = 5 bytes total

	// Standard Go string indexing (by bytes)
	fmt.Printf("Standard string length: %d bytes\n", len(text)) // 5

	// Using utf8 package to decode runes
	runeCount := utf8.RuneCountInString(text)
	fmt.Printf("Rune count (manual): %d runes\n", runeCount) // 4

	// Using UTF-8 String for easy rune access
	utf8Str := utf8string.NewString(text)
	fmt.Printf("UTF-8 String rune count: %d runes\n", utf8Str.RuneCount()) // 4

	// Easy access to specific runes
	for i := 0; i < utf8Str.RuneCount(); i++ {
		fmt.Printf("Rune %d: %c\n", i, utf8Str.At(i))
	}
}
