theanets.recurrent.Text

class theanets.recurrent.Text(text, alpha=None, min_count=2, unknown='\x00')

A class for handling sequential text data.

Parameters:

text : str

A blob of text.

alpha : str, optional

An alphabet to use for representing characters in the text. If not provided, all characters from the text occurring at least min_count times will be used.

min_count : int, optional

If the alphabet is to be computed from the text, discard characters that occur fewer than this number of times. Defaults to 2.

unknown : str, optional

A character to use to represent “out-of-alphabet” characters in the text. This must not be in the alphabet. Defaults to '\x00' (the null character).
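The alphabet selection and replacement of out-of-alphabet characters described by these parameters can be sketched in plain Python. This is an illustration of the documented behavior only, not the library's implementation: `build_alphabet` and `replace_unknown` are hypothetical helper names, and sorting the alphabet is an assumption made here to get a deterministic order.

```python
import collections


def build_alphabet(text, min_count=2, unknown='\x00'):
    """Return the characters occurring at least min_count times, as a string."""
    counts = collections.Counter(text)
    # Sorting is an assumption of this sketch, used for a deterministic order.
    return ''.join(sorted(
        ch for ch, n in counts.items() if n >= min_count and ch != unknown))


def replace_unknown(text, alpha, unknown='\x00'):
    """Replace every character that is not in alpha with the unknown character."""
    if unknown in alpha:
        raise ValueError('unknown character must not be in the alphabet')
    allowed = set(alpha)
    return ''.join(ch if ch in allowed else unknown for ch in text)


# Only 'l' (3 times) and 'o' (2 times) reach the default min_count of 2.
alpha = build_alphabet('hello world')
# '?' is used instead of '\x00' so the replacement is visible when printed.
cleaned = replace_unknown('hello world', alpha, unknown='?')
print(repr(alpha))  # 'lo'
print(cleaned)      # ??llo??o?l?
```

Passing an explicit alphabet (the `alpha` parameter) would skip the counting step; only the replacement step would then apply.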

Attributes

text (str) A blob of text, with all non-alphabet characters replaced by the “unknown” character.
alpha (str) A string containing each character in the alphabet.
__init__(text, alpha=None, min_count=2, unknown='\x00')

Methods

__init__(text[, alpha, min_count, unknown])
classifier_batches(steps, batch_size[, rng]) Create a callable that returns a batch of training data.
decode(enc) Decode a sequence of alphabet index values into a text string.
encode(txt) Encode a text string by replacing each character with its alphabet index.
classifier_batches(steps, batch_size, rng=None)

Create a callable that returns a batch of training data.

Parameters:

steps : int

Number of time steps in each batch.

batch_size : int

Number of training examples per batch.

rng : numpy.random.RandomState or int, optional

A random number generator, or an integer seed for a random number generator. If not provided, the random number generator will be created with an automatically chosen seed.

Returns:

batch : callable

A callable that, when called, returns a batch of data that can be used to train a classifier model.
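The "callable that returns a batch" pattern can be sketched as a closure over a random number generator. The sketch below makes several assumptions that this page does not confirm: each example is a random window of `steps` encoded characters, the targets are the same window shifted one character ahead, and batches are plain nested lists. It also uses the standard library's `random.Random` as a stand-in for `numpy.random.RandomState` so that it runs without NumPy. `make_batch_fn` is a hypothetical name.

```python
import random


def make_batch_fn(encoded, steps, batch_size, rng=None):
    """Return a callable producing (inputs, targets) lists of index windows.

    Hypothetical stand-in: random.Random replaces numpy.random.RandomState,
    and plain lists replace arrays.
    """
    if len(encoded) <= steps:
        raise ValueError('need more than `steps` encoded characters')
    # Accept a ready-made generator, an integer seed, or None (auto-seeded).
    if rng is None or isinstance(rng, int):
        rng = random.Random(rng)

    def batch():
        inputs, targets = [], []
        for _ in range(batch_size):
            # Take a window of steps + 1 indices; the target sequence is the
            # input sequence shifted one step ahead (assumed next-character labels).
            start = rng.randrange(len(encoded) - steps)
            window = encoded[start:start + steps + 1]
            inputs.append(window[:-1])
            targets.append(window[1:])
        return inputs, targets

    return batch


encoded = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1]
batch = make_batch_fn(encoded, steps=4, batch_size=3, rng=0)
inputs, targets = batch()
print(len(inputs), len(inputs[0]))  # 3 4
```

Returning a callable rather than a fixed dataset lets a training loop draw a fresh random batch on every call, while an integer seed keeps the sequence of batches reproducible.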

decode(enc)

Decode a sequence of alphabet index values into a text string.

Parameters:

enc : list of int

A sequence of alphabet index values to convert to text.

Returns:

txt : str

A string containing corresponding characters from the alphabet.

encode(txt)

Encode a text string by replacing each character with its alphabet index.

Parameters:

txt : str

A string to encode.

Returns:

classes : list of int

A sequence of alphabet index values corresponding to the given text.
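The relationship between encoding and decoding can be illustrated with a small standalone round trip. This is a sketch, not the library's code: `make_index` and the free functions `encode` and `decode` are hypothetical stand-ins for the methods, and reserving index 0 for the unknown character is an assumption of this example.

```python
def make_index(alpha, unknown='\x00'):
    """Build char->index and index->char maps for an alphabet string."""
    # Assumption of this sketch: index 0 is reserved for the unknown character.
    char_to_idx = {unknown: 0}
    char_to_idx.update((ch, i) for i, ch in enumerate(alpha, start=1))
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return char_to_idx, idx_to_char


def encode(txt, char_to_idx, unknown='\x00'):
    """Map each character to its index; out-of-alphabet characters map to unknown."""
    return [char_to_idx.get(ch, char_to_idx[unknown]) for ch in txt]


def decode(enc, idx_to_char):
    """Map each index back to its character and join the result into a string."""
    return ''.join(idx_to_char[i] for i in enc)


# '?' is used instead of '\x00' so the unknown character is visible when printed.
char_to_idx, idx_to_char = make_index('abc', unknown='?')
classes = encode('cab!', char_to_idx, unknown='?')
print(classes)                       # [3, 1, 2, 0]
print(decode(classes, idx_to_char))  # cab?
```

Decoding inverts encoding only for characters in the alphabet; anything else comes back as the unknown character, as the `!` does here.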