theanets.recurrent.Text

class theanets.recurrent.Text(text, alpha=None, min_count=2, unknown='\x00')[source]

A class for handling sequential text data.

Parameters:
text : str

A blob of text.

alpha : str, optional

An alphabet to use for representing characters in the text. If not provided, all characters from the text occurring at least min_count times will be used.

min_count : int, optional

If the alphabet is to be computed from the text, discard characters that occur fewer than this number of times. Defaults to 2.

unknown : str, optional

A character to use to represent “out-of-alphabet” characters in the text. This must not be in the alphabet. Defaults to '\x00' (the null character).

Attributes:
text : str

A blob of text, with all non-alphabet characters replaced by the “unknown” character.

alpha : str

A string containing each character in the alphabet.
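
The parameter and attribute descriptions above can be illustrated with a small, dependency-free sketch. It is not the library's implementation: build_alphabet and replace_unknown are hypothetical helper names, and sorting the alphabet is an assumption made only so the result is deterministic.

```python
import collections


def build_alphabet(text, min_count=2, unknown='\x00'):
    """Return a string of the characters seen at least min_count times."""
    counts = collections.Counter(text)
    # Sorting is an assumption of this sketch, not documented behavior.
    return ''.join(sorted(
        ch for ch, n in counts.items() if n >= min_count and ch != unknown))


def replace_unknown(text, alpha, unknown='\x00'):
    """Replace every character outside alpha with the unknown marker."""
    if unknown in alpha:
        raise ValueError('the unknown character must not be in the alphabet')
    allowed = set(alpha)
    return ''.join(ch if ch in allowed else unknown for ch in text)


# Only 'l' (3 times) and 'o' (2 times) reach the default min_count of 2.
alpha = build_alphabet('hello world')
cleaned = replace_unknown('hello world', alpha)
```

Here alpha plays the role of the alpha attribute and cleaned the role of the text attribute: every character that did not make it into the alphabet has been replaced by the null character.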

__init__(text, alpha=None, min_count=2, unknown='\x00')[source]

x.__init__(…) initializes x; see help(type(x)) for signature

Methods

__init__(text[, alpha, min_count, unknown]) x.__init__(…) initializes x; see help(type(x)) for signature
classifier_batches(steps, batch_size[, rng]) Create a callable that returns a batch of training data.
decode(enc) Decode a sequence of alphabet index values into a text string.
encode(txt) Encode a text string by replacing each character with its alphabet index.

classifier_batches(steps, batch_size, rng=None)[source]

Create a callable that returns a batch of training data.

Parameters:
steps : int

Number of time steps in each batch.

batch_size : int

Number of training examples per batch.

rng : numpy.random.RandomState or int, optional

A random number generator, or an integer seed for a random number generator. If not provided, the random number generator will be created with an automatically chosen seed.

Returns:
batch : callable

A callable that, when called, returns a batch of data that can be used to train a classifier model.
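
As a rough illustration of the "callable that returns a batch" pattern, the sketch below samples random windows from an already encoded sequence. Several things here are assumptions rather than documented behavior: the helper name make_batch_source, the use of random.Random from the standard library in place of numpy.random.RandomState (to keep the sketch dependency-free), the plain-list return format, and the choice of the next character as the training target.

```python
import random


def make_batch_source(encoded, steps, batch_size, rng=None):
    """Return a callable producing (inputs, targets) lists of index windows."""
    if len(encoded) <= steps:
        raise ValueError('need more than `steps` encoded characters')
    # Accept a ready-made generator, an integer seed, or None (automatic seed).
    if rng is None or isinstance(rng, int):
        rng = random.Random(rng)

    def batch():
        inputs, targets = [], []
        for _ in range(batch_size):
            # Choose a start so that steps + 1 items are available.
            start = rng.randrange(len(encoded) - steps)
            window = encoded[start:start + steps + 1]
            inputs.append(window[:-1])  # indices at times t
            targets.append(window[1:])  # indices at times t + 1 (assumed target)
        return inputs, targets

    return batch


encoded = [1, 2, 2, 3, 0, 4, 3, 5, 2, 1]
batch = make_batch_source(encoded, steps=4, batch_size=3, rng=13)
inputs, targets = batch()
```

Each call to batch() draws fresh windows, so a training loop can call it repeatedly. Passing the same integer seed yields the same sequence of batches, which is the main reason to accept a seed in place of a generator object.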

decode(enc)[source]

Decode a sequence of alphabet index values into a text string.

Parameters:
enc : list of int

A sequence of alphabet index values to convert to text.

Returns:
txt : str

A string containing corresponding characters from the alphabet.

encode(txt)[source]

Encode a text string by replacing each character with its alphabet index.

Parameters:
txt : str

A string to encode.

Returns:
classes : list of int

A sequence of alphabet index values corresponding to the given text.
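
The relationship between encode and decode can be sketched as a simple round trip. This is a stand-alone illustration, not the library's code: make_codec is a hypothetical helper, and reserving index 0 for the "unknown" character is an assumption of the sketch.

```python
def make_codec(alpha, unknown='\x00'):
    """Build encode/decode helpers for an alphabet string."""
    if unknown in alpha:
        raise ValueError('the unknown character must not be in the alphabet')
    # Assumption: index 0 stands for the unknown character, and the
    # alphabet characters follow at indices 1..len(alpha).
    symbols = unknown + alpha
    index_of = {ch: i for i, ch in enumerate(symbols)}

    def encode(txt):
        # Characters outside the alphabet map to the unknown index.
        return [index_of.get(ch, 0) for ch in txt]

    def decode(enc):
        return ''.join(symbols[i] for i in enc)

    return encode, decode


encode, decode = make_codec('lo')
classes = encode('hello')   # 'h' and 'e' are not in the alphabet
restored = decode(classes)
```

Decoding is only a true inverse for characters in the alphabet; anything outside it comes back as the unknown character, here '\x00'.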