theanets.recurrent.Text
class theanets.recurrent.Text(text, alpha=None, min_count=2, unknown='\x00')
A class for handling sequential text data.
Parameters:
- text : str
A blob of text.
- alpha : str, optional
An alphabet to use for representing characters in the text. If not provided, all characters from the text occurring at least min_count times will be used.
- min_count : int, optional
If the alphabet is to be computed from the text, discard characters that occur fewer than this number of times. Defaults to 2.
- unknown : str, optional
A character to use to represent “out-of-alphabet” characters in the text. This must not be in the alphabet. Defaults to '\x00'.
Attributes:
- text : str
A blob of text, with all non-alphabet characters replaced by the “unknown” character.
- alpha : str
A string containing each character in the alphabet.
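The relationship between min_count, the alphabet, and the unknown character can be illustrated with a short standalone sketch. It does not import theanets; the helper names build_alphabet and replace_unknown are hypothetical, and sorting the alphabet is an assumption made only so the result is deterministic.

```python
from collections import Counter


def build_alphabet(text, min_count=2):
    """Return the characters that occur at least `min_count` times."""
    counts = Counter(text)
    # Assumption: the alphabet is sorted; the library's ordering may differ.
    return ''.join(sorted(c for c, n in counts.items() if n >= min_count))


def replace_unknown(text, alpha, unknown='\x00'):
    """Replace every character outside `alpha` with `unknown`."""
    if unknown in alpha:
        raise ValueError('the unknown character must not be in the alphabet')
    allowed = set(alpha)
    return ''.join(c if c in allowed else unknown for c in text)


# Only 'l' (3 times) and 'o' (2 times) reach the min_count threshold.
alpha = build_alphabet('hello world', min_count=2)
cleaned = replace_unknown('hello world', alpha)
```

With these inputs, alpha contains only the two characters that appear at least twice, and every other character in cleaned is replaced by the unknown character.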
__init__(text, alpha=None, min_count=2, unknown='\x00')
x.__init__(…) initializes x; see help(type(x)) for signature
Methods
- __init__(text[, alpha, min_count, unknown])
x.__init__(…) initializes x; see help(type(x)) for signature
- classifier_batches(steps, batch_size[, rng])
Create a callable that returns a batch of training data.
- decode(enc)
Decode a sequence of alphabet indices back into a text string.
- encode(txt)
Encode a text string by replacing characters with alphabet index.
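As a rough illustration of the encode and decode round trip, the following standalone sketch maps characters to integer indices and back. It is not the library's code: make_codec is a hypothetical helper, and reserving index 0 for the unknown character is an assumption rather than documented behavior.

```python
def make_codec(alpha, unknown='\x00'):
    """Build hypothetical encode/decode functions for an alphabet string."""
    # Assumption: index 0 stands for the unknown character and alphabet
    # characters are numbered from 1; the library's numbering may differ.
    index = {c: i + 1 for i, c in enumerate(alpha)}
    reverse = {i: c for c, i in index.items()}

    def encode(txt):
        # Characters outside the alphabet fall back to index 0.
        return [index.get(c, 0) for c in txt]

    def decode(enc):
        # Indices without an alphabet entry map back to the unknown character.
        return ''.join(reverse.get(i, unknown) for i in enc)

    return encode, decode


encode, decode = make_codec('lo')
codes = encode('lol!')
roundtrip = decode(codes)
```

Decoding is lossy for out-of-alphabet input in this sketch: the '!' comes back as the unknown character, not as the original character.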
classifier_batches(steps, batch_size, rng=None)
Create a callable that returns a batch of training data.
Parameters:
- steps : int
Number of time steps in each batch.
- batch_size : int
Number of training examples per batch.
- rng : numpy.random.RandomState or int, optional
A random number generator, or an integer seed for a random number generator. If not provided, the random number generator will be created with an automatically chosen seed.
Returns:
- batch : callable
A callable that, when called, returns a batch of data that can be used to train a classifier model.
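The idea of a batch-producing callable can be sketched without theanets as follows. This is a minimal illustration under stated assumptions: make_batcher is a hypothetical name, random.Random from the standard library stands in for a NumPy generator, and the layout of the returned data (one-hot inputs paired with next-character targets, as nested lists) is assumed, since the format the library returns is not described here.

```python
import random


def make_batcher(codes, steps, batch_size, num_classes, rng=None):
    """Return a callable producing (inputs, targets) for next-character prediction."""
    # Assumption: None or an int seeds a fresh generator, mirroring the
    # documented `rng` parameter; random.Random replaces a NumPy generator.
    if rng is None or isinstance(rng, int):
        rng = random.Random(rng)
    last_start = len(codes) - steps - 1
    if last_start < 0:
        raise ValueError('text is too short for the requested number of steps')

    def batch():
        inputs, targets = [], []
        for _ in range(batch_size):
            start = rng.randint(0, last_start)  # inclusive on both ends
            window = codes[start:start + steps + 1]
            # One-hot encode every position except the last one in the window.
            inputs.append([[1.0 if k == c else 0.0 for k in range(num_classes)]
                           for c in window[:-1]])
            # The target at each step is the index of the following character.
            targets.append(window[1:])
        return inputs, targets

    return batch


codes = [1, 2, 1, 0, 2, 2, 1, 0, 1, 2]
batch = make_batcher(codes, steps=3, batch_size=4, num_classes=3, rng=0)
inputs, targets = batch()
```

Each call to batch() draws fresh random windows from the encoded text, so a training loop can call it repeatedly instead of holding a fixed dataset in memory.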