Formatron v0.4.9
Formatron empowers everyone to control the output format of language models with minimal overhead.
Namespaces | |
namespace | formatron |
namespace | formatron.integrations |
This subpackage contains integrations with other frameworks and libraries. | |
namespace | formatron.integrations.utils |
Functions | |
bytes | formatron.integrations.utils._multiple_replace (typing.Dict[bytes, bytes] replacements, re.Pattern[bytes] regex, bytes text) |
typing.Dict[int, bytes] | formatron.integrations.utils.get_original_characters (typing.Dict[str, int] vocab, typing.Optional[list[typing.Callable]] processors=None) |
Build a mapping from token IDs to each token's original characters, unmangled to raw UTF-8 bytes by the provided processors. | |
typing.List[typing.Callable] | formatron.integrations.utils.autodetect_processors (typing.Dict[str, int] vocab) |
Autodetect vocabulary processors. | |
formatron.integrations.utils.update_vocab_0xHH (typing.Dict[bytes, bytes] token_to_char) | |
Vocabulary processor for <0xHH> tokens (used in llama tokenizers) | |
formatron.integrations.utils.update_vocab_sentencepiece (typing.Dict[bytes, bytes] token_to_char) | |
Vocabulary processor for ▁ token (used in sentencepiece tokenizers) | |
formatron.integrations.utils.update_vocab_dot_G (typing.Dict[bytes, bytes] token_to_char) | |
Vocabulary processor for GPT-2 style token mangling, such as space to Ġ and \n to Ċ (used in Hugging Face byte-level preprocessors) | |
formatron.integrations.utils._huggingface_bytelevel_decoder () | |
I hate legacy code. | |
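
The processors listed above each undo one kind of token mangling. The following is a minimal, standalone sketch of that idea and not Formatron's actual implementation: all `sketch_*` names are hypothetical, and the assumption that a processor fills a shared `token_to_char` dictionary in place is inferred only from the signatures in the table. The byte-level table follows the GPT-2 convention, in which printable bytes keep their code point and the remaining bytes are shifted to U+0100 and above.

```python
import re
import typing


def sketch_0xHH(token_to_char: typing.Dict[bytes, bytes]) -> None:
    # Hypothetical helper: byte-fallback tokens such as <0x0A> stand for one raw byte.
    for i in range(256):
        token_to_char[f"<0x{i:02X}>".encode("utf-8")] = bytes([i])


def sketch_sentencepiece(token_to_char: typing.Dict[bytes, bytes]) -> None:
    # Hypothetical helper: U+2581 (the sentencepiece space marker) becomes a space.
    token_to_char["\u2581".encode("utf-8")] = b" "


def sketch_bytelevel(token_to_char: typing.Dict[bytes, bytes]) -> None:
    # Hypothetical helper: GPT-2 style byte-level mangling. Printable bytes map
    # to the character with the same code point; every other byte is shifted to
    # U+0100 onward, in ascending byte order.
    printable = set(range(ord("!"), ord("~") + 1))
    printable |= set(range(0xA1, 0xAC + 1))
    printable |= set(range(0xAE, 0xFF + 1))
    shifted = 0
    for b in range(256):
        if b in printable:
            ch = chr(b)
        else:
            ch = chr(256 + shifted)
            shifted += 1
        token_to_char[ch.encode("utf-8")] = bytes([b])


def sketch_unmangle(
    vocab: typing.Dict[str, int],
    processors: typing.List[typing.Callable],
) -> typing.Dict[int, bytes]:
    # Let every processor contribute its replacements to one shared table.
    token_to_char: typing.Dict[bytes, bytes] = {}
    for processor in processors:
        processor(token_to_char)
    if not token_to_char:
        return {i: t.encode("utf-8") for t, i in vocab.items()}
    # Try longer keys first so that <0x0A> is matched as a whole.
    keys = sorted(token_to_char, key=len, reverse=True)
    regex = re.compile(b"|".join(re.escape(k) for k in keys))
    return {
        i: regex.sub(lambda m: token_to_char[m.group()], t.encode("utf-8"))
        for t, i in vocab.items()
    }


llama_like = sketch_unmangle(
    {"<0x0A>": 0, "\u2581Hello": 1, "world": 2},
    [sketch_0xHH, sketch_sentencepiece],
)
gpt2_like = sketch_unmangle({"\u0120world": 0, "\u010a": 1}, [sketch_bytelevel])
```

Under these assumptions, `llama_like` maps token 0 to a newline byte and token 1 to `b" Hello"`, while `gpt2_like` maps `Ġworld` to `b" world"` and `Ċ` to `b"\n"`. The sketch passes the processors explicitly; choosing them automatically from the vocabulary, as `autodetect_processors` does, is not shown.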