Formatron v0.4.11

Formatron empowers everyone to control the output format of language models with minimal overhead.
 
Namespaces

| Namespace | Description |
|---|---|
| formatron | |
| formatron.integrations | This subpackage contains integrations with other frameworks and libraries. |
| formatron.integrations.utils | |
Functions

| Return type | Function | Description |
|---|---|---|
| bytes | formatron.integrations.utils._multiple_replace(typing.Dict[bytes, bytes] replacements, re.Pattern[bytes] regex, bytes text) | |
| typing.Dict[int, bytes] | formatron.integrations.utils.get_original_characters(typing.Dict[str, int] vocab, typing.Optional[list[typing.Callable]] processors=None) | Get a vocabulary of original characters unmangled to raw UTF-8 bytes by the provided processors. |
| typing.List[typing.Callable] | formatron.integrations.utils.autodetect_processors(typing.Dict[str, int] vocab) | Autodetect vocabulary processors. |
| | formatron.integrations.utils.update_vocab_0xHH(typing.Dict[bytes, bytes] token_to_char) | Vocabulary processor for <0xHH> tokens (used in llama tokenizers). |
| | formatron.integrations.utils.update_vocab_sentencepiece(typing.Dict[bytes, bytes] token_to_char) | Vocabulary processor for the ▁ token (used in sentencepiece tokenizers). |
| | formatron.integrations.utils.update_vocab_dot_G(typing.Dict[bytes, bytes] token_to_char) | Vocabulary processor for GPT2 style token mangling, like from \n to Ġ (used in huggingface bytelevel preprocessors). |
| | formatron.integrations.utils._huggingface_bytelevel_decoder() | I hate legacy code. |
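
As a usage sketch (not part of the generated reference), the example below shows how these helpers could fit together for a Hugging Face tokenizer: autodetect_processors guesses which vocabulary processors apply to the tokenizer's vocab, and get_original_characters then maps each token id to its raw UTF-8 bytes with the mangling undone. The gpt2 model name and the use of transformers.AutoTokenizer are illustrative assumptions; only the signatures listed above are taken from this page.

```python
# Hypothetical usage sketch; assumes a Hugging Face tokenizer whose
# get_vocab() returns typing.Dict[str, int], matching the signatures above.
from transformers import AutoTokenizer

from formatron.integrations.utils import (
    autodetect_processors,
    get_original_characters,
)

# Any tokenizer works; gpt2 is an arbitrary example whose vocab uses
# bytelevel mangling (e.g. Ġ for a leading space).
tokenizer = AutoTokenizer.from_pretrained("gpt2")
vocab = tokenizer.get_vocab()

# Guess the applicable vocabulary processors (sentencepiece ▁, <0xHH>, Ġ, ...).
processors = autodetect_processors(vocab)

# Map each token id to its original raw UTF-8 byte sequence.
id_to_bytes = get_original_characters(vocab, processors)

print(id_to_bytes[0])  # raw bytes of token id 0
```

Since processors defaults to None, the second argument can also be omitted and the processors detected internally; passing the result of autodetect_processors explicitly just makes the detection step visible.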