Tag
2 articles
Perplexity AI open-sources a new Unigram tokenizer that cuts p50 latency by 5x and CPU utilization by 5-6x compared with Hugging Face tokenizers.
Learn how tokenizers work in AI models and why a change in how text is split into tokens can sharply change costs, even when the price per token stays the same.