RuntimeError: Already borrowed | Become a Data Ninja with Enrique Catalá

[ AI Kubernetes Python Performance Cloud DataNinja ]

There was a nasty bug in huggingface’s tokenizers that caused a random runtime error depending on how you deal with the tokenizer when processing your neural network in a multi-threading environment. Since I´m using kubernetes I was able to fix it by allowing the pod to be scheduled again, so I didn´t paid too much attention to it because it was related to the library itself and it was not my code the one that caused the bug.

NOTE: Details here

But then, while checking my AppInsights I found something very bad :). I found that the bug that I thought was occurring “sometimes” was happening all the time:

This is the error rate of my backend with the transformers version 4.6.0:

After digging a little bit I did a couple of things to improve the performance (none of them required change in my backend code):

Take myself in control of creating the whole image from scratch:
- Change the base image from pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime to ubuntu:20.04
  - “Official image” is 7.5Gb
  - “Mine” is 2.7Gb (the one with cpu-only support)
- Manually compile pytorch 1.9.0 with/without cuda support depending on my container requirements
Ensure fastapi[all]>=0.68.0
Ensure transformers==4.10.2
- This was the one who fixed the bug

And this is now, with the transformers version 4.10.2:

Near ~3x faster and 15x less error rate :)

Nice!