When you work with text data, you often need to split it into training and test sets. This is very common in machine learning, and also in natural language processing with deep learning. The tricky part is splitting the data in a way that works for both the training and the test set while keeping the split itself consistent.
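Here is a minimal sketch of one way to get a consistent split. It assumes "consistent" means that the same text always lands in the same set across runs. Each text is hashed with a stable hash, and the hash decides the set. `hash_split` is a hypothetical helper, not a library function. Because the split is based on content, duplicate texts always end up together, so they can't leak from training into test. The trade-off is that the test fraction is only approximate.

```python
import hashlib
from typing import List, Tuple


def hash_split(texts: List[str], test_fraction: float = 0.2) -> Tuple[List[str], List[str]]:
    """Deterministically split texts into (train, test) by hashing their content."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    train: List[str] = []
    test: List[str] = []
    for text in texts:
        # sha256 is used only as a stable hash; Python's built-in hash() is
        # salted per process, so it would give a different split on every run.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1]
        (test if bucket < test_fraction else train).append(text)
    return train, test
```

If you only need a reproducible random split and duplicates are not a concern, scikit-learn's `train_test_split` with a fixed `random_state` is the usual alternative.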
There was a nasty bug in huggingface's tokenizers that caused random runtime errors, depending on how you used the tokenizer, when running your neural network in a multi-threaded environment. Since I'm using Kubernetes, I was able to work around it by letting the pod be rescheduled. I didn't pay much attention to it, because the bug was in the library itself and not in my code.
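The post doesn't say exactly how the tokenizer was being shared, so treat this as a sketch under that assumption. One common way to avoid thread-safety problems with a shared tokenizer is to give each worker thread its own instance. `PerThreadTokenizer` is a hypothetical helper and not part of huggingface's API. It accepts any zero-argument factory, so in real code you might pass something like `lambda: AutoTokenizer.from_pretrained("roberta-base")`, where the model name is only an example.

```python
import threading
from typing import Any, Callable


class PerThreadTokenizer:
    """Lazily creates and caches one tokenizer instance per thread.

    Hypothetical helper: `factory` is any zero-argument callable that
    returns a tokenizer.
    """

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._local = threading.local()  # attributes are private to each thread

    def get(self) -> Any:
        tokenizer = getattr(self._local, "tokenizer", None)
        if tokenizer is None:
            # First call from this thread: build its own instance.
            tokenizer = self._factory()
            self._local.tokenizer = tokenizer
        return tokenizer
```

In a request handler, you would call `per_thread.get()(text)` instead of reusing a module-level tokenizer, where `per_thread = PerThreadTokenizer(...)`. The cost is memory, because each thread loads its own copy. A `threading.Lock` around a single shared tokenizer avoids that cost but serializes tokenization.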
But then, while checking my AppInsights, I found something very bad :). The bug I thought was happening only “sometimes” was actually happening all the time:
This is the error rate of my backend with transformers version 4.6.0:
Sometimes the latest version of the az cli just doesn't work. That happened to me today while I was trying to debug an issue with a container running in Azure Container Instances.
I got the following error while trying to connect and check why my container was not working:
```
$ az container logs --resource-group "InteligenciaAlertasSentinel" --name detector
The command failed with an unexpected error. Here is the traceback:
'ContainerInstanceManagementClient' object has no attribute 'container'
Traceback (most recent call last):
  File "/opt/az/lib/python3.6/site-packages/knack/cli.py", line 231, in invoke
    cmd_result = self.invocation.execute(args)
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 657, in execute
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 720, in _run_jobs_serially
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 691, in _run_job
    result = cmd_copy(params)
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/__init__.py", line 328, in __call__
    return self.handler(*args, **kwargs)
  File "/opt/az/lib/python3.6/site-packages/azure/cli/core/commands/command_operation.py", line 112, in handler
    client = self.client_factory(self.cli_ctx, command_args) if self.client_factory else None
  File "/opt/az/lib/python3.6/site-packages/azure/cli/command_modules/container/_client_factory.py", line 18, in cf_container
AttributeError: 'ContainerInstanceManagementClient' object has no attribute 'container'
To open an issue, please run: 'az feedback'
```
The problem seems to be related to the az cli version I'm using (the latest one at the time of writing).
For situations like this, I recommend using the official Docker image of the Azure CLI, which makes it easy to run a specific version of the az cli.
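Here is a small sketch of that idea in Python. It builds a `docker run` command for the `mcr.microsoft.com/azure-cli` image pinned to a given tag, so you can retry the failing `az container logs` call with another release. Check these assumptions before relying on it:

* The tag `2.26.0` is only a placeholder. It has not been confirmed to avoid this bug.
* The helper names are hypothetical.
* Mounting `~/.azure` into `/root/.azure` is meant to reuse your host login, which assumes the image runs as root.

```python
import os
import shlex
import subprocess
from typing import List

# Official Azure CLI image on Microsoft Container Registry; tags follow CLI versions.
AZ_CLI_IMAGE = "mcr.microsoft.com/azure-cli"


def pinned_az_command(version: str, az_args: List[str]) -> List[str]:
    """Build a `docker run` command that executes `az` from a pinned image tag."""
    # Assumption: the container runs as root, so its az config lives in /root/.azure.
    azure_dir = os.path.expanduser("~/.azure")
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{azure_dir}:/root/.azure",  # reuse the host's az login/session
        f"{AZ_CLI_IMAGE}:{version}",
        "az", *az_args,
    ]


def run_pinned_az(version: str, az_args: List[str]) -> None:
    """Run the pinned az command; raises CalledProcessError if it fails."""
    subprocess.run(pinned_az_command(version, az_args), check=True)


if __name__ == "__main__":
    # "2.26.0" is a placeholder tag: pick a release you know works for you.
    cmd = pinned_az_command(
        "2.26.0",
        ["container", "logs",
         "--resource-group", "InteligenciaAlertasSentinel",
         "--name", "detector"],
    )
    print(shlex.join(cmd))
```

Running the script prints the full `docker run ...` command. To execute the command, call `run_pinned_az` with the same arguments.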
I have the pleasure and honor of being a speaker at the next #netCoreConf event, which will take place on October 9 and 10, 2021.
In my 55-minute session, I will talk about applied state-of-the-art deep learning. We will see how to do transfer learning on a state-of-the-art transformer model such as RoBERTa for Named Entity Recognition with our own dataset. We will then deploy it on Kubernetes to serve inference as our own AI API with FastAPI. The idea is for you to take away a different, low-cost alternative that uses Azure VMs and AKS to deploy the models your data scientists typically end up leaving in a Python notebook :)
We'll play with Spacy, huggingface, pytorch, python, FastAPI, AKS and terraform :)
There is a bug in WSL2 that prevents the memory used by the vmmem process from being released. Right now I'm using Windows 10 19043.1110 (21H1), and the bug is still present.
This is what you can see when the error happens:
1) Your WSL2 images are stopped:
2) The vmmem process is still using a lot of memory:
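If you want to check the first symptom from a script instead of by eye, a sketch like this parses the output of `wsl --list --verbose` (the same listing as `wsl -l -v`). The helper names are hypothetical. The column layout is assumed from typical output: NAME, STATE and VERSION columns, with a `*` marking the default distribution. So is the encoding: wsl.exe usually writes UTF-16LE, although some newer builds can emit UTF-8. Checking vmmem's memory is left to Task Manager.

```python
import subprocess
from typing import Dict


def decode_wsl_output(raw: bytes) -> str:
    """Decode wsl.exe output, which is often UTF-16LE rather than UTF-8."""
    # UTF-16LE encodes ASCII text with NUL bytes between characters.
    return raw.decode("utf-16-le") if b"\x00" in raw else raw.decode("utf-8")


def parse_wsl_list_verbose(output: str) -> Dict[str, str]:
    """Map each distribution name to its STATE column."""
    states: Dict[str, str] = {}
    lines = [line for line in output.splitlines() if line.strip()]
    for line in lines[1:]:  # the first non-empty line is the NAME/STATE/VERSION header
        # Drop the leading '*' that marks the default distribution.
        parts = line.strip().lstrip("*").split()
        if len(parts) >= 3:
            states[parts[0]] = parts[1]
    return states


def wsl_states() -> Dict[str, str]:
    """Run `wsl --list --verbose` (Windows only) and parse its output."""
    raw = subprocess.run(
        ["wsl", "--list", "--verbose"], capture_output=True, check=True
    ).stdout
    return parse_wsl_list_verbose(decode_wsl_output(raw))
```

On Windows, the symptom above looks like this: `all(s == "Stopped" for s in wsl_states().values())` returns `True`, yet vmmem still holds a lot of memory.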