Things Python Multiprocessing Should Have Told You for Machine Learning

keshav97 · 2 min read · Apr 18, 2020

Recently I designed multiple models for my image-scoring feature, which scores any image on multiple aspects like brightness, exposure, etc.

Now I had my 13 models running sequentially, and the need of the hour was to run them in parallel to reduce the prediction time. So I got my hands dirty with the Python multiprocessing library and tried using the Process class to run the models in parallel.

I noticed a few difficulties in using Process, which every one of you may have faced already or will face:

  1. You need to create a shared variable through a Manager to exchange results between processes.
  2. If you need to do some calculations after the models’ predictions complete, you have to add start and join yourself. Yes, you read that right: you need to add even these little things yourself.
  3. To kill all the child processes before the parent process exits, you must say so explicitly. (All three are shown in the sketch after this list.)
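
Here is a minimal sketch of all three points together. Everything in it is illustrative: `run_model`, the model names, and the fake scoring line stand in for the real prediction calls.

```python
import multiprocessing as mp

def run_model(model_name, image_path, results):
    """Hypothetical worker: pretends to score an image with one model
    and writes the score into the Manager dict shared with the parent."""
    score = 0.5  # placeholder for model.predict(image) in the real code
    results[model_name] = score

if __name__ == "__main__":
    # 1. A Manager-backed dict is the shared variable the processes write to.
    manager = mp.Manager()
    results = manager.dict()

    model_names = ["brightness", "exposure", "sharpness"]  # made-up aspects

    processes = []
    for name in model_names:
        # 3. daemon=True makes Python kill the child if the parent exits first.
        p = mp.Process(target=run_model, args=(name, "photo.jpg", results),
                       daemon=True)
        processes.append(p)
        p.start()  # 2. start() has to be called explicitly for each process...

    for p in processes:
        p.join()   # ...and join() before any post-prediction calculation.

    print(dict(results))
```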

I was like, dude, these are basic necessities for multiprocessing. Python, what is wrong with you?

4. Here comes the devil of all flaws, and I name it Mr. TinyUnbreakable, because the issue feels unbreakable. Have you ever faced a deadlock while running ML models under multiprocessing? If so, then you have already met Mr. TinyUnbreakable. Let's get to the point.

We all install keras and our other ML dependencies globally in the code and use the same imports for every model, and that is what causes the deadlock: a forked child process inherits the parent's Keras/TensorFlow state, while each process using a model actually needs its own fresh Keras environment. So just import keras again inside each new Process that runs a model, and it will run smoothly; a sketch of this fix follows.
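
Here is a minimal sketch of that fix, assuming the models are saved as Keras .h5 files; the file names and the input shape are made up. The key move is that both the keras import and the model load happen inside the child process, never at module scope.

```python
import multiprocessing as mp
import numpy as np

def predict_in_child(model_path, batch, results):
    # Importing keras *inside* the child gives each process its own fresh
    # TensorFlow/Keras state instead of a forked, deadlock-prone copy of
    # the parent's state.
    from tensorflow import keras

    model = keras.models.load_model(model_path)  # load inside the process too
    results[model_path] = model.predict(batch)

if __name__ == "__main__":
    manager = mp.Manager()
    results = manager.dict()

    model_paths = ["brightness.h5", "exposure.h5"]  # hypothetical saved models
    batch = np.random.rand(1, 224, 224, 3)          # dummy input, assumed shape

    procs = [mp.Process(target=predict_in_child, args=(path, batch, results),
                        daemon=True)
             for path in model_paths]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    print({name: preds.shape for name, preds in results.items()})
```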
So, if you have faced any such issue, feel free to try my suggestions.
