Blog Archives
#Docker: copy files and folders between a container and the local filesystem
Container technology has led to a dramatic shift in computing paradigms, and it is exceptionally useful in contexts where platform independence and reproducibility matter. This is the case, for example, in Machine Learning and AI applications. Containers are at the base of ML platforms such as Michelangelo (Uber), Bighead (Airbnb), Databricks, Domino, etc.
So now you are happily running your stuff in containers and tinkering around, when you need to pull a file out of one of them (e.g., to check that the output of an algorithm is correct), or you need to drop in a directory containing a shiny new hand-made dataset for a quick test. If this were a Linux machine you could just use the scp command and call it a day, so you will be glad to know that Docker offers its own cp-like command: docker cp.
So for example, let's say we want to copy the file output.txt, saved in the /home directory of a running container, to our desktop for some analysis. We grab the container ID using the docker ps command (b0e70cca9782 in these examples) and issue the following command:
#docker cp b0e70cca9782:/home/output.txt ~/Desktop
Likewise, if we have a directory of test images and want to copy it into the /home directory of the container to test our algorithm further, we can do:
#docker cp ~/Desktop/images b0e70cca9782:/home/
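If you find yourself repeating these copies from a script, the two commands above can be assembled programmatically. The following is only a minimal sketch: the helper names container_path, docker_cp_command and run are hypothetical (they are not part of Docker), and it assumes the docker CLI is on your PATH when run is actually called.

```python
import os
import subprocess


def container_path(container, path):
    """Return the CONTAINER:PATH form used by `docker cp`."""
    return f"{container}:{path}"


def docker_cp_command(source, destination):
    """Build the argument list for `docker cp SOURCE DESTINATION`."""
    return ["docker", "cp", source, destination]


def run(command):
    """Execute a command list; raises CalledProcessError on failure.

    Hypothetical helper: requires the docker CLI to be installed.
    """
    subprocess.run(command, check=True)


# Container ID taken from the examples above.
CONTAINER = "b0e70cca9782"

# No shell is involved when passing a list to subprocess,
# so "~" has to be expanded explicitly.
desktop = os.path.expanduser("~/Desktop")

# Container -> local filesystem.
copy_out = docker_cp_command(
    container_path(CONTAINER, "/home/output.txt"), desktop
)

# Local filesystem -> container.
copy_in = docker_cp_command(
    os.path.join(desktop, "images"), container_path(CONTAINER, "/home/")
)

# Print the commands instead of running them, so the sketch is safe to try.
print(" ".join(copy_out))
print(" ".join(copy_in))
```

Call run(copy_out) or run(copy_in) once you are happy with the printed commands.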
#Python: debugging your Python processes with GDB
PyCharm is an awesome IDE, and its debugger is a massively useful tool to help in code development.
However, there are instances where a bug shows up only at runtime, in conditions that are hard to reproduce on the developer's machine or where tracebacks are not available. Examples of bugs that are difficult to debug from within Python are:
- segfaults (not uncaught Python exceptions)
- hung processes (in cases where you can’t get a Python traceback or debug with pdb)
- out of control daemon processes
- Python processes running in a Docker container in a production environment
In these cases, you can try gdb.
Let's take the case of your Python process running in a Docker container. You can get a shell into the container and install a couple of packages (e.g., on Ubuntu Linux; pick the debug package that matches your interpreter version):
#apt-get install gdb python2.7-dbg
Now you are ready to debug your process, either interactively:
#gdb python
...
(gdb) run [program name].py [arguments]
or automatically:
#gdb -ex r --args python [program name].py
If the process is already running (which will be the case in production if the bug did not cause the process to terminate), attach to it by PID:
#gdb python [pid of process]
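The gdb invocations above can also be built from a script, for example to grab a backtrace from a stuck process without an interactive session. This is a minimal sketch under stated assumptions: the function names are hypothetical, the PID and script name in the example are made up, and the non-interactive variant relies on gdb's -p, -batch and -ex options. Nothing is executed here; the commands are only printed.

```python
def gdb_run_command(script, args=(), python="python"):
    """Mirror `gdb -ex r --args python SCRIPT [ARGS...]`."""
    return ["gdb", "-ex", "r", "--args", python, script, *args]


def gdb_attach_command(pid, python="python"):
    """Mirror `gdb python PID` for an already running process."""
    return ["gdb", python, str(pid)]


def gdb_backtrace_command(pid):
    """Attach by PID, print a native backtrace, then exit.

    Assumption: -batch makes gdb quit after running the -ex commands.
    """
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


# Hypothetical values, for illustration only.
EXAMPLE_PID = 12345
EXAMPLE_SCRIPT = "my_program.py"

for command in (
    gdb_run_command(EXAMPLE_SCRIPT, ["--verbose"]),
    gdb_attach_command(EXAMPLE_PID),
    gdb_backtrace_command(EXAMPLE_PID),
):
    print(" ".join(command))
```

Attaching to a running process usually needs sufficient privileges inside the container, so expect to run these as root.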
Happy debugging! 😎