Intro to Docker Containers for Data Scientists

A Practical Tutorial for Setting Up a Local Dev Environment Using Docker Containers

Motivation

An essential part of a data scientist’s daily work involves managing and maintaining a development environment. Our work goes considerably more smoothly when the development environment is kept up to date and closely reflects the production environment; when it isn’t, things start to get messy. In a larger organization, proficiency with CI/CD pipelines and DevOps can be quite advantageous. Delivering work that is simple to integrate and put into production is a data scientist’s first priority.

This is where containers come into play: by encapsulating our development environment, they let us reproduce the same setup wherever the code runs, which saves time and effort.

Who Could Benefit from Working on a Docker Container Environment?

You probably don’t need containers if you’re a developer working on a side project and don’t care about deploying it to production. However, if you are part of a team that uses a CI/CD pipeline, they are a necessity.

What is a Container?

The concept of containers was first introduced in the 1970s. Imagine a container as an isolated working environment: a server that we can define from scratch. We decide its properties, such as the operating system, the Python interpreter version and the library dependencies. The container runs on your machine’s resources. Another property of a container is that it has no access to our local storage unless we explicitly grant it. A good practice is to mount only the folder we want included in the container’s scope.

What is a Dockerfile?

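As said above, a container is an encapsulated environment for running our algorithms. A Dockerfile is the plain-text recipe from which Docker builds the image this environment starts from: the base image (operating system and Python version), the files to copy in, and the commands to run, such as installing dependencies. Docker builds an image from the Dockerfile, and a container is a running instance of that image.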
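The repository cloned in Step 1 is expected to ship its own Dockerfile, since Step 2 builds the image from the current folder. To illustrate what such a file contains, here is a minimal, hypothetical Dockerfile for a Python project, written out with a shell heredoc. It is not the repo’s file: the `python:3.10-slim` base image and the presence of a `requirements.txt` next to the Dockerfile are assumptions. Run it in an empty folder, because it overwrites any existing `./Dockerfile`.

```bash
# Write a hypothetical minimal Dockerfile (NOT the tutorial repo's own file).
# Assumptions: a requirements.txt sits next to it; the base image tag is illustrative.
# Warning: this overwrites ./Dockerfile if one already exists.
cat > Dockerfile <<'EOF'
# Base image: an official slim Python image (version chosen for illustration)
FROM python:3.10-slim

# Working directory inside the container; the project folder is mounted here at run time
WORKDIR /project

# Install dependencies first so Docker can cache this layer between builds
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Start an interactive shell by default, which suits `docker run -it`
CMD ["bash"]
EOF

echo "Wrote $(pwd)/Dockerfile"
```

Installing the dependencies at build time bakes them into the image, while the project code itself is mounted at `/project` when the container starts (as in Step 3), so editing code does not require a rebuild.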

Set Up a Container

Prerequisites
Before we create a Docker container, we first need to make sure our local working environment is ready. Let’s go through the following checklist:
1. VS Code as our code editor, with the Docker and Dev Containers extensions installed: https://code.visualstudio.com/
2. Git for version control management: https://git-scm.com/downloads
3. A GitHub account: https://github.com/
4. Docker Desktop for building and running containers: https://www.docker.com/

After you complete all these prerequisites, make sure the Docker app you installed is running and that you are signed in to it. This lets us create a Docker container and track its status.
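Before moving on, it can help to confirm from a terminal that the tools are actually available. The sketch below defines a small helper, `check_prereqs` (a hypothetical name), that reports whether `code`, `git` and `docker` are on your PATH and whether the Docker engine responds. It assumes a POSIX-like shell and that the VS Code `code` launcher has been added to PATH, which is not always the default.

```bash
# Report which prerequisite command-line tools are on PATH.
# `code` is the VS Code launcher; it may need to be added to PATH manually (assumption about your setup).
check_prereqs() {
  for tool in code git docker; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found: $tool"
    else
      echo "missing: $tool"
    fi
  done
  # The Docker CLI can be installed while the engine is stopped; `docker info` fails in that case.
  if command -v docker >/dev/null 2>&1 && ! docker info >/dev/null 2>&1; then
    echo "docker is installed but the engine is not responding: start the Docker app"
  fi
}

check_prereqs
```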

Step 1 — Cloning the Repo

To begin, let’s select a repo to work with. Here I provide a repo containing an algorithm that estimates whether a text is AI-generated by combining the language model’s perplexity on the text with the number of spelling errors. Higher perplexity means the LLM finds the next word harder to predict, which suggests the text is more likely to have been written by a human; AI-generated text tends to have lower perplexity.
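For reference, the standard definition of perplexity for a sequence of N tokens under a language model p_θ is shown below. The exact model and tokenization the repo uses are not described here, so treat this as the textbook definition rather than the repo’s code.

```latex
\mathrm{PPL}(x_1, \dots, x_N) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\left(x_i \mid x_{<i}\right) \right)
```

For example, if the model assigns probability 0.25 to each of four tokens, the average negative log-probability is −log 0.25 = log 4, so the perplexity is 4: the model is as uncertain as if it were choosing uniformly among four words.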

The repo’s link: GitHub – Idoleshem/setup_a_local_container

On GitHub, click the Code button and copy the HTTPS address.

After that, open VS Code and clone the repo you wish to include in your container; make sure VS Code is connected to your GitHub account. Alternatively, you can initialize a new Git repo.
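From a terminal, cloning is simply `git clone` followed by the HTTPS address you copied. For the alternative of starting a fresh repository, a minimal sketch follows; the folder name `my_container_project` is hypothetical. It also prints the folder’s absolute path, which is the host path to pass to `-v` in Step 3 (an absolute path is the safe choice for a bind mount).

```bash
# Alternative to cloning: start a new, empty Git repository.
# "my_container_project" is a hypothetical folder name; choose your own.
PROJECT_DIR="my_container_project"

# `git init <dir>` creates the folder if it does not exist yet
git init "$PROJECT_DIR"

# Print the absolute path of the project folder: the host path to mount with -v
PROJECT_PATH="$(cd "$PROJECT_DIR" && pwd)"
echo "Host path to mount: $PROJECT_PATH"
```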

Step 2 — Create a Docker Image

Open a terminal in the project folder (the one containing the Dockerfile) and run the following command:

docker build -t local_container_intro .

Building the image may take a few moments. Once it finishes, click the Docker icon in VS Code and the new image appears in the images list. After the image is built, we don’t need to run this command again unless the Dockerfile or its dependencies change; from here on, the only command we need is docker run.

local_container_intro is the name (tag) given to the image by the -t flag; you can change it to whatever you want.

Step 3 — Create a Docker Container

To grant the container access to the repository you cloned, pass your project path to the -v flag of the docker run command. We will use the following command to create the container, giving it the name “local_container_instance”:

docker run -it --name local_container_instance -v /path/to/your/project/folder:/project local_container_intro

After it has been created, the container appears in the CONTAINERS view of the Docker extension. To actually use it, right-click it and choose “Attach Visual Studio Code”. This opens a new window that reflects your containerized environment: it includes your code, and the container name is shown in the bottom-left corner. Open the terminal and run “pip list” to check that all the dependencies are installed. Make sure to install any VS Code extensions you need inside the container, such as the Python extension.

That’s it, all that is left to do is start developing 🙂
