The Ask:
A docker image that can be used as a standardized starting point for python development work. Additionally,
- Use pyodbc to connect to SQL Server
- The smaller the footprint, the better
Seems like a relatively straightforward ask. However, if you’re not careful, you’re going to hit 3 different roadblocks along your journey.
Block#1: Docker Image Size
Maybe it was my search criteria, but many tutorials ask you to start building a Python Docker container from the official Python image. But if you’re not careful, when you’re ready to deploy outside of your workstation, you’re going to realize that the size of the underlying image is over 900MB. This is usually followed by the immediate assertion that you can reduce the image size by using the alpine flavor instead. Yes!! You’ve solved the image size issue.
Block#2: The Pyodbc module
Installing pyodbc on the official Python image is a fairly straightforward command:
pip install pyodbc
However, plugging this command into a Dockerfile that uses a Python alpine image is certainly going to give you issues. As smarter people on the internet will tell you, this has to do with the lack of gcc and/or g++ within the underlying alpine base (you’re starting to see why the image is so small). But no worries, all you need is the right set of dependencies installed and we can take care of this block too.
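As a rough pre-flight check for the missing-compiler problem described above, here is a minimal Python sketch that reports which build tools are absent from PATH. The function name `missing_build_tools` is hypothetical, and finding the compilers is only part of the story: a source build of pyodbc also needs the unixODBC development headers, which this check does not look for.

```python
import shutil
from typing import Iterable, List


def missing_build_tools(tools: Iterable[str] = ("gcc", "g++")) -> List[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_build_tools()
    if missing:
        # On a bare alpine image this is the likely outcome.
        print("Missing build tools:", ", ".join(missing))
    else:
        print("gcc and g++ were found on PATH.")
```

Running this inside the container before `pip install pyodbc` gives a quicker hint than reading through a failed compile log.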
Block#3: SQL Server as a Data Source
Top search results will tell you the best way to connect to SQL Server from docker is to make sure you have Microsoft’s ODBC drivers for SQL Server installed. Great! There’s also a dedicated page from Microsoft to do it on Linux. And when you’ve gone down the rabbit hole a bit longer, you realize that Microsoft is probably not going to directly support drivers for Alpine. So what do we do now?
As it turns out, Microsoft officially started supporting drivers for Alpine around April 2020, so now we have 2 driver options to connect to SQL Server.
- Use the Microsoft drivers – the instructions for which are here
- Use the FreeTDS driver, and all things are right with the world.
Also, when you’re working with SQL Server, you want to make sure you have your settings done right. Here’s a great article that tells you all about it.
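With two driver options available, a script can choose whichever one is actually registered in the container. The sketch below is one way to do that, not the method used in the repo. The driver names are assumptions: the Microsoft package typically registers itself as "ODBC Driver 17 for SQL Server", while a "FreeTDS" entry only exists if you added one to odbcinst.ini yourself. `pick_driver` and `registered_drivers` are hypothetical helper names; `pyodbc.drivers()` is the real pyodbc call that lists registered drivers.

```python
from typing import List, Optional, Sequence

# Assumed driver names, in order of preference. Adjust to match the
# entries in your own odbcinst.ini.
PREFERRED_DRIVERS = ("ODBC Driver 17 for SQL Server", "FreeTDS")


def pick_driver(available: Sequence[str],
                preferred: Sequence[str] = PREFERRED_DRIVERS) -> Optional[str]:
    """Return the first preferred driver that is registered, else None."""
    for name in preferred:
        if name in available:
            return name
    return None


def registered_drivers() -> List[str]:
    """List the drivers unixODBC knows about, via pyodbc.drivers()."""
    import pyodbc  # imported lazily so pick_driver works without pyodbc
    return pyodbc.drivers()
```

Inside the container, `pick_driver(registered_drivers())` returns the name to put in the connection string, or `None` if neither driver is registered.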
Block#4: Driver Location within the container
Now hold on a hot minute, why is there a 4th one listed here? Because once you’ve hopped past all the previous blocks, you’re probably going to find that pyodbc is having trouble finding the driver you just installed. Again, smarter folks of the tech world to the rescue. Essentially, we have to tell pyodbc where to find the installed driver, since it can’t seem to pick up the right one right off the bat.
Note: This may only apply to the TDS version at the moment. I need to do my tests again to confirm.
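One way to tell pyodbc where the driver lives is to put the full path to the driver library in the DRIVER field of the connection string, instead of a registered driver name. The sketch below assumes the FreeTDS library sits at /usr/lib/libtdsodbc.so (check the path in your own image), and the server, database, and credential values are placeholders. `build_connection_string` is a hypothetical helper, and it does no escaping, so a password containing a semicolon would need extra handling.

```python
# Assumed location of the FreeTDS ODBC library on alpine; verify in your image.
FREETDS_DRIVER_PATH = "/usr/lib/libtdsodbc.so"


def build_connection_string(server: str, database: str, user: str,
                            password: str, port: int = 1433,
                            driver_path: str = FREETDS_DRIVER_PATH,
                            tds_version: str = "7.3") -> str:
    """Build an ODBC connection string that points at the driver by path."""
    parts = {
        "DRIVER": "{" + driver_path + "}",  # path instead of a driver name
        "SERVER": server,
        "PORT": str(port),
        "DATABASE": database,
        "UID": user,
        "PWD": password,
        "TDS_Version": tds_version,  # FreeTDS-specific setting
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


conn_str = build_connection_string("db.example.com", "sales", "app", "secret")
# With pyodbc installed and a reachable server, you would then call:
# import pyodbc
# conn = pyodbc.connect(conn_str)
```

Keeping the path in one constant means the script only needs a single change if the driver ends up somewhere else in a different base image.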
Putting this all together
TL;DR – Here are the steps we need to follow to get everything working
- Use python (alpine flavor) as your image base
- Install dependencies
- Pip install pyodbc
- Set connection string
- Explicitly tell pyodbc, within your script, where to find your driver
Here’s a GitHub repo with what I managed to put together – with help from /u/jpens from the Docker subreddit and all the other articles listed.
Conclusion
Now we have a docker image with a relatively small footprint that uses pyodbc and FreeTDS to connect to SQL Server.
1 Comment
Russ
I just stumbled on this article and it saved me a ton of time working out why pyodbc was causing a failed compile to kick off. It’s not that well documented that many python packages are just a shim over some underlying binary that needs to be in place otherwise it creates a lot of compilation. Thanks again!