Improving your (Python) programming skills for Data Science and beyond
Why and how to become a better programmer?
Maybe you've found yourself in a situation similar to mine: I had to (and wanted to) write code for a project, a scientific project in my case. Things started out pretty simple. I wrote a single file to do some computations with NumPy and made a couple of plots with Matplotlib. I had some prior experience with Matlab, C and Java, so after digging through NumPy's and Matplotlib's documentation I had my first Python script done in less than a day. That was over 2 years ago. I then incrementally added functions and classes. My PhD project turned out to be mainly computational, and my code base grew and grew. I spent 80% of every day in front of Spyder, PyCharm or PyDev. Watching me at work, you might have thought I was a software engineer rather than a scientist.
I slowly realized that understanding the mathematics behind my science was one thing, but getting that mathematics into a maintainable, well-organized, scalable, extensible and clean software project was a completely different thing.
So, how do you get the software engineering part right? Well, lots of books and even more trial and error! Learn things the hard way!
A selection of books I've read recently on that topic
This post is about the "books part" of becoming a better programmer. The "trial and error part" is something you have to do on your own. Only reading doesn't help much: you really have to apply what you read to actually understand it.
Gang of Four - Design Patterns
This is the classic book on software design. It was written by the so-called "Gang of Four": Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides. Erich Gamma co-created the JUnit testing framework together with Kent Beck. If you've ever written a serious piece of Java code, you have most likely encountered it already. Even Python's unittest module is modeled after JUnit. (That is why unittest uses camelCase names such as assertEqual and setUp instead of the usual snake_case.)
Probably the two most famous quotes from this book are:
"Program to an interface, not an implementation"
and
"Favor object composition over class inheritance".
The book is mainly targeted at a C++ and Smalltalk audience (its examples are written in these two languages), and its patterns later became a staple of the Java world. Some of the design patterns are essentially obsolete in Python because of Python's dynamic nature. For example, you rarely need a dedicated factory in Python, since classes are themselves callable: constructing an object looks exactly like a function call, and there is no "new" keyword as in Java. Also, in Python we always program to an interface, because we rely on duck typing.
Others, however, are still important for Python or are even built into the language. For example, in Python, too, you should favor object composition over class inheritance. Ignoring this one actually hit me really badly once. Moreover, design patterns such as "Composite", "Decorator", "Builder" or "Observer", to name only a few, are still valid and useful. Some, such as the "Iterator" pattern, are even built into the Python language.
In summary, this book is a classic read, but it is not the most immediately helpful one for improving your Python skills. Read it for cultural reasons, not so much to become a better Python programmer.
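To make the factory and duck-typing points concrete, here is a minimal sketch. The shape classes and the build_all helper are hypothetical illustrations, not examples from the book: the class itself is passed around as the "factory", and total_area works with any object that has an area() method, without a shared base class.

```python
import math


class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2


class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


def build_all(factory, sizes):
    # Any callable that returns an object works as a factory:
    # a class, a plain function or a lambda.
    return [factory(size) for size in sizes]


def total_area(shapes):
    # Duck typing: we only rely on an area() method being present,
    # not on a common base class or declared interface.
    return sum(shape.area() for shape in shapes)


squares = build_all(Square, [1, 2, 3])
print(total_area(squares))  # 14
```

Note that no SquareFactory class is needed: Square itself is the callable that build_all receives.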
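Composition over inheritance can be sketched with a hypothetical Stack class (my own illustration, not from the book). Subclassing list would also expose methods like insert() or sort() that break the last-in, first-out contract; wrapping a list keeps the public interface small and the list an internal detail.

```python
class Stack:
    """A last-in, first-out stack that *has* a list instead of *being* one."""

    def __init__(self):
        self._items = []  # composition: the list is a private implementation detail

    def push(self, item):
        self._items.append(item)

    def pop(self):
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def __len__(self):
        return len(self._items)


stack = Stack()
stack.push("first")
stack.push("second")
print(stack.pop())  # second
print(len(stack))   # 1
```

If the internal storage later changes (say, to collections.deque), code using Stack is unaffected, which is much harder to guarantee with a list subclass.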
Robert C. Martin - Clean Code
I guess this book could also be called a "classic" by now. I know of some software companies where this book is mandatory reading for new programmers. Compared to "Design Patterns" it is more basic, and more focused on code quality than on software design. Like "Design Patterns", it is mainly targeted at Java/C++ programmers. It starts with recommendations such as using meaningful variable names and keeping functions short, continues with how to comment and format code, and then also touches on software architecture.
Sounds too basic? Things like "meaningful variable names" sound trivial, but not everybody follows them. Believe it or not, I've seen code with single-letter variable names: a, b, c, d, etc. This makes such code unnecessarily hard to read.
Summary: A good read for everybody working on code. It's worth at least skimming it and reading the relevant parts in more detail. Again, it is not Python-specific, but many of its principles apply to almost any (object-oriented) language.
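A tiny before/after sketch shows the difference. The price calculation is a hypothetical example; both functions compute exactly the same thing, but only the second one tells you what.

```python
# Hard to read: single-letter names say nothing about intent.
def f(a, b, c):
    return a * b * (1 - c)


# Easier to read: the names document what is being computed.
def net_price(unit_price, quantity, discount_rate):
    """Return the total price after applying a fractional discount."""
    return unit_price * quantity * (1 - discount_rate)


print(net_price(unit_price=10.0, quantity=3, discount_rate=0.5))  # 15.0
```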
Robert C. Martin - Agile Software Development
I had the international edition with the beautiful autumn tree cover.
Although it was written before "Clean Code", in terms of content it reads like its successor. Again, although it was written with Java/C++ in mind, much of it applies to Python as well. It also covers a couple of design patterns from the GoF book. The chapters I enjoyed most were the ones on the "SOLID" design principles, that is:
The Single-Responsibility Principle:
A class should have only one reason to change
The Open-Closed Principle:
Software entities (classes, modules, functions, etc.) should be open for extension, but closed for modification.
The Liskov Substitution Principle:
Subtypes must be substitutable for their base types
The Dependency-Inversion Principle:
- High-level modules should not depend on low-level modules. Both should depend on abstractions.
- Abstractions should not depend on details. Details should depend on abstractions.
For completeness, the remaining one is the Interface-Segregation Principle, although it is not really relevant for Python.
While reading about these principles, I recognized how I had violated them before and how these violations had caused me a lot of trouble afterwards. Again, the combination of reading about these practices and then applying them to my code was a really educational experience.
Summary: Worth a read. In many respects it is so general that it can help to improve almost any software project.
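As a rough illustration of the Dependency-Inversion Principle in Python, here is a minimal sketch that uses typing.Protocol (Python 3.8+) as the abstraction. Storage, InMemoryStorage and ReportGenerator are hypothetical names chosen for this example. The high-level ReportGenerator depends only on the Storage abstraction, and the concrete storage is passed in from outside.

```python
from __future__ import annotations

from typing import Protocol


class Storage(Protocol):
    """The abstraction that both high-level and low-level code depend on."""

    def save(self, key: str, value: str) -> None: ...


class InMemoryStorage:
    """A low-level detail: one concrete way of storing results."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self.data[key] = value


class ReportGenerator:
    """High-level policy: knows *what* to store, not *how* it is stored."""

    def __init__(self, storage: Storage) -> None:
        self.storage = storage  # the dependency is injected, not created here

    def run(self, name: str, values: list[float]) -> None:
        mean = sum(values) / len(values)
        self.storage.save(name, f"mean={mean:.2f}")


storage = InMemoryStorage()
ReportGenerator(storage).run("experiment-1", [1.0, 2.0, 3.0])
print(storage.data)  # {'experiment-1': 'mean=2.00'}
```

This also touches the Open-Closed Principle: a file- or database-backed storage class with a matching save() method could be added without modifying ReportGenerator.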
Martin Fowler - Refactoring
I enjoyed this one since the book is about dealing with (your) mess. Again, the examples are mainly in Java, but essentially everything stated here applies to Python as well. I would really recommend this one, as it nicely explains and illustrates how to convert your mess step by step into a maintainable program. Moreover, I really enjoyed its writing style. It also illustrates the importance of (unit) tests: refactoring without unit tests is like jumping out of a plane without a parachute.
One nice lesson from this book is that you should always be aware of whether you are in refactoring mode or in feature-adding mode, and never do both at the same time. I think this is a very good recommendation. Otherwise it is too easy to get lost and waste time trying to add new features and make the code beautiful at the same time.
Summary: A good read. Not Python-specific, but it explains in a very hands-on way how to improve the design of existing code.
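Here is a small sketch of a single refactoring step in the spirit of Fowler's "Extract Method", guarded by a unit test. The summarize functions are hypothetical examples: summarize_before is the original code, and summarize is the refactored version with the calculations pulled out into small, named helpers. The test pins down the behavior, so the refactoring must not change any result.

```python
import unittest


def summarize_before(measurements):
    """Original version: everything inline."""
    total = 0
    for m in measurements:
        total += m
    mean = total / len(measurements)
    spread = max(measurements) - min(measurements)
    return mean, spread


def summarize(measurements):
    """Refactored version: same behavior, extracted helper functions."""
    return _mean(measurements), _spread(measurements)


def _mean(values):
    return sum(values) / len(values)


def _spread(values):
    return max(values) - min(values)


class SummarizeTest(unittest.TestCase):
    # Written *before* refactoring; it must keep passing afterwards.
    def test_known_values(self):
        self.assertEqual(summarize([1.0, 2.0, 6.0]), (3.0, 5.0))

    def test_same_as_before(self):
        data = [4.0, 8.0, 15.0, 16.0, 23.0, 42.0]
        self.assertEqual(summarize(data), summarize_before(data))
```

Saved as a module, the tests can be run with python -m unittest. Note that this step only restructures the code; adding a new feature (say, a median) would be a separate step, following the book's advice about not mixing the two modes.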
Luciano Ramalho - Fluent Python
I really, really liked this one. Of the books presented here, this is clearly the most Pythonic one. If you are at an intermediate Python level, say with a year of experience or so, this book can help you become a much better Python programmer. But it also provides valuable insights about programming in general, beyond Python.
It starts with a review of Python data structures, but it goes really in depth. It does not only explain what a list or a dictionary is, but goes into detail and explains all the nice little language features you don't need to get your program running, but which make your code much more elegant and Pythonic once you know them.
It also covers abstract base classes, how to do operator overloading right, context managers, coroutines and the (then new) asyncio module. As if this weren't amazing enough, it continues with metaprogramming, notably a chapter on attribute descriptors. Make sure not to miss the descriptors if you want to understand how bound methods work.
Summary: A must-read for everybody who wants to become a good Python programmer. I highly recommend it.
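For a small taste of the kind of features meant here, the following sketch uses a few standard-library tools (namedtuple, tuple unpacking, defaultdict and Counter) on made-up data. It is my own illustration, not an example taken from the book.

```python
from collections import Counter, defaultdict, namedtuple

# A lightweight, immutable record type with named fields.
Sample = namedtuple("Sample", ["site", "species", "count"])

samples = [
    Sample("A", "oak", 3),
    Sample("B", "pine", 5),
    Sample("A", "pine", 2),
]

# Tuple unpacking directly in the loop header.
for site, species, count in samples:
    print(f"{site}: {count} x {species}")

# Sum counts per site without checking whether the key exists yet.
per_site = defaultdict(int)
for sample in samples:
    per_site[sample.site] += sample.count

# Count how often each species was recorded.
species_frequency = Counter(sample.species for sample in samples)

print(dict(per_site))                    # {'A': 5, 'B': 5}
print(species_frequency.most_common(1))  # [('pine', 2)]
```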
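Here is a minimal sketch of why descriptors explain bound methods: plain functions implement __get__, and looking up a function through an instance calls that __get__ to produce a bound method. The Greeter class is a hypothetical example.

```python
class Greeter:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return f"Hello from {self.name}"


g = Greeter("Ada")

# In the class namespace, greet is just a plain function ...
func = Greeter.__dict__["greet"]
print(type(func).__name__)       # function

# ... but functions are descriptors: they implement __get__.
print(hasattr(func, "__get__"))  # True

# Looking up g.greet effectively calls func.__get__(g, Greeter),
# which returns a method object bound to the instance g.
bound = func.__get__(g, Greeter)
print(bound())                   # Hello from Ada
print(bound == g.greet)          # True
```

The bound method simply remembers the instance (bound.__self__) and the function (bound.__func__) and passes the instance as self when called.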
Peter Seibel - Practical Common Lisp
Lisp? Isn't that an esoteric language with tons of parentheses all over the place? Yes, it is! So what does that have to do with becoming better at Python or any other (commonly used) language?
Indeed, Lisp is quite different from Python. But I found it very interesting to look beyond my daily Python world and check what else is out there. To really know your language, you also have to know how it differs from other languages. I could probably have listed some Haskell literature here as well. But Lisp was an interesting choice, since things like context managers are old news to the Lisp community. Python, too, had its inspirations, and it's interesting to see where they came from.
Summary: Don't read this to directly improve your Python skills. But it's a good idea to look beyond your daily business and appreciate what you have, and what you may lack, in Python.
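In Common Lisp, unwind-protect guarantees that cleanup forms run however the protected form is exited, and macros such as with-open-file build on that idea. A rough Python counterpart is a context manager built around try/finally. Here is a minimal sketch with a hypothetical tracked resource; "data.csv" is only a label, and no real file is opened.

```python
from contextlib import contextmanager

log = []


@contextmanager
def tracked(resource_name):
    # Setup: runs when the with block is entered.
    log.append(f"open {resource_name}")
    try:
        yield resource_name
    finally:
        # Cleanup: runs even if the body raises, much like the
        # cleanup forms of Common Lisp's unwind-protect.
        log.append(f"close {resource_name}")


try:
    with tracked("data.csv") as name:
        log.append(f"use {name}")
        raise ValueError("something went wrong")
except ValueError:
    pass

print(log)  # ['open data.csv', 'use data.csv', 'close data.csv']
```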
Conclusion
There are so many nice books out there that can help you improve your coding and software design skills. Reading them alone won't turn you into a better programmer. But the combination of reading these books and then applying what you learn to your own projects will really help you write beautiful software.
I guess this is the difference between somebody who just gets a script to run and a real programmer.