In version 3.4, Python shipped with a new module in the standard library, pathlib. At the time of its release, I remember hearing some fanfare around it, but I didn’t quite understand the point of it. I had only recently stopped adding strings together to create paths to the files and folders I needed to work with, in favor of using the os.path module. It took me a little while to experiment with it and plumb the documentation for useful bits.
Since then, I’ve learned a lot about the pathlib module, and as I have said before, it is my favorite module in the standard library.
While that’s the case, most people I talk to about it are working too hard to access files and folders with Python. Many have either not heard of pathlib, or they still don’t understand it.
They still write code like this:
BASE_FOLDER = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
OK, so there’s nothing wrong with that code per se, but most Pythonistas would agree that your code should be readable. I would argue that pathlib could make that line of code more readable (one could argue that it’s not necessarily more readable if you’re not as familiar with it, so maybe I should settle for it being shorter?):
BASE_FOLDER = Path(__file__).resolve().parent.parent
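As a quick sanity check, the two spellings can be compared side by side. This is only a sketch: the helper names and the sample path are made up for illustration, and it assumes no symlinks sit between the file and its grandparent folder (resolve() follows symlinks, while os.path.abspath() does not).

```python
import os
import os.path
from pathlib import Path


def base_folder_os(file_path):
    """Two levels up from file_path, using os.path."""
    return os.path.dirname(os.path.dirname(os.path.abspath(file_path)))


def base_folder_pathlib(file_path):
    """Two levels up from file_path, using pathlib."""
    return Path(file_path).resolve().parent.parent


# Hypothetical script location; the file itself does not need to exist.
# realpath() is used so that a symlinked working directory cannot skew the comparison.
example = os.path.join(os.path.realpath(os.getcwd()), "project", "scripts", "run.py")

print(base_folder_os(example))
print(base_folder_pathlib(example))
```

Both calls should print the same folder. Path(...).parents[1] is another way to write .parent.parent.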
I don’t know what your favorite module in the standard library is (though I would like to know what it is and why!), but I imagine that most people don’t really think about it.
The reason why pathlib is my favorite module is that I have found it to be so very useful. And I really believe it can save you pain and effort in the months and years to come.
That’s why I created two resources for you.
The first is what I’m calling a “field guide” to pathlib. It’s a standalone document, a little longer than one of my blog posts, that will introduce you to the module and some of its features I use the most.
I have also created a cheat sheet for the module, since there are over 40 useful properties and methods on a Path object, and I find that I could use a quick-reference guide to remind me of the ones I don’t use as often.
Look, I suggest that the next time you work with a file or folder, you give pathlib a try.
It combines parts of the os.path and glob modules (and maybe more) in one useful package.
One of the things I love about pathlib is that you are now working with an object instead of a string.
Strings are not the best way to work with data. You can’t ask a string what the name of the file is. Instead, you’ll have to come up with a method to break the string apart and harvest that information.
One way to do that would be to use the basename function in the os.path module and split the result:
location = '/Users/chris/field_guides/pathlib.html'
file_name = os.path.basename(location).split('.')[0]
That works, but it’s not the most readable. You have to do some mental gymnastics to unpack the four different things that happen in that one line of code. It could be challenging for people who are new to Python.
Instead, you could use a Path object:
location = Path('/Users/chris/field_guides/pathlib.html')
file_name = location.stem
Now, this is what I’m talking about! There are no mental gymnastics to figure out how to split strings. There’s no having to remember what individual os.path functions do. It’s simply asking an object what one of its properties is.
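Here is a short sketch of a few other properties you can ask the same path for. It uses PurePosixPath rather than Path so that the output is the same on every operating system; that swap is my choice for the demo, not something the example above requires.

```python
from pathlib import PurePosixPath

# PurePosixPath only parses the string, so the file does not need to exist
# and the results do not depend on the machine running the code.
location = PurePosixPath('/Users/chris/field_guides/pathlib.html')

print(location.name)    # pathlib.html
print(location.stem)    # pathlib
print(location.suffix)  # .html
print(location.parent)  # /Users/chris/field_guides
print(location.parts)   # ('/', 'Users', 'chris', 'field_guides', 'pathlib.html')
```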
Since it frees me from directly manipulating strings, I find that pathlib allows me to be more expressive with the way I write code that interacts with files and folders.
I write less code, and my mind feels more able to think about the problem at hand.
For instance, you can easily reference all of the JSON files in the current folder:
json_files = Path().glob('*.json')
You can even safely open a file, load its contents, and close it in one line:
json_files = Path().glob('*.json')
for f in json_files:
    data = json.loads(f.read_text())
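If you want to try that pattern without touching your own files, here is a self-contained sketch. The function name, the file names, and their contents are all invented for the demo, and it works in a temporary folder instead of the current one.

```python
import json
import tempfile
from pathlib import Path


def load_json_folder(folder):
    """Return {file stem: parsed contents} for every .json file in folder."""
    return {
        f.stem: json.loads(f.read_text(encoding="utf-8"))
        for f in sorted(Path(folder).glob("*.json"))
    }


# Build a throwaway folder with two made-up JSON files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.json").write_text(json.dumps({"id": 1}), encoding="utf-8")
    Path(tmp, "b.json").write_text(json.dumps({"id": 2}), encoding="utf-8")
    Path(tmp, "notes.txt").write_text("not JSON", encoding="utf-8")
    loaded = load_json_folder(tmp)

print(loaded)  # {'a': {'id': 1}, 'b': {'id': 2}}
```

read_text() opens the file, reads it, and closes it again, which is why no with block is needed around each file.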
Even manipulating paths is easier with pathlib.
I see many people joining paths by adding strings together, like this:
# not the best way to manipulate a path
root_folder = 'path/to/content/folder'
config_file = root_folder + 'config/config.json'
While this can work, it’s not the best way to manipulate paths. It can break if you ever run your code on a different operating system, since Windows uses backslashes as path separators and other platforms use forward slashes.
Also, it’s easy to introduce bugs like I did in that code. I didn’t include a trailing slash in the root_folder variable. So the resulting path would look like this, and would probably fail:
# result of previous code
'path/to/content/folderconfig/config.json'
The next best way to join paths is to use the os.path.join function. This will handle the “bug” I introduced above:
config_file = os.path.join(root_folder, 'config/config.json')
print(config_file)
# path/to/content/folder/config/config.json
While this is better, it still doesn’t fully address the cross-platform issue, because the forward slashes hard-coded inside the strings are left untouched.
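To make that concrete, here is a small sketch. os.path is really posixpath on Linux and macOS and ntpath on Windows, and both modules can be imported on any system, so calling them directly previews what os.path.join would return on each platform.

```python
import ntpath      # the module behind os.path on Windows
import posixpath   # the module behind os.path on Linux and macOS

root_folder = 'path/to/content/folder'

print(posixpath.join(root_folder, 'config/config.json'))
# path/to/content/folder/config/config.json

print(ntpath.join(root_folder, 'config/config.json'))
# path/to/content/folder\config/config.json
```

The Windows result mixes both kinds of separators. Windows will usually still accept a path like that, but it is awkward to display or compare.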
pathlib not only makes it even easier to join paths, but it also allows you to write code that works on every operating system.
root_folder = Path('path/to/content/folder')
config_file = root_folder.joinpath('config/config.json')
pathlib gives you the option to write less code when joining paths. This is equivalent to the line above:
config_file = root_folder / 'config/config.json'
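To see how the same join behaves on different platforms without switching machines, here is a sketch using the “pure” path classes. PurePosixPath and PureWindowsPath are my stand-ins for what Path would turn into on each operating system; they only manipulate the path text and never touch the disk.

```python
from pathlib import PurePosixPath, PureWindowsPath

posix_config = PurePosixPath('path/to/content/folder') / 'config/config.json'
windows_config = PureWindowsPath('path/to/content/folder') / 'config/config.json'

print(posix_config)    # path/to/content/folder/config/config.json
print(windows_config)  # path\to\content\folder\config\config.json

# joinpath() and the / operator build the same path.
assert posix_config == PurePosixPath('path/to/content/folder').joinpath('config/config.json')

# joinpath() also accepts several segments at once.
print(PurePosixPath('path/to/content/folder').joinpath('config', 'config.json'))
# path/to/content/folder/config/config.json
```

Notice that the Windows version converts every separator to a backslash, including the ones typed inside the strings.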
Look, you seriously should give pathlib a try.
Download my field guide. Use it to get a basic understanding of how to use the Path object. Practice accessing files with it. Then, come back and download my cheat sheet, so you can better remember what attributes are at your disposal and be reminded of the ones you haven’t tried yet.
I’m confident you’ll enjoy working with files and folders more than you do now.
And also, let me know how you would improve these resources.