Glob is a generic term that refers to matching given patterns using Unix shell rules. Glob is supported by Linux and Unix systems and shells, and the function glob() is available in system libraries.
In Python, the glob module finds files and pathnames that match a pattern. The pattern rules follow Unix shell path expansion, although no tilde expansion is performed and results are returned in arbitrary order. Benchmarks are often cited as showing that glob matches pathnames in a directory faster than other approaches, but the difference depends on the directory layout and the Python version. Besides exact string matches, we can use wildcards (*, ?, and [ranges]) to make path retrieval more straightforward and convenient. The module is part of the Python standard library and does not need to be installed separately.
Glob in Python
Starting with Python 3.5, programmers can use the glob.glob() function to discover files recursively. The glob module returns the files and pathnames that match the pattern passed as an argument.
As noted above, the pattern rules follow standard Unix path expansion, and wildcards such as * and ? make path retrieval more concise than exact string matching. The benchmark results mentioned earlier are the usual argument for preferring glob over alternative methods of matching pathnames within directories.
Finding files recursively with glob() requires Python 3.5 or later. The glob module supports the “**” pattern, which is interpreted recursively only when you pass recursive=True; it tells Python to descend into subdirectories.
The syntax of glob() and iglob() is as follows:
glob.glob(pathname, *, recursive=False)
glob.iglob(pathname, *, recursive=False)
The recursive argument is set to False by default.
import glob

for filename in glob.iglob('src/**/*', recursive=True):
    print(filename)
Using an if statement inside the loop, you can check each filename against whatever condition you wish. In older Python versions, you can use os.walk() to walk the directory tree recursively and search the files; that approach is covered in a later section.
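As a minimal sketch of filtering with an if statement, the following keeps only regular files above a size threshold. The helper name find_large_files, the threshold, and the throwaway directory tree are assumptions made for illustration; they are not part of the glob module.

```python
import glob
import os
import tempfile


def find_large_files(root, min_bytes):
    """Return files under root (searched recursively) of at least min_bytes."""
    matches = []
    pattern = os.path.join(root, "**", "*")
    for filename in glob.iglob(pattern, recursive=True):
        # The pattern also matches directories, so keep regular files only.
        if os.path.isfile(filename) and os.path.getsize(filename) >= min_bytes:
            matches.append(filename)
    return sorted(matches)


# Build a small temporary tree so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    with open(os.path.join(root, "small.txt"), "w") as f:
        f.write("x")
    with open(os.path.join(root, "sub", "big.txt"), "w") as f:
        f.write("x" * 100)
    # Only sub/big.txt reaches the 50-byte threshold.
    large = [os.path.relpath(p, root) for p in find_large_files(root, 50)]
    print(large)
```

Any other test, such as a check on the extension or the modification time, can replace the size comparison inside the if statement.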
“Glob patterns specify sets of filenames with wildcard characters,” according to Wikipedia. These patterns are comparable to regular expressions, but they are simpler and easier to use.
- The asterisk (*) matches zero or more characters.
- The question mark (?) matches exactly one character.
- Square brackets ([0-9], [a-z]) match one character from the given range or set.
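To make the comparison with regular expressions concrete, here is a small sketch that uses fnmatch.translate() from the standard library to convert a glob-style pattern into a regular expression. The file names are made up for illustration. Note that fnmatch does not treat the path separator specially, whereas glob matches one path component at a time, so this only approximates what glob does for a single file name.

```python
import fnmatch
import re

# fnmatch.translate() turns a glob-style pattern into regular expression source.
regex = re.compile(fnmatch.translate("data?.txt"))

# "?" stands for exactly one character, so only the first name matches.
matches_one_char = regex.match("data1.txt") is not None
rejects_zero_chars = regex.match("data.txt") is None
rejects_two_chars = regex.match("data12.txt") is None

print(matches_one_char, rejects_zero_chars, rejects_two_chars)
```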
# Program demonstrating how to use glob with different wildcards
import glob

print('Named explicitly:')
for name in glob.glob('/home/code/Desktop/underscored/data.txt'):
    print(name)

# Using '*' pattern
print('\nNamed with wildcard *:')
for name in glob.glob('/home/code/Desktop/underscored/*'):
    print(name)

# Using '?' pattern
print('\nNamed with wildcard ?:')
for name in glob.glob('/home/code/Desktop/underscored/data?.txt'):
    print(name)

# Using [0-9] pattern
print('\nNamed with wildcard ranges:')
for name in glob.glob('/home/code/Desktop/underscored/*[0-9].*'):
    print(name)
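The program above depends on a directory that exists only on the author's machine. The following sketch shows the same three wildcards against a temporary directory, so it can be run anywhere. The file names and the names() helper are assumptions chosen for illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create a few empty files to match against.
    for name in ("data.txt", "data1.txt", "data2.txt", "dataAB.txt", "notes.md"):
        open(os.path.join(root, name), "w").close()

    def names(pattern):
        """Return the sorted base names matching pattern inside root."""
        found = glob.glob(os.path.join(root, pattern))
        return sorted(os.path.basename(p) for p in found)

    star = names("*.txt")          # any run of characters before .txt
    question = names("data?.txt")  # exactly one character after "data"
    digits = names("*[0-9].*")     # a digit just before the extension dot

print(star)      # ['data.txt', 'data1.txt', 'data2.txt', 'dataAB.txt']
print(question)  # ['data1.txt', 'data2.txt']
print(digits)    # ['data1.txt', 'data2.txt']
```

The results are sorted because glob returns matches in arbitrary order.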
Using glob() to search files recursively
To get the paths of files and subdirectories recursively, we can use the glob module’s glob.glob() and glob.iglob() functions.
The syntax is as follows:
glob.glob(pathname, *, recursive=False)
glob.iglob(pathname, *, recursive=False)
When recursive is set to True, the “**” pattern matches any files and zero or more directories and subdirectories. If “**” is followed by a path separator (for example ‘./**/’), only directories are matched.
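The behaviour of “**” can be checked with a short sketch like the one below. It builds a hypothetical three-level tree in a temporary directory (the names a, b, top.txt, mid.txt, and deep.txt are assumptions) and compares the same pattern with and without recursive=True, then adds a trailing separator.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in ("top.txt",
                os.path.join("a", "mid.txt"),
                os.path.join("a", "b", "deep.txt")):
        open(os.path.join(root, rel), "w").close()

    def rel_matches(pattern, recursive):
        """Return sorted matches for pattern, as paths relative to root."""
        found = glob.glob(os.path.join(root, pattern), recursive=recursive)
        return sorted(os.path.relpath(p, root) for p in found)

    # With recursive=True, "**" spans zero or more directory levels.
    recursive_txt = rel_matches(os.path.join("**", "*.txt"), True)
    # With recursive=False, "**" behaves like "*": exactly one level.
    flat_txt = rel_matches(os.path.join("**", "*.txt"), False)
    # A trailing separator restricts the matches to directories.
    dirs_only = rel_matches("**" + os.sep, True)

print(recursive_txt)
print(flat_txt)
print(dirs_only)
```

With recursive=True all three .txt files are found, while the non-recursive call sees only the file one level down.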
Example: Python program to find files
# Python program to find files recursively
import glob

# glob.glob() returns all matching names as a list.
print("Using glob.glob()")
files = glob.glob('/home/code/Desktop/underscored/**/*.txt', recursive=True)
for file in files:
    print(file)

# glob.iglob() returns an iterator that yields each match as the loop runs.
print("\nUsing glob.iglob()")
for filename in glob.iglob('/home/code/Desktop/underscored/**/*.txt', recursive=True):
    print(filename)
For previous Python versions, see:
The most straightforward technique is to use os.walk(), which is designed for recursive directory tree traversal. Alternatively, os.listdir() returns the entries of a single directory, without descending into its subdirectories, and we can then filter that list.
Let’s look at it through the lens of an example:
# Program for finding files recursively using Python
import fnmatch
import os

# Using os.walk()
for dirpath, dirs, files in os.walk('src'):
    for filename in files:
        fname = os.path.join(dirpath, filename)
        if fname.endswith('.c'):
            print(fname)

# Alternatively, use fnmatch.filter() to filter the results.
for dirpath, dirs, files in os.walk('src'):
    for filename in fnmatch.filter(files, '*.c'):
        print(os.path.join(dirpath, filename))

# Using os.listdir(), which lists a single directory only
path = "src"
dir_list = os.listdir(path)
for filename in fnmatch.filter(dir_list, '*.c'):
    print(os.path.join(path, filename))
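The os.walk() approach can also be wrapped in a small reusable generator. The sketch below is one possible shape; the function name find_files and the sample tree are assumptions for illustration.

```python
import fnmatch
import os
import tempfile


def find_files(root, pattern):
    """Yield paths under root whose base name matches the glob-style pattern."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, filename)


# Build a temporary tree with C sources at two levels.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "lib"))
    for rel in ("main.c", "README",
                os.path.join("lib", "util.c"),
                os.path.join("lib", "util.h")):
        open(os.path.join(root, rel), "w").close()
    c_files = sorted(os.path.relpath(p, root) for p in find_files(root, "*.c"))

print(c_files)
```

Because find_files() is a generator, callers can stop early without walking the rest of the tree.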
Example: glob() with the recursive parameter set to False
import glob

print('Explicitly mentioned file:')
for n in glob.glob('/home/code/Desktop/underscored/anyfile.txt'):
    print(n)

# The '*' pattern
print('\nFetch all with wildcard *:')
for n in glob.glob('/home/code/Desktop/underscored/*'):
    print(n)

# The '?' pattern
print('\nSearching with wildcard ?:')
for n in glob.glob('/home/code/Desktop/underscored/data?.txt'):
    print(n)

# The [0-9] pattern
print('\nUsing the wildcard to search for number ranges:')
for n in glob.glob('/home/code/Desktop/underscored/*[0-9].*'):
    print(n)
In the example above, we first import the glob module. We then pass a path to glob(), which returns the matching entries, and print them with the print() function. Because recursive is left at its default of False, only the named directory is searched, not its subdirectories. Appending patterns such as * (asterisk), ? (question mark), and [range] to the end of the path fetches and displays every file or folder in that directory whose name matches.
Example: glob() with the recursive parameter set to True
import glob

print("Applying glob.glob():")
fil = glob.glob('/home/code/Desktop/underscored/**/*.txt', recursive=True)
for f in fil:
    print(f)

# iglob() returns an iterator, so each match is printed as it is produced.
print("\nApplying glob.iglob():")
for f in glob.iglob('/home/code/Desktop/underscored/**/*.txt', recursive=True):
    print(f)
This is another program that demonstrates recursive traversal of directories and subdirectories. We first import the glob module, then pass a path pattern to glob(), which searches the directory and all of its subdirectories, and print each match with the print() function.
The pattern combines ** with *: ** stands for the directory itself and any depth of subfolders, and *.txt matches the file names inside them. The first parameter is the pattern string, while the keyword argument recursive=True tells glob() to visit all subdirectories recursively. The same is true of iglob(), which stands for iterator glob and returns an iterator that yields the same results as glob() without storing them all at once.
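The difference between the two functions can be seen in a short sketch: glob() hands back a complete list, while iglob() hands back an iterator that is consumed one match at a time. The file names a.txt, b.txt, and c.txt are assumptions used only to have something to match.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for name in ("a.txt", "b.txt", "c.txt"):
        open(os.path.join(root, name), "w").close()
    pattern = os.path.join(root, "*.txt")

    as_list = glob.glob(pattern)   # all matches collected into a list
    as_iter = glob.iglob(pattern)  # matches produced one at a time

    is_list = isinstance(as_list, list)
    is_lazy = not isinstance(as_iter, list)

    # Pull a single match on demand, then drain the remaining ones.
    first = next(as_iter)
    same_results = sorted(as_list) == sorted([first, *as_iter])

print(is_list, is_lazy, same_results)
```

The iterator is consumed inside the with block because the matches are looked up lazily, and the temporary directory is removed once the block ends.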
Searching a local directory tree recursively for files is a common task in Python applications. Glob-style wildcard patterns, which resemble a simplified form of regular expressions (regex), are a convenient way to do this.
Glob refers to matching patterns according to the Unix shell’s rules. Unix and Linux systems and their shells support globbing, and a glob() function is provided in system libraries.
glob() and iglob() are the two fundamental functions. Depending on the value of the recursive keyword argument (True/False), they search the path either directly or recursively. Because this functionality ships with Python, it is usually more convenient than writing the directory traversal by hand.
In this tutorial, you’ve learned how to use the glob() function in Python programs to discover files recursively. We hope you found it informative and enjoyed it as much as we did.