Read Large Files Efficiently with Python

With the rate at which data is growing, the size of the files we are expected to process seems to grow exponentially. These increasingly large files mean we need to do everything we can to optimize memory usage and processor time – especially when working with Python (I love you, Python, but you’re definitely not the fastest!).
Anyhow, here is our sample data (multiply this by a couple billion for a large file):
Joe
bill
cindy
mary
henry
joe
mary
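The sample above is obviously tiny. To “multiply it by a couple billion,” you can generate a large test file yourself. Here is a throwaway sketch – the ten-million-line count and the reuse of the names_data.txt file name are just my choices for illustration:

import random

names = ['joe', 'bill', 'cindy', 'mary', 'henry']
with open('names_data.txt', 'w') as test_file:
    # write ten million random names, one per line
    for _ in range(10_000_000):
        test_file.write(random.choice(names) + '\n')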
In most tutorials and books on reading large files, you will see something like this:
name_counts = {}
file_name = 'names_data.txt'
with open(file_name) as names_file:
    names = names_file.read().splitlines()
    for name in names:
        name = name.lower()  # deal with different casing
        if name in name_counts:
            name_counts[name] += 1
        else:
            name_counts[name] = 1

print(name_counts)
# output is: {'joe': 2, 'bill': 1, 'cindy': 1, 'mary': 2, 'henry': 1}
While this works, read().splitlines() loads the whole file into a list – all at once! As a result, this approach has O(n) space complexity, where n is the size of the file – needless to say, this is not memory efficient. The alternative is to read one line at a time:
name_counts = {}
file_name = 'names_data.txt'
with open(file_name) as names_file:
    for name in names_file:
        name = name.strip().lower()  # remember to strip the newline
        if name in name_counts:
            name_counts[name] += 1
        else:
            name_counts[name] = 1

print(name_counts)
# output is: {'joe': 2, 'bill': 1, 'cindy': 1, 'mary': 2, 'henry': 1}
By changing just two lines of code, the file reading now uses O(1) memory – iterating over the file object yields one line at a time, so we never hold more than a single line in memory. (The name_counts dictionary still grows with the number of distinct names, but not with the length of the file.) You just need to remember to strip the newline character off each line!
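As an aside, the standard library can handle the counting bookkeeping for us. Here is a minimal sketch of the same streaming approach using collections.Counter (same assumed names_data.txt file):

from collections import Counter

name_counts = Counter()
file_name = 'names_data.txt'
with open(file_name) as names_file:
    for name in names_file:
        # strip the newline and normalize casing before counting
        name_counts[name.strip().lower()] += 1

print(dict(name_counts))
# output is: {'joe': 2, 'bill': 1, 'cindy': 1, 'mary': 2, 'henry': 1}

Counter returns 0 for missing keys, so the if/else disappears entirely.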
With a large enough file, that one change could mean the difference between getting things done and bringing your machine to a grinding halt!
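Don’t take my word for it – Python’s built-in tracemalloc module lets you measure the peak memory of each approach yourself. A rough sketch, where count_all_at_once and count_line_by_line are just the two snippets above wrapped in functions (with our tiny sample file the difference is negligible, but it grows with file size):

import tracemalloc

def count_all_at_once(file_name):
    # first approach: read().splitlines() materializes every line in memory
    name_counts = {}
    with open(file_name) as names_file:
        for name in names_file.read().splitlines():
            name = name.lower()
            name_counts[name] = name_counts.get(name, 0) + 1
    return name_counts

def count_line_by_line(file_name):
    # second approach: iterate the file object one line at a time
    name_counts = {}
    with open(file_name) as names_file:
        for name in names_file:
            name = name.strip().lower()
            name_counts[name] = name_counts.get(name, 0) + 1
    return name_counts

for counter in (count_all_at_once, count_line_by_line):
    tracemalloc.start()
    counter('names_data.txt')
    _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    tracemalloc.stop()
    print(f'{counter.__name__}: peak memory {peak:,} bytes')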
Did you like this post? We are working on a series of posts to help readers go from newbies in Python to pros. Join our email list to make sure you don’t miss that series.