# How does Colab work?
Google Colab is an extremely powerful tool for data analysis, machine learning, and Python programming because it eliminates the hassle of local configuration. However, one area that often confuses beginners, and sometimes even intermediate users, is file management.
Where are the files located? Why do they disappear? How do you upload, download, or permanently store data? This article answers all of these questions step by step.
Let’s clear up the biggest misunderstanding right away. Google Colab doesn’t work like your laptop. Every time you open a notebook, Colab provides a temporary virtual machine (VM). When you leave, everything inside is wiped. This means:
- Files saved locally are temporary
- After the runtime resets, the files disappear
Your default working directory is:
/content
Whatever you put inside /content will disappear after the runtime resets.
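To confirm where a notebook is actually running, a small check like the one below can help. It is only a minimal sketch: `in_colab` is a hypothetical helper name, and it assumes the common convention that the `google.colab` module is already loaded in a Colab session.

```python
import os
import sys


def in_colab() -> bool:
    """Return True if the google.colab module is loaded in this session."""
    return "google.colab" in sys.modules


print("Running in Colab:", in_colab())
# On a fresh Colab runtime the working directory is typically /content.
print("Working directory:", os.getcwd())
```

Outside Colab the first line prints `False`, which makes the check handy for notebooks that are also run locally.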
# Browsing files in Colab
You have two basic ways to browse files.
// Method 1: Using the visual method
Here is the recommended approach for beginners:
- Look at the left sidebar
- Click the folder icon
- Explore the contents of /content
This is great when you just want to see what’s going on.
// Method 2: Using the Python method
This is useful when creating scripts or debugging paths.
import os
os.listdir('/content')
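`os.listdir` only shows the top level of a folder. For nested folders, a recursive listing with file sizes may be more helpful when debugging paths. The sketch below uses `pathlib`; `list_files` is a hypothetical helper name, not part of Colab.

```python
from pathlib import Path


def list_files(root: str) -> list[tuple[str, int]]:
    """Return (relative path, size in bytes) for every file under root, sorted by path."""
    base = Path(root)
    entries = []
    for path in base.rglob("*"):
        if path.is_file():
            entries.append((path.relative_to(base).as_posix(), path.stat().st_size))
    return sorted(entries)
```

In Colab, calling `list_files('/content')` would return every file in the working directory together with its size.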
# Uploading and downloading files
Let’s say you have a dataset or comma-separated values (CSV) file on your laptop. The first method is to upload it via code.
from google.colab import files
files.upload()
A file picker opens. Select a file and it will appear in /content. The file is temporary unless you move it elsewhere.
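`files.upload()` also returns a dictionary that maps each uploaded filename to its raw bytes, which is useful for confirming what arrived. The following is a minimal sketch: `summarize_uploads` is a hypothetical helper, and the import is guarded so the cell does nothing interactive outside Colab.

```python
def summarize_uploads(uploaded: dict[str, bytes]) -> dict[str, int]:
    """Map each uploaded filename to its size in bytes."""
    return {name: len(content) for name, content in uploaded.items()}


try:
    from google.colab import files  # only importable inside Colab
except ImportError:
    files = None  # running outside Colab; skip the interactive upload

if files is not None:
    uploaded = files.upload()  # opens the file picker; returns {filename: bytes}
    for name, size in summarize_uploads(uploaded).items():
        print(f"Uploaded {name} ({size} bytes)")
```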
The second method is drag and drop. This method is simple, but the storage remains temporary.
- Open the file explorer (left panel)
- Drag files directly into /content
To download a file from Colab to your local computer:
from google.colab import files
files.download('model.pkl')
Your browser will download the file immediately. This works for CSV files, models, logs and images.
If you want your files to survive a runtime reset, you must use Google Drive. To mount Google Drive:
from google.colab import drive
drive.mount('/content/drive')
After authorizing access, your Drive will appear at:
/content/drive/MyDrive
Everything written here is persistent.
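Before writing anything important, it can be worth checking that the mount actually succeeded. The sketch below assumes the mount point `/content/drive` used above and that a mounted Drive exposes a `MyDrive` folder; `drive_is_mounted` is a hypothetical helper name.

```python
import os


def drive_is_mounted(mount_point: str = "/content/drive") -> bool:
    """Return True if a MyDrive folder is visible under the mount point."""
    return os.path.isdir(os.path.join(mount_point, "MyDrive"))


if drive_is_mounted():
    print("Drive is mounted; files written under MyDrive will persist.")
else:
    print("Drive is not mounted; run drive.mount('/content/drive') first.")
```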
# Recommended project folder structure
A messy Drive becomes painful very quickly. A clean, reusable structure is:
MyDrive/
└── ColabProjects/
    └── My_Project/
        ├── data/
        ├── notebooks/
        ├── models/
        ├── outputs/
        └── README.md
To save time, you can define path variables like:
BASE_PATH = '/content/drive/MyDrive/ColabProjects/My_Project'
DATA_PATH = f'{BASE_PATH}/data/train.csv'
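The folder layout itself can be created from code so that every new project starts the same way. This is a minimal sketch using `pathlib`; `create_project` and `SUBFOLDERS` are hypothetical names, and the README text is a placeholder.

```python
from pathlib import Path

SUBFOLDERS = ("data", "notebooks", "models", "outputs")


def create_project(base_path: str) -> Path:
    """Create the project folder and its subfolders; safe to re-run."""
    base = Path(base_path)
    for name in SUBFOLDERS:
        (base / name).mkdir(parents=True, exist_ok=True)
    readme = base / "README.md"
    if not readme.exists():  # never overwrite existing notes
        readme.write_text("# Project notes\n")
    return base
```

After mounting Drive, calling `create_project(BASE_PATH)` would build the structure under the project path; re-running it leaves existing files untouched.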
To permanently save a file using pandas:
import pandas as pd
df.to_csv('/content/drive/MyDrive/data.csv', index=False)
To load the file later:
df = pd.read_csv('/content/drive/MyDrive/data.csv')
# File management in Colab
// Working with ZIP files
To extract a ZIP file:
import zipfile
with zipfile.ZipFile('dataset.zip', 'r') as zip_ref:
    zip_ref.extractall('/content/data')
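It often helps to see what an archive contains as it is extracted. The sketch below wraps the same `zipfile` calls in a hypothetical helper, `extract_zip`, that creates the destination folder if needed and returns the member names.

```python
import zipfile
from pathlib import Path


def extract_zip(zip_path: str, dest_dir: str) -> list[str]:
    """Extract zip_path into dest_dir and return the archive's member names."""
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        names = zip_ref.namelist()
        zip_ref.extractall(dest_dir)
    return names
```

For example, `extract_zip('dataset.zip', '/content/data')` would unpack the archive and return the list of files it contained.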
// Using shell commands to manage files
Colab supports Linux shell commands via the ! prefix.
!pwd
!ls
!mkdir data
!rm file.txt
!cp source.txt destination.txt
This is very useful for automation. Once you get used to it, you will use it often.
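The `!` prefix only works in notebook cells. For code that also needs to run as a plain script, the same operations have standard-library equivalents. The sketch below mirrors the commands above inside a throwaway scratch folder; the file names are illustrative.

```python
import os
import shutil
import tempfile
from pathlib import Path

work = Path(tempfile.mkdtemp())            # scratch folder for the demo
print(Path.cwd())                          # !pwd
(work / "data").mkdir()                    # !mkdir data
(work / "source.txt").write_text("hello\n")
shutil.copy(work / "source.txt", work / "destination.txt")  # !cp source.txt destination.txt
(work / "source.txt").unlink()             # !rm source.txt
listing = sorted(os.listdir(work))         # !ls
print(listing)                             # ['data', 'destination.txt']
shutil.rmtree(work)                        # clean up the scratch folder
```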
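// Downloading files directly from the Internet
Instead of uploading manually, you can use wget:
!wget https://example.com/data.csv
Or use the Python Requests library:
import requests
url = 'https://example.com/data.csv'  # same placeholder URL as above
r = requests.get(url)
r.raise_for_status()  # stop on HTTP errors instead of saving an error page
with open('data.csv', 'wb') as f:
    f.write(r.content)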
This is very effective for datasets and pre-trained models.
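For large files, reading the whole response into memory before writing it can be wasteful. The sketch below streams the download to disk in chunks instead. Note that it swaps in the standard library's `urllib.request` rather than Requests, and `download` is a hypothetical helper name.

```python
import shutil
import urllib.request
from pathlib import Path


def download(url: str, dest: str) -> int:
    """Stream url to dest in chunks and return the number of bytes written."""
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return Path(dest).stat().st_size
```

Calling `download('https://example.com/data.csv', 'data.csv')` would save the file to the current directory and report its size.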
# Additional notes
// Storage limits
Please note the following limitations:
- Colab VM disk space is approximately 100 GB (temporary)
- Google Drive storage is limited by your personal quota
- Browser uploads are limited to approximately 5 GB
When dealing with large datasets, always plan ahead.
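Because the available space can differ from one session to another, it may be safer to measure it than to rely on a fixed number. The sketch below uses `shutil.disk_usage`; `free_space_gb` is a hypothetical helper name.

```python
import shutil


def free_space_gb(path: str = ".") -> float:
    """Return the free disk space at path, in gigabytes (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


print(f"Free space: {free_space_gb():.1f} GB")
```

In Colab, passing `'/content'` reports what is left on the temporary VM disk before a large download or extraction.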
// Best practices
- Mount Drive at the beginning of the notebook
- Use variables for paths
- Keep raw data read-only
- Keep data, models, and outputs in separate folders
- Add a README file for your future self
// When not to use Google Drive
Avoid using Google Drive when:
- Training on extremely large datasets
- Fast I/O is critical to performance
- You need distributed storage
Alternatives you can use in such cases include:
# Final thoughts
Once you understand how file management works in Colab, your workflow will become much more efficient. There is no need to panic about lost files or rewriting code. With these tools, you can keep your experiments clean and your data transfers smooth.
Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of artificial intelligence and medicine. She is co-author of the e-book “Maximizing Productivity with ChatGPT”. As a 2022 Google Generation Scholar for APAC, she promotes diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, a Mitacs Globalink Research Scholar, and a Harvard WeCode Scholar. Kanwal is a staunch advocate for change and founded FEMCodes to empower women in STEM fields.
