
Photo by the author
# Introduction
Keeping up with data science isn’t always easy. New libraries, papers, datasets, and tools appear every day, and I can’t remember them all. I’ve found that just following newsletters or threads doesn’t really work. It’s more helpful to have a set of resources ready to go. For me, that’s a small hub where I keep research, coding, datasets, visualizations, and quick references all in one place. After trying many things, I now have 10 tabs that I use all the time. They help me stay focused, save time, and know what’s going on. I open them every morning, and they set the tone for my day. Here’s an overview of my most-used bookmarks and why I keep them:
# 1. arXiv: New Papers on Machine Learning (cs.LG)
arXiv is where I check out the latest machine learning research. The cs.LG section covers everything from theory to applied machine learning in NLP, vision, and RL. I keep it bookmarked and check back frequently so I don’t miss papers that might inspire new ideas or projects. It’s a great way to get ahead of the curve and learn new methods before they end up in blog posts or on GitHub.
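If you would rather pull the newest cs.LG listings into a script than open the tab, arXiv exposes a public query API that returns an Atom feed. The sketch below is one possible way to do it with the standard library only. The helper names `build_query_url` and `parse_feed` are my own for illustration, and the page size of 10 is an arbitrary assumption.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def build_query_url(category="cs.LG", max_results=10):
    """Build a query URL for the newest submissions in one arXiv category."""
    params = {
        "search_query": f"cat:{category}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    # Keep the colon readable; other characters are percent-encoded as needed.
    return f"{ARXIV_API}?{urlencode(params, safe=':')}"


def parse_feed(atom_xml):
    """Extract (title, link) pairs from an Atom feed string."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.findtext("atom:id", default="", namespaces=ATOM_NS)
        # Titles in the feed can wrap across lines, so collapse whitespace.
        papers.append((" ".join(title.split()), link.strip()))
    return papers


if __name__ == "__main__":
    print(build_query_url())
```

To fetch live results, pass the URL to `urllib.request.urlopen`, decode the response body, and hand the text to `parse_feed`. Keep the request rate low when you do.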
# 2. Trending Python Repositories on GitHub
This page shows the most popular Python projects each week, from new libraries to experimental tools. I keep it bookmarked because data science isn’t just about algorithms, it’s also about tools. Scanning what’s popular helps me spot useful libraries or patterns early, before everyone else is using them. Just 10 minutes a week here usually gives me one or two things worth trying.
# 3. Data Is Plural
Data Is Plural is a newsletter and archive full of unusual and fascinating datasets. I keep it bookmarked because it’s great for finding ideas for projects, tutorials, or hackathon challenges. Each dataset comes with a short description and a link. It’s an easy way to explore new data and get ideas beyond Kaggle or your usual sources.
# 4. Artificial intelligence destroyed
Artificial intelligence destroyed brings together the most important news and articles about artificial intelligence and machine learning, saving me hours of searching. Whether it’s a new paper, a tool release, or a new approach, it provides a quick overview so I can see what matters. Basically, it’s a simple way to stay up to date and follow trends.
# 5. RAWGraphs
RAWGraphs is a free browser-based tool that lets you quickly create clean, customizable charts. I can create visualizations directly from CSV or JSON without writing complicated matplotlib or seaborn code. It’s great for spotting trends and outliers or for creating charts for reports. Charts can be easily exported to vector formats, so they look professional in slides and articles.
# 6. Quartz Guide to Bad Data
The Quartz guide to bad data is one of my favorite resources for cleaning dirty data. It discusses common problems such as missing values, garbled text, inconsistent formatting, and incorrectly entered numbers, and provides tips on how to fix them. Messy data is just part of the job, and this guide saves me a lot of time troubleshooting. I also like that it is organized by who should fix what, which makes it much easier to track down and resolve problems.
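To make those problem types concrete, here is a minimal standard-library sketch of how missing values, inconsistent formatting, and badly entered numbers might be handled in code. It is not taken from the guide. The function names, the list of missing-value markers, and the assumption of US-style number formatting are all mine.

```python
# Hypothetical list of strings that should count as "missing".
MISSING_MARKERS = {"", "na", "n/a", "nan", "null", "none", "-"}


def clean_number(raw):
    """Convert a messy numeric string to float, or None if it is missing."""
    text = str(raw).strip()
    if text.lower() in MISSING_MARKERS:
        return None
    # Drop thousands separators and a leading dollar sign
    # (assumes US-style numbers such as "1,234.50").
    text = text.replace(",", "").lstrip("$")
    try:
        return float(text)
    except ValueError:
        # Entries that cannot be parsed (e.g. a letter O typed for a zero)
        # are treated as missing rather than guessed at.
        return None


def clean_label(raw):
    """Normalize inconsistent whitespace and capitalization in a text field."""
    return " ".join(str(raw).split()).title()
```

Returning `None` for anything unparseable keeps bad entries visible as gaps instead of silently turning them into plausible-looking numbers.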
# 7. Five-minute statistics
Five-minute statistics is a concise reference for basic statistical concepts and formulas. I can brush up on topics like hypothesis testing, probability distributions, correlations, and descriptive statistics in just a few minutes. It’s perfect for checking calculations, preparing lessons, or writing tutorials without digging through textbooks.
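When I refresh a formula, I like to check it against a few numbers I can verify by hand. The sketch below shows the kind of quick check I mean: a Pearson correlation and a one-sample t statistic written directly from their textbook definitions. The function names are illustrative and do not come from the site.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from its definition."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # Note: a constant sample has zero spread, which raises ZeroDivisionError.
    return cov / (spread_x * spread_y)


def one_sample_t(sample, mu0):
    """t statistic for testing whether the sample mean differs from mu0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / standard_error
```

For example, `pearson_r([1, 2, 3, 4], [2, 4, 6, 8])` should come out at 1 up to floating-point rounding, because the second list is an exact multiple of the first.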
# 8. Awesome Data Analysis
Awesome Data Analysis is a GitHub collection of tools and resources for every part of the data workflow. I keep it bookmarked because it’s great for finding what I need for cleaning, manipulating, and visualizing data and for building machine learning pipelines. Whether I’m trying new libraries, refreshing my toolkit, or sharing knowledge with colleagues or students, it helps me quickly find reliable, well-maintained tools.
# 9. Mockaroo
Mockaroo is a tool for generating random data and mock APIs. I can quickly create realistic datasets in CSV, JSON, SQL, or Excel formats without typing everything in manually. It’s great for testing code, dashboards, or machine learning workflows, including tricky edge cases. The mock APIs also let me work on the frontend and backend at the same time.
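For a sense of what a generated dataset with deliberate edge cases looks like, here is a small local stand-in written with the standard library. It does not call Mockaroo or imitate its API. The column names, value lists, and the 10% missing-value rate are all made up for illustration.

```python
import csv
import io
import random

# Hypothetical value pools for the illustrative schema.
FIRST_NAMES = ["Ada", "Grace", "Alan", "Edsger", "Barbara"]
PLANS = ["free", "pro", "enterprise"]


def make_rows(n, seed=0):
    """Generate n fake customer rows; the same seed gives the same rows."""
    rng = random.Random(seed)
    rows = []
    for i in range(1, n + 1):
        rows.append({
            "id": i,
            "name": rng.choice(FIRST_NAMES),
            "plan": rng.choice(PLANS),
            "monthly_spend": round(rng.uniform(0, 500), 2),
            # Edge case on purpose: roughly one row in ten has no signup year.
            "signup_year": None if rng.random() < 0.1 else rng.randint(2015, 2024),
        })
    return rows


def to_csv(rows):
    """Serialize a non-empty list of row dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty fields
    return buffer.getvalue()
```

Seeding the generator makes test failures reproducible, which matters once the fake data starts exercising edge cases in a dashboard or pipeline.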
# 10. At the traffic lights
At the traffic lights is a platform with job offers in the field of technology and data. I use it to browse new job postings, track companies, and filter listings by topic, location, or remote options. You can also export lists in CSV or JSON format, making it easier to track opportunities. It’s a simple way to stay up to date with the job market without having to jump between multiple sites.
Kanwal Mehreen is a machine learning engineer and technical writer with a deep passion for data science and the intersection of artificial intelligence and medicine. She is co-author of the e-book “Maximizing Productivity with ChatGPT”. As a 2022 Google Generation Scholar for APAC, she promotes diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, a Mitacs Globalink Research Scholar, and a Harvard WeCode Scholar. Kanwal is a staunch supporter of change and founded FEMCodes to empower women in STEM fields.
