<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:media="http://search.yahoo.com/mrss/"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Data Science &#8211; AI SCKOOL</title>
	<atom:link href="https://aisckool.com/category/data-science/feed/" rel="self" type="application/rss+xml" />
	<link>https://aisckool.com</link>
	<description>All About Artificial Intelligence</description>
	<lastBuildDate>Thu, 23 Apr 2026 03:04:00 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://aisckool.com/wp-content/uploads/2024/05/cropped-8FDB48F0-2148-449F-B10B-86E84E56DAD5-removebg-preview-1-e1716890217940-32x32.png</url>
	<title>Data Science &#8211; AI SCKOOL</title>
	<link>https://aisckool.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>5 Docker best practices for faster builds and smaller images</title>
		<link>https://aisckool.com/5-docker-best-practices-for-faster-builds-and-smaller-images/</link>
					<comments>https://aisckool.com/5-docker-best-practices-for-faster-builds-and-smaller-images/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Thu, 23 Apr 2026 03:04:00 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26324</guid>

					<description><![CDATA[You wrote a Dockerfile, built an image, and everything works. But then you notice that the image is over a gigabyte in size, rebuilding takes several minutes for even the smallest change, and every push or pull feels painfully slow. This is not unusual. It is what you get by default [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div id="post-">
<p>    <center><br /><span>Photo by the author</span></center></p>
<h2><span># </span>Introduction</h2>
<p>You wrote a Dockerfile, built an image, and everything works. But then you notice that the image is over a gigabyte in size, rebuilding takes several minutes for even the smallest change, and every push or pull feels painfully slow.</p>
<p>This is not unusual. It is what you get by default when you write Dockerfiles without thinking about base image selection, build context, and caching. You don&#8217;t need a complete overhaul to fix it: a few targeted changes can shrink your image by 60-80% and turn most rebuilds from minutes into seconds.</p>
<p>In this article, we&#8217;ll cover five practical techniques that will help you make your Docker images smaller, faster to build, and more efficient.</p>
</p>
<h2><span># </span>Prerequisites</h2>
<p>To follow along you will need:</p>
<ul>
<li><strong><a href="https://docs.docker.com/get-docker/" target="_blank" rel="noopener">Docker</a></strong>    installed</li>
<li>Basic familiarity with <code style="background: #F5F5F5;">Dockerfiles</code> and the <code style="background: #F5F5F5;">docker build</code> command</li>
<li>A Python project with a <code style="background: #F5F5F5;">requirements.txt</code> file (the examples use Python, but the principles apply to any language)</li>
</ul>
<h2><span># </span>Select Slim or Alpine base images</h2>
<p>Every Dockerfile starts with a <code style="background: #F5F5F5;">FROM</code> instruction that selects a base image. The base image is the foundation your application is built on, and its size is the minimum size of your image before you add a single line of your own code.</p>
<p>For example, the official <code style="background: #F5F5F5;">python:3.11</code> image is a full Debian-based image that includes compilers, tools, and packages that most applications never use.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code># Full image — everything included
FROM python:3.11

# Slim image — minimal Debian base
FROM python:3.11-slim

# Alpine image — even smaller, musl-based Linux
FROM python:3.11-alpine</code></pre>
</div>
<p>Now build an image from each of them and check the sizes:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>docker images | grep python</code></pre>
</div>
<p>Changing that one line in your Dockerfile makes a difference of several hundred megabytes. So which one should you use?</p>
<ul>
<li><strong>slim</strong> is the safer default for most Python projects. It strips out tools you rarely need but keeps the standard C library (glibc) and the shared libraries that many Python packages need to install and run properly.
</li>
<li><strong>alpine</strong> is even smaller, but it uses a different C library, <strong><a href="https://wiki.alpinelinux.org/wiki/Musl" target="_blank" rel="noopener">musl</a></strong>, instead of <strong><a href="https://www.gnu.org/software/libc/" target="_blank" rel="noopener">glibc</a></strong>. This can cause compatibility issues with Python packages that don&#8217;t publish musl-compatible wheels, because pip then has to compile them from source. You can end up spending more time debugging failed pip installs than you save on image size.
</li>
</ul>
<p><strong>Rule of thumb</strong>: start with <strong>python:3.1x-slim</strong>. Switch to Alpine only if you are sure your dependencies are compatible and you need the extra size reduction.</p>
</p>
<h2><span># </span>Ordering layers to maximize caching</h2>
<p>Docker builds images layer by layer, one instruction at a time. Once a layer is built, Docker caches it. On the next build, if nothing that affects a layer has changed, Docker reuses the cached version and skips rebuilding it.</p>
<p>The catch: <strong>if a layer changes, every layer after it is invalidated and rebuilt</strong>.</p>
<p>This matters most when installing dependencies. Here&#8217;s a common mistake:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code># Bad layer order: dependencies reinstall on every code change
FROM python:3.11-slim

WORKDIR /app

# Copies everything, including your code
COPY . .
# Runs AFTER the copy, so it reruns whenever any file changes
RUN pip install -r requirements.txt</code></pre>
</div>
<p>Every time you change a single line of your script, Docker invalidates the <code style="background: #F5F5F5;">COPY . .</code> layer and then reinstalls all dependencies from scratch. In a project with a large <code style="background: #F5F5F5;">requirements.txt</code>, that can mean minutes lost on every rebuild.</p>
<p>The fix is simple: <strong>copy what changes least, first</strong>.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code># Good layer order — dependencies cached unless requirements.txt changes
FROM python:3.11-slim

WORKDIR /app

# Copy only the requirements file first
COPY requirements.txt .
# Install dependencies; this layer stays cached until requirements.txt changes
RUN pip install --no-cache-dir -r requirements.txt

# Copy your code last; only this layer reruns on code changes
COPY . .

CMD ["python", "app.py"]</code></pre>
</div>
<p>Now when you change <code style="background: #F5F5F5;">app.py</code>, Docker reuses the cached pip layer and only reruns the final <code style="background: #F5F5F5;">COPY . .</code>.</p>
<p><strong>Rule of thumb</strong>: order your <code style="background: #F5F5F5;">COPY</code> and <code style="background: #F5F5F5;">RUN</code> instructions from least frequently changed to most frequently changed. Dependencies before code, always.</p>
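<p>To see why the order matters, here is a toy Python model of the invalidation rule described above. It is a simplified sketch, not Docker&#8217;s actual cache implementation: BuildKit computes its cache keys differently, and the project files and their contents below are hypothetical. Each layer&#8217;s key hashes its parent&#8217;s key, the instruction text, and the contents of any files the instruction copies, so a change to one input invalidates every layer after it.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>import hashlib


def layer_keys(steps, files):
    """Toy cache model: each layer's key hashes the parent layer's key, the
    instruction text, and the contents of any files the instruction copies."""
    keys, parent = [], ""
    for instruction, inputs in steps:
        digest = hashlib.sha256(parent.encode())
        digest.update(instruction.encode())
        for name in sorted(inputs):
            digest.update(name.encode())
            digest.update(files[name].encode())
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(steps, old_files, new_files):
    """List the instructions whose cache key differs between two builds."""
    old_keys = layer_keys(steps, old_files)
    new_keys = layer_keys(steps, new_files)
    return [
        instruction
        for (instruction, _), old, new in zip(steps, old_keys, new_keys)
        if old != new
    ]


PROJECT = ("requirements.txt", "app.py")  # hypothetical two-file project

bad_order = [
    ("COPY . .", PROJECT),
    ("RUN pip install -r requirements.txt", ()),
]
good_order = [
    ("COPY requirements.txt .", ("requirements.txt",)),
    ("RUN pip install --no-cache-dir -r requirements.txt", ()),
    ("COPY . .", PROJECT),
]

before = {"requirements.txt": "pandas\n", "app.py": "print('v1')\n"}
after = {**before, "app.py": "print('v2')\n"}  # edit only the application code

print(rebuilt_layers(bad_order, before, after))
# ['COPY . .', 'RUN pip install -r requirements.txt']
print(rebuilt_layers(good_order, before, after))
# ['COPY . .']</code></pre>
</div>
<p>Editing only <code style="background: #F5F5F5;">app.py</code> rebuilds both layers in the bad order but only the final <code style="background: #F5F5F5;">COPY . .</code> in the good order. Changing <code style="background: #F5F5F5;">requirements.txt</code> still rebuilds every layer in the good order, which is exactly when you want dependencies reinstalled.</p>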
</p>
<h2><span># </span>Using multi-stage builds</h2>
<p>Some tools, such as compilers, test runners, and build dependencies, are only needed at build time, but they end up in the final image anyway, filling it with things the running application never touches.</p>
<p>Multi-stage builds solve this problem. You use one stage to build or install everything you need, then copy only the finished result into a clean, minimal final image. Build tools never make it into the image you ship.</p>
<p>Here is a Python example where we want to install dependencies but keep build tools out of the final image:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code># Single-stage — build tools end up in the final image
FROM python:3.11-slim

WORKDIR /app

RUN apt-get update && apt-get install -y gcc build-essential
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
CMD ["python", "app.py"]</code></pre>
</div>
<p>And now with a multi-stage build:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code># Multi-stage — build tools stay in the builder stage only

# Stage 1: builder — install dependencies
FROM python:3.11-slim AS builder

WORKDIR /app

RUN apt-get update && apt-get install -y gcc build-essential

COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: runtime — tidy image with only what's needed
FROM python:3.11-slim

WORKDIR /app

# Copy only the installed packages from the builder stage
COPY --from=builder /install /usr/local

COPY . .

CMD ["python", "app.py"]</code></pre>
</div>
<p>The gcc and build tools needed to compile some Python packages are gone from the final image. The application still runs because the compiled packages were copied over; the build tools themselves stayed behind in the builder stage, which is left out of the final image. This pattern pays off even more in Go or Node.js projects, where a compiler toolchain or build-time <code style="background: #F5F5F5;">node_modules</code> weighing hundreds of megabytes can be left out of the shipped image entirely.</p>
</p>
<h2><span># </span>Cleaning up in the installation layer</h2>
<p>When you install system packages with <code style="background: #F5F5F5;">apt-get</code>, the package manager downloads package lists and caches files you don&#8217;t need at runtime. If you delete them in a separate <code style="background: #F5F5F5;">RUN</code> instruction, they still exist in the earlier layer, and because Docker stacks layers on top of each other, they still count toward the final image size.</p>
<p>To actually remove them, the cleanup must happen in the same <code style="background: #F5F5F5;">RUN</code> instruction as the installation.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code># Cleanup in a separate layer: cached files still bloat the image
FROM python:3.11-slim

RUN apt-get update && apt-get install -y curl
# Too late: the lists were already committed in the layer above
RUN rm -rf /var/lib/apt/lists/*

# Cleanup in the same layer: the lists are deleted before the layer is committed
FROM python:3.11-slim

RUN apt-get update && apt-get install -y curl \
    && rm -rf /var/lib/apt/lists/*</code></pre>
</div>
<p>The same logic applies to other package managers and temporary files; pip&#8217;s <code style="background: #F5F5F5;">--no-cache-dir</code> flag serves the same purpose for Python packages.</p>
<p><strong>Rule of thumb</strong>: every <code style="background: #F5F5F5;">apt-get install</code> should be followed by <code style="background: #F5F5F5;">&& rm -rf /var/lib/apt/lists/*</code> in the same <code style="background: #F5F5F5;">RUN</code> instruction. Make it a habit.</p>
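<p>The reason a later <code style="background: #F5F5F5;">rm</code> doesn&#8217;t help is that each layer is a stored snapshot of what changed in its step. The sketch below is a toy Python model of that behaviour with made-up file sizes in arbitrary units; it is not how Docker&#8217;s storage drivers are implemented. Deleting a file that an earlier layer added only hides it in the final filesystem, while deleting it within the same step means it is never stored at all.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>def image_bytes(steps):
    """Toy union-filesystem model of image layers.

    Each step stands for one RUN instruction and is a list of operations:
    ("add", path, size) or ("rm", path). A layer stores whatever still
    exists when its step finishes; deleting a file that an earlier layer
    added only hides it from the final filesystem.
    Returns (bytes stored across all layers, bytes visible in the container).
    """
    stored = 0
    visible = {}
    for ops in steps:
        layer = {}
        for op in ops:
            if op[0] == "add":
                _, path, size = op
                layer[path] = size
            else:
                _, path = op
                layer.pop(path, None)    # removed before commit: never stored
                visible.pop(path, None)  # removed from an earlier layer: only hidden
        stored += sum(layer.values())
        visible.update(layer)
    return stored, sum(visible.values())


# Hypothetical sizes in arbitrary units; real numbers depend on the packages
CURL = ("add", "/usr/bin/curl", 1)
APT_LISTS = ("add", "/var/lib/apt/lists/index", 40)
CLEANUP = ("rm", "/var/lib/apt/lists/index")

separate_layers = [[CURL, APT_LISTS], [CLEANUP]]
same_layer = [[CURL, APT_LISTS, CLEANUP]]

print(image_bytes(separate_layers))  # (41, 1)
print(image_bytes(same_layer))       # (1, 1)</code></pre>
</div>
<p>In both cases the running container sees the same files, but only the same-layer version avoids storing (and pushing and pulling) the package lists.</p>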
</p>
<h2><span># </span>Using .dockerignore files</h2>
<p>When you run <code style="background: #F5F5F5;">docker build</code>, Docker sends everything in the build directory to the Docker daemon as the build context. This happens before any Dockerfile instruction runs, and the context often contains files that you almost certainly don&#8217;t want in your image.</p>
<p>Without a <code style="background: #F5F5F5;">.dockerignore</code> file, you send the entire project folder: <code style="background: #F5F5F5;">.git</code> history, virtual environments, local data files, test fixtures, editor configurations, and more. This slows down every build and risks copying sensitive files into the image.</p>
<p>A <code style="background: #F5F5F5;">.dockerignore</code> file works much like <code style="background: #F5F5F5;">.gitignore</code>: it tells Docker which files and folders to exclude from the build context. One difference to watch for is that patterns are matched relative to the root of the build context, so <code style="background: #F5F5F5;">*.csv</code> only excludes CSV files at the top level, while <code style="background: #F5F5F5;">**/*.csv</code> excludes them at any depth.</p>
<p>Here is a sample (truncated) <code style="background: #F5F5F5;">.dockerignore</code> for a typical Python data project:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code># Patterns are anchored at the build context root;
# the **/ prefix makes a pattern match at any depth

# Python
**/__pycache__/
**/*.pyc
**/*.pyo
**/*.pyd
.Python
**/*.egg-info/

# Virtual environments
.venv/
venv/
env/

# Data files (don't bake huge datasets into images)
data/
**/*.csv
**/*.parquet
**/*.xlsx

# Jupyter
**/.ipynb_checkpoints/
**/*.ipynb

...

# Tests
tests/
.pytest_cache/
.coverage

...

# Secrets: never let these into an image
**/.env
**/*.pem
**/*.key</code></pre>
</div>
<p>This can greatly reduce the data sent to the Docker daemon before the build begins. For data projects where large Parquet or raw CSV files live in the project folder, this may be the single biggest win of all five practices.</p>
<p>It is also worth paying attention to security. If your project folder contains <code style="background: #F5F5F5;">.env</code> files with API keys or database credentials, forgetting <code style="background: #F5F5F5;">.dockerignore</code> means those secrets can end up baked into your image, especially if you use a broad <code style="background: #F5F5F5;">COPY . .</code> instruction.</p>
<p><strong>Rule of thumb</strong>: always add <code style="background: #F5F5F5;">.env</code> and any credential files to <code style="background: #F5F5F5;">.dockerignore</code>, along with data files that don&#8217;t need to be baked into the image. For sensitive data the container does need, use <strong><a href="https://docs.docker.com/engine/swarm/secrets/" target="_blank" rel="noopener">Docker secrets</a></strong> as well.</p>
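<p>If you want to sanity-check which paths a set of patterns would keep out of the build context, here is a rough Python approximation of the matching rules: patterns are anchored at the context root, <code style="background: #F5F5F5;">*</code> and <code style="background: #F5F5F5;">?</code> never cross a <code style="background: #F5F5F5;">/</code>, <code style="background: #F5F5F5;">**</code> spans any number of directories, and excluding a directory excludes everything inside it. It is a sketch, not Docker&#8217;s implementation: it skips <code style="background: #F5F5F5;">!</code> exception patterns and other edge cases, and the helper names are hypothetical.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>import fnmatch
import os


def _match_parts(pattern_parts, path_parts):
    """Match path segments against pattern segments; '**' spans any number of them."""
    if not pattern_parts:
        return not path_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # '**' may consume zero or more leading path segments
        return any(_match_parts(rest, path_parts[i:]) for i in range(len(path_parts) + 1))
    if not path_parts:
        return False
    # '*' and '?' never cross a '/', because each segment is matched on its own
    return fnmatch.fnmatchcase(path_parts[0], head) and _match_parts(rest, path_parts[1:])


def is_ignored(rel_path, patterns):
    """Approximate check: does rel_path, or any directory above it, match a pattern?"""
    parts = rel_path.split("/")
    for raw in patterns:
        pattern = raw.strip().strip("/")
        if not pattern or pattern.startswith(("#", "!")):
            continue  # comments are skipped; '!' exceptions are not modelled here
        pat_parts = pattern.split("/")
        if any(_match_parts(pat_parts, parts[:i]) for i in range(1, len(parts) + 1)):
            return True
    return False


def context_bytes(root, patterns):
    """Rough size of the build context that would be sent for root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if not is_ignored(rel, patterns):
                total += os.path.getsize(full)
    return total


patterns = ["*.csv", "**/*.parquet", "data/", ".env"]
print(is_ignored("raw.csv", patterns))                # True: root-level match
print(is_ignored("notebooks/raw.csv", patterns))      # False: '*.csv' is anchored at the root
print(is_ignored("notebooks/x.parquet", patterns))    # True: '**/' matches at any depth
print(is_ignored("data/2024/events.json", patterns))  # True: the parent directory 'data' matches</code></pre>
</div>
<p>Running <code style="background: #F5F5F5;">context_bytes(".", patterns)</code> before and after editing your patterns gives a quick, approximate sense of how much the context shrinks.</p>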
</p>
<h2><span># </span>Summary</h2>
<p>None of these techniques require advanced Docker knowledge; they are habits more than tricks. Use them consistently and your images will be smaller, your builds faster, and your deployments cleaner.</p>
</p>
<table style="width: 100%; border-collapse: collapse; font-family: Arial, sans-serif; font-size: 14px; color: #333;">
<thead>
<tr style="background-color: #ffd29a;">
<th style="padding: 12px; border: 1px solid #ddd; text-align: left;">Practice</th>
<th style="padding: 12px; border: 1px solid #ddd; text-align: left;">What it fixes</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">Slim/Alpine base image</td>
<td style="padding: 12px; border: 1px solid #ddd;">
<p>Starts from a smaller image containing only the operating system packages you need.
</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">Layer order</td>
<td style="padding: 12px; border: 1px solid #ddd;">
<p>It avoids reinstalling dependencies every time you change your code.
</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">Multi-stage builds</td>
<td style="padding: 12px; border: 1px solid #ddd;">
<p>Keeps build tools out of the final image.
</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">Same-layer cleanup</td>
<td style="padding: 12px; border: 1px solid #ddd;">
<p>Prevents the apt cache from bloating intermediate layers.
</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;"><code style="background: #F5F5F5;">.dockerignore</code></td>
<td style="padding: 12px; border: 1px solid #ddd;">
<p>Shrinks the build context and keeps secrets out of images.
</td>
</tr>
</tbody>
</table>
<p>Happy coding!</p>
<p><b><a href="https://twitter.com/balawc27" rel="noopener" target="_blank"><strong><a href="https://www.kdnuggets.com/wp-content/uploads/bala-priya-author-image-update-230821.jpg" target="_blank" rel="noopener noreferrer">Bala Priya C</a></strong></a></b> is a software developer and technical writer from India. She likes working at the intersection of mathematics, programming, data analytics and content creation. Her areas of interest and specialization include DevOps, data analytics and natural language processing. She enjoys reading, writing, coding and coffee! She is currently learning and sharing her knowledge with the developer community by writing tutorials, guides, reviews, and more. Bala also creates engaging resource overviews and coding tutorials.</p>
</p></div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/5-docker-best-practices-for-faster-builds-and-smaller-images/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i0.wp.com/www.kdnuggets.com/wp-content/uploads/bala-docker-smaller-images-faster-builds.png?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>New gas-powered data centers could emit more greenhouse gases than entire countries</title>
		<link>https://aisckool.com/up-to-date-gas-powered-data-centers-could-emit-more-greenhouse-gases-than-entire-countries/</link>
					<comments>https://aisckool.com/up-to-date-gas-powered-data-centers-could-emit-more-greenhouse-gases-than-entire-countries/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Wed, 22 Apr 2026 18:03:20 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26296</guid>

					<description><![CDATA[&#8220;[Data center operators’] belief is that the value delivered by servers far outweighs the costs of constantly operating these inefficient power plants,” says Koomey. Gas projects developed as part of Project Stargate, a massive multi-company artificial intelligence effort that was originally launched to build infrastructure for OpenAI, also represent a potential carbon bomb on WIRED&#8217;s [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div>
<p class="paywall">&#8220;[Data center operators’] we believe that the value delivered by servers far outweighs the costs of constantly operating these inefficient power plants,” says Koomey.</p>
<p class="paywall">Gas projects developed as part of Project Stargate, a massive multi-company artificial intelligence effort that was originally launched to build infrastructure for OpenAI, also represent a potential carbon bomb on WIRED&#8217;s list. Stargate campuses are being built in multiple states, including Texas, New Mexico, Ohio and Wisconsin. Permitting documents for just three natural gas projects linked to Stargate &#8211; one to power a data center campus near the project&#8217;s headquarters in Abilene, Texas, and two to power Project Jupiter at the New Mexico campus &#8211; show they have a combined potential to emit more than 24 million tons of greenhouse gases annually.</p>
<p class="paywall">“We are committed to protecting ratepayers while building the infrastructure needed for American AI leadership,” OpenAI spokesman Aaron McLear said in a statement. “Where natural gas is needed to provide reliable energy in the near future, we are working with partners to leverage modern, efficient generation while helping to accelerate clean energy and grid modernization.”</p>
<p class="paywall">Oracle spokeswoman Julia Allyn Fishel told WIRED that a &#8220;modification&#8221; to Project Jupiter is currently underway &#8220;that is expected to significantly reduce greenhouse gas emissions.&#8221; The company did not provide new emissions estimates, which the New Mexico Environment Department has not yet made public.</p>
<p class="paywall">“Oracle is committed to self-funding energy costs by implementing the best energy solutions for every community, so that our AI data centers do not impact ratepayer bills and electric grid reliability,” Fishel said in a statement.</p>
<p class="paywall">The fourth gas plant at Stargate&#8217;s main campus in Abilene has the potential to produce more than 7.8 million tons of carbon dioxide equivalent each year, according to application documents. This power plant is being built by Crusoe for Microsoft&#8217;s use. As the companies <a data-offer-url="https://www.crusoe.ai/resources/newsroom/crusoe-announces-new-900-mw-ai-factory-campus-in-abilene-texas-to-support-microsoft-ai-infrastructure" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://www.crusoe.ai/resources/newsroom/crusoe-announces-new-900-mw-ai-factory-campus-in-abilene-texas-to-support-microsoft-ai-infrastructure&quot;}" href="https://www.crusoe.ai/resources/newsroom/crusoe-announces-new-900-mw-ai-factory-campus-in-abilene-texas-to-support-microsoft-ai-infrastructure" rel="nofollow noopener" target="_blank">announced</a> in late March, Crusoe will construct new buildings on the Abilene campus, including a power plant, to support Microsoft&#8217;s artificial intelligence infrastructure. (Microsoft declined to comment.)</p>
<p class="paywall">There are projects with an even larger potential carbon footprint than Stargate. Outside of Amarillo, Texas, White House darling Fermi is building what it calls the President Donald J. Trump Advanced Energy and Intelligence Campus, a data center campus with a target capacity of 17 gigawatts. Fermi continues to emphasize its use of so-called &#8220;clean&#8221; natural gas. However, the documents show that the maximum emissions of the campus&#8217;s two gas projects combined could exceed 40.3 million tons of CO<sub>2</sub> equivalent per year, more than the annual emissions from all energy sources in the state of Connecticut.</p>
<p class="paywall">About five hours south of Amarillo, near the town of Fort Stockton, Pacifico Energy is building what it <a data-offer-url="https://www.pacificoenergy.com/post/pacifico-energy-secures-7-65-gw-power-generation-permit-for-gw-ranch-project" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://www.pacificoenergy.com/post/pacifico-energy-secures-7-65-gw-power-generation-permit-for-gw-ranch-project&quot;}" href="https://www.pacificoenergy.com/post/pacifico-energy-secures-7-65-gw-power-generation-permit-for-gw-ranch-project" rel="nofollow noopener" target="_blank">claims</a> is the largest single energy project in the country: a 7.2-gigawatt data center campus powered by a gas project that could emit more than 33 million tons of greenhouse gases annually. (Pacifico did not respond to a request for comment.)</p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/up-to-date-gas-powered-data-centers-could-emit-more-greenhouse-gases-than-entire-countries/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i2.wp.com/media.wired.com/photos/69d6c15c510ba3167cd288cf/191:100/w_1280,c_limit/040826-data-center-gas-projects-emissions.jpg?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>Advanced pandas patterns that most data scientists don&#8217;t use</title>
		<link>https://aisckool.com/advanced-pandas-patterns-that-most-data-scientists-dont-operate/</link>
					<comments>https://aisckool.com/advanced-pandas-patterns-that-most-data-scientists-dont-operate/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Wed, 22 Apr 2026 09:02:28 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26280</guid>

					<description><![CDATA[Most data scientists learn pandas by reading tutorials and copying patterns that work. This is a good way to start, but it often leads beginners to develop bad habits. iterrows() loops, intermediate variable assignments, and repeated merge() calls are some examples of code that is technically correct, [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div id="post-">
<p>    <center><br /><span>Photo by the author</span></center></p>
<h2><span># </span>Introduction</h2>
<p>Most data scientists learn <strong><a href="https://pandas.pydata.org/" target="_blank" rel="noopener">pandas</a></strong> by reading tutorials and copying patterns that work.</p>
<p>This is a good way to start, but it often leads beginners to develop bad habits. <code style="background: #F5F5F5;">iterrows()</code> loops, intermediate variable assignments, and repeated <code style="background: #F5F5F5;">merge()</code> calls are some examples of code that is technically correct but slower than necessary and harder to read than it should be.</p>
<p>The following patterns are not edge cases. They cover the most common everyday operations in data science, such as filtering, transforming, joining, grouping, and calculating conditional columns.</p>
<p>In each of them, there is a common approach and a better approach, and the distinction is usually one of awareness, not complexity.</p>
<p>These six have the greatest impact: method chaining, the <code style="background: #F5F5F5;">pipe()</code> pattern, efficient joins and merges, groupby optimizations, vectorized conditional logic, and performance pitfalls.</p>
<p><img decoding="async" alt="Advanced pandas patterns" width="100%" class="perfmatters-lazy" src="https://www.kdnuggets.com/wp-content/uploads/Rosidi_Advanced-Pandas-Patterns-3.png"></p>
</p>
<h2><span># </span>Method chaining</h2>
<p>Intermediate variables can make your code look more organized, but they often just add noise. <strong><a href="https://www.geeksforgeeks.org/python/method-chaining-in-python/" target="_blank" rel="noopener">Method chaining</a></strong> lets you write a sequence of transformations as a single expression that reads naturally and avoids naming objects that don&#8217;t need their own identifiers.</p>
<p>Instead of this:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>df1 = df[df['status'] == 'active']
df2 = df1.dropna(subset=['revenue'])
df3 = df2.assign(revenue_k=df2['revenue'] / 1000)
result = df3.sort_values('revenue_k', ascending=False)</code></pre>
</div>
<p>Write this:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>result = (
    df
    .query("status == 'active'")
    .dropna(subset=['revenue'])
    .assign(revenue_k=lambda x: x['revenue'] / 1000)
    .sort_values('revenue_k', ascending=False)
)</code></pre>
</div>
<p>The lambda in <code style="background: #F5F5F5;">assign()</code> matters here.</p>
<p>Inside a chain, the intermediate <code>DataFrame</code> has no name, so you need a lambda to refer to it: pandas calls the lambda with the <code>DataFrame</code> as it exists at that step. Forgetting the lambda is the most common cause of broken chains, and it usually results in a <code style="background: #F5F5F5;">NameError</code> or a stale reference to a variable that was defined earlier in the script.</p>
<p>Another mistake to watch for is using <code style="background: #F5F5F5;">inplace=True</code> inside a chain. Methods called with <code style="background: #F5F5F5;">inplace=True</code> return <code>None</code>, which breaks the chain immediately. When writing chained code, avoid in-place operations: they generally provide no memory benefit and make the code harder to follow.</p>
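<p>Here is the chained version run end to end on a small, hypothetical <code>DataFrame</code>, along with a quick check of the <code style="background: #F5F5F5;">inplace=True</code> pitfall. The column names follow the example above; the data is made up for illustration.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "status": ["active", "inactive", "active", "active"],
    "revenue": [1200.0, 800.0, None, 4500.0],
})

result = (
    df
    .query("status == 'active'")
    .dropna(subset=["revenue"])
    # x is the filtered, NaN-free frame at this point in the chain,
    # not the original df
    .assign(revenue_k=lambda x: x["revenue"] / 1000)
    .sort_values("revenue_k", ascending=False)
)
print(result)

# With inplace=True the method mutates its object and returns None,
# so there is nothing left to chain the next call onto
print(df.copy().dropna(subset=["revenue"], inplace=True))  # None</code></pre>
</div>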
</p>
<h2><span># </span>The pipe() pattern</h2>
<p>When one of your transformations is complex enough to deserve its own function, <strong><a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pipe.html" target="_blank" rel="noopener">pipe()</a></strong> lets you keep it inside the chain.</p>
<p><code style="background: #F5F5F5;">pipe()</code> passes the <code>DataFrame</code> as the first argument to any callable, followed by any extra arguments you supply:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>def normalize_columns(df, cols):
    # Work on a copy so the caller's DataFrame is never modified in place
    df = df.copy()
    df[cols] = (df[cols] - df[cols].mean()) / df[cols].std()
    return df

result = (
    df
    .query("status == 'active'")
    .pipe(normalize_columns, cols=['revenue', 'sessions'])
    .sort_values('revenue', ascending=False)
)</code></pre>
</div>
<p>This lets you keep complex transformation logic in a named, testable function while preserving the chain. Each piped function can be tested on its own, which is much harder when the logic is buried in a long chain.</p>
<p>The practical value of <code style="background: #F5F5F5;">pipe()</code> goes beyond appearance. Splitting a processing pipeline into well-named functions and combining them with <code style="background: #F5F5F5;">pipe()</code> makes the code self-documenting: anyone reading the sequence can understand each step from the function name without having to read the implementation.</p>
<p>It also makes it easier to swap or skip steps when debugging: if you comment out one <code style="background: #F5F5F5;">pipe()</code> call, the rest of the chain will continue to function smoothly.</p>
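<p>As a sketch of how several named steps compose, the example below chains two hypothetical helper functions with <code style="background: #F5F5F5;">pipe()</code>. Extra positional and keyword arguments passed to <code style="background: #F5F5F5;">pipe()</code> are forwarded to the function after the <code>DataFrame</code>, and because each helper returns a new frame, the original <code>df</code> is left untouched.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>import pandas as pd


def add_revenue_per_session(df, revenue_col="revenue", sessions_col="sessions"):
    """Hypothetical step: add a ratio column and return a new DataFrame."""
    return df.assign(revenue_per_session=df[revenue_col] / df[sessions_col])


def top_n(df, n, by):
    """Hypothetical step: keep the n rows with the largest values in `by`."""
    return df.nlargest(n, by)


df = pd.DataFrame({
    "status": ["active", "active", "inactive", "active"],
    "revenue": [100.0, 300.0, 50.0, 240.0],
    "sessions": [10, 20, 5, 8],
})

result = (
    df
    .query("status == 'active'")
    .pipe(add_revenue_per_session)               # df is passed as the first argument
    .pipe(top_n, n=2, by="revenue_per_session")  # extra keyword args are forwarded
)
print(result)

# Each step is an ordinary function, so it can be tested on its own
assert top_n(df, 1, "revenue").index[0] == 1</code></pre>
</div>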
</p>
<h2><span># </span>Efficient joins and merges</h2>
<p>One of the most overused functions in pandas is <strong><a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html" target="_blank" rel="noopener">merge()</a></strong>. The two most common mistakes are unintended many-to-many joins and silent row inflation.</p>
<p>If both DataFrames contain duplicate values in the join key, <code style="background: #F5F5F5;">merge()</code> produces the Cartesian product of the matching rows. For example, if <code style="background: #F5F5F5;">user_id</code> is duplicated in both a 500-row &#8220;users&#8221; table and an &#8220;events&#8221; table, every duplicate pairing becomes its own output row, and the result could grow to millions of rows.</p>
<p>This raises no error; it simply produces a <code>DataFrame</code> that looks correct but is larger than expected, which you only notice when you check its shape.</p>
<p>The fix is the <code style="background: #F5F5F5;">validate</code> parameter:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>df.merge(other, on='user_id', validate="many_to_one")</code></pre>
</div>
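<p>It raises a <code style="background: #F5F5F5;">MergeError</code> immediately if the many-to-one assumption is violated, that is, if <code style="background: #F5F5F5;">user_id</code> is not unique in <code style="background: #F5F5F5;">other</code>. Use &#8220;one_to_one&#8221;, &#8220;one_to_many&#8221;, or &#8220;many_to_one&#8221; depending on the relationship you expect between the two tables.</p>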
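<p>To see <code style="background: #F5F5F5;">validate</code> in action, the sketch below uses two small, hypothetical tables in which one <code style="background: #F5F5F5;">user_id</code> is accidentally duplicated in the lookup table. Without <code style="background: #F5F5F5;">validate</code> the merge quietly gains a row; with it, pandas raises <code style="background: #F5F5F5;">pandas.errors.MergeError</code>.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>import pandas as pd

events = pd.DataFrame({"user_id": [1, 1, 2], "event": ["login", "click", "login"]})
# Hypothetical lookup table in which user 2 appears twice by mistake
users = pd.DataFrame({"user_id": [1, 2, 2], "plan": ["free", "pro", "team"]})

# Without validate, the duplicate key silently adds a row: 3 events become 4 rows
unchecked = events.merge(users, on="user_id")
print(len(unchecked))  # 4

# With validate, the broken many-to-one assumption fails loudly
try:
    events.merge(users, on="user_id", validate="many_to_one")
except pd.errors.MergeError as exc:
    print(f"Merge rejected: {exc}")</code></pre>
</div>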
<p>The <code style="background: #F5F5F5;">indicator=True</code> parameter is equally useful for debugging:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>result = df.merge(other, on='user_id', how='left', indicator=True)
result['_merge'].value_counts()</code></pre>
</div>
<p>This adds a <code style="background: #F5F5F5;">_merge</code> column showing whether each row came from the left DataFrame only (&#8216;left_only&#8217;), the right DataFrame only (&#8216;right_only&#8217;), or both (&#8216;both&#8217;). With <code style="background: #F5F5F5;">how='left'</code>, no row can be &#8216;right_only&#8217;. This is the fastest way to catch rows that didn&#8217;t match when you expected them to.</p>
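<p>A small, hypothetical example with <code style="background: #F5F5F5;">how='outer'</code>, chosen so that all three labels can appear, shows how <code style="background: #F5F5F5;">_merge</code> makes unmatched keys on either side easy to pull out:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>import pandas as pd

users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"user_id": [1, 1, 4], "amount": [20, 35, 10]})

# how="outer" keeps unmatched rows from both sides, so all three labels can appear
result = users.merge(orders, on="user_id", how="outer", indicator=True)
print(result["_merge"].value_counts())

# Users with no orders, and orders whose user_id is missing from users
no_orders = result.loc[result["_merge"] == "left_only", "user_id"].tolist()
orphan_orders = result.loc[result["_merge"] == "right_only", "user_id"].tolist()
print(no_orders, orphan_orders)  # [2, 3] [4]</code></pre>
</div>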
<p>When both DataFrames share a common index, <strong><a href="https://www.stratascratch.com/blog/types-of-pandas-joins-and-how-to-use-them-in-python?utm_source=blog&#038;utm_medium=click&#038;utm_campaign=kdn+advanced+pandas+patterns" target="_blank" rel="noopener">join()</a></strong> is a more concise alternative to <code style="background: #F5F5F5;">merge()</code> and can be faster, because it aligns rows on the index instead of matching values in a column.</p>
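<p>As a minimal sketch with hypothetical data, here are the <code style="background: #F5F5F5;">join()</code> and <code style="background: #F5F5F5;">merge()</code> spellings of the same index-aligned left join. <code style="background: #F5F5F5;">join()</code> uses the index by default, while <code style="background: #F5F5F5;">merge()</code> has to be told to with <code style="background: #F5F5F5;">left_index=True</code> and <code style="background: #F5F5F5;">right_index=True</code>:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>import pandas as pd

# Two frames that already share a user_id index (hypothetical data)
revenue = pd.DataFrame({"revenue": [100, 250]}, index=pd.Index([1, 2], name="user_id"))
plans = pd.DataFrame({"plan": ["pro", "free"]}, index=pd.Index([2, 1], name="user_id"))

# join() aligns on the index and defaults to a left join
joined = revenue.join(plans)

# The equivalent merge() call has to be told to use both indexes
merged = revenue.merge(plans, left_index=True, right_index=True, how="left")

print(joined)
print(joined.equals(merged))</code></pre>
</div>
<p>Note that the rows of <code style="background: #F5F5F5;">plans</code> are in a different order; alignment happens by index label, not by position. Whether <code style="background: #F5F5F5;">join()</code> is measurably faster depends on your data, so time both on your own frames if it matters.</p>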
</p>
<h2><span># </span>Group optimizations</h2>
<p>When using a <code>GroupBy</code>, one of the most underused methods is <code style="background: #F5F5F5;">transform()</code>. The difference between <code style="background: #F5F5F5;">agg()</code> and <code style="background: #F5F5F5;">transform()</code> comes down to the shape you want back.</p>
<p>The <strong><a href="https://www.stratascratch.com/blog/advanced-pandas-aggregations-for-data-analysts-and-scientists?utm_source=blog&#038;utm_medium=click&#038;utm_campaign=kdn+advanced+pandas+patterns" target="_blank" rel="noopener">agg() method</a></strong> returns one row per group. <code style="background: #F5F5F5;">transform()</code>, on the other hand, returns the same shape as the original <code>DataFrame</code>, with each row filled with the aggregated value of its group. This makes it ideal for adding group-level statistics as new columns without having to merge them back later. It is also faster than the manual aggregate-then-merge approach, because pandas doesn&#8217;t have to align two data frames after the fact:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>df['avg_revenue_by_segment'] = df.groupby('segment')['revenue'].transform('mean')</code></pre>
</div>
<p>This adds the average revenue of each row&#8217;s segment directly to that row. Getting the same result with <code style="background: #F5F5F5;">agg()</code> would require computing the averages and then merging them back on the segment key: two steps instead of one.</p>
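<p>The sketch below uses hypothetical toy data and builds the same column both ways, so the one-step and two-step approaches can be compared side by side.</p>

```python
import pandas as pd

# Hypothetical data: two segments with different average revenue
df = pd.DataFrame({
    "segment": ["a", "a", "b", "b", "b"],
    "revenue": [10.0, 30.0, 5.0, 10.0, 15.0],
})

# One step: transform() broadcasts each group's mean back onto its own rows
df["avg_revenue_by_segment"] = df.groupby("segment")["revenue"].transform("mean")

# Two steps: aggregate to one row per segment, then merge back on the key
seg_means = df.groupby("segment")["revenue"].mean().rename("avg_via_merge").reset_index()
two_step = df.merge(seg_means, on="segment", how="left")

print(two_step)
```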
<p>When grouping by categorical columns, always use <code style="background: #F5F5F5;">observed=True</code>:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>df.groupby('segment', observed=True)['revenue'].sum()</code></pre>
</div>
<p>In pandas versions where the default is still <code style="background: #F5F5F5;">observed=False</code>, omitting this argument makes pandas compute results for every category defined in the column&#8217;s dtype, including combinations that never appear in the actual data. For large data frames with many categories, this produces empty groups and unnecessary computation.</p>
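<p>The minimal sketch below uses a hypothetical categorical column that declares four categories but contains only two. <code>observed</code> is passed explicitly in both calls, so the result does not depend on your pandas version&#8217;s default.</p>

```python
import pandas as pd

# Hypothetical data: categories 'c' and 'd' are declared but never occur
df = pd.DataFrame({
    "segment": pd.Categorical(["a", "b", "a"], categories=["a", "b", "c", "d"]),
    "revenue": [1.0, 2.0, 3.0],
})

# observed=False: one output row per declared category, empty ones included
all_cats = df.groupby("segment", observed=False)["revenue"].sum()

# observed=True: only the categories that actually appear in the data
seen_only = df.groupby("segment", observed=True)["revenue"].sum()

print(len(all_cats), len(seen_only))  # 4 2
```

<p>With several categorical grouping keys, the unobserved output expands to every combination of their categories. That is where the wasted work adds up.</p>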
</p>
<h2><span># </span>Vectorized conditional logic</h2>
<p>Using <code style="background: #F5F5F5;">apply()</code> with a per-row <strong><a href="https://www.stratascratch.com/blog/how-to-use-python-lambda-functions?utm_source=blog&#038;utm_medium=click&#038;utm_campaign=kdn+advanced+pandas+patterns" target="_blank" rel="noopener">lambda function</a></strong> is the least efficient way to calculate conditional values. Because it runs a Python function separately on each row, it bypasses the C-level operations that make pandas fast.</p>
<p>For binary conditions, <strong><a href="https://numpy.org/" target="_blank" rel="noopener">NumPy</a></strong>&#8217;s <code style="background: #F5F5F5;">np.where()</code> is a direct replacement:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>df['label'] = np.where(df['revenue'] > 1000, 'high', 'low')</code></pre>
</div>
<p>For multiple conditions, <code style="background: #F5F5F5;">np.select()</code> handles them cleanly:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>conditions = [
    df['revenue'] > 10000,
    df['revenue'] > 1000,
    df['revenue'] > 100,
]
choices = ['enterprise', 'mid-market', 'small']
df['segment'] = np.select(conditions, choices, default="micro")</code></pre>
</div>
<p>The <code style="background: #F5F5F5;">np.select()</code> function maps an if/elif/else structure directly onto vectorized operations. It evaluates the conditions in order and assigns the choice for the first one that matches. On a <code>DataFrame</code> with a million rows, this is typically 50 to 100 times faster than the equivalent <code style="background: #F5F5F5;">apply()</code>.</p>
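<p>The check below uses made-up revenue values to confirm that the vectorized version matches a plain if/elif/else function applied element by element. Note that 50,000 satisfies all three conditions but is labelled &#8220;enterprise&#8221;, because the first matching condition wins.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical revenue values, one per tier
df = pd.DataFrame({"revenue": [50, 500, 5_000, 50_000]})

conditions = [
    df["revenue"] > 10000,
    df["revenue"] > 1000,
    df["revenue"] > 100,
]
choices = ["enterprise", "mid-market", "small"]
vectorized = np.select(conditions, choices, default="micro")


# Row-by-row equivalent with the same ordering of checks, for comparison only
def label(rev):
    if rev > 10000:
        return "enterprise"
    elif rev > 1000:
        return "mid-market"
    elif rev > 100:
        return "small"
    return "micro"


looped = df["revenue"].apply(label)
print(vectorized.tolist())  # ['micro', 'small', 'mid-market', 'enterprise']
```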
<p>For numeric binning, you can replace conditional assignment entirely with <code style="background: #F5F5F5;">pd.cut()</code> (fixed-width bins or custom bin edges) and <code style="background: #F5F5F5;">pd.qcut()</code> (quantile-based bins). Both return a categorical column directly, with no NumPy calls needed. When you pass the number of bins or the bin edges, pandas handles everything else, including labels and boundary values.</p>
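<p>Here is a minimal sketch of both functions on hypothetical revenue values. The explicit edges for <code>pd.cut()</code> are chosen to reproduce the tiers from the <code>np.select()</code> example above. Because <code>pd.cut()</code> uses right-closed intervals by default, a value of exactly 100 would land in &#8220;micro&#8221;, matching those conditions.</p>

```python
import pandas as pd

# Hypothetical revenue values
revenue = pd.Series([50, 500, 5_000, 50_000, 120, 900])

# Custom edges: intervals are (a, b], so the tiers match the "> threshold" checks
tiers = pd.cut(
    revenue,
    bins=[-float("inf"), 100, 1000, 10000, float("inf")],
    labels=["micro", "small", "mid-market", "enterprise"],
)

# Quantile-based bins: q=2 splits at the median into two equally sized halves
halves = pd.qcut(revenue, q=2, labels=["lower", "upper"])

print(tiers.tolist())
print(halves.tolist())
```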
</p>
<h2><span># </span>Performance pitfalls</h2>
<p>A few common patterns slow down pandas code more than anything else.</p>
<p>For example, <strong><a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html" target="_blank" rel="noopener">iterrows()</a></strong> iterates over <code>DataFrame</code> rows as (index, <code>Series</code>) pairs. This approach is intuitive but slow: on a <code>DataFrame</code> with 100,000 rows, it can be 100 times slower than its vectorized counterpart.</p>
<p>The inefficiency comes from building a full <code>Series</code> object for every row and running Python code on each one in turn. As soon as you catch yourself writing <code style="background: #F5F5F5;">for _, row in df.iterrows()</code>, stop and ask whether <code style="background: #F5F5F5;">np.where()</code>, <code style="background: #F5F5F5;">np.select()</code>, or a grouping operation could replace it. In most cases, one of them can.</p>
<p>Using <code style="background: #F5F5F5;">apply(axis=1)</code> is faster than <code style="background: #F5F5F5;">iterrows()</code>, but it has the same problem: Python-level execution for every row. For any operation that can be expressed with NumPy or pandas built-in functions, the built-in version will almost always be faster.</p>
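<p>The sketch below uses a small hypothetical frame and writes the same high/low label twice: once with an <code>iterrows()</code> loop and once with <code>np.where()</code>. The results are identical, but only the vectorized version avoids per-row Python work. The 1,000 threshold is an arbitrary example value.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical data
df = pd.DataFrame({"revenue": [250, 1500, 800, 3000]})

# Slow pattern: a Series is built for every row, then Python code runs on it
labels = []
for _, row in df.iterrows():
    labels.append("high" if row["revenue"] > 1000 else "low")
df["label_loop"] = labels

# Vectorized replacement: one comparison over the whole column
df["label_vec"] = np.where(df["revenue"] > 1000, "high", "low")

print(df)
```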
<p>Object dtype columns are another easy-to-miss source of slowness. When pandas stores strings with the object dtype, operations on those columns run in Python rather than C. For low-cardinality columns such as status codes, region names, or categories, converting them to the categorical dtype can significantly speed up grouping and <code style="background: #F5F5F5;">value_counts()</code>.</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>df['status'] = df['status'].astype('category')</code></pre>
</div>
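<p><code>memory_usage(deep=True)</code> gives a rough way to see the difference. The sketch below assumes a hypothetical 300,000-row status column with three distinct values. The column is created explicitly with the object dtype described above, since newer pandas versions may infer a dedicated string dtype instead.</p>

```python
import pandas as pd

# Hypothetical low-cardinality column: 300,000 rows, 3 distinct values
status = pd.Series(["active", "churned", "trial"] * 100_000, dtype="object")
status_cat = status.astype("category")

# deep=True counts the Python string objects, not just the pointers to them
as_object = status.memory_usage(deep=True)
as_category = status_cat.memory_usage(deep=True)
print(as_object, as_category)

# value_counts() returns the same counts for either dtype
counts_match = status.value_counts().to_dict() == status_cat.value_counts().to_dict()
```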
<p>Finally, avoid chained assignment. Writing <code style="background: #F5F5F5;">df[df['revenue'] > 0]['label'] = 'positive'</code> may or may not modify the original <code>DataFrame</code>, depending on whether pandas created a copy behind the scenes. The behavior is unpredictable. Use <code style="background: #F5F5F5;">.loc</code> with a boolean mask instead:</p>
<div style="width: 98%; overflow: auto; padding-left: 10px; padding-bottom: 10px; padding-top: 10px; background: #F5F5F5;">
<pre><code>df.loc[df['revenue'] > 0, 'label'] = 'positive'</code></pre>
</div>
<p>It is unambiguous and does not trigger a <code style="background: #F5F5F5;">SettingWithCopyWarning</code>.</p>
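<p>Here is a minimal sketch of the <code>.loc</code> pattern on a hypothetical frame. The row mask and the target column are passed in a single indexing call, so the write goes to <code>df</code> itself.</p>

```python
import pandas as pd

# Hypothetical data with a default label for every row
df = pd.DataFrame({"revenue": [120, -30, 0, 45]})
df["label"] = "non-positive"

# Mask and column in one .loc call: no intermediate copy is involved
df.loc[df["revenue"] > 0, "label"] = "positive"

print(df)
```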
</p>
<h2><span># </span>Conclusion</h2>
<p>These patterns separate code that works from code that works well: efficient enough to run on real data, readable enough to maintain, and organized in a way that makes it easy to test.</p>
<p>Method chaining and <code style="background: #F5F5F5;">pipe()</code> address readability, while the join and grouping patterns address correctness and efficiency. Vectorized conditional logic and the performance pitfalls section address speed.</p>
<p><img decoding="async" alt="Advanced pandas patterns" width="100%" class="perfmatters-lazy" src="https://www.kdnuggets.com/wp-content/uploads/Rosidi_Advanced-Pandas-Patterns-4.png"> </p>
<p>Most of the pandas code we review has at least two or three of these problems. They build up silently: a slow loop here, an unvalidated join there, an object dtype column that no one noticed. None of them cause obvious failures, so they persist. A smart way to start is to fix them one at a time.</p>
<p><a href="https://twitter.com/StrataScratch" rel="noopener" target="_blank"><b><strong><a href="https://twitter.com/StrataScratch" target="_blank" rel="noopener noreferrer">Nate Rosidi</a></strong></b></a>    is a data scientist and product strategist. He is also an adjunct professor of analytics and the founder of StrataScratch, a platform that helps data scientists prepare for job interviews using real interview questions from top companies. Nate writes about the latest career trends, gives interview advice, shares data science projects, and discusses all things SQL.</p>
</p></div>
<p><script async src="//platform.twitter.com/widgets.js" charset="utf-8"></script></p>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/advanced-pandas-patterns-that-most-data-scientists-dont-operate/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i3.wp.com/www.kdnuggets.com/wp-content/uploads/Rosidi_Advanced-Pandas-Patterns-1.png?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>How to watch the Lyrids meteor shower at its peak</title>
		<link>https://aisckool.com/how-to-watch-the-lyrids-meteor-shower-at-its-peak/</link>
					<comments>https://aisckool.com/how-to-watch-the-lyrids-meteor-shower-at-its-peak/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Tue, 21 Apr 2026 23:58:59 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26276</guid>

					<description><![CDATA[In mid-April, astronomy enthusiasts will be able to enjoy one of the classic celestial spectacles. The meteor shower known as the Lyrids will light up the sky, especially in the Northern Hemisphere, and anyone will be able to see it with the naked eye, weather permitting &#8211; if they know where to look. Lyrids began [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div>
<p><span class="lead-in-text-callout">In mid-April, astronomy</span> enthusiasts will be able to enjoy one of the classic celestial spectacles. The meteor shower known as the Lyrids will light up the sky, especially in the Northern Hemisphere, and anyone will be able to see it with the naked eye, weather permitting &#8211; if they know where to look.</p>
<p class="paywall">The Lyrids began to appear on April 14, but their activity peaks on the night of April 21 and the early morning of April 22, <a href="https://science.nasa.gov/solar-system/whats-up-april-2026-skywatching-tips-from-nasa/" class="text link" target="_blank" rel="noopener">according to NASA</a>. During these hours, the shower can produce 15 to 20 meteors per hour under dark skies.</p>
<p class="paywall">The shower gets its name because the meteors appear to emerge from the constellation Lyra. Locating this radiant point is easy if you use a stargazing app: just find Vega, the fifth-brightest star in the night sky, behind only Sirius, Canopus, Alpha Centauri A, and Arcturus. Once you have located it, look around. Because of perspective, the bright trails of the Lyrids will appear to radiate from this point. Remember that the human eye takes 20 to 30 minutes to adjust to the dark.</p>
<p class="paywall">During the peak, the Moon will be in an early crescent phase, so its light will cause very little interference. Under a dark sky, meteors should stand out easily. The shower is usually visible from 10 p.m. until dawn, although the early morning hours offer the best conditions. It&#8217;s best to stay away from light pollution and, if possible, to observe from a high altitude. A trip to the mountains works well.</p>
<p class="paywall">Each meteor shower has a different origin. In April, the Earth passes through a cloud of debris left by comet C/1861 G1 (Thatcher) along its orbit around the Sun. This comet, discovered in 1861, takes approximately 415 years to complete its orbit. Grains of ice and rock that it released centuries ago enter the atmosphere at high speed and produce the streaks of light we know as the Lyrids.</p>
<p class="paywall">After the Lyrids, the calendar still offers several spectacles for those who follow the night sky. The Eta Aquarids, produced by debris from Halley&#8217;s Comet, will arrive in May. The Perseids will appear in August, the Orionids will return in October, and the year will end with the Leonids in November and the Geminids in December. The Geminids are considered the most intense and reliable shower of the year.</p>
<p class="paywall"><em>This story originally appeared on <a href="https://es.wired.com/articulos/como-y-cuando-ver-las-liridas-la-espectacular-lluvia-de-estrellas-de-abril-de-2026" class="text link" target="_blank" rel="noopener">WIRED in Spanish</a> and was translated from Spanish.</em></p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/how-to-watch-the-lyrids-meteor-shower-at-its-peak/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i3.wp.com/media.wired.com/photos/69e6859a62260009d0021b0e/191:100/w_1280,c_limit/GettyImages-2210802594.jpg?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>The US government will ask data centers how much energy they use</title>
		<link>https://aisckool.com/the-us-government-will-ask-data-centers-how-much-energy-they-employ/</link>
					<comments>https://aisckool.com/the-us-government-will-ask-data-centers-how-much-energy-they-employ/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Tue, 21 Apr 2026 05:53:48 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26256</guid>

					<description><![CDATA[The US federal government&#8217;s central energy information agency plans to implement a mandatory, nationwide survey of data centers focused on their energy consumption, according to a letter obtained by WIRED. This survey would be the first attempt of its kind to collect basic information about data centers. The letter was sent to Senators Elizabeth Warren [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div>
<p><span class="lead-in-text-callout">The US federal</span> government&#8217;s central energy information agency plans to implement a mandatory, nationwide survey of data centers focused on their energy consumption, according to a letter obtained by WIRED. This survey would be the first attempt of its kind to collect basic information about data centers.</p>
<p class="paywall">The letter was sent to Senators Elizabeth Warren and Josh Hawley on April 9 by the head of the Energy Information Administration, Tristan Abbey, and responds to an earlier inquiry from senators about EIA&#8217;s plans to obtain more information about data centers. WIRED reported on Hawley and Warren&#8217;s letter last month.</p>
<p class="paywall">“Americans deserve to know how much energy data centers consume and what impact it has on their utility bills,” Warren told WIRED in a statement. &#8220;EIA&#8217;s mandatory survey is an important first step toward holding data centers accountable, but people are hurting right now. I&#8217;m pushing EIA to collect and share this data as quickly as possible.&#8221;</p>
<p class="paywall">EIA told WIRED it had no details to provide beyond those included in the letter to senators.</p>
<p class="paywall">The explosion of data centers across the United States has sparked a wave of public concern and proposed legislation to limit their use of resources and impose moratoriums on their construction. However, surprisingly little official data on the industry has been collected.</p>
<p class="paywall">Most details about data center energy use &#8211; a particular concern for many voters in the face of rising utility bills &#8211; are considered proprietary business information and are not usually made public. In response to the Trump administration&#8217;s push to protect ratepayers, many data center developers are now starting to build their own power sources, known as behind-the-meter power. These facilities – many of which are gas-powered – are raising new concerns about air pollution and climate change. (On Tuesday, the NAACP <a data-offer-url="https://www.theguardian.com/technology/2026/apr/14/naacp-lawsuit-elon-musk-xai-memphis" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://www.theguardian.com/technology/2026/apr/14/naacp-lawsuit-elon-musk-xai-memphis&quot;}" href="https://www.theguardian.com/technology/2026/apr/14/naacp-lawsuit-elon-musk-xai-memphis" rel="nofollow noopener" target="_blank">filed a lawsuit</a> against xAI alleging that it ran gas turbines behind the meter at a data center in Mississippi without a permit and polluted the surrounding community. xAI did not immediately respond to a request for comment.)</p>
<p class="paywall">The EIA conducts mandatory surveys of various types of energy producers, including oil and gas producers, electricity generators, and renewable energy sources, as well as their industrial customers. In late March, the day before the senators sent their letter, the EIA announced that it would conduct a pilot survey in three areas of the country experiencing intense data center development: Texas, Washington state, and the northern Virginia/DC metropolitan area.</p>
<p class="paywall">In the April 9 letter, Abbey says the agency will announce a second tranche of pilot surveys &#8220;spanning at least three additional states.&#8221; Both pilots will be completed by the end of September. Abbey writes that these two pilots represent &#8220;a necessary step in the methodological development of a nationwide mandatory study.&#8221;</p>
<p class="paywall">According to the letter, the information EIA collects from data centers under these pilot programs includes not only annual electricity consumption but also behind-the-meter power generation. The surveys, Abbey writes, will also include questions about how different types of data centers are classified; cooling systems; facility characteristics, such as floor area; and IT specifications, including data center energy efficiency metrics.</p>
<p class="paywall">The letter still leaves many unanswered questions about the pilot structure.</p>
<p class="paywall">According to the letter, the pilot program will not ask each respondent about the full set of metrics, but will instead tailor the questions &#8220;to the specific location of each data center facility.&#8221; The <a data-offer-url="https://www.eia.gov/pressroom/releases/press585.php" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://www.eia.gov/pressroom/releases/press585.php&quot;}" href="https://www.eia.gov/pressroom/releases/press585.php" rel="nofollow noopener" target="_blank">current pilot</a> also asks 196 companies identified in the three regions to choose only one location for which they will report metrics. EIA did not respond to questions about how it determined which locations should receive which questions, or whether it had set any requirements for how respondents choose which data center locations to report on.</p>
<p class="paywall">The EIA also did not respond to WIRED&#8217;s questions about when the second round of pilot surveys is scheduled to begin, which states will be included, or the possible timing of a mandatory nationwide survey.</p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/the-us-government-will-ask-data-centers-how-much-energy-they-employ/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i0.wp.com/media.wired.com/photos/69debc4d4d898774bc720bdd/191:100/w_1280,c_limit/GettyImages-2249152852.jpg?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>5 Free Ways to Host Python Applications</title>
		<link>https://aisckool.com/5-free-ways-to-host-python-applications/</link>
					<comments>https://aisckool.com/5-free-ways-to-host-python-applications/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Mon, 20 Apr 2026 20:52:43 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26240</guid>

					<description><![CDATA[Photo by the author # Introduction So you&#8217;re a student or someone just starting to learn the operational side of app development. You&#8217;ve already taken the first step by developing and testing your app locally. Now you want to deploy it in the cloud so you can access it from anywhere. The problem is that [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div id="post-">
<p>    <center><br /><span>Photo by the author</span></center></p>
<h2><span># </span>Introduction</h2>
<p>So you&#8217;re a student or someone just starting to learn the operational side of app development. You&#8217;ve already taken the first step by developing and testing your app locally. Now you want to deploy it in the cloud so you can access it from anywhere. The problem is that cloud hosting can seem complicated and costly when you&#8217;re just starting out.</p>
<p>In this article, we&#8217;ll look at some of the simplest free platforms that let you host a Python web app or application programming interface (API) without paying upfront. While these services have limited processing power, they are usually more than enough for your first toy project, a personal demo, or experimenting with deployment, monitoring, and basic application management.</p>
</p>
<h2><span># </span>1. Share AI apps with Hugging Face Spaces</h2>
<p><strong><a href="https://huggingface.co/spaces" target="_blank" rel="noopener">Hugging Face Spaces</a></strong> is one of my favorite options for hosting Python applications, especially if you work on AI-related projects. It is very beginner-friendly and makes deployment less intimidating. You can launch a <strong><a href="https://www.gradio.app/" target="_blank" rel="noopener">Gradio</a></strong> application simply by uploading files, pushing <strong><a href="https://github.com/" target="_blank" rel="noopener">Git</a></strong> commits, or using the Hugging Face Command Line Interface (CLI).</p>
<p></p>
<p><center><img decoding="async" alt="5 Free Ways to Host Python Applications" width="100%" class="perfmatters-lazy" src="https://www.kdnuggets.com/wp-content/uploads/awan_5_free_ways_host_python_application_3.png"></center></p>
<p>This is particularly useful for machine learning and large language model (LLM) projects, but Spaces also supports <strong><a href="https://streamlit.io/" target="_blank" rel="noopener">Streamlit</a></strong> and Docker-based applications. This gives you some flexibility, depending on how simple or custom your application is.</p>
<p>The default free hardware in Hugging Face Spaces provides 2 CPU cores, 16 GB of RAM, and 50 GB of non-persistent disk space, which is more than enough for many demos, prototypes, class projects, and small experiments.</p>
<p>Note that Spaces on the basic free CPU tier automatically go to sleep after approximately 48 hours of inactivity, but they restart when someone revisits the app.</p>
</p>
<h2><span># </span>2. Deploy data applications with Streamlit Community Cloud</h2>
<p><strong><a href="https://share.streamlit.io/" target="_blank" rel="noopener">Streamlit Community Cloud</a></strong> was one of the first platforms I used when learning to deploy web applications in Python. Along with <strong><a href="https://www.heroku.com/" target="_blank" rel="noopener">Heroku</a></strong>, it made the whole process much easier to understand. It is a great starting point for beginners, because you can go from a local project to a working application without much setup.</p>
<p></p>
<p><center><img decoding="async" alt="5 Free Ways to Host Python Applications" width="100%" class="perfmatters-lazy" src="https://www.kdnuggets.com/wp-content/uploads/awan_5_free_ways_host_python_application_2.png"></center></p>
<p>While many people still think of Streamlit as just a dashboard tool, it has become a versatile way to build data applications, internal tools, and lightweight, interactive web applications in Python. The deployment experience is one of its biggest advantages: your <strong><a href="https://github.com/" target="_blank" rel="noopener">GitHub</a></strong> repository acts as the source of truth, and pushes to the repository are automatically reflected in the app.</p>
<p>According to Streamlit, all Community Cloud users on the free tier share the same pool of resources, with approximate limits of 0.078 to 2 CPU cores, 690 MB to 2.7 GB of memory, and up to 50 GB of storage. It&#8217;s also worth knowing that apps that are idle for 12 hours go to sleep, but they can be woken up again when someone visits them.</p>
</p>
<h2><span># </span>3. Deploy backend APIs with Render</h2>
<p><strong><a href="https://render.com/" target="_blank" rel="noopener">Render</a></strong> is a more complete hosting platform that lets you deploy all kinds of web applications, including Python, Node.js, Ruby on Rails, and Docker-based services. It is a strong option if you want to host a <strong><a href="https://flask.palletsprojects.com/" target="_blank" rel="noopener">Flask</a></strong> or <strong><a href="https://fastapi.tiangolo.com/" target="_blank" rel="noopener">FastAPI</a></strong> backend without configuring servers yourself.</p>
<p></p>
<p><center><img decoding="async" alt="5 Free Ways to Host Python Applications" width="100%" class="perfmatters-lazy" src="https://www.kdnuggets.com/wp-content/uploads/awan_5_free_ways_host_python_application_4.png"></center></p>
<p>The deployment process is very straightforward. You connect a GitHub repository (Render also supports <strong><a href="https://about.gitlab.com/" target="_blank" rel="noopener">GitLab</a></strong> and <strong><a href="https://bitbucket.org/" target="_blank" rel="noopener">Bitbucket</a></strong>), and the platform handles the build and deployment for you. This makes it a very beginner-friendly way to get a Python API online.</p>
<p>Render offers a free tier for web services that is useful for testing ideas, hobby projects, and small demos. The key thing to know is that free web services spin down after 15 minutes of inactivity. When someone visits again, it can take up to a minute for the service to wake up.</p>
</p>
<h2><span># </span>4. Run Python applications using Modal</h2>
<p><strong><a href="https://modal.com/" target="_blank" rel="noopener">Modal</a></strong> is one of my favorite modern platforms for running Python applications, especially when the project is a little more advanced than a simple demo. I have used it for Model Context Protocol (MCP) backends, AI agents, and more sophisticated applications where I wanted something running quickly without having to manage the infrastructure myself. One of the coolest parts is that you define the infrastructure in Python, so the whole developer experience feels much more natural if you&#8217;re already working in the Python ecosystem.</p>
<p></p>
<p><center><img decoding="async" alt="5 Free Ways to Host Python Applications" width="100%" class="perfmatters-lazy" src="https://www.kdnuggets.com/wp-content/uploads/awan_5_free_ways_host_python_application_1.png"></center></p>
<p>It is especially useful for machine learning workloads, background tasks, and back-end services. You can run Python functions, scheduled tasks, and web endpoints, making it versatile enough for APIs, asynchronous processing, and model inference.</p>
<p>The free tier is quite generous to start with. Modal&#8217;s Starter plan includes $30 per month in free credits, plus a limited number of web endpoints and cron jobs, which is usually enough for small experiments, personal projects, and early prototypes.</p>
</p>
<h2><span># </span>5. Host full Python applications on PythonAnywhere</h2>
<p><strong><a href="https://www.pythonanywhere.com/" target="_blank" rel="noopener">PythonAnywhere</a></strong> is one of the best-known Python hosting platforms. It feels a bit old school compared to newer tools, but it still gets the job done. One reason people keep coming back to it is that it is built specifically for Python, so you can write code, manage files, open consoles, and deploy web applications from your browser without having to set up your own server.</p>
<p></p>
<p><center><img decoding="async" alt="5 Free Ways to Host Python Applications" width="100%" class="perfmatters-lazy" src="https://www.kdnuggets.com/wp-content/uploads/awan_5_free_ways_host_python_application_6.png"></center></p>
<p>It is a good option for simple Flask and <strong><a href="https://www.djangoproject.com/" target="_blank" rel="noopener">Django</a></strong> projects, especially if you want an all-in-one environment rather than combining multiple separate services. For beginners, this can make learning much easier.</p>
<p>The free account is genuinely useful for learning and small projects. Free accounts currently include:</p>
<ul>
<li>One web app with one web worker.
</li>
<li>Two consoles.
</li>
<li>512 MiB disk space and 100 CPU seconds.
</li>
<li>Apps run on the <code>yourusername.pythonanywhere.com</code> subdomain, and free accounts have restricted outbound Internet access.
</li>
</ul>
<h2><span># </span>Summary</h2>
<p>Here&#8217;s a quick comparison to help you choose the right platform for the type of Python application you want to deploy.</p>
</p>
<table style="width: 100%; border-collapse: collapse; font-family: Arial, sans-serif; font-size: 14px; color: #333;">
<thead>
<tr style="background-color: #ffd29a;">
<th style="padding: 12px; border: 1px solid #ddd; text-align: left;">Platform</th>
<th style="padding: 12px; border: 1px solid #ddd; text-align: left;">Best for</th>
<th style="padding: 12px; border: 1px solid #ddd; text-align: left;">Free tier type</th>
<th style="padding: 12px; border: 1px solid #ddd; text-align: left;">Good for beginners</th>
</tr>
</thead>
<tbody>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">Hugging Face Spaces</td>
<td style="padding: 12px; border: 1px solid #ddd;">AI, Gradio, Streamlit demos</td>
<td style="padding: 12px; border: 1px solid #ddd;">Free community hosting on CPU hardware</td>
<td style="padding: 12px; border: 1px solid #ddd;">Yes</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">Streamlit Community Cloud</td>
<td style="padding: 12px; border: 1px solid #ddd;">Data applications, dashboards, internal tools</td>
<td style="padding: 12px; border: 1px solid #ddd;">Free app hosting deployed from GitHub</td>
<td style="padding: 12px; border: 1px solid #ddd;">Yes</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">Render</td>
<td style="padding: 12px; border: 1px solid #ddd;">Flask and FastAPI backend APIs</td>
<td style="padding: 12px; border: 1px solid #ddd;">Free web service that sleeps after inactivity</td>
<td style="padding: 12px; border: 1px solid #ddd;">Yes</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">Modal</td>
<td style="padding: 12px; border: 1px solid #ddd;">AI backends, agents, tasks, serverless applications</td>
<td style="padding: 12px; border: 1px solid #ddd;">Monthly free credits</td>
<td style="padding: 12px; border: 1px solid #ddd;">Moderate</td>
</tr>
<tr>
<td style="padding: 12px; border: 1px solid #ddd;">PythonAnywhere</td>
<td style="padding: 12px; border: 1px solid #ddd;">Flask and Django applications</td>
<td style="padding: 12px; border: 1px solid #ddd;">Free beginner plan with one web app</td>
<td style="padding: 12px; border: 1px solid #ddd;">Yes</td>
</tr>
</tbody>
</table>
<p> </p>
<p><a href="https://abid.work" rel="noopener" target="_blank"><b><strong>Abid Ali Awan</strong></b></a> (<a href="https://www.linkedin.com/in/1abidaliawan" rel="noopener" target="_blank">@1abidaliawan</a>) is a certified data science professional who loves building machine learning models. Currently, he focuses on creating content and writing technical blogs about machine learning and data science technologies. Abid holds a Master&#8217;s degree in Technology Management and a Bachelor&#8217;s degree in Telecommunications Engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.</p>
</p></div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/5-free-ways-to-host-python-applications/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i1.wp.com/www.kdnuggets.com/wp-content/uploads/awan_5_free_ways_host_python_application_5.png?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>There is new evidence about how loneliness affects memory in old age</title>
		<link>https://aisckool.com/there-is-modern-evidence-about-how-loneliness-affects-memory-in-senior-age/</link>
					<comments>https://aisckool.com/there-is-modern-evidence-about-how-loneliness-affects-memory-in-senior-age/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Mon, 20 Apr 2026 11:51:26 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26236</guid>

					<description><![CDATA[Neuroscientists know that there is a link between loneliness and cognitive decline in older people, although the exact strength of this link is still hard to pin down. A new longitudinal study provides evidence that older adults who feel lonely are more likely to have memory problems, although this does not necessarily mean that their brains [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div>
<p><span class="lead-in-text-callout">Neuroscientists know</span> that there is a link between loneliness and cognitive decline in older people, although the exact strength of this link is still hard to pin down. A new longitudinal study provides evidence that older adults who feel lonely are more likely to have memory problems, although this does not necessarily mean that their brains are aging faster.</p>
<p class="paywall">The report, published in the journal Aging &#038; Mental Health, found that older adults with higher levels of loneliness performed worse on immediate and delayed recall tests. Yet the rate at which their memory declined over the six years was virtually the same as that of people who were not lonely.</p>
<p class="paywall">&#8220;This suggests that loneliness may play a more significant role in the initial state of memory than in its progressive decline,&#8221; <a href="https://www.eurekalert.org/news-releases/1123405" class="text link" target="_blank" rel="noopener">said</a> Luis Carlos Venegas-Sanabria of the School of Medicine and Health Sciences at Universidad del Rosario, who led the research. “The study highlights the importance of addressing loneliness as an important factor in older adults&#8217; cognitive performance.”</p>
<h2 class="paywall">A six-year study of thousands of older adults</h2>
<p class="paywall">The team analyzed data from the Survey of Health, Ageing and Retirement in Europe (SHARE), one of the most robust longitudinal databases for studying aging. Over six years, researchers followed 10,217 adults aged 65 to 94 from 12 European countries, assessing their level of loneliness and their performance on memory tests.</p>
<p class="paywall"><a href="https://www.tandfonline.com/doi/full/10.1080/13607863.2026.2624569#abstract" class="text link" target="_blank" rel="noopener">Results</a> show that age was the most important determinant of both memory level and its rate of decline. From age 75, scores began to decline faster, and after age 85 the decline became even more pronounced. Depression and chronic diseases such as diabetes were also associated with lower initial scores. Loneliness, while associated with a lower baseline, did not steepen the slope of cognitive decline.</p>
<p class="paywall">The study also found that physical activity was associated with better initial memory scores. People who engaged in moderate or vigorous physical activity at least once a month remembered more words on immediate and delayed recall tests. This effect did not change the rate of decline, but it did raise the baseline level, which functions as a kind of &#8220;cognitive buffer.&#8221;</p>
<p class="paywall">Although the study did not examine why loneliness and cognitive function are associated, previous research has proposed likely mechanisms. Loneliness often goes hand in hand with less social interaction, which is itself a factor in cognitive performance. It is also associated with an increased risk of depression, which directly affects memory test results. Additionally, lonely people tend to have more health problems, such as hypertension and diabetes, which also affect cognitive function.</p>
<p class="paywall">According to United Nations projections, by 2050 one in six people in the world will be over 65. Societies are entering a stage where old age will no longer be the exception but the norm. Dementia, like other neurodegenerative diseases that appear with age, will be a serious challenge for health care systems.</p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/there-is-modern-evidence-about-how-loneliness-affects-memory-in-senior-age/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i2.wp.com/media.wired.com/photos/69e26f4466e2b77438cf4564/191:100/w_1280,c_limit/GettyImages-2256650784.jpg?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>Dark matter may consist of black holes from another universe</title>
		<link>https://aisckool.com/obscure-matter-may-consist-of-black-holes-from-another-universe/</link>
					<comments>https://aisckool.com/obscure-matter-may-consist-of-black-holes-from-another-universe/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Sun, 19 Apr 2026 17:49:25 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26228</guid>

					<description><![CDATA[A new cosmological model combines two of the most eccentric ideas in modern physics to explain the nature of dark matter, an invisible substance that makes up about 85 percent of all matter in the universe. To understand it, you need to look beyond the Big Bang that we all know and consider two concepts [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div>
<p><span class="lead-in-text-callout">A new cosmological</span> model combines two of the most eccentric ideas in modern physics to explain the nature of dark matter, an invisible substance that makes up about 85 percent of all matter in the universe. To understand it, you need to look beyond the Big Bang that we all know and consider two concepts that rarely intersect: cyclic universes and primordial black holes.</p>
<h2 class="paywall">A different kind of multiverse</h2>
<p class="paywall">There are different versions of the &#8220;multiverse&#8221;. The most popular one, familiar from the Marvel Cinematic Universe, assumes that there are countless universes and that these versions of reality exist in parallel. Physics proposes something more sober and mathematically coherent: a cosmic bounce.</p>
<p class="paywall">In this model, the universe is not born from a singularity, but expands, contracts and expands again in an infinite cycle. Each &#8220;universe&#8221; is not parallel, but sequential &#8211; that is, one arises from the ashes of the previous one.</p>
<p class="paywall">Is it possible for something to survive the end of its universe and carry over into the next one? According to a paper published in <a data-offer-url="https://journals.aps.org/prd/abstract/10.1103/pr4p-6m49" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://journals.aps.org/prd/abstract/10.1103/pr4p-6m49&quot;}" href="https://journals.aps.org/prd/abstract/10.1103/pr4p-6m49" rel="nofollow noopener" target="_blank">Physical Review D</a>, yes. Author Enrique Gaztanaga, a research professor at the Institute of Space Sciences in Barcelona, shows that any structure larger than about 90 meters can survive the final collapse of the universe and make it through the bounce. These &#8220;relics&#8221; would not only survive but could also give rise to the gigantic, unexplained structures observed in the early stages of the present-day universe. What&#8217;s more, they may be the key to understanding dark matter.</p>
<p class="paywall">For decades, the dominant explanation for dark matter was that it consists of an unknown particle or particles. But after years of experiments without a direct detection, physicists began to explore alternatives. One of them suggests that dark matter is not an exotic particle at all, but an abundant population of small black holes that we have overlooked.</p>
<p class="paywall">The idea is attractive, but it has a serious problem. For these black holes to explain dark matter, they would have to have existed from the earliest moments of the universe, long before the first stars collapsed. There are indications that such objects may exist, but there is no convincing physical mechanism to explain their origin.</p>
<h2 class="paywall">A universe born with black holes</h2>
<p class="paywall">This is where Gaztanaga&#8217;s newly proposed model shines. If cosmic bounces allowed compact structures to survive the collapse of the previous universe, then the current universe would have been born with pre-existing black holes. They would not have to arise from extreme fluctuations or finely tuned inflationary processes; they would simply exist from the first moment.</p>
<p class="paywall">This assumption has the potential to solve two mysteries at once: the origin of these primordial black holes and the nature of dark matter. If the model is correct, dark matter would not be a mystery of the early universe, but rather the legacy of a cosmos older than ours.</p>
<p class="paywall">“Much work remains to be done,” said Gaztanaga, who is also a researcher at the Institute of Cosmology and Gravitation at the University of Portsmouth, in an article for <a data-offer-url="https://theconversation.com/could-dark-matter-be-made-of-black-holes-from-a-different-universe-278469" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://theconversation.com/could-dark-matter-be-made-of-black-holes-from-a-different-universe-278469&quot;}" href="https://theconversation.com/could-dark-matter-be-made-of-black-holes-from-a-different-universe-278469" rel="nofollow noopener" target="_blank">The Conversation</a>. “These ideas need to be tested with data ranging from the gravitational wave background to galaxy surveys and precise measurements of the cosmic microwave background.”</p>
<p class="paywall">“But the possibility is great,” he added. &#8220;The universe may not have started once, but it may have bounced back. And the dark structures that shape today&#8217;s galaxies may be relics from a time before the Big Bang.&#8221;</p>
<p class="paywall"><em>This story originally appeared on</em> <em><a href="https://es.wired.com/articulos/la-materia-oscura-podria-estar-conformada-de-los-agujeros-negros-de-otro-universo-sugiere-un-estudio" class="text link" target="_blank" rel="noopener">WIRED in Spanish</a> and was translated from Spanish.</em></p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/obscure-matter-may-consist-of-black-holes-from-another-universe/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i0.wp.com/media.wired.com/photos/69e12b860d58c6570f78e85b/191:100/w_1280,c_limit/GettyImages-458015353.jpg?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>The &#8220;lone runner&#8221; problem is deceptively simple</title>
		<link>https://aisckool.com/the-problem-of-the-lone-runner-is-only-seemingly-basic/</link>
					<comments>https://aisckool.com/the-problem-of-the-lone-runner-is-only-seemingly-basic/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Sat, 18 Apr 2026 23:47:20 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26224</guid>

					<description><![CDATA[The original version of this story appeared in Quanta Magazine. Imagine a strange training exercise: a group of runners start running on a circular track, each maintaining a unique, constant pace. Will every runner, at least once, end up &#8220;alone,&#8221; relatively far from all the others, regardless of their speeds? Mathematicians conjecture that the answer is yes. The &#8220;lone runner&#8221; [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div>
<p><span class="lead-in-text-callout">The original version</span> <em>of <a href="https://www.quantamagazine.org/new-strides-made-on-deceptively-simple-lonely-runner-problem-20260306/" class="text link" target="_blank" rel="noopener">this story</a> appeared in <a href="https://www.quantamagazine.org/" class="text link" target="_blank" rel="noopener">Quanta Magazine</a>.</em></p>
<p class="paywall">Imagine a strange training exercise: a group of runners start running on a circular track, each maintaining a unique, constant pace. Will every runner, at least once, end up &#8220;alone,&#8221; relatively far from all the others, regardless of their speeds?</p>
<p class="paywall">Mathematicians conjecture that the answer is yes.</p>
<p class="paywall">The &#8220;lone runner&#8221; problem may seem simple and inconsequential, but it appears in many forms throughout mathematics. It is equivalent to questions in number theory, geometry, graph theory, and more: about when you can get a clear line of sight through a field of obstacles, where billiard balls can travel on a table, or how to organize a network. &#8220;It has so many facets. It touches on so many different areas of mathematics,&#8221; said <a data-offer-url="https://matthbeck.github.io/" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://matthbeck.github.io/&quot;}" href="https://matthbeck.github.io/" rel="nofollow noopener" target="_blank">Matthias Beck</a> of San Francisco State University.</p>
<p class="paywall">With just two or three runners, the proof of the conjecture is elementary. Mathematicians proved it for four runners in the 1970s, and by 2007 they had extended it <a data-offer-url="https://arxiv.org/abs/0710.4495" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://arxiv.org/abs/0710.4495&quot;}" href="https://arxiv.org/abs/0710.4495" rel="nofollow noopener" target="_blank">up to seven</a>. However, over the past two decades, no one has been able to go further.</p>
<p class="paywall">Then last year <a data-offer-url="https://www.lirmm.fr/~mrosenfeld/#Home" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://www.lirmm.fr/~mrosenfeld/#Home&quot;}" href="https://www.lirmm.fr/~mrosenfeld/#Home" rel="nofollow noopener" target="_blank">Matthieu Rosenfeld</a>, a mathematician at the Laboratory of Computer Science, Robotics and Microelectronics of Montpellier, proved the conjecture for <a data-offer-url="https://arxiv.org/abs/2509.14111" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://arxiv.org/abs/2509.14111&quot;}" href="https://arxiv.org/abs/2509.14111" rel="nofollow noopener" target="_blank">eight runners</a>. Within a few weeks, a second-year student at the University of Oxford named <a data-offer-url="https://users.ox.ac.uk/~sjoh6037/" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://users.ox.ac.uk/~sjoh6037/&quot;}" href="https://users.ox.ac.uk/~sjoh6037/" rel="nofollow noopener" target="_blank">Tanupat (Paul) Trakulthongchai</a> built on Rosenfeld&#8217;s ideas to prove it for <a data-offer-url="https://arxiv.org/abs/2511.22427" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://arxiv.org/abs/2511.22427&quot;}" href="https://arxiv.org/abs/2511.22427" rel="nofollow noopener" target="_blank">nine and 10</a> runners.</p>
<p class="paywall">The sudden progress has sparked renewed interest in the problem. “It&#8217;s a real milestone,” said Beck, who was not involved in the work. Adding just one more runner makes proving the conjecture &#8220;exponentially more difficult,&#8221; he said. “To go from seven runners to now 10 runners is amazing.”</p>
<h2 class="paywall">A running start</h2>
<p class="paywall">In the beginning, the lone runner problem had nothing to do with running.</p>
<p class="paywall">Instead, mathematicians were interested in a seemingly unrelated problem: how to use fractions to approximate irrational numbers like pi, a task with a huge number of applications. In the 1960s, a graduate student, <a data-offer-url="https://www.uni-siegen.de/fb6/fb6/mitarbeiter/visitenkarten/vkwills.html" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://www.uni-siegen.de/fb6/fb6/mitarbeiter/visitenkarten/vkwills.html&quot;}" href="https://www.uni-siegen.de/fb6/fb6/mitarbeiter/visitenkarten/vkwills.html" rel="nofollow noopener" target="_blank">Jörg M. Wills</a>, conjectured that <a href="https://www.quantamagazine.org/new-proof-settles-how-to-approximate-numbers-like-pi-20190814/" class="text link" target="_blank" rel="noopener">a century-old method for doing this</a> is optimal &#8211; that there is no way to improve on it.</p>
<p class="paywall">In 1998, a group of mathematicians <a href="https://www.sciencedirect.com/science/article/pii/S0095895697917706" target="_blank" class="text link" rel="noopener">reformulated this conjecture</a> in the language of running. Say <em>N</em> runners start from the same place on a circular track 1 unit long, each running at a different constant speed. Wills&#8217; conjecture is equivalent to saying that every runner will at some point be lonely, regardless of the other runners&#8217; speeds. More specifically, each runner will at some point be at least 1/<em>N</em> away from every other runner.</p>
<p class="paywall">When Wills saw the article about the lone runner, he emailed one of its authors, <a data-offer-url="https://www.cecm.sfu.ca/people/pm/goddyn.shtml" class="external-link text link" data-event-click="{&quot;element&quot;:&quot;ExternalLink&quot;,&quot;outgoingURL&quot;:&quot;https://www.cecm.sfu.ca/people/pm/goddyn.shtml&quot;}" href="https://www.cecm.sfu.ca/people/pm/goddyn.shtml" rel="nofollow noopener" target="_blank">Luis Goddyn</a> of Simon Fraser University, to congratulate him on &#8220;this wonderful and poetic name.&#8221; (Goddyn&#8217;s response: &#8220;Oh, you&#8217;re still alive.&#8221;)</p>
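<p>For integer speeds, a single configuration can be checked exactly by computer. The following is a minimal Python sketch under stated assumptions: the other runners&#8217; speeds are given as non-zero integers measured relative to the runner being watched (so that runner can be treated as standing still), and the function name <code>lonely_time</code> is hypothetical. The search rests on one observation: the times when a runner of speed <em>v</em> is far enough away form closed windows, so if all runners&#8217; windows overlap anywhere, they also overlap at the left edge of one window. Only finitely many candidate times therefore need testing. This checks individual speed sets only; it says nothing about the general conjecture.</p>

```python
from fractions import Fraction
import math


def dist_to_nearest_integer(x):
    """||x||: distance from x to the nearest integer (exact for Fractions)."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)


def lonely_time(speeds):
    """Find an exact time in [0, 1) at which the watched runner is lonely.

    `speeds` are the other runners' non-zero integer speeds relative to the
    watched runner, who is therefore treated as standing still. With
    N = len(speeds) + 1 runners, "lonely" means every other runner is at
    circular distance at least 1/N on a track of length 1.
    Returns a Fraction, or None if no such time exists.
    """
    if not speeds or any(v == 0 for v in speeds):
        raise ValueError("need at least one non-zero relative speed")
    gap = Fraction(1, len(speeds) + 1)
    speeds = [abs(v) for v in speeds]  # ||t*v|| == ||t*(-v)||
    # Each runner's "far enough" times form closed windows
    # [(k + gap)/v, (k + 1 - gap)/v]. If the windows of all runners share a
    # point, they also share the left edge of one window, so it is enough to
    # test t = (k + gap)/v for every speed v and every k in range(v).
    for v in speeds:
        for k in range(v):
            t = (k + gap) / v
            if all(dist_to_nearest_integer(t * w) >= gap for w in speeds):
                return t
    return None


print(lonely_time([1, 2, 3]))  # 1/4
print(lonely_time([3, 5, 7]))  # 1/12
```

<p>Exact <code>Fraction</code> arithmetic is used so that boundary cases such as a distance of exactly 1/<em>N</em> are not lost to floating-point rounding.</p>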
<div class="GenericCalloutWrapper-loJzHJ hdkcxr callout--has-top-border" data-testid="GenericCallout">
<figure class="AssetEmbedWrapper-iJvQnD cOWUYC asset-embed">
<div class="AssetEmbedAssetContainer-fnduJP iaVSwI asset-embed__asset-container"><span class="SpanWrapper-kFnjvc eKnjjD responsive-asset AssetEmbedResponsiveAsset-gaAbQ hXaxHA asset-embed__responsive-asset"><picture class="ResponsiveImagePicture-jKunQM gjCCFj AssetEmbedResponsiveAsset-gaAbQ hXaxHA asset-embed__responsive-asset responsive-image"></picture></span></div>
<div class="CaptionWrapper-bpPcvW iDPSlt caption AssetEmbedCaption-eZIMNW gMgneI asset-embed__caption" data-testid="caption-wrapper"><span class="BaseText-fEwdHD CaptionText-cQpRdU kRTNAB hbiMYj caption__text"></p>
<p>Jörg Wills proposed a number theory conjecture that decades later became known as the lone runner problem.</p>
<p></span><span class="BaseText-fEwdHD CaptionCredit-cUgOGk iQbGEh hRFzlA caption__credit">Courtesy of Jörg Wills/Quanta magazine</span></div>
</figure>
</div>
<p class="paywall">Mathematicians have also shown that the lone runner problem is equivalent to yet another question. Imagine an endless sheet of graph paper. Place a small square at the center of each grid cell. Then start at a corner of one of the cells and draw a straight line. (The line can point in any direction other than perfectly vertical or horizontal.) How large can the small squares be before the line must hit one?</p>
<p class="paywall">As versions of the lone runner problem spread throughout mathematics, interest in the question increased. Mathematicians have proven different cases of the conjecture using completely different techniques. Sometimes they relied on the tools of number theory; other times they turned to geometry or graph theory.</p>
</div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/the-problem-of-the-lone-runner-is-only-seemingly-basic/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i3.wp.com/media.wired.com/photos/69e2786904c989eedf38ac47/191:100/w_1280,c_limit/Lonely%20Runner%20cr-Mark%20Belan-Default.jpg?ssl=1" medium="image"></media:content>
            	</item>
		<item>
		<title>5 useful Python scripts for advanced data validation and quality control</title>
		<link>https://aisckool.com/5-useful-python-scripts-for-advanced-data-validation-and-quality-control/</link>
					<comments>https://aisckool.com/5-useful-python-scripts-for-advanced-data-validation-and-quality-control/#respond</comments>
		
		<dc:creator><![CDATA[The AI Sckool]]></dc:creator>
		<pubDate>Sat, 18 Apr 2026 14:45:18 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<guid isPermaLink="false">https://aisckool.com/?p=26220</guid>

					<description><![CDATA[Some data validation issues are insidious. They pass basic quality checks because the individual values look fine, but the underlying logic is broken. Checking for these issues manually is challenging. You need automated scripts that understand context, business rules, and relationships between data points. This article discusses five advanced Python [&#8230;]]]></description>
										<content:encoded><![CDATA[<p></p>
<div id="post-">
<h2><span># </span>Introduction</h2>
<p>Some data validation issues are insidious. They pass basic quality checks because the individual values look fine, but the underlying logic is broken. Checking for these issues manually is challenging: you need automated scripts that understand context, business rules, and relationships between data points. This article covers five advanced Python validation scripts that catch the subtle issues basic checks miss.</p>
<p><strong><a href="https://github.com/balapriyac/data-science-tutorials/tree/main/useful-python-scripts-data-validation" target="_blank" rel="noopener">You can download the code on GitHub</a></strong>.</p>
</p>
<h2><span># </span>1. Checking for continuity and patterns in time series</h2>
</p>
<h4><span>// </span>Pain point</h4>
</p>
<h4><span>// </span>What the script does</h4>
</p>
<h4><span>// </span>How it works</h4>
<p>The script parses timestamp columns to infer the expected frequency and identifies gaps in sequences that should be continuous. It verifies that event sequences follow logical ordering rules, applies domain-specific rate checks, and detects seasonality violations. It also generates detailed reports of temporal anomalies along with an assessment of their business impact.</p>
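<p>A minimal, standard-library sketch of the frequency inference, gap detection, and ordering checks described above. It assumes timestamps are already parsed into <code>datetime</code> objects and treats the most common interval as the expected frequency, which is a simplifying assumption; seasonality and rate checks are not covered. The function names are hypothetical and are not taken from the author&#8217;s script.</p>

```python
from collections import Counter
from datetime import datetime, timedelta


def infer_frequency(timestamps):
    """Guess the expected interval as the most common gap between
    consecutive distinct timestamps."""
    ordered = sorted(set(timestamps))
    if len(ordered) < 2:
        raise ValueError("need at least two distinct timestamps")
    deltas = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return Counter(deltas).most_common(1)[0][0]


def find_gaps(timestamps, freq=None):
    """Return (last_seen, next_seen, missing_points) for every stretch where
    consecutive timestamps are further apart than the expected frequency."""
    if freq is None:
        freq = infer_frequency(timestamps)
    ordered = sorted(set(timestamps))
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        delta = later - earlier
        if delta > freq:
            # timedelta // timedelta is an int: expected steps in this stretch.
            gaps.append((earlier, later, delta // freq - 1))
    return gaps


def out_of_order(timestamps):
    """Positions where a record is timestamped earlier than its predecessor."""
    return [i for i in range(1, len(timestamps)) if timestamps[i] < timestamps[i - 1]]


start = datetime(2024, 1, 1)
readings = [start + timedelta(hours=h) for h in (0, 1, 2, 5, 6, 7)]
print(infer_frequency(readings))  # 1:00:00
for last_seen, next_seen, missing in find_gaps(readings):
    print(f"gap after {last_seen}: {missing} reading(s) missing before {next_seen}")
```

<p>Using the mode rather than the mean interval keeps a handful of large gaps from distorting the inferred frequency.</p>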
</p>
<h2><span># </span>2. Semantic validation using business rules</h2>
</p>
<h4><span>// </span>Pain point</h4>
<p>Individual fields pass type validation, but their combination makes no sense. For example: a purchase order dated in the future whose delivery is already marked complete in the past, or an account flagged as a &#8220;new customer&#8221; with five years of transaction history. These semantic violations break business logic.</p>
</p>
<h4><span>// </span>What the script does</h4>
<p>The script validates data against complex business rules and domain knowledge. It checks multi-field conditional logic, validates state transitions and temporal progression, enforces mutually exclusive categories, and flags logically impossible combinations. It uses a rules engine that can express sophisticated business constraints.</p>
</p>
<h4><span>// </span>How it works</h4>
<p>The script accepts business rules defined in a declarative format, evaluates complex conditional logic across multiple fields, and validates state transitions and workflow progression. It also checks the temporal consistency of business events, applies industry-specific domain rules, and generates violation reports categorized by rule type and business impact.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/23e9.png" alt="⏩" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong><a href="https://github.com/balapriyac/data-science-tutorials/blob/main/useful-python-scripts-data-validation/semantic_validator.py" target="_blank" rel="noopener">Download the semantic validity checker script</a></strong></p>
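<p>A minimal sketch of the declarative-rules idea, assuming records arrive as Python dicts. Each rule is plain data (a name, a predicate, and a severity) evaluated by one generic loop, so adding a rule never requires touching the engine. The field names, the two example rules, and the 12-month cut-off are hypothetical illustrations based on the pain-point examples, not the rules or format of the downloadable script.</p>

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Rule:
    """A declarative business rule: `check` returns True for a valid record."""
    name: str
    check: Callable[[dict], bool]
    severity: str = "error"


# Hypothetical rules modelled on the pain-point examples above.
RULES = [
    Rule(
        "delivery_not_before_order",
        lambda r: r.get("delivery_date") is None or r["delivery_date"] >= r["order_date"],
    ),
    Rule(
        "new_customer_has_short_history",
        # The 12-month cut-off is an illustrative assumption.
        lambda r: r["customer_type"] != "new" or r["months_of_history"] <= 12,
        severity="warning",
    ),
]


def validate(records, rules=RULES):
    """Evaluate every rule against every record and collect the violations."""
    violations = []
    for row, record in enumerate(records):
        for rule in rules:
            if not rule.check(record):
                violations.append({"row": row, "rule": rule.name, "severity": rule.severity})
    return violations


orders = [
    {"order_date": date(2024, 3, 1), "delivery_date": date(2024, 3, 5),
     "customer_type": "new", "months_of_history": 2},
    {"order_date": date(2024, 6, 1), "delivery_date": date(2024, 5, 20),
     "customer_type": "new", "months_of_history": 60},
]
for violation in validate(orders):
    print(violation)
```

<p>Grouping the returned violations by <code>rule</code> or <code>severity</code> gives the kind of categorized report described above.</p>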
</p>
<h2><span># </span>3. Detecting data drift and schema evolution</h2>
</p>
<h4><span>// </span>Pain point</h4>
<p>Data structures sometimes change over time without documentation. New columns appear, existing columns disappear, data types change subtly, value ranges expand or contract, and new categories show up in categorical fields. These changes break downstream systems, invalidate assumptions, and cause silent failures. Before you know it, months of corrupted data have accumulated.</p>
</p>
<h4><span>// </span>What the script does</h4>
<p>The script monitors datasets for structural and statistical drift over time. It tracks schema changes such as new and removed columns and type changes, detects shifts in the distributions of numeric and categorical data, and identifies new values in supposedly fixed categories. It flags changes in data ranges and constraints and warns when statistical properties deviate from baseline values.</p>
</p>
<h4><span>// </span>How it works</h4>
<p>The script builds baseline profiles of the dataset&#8217;s structure and statistics, periodically compares current data against those baselines, calculates drift scores using statistical distance measures such as <strong><a href="https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence" target="_blank" rel="noopener">KL divergence</a></strong> and <strong><a href="https://en.wikipedia.org/wiki/Wasserstein_metric" target="_blank" rel="noopener">Wasserstein distance</a></strong>, and tracks schema version changes. It also maintains a change history, applies significance testing to distinguish real drift from noise, and generates drift reports with severity levels and recommended actions.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/23e9.png" alt="⏩" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong><a href="https://github.com/balapriyac/data-science-tutorials/blob/main/useful-python-scripts-data-validation/drift_detector.py" target="_blank" rel="noopener">Download the data drift detector script</a></strong></p>
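<p>A dependency-free sketch of three building blocks of drift detection: a schema diff, a one-dimensional Wasserstein distance for numeric columns, and KL divergence for categorical columns. It assumes schemas are <code>{column: type_name}</code> dicts and categorical data is summarized as frequency counts; the author&#8217;s script may well use library implementations instead (SciPy, for example, provides both measures), and these function names and sample values are hypothetical. Because KL divergence is undefined when the reference distribution assigns zero probability, this sketch clamps such probabilities to a small <code>eps</code>, which turns never-seen categories into large scores.</p>

```python
import math
from bisect import bisect_right
from collections import Counter


def schema_diff(baseline, current):
    """Compare two {column: type_name} mappings."""
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "retyped": sorted(c for c in baseline.keys() & current.keys()
                          if baseline[c] != current[c]),
    }


def wasserstein_1d(xs, ys):
    """1-D Wasserstein (earth mover's) distance between two samples:
    the area between their empirical CDFs."""
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    for left, right in zip(points, points[1:]):
        cdf_x = bisect_right(xs, left) / len(xs)
        cdf_y = bisect_right(ys, left) / len(ys)
        total += abs(cdf_x - cdf_y) * (right - left)
    return total


def kl_divergence(p_counts, q_counts, eps=1e-9):
    """D_KL(P || Q) for two categorical frequency tables. Categories that Q
    never saw are clamped to `eps`, so they produce a large score."""
    p_total, q_total = sum(p_counts.values()), sum(q_counts.values())
    score = 0.0
    for category, count in p_counts.items():
        p = count / p_total
        q = q_counts.get(category, 0) / q_total
        if p > 0:
            score += p * math.log(p / max(q, eps))
    return score


baseline = {"id": "int", "amount": "float", "region": "str"}
current = {"id": "int", "amount": "str", "channel": "str"}
print(schema_diff(baseline, current))
print(wasserstein_1d([10, 12, 11, 13], [15, 17, 16, 18]))
# Current distribution first, baseline second: a brand-new "app" category drives the score up.
print(kl_divergence(Counter(web=40, store=50, app=10), Counter(web=50, store=50)))
```

<p>KL divergence is asymmetric, so the argument order matters; putting the current data first, as here, makes newly appearing categories stand out. Significance testing and change history are left out of this sketch.</p>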
</p>
<h2><span># </span>4. Validation of hierarchical and graph relations</h2>
</p>
<h4><span>// </span>Pain point</h4>
<p>Hierarchical data must remain acyclic and logically ordered. Circular reporting chains, self-referencing bills of materials, cyclic taxonomies, and parent-child inconsistencies break recursive queries and hierarchical aggregations.</p>
</p>
<h4><span>// </span>What the script does</h4>
<p>The script validates graph and tree structures in relational data. It detects circular references in parent-child relationships, enforces hierarchy depth constraints, and verifies that directed acyclic graphs (DAGs) remain acyclic. It also finds orphaned nodes and disconnected subgraphs, checks that root and leaf nodes comply with business rules, and validates constraints on many-to-many relationships.</p>
</p>
<h4><span>// </span>How it works</h4>
<p>The script builds a graph representation of hierarchical relationships, uses cycle detection algorithms to find circular references, and performs depth-first and breadth-first traversals to check the structure. It then identifies strongly connected components in supposedly acyclic graphs, inspects node properties at each level of the hierarchy, and generates a visual representation of problematic subgraphs with specific violation details.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/23e9.png" alt="⏩" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong><a href="https://github.com/balapriyac/data-science-tutorials/blob/main/useful-python-scripts-data-validation/hierarchy_validator.py" target="_blank" rel="noopener">Download the Hierarchical Relationship Validation Script</a></strong></p>
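<p>A minimal standard-library sketch of two of these checks: three-colour depth-first search to find a cycle in parent-to-child edges, and a breadth-first walk from the roots to report orphans, over-deep nodes, and nodes unreachable from any root. It assumes the hierarchy fits in memory as Python dicts and lists, and the function and node names are hypothetical. The recursive DFS can hit Python&#8217;s recursion limit on very deep graphs, and strongly connected component analysis and visualization are not covered.</p>

```python
from collections import defaultdict, deque


def find_cycle(edges):
    """Return one cycle (as a node list) in a directed parent -> child edge
    list, or None, using depth-first search with three node colours."""
    graph = defaultdict(list)
    for parent, child in edges:
        graph[parent].append(child)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited / on current path / finished
    colour = defaultdict(int)
    path = []

    def dfs(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if colour[nxt] == GREY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(graph):
        if colour[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None


def check_hierarchy(parent_of, max_depth):
    """`parent_of` maps node -> parent (None for roots). Reports orphans,
    nodes deeper than `max_depth`, and nodes unreachable from any root."""
    orphans = sorted(n for n, p in parent_of.items()
                     if p is not None and p not in parent_of)
    children = defaultdict(list)
    for node, parent in parent_of.items():
        if parent is not None:
            children[parent].append(node)
    queue = deque((n, 0) for n, p in parent_of.items() if p is None)
    seen, too_deep = set(), []
    while queue:  # breadth-first walk from every root
        node, depth = queue.popleft()
        seen.add(node)
        if depth > max_depth:
            too_deep.append(node)
        queue.extend((child, depth + 1) for child in children[node])
    return {
        "orphans": orphans,
        "too_deep": sorted(too_deep),
        "unreachable": sorted(set(parent_of) - seen),
    }


reports_to = {"ceo": None, "cto": "ceo", "dev": "cto", "intern": "dev",
              "contractor": "vp_sales"}
print(check_hierarchy(reports_to, max_depth=2))
print(find_cycle([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]))
```

<p>Nodes caught in a reporting cycle never descend from a root, so they show up under <code>unreachable</code> even when the cycle check is not run.</p>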
</p>
<h2><span># </span>5. Checking referential integrity in tables</h2>
</p>
<h4><span>// </span>Pain point</h4>
<p>Relational data must maintain referential integrity across all foreign key relationships. Orphaned child records, references to deleted or nonexistent parent records, invalid codes, and uncontrolled cascading deletes create hidden dependencies and inconsistencies. These violations corrupt joins, distort reports, break queries, and ultimately make data unreliable and hard to trust.</p>
</p>
<h4><span>// </span>What the script does</h4>
<p>The script checks foreign key relationships and consistency between tables. It detects orphaned records that are missing parent or child references, validates cardinality constraints, and verifies composite key uniqueness across tables. It also analyzes the effects of cascading deletes before they happen and identifies circular references spanning multiple tables. The script works with multiple data files at once to validate relationships.</p>
</p>
<h4><span>// </span>How it works</h4>
<p>The script loads the main dataset and all related reference tables, checks whether foreign key values exist in the parent tables, and detects both orphaned parent records and orphaned child records. It checks cardinality rules to enforce one-to-one or one-to-many constraints and validates composite keys spanning multiple columns. The script also generates comprehensive reports listing every referential integrity violation, the number of affected rows, and the specific foreign key values that failed validation.</p>
<p><img src="https://s.w.org/images/core/emoji/17.0.2/72x72/23e9.png" alt="⏩" class="wp-smiley" style="height: 1em; max-height: 1em;" /> <strong><a href="https://github.com/balapriyac/data-science-tutorials/blob/main/useful-python-scripts-data-validation/referential_integrity_validator.py" target="_blank" rel="noopener">Download the referential integrity check script</a></strong></p>
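<p>A minimal in-memory sketch of the core referential integrity checks, assuming each table is loaded as a list of dicts. With pandas, a left merge with <code>indicator=True</code> serves a similar purpose for orphan detection. The table names, column names, and helper functions here are hypothetical, and cascade-delete analysis and cross-table cycle detection are not covered.</p>

```python
from collections import Counter


def orphaned_children(child_rows, fk, parent_rows, pk):
    """Child rows whose non-null foreign key matches no parent key."""
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys]


def childless_parents(parent_rows, pk, child_rows, fk):
    """Parent rows that no child row references."""
    referenced = {row[fk] for row in child_rows}
    return [row for row in parent_rows if row[pk] not in referenced]


def cardinality_violations(child_rows, fk, max_children):
    """Foreign key values referenced more often than allowed
    (max_children=1 enforces a one-to-one relationship)."""
    counts = Counter(row[fk] for row in child_rows if row[fk] is not None)
    return {key: n for key, n in counts.items() if n > max_children}


def duplicate_composite_keys(rows, key_columns):
    """Composite key values that appear on more than one row."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


customers = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 99},  # points at a customer that does not exist
]
orphans = orphaned_children(orders, "customer_id", customers, "customer_id")
print(f"{len(orphans)} orphaned order(s); unknown customer_id values: "
      f"{sorted({row['customer_id'] for row in orphans})}")
```

<p>Building a set of parent keys first keeps each orphan check to a single pass over the child table, rather than a nested scan.</p>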
</p>
<h2><span># </span>Summary</h2>
<p>Advanced data validation goes beyond null and duplicate checking. These five scripts help detect semantic violations, temporal anomalies, structural drift, and referential integrity violations that basic quality checks miss entirely.</p>
<p>Start with the script that addresses your most pressing problem. Set up baseline profiles and domain-specific validation rules. Build validation into your data pipeline so issues are caught at ingestion rather than during analysis. Configure alert thresholds appropriate to your use case.</p>
<p>Have fun checking it out!</p>
<p><b><a href="https://twitter.com/balawc27" rel="noopener" target="_blank"><strong>Bala Priya C</strong></a></b> is a software developer and technical writer from India. She likes working at the intersection of mathematics, programming, data analytics, and content creation. Her areas of interest and specialization include DevOps, data analytics, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she is working on learning and sharing her knowledge with the developer community by writing tutorials, how-to guides, reviews, and more. Bala also creates engaging resource overviews and coding tutorials.</p>
</p></div>
]]></content:encoded>
					
					<wfw:commentRss>https://aisckool.com/5-useful-python-scripts-for-advanced-data-validation-and-quality-control/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<media:content url="https://i2.wp.com/www.kdnuggets.com/wp-content/uploads/bala-adv-data-val-python-scripts.png?ssl=1" medium="image"></media:content>
            	</item>
	</channel>
</rss>
