
Photo by the author via Canva
# Introduction
So what exactly is Generative AI? It is about creating new content, such as text, images, code, audio, and even video, using artificial intelligence. Before the era of large language and vision models, things were very different. But with the arrival of foundation models such as GPT, Llama, and LLaVA, everything has changed. You can now build innovative tools and interactive applications without training models from scratch.
I have chosen 5 projects that cover a bit of everything: text, images, voice, vision, and some background concepts such as fine-tuning and RAG. You will try both API-based solutions and local setups, and by the end you will have touched all the components used in most state-of-the-art GenAI applications. Let's start.
# 1. Recipe Generator App (Text Generation)
Link: Build a Recipe Generator with React and AI: Code Meets Kitchen
We will start with something simple and fun that uses only text generation and an API key, with no complex setup required. This app lets the user enter a few basic details, such as ingredients, meal type, cuisine preference, cooking time, and difficulty. It then generates a full recipe using GPT. You will learn how to build a frontend form, send the data to GPT, and display the AI-generated recipe to the user. Here is a more advanced version of the same idea: Build an AI Recipe Finder with GPT o1-preview in Under 1 Hour. That version features more advanced prompt engineering, GPT-4, ingredient substitution suggestions, and a livelier frontend.
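The backend step described above can be sketched in a few lines. This is a minimal illustration, not the video's actual code: the helper names, prompt wording, and model name (`gpt-4o-mini`) are my assumptions.

```python
# Sketch of the backend: turn the form fields into a prompt and call GPT.
# Helper names, prompt wording, and model choice are illustrative.

def build_recipe_prompt(ingredients, meal_type, cuisine, max_minutes, difficulty):
    """Combine the form fields into a single instruction for the model."""
    return (
        f"Create a {difficulty} {cuisine} {meal_type} recipe using: "
        f"{', '.join(ingredients)}. It must take under {max_minutes} minutes. "
        "Return a title, an ingredient list, and numbered steps."
    )

def generate_recipe(prompt):
    """Send the prompt to the OpenAI Chat Completions API (needs OPENAI_API_KEY)."""
    from openai import OpenAI  # pip install openai
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    prompt = build_recipe_prompt(["chicken", "rice", "garlic"], "dinner", "Thai", 30, "easy")
    print(generate_recipe(prompt))
```

In the full project, the React form posts these fields to a small backend route that runs this logic and streams the recipe back.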
# 2. Image Generator App (Stable Diffusion, Local Setup)
Link: Build a Python AI Image Generator in 15 Minutes (Free and Local)
Yes, you can generate cool images with tools like ChatGPT, DALL·E, or Midjourney just by typing a prompt. But what if you want to go a step further and run everything locally, with no API costs or cloud restrictions? That is exactly what this project does. In this video you will learn how to set up Stable Diffusion on your own computer. The creator keeps it simple: you install Python, clone the web UI repository, download the checkpoint model, and run the local server. That's all. Then you can type text prompts in your browser and generate AI images instantly, all without an internet connection or an API.
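If you prefer a script to a browser UI, the same local generation can be sketched with the Hugging Face `diffusers` library. This is an alternative to the web UI the video uses, not its method; the model id and helper names are my assumptions, and the first run downloads several GB of weights.

```python
# Minimal local text-to-image sketch using Hugging Face diffusers
# (an alternative to the web UI shown in the video; model id is illustrative).

import re

def safe_filename(prompt, ext="png"):
    """Turn a text prompt into a filesystem-safe image filename."""
    slug = re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-")[:50]
    return f"{slug}.{ext}"

def generate_image(prompt, out_path=None):
    """Run Stable Diffusion locally; the first call downloads the weights."""
    import torch
    from diffusers import StableDiffusionPipeline  # pip install diffusers torch
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    )
    pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")
    image = pipe(prompt, num_inference_steps=25).images[0]
    out_path = out_path or safe_filename(prompt)
    image.save(out_path)
    return out_path

if __name__ == "__main__":
    print(generate_image("a cozy cabin in a snowy forest, digital art"))
```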
# 3. Medical Chatbot with Voice + Vision + Text
Link: Build an AI Voice Assistant App Using Multimodal LLM LLaVA and Whisper
This project is not purpose-built as a medical chatbot, but the use case fits it well. You talk to it, it listens, it can look at an image (such as an X-ray or a medical report), and it responds intelligently by combining all three modes: voice, vision, and text. It is built using LLaVA (a multimodal vision-language model) and Whisper (OpenAI's speech-to-text model) inside a Gradio interface. The video walks through setting it up on Colab, installing the libraries, quantizing LLaVA so it runs on a GPU, and wiring in gTTS for spoken responses.
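The core of the project is the glue logic that chains the three modes together. The sketch below shows that flow with the heavy pieces (Whisper, LLaVA, gTTS) passed in as plain functions; the function names and prompt wording are my assumptions, not the video's code.

```python
# Sketch of the glue logic tying voice, vision, and text together.
# The real transcriber (Whisper), vision model (LLaVA), and speech
# synthesizer (gTTS) are injected as functions so each piece can be
# swapped out or stubbed for testing.

def answer_patient(audio_path, image_path, transcribe, describe_image, ask_llm, speak):
    """Voice in -> question text + image context -> LLM answer -> voice out."""
    question = transcribe(audio_path)       # e.g. Whisper on the recording
    context = describe_image(image_path)    # e.g. LLaVA on the X-ray
    prompt = (
        "You are a careful medical assistant.\n"
        f"Image findings: {context}\n"
        f"Patient question: {question}\n"
        "Answer clearly and suggest seeing a doctor when appropriate."
    )
    answer = ask_llm(prompt)
    return answer, speak(answer)            # e.g. gTTS -> path to an mp3
```

In the video, a pipeline like this is wrapped in a Gradio interface with audio and image inputs, so the user records a question and uploads an image in the browser.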
# 4. Fine-Tuning State-of-the-Art LLMs
Link: Fine-Tune Gemma 3, Qwen3, Llama 4, Phi 4 and Mistral Small with Unsloth and Transformers
So far we have used ready-made models with prompt engineering. That works, but if you want more control, fine-tuning is the next step. This video from Trelis Research is one of the best on the topic. So instead of suggesting a project that merely mentions fine-tuning, I wanted you to focus on the actual process. The video shows how to fine-tune models such as Gemma 3, Qwen3, Llama 4, Phi 4, and Mistral Small using Unsloth (a library for faster, more memory-efficient training) and Transformers. It is long (about 1.5 hours), but well worth it. You will learn when fine-tuning makes sense, how to prepare datasets, how to run quick evaluations with vLLM, and how to debug real training problems.
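To make the idea concrete, here is a hedged sketch of supervised fine-tuning with LoRA adapters via the `peft` and `transformers` libraries. The video uses Unsloth, which wraps a similar workflow; the base model name, hyperparameters, and prompt format below are illustrative defaults of mine, not the video's settings.

```python
# Sketch of LoRA fine-tuning with peft + transformers.
# Model name, LoRA ranks, and prompt format are illustrative.

def format_example(instruction, response):
    """Render one dataset row in a simple instruction-tuning format."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"

def finetune(dataset, base_model="meta-llama/Llama-3.2-1B"):
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model  # pip install peft

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForCausalLM.from_pretrained(base_model)
    # Train only small low-rank adapter matrices instead of all weights.
    model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))
    texts = [format_example(x["instruction"], x["response"]) for x in dataset]
    # ... tokenize `texts` and run a Trainer / SFTTrainer training loop here ...
    return model
```

The point of LoRA is that only a few million adapter parameters get gradients, so a model that would not fit for full fine-tuning can be trained on a single consumer GPU.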
# 5. Build a Local RAG System from Scratch
Link: Local Retrieval-Augmented Generation (RAG) from Scratch (Step-by-Step Tutorial)
Everyone loves a good chatbot, but most fall apart when asked about things outside their training data. That is where RAG comes in. You give your LLM a vector database of relevant documents, and it retrieves context before responding. The video walks you through building a fully local RAG system in a Colab notebook or on your own machine. You load documents (such as a PDF textbook), split them into chunks, generate embeddings with a sentence-transformer model, store them in SQLite-VSS, and connect it all to a local LLM (e.g. Llama 2 via Ollama). It is the cleanest RAG tutorial for beginners I have seen, and once you finish it, you will understand how ChatGPT plugins, AI search tools, and internal chatbots really work.
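The retrieval step above can be demonstrated end to end in plain Python. This toy sketch uses a bag-of-words vector as a stand-in for the sentence-transformer embeddings and SQLite-VSS store from the video, but the chunk-embed-retrieve flow is the same.

```python
# Toy RAG retrieval: chunk documents, embed each chunk (bag-of-words
# here as a stand-in for real sentence-transformer embeddings), and
# fetch the most similar chunk to paste into the LLM prompt.

import math
import re
from collections import Counter

def chunk(text, size=200):
    """Split text into roughly `size`-character chunks on word boundaries."""
    words, chunks, cur = text.split(), [], []
    for w in words:
        cur.append(w)
        if len(" ".join(cur)) >= size:
            chunks.append(" ".join(cur)); cur = []
    if cur:
        chunks.append(" ".join(cur))
    return chunks

def embed(text):
    """Word-count vector; a real system would use sentence-transformers."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunk then gets prepended to the prompt, e.g. "Answer using only this context: ...", which is what grounds the local LLM's response in your documents.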
# Wrapping Up
Each of these projects teaches something essential:
Text → Image → Voice → Fine-Tuning → Retrieval
If you are just getting into GenAI and want to actually build things, not just play with demos, this is your roadmap. Start with the one that excites you most. And remember, it is okay to break things. That is how you learn.
Kanwal Mehreen is a machine learning engineer and a technical writer with a deep passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She is also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FemCodes to empower women in STEM fields.
