AI Everywhere. Recent experiments with Generative AI at Home & Work.
It's never been easier to cobble together personalized digital solutions that work for you, not against you.
Hi folks, as I continue to work with clients on the strategy and execution of AI use cases, my knowledge and excitement about the possibilities continue to grow. In this edition, I thought I’d share a small omnibus of some of the personal projects I’ve been working on, as well as AI tools that I find invaluable in my everyday life. Big Tech will likely win the AI wars in the long run, but you can benefit from using some of these powerful tools while they're figuring things out.
Note that if you’re doom-scrolling while eating your lunch, there are a couple of 60-second shorts that will have you up to speed on the latest and greatest before you finish your sandwich.
Building a Low-Cost, AI-Powered Personal Assistant with ChatGPT and Raspberry Pi (4:37)
In this video, we'll explore my journey in creating a low-cost personal assistant using ChatGPT, Claude Opus, or any other large language model (LLM) as an alternative to popular smart home devices like Alexa or Google Home. We'll dive into the hardware components, software setup, and potential applications of this project, both in the home and in retail environments.
Hardware Components:
The core components of this project include a Raspberry Pi 4B, a mini speaker, a 128 GB MicroSD card, and a USB 2.0 mini microphone. Optional additions include a 128x32 OLED display, standoff spacer columns, screw kits, a Raspberry Pi UPS power supply with battery, and a cool case for the Raspberry Pi 4B. The total cost of the project came in at just over $150, which is a bit pricey for a home assistant but offers great value for hobbyists looking to use the components for other projects and experiments.
Software Setup:
On the software side, you'll need an OpenAI API key and the code from judahpaul16's GitHub repository. The code translates audio to text on the device and sends the question over the web to ChatGPT, which formulates a response that is then read back through the mini speaker. There were some challenges along the way, but the developer and other builders in the community were helpful in troubleshooting and fixing bugs. In addition to answering questions via ChatGPT, the project can also play music from Spotify and control lights, with plans for further improvements and additional features.
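The real implementation lives in judahpaul16's repo, but if you just want to see the shape of the pipeline, here's a minimal sketch using the OpenAI Python SDK. It assumes OPENAI_API_KEY is set in the environment, the recorded question sits in a question.wav file, mpg123 is installed on the Pi for playback, and the model names are placeholders you'd swap for whatever is current.

```python
# Minimal sketch of the assistant pipeline: speech-to-text -> LLM -> text-to-speech.
# Not the gpt-home code; just the same flow using the OpenAI Python SDK.
import subprocess
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def answer_question(wav_path: str = "question.wav") -> str:
    # 1. Transcribe the recorded question (Whisper handles speech-to-text).
    with open(wav_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )

    # 2. Send the text to the chat model and get a short answer back.
    reply = client.chat.completions.create(
        model="gpt-4o",  # placeholder; swap for whichever model you prefer
        messages=[
            {"role": "system", "content": "You are a concise home assistant."},
            {"role": "user", "content": transcript.text},
        ],
    )
    answer = reply.choices[0].message.content

    # 3. Synthesize the answer and play it through the mini speaker.
    speech = client.audio.speech.create(model="tts-1", voice="alloy", input=answer)
    speech.write_to_file("answer.mp3")
    subprocess.run(["mpg123", "-q", "answer.mp3"])  # assumes mpg123 is installed on the Pi
    return answer

if __name__ == "__main__":
    print(answer_question())
```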
Operation:
To use the assistant, simply say the wake word "computer" and ask a question; the question is picked up by the USB mic and sent to ChatGPT. The response is then shown on the OLED display and played through the mini speaker. Although the current voice library is a bit robotic, it can be swapped for more human-like options such as OpenAI's text-to-speech voices in the future. There is a slight delay in the response, but this should improve with faster models like GPT-4o and more optimized code.
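If you're curious what the wake-word loop looks like, here's a rough sketch using the SpeechRecognition library (plus PyAudio for the USB mic). This isn't the project's actual code, and handle_query is a hypothetical stand-in for the transcribe-ask-speak pipeline sketched above.

```python
# Rough wake-word loop: listen, check for "computer", then capture the question.
# handle_query() is a hypothetical placeholder, not part of any library.
import speech_recognition as sr

WAKE_WORD = "computer"
recognizer = sr.Recognizer()

def listen_once(prompt: str) -> str:
    """Capture one utterance from the USB mic and return it as lowercase text."""
    print(prompt)
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source, phrase_time_limit=8)
    try:
        return recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return ""  # nothing intelligible was heard

def handle_query(question: str) -> None:
    # Placeholder: send the question to your LLM, then display and speak the answer.
    print(f"Heard: {question}")

if __name__ == "__main__":
    while True:
        heard = listen_once("Waiting for wake word...")
        if WAKE_WORD in heard:
            question = listen_once("Listening for your question...")
            if question:
                handle_query(question)
```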
Potential Applications:
This proof-of-concept demonstrates the potential for AI integration in the home and various aspects of life. Imagine AI-powered washing machines, vehicles, TVs, power tools, kitchen counters, gardening tools, or even clothes. The interactions with these objects and the data they collect could be fed to your intelligent agent to assist you in your daily life. In retail environments like Home Depot, having a trained AI expert built into the shelf who can see, hear, and recommend products and solutions could greatly benefit both customers and businesses.
Challenges and Limitations:
While this project is a great starting point, there are some challenges and limitations to consider. The current setup is a bit slow, robotic, and more expensive than desired. However, these issues can be addressed in future iterations through faster models, more optimized code, and cost-effective component selections. Additionally, as AI becomes more integrated into everyday objects, it's crucial to consider the ethical implications and the importance of data privacy and security.
The Future of AI - A Whole New Stack
OpenAI and other companies are discussing the development of an entirely new computing stack, from power and silicon to the operating system and applications (See Making Sense of Sam Altman’s $7 Trillion AI Chips Gambit). We are entering a new era where AI is ubiquitous and can observe, inform, and act on our behalf. By prototyping with projects like GPT-Home, we can explore what these interactions might look like and what data would be helpful to supplement or improve the experience.
Scaniverse, Claude Opus 3, and On Screen (60 sec)
Recently, these three technologies caught my eye. Each offers unique and exciting capabilities that have the potential to transform various industries. Let's dive in and explore Scaniverse, Anthropic’s Claude Opus 3, and On Screen.
Scaniverse - Photorealistic 3D Capture
First up is Scaniverse, an iOS application that leverages a technique with a name that I just love, ‘Gaussian Splatting,’ to capture photorealistic images of the world and create stunning 3D objects. What sets Scaniverse apart is its ability to generate high-quality, detailed 3D models using nothing more than your iPhone's camera. This space holds a special place in my heart from my work years ago on Microsoft’s Photosynth.
With Scaniverse, you can easily scan real-world objects, environments, and even people, creating digital replicas that can be viewed in VR, on your desktop, or across a range of different applications. This technology has countless potential use cases, from product design and architectural visualization to gaming and virtual tourism.
Claude Opus 3: Anthropic's Powerful Language Model
Next, I've been experimenting with Anthropic's Claude Opus 3, a state-of-the-art language model that's currently outperforming many of its competitors across various metrics. Anthropic, the company behind Claude, has raised an impressive $7.6 billion to develop their large language model (LLM), and the results are truly remarkable.
One of the standout features of Claude Opus 3 is its ability to handle large context windows, meaning it can process and understand substantial bodies of text in a single prompt. Additionally, the model can take images as input, opening up a world of possibilities for multimodal AI applications. As you evaluate different LLMs for your projects, Claude Opus 3 is definitely worth considering, especially for prose and essays.
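If you want to try the multimodal side yourself, here's a minimal sketch using Anthropic's Python SDK to send a local photo along with a question. It assumes an ANTHROPIC_API_KEY environment variable and a photo.jpg file; the model string is the Opus release current at the time of writing, so check Anthropic's docs for the latest.

```python
# Minimal sketch: ask Claude Opus a question about a local image.
import base64
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Encode the image as base64, as the Messages API expects.
with open("photo.jpg", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-opus-20240229",  # check Anthropic's docs for current model names
    max_tokens=512,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_data,
                    },
                },
                {"type": "text", "text": "Describe what's in this photo in two sentences."},
            ],
        }
    ],
)

print(message.content[0].text)
```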
On Screen: AI-Generated Space Operas
Finally, there's On Screen, a fascinating tool created by Ben Gney that allows users to generate entire space operas using pure AI. On Screen leverages advanced AI techniques to create visuals, audio, and scripts, enabling users to craft captivating stories with minimal effort.
What's particularly exciting about On Screen is that it is a precursor to OpenAI’s SORA, an upcoming platform that promises to revolutionize how we create and consume content. By familiarizing yourself with tools like On Screen, you can gain a deeper understanding of the capabilities of AI-generated content and be better prepared for the launch of the more photorealistic and sophisticated SORA.
Perplexity AI: The Conversational Search Engine That's Replacing Google (60 sec)
In recent months I've found myself increasingly turning to Perplexity AI, a conversational search engine that's quickly become my go-to resource for finding information online. In fact, Perplexity AI has replaced a staggering 90% of my Google searches. Here's why.
Conversational Answers and Curated Sources
One of the standout features of Perplexity is its ability to provide succinct, conversational answers to your queries. Powered by the latest large language models (LLMs), Perplexity delivers information in a format that's easy to digest and understand. But what sets it apart is the inclusion of curated sources, which are just a click away. This means you can quickly verify the information provided and dive deeper into the topic if needed.
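For the curious, Perplexity also exposes an API that speaks the same chat-completions dialect as OpenAI's, so you can get the answer-plus-sources behavior programmatically. A minimal sketch, assuming a PERPLEXITY_API_KEY environment variable; the "sonar" model name and the "citations" field are assumptions you should verify against their docs.

```python
# Minimal sketch: query Perplexity's API via the standard OpenAI client,
# pointed at Perplexity's base URL with a Perplexity API key.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

resp = client.chat.completions.create(
    model="sonar",  # assumed model name; check Perplexity's docs for current options
    messages=[{"role": "user", "content": "What's new in consumer generative AI this week?"}],
)

print(resp.choices[0].message.content)

# The answer comes back with the web sources it drew from; the exact field
# name ("citations") is an assumption here, so fall back gracefully if absent.
for url in resp.model_dump().get("citations", []):
    print(" -", url)
```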
Perplexity AI also has a unique feature that allows it to remember you and your content. This means you can return to a search later and pick up where you left off, without starting from scratch. This is particularly useful for complex topics requiring multiple sessions to explore fully.
Real-Time Information
While ChatGPT is excellent for tackling open-ended, complex questions, its knowledge base is limited to information available up to its last training cutoff. In contrast, Perplexity continuously crawls the web, indexing and analyzing real-time information. This means you can access the most current information available.
Customization Options
Another compelling reason to give Perplexity a try is its customization options. The base capability is free, making it accessible to everyone. However, the paid model offers even more powerful features, including a co-pilot that asks clarifying questions to understand your needs better and deliver more accurate results.
In addition, Perplexity allows you to select from various popular LLMs, including ChatGPT and Claude. I often use this capability to pick the current leading model (which changes frequently) for different categories of work: Anthropic’s Claude Opus for a large corpus of text where prose and writing are my focus, versus GPT-4o for a coding problem.
The Future of Search
As AI advances, tools like Perplexity are poised to reshape how we search for and consume information online. By combining the power of conversational AI, real-time indexing, and customization options, Perplexity offers a glimpse into the future of search.
While Google remains the dominant player in the search engine market, the rise of AI-powered alternatives like Perplexity, Bing, and others suggests that change is on the horizon. As more users discover the benefits of conversational search and real-time information, we may see a shift in how we navigate the vast expanse of the internet.
So in summary, Perplexity has quickly become an indispensable tool in my digital toolkit, replacing most of my Google searches. Its conversational answers, curated sources, real-time information, and customization options make it a compelling alternative to traditional search engines.
I highly recommend it if you're looking for a more efficient, personalized, and up-to-date way to find information online.
Suno - Simple Text-to-Song Creation Using Generative AI. Here's how to create your own. (4:00)
What is Suno?
I recently stumbled upon a new platform called Suno that has completely captivated me. Suno is a web-based tool that uses artificial intelligence to generate original songs from simple text descriptions, much like how Midjourney and DALL-E create images from text.
At its core, Suno is a platform that democratizes music creation. Users input a description of the desired genre, theme, and style of the song they want to create. Suno's AI system, powered by a proprietary large language model (LLM), then generates the audio and song structure, while ChatGPT creates the accompanying lyrics. The entire process takes a mere 10-15 seconds, and the results are often astounding. Users can create songs in any genre, from Britpop to country, without needing any musical expertise.
Why Suno Matters
Tools like Suno are revolutionizing how we create and consume media, making the process more accessible and democratic. Content creators like podcasters, YouTubers, and marketers can use Suno to generate unique, royalty-free music for their projects without breaking the bank. Moreover, the ability to create high-quality songs from simple text descriptions opens up new possibilities for individuals with limited musical skills to express their creativity.
As AI-generated media becomes more sophisticated, platforms like Suno could potentially disrupt the music industry and change how we think about music production and ownership. While there are still questions to be answered about AI's long-term impact on the music industry, it's clear that Suno and other similar tools are pushing the boundaries of what's possible and inspiring new forms of creativity.
My Experience
Eager to test Suno's capabilities, I decided to create a theme tune for the Shep Report. I described it as "a Britpop song about a Seattle-based technologist from the UK who loves technology, new companies, and gadgets." To my delight, the AI generated two 45-second clips that perfectly captured the essence of my description. The music was catchy, the lyrics were relevant, and the overall vibe was exactly what I had envisioned.
Feeling inspired, I decided to experiment with different genres, like country. Once again, Suno's AI system demonstrated its versatility and adaptability, creating a twangy, Nashville-inspired tune that fit the description to a tee. It was clear that Suno's AI was capable of understanding and interpreting user input and generating high-quality music across a wide range of genres.
The Future of AI-Generated Music
As AI technology continues to advance, platforms like Suno could become even more sophisticated, allowing for greater customization and control over the music generation process. We may see the integration of AI-generated music with other forms of media, such as videos, games, and virtual reality experiences, creating immersive and personalized content.
The increasing accessibility of AI-generated music could also lead to new forms of collaboration and remixing, blurring the lines between human and machine creativity. Obviously, this capability also raises questions about the role of human musicians and the potential impact on the music industry as a whole. Personally, I think there’s room for both, as is the case with any creative expression.
If you're as intrigued by Suno as I am, I highly recommend heading over to https://suno.com/ and giving it a try.
Thanks for taking the time to read. If you’re looking for help with your emerging technology or AI strategy, do get in touch and let’s explore together.