Submission Guide for lablab.ai Hackathons

At lablab.ai, we value the innovation and creativity that participants bring to our hackathons. To help you prepare your hackathon project submission, we've outlined all the required fields and forms, as well as provided recommended practices for each. Please keep in mind that all these components are equally important and contribute to your chances of advancing to the finals.
Step 2: Identify a Problem or Unmet Need

Begin by selecting a specific context, such as your current job, personal interests, or an industry you are passionate about:
- Provide a solution to a problem, e.g., time-consuming tasks, repetitive processes, creative bottlenecks, or content creation difficulties; ideally something people are willing to pay for or that benefits a specific group.
- Look for gaps or unmet needs that have not been adequately addressed.
- Analyze trends: pay attention to emerging trends and market changes that may give rise to new demands or opportunities.
- Compile a list of challenges or inefficiencies within your chosen context that could benefit from automation or enhancement.
Step 4: Conduct Market Research

Once you have identified problems, research the market to find out whether existing solutions are available and whether your solution needs upgrading to stand out among them. This research may include:
- Analyzing industry trends.
- Studying customer behavior.
- Investigating competitor strategies.

With a clear understanding of the problems and markets, your team can brainstorm again to identify potential solutions for each problem. Encourage everyone on the team to participate in the ideation process, regardless of their role or experience.
Judging Criteria

Specific criteria are outlined in a separate section.

by Walaa Nasr
Slides:
bioAI team
Team leader
Founder at doi.bio
"Understand biology with computers"
Ph.D. in Computer Science (ML + UX), co-op @ Google Research
Worked in structural biochemistry
Applying techniques of LLMs to protein sequences
ESM3 - Evolutionary Scale Modelling - Meta AI
EvolutionaryScale.ai $168M in funding
Alphafold 3
CASP16 underway
Evolutionary Scale Modelling
Team formerly at Meta AI (FAIR)
Protein Large Language Model
Applies techniques from LLMs to DNA and protein sequences
slide showing ESM architecture
Describe the ESM architecture
68 pages including appendices
Technically complex both in AI and structural biology
Use AI to design a series of Green Fluorescent Proteins (GFPs)
Make the DNA and insert it into cells and test for fluorescence
Extract facts from this paper
Explain this to both expert and non-expert audiences
Input was a markdown file generated by Mathpix from the original PDF
Wrote Python scripts to process this file
We ran 24 different experiments
Results on github
A series of Markdown files made publicly available
A YouTube video explaining this paper
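The notes say the Mathpix Markdown was processed with Python scripts, but the scripts themselves aren't included. As one plausible first step, here is a minimal sketch that splits a Markdown paper into sections by its `#`-style headings and pairs each section with an expert and a non-expert prompt. It assumes Mathpix emits standard ATX headings; `split_sections`, `build_prompts`, and the prompt wording are hypothetical, not the team's actual code.

```python
import re

# Matches ATX-style Markdown headings such as "# Title" or "## Results"
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown text into (heading, body) pairs.

    Text before the first heading is returned under "Preamble";
    sections with no body text are skipped.
    """
    sections = []
    current_title = "Preamble"
    current_lines: list[str] = []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section before starting a new one
            body = "\n".join(current_lines).strip()
            if body:
                sections.append((current_title, body))
            current_title = match.group(2)
            current_lines = []
        else:
            current_lines.append(line)
    body = "\n".join(current_lines).strip()
    if body:
        sections.append((current_title, body))
    return sections


# Hypothetical prompt templates for the two target audiences
AUDIENCES = {
    "expert": "Extract the key facts from this section for a structural biologist:",
    "non-expert": "Explain this section in plain language for a general audience:",
}


def build_prompts(sections: list[tuple[str, str]]) -> list[dict]:
    """Pair every section with every audience template."""
    return [
        {"section": title, "audience": audience, "prompt": f"{template}\n\n## {title}\n\n{body}"}
        for title, body in sections
        for audience, template in AUDIENCES.items()
    ]


sample = "# Abstract\nESM3 is a generative model.\n\n## Results\nWe designed GFPs."
for item in build_prompts(split_sections(sample)):
    print(item["section"], "|", item["audience"])
```

Sending one section at a time keeps each request small for a 68-page paper. Lines starting with `#` inside code fences would be misread as headings by this simple regex.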
ME: Please make a 5-minute presentation from these slides. Please take the slides I have now and use the same format, the Obsidian Advanced Slides Markdown format. Add more slides that address the judging criteria. Please have just a few short points per slide.
[AI]: Here's a 5-minute presentation using the Obsidian Advanced Slides Markdown format based on your current slides. Additional slides have been included to address the judging criteria, focusing on presenting your project clearly and effectively.
# Using Falcon to Understand Protein LLM Papers
## bioAI team
---
# Team Overview
## Steven Ness
- Team leader
- Founder at doi.bio
- "Understand biology with computers"
- Ph.D. in Computer Science (ML + UX)
- Co-op @ Google Research
- Expertise in structural biochemistry
---
# Problem Statement
## Challenge in Protein Research
- Complexity of protein structures and functions is overwhelming.
- Difficulty in accessing and interpreting LLM research for practical applications.
- A need for tools to simplify complex data for both experts and non-experts.
---
# Solution Overview
## Introducing Falcon
- An AI-driven tool to extract and simplify information from complex protein LLM papers.
- Transforms dense technical content into clear, accessible language.
- Bridges the gap between advanced research and practical understanding.
---
# Technology Utilized
## ESM3: Evolutionary Scale Modelling
- Developed by a team formerly at Meta AI (FAIR).
- Incorporates techniques from LLMs applied to DNA and protein sequences.
- Enables next-gen research and applications in protein design.
---
# ESM Architecture
## Insights into ESM Technology
- Exploits evolutionary information to predict protein properties.
- Leverages large datasets for training.
- Enhances the accuracy of biological predictions.
---
# ESM3 Paper Analysis
## Key Findings
- 68-page complex document combining AI and structural biology.
- Focus on designing Green Fluorescent Proteins (GFPs).
- Methodical approach to DNA synthesis and fluorescence testing.
---
# Market Research
## Identify Opportunities
- Rapid growth and interest in biotechnology and AI intersections.
- Increasing demand for simplified, accessible research tools.
- Large addressable market potential in academic and industrial applications.
---
# User Interaction Showcase
## Demonstrating Usability
- Screen recording of Falcon's input-output process.
- Showcasing user-friendly design and interaction.
- Highlighting real-time feedback from users.
---
# Revenue Model
## Potential Revenue Streams
- Subscription model for individual and institutional access.
- Premium features for in-depth analysis and customization.
- Partnership opportunities with academic institutions and biotech firms.
---
# Competitor Analysis
## Strengths & Weaknesses
- **Competitors**: Existing platforms lack user-friendly interfaces.
- **Strengths of Falcon**: Unique ability to simplify complex papers, specialized in LLMs.
---
# Future Prospects
## Scalability & Impact
- Potential for expansion into other scientific domains beyond proteins.
- Opportunities for collaboration with educational platforms and research institutions.
- Aspiration to become the go-to platform for breaking down scientific papers.
---
# Conclusion
## Final Thoughts
- Falcon addresses a significant gap in understanding complex LLM research.
- Strong business and market potential.
- Positioned to redefine how researchers and enthusiasts interact with biological research.
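The deck separates slides with lines containing only `---`, which Advanced Slides treats as horizontal slide breaks. That makes it easy to pull each slide's `#` title out programmatically, for example to draft keys for a per-slide narration dictionary. A minimal sketch, assuming no YAML front matter at the top of the deck (front matter is also delimited by `---` and would be miscounted as a slide); `split_slides`, `slide_title`, and `to_key` are hypothetical helpers.

```python
def split_slides(deck: str) -> list[str]:
    """Split a deck into slides on lines that contain only '---'."""
    slides, current = [], []
    for line in deck.splitlines():
        if line.strip() == "---":
            slides.append("\n".join(current))
            current = []
        else:
            current.append(line)
    slides.append("\n".join(current))
    return slides


def slide_title(slide: str) -> str:
    """Return the text of the slide's first '# ' heading, or '(untitled)'."""
    for line in slide.splitlines():
        if line.startswith("# "):  # '## subtitle' lines do not match
            return line[2:].strip()
    return "(untitled)"


def to_key(title: str) -> str:
    """Turn a title into a file-name-friendly key, e.g. 'Team Overview' -> 'Team_Overview'."""
    return "_".join(title.split())


deck = """# Using Falcon to Understand Protein LLM Papers
## bioAI team
---
# Team Overview
## Steven Ness
---
# Problem Statement
"""
print([to_key(slide_title(s)) for s in split_slides(deck)])
# ['Using_Falcon_to_Understand_Protein_LLM_Papers', 'Team_Overview', 'Problem_Statement']
```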
ME: Please write a script for a text to speech system to read for each slide.
[AI]: Here’s a script for a text-to-speech system to read for each slide in your presentation, designed to clearly convey the essential information while keeping it engaging.
“Welcome to our presentation on using Falcon to understand protein LLM papers, brought to you by the bioAI team.”
“I’m Steven Ness, the team leader. I’m the founder of doi.bio, where we strive to understand biology with computers. I hold a Ph.D. in computer science, specializing in machine learning and user experience. I’ve also worked with Google Research and have a background in structural biochemistry.”
“Today, we face a significant challenge in protein research. The complexity of protein structures and functions can be overwhelming. It’s difficult to access and interpret the latest research on large language models, creating a need for tools that can simplify complex data for both experts and non-experts.”
“Our solution is Falcon, an AI-driven tool designed to extract and simplify information from challenging protein LLM papers. Falcon transforms dense technical content into clear, accessible language, effectively bridging the gap between advanced research and practical understanding.”
“Falcon utilizes ESM3, or Evolutionary Scale Modelling, developed by a team formerly at Meta AI. It applies techniques derived from large language models to DNA and protein sequences, enabling innovative research and applications in protein design.”
“Here we see insights into the ESM technology. ESM exploits evolutionary information to predict protein properties, leveraging large datasets for training. This enhances the accuracy of biological predictions, making it a powerful tool in our research toolkit.”
“The ESM3 paper is rich with findings, spanning sixty-eight pages. It combines artificial intelligence with structural biology, with a focus on designing Green Fluorescent Proteins. The paper outlines a methodical approach for synthesizing DNA and testing for fluorescence.”
“Analyzing the market reveals significant opportunities. There is rapid growth in demand for tools at the intersection of biotechnology and artificial intelligence. Falcon is poised to meet this demand, with applications in both academia and industry.”
“Let’s now showcase user interaction with Falcon. Watch the screen recording demonstrating Falcon’s input-output process. Notice how user-friendly the design is, along with the real-time feedback that enhances user experience.”
“Now, moving to potential revenue streams. We envision a subscription model targeting both individual and institutional users, along with premium features for deeper analysis. We also see partnership opportunities with educational institutions and biotech firms.”
“In analyzing competitors, we find that existing platforms often neglect user-friendly interfaces. Falcon’s strength lies in its unique ability to simplify complex research papers, with a particular focus on large language models.”
“Looking ahead, Falcon offers strong scalability and potential impact. There are opportunities to branch out into other scientific domains beyond proteins, and collaboration with educational platforms and research institutions would significantly extend our reach.”
“In conclusion, Falcon addresses a critical gap in understanding complex LLM research. With strong business and market potential, we are well-positioned to redefine how researchers and enthusiasts engage with biological literature. Thank you for your attention!”
This script provides a clear and engaging narration for each slide, ensuring that the information is accessible and easy to follow.
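Since the brief asks for a 5-minute talk, a quick word count shows whether the narration fits before any audio is generated. A minimal sketch, assuming a speaking rate of about 150 words per minute (an assumption; actual text-to-speech pacing depends on the voice and model); `estimated_seconds` and `total_minutes` are hypothetical helpers.

```python
# Assumed speaking rate; real text-to-speech pacing varies by voice and model.
WORDS_PER_MINUTE = 150


def estimated_seconds(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate how many seconds `text` takes to read aloud at `wpm` words per minute."""
    return len(text.split()) * 60 / wpm


def total_minutes(scripts: list[str], wpm: int = WORDS_PER_MINUTE) -> float:
    """Sum the estimated reading time of all slide scripts, in minutes."""
    return sum(estimated_seconds(s, wpm) for s in scripts) / 60


scripts = [
    "Welcome to our presentation on using Falcon to understand protein LLM papers, "
    "brought to you by the bioAI team.",
    # Add the remaining slide scripts here, one string per slide.
]

for script in scripts:
    print(f"{estimated_seconds(script):5.1f} s  {script[:40]}...")
print(f"Estimated total: {total_minutes(scripts):.1f} min (target: 5 min)")
```

If the total runs well over five minutes, trimming the longest per-slide scripts is usually easier than speeding up the voice.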
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

slides = {
    "Intro": "Welcome back to our channel! Today, we’re diving into an exciting topic in the world of bioinformatics and machine learning: how we can enhance the understanding of protein structures using the VQ-VAE architecture. So, let’s get started!",
    "What is VQ-VAE?": "First, what exactly is VQ-VAE? The Vector Quantized Variational Autoencoder, or VQ-VAE, is a powerful neural network architecture designed for generating discrete representations of data. In our case, it helps in learning accurate structural representations of proteins.",
}

for title, text in slides.items():
    print(f"Generating speech for: {title}")
    response = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    # Specify the filename for the output MP3 (drop "?" so the name is valid on all platforms)
    output_file = f"{title.replace(' ', '_').replace('?', '').lower()}.mp3"
    response.write_to_file(output_file)
    print(f"Saved: {output_file}")

print("All audio files generated successfully.")
```
Please put that text into this python program
[AI]: Here's the Python program modified to incorporate the slide text provided in the presentation script. This assumes you have the OpenAI API set up to use their text-to-speech capabilities.
```python
from openai import OpenAI

# Initialize the OpenAI client (reads OPENAI_API_KEY from the environment)
client = OpenAI()

# Define the narration text for each slide; the keys double as output file names
slides = {
    "Title_Slide": "Welcome to our presentation on using Falcon to understand protein LLM papers, brought to you by the bioAI team.",
    "Team_Overview": "I’m Steven Ness, the team leader. I’m the founder of doi.bio, where we strive to understand biology with computers. I hold a Ph.D. in computer science, specializing in machine learning and user experience. I’ve also worked with Google Research and have a background in structural biochemistry.",
    "Problem_Statement": "Today, we face a significant challenge in protein research. The complexity of protein structures and functions can be overwhelming. It’s difficult to access and interpret the latest research on large language models, creating a need for tools that can simplify complex data for both experts and non-experts.",
    "Solution_Overview": "Our solution is Falcon, an AI-driven tool designed to extract and simplify information from challenging protein LLM papers. Falcon transforms dense technical content into clear, accessible language, effectively bridging the gap between advanced research and practical understanding.",
    "Technology_Utilized": "Falcon utilizes ESM3, or Evolutionary Scale Modelling, developed by a team formerly at Meta AI. It applies techniques derived from large language models to DNA and protein sequences, enabling innovative research and applications in protein design.",
    "ESM_Architecture": "Here we see insights into the ESM technology. ESM exploits evolutionary information to predict protein properties, leveraging large datasets for training. This enhances the accuracy of biological predictions, making it a powerful tool in our research toolkit.",
    "ESM3_Paper_Analysis": "The ESM3 paper is rich with findings, spanning sixty-eight pages. It combines artificial intelligence with structural biology, with a focus on designing Green Fluorescent Proteins. The paper outlines a methodical approach for synthesizing DNA and testing for fluorescence.",
    "Market_Research": "Analyzing the market reveals significant opportunities. There is rapid growth in demand for tools at the intersection of biotechnology and artificial intelligence. Falcon is poised to meet this demand, with applications in both academia and industry.",
    "User_Interaction_Showcase": "Let’s now showcase user interaction with Falcon. Watch the screen recording demonstrating Falcon’s input-output process. Notice how user-friendly the design is, along with the real-time feedback that enhances user experience.",
    "Revenue_Model": "Now, moving to potential revenue streams. We envision a subscription model targeting both individual and institutional users, along with premium features for deeper analysis. We also see partnership opportunities with educational institutions and biotech firms.",
    "Competitor_Analysis": "In analyzing competitors, we find that existing platforms often neglect user-friendly interfaces. Falcon’s strength lies in its unique ability to simplify complex research papers, with a particular focus on large language models.",
    "Future_Prospects": "Looking ahead, Falcon offers strong scalability and potential impact. There are opportunities to branch out into other scientific domains beyond proteins, and collaboration with educational platforms and research institutions would significantly extend our reach.",
    "Conclusion": "In conclusion, Falcon addresses a critical gap in understanding complex LLM research. With strong business and market potential, we are well-positioned to redefine how researchers and enthusiasts engage with biological literature. Thank you for your attention!",
}

# Iterate through each slide text and create speech output
for title, text in slides.items():
    print(f"Generating speech for: {title.replace('_', ' ')}")
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
    )
    # Specify the filename for the output MP3
    output_file = f"{title}.mp3"
    # write_to_file saves the audio; stream_to_file is deprecated on this response type
    response.write_to_file(output_file)
    print(f"Saved: {output_file}")

print("All audio files generated successfully.")
```
Slide Texts: Each entry in the `slides` dictionary pairs a slide title with its narration. The titles are written with underscores instead of spaces so they can be used directly as file names; underscores are converted back to spaces only in the progress message.
Generating Speech: The program iterates over the slides, generates speech for each text, and saves the audio in MP3 format.
Output Files: Each generated audio file is named after its slide title (for example, `Title_Slide.mp3`).
Note: The parameters passed to `client.audio.speech.create()` (such as `model` and `voice`) might need adjustments depending on the capabilities you have access to in the OpenAI API. Adjust them according to your needs.
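Because the program makes one paid API call per slide, it can be worth validating the `slides` dictionary before the loop runs. A minimal sketch, assuming the 4,096-character limit that the OpenAI API reference documents for the speech endpoint's `input` (limits can change, so check the current reference); `oversized_slides` and `check_slides` are hypothetical helpers, and calling `check_slides(slides)` just before the `for` loop is one way to use them.

```python
# Documented maximum length of the speech endpoint's `input` at the time of
# writing; verify against the current OpenAI API reference before relying on it.
MAX_TTS_CHARS = 4096


def oversized_slides(slides: dict[str, str], limit: int = MAX_TTS_CHARS) -> list[str]:
    """Return the titles of slides whose narration exceeds `limit` characters."""
    return [title for title, text in slides.items() if len(text) > limit]


def check_slides(slides: dict[str, str], limit: int = MAX_TTS_CHARS) -> None:
    """Raise ValueError before any API call if a slide is too long or empty."""
    too_long = oversized_slides(slides, limit)
    if too_long:
        raise ValueError(f"Split these slides into shorter requests: {', '.join(too_long)}")
    empty = [title for title, text in slides.items() if not text.strip()]
    if empty:
        raise ValueError(f"These slides have no narration: {', '.join(empty)}")


example = {"Title_Slide": "Welcome to our presentation.", "Appendix": "x" * 5000}
print(oversized_slides(example))  # ['Appendix']
```

Failing fast here avoids paying for half a run that stops on the one slide the endpoint rejects.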