doi.bio/falcon1-ai

+begin_ai

Submission Guide for lablab.ai Hackathons

At lablab.ai, we value the innovation and creativity that participants bring to our hackathons. To help you prepare your hackathon project submission, we've outlined all the required fields and forms, as well as provided recommended practices for each. Please keep in mind that all these components are equally important and contribute to your chances of advancing to the finals.

  1. Pro Tips for a Stellar Submission:
    - Highlight the Problem & Solution: Start with the core issue your product resolves.
    - Detail Your Product: Discuss how it functions and the technologies involved.
    - Showcase User Interaction: A screen recording demonstrating user interaction is impactful.
    - Discuss Market Scope: Include Total Addressable Market (TAM) and Serviceable Addressable Market (SAM).
    - Revenue Streams: Highlight potential revenue sources.
    - Analyze Competitors: Delve into strengths and weaknesses, emphasizing your Unique Selling Proposition.
    - Talk About Future Prospects: Share scalability and impact potential.
    - Brevity is Key: Keep slides succinct; limit to 2-3 sentences each.
  2. Judging Criteria:
    - Presentation: How effectively you convey your project.
    - Business Value: Commercial potential and value proposition.
    - Application of Technology: The technologies applied and their relevance.
    - Originality: How unique and innovative your idea is.

Step 2: Identify a Problem or Unmet Need. Begin by selecting a specific context, such as your current job, personal interests, or an industry you are passionate about. Provide a solution to a problem, e.g., time-consuming tasks, repetitive processes, creative bottlenecks, or content creation difficulties, ideally something people are willing to pay for or that benefits a specific group. Look for gaps or unmet needs that have not been adequately addressed. Analyze trends: pay attention to emerging trends and changes in the market that may give rise to new demands or opportunities. Compile a list of challenges or inefficiencies within your chosen context that could benefit from automation or enhancement.

Step 4: Conduct Market Research. Once you have identified problems, research the market to find out whether existing solutions are available and whether your solution needs improvement to stand out among them. This research may include analyzing industry trends, studying customer behavior, and investigating competitor strategies. With a clear understanding of the problems and markets, your team can brainstorm again to identify potential solutions for each problem. Encourage everyone on the team to participate in the ideation process, regardless of their role or experience.

Judging Criteria: Specific criteria are outlined in a separate section (by Walaa Nasr).

  1. Presentation. Judging criteria score: Presentation (PDF and video)
    1 - Poor: No description of the problem or gaps to fill.
    2 - Limited: The problem & solution are not effectively communicated and are difficult to understand. Presentation video is less than 3 min.
    3 - Adequate: Effectively communicates the problem, solution, and value proposition in less than 5 min. Market analysis and/or marketing revenue is absent. No future goals or plans.
    4 - Strong: Effectively communicates the problem, solution, and value proposition in less than 5 min. Explains market analysis and marketing revenue. Explains future goals & plans.
    5 - Excellent: Exceptional in every aspect. Well-crafted and flawlessly communicates the problem, solution, and value proposition. Effectively shows the project's strengths and uniqueness through competitive analysis.
  2. Business value. Judging criteria score: Business value
    1 - Limited Business Value: Little or no potential for practical or commercial viability. Does not address a significant problem or market need.
    2 - Some Business Value: Some potential for business value. Uncertain market feasibility, scalability, or revenue generation. Addresses a niche market with limited demand.
    3 - Moderate Business Value: Shows reasonable business value. Addresses a market need with potential to generate revenue. Requires further development or validation.
    4 - High Business Value: High business value, with clear market potential. Potential to attract a large customer base and generate significant revenue. Strong market feasibility and scalability.
    5 - Exceptional Business Value: Stands out as having exceptional business value. Potential to disrupt the industry or create a new market. Presents a clear path to sustainable revenue generation and long-term business success.
  3. Application of technology. Judging criteria score: Application of technology
    1 - Poor Application: Demo video is not included in the video presentation. Demo link is not available. GitHub is not available.
    2 - Limited Application: Demo video may be shown in the presentation but without the full features of the idea. Framework is not clearly shown. Demo link is not available or not working. GitHub code is not available or not shown.
    3 - Adequate Application: Demo video is shown with all features tried. Demo link is available and working as expected, but there may be minor areas for improvement or optimization. GitHub code is not available or only partly available.
    4 - Strong Application: Demo video is shown with all features tried. Demo link is working & well-executed, with features and functionalities working smoothly. GitHub code is available & well thought out.
    5 - Excellent Application: Demonstrates exceptional application of AI technology through demo link, video & GitHub code. The technical implementation is flawless, and the project uses technology in a way that surpasses expectations.
  4. Originality. Judging criteria score: Originality
    1 - Not Original: App is an exact copy of existing solutions without any novel approaches.
    2 - Limited Originality: Idea is common or lacks differentiation from existing solutions in the field.
    3 - Moderately Original: Exhibits some unique idea or approach that distinguishes it from existing non-AI solutions, such as decreasing cost or time.
    4 - Highly Original: Unique perspective or innovative idea employing unconventional methods to solve the problem. Novel elements, creative combinations, or unexpected applications.
    5 - Exceptionally Original: Stands out as exceptionally original, offering a transformative idea or approach. Completely new perspective. Has potential to disrupt the industry or solve the problem in a completely unprecedented way.

Slides:

Using Falcon to understand Protein LLM papers

bioAI team


Steven Ness

Team leader

Founder at doi.bio

"Understand biology with computers"

Ph.D. in Computer Science (ML + UX) co-op @ Google Research

Worked in structural biochemistry


Protein LLMs

Applying techniques of LLMs to protein sequences

ESM3 - Evolutionary Scale Modelling - EvolutionaryScale (team formerly at Meta AI)

EvolutionaryScale.ai $168M in funding

AlphaFold 3

CASP16 underway


ESM3

Evolutionary Scale Modelling

Team formerly at Meta AI (FAIR)

Protein Large Language Model

Applies techniques from LLMs to DNA and protein sequences


slide showing ESM architecture

Describe the ESM architecture


ESM3 Paper

68 pages including appendices

Technically complex both in AI and structural biology

Use AI to design a series of Green Fluorescent Proteins (GFPs)

Synthesize the DNA, express it in cells, and test for fluorescence


Falcon

Extract facts from this paper

Explain this to both expert and non-expert audiences
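A minimal sketch of how these two tasks could be phrased as prompts for the Falcon model. The template wording, the `build_prompts` helper, and the audience descriptions are hypothetical illustrations; the prompts actually used in the experiments are not given in these notes.

```python
# Hypothetical prompt templates for the two tasks above (fact extraction and
# explanation for two audiences). The wording is illustrative only.
FACTS_TEMPLATE = (
    "Extract the key factual claims from the following section of a "
    "scientific paper as a bulleted list.\n\n{section}"
)
EXPLAIN_TEMPLATE = (
    "Explain the following section of a scientific paper for {audience}. "
    "Keep it concise and accurate.\n\n{section}"
)

# Hypothetical audience descriptions for the expert / non-expert explanations.
AUDIENCES = {
    "expert": "an expert in structural biology and machine learning",
    "non_expert": "a general audience with no background in biology",
}


def build_prompts(section: str) -> dict[str, str]:
    """Return one fact-extraction prompt plus one explanation prompt per audience."""
    prompts = {"facts": FACTS_TEMPLATE.format(section=section)}
    for key, description in AUDIENCES.items():
        prompts[f"explain_{key}"] = EXPLAIN_TEMPLATE.format(
            audience=description, section=section
        )
    return prompts


if __name__ == "__main__":
    sample = "ESM3 is a generative language model for proteins."
    for name, prompt in build_prompts(sample).items():
        print(f"--- {name} ---\n{prompt}\n")
```

Keeping the templates as plain strings makes it easy to vary one prompt at a time and compare the resulting outputs side by side.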


Input

Input was a markdown file generated by Mathpix from the original PDF

Wrote Python scripts to process this file
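A long paper usually has to be fed to a model in pieces. The sketch below splits a Markdown export into (heading, body) sections. It assumes ATX-style headings (lines starting with `#`); if the Mathpix export uses LaTeX-style headings such as `\section*{...}`, the pattern would need adjusting. The function names are hypothetical, not the scripts used in the project.

```python
import re
from pathlib import Path

# Assumption: headings are ATX-style Markdown lines such as "## Results".
# LaTeX-style headings (e.g. "\section*{Results}") are NOT matched.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def _append_section(sections: list[tuple[str, str]], title: str, body: list[str]) -> None:
    """Add (title, text) to sections unless the body is empty."""
    text = "\n".join(body).strip()
    if text:
        sections.append((title, text))


def split_sections(markdown: str) -> list[tuple[str, str]]:
    """Split Markdown text into (heading, body) pairs.

    Text before the first heading is kept under the label "Preamble".
    """
    sections: list[tuple[str, str]] = []
    title, body = "Preamble", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            _append_section(sections, title, body)
            title, body = match.group(2), []
        else:
            body.append(line)
    _append_section(sections, title, body)
    return sections


def load_sections(path: str) -> list[tuple[str, str]]:
    """Read a Markdown file (e.g. a Mathpix export) and split it into sections."""
    return split_sections(Path(path).read_text(encoding="utf-8"))
```

Note that a `#` comment line inside a fenced code block in the paper would also be treated as a heading; a fuller version would track fences.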


Output

We ran 24 different experiments

Results on GitHub

A series of Markdown files made publicly available

A YouTube video explaining this paper


Please make a 5-minute presentation from these slides. Please take the slides I have now and use the same format, the Obsidian Advanced Slides Markdown format. Add more slides that address the judging criteria. Please have just a few short points per slide.

[AI]: Here's a 5-minute presentation using the Obsidian Advanced Slides Markdown format based on your current slides. Additional slides have been included to address the judging criteria, focusing on presenting your project clearly and effectively.

# Using Falcon to Understand Protein LLM Papers
## bioAI team

---

# Team Overview

## Steven Ness
- Team leader
- Founder at doi.bio
- "Understand biology with computers"
- Ph.D. in Computer Science (ML + UX)
- Co-op @ Google Research
- Expertise in structural biochemistry

---

# Problem Statement

## Challenge in Protein Research
- Complexity of protein structures and functions is overwhelming.
- Difficulty in accessing and interpreting LLM research for practical applications.
- A need for tools to simplify complex data for both experts and non-experts.

---

# Solution Overview

## Introducing Falcon
- An AI-driven tool to extract and simplify information from complex protein LLM papers.
- Transforms dense technical content into clear, accessible language.
- Bridges the gap between advanced research and practical understanding.

---

# Technology Utilized

## ESM3: Evolutionary Scale Modelling
- Developed by a team formerly at Meta AI (FAIR).
- Incorporates techniques from LLMs applied to DNA and protein sequences.
- Enables next-gen research and applications in protein design.

---

# ESM Architecture

## Insights into ESM Technology
- Exploits evolutionary information to predict protein properties.
- Leverages large datasets for training.
- Enhances the accuracy of biological predictions.

---

# ESM3 Paper Analysis

## Key Findings
- 68-page complex document combining AI and structural biology.
- Focus on designing Green Fluorescent Proteins (GFPs).
- Methodical approach to DNA synthesis and fluorescence testing.

---

# Market Research

## Identify Opportunities
- Rapid growth and interest in biotechnology and AI intersections.
- Increasing demand for simplified, accessible research tools.
- Large addressable market potential in academic and industrial applications.

---

# User Interaction Showcase

## Demonstrating Usability
- Screen recording of Falcon's input-output process.
- Showcasing user-friendly design and interaction.
- Highlighting real-time feedback from users.

---

# Revenue Model

## Potential Revenue Streams
- Subscription model for individual and institutional access.
- Premium features for in-depth analysis and customization.
- Partnership opportunities with academic institutions and biotech firms.

---

# Competitor Analysis

## Strengths & Weaknesses
- **Competitors**: Existing platforms lack user-friendly interfaces.
- **Strengths of Falcon**: Unique ability to simplify complex papers, specialized in LLMs.

---

# Future Prospects

## Scalability & Impact
- Potential for expansion into other scientific domains beyond proteins.
- Opportunities for collaboration with educational platforms and research institutions.
- Aspiration to become the go-to platform for breaking down scientific papers.

---

# Conclusion

## Final Thoughts
- Falcon addresses a significant gap in understanding complex LLM research.
- Strong business and market potential.
- Positioned to redefine how researchers and enthusiasts interact with biological research.

Notes:

  1. Each slide emphasizes brief key points for clarity and succinctness.
  2. The presentation includes critical aspects required by the judging criteria, such as the problem statement, solution overview, market research, revenue model, competitor analysis, and future prospects.
  3. A user interaction demonstration is included to showcase the functionality of Falcon clearly.
  4. The presentation is structured to maintain engagement while providing comprehensive information relevant to the hackathon's judging criteria.
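The submission tips ask for succinct slides. A small check like the one below can flag slides with too many points. It is a sketch that assumes slides are separated by lines containing only `---` (as in the deck above) and that points are `- ` bullets; the function names and the default limit of 3 are assumptions.

```python
def split_slides(deck: str) -> list[str]:
    """Split a deck on lines containing only '---' and drop empty slides."""
    slides: list[str] = []
    current: list[str] = []
    for line in deck.splitlines():
        if line.strip() == "---":
            slides.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    slides.append("\n".join(current).strip())
    return [slide for slide in slides if slide]


def bullet_counts(deck: str) -> list[tuple[str, int]]:
    """Return (title, number of '- ' bullet lines) for each slide.

    The title is the first line of the slide with leading '#' characters removed.
    """
    counts = []
    for slide in split_slides(deck):
        lines = slide.splitlines()
        title = lines[0].lstrip("#").strip()
        bullets = sum(1 for line in lines if line.lstrip().startswith("- "))
        counts.append((title, bullets))
    return counts


def flag_long_slides(deck: str, max_points: int = 3) -> list[str]:
    """Titles of slides with more bullet points than max_points."""
    return [title for title, n in bullet_counts(deck) if n > max_points]
```

Run on the deck above with the default limit, this would flag the six-bullet Team Overview slide.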

ME: Please write a script for a text to speech system to read for each slide.

[AI]: Here’s a script for a text-to-speech system to read for each slide in your presentation, designed to clearly convey the essential information while keeping it engaging.


Slide 1: Title Slide

“Welcome to our presentation on using Falcon to understand protein LLM papers, brought to you by the bioAI team.”


Slide 2: Team Overview

“I’m Steven Ness, the team leader. I’m the founder of doi.bio, where we strive to understand biology with computers. I hold a Ph.D. in computer science, specializing in machine learning and user experience. I’ve also worked with Google Research and have a background in structural biochemistry.”


Slide 3: Problem Statement

“Today, we face a significant challenge in protein research. The complexity of protein structures and functions can be overwhelming. It’s difficult to access and interpret the latest research on large language models, creating a need for tools that can simplify complex data for both experts and non-experts.”


Slide 4: Solution Overview

“Our solution is Falcon, an AI-driven tool designed to extract and simplify information from challenging protein LLM papers. Falcon transforms dense technical content into clear, accessible language, effectively bridging the gap between advanced research and practical understanding.”


Slide 5: Technology Utilized

“The paper we analyze with Falcon describes ESM3, or Evolutionary Scale Modelling, developed by a team formerly at Meta AI. It applies techniques derived from large language models to DNA and protein sequences, enabling innovative research and applications in protein design.”


Slide 6: ESM Architecture

“Here we see insights into the ESM technology. ESM exploits evolutionary information to predict protein properties, leveraging large datasets for training. This enhances the accuracy of biological predictions, making it a powerful tool in our research toolkit.”


Slide 7: ESM3 Paper Analysis

“The ESM3 paper is rich with findings, spanning sixty-eight pages. It combines artificial intelligence with structural biology, with a focus on designing Green Fluorescent Proteins. The paper outlines a methodical approach for synthesizing DNA and testing for fluorescence.”


Slide 8: Market Research

“Analyzing the market reveals significant opportunities. There is rapid growth in the demand for tools that intersect biotechnology and artificial intelligence. Falcon is poised to meet this demand, appealing to both academic and industrial applications.”


Slide 9: User Interaction Showcase

“Let’s now showcase user interaction with Falcon. Watch the screen recording demonstrating Falcon’s input-output process. Notice how user-friendly the design is, along with the real-time feedback that enhances user experience.”


Slide 10: Revenue Model

“Now, moving to potential revenue streams. We envision a subscription model targeting both individual and institutional users, along with premium features for deeper analysis. We also see partnership opportunities with educational institutions and biotech firms.”


Slide 11: Competitor Analysis

“In analyzing competitors, we find that existing platforms often neglect user-friendly interfaces. Falcon’s strength lies in its unique ability to simplify complex research papers, with a particular focus on protein language models.”


Slide 12: Future Prospects

“Looking ahead, Falcon has expansive scalability and potential impact. There are opportunities to branch out into other scientific domains beyond proteins. Collaboration with educational platforms and research institutions will significantly enhance our reach.”


Slide 13: Conclusion

“In conclusion, Falcon addresses a critical gap in understanding complex LLM research. With strong business and market potential, we are well-positioned to redefine how researchers and enthusiasts engage with biological literature. Thank you for your attention!”


This script provides a clear and engaging narration for each slide, ensuring that the information is accessible and easy to follow.
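The presentation criterion expects the video to run under 5 minutes. Before generating audio, the script length can be estimated from its word count. This sketch assumes an average rate of about 150 words per minute, a common rule of thumb for narration rather than a measured property of any TTS voice; the function names are hypothetical.

```python
import re

# Assumed average speaking rate; real TTS output speed depends on the voice and model.
WORDS_PER_MINUTE = 150


def estimate_seconds(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate narration time in seconds from a word count at a fixed rate."""
    words = re.findall(r"\S+", text)
    return len(words) * 60.0 / wpm


def check_total(narrations: list[str], limit_seconds: float = 300.0) -> tuple[float, bool]:
    """Return the estimated total duration and whether it fits within the limit."""
    total = sum(estimate_seconds(t) for t in narrations)
    return total, total <= limit_seconds
```

Passing the thirteen slide narrations to `check_total` gives a rough total; if it is close to 300 seconds, trimming a few sentences leaves room for the screen-recording demo.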

from openai import OpenAI

# Initialize the OpenAI client (reads the OPENAI_API_KEY environment variable)
client = OpenAI()

# Define the text for each slide
slides = {
    "Intro": "Welcome back to our channel! Today, we’re diving into an exciting topic in the world of bioinformatics and machine learning: how we can enhance the understanding of protein structures using the VQ-VAE architecture. So, let’s get started!",
    "What is VQ-VAE?": "First, what exactly is VQ-VAE? The Vector Quantized Variational Autoencoder, or VQ-VAE, is a powerful neural network architecture designed for generating discrete representations of data. In our case, it helps in learning accurate structural representations of proteins.",
}

# Iterate through each slide text and create speech output
for title, text in slides.items():
    print(f"Generating speech for: {title}")
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
    )

    # Specify the filename for the output MP3 ('?' is removed because it is
    # not allowed in file names on some systems)
    output_file = f"{title.replace(' ', '_').replace('?', '').lower()}.mp3"
    response.stream_to_file(output_file)
    print(f"Saved: {output_file}")

print("All audio files generated successfully.")

Please put that text into this Python program

[AI]: Here's the Python program modified to incorporate the slide text provided in the presentation script. This assumes you have the OpenAI API set up to use their text-to-speech capabilities.

from openai import OpenAI

# Initialize the OpenAI client
client = OpenAI()

# Define the text for each slide
slides = {
    "Title_Slide": "Welcome to our presentation on using Falcon to understand protein LLM papers, brought to you by the bioAI team.",
    "Team_Overview": "I’m Steven Ness, the team leader. I’m the founder of doi.bio, where we strive to understand biology with computers. I hold a Ph.D. in computer science, specializing in machine learning and user experience. I’ve also worked with Google Research and have a background in structural biochemistry.",
    "Problem_Statement": "Today, we face a significant challenge in protein research. The complexity of protein structures and functions can be overwhelming. It’s difficult to access and interpret the latest research on large language models, creating a need for tools that can simplify complex data for both experts and non-experts.",
    "Solution_Overview": "Our solution is Falcon, an AI-driven tool designed to extract and simplify information from challenging protein LLM papers. Falcon transforms dense technical content into clear, accessible language, effectively bridging the gap between advanced research and practical understanding.",
    "Technology_Utilized": "The paper we analyze with Falcon describes ESM3, or Evolutionary Scale Modelling, developed by a team formerly at Meta AI. It applies techniques derived from large language models to DNA and protein sequences, enabling innovative research and applications in protein design.",
    "ESM_Architecture": "Here we see insights into the ESM technology. ESM exploits evolutionary information to predict protein properties, leveraging large datasets for training. This enhances the accuracy of biological predictions, making it a powerful tool in our research toolkit.",
    "ESM3_Paper_Analysis": "The ESM3 paper is rich with findings, spanning sixty-eight pages. It combines artificial intelligence with structural biology, with a focus on designing Green Fluorescent Proteins. The paper outlines a methodical approach for synthesizing DNA and testing for fluorescence.",
    "Market_Research": "Analyzing the market reveals significant opportunities. There is rapid growth in the demand for tools that intersect biotechnology and artificial intelligence. Falcon is poised to meet this demand, appealing to both academic and industrial applications.",
    "User_Interaction_Showcase": "Let’s now showcase user interaction with Falcon. Watch the screen recording demonstrating Falcon’s input-output process. Notice how user-friendly the design is, along with the real-time feedback that enhances user experience.",
    "Revenue_Model": "Now, moving to potential revenue streams. We envision a subscription model targeting both individual and institutional users, along with premium features for deeper analysis. We also see partnership opportunities with educational institutions and biotech firms.",
    "Competitor_Analysis": "In analyzing competitors, we find that existing platforms often neglect user-friendly interfaces. Falcon’s strength lies in its unique ability to simplify complex research papers, with a particular focus on protein language models.",
    "Future_Prospects": "Looking ahead, Falcon has expansive scalability and potential impact. There are opportunities to branch out into other scientific domains beyond proteins. Collaboration with educational platforms and research institutions will significantly enhance our reach.",
    "Conclusion": "In conclusion, Falcon addresses a critical gap in understanding complex LLM research. With strong business and market potential, we are well-positioned to redefine how researchers and enthusiasts engage with biological literature. Thank you for your attention!"
}

# Iterate through each slide text and create speech output
for title, text in slides.items():
    print(f"Generating speech for: {title.replace('_', ' ')}")
    response = client.audio.speech.create(
        model="tts-1",
        voice="alloy",
        input=text,
    )

    # Specify the filename for the output MP3
    output_file = f"{title}.mp3"
    response.stream_to_file(output_file)
    print(f"Saved: {output_file}")

print("All audio files generated successfully.")

Explanation:

  1. Slide Texts: Each entry in the slides dictionary is keyed by its slide title, written with underscores instead of spaces so the key can be used directly as a file name. Underscores are converted back to spaces only in the progress message.

  2. Generating Speech: The program iterates over the slides, generates speech for each text, and saves the audio in MP3 format.

  3. Output Files: Each generated audio file is named according to the slide title.
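Instead of copying each narration into the `slides` dictionary by hand, the script text can be parsed. This sketch assumes the layout used in the narration script above: a `Slide N: Title` line followed by the narration as one paragraph wrapped in curly quotes. The `parse_script` name is hypothetical.

```python
import re

# Assumed layout: a "Slide N: Title" line, then the narration as a single
# paragraph wrapped in curly quotes.
HEADER = re.compile(r"^Slide\s+\d+:\s*(.+)$")


def parse_script(script: str) -> dict[str, str]:
    """Map slide titles (spaces replaced by underscores) to narration text."""
    slides: dict[str, str] = {}
    title = None
    for raw in script.splitlines():
        line = raw.strip()
        header = HEADER.match(line)
        if header:
            title = header.group(1).strip().replace(" ", "_")
        elif title and line:
            # The first non-empty line after a header is the narration;
            # strip the surrounding quotation marks.
            slides[title] = line.strip("“”\"")
            title = None
    return slides
```

The resulting keys follow the same convention as the hand-written dictionary (for example `ESM3_Paper_Analysis`), so the output can be passed straight to the text-to-speech loop.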

sness@sness.net