Building a Real-Time AI-Powered Web Application with Streamlit and LLaMA 3.2 API

Ravi Tiwari
Dec 9, 2024


[Image: LLaMA 3.2 + Streamlit]

Artificial intelligence has significantly changed how users interact with applications, enabling real-time, dynamic communication. This guide demonstrates how to combine Streamlit with the LLaMA 3.2 API to create a seamless experience for querying an advanced language model and receiving streaming responses.

Application Overview

This application connects:

  • Streamlit, a powerful framework for building interactive, web-based user interfaces.
  • The LLaMA 3.2 API, enabling natural language processing and streaming responses.
  • The requests library for HTTP interaction with the API.
  • The json module for parsing structured response data.

Together, these components create an intuitive platform where users can submit questions and view responses in real time.
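
Before you start, install the two Python dependencies with pip install streamlit requests (the json module ships with Python), and make sure a local Ollama instance is running with the model you intend to query pulled, e.g. ollama pull llama3.2.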

Key Features

  • User-Friendly Interface
    ◦ Streamlit enables rapid prototyping of web interfaces with minimal coding.
    ◦ The app includes a text input box for user questions and a dropdown menu for selecting the desired model version (e.g., LLaMA 3.2 or LLaMA 2).
  • Streaming API Integration
    ◦ The app communicates with the LLaMA 3.2 endpoint using the requests library.
    ◦ It processes streaming responses so users receive incremental feedback until the full response is complete.
  • Dynamic Feedback
    ◦ A loading spinner indicates progress while the application fetches data, improving the user experience.
  • Error Handling
    ◦ Graceful handling of API errors and connection issues keeps the app stable.

Implementation Details

Setting Up the API URL

The API endpoint is defined as:

API_URL = "http://localhost:11434/api/generate"

This is the default local endpoint exposed by Ollama, the runtime that serves the LLaMA 3.2 model and processes generation requests.
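
To confirm the endpoint is reachable before wiring up the UI, you can send a one-off, non-streaming request from a Python shell (a quick sketch; it assumes the llama3.2 model has already been pulled into Ollama, and "stream": False tells the server to return a single JSON object instead of a chunked stream):

import requests

# Minimal sanity check against the local Ollama generate endpoint.
payload = {"model": "llama3.2", "prompt": "Say hello in one word.", "stream": False}
r = requests.post("http://localhost:11434/api/generate", json=payload, timeout=60)
r.raise_for_status()  # fail early if the server returned an error status
print(r.json()["response"])  # the generated text lives under the "response" key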

Retrieving Responses

The function get_llama_response() is the backbone of the interaction. It sends a POST request with the user’s question and selected model, then processes the API's streaming responses:

def get_llama_response(question, model):
    try:
        response = requests.post(API_URL, json={"prompt": question, "model": model}, stream=True)
        if response.status_code == 200:
            full_response = ""
            for chunk in response.iter_lines():
                if chunk:
                    chunk_data = chunk.decode("utf-8")
                    chunk_json = json.loads(chunk_data)
                    full_response += chunk_json.get("response", "")
                    if chunk_json.get("done", False):
                        break
            return full_response
        else:
            return f"Error: {response.status_code} - {response.text}"
    except Exception as e:
        return f"Error: {e}"
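
As written, get_llama_response accumulates every chunk and only returns once the stream finishes, so the UI still shows the answer in one piece. To surface the text incrementally, one option is to expose the stream as a generator and hand it to st.write_stream (a sketch; stream_llama_response is a new name introduced here, and st.write_stream requires Streamlit 1.31 or newer):

def stream_llama_response(question, model):
    # Yield each text fragment as it arrives instead of accumulating
    # the whole answer before returning.
    with requests.post(API_URL, json={"prompt": question, "model": model}, stream=True) as response:
        response.raise_for_status()
        for chunk in response.iter_lines():
            if chunk:
                chunk_json = json.loads(chunk.decode("utf-8"))
                yield chunk_json.get("response", "")
                if chunk_json.get("done", False):
                    break

The Response Display section below shows how this generator slots into the button handler.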

Streamlit Application Workflow

Page Configuration

  • The app’s title is set using st.set_page_config().

User Input

  • A text box allows users to input their questions.
  • A dropdown menu facilitates model selection:

model = st.selectbox(
    "Select the model:",
    options=["llama3.2", "llama2"],
    key="selected_model"
)
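
Hard-coding the options works, but you could also populate the dropdown from whatever models the local Ollama server actually has installed (a sketch using Ollama's /api/tags endpoint; the fallback list is illustrative):

def get_installed_models():
    # Ask the Ollama server which models are available locally.
    try:
        r = requests.get("http://localhost:11434/api/tags", timeout=5)
        r.raise_for_status()
        return [m["name"] for m in r.json().get("models", [])]
    except requests.RequestException:
        # Fall back to a static list if the server is unreachable.
        return ["llama3.2", "llama2"]

model = st.selectbox("Select the model:", options=get_installed_models(), key="selected_model")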

Response Display

  • Once the user submits their query by clicking the button, the app displays a spinner and retrieves the LLaMA response:

if st.button("Ask the question"):
    if input_question:
        with st.spinner("Fetching response from the API..."):
            llama_response = get_llama_response(input_question, model)
        st.subheader("The response is:")
        st.write(llama_response)
    else:
        st.warning("Please enter a question before submitting.")
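
If you adopted the generator variant sketched earlier, the same block can render the answer as it streams in; st.write_stream consumes the generator, writes each fragment to the page as it arrives, and returns the concatenated text (again assuming Streamlit 1.31+):

if st.button("Ask the question"):
    if input_question:
        st.subheader("The response is:")
        # Render fragments incrementally; the full text is returned at the end.
        llama_response = st.write_stream(stream_llama_response(input_question, model))
    else:
        st.warning("Please enter a question before submitting.")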

Error Messages

  • Warnings and errors are shown clearly if the user provides invalid input or the API call fails.

Full Code

# Import necessary libraries
import streamlit as st  # Streamlit is used to create interactive web applications
import requests  # Requests is used to interact with APIs
import json  # For safely handling JSON data

# Set up the API URL
API_URL = "http://localhost:11434/api/generate"  # Local Ollama endpoint serving llama3.2

# Define a function to interact with the API
def get_llama_response(question, model):
    """
    Sends a POST request to the API with the user's question and model,
    processes streaming responses, and retrieves the final result.

    Parameters:
    - question (str): The input question from the user.
    - model (str): The selected model name.

    Returns:
    - str: The full concatenated response from the API.
    """
    try:
        # Make a POST request to the API with the user's input
        response = requests.post(API_URL, json={"prompt": question, "model": model}, stream=True)

        # Ensure the response is valid and in stream mode
        if response.status_code == 200:
            # Collect and concatenate chunks of the response
            full_response = ""
            for chunk in response.iter_lines():
                if chunk:
                    chunk_data = chunk.decode("utf-8")
                    chunk_json = json.loads(chunk_data)  # Safely parse each chunk as a dictionary
                    full_response += chunk_json.get("response", "")  # Append the response text
                    if chunk_json.get("done", False):  # Stop if "done" is true
                        break
            return full_response
        else:
            # Handle API errors
            return f"Error: {response.status_code} - {response.text}"
    except Exception as e:
        # Handle connection errors or other exceptions
        return f"Error: {e}"

# Configure the Streamlit application
st.set_page_config(page_title="LLaMA 3.2 Streaming API Integration")  # Set the title of the web app

# Add a header to the application
st.header("LLaMA 3.2 Streaming API Integration")  # Display a header at the top of the page

# Create a text input box for the user to type their question
input_question = st.text_input("Input your question:", key="input_question")

# Add a dropdown menu for model selection
model = st.selectbox(
    "Select the model:",
    options=["llama3.2", "llama2"],  # Add other available models here
    key="selected_model"
)

# Create a button labeled "Ask the question"
if st.button("Ask the question"):
    # Check if the input field is not empty
    if input_question:
        # Display a spinner while the API request is being processed
        with st.spinner("Fetching response from the API..."):
            # Call the function to get the LLaMA model's response
            llama_response = get_llama_response(input_question, model)

        # Display the response below the spinner
        st.subheader("The response is:")  # Add a subheader for clarity
        st.write(llama_response)  # Display the AI's response in the web app
    else:
        # Show a warning if the input field is empty
        st.warning("Please enter a question before submitting.")

Output

[Screenshot of the running application]

Advantages of This Approach

Scalability

  • Streamlit’s simplicity makes it easy to extend functionality, such as adding more models or fine-tuning the user interface.

Real-Time Feedback

  • The streaming API delivers the response in incremental chunks; paired with the st.write_stream variant shown above, the UI can render text as it arrives, reducing perceived latency.

Ease of Deployment

  • The application can be quickly deployed locally or hosted on cloud platforms like Streamlit Cloud.
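
In the local case, launching the app typically takes two commands: ollama pull llama3.2 to download the model (if it is not already present) and streamlit run app.py to start the interface, assuming the code above is saved as app.py.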

Conclusion

This application showcases the power of combining Streamlit and the LLaMA 3.2 API to build a dynamic, interactive web app for real-time question-answering. With minimal setup, developers can create robust, user-friendly AI-driven applications tailored to their specific needs.
