Building a Real-Time AI-Powered Web Application with Streamlit and LLaMA 3.2 API

Ravi Tiwari
Dec 9, 2024


[Image: LLaMA 3.2 + Streamlit]

Artificial intelligence has significantly changed how users interact with applications, enabling real-time, dynamic communication. This guide demonstrates how to combine Streamlit with the LLaMA 3.2 API to create a seamless experience for querying an advanced language model and receiving streaming responses.

Application Overview

This application connects:

  • Streamlit, a powerful framework for building interactive, web-based user interfaces.
  • The LLaMA 3.2 API, enabling natural language processing and streaming responses.
  • The requests library for HTTP interaction with the API.
  • The json module for parsing structured response data.

Together, these components create an intuitive platform where users can submit questions and view responses in real time.
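
Before you start, install the two Python dependencies with pip install streamlit requests (the json module ships with Python), and make sure a local Ollama instance is running with the model you intend to query pulled, e.g. ollama pull llama3.2.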

Key Features

  • User-Friendly Interface
    ◦ Streamlit enables rapid prototyping of web interfaces with minimal coding.
    ◦ The app includes a text input box for user questions and a dropdown menu for selecting the desired model version (e.g., LLaMA 3.2 or LLaMA 2).
  • Streaming API Integration
    ◦ The app communicates with the LLaMA 3.2 endpoint using the requests library.
    ◦ It processes streaming responses so users receive incremental feedback until the full response is complete.
  • Dynamic Feedback
    ◦ A loading spinner indicates progress while the application fetches data, improving the user experience.
  • Error Handling
    ◦ Graceful handling of API errors and connection issues keeps the app stable.

Implementation Details

Setting Up the API URL

The API endpoint is defined as:

API_URL = "http://localhost:11434/api/generate"

This is the default local endpoint exposed by Ollama, the runtime that serves the LLaMA 3.2 model and processes generation requests.
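
To confirm the endpoint is reachable before wiring up the UI, you can send a one-off, non-streaming request from a Python shell (a quick sketch; it assumes the llama3.2 model has already been pulled into Ollama, and "stream": False tells the server to return a single JSON object instead of a chunked stream):

import requests

# Minimal sanity check against the local Ollama generate endpoint.
payload = {"model": "llama3.2", "prompt": "Say hello in one word.", "stream": False}
r = requests.post("http://localhost:11434/api/generate", json=payload, timeout=60)
r.raise_for_status()  # fail early if the server returned an error status
print(r.json()["response"])  # the generated text lives under the "response" key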

Retrieving Responses

The function get_llama_response() is the backbone of the interaction. It sends a POST request with the user’s question and selected model, then processes the API's streaming responses:

def get_llama_response(question, model):
    try:
        response = requests.post(API_URL, json={"prompt": question, "model": model}, stream=True)
        if response.status_code == 200:
            full_response = ""
            for chunk in response.iter_lines():
                if chunk:
                    chunk_data = chunk.decode("utf-8")
                    chunk_json = json.loads(chunk_data)
                    full_response += chunk_json.get("response", "")
                    if chunk_json.get("done", False):
                        break
            return full_response
        else:
            return f"Error: {response.status_code} - {response.text}"
    except Exception as e:
        return f"Error: {e}"
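
As written, get_llama_response accumulates every chunk and only returns once the stream finishes, so the UI still shows the answer in one piece. To surface the text incrementally, one option is to expose the stream as a generator and hand it to st.write_stream (a sketch; stream_llama_response is a new name introduced here, and st.write_stream requires Streamlit 1.31 or newer):

def stream_llama_response(question, model):
    # Yield each text fragment as it arrives instead of accumulating
    # the whole answer before returning.
    with requests.post(API_URL, json={"prompt": question, "model": model}, stream=True) as response:
        response.raise_for_status()
        for chunk in response.iter_lines():
            if chunk:
                chunk_json = json.loads(chunk.decode("utf-8"))
                yield chunk_json.get("response", "")
                if chunk_json.get("done", False):
                    break

The Response Display section below shows how this generator slots into the button handler.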

Streamlit Application Workflow

Page Configuration

  • The app’s title is set using st.set_page_config().

User Input

  • A text box allows users to input their questions.
  • A dropdown menu facilitates model selection:

model = st.selectbox(
    "Select the model:",
    options=["llama3.2", "llama2"],
    key="selected_model"
)
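
Hard-coding the options works, but you could also populate the dropdown from whatever models the local Ollama server actually has installed (a sketch using Ollama's /api/tags endpoint; the fallback list is illustrative):

def get_installed_models():
    # Ask the Ollama server which models are available locally.
    try:
        r = requests.get("http://localhost:11434/api/tags", timeout=5)
        r.raise_for_status()
        return [m["name"] for m in r.json().get("models", [])]
    except requests.RequestException:
        # Fall back to a static list if the server is unreachable.
        return ["llama3.2", "llama2"]

model = st.selectbox("Select the model:", options=get_installed_models(), key="selected_model")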

Response Display

  • Once the user submits their query by clicking the button, the app displays a spinner and retrieves the LLaMA response:

if st.button("Ask the question"):
    if input_question:
        with st.spinner("Fetching response from the API..."):
            llama_response = get_llama_response(input_question, model)
        st.subheader("The response is:")
        st.write(llama_response)
    else:
        st.warning("Please enter a question before submitting.")
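
If you adopted the generator variant sketched earlier, the same block can render the answer as it streams in; st.write_stream consumes the generator, writes each fragment to the page as it arrives, and returns the concatenated text (again assuming Streamlit 1.31+):

if st.button("Ask the question"):
    if input_question:
        st.subheader("The response is:")
        # Render fragments incrementally; the full text is returned at the end.
        llama_response = st.write_stream(stream_llama_response(input_question, model))
    else:
        st.warning("Please enter a question before submitting.")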

Error Messages

  • Warnings and errors are shown clearly if the user provides invalid input or the API call fails.

Full Code

# Import necessary libraries
import streamlit as st  # Streamlit is used to create interactive web applications
import requests  # Requests is used to interact with APIs
import json  # For safely handling JSON data

# Set up the API URL
API_URL = "http://localhost:11434/api/generate"  # Local Ollama endpoint serving llama3.2

# Define a function to interact with the API
def get_llama_response(question, model):
    """
    Sends a POST request to the API with the user's question and model,
    processes streaming responses, and retrieves the final result.

    Parameters:
    - question (str): The input question from the user.
    - model (str): The selected model name.

    Returns:
    - str: The full concatenated response from the API.
    """
    try:
        # Make a POST request to the API with the user's input
        response = requests.post(API_URL, json={"prompt": question, "model": model}, stream=True)

        # Ensure the response is valid and in stream mode
        if response.status_code == 200:
            # Collect and concatenate chunks of the response
            full_response = ""
            for chunk in response.iter_lines():
                if chunk:
                    chunk_data = chunk.decode("utf-8")
                    chunk_json = json.loads(chunk_data)  # Safely parse each chunk as a dictionary
                    full_response += chunk_json.get("response", "")  # Append the response text
                    if chunk_json.get("done", False):  # Stop if "done" is true
                        break
            return full_response
        else:
            # Handle API errors
            return f"Error: {response.status_code} - {response.text}"
    except Exception as e:
        # Handle connection errors or other exceptions
        return f"Error: {e}"

# Configure the Streamlit application
st.set_page_config(page_title="LLaMA 3.2 Streaming API Integration")  # Set the title of the web app

# Add a header to the application
st.header("LLaMA 3.2 Streaming API Integration")  # Display a header at the top of the page

# Create a text input box for the user to type their question
input_question = st.text_input("Input your question:", key="input_question")

# Add a dropdown menu for model selection
model = st.selectbox(
    "Select the model:",
    options=["llama3.2", "llama2"],  # Add other available models here
    key="selected_model"
)

# Create a button labeled "Ask the question"
if st.button("Ask the question"):
    # Check if the input field is not empty
    if input_question:
        # Display a spinner while the API request is being processed
        with st.spinner("Fetching response from the API..."):
            # Call the function to get the LLaMA model's response
            llama_response = get_llama_response(input_question, model)

        # Display the response below the spinner
        st.subheader("The response is:")  # Add a subheader for clarity
        st.write(llama_response)  # Display the AI's response in the web app
    else:
        # Show a warning if the input field is empty
        st.warning("Please enter a question before submitting.")

Output

[Screenshot of the running application]

Advantages of This Approach

Scalability

  • Streamlit’s simplicity makes it easy to extend functionality, such as adding more models or fine-tuning the user interface.

Real-Time Feedback

  • The streaming API delivers the response in incremental chunks; paired with the st.write_stream variant shown above, the UI can render text as it arrives, reducing perceived latency.

Ease of Deployment

  • The application can be quickly deployed locally or hosted on cloud platforms like Streamlit Cloud.
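
In the local case, launching the app typically takes two commands: ollama pull llama3.2 to download the model (if it is not already present) and streamlit run app.py to start the interface, assuming the code above is saved as app.py.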

Conclusion

This application showcases the power of combining Streamlit and the LLaMA 3.2 API to build a dynamic, interactive web app for real-time question-answering. With minimal setup, developers can create robust, user-friendly AI-driven applications tailored to their specific needs.
