Groq: I Am Groq, I Am AI

Discover how Groq’s LPUs revolutionize AI hardware with superhero-like efficiency and power. Explore their impact on high-performance computing and the future of artificial intelligence.

Adnan Writes


What is Groq?

Founded in 2016 by Jonathan Ross, Groq is revolutionizing the AI hardware industry with its groundbreaking chips specifically designed for inference, the process of running generative AI models. These chips, known as “Language Processing Units” (LPUs), are not only faster but also significantly more cost-effective — offering performance at one-tenth the cost of conventional AI hardware. 🌐✨

What is an LPU?

Groq’s Language Processing Unit (LPU) represents a paradigm shift in processor architecture, designed to revolutionize high-performance computing (HPC) and artificial intelligence (AI) workloads. This article will delve into the components, architecture, and workings of the LPU, highlighting its potential to transform the landscape of HPC and AI.

How Groq’s LPU Works

The LPU’s unique architecture enables it to outperform traditional CPUs and GPUs in HPC and AI workloads. Here’s a step-by-step breakdown of how the LPU works:

  1. Data Input: Data is fed into the LPU, triggering the Centralized Control Unit to issue instructions to the Processing Elements (PEs).
  2. Massively Parallel Processing: The PEs, organized in SIMD arrays, execute the same instruction on different data points concurrently, resulting in massively parallel processing.
  3. High-Bandwidth Memory Hierarchy: The LPU’s memory hierarchy, including on-chip SRAM and off-chip memory, ensures high-bandwidth, low-latency data access.
  4. Centralized Control Unit: The Centralized Control Unit manages the flow of data and instructions, coordinating the execution of thousands of operations in a single clock cycle.
  5. Network-on-Chip (NoC): A high-bandwidth Network-on-Chip (NoC) interconnects the PEs, the CU, and the memory hierarchy, enabling fast, efficient communication between different components of the LPU.
  6. Processing Elements: The Processing Elements consist of Arithmetic Logic Units, Vector Units, and Scalar Units, executing operations on large data sets simultaneously.
  7. Data Output: The LPU outputs data based on the computations performed by the Processing Elements.
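To make the SIMD execution in steps 2 and 6 concrete, here is a minimal, purely illustrative Python sketch (it uses NumPy, which has nothing to do with Groq's hardware or SDK): the same instruction applied across a whole array at once, versus one element at a time.

import numpy as np

# Scalar-style execution: one instruction per data point, one at a time.
def scale_scalar(data, factor):
    out = []
    for x in data:  # each element is handled in its own step
        out.append(x * factor)
    return out

# SIMD-style execution: one instruction applied to the whole array at once,
# the pattern the LPU's Processing Elements exploit in hardware.
def scale_simd(data, factor):
    return data * factor  # a single vectorized multiply

data = np.arange(8, dtype=np.float32)
print(scale_scalar(data, 2.0))  # [0.0, 2.0, 4.0, ...]
print(scale_simd(data, 2.0))    # [ 0.  2.  4. ...]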

How the LPU Differs from a GPU

1. Architecture:

  • LPU: An LPU is designed specifically for natural language processing tasks, with a multi-stage pipeline that includes tokenization, parsing, semantic analysis, feature extraction, machine learning models, and inference/prediction.
  • GPU: A GPU has a more complex architecture, consisting of multiple streaming multiprocessors (SMs) or compute units, each containing multiple CUDA cores or stream processors.

2. Instruction Set:

  • LPU: The LPU’s instruction set is optimized for natural language processing tasks, with support for tokenization, parsing, semantic analysis, and feature extraction.
  • GPU: A GPU has a more general-purpose instruction set, designed for high-throughput, high-bandwidth data processing.

3. Memory Hierarchy:

  • LPU: The LPU’s memory hierarchy is optimized for natural language processing tasks, with a focus on efficient data access and processing.
  • GPU: A GPU has a more complex memory hierarchy, including registers, shared memory, L1/L2 caches, and off-chip memory. The memory hierarchy in GPUs is designed for high-throughput, high-bandwidth data access, but may have higher latency compared to the LPU for specific NLP tasks.

In summary, the LPU and GPU have different architectural designs and use cases. The LPU is designed specifically for natural language processing tasks, while GPUs are designed for high-throughput, high-bandwidth data processing, particularly for graphics rendering and parallel computations. The LPU offers a more streamlined, power-efficient architecture for natural language processing tasks, while GPUs provide a more complex, feature-rich architecture for a broader range of applications.

Groq Tools

Groq API endpoints support tool use, letting you define operations that the model can invoke programmatically. With tool use, Groq API model endpoints return structured JSON output that can be used to directly invoke functions from your codebase.
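For illustration, a tool call in a model response follows the OpenAI-compatible shape sketched below. This is a hand-written example, not captured API output; the ID and argument values are made up.

# Illustrative shape of a single tool call in a model response
# (OpenAI-compatible format used by the Groq API); values are made up.
tool_call = {
    "id": "call_abc123",  # made-up identifier
    "type": "function",
    "function": {
        "name": "get_game_score",
        "arguments": '{"team_name": "Golden State Warriors"}',  # JSON string, not a dict
    },
}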

Models

The following models powered by Groq all support tool use:

  • llama3-70b
  • llama3-8b
  • llama2-70b
  • mixtral-8x7b
  • gemma-7b-it

Parallel tool calling is enabled for both Llama3 models.

Use Cases

  • Convert natural language into API calls: Interpreting user queries in natural language, such as “What’s the weather in Palo Alto today?”, and translating them into specific API requests to fetch the requested information.
  • Call an external API: Automating the process of periodically gathering stock prices by calling an API, comparing these prices with predefined thresholds, and automatically sending alerts when those thresholds are met.
  • Resume parsing for recruitment: Analyzing resumes in natural language to extract structured data such as candidate name, skillsets, work history, and education, which can then be used to populate a database of candidates matching certain criteria.
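The end-to-end example below ties these pieces together: it defines a get_game_score function, advertises it to the model as a tool, executes the tool call the model requests, and feeds the result back for a final answer.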
from groq import Groq
import os
import json

# Initialize the Groq client with an API key from the environment and pick a model.
client = Groq(api_key=os.getenv('GROQ_API_KEY'))
MODEL = 'mixtral-8x7b-32768'

def get_game_score(team_name):
    """Get the current score for a given NBA game."""
    # Hard-coded example data: the Warriors/Lakers and Nuggets/Heat branches
    # each return the same game from both teams' perspectives.
    if "warriors" in team_name.lower() or "lakers" in team_name.lower():
        return json.dumps({"game_id": "401585601", "status": "Final",
                           "home_team": "Los Angeles Lakers", "home_team_score": 121,
                           "away_team": "Golden State Warriors", "away_team_score": 128})
    elif "nuggets" in team_name.lower() or "heat" in team_name.lower():
        return json.dumps({"game_id": "401585577", "status": "Final",
                           "home_team": "Miami Heat", "home_team_score": 88,
                           "away_team": "Denver Nuggets", "away_team_score": 100})
    else:
        return json.dumps({"team_name": team_name, "score": "unknown"})

def run_conversation(user_prompt):
    # Step 1: send the conversation and the tool definition to the model.
    messages = [
        {
            "role": "system",
            "content": "You are a function calling LLM that uses the data extracted from the get_game_score function to answer questions around NBA game scores. Include the team and their opponent in your response."
        },
        {
            "role": "user",
            "content": user_prompt,
        }
    ]
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_game_score",
                "description": "Get the score for a given NBA game",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "team_name": {
                            "type": "string",
                            "description": "The name of the NBA team (e.g. 'Golden State Warriors')",
                        }
                    },
                    "required": ["team_name"],
                },
            },
        }
    ]
    response = client.chat.completions.create(
        model=MODEL,
        messages=messages,
        tools=tools,
        tool_choice="auto",  # let the model decide whether to call the tool
        max_tokens=4096
    )

    # Step 2: check whether the model asked for a tool call.
    response_message = response.choices[0].message
    tool_calls = response_message.tool_calls
    if tool_calls:
        available_functions = {
            "get_game_score": get_game_score,
        }
        messages.append(response_message)  # keep the tool request in the history
        # Step 3: run each requested tool call with the model-supplied arguments.
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            function_to_call = available_functions[function_name]
            function_args = json.loads(tool_call.function.arguments)
            function_response = function_to_call(
                team_name=function_args.get("team_name")
            )
            # Step 4: return the tool's output to the model as a "tool" message.
            messages.append(
                {
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": function_response,
                }
            )
        # Step 5: ask the model for a final answer that uses the tool output.
        second_response = client.chat.completions.create(
            model=MODEL,
            messages=messages
        )
        return second_response.choices[0].message.content

user_prompt = "What was the score of the Warriors game?"
print(run_conversation(user_prompt))

Sequence of Steps

  1. Initialize the API client: Set up the Groq Python client with your API key and specify the model to be used for generating conversational responses.
  2. Define the function and conversation parameters: Create a user query and define a function (get_game_score) that can be called by the model, detailing its purpose, input parameters, and expected output format.
  3. Process the model’s request: Submit the initial conversation to the model, and if the model requests to call the defined function, extract the necessary parameters from the model’s request and execute the function to get the response.
  4. Incorporate function response into conversation: Append the function’s output to the conversation as a structured message and resubmit to the model, allowing it to generate a response that includes or reacts to the information provided by the function call.

Tools Specifications

  • tools: an array with each element representing a tool
  • type: a string indicating the category of the tool
  • function: an object that includes:
      • description: a string that describes the function’s purpose, guiding the model on when and how to use it
      • name: a string serving as the function’s identifier
      • parameters: an object that defines the parameters the function accepts
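Putting those fields together, a minimal tools array might look like the following sketch (the get_weather function and its city parameter are hypothetical, shown only to illustrate the schema):

tools = [
    {
        "type": "function",  # category of the tool
        "function": {
            "name": "get_weather",  # hypothetical identifier
            "description": "Get the current weather for a given city",
            "parameters": {  # JSON Schema describing the accepted arguments
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "The city name"}
                },
                "required": ["city"],
            },
        },
    }
]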

Tool Choice

tool_choice: A parameter that dictates whether the model can invoke functions.

  • auto: The default setting where the model decides between sending a text response or calling a function
  • none: Equivalent to not providing any tool specification; the model won’t call any functions
  • Specifying a function: To mandate a specific function call, use {"type": "function", "function": {"name": "get_financial_data"}}; the model is then constrained to call the named function.
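As a quick sketch, the three settings look like this in a request (get_financial_data is the example function named above):

# Default: the model decides between a text response and a function call.
tool_choice = "auto"

# Never call a function; respond with text only.
tool_choice = "none"

# Force a call to one specific function.
tool_choice = {"type": "function", "function": {"name": "get_financial_data"}}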

Known Limitations

  • Parallel tool use is disabled when using the Mixtral model because of model limitations; in that case the endpoint will return at most a single tool call at a time. (As noted above, parallel tool calling is available with the Llama3 models.)

Conclusion

Groq’s LPUs are setting the stage for a new era in AI hardware, offering unparalleled performance and efficiency tailored for natural language processing and high-performance computing. Whether you’re in tech, finance, or any field leveraging AI, Groq’s innovations promise to propel your capabilities into the future. 🚀

For more details, visit Groq API Documentation.

Subscribe to the newsletter for the latest updates.

THANK YOU
