The on-demand price is $2.80 per million tokens.
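
For rough budgeting, cost scales linearly with token count. A minimal sketch, assuming the single on-demand rate above applies to the total token count (input plus output); the helper name is hypothetical:

# Hypothetical helper: estimate on-demand cost from a token count.
# Assumes the $2.80 / million token rate covers input and output tokens alike.
PRICE_PER_MILLION_TOKENS = 2.80  # USD, on-demand rate

def estimate_cost_usd(total_tokens: int) -> float:
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(estimate_cost_usd(250_000))  # 250k tokens -> 0.70 (USD)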

You can use it with the following code.

import os

import openai

# Configure the OpenAI-compatible client to use the Garuda endpoint,
# reading the API token from the environment.
client = openai.OpenAI(
    base_url="https://llama3-1-405b.Garuda.run/api/v1/",
    api_key=os.environ.get("GARUDA_API_TOKEN"),
)

# Request a streamed chat completion from the Llama 3.1 405B model.
completion = client.chat.completions.create(
    model="llama3-1-405b",
    messages=[
        {"role": "user", "content": "say hello"},
    ],
    max_tokens=128,
    stream=True,
)

# Print the response tokens as they arrive.
for chunk in completion:
    if not chunk.choices:
        continue
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="")

The rate limit for the Model APIs is 10 requests per minute across all models on the Basic Plan. If you need a higher rate limit with an SLA, please upgrade to the Standard Plan or use a dedicated deployment.
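
If a request exceeds this limit, one simple option is to back off and retry. Below is a minimal sketch, assuming the endpoint reports the limit the way the OpenAI Python client does, by raising openai.RateLimitError (HTTP 429); the helper name and backoff schedule are illustrative, not part of the API.

import time

import openai

def create_with_retry(client, max_retries=5, **kwargs):
    """Call chat.completions.create, backing off when rate limited.

    Assumes rate-limit responses surface as openai.RateLimitError.
    """
    delay = 6  # seconds; 10 requests/minute is roughly one every 6 seconds
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
            delay *= 2  # exponential backoff between retries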
