Using Chinese LLMs with LangChain: A Complete Guide

A practical tutorial on integrating TunanAPI with LangChain for AI-powered applications

Introduction

LangChain is a powerful framework for building applications with LLMs. When combined with Chinese AI models like DeepSeek V4, Qwen 3.7, and GLM-4 via TunanAPI, you get high-quality multilingual AI at a fraction of the cost of Western models.

In this guide, we will walk through:

Setting up TunanAPI with LangChain
Building a simple RAG application
Using DeepSeek V4 Pro for complex reasoning
Handling multilingual conversations

Prerequisites

Python 3.9+
TunanAPI account (free to start: https://tunanapi.com)
OpenAI-compatible API key from TunanAPI

Installation

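pip install "langchain<1.0" langchain-openai langchain-community langchain-chroma langchain-text-splitters python-dotenv

The langchain-community, langchain-chroma, and langchain-text-splitters packages are needed for the RAG example in Step 2. The legacy chain classes used in this guide (RetrievalQA, ConversationChain) were moved out of the main langchain package in version 1.0, hence the version pin.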

Step 1: Configure TunanAPI with LangChain

LangChain's ChatOpenAI class works with TunanAPI's OpenAI-compatible endpoint; just change the base_url:

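import os

from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# Load TUNAN_API_KEY from a local .env file, if one exists
load_dotenv()

# Get your API key from tunanapi.com
TUNAN_API_KEY = os.getenv("TUNAN_API_KEY")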

# Initialize LangChain with TunanAPI
llm = ChatOpenAI(
    base_url="https://api.tunanapi.com/v1",
    api_key=TUNAN_API_KEY,
    model="deepseek-reasoner",  # Complex reasoning
    temperature=0.7
)

# Test the connection
response = llm.invoke("Explain quantum computing in simple terms.")
print(response.content)

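That's it! No other changes are needed: ChatOpenAI now sends every request to TunanAPI, and the standard LangChain features used in the rest of this guide (chains, memory, streaming) work on top of it.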
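A missing key only shows up as an authentication error on the first request, so it can help to validate the configuration up front. The following is a minimal sketch, not part of TunanAPI or LangChain: the helper name tunan_settings is hypothetical, while the base URL and the TUNAN_API_KEY variable are the ones used above.

```python
import os

TUNAN_BASE_URL = "https://api.tunanapi.com/v1"  # same endpoint as in Step 1


def tunan_settings(model, temperature=0.7, env=None):
    """Return keyword arguments for ChatOpenAI, failing early if the key is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    (Hypothetical helper, not a TunanAPI or LangChain function.)
    """
    env = os.environ if env is None else env
    api_key = env.get("TUNAN_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("TUNAN_API_KEY is not set; export it or add it to .env")
    return {
        "base_url": TUNAN_BASE_URL,
        "api_key": api_key,
        "model": model,
        "temperature": temperature,
    }


# Usage (requires langchain-openai and a real key):
#   llm = ChatOpenAI(**tunan_settings("deepseek-reasoner"))
```

Keeping the endpoint and key lookup in one place also avoids repeating base_url and api_key in every ChatOpenAI call in the later steps.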

Step 2: Build a RAG Application with Qwen 3.7-Max

Let's build a Retrieval-Augmented Generation (RAG) system using Qwen 3.7-Max (great for Chinese + English context):

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA

# Load documents
loader = TextLoader("your_documents.txt")
docs = loader.load()

# Split documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)

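# Create embeddings and vector store
embeddings = OpenAIEmbeddings(
    base_url="https://api.tunanapi.com/v1",
    api_key=TUNAN_API_KEY,
    model="text-embedding-v3",
    # Send raw text instead of tiktoken token IDs; some OpenAI-compatible
    # endpoints reject pre-tokenized input (whether TunanAPI does is an assumption)
    check_embedding_ctx_length=False
)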

vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)

# Create RAG chain
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        base_url="https://api.tunanapi.com/v1",
        api_key=TUNAN_API_KEY,
        model="qwen3.7-max",
        temperature=0.3
    ),
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

# Query your documents
query = "What are the key features of our product?"
result = qa_chain.invoke({"query": query})
print(result["result"])
print(f"Sources: {len(result['source_documents'])}")
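To make the chunk_size, chunk_overlap, and k settings above more concrete, here is a simplified, dependency-free sketch of the two ideas behind them. It is not how RecursiveCharacterTextSplitter or Chroma are implemented: the splitter here is a plain fixed-size sliding window, and cosine similarity is used only for illustration (the metric a vector store actually uses depends on its configuration). The function names are hypothetical.

```python
import math


def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Fixed-size sliding window: each chunk repeats the last
    `chunk_overlap` characters of the previous chunk."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k document vectors most similar to the query."""
    ranked = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return ranked[:k]


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
print(top_k([1, 0], [[0, 1], [1, 0], [1, 1], [-1, 0]], k=2))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, and k controls how many of those chunks are stuffed into the prompt.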

Step 3: Handle Multilingual Conversations

Chinese models excel at multilingual tasks. Here's how to build a bilingual chatbot:

from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Initialize with GLM-4-Plus (strong in Chinese + English)
llm = ChatOpenAI(
    base_url="https://api.tunanapi.com/v1",
    api_key=TUNAN_API_KEY,
    model="glm-4-plus",
    temperature=0.7
)

memory = ConversationBufferMemory()

conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)

# Chinese query
response = conversation.predict(input="解释什么是人工智能？")
print(f"Bot: {response}")

# English follow-up
response = conversation.predict(input="And what about machine learning?")
print(f"Bot: {response}")
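The English follow-up only makes sense because the memory object replays the earlier Chinese exchange to the model on every call. The sketch below imitates that idea in plain Python; it is an illustration of the buffering approach, not LangChain's actual ConversationBufferMemory code, and the class name and the sample AI reply are placeholders.

```python
class BufferMemorySketch:
    """Keeps every turn and renders the full transcript into the next prompt."""

    def __init__(self):
        self.turns = []  # list of (speaker, text) tuples, oldest first

    def add_turn(self, human, ai):
        self.turns.append(("Human", human))
        self.turns.append(("AI", ai))

    def render_prompt(self, new_input):
        history = "\n".join(f"{speaker}: {text}" for speaker, text in self.turns)
        prefix = history + "\n" if history else ""
        return f"{prefix}Human: {new_input}\nAI:"


memory_sketch = BufferMemorySketch()
# Placeholder reply; a real reply would come from the model
memory_sketch.add_turn("解释什么是人工智能？", "人工智能是让计算机模拟人类智能的技术。")
prompt = memory_sketch.render_prompt("And what about machine learning?")
print(prompt)
```

Because the whole transcript is resent each time, token usage grows with every turn, which is worth keeping in mind for long conversations.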

Step 4: Use DeepSeek V4 Pro for Complex Reasoning

For tasks requiring advanced reasoning, DeepSeek V4 Pro excels:

reasoning_llm = ChatOpenAI(
    base_url="https://api.tunanapi.com/v1",
    api_key=TUNAN_API_KEY,
    model="deepseek-reasoner",
    temperature=0.1  # Lower temperature for more deterministic reasoning
)

# Complex reasoning task
query = """
You are a software architect. Design a microservices architecture for a
real-time chat application with 1M concurrent users. Consider scalability,
message delivery guarantees, and cost optimization.

Provide:
1. Service breakdown
2. Technology stack recommendations
3. Database schema design
4. Message queue configuration
"""

response = reasoning_llm.invoke(query)
print(response.content)
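The prompt above mixes the role ("You are a software architect") with the task itself. One possible refinement is to send the role as a system message and the task as a human message, since LangChain chat models accept a list of (role, content) tuples. The helper below is a hypothetical sketch of that split; it only builds the message list and makes no API call.

```python
def build_messages(role, task, deliverables):
    """Return (role, content) tuples: a system message for the persona
    and a human message with the task plus a numbered deliverables list."""
    numbered = "\n".join(
        f"{i}. {item}" for i, item in enumerate(deliverables, start=1)
    )
    human = f"{task.strip()}\n\nProvide:\n{numbered}"
    return [("system", role.strip()), ("human", human)]


messages = build_messages(
    role="You are a software architect.",
    task=(
        "Design a microservices architecture for a real-time chat application "
        "with 1M concurrent users. Consider scalability, message delivery "
        "guarantees, and cost optimization."
    ),
    deliverables=[
        "Service breakdown",
        "Technology stack recommendations",
        "Database schema design",
        "Message queue configuration",
    ],
)

# With the model from this step (requires a real key):
#   response = reasoning_llm.invoke(messages)
```

Keeping the deliverables as a list makes it easy to reuse the same role and task while asking for a different set of outputs.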

Step 5: Streaming Responses

LangChain supports streaming with TunanAPI:

from langchain_core.callbacks import StreamingStdOutCallbackHandler

streaming_llm = ChatOpenAI(
    base_url="https://api.tunanapi.com/v1",
    api_key=TUNAN_API_KEY,
    model="deepseek-chat",
    temperature=0.7,
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()]
)

streaming_llm.invoke("Tell me a short story about AI")
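The callback handler prints tokens as they arrive but does not hand you the text. If you also need the full reply, an alternative is to iterate over the chunks yourself with the model's stream() method. The sketch below shows the consuming side only; consume_stream is a hypothetical helper, and the fake chunks stand in for the message chunks a real model would yield so that the example runs offline.

```python
import sys
from types import SimpleNamespace


def consume_stream(chunks, out=sys.stdout):
    """Write each chunk's text as soon as it arrives and return the full reply."""
    parts = []
    for chunk in chunks:
        text = chunk.content
        if text:  # skip empty chunks, which streams may include
            out.write(text)
            out.flush()
            parts.append(text)
    out.write("\n")
    return "".join(parts)


# Stand-in chunks so the sketch runs without network access
fake_chunks = [SimpleNamespace(content=t) for t in ("Once ", "upon ", "", "a time.")]
story = consume_stream(fake_chunks)

# With a real model created WITHOUT the stdout callback (to avoid double printing):
#   story = consume_stream(llm.stream("Tell me a short story about AI"))
```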

Cost Comparison

Provider	Model	Input (USD per 1M tokens)	Output (USD per 1M tokens)	Cost for 10M tokens (5M input + 5M output)
OpenAI	GPT-4o	$2.50	$10.00	$62.50
Anthropic	Claude Fable 5	$18.00	$90.00	$540.00
TunanAPI	Qwen 3.7-Max	$2.08	$6.25	$41.65
TunanAPI	DeepSeek V4 Pro	$2.18	$4.35	$32.65

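For this 10M-token workload, the TunanAPI models cost 33-94% less than the Western alternatives in the table: about 33% less for Qwen 3.7-Max versus GPT-4o, and about 94% less for DeepSeek V4 Pro versus Claude Fable 5.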
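The totals and percentages can be checked with a few lines of arithmetic. The sketch below copies the per-token prices from the table and assumes the 10M tokens are split evenly into 5M input and 5M output, which is the split that reproduces the totals column; adjust the split to match your own workload. The function names are illustrative.

```python
# (input, output) prices in USD per 1M tokens, copied from the table above
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Fable 5": (18.00, 90.00),
    "Qwen 3.7-Max": (2.08, 6.25),
    "DeepSeek V4 Pro": (2.18, 4.35),
}


def workload_cost(model, input_millions=5, output_millions=5):
    """Cost in USD for the given millions of input and output tokens."""
    price_in, price_out = PRICES[model]
    return round(price_in * input_millions + price_out * output_millions, 2)


def savings_pct(cheaper, baseline):
    """Whole-number percentage saved by using `cheaper` instead of `baseline`."""
    return round(100 * (1 - workload_cost(cheaper) / workload_cost(baseline)))


for name in PRICES:
    print(f"{name}: ${workload_cost(name):.2f}")
print(savings_pct("Qwen 3.7-Max", "GPT-4o"), "% saved vs GPT-4o")
print(savings_pct("DeepSeek V4 Pro", "Claude Fable 5"), "% saved vs Claude Fable 5")
```

Output tokens are priced higher than input tokens for every model listed, so a workload dominated by long generated answers will shift these totals upward.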

Tips for Best Results

Model Selection:
- Use deepseek-reasoner for complex reasoning tasks
- Use qwen3.7-max for RAG and long-context applications (1M context)
- Use glm-4-plus for multilingual conversations
- Use glm-4-flash for prototyping ($0.05/1M tokens!)

Temperature Settings:
- 0.0-0.3 for deterministic outputs (coding, analysis)
- 0.5-0.7 for conversational and creative writing
- 0.8-1.0 for the most varied, exploratory responses

Context Window:
- Qwen 3.7-Max supports up to 1M tokens
- DeepSeek V4 Pro supports 128K tokens
- GLM-4 series supports 128K tokens

Rate Limits:
- TunanAPI enforces provider rate limits automatically
- Check the /models endpoint for current limits
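One way to apply these tips consistently is to keep them in a small lookup table instead of scattering model names and temperatures through the code. The sketch below is an illustration, not a TunanAPI feature: the task labels and the pick_profile helper are hypothetical, the models, temperatures, and context sizes come from this guide, and the 0.7 temperature for glm-4-flash is an assumption.

```python
PROFILES = {
    # task label: model ID, temperature used in this guide, context window in tokens
    "reasoning": {"model": "deepseek-reasoner", "temperature": 0.1, "context_tokens": 128_000},
    "rag": {"model": "qwen3.7-max", "temperature": 0.3, "context_tokens": 1_000_000},
    "multilingual": {"model": "glm-4-plus", "temperature": 0.7, "context_tokens": 128_000},
    # Temperature for prototyping is an assumption, not stated in the guide
    "prototype": {"model": "glm-4-flash", "temperature": 0.7, "context_tokens": 128_000},
}


def pick_profile(task, prompt_tokens=0):
    """Return the profile for a task, rejecting prompts that exceed the context window."""
    profile = PROFILES[task]
    if prompt_tokens > profile["context_tokens"]:
        raise ValueError(
            f"{prompt_tokens} tokens exceeds the {profile['context_tokens']}-token "
            f"context window of {profile['model']}"
        )
    return profile


profile = pick_profile("rag", prompt_tokens=250_000)
print(profile["model"], profile["temperature"])

# Usage with LangChain (requires a real key):
#   llm = ChatOpenAI(base_url="https://api.tunanapi.com/v1", api_key=TUNAN_API_KEY,
#                    model=profile["model"], temperature=profile["temperature"])
```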