Using Chinese LLMs with LangChain: A Complete Guide
A practical tutorial on integrating TunanAPI with LangChain for AI-powered applications
Introduction
LangChain is a powerful framework for building applications with LLMs. When combined with Chinese AI models like DeepSeek V4, Qwen 3.7, and GLM-4 via TunanAPI, you get high-quality multilingual AI at a fraction of the cost of Western models.
In this guide, we will walk through:
- Setting up TunanAPI with LangChain
- Building a simple RAG application
- Using DeepSeek V4 Pro for complex reasoning
- Handling multilingual conversations
Prerequisites
- Python 3.9+
- TunanAPI account (free to start: https://tunanapi.com)
- OpenAI-compatible API key from TunanAPI
Installation
pip install langchain langchain-openai python-dotenv
Step 1: Configure TunanAPI with LangChain
LangChain has ChatOpenAI class works seamlessly with TunanAPI — just change the base_url:
from langchain_openai import ChatOpenAI
import os
# Get your API key from tunanapi.com
TUNAN_API_KEY = os.getenv("TUNAN_API_KEY")
# Initialize LangChain with TunanAPI
llm = ChatOpenAI(
base_url="https://api.tunanapi.com/v1",
api_key=TUNAN_API_KEY,
model="deepseek-reasoner", # Complex reasoning
temperature=0.7
)
# Test the connection
response = llm.invoke("Explain quantum computing in simple terms.")
print(response.content)
That has it! No other changes needed — all LangChain features work out of the box.
Step 2: Build a RAG Application with Qwen 3.7-Max
Let has build a Retrieval-Augmented Generation (RAG) system using Qwen 3.7-Max (great for Chinese + English context):
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain.chains import RetrievalQA
# Load documents
loader = TextLoader("your_documents.txt")
docs = loader.load()
# Split documents
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
# Create embeddings and vector store
embeddings = OpenAIEmbeddings(
base_url="https://api.tunanapi.com/v1",
api_key=TUNAN_API_KEY,
model="text-embedding-v3"
)
vectorstore = Chroma.from_documents(documents=splits, embedding=embeddings)
# Create RAG chain
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
llm=ChatOpenAI(
base_url="https://api.tunanapi.com/v1",
api_key=TUNAN_API_KEY,
model="qwen3.7-max",
temperature=0.3
),
chain_type="stuff",
retriever=retriever,
return_source_documents=True
)
# Query your documents
query = "What are the key features of our product?"
result = qa_chain({"query": query})
print(result["result"])
print(f"Sources: {len(result[source_documents])}")
Step 3: Handle Multilingual Conversations
Chinese models excel at multilingual tasks. Here has how to build a bilingual chatbot:
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
# Initialize with GLM-4-Plus (strong in Chinese + English)
llm = ChatOpenAI(
base_url="https://api.tunanapi.com/v1",
api_key=TUNAN_API_KEY,
model="glm-4-plus",
temperature=0.7
)
memory = ConversationBufferMemory()
conversation = ConversationChain(
llm=llm,
memory=memory,
verbose=True
)
# Chinese query
response = conversation.predict(input="解释什么是人工智能?")
print(f"Bot: {response}")
# English follow-up
response = conversation.predict(input="And what about machine learning?")
print(f"Bot: {response}")
Step 4: Use DeepSeek V4 Pro for Complex Reasoning
For tasks requiring advanced reasoning, DeepSeek V4 Pro excels:
reasoning_llm = ChatOpenAI(
base_url="https://api.tunanapi.com/v1",
api_key=TUNAN_API_KEY,
model="deepseek-reasoner",
temperature=0.1 # Lower temperature for more deterministic reasoning
)
# Complex reasoning task
query = """
You are a software architect. Design a microservices architecture for a
real-time chat application with 1M concurrent users. Consider scalability,
message delivery guarantees, and cost optimization.
Provide:
1. Service breakdown
2. Technology stack recommendations
3. Database schema design
4. Message queue configuration
"""
response = reasoning_llm.invoke(query)
print(response.content)
Step 5: Streaming Responses
LangChain supports streaming with TunanAPI:
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
streaming_llm = ChatOpenAI(
base_url="https://api.tunanapi.com/v1",
api_key=TUNAN_API_KEY,
model="deepseek-chat",
temperature=0.7,
streaming=True,
callbacks=[StreamingStdOutCallbackHandler()]
)
streaming_llm.invoke("Tell me a short story about AI")
Cost Comparison
| Provider | Model | Input (per 1M) | Output (per 1M) | 10M tokens total |
|---|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 | $62.50 |
| Anthropic | Claude Fable 5 | $18.00 | $90.00 | $540.00 |
| TunanAPI | Qwen 3.7-Max | $2.08 | $6.25 | $41.65 |
| TunanAPI | DeepSeek V4 Pro | $2.18 | $4.35 | $32.65 |
Using TunanAPI with LangChain saves 33-94% compared to Western alternatives.
Tips for Best Results
-
Model Selection:
- Use
deepseek-reasonerfor complex reasoning tasks - Use
qwen3.7-maxfor RAG and long-context applications (1M context) - Use
glm-4-plusfor multilingual conversations - Use
glm-4-flashfor prototyping ($0.05/1M tokens!)
- Use
-
Temperature Settings:
- 0.0-0.3 for deterministic outputs (coding, analysis)
- 0.5-0.7 for creative writing
- 0.8-1.0 for more creative/varied responses
-
Context Window:
- Qwen 3.7-Max supports up to 1M tokens
- DeepSeek V4 Pro supports 128K tokens
- GLM-4 series supports 128K tokens
-
Rate Limits:
- TunanAPI enforces provider rate limits automatically
- Check
/modelsendpoint for current limits
Conclusion
LangChain + TunanAPI gives you:
- Full LangChain ecosystem compatibility
- Access to 8 top Chinese AI models
- Up to 94% cost savings vs. Western models
- Strong multilingual capabilities
- No phone number or Alipay required
Get started in 30 seconds: https://tunanapi.com
Related Resources
Published: June 25, 2026 | Updated for TunanAPI V2 pricing













