ドラフト "エージェンティック設計パターン - ガードレールと安全性"
LLMエージェントの堅牢な安全ガードレールを実装し、有害な出力やプロンプトインジェクションを防ぎ、本番環境のTypeScript/Vercel環境で信頼性の高い運用を維持する方法。
メンタルモデル:AIシステムの多層防御
エージェントの安全性を、安全な銀行アプリケーションの構築のように考えてください。複数のセキュリティ層が必要です:入力検証(SQLインジェクションのサニタイゼーションのような)、ランタイム監視(不正検知のような)、出力フィルタリング(PII マスキングのような)、そしてサーキットブレーカー(レート制限のような)。各層は異なる脅威をキャッチします - プロンプトインジェクションはSQLインジェクション、幻覚はデータ破損バグ、有害な出力はセキュリティ侵害です。単一のファイアウォールに依存しないように、効果的なAI安全性には協調的な防御層が必要です。
基本例:入力検証と出力フィルタリング
1. シンプルなキーワードベース安全ガード
// lib/guards/basic-safety.ts
import { isEmpty, includes, some, toLower } from 'es-toolkit';
interface SafetyCheckResult {
safe: boolean;
violations: string[];
confidence: number;
}
export class BasicSafetyGuard {
private blockedTerms = [
'ignore instructions',
'disregard previous',
'system prompt',
'reveal instructions'
];
private suspiciousPatterns = [
/\bAPI[_\s]KEY\b/i,
/password\s*[:=]/i,
/DROP\s+TABLE/i,
/\<script\>/i
];
checkInput(input: string): SafetyCheckResult {
if (isEmpty(input)) {
return { safe: true, violations: [], confidence: 1.0 };
}
const lowerInput = toLower(input);
const violations: string[] = [];
// Check blocked terms
const foundBlockedTerms = this.blockedTerms.filter(term =>
includes(lowerInput, term)
);
violations.push(...foundBlockedTerms.map(t => `Blocked term: ${t}`));
// Check suspicious patterns
const matchedPatterns = this.suspiciousPatterns.filter(pattern =>
pattern.test(input)
);
violations.push(...matchedPatterns.map(p => `Suspicious pattern: ${p.source}`));
return {
safe: violations.length === 0,
violations,
confidence: violations.length === 0 ? 1.0 : 0.2
};
}
}
キーワードマッチングと正規表現パターンを使用した基本的な安全ガードで、一般的なプロンプトインジェクション試行を最小限のレイテンシでキャッチします。
2. APIルートとの統合
// app/api/chat/route.ts
import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { BasicSafetyGuard } from '@/lib/guards/basic-safety';
import { NextResponse } from 'next/server';
export const runtime = 'nodejs';
export const maxDuration = 60;
const guard = new BasicSafetyGuard();
export async function POST(req: Request) {
try {
const { message } = await req.json();
// Input validation
const inputCheck = guard.checkInput(message);
if (!inputCheck.safe) {
return NextResponse.json(
{
error: 'Input contains prohibited content',
violations: inputCheck.violations
},
{ status: 400 }
);
}
const model = new ChatGoogleGenerativeAI({
modelName: 'gemini-2.5-flash',
temperature: 0.3,
maxOutputTokens: 2048,
});
const response = await model.invoke(message);
// Output validation
const outputCheck = guard.checkInput(response.content as string);
if (!outputCheck.safe) {
console.error('Output safety violation:', outputCheck.violations);
return NextResponse.json(
{ content: 'I cannot provide that information.' },
{ status: 200 }
);
}
return NextResponse.json({ content: response.content });
} catch (error) {
console.error('Chat error:', error);
return NextResponse.json(
{ error: 'Processing failed' },
{ status: 500 }
);
}
}
ユーザーに応答を返す前に入力と出力の両方をチェックする二重層保護を持つAPIルート。
3. トークン追跡によるレート制限
// lib/guards/rate-limiter.ts
import { groupBy, sumBy, filter } from 'es-toolkit';
import { differenceInMinutes } from 'es-toolkit/compat';
interface TokenUsage {
userId: string;
tokens: number;
timestamp: Date;
}
export class TokenRateLimiter {
private usage: Map<string, TokenUsage[]> = new Map();
private readonly maxTokensPerMinute = 10000;
private readonly maxTokensPerHour = 50000;
async checkLimit(userId: string, estimatedTokens: number): Promise<boolean> {
const now = new Date();
const userUsage = this.usage.get(userId) || [];
// Clean old entries
const relevantUsage = filter(userUsage, entry =>
differenceInMinutes(now, entry.timestamp) < 60
);
// Calculate usage in different windows
const lastMinute = filter(relevantUsage, entry =>
differenceInMinutes(now, entry.timestamp) < 1
);
const lastHour = relevantUsage;
const minuteTokens = sumBy(lastMinute, 'tokens');
const hourTokens = sumBy(lastHour, 'tokens');
if (minuteTokens + estimatedTokens > this.maxTokensPerMinute) {
throw new Error(`Rate limit exceeded: ${minuteTokens}/${this.maxTokensPerMinute} tokens/min`);
}
if (hourTokens + estimatedTokens > this.maxTokensPerHour) {
throw new Error(`Hourly limit exceeded: ${hourTokens}/${this.maxTokensPerHour} tokens/hour`);
}
// Record usage
relevantUsage.push({ userId, tokens: estimatedTokens, timestamp: now });
this.usage.set(userId, relevantUsage);
return true;
}
}
複数の時間窓での使用量を追跡し、乱用を防ぎながらバーストトラフィックを許可するトークンベースのレート制限。
4. 安全フィードバック付きフロントエンド統合
// components/SafeChatInterface.tsx
'use client';
import { useState } from 'react';
import { useMutation } from '@tanstack/react-query';
export default function SafeChatInterface() {
const [input, setInput] = useState('');
const [messages, setMessages] = useState<Array<{role: string, content: string}>>([]);
const sendMessage = useMutation({
mutationFn: async (message: string) => {
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ message })
});
if (!response.ok) {
const error = await response.json();
throw new Error(error.violations?.join(', ') || 'Failed to send message');
}
return response.json();
},
onSuccess: (data) => {
setMessages(prev => [
...prev,
{ role: 'user', content: input },
{ role: 'assistant', content: data.content }
]);
setInput('');
}
});
return (
<div className="card bg-base-100 shadow-xl">
<div className="card-body">
<h2 className="card-title">安全なAIアシスタント</h2>
<div className="h-96 overflow-y-auto space-y-4 p-4 bg-base-200 rounded">
{messages.map((msg, i) => (
<div key={i} className={`chat chat-${msg.role === 'user' ? 'end' : 'start'}`}>
<div className={`chat-bubble ${msg.role === 'user' ? 'chat-bubble-primary' : ''}`}>
{msg.content}
</div>
</div>
))}
</div>
{sendMessage.isError && (
<div className="alert alert-error">
<span>{sendMessage.error?.message}</span>
</div>
)}
<div className="join w-full">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyPress={(e) => e.key === 'Enter' && sendMessage.mutate(input)}
placeholder="安全に入力してください..."
className="input input-bordered join-item flex-1"
disabled={sendMessage.isPending}
/>
<button
onClick={() => sendMessage.mutate(input)}
className="btn btn-primary join-item"
disabled={sendMessage.isPending || !input.trim()}
>
{sendMessage.isPending ? (
<span className="loading loading-spinner"></span>
) : '送信'}
</button>
</div>
</div>
</div>
);
}
安全違反をユーザーにフレンドリーな方法で表示する組み込みエラーハンドリングを持つReactコンポーネント。
高度な例:LangGraphを使用した多層防御
1. 高度なコンテンツモデレーションシステム
// lib/guards/content-moderator.ts
import { ChatGoogleGenerativeAI } from '@langchain/google-genai';
import { z } from 'zod';
import { memoize, debounce } from 'es-toolkit';
import { StructuredOutputParser } from '@langchain/core/output_parsers';
const ModerationResultSchema = z.object({
safe: z.boolean(),
categories: z.object({
toxic: z.number().min(0).max(1),
violent: z.number().min(0).max(1),
sexual: z.number().min(0).max(1),
harmful: z.number().min(0).max(1),
pii: z.number().min(0).max(1)
}),
explanation: z.string(),
suggestedAction: z.enum(['allow', 'block', 'review', 'redact'])
});
export class ContentModerator {
private model: ChatGoogleGenerativeAI;
private parser: StructuredOutputParser<z.infer<typeof ModerationResultSchema>>;
private cache = new Map<string, z.infer<typeof ModerationResultSchema>>();
constructor() {
this.model = new ChatGoogleGenerativeAI({
modelName: 'gemini-2.5-flash',
temperature: 0,
maxOutputTokens: 500
});
this.parser = StructuredOutputParser.fromZodSchema(ModerationResultSchema);
}
// Memoized moderation for performance
moderate = memoize(async (content: string) => {
if (this.cache.has(content)) {
return this.cache.get(content)!;
}
const prompt = `
Analyze this content for safety issues:
"${content}"
${this.parser.getFormatInstructions()}
`;
const response = await this.model.invoke(prompt);
const result = await this.parser.parse(response.content as string);
this.cache.set(content, result);
return result;
}, {
getCacheKey: (content) => content.substring(0, 100) // Cache by first 100 chars
});
async moderateWithFallback(content: string): Promise<z.infer<typeof ModerationResultSchema>> {
try {
return await this.moderate(content);
} catch (error) {
console.error('Moderation failed, using fallback:', error);
// Fallback to basic checks
return {
safe: !this.hasObviousIssues(content),
categories: {
toxic: 0,
violent: 0,
sexual: 0,
harmful: 0,
pii: this.detectPII(content) ? 1 : 0
},
explanation: 'Fallback moderation',
suggestedAction: 'review'
};
}
}
private hasObviousIssues(content: string): boolean {
const issues = [
/\b(kill|murder|die)\b/i,
/\b(hate|racist|sexist)\b/i
];
return issues.some(pattern => pattern.test(content));
}
private detectPII(content: string): boolean {
const piiPatterns = [
/\b\d{3}-\d{2}-\d{4}\b/, // SSN
/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/, // Email
/\b\d{16}\b/, // Credit card
/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/ // Phone
];
return piiPatterns.some(pattern => pattern.test(content));
}
}
信頼性のための正規表現パターンへのフォールバックを持つLLMベースの分析を使用した高度なコンテンツモデレーション。
2. LangGraphを使用したステートフルな安全ワークフロー
// lib/workflows/safety-workflow.ts
import { StateGraph, END } from '@langchain/langgraph';
import { BaseMessage, HumanMessage, AIMessage } from '@langchain/core/messages';
import { ContentModerator } from '@/lib/guards/content-moderator';
import { TokenRateLimiter } from '@/lib/guards/rate-limiter';
import { partition, map } from 'es-toolkit';
interface SafetyState {
messages: BaseMessage[];
userId: string;
safetyChecks: {
input: boolean;
rateLimit: boolean;
content: boolean;
output: boolean;
};
violations: string[];
finalResponse?: string;
}
export function createSafetyWorkflow() {
const moderator = new ContentModerator();
const rateLimiter = new TokenRateLimiter();
const workflow = new StateGraph<SafetyState>({
channels: {
messages: {
value: (x: BaseMessage[], y: BaseMessage[]) => [...x, ...y],
default: () => []
},
userId: {
value: (x: string, y: string) => y || x,
default: () => 'anonymous'
},
safetyChecks: {
value: (x, y) => ({...x, ...y}),
default: () => ({
input: false,
rateLimit: false,
content: false,
output: false
})
},
violations: {
value: (x: string[], y: string[]) => [...x, ...y],
default: () => []
},
finalResponse: {
value: (x: string | undefined, y: string | undefined) => y || x,
default: () => undefined
}
}
});
// Input validation node
workflow.addNode('validateInput', async (state) => {
const lastMessage = state.messages[state.messages.length - 1];
const moderation = await moderator.moderateWithFallback(lastMessage.content as string);
if (!moderation.safe) {
return {
safetyChecks: { ...state.safetyChecks, input: false },
violations: [`Input violation: ${moderation.explanation}`]
};
}
return {
safetyChecks: { ...state.safetyChecks, input: true }
};
});
// Rate limiting node
workflow.addNode('checkRateLimit', async (state) => {
try {
const estimatedTokens = (state.messages[state.messages.length - 1].content as string).length / 4;
await rateLimiter.checkLimit(state.userId, estimatedTokens);
return {
safetyChecks: { ...state.safetyChecks, rateLimit: true }
};
} catch (error) {
return {
safetyChecks: { ...state.safetyChecks, rateLimit: false },
violations: [`Rate limit: ${error.message}`]
};
}
});
// Process with LLM node
workflow.addNode('processLLM', async (state) => {
const model = new ChatGoogleGenerativeAI({
modelName: 'gemini-2.5-flash',
temperature: 0.3
});
const response = await model.invoke(state.messages);
return {
messages: [response],
finalResponse: response.content as string
};
});
// Output validation node
workflow.addNode('validateOutput', async (state) => {
if (!state.finalResponse) {
return {
safetyChecks: { ...state.safetyChecks, output: false },
violations: ['No output generated']
};
}
const moderation = await moderator.moderateWithFallback(state.finalResponse);
if (!moderation.safe) {
return {
safetyChecks: { ...state.safetyChecks, output: false },
violations: [`Output violation: ${moderation.explanation}`],
finalResponse: 'I cannot provide that response due to safety concerns.'
};
}
return {
safetyChecks: { ...state.safetyChecks, output: true }
};
});
// Safety violation handler
workflow.addNode('handleViolation', async (state) => {
console.error('Safety violations:', state.violations);
return {
finalResponse: 'Your request could not be processed due to safety policies.',
messages: [new AIMessage('Request blocked for safety reasons')]
};
});
// Conditional routing
workflow.addConditionalEdges('validateInput', (state) => {
return state.safetyChecks.input ? 'checkRateLimit' : 'handleViolation';
});
workflow.addConditionalEdges('checkRateLimit', (state) => {
return state.safetyChecks.rateLimit ? 'processLLM' : 'handleViolation';
});
workflow.addEdge('processLLM', 'validateOutput');
workflow.addConditionalEdges('validateOutput', (state) => {
return state.safetyChecks.output ? END : 'handleViolation';
});
workflow.addEdge('handleViolation', END);
workflow.setEntryPoint('validateInput');
return workflow.compile();
}
適切なエラーハンドリングと違反追跡を持つ複数の検証段階を調整する完全な安全ワークフロー。
3. Human-in-the-Loop 安全レビュー
// lib/guards/human-review.ts
import { z } from 'zod';
import { throttle } from 'es-toolkit';
const ReviewDecisionSchema = z.object({
approved: z.boolean(),
reason: z.string().optional(),
modifications: z.string().optional()
});
export class HumanReviewSystem {
private pendingReviews = new Map<string, {
content: string;
resolve: (decision: z.infer<typeof ReviewDecisionSchema>) => void;
timestamp: Date;
}>();
async requestReview(
reviewId: string,
content: string,
context: Record<string, any>
): Promise<z.infer<typeof ReviewDecisionSchema>> {
return new Promise((resolve) => {
this.pendingReviews.set(reviewId, {
content,
resolve,
timestamp: new Date()
});
// Notify reviewers (webhook, email, etc.)
this.notifyReviewers(reviewId, content, context);
// Auto-reject after timeout
setTimeout(() => {
if (this.pendingReviews.has(reviewId)) {
this.completeReview(reviewId, {
approved: false,
reason: 'Review timeout'
});
}
}, 30000); // 30 second timeout
});
}
completeReview(
reviewId: string,
decision: z.infer<typeof ReviewDecisionSchema>
) {
const review = this.pendingReviews.get(reviewId);
if (review) {
review.resolve(decision);
this.pendingReviews.delete(reviewId);
}
}
private notifyReviewers = throttle(
async (reviewId: string, content: string, context: Record<string, any>) => {
// Send to review dashboard
await fetch(process.env.REVIEW_WEBHOOK_URL!, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ reviewId, content, context })
});
},
1000 // Throttle notifications to 1 per second
);
getPendingReviews() {
return Array.from(this.pendingReviews.entries()).map(([id, review]) => ({
id,
content: review.content,
timestamp: review.timestamp
}));
}
}
自動タイムアウトと通知メカニズムを持つ高リスクコンテンツのためのヒューマンレビューシステム。
4. 完全な安全パイプラインを持つAPIルート
// app/api/safe-agent/route.ts
import { createSafetyWorkflow } from '@/lib/workflows/safety-workflow';
import { HumanMessage } from '@langchain/core/messages';
import { HumanReviewSystem } from '@/lib/guards/human-review';
import { NextResponse } from 'next/server';
export const runtime = 'nodejs';
export const maxDuration = 300;
const reviewSystem = new HumanReviewSystem();
export async function POST(req: Request) {
try {
const { message, userId, sessionId } = await req.json();
const workflow = createSafetyWorkflow();
// Run safety workflow
const result = await workflow.invoke({
messages: [new HumanMessage(message)],
userId,
safetyChecks: {
input: false,
rateLimit: false,
content: false,
output: false
},
violations: []
});
// Check if human review needed
const needsReview = result.violations.length > 0 &&
result.violations.some(v => v.includes('review'));
if (needsReview) {
const reviewId = `${sessionId}-${Date.now()}`;
const decision = await reviewSystem.requestReview(
reviewId,
result.finalResponse || message,
{ userId, violations: result.violations }
);
if (!decision.approved) {
return NextResponse.json({
content: 'Content requires review and was not approved.',
reviewId
});
}
result.finalResponse = decision.modifications || result.finalResponse;
}
return NextResponse.json({
content: result.finalResponse,
safetyChecks: result.safetyChecks,
violations: result.violations
});
} catch (error) {
console.error('Agent error:', error);
return NextResponse.json(
{ error: 'Processing failed', details: error.message },
{ status: 500 }
);
}
}
センシティブコンテンツに対するヒューマンレビューエスカレーションを含むすべての安全層を統合した完全なAPIルート。
5. 監視ダッシュボードコンポーネント
// components/SafetyMonitorDashboard.tsx
'use client';
import { useState, useEffect } from 'react';
import { useQuery } from '@tanstack/react-query';
import { groupBy, map, filter } from 'es-toolkit';
interface SafetyMetric {
timestamp: Date;
type: 'input' | 'output' | 'rate_limit';
blocked: boolean;
userId: string;
}
export default function SafetyMonitorDashboard() {
const [metrics, setMetrics] = useState<SafetyMetric[]>([]);
const { data: pendingReviews } = useQuery({
queryKey: ['pending-reviews'],
queryFn: async () => {
const res = await fetch('/api/admin/pending-reviews');
return res.json();
},
refetchInterval: 5000
});
const { data: recentViolations } = useQuery({
queryKey: ['violations'],
queryFn: async () => {
const res = await fetch('/api/admin/violations');
return res.json();
},
refetchInterval: 10000
});
const violationsByType = groupBy(
recentViolations || [],
'type'
);
return (
<div className="p-6 space-y-6">
<h1 className="text-3xl font-bold">安全監視ダッシュボード</h1>
{/* Metrics Grid */}
<div className="stats shadow w-full">
<div className="stat">
<div className="stat-figure text-primary">
<svg className="w-8 h-8" fill="none" viewBox="0 0 24 24" stroke="currentColor">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M9 12l2 2 4-4m6 2a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
</div>
<div className="stat-title">安全スコア</div>
<div className="stat-value text-primary">98.5%</div>
<div className="stat-desc">過去24時間</div>
</div>
<div className="stat">
<div className="stat-figure text-secondary">
<svg className="w-8 h-8" fill="none" viewBox="0 0 24 24" stroke="currentColor">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 9v2m0 4h.01m-6.938 4h13.856c1.54 0 2.502-1.667 1.732-3L13.732 4c-.77-1.333-2.694-1.333-3.464 0L3.34 16c-.77 1.333.192 3 1.732 3z" />
</svg>
</div>
<div className="stat-title">ブロックされたリクエスト</div>
<div className="stat-value text-secondary">{recentViolations?.length || 0}</div>
<div className="stat-desc">↗︎ 3 (2%)</div>
</div>
<div className="stat">
<div className="stat-figure text-warning">
<svg className="w-8 h-8" fill="none" viewBox="0 0 24 24" stroke="currentColor">
<path strokeLinecap="round" strokeLinejoin="round" strokeWidth={2} d="M12 8v4l3 3m6-3a9 9 0 11-18 0 9 9 0 0118 0z" />
</svg>
</div>
<div className="stat-title">保留中のレビュー</div>
<div className="stat-value text-warning">{pendingReviews?.length || 0}</div>
<div className="stat-desc">平均応答: 45秒</div>
</div>
</div>
{/* Pending Reviews Table */}
{pendingReviews?.length > 0 && (
<div className="card bg-base-100 shadow-xl">
<div className="card-body">
<h2 className="card-title">保留中のヒューマンレビュー</h2>
<div className="overflow-x-auto">
<table className="table">
<thead>
<tr>
<th>ID</th>
<th>コンテンツプレビュー</th>
<th>待機時間</th>
<th>アクション</th>
</tr>
</thead>
<tbody>
{pendingReviews.map((review: any) => (
<tr key={review.id}>
<td>{review.id.substring(0, 8)}...</td>
<td className="max-w-xs truncate">{review.content}</td>
<td>{Math.round((Date.now() - new Date(review.timestamp).getTime()) / 1000)}秒</td>
<td>
<div className="btn-group">
<button className="btn btn-sm btn-success">承認</button>
<button className="btn btn-sm btn-error">拒否</button>
</div>
</td>
</tr>
))}
</tbody>
</table>
</div>
</div>
</div>
)}
{/* Recent Violations */}
<div className="card bg-base-100 shadow-xl">
<div className="card-body">
<h2 className="card-title">タイプ別最近の違反</h2>
<div className="grid grid-cols-2 gap-4">
{Object.entries(violationsByType).map(([type, violations]) => (
<div key={type} className="stat bg-base-200 rounded-box">
<div className="stat-title capitalize">{type}</div>
<div className="stat-value text-2xl">{violations.length}</div>
<div className="stat-desc">
{violations.slice(0, 2).map((v: any) => (
<div key={v.id} className="text-xs truncate">
{v.reason}
</div>
))}
</div>
</div>
))}
</div>
</div>
</div>
</div>
);
}
安全メトリクス、保留中のレビュー、違反トレンドを表示するリアルタイム監視ダッシュボード。
まとめ
LLMエージェントのガードレールと安全パターンの実装には、入力検証、コンテンツモデレーション、レート制限、人的監視を組み合わせた多層アプローチが必要です。基本パターンは最小限のレイテンシ影響で迅速な勝利を提供し、LangGraphを使用した高度なワークフローは洗練された安全オーケストレーションを可能にします。主要なポイントには、パフォーマンスのためのメモ化の使用、信頼性のためのフォールバックメカニズムの実装、コンプライアンスのための包括的な監査証跡の維持が含まれます。安全性は一度限りの実装ではなく、新しい攻撃ベクトルが出現するにつれて継続的な監視、調整、改善を必要とする継続的なプロセスであることを覚えておいてください。