[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"workflow-trulens-evaluate-and-track-llm-apps-c866ac5d":3,"seo:featured-workflow:c866ac5d-23f3-4e59-9351-a402817c90ce:zh":39,"workflow-related-trulens-evaluate-and-track-llm-apps-c866ac5d-c866ac5d-23f3-4e59-9351-a402817c90ce":83},{"id":4,"uuid":5,"slug":6,"title":7,"description":8,"author_id":9,"author_name":10,"author_avatar":11,"token_estimate":12,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":14,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":16,"files":23,"tags":24,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":31,"asset_kind":28,"target_tools":32,"install_mode":36,"entrypoint":37,"risk_profile":38,"dependencies":40,"verification":46,"agent_metadata":49,"agent_fit":60,"trust":71,"provenance":79,"created_at":81,"updated_at":82},3104,"c866ac5d-23f3-4e59-9351-a402817c90ce","trulens-evaluate-and-track-llm-apps","TruLens — Evaluate and Track LLM Apps","Instrument LLM apps and run systematic evals for RAG quality and regressions to find failure modes fast. Combine tracing and scorecards in one workflow.","8a910fec-3180-11f1-9bc6-00163e2b0d79","Agent Toolkit","https:\u002F\u002Ftokrepo.com\u002Fapple-touch-icon.png",0,"",5,"en",[17],{"id":18,"step_order":19,"title":20,"description":13,"prompt_template":21,"variables":13,"depends_on":22,"expected_output":13},3667,1,"Asset","# TruLens — Evaluate and Track LLM Apps\n\n> Instrument LLM apps and run systematic evals for RAG quality and regressions to find failure modes fast. Combine tracing and scorecards in one workflow.\n\n## Quick Use\n\n1. Install:\n   ```bash\n   pip install trulens\n   ```\n2. Run:\n   ```bash\n   python -c \"import trulens; print('trulens ok')\"\n   ```\n3. 
Verify:\n   - Run one quickstart evaluation and confirm you get non-empty scores and a trace view for at least one run.\n\n\n---\n\n## Intro\n\nInstrument LLM apps and run systematic evals for RAG quality and regressions to find failure modes fast. Combine tracing and scorecards in one workflow.\n\n- **Best for:** RAG\u002Fagent builders who want measurable quality (before\u002Fafter) instead of vibe-checking prompts\n- **Works with:** Python, LLM app frameworks (LangChain\u002FRAG pipelines), notebooks + CI-friendly eval runs\n- **Setup time:** 15 minutes\n\n\n### Quantitative Notes\n\n- Setup time ~15 minutes (install + one quickstart notebook or script)\n- GitHub stars + forks (verified): see Source & Thanks\n- Start with 10–50 eval cases to catch regressions early (then scale up)\n\n\n---\n\n## Practical Notes\n\nTreat evals like unit tests: freeze a small, representative dataset, define 2–4 core metrics, and make them run on every change that touches prompts\u002Fretrieval\u002Ftooling. When a score drops, inspect traces for which step (retrieval, reasoning, formatting) caused the regression.\n\n**Safety note:** Avoid optimizing for a single metric—use a small metric set (quality + safety) and review traces for overfitting.\n\n### FAQ\n\n**Q: Is it only for RAG?**\nA: No. 
It’s useful for any LLM app: chatbots, agents, tool callers, and prompt workflows.\n\n**Q: How do I use it in CI?**\nA: Export eval cases as data, run scoring on each PR, and fail the build on threshold drops.\n\n**Q: What should I measure first?**\nA: Start with retrieval relevance + groundedness for RAG, then add task success and safety checks.\n\n---\n\n## Source & Thanks\n\n> GitHub: https:\u002F\u002Fgithub.com\u002Ftruera\u002Ftrulens\n> Owner avatar: https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F51224128?v=4\n> License (SPDX): MIT\n> GitHub stars (verified via `api.github.com\u002Frepos\u002Ftruera\u002Ftrulens`): 3,305\n> GitHub forks (verified via `api.github.com\u002Frepos\u002Ftruera\u002Ftrulens`): 274\n\n\n---\n\n\u003C!-- ZH -->\n\n# TruLens — Evaluate and Track LLM Apps\n\n> Add observability to LLM apps and run systematic evals: cover RAG quality, feedback functions, and regression tests to pinpoint failure modes fast; chain tracing, scoring, and comparison dashboards into a reusable workflow, and hook it into CI for threshold regression checks and continuous improvement.\n\n## Quick Use\n\n1. Install:\n   ```bash\n   pip install trulens\n   ```\n2. Run:\n   ```bash\n   python -c \"import trulens; print('trulens ok')\"\n   ```\n3. 
Verify:\n   - Run one quickstart evaluation and confirm you get non-empty scores and a trace view for at least one run.\n\n\n---\n\n## Intro\n\nAdd observability to LLM apps and run systematic evals: cover RAG quality, feedback functions, and regression tests to pinpoint failure modes fast; chain tracing, scoring, and comparison dashboards into a reusable workflow, and hook it into CI for threshold regression checks and continuous improvement.\n\n- **Best for:** RAG\u002Fagent teams that want to iterate on measurable metrics instead of tuning prompts by feel\n- **Works with:** Python, LLM app frameworks (LangChain\u002FRAG pipelines), notebooks and CI eval runs\n- **Setup time:** 15 minutes\n\n\n### Quantitative Notes\n\n- ~15 minutes to get running (install + one quickstart notebook or script)\n- GitHub stars + forks (verified): see Source & Thanks\n- Start with 10–50 eval cases for regression checks, then expand coverage\n\n\n---\n\n## Practical Notes\n\nTreat evals like unit tests: freeze a small, representative case set, define 2–4 core metrics, and run them on every change that touches prompts\u002Fretrieval\u002Ftool calls. When a score drops, use traces to pinpoint whether retrieval, reasoning, or formatting caused the regression.\n\n**Safety note:** Don't optimize for a single metric; use a small, stable metric set (quality + safety) and review traces to guard against overfitting.\n\n### FAQ\n\n**Q: Is it only for RAG?**\nA: No. Any LLM app can use it: chat, agents, tool calling, prompt workflows, and more.\n\n**Q: How do I put it in CI?**\nA: Turn the eval set into data, score every PR, and fail CI when metrics drop below thresholds.\n\n**Q: What should I measure first?**\nA: For RAG, start with retrieval relevance and groundedness; then add task success rate and safety checks.\n\n---\n\n## Source & Thanks\n\n> GitHub: https:\u002F\u002Fgithub.com\u002Ftruera\u002Ftrulens\n> Owner avatar: https:\u002F\u002Favatars.githubusercontent.com\u002Fu\u002F51224128?v=4\n> License (SPDX): MIT\n> GitHub stars (verified via `api.github.com\u002Frepos\u002Ftruera\u002Ftrulens`): 3,305\n> GitHub forks（已通过 `api.github.com\u002Frepos\u002Ftruera\u002Ftrulens` 
核验）：274\n","0",[],[25],{"id":26,"name":27,"slug":28,"icon":29},11,"Scripts","script","📜",false,"b0c914748577b3060d12a99238afd5d1b7c2abde906a122956fb62b5c8fa7617",[33,34,35],"claude_code","codex","gemini_cli","single","README.md",{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},null,{"npm":41,"pip":42,"brew":44,"system":45},[],[43],"trulens",[],[],{"commands":47,"expected_files":48},[],[20],{"asset_kind":28,"target_tools":50,"install_mode":36,"entrypoint":37,"risk_profile":51,"dependencies":52,"content_hash":31,"verification":57},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":53,"pip":54,"brew":55,"system":56},[],[43],[],[],{"commands":58,"expected_files":59},[],[20],{"target":34,"score":61,"status":62,"policy":62,"why":63,"asset_kind":28,"install_mode":36},29,"stage_only",[64,65,66,67,68,69,70],"target_tools includes codex","asset_kind script","install_mode single","markdown-only","policy stage_only","asset_kind script is not activated directly for Codex","trust established",{"author_trust_level":72,"verified_publisher":30,"asset_signed_hash":31,"signature_status":73,"install_count":12,"report_count":12,"dangerous_capability_badges":74,"review_status":75,"signals":76},"established","hash_only",[28],"unreviewed",[77,78],"author has published assets","content hash available",{"owner_uuid":9,"owner_name":10,"source_url":80,"content_hash":31,"visibility":19,"created_at":81,"updated_at":82},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Ftrulens-evaluate-and-track-llm-apps","2026-05-12 03:00:17","2026-05-14 
00:52:18",[84,143,200,253],{"id":85,"uuid":86,"slug":87,"title":88,"description":89,"author_id":9,"author_name":10,"author_avatar":11,"token_estimate":12,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":90,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":91,"files":39,"tags":92,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":98,"asset_kind":99,"target_tools":100,"install_mode":36,"entrypoint":37,"risk_profile":101,"dependencies":102,"verification":107,"agent_metadata":110,"agent_fit":121,"trust":129,"provenance":133,"created_at":135,"updated_at":136,"__relatedScore":137,"__relatedReasons":138,"__sharedTags":142},3090,"c79b88fe-91bf-424e-9a4d-f73956516f59","weave-trace-and-debug-llm-apps","Weave — Trace and Debug LLM Apps","Weave adds tracing to LLM apps with `@weave.op`. Install `weave`, call `weave.init()`, then track inputs\u002Foutputs across API calls and validation steps.",12,[],[93],{"id":94,"name":95,"slug":96,"icon":97},13,"Knowledge","memory","🧠","e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855","knowledge",[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":103,"pip":104,"brew":105,"system":106},[],[],[],[],{"commands":108,"expected_files":109},[],[],{"asset_kind":99,"target_tools":111,"install_mode":36,"entrypoint":37,"risk_profile":112,"dependencies":113,"content_hash":98,"verification":118},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":114,"pip":115,"brew":116,"system":117},[],[],[],[],{"commands":119,"expected_files":120},[],[],{"target":34,"score":122,"status":123,"policy":124,"why":125,"asset_kind":99,"install_mode":36},96,"native","allow",[64,126,66,67,127,128,70],"asset_kind knowledge","policy allow","safe markdown-only Codex 
install",{"author_trust_level":72,"verified_publisher":30,"asset_signed_hash":98,"signature_status":73,"install_count":12,"report_count":12,"dangerous_capability_badges":130,"review_status":75,"signals":131},[],[77,78,132],"no dangerous capability badges",{"owner_uuid":9,"owner_name":10,"source_url":134,"content_hash":98,"visibility":19,"created_at":135,"updated_at":136},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fweave-trace-and-debug-llm-apps","2026-05-12 00:58:30","2026-05-14 00:43:10",89.67091502846026,[139,140,141],"topic-match","same-target","same-author",[],{"id":144,"uuid":145,"slug":146,"title":147,"description":148,"author_id":149,"author_name":150,"author_avatar":11,"token_estimate":151,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":152,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":153,"files":39,"tags":154,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":159,"asset_kind":160,"target_tools":161,"install_mode":36,"entrypoint":162,"risk_profile":163,"dependencies":165,"verification":170,"agent_metadata":173,"agent_fit":185,"trust":189,"provenance":193,"created_at":195,"updated_at":196,"__relatedScore":197,"__relatedReasons":198,"__sharedTags":199},443,"a543eba5-fe14-46f3-9aa5-96a5a23b72d0","opik-debug-evaluate-monitor-llm-apps-a543eba5","Opik — Debug, Evaluate & Monitor LLM Apps","Trace LLM calls, run automated evaluations, and monitor RAG and agent quality in production. By Comet. 
18K+ GitHub stars.","8a911193-3180-11f1-9bc6-00163e2b0d79","AI Open Source",1231,173,[],[155],{"id":90,"name":156,"slug":157,"icon":158},"Configs","config","⚙️","e7822f3ad0cf593a43725bc23f3f3ee6ba3dc357ce449417ea01c27b06ae1f4b","skill",[33,34,35],"opik.md",{"executes_code":30,"modifies_global_config":30,"requires_secrets":164,"uses_absolute_paths":30,"network_access":30},[],{"npm":166,"pip":167,"brew":168,"system":169},[],[],[],[],{"commands":171,"expected_files":172},[],[162],{"asset_kind":160,"target_tools":174,"install_mode":36,"entrypoint":162,"risk_profile":175,"dependencies":177,"content_hash":159,"verification":182},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":176,"uses_absolute_paths":30,"network_access":30},[],{"npm":178,"pip":179,"brew":180,"system":181},[],[],[],[],{"commands":183,"expected_files":184},[],[162],{"target":34,"score":186,"status":123,"policy":124,"why":187,"asset_kind":160,"install_mode":36},98,[64,188,66,67,127,128,70],"asset_kind skill",{"author_trust_level":72,"verified_publisher":30,"asset_signed_hash":159,"signature_status":73,"install_count":12,"report_count":12,"dangerous_capability_badges":190,"review_status":75,"signals":191},[],[192,77,78,132],"asset has usage views",{"owner_uuid":149,"owner_name":150,"source_url":194,"content_hash":159,"visibility":19,"created_at":195,"updated_at":196},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fopik-debug-evaluate-monitor-llm-apps-a543eba5","2026-04-03 13:03:47","2026-05-13 
06:08:14",88.3608238724239,[139,140],[],{"id":201,"uuid":202,"slug":203,"title":204,"description":205,"author_id":206,"author_name":207,"author_avatar":11,"token_estimate":208,"time_saved":12,"model_used":209,"fork_count":12,"vote_count":12,"view_count":210,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":211,"files":39,"tags":212,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":214,"asset_kind":160,"target_tools":215,"install_mode":36,"entrypoint":204,"risk_profile":218,"dependencies":220,"verification":225,"agent_metadata":228,"agent_fit":240,"trust":242,"provenance":245,"created_at":247,"updated_at":248,"__relatedScore":249,"__relatedReasons":250,"__sharedTags":251},291,"2c856b4d-64e5-46b2-9bbd-a7ce9f7a7296","ragas-evaluate-rag-llm-applications-2c856b4d","Ragas — Evaluate RAG & LLM Applications","Ragas evaluates LLM applications with objective metrics, test data generation, and data-driven insights. 13.2K+ GitHub stars. RAG evaluation, auto test generation. 
Apache 2.0.","8a910e34-3180-11f1-9bc6-00163e2b0d79","Script Depot",500,"Claude Code",107,[],[213],{"id":26,"name":27,"slug":28,"icon":29},"8abbd75ee9ff81bfc7e6f81139523d59b88f2d19c1ca46efc8cd4b45169fbf4d",[33,34,216,35,217],"cursor","windsurf",{"executes_code":30,"modifies_global_config":30,"requires_secrets":219,"uses_absolute_paths":30,"network_access":30},[],{"npm":221,"pip":222,"brew":223,"system":224},[],[],[],[],{"commands":226,"expected_files":227},[],[204],{"asset_kind":160,"target_tools":229,"install_mode":36,"entrypoint":204,"risk_profile":230,"dependencies":232,"content_hash":214,"verification":237},[33,34,216,35,217],{"executes_code":30,"modifies_global_config":30,"requires_secrets":231,"uses_absolute_paths":30,"network_access":30},[],{"npm":233,"pip":234,"brew":235,"system":236},[],[],[],[],{"commands":238,"expected_files":239},[],[204],{"target":34,"score":186,"status":123,"policy":124,"why":241,"asset_kind":160,"install_mode":36},[64,188,66,67,127,128,70],{"author_trust_level":72,"verified_publisher":30,"asset_signed_hash":214,"signature_status":73,"install_count":12,"report_count":12,"dangerous_capability_badges":243,"review_status":75,"signals":244},[],[192,77,78,132],{"owner_uuid":206,"owner_name":207,"source_url":246,"content_hash":214,"visibility":19,"created_at":247,"updated_at":248},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fragas-evaluate-rag-llm-applications-2c856b4d","2026-04-01 07:13:47","2026-05-13 
10:13:21",80.05013563323043,[139,140],[28,252],"scripts",{"id":254,"uuid":255,"slug":256,"title":257,"description":258,"author_id":9,"author_name":10,"author_avatar":11,"token_estimate":12,"time_saved":12,"model_used":13,"fork_count":12,"vote_count":12,"view_count":259,"parent_id":12,"parent_uuid":13,"lang_type":15,"steps":260,"files":39,"tags":261,"has_voted":30,"visibility":19,"share_token":13,"is_featured":12,"content_hash":98,"asset_kind":267,"target_tools":268,"install_mode":36,"entrypoint":37,"risk_profile":269,"dependencies":270,"verification":275,"agent_metadata":278,"agent_fit":289,"trust":293,"provenance":296,"created_at":135,"updated_at":298,"__relatedScore":299,"__relatedReasons":300,"__sharedTags":301},3091,"d45e0c73-d0b6-4825-8bb2-80515ed82ac1","promptflow-build-and-test-llm-apps","PromptFlow — Build and Test LLM Apps","PromptFlow is a CLI + framework for building and testing LLM flows. Install `promptflow` + `promptflow-tools`, then run `pf flow init` and `pf flow test`.",24,[],[262],{"id":263,"name":264,"slug":265,"icon":266},14,"CLI Tools","cli","🖥️","cli_tool",[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":271,"pip":272,"brew":273,"system":274},[],[],[],[],{"commands":276,"expected_files":277},[],[],{"asset_kind":267,"target_tools":279,"install_mode":36,"entrypoint":37,"risk_profile":280,"dependencies":281,"content_hash":98,"verification":286},[33,34,35],{"executes_code":30,"modifies_global_config":30,"requires_secrets":39,"uses_absolute_paths":30,"network_access":30},{"npm":282,"pip":283,"brew":284,"system":285},[],[],[],[],{"commands":287,"expected_files":288},[],[],{"target":34,"score":61,"status":62,"policy":62,"why":290,"asset_kind":267,"install_mode":36},[64,291,66,67,68,292,70],"asset_kind cli_tool","asset_kind cli_tool is not activated directly for 
Codex",{"author_trust_level":72,"verified_publisher":30,"asset_signed_hash":98,"signature_status":73,"install_count":12,"report_count":12,"dangerous_capability_badges":294,"review_status":75,"signals":295},[267],[77,78],{"owner_uuid":9,"owner_name":10,"source_url":297,"content_hash":98,"visibility":19,"created_at":135,"updated_at":298},"https:\u002F\u002Ftokrepo.com\u002Fen\u002Fworkflows\u002Fpromptflow-build-and-test-llm-apps","2026-05-14 10:52:15",75.09691001300806,[139,140,141],[]]