Deep Dive into the AI Features and Technical Implementation of Muse
In Part 1 of this series, we explored the architecture behind Muse, our GenAI-powered tool that streamlines patient recruitment for clinical trials. We detailed how Muse leverages OpenAI’s Assistants API, dynamic workflows, and strategic automation to generate high-quality recruitment content efficiently.
In this follow-up, we’ll take a deeper look at the execution layer that powers Muse’s assistants—focusing on how the Threads API ensures contextual continuity, how assistants orchestrate multi-step workflows, and how users interact with and refine AI-generated content. We’ll also discuss some key challenges we’ve encountered, including improving content quality with limited user input, ensuring creative consistency, and learning from structured reasoning models. By addressing these challenges, we’re continuously evolving Muse into a more adaptable and intelligent system for clinical trial recruitment.
Assistant Execution
This project relies heavily on the OpenAI Assistants API and, more importantly, the Threads API. Threads provide a structured way to maintain execution history and context—essential for managing multi-step workflows.
Additionally, the API’s ability to automatically query the vector store for relevant material simplifies assistant configuration. This means vector lookups don’t need to be explicitly defined in workflows, making the system more user-friendly.
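To make this concrete, here is a minimal sketch of how such an assistant might be configured with the OpenAI Python SDK. The names, model choice, and vector store ID are illustrative assumptions rather than Muse’s actual configuration, and the beta Assistants API may evolve:

```python
from openai import OpenAI

client = OpenAI()

# Attaching a vector store to the assistant lets the file_search tool retrieve
# relevant material automatically on every run, so workflows never need an
# explicit "query the vector store" step.
assistant = client.beta.assistants.create(
    name="ad-copy-producer",                         # illustrative name
    model="gpt-4o",                                  # assumed model choice
    instructions="Draft patient recruitment content for the given clinical trial.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": ["vs_trial_docs"]}},
)

# A thread keeps the full execution history, preserving context across steps.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Draft an ad for study NCT00000000.",    # placeholder trial reference
)
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
```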
Lifecycle of an Assistant
1. Context Gathering
The assistant gathers all necessary material before execution, tailored to its role:
For context-producing assistants – Strategic context is dynamically assembled, as described in Part 1.
For validation assistants – A structured process runs:
Retrieves descriptions of both the content-producing assistant and the validation assistant.
Extracts all relevant material from its vector store to create a validation guideline document.
Caches the result for future reuse, avoiding redundant processing.
This gathered context is inserted into the thread. So far, context window limits haven’t been a concern, as threads remain well within OpenAI’s maximum context length.
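As a rough illustration of this gathering-and-caching step, here is a sketch in which the helper, cache, and role descriptions are hypothetical stand-ins rather than Muse’s actual code:

```python
import hashlib

from openai import OpenAI

client = OpenAI()
_guideline_cache: dict[str, str] = {}

def build_validation_guidelines(producer_desc: str, validator_desc: str,
                                reference_material: str) -> str:
    """Assemble (or reuse) the validation guideline document for an assistant pair."""
    key = hashlib.sha256((producer_desc + validator_desc).encode()).hexdigest()
    if key not in _guideline_cache:
        # In practice this document would be distilled from vector store material
        # by an LLM call; plain concatenation keeps the sketch self-contained.
        _guideline_cache[key] = (
            f"Producer role:\n{producer_desc}\n\n"
            f"Validator role:\n{validator_desc}\n\n"
            f"Reference material:\n{reference_material}"
        )
    return _guideline_cache[key]

# Insert the gathered context into the thread before any task steps execute.
guidelines = build_validation_guidelines(
    "Writes recruitment ads for the trial.",           # illustrative descriptions
    "Checks ads against IRB and brand guidelines.",
    "Excerpts retrieved from the trial's vector store...",
)
thread = client.beta.threads.create(
    messages=[{"role": "user", "content": guidelines}]
)
```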
2. Configured Task Step Execution
The assistant executes all defined task steps sequentially.
If the workflow involves dynamic branching, the assistant:
Generates a list of items.
Clones the parent thread for each item, running subsequent steps in parallel (see the sketch below).
The process continues until all nodes in the workflow tree complete execution.
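The Threads API has no built-in clone operation, so the sketch below shows one plausible way to fan out: copy the parent thread’s messages into fresh threads and run each branch concurrently. The helper functions, IDs, and item list are assumptions for illustration, not Muse’s implementation:

```python
import concurrent.futures

from openai import OpenAI

client = OpenAI()

def clone_thread(parent_thread_id: str) -> str:
    """Copy a parent thread's messages into a fresh thread, oldest first."""
    history = client.beta.threads.messages.list(
        thread_id=parent_thread_id, order="asc"
    )
    messages = [
        {"role": m.role, "content": m.content[0].text.value}
        for m in history.data
        if m.content and m.content[0].type == "text"
    ]
    return client.beta.threads.create(messages=messages).id

def run_branch(parent_thread_id: str, assistant_id: str, item: str) -> str:
    """Execute the remaining task steps for one generated item on its own thread."""
    branch_id = clone_thread(parent_thread_id)
    client.beta.threads.messages.create(
        thread_id=branch_id, role="user",
        content=f"Continue the workflow for this item: {item}",
    )
    client.beta.threads.runs.create_and_poll(
        thread_id=branch_id, assistant_id=assistant_id
    )
    return branch_id

# Fan out: one branch per generated item, executed in parallel.
items = ["patient group A", "patient group B"]         # hypothetical branch items
with concurrent.futures.ThreadPoolExecutor() as pool:
    branch_ids = list(pool.map(
        lambda it: run_branch("thread_parent_id", "asst_id", it), items
    ))
```

Running each branch on its own thread keeps per-item context isolated while the shared history still travels with every clone.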
3. Finalization
Most assistants output data that must conform to a specific database model or JSON schema, ensuring smooth integration with front-end components. Once finalization is complete and broader orchestration nears completion, the data is presented to the user.
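For example, a finalization step might validate the assistant’s final message against a Pydantic model before handing it to the front end; the schema below is hypothetical, not Muse’s actual data model:

```python
from pydantic import BaseModel, ValidationError

class RecruitmentAd(BaseModel):
    """Hypothetical output model -- the real schema is defined by Muse's database."""
    headline: str
    body: str
    call_to_action: str

def finalize(raw_output: str) -> RecruitmentAd:
    """Validate the assistant's final message against the expected schema."""
    try:
        return RecruitmentAd.model_validate_json(raw_output)
    except ValidationError as err:
        # A failed parse could be fed back to the assistant for a repair pass
        # before surfacing an error to the orchestrator.
        raise RuntimeError(f"Assistant output did not match schema: {err}")
```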
User Interaction with Assistant Outputs
While advanced AI models and well-designed assistants significantly enhance content generation, they aren’t yet capable of consistently delivering expert human-level performance. To ensure high-quality outputs, users can interact with and refine the content in multiple ways:
1. Chat About Content
Users can open a dedicated chat session on any generated content. (See image below)
These chats extend the original OpenAI Threads API execution, meaning:
Users can explore the reasoning behind the AI’s output, discuss source materials, or guide content in a new direction.
The AI retains context from the original workflow, allowing for informed and coherent responses.
If the conversation results in a rewrite, users can apply the change instantly, with updates reflected in the application. The chat interface is powered by real-time streaming via Server-Sent Events (SSE), ensuring a fast, responsive experience similar to a typical ChatGPT session.
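A simplified sketch of how such an SSE endpoint could relay a streamed run to the browser is shown below, using FastAPI purely for illustration; the route and parameters are assumptions, not Muse’s actual API:

```python
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from openai import OpenAI

app = FastAPI()
client = OpenAI()

@app.get("/content/{thread_id}/chat")                  # illustrative route
def chat_stream(thread_id: str, assistant_id: str):
    """Relay assistant output as Server-Sent Events while the run streams."""
    def event_source():
        with client.beta.threads.runs.stream(
            thread_id=thread_id, assistant_id=assistant_id
        ) as stream:
            for event in stream:
                # Forward only incremental text deltas to the browser.
                if event.event == "thread.message.delta":
                    for part in event.data.delta.content or []:
                        if part.type == "text":
                            yield f"data: {part.text.value}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_source(), media_type="text/event-stream")
```

Each `data:` line maps to one SSE message, which is what lets the interface render tokens as they arrive.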
2. Manual Edits
Users can directly edit content for more precise control.
For richly formatted content (e.g., advertisements), the system:
Converts it into Markdown for easy editing.
Restores it to its original rich format using a lightweight structured-output LLM upon saving (sketched below).
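The restore step could look roughly like the sketch below, using a small model in JSON mode; the model choice and field names are illustrative assumptions rather than Muse’s actual schema:

```python
import json

from openai import OpenAI

client = OpenAI()

def restore_rich_format(edited_markdown: str) -> dict:
    """Map user-edited Markdown back onto structured ad fields on save."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",                        # assumed lightweight model
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Convert the Markdown advertisement into JSON with keys "
                "'headline', 'body', and 'call_to_action'. Preserve wording exactly."
            )},
            {"role": "user", "content": edited_markdown},
        ],
    )
    return json.loads(response.choices[0].message.content)
```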
3. Manual Validation Review
While validator feedback is applied automatically to the initial output, any manual edits trigger a one-minute validation process to review the changes. Users can accept or reject validator recommendations. Over time, user feedback from this manual review will help refine automatic validations, improving relevance, accuracy, and inclusivity.
This collaborative feedback loop ensures that AI-generated content continues to evolve and align with expert human judgment.
Challenges and Continued Work
While Muse produces robust outputs, there are still key areas for improvement as development progresses.
1. Enhancing Content Quality with Limited User Input
One of the biggest challenges is ensuring novel and insightful content generation, even when user-provided data is sparse. A promising solution is to introduce more agentic behavior, enabling the system to:
Autonomously determine when additional real-world data is needed.
Select appropriate external sources to supplement missing information.
This evolution would make Muse more proactive in filling contextual gaps and ensuring high-quality outputs regardless of input variability. There is also an opportunity to leverage Deep Research here.
2. Improving Consistency in Creative Outputs
Muse’s most valuable contributions—such as identifying insightful patient groupings—require a level of creative ingenuity that can be challenging for a non-deterministic system.
Current Limitation – Some projects don’t always produce optimal results due to the inherent variability of LLM-generated outputs.
Potential Solution – Incorporating self-evaluation logic (see the sketch below), allowing the system to:
Assess whether its initial response sufficiently meets task requirements.
Iterate or refine its approach before finalizing an output.
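A bare-bones version of that self-evaluation loop might look like the sketch below; the rubric, acceptance threshold, and prompts are illustrative assumptions rather than a settled design:

```python
from openai import OpenAI

client = OpenAI()

def generate_with_self_check(task_prompt: str, max_attempts: int = 3) -> str:
    """Draft a response, score it against the task, and retry if it falls short."""
    feedback = ""
    for _ in range(max_attempts):
        draft = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": task_prompt + feedback}],
        ).choices[0].message.content

        score_text = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": (
                f"Task:\n{task_prompt}\n\nDraft:\n{draft}\n\n"
                "Rate 0-10 how well the draft meets the task. Reply with the number only."
            )}],
        ).choices[0].message.content

        try:
            if float(score_text.strip()) >= 8:       # assumed acceptance threshold
                return draft
        except ValueError:
            pass                                     # unparseable score: just retry
        feedback = "\n\nThe previous draft was judged insufficient; improve on it."
    return draft
```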
3. Lessons from Chain-of-Thought Models
Models like o1, which follow a chain-of-thought reasoning approach, tend to deliver more consistent responses in structured tasks. This is because they allow for continued processing when initial outputs fall short.
While directly integrating o1 isn’t feasible due to cost, speed, and creative performance trade-offs, adopting similar recursive reasoning techniques within our workflow could enhance consistency and output quality.
By addressing these challenges, Muse will become even more adaptive, intelligent, and reliable in generating high-value content.
Conclusion
Muse represents a major step forward in AI-driven patient recruitment, but its true power lies in its ability to evolve. By leveraging OpenAI’s Assistants and Threads API, we’ve built a system that not only streamlines content creation but also fosters meaningful user collaboration. With real-time interaction, validation loops, and dynamic workflow execution, Muse ensures that AI-generated content remains accurate, compliant, and adaptable to the unique needs of each clinical trial.
However, our work doesn’t stop here. Challenges like enhancing content quality with limited input, improving creative consistency, and adopting more structured reasoning techniques will shape our next phase of development. By integrating smarter decision-making, refining validation processes, and continuously learning from user feedback, we aim to make Muse an even more intelligent and proactive tool.
As we push the boundaries of AI in clinical trial recruitment - and across clinical development in general - we’re excited about what’s ahead, and we look forward to sharing more innovations that bridge the gap between automation and human expertise.
Interested in joining our team to work on AI for clinical development? Check out our open roles here.