How We Do This Research
In one sentence
We use a coordinated team of AI agents that read research papers independently, disagree with each other on purpose, and only publish when both readers are satisfied that the output matches what the source articles actually say.
This page explains what that means in practice — what the process looks like, what we catch by doing it this way, and what we still can't promise.
Why pair review, not single AI
Most AI health content is generated by one model writing one answer. Even when the model is rigorous, that approach has a built-in problem: there's nothing to catch the model's mistakes except the user.
We work differently. Two AI agents read the same source articles separately, before either sees the other's interpretation. Each forms an independent understanding of what the paper says. Then they compare — and where they disagree, the disagreement gets resolved through documented discussion, not silently overwritten.
This isn't "two checks instead of one." It's a different process entirely. Verification catches formatting errors. Independent reading catches substantive misses. When one reader writes "the trial showed efficacy" and the other reader, having read the same paper independently, says "the placebo response was 50% of the effect — that nuance is missing," the disagreement makes the gap visible. Neither agent alone produces that catch.
We commit to publishing only when both reading partners are satisfied that the output matches the source articles. Not when the synthesis is "done." Not when "the checklist is complete." When both readers are proud of the result.
The pipeline (what happens for each disease)
For each disease we cover, we follow this sequence:
1. Query design
We design literature search queries to cover treatment evidence, practice patterns, patient experience, and recent developments. Two agents review the queries independently before any searching begins. We document every query design decision so the search is reproducible.
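To make the documentation requirement concrete, here is a minimal sketch of what a recorded query could look like. The structure, field names, and query strings are illustrative placeholders, not our actual search terms.

```python
# Illustrative only: the queries and rationales below are placeholders.
from dataclasses import dataclass

@dataclass
class SearchQuery:
    coverage_area: str   # which slice of the evidence this query targets
    pubmed_query: str    # the literal string sent to PubMed
    rationale: str       # recorded so the search is reproducible

QUERIES = [
    SearchQuery(
        coverage_area="treatment evidence",
        pubmed_query='"peyronie disease" AND randomized controlled trial[pt]',
        rationale="Capture trials that directly test treatments.",
    ),
    SearchQuery(
        coverage_area="patient experience",
        pubmed_query='"peyronie disease" AND (quality of life OR patient reported)',
        rationale="Capture outcomes patients feel, not just measurements.",
    ),
]
```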
2. Abstract screening
We retrieve abstracts from PubMed and screen each one against inclusion criteria. For Peyronie's disease, we screened 1,378 abstracts before deciding which papers to read in full. Each abstract was independently classified by two agents; disagreements were reconciled before the article moved forward.
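The reconciliation rule is simple enough to state as code. In this sketch the agent objects and the discuss step are hypothetical stand-ins for the documented pair discussion, not a real interface:

```python
def screen_independently(abstract, agent_a, agent_b):
    """Each agent classifies the abstract against inclusion criteria
    without seeing the other's call ("include" or "exclude")."""
    return agent_a.classify(abstract), agent_b.classify(abstract)

def reconcile(pmid, call_a, call_b, discuss, log):
    """Agreement passes through; disagreement is resolved through the
    supplied discussion step and logged, never silently overwritten."""
    if call_a == call_b:
        return call_a
    resolution = discuss(pmid, call_a, call_b)  # the documented pair discussion
    log.append({"pmid": pmid, "a": call_a, "b": call_b, "resolved": resolution})
    return resolution
```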
3. Article curation
From the screened abstracts, we identify Essential papers (must read in full) and Supporting papers (read as abstracts with annotations). This is where the "tightening pass" happens — we ask of every Essential paper: would reading this paper change what we write? Papers that are valuable but redundant move to Supporting. The result is a smaller, sharper reading list.
For Peyronie's: 224 Essential papers identified, 58+ full-text articles read in depth.
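The tightening rule itself fits in a few lines. The would_change_output judgment is made by the reading pair, not by code; it appears below as a function only to make the sorting rule explicit.

```python
def tighten(candidates, would_change_output):
    """Sort screened papers into Essential (read in full) and Supporting
    (annotated abstract), using the one question we ask of every paper."""
    essential, supporting = [], []
    for paper in candidates:
        if would_change_output(paper):
            essential.append(paper)    # reading this would change what we write
        else:
            supporting.append(paper)   # valuable but redundant
    return essential, supporting
```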
4. Full-text reading
The writer pair reads each full-text article. The Writer reads first and synthesizes findings into the deliverable. W2 (the writing partner) then reads the same paper without seeing the Writer's synthesis and forms its own view of what the paper contains.
When W2's independent read surfaces something missing or different from the synthesis, it gets discussed. Sometimes the Writer was right and W2 missed context. Sometimes W2 caught a real gap. Either way, the disagreement makes the question visible.
5. Pair verification
After the deliverable is drafted, the research pair (R1+R2) verifies that:
- Every claim cites a specific source
- Every numerical claim matches the cited paper
- Every guideline reference matches the actual guideline language
- No claims exceed what abstract-only sources actually say
- No claim is updated silently; every change is reflected in the document
When the verification finds gaps, the writer pair revises. We don't sign off on the deliverable until verification confirms the synthesis matches the sources.
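As a sketch of what one verified claim carries through these checks (field names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class VerifiedClaim:
    claim_text: str    # the sentence as it appears in the deliverable
    pmid: str          # the specific cited source
    source_kind: str   # "full-text" or "abstract-only"
    source_value: str  # the number or guideline language as the source states it
    verified_by: str   # which reviewer (R1 or R2) confirmed the match

def matches_source(claim: VerifiedClaim, source_text: str) -> bool:
    # The real check is an agent re-reading the source, not string matching;
    # this condition only states the rule: the cited value must actually
    # appear in the cited source.
    return claim.source_value in source_text
```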
6. Re-read gates
For papers where selective reading missed important detail, we order full re-reads with documented modifications. Each re-read is gated — one paper at a time, full read by Writer, independent verification by W2, sign-off by the research pair. We don't batch through; we read one, close it, then move on.
For Peyronie's: 8 re-read cycles producing 53 documented modifications to the deliverable. Each modification was independently verified before it landed in the output.
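The gate reduces to a strictly sequential loop. The writer, w2, pair, and deliverable objects below are stand-ins for agent roles, not a real interface:

```python
def gated_rereads(papers, writer, w2, pair, deliverable):
    """One paper at a time: full read, independent verification, sign-off,
    then apply. Nothing about the next paper starts until this one closes."""
    for paper in papers:
        proposed = writer.full_read(paper)                 # proposed modifications
        verified = [m for m in proposed if w2.verifies(paper, m)]
        pair.sign_off(paper, verified)                     # R1+R2 close the gate
        deliverable.apply(verified)                        # only verified changes land
```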
Examples of corrections we caught
The pair-review process is only valuable if it actually catches things. Here are real corrections from this disease's research:
Flores 2022 — n=114, not n=509
A high-impact paper on CCH treatment outcomes was logged in screening as "n=509, 94% response rate." When the research pair enriched the record with the actual full text, the real numbers turned out to be n=114, 44% improvement, and 17% measured worsening. The screening note had conflated this paper with a different paper by the same author. Catching this prevented a 5x sample-size overstatement and a misleading response rate from entering the deliverable.
Hellstrom 2006 — secondary citation flagged
The deliverable cited specific numbers from Hellstrom 2006 (interferon trial). The verification pair flagged that the paper was paywalled and not on disk — so where did the specific n, percentages, and statistical results come from? The writer documented that the numbers came via Zucchi 2016's citation table, not from the primary source. The attribution was corrected to [as cited in Zucchi 2016] so readers know the source chain.
"Minimal venous flow" — clinical misinterpretation caught
During analysis of a patient's Doppler findings, the research pair initially flagged "minimal venous phase flow" as a possible venous leak signal warranting further evaluation. The writing partner caught the error in real time: in penile Doppler physiology, minimal venous phase flow is actually the desired finding (it indicates the veno-occlusive mechanism is working). The error would have sent the patient down a workup path he didn't need. The correction made it into the deliverable before publication.
CUA verapamil grading correction
A widely-cited "Guideline of Guidelines" (GoG) paper (Chierigo 2026) rendered the Canadian Urological Association's verapamil recommendation as "Level 2, Grade B." The full re-read of the actual CUA guideline (Bella 2018) showed the recommendation is graded "Level 3, Grade C." The research pair's enrichment had inherited the GoG's overstatement. Catching this prevented an overstated evidence claim about a treatment that — based on subsequent RCT evidence — produces zero curvature change in controlled trials.
Traction misattribution
The deliverable said "AUA lists traction as possibly promising." The full re-read of the AUA 2015 guideline showed the AUA actually says "insufficient evidence." The "possibly promising" framing came from a different (more recent) trial. The output was corrected to attribute the recommendation accurately and to note that the newer RCT data postdates the AUA guideline.
These aren't unusual catches. Across 53 modifications from 8 re-read articles, the pair process caught corrections at a rate of roughly 6–7 per article. Selective reading misses these. Full reading with independent verification catches them.
What "AI-generated" actually means here
A note on disclosure. Most "AI-generated" content is written by a single language model in one pass, with no verification step. Calling our research "AI-generated" doesn't capture what we do.
Here's the more accurate description:
This research is conducted by a coordinated team of AI agents using structured pair review. One AI reads each source article. A second AI reads the same article independently. Their findings are compared, and disagreements are resolved through documented discussion. Every numerical claim is verified against the source text. All work is conducted under human direction.
The process matters more than the tool. We don't think "AI-generated" is the right way to describe what's here, but we want users to know exactly how the content was produced, so they can decide for themselves how much to trust it.
What we don't promise
We are explicit about our limitations because we think honesty is part of the trust we're trying to build.
Abstract-only papers. Some research papers are paywalled and not accessible to us. When we can only read the abstract, we don't write past what the abstract says. We tag those claims [abstract only] and note when a high-impact paper couldn't be procured.
Paywalled sources. Some primary sources behind our citations are paywalled. Where this is true, we cite the secondary source (a review or guideline that quotes the primary) rather than pretending we read the primary directly.
Single-disease prototype. Peyronie's disease is the first disease we've researched at this depth. The methodology is built on prior research-pipeline experience but the platform itself is new. Methodology improvements are likely as we cover additional diseases.
Static publication. Once published, our content can become outdated as new evidence emerges. We will date-stamp each deliverable, note when major guidelines update, and revise content accordingly. But there's a lag between new evidence appearing and our content reflecting it.
Not medical advice. Everything here is educational. Every disease is different; every patient is different. We help you understand what the published evidence says about your condition and the questions worth asking your doctor — we do not replace your doctor's judgment about your specific case.
When we are wrong. AI agents make mistakes. So do experts. When we discover an error after publication, we correct the content, document the correction, and date-stamp when the change was made. You can see the correction history for any deliverable.
What this means for the content you read
Every claim in our deliverables traces to a specific cited source. When the source is full-text and we read it, the claim reflects what the paper actually says. When the source is abstract-only or behind a paywall, we tell you that. When two papers disagree, we present the disagreement honestly rather than averaging it away.
Every guideline reference matches the actual guideline language — not a summary, not an interpretation, not what the guideline "probably means." Where guideline committees disagree (and they do), we map the disagreement and let you see it.
We show our work because the work is the value. The synthesis isn't more important than the verification. The conclusion isn't more important than the citation chain. If you can trace every claim we make back to a real, cited, dated source — you can decide for yourself how confident to be in our analysis.
That's the standard we hold ourselves to: not "trust us because we're thorough," but "here's the evidence, here's how we read it, here's where we double-checked, and here's what we don't know."
How to verify our claims
Every cited paper in our deliverables includes its PubMed identifier (PMID). You can verify any claim we make by looking up the PMID at pubmed.ncbi.nlm.nih.gov — the abstract is always free, and many papers are open-access. If we cite a study saying "the trial showed X," you can check the abstract yourself and see whether we represented it accurately. If we cite a guideline saying "the AUA recommends Y," you can find the guideline through your urologist's office or your medical library. We don't ask you to trust us. We give you the tools to check.
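If you'd prefer to script the lookup, NCBI's public E-utilities service returns the same free abstract. A minimal sketch using only Python's standard library; the PMID shown is a placeholder, not one of our citations:

```python
import urllib.parse
import urllib.request

def fetch_abstract(pmid: str) -> str:
    """Fetch the free PubMed abstract for a given PMID via NCBI E-utilities."""
    params = urllib.parse.urlencode(
        {"db": "pubmed", "id": pmid, "rettype": "abstract", "retmode": "text"}
    )
    url = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?" + params
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

print(fetch_abstract("12345678"))  # placeholder PMID: substitute one from a deliverable
```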
Last reviewed: 2026-04-25. Process descriptions reflect current practice; methodology evolves as we cover additional diseases.
Questions about our process? Ask in the community or review the Evidence Reference for Brian's case to see the verification standard in practice.