About the Knowledge Hub
The RMBL Knowledge Hub is a unified search and discovery platform for environmental research at the Rocky Mountain Biological Laboratory in Gothic, Colorado. It connects scientific publications, community documents, research datasets, and a knowledge graph of species, concepts, protocols, and places studied at one of the longest-running field biology stations in North America.
Frequently Asked Questions
What is the RMBL Knowledge Hub?
The Knowledge Hub is a search and discovery tool that brings together the scientific output of RMBL and the Gunnison Basin into one searchable platform. It includes peer-reviewed publications dating back to 1928, community and policy documents from the Sustainable Living Library, and research datasets from multiple repositories. A knowledge graph connects these resources through shared species, concepts, research methods, and geographic locations.
Who is this for?
The Hub is designed for researchers, students, land managers, community members, and policymakers interested in the environmental research and stewardship of the Gunnison Basin. It is equally useful for scientists looking for related work and for community members exploring how research connects to local policy issues.
What are Knowledge Neighborhoods?
Knowledge Neighborhoods are research communities detected automatically by analyzing the connections in the knowledge graph. Using a community-detection algorithm (Louvain), the system identifies clusters of tightly connected authors, publications, species, concepts, and places. Each neighborhood represents a distinct research theme — from marmot behavioral ecology to watershed biogeochemistry to federal land management policy. Many neighborhoods include AI-generated research primers that summarize the key findings and cite specific publications.
How do I use the API or MCP server?
The Hub provides a REST API at /api/v1/ with endpoints for search, publication detail, entity lookup, related works, and more. Add ?format=text to any endpoint for LLM-friendly plain text. For AI assistants like Claude Desktop, an MCP server is available — see the MCP documentation for setup instructions. See /llms.txt for a machine-readable index of available endpoints.
How can I contribute or report issues?
The Knowledge Hub is developed and maintained by RMBL. If you notice missing publications, incorrect data, or have suggestions for improvement, please contact RMBL or submit an issue on the GitHub repository.
AI Integration
The Knowledge Hub can be queried by AI assistants via the REST API or the MCP (Model Context Protocol) server. This allows tools like Claude Desktop, ChatGPT, and custom scripts to search publications, explore research neighborhoods, and access the knowledge graph programmatically.
REST API
All API endpoints are at /api/v1/ and support ?format=text for LLM-friendly plain text. See /llms.txt for a complete list. Examples:
# Search for publications about alpine pollination
curl "https://rmblknowledgehub.org/api/v1/search?q=alpine+pollination&format=text"

# Get publication details
curl "https://rmblknowledgehub.org/api/v1/publications/13?format=text"

# Explore a research neighborhood with primer
curl "https://rmblknowledgehub.org/api/v1/neighborhoods/620?format=text"

# Look up a species
curl "https://rmblknowledgehub.org/api/v1/entities/species/8426?format=text"

# Find related works
curl "https://rmblknowledgehub.org/api/v1/related/publications/13?format=text"
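The curl calls above can also be issued from a script. The following is a minimal Python sketch using only the standard library; the helper names `build_url` and `fetch_text` are illustrative and not part of the Hub, and the only assumption about the API is what the examples above show (paths under /api/v1/ and the ?format=text parameter).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://rmblknowledgehub.org/api/v1"


def build_url(path, **params):
    """Build an API URL, always requesting the plain-text format."""
    query = urlencode({**params, "format": "text"})
    return f"{BASE_URL}/{path.strip('/')}?{query}"


def fetch_text(path, timeout=30, **params):
    """Fetch an endpoint and return the response body as a string."""
    with urlopen(build_url(path, **params), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `fetch_text("search", q="alpine pollination")` requests the same URL as the first curl command, and `fetch_text("publications/13")` requests the second.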
MCP Server for Claude Desktop
The MCP server gives Claude Desktop (and other MCP-compatible tools) direct access to 8 Knowledge Hub tools: search, publication/dataset/document lookup, entity detail, related works, and neighborhood exploration.
Step 1: Clone the repository and build the MCP server:
git clone https://github.com/ikb-rmbl/RMBL_knowledge_hub.git
cd RMBL_knowledge_hub/mcp
npm install
npm run build
Step 2: Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
  "mcpServers": {
    "rmbl-knowledge-hub": {
      "command": "node",
      "args": ["/path/to/RMBL_knowledge_hub/mcp/dist/index.js"],
      "env": {
        "RMBL_API_URL": "https://rmblknowledgehub.org"
      }
    }
  }
}
Step 3: Restart Claude Desktop. You should see 8 tools available (hammer icon). Try asking:
- “Search for publications about marmot hibernation at RMBL”
- “What is research neighborhood 620 about?”
- “Find works related to publication 13”
- “Look up the species Marmota flaviventer”
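If the Claude Desktop config file already lists other MCP servers, the entry from Step 2 has to be merged in rather than pasted over the file. The sketch below shows one way to do that in Python; the function names are hypothetical, and it assumes the config file is either missing or contains valid JSON.

```python
import json
from pathlib import Path


def add_rmbl_server(config, index_js_path, api_url="https://rmblknowledgehub.org"):
    """Return a copy of a Claude Desktop config with the RMBL server entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep any existing servers
    servers["rmbl-knowledge-hub"] = {
        "command": "node",
        "args": [str(index_js_path)],
        "env": {"RMBL_API_URL": api_url},
    }
    updated["mcpServers"] = servers
    return updated


def update_config_file(config_path, index_js_path):
    """Read the config (if present), add the server entry, and write it back."""
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    merged = add_rmbl_server(config, index_js_path)
    path.write_text(json.dumps(merged, indent=2) + "\n")
```

Calling `update_config_file("~/Library/Application Support/Claude/claude_desktop_config.json", "/path/to/RMBL_knowledge_hub/mcp/dist/index.js")` would produce the same entry as Step 2, with the placeholder path replaced by the real location of the build.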
Available MCP Tools
| Tool | Description |
|---|---|
| search_rmbl | Full-text search across all collections |
| get_publication | Publication detail with authors, abstract, entities, citations |
| get_dataset | Dataset detail with creators and entities |
| get_document | Document detail with entities and stakeholders |
| get_entity | Entity lookup (species, concept, protocol, place, stakeholder) |
| find_related | Related works via semantic similarity, shared entities, co-authorship, citations |
| explore_neighborhood | Research neighborhood detail with primer |
| list_neighborhoods | Browse or search 154 research neighborhoods |
Technical Deep-Dive
The sections below describe how data flows into the Knowledge Hub and how the knowledge graph is constructed.
Data Sources
Publications are sourced from the RMBL publications database, with additional discovery via OpenAlex and CrossRef. Each record is enriched with metadata from CrossRef (authors, DOIs, abstracts, citation counts) and Unpaywall (open access links). Full text is extracted from PDFs using pdftotext with OCR fallback via Tesseract.
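The pdftotext-then-OCR fallback can be sketched as follows. This is an illustration, not the pipeline's actual code (which is written in TypeScript): the `min_chars` threshold for deciding that a PDF has no usable text layer is an assumption, as is the use of pdftoppm to rasterize pages before passing them to Tesseract.

```python
import subprocess
import tempfile
from pathlib import Path


def pdftotext_extract(pdf_path):
    """Extract the embedded text layer with pdftotext ('-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", str(pdf_path), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def ocr_extract(pdf_path, dpi=300):
    """Rasterize pages with pdftoppm, then OCR each page image with Tesseract."""
    pages = []
    with tempfile.TemporaryDirectory() as tmp:
        prefix = str(Path(tmp) / "page")
        subprocess.run(["pdftoppm", "-r", str(dpi), "-png", str(pdf_path), prefix], check=True)
        for image in sorted(Path(tmp).glob("page-*.png")):
            ocr = subprocess.run(
                ["tesseract", str(image), "stdout"],
                capture_output=True, text=True, check=True,
            )
            pages.append(ocr.stdout)
    return "\n".join(pages)


def extract_full_text(pdf_path, min_chars=200,
                      primary=pdftotext_extract, fallback=ocr_extract):
    """Return (text, method). Fall back to OCR when the text layer is nearly empty.

    min_chars is an assumed heuristic; the source does not say what triggers the fallback.
    """
    text = primary(pdf_path)
    if len(text.strip()) >= min_chars:
        return text, "pdftotext"
    return fallback(pdf_path), "ocr"
```

Passing the extractors as parameters keeps the fallback decision testable without the command-line tools installed.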
Datasets are discovered from eight repository sources: EDI, DataONE, Dryad, Zenodo, USGS ScienceBase, Pangaea, NCBI, and Figshare. Each dataset is enriched with EML/DataCite metadata including temporal and spatial coverage, creator information, and licensing.
Documents come from the Sustainable Living Library, a collection of community and policy documents relevant to the Gunnison Basin. These include management plans, environmental impact statements, water quality reports, and local planning documents.
Author Deduplication
Authors are deduplicated across all collections using a two-phase process. First, authors with matching ORCID identifiers are merged. Then, authors sharing the same family name are compared by given name initials, with checks to prevent false merges when middle initials differ (e.g., “R. J. Smith” is kept separate from “R. A. Smith”). Author ordering on publications is repaired from CrossRef metadata to ensure correct first-author attribution.
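The two-phase comparison described above might look like the following pairwise check. It is a sketch under stated assumptions: the `Author` class and `same_person` function are hypothetical, two different ORCIDs are treated as blocking a merge, and a missing middle initial is treated as compatible with any middle initial. The source confirms only the ORCID merge and the conflicting-middle-initial rule.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    given: str
    family: str
    orcid: Optional[str] = None


def initials(given):
    """'Robert J.' -> ['R', 'J']; 'R. J.' -> ['R', 'J']."""
    tokens = re.split(r"[\s.\-]+", given.strip())
    return [t[0].upper() for t in tokens if t]


def same_person(a, b):
    # Phase 1: when both records carry an ORCID, the identifier decides.
    if a.orcid and b.orcid:
        return a.orcid == b.orcid
    # Phase 2: same family name, then compare given-name initials.
    if a.family.casefold() != b.family.casefold():
        return False
    ia, ib = initials(a.given), initials(b.given)
    if not ia or not ib or ia[0] != ib[0]:
        return False
    # Conflicting middle initials block the merge (R. J. vs R. A.);
    # positions missing on one side are skipped (assumption).
    return all(x == y for x, y in zip(ia[1:], ib[1:]))
```

With this check, `Author("R. J.", "Smith")` matches `Author("Robert J.", "Smith")` but not `Author("R. A.", "Smith")`.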
Entity Extraction & Knowledge Graph
Entities (species, concepts, protocols, places, and stakeholders) are extracted from publication and document full text using Claude vision models (VLM extraction). Each entity mention is linked to its source item with a confidence score and extraction method. Entities are then deduplicated using embedding-based clustering (Voyage AI voyage-4, 1024 dimensions) with type-specific similarity thresholds.
Species names are validated against the ITIS (Integrated Taxonomic Information System) database. Places are enriched with coordinates from GNIS (Geographic Names Information System) and organized into a parent-child hierarchy.
The resulting knowledge graph has 93,062 entity mentions linking items to entities, plus 115,075 citation references with internal cross-links between publications.
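To illustrate the embedding-based deduplication step, here is a small Python sketch that merges entities of the same type when the cosine similarity of their embeddings meets a per-type threshold. The threshold values, the single-linkage (union-find) grouping, and the two-dimensional toy vectors are all assumptions for illustration; the Hub uses 1024-dimensional Voyage AI embeddings, and its actual thresholds and clustering method are not stated here.

```python
import math

# Hypothetical thresholds; the source says only that they are type-specific.
THRESHOLDS = {"species": 0.95, "concept": 0.85, "place": 0.90}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_entities(entities, thresholds=THRESHOLDS, default=0.9):
    """entities: list of (name, type, embedding). Returns a list of name clusters."""
    parent = list(range(len(entities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(entities)):
        for j in range(i + 1, len(entities)):
            _, type_i, vec_i = entities[i]
            _, type_j, vec_j = entities[j]
            # Only entities of the same type are candidates for merging.
            if type_i == type_j and cosine(vec_i, vec_j) >= thresholds.get(type_i, default):
                parent[find(j)] = find(i)

    clusters = {}
    for i, (name, _, _) in enumerate(entities):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())
```

The pairwise loop is quadratic, which is fine for a sketch; at the scale of a real graph a nearest-neighbor index such as pgvector would be the more natural way to find candidate pairs.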
Community Detection & Primers
Knowledge Neighborhoods are detected using the Louvain community detection algorithm on the unified knowledge graph. The graph includes all entities and items as nodes, with edges from co-occurrence in publications, co-authorship, and citations. Edge weights are boosted for structural relationships (co-authorship ×5, citations ×3) to ensure that social and citation structure drives community boundaries rather than just shared terminology.
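The edge weighting can be sketched as below. The ×5 and ×3 multipliers come from the description above; the base weight of 1 for co-occurrence, the relation labels, and the choice to sum weights when two nodes are linked by more than one relation are assumptions.

```python
from collections import defaultdict

# Multipliers from the text; the co-occurrence base weight of 1.0 is assumed.
BOOST = {"co_occurrence": 1.0, "co_authorship": 5.0, "citation": 3.0}


def build_weighted_edges(relations):
    """relations: iterable of (node_a, node_b, kind).

    Returns {(a, b): weight} for an undirected graph, with node pairs
    stored in sorted order so (a, b) and (b, a) share one edge.
    """
    weights = defaultdict(float)
    for a, b, kind in relations:
        key = tuple(sorted((a, b)))
        weights[key] += BOOST[kind]
    return dict(weights)
```

The resulting weighted edge list could then be handed to any Louvain implementation that accepts edge weights, for example `louvain_communities` in NetworkX with `weight="weight"`.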
Research primers are generated for the largest neighborhoods using Claude (Opus model) with tiered context assembly: landmark papers (full abstracts + key findings), frontier papers (2020+), breadth papers (single best finding each), and entity context (species, concepts, methods, places). Each primer includes parenthetical citations linked to specific publications in the Hub. Policy-focused neighborhoods receive primers with document citations instead.
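As a rough illustration of tiered context assembly, the sketch below sorts a neighborhood's papers into landmark, frontier, and breadth tiers. Only the 2020 cutoff for frontier papers comes from the description above; ranking landmark papers by citation count, the `n_landmark` parameter, and the dictionary fields are assumptions made for the example.

```python
def assemble_tiers(papers, n_landmark=5, frontier_year=2020):
    """papers: list of dicts with 'id', 'year', and 'citations' keys.

    Returns {'landmark': [...], 'frontier': [...], 'breadth': [...]} of paper ids.
    Landmark selection by citation count is an assumed criterion.
    """
    by_citations = sorted(papers, key=lambda p: p["citations"], reverse=True)
    landmark = [p["id"] for p in by_citations[:n_landmark]]
    chosen = set(landmark)

    # Frontier: recent papers that were not already picked as landmarks.
    frontier = [p["id"] for p in papers
                if p["year"] >= frontier_year and p["id"] not in chosen]
    chosen.update(frontier)

    # Breadth: everything else, each contributing a single finding.
    breadth = [p["id"] for p in papers if p["id"] not in chosen]
    return {"landmark": landmark, "frontier": frontier, "breadth": breadth}
```

Each tier would then be rendered at a different level of detail (full abstracts for landmark papers, a single finding for breadth papers) before being passed to the model.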
Search & Similarity
Full-text search uses PostgreSQL tsvector with weighted ranking (title > abstract > full text) and stemmed query matching. Search results include highlighted snippets via ts_headline.
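A weighted tsvector search of this kind is commonly written along the following lines. The sketch wraps the SQL in Python so it can be passed to a PostgreSQL driver; the `publications` table, its column names, the stored `search_vector` column, and the `search_query` helper are hypothetical, and the `%(name)s` placeholders follow the psycopg convention. The PostgreSQL functions themselves (`setweight`, `to_tsvector`, `plainto_tsquery`, `ts_rank`, `ts_headline`) are standard.

```python
# Hypothetical schema: a `publications` table with a stored `search_vector` column.
# Weights A > B > C give the title > abstract > full text ranking.
BUILD_VECTOR_SQL = """
UPDATE publications
SET search_vector =
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(abstract, '')), 'B') ||
    setweight(to_tsvector('english', coalesce(full_text, '')), 'C')
"""

# The 'english' configuration applies stemming to both documents and queries.
SEARCH_SQL = """
SELECT id,
       title,
       ts_rank(search_vector, query) AS rank,
       ts_headline('english', coalesce(abstract, ''), query) AS snippet
FROM publications, plainto_tsquery('english', %(q)s) AS query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT %(limit)s
"""


def search_query(q, limit=20):
    """Return (sql, params) for a driver call such as cursor.execute(sql, params)."""
    limit = max(1, min(int(limit), 100))  # clamp to a sane page size
    return SEARCH_SQL, {"q": q.strip(), "limit": limit}
```

Passing the query text as a bound parameter, rather than interpolating it into the SQL string, avoids injection problems with user-supplied search terms.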
Related works are found using four signals: semantic similarity (pgvector cosine distance on Voyage AI embeddings), shared entity mentions (at least 3 shared entities), co-authorship (shared authors across publications), and citation links (from the references_cited table). Signals are merged with a multi-signal bonus for items connected by multiple pathways.
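One way to combine the four signals is sketched below. The minimum of 3 shared entities comes from the description above; the equal signal weights, the bonus value, and the assumption that each signal supplies scores on a comparable 0 to 1 scale are illustrative choices, not the Hub's actual parameters.

```python
# Hypothetical weights and bonus; the source does not give the actual values.
SIGNAL_WEIGHTS = {"semantic": 1.0, "shared_entities": 1.0,
                  "co_authorship": 1.0, "citation": 1.0}
MULTI_SIGNAL_BONUS = 0.25


def shared_entity_candidates(target_entities, other_items, min_shared=3):
    """Return {item_id: shared_count} for items sharing at least min_shared entities."""
    counts = {item: len(target_entities & ents) for item, ents in other_items.items()}
    return {item: n for item, n in counts.items() if n >= min_shared}


def merge_signals(signals, weights=SIGNAL_WEIGHTS, bonus=MULTI_SIGNAL_BONUS):
    """signals: {signal_name: {item_id: score}}.

    Returns a list of (item_id, combined_score, signal_names), best first.
    Items reached through several signals earn a bonus per extra signal.
    """
    combined = {}
    for name, scores in signals.items():
        for item, score in scores.items():
            entry = combined.setdefault(item, {"score": 0.0, "signals": []})
            entry["score"] += weights[name] * score
            entry["signals"].append(name)

    ranked = [
        (item, entry["score"] + bonus * (len(entry["signals"]) - 1), entry["signals"])
        for item, entry in combined.items()
    ]
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```

In this scheme a publication that is both semantically similar and linked by citation outranks one that is only semantically similar at a somewhat higher score, which is the intent of a multi-signal bonus.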
Technology Stack
The Knowledge Hub is built with Next.js and Payload CMS on PostgreSQL with pgvector. Graph visualizations use Sigma.js (WebGL). The data pipeline is a set of TypeScript scripts for scraping, enrichment, entity extraction, and graph construction. Vector embeddings are generated by Voyage AI (voyage-4, 1024 dimensions). The site is hosted on Vercel with the database on Neon (serverless PostgreSQL).
The project is open source at github.com/ikb-rmbl/RMBL_knowledge_hub.
Feedback & Contact
The Knowledge Hub is an evolving platform and we welcome feedback from the community. If you notice missing publications, incorrect data, broken links, or have ideas for new features, there are two ways to get in touch:
- Report an issue on GitHub: github.com/ikb-rmbl/RMBL_knowledge_hub/issues — best for bug reports, data corrections, and feature requests.
- Contact the developer: Ian Breckheimer — ikb@rmbl.org
Acknowledgments
The RMBL Knowledge Hub was developed with support from the Clark Family Foundation. It was built by RMBL using data from CrossRef, OpenAlex, Unpaywall, ITIS, GNIS, and multiple data repositories.
