About the Knowledge Hub

The RMBL Knowledge Hub is a unified search and discovery platform for environmental research at the Rocky Mountain Biological Laboratory in Gothic, Colorado. It connects scientific publications, community documents, research datasets, and a knowledge graph of species, concepts, protocols, and places studied at one of the longest-running field biology stations in North America.

At a Glance

  • 4,853 Publications
  • 1,426 Datasets
  • 1,381 Documents
  • 7,004 Authors
  • 1,206 Species
  • 4,874 Concepts
  • 1,474 Protocols
  • 1,954 Places
  • 154 Neighborhoods
  • 77 Research Primers
  • 93,062 Entity Mentions
  • 115,075 Citation Links

Frequently Asked Questions

What is the RMBL Knowledge Hub?

The Knowledge Hub is a search and discovery tool that brings together the scientific output of RMBL and the Gunnison Basin into one searchable platform. It includes peer-reviewed publications dating back to 1928, community and policy documents from the Sustainable Living Library, and research datasets from multiple repositories. A knowledge graph connects these resources through shared species, concepts, research methods, and geographic locations.

Who is this for?

The Hub is designed for researchers, students, land managers, community members, and policymakers interested in the environmental research and stewardship of the Gunnison Basin. It is equally useful for scientists looking for related work and for community members exploring how research connects to local policy issues.

What are Knowledge Neighborhoods?

Knowledge Neighborhoods are research communities detected automatically by analyzing the connections in the knowledge graph. Using a community-detection algorithm (Louvain), the system identifies clusters of tightly connected authors, publications, species, concepts, and places. Each neighborhood represents a distinct research theme — from marmot behavioral ecology to watershed biogeochemistry to federal land management policy. Many neighborhoods include AI-generated research primers that summarize the key findings and cite specific publications.

How do I use the API or MCP server?

The Hub provides a REST API at /api/v1/ with endpoints for search, publication detail, entity lookup, related works, and more. Add ?format=text to any endpoint for LLM-friendly plain text. For AI assistants like Claude Desktop, an MCP server is available — see the MCP documentation for setup instructions. See /llms.txt for a machine-readable index of available endpoints.

How can I contribute or report issues?

The Knowledge Hub is developed and maintained by RMBL. If you notice missing publications, incorrect data, or have suggestions for improvement, please contact RMBL or submit an issue on the GitHub repository.

AI Integration

The Knowledge Hub can be queried by AI assistants via the REST API or the MCP (Model Context Protocol) server. This allows tools like Claude Desktop, ChatGPT, and custom scripts to search publications, explore research neighborhoods, and access the knowledge graph programmatically.

REST API

All API endpoints are at /api/v1/ and support ?format=text for LLM-friendly plain text. See /llms.txt for a complete list. Examples:

# Search for publications about alpine pollination
curl "https://rmblknowledgehub.org/api/v1/search?q=alpine+pollination&format=text"

# Get publication details
curl "https://rmblknowledgehub.org/api/v1/publications/13?format=text"

# Explore a research neighborhood with primer
curl "https://rmblknowledgehub.org/api/v1/neighborhoods/620?format=text"

# Look up a species
curl "https://rmblknowledgehub.org/api/v1/entities/species/8426?format=text"

# Find related works
curl "https://rmblknowledgehub.org/api/v1/related/publications/13?format=text"
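
The same requests can be made from a script. The following is a minimal Python sketch using only the standard library; the helper names build_url and fetch_text are illustrative rather than part of the Hub, and the UTF-8 decoding of the response is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://rmblknowledgehub.org/api/v1"


def build_url(path: str, **params: str) -> str:
    """Build an API URL that always requests the plain-text format."""
    query = urlencode({**params, "format": "text"})
    return f"{BASE_URL}/{path.lstrip('/')}?{query}"


def fetch_text(path: str, timeout: float = 30.0, **params: str) -> str:
    """Fetch one endpoint and return the body as text (UTF-8 assumed)."""
    with urlopen(build_url(path, **params), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example calls (these perform network requests):
#   fetch_text("search", q="alpine pollination")
#   fetch_text("publications/13")
```

The build_url helper reproduces the query strings used in the curl examples, so either approach should return the same plain-text output.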

MCP Server for Claude Desktop

The MCP server gives Claude Desktop (and other MCP-compatible tools) direct access to 8 Knowledge Hub tools: search, publication/dataset/document lookup, entity detail, related works, and neighborhood exploration.

Step 1: Clone the repository and build the MCP server:

git clone https://github.com/ikb-rmbl/RMBL_knowledge_hub.git
cd RMBL_knowledge_hub/mcp
npm install
npm run build

Step 2: Add the server to your Claude Desktop config (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "rmbl-knowledge-hub": {
      "command": "node",
      "args": ["/path/to/RMBL_knowledge_hub/mcp/dist/index.js"],
      "env": {
        "RMBL_API_URL": "https://rmblknowledgehub.org"
      }
    }
  }
}
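
If the config file already lists other MCP servers, the new entry has to be merged in rather than pasted over them. The sketch below shows one way to do that in Python; the function names are hypothetical, and the path to index.js is left as the placeholder from the config above.

```python
import json
from pathlib import Path


def merge_server(config: dict, server_js: str) -> dict:
    """Return a copy of the config with the rmbl-knowledge-hub entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # keep existing servers
    servers["rmbl-knowledge-hub"] = {
        "command": "node",
        "args": [server_js],
        "env": {"RMBL_API_URL": "https://rmblknowledgehub.org"},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(config_path: Path, server_js: str) -> None:
    """Read the config if it exists, merge the entry, and write it back."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(merge_server(config, server_js), indent=2) + "\n")
```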

Step 3: Restart Claude Desktop. You should see 8 tools available (hammer icon). Try asking:

  • “Search for publications about marmot hibernation at RMBL”
  • “What is research neighborhood 620 about?”
  • “Find works related to publication 13”
  • “Look up the species Marmota flaviventer”

Available MCP Tools

Tool                  Description
search_rmbl           Full-text search across all collections
get_publication       Publication detail with authors, abstract, entities, citations
get_dataset           Dataset detail with creators and entities
get_document          Document detail with entities and stakeholders
get_entity            Entity lookup (species, concept, protocol, place, stakeholder)
find_related          Related works via semantic similarity, shared entities, co-authorship, citations
explore_neighborhood  Research neighborhood detail with primer
list_neighborhoods    Browse or search 154 research neighborhoods

Technical Deep-Dive

The sections below describe how data flows into the Knowledge Hub and how the knowledge graph is constructed.

Data Sources

Publications are sourced from the RMBL publications database, with additional discovery via OpenAlex and CrossRef. Each record is enriched with metadata from CrossRef (authors, DOIs, abstracts, citation counts) and Unpaywall (open access links). Full text is extracted from PDFs using pdftotext with OCR fallback via Tesseract.
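
The extraction step with its OCR fallback could look roughly like the Python sketch below. It shells out to the pdftotext, pdftoppm, and tesseract command-line tools. The rule for deciding when to fall back (too few non-whitespace characters per page) and its threshold are assumptions for illustration, not the Hub's actual criterion.

```python
import subprocess
import tempfile
from pathlib import Path

MIN_CHARS_PER_PAGE = 200  # hypothetical threshold, not taken from the Hub


def needs_ocr(text: str, pages: int, min_chars: int = MIN_CHARS_PER_PAGE) -> bool:
    """Treat a PDF as scanned if it yields too little non-whitespace text."""
    printable = sum(1 for ch in text if not ch.isspace())
    return printable < min_chars * max(pages, 1)


def run_pdftotext(pdf: Path) -> str:
    """Extract the embedded text layer; '-' sends the output to stdout."""
    cmd = ["pdftotext", "-layout", str(pdf), "-"]
    done = subprocess.run(cmd, capture_output=True, encoding="utf-8", check=True)
    return done.stdout


def run_ocr(pdf: Path, dpi: int = 300) -> str:
    """Rasterize each page with pdftoppm, then OCR each image with Tesseract."""
    with tempfile.TemporaryDirectory() as tmp:
        prefix = Path(tmp) / "page"
        subprocess.run(["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)], check=True)
        texts = []
        for image in sorted(Path(tmp).glob("page-*.png")):
            cmd = ["tesseract", str(image), "stdout"]
            done = subprocess.run(cmd, capture_output=True, encoding="utf-8", check=True)
            texts.append(done.stdout)
    return "\n".join(texts)


def extract_text(pdf: Path) -> str:
    """Try the text layer first and fall back to OCR when it looks empty."""
    text = run_pdftotext(pdf)
    pages = text.count("\f") or 1  # pdftotext emits a form feed per page
    return run_ocr(pdf) if needs_ocr(text, pages) else text
```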

Datasets are discovered from eight repositories: EDI, DataONE, Dryad, Zenodo, USGS ScienceBase, Pangaea, NCBI, and Figshare. Each dataset is enriched with EML/DataCite metadata including temporal and spatial coverage, creator information, and licensing.

Documents come from the Sustainable Living Library, a collection of community and policy documents relevant to the Gunnison Basin. These include management plans, environmental impact statements, water quality reports, and local planning documents.

Author Deduplication

Authors are deduplicated across all collections using a two-phase process. First, authors with matching ORCID identifiers are merged. Then, authors sharing the same family name are compared by given name initials, with checks to prevent false merges when middle initials differ (e.g., “R. J. Smith” is kept separate from “R. A. Smith”). Author ordering on publications is repaired from CrossRef metadata to ensure correct first-author attribution.
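
The two phases can be sketched in Python as follows. This is an illustrative reading of the description above, not the Hub's implementation: the Author record, the rule that a missing middle initial is compatible with any middle initial, and the rule that two different ORCIDs block a merge are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    given: str
    family: str
    orcid: Optional[str] = None


def initials(given: str) -> list[str]:
    """First letter of each given-name token: 'Robert J.' -> ['R', 'J']."""
    tokens = given.replace(".", " ").replace("-", " ").split()
    return [token[0].upper() for token in tokens]


def compatible(a: Author, b: Author) -> bool:
    """Same family name, no ORCID conflict, and no differing initial."""
    if a.family.casefold() != b.family.casefold():
        return False
    if a.orcid and b.orcid and a.orcid != b.orcid:
        return False
    ia, ib = initials(a.given), initials(b.given)
    if not ia or not ib:
        return False
    return all(x == y for x, y in zip(ia, ib))  # compares shared positions only


def dedupe(authors: list[Author]) -> list[list[Author]]:
    parent = list(range(len(authors)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    # Phase 1: merge records that share an ORCID.
    first_seen: dict[str, int] = {}
    for i, author in enumerate(authors):
        if author.orcid:
            if author.orcid in first_seen:
                union(i, first_seen[author.orcid])
            else:
                first_seen[author.orcid] = i

    # Phase 2: merge records with the same family name and compatible initials.
    for i in range(len(authors)):
        for j in range(i + 1, len(authors)):
            if compatible(authors[i], authors[j]):
                union(i, j)

    groups: dict[int, list[Author]] = {}
    for i, author in enumerate(authors):
        groups.setdefault(find(i), []).append(author)
    return list(groups.values())
```

One caveat of this pairwise approach: an ambiguous record such as "R. Smith" is compatible with both "R. J. Smith" and "R. A. Smith" and would chain them into one group, so a production version needs an extra guard for that case.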

Entity Extraction & Knowledge Graph

Entities (species, concepts, protocols, places, and stakeholders) are extracted from publication and document full text using Claude vision models (VLM extraction). Each entity mention is linked to its source item with a confidence score and extraction method. Entities are then deduplicated using embedding-based clustering (Voyage AI voyage-4, 1024 dimensions) with type-specific similarity thresholds.
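
As a rough illustration of threshold-based deduplication, the Python sketch below groups entities greedily by cosine similarity within each entity type. The threshold values, the greedy first-match strategy, and the three-dimensional toy vectors are assumptions; the Hub's embeddings have 1024 dimensions and its actual clustering method and thresholds are not specified here.

```python
import math

# Hypothetical per-type thresholds; the Hub's real values are not published here.
THRESHOLDS = {"species": 0.95, "concept": 0.85, "place": 0.90}
DEFAULT_THRESHOLD = 0.90


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(entities: list[tuple[str, str, list[float]]]) -> list[list[str]]:
    """Group (name, type, vector) records; only same-type records can merge."""
    clusters: list[tuple[str, list[float], list[str]]] = []
    for name, etype, vector in entities:
        threshold = THRESHOLDS.get(etype, DEFAULT_THRESHOLD)
        for ctype, representative, names in clusters:
            if ctype == etype and cosine(representative, vector) >= threshold:
                names.append(name)  # join the first sufficiently similar cluster
                break
        else:
            clusters.append((etype, vector, [name]))  # start a new cluster
    return [names for _, _, names in clusters]
```

Using the first member of each cluster as its representative keeps the sketch short; averaging member vectors or using a linkage-based method are common alternatives.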

Species names are validated against the ITIS (Integrated Taxonomic Information System) database. Places are enriched with coordinates from GNIS (Geographic Names Information System) and organized into a parent-child hierarchy.

The resulting knowledge graph has 93,062 entity mentions linking items to entities, plus 115,075 citation references with internal cross-links between publications.

Community Detection & Primers

Knowledge Neighborhoods are detected using the Louvain community detection algorithm on the unified knowledge graph. The graph includes all entities and items as nodes, with edges from co-occurrence in publications, co-authorship, and citations. Edge weights are boosted for structural relationships (co-authorship ×5, citations ×3) to ensure that social and citation structure drives community boundaries rather than just shared terminology.
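
The weighting scheme can be sketched in Python as below. The ×5 and ×3 boosts come from the description above; the base weight of 1 for co-occurrence, the summing of parallel edges, and the node naming are assumptions. The community step uses the third-party networkx library's louvain_communities function, which may differ from whatever Louvain implementation the pipeline actually uses.

```python
from collections import defaultdict
from typing import Iterable

# Boosts from the text; the base weight of 1.0 is an assumption.
BOOST = {"co_occurrence": 1.0, "co_authorship": 5.0, "citation": 3.0}


def weighted_edges(raw_edges: Iterable[tuple[str, str, str]]) -> dict[tuple[str, str], float]:
    """Collapse (node_a, node_b, kind) records into undirected edge weights."""
    weights: dict[tuple[str, str], float] = defaultdict(float)
    for a, b, kind in raw_edges:
        if a == b:
            continue  # ignore self-loops
        key = (a, b) if a <= b else (b, a)  # undirected: order the pair
        weights[key] += BOOST[kind]  # assumption: parallel edges add up
    return dict(weights)


def detect_communities(weights: dict[tuple[str, str], float], seed: int = 42) -> list[set[str]]:
    """Run Louvain on the weighted graph (requires: pip install networkx)."""
    import networkx as nx  # imported lazily so weighted_edges has no dependency

    graph = nx.Graph()
    for (a, b), weight in weights.items():
        graph.add_edge(a, b, weight=weight)
    return nx.community.louvain_communities(graph, weight="weight", seed=seed)
```

Louvain is randomized, so fixing the seed keeps neighborhood assignments reproducible between runs.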

Research primers are generated for the largest neighborhoods using Claude (Opus model) with tiered context assembly: landmark papers (full abstracts + key findings), frontier papers (2020+), breadth papers (single best finding each), and entity context (species, concepts, methods, places). Each primer includes parenthetical citations linked to specific publications in the Hub. Policy-focused neighborhoods receive primers with document citations instead.

Search & Similarity

Full-text search uses PostgreSQL tsvector with weighted ranking (title > abstract > full text) and stemmed query matching. Search results include highlighted snippets via ts_headline.
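
A query of this kind might be assembled as in the Python sketch below. The table and column names (publications, title, abstract, full_text) and the %(q)s placeholder style are assumptions for illustration. A production schema would more likely store a precomputed, indexed tsvector column than build the vector inline.

```python
# Weighted document vector: title (A) > abstract (B) > full text (C).
# Column names are hypothetical.
SEARCH_VECTOR = (
    "setweight(to_tsvector('english', coalesce(title, '')), 'A') || "
    "setweight(to_tsvector('english', coalesce(abstract, '')), 'B') || "
    "setweight(to_tsvector('english', coalesce(full_text, '')), 'C')"
)


def build_search_sql(limit: int = 20) -> str:
    """Return a parameterized SQL string; the search terms bind to %(q)s."""
    return f"""
        SELECT id,
               title,
               ts_rank({SEARCH_VECTOR}, query) AS rank,
               ts_headline('english', coalesce(abstract, ''), query) AS snippet
        FROM publications,
             plainto_tsquery('english', %(q)s) AS query
        WHERE ({SEARCH_VECTOR}) @@ query
        ORDER BY rank DESC
        LIMIT {int(limit)}
    """
```

The 'english' text search configuration applies stemming to both the document and the query, and ts_rank weights the A, B, and C labels in decreasing order by default, which yields the title > abstract > full text ordering.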

Related works are found using four signals: semantic similarity (pgvector cosine distance on Voyage AI embeddings), shared entity mentions (at least 3 shared entities), co-authorship (shared authors across publications), and citation links (from the references_cited table). Signals are merged with a multi-signal bonus for items connected by multiple pathways.
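
One way to merge the four signals is sketched below in Python. The per-signal weights, the size of the multi-signal bonus, and the assumption that each signal arrives as a score between 0 and 1 are illustrative choices, not the Hub's actual values; only the minimum of 3 shared entities comes from the description above.

```python
# Hypothetical weights and bonus; only MIN_SHARED_ENTITIES comes from the text.
SIGNAL_WEIGHTS = {
    "semantic": 1.0,
    "shared_entities": 1.0,
    "co_authorship": 1.0,
    "citation": 1.0,
}
MULTI_SIGNAL_BONUS = 0.25
MIN_SHARED_ENTITIES = 3


def shared_entity_score(a: set[str], b: set[str]) -> float:
    """Jaccard overlap, counted only when at least 3 entities are shared."""
    shared = a & b
    if len(shared) < MIN_SHARED_ENTITIES:
        return 0.0
    return len(shared) / len(a | b)


def merge_signals(signals: dict[str, dict[str, float]]) -> list[tuple[str, float, list[str]]]:
    """Combine {signal: {item_id: score}} into a ranked list of candidates."""
    combined: dict[str, dict] = {}
    for name, scores in signals.items():
        for item, score in scores.items():
            entry = combined.setdefault(item, {"score": 0.0, "signals": []})
            entry["score"] += SIGNAL_WEIGHTS[name] * score
            entry["signals"].append(name)

    ranked = []
    for item, entry in combined.items():
        # Reward items reached through more than one pathway.
        bonus = MULTI_SIGNAL_BONUS * (len(entry["signals"]) - 1)
        ranked.append((item, entry["score"] + bonus, sorted(entry["signals"])))
    ranked.sort(key=lambda row: row[1], reverse=True)
    return ranked
```

Returning the contributing signal names alongside the score makes it possible to explain to a reader why a given work was suggested.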

Technology Stack

The Knowledge Hub is built with Next.js and Payload CMS on PostgreSQL with pgvector. Graph visualizations use Sigma.js (WebGL). The data pipeline is a set of TypeScript scripts for scraping, enrichment, entity extraction, and graph construction. Vector embeddings are generated by Voyage AI (voyage-4, 1024 dimensions). The site is hosted on Vercel with the database on Neon (serverless PostgreSQL).

The project is open source at github.com/ikb-rmbl/RMBL_knowledge_hub.

Feedback & Contact

The Knowledge Hub is an evolving platform and we welcome feedback from the community. If you notice missing publications, incorrect data, or broken links, or have ideas for new features, there are two ways to get in touch: contact RMBL directly or submit an issue on the GitHub repository.

Acknowledgments

The RMBL Knowledge Hub was developed with support from the Clark Family Foundation. It was built by RMBL using data from CrossRef, OpenAlex, Unpaywall, ITIS, GNIS, and multiple data repositories.