Repository Layout
Research library directories and material boundaries
This repository is a local research workspace and Vocs documentation site for an ear-EEG and cEEGrid literature review. It contains scripts, metadata, paper cards, evidence matrices, narrative drafts, and generated docs pages used to study 23 local PDFs from US-pdf/.
Repository Layout
Ear_EEG_cEEGrid_history.md: Chinese chronology and technical route guide.
Ear_EEG_cEEGrid_Systematic_Review_Draft.md: working systematic-review narrative.
Prompt_EarEEG_cEEGrid_Review.md: extraction prompt and review role definition.
scripts/: local processing scripts for PDF extraction, evidence indexing, paper-card templates, and Vocs page generation.
src/pages/: generated Vocs MDX documentation pages.
public/papers/: generated static PDF copies served by the Vocs/Vite build and Vercel; this directory is rebuilt from US-pdf/ and is not committed.
metadata/us_papers.json: manifest generated from the current 23-paper set.
reports/us_pdf_processing_report.md: current processing status.
library/: categorized PDF copies, extracted text, reading index, detailed paper cards, evidence index, evidence matrix, and mainline secondary synthesis.
library/SECONDARY_SYNTHESIS_MAINLINES.md: cross-paper synthesis organized by hardware, electrodes, experiments, signal processing, and engineering routes.
library/GAP_AUDIT_AND_VERIFICATION_PLAN.md: missing-field audit and follow-up verification plan for uncertain hardware, electrode, and processing details.
US-pdf/: source PDFs used by the processing scripts.
vocs.config.ts: Vocs navigation, search, and static rendering configuration.
vite.config.ts: Vite+ formatting and linting scope.
vercel.json: Vercel build/output/header configuration.
.github/workflows/docs.yml: GitHub Actions check and docs build workflow.
Data Policy
Original PDFs, categorized PDF copies, extracted text files, metadata, reports, and review notes are committed as part of this repository so the local study state is reproducible from Git. OCR PDFs and HTML fallback sources remain ignored unless they become intentional deliverables.
scripts/generate_vocs_docs.mjs copies the 23 source PDFs into public/papers/, a Git-ignored build input, under stable filenames and adds a download link to each paper detail page. This is intended for private or research-use deployments. Before publishing publicly, review the copyright, open-access status, and redistribution permission of each PDF; remove any PDF that should not be redistributed and replace its download entry with a DOI, PubMed, PMC, journal, or official open-version link.
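The copy step can be illustrated with a minimal Python sketch. It is not the code in scripts/generate_vocs_docs.mjs (which is a Node script); the function names stable_name and copy_papers, the slug rule, and the (index, title, path) input shape are all assumptions made for illustration. The /papers/ URL prefix relies on Vite serving the contents of public/ from the site root.

```python
import re
import shutil
from pathlib import Path


def stable_name(index: int, title: str) -> str:
    """Build a deterministic, URL-safe filename from a paper index and title.

    Hypothetical naming rule; the real generator may use a different one.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{index:02d}-{slug[:60].rstrip('-')}.pdf"


def copy_papers(papers: list[tuple[int, str, Path]], out_dir: Path) -> dict[int, str]:
    """Copy each (index, title, source_path) into out_dir.

    Returns a mapping from paper index to the site-relative download URL.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    links: dict[int, str] = {}
    for index, title, source in papers:
        name = stable_name(index, title)
        shutil.copyfile(source, out_dir / name)
        links[index] = f"/papers/{name}"
    return links
```

Because the filename depends only on the index and title, rebuilding public/papers/ from US-pdf/ yields the same URLs each time, so links on the paper detail pages stay valid across builds.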
Docs Commands
Use Vite+ for the docs toolchain:
vp install
vp run dev
vp check
vp run build
vp run preview
Do not use bare vp build for this project. Vite+ reserves that command for the built-in Vite production build. Use vp run build so the package script runs scripts/generate_vocs_docs.mjs and then vocs build.
The Vercel project should use the checked-in vercel.json:
installCommand: pnpm install --frozen-lockfile
buildCommand: pnpm exec vp run build
outputDirectory: dist/public
GitHub Actions runs vp install --frozen-lockfile, vp check, and vp run build on pull requests and pushes to main.
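If the Vercel project settings drift from the checked-in file, a small check like the following could flag it. This is a sketch only: the helper names vercel_mismatches and check_vercel_json are hypothetical, and the expected values are simply the three listed above.

```python
import json
from pathlib import Path

# Values taken from the README text above.
EXPECTED = {
    "installCommand": "pnpm install --frozen-lockfile",
    "buildCommand": "pnpm exec vp run build",
    "outputDirectory": "dist/public",
}


def vercel_mismatches(config: dict) -> dict[str, tuple]:
    """Return {key: (expected, actual)} for every key that differs or is missing."""
    return {
        key: (expected, config.get(key))
        for key, expected in EXPECTED.items()
        if config.get(key) != expected
    }


def check_vercel_json(path: Path = Path("vercel.json")) -> dict[str, tuple]:
    """Load vercel.json from disk and compare it with the expected values."""
    return vercel_mismatches(json.loads(path.read_text(encoding="utf-8")))
```

An empty result means the three settings match; any entry shows the expected value next to what the file currently contains.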
Research Commands
python3 scripts/process_us_papers.py
python3 scripts/build_evidence_index.py
python3 scripts/build_paper_cards_template.py
node scripts/generate_vocs_docs.mjs
process_us_papers.py uses pdftotext first. If a PDF lacks a usable text layer, it can use local ocrmypdf/tesseract to generate OCR copies under library/ocr_pdfs/.
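The text-first, OCR-fallback order might look roughly like the sketch below. It is not the actual code in process_us_papers.py: the function names, the MIN_CHARS threshold, and the rule for deciding that a text layer is "usable" are assumptions. It calls the pdftotext and ocrmypdf command-line tools, which must be installed locally.

```python
import shutil
import subprocess
from pathlib import Path

MIN_CHARS = 500  # assumed threshold; the real script may use a different rule


def has_usable_text(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Treat the text layer as usable if it has enough non-whitespace characters."""
    return sum(1 for ch in text if not ch.isspace()) >= min_chars


def pdftotext(pdf: Path) -> str:
    """Run the pdftotext CLI; the trailing "-" sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def extract_text(pdf: Path, ocr_dir: Path = Path("library/ocr_pdfs")) -> str:
    """Try the embedded text layer first; fall back to an OCR copy if needed."""
    text = pdftotext(pdf)
    if has_usable_text(text):
        return text
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError(f"{pdf.name}: no usable text layer and ocrmypdf is not installed")
    ocr_dir.mkdir(parents=True, exist_ok=True)
    ocr_pdf = ocr_dir / pdf.name
    # ocrmypdf drives tesseract; --skip-text leaves pages that already have text untouched.
    subprocess.run(["ocrmypdf", "--skip-text", str(pdf), str(ocr_pdf)], check=True)
    return pdftotext(ocr_pdf)
```

Keeping the usability check in its own function makes the threshold easy to tune and to test without invoking either external tool.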
Current Status
The current local run processed all 23 papers successfully. Paper 18 was corrected to the HardwareX/PMC PDF for Knierim et al. 2022. The earlier mismatched Physics Letters B PDF was removed from the repository; the source-mismatch directory only documents that event.
Legacy papers/ Directory
Paper Library
This directory documents remnants from the earlier downloader workflow. The active workflow now uses PDFs in US-pdf/ and writes processed outputs to library/. Any PDF files under this tree are kept only when useful for provenance.
Layout
US-pdf/: 23 local PDFs supplied by the user, with paper 18 replaced by the correct HardwareX/PMC PDF after source-integrity checking.
library/pdfs_by_category/<category>/: categorized PDF copies.
library/texts/<category>/: extracted text for local reading.
metadata/us_papers.json: active manifest for titles, categories, paths, and processing status.
reports/us_pdf_processing_report.md: processing status by paper.
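To get a quick view of the manifest, something like the following sketch could tally processing status per category. The manifest schema is not documented here, so the field names "category" and "status", and the optional "papers" wrapper key, are assumptions; adjust them to whatever metadata/us_papers.json actually contains.

```python
import json
from collections import Counter
from pathlib import Path


def summarize(papers: list[dict]) -> dict[str, Counter]:
    """Count papers per processing status within each category.

    "category" and "status" are assumed field names, not confirmed ones.
    """
    summary: dict[str, Counter] = {}
    for paper in papers:
        category = paper.get("category", "uncategorized")
        status = paper.get("status", "unknown")
        summary.setdefault(category, Counter())[status] += 1
    return summary


def load_manifest(path: Path = Path("metadata/us_papers.json")) -> list[dict]:
    """Load the manifest, accepting a bare list or a {"papers": [...]} wrapper."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return data["papers"] if isinstance(data, dict) else data
```

Calling summarize(load_manifest()) from the repository root would then give one Counter per category, which can be compared against reports/us_pdf_processing_report.md.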
Commands
Process the local PDF set:
python3 scripts/process_us_papers.py
python3 scripts/build_evidence_index.py
python3 scripts/build_paper_cards_template.py
Notes
The processor first tries pdftotext. If a PDF lacks a usable text layer, it uses local ocrmypdf/tesseract to create an OCR copy under library/ocr_pdfs/. In the current 23-paper set, all PDFs already had sufficient text layers, so OCR copies were not needed. The earlier paper 18 source mismatch was removed after the correct HardwareX/PMC PDF was restored; the active library now processes all 23 papers.
PDF Policy Mismatch
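The root README and AGENTS currently treat the PDFs and extracted text as part of the reproducible state, but US-pdf/README.md still says the PDFs are intentionally ignored by Git. The first version of the site takes the conservative approach: it shows only local paths and source-integrity information and does not add PDF download links.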
Local PDF Sources
Place the 23 source PDFs for the ear-EEG / cEEGrid review in this directory.
The PDFs themselves are intentionally ignored by Git. The expected filenames are listed in scripts/process_us_papers.py.
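Before running the processor, it can help to confirm that every expected PDF is present. The sketch below assumes the expected filenames are available as a plain list; how scripts/process_us_papers.py actually stores them is not shown here, and missing_pdfs is a hypothetical helper.

```python
from pathlib import Path


def missing_pdfs(expected: list[str], source_dir: Path = Path("US-pdf")) -> list[str]:
    """Return the expected filenames that are not present in source_dir, sorted."""
    present = {path.name for path in source_dir.glob("*.pdf")}
    return sorted(name for name in expected if name not in present)
```

An empty list means the directory holds every expected file; otherwise the returned names are the PDFs still to be placed in US-pdf/.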