Add scientific data and geospatial publishing guides & case study by 2color · Pull Request #2243 · ipfs/ipfs-docs

2color · 2026-01-23T15:16:10Z

What

Add a new geospatial publishing guide

Preview: https://bafybeiaj6f6pnzpoq4ktj5ubgw4de7fk7fzfixvcj7fipignkdwmo2ubk4.ipfs.dweb.link/how-to/publish-geospatial-data/

github-actions · 2026-01-23T15:17:52Z

🚀 Build Preview on IPFS ready

🔎 Commit: e2b3a6e
🔏 CID bafybeih6r67ylni7hikg76j4yxqxtyzuw3uopedxajgcqpf52pnswx6wci
📦 Preview:
- dweb.link
- inbrowser.link

mishmosh · 2026-01-23T15:36:52Z

This is great as a specific how-to. Is there another, complementary place we can write about all the ways geospatial users can benefit from IPFS?

From live meeting:

Consider title “Scientific Data” as category
- Ecosystem Tooling
- Guide to Publishing Scientific Data
IPFS is used by the geospatial community for better collaboration, data integrity, and open access.
(make sure we can describe some of the architectures used)
- Connecting kubo to your existing data repositories (stac catalog)
- Private clusters (but open retrieval) or “Collaborative publishing”
- Provenance

mishmosh

A few suggestions and comments inline, but I'm confident you can take it from here. Would also like to see @vmx review.

docs/how-to/publish-geospatial-data.md

docs/how-to/scientific-data/publish-geospatial-zarr-data.md

Co-authored-by: Volker Mische <volker.mische@gmail.com>

Co-authored-by: Mosh <1306020+mishmosh@users.noreply.github.com>

2color · 2026-02-06T14:39:58Z

docs/how-to/scientific-data/publish-geospatial-zarr-data.md

+
+Users can then access your data by using the `ipns://` prefix.
+
+TODO: add example.


Note to self: add example

lidel · 2026-02-06T17:44:32Z

docs/how-to/scientific-data/publish-geospatial-zarr-data.md

+         --raw-leaves \
+         --chunker=size-1048576 \
+         --cid-version=1 \


Once Kubo 0.40 ships, these flags could be removed and replaced by a one-time `ipfs config profile apply unixfs-v1-2025`, or by setting the `Import.*` values one by one.
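
For context on what `--chunker=size-1048576` asks for: each file is split into fixed 1 MiB (1048576-byte) pieces before hashing. The following is a minimal sketch of that splitting step only, not Kubo's importer; `fixed_size_chunks` is a hypothetical helper name.

```python
CHUNK_SIZE = 1048576  # 1 MiB, the value passed to --chunker=size-1048576


def fixed_size_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into consecutive chunks of at most `size` bytes."""
    if size <= 0:
        raise ValueError("chunk size must be positive")
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    payload = bytes(2 * CHUNK_SIZE + 10)  # 2 MiB plus 10 bytes of zeros
    print([len(chunk) for chunk in fixed_size_chunks(payload)])
```

Under this setting, a Zarr chunk file smaller than 1 MiB fits in a single piece, which is presumably why the guide pairs this chunk size with `--raw-leaves`.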

2color · 2026-02-13T16:28:59Z

https://bafybeih6r67ylni7hikg76j4yxqxtyzuw3uopedxajgcqpf52pnswx6wci.ipfs.dweb.link/how-to/scientific-data/landscape-guide/
https://bafybeih6r67ylni7hikg76j4yxqxtyzuw3uopedxajgcqpf52pnswx6wci.ipfs.dweb.link/case-studies/orcestra/
https://bafybeih6r67ylni7hikg76j4yxqxtyzuw3uopedxajgcqpf52pnswx6wci.ipfs.dweb.link/how-to/scientific-data/publish-geospatial-zarr-data/

2color · 2026-02-13T16:30:08Z

docs/how-to/scientific-data/landscape-guide.md

+
+In this guide, you'll find an overview of the problem space, available tools, and architectural patterns for publishing and working with scientific data using IPFS.
+
+## A Landscape in Flux


From Slack:

Condense “A Landscape in Flux” - Remove or significantly shorten so we get to solutions/new knowledge faster; we can highlight the main points but you can assume readers already work in geospatial and are familiar with the challenges

2color · 2026-02-13T16:34:50Z

This PR has been updated and includes the addition of three new pages:

Landscape guide: https://bafybeih6r67ylni7hikg76j4yxqxtyzuw3uopedxajgcqpf52pnswx6wci.ipfs.dweb.link/how-to/scientific-data/landscape-guide/
case study: https://bafybeih6r67ylni7hikg76j4yxqxtyzuw3uopedxajgcqpf52pnswx6wci.ipfs.dweb.link/case-studies/orcestra/
practical zarr publishing guide: https://bafybeih6r67ylni7hikg76j4yxqxtyzuw3uopedxajgcqpf52pnswx6wci.ipfs.dweb.link/how-to/scientific-data/publish-geospatial-zarr-data/

It still needs more work, but I think this structure works well. I'd like to also maybe add a more generic retrieval example in Python using ipfsspec for the retrieval guide.
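
A rough sketch of what such a generic retrieval example might look like. It assumes `xarray`, `zarr`, and `ipfsspec` are installed so that fsspec can resolve `ipfs://` URLs; `ipfs_url` and `open_zarr_from_ipfs` are hypothetical helper names, and any CID passed in is a placeholder to be replaced with a real one.

```python
def ipfs_url(cid: str, path: str = "") -> str:
    """Build an ipfs:// URL for a CID and an optional path inside it."""
    suffix = path.strip("/")
    return f"ipfs://{cid}/{suffix}" if suffix else f"ipfs://{cid}"


def open_zarr_from_ipfs(cid: str, path: str = ""):
    """Open a Zarr store published under `cid` as an xarray Dataset.

    Assumption: xarray, zarr, and ipfsspec are installed. The import is
    kept local so the URL helper above works without them.
    """
    import xarray as xr  # third-party dependency

    return xr.open_dataset(ipfs_url(cid, path), engine="zarr")
```

Calling `open_zarr_from_ipfs("<your-cid>")` needs a reachable IPFS gateway (for example a local Kubo node), so it is left out of the sketch.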

Copilot

Pull request overview

Adds new Scientific Data documentation content to the IPFS docs site, including a hands-on guide for publishing geospatial Zarr datasets and supporting context via a landscape overview and an ORCESTRA case study. Updates the VuePress sidebar to surface the new pages and case study.

Changes:

Add a new “Publish Geospatial Zarr Data with IPFS” how-to guide.
Add a new “Scientific Data and IPFS Landscape Guide” overview page.
Add a new ORCESTRA case study and update VuePress navigation (including sidebar re-organization and case study list).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 20 comments.

File	Description
docs/how-to/scientific-data/publish-geospatial-zarr-data.md	New step-by-step publishing guide (Zarr + IPFS), including discovery/access patterns.
docs/how-to/scientific-data/landscape-guide.md	New overview of scientific data formats, architectural patterns, and ecosystem tooling.
docs/case-studies/orcestra.md	New case study describing ORCESTRA’s use of IPFS for scientific data distribution.
docs/.vuepress/config.js	Adds the new Scientific Data pages to the How-to sidebar and adds ORCESTRA to case studies; also reorganizes peer-related sidebar entries.


Copilot · 2026-02-13T16:36:06Z

docs/how-to/scientific-data/publish-geospatial-zarr-data.md

+Users can then access your data by using the `ipns://` prefix.
+
+TODO: add example.


This section still contains a “TODO: add example.” placeholder. Please either add a concrete DNSLink + ipns://… (and/or gateway) example or remove the placeholder before merging so the guide reads as complete.

Suggested change

-Users can then access your data by using the `ipns://` prefix.
-TODO: add example.
+Users can then access your data by using the `ipns://` prefix. For example, if your Zarr store lives under a `my-zarr-dataset` directory, you can use:
+- IPNS URI: `ipns://data.example.org/my-zarr-dataset/`
+- HTTP gateway: `https://dweb.link/ipns/data.example.org/my-zarr-dataset/`
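
To make the relationship between the two forms explicit, here is a small sketch that derives both from a DNSLink domain and a path. The function names are hypothetical, and `data.example.org` is the placeholder domain from the suggestion above, not a real deployment.

```python
from urllib.parse import quote


def _clean(path: str) -> str:
    """Normalise a dataset path to 'a/b/' (or '' when empty)."""
    cleaned = quote(path.strip("/"))  # '/' is kept, other unsafe characters are escaped
    return f"{cleaned}/" if cleaned else ""


def ipns_uri(domain: str, path: str = "") -> str:
    """Native IPNS URI for a DNSLink domain."""
    return f"ipns://{domain}/{_clean(path)}"


def gateway_url(domain: str, path: str = "", gateway: str = "https://dweb.link") -> str:
    """Path-style HTTP gateway URL for the same DNSLink domain."""
    return f"{gateway.rstrip('/')}/ipns/{domain}/{_clean(path)}"


if __name__ == "__main__":
    print(ipns_uri("data.example.org", "my-zarr-dataset"))
    print(gateway_url("data.example.org", "my-zarr-dataset"))
```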

Copilot · 2026-02-13T16:36:07Z

docs/how-to/scientific-data/landscape-guide.md

+Ultimately the choice between these approaches for content-addressed data management comes to down to the following questions:
+
+- How important is duplication? This is probably a function of the volume of your data and market costs of storage.
+- How important is having copy in content-addressed format around? If no public publishing is expected, you can forego this and only hash .
+- What libraries and which languages using to interact with the data? For example, xarray which makes use of fsspec, can read directly from a local IPFS gateway with [`ipfsspec`](https://github.com/fsspec/ipfsspec)


This paragraph has a few incomplete/unclear sentences (e.g., “forego this and only hash .” and “What libraries and which languages using…”). Please rewrite for clarity so readers can follow the trade-offs being described.

Suggested change

-Ultimately the choice between these approaches for content-addressed data management comes to down to the following questions:
-- How important is duplication? This is probably a function of the volume of your data and market costs of storage.
-- How important is having copy in content-addressed format around? If no public publishing is expected, you can forego this and only hash .
-- What libraries and which languages using to interact with the data? For example, xarray which makes use of fsspec, can read directly from a local IPFS gateway with [`ipfsspec`](https://github.com/fsspec/ipfsspec)
+Ultimately the choice between these approaches for content-addressed data management comes down to the following questions:
+- How important is duplication? This is probably a function of the volume of your data and market costs of storage.
+- How important is it to maintain a copy of the data in a content-addressed format? If no public publishing is expected and you only need integrity checks, you may choose not to store a full content-addressed replica and instead compute hashes on demand.
+- What libraries and which programming languages will you use to interact with the data? For example, Python’s xarray library, via fsspec, can read directly from a local IPFS gateway using [`ipfsspec`](https://github.com/fsspec/ipfsspec).
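
As an illustration of the "compute hashes on demand" option, the sketch below streams a file through SHA-256 without keeping a second copy. This is a plain digest for integrity checks, not an IPFS CID, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable


def sha256_of_blocks(blocks: Iterable[bytes]) -> str:
    """Hash a sequence of byte blocks incrementally and return the hex digest."""
    digest = hashlib.sha256()
    for block in blocks:
        digest.update(block)
    return digest.hexdigest()


def sha256_of_file(path: str, block_size: int = 1024 * 1024) -> str:
    """Hash a file on disk in 1 MiB reads so large files never sit in memory."""
    with open(path, "rb") as handle:
        return sha256_of_blocks(iter(lambda: handle.read(block_size), b""))
```

The trade-off is that the digest has to be recomputed each time, whereas a stored content-addressed copy can be served and verified directly.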

Copilot · 2026-02-13T16:36:07Z

docs/how-to/scientific-data/landscape-guide.md

+
+## Next Steps
+
+- [Publishing Zarr Datasets with IPFS](./publishing-zarr-datasets.md) - A hands-on guide to publishing your first dataset


The “Next Steps” section links to ./publishing-zarr-datasets.md, but that file doesn’t exist in this directory. Please update the link to an existing guide (possibly publish-geospatial-zarr-data.md) or add the missing page.

Suggested change

-- [Publishing Zarr Datasets with IPFS](./publishing-zarr-datasets.md) - A hands-on guide to publishing your first dataset
+- [Publishing Zarr Datasets with IPFS](./publish-geospatial-zarr-data.md) - A hands-on guide to publishing your first dataset

Copilot · 2026-02-13T16:36:07Z

docs/case-studies/orcestra.md

+
+ORCESTRA uses IPFS to make scientific data openly accessible, verifiable, and resilient.
+
+The raw data is processed by the at the Max Planck Institute for Meteorology, who process the data for publishing, where the end result is a set of CIDs corresponding to data from the different sub campaigns. allowing anyone who retrieves the data can independently verify they received exactly what was published, with no trust required in the specific server it was fetched from.


This paragraph is grammatically broken and hard to follow (“processed by the at the…”, run-on sentence, and a fragment starting with “allowing…”). Please rewrite for clarity so the data publication flow and trust model are unambiguous.

Suggested change

-The raw data is processed by the at the Max Planck Institute for Meteorology, who process the data for publishing, where the end result is a set of CIDs corresponding to data from the different sub campaigns. allowing anyone who retrieves the data can independently verify they received exactly what was published, with no trust required in the specific server it was fetched from.
+Raw data from the different sub-campaigns is processed at the Max Planck Institute for Meteorology into publishable datasets. These datasets are added to IPFS, producing content identifiers (CIDs) that correspond to the published data from each sub-campaign. Because each CID is derived from the content itself, anyone who retrieves the data can independently verify that they received exactly what was published, without needing to trust any specific server that served it.
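
The verification claim can be sketched for the simplest case: a single raw block hashed with SHA-256. The CIDv1 is then the multibase prefix `b` followed by the base32 encoding of the version byte, the raw codec, the multihash header, and the digest. This sketch covers only that case; larger files are split into a DAG whose root CID cannot be recomputed this way. The function names are hypothetical.

```python
import base64
import hashlib

# CIDv1 header: version 1, raw codec (0x55), sha2-256 (0x12), 32-byte digest (0x20)
_CID_V1_RAW_SHA256 = bytes([0x01, 0x55, 0x12, 0x20])


def raw_block_cid(block: bytes) -> str:
    """Compute the base32 CIDv1 of a single raw block hashed with SHA-256."""
    digest = hashlib.sha256(block).digest()
    encoded = base64.b32encode(_CID_V1_RAW_SHA256 + digest).decode("ascii")
    return "b" + encoded.lower().rstrip("=")  # 'b' is the multibase prefix for base32


def verify_raw_block(cid: str, block: bytes) -> bool:
    """Return True when the received bytes hash to the expected CID."""
    return raw_block_cid(block) == cid
```

Any change to the bytes changes the digest, so a tampered or truncated block fails the check regardless of which server delivered it.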

Copilot · 2026-02-13T16:36:08Z

docs/how-to/scientific-data/landscape-guide.md

+- **Consolidated metadata**: All metadata can be consolidated into a single file for datasets with many arrays
+- **Designed for network access patterns**: Distributed storage tends to have high throughput and high latency
+
+> Note: To learn more about Zarr, check out the following resources: [Introduction to the Zarr format by Copernicus Marine](https://help.marine.copernicus.eu/en/articles/10401542-introduction-to-the-zarr-format), [# What is Cloud-Optimized Scientific Data?](https://tom-nicholas.com/blog/2025/cloud-optimized-scientific-data/).


The link text includes an extra “# ” (“[# What is Cloud-Optimized Scientific Data?]”), which looks like an accidental header marker. Consider removing the “#” from the link label.

Suggested change

-> Note: To learn more about Zarr, check out the following resources: [Introduction to the Zarr format by Copernicus Marine](https://help.marine.copernicus.eu/en/articles/10401542-introduction-to-the-zarr-format), [# What is Cloud-Optimized Scientific Data?](https://tom-nicholas.com/blog/2025/cloud-optimized-scientific-data/).
+> Note: To learn more about Zarr, check out the following resources: [Introduction to the Zarr format by Copernicus Marine](https://help.marine.copernicus.eu/en/articles/10401542-introduction-to-the-zarr-format), [What is Cloud-Optimized Scientific Data?](https://tom-nicholas.com/blog/2025/cloud-optimized-scientific-data/).

Copilot · 2026-02-13T16:36:11Z

docs/how-to/scientific-data/landscape-guide.md

+
+#### Mutable File System (MFS)
+
+MFS provides a familiar filesystem interface for organizing immutable content that in encoded with UnixFS (though not e. You can create directories, move files, and maintain a logical structure while the underlying data remains content-addressed.


This sentence is truncated/unfinished (“...content that in encoded with UnixFS (though not e.”). Please complete or remove it so the MFS section doesn’t read as broken.

Suggested change

-MFS provides a familiar filesystem interface for organizing immutable content that in encoded with UnixFS (though not e. You can create directories, move files, and maintain a logical structure while the underlying data remains content-addressed.
+MFS provides a familiar filesystem interface for organizing immutable content that is encoded with UnixFS. You can create directories, move files, and maintain a logical structure while the underlying data remains content-addressed.

Copilot · 2026-02-13T16:36:12Z

docs/how-to/scientific-data/landscape-guide.md

+TODO: give an example with the `kubo ipfs files api` or maybe an asciicinema
+


This section contains a TODO placeholder. Please add the referenced example (or remove the TODO) before merging to avoid publishing incomplete guidance.
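
One possible shape for that example, sketched here as the sequence of Kubo `ipfs files` commands needed to place already-added content into MFS. The helper names, the `/datasets` directory, and the CID are hypothetical placeholders; running the commands assumes a local Kubo node with `ipfs` on the `PATH`.

```python
import subprocess


def mfs_publish_commands(cid: str, mfs_dir: str, name: str) -> list[list[str]]:
    """Build the `ipfs files` commands that link existing content into MFS."""
    target = f"{mfs_dir.rstrip('/')}/{name}"
    return [
        ["ipfs", "files", "mkdir", "-p", mfs_dir],        # create the directory if missing
        ["ipfs", "files", "cp", f"/ipfs/{cid}", target],  # link the content by CID, no data copy
        ["ipfs", "files", "stat", "--hash", mfs_dir],     # print the directory's new root CID
    ]


def run(commands: list[list[str]]) -> None:
    """Execute the commands in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For instance, `run(mfs_publish_commands("<your-cid>", "/datasets", "my.zarr"))` would leave the dataset reachable at `/datasets/my.zarr` in MFS.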

Copilot · 2026-02-13T16:36:12Z

docs/case-studies/orcestra.md

+
+The [ORCESTRA data browser](http://browser.orcestra-campaign.org/) provides a web interface for discovering and retrieving datasets. Built on top of [Climate and Forecast (CF) conventions](https://cfconventions.org/) metadata embedded in the Zarr datasets, the browser lets researchers search by variable, time range, sub-campaign, and other dimensions, then retrieve data directly via IPFS.
+
+The browser leverages both Helia, the TypeScript implementation of IPFS and


This sentence is incomplete (“The browser leverages both Helia, the TypeScript implementation of IPFS and …”). Please finish the thought (e.g., what else it leverages and how) or remove the fragment.

Suggested change

-The browser leverages both Helia, the TypeScript implementation of IPFS and
+The browser leverages Helia, the TypeScript implementation of IPFS.

Copilot · 2026-02-13T16:36:13Z

docs/.vuepress/config.js

              children: [
                ['/case-studies/arbol', 'Arbol'],
                ['/case-studies/audius', 'Audius'],
+                ['/case-studies/orcestra', 'Orcestra'],


Sidebar label is “Orcestra”, but the project is consistently styled as “ORCESTRA” in the case study. Consider updating the display label to match the official capitalization for consistency.

Suggested change

-['/case-studies/orcestra', 'Orcestra'],
+['/case-studies/orcestra', 'ORCESTRA'],

Copilot · 2026-02-13T16:36:13Z

docs/how-to/scientific-data/publish-geospatial-zarr-data.md

+
+By the end of this guide, you will publish a Zarr dataset to the IPFS network in a way that is retrievable directly within [Xarray](https://xarray.dev/).
+
+If you are interested in a real-world example following the patterns in this guide, check out the [The ORCESTRA campaign](https://orcestra-campaign.org/intro.html).


In the “real-world example” sentence, the link text reads “the [The ORCESTRA campaign]”, which is grammatically incorrect (double “the”). Consider changing the anchor text to “the ORCESTRA campaign”.

Suggested change

-If you are interested in a real-world example following the patterns in this guide, check out the [The ORCESTRA campaign](https://orcestra-campaign.org/intro.html).
+If you are interested in a real-world example following the patterns in this guide, check out the [ORCESTRA campaign](https://orcestra-campaign.org/intro.html).

2color added 2 commits January 23, 2026 14:42

chore: move kubo specific guides into kubo categorty

fecc47d

initial draft of geospatial guide

0aac21d

2color marked this pull request as ready for review February 4, 2026 16:14

mishmosh approved these changes Feb 4, 2026


vmx reviewed Feb 5, 2026


2color and others added 6 commits February 6, 2026 14:34

Apply suggestion from @vmx

12eab53

Co-authored-by: Volker Mische <volker.mische@gmail.com>

Apply suggestion from @mishmosh

fd6337f

Co-authored-by: Mosh <1306020+mishmosh@users.noreply.github.com>

Apply suggestion from @mishmosh

45d15d2

Co-authored-by: Mosh <1306020+mishmosh@users.noreply.github.com>

address feedback and refine guide

3e75109

reorg into folder

8811dd5

chore: add TODO

3be1cd3

2color commented Feb 6, 2026


2color requested a review from vmx February 6, 2026 14:46

lidel reviewed Feb 6, 2026


vmx approved these changes Feb 9, 2026


2color added 2 commits February 13, 2026 17:18

docs: add landscape guide

f22d621

docs: add orcestra to case studies

e2b3a6e

2color commented Feb 13, 2026


2color requested a review from Copilot February 13, 2026 16:31

Copilot started reviewing on behalf of 2color February 13, 2026 16:31

2color changed the title ~~Add geospatial publishing guide~~ Add scientific data and geospatial publishing guides & case study Feb 13, 2026

Copilot AI reviewed Feb 13, 2026


