More permissive rehydration logic

So for the last little while I've been pursuing an idea called [React Snapshot](https://github.com/geelen/react-snapshot), where instead of running your code in a Node environment to generate static HTML, you run it in a virtual browser (jsdom or headless Chrome) and take a snapshot of the DOM at a particular moment in time, then host the snapshots like any other static file (a technique also known as pre-rendering).

I've been tossing around different API choices (https://github.com/geelen/react-snapshot/pull/30) in order to handle components that have async data fetching requirements, but I'm already starting to see real promise in this approach. Because the snapshot environment is so similar to the client one, far fewer changes are needed to get the performance & accessibility benefits of serving real HTML to your users. This is an example of the React Snapshot async API to make a component snapshottable:

```diff
+ import { snapshot } from 'react-snapshot'

class Home extends React.Component {
  state = { quotes: null }

  componentWillMount() {
+   snapshot(() => (
      fetch('/api/quotes')
        .then(response => response.json())
+   ))
    .then(quotes => {
      this.setState({ quotes })
    })
  }

  render() {
    const { quotes } = this.state
    return (
      <div className="Quotes">
        {
          quotes && quotes.map((quote, i) => <Quote key={i} quote={quote}/>)
        }
      </div>
    )
  }
}
```

The idea is that any async parts of your app can be wrapped in a `snapshot` call, which caches responses and rehydrates on the client. However, I've hit a few walls that I think mean I'd need changes to React itself to take this to its logical conclusion. Hence, I wanted to start the discussion about whether such changes would be compatible with React's future direction.
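To make the "caches responses and rehydrates" part concrete, here is a minimal sketch of how such a helper could behave. This is not react-snapshot's actual implementation: `createSnapshot`, the call-order keys, and the `serialise` method are all hypothetical names and choices made up for illustration.

```javascript
// Hypothetical sketch only, not react-snapshot's real code.
// Assumption: each snapshot() call is keyed by its call order, so the
// snapshot run and the client run must call snapshot() in the same order.
function createSnapshot(preloadedCache) {
  const cache = Object.assign({}, preloadedCache)
  let nextId = 0

  function snapshot(fetcher) {
    const key = 'snapshot-' + nextId++
    if (key in cache) {
      // Client side: reuse the value recorded at snapshot time
      // and never hit the network.
      return Promise.resolve(cache[key])
    }
    // Snapshot side: run the real fetch and remember the result.
    return fetcher().then(value => {
      cache[key] = value
      return value
    })
  }

  // What the snapshot run could embed in the HTML for the client to preload.
  snapshot.serialise = () => JSON.stringify(cache)
  return snapshot
}
```

On the client, the cache would be created from whatever the snapshot run serialised into the page, e.g. `createSnapshot(JSON.parse(embeddedJson))`, where `embeddedJson` is again a made-up name.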

### Rehydration

As far as I can tell from my experimentation and from reading the code, the two criteria for reusing the existing DOM elements in a pre-rendered HTML page are:

* the adler32 hash of the initial client-rendered markup has to match the `data-react-checksum` present on the `rootElement`
* the `_domID` of each instance in the render tree needs to match the `data-reactid` on each DOM element

Between those two criteria, it's enforced that the _structure_ and the _content_ of the DOM are the same. I can kinda see why both are needed: the checksum is the cheapest way to confirm the structure will be the same, but the ID of each element is needed to actually wire everything up. Also, `data-react-checksum` is just an attribute, and could be calculated off something that's no longer present in the HTML.
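For reference, this is roughly what an Adler-32 checksum over a markup string looks like. It's a textbook version written for illustration, assuming the hash runs over the string's UTF-16 code units; React's own implementation may differ in details (for example in the signedness of the result), so don't expect the numbers to line up with a real `data-react-checksum`.

```javascript
// Textbook Adler-32 over a string's UTF-16 code units.
// Illustrative only: not copied from React's source.
const ADLER_MOD = 65521

function adler32(text) {
  let a = 1
  let b = 0
  for (let i = 0; i < text.length; i++) {
    a = (a + text.charCodeAt(i)) % ADLER_MOD
    b = (b + a) % ADLER_MOD
  }
  // Pack b into the high 16 bits and a into the low 16 bits, unsigned.
  return ((b << 16) | a) >>> 0
}

console.log(adler32('Wikipedia')) // 300286872 (0x11E60398)

// Any difference in the string, however cosmetic, changes the hash.
console.log(adler32('<p>hi</p>') === adler32('<p>hi </p>')) // false
```

The point is that the check is over the exact characters of the markup, so two strings that render identically can still produce different checksums.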

However, generating the exact right checksum in any other way than the existing SSR API turns out to be pretty difficult!

### HTML-escaping woes

I hit this problem where I was rendering the React app like normal, then taking the `innerHTML` of the root container, then passing it to [`addChecksumToMarkup`](https://github.com/facebook/react/blob/master/src/renderers/dom/stack/server/ReactMarkupChecksum.js#L26), and not getting the same checksum as `ReactDOMServer.renderToString`. I first realised I needed to add the `data-reactid` to each element along the way, which wasn't too hard, but still it wasn't working. I figured out it's due to [`escapeTextContentForBrowser`](https://github.com/facebook/react/blob/master/src/renderers/dom/shared/escapeTextContentForBrowser.js) converting things like `'` to `&#x27;` and `"` to `&quot;`, meaning that while the content _appears_ the same once rendered, the precise string is not, therefore the checksum is not, and no rehydration takes place.

From what I can understand, again by reading the code, React _always_ sanitises the HTML content before generating markup (on server or in client); it's just that once it's injected into the DOM, `innerHTML` doesn't re-sanitise things like quotes. They don't technically need to be escaped, as discussed in issue https://github.com/facebook/react/issues/3879, and so if that were to be changed this particular problem would disappear, but there may well be more I just haven't hit yet. To me, the real issue is needing to have the content be byte-for-byte equivalent, rather than just functionally (and structurally) equivalent.
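Here's a small sketch of the mismatch. Both helpers are stand-ins I've written for illustration (`escapeLikeReact` and `serialiseLikeInnerHTML` are made-up names): the first escapes the five characters React's text escaping covers, the second escapes only what a browser escapes when serialising a text node via `innerHTML`.

```javascript
// Stand-in for React's text escaping: &, <, >, " and ' are all escaped.
const REACT_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#x27;',
}
function escapeLikeReact(text) {
  return text.replace(/[&<>"']/g, ch => REACT_ESCAPES[ch])
}

// Stand-in for how innerHTML serialises a text node: quotes are left alone.
const INNER_HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '\u00A0': '&nbsp;',
}
function serialiseLikeInnerHTML(text) {
  return text.replace(/[&<>\u00A0]/g, ch => INNER_HTML_ESCAPES[ch])
}

const text = `She said "it's fine"`
console.log(escapeLikeReact(text))        // She said &quot;it&#x27;s fine&quot;
console.log(serialiseLikeInnerHTML(text)) // She said "it's fine"
```

Both strings display the same text in a browser, but they're different character sequences, so any hash over them will differ too.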

### My interim solution

At the moment, I've realised it's easier to boot up the app in its entirety, wait for all async processes to complete, then effectively reboot the app using `ReactDOMServer.renderToString` and splice the markup in place. Any side-effects relying on `componentDidMount` (like CSS injection or meta tags in the HEAD) that affect the DOM _outside_ the React app are preserved, but the markup and checksum of the React-rendered HTML are guaranteed to be correct. It works, but it's not ideal. You still have to understand that your components are running in two different "modes": they'll run different lifecycle methods in each, and only one generates the final snapshot. I think that adds an unreasonable conceptual burden, much the same way server-rendering does.

That's really the problem I see with the status quo and why I started looking into this in the first place. If snapshot/server rendering requires too much overhead, most people won't do it, which is exactly where we're at. Create-react-app doesn't include either because none of the options is simple enough or broadly applicable enough. The official [React Router docs](https://reacttraining.com/react-router/web/guides/code-splitting/code-splitting-server-rendering) warn against combining server-rendering and code-splitting. Server-rendering boilerplates include fairly specific webpack hacks to provide the same environment on server and client, etc.

The result is that most people only ever do client-rendering. They serve a blank page & render everything client-side. Code splitting and service worker caching offer useful advantages but imo it's not enough. Snapshot rendering _could_ be the solution, but only if it can offer big benefits for small changes to application code.

### My Dream Solution

Architecturally, what I'd like is for an arbitrary React app to be launched on one browser, executed until ready (async resources complete), snapshotted (serialised to HTML), then resumed on another browser. Those snapshots would be generated then cached at the edge of a CDN during deployment, or periodically depending on how often the content changes.

Practically, I think that would require two changes to React's architecture:

The first is for a weaker check for rehydration—some other fingerprint than a hash of the escaped HTML. Some other method for a snapshot to indicate to React to reuse as much of the existing DOM as possible.

The second would be for only parts of the tree to be rehydrated rather than the whole thing. If a component has some side-effect, say in a `componentDidMount`, then the snapshotted HTML would include the result of that side-effect. But when the app boots on the client side, the render method will generate the initial behaviour. At the moment React would replace what's there with what's just been rendered, but it might be preferable to leave the DOM unchanged on the first render, then wire things up later.

I don't know the exact specifics of a solution, nor do I know enough of the internals of React as it is now or as it will become, but I wanted to start the discussion and see if there was any interest from the React team & wider community in this use case and direction. I look forward to hearing your thoughts!
