Forking Chrome to turn HTML into SVG
November 11, 2022
I've been working on a program called html2svg
, it converts web pages to SVG. It's based on a fork of Chromium to support modern web standards. This post explains most patches.
Take a picture
Chromium is built on top of Blink: an HTML engine forked from WebKit, and Skia: a 2D engine also used in Firefox and Android.
Blink consumes the HTML input, and Skia produces the graphical output. The Chromium compositor (cc
) is in between, but we'll ignore it for now.
In order to support multiple platforms and targets, Skia is built to render into a back-end: it could be its GPU renderer called Ganesh, its software rasterizer, or even a PDF file. This is how Chromium can work with or without a GPU and export web pages to PDF files with high fidelity.
Skia also has an experimental SVG back-end, and that's what we're going to use to build html2svg
!
To get started, we'll need to find a way to render the page into an SVG canvas and expose it under a JS API.
The Chromium docs explains how you can export an .skp
file using the --enable-gpu-benchmarking
flag.
An .skp
file is a binary representation of an SkPicture
, a C++ class containing Skia drawing instructions that can be replayed into any canvas through its playback()
method.
Looking at the code we can find that it uses cc::Layer::GetPicture()
to get an SkPicture
:
// Recursively serializes the layer tree.
// Each layer in the tree is serialized into a separate skp file
// in the given directory.
void Serialize(const cc::Layer* root_layer) {
for (auto* layer : *root_layer->layer_tree_host()) {
sk_sp<const SkPicture> picture = layer->GetPicture();
if (!picture)
continue;
Declare a function
We're going to add a new global JS API into Chromium to get started, let's call it getPageContentsAsSVG()
.
The GPU extension we're using to generate .skp
files registers itself in content::RenderFrameImpl::DidClearWindowObject()
which makes sense: it has access to the rendering data, and it's called right after window
, the global object, is created. Adding the following at the end of this method is enough to get our global function registered.
// Get access to the JS VM for this process (each tab is a process)
v8::Isolate* isolate = blink::MainThreadIsolate();
// Automatic v8::Local destruction
v8::HandleScope handle_scope(isolate);
// Get the JS context for the current tab
v8::Local<v8::Context> context = GetWebFrame()->MainWorldScriptContext();
// Automatic context entry/exit
v8::Context::Scope context_scope(context);
// Get the global object (window)
v8::Local<v8::Object> global = context->Global();
// Create a new JS function binding
v8::Local<v8::FunctionTemplate> fn = v8::FunctionTemplate::New(
isolate,
[](const v8::FunctionCallbackInfo<v8::Value>& args) {
v8::Isolate* isolate = blink::MainThreadIsolate();
args.GetReturnValue().Set(
v8::String::NewFromUtf8(isolate, "imagine this is svg").ToLocalChecked()
);
}
);
// Register the function as getPageContentsAsSVG()
global->Set(
context,
v8::String::NewFromUtf8(isolate, "getPageContentsAsSVG").ToLocalChecked(),
fn->GetFunction(context).ToLocalChecked()
).Check();
Let's run Chromium, open the debugger and try.. and it works! Now we need to do some actual work inside the function.
Render to SVG
// Get access to the main JS VM for this process (each tab is a process)
v8::Isolate* isolate = blink::MainThreadIsolate();
// Automatic v8::Local destruction
v8::HandleScope handle_scope(isolate);
// Get the WebLocalFrame for the current v8 Context
auto* frame = WebLocalFrame::FrameForCurrentContext();
// Get access to the root rendering layer
auto* root = frame->LocalRoot()->FrameWidget()->LayerTreeHost()->root_layer();
// Go over each sub-layer
for (auto* layer : *root->layer_tree_host()) {
// Get vectorial data for this layer
auto picture = layer->GetPicture();
// Skip if we get there is no data
if (!picture) {
continue;
}
// Create a memory stream to save the SVG content
SkDynamicMemoryWStream stream;
// Create an SVG canvas with the dimensions of the layer
auto canvas = SkSVGCanvas::Make(picture->cullRect(), &stream);
// Draw the layer data into the SVG canvas
canvas->drawPicture(picture.get());
// Allocate a buffer to hold the SVG data
auto size = stream.bytesWritten();
auto* bytes = new char[size];
// Copy from the stream to the buffer
stream.copyTo(static_cast<void *>(bytes));
// Return the data to the JS world
args.GetReturnValue().Set(
// Copy the UTF-8 buffer into an UTF-16 JS string
v8::String::NewFromUtf8(isolate, bytes, v8::NewStringType::kNormal, size).ToLocalChecked()
);
// Release the allocated data
delete[] bytes;
// Don't process any other layers
break;
}
This won't work because blink::WebFrameWidget::LayerTreeHost()
is private. Let's make content::RenderFrameImpl
a friend class:
// GPU benchmarking extension needs access to the LayerTreeHost
friend class GpuBenchmarkingContext;
+ // Allow RenderFrameImpl to access the LayerTreeHost for html2svg
+ friend class content::RenderFrameImpl;
Linking error now, we need to bundle SkSVGCanvas
:
- # Remove unused util sources.
- sources -= [ "//third_party/skia/src/utils/SkParsePath.cpp" ]
+ # Add SVG dependencies for html2svg
+ deps += [ "//third_party/expat" ]
+ sources += [
+ "//third_party/skia/src/xml/SkDOM.cpp",
+ "//third_party/skia/src/svg/SkSVGCanvas.cpp",
+ "//third_party/skia/src/svg/SkSVGDevice.cpp",
+ "//third_party/skia/src/xml/SkXMLParser.cpp",
+ "//third_party/skia/src/xml/SkXMLWriter.cpp",
+ ]
getPageContentsAsSVG()
does output something that resembles SVG, but there is an error opening it: XML closing tags are missing.
SkSVGCanvas
closes tags when its destructor is called:
SkSVGDevice::~SkSVGDevice() {
// Pop order is important.
while (!fClipStack.empty()) {
fClipStack.pop_back();
}
}
Let's wrap it in a scope:
// Create a memory stream to save the SVG content
SkDynamicMemoryWStream stream;
{
// Create an SVG canvas with the dimensions of the layer
auto canvas = SkSVGCanvas::Make(picture->cullRect(), &stream);
// Draw the layer data into the SVG canvas
canvas->drawPicture(picture.get());
}
// Allocate a buffer to hold the SVG data
auto size = stream.bytesWritten();
auto* bytes = new char[size];
Better, but the text is rendered with a serif font and has weird kerning.
The serif font is caused by missing fonts, we can fix that by adding a fallback to the font-family
attribute of <text>
elements:
- if (!familyName.isEmpty()) {
- this->addAttribute("font-family", familyName);
- }
+ familyName.appendf(
+ (familyName.isEmpty() ? "%s" : ", %s"),
+ "-apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'"
+ );
+
+ this->addAttribute("font-family", familyName);
The weird kerning appears because the position of each character is set, we can workaround this by only setting the position of the first character:
- fPosXStr.appendf("%.8g, ", position.fX);
- fPosYStr.appendf("%.8g, ", position.fY);
+ if (fPosXStr.isEmpty()) {
+ fPosXStr.appendf("%.8g", position.fX);
+ }
+
+ if (fPosYStr.isEmpty()) {
+ fPosYStr.appendf("%.8g", position.fY);
+ }
Little better! Here are some results:
Fix the T in Twitter
So the T
letter is missing from twitter.com, just the T
letter. Hmm okay weird, here's how Blink and Skia handles font data:
- Blink loads fonts from the system or remotely and maps them into the
SkTypeface
class SkTypeface
uses a font back-end internally:CoreText
for macOS,DirectWrite
for Windows, andFreeType
for Linux.- The font back-end parses raw font files and exports an array of supported UTF-32 code points and their vectorial representations, the position in the array is called the glyph ID.
- Blink passes the text to HarfBuzz which returns a set of glyph IDs and their relative positions, they're passed to Skia which picks the vectorial data and renders it.
Step 4 is needed to support ligatures and some languages. Arabic characters for example, have to be rendered differently based on their position in a word. This is why HarfBuzz is between Blink and Skia: it shapes a string of unicode characters into a set of glyph IDs and their positions.
Take my name written in latin and arabic script, add a space between letters and the characters are rendered differently in arabic. The unicode characters do not change, but their graphical representation do.
Some of this logic is implemented in HarfBuzz, and some is implemented in font files through tables:
CMAP
: the character map table, map glyph ID and its platform encoding IDGSUB
: the glyph substitution table, map one or more glyph IDs to one one more glyph IDs, that's where ligatures are declared
A font implementing a ligature for fi
will have a GSUB
entry to replace the glyphs for f
and i
with a special glyph for fi
, and this special glyph will very likely map to a special character in the CMAP
table.
And that's the problem, getGlyphToUnicodeMap()
just goes over the CMAP
table, the T
from Twitter is very likely implemented as a substitution on the GSUB
table, which maps to a special character, which won't map to a valid Unicode codepoint.
Our back-end is SVG, it is supposed to handle text shaping already, so we need to bypass HarfBuzz. Text shaping is handled by blink::Font::DrawText()
, which send data to HarfBuzz and then calls blink::Font::DrawBlobs()
with the glyph data. We're going to use SkTextBlob::MakeFromString()
to build a text blob of nominal glyphs from a string. These will map 1:1 to a unicode codepoint, allowing the SVG viewer to handle text shaping. Here's what the patch looks like:
- CachingWordShaper word_shaper(*this);
- ShapeResultBuffer buffer;
- word_shaper.FillResultBuffer(run_info, &buffer);
- ShapeResultBloberizer::FillGlyphs bloberizer(
- GetFontDescription(), device_scale_factor > 1.0f, run_info, buffer,
- draw_type == Font::DrawType::kGlyphsOnly
- ? ShapeResultBloberizer::Type::kNormal
- : ShapeResultBloberizer::Type::kEmitText);
- DrawBlobs(canvas, flags, bloberizer.Blobs(), point, node_id);
+ // Bypass HarfBuzz text shaping for html2svg
+ auto blob = SkTextBlob::MakeFromString(
+ StringView(run_info.run.ToStringView(), run_info.from, run_info.to - run_info.from).
+ ToString().
+ Utf8().
+ c_str(),
+ PrimaryFont()->
+ PlatformData().
+ CreateSkFont(false, &font_description_)
+ );
+
+ if (node_id != cc::kInvalidNodeId) {
+ canvas->drawTextBlob(blob, point.x(), point.y(), node_id, flags);
+ } else {
+ canvas->drawTextBlob(blob, point.x(), point.y(), flags);
+ }
Surface compositing
>
text = main.onCreateDevice()
text.drawText("chocolatine", 0, 0)
gradient = main.onCreateDevice()
gradient.drawRect(0, 0, 500, 150)
text.drawDevice(gradient, SkBlendMode::DstIn)
main.drawDevice(text, SkBlendMode::Over)
Testing mui.com
I noticed some text elements with a gradient effect did not render. This is because SkSVGDevice
does not implement surface compositing: drawing into a surface, and then rendering this surface using a Porter-Duff compositing operation.
We need to implement SkBaseDevice::onCreateDevice()
and SkBaseDevice::drawDevice()
to support this. On the GPU renderer it creates a texture, on the CPU renderer it allocates a buffer, and on SVG we're going to use <g>
.
I won't go into too much details on the implementation as it probably deserves its own post, but basically we use <g>
elements to create textures, <use>
to display them, and <feComposite>
to blend them.
Before | After |
---|---|
Render the whole page
Only the first ~6,000 pixels are rendered. We need to get the compositor to draw the whole page, and blink::WebLocalFrame::capturePaintPreview()
does exactly that! It seems to have been implemented for the phising classifier, there was a Chromium blog post about it. Combined with cc::PaintRecorder
we can get it to render into our canvas.
cc::PaintRecorder recorder;
auto rect = SkRect::MakeWH(width, height);
frame->CapturePaintPreview(
gfx::Rect(0, 0, width, height),
recorder.beginRecording(rect),
false,
false
);
auto canvas = SkSVGCanvas::Make(rect, &stream);
recorder.finishRecordingAsPicture()->Playback(canvas.get());
We now have another problem, the macOS scrollbar also gets rendered into the SVG! Suprisingly, it's fully vectorized. We can work around this by adding some CSS in the Electron code:
const style = document.createElement('style')
style.innerHTML = `
body::-webkit-scrollbar,
body::-webkit-scrollbar-track,
body::-webkit-scrollbar-thumb {
display: none;
}
`
document.head.appendChild(style)
Support shadows
One thing missing is shadows. Skia doesn't explicitly support drawing them, but it provides the two main ingredients: gaussian blur and clipping.
We need to add some code to handle the maskFilter
property of SkPaint
, it contains the SkBlurMaskFilter
used for bluring:
+ if (const SkMaskFilter* mf = paint.getMaskFilter()) {
+ SkMaskFilterBase::BlurRec maskBlur;
+
+ if (as_MFB(mf)->asABlur(&maskBlur) && maskBlur.fStyle == kNormal_SkBlurStyle) {
+ SkString maskfilterID = fResourceBucket->addColorFilter();
+
+ AutoElement filterElement("filter", fWriter);
+
+ filterElement.addAttribute("id", maskfilterID);
+
+ AutoElement floodElement("feGaussianBlur", fWriter);
+
+ floodElement.addAttribute("stdDeviation", maskBlur.fSigma);
+
+ resources.fMaskFilter.printf("url(#%s)", maskfilterID.c_str());
The clipping code had to be refactored because shadows extend outside of standard clipping bounds, but we'll skip it as it is involved.
Vectorize <canvas>
What if we could also vectorize 2D <canvas>
elements controlled by JavaScript? Turns out, Chromium has this capability built-in for printing:
// For 2D Canvas, there are two ways of render Canvas for printing:
// display list or image snapshot. Display list allows better PDF printing
// and we prefer this method.
// Here are the requirements for display list to be used:
// 1. We must have had a full repaint of the Canvas after beforeprint
// event has been fired. Otherwise, we don't have a PaintRecord.
// 2. CSS property 'image-rendering' must not be 'pixelated'.
// display list rendering: we replay the last full PaintRecord, if Canvas
// has been redraw since beforeprint happened.
if (IsPrinting() && IsRenderingContext2D() && canvas2d_bridge_) {
All we need is to make IsPrinting()
return true
:
bool HTMLCanvasElement::IsPrinting() const {
- return GetDocument().BeforePrintingOrPrinting();
+ return true;
}
And there you go, SVG pacman based on the MDN <canvas>
demo!
Final thoughts
That's all for now, html2svg
is live on GitHub, check it out!