The latest advancements in web technologies have opened up a new set of options for developing AR-based solutions. Recent updates to web browsers have opened the door for debate in the AR world: is it better to build an AR experience using the web or a native app?
In this post, I will give a brief overview of JS’s use in the native app world, then dive into what WebAR is, how it works, how it competes with native apps, and which is the better solution.
Who am I?
Let me start by introducing myself. I am Hermes Frangoudis, a Developer Evangelist for Agora.io and a former engineer at Blippar. In my time at Blippar I had the opportunity to lead the NY development team, working with Blippar’s AR and Computer Vision products to create custom solutions for a variety of brands spanning every industry.
What is AR?
What role does JS play on the app-side?
Spiderman vs. Vulture AR game I worked on using Blippar’s JS API
All of the aforementioned SDKs/APIs came out before ARKit and ARCore. Now each platform has native implementations, and JS is there too! Viro Media has created a React plugin that enables both native and cross-platform AR development.
Adobe is another power player in the creator space with their Project Aero, which uses the USDZ format and integrates directly with Adobe’s popular Creative Cloud suite.¹¹
We can’t talk about everyone and not mention Sketchfab. What started out as a repository for 3D artists to upload and display their work has grown into a marketplace with an API and an ARKit-enabled iOS app that allows users to place 3D models in their world. Sketchfab is a company to keep an eye on as AR and VR become more popular.
What is WebAR?
WebAR is more than a subset of AR; it is also a blanket term encompassing many different implementations. WebAR solutions can range from using a device’s gyro/accelerometer sensors with a camera feed as the background, to more complex solutions such as AR.js, TensorFlow.js and USDZ.
Fundamentally AR is using a mobile device’s sensors to track its position within the augmented scene. Within the last few years mobile browsers have been adding support for JS Sensor APIs such as the camera, gyroscope, accelerometer, orientation, and magnetometer (read: compass). Leveraging these sensors, developers are able to create a range of experiences.
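As a rough sketch of the simplest kind of WebAR described above (camera feed as the background, orientation sensor driving the scene), the snippet below uses the standard getUserMedia and deviceorientation browser APIs. The function name startSensorBackground and the onOrientation callback are placeholders of mine, and the window-like object is passed in as a parameter so the sketch can be exercised outside a browser.

```javascript
// Minimal sketch: camera feed as background + orientation readings.
// `env` is expected to look like the browser's `window` (an assumption made
// so the function can also be exercised with stand-in objects).
async function startSensorBackground(env, videoEl, onOrientation) {
  // Ask for the rear-facing camera; the browser prompts the user for permission.
  const stream = await env.navigator.mediaDevices.getUserMedia({
    video: { facingMode: "environment" },
    audio: false,
  });
  videoEl.srcObject = stream; // the <video> element becomes the AR backdrop

  // alpha/beta/gamma are rotation angles in degrees reported by the device.
  const handler = (event) =>
    onOrientation({ alpha: event.alpha, beta: event.beta, gamma: event.gamma });
  env.addEventListener("deviceorientation", handler);

  // Return a cleanup function that stops the camera and removes the listener.
  return () => {
    stream.getTracks().forEach((track) => track.stop());
    env.removeEventListener("deviceorientation", handler);
  };
}
```

In a page this would be called as startSensorBackground(window, videoElement, callback) from a user gesture. Some browsers also require an explicit permission request before orientation events are delivered, which this sketch leaves out.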
Blippar was one of the first to launch an in-browser AR experience from a banner advertisement; the placement was a relatively novel concept within the context of AR but made a big splash when it launched. The ad was a 360° experience³ of a car interior with buttons overlaid, to toggle displaying details about the car. The “Wow!” factor came from allowing the user to tap a button and replace the background (seen through the windows and windshields) with the camera feed. Eventually these AR banner ads progressed to more complex experiences.⁴
One of the first questions I asked was: how responsive is it? AR is computationally expensive, so how can that work in the browser? That’s where WebAssembly comes in. WebAssembly is a web standard that allows browsers to execute assembly-like code from binary files. WebAssembly files are created by compiling C/C++ into .wasm binaries that are loaded and executed from JS code.
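To make the hand-off between JS and WebAssembly concrete, here is a minimal sketch. Instead of a compiled computer vision library, it embeds a tiny hand-written module (an assumption for illustration only) that exports a single add function, then instantiates and calls it through the standard WebAssembly JS API. A real AR library would load a much larger .wasm file, typically fetched over the network.

```javascript
// Bytes of a tiny WebAssembly module exporting add(a, b) -> a + b (both i32).
// A real project would get these bytes by compiling C/C++ (e.g. with Emscripten).
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // one function, using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // function body
]);

// Compile and instantiate synchronously (fine for tiny modules; larger ones
// should use the asynchronous WebAssembly.instantiate instead).
const wasmModule = new WebAssembly.Module(wasmBytes);
const wasmInstance = new WebAssembly.Instance(wasmModule, {});

// Exported functions are callable like ordinary JS functions.
const sum = wasmInstance.exports.add(2, 3);
console.log(sum); // 5
```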
Now the barrier is much lower for creating your own machine learning models or implementing existing AI models using tfjs-models.
WebAssembly is cool but it’s only half of the WebAR equation. WebAssembly does all the heavy lifting on the computer vision side of AR, and we have WebGL for the rendering. WebAssembly and WebGL are the foundation, but how do we use these APIs to create web-based AR experiences? Enter AR.js, a framework written by Jerome Etienne that uses A-Frame (built on top of Three.js) and JSARToolkit5 (an Emscripten port of ARToolKit). Yes, there are some other WebAR frameworks, but most need a special web browser app or leverage proprietary APIs. AR.js is open source and doesn’t require any special app; it works within the default browsers.
In order to discuss AR.js and its implications for WebAR, it’s worth taking a quick look at the components that power the framework. A-Frame is a JS framework on top of Three.js that makes development feel more like game coding, with an entity-component structure. This simplifies a lot of the Three.js syntax, allowing the developer to focus on the experience/game. AR.js then uses JSARToolkit to track the 3D scene to a marker, leveraging computer vision to detect feature points. This is the type of tracking that powered most of the early app-based AR experiences. AR.js has given the mobile web the legs it needs to get moving and be competitive with app-based AR. I could ramble on about the possibilities with AR.js, but in the interest of time, the main takeaway here is: AR.js has leveled the playing field in many ways for web-based AR.
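For a feel of how little code a marker-based scene needs, the sketch below builds the markup for one in the A-Frame flavor of AR.js, modeled on the project's familiar "hiro" marker example. It assumes the page already loads A-Frame and the AR.js A-Frame build through script tags (URLs and versions are deliberately left out), and the helper name markerSceneMarkup is my own.

```javascript
// Builds the markup for a marker-based AR.js scene as a string.
// Assumes A-Frame and the AR.js A-Frame build are already loaded on the page.
function markerSceneMarkup(boxColor) {
  return [
    "<a-scene embedded arjs>",
    // Content inside <a-marker> is only shown while the marker is tracked.
    '  <a-marker preset="hiro">',
    `    <a-box position="0 0.5 0" material="color: ${boxColor};"></a-box>`,
    "  </a-marker>",
    // The camera entity stays fixed; AR.js moves the marker content instead.
    "  <a-entity camera></a-entity>",
    "</a-scene>",
  ].join("\n");
}

console.log(markerSceneMarkup("tomato"));
```

The same markup can simply be written directly into the HTML page; generating it from JS is only a convenience for this example.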
Taking a look at Apple and Google’s efforts, we see they’ve taken steps toward an even deeper integration between 3D models and their respective mobile browsers. Let’s start with Apple’s .USDZ file format.
What is USDZ and how does it work? In simplest terms, Apple has built ARKit functionality into Safari for iOS. With a couple of lines of HTML and a .USDZ file⁵, any website can contain AR elements.
<a rel="ar" href="model.usdz"> <img src="model-preview.jpg"> </a>
When a link to a .USDZ file uses "ar" as its relationship attribute, the browser opens a custom AR camera view and lets users place and rotate the model(s) in the world around them.
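Since only some browsers understand the "ar" relationship, a page may want to check for it before presenting the link. A minimal sketch, using the standard relList.supports() check; the helper names and the fallback URL are hypothetical, and the document object is passed in so the functions can be tried with stand-ins:

```javascript
// Returns true when the browser advertises support for rel="ar" links.
// `doc` is the page's `document` (passed in so the sketch is testable elsewhere).
function supportsArQuickLook(doc) {
  const anchor = doc.createElement("a");
  return Boolean(
    anchor.relList && anchor.relList.supports && anchor.relList.supports("ar")
  );
}

// Hypothetical helper: decide what a model link should point at.
function modelLinkTarget(doc, usdzUrl, fallbackUrl) {
  return supportsArQuickLook(doc) ? usdzUrl : fallbackUrl;
}
```

In a page, modelLinkTarget(document, "model.usdz", "viewer.html") would return the .USDZ link where AR viewing is available and the fallback page everywhere else.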
USDZ files can contain one or more static or animated 3D models. In the context of visualization, .USDZ is great, but it doesn’t do much in terms of interaction and engagement.
Why should we care?
.USDZ is Apple’s standard native file format for displaying 3D within their mobile browser, iMessage, email, and notes apps.⁶ This represents Apple’s view that AR will have a future within their mobile experience.
Like most things in the Apple ecosystem, converting to .USDZ requires a Mac (with Xcode). Sure, there are online services that offer the ability for non-Mac users to convert their files, but it’s not as fluid, and in this world ease of integration goes a long way.⁷ All this points to a still-maturing pipeline for exporting to .USDZ.
We can’t discuss .USDZ and Apple without mentioning Google’s advancements with the WebXR Device API and the WebXR Hit Test API (in Chrome Canary). Google is looking to put web-based AR front and center.
.glTF file formats. Unlike Apple, Google has chosen to adopt popular and standard formats, which shows that Google is already thinking about lowering the barriers to adoption for those already in the 3D ecosystem.
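For reference, here is roughly what asking the browser for an AR session with hit testing looks like. This sketch follows the WebXR Device API as it was later standardized, which may differ from the experimental Chrome Canary build discussed above; the function name startArSession is mine, and the navigator object is passed in so the logic can be tried with a stand-in.

```javascript
// Sketch using the WebXR Device API as later standardized.
// `nav` is the browser's `navigator` (passed in for testability).
async function startArSession(nav) {
  if (!nav.xr) return null; // this browser has no WebXR support at all

  // Ask whether an AR ("immersive-ar") session can be created on this device.
  const supported = await nav.xr.isSessionSupported("immersive-ar");
  if (!supported) return null;

  // Request the session with hit testing, which lets content be anchored
  // to real-world surfaces. Browsers expect this call inside a user gesture.
  return nav.xr.requestSession("immersive-ar", {
    requiredFeatures: ["hit-test"],
  });
}
```

A null result means the page should fall back to a non-AR presentation of the model.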
What does app-less even mean? It sounds dirty yet liberating. App-less AR refers to using the device’s native web browser to serve the AR experience, allowing it to work across all platforms, devices, and mobile operating systems.
When Blippar launched AR digital placements (banner ads that launch AR in the web browser), we saw a surge of in-bound leads. There is a huge demand from agencies, retail, entertainment, pharma, etc. all looking to interact with users without having the friction of an app download.
The world does this funny back and forth: first everyone needed a website, then the craze was that everyone needed an app! Now the app stores are overcrowded, and downloading an app adds a step between the “Call To Action” and getting the user using your application. For a platform or game this is not the biggest deal, but in advertising things are a little different.
Most agencies and brands were willing to add AR experiences into existing apps, but they also realized that the engagement is not the same as when the app download is removed. The web is frictionless: everyone has a camera app with a QR scanner that can link out to a website.
Agencies have a huge desire to meet the user with a uniform experience across all platforms, devices, and OS, which is something the web has been known to do (somewhat) well.
How does web AR compare to native app?
Currently web browsers don’t have enough access to the AR camera. The AR camera differs from the traditional camera in that it handles the augmentation at the OS level rather than on top of it. Current implementations of web-based AR require the calculations to be done on top of the OS, causing computational overhead, limiting rendering, and sometimes even causing visible lag.
A huge step towards making AR even more accessible through the web would be for the Web Standards to adopt an API for direct access to the ARCamera object.
If that abstraction could exist as a standard web API, any browser app could leverage ARKit/ARCore or whatever underlying platform exists. Once a web API exists, many different frameworks will emerge. There are a few experimental browsers that leverage ARKit/ARCore, but they require a specific JS framework.
USDZ is a good start but it’s missing a vital component: a layer that adds support for interaction. Google’s efforts are still only available in the Canary version of Chrome, so until they are included within the production build they will lag behind Apple’s.
How can WebAR become more competitive?
Going back to the AR ad placement I mentioned earlier: at the time, the biggest struggles were centered around browser compatibility, which is still an issue to date with web-based AR experiences.
Not every mobile browser supports the Sensor APIs, and some devices are missing certain sensors, which was a huge issue with Android devices in particular. When releasing an app through a store it’s possible to control which devices the app can be installed on, but with the web you don’t have that control. Yes, it’s possible to add checks within the webpage, but serving a screen that says “Sorry, your device is not supported” feels like a punch in the gut.
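One way to soften that punch is to detect what the browser offers and step down to a simpler experience instead of a dead end. The sketch below is one possible shape for such checks; the tier names and the chooseExperience policy are hypothetical, and the window-like object is passed in so the logic can be tried with stand-ins.

```javascript
// Reports which WebAR building blocks the current browser exposes.
// `env` should look like the browser's `window` (an assumption for testability).
function detectArCapabilities(env) {
  const nav = env.navigator || {};
  return {
    camera: Boolean(nav.mediaDevices && nav.mediaDevices.getUserMedia),
    orientation: "DeviceOrientationEvent" in env,
    motion: "DeviceMotionEvent" in env,
    webassembly: typeof env.WebAssembly === "object",
  };
}

// Hypothetical policy: pick the richest experience the device can handle
// rather than showing an "unsupported device" screen.
function chooseExperience(caps) {
  if (caps.camera && caps.orientation && caps.webassembly) return "marker-ar";
  if (caps.orientation) return "360-viewer";
  return "static-3d";
}
```

Note that an API being present does not guarantee the hardware sensor exists, so a production check would also confirm that sensor events actually arrive.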
Native apps also have the ability to tap into ARCamera and use that to do the heavy lifting at the OS level instead of competing. What I mean is there are many ways to apply computer vision and SLAM tracking, but with ARKit and ARCore, the native integrations are bundled together and optimized with the OS.⁹
When I started writing this post, I had the mindset that there would be a clear list of pros and cons but after digging through my perceived pros/cons, there are SDKs and APIs to supplement wherever things fall short for both Web and Native.
Visual Search is only achievable through an app-based solution. For example, Blippar’s recognition engine didn’t rely on QR codes; it used AI to recognize known entities within its system and provide an experience if a match existed. This is great for companies looking to leverage their existing printed materials without having to change their design.
An AR visual search example that sources gifs based on the objects identified in the camera stream
The Visual Search behavior is still new and not super intuitive; most people aren’t used to pointing their phone at stuff, even with a visible call to action.
In lieu of Visual Search, WebAR relies on QR codes. From a design perspective QR codes are not very sexy but ever since iOS and Android both added support for QR code recognition within their native camera apps¹⁰ the behavior to scan QR codes has become more widely used.
Another argument could be made that although the internet and AR are globally available we need to keep in mind that in some emerging markets the internet is not as fast and/or reliable. This creates a need for support for offline use, which is only available through an app. On the other hand getting someone to download an app is much harder than visiting a website. So the final verdict is… it will really depend on the project.
Many people make predictions about the future of AR, whether it’s headsets vs. projectors vs. the extremes of implanted chips, etc. To join the ranks of the bold and brave fortune tellers of this world, I will share my thoughts.
Currently most AR content (media within experiences) is either hosted on device or delivered from the cloud.
Blippar, Facebook, Snapchat, and Zappar all use a cloud-based CMS that downloads the AR experience based on some sort of trigger (link, marker, face, QR code, etc.). To give some context as to how cloud-delivered AR works: the trigger acts as the mobile app’s entry point, prompting the app to request the assets and code for the experience from the backend system.
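That request flow can be sketched in a few lines. Everything specific here is hypothetical (the endpoint path, the manifest shape, the function name loadExperience); it is not how any of the platforms above actually implement delivery, only an illustration of trigger in, assets out. The fetch implementation is passed in so the flow can be tried with a stand-in.

```javascript
// Sketch of cloud-delivered AR: a trigger is resolved to an experience
// manifest, then every asset is downloaded before launch.
// The endpoint path and manifest shape are hypothetical.
async function loadExperience(trigger, fetchImpl, baseUrl) {
  const query =
    `type=${encodeURIComponent(trigger.type)}&id=${encodeURIComponent(trigger.id)}`;
  const res = await fetchImpl(`${baseUrl}/experiences?${query}`);
  if (!res.ok) throw new Error(`No experience for trigger (HTTP ${res.status})`);
  const manifest = await res.json(); // e.g. { entry: "main.js", assets: ["model.glb"] }

  // Download all assets in parallel; the experience starts only once all arrive.
  const assets = await Promise.all(
    manifest.assets.map(async (path) => {
      const assetRes = await fetchImpl(`${baseUrl}/${path}`);
      return { path, bytes: await assetRes.arrayBuffer() };
    })
  );
  return { entry: manifest.entry, assets };
}
```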
Most platforms download the entire experience prior to launching it, which explains why Facebook and Snapchat have a 4 MB limit: to keep things quick. At Blippar we were delivering a variety of experiences, so sometimes we had to get creative. Projects ranged from video on a page to 3D worlds, racing cars up mountain passes, and even full-on apps, so we had campaigns that ranged from >1 MB to upwards of 85 MB.
Why is this troublesome? Like I mentioned earlier we used to get creative with the way we coded the scenes to download assets in the background, so what’s the big deal? As it turns out there are some pretty impactful numbers behind why size matters and getting the balance right is crucial to the success of your AR experience.
At Blippar we saw a ~50% drop-off for any experience that takes over 30 seconds to load (download and initialize) and another ~75% churn among the users who stuck it out for the initial interaction. That means a perceived long download time can lead to losing up to 90% of your audience, leaving roughly 10% of users who will re-engage.
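The arithmetic behind that figure is just the two rates compounded. A small sketch, assuming the two drop-offs apply one after the other (the function name is mine):

```javascript
// Fraction of users remaining after a series of sequential drop-off stages.
function retainedFraction(dropOffRates) {
  return dropOffRates.reduce((remaining, rate) => remaining * (1 - rate), 1);
}

// ~50% leave during a slow load, then ~75% of the rest churn.
const retained = retainedFraction([0.5, 0.75]);
console.log(retained); // 0.125, i.e. about 12.5% of the original audience
```

Keeping 12.5% means losing 87.5%, which is where the "up to 90%" rule of thumb comes from.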
So now, beyond having to get someone to download an app, in order to keep the user, your app needs to load fast. If you get the right balance the experience can see up to 3x engagements per user, with 2x the dwell time.
WebAR uses web optimizations for downloading and delivery but size is still important. Without streaming the content, the larger the experience the longer it will take to load within the mobile browser.
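To put rough numbers on this, here is a back-of-the-envelope estimate of download time from payload size. The 10 Mbps connection speed is purely an assumption for illustration, and the estimate ignores latency, parsing, and initialization, so real load times will be longer.

```javascript
// Rough lower bound on download time: megabytes over a link measured in megabits/s.
function downloadSeconds(sizeMB, bandwidthMbps) {
  return (sizeMB * 8) / bandwidthMbps;
}

console.log(downloadSeconds(4, 10));  // 3.2
console.log(downloadSeconds(85, 10)); // 68
```

On that assumed connection, a 4 MB experience downloads in a few seconds, while an 85 MB one takes over a minute unless assets are streamed or loaded in the background.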
I think as the industry matures, cloud delivery will become dominant, but we will also realize a need for streaming and synchronization in the delivery of the experience.
Take, for example, the tabletop AR soccer game that uses AI/ML to capture player movement and display it in AR through a HoloLens.¹²
For such apps to become production ready they would need to integrate a real-time-network (such as the platform by Agora.io) to distribute that data with guaranteed latencies. When multiple people are watching the same game with different latencies it would ruin the shared experience.
Currently most AR experiences are rendered completely on device; in the future, rendering could move to the cloud. Jerome Etienne has some interesting ideas around Cloud Rendering, where devices would stream the video feed to the server and the server would render the AR with better quality.
The best solutions do not exist yet, so there is lots of room for innovation. For all the advancements we are seeing in both native and web-based capabilities, these are still the fundamental elements that will one day serve as the foundation for much more complex and amazing ecosystems.
Footnotes & Asides
- Though Bose and Google are making some nice advancements, the sector is still nascent and my focus for this article will be the visual implementation of AR.
- Blippar’s core app functionality was written in C++ and had Obj-C and Java wrappers for each platform.
- A 360° experience consists of an image or video (spherical or cubic), where the media is mapped to the interior of a 3D object (sphere or cube), the device (camera viewport) is at the center, and all elements are placed relative to it.
- More complex experiences refer to face-tracking, games and the ever-so-popular photo booth with overlays and face filters.
- USDZ is a zip file containing a .USD file (3D file format from Pixar) and the accompanying texture files.
- This speaks to a larger issue within AR/VR (and the 3D industry in general): there is no standard file format. AR is still so nascent that a mutually agreed upon standard has not been released. For example, the Khronos Group (which sets the WebGL and OpenGL standards) offers .glTF, ARToolkit uses .armodel, Blippar uses .b3m, Apple’s SceneKit uses .SCN, Unity uses .obj and a variety of formats, and the list goes on.
- The rate of adoption is probably not something Apple is too concerned about, as they tend to set their standard by playing the long game, slowly growing into the mainstream.
- To generate a .USDZ from an existing 3D model using Xcode, you need to first export to an intermediary format from your 3D editor; for static models use .obj, for animated models export an
- Does that mean ARKit and ARCore are the best computer vision algorithms on the market? No, but it does mean that these complex CV algorithms are now democratized to the general developer and optimized for performance on the supported devices.
- iOS supports QR code reading within apps using the metadataOutput() delegate or CoreML for any other computer vision tasks. Android MLKit handles all computer vision related tasks.
- Just between you, me and the internet… I would wager that Aero will incorporate JS in some way. I know a few former Blippar employees that are working at Adobe as part of project Aero and their backgrounds have a lot of AR tied to JS experience.
- “Starting from YouTube frames (top row), the depth maps reconstructed by the neural network can populate a virtual 3D soccer environment, shown here as mesh-only and textured renderings (rows 2–4).” (Danny Paez)