
Elliott Round

Apr 28, 2026


Image-to-Material: Production-Grade PBR from a Single Image

The gap nobody's closing

There are 3D assets everywhere, but very few with true PBR materials:

  • Photogrammetry scans with baked lighting.

  • Game-ready meshes with hand-painted albedo and approximated PBR.

  • Generative 3D outputs that look great in a single screenshot and fall apart the moment you try to relight them.

  • Real-world captures sitting in archives - museums, cultural heritage projects - with geometry intact and material data completely absent.

The 3D industry is now very good at making, and increasingly at generating, shapes. But shapes without materials aren't production-ready. You can't relight them realistically or place them into a 3D scene convincingly. You can't just drop them into Unreal or Unity and expect realism out of the box, because the key ingredient is missing: the materials.

The dream: automatically add PBR to any mesh, whether photogrammetry, artist-made, or generative.

Existing approaches to automatic PBR creation have tried to solve the material problem by recovering material properties through view-dependent effects: either from multi-view images captured under varying angles, or from controlled lighting setups like light stages and photometric stereo rigs. Many of these fall short.

At M-XR, we've worked on this problem for years with our own capture technology, Marso Measure, and we can tell you directly: recovering PBR from view-dependent effects is not only technically difficult, it's fundamentally fragile. The solutions are sensitive to lighting conditions, to the quality of the input photography, to baked illumination in the source data. They don't generalise well and they don't produce stable, production-grade results.

We think there's a better way.

This blog post explores how our latest model, I2M (Image to Material), solves the core PBR problem.

Our thesis: identification, not estimation

The idea that changed how we think about PBR.

For automatically adding PBR to any 3D asset, attempting to solve for material properties through multi-view estimation is fundamentally unstable. The approach has far too many unconstrained variables: lighting consistency, exposure consistency, baked illumination. In our honest opinion, such methods do not lend themselves to a production-ready, stable solution.

However, when you look at an object - a scuffed metal handrail, a worn leather bag, a polished marble countertop - you don't need to see it from multiple angles or under controlled lighting to know what the material is. Speaking as someone with an artistic background: you don't need to solve a physics equation in your head. You look at it once and you know. You are recognising material properties from context, from experience, from having seen thousands of similar surfaces in your life.

If a person can do that from a single image, a network should be able to as well.

This reframes the entire problem. Instead of trying to invert the rendering equation, which is difficult, we train a model to identify material properties the way a human would: one image, one shot. The goal: predict the same PBR properties for an object regardless of the lighting conditions it was captured under.
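
To make that concrete, here is a minimal sketch of what such a lighting-invariance objective could look like during training, assuming paired captures of the same surface under two different lighting conditions. The function names and the loss weighting are illustrative only, not our actual training code.

```python
import torch.nn.functional as F

# Illustrative sketch only: one way to train for lighting-invariant PBR.
# Assumes the dataset yields the same object captured under two different
# lighting conditions, plus measured ground-truth PBR channels.

def training_step(model, img_light_a, img_light_b, gt_pbr):
    pred_a = model(img_light_a)  # PBR channels predicted under lighting A
    pred_b = model(img_light_b)  # PBR channels predicted under lighting B

    # Supervised term: both predictions should match the measured channels.
    loss_supervised = F.l1_loss(pred_a, gt_pbr) + F.l1_loss(pred_b, gt_pbr)

    # Invariance term: the same object under different lighting should
    # produce the same material prediction.
    loss_invariance = F.l1_loss(pred_a, pred_b)

    return loss_supervised + 0.5 * loss_invariance  # weighting is a free choice
```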

Sounds easy? So why haven't others delivered on this? Here's the catch: this only works if the training data is good enough. And by good enough, we mean physically measured, internally consistent, and captured at scale across real-world conditions. Which is exactly what we spent years building.


The Data Play

Marso Measure and the data that makes this possible

Our journey at M-XR towards I2M hasn't been a happy accident; it's the product of a deliberate multi-year strategy, and some pretty stubborn founders.

We started with the question: "How do you capture accurate PBR from real-world objects at scale?" Not in a studio or light stage, but in the field: on location at museums, indoors, outdoors… anywhere, and by anyone! If PBR capture is exclusive to a location and a budget, your dataset will be limited as a result - not good.

We built Marso Measure as the answer to this question. Using just a camera and a flash (very accessible) with a standard photogrammetry workflow, we can directly measure the light-scattering behaviour of real surfaces and map that into accurate PBR parameters. No custom hardware, no fixed-location lab environment.

With this technology, we've been building the infrastructure to generate the best PBR training data in the world, at scale. This isn’t just a feature of our approach, it's the entire strategy.

So why does this matter so much?

The training data problem everyone else is facing

If you want to train a model to predict PBR, you need PBR training data, which sounds straightforward enough. However, digging into the corpus of open-source and private PBR datasets, you quickly realise that the PBR values and quality are inconsistent, to say the least. We can break this into two groups:

Artist-made data is subjective. Different artists arrive at different values for the same material. There's no universal ground truth. The data reflects creative judgment, not physical measurement. It also misses the features that make real-world materials real: the watermark around the base of a kettle from years of use, the smooth polished patches on a metal railing where thousands of hands have gripped it over decades, the uneven wear patterns that only some artists think to add but every real object has.

Scanner-captured data is biased by hardware constraints. Cross-polarisation introduces artefacts. Photometric stereo setups make assumptions about surface geometry. Light stages can only capture objects that fit inside them - good luck scanning a piece of furniture, a tree, or the side of a building. Nearly all of this hardware is bound to indoor, fixed-location environments. The cost alone limits where and what you can capture.

This results in datasets with clear limitations: mixed quality, weak PBR signals, and a limited variety of objects. Learn more about this issue in our blog post.

We put this data to the test when training I2M, asking: how hard is it for a model to learn PBR from such datasets? The results are hard to argue with.

[Comparison: kettle PBR predictions from a model trained on open-source data vs. the same model trained on M-XR data]

The model trained on public data struggled to learn a consistent material understanding. The different parts of the kettle aren't being distinguished by the model, and the relationship between diffuse and specular is totally wrong. Yet the model trained on M-XR's measured data produced clean, physically grounded predictions across material types. Same architecture, different data. That's the gap: data is key, which is why we have spent so long perfecting data acquisition.

What M-XR's data actually looks like

Our dataset is captured across a wide variety of real-world environments, and includes objects ranging from the tiny to the extremely large. Most importantly, it also spans the full scope of PBR properties for both shader workflows across 7 measured channels: albedo, metallic, index of refraction, specular colour, diffuse, roughness, and normals.
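
For illustration, here's one way a single measured sample covering those seven channels might be laid out. The field names and array shapes are assumptions for this sketch, not our production schema.

```python
from dataclasses import dataclass
import numpy as np

# Illustrative layout of one measured sample across the seven channels.
# Field names and shapes are for this sketch only, not a production schema.

@dataclass
class MeasuredPBRSample:
    albedo: np.ndarray          # (H, W, 3) base colour with lighting removed
    metallic: np.ndarray        # (H, W)    metal/dielectric blend
    ior: np.ndarray             # (H, W)    index of refraction
    specular_color: np.ndarray  # (H, W, 3) specular reflectance colour
    diffuse: np.ndarray         # (H, W, 3) diffuse reflectance
    roughness: np.ndarray       # (H, W)    surface micro-roughness
    normals: np.ndarray         # (H, W, 3) tangent-space normals
```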

A lot of PBR estimation methods, and a lot of scanning pipelines, produce channels that are derivatives of each other. The specular map is computed from the base colour luminance. The roughness map is an inversion of another channel. They look like separate maps, but they're carrying the same information repackaged. This is a huge no-go for any ML model!

When you train on that kind of data, the model learns the same cross-contamination. Its predictions look plausible at a glance but collapse under scrutiny: the roughness doesn't carry real roughness information, it's just a function of albedo.
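
A quick, hypothetical sanity check makes this failure mode easy to spot: if a dataset's roughness map is really just repackaged base colour, the correlation between the two will sit near ±1. This is a sketch, not part of our tooling.

```python
import numpy as np

# Hypothetical sanity check: how much of a "roughness" map is just
# repackaged albedo? If roughness was derived from base-colour luminance,
# the correlation will be close to +1 or -1.

def channel_correlation(albedo_rgb: np.ndarray, roughness: np.ndarray) -> float:
    # Rec. 709 luminance of the (H, W, 3) albedo map
    luminance = albedo_rgb @ np.array([0.2126, 0.7152, 0.0722])
    # Pearson correlation between flattened luminance and roughness
    return float(np.corrcoef(luminance.ravel(), roughness.ravel())[0, 1])

# |r| near 1.0 within a single material -> little independent signal;
# |r| near 0.0 -> the roughness plausibly carries measured micro-geometry.
```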

Here you can see how the roughness of M-XR's scans (middle) isn't cross-contaminated by the colour component where material types are the same, compared with photometric stereo (right).

When I2M learns from this data, it learns to predict channels that carry genuine, independent signal. The roughness map captures real surface micro-geometry. The specular map captures real reflective behaviour. They're not derivatives of each other.

Not only does our data carry accurate material properties, but it also records all the nuanced details of an object's history: where fingerprints would sit on a bar of chocolate, the thin creases on a leather shoe from walking.


How I2M works: image in, PBR out

This long journey, years of insight and data, has compounded into our foundational PBR model, I2M. I2M is an image-space model, capable of converting any image into the full array of PBR channels.

[Pipeline: input image → I2M predicts PBR → relightable asset]

We made a conscious decision for the model to work in image space rather than mesh space, so I2M stays decoupled from geometry. It doesn't need to know about UVs, topology, or mesh structure. This gives it a huge degree of flexibility across three distinct use cases (sketched in code after the list):

Mesh retexturing — Take an existing 3D asset; artist-made, photogrammetry scan, or generative output, and enrich it with production-grade PBR. As I2M operates in image space, it respects the existing UV layout and source geometry. You're not replacing the asset, you're adding material intelligence to it. The artist's topology stays intact. The UV islands stay intact. The art direction stays intact.

Single image prediction — Pass in a photograph, a texture swatch, a piece of concept art. Get back full PBR channels. This is useful for game-development material creation, rapid prototyping, or turning reference photos into usable material data.

Multi-view consistent prediction — Feed it a set of consistent views around an object. I2M predicts PBR independently per view, maintaining material consistency across the set. This maps directly onto standard multi-view texturing pipelines for 3D assets.
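
To make the three modes concrete, here's a hypothetical usage sketch. The `i2m` module and every name below are illustrative only, not our released API (details on the real API are below).

```python
# Hypothetical usage sketch; `i2m` and every name below are illustrative.
from i2m import I2M

model = I2M.load("i2m-base")

# 1. Single image prediction: photo or concept art in, PBR channels out.
pbr = model.predict_pbr("reference_photo.jpg")
pbr.save_channels("out/")  # albedo, roughness, specular colour, ...

# 2. Multi-view consistent prediction: per-view PBR that stays consistent
#    across a turntable-style set of views.
views = [f"views/view_{i:02d}.png" for i in range(12)]
per_view_pbr = [model.predict_pbr(v) for v in views]

# 3. Mesh retexturing: the per-view predictions are baked back onto the
#    asset's existing UV layout by a standard multi-view texturing step,
#    leaving topology and UV islands untouched.
```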

Why are we doing it this way? Because we're not trying to replace your asset or your workflow. We're trying to make the assets you already have, or the assets coming out of your existing pipeline, production-ready.


Results

A large part of I2M's success criteria for us has been predicting PBR consistently across a vast array of object types, materials, and the many different states and conditions they come in. Below are some examples of such varied objects and I2M's PBR predictions, which hold up across this wide gamut of object types, proving the importance of data diversity: a key benefit of our data acquisition approach with Marso Measure.

Full PBR in 3D

Our pipeline built around I2M can convert an untextured mesh into a full-PBR, game-ready asset quickly and repeatably. Such assets can now be placed directly into a render engine under true dynamic lighting, another large step towards realism.

All of the assets used in this demonstration are either generated or downloaded from Sketchfab. Please see our Sketchfab collection for full capture credits.

On photogrammetry scans

Photogrammetry as a medium has huge potential for capturing the nuances of the real world, but the lack of PBR creates a ceiling for the utility of such 3D scans. With I2M, we can take any photogrammetry scan with baked lighting and enrich it with all the PBR material channels necessary to deliver game-ready 3D assets. This breakthrough unlocks huge potential for vast archives of photogrammetry assets.

On generative 3D outputs

Generative meshes are becoming increasingly popular in 3D, with recent developments producing assets with complex geometry and clean topology. However, the elephant in the room is still the lack of quality PBR. By adding I2M at the end of the generation pipeline to enrich these assets with PBR, you can now generate photorealistic assets that have never existed before!


What's available

We're launching Marso Studio, a platform for creating PBR assets using I2M and the broader M-XR toolset.

Alongside the platform, we'll be making I2M available via API for integration into existing pipelines. More details on API access, pricing, and documentation are coming soon.

What's next for I2M

We're continuing to invest in both the model and the data.

  • Normal map prediction is the next major channel we're adding to I2M's output set. Marso Measure already captures normals as part of our 7-channel pipeline, so the training data is there - it's a matter of model architecture and validation.

  • Higher native resolution. Currently I2M predicts at 960px per frame and can deliver 4K textures onto a mesh. We're working toward 2K native prediction per frame, which will enable 8K+ texture resolution on output.

  • Speed and efficiency improvements for the API, targeting faster inference and lower cost per prediction.


Ready to see what I2M can do?

We're running a small number of proof-of-concept integrations with established studios: bring your mesh and your reference material, and we'll show you what Marso Studio produces against your existing pipeline.

If that sounds like your situation, get in touch 👋

Not quite there yet? Join the waitlist and you'll be first to know when we open access further.