(2021-09-28)3D-Reconstruction Review(1)
Fundamental Reading
(These are just notes from my own study. Since these are among the first related papers I read, I went through them in detail.)
Efficient Reflectance Capture Using an Autoencoder
1. Problem Introduction
Rendering equation: https://zhuanlan.zhihu.com/p/52497510?from_voters_page=true
BRDF concept: https://www.cnblogs.com/mengdd/archive/2013/08/05/3237991.html
Batch normalization layer concept: https://zhuanlan.zhihu.com/p/74516930
Stacked neural networks are touched on here: https://www.cnblogs.com/tornadomeet/archive/2013/05/05/3061457.html
What a "lobe" means in the rendering equation: https://www.zhihu.com/question/314492478
2. Related Work
2.1 Surveys
Tim Weyrich, Jason Lawrence, Hendrik P. A. Lensch, Szymon Rusinkiewicz, and Todd
Zickler. 2009. Principles of Appearance Acquisition and Representation. Found.
Trends. Comput. Graph. Vis. 4, 2 (2009), 75–191.
Michael Weinmann and Reinhard Klein. 2015. Advances in Geometry and Reflectance
Acquisition. In SIGGRAPH Asia Courses. Article 1, 71 pages.
2.2 Direct Sampling
- Kristin J. Dana, Bram van Ginneken, Shree K. Nayar, and Jan J. Koenderink. 1999.
Reflectance and Texture of Real-world Surfaces. ACM Trans. Graph. 18, 1 (Jan. 1999),
1–34.
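Samples a large number of lighting and view directions, exhaustively covering every combination of the two.
Comment: typically time-consuming.
- The methods below try to reduce this cost, and the number of images needed, by making assumptions about the data.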
- Stephen R. Marschner, Stephen H. Westin, Eric P. F. Lafortune, Kenneth E. Torrance,
and Donald P. Greenberg. 1999. Image-based BRDF Measurement Including Human
Skin. In Proc. EGWR. 131–144.
For a homogeneous convex object, the reflectance can be recovered from a single view, because the normal variations sample the angular domain densely enough.
- Hendrik P. A. Lensch, Jan Kautz, Michael Goesele, Wolfgang Heidrich, and Hans-Peter
Seidel. 2003. Image-based Reconstruction of Spatial Appearance and Geometric
Detail. ACM Trans. Graph. 22, 2 (April 2003), 234–257.
Assuming the appearance is a linear combination of basis materials, the spatially varying reflectance of an object with known shape can be reconstructed from a small number of photographs.
- Todd Zickler, Sebastian Enrique, Ravi Ramamoorthi, and Peter Belhumeur. 2005. Reflectance Sharing: Image-based Rendering from a Sparse Set of Images. In Proc. EGSR.
253–264.
Reconstructs the reflectance by scattered-data interpolation of the reflectance samples in a 6D space.
- Jiaping Wang, Shuang Zhao, Xin Tong, John Snyder, and Baining Guo. 2008. Modeling
Anisotropic Surface Reflectance with Example-based Microfacet Synthesis. ACM
Trans. Graph. 27, 3, Article 41 (Aug. 2008), 9 pages.
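Exploits the spatial similarity of the reflectance and the spatial variation of the local frames to complete the microfacet distributions of the BRDFs from single-view measurements.
- Yue Dong, Jiaping Wang, Xin Tong, John Snyder, Yanxiang Lan, Moshe Ben-Ezra, and
Baining Guo. 2010. Manifold Bootstrapping for SVBRDF Capture. ACM Trans.
Graph. 29, 4, Article 98 (July 2010), 10 pages.
Assumes the reflectance lies on a low-dimensional manifold and proposes a two-stage model.
- Miika Aittala, Tim Weyrich, and Jaakko Lehtinen. 2015. Two-shot SVBRDF Capture for
Stationary Materials. ACM Trans. Graph. 34, 4, Article 110 (July 2015), 13 pages.
Uses two images to model the appearance of stochastic-texture-like materials.
- The papers above serve only as a comparison. About their own method the authors write: "In particular, we reconstruct the reflectance at each point independently, despite the low number of measurements." (?)
In addition, by training the paper's autoencoder, additional material properties can easily be exploited (no manual derivation is needed).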
2.3 Complex Lighting Patterns
- Similar to this paper's method: record the sample's response under different lighting patterns in order to recover its reflectance properties.
- Abhijeet Ghosh, Tongbo Chen, Pieter Peers, Cyrus A. Wilson, and Paul Debevec. 2009.
Estimating Specular Roughness and Anisotropy from Second Order Spherical Gradient Illumination. Computer Graphics Forum 28, 4 (2009), 1161–1170.
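Uses spherical harmonics (SH) lighting patterns and then recovers the parameters with a manually derived inverse lookup table, which maps the observed radiance to anisotropic BRDF parameters.
- Giljoo Nam, Joo Ho Lee, Hongzhi Wu, Diego Gutierrez, and Min H. Kim. 2016. Simultaneous Acquisition of Microscale Reflectance and Normals. ACM Trans. Graph. 35, 6,
Article 185 (Nov. 2016), 11 pages.
Proposes a similar system that, assuming a small number of basis materials, reconstructs micro-scale reflectance through automatic optimization.
- The systems above all assume that the incident lighting is distant relative to the size of the sample.
- Andrew Gardner, Chris Tchou, Tim Hawkins, and Paul Debevec. 2003. Linear light
source reflectometry. ACM Trans. Graph. 22, 3 (2003), 749–758.
- Peiran Ren, Jiaping Wang, John Snyder, Xin Tong, and Baining Guo. 2011. Pocket
reflectometry. ACM Trans. Graph. 30, 4 (2011), 1–10.
Both use a linear light source with planar, isotropic material samples.
- Guojun Chen, Yue Dong, Pieter Peers, Jiawan Zhang, and Xin Tong. 2014. Reflectance
Scanning: Estimating Shading Frame and BRDF with Generalized Linear Light
Sources. ACM Trans. Graph. 33, 4, Article 117 (July 2014), 11 pages.
Extends this to anisotropic reflectance by modulating the intensity of the linear light source and assuming that the appearance subspace is low-rank.
- Miika Aittala, Tim Weyrich, and Jaakko Lehtinen. 2013. Practical SVBRDF Capture in
the Frequency Domain. ACM Trans. Graph. 32, 4, Article 110 (July 2013), 12 pages.
Proposes a setup with a single camera and a tilted near-field LCD panel acting as a programmable area light source, to capture isotropic reflectance. It is based on a manual frequency-domain analysis.
- This paper uses 16 to 32 lighting patterns to efficiently and reliably capture spatially varying anisotropic BRDFs together with their local frames. The related work relies heavily on manual derivation, whereas this paper uses machine learning to decide automatically which lighting patterns to use and how to recover the reflectance from the measurements.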
2.4 Deep-Learning-Assisted Reflectance Modeling
- In recent years (as of 2018), deep learning techniques had only been applied to reflectance reconstruction from a single image.
- Miika Aittala, Timo Aila, and Jaakko Lehtinen. 2016. Reflectance Modeling by Neural
Texture Synthesis. ACM Trans. Graph. 35, 4, Article 65 (July 2016), 13 pages.
Models an isotropic SVBRDF and surface normals by analyzing a single flash image of a stationary textured material. The difficulty lies in exact point-to-point correspondence, which they sidestep by using a CNN-based texture descriptor.
- Xiao Li, Yue Dong, Pieter Peers, and Xin Tong. 2017. Modeling Surface Appearance
from a Single Photograph Using Self-augmented Convolutional Neural Networks.
ACM Trans. Graph. 36, 4, Article 45 (July 2017), 11 pages.
Presents a CNN-based solution for modeling SVBRDF from a single photograph of a planar sample with unknown natural illumination, using a self-augmentation training process.
- For background on this class of methods, see the references below:
deep autoencoders: G. E. Hinton and R. R. Salakhutdinov. 2006. Reducing the Dimensionality of Data with
Neural Networks. Science 313, 5786 (2006), 504–507.
deep learning techniques: Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press.
http://www.deeplearningbook.org.
3. Background
3.1 The Rendering Equation
First, let's work out why the rendering equation takes this form:
$$
B(I, \mathbf{p})=\int \frac{1}{\left\|\mathbf{x}_{l}-\mathbf{x}_{p}\right\|^{2}} I(l)\, \Psi\left(\mathbf{x}_{l},-\omega_{i}\right) f_{r}\left(\omega_{i}^{\prime} ; \omega_{o}^{\prime}, \mathbf{p}\right)
\left(\omega_{i} \cdot \mathbf{n}_{p}\right)\left(-\omega_{i} \cdot \mathbf{n}_{l}\right) d \mathbf{x}_{l}
$$
where:
“
We model each light as a locally planar source. $\mathbf{x}_{p} / \mathbf{n}_{p}$ are the position / normal of a point $\mathbf{p}$ on the physical sample, and $\mathbf{x}_{l} / \mathbf{n}_{l}$ are the position / normal of a point on a light source $l$. $\omega_{i} / \omega_{o}$ are the lighting / view directions in the world space, while $\omega_{i}^{\prime} / \omega_{o}^{\prime}$ are their counterparts expressed in the local frame of $\mathbf{p}$. $\omega_{i}$ can be computed as $\omega_{i}=\frac{\mathbf{x}_{l}-\mathbf{x}_{p}}{\left\|\mathbf{x}_{l}-\mathbf{x}_{p}\right\|}$. $I(l)$ is the programmable intensity for the light $l$ over its maximum intensity, in the range of $[0,1]$. The array $\{I(l)\}_{l}$ corresponds to a lighting pattern. $\Psi\left(\mathbf{x}_{l}, \cdot\right)$ describes the angular distribution of the light intensity when fully on. $f_{r}\left(\cdot ; \omega_{o}^{\prime}, \mathbf{p}\right)$ is a 2D BRDF slice, which is a function of the lighting direction only. The above integral is computed over all light sources.
”
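Recall that the rendering equation in the Zhihu article linked above looks like this:
$$
L_{o}\left(p, \omega_{o}\right)=L_{e}\left(p, \omega_{o}\right)+\int_{\xi^{2}} f_{r}\left(p, \omega_{i} \rightarrow \omega_{o}\right) L_{i}\left(p, \omega_{i}\right) \cos \theta \, d \omega_{i}
$$
$L_{o}\left(p, \omega_{o}\right)$: the outgoing radiance that is finally observed
$p$: the point whose radiance we want
$\omega_{o}$: the outgoing (view) direction at that point
$L_{e}\left(p, \omega_{o}\right)$: the radiance emitted by the surface itself
$\xi^{2}$: all directions on the hemisphere
$f_{r}$: the scattering function (the BRDF)
$L_{i}$: the incident radiance
$\omega_{i}$: the incident direction
$\theta$: the angle between the incident direction and the normal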
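- Since $\omega_i=\frac{\mathbf{x}_l-\mathbf{x}_p}{\|\mathbf{x}_l-\mathbf{x}_p\|}$, a small patch $d\mathbf{x}_l$ on a light subtends the solid angle $d\omega_i=\frac{(-\omega_i\cdot \mathbf{n}_l)}{\|\mathbf{x}_l-\mathbf{x}_p\|^2}\,d\mathbf{x}_l$ as seen from $\mathbf{p}$. The integral over all hemisphere directions therefore becomes an integral over the light surfaces: $\int_{\xi^2}(\cdot)\,d\omega_i=\int (\cdot)\,\frac{(-\omega_i\cdot \mathbf{n}_l)}{\|\mathbf{x}_l-\mathbf{x}_p\|^2}\,d\mathbf{x}_l$
- $\cos\theta=\omega_i\cdot \mathbf{n}_p$
- Note what $L_i$ means here: it is the radiance leaving the light toward $\mathbf{p}$, so $L_i(p,\omega_i)=I(l)\,\Psi(\mathbf{x}_l,-\omega_i)$. The factor $(-\omega_i\cdot \mathbf{n}_l)$ is the cosine at the light and is already part of the change of variables above. The sample itself does not emit light, so $L_e=0$.
(The notation is a bit messy, but this is the idea.)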
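To make the change of variables concrete, here is a minimal Python sketch that evaluates the integrand of the paper's rendering equation for one small planar light patch and multiplies it by the patch area. The function name `light_contribution`, the constant `psi`, the `area` argument and the clamping of back-facing cosines to zero are my own assumptions for illustration, not something taken from the paper.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def light_contribution(x_l, n_l, x_p, n_p, intensity, brdf_value, area, psi=1.0):
    """Integrand of the rendering equation for one small light patch, times its area.

    Assumptions: the patch is small enough to be treated as a single point,
    psi (angular distribution of the light) is a constant, and back-facing
    configurations contribute nothing.
    """
    d = sub(x_l, x_p)
    dist2 = dot(d, d)
    dist = math.sqrt(dist2)
    w_i = tuple(c / dist for c in d)      # unit vector from p toward the light
    cos_p = max(dot(w_i, n_p), 0.0)       # (w_i . n_p), cosine at the sample point
    cos_l = max(-dot(w_i, n_l), 0.0)      # (-w_i . n_l), cosine at the light
    return intensity * psi * brdf_value * cos_p * cos_l * area / dist2


# A light patch 2 units above the point, facing straight down, Lambertian BRDF 1/pi.
value = light_contribution((0, 0, 2), (0, 0, -1), (0, 0, 0), (0, 0, 1),
                           intensity=1.0, brdf_value=1.0 / math.pi, area=0.01)
```

Summing this quantity over all light patches gives a discrete approximation of $B(I, \mathbf{p})$.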
3.2 Computing the BRDF
The paper's framework is not tied to any specific BRDF model. For concreteness it uses the (fairly general) model from the paper below:
Bruce Walter, Stephen R. Marschner, Hongsong Li, and Kenneth E. Torrance. 2007.
Microfacet Models for Refraction through Rough Surfaces. In Rendering Techniques
(Proc. EGWR).
$$
\begin{aligned}
& f_{r}\left(\omega_{i} ; \omega_{o}, \mathbf{p}\right) \\
=& \frac{\rho_{d}}{\pi}+\rho_{s} \frac{D_{\mathrm{GGX}}\left(\omega_{h} ; \alpha_{x}, \alpha_{y}\right) F\left(\omega_{i}, \omega_{h}\right) G_{\mathrm{GGX}}\left(\omega_{i}, \omega_{o} ; \alpha_{x}, \alpha_{y}\right)}{4\left(\omega_{i} \cdot \mathbf{n}\right)\left(\omega_{o} \cdot \mathbf{n}\right)}
\end{aligned}
$$
“
where $\rho_{d} / \rho_{s}$ are the diffuse / specular albedo, $\alpha_{x} / \alpha_{y}$ are the roughness parameters, and $\omega_{h}$ is the half vector. $D_{\mathrm{GGX}}$ is the microfacet distribution function, $F$ is the Fresnel term and $G_{\mathrm{GGX}}$ accounts for shadowing / masking effects, all of which are detailed in the supplemental material for brevity.
”
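Note: for the difference between reflectance and albedo, see https://blog.csdn.net/qq_35045096/article/details/91949463
On the relationship between the microsurface and the macrosurface, and on the microfacet distribution function:
https://blog.csdn.net/weixin_33856370/article/details/85929381
Fresnel equations: https://blog.csdn.net/xuehuic/article/details/6229532?locationNum=14
(The Fresnel term describes what fraction of the incident light is reflected, as a function of the angle of incidence.)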
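The BRDF formula above can be sketched in Python as follows. The notes do not give the exact forms of $D_{\mathrm{GGX}}$, $F$ and $G_{\mathrm{GGX}}$ (the paper defers them to its supplemental material), so this sketch assumes the commonly used anisotropic GGX distribution, a separable Smith shadowing/masking term, and Schlick's Fresnel approximation with a hypothetical `f0` parameter. Directions are assumed to be unit vectors already expressed in the local frame, with the normal along the z axis and the tangent along the x axis.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def d_ggx(h, ax, ay):
    """Anisotropic GGX microfacet distribution (assumed form), local frame."""
    t = (h[0] / ax) ** 2 + (h[1] / ay) ** 2 + h[2] ** 2
    return 1.0 / (math.pi * ax * ay * t * t)


def g1_ggx(v, ax, ay):
    """Smith masking term for one direction (assumed form)."""
    if v[2] <= 0.0:
        return 0.0
    tan2 = ((ax * v[0]) ** 2 + (ay * v[1]) ** 2) / (v[2] ** 2)
    return 2.0 / (1.0 + math.sqrt(1.0 + tan2))


def fresnel_schlick(cos_ih, f0):
    """Schlick approximation; f0 is a hypothetical normal-incidence reflectance."""
    return f0 + (1.0 - f0) * (1.0 - cos_ih) ** 5


def brdf(w_i, w_o, rho_d, rho_s, ax, ay, f0=0.04):
    """f_r = rho_d / pi + rho_s * D * F * G / (4 (w_i . n)(w_o . n))."""
    if w_i[2] <= 0.0 or w_o[2] <= 0.0:
        return 0.0  # below the surface: no reflection
    h = normalize(tuple(a + b for a, b in zip(w_i, w_o)))  # half vector
    cos_ih = sum(a * b for a, b in zip(w_i, h))
    g = g1_ggx(w_i, ax, ay) * g1_ggx(w_o, ax, ay)
    spec = d_ggx(h, ax, ay) * fresnel_schlick(cos_ih, f0) * g / (4.0 * w_i[2] * w_o[2])
    return rho_d / math.pi + rho_s * spec
```

With `rho_s = 0` the function reduces to the Lambertian term `rho_d / pi`, and swapping `w_i` and `w_o` leaves the value unchanged, which is a quick sanity check on any implementation of this model.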
3.3 Lumitexel
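Because the rendering equation in Sec. 3.1 is linear in $I(l)$, it can be written as the sum below, where $m$ is called the lumitexel:
$$
B(I, \mathbf{p})=\sum_{l} I(l)\, m(l ; \mathbf{p})
$$
Following the paper: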
Hendrik P. A. Lensch, Jan Kautz, Michael Goesele, Wolfgang Heidrich, and Hans-Peter
Seidel. 2003. Image-based Reconstruction of Spatial Appearance and Geometric
Detail. ACM Trans. Graph. 22, 2 (April 2003), 234–257
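$m$ is a function of the light source $j$, defined on each point $\mathbf{p}$ of the physical sample:
$$
m(j ; \mathbf{p})=B(\{I(l=j)=1, I(l \neq j)=0\}, \mathbf{p})
$$
The notation means that all other lights are switched off and only light $j$ is on, at full intensity ($I(j)=1$). In this paper the lumitexel serves as the main data structure.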
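The two formulas above can be illustrated with a few lines of Python: a measurement is a dot product between a lighting pattern and the lumitexel, and a one-hot pattern reads out a single lumitexel entry. The lumitexel values and the helper names `measure` and `one_hot` are made up for illustration.

```python
def measure(pattern, lumitexel):
    """B(I, p) = sum over lights l of I(l) * m(l; p)."""
    if len(pattern) != len(lumitexel):
        raise ValueError("pattern and lumitexel must have one entry per light")
    return sum(i * m for i, m in zip(pattern, lumitexel))


def one_hot(j, n):
    """Lighting pattern with only light j switched on at full intensity."""
    return [1.0 if l == j else 0.0 for l in range(n)]


# Hypothetical lumitexel for a device with four lights.
lumitexel = [0.2, 0.0, 1.5, 0.7]

# Lighting one light at a time recovers the lumitexel entry by entry.
recovered = [measure(one_hot(j, len(lumitexel)), lumitexel)
             for j in range(len(lumitexel))]

# A general pattern mixes several entries into a single measurement.
mixed = measure([0.5, 1.0, 0.0, 1.0], lumitexel)
```

Capturing the lumitexel one light at a time needs as many photographs as there are lights, which is why the paper looks for a small number of mixed patterns instead.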
3.4 Problem Formulation
“
Problem Formulation. From Eq. 1&2, for a point $\mathbf{p}$ on the physical sample, reflectance acquisition is essentially to solve for the unknown BRDF $f_{r}$ and its local frame, parameterized as $\{\rho_{d}, \rho_{s}, \alpha_{x}, \alpha_{y}, \mathbf{n}, \mathbf{t}\}$, from the photographs $\{B(I, \mathbf{p})\}_{I}$ captured with predetermined lighting patterns $\{\{I(l)\}_{l}\}$. All other variables involved in Eq. 1 can be pre-calibrated.
”
$\mathbf{n}$ is the normal and $\mathbf{t}$ is the tangent; together they define the local frame.
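As a small illustration of what the local frame is used for, the sketch below expresses a world-space direction in the frame spanned by the tangent, the bitangent and the normal. The axis convention (x = $\mathbf{t}$, y = $\mathbf{n} \times \mathbf{t}$, z = $\mathbf{n}$) and the name `to_local` are assumptions of mine; the notes do not state which convention the paper uses.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def to_local(w, n, t):
    """Express world-space direction w in the local frame (t, n x t, n).

    Assumes n and t are unit length and orthogonal to each other.
    """
    b = cross(n, t)  # bitangent completes the right-handed frame
    return (dot(w, t), dot(w, b), dot(w, n))


# With n along z and t along x, the local frame coincides with the world frame.
same = to_local((0.6, 0.0, 0.8), n=(0.0, 0.0, 1.0), t=(1.0, 0.0, 0.0))

# With n along x, the normal itself maps to the local z axis.
tilted = to_local((1.0, 0.0, 0.0), n=(1.0, 0.0, 0.0), t=(0.0, 1.0, 0.0))
```

This is the step that turns the world-space $\omega_i / \omega_o$ into the primed directions that the BRDF takes as input.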
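4. Framework
The paper proposes a deep autoencoder for lumitexels (L-DAE) that encodes and decodes the lumitexel $m$ at each point $\mathbf{p}$, for a given number of lighting patterns. It then fits a 4D BRDF along with the local frame to the lumitexel (?). This is done for every point on the sample, producing texture maps that describe the 6D SVBRDF.
Essentially there are two steps: the first obtains the encoded lumitexel and decodes it, and the second is a separate BRDF fitting step that produces the final 4D BRDF.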
5. L-DAE
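“In acquisition, it is applied to each of the RGB channels to obtain an RGB lumitexel as the result.”
- The DAE consists of two parts: a nonnegative, linear encoder, and a stacked, nonlinear decoder. The encoder is realized by projecting the lighting patterns onto the physical sample and measuring the reflected light; in essence it is “performing dot products between the lumitexel and the lighting patterns”. The lighting patterns correspond to the weights of the encoder. The decoder is a stacked network of increasing width, structured as follows:
- The encoder is effectively a convolutional layer with no padding. Its kernel has size $c\times1\times\#$, where $c$ is the dimension of the lumitexel and $\#$ is the number of lighting patterns.
- The decoder network follows the encoder and has 11 fully connected layers. Fully connected layers are used to avoid assuming any spatial relationship between different elements of a lumitexel. Each fully connected layer is preceded by a batch normalization layer and followed by a leaky ReLU activation (the first fully connected layer has no BN layer in front of it).
- The L-DAE differs from a conventional autoencoder in a few ways. First, the encoder corresponds to the physical acquisition process implemented in hardware (it literally is a physical process). This rules out complex operations and leaves only nonnegative multiplications and additions between the lumitexel and the lighting patterns. The decoder runs on a computer and has no such restriction, so a stacked nonlinear neural network is used as the decoder to take advantage of modern deep learning techniques.
My impression of the pipeline is as follows. First the lumitexel is obtained by some technical means; it does not depend on the lighting pattern (a "pattern" is really just the light intensity at each light position). The sample is then lit with some patterns, and the resulting $B(I,\mathbf{p})$ values are fed into the decoder to train the network. Note that the lighting patterns do not change; what is trained is that, for different lumitexels, the values computed with the lighting patterns decode back to the original lumitexel.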
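The encoder/decoder split described above can be sketched in plain Python. This is only a toy illustration under my own assumptions: the layer sizes and weights are invented, batch normalization is left out, the leaky ReLU slope of 0.2 is hypothetical, and the real decoder has 11 layers rather than one.

```python
def encode(lumitexel, patterns):
    """Linear encoder: one dot product per lighting pattern.

    Each row of `patterns` is a lighting pattern with entries in [0, 1];
    physically this is what the camera measures under that pattern.
    """
    return [sum(w * m for w, m in zip(row, lumitexel)) for row in patterns]


def leaky_relu(x, slope=0.2):
    return x if x > 0.0 else slope * x


def fully_connected(x, weights, biases):
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def decode(code, layers):
    """Nonlinear decoder: a stack of fully connected layers, each followed
    by a leaky ReLU (batch normalization omitted in this sketch)."""
    h = code
    for weights, biases in layers:
        h = [leaky_relu(v) for v in fully_connected(h, weights, biases)]
    return h


# Hypothetical sizes: 3 lights, 2 lighting patterns, one decoder layer of width 3.
lumitexel = [1.0, 2.0, 3.0]
patterns = [[1.0, 0.0, 0.0],
            [0.0, 0.5, 1.0]]
layers = [([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, -10.0])]

code = encode(lumitexel, patterns)    # two measurements
reconstruction = decode(code, layers)  # back to lumitexel size
```

The point of the sketch is the asymmetry: `encode` may only use nonnegative multiplications and additions because it is carried out by light, while `decode` can be any function a computer can evaluate.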
5.1 Loss Function
$$
L=L_{\text {auto }}(m)+\lambda \sum_{w \in \text { enc. }} L_{\text {barrier }}(w)
$$
The first term is the error between the reconstructed $m$ and the ground truth:
$$
L_{\text {auto }}(m)=\sum_{j}\left[\log (1+m(j))-\log \left(1+m_{\mathrm{gt}}(j)\right)\right]^{2}
$$
The log is used to handle the possibly large values in the specular lobe. This is similar to the approach in the following paper:
Jannik Boll Nielsen, Henrik Wann Jensen, and Ravi Ramamoorthi. 2015. On Optimal,
Minimal BRDF Sampling for Reflectance Acquisition. ACM Trans. Graph. 34, 6,
Article 186 (Oct. 2015), 11 pages.
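A minimal Python sketch of $L_{\text{auto}}$ follows, mainly to show how the log compresses large specular values. The function name `l_auto` and the example numbers are my own.

```python
import math


def l_auto(m, m_gt):
    """Sum over lights of [log(1 + m(j)) - log(1 + m_gt(j))]^2."""
    return sum((math.log1p(a) - math.log1p(b)) ** 2 for a, b in zip(m, m_gt))


# An absolute error of 1 on a dim entry versus on a bright specular peak.
dim_error = l_auto([1.0], [0.0])
peak_error = l_auto([1000.0], [999.0])
```

Both cases have the same absolute error, but the loss on the specular peak is far smaller, so a few very bright entries do not dominate training.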
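The second term ensures that the computed lighting patterns are physically plausible.
“It penalizes any weight w in the encoder that is beyond the range of [0, 1], as w corresponds to the ratio of the lighting intensity over its maximum intensity for each source”
I am confused here: this w is what the penalty on the lighting patterns applies to, so how is it computed? How does training ever update it? Isn't it a coefficient that is set by hand?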
$$
L_{\text {barrier }}(w)=\tanh \left(\frac{w-(1-\epsilon)}{\epsilon}\right)+\tanh \left(\frac{-w+\epsilon}{\epsilon}\right)+2
$$
We find that $\lambda=0.03, \epsilon=0.005$ works well in our experiments.
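Evaluating the barrier term numerically helps to see what it does. The sketch below implements $L_{\text{barrier}}$ and the total loss with the quoted values $\lambda=0.03$ and $\epsilon=0.005$; the function names are my own.

```python
import math


def l_barrier(w, eps=0.005):
    """tanh((w - (1 - eps)) / eps) + tanh((-w + eps) / eps) + 2."""
    return math.tanh((w - (1.0 - eps)) / eps) + math.tanh((-w + eps) / eps) + 2.0


def total_loss(l_auto_value, encoder_weights, lam=0.03, eps=0.005):
    """L = L_auto + lambda * sum of L_barrier over the encoder weights."""
    return l_auto_value + lam * sum(l_barrier(w, eps) for w in encoder_weights)


inside = l_barrier(0.5)    # well inside [0, 1]
above = l_barrier(1.5)     # above the maximum intensity
below = l_barrier(-0.5)    # negative intensity, not realizable with light
```

The term is essentially zero in the interior of $[0, 1]$ and rises steeply to about 2 within a few $\epsilon$ of either end, so it acts as a soft wall rather than a hard clamp.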
5.2 Training Data
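The training data are simulated with the rendering equation, using a large number of randomly generated $f_r$, local frames, and locations on the physical sample.
(The light positions are not variables, because they are fixed in the device, which is built like a box.)
The samples are generated as follows:
“Specifically, for the local frame, we randomly sample n in the upper hemisphere of the sample plane, and then t as a random unit vector that is orthogonal to n. Similarly, for the location on the physical sample, we randomly choose a point from the valid region of the sample plane. For the BRDF $f_r$, we use the anisotropic GGX model and randomly sample $\rho_d / \rho_s$ uniformly in the range of [0, 1], and $\alpha_x / \alpha_y$ uniformly on the log scale in the range of [0.006, 0.5]. The calibration data of the acquisition setup (Sec. 6) are used when evaluating Eq. 4 for training lumitexel generation.”
One question: what does "sample n in the upper hemisphere of the sample plane" mean?
Although the GGX model is used to generate $f_r$ for the training data, note that the framework is not restricted to any particular BRDF model, and the BRDF model used during training does not have to be the same as the one used for the final result.
(My understanding: the network is essentially doing something like solving a system of linear equations, and the only fixed part of the model is the lighting patterns, i.e. what is trained is the process of solving the equations under certain lighting patterns. Since $m$ contains the BRDF as a whole, it should not be affected by the choice of BRDF model.)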
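One plausible reading of the quoted sampling procedure is sketched below. It assumes that the sample plane is $z = 0$, so that "upper hemisphere" means unit normals with $z \ge 0$, and that directions are drawn uniformly; the quote only says "randomly". The location on the sample plane is left out because the valid region depends on the device. All function names are hypothetical.

```python
import math
import random


def random_unit_vector(rng):
    """Uniform random direction, obtained by normalizing a Gaussian vector."""
    while True:
        v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
        length = math.sqrt(sum(c * c for c in v))
        if length > 1e-3:
            return tuple(c / length for c in v)


def sample_normal(rng):
    """Unit normal in the upper hemisphere (z >= 0) of the sample plane."""
    n = random_unit_vector(rng)
    return n if n[2] >= 0.0 else tuple(-c for c in n)


def sample_tangent(n, rng):
    """Random unit vector orthogonal to n: project a random vector onto
    the plane perpendicular to n and renormalize."""
    while True:
        v = random_unit_vector(rng)
        d = sum(a * b for a, b in zip(v, n))
        proj = tuple(a - d * b for a, b in zip(v, n))
        length = math.sqrt(sum(c * c for c in proj))
        if length > 1e-3:
            return tuple(c / length for c in proj)


def sample_log_uniform(lo, hi, rng):
    """Uniform on the log scale between lo and hi."""
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))


def sample_training_params(rng):
    n = sample_normal(rng)
    return {
        "n": n,
        "t": sample_tangent(n, rng),
        "rho_d": rng.uniform(0.0, 1.0),
        "rho_s": rng.uniform(0.0, 1.0),
        "alpha_x": sample_log_uniform(0.006, 0.5, rng),
        "alpha_y": sample_log_uniform(0.006, 0.5, rng),
    }


params = sample_training_params(random.Random(0))
```

Sampling the roughness on the log scale gives sharp, mirror-like materials the same share of the training set as rough ones, even though they occupy a tiny part of the linear range.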
6. Acquisition Setup
See the paper. Its setup is the same as the one used in the next paper.
7. Implementation Details
- The model is trained by backpropagation with RMSProp (with mini-batches of 50 and a momentum of 0.9):
Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-RMSProp: Divide the Gradient
by a Running Average of Its Recent Magnitude. Neural Networks for Machine
Learning 4 (2012), 26–31.
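For reference, here is a minimal sketch of one common formulation of RMSProp with momentum, applied to a toy one-dimensional problem. The notes only give the mini-batch size and the momentum of 0.9; the learning rate, the decay of the running average and the epsilon below are hypothetical values, and the paper's actual optimizer settings may differ.

```python
import math


def rmsprop_step(w, grad, state, lr=0.01, decay=0.9, momentum=0.9, eps=1e-8):
    """One RMSProp update with momentum on a scalar parameter.

    The gradient is divided by a running average of its recent magnitude,
    and the scaled step is accumulated into a velocity term.
    """
    state["mean_square"] = decay * state["mean_square"] + (1.0 - decay) * grad * grad
    scaled = lr * grad / math.sqrt(state["mean_square"] + eps)
    state["velocity"] = momentum * state["velocity"] + scaled
    return w - state["velocity"]


def minimize_quadratic(steps):
    """Run RMSProp on the toy objective f(w) = (w - 3)^2, starting from w = 0."""
    w = 0.0
    state = {"mean_square": 0.0, "velocity": 0.0}
    history = []
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)
        w = rmsprop_step(w, grad, state)
        history.append(w)
    return history


trajectory = minimize_quadratic(5)
```

Because the gradient is normalized by its own running magnitude, the first step has size `lr / sqrt(1 - decay)` regardless of how large the gradient is, which is the property that makes the method robust to badly scaled losses.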
My notes from watching this course (Neural Networks for Machine Learning) are at: