Memory alignment of Armadillo vec/fvec

Programming Q&A 2022-05-22

Question

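I want to load data from an Armadillo vector's .memptr() directly into a __m256. Does Armadillo guarantee that the data memory is 256-bit aligned? If it does, I would cast the float/double pointer returned by .memptr() to a __m256 pointer and skip _mm256_load_ps(), provided that makes sense performance-wise.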

Answer

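Armadillo's documentation does not appear to mention this, so the alignment is unspecified. You should therefore not assume that the vector data is 32-byte aligned.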
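Since the alignment is unspecified, one option is to check it at runtime rather than assume it. The helper below is a minimal sketch; `is_aligned` is a hypothetical name, not part of Armadillo or of any intrinsics header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true if `ptr` lies on an `alignment`-byte boundary.
// `alignment` must be a power of two (32 for a 256-bit AVX register).
inline bool is_aligned(const void* ptr, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

For an arma::fvec v, the check would be is_aligned(v.memptr(), 32). A result of true for one vector says nothing about other vectors or other builds, so it is not a substitute for a documented guarantee.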

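However, the vector data does not need to be aligned for you to load it into AVX registers: you can use the unaligned load intrinsic _mm256_loadu_ps. AFAIK, _mm256_load_ps and _mm256_loadu_ps perform roughly the same on relatively recent x86 processors.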
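As an illustration of the unaligned load, here is a minimal sketch that sums a float buffer with _mm256_loadu_ps. The function names sum_avx and sum_floats are hypothetical, and the target attribute and __builtin_cpu_supports are GCC/Clang extensions on x86, which is an assumption about your toolchain rather than anything the question states.

```cpp
#include <cstddef>
#include <immintrin.h>

// Sums n floats with unaligned 256-bit loads; `data` needs no particular alignment.
// The target attribute (GCC/Clang only) enables AVX for this one function.
__attribute__((target("avx")))
static float sum_avx(const float* data, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        // Unaligned load: valid for any address, aligned or not.
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(data + i));
    }
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);              // spill the 8 partial sums
    float total = 0.0f;
    for (float x : lanes) total += x;
    for (; i < n; ++i) total += data[i];       // scalar tail (fewer than 8 left)
    return total;
}

// Hypothetical wrapper: uses the AVX path only when the CPU reports AVX support.
float sum_floats(const float* data, std::size_t n) {
    if (__builtin_cpu_supports("avx")) return sum_avx(data, n);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += data[i];
    return total;
}
```

With an arma::fvec v, the call would be sum_floats(v.memptr(), v.n_elem). Using the load intrinsic instead of casting the pointer to __m256* keeps the code correct whatever alignment Armadillo happens to return.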

armadillo c++ intrinsics performance
