hfai.nn.functional

`set_replace_torch`	Replaces all torch.nn.functional and torch functions that hfai has optimized with their hfai.nn.functional counterparts.
`dropout`	dropout function; see `Dropout`
`hardtanh`	hardtanh function; see `Hardtanh`
`hardtanh_`	In-place hardtanh function; see `Hardtanh`
`log_softmax`	log_softmax function; see `LogSoftmax`
`softmax`	softmax function; see `Softmax`
`softmin`	softmin function; see `Softmin`
`softplus`	softplus function; see `Softplus`
`sync`	Allgathers the input tensor and concatenates the results along the given dimension; supports autograd, so gradients are passed back during backward
`relu`	relu function; see `ReLU`
`relu_`	In-place relu function; see `ReLU`
`relu6`	relu6 function; see `ReLU6`
`threshold`	threshold function; see `Threshold`
`threshold_`	In-place threshold function; see `Threshold`
`leaky_relu`	leaky_relu function; see `LeakyReLU`
`leaky_relu_`	In-place leaky_relu function
`rrelu`	rrelu function; see `RReLU`
`rrelu_`	In-place rrelu function
`hardsigmoid`	hardsigmoid function; see `Hardsigmoid`
`hardshrink`	hardshrink function; see `Hardshrink`
`softshrink`	softshrink function; see `Softshrink`
`abs`	Bit-packed abs function; usage matches `torch.abs`
`abs_`	In-place abs
`minimum`	Bit-packed minimum function; usage matches `torch.minimum`
`maximum`	Bit-packed maximum function; usage matches `torch.maximum`
`min`	Bit-packed min operator; returns the smaller of input and value. If value is a Tensor, calls hf_F.minimum(input, value); if value is a float, calls hf_F.clamp(input, max=value); if value is an int, or dim or keepdim is not None, calls torch.min(input, dim=value or dim, keepdim=keepdim)
`max`	Bit-packed max operator; returns the larger of input and value. If value is a Tensor, calls hf_F.maximum(input, value); if value is a float, calls hf_F.clamp(input, min=value); if value is an int, or dim or keepdim is not None, calls torch.max(input, dim=value or dim, keepdim=keepdim)
`clip`	clip function; see `hfai.nn.functional.clamp()`
`clip_`	In-place clip function
`clamp`	Bit-packed clamp operator; during training the intermediate result [min <= x <= max] is stored with 1 bit per element to save memory
`clamp_`	In-place clamp function
`clamp_max`	Bit-packed clamp_max operator; during training the intermediate result [x <= max] is stored with 1 bit per element to save memory
`clamp_max_`	In-place clamp_max function
`clamp_min`	Bit-packed clamp_min operator; during training the intermediate result [x >= min] is stored with 1 bit per element to save memory
`clamp_min_`	In-place clamp_min function
`where`	Bit-packed where function; usage matches `torch.where`
`masked_fill`	Bit-packed masked_fill function; usage matches `torch.masked_fill`
`masked_fill_`	In-place bit-packed masked_fill function
`masked_select`	GPU-memory-saving masked_select function; usage matches `torch.masked_select`
`masked_scatter`	GPU-memory-saving masked_scatter function; usage matches `torch.masked_scatter`
`masked_scatter_`	In-place GPU-memory-saving masked_scatter function
`scan`	Scans over the data with a given function (similar to an RNN), passing the hidden state `hidden` between adjacent steps
`associative_scan`	Scans over the data with an associative binary function (similar to a prefix sum), executed in parallel

class hfai.nn.functional.set_replace_torch(mode=True)

Replaces all torch.nn.functional and torch functions that hfai has optimized with their hfai.nn.functional counterparts. For example: torch.nn.functional.relu -> hfai.nn.functional.relu, torch.max -> hfai.nn.functional.max, x.abs() -> hfai.nn.functional.abs(x)

Parameters: mode (bool, optional) – whether to enable the replacement. Default: True

Note

After hfai.nn.functional.set_replace_torch() is called, every call to torch.nn.functional.xxx executes hfai.nn.functional.xxx, whether it is (1) made explicitly by the user or (2) made internally by PyTorch.

For example, the log_softmax inside torch.nn.CrossEntropyLoss automatically executes hfai.nn.functional.log_softmax.

Examples:

import torch
import hfai.nn.functional

hfai.nn.functional.set_replace_torch()
y = torch.nn.functional.softmax(x) # softmax runs the hfai implementation
hfai.nn.functional.set_replace_torch(False)

with hfai.nn.functional.set_replace_torch():
    loss = torch.nn.functional.cross_entropy(input, target) # the internal log_softmax runs the hfai implementation

hfai.nn.functional.dropout(input, p=0.5, training=True, inplace=False): dropout function; see Dropout

hfai.nn.functional.hardtanh(input, min_val=-1.0, max_val=1.0, inplace=False): hardtanh function; see Hardtanh

hfai.nn.functional.hardtanh_(input, min_val=-1.0, max_val=1.0): in-place hardtanh function; see Hardtanh

hfai.nn.functional.log_softmax(input, dim=None, _stacklevel=3, dtype=None): log_softmax function; see LogSoftmax

hfai.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None): softmax function; see Softmax

hfai.nn.functional.softmin(input, dim=None, _stacklevel=3, dtype=None): softmin function; see Softmin

hfai.nn.functional.softplus(input, beta=1, threshold=20): softplus function; see Softplus

hfai.nn.functional.sync(x, dist_group=False, dim=0, equal_size=False, tag=None, enable_timer=True, log_every_steps=1, timeout=60, reduce_grad=True)

Allgathers the input tensor and concatenates the results along the given dimension. Autograd is supported: gradients are passed back during backward.

F.sync.get_metrics returns a dictionary in the following format:

{
    "tag1": {"iters": 100, "fwd": 25, "bwd": 40, "size": 16},
    "tag2": {"iters": 100, "fwd": 25, "bwd": 40, "size": 16},
}

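iters is the number of times the tag was called, fwd / bwd are the average time of each forward / backward (ms), and size is the average size of the result returned by each forward (bytes).

Parameters

x (Tensor) – the input tensor
dist_group (ProcessGroup) – a ProcessGroup object; if False, no allgather is performed
dim (int) – the dimension along which the gathered tensors are concatenated
equal_size (bool) – whether the tensor has the same size on every GPU
tag (str) – timing label; each label may be used only once per forward; no timing is done when tag is None
enable_timer (bool) – whether to record timings
log_every_steps (int) – record timings once every this many steps
timeout (int) – timeout of this function in seconds; an exception is raised when it is exceeded; 0 means no time limit; default: 60
reduce_grad (bool) – whether to reduce the gradients that are passed back; default: True

Returns

The concatenated result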

Return type

out (Tensor)

Examples:

import torch
import torch.distributed as dist
import hfai.nn.functional as F

# init process group ...

rank = dist.get_rank()
x = torch.ones(1, requires_grad=True, device='cuda') * rank
out = F.sync(x, dist_group, dim=0, tag='tag1')
out.sum().backward()

# print timings, communication volume, etc.
F.sync.print_metrics()

# get the metrics
print(F.sync.get_metrics())

# reset the metrics
F.sync.reset()
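
Because fwd, bwd, and size in the metrics dictionary are per-call averages, per-tag totals can be estimated by multiplying by iters. The sketch below works on a plain dictionary in the documented format; `summarize` is a hypothetical helper, not part of hfai, and it assumes every call was timed (log_every_steps=1).

```python
def summarize(metrics):
    """Estimate per-tag totals from a get_metrics()-style dict of averages."""
    summary = {}
    for tag, m in metrics.items():
        summary[tag] = {
            "total_fwd_ms": m["iters"] * m["fwd"],
            "total_bwd_ms": m["iters"] * m["bwd"],
            "total_bytes": m["iters"] * m["size"],
        }
    return summary


# Sample values copied from the format shown in this section.
metrics = {
    "tag1": {"iters": 100, "fwd": 25, "bwd": 40, "size": 16},
    "tag2": {"iters": 100, "fwd": 25, "bwd": 40, "size": 16},
}
report = summarize(metrics)
print(report["tag1"])
# {'total_fwd_ms': 2500, 'total_bwd_ms': 4000, 'total_bytes': 1600}
```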

hfai.nn.functional.relu(input, inplace=False): relu function; see ReLU

hfai.nn.functional.relu_(input): in-place relu function; see ReLU

hfai.nn.functional.relu6(input, inplace=False): relu6 function; see ReLU6

hfai.nn.functional.threshold(input, threshold, value, inplace=False): threshold function; see Threshold

hfai.nn.functional.threshold_(input, threshold, value): in-place threshold function; see Threshold

hfai.nn.functional.leaky_relu(input, negative_slope=0.01, inplace=False): leaky_relu function; see LeakyReLU

hfai.nn.functional.leaky_relu_(input, negative_slope=0.01): in-place leaky_relu function

hfai.nn.functional.rrelu(input, lower=0.125, upper=0.3333333333333333, training=False, inplace=False): rrelu function; see RReLU

hfai.nn.functional.rrelu_(input, lower=0.125, upper=0.3333333333333333, training=False): in-place rrelu function

hfai.nn.functional.hardsigmoid(input, inplace=False): hardsigmoid function; see Hardsigmoid

hfai.nn.functional.hardshrink(input, lambd=0.5): hardshrink function; see Hardshrink

hfai.nn.functional.softshrink(input, lambd=0.5): softshrink function; see Softshrink

hfai.nn.functional.abs(input): bit-packed abs function; usage matches torch.abs

hfai.nn.functional.abs_(input): in-place abs

hfai.nn.functional.minimum(input1, input2): bit-packed minimum function; usage matches torch.minimum

hfai.nn.functional.maximum(input1, input2): bit-packed maximum function; usage matches torch.maximum

hfai.nn.functional.min(input, value=None, dim=None, keepdim=False)

Bit-packed min operator; returns the smaller of input and value. If value is a Tensor, calls hf_F.minimum(input, value); if value is a float, calls hf_F.clamp(input, max=value); if value is an int, or dim or keepdim is not None, calls torch.min(input, dim=value or dim, keepdim=keepdim)

Parameters

input (Tensor) – the input Tensor
value (Tensor or float or int) – the value to compare against, or the dimension along which to take the min
dim (int) – the dimension to operate on
keepdim (bool) – whether to keep the reduced dimension

hfai.nn.functional.max(input, value=None, dim=None, keepdim=False)

Bit-packed max operator; returns the larger of input and value. If value is a Tensor, calls hf_F.maximum(input, value); if value is a float, calls hf_F.clamp(input, min=value); if value is an int, or dim or keepdim is not None, calls torch.max(input, dim=value or dim, keepdim=keepdim)

Parameters

input (Tensor) – the input Tensor
value (Tensor or float or int) – the value to compare against, or the dimension along which to take the max
dim (int) – the dimension to operate on
keepdim (bool) – whether to keep the reduced dimension

hfai.nn.functional.clip(input, min=None, max=None): clip function; see hfai.nn.functional.clamp()

hfai.nn.functional.clip_(input, min=None, max=None): in-place clip function

hfai.nn.functional.clamp(input, min=None, max=None, inplace=False)

Bit-packed clamp operator. During training, the intermediate result [min <= x <= max] is stored with 1 bit per element to save memory.

Parameters

input – the input Tensor
min (float) – lower bound of the output. Default: None
max (float) – upper bound of the output. Default: None
inplace (bool, optional) – if True, the operation is performed in place. Default: False

import hfai.nn.functional as F

y = F.clamp(x, min=-0.5, max=0.5)
# same as: y = x.clamp(min=-0.5, max=0.5)
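
To illustrate the 1-bit idea: the backward pass of a clamp only needs to know, per element, whether the input was inside [min, max], so one bit per element is enough. The sketch below shows this in pure Python, with lists standing in for tensors and an integer standing in for the packed bit mask. `clamp_forward` and `clamp_backward` are hypothetical names; this is not hfai's kernel.

```python
def clamp_forward(xs, lo, hi):
    """Clamp each value and pack the flags [lo <= x <= hi] into one int, 1 bit each."""
    out, bits = [], 0
    for i, x in enumerate(xs):
        out.append(min(max(x, lo), hi))
        if lo <= x <= hi:
            bits |= 1 << i  # bit i records that element i was in range
    return out, bits


def clamp_backward(grad_out, bits):
    """Pass the gradient through where the input was in range, zero it elsewhere."""
    return [g if (bits >> i) & 1 else 0.0 for i, g in enumerate(grad_out)]


xs = [-1.0, -0.5, 0.0, 0.3, 2.0]
ys, mask_bits = clamp_forward(xs, -0.5, 0.5)
grads = clamp_backward([1.0] * len(xs), mask_bits)

print(ys)              # [-0.5, -0.5, 0.0, 0.3, 0.5]
print(bin(mask_bits))  # 0b1110
print(grads)           # [0.0, 1.0, 1.0, 1.0, 0.0]
```

Only `mask_bits` has to be kept between forward and backward; neither the input nor a full-width boolean tensor is needed.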

hfai.nn.functional.clamp_(input, min=None, max=None): in-place clamp function

hfai.nn.functional.clamp_max(input, max, inplace=False)

Bit-packed clamp_max operator. During training, the intermediate result [x <= max] is stored with 1 bit per element to save memory.

Parameters

input – the input Tensor
max (float) – upper bound of the output
inplace (bool, optional) – if True, the operation is performed in place. Default: False

import hfai.nn.functional as F

y = F.clamp_max(x, max=0.5)
# same as: y = x.clamp_max(max=0.5)
# same as: y = torch.min(x, 0.5 * torch.ones_like(x))

hfai.nn.functional.clamp_max_(input, max): in-place clamp_max function

hfai.nn.functional.clamp_min(input, min, inplace=False)

Bit-packed clamp_min operator. During training, the intermediate result [x >= min] is stored with 1 bit per element to save memory.

Parameters

input – the input Tensor
min (float) – lower bound of the output
inplace (bool, optional) – if True, the operation is performed in place. Default: False

import hfai.nn.functional as F

y = F.clamp_min(x, min=-0.5)
# same as: y = x.clamp_min(min=-0.5)
# same as: y = torch.max(x, -0.5 * torch.ones_like(x))

hfai.nn.functional.clamp_min_(input, min): in-place clamp_min function

hfai.nn.functional.where(condition, input1=None, input2=None): bit-packed where function; usage matches torch.where

hfai.nn.functional.masked_fill(input, mask, value): bit-packed masked_fill function; usage matches torch.masked_fill

hfai.nn.functional.masked_fill_(input, mask, value): in-place bit-packed masked_fill function

hfai.nn.functional.masked_select(input, mask): GPU-memory-saving masked_select function; usage matches torch.masked_select

hfai.nn.functional.masked_scatter(input, mask, source): GPU-memory-saving masked_scatter function; usage matches torch.masked_scatter

hfai.nn.functional.masked_scatter_(input, mask, source): in-place GPU-memory-saving masked_scatter function

hfai.nn.functional.scan(f, xs, init, dim=0, reverse=False)

Scans over the data with a given function (similar to an RNN), passing the hidden state hidden between adjacent steps.

Parameters

f (Callable[[X, Hidden], Tuple[Y, Hidden]]) – the function executed at each step, of the form f: (x, hidden_in) -> (y, hidden_out). X and Y are each a Tensor or a pytree (nested tuple/list/dict) whose leaves are Tensors. The hidden_out of step i is the hidden_in of step i + 1
xs (X) – the x values stacked along the leading dimension. The number of steps (the number of times f is executed) equals the size of the leading dimension
init (Hidden) – the hidden_in passed to the first step
dim (int, optional) – the leading dimension. Default: 0
reverse (bool, optional) – whether to scan xs in reverse order. Default: False

Returns

A tuple (ys, h), where ys is the scan result and h is the hidden_out produced by the last step

Examples:

import torch
import hfai.nn.functional

f = lambda x, y: (x + y, x + y)
a = torch.tensor([1, 2, 3, 4])
sum = hfai.nn.functional.scan(f, a, torch.tensor(0)) # (tensor([1, 3, 6, 10]), tensor(10))
sum = hfai.nn.functional.scan(f, a, torch.tensor(0), reverse=True) # (tensor([10, 9, 7, 4]), tensor(10))

f = lambda x, y: ((x[0] + y, x[1] + y), x[0] + x[1] + y)
a = [torch.tensor([1, 2, 3, 4]), torch.tensor([4, 3, 2, 1])]
sum = hfai.nn.functional.scan(f, a, torch.tensor(0))  # ([tensor([1, 7, 13, 19]), tensor([4, 8, 12, 16])], tensor(20))
sum = hfai.nn.functional.scan(f, a, torch.tensor(0), reverse=True)  # ([tensor([16, 12, 8, 4]), tensor([19, 13, 7, 1])], tensor(20))
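
The semantics of scan can be written down as a short sequential reference. The sketch below uses plain Python lists and numbers in place of tensors and handles only a flat sequence (no pytrees, no dim argument); `scan_reference` is a hypothetical name and says nothing about how hfai executes the scan internally.

```python
def scan_reference(f, xs, init, reverse=False):
    """Apply f(x, hidden) -> (y, hidden) step by step, threading the hidden state."""
    order = range(len(xs) - 1, -1, -1) if reverse else range(len(xs))
    ys = [None] * len(xs)
    hidden = init
    for i in order:
        # Each y is written back at the position of the x that produced it.
        ys[i], hidden = f(xs[i], hidden)
    return ys, hidden


f = lambda x, h: (x + h, x + h)
fwd_result = scan_reference(f, [1, 2, 3, 4], 0)
rev_result = scan_reference(f, [1, 2, 3, 4], 0, reverse=True)

print(fwd_result)  # ([1, 3, 6, 10], 10)
print(rev_result)  # ([10, 9, 7, 4], 10)
```

The two printed results match the first pair of scan outputs listed in the examples above.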

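hfai.nn.functional.associative_scan(f, xs, dim=0, reverse=False)

Scans over the data with an associative binary function (similar to a prefix sum), executed in parallel.

Parameters

f (Callable[[X, X], X]) – a binary function that must be associative, i.e. f(f(a, b), c) = f(a, f(b, c)). X is a Tensor or a pytree (nested tuple/list/dict) whose leaves are Tensors. The input and output of f must have the same structure (if X is a Tensor, the input and output must have the same shape; if X is a pytree, the input and output pytrees must be isomorphic and the corresponding leaf Tensors must have the same shape)
xs (X) – the x values stacked along the leading dimension.
dim (int, optional) – the leading dimension. Default: 0
reverse (bool, optional) – whether to scan xs in reverse order. Default: False

Returns

ys, the scan result, with the same structure as xs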

Examples:

import torch
import hfai.nn.functional

f = lambda x, y: x + y
a = torch.tensor([1, 2, 3, 4])
sum = hfai.nn.functional.associative_scan(f, a) # tensor([1, 3, 6, 10])
sum = hfai.nn.functional.associative_scan(f, a, reverse=True) # tensor([10, 9, 7, 4])

f = lambda x, y: (y[0] * x[0], y[0] * x[1] + y[1]) # associative
a = [torch.randn(5, 6, 7), torch.randn(5, 6, 7)]
sum = hfai.nn.functional.associative_scan(f, a, dim=-1, reverse=True)
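
Associativity is what allows the scan to run in parallel: partial results can be combined in any grouping, so the work can be arranged in about log2(n) rounds instead of n sequential steps. The sketch below shows one classic arrangement (a Hillis-Steele style inclusive scan) on plain Python lists. `associative_scan_reference` is a hypothetical name, the algorithm hfai actually uses is not stated in this section, and the treatment of reverse=True (reverse, scan, reverse back) is an assumption that only matters for non-commutative f.

```python
def associative_scan_reference(f, xs, reverse=False):
    """Inclusive scan in about log2(n) rounds; updates within a round are independent."""
    ys = list(reversed(xs)) if reverse else list(xs)
    step = 1
    while step < len(ys):
        # Element i combines with the partial result `step` positions to its left.
        ys = [ys[i] if i < step else f(ys[i - step], ys[i]) for i in range(len(ys))]
        step *= 2
    return list(reversed(ys)) if reverse else ys


add = lambda x, y: x + y
prefix = associative_scan_reference(add, [1, 2, 3, 4])
suffix = associative_scan_reference(add, [1, 2, 3, 4], reverse=True)
print(prefix)  # [1, 3, 6, 10]
print(suffix)  # [10, 9, 7, 4]

# The pair operator from the example above composes affine maps h -> a * h + b.
compose = lambda x, y: (y[0] * x[0], y[0] * x[1] + y[1])
composed = associative_scan_reference(compose, [(2, 1), (3, 0), (1, 5)])
print(composed)  # [(2, 1), (6, 3), (6, 8)]
```

The pair operator is the reason a linear recurrence h_t = a_t * h_{t-1} + b_t can be evaluated with associative_scan even though it looks inherently sequential.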