Installing PyTorch on macOS

1. Create a new conda environment

conda create -n torch python=3.9
conda activate torch

2. Install torch with pip

See the installation guide on the official PyTorch website for the exact command.

pip3 install torch torchvision torchaudio

3. Environment test code

3.1. Test code 1

import torch
# Check that torch installed successfully and print its version
print(torch.__version__)
# Check whether a CUDA GPU is available
print(torch.cuda.is_available()) # On a Mac using MPS, False is expected
# Check whether the MPS backend is available
print(torch.backends.mps.is_available()) # Should return True
# Get the MPS device
mps_device = torch.device("mps")
print(mps_device) # Prints "mps"

import torch

# Checks that the current macOS version is at least 12.3 and that an MPS-capable device is present
print(torch.backends.mps.is_available())
# Checks that the current PyTorch installation was built with MPS enabled
print(torch.backends.mps.is_built())
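The comment in the test code notes that MPS needs macOS 12.3 or later. As a rough pre-check that does not require torch at all, the OS version and CPU architecture can be read with the standard library. This is only a sketch: `parse_version` and `macos_supports_mps` are hypothetical helper names introduced here, and the check covers the OS version only, not whether the installed PyTorch build has MPS enabled.

```python
import platform


def parse_version(text):
    """Turn a version string such as "12.3.1" into a tuple of ints."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def macos_supports_mps(version_text):
    """Return True if the macOS version string is 12.3 or newer."""
    version = parse_version(version_text)
    return bool(version) and version >= (12, 3)


if __name__ == "__main__":
    mac_version = platform.mac_ver()[0]  # empty string on non-macOS systems
    print("macOS version:", mac_version or "not macOS")
    # "arm64" for a native Apple silicon Python; "x86_64" on Intel or under Rosetta
    print("CPU architecture:", platform.machine())
    print("meets the 12.3 requirement:", macos_supports_mps(mac_version))
```

If the last line prints False, `torch.backends.mps.is_available()` is expected to return False as well.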

3.2. Test code 2

import torch
import math

dtype = torch.float
device = torch.device("mps")

# Create random input and output data
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backprop to compute gradients of a, b, c, d with respect to loss
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')
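The hand-written gradients in the loop above can be sanity-checked without a GPU. The sketch below re-implements the same loss and the same four gradient formulas in plain Python and compares them with central finite differences. The point count (20), the parameter values, and the helper names `loss`, `manual_grads`, and `numeric_grads` are illustrative assumptions, not part of the original test code.

```python
import math


def loss(params, xs, ys):
    """Sum of squared errors of the cubic a + b*x + c*x^2 + d*x^3."""
    a, b, c, d = params
    return sum((a + b * x + c * x ** 2 + d * x ** 3 - y) ** 2
               for x, y in zip(xs, ys))


def manual_grads(params, xs, ys):
    """Gradients of the loss, using the same formulas as the training loop."""
    a, b, c, d = params
    grads = [0.0, 0.0, 0.0, 0.0]
    for x, y in zip(xs, ys):
        grad_y_pred = 2.0 * (a + b * x + c * x ** 2 + d * x ** 3 - y)
        grads[0] += grad_y_pred
        grads[1] += grad_y_pred * x
        grads[2] += grad_y_pred * x ** 2
        grads[3] += grad_y_pred * x ** 3
    return grads


def numeric_grads(params, xs, ys, eps=1e-6):
    """Central finite-difference approximation of the same gradients."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, xs, ys) - loss(down, xs, ys)) / (2 * eps))
    return grads


# 20 evenly spaced points in [-pi, pi] (a small stand-in for the 2000 used above)
xs = [-math.pi + 2 * math.pi * i / 19 for i in range(20)]
ys = [math.sin(x) for x in xs]
params = [0.1, -0.2, 0.3, -0.4]  # arbitrary illustrative values for a, b, c, d

manual = manual_grads(params, xs, ys)
numeric = numeric_grads(params, xs, ys)
for name, m, n in zip("abcd", manual, numeric):
    print(f"grad_{name}: manual={m:.6f} numeric={n:.6f}")
```

The two columns should agree to several decimal places, which indicates the formulas for grad_a through grad_d match the loss being minimized.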

3.3. Using GPU acceleration on an Apple silicon Mac (M1/M2)

To run PyTorch code on the GPU of an Apple silicon Mac, use torch.device("mps"), analogous to torch.device("cuda") on an Nvidia GPU. The code below shows how it is used:

import timeit

import torch

# Checks that the current macOS version is at least 12.3 and that an MPS-capable device is present
print(torch.backends.mps.is_available())
# Checks that the current PyTorch installation was built with MPS enabled
print(torch.backends.mps.is_built())

N = 100000000
# To run PyTorch code on the GPU, use torch.device("mps"), analogous to torch.device("cuda") on an Nvidia GPU.
device = torch.device("mps")

cpu_a = torch.randn([1, N])
cpu_b = torch.randn([N, 1])
print(N, cpu_a.device, cpu_b.device)

gpu_a = torch.randn([1, N], device=device)
gpu_b = torch.randn([N, 1], device=device)
print(N, gpu_a.device, gpu_b.device)

def cpu_run():
    c = torch.matmul(cpu_a, cpu_b)
    return c

def gpu_run():
    c = torch.matmul(gpu_a, gpu_b)
    # MPS work is queued asynchronously; wait for it to finish so the timing is meaningful
    # (torch.mps.synchronize() requires PyTorch 2.0 or later)
    torch.mps.synchronize()
    return c

# First pass: warm-up
cpu_time = timeit.timeit(cpu_run, number=3)
gpu_time = timeit.timeit(gpu_run, number=3)
print('warmup: ', cpu_time, gpu_time)

# Timed run
cpu_time = timeit.timeit(cpu_run, number=10)
gpu_time = timeit.timeit(gpu_run, number=10)
print('run_time: ', cpu_time, gpu_time)
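The warm-up-then-measure pattern above can be factored into a small reusable helper. The sketch below uses only the standard library. The name `benchmark` and its `sync` parameter are assumptions made for illustration: `sync` is an optional callable that blocks until queued asynchronous work has finished.

```python
import time


def benchmark(fn, warmup=3, repeats=10, sync=None):
    """Return the average seconds per call of fn, measured after warm-up calls."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts

    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()  # wait for queued work so it is included in the measurement
    elapsed = time.perf_counter() - start
    return elapsed / repeats


if __name__ == "__main__":
    avg = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"average per call: {avg:.6f} s")
```

With PyTorch 2.0 or later on MPS, one would presumably pass `sync=torch.mps.synchronize` when timing a GPU function, and leave `sync` unset for CPU functions.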

About Jupyter Notebook

Recommended blog post: https://blog.csdn.net/cainiao_python/article/details/125567913

Upgrade pip to the latest version

pip install --upgrade pip
  • Note: older versions of pip may fail to install Jupyter Notebook's dependencies along with it, so upgrading pip to the latest version first is strongly recommended.

Install Jupyter Notebook

pip install jupyter

Chinese localization

pip install jupyterlab-language-pack-zh-CN

Launch

① Start on the default port

jupyter notebook

jupyter lab

② Start on a specific port

jupyter notebook --port <port_number>

③ Start the server without opening a browser

jupyter notebook --no-browser

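Converting between .py and .ipynb files

1. Convert an .ipynb file to a .py file

Method 1: in the directory containing xxx.ipynb, type the command directly into the address bar (the place in the file manager where the current path is entered and edited), or open a terminal/cmd and run: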
jupyter nbconvert --to script xxx.ipynb 

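Here xxx.ipynb is the name of the file to convert. After conversion, an xxx.py file appears in the same directory (some versions produce an xxx.txt file instead; just change the extension).

Method 2: open the .ipynb file in Jupyter Notebook or Google Colab, then choose File > Download as > Python file.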
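For readers curious about what the conversion involves, an .ipynb file is a JSON document whose code lives in a list of cells. The sketch below pulls out the code cells with the standard library only. It is an illustration, not a replacement for nbconvert: the function name `notebook_to_script` is hypothetical, markdown cells are skipped, and IPython magics such as %load are copied through unchanged.

```python
import json


def notebook_to_script(ipynb_path, py_path):
    """Write the code cells of a notebook to a .py file; return the cell count."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)

    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        if isinstance(source, list):  # source may be a list of lines or one string
            source = "".join(source)
        chunks.append(source.rstrip("\n"))

    with open(py_path, "w", encoding="utf-8") as f:
        f.write("\n\n".join(chunks) + "\n")
    return len(chunks)


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        count = notebook_to_script(sys.argv[1], sys.argv[2])
        print(f"wrote {count} code cells to {sys.argv[2]}")
```

Saved under a name of your choice, it would be run with the notebook path and the output path as its two arguments.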

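2. Convert a .py file to an .ipynb file

%run xxx.py executes xxx.py and brings its names into the notebook, much like an import.

%load xxx.py inserts the code of xxx.py into the current cell.

  • First, put the .py file to be converted in the Jupyter Notebook working directory;
  • Then create a new .ipynb file in Jupyter Notebook;
  • In the new file, enter %load xxxx.py
  • The contents of xxxx.py can then be opened in Jupyter Notebook in .ipynb format.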

For example,

%load  ./learn/test.py

Click Run, and the contents of the file are loaded into the cell.

Then click "File" and use "Download" to save it as an .ipynb file; an .ipynb file is also created in the working directory.
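As an alternative to the %load route, a .py file can be wrapped into a one-cell notebook by writing the notebook JSON directly. The sketch below assumes the nbformat 4.4 layout (a single code cell with empty outputs) and uses only the standard library. The function name `script_to_notebook` is hypothetical, and the whole script ends up in one cell rather than being split into several.

```python
import json


def script_to_notebook(py_path, ipynb_path):
    """Wrap the contents of a .py file in a single-cell nbformat 4 notebook."""
    with open(py_path, encoding="utf-8") as f:
        code = f.read()

    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # the cell has not been run yet
                "metadata": {},
                "outputs": [],
                "source": code.splitlines(keepends=True),
            }
        ],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }

    with open(ipynb_path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1, ensure_ascii=False)
        f.write("\n")


if __name__ == "__main__":
    import sys
    if len(sys.argv) == 3:
        script_to_notebook(sys.argv[1], sys.argv[2])
        print("wrote", sys.argv[2])
```

The resulting file should open in Jupyter Notebook or JupyterLab like any other notebook.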

Accelerated PyTorch training on Mac

Reference blog posts:
https://blog.csdn.net/weixin_71894495/article/details/144629831
https://www.xn--vjq503a.fun/ml-note/pytorch-mps/
https://cloud.tencent.com/developer/article/2221944

MPS (Metal Performance Shaders)

PyTorch code usually selects its device with a statement like this (for NVIDIA CUDA):

if torch.cuda.is_available():
    device = "cuda"
else:
    device = "cpu"

Simply insert the MPS device in the middle:

if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"

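When the test code from section 3.1 runs, a printed torch version confirms that torch is installed, and a printed True confirms that MPS is available. The single False is the CUDA check: Macs have no NVIDIA GPU, so CUDA acceleration cannot be used and False is expected.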

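Speedup comparison

Overall, the more complex the model, the more noticeable the MPS speedup. If the model is so simple that it finishes in a few seconds, MPS can end up slower than the CPU, because MPS has setup work to do, such as moving the data onto the GPU. When the algorithm is too simple or the dataset too small, the time saved by acceleration is less than the time spent on that preparation, so MPS appears to take longer overall.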
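The trade-off described above can be put into a toy cost model: total MPS time is a fixed setup overhead plus the CPU time divided by a speedup factor. The sketch below is only that, a simplified model under stated assumptions. The function names are hypothetical, and the numbers in the demo (a 4x speedup and 3 seconds of overhead) are made up for illustration, not measurements.

```python
def mps_total_time(cpu_seconds, speedup, overhead_seconds):
    """Modelled MPS time: fixed setup overhead plus the accelerated compute time."""
    return overhead_seconds + cpu_seconds / speedup


def break_even_cpu_seconds(speedup, overhead_seconds):
    """CPU run time above which the modelled MPS run is faster.

    overhead + t / speedup < t   is equivalent to   t > overhead * speedup / (speedup - 1)
    """
    if speedup <= 1:
        return float("inf")  # without a real speedup, MPS never catches up
    return overhead_seconds * speedup / (speedup - 1)


if __name__ == "__main__":
    speedup, overhead = 4.0, 3.0  # illustrative values only
    print("break-even CPU time:", break_even_cpu_seconds(speedup, overhead), "s")
    for cpu_seconds in (2.0, 60.0):
        modelled = mps_total_time(cpu_seconds, speedup, overhead)
        print(f"CPU {cpu_seconds:5.1f} s -> modelled MPS {modelled:5.1f} s")
```

Under these assumed numbers a job that takes 2 seconds on the CPU is slower on MPS, while a 60-second job is clearly faster, which matches the qualitative claim in the paragraph above.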

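Test machine: MacBook Air M2 (8-core CPU + 10-core GPU), 16 GB memory + 512 GB storage

The test code below uses MNIST handwritten digit recognition as an example to demonstrate the complete workflow of accelerating PyTorch with the MPS backend on the M2 GPU.

The core operation is very simple and similar to using CUDA: before training, move both the model and the data to torch.device("mps").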

import torch
from torch import nn
import torchvision
from torchvision import transforms
import torch.nn.functional as F


import os,sys,time
import numpy as np
import pandas as pd
import datetime
from tqdm import tqdm
from copy import deepcopy
# Accuracy is defined further down as a small nn.Module, so torchmetrics is not imported


def printlog(info):
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print("\n"+"=========="*8 + "%s"%nowtime)
    print(str(info)+"\n")


#================================================================================
# 1. Prepare the data
#================================================================================

transform = transforms.Compose([transforms.ToTensor()])

ds_train = torchvision.datasets.MNIST(root="mnist/",train=True,download=True,transform=transform)
ds_val = torchvision.datasets.MNIST(root="mnist/",train=False,download=True,transform=transform)

# Note: macOS starts DataLoader workers with "spawn"; when this is run as a plain script
# with num_workers > 0, the code may need to sit under `if __name__ == "__main__":`
dl_train = torch.utils.data.DataLoader(ds_train, batch_size=128, shuffle=True, num_workers=2)
dl_val = torch.utils.data.DataLoader(ds_val, batch_size=128, shuffle=False, num_workers=2)


#================================================================================
# 2. Define the model
#================================================================================


def create_net():
    net = nn.Sequential()
    net.add_module("conv1",nn.Conv2d(in_channels=1,out_channels=64,kernel_size = 3))
    net.add_module("pool1",nn.MaxPool2d(kernel_size = 2,stride = 2))
    net.add_module("conv2",nn.Conv2d(in_channels=64,out_channels=512,kernel_size = 3))
    net.add_module("pool2",nn.MaxPool2d(kernel_size = 2,stride = 2))
    net.add_module("dropout",nn.Dropout2d(p = 0.1))
    net.add_module("adaptive_pool",nn.AdaptiveMaxPool2d((1,1)))
    net.add_module("flatten",nn.Flatten())
    net.add_module("linear1",nn.Linear(512,1024))
    net.add_module("relu",nn.ReLU())
    net.add_module("linear2",nn.Linear(1024,10))
    return net

net = create_net()
print(net)

# Evaluation metric: running accuracy accumulated across batches
class Accuracy(nn.Module):
    def __init__(self):
        super().__init__()

        self.correct = nn.Parameter(torch.tensor(0.0),requires_grad=False)
        self.total = nn.Parameter(torch.tensor(0.0),requires_grad=False)

    def forward(self, preds: torch.Tensor, targets: torch.Tensor):
        preds = preds.argmax(dim=-1)
        m = (preds == targets).sum()
        n = targets.shape[0]
        self.correct += m
        self.total += n

        return m/n

    def compute(self):
        return self.correct.float() / self.total

    def reset(self):
        self.correct -= self.correct
        self.total -= self.total

#================================================================================
# 3. Train the model
#================================================================================

loss_fn = nn.CrossEntropyLoss()
optimizer= torch.optim.Adam(net.parameters(),lr = 0.01)
metrics_dict = nn.ModuleDict({"acc":Accuracy()})


# ========================= Move the model to MPS ==============================
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
net.to(device)
loss_fn.to(device)
metrics_dict.to(device)
# ====================================================================


epochs = 20
ckpt_path='checkpoint.pt'

# Early-stopping settings
monitor="val_acc"
patience=5
mode="max"

history = {}

for epoch in range(1, epochs+1):
    printlog("Epoch {0} / {1}".format(epoch, epochs))

    # 1. train -------------------------------------------------
    net.train()

    total_loss,step = 0,0

    loop = tqdm(enumerate(dl_train), total =len(dl_train),ncols=100)
    train_metrics_dict = deepcopy(metrics_dict)

    for i, batch in loop:

        features,labels = batch

        # ========================= Move the data to MPS ==============================
        features = features.to(device)
        labels = labels.to(device)
        # ====================================================================

        #forward
        preds = net(features)
        loss = loss_fn(preds,labels)

        #backward
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        #metrics
        step_metrics = {"train_"+name:metric_fn(preds, labels).item()
                        for name,metric_fn in train_metrics_dict.items()}

        step_log = dict({"train_loss":loss.item()},**step_metrics)

        total_loss += loss.item()

        step+=1
        if i!=len(dl_train)-1:
            loop.set_postfix(**step_log)
        else:
            epoch_loss = total_loss/step
            epoch_metrics = {"train_"+name:metric_fn.compute().item()
                             for name,metric_fn in train_metrics_dict.items()}
            epoch_log = dict({"train_loss":epoch_loss},**epoch_metrics)
            loop.set_postfix(**epoch_log)

            for name,metric_fn in train_metrics_dict.items():
                metric_fn.reset()

    for name, metric in epoch_log.items():
        history[name] = history.get(name, []) + [metric]


    # 2. validate -------------------------------------------------
    net.eval()

    total_loss,step = 0,0
    loop = tqdm(enumerate(dl_val), total =len(dl_val),ncols=100)

    val_metrics_dict = deepcopy(metrics_dict)

    with torch.no_grad():
        for i, batch in loop:

            features,labels = batch

            # ========================= Move the data to MPS ==============================
            features = features.to(device)
            labels = labels.to(device)
            # ====================================================================

            #forward
            preds = net(features)
            loss = loss_fn(preds,labels)

            #metrics
            step_metrics = {"val_"+name:metric_fn(preds, labels).item()
                            for name,metric_fn in val_metrics_dict.items()}

            step_log = dict({"val_loss":loss.item()},**step_metrics)

            total_loss += loss.item()
            step+=1
            if i!=len(dl_val)-1:
                loop.set_postfix(**step_log)
            else:
                epoch_loss = (total_loss/step)
                epoch_metrics = {"val_"+name:metric_fn.compute().item()
                                 for name,metric_fn in val_metrics_dict.items()}
                epoch_log = dict({"val_loss":epoch_loss},**epoch_metrics)
                loop.set_postfix(**epoch_log)

                for name,metric_fn in val_metrics_dict.items():
                    metric_fn.reset()

    epoch_log["epoch"] = epoch
    for name, metric in epoch_log.items():
        history[name] = history.get(name, []) + [metric]

    # 3. early-stopping -------------------------------------------------
    arr_scores = history[monitor]
    best_score_idx = np.argmax(arr_scores) if mode=="max" else np.argmin(arr_scores)
    if best_score_idx==len(arr_scores)-1:
        torch.save(net.state_dict(),ckpt_path)
        print("<<<<<< reach best {0} : {1} >>>>>>".format(monitor,
              arr_scores[best_score_idx]),file=sys.stderr)
    if len(arr_scores)-best_score_idx>patience:
        print("<<<<<< {} without improvement in {} epoch, early stopping >>>>>>".format(
              monitor,patience),file=sys.stderr)
        break

# Restore the best checkpoint once training has finished or stopped early
net.load_state_dict(torch.load(ckpt_path))

dfhistory = pd.DataFrame(history)
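The early-stopping rule at the end of each epoch can be isolated into a small pure-Python function, which makes it easier to reason about and test without training anything. This is a sketch of the same rule: the function name `early_stopping_step` and the validation accuracies in the demo are hypothetical, not results from the run above.

```python
def early_stopping_step(scores, patience=5, mode="max"):
    """Return (is_best, should_stop) for the history of monitored scores so far.

    is_best     -> the latest score is the best seen, so a checkpoint should be saved
    should_stop -> more than `patience` epochs have passed since the best score
    """
    pick = max if mode == "max" else min
    best_idx = scores.index(pick(scores))  # first occurrence, like np.argmax / np.argmin
    is_best = best_idx == len(scores) - 1
    should_stop = len(scores) - best_idx > patience
    return is_best, should_stop


if __name__ == "__main__":
    made_up_val_acc = [0.90, 0.95, 0.94, 0.93, 0.95, 0.92]  # illustrative values only
    history = []
    for epoch, score in enumerate(made_up_val_acc, start=1):
        history.append(score)
        is_best, should_stop = early_stopping_step(history, patience=2)
        print(f"epoch {epoch}: score={score} is_best={is_best} stop={should_stop}")
        if should_stop:
            break
```

Because ties resolve to the earliest best epoch, a later epoch that merely equals the best score does not reset the patience counter.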

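When running on MPS, CPU utilization drops to a low level and the GPU takes over. Only 1 of the 10 GPU cores was in use, and the speedup did not feel especially pronounced.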


When running on the CPU, the program clearly occupied almost 7 of the 8 CPU cores, and the GPU was not used.


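Summary

The Mac GPU performs reasonably well. It is a good option for running edge-side models and training very small models, which makes it well suited to beginners.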