﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-paulwong-Post category-AI-QUANTIZATION</title><link>http://www.blogjava.net/paulwong/category/55402.html</link><description /><language>zh-cn</language><lastBuildDate>Mon, 10 Feb 2025 06:12:24 GMT</lastBuildDate><pubDate>Mon, 10 Feb 2025 06:12:24 GMT</pubDate><ttl>60</ttl><item><title>Quantization resources</title><link>http://www.blogjava.net/paulwong/archive/2025/02/08/451577.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Sat, 08 Feb 2025 15:31:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2025/02/08/451577.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/451577.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2025/02/08/451577.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/451577.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/451577.html</trackback:ping><description><![CDATA[Comparison of the GPTQ, GGUF, and AWQ quantization methods for large language models (repost)&nbsp;
<div><a href="https://caovan.com/gptqggufawq-dayuyanmoxinglianghuafangfaduibizhuanzai/.html" target="_blank">https://caovan.com/gptqggufawq-dayuyanmoxinglianghuafangfaduibizhuanzai/.html</a>
</div>
<img src ="http://www.blogjava.net/paulwong/aggbug/451577.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2025-02-08 23:31 <a href="http://www.blogjava.net/paulwong/archive/2025/02/08/451577.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Setting up a LLaMA-Factory environment for fine-tuning, evaluation, testing, and quantization</title><link>http://www.blogjava.net/paulwong/archive/2025/01/16/451558.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Thu, 16 Jan 2025 08:54:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2025/01/16/451558.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/451558.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2025/01/16/451558.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/451558.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/451558.html</trackback:ping><description><![CDATA[<div>0. Set the environment variables</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">export&nbsp;HF_ENDPOINT=https://hf-mirror.com<br />
export&nbsp;HF_HOME=/root/autodl-tmp/paul/tools/huggingface</div>
</div>
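<div>A quick way to confirm that the two variables above are visible to Python tooling is sketched below. The helper name describe_hf_env is hypothetical; it only reads the environment and does not contact the mirror.</div>

```python
import os


def describe_hf_env(env=None):
    """Return the Hugging Face related variables found in the environment.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to try out without touching the real environment.
    """
    env = os.environ if env is None else env
    # A missing variable is reported as None rather than raising.
    return {name: env.get(name) for name in ("HF_ENDPOINT", "HF_HOME")}


if __name__ == "__main__":
    for name, value in describe_hf_env().items():
        print(f"{name} = {value if value is not None else '(not set)'}")
```

<div>If either line prints "(not set)", the variable was probably assigned without export, or in a different shell session.</div>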
<div><br />
</div>
1. Install Python 3.10 locally and configure the package index
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">#&nbsp;Pick&nbsp;one&nbsp;mirror;&nbsp;running&nbsp;both&nbsp;commands&nbsp;leaves&nbsp;only&nbsp;the&nbsp;second&nbsp;in&nbsp;effect.<br />
pip&nbsp;config&nbsp;set&nbsp;global.index-url&nbsp;https://pypi.tuna.tsinghua.edu.cn/simple<br />
pip&nbsp;config&nbsp;set&nbsp;global.index-url&nbsp;https://mirrors.huaweicloud.com/repository/pypi/simple</div>
<div><br />
<div>2. Install Miniconda</div>
<div><a href="https://juejin.cn/post/7078965942968909854" target="_blank">https://juejin.cn/post/7078965942968909854</a><br />
</div>
<div><br />
</div>
<div>3. Create a new environment and activate it</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">conda&nbsp;create&nbsp;-n&nbsp;quantization&nbsp;python=3.12<br />
conda&nbsp;activate&nbsp;quantization</div>
</div>
<div><br />
<div>4. Install PyTorch 2.5.1 with CUDA 12.4 locally</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">pip3&nbsp;install&nbsp;torch&nbsp;torchvision&nbsp;torchaudio</div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124</div>
</div>
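<div>After the install it can be worth checking which build actually landed in the environment. The sketch below is one way to do that; the function name torch_report is hypothetical, and it returns a small dictionary instead of failing when PyTorch is missing.</div>

```python
import importlib.util


def torch_report():
    """Summarize the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch  # imported lazily so the check works without PyTorch too

    return {
        "installed": True,
        "version": torch.__version__,               # e.g. a 2.5.1 build
        "cuda_build": torch.version.cuda,           # None for CPU-only wheels
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(torch_report())
```

<div>A cuda_build of None suggests a CPU-only wheel was installed, in which case the command with the cu124 index URL is the one to rerun.</div>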
<div><br />
</div>
<div>5. Clone the LLaMA-Factory source code</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">git&nbsp;clone&nbsp;https://github.com/hiyouga/LLaMA-Factory<br />
cd&nbsp;LLaMA-Factory</div>
</div>
<div><br />
</div>
<div>6. Install LLaMA-Factory and its dependencies from the local checkout</div>
</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">pip&nbsp;install&nbsp;-e&nbsp;.</div>
</div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">pip install -e ".[vllm,gptq]"<br />
</div>
<div><br />
</div>
<div>7. Launch the web UI</div>
</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">llamafactory-cli&nbsp;webui</div>
</div>
<div><br />
</div>
<div>8. Fill in the relevant parameters on the page and run the task</div>
</div>
<img src ="http://www.blogjava.net/paulwong/aggbug/451558.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2025-01-16 16:54 <a href="http://www.blogjava.net/paulwong/archive/2025/01/16/451558.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>A tool for quantizing large models</title><link>http://www.blogjava.net/paulwong/archive/2025/01/15/451557.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Wed, 15 Jan 2025 10:00:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2025/01/15/451557.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/451557.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2025/01/15/451557.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/451557.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/451557.html</trackback:ping><description><![CDATA[<div>Quantized inference with vLLM</div>
<div><a href="https://llmc-zhcn.readthedocs.io/en/latest/backend/vllm.html#id1" target="_blank">https://llmc-zhcn.readthedocs.io/en/latest/backend/vllm.html#id1</a><br />
</div>
<div><br />
</div>
<div>Two packages must be installed before installing this tool:</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">sudo&nbsp;apt-get&nbsp;install&nbsp;cmake<br />
sudo&nbsp;apt-get&nbsp;install&nbsp;pkg-config</div>
</div>
<div><br />
</div>
<div>Point Hugging Face downloads at the mirror:</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">export&nbsp;HF_ENDPOINT=https://hf-mirror.com</div>
</div>
<br />
<div>Clone the repository and install the Python dependencies</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">git&nbsp;clone&nbsp;https://github.com/ModelTC/llmc.git<br />
cd&nbsp;llmc/<br />
pip&nbsp;install&nbsp;-r&nbsp;requirements.txt</div>
</div>
<div><br />
</div>
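<div>Locate the configuration file for the chosen quantization method and edit it (the values in red are the ones to adapt)</div>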
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">base:<br />
&nbsp;&nbsp;&nbsp;&nbsp;seed:&nbsp;&amp;seed&nbsp;42<br />
model:<br />
&nbsp;&nbsp;&nbsp;&nbsp;type:&nbsp;<span style="color: red;">Llama</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;path:&nbsp;<span style="color: red;">/home/paul/.cache/huggingface/models/models--unsloth--llama-</span><span style="color: red;">3</span><span style="color: red;">-8b-Instruct-lawdata</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;torch_dtype:&nbsp;auto<br />
quant:<br />
&nbsp;&nbsp;&nbsp;&nbsp;method:&nbsp;RTN<br />
&nbsp;&nbsp;&nbsp;&nbsp;weight:<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;bit:&nbsp;8<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;symmetric:&nbsp;True<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;granularity:&nbsp;per_group<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;group_size:&nbsp;128<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;need_pack:&nbsp;True<br />
eval:<br />
&nbsp;&nbsp;&nbsp;&nbsp;eval_pos:&nbsp;<span style="color: #800000; font-weight: bold; ">[</span><span style="color: #800000; ">fake_quant</span><span style="color: #800000; font-weight: bold; ">]</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;name:&nbsp;wikitext2<br />
&nbsp;&nbsp;&nbsp;&nbsp;download:&nbsp;True<br />
&nbsp;&nbsp;&nbsp;&nbsp;path:&nbsp;<span style="color: red;">/home/paul/paulwong/work/workspaces/llmc/dataset</span><br />
&nbsp;&nbsp;&nbsp;&nbsp;bs:&nbsp;1<br />
&nbsp;&nbsp;&nbsp;&nbsp;seq_len:&nbsp;2048<br />
&nbsp;&nbsp;&nbsp;&nbsp;inference_per_block:&nbsp;False<br />
save:<br />
&nbsp;&nbsp;&nbsp;&nbsp;save_vllm:&nbsp;True<br />
&nbsp;&nbsp;&nbsp;&nbsp;save_path:&nbsp;<span style="color: red;">/home/paul/.cache/huggingface/models/models--unsloth--llama-</span><span style="color: red;">3-8b-Instruct-lawdata-quantization</span><br />
</div>
</div>
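<div>The quant section above asks for round-to-nearest (RTN) weight quantization: 8 bits, symmetric, with one scale per group of 128 weights. As a rough illustration of what those settings mean, and not llmc's actual implementation, a pure-Python sketch might look like this. All function names are hypothetical.</div>

```python
def rtn_quantize_group(weights, bits=8):
    """Symmetric round-to-nearest quantization of one group of weights.

    Returns the integer codes and the single scale shared by the group.
    """
    qmax = 2 ** (bits - 1) - 1                  # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Symmetric: no zero point, codes are clamped to [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def rtn_quantize(weights, bits=8, group_size=128):
    """Split a flat weight list into groups and quantize each one (per_group)."""
    return [
        rtn_quantize_group(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Rebuild approximate float weights from (codes, scale) pairs."""
    return [code * scale for codes, scale in groups for code in codes]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0, 0.1]
    groups = rtn_quantize(weights, bits=8, group_size=2)
    restored = dequantize(groups)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{len(groups)} groups, worst absolute error {worst:.6f}")
```

<div>Smaller groups give each scale fewer weights to cover, which tends to reduce the rounding error at the cost of storing more scales.</div>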
<div><br />
</div>
<div>Locate run_llmc.sh and edit it (again, adapt the values in red)</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">#!/bin/bash<br />
<br />
#&nbsp;export&nbsp;CUDA_VISIBLE_DEVICES=0,1<br />
<br />
llmc=<span style="color: red;">/home/paul/paulwong/work/workspaces/llmc</span><br />
export&nbsp;PYTHONPATH=$llmc:$PYTHONPATH<br />
<br />
#&nbsp;task_name=awq_w_only<br />
#&nbsp;config=${llmc}/configs/quantization/methods/Awq/awq_w_only.yml<br />
task_name=<span style="color: red;">rtn_for_vllm</span><br />
config=${llmc}<span style="color: red;">/configs/quantization/backend/vllm/rtn_w8a16.yml</span><br />
<br />
nnodes=1<br />
nproc_per_node=1<br />
<br />
<br />
find_unused_port()&nbsp;{<br />
&nbsp;&nbsp;&nbsp;&nbsp;while&nbsp;true<span style="color: #008000; ">;</span><span style="color: #008000; ">&nbsp;do</span><span style="color: #008000; "><br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;port=$(shuf&nbsp;-i&nbsp;10000-60000&nbsp;-n&nbsp;1)<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if&nbsp;!&nbsp;ss&nbsp;-tuln&nbsp;|&nbsp;grep&nbsp;-q&nbsp;":$port&nbsp;"<span style="color: #008000; ">;</span><span style="color: #008000; ">&nbsp;then</span><span style="color: #008000; "><br />
</span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;echo&nbsp;"$port"<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return&nbsp;0<br />
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fi<br />
&nbsp;&nbsp;&nbsp;&nbsp;done<br />
}<br />
UNUSED_PORT=$(find_unused_port)<br />
<br />
<br />
MASTER_ADDR=127.0.0.1<br />
MASTER_PORT=$UNUSED_PORT<br />
task_id=$UNUSED_PORT<br />
<br />
nohup&nbsp;\<br />
torchrun&nbsp;\<br />
--nnodes&nbsp;$nnodes&nbsp;\<br />
--nproc_per_node&nbsp;$nproc_per_node&nbsp;\<br />
--rdzv_id&nbsp;$task_id&nbsp;\<br />
--rdzv_backend&nbsp;c10d&nbsp;\<br />
--rdzv_endpoint&nbsp;$MASTER_ADDR:$MASTER_PORT&nbsp;\<br />
${llmc}/llmc/__main__.py&nbsp;--config&nbsp;$config&nbsp;--task_id&nbsp;$task_id&nbsp;\<br />
&gt;&nbsp;${task_name}.log&nbsp;2&gt;&amp;1&nbsp;&amp;<br />
<br />
sleep&nbsp;2<br />
ps&nbsp;aux&nbsp;|&nbsp;grep&nbsp;'__main__.py'&nbsp;|&nbsp;grep&nbsp;$task_id&nbsp;|&nbsp;awk&nbsp;'{print&nbsp;$2}'&nbsp;&gt;&nbsp;${task_name}.pid<br />
<br />
#&nbsp;You&nbsp;can&nbsp;kill&nbsp;this&nbsp;program&nbsp;by&nbsp;<br />
#&nbsp;xargs&nbsp;kill&nbsp;-9&nbsp;&lt;&nbsp;xxx.pid<br />
#&nbsp;xxx.pid&nbsp;is&nbsp;${task_name}.pid&nbsp;file<br />
</div>
</div>
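<div>The script above starts torchrun in the background with nohup and writes the matching process IDs to ${task_name}.pid. A small helper for checking whether those processes are still alive could look like the sketch below. It assumes a POSIX system, and the function names are hypothetical.</div>

```python
import os
from pathlib import Path


def read_pids(pid_file):
    """Parse the whitespace-separated PIDs written by the launch script."""
    return [int(token) for token in Path(pid_file).read_text().split()]


def is_running(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)          # signal 0 only checks, it sends nothing
    except ProcessLookupError:
        return False
    except PermissionError:
        return True              # exists, but is owned by another user
    return True


def job_status(pid_file):
    """Map each recorded PID to whether it is still alive."""
    return {pid: is_running(pid) for pid in read_pids(pid_file)}


if __name__ == "__main__":
    pid_path = "rtn_for_vllm.pid"   # ${task_name}.pid from the script above
    if Path(pid_path).exists():
        print(job_status(pid_path))
    else:
        print(f"{pid_path} not found; has the script been run here?")
```

<div>Once every PID reports False the job has exited, and ${task_name}.log shows whether it finished cleanly.</div>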
<div><br />
</div>
<div>Run the quantization</div>
<div>
<div style="background-color: #eeeeee; font-size: 13px; border-left-color: #cccccc; border-image: none; padding: 4px 5px 4px 4px; width: 98%; word-break: break-all;">bash&nbsp;scripts/run_llmc.sh</div>
</div>
<div><br />
</div>
<img src ="http://www.blogjava.net/paulwong/aggbug/451557.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2025-01-15 18:00 <a href="http://www.blogjava.net/paulwong/archive/2025/01/15/451557.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>