欢迎来到这个存储库,您将在这里学习如何从社区论坛中获取一些令人惊叹的数据,您可以使用这些数据来微调您的 chatGPT 模型。这是一个很酷的项目,向您展示如何从头开始构建人工智能解决方案,从数据收集到模型部署。
这是从头开始的完整微调示例系列的第一部分。第二部分在这里:https://github.com/BenderScript/techGPT
在我们开始了解 GenAI 的有趣部分之前,我们需要先获取一些数据。不仅仅是任何数据,而是与我们的任务相关且有用的高质量数据。
抓取论坛并不像听起来那么容易。我们需要面对许多挑战和决定,例如:
正如您所看到的,在我们开始抓取数据之前,有很多事情需要考虑。
一旦我们抓取并清理了数据,我们就需要为训练做好准备。这意味着我们需要将数据转换为 GenAI 可以理解的格式。换句话说,我们需要以保留其含义和语气的方式重新表述问题和答案。
例如,这样的问题:“我尝试配置一个新的无线 AP,我需要知道如何配置 SSID 和密码。有人可以帮助我吗?”需要改写为:“我如何设置 SSID 和密码?”新无线 AP 的密码?”
答案也是如此。我们需要以清晰简洁的方式重新表述它们。
我在此存储库中使用的方法是利用 chatGPT 本身来清理和改写问题和答案。
起初,我尝试编写复杂的正则表达式来清理数据,但经过多次测试,我意识到 chatGPT 可以做得更好。
我还发现分两步进行会更好。首先,我要求chatGPT清理数据,然后我要求它重新措辞。是的,这需要更多时间,但结果是值得的。
好的,这是该部分的可能重写版本:
要使此存储库正常工作,您的计算机上需要安装Python 3.9或更高版本。您还需要使用 pip 获取requirements.txt中的Python 库:
您需要一个包含 OpenAI API 密钥的.env
文件,如下所示
OPENAI_API_KEY= < your api key >
要使用默认设置运行它,只需键入:
python3 main.py
保存的 JSON Lines 文件的示例
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What could be causing the issue if the green LED is showing on all the newly added MR44 devices but they are not being displayed as online on the dashboard?"}, {"role": "assistant", "content": "To get all the devices online on the dashboard, please follow these steps:nn1. Remove the device from the current network.n2. Create a new network.n3. Add the device to the new network.n4. Factory reset the device.n5. Move the device back to the original network.nnFor more information, you can refer to the original post here: [Wireless-LAN/Meraki-is-showing-green-led-but-it-doesn-t-show-that-online-on](/t5/Wireless-LAN/Meraki-is-showing-green-led-but-it-doesn-t-show-that-online-on/m-p/212259#M29291)"}]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What could be the reason for the presence of different application categories in my networks, where one network has a more extensive drop-down list of applications compared to the other? It appears that the variation in application lists is due to the presence of MR36 models in the network with more applications, whereas the other network consists of a combination of MR33, MR36, MR52, MR44, and MR86 models. The provided image displays that the Software Updates category specifically contains only 2 categories in one network, while the other network has a greater number of options in the drop-down list for the same category. This pattern is also observed in other Application Categories."}, {"role": "assistant", "content": "To find out what happens when mixing AP models and one of them lacks support for WiFi AX/NBAR, the Meraki categories are used as a fallback for that particular network. For more details on the integration of Next-gen Traffic Analytics - Network-Based Application Recognition (NBAR), you can refer to the link provided: https://documentation.meraki.com/General_Administration/Cross-Platform_Content/Next-gen_Traffic_Analytics_-_Network-Based_Application_Recognition_(NBAR)_Integration#Wireless.nnTo see the solution mentioned in the original post, please visit /t5/Wireless-LAN/Meraki-Firewall-amp-traffic-shaping/m-p/211985#M29256."}]}
{ "messages" : [{ "role" : " system " , "content" : " You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible. " }, { "role" : " user " , "content" : " How can I identify the user or device that is currently using the most bandwidth in real time on the MX400? " }, { "role" : " assistant " , "content" : " No, I don't think so on the MX series. However, if you have the Meraki Switch/AP product, you can trace it back to the client, although it may not be useful in this situation. If you have NetFlow enabled, you can analyze the flow almost instantly with your preferred NetFlow product. " }]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What should be considered when determining the appropriate size of the MX at the central site for deploying SSID tunneling?"}, {"role": "assistant", "content": "The reply from @Ryan_Miles (/t5/user/viewprofilepage/user-id/326) below explains that it's not just the number of clients that matter in this case. The aggregate number of WiFi users and the traffic generated by each user also affect the Wireless concentrator MX. Ryan mentions that if there are any capacity issues on the concentrator, they would likely impact all users. Therefore, it's advisable to avoid reaching its maximum capacity. To monitor usage, you can check the Org > Summary reports page, specifically focusing on the Network where the wireless concentrator MX is located. Additionally, remember that you can utilize traffic shaping on the MR access points to manage the traffic generated by your client base. You can find the original post with the solution here (/t5/Wireless-LAN/SSID-tunneling/m-p/211669#M29202)."}]}
如果您认为这很有趣并且想要贡献或建议更好的方法来清理和准备数据,我洗耳恭听!