歡迎來到這個儲存庫,您將在這裡學習如何從社區論壇中獲取一些令人驚嘆的數據,您可以使用這些數據來微調您的 chatGPT 模型。這是一個很酷的項目,向您展示如何從頭開始建立人工智慧解決方案,從資料收集到模型部署。
這是從頭開始的完整微調範例係列的第一部分。第二部分在這裡:https://github.com/BenderScript/techGPT
在我們開始了解 GenAI 的有趣部分之前,我們需要先取得一些數據。不僅僅是任何數據,而是與我們的任務相關且有用的高品質數據。
抓取論壇並不像聽起來那麼容易。我們需要面對許多挑戰和決定,例如:
正如您所看到的,在我們開始抓取資料之前,有很多事情需要考慮。
一旦我們抓取並清理了數據,我們就需要為訓練做好準備。這意味著我們需要將資料轉換為 GenAI 可以理解的格式。換句話說,我們需要以保留其意義和語氣的方式重新表達問題和答案。
例如,這樣的問題:「我嘗試設定新的無線 AP,我需要知道如何設定 SSID 和密碼。有人可以幫助我嗎?」需要改寫為:「我如何設定 SSID 和密碼?」新無線AP 的密碼?
答案也是如此。我們需要以清晰簡潔的方式重新表達它們。
我在此儲存庫中使用的方法是利用 chatGPT 本身來清理和改寫問題和答案。
起初,我嘗試編寫複雜的正規表示式來清理數據,但經過多次測試,我意識到 chatGPT 可以做得更好。
我還發現分兩步驟進行會更好。首先,我要求chatGPT清理數據,然後我要求它重新措詞。是的,這需要更多時間,但結果是值得的。
好的,這是該部分的可能重寫版本:
要使此儲存庫正常運作,您的電腦上需要安裝Python 3.9或更高版本。您還需要使用 pip 來取得requirements.txt中的Python 函式庫:
您需要一個包含 OpenAI API 金鑰的.env
文件,如下所示
OPENAI_API_KEY= < your api key >
要使用預設設定運行它,只需鍵入:
python3 main.py
儲存的 JSON Lines 檔案的範例
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What could be causing the issue if the green LED is showing on all the newly added MR44 devices but they are not being displayed as online on the dashboard?"}, {"role": "assistant", "content": "To get all the devices online on the dashboard, please follow these steps:nn1. Remove the device from the current network.n2. Create a new network.n3. Add the device to the new network.n4. Factory reset the device.n5. Move the device back to the original network.nnFor more information, you can refer to the original post here: [Wireless-LAN/Meraki-is-showing-green-led-but-it-doesn-t-show-that-online-on](/t5/Wireless-LAN/Meraki-is-showing-green-led-but-it-doesn-t-show-that-online-on/m-p/212259#M29291)"}]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What could be the reason for the presence of different application categories in my networks, where one network has a more extensive drop-down list of applications compared to the other? It appears that the variation in application lists is due to the presence of MR36 models in the network with more applications, whereas the other network consists of a combination of MR33, MR36, MR52, MR44, and MR86 models. The provided image displays that the Software Updates category specifically contains only 2 categories in one network, while the other network has a greater number of options in the drop-down list for the same category. This pattern is also observed in other Application Categories."}, {"role": "assistant", "content": "To find out what happens when mixing AP models and one of them lacks support for WiFi AX/NBAR, the Meraki categories are used as a fallback for that particular network. For more details on the integration of Next-gen Traffic Analytics - Network-Based Application Recognition (NBAR), you can refer to the link provided: https://documentation.meraki.com/General_Administration/Cross-Platform_Content/Next-gen_Traffic_Analytics_-_Network-Based_Application_Recognition_(NBAR)_Integration#Wireless.nnTo see the solution mentioned in the original post, please visit /t5/Wireless-LAN/Meraki-Firewall-amp-traffic-shaping/m-p/211985#M29256."}]}
{ "messages" : [{ "role" : " system " , "content" : " You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible. " }, { "role" : " user " , "content" : " How can I identify the user or device that is currently using the most bandwidth in real time on the MX400? " }, { "role" : " assistant " , "content" : " No, I don't think so on the MX series. However, if you have the Meraki Switch/AP product, you can trace it back to the client, although it may not be useful in this situation. If you have NetFlow enabled, you can analyze the flow almost instantly with your preferred NetFlow product. " }]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What should be considered when determining the appropriate size of the MX at the central site for deploying SSID tunneling?"}, {"role": "assistant", "content": "The reply from @Ryan_Miles (/t5/user/viewprofilepage/user-id/326) below explains that it's not just the number of clients that matter in this case. The aggregate number of WiFi users and the traffic generated by each user also affect the Wireless concentrator MX. Ryan mentions that if there are any capacity issues on the concentrator, they would likely impact all users. Therefore, it's advisable to avoid reaching its maximum capacity. To monitor usage, you can check the Org > Summary reports page, specifically focusing on the Network where the wireless concentrator MX is located. Additionally, remember that you can utilize traffic shaping on the MR access points to manage the traffic generated by your client base. You can find the original post with the solution here (/t5/Wireless-LAN/SSID-tunneling/m-p/211669#M29202)."}]}
如果您認為這很有趣並且想要貢獻或建議更好的方法來清理和準備數據,我洗耳恭聽!