meraki_genai_community_scraper تنزيل - meraki_genai_community

meraki_genai_community_scraper

كود الذكاء الاصطناعي

1.0.0

تنزيل

كيفية استخراج منتدى للحصول على بيانات ضبط GPT الرائعة

مرحبًا بك في هذا المستودع، حيث ستتعلم كيفية استخراج بعض البيانات المذهلة من منتدى مجتمعي والتي يمكنك استخدامها لتحسين نموذج chatGPT الخاص بك. هذا مشروع رائع يوضح لك كيفية إنشاء حل للذكاء الاصطناعي من البداية، بدءًا من جمع البيانات وحتى نشر النموذج.

هذا هو الجزء الأول في سلسلة من أمثلة الضبط الدقيق الكاملة من البداية. الجزء الثاني هنا: https://github.com/BenderScript/techGPT

كشط وتنظيف البيانات

قبل أن نصل إلى الجزء الممتع من GenAI، نحتاج إلى الحصول على بعض البيانات أولاً. وليس فقط أية بيانات، بل بيانات عالية الجودة ذات صلة ومفيدة لمهمتنا.

إن تجريف المنتدى ليس بالأمر السهل كما يبدو. هناك العديد من التحديات والقرارات التي يتعين علينا مواجهتها، مثل:

كيف نستخرج الأسئلة والأجوبة من بنية HTML لصفحات المنتدى؟
كيف نتعامل مع ترقيم صفحات المنتدى؟ هل نقوم بكشط كل الصفحات أم الصفحات القليلة الأولى فقط؟
كيف نختار أفضل الإجابات لكل سؤال؟ هل نفكر فقط في الحلول المقبولة، أم ندرج أيضًا الحلول التي حصلت على العديد من الأصوات الإيجابية؟
كيف نتعامل مع الأخطاء النحوية والإملائية في النص؟ فهل نصححها أم نتركها كما هي؟
كيف نتعامل مع مقتطفات التعليمات البرمجية والروابط ورسائل البريد الإلكتروني وعلامات HTML والرموز التعبيرية والأحرف الخاصة في النص؟ هل نزيلها أم نحتفظ بها؟ وإذا احتفظنا بها، كيف يمكننا تنسيقها بشكل صحيح؟

كما ترون، هناك أشياء كثيرة يجب التفكير فيها قبل أن نتمكن من البدء في جمع البيانات.

إعادة كتابة الأسئلة والأجوبة

بمجرد الانتهاء من مسح البيانات وتنظيفها، نحتاج إلى إعدادها للتدريب. وهذا يعني أننا بحاجة إلى تحويل البيانات إلى تنسيق يمكن لـ GenAI فهمه. بمعنى آخر، نحن بحاجة إلى إعادة صياغة الأسئلة والأجوبة بطريقة تحافظ على معناها ونبرتها.

على سبيل المثال، سؤال مثل: " لقد حاولت تكوين نقطة وصول لاسلكية جديدة وأريد معرفة كيفية تكوين SSID وكلمة المرور. هل يمكن لأي شخص مساعدتي؟" يجب إعادة صياغته على النحو التالي: "كيف أقوم بإعداد SSID وكلمة المرور". كلمة المرور على نقطة وصول لاسلكية جديدة؟"

الشيء نفسه ينطبق على الإجابات. وعلينا أن نعيد صياغتها بطريقة واضحة وموجزة.

النهج الذي استخدمته في هذا المستودع هو الاستفادة من chatGPT نفسه لتنظيف وإعادة صياغة الأسئلة والأجوبة.

في البداية، حاولت ترميز تعبير عادي معقد لتنظيف البيانات، ولكن بعد العديد من الاختبارات، أدركت أن chatGPT يمكنه القيام بعمل أفضل.

واكتشفت أيضًا أن القيام بذلك في خطوتين كان أفضل. أولاً، طلبت من chatGPT تنظيف البيانات، ثم طلبت منه إعادة صياغتها. نعم، هذا يستغرق المزيد من الوقت، ولكن النتائج تستحق العناء.

حسنًا، إليك نسخة مُعاد كتابتها من هذا الجزء:

كيفية تشغيله

لكي يعمل هذا المستودع، تحتاج إلى Python 3.9 أو أعلى على جهاز الكمبيوتر الخاص بك. تحتاج أيضًا إلى الحصول على مكتبات Python في ملف require.txt باستخدام النقطة:

أنت بحاجة إلى ملف .env مع مفتاح OpenAI API الخاص بك كما في

OPENAI_API_KEY= < your api key >

لتشغيله بالإعدادات الافتراضية، فقط اكتب:

python3 main.py

أمثلة

أمثلة على ملفات JSON Lines المحفوظة

{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What could be causing the issue if the green LED is showing on all the newly added MR44 devices but they are not being displayed as online on the dashboard?"}, {"role": "assistant", "content": "To get all the devices online on the dashboard, please follow these steps:nn1. Remove the device from the current network.n2. Create a new network.n3. Add the device to the new network.n4. Factory reset the device.n5. Move the device back to the original network.nnFor more information, you can refer to the original post here: [Wireless-LAN/Meraki-is-showing-green-led-but-it-doesn-t-show-that-online-on](/t5/Wireless-LAN/Meraki-is-showing-green-led-but-it-doesn-t-show-that-online-on/m-p/212259#M29291)"}]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What could be the reason for the presence of different application categories in my networks, where one network has a more extensive drop-down list of applications compared to the other? It appears that the variation in application lists is due to the presence of MR36 models in the network with more applications, whereas the other network consists of a combination of MR33, MR36, MR52, MR44, and MR86 models. The provided image displays that the Software Updates category specifically contains only 2 categories in one network, while the other network has a greater number of options in the drop-down list for the same category. This pattern is also observed in other Application Categories."}, {"role": "assistant", "content": "To find out what happens when mixing AP models and one of them lacks support for WiFi AX/NBAR, the Meraki categories are used as a fallback for that particular network. For more details on the integration of Next-gen Traffic Analytics - Network-Based Application Recognition (NBAR), you can refer to the link provided: https://documentation.meraki.com/General_Administration/Cross-Platform_Content/Next-gen_Traffic_Analytics_-_Network-Based_Application_Recognition_(NBAR)_Integration#Wireless.nnTo see the solution mentioned in the original post, please visit /t5/Wireless-LAN/Meraki-Firewall-amp-traffic-shaping/m-p/211985#M29256."}]}
{ "messages" : [{ "role" : " system " , "content" : " You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible. " }, { "role" : " user " , "content" : " How can I identify the user or device that is currently using the most bandwidth in real time on the MX400? " }, { "role" : " assistant " , "content" : " No, I don't think so on the MX series. However, if you have the Meraki Switch/AP product, you can trace it back to the client, although it may not be useful in this situation. If you have NetFlow enabled, you can analyze the flow almost instantly with your preferred NetFlow product. " }]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What should be considered when determining the appropriate size of the MX at the central site for deploying SSID tunneling?"}, {"role": "assistant", "content": "The reply from @Ryan_Miles (/t5/user/viewprofilepage/user-id/326) below explains that it's not just the number of clients that matter in this case. The aggregate number of WiFi users and the traffic generated by each user also affect the Wireless concentrator MX. Ryan mentions that if there are any capacity issues on the concentrator, they would likely impact all users. Therefore, it's advisable to avoid reaching its maximum capacity. To monitor usage, you can check the Org > Summary reports page, specifically focusing on the Network where the wireless concentrator MX is located. Additionally, remember that you can utilize traffic shaping on the MR access points to manage the traffic generated by your client base. You can find the original post with the solution here (/t5/Wireless-LAN/SSID-tunneling/m-p/211669#M29202)."}]}