download meraki_genai_community_scraper - download do código fonte meraki_genai_community

meraki_genai_community_scraper

Código-Fonte de IA

1.0.0

Baixar

Como raspar um fórum para obter dados impressionantes de ajuste fino da GPT

Bem-vindo a este repositório, onde você aprenderá como vasculhar um fórum da comunidade em busca de alguns dados incríveis que você pode usar para ajustar seu modelo chatGPT. Este é um projeto interessante que mostra como construir uma solução de IA do zero, desde a coleta de dados até a implantação do modelo.

Esta é a primeira parte de uma série de exemplos completos de ajuste fino do zero. A segunda parte está aqui: https://github.com/BenderScript/techGPT

Raspagem e limpeza de dados

Antes de chegarmos à parte divertida do GenAI, precisamos primeiro obter alguns dados. E não quaisquer dados, mas dados de alta qualidade que sejam relevantes e úteis para a nossa tarefa.

Raspar um fórum não é tão fácil quanto parece. Existem muitos desafios e decisões que precisamos enfrentar, como:

Como extraímos as perguntas e respostas da estrutura HTML das páginas do fórum?
Como lidamos com a paginação das páginas do fórum? Raspamos todas as páginas ou apenas as primeiras?
Como selecionamos as melhores respostas para cada pergunta? Consideramos apenas as soluções aceitas ou incluímos também aquelas com muitos votos positivos?
Como lidamos com erros gramaticais e ortográficos no texto? Nós os corrigimos ou os deixamos como estão?
Como lidamos com trechos de código, links, e-mails, tags HTML, emojis e caracteres especiais no texto? Nós os removemos ou os mantemos? E se os mantivermos, como os formataremos adequadamente?

Como você pode ver, há muitas coisas em que pensar antes de começarmos a extrair os dados.

Reescrevendo as perguntas e respostas

Depois de extrair e limpar os dados, precisamos prepará-los para o treinamento. Isso significa que precisamos transformar os dados em um formato que a GenAI possa entender. Por outras palavras, precisamos de reformular as perguntas e respostas de uma forma que preserve o seu significado e tom.

Por exemplo, uma pergunta como: "Tentei configurar um novo AP wireless e preciso saber como configurar o SSID e a senha. Alguém pode me ajudar?" precisa ser reformulada como: "Como configuro o SSID e senha em um novo AP sem fio?"

O mesmo vale para as respostas. Precisamos reformulá-los de forma clara e concisa.

A abordagem que usei neste repositório é aproveitar o próprio chatGPT para limpar e reformular as perguntas e respostas.

No início, tentei codificar uma expressão regex complexa para limpar os dados, mas depois de muitos testes, percebi que o chatGPT poderia fazer um trabalho melhor.

Também descobri que fazer isso em duas etapas era melhor. Primeiro, pedi ao chatGPT para limpar os dados e depois pedi para reformulá-los. Sim, isso leva mais tempo, mas os resultados valem a pena.

Ok, aqui está uma possível versão reescrita dessa parte:

Como executá-lo

Para fazer este repositório funcionar, você precisa do Python 3.9 ou superior em seu computador. Você também precisa obter as bibliotecas Python em requirements.txt usando pip:

Você precisa de um arquivo .env com sua chave de API OpenAI como em

OPENAI_API_KEY= < your api key >

Para executá-lo com as configurações padrão, basta digitar:

python3 main.py

Exemplos

Exemplos de arquivos de linhas JSON salvos

{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What could be causing the issue if the green LED is showing on all the newly added MR44 devices but they are not being displayed as online on the dashboard?"}, {"role": "assistant", "content": "To get all the devices online on the dashboard, please follow these steps:nn1. Remove the device from the current network.n2. Create a new network.n3. Add the device to the new network.n4. Factory reset the device.n5. Move the device back to the original network.nnFor more information, you can refer to the original post here: [Wireless-LAN/Meraki-is-showing-green-led-but-it-doesn-t-show-that-online-on](/t5/Wireless-LAN/Meraki-is-showing-green-led-but-it-doesn-t-show-that-online-on/m-p/212259#M29291)"}]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What could be the reason for the presence of different application categories in my networks, where one network has a more extensive drop-down list of applications compared to the other? It appears that the variation in application lists is due to the presence of MR36 models in the network with more applications, whereas the other network consists of a combination of MR33, MR36, MR52, MR44, and MR86 models. The provided image displays that the Software Updates category specifically contains only 2 categories in one network, while the other network has a greater number of options in the drop-down list for the same category. This pattern is also observed in other Application Categories."}, {"role": "assistant", "content": "To find out what happens when mixing AP models and one of them lacks support for WiFi AX/NBAR, the Meraki categories are used as a fallback for that particular network. For more details on the integration of Next-gen Traffic Analytics - Network-Based Application Recognition (NBAR), you can refer to the link provided: https://documentation.meraki.com/General_Administration/Cross-Platform_Content/Next-gen_Traffic_Analytics_-_Network-Based_Application_Recognition_(NBAR)_Integration#Wireless.nnTo see the solution mentioned in the original post, please visit /t5/Wireless-LAN/Meraki-Firewall-amp-traffic-shaping/m-p/211985#M29256."}]}
{ "messages" : [{ "role" : " system " , "content" : " You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible. " }, { "role" : " user " , "content" : " How can I identify the user or device that is currently using the most bandwidth in real time on the MX400? " }, { "role" : " assistant " , "content" : " No, I don't think so on the MX series. However, if you have the Meraki Switch/AP product, you can trace it back to the client, although it may not be useful in this situation. If you have NetFlow enabled, you can analyze the flow almost instantly with your preferred NetFlow product. " }]}
{"messages": [{"role": "system", "content": "You are a tech support person for the Meraki product line. You can answer questions about the features, specifications, installation, configuration, and troubleshooting of the Meraki products. You are polite,professional, and helpful. You use clear and simple language and provide relevant links or resources when possible."}, {"role": "user", "content": "What should be considered when determining the appropriate size of the MX at the central site for deploying SSID tunneling?"}, {"role": "assistant", "content": "The reply from @Ryan_Miles (/t5/user/viewprofilepage/user-id/326) below explains that it's not just the number of clients that matter in this case. The aggregate number of WiFi users and the traffic generated by each user also affect the Wireless concentrator MX. Ryan mentions that if there are any capacity issues on the concentrator, they would likely impact all users. Therefore, it's advisable to avoid reaching its maximum capacity. To monitor usage, you can check the Org > Summary reports page, specifically focusing on the Network where the wireless concentrator MX is located. Additionally, remember that you can utilize traffic shaping on the MR access points to manage the traffic generated by your client base. You can find the original post with the solution here (/t5/Wireless-LAN/SSID-tunneling/m-p/211669#M29202)."}]}