Download katana - katana source code

A next-generation crawling and spidering framework

Features • Installation • Usage • Scope • Config • Filters • Join Discord

Features

Fast and fully configurable web crawling
Standard and Headless modes
JavaScript parsing / crawling
Customizable automatic form filling
Scope control - preconfigured fields / regex
Customizable output - preconfigured fields
Input - STDIN, URL and LIST
Output - STDOUT, FILE and JSON

Installation

katana requires Go 1.18 to install successfully. To install it, run the command below or download a pre-compiled binary from the release page.

 CGO_ENABLED=1 go install github.com/projectdiscovery/katana/cmd/katana@latest

More options to install / run katana -

Docker

To install or update the Docker image to the latest tag -

docker pull projectdiscovery/katana:latest

To run katana in standard mode using Docker -

docker run projectdiscovery/katana:latest -u https://tesla.com

To run katana in headless mode using Docker -

docker run projectdiscovery/katana:latest -u https://tesla.com -system-chrome -headless

Ubuntu

It is recommended to install the following prerequisites -

sudo apt update
sudo snap refresh
sudo apt install zip curl wget git
sudo snap install golang --classic
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add - 
sudo sh -c ' echo "deb http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list '
sudo apt update 
sudo apt install google-chrome-stable

Install katana -

go install github.com/projectdiscovery/katana/cmd/katana@latest

Usage

 katana -h

This displays help for the tool. Here are all the switches it supports.

 Katana is a fast crawler focused on execution in automation
pipelines offering both headless and non-headless crawling.

Usage:
  ./katana [flags]

Flags:
INPUT:
   -u, -list string[]  target url / list to crawl
   -resume string      resume scan using resume.cfg
   -e, -exclude string[]  exclude host matching specified filter ('cdn', 'private-ips', cidr, ip, regex)

CONFIGURATION:
   -r, -resolvers string[]       list of custom resolver (file or comma separated)
   -d, -depth int                maximum depth to crawl (default 3)
   -jc, -js-crawl                enable endpoint parsing / crawling in javascript file
   -jsl, -jsluice                enable jsluice parsing in javascript file (memory intensive)
   -ct, -crawl-duration value    maximum duration to crawl the target for (s, m, h, d) (default s)
   -kf, -known-files string      enable crawling of known files (all,robotstxt,sitemapxml), a minimum depth of 3 is required to ensure all known files are properly crawled.
   -mrs, -max-response-size int  maximum response size to read (default 9223372036854775807)
   -timeout int                  time to wait for request in seconds (default 10)
   -aff, -automatic-form-fill    enable automatic form filling (experimental)
   -fx, -form-extraction         extract form, input, textarea & select elements in jsonl output
   -retry int                    number of times to retry the request (default 1)
   -proxy string                 http/socks5 proxy to use
   -H, -headers string[]         custom header/cookie to include in all http request in header:value format (file)
   -config string                path to the katana configuration file
   -fc, -form-config string      path to custom form configuration file
   -flc, -field-config string    path to custom field configuration file
   -s, -strategy string          Visit strategy (depth-first, breadth-first) (default "depth-first")
   -iqp, -ignore-query-params    Ignore crawling same path with different query-param values
   -tlsi, -tls-impersonate       enable experimental client hello (ja3) tls randomization
   -dr, -disable-redirects       disable following redirects (default false)

DEBUG:
   -health-check, -hc        run diagnostic check up
   -elog, -error-log string  file to write sent requests error log

HEADLESS:
   -hl, -headless                    enable headless hybrid crawling (experimental)
   -sc, -system-chrome               use local installed chrome browser instead of katana installed
   -sb, -show-browser                show the browser on the screen with headless mode
   -ho, -headless-options string[]   start headless chrome with additional options
   -nos, -no-sandbox                 start headless chrome in --no-sandbox mode
   -cdd, -chrome-data-dir string     path to store chrome browser data
   -scp, -system-chrome-path string  use specified chrome browser for headless crawling
   -noi, -no-incognito               start headless chrome without incognito mode
   -cwu, -chrome-ws-url string       use chrome browser instance launched elsewhere with the debugger listening at this URL
   -xhr, -xhr-extraction             extract xhr request url,method in jsonl output

SCOPE:
   -cs, -crawl-scope string[]       in scope url regex to be followed by crawler
   -cos, -crawl-out-scope string[]  out of scope url regex to be excluded by crawler
   -fs, -field-scope string         pre-defined scope field (dn,rdn,fqdn) or custom regex (e.g., '(company-staging.io|company.com)') (default "rdn")
   -ns, -no-scope                   disables host based default scope
   -do, -display-out-scope          display external endpoint from scoped crawling

FILTER:
   -mr, -match-regex string[]       regex or list of regex to match on output url (cli, file)
   -fr, -filter-regex string[]      regex or list of regex to filter on output url (cli, file)
   -f, -field string                field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
   -sf, -store-field string         field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
   -em, -extension-match string[]   match output for given extension (eg, -em php,html,js)
   -ef, -extension-filter string[]  filter output for given extension (eg, -ef png,css)
   -mdc, -match-condition string    match response with dsl based condition
   -fdc, -filter-condition string   filter response with dsl based condition

RATE-LIMIT:
   -c, -concurrency int          number of concurrent fetchers to use (default 10)
   -p, -parallelism int          number of concurrent inputs to process (default 10)
   -rd, -delay int               request delay between each request in seconds
   -rl, -rate-limit int          maximum requests to send per second (default 150)
   -rlm, -rate-limit-minute int  maximum number of requests to send per minute

UPDATE:
   -up, -update                 update katana to latest version
   -duc, -disable-update-check  disable automatic katana update check

OUTPUT:
   -o, -output string                file to write output to
   -sr, -store-response              store http requests/responses
   -srd, -store-response-dir string  store http requests/responses to custom directory
   -sfd, -store-field-dir string     store per-host field to custom directory
   -or, -omit-raw                    omit raw requests/responses from jsonl output
   -ob, -omit-body                   omit response body from jsonl output
   -j, -jsonl                        write output in jsonl format
   -nc, -no-color                    disable output content coloring (ANSI escape codes)
   -silent                           display output only
   -v, -verbose                      display verbose output
   -debug                            display debug output
   -version                          display project version

Running katana

Input for katana

katana requires a URL or endpoint to crawl and accepts single or multiple inputs.

Input URLs can be provided with the -u option, and multiple values can be passed as comma-separated input. File input is likewise supported with the -list option, and piped input (stdin) is supported as well.

URL Input

katana -u https://tesla.com

Multiple URL Input (comma-separated)

katana -u https://tesla.com,https://google.com

List Input

$ cat url_list.txt

https://tesla.com
https://google.com

 katana -list url_list.txt

STDIN (piped) Input

 echo https://tesla.com | katana

cat domains | httpx | katana

Example of running katana -

 katana -u https://youtube.com

   __        __                
  / /_____ _/ /____ ____  ___ _
 /  '_/ _  / __/ _  / _ / _  /
/_/_\_,_/__/_,_/_//_/_,_/ v0.0.1                     

      projectdiscovery.io

[WRN] Use with caution. You are responsible for your actions.
[WRN] Developers assume no liability and are not responsible for any misuse or damage.
https://www.youtube.com/
https://www.youtube.com/about/
https://www.youtube.com/about/press/
https://www.youtube.com/about/copyright/
https://www.youtube.com/t/contact_us/
https://www.youtube.com/creators/
https://www.youtube.com/ads/
https://www.youtube.com/t/terms
https://www.youtube.com/t/privacy
https://www.youtube.com/about/policies/
https://www.youtube.com/howyoutubeworks?utm_campaign=ytgen&utm_source=ythp&utm_medium=LeftNav&utm_content=txt&u=https%3A%2F%2Fwww.youtube.com%2Fhowyoutubeworks%3Futm_source%3Dythp%26utm_medium%3DLeftNav%26utm_campaign%3Dytgen
https://www.youtube.com/new
https://m.youtube.com/
https://www.youtube.com/s/desktop/4965577f/jsbin/desktop_polymer.vflset/desktop_polymer.js
https://www.youtube.com/s/desktop/4965577f/cssbin/www-main-desktop-home-page-skeleton.css
https://www.youtube.com/s/desktop/4965577f/cssbin/www-onepick.css
https://www.youtube.com/s/_/ytmainappweb/_/ss/k=ytmainappweb.kevlar_base.0Zo5FUcPkCg.L.B1.O/am=gAE/d=0/rs=AGKMywG5nh5Qp-BGPbOaI1evhF5BVGRZGA
https://www.youtube.com/opensearch?locale=en_GB
https://www.youtube.com/manifest.webmanifest
https://www.youtube.com/s/desktop/4965577f/cssbin/www-main-desktop-watch-page-skeleton.css
https://www.youtube.com/s/desktop/4965577f/jsbin/web-animations-next-lite.min.vflset/web-animations-next-lite.min.js
https://www.youtube.com/s/desktop/4965577f/jsbin/custom-elements-es5-adapter.vflset/custom-elements-es5-adapter.js
https://www.youtube.com/s/desktop/4965577f/jsbin/webcomponents-sd.vflset/webcomponents-sd.js
https://www.youtube.com/s/desktop/4965577f/jsbin/intersection-observer.min.vflset/intersection-observer.min.js
https://www.youtube.com/s/desktop/4965577f/jsbin/scheduler.vflset/scheduler.js
https://www.youtube.com/s/desktop/4965577f/jsbin/www-i18n-constants-en_GB.vflset/www-i18n-constants.js
https://www.youtube.com/s/desktop/4965577f/jsbin/www-tampering.vflset/www-tampering.js
https://www.youtube.com/s/desktop/4965577f/jsbin/spf.vflset/spf.js
https://www.youtube.com/s/desktop/4965577f/jsbin/network.vflset/network.js
https://www.youtube.com/howyoutubeworks/
https://www.youtube.com/trends/
https://www.youtube.com/jobs/
https://www.youtube.com/kids/

Crawling Mode

Standard Mode

Standard crawling mode uses the standard Go http library under the hood to handle HTTP requests/responses. This mode is much faster because it has no browser overhead. However, it analyzes the HTTP response body as is, without any JavaScript or DOM rendering, so it can miss endpoints that only appear after the DOM is rendered, or asynchronous endpoint calls that complex web applications make in response to, for example, browser-specific events.
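To make the limitation concrete, here is a minimal Python sketch (not katana's code) that collects links from a raw HTML body the way a non-rendering crawler might. The page content, the `LinkCollector` class and the `/api/v1/orders` endpoint are hypothetical; the point is only that a link created by a script never shows up in the static markup.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags present in the static HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html):
    """Return the links visible without executing any JavaScript."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page: one static link, one link created only at runtime.
PAGE = """
<html><body>
  <a href="/about">About</a>
  <script>
    var a = document.createElement("a");
    a.href = "/api/v1/orders";
    document.body.appendChild(a);
  </script>
</body></html>
"""

if __name__ == "__main__":
    # Only the static link is found; the script-created one is missed.
    print(static_links(PAGE))
```

A browser-backed crawl would execute the script and could observe the second link, which is the coverage gap that headless mode is meant to close.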

Headless Mode

Headless mode hooks internal headless calls to handle HTTP requests/responses directly within the browser context. This offers two advantages:

The HTTP fingerprint (TLS and user agent) fully identifies the client as a legitimate browser.
Better coverage, since endpoints are discovered by analyzing both the standard raw response, as in the previous mode, and the browser-rendered response with JavaScript enabled.

Headless crawling is optional and can be enabled using the -headless option.

Here are the other headless CLI options -

 katana -h headless

Flags:
HEADLESS:
   -hl, -headless                    enable headless hybrid crawling (experimental)
   -sc, -system-chrome               use local installed chrome browser instead of katana installed
   -sb, -show-browser                show the browser on the screen with headless mode
   -ho, -headless-options string[]   start headless chrome with additional options
   -nos, -no-sandbox                 start headless chrome in --no-sandbox mode
   -cdd, -chrome-data-dir string     path to store chrome browser data
   -scp, -system-chrome-path string  use specified chrome browser for headless crawling
   -noi, -no-incognito               start headless chrome without incognito mode
   -cwu, -chrome-ws-url string       use chrome browser instance launched elsewhere with the debugger listening at this URL
   -xhr, -xhr-extraction             extract xhr requests

`-no-sandbox`

Runs the headless Chrome browser with the no-sandbox option, which is useful when running as the root user.

 katana -u https://tesla.com -headless -no-sandbox

`-no-incognito`

Runs the headless Chrome browser without incognito mode, which is useful when using the local browser.

 katana -u https://tesla.com -headless -no-incognito

`-headless-options`

When crawling in headless mode, additional Chrome options can be specified using -headless-options, for example -

 katana -u https://tesla.com -headless -system-chrome -headless-options --disable-gpu,proxy-server=http://127.0.0.1:8080

Scope Control

Crawling can be endless if it is not scoped, so katana comes with several options to define the crawl scope.

`-field-scope`

The handiest option for defining scope with a predefined field name; rdn is the default field scope.

rdn - crawling scoped to the root domain name and all subdomains (e.g. *example.com) (default)
fqdn - crawling scoped to the given sub(domain) (e.g. www.example.com or api.example.com)
dn - crawling scoped to the domain name keyword (e.g. example)

 katana -u https://tesla.com -fs dn
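The three field scopes can be pictured with a small Python sketch. This is an approximation of the descriptions above, not katana's implementation: `root_domain` naively takes the last two labels (real tools usually consult the public suffix list), and the function names are hypothetical.

```python
def root_domain(host):
    """Naive root domain: the last two labels (assumption, no public suffix list)."""
    return ".".join(host.lower().split(".")[-2:])


def in_scope(seed_host, candidate_host, field="rdn"):
    """Approximate the rdn / fqdn / dn scope checks described above."""
    seed, cand = seed_host.lower(), candidate_host.lower()
    if field == "fqdn":
        # Only the exact (sub)domain that was given as input.
        return cand == seed
    rdn = root_domain(seed)
    if field == "rdn":
        # The root domain itself and any of its subdomains.
        return cand == rdn or cand.endswith("." + rdn)
    if field == "dn":
        # Any host containing the domain name keyword.
        return rdn.split(".")[0] in cand
    raise ValueError(f"unknown field scope: {field}")


if __name__ == "__main__":
    for host in ("api.example.com", "example.org", "www.example.com"):
        print(host, {f: in_scope("www.example.com", host, f) for f in ("rdn", "fqdn", "dn")})
```

Under these assumptions, `dn` is the widest of the three and `fqdn` the narrowest.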

`-crawl-scope`

For advanced scope control, use the -cs option, which supports regex.

 katana -u https://tesla.com -cs login

For multiple in-scope rules, pass a file containing one string / regex per line.

$ cat in_scope.txt

login/
admin/
app/
wordpress/

 katana -u https://tesla.com -cs in_scope.txt

`-crawl-out-scope`

To define what should not be crawled, use the -cos option, which also supports regex input.

 katana -u https://tesla.com -cos logout

For multiple out-of-scope rules, pass a file containing one string / regex per line.

$ cat out_of_scope.txt

/logout
/log_out

 katana -u https://tesla.com -cos out_of_scope.txt
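As a rough mental model of how in-scope and out-of-scope rules could combine, here is a Python sketch. It assumes unanchored regex matching and that an out-of-scope match wins over an in-scope match; neither detail is stated in this README, and the `allowed` helper is hypothetical.

```python
import re


def allowed(url, in_scope=(), out_scope=()):
    """Decide whether a URL would be followed under the given regex rules."""
    # Assumption: any out-of-scope match excludes the URL outright.
    if any(re.search(pattern, url) for pattern in out_scope):
        return False
    # Assumption: when in-scope rules exist, at least one must match.
    if in_scope:
        return any(re.search(pattern, url) for pattern in in_scope)
    return True


# Rules taken from the in_scope.txt and out_of_scope.txt examples above.
IN_SCOPE = ["login/", "admin/", "app/", "wordpress/"]
OUT_SCOPE = ["/logout", "/log_out"]

if __name__ == "__main__":
    for url in (
        "https://tesla.com/login/",
        "https://tesla.com/admin/logout",
        "https://tesla.com/shop",
    ):
        print(url, allowed(url, IN_SCOPE, OUT_SCOPE))
```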

`-no-scope`

By default, katana scopes crawling to *.domain. Use the -ns option to disable this and crawl beyond the target domain, out into the wider internet.

 katana -u https://tesla.com -ns

`-display-out-scope`

By default, when a scope option is used, it also applies to the links displayed in the output, so external URLs are excluded. To override this behavior, use the -do option to display all external URLs found on the scoped target URLs / endpoints.

 katana -u https://tesla.com -do

Here are all the CLI options for scope control -

 katana -h scope

Flags:
SCOPE:
   -cs, -crawl-scope string[]       in scope url regex to be followed by crawler
   -cos, -crawl-out-scope string[]  out of scope url regex to be excluded by crawler
   -fs, -field-scope string         pre-defined scope field (dn,rdn,fqdn) (default "rdn")
   -ns, -no-scope                   disables host based default scope
   -do, -display-out-scope          display external endpoint from scoped crawling

Crawler Configuration

Katana comes with multiple options to configure and control the crawl the way you want.

`-depth`

Option to define the depth to which URLs are followed during crawling. The greater the depth, the more endpoints are crawled and the longer the crawl takes.

 katana -u https://tesla.com -d 5
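The trade-off between depth and coverage can be sketched with a depth-limited breadth-first walk over a tiny in-memory link graph. This is an illustration only: the `GRAPH` data is hypothetical, and counting the seed as depth 0 is an assumption that may not match how katana counts depth.

```python
from collections import deque


def crawl(graph, seed, max_depth):
    """Return URLs reachable from seed within max_depth link hops."""
    seen = {seed}
    order = [seed]
    queue = deque([(seed, 0)])  # assumption: the seed sits at depth 0
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links any deeper
        for nxt in graph.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order


# Hypothetical site structure: each key links to the listed paths.
GRAPH = {"/": ["/a", "/b"], "/a": ["/a/1"], "/a/1": ["/a/1/x"], "/b": []}

if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, crawl(GRAPH, "/", d))
```

Each extra level of depth pulls in the next ring of pages, which is why both the endpoint count and the crawl time grow with -d.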

`-js-crawl`

Option to enable JavaScript file parsing and crawling of the endpoints discovered in JavaScript files. Disabled by default.

 katana -u https://tesla.com -jc

`-crawl-duration`

Option to set a predefined crawl duration. Disabled by default. According to the help text above, the value takes a unit (s, m, h, d) and defaults to seconds.

 katana -u https://tesla.com -ct 2
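The help text lists the units s, m, h and d with seconds as the default, so `-ct 2` above would mean two seconds. The Python sketch below shows one way such a value could be interpreted; it is an assumption for illustration, handles only a single number with an optional one-letter unit, and the `duration_seconds` helper is hypothetical.

```python
# Seconds per unit, following the units listed in the help text.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_seconds(value):
    """Convert values such as '2', '30s', '5m' or '1h' to seconds."""
    value = value.strip().lower()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    # No unit given: treat the number as seconds (the documented default).
    return int(value)


if __name__ == "__main__":
    for raw in ("2", "30s", "5m", "1h", "1d"):
        print(raw, duration_seconds(raw))
```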

`-known-files`

Option to enable crawling of the robots.txt and sitemap.xml files. Disabled by default.

 katana -u https://tesla.com -kf robotstxt,sitemapxml

`-automatic-form-fill`

Option to enable automatic form filling for known / unknown fields. Known field values can be customized as needed by updating the form config file at $HOME/.config/katana/form-config.yaml.

Automatic form filling is an experimental feature.

 katana -u https://tesla.com -aff

Authenticated Crawling

Authenticated crawling involves including custom headers or cookies in HTTP requests to access protected resources. These headers carry authentication or authorization information, allowing you to crawl authenticated content / endpoints. You can specify headers directly on the command line or provide them to katana as a file.

Note: The user needs to perform the authentication manually and export the session cookie / header to a file for use with katana.

`-headers`

Option to add a custom header or cookie to the request.

Header syntax follows the HTTP specification.

Here is an example of adding a cookie to the request:

 katana -u https://tesla.com -H 'Cookie: usrsess=AmljNrESo'

It is also possible to supply headers or cookies as a file. For example:

 $ cat cookie.txt

Cookie: PHPSESSIONID=XXXXXXXXX
X-API-KEY: XXXXX
TOKEN=XX

 katana -u https://tesla.com -H cookie.txt
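Before handing a header file to katana, it can help to check that every line is in the header:value format the help text asks for. The Python sketch below is a hypothetical pre-flight check, not part of katana; how katana itself treats a line without a colon, such as `TOKEN=XX` in the sample above, is not stated in this README.

```python
def parse_headers(text):
    """Split header-file text into a dict of headers and a list of odd lines."""
    headers, skipped = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, value = line.partition(":")
        if not sep or not name.strip():
            # Not in "Name: value" form; report it instead of guessing.
            skipped.append(line)
            continue
        headers[name.strip()] = value.strip()
    return headers, skipped


# Same content as the cookie.txt example above.
SAMPLE = """Cookie: PHPSESSIONID=XXXXXXXXX
X-API-KEY: XXXXX
TOKEN=XX
"""

if __name__ == "__main__":
    headers, skipped = parse_headers(SAMPLE)
    print(headers)
    print("not header:value ->", skipped)
```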

There are more options to configure when needed. Here are all the configuration-related CLI options -

 katana -h config

Flags:
CONFIGURATION:
   -r, -resolvers string[]       list of custom resolver (file or comma separated)
   -d, -depth int                maximum depth to crawl (default 3)
   -jc, -js-crawl                enable endpoint parsing / crawling in javascript file
   -ct, -crawl-duration int      maximum duration to crawl the target for
   -kf, -known-files string      enable crawling of known files (all,robotstxt,sitemapxml)
   -mrs, -max-response-size int  maximum response size to read (default 9223372036854775807)
   -timeout int                  time to wait for request in seconds (default 10)
   -aff, -automatic-form-fill    enable automatic form filling (experimental)
   -fx, -form-extraction         enable extraction of form, input, textarea & select elements
   -retry int                    number of times to retry the request (default 1)
   -proxy string                 http/socks5 proxy to use
   -H, -headers string[]         custom header/cookie to include in request
   -config string                path to the katana configuration file
   -fc, -form-config string      path to custom form configuration file
   -flc, -field-config string    path to custom field configuration file
   -s, -strategy string          Visit strategy (depth-first, breadth-first) (default "depth-first")

Connecting to an Active Browser Session

Katana can also connect to an active browser session in which the user is already logged in and authenticated, and use it for crawling. The only requirement is to start the browser with remote debugging enabled.

Here is an example of starting the Chrome browser with remote debugging enabled and using it with katana -

Step 1) First, locate the path of the Chrome executable

Operating System	Chromium Executable Location	Google Chrome Executable Location
Windows (64-bit)	`C:\Program Files (x86)\Google\Chromium\Application\chrome.exe`	`C:\Program Files (x86)\Google\Chrome\Application\chrome.exe`
Windows (32-bit)	`C:\Program Files\Google\Chromium\Application\chrome.exe`	`C:\Program Files\Google\Chrome\Application\chrome.exe`
macOS	`/Applications/Chromium.app/Contents/MacOS/Chromium`	`/Applications/Google Chrome.app/Contents/MacOS/Google Chrome`
Linux	`/usr/bin/chromium`	`/usr/bin/google-chrome`

Step 2) Start Chrome with remote debugging enabled; it will print the websocket URL. For example, on macOS, you can start Chrome with remote debugging enabled using the following command -

$ /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222


DevTools listening on ws://127.0.0.1:9222/devtools/browser/c5316c9c-19d6-42dc-847a-41d1aeebf7d6

Now log in to the website you want to crawl and keep the browser open.

Step 3) Now use the websocket URL with katana to connect to the active browser session and crawl the website

 katana -headless -u https://tesla.com -cwu ws://127.0.0.1:9222/devtools/browser/c5316c9c-19d6-42dc-847a-41d1aeebf7d6 -no-incognito
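Steps 2 and 3 can be glued together by pulling the websocket URL out of Chrome's startup output. The Python sketch below assumes the output contains a "DevTools listening on ws://..." line like the one shown above; the helper names are hypothetical, and the katana arguments simply mirror the command in step 3.

```python
import re

# Matches the line Chrome prints when remote debugging is enabled.
WS_LINE = re.compile(r"DevTools listening on (ws://\S+)")


def find_ws_url(output):
    """Return the websocket URL from Chrome's output, or None if absent."""
    match = WS_LINE.search(output)
    return match.group(1) if match else None


def katana_args(target, ws_url):
    """Build the argument list for the command shown in step 3."""
    return ["katana", "-headless", "-u", target, "-cwu", ws_url, "-no-incognito"]


# Sample output copied from step 2 above.
CHROME_OUTPUT = (
    "DevTools listening on "
    "ws://127.0.0.1:9222/devtools/browser/c5316c9c-19d6-42dc-847a-41d1aeebf7d6\n"
)

if __name__ == "__main__":
    ws_url = find_ws_url(CHROME_OUTPUT)
    print(" ".join(katana_args("https://tesla.com", ws_url)))
```

The resulting list can be passed to `subprocess.run` once katana is installed and the browser is still open.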

Note: You can use the -cdd option to specify a custom Chrome data directory for storing browser data and cookies, but session data is not saved if a cookie is set to Session only or expires after a certain time.

Filters

`-field`

Katana comes with built-in fields that can be used to filter the output for the desired information. The -f option can be used to specify any of the available fields.

   -f, -field string  field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,key,value,kv,dir,udir)

Here is a table with an example of each field and the expected output when it is used -

FIELD	DESCRIPTION	EXAMPLE
`url`	URL endpoint	`https://admin.projectdiscovery.io/admin/login?user=admin&password=admin`
`qurl`	URL including query parameters	`https://admin.projectdiscovery.io/admin/login.php?user=admin&password=admin`
`qpath`	Path including query parameters	`/login?user=admin&password=admin`
`path`	URL path	`https://admin.projectdiscovery.io/admin/login`
`fqdn`	Fully qualified domain name	`admin.projectdiscovery.io`
`rdn`	Root domain name	`projectdiscovery.io`
`rurl`	Root URL	`https://admin.projectdiscovery.io`
`ufile`	URL with file	`https://admin.projectdiscovery.io/login.js`
`file`	Filename in URL	`login.php`
`key`	Parameter keys in URL	`user,password`
`value`	Parameter values in URL	`admin,admin`
`kv`	Keys=values in URL	`user=admin&password=admin`
`dir`	URL directory name	`/admin/`
`udir`	URL with directory	`https://admin.projectdiscovery.io/admin/`
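Several of the fields in the table can be reproduced from a single URL with Python's standard `urllib.parse`. This sketch approximates the table's descriptions for a subset of fields and is not katana's own logic; the `url_fields` helper is hypothetical.

```python
import posixpath
from urllib.parse import parse_qsl, urlsplit


def url_fields(url):
    """Derive a subset of the fields in the table above from one URL."""
    parts = urlsplit(url)
    root = f"{parts.scheme}://{parts.netloc}"
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    # Directory portion of the path, always ending with a slash.
    directory = posixpath.dirname(parts.path).rstrip("/") + "/"
    return {
        "fqdn": parts.hostname,
        "rurl": root,
        "file": posixpath.basename(parts.path),
        "key": ",".join(k for k, _ in pairs),
        "value": ",".join(v for _, v in pairs),
        "kv": parts.query,
        "dir": directory,
        "udir": root + directory,
    }


if __name__ == "__main__":
    sample = "https://admin.projectdiscovery.io/admin/login.php?user=admin&password=admin"
    for name, value in url_fields(sample).items():
        print(f"{name:5} {value}")
```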

Here is an example of using the field option to display only the URLs that contain query parameters -

 katana -u https://tesla.com -f qurl -silent

https://shop.tesla.com/en_au?redirect=no
https://shop.tesla.com/en_nz?redirect=no
https://shop.tesla.com/product/men_s-raven-lightweight-zip-up-bomber-jacket?sku=1740250-00-A
https://shop.tesla.com/product/tesla-shop-gift-card?sku=1767247-00-A
https://shop.tesla.com/product/men_s-chill-crew-neck-sweatshirt?sku=1740176-00-A
https://www.tesla.com/about?redirect=no
https://www.tesla.com/about/legal?redirect=no
https://www.tesla.com/findus/list?redirect=no

Custom Fields

You can create custom fields to extract and store specific information from page responses using regex rules. These custom fields are defined in a YAML config file and are loaded from the default location at $HOME/.config/katana/field-config.yaml. Alternatively, you can use the -flc option to load a custom field config file from a different location. Here is an example of custom fields.

- name: email
  type: regex
  regex:
  - '([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'
  - '([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)'

- name: phone
  type: regex
  regex:
  - '\d{3}-\d{8}|\d{4}-\d{7}'

When defining custom fields, the following attributes are supported:

name (required)

The value of the name attribute is used as the -field CLI option value.

type (required)

The type of the custom attribute. Currently supported option - regex

part (optional)

The part of the response to extract the information from. The default value is response, which includes both the header and the body. Other possible values are header and body.

group (optional)

You can use this attribute to select a specific matching group in the regex, for example: group: 1

Running katana using custom fields:

 katana -u https://tesla.com -f email,phone
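To see what regex-based custom fields would pull out of a response, the Python sketch below applies email and phone patterns (written with the escapes `\.` and `\d`) to a sample string. It is an illustration rather than katana's extraction code: the `FIELDS` structure and `SAMPLE` text are hypothetical, and treating a missing group as "whole match" is an assumption.

```python
import re

# Hypothetical in-memory equivalent of the YAML field definitions.
FIELDS = [
    {
        "name": "email",
        "regex": [
            r"([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)",
            r"([a-zA-Z0-9+._-]+@[a-zA-Z0-9._-]+\.[a-zA-Z0-9_-]+)",
        ],
        "group": 1,
    },
    {"name": "phone", "regex": [r"\d{3}-\d{8}|\d{4}-\d{7}"]},
]


def extract(fields, response_text):
    """Return unique matches per field name, in order of appearance."""
    found = {}
    for field in fields:
        group = field.get("group", 0)  # assumption: 0 means the whole match
        values = []
        for pattern in field["regex"]:
            for match in re.finditer(pattern, response_text):
                value = match.group(group)
                if value not in values:
                    values.append(value)
        found[field["name"]] = values
    return found


SAMPLE = "Contact: support@example.com or call 010-12345678."

if __name__ == "__main__":
    print(extract(FIELDS, SAMPLE))
```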

`-store-field`

To complement the field option, which is useful for filtering output at run time, there is the -sf, -store-field option. It works exactly like the field option except that, instead of filtering, it stores all the information on disk under the katana_field directory, sorted by target URL. Use -sfd or -store-field-dir to store the data in a different location.

 katana -u https://tesla.com -sf key,fqdn,qurl -silent

$ ls katana_field/

https_www.tesla.com_fqdn.txt
https_www.tesla.com_key.txt
https_www.tesla.com_qurl.txt

The -store-field option is useful for collecting information to build a targeted wordlist for various purposes, including but not limited to:

Identifying the most commonly used parameters
Discovering frequently used paths
Finding commonly used files
Identifying related or unknown subdomains

Katana Filters

`-extension-match`

Crawl output can easily be matched for specific extensions using the -em option, so that only output with the given extensions is displayed.

 katana -u https://tesla.com -silent -em js,jsp,json

`-extension-filter`

Crawl output can easily be filtered for specific extensions using the -ef option, which removes all URLs with the given extensions.

 katana -u https://tesla.com -silent -ef css,txt,md
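The difference between matching and filtering by extension can be shown in a few lines of Python. This is a sketch of the idea rather than katana's code: the extension is taken from the URL path (ignoring the query string), and the `URLS` list and helper names are hypothetical.

```python
import posixpath
from urllib.parse import urlsplit


def extension(url):
    """Lower-cased file extension of the URL path, without the dot."""
    return posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()


def select(urls, match=None, filter_out=None):
    """Keep URLs whose extension is in match and not in filter_out."""
    kept = []
    for url in urls:
        ext = extension(url)
        if match is not None and ext not in match:
            continue
        if filter_out is not None and ext in filter_out:
            continue
        kept.append(url)
    return kept


URLS = [
    "https://tesla.com/app.js",
    "https://tesla.com/style.css",
    "https://tesla.com/data.json?x=1",
    "https://tesla.com/about",
]

if __name__ == "__main__":
    print(select(URLS, match={"js", "jsp", "json"}))      # like -em js,jsp,json
    print(select(URLS, filter_out={"css", "txt", "md"}))  # like -ef css,txt,md
```

Note that under this model a URL without an extension survives a filter but never satisfies a match.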

`-match-regex`

The -match-regex or -mr flag lets you filter output URLs using regular expressions. When this flag is used, only URLs that match the specified regular expression are printed in the output.

 katana -u https://tesla.com -mr 'https://shop.tesla.com/*' -silent

`-filter-regex`

The -filter-regex or -fr flag lets you filter output URLs using regular expressions. When this flag is used, URLs that match the specified regular expression are skipped.

 katana -u https://tesla.com -fr 'https://www.tesla.com/*' -silent
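Because these flags take regular expressions rather than shell globs, a pattern such as `https://shop.tesla.com/*` means "zero or more slashes" at the end, and each unescaped dot matches any character. The Python sketch below shows the difference from an escaped, anchored pattern. It assumes unanchored matching, which this README does not confirm, and it uses Python's `re` module; katana is written in Go, but these particular constructs behave the same in both engines.

```python
import re

# Pattern as written in the example above: dots are wildcards, "/*" is optional.
loose = re.compile(r"https://shop.tesla.com/*")
# Stricter alternative: escaped dots and an anchor at the start of the URL.
strict = re.compile(r"^https://shop\.tesla\.com/")

URLS = [
    "https://shop.tesla.com/product/gift-card",
    "https://shopXteslaYcom",  # hypothetical look-alike string
    "https://www.tesla.com/about",
]

if __name__ == "__main__":
    for url in URLS:
        print(url, bool(loose.search(url)), bool(strict.search(url)))
```

Escaping the dots and anchoring the pattern keeps the match limited to the intended host.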

Advanced Filtering

Katana supports DSL-based expressions for advanced matching and filtering:

To match endpoints with a 200 status code:

katana -u https://www.hackerone.com -mdc 'status_code == 200'

To match endpoints that contain "default" and have a status code other than 403:

katana -u https://www.hackerone.com -mdc 'contains(endpoint, "default") && status_code != 403'

To match endpoints with PHP technologies:

katana -u https://www.hackerone.com -mdc 'contains(to_lower(technologies), "php")'

To filter out endpoints running on Cloudflare:

katana -u https://www.hackerone.com -fdc 'contains(to_lower(technologies), "cloudflare")'
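The four DSL conditions read naturally as predicates over one output record. The Python sketch below mirrors them against a few made-up records; the record shape (keys `endpoint`, `status_code`, `technologies`) is inferred from the expressions above and is an assumption, as are the sample values and helper names.

```python
# Hypothetical records; real jsonl output may be structured differently.
RECORDS = [
    {"endpoint": "https://www.hackerone.com/", "status_code": 200,
     "technologies": ["Cloudflare", "Drupal"]},
    {"endpoint": "https://www.hackerone.com/default.php", "status_code": 403,
     "technologies": ["PHP"]},
    {"endpoint": "https://www.hackerone.com/sites/default/files/a.js", "status_code": 200,
     "technologies": ["PHP", "Cloudflare"]},
]


def tech(record):
    """Lower-cased, comma-joined technologies, like to_lower(technologies)."""
    return ",".join(record["technologies"]).lower()


def mdc_status_200(r):
    return r["status_code"] == 200


def mdc_default_not_403(r):
    return "default" in r["endpoint"] and r["status_code"] != 403


def mdc_php(r):
    return "php" in tech(r)


def fdc_cloudflare(r):
    return "cloudflare" in tech(r)


def apply(records, match=None, filter_out=None):
    """Keep endpoints passing the match predicate and not the filter predicate."""
    kept = []
    for r in records:
        if match is not None and not match(r):
            continue
        if filter_out is not None and filter_out(r):
            continue
        kept.append(r["endpoint"])
    return kept


if __name__ == "__main__":
    print(apply(RECORDS, match=mdc_default_not_403))
    print(apply(RECORDS, filter_out=fdc_cloudflare))
```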

DSL functions can be applied to any key in the jsonl output. For more information on the available DSL functions, please visit the dsl project.

Here are additional filter options -