deep daze ดาวน์โหลด - deep daze ดาวน์โหลดซอร์สโค้ด

งุนงงลึก

หมอกเหนือเนินเขาสีเขียว

แผ่นหญ้าแตกกระจายอยู่บนพื้นหญ้า

ความรักและความสนใจของจักรวาล

นักเดินทางข้ามเวลาท่ามกลางฝูงชน

ชีวิตในช่วงที่เกิดโรคระบาด

ความสงบแห่งการทำสมาธิในป่าที่มีแสงแดดส่องถึง

ชายคนหนึ่งวาดภาพสีแดงสนิท

ประสบการณ์ประสาทหลอนเกี่ยวกับ LSD

นี่คืออะไร?

เครื่องมือบรรทัดคำสั่งอย่างง่ายสำหรับการสร้างข้อความเป็นรูปภาพโดยใช้ CLIP และ Siren ของ OpenAI ขอขอบคุณ Ryan Murdock สำหรับการค้นพบเทคนิคนี้ (และสำหรับการตั้งชื่อที่ยอดเยี่ยม)!

สมุดบันทึกต้นฉบับ

สมุดบันทึกแบบใหม่ที่เรียบง่าย

คุณจะต้องมี Nvidia GPU หรือ AMD GPU

แนะนำ: VRAM 16GB
ข้อกำหนดขั้นต่ำ: 4GB VRAM (ใช้การตั้งค่าที่ต่ำมาก ดูคำแนะนำการใช้งานด้านล่าง)

ติดตั้ง

$ pip install deep-daze

ติดตั้งวินโดวส์

สมมติว่าติดตั้ง Python แล้ว:

เปิดพร้อมท์คำสั่งแล้วไปที่ไดเร็กทอรีของ Python เวอร์ชันปัจจุบันของคุณ

  pip install deep-daze

ตัวอย่าง

$ imagine " a house in the forest "

สำหรับวินโดวส์:

เปิดพรอมต์คำสั่งในฐานะผู้ดูแลระบบ

  imagine " a house in the forest "

แค่นั้นแหละ.

หากคุณมีหน่วยความจำเพียงพอ คุณจะได้รับคุณภาพที่ดีขึ้นโดยการเพิ่มแฟล็ก --deeper

$ imagine " shattered plates on the ground " --deeper

ขั้นสูง

ในรูปแบบการเรียนรู้เชิงลึกอย่างแท้จริง เลเยอร์ที่มากขึ้นจะให้ผลลัพธ์ที่ดีกว่า ค่าเริ่มต้นคือ 16 แต่สามารถเพิ่มเป็น 32 ได้ ขึ้นอยู่กับทรัพยากรของคุณ

$ imagine " stranger in strange lands " --num-layers 32

การใช้งาน

คลีไอ

NAME
    imagine

SYNOPSIS
    imagine TEXT < flags >

POSITIONAL ARGUMENTS
    TEXT
        (required) A phrase less than 77 tokens which you would like to visualize.

FLAGS
    --img=IMAGE_PATH
        Default: None
        Path to png/jpg image or PIL image to optimize on
    --encoding=ENCODING
        Default: None
        User-created custom CLIP encoding. If used, replaces any text or image that was used.
    --create_story=CREATE_STORY
        Default: False
        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 tokens can be used. Requires save_progress to visualize the transitions of the story.
    --story_start_words=STORY_START_WORDS
        Default: 5
        Only used if create_story is True. How many words to optimize on for the first epoch.
    --story_words_per_epoch=STORY_WORDS_PER_EPOCH
        Default: 5
        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.
    --story_separator:
        Default: None
        Only used if create_story is True. Defines a separator like ' . ' that splits the text into groups for each epoch. Separator needs to be in the text otherwise it will be ignored
    --lower_bound_cutout=LOWER_BOUND_CUTOUT
        Default: 0.1
        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.
    --upper_bound_cutout=UPPER_BOUND_CUTOUT
        Default: 1.0
        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.
    --saturate_bound=SATURATE_BOUND
        Default: False
        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.
    --learning_rate=LEARNING_RATE
        Default: 1e-05
        The learning rate of the neural net.
    --num_layers=NUM_LAYERS
        Default: 16
        The number of hidden layers to use in the Siren neural net.
    --batch_size=BATCH_SIZE
        Default: 4
        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.
    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY
        Default: 4
        Calculate a weighted loss of n samples for each iteration. Increasing this can help increase accuracy with lower batch sizes.
    --epochs=EPOCHS
        Default: 20
        The number of epochs to run.
    --iterations=ITERATIONS
        Default: 1050
        The number of times to calculate and backpropagate loss in a given epoch.
    --save_every=SAVE_EVERY
        Default: 100
        Generate an image every time iterations is a multiple of this number.
    --image_width=IMAGE_WIDTH
        Default: 512
        The desired resolution of the image.
    --deeper=DEEPER
        Default: False
        Uses a Siren neural net with 32 hidden layers.
    --overwrite=OVERWRITE
        Default: False
        Whether or not to overwrite existing generated images of the same name.
    --save_progress=SAVE_PROGRESS
        Default: False
        Whether or not to save images generated before training Siren is complete.
    --seed=SEED
        Type: Optional[]
        Default: None
        A seed to be used for deterministic runs.
    --open_folder=OPEN_FOLDER
        Default: True
        Whether or not to open a folder showing your generated images.
    --save_date_time=SAVE_DATE_TIME
        Default: False
        Save files with a timestamp prepended e.g. ` %y%m%d-%H%M%S-my_phrase_here `
    --start_image_path=START_IMAGE_PATH
        Default: None
        The generator is trained first on a starting image before steered towards the textual input
    --start_image_train_iters=START_IMAGE_TRAIN_ITERS
        Default: 50
        The number of steps for the initial training on the starting image
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.
    --save_gif=SAVE_GIF
        Default: False
        Whether or not to save a GIF animation of the generation procedure. Only works if save_progress is set to True.

การรองพื้น

เทคนิคที่ Mario Klingemann คิดค้นและแบ่งปันเป็นครั้งแรก ช่วยให้คุณสามารถเตรียมเครือข่ายตัวสร้างด้วยรูปภาพเริ่มต้น ก่อนที่จะหันไปทางข้อความ

เพียงระบุเส้นทางไปยังภาพที่คุณต้องการใช้ และอาจระบุจำนวนขั้นตอนการฝึกเบื้องต้นด้วย

$ imagine ' a clear night sky filled with stars ' --start_image_path ./cloudy-night-sky.jpg

ลงสีพื้นรูปภาพเริ่มต้นแล้ว

แล้วฝึกพร้อมรับคำ A pizza with green pepper.

ปรับให้เหมาะสมสำหรับการตีความภาพ

นอกจากนี้เรายังสามารถป้อนรูปภาพเป็นเป้าหมายในการเพิ่มประสิทธิภาพได้ แทนที่จะเป็นเพียงการเตรียมเครือข่ายตัวสร้างเท่านั้น Deepdaze จะแสดงการตีความภาพนั้นด้วยตัวมันเอง:

$ imagine --img samples/Autumn_1875_Frederic_Edwin_Church.jpg

ภาพต้นฉบับ:

การตีความของเครือข่าย:

ภาพต้นฉบับ:

การตีความของเครือข่าย:

ปรับให้เหมาะสมสำหรับข้อความและรูปภาพรวมกัน

$ imagine " A psychedelic experience. " --img samples/hot-dog.jpg

การตีความของเครือข่าย:

ใหม่: สร้างเรื่องราว

โหมดปกติสำหรับข้อความอนุญาตเพียง 77 โทเค็นเท่านั้น หากคุณต้องการเห็นภาพเรื่องราว/ย่อหน้า/เพลง/บทกวีแบบเต็ม ให้ตั้งค่า create_story เป็น True

อ้างอิงจากบทกวี “Stopping by Woods On a Snowy Evening” โดย Robert Frost - "ป่าของใครที่ฉันคิดว่าฉันรู้ แต่บ้านของเขาอยู่ในหมู่บ้าน เขาจะไม่เห็นฉันหยุดที่นี่เพื่อดูป่าของเขาเต็มไปด้วยหิมะ ม้าตัวน้อยของฉันต้องคิดว่ามันแปลกที่จะหยุดโดยไม่มีบ้านไร่ใกล้ ๆ ระหว่างป่ากับทะเลสาบน้ำแข็ง ยามเย็นที่มืดมนที่สุดของปี เขาเขย่าระฆังบังเหียนเพื่อถามว่ามีข้อผิดพลาดอะไรบ้าง มีเพียงเสียงลมพัดเบาๆ และมีขนนุ่ม เกล็ด ป่านั้นสวยงาม มืดมนและลึก แต่ฉันมีสัญญาว่าจะรักษาไว้ และต้องเดินทางอีกไกลก่อนที่ฉันจะนอน และอีกยาวไกลก่อนที่ฉันจะนอน"

เราได้รับ:

Whose_woods_these_are_I_think_I_know._His_house_is_in_the_village_ถึงแม้ว่า._He_.mp4

หลาม

เรียกใช้ `deep_daze.Imagine` ใน Python

 from deep_daze import Imagine

imagine = Imagine (
    text = 'cosmic love and attention' ,
    num_layers = 24 ,
)
imagine ()

บันทึกความคืบหน้าทุกๆ การวนซ้ำครั้งที่สี่

บันทึกรูปภาพในรูปแบบ insert_text_here.00001.png, insert_text_here.00002.png, ...สูงสุด (total_iterations % save_every)

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True
)

เพิ่มการประทับเวลาปัจจุบันในแต่ละภาพ

สร้างไฟล์ที่มีทั้งการประทับเวลาและหมายเลขลำดับ

เช่น 210129-043928_328751_insert_text_here.00001.png, 210129-043928_512351_insert_text_here.00002.png, ...

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True ,
    save_date_time = True ,
)

การใช้หน่วยความจำ GPU สูง

หากคุณมี vram อย่างน้อย 16 GiB คุณควรจะสามารถเรียกใช้การตั้งค่าเหล่านี้ได้โดยมีพื้นที่ว่างบ้าง

 imagine = Imagine (
    text = text ,
    num_layers = 42 ,
    batch_size = 64 ,
    gradient_accumulate_every = 1 ,
)

การใช้หน่วยความจำ GPU โดยเฉลี่ย

 imagine = Imagine (
    text = text ,
    num_layers = 24 ,
    batch_size = 16 ,
    gradient_accumulate_every = 2
)

การใช้หน่วยความจำ GPU ต่ำมาก (น้อยกว่า 4 GiB)

หากคุณหมดหวังที่จะรันสิ่งนี้บนการ์ดที่มี vram น้อยกว่า 8 GiB คุณสามารถลด image_width ได้

 imagine = Imagine (
    text = text ,
    image_width = 256 ,
    num_layers = 16 ,
    batch_size = 1 ,
    gradient_accumulate_every = 16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
)

เกณฑ์มาตรฐาน VRAM และความเร็ว:

การทดลองเหล่านี้ดำเนินการกับ 2060 Super RTX และ 3700X Ryzen 5 อันดับแรกเราจะพูดถึงพารามิเตอร์ (bs = ขนาดแบตช์) จากนั้นจึงใช้หน่วยความจำ และในบางกรณีเป็นการวนซ้ำการฝึกฝนต่อวินาที:

สำหรับความละเอียดภาพ 512:

bs 1, num_layers 22: 7.96 GB
bs 2, num_layers 20: 7.5 GB
bs 16, num_layers 16: 6.5 GB

สำหรับความละเอียดของภาพ 256:

bs 8, num_layers 48: 5.3 GB
bs 16, num_layers 48: 5.46 GB - 2.0 it/s
bs 32, num_layers 48: 5.92 GB - 1.67 มัน/วินาที
bs 8, num_layers 44: 5 GB - 2.39 it/s
bs 32, num_layers 44, grad_acc 1: 5.62 GB - 4.83 มัน/วินาที
bs 96, num_layers 44, grad_acc 1: 7.51 GB - 2.77 มัน/วินาที
bs 32, num_layers 66, grad_acc 1: 7.09 GB - 3.7 มัน/วินาที

@NotNANtoN แนะนำขนาดแบทช์ 32 พร้อม 44 เลเยอร์และการฝึกอบรม 1-8 ยุค

นี้จะไปไหน?

นี่เป็นเพียงทีเซอร์ เราจะสามารถสร้างภาพ เสียง อะไรก็ได้ตามใจชอบด้วยภาษาที่เป็นธรรมชาติ โฮโลเด็คกำลังจะเกิดขึ้นจริงในช่วงชีวิตของเรา

โปรดเข้าร่วมการจำลองแบบสำหรับ DALL-E สำหรับ Pytorch หรือ Mesh Tensorflow หากคุณสนใจที่จะพัฒนาเทคโนโลยีนี้ต่อไป

ทางเลือก

Big Sleep - CLIP และเครื่องกำเนิดไฟฟ้าจาก Big GAN

การอ้างอิง

 @misc { unpublished2021clip ,
    title  = { CLIP: Connecting Text and Images } ,
    author = { Alec Radford, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, Sandhini Agarwal } ,
    year   = { 2021 }
}

 @misc { sitzmann2020implicit ,
    title   = { Implicit Neural Representations with Periodic Activation Functions } ,
    author  = { Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. Lindell and Gordon Wetzstein } ,
    year    = { 2020 } ,
    eprint  = { 2006.09661 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

ขยาย

deep daze

งุนงงลึก

นี่คืออะไร?

ติดตั้ง

ติดตั้งวินโดวส์

ตัวอย่าง

ขั้นสูง

การใช้งาน

คลีไอ

การรองพื้น

ปรับให้เหมาะสมสำหรับการตีความภาพ

ปรับให้เหมาะสมสำหรับข้อความและรูปภาพรวมกัน

ใหม่: สร้างเรื่องราว

หลาม

เรียกใช้ `deep_daze.Imagine` ใน Python

บันทึกความคืบหน้าทุกๆ การวนซ้ำครั้งที่สี่

เพิ่มการประทับเวลาปัจจุบันในแต่ละภาพ

การใช้หน่วยความจำ GPU สูง

การใช้หน่วยความจำ GPU โดยเฉลี่ย

การใช้หน่วยความจำ GPU ต่ำมาก (น้อยกว่า 4 GiB)

เกณฑ์มาตรฐาน VRAM และความเร็ว:

นี้จะไปไหน?

ทางเลือก

การอ้างอิง

แอพ DAZE CAM

ทุ่งลึก

เกมนักล่าลึก

ลึกดิ

การแข่งขันลึก: การต่อสู้

รูนลึก

chat.petals.dev

GPT Prompt Templates

GPTyped

node telegram bot api

typebot.io

python wechaty getting started

waymo open dataset

termwind

wp functions

deep daze

งุนงงลึก

นี่คืออะไร?

ติดตั้ง

ติดตั้งวินโดวส์

ตัวอย่าง

ขั้นสูง

การใช้งาน

คลีไอ

การรองพื้น

ปรับให้เหมาะสมสำหรับการตีความภาพ

ปรับให้เหมาะสมสำหรับข้อความและรูปภาพรวมกัน

ใหม่: สร้างเรื่องราว

หลาม

เรียกใช้ deep_daze.Imagine ใน Python

บันทึกความคืบหน้าทุกๆ การวนซ้ำครั้งที่สี่

เพิ่มการประทับเวลาปัจจุบันในแต่ละภาพ

การใช้หน่วยความจำ GPU สูง

การใช้หน่วยความจำ GPU โดยเฉลี่ย

การใช้หน่วยความจำ GPU ต่ำมาก (น้อยกว่า 4 GiB)

เกณฑ์มาตรฐาน VRAM และความเร็ว:

นี้จะไปไหน?

ทางเลือก

การอ้างอิง

เรียกใช้ `deep_daze.Imagine` ใน Python