deep daze تحميل - deep daze كود المصدر التحميل

المذهول العميق

ضباب فوق التلال الخضراء

لوحات محطمة على العشب

الحب والاهتمام الكوني

مسافر عبر الزمن في الحشد

الحياة أثناء الطاعون

السلام التأملي في غابة مضاءة بنور الشمس

رجل يرسم صورة حمراء بالكامل

تجربة مخدر على LSD

ما هذا؟

أداة سطر أوامر بسيطة لتوليد النص إلى صورة باستخدام OpenAI's CLIP وSiren. يعود الفضل إلى ريان موردوك لاكتشاف هذه التقنية (والتوصل إلى الاسم الرائع)!

دفتر اصلي

دفتر مبسط جديد

سيتطلب ذلك أن يكون لديك وحدة معالجة الرسومات Nvidia أو AMD GPU

الموصى بها: 16 جيجابايت من ذاكرة الفيديو
الحد الأدنى من المتطلبات: VRAM سعة 4 جيجابايت (باستخدام إعدادات منخفضة جدًا، راجع تعليمات الاستخدام أدناه)

ثَبَّتَ

$ pip install deep-daze

تثبيت ويندوز

بافتراض تثبيت بايثون:

افتح موجه الأوامر وانتقل إلى دليل الإصدار الحالي من Python

  pip install deep-daze

أمثلة

$ imagine " a house in the forest "

لنظام التشغيل Windows:

افتح موجه الأوامر كمسؤول

  imagine " a house in the forest "

هذا كل شيء.

إذا كان لديك ذاكرة كافية، يمكنك الحصول على جودة أفضل عن طريق إضافة علامة --deeper

$ imagine " shattered plates on the ground " --deeper

متقدم

في أسلوب التعلم العميق الحقيقي، سيؤدي المزيد من الطبقات إلى نتائج أفضل. الافتراضي هو 16 ، ولكن يمكن زيادته إلى 32 حسب مواردك.

$ imagine " stranger in strange lands " --num-layers 32

الاستخدام

سطر الأوامر

NAME
    imagine

SYNOPSIS
    imagine TEXT < flags >

POSITIONAL ARGUMENTS
    TEXT
        (required) A phrase less than 77 tokens which you would like to visualize.

FLAGS
    --img=IMAGE_PATH
        Default: None
        Path to png/jpg image or PIL image to optimize on
    --encoding=ENCODING
        Default: None
        User-created custom CLIP encoding. If used, replaces any text or image that was used.
    --create_story=CREATE_STORY
        Default: False
        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 tokens can be used. Requires save_progress to visualize the transitions of the story.
    --story_start_words=STORY_START_WORDS
        Default: 5
        Only used if create_story is True. How many words to optimize on for the first epoch.
    --story_words_per_epoch=STORY_WORDS_PER_EPOCH
        Default: 5
        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.
    --story_separator:
        Default: None
        Only used if create_story is True. Defines a separator like ' . ' that splits the text into groups for each epoch. Separator needs to be in the text otherwise it will be ignored
    --lower_bound_cutout=LOWER_BOUND_CUTOUT
        Default: 0.1
        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.
    --upper_bound_cutout=UPPER_BOUND_CUTOUT
        Default: 1.0
        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.
    --saturate_bound=SATURATE_BOUND
        Default: False
        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.
    --learning_rate=LEARNING_RATE
        Default: 1e-05
        The learning rate of the neural net.
    --num_layers=NUM_LAYERS
        Default: 16
        The number of hidden layers to use in the Siren neural net.
    --batch_size=BATCH_SIZE
        Default: 4
        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.
    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY
        Default: 4
        Calculate a weighted loss of n samples for each iteration. Increasing this can help increase accuracy with lower batch sizes.
    --epochs=EPOCHS
        Default: 20
        The number of epochs to run.
    --iterations=ITERATIONS
        Default: 1050
        The number of times to calculate and backpropagate loss in a given epoch.
    --save_every=SAVE_EVERY
        Default: 100
        Generate an image every time iterations is a multiple of this number.
    --image_width=IMAGE_WIDTH
        Default: 512
        The desired resolution of the image.
    --deeper=DEEPER
        Default: False
        Uses a Siren neural net with 32 hidden layers.
    --overwrite=OVERWRITE
        Default: False
        Whether or not to overwrite existing generated images of the same name.
    --save_progress=SAVE_PROGRESS
        Default: False
        Whether or not to save images generated before training Siren is complete.
    --seed=SEED
        Type: Optional[]
        Default: None
        A seed to be used for deterministic runs.
    --open_folder=OPEN_FOLDER
        Default: True
        Whether or not to open a folder showing your generated images.
    --save_date_time=SAVE_DATE_TIME
        Default: False
        Save files with a timestamp prepended e.g. ` %y%m%d-%H%M%S-my_phrase_here `
    --start_image_path=START_IMAGE_PATH
        Default: None
        The generator is trained first on a starting image before steered towards the textual input
    --start_image_train_iters=START_IMAGE_TRAIN_ITERS
        Default: 50
        The number of steps for the initial training on the starting image
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.
    --save_gif=SAVE_GIF
        Default: False
        Whether or not to save a GIF animation of the generation procedure. Only works if save_progress is set to True.

فتيلة

إنها تقنية تم ابتكارها ومشاركتها لأول مرة بواسطة ماريو كلينجمان، وهي تسمح لك بإعداد شبكة المولدات بصورة أولية، قبل توجيهها نحو النص.

ما عليك سوى تحديد المسار إلى الصورة التي ترغب في استخدامها، واختياريًا عدد خطوات التدريب الأولية.

$ imagine ' a clear night sky filled with stars ' --start_image_path ./cloudy-night-sky.jpg

صورة البداية معدة

ثم تدرب مع السريع A pizza with green pepper.

الأمثل لتفسير الصورة

يمكننا أيضًا تغذية الصورة كهدف تحسين، بدلاً من تجهيز شبكة المولدات فقط. ستقوم Deepdaze بعد ذلك بتقديم تفسيرها الخاص لتلك الصورة:

$ imagine --img samples/Autumn_1875_Frederic_Edwin_Church.jpg

الصورة الأصلية:

تفسير الشبكة:

الصورة الأصلية:

تفسير الشبكة:

الأمثل للنص والصورة مجتمعة

$ imagine " A psychedelic experience. " --img samples/hot-dog.jpg

تفسير الشبكة:

جديد: إنشاء قصة

الوضع العادي للنصوص يسمح فقط بـ 77 رمزًا. إذا كنت تريد تصور قصة/فقرة/أغنية/قصيدة كاملة، فاضبط create_story على True .

بالنظر إلى قصيدة "التوقف عند الغابة في أمسية ثلجية" لروبرت فروست - "أعتقد أنني أعرف لمن هذه الغابة. منزله يقع في القرية؛ لن يراني أتوقف هنا لمشاهدة غابته تمتلئ بالثلج. لا بد أن حصاني الصغير يعتقد أنه من الغريب أن يتوقف دون مزرعة قريبة بين الغابة والبحيرة المتجمدة في أحلك أمسية في العام. يهز أجراسه ليسأل عما إذا كان هناك خطأ ما. الصوت الآخر الوحيد هو صوت المسح السهل الرياح والرقائق الناعمة، الغابة جميلة ومظلمة وعميقة، ولكن لدي وعود يجب أن أفي بها، وأميال يجب أن أقطعها قبل أن أنام، وأميال يجب أن أقطعها قبل أن أنام.

نحصل على:

من_هذه_الغابات_التي_أفكر_أنا_أعرفها._منزله_في_القرية_على الرغم من ذلك._He_.mp4

بايثون

استدعاء `deep_daze.Imagine` في بايثون

 from deep_daze import Imagine

imagine = Imagine (
    text = 'cosmic love and attention' ,
    num_layers = 24 ,
)
imagine ()

حفظ التقدم في كل التكرار الرابع

احفظ الصور بتنسيق Insert_text_here.00001.png، Insert_text_here.00002.png، ...ما يصل إلى (total_iterations % save_every)

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True
)

قم بإضافة الطابع الزمني الحالي مسبقًا على كل صورة.

يقوم بإنشاء ملفات تحتوي على الطابع الزمني والرقم التسلسلي.

على سبيل المثال 210129-043928_328751_insert_text_here.00001.png، 210129-043928_512351_insert_text_here.00002.png، ...

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True ,
    save_date_time = True ,
)

ارتفاع استخدام ذاكرة GPU

إذا كان لديك ما لا يقل عن 16 غيغابايت من ذاكرة الفيديو المتاحة، فمن المفترض أن تكون قادرًا على تشغيل هذه الإعدادات مع بعض المساحة المرنة.

 imagine = Imagine (
    text = text ,
    num_layers = 42 ,
    batch_size = 64 ,
    gradient_accumulate_every = 1 ,
)

متوسط استخدام ذاكرة GPU

 imagine = Imagine (
    text = text ,
    num_layers = 24 ,
    batch_size = 16 ,
    gradient_accumulate_every = 2
)

استخدام ذاكرة GPU منخفض جدًا (أقل من 4 جيجا بايت)

إذا كنت ترغب بشدة في تشغيل هذا على بطاقة بها أقل من 8 جيجا بايت من vram، فيمكنك خفض image_width.

 imagine = Imagine (
    text = text ,
    image_width = 256 ,
    num_layers = 16 ,
    batch_size = 1 ,
    gradient_accumulate_every = 16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
)

VRAM ومعايير السرعة:

تم إجراء هذه التجارب باستخدام 2060 Super RTX و3700X Ryzen 5. نذكر أولاً المعلمات (bs = حجم الدفعة)، ثم استخدام الذاكرة وفي بعض الحالات تكرارات التدريب في الثانية:

للحصول على دقة صورة 512:

bs 1، عدد الطبقات 22: 7.96 جيجابايت
بس 2، عدد الطبقات 20: 7.5 جيجابايت
بس 16، عدد الطبقات 16: 6.5 جيجابايت

للحصول على دقة صورة 256:

بس 8، عدد الطبقات 48: 5.3 جيجابايت
bs 16، عدد الطبقات 48: 5.46 جيجابايت - 2.0 ثانية/ثانية
bs 32، عدد الطبقات 48: 5.92 جيجابايت - 1.67 وحدة/ثانية
bs 8، عدد الطبقات 44: 5 جيجابايت - 2.39 وحدة/ثانية
bs 32، num_layers 44، grad_acc 1: 5.62 جيجابايت - 4.83 وحدة/ثانية
bs 96، num_layers 44، grad_acc 1: 7.51 جيجابايت - 2.77 ثانية/ثانية
bs 32، num_layers 66، grad_acc 1: 7.09 جيجابايت - 3.7 ثانية/ثانية

توصي @NotNANtoN بحجم دفعة يبلغ 32 شخصًا مع 44 طبقة والتدريب من 1 إلى 8 فترات.

إلى أين يتجه هذا؟

هذا مجرد إعلان تشويقي. سنكون قادرين على توليد الصور والأصوات وأي شيء حسب الرغبة باستخدام اللغة الطبيعية. إن الهولوديك على وشك أن يصبح حقيقيًا في حياتنا.

يرجى الانضمام إلى جهود النسخ المتماثل لـ DALL-E لـ Pytorch أو Mesh Tensorflow إذا كنت مهتمًا بتعزيز هذه التكنولوجيا.

البدائل

Big Sleep - CLIP والمولد من Big GAN

الاستشهادات

 @misc { unpublished2021clip ,
    title  = { CLIP: Connecting Text and Images } ,
    author = { Alec Radford, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, Sandhini Agarwal } ,
    year   = { 2021 }
}

 @misc { sitzmann2020implicit ,
    title   = { Implicit Neural Representations with Periodic Activation Functions } ,
    author  = { Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. Lindell and Gordon Wetzstein } ,
    year    = { 2020 } ,
    eprint  = { 2006.09661 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

يوسع

deep daze

المذهول العميق

ما هذا؟

ثَبَّتَ

تثبيت ويندوز

أمثلة

متقدم

الاستخدام

سطر الأوامر

فتيلة

الأمثل لتفسير الصورة

الأمثل للنص والصورة مجتمعة

جديد: إنشاء قصة

بايثون

استدعاء `deep_daze.Imagine` في بايثون

حفظ التقدم في كل التكرار الرابع

قم بإضافة الطابع الزمني الحالي مسبقًا على كل صورة.

ارتفاع استخدام ذاكرة GPU

متوسط استخدام ذاكرة GPU

استخدام ذاكرة GPU منخفض جدًا (أقل من 4 جيجا بايت)

VRAM ومعايير السرعة:

إلى أين يتجه هذا؟

البدائل

الاستشهادات

تطبيق DAZE CAM

المجال العميق

لعبة ديب هانتر

ديب دي

السباق العميق: المعركة

رون عميق

chat.petals.dev

GPT Prompt Templates

GPTyped

node telegram bot api

typebot.io

python wechaty getting started

waymo open dataset

termwind

wp functions

deep daze

المذهول العميق

ما هذا؟

ثَبَّتَ

تثبيت ويندوز

أمثلة

متقدم

الاستخدام

سطر الأوامر

فتيلة

الأمثل لتفسير الصورة

الأمثل للنص والصورة مجتمعة

جديد: إنشاء قصة

بايثون

استدعاء deep_daze.Imagine في بايثون

حفظ التقدم في كل التكرار الرابع

قم بإضافة الطابع الزمني الحالي مسبقًا على كل صورة.

ارتفاع استخدام ذاكرة GPU

متوسط ​​استخدام ذاكرة GPU

استخدام ذاكرة GPU منخفض جدًا (أقل من 4 جيجا بايت)

VRAM ومعايير السرعة:

إلى أين يتجه هذا؟

البدائل

الاستشهادات

استدعاء `deep_daze.Imagine` في بايثون

متوسط استخدام ذاكرة GPU