Descarga de deep daze - descarga de código fuente deep daze

Aturdimiento profundo

niebla sobre colinas verdes

platos rotos en la hierba

amor y atención cósmicos

un viajero en el tiempo entre la multitud

la vida durante la plaga

paz meditativa en un bosque iluminado por el sol

un hombre pintando una imagen completamente roja

una experiencia psicodélica con LSD

¿Qué es esto?

Herramienta de línea de comando simple para la generación de texto a imagen usando CLIP y Siren de OpenAI. ¡El crédito es para Ryan Murdock por el descubrimiento de esta técnica (y por encontrar el gran nombre)!

cuaderno original

Nuevo cuaderno simplificado

Esto requerirá que tengas una GPU Nvidia o GPU AMD.

Recomendado: 16 GB de VRAM
Requisitos mínimos: 4 GB de VRAM (usando configuraciones MUY BAJAS, consulte las instrucciones de uso a continuación)

Instalar

$ pip install deep-daze

Instalación de Windows

Suponiendo que Python esté instalado:

Abra el símbolo del sistema y navegue hasta el directorio de su versión actual de Python

  pip install deep-daze

Ejemplos

$ imagine " a house in the forest "

Para Windows:

Abrir símbolo del sistema como administrador

  imagine " a house in the forest "

Eso es todo.

Si tiene suficiente memoria, puede obtener una mejor calidad agregando una bandera --deeper

$ imagine " shattered plates on the ground " --deeper

Avanzado

En el verdadero estilo del aprendizaje profundo, más capas producirán mejores resultados. El valor predeterminado es 16 , pero se puede aumentar a 32 dependiendo de sus recursos.

$ imagine " stranger in strange lands " --num-layers 32

Uso

CLI

NAME
    imagine

SYNOPSIS
    imagine TEXT < flags >

POSITIONAL ARGUMENTS
    TEXT
        (required) A phrase less than 77 tokens which you would like to visualize.

FLAGS
    --img=IMAGE_PATH
        Default: None
        Path to png/jpg image or PIL image to optimize on
    --encoding=ENCODING
        Default: None
        User-created custom CLIP encoding. If used, replaces any text or image that was used.
    --create_story=CREATE_STORY
        Default: False
        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 tokens can be used. Requires save_progress to visualize the transitions of the story.
    --story_start_words=STORY_START_WORDS
        Default: 5
        Only used if create_story is True. How many words to optimize on for the first epoch.
    --story_words_per_epoch=STORY_WORDS_PER_EPOCH
        Default: 5
        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.
    --story_separator:
        Default: None
        Only used if create_story is True. Defines a separator like ' . ' that splits the text into groups for each epoch. Separator needs to be in the text otherwise it will be ignored
    --lower_bound_cutout=LOWER_BOUND_CUTOUT
        Default: 0.1
        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.
    --upper_bound_cutout=UPPER_BOUND_CUTOUT
        Default: 1.0
        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.
    --saturate_bound=SATURATE_BOUND
        Default: False
        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.
    --learning_rate=LEARNING_RATE
        Default: 1e-05
        The learning rate of the neural net.
    --num_layers=NUM_LAYERS
        Default: 16
        The number of hidden layers to use in the Siren neural net.
    --batch_size=BATCH_SIZE
        Default: 4
        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.
    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY
        Default: 4
        Calculate a weighted loss of n samples for each iteration. Increasing this can help increase accuracy with lower batch sizes.
    --epochs=EPOCHS
        Default: 20
        The number of epochs to run.
    --iterations=ITERATIONS
        Default: 1050
        The number of times to calculate and backpropagate loss in a given epoch.
    --save_every=SAVE_EVERY
        Default: 100
        Generate an image every time iterations is a multiple of this number.
    --image_width=IMAGE_WIDTH
        Default: 512
        The desired resolution of the image.
    --deeper=DEEPER
        Default: False
        Uses a Siren neural net with 32 hidden layers.
    --overwrite=OVERWRITE
        Default: False
        Whether or not to overwrite existing generated images of the same name.
    --save_progress=SAVE_PROGRESS
        Default: False
        Whether or not to save images generated before training Siren is complete.
    --seed=SEED
        Type: Optional[]
        Default: None
        A seed to be used for deterministic runs.
    --open_folder=OPEN_FOLDER
        Default: True
        Whether or not to open a folder showing your generated images.
    --save_date_time=SAVE_DATE_TIME
        Default: False
        Save files with a timestamp prepended e.g. ` %y%m%d-%H%M%S-my_phrase_here `
    --start_image_path=START_IMAGE_PATH
        Default: None
        The generator is trained first on a starting image before steered towards the textual input
    --start_image_train_iters=START_IMAGE_TRAIN_ITERS
        Default: 50
        The number of steps for the initial training on the starting image
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.
    --save_gif=SAVE_GIF
        Default: False
        Whether or not to save a GIF animation of the generation procedure. Only works if save_progress is set to True.

Cebado

Técnica ideada y compartida por primera vez por Mario Klingemann, permite preparar la red generadora con una imagen inicial, antes de dirigirla hacia el texto.

Simplemente especifique la ruta a la imagen que desea utilizar y, opcionalmente, la cantidad de pasos de capacitación iniciales.

$ imagine ' a clear night sky filled with stars ' --start_image_path ./cloudy-night-sky.jpg

Imagen inicial preparada

Luego entrenó con el mensaje A pizza with green pepper.

Optimizar para la interpretación de una imagen.

También podemos introducir una imagen como objetivo de optimización, en lugar de solo preparar la red del generador. Luego, Deepdaze presentará su propia interpretación de esa imagen:

$ imagine --img samples/Autumn_1875_Frederic_Edwin_Church.jpg

Imagen original:

La interpretación de la red:

Imagen original:

La interpretación de la red:

Optimizar para texto e imagen combinados

$ imagine " A psychedelic experience. " --img samples/hot-dog.jpg

La interpretación de la red:

Nuevo: crear una historia

El modo normal para mensajes de texto sólo permite 77 tokens. Si desea visualizar una historia/párrafo/canción/poema completo, establezca create_story en True .

Dado el poema "Parando en el bosque en una noche nevada" de Robert Frost: "Creo que sé de quién son estos bosques. Sin embargo, su casa está en el pueblo; no me verá detenerme aquí para ver cómo sus bosques se llenan de nieve. Mi pequeño caballo debe pensar que es extraño detenerse sin una granja cerca entre el bosque y el lago helado en la tarde más oscura del año. Sacude las campanas de su arnés para preguntar si hay algún error. El único otro sonido es el murmullo del viento suave. y Los bosques son hermosos, oscuros y profundos, pero tengo promesas que cumplir, y kilómetros por recorrer antes de dormir, y kilómetros por recorrer antes de dormir.

Obtenemos:

De quién_son_estos_bosques_creo_que_sé._Su_casa_está_en_la_aldea_aunque._Él_.mp4

Pitón

Invocar `deep_daze.Imagine` en Python

 from deep_daze import Imagine

imagine = Imagine (
    text = 'cosmic love and attention' ,
    num_layers = 24 ,
)
imagine ()

Guardar el progreso cada cuarta iteración

Guarde imágenes en el formato insert_text_here.00001.png, insert_text_here.00002.png, ... hasta (total_iterations % save_every)

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True
)

Anteponga la marca de tiempo actual en cada imagen.

Crea archivos con la marca de tiempo y el número de secuencia.

por ejemplo, 210129-043928_328751_insert_text_here.00001.png, 210129-043928_512351_insert_text_here.00002.png, ...

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True ,
    save_date_time = True ,
)

Alto uso de memoria de GPU

Si tiene al menos 16 GiB de vram disponibles, debería poder ejecutar estas configuraciones con cierto margen de maniobra.

 imagine = Imagine (
    text = text ,
    num_layers = 42 ,
    batch_size = 64 ,
    gradient_accumulate_every = 1 ,
)

Uso promedio de memoria de GPU

 imagine = Imagine (
    text = text ,
    num_layers = 24 ,
    batch_size = 16 ,
    gradient_accumulate_every = 2
)

Uso de memoria GPU muy bajo (menos de 4 GiB)

Si está desesperado por ejecutar esto en una tarjeta con menos de 8 GiB vram, puede reducir el ancho de imagen.

 imagine = Imagine (
    text = text ,
    image_width = 256 ,
    num_layers = 16 ,
    batch_size = 1 ,
    gradient_accumulate_every = 16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
)

VRAM y puntos de referencia de velocidad:

Estos experimentos se realizaron con un 2060 Super RTX y un 3700X Ryzen 5. Primero mencionamos los parámetros (bs = tamaño de lote), luego el uso de memoria y, en algunos casos, las iteraciones de entrenamiento por segundo:

Para una resolución de imagen de 512:

bs 1, número_capas 22: 7,96 GB
bs 2, número_capas 20: 7,5 GB
bs 16, número_capas 16: 6,5 GB

Para una resolución de imagen de 256:

bs 8, número_capas 48: 5,3 GB
bs 16, num_layers 48: 5,46 GB - 2,0 it/s
bs 32, num_layers 48: 5,92 GB - 1,67 it/s
bs 8, num_layers 44: 5 GB - 2,39 it/s
bs 32, num_layers 44, grad_acc 1: 5,62 GB - 4,83 it/s
bs 96, num_layers 44, grad_acc 1: 7,51 GB - 2,77 it/s
bs 32, num_layers 66, grad_acc 1: 7,09 GB - 3,7 it/s

@NotNANtoN recomienda un tamaño de lote de 32 con 44 capas y entrenamiento de 1 a 8 épocas.

¿A dónde va esto?

Esto es sólo un adelanto. Podremos generar imágenes, sonido, cualquier cosa a nuestro antojo, con lenguaje natural. La holocubierta está a punto de hacerse realidad en nuestras vidas.

Únase a los esfuerzos de replicación de DALL-E para Pytorch o Mesh Tensorflow si está interesado en promover esta tecnología.

Alternativas

Big Sleep: CLIP y el generador de Big GAN

Citas

 @misc { unpublished2021clip ,
    title  = { CLIP: Connecting Text and Images } ,
    author = { Alec Radford, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, Sandhini Agarwal } ,
    year   = { 2021 }
}

 @misc { sitzmann2020implicit ,
    title   = { Implicit Neural Representations with Periodic Activation Functions } ,
    author  = { Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. Lindell and Gordon Wetzstein } ,
    year    = { 2020 } ,
    eprint  = { 2006.09661 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

Expandir

deep daze

Aturdimiento profundo

¿Qué es esto?

Instalar

Instalación de Windows

Ejemplos

Avanzado

Uso

CLI

Cebado

Optimizar para la interpretación de una imagen.

Optimizar para texto e imagen combinados

Nuevo: crear una historia

Pitón

Invocar `deep_daze.Imagine` en Python

Guardar el progreso cada cuarta iteración

Anteponga la marca de tiempo actual en cada imagen.

Alto uso de memoria de GPU

Uso promedio de memoria de GPU

Uso de memoria GPU muy bajo (menos de 4 GiB)

VRAM y puntos de referencia de velocidad:

¿A dónde va esto?

Alternativas

Citas

Aplicación DAZE CAM

Campo profundo

Juego Cazador profundo

profunda di

Carrera profunda: batalla

Runa profunda

chat.petals.dev

GPT Prompt Templates

GPTyped

node telegram bot api

typebot.io

python wechaty getting started

waymo open dataset

termwind

wp functions

deep daze

Aturdimiento profundo

¿Qué es esto?

Instalar

Instalación de Windows

Ejemplos

Avanzado

Uso

CLI

Cebado

Optimizar para la interpretación de una imagen.

Optimizar para texto e imagen combinados

Nuevo: crear una historia

Pitón

Invocar deep_daze.Imagine en Python

Guardar el progreso cada cuarta iteración

Anteponga la marca de tiempo actual en cada imagen.

Alto uso de memoria de GPU

Uso promedio de memoria de GPU

Uso de memoria GPU muy bajo (menos de 4 GiB)

VRAM y puntos de referencia de velocidad:

¿A dónde va esto?

Alternativas

Citas

Invocar `deep_daze.Imagine` en Python