download deep daze - download do código-fonte deep daze

Profundo atordoamento

névoa sobre colinas verdes

pratos quebrados na grama

amor e atenção cósmicos

um viajante do tempo no meio da multidão

vida durante a peste

paz meditativa em uma floresta ensolarada

um homem pintando uma imagem completamente vermelha

uma experiência psicodélica com LSD

O que é isso?

Ferramenta simples de linha de comando para geração de texto em imagem usando CLIP e Siren da OpenAI. O crédito vai para Ryan Murdock pela descoberta desta técnica (e por ter criado o grande nome)!

Caderno original

Novo caderno simplificado

Isso exigirá que você tenha uma GPU Nvidia ou GPU AMD

Recomendado: 16 GB VRAM
Requisitos mínimos: 4 GB VRAM (usando configurações MUITO BAIXAS, veja as instruções de uso abaixo)

Instalar

$ pip install deep-daze

Instalação do Windows

Presumindo que o Python esteja instalado:

Abra o prompt de comando e navegue até o diretório da sua versão atual do Python

  pip install deep-daze

Exemplos

$ imagine " a house in the forest "

Para Windows:

Abra o prompt de comando como administrador

  imagine " a house in the forest "

É isso.

Se você tiver memória suficiente, poderá obter melhor qualidade adicionando um sinalizador --deeper

$ imagine " shattered plates on the ground " --deeper

Avançado

No verdadeiro estilo de aprendizagem profunda, mais camadas produzirão melhores resultados. O padrão é 16 , mas pode ser aumentado para 32 dependendo dos seus recursos.

$ imagine " stranger in strange lands " --num-layers 32

Uso

CLI

NAME
    imagine

SYNOPSIS
    imagine TEXT < flags >

POSITIONAL ARGUMENTS
    TEXT
        (required) A phrase less than 77 tokens which you would like to visualize.

FLAGS
    --img=IMAGE_PATH
        Default: None
        Path to png/jpg image or PIL image to optimize on
    --encoding=ENCODING
        Default: None
        User-created custom CLIP encoding. If used, replaces any text or image that was used.
    --create_story=CREATE_STORY
        Default: False
        Creates a story by optimizing each epoch on a new sliding-window of the input words. If this is enabled, much longer texts than 77 tokens can be used. Requires save_progress to visualize the transitions of the story.
    --story_start_words=STORY_START_WORDS
        Default: 5
        Only used if create_story is True. How many words to optimize on for the first epoch.
    --story_words_per_epoch=STORY_WORDS_PER_EPOCH
        Default: 5
        Only used if create_story is True. How many words to add to the optimization goal per epoch after the first one.
    --story_separator:
        Default: None
        Only used if create_story is True. Defines a separator like ' . ' that splits the text into groups for each epoch. Separator needs to be in the text otherwise it will be ignored
    --lower_bound_cutout=LOWER_BOUND_CUTOUT
        Default: 0.1
        Lower bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should be smaller than 0.8.
    --upper_bound_cutout=UPPER_BOUND_CUTOUT
        Default: 1.0
        Upper bound of the sampling of the size of the random cut-out of the SIREN image per batch. Should probably stay at 1.0.
    --saturate_bound=SATURATE_BOUND
        Default: False
        If True, the LOWER_BOUND_CUTOUT is linearly increased to 0.75 during training.
    --learning_rate=LEARNING_RATE
        Default: 1e-05
        The learning rate of the neural net.
    --num_layers=NUM_LAYERS
        Default: 16
        The number of hidden layers to use in the Siren neural net.
    --batch_size=BATCH_SIZE
        Default: 4
        The number of generated images to pass into Siren before calculating loss. Decreasing this can lower memory and accuracy.
    --gradient_accumulate_every=GRADIENT_ACCUMULATE_EVERY
        Default: 4
        Calculate a weighted loss of n samples for each iteration. Increasing this can help increase accuracy with lower batch sizes.
    --epochs=EPOCHS
        Default: 20
        The number of epochs to run.
    --iterations=ITERATIONS
        Default: 1050
        The number of times to calculate and backpropagate loss in a given epoch.
    --save_every=SAVE_EVERY
        Default: 100
        Generate an image every time iterations is a multiple of this number.
    --image_width=IMAGE_WIDTH
        Default: 512
        The desired resolution of the image.
    --deeper=DEEPER
        Default: False
        Uses a Siren neural net with 32 hidden layers.
    --overwrite=OVERWRITE
        Default: False
        Whether or not to overwrite existing generated images of the same name.
    --save_progress=SAVE_PROGRESS
        Default: False
        Whether or not to save images generated before training Siren is complete.
    --seed=SEED
        Type: Optional[]
        Default: None
        A seed to be used for deterministic runs.
    --open_folder=OPEN_FOLDER
        Default: True
        Whether or not to open a folder showing your generated images.
    --save_date_time=SAVE_DATE_TIME
        Default: False
        Save files with a timestamp prepended e.g. ` %y%m%d-%H%M%S-my_phrase_here `
    --start_image_path=START_IMAGE_PATH
        Default: None
        The generator is trained first on a starting image before steered towards the textual input
    --start_image_train_iters=START_IMAGE_TRAIN_ITERS
        Default: 50
        The number of steps for the initial training on the starting image
    --theta_initial=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the first layer of the network.
    --theta_hidden=THETA_INITIAL
        Default: 30.0
        Hyperparameter describing the frequency of the color space. Only applies to the hidden layers of the network.
    --save_gif=SAVE_GIF
        Default: False
        Whether or not to save a GIF animation of the generation procedure. Only works if save_progress is set to True.

Preparação

Técnica idealizada e compartilhada pela primeira vez por Mario Klingemann, permite preparar a rede geradora com uma imagem inicial, antes de ser direcionada ao texto.

Basta especificar o caminho para a imagem que deseja usar e, opcionalmente, o número de etapas iniciais do treinamento.

$ imagine ' a clear night sky filled with stars ' --start_image_path ./cloudy-night-sky.jpg

Imagem inicial preparada

Em seguida, treinei com o prompt A pizza with green pepper.

Otimize para a interpretação de uma imagem

Também podemos alimentar uma imagem como objetivo de otimização, em vez de apenas preparar a rede do gerador. Deepdaze irá então renderizar sua própria interpretação dessa imagem:

$ imagine --img samples/Autumn_1875_Frederic_Edwin_Church.jpg

Imagem original:

A interpretação da rede:

Imagem original:

A interpretação da rede:

Otimize para texto e imagem combinados

$ imagine " A psychedelic experience. " --img samples/hot-dog.jpg

A interpretação da rede:

Novidade: crie uma história

O modo normal para textos permite apenas 77 tokens. Se você deseja visualizar uma história/parágrafo/música/poema completo, defina create_story como True .

Dado o poema “Parando no bosque em uma noite de neve”, de Robert Frost - "Acho que sei de quem são esses bosques. Mas a casa dele fica na vila; ele não me verá parar aqui para ver seu bosque se encher de neve. Meu cavalinho deve achar estranho Parar sem uma casa de fazenda por perto Entre a floresta e o lago congelado Na noite mais escura do ano Ele balança os sinos dos arreios Para perguntar se há algum engano O único outro som é o vento suave. e felpudo floco. A floresta é linda, escura e profunda, Mas tenho promessas a cumprir, E quilômetros a percorrer antes de dormir, E quilômetros a percorrer antes de dormir.

Nós obtemos:

De quem são esses bosques, eu acho que sei. Sua casa está na aldeia. Ele.mp4

Pitão

Invoque `deep_daze.Imagine` em Python

 from deep_daze import Imagine

imagine = Imagine (
    text = 'cosmic love and attention' ,
    num_layers = 24 ,
)
imagine ()

Salve o progresso a cada quarta iteração

Salve imagens no formato insert_text_here.00001.png, insert_text_here.00002.png, ...até (total_iterations % save_every)

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True
)

Anexe o carimbo de data/hora atual em cada imagem.

Cria arquivos com carimbo de data/hora e número de sequência.

por exemplo, 210129-043928_328751_insert_text_here.00001.png, 210129-043928_512351_insert_text_here.00002.png, ...

 imagine = Imagine (
    text = text ,
    save_every = 4 ,
    save_progress = True ,
    save_date_time = True ,
)

Alto uso de memória GPU

Se você tiver pelo menos 16 GiB de vram disponíveis, poderá executar essas configurações com alguma margem de manobra.

 imagine = Imagine (
    text = text ,
    num_layers = 42 ,
    batch_size = 64 ,
    gradient_accumulate_every = 1 ,
)

Uso médio de memória GPU

 imagine = Imagine (
    text = text ,
    num_layers = 24 ,
    batch_size = 16 ,
    gradient_accumulate_every = 2
)

Uso de memória GPU muito baixo (menos de 4 GiB)

Se você está desesperado para executar isso em uma placa com menos de 8 GiB vram, você pode diminuir o image_width.

 imagine = Imagine (
    text = text ,
    image_width = 256 ,
    num_layers = 16 ,
    batch_size = 1 ,
    gradient_accumulate_every = 16 # Increase gradient_accumulate_every to correct for loss in low batch sizes
)

Benchmarks de VRAM e velocidade:

Esses experimentos foram realizados com um 2060 Super RTX e um 3700X Ryzen 5. Mencionamos primeiro os parâmetros (bs = tamanho do lote), depois o uso de memória e, em alguns casos, as iterações de treinamento por segundo:

Para uma resolução de imagem de 512:

bs 1, num_layers 22: 7,96 GB
bs 2, num_layers 20: 7,5 GB
bs 16, num_layers 16: 6,5 GB

Para uma resolução de imagem de 256:

bs 8, num_layers 48: 5,3 GB
bs 16, num_layers 48: 5,46 GB - 2,0 it/s
bs 32, num_layers 48: 5,92 GB - 1,67 it/s
bs 8, num_layers 44: 5 GB - 2,39 it/s
bs 32, num_layers 44, grad_acc 1: 5,62 GB - 4,83 it/s
bs 96, num_layers 44, grad_acc 1: 7,51 GB - 2,77 it/s
bs 32, num_layers 66, grad_acc 1: 7,09 GB - 3,7 it/s

@NotNANtoN recomenda um tamanho de lote de 32 com 44 camadas e treinamento de 1 a 8 épocas.

Para onde isso vai dar?

Isto é apenas uma provocação. Poderemos gerar imagens, sons, qualquer coisa à vontade, com linguagem natural. O holodeck está prestes a se tornar real em nossas vidas.

Junte-se aos esforços de replicação do DALL-E para Pytorch ou Mesh Tensorflow se estiver interessado em promover esta tecnologia.

Alternativas

Big Sleep - CLIP e o gerador do Big GAN

Citações

 @misc { unpublished2021clip ,
    title  = { CLIP: Connecting Text and Images } ,
    author = { Alec Radford, Ilya Sutskever, Jong Wook Kim, Gretchen Krueger, Sandhini Agarwal } ,
    year   = { 2021 }
}

 @misc { sitzmann2020implicit ,
    title   = { Implicit Neural Representations with Periodic Activation Functions } ,
    author  = { Vincent Sitzmann and Julien N. P. Martel and Alexander W. Bergman and David B. Lindell and Gordon Wetzstein } ,
    year    = { 2020 } ,
    eprint  = { 2006.09661 } ,
    archivePrefix = { arXiv } ,
    primaryClass = { cs.CV }
}

Expandir

deep daze

Profundo atordoamento

O que é isso?

Instalar

Instalação do Windows

Exemplos

Avançado

Uso

CLI

Preparação

Otimize para a interpretação de uma imagem

Otimize para texto e imagem combinados

Novidade: crie uma história

Pitão

Invoque `deep_daze.Imagine` em Python

Salve o progresso a cada quarta iteração

Anexe o carimbo de data/hora atual em cada imagem.

Alto uso de memória GPU

Uso médio de memória GPU

Uso de memória GPU muito baixo (menos de 4 GiB)

Benchmarks de VRAM e velocidade:

Para onde isso vai dar?

Alternativas

Citações

Aplicativo DAZE CAM

Campo Profundo

Jogo Deep Hunter

Di Profundo

Corrida Profunda: Batalha

Runa Profunda

chat.petals.dev

GPT Prompt Templates

GPTyped

node telegram bot api

typebot.io

python wechaty getting started

waymo open dataset

termwind

wp functions

deep daze

Profundo atordoamento

O que é isso?

Instalar

Instalação do Windows

Exemplos

Avançado

Uso

CLI

Preparação

Otimize para a interpretação de uma imagem

Otimize para texto e imagem combinados

Novidade: crie uma história

Pitão

Invoque deep_daze.Imagine em Python

Salve o progresso a cada quarta iteração

Anexe o carimbo de data/hora atual em cada imagem.

Alto uso de memória GPU

Uso médio de memória GPU

Uso de memória GPU muito baixo (menos de 4 GiB)

Benchmarks de VRAM e velocidade:

Para onde isso vai dar?

Alternativas

Citações

Invoque `deep_daze.Imagine` em Python