mast3r Download - mast3r Quellcode herunterladen

Banner

Offizielle Implementierung von Grounding Image Matching in 3D with MASt3R
[Projektseite], [MASt3R arxiv], [DUSt3R arxiv]

Beispiel für Matching-Ergebnisse aus MASt3R

Allgemeiner Überblick über die Architektur von MASt3R

 @misc { mast3r_arxiv24 ,
      title = { Grounding Image Matching in 3D with MASt3R } , 
      author = { Vincent Leroy and Yohann Cabon and Jerome Revaud } ,
      year = { 2024 } ,
      eprint = { 2406.09756 } ,
      archivePrefix = { arXiv } ,
      primaryClass = { cs.CV }
}

@inproceedings { dust3r_cvpr24 ,
      title = { DUSt3R: Geometric 3D Vision Made Easy } , 
      author = { Shuzhe Wang and Vincent Leroy and Yohann Cabon and Boris Chidlovskii and Jerome Revaud } ,
      booktitle = { CVPR } ,
      year = { 2024 }
}

Inhaltsverzeichnis

Inhaltsverzeichnis
Lizenz
Legen Sie los
- Installation
- Kontrollpunkte
- Interaktive Demo
- Interaktive Demo mit Docker
Verwendung
Ausbildung
- Datensätze
- Demo
- Unsere Hyperparameter
Visuelle Lokalisierung
- Datensatzvorbereitung
- Beispielbefehle

Lizenz

Der Code wird unter der CC BY-NC-SA 4.0-Lizenz vertrieben. Weitere Informationen finden Sie unter LIZENZ.

 # Copyright (C) 2024-present Naver Corporation. All rights reserved.
# Licensed under CC BY-NC-SA 4.0 (non-commercial use only).

Legen Sie los

Installation

Klonen Sie MASt3R.

git clone --recursive https://github.com/naver/mast3r
cd mast3r
# if you have already cloned mast3r:
# git submodule update --init --recursive

Erstellen Sie die Umgebung. Hier zeigen wir ein Beispiel mit Conda.

conda create -n mast3r python=3.11 cmake=3.14.0
conda activate mast3r 
conda install pytorch torchvision pytorch-cuda=12.1 -c pytorch -c nvidia  # use the correct version of cuda for your system
pip install -r requirements.txt
pip install -r dust3r/requirements.txt
# Optional: you can also install additional packages to:
# - add support for HEIC images
# - add required packages for visloc.py
pip install -r dust3r/requirements_optional.txt

Kompilieren Sie optional die Cuda-Kernel für RoPE (wie in CroCo v2).

 # DUST3R relies on RoPE positional embeddings for which you can compile some cuda kernels for faster runtime.
cd dust3r/croco/models/curope/
python setup.py build_ext --inplace
cd ../../../../

Kontrollpunkte

Sie können die Kontrollpunkte auf zwei Arten erhalten:

Sie können unsere Huggingface_hub-Integration nutzen: Die Modelle werden automatisch heruntergeladen.
Ansonsten stellen wir mehrere vorab trainierte Modelle zur Verfügung:

Modellname	Schulungsvorsätze	Kopf	Encoder	Decoder
`MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric`	512x384, 512x336, 512x288, 512x256, 512x160	CatMLP+DPT	ViT-L	ViT-B

Sie können die Hyperparameter, die wir zum Trainieren dieser Modelle verwendet haben, im Abschnitt „Unsere Hyperparameter“ überprüfen. Überprüfen Sie unbedingt die Lizenz der von uns verwendeten Datensätze.

So laden Sie ein bestimmtes Modell herunter, zum Beispiel MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth :

mkdir -p checkpoints/
wget https://download.europe.naverlabs.com/ComputerVision/MASt3R/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth -P checkpoints/

Stellen Sie für diese Kontrollpunkte sicher, dass Sie zusätzlich zu CC-BY-NC-SA 4.0 der Lizenz aller von uns verwendeten Trainingsdatensätze zustimmen. Insbesondere die kartenfreie Datensatzlizenz ist sehr restriktiv. Weitere Informationen finden Sie unter CHECKPOINTS_NOTICE.

Interaktive Demo

Wir haben einen Huggingface-Bereich erstellt, in dem die neue spärliche globale Ausrichtung in einer vereinfachten Demo für kleine Szenen ausgeführt wird: naver/MASt3R. Es stehen zwei Demos zur lokalen Ausführung zur Verfügung:

 demo.py is the updated demo for MASt3R. It uses our new sparse global alignment method that allows you to reconstruct larger scenes

python3 demo.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric

# Use --weights to load a checkpoint from a local file, eg --weights checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric.pth
# Use --local_network to make it accessible on the local network, or --server_name to specify the url manually
# Use --server_port to change the port, by default it will search for an available port starting at 7860
# Use --device to use a different device, by default it's "cuda"

demo_dust3r_ga.py is the same demo as in dust3r (+ compatibility for MASt3R models)
see https://github.com/naver/dust3r?tab=readme-ov-file#interactive-demo for details

Interaktive Demo mit Docker

Befolgen Sie diese Anweisungen, um MASt3R mit Docker auszuführen, auch mit NVIDIA CUDA-Unterstützung:

Docker installieren : Falls noch nicht installiert, laden Sie docker und docker compose von der Docker-Website herunter und installieren Sie sie.
Installieren Sie das NVIDIA Docker Toolkit : Für GPU-Unterstützung installieren Sie das NVIDIA Docker Toolkit von der Nvidia-Website.
Erstellen Sie das Docker-Image und führen Sie es aus : cd in das Verzeichnis ./docker und führen Sie die folgenden Befehle aus:

 cd docker
bash run.sh --with-cuda --model_name= " MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric "

Oder wenn Sie die Demo ohne CUDA-Unterstützung ausführen möchten, führen Sie den folgenden Befehl aus:

 cd docker
bash run.sh --model_name= " MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric "

Standardmäßig wird demo.py mit der Option --local_network gestartet.
Besuchen Sie http://localhost:7860/ um auf die Web-Benutzeroberfläche zuzugreifen (oder ersetzen Sie localhost durch den Namen des Computers, um über das Netzwerk darauf zuzugreifen).

run.sh startet Docker-Compose entweder mit der Konfigurationsdatei „docker-compose-cuda.yml“ oder „docker-compose-cpu.ym“ und startet dann die Demo mit „entrypoint.sh“.

Demo

Verwendung

 from mast3r . model import AsymmetricMASt3R
from mast3r . fast_nn import fast_reciprocal_NNs

import mast3r . utils . path_to_dust3r
from dust3r . inference import inference
from dust3r . utils . image import load_images

if __name__ == '__main__' :
    device = 'cuda'
    schedule = 'cosine'
    lr = 0.01
    niter = 300

    model_name = "naver/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric"
    # you can put the path to a local checkpoint in model_name if needed
    model = AsymmetricMASt3R . from_pretrained ( model_name ). to ( device )
    images = load_images ([ 'dust3r/croco/assets/Chateau1.png' , 'dust3r/croco/assets/Chateau2.png' ], size = 512 )
    output = inference ([ tuple ( images )], model , device , batch_size = 1 , verbose = False )

    # at this stage, you have the raw dust3r predictions
    view1 , pred1 = output [ 'view1' ], output [ 'pred1' ]
    view2 , pred2 = output [ 'view2' ], output [ 'pred2' ]

    desc1 , desc2 = pred1 [ 'desc' ]. squeeze ( 0 ). detach (), pred2 [ 'desc' ]. squeeze ( 0 ). detach ()

    # find 2D-2D matches between the two images
    matches_im0 , matches_im1 = fast_reciprocal_NNs ( desc1 , desc2 , subsample_or_initxy1 = 8 ,
                                                   device = device , dist = 'dot' , block_size = 2 ** 13 )

    # ignore small border around the edge
    H0 , W0 = view1 [ 'true_shape' ][ 0 ]
    valid_matches_im0 = ( matches_im0 [:, 0 ] >= 3 ) & ( matches_im0 [:, 0 ] < int ( W0 ) - 3 ) & (
        matches_im0 [:, 1 ] >= 3 ) & ( matches_im0 [:, 1 ] < int ( H0 ) - 3 )

    H1 , W1 = view2 [ 'true_shape' ][ 0 ]
    valid_matches_im1 = ( matches_im1 [:, 0 ] >= 3 ) & ( matches_im1 [:, 0 ] < int ( W1 ) - 3 ) & (
        matches_im1 [:, 1 ] >= 3 ) & ( matches_im1 [:, 1 ] < int ( H1 ) - 3 )

    valid_matches = valid_matches_im0 & valid_matches_im1
    matches_im0 , matches_im1 = matches_im0 [ valid_matches ], matches_im1 [ valid_matches ]

    # visualize a few matches
    import numpy as np
    import torch
    import torchvision . transforms . functional
    from matplotlib import pyplot as pl

    n_viz = 20
    num_matches = matches_im0 . shape [ 0 ]
    match_idx_to_viz = np . round ( np . linspace ( 0 , num_matches - 1 , n_viz )). astype ( int )
    viz_matches_im0 , viz_matches_im1 = matches_im0 [ match_idx_to_viz ], matches_im1 [ match_idx_to_viz ]

    image_mean = torch . as_tensor ([ 0.5 , 0.5 , 0.5 ], device = 'cpu' ). reshape ( 1 , 3 , 1 , 1 )
    image_std = torch . as_tensor ([ 0.5 , 0.5 , 0.5 ], device = 'cpu' ). reshape ( 1 , 3 , 1 , 1 )

    viz_imgs = []
    for i , view in enumerate ([ view1 , view2 ]):
        rgb_tensor = view [ 'img' ] * image_std + image_mean
        viz_imgs . append ( rgb_tensor . squeeze ( 0 ). permute ( 1 , 2 , 0 ). cpu (). numpy ())

    H0 , W0 , H1 , W1 = * viz_imgs [ 0 ]. shape [: 2 ], * viz_imgs [ 1 ]. shape [: 2 ]
    img0 = np . pad ( viz_imgs [ 0 ], (( 0 , max ( H1 - H0 , 0 )), ( 0 , 0 ), ( 0 , 0 )), 'constant' , constant_values = 0 )
    img1 = np . pad ( viz_imgs [ 1 ], (( 0 , max ( H0 - H1 , 0 )), ( 0 , 0 ), ( 0 , 0 )), 'constant' , constant_values = 0 )
    img = np . concatenate (( img0 , img1 ), axis = 1 )
    pl . figure ()
    pl . imshow ( img )
    cmap = pl . get_cmap ( 'jet' )
    for i in range ( n_viz ):
        ( x0 , y0 ), ( x1 , y1 ) = viz_matches_im0 [ i ]. T , viz_matches_im1 [ i ]. T
        pl . plot ([ x0 , x1 + W0 ], [ y0 , y1 ], '-+' , color = cmap ( i / ( n_viz - 1 )), scalex = False , scaley = False )
    pl . show ( block = True )

passendes Beispiel für ein Krokopaar

Ausbildung

In diesem Abschnitt präsentieren wir eine kurze Demonstration, um mit dem Training von MASt3R zu beginnen.

Datensätze

Siehe Abschnitt „Datensätze“ in DUSt3R

Demo

Wie bei der DUSt3R-Schulungsdemo werden wir die gleiche Teilmenge von CO3Dv2 – Creative Commons Attribution-NonCommercial 4.0 International – herunterladen und vorbereiten und den Schulungscode darauf starten. Es ist genau der gleiche Prozess wie DUSt3R. Das Demomodell wird für einige Epochen anhand eines sehr kleinen Datensatzes trainiert. Es wird nicht sehr gut sein.

 # download and prepare the co3d subset
mkdir -p data/co3d_subset
cd data/co3d_subset
git clone https://github.com/facebookresearch/co3d
cd co3d
python3 ./co3d/download_dataset.py --download_folder ../ --single_sequence_subset
rm ../ * .zip
cd ../../..

python3 datasets_preprocess/preprocess_co3d.py --co3d_dir data/co3d_subset --output_dir data/co3d_subset_processed  --single_sequence_subset

# download the pretrained dust3r checkpoint
mkdir -p checkpoints/
wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth -P checkpoints/

# for this example we'll do fewer epochs, for the actual hyperparameters we used in the paper, see the next section: "Our Hyperparameters"
torchrun --nproc_per_node=4 train.py 
    --train_dataset " 1000 @ Co3d(split='train', ROOT='data/co3d_subset_processed', aug_crop='auto', aug_monocular=0.005, aug_rot90='diff', mask_bg='rand', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], n_corres=8192, nneg=0.5, transform=ColorJitter) " 
    --test_dataset " 100 @ Co3d(split='test', ROOT='data/co3d_subset_processed', resolution=(512,384), n_corres=1024, seed=777) " 
    --model " AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True) " 
    --train_criterion " ConfLoss(Regr3D(L21, norm_mode='?avg_dis'), alpha=0.2) + 0.075*ConfMatchingLoss(MatchingLoss(InfoNCE(mode='proper', temperature=0.05), negatives_padding=0, blocksize=8192), alpha=10.0, confmode='mean') " 
    --test_criterion " Regr3D_ScaleShiftInv(L21, norm_mode='?avg_dis', gt_scale=True, sky_loss_value=0) + -1.*MatchingLoss(APLoss(nq='torch', fp=torch.float16), negatives_padding=12288) " 
    --pretrained " checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth " 
    --lr 0.0001 --min_lr 1e-06 --warmup_epochs 1 --epochs 10 --batch_size 4 --accum_iter 4 
    --save_freq 1 --keep_freq 5 --eval_freq 1 --disable_cudnn_benchmark 
    --output_dir " checkpoints/mast3r_demo "

Unsere Hyperparameter

Wir haben nicht alle Trainingsdatensätze veröffentlicht, aber hier sind die Befehle, die wir zum Training unserer Modelle verwendet haben:

 # MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric - train mast3r with metric regression and matching loss
# we used cosxl to generate variations of DL3DV: "foggy", "night", "rainy", "snow", "sunny" but we were not convinced by it.

torchrun --nproc_per_node=8 train.py 
    --train_dataset "57_000 @ Habitat512(1_000_000, split='train', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 68_400 @ BlendedMVS(split='train', mask_sky=True, resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 68_400 @ MegaDepth(split='train', mask_sky=True, resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 45_600 @ ARKitScenes(split='train', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 22_800 @ Co3d(split='train', mask_bg='rand', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 22_800 @ StaticThings3D(mask_bg='rand', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 45_600 @ ScanNetpp(split='train', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 45_600 @ TartanAir(pairs_subset='', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 4_560 @ UnrealStereo4K(resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 1_140 @ VirtualKitti(optical_center_is_centered=True, resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 22_800 @ WildRgbd(split='train', mask_bg='rand', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 145_920 @ NianticMapFree(split='train', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 57_000 @ DL3DV(split='nlight', resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 57_000 @ DL3DV(split='not-nlight', cosxl_augmentations=None, resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5) + 34_200 @ InternalUnreleasedDataset(resolution=[(512, 384), (512, 336), (512, 288), (512, 256), (512, 160)], aug_crop='auto', aug_monocular=0.005, transform=ColorJitter, n_corres=8192, nneg=0.5)" 
    --test_dataset " Habitat512(1_000, split='val', resolution=(512,384), seed=777, n_corres=1024) + 1_000 @ BlendedMVS(split='val', resolution=(512,384), mask_sky=True, seed=777, n_corres=1024) + 1_000 @ ARKitScenes(split='test', resolution=(512,384), seed=777, n_corres=1024) + 1_000 @ MegaDepth(split='val', mask_sky=True, resolution=(512,336), seed=777, n_corres=1024) + 1_000 @ Co3d(split='test', resolution=(512,384), mask_bg='rand', seed=777, n_corres=1024) " 
    --model " AsymmetricMASt3R(pos_embed='RoPE100', patch_embed_cls='ManyAR_PatchEmbed', img_size=(512, 512), head_type='catmlp+dpt', output_mode='pts3d+desc24', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, two_confs=True, desc_conf_mode=('exp', 0, inf)) " 
    --train_criterion " ConfLoss(Regr3D(L21, norm_mode='?avg_dis'), alpha=0.2, loss_in_log=False) + 0.075*ConfMatchingLoss(MatchingLoss(InfoNCE(mode='proper', temperature=0.05), negatives_padding=0, blocksize=8192), alpha=10.0, confmode='mean') " 
    --test_criterion " Regr3D(L21, norm_mode='?avg_dis', gt_scale=True, sky_loss_value=0) + -1.*MatchingLoss(APLoss(nq='torch', fp=torch.float16), negatives_padding=12288) " 
    --pretrained " checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth " 
    --lr 0.0001 --min_lr 1e-06 --warmup_epochs 8 --epochs 50 --batch_size 4 --accum_iter 2 
    --save_freq 1 --keep_freq 5 --eval_freq 1 --print_freq=10 --disable_cudnn_benchmark 
    --output_dir " checkpoints/MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric "

Visuelle Lokalisierung

Datensatzvorbereitung

Siehe Abschnitt „Visloc“ in DUSt3R

Beispielbefehle

Mit visloc.py können Sie unsere visuellen Lokalisierungsexperimente für Aachen-Day-Night, InLoc, Cambridge Landmarks und 7 Scenes durchführen.

 # Aachen-Day-Night-v1.1:
# scene in 'day' 'night'
# scene can also be 'all'
python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset " VislocAachenDayNight('/path/to/prepared/Aachen-Day-Night-v1.1/', subscene=' ${scene} ', pairsfile='fire_top50', topk=20) " --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/Aachen-Day-Night-v1.1/ ${scene} /loc

# or with coarse to fine:

python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset " VislocAachenDayNight('/path/to/prepared/Aachen-Day-Night-v1.1/', subscene=' ${scene} ', pairsfile='fire_top50', topk=20) " --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/Aachen-Day-Night-v1.1/ ${scene} /loc --coarse_to_fine --max_batch_size 48 --c2f_crop_with_homography

# InLoc
python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset " VislocInLoc('/path/to/prepared/InLoc/', pairsfile='pairs-query-netvlad40-temporal', topk=20) " --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/InLoc/loc

# or with coarse to fine:

python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset " VislocInLoc('/path/to/prepared/InLoc/', pairsfile='pairs-query-netvlad40-temporal', topk=20) " --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/InLoc/loc --coarse_to_fine --max_image_size 1200 --max_batch_size 48 --c2f_crop_with_homography

# 7-scenes:
# scene in 'chess' 'fire' 'heads' 'office' 'pumpkin' 'redkitchen' 'stairs'
python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset " VislocSevenScenes('/path/to/prepared/7-scenes/', subscene=' ${scene} ', pairsfile='APGeM-LM18_top20', topk=1) " --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/7-scenes/ ${scene} /loc

# Cambridge Landmarks:
# scene in 'ShopFacade' 'GreatCourt' 'KingsCollege' 'OldHospital' 'StMarysChurch'
python3 visloc.py --model_name MASt3R_ViTLarge_BaseDecoder_512_catmlpdpt_metric --dataset " VislocCambridgeLandmarks('/path/to/prepared/Cambridge_Landmarks/', subscene=' ${scene} ', pairsfile='APGeM-LM18_top50', topk=20) " --pixel_tol 5 --pnp_mode poselib --reprojection_error_diag_ratio 0.008 --output_dir /path/to/output/Cambridge_Landmarks/ ${scene} /loc