Great tutorial! Much appreciated! Greetings from Hamburg
@juliensimonfr
3 years ago
Thanks for watching!
@aprilwang6036
2 years ago
Thank you very much for the video. It bridged a gap between the official docs and most tutorials elsewhere when you touched on how SageMaker interacts with the training script. As someone new to AWS, I initially assumed that SageMaker 'imports' the training script, so I was very confused about why so much code puts the training part in the __main__ block. With your explanation of the Docker container, it made sense, and I finally understood the argparse part as well. I wish the SageMaker SDK docs actually explained that part. Again, thanks!
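To illustrate the point above, here is a minimal sketch of a script-mode entry point. The hyperparameter names and defaults are made up for illustration; the idea is that SageMaker runs the script as a program inside the container, so hyperparameters arrive as command-line arguments and the training code belongs under __main__, not at import time.

```python
import argparse
import os

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hyperparameters are passed by SageMaker as command-line arguments.
    parser.add_argument("--epochs", type=int, default=10)
    parser.add_argument("--batch-size", type=int, default=32)
    # SageMaker sets these environment variables inside the container;
    # the fallbacks are the conventional container paths.
    parser.add_argument("--model-dir", type=str,
                        default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--training", type=str,
                        default=os.environ.get("SM_CHANNEL_TRAINING",
                                               "/opt/ml/input/data/training"))
    return parser.parse_args(argv)

if __name__ == "__main__":
    # The container effectively runs `python script.py --epochs ...`,
    # which is why the training logic lives in this block.
    args = parse_args()
    print(f"Training for {args.epochs} epochs, batch size {args.batch_size}")
```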
@philippluniak5855
2 years ago
I second that. The SDK docs don't explain well how the script and the container interact. Thanks for filling that gap, Julien!
@juliensimonfr
2 years ago
Glad it helped :)
@akhil-menon
A year ago
This is such a real problem. It took me many hours of reading and research to understand the intuition behind putting the training code in a separate .py file. But once I got that, this video really helped me pull everything together. Thanks a lot Julien!
@juliensimonfr
A year ago
@@akhil-menon Glad I could help!
@DL-tl5wx
3 years ago
Hi, thank you for the great tutorial! I am wondering if the inference code could be added somewhere to script.py, so I could pass an unprocessed image to the endpoint to get the predictions?
@juliensimonfr
3 years ago
Hi, yes, you can pass an inference script when working with built-in frameworks. For example, with TensorFlow, look for "How to implement the pre- and/or post-processing handler(s)" at sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html
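As a rough sketch of what such an inference script can look like, based on the handler interface described at that link: you implement `input_handler` and `output_handler`, and the TensorFlow Serving container calls them around each prediction. The JSON payload shape here is illustrative; image decoding would go in `input_handler` instead.

```python
import json

def input_handler(data, context):
    # Pre-processing: convert the raw request body into the JSON
    # payload that TensorFlow Serving expects ({"instances": [...]}).
    if context.request_content_type == "application/json":
        payload = json.loads(data.read().decode("utf-8"))
        return json.dumps({"instances": payload["inputs"]})
    raise ValueError(f"Unsupported content type: {context.request_content_type}")

def output_handler(response, context):
    # Post-processing: return (response body, content type) as the
    # serving framework expects; here we pass the prediction through.
    return response.content, context.accept_header
```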
@DL-tl5wx
3 years ago
@@juliensimonfr oh, great! Thank you so much!
@harisseyassine8483
2 years ago
How can I activate the tensorflow_p36 environment in the terminal?
@juliensimonfr
2 years ago
You need to set the kernel in the notebook. You can create your own with conda if needed.
@sachavanweeren9578
3 years ago
I have a question: using TensorFlow 2, how would you save a Keras model? Can you explain, or do you have a link for this?
@juliensimonfr
3 years ago
www.tensorflow.org/guide/keras/save_and_serialize
@gravitycuda
4 years ago
Best tutorial ever on how to use Script Mode in SageMaker. Thank you very much, Julien. I am happy to be the first one to comment.
@juliensimonfr
4 years ago
You're welcome!
@barinderthind
A year ago
How would I use a custom model instead of TensorFlow? For example, a Prophet model. I followed this tutorial to the point where I could run my training script through the terminal, passing in arguments, but after that it seems purely geared towards TensorFlow.
@juliensimonfr
A year ago
You can use Script Mode with any algorithm. However, to use Prophet, you would need a custom container; see docs.aws.amazon.com/sagemaker/latest/dg/docker-containers-adapt-your-own.html
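Once you have pushed a custom image to ECR, launching the job looks roughly like this with the SDK's generic Estimator. This is a configuration sketch only: the image URI, role, and bucket path are placeholders you would replace with your own.

```python
from sagemaker.estimator import Estimator

# Generic Estimator: any training image you have built and pushed to
# ECR can be used; the image URI below is a placeholder.
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/prophet-training:latest",
    role=role,                        # your SageMaker execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={"growth": "linear"},
)
estimator.fit({"training": "s3://my-bucket/prophet-data"})
```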
@tranhieu3982
4 years ago
Thanks Julien, I have a question. How can we specify the CUDA version on the training instance? TensorFlow 2.3 only works with CUDA 10.1 or higher, but ml.p2.xlarge defaults to CUDA 10.0.
@juliensimonfr
3 years ago
I don't think you can do that with the built-in containers. Either wait for us to release TF 2.3, or try building your own container.
@vinayaknayak7374
3 years ago
Thanks for this tutorial, this was really helpful!
@juliensimonfr
3 years ago
You're welcome!
@MuhammadAli-mi5gg
4 years ago
Hello, I could not train the model on local_gpu; the following error message is shown: RuntimeError: Failed to run: ['docker-compose', '-f', '/tmp/tmpv__81ldz/docker-compose.yaml', 'up', '--build', '--abort-on-container-exit'], Process exited with code: 1. Please guide me; I am completely at sea when it comes to Docker. Thanks!
@juliensimonfr
3 years ago
Is Docker running on your machine? If you're confused about Docker, please learn about that first; docker.com has a good tutorial.
@zeehanrahman6932
2 years ago
Hello, thank you for the helpful tutorial. Could you please explain how SageMaker is able to read the training script? Are we supposed to upload the training script to an S3 bucket as well?
@juliensimonfr
2 years ago
No, this is done automatically by the Estimator.
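For context, a sketch of what that looks like. When you call fit(), the SDK packages the entry_point script (or the whole source_dir, including any helper files) and uploads it to S3 for the training container to run; you never upload the script yourself. The bucket name, role, and version strings below are placeholders.

```python
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="script.py",        # uploaded automatically at fit() time
    source_dir="src",               # the whole directory is packaged, incl. helpers
    role=role,                      # your SageMaker execution role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    framework_version="2.3",
    py_version="py37",
)
# fit() tars up the source, uploads it to S3, and starts the training job.
estimator.fit({"training": "s3://my-bucket/data"})
```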
@vinayaknayak7374
3 years ago
Hi Julien! If I want to use helper functions written in separate Python files (.py files) in the same source_dir as the script.py file, what should I do? For example, I have a utils.py file in my source_dir alongside the script.py file, and I am importing the get_feature_vector function from utils.py in my script.py file. How can I do this, could you please guide? Thanks, Vinayak.
@vinayaknayak7374
3 years ago
Sorry, these files get copied into the Docker container as well. I thought only the script.py file was copied. Resolved the issue, thanks! :)
@juliensimonfr
3 years ago
cool :)
@gravitycuda
4 years ago
Can you please tell me how you activated the tensorflow_p36 environment in the terminal? Thanks.
@juliensimonfr
4 years ago
Open a terminal in Jupyter ("New" dropdown on the right side), and type 'source activate tensorflow_p36'
@rafikferraoun3041
3 years ago
Hello, thank you Julien for this tutorial! I am wondering if and how I can use k-fold cross-validation while training my Keras models in Script Mode?
@juliensimonfr
3 years ago
Sure, it's your code :) You could run k-fold validation once the model is trained, write results to a file located in the same directory as the trained model, and grab them in the model artefact once training is complete.
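A small self-contained sketch of that idea. The split logic and report format are illustrative, not a SageMaker API: you run your own k-fold loop inside the training script, then write the results into the model directory so they travel inside the model.tar.gz artifact uploaded to S3.

```python
import json
import os

def kfold_scores(data, labels, k, train_and_score):
    # Plain k-fold: hold out each of the k chunks in turn and score a
    # model trained on the rest. train_and_score is your own training code.
    n = len(data)
    fold_size = n // k
    scores = []
    for i in range(k):
        val_idx = list(range(i * fold_size, (i + 1) * fold_size))
        train_idx = [j for j in range(n) if j not in set(val_idx)]
        scores.append(train_and_score(train_idx, val_idx))
    return scores

def save_cv_report(scores, model_dir):
    # Writing the report next to the trained model means it ends up
    # inside the model artifact that SageMaker uploads after training.
    path = os.path.join(model_dir, "cv_report.json")
    with open(path, "w") as f:
        json.dump({"fold_scores": scores,
                   "mean_score": sum(scores) / len(scores)}, f)
    return path
```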
@vidhipatel526
2 years ago
Thank you so much for this amazing tutorial 👍
@juliensimonfr
2 years ago
My pleasure 😊
@jessehao590
3 years ago
Thanks for your video. However, I did a test based on this code. Can I set train_instance_count=2, and what does this mean? Thanks, I'm confused.
@juliensimonfr
3 years ago
This is called distributed training, where several instances collaborate on the same training job. All Deep Learning libraries support it, for instance aws.amazon.com/blogs/machine-learning/running-distributed-tensorflow-training-with-amazon-sagemaker/
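In the SDK, that boils down to setting the instance count on the estimator; for Horovod-based TensorFlow training you also pass a distribution configuration. This is a configuration sketch only; the role, instance type, and version strings are placeholders.

```python
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point="script.py",
    role=role,                       # your SageMaker execution role
    instance_count=2,                # two instances collaborate on one job
    instance_type="ml.p3.2xlarge",
    framework_version="2.3",
    py_version="py37",
    # Built-in Horovod support: MPI launches the training processes
    # across both instances.
    distribution={"mpi": {"enabled": True, "processes_per_host": 1}},
)
```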
@jessehao590
3 years ago
@@juliensimonfr Thanks for your reply. Amazing! If possible, I'd love some videos on the relationship between Ray, Horovod, and SageMaker. Is SageMaker not enough? Why would we consider Ray, for example? Really appreciated.
@juliensimonfr
3 years ago
Horovod is built into SageMaker, so there's nothing extra to add; see e.g. aws.amazon.com/blogs/machine-learning/multi-gpu-and-distributed-training-using-horovod-in-amazon-sagemaker-pipe-mode/
Comments: 43