Is there a list somewhere of the default instructions built into the Flan model? Also, is there such a thing as a Longformer that elaborates instead of summarizing? I'm looking for something that will over-explain (or over-analyze) narrative writing.
@SamuelAlbanie1
1 year ago
1. There is quite a detailed description of the instruction data used by Flan in their arXiv paper, which may be of help: arxiv.org/abs/2109.01652 2. ChatGPT is not bad at over-explanations.
@HoriaCristescu
1 year ago
Great video. I hope you publish more!
@brandomiranda6703
1 year ago
Something I don't understand is how informal reasoning tasks are evaluated without human raters, since the output isn't in a formal language that can be properly evaluated.
@SamuelAlbanie1
1 year ago
Good question. For some benchmarks (e.g. MMLU), the questions are multiple choice, so the output need only be a single answer choice. For open-ended text generation, the authors in this work use human raters to score the model outputs.
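To make the multiple-choice case concrete, here is a minimal sketch of how such answers can be scored automatically by exact match. The function name `multiple_choice_accuracy` and the toy data are assumptions for illustration only; this is not the evaluation code used in the paper.

```python
def multiple_choice_accuracy(predictions, answers):
    """Fraction of questions where the predicted choice matches the gold choice.

    Hypothetical helper: both arguments are lists of answer letters
    such as "A", "B", "C", "D". Comparison ignores case and whitespace.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


if __name__ == "__main__":
    # Toy example: three of the four predictions match the gold answers.
    print(multiple_choice_accuracy(["A", "b", "C", "D"], ["A", "B", "D", "D"]))
```

Because the model only has to emit one of a fixed set of choices, no human judgment is needed for this kind of benchmark.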
@lehoumusic2633
1 year ago
Oh, this is very informative. I learned a lot of new things.
@deb_c836
1 year ago
Awesome summary of key findings, and I liked the comparison slide to prior work. Thank you, Samuel. I tried to open the linked slides, but unfortunately the link is not working and leads to "Page not found" on your website. Would you kindly update the URL?
@SamuelAlbanie1
1 year ago
Thanks for flagging this - I've fixed the slides link.
@MasterMan2015
1 year ago
How do they calculate the normalized average? What if it is negative? I searched but could not figure it out.
@SamuelAlbanie1
1 year ago
Good question. There is a qualitative description in the footnote on page 5 of arxiv.org/abs/2210.11416v5: "A normalized metric scales an evaluation number with respect to a task-specific lower bound such as random guessing baseline for a multiple choice question. For example, if random guessing produces 50% accuracy and the max accuracy of 100%, then a raw accuracy of 55% would be normalized to 10%, and a raw accuracy of 45% would be normalized to -10% since it is worse than random." So the metric can be negative if a model performs worse than the task-specific lower bound (e.g. random guessing).
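The footnote's two examples are consistent with a simple linear rescaling between the lower bound and the maximum score. The sketch below assumes that reading; the function name `normalize_score` and its parameters are hypothetical and are not taken from the paper's code.

```python
def normalize_score(raw, lower_bound, max_score=100.0):
    """Rescale a raw score so that lower_bound maps to 0 and max_score to 100.

    Assumed formula (inferred from the footnote's examples):
        normalized = 100 * (raw - lower_bound) / (max_score - lower_bound)
    Scores below the lower bound come out negative.
    """
    if max_score <= lower_bound:
        raise ValueError("max_score must be greater than lower_bound")
    return 100.0 * (raw - lower_bound) / (max_score - lower_bound)


if __name__ == "__main__":
    # Footnote examples: random guessing gives 50%, maximum is 100%.
    print(normalize_score(55.0, 50.0))  # 10.0
    print(normalize_score(45.0, 50.0))  # -10.0
```

Under this assumption, a "normalized average" would be the mean of these per-task normalized scores, which can itself go negative if a model is below the baseline on enough tasks.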
@Moreoverover
1 year ago
1. What software do you use to make these slides? 2. Is the rapid progress of this not ominous? What are your thoughts? Do you see this as a net good?
@SamuelAlbanie1
1 year ago
1. I added a short software description kzitem.info/rock/MtxPy2z1qzwvP7wiarkoawabout
Comments: 12