Summarize a website
8/4/2023

TLDR: if you feed the URL of a random article into GPT and ask it to summarize the article with a "please summarize " prompt, it hallucinates by just analyzing the words in the URL. It does NOT get the actual content of the article. (OK, this is true as of the end of March 2023; I am pretty sure this might change soon with OpenAI plugins.)

I have recently noticed that if I feed the URL of a random article into GPT and ask it to summarize the article with a "please summarize " prompt, it gives a pretty good summary, even if the article is just a few hours old. This is very exciting and contradictory: from the OpenAI docs and from asking ChatGPT we know that GPT is trained on web knowledge up to 2021. But read on.

Let's ask ChatGPT to summarize a fresh piece of news:

[Screenshot: GPT hallucinates]

This looks pretty legitimate, doesn't it? GPT gives a summary for existing links and detects broken links, which apparently means there is a web scraper underneath, woah!

The biggest issue here is that it is not even obvious to the user that this is pure fantasy on GPT's part. I was pretty sure GPT was able to get new content from the web until I started doing real and thorough fact-checking of the summary it produced.

As you can see in the screenshot above, GPT not only hallucinates, it is also smart enough to gaslight the user when the user tries to check how it behaves on a non-existing URL! GPT leverages context from the chat to understand that the second URL looks bad.

So, to make 100% sure that GPT cannot read web content, I needed to start a new conversation and add some innocent word to the URL:

[Screenshot: proof of GPT hallucinations]

Aha! Now GPT creates a beautiful article summary from a non-existing URL (I appended the word "-sky" to the URL to break it).

A relatively easy way to make GPT summarize a webpage by looking at its actual content is to use Bing Chat, which leverages GPT-4 (or at least an engine very similar to GPT-4) to read from a webpage. This is a good way to get the summary of a big text. It can even summarize a PDF document!

There are two problems with this approach:

1. Bing Chat requires the Edge browser, or you have to trick Bing into thinking you are using Edge.
2. It uses your computer's resources to load the data so it can process it.

While this is a great deal for most personal use cases, my use case required a much bigger scale and a high level of automation: I am building a data pipeline which summarizes hundreds of articles every hour and compiles them into a beautiful digest. So Bing Chat, great for personal use, was not a viable approach.

I am building a newsletter platform that analyzes many text pages, gets authority, virality and popularity scores for these pieces of content, and then compiles a great, easy-to-consume digest from the best content. This is a massive idea which needs a good foundation of underlying tech to extract and process large amounts of text data.
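
The gap between "summarize this URL" and "summarize this page" can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the pipeline described in this post: the helper names `slug_words`, `fetch_html`, `html_to_text` and `build_summary_prompt` are hypothetical, the prompt wording and the `max_chars` limit are made up for the example, and the text extraction is deliberately naive (standard library only). The first function shows roughly what a model can infer from a bare link; the others show one way to put the real page text into the prompt instead.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen


def slug_words(url: str) -> list[str]:
    """Return the readable words in a URL path.

    Roughly all the information a model has when it is given a bare
    link and cannot fetch the page behind it.
    """
    path = urlparse(url).path
    # Split on anything that is not a letter; drop very short fragments.
    return [w for w in re.split(r"[^A-Za-z]+", path) if len(w) > 2]


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Naive HTML-to-text conversion (assumption: good enough for a demo)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def build_summary_prompt(page_text: str, max_chars: int = 8000) -> str:
    """Embed the article text itself in the prompt, truncated to max_chars."""
    return "Please summarize the following article:\n\n" + page_text[:max_chars]
```

Calling `slug_words` on an article link returns only the words from its path, which is often enough for a plausible-sounding but invented summary. Calling `build_summary_prompt(html_to_text(fetch_html(url)))` instead gives the model the actual text, and fails loudly when the URL does not exist.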