I analyzed 571M Amazon reviews to find the most profanity-filled customer rants
Comments
vivzkestrel
jmp1062
i would probably use Playwright with custom code, create chunks based on similar products, then run it on a large cluster in parallel (https://github.com/Burla-Cloud/burla).
if you have a single worker trying to scrape a shit ton of products back to back to back you're going to get rate limited or their bot detection will catch you.
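a rough sketch of the chunk-then-fan-out idea, in case it helps. everything here is an assumption for illustration: the `category` and `id` fields, the `scrape_chunk` placeholder (a real one would drive a Playwright browser), and the local thread pool, which just stands in for a cluster and is not Burla's API.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def chunk_by_category(products, chunk_size):
    """Group products by a (hypothetical) 'category' field, then split each group into chunks."""
    ordered = sorted(products, key=lambda p: p["category"])
    chunks = []
    for _, group in groupby(ordered, key=lambda p: p["category"]):
        items = list(group)
        for start in range(0, len(items), chunk_size):
            chunks.append(items[start:start + chunk_size])
    return chunks


def scrape_chunk(chunk):
    """Placeholder worker: a real version would open pages in a browser and pull the reviews."""
    return [f"reviews for {p['id']}" for p in chunk]


def scrape_all(products, chunk_size=2, max_workers=4):
    """Run one worker per chunk in parallel and flatten the results in chunk order."""
    chunks = chunk_by_category(products, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(scrape_chunk, chunks)
        return [item for chunk_result in results for item in chunk_result]
```

each worker only ever handles a small chunk, so no single worker is hammering the site back to back; a real setup would still need per-worker delays and retries.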
skyberrys
Well the website is kind of useless, but it does suck me in. I love reading crazy reviews. The only thing that would make it better is if they also included Airbnb reviews.
The second review I read was a customer complaining about profanity in a movie and then writing out all the examples. Who has time for that?
jperryjperry
well well well... take a look at what I just built https://burla-cloud.github.io/airbnb-burla/
skyberrys
I must say the reviews you have are more horrifying than funny. My favorite funny (and bad) review was a host who accused his guest of flipping over all the furniture in the house, and the guest was like "why and how would I do this". I still want to know what happened that day. How did all the furniture end up upside down?
skyberrys
I love it! Endless entertainment and 0 attempts to get me to stay at the Airbnb.
jperryjperry
yeah now that I have the images I want to do some silly shit with them. maybe find all the Airbnbs with satanic decor or like red rooms haha
skyberrys
Find all of the ones with taxidermy in the southwest USA.... It's like all of them. Okay I did find one in Austin without, but it still had cowhide pillows.
reaperducer
It does taste exactly like formaldehyde and kerosene swirled around with a bit of gas station kimchi that's been warming down the front of a hobo's pants. How do I know? To give this comment validity, I took the liberty of mixing up that exact concoction, then I went down to the train yard and asked a hobo to warm it up for me right on his swampy, fetid hobo taint.
Loved this until I remembered that these reviews are what AI is trained on and influenced by.
But at least he's employing hobos.
rendaw
Amazon doesn't even allow you to use slightly strong (non-profanity) wording in reviews these days. Are these old reviews?
jperryjperry
from 2023
rawgabbit
I love this. The reviews' wordplay tops Macbeth in my book.
jmp1062
i'm just happy they don't censor the comment section haha, makes for funny content.
i also love that people will complain about the vulgar language in a book or movie by writing a review that contains a quote with the vulgar language
mind_heist
how did you scrape all the reviews?
jperryjperry
open source dataset from McAuley Lab at UCSD https://huggingface.co/datasets/McAuley-Lab/Amazon-Reviews-2....
I'm going to publish an Airbnb example tomorrow where I scraped 1,406,718 photo URLs from public listing pages. For that I used https://docs.burla.dev/ which is a high-performance parallel processing python library I've been working on for a few years now.
add-sub-mul-div
Shit like this is why Amazon reviews are now behind a login wall for everyone.
mandeepj
I don’t think a login wall can stop scrapers
- i saw your other comment that talks about using an open source dataset but i had to ask
- how would you actually go about loading reviews if you really wanted to
- what kind of system would you need to work around the captcha and stuff