madhu

about 1 year

The ML-startup paradox: Can't launch without data, can't get data without launching. (I will not promote)

I will not promote.

I'm a high school junior developing an ML-based MVP for an industry I've been actively involved in for several years. The model needs to be trained on public social media account data and user experiences, but scraping that data violates the TOS of Instagram and other platforms.

My plan is to build a quality dataset before launch rather than release an untrained model. I'm considering a simple form system where users can submit data about their own accounts and their experiences with other accounts. I estimate I'd need around 50-100 quality submissions to train the model on an accurate dataset, and even that might be unrealistic without some kind of incentive. I'm thinking of offering early access to premium features and verified status on the platform, though that likely isn't enough of an incentive.

Has anyone successfully built an initial dataset this way? I'm looking for specific strategies for getting quality user-submitted data. Alternatively, is there a way to get access to, or build, a dataset of public social media account data without needing user submissions first?
