A data-driven intelligence platform designed to analyze investor behavior, financial discussions, and market sentiment across multiple digital sources, transforming unstructured content into actionable strategic insights for investment decision-making.
Scalable Big Data, Distributed Computing, NLP & BM25 Market Intelligence Platform for Brazilian FIIs
Big Data • Distributed Systems • PySpark • BM25 Ranking • NLP • Sentiment Analysis • FastAPI • Streamlit • Plotly • Docker • MinIO • Data Engineering
Institution: Pontifical Catholic University of São Paulo (PUC-SP) – Humanistic AI & Data Science • 5th Semester • 2026
School: FACEI – Faculty of Interdisciplinary Studies
Course: AI Security, Cybersecurity & Social Engineering
Professor: ✨ Eduardo Savino Gomes
Professor: ✨ Carlos Eduardo Paes
Authors: Fabiana ⚡️ Campanari and Pedro Vyctor Almeida
This platform is not a simple analytics dashboard.
It is an AI-powered Investor Intelligence System capable of:
- analyzing investor sentiment across financial communities
- identifying high-value discussion topics in FIIs
- ranking content relevance using BM25 + NLP hybrid retrieval
- detecting market behavior patterns
- supporting indirect marketing strategy decisions
- mapping investor engagement across digital ecosystems
The system processes structured and unstructured data from financial portals and Reddit to:
- identify where investors concentrate discussions
- detect emerging financial narratives
- measure engagement strength per platform
- rank information relevance using BM25
- generate strategic insights for asset managers and analysts
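
The BM25 ranking step can be illustrated with a minimal sketch. It assumes the standard Okapi BM25 formulation with a smoothed IDF. The function names `tokenize` and `bm25_rank`, the sample posts, and the defaults `k1=1.5` and `b=0.75` are illustrative assumptions, not the project's actual implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a deliberately naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, documents, k1=1.5, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs if n_docs else 0.0
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    scores = []
    for idx, toks in enumerate(tokenized):
        term_freq = Counter(toks)
        score = 0.0
        for term in query_terms:
            tf = term_freq.get(term, 0)
            if tf == 0:
                continue
            df = doc_freq[term]
            # Smoothed IDF; the "+ 1.0" keeps it positive for very common terms.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample posts, for illustration only.
posts = [
    "FII dividend yield rises as rental income grows",
    "Stock market closes lower on inflation data",
    "Investors discuss passive income from FII dividends",
]
ranking = bm25_rank("FII dividends", posts)
# The third post matches both query terms, so it ranks first.
for idx, score in ranking:
    print(f"{score:.3f}  {posts[idx]}")
```

Because the tokenizer does no stemming, "dividend" and "dividends" count as different terms here. A production pipeline would likely add Portuguese-aware normalization before scoring.
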
The platform combines:
- distributed data pipelines
- PySpark processing
- MinIO Data Lake architecture
- NLP analytics
- sentiment analysis
- interactive dashboards
- AI-assisted analytics
with the objective of building a scalable financial marketing intelligence ecosystem.
Note
The platform demonstrates real-world concepts involving:
- Big Data engineering
- distributed processing
- financial NLP
- market intelligence
- AI-assisted analytics
- scalable data architectures
All implementations prioritize educational realism, maintainability, and professional engineering practices.
- Hadoop + HDFS + OpenJDK on Ubuntu
- Hadoop + HDFS + OpenJDK on macOS
- MinIO Local Data Lake Setup
- PySpark Distributed Environment Setup
- Docker Academic Infrastructure Setup
The main objective of this platform is to build a distributed financial marketing intelligence ecosystem capable of transforming unstructured investment content into actionable strategic insights.
- market narratives
- investor sentiment
- financial engagement
- dividend discussions
- passive income trends
- investment behavior
- FII communication performance
- digital financial communities
- distributed analytics
- NLP pipelines
- sentiment analysis
- scalable data engineering
- AI-assisted exploration
- executive analytics
- strategic visualization
- AI-powered insights
- financial communication monitoring
- scalable research workflows
- portfolio-grade distributed systems experimentation
Web Scraping
↓
MinIO Data Lake
↓
PySpark Distributed Processing
↓
NLP + Sentiment Analysis
↓
Financial Marketing Analytics
↓
Analytics Artifacts
↓
FastAPI Backend
↓
Streamlit + Plotly Dashboard
↓
Groq AI Assistant (Llama 3.1 8B Instant)
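
The flow above can be read as a chain of small transformations. The sketch below mirrors the middle stages (raw posts, sentiment scoring, analytics artifact) with plain in-memory Python. It is an illustration only: the real platform relies on MinIO, PySpark, FastAPI, and Streamlit, and every name here (`RawPost`, `score_sentiment`, `build_artifact`, and the toy word lists) is a hypothetical stand-in rather than the project's code.

```python
import json
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical toy lexicons; a real pipeline would use a trained model
# or a curated financial lexicon instead.
POSITIVE = {"growth", "gain", "strong", "dividend"}
NEGATIVE = {"loss", "risk", "vacancy", "default"}


@dataclass
class RawPost:
    """One scraped item, as it might land in the data lake."""
    source: str
    text: str


def score_sentiment(text):
    """Lexicon score in [-1, 1]: (positive - negative) / matched words."""
    words = re.findall(r"\w+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def build_artifact(posts):
    """Aggregate post count and mean sentiment per source."""
    grouped = defaultdict(list)
    for post in posts:
        grouped[post.source].append(score_sentiment(post.text))
    return {
        source: {"posts": len(scores), "mean_sentiment": sum(scores) / len(scores)}
        for source, scores in grouped.items()
    }


# Hypothetical sample data, for illustration only.
posts = [
    RawPost("reddit", "Strong dividend growth this quarter"),
    RawPost("reddit", "Vacancy risk is rising"),
    RawPost("portal", "Fund reports a loss"),
]
artifact = build_artifact(posts)
# A JSON artifact like this is what a backend endpoint could serve to a dashboard.
print(json.dumps(artifact, indent=2))
```

Keeping each stage a pure function over plain records makes it straightforward to move the same logic into distributed transformations later, since each step depends only on its input.
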
- Financial Portals
- InfoMoney
- Suno Research
- Investidor10
- Funds Explorer
- Clube FII
- Status Invest
- FIIs.com.br
- Money Times
- Seu Dinheiro
- Exame Invest
- Bora Investir (B3)
- E-Investidor Estadão
- Valor Investe
- NeoFeed
- The Cap
- Eu Quero Investir
- TradeMap Blog
- Investing.com Brasil
- CNN Brasil Business
- Inteligência Financeira
#FII #FIIs #FundosImobiliarios #RendaPassiva #Dividendos #Investimentos #MercadoFinanceiro #DividendYield #CarteiraDeInvestimentos #Investidor #PassiveIncome #BolsaDeValores #B3 #Fundos #Investing #InvestimentoInteligente #Financeiro #Mercado #Acoes #EducacaoFinanceira
Copyright 2026 Mindful-AI-Assistants. Code released under the MIT license.