Functional Requirements
- Return all business based on user’s location.
- Business owners can add and update their information, but this does’t needs to be refelected in real-time.
- Customer can view detailed information about a business.
Non functional Requirements
- Low Latency: User should be able to see results quickly.
- Data privacy: As location is sensitive information, follow GDPR.
- High availability and scalability on peak hours in dense areas.
Some basic calculation estimations.
- Assuming we have 100 million DAU and 200 million Businesses
- Second in a day is 84K for simplicity lets take it 100k seconds. Assuming we get 5 queries/day
- Search QPS = 5 x 100 million = 500 million(6 zeros)/100000 = 5000 requests/second
API Design
This api return a list of business given a location.
GET /v1/search/nearby
params: {
"latitude": decimal, // required
"longitude": decimal, // required
"radius": int // optional, (default 3000 meters, 3 kms)
}
response : {
"total": 10,
"business": [{Business object}]
}
For fetching additional attributes after selecting a business
GET /v1/business/:id
POST /v1/business
PUT /v1/business/:id
DELETE /v1/business/:id
Data model
Read/Write Ratio
As we can see our system is read heavy, due to two reasons.
- Search for nearby
- View Detailed Information Write is low because the business actions are into that frequent.
A relational MYSQL can be a good fit for this.
Data Schema
Key database tables are the business and geospatial index tables.
Business table
_id: PK
address
city
state
country
latitude
longitude
High level Design
graph TD
User --> LoadBalancer
LoadBalancer --> LBS
LoadBalancer --> BusinessService
LBS --> Replica1
LBS --> Replica2
LBS --> Replica3
BusinessService --> PrimaryDatabase
PrimaryDatabase --> Replica1
PrimaryDatabase --> Replica2
PrimaryDatabase --> Replica3
Replica1 -->|Replicate| PrimaryDatabase
Replica2 -->|Replicate| PrimaryDatabase
Replica3 -->|Replicate| PrimaryDatabase
subgraph "Database Cluster"
Replica1
Replica2
Replica3
PrimaryDatabase
end
Location based service
- It is a read-heavy service with no write requests.
- QPS is high, especially during peak hours in dense areas.
- This service is stateless so it’s easy to scale horizontally
Business service
- CUD is not high QPS
- Read is high QPS during peak hours.
Database Cluster
We can have a master-slave setup, writes will go to master and replicated to slave. This might cause some inconsistency but should not create a problem as the writes doesn’t need to have real time changes.
Algo to fetch nearby business
- We as we have lat and long we simply can use that and create a circle witha radius R,
- This will be problematic as we need to scan the whole table and see if it fall into the range
- Even creating indexes will not help
- We need a wat to Map 2D data to 1D
Geo Spatial indexing
We have 2 types
- Hash based
- Even Grid
- GeoHash
- Cartesian Tiers
- Tree
- QuadTree
- Google S2
- RTree
The idea is to divide the map into smaller areas and build indexes.