# Finding Datasets

Opendatabay offers several ways to discover the right dataset for your project: search engines, LLM recommendations, the platform search bar, and the Ask AI assistant. These combine AI-powered and traditional keyword search so you can find what you need quickly.

## Discovery Methods

### 1. Search Engine Discovery

All listed datasets on Opendatabay are **automatically indexed** by major search engines, making them discoverable through:

* Google Search
* Bing
* DuckDuckGo
* Other search engines

**How it works:**

* Search for your data needs using natural language (e.g., "cancer patient data for research")
* Opendatabay listings appear in search results
* Click through to view dataset details and purchase

**Benefits:**

* Start your search where you already work
* No need to visit the platform first
* SEO-optimised listings for better discoverability

***

### 2. LLM-Powered Discovery

Opendatabay datasets are **automatically exposed** to major Large Language Models (LLMs), enabling intelligent recommendations:

**Supported LLMs:**

* ChatGPT (OpenAI)
* Perplexity
* Google Gemini
* DeepSeek
* Mistral AI
* Claude (Anthropic)
* Grok (xAI)

**How it works:**

* Ask your preferred LLM: "What datasets are available for training a sentiment analysis model?"
* LLMs recommend Opendatabay products based on:
  * Product name and description
  * Use cases and applications
  * Price and licensing terms
  * Keywords and tags
  * Data modalities (text, image, audio, video, etc.)

**Benefits:**

* Natural language queries
* Context-aware recommendations
* Fast and reliable discovery
* Legal and compliant suggestions

***

### 3. Platform Search Bar

Use the **search bar** in the navigation bar at the top of the Opendatabay platform for instant results.

**Two search modes:**

#### Traditional Keyword Search

* Search by dataset name: "Financial Market Data"
* Search by data provider: "Acme Data Corp"
* Search by keywords: "healthcare", "NLP", "computer vision"

#### AI-Powered Query Search

* Enter natural language queries: "I am looking for cancer data"
* Describe your use case: "Need transaction data for fraud detection"
* Ask specific questions: "What datasets work best for training LLMs?"

**How it works:**

* Our AI engine analyses your query
* Matches you with the most suitable datasets
* Returns ranked results with **accuracy scores**

***

## Understanding Accuracy Scores

Each search result includes an **accuracy score** indicating how well the dataset matches your requirements:

| Score Range   | What It Means                                          |
| ------------- | ------------------------------------------------------ |
| **90-100%**   | Excellent match - highly recommended for your needs    |
| **75-89%**    | Good match - likely suitable with minor considerations |
| **60-74%**    | Moderate match - review carefully to ensure fit        |
| **Below 60%** | Lower match - may not fully meet your requirements     |

**Example:**

* Query: "I need sentiment analysis training data"
* Result A: 95% accuracy score → Best fit for your project
* Result B: 54% accuracy score → May not be optimal for sentiment analysis
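
The score bands in the table can be expressed as a small lookup. This is a minimal sketch, not Opendatabay code: the function name `match_band` is hypothetical, and it assumes scores arrive as percentages between 0 and 100, with fractional values (for example 89.5) falling into the lower of the two neighbouring bands.

```python
def match_band(score: float) -> str:
    """Map an accuracy score (percentage, 0-100) to its band from the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "Excellent match"
    if score >= 75:
        return "Good match"
    if score >= 60:
        return "Moderate match"
    return "Lower match"


# The two results from the example above.
print(match_band(95))  # Excellent match
print(match_band(54))  # Lower match
```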

***

## Ask AI Feature

For personalised assistance, use our [**Ask AI**](https://www.opendatabay.com/ask-ai) feature.

**What Ask AI does:**

* Answers questions about datasets and pricing
* Explains platform features and usage
* Provides dataset recommendations with accuracy scores
* Helps compare multiple datasets
* Clarifies licensing and delivery methods

**How to use it:**

1. Visit [opendatabay.com/ask-ai](https://www.opendatabay.com/ask-ai)
2. Enter your question or describe your needs
3. Receive AI-powered recommendations with accuracy scores
4. Click through to view recommended datasets

**Example queries:**

* "What's the difference between General and Commercial AI licenses?"
* "Show me image datasets under £500 for computer vision"
* "Which dataset is best for training a customer churn prediction model?"

***

## Best Practices for Finding Datasets

### Be Specific

* ❌ "I need data"
* ✅ "I need labelled customer review data for sentiment analysis training"

### Include Key Details

* Use case or application
* Data modality (text, image, audio, etc.)
* Budget range
* Licensing requirements (commercial, research, etc.)
* Geographic or industry focus

### Use Multiple Discovery Methods

* Start with LLM recommendations for broad discovery
* Refine with platform search for specific criteria
* Use Ask AI for detailed comparisons

### Check Accuracy Scores

* Prioritise datasets with 75%+ accuracy scores
* Review lower-scoring options only if higher-scoring datasets don't meet your specific needs
* Read full descriptions even for high-scoring matches
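
If you keep your own notes on search results, the 75% guideline could be applied as in the sketch below. Everything here is illustrative: `SearchResult`, `shortlist`, and the dataset names are hypothetical, and the scores are assumed to have been copied by hand from the results page rather than retrieved through any Opendatabay API.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    name: str     # hypothetical dataset title
    score: float  # accuracy score as a percentage (0-100)


def shortlist(results, threshold=75.0):
    """Split results into preferred (score >= threshold) and fallback lists,
    each ordered from highest to lowest score."""
    by_score = sorted(results, key=lambda r: r.score, reverse=True)
    preferred = [r for r in by_score if r.score >= threshold]
    fallback = [r for r in by_score if r.score < threshold]
    return preferred, fallback


results = [
    SearchResult("Dataset A", 95.0),
    SearchResult("Dataset B", 54.0),
    SearchResult("Dataset C", 78.0),
]
preferred, fallback = shortlist(results)
print([r.name for r in preferred])  # ['Dataset A', 'Dataset C']
print([r.name for r in fallback])   # ['Dataset B']
```

The fallback list is kept rather than discarded, since lower-scoring datasets may still be worth reviewing when nothing in the preferred list fits.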

***

## Need Help?

**Can't find what you're looking for?**

* Use [**Ask AI**](https://www.opendatabay.com/ask-ai) for personalised assistance
* Contact data providers directly through their listings
* Reach out to **<support@opendatabay.com>** for help with discovery

***

**Pro Tip:** Datasets are continuously added to Opendatabay. If you don't find what you need today, check back regularly or request a data product via the [**Request Data**](https://www.opendatabay.com/request-data) page.

***
