Map（网站地图） - 302.AI API文档

来自Firecrwal的网页地图获取服务
官方文档：https://docs.firecrawl.dev/api-reference/endpoint/map

价格：0.005 PTC/次

请求参数

Header 参数

Authorization

string

API Key

可选

示例值:

Bearer {{YOUR_API_KEY}}

Body 参数application/json

url

string

必需

string

可选

ignoreSitemap

boolean

可选

sitemapOnly

boolean

可选

includeSubdomains

boolean

可选

limit

integer

可选

timeout

integer

可选

formats

array[string]

可选

onlyMainContent

boolean

可选

includeTags

array[string]

可选

excludeTags

array[string]

可选

headers

object

可选

waitFor

integer

可选

mobile

boolean

可选

skipTlsVerification

boolean

可选

jsonOptions

object

可选

schema

object

必需

systemPrompt

string

必需

prompt

string

必需

actions

array [object {3}]

可选

type

string

可选

milliseconds

integer

可选

selector

string

可选

location

object

可选

country

string

必需

languages

array[string]

必需

removeBase64Images

boolean

可选

blockAds

boolean

可选

proxy

string

可选

changeTrackingOptions

object

可选

mode

string

必需

schema

object

必需

prompt

string

必需

示例

{
  "url": "<string>",
  "search": "<string>",
  "ignoreSitemap": true,
  "sitemapOnly": false,
  "includeSubdomains": false,
  "limit": 5000,
  "timeout": 123
}

示例代码

Shell

JavaScript

Java

Swift

PHP

Python

HTTP

Objective-C

Ruby

OCaml

Dart

返回响应

🟢200成功

application/json

Body

pages

array [object {4}]

必需

index

integer

页号

必需

markdown

string

markdown

必需

images

array [object {6}]

必需

dimensions

object

必需

model

string

模型名称

必需

usage_info

object

必需

pages_processed

integer

文件页数

必需

doc_size_bytes

integer

文件大小

必需

示例

{
    "pages": [
        {
            "index": 0,
            "markdown": "# LEVERAGING UNLABELED DATA TO PREDICT OUT-OF-DISTRIBUTION PERFORMANCE \n\nSaurabh Garg*<br>Carnegie Mellon University<br>sgarg2@andrew.cmu.edu<br>Sivaraman Balakrishnan<br>Carnegie Mellon University<br>sbalakri@andrew.cmu.edu<br>Zachary C. Lipton<br>Carnegie Mellon University<br>zlipton@andrew.cmu.edu\n\n## Behnam Neyshabur\n\nGoogle Research, Blueshift team\nneyshabur@google.com\n\nHanie Sedghi<br>Google Research, Brain team<br>hsedghi@google.com\n\n\n#### Abstract\n\nReal-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions that may cause performance drops. In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data. We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples for which model confidence exceeds that threshold. ATC outperforms previous methods across several model architectures, types of distribution shifts (e.g., due to synthetic corruptions, dataset reproduction, or novel subpopulations), and datasets (WILDS, ImageNet, BREEDS, CIFAR, and MNIST). In our experiments, ATC estimates target performance $2-4 \\times$ more accurately than prior methods. We also explore the theoretical foundations of the problem, proving that, in general, identifying the accuracy is just as hard as identifying the optimal predictor and thus, the efficacy of any method rests upon (perhaps unstated) assumptions on the nature of the shift. Finally, analyzing our method on some toy distributions, we provide insights concerning when it works ${ }^{1}$.\n\n\n## 1 INTRODUCTION\n\nMachine learning models deployed in the real world typically encounter examples from previously unseen distributions. While the IID assumption enables us to evaluate models using held-out data from the source distribution (from which training data is sampled), this estimate is no longer valid in presence of a distribution shift. Moreover, under such shifts, model accuracy tends to degrade (Szegedy et al., 2014; Recht et al., 2019; Koh et al., 2021). Commonly, the only data available to the practitioner are a labeled training set (source) and unlabeled deployment-time data which makes the problem more difficult. In this setting, detecting shifts in the distribution of covariates is known to be possible (but difficult) in theory (Ramdas et al., 2015), and in practice (Rabanser et al., 2018). However, producing an optimal predictor using only labeled source and unlabeled target data is well-known to be impossible absent further assumptions (Ben-David et al., 2010; Lipton et al., 2018).\n\nTwo vital questions that remain are: (i) the precise conditions under which we can estimate a classifier's target-domain accuracy; and (ii) which methods are most practically useful. To begin, the straightforward way to assess the performance of a model under distribution shift would be to collect labeled (target domain) examples and then to evaluate the model on that data. However, collecting fresh labeled data from the target distribution is prohibitively expensive and time-consuming, especially if the target distribution is non-stationary. Hence, instead of using labeled data, we aim to use unlabeled data from the target distribution, that is comparatively abundant, to predict model performance. Note that in this work, our focus is not to improve performance on the target but, rather, to estimate the accuracy on the target for a given classifier.\n\n[^0]\n[^0]:    * Work done in part while Saurabh Garg was interning at Google\n    ${ }^{1}$ Code is available at https://github.com/saurabhgarg1996/ATC_code.",
            "images": [],
            "dimensions": {
                "dpi": 200,
                "height": 2200,
                "width": 1700
            }
        },
        {
            "index": 1,
            "markdown": "![img-0.jpeg](img-0.jpeg)\n\nFigure 1: Illustration of our proposed method ATC. Left: using source domain validation data, we identify a threshold on a score (e.g. negative entropy) computed on model confidence such that fraction of examples above the threshold matches the validation set accuracy. ATC estimates accuracy on unlabeled target data as the fraction of examples with the score above the threshold. Interestingly, this threshold yields accurate estimates on a wide set of target distributions resulting from natural and synthetic shifts. Right: Efficacy of ATC over previously proposed approaches on our testbed with a post-hoc calibrated model. To obtain errors on the same scale, we rescale all errors with Average Confidence (AC) error. Lower estimation error is better. See Table 1 for exact numbers and comparison on various types of distribution shift. See Sec. 5 for details