Using TensorFlow Serving
Date: 2019/05/10 Categories: Work Tags: tensorflow DeepLearningInference docker
Starting tensorflow-serving
Method 1: Using Docker
A prebuilt tensorflow-serving image is available on Docker Hub and can be run directly:
docker run -it --name servingInstance --net=host tensorflow/serving --help
If the IDC environment can access docker.oa.com, contact andyfei to join the nlpc group and then use docker.oa.com/nlpc/tensorflow-serving:1.12.0 directly.
Method 2: Using a prebuilt executable
wget http://10.254.99.102:8080/andyfei/bin/tensorflow_model_server-1.12.0-static -O tensorflow_model_server
chmod +x tensorflow_model_server
./tensorflow_model_server --help
This prints the help message:
usage: ./tensorflow_model_server
Flags:
--port=8500 int32 Port to listen on for gRPC API
--grpc_socket_path="" string If non-empty, listen to a UNIX socket for gRPC API on the given path. Can be either relative or absolute path.
--rest_api_port=0 int32 Port to listen on for HTTP/REST API. If set to zero HTTP/REST API will not be exported. This port must be different than the one specified in --port.
--rest_api_num_threads=192 int32 Number of threads for HTTP/REST API processing. If not set, will be auto set based on number of CPUs.
--rest_api_timeout_in_ms=30000 int32 Timeout for HTTP/REST API calls.
--enable_batching=false bool enable batching
--batching_parameters_file="" string If non-empty, read an ascii BatchingParameters protobuf from the supplied file name and use the contained values instead of the defaults.
--model_config_file="" string If non-empty, read an ascii ModelServerConfig protobuf from the supplied file name, and serve the models in that file. This config file can be used to specify multiple models to serve and other advanced parameters including non-default version policy. (If used, --model_name, --model_base_path are ignored.)
--model_name="default" string name of model (ignored if --model_config_file flag is set)
--model_base_path="" string path to export (ignored if --model_config_file flag is set, otherwise required)
--max_num_load_retries=5 int32 maximum number of times it retries loading a model after the first failure, before giving up. If set to 0, a load is attempted only once. Default: 5
--load_retry_interval_micros=60000000 int64 The interval, in microseconds, between each servable load retry. If set negative, it doesnt wait. Default: 1 minute
...
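For example, the standalone binary can serve a single SavedModel directly. This is a sketch: the ports match the defaults above, and the model path is illustrative, but it must be an absolute path pointing at the directory that contains the numbered version subdirectories.
./tensorflow_model_server \
    --port=8500 \
    --rest_api_port=8501 \
    --model_name=model \
    --model_base_path=$PWD/models/model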
Deploying a model
tensorflow-serving requires models exported in the SavedModel format. The following walks through deployment using Docker as an example.
By convention, the container's /models directory holds the models. Suppose there is only one model, with its exported SavedModel files stored under models/ in the current directory:
models/
models/model
models/model/1
models/model/1/saved_model.pb
models/model/1/assets.extra
models/model/1/assets.extra/tf_serving_warmup_requests
models/model/1/variables
models/model/1/variables/variables.data-00000-of-00001
models/model/1/variables/variables.index
Run the command:
docker run --rm --name=infer --net=host -v $PWD/models/:/models docker.oa.com/nlpc/tensorflow-serving:1.12.0
The output looks like:
2019-05-10 09:13:56.533113: I tensorflow_serving/model_servers/server.cc:82] Building single TensorFlow model file config: model_name: model model_base_path: /models/model
2019-05-10 09:13:56.533344: I tensorflow_serving/model_servers/server_core.cc:461] Adding/updating models.
2019-05-10 09:13:56.533367: I tensorflow_serving/model_servers/server_core.cc:558] (Re-)adding model: model
2019-05-10 09:13:56.633594: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 1}
2019-05-10 09:13:56.633635: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 1}
2019-05-10 09:13:56.633653: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 1}
2019-05-10 09:13:56.633680: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /models/model/1
2019-05-10 09:13:56.633698: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/model/1
2019-05-10 09:13:56.678675: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-05-10 09:13:56.878847: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:201] Restoring SavedModel bundle.
2019-05-10 09:13:57.578421: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle.
2019-05-10 09:13:57.839613: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:310] SavedModel load for tags { serve }; Status: success. Took 1205904 microseconds.
2019-05-10 09:13:58.385122: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:143] Finished reading warmup data for model at /models/model/1/assets.extra/tf_serving_warmup_requests. Number of warmup records read: 1.
2019-05-10 09:13:58.385356: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: model version: 1}
2019-05-10 09:13:58.414938: I tensorflow_serving/model_servers/server.cc:319] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2019-05-10 09:13:58.420906: I tensorflow_serving/model_servers/server.cc:339] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 237] RAW: Entering the event loop ...
To check whether the model is being served, run curl http://localhost:8501/v1/models/model. The output is:
{
"model_version_status": [
{
"version": "1",
"state": "AVAILABLE",
"status": {
"error_code": "OK",
"error_message": ""
}
}
]
}
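For scripted health checks, the same status endpoint can be polled from Python until the model reports AVAILABLE. A minimal sketch, assuming the default REST port 8501 and the model name model (the helper wait_until_available is mine, not part of tensorflow-serving):
import time
import requests

def wait_until_available(model_name, host='http://localhost:8501', timeout=60):
    # Poll the model status endpoint until some version reports AVAILABLE.
    deadline = time.time() + timeout
    url = '{}/v1/models/{}'.format(host, model_name)
    while time.time() < deadline:
        try:
            status = requests.get(url).json()
            states = [v['state'] for v in status.get('model_version_status', [])]
            if 'AVAILABLE' in states:
                return True
        except requests.ConnectionError:
            pass  # the server may still be starting up
        time.sleep(1)
    return False

print(wait_until_available('model'))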
To inspect the model's inputs, outputs, and other signature information, use the saved_model_cli tool that ships with TensorFlow:
$ saved_model_cli show --dir models/model/1 --all
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['segment_ids'] tensor_info:
dtype: DT_INT32
shape: (-1, 50)
name: segment_ids:0
inputs['token'] tensor_info:
dtype: DT_STRING
shape: (-1, 50)
name: token:0
The given SavedModel SignatureDef contains the following output(s):
outputs['score'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 1)
name: dense_1/Sigmoid:0
Method name is: tensorflow/serving/predict
We can see that the model takes two inputs, segment_ids and token, and produces one output, score; their dtypes and shapes are listed as well. This is a BERT-based model.
Accessing tensorflow serving via the REST API
To enable the HTTP/REST API, add --rest_api_port=8501 to the startup arguments.
Then the API can be tested with Python requests. Here BERT is used to compute the similarity of two sentences, which requires concatenating the two sentences and adding the appropriate padding.
import requests

def prepare(a, b, seg_length):
    # Split both sentences into characters, arrange them in BERT's
    # [CLS] a [SEP] b [SEP] layout, and pad the sequence to seg_length.
    a, b = map(list, [a, b])
    token = ['[CLS]'] + a + ['[SEP]'] + b + ['[SEP]']
    token += [''] * (seg_length - len(token))
    len_a = len(a) + 2
    len_b = len(token) - len_a
    segment_ids = [0] * len_a + [1] * len_b
    return token, segment_ids

def query(token, segment_ids):
    # Send a single example (batch size 1) to the REST predict endpoint.
    data = {
        'inputs': {
            'segment_ids': [segment_ids],
            'token': [token]
        }
    }
    return requests.post('http://127.0.0.1:8501/v1/models/model:predict', json=data).json()

def compare(a, b):
    token, segment_ids = prepare(a, b, 50)
    return query(token, segment_ids)

print(compare(u'中国最高山峰', u'中国的最高山峰'))
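Since both inputs have shape (-1, 50), several sentence pairs can also be scored in one request by stacking them along the batch dimension. A sketch reusing prepare() from above (compare_many is a hypothetical helper, not from the original code):
def compare_many(pairs):
    # Collect tokens and segment_ids for every pair into one batch.
    tokens, segment_ids = [], []
    for a, b in pairs:
        t, s = prepare(a, b, 50)
        tokens.append(t)
        segment_ids.append(s)
    data = {'inputs': {'segment_ids': segment_ids, 'token': tokens}}
    return requests.post('http://127.0.0.1:8501/v1/models/model:predict', json=data).json()

print(compare_many([(u'中国最高山峰', u'中国的最高山峰'), (u'最高山峰', u'天气预报')]))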
Serving multiple models
A single tensorflow-serving process can serve multiple models at once, each reachable at a different URL. For example, make a copy of the previous model and call it model2, so the directory now looks like:
models/
├── model
│ └── 1
│ ├── assets.extra
│ │ └── tf_serving_warmup_requests
│ ├── saved_model.pb
│ └── variables
│ ├── variables.data-00000-of-00001
│ └── variables.index
└── model2
└── 1
├── assets.extra
│ └── tf_serving_warmup_requests
├── saved_model.pb
└── variables
├── variables.data-00000-of-00001
└── variables.index
8 directories, 8 files
Write a model.json file and put it under /models/ with the following content (despite the .json name, this is the ascii ModelServerConfig protobuf format read by --model_config_file):
model_config_list: {
config: {
name: "original_model",
base_path: "/models/model",
model_platform: "tensorflow"
},
config: {
name: "copied_model",
base_path: "/models/model2",
model_platform: "tensorflow"
},
}
Then start the server with the command-line flag --model_config_file=/models/model.json.
The tensorflow-serving output is:
2019-05-10 09:46:07.919645: I tensorflow_serving/model_servers/server_core.cc:461] Adding/updating models.
2019-05-10 09:46:07.919726: I tensorflow_serving/model_servers/server_core.cc:558] (Re-)adding model: original_model
2019-05-10 09:46:07.919736: I tensorflow_serving/model_servers/server_core.cc:558] (Re-)adding model: copied_model
2019-05-10 09:46:08.019920: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: original_model version: 1}
2019-05-10 09:46:08.019954: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: original_model version: 1}
2019-05-10 09:46:08.019965: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: original_model version: 1}
2019-05-10 09:46:08.019987: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /models/model/1
2019-05-10 09:46:08.020001: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/model/1
2019-05-10 09:46:08.065693: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-05-10 09:46:08.119947: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: copied_model version: 1}
2019-05-10 09:46:08.119988: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: copied_model version: 1}
2019-05-10 09:46:08.120002: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: copied_model version: 1}
2019-05-10 09:46:08.120018: I external/org_tensorflow/tensorflow/contrib/session_bundle/bundle_shim.cc:363] Attempting to load native SavedModelBundle in bundle-shim from: /models/model2/1
2019-05-10 09:46:08.120028: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /models/model2/1
2019-05-10 09:46:08.172803: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
2019-05-10 09:46:08.309470: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:201] Restoring SavedModel bundle.
2019-05-10 09:46:08.409953: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:201] Restoring SavedModel bundle.
2019-05-10 09:46:09.078151: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle.
2019-05-10 09:46:09.140231: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:151] Running initialization op on SavedModel bundle.
2019-05-10 09:46:09.349474: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:310] SavedModel load for tags { serve }; Status: success. Took 1329464 microseconds.
2019-05-10 09:46:09.411524: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:310] SavedModel load for tags { serve }; Status: success. Took 1291495 microseconds.
2019-05-10 09:46:09.911524: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:143] Finished reading warmup data for model at /models/model/1/assets.extra/tf_serving_warmup_requests. Number of warmup records read: 1.
2019-05-10 09:46:09.911788: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: original_model version: 1}
2019-05-10 09:46:09.933021: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:143] Finished reading warmup data for model at /models/model2/1/assets.extra/tf_serving_warmup_requests. Number of warmup records read: 1.
2019-05-10 09:46:09.933238: I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: copied_model version: 1}
2019-05-10 09:46:09.967444: I tensorflow_serving/model_servers/server.cc:319] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2019-05-10 09:46:09.973453: I tensorflow_serving/model_servers/server.cc:339] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 237] RAW: Entering the event loop ...
Note that there are two SavedModel load for ... lines, one per model.
To call a specific model, simply change the URL in the client code to http://127.0.0.1:8501/v1/models/original_model:predict or http://127.0.0.1:8501/v1/models/copied_model:predict.
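A minimal sketch of parameterizing the earlier query() helper by model name (query_model is a hypothetical helper name, not from the original code):
def query_model(model_name, token, segment_ids):
    # The model name in the URL selects which servable handles the request.
    url = 'http://127.0.0.1:8501/v1/models/{}:predict'.format(model_name)
    data = {'inputs': {'segment_ids': [segment_ids], 'token': [token]}}
    return requests.post(url, json=data).json()

token, segment_ids = prepare(u'中国最高山峰', u'中国的最高山峰', 50)
print(query_model('original_model', token, segment_ids))
print(query_model('copied_model', token, segment_ids))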
Batching
When throughput matters more than latency, for example in offline batch processing, dynamic batching can be enabled to increase throughput. An example batching.conf is shown below; it sets the maximum time to wait for a batch to fill up to 5000 microseconds (5 ms):
max_batch_size { value: 1024 }
batch_timeout_micros { value: 5000 }
max_enqueued_batches { value: 1000 }
num_batch_threads { value: 24 }
Start the server with the additional command-line flags --enable_batching=true --batching_parameters_file=/models/batching.conf.
Summary
The final /models directory, with batching and multiple models enabled, looks like this:
models/
├── batching.conf
├── model
│ └── 1
│ ├── assets.extra
│ │ └── tf_serving_warmup_requests
│ ├── saved_model.pb
│ └── variables
│ ├── variables.data-00000-of-00001
│ └── variables.index
├── model2
│ └── 1
│ ├── assets.extra
│ │ └── tf_serving_warmup_requests
│ ├── saved_model.pb
│ └── variables
│ ├── variables.data-00000-of-00001
│ └── variables.index
├── model.json
└── models
9 directories, 10 files
The docker command to start it is:
docker run --rm --name=infer --net=host -v $PWD/models/:/models \
tensorflow/serving:1.12.0 \
--model_config_file=/models/model.json \
--enable_batching=true --batching_parameters_file=/models/batching.conf
When testing single requests, the 5 ms batching timeout consistently adds about 5 ms of latency.
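A rough way to observe this is to time single requests with and without batching enabled, reusing the compare() helper above (a sketch; absolute numbers depend on the model and machine):
import time

start = time.time()
for _ in range(100):
    compare(u'中国最高山峰', u'中国的最高山峰')
elapsed_ms = (time.time() - start) / 100 * 1000
print('average latency: %.1f ms per request' % elapsed_ms)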