[Don't review]Remote execution 2 - testing CI #803
base: release/26.02
Conversation
Introduce CPU/remote-safe solution handling, gRPC service definitions, and remote solve client/server plumbing to support remote execution and log streaming while retaining local solve behavior.
Include gRPC packages in dependency manifests and regenerate environment files from dependencies.yaml.
Ensure incumbent callbacks are retained for gRPC workers, switch incumbent job IDs to bytes in the proto, and harden remote solve CPU-only paths and build configuration defaults.
📝 Walkthrough

Adds comprehensive gRPC-based remote-solve support: protobuf/gRPC schemas, server and client implementations, a pluggable protobuf serializer, remote-solve C/C++/Python APIs, a hybrid GPU/CPU solution memory model, build/CI integration for gRPC/Protobuf, and developer docs/scripts. All changes are additive; no existing public declarations are removed.
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (2 warnings)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 7
Note
Due to the large number of review comments, only Critical and Major severity comments were posted inline.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
python/cuopt/cuopt/utilities/utils.py (1)
45-64: Pre-existing bug: operator precedence issue in boundary checks.

Lines 53 and 59 contain a precedence bug that predates this PR:

```python
if len(data.get_variable_lower_bounds() > 0):  # Wrong
```

This applies `> 0` element-wise to the array first, then takes `len()` of the resulting boolean array. The intent appears to be:

```python
if len(data.get_variable_lower_bounds()) > 0:  # Correct
```

While this is pre-existing code not modified in this PR, it may cause unexpected behavior in validation. Consider addressing in a follow-up.
cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp (2)
41-52: Missing null check on `PyObject_CallMethod` result before `Py_DECREF`.

If `PyObject_CallMethod` fails (e.g., a Python exception), it returns `NULL`. Calling `Py_DECREF(res)` on `NULL` (line 51) causes undefined behavior. The same applies to `numba_matrix` and `numpy_array` if their creation fails.

🐛 Proposed fix: Add null checks

```diff
 void get_solution(void* data, void* objective_value) override {
   PyObject* numba_matrix =
     data_on_device() ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
+  if (!numba_matrix) return;  // Python exception set
   PyObject* numpy_array =
     data_on_device() ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+  if (!numpy_array) {
+    Py_DECREF(numba_matrix);
+    return;
+  }
   PyObject* res = PyObject_CallMethod(
     this->pyCallbackClass, "get_solution", "(OO)", numba_matrix, numpy_array);
   Py_DECREF(numba_matrix);
   Py_DECREF(numpy_array);
-  Py_DECREF(res);
+  Py_XDECREF(res);  // Use XDECREF to handle NULL safely
 }
```
80-91: Same null check issue in `set_solution`.

Apply the same null-safety pattern here as recommended for `get_solution`.

🐛 Proposed fix

```diff
 void set_solution(void* data, void* objective_value) override {
   PyObject* numba_matrix =
     data_on_device() ? get_numba_matrix(data, n_variables) : get_numpy_array(data, n_variables);
+  if (!numba_matrix) return;
   PyObject* numpy_array =
     data_on_device() ? get_numba_matrix(objective_value, 1) : get_numpy_array(objective_value, 1);
+  if (!numpy_array) {
+    Py_DECREF(numba_matrix);
+    return;
+  }
   PyObject* res = PyObject_CallMethod(
     this->pyCallbackClass, "set_solution", "(OO)", numba_matrix, numpy_array);
   Py_DECREF(numba_matrix);
   Py_DECREF(numpy_array);
-  Py_DECREF(res);
+  Py_XDECREF(res);
 }
```

cpp/libmps_parser/CMakeLists.txt (1)
137-143: Static library missing BZip2/ZLIB include directories.

When `MPS_PARSER_WITH_BZIP2` or `MPS_PARSER_WITH_ZLIB` are enabled, only the shared library gets the include directories. The static library should also have these.

🔧 Proposed fix

```diff
 if(MPS_PARSER_WITH_BZIP2)
   target_include_directories(mps_parser PRIVATE BZip2::BZip2)
+  target_include_directories(mps_parser_static PRIVATE BZip2::BZip2)
 endif(MPS_PARSER_WITH_BZIP2)
 if(MPS_PARSER_WITH_ZLIB)
   target_include_directories(mps_parser PRIVATE ZLIB::ZLIB)
+  target_include_directories(mps_parser_static PRIVATE ZLIB::ZLIB)
 endif(MPS_PARSER_WITH_ZLIB)
```
🤖 Fix all issues with AI agents
In `@cpp/cuopt_remote_server.cpp`:
- Around line 303-313: generate_job_id currently declares static std::mt19937
gen and static std::uniform_int_distribution<uint64_t> dis which are shared
across threads and cause data races; modify generate_job_id to use thread-local
RNG state (e.g., make rd, gen, and dis thread_local) or protect the static
generators with a mutex so concurrent calls are safe; update the function
implementation around the symbols generate_job_id, gen, dis (and rd if present)
to apply one of these fixes and ensure generated IDs remain formatted the same.
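The thread-local variant suggested above can be sketched as follows. `generate_job_id` is a hypothetical stand-in: the real function's ID format is not shown in the review, so a 16-hex-digit ID is assumed here.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <random>
#include <sstream>
#include <string>

std::string generate_job_id()
{
  // thread_local gives each thread its own generator state, removing the
  // data race on the shared static generators without needing a mutex.
  thread_local std::random_device rd;
  thread_local std::mt19937 gen(rd());
  thread_local std::uniform_int_distribution<uint64_t> dis;

  // Format as a zero-padded 16-character hex string (assumed format).
  std::ostringstream oss;
  oss << std::hex << std::setw(16) << std::setfill('0') << dis(gen);
  return oss.str();
}
```

The trade-off versus a mutex is slightly more per-thread memory for the generator state in exchange for zero contention on the hot path.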
- Around line 289-297: The signal handler signal_handler currently uses
non-async-signal-safe operations (std::cout, accessing shm_ctrl, and
result_cv.notify_all()), so change it to only set an atomic flag (e.g., a global
volatile sig_atomic_t or std::atomic<bool> shutdown_signal_received) and set
keep_running = false if that is signal-safe; remove direct access to shm_ctrl
and result_cv from signal_handler. In the main accept loop or a dedicated
monitor thread, poll or wait on shutdown_signal_received and there perform the
non-safe actions: log the shutdown message (std::cout), set
shm_ctrl->shutdown_requested, and call result_cv.notify_all(). Update references
in code to use shutdown_signal_received where signal_handler used to toggle
shutdown behavior.
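A minimal sketch of the signal-safe pattern described above: the handler only sets a flag, and the non-safe work runs later from normal code. `shutdown_signal_received` follows the review's naming; `check_shutdown` is a hypothetical stand-in for the server's accept/monitor loop, and a lock-free `std::atomic<bool>` is assumed.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstdio>

std::atomic<bool> shutdown_signal_received{false};

extern "C" void signal_handler(int)
{
  // Only async-signal-safe work in the handler: set the flag and return.
  shutdown_signal_received.store(true);
}

// Called from the main accept loop (or a dedicated monitor thread), where
// the non-signal-safe shutdown work is safe to perform.
void check_shutdown()
{
  if (shutdown_signal_received.load()) {
    std::printf("Received shutdown signal, shutting down...\n");
    // Here the real server would set shm_ctrl->shutdown_requested and
    // call result_cv.notify_all() (hypothetical names from the review).
  }
}
```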
- Around line 548-562: The current loop that finds a free slot in result_queue
sets fields non-atomically and races on result_queue[i].ready; replace this
linear check-and-write with an atomic claim using an atomic flag and
compare_exchange_strong (the same pattern used in the job-claiming code path) so
only one worker can win the slot before writing other fields. Concretely, change
the probe that tests result_queue[i].ready into an atomic compare-and-set (e.g.,
attempt to swap from false to true using std::atomic<bool> or an integer flag),
and only on successful compare_exchange_strong proceed to populate job_id,
status, data_size, shm_data_name, worker_index, error_message, retrieved, etc.;
apply the same fix to all other result_queue allocation sites referenced in the
review (the blocks around the given snippet and at the other line ranges noted)
to ensure atomic slot claiming.
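The compare-and-set slot claiming can be sketched as below; `result_slot_t` and `claim_result_slot` are hypothetical stand-ins for the real `result_queue` entry and its allocation sites.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>

struct result_slot_t {
  std::atomic<bool> claimed{false};  // replaces the racy plain `ready` probe
  uint64_t job_id{};
  int status{};
  std::string error_message;
};

constexpr int kQueueSize = 8;
std::array<result_slot_t, kQueueSize> result_queue;

// Returns the claimed slot index, or -1 if the queue is full.
int claim_result_slot(uint64_t job_id, int status)
{
  for (int i = 0; i < kQueueSize; ++i) {
    bool expected = false;
    // Only the thread that flips `claimed` from false to true may write
    // the remaining fields, so concurrent callers cannot pick one slot.
    if (result_queue[i].claimed.compare_exchange_strong(expected, true)) {
      result_queue[i].job_id = job_id;
      result_queue[i].status = status;
      return i;
    }
  }
  return -1;
}
```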
In `@cpp/src/linear_programming/solver_solution.cu`:
- Around line 442-476: The host accessor methods of
optimization_problem_solution_t (get_primal_solution_host,
get_dual_solution_host, get_reduced_cost_host and their const overloads)
dereference unique_ptr members without checking is_device_memory_; add a runtime
guard to prevent null dereference by asserting or throwing when
is_device_memory_ is true or when the corresponding unique_ptr
(primal_solution_host_, dual_solution_host_, reduced_cost_host_) is null; update
each accessor to check is_device_memory_ (or pointer) and produce a clear error
(e.g., assert/throw with a message) before returning *primal_solution_host_,
*dual_solution_host_, or *reduced_cost_host_.
- Around line 411-421: The device accessor methods (get_primal_solution(),
get_dual_solution(), get_reduced_cost()) currently dereference the unique_ptr
members (e.g., primal_solution_) unconditionally and can dereference null when
is_device_memory_ is false; update each accessor to guard against null by
asserting or throwing a clear precondition (e.g., check is_device_memory_ or
that the unique_ptr is non-null) before dereferencing, and document the behavior
(or return/throw) so callers must call is_device_memory_ first when appropriate;
make the change inside the methods referenced above to validate
primal_solution_/dual_solution_/reduced_cost_ before returning *ptr.
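A sketch of the guarded-accessor pattern, with `solution_t` as a hypothetical stand-in for `optimization_problem_solution_t` and `std::vector` replacing the device/host buffers:

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <vector>

class solution_t {
 public:
  explicit solution_t(bool on_device) : is_device_memory_(on_device)
  {
    if (on_device) {
      primal_solution_ = std::make_unique<std::vector<double>>();
    } else {
      primal_solution_host_ = std::make_unique<std::vector<double>>();
    }
  }

  std::vector<double>& get_primal_solution()
  {
    // Guard before dereferencing: fail loudly instead of dereferencing null.
    if (!is_device_memory_ || !primal_solution_) {
      throw std::runtime_error(
        "get_primal_solution() called on a host-memory solution; "
        "use get_primal_solution_host() instead");
    }
    return *primal_solution_;
  }

  std::vector<double>& get_primal_solution_host()
  {
    if (is_device_memory_ || !primal_solution_host_) {
      throw std::runtime_error(
        "get_primal_solution_host() called on a device-memory solution");
    }
    return *primal_solution_host_;
  }

 private:
  bool is_device_memory_;
  std::unique_ptr<std::vector<double>> primal_solution_;       // device-side stand-in
  std::unique_ptr<std::vector<double>> primal_solution_host_;  // host-side stand-in
};
```

The same guard would be repeated for the dual-solution and reduced-cost accessors.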
In `@cpp/src/linear_programming/utilities/protobuf_serializer.cu`:
- Around line 1303-1334: lp_solution_to_proto currently calls cudaMemcpy on
device buffers (d_primal, d_dual, d_reduced_cost via
nc_solution.get_reduced_cost()) without checking for CUDA errors; update each
cudaMemcpy call to capture and check the returned cudaError_t and handle
failures (e.g., log via processLogger/return an error/throw) so serialization
reports failures instead of silently continuing. Locate the cudaMemcpy usages in
lp_solution_to_proto (references: d_primal, d_dual,
nc_solution/get_reduced_cost, pb_solution) and wrap each call by storing the
result (cudaError_t err = cudaMemcpy(...)); if err != cudaSuccess, convert the
error to a clear error path (return an error code or throw an exception or call
the existing logger and abort the proto conversion) so callers receive the
failure.
In `@docs/developer/SERIALIZATION_PLUGIN_GUIDE.md`:
- Around line 570-573: The doc incorrectly points to
`cpp/src/linear_programming/utilities/serializers/msgpack_serializer.cpp` and
its `CMakeLists.txt` as a working MsgPack C++ example; remove or replace that
reference and instead explain that MsgPack serialization is implemented in the
Python client `cuopt-sh-client` using the `msgpack` and `msgpack-numpy`
libraries, or link to an actual C++ serializer that exists under
`cpp/src/linear_programming/utilities/serializers/` if you want a C++ example;
update the sentence and any bullets to reference `cuopt-sh-client` and the
Python libraries (`msgpack`, `msgpack-numpy`) or the correct C++ serializer file
name accordingly.
🟡 Minor comments (16)
python/cuopt/cuopt/linear_programming/internals/internals.pyx-52-56 (1)
52-56: Validate `typestr` and `shape` to avoid mis-sized buffers.

The current `else` fallback silently treats any non-`float32` type as `float64`. If a different typestr slips through, this can create a wrong-sized view and read beyond the buffer. Consider explicit validation and a fast-fail.

Proposed fix

```diff
 def get_numpy_array(self, data, shape, typestr):
-    ctype = ctypes.c_float if typestr == "float32" else ctypes.c_double
+    dtype_map = {"float32": ctypes.c_float, "float64": ctypes.c_double}
+    if not isinstance(shape, int) or shape < 0:
+        raise ValueError("shape must be a non-negative int")
+    try:
+        ctype = dtype_map[typestr]
+    except KeyError:
+        raise ValueError(f"Unsupported typestr: {typestr}")
     buf_type = ctype * shape
     buf = buf_type.from_address(data)
     numpy_array = np.ctypeslib.as_array(buf)
     return numpy_array
```

GRPC_BUILD_STATUS.md-169-171 (1)
169-171: Clarify the branch reference to avoid confusion.
The doc names `grpc-implementation`, but this PR is `remote_execution_2`. Consider updating or removing the branch line to keep this status doc accurate.

🔧 Suggested edit

```diff
-**Branch**: `grpc-implementation` (3 commits, ready for review)
+**Branch**: `remote_execution_2` (this PR)
```

cpp/libmps_parser/CMakeLists.txt-81-85 (1)
81-85: Static library `mps_parser_static` missing target properties and compile options.

The static library is created but doesn't inherit the same `set_target_properties` and `target_compile_options` applied to the shared library (lines 87-100). This could cause ABI mismatches or compilation issues when linking.

🔧 Proposed fix: Apply same properties to static library

```diff
+set_target_properties(mps_parser_static
+  PROPERTIES
+    CXX_STANDARD 20
+    CXX_STANDARD_REQUIRED ON
+    POSITION_INDEPENDENT_CODE ON
+    CXX_SCAN_FOR_MODULES OFF
+)
+
+target_compile_options(mps_parser_static
+  PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
+)
```

GRPC_PROTOTYPE_STATUS.md-113-116 (1)
113-116: Remove hardcoded local development path.

The build instructions contain a developer-specific local path (`/home/tmckay/repos/nvidia-cuopt`). This should be replaced with a generic placeholder or relative path for public documentation.

Suggested fix

```diff
 ### Prerequisites
 # Ensure gRPC is available in conda environment
 conda install grpc-cpp libgrpc -c conda-forge

 # Build with gRPC enabled
-cd /home/tmckay/repos/nvidia-cuopt
+cd <path-to-cuopt-repo>
 ./build.sh  # or your build command with -DBUILD_REMOTE_SERVER=ON
```

docs/developer/SERIALIZATION_PLUGIN_GUIDE.md-182-197 (1)

182-197: Incomplete code example - missing return statement.

The `deserialize_lp_solution` example has commented-out error handling but no actual return statement for the success path. This could confuse plugin developers.

📝 Suggested fix

```diff
 optimization_problem_solution_t<i_t, f_t> deserialize_lp_solution(
   const std::vector<uint8_t>& data) override {
   // Parse the solution data
   // Create and return solution object
+  optimization_problem_solution_t<i_t, f_t> solution;
+  // ... populate solution from data ...
+  return solution;
   // On error, return error solution:
   // return optimization_problem_solution_t<i_t, f_t>(
   //   cuopt::logic_error("Parse error", cuopt::error_type_t::RuntimeError));
 }
```

cpp/cuopt_cancel_job.cpp-74-74 (1)
74-74: `std::atoi` silently returns 0 on invalid input.

If `CUOPT_REMOTE_PORT` contains non-numeric text, `atoi` returns 0 without indication of error, which would cause connection failure to port 0.

🔧 Suggested improvement

```diff
-  if (env_port && env_port[0]) { port = std::atoi(env_port); }
+  if (env_port && env_port[0]) {
+    try {
+      port = std::stoi(env_port);
+    } catch (const std::exception&) {
+      std::cerr << "Warning: Invalid CUOPT_REMOTE_PORT value, using default\n";
+    }
+  }
```

cpp/cuopt_cancel_job.cpp-82-83 (1)
82-83: `std::stoi` can throw on invalid input - consider adding error handling.

If a user passes an invalid port string (e.g., `-p abc`), `std::stoi` will throw `std::invalid_argument` or `std::out_of_range`, causing an unhandled exception crash.

🔧 Suggested fix

```diff
     } else if (strcmp(argv[i], "-p") == 0 && i + 1 < argc) {
-      port = std::stoi(argv[++i]);
+      try {
+        port = std::stoi(argv[++i]);
+      } catch (const std::exception& e) {
+        std::cerr << "Error: Invalid port number\n";
+        return 1;
+      }
     } else if (argv[i][0] != '-') {
```

python/cuopt/cuopt/linear_programming/remote.py-141-173 (1)
141-173: Socket may not be closed on all exception paths.

If an exception occurs after `sock.connect()` but before the explicit `sock.close()` calls, the socket will leak. Consider using a context manager or `try/finally`.

🔧 Suggested fix using context manager pattern

```diff
     try:
         # Connect to server
-        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-        sock.settimeout(30.0)  # 30 second timeout
-        sock.connect((host, port))
-
-        # Send request (length-prefixed)
-        data = request.SerializeToString()
-        sock.sendall(struct.pack("<Q", len(data)))
-        sock.sendall(data)
-
-        # Receive response
-        size_data = b""
-        while len(size_data) < 8:
-            chunk = sock.recv(8 - len(size_data))
-            if not chunk:
-                sock.close()
-                return CancelResult(...)
-            size_data += chunk
-        ...
-        sock.close()
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
+            sock.settimeout(30.0)  # 30 second timeout
+            sock.connect((host, port))
+
+            # Send request (length-prefixed)
+            data = request.SerializeToString()
+            sock.sendall(struct.pack("<Q", len(data)))
+            sock.sendall(data)
+
+            # Receive response
+            size_data = b""
+            while len(size_data) < 8:
+                chunk = sock.recv(8 - len(size_data))
+                if not chunk:
+                    return CancelResult(...)
+                size_data += chunk
+            ...
```

cpp/src/linear_programming/utilities/cuopt_remote_service.proto-210-221 (1)
210-221: Inconsistent `job_id` type: `bytes` vs `string`.

The `IncumbentRequest.job_id` and `Incumbent.job_id` fields use `bytes` type (lines 211, 220), while all other request messages use `string` type for `job_id` (e.g., `StatusRequest.job_id` at line 155, `GetResultRequest.job_id` at line 160).

This inconsistency could lead to encoding/decoding issues or require special handling in client/server code. If job IDs are always human-readable strings (like UUIDs), consider using `string` consistently:

```diff
 message IncumbentRequest {
-  bytes job_id = 1;
+  string job_id = 1;
   int64 from_index = 2;  // Return incumbents starting from this index
   int32 max_count = 3;   // Optional limit (0 or negative => no limit)
 }

 message Incumbent {
   int64 index = 1;
   double objective = 2;
   repeated double assignment = 3;
-  bytes job_id = 4;
+  string job_id = 4;
 }
```

cpp/src/linear_programming/utilities/remote_solve.cu-197-202 (1)
197-202: Log streaming thread captures `stop_logs` by reference with `volatile` qualifier.

The `volatile bool stop_logs` pattern is used to signal the log thread to terminate. While `volatile` prevents compiler optimizations, it does not guarantee memory visibility across threads on modern architectures. Consider using `std::atomic<bool>` for proper cross-thread synchronization:

🔒️ Suggested fix

```diff
-  volatile bool stop_logs = false;
+  std::atomic<bool> stop_logs{false};
   std::thread log_thread;
   if (settings.log_to_console) {
     log_thread =
-      std::thread([&]() { grpc_remote::stream_logs_to_stdout(address, job_id, &stop_logs, ""); });
+      std::thread([&]() { grpc_remote::stream_logs_to_stdout(address, job_id, stop_logs, ""); });
   }
```

Note: This requires updating the `stream_logs_to_stdout` signature to accept `std::atomic<bool>&` or `std::atomic<bool>*`.

cpp/src/linear_programming/utilities/remote_solve.cu-38-47 (1)
38-47: Static variable `logged_no_device` is not thread-safe.

The `static bool logged_no_device` variable can cause a data race if `copy_incumbent_to_device` is called concurrently from multiple threads. While the race condition only affects logging (not correctness), it violates thread-safety expectations.

Consider using `std::atomic<bool>` or accepting duplicate log messages:

🔒️ Suggested fix

```diff
   int device_count = 0;
   cudaError_t device_err = cudaGetDeviceCount(&device_count);
   if (device_err != cudaSuccess || device_count == 0) {
-    static bool logged_no_device = false;
+    static std::atomic<bool> logged_no_device{false};
-    if (!logged_no_device) {
+    bool expected = false;
+    if (logged_no_device.compare_exchange_strong(expected, true)) {
       CUOPT_LOG_INFO("[remote_solve] No CUDA device available; using host incumbents");
-      logged_no_device = true;
     }
     return false;
   }
```

cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp-44-51 (1)
44-51: Missing port range validation and broad exception catch.

The port parsing has two issues:

- No port range validation: Valid TCP ports are 1-65535. A value like `99999` or `0` would be accepted.
- Broad exception catch: `catch (...)` silently ignores all exceptions, including `std::out_of_range` (for values outside int range). Consider catching specific exceptions and optionally logging.

🛠️ Suggested fix

```diff
   if (host != nullptr && port != nullptr && host[0] != '\0' && port[0] != '\0') {
     try {
       int port_num = std::stoi(port);
+      if (port_num < 1 || port_num > 65535) {
+        return std::nullopt;  // Invalid port range
+      }
       return remote_solve_config_t{std::string(host), port_num};
-    } catch (...) {
-      // Invalid port number, fall back to local solve
+    } catch (const std::invalid_argument&) {
+      // Non-numeric port value
+      return std::nullopt;
+    } catch (const std::out_of_range&) {
+      // Port value out of int range
       return std::nullopt;
     }
   }
```

cpp/src/mip/solve.cu-486-494 (1)
486-494: Missing CUDA error checking on `cudaMemcpyAsync` and `cudaStreamSynchronize`.

Per coding guidelines, CUDA memory operations should have error checking. The `cudaMemcpyAsync` and `cudaStreamSynchronize` calls don't verify success:

🛠️ Suggested fix

```diff
   // Variable types need special handling (char array)
   auto var_types_span = gpu_view.get_variable_types();
   if (var_types_span.size() > 0) {
     cpu_data.variable_types.resize(var_types_span.size());
-    cudaMemcpyAsync(cpu_data.variable_types.data(),
-                    var_types_span.data(),
-                    var_types_span.size() * sizeof(char),
-                    cudaMemcpyDeviceToHost,
-                    stream);
+    cudaError_t err = cudaMemcpyAsync(cpu_data.variable_types.data(),
+                                      var_types_span.data(),
+                                      var_types_span.size() * sizeof(char),
+                                      cudaMemcpyDeviceToHost,
+                                      stream);
+    if (err != cudaSuccess) {
+      CUOPT_LOG_ERROR("Failed to copy variable types from device: %s", cudaGetErrorString(err));
+    }
   }

   // Synchronize to ensure all copies are complete
-  cudaStreamSynchronize(stream);
+  cudaError_t sync_err = cudaStreamSynchronize(stream);
+  if (sync_err != cudaSuccess) {
+    CUOPT_LOG_ERROR("Stream synchronization failed: %s", cudaGetErrorString(sync_err));
+  }
```

cpp/src/linear_programming/utilities/cython_solve.cu-241-305 (1)
241-305: Preserve `solved_by_pdlp` from remote results.

The remote LP path hard-codes `solved_by_pdlp_ = false`. Use `term_info.solved_by_pdlp` so callers see the actual solver used remotely.

🛠️ Proposed fix

```diff
-  lp_ret.solved_by_pdlp_ = false;
+  lp_ret.solved_by_pdlp_ = term_info.solved_by_pdlp;
```

cpp/src/linear_programming/utilities/protobuf_serializer.cu-702-719 (1)
702-719: Silent failure when embedding solution in result response.

When `success` is true but `ParseFromArray` fails (lines 707, 712), the response is created with `SUCCESS` status but without solution data. The client will receive a success response with missing data.

🐛 Recommended fix

```diff
   if (is_mip) {
     cuopt::remote::MIPSolution mip_sol;
     if (mip_sol.ParseFromArray(result_data.data(), result_data.size())) {
       result->mutable_mip_solution()->CopyFrom(mip_sol);
+    } else {
+      result->set_status(cuopt::remote::ERROR_INTERNAL);
+      result->set_error_message("Failed to parse MIP solution");
     }
   } else {
     cuopt::remote::LPSolution lp_sol;
     if (lp_sol.ParseFromArray(result_data.data(), result_data.size())) {
       result->mutable_lp_solution()->CopyFrom(lp_sol);
+    } else {
+      result->set_status(cuopt::remote::ERROR_INTERNAL);
+      result->set_error_message("Failed to parse LP solution");
     }
   }
```

cpp/src/linear_programming/solver_solution.cu-576-580 (1)
576-580: Inconsistent `solved_by_pdlp` state management.

`get_solved_by_pdlp()` returns the standalone `solved_by_pdlp_` member, but `write_additional_termination_statistics_to_file()` (line 234) uses `termination_stats_.solved_by_pdlp`. These can become out of sync.

🐛 Recommended fix: Use single source of truth

```diff
 template <typename i_t, typename f_t>
 bool optimization_problem_solution_t<i_t, f_t>::get_solved_by_pdlp() const
 {
-  return solved_by_pdlp_;
+  return termination_stats_.solved_by_pdlp;
 }

 template <typename i_t, typename f_t>
 void optimization_problem_solution_t<i_t, f_t>::set_solved_by_pdlp(bool value)
 {
-  solved_by_pdlp_ = value;
+  termination_stats_.solved_by_pdlp = value;
 }
```

Then remove the `solved_by_pdlp_` member from the class.

Also applies to: 622-626
🧹 Nitpick comments (29)
python/cuopt/cuopt/utilities/__init__.py (1)
15-26: Cache lazy imports after first access.

The current implementation triggers `__getattr__` on every access. Standard practice is to cache the imported symbol in the module namespace so subsequent accesses bypass `__getattr__`.

♻️ Suggested improvement

```diff
 def __getattr__(name):
     """Lazy import CUDA-dependent utilities."""
     if name == "type_cast":
         from cuopt.utilities.type_casting import type_cast
+        globals()[name] = type_cast
         return type_cast
     elif name == "series_from_buf":
         from cuopt.utilities.utils import series_from_buf
+        globals()[name] = series_from_buf
         return series_from_buf
     elif name == "check_solution":
         from cuopt.utilities.utils import check_solution
+        globals()[name] = check_solution
         return check_solution
     raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

python/cuopt/cuopt/__init__.py (1)
19-24: Consider caching the imported module to avoid repeated `__getattr__` overhead.

Without caching, every attribute access (e.g., `cuopt.linear_programming`) invokes `__getattr__` and `importlib.import_module`. While `sys.modules` caches the module itself, the lookup overhead remains. The standard PEP 562 pattern caches in the module namespace:

♻️ Suggested improvement

```diff
 def __getattr__(name):
     """Lazy import submodules to support CPU-only hosts with remote solve."""
     if name in _submodules:
         import importlib
-        return importlib.import_module(f"cuopt.{name}")
+        module = importlib.import_module(f"cuopt.{name}")
+        globals()[name] = module
+        return module
     raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```

cpp/libmps_parser/include/mps_parser/data_model_view.hpp (1)
409-426: Update parameter docs to reflect host/device flexibility.
Now that the view can represent CPU memory, the existing setter docs still say “Device memory pointer.” Consider updating the Doxygen text to “host or device” and referencing `set_is_device_memory()` for clarity.

python/libcuopt/CMakeLists.txt (1)
47-49: `BUILD_SHARED_LIBS ON` has global scope and may affect other dependencies.

Setting `BUILD_SHARED_LIBS ON` as a cache variable affects all subsequent `add_library` calls without explicit type, including dependencies fetched after this point. Consider scoping this more narrowly or documenting the intentional global effect.

♻️ Alternative: Use gRPC-specific variable if available

If gRPC respects `gRPC_BUILD_SHARED_LIBS`, use that instead:

```diff
-set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
+# gRPC will build shared libs; avoid affecting other targets
```

Otherwise, document that this is intentional for the wheel build context.
cpp/scripts/run_dev_server.sh (1)
1-24: Add shell safety options and handle empty `LD_LIBRARY_PATH`.

The script would benefit from shell safety options and proper handling of an unset `LD_LIBRARY_PATH`:

♻️ Suggested improvements

```diff
 #!/bin/bash
 # SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0

+set -euo pipefail
+
 # Development script to run the cuopt remote server with the correct library path
 # This is only needed during development when the build directory's libcuopt.so
 # needs to take precedence over the conda-installed version.

 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 BUILD_DIR="${SCRIPT_DIR}/../build"

 if [ ! -f "${BUILD_DIR}/cuopt_remote_server" ]; then
   echo "Error: cuopt_remote_server not found. Please build with -DBUILD_REMOTE_SERVER=ON"
   exit 1
 fi

-export LD_LIBRARY_PATH="${BUILD_DIR}:${LD_LIBRARY_PATH}"
+export LD_LIBRARY_PATH="${BUILD_DIR}${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

 echo "Starting cuopt remote server..."
 echo "Build directory: ${BUILD_DIR}"
```

The `${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}` syntax appends `:$LD_LIBRARY_PATH` only if the variable is set and non-empty, avoiding a trailing colon.

ci/utils/install_protobuf_grpc.sh (1)
22-24: Silent failure on gRPC installation may cause confusing downstream errors.

The fallback chain ends with `|| true`, which silently swallows installation failures. If gRPC packages aren't available, the build will fail later with a less obvious error about missing headers/libraries.

Consider adding a verification step or at least a warning:

Suggested improvement

```diff
 # gRPC C++ (headers/libs + grpc_cpp_plugin for codegen)
 # Package names can vary by repo; try the common ones first.
-  dnf install -y grpc-devel grpc-plugins || dnf install -y grpc-devel || true
+  dnf install -y grpc-devel grpc-plugins || dnf install -y grpc-devel || {
+    echo "WARNING: gRPC packages not found. gRPC-enabled builds may fail."
+  }
```

GRPC_ARCHITECTURE.md (2)
227-233: Clarify TLS/authentication timeline for production deployments.

The security section mentions TLS support "can enable later." For production deployments handling optimization data, consider documenting when TLS should be enabled (e.g., "TLS is recommended for any non-localhost deployment") to guide users on security posture.
183-188: Update roadmap to reflect current progress.

Phase 1 shows the branch creation as unchecked, but this PR indicates the implementation is already in progress. Consider updating the checklist to reflect actual status.
cpp/cuopt_cancel_job.cpp (1)
80-81: Option `-h` conflicts with common help convention.

Using `-h` for host conflicts with the common Unix convention where `-h` often means help. Consider using `--host` or `-H` instead, while keeping `-h` as an alias for `--help`.
22-28: Hardcoded development path is fragile.

The hardcoded relative path to `cpp/build` is development-specific and may not exist in production installations. This could cause import failures or silently fail to find the protobuf module.

♻️ Suggested improvement

Consider using a more robust approach or removing this entirely for production code:

```diff
-    # Add build directory for development
-    build_path = os.path.join(
-        os.path.dirname(__file__), "..", "..", "..", "..", "..", "cpp", "build"
-    )
-    if os.path.exists(build_path):
-        sys.path.insert(0, os.path.abspath(build_path))
+    # Development: set PYTHONPATH to include cpp/build if needed
     import cuopt_remote_pb2 as pb
```

Alternatively, wrap this in an `if "CUOPT_DEV_MODE" in os.environ` check.

build.sh (3)
454-495: CMake configuration duplicated from libcuopt block.

The cmake configuration (lines 461-483) is nearly identical to the libcuopt block (lines 425-446). This duplication increases maintenance burden and risks configuration drift.

Consider extracting shared cmake options into a variable or function:

```bash
# Example: Define common cmake options once
COMMON_CMAKE_OPTS=(
  -DDEFINE_ASSERT=${DEFINE_ASSERT}
  -DDEFINE_BENCHMARK="${DEFINE_BENCHMARK}"
  # ... other shared options
)

# Then use in both blocks:
cmake "${COMMON_CMAKE_OPTS[@]}" -DCUOPT_ENABLE_GRPC=ON "${REPODIR}"/cpp
```
517-531: SKBUILD_EXTRA_ARGS processing logic is duplicated.

The same semicolon-escaping and joining logic for `SKBUILD_EXTRA_ARGS` appears twice (for `cuopt` and `cuopt_mps_parser`). Consider extracting this into a shell function to reduce duplication:

♻️ Suggested refactor

```bash
# Define once before the build blocks:
function build_skbuild_cmake_args {
  local extra_args=()
  for extra_arg in "${EXTRA_CMAKE_ARGS[@]}"; do
    extra_args+=("${extra_arg//;/\\;}")
  done
  local joined=""
  if [ ${#extra_args[@]} -gt 0 ]; then
    joined="$(IFS=';'; echo "${extra_args[*]}")"
  fi
  local base="-DCMAKE_PREFIX_PATH=${INSTALL_PREFIX};-DCMAKE_LIBRARY_PATH=${LIBCUOPT_BUILD_DIR};-DCMAKE_CUDA_ARCHITECTURES=${CUOPT_CMAKE_CUDA_ARCHITECTURES}"
  if [ -n "$joined" ]; then
    echo "${base};${joined}"
  else
    echo "${base}"
  fi
}

# Then in each build block:
SKBUILD_CMAKE_ARGS="$(build_skbuild_cmake_args)"
```

Also applies to: 538-550
418-424: Cache invalidation on ZLIB path changes is a pragmatic workaround.

Automatically clearing `CMakeCache.txt` when it references system ZLIB is a reasonable fix to avoid stale cache issues. However, this could be surprising to developers who expect their cache to persist.

Consider adding a log message when the cache is cleared to improve visibility:

```diff
 if grep -Eq "/(usr/)?lib64/cmake/ZLIB" "${LIBCUOPT_BUILD_DIR}/CMakeCache.txt"; then
+  echo "CMake cache references system ZLIB; clearing cache to use conda ZLIB..."
   rm -f "${LIBCUOPT_BUILD_DIR}/CMakeCache.txt"
   rm -rf "${LIBCUOPT_BUILD_DIR}/CMakeFiles"
 fi
```

cpp/src/linear_programming/utilities/remote_solve.cu (1)
56-84: CUDA error handling uses raw return code checks instead of CUDA_CHECK macro.

Per coding guidelines, CUDA kernel launches and memory operations should use `CUDA_CHECK` or equivalent verification. The current implementation manually checks return codes, which is correct but inconsistent with the project's error handling patterns.

Consider using the project's standard CUDA error checking macros for consistency:

```diff
-  if (cudaMalloc(reinterpret_cast<void**>(d_assignment_out), n * sizeof(f_t)) != cudaSuccess) {
+  cudaError_t err = cudaMalloc(reinterpret_cast<void**>(d_assignment_out), n * sizeof(f_t));
+  if (err != cudaSuccess) {
     CUOPT_LOG_WARN("[remote_solve] Failed to cudaMalloc for incumbent assignment");
     return false;
   }
```
cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp (1)

`25-28`: Consider using `uint16_t` for the port field.

TCP ports are unsigned 16-bit values (1-65535). Using `uint16_t` would make the valid range self-documenting:

```diff
 struct remote_solve_config_t {
   std::string host;
-  int port;
+  uint16_t port;
 };
```

This would require including `<cstdint>` and adjusting the parsing logic slightly.

cpp/cuopt_remote_server.cpp (2)
`170-182`: Potential race condition in the `stdout_streamer_t` destructor.

The `running_` flag is set to `false` after `fflush(stdout)`, but the reader thread checks `running_` in a loop. The close of `pipe_fds_[0]` is the actual mechanism that unblocks the `read()` call, so the logic works, but the flag assignment order could be clearer.

Consider moving `running_ = false` before any cleanup to signal intent:

```diff
 ~stdout_streamer_t() {
   if (!enabled_) return;
+  running_ = false;  // Signal thread to stop first
   fflush(stdout);
   dup2(original_stdout_, STDOUT_FILENO);
   close(original_stdout_);
-  running_ = false;
   close(pipe_fds_[0]);
   if (reader_thread_.joinable()) {
     reader_thread_.join();
   }
 }
```
`2272-2274`: Detached client threads may cause shutdown issues.

Using `detach()` for client handler threads means they cannot be joined during shutdown. If shutdown occurs while clients are being processed, these threads may access freed resources (e.g., shared memory after `cleanup_shared_memory()`).

Consider using a thread pool or tracking active threads:

```cpp
// Alternative: Use a bounded thread pool or track threads
std::vector<std::thread> client_threads;
// ... in accept loop:
client_threads.emplace_back([client_fd]() {
  handle_client(client_fd, config.stream_logs);
});
// ... during shutdown, join all threads
```

python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx (1)
`488-499`: Empty warm-start arrays for the CPU/remote solve path.

Returning empty arrays for warm-start data when data is not on GPU is a reasonable fallback. Consider adding a comment explaining that warm-start is not supported for remote solves.

```diff
         else:
-            # CPU/remote solve - no warm start data available
+            # CPU/remote solve - warm-start data is not available/supported
+            # (PDLP warm-start requires GPU-resident iterates)
             current_primal_solution = np.array([], dtype=np.float64)
```

cpp/src/linear_programming/utilities/cuopt_remote.proto (1)
`104-136`: Important: document the iteration_limit sentinel value.

The comment on lines 107-109 about using `-1` as "unset" is crucial since proto3 defaults numeric fields to 0. This convention should be consistently applied and documented in client implementations.

Consider adding a similar comment for `time_limit` if 0.0 should mean "no limit" vs "instant timeout".

cpp/src/linear_programming/cuopt_c.cpp (1)
`378-477`: Consider extracting shared CPU/GPU population helpers.

The remote/local setup in `cuOptCreateProblem`, `cuOptCreateRangedProblem`, `cuOptCreateQuadraticProblem`, and `cuOptCreateQuadraticRangedProblem` is nearly identical. A small helper to populate `problem_cpu_data_t` and create the view would reduce drift and future bugs. Based on learnings, consider extracting the common population logic.

cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp (2)
`274-306`: Host accessor methods lack null-safety documentation.

The host accessors (`get_primal_solution_host()`, etc.) dereference `unique_ptr` members that may be null when `is_device_memory()` returns true. The documentation states "Only valid when is_device_memory() returns false", but consider adding `[[nodiscard]]` and/or assertions in the implementation to help catch misuse.

`397-401`: Potential inconsistency: duplicate `solved_by_pdlp` storage.

The class has both a standalone `solved_by_pdlp_` member (line 509) and `termination_stats_.solved_by_pdlp` (in the struct at line 93). This duplication could lead to inconsistencies. Consider using only one source of truth: either delegate to `termination_stats_.solved_by_pdlp` or remove the duplicate.

Also applies to: 508-509
cpp/src/linear_programming/solver_solution.cu (1)

`508-527`: Inconsistent state possible with individual host setters.

Each setter (`set_primal_solution_host`, `set_dual_solution_host`, `set_reduced_cost_host`) independently sets `is_device_memory_ = false`. If a caller sets only one of these, the solution ends up in a mixed state where some data is on host, some may be uninitialized, but `is_device_memory_` is false.

Consider either:

- Requiring all three to be set together (a single setter method)
- Documenting that callers must set all three
- Tracking initialization state per-vector
cpp/include/cuopt/linear_programming/utilities/remote_serialization.hpp (3)

`37-40`: Consider adding a virtual `clone()` for polymorphic copying.

The abstract interface has a virtual destructor but no way to clone instances. If callers need to copy serializer instances (e.g., for use in multiple threads), consider adding a virtual clone method.

`219-237`: Nested enums/structs require verbose qualification.

The `job_status_t` enum and result structs are nested inside the class template, requiring verbose access like `remote_serializer_t<int, double>::job_status_t`. Consider moving these to namespace scope for easier use.

`307-314`: Use an enum instead of magic integers for request types.

`get_async_request_type()` returns magic integers (0-6). Consider defining an enum for request types to improve type safety and readability:

♻️ Suggested improvement

```cpp
enum class async_request_type_t {
  SUBMIT_JOB      = 0,
  CHECK_STATUS    = 1,
  GET_RESULT      = 2,
  DELETE_RESULT   = 3,
  GET_LOGS        = 4,
  CANCEL_JOB      = 5,
  WAIT_FOR_RESULT = 6,
  INVALID         = -1
};

virtual async_request_type_t get_async_request_type(const std::vector<uint8_t>& data) = 0;
```

cpp/src/linear_programming/utilities/protobuf_serializer.cu (3)
`1322-1324`: Avoid `const_cast` - add a const overload to `get_reduced_cost()`.

The `const_cast` is used to work around the lack of a const overload for `get_reduced_cost()`. The header file (solver_solution.hpp) should add:

```cpp
const rmm::device_uvector<f_t>& get_reduced_cost() const;
```

This would eliminate the need for the cast.

`139-154`: Duplicate MIP settings serialization code.

The MIP settings serialization logic is duplicated between `serialize_mip_request()` (lines 139-154) and `serialize_async_mip_request()` (lines 341-354). Extract this to a helper method to reduce duplication. Based on coding guidelines, code duplication should be refactored into shared utilities.

♻️ Suggested helper method

```cpp
private:
  void serialize_mip_settings_to_proto(const mip_solver_settings_t<i_t, f_t>& settings,
                                       cuopt::remote::MIPSolverSettings* pb_settings)
  {
    pb_settings->set_time_limit(settings.time_limit);
    pb_settings->set_relative_mip_gap(settings.tolerances.relative_mip_gap);
    pb_settings->set_absolute_mip_gap(settings.tolerances.absolute_mip_gap);
    // ... rest of settings
  }
```

Then use it in both `serialize_mip_request()` and `serialize_async_mip_request()`.

Also applies to: 341-354

`194-217`: Consider optimizing request type detection.

`is_mip_request()` potentially parses the same data multiple times (async request, MIP request, LP request). While functionally correct, this could be optimized by using a more efficient detection method or caching the parse result.
```cpp
void signal_handler(int signal)
{
  if (signal == SIGINT || signal == SIGTERM) {
    std::cout << "\n[Server] Received shutdown signal\n";
    keep_running = false;
    if (shm_ctrl) { shm_ctrl->shutdown_requested = true; }
    result_cv.notify_all();
  }
}
```
Signal handler uses non-async-signal-safe functions.
The signal handler calls std::cout, accesses shm_ctrl, and calls result_cv.notify_all(). None of these are async-signal-safe. This can cause deadlocks or undefined behavior if the signal arrives while another thread holds a lock used by these operations.
🔒 Proposed fix using async-signal-safe pattern

```diff
+std::atomic<bool> shutdown_signal_received{false};
+
 void signal_handler(int signal)
 {
   if (signal == SIGINT || signal == SIGTERM) {
-    std::cout << "\n[Server] Received shutdown signal\n";
+    // Only use async-signal-safe operations
+    const char msg[] = "\n[Server] Received shutdown signal\n";
+    write(STDOUT_FILENO, msg, sizeof(msg) - 1);
+    shutdown_signal_received = true;
     keep_running = false;
-    if (shm_ctrl) { shm_ctrl->shutdown_requested = true; }
-    result_cv.notify_all();
+    // Let the main loop handle shm_ctrl and cv notification
   }
 }
```

Then in the main accept loop or a dedicated thread, check `shutdown_signal_received` and perform the non-safe operations there.
🤖 Prompt for AI Agents
In `@cpp/cuopt_remote_server.cpp` around lines 289 - 297, The signal handler
signal_handler currently uses non-async-signal-safe operations (std::cout,
accessing shm_ctrl, and result_cv.notify_all()), so change it to only set an
atomic flag (e.g., a global volatile sig_atomic_t or std::atomic<bool>
shutdown_signal_received) and set keep_running = false if that is signal-safe;
remove direct access to shm_ctrl and result_cv from signal_handler. In the main
accept loop or a dedicated monitor thread, poll or wait on
shutdown_signal_received and there perform the non-safe actions: log the
shutdown message (std::cout), set shm_ctrl->shutdown_requested, and call
result_cv.notify_all(). Update references in code to use
shutdown_signal_received where signal_handler used to toggle shutdown behavior.
```cpp
std::string generate_job_id()
{
  static std::random_device rd;
  static std::mt19937 gen(rd());
  static std::uniform_int_distribution<uint64_t> dis;

  uint64_t id = dis(gen);
  char buf[32];
  snprintf(buf, sizeof(buf), "job_%016lx", id);
  return std::string(buf);
}
```
Thread-safety issue: generate_job_id uses static variables without synchronization.
The function uses static std::mt19937 gen and static std::uniform_int_distribution<uint64_t> dis which are not thread-safe. Multiple client handler threads calling this concurrently will cause data races.
🔒 Proposed fix using thread-local or mutex

```diff
 std::string generate_job_id()
 {
-  static std::random_device rd;
-  static std::mt19937 gen(rd());
-  static std::uniform_int_distribution<uint64_t> dis;
+  static std::mutex mtx;
+  static std::random_device rd;
+  static std::mt19937 gen(rd());
+  static std::uniform_int_distribution<uint64_t> dis;
+  std::lock_guard<std::mutex> lock(mtx);
   uint64_t id = dis(gen);
   char buf[32];
   snprintf(buf, sizeof(buf), "job_%016lx", id);
   return std::string(buf);
 }
```

Alternatively, use `thread_local` for better performance:
```cpp
std::string generate_job_id()
{
  thread_local std::random_device rd;
  thread_local std::mt19937 gen(rd());
  thread_local std::uniform_int_distribution<uint64_t> dis;
  // ...
}
```

🤖 Prompt for AI Agents
In `@cpp/cuopt_remote_server.cpp` around lines 303 - 313, generate_job_id
currently declares static std::mt19937 gen and static
std::uniform_int_distribution<uint64_t> dis which are shared across threads and
cause data races; modify generate_job_id to use thread-local RNG state (e.g.,
make rd, gen, and dis thread_local) or protect the static generators with a
mutex so concurrent calls are safe; update the function implementation around
the symbols generate_job_id, gen, dis (and rd if present) to apply one of these
fixes and ensure generated IDs remain formatted the same.
```cpp
for (size_t i = 0; i < MAX_RESULTS; ++i) {
  if (!result_queue[i].ready) {
    strncpy(result_queue[i].job_id, job_id.c_str(), sizeof(result_queue[i].job_id) - 1);
    result_queue[i].status = 2;  // Cancelled status
    result_queue[i].data_size = 0;
    result_queue[i].shm_data_name[0] = '\0';
    result_queue[i].worker_index = worker_id;  // For pipe mode
    strncpy(result_queue[i].error_message,
            "Job was cancelled",
            sizeof(result_queue[i].error_message) - 1);
    result_queue[i].retrieved = false;
    result_queue[i].ready = true;
    break;
  }
}
```
Race condition: Result queue slot allocation is not atomic.
Multiple workers can concurrently search for a free slot in result_queue. Two workers finding the same slot with !result_queue[i].ready could both attempt to write to it before either sets ready = true.
Consider using `compare_exchange_strong` similar to the job claiming pattern:

```diff
 for (size_t i = 0; i < MAX_RESULTS; ++i) {
-  if (!result_queue[i].ready) {
+  bool expected = false;
+  if (result_queue[i].ready.compare_exchange_strong(expected, true)) {
     strncpy(result_queue[i].job_id, job_id.c_str(), sizeof(result_queue[i].job_id) - 1);
     result_queue[i].status = 2;  // Cancelled status
     // ... populate other fields ...
-    result_queue[i].ready = true;
     break;
   }
 }
```

This same pattern should be applied to all result_queue slot allocations (lines 639-652, 719-739, 756-788, 997-1010).
🤖 Prompt for AI Agents
In `@cpp/cuopt_remote_server.cpp` around lines 548 - 562, The current loop that
finds a free slot in result_queue sets fields non-atomically and races on
result_queue[i].ready; replace this linear check-and-write with an atomic claim
using an atomic flag and compare_exchange_strong (the same pattern used in the
job-claiming code path) so only one worker can win the slot before writing other
fields. Concretely, change the probe that tests result_queue[i].ready into an
atomic compare-and-set (e.g., attempt to swap from false to true using
std::atomic<bool> or an integer flag), and only on successful
compare_exchange_strong proceed to populate job_id, status, data_size,
shm_data_name, worker_index, error_message, retrieved, etc.; apply the same fix
to all other result_queue allocation sites referenced in the review (the blocks
around the given snippet and at the other line ranges noted) to ensure atomic
slot claiming.
```diff
 rmm::device_uvector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_primal_solution()
 {
-  return primal_solution_;
+  return *primal_solution_;
 }

 template <typename i_t, typename f_t>
 const rmm::device_uvector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_primal_solution()
   const
 {
-  return primal_solution_;
+  return *primal_solution_;
 }
```
Potential null pointer dereference in device accessors.
The device memory accessors (get_primal_solution(), get_dual_solution(), get_reduced_cost()) dereference unique_ptr members without checking if they're valid. When is_device_memory_ is false (CPU-only solution), these pointers are null, leading to undefined behavior.
🔒️ Recommended fix: Add defensive checks or assertions

```diff
 template <typename i_t, typename f_t>
 rmm::device_uvector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_primal_solution()
 {
+  cuopt_assert(is_device_memory_ && primal_solution_, "Device accessor called on host-memory solution");
   return *primal_solution_;
 }
```

Alternatively, document the precondition clearly and ensure callers check `is_device_memory()` first.
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/solver_solution.cu` around lines 411 - 421, The
device accessor methods (get_primal_solution(), get_dual_solution(),
get_reduced_cost()) currently dereference the unique_ptr members (e.g.,
primal_solution_) unconditionally and can dereference null when
is_device_memory_ is false; update each accessor to guard against null by
asserting or throwing a clear precondition (e.g., check is_device_memory_ or
that the unique_ptr is non-null) before dereferencing, and document the behavior
(or return/throw) so callers must call is_device_memory_ first when appropriate;
make the change inside the methods referenced above to validate
primal_solution_/dual_solution_/reduced_cost_ before returning *ptr.
```cpp
template <typename i_t, typename f_t>
std::vector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_primal_solution_host()
{
  return *primal_solution_host_;
}

template <typename i_t, typename f_t>
const std::vector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_primal_solution_host() const
{
  return *primal_solution_host_;
}

template <typename i_t, typename f_t>
std::vector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_dual_solution_host()
{
  return *dual_solution_host_;
}

template <typename i_t, typename f_t>
const std::vector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_dual_solution_host() const
{
  return *dual_solution_host_;
}

template <typename i_t, typename f_t>
std::vector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_reduced_cost_host()
{
  return *reduced_cost_host_;
}

template <typename i_t, typename f_t>
const std::vector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_reduced_cost_host() const
{
  return *reduced_cost_host_;
}
```
Potential null pointer dereference in host accessors.
The host memory accessors dereference unique_ptr members without null checks. When is_device_memory_ is true, these host pointers are null.
Consider adding runtime assertions or checks:
🔒️ Recommended fix

```diff
 template <typename i_t, typename f_t>
 std::vector<f_t>& optimization_problem_solution_t<i_t, f_t>::get_primal_solution_host()
 {
+  cuopt_assert(!is_device_memory_ && primal_solution_host_, "Host accessor called on device-memory solution");
   return *primal_solution_host_;
 }
```

🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/solver_solution.cu` around lines 442 - 476, The
host accessor methods of optimization_problem_solution_t
(get_primal_solution_host, get_dual_solution_host, get_reduced_cost_host and
their const overloads) dereference unique_ptr members without checking
is_device_memory_; add a runtime guard to prevent null dereference by asserting
or throwing when is_device_memory_ is true or when the corresponding unique_ptr
(primal_solution_host_, dual_solution_host_, reduced_cost_host_) is null; update
each accessor to check is_device_memory_ (or pointer) and produce a clear error
(e.g., assert/throw with a message) before returning *primal_solution_host_,
*dual_solution_host_, or *reduced_cost_host_.
```cpp
if (d_primal.size() > 0) {
  std::vector<f_t> h_primal(d_primal.size());
  cudaMemcpy(
    h_primal.data(), d_primal.data(), d_primal.size() * sizeof(f_t), cudaMemcpyDeviceToHost);
  for (const auto& v : h_primal) {
    pb_solution->add_primal_solution(static_cast<double>(v));
  }
}

// Copy dual solution from device
if (d_dual.size() > 0) {
  std::vector<f_t> h_dual(d_dual.size());
  cudaMemcpy(
    h_dual.data(), d_dual.data(), d_dual.size() * sizeof(f_t), cudaMemcpyDeviceToHost);
  for (const auto& v : h_dual) {
    pb_solution->add_dual_solution(static_cast<double>(v));
  }
}

// For reduced cost, we can access via const cast since we're just reading
auto& nc_solution = const_cast<optimization_problem_solution_t<i_t, f_t>&>(solution);
auto& d_reduced_cost = nc_solution.get_reduced_cost();
if (d_reduced_cost.size() > 0) {
  std::vector<f_t> h_reduced_cost(d_reduced_cost.size());
  cudaMemcpy(h_reduced_cost.data(),
             d_reduced_cost.data(),
             d_reduced_cost.size() * sizeof(f_t),
             cudaMemcpyDeviceToHost);
  for (const auto& v : h_reduced_cost) {
    pb_solution->add_reduced_cost(static_cast<double>(v));
  }
}
```
Missing CUDA error checking on cudaMemcpy calls.
The cudaMemcpy calls in lp_solution_to_proto() don't check for errors. CUDA errors should be checked and handled, especially in serialization code where failures should be reported to callers.
As per coding guidelines, every CUDA operation must have error checking.
🐛 Recommended fix

```diff
-  cudaMemcpy(
-    h_primal.data(), d_primal.data(), d_primal.size() * sizeof(f_t), cudaMemcpyDeviceToHost);
+  auto err = cudaMemcpy(
+    h_primal.data(), d_primal.data(), d_primal.size() * sizeof(f_t), cudaMemcpyDeviceToHost);
+  if (err != cudaSuccess) {
+    CUOPT_LOG_ERROR(std::string("[protobuf_serializer] cudaMemcpy failed: ") + cudaGetErrorString(err));
+    return;  // or throw
+  }
```

Apply similar error checking to all cudaMemcpy calls in this file (lines 1305, 1315, 1327, 1399).
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/utilities/protobuf_serializer.cu` around lines
1303 - 1334, lp_solution_to_proto currently calls cudaMemcpy on device buffers
(d_primal, d_dual, d_reduced_cost via nc_solution.get_reduced_cost()) without
checking for CUDA errors; update each cudaMemcpy call to capture and check the
returned cudaError_t and handle failures (e.g., log via processLogger/return an
error/throw) so serialization reports failures instead of silently continuing.
Locate the cudaMemcpy usages in lp_solution_to_proto (references: d_primal,
d_dual, nc_solution/get_reduced_cost, pb_solution) and wrap each call by storing
the result (cudaError_t err = cudaMemcpy(...)); if err != cudaSuccess, convert
the error to a clear error path (return an error code or throw an exception or
call the existing logger and abort the proto conversion) so callers receive the
failure.
See the MsgPack serializer for a complete working example:

- `cpp/src/linear_programming/utilities/serializers/msgpack_serializer.cpp`
- `cpp/src/linear_programming/utilities/serializers/CMakeLists.txt`
🧩 Analysis chain
🏁 Script executed:

```bash
#!/bin/bash
# Search for msgpack serializer references
echo "=== Searching for msgpack serializer files ==="
fd -i "msgpack" --type f
echo -e "\n=== Checking serializers directory ==="
fd -t f "serializer" -x bash -c 'echo "{}"; head -1 "{}"' 2>/dev/null || true
echo -e "\n=== List all files in linear_programming/utilities/serializers/ ==="
find cpp/src/linear_programming/utilities/serializers/ -type f 2>/dev/null | head -20
```

Repository: NVIDIA/cuopt
Length of output: 271
🌐 Web query:
NVIDIA cuopt msgpack serializer implementation reference
💡 Result:
For cuOpt, the MessagePack (msgpack) serialization is implemented in the Python clients, not (publicly) in the core cuOpt engine source.
Primary reference (cuOpt client behavior):
- The
cuopt-sh-clientpackage documentation states the client compresses request data using Pythonmsgpackfirst, and falls back topickleif msgpack serialization fails (with env-var overrides likeCUOPT_PREFER_PICKLE/CUOPT_USE_ZLIB). [1] - The self-hosted service client API shows the client’s
result_typedefault ismime_type.MSGPACK(i.e., responses are msgpack by default unless changed). [2]
What serializer implementation it’s based on (Python libraries):
cuopt-sh-clientdepends onmsgpackandmsgpack-numpy(so msgpack encoding/decoding is via the standard Python msgpack implementation, with numpy support via msgpack-numpy). [3]
If you need an “implementation reference” to the actual pack/unpack code paths, the closest authoritative references are therefore:
cuopt-sh-clientdocs for the compression/selection logic [1], and- the upstream Python libraries (
msgpack+msgpack-numpy) that provide the serializer/deserializer implementation used by the client [3].
Sources: [1] [2] [3]
Update documentation: msgpack serializer is not a C++ reference implementation.
The guide incorrectly references `msgpack_serializer.cpp` as a working example. Based on the codebase, msgpack serialization is implemented in the Python client layer (`cuopt-sh-client`) using the standard `msgpack` and `msgpack-numpy` libraries, not as a C++ engine serializer. Remove or replace this reference with an accurate description of msgpack's role (Python client compression), or point to a serializer that actually exists in `cpp/src/linear_programming/utilities/serializers/`.
🤖 Prompt for AI Agents
In `@docs/developer/SERIALIZATION_PLUGIN_GUIDE.md` around lines 570 - 573, The doc
incorrectly points to
`cpp/src/linear_programming/utilities/serializers/msgpack_serializer.cpp` and
its `CMakeLists.txt` as a working MsgPack C++ example; remove or replace that
reference and instead explain that MsgPack serialization is implemented in the
Python client `cuopt-sh-client` using the `msgpack` and `msgpack-numpy`
libraries, or link to an actual C++ serializer that exists under
`cpp/src/linear_programming/utilities/serializers/` if you want a C++ example;
update the sentence and any bullets to reference `cuopt-sh-client` and the
Python libraries (`msgpack`, `msgpack-numpy`) or the correct C++ serializer file
name accordingly.
📝 Walkthrough

This pull request introduces a complete gRPC-based remote solving architecture for cuOpt, enabling LP/MIP optimization problems to be solved on remote servers. Changes include comprehensive documentation, build system integration with gRPC and Protobuf dependencies, a full-featured remote solve server with async job management, protobuf-based serialization, updated C++/Python APIs supporting both local GPU and remote CPU solving paths, and lazy loading mechanisms for CPU-only hosts.

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 17
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx (1)
617-617: Misleading list initialization.
[] * len(py_data_model_list)returns an empty list[], not a pre-allocated list. This doesn't cause a bug sinceappend()is used, but it's misleading code.🐛 Fix the list initialization
- solutions = [] * len(py_data_model_list) + solutions = []cpp/include/cuopt/linear_programming/utilities/internals.hpp (1)
23-46: Add Doxygen documentation for new callback memory location API and integrate into callback implementations.The new enum
callback_memory_locationand accessors (set_memory_location,get_memory_location,data_on_device) lack Doxygen documentation required for public header API. Additionally, the API is not yet integrated into callback implementations—Python callbacks incallbacks_implems.hppand the gRPC callback never callset_memory_location(), relying entirely on the unsafe DEVICE default. This is problematic since Python callbacks should likely be HOST (they callget_numpy_array()for host memory).Add brief Doxygen comments documenting:
callback_memory_locationenum: purpose and valid valuesset_memory_location(): when and why to call it for HOST-backed callbacksget_memory_location()/data_on_device(): usage expectationsThen update callback constructors or initialization to properly set
callback_memory_location::HOSTfor host-backed implementations (Python callbacks, gRPC incumbent pipe callback, etc.).cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp (1)
43-52: Missing NULL check on PyObject_CallMethod result can cause crash.If
PyObject_CallMethodfails (e.g., the Python callback raises an exception), it returnsNULL. CallingPy_DECREF(res)on a NULL pointer will crash. The same issue exists inset_solutionat lines 86-90.🐛 Proposed fix
PyObject* res = PyObject_CallMethod(this->pyCallbackClass, "get_solution", "(OO)", numba_matrix, numpy_array); Py_DECREF(numba_matrix); Py_DECREF(numpy_array); - Py_DECREF(res); + if (res != nullptr) { + Py_DECREF(res); + } else { + // Log or handle the Python exception + PyErr_Print(); + }
🤖 Fix all issues with AI agents
In `@build.sh`:
- Around line 517-531: The SKBUILD_CMAKE_ARGS environment variable is being
built but not exported, so scikit-build-core won't see it; change the assignment
of SKBUILD_CMAKE_ARGS (the variable constructed from INSTALL_PREFIX,
LIBCUOPT_BUILD_DIR, CUOPT_CMAKE_CUDA_ARCHITECTURES and optionally
SKBUILD_EXTRA_ARGS_JOINED) to export SKBUILD_CMAKE_ARGS before invoking python
"${PYTHON_ARGS_FOR_INSTALL[@]}" . and likewise export the second
SKBUILD_CMAKE_ARGS occurrence noted in the script so the python/scikit-build
process inherits the variable.
In `@ci/utils/install_protobuf_grpc.sh`:
- Around line 22-25: The current dnf install command that ends with "|| true"
can hide failures when grpc_cpp_plugin is not installed; remove the "|| true"
and instead run the install (the existing dnf install -y grpc-devel grpc-plugins
|| dnf install -y grpc-devel) and then explicitly verify the codegen binary
exists (check for grpc_cpp_plugin on PATH or the expected plugin path) and exit
non‑zero with a clear error if it is missing; update the block around the dnf
install line and the subsequent verification to fail fast when grpc_cpp_plugin
is unavailable.
In `@cpp/CMakeLists.txt`:
- Around line 296-306: The CMake logic currently forces gRPC by calling
find_package(gRPC CONFIG REQUIRED) and unconditionally setting CUOPT_ENABLE_GRPC
via add_compile_definitions, breaking gRPC-free builds; change this by
introducing a CMake option (e.g., option(CUOPT_ENABLE_GRPC "Enable gRPC support"
OFF)) and guard the existing code that checks TARGET gRPC::grpc++, calls
find_package(gRPC ...), and sets add_compile_definitions(CUOPT_ENABLE_GRPC)
behind that option so gRPC is only required when the option is ON (or, if you
intend to make gRPC mandatory, update project docs and remove the option but
keep a clear message); ensure the message(STATUS ...) reflects the chosen path
and that proto/codegen and targets that reference gRPC::grpc++ are only
configured when CUOPT_ENABLE_GRPC is enabled.
In `@cpp/cuopt_remote_server.cpp`:
- Around line 303-313: generate_job_id currently uses shared static RNG state
(static std::mt19937 gen and static std::uniform_int_distribution<uint64_t> dis)
which is not thread-safe; make the RNG state thread-local or otherwise
synchronized: replace those static variables with thread_local instances (e.g.,
thread_local std::mt19937 and thread_local distribution) or protect access with
a mutex inside generate_job_id, ensuring unique IDs remain formatted via
snprintf and returned as before; update references in generate_job_id
accordingly to avoid data races.
In `@cpp/libmps_parser/CMakeLists.txt`:
- Around line 84-85: The static target mps_parser_static is missing the same
configuration as the shared target; update CMake to apply set_target_properties
on mps_parser_static to set CXX_STANDARD, CXX_STANDARD_REQUIRED,
POSITION_INDEPENDENT_CODE, and CXX_SCAN_FOR_MODULES, add the same
target_compile_options used for the shared target to mps_parser_static, and
mirror the conditional include directories for compression libs by adding
target_include_directories(mps_parser_static PRIVATE BZip2::BZip2) when
MPS_PARSER_WITH_BZIP2 and target_include_directories(mps_parser_static PRIVATE
ZLIB::ZLIB) when MPS_PARSER_WITH_ZLIB so the static library builds with the same
flags and includes.
In `@cpp/src/linear_programming/solve.cu`:
- Around line 1207-1221: The code around cudaPointerGetAttributes/cudaMemcpy in
solve.cu does not check CUDA return codes; wrap the calls
(cudaPointerGetAttributes, cudaMemcpy, any cudaMemcpyAsync,
cudaStreamSynchronize used in the nearby blocks) with the project’s CUDA
error-check helper so failures are propagated and handled instead of ignored;
specifically, replace direct calls around var_types/host_var_types with the
helper (e.g., check the return of cudaPointerGetAttributes(&attrs,
var_types.data()) and on device path call checked
cudaMemcpy(host_var_types.data(), var_types.data(),
var_types.size()*sizeof(char), cudaMemcpyDeviceToHost), and on host path avoid
cudaGetLastError() and instead check/clear errors via the helper), and apply the
same pattern to the similar blocks at the other locations (~1260-1290 and
~1406-1414) so all cudaMemcpy/cudaMemcpyAsync/cudaStreamSynchronize calls use
the centralized CUDA error-check helper and propagate failures.
- Around line 1303-1367: The cpu_problem_data_t struct and its create_view()
method do not preserve row_types and warm-start information (initial primal and
dual solutions) during GPU to CPU serialization. Add vector member variables to
cpu_problem_data_t for row_types, initial_primal_solution, and
initial_dual_solution. Update the create_view() method to conditionally set
these fields on the view using appropriate setter methods (similar to how
constraint_bounds and other fields are set). Additionally, ensure the
copy_view_to_cpu() function copies these fields from the source view to the
cpu_problem_data_t struct so they are preserved during serialization.
In `@cpp/src/linear_programming/solver_solution.cu`:
- Around line 441-476: The host accessors get_primal_solution_host,
get_dual_solution_host and get_reduced_cost_host (both const and non-const
overloads) dereference pointers (primal_solution_host_, dual_solution_host_,
reduced_cost_host_) that can be null for GPU-only constructions; add explicit
null checks at the start of each of these methods and, if the respective pointer
is null, throw a clear std::runtime_error (or appropriate exception type used in
the project) describing which host vector is unavailable (e.g.
"primal_solution_host_ is null: host data not initialized"); this ensures
callers get a clear error instead of UB — alternatively, if preferred, lazily
allocate the host vector here before dereferencing, but be explicit about which
behavior you choose and implement it consistently across all six accessor
overloads.
- Around line 410-439: The device accessor methods in
optimization_problem_solution_t (get_primal_solution, get_dual_solution,
get_reduced_cost) dereference pointers that can be null for CPU-only instances;
add defensive checks at the start of each method to validate the corresponding
pointer (primal_solution_, dual_solution_, reduced_cost_) and either throw a
clear exception (e.g., std::runtime_error with a message like "device access on
CPU-only solution") or use an assert to fail fast, so callers get a meaningful
error instead of undefined behavior when the pointer is null.
In `@cpp/src/linear_programming/utilities/cuopt_remote.proto`:
- Around line 21-24: The proto declares IndexType with INT64 and allows clients
to set index_type = INT64, but the actual index fields (A_indices, A_offsets,
Q_indices, Q_offsets) are int32 causing potential truncation; either remove
INT64 from the IndexType enum or change all those fields to 64-bit (int64) to
match; update the message definitions referencing A_indices, A_offsets,
Q_indices, Q_offsets (and any other index-related fields) to use int64_t
equivalents and ensure any code generating/consuming index_type handles the
promoted types consistently when index_type == INT64.
In `@cpp/src/linear_programming/utilities/protobuf_serializer.cu`:
- Around line 1303-1333: The cudaMemcpy calls that copy device vectors d_primal,
d_dual and d_reduced_cost into host vectors h_primal, h_dual and h_reduced_cost
must be checked for errors: wrap each cudaMemcpy call with the project's CUDA
error-check helper (or check the returned cudaError_t and handle non-success
values), and on error log/propagate the failure instead of proceeding to add
values to pb_solution; update the copy sites that handle
d_primal/d_dual/d_reduced_cost (and the corresponding host buffers
h_primal/h_dual/h_reduced_cost) as well as the analogous copy at lines 1395-1402
to use the error-checking helper and return or abort serialization on failure.
- Around line 871-905: The code in the loop inside the constraint bounds
calculation assumes arrays row_types_span and b_span have at least n_rows
elements, causing possible out-of-bounds access when their size is smaller. To
fix this, add a conditional check in the loop for each i to verify that i is
within the size of row_types_span before accessing rt_ptr[i], otherwise default
row_type to 'E'. Similarly, check that i is within b_span size before accessing
b_ptr[i], otherwise default rhs to 0. This guards indexing to prevent OOB reads
while maintaining correct defaults.
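The guarded-indexing pattern described above can be sketched like this (names are illustrative, not the real serializer code): indices past the end of either auxiliary array fall back to the documented defaults `'E'` and `0`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// For each of the n_rows rows, read row type and rhs only if the backing
// array is long enough; otherwise substitute the documented default.
std::vector<std::pair<char, double>> gather_constraint_rows(
  std::size_t n_rows, const std::vector<char>& row_types, const std::vector<double>& b)
{
  std::vector<std::pair<char, double>> rows;
  rows.reserve(n_rows);
  for (std::size_t i = 0; i < n_rows; ++i) {
    const char row_type = (i < row_types.size()) ? row_types[i] : 'E';  // default 'E'
    const double rhs    = (i < b.size()) ? b[i] : 0.0;                  // default 0
    rows.emplace_back(row_type, rhs);
  }
  return rows;
}
```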
In `@cpp/src/linear_programming/utilities/remote_solve_grpc.hpp`:
- Around line 72-77: The stop flag parameter for stream_logs_to_stdout is using
volatile which is not thread-safe; change the function signature to accept
std::atomic<bool>* stop_flag, update the implementation to use stop_flag->load()
(and store() where written) for loop checks and any writers, and update all call
sites to pass a std::atomic<bool> instance instead of a volatile bool pointer so
signaling is synchronized across threads (refer to function
stream_logs_to_stdout and any callers that set/clear the flag).
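A minimal sketch of the suggested signature change (function and parameter names assumed): the streaming loop polls an `std::atomic<bool>` via `load()`, so a `store(true)` from another thread is guaranteed to become visible, unlike a `volatile` read.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Poll the atomic flag until another thread stores true into it.
// Returns the number of poll iterations, purely for observability.
int stream_until_stopped(std::atomic<bool>* stop_flag)
{
  int polls = 0;
  while (!stop_flag->load()) {  // load() replaces the racy volatile read
    ++polls;
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
  return polls;
}
```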
In `@cpp/src/linear_programming/utilities/remote_solve.cu`:
- Around line 41-45: The static flag logged_no_device in
copy_incumbent_to_device is written without synchronization causing a race;
replace it with a thread-safe mechanism (e.g., a std::once_flag + std::call_once
or an std::atomic<bool>) so the "[remote_solve] No CUDA device available; using
host incumbents" message is logged exactly once without data races—locate
logged_no_device and the copy_incumbent_to_device function, remove the
unsynchronized static write, and implement one of those thread-safe patterns to
perform the logging safely.
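The `std::call_once` pattern suggested above can be sketched as follows; the counter exists only so the once-only behavior is observable, where the real code would emit the log line instead:

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>

std::once_flag no_device_once;
std::atomic<int> warnings_emitted{0};

// Safe to call from any number of threads; the lambda runs exactly once.
void warn_no_device()
{
  std::call_once(no_device_once, [] {
    // e.g. "[remote_solve] No CUDA device available; using host incumbents"
    warnings_emitted.fetch_add(1);
  });
}
```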
- Around line 219-227: The current status check in the polling loop does not
handle unexpected values like "UNKNOWN", leading to infinite polling. Update the
status handling in the relevant loop in the remote_solve function to treat any
unknown or unsupported status as a failure: set stop_logs to true, join the log
thread if joinable, delete the remote result, and return an error indicating the
unexpected status. Apply this fix to both code sections handling status checks
around lines 219-227 and 402-410.
In `@cpp/src/mip/solver_solution.cu`:
- Around line 159-169: mip_solution_t's get_solution_host() and its const
overload dereference solution_host_ which can be nullptr when constructed with
the GPU path; update both get_solution_host() methods to check solution_host_
before dereferencing (e.g., if nullptr, either lazily allocate/copy device data
into solution_host_ or throw a clear exception) and ensure the chosen behavior
is consistent with other accessors (refer to mip_solution_t constructor(s),
solution_device_, and any get_solution_device() logic) so callers won't
dereference a null pointer.
- Around line 148-157: The get_solution() methods currently dereference
solution_ which can be nullptr for objects constructed via the CPU-only
constructor; modify mip_solution_t<i_t, f_t>::get_solution() (both const and
non-const overloads) to guard against null solution_—e.g., add an assertion or
throw a clear exception when solution_ == nullptr (or return/obtain an
appropriate CPU-side container if that is the intended behavior) so callers no
longer dereference a null pointer; ensure the guard message references solution_
and the class/method name for easy tracing.
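A hypothetical mini-version of the guard (struct and member names mirror the comment above but are not the real cuOpt types): a CPU-only construction leaves `solution_` null, so `get_solution()` must fail loudly instead of crashing.

```cpp
#include <memory>
#include <stdexcept>
#include <vector>

struct mip_solution_stub {
  std::unique_ptr<std::vector<double>> solution_;  // device-side data, may be null

  std::vector<double>& get_solution()
  {
    if (solution_ == nullptr) {
      throw std::runtime_error(
        "mip_solution_t::get_solution(): solution_ is null (CPU-only construction)");
    }
    return *solution_;
  }
};
```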
🟡 Minor comments (17)
python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx-171-185 (1)
`171-185`: **The fallback to `type(obj)` will fail when calling `.astype()` on non-array types.** The code falls back to `type(obj)` when the input lacks both `dtype` and `dtypes` attributes. While this comparison works fine with `np.issubdtype()`, line 184 calls `obj.astype()`, which fails for Python scalars (e.g., `int`, `float`) that lack this method. For example, if a scalar `5` reaches this fallback, `obj.astype()` raises `AttributeError: 'int' object has no attribute 'astype'`. Although current usage only passes numpy arrays or Series (which have `.astype()`), the function should explicitly reject unsupported types rather than silently fail at line 184.

🐛 Suggested fix

```diff
     elif hasattr(obj, 'dtypes'):
         # DataFrame-like object
         if all([np.issubdtype(dtype, np.number) for dtype in obj.dtypes]):  # noqa
             obj_type = obj.dtypes[0]
         else:
             msg = "All columns in " + name + " should be numeric"
             raise Exception(msg)
     else:
-        obj_type = type(obj)
+        raise Exception(
+            f"{name} must be a numpy array or have a dtype attribute"
+        )
```

GRPC_BUILD_STATUS.md-66-70 (1)
`66-70`: **Add a language specifier to the fenced code block.** The code block at lines 66-70 is missing a language identifier. For environment/version output, use `text` or `plaintext`:

````diff
-```
+```text
 libgrpc: 1.73.1
 libprotobuf: 6.31.1
 protobuf (pip): 6.33.2
````

docs/developer/SOLUTION_MEMORY_ARCHITECTURE.md-73-78 (1)
`73-78`: **Specify a language for the ASCII diagram fences.** This removes MD040 warnings and keeps formatting stable.

🔧 Suggested fix

````diff
-```
+```text
 ┌─────────────┐      ┌─────────────┐      ┌─────────────┐
 │   Solver    │ ──►  │  Solution   │ ──►  │    User     │
 │   (GPU)     │      │  (GPU mem)  │      │  (GPU/CPU)  │
 └─────────────┘      └─────────────┘      └─────────────┘
````

````diff
-```
+```text
 ┌──────────┐     ┌──────────────────────────────────────────┐     ┌──────────┐
 │  Client  │ ──► │                 SERVER                   │ ◄── │  Client  │
 │ (no GPU) │     │    GPU solve → to_host() → serialize     │     │(solution)│
 └──────────┘     └──────────────────────────────────────────┘     └──────────┘
````

Also applies to: 90-95

GRPC_PROTOTYPE_STATUS.md-130-144 (1)
`130-144`: **Add a language tag to the expected-output fence.** markdownlint flags the unlabeled fence; using `text` (or similar) keeps it compliant and clearer.

🔧 Suggested fix

````diff
-```
+```text
 [Client] Job submitted successfully
 [Client] Job ID: <hex-id>
 [Client] Polling for job completion...
 [Client] Job status: QUEUED - Job queued
 [Client] Job status: PROCESSING - Job processing
 [Client] Job status: COMPLETED - Job completed
 [Client] Retrieving result...
 [Client] Result retrieved successfully
 [Client] LP Solution:
 [Client]   Status: 2 (PDLP_OPTIMAL)
 [Client]   Objective: 42.0 (dummy value)
 [Client] Test completed successfully!
````

cpp/cuopt_cancel_job.cpp-18-18 (1)
`18-18`: **Add cuopt_cancel_job.cpp to the build target.** The file `cpp/cuopt_cancel_job.cpp` is not registered in any CMakeLists.txt. The header `<cuopt/linear_programming/utilities/remote_solve.hpp>` exists and is correctly located; however, the source file itself must be added to a build target (either `cuopt` or `cuopt_cli`) for compilation to succeed. Verify which target should own this utility and add it to the appropriate `add_executable()` or source list in `cpp/CMakeLists.txt`.

cpp/cuopt_cancel_job.cpp-70-84 (1)
`70-84`: **Harden port parsing and range validation.** `std::stoi` can throw on invalid input, and `atoi` silently accepts bad values. Validate ports and surface a clean error.

🔧 Suggested fix

```diff
+#include <limits>
+
+bool parse_port(const char* s, int& out)
+{
+  if (!s || !*s) return false;
+  char* end = nullptr;
+  long v = std::strtol(s, &end, 10);
+  if (*end != '\0' || v < 1 || v > 65535) return false;
+  out = static_cast<int>(v);
+  return true;
+}
+
 int main(int argc, char** argv)
 {
   // Parse arguments
   std::string job_id;
   std::string host = "localhost";
   int port = 9090;
@@
   const char* env_host = std::getenv("CUOPT_REMOTE_HOST");
   const char* env_port = std::getenv("CUOPT_REMOTE_PORT");
   if (env_host && env_host[0]) { host = env_host; }
-  if (env_port && env_port[0]) { port = std::atoi(env_port); }
+  if (env_port && env_port[0]) {
+    if (!parse_port(env_port, port)) {
+      std::cerr << "Error: Invalid CUOPT_REMOTE_PORT value\n";
+      return 1;
+    }
+  }
@@
-  } else if (strcmp(argv[i], "-p") == 0 && i + 1 < argc) {
-    port = std::stoi(argv[++i]);
+  } else if (strcmp(argv[i], "-p") == 0 && i + 1 < argc) {
+    if (!parse_port(argv[++i], port)) {
+      std::cerr << "Error: Invalid port value\n\n";
+      print_usage(argv[0]);
+      return 1;
+    }
   } else if (argv[i][0] != '-') {
     job_id = argv[i];
   }
 }
```
cpp/libmps_parser/include/mps_parser/data_model_view.hpp-409-426 (1)
`409-426`: **Ensure all view construction paths explicitly set the `is_device_memory` flag for clarity and consistency.** The flag defaults to `false` (CPU memory), which is correct for MPS data. However, `cpp/cuopt_cli.cpp`'s `create_view_from_mps_data_model()` relies on this default without explicitly calling `set_is_device_memory(false)`, unlike the equivalent functions in `cpp/src/linear_programming/solve.cu` (line 1167) and `cpp/src/mip/solve.cu` (line 508), which explicitly set the flag. For consistency and defensive clarity, all view creation helpers should explicitly set the flag after construction rather than relying on defaults.
cpp/src/linear_programming/utilities/cuopt_remote_service.proto-210-221 (1)
`210-221`: **Inconsistent `job_id` type: `bytes` vs `string`.** `IncumbentRequest.job_id` (line 211) and `Incumbent.job_id` (line 221) use `bytes`, while all other messages (`StatusRequest`, `GetResultRequest`, `ResultChunk`, `DeleteRequest`, `CancelRequest`, `WaitRequest`, `StreamLogsRequest`) use `string` for `job_id`. This inconsistency may cause confusion and require type conversions in client code. Consider standardizing on one type (typically `string` for human-readable identifiers, or `bytes` if UUIDs are stored as raw binary).

💡 Suggested fix to use string consistently

```diff
 message IncumbentRequest {
-  bytes job_id = 1;
+  string job_id = 1;
   int64 from_index = 2;  // Return incumbents starting from this index
   int32 max_count = 3;   // Optional limit (0 or negative => no limit)
 }

 message Incumbent {
   int64 index = 1;
   double objective = 2;
   repeated double assignment = 3;
-  bytes job_id = 4;
+  string job_id = 4;
 }
```
cpp/cuopt_remote_server.cpp-548-562 (1)
`548-562`: **Potential result queue slot exhaustion when storing cancelled job results.** When a cancelled job stores its result, if all result queue slots are full (all have `ready == true`), the loop exits without storing the result, leaving the job in an inconsistent state. Consider adding error handling or logging.
docs/developer/SERIALIZATION_PLUGIN_GUIDE.md-374-377 (1)
`374-377`: **Add a language specifier to the fenced code block.** Per static analysis (MD040), use `text` or `plaintext` for console output examples.

📝 Proposed fix

````diff
-```
+```text
 [remote_solve] Loading custom serializer from: /path/to/libcuopt_my_serializer.so
 [remote_solve] Using custom serializer: my_format
````

docs/developer/SERIALIZATION_PLUGIN_GUIDE.md-13-25 (1)
`13-25`: **Add a language specifier to the fenced code block.** Per static analysis (MD040), fenced code blocks should have a language specified. For ASCII diagrams, use `text` or `plaintext`.

📝 Proposed fix

````diff
-```
+```text
 ┌─────────────┐         ┌─────────────┐
 │   Client    │         │   Server    │
````
docs/developer/SERIALIZATION_PLUGIN_GUIDE.md-182-197 (1)
`182-197`: **Incomplete method stubs may cause compilation errors.** The `deserialize_lp_solution` and `deserialize_mip_solution` methods lack return statements. Developers copying this template will encounter compilation errors. Add explicit return statements with placeholder solutions or `// TODO` comments.

📝 Proposed fix

```diff
 optimization_problem_solution_t<i_t, f_t> deserialize_lp_solution(
   const std::vector<uint8_t>& data) override
 {
   // Parse the solution data
   // Create and return solution object
+  // TODO: Implement actual deserialization logic
+  return optimization_problem_solution_t<i_t, f_t>();
   // On error, return error solution:
   // return optimization_problem_solution_t<i_t, f_t>(
   //   cuopt::logic_error("Parse error", cuopt::error_type_t::RuntimeError));
 }

 mip_solution_t<i_t, f_t> deserialize_mip_solution(
   const std::vector<uint8_t>& data) override
 {
   // Similar to LP solution
+  // TODO: Implement actual deserialization logic
+  return mip_solution_t<i_t, f_t>();
 }
```
conda/recipes/libcuopt/recipe.yaml-152-153 (1)
`152-153`: **Add `grpc-cpp` to the `run` requirements for the `libcuopt` package.** The `cuopt_grpc_server` executable is installed as part of the libcuopt package and requires gRPC at runtime. Even though `grpc-cpp` is linked privately during the build, the installed binary still needs the library to function. This should follow the same pattern as `libprotobuf`, which is also in the `run` section despite being privately linked.
cpp/src/mip/solve.cu-482-494 (1)
`482-494`: **Missing CUDA error checking on `cudaMemcpyAsync` and `cudaStreamSynchronize`.** Per coding guidelines, every CUDA memory operation must have error checking. The `cudaMemcpyAsync` and `cudaStreamSynchronize` calls should be wrapped with error verification.

🔧 Proposed fix

```diff
 // Variable types need special handling (char array)
 auto var_types_span = gpu_view.get_variable_types();
 if (var_types_span.size() > 0) {
   cpu_data.variable_types.resize(var_types_span.size());
-  cudaMemcpyAsync(cpu_data.variable_types.data(),
-                  var_types_span.data(),
-                  var_types_span.size() * sizeof(char),
-                  cudaMemcpyDeviceToHost,
-                  stream);
+  CUOPT_CUDA_TRY(cudaMemcpyAsync(cpu_data.variable_types.data(),
+                                 var_types_span.data(),
+                                 var_types_span.size() * sizeof(char),
+                                 cudaMemcpyDeviceToHost,
+                                 stream));
 }

 // Synchronize to ensure all copies are complete
-cudaStreamSynchronize(stream);
+CUOPT_CUDA_TRY(cudaStreamSynchronize(stream));
```
cpp/src/linear_programming/cuopt_c.cpp-364-375 (1)
`364-375`: **Potential memory leak if `mps_data_model_to_optimization_problem` throws.** Unlike `cuOptCreateProblem` and similar functions, which wrap GPU operations in try/catch and delete `problem_and_stream` on exception, this code path lacks exception handling. If the GPU allocation fails, `problem_and_stream` will leak.

🔧 Proposed fix

```diff
 } else {
   // Local: store in GPU memory using existing mps_data_model_to_optimization_problem
-  problem_and_stream->gpu_problem =
-    std::make_unique<optimization_problem_t<cuopt_int_t, cuopt_float_t>>(
-      mps_data_model_to_optimization_problem(problem_and_stream->get_handle_ptr(),
-                                             *mps_data_model_ptr));
-  // Create view pointing to GPU data
-  problem_and_stream->create_view_from_gpu_problem();
+  try {
+    problem_and_stream->gpu_problem =
+      std::make_unique<optimization_problem_t<cuopt_int_t, cuopt_float_t>>(
+        mps_data_model_to_optimization_problem(problem_and_stream->get_handle_ptr(),
+                                               *mps_data_model_ptr));
+    // Create view pointing to GPU data
+    problem_and_stream->create_view_from_gpu_problem();
+  } catch (const std::exception& e) {
+    CUOPT_LOG_INFO("Error creating GPU problem: %s", e.what());
+    delete problem_and_stream;
+    *problem_ptr = nullptr;
+    return CUOPT_INVALID_ARGUMENT;
+  }
 }
```
python/cuopt/cuopt/linear_programming/remote.py-141-217 (1)
`141-217`: **Socket resource leak on exception paths.** If an exception occurs after socket creation (line 143) but before `sock.close()` (line 173), the socket may leak. Use a context manager or `try/finally` to ensure proper cleanup.

🔧 Proposed fix using a context manager

```diff
     try:
         # Connect to server
-        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
-        sock.settimeout(30.0)  # 30 second timeout
-        sock.connect((host, port))
-
-        # Send request (length-prefixed)
-        data = request.SerializeToString()
-        sock.sendall(struct.pack("<Q", len(data)))
-        sock.sendall(data)
-
-        # Receive response
-        size_data = b""
-        while len(size_data) < 8:
-            chunk = sock.recv(8 - len(size_data))
-            if not chunk:
-                sock.close()
-                return CancelResult(
-                    success=False,
-                    message="Failed to receive response size",
-                    job_status=JobStatus.NOT_FOUND,
-                )
-            size_data += chunk
-
-        size = struct.unpack("<Q", size_data)[0]
-        response_data = b""
-        while len(response_data) < size:
-            chunk = sock.recv(size - len(response_data))
-            if not chunk:
-                break
-            response_data += chunk
-
-        sock.close()
+        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
+            sock.settimeout(30.0)  # 30 second timeout
+            sock.connect((host, port))
+
+            # Send request (length-prefixed)
+            data = request.SerializeToString()
+            sock.sendall(struct.pack("<Q", len(data)))
+            sock.sendall(data)
+
+            # Receive response
+            size_data = b""
+            while len(size_data) < 8:
+                chunk = sock.recv(8 - len(size_data))
+                if not chunk:
+                    return CancelResult(
+                        success=False,
+                        message="Failed to receive response size",
+                        job_status=JobStatus.NOT_FOUND,
+                    )
+                size_data += chunk
+
+            size = struct.unpack("<Q", size_data)[0]
+            response_data = b""
+            while len(response_data) < size:
+                chunk = sock.recv(size - len(response_data))
+                if not chunk:
+                    break
+                response_data += chunk
```
cpp/src/linear_programming/utilities/protobuf_serializer.cu-702-715 (1)
`702-715`: **Treat `result_data` parse failures as errors.** If parsing `result_data` fails, the response still reports `SUCCESS` but carries no solution payload, which downstream treats as a missing-solution error. Please flip the status to an error and return a message when parsing fails.

🔧 Suggested fix

```diff
 if (is_mip) {
   cuopt::remote::MIPSolution mip_sol;
-  if (mip_sol.ParseFromArray(result_data.data(), result_data.size())) {
-    result->mutable_mip_solution()->CopyFrom(mip_sol);
-  }
+  if (!mip_sol.ParseFromArray(result_data.data(), result_data.size())) {
+    result->set_status(cuopt::remote::ERROR_INTERNAL);
+    result->set_error_message("Failed to parse MIP result payload");
+  } else {
+    result->mutable_mip_solution()->CopyFrom(mip_sol);
+  }
 } else {
   cuopt::remote::LPSolution lp_sol;
-  if (lp_sol.ParseFromArray(result_data.data(), result_data.size())) {
-    result->mutable_lp_solution()->CopyFrom(lp_sol);
-  }
+  if (!lp_sol.ParseFromArray(result_data.data(), result_data.size())) {
+    result->set_status(cuopt::remote::ERROR_INTERNAL);
+    result->set_error_message("Failed to parse LP result payload");
+  } else {
+    result->mutable_lp_solution()->CopyFrom(lp_sol);
+  }
 }
```
🧹 Nitpick comments (33)
python/cuopt/cuopt/linear_programming/solver/solver_wrapper.pyx (5)
`54-56`: **Unused import and module-level pyarrow.** `InputValidationError` is imported but doesn't appear to be used in this file. Additionally, `pyarrow` is imported at module level but only used for `pa.float64()` as a dtype specifier. Consider:

- Removing the unused `InputValidationError` import if not needed.
- Using `np.float64` instead of `pa.float64()`, since numpy is already imported and they're equivalent for dtype purposes.
-1or99999would pass validation but fail at connection time.♻️ Suggested improvement
if host and port: try: - int(port) # Validate port is a valid integer - return True + port_num = int(port) + return 1 <= port_num <= 65535 except ValueError: return False return False
82-91: Repeated lazy imports could be cached.While Python caches imports internally,
_get_cuda_imports()is called repeatedly (once per_convert_device_buffer_to_numpycall). For a solution with many arrays, this adds overhead. Consider caching at module level:♻️ Suggested caching pattern
+_cuda_imports_cache = None + def _get_cuda_imports(): """Lazily import CUDA-dependent modules. Only call this when GPU operations are actually needed. """ + global _cuda_imports_cache + if _cuda_imports_cache is not None: + return _cuda_imports_cache import cupy as cp import cudf from cuopt.utilities import series_from_buf - return cp, cudf, series_from_buf + _cuda_imports_cache = (cp, cudf, series_from_buf) + return _cuda_imports_cache
139-152: LGTM with a minor error message update.The dual-path logic for CPU/GPU arrays is correct. The error message could be updated to reflect the broader support:
♻️ Suggested message update
else: raise Exception( - "get_data_ptr must be called with cudf.Series or np.ndarray" + "get_data_ptr must be called with np.ndarray or CUDA array" )
`488-498`: **Document the warm start limitation for remote solves.** The code silently returns empty arrays for warm-start data in CPU/remote solves. Users expecting warm-start functionality for remote solves may be surprised by this behavior. Consider adding a warning or documenting this limitation.

📝 Optional: Add warning for better UX

```diff
 else:
     # CPU/remote solve - no warm start data available
+    import warnings
+    warnings.warn(
+        "Warm start data is not available for CPU/remote solves",
+        stacklevel=2
+    )
     current_primal_solution = np.array([], dtype=np.float64)
```

Note: This may be intentional and documented elsewhere. Verify whether users are expected to know about this limitation:

```shell
# Check if the warm start limitation is documented
rg -i "warm.?start" --type rst --type md -C2 | head -50
```
#!/bin/bash # Check if warm start limitation is documented rg -i "warm.?start" --type rst --type md -C2 | head -50conda/environments/all_cuda-129_arch-x86_64.yaml (1)
`28-28`: **Consider pinning compatible versions for `grpc-cpp` and `libprotobuf`.** The `GRPC_BUILD_STATUS.md` in this PR documents version conflicts between `grpc-cpp` (which requires protobuf 3.x) and `libprotobuf` 6.x. Without explicit version constraints here, CI builds may encounter the same conflicts. Per the recommendation in `GRPC_BUILD_STATUS.md`, consider pinning compatible versions:

```diff
-  - grpc-cpp
+  - grpc-cpp=1.62
 ...
-  - libprotobuf
+  - libprotobuf=5.27
```

Alternatively, ensure `dependencies.yaml` (the source file) specifies compatible version ranges.

Also applies to: 36-36
python/cuopt/cuopt/linear_programming/internals/internals.pyx (1)
`52-57`: **Memory lifetime responsibility should be documented.** The `ctypes.from_address()` approach creates a view over raw memory without ownership. The returned numpy array will become invalid if the underlying memory is deallocated. This is appropriate for callback contexts where the memory is guaranteed valid during the callback, but consider adding a brief comment clarifying the lifetime contract:

```diff
 def get_numpy_array(self, data, shape, typestr):
+    # Note: Returned array is a view over the `data` address - caller must
+    # ensure the underlying memory remains valid for the array's lifetime.
     ctype = ctypes.c_float if typestr == "float32" else ctypes.c_double
     buf_type = ctype * shape
     buf = buf_type.from_address(data)
     numpy_array = np.ctypeslib.as_array(buf)
     return numpy_array
```
80-85: Consider adding a precondition check inget_solution_host()accessors.The documentation states these are "only valid when
is_device_memory()returns false." Consider adding an assertion or throwing an exception when the precondition is violated to catch misuse early rather than returning invalid/empty data.💡 Suggested defensive check
std::vector<f_t>& get_solution_host() { cuopt_expects(!is_device_memory_, error_type_t::ValidationError, "get_solution_host() called but data is in device memory"); return *solution_host_; }GRPC_ARCHITECTURE.md (1)
101-120: Minor: Python examplestream_logsis incomplete.The
stream_logsmethod references an undefinedrequestvariable. Consider completing the example for clarity.📝 Suggested fix
def stream_logs(self, job_id: str): """Generator that yields log lines as they arrive""" + request = cuopt_remote_pb2.StreamLogsRequest(job_id=job_id) for log_msg in self.stub.StreamLogs(request): yield log_msg.linecpp/src/linear_programming/utilities/cython_solve.cu (1)
249-256: Consider usingstd::any_offor LP/MIP detection.The manual loop to detect MIP problems can be simplified using standard algorithms for better readability.
♻️ Suggested refactor
- // Determine if LP or MIP based on variable types - bool is_mip = false; - auto var_types = data_model->get_variable_types(); - for (size_t i = 0; i < var_types.size(); ++i) { - if (var_types.data()[i] != 'C') { - is_mip = true; - break; - } - } + // Determine if LP or MIP based on variable types + auto var_types = data_model->get_variable_types(); + bool is_mip = std::any_of(var_types.data(), + var_types.data() + var_types.size(), + [](char c) { return c != 'C'; });build.sh (2)
454-495: Duplicate CMake configuration violates DRY principle.The cmake invocation for
cuopt_grpc_server(lines 461-483) duplicates nearly all arguments from thelibcuoptbuild (lines 425-446). This creates a maintenance burden where changes must be made in two places.Consider extracting common cmake arguments into a bash array variable and reusing it in both places.
♻️ Suggested refactor approach
# Define common cmake args once COMMON_CMAKE_ARGS=( -DDEFINE_ASSERT=${DEFINE_ASSERT} -DDEFINE_BENCHMARK="${DEFINE_BENCHMARK}" # ... other common args ) # Use in libcuopt build cmake "${COMMON_CMAKE_ARGS[@]}" "${EXTRA_CMAKE_ARGS[@]}" "${REPODIR}"/cpp # Use in cuopt_grpc_server build (add -DCUOPT_ENABLE_GRPC=ON) cmake "${COMMON_CMAKE_ARGS[@]}" -DCUOPT_ENABLE_GRPC=ON "${EXTRA_CMAKE_ARGS[@]}" "${REPODIR}"/cpp
489-493: Add error handling for gRPC server installation.The install command doesn't check if the build produced the executable before attempting to copy it.
💡 Suggested fix
if [ -z "${INSTALL_TARGET}" ]; then echo "Skipping install of cuopt_grpc_server (-n flag set)" else + if [ ! -f "${LIBCUOPT_BUILD_DIR}/cuopt_grpc_server" ]; then + echo "Error: cuopt_grpc_server binary not found in ${LIBCUOPT_BUILD_DIR}" + exit 1 + fi + mkdir -p "${INSTALL_PREFIX}/bin/" install -m 755 "${LIBCUOPT_BUILD_DIR}/cuopt_grpc_server" "${INSTALL_PREFIX}/bin/" echo "Installed cuopt_grpc_server to ${INSTALL_PREFIX}/bin/" ficpp/cuopt_cli.cpp (2)
240-242: Potential null pointer dereference whenis_remote_solveis false but handle creation fails.If
std::make_unique<raft::handle_t>()throws an exception,handle_ptrremains nullptr. While the exception will propagate, consider adding explicit error handling for CUDA initialization failures.
508-554: Silent catch block may hide parsing errors.The
try/catchblock at lines 514-522 silently ignores exceptions when parsingCUOPT_GRPC_MAX_MESSAGE_MB. Consider logging a warning when the environment variable contains an invalid value.Suggested improvement
try { int64_t mb = std::stoll(env_mb); if (mb <= 0) { client_effective_bytes = -1; } else { client_effective_bytes = mb * kMiB; } - } catch (...) { + } catch (const std::exception& e) { + std::cerr << "Warning: Invalid CUOPT_GRPC_MAX_MESSAGE_MB value '" << env_mb + << "', using default\n"; }cpp/cuopt_remote_server.cpp (3)
`514-525`: **Race condition when claiming jobs from the queue.** The atomic `compare_exchange_strong` on `claimed` is good, but there's a TOCTOU window between checking `job_queue[i].ready && !job_queue[i].claimed` and the compare-exchange. Multiple workers could pass the initial check before any succeeds at the CAS, which is fine, but consider using a single atomic operation pattern for clarity.

`1645-1655`: **Hardcoded log directory path `/tmp/cuopt_logs`.** Consider making the log directory configurable via a command-line argument or environment variable, for flexibility in deployment scenarios where `/tmp` may be restricted or have limited space.

`2272-2273`: **Detached threads for client handling may outlive the server.** When the server shuts down, detached client handler threads may still be running. Consider using a thread pool or tracking active connections to ensure clean shutdown.
Consider tracking active client threads
Instead of detaching threads, consider maintaining a vector of active threads and joining them during shutdown, or using a bounded thread pool to limit concurrent connections.
cpp/include/cuopt/linear_programming/pdlp/solver_solution.hpp (2)
274-306: Host accessor documentation should clarify preconditions.The documentation states "Only valid when is_device_memory() returns false" but this is only a comment, not enforced. Consider documenting the behavior when called incorrectly (undefined behavior, assertion, exception).
Enhanced documentation suggestion

```diff
 /**
  * @brief Returns the primal solution in host (CPU) memory.
- * Only valid when is_device_memory() returns false.
+ * @pre is_device_memory() must return false, otherwise behavior is undefined.
+ * @note Use to_host() first if data is currently on GPU.
  *
  * @return std::vector<f_t>& The host memory container for the primal solution.
  */
```
492-509: Consider documenting thread-safety of solution objects.

Per coding guidelines for public headers, thread-safety should be documented. Add a note indicating whether solution objects are thread-safe or require external synchronization.
conda/environments/all_cuda-131_arch-aarch64.yaml (1)
28-36: Align/pin grpc/protobuf versions for reproducibility

Please ensure grpc-cpp and libprotobuf versions are constrained in the generator source so this environment stays reproducible and ABI-compatible with the build tooling.
conda/environments/all_cuda-131_arch-x86_64.yaml (1)
28-36: Align/pin grpc/protobuf versions for reproducibility

Same request here: please constrain grpc-cpp and libprotobuf versions in dependencies.yaml to avoid ABI drift across builds.
ci/build_wheel_libcuopt.sh (1)
20-23: Keep protobuf/gRPC versions aligned across install script and envs

Now that the wheel build always installs protobuf/gRPC and enables CUOPT_ENABLE_GRPC, please ensure install_protobuf_grpc.sh pins versions consistent with the conda environments to avoid ABI or linkage drift.
cpp/src/linear_programming/CMakeLists.txt (1)
11-12: Verify protobuf dependency is intentional for non-GRPC builds

`protobuf_serializer.cu` is now always compiled. If protobuf isn't guaranteed in non-GRPC builds, consider guarding it behind a build option or ensure protobuf is always available.
cpp/include/cuopt/linear_programming/utilities/callbacks_implems.hpp (2)
43-46: Consider renaming variables to reflect conditional data source.

After the conditional change, `numba_matrix` may actually hold a NumPy array (host data), and `numpy_array` is also conditionally selected. Consider renaming to `solution_array` and `objective_array` to avoid confusion about data location.

Also applies to: 82-85
20-39: Code duplication between callback classes.

The `get_numba_matrix` and `get_numpy_array` methods are identical in both `default_get_solution_callback_t` and `default_set_solution_callback_t`. Consider extracting these to a common base class or utility to reduce duplication.

Also applies to: 59-78
cpp/scripts/run_dev_server.sh (1)
1-24: Consider adding safer shell script options.

The script works correctly, but adding `set -euo pipefail` after the shebang would make it more robust by:

- `-e`: Exit immediately on command failure
- `-u`: Treat unset variables as errors
- `-o pipefail`: Catch errors in pipelines

♻️ Proposed improvement

```diff
 #!/bin/bash
 # SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
+set -euo pipefail

 # Development script to run the cuopt remote server with the correct library path
```

python/cuopt/cuopt/linear_programming/remote.py (2)
19-32: Avoid modifying `sys.path` with relative build directories in production code.

Inserting a relative build path (`cpp/build`) into `sys.path` is fragile and inappropriate for distributed packages. This path manipulation should be removed or guarded by a development-only flag. In production, `cuopt_remote_pb2` should be installed as part of the package.

♻️ Suggested approach

```diff
 try:
-    import sys
-
-    # Add build directory for development
-    build_path = os.path.join(
-        os.path.dirname(__file__), "..", "..", "..", "..", "..", "cpp", "build"
-    )
-    if os.path.exists(build_path):
-        sys.path.insert(0, os.path.abspath(build_path))
     import cuopt_remote_pb2 as pb

     _HAS_PROTOBUF = True
 except ImportError:
     _HAS_PROTOBUF = False
```
55-72: Consider more specific type annotation.

The return type `Optional[tuple]` could be more descriptive as `Optional[Tuple[str, int]]` for better IDE support and documentation.

♻️ Suggested fix

```diff
+from typing import Optional, Tuple
+
-def get_remote_config() -> Optional[tuple]:
+def get_remote_config() -> Optional[Tuple[str, int]]:
```

cpp/src/mip/solve.cu (1)
383-448: Code duplication: `cpu_problem_data_t` is defined in multiple files.

This struct duplicates functionality from `problem_cpu_data_t` in `cuopt_c.cpp`. Consider consolidating into a shared header (e.g., in `cuopt/linear_programming/utilities/`) to avoid maintenance burden and divergence.

Based on learnings, refactor code duplication in solver components into shared utilities.
cpp/include/cuopt/linear_programming/utilities/remote_solve.hpp (1)
39-54: Consider validating port number range.

The code catches conversion errors but doesn't validate that the port number is within the valid range (1-65535). An out-of-range port would silently fall back to local solve, which might confuse users.

♻️ Suggested improvement

```diff
 if (host != nullptr && port != nullptr && host[0] != '\0' && port[0] != '\0') {
   try {
     int port_num = std::stoi(port);
+    if (port_num < 1 || port_num > 65535) {
+      return std::nullopt;  // Invalid port range
+    }
     return remote_solve_config_t{std::string(host), port_num};
   } catch (...) {
     // Invalid port number, fall back to local solve
     return std::nullopt;
   }
 }
```

cpp/include/cuopt/linear_programming/utilities/remote_serialization.hpp (1)
217-237: Potential confusion: `job_status_t` duplicates `remote_job_status_t`.

The `job_status_t` enum inside `remote_serializer_t` has identical values to `remote_job_status_t` in `remote_solve.hpp`. Consider unifying these to avoid mapping overhead and potential inconsistencies.

One option is to use `remote_job_status_t` from `remote_solve.hpp` consistently throughout, or define a single shared enum in a common location.

cpp/src/linear_programming/cuopt_c.cpp (1)
33-147: Code duplication: `problem_cpu_data_t` is similar to struct in `solve.cu`.

This struct duplicates the `cpu_problem_data_t` in `cpp/src/mip/solve.cu`. Both serve the same purpose of holding CPU-side problem data. Consider consolidating into a shared utility header.

cpp/src/linear_programming/utilities/cuopt_remote.proto (1)
10-14: Remove `FloatType` enum and `float_type` field: not used in any payload.

The `float_type` field and `FloatType` enum are defined in the proto schema but never consumed by the codebase. All numeric payloads use `double` exclusively, making the FLOAT32 option misleading to clients expecting float32 wire format support. Remove both the enum (lines 26-29) and the field (line 13) until float32 payloads are actually implemented.

♻️ Optional clean-up

```diff
 message RequestHeader {
   uint32 version = 1;           // Protocol version (currently 1)
   ProblemType problem_type = 2; // LP or MIP
   IndexType index_type = 3;     // INT32 or INT64
-  FloatType float_type = 4;     // FLOAT32 or DOUBLE
 }
@@
-enum FloatType {
-  FLOAT32 = 0;
-  DOUBLE = 1;
-}
```
```bash
# Convert EXTRA_CMAKE_ARGS into a semicolon-delimited list, escaping
# any semicolons in values so scikit-build-core treats each -D as one arg.
SKBUILD_EXTRA_ARGS=()
for extra_arg in "${EXTRA_CMAKE_ARGS[@]}"; do
    SKBUILD_EXTRA_ARGS+=("${extra_arg//;/\\;}")
done
SKBUILD_EXTRA_ARGS_JOINED=""
if [ ${#SKBUILD_EXTRA_ARGS[@]} -gt 0 ]; then
    SKBUILD_EXTRA_ARGS_JOINED="$(IFS=';'; echo "${SKBUILD_EXTRA_ARGS[*]}")"
fi
SKBUILD_CMAKE_ARGS="-DCMAKE_PREFIX_PATH=${INSTALL_PREFIX};-DCMAKE_LIBRARY_PATH=${LIBCUOPT_BUILD_DIR};-DCMAKE_CUDA_ARCHITECTURES=${CUOPT_CMAKE_CUDA_ARCHITECTURES}"
if [ -n "${SKBUILD_EXTRA_ARGS_JOINED}" ]; then
    SKBUILD_CMAKE_ARGS="${SKBUILD_CMAKE_ARGS};${SKBUILD_EXTRA_ARGS_JOINED}"
fi
python "${PYTHON_ARGS_FOR_INSTALL[@]}" .
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check all occurrences of SKBUILD_CMAKE_ARGS in build.sh
rg -n "SKBUILD_CMAKE_ARGS" build.sh
# Also check for SKBUILD pattern usage
echo "=== All SKBUILD references ==="
rg -n "SKBUILD" build.sh | head -30
```

Repository: NVIDIA/cuopt
Length of output: 1736

🏁 Script executed:

```shell
#!/bin/bash
# Get context around the first python command (lines 517-535)
sed -n '515,535p' build.sh | cat -n
echo "=== Second occurrence context (lines 540-560) ==="
sed -n '540,560p' build.sh | cat -n
```

Repository: NVIDIA/cuopt
Length of output: 2219

🏁 Script executed:

```shell
#!/bin/bash
# Find definition of PYTHON_ARGS_FOR_INSTALL
rg -n "PYTHON_ARGS_FOR_INSTALL" build.sh | head -20
echo "=== Check for export statements ==="
rg -n "export SKBUILD" build.sh
echo "=== Check broader context for python install patterns ==="
rg -n "python.*install" build.sh -A 2 -B 2 | head -50
```

Repository: NVIDIA/cuopt
Length of output: 438

🏁 Script executed:

```shell
#!/bin/bash
# Check scikit-build-core documentation or examples
# Look for patterns of how cmake args should be passed
rg -n "\-D[A-Z_]+=" build.sh | head -20
echo "=== Check if SKBUILD_CMAKE_ARGS should be in environment or passed differently ==="
# Search for any CMake-related env variables
rg -n "CMAKE_" build.sh | grep -E "(export|=)" | head -20
echo "=== Look for scikit-build references or documentation ==="
rg -n "scikit-build" build.sh
```

Repository: NVIDIA/cuopt
Length of output: 3181

🏁 Script executed:

```shell
#!/bin/bash
# Check if SKBUILD_CMAKE_ARGS should be an environment variable
# Look for any environment variable patterns in the code
sed -n '515,535p' build.sh
echo "=== Check if there's an export or env setup before the python call ==="
sed -n '500,535p' build.sh
```

Repository: NVIDIA/cuopt
Length of output: 2467

🏁 Script executed:

```shell
#!/bin/bash
# Search for how scikit-build-core expects to receive CMAKE args
# Check if there's documentation or comments about SKBUILD env vars
rg -n "SKBUILD\|skbuild" . --type py --type txt --type md 2>/dev/null | head -30
echo "=== Check for pyproject.toml or setup.py that might configure scikit-build ==="
fd -e "pyproject.toml" -o -e "setup.py" -o -e "setup.cfg" | head -10
```

Repository: NVIDIA/cuopt
Length of output: 268

🏁 Script executed:

```shell
#!/bin/bash
# Find pyproject.toml and setup files
find . -maxdepth 5 -name "pyproject.toml" -o -name "setup.py" -o -name "setup.cfg" | head -20
echo "=== Check python/cuopt directory structure ==="
ls -la python/cuopt/ | head -30
```

Repository: NVIDIA/cuopt
Length of output: 919

🏁 Script executed:

```shell
#!/bin/bash
# Check the pyproject.toml for python/cuopt to see scikit-build configuration
cat python/cuopt/pyproject.toml
```

Repository: NVIDIA/cuopt
Length of output: 3420

🏁 Script executed:

```shell
#!/bin/bash
# Check scikit-build-core documentation for how it expects CMAKE args
# Look for patterns in the codebase where SKBUILD env vars might be set
grep -r "SKBUILD_CMAKE_ARGS\|SKBUILD_BUILD" . --include="*.sh" --include="*.py" 2>/dev/null
echo "=== Check if there are any environment variable exports before python calls ==="
sed -n '515,533p' build.sh
```

Repository: NVIDIA/cuopt
Length of output: 2112
Export SKBUILD_CMAKE_ARGS before the python command.
The variable SKBUILD_CMAKE_ARGS is constructed (lines 527-530) but not exported, so scikit-build-core will not receive it. The CI scripts (e.g., ci/build_wheel_cuopt.sh) correctly use export SKBUILD_CMAKE_ARGS=... before invoking the build. Add export before assigning SKBUILD_CMAKE_ARGS on line 527 (and line 546 for the second occurrence).
🤖 Prompt for AI Agents
In `@build.sh` around lines 517 - 531, The SKBUILD_CMAKE_ARGS environment variable
is being built but not exported, so scikit-build-core won't see it; change the
assignment of SKBUILD_CMAKE_ARGS (the variable constructed from INSTALL_PREFIX,
LIBCUOPT_BUILD_DIR, CUOPT_CMAKE_CUDA_ARCHITECTURES and optionally
SKBUILD_EXTRA_ARGS_JOINED) to export SKBUILD_CMAKE_ARGS before invoking python
"${PYTHON_ARGS_FOR_INSTALL[@]}" . and likewise export the second
SKBUILD_CMAKE_ARGS occurrence noted in the script so the python/scikit-build
process inherits the variable.
```cmake
# gRPC for remote solve service.
#
# We treat gRPC as required, but allow it to come from either:
#   - an installed CMake package (gRPCConfig.cmake), or
#   - an in-tree build (e.g. python/libcuopt uses FetchContent(grpc), which defines gRPC::grpc++).
if(NOT TARGET gRPC::grpc++)
  find_package(gRPC CONFIG REQUIRED)
endif()
set(CUOPT_ENABLE_GRPC ON)
add_compile_definitions(CUOPT_ENABLE_GRPC)
message(STATUS "gRPC enabled (target gRPC::grpc++ is available)")
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check whether CUOPT_ENABLE_GRPC is already user-configurable elsewhere
rg -n "option\(.*CUOPT_ENABLE_GRPC|CUOPT_ENABLE_GRPC" -g '*.cmake'
```

Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:

```shell
#!/bin/bash
# Also check in CMakeLists.txt files
rg -n "option\(.*CUOPT_ENABLE_GRPC|CUOPT_ENABLE_GRPC" cpp/CMakeLists.txt
```

Repository: NVIDIA/cuopt
Length of output: 264

🏁 Script executed:

```shell
#!/bin/bash
# Check the broader context around lines 296-306 in cpp/CMakeLists.txt
# to see if gRPC was previously optional or if there's an option() call
sed -n '280,320p' cpp/CMakeLists.txt
```

Repository: NVIDIA/cuopt
Length of output: 1787

🏁 Script executed:

```shell
#!/bin/bash
# Search for any option() calls that mention gRPC or enable flags
rg -n "option\(" cpp/CMakeLists.txt | head -20
```

Repository: NVIDIA/cuopt
Length of output: 978
Make gRPC optional or confirm it is now a mandatory dependency.
find_package(gRPC CONFIG REQUIRED) on line 302 makes gRPC unconditionally required, and CUOPT_ENABLE_GRPC is hardcoded to ON on line 304 with no corresponding user-facing option(). If gRPC-free builds were previously supported, this breaks backward compatibility. Either gate the gRPC and proto codegen logic behind a CMake option() with default OFF, or document that gRPC is now mandatory.
🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 296 - 306, The CMake logic currently forces
gRPC by calling find_package(gRPC CONFIG REQUIRED) and unconditionally setting
CUOPT_ENABLE_GRPC via add_compile_definitions, breaking gRPC-free builds; change
this by introducing a CMake option (e.g., option(CUOPT_ENABLE_GRPC "Enable gRPC
support" OFF)) and guard the existing code that checks TARGET gRPC::grpc++,
calls find_package(gRPC ...), and sets
add_compile_definitions(CUOPT_ENABLE_GRPC) behind that option so gRPC is only
required when the option is ON (or, if you intend to make gRPC mandatory, update
project docs and remove the option but keep a clear message); ensure the
message(STATUS ...) reflects the chosen path and that proto/codegen and targets
that reference gRPC::grpc++ are only configured when CUOPT_ENABLE_GRPC is
enabled.
```cpp
std::string generate_job_id()
{
  static std::random_device rd;
  static std::mt19937 gen(rd());
  static std::uniform_int_distribution<uint64_t> dis;

  uint64_t id = dis(gen);
  char buf[32];
  snprintf(buf, sizeof(buf), "job_%016lx", id);
  return std::string(buf);
}
```
Thread-unsafe static variables in `generate_job_id`.

The static `std::mt19937 gen` and `std::uniform_int_distribution<uint64_t> dis` are shared across threads but not protected by a mutex. This function can be called from multiple client handler threads concurrently.
🔒 Proposed fix using thread-local storage

```diff
 std::string generate_job_id()
 {
-  static std::random_device rd;
-  static std::mt19937 gen(rd());
-  static std::uniform_int_distribution<uint64_t> dis;
+  thread_local std::random_device rd;
+  thread_local std::mt19937 gen(rd());
+  thread_local std::uniform_int_distribution<uint64_t> dis;
   uint64_t id = dis(gen);
   char buf[32];
   snprintf(buf, sizeof(buf), "job_%016lx", id);
   return std::string(buf);
 }
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
std::string generate_job_id()
{
  thread_local std::random_device rd;
  thread_local std::mt19937 gen(rd());
  thread_local std::uniform_int_distribution<uint64_t> dis;
  uint64_t id = dis(gen);
  char buf[32];
  snprintf(buf, sizeof(buf), "job_%016lx", id);
  return std::string(buf);
}
```
🤖 Prompt for AI Agents
In `@cpp/cuopt_remote_server.cpp` around lines 303 - 313, generate_job_id
currently uses shared static RNG state (static std::mt19937 gen and static
std::uniform_int_distribution<uint64_t> dis) which is not thread-safe; make the
RNG state thread-local or otherwise synchronized: replace those static variables
with thread_local instances (e.g., thread_local std::mt19937 and thread_local
distribution) or protect access with a mutex inside generate_job_id, ensuring
unique IDs remain formatted via snprintf and returned as before; update
references in generate_job_id accordingly to avoid data races.
```cmake
# Static library for linking into libcuopt
add_library(mps_parser_static STATIC ${MPS_PARSER_SOURCES})
```
Static library is missing several configurations applied to the shared library.

The `mps_parser_static` target lacks:

- `set_target_properties` for `CXX_STANDARD`, `CXX_STANDARD_REQUIRED`, `POSITION_INDEPENDENT_CODE`, and `CXX_SCAN_FOR_MODULES`
- `target_compile_options` for compiler flags
- Conditional `target_include_directories` for BZip2/ZLIB (lines 137-143)

This may cause build failures or inconsistent behavior when linking the static library.
Proposed fix

```diff
 # Static library for linking into libcuopt
 add_library(mps_parser_static STATIC ${MPS_PARSER_SOURCES})
+set_target_properties(mps_parser_static
+  PROPERTIES
+    CXX_STANDARD 20
+    CXX_STANDARD_REQUIRED ON
+    POSITION_INDEPENDENT_CODE ON
+    CXX_SCAN_FOR_MODULES OFF
+)
+
+target_compile_options(mps_parser_static
+  PRIVATE "$<$<COMPILE_LANGUAGE:CXX>:${MPS_PARSER_CXX_FLAGS}>"
+)
```

Also add BZip2/ZLIB includes for the static library after line 143:
```cmake
if(MPS_PARSER_WITH_BZIP2)
  target_include_directories(mps_parser_static PRIVATE BZip2::BZip2)
endif()
if(MPS_PARSER_WITH_ZLIB)
  target_include_directories(mps_parser_static PRIVATE ZLIB::ZLIB)
endif()
```

🤖 Prompt for AI Agents
In `@cpp/libmps_parser/CMakeLists.txt` around lines 84 - 85, The static target
mps_parser_static is missing the same configuration as the shared target; update
CMake to apply set_target_properties on mps_parser_static to set CXX_STANDARD,
CXX_STANDARD_REQUIRED, POSITION_INDEPENDENT_CODE, and CXX_SCAN_FOR_MODULES, add
the same target_compile_options used for the shared target to mps_parser_static,
and mirror the conditional include directories for compression libs by adding
target_include_directories(mps_parser_static PRIVATE BZip2::BZip2) when
MPS_PARSER_WITH_BZIP2 and target_include_directories(mps_parser_static PRIVATE
ZLIB::ZLIB) when MPS_PARSER_WITH_ZLIB so the static library builds with the same
flags and includes.
```cpp
// Stream logs to stdout until stop_flag is true or server indicates job_complete.
// If print_prefix is non-empty, it will be printed before each log line.
void stream_logs_to_stdout(const std::string& address,
                           const std::string& job_id,
                           volatile bool* stop_flag,
                           const std::string& print_prefix);
```
🧩 Analysis chain

🏁 Script executed:

```shell
find . -type f -name "*.hpp" -o -name "*.cpp" | grep -E "(remote_solve|grpc)" | head -20
```

Repository: NVIDIA/cuopt
Length of output: 271

🏁 Script executed:

```shell
cat -n cpp/src/linear_programming/utilities/remote_solve_grpc.hpp | head -100
```

Repository: NVIDIA/cuopt
Length of output: 4113

🏁 Script executed:

```shell
fd "remote_solve_grpc" --type f
```

Repository: NVIDIA/cuopt
Length of output: 173

🏁 Script executed:

```shell
cat -n cpp/src/linear_programming/utilities/remote_solve_grpc.cpp | head -300
```

Repository: NVIDIA/cuopt
Length of output: 12449

🏁 Script executed:

```shell
grep -n "stream_logs_to_stdout" cpp/src/linear_programming/utilities/remote_solve_grpc.cpp
```

Repository: NVIDIA/cuopt
Length of output: 114

🏁 Script executed:

```shell
sed -n '350,500p' cpp/src/linear_programming/utilities/remote_solve_grpc.cpp
```

Repository: NVIDIA/cuopt
Length of output: 5047

🏁 Script executed:

```shell
sed -n '597,700p' cpp/src/linear_programming/utilities/remote_solve_grpc.cpp
```

Repository: NVIDIA/cuopt
Length of output: 2324

🏁 Script executed:

```shell
sed -n '597,750p' cpp/src/linear_programming/utilities/remote_solve_grpc.cpp
```

Repository: NVIDIA/cuopt
Length of output: 2324

🏁 Script executed:

```shell
grep -rn "stream_logs_to_stdout" cpp/ --include="*.cpp" --include="*.hpp"
```

Repository: NVIDIA/cuopt
Length of output: 290

🏁 Script executed:

```shell
grep -n "volatile bool" cpp/src/linear_programming/utilities/remote_solve_grpc.cpp
```

Repository: NVIDIA/cuopt
Length of output: 112

🏁 Script executed:

```shell
grep -rn "stream_logs_to_stdout" cpp/ --include="*.cpp" -B 2 -A 2
```

Repository: NVIDIA/cuopt
Length of output: 535

🏁 Script executed:

```shell
grep -rn "stream_logs_to_stdout" . --include="*.py" 2>/dev/null | head -20
```

Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:

```shell
cat -n cpp/cuopt_grpc_server.cpp | head -150
```

Repository: NVIDIA/cuopt
Length of output: 5879

🏁 Script executed:

```shell
grep -n "stream_logs" cpp/cuopt_grpc_server.cpp
```

Repository: NVIDIA/cuopt
Length of output: 38

🏁 Script executed:

```shell
grep -rn "stop_flag" cpp/ --include="*.cpp" --include="*.hpp"
```

Repository: NVIDIA/cuopt
Length of output: 559

🏁 Script executed:

```shell
grep -rn "StreamLogs" cpp/ --include="*.cpp" --include="*.hpp"
```

Repository: NVIDIA/cuopt
Length of output: 429

🏁 Script executed:

```shell
sed -n '2566,2650p' cpp/cuopt_grpc_server.cpp
```

Repository: NVIDIA/cuopt
Length of output: 3097

🏁 Script executed:

```shell
grep -B 10 -A 30 "StreamLogs(ServerContext" cpp/cuopt_grpc_server.cpp
```

Repository: NVIDIA/cuopt
Length of output: 1488
Use `std::atomic<bool>` for the stop flag to ensure thread-safe signaling.

`volatile` doesn't synchronize between threads; it only prevents compiler optimizations. The log streaming loop reads `*stop_flag` while another thread may write to it, creating a data race. Replace with `std::atomic<bool>*` and use `load()`/`store()` semantics, then update the implementation and all call sites.
🔧 Proposed API adjustment

```diff
+#include <atomic>
 ...
-void stream_logs_to_stdout(const std::string& address,
-                           const std::string& job_id,
-                           volatile bool* stop_flag,
-                           const std::string& print_prefix);
+void stream_logs_to_stdout(const std::string& address,
+                           const std::string& job_id,
+                           std::atomic<bool>* stop_flag,
+                           const std::string& print_prefix);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
#include <atomic>
// ... other includes ...

// Stream logs to stdout until stop_flag is true or server indicates job_complete.
// If print_prefix is non-empty, it will be printed before each log line.
void stream_logs_to_stdout(const std::string& address,
                           const std::string& job_id,
                           std::atomic<bool>* stop_flag,
                           const std::string& print_prefix);
```
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/utilities/remote_solve_grpc.hpp` around lines 72 -
77, The stop flag parameter for stream_logs_to_stdout is using volatile which is
not thread-safe; change the function signature to accept std::atomic<bool>*
stop_flag, update the implementation to use stop_flag->load() (and store() where
written) for loop checks and any writers, and update all call sites to pass a
std::atomic<bool> instance instead of a volatile bool pointer so signaling is
synchronized across threads (refer to function stream_logs_to_stdout and any
callers that set/clear the flag).
```cpp
static bool logged_no_device = false;
if (!logged_no_device) {
  CUOPT_LOG_INFO("[remote_solve] No CUDA device available; using host incumbents");
  logged_no_device = true;
}
```
Protect `logged_no_device` against races.

`static bool logged_no_device` is written without synchronization, which is a data race if multiple threads call `copy_incumbent_to_device`. Use `std::once_flag` or an atomic.
🔧 Thread-safe logging

```diff
-  static bool logged_no_device = false;
-  if (!logged_no_device) {
-    CUOPT_LOG_INFO("[remote_solve] No CUDA device available; using host incumbents");
-    logged_no_device = true;
-  }
+  static std::once_flag logged_no_device;
+  std::call_once(logged_no_device, []() {
+    CUOPT_LOG_INFO("[remote_solve] No CUDA device available; using host incumbents");
+  });
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
static std::once_flag logged_no_device;
std::call_once(logged_no_device, []() {
  CUOPT_LOG_INFO("[remote_solve] No CUDA device available; using host incumbents");
});
```
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/utilities/remote_solve.cu` around lines 41 - 45,
The static flag logged_no_device in copy_incumbent_to_device is written without
synchronization causing a race; replace it with a thread-safe mechanism (e.g., a
std::once_flag + std::call_once or an std::atomic<bool>) so the "[remote_solve]
No CUDA device available; using host incumbents" message is logged exactly once
without data races—locate logged_no_device and the copy_incumbent_to_device
function, remove the unsynchronized static write, and implement one of those
thread-safe patterns to perform the logging safely.
```cpp
if (status == "COMPLETED") { break; }
if (status == "FAILED" || status == "CANCELLED" || status == "NOT_FOUND") {
  stop_logs = true;
  if (log_thread.joinable()) { log_thread.join(); }
  grpc_remote::delete_result(address, job_id);
  return optimization_problem_solution_t<i_t, f_t>(
    cuopt::logic_error("Remote job did not complete successfully (status=" + status + ")",
                       cuopt::error_type_t::RuntimeError));
}
```
Handle unexpected status values to avoid infinite polling.

`check_status` can return "UNKNOWN", which currently falls through and keeps the loop alive forever. Treat unknown/unsupported status as a failure (or add a timeout) to avoid hanging remote solves.
🔧 Suggested guard

```diff
-  if (status == "COMPLETED") { break; }
-  if (status == "FAILED" || status == "CANCELLED" || status == "NOT_FOUND") {
+  if (status == "COMPLETED") { break; }
+  if (status == "QUEUED" || status == "PROCESSING") {
+    // keep waiting
+  } else if (status == "FAILED" || status == "CANCELLED" || status == "NOT_FOUND") {
+    ...
+  } else {
+    stop_logs = true;
+    if (log_thread.joinable()) { log_thread.join(); }
+    grpc_remote::delete_result(address, job_id);
+    return optimization_problem_solution_t<i_t, f_t>(
+      cuopt::logic_error("Remote job returned unexpected status: " + status,
+                         cuopt::error_type_t::RuntimeError));
   }
```

Also applies to: 402-410
🤖 Prompt for AI Agents
In `@cpp/src/linear_programming/utilities/remote_solve.cu` around lines 219 - 227,
The current status check in the polling loop does not handle unexpected values
like "UNKNOWN", leading to infinite polling. Update the status handling in the
relevant loop in the remote_solve function to treat any unknown or unsupported
status as a failure: set stop_logs to true, join the log thread if joinable,
delete the remote result, and return an error indicating the unexpected status.
Apply this fix to both code sections handling status checks around lines 219-227
and 402-410.
```diff
 const rmm::device_uvector<f_t>& mip_solution_t<i_t, f_t>::get_solution() const
 {
-  return solution_;
+  return *solution_;
 }

 template <typename i_t, typename f_t>
 rmm::device_uvector<f_t>& mip_solution_t<i_t, f_t>::get_solution()
 {
-  return solution_;
+  return *solution_;
 }
```
Potential null pointer dereference in `get_solution()` when using CPU-only path.

If the object was constructed with a CPU-only constructor, `solution_` will be nullptr. Calling `get_solution()` will dereference a null pointer. Consider adding a guard or assertion.
🛡️ Suggested guard

```diff
 template <typename i_t, typename f_t>
 const rmm::device_uvector<f_t>& mip_solution_t<i_t, f_t>::get_solution() const
 {
+  cuopt_assert(is_device_memory_ && solution_, "Cannot access device solution when data is on host");
   return *solution_;
 }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```cpp
const rmm::device_uvector<f_t>& mip_solution_t<i_t, f_t>::get_solution() const
{
  cuopt_assert(is_device_memory_ && solution_, "Cannot access device solution when data is on host");
  return *solution_;
}

template <typename i_t, typename f_t>
rmm::device_uvector<f_t>& mip_solution_t<i_t, f_t>::get_solution()
{
  return *solution_;
}
```
🤖 Prompt for AI Agents
In `@cpp/src/mip/solver_solution.cu` around lines 148 - 157, The get_solution()
methods currently dereference solution_ which can be nullptr for objects
constructed via the CPU-only constructor; modify mip_solution_t<i_t,
f_t>::get_solution() (both const and non-const overloads) to guard against null
solution_—e.g., add an assertion or throw a clear exception when solution_ ==
nullptr (or return/obtain an appropriate CPU-side container if that is the
intended behavior) so callers no longer dereference a null pointer; ensure the
guard message references solution_ and the class/method name for easy tracing.
```cpp
template <typename i_t, typename f_t>
std::vector<f_t>& mip_solution_t<i_t, f_t>::get_solution_host()
{
  return *solution_host_;
}

template <typename i_t, typename f_t>
const std::vector<f_t>& mip_solution_t<i_t, f_t>::get_solution_host() const
{
  return *solution_host_;
}
```
Similar null-pointer risk in get_solution_host() when using the GPU path.
If the object was constructed with a GPU constructor, solution_host_ will be nullptr. Consider adding guards to both the host and device accessors.
🤖 Prompt for AI Agents
In `@cpp/src/mip/solver_solution.cu` around lines 159 - 169, mip_solution_t's
get_solution_host() and its const overload dereference solution_host_ which can
be nullptr when constructed with the GPU path; update both get_solution_host()
methods to check solution_host_ before dereferencing (e.g., if nullptr, either
lazily allocate/copy device data into solution_host_ or throw a clear exception)
and ensure the chosen behavior is consistent with other accessors (refer to
mip_solution_t constructor(s), solution_device_, and any get_solution_device()
logic) so callers won't dereference a null pointer.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@ci/utils/install_protobuf_grpc.sh`:
- Around line 35-40: After installing only grpc-devel in the conditional branch
in install_protobuf_grpc.sh, verify that the grpc_cpp_plugin binary is actually
present (e.g., check PATH or common install locations for grpc_cpp_plugin) and
if it is missing, treat this branch as a failure and fall back to the existing
source-build branch instead of continuing; update the if/elif/else logic so the
elif path (which echoes "Installed grpc-devel...") performs a post-install
existence check for grpc_cpp_plugin and redirects to the source build path when
absent.
🧹 Nitpick comments (2)
cpp/CMakeLists.txt (2)

740-746: Inconsistent `protobuf::libprotobuf` link visibility.
`cuopt_remote_server` links `protobuf::libprotobuf` as PUBLIC, while `cuopt` links it as PRIVATE (line 508). Unless consumers of `cuopt_remote_server` need direct protobuf access, consider making this PRIVATE for consistency.

♻️ Suggested fix

target_link_libraries(cuopt_remote_server
  PUBLIC
    cuopt
    OpenMP::OpenMP_CXX
-    protobuf::libprotobuf
+  PRIVATE
+    protobuf::libprotobuf
)

773-808: Consider removing or gating dead code behind a proper option.
The `if(FALSE)` block (lines 773-808) is permanently disabled dead code. While the TODO explains the sources are missing, this clutters the CMakeLists. Consider either:
- Removing the block entirely and re-adding it when the sources are available
- Gating it with a proper `option(BUILD_REMOTE_TEST_CLIENTS ... OFF)` if the intent is to keep it as a template
if dnf install -y grpc-devel grpc-plugins; then
  echo "Installed gRPC from dnf."
elif dnf install -y grpc-devel; then
  echo "Installed grpc-devel from dnf (grpc_cpp_plugin may be missing)."
else
  echo "grpc-devel not available. Building and installing Protobuf + gRPC from source..."
Verify grpc_cpp_plugin availability after partial install.
When only grpc-devel is installed (without grpc-plugins), the script warns that grpc_cpp_plugin may be missing but continues. This could lead to downstream CMake failures during code generation. Consider verifying the plugin exists and falling back to source build if missing.
🔧 Suggested fix
if dnf install -y grpc-devel grpc-plugins; then
echo "Installed gRPC from dnf."
elif dnf install -y grpc-devel; then
- echo "Installed grpc-devel from dnf (grpc_cpp_plugin may be missing)."
+ if ! command -v grpc_cpp_plugin >/dev/null 2>&1; then
+ echo "grpc-devel installed but grpc_cpp_plugin missing. Building from source..."
+ # Fall through to source build below
+ else
+ echo "Installed grpc-devel from dnf with grpc_cpp_plugin."
+ fi
  else

🤖 Prompt for AI Agents
In `@ci/utils/install_protobuf_grpc.sh` around lines 35 - 40, After installing
only grpc-devel in the conditional branch in install_protobuf_grpc.sh, verify
that the grpc_cpp_plugin binary is actually present (e.g., check PATH or common
install locations for grpc_cpp_plugin) and if it is missing, treat this branch
as a failure and fall back to the existing source-build branch instead of
continuing; update the if/elif/else logic so the elif path (which echoes
"Installed grpc-devel...") performs a post-install existence check for
grpc_cpp_plugin and redirects to the source build path when absent.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@cpp/CMakeLists.txt`:
- Around line 229-265: The CUOPT_ZLIB_HINT_CODE block is adding a global
CMAKE_IGNORE_PATH entry which can break downstream consumers; remove the line
that appends "/usr/lib64/cmake/ZLIB" and "/lib64/cmake/ZLIB" to
CMAKE_IGNORE_PATH inside CUOPT_ZLIB_HINT_CODE and retain only the ZLIB_DIR and
ZLIB_ROOT (and ZLIB_USE_STATIC_LIBS) hints so the
rapids_export_post_find_package_code calls (both BUILD and INSTALL for
rapids_logger) still provide targeted ZLIB guidance without modifying the global
CMake search path.
- Around line 312-377: The add_custom_command blocks for proto/grpc generation
must declare explicit target dependencies and use generator expressions
consistently: update the protoc executable selection to prefer
$<TARGET_FILE:protobuf::protoc> (matching the existing
$<TARGET_FILE:grpc_cpp_plugin> usage) and add the corresponding imported targets
(protobuf::protoc and grpc_cpp_plugin) to the DEPENDS list of their respective
add_custom_command calls so CMake will re-run the commands when those plugins
are rebuilt; adjust the GRPC custom command DEPENDS to include grpc_cpp_plugin
and the proto custom command DEPENDS to include protobuf::protoc.
# Ensure downstream consumers prefer config packages (avoids Protobuf
# module/config mixing when gRPC also depends on Protobuf).
include(${rapids-cmake-dir}/export/detail/post_find_package_code.cmake)
set(CUOPT_DEP_PREFER_CONFIG ON)
rapids_export_post_find_package_code(BUILD rapids_logger
  "set(CMAKE_FIND_PACKAGE_PREFER_CONFIG ON)"
  EXPORT_SET cuopt-exports
  CONDITION CUOPT_DEP_PREFER_CONFIG
)
rapids_export_post_find_package_code(INSTALL rapids_logger
  "set(CMAKE_FIND_PACKAGE_PREFER_CONFIG ON)"
  EXPORT_SET cuopt-exports
  CONDITION CUOPT_DEP_PREFER_CONFIG
)

# Ensure ZLIB resolves to the active prefix for consumers (avoid broken system
# ZLIB config referencing missing libz.a).
set(CUOPT_ZLIB_HINT_CODE
  "set(_CUOPT_PREFIX \"\${CMAKE_CURRENT_LIST_DIR}/../../..\")\n"
  "if(EXISTS \"\${_CUOPT_PREFIX}/lib/cmake/ZLIB\")\n"
  "  set(ZLIB_DIR \"\${_CUOPT_PREFIX}/lib/cmake/ZLIB\")\n"
  "endif()\n"
  "set(ZLIB_ROOT \"\${_CUOPT_PREFIX}\")\n"
  "set(ZLIB_USE_STATIC_LIBS OFF)\n"
  "list(APPEND CMAKE_IGNORE_PATH \"/usr/lib64/cmake/ZLIB\" \"/lib64/cmake/ZLIB\")\n"
)
rapids_export_post_find_package_code(BUILD rapids_logger
  "${CUOPT_ZLIB_HINT_CODE}"
  EXPORT_SET cuopt-exports
  CONDITION CUOPT_DEP_PREFER_CONFIG
)
rapids_export_post_find_package_code(INSTALL rapids_logger
  "${CUOPT_ZLIB_HINT_CODE}"
  EXPORT_SET cuopt-exports
  CONDITION CUOPT_DEP_PREFER_CONFIG
)
🧩 Analysis chain
🏁 Script executed:
# First, let's examine the full context of the CMakeLists.txt file around the ZLIB hint code
cd cpp && wc -l CMakeLists.txt

Repository: NVIDIA/cuopt
Length of output: 74
🏁 Script executed:
# Get the exact snippet and surrounding context
sed -n '229,265p' cpp/CMakeLists.txt

Repository: NVIDIA/cuopt
Length of output: 1434
🏁 Script executed:
# Check if there are any comments or documentation about CMAKE_IGNORE_PATH usage in the repo
rg "CMAKE_IGNORE_PATH" --type cmake -B 2 -A 2

Repository: NVIDIA/cuopt
Length of output: 373
🌐 Web query:
CMAKE_IGNORE_PATH scope CMake documentation persistence across packages
💡 Result:
- What scope does `CMAKE_IGNORE_PATH` have?
  `CMAKE_IGNORE_PATH` is just a normal CMake variable unless you put it in the cache. Normal variables live in the current directory scope (and child directories created with `add_subdirectory()` start by copying the parent's variables). Cache entries are separate and only changed explicitly (e.g., `set(... CACHE ...)` or `-D...=...`). [1], [2]
- Does it "persist across packages"? It depends what you mean by "package".
  - Same CMake configure run (same build tree)
    If "package" means a dependency brought in via `add_subdirectory()`/FetchContent (i.e., configured as part of the same top-level configure), then setting `CMAKE_IGNORE_PATH` in a parent directory scope will generally be seen by subdirectories unless overwritten. This is standard directory-scope behavior. [2]
  - Separate CMake configure runs (separate build trees)
    If "package" means something configured in a different CMake invocation (e.g., `ExternalProject_Add()`), then normal variables do not carry over. You must pass values via cache arguments / command line (e.g., `-DCMAKE_IGNORE_PATH=...`) or equivalent ExternalProject cache/preload mechanisms. [2], [3]
- Does it persist across re-configures?
  Only if it's in the CMake cache for that build directory (CMakeCache.txt). Otherwise it must be set again each configure run. [2]
- Docs note (what it affects):
  `CMAKE_IGNORE_PATH` is a semicolon-separated list of directories ignored by `find_*()` and also affects `find_package()` Config mode (it ignores `<Name>Config.cmake` found in ignored dirs). For ignoring whole prefixes, prefer `CMAKE_IGNORE_PREFIX_PATH`. [1], [4]
Practical takeaway: if you want it to apply “everywhere” in one build tree and survive reruns, set it as a cache variable (e.g., via -DCMAKE_IGNORE_PATH=...), and if you spawn separate CMake builds (ExternalProject), pass it through that project’s CMake cache args. [1], [2], [3]
Sources:
[1] CMake docs: CMAKE_IGNORE_PATH (latest)
[2] CMake docs: cmake-language(7) (variable scopes + cache persistence)
[3] CMake docs: ExternalProject (separate configure + cache/preload args mechanisms)
[4] CMake docs: CMAKE_IGNORE_PREFIX_PATH
Limit ZLIB hinting to avoid unintended CMAKE_IGNORE_PATH side effects on downstream consumers.
The exported CUOPT_ZLIB_HINT_CODE appends to CMAKE_IGNORE_PATH, which persists throughout the consumer's CMake configure step and affects all subsequent find_package() calls in Config mode—not just ZLIB. This can break downstream dependencies that legitimately require system CMake packages in /usr/lib64/cmake/ZLIB or /lib64/cmake/ZLIB. Remove the CMAKE_IGNORE_PATH line; ZLIB_DIR and ZLIB_ROOT alone are sufficient to guide ZLIB discovery.
🔧 Suggested fix (remove global ignore-path)
set(CUOPT_ZLIB_HINT_CODE
"set(_CUOPT_PREFIX \"\${CMAKE_CURRENT_LIST_DIR}/../../..\")\n"
"if(EXISTS \"\${_CUOPT_PREFIX}/lib/cmake/ZLIB\")\n"
" set(ZLIB_DIR \"\${_CUOPT_PREFIX}/lib/cmake/ZLIB\")\n"
"endif()\n"
"set(ZLIB_ROOT \"\${_CUOPT_PREFIX}\")\n"
"set(ZLIB_USE_STATIC_LIBS OFF)\n"
- "list(APPEND CMAKE_IGNORE_PATH \"/usr/lib64/cmake/ZLIB\" \"/lib64/cmake/ZLIB\")\n"
)🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 229 - 265, The CUOPT_ZLIB_HINT_CODE block is
adding a global CMAKE_IGNORE_PATH entry which can break downstream consumers;
remove the line that appends "/usr/lib64/cmake/ZLIB" and "/lib64/cmake/ZLIB" to
CMAKE_IGNORE_PATH inside CUOPT_ZLIB_HINT_CODE and retain only the ZLIB_DIR and
ZLIB_ROOT (and ZLIB_USE_STATIC_LIBS) hints so the
rapids_export_post_find_package_code calls (both BUILD and INSTALL for
rapids_logger) still provide targeted ZLIB guidance without modifying the global
CMake search path.
# Generate C++ code from .proto file
set(PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote.proto")
set(PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.cc")
set(PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote.pb.h")

# Find protoc compiler (provided by config package or target)
if(TARGET protobuf::protoc)
  get_target_property(_PROTOBUF_PROTOC protobuf::protoc IMPORTED_LOCATION_RELEASE)
  if(NOT _PROTOBUF_PROTOC)
    get_target_property(_PROTOBUF_PROTOC protobuf::protoc IMPORTED_LOCATION)
  endif()
else()
  set(_PROTOBUF_PROTOC ${Protobuf_PROTOC_EXECUTABLE})
endif()

if(NOT _PROTOBUF_PROTOC)
  # FindProtobuf.cmake can report libprotobuf without locating protoc.
  # Fall back to PATH lookup (e.g. /usr/local/bin/protoc or /usr/bin/protoc).
  find_program(_PROTOBUF_PROTOC protoc)
endif()

if(NOT _PROTOBUF_PROTOC)
  message(FATAL_ERROR "protoc not found (install protobuf-compiler / ensure protoc is on PATH)")
endif()

# Generate gRPC service code if gRPC is available
if(CUOPT_ENABLE_GRPC)
  if(TARGET grpc_cpp_plugin)
    set(_GRPC_CPP_PLUGIN_EXECUTABLE "$<TARGET_FILE:grpc_cpp_plugin>")
  else()
    find_program(_GRPC_CPP_PLUGIN_EXECUTABLE grpc_cpp_plugin)
    if(NOT _GRPC_CPP_PLUGIN_EXECUTABLE)
      message(FATAL_ERROR "grpc_cpp_plugin not found")
    endif()
  endif()

  set(GRPC_PROTO_FILE "${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities/cuopt_remote_service.proto")
  set(GRPC_PROTO_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote_service.pb.cc")
  set(GRPC_PROTO_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote_service.pb.h")
  set(GRPC_SERVICE_SRCS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote_service.grpc.pb.cc")
  set(GRPC_SERVICE_HDRS "${CMAKE_CURRENT_BINARY_DIR}/cuopt_remote_service.grpc.pb.h")

  add_custom_command(
    OUTPUT "${GRPC_PROTO_SRCS}" "${GRPC_PROTO_HDRS}" "${GRPC_SERVICE_SRCS}" "${GRPC_SERVICE_HDRS}"
    COMMAND ${_PROTOBUF_PROTOC}
    ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
         --grpc_out ${CMAKE_CURRENT_BINARY_DIR}
         --plugin=protoc-gen-grpc=${_GRPC_CPP_PLUGIN_EXECUTABLE}
         --proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
         ${GRPC_PROTO_FILE}
    DEPENDS ${GRPC_PROTO_FILE} ${PROTO_FILE}
    COMMENT "Generating gRPC C++ code from cuopt_remote_service.proto"
    VERBATIM
  )
endif()

add_custom_command(
  OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
  COMMAND ${_PROTOBUF_PROTOC}
  ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
       --proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
       ${PROTO_FILE}
  DEPENDS ${PROTO_FILE}
  COMMENT "Generating C++ code from cuopt_remote.proto"
  VERBATIM
)
🧩 Analysis chain
🏁 Script executed:
fd "CMakeLists.txt" --type f | head -20

Repository: NVIDIA/cuopt
Length of output: 723
🏁 Script executed:
find . -name "CMakeLists.txt" -path "*/cpp/*" -type f

Repository: NVIDIA/cuopt
Length of output: 733
🏁 Script executed:
if [ -f "cpp/CMakeLists.txt" ]; then
echo "=== Lines 312-377 of cpp/CMakeLists.txt ==="
sed -n '312,377p' cpp/CMakeLists.txt
echo -e "\n=== Total file length ==="
wc -l cpp/CMakeLists.txt
fi

Repository: NVIDIA/cuopt
Length of output: 2828
🏁 Script executed:
if [ -f "cpp/CMakeLists.txt" ]; then
echo "=== Searching for FetchContent usage ==="
grep -n "FetchContent\|protobuf\|grpc" cpp/CMakeLists.txt | head -30
fi

Repository: NVIDIA/cuopt
Length of output: 1979
🏁 Script executed:
if [ -f "cpp/CMakeLists.txt" ]; then
echo "=== Lines around FetchContent declarations ==="
sed -n '200,230p' cpp/CMakeLists.txt
echo -e "\n=== Lines 282-310 (Protobuf/gRPC setup) ==="
sed -n '282,310p' cpp/CMakeLists.txt
fi

Repository: NVIDIA/cuopt
Length of output: 2453
🏁 Script executed:
find . -path "*/python/libcuopt/*" -name "CMakeLists.txt" -type f | head -5

Repository: NVIDIA/cuopt
Length of output: 88
🏁 Script executed:
if [ -f "python/libcuopt/CMakeLists.txt" ]; then
echo "=== Checking python/libcuopt CMakeLists.txt for FetchContent ==="
grep -n "FetchContent" python/libcuopt/CMakeLists.txt | head -10
fi

Repository: NVIDIA/cuopt
Length of output: 210
🌐 Web query:
FetchContent protobuf grpc targets IMPORTED vs non-imported CMake
💡 Result:
When you consume protobuf / gRPC with FetchContent, you usually get build targets (non‑IMPORTED); when you consume them with find_package(), you usually get IMPORTED targets.
Why that happens
FetchContent_MakeAvailable(): non‑IMPORTED (built “in-tree”)
FetchContent_MakeAvailable(dep) downloads the source and (if it has a top-level CMakeLists.txt) effectively does an add_subdirectory() on it. Targets created that way are normal targets built as part of your build, so they are not IMPORTED. (cmake.org)
They may still have names like protobuf::libprotobuf / gRPC::grpc++, but in that case they’re typically ALIAS targets pointing at a real target name (because only IMPORTED and ALIAS targets may contain ::). (stackoverflow.com)
find_package(Protobuf|gRPC ...): IMPORTED (prebuilt/installed package)
When protobuf/gRPC are found via find_package(), the package’s config files (CONFIG mode) or CMake’s FindProtobuf module (MODULE mode) provide IMPORTED targets like protobuf::libprotobuf, protobuf::protoc, etc. (cmake.org)
gRPC’s installed gRPCConfig.cmake also expects certain dependency targets (protobuf, absl, etc.) to exist as imported targets, which is why errors often mention “missing imported targets …”. (stackoverflow.com)
How to tell in CMake (robustly)
get_target_property(_is_imported protobuf::libprotobuf IMPORTED)
message(STATUS "protobuf::libprotobuf IMPORTED = ${_is_imported}")

- If it prints TRUE, it's an IMPORTED target (from `find_package`).
- If it prints FALSE (or empty), it's a normal/alias target (typical for `FetchContent`/`add_subdirectory`).

Practical implications / common pitfall

- Don't mix "FetchContent builds from source" and "find_package uses installed package" for the same dependency, or you can get target-name clashes or mismatched expectations (e.g., gRPC package expecting imported protobuf targets).
- Pick one model consistently:
  - all-in-tree (`FetchContent_MakeAvailable(protobuf)`, `FetchContent_MakeAvailable(grpc)`), or
  - all-prebuilt (`find_package(Protobuf CONFIG REQUIRED)`, `find_package(gRPC CONFIG REQUIRED)`).

If you share your exact error and whether you're using find_package(... MODULE|CONFIG) (or gRPC_PROTOBUF_PACKAGE_TYPE), I can map it to the specific fix.
Citations:
- 1: https://cmake.org/cmake/help/git-stage/module/FetchContent.html?utm_source=openai
- 2: https://stackoverflow.com/questions/72931527/how-can-i-avoid-clashes-with-targets-imported-with-fetchcontent-makeavailable?utm_source=openai
- 3: https://cmake.org/cmake/help/git-stage/module/FindProtobuf.html?utm_source=openai
- 4: https://stackoverflow.com/questions/62245040/grpc-cant-find-protobuf-library?utm_source=openai
🏁 Script executed:
grep -rn "IMPORTED_LOCATION\|non-imported\|FetchContent" cpp/CMakeLists.txt | head -20

Repository: NVIDIA/cuopt
Length of output: 389
🌐 Web query:
CMake add_custom_command DEPENDS target executable build order required
💡 Result:
In CMake there are two different "dependency" effects you may want for a custom command:

- Build order only (make sure the tool target is built before the custom command runs)
  If you use an executable target as the command or reference it via generator expressions like `$<TARGET_FILE:tool>`, CMake adds a target-level dependency automatically (i.e., correct build order). This does not necessarily make the custom command re-run when the tool is re-linked/rebuilt. [1]
- Re-run when the tool binary changes (file-level dependency)
  To make the custom command re-run whenever the tool target is rebuilt, list the target name in the custom command's `DEPENDS`. When the dependency is an executable/library target, CMake adds both a target-level and a file-level dependency so the rule re-runs when the target is recompiled/relinked. [1][2]

Minimal pattern that usually does what people mean by "depends on a target executable and get the build order right":

add_executable(mytool tool.cpp)
add_custom_command(
  OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/generated.txt
  COMMAND $<TARGET_FILE:mytool> ${CMAKE_CURRENT_BINARY_DIR}/generated.txt
  DEPENDS mytool # ensures mytool is built AND reruns if mytool changes
)
add_custom_target(generate ALL
  DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/generated.txt
)

If you omit DEPENDS mytool and only use $<TARGET_FILE:mytool> in COMMAND, you typically get build order, but not a reliable "re-run when mytool changes" relationship. [1][2]
Sources: CMake add_custom_command() documentation (behavior of target references, DEPENDS on targets, target-level vs file-level deps). [1][2]
🏁 Script executed:
# Check if there are any related issues or TODOs in the file
grep -n "TODO\|FIXME\|XXX\|HACK" cpp/CMakeLists.txt | head -10

Repository: NVIDIA/cuopt
Length of output: 137
🏁 Script executed:
# Look for how other custom commands handle target dependencies in the repo
grep -B5 -A5 "add_custom_command" cpp/CMakeLists.txt | head -50Repository: NVIDIA/cuopt
Length of output: 1306
Ensure custom command targets have explicit build dependencies and consistent generator expressions.
The custom commands for protobuf and gRPC code generation lack explicit target dependencies, so they won't re-run if protobuf::protoc or grpc_cpp_plugin are rebuilt. Additionally, protobuf::protoc uses get_target_property() for IMPORTED_LOCATION while grpc_cpp_plugin already uses the more robust $<TARGET_FILE:> generator expression—this should be made consistent.
Add explicit target dependencies to the DEPENDS clause and use generator expressions for both tools to ensure proper build ordering and re-run semantics:
Suggested adjustment
if(TARGET protobuf::protoc)
- get_target_property(_PROTOBUF_PROTOC protobuf::protoc IMPORTED_LOCATION_RELEASE)
- if(NOT _PROTOBUF_PROTOC)
- get_target_property(_PROTOBUF_PROTOC protobuf::protoc IMPORTED_LOCATION)
- endif()
+ set(_PROTOBUF_PROTOC "$<TARGET_FILE:protobuf::protoc>")
+ set(_PROTOBUF_PROTOC_DEP protobuf::protoc)
else()
set(_PROTOBUF_PROTOC ${Protobuf_PROTOC_EXECUTABLE})
endif()
...
if(TARGET grpc_cpp_plugin)
set(_GRPC_CPP_PLUGIN_EXECUTABLE "$<TARGET_FILE:grpc_cpp_plugin>")
+ set(_GRPC_CPP_PLUGIN_DEP grpc_cpp_plugin)
else()
find_program(_GRPC_CPP_PLUGIN_EXECUTABLE grpc_cpp_plugin)
...
endif()
...
add_custom_command(
OUTPUT "${GRPC_PROTO_SRCS}" "${GRPC_PROTO_HDRS}" "${GRPC_SERVICE_SRCS}" "${GRPC_SERVICE_HDRS}"
COMMAND ${_PROTOBUF_PROTOC}
ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
--grpc_out ${CMAKE_CURRENT_BINARY_DIR}
--plugin=protoc-gen-grpc=${_GRPC_CPP_PLUGIN_EXECUTABLE}
--proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
${GRPC_PROTO_FILE}
- DEPENDS ${GRPC_PROTO_FILE} ${PROTO_FILE}
+ DEPENDS ${GRPC_PROTO_FILE} ${PROTO_FILE} ${_PROTOBUF_PROTOC_DEP} ${_GRPC_CPP_PLUGIN_DEP}
COMMENT "Generating gRPC C++ code from cuopt_remote_service.proto"
VERBATIM
)
...
add_custom_command(
OUTPUT "${PROTO_SRCS}" "${PROTO_HDRS}"
COMMAND ${_PROTOBUF_PROTOC}
ARGS --cpp_out ${CMAKE_CURRENT_BINARY_DIR}
--proto_path ${CMAKE_CURRENT_SOURCE_DIR}/src/linear_programming/utilities
${PROTO_FILE}
- DEPENDS ${PROTO_FILE}
+ DEPENDS ${PROTO_FILE} ${_PROTOBUF_PROTOC_DEP}
COMMENT "Generating C++ code from cuopt_remote.proto"
VERBATIM
)🤖 Prompt for AI Agents
In `@cpp/CMakeLists.txt` around lines 312 - 377, The add_custom_command blocks for
proto/grpc generation must declare explicit target dependencies and use
generator expressions consistently: update the protoc executable selection to
prefer $<TARGET_FILE:protobuf::protoc> (matching the existing
$<TARGET_FILE:grpc_cpp_plugin> usage) and add the corresponding imported targets
(protobuf::protoc and grpc_cpp_plugin) to the DEPENDS list of their respective
add_custom_command calls so CMake will re-run the commands when those plugins
are rebuilt; adjust the GRPC custom command DEPENDS to include grpc_cpp_plugin
and the proto custom command DEPENDS to include protobuf::protoc.
Description
Issue
Checklist
Summary by CodeRabbit
New Features
Documentation
Chores